Discussion:
[rsnapshot-discuss] strategy for backup
Eildert Groeneveld
2015-06-05 15:48:23 UTC
Dear All

it is not a lack of information and cookbooks for rsnapshot
that poses a problem, but rather the sheer volume.

I would like to back up a small Linux network to a dedicated server,
where the clients are not always on (laptops and other machines being
switched off).

I do not need to develop my own strategy, but would be happy if I
could just use a setup that has proven useful elsewhere. The only
requirements that I have are (which I would assume everyone else
has):
- automatic operation (the server runs 24/7)
- information (mail) on how the backup went each day

I would assume that this is a standard setup that many people have,
so I would also assume that there is a standard rsnapshot recipe for
it.

Maybe someone here can point me to such a document.

Thanks in advance

Tredlie


------------------------------------------------------------------------------
Christopher Barry
2015-06-05 16:40:30 UTC
On Fri, 05 Jun 2015 17:48:23 +0200
Post by Eildert Groeneveld
[...]

What makes this setup different/interesting/complex is the
unreliability of the client state: it would not be desirable for the
client to be switched off in the middle of a backup, and a client
that is off on purpose will produce a lot of false errors.

If the clients are always on from time X to time Y, then this can be
reasonably scheduled on the server. If the time the devices are up
and on the network is variable, then the most efficient method would
be to have the client initiate the backup at a time when it is
running, on the network, and will be for long enough to complete the
backup.

You seem to want a fully automatic solution for a variable and
unpredictable environment. Possibly using a script to ssh from the
client to the server, and from there running rsnapshot against the
client is what would work best overall. How you decide to schedule those
backups and handle errors is another discussion.
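
In its simplest form (all names here are invented), the client side
could be just:

# on the client, e.g. hourly from cron:
ssh trigger@backupserver /usr/local/sbin/backup-me

# on the server, backup-me works out which host called (for example
# from $SSH_CLIENT, or via a per-client forced-command key) and runs
# rsnapshot against that host's config.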


--
Regards,
Christopher Barry

Random geeky fortune:
There's such a thing as too much point on a pencil.
-- H. Allen Smith, "Let the Crabgrass Grow"

------------------------------------------------------------------------------
Eildert Groeneveld
2015-06-05 17:34:09 UTC
Christopher

thanks for the response.
You are indeed listing the specifics of this setup, backup of laptops in
a network. And all your points are well taken.

However, I would think that there are million of laptops floating around
that want to get integrated in standard company or home backup systems.
Thus, my question is also one about strategy. In our world (Linux) if
there is a proper problem, one can be sure that there is also an
matching solution.

Maybe, this solution is not fully automated, i.e. it may put some
constraints on the user (like do not switch off your system without
checking for backup running).

How do you people do this? Hopefully on the basis of Rsnapshot?

Tred
------------------------------------------------------------------------------
Tapani Tarvainen
2015-06-05 18:31:49 UTC
Post by Eildert Groeneveld
However, I would think that there are millions of laptops floating
around that need to be integrated into standard company or home
backup systems. Thus, my question is also one about strategy. In our
world (Linux), if there is a proper problem, one can be sure that
there is also a matching solution.
Of course.
Post by Eildert Groeneveld
Maybe this solution is not fully automated, i.e. it may put some
constraints on the user (like: do not switch off your system without
checking whether a backup is running).
How do you people do this? Hopefully on the basis of Rsnapshot?
I've been doing just that with rsnapshot for years, backing
up laptops of people who really can't be expected to pay
any attention to it, with machines that are on at irregular
times and can be turned off unexpectedly at any time.

I'm on the road now so can't show you my config, but
the key points are:

(1) Use the sync_first option and do the rotation only if the
sync run succeeds. That way it doesn't matter
if the target shuts off halfway; the next sync will just
continue where it left off (thus backups can actually
work even if the machine is never on long enough
continuously for a complete backup run).
(2) Schedule backups more often than really needed,
assuming most attempts will fail because the target is off,
and run the lowest level far more often than its keep
count implies (e.g., run hourly every hour but set the
keep level for hourlies at only two, so that
two hourlies are likely to succeed often enough).
(3) Use separate config files and crontab entries
for every machine (or a custom script to handle them
independently), so they won't interfere with each other
(at least one machine will usually be off anyway).
(4) Run the hourly-daily-weekly-monthly-yearly (or whatever)
cycles regardless of actual calendar times: e.g., do a
weekly after seven successful dailies &c.

That's pretty much it. It takes a bit of planning to get the
details right, but once done, it works like a charm.
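
To make (1) concrete: don't call rsnapshot directly from cron, but
go through a small wrapper that only rotates after a successful
sync. A minimal sketch (not my actual script, which I don't have
with me; paths and names are examples only):

#!/bin/sh
# /usr/local/sbin/backup-laptop1 -- example wrapper for one client.
# Assumes "sync_first 1" is set in the config file.
CONF=/etc/rsnapshot.d/laptop1.conf

# "rsnapshot sync" exits non-zero if the client is unreachable or
# the transfer is interrupted; then we simply skip the rotation and
# let the next cron run continue where this one left off.
if /usr/bin/rsnapshot -c "$CONF" sync
then
    # with sync_first set, "hourly" only rotates, it copies nothing
    /usr/bin/rsnapshot -c "$CONF" hourly
fi

# Example crontab entry: try every hour, failed attempts exit fast.
# 0 * * * *   root   /usr/local/sbin/backup-laptop1

Point (4) works the same way: keep a per-client counter of
successful runs in a small state file and trigger the next level up
once it reaches the required count (e.g., a weekly after seven
successful dailies), then reset the counter.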

Although, I've only done this with Linux machines
(including said laptops) - I've no idea what problems
Windows or Mac or other yecchy things could cause.
--
Tapani Tarvainen

------------------------------------------------------------------------------
Eildert Groeneveld
2015-06-05 19:09:13 UTC
Tapani

that sounds to me like a strategy worth documenting and sharing!

Maybe once you are back?


thanks!!

Eildert
--
Eildert Groeneveld
===================================================
Institute of Farm Animal Genetics (FLI)
Mariensee 31535 Neustadt Germany
Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
e-mail: ***@fli.bund.de
web: http://vce.tzv.fal.de
==================================================


------------------------------------------------------------------------------
Eildert Groeneveld
2015-06-08 17:08:02 UTC
Thanks to all of you! This is once again an impressive
example of open source and the support around it.

To me, Tapani's setup sounds very appealing, as it can be used as a
generalized setup for all rsnapshot backup situations:
the permanent connection/availability of clients is just a
special case of temporarily unavailable systems to be backed up.
Thus, looking at it from a strategic perspective (as indicated in the
OP), this has a lot of appeal to me.

@Tapani: maybe you could provide more info on your scripts, so they
could be turned into a more generally available and documented
procedure.

Eildert
--
Eildert Groeneveld
===================================================
Institute of Farm Animal Genetics (FLI)
Mariensee 31535 Neustadt Germany
Tel : (+49)(0)5034 871155 Fax : (+49)(0)5034 871143
e-mail: ***@fli.bund.de
web: http://vce.tzv.fal.de
==================================================


------------------------------------------------------------------------------
Helmut Hullen
2015-06-05 19:23:00 UTC
Hello, Eildert,
Post by Eildert Groeneveld
However, I would think that there are millions of laptops floating
around that need to be integrated into standard company or home
backup systems. Thus, my question is also one about strategy. In our
world (Linux), if there is a proper problem, one can be sure that
there is also a matching solution.
"rsnapshot" should run on that machine for which the backup is wanted,
and "rsnapshot" needs to run as "root" (on this machine).

Can you fulfill these two conditions for each laptop in your (company)
LAN?

Best regards!
Helmut


------------------------------------------------------------------------------
Patrick O'Callaghan
2015-06-06 11:23:48 UTC
Post by Helmut Hullen
"rsnapshot" should run on that machine for which the backup is wanted,
and "rsnapshot" needs to run as "root" (on this machine).
Rsnapshot should run on the server where the backup is being stored.
Running on the client machine is possible but extremely inefficient
(on the order of 10 times slower in my experience) because the
process of detecting unchanged files means they have to be copied
across the network to be analyzed, even if they are then discarded.

poc

------------------------------------------------------------------------------
Helmut Hullen
2015-06-06 13:07:00 UTC
Hello, Patrick,
Post by Patrick O'Callaghan
Post by Helmut Hullen
"rsnapshot" should run on that machine for which the backup is
wanted, and "rsnapshot" needs to run as "root" (on this machine).
Rsnapshot should run on the server where the backup is being stored.
Aehemmm ... then you may run into problems, because it's difficult
to read the client's complete hdd with its root rights.

Best regards!
Helmut


------------------------------------------------------------------------------
Tapani Tarvainen
2015-06-06 14:18:51 UTC
Post by Helmut Hullen
Post by Patrick O'Callaghan
Rsnapshot should run on the server where the backup is being stored.
Aehemmm ... then you may run into problems, because it's difficult
to read the client's complete hdd with its root rights.
Are we having a language problem here?

Normally rsnapshot runs on the server but accesses the clients
with root privileges over ssh, so it can read everything.
I rather suspect that's what you both are trying to say.

(Sorry if I misunderstood.)
--
Tapani Tarvainen

------------------------------------------------------------------------------
Patrick O'Callaghan
2015-06-07 00:07:00 UTC
Post by Helmut Hullen
Hello, Patrick,
Post by Patrick O'Callaghan
Post by Helmut Hullen
"rsnapshot" should run on that machine for which the backup is
wanted, and "rsnapshot" needs to run as "root" (on this machine).
Rsnapshot should run on the server where the backup is being stored.
Aehemmm ... then you may run into problems, because it's difficult
to read the client's complete hdd with its root rights.
Clearly the server has to connect with a root-privileged rsync process
on the client. This can be done either directly (by running an rsync
daemon) or via an ssh tunnel to a root login. The latter is preferable
in my opinion because you can set it up to use shared keys rather than
an inline root password. If you're paranoid you can also configure the
sshd on the client side to only allow execution of rsync and no other
remote command.
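
For example, a forced command in the client's authorized_keys can
pin the key to rsync and nothing else. A sketch of the usual pattern
(the helper script name and key are invented; rsync also ships a
more robust helper, rrsync, in its support directory):

# On the client, in /root/.ssh/authorized_keys (all on one line):
command="/usr/local/sbin/rsync-only",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-ed25519 AAAA... backup@server

# /usr/local/sbin/rsync-only:
#!/bin/sh
# Allow only rsync's server-side command over this key.
case "$SSH_ORIGINAL_COMMAND" in
    "rsync --server"*)
        exec $SSH_ORIGINAL_COMMAND   # unquoted on purpose: split args
        ;;
    *)
        echo "only rsync is allowed over this key" >&2
        exit 1
        ;;
esac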

I suspect the rsnapshot authors simply assumed that it would be used
this way, since the documentation doesn't really make it very
explicit. I actually set it up the other way (running rsnapshot from
the client to an NFS-shared backup server) and everything worked for a
long time. It was only when I looked into what was really happening
that I realized that I had misunderstood this. When I changed it to
work the other way (with no NFS sharing for this purpose), backup
times dropped from over an hour to around 5 minutes.

poc

------------------------------------------------------------------------------
Nico Kadel-Garcia
2015-06-07 04:16:21 UTC
On Sat, Jun 6, 2015 at 8:07 PM, Patrick O'Callaghan
Post by Patrick O'Callaghan
[...]
Clearly the server has to connect with a root-privileged rsync process
on the client. This can be done either directly (by running an rsync
Being "root privileged" depends on what you're backing up. I've
certainly used rsnapshot to back up personal user content or
databases, that required no root privileges. Access to the content of
the target host depends on privilege, certainly. But it's often
eassier to use personal privileges, or a very limited and lightly
privileged account, to grab basic config files and/or databases and
not risk the entire server if the backup system is compromised.
------------------------------------------------------------------------------
Patrick O'Callaghan
2015-06-07 12:46:28 UTC
Post by Sam Pinkus
I have a feeling this has been covered before, but what's the working
hypothesis for the x10 decrease in speed mentioned (assuming this is
NFSv4)? I tested with NFSv4 and got roughly the same on the initial
sync. With no files changed NFS is slower, but it was still under 1
second for 1000 files. Using SSHFS in my test load I got x7 slower
with synchronous R/W. With no changes it's roughly the same as
rsnapshot.
The speed difference comes from direct measurement of the two
versions on my setup ("client" and "server" refer to the functional
components of the backup system, i.e. the machine being backed up and
the machine where the backup set is stored, not the
communications-level processes, which may go in the other direction):

Version 1: rsnapshot running on the client, with the backup set being
accessed via NFS mount(s) from the server. There is only one rsync
process, running on the client. From the point of view of this
process, it's comparing two local files at a time. Recall how rsync
works: candidate files are compared by various criteria including
block-by-block to minimize the amount of physical copying. However in
this scenario the blocks are being compared *on the client machine*,
i.e. each file is transferred via NFS protocols to the client,
comparison is run, then any differing blocks are written back to the
new backup copy. Clearly in this case rsync is not reducing the amount
of traffic relative to a direct remote copy (in fact it's often moving
more data across the network in this case).

Version 2: rsnapshot runs on the server, connecting via an SSH tunnel
to an rsync process on the client. Each end of the connection
generates its own block checksums *locally* and only differing blocks
are copied across. This is how rsync is designed to work, and it does
it very efficiently.
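
In rsync terms the two versions boil down to something like this
(hostnames and paths invented; rsnapshot adds more options, but the
shape is the same):

# Version 1: rsnapshot on the client, backup set NFS-mounted from
# the server. Both paths look local to rsync, so every read and
# write of the backup copy travels over NFS.
rsync -a --delete /home/ /mnt/server/snapshots/laptop1/home/

# Version 2: rsnapshot on the server, reaching the client over ssh.
# One rsync runs at each end, checksums are computed locally on
# each side, and only changed data crosses the wire.
rsync -a --delete -e ssh root@laptop1:/home/ /snapshots/laptop1/home/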

poc

------------------------------------------------------------------------------
Christopher Barry
2015-06-07 14:42:47 UTC
On Sun, 7 Jun 2015 13:46:28 +0100
Post by Patrick O'Callaghan
[...]

The way I see the problem is that the variability of client access
poses issues with the basic way rsnapshot wants to operate.

One method, which Tapani put forward, keeps the solution entirely in
the domain of rsnapshot by creatively manipulating timing and
basically 'over-subscribing', understanding and accepting that there
will be a lot of failures. This keeps all configuration centralized
on the backup server, and is thus easier for a sysadmin to manage.
With the only tool at your disposal being rsnapshot itself, this
seems like a very good solution.

Another method I can think of expands the toolbox somewhat, using
bash and ssh. This method has the benefit of running backups based
on client availability, rather than the server continuously trying
to back up ghosts and erroring out, i.e. it's more efficient and
less noisy.

Here's how it would work:

Each client would have a script executed via the @reboot cron
directive. This script would test for the availability of the backup
server by pinging its local LAN address. Not available? Exit (or
keep testing for some period of time before exiting, to allow the
network to come up). Available? scp a flag file, named for the
hostname of the client, to a specific directory on the server. This
file could be empty or it may contain data like the IP address of
the client. All of the data about the file (name, IP) are variables;
the exact same script is on all clients. The client could also have
an icon on their desktop to manually run the script if they want to
force a backup.

Once a minute, the server would scan this directory for files using
a simple bash wrapper around rsnapshot. If a file is found, the
server would read and then delete the file(s), and initiate a backup
of the client(s), using the data present in the file(s) and
filename(s) to generate the unique portions of the rsnapshot.conf
file(s) on the fly.
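
In sketch form, with both halves together (every hostname, path and
account here is invented, and the per-host config is pre-generated
rather than built on the fly, to keep it short):

#!/bin/sh
# CLIENT: /usr/local/bin/request-backup, run via "@reboot" from cron.
SERVER=backup.lan                      # backup server's LAN address

# wait up to ~5 minutes for the network to come up
for i in $(seq 1 30); do
    ping -c 1 -W 2 "$SERVER" >/dev/null 2>&1 && break
    sleep 10
done
ping -c 1 -W 2 "$SERVER" >/dev/null 2>&1 || exit 0  # still unreachable

# drop a flag file named for this host on the server
hostname > "/tmp/$(hostname).flag"
scp -q "/tmp/$(hostname).flag" flags@"$SERVER":/srv/backup-flags/

#!/bin/sh
# SERVER: scan for flags once a minute from cron. The loop is
# sequential, so clients are backed up one at a time.
for f in /srv/backup-flags/*.flag; do
    [ -e "$f" ] || continue            # empty glob: nothing to do
    host=$(basename "$f" .flag)
    rm -f "$f"
    rsnapshot -c "/etc/rsnapshot.d/$host.conf" sync &&
        rsnapshot -c "/etc/rsnapshot.d/$host.conf" hourly
done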

This can be made completely self-administering if desired. In other
words, a new client that has never been backed up can just work,
because it has a unique hostname, the client script is present on
the host, its key is on the server, and the server's key is on it.
All of this is easily automated as well.

Then I would use apache to give each user secure remote access to
their own data, to self-serve restores as required. Again, with an
icon already set up on their desktops.

Anyway, that's how I would approach this issue.

--
Regards,
Christopher Barry

Random geeky fortune:
Slous' Contention:
If you do a job too well, you'll get stuck with it.

------------------------------------------------------------------------------
Nico Kadel-Garcia
2015-06-07 15:36:02 UTC
On Sun, Jun 7, 2015 at 10:42 AM, Christopher Barry
Post by Christopher Barry
Another method I can think of expands the toolbox somewhat using bash
and ssh. This method has the benefit of running backups based on client
availability, rather than the server continuously trying to backup
ghosts and erroring out. e.g. it's more efficient and less noisy.
Each client would have a script executed via the @reboot cron
directive. This script would test for the availability of the backup
server by pinging its local LAN address. Not available? Exit (or
keep testing for some period of time before exiting, to allow the
network to come up). Available? scp a flag file, named for the
hostname of the client, to a specific directory on the server. This
file could be empty or it may contain data like the IP address of
the client. All of the data about the file (name, IP) are variables;
the exact same script is on all clients. The client could also have
an icon on their desktop to manually run the script if they want to
force a backup.
This does not scale well. And many laptops go into "suspend" mode
rather than being rebooted when opened up, so relying on boot-time
operations to trigger a backup is awkward, and it will make the
system lag like crazy while at its most urgent "I need my laptop
running now!!!" stage.

Giving clients access to scp information to the rsnapshot server is a
security problem waiting to bite you, hard. Unless you're very
careful, that scp access can be used to overwhelm various partitions
on your rsnapshot server. And it's not necessary.

It's fairly reasonable to set up cron tasks on the server to look
for the availability of laptops, and to review some additional
status, such as the age of the youngest successful snapshot for each
client, to generate an audit of rsnapshot targets on the rsnapshot
server itself.
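
For example, a daily cron job along these lines (paths and the
threshold are examples; it assumes one snapshot root per client
under /srv/snapshots/<host> and that daily.0's mtime tracks the
last successful rotation):

#!/bin/sh
# warn about clients whose newest daily snapshot is over 3 days old
report=""
for dir in /srv/snapshots/*; do
    host=$(basename "$dir")
    if [ -z "$(find "$dir" -maxdepth 1 -name daily.0 -mtime -3)" ]; then
        report="$report$host: no daily snapshot in the last 3 days
"
    fi
done
# only send mail when there is something to report
[ -n "$report" ] && printf '%s' "$report" | mail -s "rsnapshot audit" root
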
Post by Christopher Barry
Once a minute, the server would scan this directory for files using a
simple bash wrapper to rsnapshot. If a file is found, the server would
read then delete the file(s), and initiate a backup to the client(s),
using the data present in the file(s) and filename(s) to uniquely
generate the unique portions of the rsnapshot.conf file(s) on the fly.
That requires multiple hosts writing to the same directory on the
same host. No, no, no, no, no, never do this!!! It's re-inventing
file-based semaphores across multiple hosts, and it's not needed.
Post by Christopher Barry
This can be made completely self administering if desired. In other
words, a new client that has never been backed up can just work because
it has a unique hostname, the client script is present on the host,
and it's key is on the server, and the server's key is on it. All
things easily automated as well.
Allowing the clients to write these configuration flags to the server
is begging to allow one client to overwrite another system's config
file and start mixing and matching backups. This is just *begging* for
pain.
Post by Christopher Barry
Then, I would use apache to allow remote access to each user's data
securely to self-serve restores as required. Again, with an icon
already setup on their desktops.
Anyway, that's how I would approach this issue.
Ouch. No, just no. Too many systems, too much cross-communication,
and no record of the changes.

If you really need to allow clients to configure themselves, you
might consider using a git or subversion repo to store formatted
configurations published by the clients, allowing the rsnapshot
server to read them. That gets you away from running and securing
extra in-house hand-built services. But it still presents serious
security and configuration concerns for a critical system.

------------------------------------------------------------------------------
Patrick O'Callaghan
2015-06-07 16:44:18 UTC
Post by Nico Kadel-Garcia
Ouch. No, Just no. Too many sytems, too much cross-communication, and
no record of the changes.
I agree. You definitely want to keep as much of the complexity as
possible on the server and not the clients. The clients need only
have:
* An rsync binary installed
* An sshd binary installed
* SSH tokens shared with the server (at the privilege level required)

The server needs:
* rsnapshot (obviously)
* SSH tokens set up for each client
* rsnapshot config files for each client (small per-client files
included by a global master file).
* A queueing system to handle multiple clients, each of which can have
multiple failures before succeeding.

The last part is probably the most complex. Part of it is handled by
cron, but there would also need to be more policy embedded in the
scripts that cron calls (i.e. it won't call rsnapshot directly).
This could get as complicated as you like, e.g. should multiple
clients be backed up in parallel? Should there be a warning after a
certain number of failures? Does the warning go to the user or to
the admin? And so on. These are decisions that only the admin can
take. What is clear is that if users have the freedom to turn their
machines on and off at arbitrary times, they need to be told that
backups will only be on a "best effort" basis.
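
As a starting point, the cron-called driver can be a plain
sequential loop that records results and mails a daily summary; the
loop itself is a trivial queue, since it never backs up two clients
in parallel. A sketch (the file layout and addresses are invented):

#!/bin/sh
# /usr/local/sbin/backup-all -- called hourly from cron
LOG=/var/log/rsnapshot-summary.log

for conf in /etc/rsnapshot.d/*.conf; do
    host=$(basename "$conf" .conf)
    if rsnapshot -c "$conf" sync >/dev/null 2>&1; then
        rsnapshot -c "$conf" hourly >>"$LOG" 2>&1
        echo "$(date '+%F %T') $host: OK" >>"$LOG"
    else
        echo "$(date '+%F %T') $host: unreachable or failed" >>"$LOG"
    fi
done

# A second cron entry mails the summary once a day and starts afresh:
# 55 23 * * * root mail -s "backup summary" admin < /var/log/rsnapshot-summary.log && : > /var/log/rsnapshot-summary.log

That would also cover the OP's requirement of a daily mail saying
how the backups went.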

poc

------------------------------------------------------------------------------
Tapani Tarvainen
2015-06-07 16:07:34 UTC
Post by Christopher Barry
The way I see the problem, is that the variability of client access
poses issues with the basic way rsnapshot wants to operate.
One method that Tapani put forward keeps the solution entirely in the
domain of rsnapshot by creatively manipulating timing and basically
'over-subscribing', understanding and accepting that there will be a lot
of failures.
Yes. Note though that a "failure" here is really just a test of
client presence, even though it may sound uglier.

I might add that I did consider and even experiment with various
means of checking client availability, having the client notify
the server it's ready &c, but eventually decided there's no real
benefit over simply letting rsnapshot do it all.
Post by Christopher Barry
This keeps all configuration centralized on the backup server, and
thus is easier to manage for a sysadmin. With the only tool at your
disposal being rsnapshot itself, this seems like it is a very good
solution.
Thank you.
Post by Christopher Barry
Another method I can think of expands the toolbox somewhat using bash
and ssh. This method has the benefit of running backups based on client
availability, rather than the server continuously trying to backup
ghosts and erroring out. e.g. it's more efficient and less noisy.
I don't think you can improve efficiency much. Testing for ssh
success is pretty fast and efficient, and that's all rsnapshot does
before giving up.

There's one area where improvement is possible: if the clients
are connected to the server's network so rarely and for so short
periods of time that backup rarely or never completes, it would
help to have the client initiate the backup. But I've never found
that to be a problem: adjusting the frequency the server tries
to do the backup has always been sufficient (and due to the
way rsnapshot works, an extra backup with nothing new to copy
takes very little time).

As for being noisy, that depends on how you handle error
messages. It should not be too hard to write a custom
script to report successes and failures in as much
detail as you like.

Without going into details, your solution feels seriously
over-engineered, difficult to manage and error-prone for
very little practical gain.
--
Tapani Tarvainen

------------------------------------------------------------------------------
Christopher Barry
2015-06-08 09:17:49 UTC
Post by Tapani Tarvainen
Without going into details, your solution feels seriously
over-engineered, difficult to manage and error-prone for
very little practical gain.
details would be nice, since 'feelings' are a tad unscientific.

but to rebut your heartfelt belief-system briefly:

* it's not 'over-engineered', it's 'under-understood'

* it's not 'difficult to manage', it's 'self-managing'

* error-prone? irony. error is the typical result of your method.


good luck,
-C


------------------------------------------------------------------------------
Tapani Tarvainen
2015-06-08 09:37:32 UTC
Post by Christopher Barry
Post by Tapani Tarvainen
Without going into details, your solution feels seriously
over-engineered, difficult to manage and error-prone for
very little practical gain.
details would be nice, since 'feelings' are a tad unscientific.
* it's not 'over-engineered', it's 'under-understood'
* it's not 'difficult to manage', it's 'self-managing'
* error-prone? irony. error is the typical result of your method.
We seem to understand the word "error" differently.

But the tone of your message leaves me with no desire to continue the
discussion, so I will offer my apologies for my unscientific message
and leave it at that.

Good luck with whatever method you end up using.
--
Tapani Tarvainen

------------------------------------------------------------------------------
Patrick O'Callaghan
2015-06-07 17:19:24 UTC
Post by Patrick O'Callaghan
Version 1: rsnapshot running on the client, with the backup set being
accessed via NFS mount(s) from the server. There is only one rsync
process, running on the client. From the point of view of this
process, it's comparing two local files at a time. Recall how rsync
works: candidate files are compared by various criteria including
block-by-block to minimize the amount of physical copying. However in
this scenario the blocks are being compared *on the client machine*
Post by Sam Pinkus
Aha, of course. Rsync does have a `--whole-file` option which turns
off delta-transfer. That would help here, but it's still going to be
slower.
Because of system-level buffering of the NFS file reads, it probably
wouldn't make any difference at all. In my case network latency
dominated everything (this was on a Gigabit LAN with a fast switch
and a fast client), though file-server read throughput would also
have an impact. That's why minimizing the real amount of network
data transfer is so important.

poc

------------------------------------------------------------------------------
Helmut Hullen
2015-06-08 18:41:00 UTC
Hello, Eildert,
Post by Eildert Groeneveld
To me, Tapani's setup sounds very appealing, as it can be used as a
generalized setup for all rsnapshot backup situations:
the permanent connection/availability of clients is just a
special case of temporarily unavailable systems to be backed up.
Thus, looking at it from a strategic perspective (as indicated in the
OP), this has a lot of appeal to me.
By the way: I use "rsnapshot" to back up some Windows-XP notebook
clients whose partitions are mounted via cifs. Sometimes the backup
is incomplete, and that damages the hardlink chain.

Therefore, about every 3 months, I run "hardlink" on the server that
stores all these backups, to re-build the hardlink chain.

Principle:

cd /Path/to/backups
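# re-link identical files across the old snapshots; the patterns
# match monthly.10-19, monthly.0-9 and weekly.0-9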
hardlink monthly.1? monthly.? weekly.?

That job should run at a time when no weekly or monthly backup is
being renamed.

My hardlink program is part of

<http://arktur.shuttle.de/CD/beta/slack/ap1/hardlink-1.2-i486-1hln.tgz>

Best regards!
Helmut


------------------------------------------------------------------------------