Discussion:
[rsnapshot-discuss] Source directory doesn't exist - should this be fatal?
David Cantrell
2015-07-13 14:25:19 UTC
I had a disk die a few days ago, and rsnapshot is saying ...
----------------------------------------------------------------------------
/usr/local/bin/rsnapshot daily
----------------------------------------------------------------------------
ERROR: backup /Volumes/Vault/ Vault/ - Source directory "/Volumes/Vault/" \
doesn't exist
ERROR: ---------------------------------------------------------------------
ERROR: Errors were found in /usr/local/etc/rsnapshot.conf,
ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.
This prevents all my other targets from being backed up.

I think this should emit a warning, but be non-fatal. Anyone disagree?
--
David Cantrell | top google result for "internet beard fetish club"

Longum iter est per praecepta, breve et efficax per exempla.
Nico Kadel-Garcia
2015-07-15 00:24:13 UTC
I think you need to split up your rsnapshot tasks, one config for each backed up host or major service. That will help avoid just this sort of problem.
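
For illustration only (the hostname, paths, and schedule below are made up, and rsnapshot wants literal tabs between fields), a per-host config can be as small as:

# hypothetical /usr/local/etc/rsnapshot.d/vaulthost.conf
# fields must be separated by tabs, not spaces
config_version	1.2
cmd_rsync	/usr/bin/rsync
cmd_ssh	/usr/bin/ssh
snapshot_root	/backups/vaulthost/
retain	daily	7
retain	weekly	4
backup	root@vaulthost:/Volumes/Vault/	Vault/

A cron job then runs rsnapshot against just that file, so a dead disk on one host only breaks that host's run.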

Nico Kadel-Garcia
Email: ***@gmail.com
Sent from iPhone
Post by David Cantrell
I had a disk die a few days ago, and rsnapshot is saying ...
----------------------------------------------------------------------------
/usr/local/bin/rsnapshot daily
----------------------------------------------------------------------------
ERROR: backup /Volumes/Vault/ Vault/ - Source directory "/Volumes/Vault/" \
doesn't exist
ERROR: ---------------------------------------------------------------------
ERROR: Errors were found in /usr/local/etc/rsnapshot.conf,
ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.
This prevents all my other targets from being backed up.
I think this should emit a warning, but be non-fatal. Anyone disagree?
--
David Cantrell | top google result for "internet beard fetish club"
Longum iter est per praecepta, breve et efficax per exempla.
Ken Woods
2015-07-15 01:08:43 UTC
I get that we're a unique case, but your logic doesn't scale. For
instance, our processing farm has 3600-ish volumes spread across
600-ish hosts.

kw
Post by Nico Kadel-Garcia
I think you need to split up your rsnapshot tasks, one config for each backed up host or major service. That will help avoid just this sort of problem.
Nico Kadel-Garcia
Sent from iPhone
Post by David Cantrell
I had a disk die a few days ago, and rsnapshot is saying ...
----------------------------------------------------------------------------
/usr/local/bin/rsnapshot daily
----------------------------------------------------------------------------
ERROR: backup /Volumes/Vault/ Vault/ - Source directory "/Volumes/Vault/" \
doesn't exist
ERROR: ---------------------------------------------------------------------
ERROR: Errors were found in /usr/local/etc/rsnapshot.conf,
ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.
This prevents all my other targets from being backed up.
I think this should emit a warning, but be non-fatal. Anyone disagree?
--
David Cantrell | top google result for "internet beard fetish club"
Longum iter est per praecepta, breve et efficax per exempla.
Gordon Messmer
2015-07-15 01:34:43 UTC
Post by Ken Woods
I get that we're a unique case, but your logic doesn't scale. For
instance, our processing farm has 3600-ish volumes spread across
600-ish hosts.
I don't see why that would scale less well in a single configuration
file rather than individual files per host (or per volume).

I used to manage 100+ hosts with rsnapshot. Each host had its own
configuration file, and the exit status of rsnapshot was fed to Nagios,
so that our backup status was monitored. In that system, it was
important that the failure of a single volume was fatal, not a warning.
Gordon Messmer
2015-07-15 16:43:11 UTC
Post by Ken Woods
Post by Gordon Messmer
I don't see why that would scale less well in a single configuration
file rather than individual files per host (or per volume).
Math. It's hard, yeah?
First, admittedly, I totally garbled my reply. I apologize for that.

However, I don't really see the point that you're trying to make. David
suggested that a failure in one backup line should be treated as a
warning rather than an overall failure. Nico's reply suggested that he
separate different hosts or different services into separate
configuration files so that a failure would affect only the backup of
that host or service.

That's a perfectly reasonable suggestion, and I don't see how it creates
scalability concerns.

Especially when you have a very large number of hosts or directories to
back up, separating them into their own configuration files becomes more
desirable. Each host (or however you decide to partition your
configurations, but I'll use host for example) can be generated from a
template, as Nico suggested. That makes maintenance much easier. When
you're dealing with hundreds or thousands of entries, you don't want to
manage the configuration by hand. Adding or removing items is much more
reliable when you script your maintenance tasks.

With individual files, you'll probably use a short script similar to
"run-parts" to run rsnapshot with the interval specified for each of the
configuration files present. rsnapshot processes each of the "backup"
entries in the configuration files in series. If you break up your
configuration files and process them in a loop, that remains true.
However, you can choose to add logic to the loop to run several
rsnapshot instances in parallel if your backup disk is fast enough to
not be the bottleneck in your backup system. In that case, individual
files scale better than a single file.
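
As a rough sketch of such a driver (the directory name and default level are invented, not part of rsnapshot), running each config in sequence and remembering whether anything failed:

#!/bin/sh
# Hypothetical run-parts-style wrapper: one rsnapshot run per config file.
CONFDIR=/usr/local/etc/rsnapshot.d   # made-up location for per-host configs
LEVEL=${1:-daily}
FAILED=0

for conf in "$CONFDIR"/*.conf; do
    # Sequential by default; background a batch of these and 'wait' if the
    # snapshot disk is fast enough to handle parallel runs.
    /usr/local/bin/rsnapshot -c "$conf" "$LEVEL" || FAILED=1
done

exit $FAILED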

Finally, the exit status of rsnapshot is the only really reliable means
it has of indicating whether the backup process was successful or not.
You could, as I did, use the exit status and the name of the
configuration file run to feed a monitoring system so that alerts can be
generated when backups fail, and provide a dashboard for monitoring a
large set of backups. If the failure of an individual "backup" line is
demoted to a warning, then the ability to reliably communicate failure
is lost. rsnapshot only gets one exit code, and it should definitely be
used to indicate the most severe error that it encountered during a
backup run.
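
A minimal sketch of that reporting idea (the wrapper, config path, and use of syslog here are invented for illustration, not the actual Nagios plumbing; rsnapshot documents exit codes 0 for success, 1 for a fatal error, and 2 for completion with warnings):

#!/bin/sh
# Hypothetical per-config wrapper that records the outcome for monitoring.
CONF=/usr/local/etc/rsnapshot.d/vaulthost.conf   # made-up example path
LEVEL=daily

/usr/local/bin/rsnapshot -c "$CONF" "$LEVEL"
STATUS=$?

# Map rsnapshot's documented exit codes onto monitoring states.
case $STATUS in
    0) STATE=OK ;;
    2) STATE=WARNING ;;
    *) STATE=CRITICAL ;;
esac

logger -t rsnapshot-monitor "$(basename "$CONF" .conf) $LEVEL: $STATE (exit $STATUS)"
exit $STATUS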

So, for reliability and scalability, individual files really look like
best practice to me. If you think I'm wrong, I'm open to criticism.
David Cantrell
2015-07-15 17:20:54 UTC
Post by Gordon Messmer
So, for reliability and scalability, individual files really look like
best practice to me. If you think I'm wrong, I'm open to criticism.
I agree with you. However, I only have half a dozen things to back up
and so splitting it all out is overkill. At this sort of scale the
correct solution is to check my mail and see if rsnapshot whined at me.

If we do make this fatal error into a warning, then a config like yours
which has every backup point being handled by a separate rsnapshot
process will still work as you expect - eventually rsync will say "WTF,
that doesn't exist", rsnapshot will notice, and exit with a
non-zero status.
--
David Cantrell | Godless Liberal Elitist

The Law of Daves: in any gathering of technical people, the
number of Daves will be greater than the number of women.
Nico Kadel-Garcia
2015-07-15 12:50:20 UTC
Post by Ken Woods
I get that we're a unique case, but your logic doesn't scale. For
instance, our processing farm has 3600-ish volumes spread across
600-ish hosts.
kw
It actually scales pretty well. A "Makefile" generates classes of
.conf files, all using a common template, and generates both the
rsnapshot tasks and the cron jobs as needed. Time skew among the cron
jobs can be automatically based on IP address or a hash of the
hostname, to avoid doing all 600 hosts or 3600 tasks at once.
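
The Makefile itself isn't shown here, but a shell sketch of the same idea (the template placeholder, host list, and cron paths are all invented for illustration) would be:

#!/bin/sh
# Hypothetical generator: one config per host from a template, plus a cron
# entry whose minute is derived from a hash of the hostname.
TEMPLATE=/usr/local/etc/rsnapshot.template    # contains a @HOST@ placeholder
CONFDIR=/usr/local/etc/rsnapshot.d
CRONFILE=/etc/cron.d/rsnapshot-hosts

: > "$CRONFILE"
for host in $(cat /usr/local/etc/backup-hosts.txt); do
    sed "s/@HOST@/$host/g" "$TEMPLATE" > "$CONFDIR/$host.conf"

    # 0-59 minute offset from a hash of the hostname, so the 600 hosts
    # don't all start at the same moment
    minute=$(( $(printf '%s' "$host" | cksum | cut -d' ' -f1) % 60 ))
    echo "$minute 1 * * * root /usr/local/bin/rsnapshot -c $CONFDIR/$host.conf daily" >> "$CRONFILE"
done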

I've done it up to several hundred nodes, when I was building Beowulf
clusters. Heck, it even allowed me to tune and update an "/etc/exports"
file to make the backups for individual hosts remotely accessible to
those hosts, exported as "read-only" directories with the snapshot
root containing only that host's rsnapshot content.
Ken Woods
2015-07-15 17:24:54 UTC
I keep forgetting that you don't deal with very large data sets.

Many (most) of our hosts can have daily changes larger than can be written to backup storage within that time period, especially while processing.

Whatever. My point is this: Having a fatal error on a failed drive within a volume would be a pain in the ass for us.
Post by Nico Kadel-Garcia
Post by Ken Woods
I get that we're a unique case, but your logic doesn't scale. For
instance, our processing farm has 3600-ish volumes spread across
600-ish hosts.
kw
It actually scales pretty well. A "Makefile" generates classes of
.conf files, all using a common template, and generates both the
rsnapshot tasks and the cron jobs as needed. Time skew among the cron
jobs can be automatically based on IP address or a hash of the
hostname, to avoid doing all 600 hosts or 3600 tasks at once.
I've done it up to several hundred nodes, when I was building Beowulf
clusters. Heck, it even allowed me to tune and update an "/etc/exports"
file to make the backups for individual hosts remotely accessible to
those hosts, exported as "read-only" directories with the snapshot
root containing only that host's rsnapshot content.
David Cantrell
2015-07-15 12:23:09 UTC
Post by Nico Kadel-Garcia
I think you need to split up your rsnapshot tasks, one config for each backed up host or major service. That will help avoid just this sort of problem.
So rsnapshot's ability to back up multiple sources is pointless?

If I've configured it to back up a bunch of remote hosts and one of them
is unavailable, the rest get backed up anyway. I don't see why we should
treat backups of local filesystems differently.
Post by Nico Kadel-Garcia
Post by David Cantrell
I had a disk die a few days ago, and rsnapshot is saying ...
----------------------------------------------------------------------------
/usr/local/bin/rsnapshot daily
----------------------------------------------------------------------------
ERROR: backup /Volumes/Vault/ Vault/ - Source directory "/Volumes/Vault/" \
doesn't exist
ERROR: ---------------------------------------------------------------------
ERROR: Errors were found in /usr/local/etc/rsnapshot.conf,
ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.
This prevents all my other targets from being backed up.
I think this should emit a warning, but be non-fatal. Anyone disagree?
--
David Cantrell | Enforcer, South London Linguistic Massive

Sobol's Law of Telecom Utilities:
Telcos are malicious; cablecos are simply clueless.
Benedikt Heine
2015-07-15 02:42:15 UTC
Hi,
Post by David Cantrell
Anyone disagree?
Yes!
Post by David Cantrell
I think this should emit a warning, but be non-fatal.
So what do you do when a directory is temporarily not available? You'll
lose hardlinking.

For example, you back up an NFS mount 3 times:

run 1. Initial rsnapshot backup, no problem -> snapshot 0 contains
the initial backup
run 2. NFS is not available right now -> snapshot 0 is empty,
snapshot 1 contains the initial backup
run 3. NFS is available again, and _nothing_ changed -> snapshot 0 is an
exact copy of the initial backup but not hardlinked, snapshot 1 is empty,
and snapshot 2 is the initial backup.

You now use twice the space.

Rsnapshot's main feature is the use of hardlinks. We can't fulfil the
hardlinking while ignoring missing backup sources.
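
To make the mechanism concrete, here is a rough sketch of the hard-link chain (the paths and file name are invented; rsnapshot itself does this with cp -al or rsync's --link-dest depending on the link_dest option):

#!/bin/sh
# Illustration only, not rsnapshot's actual code.
SRC=/Volumes/Vault/    # made-up backup source
SNAP=/backups          # made-up snapshot root

# run 1: initial full copy into daily.0
mkdir -p "$SNAP/daily.0"
rsync -a "$SRC" "$SNAP/daily.0/Vault/"

# later run: rotate, then build the new daily.0 against the previous one;
# unchanged files become hard links instead of fresh copies
mv "$SNAP/daily.0" "$SNAP/daily.1"
mkdir -p "$SNAP/daily.0"
rsync -a --link-dest="$SNAP/daily.1/Vault" "$SRC" "$SNAP/daily.0/Vault/"

# unchanged files now share an inode (link count 2)
ls -li "$SNAP"/daily.[01]/Vault/some-file

If the previous snapshot is empty because the source was skipped, --link-dest finds nothing to link against and every file gets stored again - which is exactly the doubled space in the example above.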

Sincerely,
Benedikt Heine
David Cantrell
2015-07-15 12:41:03 UTC
Post by Benedikt Heine
Post by David Cantrell
I think this should emit a warning, but be non-fatal.
So what do you do when a directory is temporarily not available? You'll
lose hardlinking.
You won't. Snapshots will be rotated, copied, and linked as normal.
Remember, it's already non-fatal for remote backups.

Right now I've got the filesystem that was on the dodgy disk commented
out in my rsnapshot.conf, and backups of it are rotating just fine.

snapshots $ ls -li {daily,monthly}.[01]/BadDisk/some-file
1079771098 -rw-r--r-- 17 david staff 5 Feb 1 18:01 daily.0/BadDisk/some-file
1079771098 -rw-r--r-- 17 david staff 5 Feb 1 18:01 daily.1/BadDisk/some-file
1079771098 -rw-r--r-- 17 david staff 5 Feb 1 18:01 monthly.0/BadDisk/some-file
1079771098 -rw-r--r-- 17 david staff 5 Feb 1 18:01 monthly.1/BadDisk/some-file

My most recent daily backup finished a few hours ago. The filesystem
that was on a bad disk is still offline while I restore it. You can see
that hard links were correctly created at last night's rotation, and the
night before.
--
David Cantrell | Official London Perl Mongers Bad Influence
Christopher Barry
2015-07-15 13:22:48 UTC
On Wed, 15 Jul 2015 13:41:03 +0100
Post by David Cantrell
Post by Benedikt Heine
Post by David Cantrell
I think this should emit a warning, but be non-fatal.
So what do you do when a directory is temporarily not available?
You'll lose hardlinking.
You won't. Snapshots will be rotated, copied, and linked as normal.
Remember, it's already non-fatal for remote backups.
Right now I've got the filesystem that was on the dodgy disk commented
out in my rsnapshot.conf, and backups of it are rotating just fine.
snip...
Post by David Cantrell
My most recent daily backup finished a few hours ago. The filesystem
that was on a bad disk is still offline while I restore it. You can see
that hard links were correctly created at last night's rotation, and
the night before.
Just hypothesizing here... My initial reaction to your OP was, right,
should be non-fatal. But what if your rsync options delete stuff in the
backup no longer present in the original volume being backed up?

Wouldn't that create the empty destination that would then need to be
completely repopulated once BadDisk was fixed?

First, do no harm. Failing in this context seems like the least bad
thing to do overall.

But I might be missing something...
-C
David Cantrell
2015-07-15 13:54:59 UTC
Post by Christopher Barry
Just hypothesizing here... My initial reaction to your OP was, right,
should be non-fatal. But what if your rsync options delete stuff in the
backup no longer present in the original volume being backed up?
Wouldn't that create the empty destination that would then need to be
completely repopulated once BadDisk was fixed?
No. If you tell rsync to use a non-existent local directory as its
source rsync just aborts like this:

rsync: link_stat "/home/david/foo" failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(789)

and the contents of the target are left unchanged. If the target doesn't
exist it is not created.
--
David Cantrell | Pope | First Church of the Symmetrical Internet

PERL: Politely Expressed Racoon Love
Christopher Barry
2015-07-15 16:36:42 UTC
On Wed, 15 Jul 2015 14:54:59 +0100
Post by David Cantrell
Post by Christopher Barry
Just hypothesizing here... My initial reaction to your OP was, right,
should be non-fatal. But what if your rsync options delete stuff in
the backup no longer present in the original volume being backed up?
Wouldn't that create the empty destination that would then need to be
completely repopulated once BadDisk was fixed?
No. If you tell rsync to use a non-existent local directory as its
source rsync just aborts like this:
rsync: link_stat "/home/david/foo" failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(789)
and the contents of the target are left unchanged. If the target
doesn't exist it is not created.
OK, 'non-existent' is the key there, and maybe that's what all this
has been about from the beginning - can't look up the OP at the
moment... If that's always been your specific issue, then please ignore
the rest of this.

But say you were mounting a volume on /home/david/foo. If the disk or
its filesystem was horked, but foo, being a mountpoint in the current
filesystem was there, couldn't a non-fatal response cause the
duplication that others as well as I have suggested? Not trying to be
argumentative, David, just trying to understand why it is the way it is,
too.

-C
David Cantrell
2015-07-15 17:14:59 UTC
Post by Christopher Barry
Post by David Cantrell
Post by Christopher Barry
Wouldn't that create the empty destination that would then need to be
completely repopulated once BadDisk was fixed?
No. If you tell rsync to use a non-existent local directory as its
source rsync just aborts like this:
rsync: link_stat "/home/david/foo" failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(789)
and the contents of the target are left unchanged. If the target
doesn't exist it is not created.
OK, 'non-existent' is the key there, and maybe that's what all this
has been about from the beginning - can't look up the OP at the
moment... If that's always been your specific issue, then please ignore
the rest of this.
Yeah, on a Mac, when a filesystem goes away the mount point under
/Volumes also goes away.
Post by Christopher Barry
But say you were mounting a volume on /home/david/foo. If the disk or
its filesystem was horked, but foo, being a mountpoint in the current
filesystem was there, couldn't a non-fatal response cause the
duplication that others as well as I have suggested? Not trying to be
argumentative, David, just trying to understand why it is the way it is,
too.
The current code won't give you a fatal response in that case anyway,
because all it checks for is that the directory you're telling it to
back up exists! It *doesn't* check that there's a filesystem mounted
there.

I suspect that the current check is there because it's possible to
check it, and in general checking your input before doing anything is a
good practice. It's a bit harder to check whether something to be backed
up using rsync-over-ssh (for example) exists, so that's left to rsync to
handle.
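
If you did want to guard against the mounted-but-dead case, a pre-flight check along these lines is conceivable (this is not something rsnapshot does; mountpoint(1) is a Linux util-linux tool, so on a Mac you'd compare device numbers instead):

#!/bin/sh
# Hypothetical pre-flight check before running a backup.
SRC=/home/david/foo    # made-up example path from this thread

if [ ! -d "$SRC" ]; then
    echo "source $SRC does not exist" >&2
    exit 1
fi

if ! mountpoint -q "$SRC"; then
    echo "warning: $SRC exists but nothing is mounted on it" >&2
fi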
--
David Cantrell | Pope | First Church of the Symmetrical Internet

Are you feeling bored? depressed? slowed down? Evil Scientists may
be manipulating the speed of light in your vicinity. Buy our patented
instructional video to find out how, and maybe YOU can stop THEM