Discussion:
[rsnapshot-discuss] relinking (deduping) disconnected rsnapshot trees
Winkel, Richard J.
2015-06-03 14:49:10 UTC
Because of an undetected disk overflow I have fragmented copies of
partial rsnapshot backups on a raid.
I'd rather not just go back to the last intact backup, but find a way to
merge the new data with the existing
tree. In other words, scan directories A and B and
if files A/subpathK/fileX and B/subpathK/fileX exist and are identical,
then link them together, otherwise do nothing.
Rsync (3.1.1) doesn't seem to be the tool to use, at least I can't
figure it out.
Has anyone else run across this issue and how did you resolve it?

Thanks,
Rich
------------------------------------------------------------------------------
Christopher Barry
2015-06-03 16:24:44 UTC
On Wed, 3 Jun 2015 14:49:10 +0000
Post by Winkel, Richard J.
Because of an undetected disk overflow I have fragmented copies of
partial rsnapshot backups on a raid.
Can you go into more detail here? Disk overflow? Do you mean you ran
out of disk space, didn't notice, and backups have been failing for
some period?

How have you proceeded to correct this problem? e.g. did you replace/add
disks and rebuild, and now you have a larger RAID (5?) and wish to
resume backing up to this? Did you make room by deleting a bunch of
older stuff? Or, do you now have another additional RAID device to add
new backups to? The more detail the better.
------------------------------------------------------------------------------
_______________________________________________
rsnapshot-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss
------------------------------------------------------------------------------
Winkel, Richard J.
2015-06-03 17:29:41 UTC
Thanks for the reply!
I moved one of the backup trees elsewhere so I have some free space.
The overflow has been happening for about a month.
I obviously need to check what happened to the syslog message.
Also, I had use_lazy_deletes turned on. I think this interfered with the
rollback procedure: it left _delete* directories lying around that were
never cleaned up.
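
A quick way to see whether such leftovers are still on disk is to list them at the top of the snapshot root. This is only a sketch: the function name and the example path are made up, and the `_delete*` pattern is simply the one described above.

```shell
#!/bin/bash
# Hypothetical helper: list leftover lazy-delete directories directly
# under a snapshot root, with the disk space each one still holds.
list_lazy_leftovers() {
    local root=$1
    # -maxdepth 1 keeps find from descending into the snapshots themselves.
    find "$root" -maxdepth 1 -type d -name '_delete*' -exec du -sh {} +
}

# Example (the path is an assumption, use your own snapshot_root):
#   list_lazy_leftovers /backup/rsnapshot
```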
------------------------------------------------------------------------------
Scott Hess
2015-06-03 17:18:22 UTC
If it's a limited amount of unlinked data, then one approach you could use
is:

mv weekly.0 weekly.0.broken
cp -al weekly.1 weekly.0
rsync -aHS weekly.0.broken/ weekly.0/
rm -rf weekly.0.broken

This will take a hard-link copy of an existing correct backup (weekly.1),
then rsync the broken snapshot _over_ that copy. Depending on what kind of
fidelity you want to the brokenness, you might add --delete to the rsync
command. Basically just read the rsnapshot.log and figure out a script
which will replicate the gist of it.

BUT! Obviously you need to be really careful, and you would be well
advised to spend a bit of time thinking about how this is going to affect
your overall backup. Normally I would not expect rsnapshot to leave you
with fragmentary unlinked backups. Having something you believe to be that
implies that your tip-of-stream is disconnected from your older backups,
and hot-patching one or two directories isn't going to resolve that. In
that case I'd think even harder about the problem, and maybe even fake up a
simple rsnapshot sandbox to experiment with, to make sure I'm making things
better rather than worse.
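
A throwaway sandbox along those lines might look like the sketch below. Everything in it is made up for illustration (directory names, file names, the `link_report` helper). It does not call rsnapshot; it only builds two small trees so that a relinking procedure can be tried and checked safely.

```shell
#!/bin/bash
# Build two toy snapshot trees under a fresh temp dir and print its path.
make_sandbox() {
    local root
    root=$(mktemp -d) || return 1
    mkdir -p "$root/weekly.1/etc" "$root/weekly.0/etc"
    echo "same content" > "$root/weekly.1/etc/linked"
    ln "$root/weekly.1/etc/linked" "$root/weekly.0/etc/linked"   # shared inode
    echo "same content" > "$root/weekly.1/etc/unlinked"
    echo "same content" > "$root/weekly.0/etc/unlinked"          # separate copy
    echo "old" > "$root/weekly.1/etc/changed"
    echo "new" > "$root/weekly.0/etc/changed"                    # really differs
    echo "$root"
}

# For every file in tree A, say how it relates to the same path in tree B.
link_report() {
    local a=$1 b=$2 path rel
    find "$a" -type f -print0 | while IFS= read -r -d '' path; do
        rel=${path#"$a"/}
        if [ ! -e "$b/$rel" ]; then            echo "missing   $rel"
        elif [ "$path" -ef "$b/$rel" ]; then   echo "linked    $rel"
        elif cmp -s "$path" "$b/$rel"; then    echo "unlinked  $rel"
        else                                   echo "differs   $rel"
        fi
    done
}

# Example:
#   root=$(make_sandbox)
#   link_report "$root/weekly.1" "$root/weekly.0"
#   rm -rf "$root"
```

Running a candidate procedure inside the sandbox and then checking that only the `unlinked` lines turn into `linked` is a cheap way to confirm it does what was intended.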

Also ... IMHO you might be best served to shift those broken directories
aside and set a calendar entry to manually delete them after an appropriate
time, and not treat them as part of your normal rsnapshot stream. Those
broken snapshot directories are misleading, and I'd worry about mistakes
made in six months when something else comes up and you have to manually
intervene.

-scott
------------------------------------------------------------------------------
Winkel, Richard J.
2015-06-03 18:21:18 UTC
Hi Scott,
Thanks for the thoughtful reply. You have a good approach. I don't think anything would be lost, but I think files could be linked in that shouldn't be there.
Unfortunately I can't just set the broken stuff aside: I have about 20 TB of data here and the raid is about 95% full.

------------------------------------------------------------------------------
Winkel, Richard J.
2015-06-03 17:28:49 UTC
I guess I'm just being lazy. But if anyone else already has something
in hand, it seems like it would be useful to a lot of people.
Otherwise I guess I'll have to invent it.
------------------------------------------------------------------------------
Winkel, Richard J.
2015-06-03 17:29:05 UTC
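#!/bin/bash
# Hard-link files that have the same relative path and identical content
# in two trees.
if [ $# -ne 2 ]; then
    echo "Syntax: $0 tree1 tree2"
    echo "Scans 2 trees for identically path'd and named files and,"
    echo "if they are identical, links them together."
    exit 1
fi
# df -P prints a header line plus one line per argument, so exactly two
# unique first-column values means both trees are on one filesystem.
if ! [ -d "$1" ] || ! [ -d "$2" ] ||
   [ "$(df -P "$1" "$2" | awk '{print $1}' | uniq | wc -l)" -ne 2 ]; then
    echo "Arguments must be directories on the same partition! Exiting..."
    exit 2
fi
# NUL-delimited names survive spaces, backslashes and newlines in paths.
find "$1" -type f -print0 | while IFS= read -r -d '' path; do
    f=${path#"$1"}    # strip the tree1 prefix without going through sed
    f=${f#/}
    # Skip names that are already hard links to the same inode.
    [ "$1/$f" -ef "$2/$f" ] && continue
    if cmp -s "$1/$f" "$2/$f"; then
        echo "Linking $1/$f to $2/$f"
        # ln -f replaces the target, so no separate rm is needed.
        ln -f "$1/$f" "$2/$f"
    fi
done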
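
One thing the script above does not look at is metadata: `cmp` compares contents only, while hard-linked names share a single owner, mode and mtime. Whether that matters depends on the backup, but a stricter check could be sketched as follows. The helper name `same_meta` is made up, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if two files agree on mode, owner,
# group and modification time (GNU stat format sequences).
same_meta() {
    [ "$(stat -c '%a %u %g %Y' -- "$1")" = "$(stat -c '%a %u %g %Y' -- "$2")" ]
}

# Possible use inside the loop, before linking:
#   if cmp -s "$a" "$b" && same_meta "$a" "$b"; then ln -f "$a" "$b"; fi
```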
------------------------------------------------------------------------------
Christopher Barry
2015-06-04 20:03:56 UTC
This may be useful:
http://serverfault.com/questions/618735/can-i-use-rsync-to-create-a-list-of-only-changed-files

It shows a way to leverage rsync to create a list of the files that
would be transferred from the <src> to the <dest>, without actually
doing anything. Armed with this list, you can infer that all other
files present on both sides can be hardlinks.
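
For illustration, a dry run of that kind could look like the sketch below. It is not taken from the linked answer; the function name is made up, and the option set (`-r` recurse, `-c` compare by checksum, `-n` dry run, `--out-format='%n'` print only names) is one reasonable choice among several.

```shell
#!/bin/bash
# Hypothetical helper: print the files rsync would copy from $1 to $2,
# without touching either tree.
would_transfer() {
    # Trailing slashes make rsync compare the directory contents.
    # Directory entries end in "/" and are dropped from the listing.
    rsync -rcn --out-format='%n' "${1%/}/" "${2%/}/" | sed '/\/$/d'
}

# Example:
#   would_transfer weekly.0.broken weekly.1
```

With `-c` rsync reads every file on both sides, which is slow on a large tree; dropping `-c` falls back to rsync's default size and mtime comparison.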

HTH
--
Regards,
Christopher Barry

Random geeky fortune:
If it ain't baroque, don't phiques it.

------------------------------------------------------------------------------
Rasmus Borup Hansen
2015-06-04 06:07:52 UTC
You may want to take a look at hardlink:

http://jak-linux.org/projects/hardlink/

Best,

Rasmus

------------------------------------------------------------------------------
Scott Hess
2015-06-04 17:39:14 UTC
This is a good starting point:
http://en.wikipedia.org/wiki/List_of_duplicate_file_finders

The primary issue with these tools in an rsnapshot root is that they can
become memory-constrained unless special care is taken. I ended up
writing my own Perl script which runs a find generating various metadata,
then runs that through sort to group the files which could be the same,
then processes that result.
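
The same find-then-sort idea can be sketched in shell. This is not the Perl script mentioned above, just an illustration with a made-up function name; it assumes GNU find (`-printf`), a single filesystem, and file names without tabs or newlines. Letting `sort` do the grouping keeps memory use down, since awk only ever holds one size group at a time.

```shell
#!/bin/bash
# Hypothetical helper: print "size<TAB>inode<TAB>path" for every group of
# same-sized regular files under $1 that spans more than one inode.
dup_candidates() {
    find "$1" -type f -printf '%s\t%i\t%p\n' |
        sort -t $'\t' -k1,1n -k2,2n |
        awk -F '\t' '
            # Print a finished size group only if it has several inodes.
            function emit() { if (inodes > 1) printf "%s", group }
            NR == 1 || $1 != size { emit(); size = $1; group = ""; inodes = 0; last = "" }
            $2 != last { inodes++; last = $2 }
            { group = group $0 "\n" }
            END { emit() }
        '
}

# Example:
#   dup_candidates /backup/rsnapshot    # path is an assumption
```

Files in one group are only candidates: their contents would still need comparing (for example with `cmp`) before linking.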

-scott
------------------------------------------------------------------------------
Helmut Hullen
2015-06-04 09:00:00 UTC
Hello, Rasmus,
Post by Rasmus Borup Hansen
http://jak-linux.org/projects/hardlink/
Or you use the original "hardlink" (written in C and then compiled into
a binary):

http://arktur.shuttle.de/CD/beta/slack/ap1/hardlink-1.2-i486-1hln.tgz

This "hardlink" program is really small. And really quick.
Authors: Jakub Jelinek, Dag Wieers, Dries Verachtert etc.


Best regards!
Helmut


------------------------------------------------------------------------------