Discussion:
[rsnapshot-discuss] rsnapshot without syncing
Oon-Ee Ng
2016-08-06 22:37:43 UTC
Summary: I'd like to use rsnapshot only for rotation, is that
possible? Can sync_first do rotation only (including hard link
copying) WITHOUT ever running sync?

Detailed Explanation:

I have a server with a largish hard disk attached. Backups are done
from multiple machines (Linux using ssh+rsync, Windows using Cygwin
ssh+rsync, Android using sftp). The resulting folders for 2 machines
would look something like:-
/mnt/HDD/A
/mnt/HDD/B

I've already been using rsnapshot for some months, so I have:-
/mnt/HDD/snapshots/daily.0/A
/mnt/HDD/snapshots/daily.0/B
/mnt/HDD/snapshots/daily.1/A
/mnt/HDD/snapshots/daily.1/B
/mnt/HDD/snapshots/daily.2/A
/mnt/HDD/snapshots/daily.2/B
and so on, including weekly/monthly intervals.

There are actually two full copies of the data on this server. For
machine A, this is a copy at /mnt/HDD/A and another copy at
/mnt/HDD/snapshots/daily.0/A. I'd like to save some space, and imagine
I could perhaps do something like this:-

1. Tell rsnapshot to only rotate (mv daily.5->daily.6, mv daily.4->daily.5 ....)
2. Tell rsnapshot to do hard linking from daily.0 to daily.1
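
The two steps above can be sketched in Python on a throwaway directory. This is only an illustration of "rotate, then hard-link copy", not rsnapshot's actual code; the function name rotate_and_link and its keep parameter are made up for the example, and it assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def rotate_and_link(root, name="daily", keep=7):
    """Rotate name.N -> name.N+1, then hard-link copy name.0 to name.1.

    Hypothetical helper for illustration; rsnapshot itself is not involved.
    """
    oldest = os.path.join(root, f"{name}.{keep - 1}")
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)  # the oldest snapshot falls off the end
    # Shift the remaining snapshots up by one, oldest first; name.0 stays put.
    for n in range(keep - 2, 0, -1):
        src = os.path.join(root, f"{name}.{n}")
        if os.path.isdir(src):
            os.rename(src, os.path.join(root, f"{name}.{n + 1}"))
    # Similar in spirit to `cp -al name.0 name.1`: directories are created,
    # regular files become hard links to the same inodes.
    shutil.copytree(
        os.path.join(root, f"{name}.0"),
        os.path.join(root, f"{name}.1"),
        copy_function=os.link,
    )


# Demo on a temporary directory standing in for /mnt/HDD/snapshots.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "daily.0", "A"))
with open(os.path.join(root, "daily.0", "A", "file.txt"), "w") as f:
    f.write("v1")

rotate_and_link(root, keep=3)

a = os.stat(os.path.join(root, "daily.0", "A", "file.txt"))
b = os.stat(os.path.join(root, "daily.1", "A", "file.txt"))
print("same inode:", a.st_ino == b.st_ino, "link count:", a.st_nlink)
```

After one rotation the file in daily.0 and the file in daily.1 are two names for the same inode, so the second name costs a directory entry but no extra data blocks.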

My intention is that machine A would run a backup rsync to
/mnt/HDD/snapshots/daily.0 instead of /mnt/HDD/A. In my understanding,
that would mean there's only one full copy of the data, and if the
data never changed, the backup machine would only take up as much size
as the sum of all machines which are sending backups.

Please advise if:-

1. The above is flawed and impossible as a concept.
2. rsyncs during the rsnapshot run would corrupt the backup
3. The sync_first option can do what I want

Thank you for your time.

------------------------------------------------------------------------------
Nico Kadel-Garcia
2016-08-07 00:00:25 UTC
Post by Oon-Ee Ng
Summary: I'd like to use rsnapshot only for rotation, is that
possible? Can sync_first do rotation only (including hard link
copying) WITHOUT ever running sync?
It's a modest Perl script or two, well designed to be extremely
flexible per user requirements.
Post by Oon-Ee Ng
I have a server with a largish hard disk attached. Backups are done
from multiple machines (Linux using ssh+rsync, Windows using Cygwin
ssh+rsync, Android using sftp). The resulting folders for 2 machines
would look something like:-
/mnt/HDD/A
/mnt/HDD/B
I've already been using rsnapshot for some months, so I have:-
/mnt/HDD/snapshots/daily.0/A
/mnt/HDD/snapshots/daily.0/B
/mnt/HDD/snapshots/daily.1/A
/mnt/HDD/snapshots/daily.1/B
/mnt/HDD/snapshots/daily.2/A
/mnt/HDD/snapshots/daily.2/B
and so on, including weekly/monthly intervals.
There are actually two full copies of the data on this server. For
machine A, this is a copy at /mnt/HDD/A and another copy at
/mnt/HDD/snapshots/daily.0/A. I'd like to save some space, and imagine
I could perhaps do something like this:-
1. Tell rsnapshot to only rotate (mv daily.5->daily.6, mv daily.4->daily.5 ....)
2. Tell rsnapshot to do hard linking from daily.0 to daily.1
daily.0 and daily.1 should already be hardlinked. Any files that did
not differ between daily.0 and daily.1 should have been hardlinked
when daily.0 got generated.
Post by Oon-Ee Ng
My intention is that machine A would run a backup rsync to
/mnt/HDD/snapshots/daily.0 instead of /mnt/HDD/A. In my understanding,
What? Why? Why not do a cp -al from daily.0 first, then run a
well-formed rsync command on top of that? This is basically what
daily.1 and daily.0 do anyway.

Do be careful with this approach. If someone then SCP's on top of
/mnt/HDD/A, they'll be copying on top of the "static" backups. This is
the risk of *any* approach that writes on top of the rsnapshot hardlinked
backups, for any reason.

That risk is why few people do this. Anything that might write on top
of that staging area, even accidentally, could corrupt all the
backups.
Post by Oon-Ee Ng
that would mean there's only one full copy of the data, and if the
data never changed, the backup machine would only take up as much size
as the sum of all machines which are sending backups.
Plus inodes for all the subdirectories. I've seen people get bitten by that one!
Post by Oon-Ee Ng
Please advise if:-
1. The above is flawed and impossible as a concept.
Possible? Yes, it's a scripting environment and you can get away with
anything you can script. Good idea? I don't think so.
Post by Oon-Ee Ng
2. rsyncs during the rsnapshot run would corrupt the backup
3. The sync_first option can do what I want
I think this is actually your best bet.
Post by Oon-Ee Ng
Thank you for your time.
------------------------------------------------------------------------------
_______________________________________________
rsnapshot-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss
------------------------------------------------------------------------------
Oon-Ee Ng
2016-08-11 02:08:14 UTC
Thanks for your reply.
Post by Nico Kadel-Garcia
Post by Oon-Ee Ng
My intention is that machine A would run a backup rsync to
/mnt/HDD/snapshots/daily.0 instead of /mnt/HDD/A. In my understanding,
What? Why? Why not do a cp -al from daily.0 first, then run a
well-formed rsync command on top of that? This is basically what
daily.1 and daily.0 do anyway.
Do be careful with this approach. If someone then SCP's on top of
/mnt/HDD/A, they'll be copying on top of the "static" backups. This is
the risk of *any* approach that writes on top of the rsnapshot hardlinked
backups, for any reason.
That risk is why few people do this. Anything that might write on top
of that staging area, even accidentally, could corrupt all the
backups.
The 'why' is that I would like to save space, basically. More below.
Post by Nico Kadel-Garcia
Post by Oon-Ee Ng
3. The sync_first option can do what I want
I think this is actually your best bet.
As I understand it, sync_first uses a .sync folder which would use up
as much space as the folder being backed up, something I'm trying to
avoid.

Here was my idea on how things should work:-

1. daily.1 is a hard-linked identical copy of daily.0
2. copying files into daily.0 disassociates that link, meaning daily.1
is not changed at all, while daily.0 has the 'latest' copy
3. running `rsnapshot daily` just rotates daily.1 to daily.2 and then
cp -al daily.0 daily.1

It appears, if I'm deducing from your replies correctly, that my '2'
is mistaken as changes to daily.0 would also affect daily.1 (and hence
all subsequent backups). Is this only a concern with scp? Can I avoid
this problem with a properly formed rsync, for example? I have full
control over all possible writes to the backup hard disc.
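
Whether a write to daily.0 leaks into daily.1 depends on how the writer updates the file. A minimal Python sketch of the two cases follows; the file names and helper functions are hypothetical. As far as I know, rsync by default writes a temporary file and renames it over the target (unless --inplace is given), which corresponds to the first case, while tools that open the existing file and overwrite it correspond to the second.

```python
import os
import tempfile


def read(path):
    with open(path) as f:
        return f.read()


def replace_via_rename(path, data):
    # Write a new file next to the target, then rename it over the target.
    # The target name now points at a new inode; other hard links keep the old one.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(data)
    os.replace(tmp, path)


def overwrite_in_place(path, data):
    # Open the existing inode and rewrite its contents.
    # Every hard link to that inode sees the new data.
    with open(path, "w") as f:
        f.write(data)


root = tempfile.mkdtemp()


def linked_pair(tag):
    """Create a file containing 'old' plus a hard link to it.

    The two names stand in for the same file under daily.0 and daily.1.
    """
    newest = os.path.join(root, f"{tag}.daily0")
    older = os.path.join(root, f"{tag}.daily1")
    with open(newest, "w") as f:
        f.write("old")
    os.link(newest, older)
    return newest, older


newest, older = linked_pair("rename")
replace_via_rename(newest, "new")
after_rename = read(older)

newest, older = linked_pair("inplace")
overwrite_in_place(newest, "clobbered")
after_overwrite = read(older)

print("older copy after rename-style update:", after_rename)
print("older copy after in-place overwrite:", after_overwrite)
```

In the rename case the older snapshot keeps its original content; in the in-place case the older snapshot is changed along with the newest one, which is the corruption risk described earlier in the thread.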
Patrick O'Callaghan
2016-08-11 09:56:03 UTC
Post by Oon-Ee Ng
Post by Nico Kadel-Garcia
That risk is why few people do this. Anything that might write on top
of that staging area, even accidentally, could corrupt all the
backups.
The 'why' is that I would like to save space, basically. More below.
Post by Nico Kadel-Garcia
Post by Oon-Ee Ng
3. The sync_first option can do what I want
I think this is actually your best bet.
As I understand it, sync_first uses a .sync folder which would use up
as much space as the folder being backed up, something I'm trying to
avoid.
Wrong. The .sync folder becomes the backup folder when the process is
finished, and only files which have changed since the last backup will be
physically copied, the rest being links. This is exactly what you want.

poc
Oon-Ee Ng
2016-08-11 21:02:02 UTC
On Thu, Aug 11, 2016 at 5:56 PM, Patrick O'Callaghan
Post by Patrick O'Callaghan
Post by Oon-Ee Ng
As I understand it, sync_first uses a .sync folder which would use up
as much space as the folder being backed up, something I'm trying to
avoid.
Wrong. The .sync folder becomes the backup folder when the process is
finished, and only files which have changed since the last backup will be
physically copied, the rest being links. This is exactly what you want.
Thanks! It appears then that I misunderstood the man page, which
states regarding sync_first that:-
This benefit comes at the cost of one more snapshot worth of disk
space. The default is 0 (off).

I understood that to mean .sync would take up as much disk space as a
full backup, but I guess the 'snapshot worth' is important there,
meaning it would be an additional number of hard links rather than
additional space taken up.
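
One way to check this kind of claim on a real disk is to count each inode only once when adding up file sizes. The sketch below is a rough Python illustration under stated assumptions: the helper name apparent_and_actual_bytes is made up, it sums st_size rather than allocated blocks, and it ignores the space used by directories themselves.

```python
import os
import tempfile


def apparent_and_actual_bytes(top):
    """Return (bytes counting every name, bytes counting each inode once)."""
    apparent = 0
    actual = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(top):
        for fn in filenames:
            st = os.lstat(os.path.join(dirpath, fn))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)  # identifies the underlying inode
            if key not in seen:
                seen.add(key)
                actual += st.st_size
    return apparent, actual


# Demo: an existing snapshot plus a .sync directory where one file is
# unchanged (hard link) and one file has changed (new inode).
root = tempfile.mkdtemp()
old = os.path.join(root, "daily.0")
new = os.path.join(root, ".sync")
os.mkdir(old)
os.mkdir(new)
for name in ("unchanged.bin", "changed.bin"):
    with open(os.path.join(old, name), "wb") as f:
        f.write(b"x" * 1000)
os.link(os.path.join(old, "unchanged.bin"), os.path.join(new, "unchanged.bin"))
with open(os.path.join(new, "changed.bin"), "wb") as f:
    f.write(b"y" * 1000)

apparent, actual = apparent_and_actual_bytes(root)
print(apparent, actual)
```

In this toy layout the two directories together appear to hold 4000 bytes, but only 3000 bytes of distinct file data exist, because the unchanged file is shared between them.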
Patrick O'Callaghan
2016-08-11 22:49:34 UTC
Post by Oon-Ee Ng
On Thu, Aug 11, 2016 at 5:56 PM, Patrick O'Callaghan
Post by Patrick O'Callaghan
Post by Oon-Ee Ng
As I understand it, sync_first uses a .sync folder which would use up
as much space as the folder being backed up, something I'm trying to
avoid.
Wrong. The .sync folder becomes the backup folder when the process is
finished, and only files which have changed since the last backup will be
physically copied, the rest being links. This is exactly what you want.
Thanks! It appears then that I misunderstood the man page, which
states regarding sync_first that:-
This benefit comes at the cost of one more snapshot worth of disk
space. The default is 0 (off).
I understood that to mean .sync would take up as much disk space as a
full backup, but I guess the 'snapshot worth' is important there,
meaning it would be an additional number of hard links rather than
additional space taken up.
Plus any changed files of course, but essentially yes.

poc
Oon-Ee Ng
2016-08-12 01:50:22 UTC
On Fri, Aug 12, 2016 at 6:49 AM, Patrick O'Callaghan
Post by Patrick O'Callaghan
Post by Oon-Ee Ng
I understood that to mean .sync would take up as much disk space as a
full backup, but I guess the 'snapshot worth' is important there,
meaning it would be an additional number of hard links rather than
additional space taken up.
Plus any changed files of course, but essentially yes.
Thanks, I'm going to test replacing the 'sync' step with remotely run
rsyncs and see how well that works.
