[rsnapshot-discuss] inotify+rsnapshot
Patrick O'Callaghan
2017-05-11 23:53:46 UTC
Has anyone looked at the potential advantages of using inotify in
combination with rsnapshot, basically in order to reduce the amount of
tree-walking necessary when doing a backup? I realise that inotify has its
own costs and may not be able to handle very large trees, but I thought I'd
ask all the same. The idea would be to use inotify to keep a list of
directories for rsnapshot to check at backup time, thus avoiding the need
to descend into branches that haven't changed.

I've no doubt this has occurred to many people before now, but a quick
Google search doesn't show up much for some reason. There are several
mentions of lsyncd (https://axkibe.github.io/lsyncd/) but that's focused on
live mirroring, which is not the same thing.

poc
David Cantrell
2017-05-12 11:24:40 UTC
Patrick O'Callaghan
2017-05-12 12:38:21 UTC
Post by David Cantrell
Post by Patrick O'Callaghan
Has anyone looked at the potential advantages of using inotify in
combination with rsnapshot, basically in order to reduce the amount of
tree-walking necessary when doing a backup? I realise that inotify has its
own costs and may not be able to handle very large trees, but I thought I'd
ask all the same. The idea would be to use inotify to keep a list of
directories for rsnapshot to check at backup time, thus avoiding the need
to descend into branches that haven't changed.
rsnapshot doesn't descend into branches, it delegates all the actual
examination of what's on your disk to rsync. Just about the only benefit
you could get from using inotify would be to maybe skip calling rsync on
*local* backup points that haven't been updated at all, but that would
come at the cost of a very large increase in complexity, and I would
expect that the saving would be very small.
Yes, rsync is doing all the heavy lifting. My question was prompted by the
amount of activity I see on my backup device while rsnapshot is running.
It's a fairly slow NAS with a 100 Mbps LAN interface and takes half an hour
to scan 180 GB, of which much less than 1% will have changed in a 24-hour
period.
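
A rough back-of-the-envelope check of those figures, assuming decimal units, a link running at its full nominal rate, and 1% as an upper bound on the changed data (all simplifications, not measurements):

```python
# Figures taken from the post; decimal units assumed, protocol overhead ignored.
LINK_MBPS = 100          # NAS LAN interface
TREE_GB = 180            # size of the tree being scanned
CHANGED_FRACTION = 0.01  # upper bound ("much less than 1%")
RUN_MINUTES = 30         # observed duration of a run

link_mb_per_s = LINK_MBPS / 8                    # megabits -> megabytes
changed_mb = TREE_GB * 1000 * CHANGED_FRACTION   # data that actually changed
transfer_s = changed_mb / link_mb_per_s          # time to move only the changes
full_copy_s = TREE_GB * 1000 / link_mb_per_s     # time to move the whole tree

print(f"changed data at line rate: {transfer_s / 60:.1f} min")
print(f"full copy at line rate:    {full_copy_s / 3600:.1f} h")
print(f"share of run not spent copying: {1 - transfer_s / (RUN_MINUTES * 60):.0%}")
```

On these assumptions, moving the changed data accounts for only a couple of minutes of the half hour, which suggests that most of the run goes to building and comparing file lists rather than to copying.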

poc
Christopher Barry
2017-05-12 23:33:34 UTC
On Fri, 12 May 2017 13:38:21 +0100
Post by Patrick O'Callaghan
Post by David Cantrell
Post by Patrick O'Callaghan
Has anyone looked at the potential advantages of using inotify in
combination with rsnapshot, basically in order to reduce the
amount of tree-walking necessary when doing a backup? I realise
that inotify has its own costs and may not be able to handle very
large trees, but I thought I'd ask all the same. The idea would be
to use inotify to keep a list of directories for rsnapshot to
check at backup time, thus avoiding the need to descend into
branches that haven't changed.
rsnapshot doesn't descend into branches, it delegates all the actual
examination of what's on your disk to rsync. Just about the only
benefit you could get from using inotify would be to maybe skip
calling rsync on *local* backup points that haven't been updated at
all, but that would come at the cost of a very large increase in
complexity, and I would expect that the saving would be very small.
Yes, rsync is doing all the heavy lifting. My question was prompted by
the amount of activity I see on my backup device while rsnapshot is
running. It's a fairly slow NAS with a 100Mbps LAN interface and takes
half an hour to scan 180GB, of which much less than 1% will have
changed in a 24-hour period.
poc
Yeah, I've also wondered about that. Why don't you just set inotify
loose on your 180GB fs and see what happens? If it runs really lightly,
and can keep a list up to date 'enough' for you, just feed that
manually into rsync yourself and see what the deal is. If it's way
better, fork rsnapshot or write a new backup program. If you write a
new one in bash I'd probably help you. Sounds like a cool project.
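
A minimal sketch of that experiment, in Python rather than bash. It assumes the change log is produced separately by something like `inotifywait -m -r -e modify,create,delete,move --format '%w%f' /data`, one absolute path per line. The paths `/data`, `nas:/backup/` and the list file name are hypothetical, and the helper names are made up for illustration:

```python
import os
import shlex


def changed_paths(event_lines, root):
    """Collapse logged event paths into a sorted, de-duplicated list
    of paths relative to root, dropping anything outside root."""
    root = os.path.abspath(root)
    seen = set()
    for line in event_lines:
        path = line.rstrip("\n")
        if not path:
            continue
        path = os.path.abspath(path)
        if path == root or not path.startswith(root + os.sep):
            continue  # skip the root itself and unrelated paths
        seen.add(os.path.relpath(path, root))
    return sorted(seen)


def rsync_command(root, dest, list_file):
    """Build an rsync invocation limited to the paths in list_file."""
    return ["rsync", "-a", "--files-from=" + list_file,
            root.rstrip("/") + "/", dest]


if __name__ == "__main__":
    # Hypothetical sample of logged events, including a repeat.
    log = ["/data/a/x.txt\n", "/data/a/x.txt\n", "/data/b/y\n"]
    for rel in changed_paths(log, "/data"):
        print(rel)
    print(shlex.join(rsync_command("/data", "nas:/backup/", "changed.list")))
```

This only covers files that still exist: deleted and renamed paths would need separate handling, and how a partial file list should interact with rsnapshot's snapshot rotation is left open.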
--
Regards,
Christopher
Patrick O'Callaghan
2017-05-12 23:51:09 UTC
Post by Christopher Barry
Yeah, I've also wondered about that. Why don't you just set inotify
loose on your 180GB fs and see what happens? If it runs really lightly,
and can keep a list up to date 'enough' for you, just feed that
manually into rsync yourself and see what the deal is. If it's way
better, fork rsnapshot or write a new backup program. If you write a
new one in bash I'd probably help you. Sounds like a cool project.
I might try that :-)

poc
Art Sackett
2017-05-12 00:56:39 UTC
Has anyone looked at the potential advantages of using inotify in combination
with rsnapshot, basically in order to reduce the amount of tree-walking
necessary when doing a backup?
I can't say that I've looked into it, as such, but because I use inotify
quite a lot in my usual work I've thought about it a time or two.

I can't or haven't yet envisioned a way to get around the need for file
system traversal at startup, in response to IN_Q_OVERFLOW, and after
mounting a filesystem as a member of a watched directory structure.
inotify doesn't support remote (networked) file system events, so
something else would have to be implemented to gain support for them.
And then, just to keep it interesting, there are race conditions in
inotify itself that can require reaching out to readdir() or stat() for
more reliable but still not race-free state discovery.

I'd expect that an inotify implementation for this purpose would be 10%
inotify and 90% workarounds and shims.
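
To illustrate the fallback logic this implies, here is a minimal sketch of the bookkeeping only, with no actual inotify calls. The class and method names are hypothetical; the mask values are those defined in `<sys/inotify.h>`. It assumes events arrive as (mask, path) pairs from some reader not shown here, and that every call to `plan()` is followed by a successful backup:

```python
import os

# Event mask bits as defined in <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_Q_OVERFLOW = 0x00004000


class DirtyTracker:
    """Remember which directories changed since the last backup.

    Starts in an 'unknown' state, so the first backup walks everything,
    and returns to that state whenever events may have been lost.
    """

    def __init__(self):
        self.dirty = set()
        self.need_full_scan = True  # nothing is known at startup

    def handle(self, mask, path=None):
        if mask & IN_Q_OVERFLOW:
            # The kernel dropped events: the dirty set cannot be trusted.
            self.need_full_scan = True
            self.dirty.clear()
        elif path is not None and not self.need_full_scan:
            self.dirty.add(os.path.dirname(path) or ".")

    def plan(self):
        """Return None for 'walk the whole tree', otherwise the sorted
        list of directories to check; then reset for the next cycle."""
        result = None if self.need_full_scan else sorted(self.dirty)
        self.need_full_scan = False
        self.dirty.clear()
        return result
```

Filesystems mounted below a watched directory, networked filesystems, and the races mentioned above are not addressed by this sketch; each would need its own path back to a full scan.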
--
Art Sackett
http://www.artsackett.com/
Patrick O'Callaghan
2017-05-14 12:31:03 UTC
Post by Art Sackett
Has anyone looked at the potential advantages of using inotify in combination
with rsnapshot, basically in order to reduce the amount of tree-walking
necessary when doing a backup?
I can't say that I've looked into it, as such, but because I use inotify
quite a lot in my usual work I've thought about it a time or two.
I can't or haven't yet envisioned a way to get around the need for file
system traversal at startup, in response to IN_Q_OVERFLOW, and after
mounting a filesystem as a member of a watched directory structure.
inotify doesn't support remote (networked) file system events, so
something else would have to be implemented to gain support for them.
And then, just to keep it interesting, there are race conditions in
inotify itself that can require reaching out to readdir() or stat() for
more reliable but still not race-free state discovery.
I'd expect that an inotify implementation for this purpose would be ten
percent inotify and 90% workarounds and shims.
Good points, thanks for the answer.

poc
