I'm backing up a Linux box over SMB to a NAS. I mount the NAS locally and then I rsync a lot of data (100GB or so). I believe it's taking an awfully long time: more than 12 hours. I would expect it to be much faster once everything has been copied, since almost nothing changes from day to day.
Is there a way to speed this up?
I was thinking that maybe rsync thinks it's working with local hard disks and uses checksum instead of time/size comparisons? But I didn't find a way to force time and date comparisons. Anything else I could check?
-
Smells like you have a cheaper NAS. It could also be from your network bandwidth...
"Standard" consumer NAS devices are really weak when it comes to heavy I/O, which is what you are trying to do here. It could also be a cheap switch between your PC and your NAS that is not strong enough to handle all the packets correctly.
J. Pablo Fernández : The same NAS, the same switch, and another computer running Windows backs up much more data to it in under four hours.
From Antoine Benkemoun -
Try this; I think it gives you at least 10% more than the speed you're getting now: http://www.thegeekstuff.com/2009/09/linux-remote-backup-using-rsnapshot-rsync-utility/
J. Pablo Fernández : Can it work over SMB instead of SSH?
From Rajat -
I think you're misunderstanding the rsync algorithm and how the tool should be applied.
Rsync's performance advantage comes from doing delta transfers-- that is, moving only the changed bits in a file. In order to determine the changed bits, the file has to be read by the source and destination hosts and block checksums compared to determine which bits changed. This is the "magic" part of rsync-- the rsync algorithm itself.
When you're mounting the destination volume with SMB and using rsync to copy files from what Linux "sees" as a local source and a local destination (both mounted on that machine), most modern rsync versions switch to 'whole file' copy mode, and switch off the delta copy algorithm. This is a "win" because, with the delta-copy algorithm on, rsync would read the entire destination file (over the wire from the NAS) in order to determine what bits of the file have changed.
The "right way" to use rsync is to run the rsync server on one machine and the rsync client on the other. Each machine will read files from its own local storage (which should be very fast), agree on what bits of the files have changed, and only transfer those bits. The way you're using rsync amounts to a trumped-up 'cp'. You could accomplish the same thing with 'cp' and it would probably be faster.
If your NAS device supports running an rsync server (or client) then you're in business. If you're just going to mount it on the source machine via SMB then you might as well just use 'cp' to copy the files.
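To make the cost of delta detection concrete, here is a deliberately simplified Python sketch. It is not rsync's actual algorithm (which uses a rolling checksum and can match blocks at arbitrary offsets); it only compares fixed-offset blocks. The names `block_digests` and `changed_blocks` and the tiny `BLOCK_SIZE` are made up for illustration. The point it shows is that both copies have to be read in full to compute the digests, which over an SMB mount means pulling the whole destination file across the network.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(source: bytes, dest: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of source blocks that are missing from or differ in dest."""
    src = block_digests(source, block_size)  # reads all of the source
    dst = block_digests(dest, block_size)    # reads all of the destination
    return [i for i, digest in enumerate(src) if i >= len(dst) or digest != dst[i]]
```

With an rsync process on each end, each side would compute digests from its own disk and only the digests and the changed blocks would cross the network. With a mounted share, one machine does both reads.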
Evan Anderson : Ooo! Downvotes! I'd be curious to hear why you downvoted the answer, considering it's technically accurate.
J. Pablo Fernández : I can't run an rsync server on the NAS, otherwise I would be doing so. When not using an rsync server, rsync can use the checksum or the size and datetime to find out whether a file changed or not. According to the man page, it'll use the size and datetime by default, but my experience is that it is not doing that and I don't see a way to force it. I only see a way to force checksumming. --checksum: Without this option, rsync uses a "quick check" that (by default) checks if each file's size and time of last modification match between the sender and receiver.
J. Pablo Fernández : Evan, give me a couple of minutes to write my comment.
Evan Anderson : What behaviour are you seeing that's telling you that it's checksumming the files? The "quick check" behaviour is the default behaviour, so there's no way to "force" it. If you can't run rsync on the NAS just use 'cp'. It'll be as fast or faster.
J. Pablo Fernández : As I understand how rsync works, it should check the local date and time and the remote date and time, and if they match, not copy the file. That means it shouldn't copy 99% of the files, but the fact that it takes more than 12 hours for 60GB or so tells me that it is either copying everything (which seems to be what you are implying by saying that cp will be faster) or that it is actually checksumming, which means it's not copying everything, but it is downloading everything.
Evan Anderson : I'd run it with the "--dry-run" and "--verbose" arguments to see what it thinks it's doing. I wonder if your NAS device isn't representing the modification times exactly the same as the source. You could add a "--size-only" argument and see if that changes things. What filesystem are you running on the NAS device?
J. Pablo Fernández : Thanks Evan, I'll try those recommendations. Regarding the NAS's filesystem, I'm not sure, but I would guess it's ext3.
From Evan Anderson -
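As a rough illustration of the "quick check" discussed in the comments above, here is a minimal Python sketch, not rsync's actual code. The function name `needs_copy` and its parameters are hypothetical. `size_only` imitates the "--size-only" argument Evan suggests, and `modify_window` is modelled on rsync's `--modify-window` option, which is not mentioned in the thread. It shows why a destination that stores modification times slightly differently could make every file look changed.

```python
import os


def needs_copy(src_path, dst_path, modify_window=0.0, size_only=False):
    """Return True if dst is missing or differs from src by size or mtime.

    modify_window is the largest mtime difference (in seconds) that is
    still treated as "same time"; size_only skips the mtime test entirely.
    """
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    if src.st_size != dst.st_size:
        return True
    if size_only:
        return False
    return abs(src.st_mtime - dst.st_mtime) > modify_window
```

If the share rounds or shifts timestamps, a strict comparison (`modify_window=0.0`) flags every file, while a small tolerance or a size-only comparison does not. Whether that is what is happening on this NAS is something the "--dry-run" output would have to confirm.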
It sounds like timestamps are your problem, as this page relates: