{"id":1889,"date":"2016-09-19T19:48:30","date_gmt":"2016-09-20T00:48:30","guid":{"rendered":"http:\/\/swildow.darktech.org\/wp\/?p=1889"},"modified":"2016-09-19T19:48:30","modified_gmt":"2016-09-20T00:48:30","slug":"the-non-beginners-guide-to-syncing-data-with-rsync","status":"publish","type":"post","link":"https:\/\/www.wildow.com\/blog\/?p=1889","title":{"rendered":"The Non-Beginner\u2019s Guide to Syncing Data with Rsync"},"content":{"rendered":"<h2><a title=\"The Non-Beginner\u2019s Guide to Syncing Data with Rsync\" href=\"http:\/\/www.howtogeek.com\/175008\/the-non-beginners-guide-to-syncing-data-with-rsync\/\" rel=\"bookmark\">The Non-Beginner\u2019s Guide to Syncing Data with Rsync<\/a><\/h2>\n<div class=\"thecontent\">\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-175009\" src=\"http:\/\/www.howtogeek.com\/wp-content\/uploads\/2013\/11\/650x250x1-rsyncheader.jpg.pagespeed.gp+jp+jw+pj+js+rj+rp+rw+ri+cp+md.ic.mO5xCkQGOy.jpg\" alt=\"\" width=\"650\" height=\"250\" \/><\/p>\n<p>The rsync protocol can be pretty simple to use for ordinary backup\/synchronization jobs, but some of its more advanced features may surprise you.\u00a0 In this article, we\u2019re going to show how even the biggest data hoarders and backup enthusiasts can wield rsync as a single solution for all of their data redundancy needs.<\/p>\n<h3>Warning: Advanced Geeks Only<\/h3>\n<p>If you\u2019re sitting there thinking \u201cWhat the heck is rsync?\u201d or \u201cI only use rsync for really simple tasks,\u201d you may want to check out our previous article on <a href=\"http:\/\/www.howtogeek.com\/135533\/how-to-use-rsync-to-backup-your-data-on-linux\/\">how to use rsync to backup your data on Linux<\/a>, which gives an introduction to rsync, guides you through installation, and showcases its more basic functions.\u00a0 Once you have a firm grasp of how to use rsync (honestly, it isn\u2019t that complex) and are comfortable with a Linux terminal, you\u2019re 
ready to move on to this advanced guide.<\/p>\n<h3>Running rsync on Windows<\/h3>\n<p>First, let\u2019s get our Windows readers on the same page as our Linux gurus.\u00a0 Although rsync is built to run on Unix-like systems, there\u2019s no reason that you shouldn\u2019t be able to use it just as easily on Windows.\u00a0 <a href=\"http:\/\/cygwin.com\/\">Cygwin<\/a> produces a wonderful Linux API that we can use to run rsync, so head over to their website and download the <a href=\"http:\/\/cygwin.com\/setup-x86.exe\">32-bit<\/a> or <a href=\"http:\/\/cygwin.com\/setup-x86_64.exe\">64-bit<\/a> version, depending on your computer.<\/p>\n<p>Installation is straightforward; you can keep all options at their default values until you get to the \u201cSelect Packages\u201d screen.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-175011\" src=\"http:\/\/www.howtogeek.com\/wp-content\/uploads\/2013\/11\/546x399x3-cygwininstallation.jpg.pagespeed.gp+jp+jw+pj+js+rj+rp+rw+ri+cp+md.ic.krxsnWapoC.jpg\" alt=\"\" width=\"546\" height=\"399\" \/><\/p>\n<p>Now you need to do the same steps for Vim and SSH, but the packages are going to look a bit different when you go to select them, so here are some screenshots:<\/p>\n<p>Installing Vim:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-175012\" src=\"http:\/\/www.howtogeek.com\/wp-content\/uploads\/2013\/11\/546x399x4-cygwininstallation.jpg.pagespeed.gp+jp+jw+pj+js+rj+rp+rw+ri+cp+md.ic.IlHD0osah_.jpg\" alt=\"\" width=\"546\" height=\"399\" \/><\/p>\n<p>Installing SSH:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-175013\" src=\"http:\/\/www.howtogeek.com\/wp-content\/uploads\/2013\/11\/546x399x5-cygwininstallation.jpg.pagespeed.gp+jp+jw+pj+js+rj+rp+rw+ri+cp+md.ic.dYlWpnDxo4.jpg\" alt=\"\" width=\"546\" height=\"399\" \/><\/p>\n<p>After you\u2019ve selected those three packages, keep clicking next until you finish the 
installation. Then you can open Cygwin by clicking on the icon that the installer placed on your desktop.<\/p>\n<h3>rsync Commands: Simple to Advanced<\/h3>\n<p>Now that the Windows users are on the same page, let\u2019s take a look at a simple rsync command, and show how the use of some advanced switches can quickly make it complex.<\/p>\n<p>Let\u2019s say you have a bunch of files that need to be backed up \u2013 who doesn\u2019t these days?\u00a0 You plug in your portable hard drive so you can back up your computer\u2019s files, and issue the following command:<\/p>\n<blockquote><p><code>rsync -a \/home\/geek\/files\/ \/mnt\/usb\/files\/<\/code><\/p><\/blockquote>\n<p>Or, the way it would look on a Windows computer with Cygwin:<\/p>\n<blockquote><p><code>rsync -a \/cygdrive\/c\/files\/ \/cygdrive\/e\/files\/<\/code><\/p><\/blockquote>\n<p>Pretty simple, and at that point there\u2019s really no need to use rsync, since you could just drag and drop the files.\u00a0 However, if your other hard drive already has some of the files and just needs the updated versions plus the files that have been created since the last sync, this command is handy because it only sends the new data over to the hard drive.\u00a0 With big files, and especially when transferring files over the internet, that is a big deal.<\/p>\n<p>Backing up your files to an external hard drive and then keeping the hard drive in the same location as your computer is a very bad idea, so let\u2019s take a look at what it would require to start sending your files over the internet to another computer (one you\u2019ve rented, a family member\u2019s, etc.).<\/p>\n<blockquote><p><code>rsync -av --delete -e 'ssh -p 12345' \/home\/geek\/files\/ geek2@10.1.1.1:\/home\/geek2\/files\/<\/code><\/p><\/blockquote>\n<p>The above command would send your files to another computer with an IP address of 10.1.1.1.\u00a0 It would delete extraneous files from the destination that no longer exist in the source directory, output the 
filenames being transferred so you have an idea of what\u2019s going on, and tunnel rsync through SSH on port 12345.<\/p>\n<p>The <code>-a -v -e --delete<\/code> switches are some of the most basic and commonly used; you should already know a good deal about them if you\u2019re reading this tutorial.\u00a0 Let\u2019s go over some other switches that are sometimes ignored but incredibly useful:<\/p>\n<p><code>--progress<\/code> \u2013 This switch allows us to see the transfer progress of each file.\u00a0 It\u2019s particularly useful when transferring large files over the internet, but can output a senseless amount of information when just transferring small files across a fast network.<\/p>\n<p>An rsync command with the <code>--progress<\/code> switch as a backup is in progress:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-175014\" src=\"http:\/\/www.howtogeek.com\/wp-content\/uploads\/2013\/11\/532x244x6-rsync.jpg.pagespeed.gp+jp+jw+pj+js+rj+rp+rw+ri+cp+md.ic.xktbW8GBlo.jpg\" alt=\"\" width=\"532\" height=\"244\" \/><\/p>\n<p><code>--partial<\/code> \u2013 This is another switch that is particularly useful when transferring large files over the internet.\u00a0 If rsync gets interrupted for any reason in the middle of a file transfer, the partially transferred file is kept in the destination directory and the transfer is resumed where it left off once the rsync command is executed again.\u00a0 When transferring large files over the internet (say, a couple of gigabytes), there\u2019s nothing worse than having a few second internet outage, blue screen, or human error trip up your file transfer and having to start all over again.<\/p>\n<p><code>-P<\/code> \u2013 this switch combines <code>--progress<\/code> and <code>--partial<\/code>, so use it instead and it will make your rsync command a little neater.<\/p>\n<p><code>-z<\/code> or <code>--compress<\/code> \u2013 This switch will make rsync compress file data as it\u2019s 
being transferred, reducing the amount of data that has to be sent to the destination.\u00a0 It\u2019s actually a fairly common switch but is far from essential, only really benefiting you on transfers over slow connections, and it does nothing for the following types of files: 7z, avi, bz2, deb, gz, iso, jpeg, jpg, mov, mp3, mp4, ogg, rpm, tbz, tgz, z, zip.<\/p>\n<p><code>-h<\/code> or <code>--human-readable<\/code> \u2013 If you\u2019re using the <code>--progress<\/code> switch, you\u2019ll definitely want to use this one as well.\u00a0 That is, unless you like to convert bytes to megabytes on the fly.\u00a0 The <code>-h<\/code> switch converts all outputted numbers to human-readable format, so you can actually make sense of the amount of data being transferred.<\/p>\n<p><code>-n<\/code> or <code>--dry-run<\/code> \u2013 This switch is essential to know when you\u2019re first writing your rsync script and testing it out.\u00a0 It performs a trial run but doesn\u2019t actually make any changes \u2013 the would-be changes are still outputted as normal, so you can read over everything and make sure it looks okay before rolling your script into production.<\/p>\n<p><code>-R<\/code> or <code>--relative<\/code> \u2013 This switch tells rsync to use relative path names: the full source path given on the command line is recreated under the destination, rather than just the last part of it.\u00a0 We will use this option later in this guide so that we can make directories on the target machine with timestamps in the folder names.<\/p>\n<p><code>--exclude-from<\/code> \u2013 This switch is used to link to an exclude list that contains directory paths that you don\u2019t want backed up.\u00a0 It just needs a plain text file with a directory or file path on each line.<\/p>\n<p><code>--include-from<\/code> \u2013 Similar to <code>--exclude-from<\/code>, but it links to a file that contains directories and file paths of data you want backed up.<\/p>\n<p><code>--stats<\/code> \u2013 Not really an important switch by any means, but if you are a sysadmin, it can be 
handy to know the detailed stats of each backup, just so you can monitor the amount of traffic being sent over your network and such.<\/p>\n<p><code>--log-file<\/code> \u2013 This lets you send the rsync output to a log file. We definitely recommend this for automated backups in which you aren\u2019t there to read through the output yourself.\u00a0 Always give log files a once over in your spare time to make sure everything is working properly.\u00a0 Also, it\u2019s a crucial switch for a sysadmin to use, so you\u2019re not left wondering how your backups failed while you left the intern in charge.<\/p>\n<p>Let\u2019s take a look at our rsync command now that we have a few more switches added:<\/p>\n<blockquote><p><code>rsync -avzhP --delete --stats --log-file=\/home\/geek\/rsynclogs\/backup.log --exclude-from '\/home\/geek\/exclude.txt' -e 'ssh -p 12345' \/home\/geek\/files\/ geek2@10.1.1.1:\/home\/geek2\/files\/<\/code><\/p><\/blockquote>\n<p>The command is still pretty simple, but we still haven\u2019t created a decent backup solution.\u00a0 Even though our files are now in two different physical locations, this backup does nothing to protect us from one of the main causes of data loss: human error.<\/p>\n<h3>Snapshot Backups<\/h3>\n<p>If you accidentally delete a file, a virus corrupts any of your files, or something else happens whereby your files are undesirably altered, and then you run your rsync backup script, your backed up data is overwritten with the undesirable changes.\u00a0 When such a thing occurs (not if, but when), your backup solution did nothing to protect you from your data loss.<\/p>\n<p>The creator of rsync realized this, and added the <code>--backup<\/code> and <code>--backup-dir<\/code> arguments so users could run differential backups.\u00a0 The very <a href=\"http:\/\/rsync.samba.org\/examples.html\">first example on rsync\u2019s website<\/a> shows a script where a full backup is run every seven days, and then the changes to those files 
are backed up in separate directories daily.\u00a0 The problem with this method is that to recover your files, you have to effectively recover them seven different times.\u00a0 Moreover, most geeks run their backups several times a day, so you could easily have 20+ different backup directories at any given time.\u00a0 Not only is recovering your files now a pain, but even just looking through your backed up data can be extremely time-consuming \u2013 you\u2019d have to know the last time a file was changed in order to find its most recent backed up copy.\u00a0 On top of all that, it\u2019s inefficient to run only weekly (or even less often in some cases) incremental backups.<\/p>\n<p>Snapshot backups to the rescue!\u00a0 Snapshot backups are nothing more than incremental backups, but they utilize hardlinks to retain the file structure of the original source.\u00a0 That may be hard to wrap your head around at first, so let\u2019s take a look at an example.<\/p>\n<p>Pretend we have a backup script running that automatically backs up our data every two hours.\u00a0 Whenever rsync does this, it names each backup in the format of: Backup-month-day-year-time.<\/p>\n<p>So, at the end of a typical day, we\u2019d have a list of folders in our destination directory like this:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-175015\" src=\"http:\/\/www.howtogeek.com\/wp-content\/uploads\/2013\/11\/527x86x7-timedirectories.jpg.pagespeed.gp+jp+jw+pj+js+rj+rp+rw+ri+cp+md.ic.0qMv7SgQCu.jpg\" alt=\"\" width=\"527\" height=\"86\" data-pin-nopin=\"true\" \/><\/p>\n<p>When traversing any of those directories, you\u2019d see every file from the source directory exactly as it was at that time.\u00a0 Yet, there would be no duplicates across any two directories.\u00a0 rsync accomplishes this with the use of hardlinking through the <code>--link-dest=DIR<\/code> argument.<\/p>\n<p>Of course, in order to have these nicely- and neatly-dated directory names, 
we\u2019re going to have to beef up our rsync script a bit.\u00a0 Let\u2019s take a look at what it would take to accomplish a backup solution like this, and then we\u2019ll explain the script in greater detail:<\/p>\n<blockquote><p><code>#!\/bin\/bash<\/code><\/p>\n<p><code>#copy old time.txt to time2.txt<\/code><\/p>\n<p><code>yes | cp ~\/backup\/time.txt ~\/backup\/time2.txt<\/code><\/p>\n<p><code>#overwrite old time.txt file with new time<\/code><\/p>\n<p><code>echo `date +\"%F-%I%p\"` &gt; ~\/backup\/time.txt<\/code><\/p>\n<p><code>#make the log file<\/code><\/p>\n<p><code>echo \"\" &gt; ~\/backup\/rsync-`date +\"%F-%I%p\"`.log<\/code><\/p>\n<p><code>#rsync command<\/code><\/p>\n<p><code>rsync -avzhPR --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r --delete --stats --log-file ~\/backup\/rsync-`date +\"%F-%I%p\"`.log --exclude-from ~\/exclude.txt --link-dest=\/home\/geek2\/files\/`cat ~\/backup\/time2.txt` -e 'ssh -p 12345' \/home\/geek\/files\/ geek2@10.1.1.1:\/home\/geek2\/files\/`date +\"%F-%I%p\"`\/<\/code><\/p>\n<p><code>#don't forget to scp the log file and put it with the backup<\/code><\/p>\n<p><code>scp -P 12345 ~\/backup\/rsync-`cat ~\/backup\/time.txt`.log geek2@10.1.1.1:\/home\/geek2\/files\/`cat ~\/backup\/time.txt`\/rsync-`cat ~\/backup\/time.txt`.log<\/code><\/p><\/blockquote>\n<p>That would be a typical snapshot rsync script.\u00a0 In case we lost you somewhere, let\u2019s dissect it piece by piece:<\/p>\n<p>The first line of our script copies the contents of time.txt to time2.txt.\u00a0 The yes pipe is to confirm that we want to overwrite the file.\u00a0 Next, we take the current time and put it into time.txt.\u00a0 These files will come in handy later.<\/p>\n<p>The next line makes the rsync log file, naming it rsync-date.log (where date is the actual date and time).<\/p>\n<p>Now, the complex rsync command that we\u2019ve been warning you about:<\/p>\n<p><code>-avzhPR, -e, --delete, --stats, --log-file, 
--exclude-from, --link-dest<\/code> \u2013 Just the switches we talked about earlier; scroll up if you need a refresher.<\/p>\n<p><code>--chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r<\/code> \u2013 These are the permissions for the destination directory.\u00a0 Since we are making this directory in the middle of our rsync script, we need to specify the permissions so that our user can write files to it.<\/p>\n<p><b>The use of date and cat commands<\/b><\/p>\n<p>We\u2019re going to go over each use of the date and cat commands inside the rsync command, in the order that they occur.\u00a0 Note: we\u2019re aware that there are other ways to accomplish this functionality, especially by declaring variables, but for the purpose of this guide, we\u2019ve decided to use this method.<\/p>\n<p>The log file is specified as:<\/p>\n<blockquote><p><code>~\/backup\/rsync-`date +\"%F-%I%p\"`.log<\/code><\/p><\/blockquote>\n<p>Alternatively, we could have specified it as:<\/p>\n<blockquote><p><code>~\/backup\/rsync-`cat ~\/backup\/time.txt`.log<\/code><\/p><\/blockquote>\n<p>Either way, the <code>--log-file<\/code> option should be able to find the previously created dated log file and write to it.<\/p>\n<p>The link destination directory is specified as:<\/p>\n<blockquote><p><code>--link-dest=\/home\/geek2\/files\/`cat ~\/backup\/time2.txt`<\/code><\/p><\/blockquote>\n<p>This means that the <code>--link-dest<\/code> option is given the directory of the previous backup.\u00a0 If we are running backups every two hours, and it\u2019s 4:00PM at the time we ran this script, then the <code>--link-dest<\/code> option looks for the directory created at 2:00PM and only transfers the data that has changed since then (if any).<\/p>\n<p>To reiterate, that is why time.txt is copied to time2.txt at the beginning of the script, so the <code>--link-dest<\/code> option can reference that time later.<\/p>\n<p>The destination directory is specified 
as:<\/p>\n<blockquote><p><code>geek2@10.1.1.1:\/home\/geek2\/files\/`date +\"%F-%I%p\"`<\/code><\/p><\/blockquote>\n<p>This simply puts the source files into a directory named after the current date and time.<\/p>\n<p>Finally, we make sure that a copy of the log file is placed inside the backup.<\/p>\n<blockquote><p><code>scp -P 12345 ~\/backup\/rsync-`cat ~\/backup\/time.txt`.log geek2@10.1.1.1:\/home\/geek2\/files\/`cat ~\/backup\/time.txt`\/rsync-`cat ~\/backup\/time.txt`.log<\/code><\/p><\/blockquote>\n<p>We use secure copy on port 12345 to take the rsync log and place it in the proper directory.\u00a0 To select the correct log file and make sure it ends up in the right spot, the time.txt file must be referenced via the cat command.\u00a0 If you\u2019re wondering why we decided to cat time.txt instead of just using the date command, it\u2019s because a lot of time could have transpired while the rsync command was running, so to make sure we have the right time, we just cat the text document we created earlier.<\/p>\n<h3>Automation<\/h3>\n<p>Use <a href=\"http:\/\/www.howtogeek.com\/101288\/how-to-schedule-tasks-on-linux-an-introduction-to-crontab-files\/\">Cron on Linux<\/a> or <a href=\"http:\/\/www.howtogeek.com\/123393\/how-to-automatically-run-programs-and-set-reminders-with-the-windows-task-scheduler\/\">Task Scheduler on Windows<\/a> to automate your rsync script.\u00a0 One thing you have to be careful of is making sure that you end any currently running rsync processes before starting a new one.\u00a0 Task Scheduler seems to close any already running instances automatically, but for Linux you\u2019ll need to be a little more creative.<\/p>\n<p>Most Linux distributions can use the pkill command, so just be sure to add the following to the beginning of your rsync script:<\/p>\n<blockquote><p><code>pkill -9 rsync<\/code><\/p><\/blockquote>\n<h3>Encryption<\/h3>\n<p>Nope, we\u2019re not done yet.\u00a0 We finally have a fantastic 
(and free!) backup solution in place, but all of our files are still susceptible to theft.\u00a0 Hopefully, you\u2019re backing up your files to some place hundreds of miles away.\u00a0 No matter how secure that faraway place is, theft and hacking can always be problems.<\/p>\n<p>In our examples, we have tunneled all of our rsync traffic through SSH, so that means all of our files are encrypted while in transit to their destination.\u00a0 However, we need to make sure the destination is just as secure.\u00a0 Keep in mind that the SSH tunnel only encrypts your data as it is being transferred, and the files are wide open once they reach their destination.<\/p>\n<p>One of rsync\u2019s best features is that it only transfers the changes in each file.\u00a0 If you have all of your files encrypted and make one minor change, the entire file will have to be retransmitted as a result of the encryption completely randomizing all of the data after any change.<\/p>\n<p>For this reason, it\u2019s best\/easiest to use some type of disk encryption, such as <a href=\"http:\/\/www.howtogeek.com\/howto\/6229\/how-to-use-bitlocker-on-drives-without-tpm\/\">BitLocker<\/a> for Windows or <a href=\"http:\/\/code.google.com\/p\/cryptsetup\/wiki\/DMCrypt\">dm-crypt<\/a> for Linux.\u00a0 That way, your data is protected in the event of theft, but files can be transferred with rsync and your encryption won\u2019t hinder its performance.\u00a0 There are other options available that work similarly to rsync or even implement some form of it, such as Duplicity, but they lack some of the features that rsync has to offer.<\/p>\n<p>After you\u2019ve set up your snapshot backups at an offsite location and encrypted your source and destination hard drives, give yourself a pat on the back for mastering rsync and implementing the most foolproof data backup solution possible.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The Non-Beginner\u2019s Guide to Syncing Data with Rsync The rsync protocol 
can be pretty simple to use for ordinary backup\/synchronization jobs, but some of its more advanced features may surprise you.\u00a0 In this article, we\u2019re going to show how even &#8230; <a class=\"more-link\" href=\"https:\/\/www.wildow.com\/blog\/?p=1889\">Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[],"class_list":["post-1889","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1889","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1889"}],"version-history":[{"count":1,"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1889\/revisions"}],"predecessor-version":[{"id":1890,"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1889\/revisions\/1890"}],"wp:attachment":[{"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1889"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1889"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wildow.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1889"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}