Turn your old Raspberry Pi into an automatic backup server

Image by:

Opensource.com

If you're one of those people upgrading to the Raspberry Pi 3, you might wonder what to do with your old, lesser Pi. Aside from turning it into an array of blinking LEDs to entertain your cat, you might consider configuring it as a microcontroller.

Making backups of our digital lives is, as most of us begrudgingly admit, the most important thing of daily computing that none of us bother to do. That's because going through the backup process requires us to remember to do it, it takes effort, and it takes time. And that's precisely why the best backup solution is the solution that you don't do at all; it's the one you automate.

Such a system is best when it's always on, running in the background. And that's exactly what a Raspberry Pi is best at. You can leave the Pi on all day and all night and never notice it on your power bill, and you can task it with the simple activity of running backups across your home network. All you need is a Raspberry Pi and a big hard drive and you have built, essentially, a custom version of those annoying "easy backup" systems that hard drive companies come out with every few years (you know the ones? the ones you hook up to your network, waste a weekend trying to configure only to discover in a hidden online forum that nothing works as advertised due to a bug in the firmware, which the hard drive company promised they'll fix "soon" two years ago).

rdiff-backup

First, you need to choose some backup software to have your backup server (your Pi) and your clients (your laptop, desktop, and whatever else) run.

There are several tools for auto backups, but I've found over the years that most of the nice slick graphical backup solutions end up falling out of maintenance until they fade away, forcing me to switch to something different. That gets annoying after a while, so I started using rsync, the venerable old UNIX command that's been around for decades. This served me quite well, but I started finding myself wanting versioned backups of certain files; rsync does a backup for files that have changed, but it overwrites the old version with the new, so if my problem isn't that a file has been deleted but that I've messed up a file beyond recognition, then having rsync'd backup files don't do me a bit of good, because the backup almost always ends up being the bad version of a file that I was looking to replace.

Then I found rdiff-backup, a simple backup tool based on rsync (it uses librsync), and thereby inheriting its reliability (it has, however, only been around since 2001, so it doesn't have quite the history that rsync has). Rdiff-backup performs incremental backups locally or over a network using standard UNIX tools (tar, rdiff, rsync, and so on), so even if it does fade away, the backup files it creates are still useful. It's lightweight and runs on both Linux and FreeBSD, so it's trivial to run even on the oldest Raspberry Pi.

Server install

You don't need any special setup to turn your Raspberry Pi into a backup server. Assuming your Pi is up and running, all you need to do is install rdiff-backup from your repository, ports, or extras site.

Client install

As for your clients (that is, the computers that are going to get backed up by your Pi), rdiff-backup can be run on Linux, BSD, Windows, and Mac OS X, so chances are you can use this for all the computers running in your home.

The big hard drive

Even a 64GB SD card isn't going to go very far for incremental backups, so you'll need a big hard drive to hook up to your Pi. You know your own data best, so let that be your guide when shopping for a drive. For my home network, I have a relatively small (given the number of multimedia data files I work with) 3TB drive; I do that for a number of reasons, but primarily because I don't actually back up all of the data I own. A lot of data I work with exists elsewhere anyway, so there's no need for me to back it up, and things like my music and movie collection I don't consider vital enough to backup, either. So don't feel like you have to literally keep track of every last kilobyte; just get to know your data and what matters to you most.

Once you've got the hard drive, hook it up to your Pi and format it. Strictly speaking, you may not absolutely have to format it, but if you're going to have Linux manage the data then you may as well store the data on a native filesystem. This assumes that your backup drive is either new or a drive you want to wipe completely. If not, you can skip this part.

To format a drive on Linux, you must use root permissions. It somewhat depends on what distribution you are running on your Pi (Raspbian, Pidora, and so on), but usually the sudo command is the way to invoke this. No matter what, the tool to use is parted, and as long as you have no other drives attached to your Pi (aside from the SD card it has booted from), then the location of your drive is /dev/sda. For safety, I'll use /dev/sdx just to avoid potential copy-paste mishaps.

First, confirm the location of your drive:

$ sudo ls -1 /dev/sd*
/dev/sdX
/dev/sdX1
/dev/sdX2

Then run parted on the drive to confirm its total size:

  $ sudo parted /dev/sdX unit MB print
Model: Tycoon hard drive Corp. (scsi)
Disk /dev/sda: 1985442MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Look at the line that starts with Disk; this gives you the total size of the drive in megabytes. Jot that down somewhere, because you'll need it in a moment.

Next, create a new partition on the drive, spanning the entire drive. Only do this if you want to wipe the backup drive completely to make room for all your backups. If there is any data on the drive that you do not want to disappear forever, then do not do this.

  $ sudo parted /dev/sdX mklabel gpt
Warning: The existing disk label on /dev/sdX will be destroyed
and all data on this disk will be lost. Do you want to continue?
Yes/No? Yes
Information: You may need to update /etc/fstab.
$ sudo parted /dev/sdx mkpart primary 1 1985442
Information: You may need to update /etc/fstab.

Your fresh partition exists now, so create a filesystem inside of it. Note that for this command, you use the partition rather than the disk location. So instead of /dev/sda, for example, you would use /dev/sda1. For best results, also provide the disk with a label (the -L option), which we will use later to auto-mount the drive.

$ sudo mkfs.ext4 -L backupdrive /dev/sdx1

Your drive is now ready for its life as a backup drive.

Auto-mounting the backup drive

The idea of using a Pi for your backup server is, in part, that it'll always be on. But if something does happen (a power failure, for example, or accidental shutdown) then you want your backup drive to be re-mounted automatically or else any attempt to backup will fail.

To setup auto-mounting for your drive, first create a standard location for it to be mounted. Drives are usually mounted to locations like /media or /run/media, which is fine, but for simplicity just create a directory for it at the root of your filesystem:

$ sudo mkdir /backupdrive

And then edit /etc/fstab with root privileges in the text editor of your choice. Add this line:

LABEL=backupdrive     /backupdrive    ext4   user,rw  0 0

And finally mount the drive:

$ sudo mount -a

The initial backup

The first backup you do is the largest and slowest backup because everything that you want backed up is getting copied to your drive. Subsequent backups are much smaller and faster because only new files (or blobs) or changes to files get copied over.

First, install rdiff-backup on the client computer (the one to be backed up to the Pi). It's available for the major operating systems.

To make sure that your future backups go as expected, make your first backup using the same command and same setup that you intend to use for the incremental backups. That means you shouldn't disconnect the big drive from the Pi and plug it into the client so that it goes faster; perform every backup the same way every time, so that you know exactly how to automate it later.

On the Pi, make a directory for the folder you are about to backup from your client. Assuming you want to backup the client's home directory, create the a mirror of that folder on the backup drive:

$ sudo mkdir -p /backupdrive/home/seth

And then make sure that the same user owns the directory:

$ chown seth:users /backupdrive/home/seth

This assumes that user seth exists both on the client and on the Pi. You don't have to do it that way (rdiff-backup can sign into the Pi as a different user), but it sometimes makes it easier to manage when the backups are mirrors of the source.

This also assumes that you are backing up your home directory. That's usually a good place to start (I assume that if you're running Linux, then you can download and replace the base system for free), but you might want to leave out large files that you don't need to backup. List files and folders to exclude from backups in a file called .excludes in your home directory. At the very least, you can probably safely exclude your trash directory:

$ echo "$HOME/.local/.local/share/Trash" && $HOME/.excludes

The basic rdiff-backup command from your client computer, where 192.168.3.14 is the IP address of your Pi:

$ rdiff-backup --terminal-verbosity 8 --print-statistics \
--exclude-globbing-filelist $HOME/.excludes \
/home/seth/ seth@192.168.3.14::/backupdrive/home/seth/

That command should kick off a lengthy rsync process in which all files are discovered to not exist on the backup drive, and therefore are copied from the client to the Pi. If it failed, check the permissions involved; your user (on the Pi) must be able to write to the backup drive. Also, your user must be able to successfully SSH into the Pi remotely.

Auto login

Since our aim is to automate this process, the login process that kicks off backups must also happen without intervention. It's easy to make SSH login automatic; just use ssh key login. This can be done as a single step with ssh-copy-id, which should be in your Pi distro's repository). To use a special key just for this backup server, use the ssh config file to specify what key to use.

Cron job

Assuming everything has worked so far, there's no reason an unattended backup should fail. To make that happen, take the same command you used for the initial backup and assign it to a cronjob. This is generally done with the command cronjob -e:

0 */6 * * * rdiff-backup --exclude-globbing-filelist /home/seth/.excludes \
/home/seth/ seth@192.168.3.14::/backupdrive/seth/

That cronjob runs the backup command every six hours (on the hour). You can adjust the frequency according to your needs.

Restore data

Now that the backup has been automated, there's only one command you actually need to remember: how to restore a file from the backups you are so dutifully making.

The simplest restore command is as simple as an rsync or scp:

$ rdiff-backup --restore-as-of now \
seth@192.168.3.14::/backupdrive/seth/paint/tux.svg \
~/paint/tux.svg

This command restores from the backup server the most recent version of tux.svg to the same path on your client machine. Notice that you don't have to worry about special file paths to account for versions; if you want the most recent version, you just restore the same path that is missing or that you have corrupted, and let rdiff-backup resolves that request to the most recent version.

But the --restore-as-of option is more flexible than that. Maybe the version of the file you need is from five days ago:

$ rdiff-backup --restore-as-of 5D
seth@192.168.3.14::/backupdrive/seth/paint/tux.svg \
~/paint/tux.svg

There are several other means of restoring files, and they're all listed in the official rdiff-backup documentation, but in practice I have found that the --restore-as-of option is the one that gets used most often. In the less common circumstances that you know the exact day and time of the last good version of a file and need to pull it very specifically from your backups, rdiff-backup handles that for you too; you just have to get the rather unwieldy diff filename, stored alongside your backup data on the backup drive.

For example:

$ rdiff-backup 192.168.3.14::/backupdrive/seth/rdiff-backup-data/increments/ \
paint.2016-01-24T06:06:00-07:00.diff.gz $HOME

This restores the file paint from the backup performed at 6:06 a.m. on January 24. It does not place, of course, just the diff data of that file into your home directory, but a fully reconstructed version of the file. That's what rdiff-backup is for.