Turn your old Raspberry Pi into an automatic backup server

If you're one of those people upgrading to the Raspberry Pi 3, you might wonder what to do with your old, lesser Pi. Aside from turning it into an array of blinking LEDs to entertain your cat, you might consider configuring it as a backup server for your home network.

Making backups of our digital lives is, as most of us begrudgingly admit, the most important part of daily computing that none of us actually bothers to do. That's because the backup process requires us to remember to do it, it takes effort, and it takes time. And that's precisely why the best backup solution is the one you don't have to do at all: the one you automate.

Such a system is best when it's always on, running in the background, and that's exactly what a Raspberry Pi is good at. You can leave the Pi on all day and all night and never notice it on your power bill, and you can task it with the simple job of running backups across your home network. All you need is a Raspberry Pi and a big hard drive, and you have built, essentially, a custom version of those annoying "easy backup" appliances that hard drive companies release every few years (you know the ones: the ones you hook up to your network and waste a weekend trying to configure, only to discover in a hidden online forum that nothing works as advertised due to a bug in the firmware, which the hard drive company promised to fix "soon" two years ago).

rdiff-backup

First, you need to choose the backup software that your backup server (your Pi) and your clients (your laptop, desktop, and whatever else) will run.

There are several tools for automatic backups, but I've found over the years that most of the slick graphical backup solutions eventually fall out of maintenance and fade away, forcing me to switch to something different. That gets annoying after a while, so I started using rsync, the venerable old UNIX command that's been around for decades. It served me quite well, but I started wanting versioned backups of certain files; rsync copies files that have changed, but it overwrites the old version with the new one, so if my problem isn't that a file has been deleted but that I've messed up a file beyond recognition, then having rsync'd backup files doesn't do me a bit of good, because the backup almost always ends up being the bad version of the file that I was looking to replace.

Then I found rdiff-backup, a simple backup tool based on rsync (it uses librsync), which means it inherits rsync's reliability (it has, however, only been around since 2001, so it doesn't have quite the history that rsync has). Rdiff-backup performs incremental backups locally or over a network using standard UNIX tools (tar, rdiff, rsync, and so on), so even if it does fade away someday, the backup files it creates will still be useful. It's lightweight and runs on both Linux and FreeBSD, so it's trivial to run even on the oldest Raspberry Pi.

Server install

You don't need any special setup to turn your Raspberry Pi into a backup server. Assuming your Pi is up and running, all you need to do is install rdiff-backup from your repository, ports, or extras site.
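
For example, on Raspbian or another Debian-based distribution, that probably amounts to nothing more than (the package is named rdiff-backup in most repositories):

$ sudo apt-get install rdiff-backup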

Client install

As for your clients (that is, the computers that are going to get backed up by your Pi), rdiff-backup can be run on Linux, BSD, Windows, and Mac OS X, so chances are you can use this for all the computers running in your home.
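
The exact install command depends on each client's package manager; as a rough guide (package names may vary slightly by distribution):

$ sudo apt-get install rdiff-backup    # Debian, Ubuntu, and derivatives
$ sudo dnf install rdiff-backup        # Fedora and friends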

The big hard drive

Even a 64GB SD card isn't going to go very far for incremental backups, so you'll need a big hard drive to hook up to your Pi. You know your own data best, so let that be your guide when shopping for a drive. For my home network, I have a relatively small (given the number of multimedia files I work with) 3TB drive. I do that for a number of reasons, but primarily because I don't actually back up all of the data I own. A lot of the data I work with exists elsewhere anyway, so there's no need for me to back it up, and I don't consider things like my music and movie collection vital enough to back up, either. So don't feel like you have to literally keep track of every last kilobyte; just get to know your data and what matters to you most.

Once you've got the hard drive, hook it up to your Pi and format it. Strictly speaking, you may not absolutely have to format it, but if you're going to have Linux manage the data then you may as well store the data on a native filesystem. This assumes that your backup drive is either new or a drive you want to wipe completely. If not, you can skip this part.

To format a drive on Linux, you need root permissions. How you get them depends somewhat on which distribution your Pi is running (Raspbian, Pidora, and so on), but usually the sudo command is the way to invoke them. No matter what, the tool to use is parted, and as long as you have no other drives attached to your Pi (aside from the SD card it booted from), the location of your drive is /dev/sda. For safety, the examples below use the placeholder /dev/sdX just to avoid potential copy-paste mishaps; substitute the actual location of your drive.

First, confirm the location of your drive:

$ sudo ls -1 /dev/sd*
/dev/sdX
/dev/sdX1
/dev/sdX2
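
If your distribution includes lsblk (most do), it also gives a quick overview of every attached drive, its size, and its model, which makes it harder to pick the wrong device:

$ lsblk -o NAME,SIZE,MODEL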

Then run parted on the drive to confirm its total size:

$ sudo parted /dev/sdX unit MB print
Model: Tycoon hard drive Corp. (scsi)
Disk /dev/sdX: 1985442MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Look at the line that starts with Disk; this gives you the total size of the drive in megabytes. Jot that down somewhere, because you'll need it in a moment.

Next, create a new partition on the drive, spanning the entire drive. Only do this if you want to wipe the backup drive completely to make room for all your backups. If there is any data on the drive that you do not want to disappear forever, then do not do this.

$ sudo parted /dev/sdX mklabel gpt
Warning: The existing disk label on /dev/sdX will be destroyed
and all data on this disk will be lost. Do you want to continue?
Yes/No? Yes
Information: You may need to update /etc/fstab.
$ sudo parted /dev/sdX mkpart primary 1 1985442
Information: You may need to update /etc/fstab.
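
If you want to double-check your work before formatting, print the partition table again; the new partition should appear in the listing:

$ sudo parted /dev/sdX print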

Your fresh partition exists now, so create a filesystem inside of it. Note that for this command, you use the partition rather than the disk location; instead of /dev/sdX, for example, you use /dev/sdX1. For best results, also give the filesystem a label (the -L option), which you'll use later to auto-mount the drive.

$ sudo mkfs.ext4 -L backupdrive /dev/sdX1

Your drive is now ready for its life as a backup drive.

Auto-mounting the backup drive

The idea of using a Pi for your backup server is, in part, that it'll always be on. But if something does happen (a power failure, for example, or an accidental shutdown), you want your backup drive to be re-mounted automatically, or else any attempt to back up will fail.

To set up auto-mounting for your drive, first create a standard location for it to be mounted. Drives are usually mounted to locations like /media or /run/media, which is fine, but for simplicity just create a directory for it at the root of your filesystem:

$ sudo mkdir /backupdrive

And then edit /etc/fstab with root privileges in the text editor of your choice. Add this line:

LABEL=backupdrive     /backupdrive    ext4   user,rw  0 0

And finally mount the drive:

$ sudo mount -a
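
To confirm that the drive landed where you expect, check the mount point (the size and device name will of course differ on your system):

$ df -h /backupdrive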

The initial backup

The first backup you do is the largest and slowest because everything that you want backed up is getting copied to your drive. Subsequent backups are much smaller and faster because only new files and the changes to existing files get copied over.

First, install rdiff-backup on the client computer (the one to be backed up to the Pi). It's available for the major operating systems.

To make sure that your future backups go as expected, make your first backup using the same command and same setup that you intend to use for the incremental backups. That means you shouldn't disconnect the big drive from the Pi and plug it into the client so that it goes faster; perform every backup the same way every time, so that you know exactly how to automate it later.

On the Pi, make a directory for the folder you are about to back up from your client. Assuming you want to back up the client's home directory, create a mirror of that folder on the backup drive:

$ sudo mkdir -p /backupdrive/home/seth

And then make sure that the same user owns the directory:

$ sudo chown seth:users /backupdrive/home/seth

This assumes that user seth exists both on the client and on the Pi. You don't have to do it that way (rdiff-backup can sign into the Pi as a different user), but it sometimes makes it easier to manage when the backups are mirrors of the source.

This also assumes that you are backing up your home directory. That's usually a good place to start (I assume that if you're running Linux, then you can download and replace the base system for free), but you might want to leave out large files that you don't need to back up. List files and folders to exclude from backups in a file called .excludes in your home directory. At the very least, you can probably safely exclude your trash directory:

$ echo "$HOME/.local/share/Trash" >> $HOME/.excludes
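
You can keep adding entries the same way; a hypothetical .excludes might end up looking something like this (the paths here are only examples):

$ cat $HOME/.excludes
/home/seth/.local/share/Trash
/home/seth/.cache
/home/seth/Downloads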

The basic rdiff-backup command, run from your client computer (where 192.168.3.14 is the IP address of your Pi), looks like this:

$ rdiff-backup --terminal-verbosity 8 --print-statistics \
--exclude-globbing-filelist $HOME/.excludes \
/home/seth/ seth@192.168.3.14::/backupdrive/home/seth/

That command should kick off a lengthy rsync process in which every file is found to be missing from the backup drive and is therefore copied from the client to the Pi. If it fails, check the permissions involved: your user (on the Pi) must be able to write to the backup drive, and your user must be able to successfully SSH into the Pi remotely.
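
You can test both conditions by hand before trying again; for example (assuming the same user and address as above):

$ ssh seth@192.168.3.14 touch /backupdrive/home/seth/write-test
$ ssh seth@192.168.3.14 rm /backupdrive/home/seth/write-test

If the touch succeeds, both SSH access and write permission are in order.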

Auto login

Since our aim is to automate this process, the login that kicks off each backup must also happen without intervention. It's easy to make SSH login automatic: just use SSH key login. This can be done as a single step with ssh-copy-id (which should be in your Pi distro's repository). To use a special key just for this backup server, use the SSH config file to specify which key to use.
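
A minimal sketch of that setup, assuming the Pi's address is 192.168.3.14 and using a dedicated key named pi_backup (the key name is just an example):

$ ssh-keygen -f ~/.ssh/pi_backup -N ""    # no passphrase, so cron can use the key unattended
$ ssh-copy-id -i ~/.ssh/pi_backup.pub seth@192.168.3.14

And in ~/.ssh/config on the client:

Host 192.168.3.14
    User seth
    IdentityFile ~/.ssh/pi_backup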

Cron job

Assuming everything has worked so far, there's no reason an unattended backup should fail. To make that happen, take the same command you used for the initial backup and assign it to a cron job. This is generally done with the command crontab -e:

0 */6 * * * rdiff-backup --exclude-globbing-filelist /home/seth/.excludes /home/seth/ seth@192.168.3.14::/backupdrive/home/seth/

That cron job runs the backup command every six hours (on the hour). You can adjust the frequency according to your needs.
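
For example, a hypothetical entry that runs once a day at 2 a.m. instead would look like this:

0 2 * * * rdiff-backup --exclude-globbing-filelist /home/seth/.excludes /home/seth/ seth@192.168.3.14::/backupdrive/home/seth/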

Restore data

Now that the backup has been automated, there's only one command you actually need to remember: how to restore a file from the backups you are so dutifully making.

A simple restore is as easy as an rsync or scp:

$ rdiff-backup --restore-as-of now \
seth@192.168.3.14::/backupdrive/home/seth/paint/tux.svg \
~/paint/tux.svg

This command restores, from the backup server, the most recent version of tux.svg to the same path on your client machine. Notice that you don't have to worry about special file paths to account for versions; if you want the most recent version, you just restore the same path that is missing or that you have corrupted, and rdiff-backup resolves that request to the most recent version.

But the --restore-as-of option is more flexible than that. Maybe the version of the file you need is from five days ago:

$ rdiff-backup --restore-as-of 5D \
seth@192.168.3.14::/backupdrive/home/seth/paint/tux.svg \
~/paint/tux.svg

There are several other ways to restore files, and they're all listed in the official rdiff-backup documentation, but in practice I have found that the --restore-as-of option is the one that gets used most often. In the less common circumstance that you know the exact day and time of the last good version of a file and need to pull it very specifically from your backups, rdiff-backup handles that for you too; you just have to get the rather unwieldy diff filename, stored alongside your backup data on the backup drive.
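
To find that filename without digging through the drive by hand, you can ask rdiff-backup to list the increments it has stored for a given path (assuming the same server and paths used above):

$ rdiff-backup --list-increments seth@192.168.3.14::/backupdrive/home/seth/paint/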

For example:

$ rdiff-backup \
seth@192.168.3.14::/backupdrive/home/seth/rdiff-backup-data/increments/paint.2016-01-24T06:06:00-07:00.diff.gz \
$HOME

This restores paint from the backup performed at 6:06 a.m. on January 24. Of course, it does not place just the diff data of that file into your home directory; it restores a fully reconstructed version of the file. That's what rdiff-backup is for.

Back it up

Backing up is important, and your old Pi can help. Set it up today and you won't be sorry.

Seth Kenlon
Seth Kenlon is a UNIX geek, free culture advocate, independent multimedia artist, and D&D nerd. He has worked in the film and computing industry, often at the same time.

16 Comments

Nice article, Seth. I still have a Pi or two waiting for something useful to do. This would be a good way to do that.

I use a set of scripts that I have created over the years that are based on rsync. I create a new path every day and use the link-dest argument to point to the most recent previous backup. The rsync command in my script then creates hard links in today's backup directory to the files in yesterday's directory. For files that have been added or altered, it creates fresh copies in today's directory, severing the links to yesterday for only those files and leaving the previous versions in place as archives. I keep 30 days' worth of archives.

Your solution looks quite elegant, too.

Once again, the power of open source is displayed by the many powerful and customizable alternatives available for almost any task.

Heck, David, that script might be worth releasing to the public somewhere! Sounds really nice, and I'm always curious about efficiency with big blobs of non-diff'able data. Sounds like your script might be worth trying out.

In reply to by dboth

Great article, Seth! I've been cudgeling my brain for a better way to manage my backups, and this lights up a new path for me to check out!

It keeps my home network pretty well backed-up. One thing it doesn't address, obviously, is off-site backups, but I'm thinking about starting a "backup exchange" programme with a family member back in the States: I'll host his backups at my house on my Pi, and he'll host mine at his house on his Pi. Off-site backups, in data centers we each trust.

In reply to by druthb

Good project, I will try ASAP.

Never a bad idea to backup, and it's certainly better to put an old Pi to work than letting it sit and gather dust!

In reply to by Hamdta Paulo (not verified)

This is excellent Seth thanks.

Is there any way the backups can be encrypted?

I have some of my drives encrypted but that becomes a bit pointless if the backup is not.

Absolutely! The easiest way to do that, I think, would be to encrypt the volume itself. I use LUKS for this. I'll make a note that this would be a good topic for a future article, but the short version is that LUKS is native to Linux, so as long as your Pi and computer are each configured to mount and decrypt the drive, they will each freely read and write from the volume. No problem. Since rsync uses SSH, the traffic itself is also encrypted.
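
A minimal sketch of that approach, assuming the same /dev/sdX1 partition from the article (this wipes anything already on it):

$ sudo cryptsetup luksFormat /dev/sdX1
$ sudo cryptsetup luksOpen /dev/sdX1 backupdrive
$ sudo mkfs.ext4 -L backupdrive /dev/mapper/backupdrive
$ sudo mount /dev/mapper/backupdrive /backupdrive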

Any other computer that the drive is plugged into, of course, will not be able to decrypt the drive unless you yourself provide credentials.

In reply to by Dave McIntyre (not verified)

Just wondering if this could be kludged to automate backups from several classes' worth of Raspberry Pi's. I teach computer science in middle school and might have up to 45 students a day working on Pi's, though not all at the same time..... Does that sound doable?

No kludge necessary, really. It absolutely can do this. My home network has two users across four machines (nowhere near 45 users, obviously); each machine spawns its own rdiff-backup process, and the usual amounts of data are not such that a bottleneck ever occurs. I don't think that a traffic jam is likely in the classroom, either, unless your students are creating massive amounts of data.

What I would probably do, though, is just use git and teach the students to back up their OWN work! Git is easy to set up on a Pi or spare machine. Since 45 students is a lot of user accounts to manage, I have (again, in smaller settings; usually classes of no more than 10 students at a time) simply created one git repo per class, and then had each student create their own branch inside the class repository. Not exactly the intended design, but for simple student work, it tends to work OK. For advanced classes, I just have them SSH into the server and create their own git repository themselves. This is all done on a classroom-only server, within a private class LAN, obviously.
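
As a rough sketch of the one-repository-per-class idea (the server name, repository path, and branch name here are just examples):

$ ssh pi@classserver git init --bare /srv/class.git    # once, on the class server
$ git clone pi@classserver:/srv/class.git              # each student clones it
$ cd class
$ git checkout -b seth                                 # and works on a personal branch
$ git add project.py
$ git commit -m "first draft"
$ git push origin seth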

In reply to by Bob Irving (not verified)

Hmmm... If you need file versioning, then maybe it would be easier to pair rsync with a filesystem that handles it for you, like BTRFS?
Some time ago I was using BackupPC; it also uses rsync but has some deduplication options as well.

Great idea. Using BTRFS would be a great thing to investigate for the future.

In reply to by asd (not verified)

What version of the Raspberry Pi is in the image? Mine doesn't look like that.

An old one? Mine don't look like that either. I think it must be either a very early version or else it's not actually an RPi. There are lots of other SOC boards out there, and many work in basically the same way (that is to say, they can run Linux). Then again, none of mine looked like this one https://opensource.com/life/16/3/how-configure-raspberry-pi-microcontro… either until I got an A+, which I didn't even know existed til I got one on clearance.

I am curious about the image, if for no other reason than for historical curiosity.

In reply to by Jacob F Roecker

I'm the OP here. We have 20 RPi's for next year and I'm reminded that we need a solution. I've looked into PiNet, which might be perfect except that all of our stuff runs on wifi, which PiNet doesn't support. So now I'm revisiting the issue. Since students will be sharing the Pi's (perhaps as many as 60 students to 20 Pi's), I'm also worried about students overwriting each other's stuff. If only PiNet worked with wifi....

Bob, PiNet doesn't actually know whether you are using wifi or not (a node on a network is a node on the network), but PiNet is warning you that if you have slow wifi performance, then your system performance will be slow. So if you can get good enough wifi, then you could give it a go, in theory.

If not, then remember that the Pi is a Linux system; it's natively multi-user. Why not give each student a unique login so they can't overwrite one another's stuff? Without identity management, you'd have to create the users once on a master SD card, and then duplicate that card for all the other Pi's (presumably, you'll have to do that anyway, though). Use those login credentials as the basis for your shared file backups and version control.

Also, don't underestimate the kids. A local organisation, Makerbox.org.nz, runs lots of classes on Linux computers with a rotating student base. The teachers use one generic login and simply enforce that kids save their work in a folder with their name on it (which usually gets "backed up" at the end of each day, by virtue of the fact that a teacher has to copy the directory to a thumbdrive so she can check their work) and it works fine. Nobody deletes anybody else's directory.

You might try posting your questions and thoughts over on http://linuxquestions.org; it's a support forum, so there may be better ideas there. Either way, I do wonder if a "classroom management" article may be in my future....

In reply to by Bob Irving (not verified)

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.