How to roll your own backup solution with BorgBackup, Rclone, and Wasabi cloud storage

Protect your data with an automated backup solution built on open source software and inexpensive cloud storage.

For several years, I used CrashPlan to back up my family's computers, including machines belonging to my wife and siblings. The fact that CrashPlan was essentially "always on" and doing frequent backups without ever having to think about it was fantastic. Additionally, the ability to do point-in-time restores came in handy on several occasions. Because I'm generally the IT person for the family, I loved that the user interface was so easy to use that family members could recover their data without my help.

Recently CrashPlan announced that it was dropping its consumer subscriptions to focus on its enterprise customers. It makes sense, I suppose, as it wasn't making a lot of money off folks like me, and our family plan was using a whole lot of storage on its system.

I decided that the features I would need in a suitable replacement included:

  • Cross-platform support for Linux and Mac
  • Automation (so there's no need to remember to click "backup")
  • Point-in-time recovery (or something close) so if you accidentally delete a file but don't notice until later, it's still recoverable
  • Low cost
  • Replicated data store for backup sets, so data exists in more than one place (i.e., not just backing up to a local USB drive)
  • Encryption in case the backup files fall into the wrong hands

I searched around and asked my friends about services similar to CrashPlan. One was really happy with Arq, but no Linux support meant it was no good for me. Carbonite is similar to CrashPlan but would be expensive, because I have multiple machines to back up. Backblaze offers unlimited backups at a good price (US$ 5/month), but its backup client doesn't support Linux. BackupPC was a strong contender, but I had already started testing my solution before I remembered it. None of the other options I looked at matched everything I was looking for. That meant I had to figure out a way to replicate what CrashPlan delivered for me and my family.

I knew there were lots of good options for backing up files on Linux systems. In fact, I've been using rdiff-backup for at least 10 years, usually for saving snapshots of remote filesystems locally. I had hopes of finding something that would do a better job of deduplicating backup data though, because I knew there were going to be some things (like music libraries and photos) that were stored on multiple computers.

I think what I worked out came pretty close to meeting my goals.

My backup solution

[Diagram: backup solution]

Ultimately, I landed on a combination of BorgBackup, Rclone, and Wasabi cloud storage, and I couldn't be happier with my decision. Borg fits all my criteria and has a pretty healthy community of users and contributors. It offers deduplication and compression, and works great on PC, Mac, and Linux. I use Rclone to synchronize the backup repositories from the Borg host to S3-compatible storage on Wasabi. Any S3-compatible storage will work, but I chose Wasabi because its price can't be beat and it outperforms Amazon's S3. With this setup, I can restore files from the local Borg host or from Wasabi.

Installing Borg on my machine was as simple as sudo apt install borgbackup. My backup host is a Linux machine that's always on with a 1.5TB USB drive attached to it. This backup host could be something as lightweight as a Raspberry Pi if you don't have a machine available. Just make sure all the client machines can reach this server over SSH and you are good to go.
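
If SSH isn't already set up, getting each client talking to the backup host is just a matter of key-based authentication. Here's a minimal sketch, assuming a dedicated borg user on the backup host (the same borg@borgserver the scripts below use):

$ ssh-keygen -t ed25519              # skip if the client already has a key
$ ssh-copy-id borg@borgserver        # authorize this client's key on the backup host
$ ssh borg@borgserver echo ok        # confirm passwordless login works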

On the backup host, initialize a new backup repository with:

$ borg init /mnt/backup/repo1

Depending on what you're backing up, you might choose to make multiple repositories per machine, or possibly one big repository for all your machines. Because Borg deduplicates, if you have identical data on many computers, sending backups from all those machines to the same repository might make sense.
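
Note that newer Borg releases may ask you to choose an encryption mode when you initialize the repository; since encryption was one of my criteria, it's worth enabling. Something like this creates a repository whose key is stored inside the repo and protected by a passphrase:

$ borg init --encryption=repokey /mnt/backup/repo1

That passphrase is the same one the backup script below supplies via BORG_PASSPHRASE.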

Installing Borg on the Linux client machines was straightforward. On Mac OS X, I needed to install Xcode and Homebrew first. I followed a how-to to install the command-line tools, then used pip3 install borgbackup.
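
For reference, the macOS steps were roughly the following (the Homebrew formula name may vary with newer releases):

$ xcode-select --install      # Xcode command-line tools
$ brew install python3        # provides pip3
$ pip3 install borgbackup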

Backing up

Each machine has a backup.sh script (see below) that is kicked off by cron at regular intervals; it will make only one backup set per day, but it doesn't hurt to try a few times in the same day. The laptops are set to try every two hours, because there's no guarantee they will be on at a certain time, but it's very likely they'll be on during one of those times. This could be improved by writing a daemon that's always running and triggers a backup attempt anytime the laptop wakes up. For now, I'm happy with the way things are working.
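
The crontab entry on each laptop looks something like this (the path to the script is just an example):

# attempt a backup every two hours; only one archive per day is actually created
0 */2 * * * /home/doc/bin/backup.sh >> /home/doc/backup.log 2>&1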

I could skip the cron job and provide a relatively easy way for each user to trigger a backup using BorgWeb, but I really don't want anyone to have to remember to back things up. I tend to forget to click that backup button until I'm in dire need of a restoration (at which point it's way too late!).

The backup script I'm using came from the Borg quick start docs, plus I added a little check at the top to see if Borg is already running, which will exit the script if the previous backup run is still in progress. This script makes a new backup set and labels it with the hostname and current date. It then prunes old backup sets with an easy retention schedule.

Here is my backup.sh script:

#!/bin/sh

REPOSITORY=borg@borgserver:/mnt/backup/repo1

#Bail if borg is already running, maybe previous run didn't finish
if pidof -x borg >/dev/null; then
    echo "Backup already running"
    exit
fi

# Setting this, so you won't be asked for your repository passphrase:
export BORG_PASSPHRASE='thisisnotreallymypassphrase'
# or this to ask an external program to supply the passphrase:
export BORG_PASSCOMMAND='pass show backup'

# Back up /home/doc, except a few
# excluded directories
borg create -v --stats                          \
    $REPOSITORY::'{hostname}-{now:%Y-%m-%d}'    \
    /home/doc                                   \
    --exclude '/home/doc/.cache'                \
    --exclude '/home/doc/.minikube'             \
    --exclude '/home/doc/Downloads'             \
    --exclude '/home/doc/Videos'                \
    --exclude '/home/doc/Music'

# Use the `prune` subcommand to maintain 7 daily, 4 weekly and 6 monthly
# archives of THIS machine. The '{hostname}-' prefix is very important to
# limit prune's operation to this machine's archives and not apply to
# other machines' archives as well.
borg prune -v --list $REPOSITORY --prefix '{hostname}-' \
    --keep-daily=7 --keep-weekly=4 --keep-monthly=6

The output from a backup run looks like this:

------------------------------------------------------------------------------
Archive name: x250-2017-10-05
Archive fingerprint: xxxxxxxxxxxxxxxxxxx
Time (start): Thu, 2017-10-05 03:09:03
Time (end):   Thu, 2017-10-05 03:12:11
Duration: 3 minutes 8.12 seconds
Number of files: 171150
------------------------------------------------------------------------------
                       Original size      Compressed size Deduplicated size
This archive:               27.75 GB             27.76 GB 323.76 MB
All archives:                3.08 TB              3.08 TB 262.76 GB

                       Unique chunks         Total chunks
Chunk index:                 1682989             24007828
------------------------------------------------------------------------------
[...]
Keeping archive: x250-2017-09-17                      Sun, 2017-09-17 03:09:02
Pruning archive: x250-2017-09-28                      Thu, 2017-09-28 03:09:02

Once I had all the machines backing up to the host, I followed the instructions for installing a precompiled Rclone binary and set it up to access my Wasabi account.
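
After running rclone config, the remote definition in ~/.config/rclone/rclone.conf ends up looking roughly like this; the endpoint and region shown here are Wasabi's US-east S3-compatible ones, and the keys are placeholders:

[wasabi]
type = s3
env_auth = false
access_key_id = YOUR_WASABI_ACCESS_KEY
secret_access_key = YOUR_WASABI_SECRET_KEY
region = us-east-1
endpoint = s3.wasabisys.com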

This script runs each night to synchronize any changes to the backup sets:

#!/bin/bash
set -e

repos=( repo1 repo2 repo3 )

#Bail if rclone is already running, maybe previous run didn't finish
if pidof -x rclone >/dev/null; then
    echo "Process already running"
    exit
fi

for i in "${repos[@]}"
do
    #Let's see how much space is used by the directory to back up;
    #if the directory is gone, or has gotten small, we will exit
    space=$(du -s "/mnt/backup/$i" | awk '{print $1}')

    if (( space < 34500000 )); then    # threshold in 1K blocks (~33 GiB)
       echo "EXITING - not enough space used in $i"
       exit
    fi

    /usr/bin/rclone -v sync /mnt/backup/$i wasabi:$i >> /home/borg/wasabi-sync.log 2>&1
done
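
The crontab entry on the backup host that kicks this off each night is nothing fancy (the script path is just an example):

# push the Borg repositories to Wasabi once a night
30 1 * * * /home/borg/bin/wasabi-sync.sh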

The first synchronization of the backup set to Wasabi with Rclone took several days, but it was around 400GB of new data, and my outbound connection is not super-fast. The daily delta, by contrast, is very small and completes in just a few minutes.

Restoring files

Restoring files is not as easy as it was with CrashPlan, but it is relatively straightforward. The fastest approach is to restore from the backup stored on the Borg backup server. Here are some example commands used to restore:

#List which backup sets are in the repo
$ borg list borg@borgserver:/mnt/backup/repo1
Remote: Authenticated with partial success.
Enter passphrase for key ssh://borg@borgserver/mnt/backup/repo1: 
x250-2017-09-17                      Sun, 2017-09-17 03:09:02
#List contents of a backup set
$ borg list borg@borgserver:/mnt/backup/repo1::x250-2017-09-17 | less
#Restore one file from the repo
$ borg extract borg@borgserver:/mnt/backup/repo1::x250-2017-09-17 home/doc/somefile.jpg
#Restore a whole directory
$ borg extract borg@borgserver:/mnt/backup/repo1::x250-2017-09-17 home/doc

If something happens to the local Borg server or the USB drive holding all the backup repositories, I can also easily restore directly from Wasabi. If the machine has Rclone installed, using rclone mount I can mount the remote storage bucket as though it were a local filesystem:

#Mount the S3 store and run in the background
$ rclone mount wasabi:repo1 /mnt/repo1 &
#List archive contents
$ borg list /mnt/repo1
#Extract a file
$ borg extract /mnt/repo1::x250-2017-09-17 home/doc/somefile.jpg

How it's working

Now that I've been using this backup approach for a few weeks, I can say I'm really happy with it. Setting everything up and getting it running was a lot more complicated than just installing CrashPlan of course, but that's the difference between rolling your own solution and using a service. I will have to watch closely to be sure backups continue to run and the data is properly synchronized to Wasabi.

But, overall, replacing CrashPlan with something offering comparable backup coverage at a really reasonable price turned out to be a little easier than I expected. If you see room for improvement please let me know.

This was originally published on Local Conspiracy and is republished with permission.

Christopher Aedo has been working with and contributing to open source software since his college days. Most recently he can be found at Teradata where he serves as Director of Open Source, focusing on helping the organization embrace open source software through internal use and external contributions.

28 Comments

Hi Christopher, thanks for your great article. One doubt only: what do you mean with: "works great on PC, Mac, and Linux"? Do you mean "Windows PC, Mac, and Linux"? In my family there are some Windows PCs so I'm wondering if it's possible to extend your solution to support them.

Thanks,
Flavio

Flavio, yes, I should have said Windows PCs. Borg Backup will run under the Windows 10 Linux subsystem (Windows Subsystem for Linux), though their site says it's currently considered experimental. Presumably rclone would work under the Linux subsystem as well, so it should be possible to run all of this on Windows 10.

In reply to Fly66 (not verified)

Wow! I wasn't aware of Wasabi. I ended up replacing CrashPlan on my home server with a combination of CloudBerry and Backblaze B2. It's stupid cheap for the amount of data that I have. There is a GUI for people who prefer it and a CLI for admins, and for a one-time payment of $30 it's hard to beat.

Yes, Borg can do encryption according to its site, so that backup could also be encrypted. Then it would not be so easily restorable directly from Wasabi; one would need to replace the Borg server first and then resync the Borg server with the encrypted data from Wasabi.

You say "not so easy," but it looks easy enough. From the article, the remote Wasabi Borg repository may be mounted as a local directory with rclone, so one wouldn't necessarily need to sync the whole Borg repo back in order to restore some files. Though for large Borg repos, restores may be considerably slow. I'd recommend periodically backing up a copy of the local Borg repo to a removable hard drive for faster access in such a disaster. In fact, if you use external USB drives, you should be rotating several of these off site regardless.

In reply to Lars Schotte (not verified)

Thank you Christopher for the great resource that I was not aware of.

You can actually restore individual files easily, using 'borg mount' which will mount a backup as a FUSE filesystem.
You also don't need to check if borg is running in your script; operations like create will put a lock on the repo, and any subsequent operation will fail.
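
For anyone curious, restoring a single file that way looks roughly like this (archive name and mount point are just examples):

$ mkdir -p /tmp/borgmount
$ borg mount borg@borgserver:/mnt/backup/repo1::x250-2017-09-17 /tmp/borgmount
$ cp /tmp/borgmount/home/doc/somefile.jpg ~/
$ borg umount /tmp/borgmount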

Hi Christopher,
what happens if you back up very similar directories to the same repo? (E.g., Box A backs up, then Box B, but it misses some files from Box A.) Will these files vanish on A if you restore A from the repo?
How do you organize this?

Let's say you back up box A into archives named boxA- and box A into archives named boxB-, all of them into the same borg repo in order to maximize deduplication.

To restore boxA you will use an archive named boxA-. The same for boxB.

In reply to jeff jurugu (not verified)

Of course, I meant to say:

Let's say you back up box A into archives named boxA- and box B into archives named boxB-
^^

Sorry.

In reply to mahikeulbody (not verified)

If you rclone a corrupted borg repo, you get a corrupted Wasabi archive...

Yes, this is indeed a problem! There need to be consistency checks before uploading!

First: Serious corruption will -- afaics -- occur when you use Rclone while there is an ongoing commit/backup by one of your computers. You said that the laptops back up irregularly, and afaics there is no checking for this here.
You hence need at least to add the `pidof -x borg` check from your backup.sh to your Rclone script.

Second: Mild corruption will occur when the backup process breaks at some point. E.g. laptop shuts off/powers off, WiFi breaks, ... You said that you want things to work unnoticed in the background. That makes a realistic chance for this to happen. Divide the risk by the awareness of the end user. :-)
Usually borg is able to repair these cases. BUT i) you have to notice and do this manually, ii) I don't know whether this always works/is bullet-proof, iii) in the current setup described in the post, there is no check that the backup succeeded. Using cron in the background, it might happen that your backup fails once and all subsequent backups fail unnoticed.

Third: There are myriads of other potential vectors (e.g. data degradation [1], failing hardware [2], ...). You might add a `borg check` as well as a separate checksum comparison of both archives prior to Rclone.

Anyway, thank you for the nice article! :-)

[1] https://en.wikipedia.org/wiki/Data_degradation
[2] https://github.com/borgbackup/borg/issues/3602

In reply to mahikeulbody (not verified)

Hello Christopher,

Before asking my question (probably a stupid one), let me explain my test case:

I tested BorgBackup on macOS High Sierra with HubiC, using /test-borg as the HubiC synchronization folder:

1 - borg init --encryption=repokey /test-borg

2 - borg create /test-borg::friday /test

Files/folders are then created in /test-borg:

test-borg/
├── data/
│ ├── 0/
│ ├── 1
│ ├── 3
│ ├── 4 (5MB in size)
│ ├── 5

The file "4" is in fact the encrypted archive that contains the 2 PDF files from the folder /test being backed up (test1.pdf and test2.pdf).

3 - The file 4 is then sent to HubiC.

I make another backup without changing the contents of the /test folder:

4 - borg create /test-borg::saturday /test

test-borg/
├── data/
│ ├── 0/
│ ├── 1
│ ├── 3
│ ├── 5
│ ├── 7
│ ├── 8 (5MB in size)
│ ├── 9

Observation: the file "4" is renamed to "8". The file "4" is removed from HubiC and the file "8" is uploaded.

Note: With the normal use of HubiC, creating a backup without modification still involves a re-upload of the data (due to the renaming), even though nothing has changed in the folder to back up.

Does using cloud storage like Wasabi have a different mechanism?
(Renaming the file directly on Wasabi, so no data re-upload)

I do not have a fast upload connection (40 minutes for 300 MB), so if all the files are renamed it is as if we re-upload everything every time.

Thank you for your article.

I reply to myself:
The case described above is valid only for a small total file size.
I continued my tests with more data (2 GB), and only newly added files are uploaded to HubiC.
To summarize: the best backup solution (encrypted and compressed) for my needs.

Thanks for this article, which helped me find a backup solution.

In reply to Loop (not verified)

I observed the same thing, and even though it does start reusing files at some point, having to upload all the data every day bugs me. I'm going to find another backup program because of that. Duplicati is promising, but it's been alpha for ages and just went beta.

In reply to Loop (not verified)

A great article with interesting backup solutions!

Maybe you should also mention the privacy aspect of a backup solution. All your family images go out to an external cloud server. I would not like to give all my private images/docs of kids and parents out of hand just to save a few bucks or for convenience.

Today it's very easy to do a backup system in house. If you use macOS in the family, it has automatic backups to multiple external hard drives included, and it's encrypted; the best part is it does all this without pushing any button. If you also need Linux, I think a Raspberry Pi solution with external backup disks is a good investment.
_
By the way, the comments captcha does not work if you have privacy mode on; I had to turn off "save sites" to post this comment. It's also not good to give out captcha data from all your commenting users to an external commercial company like G, who collects and uses data.

Both borg and rclone have options to encrypt the data before sending it to remote storage, so the cloud server can't read it without a key. Of course, this means that you need to keep the key in a separate, safe place.

It's important to have off-site backups in case e.g. your house burns down.

In reply to John Snow (not verified)

Borg requires you to choose an encryption mode when creating the repository, so you would really have to want your files to lie unencrypted somewhere. Otherwise, BLAKE2b and SHA-256 are pretty solid and can be considered safe for uploading somewhere. And if you're really paranoid or eager, you can configure your SSH encryption accordingly so a weaker cipher isn't used for uploading.

In reply to John Snow (not verified)

Thank you Christopher for this article! I'm facing the same issue and I decided few weeks ago to implement both backup.
However, I didn't know about Wasabi Cloud Storage. You recommend to mount the Wasabi Storage to restore a specific file if needed (via Borg mount). How is it seen from Wasabi point of view? Isn't it considered as a full download of the archive? If so, I think the solution is less cost-effective because Wasabi will apply fees for the full restoration.

I'm going to start this response with I COULD BE WRONG! (I don't want to take responsibility for someone being hit with a big transfer bill if I am mistaken...)

That said - my understanding is that when you mount an s3 bucket as a file system, you're able to access blocks randomly. So you wouldn't need to stream the entire backup just to get a small portion of it. In my own poking around for tests I was definitely able to pull out a file without (as far as I could tell) streaming the entire backup. I would suggest a quick test though to validate, as I am pretty sure you can do a partial restore without streaming the entire backup archive.

In reply to Jeremy Fritzen

By the way, could you give us some feedback about the billing policy?

I read the whole section related to the billing policy. It's pretty clear but I want to make sure about the billing before sending my data.

---My use case---
I plan to send my local data backup archive every day to the Wasabi Cloud Storage. The backup archive will always have the same name so the archive on Wasabi will be overwritten every day.

Let's say my archive is 1 TB, the bill for 1 month should be 3,99$.
If I understand correctly, I will be charged for 90 days of storage for this archive, even if I overwrite it the second day, right?

Then, I understand that 1 archive file will cost me 3,99$ / 30 * 90 = 11,97$.
As I will overwrite this archive every day with a new daily archive, is it right to think that for 1 year I will be charged:

11,97$ * 365 = 4369,05$ ?

Thank you very much for your help

Bye!

Jeremy

In reply to docaedo

Just a side note: If you save your passphrase in plain text, make sure to consider who has access to that file, e.g., by setting file permissions to 700. On larger multi-user networks, it often happens that home folders are world-readable.

Hey Christopher, good post, thanks for taking the time! My biggest issue is having a stable solution to back up the Windows machines. For all my Ubuntu machines, borg is an awesome way to go... I've been breaking my head trying to make the Windows port of it work, but no luck. I have an Ubuntu file server (that has borg on it at the moment) which is a VM backed by a Ceph cluster. What comes to mind for my use case? What would be a good way to get Windows files to the borg server? I'm almost thinking of setting up a temporary in-between server on Ubuntu as well that would receive raw data from the Windows clients and then borg it over to the main repo... but that doesn't feel clean!

Currently, my backup solution is 2 BTRFS HDDs, with snapshots that are rotated: every day, when a new snapshot is taken and copied to the other disk, the oldest one is deleted. Example:

Day 1 bkps:

A, B, C, D

Day 2 bkps:

B, C, D, E

I'm thinking of getting rid of the second BTRFS snapshots bkp and turning it into a borg repo; however, I'm failing to see whether it would be possible to delete an older borg bkp and keep my data, the same way I do with BTRFS.

Because I'm afraid that if, let's say, 2 years from now I have to restore all my files because a meteor hit my house, I will have to have over 720 files (one per day, for 2 years) to get all my data. Is it possible to keep deleting the oldest borg bkp and keep all the files?

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.