Creating an Append-Only Backup with Restic

I wanted to back up my data. Easy premise, but harder to get right.

There are a few questions that need to be answered if you want to create a backup:

  1. What backup strategy do I want to use?
  2. Which software do I want to use?
  3. Where will I store my backup?

At the end, I will present my current restic configuration.

Strategy

Creating a backup strategy means thinking about what sort of incidents you want to recover from. The (data) loss of your drive? The loss of your drive and one backup? Two? There are other considerations apart from disk failure as well, like where you and your backup(s) are located and how frequent natural or human-made disasters are there.

Consider how long you want to store the data and how important it is, and then search for a solution that fits your risk acceptance.

A good starting point for your backup strategy is the 3-2-1 rule [1]: keep three copies, namely the version you work on and two backups. The backups should be on two different types of media, and one of the copies should be offsite.

For me, this means that the data living on my computer's and server's disks gets mirrored to my NAS and to cloud storage. The data that lives only on my NAS will be mirrored to cloud storage as well, but since that is only one copy, I will create another backup on a hard drive and give it to a relative.

This way, I will have at least three copies (PC/server, NAS, cloud), on two mediums (hard drives and cloud storage), and offsite backups. This should save me from data loss, the cloud provider going down, and disasters.

Software

Overview

Now that you know where to store the data, you have to decide which software to use.

There are many ways to store backups: a simple copy, compressed archives, incremental backups, or deduplicated archives.

A simple copy takes up a lot of disk space. Compressed archives save some of that, but you still have to store each version of the files separately, which wastes a lot of space. Incremental backups avoid this by only storing changes relative to a full backup, but if one increment is lost or corrupted, every backup that builds on it may be broken.

A good compromise is deduplicating backups. They are basically full backups, but the data is split into chunks. If there are duplicate chunks, e.g. the same file from two different points in time, they are stored only once (see footnote 1).

If you want to recover your data, the chunks are read and reassembled into your files. There are some dangers here as well, like bit flips that might alter some chunks, but such corruption usually affects only one file (and can sometimes be repaired).

Another thing you want to consider is encryption, especially if the backups are not under your control. Encryption is a great way to ensure that no one can read your data, but should you lose access to your keys, it is also a great way to ensure that you can't read your own data.

Restic

In the end, I wanted a deduplicated and encrypted backup. There are many programs written for this use case, but for me it came down to borg or restic. I chose restic for its encryption, since borg's is weaker if multiple clients update the same repository. [2]

Restic does everything I want, is a single binary (portability!), and has an explicit threat model that works for me.
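
For example, restic can show the effect of deduplication by comparing the logical size of all snapshots with the data actually stored; the repository path below is just a placeholder:

# Size of a full restore of all snapshots (logical, before deduplication)
restic -r /srv/restic-repo stats --mode restore-size

# Size of the deduplicated chunks actually stored in the repository
restic -r /srv/restic-repo stats --mode raw-data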

A big disadvantage is its key management. In restic, your password doesn't encrypt the data directly, but rather a key file. This key file holds the material used to derive the key that actually encrypts your files. This means that an attacker who has once obtained a decrypted key file can read all the repository data forever, even if their key is later revoked. This is, of course, only a worry if the attacker has access to the repository storage, but still something to keep in mind.
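
To illustrate: restic's key commands manage password entries, but removing a key does not re-encrypt the repository, so a key file captured earlier keeps working against the existing data. The repository URL below matches the configuration further down, the key ID is a placeholder:

# List all keys (password entries) of the repository
restic -r s3:https://s3.amazonaws.com/my-restic-repo key list

# Add a new password and remove the compromised one by its ID
restic -r s3:https://s3.amazonaws.com/my-restic-repo key add
restic -r s3:https://s3.amazonaws.com/my-restic-repo key remove <key-id>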

Storage Provider

Since an attacker who can access restic keys can also read all the other data on my system, protecting the repository against being read is somewhat pointless and has to be solved by other means. I can, however, protect myself against an attacker deleting or overwriting my backups by making them append-only: once data is written, it can't be overwritten or deleted.

There are two ways to achieve this: with a VPS that has enough storage and rclone (see: ruderich.org), or with a storage provider. A VPS just for backups would have meant a lot of manual work, since I would have wanted to keep it out of my usual infrastructure. I therefore chose managed storage.
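
For completeness, a minimal sketch of the VPS variant using rclone's built-in restic REST server in append-only mode; the host name and paths are examples:

# On the VPS: serve a local directory over restic's REST protocol;
# --append-only rejects requests that delete or overwrite existing data
rclone serve restic --append-only --addr :8080 /srv/restic-repo

# On the client: initialize and back up against the REST endpoint
restic -r rest:http://backup-vps:8080/ init
restic -r rest:http://backup-vps:8080/ backup /home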

I considered quite a few providers for storing my backups. On the list were:

  • Hetzner Storage Box
  • Scaleway
  • OVH
  • rsync.net
  • Backblaze
  • Wasabi
  • AWS
  • Some VPS to store my data on

My goal was to find a European storage provider with S3 versioning and IAM access policies that can restrict deletion to subdirectories (locks/*) and restrict the deletion of noncurrent object versions.

Sadly (and I would love to be corrected on this, write me!), only AWS had those features. Two came close:

  • OVH has IAM access policies (though they seem to need a bit of work), but no versioning
  • Scaleway has versioning and some IAM management (you can’t restrict permissions to certain paths), but no DeleteObjectVersion restriction, meaning anyone with a DeleteObject permission can delete any version of any file

So, AWS it is. I am not completely happy, but since restic's encryption seems to be holding up, I will use it for my server backups. My personal backup will have to wait for a bit.

Final Configuration

I created an S3 bucket with versioning enabled and a corresponding IAM user with the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DeleteLocks",
            "Effect": "Allow",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::my-restic-bucket/locks/*",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "1.2.3.4/32",
                        "a:b:c:d::/64"
                    ]
                }
            }
        },
        {
            "Sid": "AllowListings",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::my-restic-bucket",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "1.2.3.4/32",
                        "a:b:c:d::/64"
                    ]
                }
            }
        },
        {
            "Sid": "AllowReadWrite",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::my-restic-bucket/*",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "1.2.3.4/32",
                        "a:b:c:d::/64"
                    ]
                }
            }
        }
    ]
}

This allows writing (and overwriting, but that's what versioning is for) and listing, but deleting only objects in the locks folder. It also filters by source IP, so only the server that backs up to that repository can actually access it. Someone gaining access to that server and reading files would trigger alarms, so the backup should be fairly safe.
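
For reference, versioning and the policy can be set up with the AWS CLI roughly like this; the bucket, user, and file names are placeholders:

# Keep overwritten objects around as noncurrent versions
aws s3api put-bucket-versioning \
    --bucket my-restic-bucket \
    --versioning-configuration Status=Enabled

# Attach the policy above to the backup user
aws iam put-user-policy \
    --user-name restic-backup \
    --policy-name restic-append-only \
    --policy-document file://restic-policy.json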

Forgetting a backup gives me the following error: Remove(<snapshot/7642d7e379>) returned error, retrying after 1.080381816s: client.RemoveObject: Access Denied.
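
The error comes from an attempt like the following, which the policy blocks as intended:

# Trying to delete a snapshot is denied by the bucket policy
restic -r s3:https://s3.amazonaws.com/my-restic-repo forget 7642d7e379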

The backup automation is done via a cron job that runs once a day:

#!/bin/bash

# Iterate over one environment file per restic repository
for env in "/root/backup-envs"/*; do
    echo "Backup for $env"
    # Load the repository location and credentials for this backup
    source "$env"

    restic backup your/backup/directory
done
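
The script itself is called from cron, for example with an entry like this (the schedule and path are placeholders):

# /etc/cron.d/restic-backup: run the backup script every night at 03:00
0 3 * * * root /root/backup.sh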

The files in backup-envs look like this, but you can use any restic backend you want. The repository just has to be initialized.

#!/bin/bash

export AWS_ACCESS_KEY_ID='xxxxx'
export AWS_SECRET_ACCESS_KEY='xxxxxxx'
export AWS_DEFAULT_REGION='eu-central-1'
export RESTIC_REPOSITORY='s3:https://s3.amazonaws.com/my-restic-repo'
export RESTIC_PASSWORD='xxxxx'

Finally, I also set up some lifecycle rules: delete all noncurrent versions of objects under locks/, abort incomplete multipart uploads older than a week, and move all noncurrent versions as well as the data directory to the Glacier Instant Retrieval storage class.
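
A lifecycle configuration implementing those rules could look roughly like this (rule names, prefixes, and day counts are placeholders); it can be applied with aws s3api put-bucket-lifecycle-configuration --bucket my-restic-bucket --lifecycle-configuration file://lifecycle.json:

{
    "Rules": [
        {
            "ID": "ExpireNoncurrentLocks",
            "Filter": { "Prefix": "locks/" },
            "Status": "Enabled",
            "NoncurrentVersionExpiration": { "NoncurrentDays": 1 }
        },
        {
            "ID": "AbortStaleMultipartUploads",
            "Filter": {},
            "Status": "Enabled",
            "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
        },
        {
            "ID": "NoncurrentToGlacierIR",
            "Filter": {},
            "Status": "Enabled",
            "NoncurrentVersionTransitions": [
                { "NoncurrentDays": 1, "StorageClass": "GLACIER_IR" }
            ]
        },
        {
            "ID": "DataToGlacierIR",
            "Filter": { "Prefix": "data/" },
            "Status": "Enabled",
            "Transitions": [
                { "Days": 1, "StorageClass": "GLACIER_IR" }
            ]
        }
    ]
}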

Access Logging is also enabled.

If I want to access the data, I either log in from my server or temporarily disable the IP restriction for AWS.

Footnotes

1.
This also means that it is sometimes better not to compress your files before backing them up.

References

[1] Ruggiero & Heckathorn (2012)
[2] BorgBackup (n.d.)