This is Part 4 of my series on building a restic-based system backup setup. The rest of the articles can be found here.
Replication
A goal from the start of this project has been replicating backup data to multiple locations. A long personal and professional history of dealing with backups leads me to the mantra that it isn't backed up until it's backed up to three different locations. Restic has a couple of features which make this easy. First, backend storage (to a first approximation) is treated as append-only: a blob, once stored, is never modified, although it may be deleted as part of expiring snapshots. Second, everything is encrypted, so you can feel as safe spreading your data across any number of cost-effective storage providers as you trust restic's encryption setup (which I generally do).
In general, I want the client systems to know about only one service: the server we're backing up to. Everything else, meaning the replication to other storage, should happen on the backup server. Also, we want new snapshots to get replicated relatively soon after they are created. If I decide to make an arbitrary snapshot for whatever reason, I don't want to have to remember to go replicate it, or wait until "the daily replication job" runs.
These criteria lend themselves to something which watches for new snapshots on the backup server. Restic makes this easy, as one of the very last things it does after a successful backup is create a new snapshot object.
There's one directory to watch, and when a new object appears there, replicate.
How to do that, though?
Minio does contain a notification system, and I strongly considered that for a while (to the point of submitting a patch to some incorrect documentation around that). But that offered two complications. First, setting up notification involves both changing the minio configuration file and also submitting some commands to tell it what you want notifications for, which complicates setup. Second, I quickly fell down a rabbit hole of building a RESTful notification service. This isn't impossible to overcome, but it was blocking the real work I wanted to do (more on that later).
My next consideration was using the Linux kernel inotify facility to watch for events in the snapshot directory, but that ran into roughly the same problems as the previous solution, and also added some Linuxisms that I didn't want at this point. Of course, that said, I do freely use bash scripts, with some bashisms in them, instead of a strictly POSIX-compliant shell, but, frankly, I'm not all that interested in running this on AIX. So, take this all with an appropriate grain of salt.
The solution I finally settled on is backup-syncd, the not-as-elegant but still useful setup. This simply runs in a loop, sleeping (by default for a minute) and then looking at the files in the snapshot directory. If the contents have changed, it fires off a script to do whatever syncing you want to do. There's some extra stuff to log, to be robust, and to pass the sync script some idea of what's changed in case it wants to use that, but otherwise it's pretty simple.
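To make the polling idea concrete, here is a minimal sketch of that kind of loop. It is not the actual backup-syncd code: the function names, the state file, and the convention of feeding new snapshot file names to the sync command on stdin are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Polling watcher sketch. Function names, the state file, and the
# "new names on stdin" convention are hypothetical, not backup-syncd's own.

# Print a stably sorted listing of the file names in a snapshot directory.
list_snapshots() {
    local dir=$1
    LC_ALL=C ls -1 "$dir" 2>/dev/null
}

# check_once DIR STATE_FILE SYNC_CMD [ARGS...]
# Compare the current listing with the one saved in STATE_FILE. If it has
# changed, run SYNC_CMD with the newly appeared names on stdin. The saved
# listing is only updated when SYNC_CMD succeeds, so a failed sync is
# retried on the next poll.
check_once() {
    local dir=$1 state=$2
    shift 2
    [[ -f $state ]] || : > "$state"
    list_snapshots "$dir" > "$state.new"
    if cmp -s "$state" "$state.new"; then
        rm -f "$state.new"          # nothing changed; go back to sleep
        return 0
    fi
    # comm -13 prints lines that appear only in the new listing.
    if LC_ALL=C comm -13 "$state" "$state.new" | "$@"; then
        mv "$state.new" "$state"
    else
        echo "sync command failed; will retry on next poll" >&2
        rm -f "$state.new"
        return 1
    fi
}

# watch_loop INTERVAL DIR STATE_FILE SYNC_CMD [ARGS...]
# Sleep, check, repeat. A failed sync does not stop the loop.
watch_loop() {
    local interval=$1
    shift
    while true; do
        sleep "$interval"
        check_once "$@" || true
    done
}
```

Under these assumptions, something like `watch_loop 60 /path/to/repo/snapshots /path/to/state /path/to/sync-script` would poll once a minute (all three paths are placeholders). Keeping the old listing until the sync succeeds is one simple way to get the "be robust" part without any extra bookkeeping.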
A decent part of systems engineering is fitting the solution you make to the problem you actually need to solve. I'm not expecting to back up thousands of systems to one backup server, so the overhead of a watcher script for each client waking up every minute to typically go back to sleep isn't really a consideration. And yes, depending on timing it could be almost two minutes before a system starts replicating, but that's close enough that I don't care. And while I do want to eventually build that RESTful syncing service to work with Minio's notification system, that's a desire to understand building those services robustly, and shouldn't get in the way of the fact that right now, I just want backups to work.
That said, another decent part of systems engineering is the ability to make that solution not fuck you over in the future. You have to be able to recognize that what fits now may not fit in the future, and while what you're doing now may not scale to that next level, it at least won't be a huge barrier to moving to that next level. In this case, it's easy enough to swap out backup-syncd with something more sophisticated, should it be necessary. You could go the other way, as well: for a low-priority client you could certainly configure backup-syncd to only wake up every few hours, or even forgo it completely in favor of a classic cron-every-night solution, should the situation warrant.
Runsvdir
Now that we have more than one service running for each client, I've updated the setup to use a per-client runsvdir, which manages all the services a particular client needs to do backups. Here we have a top-level runsvdir, called by the systemd unit file, which is responsible for running the services for all clients. In turn, that top-level runsvdir runs one runsvdir for each client, which in turn runs minio and backup-syncd for that client. The idea here is that I want to treat each client as a single unit, and be able to turn it on and off at will.
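As a rough illustration of that nesting, the sketch below builds the kind of directory layout described above. The directory names (clients/, enabled/, services/) and the helper functions are hypothetical and not the project's real paths; the run scripts for minio and backup-syncd themselves are left out.

```shell
#!/usr/bin/env bash
# Sketch of the nested runsvdir layout. Directory names and helper
# functions are hypothetical, chosen only for illustration.

# make_client BASE CLIENT
# Create the service directory for one client. Its run script starts a
# per-client runsvdir, which supervises everything under services/.
make_client() {
    local base=$1 client=$2
    local dir="$base/clients/$client"
    mkdir -p "$dir/services/minio" "$dir/services/backup-syncd" "$base/enabled"
    {
        echo '#!/bin/sh'
        echo "exec runsvdir '$dir/services'"
    } > "$dir/run"
    chmod +x "$dir/run"
}

# The top-level runsvdir (the one started from the systemd unit) would be
# pointed at BASE/enabled. Linking a client in there starts its runsvdir;
# removing the link makes the top-level runsvdir stop it.
enable_client()  { ln -s "$1/clients/$2" "$1/enabled/$2"; }
disable_client() { rm -f "$1/enabled/$2"; }
```

With a layout like this, each client is one symlink in the top-level directory, which is what makes it possible to treat the client as a single unit.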
There's a small issue with the way runsv manages services. To cleanly stop runsvdir and everything it's running, you want to send it a SIGHUP. The way we start a client runsvdir is to make an appropriate symlink, which does what we expect. But when we remove that symlink, the supervising runsvdir sends the client runsvdir a SIGTERM signal, which makes the client runsvdir go away without touching the child runsv processes it started. You can customize what happens to the client runsvdir process, however, and I'll be doing that in a future phase of this project.
Future wants
I'll end here by outlining some future ideas and wants for this setup:
- Monitoring and sanity checking: I want some sort of audit of every storage backend for a client, to make sure that the snapshots I want are where I want them.
- Restoration checking: A wise person once said that nobody wants a backup system, everybody wants a restoration system. Something which restores some set of files and does some sanity checking would be good.
- Metamanagement: Instead of making symlinks and poking around manually, I want scripts where I can enable and disable a particular client, get the status of a particular client's backups, etc.