Thu, 07 Feb 2019

Securely storing environment variables in gopass

See yubi-env in my 'one-offs' repository.

Posted at: 18:03 | category: /computers/yubikey | Link

Mon, 04 Feb 2019

Vault Secrets Engines

At home I make extensive use of both the Minio object storage server and the Backblaze B2 object storage service. I've also recently started making use of HashiCorp Vault.

Given how useful it is to generate dynamic secrets with Vault, I wanted to extend that to my usage of Minio and B2, so writing a secrets engine plugin for Vault has been on my project list for quite some time. A couple weeks ago I came across David Adams's Sample Vault Secrets Plugin, and after about an hour of staring at that and the Vault source, everything clicked and I started writing plugins.

It's a testament to the contributors to Vault that the plugin system is very well thought out and incredibly easy to use, making it a rather simple task to extend Vault. You can find both of mine on GitHub:

Full Disclosure: At the time of this writing, I am a HashiCorp employee, although this post and the plugins were written as a personal project and are not official HashiCorp products. The views expressed in this post are entirely personal and are not statements made on behalf of HashiCorp, Inc.
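
For the curious, wiring a custom secrets plugin into a running Vault looks roughly like this. This is only a sketch — the path and plugin name are made up, and the exact flags can differ between Vault versions:

# the plugin binary has to live in Vault's configured plugin_directory
SHASUM=$(sha256sum /etc/vault/plugins/vault-plugin-secrets-example | cut -d' ' -f1)

# register it in the plugin catalog, then enable it at a path
vault plugin register -sha256="$SHASUM" secret vault-plugin-secrets-example
vault secrets enable -path=example vault-plugin-secrets-example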

Posted at: 12:42 | category: /computers/hashistack | Link

Sat, 29 Dec 2018

Restic Systems Backup Setup, Part 5 - minio multi-user support

This is Part 5 of my series on building a restic-based system backup setup. The rest of the articles can be found here.

One of the original design decisions in my restic systems backup setup was isolation between hosts. I didn't want root on one system to be able to access the backups of other hosts, even if they were storing backups on a common backup server.

At that time, Minio, the object storage server I was using on the backup server, only supported single-tenancy — there was a single "access key"/"secret key" pair per instance, with access to every object and every bucket in that instance. Minio's recommendation at the time was to run multiple instances, each on a distinct port, to provide isolation, and that's the solution I went with.

Sometime in October, when I was preoccupied with my wedding, Minio added Multi-User Support. This adds the ability to have multiple users per Minio instance, each with distinct access and secret keys, along with decent support for S3-style policies. After a bit of experimentation I was able to figure out a setup where I could run a single Minio instance, put each system's backups in a distinct bucket, and create policies that keep everything separate.
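
Roughly, the per-host setup with the mc admin client looks like this — the alias, bucket, policy, and key names are all made up, and the exact admin subcommands have shifted a bit between Minio releases:

# one bucket per client host
mc mb backup/host1-restic

# a policy that only allows access to that bucket
cat > host1-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:*"],
    "Resource": ["arn:aws:s3:::host1-restic", "arn:aws:s3:::host1-restic/*"]
  }]
}
EOF
mc admin policy add backup host1-policy host1-policy.json

# a user with its own keys, limited to that policy
mc admin user add backup host1-access-key host1-secret-key
mc admin policy set backup host1-policy user=host1-access-key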

This greatly simplifies my backup setup, and it ties in with some other changes I want to make. I want to make it much easier to add and remove backup clients, and to get their current status, amongst other things. I've also started eating, well, not my own dogfood, but work's dogfood: I've got a Consul and Vault cluster running, and I want to start leveraging that as well.

Expect the next post in this series to talk about the new setup, while hopefully avoiding System 2.0 tarpits.

Posted at: 12:47 | category: /computers/backups/restic-systems-backups | Link

Wed, 25 Jul 2018

Using gopass with mutt

I've been using gopass for a long time as my password manager — with my GnuPG and Yubikey setup, accessing my passwords on both my laptop and my colocated box is pretty much transparently the same.

I randomly came across the fact that mutt will do backtick expansion in its configuration file. With that, I can keep my mutt IMAP password in gopass and have mutt fetch it with set imap_pass=`pass mutt_imap_pass`.
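
For reference, the relevant bit of muttrc looks something like this (the account name is made up; gopass show -o prints just the password if you don't have the pass-compatible shim on your path):

# mutt runs the backticked command when it parses this file
set imap_user = "me@example.org"
set imap_pass = `gopass show -o mutt_imap_pass`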

Posted at: 08:34 | category: /computers/nifty | Link

Sat, 14 Jul 2018

Wrapping Consul Lock

I've recently installed a Consul cluster at home, mostly to act as an HA backing store for Vault. If you've been following along, I've also been moving to Restic for my system backups, so, of course, I want snapshots of Consul to end up there.

But this isn't a post about that — when I've got it running well and cleaned up, I'll post it and talk about it. What I want to talk about is the way that I'm wrapping the consul lock command.

The gist of the script is:

  1. Run consul snapshot save... and save the output to a local file
  2. Archive that file away
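
A minimal sketch of those two steps — the snapshot directory and restic repository here are made up, and credentials are assumed to come from the environment:

#!/bin/bash
set -euo pipefail

SNAPDIR=/var/lib/consul-snapshots
STAMP=$(date +%Y%m%d-%H%M%S)

# 1. take the snapshot to a local file
consul snapshot save "$SNAPDIR/consul-$STAMP.snap"

# 2. archive that file away
restic -r s3:https://backup.example.org/consul-snapshots backup "$SNAPDIR/consul-$STAMP.snap"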

I want that running on all three Consul servers, but I only ever want the snapshot to happen on a single server at a time. That way, if I ever have to take a Consul server down for maintenance, snapshots will still happen within the cluster. This is a common practice, and it's really easy with consul lock. It's also useful any time you have a cronjob or task that could run on more than one server but should only run on one of them at a time. Rather than running it on a single server and having to remember to move it elsewhere if that server goes down, or installing some sort of distributed queueing system, you can use this.

If you run a command under consul lock, only one instance of it will run at a time, which is the basis of the technique I'm using. But I don't want to put the consul lock logic in my crontab, I want it embedded in the script. Here's a simple way of doing that:

#!/bin/bash

[...]

function inner {
    #do stuff inside the lock
}

#parse arguments here, getopt, etc

if [ "$1" == "inner" ]; then
    shift
    inner "$@"
elif [ "$1" == ... ]; then
    #other subcommands
else
    #re-invoke this same script under the lock, with "inner" as the first
    #argument so the branch above catches it; $KVPREFIX is the KV path to lock on
    consul lock "$KVPREFIX" "$0" inner "$@"
fi

If you call your script normally, it will use consul lock to call itself with the argument inner, which does the stuff which should only be done on a single host at a time.
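
For completeness, the cron side then looks identical on all three Consul servers — something like this, with a path and schedule that are just for illustration:

# /etc/cron.d/consul-snapshot, the same on every server
15 */6 * * * root /usr/local/bin/consul-snapshot-wrapper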

Posted at: 23:03 | category: /computers/consul | Link

Sun, 18 Feb 2018

Thoughts about On-Call

This month there have been a couple of interesting discussions about on-call rotations in the tech industry. The first was started by Charity Majors, who sparked a thread on Twitter:

All this heated talk about on call is certainly revealing the particular pathologies of where those engineers work. Listen:

1) engineering is about building *and maintaining* services
2) on call should not be life-impacting
3) services are *better* when feedback loops are short

— Charity Majors (@mipsytipsy) February 10, 2018

A couple days later John Barton followed up with an article that I really enjoyed, and pretty much whole-heartedly endorse. I had a few thoughts from both of these, and wanted to talk about them here.

"But that's just an incentive for engineers to weasel extra pay by building broken systems": I think this falls apart in several ways. First, that extra pay doesn't just appear with no additional consequences — the engineer on-call still has to actually fix the problem, wake up at odd hours, be bothered when they'd much rather be bowling or watching a movie or reading a book or sleeping, etc. Second, if this actually works at your company, your management is broken. Period. That's the whole point of it, to put an explicit material cost to this additional duty. If your management tolerates abuse of this pay, they either explictly consider this part of the cost of doing business, or they're not paying close enough attention, and both of those cases are entirely on them.

To everyone who argues that an engineer's pay covers this, I'd counter by asking "Okay, how much of that pay represents the on-call expectation?" I'm guessing many places wouldn't be able to answer that. And unlike many things an employer pays for that are fuzzy, hard-to-define criteria, this one is easy: all it takes is a stopwatch and a calculator to count up how many minutes are spent responding to incidents. Is what you're being paid for it worth it? As John points out, many other industries with highly trained professionals pay on-call differentials, and tech shouldn't be any different.

I'd also add a guideline to John's list: if someone gets a page, the next day someone covers for them for 24 hours. While this isn't official policy where I work, it's my own unofficial policy to offer to cover for my co-workers when they have a particularly bad on-call day. Someone who is woken at 3am, even if they can go back to sleep ten minutes later, doesn't get as good a rest and isn't as effective the next day. Having that followed by another interrupted sleep the next night both makes the problem worse and means that the most critical person on your team, the one responding to an emergency, is in less than peak condition. Don't let people shrug this off with an "I'm fine" — there's a large body of sleep research that disagrees with them.

Like many things in tech that I think are bad, this is only going to change if expectations start changing, and expectations aren't going to change unless we start prodding them in the right direction. I think these kinds of questions, asking for the kinds of policies John advocates, need to become more standard industry-wide. If my situation warrants, I plan on making this part of the questions I ask any potential employer, and if your situation warrants, I'd ask you to do the same.

Posted at: 14:39 | category: /tech | Link

Sun, 21 Jan 2018

Disabling Yubikey 4 OTP

Since I can never remember this:

I don't make use of the Yubikey OTP mode, so I don't want what a former co-worker called "yubidroppings" when I accidentally brush my key.

Short answer: get ykpersonalize and run ./ykpersonalize -m 5, since I only want U2F and CCID modes enabled. Tell it yes twice.
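
For future me, the mode values, as I understand them from the ykpersonalize man page (double-check against your version):

#  -m 0 = OTP only            -m 4 = OTP + U2F
#  -m 1 = CCID only           -m 5 = U2F + CCID
#  -m 2 = OTP + CCID          -m 6 = OTP + U2F + CCID
#  -m 3 = U2F only
./ykpersonalize -m 5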

Posted at: 12:10 | category: /computers/yubikey | Link

Sat, 20 Jan 2018

Restic Systems Backup Setup, Part 4.5 - Why not just rclone

This is Part 4.5 of my series on building a restic-based system backup setup. The rest of the articles can be found here.

.@thomaskula nice article! Did you consider just running rclone in a loop?

— restic (@resticbackup) January 15, 2018

After I posted part 4 of my restic backup series, @resticbackup asked the above question, and I thought trying to answer it would be a good intermediate article.

As a brief background, rclone describes itself as "rsync for cloud storage". It can talk to a rather comprehensive number of storage providers as well as local storage, and can perform operations as simple as mkdir, cp, and ls, and ones as complicated as syncing between two different storage providers. It, like rsync, is a useful tool to have in your kit.

So why not just run rclone in a loop? Actually, that might not be a bad idea, and it's certainly a simple one. Pick a loop sleep that matches your replication needs, put some error checking in, and fire away. If I were going to do this, I'd likely use rclone copy rather than rclone sync. copy will copy files from the source to the destination, but will not delete any files on the destination which do not exist on the source. sync, on the other hand, will make the source and destination look exactly the same.
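
Concretely, the difference between the two looks like this (the remote names and paths are made up):

# copies new and changed files; never deletes anything at the destination
rclone copy minio-local:restic-host1 b2:restic-host1

# makes the destination mirror the source exactly, deletions included
rclone sync minio-local:restic-host1 b2:restic-host1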

My preference for copy over sync is twofold. First, I like the idea of having different retention policies at different repositories. For example, some of the storage provider options are so inexpensive, for my scale of storage needs, that I basically treat them as "just shove things in there forever, I don't care", or, at least, only care about once a year. On the other hand, local fast storage is expensive enough that perhaps I can only afford to keep, say, a week or two of backups around for all of my systems. By treating the repositories as distinct, and with the append-only nature of restic, I can do that: keeping what I'm likely to need for most restore operations at hand, and keeping the longer-term, much less likely to be needed data off some place where it's harder to access but cheaper to keep.

The second reason for treating the repositories separately is that it helps guard against "oh shit!" moments: if you are syncing every five minutes and you accidentally delete some data you need, you've got a narrow window to realize that and stop the sync. At some point in your life, you will do this — I remarked once that "[s]ome people think I'm smart, but that's not it. I just remember exactly what each of those bite marks on my ass means."

That all said, I'm going to keep using the mechanism I outlined in the last article, of firing off a new sync job every time a new snapshot appears. There are a few reasons for this. First, it's there, and it's working. Barring some overriding need to change this setup, I don't plan on exerting energy to change it — for now.

Second, there is some amount of overhead cost here every time I do a sync. My goal is that data starts being synced within a couple minutes of a new snapshot being created. I'm still, however, mostly doing the one-backup-a-day-late-at-night model (at least for now). With that, I'll actually have work to do less than one-tenth of one percent of the time, which just feels off. I'll admit, of course, that's just a gut feeling. In addition, even if I'm not copying data, building up a list of what I have locally and, more importantly, what's at the remote repository, has some cost. All of the storage providers charge something for operations like LIST, etc. That said, honestly, I haven't run the math on it and the charge here is almost certainly one or two epsilons within nothing, so perhaps this isn't much of a reason to care.

The two important bits in conclusion: first, I have something working, so I'm going to keep using it until it hurts to do so, which, honestly, is a good chunk of the reason I do many things. We'll be fancy and call it being "pragmatic". Second, your needs and costs and criteria are certainly different from mine, and what's best for you requires a solid understanding of those things — one size certainly doesn't fit all.

Posted at: 18:32 | category: /computers/backups/restic-systems-backups | Link

Mon, 15 Jan 2018

Restic Systems Backup Setup, Part 4 - Replication and Runsvdir

This is Part 4 of my series on building a restic-based system backup setup. The rest of the articles can be found here.

Replication

A goal from the start of this project has been replicating backup data to multiple locations. A long personal and professional history of dealing with backups leads me to the mantra that it isn't backed up until it's backed up to three different locations. Restic has a couple of features which make this easy. First, backend storage (to a first approximation) is treated as append only — a blob, once stored, is never touched, although it may be deleted as part of expiring snapshots. Second, everything is encrypted, so you can feel as safe spreading your data across any number of cost-effective storage providers as you trust restic's encryption setup (which I generally do).

In general, I want the client systems to know about only one service: the server we're backing up to. Everything else, the replication to other storage, should happen on the backup server. Also, we want new snapshots to get replicated relatively soon after they are created. If I decide to make an arbitrary snapshot for whatever reason, I don't want to have to remember to go replicate it, or wait until "the daily replication job" runs.

These criteria lend themselves to something which watches for new snapshots on the backup server. Restic makes this easy, as one of the very last things it does in a successful backup is write a new snapshot object. There's one directory to watch, and when a new object appears there, replicate. How to do that, though?
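
For reference, in a restic repository that's the snapshots/ directory, which holds one small object per snapshot, named by its ID:

repo/
    config
    data/
    index/
    keys/
    locks/
    snapshots/      <-- watch here; a new file appears for every snapshot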

Minio does contain a notification system, and I strongly considered that for a while (to the point of submitting a patch to some incorrect documentation around it). But it presented two complications. First, setting up notifications involves both changing the Minio configuration file and also issuing some commands to tell it what you want notifications for, which complicates setup. Second, I quickly fell down a rabbit hole of building a RESTful notification service. This isn't impossible to overcome, but it was blocking the real work I wanted to do (more on that later).

My next consideration was using the Linux kernel's inotify facility to watch for events in the snapshot directory, but that ran into roughly the same problems as the previous solution, and also added some Linuxisms that I didn't want to add at this point. Of course, that said, I do freely use bash scripts, with some bashisms in them, instead of a strictly POSIX-compliant shell, but, frankly, I'm not all that interested in running this on AIX. So, take this all with an appropriate grain of salt.

The solution I finally settled on is backup-syncd, the not-as-elegant but still useful setup. This simply runs in a loop, sleeping (by default for a minute) and then looking at the files in the snapshot directory. If the contents have changed, it fires off a script to do whatever syncing you want to do. There's some extra stuff to log, be robust, and pass the sync script some idea of what's changed in case it wants to use that, but otherwise it's pretty simple.
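
The core of the idea, stripped of the logging and error handling, looks something like this (the paths are made up):

#!/bin/bash

SNAPDIR=/srv/backups/host1/repo/snapshots
SYNC_SCRIPT=/usr/local/bin/sync-host1
INTERVAL=60

last=""
while true; do
    current=$(ls -1 "$SNAPDIR" 2>/dev/null | sort)
    if [ "$current" != "$last" ]; then
        # hand the sync script the current listing in case it cares
        echo "$current" | "$SYNC_SCRIPT"
        last="$current"
    fi
    sleep "$INTERVAL"
done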

A decent part of systems engineering is fitting the solution you make to the problem you actually need to solve. I'm not expecting to back up thousands of systems to one backup server, so the overhead of a watcher script for each client waking up every minute to typically go back to sleep isn't really a consideration. And yes, depending on timing it could be almost two minutes before a system starts replicating, but that's close enough that I don't care. And while I do want to eventually build that RESTful syncing service to work with Minio's notification system, that's a desire to understand building those services robustly, and shouldn't get in the way of the fact that right now, I just want backups to work.

That said, another decent part of systems engineering is the ability to make that solution not fuck you over in the future. You have to be able to recognize that what fits now may not fit in the future, and while what you're doing now may not scale to that next level, it at least shouldn't be a huge barrier to moving to that next level. In this case, it's easy enough to swap out backup-syncd with something more sophisticated, should it be necessary. You could go the other way, as well — for a low-priority client you could certainly configure backup-syncd to only wake up every few hours, or even forgo it completely in lieu of a classic cron-every-night solution, should the situation warrant.

Runsvdir

Now that we have more than one service running for each client, I've updated the setup to use a per-client runsvdir, which manages all the services a particular client needs to do backups. Here we have a top-level runsvdir, called by the systemd unit file, which is responsible for running the services for all clients. In turn, that top-level runsvdir runs one runsvdir for each client, which in turn runs minio and backup-syncd for that client. The idea here is that I want to treat each client as a single unit, and be able to turn it on and off at will.
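
As a sketch (the directory names are hypothetical), the run file for a client is itself just another runsvdir:

#!/bin/sh
# /etc/backup-sv/host1/run -- supervised by the top-level runsvdir; it in
# turn supervises this client's minio and backup-syncd services
exec runsvdir /etc/backup-clients/host1/sv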

There's a small issue with the way runsv manages services. To cleanly stop runsvdir and everything it's running, you want to send it a SIGHUP. The way we start a client runsvdir is to make an appropriate symlink, which does what we expect. But when we remove that symlink, the supervising runsvdir sends the client runsvdir a SIGTERM, which makes the client runsvdir go away without touching the child runsv processes it started. You can customize what happens to the client runsvdir process, however, and I'll be doing that in a future phase of this project.
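
Runit lets a service directory carry a control/ directory of scripts that runsv consults before sending its usual signals, so something along these lines (an untested sketch) should turn that TERM into the HUP the client runsvdir wants:

#!/bin/sh
# ./control/t -- if this exists and exits 0, runsv skips its own SIGTERM;
# instead, HUP the per-client runsvdir so it tears its children down cleanly
kill -HUP "$(cat ./supervise/pid)"
exit 0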

Future wants

I'll end here by outlining some future ideas and wants for this setup:

Posted at: 12:29 | category: /computers/backups/restic-systems-backups | Link

Updates and Engagement

The standard end-of-the-year party and eating season conspired to keep me from much creative work here, but I've been off work this past week and managed to wrap up a new issue of Late Night Thinking and do some work on my restic systems backup setup. Both will appear here shortly.

Also, if you're one of the small number of people who haven't found this out from any number of places, on 1 November 2016A I got engaged to E, my boyfriend of two years. Wedding is this coming November.

Posted at: 11:36 | category: /random/2016b/01 | Link