S3, boto and IAM

As part of replacing the power-hungry, eight-year-old server I have at home with a tiny Intel NUC, I'm slowly moving all of its real services onto my colocated machine. The last real service still running at home is my backup machine, which handles both my AFS cell backups and my rsync-based machine backup scripts.

Moving the backup server to colocation isn't difficult, but I need to find a place to stash the second, disaster-recovery copy of all of my backups. The obvious and most cost-effective solution is shoving all the data into Amazon's S3 service, particularly if I have it go into Glacier storage. For a project at the day job I've been using Duplicity for backups, and it happily handles S3 as a backend.

In a sane setup, there's one bucket dedicated to backups (say, tproa-backups), and each machine sends its backups to its own prefix within that bucket. Each machine would get an IAM identity with the rights to create objects in S3 under its prefix, so machines couldn't trip over each other's backups.

The documentation for S3 and Duplicity is rather sparse, and none of it talks about using IAM identities for access control. After getting "Connection reset by peer" errors from Duplicity, I tried the latest versions of both Duplicity and Boto, the Python library for AWS. That didn't help, so next I tried using Boto's s3put script to push something into S3, which also failed.

After digging around, I found the right incantation to put in the bucket policy to allow an IAM identity to do Duplicity backups. The example uses these identifiers:

arn:aws:iam::854026359331:user/backup-gozer
    The machine identity for gozer.tproa.net
arn:aws:s3:::tproa-backups/gozer.tproa.net
    The S3 bucket/prefix for backups

{
    "Version": "2008-10-17",
    "Id": "Policy1402707051767",
    "Statement": [
	{
	    "Sid": "Stmt1402707005319",
	    "Effect": "Allow",
	    "Principal": {
		"AWS": "arn:aws:iam::854026359331:user/backup-gozer"
	    },
	    "Action": "s3:*",
	    "Resource": [
		"arn:aws:s3:::tproa-backups/gozer.tproa.net*",
		"arn:aws:s3:::tproa-backups/gozer.tproa.net"
	    ]
	},
	{
	    "Sid": "Stmt1402707048357",
	    "Effect": "Allow",
	    "Principal": {
		"AWS": "arn:aws:iam::854026359331:user/backup-gozer"
	    },
	    "Action": [
		"s3:ListBucket",
		"s3:GetBucketLocation"
	    ],
	    "Resource": "arn:aws:s3:::tproa-backups"
	}
    ]
}
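Because every machine gets the same pair of statements, differing only in the IAM user and the prefix, the policy can be stamped out programmatically. The Python below is a minimal sketch of that idea, not something Duplicity or Boto provides: the function names `machine_statements` and `bucket_policy` are hypothetical, and the optional `Id`/`Sid` fields are simply left out rather than invented. Since a bucket carries a single policy, the statements for all machines are concatenated into one document.

```python
import json


def machine_statements(account_id, user, bucket, prefix):
    """Return the two Allow statements for one machine's IAM user and prefix.

    Hypothetical helper; mirrors the hand-written policy for backup-gozer.
    """
    principal = {"AWS": f"arn:aws:iam::{account_id}:user/{user}"}
    # No trailing '/' on the prefix.
    prefix_arn = f"arn:aws:s3:::{bucket}/{prefix}"
    return [
        {
            # Anything under the machine's prefix.
            "Effect": "Allow",
            "Principal": principal,
            "Action": "s3:*",
            "Resource": [prefix_arn + "*", prefix_arn],
        },
        {
            # Bucket-level rights; Boto needs GetBucketLocation to find the region.
            "Effect": "Allow",
            "Principal": principal,
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
    ]


def bucket_policy(account_id, bucket, machines):
    """Combine per-machine statements into one bucket policy.

    `machines` is an iterable of (iam_user, prefix) pairs.
    """
    statements = []
    for user, prefix in machines:
        statements.extend(machine_statements(account_id, user, bucket, prefix))
    return {"Version": "2008-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = bucket_policy(
        "854026359331",
        "tproa-backups",
        [("backup-gozer", "gozer.tproa.net")],
    )
    print(json.dumps(policy, indent=4))
```

Applying the printed JSON to the bucket is left to whatever tool you already use. Note that each machine's s3:ListBucket grant covers the whole bucket, so, assuming no other grants exist, machines can see the names of each other's objects but cannot read or modify them.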

In the first statement, I'm giving the machine's IAM identity the rights to do anything under tproa-backups/gozer.tproa.net. Note, in particular, that you do not put a '/' at the end of the prefix. In the second statement, I'm giving the machine's IAM identity the rights both to list the bucket tproa-backups and to look up its location. The same note applies here: don't end the bucket name with a slash. The s3:GetBucketLocation right is crucial: without it, the Boto library can't find out which region the bucket is in, so it can't connect to the proper S3 endpoint, and it bombs out without any useful error message.
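To see why the trailing slash matters, here is a deliberately simplified sketch of how the two statements match requests. It is a toy built on assumptions, not AWS's actual policy evaluator: it only handles Allow statements, treats `*` as matching any run of characters (including '/') and `?` as exactly one character, compares action names case-insensitively, and ignores Deny, conditions, and IAM user policies. The helper names `wildcard_match` and `is_allowed`, and the object key in the example, are hypothetical.

```python
import re

# The bucket policy from this post, minus the optional Id/Sid fields.
POLICY = {
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::854026359331:user/backup-gozer"},
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::tproa-backups/gozer.tproa.net*",
                "arn:aws:s3:::tproa-backups/gozer.tproa.net",
            ],
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::854026359331:user/backup-gozer"},
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::tproa-backups",
        },
    ],
}


def as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_match(pattern, value):
    """Simplified match: '*' matches any run of characters (including '/'),
    '?' matches exactly one character, everything else is literal."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def is_allowed(policy, principal, action, resource):
    """Return True if some Allow statement covers this request.

    Toy evaluator only: no Deny, no Conditions, no IAM user policies.
    """
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        if principal not in as_list(stmt["Principal"]["AWS"]):
            continue
        # Action names compared case-insensitively; resource ARNs literally.
        action_ok = any(wildcard_match(a.lower(), action.lower())
                        for a in as_list(stmt["Action"]))
        resource_ok = any(wildcard_match(r, resource)
                          for r in as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


if __name__ == "__main__":
    gozer = "arn:aws:iam::854026359331:user/backup-gozer"
    # Hypothetical object key under gozer's prefix.
    key = "arn:aws:s3:::tproa-backups/gozer.tproa.net/example.difftar.gpg"
    print(is_allowed(POLICY, gozer, "s3:PutObject", key))  # True
    print(is_allowed(POLICY, gozer, "s3:GetBucketLocation",
                     "arn:aws:s3:::tproa-backups"))  # True
    print(is_allowed(POLICY, gozer, "s3:PutObject",
                     "arn:aws:s3:::tproa-backups/other.tproa.net/x"))  # False
```

Under this literal matching, a resource written as `arn:aws:s3:::tproa-backups/` would never match the bucket ARN `arn:aws:s3:::tproa-backups`, which is consistent with the advice above to leave the trailing slash off.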