Hacking AFS Dumps for Fun and Profit

Well, for fun at least.

The traditional way of doing AFS volume dumps follows a classical "full, incremental, incremental" pattern, with an occasional new full dump so that the number of dumps one has to restore for a given point in time stays manageable (at work, we do something that is roughly "Monthly-Weekly-Daily"). This also makes it easy to expire dumps you no longer need: if you only want to keep two months' worth of dumps, it is easy to determine which dump files can go.

At home, however, doing full dumps is painful for large volumes, because my DSL connection has a rather paltry upload speed, and since I keep copies of the volume dumps both at home and at my colo location, no matter where I do the dump at least one of the transfers will be slow. What I would like, then, is a process where I do a painful full dump once, and then every day simply dump what has changed since the previous day. Done naively, this gets painful quickly: after about three days, the chain of dumps needed for a restore is longer than I want to deal with. In addition, you can never throw away any dump, since every one of them is potentially needed to do a restore.

My desire, then, is to have something that pulls apart dump files and keeps enough data around for each backup point that I can synthesize what appears to be a full dump file for that point. AFS volume dumps handily make that possible: they will tell you either "Here's a vnode that's changed" or "This vnode is present but hasn't changed since your reference time." If you combine that with some logic that keeps track of what the vnodes looked like in the last backup, you all of a sudden have enough information to do the synthesis.
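To make the idea concrete, here is a minimal sketch of that bookkeeping. It is not pyafsdump's actual code or data format: the dict-based model of a parsed dump, the names `apply_dump` and `synthesize_full`, and the assumption that a vnode missing from a dump has been deleted are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of one parsed dump: vnode number -> (changed, payload).
# payload is None when the dump only says "present but unchanged".
DumpEntries = Dict[int, Tuple[bool, Optional[bytes]]]


def apply_dump(previous: Dict[int, bytes], dump: DumpEntries) -> Dict[int, bytes]:
    """Return the complete vnode state after applying one dump to the prior state."""
    state: Dict[int, bytes] = {}
    for vnode, (changed, payload) in dump.items():
        if changed:
            if payload is None:
                raise ValueError(f"vnode {vnode} is marked changed but has no data")
            state[vnode] = payload
        elif vnode in previous:
            # Unchanged since the reference time: carry the old contents forward.
            state[vnode] = previous[vnode]
        else:
            raise KeyError(f"vnode {vnode} is marked unchanged but was never seen")
    # Assumption: vnodes not mentioned in this dump no longer exist,
    # so they are simply not copied into the new state.
    return state


def synthesize_full(dumps: List[DumpEntries]) -> Dict[int, bytes]:
    """Fold a full dump plus its incrementals, oldest first, into one state."""
    state: Dict[int, bytes] = {}
    for dump in dumps:
        state = apply_dump(state, dump)
    return state


full = {1: (True, b"a"), 3: (True, b"c"), 5: (True, b"e")}
inc1 = {1: (False, None), 3: (True, b"C"), 5: (False, None)}
inc2 = {1: (False, None), 3: (False, None), 7: (True, b"g")}
result = synthesize_full([full, inc1, inc2])
```

In this toy run, `result` holds vnode 1 from the full dump, vnode 3 from the first incremental, and vnode 7 from the second, while vnode 5 is gone. Writing that state back out in real dump format is the part that needs an actual dump parser and writer.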

Thus the impetus to create pyafsdump, a Python module that understands and can do various things with AFS volume dumps. As a proof of concept, I put together a pair of hackish scripts: one pulls apart volume dumps and generates some metadata, and the other reads that metadata and synthesizes a full dump. A very rough test seems to indicate that it works. I was able to pull apart a full dump and three subsequent incremental dumps, and from those generate a full dump, restorable with vos restore, that contained what the volume looked like at the time the third incremental was made.
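For a feel of what "some metadata" could look like, here is one possible layout, sketched under heavy assumptions: vnode contents are stored once under their SHA-256 digest, and each backup point gets a small JSON manifest mapping vnode numbers to digests. None of this is taken from the pyafsdump scripts; the function names and the manifest format are hypothetical. The appeal of a layout like this is that any backup point can be thrown away without breaking the others.

```python
import hashlib
import json
from typing import Dict, Iterable, Set


def store_blob(blobs: Dict[str, bytes], payload: bytes) -> str:
    """Store a vnode payload under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(payload).hexdigest()
    blobs.setdefault(digest, payload)  # identical contents are stored only once
    return digest


def make_manifest(blobs: Dict[str, bytes], vnodes: Dict[int, bytes]) -> str:
    """Describe one backup point as JSON: vnode number -> blob digest."""
    entries = {str(v): store_blob(blobs, data) for v, data in sorted(vnodes.items())}
    return json.dumps(entries)


def restore_from_manifest(blobs: Dict[str, bytes], manifest: str) -> Dict[int, bytes]:
    """Rebuild the full vnode state for one backup point from its manifest."""
    return {int(v): blobs[d] for v, d in json.loads(manifest).items()}


def expire(blobs: Dict[str, bytes], kept_manifests: Iterable[str]) -> None:
    """Delete every blob that no kept manifest refers to any more."""
    live: Set[str] = set()
    for manifest in kept_manifests:
        live.update(json.loads(manifest).values())
    for digest in list(blobs):
        if digest not in live:
            del blobs[digest]


blobs: Dict[str, bytes] = {}  # stands in for a directory of blob files
monday = make_manifest(blobs, {1: b"a", 3: b"c"})
tuesday = make_manifest(blobs, {1: b"a", 3: b"C"})
blob_count_before = len(blobs)
expire(blobs, [tuesday])  # drop Monday's backup point
blob_count_after = len(blobs)
tuesday_state = restore_from_manifest(blobs, tuesday)
```

Here an in-memory dict stands in for the on-disk blob store. After Monday is expired, Tuesday still restores completely, because the unchanged vnode 1 is shared between the two manifests rather than owned by either dump.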

A public git repository can be found at http://kula.tproa.net/code/pyafsdump.git