Thursday 26th September 2013: 6.18am. Just had a very long and stressful day! Last night I upgraded my cloud's software and expanded the hard drive in one of my VMs as I was running out of space on it. All looked good, so I went to bed. I woke up this morning and, hmm, my encrypted offsite personal data store is gone, which matters as I need to file my Irish tax return and the store contains the tax return login keys. Moreover, the VM which implements the encrypted store is frozen, the same one I expanded last night. I try rebooting it, and it hangs a few minutes after boot. Weird.
My first thought is it must be some software incompatibility in the upgrades, as I upgraded each of the VMs as well as the hypervisor. So I spend the first few hours downgrading various components by trial and error, resetting the VM to a backup each time, and removing various features and drives. The backup works perfectly, but as soon as I re-extend its storage and the VM writes anywhere into the extra storage, it hangs again. Even weirder.
So I wipe the VM, and completely redo it from backup. Same problem: writing into the new storage freezes the VM. I wonder, is there a problem with the space where the hypervisor keeps the hard drives? While writing test files there, I notice that the hard drive image sizes add up to more than the partition's size. Eh? I think for a moment ... and the lightbulb turns on. ext4 stores the image files sparsely, so only the data actually written takes up space and you can create files as large as you like. Therefore the partition holding the images was actually running out of real storage, and the VM froze whenever it wrote into space that wasn't there.
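For future reference, here is a minimal sketch of how to spot this kind of overcommit, assuming Linux, where `st_blocks` is counted in 512-byte units. The function names and the `disk.img` file name are made up for illustration and are not anything the hypervisor provides.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units actually allocated on disk.
    return st.st_size, st.st_blocks * 512


def overcommitted_bytes(image_sizes, partition_bytes):
    """How far the images' apparent sizes exceed the partition (0 if they fit)."""
    return max(0, sum(image_sizes) - partition_bytes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "disk.img")  # hypothetical image name
        with open(image, "wb") as f:
            # Extend the file to 1 GiB without writing any data into it.
            f.truncate(1 << 30)
        apparent, allocated = apparent_and_allocated(image)
        print(f"apparent={apparent} allocated={allocated}")
```

On a filesystem that supports sparse files, the allocated figure should come out far smaller than the apparent one. Summing the apparent sizes of all the images and comparing against the partition size is the check that would have flagged the problem before the VM froze.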
So I clean out some stuff, and voila now the VM is working right. But hey, why isn't my encrypted offsite store working? The one with all my incredibly important, must never, ever lose data? That one.
Well, it turns out that when you delete a VM in my hypervisor, it helpfully deletes all drives associated with that VM, i.e. my encrypted data, all of it. And the offsite backup had very helpfully replicated the deletion of everything to the remote site, so the remote backup was also hosed. When what I had done sank in ... that horrible sinking feeling in the pit of your stomach ...
Luckily, it turns out that LVM - the software which manages storage at its lowest level - keeps a recent history of its own state or, put another way, it lets you undo horrible, horrible mistakes like deleting everything you own. One quick command later and voila, back came all my incredibly important stuff. And I felt so very, very relieved ... all my tax stuff, accounts, everything important lives there. And I had no other backup, as I had always assumed my offsite backup would be enough if anything ever went wrong.
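The exact command isn't recorded here, so the following is only a sketch under the assumption that the undo goes through LVM's `vgcfgrestore` tool, which lists and restores archived volume group metadata. The volume group name `vg0` and the archive file name are hypothetical, and the helpers only build the command lines rather than running them.

```python
import shlex


def list_archives_cmd(vg_name):
    """Command listing the metadata backups and archives LVM holds for a volume group."""
    return ["vgcfgrestore", "--list", vg_name]


def restore_cmd(vg_name, archive_file, dry_run=True):
    """Command restoring a volume group's metadata from one archive file."""
    cmd = ["vgcfgrestore"]
    if dry_run:
        # --test asks LVM to go through the motions without updating metadata.
        cmd.append("--test")
    cmd += ["-f", archive_file, vg_name]
    return cmd


if __name__ == "__main__":
    vg = "vg0"  # hypothetical volume group name
    archive = "/etc/lvm/archive/vg0_00042.vg"  # hypothetical archive file
    print(shlex.join(list_archives_cmd(vg)))
    print(shlex.join(restore_cmd(vg, archive)))
```

Restoring metadata only brings back the description of where the logical volume lived; the data itself survives only if nothing has overwritten those blocks in the meantime, and the restored volume may still need to be activated afterwards.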
That is going to change!!! The encrypted store predates my more recent ZFS storage array, which actively protects against cosmic ray corruption and the like, so the array is a very good place to keep important stuff. So, before I sleep tonight I'm going to set up automated daily backups of my encrypted store onto my ZFS array, still in encrypted form of course. And let this be a lesson to myself: don't keep all your eggs in one basket ever again!
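A rough sketch of what such a daily job could look like, assuming the encrypted store is a single container file that can be copied as-is, so it stays encrypted on the array. The paths, the `store-YYYY-MM-DD.img` naming and the 14-day retention are all made-up choices for illustration. It keeps dated copies rather than mirroring, and refuses to run if the source is missing or empty, so a deletion can't silently replicate itself again.

```python
import datetime
import os
import shutil


def backup_name(day):
    """Dated file name; ISO dates sort correctly as plain strings."""
    return f"store-{day.isoformat()}.img"


def snapshots_to_prune(names, keep):
    """Return the oldest dated copies beyond the newest `keep`."""
    dated = sorted(n for n in names if n.startswith("store-") and n.endswith(".img"))
    return dated[:-keep] if keep > 0 else dated


def run_backup(source, dest_dir, day, keep=14):
    """Copy the encrypted container to a dated file, then prune old copies."""
    # Guard: never proceed (or prune) if the source has vanished or is empty.
    if not os.path.isfile(source) or os.path.getsize(source) == 0:
        raise RuntimeError(f"refusing to back up missing or empty source: {source}")
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, backup_name(day))
    shutil.copy2(source, target)
    for old in snapshots_to_prune(os.listdir(dest_dir), keep):
        os.remove(os.path.join(dest_dir, old))
    return target


if __name__ == "__main__":
    # Hypothetical paths: the encrypted container and a directory on the ZFS array.
    run_backup("/srv/encrypted/store.img", "/tank/backups/store", datetime.date.today())
```

Run once a day from cron or a similar scheduler, this leaves a short history of copies to fall back on, which a pure mirror does not.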
I think I'll sleep very well tonight.