Back to backups

It’s been a while since I documented the current backup architecture at the house, which has changed a bit with the addition of two little HP Microservers running Solaris 11. I’m a big fan of the HP Microservers: they offer the reliability and flexibility of a true server with the energy consumption and silence of a small NAS, at an unbeatable price.

Overview

The core of the backup and operations setup is based on ZFS and its ability to take snapshots and replicate them asynchronously. In addition, the two servers each use RAIDZ over four 2 TB drives for resiliency.
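
As a rough sketch, creating such a pool on Solaris looks something like this (the pool name and device IDs below are placeholders, not my actual layout):

    # Create a RAIDZ pool named "tank" from four 2 TB drives (example device names)
    zpool create tank raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0

    # Verify the layout and health of the pool
    zpool status tank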

Starting at the point furthest away from the offsite disaster recovery copies, the core of the day-to-day action is on a Mac Mini and a ZFS Microserver in the living room. The Mac Mini is connected to the TV and the stereo, and its primary role is the media center. The iTunes library is far too large to fit conveniently on a single disk, so the Mini contains only the iTunes application and the library database. The actual contents of the iTunes library are stored on the ZFS server via an NFS mount, which ensures that the path is consistent and auto-mounted, even before a user session is opened. AFP mounts are user dependent and only open with the session, and in case of a naming conflict they append a “-1” to the name listed in /Volumes, which can cause all sorts of problems.
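
For reference, this is roughly what an auto-mounted NFS setup looks like on the Mac, using a direct automounter map (the server name, dataset and mount point here are illustrative, not my exact configuration):

    # /etc/auto_master -- add a direct map entry
    /-    auto_nfs    -nobrowse,nosuid

    # /etc/auto_nfs -- map a fixed local path to the ZFS server's NFS share
    /mnt/media    -fstype=nfs,rw,bg,hard,intr,tcp    nfs://zfs-livingroom/tank/media

    # Reload the automounter so the map takes effect
    sudo automount -vc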

The Media ZFS filesystem is snapshotted and replicated every hour to the second server in the office. The snapshot retention is set to 4 days (96 hourly snapshots), so in the case of data corruption I can easily roll back to any snapshot state in the last few days, or I can manually restore any files deleted by accident by browsing the snapshots. A key point here is that ZFS tracks changes at the block level, so an incremental replication contains only the modified blocks and can be calculated on the fly during the send operation. This means there are no long evaluation cycles like those in traditional backup approaches using Time Machine or rsync.
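
Under the hood, the hourly job boils down to a snapshot followed by an incremental send, along these lines (dataset, host and snapshot names are examples; the real thing is scripted and scheduled):

    # Take the new hourly snapshot on the living room server
    zfs snapshot tank/media@2012-05-01_14h00

    # Send only the blocks changed since the previous snapshot to the office server
    zfs send -i tank/media@2012-05-01_13h00 tank/media@2012-05-01_14h00 | \
        ssh office-zfs zfs receive -F tank/media

    # Enforce the 96-snapshot retention by destroying the oldest snapshot
    zfs destroy tank/media@2012-04-27_13h00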

Due to their size, iPhoto libraries are also stored on the server, on a separate user volume, and replicated using the same methods.

Then there’s the question of the Mini’s backups. In order to minimize RPO and RTO, I have two approaches. The first is that I use SuperDuper to clone the internal Mini SSD to an external 2.5” drive once per day. This permits an RTO of practically zero: if the internal drive dies, I can immediately reboot from the external drive with a maximum of 24 hours of lag in the contents. To limit data loss, the Mini is also backed up every hour via Time Machine to the local ZFS server. I’m using the napp-it tool on the ZFS box to handle the installation and configuration of the Netatalk package to publish ZFS filesystems over AFP. Again, the backup volume is replicated hourly to the second server in the office.
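
For the Time Machine part, pointing the Mini at the netatalk share is essentially a one-liner once the AFP volume is reachable (the share name and credentials below are placeholders):

    # Allow Time Machine to use a network volume that isn't an Apple device
    # (may not be needed if netatalk advertises the share as Time Machine capable)
    defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1

    # Set the AFP share published by netatalk as the backup destination
    sudo tmutil setdestination afp://backupuser:password@zfs-livingroom/TimeMachine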

RTO iTunes

Another advantage of this structure is that if the living room server dies, the only thing I need to do is change the NFS mount point on the Mini to point to the server in the office and everything is back online. The catch is that the house is very old and I haven’t yet found an effective, discreet method for pulling GbE between the office and the living room, so this connection is over Wi-Fi and there is a definite performance hit. But for music and up to 720p video it works just fine.
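
With the automounter approach sketched above, the failover is a one-line change in the map to point at the office server, followed by a reload (names again are illustrative):

    # /etc/auto_nfs -- repoint the media mount at the office server
    /mnt/media    -fstype=nfs,rw,bg,hard,intr,tcp    nfs://zfs-office/tank/media

    # Reload the automounter; iTunes keeps using the same path as before
    sudo automount -vc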

iOS

All of the iOS devices in the house are linked to the iTunes library on the Mini, including their backups, so they get a free ride on the backups of the Mini.

Portables

All of the MacBooks in the house are also backed up via Time Machine to volumes on the living room server, with hourly replication to the office, so there are always two copies available at any moment.

The office

In the office I have the second ZFS server plus an older Mac Mini running OS X Server. The same strategy applies to this Mac Mini as well: an external drive is cloned via SuperDuper for a quick return to service. However, I’ve had issues with the sheer number of files on the server causing problems for Time Machine, so I also use SuperDuper to clone the server to a disk image on the ZFS server.
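
SuperDuper can create the image itself, but for illustration here is the equivalent done by hand, with a placeholder size, name and path, on a ZFS volume mounted from the server:

    # Create a growable sparse bundle image on the mounted ZFS volume
    hdiutil create -size 200g -type SPARSEBUNDLE -fs HFS+J \
        -volname "ServerClone" /Volumes/Backups/ServerClone.sparsebundle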

I have a number of virtual machines for lab work in various formats (VirtualBox, ESX, Fusion, Xen, …) stored on dedicated volumes on the office ZFS server and accessed via NFS. I’ve played with iSCSI on this system and it works well, but NFS is considerably more flexible and any performance difference is negligible. Currently the virtualisation host is an old white box machine, but I’m dreaming of building a proper ESX High Availability cluster using two Mac Minis, based on the news that I can install ESXi 5 on the latest generation and virtualize OS X instances as well as my Linux and Windows VMs.
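
Each VM datastore is just a ZFS filesystem exported over NFS, roughly like this (the dataset names are examples, and Solaris 11 also offers a richer zfs set share= syntax):

    # Dedicated dataset for the lab virtual machines, exported over NFS
    zfs create tank/vm
    zfs set sharenfs=on tank/vm

    # Compression is cheap and helps with sparse VM disk images
    zfs set compression=on tank/vm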

Offsite

No serious backup plan would be complete without an offsite component. I currently use a simple USB dual drive dock to hold the backup zpool (striped for maximum space) made up of two 2 TB drives. The filesystems receive an incremental update on a daily basis, but only the most recent snapshot is retained.
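
The offsite routine, with placeholder pool, dataset and device names, looks roughly like this:

    # One-time: create the striped (non-redundant) offsite pool across the two dock drives
    zpool create offsite c9t0d0 c9t1d0

    # Daily, per filesystem: incremental send, then drop the older snapshot offsite
    zfs send -i tank/media@yesterday tank/media@today | zfs receive -F offsite/media
    zfs destroy offsite/media@yesterday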

These disks are swapped out on a weekly or bi-weekly basis. With the contents of these two disks I can reconstruct my entire environment on any PC on which I can install Solaris.
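
Swapping the disks is just an export, a physical swap, and an import of the other pair (pool names are examples):

    # Cleanly detach the current offsite pool before pulling the drives from the dock
    zpool export offsite

    # After inserting the other pair of drives, bring that pool online
    zpool import offsite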

The best part of this backup structure is that it requires practically no intervention on my part at all. I receive email notifications of the replication transactions so if anything goes wrong I’ll spot it in the logs. The only real work on my part is swapping out the offsite disks on a regular basis, but even there, the process is forgiving and I can swap the disks at any time as there is no hard schedule that has to be followed.

Resources