Auto Replicate


Here’s a little script I put together to simplify my life dealing with ZFS replication.

Sources available on GitHub

The idea was to have a simple one-liner that would replicate a source filesystem to a remote filesystem with a minimum of options to deal with. It does not create any snapshots itself; you have to take them yourself, with Time Slider or whatever other snapshot scheduling method works for you.
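For reference, the plumbing that such a script wraps looks roughly like this; the filesystem, host and snapshot names below are only placeholders, not the script's actual interface:

    # First run: full send of the newest snapshot to the remote pool
    zfs send tank/data@snap-b | ssh backuphost "zfs recv backuppool/data"

    # Later runs: incremental send between the last common snapshot and the newest one
    zfs send -I tank/data@snap-b tank/data@snap-c | ssh backuphost "zfs recv backuppool/data"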

The idea is that you should be able to pop the script into a cron job and leave it alone. It will create the destination filesystem if required and then keep it up to date.
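A crontab entry along these lines is the intended usage; the script path, arguments and schedule here are placeholders rather than the script's real command line:

    # Replicate tank/data to backuppool on backuphost once an hour
    0 * * * * /usr/local/bin/auto-replicate.sh tank/data backuphost backuppool >> /var/log/auto-replicate.log 2>&1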

The other thing this script doesn’t do is delete any snapshots. It’s up to you to define your own cleanup policy on the remote copy.
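A destination cleanup policy can be as simple as a cron job that trims old snapshots; here is one possible sketch (the filesystem name and retention count are placeholders, and snapshot age comes from the creation order reported by zfs list):

    #!/bin/sh
    # Keep only the most recent $KEEP snapshots of the replicated filesystem
    FS=backuppool/data
    KEEP=30

    SNAPS=`zfs list -H -t snapshot -o name -s creation -r "$FS" | grep "^$FS@"`
    TOTAL=`echo "$SNAPS" | wc -l`
    EXCESS=`expr $TOTAL - $KEEP`

    if [ "$EXCESS" -gt 0 ]; then
        # Oldest snapshots are listed first, so trim from the top
        echo "$SNAPS" | head -n "$EXCESS" | while read SNAP; do
            zfs destroy "$SNAP"
        done
    fi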

I also tested that you can use a temporary copy to move big data sets from one datacenter to another so that you're not sending terabytes across the WAN. Using the same script to send the data from the master to a laptop or a portable ZFS NAS, you can then integrate it onto the remote server, and the script will properly establish the links between the snapshots.
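Conceptually, the seeding workflow looks like this; the pool and snapshot names are placeholders and the script performs the equivalent sends for you:

    # Stage 1: seed a portable pool while attached to the master's LAN
    zfs send tank/data@snap-a | zfs recv portable/data

    # ...ship the portable pool to the remote site and import it there...

    # Stage 2: copy from the portable pool onto the remote server's pool
    zfs send portable/data@snap-a | zfs recv backuppool/data

    # Later incrementals from the master cross the WAN and chain off snap-a,
    # which now exists on both source and destination
    zfs send -I tank/data@snap-a tank/data@snap-b | ssh remotehost "zfs recv backuppool/data"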

Comments are welcome; feel free to use and modify it for your own environment, although I tried to make it as generically useful as possible. I'm more of a Perl hand than a shell scripter, so recommendations for more elegant methods are definitely welcome, as are any other improvements that seem useful.

Update:

Added a check for the existence of the source filesystem, since the errors produced when it is missing are very unclear.
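The check amounts to a guard along these lines (a sketch with placeholder variable names, not the script's exact code):

    # Bail out with a readable message if the source filesystem doesn't exist
    zfs list -H -o name "$SRC_FS" > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "Source filesystem $SRC_FS does not exist" >&2
        exit 1
    fi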

Update 25 Jan 2010

Just tried running this from a non-root account under b130 (since going straight to root without creating a user account is no longer an option in the releases after 2008.11) and it fails. I'm trying to figure out the necessary rights, when pfexec needs to be used, and how to check for a source snapshot.

Further to this one, it turns out the problem is related to the obvious (in hindsight) issue of trying to replicate a filesystem that has no snapshots. I’ve added a check to verify that a snapshot exists before continuing through the script and added a pfexec prefix to the zfs commands. If you’re running as root, this should not cause any problems, but if you’re running as a user account (with the appropriate zfs privileges) this is required.
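The snapshot check and the pfexec usage boil down to something like this (a sketch; the variable names are placeholders):

    # Find the newest snapshot on the source; abort if there are none to send
    LAST_SNAP=`pfexec zfs list -H -t snapshot -o name -s creation -r "$SRC_FS" | grep "^$SRC_FS@" | tail -1`
    if [ -z "$LAST_SNAP" ]; then
        echo "No snapshots found on $SRC_FS - nothing to replicate" >&2
        exit 1
    fi

    # All zfs commands carry the pfexec prefix so that a non-root account
    # with delegated ZFS privileges can run the script
    pfexec zfs send "$LAST_SNAP" | ssh "$DEST_HOST" "pfexec zfs recv $DEST_FS"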

Corrected a problem with the grep syntax for matching snapshots from the source and destination.

Update 2 Mar 2011

This has been in my local copy for a while but I’m finally getting around to posting. When localhost is used as the destination, the zfs recv command runs against the local instance of zfs rather than passing through ssh. This means that in the examples used for copying from one system to another, you don’t need a separate server to accept the copy; you can simply attach some external disks, create a pool and do the copy locally.
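The localhost handling is just a conditional around the receive side, roughly like this (placeholder variable names again):

    # Skip ssh entirely when the destination is this machine, e.g. a pool on
    # locally attached external disks
    if [ "$DEST_HOST" = "localhost" ]; then
        pfexec zfs send -I "$LAST_COMMON" "$LAST_SNAP" | pfexec zfs recv "$DEST_FS"
    else
        pfexec zfs send -I "$LAST_COMMON" "$LAST_SNAP" | ssh "$DEST_HOST" "pfexec zfs recv $DEST_FS"
    fi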

I’m currently using this on OpenSolaris 2009.06, Nexenta 3.0.4 and Solaris Express.

Update 22 April 2011

Removed the -d option from the recv command, which seemed to cause “invalid backup stream” errors when receiving incremental streams.
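In other words, incremental streams are now received directly into the named destination filesystem rather than with recv -d (the names below are placeholders):

    # Before: ... | zfs recv -d backuppool        (gave "invalid backup stream" errors)
    # Now the stream goes straight into the destination filesystem:
    zfs send -I tank/data@snap-a tank/data@snap-b | ssh backuphost "zfs recv backuppool/data"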

Update 6 August 2012

Added a check for Nexenta and OpenIndiana distributions in order to use the correct full paths to the required commands. Note that on these distributions, you’ll need to be running as root or have delegated access to the zfs and zpool commands.
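The distribution check just selects absolute paths for the binaries up front; the detection and paths below are a sketch rather than the script's exact logic:

    # Use absolute paths on Nexenta and OpenIndiana (a cron environment may not
    # have /usr/sbin in PATH); fall back to whatever PATH provides elsewhere
    if egrep -i 'nexenta|openindiana' /etc/release > /dev/null 2>&1; then
        ZFS=/usr/sbin/zfs
        ZPOOL=/usr/sbin/zpool
    else
        ZFS=zfs
        ZPOOL=zpool
    fi

    $ZFS list > /dev/null 2>&1 || { echo "cannot run $ZFS" >&2; exit 1; }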

Update 5 October 2012

Added in zfs holds to eliminate the potential problems of missing or mismatched snapshots. After a successful transfer, the oldest snapshot on the source gets a hold with the name of the destination pool so that it cannot be auto deleted by external snapshot cleanup jobs. This ensures that there will always be matching source and destination snapshots available for the next incremental send.
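In terms of the zfs commands involved, the hold management looks roughly like this (a sketch; the hold tag is the destination pool name as described above, the other names are placeholders):

    # After a successful transfer, pin the snapshot that must survive for the
    # next incremental send with a hold named after the destination pool...
    pfexec zfs hold "$DEST_POOL" "$SRC_FS@$NEW_BASE"

    # ...and release the hold from the snapshot it replaces so external cleanup
    # jobs can eventually remove it
    pfexec zfs release "$DEST_POOL" "$SRC_FS@$OLD_BASE"

    # 'zfs holds' shows which snapshots are currently pinned and by which tag
    pfexec zfs holds "$SRC_FS@$NEW_BASE"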

NB: you can still break the thing by taking snapshots on the destination. But you shouldn’t be doing that…

To Do:

  • Add an option for initial replication to retain the snapshot history instead of only the most recent one