In a pinch...

A fun story for the sysadmins out there about an ugly situation that got fixed relatively easily. Recently, I ran into a situation in a client datacenter: a FreeNAS system whose USB boot key, the one holding the OS, had died. All of the important file services, notably the NFS exports to a couple of ESXi servers, were still running, but anything that touched the OS was dead. So no web console, and no SSH connections.

In my usual carry bag, I have my MacBook Pro, a Thunderbolt to GbE adaptor and a Samsung 1TB T2 USB 3 portable SSD, formatted with ZFS. And of course some spare USB keys.

So first up, using VMware Fusion, I installed the latest version of FreeNAS onto a spare key in case the original was a complete loss. How to do this? Well, you can’t boot a BIOS-based VM off a USB key, but you can boot from an ISO and then connect the USB key to the VM as the destination for the install. So now I had something to boot the server from later.

Then the question was: how to swap this out without taking down the production machines running on the ESXi servers? For this I created a new Ubuntu VM and installed ZFS on Linux plus the NFS kernel server. That gave me an environment with native USB 3 support and automatic NFS publishing via the ZFS “sharenfs” attribute, so I connected the Samsung T2 to the VM and imported the zpool. I couldn’t use FreeNAS in a VM for this since its USB 3 support is not great.
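For anyone wanting to replay this, the Ubuntu side boils down to a handful of commands. A rough sketch, assuming a reasonably recent Ubuntu and the pool name “t2ssd” that shows up later in this post:

    sudo apt-get install -y zfsutils-linux nfs-kernel-server
    sudo zpool import          # with no arguments, lists importable pools on the attached disk
    sudo zpool import t2ssd    # import the T2's pool by name
    sudo zpool status t2ssd    # sanity check before trusting it with production VMs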

Then there was a quick space calculation to see whether I could squeeze the running production machines into the free space. I had to blow away some temporary test machines and some older ISO images to be sure I was OK. Then I created a new filesystem with the ever-so-simple “zfs create t2ssd/panic”, followed by “zfs set sharenfs=on” and opening up all of the rights on the new filesystem. Oh, and of course, “zfs set compression=lz4” wasn’t necessary since compression was already set on the pool and inherited by the new filesystem.
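Spelled out in full, that step looked more or less like this (the default mountpoint and the chmod are my reconstruction of “open up all of the rights”; the post itself only gives the zfs commands):

    sudo zfs list -o name,used,avail t2ssd    # the quick space check
    sudo zfs create t2ssd/panic
    sudo zfs set sharenfs=on t2ssd/panic      # ZFS on Linux hands this straight to the kernel NFS server
    sudo chmod 777 /t2ssd/panic               # crude, but this filesystem only has to live for a day
    sudo zfs get compression t2ssd/panic      # confirms lz4 is inherited from the pool

Worth noting: ESXi mounts NFS datastores as root, and the stock Linux export options squash root to nobody, which is presumably why opening up the permissions mattered at all.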

Then it was just a matter of mounting the NFS share on the ESXi servers and launching a pile of svMotion operations to move them to the VM on my portable computer on a USB Drive. Despite the completely non-enterprisey nature of this kludge, I was completely saturating the GbE link (the production system runs on 10GbE - thank god for 10GBase-T and Ethernet backwards compatibility).
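Mounting the share on each ESXi host is a one-liner from an SSH session (the IP address and datastore name here are placeholders):

    esxcli storage nfs add --host=192.168.1.50 --share=/t2ssd/panic --volume-name=panic

Once the “panic” datastore shows up, the Storage vMotion operations can be kicked off from the vSphere client like any other migration.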

Copying took a while, but after a few hours I had all of the running production machines transferred over and running happily on a VM on my portable computer on a USB Drive.

Then it was just a matter of rebooting the server off the new USB key, importing the pool, setting up the appropriate IP addresses and sharing out the volumes. Once the shares came back online, they were immediately visible to the ESXi servers.
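(The import itself would normally go through the FreeNAS web UI’s volume import wizard; the shell equivalent is roughly the following, where “tank” stands in for whatever the original pool was called, and -f is needed because the dead OS never got the chance to export it cleanly:)

    zpool import          # list the pools the fresh install can see
    zpool import -f tank  # force-import the never-exported data pool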

Then I left the MacBook in the rack overnight while the svMotion operations copied all the VMs back to their points of origin.

Best part: nobody noticed.