Nasuni - scaling unstructured data

Storage Field Day 16 included a very interesting presentation by Nasuni, showing how the product has matured and how it can solve real-world issues. It's an interesting product that sits in an odd space: it's closer to SaaS, but it includes hardware edge servers, with the option of deploying them as virtual machines instead.

The problem they solve is coping with unstructured data at scale, including the ability to cope with widely distributed offices. One thing that I particularly like about the product is that it is a complete end-to-end solution covering every aspect of managing unstructured data, from providing access to backups to DR. A side effect is that it's often hard to do any kind of direct comparison with other products and approaches. But here's an overview.

Some traditional approaches

Big central filers

This is the most basic approach to managing unstructured data: big file servers in a central datacenter. It has a lot of advantages in that everything is in one place, you can standardize all of the management and centralize backups. Compared to Nasuni, it falls down in two major ways: scaling, and dealing with multiple offices.

For the branch offices the problem is the latency and bandwidth getting back to the central file servers. Users hate it when they double-click a file and it doesn't open immediately. The current palliative is adding WAN accelerators like Riverbed or Silver Peak, which do a pretty good job of locally caching file IO. Fundamentally, though, they are networking products, optimized at the block level without any particular knowledge of the file traffic they are handling. That has some advantages, since files that share blocks can be deduplicated in the network streams, but it lacks any additional context awareness like metadata pre-caching.
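
To illustrate the idea (this is not how Riverbed or Silver Peak actually work internally, only a minimal sketch of block-level deduplication in a stream): chop the traffic into fixed-size blocks, hash them, and only send blocks the other side hasn't already seen. Nothing in this sketch knows or cares that the blocks belong to files.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real accelerators vary


def dedup_stream(data: bytes, seen: set[str]) -> tuple[list[bytes], int]:
    """Split a byte stream into blocks and return only the blocks whose
    hash hasn't been seen yet, plus a count of blocks skipped."""
    to_send, saved = [], 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest in seen:
            saved += 1           # peer already has this block, skip it
        else:
            seen.add(digest)
            to_send.append(block)
    return to_send, saved


# Two "files" that happen to share most of their content.
seen: set[str] = set()
file_a = b"A" * 16384 + b"header-a"
file_b = b"A" * 16384 + b"header-b"
dedup_stream(file_a, seen)
_, saved = dedup_stream(file_b, seen)
print(f"blocks skipped on second transfer: {saved}")
```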

Then there is the scaling issue. Centralized NAS systems are great and many of them can scale to staggeringly large systems, but there are two big issues when you look at the complete life-cycle of managing them: scaling the related systems, and the stair-step upgrade issue.

On the first one, the classic problem is that every evolution of the primary storage requires that the backup systems evolve in lockstep. Originally the issue was that you simply ran out of bandwidth to the tape storage, but now that most backups are disk to disk (D2D), the issue is the amount of disk available on the backup system, which of course needs to scale as a function of the amount of data that needs to be backed up. I can't count the number of times that I have had to intervene with backup systems that simply ran out of space.
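
As a rough illustration of why the backup target has to grow with the primary, here's a back-of-the-envelope D2D sizing calculation. The change rate, retention and dedup ratio below are made-up illustrative numbers, not vendor figures.

```python
def backup_capacity_tb(primary_tb: float, daily_change_rate: float,
                       retention_days: int, dedup_ratio: float) -> float:
    """Rough D2D sizing: one full copy plus daily incrementals over the
    retention window, reduced by the appliance's dedup/compression ratio."""
    full = primary_tb
    incrementals = primary_tb * daily_change_rate * retention_days
    return (full + incrementals) / dedup_ratio


# Illustrative numbers only: 2% daily change, 30-day retention, 4:1 dedup.
print(f"{backup_capacity_tb(200, 0.02, 30, 4.0):.0f} TB of backup disk needed")
# Grow the primary from 200 TB to 300 TB and the backup target grows with it.
print(f"{backup_capacity_tb(300, 0.02, 30, 4.0):.0f} TB of backup disk needed")
```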

Then there's the stair-step issue. Once you've reached the limit of your current NAS, you have basically two options: buy a new one and find some way to link it to the existing share structure (think DFS-N), or buy a bigger one and migrate everything over. Both of these fall into the category of activity that I file under IT for IT's sake. There's a good technical reason that the work needs to be done, but other than the additional space there's no upside for the business, and lots of things can go sideways in the process.

There's a third issue that is less pertinent in the vast majority of cases but should still be taken into account: traditional NAS solutions can scale up to a certain point, but scaling down is not really an option unless you're using a more modern scale-out cluster architecture (and you've made the CapEx investment anyway, so that doesn't really change much). If your workloads have any significant seasonality in capacity or load, you have to buy for your peak and let the system idle the rest of the time.
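
A quick illustration of the peak-sizing problem, with made-up seasonal numbers: the NAS has to be bought for the busiest month and then sits half empty on average.

```python
# Hypothetical monthly capacity demand in TB for a seasonal workload.
monthly_demand_tb = [60, 60, 65, 70, 80, 120, 180, 180, 120, 80, 65, 60]

peak = max(monthly_demand_tb)
average = sum(monthly_demand_tb) / len(monthly_demand_tb)

# A traditional NAS has to be bought for the peak month...
print(f"capacity purchased: {peak} TB")
# ...while the average month only uses a fraction of it.
print(f"average utilization: {average / peak:.0%}")
```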

Multi-site replication

Here the idea is that each branch office has a local copy of the data, replicated via (probably) some kind of mesh with something like DFS-R. From a user perspective this is one of the best approaches, since there's a local copy of all of the data. From a backup and DR standpoint it's also nice: you can back up the copy you have on a system in your datacenter, and since there are copies everywhere, DR is pretty much built in.

But we still have the downside of scaling the backup systems as you grow, plus the issue that when your data requirements grow, you now have n systems to upgrade to ensure that they have the capacity to hold all of the replicated shares.

In any kind of replicated system you’ll also run into the issues surrounding file locking where people in multiple sites want to work on the same file and try to open it simultaneously. Historically, this has been a little flaky with the standard Microsoft tools around DFS-R, but has improved significantly in recent versions of Windows Server.

NAS in the cloud

This is the simplest end run in some ways: you reproduce the traditional architecture with a centralized NAS, but instead of deploying it in your datacenter, it lives in a cloud provider's environment. Which brings us back to the same issues we had running it in the datacenter: latency to the clients, additional complexity with WAN accelerators, and you still need to design a complete backup process, since simply taking snapshots in the cloud is probably insufficient for most companies' data retention requirements for this kind of data.

Nasuni approach

Nasuni has done a lot of work to address the various pain points noted above. It’s worth noting that the other solutions are completely viable, if you’re willing to live with the various constraints and limitations, but with Nasuni, you have the opportunity to divest IT operations of all the operational hassles surrounding unstructured data.

Nasuni uses an object store for the back-end, with UniFS layered on top for all of the nice features like snapshots and so on. You can roll your own object store and use its replication features for the backups, since the replicated data also contains the snapshot history, permitting you to go back in time for file restores. Or if you're using a cloud object store like S3 or Azure, there's just an inter-region replication policy to apply to get complete backup protection.
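
For the cloud case, the kind of inter-region replication policy being referred to looks roughly like this on S3. This is just what the AWS-side policy looks like in principle, not anything Nasuni-specific; the bucket names and IAM role are placeholders, and both buckets need versioning enabled before replication can be configured.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="nasuni-primary-volume",  # placeholder source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication",  # placeholder role
        "Rules": [
            {
                "ID": "replicate-everything-to-dr-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::nasuni-dr-volume"  # placeholder target
                },
            }
        ],
    },
)
```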

Using a cloud provider’s object store as the back-end has the additional advantage that scaling is no longer an issue. From a practical standpoint, S3 and Azure will scale to whatever you need without any kind of operational actions on the part of the IT team, avoiding both the stair-step problems and the backup scaling issues.

From a DR standpoint, the backup is now a dual-use copy: if the primary system becomes unavailable, the UniFS instance just has to be switched to the secondary copy as its back-end and you're back in business. From what I understand there are some technical niceties about the way Azure handles this that make it possible for this to be a hot failover without administrative intervention, so it's more in the high availability category than DR with a Big Red Button that needs to be pushed. If you use your own internal object store, this will depend on how the specific implementation handles an outage on the primary store.
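
Conceptually the failover is just "point the file system at the other copy". As a sketch of that idea only (this is not Nasuni's actual mechanism, and the bucket names and regions are placeholders), a client-side version would look something like this:

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

# Placeholder names; the DR copy is assumed to be a replicated bucket in
# another region that already contains the same objects.
PRIMARY = {"bucket": "nasuni-primary-volume", "region": "eu-west-1"}
SECONDARY = {"bucket": "nasuni-dr-volume", "region": "eu-central-1"}


def get_object(key: str) -> bytes:
    """Try the primary back-end first, fall back to the replica on failure."""
    for target in (PRIMARY, SECONDARY):
        s3 = boto3.client("s3", region_name=target["region"])
        try:
            response = s3.get_object(Bucket=target["bucket"], Key=key)
            return response["Body"].read()
        except (ClientError, EndpointConnectionError):
            continue  # this back-end is unavailable, try the next copy
    raise RuntimeError(f"object {key!r} unavailable on both back-ends")
```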

For perceived performance on the user side, similar to many WAN accelerators, Nasuni uses locally installed appliances that act as local caches for the cloud master. Each site can have an appliance sized for the local workload of the office, deployed either as a hardware appliance or, if you already have some kind of local hypervisor for other services (typically domain controllers, print servers, etc.), as a virtual machine.
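
The caching idea itself is simple enough to sketch. This is not Nasuni's implementation, just a minimal read-through LRU cache to show the principle: hot files are served locally, cold files are fetched from the cloud master and kept for next time.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Minimal read-through LRU cache: hot files are served locally,
    misses are fetched from the cloud master and kept for next time."""

    def __init__(self, fetch_from_cloud: Callable[[str], bytes], capacity: int = 128):
        self._fetch = fetch_from_cloud
        self._capacity = capacity
        self._cache: OrderedDict[str, bytes] = OrderedDict()

    def read(self, path: str) -> bytes:
        if path in self._cache:
            self._cache.move_to_end(path)      # cache hit: no WAN round trip
            return self._cache[path]
        data = self._fetch(path)               # cache miss: go to the cloud master
        self._cache[path] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)    # evict the least recently used file
        return data


# Usage with a stand-in for the cloud fetch (placeholder function).
cache = EdgeCache(fetch_from_cloud=lambda p: b"contents of " + p.encode(), capacity=2)
cache.read("/projects/plan.docx")   # miss: fetched over the WAN
cache.read("/projects/plan.docx")   # hit: served from the local appliance
```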

For the file locking issue, they have a separate highly available service specifically designed to handle distributed file locking across sites, which runs out of band from the regular file access.
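
I don't have details on how their locking service is implemented, but the general shape of an out-of-band lock protocol is easy to sketch: before an appliance lets a user open a file for writing, it asks a central service for a lease on that path. Everything below is hypothetical, toy code.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class Lease:
    owner: str
    expires: float


class LockService:
    """Toy central lock service: edge appliances ask for a lease on a path
    before letting a user open the file for writing."""

    def __init__(self, lease_seconds: float = 30.0):
        self._leases: dict[str, Lease] = {}
        self._mutex = threading.Lock()
        self._lease_seconds = lease_seconds

    def acquire(self, path: str, owner: str) -> bool:
        with self._mutex:
            lease = self._leases.get(path)
            if lease and lease.expires > time.time() and lease.owner != owner:
                return False  # another site holds an unexpired lease
            self._leases[path] = Lease(owner, time.time() + self._lease_seconds)
            return True

    def release(self, path: str, owner: str) -> None:
        with self._mutex:
            lease = self._leases.get(path)
            if lease and lease.owner == owner:
                del self._leases[path]


# Two sites contend for the same file; only one gets write access.
locks = LockService()
print(locks.acquire("/finance/budget.xlsx", owner="site-london"))   # True
print(locks.acquire("/finance/budget.xlsx", owner="site-boston"))   # False
```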

Upshot

If you're dealing with medium to large scale multi-site file sharing, Nasuni is definitely worth looking into. It's also useful for single-site NAS installations to lighten the operational load, getting rid of the administrative work of managing the NAS storage, backups and DR.