Revisiting Thin Provisioning

This post comes from having to explain the different levels of thin provisioning to a client who was dealing with it for the first time. It’s sometimes hard to remember that not everyone has gone through the same learning cycle or been exposed to the same history you have. This post is also aimed at the UI designers of storage products.

The basics: at each level where you assign storage to something, thin provisioning lets you present a logical object of that storage type that essentially lies to the system consuming it. You don’t have to physically allocate the full size up front; physical space is only consumed as writes land on the object.

Storage bay: “Sure, here’s a 4TB LUN, format it!”

Server: “Cool, I have 4TB of new space to put stuff!”

Reality: you’ve written the filesystem’s allocation tables and used a few megabytes.
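
To make that conversation concrete, here’s a toy Python sketch of a thin LUN. This isn’t any real array’s API, just an illustration of the “lie”: the consumer always sees the full logical size, while physical space is only tracked for blocks that have actually been written.

```python
# Toy model of a thin-provisioned LUN. Illustrative only, not a real storage API.

class ThinLUN:
    def __init__(self, logical_size_gb, block_size_mb=1):
        self.logical_size_gb = logical_size_gb   # what the server is told
        self.block_size_mb = block_size_mb
        self.written_blocks = set()              # blocks physically backed so far

    def reported_size_gb(self):
        # The "lie": the consumer always sees the full logical size.
        return self.logical_size_gb

    def write(self, block_number):
        # Physical space is only consumed the first time a block is written.
        self.written_blocks.add(block_number)

    def physical_used_gb(self):
        return len(self.written_blocks) * self.block_size_mb / 1024


lun = ThinLUN(logical_size_gb=4096)       # "here's a 4TB LUN"
for block in range(8):                    # formatting writes a few MB of metadata
    lun.write(block)

print(f"Server sees:     {lun.reported_size_gb()} GB")   # 4096 GB
print(f"Physically used: {lun.physical_used_gb()} GB")   # ~0.008 GB
```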

The danger of thin provisioning is spending beyond your means. If your storage bay only has 2TB of space and you allocate a 4TB LUN, the day you need to write that one byte beyond the physical 2TB you actually have, the system will simply refuse and you’re screwed. So the rule of thumb I always use at the storage bay level is to only allocate up to what you really have available. Thin provisioning means you will probably have some extra swing space available for emergencies. But that’s what it’s for: emergencies, like having the room to do a restore without wiping the original machine.
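
And here’s the overcommit trap in the same toy style (again, just a sketch, not how any particular array does its accounting): a 4TB thin LUN carved out of a 2TB pool happily accepts writes right up until the pool runs dry, and then everything stops.

```python
# Toy model of an overcommitted thin pool. Illustrative only.

class ThinPool:
    def __init__(self, physical_capacity_blocks):
        self.physical_capacity_blocks = physical_capacity_blocks
        self.allocated_blocks = 0

    def allocate_block(self):
        if self.allocated_blocks >= self.physical_capacity_blocks:
            # The "one byte over" moment: the pool has nothing left to give.
            raise RuntimeError("pool exhausted, write refused")
        self.allocated_blocks += 1


pool = ThinPool(physical_capacity_blocks=2_000)   # pretend blocks are 1GB: a 2TB pool
lun_logical_blocks = 4_000                        # a 4TB thin LUN, overcommitted 2:1

for block in range(lun_logical_blocks):
    try:
        pool.allocate_block()                     # fine for the first 2,000 blocks...
    except RuntimeError as err:
        print(f"Write to block {block} failed: {err}")   # ...then the bay says no
        break
```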

With that said, the problem this client was having was that the UI on his storage bay showed the amount of space each LUN was actually using, and for some reason it colored the volumes with a traditional green-yellow-red scheme based on the percentage of blocks allocated. Not understanding how the storage system worked, he was understandably concerned when he started seeing volumes in red. Note to all UI designers: stop this! As long as you haven’t allocated more volume space than you actually have, fully allocated volumes are not an issue. Not necessarily the most efficient use of your disk, as we’ll see in a second, but not a problem.

A thin provisioned volume that has 100% of its blocks allocated is not in danger of anything. It’s working as designed. But how do you get to 100% when, from the point of view of the server consuming the volume, it’s obviously not 100% full? This is the difference between being allocated at the storage level and being part of a logically defined file object. When you create a new file, the filesystem obviously has to allocate some fresh blocks to store it. But when you delete the file, the filesystem doesn’t actually do anything to the blocks where the file’s data is stored. All it does is update the allocation table to note that the blocks previously occupied by the file are no longer in use.

But the underlying storage has no concept of files or file systems. It is just a bunch of dumb blocks associated with a set of addresses on the physical storage. You asked it to allocate those blocks and it did so. The fact that the filesystem is no longer using them is not its problem. You did put data on them at one point so they’ve been allocated.

Now, at the filesystem level, deleting that file freed up space from its point of view. If at some point in the future it needs to write a new file and all of the blocks have already been written to at some time in the past, it will simply write that new file onto blocks that previously held the data of a deleted file. The previously allocated blocks get their contents overwritten by the new data without consuming any additional space on the underlying storage.
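
Here’s that lifecycle as a rough sketch, using a made-up toy filesystem on top of a toy pool; the point is just that a delete only touches the filesystem’s own allocation table, and a later write reuses the same block addresses without asking the pool for anything new.

```python
# Toy filesystem on a thin pool: a delete only updates filesystem metadata,
# the pool never hears about it. Illustrative only.

class ThinPool:
    def __init__(self):
        self.allocated = set()        # block addresses the pool has ever backed

    def write(self, address):
        self.allocated.add(address)   # first write to an address consumes pool space


class ToyFilesystem:
    def __init__(self, pool, total_blocks):
        self.pool = pool
        self.free_blocks = list(range(total_blocks))   # the "allocation table"
        self.files = {}                                # name -> list of block addresses

    def create(self, name, num_blocks):
        blocks = [self.free_blocks.pop(0) for _ in range(num_blocks)]
        for address in blocks:
            self.pool.write(address)                   # data actually hits the pool
        self.files[name] = blocks

    def delete(self, name):
        # Only metadata changes: the blocks go back on the filesystem's free list,
        # but nothing is sent to the pool, so its allocation never shrinks.
        self.free_blocks = self.files.pop(name) + self.free_blocks


pool = ThinPool()
fs = ToyFilesystem(pool, total_blocks=100)

fs.create("big_file", 100)
print(len(pool.allocated))   # 100: the volume is now 100% allocated at the pool level

fs.delete("big_file")
print(len(pool.allocated))   # still 100: the pool has no idea the file is gone

fs.create("new_file", 50)
print(len(pool.allocated))   # still 100: the new file reuses already-allocated addresses
```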

Back to the efficiency question. What if I create a volume, fill it with data and then delete everything? Well, the filesystem will be empty and ready to accept new files, so that’s fine from a functional standpoint. But if you can’t delete the volume and you want that space back to use for something else, you need some kind of arrangement between the consumer of the storage and the storage system so that they can agree on which blocks the filesystem is actually using, letting the storage system put the unused blocks back into circulation.

The old school brute force method was to have a process on the storage consumer write zeroes to all of the blocks not currently used by files, and then have the storage system run a process that looks for zeroes and takes those blocks back. If you’ve been paying attention you can see how inefficient this is: writing a zero is still writing data, so the storage system will first allocate every block on the volume and only then go back and shrink it. Aside from the insane amount of I/O traffic this generates, it’s awfully time-consuming. Smarter storage systems watch for zeroes being written and treat them as requests to free up the blocks; if a read comes in for an unallocated block, the system knows it can only contain zeroes anyway (which also covers the case where your real data happened to include blocks of zeroes).
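
A minimal sketch of the zero-detection idea (toy code, not any vendor’s actual implementation): the pool inspects each incoming block, drops it from its allocation map if it’s all zeroes, and synthesizes zeroes for reads of unallocated blocks.

```python
# Toy zero-detecting thin pool: all-zero writes are treated as "free this block",
# and reads of unallocated blocks are synthesized as zeroes. Illustrative only.

BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)

class ZeroDetectingPool:
    def __init__(self):
        self.blocks = {}                    # address -> data, only for allocated blocks

    def write(self, address, data):
        if data == ZERO_BLOCK:
            # Instead of storing 4KB of zeroes, drop the block entirely.
            self.blocks.pop(address, None)
        else:
            self.blocks[address] = data

    def read(self, address):
        # An unallocated block can only ever "contain" zeroes.
        return self.blocks.get(address, ZERO_BLOCK)


pool = ZeroDetectingPool()
pool.write(7, b"real data".ljust(BLOCK_SIZE, b"\x00"))
print(len(pool.blocks))                     # 1: block 7 is allocated

pool.write(7, ZERO_BLOCK)                   # the consumer "zeroes out" the block
print(len(pool.blocks))                     # 0: the pool reclaimed it

print(pool.read(7) == ZERO_BLOCK)           # True: reads still behave correctly
```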

Then, if you’re using VMware with a VAAI-compatible storage system and VMFS 6, there’s a kind of backchannel that lets the storage system understand a delete request at the filesystem level and free up the corresponding blocks.
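
The same idea in sketch form, using a hypothetical unmap call rather than VMware’s actual VAAI primitives: on delete, the filesystem passes the list of freed block addresses straight down to the pool, which drops them without any zero-writing traffic.

```python
# Toy explicit-unmap path: the filesystem tells the pool which blocks it no longer
# needs, instead of flooding it with zero writes. Hypothetical API, for illustration.

class UnmapAwarePool:
    def __init__(self):
        self.allocated = set()

    def write(self, address):
        self.allocated.add(address)

    def unmap(self, addresses):
        # The "backchannel": blocks go straight back into circulation.
        self.allocated -= set(addresses)


class UnmapAwareFilesystem:
    def __init__(self, pool):
        self.pool = pool
        self.files = {}                     # name -> block addresses

    def create(self, name, addresses):
        for address in addresses:
            self.pool.write(address)
        self.files[name] = addresses

    def delete(self, name):
        freed = self.files.pop(name)
        self.pool.unmap(freed)              # one metadata-sized request, no extra I/O


pool = UnmapAwarePool()
fs = UnmapAwareFilesystem(pool)

fs.create("vmdk_01", range(1_000))
print(len(pool.allocated))                  # 1000 blocks allocated at the pool level

fs.delete("vmdk_01")
print(len(pool.allocated))                  # 0: the pool got the space back immediately
```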

Anyway, the point I wanted to make here is that thin provisioned volumes on a storage bay showing red because they are 100% allocated are not a reason to panic. UI designers, please take note. On the other hand, start ringing bells and putting up flashing alerts when the allocated space exceeds the physical capacity of the system, because without that, people can dig themselves in pretty deep if they’re not paying attention.