Something goofy with VDR

There’s something about VMware Data Recovery (VDR) that has me confused. If you’ve ever watched the performance counters on the backup datastores assigned to VDR, one thing you notice right away is the very high level of read IOPS. At a glance that doesn’t make any sense, since the datastore is the destination for your backups, not the source.

Looking further, there appears to be a high volume of 4KB read operations. Again, more than a little strange. The only explanation I can think of is that it’s reading the hash values of the dedup signatures, but even then a single hash shouldn’t need a 4KB read, unless that’s the minimum read size on the filesystem.

If it’s reading the hash values of deduplicated blocks, then either there’s something very inefficient in the VDR engine or my back-of-the-envelope calculations are missing something.

Assuming two totally full 1TB backup destinations, that works out to 2 147 483 648KB (2 × 1024 × 1024 × 1024). Assuming 4KB blocks from the IO profile, there are 536 870 912 possible hash signatures to keep track of. Given a standard 128-bit (16B) hash value, the entire hash table comes to 8 388 608KB, or 8GB (64GB if the hash were a full 128 bytes). By default VDR is configured with only 2GB of memory, so the table wouldn’t fit.

So, at least on memory grounds, that doesn’t rule out my theory that it’s going to disk to pull out hash values to compare against incoming new data.
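The arithmetic can be checked in a few lines of Python. This only reproduces the post’s own assumptions (two full 1TB destinations, 4KB dedup blocks inferred from the IO size, one hash per block, 2GB of appliance memory); since the post doesn’t pin down which hash VDR uses, both a 128-bit and a 128-byte hash are tried. `index_size_bytes` is a hypothetical helper, and a real index would also have to record where each block lives, so treat these figures as lower bounds.

```python
# Back-of-the-envelope size of a dedup hash index under the post's assumptions.
# Hypothetical helper; the hash sizes tried below are assumptions, not VDR facts.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB


def index_size_bytes(capacity_bytes, block_bytes, hash_bytes):
    """Bytes needed to hold one hash per block across the whole capacity."""
    blocks = capacity_bytes // block_bytes
    return blocks * hash_bytes


capacity = 2 * TB      # two totally full 1TB destinations
block = 4 * KB         # 4KB blocks, inferred from the read IO size
vdr_memory = 2 * GB    # default VDR appliance memory

print(f"blocks to track: {capacity // block:,}")

for label, hash_bytes in [("128-bit (16B)", 16), ("128-byte", 128)]:
    size = index_size_bytes(capacity, block, hash_bytes)
    print(f"{label} hashes: {size // GB}GB index, fits in 2GB RAM: {size <= vdr_memory}")
```

With 16-byte hashes the index alone is 8GB, four times the appliance’s default memory, before counting any per-entry metadata.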

Which brings me back to my original question: why in the world is VDR hitting the destination disks so hard with small random reads? To the best of my knowledge, VDR uses a fixed block size for deduplication, unlike appliances such as Data Domain, which use variable block sizes.
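To make the hash-lookup theory concrete, here is a toy fixed-block dedup store in Python. It is a hypothetical sketch, not how VDR is actually implemented: the 4KB block size comes from the observed IO, SHA-1 is an arbitrary choice of signature, and `FixedBlockDedupStore`, `cache_entries` and the LRU cache are illustrative names. A lookup that misses the in-memory cache is counted as one small random read against the destination disk.

```python
import hashlib
from collections import OrderedDict

# Assumption: 4KB fixed dedup blocks, inferred from the observed read size.
BLOCK_SIZE = 4096


class FixedBlockDedupStore:
    """Hypothetical toy model of a fixed-block dedup target (not VDR's code).

    Every block hash is recorded in a full index that notionally lives on the
    destination disk; only `cache_entries` hashes fit in an in-memory LRU cache.
    Each lookup that misses the cache is counted as one small random read.
    """

    def __init__(self, cache_entries):
        self.cache_entries = cache_entries
        self.cache = OrderedDict()   # digest -> None, ordered oldest to newest
        self.on_disk_index = set()   # stands in for the on-disk hash index
        self.disk_index_reads = 0    # simulated small random reads
        self.unique_blocks = 0
        self.duplicate_blocks = 0

    def _remember(self, digest):
        """Insert or refresh a digest in the LRU cache, evicting the oldest."""
        self.cache[digest] = None
        self.cache.move_to_end(digest)
        if len(self.cache) > self.cache_entries:
            self.cache.popitem(last=False)

    def write(self, data):
        """Split data into fixed-size blocks and dedup each one."""
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            # Assumption: SHA-1 as the block signature; any strong hash would do.
            digest = hashlib.sha1(block).digest()
            if digest in self.cache:
                self.duplicate_blocks += 1
            else:
                # Cache miss: consult the on-disk index, i.e. one random read.
                self.disk_index_reads += 1
                if digest in self.on_disk_index:
                    self.duplicate_blocks += 1
                else:
                    self.on_disk_index.add(digest)
                    self.unique_blocks += 1
            self._remember(digest)


# 1000 distinct 4KB blocks: block i is the 4-byte integer i repeated.
backup = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(1000))

for cache_entries in (100, 2000):
    store = FixedBlockDedupStore(cache_entries)
    store.write(backup)   # first full backup
    store.write(backup)   # second, identical backup
    print(cache_entries, store.unique_blocks, store.duplicate_blocks,
          store.disk_index_reads)
```

Backing up the same 1000 blocks twice costs 2000 simulated index reads when only 100 hashes fit in memory, against 1000 when they all fit: a sequential stream larger than the cache defeats the LRU entirely, so every incoming block turns into a small random read.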

VDR’s overall performance therefore depends on fairly high-performance disk storage, which seems strange for a backup target. This IO profile is completely understandable when VDR is running integrity checks and scanning the entire contents of the destination, but it’s very strange activity for performing backups.

As long as you’re aware of this, you can plan for reasonably high-performance storage for VDR. The usual reflex, though, is to go with high-capacity, relatively slow disks for disk-to-disk backups, and that results in pretty horrible VDR performance.

Anyone know what this IO is doing?