May 22, 2012

All the A-SIS talk

There is quite a bit of chatter about de-duplication methods and technologies nowadays, mostly sparked by NetApp’s announcement of data de-duplication facilities on their storage systems. De-duplication has been a buzzword for a couple of years now, with companies like Quantum, Data Domain, FalconStor, Symantec, Sepaton, etc. providing various products, and with some of them making it the focus of their presentations in Backup/DR sales calls. I’ve sat in a few of those and heard vendors promising 20 to 1, 50 to 1 or even 100 to 1 data reductions. EMC quickly went in and snapped up Avamar for this same reason.


You have probably already read Dave’s blog posting on how he sees data de-duplication fitting into NetApp’s master plan. I even picked up the phone and talked to my NetApp buddies about it. Unfortunately, I came away less than impressed with the NetApp de-dup facility. Maybe I expect more out of NetApp, being a long-time customer and one who has been excited about their innovations in the past.

NetApp’s de-duplication technology is based on its SnapLock product. SnapLock had a single-instancing algorithm called “A-SIS”, an acronym for Advanced Single Instance Storage. NetApp used that technology and the checksumming feature of WAFL to arrive at the de-duplication layer. It’s all well and good if you leave it at that. However, I am left wanting more information, and what I have found so far is less than impressive.
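To make the general idea of checksum-based single instancing concrete, here is a minimal sketch. It is not NetApp’s implementation: the 4 KB block size, the use of SHA-256 as the fingerprint, and the function names `dedupe_blocks` and `rebuild` are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


def dedupe_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, layout): store maps fingerprint -> block bytes, and
    layout is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    layout = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        existing = store.get(fingerprint)
        if existing is None:
            store[fingerprint] = block  # first time we see this block
        elif existing != block:
            # A matching fingerprint is only a hint; compare bytes before sharing.
            raise ValueError("fingerprint collision")
        layout.append(fingerprint)
    return store, layout


def rebuild(store, layout) -> bytes:
    """Reassemble the original data from the block store and the layout."""
    return b"".join(store[fp] for fp in layout)


# Three identical blocks of "A" followed by one block of "B".
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, layout = dedupe_blocks(data)
print(len(layout), "logical blocks,", len(store), "stored blocks")
```

In this toy example four logical blocks collapse to two stored blocks. Real data rarely repeats this cleanly, which is one reason to treat headline reduction ratios with some caution.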

First of all, the details on the de-duplication offering are sketchy. Depending on who you ask and which article you read, you get a different picture.

Claim: We now support data de-duplication on all of our storage systems. (It takes a license.)

Reality: Here is how I read this:

  1. Upgrade the OS to a version where A-SIS is supported
  2. Buy the A-SIS license for that head and install it
  3. Start de-duping

Well, not really. At least the NetApp sales force does not think so. The information I received is that A-SIS is supported only on the NearStore R200 and on NearStore on FAS (3000/6000 series).

I frankly think the claim is correct and the sales force needs to get an updated version of the playbook. However, it does show the disconnect.

Claim: If the same block of data is present in two different LUNs or files, then the storage system spots this and saves space by keeping just one copy.

Reality: Yes. For “files”, the restriction is that it works within a single flexvol only. You cannot turn on de-duplication if you have traditional volumes. You also cannot de-duplicate across multiple flexvols. So if you have two flexvols for your HR department and they have duplicate files spread across these two flexvols, no de-duping can happen between them.
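The effect of that per-volume scope can be sketched with a toy model. This is only an illustration under stated assumptions: the volume names, the 4 KB block size, and the helper functions `unique_blocks`, `blocks_stored_per_volume` and `blocks_stored_global` are hypothetical and do not correspond to any NetApp interface.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def unique_blocks(files, block_size=BLOCK_SIZE):
    """Return the set of distinct block fingerprints across the given files."""
    seen = set()
    for content in files:
        for offset in range(0, len(content), block_size):
            seen.add(hashlib.sha256(content[offset:offset + block_size]).digest())
    return seen


def blocks_stored_per_volume(volumes):
    """De-dupe inside each volume separately, then add up what is stored."""
    return sum(len(unique_blocks(files)) for files in volumes.values())


def blocks_stored_global(volumes):
    """Hypothetical: de-dupe across all volumes as if they were one pool."""
    all_files = [f for files in volumes.values() for f in files]
    return len(unique_blocks(all_files))


# A ten-block "handbook" whose blocks are all different from each other.
handbook = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))

volumes = {
    "hr_vol1": [handbook, handbook],  # two copies in the same volume
    "hr_vol2": [handbook],            # a third copy in another volume
}

print("per-volume de-dup stores", blocks_stored_per_volume(volumes), "blocks")
print("cross-volume de-dup would store", blocks_stored_global(volumes), "blocks")
```

Thirty logical blocks shrink to twenty with per-volume scope, because the copy in the second volume is never matched against the first. A cross-volume scheme would store only ten.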

With this limited functionality, the ease of install and use of the de-duping facility almost HAS to be there for me to get excited. I am still left with the larger problem of data getting duplicated across servers, LUNs, file systems, volumes, etc., not to mention the other guy’s storage array. What do I do with that?

Let’s say that it did work across flexvols. Would you really want to do that?

Imagine a scenario where there are duplicate files in multiple file shares, all coming from multiple volumes (flexvols, that is). Data is now being de-duplicated and stored in one location. The following questions arise:

  • Which location is it stored in? Maybe the location where the file was first created.


  • What happens if a volume is lost? Granted, you have RAID-DP et al. But let us say that you had a multiple-disk failure and you lost a volume. What then? The duplicate files in other volumes that depend on its data are also lost. What would be the cost of recovery, and the cost of the downtime caused by what is now perceived as an enterprise-wide failure?


  • How is meta-data handled? One file is of type NTFS and another duplicate file is of type Unix. Would they constitute duplicate files?
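The volume-loss question can be made concrete with a small model. It assumes a purely hypothetical cross-volume scheme in which the single physical copy of each block lives in the volume that wrote it first; the volume names, file names, block labels, and the functions `place_blocks` and `files_damaged_by_loss` are invented for this sketch.

```python
def place_blocks(files):
    """Decide which volume holds the only physical copy of each block.

    files: list of (volume, filename, [block fingerprints]) in creation order.
    Assumption: the first volume to write a block becomes its home.
    """
    block_home = {}
    for volume, _name, blocks in files:
        for fingerprint in blocks:
            block_home.setdefault(fingerprint, volume)
    return block_home


def files_damaged_by_loss(files, block_home, lost_volume):
    """List every file that lives in, or depends on blocks in, the lost volume."""
    damaged = []
    for volume, name, blocks in files:
        depends_on_lost = any(block_home[fp] == lost_volume for fp in blocks)
        if volume == lost_volume or depends_on_lost:
            damaged.append((volume, name))
    return damaged


files = [
    ("vol_hr1", "handbook.doc", ["b1", "b2"]),
    ("vol_hr2", "handbook_copy.doc", ["b1", "b2"]),  # duplicate of the first file
    ("vol_eng", "design.doc", ["b3"]),
]

block_home = place_blocks(files)
damaged = files_damaged_by_loss(files, block_home, "vol_hr1")
print(damaged)
```

Losing `vol_hr1` in this model also takes out the copy in `vol_hr2`, even though `vol_hr2` itself is healthy. That wider failure domain is the trade-off being questioned here.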

To this end, I believe a strong data de-duplication strategy w[sh]ould involve de-duplicating to a target, not necessarily onto the primary disk itself.


