
Deduplication School 2010 - TechTarget · media.techtarget.com/searchStorage/downloads/dedupe-school2010.pdf

Deduplication School 2010

W. Curtis Preston

Executive Editor, TechTarget

Founder/CEO Truth in IT, Inc.

Follow on Twitter @wcpreston

A Little About Me

• When I started as “backup guy” at $35B company in 1993:

• Tape Drive: QIC 80 (80 MB capacity)

• Tape Drive: Exabyte 8200 (2.5 GB & 256 KB/s)

• Biggest Server: 4 GB (’93), 100 GB (’96)

• Entire Data Center: 200 GB (’93), 400 GB (’96)

• My TIVO now has 5 times the storage my data center did!

• Consulting in backup & recovery since ’96

• Author of O’Reilly’s Backup & Recovery & Using SANs and NAS

• Webmaster of BackupCentral.com

• Founder/CEO of Truth in IT

• Follow me on Twitter @wcpreston

A Little Bit About Truth in IT, Inc.

• Inspired by Consumer Reports™, but designed for IT

• No advertising, no “partners” = no need to SPIN

• No huge consulting fees just to find out which products work and which ones don’t work (such fees typically start at $10K and go all the way to $100K!)

• Funded instead by $999 annual subscription

• Private online community with written research, testing results, podcasts of interviews with users of products, and direct communication with real customers of the products you’re interested in – all included

• In beta now at http://www.truthinit.com

Agenda

• Understanding Deduplication

• Using Deduplication in Backup Systems

• Using Data Reduction in Primary Systems

• Recent Backup Software Advancements

• Backing up Virtual Servers

• Backups on a Budget

• Stump Curtis

Session 1

Understanding Deduplication

Why Disk?

• First a little history

History of My World Part I

• When I joined the industry (1993)

• Disks were 4 MB/s, tapes were 256 KB/s

• Networks were 10 Mb shared

• Seventeen years later (2010)

• Disks are 70 MB/s, tapes are 120 MB/s

• Networks are 10 Gb switched

• Changes in 17 years

• 17x increase in disk speed (luckily, RAID has created virtual disks that are way faster)

• 500x increase in tape speed!

• 1000x+ increase in network speed

[Pictured: QIC 80 (60 KB/s), Exabyte 8200 (256 KB/s), DECStation 5000]

More History

• Plan A: Stage to disk, spool to tape

• Pioneered by IBM in 90s, widely adopted in late 00s

• Large, very fast virtual disk as caching mechanism to tape

• Only need enough disk to hold one night’s backups

• Helps backups; does not help restores

• Plan B: Backup to disk, leave on disk

• AKA the early VTL craze

• Helps backups and restores

• Disk was still way too expensive to make this feasible for most people

Plan C: Dedupe

• It’s perfect for “traditional” backup

• Fulls backup the same data every day/week/month

• Incrementals backup entire file when only one byte changes

• Both backup file 100 times if it’s in 100 locations

• Databases are often backed up full every day

• Tons of duplicate blocks!

• Average actual reduction of 10:1 and higher

• It’s not perfect for everything

• Pre-compressed or encrypted data

• File types that don’t have versions (multimedia)

Naysayers

• Eliminate all but one copy?

• No, just eliminate duplicates per location

• What about hash collisions?

• More on this later, but this is nothing but FUD

• If you’re unconvinced, use a delta differential approach

• Doesn’t this have immutability concerns?

• Everything that changes the format of the data has immutability concerns (e.g. sector-based storage, tar, etc)

• Job of backup/archive applications is to verify same in/out

• What about the “dedupe tax”?

• Let’s talk more about this one in a bit

Is There a Plan D?

• Some pundits/analysts think dedupe (especially target dedupe) is a band-aid, and will eventually be done away with via backup-software-based dedupe, delta-backups, etc.

• Maybe this will happen in a 3-5 year time span, maybe it won’t. (In fact, some backup software companies will tell you they don’t need no stinking dedupe appliances.)

• That’s still no argument for not moving on what’s available to solve your problems now

How Dedupe Works

Your Mileage WILL Vary

• You really can get 10x to 400x

• It depends on

• Frequency of full backups (more fulls = more dupes)

• How much of a given incremental backup contains versions of other files (multimedia generally doesn’t have versions)

• Length of retention (longer retention = more dupes)

• Redundancy in single full backup (if your product notices)

• Things that confuse dedupe

• Encrypting data before the dedupe process sees it

• Compressing data before the dedupe process sees it

• Multiplexing to a VTL
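
A rough way to see why full-backup frequency and retention drive the ratio is a toy model. The sketch below assumes every retained full backup is the same size and that a fixed fraction of each full is new compared with the previous one. The function name and the 2% change rate are illustrative assumptions, not measurements from any product.

```python
def toy_dedupe_ratio(retained_fulls: int, change_fraction: float) -> float:
    """Ratio of logical data backed up to unique data stored (toy model).

    Assumes each full is the same size and that only `change_fraction`
    of each full after the first is new data.
    """
    if retained_fulls < 1:
        raise ValueError("need at least one retained full backup")
    if not 0.0 <= change_fraction <= 1.0:
        raise ValueError("change_fraction must be between 0 and 1")
    logical = retained_fulls                              # in units of one full
    stored = 1 + (retained_fulls - 1) * change_fraction   # first full + changes
    return logical / stored


# More retained fulls (more frequent fulls or longer retention) raise the ratio.
for fulls in (4, 12, 52):
    print(fulls, "fulls ->", round(toy_dedupe_ratio(fulls, 0.02), 1), ": 1")
```

Real ratios also depend on the data types and on what the product notices, so this only shows the direction of the effect; testing with your own data is still the only real answer.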

How Do They Identify Duplicate Data?

• Two very different methods

• Chunking/hashing

• Asigra, EMC Avamar, Symantec PureDisk, CommVault Simpana

• EMC Data Domain, Greenbytes, FalconStor VTL & FDS, NEC, Quantum DXi

• Delta differential

• Exagrid, IBM Protectier, Ocarina, SEPATON

• Some systems may use a hybrid approach

Chunking/Hashing Method

• Slice all data into segments or chunks

• Run chunk through hashing algorithm (SHA-1)

• Check hash value against all other hash values

• Chunk with identical hash value is discarded

• Will find redundant blocks between files from different file systems, even different servers
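
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: it assumes fixed-size 8 KB chunks (real products often use variable-size chunks) and an in-memory dictionary as the hash index. `ChunkStore` and its method names are made up for the example.

```python
import hashlib


class ChunkStore:
    """Toy hash-based dedupe store using fixed-size chunks (an assumption)."""

    def __init__(self, chunk_size: int = 8192) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}   # SHA-1 hex digest -> chunk data

    def write(self, data: bytes) -> list[str]:
        """Store `data`; return the list of chunk hashes needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha1(chunk).hexdigest()
            # A chunk whose hash is already known is discarded, not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its chunk hashes."""
        return b"".join(self.chunks[digest] for digest in recipe)
```

Because the index is shared by everything written to the store, two files from different servers that contain the same chunk end up pointing at a single stored copy.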

Delta Differential Method

• Correlate backups

• Mathematical methods

• Using metadata

• Compare similar backups byte-by-byte

• Example

• Tonight’s backup of Exchange instance Elvis is seen as “similar” to last night’s backup of Elvis

• Tonight’s backup of Elvis is compared byte-by-byte to last night’s backup of Elvis & redundant segments are found
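
As a rough illustration of the comparison step, the sketch below uses Python's standard `difflib.SequenceMatcher` to find segments of tonight's backup that already exist in last night's. This is a stand-in technique chosen for brevity; it is far too slow for real backup streams, and actual products use their own matching algorithms. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def delta_against_previous(previous: bytes, current: bytes):
    """Split `current` into references to `previous` and new literal bytes.

    Returns a list of ("copy", offset_in_previous, length) and
    ("new", literal_bytes) operations.
    """
    matcher = SequenceMatcher(None, previous, current, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # redundant segment
        elif j2 > j1:
            ops.append(("new", current[j1:j2]))    # changed or inserted bytes
    return ops


def rebuild(previous: bytes, ops) -> bytes:
    """Recreate tonight's backup from last night's backup plus the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            parts.append(previous[offset:offset + length])
        else:
            parts.append(op[1])
    return b"".join(parts)
```

Only the "new" operations need fresh storage; the "copy" operations are pointers into data the system already holds.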

Hashing & Delta Differential

• Hashing

• Most used method with most mileage

• Some concerned about hash collisions (more on this later)

• Compares everything to everything, therefore gets more dedupe out of similar data in dissimilar datasets (e.g. production and test copy of same data)

• Delta Differentials

• Faster than hashing

• No concern about hash collisions

• Only compares like backups, so will get no dedupe on similar data in dissimilar datasets, but does get more dedupe on same data

• What will you get? Only testing with your data will answer that question.

Hash Collisions: The real numbers

Number of hashes & amount of data to achieve desired probability (assuming 8k chunk size)

Hash Size           Probability 10^-15          Probability 10^-5
128 bits (MD5)      8.2 × 10^11 (6.6 PB)        8.2 × 10^16 (20.9 YB)
160 bits (SHA-1)    5.4 × 10^16 (432.5 EB)      5.4 × 10^21 (1,371,181 YB)

• 10^-15: Odds of single disk writing incorrect data and not knowing it (Undetectable Bit Error Rate or UBER)

• With SHA-1, we have to write 432.5 EB to get those odds (6.6 PB with MD5)

• 10^-5: Worst odds of a double-disk RAID5 failure

• We have to write 1,371,181 YB to reach those odds

• Original formula here: http://en.wikipedia.org/wiki/Birthday_attack

• Original formula modified with MacLaurin series expansion to mitigate Excel’s lack of precision and is here: backupcentral.com/hash-odds.xls
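
The hash counts can be approximated with the standard small-probability form of the birthday bound, n ≈ sqrt(2 · N · p), where N = 2^bits is the number of possible hash values. The sketch below uses that textbook approximation rather than the spreadsheet referenced above, so treat it as a cross-check only. `CHUNK_BYTES = 8_000` is an assumed reading of "8k chunk size".

```python
import math


def hashes_for_collision_probability(hash_bits: int, probability: float) -> float:
    """Approximate number of hashes before a collision is `probability` likely.

    Uses the small-probability birthday approximation n ~ sqrt(2 * N * p),
    where N = 2**hash_bits is the number of possible hash values.
    """
    return math.sqrt(2.0 * (2.0 ** hash_bits) * probability)


CHUNK_BYTES = 8_000  # assumed reading of the slide's "8k chunk size"

for name, bits in (("MD5", 128), ("SHA-1", 160)):
    n = hashes_for_collision_probability(bits, 1e-15)
    print(f"{name}: {n:.1e} hashes, {n * CHUNK_BYTES:.2e} bytes at p = 1e-15")
```

Multiplying the hash count by the chunk size gives the amount of data that must be written before the chosen odds are reached.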

Where Is the Data Deduped?

• Target Dedupe

• Data is sent unmodified across LAN & deduped at target

• No LAN/WAN benefits until you replicate target to target

• Cannot compress or encrypt before sending to target

• Source Dedupe

• Redundant data is identified at backup client

• Only new, unique data sent across LAN/WAN

• LAN/WAN benefits, can back up remote/mobile data

• Allows for compression, encryption at source

• Hybrid

• Fingerprint data at source, dedupe at target

• Allows for compression, encryption at source
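
To make the source-side flow concrete, here is a minimal sketch of a client that fingerprints its chunks, asks the server which ones are new, and sends only those. The class, the method names and the two-step exchange are assumptions for illustration; real products use their own protocols.

```python
import hashlib


class BackupServer:
    """Toy dedupe target that stores chunks by SHA-1 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        """Tell the client which digests the server has never seen."""
        return {d for d in digests if d not in self.chunks}

    def store(self, new_chunks: dict[str, bytes]) -> None:
        self.chunks.update(new_chunks)


def source_dedupe_backup(data: bytes, server: BackupServer, chunk_size: int = 8192) -> int:
    """Back up `data`, sending only unseen chunks; return chunk bytes sent."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    by_digest = {hashlib.sha1(c).hexdigest(): c for c in chunks}
    wanted = server.missing(list(by_digest))     # small fingerprint exchange
    to_send = {d: by_digest[d] for d in wanted}
    server.store(to_send)                        # only new, unique data crosses the wire
    return sum(len(c) for c in to_send.values())
```

With target dedupe, by contrast, the full `data` would cross the LAN every time and the duplicate check would happen only after it arrived.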

Let’s Make It More Complicated

• Standalone Target Dedupe

• Dedupe appliance separate from backup software

• Integrated Target Dedupe

• Target dedupe from b/u s/w vendor that backs up to POD*

• Standalone Source Dedupe

• Full dedupe solution that only does source dedupe

• Integrated Source Dedupe

• Backup software that can dedupe at client (or not)

• Hybrid

• Also from backup software company

*Plain Ol’ Disk

Name That Dedupe

• Standalone Target Dedupe

• Data Domain, Exagrid, Greenbytes, IBM, NEC, Quantum, SEPATON

• Integrated Target Dedupe

• Symantec NetBackup

• Integrated Source Dedupe

• Asigra, Symantec NetBackup

• Standalone Source Dedupe

• EMC Avamar, i365 eVault, Symantec NetBackup

• Hybrid

• CommVault Simpana

Multi-node Deduplication

AKA Global Deduplication

AKA Clustered Deduplication

What We’re Not Talking About

• Remember hashing vs. delta differential dedupe

• Delta compares like to like

• Hashing compares everything to everything

• Some sales reps from some companies (that don’t have multi-node/global dedupe) are calling the latter global dedupe. It’s not.

• At a minimum this is honest confusion

• Possibly this is subterfuge to confuse the buyer

Single-node/Local vs. Multi-node/Global

• Assume a customer buys multiple nodes of a dedupe system

• Suppose, then, that they back up exactly the same client to each of those multiple nodes

• If the vendor fails to recognize the duplicate data and stores it multiple times, it has single-node/local dedupe

• If the vendor recognizes duplicate data across multiple nodes and stores it on only one node, they have multi-node/global dedupe
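
The thought experiment above can be sketched as a count of stored chunks under the two designs. The function and the node names are hypothetical, and a SHA-1 hash index is assumed purely for illustration.

```python
import hashlib


def stored_chunks(backups_per_node: dict[str, list[bytes]], global_index: bool) -> int:
    """Count chunks kept on disk across all nodes.

    `backups_per_node` maps a node name to the chunks sent to that node.
    With a global index, a chunk seen on any node is stored once overall;
    with local indexes, each node only recognizes its own duplicates.
    """
    if global_index:
        seen = set()
        for chunks in backups_per_node.values():
            seen.update(hashlib.sha1(c).hexdigest() for c in chunks)
        return len(seen)
    return sum(
        len({hashlib.sha1(c).hexdigest() for c in chunks})
        for chunks in backups_per_node.values()
    )
```

Backing up the same three-chunk client to two nodes stores six chunks with local indexes and three with a global one.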

Doctor It Hurts When I Do This

• Single-node/local dedupe vendors say “then don’t do that. Why would you do that?”

• They tell you to split up your datasets and send a given dataset to only one appliance

• Easy to do if

• Your dataset sizes never change

• A given dataset never outgrows a node

• Some single-node sales reps will point out that this also doesn’t harm your dedupe ratio because most dedupe is from comparing like to like. They’re also the same ones claiming they get better dedupe because they compare all to all. Which is it?

Multi-node Is the Way to Go

• Especially for larger environments & budget conscious environments that buy as they go

• With multi-node dedupe you can load-balance & treat same as you would a large tape library

• Single-node dedupe pushes the vendors to ride the crest of the CPU/RAM wave

• Multi-node vendors can ride behind the wave, saving cost without reducing value

Multi/Single Node Dedupe Vendors

• Multi-node/global

• EMC Avamar (12 nodes)

• Exagrid (10 nodes)

• NEC (55 nodes)

• SEPATON (8 nodes)

• Symantec PureDisk, NetBackup & Backup Exec

• Diligent (2 nodes)

• Single-node/local (as of Mar 2010)

• EMC Data Domain

• NetApp ASIS

• Quantum

When Is It Deduped?

AKA Inline or Post Process?

Get Out the Swords

• We’d have just as much luck trying to settle these arguments

• Apple vs Windows

• Linux vs either of them

• Linux vs FreeBSD

• VMware vs the mainframe (the original hypervisor)

• Cable modem vs DSL

• Initial common sense leans to inline, but post-process offers a lot of advantages

• Cannot pick based on concept; must pick based on price/performance

What’s the Difference?

• This only applies to target dedupe

• Inline is synchronous dedupe

• Post-process is asynchronous dedupe

• Both are deduping as the data is coming into the device (with most products and configs)

• The question is really where the dedupe process reads the native data from. If it reads it from RAM, we’re talking inline. If it reads it from disk, we’re talking post process.

Inline & Post-process: An I/O Walkthrough

Step              IL Hash      IL Delta     PP Hash       PP Delta
Ingest (100%)     RAM write    RAM write    Disk write    Disk write
New segment       RAM read     RAM read     Disk read     Disk read
Old segment       RAM read     Disk read    RAM read      Disk read
Match (90%)       -            -            Disk delete   Disk delete
No match (10%)    Disk write   Disk write   -             -

For every 100 GB an inline hash system writes 10 GB to disk

For every 100 GB an inline delta system writes 10 GB, reads 100 GB from disk

For every 100 GB a post process hash system writes 100 GB, reads 100 GB, and deletes 90 GB from disk

For every 100 GB a post process delta system writes 100 GB, reads 200 GB, and deletes 90 GB from disk

Common sense suggests inline has a major advantage

Things change when you consider the “dedupe tax”
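
The per-100 GB totals in the walkthrough follow from a few rules, which the sketch below encodes: inline writes only unmatched segments, post-process writes everything and later re-reads and deletes, and delta comparison reads old data from disk. The function name and the dictionary keys are illustrative assumptions; the 90% match rate is the one used in the walkthrough.

```python
def disk_io_per_100gb(inline: bool, delta: bool, match_percent: int = 90) -> dict[str, float]:
    """Disk I/O in GB for 100 GB of incoming backup data, following the walkthrough."""
    ingest = 100.0
    matched = ingest * match_percent / 100
    unmatched = ingest - matched
    write = unmatched if inline else ingest   # inline writes only unmatched segments
    read = 0.0
    if not inline:
        read += ingest                        # post-process re-reads what it just wrote
    if delta:
        read += ingest                        # delta compares against old data on disk
    delete = 0.0 if inline else matched       # post-process removes matched segments
    return {"write": write, "read": read, "delete": delete}


for label, inline, delta in (
    ("IL Hash", True, False),
    ("IL Delta", True, True),
    ("PP Hash", False, False),
    ("PP Delta", False, True),
):
    print(label, disk_io_per_100gb(inline, delta))
```

This counts raw disk I/O only; it says nothing about restore speed, which is where the dedupe tax comes in.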

The Chair Recognizes Inline

• When you’re done with backups, you’re done with dedupe

• Backups begin replicating as soon as they arrive

• The post-process vendors need a staging area

• The post-process vendors don’t start deduping until a backup is done; that will make things take longer

The Chair Recognizes Post-process

• When backups are done, dedupe is almost done

• Replication begins as soon as the first backup is done

• We wait until a backup is done, not until all the backups are done (unless you tell us to)

• The staging area allows

• Initial backups to be faster

• Allows copies and recent restores to come from native data

• Allows for staggered implementation of dedupe

• Selectively dedupe only what makes sense

• You don’t need as much staging disk as you might think

• Inline vendors may slow down large backups and restores. They always rehydrate. We only rehydrate older data.

Inline & Post-process Vendors

• Inline

• EMC Data Domain

• IBM Protectier

• NEC HydraStor

• Post-process

• Exagrid

• Greenbytes

• Quantum DXi

• SEPATON Deltastor

How Does Replication Work?

• Does replication use dedupe?

• Can I replicate many-to-one, one-to-many, cascading replication?

• If deduping many to one, will it dedupe globally across those appliances?

• Can I control what gets replicated and when? (e.g. production vs development)

Is There an Index?

• What happens if the index is destroyed?

• How do you protect against that?

• Does it need its index to read the data?

• What do you do to verify data integrity?

• What about malicious people?

• Some dedupe vendors aren’t very good at answering these questions, partially because they don’t get them enough

• Make sure you ask them

Truth in IT Backup Concierge Service

• Community of verified but anonymous end-users (no vendors)

• Included in base service: “Billable” product & strategy-related questions

• Learn from other customers’ questions & answers

• Much less expensive than traditional consulting

• Talk to real people using the products you are interested in

• Podcast interviews with end-users and thought leaders

• Unbiased product briefings written by experts

• Coming soon: Reports of lab tests by experts

• Field test reports designed by us, conducted by end-users

• One year subscription: $999

Session Two

Using Deduplication in Backup Systems

Using Data Reduction in Primary Systems

The “Dedupe Tax” AKA “Rehydration Problem”

• Essentially a read from very fragmented data

• Not all dedupe systems are equally adept at reassembling Humpty Dumpty

• Especially visible during tape copies & restores of large systems (single stream performance)

• Recent POC of three major vendors showed 3x difference in performance!

• Remember to test replica source & destination
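
One way to picture the fragmentation is to count how often a restore has to jump to a non-adjacent location on disk. The sketch below is a simplified illustration with a hypothetical layout: it assumes fixed-size chunks and ignores caching, RAID striping and read-ahead, all of which matter in real systems.

```python
def count_seeks(chunk_offsets: list[int], chunk_size: int) -> int:
    """Count non-sequential jumps when reading chunks in restore order.

    `chunk_offsets` are the on-disk byte offsets of a backup's chunks,
    listed in the order a restore needs them (a hypothetical layout).
    """
    seeks = 0
    for prev, cur in zip(chunk_offsets, chunk_offsets[1:]):
        if cur != prev + chunk_size:
            seeks += 1
    return seeks


# A native copy is laid out contiguously; a deduped copy points at chunks
# written by many earlier backups.
native_layout = [0, 8, 16, 24]
deduped_layout = [0, 64, 8, 128]
print(count_seeks(native_layout, 8), count_seeks(deduped_layout, 8))
```

The more jumps a single stream needs, the harder it is to keep a tape drive or a large restore fed, which is why single-stream restore and copy speed deserve their own test.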

Isn’t It Cheaper Just to…

• Buy tape?

• Tape is cheaper than ever & keeps getting cheaper

• Must encrypt if you’re using tape

• Must use D2D2T to stream modern tape drives

• Must constantly tweak to ensure you’re doing it right

• Take all that away and use dedupe

• May not be cheaper but definitely better

• Buy JBOD/RAID

• Even if it were free, you still have to power it

• Power/cooling bill will be 10-20x more with JBOD/RAID

• Replication not feasible, stuck with tape for offsite (see above)

Let’s Talk About What Matters

• What are the risks of their approach?

• Data integrity questions

• How big is it?

• What’s my dedupe ratio?

• How big can it grow (local vs global)

• How fast is it

• How fast can it backup/restore/copy my data?

• How fast is replication?

• How much does it cost?

• Pricing schemes are all over the board

• Try to get them on even playing field

• Also consider operational costs

• Adding storage

• Replacing drives (how long does rebuild take?)

• Monitoring, etc

Advanced Uses of Deduplication

Eliminate Tape Shipping

• Offsite backups w/o shipping tapes

• Backups with no human hands on them

• Make tapes offsite from replicated copy and never move them

• No tapes shipped = No need to encrypt tapes

Shorter Recovery Point Objectives

• Most companies run backups once per day

• Even though they back up their transaction logs throughout the day, they’re only sent offsite once per day

• Dedupe and replication could get them offsite immediately throughout the day

VMware Backup

• One of the challenges with typical VMware backup is the I/O load it places on the server

• Source dedupe can perform an incremental-forever backup with a much lower I/O load

• Could allow you to continue simpler backups without having to invest in VCB

ROBO & Laptop Backups

• Dedupe software can protect even the largest laptops over the Internet

• It can also protect relatively large remote sites without installing hardware

• Restores can be done remotely (for slower RTOs) or locally using a local recovery server (for quicker RTOs)

Where to Use Target/Source Dedupe

• Laptops, VMware, Hyper-V are easy: it’s got to be source

• Small, remote sets of data also an easy decision. Could do target w/remote backup server, but cost usually pushes people to source.

• A medium-sized (<1 TB) remote site could use a remote target system or remote source dedupe backup server that replicates to CO

• Medium-large datacenter could also use either

• Large datacenter (10TB+) might start to find things they don’t like about a source system

• Should do POC to decide

Source Dedupe: Remote Backup Server?

• If using source dedupe to backup a remote office, should you back up directly to a centralized backup server or backup to a remote backup server that replicates to a central server?

• It’s all about the RTO you need.

• Decide on RTO, test “totally remote” restore, and see if it can meet it.

• If not, use a remote server

How Big is Too Big to Replicate Backups?

• Remote office replicating to a CO, or a CO replicating its backups to a DR site, there is a limit to how much you can replicate

• Make sure you’ve done all you can to maximize deduplication ratio. A 10:1 site will need twice as much bandwidth as a 20:1 site.

• Depends on daily deduplicated change rate, which is a factor of data types and dedupe ratio

• Now common to protect 1 TB over typical WAN lines, much more over dedicated lines
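
The bandwidth arithmetic can be sketched as follows. This is a simplification: it treats the dedupe ratio as applying evenly to each day's backups, assumes only deduplicated data crosses the WAN, and uses decimal units. The function name and the example figures are assumptions for illustration.

```python
def required_mbps(daily_backup_gb: float, dedupe_ratio: float, window_hours: float) -> float:
    """Average link speed (megabits/s) needed to replicate one day's backups.

    Assumes only deduplicated data crosses the WAN and decimal units
    (1 GB = 8,000 megabits).
    """
    if dedupe_ratio <= 0 or window_hours <= 0:
        raise ValueError("dedupe_ratio and window_hours must be positive")
    replicated_gb = daily_backup_gb / dedupe_ratio
    return replicated_gb * 8_000 / (window_hours * 3_600)


# Same site, same 24-hour window, two different dedupe ratios.
for ratio in (10, 20):
    print(f"{ratio}:1 ->", round(required_mbps(1_000, ratio, 24), 2), "Mbps")
```

Halving the ratio doubles the replicated data and therefore the bandwidth, which is the 10:1 versus 20:1 point above.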

Test, Test, Test!!!

Page 51: Deduplication School 2010 - TechTargetmedia.techtarget.com/searchStorage/downloads/dedupe-school2010.pdf · A Little Bit About Truth in IT IncA Little Bit About Truth in IT, Inc

Test EverythingTest Everything• Installation and configuration, including adding additional capacity

• Support – call and ask stupid questions

D d ti• Dedupe ratio• Must use your data• Must use your retention settings• Must fill up the system

• All speeds• Backup speed• Copy speed – extremely important to test• Restore speed

• Aggregate performance• With all your data types• Especially true if using local dedupe

• Single stream performance
  • Backup speed
  • Restore and copy speed (especially if going to tape)

• Replication
  • Performance
  • Lag time (if using post process)

• Dedupe speed (if using post process)

• Loss of physical systems
  • Drive rebuild times
  • Reverse replication to replace array?
  • Unplug things, see how it handles it
  • Be mean!

Page 52

Testing Methods: Source Dedupe

• Must install on all data types you plan to back up

• Must task the system to the point that you plan to use it… VMware anyone?

• OK to back up many redundant systems; that’s kind of the point

• Remember to test speed of copy to tape if you plan to do so

Page 53

Testing Methods: Target Dedupe

• Copy production backups into IDT/VTL using your backup software’s built-in cloning/migration/dupe features

• Use dedicated drives if possible and script it to run 24x7

• You must fill up the system, expire some data, then add more data to see steady-state info

• Copy/backup to one system, replicate to another, record entire time, then restore/copy data from replicated copy

Page 54

Data Reduction in Primary Storage

Page 55

A Whole New Ball Game

• In primary space, we use the term data reduction, as it’s more inclusive than dedupe

• A very different access pattern; latency is much more important

• The standard in backup world is tape: just don’t be slower than that and you’re OK

• The standard in primary world is disk: anything you do to slow it down will kill the project

• Will not get same ratios as backup

• Summary: the job is harder and the rewards are fewer

• And yet, some are still trying it

Page 56

Options

• Compression

• File-level dedupe

• Sub-file-level dedupe

• Some files compress, but don’t dedupe

• Some files dedupe but don’t compress well
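
The last two bullets can be demonstrated with a small experiment. This is a minimal sketch, not how any product on the next slide works: it assumes zlib for compression, fixed 4 KB chunks with SHA-256 fingerprints for sub-file dedupe, and synthetic data. All names and sizes here are illustrative assumptions.

```python
import hashlib
import os
import random
import zlib
from typing import Sequence

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


def dedupe_ratio(files: Sequence[bytes], chunk_size: int = CHUNK_SIZE) -> float:
    """Total bytes divided by bytes left after keeping one copy of each chunk."""
    total = 0
    unique = {}  # chunk fingerprint -> chunk length
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            total += len(chunk)
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return total / sum(unique.values())


# Case 1: three different files built from a four-letter alphabet.
# Low entropy, so they compress, but no two chunks are identical.
rng = random.Random(0)
texty = [bytes(rng.choices(b"abcd", k=65536)) for _ in range(3)]

# Case 2: five identical copies of one random file.
# Random bytes do not compress, but the copies dedupe perfectly.
blob = os.urandom(65536)
copies = [blob] * 5

print("compresses, no dedupe:", compression_ratio(texty[0]), dedupe_ratio(texty))
print("dedupes, no compression:", compression_ratio(blob), dedupe_ratio(copies))
```

Real data sits somewhere between these two extremes, which is why running both measurements against your own files is more useful than quoted ratios.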

Page 57

Vendors

• Compression
  • Storwize, Ocarina

• File-level dedupe
  • EMC Celerra

• Sub-file-level dedupe
  • NetApp ASIS, Ocarina, Greenbytes, Exar/Hifn, SNOracle

• Usually you get compression or dedupe

• Ocarina & Exar claim to do both compression and sub-file-level dedupe

Page 58

Pros/Cons of Primary Data Reduction

• Saves disk space, power/cooling

• Can have positive or negative impact on performance – must test to see which

• Does not usually help backups: data is re-duped before being read by any app, including backup

• Exception to above rule is NetApp SnapMirror to tape

Page 59

Contact Me

• Email [email protected]

• Websites to which I contribute:
  • http://www.backupcentral.com
  • http://www.searchstorage.com
  • http://www.searchdatabackup.com

• Follow me on Twitter @wcpreston

• My upcoming venture:
  • http://www.truthinit.com