Data Reduction: Primary Storage Edition
Presented By: Marc Staimer, President & CDS
Dragon Slayer Consulting
marcstaimer@comcast.net
503‐579‐3763
Dragon Slayer Consulting Intro
Marc Staimer ‐ President & CDS
12+ years
  Storage, SANs, SW, Networks, Servers
  Consults vendors (> 100)
  Consults end users (> 400)
  Analysis at trade shows
  Publishes consistently with Tech Target
  Periodically published for trade magazines
30+ years industry experience
marcstaimer@mac.com
503-579-3763
Sep 10 NY Storage Decisions 2
Agenda
Primary Storage Problem Attacked
Data Reduction Methods
Effectiveness of Each on Primary Data Storage
Who Does What
Conclusions
Primary Storage Problem Attacked: Explosive Storage Consumption
“There are 100 million Microsoft Office documents created every day.”
  Forrester Research
“Last year, 161 exabytes of digital information were created, representing 3 million times the information in all the books ever written.”
  IDC
“58% of new corporate data growth is unstructured data such as Microsoft Office documents. This is set to grow at an annual rate of 96%.”
  Taneja Group
Server Virtualization Can Compound Problem
Golden images
ISO files
Virtual desktops
Admins Having Inelastic Task Limits
Human beings are task & time limited
  No matter how many productivity tools
Only so many tasks one person can accomplish
  In a given hour, day, week, month, year
Most admins are overloaded
  Not getting any better until genetic engineering or cyborgs
Manual tasks are bad
  Human speed & error prone
Automated tasks are good
  Machine speed, fewer errors
Admins have far too many manually intensive tasks
Data Reduction Aims At: Slowing Explosive Primary Storage Consumption
By doing so, reducing operational
  Systems
  Ports
  Switches
  Cables
  Infrastructure
  Power
  Cooling
  Most importantly admin mgmt & tasks
From the economic perspective, it reduces
  OpEx & CapEx
Conservation of Energy
Total amount of energy in an isolated system remains constant or conserved over time. Energy can neither be created nor destroyed, it can only be transformed from one state to another. The only thing that can happen to energy in our universe is that it can change form (e.g. chemical energy can become kinetic energy). Einstein’s theory of relativity shows mass & energy are 2 sides of the same coin where neither appears w/o the other.
What Does This Have to Do w/Data Reduction?
There are always tradeoffs
Primary Data Reduction Methodologies
Lossless Compression
Deduplication
Permanent Compression
  A.K.A. File Optimization
Lossless Compression
Original form of deduplication
Reduces file size & primary storage consumption by
  Eliminating duplicate patterns (files, words, blocks, etc.)
Utilizes a database or dictionary with pointers or translators
  Often referred to as LZS (Lempel‐Ziv‐Stac) adaptive compression
Constantly looking & learning as patterns change
  Changes dictionary or database to reflect more efficient patterns
Can be software or hardware based
  Hardware vendors include HIFN & Storwize
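The dictionary-and-pointer idea can be tried in a few lines of Python. This is a minimal sketch, not the LZS algorithm named on the slide: it assumes Python's standard `zlib` module (DEFLATE, another Lempel-Ziv family method) as a stand-in, and the `compress_report` helper and the sample data are made up for illustration.

```python
import zlib


def compress_report(data: bytes, level: int = 6):
    """Compress data, verify it restores bit-for-bit, and report the sizes."""
    packed = zlib.compress(data, level)
    # Lossless: decompressing must give back exactly the original bytes.
    assert zlib.decompress(packed) == data
    reduction = 1 - len(packed) / len(data)
    return len(data), len(packed), reduction


# Hypothetical sample: highly repetitive text, the best case for a dictionary coder.
sample = b"weekly storage capacity report; " * 400
original_size, packed_size, reduction = compress_report(sample)
print(f"{original_size} bytes -> {packed_size} bytes ({reduction:.0%} smaller)")
```

The sample is deliberately repetitive, so it shrinks far more than typical primary data would.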
Lossless Compression – The Good
It is very effective with
  Uncompressed files
  Unstructured data
  Golden images
  Redundant ISO files
  Virtual desktops
  Structured data (DBMS)
Expect storage consumption reduction
  Ranging from 10 to 50%
Lossless Compression – The Bad
Add’l write & read latency = longer response times
  Compression takes time (there is no free lunch)
  Data must be “rehydrated” to be read – more time
Does not work with all file or data types
  Previously compressed data such as MS Office, JPEGs, MP3s, MP4s, Zip, etc.
    Data can actually end up being bigger
  Encrypted data
Doesn’t compress between files
  Data in duplicate files do not compress
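Two of these limits are easy to reproduce. The sketch below again assumes Python's `zlib` as a stand-in compressor; seeded pseudo-random bytes are used as an assumed proxy for encrypted or already compressed data, and the sample document is hypothetical.

```python
import random
import zlib

# Stand-in for encrypted or already compressed data: seeded pseudo-random bytes.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(50_000))
packed_noise = zlib.compress(noise)
growth = len(packed_noise) - len(noise)
print(f"random data: {len(noise)} -> {len(packed_noise)} bytes ({growth:+d})")

# Two identical files compressed one at a time: each is stored in full,
# because the compressor never looks across file boundaries.
document = b"quarterly report boilerplate paragraph. " * 500
copy_a = zlib.compress(document)
copy_b = zlib.compress(document)
stored = len(copy_a) + len(copy_b)
print(f"two identical files: {stored} bytes stored, {len(copy_a)} would do")
```

The first result shows the output growing slightly when there is no pattern to exploit; the second shows why duplicate files need dedupe rather than per-file compression.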
Deduplication – Also Lossless
3 Different types
  File
  Block or blocklet
  Content aware
1. Storage Based File Based Dedupe
Reduces duplicate files
  Reduces primary storage consumption by
    Eliminating duplicate identical files
Coarse granularity approach
  Identical files
  Any difference whatsoever does not get deduped
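A minimal sketch of the idea, assuming files are identified by a SHA-256 hash of their whole contents (the slide does not say how any product detects duplicates). The `dedupe_files` helper and the file names are hypothetical.

```python
import hashlib


def dedupe_files(files):
    """Keep one copy per distinct content; map every file name to a pointer."""
    store = {}      # content hash -> the single stored copy
    pointers = {}   # file name -> content hash
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)
        pointers[name] = digest
    return store, pointers


# Hypothetical share: two identical attachments and one with a single byte changed.
attachment = b"Q3 budget spreadsheet contents " * 1000
files = {
    "alice/budget.xls": attachment,
    "bob/budget.xls": attachment,
    "carol/budget.xls": attachment[:-1] + b"!",
}
store, pointers = dedupe_files(files)
print(f"{len(files)} files, {len(store)} stored copies")
```

The one-byte change in the third file is enough to give it a different hash, so it is stored in full even though nearly all of its content matches the other two.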
File Dedupe – The Good
Inexpensive
  Often no license fee on some storage systems
Very effective on
  Duplicate Email attachments
  Duplicate ISO files
  Golden images (as NAS files)
  Virtual desktop images (as NAS files)
Expect storage consumption reduction
  Ranging from 10 to 50%
File Dedupe – Who
Target
  Caringo
  EMC NS‐Series & Celerra blades
  EMC Centera
  NetApp FAS/V‐Series
  Nexsan Assureon
  Permabit
  Tarmin
File Dedupe – The Bad
Add’l write & read latency = longer response times
  File dedupe takes time (there is no free lunch) – specifically inline dedupe
  Post processing does not but must be done off hours
  Data must be “rehydrated” to be read – more time
Does not work with all file or data types
  Previously compressed data such as MS Office, JPEGs, MP3s, MP4s, Zip, etc.
  Encrypted data
  Structured data
Only dedupes identical files
  Any change eliminates ability to be deduped
2. Storage Block or Blocklet Based Dedupe
Reduces duplicate blocks or blocklets
  Blocklets are sub‐blocks
Reduces primary storage consumption by
  Eliminating duplicate data blocks or blocklets
Fine granularity approach
Similar to LZS compression
  Except it works across files
  Blocks or blocklets deduped across files
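A minimal sketch of cross-file block dedupe, assuming fixed 4 KB blocks and SHA-256 fingerprints. Both are assumptions made for illustration; products may use other block sizes, variable-size blocklets, or different fingerprints. The `dedupe_blocks` and `rehydrate` helpers and the image names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def dedupe_blocks(files, block_size=BLOCK_SIZE):
    """Store each distinct block once; keep a per-file list of block pointers."""
    store = {}    # block hash -> block bytes (shared across all files)
    recipes = {}  # file name -> ordered list of block hashes
    for name, data in files.items():
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def rehydrate(name, store, recipes):
    """Rebuild a file by following its pointers back to the stored blocks."""
    return b"".join(store[digest] for digest in recipes[name])


# Hypothetical golden image of 8 distinct blocks, plus a clone whose last block differs.
golden = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
files = {
    "vm_a.img": golden,
    "vm_b.img": golden[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE,
}
store, recipes = dedupe_blocks(files)
logical_blocks = sum(len(recipe) for recipe in recipes.values())
print(f"{logical_blocks} logical blocks, {len(store)} stored")
```

The two images are not identical, so whole-file dedupe would keep both in full; at block granularity only the one changed block is stored a second time. The `rehydrate` step is the extra read-path work that shows up as latency.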
Block or Blocklet Dedupe – The Good
Good level of dedupe
  App, protocol, file, pathname, & block address independent
Very effective on
  Duplicate Email attachments
  Duplicate ISO files
  Golden images
  Virtual desktop images
  Structured data
Available from numerous vendors
  EMC, ExaGrid, FalconStor, NetApp, Permabit, Quantum & others
Expect storage consumption reduction
  Ranging from 20 to 90%
Block or Blocklet Dedupe – Who
Target
  EMC – Data Domain
  ExaGrid
  FalconStor
  Huawei
  HP
  IBM
  NetApp
  Nexenta
  Permabit
  Quantum
  Sepaton
  SUN
Software
  Vast majority of BUR products (~30)
  Permabit
  SDFS for Linux
Block & Blocklet Dedupe – The Bad
Add’l write & read latency = longer response times
  Block & blocklet dedupe takes time – inline slows response time
  Post processing does not but must be done after hours
  Data must also be “rehydrated” to be read – more time
Does not work with all file or data types
  Previously compressed data such as MS Office, JPEGs, MP3s, MP4s, Zip, etc.
  Encrypted data
Designed primarily for secondary data (BU, Snaps, Replication)
  Not nearly the level of duplicate data in primary
  Tends to have a relatively high cost premium
3. Content Aware Based Storage Dedupe
Unstructured file storage object deduplication
  Reading & decompressing files
    MPEGs, JPEGs, Office, PDFs, Zips, etc.
  Removing duplicate storage objects
    Replacing them with pointers
  Optimizing remaining storage objects
    Then re‐compressing them in their native format
How Content‐Aware Dedupe Works
[Diagram: a set of files (TXT, JPG, PNG, PDF, PPT, WORD, ZIP) shown Before and After passing through three stages: Extract, Correlate, Optimize]
Extract: delayers and decodes files into fundamental storage objects
Correlate: correlates storage objects within & across files; finds both exact and similar matches
Optimize: applies file‐aware optimizers to unique objects & re‐stores
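The extract and correlate stages can be sketched with ZIP containers, which is how current Office formats package their parts. This is an illustrative sketch only, not Ocarina's method: it finds exact matches only (no similarity matching), skips the optimize stage, and every helper and file name in it is hypothetical.

```python
import hashlib
import io
import zipfile


def make_container(members):
    """Build an in-memory ZIP standing in for a compound document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def extract_objects(container):
    """Extract stage: decode the container into its embedded objects."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def correlate(containers):
    """Correlate stage: store each distinct object once, across all containers."""
    store = {}    # object hash -> object bytes
    recipes = {}  # container name -> {member name: object hash}
    for container_name, blob in containers.items():
        recipe = {}
        for member, payload in extract_objects(blob).items():
            digest = hashlib.sha256(payload).hexdigest()
            store.setdefault(digest, payload)
            recipe[member] = digest
        recipes[container_name] = recipe
    return store, recipes


# Hypothetical deck and report that embed the same logo under different names.
logo = b"\x89PNG" + bytes(range(256)) * 40
containers = {
    "deck.pptx": make_container({"media/logo.png": logo, "slide1.xml": b"<slide>Q3 results</slide>"}),
    "report.docx": make_container({"media/image1.png": logo, "document.xml": b"<doc>Q3 report</doc>"}),
}
store, recipes = correlate(containers)
total_objects = sum(len(recipe) for recipe in recipes.values())
print(f"{total_objects} embedded objects, {len(store)} stored")
```

The two containers differ as files, so file dedupe would keep both in full; the shared logo only becomes visible once the containers are decoded.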
Content Aware Dedupe – The Good
Incredibly effective on most file types (WOW!)
  100s of different already compressed file types
    MPEGs, JPEGs, Office, Zip, PDFs, etc.
  Duplicate Email attachments
  Duplicate ISO files
  Golden images (as NAS files)
  Virtual desktop images (as NAS files)
Expect storage consumption reduction
  Ranging from 40 to 90%
Content Aware Dedupe – The Bad
Post processing only
  Must be done after hours
Higher response times (latency) for reading files
  Data must also be “rehydrated” to be read
Doesn’t work so well w/structured data
Requires special “reader” software
  On user desktop, server, or NAS to read deduped files
Requires x86 appliance hardware to process
  Additional cost
Only one vendor (DELL – Ocarina)
  Relatively high licensing costs
Permanent Compression – a.k.a. File Optimization
Unstructured file shaping or sculpting
[Image comparison: Can you tell the difference?]
Solves the file bloat problem caused by inefficient software
  Reading & decompressing files
    MS Office (PowerPoint, Word, Excel) & JPEGs
  Removing superfluous
    Unnecessary baggage, junk data, or excessive resolution
    W/O compromising visual content integrity, breaking files, or removing content
  Then re‐compressing them in their native format
Independent Production Results¹
File Type     Original   Optimized   Savings
PowerPoint    110MB      17.4MB      84%
Excel         10MB       2.4MB       72%
Word          10MB       3.2MB       68%
¹Average results source: NXPowerLite Trident Warrior Results, FORCEnet
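For reference, savings is one minus the ratio of optimized size to original size. The sketch below recomputes it from the sizes listed for each file type; the `savings` helper is hypothetical. The slide's figures are reported averages, so a recomputation from rounded sizes need not match the listed percentage exactly (the Excel row comes out at 76% against the listed 72%).

```python
def savings(original_mb: float, optimized_mb: float) -> float:
    """Fraction of space saved: 1 - optimized / original."""
    return 1 - optimized_mb / original_mb


# (original MB, optimized MB, savings % as listed on the slide)
rows = {
    "PowerPoint": (110, 17.4, 84),
    "Excel": (10, 2.4, 72),
    "Word": (10, 3.2, 68),
}

computed = {}
for file_type, (original, optimized, listed) in rows.items():
    computed[file_type] = round(100 * savings(original, optimized))
    print(f"{file_type}: computed {computed[file_type]}%, listed {listed}%")
```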
File Optimization – The Good
Very effective with Office files & JPEGs
  Reduces file size forever @ a very affordable license cost
Readable by native applications & all users
  No reader software required AND no rehydration is required
  Faster backups, transfers, migrations, reads, response times
Works with compression & dedupe
  Not mutually exclusive
File Optimization – The Bad
Post processing only
  Optimization must be done after hours
Only works with Office files & JPEGs today
Limited vendors
  One established vendor (Neuxpower) with over 1M users
  One new vendor (Balesio) with few users
Things To Think About
Used appropriately
  Primary storage data reduction can be quite good
Used inappropriately
  Data reduction is marginal
File optimization combined w/compression or dedupe
  Can provide incredible results
Not all files or data will benefit from a given data reduction technology
There are response time issues to be aware of & work through
  Some technologies require post processing
Conclusions
Primary storage data reduction has real value
  Operational value
    Reduced infrastructure & management
  Economic value
    Reduced CapEx & OpEx
But only if utilized appropriately