Data deduplication is a hot topic in storage and saves significant disk space for many environments, with some trade-offs. We'll discuss what deduplication is and where the Open Source solutions are versus commercial offerings. The presentation leans towards the practical – where attendees can use it in their real-world projects (what works, what doesn't, should you use it in production, etc.).
Open Source Data Deduplication
Nick Webb, nickw@redwireservices.com, www.redwireservices.com, Red Wire Services
(206) 829-8621
Last updated 8/10/2011
Introduction
- What is deduplication?
- Different kinds
- Why do you want it?
- How does it work?
- Advantages / Drawbacks
- Commercial implementations
- Open Source implementations: performance, reliability, and stability of each
What is Data Deduplication?
Wikipedia:
"Data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to that unique copy. Deduplication is able to reduce the required storage capacity, since only the unique data is stored."
Depending on the type of deduplication, redundant files may be reduced, or even portions of files or other data that are similar can be removed.
Why Dedupe?
- Save disk space and money (fewer disks)
- Fewer disks = less power, cooling, and space
- Improve write performance (of duplicate data)
- Be efficient – don't re-copy or store previously stored data
Where does it Work Well?
Secondary Storage:
- Backups/Archives
- Online backups with limited bandwidth/replication
- Save disk space – additional full backups take little space
Virtual Machines (Primary & Secondary)
File Shares
Not a Fit
- Random data
- Video, Pictures, Music
- Encrypted files – though many vendors dedupe, then encrypt
Types
- Source / Target
- Global
- Fixed / Sliding Block
- File Based (SIS)
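The fixed-block vs. sliding-block distinction can be sketched in a few lines of Python. This is an illustrative toy, not how any particular product chunks data: with fixed blocks, inserting one byte shifts every later boundary, so almost nothing dedupes; a content-defined (sliding) chunker picks boundaries from the data itself, so boundaries re-synchronise after the insert.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 8):
    # Fixed-block chunking: boundaries at fixed offsets.
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data: bytes, mask: int = 0x0F):
    # Toy content-defined chunking: cut after any byte whose low bits
    # match a pattern, so boundaries depend on content, not offsets.
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b & mask == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def unique(chunks):
    return {hashlib.sha256(c).digest() for c in chunks}

original = bytes(range(64)) * 4   # repetitive data
shifted = b"X" + original         # one byte inserted at the front

# Fixed blocks: the insert misaligns everything downstream.
fixed_shared = unique(fixed_chunks(original)) & unique(fixed_chunks(shifted))
# Content-defined: most chunks survive the insertion.
cdc_shared = unique(content_defined_chunks(original)) & unique(content_defined_chunks(shifted))
print(len(fixed_shared), len(cdc_shared))  # sliding shares far more chunks
```

With this toy data, the fixed-block chunker shares no chunks between the two versions, while the content-defined chunker shares all but the first.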
Drawbacks
- Slow writes, slower reads
- High CPU/memory utilization (a dedicated server is a must)
- Increases risk of data loss/corruption
- Collision risk of 1.3x10^-49 per PB (256-bit hash & 8KB blocks)
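The collision figure can be sanity-checked with the birthday bound: 1PB of 8KB blocks is roughly 10^11 blocks, and the chance that any two distinct blocks share a 256-bit hash is at most n(n-1)/2^257. The exact number depends on the assumptions, so this sketch will not reproduce the slide's figure precisely, but it lands in the same order of "never":

```python
BLOCK = 8 * 1024    # 8KB blocks
PB = 10**15         # one petabyte
HASH_BITS = 256

n = PB // BLOCK     # ~1.2e11 blocks per PB
# Birthday bound: P(any collision) <= n*(n-1) / 2^(bits+1)
p = n * (n - 1) / 2 ** (HASH_BITS + 1)
print(f"{n:.3e} blocks, collision probability <= {p:.3e}")
```

Either way, the probability is dozens of orders of magnitude below the chance of undetected disk corruption, which is the practical point of the slide.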
How Does it Work?
(Diagrams: block layout without dedupe vs. with dedupe)
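In place of the original diagrams, the core write path can be sketched as a toy block store (illustrative only – real systems add persistent indexes, compression, and crash safety): each incoming block is hashed, and a block whose hash is already known is replaced by a reference, so only unique data consumes space.

```python
import hashlib

class DedupeStore:
    """Toy fixed-block dedupe: store unique blocks once, keep references."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # hash -> block bytes (unique data)
        self.files = {}    # name -> list of block hashes (references)

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            h = hashlib.sha256(block).digest()
            self.blocks.setdefault(h, block)  # store only if unseen
            refs.append(h)
        self.files[name] = refs

    def read(self, name):
        return b"".join(self.blocks[h] for h in self.files[name])

    def physical_bytes(self):
        return sum(len(b) for b in self.blocks.values())

store = DedupeStore()
store.write("full-monday", b"A" * 8192)
store.write("full-tuesday", b"A" * 8192)  # duplicate data: no new blocks
print(store.physical_bytes())             # 4096: one unique block stored
```

This is why a second identical full backup is nearly free: every one of its blocks already exists, so only new references are written.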
Block Reclamation
- In general, blocks are not removed/freed when a file is removed
- We must periodically check blocks for references; a block with no references can be deleted, freeing allocated space
- The process can be expensive, so it is scheduled during off-peak hours
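The reclamation pass described above can be sketched as a mark-and-sweep over a toy store (hypothetical structure, not any product's on-disk format): deleting a file only drops its references, and a later sweep frees any block no file still points at.

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# Toy store state: unique blocks, plus per-file reference lists.
blocks = {sha(b"x"): b"x", sha(b"y"): b"y"}
files = {
    "a.img": [sha(b"x"), sha(b"y")],
    "b.img": [sha(b"x")],
}

def delete_file(name):
    # Deleting a file only removes its references, not the blocks.
    del files[name]

def reclaim():
    # Off-peak sweep: mark every block still referenced by some file,
    # then free the rest.
    live = {h for refs in files.values() for h in refs}
    for h in list(blocks):
        if h not in live:
            del blocks[h]

delete_file("a.img")
reclaim()
print(len(blocks))  # 1: only the block b"b.img" still references survives
```

The sweep touches every reference list, which is why real implementations either keep reference counts or, as the slide notes, run this during off-peak windows.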
Commercial Implementations
- Just about every backup vendor: Symantec, CommVault
- Cloud: Asigra, Barracuda, Dropbox (global), JungleDisk, Mozy
- NAS/SAN/Backup Targets: NEC HydraStor, DataDomain/EMC Avamar, Quantum, NetApp
Open Source Implementations
FUSE based:
- Lessfs
- SDFS (OpenDedupe)
Others:
- ZFS
- btrfs (off-line only)
Limited (file-based SIS):
- BackupPC (reliable)
- rdiff-backup
How Good is it?
Many see 10-20x deduplication, meaning 10-20 times more logical object storage than physical.
This is especially true in backup or virtual environments.
SDFS / OpenDedupe – www.opendedup.org
- Java 7 based, platform agnostic
- Uses FUSE
- S3 storage support
- Snapshots
- Inline or batch mode deduplication
- Supposedly fast (290MB/s+ on great HW)
- Support for global/clustered dedupe
- Probably the most mature OSS dedupe (IMHO)
SDFS
SDFS Install & Go
Install Java, then:

rpm -Uvh SDFS-1.0.7-2.x86_64.rpm
sudo mkfs.sdfs --volume-name=sdfs_128k \
  --io-max-file-write-buffers=32 \
  --volume-capacity=550GB \
  --io-chunk-size=128 \
  --chunk-store-data-location=/mnt/data
sudo modprobe fuse
sudo mount.sdfs -v sdfs_128k -m /mnt/dedupe
SDFS
Pro:
- Works when configured properly
- Appears to be multithreaded
Con:
- Slow, resource intensive (CPU/memory)
- Fragile: easy to mess up options, leading to crashes with little user feedback
- Standard POSIX utilities do not show accurate data (e.g. df); you must use getfattr -d <mount point> and calculate bytes → GB/TB and free space yourself
- Slow with the 4k blocks recommended for VMs
LessFS – www.lessfs.com
- Written in C = less CPU overhead
- Have to build it yourself (./configure && make && make install)
- Has replication, encryption
- Uses FUSE
LessFS Install

wget http://…/lessfs-1.4.2.tar.gz
tar zxvf lessfs-1.4.2.tar.gz
wget http://…/db-4.8.30.tar.gz
yum install buildstuff…
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
LessFS Go

sudo vi /etc/lessfs.cfg
  BLOCKDATA_PATH=/mnt/data/dta/blockdata.dta
  META_PATH=/mnt/meta/mta
  BLKSIZE=4096   # only 4k supported on CentOS 5
  ENCRYPT_DATA=on
  ENCRYPT_META=off
mklessfs -c /etc/lessfs.cfg
lessfs /etc/lessfs.cfg /mnt/dedupe
LessFS
Pro:
- Does inline compression by default as well
- Reasonable VM compression with 128k blocks
Con:
- Fragile
- StatsFS info hard to see (per-file accounting, no totals)
- Kernel >= 2.6.26 required for blocks > 4k (RHEL6 only)
- Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS: tried it, and empirically it was a drag, but I have no hard data (got about 3x dedupe with identical full backups of VMs).
At least it's stable…
Kick the Tires
Test data set: ~330GB of data
- 22GB of documents, pictures, music
- Virtual Machines:
  – 220GB Windows 2003 Server with SQL Data
  – 2003 AD DC, ~60GB
  – 2003 Server, ~8GB
  – Two OpenSolaris VMs, 15 & 27GB
  – 3GB Windows 2000 VM
  – 15GB XP Pro VM
Kick the Tires
Test Environment: AWS High-CPU Extra Large Instance
- ~7GB of RAM
- ~eight cores, ~2.5GHz each
- ext4
Compression Performance
First round (all "unique" data).
If another copy were put in (like another full), we should expect ~100% reduction for that non-unique data (1x dedupe per run).

FS              | Home Data | Home Reduction | VM Data | VM Reduction | Combined | Total Reduction | MB/s
SDFS 4k         | 21GB      | 4.5%           | 109GB   | 64%          | 128GB    | 61%             | 16
lessfs 4k (est) | 24GB      | -9%            | N/A     | 51%          | N/A      | 50%             | 4
SDFS 128k       | 21GB      | 4.5%           | 255GB   | 16%          | 276GB    | 15%             | 40
lessfs 128k     | 21GB      | 4.5%           | 130GB   | 57%          | 183GB    | 44%             | 24
tar.gz --fast   | 21GB      | 4.5%           | 178GB   | 41%          | 199GB    | 39%             | 35
Compression ("Unique" Data)
(Chart: Home, VM, and Total Reduction percentages for SDFS 4k, lessfs 4k (est), SDFS 128k, lessfs 128k, and tar.gz --fast)
Write Performance (don't trust this)
(Chart: MB/s for raw, SDFS 4k, lessfs 4k, SDFS 128k, lessfs 128k, and tar.gz --fast)
Kick the Tires, Part 2
Test data set: two ~204GB full backup archives from a popular commercial vendor
Test Environment: VirtualBox VM, 2GB RAM, 2 cores, 2x 7200RPM SATA drives (meta & data separated for LessFS); physical CPU: Quad Core Xeon
Write Performance
(Chart: MB/s for raw, SDFS 128k write, SDFS 128k re-write, LessFS 128k write, and LessFS 128k re-write)
Compression (Backup Data)
(Chart: Reduction with 128K blocks – SDFS 7%, LessFS 34%, raw 0%)
Load (SDFS 128k)
Open Source Dedupe
Pro:
- Free
- Can be stable if well managed
Con:
- Not in repos yet
- Efforts behind them seem very limited (1 dev each)
- No/poor documentation
The Future
Eventual commodity: btrfs
- Dedupe planned (off-line only)
Conclusion / Recommendations
- Dedupe is great if it works and it meets your performance and storage requirements
- OSS dedupe has a way to go; SDFS/OpenDedupe is the best OSS option right now
- JungleDisk is good and cheap, but not OSS
About Red Wire Services
If you found this presentation helpful, consider Red Wire Services for your next Backup, Archive, or IT Disaster Recovery Planning project.
Learn more at www.RedWireServices.com
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle, WA. Nick is available to speak on a variety of IT Disaster Recovery related topics, including:
- Preserving Your Digital Legacy
- Getting Started with your Small Business Disaster Recovery Plan
- Archive Storage for SMBs
If you are interested in having Nick speak to your group, please call (206) 829-8621 or email info@redwireservices.com
Introduction
What is Deduplication Different kinds Why do you want it How does it work Advantages Drawbacks Commercial Implementations Open Source implementations performance
reliability and stability of each
What is Data Deduplication
Wikipedia
data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data typically to improve storage utilization In the deduplication process duplicate data is deleted leaving only one copy of the data to be stored along with references to the unique copy of data Deduplication is able to reduce the required storage capacity since only the unique data is stored
Depending on the type of deduplication redundant files may be reduced or even portions of files or other data that are similar can also be removed
Why Dedupe
Save disk space and money (less disks) Less disks = less power cooling and space Improve write performance (of duplicate data) Be efficient ndash donrsquot re-copy or store previously
stored data
Where does it Work Well Secondary Storage
BackupsArchives Online backups with limited
bandwidthreplication Save disk space ndash additional
full backups take little space
Virtual Machines (Primary amp Secondary)
File Shares
Not a Fit
Random data Video Pictures Music Encrypted files
ndash many vendors dedupe then encrypt
Types
Source Target Global FixedSliding Block File Based (SIS)
Drawbacks
Slow writes slower reads High CPUmemory utilization (dedicated server
is a must) Increases data loss risk corruption
Collision risk of 13x10^-49 chance per PB (256 bit hash amp 8KB Blocks)
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
What is Data Deduplication
Wikipedia
data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data typically to improve storage utilization In the deduplication process duplicate data is deleted leaving only one copy of the data to be stored along with references to the unique copy of data Deduplication is able to reduce the required storage capacity since only the unique data is stored
Depending on the type of deduplication redundant files may be reduced or even portions of files or other data that are similar can also be removed
Why Dedupe
Save disk space and money (less disks) Less disks = less power cooling and space Improve write performance (of duplicate data) Be efficient ndash donrsquot re-copy or store previously
stored data
Where does it Work Well Secondary Storage
BackupsArchives Online backups with limited
bandwidthreplication Save disk space ndash additional
full backups take little space
Virtual Machines (Primary amp Secondary)
File Shares
Not a Fit
Random data Video Pictures Music Encrypted files
ndash many vendors dedupe then encrypt
Types
Source Target Global FixedSliding Block File Based (SIS)
Drawbacks
Slow writes slower reads High CPUmemory utilization (dedicated server
is a must) Increases data loss risk corruption
Collision risk of 13x10^-49 chance per PB (256 bit hash amp 8KB Blocks)
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Why Dedupe
Save disk space and money (less disks) Less disks = less power cooling and space Improve write performance (of duplicate data) Be efficient ndash donrsquot re-copy or store previously
stored data
Where does it Work Well Secondary Storage
BackupsArchives Online backups with limited
bandwidthreplication Save disk space ndash additional
full backups take little space
Virtual Machines (Primary amp Secondary)
File Shares
Not a Fit
Random data Video Pictures Music Encrypted files
ndash many vendors dedupe then encrypt
Types
Source Target Global FixedSliding Block File Based (SIS)
Drawbacks
Slow writes slower reads High CPUmemory utilization (dedicated server
is a must) Increases data loss risk corruption
Collision risk of 13x10^-49 chance per PB (256 bit hash amp 8KB Blocks)
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Where does it Work Well Secondary Storage
BackupsArchives Online backups with limited
bandwidthreplication Save disk space ndash additional
full backups take little space
Virtual Machines (Primary amp Secondary)
File Shares
Not a Fit
Random data Video Pictures Music Encrypted files
ndash many vendors dedupe then encrypt
Types
Source Target Global FixedSliding Block File Based (SIS)
Drawbacks
Slow writes slower reads High CPUmemory utilization (dedicated server
is a must) Increases data loss risk corruption
Collision risk of 13x10^-49 chance per PB (256 bit hash amp 8KB Blocks)
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all "unique" data).
If another copy was put in (like another full), we should expect 100% reduction for that non-unique data (1x dedupe per run).

FS               Home Data  Home Reduction  VM Data  VM Reduction  Combined  Total Reduction  MB/s
SDFS 4k          21GB       4.5%            109GB    64%           128GB     61%              16
lessfs 4k (est)  24GB       -9%             N/A      51%           N/A       50%              4
SDFS 128k        21GB       4.5%            255GB    16%           276GB     15%              40
lessfs 128k      21GB       4.5%            130GB    57%           183GB     44%              24
tar.gz --fast    21GB       4.5%            178GB    41%           199GB     39%              35
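The reduction columns are just logical-vs-stored arithmetic; a small helper to reproduce them (illustrative only, not part of either tool):

```shell
# reduction% = (1 - stored / logical) * 100
reduction() {
  awk -v logical="$1" -v stored="$2" \
    'BEGIN { printf "%.0f%%\n", (1 - stored / logical) * 100 }'
}

reduction 100 39   # 39GB stored from 100GB logical -> prints 61%
reduction 200 30   # prints 85%
```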
Compression ("Unique" Data)
[Bar chart: Home, VM, and Total Reduction (%) for SDFS 4k, lessfs 4k (est), SDFS 128k, lessfs 128k, and tar.gz --fast; same figures as the table above]
Write Performance (don't trust this)
[Bar chart: MB/s for raw, SDFS 4k, lessfs 4k, SDFS 128k, lessfs 128k, and tar.gz --fast; write rates as in the table above]
Kick the Tires, Part 2
Test data set – two ~20.4GB full backup archives from a popular commercial vendor.
Test Environment: VirtualBox VM, 2GB RAM, 2 cores, 2x 7200RPM
SATA drives (meta & data separated for LessFS); physical CPU: quad-core Xeon.
Write Performance
[Bar chart: MB/s for raw, SDFS 128k write, SDFS 128k re-write, LessFS 128k write, and LessFS 128k re-write]
Compression (Backup Data)
[Bar chart, reduction at 128K blocks: SDFS 7%, LessFS 34%, raw 0%]

Load (SDFS 128k)
[Screenshot: system load during the SDFS 128k run]
Open Source Dedupe
Pro: Free. Can be stable if well managed.
Con: Not in repos yet. Efforts behind them seem very limited (1 dev each). No/poor documentation.
The Future
Eventual commodity: btrfs.
Dedupe planned (off-line only).
Conclusion/Recommendations
Dedupe is great if it works and it meets your performance and storage requirements.
OSS dedupe has a way to go; SDFS/OpenDedupe is the best OSS option right
now. JungleDisk is good and cheap, but not OSS.
About Red Wire Services
If you found this presentation helpful, consider Red Wire Services for your next Backup, Archive, or IT Disaster Recovery Planning project.
Learn more at www.RedWireServices.com
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Drawbacks
Slow writes, slower reads. High CPU/memory utilization (a dedicated server
is a must). Increases risk of data loss/corruption.
Collision risk of 1.3x10^-49 per PB (256-bit hash & 8KB blocks).
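Figures like this come from a birthday-bound estimate: with n blocks and a k-bit hash, p ≈ n²/2^(k+1), assuming uniformly distributed hashes. The exact exponent shifts with the block-count assumptions, but a quick sketch of the arithmetic shows why it is astronomically small either way:

```shell
# log10 of the birthday-bound collision probability for n blocks, k-bit hash:
#   p ~= n^2 / 2^(k+1)  =>  log10(p) = 2*log10(n) - (k+1)*log10(2)
collision_exponent() {
  awk -v n="$1" -v k="$2" \
    'BEGIN { printf "%.1f\n", 2 * log(n) / log(10) - (k + 1) * log(2) / log(10) }'
}

# 1PB / 8KB blocks = 2^37 = 137438953472 blocks, 256-bit hash:
collision_exponent 137438953472 256   # prints -55.1, i.e. p ~ 10^-55
```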
How Does it Work?
[Diagram: block layout without dedupe vs. with dedupe]
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
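The sweep itself reduces to "find blocks whose reference count dropped to zero". A toy sketch over a hypothetical `<hash> <refcount>` metadata dump (real filesystems keep this in their own databases, not a flat file):

```shell
# print hashes of blocks that no file references any more; those can be freed
unreferenced() {
  awk '$2 == 0 { print $1 }' "$1"
}

printf 'aa11 3\nbb22 0\ncc33 1\n' > /tmp/block_refs.txt
unreferenced /tmp/block_refs.txt   # prints bb22
```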
Commercial Implementations
Just about every backup vendor: Symantec, CommVault.
Cloud: Asigra, Barracuda, Dropbox (global), JungleDisk, Mozy.
NAS/SAN/Backup Targets: NEC HydraStor, DataDomain/EMC Avamar, Quantum, NetApp.
Open Source Implementations
FUSE based: Lessfs, SDFS (OpenDedupe).
Others: ZFS, btrfs (off-line only).
Limited (file-based SIS): BackupPC (reliable), rdiff-backup.
How Good is it?
Many see 10-20x deduplication, meaning 10-20 times more logical object storage than physical.
Especially true in backup or virtual environments.
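A dedupe ratio and a space-reduction percentage are two views of the same number; converting between them:

```shell
# an Rx dedupe ratio stores 1/R of the logical data:
#   reduction% = (1 - 1/R) * 100
dedupe_ratio_to_reduction() {
  awk -v r="$1" 'BEGIN { printf "%.0f%%\n", (1 - 1 / r) * 100 }'
}

dedupe_ratio_to_reduction 10   # prints 90%
dedupe_ratio_to_reduction 20   # prints 95%
```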
SDFS / OpenDedupe (www.opendedup.org)
Java 7 based, platform agnostic. Uses FUSE. S3 storage support. Snapshots. Inline or batch mode deduplication. Supposedly fast (290MB/s+ on great HW). Support for global/clustered dedupe. Probably the most mature OSS dedupe (IMHO).
SDFS
SDFS Install & Go
Install Java
rpm -Uvh SDFS-1.0.7-2.x86_64.rpm
sudo mkfs.sdfs --volume-name=sdfs_128k \
  --io-max-file-write-buffers=32 \
  --volume-capacity=550GB \
  --io-chunk-size=128 \
  --chunk-store-data-location=/mnt/data
sudo modprobe fuse
sudo mount.sdfs -v sdfs_128k -m /mnt/dedupe
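Chunk size drives memory use: every unique chunk needs a hash-table entry, so a 550GB volume at 128KB chunks means millions of entries. A rough sizing sketch (the exact per-entry overhead is in the OpenDedup documentation, not assumed here):

```shell
# number of chunks in a volume: capacity_GB * 1024 * 1024 / chunk_KB
chunks() {
  awk -v cap_gb="$1" -v chunk_kb="$2" \
    'BEGIN { printf "%d\n", cap_gb * 1024 * 1024 / chunk_kb }'
}

chunks 550 128   # prints 4505600   (~4.5M chunks)
chunks 550 4     # prints 144179200 (~144M chunks -- why 4k blocks hurt)
```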
SDFS
Pro: Works when configured properly. Appears to be multithreaded.
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Types
Source Target Global FixedSliding Block File Based (SIS)
Drawbacks
Slow writes slower reads High CPUmemory utilization (dedicated server
is a must) Increases data loss risk corruption
Collision risk of 13x10^-49 chance per PB (256 bit hash amp 8KB Blocks)
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Drawbacks
Slow writes slower reads High CPUmemory utilization (dedicated server
is a must) Increases data loss risk corruption
Collision risk of 13x10^-49 chance per PB (256 bit hash amp 8KB Blocks)
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
How Does it Work
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Without Dedupe
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
With Dedupe
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFS (www.lessfs.com)
Written in C = less CPU overhead. Have to build it yourself (configure && make && make install).
Has replication, encryption. Uses FUSE.
LessFS Install
wget http://…/lessfs-1.4.2.tar.gz
tar zxvf *.tar.gz
wget http://…/db-4.8.30.tar.gz
yum install buildstuff…
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
LessFS Go
sudo vi /etc/lessfs.cfg
BLOCKDATA_PATH=/mnt/data/dta/blockdata.dta
META_PATH=/mnt/meta/mta
BLKSIZE=4096  # only 4k supported on CentOS 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c /etc/lessfs.cfg
lessfs /etc/lessfs.cfg /mnt/dedupe
LessFS
Pro
Does inline compression by default as well. Reasonable VM compression with 128k blocks.
Con
Fragile. Stats/FS info hard to see (per-file accounting, no totals). Kernel >= 2.6.26 required for blocks > 4k (RHEL6 only). Running with 4k blocks is not really feasible.
LessFS
Other OSS
ZFS: Tried it, and empirically it was a drag, but I have no hard data (got ~3x dedupe with identical full backups of VMs).
At least it's stable…
Kick the Tires
Test data set: ~330GB of data. 22GB of documents, pictures, music. Virtual Machines:
– 220GB Windows 2003 Server with SQL data
– 2003 AD DC, ~60GB
– 2003 Server, ~8GB
– Two OpenSolaris VMs, 15 & 27GB
– 3GB Windows 2000 VM
– 15GB XP Pro VM
Kick the Tires
Test Environment: AWS High-CPU Extra Large instance, ~7GB of RAM, ~eight cores at ~2.5GHz each, ext4.
Compression Performance
First round (all "unique" data).
If another copy was put in (like another full), we should expect 100% reduction for that non-unique data (1x dedupe per run).
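In reduction-percentage terms: if N identical fulls are stored in the space of one, the total reduction approaches 100% as N grows. A quick sanity check of that arithmetic:

```python
def total_reduction(num_fulls):
    # N identical fulls stored as one physical copy:
    # reduction = 1 - physical/logical = 1 - 1/N.
    return (1 - 1 / num_fulls) * 100

for n in (1, 2, 4, 10):
    print(n, f"{total_reduction(n):.0f}%")  # 0%, 50%, 75%, 90%
```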
FS              Home Data  Home Reduction  VM Data  VM Reduction  Combined  Total Reduction  MB/s
SDFS 4k         21GB       4.5%            109GB    64%           128GB     61%              16
lessfs 4k (est) 24GB       -9%             N/A      51%           N/A       50%              4
SDFS 128k       21GB       4.5%            255GB    16%           276GB     15%              40
lessfs 128k     21GB       4.5%            130GB    57%           183GB     44%              24
tar.gz --fast   21GB       4.5%            178GB    41%           199GB     39%              35
Compression ("Unique" Data)
[Chart: Home, VM, and Total Reduction (%) per filesystem; same figures as the table above.]
Write Performance (don't trust this)
[Chart: MB/s for raw, SDFS 4k, lessfs 4k, SDFS 128k, lessfs 128k, tar.gz --fast.]
Kick the Tires Part 2
Test data set – two ~204GB full backup archives from a popular commercial vendor.
Test Environment: VirtualBox VM, 2GB RAM, 2 cores, 2x 7200RPM SATA drives (meta & data separated for LessFS). Physical CPU: Quad Core Xeon.
Write Performance
[Chart: MB/s for raw, SDFS 128k write, SDFS 128k re-write, LessFS 128k write, LessFS 128k re-write.]
Compression (Backup Data)
[Chart: Reduction with 128K blocks — SDFS 7%, LessFS 34%, raw 0%.]
Load (SDFS 128k)
Open Source Dedupe
Pro: Free. Can be stable if well managed.
Con: Not in repos yet. Efforts behind them seem very limited (1 dev each). No/poor documentation.
The Future
Eventual commodity: btrfs
Dedupe planned (off-line only)
Conclusion / Recommendations
Dedupe is great if it works and it meets your performance and storage requirements.
OSS dedupe has a way to go; SDFS/OpenDedupe is the best OSS option right now. JungleDisk is good and cheap, but not OSS.
About Red Wire Services
If you found this presentation helpful, consider Red Wire Services for your next Backup, Archive, or IT Disaster Recovery Planning project.
Learn more at www.RedWireServices.com
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle, WA. Nick is available to speak on a variety of IT Disaster Recovery related topics, including:
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group, please call (206) 829-8621 or email info@redwireservices.com
Block Reclamation
In general blocks are not removedfreed when a file is removed
We must periodically check blocks for references a block with no reference can be deleted freeing allocated space
Process can be expensive scheduled during off-peak
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Commercial Implementations
Just about every backup vendor Symantec CommVault Cloud Asigra Baracuda Dropbox (global) JungleDisk
Mozy
NASSANBackup Targets NEC HydraStor DataDomainEMC Avamar Quantum NetApp
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Open Source Implementations Fuse Based
Lessfs SDFS (OpenDedupe)
Others ZFS btrfs ( Off-line only)
Limited (file based SIS) BackupPC (reliable) Rdiff-backup
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
How Good is it
Many see 10-20x deduplicaiton meaning 10-20 times more logical object storage than physical
Especially true in backup or virtual environments
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
SDFS OpenDedupewwwopendeduporg
Java 7 Based platform agnostic Uses fuse S3 storage support Snapshots Inline or batch mode deduplication Supposedly fast (290MBps+ on great HW) Support for globalclustered dedupe Probably most mature OSS Dedupe (IMHO)
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
SDFS
SDFS Install amp Go
Install Java rpm ndashUvh SDFS-107-2x86_64rpm
sudo mkfssdfs --volume-name=sdfs_128k
--io-max-file-write-buffers=32
--volume-capacity=550GB
--io-chunk-size=128
--chunk-store-data-location=mntdata
sudo modprobe fuse
sudo mountsdfs -v sdfs_128k -m
mntdedupe
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires, Part 2

Test data set – two ~204GB full backup archives from a popular commercial vendor.

Test Environment: VirtualBox VM, 2GB RAM, 2 cores, 2x 7200RPM SATA drives (meta & data separated for LessFS); physical CPU: quad-core Xeon.
Write Performance

[Bar chart: MB/s (0–40) for raw, SDFS 128k write, SDFS 128k re-write, LessFS 128k write, and LessFS 128k re-write.]
Compression (Backup Data)

[Bar chart: Reduction with 128K blocks – SDFS 7%, LessFS 34%, raw 0%.]

Load (SDFS 128k)

[Chart: system load during the SDFS 128k run.]
Open Source Dedupe

Pro: Free. Can be stable, if well managed.

Con: Not in repos yet. Efforts behind them seem very limited (1 dev each). No/poor documentation.
The Future

Eventual commodity: btrfs
Dedupe planned (offline only)
Conclusion / Recommendations

Dedupe is great — if it works, and if it meets your performance and storage requirements.

OSS dedupe has a way to go; SDFS/OpenDedup is the best OSS option right now. JungleDisk is good and cheap, but not OSS.
About Red Wire Services

If you found this presentation helpful, consider Red Wire Services for your next Backup, Archive, or IT Disaster Recovery Planning project.

Learn more at www.RedWireServices.com
About Nick Webb

Nick Webb is the founder of Red Wire Services in Seattle, WA. Nick is available to speak on a variety of IT Disaster Recovery related topics, including:

Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs

If interested in having Nick speak to your group, please call (206) 829-8621 or email info@redwireservices.com
SDFS: Install & Go

Install Java, then:
rpm -Uvh SDFS-1.0.7-2.x86_64.rpm
sudo mkfs.sdfs --volume-name=sdfs_128k \
  --io-max-file-write-buffers=32 \
  --volume-capacity=550GB \
  --io-chunk-size=128 \
  --chunk-store-data-location=/mnt/data
sudo modprobe fuse
sudo mount.sdfs -v sdfs_128k -m /mnt/dedupe
SDFS

Pro: Works, when configured properly. Appears to be multithreaded.

Con: Slow; resource intensive (CPU/memory). Fragile — easy to mess up options, leading to crashes with little user feedback. Standard POSIX utilities do not show accurate data (e.g. df); you must use getfattr -d <mount point> and calculate bytes → GB/TB and free space yourself. Slow with 4k blocks, which are recommended for VMs.
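Pulling human-readable numbers out of getfattr takes a little scripting. A sketch of the conversion follows — the attribute key and value here are illustrative, not real SDFS output (actual xattr names vary by version; run getfattr -d on your own mount to see them):

```shell
# Convert a getfattr-style 'key="bytes"' line into GB.
# The sample line is hypothetical; substitute the real output of:
#   getfattr -d /mnt/dedupe
line='user.cmd.currentsize="268435456000"'   # illustrative attribute
bytes=$(printf '%s\n' "$line" | sed 's/.*="\([0-9]*\)".*/\1/')
gb=$(( bytes / 1024 / 1024 / 1024 ))
echo "volume size: ${gb} GB"
```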
LessFS
www.lessfs.com

Written in C = less CPU overhead. Have to build it yourself (configure && make && make install). Has replication and encryption. Uses FUSE.
LessFS: Install

wget http://…/lessfs-1.4.2.tar.gz
tar zxvf lessfs-1.4.2.tar.gz
wget http://…/db-4.8.30.tar.gz
yum install buildstuff…
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
SDFS Pro
Works when configured properly Appears to be multithreaded
Con Slow resource intensive (CPUMemory) Fragile easy to mess up options leading to crashes little
user feedback Standard POSIX utilities do not show accurate data (eg df
must use getfattr -d ltmount pointgt and calculate bytes rarr GBTB and free yourself)
Slow with 4k blocks recommended for VMs
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
LessFSwwwlessfscom
Written in C = Less CPU Overhead Have to build yourself (configure ampamp make ampamp make install)
Has replication encryption Uses fuse
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
LessFS Install
wget httplessfs-142targz
tar zxvf targz
wget httpdb-4830targz
yum install buildstuffhellip
echo never gt syskernelmmredhat_transparent_hugepagedefrag
echo no gt syskernelmmredhat_transparent_hugepagekhugepageddefrag
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
LessFS Go
sudo vi etclessfscfg
BLOCKDATA_PATH=mntdatadtablockdatadta
META_PATH=mntmetamta
BLKSIZE=4096 only 4k supported on centos 5
ENCRYPT_DATA=on
ENCRYPT_META=off
mklessfs -c etclessfscfg
lessfs etclessfscfg mntdedupe
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
LessFS
Pro Does inline compression by default as well Reasonable VM compression with 128k blocks
Con Fragile StatsFS info hard to see (per file accounting no totals) Kernel gt= 2626 required for blocks gt 4k (RHEL6 only) Running with 4k blocks is not really feasible
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
LessFS
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Other OSS
ZFS Tried it and empirically it was a drag but I have no
hard data (got like 3x dedupe with identical full backups of VMs)
At least itrsquos stablehellip
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
ConclusionRecommendations
Dedupe is great if it works and it meets your performance and storage requirements
OSS Dedupe has a way to go SDFSOpenDedupe is best OSS option right
now JungleDisk is good and cheap but not OSS
About Red Wire Services
If you found this presentation helpful consider Red Wire Services for your next Backup Archive or IT Disaster Recovery Planning project
Learn more at wwwRedWireServicescom
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle WA Nick is available to speak on a variety of IT Disaster Recovery related topics including
Preserving Your Digital Legacy
Getting Started with your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group please call (206) 829-8621 or email inforedwireservicescom
Kick the Tires
Test data set ~330GB of data 22GB of documents pictures music Virtual Machines
ndash 220GB Windows 2003 Server with SQL Datandash 2003 AD DC ~60GBndash 2003 Server ~8GBndash Two OpenSolaris VMs 15 amp 27GBndash 3GB Windows 2000 VMndash 15GB XP Pro VM
Kick the Tires
Test Environment AWS High CPU Extra Large Instance ~7GB of RAM ~Eight Cores ~25GHz each ext4
Compression Performance
First round (all ldquouniquerdquo data)
If another copy was put in (like another full) we should expect 100 reduction for that non-unique data (1x dedupe per run)
FS Home Data
Home Reduction
VM Data
VM Reduction
Combined Total Reduction
MBps
SDFS 4k 21GB 450 109GB
64 128GB 61 16
lessfs 4k (est)
24GB -9 NA 51 NA 50 4
SDFS 128k 21GB 450 255GB
16 276GB 15 40
lessfs 128k 21GB 450 130GB
57 183GB 44 24
targz --fast 21GB 450 178GB
41 199GB 39 35
Compression (ldquoUniquerdquo Data)
SDFS 4klessfs 4k (est)
SDFS 128klessfs 128k
targz --fast-1000
000
1000
2000
3000
4000
5000
6000
7000
Home Reduction
VM Reduction
Total Reduction
5
-9
55
5
64
51
16
57
41
61
50
15
44
39
Home Reduction VM Reduction Total Reduction
Write Performance(dont trust this)
raw SDFS 4k lessfs 4k SDFS 128k lessfs 128k targz --fast0
5
10
15
20
25
30
35
40
MBps
MBps
Kick the Tires Part 2
Test data set ndash two ~204GB full backup archives from a popular commercial vendor
Test Environment VirtualBox VM 2GB RAM 2 Cores 2x7200RPM
SATA drives (meta amp data separated for LessFS) Physical CPU Quad Core Xeon
Write Performance
raw SDFS 128k W SDFS 128k Re-W LessFS 128k W LessFS 128k Re-W0
5
10
15
20
25
30
35
40
MBps
MBps
Compression (Backup Data)
SDFS LessFS Raw000
500
1000
1500
2000
2500
3000
3500
4000
7
34
0
Reduction (128K Blocks)
Reduction
Load(SDFS 128k)
Open Source Dedupe
Pro Free Can be stable if well managed
Con Not in repos yet Efforts behind them seem very limited 1 dev each NoPoor documentation
The Future
Eventual Commodity brtfs
Dedupe planned (off-line only)
Conclusion / Recommendations
Dedupe is great if it works and meets your performance and storage requirements
OSS dedupe has a way to go; SDFS/OpenDedup is the best OSS option right now
JungleDisk is good and cheap, but not OSS
About Red Wire Services
If you found this presentation helpful, consider Red Wire Services for your next Backup, Archive, or IT Disaster Recovery Planning project.
Learn more at www.RedWireServices.com
About Nick Webb
Nick Webb is the founder of Red Wire Services in Seattle, WA. Nick is available to speak on a variety of IT Disaster Recovery related topics, including:
Preserving Your Digital Legacy
Getting Started with Your Small Business Disaster Recovery Plan
Archive Storage for SMBs
If interested in having Nick speak to your group, please call (206) 829-8621 or email info@redwireservices.com