
The Curious Case of Database Deduplication

Gurmeet Goindi Oracle


2014 Data Storage Innovation Conference. © Oracle Inc. All Rights Reserved.

Agenda

Introduction
Deduplication
Databases and Deduplication
All Flash Arrays and Deduplication


Quick Show of Hands

How many storage administrators?
How many system administrators?
How many network administrators?
How many database administrators?
None of the above? For example, managers?


Enterprise Data Growth

Year-over-year data growth
    Enterprise data is growing 10-20% per year
    Some industries see data growth of 50% or more annually
Increases in production data are magnified in the backup storage infrastructure
    Every retained backup copy grows along with the production data, so backup storage grows by a multiple of the production growth


Exponential Growth in Backup Infrastructure

Backup retention periods remain constant based on business needs
    Often expanding due to regulatory requirements
Storage budgets don't scale with data growth
Data center space and power constraints hamper storage scale-out

[Figure: production data size versus one month of backups, last year and this year, showing how data growth multiplies across backup copies]


Backup Storage Optimization Strategies

Backup compression
    Software (e.g. native utilities)
    Hardware (e.g. tape or disk drives)
Deduplication
    Source side (software)
    Purpose-built backup appliances
    Hybrid approach


Agenda

Introduction
Deduplication Concepts
Databases and Deduplication
All Flash Arrays and Deduplication


Deduplication 101

Deduplication: replace redundant backup data with pointers to a shared copy
    Redundant data is identified by hashing (fingerprinting) algorithms
Lowers storage costs by reducing capacity requirements
Can be done at the source, or at the target either inline or post-process
Mileage varies for deduplication of database blocks


Deduplication Internals

[Flowchart: Deduplication Engine]
Backup data is hashed / fingerprinted, then: is the fingerprint unique?
    Yes: write to storage, update the metadata
    No (duplicate found): bit-compare with the existing block
        Doesn't match: write to storage, update the metadata
        Matches: update the metadata
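
The decision flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's engine: the `DedupStore` class, its in-memory dictionaries, and the choice of SHA-256 as the fingerprint are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy deduplication engine: fingerprint lookup, then a byte-for-byte compare."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> list of stored chunks with that fingerprint
        self.metadata = []      # one (fingerprint, slot) pointer per logical chunk written
        self.stored_bytes = 0   # physical capacity actually consumed

    def write(self, chunk: bytes) -> bool:
        """Store a chunk; return True if it was deduplicated."""
        fp = hashlib.sha256(chunk).digest()
        candidates = self.chunks.setdefault(fp, [])
        for slot, existing in enumerate(candidates):
            if existing == chunk:               # the compare guards against hash collisions
                self.metadata.append((fp, slot))  # matches: update the metadata only
                return True
        # Unique fingerprint, or same fingerprint but different bytes: write to storage.
        candidates.append(chunk)
        self.stored_bytes += len(chunk)
        self.metadata.append((fp, len(candidates) - 1))
        return False

    def read(self, index: int) -> bytes:
        """Follow the metadata pointer back to the shared copy."""
        fp, slot = self.metadata[index]
        return self.chunks[fp][slot]


store = DedupStore()
backup = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
results = [store.write(chunk) for chunk in backup]
print(results, store.stored_bytes)
```

Three 4 KiB chunks are written logically, but only two are stored; the third write only adds a pointer to the existing copy.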


Deduplication & Compression

Compression is about shrinking the dataset
    Encoding information using fewer bits than the original representation
    Example: Lempel-Ziv (LZ) compression
Deduplication is about finding the unique data in a dataset and eliminating the redundant copies
    A deduplicated dataset can be further compressed
    A compressed dataset will yield poor and unpredictable deduplication ratios
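
The last point can be tried out with a small experiment. The sketch below is an illustration under assumed parameters (1 KiB fixed chunks, synthetic row data, zlib standing in for backup compression): it rewrites a single 8 KiB block in place and measures how many chunks of the second backup a fixed-chunk deduplicator would already have, first on the raw stream and then on the compressed stream.

```python
import hashlib
import zlib

CHUNK = 1024   # fixed chunk size of the toy deduplicator (assumed)
BLOCK = 8192   # database block size


def fingerprints(data: bytes) -> set:
    """Fingerprint every fixed-size chunk of a byte stream."""
    return {hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)}


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new stream's chunks already present in the old stream."""
    new_fps = fingerprints(new)
    return len(new_fps & fingerprints(old)) / len(new_fps)


def make_block(i: int) -> bytes:
    """Synthetic, compressible 'table rows' for block i (made-up data)."""
    rows = "".join(
        f"row={i * 1000 + j:08d};dept=SALES;salary={(i * 7919 + j * 104729) % 100000:06d}\n"
        for j in range(400)
    )
    return rows.encode()[:BLOCK]


# Day 1: 64 blocks. Day 2: identical except that block 0 is rewritten in place.
day1 = b"".join(make_block(i) for i in range(64))
new_block0 = b"".join(hashlib.sha256(bytes([k])).digest() for k in range(256))
day2 = new_block0 + day1[BLOCK:]

raw = shared_fraction(day1, day2)
compressed = shared_fraction(zlib.compress(day1), zlib.compress(day2))
print(f"raw backup stream:        {raw:.1%} of chunks already stored")
print(f"compressed backup stream: {compressed:.1%} of chunks already stored")
```

On the raw stream only the rewritten block is new. After compression, the change alters the encoded bytes and shifts everything that follows, so far fewer chunks line up with what is already stored.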


Source Side Deduplication

Deduplication software is installed on each host
Software reads the data to be backed up and looks for duplicates
In most cases, transmits the deduplicated backup to a storage appliance
Performance impact:
    Host: High
    Network: Low
Usually the backups need to be reconstituted when copied to tape


Target Side Deduplication

Deduplication occurs at the backup/storage appliance
Two flavors:
    Inline: backup is deduplicated before being written to disk
    Post-process: backup is written to a staging area, then deduplicated
Performance impact:
    Host: Low
    Network: High
Backups must be reconstituted when copied to tape
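
The network difference between the two approaches can be made concrete with a rough sketch. The functions `source_side_backup` and `target_side_backup`, the dictionary used as the appliance's index, and the SHA-256 fingerprints are all assumptions for illustration; real products use their own protocols.

```python
import hashlib


def source_side_backup(chunks, appliance_index: dict) -> int:
    """Host fingerprints every chunk, then sends each fingerprint plus only
    the chunks the appliance does not already hold. Returns bytes sent."""
    bytes_sent = 0
    for chunk in chunks:
        fp = hashlib.sha256(chunk).digest()   # hashing work happens on the host
        bytes_sent += len(fp)
        if fp not in appliance_index:
            appliance_index[fp] = chunk
            bytes_sent += len(chunk)
    return bytes_sent


def target_side_backup(chunks, appliance_index: dict) -> int:
    """Host sends everything; the appliance fingerprints and deduplicates.
    Returns bytes sent."""
    bytes_sent = 0
    for chunk in chunks:
        bytes_sent += len(chunk)              # every chunk crosses the network
        appliance_index.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return bytes_sent


backup = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]

source_index, target_index = {}, {}
source_first = source_side_backup(backup, source_index)
source_repeat = source_side_backup(backup, source_index)   # same data, next night
target_first = target_side_backup(backup, target_index)
print(source_first, source_repeat, target_first)
```

Both approaches end up storing the same two unique chunks; they differ in where the fingerprinting runs and in how many bytes travel over the network.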


Agenda

Introduction
Deduplication Concepts
Databases and Deduplication
All Flash Arrays and Deduplication


Databases And Deduplication

Leading vendor: 30x deduplication for file systems, 6x for relational databases
Emerging all-flash array vendor: 10x savings for file systems, 2.1x for relational databases
Most vendors recommend full backups to achieve better deduplication ratios

What drives low deduplication ratios for databases?


Relational Databases – Quick Intro

Relational database: a repository for, and management of, data organized in a relational model
Database Management System (DBMS): the software infrastructure that maintains these key characteristics:
    Structures: well-defined objects to store data
    Operations: well-defined actions to access and update data and structure
    Integrity rules: well-defined rules to govern operations
Data is organized in tables: two-dimensional relations with rows (tuples) and columns (attributes)
    Each row in a table has the same set of columns
    E.g. employee, department, salary tables


Databases and Storage

Database objects
    Reads and writes
    Metadata associated with these objects
Database logging
    Mechanism to deliver transactional consistency
    Critical functionality for database operation
    Availability features


Database Block Format

The database manages its logical storage in data blocks
    Minimum unit of I/O
A data block has a well-defined structure
    Block header is kept consistent with the payload, rows do not overlap, metadata is in its place, ...
    More than simple bits: logical consistency can always be verified


Database Block Format (Contd.)

Even the row (user) data is very carefully formatted within a database data block
Blocks have a fixed size (usually 8K)
All blocks are unique
An update to a block results in metadata updates to adjacent blocks as well
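
Why identical user data still produces unique blocks can be shown with a toy block builder. The 16-byte header below (block address, SCN, checksum) is a hypothetical layout invented for this sketch, not the actual on-disk format of any database.

```python
import hashlib
import struct

BLOCK_SIZE = 8192
HEADER_SIZE = 16


def build_block(block_address: int, scn: int, rows: bytes) -> bytes:
    """Hypothetical layout: 16-byte header followed by zero-padded row data."""
    payload = rows.ljust(BLOCK_SIZE - HEADER_SIZE, b"\0")
    checksum = sum(payload) & 0xFFFFFFFF
    header = struct.pack(">IQI", block_address, scn, checksum)  # 4 + 8 + 4 bytes
    return header + payload


rows = b"1001,Alice,SALES,52000\n" * 100   # the same user data in both blocks

block_a = build_block(block_address=0x00C1FBB1, scn=1000, rows=rows)
block_b = build_block(block_address=0x00C1FBB2, scn=1000, rows=rows)

same_payload = block_a[HEADER_SIZE:] == block_b[HEADER_SIZE:]
same_fingerprint = hashlib.sha256(block_a).digest() == hashlib.sha256(block_b).digest()
print(same_payload, same_fingerprint)
```

The row data is byte-for-byte identical, yet a deduplicator that fingerprints whole blocks sees two different blocks because the headers differ.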


Understanding Database Backup Options

Full backups
    Image copy:
        Fairly efficient for restore
        Inefficient use of server, network and storage resources
        Might provide better deduplication ratios
    Application-aware intelligent image copy:
        Better storage and network resource utilization, but not for server workload
        Faster restore times compared to incremental copies
Incremental backups
    Transmit only changes on a periodic basis
    Efficient use of storage, network and server resources
    Time to recover depends on the number of incremental backups that need to be restored
    Faster to take than any full backup strategy; reduces the backup window
Most successful database backup strategies combine the two approaches to meet your recovery objectives


Incremental Backups And Deduplication

A database-aware incremental backup contains only unique blocks
    Deduplication ratios are minimal / unpredictable
    Limited to duplicates in the user data within the block
Low impact on database server and network

[Diagram: backup schedule and blocks transmitted]
    Full Backup Sunday: all blocks
    Incremental Monday: changed data only (blocks 1, 2, 3, 4)
    Incremental Tuesday: changed data only (blocks 5, 6, 7, 8)
    Incremental Wednesday: changed data only (blocks 9, A, B, C)
    Full Backup Thursday: all blocks


Full Backups And Deduplication

Huge number of redundant blocks transmitted
    In this example: almost 80% of the traffic is redundant
    With a typical 5%-10% churn rate, nearly 90-95% of the traffic will be redundant
Hence deduplication ratios are high
Significant impact on database server and network

[Diagram: full backups on Sunday, Monday, Tuesday, Wednesday and Thursday. A few blocks change each day, but all blocks are transmitted in every backup]
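
The trade-off between the two schedules comes down to simple arithmetic. The sketch below uses made-up numbers (1000 blocks, 50 changed per day, 7 days) and assumes that every changed block is a brand-new block version, in line with the earlier point that all database blocks are unique; `backup_week` is a hypothetical helper, not part of any backup tool.

```python
def backup_week(total_blocks: int, changed_per_day: int, days: int, strategy: str):
    """Return (blocks_transmitted, unique_blocks_stored) for a run of daily backups.

    Day 1 is always a full backup. Only unchanged blocks resent by later
    full backups can be deduplicated.
    """
    unique = total_blocks + (days - 1) * changed_per_day
    if strategy == "full":
        transmitted = days * total_blocks
    elif strategy == "incremental":
        transmitted = total_blocks + (days - 1) * changed_per_day
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return transmitted, unique


for strategy in ("full", "incremental"):
    sent, stored = backup_week(total_blocks=1000, changed_per_day=50, days=7,
                               strategy=strategy)
    print(f"{strategy:12s} sent={sent:5d} stored={stored:5d} "
          f"dedup ratio={sent / stored:.2f}x redundant={1 - stored / sent:.0%}")
```

With a 5% daily churn, each full backup after the first is 95% redundant, which is where the high deduplication ratio comes from. The incremental schedule stores the same unique blocks but never sends the redundant ones, so its ratio stays at 1.0x.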


Database Compression and Deduplication

Several database-integrated compression techniques:
    OLTP data compression on user data
    Columnar compression
    Backup compression
Most will yield unpredictable deduplication performance on the backup stream
Encryption has similar implications


What About Database Logs ?

Fundamental structure of the database: transaction logs
    Most crucial structure for data recovery
    Contents: change data, undo data, schema / object management statements, etc.
    Well-defined data fields, relationships
Almost zero scope for deduplication

REDO RECORD - Thread:1 RBA: 0x00003e.0000d188.01b0 LEN: 0x07c8 VLD: 0x01
SCN: 0x0000.00d6ca18 SUBSCN: 1 04/30/2007 21:06:42
CHANGE #1 TYP:0 CLS:67 AFN:3 DBA:0x00c1fbb1 OBJ:4294967295 SCN:0x0000.00d6ca15 SEQ: 1 OP:5.2
ktudh redo: slt: 0x0007 sqn: 0x00000000 flg: 0x000a siz: 120 fbi: 0
            uba: 0x00c05098.0004.01 pxid: 0x0000.000.00000000


To Summarize

Is deduplication not the correct design choice for database backups?
    Well, it depends
Better deduplication ratios are possible:
    With daily full backups
        May not be practical, depending on database size
    By disabling database compression / encryption
        Is that consistent with your security / capacity policies?
    After customizing backup parameters
        Need to balance database and dedupe best practices
For storage efficiency, do the testing and the math:
    Daily full backups + 3rd-party dedupe
    vs.
    Daily incremental backups + native compression


Agenda

Introduction
Deduplication Concepts
Databases and Deduplication
All Flash Arrays and Deduplication


Deduplication And All Flash Arrays

Solid state drives are still 5-10x more expensive than hard disk drives
Economics of SSDs improve as the footprint of the dataset to be stored is reduced
Some all-flash array vendors are leveraging deduplication to increase the storage efficiency of SSDs
Deduplication ratios are fairly modest compared to those seen in secondary storage or purpose-built backup appliances


Deduplication in All Flash Arrays – Technology Enablers

[Flowchart: Deduplication Engine]
The dataset is hashed / fingerprinted, then: is the fingerprint unique?
    Yes: write to storage, update the metadata
    No (duplicate found): bit-compare with the existing block
        Doesn't match: write to storage, update the metadata
        Matches: update the metadata

SSDs accelerate read I/O many times over, which speeds up bit comparisons and fingerprint lookups
Fewer writes result in higher flash endurance
Weaker hashing lets the array use built-in CPU hashing functions to generate the fingerprint, since the fast bit comparison catches any collisions


Databases, Deduplication and All Flash Arrays

Most all-flash arrays use a fixed-length chunk size
By definition, each database block update results in a unique block and changes to the metadata of adjacent blocks
Algorithms that are unaware of database block boundaries will generate a unique hash for each update
    Thus defeating the purpose of deduplication
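
The boundary problem can be illustrated with a little offset arithmetic. The helper `chunks_touched` and the 512-byte file-header offset are hypothetical, chosen only to show what happens when array chunks and database blocks do not line up.

```python
def chunks_touched(block_index: int, block_size: int = 8192,
                   chunk_size: int = 8192, offset: int = 0) -> list:
    """Array chunks that overlap one database block.

    `offset` is where database block 0 starts on the volume (hypothetical;
    e.g. a file header placed before the first block).
    """
    start = offset + block_index * block_size
    end = start + block_size - 1
    return list(range(start // chunk_size, end // chunk_size + 1))


aligned = chunks_touched(3)                       # block and chunk boundaries coincide
misaligned = chunks_touched(3, offset=512)        # blocks shifted by 512 bytes
smaller_chunks = chunks_touched(3, chunk_size=4096)
print(aligned, misaligned, smaller_chunks)
```

When the boundaries coincide, an update to one block produces one new chunk. When they do not, the same update produces new fingerprints for every chunk the block overlaps, and those chunks also carry bytes from neighboring blocks.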


Questions?