
Real Time Compress V7000


© Copyright IBM Corporation, 2011

Tzahi Shahak

Product Manager, Real-time Compression

Real-time Compression in Storwize V7000 & SAN Volume Controller (SVC)

Compression Without Compromise

Enhancing Storwize V7000 to Deliver Extraordinary Efficiency

• Introducing IBM Storwize V7000 Real-time Compression

• Innovative, easy-to-use compression fully integrated into Storwize V7000

• High performance implementation supports active primary workloads

• Storwize V7000 Real-time Compression typically delivers 50% or better compression for data that is not already compressed

• Compression helps reduce

– Storage purchase costs

– Rack space

– Cooling

– Software costs for additional functions

• Compression can help freeze storage growth or delay need for additional purchases

Compression Without Compromise

Advantages Compared with other Technologies

IBM Real-time Compression can be used with active primary data

– High performance compression supports workloads off-limits to other alternatives

– Significantly expands candidate data for compression

– Greater compression benefits through use on more types of data

IBM Real-time Compression operates immediately and is easy to manage

– No need to schedule periods to run post-process compression

– Eliminates need to reserve space for uncompressed data waiting post-processing

IBM Real-time Compression supports all Storwize V7000 storage

– Internal or externally virtualized storage

– Can significantly enhance value of existing storage assets


Real-time Compression – Basics

Compression is an alternative to Thin Provisioning

– They both allow you to use less physical space on disk than is presented to the host

A Compressed Volume is “a kind of” Thin Provisioning

– Only uses physical storage to store compressed data

– Volume can be built from a pool using internal or external MDisks

Compression requires the I/O group hardware be one of the following platforms

– SVC Model 2145-CF8/CG8 Nodes

– Storwize V7000 Model 2076-1xx/3xx Control Enclosure

Can use Volume mirroring to convert to a Compressed Volume

Real-time Compression – Basics

Maximum of 200 Compressed Volumes per I/O group will initially be supported

Licensing is as follows:

– For SVC it is per TB of Volume capacity as seen by a host

• E.g. fifty 100GB Compressed Volumes need a 5TB license

– For Storwize V7000 it is per enclosure

• E.g. a customer with a 4 enclosure system who is virtualizing an external disk system with 2 enclosures would require a 6 enclosure license

Note: Creating the first Compressed Volume in an I/O group will instantly dedicate CPU and memory resources from the nodes/node canisters in that I/O group to the compression engine

– So planning/sizing should be done before implementing in a production environment

More detail on this and how compression works will be provided on the June 13th call tomorrow
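
The licensing arithmetic on this slide can be sketched in a few lines. This is only an illustration of the two worked examples above; the function names are hypothetical, and the conversion assumes 1 TB = 1000 GB, which is what makes fifty 100GB volumes come to 5TB.

```python
def svc_license_tb(volume_sizes_gb):
    """SVC: licensed per TB of compressed Volume capacity as seen by a host.

    Assumption: 1 TB = 1000 GB.
    """
    return sum(volume_sizes_gb) / 1000


def v7000_license_enclosures(internal_enclosures, external_enclosures):
    """Storwize V7000: licensed per enclosure, counting both the system's own
    enclosures and the enclosures of externally virtualized disk systems."""
    return internal_enclosures + external_enclosures


print(svc_license_tb([100] * 50))       # fifty 100GB Compressed Volumes
print(v7000_license_enclosures(4, 2))   # 4 internal + 2 external enclosures
```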

Real-time Compression – Basics

[Diagram: software stack between Clients and Storage, with layers labeled Front End, Remote Copy, Cache, FlashCopy, Mirroring, Thin Provisioning, Virtualization and Back End (SVC S/W components), plus the Random Access Compression Engine™ (RACE S/W component)]

All copy services will interoperate with compressed Volumes

– All copy services will be working with uncompressed data

• No real changes in sizing and planning for FlashCopy or replication

– Bandwidth sizing for replication is the same for compressed/non-compressed Volumes

– Compression engine resources allocated per I/O group need to be considered in sizing

All Thin Provisioning properties apply to compressed Volumes

– Virtual capacity, real capacity, used capacity, etc.

New property introduced

– Uncompressed capacity

• Provides an indication of how much uncompressed data has been written to the Volume
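
As a rough illustration of how the capacity properties relate, the sketch below derives a savings figure from uncompressed capacity and used capacity. It assumes that used capacity reflects only the compressed data on disk; the function names are hypothetical and this is not the product's documented formula.

```python
def compression_savings(uncompressed_gb, used_gb):
    """Fraction of space saved, assuming used capacity holds only compressed data."""
    if uncompressed_gb <= 0:
        return 0.0
    return 1.0 - used_gb / uncompressed_gb


def compression_ratio(uncompressed_gb, used_gb):
    """Ratio such as 2.0 for '2x' compression."""
    return uncompressed_gb / used_gb


# 100 GB written by the host, 50 GB used on disk: 50% savings, 2x compression
print(compression_savings(100, 50), compression_ratio(100, 50))
```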


Real-time Compression – GUI Support

GUI Displays Compression Savings on a Volume, Pool and System basis:


Real-time Compression – GUI Support

GUI Performance panel shows separate CPU utilization for Compression and System workloads


Real-time Compression – Sizing Tools

The following tools will be available to support customers deploying Compression

– Disk Magic

• Will ask the user to provide an “Effectiveness” value (similar to Easy Tier)

– Available later this year

– Capacity Magic

• Will ask the user to provide a compression ratio to complete the sizing

– Comprestimator

• A tool to estimate the compression ratio which is achievable for a given set of data

• Loaded on customer’s hosts

Real-time Compression – 45 Day Trial License

45 Day Free Trial License of Compression Function

– Included in software, so simply activate it using the GUI by setting the license to something other than zero to avoid errors in the event log


Compression Performance

Compression Without Compromise – VMware

• Storwize V7000 with Real-time Compression delivers up to 4x compression while maintaining VMware and Application performance

[Figure: VMware VMmark Performance Benchmarks. Four panels compare Uncompressed and Compressed volumes: MailServer Score and OLIO Score (Measured Score, Performance), and MailServer QoS and OLIO QoS (Measured QoS, Response Time).]

Compression Without Compromise

Storwize V7000 with Real-time Compression delivers up to 5x compression while maintaining or improving application business throughput

Business throughput / Transactional IOPS – higher is better

Source: IBM lab measurements, 96/48-drive configurations

[Figure: bar chart of Transaction Throughput (axis 0 to 3000) for the Transaction Processing and Email workloads, No Compression vs. Compressed]

Database Performance

Storwize V7000 with Real-time Compression delivers up to 5x compression while maintaining or improving database transaction response time and overall business throughput

Tested using industry standard TPC-C Benchmark – 1.2TB DB2 Database with 700 users

Response time in seconds – lower is better (faster response time)

[Figure: bar chart of Response Time in Seconds (axis 0 to 1.2) for the Stock Level, Delivery and Order Status transactions. Data labels in extraction order: No Compression - 96 disks: 1.144, .857, .468; Compressed – 48 disks: .701, .665, .385; Compressed – 6 Flash Drives: .46, .20, .501]

The benchmark was performed using a Storwize V7000 system with 300GB SAS HDDs and 300GB SSDs. A 1.2TB DB2 database with 700 concurrent clients was used in the benchmark. The same test was performed with compressed volumes and non-compressed volumes.

Beta Customer database testing results

SVC virtualizing 6-node XIV Gen2 configuration

Orion (Oracle I/O Calibration Tool) is a standalone tool for calibrating the I/O performance for storage systems that are intended to be used for Oracle databases. The calibration results are useful for understanding the performance capabilities of a storage system, either to uncover issues that would impact the performance of an Oracle database or to size a new database installation


Expected Compression Rates

IBM Comprestimator tool should be used to evaluate expected compression benefits in existing environments

Comprestimator

Comprestimator is a host-based utility for fast estimation of a block device's compression ratio

Objectives: Run over a block device

Estimates:

– Portion of non-zero blocks in the volume

– Compression rate of non-zero blocks

Performance: Runs FAST! < 60 seconds, no matter what the volume size is

Provides accuracy level for the estimation: ~5% max error

– Can improve guarantee with more samples (longer running time)

Method: Random sampling and compression throughout the volume

Collect enough non-zero samples to gain desired confidence

– More zero blocks means a slower run (takes more time to find non-zero blocks)

Mathematical analysis gives confidence guarantees

Note: the tool is estimating compression during migration of a volume into RtC
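
The sampling idea described above can be sketched as follows. This is not the actual Comprestimator algorithm: zlib stands in for the Real-time Compression engine, a `bytes` object stands in for the block device, and the block size, sample count and function name are assumptions chosen for illustration. Unlike the real tool, this sketch takes a fixed number of samples rather than sampling until a confidence target is met.

```python
import random
import zlib

BLOCK = 4096  # assumed sample block size in bytes


def estimate(device, samples=200, seed=0):
    """Return (non_zero_fraction, compressed_size / raw_size of non-zero samples)."""
    rng = random.Random(seed)
    n_blocks = len(device) // BLOCK
    zero_block = bytes(BLOCK)
    non_zero = raw = packed = 0
    for _ in range(samples):
        i = rng.randrange(n_blocks)           # random sampling throughout the volume
        block = device[i * BLOCK:(i + 1) * BLOCK]
        if block == zero_block:
            continue                          # zero blocks only count toward the portion
        non_zero += 1
        raw += len(block)
        packed += len(zlib.compress(block))   # stand-in compressor
    ratio = packed / raw if raw else 1.0
    return non_zero / samples, ratio


# Example "volume": half zero blocks, half repetitive text
text_block = (b"storwize " * 1000)[:BLOCK]
volume = bytes(BLOCK * 50) + text_block * 50
print(estimate(volume))
```

More samples narrow the error of both estimates at the cost of a longer run, which is the trade-off the slide describes.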

Compression Implementation Guidelines

Compression performance

– Performance of thin provisioned volumes (supported by the same number of HDDs) with and without compression is roughly equivalent

Use Comprestimator to identify workloads that are good candidates for compression

– More than 45% savings – recommend to compress

– Between 25-45% savings – recommend evaluating workload with compression

– Less than 25% savings – recommend avoiding compression
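
The three savings bands above translate directly into a small decision helper. The function name is hypothetical, and treating exactly 25% and exactly 45% as part of the "evaluate" band is an assumption, since the slide does not say which side the boundaries fall on.

```python
def recommend(savings_pct):
    """Map a Comprestimator savings estimate (in percent) to the slide's guidance."""
    if savings_pct > 45:
        return "compress"
    if savings_pct >= 25:
        return "evaluate workload with compression"
    return "avoid compression"


for pct in (60, 30, 10):
    print(pct, "->", recommend(pct))
```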

Common workloads suitable for compression

– Databases – DB2, Oracle, MS-SQL, etc.

– Applications based on databases – SAP, Oracle Applications, etc.

– Server Virtualization – KVM, VMware, Hyper-V, etc.

– Other compressible workloads – engineering, seismic, collaboration, etc.

Common workloads NOT suitable for compression

– Workloads using pre-compressed data types such as video, images, audio, etc.

– Workloads using encrypted data

– Heavy sequential write oriented workloads

– Other workloads using incompressible data or data with low compression rate

Additional Considerations

Compression is supported for a maximum of 200 compressed volume copies per I/O group. Note that this limit applies only to compressed volumes; there is no restriction on the number of non-compressed volumes

Recommended to use compression on:

– 4 core systems (V7000, CF8, older CG8) with less than 25% CPU utilization (before enabling compression)

– 6 core systems (newer CG8) with less than 50% CPU utilization (before enabling compression)

The CPU reallocation is done as soon as the first compressed volume is defined (even if it is not used)

If existing system CPU utilization is over the thresholds mentioned above, a new I/O group to support compression can be added to the cluster in environments with fewer than 4 I/O groups

Compressed volumes are not supported with Easy Tier in this release
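
The CPU and volume-count guidance on this slide can be expressed as a simple pre-check. This is a planning sketch only; the function and constant names are hypothetical, and it encodes nothing beyond the thresholds stated above.

```python
# Thresholds from the slide: CPU utilization measured before enabling compression
CPU_THRESHOLD_PCT = {4: 25, 6: 50}           # cores per node -> max CPU utilization (%)
MAX_COMPRESSED_COPIES_PER_IO_GROUP = 200


def compression_recommended(cores, cpu_util_pct, compressed_copies=0):
    """True if the I/O group is under the CPU threshold and has room for another copy."""
    if cores not in CPU_THRESHOLD_PCT:
        raise ValueError("only 4 core and 6 core systems are covered by the guidance")
    under_cpu = cpu_util_pct < CPU_THRESHOLD_PCT[cores]
    under_limit = compressed_copies < MAX_COMPRESSED_COPIES_PER_IO_GROUP
    return under_cpu and under_limit


print(compression_recommended(4, 20))   # 4 core system at 20% CPU
print(compression_recommended(6, 55))   # 6 core system at 55% CPU
```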


Q&A


Data Compression Basics


Compression Basics – Lempel Ziv

Detects repetitions in the data

Replaces portions of the data with references to matching data

ASJKDFHASJABCDEFORIUFSDFWEIRUCMNXSDFKWIOEUZXCMZXNVSFJSDFLSJCXCSKLRJHWEIOUOCZXCVKMSNDKSFSJZXM23NB33KJK1J1HJGJHHJ1VFHJGFHJ1GHJG23GJ123ABCDEFDHJKWEIORUWOEIRIXCVLXVJLASDFSDF’LSDGRERMNJDFJKGDGJERTYUIRDJKGHDKJTEHTREUITYEUIDWOSIOSDFWEOIRUABCDEFKDFHSDFJHWEIORWERYWEFUWYEIRUWERYXDKJFHSWETR5DFGCVBNA1SFSKLJFSKLDFJSLKDFJSLKDFJSKLDFJSDLFKJSDFKLJSDFKLJSDLFJSDFKLSJDFKLSJDEI4SDFDFDFDSDSDFSDFSDFSDFDSDFSDFSSDF2834HKJH
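
A toy encoder makes the "replace repetitions with references" idea concrete. This is a minimal LZ77-style sketch for illustration, not the encoder used by RACE or by any particular library; the token format (literal characters plus `(offset, length)` tuples) and the parameter defaults are assumptions.

```python
def lz_encode(data, window=32 * 1024, min_len=3):
    """Emit literals and (offset, length) references to earlier matching data."""
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):     # search only inside the window
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            out.append((best_off, best_len))       # reference to matching data
            i += best_len
        else:
            out.append(data[i])                    # literal character
            i += 1
    return out


def lz_decode(tokens):
    buf = []
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                buf.append(buf[-off])              # copy from earlier output
        else:
            buf.append(t)
    return "".join(buf)


tokens = lz_encode("ABCDEFxyzABCDEF")
print(tokens)  # nine literals, then (9, 6): "go back 9, copy 6"
```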


Compression Basics – Sliding Window

• Repetitions can be detected only within the sliding window history

• Common sliding window size – 32K

• Repetitions outside the window can not be referenced

Window size limit

• Memory footprint required to hold history in searchable manner

• Processing power required for searching larger history window

• Size of pointer needed to reference small repetition

ASJKDFHASJHWRETORIUFSDFWEIRUCMNXSDFKWIOEUZXCMZXNVSFJSDFLSJCXABCDEFHWEIOUOCZXCVKMSNDKSFSJZXM23NB33KJK1J1HJGJHHJ1VFHJGFHJ1GHJG23GJ123ABCDEFDHJKWEIORUWOEIRIXCVLXVJLASDFSDF’LSDGRERMNJDFJKGDGJERTYUIRDJKGHDKJTEHTREUITYEUIDWOSIOSDFWEOIRUKDFHSDFJHWEIORWERYWEFUWYEIRUWERYXDKJFHSWETR5DFGCVBNA1SFSKLJFSKLDFJSLKDFJSLKDFJSKLDFJSDLFKJSDFKLJSDFKLJSDABCDEFLFJSDFKLSJDFKLSJDEI4SDFDFDFDSDSDFSDFSDFSDFDSDFSDFSSDF2834HKJH
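
The effect of the window size can be observed with Python's standard zlib module, whose `wbits` parameter sets the history window to 2^wbits bytes. In this sketch a 20,000-byte pseudo-random block is written twice: with a 32K window the second copy can be referenced, while with a 512-byte window it cannot. The block size and seed are arbitrary choices for the demonstration.

```python
import random
import zlib


def deflate_size(data, wbits):
    """Compressed size using a history window of 2**wbits bytes."""
    c = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return len(c.compress(data) + c.flush())


rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(20_000))
data = block + block                      # the repetition is 20,000 bytes back

large_window = deflate_size(data, 15)     # 32K window: repetition is reachable
small_window = deflate_size(data, 9)      # 512-byte window: repetition is out of reach
print(large_window, small_window)
```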


Compression Basics – Huffman Coding

Detects common characters in data

Represents common characters using fewer bits

IAIIIABIBIIDMIBBBMIIIIIBBBMADKLEBBIIIIBBBIIIIAJHJKJDAMMMMIIIIIIIBBBIIIIISDFDIOIIIIIIIABBBBBMIIIIMMMIIIIIIDDFMMMMIIGFMMAEERTGMMDFMMMIIIIIIIAAABBBBBBBIIIIIUIIDIIIIIIDDGDBBBBBBBMMMEERMBBIIBMI

Common Char    Bit Representation

I              0

B              10

M              110

Other          111 + 8 Bits
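
The code table above can be applied directly. The sketch below uses the slide's fixed table rather than building a Huffman tree from character frequencies, so it illustrates the encoding step only; the function names are hypothetical, and characters are assumed to fit in 8 bits.

```python
CODES = {"I": "0", "B": "10", "M": "110"}   # table from the slide
ESCAPE = "111"                              # prefix for any other character


def encode(text):
    """Common characters get short codes; others get ESCAPE + their 8-bit value."""
    return "".join(CODES.get(ch, ESCAPE + format(ord(ch), "08b")) for ch in text)


def decode(bits):
    reverse = {code: ch for ch, code in CODES.items()}
    out, i = [], 0
    while i < len(bits):
        if bits.startswith(ESCAPE, i):
            out.append(chr(int(bits[i + 3:i + 11], 2)))
            i += 11
        else:
            for code, ch in reverse.items():
                if bits.startswith(code, i):
                    out.append(ch)
                    i += len(code)
                    break
            else:
                raise ValueError("invalid bit string")
    return "".join(out)


bits = encode("BIIBMI")
print(bits, len(bits), "bits instead of", 8 * len("BIIBMI"))
```

Because no code is a prefix of another, the decoder can read the bit string left to right without separators.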


Compression – Random Access

• Data is dependent on preceding data due to the nature of compression

• In order to read from a specific location, all data before it has to be decompressed

• To write to a specific location, all data after it has to be recompressed as well

• Not effective for large files or block devices

• Traditional compression implementations do not support random access


Compression – Random Access Chunks

• Break original data into fixed-size chunks

• Each chunk is compressed and decompressed independently

• Enables some random access to the data (reads, not writes)

• Large chunks – Heavy I/O penalty

– 4KB update = 1MB read + 1MB write

• Small chunks – Poor compression

• Variable output

– Data fragmentation

– Lower performance over time

– Lower compression ratio over time
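
The chunked approach and its costs can be sketched with zlib. The 64KB chunk size and the function names are assumptions for illustration; `update` only handles writes that fall inside a single chunk.

```python
import zlib

CHUNK = 64 * 1024  # assumed fixed input chunk size


def compress_chunks(data):
    """Fixed-size input chunks, each compressed independently (variable output sizes)."""
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def read(chunks, offset, length):
    """Random read: decompress only the chunks that cover the requested range."""
    first, last = offset // CHUNK, (offset + length - 1) // CHUNK
    plain = b"".join(zlib.decompress(chunks[i]) for i in range(first, last + 1))
    start = offset - first * CHUNK
    return plain[start:start + length]


def update(chunks, offset, new_bytes):
    """Small write: the whole chunk is read, modified and recompressed.

    Returns (old_compressed_size, new_compressed_size).
    """
    idx = offset // CHUNK
    plain = bytearray(zlib.decompress(chunks[idx]))
    start = offset - idx * CHUNK
    plain[start:start + len(new_bytes)] = new_bytes
    old_size = len(chunks[idx])
    chunks[idx] = zlib.compress(bytes(plain))
    return old_size, len(chunks[idx])


data = b"abcdefgh" * 32768                 # 256KB of compressible data
chunks = compress_chunks(data)
print(update(chunks, 65536, bytes(range(256)) * 16))  # 4KB update rewrites a 64KB chunk
```

The changed compressed size after the update is the "variable output" problem: the rewritten chunk no longer fits where the old one was stored, which leads to fragmentation.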


IBM RACE Technology


Variable Input Fixed Output

• RACE flips this approach, taking a variable data stream size and producing “fixed” output units

– Compressed volumes have a consistent layout

– Temporal locality: data that’s accessed together is compressed together

– Variable sized input chunks get better compression

– Requires fewer disk I/Os

– Delivers better performance

• No Fragmentation

• Consistent performance over time

• Consistent compression ratio over time
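
The "variable input, fixed output" idea can be illustrated with a greedy packer: writes are added to the current unit for as long as the unit's compressed size stays within a fixed budget. This is only a sketch of the concept, not the RACE algorithm; zlib, the 4KB unit size and the function name are assumptions.

```python
import zlib

UNIT = 4096  # assumed fixed size of one compressed output unit, in bytes


def pack_fixed_units(writes):
    """Group consecutive writes so each group compresses to at most UNIT bytes.

    A single write that alone compresses to more than UNIT is not split here.
    """
    units, current = [], []
    for w in writes:
        candidate = current + [w]
        if current and len(zlib.compress(b"".join(candidate))) > UNIT:
            units.append(current)      # seal the unit; output size stays bounded
            current = [w]
        else:
            current = candidate        # more input fits in the same output unit
    if current:
        units.append(current)
    return units


writes = [bytes([65 + i]) * 8000 for i in range(5)]
for unit in pack_fixed_units(writes):
    print(sum(len(w) for w in unit), "input bytes ->",
          len(zlib.compress(b"".join(unit))), "compressed bytes")
```

Highly compressible data packs many input bytes into one unit, while poorly compressible data fills a unit quickly, so the input size varies and the output size does not.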


Temporal Compression

Applications make multiple updates to data

Traditional and post-process compression uses fixed-sized chunks and compresses each update based on its location on a volume

RACE compression acts on data that is written around the same time (“temporal locality”) not according to location

Temporal locality is more related to real application operations

RACE takes advantage of the structure of the data and its application level relations

Better compression efficiency and performance

[Diagram: data updates 1, 2 and 3 (# = Data Update) shown along a Time axis, contrasting a Temporal Compression Window with a Location Compression Window]

Compressed Data Indexing

• Data is mapped to its location in the compressed container

• Efficient data updates are made possible with remapping

• Hierarchical indexing enables fast access and efficient memory usage

• Efficient write of the map with low I/O overhead


Compression Journaling

• Compressed Data is written in a journal

– Physical location

– Length

– Data

• Journal entries are compressed

• Compressed data populates fixed length blocks

• Enables temporal data compression

• Compressed data write – no read before write

• Compressed data write – less data written to disk
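
Journaling and index remapping can be sketched together as an append-only log plus a map from logical location to journal position. This is a deliberately simplified illustration: the class and attribute names are hypothetical, a flat dictionary replaces the hierarchical index, each entry is compressed on its own with zlib, and there is no packing into fixed-length blocks or reclamation of superseded entries.

```python
import zlib


class Journal:
    def __init__(self):
        self.entries = []   # append-only: (location, length, compressed data)
        self.index = {}     # logical location -> position of the latest entry

    def write(self, location, data):
        # Append and remap; the previous contents are never read back.
        self.entries.append((location, len(data), zlib.compress(data)))
        self.index[location] = len(self.entries) - 1

    def read(self, location):
        _, _, payload = self.entries[self.index[location]]
        return zlib.decompress(payload)


j = Journal()
j.write(7, b"old contents")
j.write(7, b"new contents")    # update = append + remap, no read before write
print(j.read(7), len(j.entries))
```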

[Diagram: writes W1 to W4 arrive over Time, are recorded in the Journal, and appear as C1 to C4 in the Compressed Data blocks]

Progressive Compressed Block Write

• Each write from the host is compressed independently

• Compression ratio of the resulting block is nearly identical to that of compressing the entire data in one operation

• Compression dictionary is preserved between the independent writes
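
The benefit of preserving the dictionary between writes can be demonstrated with zlib's streaming interface, where `Z_SYNC_FLUSH` emits each write's compressed bytes immediately while keeping the history for the next write. This is an analogy using a standard library, not a description of how RACE implements it; the sample data and function names are made up for the demonstration.

```python
import zlib


def one_shot(writes):
    """Compress all data in a single operation."""
    return len(zlib.compress(b"".join(writes)))


def progressive(writes):
    """Compress write by write, flushing after each but keeping the dictionary."""
    c = zlib.compressobj()
    total = 0
    for w in writes:
        total += len(c.compress(w)) + len(c.flush(zlib.Z_SYNC_FLUSH))
    return total + len(c.flush())


def independent(writes):
    """Compress each write with a fresh dictionary."""
    return sum(len(zlib.compress(w)) for w in writes)


writes = [(f"record {i % 7}: customer order status delivered\n" * 20).encode()
          for i in range(50)]
print(one_shot(writes), progressive(writes), independent(writes))
```

The progressive total lands between the one-shot and the fully independent totals; the per-write flush adds a few bytes of overhead, which matters less as writes get larger.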


The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both: IBM, IBM Logo, on demand business logo, Enterprise Storage Server, xSeries, BladeCenter, eServer, ServeRAID and FlashCopy, System Storage, Tivoli, Easy Tier, Active Cloud Engine

The following are trademarks or registered trademarks of other companies.

Intel is a trademark of the Intel Corporation in the United States and other countries.

Java and all Java-related trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries.

Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation.

Linux is a registered trademark of Linus Torvalds.

Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.

SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Storwize and the Storwize logo are trademarks or registered trademarks of Storwize Inc., an IBM Company.

* All other products may be trademarks or registered trademarks of their respective companies.

Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

The information on the new products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new products is for informational purposes only and may not be incorporated into any contract. The information on the new products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

This presentation and the claims outlined in it were reviewed for compliance with US law. Adaptations of these claims for use in other geographies must be reviewed by the local country counsel for compliance with local laws.

Legal Information and Trademarks