
Table Compression in Oracle9i R2

Plamen Zyumbyulev

INSIDE OUT

„Let someone know”

Agenda

– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion

Table Compression Facts

Table compression is useful

Everyone benefits from space savings

It not only saves space but can increase performance

It can’t be implemented everywhere

Why Table Compression ?

Table Compression increases:
– I/O subsystem capacity
– I/O throughput
– query scan performance (mainly FTS)
– buffer cache capacity

Table Compression:
– reduces cost of ownership
– is easy to use
– requires minimal table definition changes
– is transparent to applications

Overview: Table Compression

Compression algorithm is based on removing data redundancy

Tables and Materialized Views can be compressed
– Compression can also be specified at the partition level and at the tablespace level
– Indexes and index-organized tables are not compressed with this method (there are other methods for index and IOT compression)

Compression is dependent upon the actual data

DDL/DML commands are supported on compressed tables

Table columns can be neither added nor deleted from a compressed table.

Which Applications benefit from Table Compression?

Table Compression targets read intensive applications such as Decision Support and OLAP

All schema designs benefit from Compression

Agenda

– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion

How does Table Compression work?

Data is compressed by eliminating duplicate values in a database block

First Name    Last Name
Scott         Smith
Henry         Smith
Henry         Scott
Henry-Scott   McGryen

A dictionary (symbol table) is built per block; the information needed to uncompress the data is available in each block.

If values from the same or from different columns are identical, they share the same symbol table entry. Only entire column values are compressed.

Sequences of columns are compressed as one entity if a sequence of column values occurs multiple times in many rows.

Block Level Compression

Example table:

Invoice   CustName   CustAddr          Sales_amt
1233033   Meyer      11 Homestead Rd   13.99
1212300   Meyer      11 Homestead Rd   1.99
1243012   Meyer      11 Homestead Rd   1.99
9923032   McGryen    3 Main Street     1.99
9833023   McGryen    3 Main Street     1.99
2133056   McGryen    3 Main Street     1.99

Non-compressed block: block header, the six rows with every column value stored in full, and the remaining free space.

Compressed block: block header, a symbol table holding the repeated values (Meyer, 11 Homestead Rd, 1.99, McGryen, 3 Main Street) once, and the rows storing only the unique invoice numbers, the non-repeating value 13.99 and references into the symbol table – leaving considerably more free space.

How Table Compression works

All columns are considered for compression
Only worthwhile compression is performed
A symbol table is created within each database block, depending on the block content
– The self-tuning symbol table is created automatically by the system
– There is no explicit declaration of symbol table entries
– The compression algorithm automatically adapts to changes in data distribution

Which data is compressed

Compression occurs only when data is inserted with a bulk (direct-path) insert operation.

– Direct Path SQL*Loader
– insert /*+ append */ …
– create table … as select …
– alter table … move …
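For example, a minimal sketch of a direct-path load into a compressed table (the table names sales_hist and sales_staging are assumptions):

-- CTAS is itself a direct-path operation, so the copied rows are compressed
create table sales_hist compress as
  select * from sales_staging where 1 = 0;

-- direct-path insert: the appended rows are stored compressed
insert /*+ append */ into sales_hist
  select * from sales_staging;
commit;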

A table can consist of compressed and uncompressed blocks transparently.

Any DML operation can be applied to a table storing compressed blocks. However, conventional DML operations cause records to be stored uncompressed*.

SQL Commands

For a new table:
– create with the compress attribute in the table definition:
  create table … compress

For an existing table:
1. Alter the table to add the compress attribute – only new rows are compressed:
   alter table foo compress;
2. Compress the table – old and new rows are compressed:
   alter table foo move compress;
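As a quick check, a sketch that reads the compression attribute back from the data dictionary (the COMPRESSION column of USER_TABLES shows ENABLED or DISABLED; foo as above):

select table_name, compression
from   user_tables
where  table_name = 'FOO';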

Process of Compressing a Block

Deletes, Inserts and Updates

Deletes, Inserts and Updates are possible but can cause fragmentation and waste disk space when modifying compressed data.

A large PCTFREE will lead to low compression ratios. Setting PCTFREE to 0 (the default for tables created with the compress attribute) is recommended for all tables storing compressed data.
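For instance, a sketch that rebuilds the foo table from the previous slide with both attributes made explicit:

alter table foo move pctfree 0 compress;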

Updates

When a column is updated the algorithm checks whether a symbol table entry for the new value exists.

– If it exists, the reference of the updated column is modified to the new symbol table entry and its reference count is increased by one. At the same time the reference count of the old value is decreased by one.

– If no symbol table entry exists for the new column value, that value is inserted non-compressed into the row.

UPDATE item
SET i_color = 'green'
WHERE i_color = 'blue';

If the old column value ('blue') was also compressed and its reference count becomes zero after the update operation, the old symbol table entry is replaced with a new symbol table entry without having to touch all rows of the block.

Some update operations can take advantage of compression

Deletes

During delete operations, the reference counters of all deleted rows are decreased by one. Once a reference counter becomes zero, the corresponding symbol table entry is purged.

A symbol table is never deleted from a block even if no reference into it exists because the overhead of an empty symbol table is only 4 bytes.

Agenda

– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion

Test Environment:

One very big table – 2.3 TB

The table is partitioned per day.

One partition is around 3.2 GB.

Once the data is loaded and processed it becomes read-only.

Most of the table access is via full table scans (FTS).

Agenda

– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion

Space Savings

Table Compression significantly reduces disk and buffer cache requirements

Compression results mostly depend on the data content at the block level

Definitions:

Compression Factor:  CF = Non-Compressed Blocks / Compressed Blocks

Space Savings:  SS = ((Non-Compressed Blocks – Compressed Blocks) / Non-Compressed Blocks) x 100
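For example (hypothetical numbers): a partition that needs 400,000 blocks non-compressed and 120,000 blocks compressed has CF = 400,000 / 120,000 ≈ 3.3 and SS = ((400,000 – 120,000) / 400,000) x 100 = 70%.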

What affects Compression?

Table Characteristic        High CF   Low CF
Column length               long      short
Number of distinct values   low       high
Block size                  large     small
Sorted data                 yes       no
Column sequence             yes       no
Modified data               no        yes

Estimating CF by using data samples:

create function compression_ratio (tabname varchar2)
return number is
  -- sample percentage
  pct     number := 0.000099;
  -- original block count (should be less than 10k)
  blkcnt  number := 0;
  -- compressed block count
  blkcntc number;
begin
  execute immediate 'create table TEMP_UNCOMPRESSED pctfree 0 as select * from ' ||
                    tabname || ' where rownum < 1';
  while ((pct < 100) and (blkcnt < 1000)) loop
    execute immediate 'truncate table TEMP_UNCOMPRESSED';
    execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' ||
                      tabname || ' sample block (' || pct || ',10)';
    execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) ' ||
                      'from TEMP_UNCOMPRESSED' into blkcnt;
    pct := pct * 10;
  end loop;
  execute immediate 'create table TEMP_COMPRESSED compress as select * from TEMP_UNCOMPRESSED';
  execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) ' ||
                    'from TEMP_COMPRESSED' into blkcntc;
  execute immediate 'drop table TEMP_COMPRESSED';
  execute immediate 'drop table TEMP_UNCOMPRESSED';
  return (blkcnt / blkcntc);
end;
/
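A hypothetical call from an anonymous PL/SQL block (the table name SALES and the dbms_output call are assumptions, not part of the original script):

set serveroutput on
declare
  cf number;
begin
  cf := compression_ratio('SALES');
  dbms_output.put_line('Estimated compression factor: ' || round(cf, 2));
end;
/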

Ordered vs. Not ordered

The biggest CF increase comes from ordering the data

[Bar chart: compression factor (CF, scale 0 to 3.5) for not ordered data vs. ordered data.]

How Data volume affects CF

[Chart: compression factor (scale 2.4 to 3.6) vs. the number of days stored in one partition (1 to 9 days).]

Ordered Data

Input data (20 values, one column per row, ordered):
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

With 5 rows per non-compressed block (4 blocks in total), the compressed data fits into a single block: its symbol table contains 1 2 3 4 5 and the block holds all 20 values. CF = 4.

With 4 rows per non-compressed block (5 blocks in total), not all values fit into the first compressed block: Block1's symbol table contains 1 2 3 4 and the block holds 16 values, while Block2's symbol table contains only 5 and the block holds 4 values. Two compressed blocks give CF = 2.5.

Sorting can also improve the clustering factor of your indexes.
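As a quick check of that side effect, a sketch querying the clustering factor from the data dictionary (the table name SALES is an assumption; statistics must be current):

select index_name, clustering_factor
from   user_indexes
where  table_name = 'SALES';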

Not Ordered Data

Input data (20 values, one column per row, not ordered):
1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

With 5 rows per non-compressed block, all values still fit into a single compressed block whose symbol table contains 1 2 3 4 5, so CF = 4 – the same as for ordered data.

With 4 rows per non-compressed block, the compressed blocks are Block1: 1 2 3 4, Block2: 5 1 2 3, Block3: 4 5 1 2, Block4: 3 4 5 1, Block5: 2 3 4 5. No value repeats within a block, so nothing is compressed and CF = 1.

Choosing the columns to order by

Sorting on fields with very low cardinality does not necessarily yield better compression

The optimal columns to sort on seem to be those that have a table/partition-wide cardinality equal to the number of rows per block

Column correlation should be considered

The process is iterative

Know your data

Without a detailed understanding of the data distribution it is very difficult to predict the optimal ordering.

Table/partition statistics are useful:
– dba_tables
– dba_tab_partitions
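A sketch of pulling the relevant numbers from the optimizer statistics (the table name SALES is an assumption; statistics must have been gathered):

select num_rows / blocks as rows_per_block
from   user_tables
where  table_name = 'SALES';

select column_name, num_distinct
from   user_tab_columns
where  table_name = 'SALES'
order  by num_distinct;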

Looking into a particular data block is very helpful

– substr(rowid, 1, 15)
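For example, a sketch that counts how many rows currently share each data block (substr(rowid, 1, 15) covers the data-object, file and block portion of an extended rowid; the table name SALES is an assumption):

select substr(rowid, 1, 15) as block_id,
       count(*)             as rows_in_block
from   sales
group  by substr(rowid, 1, 15)
order  by rows_in_block desc;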

Improving ordering speed

Set SORT_AREA_SIZE for the session as big as possible. Use a dedicated temporary tablespace with a big extent size (a multiple of SORT_AREA_SIZE + 1 block).

If the sort needs more space:
– The data is split into smaller sort runs; each piece is sorted individually.
– The server process writes the pieces to temporary segments on disk; these segments hold intermediate sort run data while the server works on another sort run.
– The sorted pieces are merged to produce the final result.
– If SORT_AREA_SIZE is not large enough to merge all the runs at once, subsets of the runs are merged in a number of merge passes.
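A sketch of an ordered, compressed rebuild under such settings (the sort area value, the table name SALES_P01 and the ordering columns are assumptions):

alter session set sort_area_size = 104857600;  -- 100 MB for this session (assumed value)

create table sales_p01_comp compress pctfree 0 as
  select *
  from   sales_p01
  order  by cust_id, order_date;               -- columns chosen to maximize duplicates per block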

Agenda

– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion

How CF affects FTS performance

Queries are executed against compressed schema and non-compressed schema

Overall query speedup 65%

[Bar chart: full table scan throughput in thousands of rows per second, non-compressed vs. compressed (CF = 3.34).]

Query Elapsed Time Speedup

The larger the compression factor the larger the elapsed time speedup

Query speedup results from the reduction in I/O operations required
– Speedup depends on the weakness of the I/O subsystem
– Speedup depends on how sparse the blocks are that the query accesses

Performance impact on loads and DML

On a system with unlimited I/O bandwidth, a data load may take twice as long (even longer if the data needs to be ordered).

Bulk loads are I/O-bound on many systems.

Deleting compressed data is 10% faster.

Inserting new data is as fast as inserting into a non-compressed table.

UPDATE operations are 10-20% slower for compressed tables on average, mainly due to some complex optimizations that have been implemented for uncompressed tables and not yet for compressed tables.

Other Performance Tests

– Parallel load performance (CPU)
– Delete operation CPU utilization
– Update operation CPU utilization
– Delete/Update performance
– FTS performance
– Parallel full table scan CPU utilization
– Parallel full table scan I/O performance
– Table access by ROWID

Agenda

– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion

Best Practices

Use Compression in read-intensive applications

Execute bulk loads (SQLLDR and Parallel Insert) to compress rows

Compress older data in large Data Warehouses
– Integrate Table Compression into the ‘rolling window’ paradigm: compress all but the most recent partition (see the example below)

Compress Materialized Views

Only compress infrequently updated tables
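A sketch of compressing an aged partition inside the rolling window (the table, partition and index names are assumptions):

alter table sales move partition sales_2002_06 compress;

-- local index partitions are marked unusable by the move and must be rebuilt
alter index sales_cust_idx rebuild partition sales_2002_06;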

Data normalization and Table Compression

“Normalize till it hurts, denormalize till it works”

High normalization may result in a high number of table joins (bad performance)

Both data normalization and table compression reduce redundancy

Conclusion

Table Compression:
– reduces costs by shrinking the database footprint on disk
– is transparent to applications
– often improves query performance due to reduced disk I/O
– increases buffer cache efficiency

Q U E S T I O N S  &  A N S W E R S

[email protected]