29
Oracle Index study for Event TAG DB M. Boschini [email protected] S. Della Torre [email protected]

Oracle Index study for Event TAG DB M. Boschini [email protected] S. Della Torre [email protected]

Embed Size (px)

Citation preview

Page 1: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Oracle Index study for Event TAG DB

M. Boschini [email protected]. Della Torre [email protected]

Page 2: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Outline TAG DB issue:

space usage issue INDEX accounting for ~ 50% of used space...

query performance Tests with random data Tests with AMS01 data Tests with AMS02 Cosmic data

M. Boschini Oracle Index Study for STATUS DB – TIM October 2008 - CERN

Page 3: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

EVENT TAG DB Goal: test feasibility of storing in an Oracle DB TAG

info for AMS02 events TAG is a 64bit number AMS02 MC has, as for now, no meaningful TAG info

(0 or error bit) We start from previous studies:

http://ams.cern.ch/AMS/Reports/AMSnote-2000_09_05b.ps.gz

http://ams.cern.ch/AMS/Reports/Computing/Apr2004/M.Boschini_LV3_DB.pdf

Page 4: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

EVENT TAG DB

Test INDEX type space usage Test INDEX usage Time insert & selections What can we use ?

Page 5: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

test environment SLC 4.6 (64bit) 2 dual core Intel(R) Xeon(R) CPU @ 2.00GHz Oracle 11g 64 bit

we opted for Oracle11g because of product lifetime All data is fed to DB using C-OCI programs Oracle setup:

user AMSDES with dedicated Bigfile tablespace overkill wrt Oracle's suggestion (>= 1 TB) ...?

default TEMP TBLSpace (3 GB)

Page 6: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

test environment Dummy Table:

RUN (NUMBER) EVENT (NUMBER) TAG (BINARY_DOUBLE)

TAG field will be used for indexing ...assuming queries will be mainly on TAG

10^8 records, equivalent to ~ 10 days of DAQ Test B-Tree vs BitMap Index

as hinted by our previous studies...

Page 7: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Indexes (theory) Btree

default index type in Oracle very space consuming. Not advised for dataware housing

dataware house: write few, read many, HUGE data sample BitMap

advised for dataware housing Bitmap indexes are stored as an array of zero-and-one

values, with one entry for each row. Should be used only on low cardinality fields

Page 8: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Indexes (theory) BitMap

From Oracle Advanced Programming White Paper, Sep. 2008

“...Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems...”

Page 9: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Tests Items to be tested are:

SPACE OCCUPANCY = f(index type) QUERY PERFORMANCE = f(index type)

Index types: B-Tree and Bitmap We started with a random generated sample We then used an AMS01 sample

40 ntuples no request on charge, pmass, pmom, lat (nothing!)

AMS02 Cosmic data

Page 10: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Disk Space TABLE SIZE = 5 GB

this means that for AMS02 we'll need 500 GB B-Tree INDEX SIZE = 4.8 GB

this means that for AMS02 we'd need 480 GB BitMap INDEX SIZE = 70 MB

this means that for AMS02 we'd need 7 GB thus, at least for space reasons, we should use

BitMap INDEX, a factor 30 smaller.

Page 11: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

when to create an INDEX After TABLE has been populated ! B-Tree INDEX creation time:

4 µsec/record mainly SORT and disk space allocation

BitMap INDEX creation time: 0.5 µsec/record

Page 12: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Theory: is an INDEX useful ? Not always... ORACLE has a built-in decision taking algorithm

(EXECUTION PLAN) which decides how to actually implement a query.

Execution plan tries to optimize disk accesses and CPU usage.

So, e.g., if reading an index which returns “many” records takes more than X, than the index is not used.

These are based, since Oracle 10.0.2i on Cost Based Optimizer and Dynamic Sampling (by default ON)

Page 13: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

EXECUTION PLANPLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3591811347

------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |

| 1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:05 |

Page 14: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

EXECUTION PLANPLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3591811347

------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |

| 1 | TABLE ACCESS BY INDEX ROWID| N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:04 |

|* 2 | INDEX RANGE SCAN | TAG_RAND_IDX | 36793 | | 106 (0)| 00:00:02 |

Page 15: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

IDX usage Test To test if Index is actually used, we wrote a simple

PLSQL+Perl program that Dynamically populates statistics table

HOW-MANY <--> TAG_STA For each TAG_STA value, generates and analyzes

EXECUTION PLAN for SELECT run, event from TABLE wher tag=TAG_STA;

We thus run CARDINALITY queries to DB and analyze the Optimizer decisions...

Page 16: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Sample1 CARDINALITY:5relative population > 5.7%

Page 17: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Sample2 CARDINALITY:2517relative population: 6x10E-4

Page 18: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Performance Sample1 and Sample2 both do NOT use INDEX in the

select query. This is because Oracle's built-in optimization algorithm

discovers that too many records will be returned, and thus decides to ignore the INDEX

Thus: LOW/MEDIUM cardinality with HIGH relative

population makes INDEX uneffective

Page 19: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Sample3 CARDINALITY:1024relative population 6x10E-5

Page 20: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Performance Sample3 uses INDEX ! This is because Oracle's builtin optimization algorithm

discovers that TABLE ACCESS BY INDEX ROWID is efficient.

Thus LOW/MEDIUM cardinality with “correct for

Oracle” relative population makes INDEX effective !

Page 21: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Real Data: AMS01 We then used an AMS01 sample

40 ntuples (no request on charge, pmass, pmom, ecc) “scrambled” this sample to get 10^8 records

Average relative cardinality = 2x10E-6 BitMap Index is USED ! Selection time

no IDX: 22 sec IDX: 0.05 sec

INDEX really effective

Page 22: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Real Data: AMS02 !!! We also used AMS02 Cosmic Rays data

we used only 2 2x10E8 events no request on charge, pmass, pmom, ...(nothing!)

Table size: 5.5 GB 2 BitMap IDX size: 57 MB 2

just for the sake of curiosity: a UNIQUE requirement on RUN+EVENTNO creates an INDEX 5 GB big !!!

Btree IDX size: 4.2 GB 2

Page 23: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Real Data: AMS02 !!! Mean Relative cardinality: 1.8x10E-4 Max Relative cardinality: 3x10E-3 BitMap Index can be USED !

tested EXECUTION PLAN for all TAG values.

Page 24: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Real Data: AMS02 !!! Selection time

no IDX: Average Selection time: 24 sec 0.01 sec/record once retrieved

IDX: Average Selection time: 0.05 sec 0.005 sec/record once retrieved

Again, B-Map index is very effective.

Page 25: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

naïve usage... The most naïve way to design the DB is 1 table for

each month of DAQ. This leads to ~36 tables. Most naïve way to query them all is using a UNION

statement which still uses index with nearly no overhead for the SORT

UNIQUE/UNION-ALL part...

Page 26: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

PLAN_TABLE_OUTPUT------------------------------------------------------------------------------------------------------------------------------------Plan hash value: 2624995586

----------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 44 | 1408 | 290K (50)| 00:58:03 || 1 | SORT UNIQUE | | 44 | 1408 | 290K (50)| 00:58:03 || 2 | UNION-ALL | | | | | || 3 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV | 42 | 1344 | 145K (1)| 00:29:10 || 4 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 5 | BITMAP INDEX SINGLE VALUE | COMIX_TDV | | | | || 6 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV1 | 2 | 64 | 144K (1)| 00:28:53 || 7 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 8 | BITMAP INDEX SINGLE VALUE | COMIX_TDV1 | | | | |----------------------------------------------------------------------------------------------

44 rows returned out of 377,516,896 rows in total

Rough estimate for SELECT on 3 years: 25 minutes...

Page 27: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

Conclusions In general, Bit-Map index is very efficient in space usage

30 times smaller than normal Btree index Index is used in queries if relative cardinality is low If so, selection time is very low. According to AMS01 and AMS02 Cosmic Rays

analyzed, BitMap index can be used We could thus use it for AMS02 expected space usage: 500 GB (table)+ 6 GB (idx)

EventStatusTable02 files will use ~ 150 GB...

Page 28: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it
Page 29: Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it

AMS01 sample CARDINALITY:89860

average relative population 5.7%