Open World 2014 CON5193 Oracle In-Memory The Game Changer in Data Warehousing and Business Intelligence
Oracle In-Memory - Game Changer in Data Warehousing and Business Intelligence
Dr.-Ing. Holger Friedrich
03/2012© 2014 sumIT AG
Agenda
• Introduction
• Columnar Stores
• Oracle In-Memory
• Analytics
• Loading
• Conclusions
sumIT AG
• Consulting and implementation services in Switzerland
• Experts for
  - Data Warehousing and
  - Business Intelligence solutions
• Focused on Oracle technology
• ‘BI Foundation specialized’ partner
• ‘Data Warehousing specialized’ partner
• Exalytics competence center with own server
• Our motto: Get Value From Data
• Visit our web site: www.sumit.ch (in German)
Holger Friedrich
• Computer Science diploma of Karlsruhe Institute of Technology (KIT)
• Ph.D. in Robotics and Machine Learning
• More than 16 years of experience with Oracle technology
• Expert for
  - Data Integration,
  - Data Warehousing,
  - Data Mining and
  - Business Intelligence
• Technical Director of sumIT AG
• First Oracle ACE for DWH/BI in Switzerland
Advantages
• Best for queries that
  - scan large quantities of data
  - on a rather small set of columns
  - compute aggregates on the results
• High compression benefits on most columns (except columns whose values are mostly distinct)
Well suited for OLAP/BI
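To make the two advantages concrete, here is a minimal Python sketch, not Oracle's implementation. The sample rows, the column names and the `dictionary_encode` helper are assumptions made up for illustration. It shows that an aggregate over one attribute only has to touch that one column in a columnar layout, and that a column with few distinct values compresses to small integer codes.

```python
# Hypothetical sample data: each row is (store_id, promo_id, amount).
rows = [
    (1, 9999, 10.0),
    (1, 1234, 20.0),
    (2, 9999, 5.0),
    (2, 9999, 15.0),
]

# Row layout: an aggregate over one attribute still walks every full row.
total_row_layout = sum(row[2] for row in rows)

# Column layout: one contiguous list per column; the aggregate reads one list.
columns = {
    "store_id": [r[0] for r in rows],
    "promo_id": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
total_column_layout = sum(columns["amount"])


def dictionary_encode(values):
    """Replace each value by a small integer code plus a lookup dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


# Few distinct values compress well; an all-distinct column would not.
promo_dict, promo_codes = dictionary_encode(columns["promo_id"])
```

With only two distinct promo IDs, every entry of `promo_codes` fits in a single bit, whereas a column of unique keys would need a dictionary as large as the data itself.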
Drawbacks (Up To Now)
• Some operations are very costly
  - DML
  - Queries retrieving entire rows
• Complex DBMS infrastructure has to be built once more
  - storage (management)
  - security
  - clustering
  - disaster recovery
  - …
Less suited for OLTP
Competition
• Niche vendors
  - Exasol
  - HP Vertica
  - Infobright
  - ParAccel
• The usual suspects
  - Microsoft (Columnstore Indexes)
  - IBM
  - Teradata
  - and of course SAP/HANA
Columnar Stores/DBs - Oracle’s Flavour
• transparent column store managed next to the row store
• not either/or
• persistent storage remains row-based as before
• column store kept in sync with DML in real time
• the entire Oracle DB ecosystem remains unchanged
  - security
  - backup
  - disaster recovery
  - RAC
  - …
• NO application changes required!
Technology Gems
1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background refresh of the columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. On-demand building of multi-dimensional aggregation data structure (almost an on-the-fly MOLAP cube)
In-Memory Storage Index
• Column data is stored separately in compression units (IMCUs)
• In-Memory Storage Indexes store Min/Max values for each column for each IMCU
• IMCUs with Min/Max outside a query predicate can be safely ignored during processing
• v$mystat shows information about the number of IMCUs assessed vs. IMCUs pruned
[Figure: SALES table in column format in memory, split into four IMCUs with Min 1 / Max 3, Min 4 / Max 7, Min 8 / Max 12 and Min 13 / Max 15]
Example: Find sales from stores with a store_id of 8 or higher
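The pruning idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `CompressionUnit` and `scan_greater_equal` are hypothetical names, and the values inside each unit are invented so that they match the Min/Max pairs from the slide. For the example predicate store_id >= 8, the first two units are skipped without looking at their data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompressionUnit:
    """Hypothetical stand-in for an IMCU holding one column's values."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        # The "storage index": min/max are computed once, when the unit is built.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_equal(units, threshold):
    """Return matching values and the number of units pruned via min/max."""
    matches, pruned = [], 0
    for unit in units:
        if unit.max_value < threshold:
            pruned += 1  # no value in this unit can satisfy the predicate
            continue
        matches.extend(v for v in unit.values if v >= threshold)
    return matches, pruned


# Invented store_id values matching the Min/Max pairs 1-3, 4-7, 8-12, 13-15.
units = [
    CompressionUnit([1, 2, 3]),
    CompressionUnit([4, 6, 7]),
    CompressionUnit([8, 10, 12]),
    CompressionUnit([13, 14, 15]),
]
matches, pruned = scan_greater_equal(units, 8)
```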
SIMD Vector Processing
• Single Instruction processing Multiple Data values
• Evaluation of a set of column values in a single CPU instruction cycle
• Potential to speed up processing to billions of rows per second
[Figure: the CPU loads multiple PROMO_ID values from memory into a vector register and vector-compares all values with 9999 in 1 cycle]
Example: Find all sales with PROMO_ID 9999
In-Memory Aggregation
• New optimizer transformation: Vector Group By
• Resembles the well-known star transformation
• Two-phase, six-step process
• Phase 1 - preparation
  1. Scan dimensions
  2. Build key vectors
  3. Prepare accumulator
  4. Build temporary tables for the selected dimension attributes
• Phase 2 - computation
  5. Scan facts w.r.t. key vectors
  6. Join filtered facts with temporary tables
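The six steps can be traced in a small Python sketch. It is a simplified model under assumptions, not the optimizer's actual algorithm: the `stores` and `sales` data, the single dimension and the `vector_group_by` function are all hypothetical, and a plain dict stands in for the key vector.

```python
# Hypothetical star schema: one dimension (stores) and one fact table (sales).
stores = [  # (store_id, region)
    (1, "North"), (2, "South"), (3, "North"), (4, "West"),
]
sales = [  # (store_id, amount)
    (1, 10.0), (2, 20.0), (3, 5.0), (4, 7.0), (1, 2.5), (2, 1.0),
]


def vector_group_by(dim_rows, fact_rows, wanted_regions):
    # Phase 1, steps 1-2: scan the dimension and build a key vector that maps
    # each join key to a dense group number (filtered-out keys are absent).
    group_of_region = {}
    key_vector = {}
    for store_id, region in dim_rows:
        if region in wanted_regions:
            group = group_of_region.setdefault(region, len(group_of_region))
            key_vector[store_id] = group

    # Step 3: prepare one accumulator slot per group.
    accumulator = [0.0] * len(group_of_region)

    # Step 4: keep the selected dimension attribute per group ("temp table").
    temp_table = {group: region for region, group in group_of_region.items()}

    # Phase 2, step 5: scan the facts once, filtering and accumulating
    # through the key vector.
    for store_id, amount in fact_rows:
        group = key_vector.get(store_id)
        if group is not None:
            accumulator[group] += amount

    # Step 6: join the accumulated groups back to the dimension attributes.
    return {temp_table[g]: total for g, total in enumerate(accumulator)}


result = vector_group_by(stores, sales, {"North", "South"})
```

The fact table is read exactly once, and the join plus the grouping collapse into an array lookup per fact row, which is what makes this transformation attractive for star-shaped queries.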
In-Memory Aggregation - XPLAN
In-Memory on RAC Including Fault Tolerance
• Distribution of large objects’ in-memory compression units (IMCUs)
  - automatically (default)
  - BY ROWID RANGE
  - BY {SUB}PARTITION
• Fault tolerance (engineered systems only)
  - DUPLICATE clause to keep redundant IMCU copies on nodes
  - DUPLICATE ALL = each IMCU copied to every node
Assessment
• The In-Memory option can improve query performance dramatically
• Data scanning benefits in particular
• Joins & vector group-by aggregations are accelerated as well
• However, it is advanced technology, not magic
• Sorting, classic aggregation etc. still take time
[Figure: timelines (t) for Row Store and In-Memory, each split into the phases Scan Data, Aggregate and Join / Sort / Group / …]
Unprecedented Performance for…
• Reporting queries
  - Simple
  - SQL*Analytics
• (Tool based) OLAP
• Dimensional queries
Simple Reporting Queries
• Query characteristics
  - few joins
  - simple one-step aggregations (if any)
  - lots of filtering
  - sometimes many rows to be displayed
• Processing
  - scanning in the columnar store uses IMCU storage indexes
  - joins by Bloom filtering applied on the columnar store
  - scanning and joining effort far outweighs other processing effort
  - but a large number of rows may need time to transfer to the client
  - SIMD computation can be used on a large scale
• In-Memory impact
  - high performance gains
SQL*Analytics Reporting Queries
• Query characteristics
  - some joins
  - complex analytic functions
  - lots of filtering
  - often many attributes to be displayed
• Processing
  - scanning in the columnar store uses IMCU storage indexes
  - joins by Bloom filtering applied on the columnar store
  - share of processing effort other than scanning and joining rises
  - SIMD computation can be used
• In-Memory impact
  - performance gain, but smaller than for simpler reporting queries
(Tool Based) OLAP
• Query characteristics
  - horrendously complex queries
  - chaining of WITH clauses
  - complex analytic functions and aggregations
• Processing
  - short scanning time
  - hard for the optimizer to find an efficient plan
  - materialization of temporary results ‘breaks’ pure columnar processing
  - intermediate computation effort exceeds the columnar in-memory share of effort
• In-Memory impact
  - gain of using the In-Memory option depends on query complexity
  - the need for pre-computing (some) aggregates remains
Dimensional Queries
• Characteristics
  - few simple joins (star shape)
  - filtering on dimensions
  - most aggregations along dimension attributes
  - massive amount of facts
  - sometimes massive dimensions
• Technology & consequences
  - short scanning time
  - application of the optimizer's new vector-group-by transformation
• In-Memory impact
  - high performance gain
Different Reporting Queries
Acceleration Of Reporting Queries
report type             | no of rows      | result set | row store (SGA) | columnar store (SGA) | times X
simple                  | 400K            | 35         | 10ms            | 2ms                  | 5
join (bloom)            | 14M & 55K       | 2M         | 25s             | 25s                  | 1
join, top10 (analytics) | 14M & 55K       | 10         | 2s              | 1s                   | 2
dimensional (vector by) | 14M & 1.8K & 72 | 88         | 8s              | 0.8s                 | 10
• Demo comparing SGA row based vs. in-memory columnar store
• Small Virtual Machine
• No SIMD support in demo environment
• Serial execution
Higher gains on enterprise infrastructure
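The "times X" column is simply the row-store time divided by the columnar-store time. A short Python check, with the measured times taken from the table above and converted to seconds, reproduces the factors; the dictionary layout is just a convenience for this sketch.

```python
# Elapsed times in seconds, read from the table: (row store, columnar store).
measurements = {
    "simple": (0.010, 0.002),
    "join (bloom)": (25.0, 25.0),
    "join, top10 (analytics)": (2.0, 1.0),
    "dimensional (vector by)": (8.0, 0.8),
}

# Speed-up factor = row-store time / columnar-store time.
speedups = {
    report: round(row_store / columnar, 1)
    for report, (row_store, columnar) in measurements.items()
}
```

The join case with a 2M-row result set shows no gain at all, which fits the earlier remark that transferring many rows to the client dominates such queries.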
Data Quality & Consistency Assessment
• Typical tests
  - column value checks
  - intra-row checks
  - inter-row checks
  - inter-table checks
• Challenge
  - often complex conditions
  - functions have to be applied
  - costly, also in the columnar store
  - e.g. not REGEXP_LIKE (ssnumber, '\d{3}\.\d{4}\.\d{4}\.\d{2}')
• Observation
  - gain depends significantly on test complexity
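A column value check of this kind has to evaluate a function on every value, which is why it stays costly regardless of the storage format. The following Python sketch applies the same pattern to a few invented sample values. The `invalid_ssnumbers` helper and the sample data are assumptions, and two deliberate differences from the SQL are labeled in the comments.

```python
import re

# Same pattern as in the REGEXP_LIKE example. Assumption: fullmatch anchors
# both ends, which is stricter than REGEXP_LIKE matching anywhere in the value.
SSN_PATTERN = re.compile(r"\d{3}\.\d{4}\.\d{4}\.\d{2}")


def invalid_ssnumbers(values):
    """Return all values failing the format check.

    Assumption: missing values (None) are reported as invalid here, whereas
    a NOT REGEXP_LIKE predicate in SQL would not return NULL rows.
    """
    return [v for v in values if v is None or not SSN_PATTERN.fullmatch(v)]


# Invented sample values, for illustration only.
sample = ["756.1234.5678.97", "756-1234-5678-97", "756.1234.5678", None]
bad = invalid_ssnumbers(sample)
```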
Meta Data Transformation During ETL
• Typical scenario
  - transformation of source-dependent (domain) data into DWH standard representation
  - usually using mapping tables
  - e.g. sourcesys='SAP' and gender = '0' => return 'male'
  - typical case of joins without aggregation
• Challenge
  - staging tables initially not in column store
• Strategy
  - populate only the columns to be transformed into the column store
  - check population time vs. speed gain
[Figure: mapping table and staged source with the columns gender, DWH gender, src system and gender; annotations: "src system is JD Edwards", "gender entries for all rows"]
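The mapping-table lookup is a join without aggregation, which a few lines of Python can model. Only the ('SAP', '0') => 'male' entry comes from the slide. The other mapping entries, the "JDE" code, the `to_dwh_gender` function and the "unknown" default are assumptions added for illustration.

```python
# Mapping table: (source system, source code) -> DWH standard value.
# Only the first entry is from the slide; the rest are hypothetical.
GENDER_MAPPING = {
    ("SAP", "0"): "male",
    ("SAP", "1"): "female",
    ("JDE", "M"): "male",
    ("JDE", "F"): "female",
}


def to_dwh_gender(staged_rows, mapping, default="unknown"):
    """Look up each (sourcesys, gender) pair, like an outer join to the mapping table."""
    return [
        {**row, "dwh_gender": mapping.get((row["sourcesys"], row["gender"]), default)}
        for row in staged_rows
    ]


# Hypothetical staged rows; only the two lookup columns matter for the join.
staged = [
    {"id": 1, "sourcesys": "SAP", "gender": "0"},
    {"id": 2, "sourcesys": "JDE", "gender": "F"},
    {"id": 3, "sourcesys": "JDE", "gender": "?"},
]
transformed = to_dwh_gender(staged, GENDER_MAPPING)
```

Only `sourcesys` and `gender` take part in the lookup, which mirrors the strategy of populating just the columns to be transformed into the column store.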
Key Transformations
• Typical scenario
  - transformation of source-dependent natural/business keys into DWH-owned surrogate representation
  - reverse lookups for data mart loading
  - multiple (outer) joins against target tables
  - typical case of (outer) joins without aggregation
• Challenge
  - staging tables initially not in column store
• Strategy
  - populate only rows to be transformed into column store
  - check population time vs. speed gain
  - works also with lookup tables in columnar and staging table in row format
Example Key Lookup Query
select s.invoicenumber, s.year, s.audit_id, s.cutoffdt,
       r.id invoice_id, m.id member_id
from (select * from st_db_rechnung_in_t where rownum < 100000) s,
     db_rechnung_ht r,
     pv_mitglied_ht m
where s.invoicenumber = m.invoicenumber (+)
  and s.invoicenumber = r.invoicenumber (+)
  and s.year = r.year (+)
  and s.incoiceitem = r.incoiceitem (+)
  and s.srcmodifieddt > SYSDATE-720
1. scan staging table
2. take last 2 years
3. outer join to lookup tables
4. return DWH-IDs plus some other stuff
Chaining Of Bloom Filters
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     | 99999 |  9081K|       |  2848   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER               |                     | 99999 |  9081K|  8008K|  2848   (1)| 00:00:01 |
|   2 |   JOIN FILTER CREATE           | :BF0000             | 99999 |  6835K|       |  1090   (1)| 00:00:01 |
|*  3 |    HASH JOIN OUTER             |                     | 99999 |  6835K|  6840K|  1090   (1)| 00:00:01 |
|   4 |     JOIN FILTER CREATE         | :BF0001             | 99999 |  5664K|       |   278   (1)| 00:00:01 |
|*  5 |      VIEW                      |                     | 99999 |  5664K|       |   278   (1)| 00:00:01 |
|*  6 |       COUNT STOPKEY            |                     |       |       |       |            |          |
|   7 |        TABLE ACCESS FULL       | ST_DB_RECHNUNG_IN_T | 99999 |  3613K|       |   278   (1)| 00:00:01 |
|   8 |     JOIN FILTER USE            | :BF0001             |   395K|  4637K|       |    29   (7)| 00:00:01 |
|*  9 |      TABLE ACCESS INMEMORY FULL| PV_MITGLIED_HT      |   395K|  4637K|       |    29   (7)| 00:00:01 |
|  10 |   JOIN FILTER USE              | :BF0000             |   781K|    17M|       |    73  (13)| 00:00:01 |
|* 11 |    TABLE ACCESS INMEMORY FULL  | DB_RECHNUNG_HT      |   781K|    17M|       |    73  (13)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
1. scan staging table
2. create Bloom filters for the lookup tables
3. apply Bloom filters
4. hash-join Bloom filter false positives
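The four steps can be illustrated with a toy Bloom filter in Python. This is a conceptual sketch under assumptions, not Oracle's join filter: the `BloomFilter` class, its size and hash scheme, the `bloom_outer_join` function and the sample rows are all hypothetical, and only a single lookup table is joined.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; size and hash count are arbitrary choices."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_outer_join(staging, lookup):
    # Steps 1-2: scan the staging rows and build the filter from their join keys.
    bloom = BloomFilter()
    for row in staging:
        bloom.add(row["invoicenumber"])
    # Step 3: apply the filter while scanning the (large) lookup table.
    candidates = [r for r in lookup if bloom.might_contain(r["invoicenumber"])]
    # Step 4: hash join on the survivors; false positives find no partner.
    ids = {r["invoicenumber"]: r["id"] for r in candidates}
    joined = [{**s, "invoice_id": ids.get(s["invoicenumber"])} for s in staging]
    return joined, len(candidates)


staging = [{"invoicenumber": n} for n in (1, 2, 99)]
lookup = [{"invoicenumber": n, "id": 1000 + n} for n in range(1, 51)]
joined, candidate_count = bloom_outer_join(staging, lookup)
```

Because the join is an outer join, the staging row with invoice number 99 survives with an empty `invoice_id`, while most of the 50 lookup rows never reach the hash join.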
Conclusions
• Oracle In-Memory is a game changer on the DWH/BI market
  - in contrast to niche players it is absolutely enterprise ready
  - in contrast to the other big players its use requires no modifications
• Therefore, In-Memory provides a big leap in performance with
  - low risks
  - low project, infrastructure, maintenance & development cost
• However, In-Memory is no silver bullet
• Speed-up depends strongly on query complexity
• Good design of ETL processes & analyses remains important
• Powerful infrastructure is still required (think about using Oracle Engineered Systems)