CON5193: Oracle In-Memory - Game Changer in Data Warehousing and Business Intelligence

Dr.-Ing. Holger Friedrich

Agenda

• Introduction
• Columnar Stores
• Oracle In-Memory
• Analytics
• Loading
• Conclusions

sumIT AG

• Consulting and implementation services in Switzerland
• Experts for
  – Data Warehousing and
  – Business Intelligence solutions
• Focused on Oracle technology
• ‘BI Foundation specialized’ partner
• ‘Data Warehousing specialized’ partner
• Exalytics competence center with own server
• Our motto: Get Value From Data
• Visit our web site: www.sumit.ch (in German)

Holger Friedrich

• Computer Science diploma of Karlsruhe Institute of Technology (KIT)
• Ph.D. in Robotics and Machine Learning
• More than 16 years of experience with Oracle technology
• Expert for
  – Data Integration
  – Data Warehousing
  – Data Mining and
  – Business Intelligence
• Technical Director of sumIT AG
• First Oracle ACE for DWH/BI in Switzerland


Advantages

• Best for queries that
  - scan large quantities of data
  - on a rather small set of columns
  - compute aggregates on the results
• High compression benefits on most columns (except those containing mostly distinct values)

Well suited for OLAP/BI

Drawbacks (Up To Now)

• Some operations are very costly
  - DML
  - Queries retrieving entire rows
• Complex DBMS infrastructure has to be built once more
  - storage (management)
  - security
  - clustering
  - disaster recovery
  - …

Less suited for OLTP

Competition

• Niche vendors
  - Exasol
  - HP Vertica
  - Infobright
  - ParAccel
• The usual suspects
  - Microsoft (Columnstore Indexes)
  - IBM
  - Teradata
  - and of course SAP/HANA


Columnar Stores/DBs - Oracle’s Flavour

• transparent column store managed next to the row store
• not either/or
• persistent storage row-based as before
• column store DML-synched in real-time
• the entire Oracle DB-ecosphere remains unchanged
  - security
  - backup
  - disaster recovery
  - RAC
  - …
• NO application changes required!

Technology Gems

1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background refresh of the columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. On-demand building of a multi-dimensional aggregation data structure (almost an on-the-fly MOLAP cube)

In-Memory Storage Index

• Column data is stored separately in in-memory compression units (IMCUs)
• In-Memory Storage Indexes store Min/Max values for each column for each IMCU
• IMCUs whose Min/Max range lies outside a query predicate can be safely ignored during processing
• v$mystat shows information about the number of IMCUs assessed vs. IMCUs pruned

Figure: SALES table in column format in memory, split into four IMCUs with store_id ranges Min 1 / Max 3, Min 4 / Max 7, Min 8 / Max 12 and Min 13 / Max 15.

Example: Find sales from stores with a store_id of 8 or higher
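The pruning in this example can be sketched in a few lines of Python. This is only a conceptual model, assuming each IMCU keeps a Min/Max pair per column; the `IMCU` class, the function name and the individual store_id values inside each unit are hypothetical, and only the Min/Max ranges come from the slide.

```python
# Conceptual model of Min/Max pruning; not Oracle's internal implementation.
from dataclasses import dataclass


@dataclass
class IMCU:
    """One in-memory compression unit holding store_id values (hypothetical)."""
    values: list

    @property
    def min_value(self):
        return min(self.values)

    @property
    def max_value(self):
        return max(self.values)


def scan_greater_equal(imcus, threshold):
    """Return matching values plus the number of IMCUs scanned and pruned."""
    matches, scanned, pruned = [], 0, 0
    for imcu in imcus:
        if imcu.max_value < threshold:
            pruned += 1  # no value in this unit can satisfy the predicate
            continue
        scanned += 1
        matches.extend(v for v in imcu.values if v >= threshold)
    return matches, scanned, pruned


# Min/Max ranges as on the slide: 1-3, 4-7, 8-12, 13-15
imcus = [IMCU([1, 2, 3]), IMCU([4, 5, 7]), IMCU([8, 10, 12]), IMCU([13, 15])]
matches, scanned, pruned = scan_greater_equal(imcus, 8)
print(matches, scanned, pruned)
```

For the predicate store_id >= 8, the first two units are skipped on their Max value alone, so only half of the column data has to be touched.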

SIMD Vector Processing

• Single Instruction processing Multiple Data values

• Evaluation of a set of column values in a single CPU instruction cycle

• Potential to speed up processing to billions of rows per second

Figure: multiple PROMO_ID values are loaded from memory and vector-compared against 9999 in one cycle.

Example: Find all sales with PROMO_ID 9999
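The idea behind the vector compare can be modelled as follows. Pure Python has no SIMD instructions, so this sketch only imitates the behaviour: the column is processed in chunks, and each chunk stands for one vector compare. The lane count of 8 and all names and sample values are assumptions made for illustration.

```python
# Conceptual model only: each call to vector_compare stands for ONE SIMD
# instruction that tests all lanes at once. Real SIMD runs in hardware.
LANES = 8  # assumed number of column values per vector register


def vector_compare(lane_values, target):
    """Model a single vector compare: one result flag per lane."""
    return [value == target for value in lane_values]


def count_matches(column, target, lanes=LANES):
    """Scan a column chunk by chunk, counting matches and modelled compares."""
    matches, compares = 0, 0
    for start in range(0, len(column), lanes):
        mask = vector_compare(column[start:start + lanes], target)
        compares += 1
        matches += sum(mask)
    return matches, compares


promo_ids = [9999, 17, 9999, 42, 9999, 9999, 3, 8, 9999, 5]  # hypothetical data
matches, compares = count_matches(promo_ids, 9999)
print(matches, compares)
```

Ten values need two modelled compares instead of ten scalar ones, which is where the potential speed-up comes from.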

In-Memory Aggregation

• New optimizer transformation: Vector Group By
• Resembles the well-known star transformation
• Two-phase, six-step process
• Phase 1 - preparation
  1. Scan dimensions
  2. Build key vectors
  3. Prepare accumulator
  4. Build temporary tables for the selected dimension attributes
• Phase 2 - computation
  5. Scan facts w.r.t. key vectors
  6. Join filtered facts with temporary tables
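The six steps can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not Oracle's implementation: the tables, column names, the filter on region and the use of dictionaries for key vectors are all hypothetical.

```python
# Simplified model of vector group by; all data and names are hypothetical.
stores = [(1, "North"), (2, "South"), (3, "North"), (4, "West")]  # (store_id, region)
products = [(10, "Food"), (11, "Toys"), (12, "Food")]             # (product_id, category)
sales = [(1, 10, 5.0), (2, 11, 7.5), (3, 12, 2.5), (4, 10, 9.0), (1, 11, 1.0)]


def build_key_vector(dimension, keep):
    """Steps 1, 2 and 4: scan a dimension, map join keys to dense group ids,
    and remember the selected attribute for each group id."""
    key_vector, group_ids = {}, {}
    for join_key, attribute in dimension:
        if keep(attribute):
            key_vector[join_key] = group_ids.setdefault(attribute, len(group_ids))
    attributes = {dense_id: attr for attr, dense_id in group_ids.items()}
    return key_vector, attributes


store_kv, store_attrs = build_key_vector(stores, lambda region: region != "West")
product_kv, product_attrs = build_key_vector(products, lambda category: True)

# Step 3: accumulator with one cell per (region, category) combination
accumulator = [[None] * len(product_attrs) for _ in store_attrs]

# Step 5: scan the facts; rows without a key vector entry are filtered out
for store_id, product_id, amount in sales:
    s = store_kv.get(store_id)
    p = product_kv.get(product_id)
    if s is None or p is None:
        continue
    accumulator[s][p] = (accumulator[s][p] or 0.0) + amount

# Step 6: join the filled cells back to the dimension attributes
result = {
    (store_attrs[s], product_attrs[p]): total
    for s, row in enumerate(accumulator)
    for p, total in enumerate(row)
    if total is not None
}
print(result)
```

The fact table is scanned once, and aggregation happens directly in the small accumulator array, so no large hash join between facts and dimensions is needed in this model.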

In-Memory Aggregation - XPLAN

In-Memory on RAC Including Fault Tolerance

• Distribution of large objects’ in-memory compression units (IMCUs)
  – automatically (default)
  – BY ROWID RANGE
  – BY {SUB}PARTITION
• Fault tolerance (engineered systems only)
  – DUPLICATE clause to keep redundant IMCU copies on nodes
  – DUPLICATE ALL = each IMCU copied to every node

Assessment

• The In-Memory option can improve query performance dramatically
• Data scanning benefits in particular
• Joins and vector group by aggregations are accelerated as well
• However, it is advanced technology, not magic
• Sorting, classic aggregation etc. still take time

Figure: processing effort for Row Store vs. In-Memory, split into Scan Data, Join / Sort / Group / … and Aggregate.


Unprecedented Performance for…

• Reporting queries
  - Simple
  - SQL*Analytics
• (Tool based) OLAP
• Dimensional queries

Simple Reporting Queries

• Query characteristics
  - few joins
  - simple one-step aggregations (if at all)
  - lots of filtering
  - sometimes many rows to be displayed
• Processing
  - scanning in the columnar store uses IMCU storage indexes
  - joins by Bloom filtering applied on the columnar store
  - scanning and joining effort far outweighs other processing effort
  - but a large number of rows may need time to transfer to the client
  - SIMD computation can be used on a large scale
• In-Memory impact
  – high performance gains

SQL*Analytics Reporting Queries

• Query characteristics
  - some joins
  - complex analytic functions
  - lots of filtering
  - often many attributes to be displayed
• Processing
  - scanning in the columnar store uses IMCU storage indexes
  - joins by Bloom filtering applied on the columnar store
  - share of processing effort other than scanning and joining rises
  - SIMD computation can be used
• In-Memory impact
  – performance gain, but smaller than for simpler reporting queries

(Tool Based) OLAP

• Query characteristics
  – horrendously complex queries
  – chaining of WITH clauses
  – complex analytic functions and aggregations
• Processing
  - short scanning time
  - hard for the optimizer to find an efficient plan
  - materialization of temporary results ‘breaks’ pure columnar processing
  - intermediate computation effort exceeds the columnar in-memory share of effort
• In-Memory impact
  – gain of using the In-Memory option depends on query complexity
  – the need for pre-computing (some) aggregates remains

Dimensional Queries

• Characteristics
  - few simple joins (star shape)
  - filtering on dimensions
  - most aggregations along dimension attributes
  - massive amount of facts
  - sometimes massive dimensions
• Technology & consequences
  - short scanning time
  - application of the optimizer's new vector group by transformation
• In-Memory impact
  – high performance gain

Different Reporting Queries

Acceleration Of Reporting Queries

report type             | no of rows      | result set | row store (SGA) | columnar store | times X
simple                  | 400K            | 35         | 10ms            | 2ms            | 5
(bloom)                 | 14M & 55K       | 2M         | 25s             | 25s            | 1
join, top10 (analytics) | 14M & 55K       | 10         | 2s              | 1s             | 2
dimensional (vector by) | 14M & 1.8K & 72 | 88         | 8s              | 0.8s           | 10

• Demo comparing SGA row-based vs. in-memory columnar store
• Small virtual machine
• No SIMD support in demo environment
• Serial execution

Higher gains can be expected on enterprise infrastructure


Data Quality & Consistency Assessment

• Typical tests
  - column value checks
  - intra row checks
  - inter row checks
  - inter table checks
• Challenge
  - often complex conditions
  - functions have to be applied
  - costly, also in columnar store
  - e.g. not REGEXP_LIKE (ssnumber, '\d{3}\.\d{4}\.\d{4}\.\d{2}')
• Observation
  - gain depends significantly on test complexity
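A column value check of this kind can be sketched in Python with the same pattern. This is an illustration, not the code used in the demo: the row layout, the sample values and the function name are hypothetical. One deliberate difference is labelled in the code: missing values are reported as failures here, whereas the SQL predicate `not REGEXP_LIKE(...)` would not return rows with a NULL ssnumber.

```python
import re

# Same pattern as on the slide: digit groups of 3, 4, 4 and 2 separated by dots.
# re.search is unanchored, like REGEXP_LIKE.
SSN_PATTERN = re.compile(r"\d{3}\.\d{4}\.\d{4}\.\d{2}", re.ASCII)


def failed_rows(rows):
    """Return the rows whose ssnumber fails the format check.

    Assumption: None is treated as a failure (unlike the SQL predicate,
    which evaluates to NULL for NULL input and drops the row).
    """
    return [
        row for row in rows
        if row["ssnumber"] is None or not SSN_PATTERN.search(row["ssnumber"])
    ]


rows = [  # hypothetical sample data
    {"id": 1, "ssnumber": "756.1234.5678.97"},
    {"id": 2, "ssnumber": "756-1234-5678-97"},
    {"id": 3, "ssnumber": "7561234567897"},
    {"id": 4, "ssnumber": None},
]
bad_ids = [row["id"] for row in failed_rows(rows)]
print(bad_ids)
```

The regular expression has to be evaluated per value, which is why such checks stay comparatively expensive even when the column itself is scanned quickly.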

Meta Data Transformation During ETL

• Typical scenario
  - transformation of source dependent (domain) data into DWH standard representation
  - usually using mapping tables
  - e.g. sourcesys = 'SAP' and gender = '0' => return 'male'
  - typical case of joins without aggregation
• Challenge
  - staging tables initially not in column store
• Strategy
  - populate only columns to be transformed into column store
  - check population time vs. speed gain
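The mapping-table lookup from the scenario above can be sketched like this. Only the SAP entry ('0' => 'male') comes from the slide; every other mapping entry, the "JDE" system code and all function and column names are hypothetical and serve only to make the example runnable.

```python
# Hypothetical mapping table: (source system, domain, source value) -> DWH value.
# Only the ("SAP", "gender", "0") entry is taken from the slide.
MAPPING = {
    ("SAP", "gender", "0"): "male",
    ("SAP", "gender", "1"): "female",   # assumed for illustration
    ("JDE", "gender", "M"): "male",     # assumed for illustration
    ("JDE", "gender", "F"): "female",   # assumed for illustration
}


def standardize(rows, domain, mapping):
    """Outer-join staged rows to the mapping table on (sourcesys, domain, value).

    Values without a mapping entry become None, as in an outer join.
    """
    return [
        {**row, domain: mapping.get((row["sourcesys"], domain, row[domain]))}
        for row in rows
    ]


staged = [  # hypothetical staged source rows
    {"sourcesys": "SAP", "gender": "0"},
    {"sourcesys": "JDE", "gender": "M"},
    {"sourcesys": "SAP", "gender": "9"},
]
genders = [row["gender"] for row in standardize(staged, "gender", MAPPING)]
print(genders)
```

Each staged row needs exactly one lookup and no aggregation, which matches the "joins without aggregation" pattern named in the scenario.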

Figure: staged source table joined to the mapping table; the source system is JD Edwards, with gender entries for all rows.

Key Transformations

• Typical scenario
  - transformation of source dependent natural/business keys into DWH owned surrogate representation
  - reverse lookups for data mart loading
  - multiple (outer) joins against target tables
  - typical case of (outer) joins without aggregation
• Challenge
  - staging tables initially not in column store
• Strategy
  - populate only rows to be transformed into column store
  - check population time vs. speed gain
  - also works with lookup tables in columnar format and the staging table in row format

Example Key Lookup Query

select s.invoicenumber, s.year, s.audit_id, s.cutoffdt,
       r.id invoice_id, m.id member_id
from   (select * from st_db_rechnung_in_t where rownum < 100000) s,
       db_rechnung_ht r,
       pv_mitglied_ht m
where  s.invoicenumber = m.invoicenumber (+)
and    s.invoicenumber = r.invoicenumber (+)
and    s.year = r.year (+)
and    s.incoiceitem = r.incoiceitem (+)
and    s.srcmodifieddt > SYSDATE-720

1. scan staging table
2. take last 2 years
3. outer join to lookup tables
4. return DWH-IDs plus some other stuff

Chaining Of Bloom Filters

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     | 99999 |  9081K|       |  2848   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER               |                     | 99999 |  9081K|  8008K|  2848   (1)| 00:00:01 |
|   2 |   JOIN FILTER CREATE           | :BF0000             | 99999 |  6835K|       |  1090   (1)| 00:00:01 |
|*  3 |    HASH JOIN OUTER             |                     | 99999 |  6835K|  6840K|  1090   (1)| 00:00:01 |
|   4 |     JOIN FILTER CREATE         | :BF0001             | 99999 |  5664K|       |   278   (1)| 00:00:01 |
|*  5 |      VIEW                      |                     | 99999 |  5664K|       |   278   (1)| 00:00:01 |
|*  6 |       COUNT STOPKEY            |                     |       |       |       |            |          |
|   7 |        TABLE ACCESS FULL       | ST_DB_RECHNUNG_IN_T | 99999 |  3613K|       |   278   (1)| 00:00:01 |
|   8 |     JOIN FILTER USE            | :BF0001             |   395K|  4637K|       |    29   (7)| 00:00:01 |
|*  9 |      TABLE ACCESS INMEMORY FULL| PV_MITGLIED_HT      |   395K|  4637K|       |    29   (7)| 00:00:01 |
|  10 |   JOIN FILTER USE              | :BF0000             |   781K|    17M|       |    73  (13)| 00:00:01 |
|* 11 |    TABLE ACCESS INMEMORY FULL  | DB_RECHNUNG_HT      |   781K|    17M|       |    73  (13)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

1. scan staging table
2. create Bloom filters for the lookup tables
3. apply Bloom filters
4. hash-join to eliminate Bloom filter false positives
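The four steps can be modelled with a small Bloom filter in Python. This is a minimal sketch under assumptions, not how Oracle implements join filters: the filter size, the number of hash functions, the SHA-256 based hashing and the sample tables are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash count are arbitrary illustration values."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False means definitely absent; True may be a false positive."""
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical data: a small staging table and a larger lookup table
staging = [{"invoicenumber": n} for n in (101, 102, 103)]
lookup = [{"invoicenumber": n, "id": n * 10} for n in range(100, 200)]

# Steps 1 and 2: scan the staging table and create the filter from its join keys
bloom = BloomFilter()
for row in staging:
    bloom.add(row["invoicenumber"])

# Step 3: apply the filter while scanning the lookup table
candidates = [row for row in lookup if bloom.might_contain(row["invoicenumber"])]

# Step 4: the (outer) hash join discards any false positives that slipped through
ids_by_key = {row["invoicenumber"]: row["id"] for row in candidates}
joined = [(row["invoicenumber"], ids_by_key.get(row["invoicenumber"])) for row in staging]
print(len(candidates), joined)
```

A Bloom filter never produces false negatives, so every staging key survives step 3; the final hash join is still needed because a few non-matching lookup rows may pass the filter.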

Conclusions

• Oracle In-Memory is a game changer in the DWH/BI market
  – in contrast to niche players, it is absolutely enterprise ready
  – in contrast to the other big players, its use requires no modifications
• Therefore, In-Memory provides a big leap in performance with
  - low risks
  - low project, infrastructure, maintenance and development costs
• However, In-Memory is no silver bullet
• Speed-up depends heavily on query complexity
• Good design of ETL processes & analyses remains important
• Powerful infrastructure is still required (think about using Oracle Engineered Systems)
