Distributed Architecture of Oracle Database In-Memory
Niloy Mukherjee, Shasank Chavan, Maria Colgan, Dinesh Das, Mike Gleeson, Sanket Hase, Allison Holloway, Hui Jin, Jesse Kamp, Kartik Kulkarni, Tirthankar Lahiri, Juan Loaiza, Neil Macnaughton, Vineet Marwah, Atrayee Mullick, Andy Witkowski, Jiaqi Yan, Mohamed Zait

Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent


Overview

1. Motivation
   – Trends and current solutions
2. Solution
   – Real Application Clusters
   – Oracle Database In-Memory
3. Preliminary evaluation
   – Some test results
4. Conclusion

Motivation: Why Do We Need This?

Data Trends
● Deluge of data
● Ad-hoc real-time analysis

Typical Solution

● ETL: extract, transform, load
● Analyze data in a dedicated system
[Figure: OLTP application and OLAP application connected by ETL]
● Complexity and manageability overhead!
● No real-time analytics

Data Format

● Columnar format
  – Great for OLAP
  – Fast scans of a single column
● Row format
  – Great for OLTP
  – Handles entire rows
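A minimal Python sketch of the trade-off (plain Python, not Oracle internals): the same hypothetical three-row table is kept once as rows and once as columns. Fetching one order touches a single row tuple, while summing `amount` touches only that one column list. The table, its column names, and the helper functions are illustrative assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]

def to_columns(rows: Sequence[Row], names: Sequence[str]) -> Dict[str, List[Any]]:
    """Pivot a list of rows into one list per column (columnar layout)."""
    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns

# Hypothetical sample table: (order_id, product, amount)
NAMES = ("order_id", "product", "amount")
ROWS = [(1, "shoes", 50.0), (2, "hats", 20.0), (3, "shoes", 75.0)]
COLUMNS = to_columns(ROWS, NAMES)

# OLTP-style access: the row layout keeps every field of one order together.
def get_order(order_id: int) -> Row:
    return next(row for row in ROWS if row[0] == order_id)

# OLAP-style access: the column layout reads only the values being aggregated.
def total_amount() -> float:
    return sum(COLUMNS["amount"])
```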

Hardware Trends
● More cores, more processors
● Cheaper memory
● Exploiting this hardware requires distributed applications

In-Memory Databases
● Memory resident
  – Oracle TimesTen (mid-1990s)
  – Both row- and column-based
  – Main memory now conceived as primary storage!
● Disk resident
  – Persistent

Scaling Out
● Aggregate power and memory
● DB may not fit in a single machine
● Less contention for resources
● Elastic!

Scaling Up
● Majority of workloads are quite small
  – Median @ Microsoft & Yahoo: less than 14 GB
  – 90% @ Facebook under 100 GB
● Commodity server: 100s of GB and 32 cores
● Oracle Sun SuperCluster: 32 TB and 1024 cores

But...
● Scaling out offers
  – High availability
  – Fast recovery

Then the Question Is...

How can mixed OLTAP (OLTP + OLAP) be provided seamlessly and transparently, AND be distributed?

Solution: How Was It Solved?

Real Application Clusters
● Real Application Clusters (RAC) abstracts away cluster details from the application

DBIM
● Oracle Database In-Memory (2014)
● Built on Real Application Clusters
● Dual format
● Both disk and memory
● Both OLAP and OLTP (mixed OLTAP)

Dual Format
● Row format
  – Buffer cache (in memory)
  – Traditional logging
● Column format
  – In-memory
  – Fast scans

DBIM Instance Architecture

Shared Buffer Cache

Shared Buffer Cache
● Shared collective cache of data blocks
● Managed by the Global Cache Service (GCS)
  – Location
  – Access
● Handles all OLTP DML operations
● ACID

In-Memory Column Store

In-Memory Compression Unit (IMCU)
● Construction:
  – Convert row → column
  – Apply «intelligent data transformation» and compression
● Unit of distribution and scan
● Contiguous
● Each column becomes a Compression Unit (CU)
  – User-selectable compression: capacity vs. performance
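The construction steps can be sketched as follows, assuming dictionary encoding as a stand-in for the unspecified «intelligent data transformation and compression»; the slides do not say which encodings DBIM actually uses, and `CompressionUnit`, `build_cu`, and `build_imcu` are hypothetical names. The sketch pivots a batch of rows into columns and encodes each column as its own CU.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

@dataclass
class CompressionUnit:
    """One column of an IMCU, dictionary-encoded (assumed compression scheme)."""
    dictionary: List[Any]  # sorted distinct values of the column
    codes: List[int]       # one small integer per row, indexing `dictionary`

    def decode(self) -> List[Any]:
        return [self.dictionary[code] for code in self.codes]

def build_cu(values: Sequence[Any]) -> CompressionUnit:
    dictionary = sorted(set(values))
    position = {value: i for i, value in enumerate(dictionary)}
    return CompressionUnit(dictionary, [position[v] for v in values])

def build_imcu(rows: Sequence[Tuple[Any, ...]],
               names: Sequence[str]) -> Dict[str, CompressionUnit]:
    """Convert a batch of rows into an IMCU: one CU per column."""
    columns = zip(*rows)  # row -> column pivot
    return {name: build_cu(list(col)) for name, col in zip(names, columns)}
```

Sorting the dictionary keeps the codes order-preserving, so range predicates could in principle be evaluated directly on the codes; a real implementation would also bit-pack the codes.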

Scanning IMCUs
● SIMD instructions
● In-memory storage indexes
  – Automatically created
  – Pruning based on filter predicates
  – E.g. min and max for each CU
● Low scan cost enables
  – Bloom filter joins
  – Vector Group By
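Storage-index pruning can be illustrated with a small sketch, assuming integer columns and a `low <= x <= high` filter; `CUSummary` and the helper functions are hypothetical. Each CU keeps its min and max, and any CU whose [min, max] interval cannot overlap the predicate is skipped without being read.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class CUSummary:
    """Storage-index entry: min and max of one column CU."""
    cu_id: int
    min_value: int
    max_value: int

def build_storage_index(cus: Sequence[Sequence[int]]) -> List[CUSummary]:
    return [CUSummary(i, min(values), max(values)) for i, values in enumerate(cus)]

def cus_to_scan(index: Sequence[CUSummary], low: int, high: int) -> List[int]:
    """CUs that may hold a value in [low, high]; all others are pruned."""
    return [s.cu_id for s in index if s.max_value >= low and s.min_value <= high]

def count_between(cus: Sequence[Sequence[int]], index: Sequence[CUSummary],
                  low: int, high: int) -> int:
    """COUNT(*) WHERE low <= x <= high, scanning only the surviving CUs."""
    return sum(1 for cu_id in cus_to_scan(index, low, high)
               for value in cus[cu_id] if low <= value <= high)
```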

In-Memory Column Store
● Container for in-memory segments
  – Each segment is contiguous and contains several IMCUs
● NUMA enabled: distributes equally across NUMA nodes
● Home location index
  – Look up segment from data block
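One plausible shape for a home location index is a sorted list of non-overlapping block ranges, each mapped to the IMCU that covers it and the instance that hosts it, searched with binary search. The sketch below assumes exactly that; the entry layout, `HomeLocationIndex`, and the instance names are hypothetical, not taken from the slides.

```python
import bisect
from typing import Iterable, List, NamedTuple, Optional

class HomeLocation(NamedTuple):
    start_block: int  # first block covered (inclusive)
    end_block: int    # end of the range (exclusive)
    imcu_id: int
    instance: str     # instance hosting the IMCU

class HomeLocationIndex:
    def __init__(self, entries: Iterable[HomeLocation]) -> None:
        # Assumes the ranges do not overlap.
        self._entries: List[HomeLocation] = sorted(entries)
        self._starts = [e.start_block for e in self._entries]

    def lookup(self, block: int) -> Optional[HomeLocation]:
        """Return the entry covering `block`, or None if it is not in memory."""
        i = bisect.bisect_right(self._starts, block) - 1
        if i >= 0 and block < self._entries[i].end_block:
            return self._entries[i]
        return None
```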

Distribution Manager

Distribution Manager
● Wanted qualities
  – Scale out: extremely scalable distribution
  – High availability
    ● In-memory fault tolerance
    ● Efficient recovery
  – Scale up: distribution across NUMA nodes
  – Seamless interaction with Oracle's SQL execution engine

Distribution Schemes
● By partition
● By subpartition
● By rowid/block range
● Automatic

Distribution Mechanism
● Why not centralized?
  – The coordinating instance carries non-trivial communication to keep all instances consistent
● Why not decentralized?
  – Lack of consensus → inconsistency
● Best of both worlds?!
  – Two-phase distribution

Phase 1: Consensus
● Multiple instances may trigger (re)distribution
  – Need leader selection

Phase 1: Consensus

[Figure: broadcast → acknowledge → leader downgrade]
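The three steps in the figure can be simulated in-process as a sketch. It assumes (the slides do not say) that leader selection is decided by a cluster-wide lock that only one instance can hold exclusively, and that "leader downgrade" means the leader lowering that lock to shared mode once every instance has acknowledged the broadcast SCN. `DistributionLock` and `phase1_consensus` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

@dataclass
class DistributionLock:
    """Hypothetical cluster-wide lock: at most one exclusive holder."""
    mode: str = "free"  # "free", "exclusive" or "shared"
    owner: Optional[int] = None

    def try_exclusive(self, instance_id: int) -> bool:
        if self.mode != "free":
            return False
        self.mode, self.owner = "exclusive", instance_id
        return True

    def downgrade(self) -> None:
        if self.mode != "exclusive":
            raise RuntimeError("only an exclusive holder can downgrade")
        self.mode = "shared"

def phase1_consensus(requesters: Sequence[int],  # triggering instances, in request order
                     instances: Sequence[int],   # all live instances
                     current_scn: int,
                     lock: DistributionLock) -> Tuple[Optional[int], Dict[int, int]]:
    # Leader selection: the first requester to get the lock exclusively wins.
    leader = next((i for i in requesters if lock.try_exclusive(i)), None)
    if leader is None:
        return None, {}  # another instance is already leading
    # Broadcast: the leader sends its SCN; every instance acknowledges it.
    acks = {instance: current_scn for instance in instances}
    # Leader downgrade: exclusive -> shared, so all instances can start
    # Phase 2 (population) against the same SCN.
    lock.downgrade()
    return leader, acks
```

A second trigger arriving while the lock is still held returns without a leader, so concurrent triggers collapse into a single distribution at a single SCN.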

Phase 2: Population
● Calculate block ranges
  – Use the SCN broadcast in Phase 1
● Determine home location
  – Rendezvous hashing
● NUMA is static
  – Use modulo-based distribution
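Because every instance works from the same broadcast SCN, each one can compute the same block ranges independently. A minimal sketch, assuming that SCN pins a hypothetical high-water block number and that ranges have a fixed hypothetical size `blocks_per_imcu`; the NUMA placement is plain round-robin (range index modulo node count), which is one reading of "modulo-based distribution".

```python
from typing import List, Tuple

BlockRange = Tuple[int, int]  # half-open [start, end)

def block_ranges(high_water_block: int, blocks_per_imcu: int) -> List[BlockRange]:
    """Split blocks [0, high_water_block) into IMCU-sized ranges."""
    return [(start, min(start + blocks_per_imcu, high_water_block))
            for start in range(0, high_water_block, blocks_per_imcu)]

def numa_node_for(range_index: int, numa_nodes: int) -> int:
    """Static NUMA placement: round-robin over the node count."""
    return range_index % numa_nodes

def local_plan(high_water_block: int, blocks_per_imcu: int,
               numa_nodes: int) -> List[Tuple[BlockRange, int]]:
    """Every instance derives the identical (range, NUMA node) list from the same inputs."""
    ranges = block_ranges(high_water_block, blocks_per_imcu)
    return [(r, numa_node_for(i, numa_nodes)) for i, r in enumerate(ranges)]
```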

Side Note: Rendezvous Hashing
● Given a hash function h and an object O, select the instance S for which h(S, O) takes on the highest value
● Alternatively: the lowest value
● Desirable property: minimal disruption, since removing an instance only moves the objects that instance owned
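A short Python sketch of rendezvous (highest-random-weight) hashing, with SHA-256 standing in for h (the slides do not name a hash function) and made-up instance and block-range names. The test checks the minimal-disruption property: when one instance leaves, only the objects it owned get a new home.

```python
import hashlib
from typing import Dict, Hashable, Iterable, Sequence

def score(instance: str, obj: Hashable) -> int:
    """h(S, O): a 64-bit value derived from SHA-256 of the (instance, object) pair."""
    digest = hashlib.sha256(f"{instance}|{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def home_instance(obj: Hashable, instances: Sequence[str]) -> str:
    """Pick the instance S that maximizes h(S, O)."""
    return max(instances, key=lambda s: score(s, obj))

def assign(objects: Iterable[Hashable], instances: Sequence[str]) -> Dict[Hashable, str]:
    return {o: home_instance(o, instances) for o in objects}
```

Sorting all instances by score instead of taking only the maximum yields a ranked list whose first two entries could serve as primary and secondary homes; whether DBIM places 1-safe copies this way is not stated in the slides.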

Phase 2: Population
● Generate IMCUs
● Update home location index
● Release locks

At the end: all home location indexes are consistent

Home Location Indexes

Redistribution
● On cluster topology change
● Same as distribution
● Reuse SCN!

IM Transaction Manager

IM Transaction Manager
● Maintains transactional consistency
● Uses the system change number (SCN)
● Snapshot management unit (SMU)
  – Fills in the gap between the IMCU's SCN and the query SCN
[Figure: IMCU with its SCN, alongside its SMU]

Note: Requires regular repopulation
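The SMU idea can be sketched for a single column: values come from the IMCU unless the row changed after the IMCU's SCN and at or before the query SCN, in which case the row store supplies the consistent version. The sketch assumes the SMU tracks, per row, the first change since population; `SMU`, `scan_column`, and the `read_row_store` callback are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

@dataclass
class SMU:
    """Hypothetical snapshot management unit attached to one IMCU."""
    imcu_scn: int  # SCN as of which the IMCU was populated
    first_change: Dict[int, int] = field(default_factory=dict)  # row id -> first change SCN

    def record_change(self, row_id: int, scn: int) -> None:
        if scn <= self.imcu_scn:
            return  # already reflected in the IMCU
        # Only the earliest later change decides whether the IMCU copy is stale.
        self.first_change[row_id] = min(scn, self.first_change.get(row_id, scn))

    def stale_rows(self, query_scn: int) -> Set[int]:
        """Rows whose IMCU value is not the version visible at query_scn."""
        if query_scn < self.imcu_scn:
            raise ValueError("IMCU is newer than the query snapshot")
        return {row for row, scn in self.first_change.items() if scn <= query_scn}

def scan_column(imcu_values: List[int], smu: SMU, query_scn: int,
                read_row_store: Callable[[int, int], int]) -> List[int]:
    """Columnar values, except stale rows, which are re-read from the row store."""
    stale = smu.stale_rows(query_scn)
    return [read_row_store(row, query_scn) if row in stale else value
            for row, value in enumerate(imcu_values)]
```

As DML continues, the stale set only grows and more rows take the slower row-store path, which is why the IMCUs need regular repopulation.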

Distributed SQL Execution

Distributed SQL Execution
● Index vs. scan
  – Extrapolate cost from home location index
● Scan:
  – Determine degree of parallelism
  – Allocate nodes
  – For 1-safe, select the primary or the secondary copy
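The 1-safe copy choice can be sketched as: prefer each IMCU's primary home, fall back to its secondary if the primary instance is down, and fall back to the row store only if both are gone. This is an assumed policy for illustration (the slides give no detail), and `choose_copies` is a hypothetical name.

```python
from typing import Dict, Optional, Set, Tuple

def choose_copies(homes: Dict[int, Tuple[str, str]],
                  live: Set[str]) -> Dict[int, Optional[str]]:
    """Map each IMCU id to the instance whose copy the scan should read."""
    plan: Dict[int, Optional[str]] = {}
    for imcu_id, (primary, secondary) in homes.items():
        if primary in live:
            plan[imcu_id] = primary
        elif secondary in live:
            plan[imcu_id] = secondary
        else:
            plan[imcu_id] = None  # no in-memory copy left: read via buffer cache / disk
    return plan
```

A fuller version would also balance work between primaries and secondaries instead of always preferring the primary.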

Distributed SQL Execution
● Hierarchy of (sub)distributors
● Distribute work based on home location index
● Align to IMCU and NUMA boundaries
  – All block ranges within the same memory
[Figure: query → instance → NUMA node]
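A sketch of how block ranges might be grouped for a two-level hierarchy of distributors: the query coordinator hands each instance its ranges, and each instance's sub-distributor hands each NUMA node the ranges resident in its memory. The tuple layout and `build_work_hierarchy` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# (start_block, end_block, instance, numa_node), e.g. read from a home location index
Placement = Tuple[int, int, str, int]

def build_work_hierarchy(
        placements: Iterable[Placement]) -> Dict[str, Dict[int, List[Tuple[int, int]]]]:
    """Group block ranges as instance -> NUMA node -> sorted ranges."""
    tree: DefaultDict[str, DefaultDict[int, List[Tuple[int, int]]]] = \
        defaultdict(lambda: defaultdict(list))
    for start, end, instance, node in placements:
        tree[instance][node].append((start, end))
    # Each leaf list is the work unit for the sub-distributor on one NUMA node.
    return {inst: {node: sorted(ranges) for node, ranges in nodes.items()}
            for inst, nodes in tree.items()}
```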

Home Location Aware Scanning

Uniqueness of Architecture

● SAP HANA
  – More centralized
  – Poor load balancing
  – No redundancy
● NoSQL
  – Focus on performance
  – Not ACID
● IBM DB2 + BLU
  – Per-node in-memory column DB
  – No in-memory redundancy

Preliminary Evaluation: Did It Work?

With data from TPC-H

Distribution
● Non-partitioned table
  – «atomics» table
  – Constant size
● Composite-partitioned table
  – «lineitem» table
  – Increasing size
  – 84-way partitioned, each partition subpartitioned 256 ways (hash)

Speedup seems to be linear

Query Execution
● 4 query sets:

Query Set 1
● Selects counts
● WHERE clauses with increasing complexity

Query Set 2
● Selects max
● Increasing complexity in the SELECT clause

Query Set 3
● Different LIKE predicates

Query Set 4
● Simple '<=' predicate
● Increasing selectivity

In-Memory Distribution Awareness
● Auto distributed, no redundancy

NUMA-Aware Query Execution
● Scale up

In-Memory Fault Tolerance
● 1-safe redundancy
● Measured first on 8 instances, then again after killing one (availability)

Conclusion: Or More Like a Summary?

Conclusion
● Seamless real-time analytics on huge data volumes with redundancy
  → mixed OLTAP
● Oracle DBIM should solve this
  – Application transparent
  – In-memory
  – Distributed
  – Uses Oracle's SQL execution framework (consistent interface)