Distributed Architecture of Oracle Database In-Memory

Niloy Mukherjee, Shasank Chavan, Maria Colgan, Dinesh Das, Mike Gleeson, Sanket Hase, Allison Holloway, Hui Jin,

Jesse Kamp, Kartik Kulkarni, Tirthankar Lahiri, Juan Loaiza, Neil Macnaughton, Vineet Marwah, Atrayee Mullick,

Andy Witkowski, Jiaqi Yan, Mohamed Zait


Overview

1. Motivation
   – Trends and current solutions

2. Solution
   – Real Application Clusters
   – Oracle Database In-Memory

3. Preliminary evaluation
   – Some test results

4. Conclusion

Motivation: Why Do We Need This?

Data Trends
● Deluge of data
● Ad-hoc real-time analysis

Typical Solution

● ETL – Extract, transform, load
● Analyze data in dedicated system

[Figure: OLTP Application, OLAP Application]

● Complexity and manageability overhead!

● No real-time analytics

Data Format

● Columnar format
  – Great for OLAP
  – Fast scans of single column

● Row format
  – Great for OLTP
  – Handle entire rows

Hardware Trends
● More cores, processors
● Cheaper memory
● Requires distributed applications

In-Memory Databases
● Memory resident
  – Oracle TimesTen (mid 1990s)
  – Both row and column based
  – Main memory now conceived as primary storage!
● Disk resident
  – Persistent

Scaling Out
● Aggregate power and memory
● DB may not fit in single machine
● Less contention for resources
● Elastic!

Scaling Up
● Majority of workloads are quite small
● Median @ Microsoft & Yahoo: less than 14 GB
● 90% @ Facebook under 100 GB
● Commodity server: 100s of GB and 32 cores
● Oracle Sun SuperCluster: 32 TB and 1024 cores

But...
● Scaling out offers
  – High availability
  – Fast recovery

Then the Question Is...

How can mixed OLTAP be provided seamlessly and transparently AND be distributed?

Solution: How was it solved?

Real Application Clusters
● Real Application Clusters abstracts away cluster details

DBIM
● Oracle Database In-Memory (2014)
● Real Application Clusters
● Dual format
● Both disk and memory
● Both OLAP and OLTP (Mixed OLTAP)

Dual Format
● Row format
  – Buffer cache (in memory)
  – Traditional logging
● Column format
  – In-memory
  – Fast scans

DBIM Instance Architecture

Shared Buffer Cache

Shared Buffer Cache
● Shared collective cache of data blocks
● Global Cache Service manages
  – Location
  – Access
  – Handles all OLTP DML operations
● ACID

In-Memory Column Store

In-Memory Compression Unit (IMCU)
● Construction:
  – Convert row → column
  – Apply «intelligent data transformation» and compression
● Unit of distribution and scan
● Contiguous
● Each column becomes a Compression Unit (CU)
  – User selectable compression, capacity vs performance
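
The row → column step can be pictured with a small sketch. The slides do not say which transformation or compression Oracle applies, so the dictionary encoding below is only a stand-in assumption, and `rows_to_columns` and `dictionary_encode` are hypothetical helper names.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a list of row dicts into one value list per column."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def dictionary_encode(values: list[Any]) -> tuple[list[Any], list[int]]:
    """Replace each value with its index in a sorted dictionary of distinct values.

    Assumption: used here only as a simple example of a column compression scheme.
    """
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


rows = [
    {"id": 1, "status": "OPEN"},
    {"id": 2, "status": "SHIPPED"},
    {"id": 3, "status": "OPEN"},
]
columns = rows_to_columns(rows)  # one list per column
status_dict, status_codes = dictionary_encode(columns["status"])
```

Each encoded column plays the role of one CU in this picture: a scan over `status` touches only `status_codes`, never the `id` values.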

Scanning IMCUs
● SIMD instructions
● In-memory Storage Indexes
  – Automatically created
  – Pruning based on filter predicates
  – E.g. max and min for each CU
● Low scan cost enables
  – Bloom filter joins
  – Vector Group By
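
To illustrate the pruning idea, here is a minimal sketch of a min/max storage index. `ColumnUnit` and `scan_less_equal` are hypothetical names, and the real structure is certainly more elaborate; the point is only that a CU whose minimum already exceeds the filter bound never has to be scanned.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnUnit:
    """One column's values plus the min/max summary kept alongside it."""
    values: list[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Assumes a non-empty unit; the summary is computed once at build time.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_less_equal(units: list[ColumnUnit], bound: int) -> tuple[list[int], int]:
    """Evaluate `value <= bound`, skipping units the min/max summary rules out."""
    matches: list[int] = []
    scanned = 0
    for unit in units:
        if unit.min_value > bound:
            continue  # pruned: no value in this unit can satisfy the predicate
        scanned += 1
        matches.extend(v for v in unit.values if v <= bound)
    return matches, scanned


units = [ColumnUnit([1, 5, 9]), ColumnUnit([20, 25, 30]), ColumnUnit([8, 12, 40])]
matches, scanned = scan_less_equal(units, 10)
```

In this example the second unit is skipped entirely, so only two of the three units are scanned.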

In-Memory Column Store
● Container for in-memory segments
  – Each segment contiguous and contains several IMCUs
● NUMA enabled – distributes equally
● Home location index
  – Look up segment from data block

Distribution Manager

Distribution Manager
● Desired qualities
  – Scale out – Extremely scalable distribution
  – High availability
    ● In-memory fault tolerance
    ● Efficient recovery
  – Scale up – Distribution across NUMA nodes
  – Seamless interaction with Oracle's SQL execution engine

Distribution Schemes
● By partition
● By sub-partition
● By rowid/block range
● Automatic

Distribution Mechanism
● Why not centralized?
  – Non-trivial consistency communication by the coordinating instance
● Why not decentralized?
  – Lack of consensus → inconsistency
● Best of both worlds!?
  – Two phase distribution

Phase 1: Consensus
● Multiple instances may trigger (re)distribution
  – Need leader selection

Phase 1: Consensus

[Figure steps: Broadcast, Acknowledge, Leader downgrade]
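
The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not Oracle's protocol: leader selection is modelled as a non-blocking attempt on a single exclusive lock, the "downgrade" is modelled simply as giving up exclusivity, and `Instance` and `phase_one` are hypothetical names.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    name: str
    agreed_scn: Optional[int] = None

    def acknowledge(self, scn: int) -> bool:
        """Follower side: remember the broadcast SCN and acknowledge it."""
        self.agreed_scn = scn
        return True


def phase_one(triggering: list[Instance], cluster: list[Instance],
              exclusive_lock: threading.Lock, current_scn: int) -> Optional[Instance]:
    """Pick one leader, broadcast the SCN, collect acknowledgements, downgrade."""
    # Leader selection: only the first instance to obtain the lock wins.
    winners = [inst for inst in triggering if exclusive_lock.acquire(blocking=False)]
    if not winners:
        return None  # another leader is already coordinating
    leader = winners[0]
    leader.agreed_scn = current_scn
    # Broadcast and acknowledge: every other instance adopts the same SCN.
    acks = [inst.acknowledge(current_scn) for inst in cluster if inst is not leader]
    # Leader downgrade (assumption: modelled as releasing the exclusive lock).
    exclusive_lock.release()
    return leader if all(acks) else None


cluster = [Instance("inst1"), Instance("inst2"), Instance("inst3")]
leader = phase_one([cluster[1], cluster[2]], cluster, threading.Lock(), current_scn=500)
```

After the call every instance holds the same SCN, which is what lets Phase 2 compute identical block ranges everywhere.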

Phase 2: Population
● Calculate block ranges
  – Use SCN broadcast in phase 1
● Determine home location
  – Rendezvous hashing
● NUMA is static
  – Use modulo based distribution
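
A small sketch of the block-range and NUMA steps, assuming fixed-size ranges and a simple `index % node_count` rule. `block_ranges`, `numa_node_for`, and the `blocks_per_imcu` parameter are hypothetical; the slides only say that ranges are calculated and that NUMA placement is modulo based.

```python
def block_ranges(first_block: int, block_count: int, blocks_per_imcu: int) -> list[range]:
    """Split a segment's blocks into consecutive ranges, one per future IMCU."""
    end = first_block + block_count
    return [
        range(start, min(start + blocks_per_imcu, end))
        for start in range(first_block, end, blocks_per_imcu)
    ]


def numa_node_for(range_index: int, numa_node_count: int) -> int:
    """Modulo placement: acceptable because the NUMA node count does not change."""
    return range_index % numa_node_count


ranges = block_ranges(first_block=0, block_count=10, blocks_per_imcu=4)
placement = [numa_node_for(i, numa_node_count=2) for i in range(len(ranges))]
```

Because every instance starts from the same SCN and the same inputs, each one arrives at the same ranges without further communication.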

Side Note: Rendezvous Hashing
● Given a hash function h and an object O, select the instance S where h(S, O) takes on the highest value.
● Alternatively: Lowest value
● Desirable property: Minimal disruption
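
The definition above translates almost directly into code. The sketch below uses SHA-256 as h, which is an assumption made for illustration (the slides do not name a hash function), and the instance and object names are made up.

```python
import hashlib


def score(instance: str, obj: str) -> int:
    """h(S, O): a deterministic score for an (instance, object) pair."""
    digest = hashlib.sha256(f"{instance}|{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def home_instance(instances: list[str], obj: str) -> str:
    """Select the instance whose score for this object is highest."""
    return max(instances, key=lambda inst: score(inst, obj))


instances = ["inst1", "inst2", "inst3", "inst4"]
objects = [f"imcu-{i}" for i in range(1000)]

before = {o: home_instance(instances, o) for o in objects}

# Simulate inst3 leaving the cluster and recompute every home location.
survivors = [inst for inst in instances if inst != "inst3"]
after = {o: home_instance(survivors, o) for o in objects}

moved = [o for o in objects if before[o] != after[o]]
```

Minimal disruption shows up in `moved`: the only objects that change home are those that used to live on the departed instance, because removing an instance cannot change which of the remaining scores is highest.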

Phase 2: Population
● Generate IMCUs
● Update home location index
● Release locks

At the end: All home location indexes consistent

Home Location Indexes

Redistribution
● On cluster topology change
● Same as distribution
● Reuse SCN!

IM Transaction Manager

IM Transaction Manager
● Maintains transactional consistency
● Uses a system change number (SCN)
● Snapshot management unit (SMU)
  – Fills in the gap between the IMCU's SCN and query SCN

[Figure: IMCU+SCN, SMU]

Note: Requires regular repopulation
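
One way to picture how an SMU could fill that gap is sketched below. It is a simplified model under stated assumptions: the SMU is reduced to a map from row id to the SCN of the first change after population, changed rows are fetched through a caller-supplied `row_store_read` function (a hypothetical stand-in for a consistent read from the row format), and inserts and deletes are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Imcu:
    scn: int                # SCN at which the columnar snapshot was populated
    values: dict[int, int]  # row id -> column value as of `scn`


@dataclass
class Smu:
    # row id -> SCN of the first change made after the IMCU was populated
    invalidated_at: dict[int, int] = field(default_factory=dict)

    def mark_changed(self, row_id: int, change_scn: int) -> None:
        self.invalidated_at.setdefault(row_id, change_scn)


def read_column(imcu: Imcu, smu: Smu,
                row_store_read: Callable[[int, int], int],
                query_scn: int) -> dict[int, int]:
    """Return the column as of `query_scn`, using the IMCU wherever it is still valid."""
    if query_scn < imcu.scn:
        raise ValueError("query SCN predates the IMCU snapshot")
    result = {}
    for row_id, value in imcu.values.items():
        changed = smu.invalidated_at.get(row_id)
        if changed is not None and changed <= query_scn:
            result[row_id] = row_store_read(row_id, query_scn)  # stale in the IMCU
        else:
            result[row_id] = value
    return result


def row_store_read(row_id: int, scn: int) -> int:
    return 99  # hypothetical current value of the changed row


imcu = Imcu(scn=100, values={1: 10, 2: 20, 3: 30})
smu = Smu()
smu.mark_changed(2, change_scn=105)
early = read_column(imcu, smu, row_store_read, query_scn=103)
late = read_column(imcu, smu, row_store_read, query_scn=110)
```

The more rows the SMU marks as changed, the more reads fall back to the row format, which is one way to see why regular repopulation is needed.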

Distributed SQL Execution

Distributed SQL Execution
● Index vs scan
  – Extrapolate cost from home location index
● Scan:
  – Determine degree of parallelism
  – Allocate nodes
  – For 1-safe, select first or secondary

Distributed SQL Execution
● Hierarchy of (sub)distributors
● Distribute work based on home location index
● Align to IMCUs and NUMA boundaries
  – All block ranges within same memory

[Figure: Home Location Aware Scanning (labels: Query, Instance, NUMA node)]
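
The idea of splitting a scan along home locations can be sketched as a two-level grouping: first by instance, then by NUMA node. The index contents and the `plan_scan` helper below are hypothetical, and the real distributor hierarchy is only hinted at in the slides.

```python
from collections import defaultdict

# Hypothetical home location index: block range -> (instance, NUMA node).
home_location_index = {
    (0, 99): ("inst1", 0),
    (100, 199): ("inst1", 1),
    (200, 299): ("inst2", 0),
    (300, 399): ("inst1", 0),
}


def plan_scan(index: dict[tuple[int, int], tuple[str, int]]) -> dict[str, dict[int, list[tuple[int, int]]]]:
    """Group block ranges so each NUMA node only scans IMCUs held in its own memory."""
    plan: dict[str, dict[int, list[tuple[int, int]]]] = defaultdict(lambda: defaultdict(list))
    for block_range, (instance, numa_node) in sorted(index.items()):
        plan[instance][numa_node].append(block_range)
    # Convert nested defaultdicts to plain dicts for a stable, printable result.
    return {instance: dict(nodes) for instance, nodes in plan.items()}


plan = plan_scan(home_location_index)
```

Each top-level key corresponds to work handed to one instance, and each inner key to a sub-distributor for one NUMA node on that instance.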

Uniqueness of Architecture

● SAP HANA
  – More centralized
  – Poor load balancing
  – No redundancy

● NoSQL
  – Focus on performance
  – Not ACID

● IBM DB2 + BLU
  – Per node in-memory column db
  – No in-memory redundancy

Preliminary Evaluation: Did it work?

With data from TPC-H

Distribution
● Non-partitioned table
  – «atomics» table
  – Constant size
● Composite-partitioned table
  – «lineitem» table
  – Increasing size
  – 84-way partitioned, each subpartitioned 256 ways (hash)

Speedup seems to be linear

Query Execution
● 4 query sets:

Query Set 1
● Selects counts
● Where clauses with increasing complexity

Query Set 2
● Select max
● Increasing complexity in select clause

Query Set 3
● Different like predicates

Query Set 4
● Simple '<=' predicate
● Increasing selectivity

In-Memory Distribution Awareness
● Auto distributed, no redundancy

NUMA Aware Query Execution
● Scale up

In-Memory Fault Tolerance
● 1-safe redundancy
● First on 8 instances, then after killing one
● (availability)

Conclusion: Or more like a summary?

Conclusion
● Seamless real-time analytics on huge data volumes with redundancy → mixed OLTAP
● Oracle DBIM should solve this
  – Application transparent
  – In-memory
  – Distributed
  – Uses Oracle's SQL execution framework (consistent interface)
