Niloy Mukherjee, Shasank Chavan, Maria Colgan, Dinesh Das, Mike Gleeson, Sanket Hase, Allison Holloway, Hui Jin,
Jesse Kamp, Kartik Kulkarni, Tirthankar Lahiri, Juan Loaiza, Neil Macnaughton, Vineet Marwah, Atrayee Mullick,
Andy Witkowski, Jiaqi Yan, Mohamed Zait
Distributed Architecture of Oracle Database In-memory
Overview
1. Motivation
– Trends and current solutions
2. Solution
– Real Application Clusters
– Oracle Database In-Memory
3. Preliminary evaluation
– Some test results
4. Conclusion
Motivation
Why Do We Need This?
Data Trends
● Deluge of data
● Ad-hoc real-time analysis
Typical Solution
● ETL – Extract, transform, load
● Analyze data in dedicated system
[Figure: OLTP Application, OLAP Application]
● Complexity and manageability overhead!
● No real-time analytics
Data Format
● Columnar format
– Great for OLAP
– Fast scans of single column
● Row format
– Great for OLTP
– Handle entire rows
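To make the contrast on this slide concrete, here is a minimal sketch of the same small table held in both formats. The table, its column names, and the helper functions are hypothetical and only illustrate the access patterns; they are not Oracle's storage layout.

```python
# Hypothetical three-row table; the column names are illustrative only.
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 75},
    {"id": 3, "region": "EU", "amount": 30},
]


def fetch_row(rows, row_id):
    """Row format: one record holds every column of a row (OLTP-style access)."""
    return next(r for r in rows if r["id"] == row_id)


def to_columns(rows):
    """Column format: one contiguous list per column (OLAP-style access)."""
    return {name: [r[name] for r in rows] for name in rows[0]}


columns = to_columns(rows)

# An analytic scan touches only the single column it needs.
total_amount = sum(columns["amount"])
```

Fetching a whole order is a single lookup in the row format, while the aggregate reads one list in the column format and never touches `id` or `region`.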
Hardware Trends
● More cores, processors
● Cheaper memory
● Requires distributed applications
In-Memory Databases
● Memory resident
– Oracle TimesTen (mid 1990s)
– Both row and column based
– Main memory now conceived as primary storage!
● Disk resident
– Persistent
Scaling Out
● Aggregate power and memory
● DB may not fit in single machine
● Less contention for resources
● Elastic!
Scaling Up
● Majority of workloads are quite small
● Median at Microsoft and Yahoo: less than 14 GB
● 90% at Facebook under 100 GB
● Commodity server: 100s of GB and 32 cores
● Oracle Sun SuperCluster: 32 TB and 1024 cores
But...
● Scaling out offers
– High availability
– Fast recovery
Then the Question Is...
How can mixed OLTAP be provided seamlessly, transparently AND be distributed?
Solution
How was it solved?
Real Application Clusters
● Real Application Clusters abstracts away cluster details
DBIM
● Oracle Database In-Memory (2014)
● Real Application Clusters
● Dual format
● Both disk and memory
● Both OLAP and OLTP (Mixed OLTAP)
Dual Format
● Row format
– Buffer cache (in memory)
– Traditional logging
● Column format
– In-memory
– Fast scans
DBIM Instance Architecture
Shared Buffer Cache
Shared Buffer Cache
● Shared collective cache of data blocks
● Global Cache Service manages
– Location
– Access
– Handles all OLTP DML operations
● ACID
In-Memory Column Store
In-Memory Compression Unit (IMCU)
● Construction:
– Convert row → column
– Apply «intelligent data transformation» and compression
● Unit of distribution and scan
● Contiguous
● Each column becomes a Compression Unit (CU)
– User selectable compression, capacity vs performance
Scanning IMCUs
● SIMD instructions
● In-memory Storage Indexes
– Automatically created
– Pruning based on filter predicates
– E.g. max and min for each CU
● Low scan cost enables
– Bloom filter joins
– Vector Group By
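The storage-index pruning described above can be sketched as follows. This is a simplified illustration that assumes each CU keeps only the min and max of its values. The names `ColumnUnit` and `scan_less_equal` are hypothetical, and the real implementation works on compressed data with SIMD rather than Python lists.

```python
from dataclasses import dataclass


@dataclass
class ColumnUnit:
    values: list  # decoded column values held by this CU (assumption: uncompressed)

    def __post_init__(self):
        # Storage index entry: min and max of the values in this CU.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_less_equal(units, bound):
    """Return the values <= bound and the number of CUs actually scanned."""
    matches, scanned = [], 0
    for cu in units:
        if cu.min_value > bound:
            continue  # pruned: no value in this CU can satisfy the predicate
        scanned += 1
        matches.extend(v for v in cu.values if v <= bound)
    return matches, scanned


units = [ColumnUnit([1, 4, 9]), ColumnUnit([20, 25, 31]), ColumnUnit([7, 8, 50])]
matches, scanned = scan_less_equal(units, 8)
```

For the predicate `value <= 8`, the second CU is skipped because its minimum is 20, so only two of the three CUs are read.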
In-Memory Column Store
● Container for in-memory segments
– Each segment contiguous and contains several IMCUs
● NUMA enabled – distributes equally
● Home location index
– Look up segment from data block
Distribution Manager
Distribution manager
● Wanted qualities
– Scale out – extremely scalable distribution
– High availability
  ● In-memory fault tolerance
  ● Efficient recovery
– Scale up – distribution across NUMA nodes
– Seamless interaction with Oracle's SQL execution engine
Distribution Schemes
● By partition
● By sub-partition
● By rowid/block range
● Automatic
Distribution Mechanism
● Why not centralized?
– Non-trivial consistency communication by the coordinating instance
● Why not decentralized?
– Lack of consensus → inconsistency
● Best of both worlds!?
– Two phase distribution
Phase 1: Consensus
● Multiple instances may trigger (re)distribution
– Need leader selection
[Figure: Broadcast, Acknowledge, Leader downgrade]
Phase 2: Population
● Calculate block ranges
– Use SCN broadcast in phase 1
● Determine home location
– Rendezvous hashing
● NUMA is static
– Use modulo based distribution
Side Note: Rendezvous Hashing
● Given a hash function h and an object O, select the instance S where h(S, O) takes on the highest value.
● Alternatively: Lowest value
● Desirable property: Minimal disruption
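A minimal sketch of rendezvous hashing as defined on this slide, including a check of the minimal-disruption property. The hash function (SHA-256 over the pair), the instance names, and the block-range labels are assumptions chosen for illustration, not what Oracle uses.

```python
import hashlib


def score(instance, obj):
    """h(S, O): a deterministic hash of the (instance, object) pair."""
    digest = hashlib.sha256(f"{instance}|{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def home_instance(instances, obj):
    """Select the instance S for which h(S, O) takes on the highest value."""
    return max(instances, key=lambda s: score(s, obj))


# Hypothetical cluster and objects to place.
instances = ["inst1", "inst2", "inst3", "inst4"]
block_ranges = [f"range-{i}" for i in range(200)]

before = {b: home_instance(instances, b) for b in block_ranges}

# Simulate a topology change: inst3 leaves the cluster.
survivors = [s for s in instances if s != "inst3"]
after = {b: home_instance(survivors, b) for b in block_ranges}

# Only objects that lived on the departed instance get a new home.
moved = [b for b in block_ranges if before[b] != after[b]]
```

Every instance computes the same placement independently from the same inputs, so no central lookup table is needed. When `inst3` leaves, the highest remaining score for any other object is unchanged, which is why only the objects previously homed on `inst3` move.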
Phase 2: Population
● Generate IMCUs
● Update home location index
● Release locks
At the end: All home location indexes consistent
Home Location Indexes
Redistribution
● On cluster topology change
● Same as distribution
● Reuse SCN!
IM Transaction Manager
IM Transaction Manager
● Maintains transactional consistency
● Uses a system change number (SCN)
● Snapshot management unit (SMU)
– Fills in the gap between the IMCU's SCN and query SCN
[Figure: IMCU+SCN, SMU]
Note: Requires regular repopulation
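One way to picture how an SMU bridges the gap between the IMCU's SCN and the query's SCN is the sketch below. It is a deliberate simplification: the SMU here stores the changed values itself, whereas the slides only say that it fills in the gap. The class and function names (`IMCU`, `SMU`, `read`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IMCU:
    scn: int      # SCN at which the column data was populated
    column: dict  # row id -> value as of that SCN


@dataclass
class SMU:
    # row id -> list of (commit SCN, new value) for changes made after IMCU.scn
    changes: dict = field(default_factory=dict)

    def record(self, row_id, commit_scn, value):
        self.changes.setdefault(row_id, []).append((commit_scn, value))


def read(imcu, smu, row_id, query_scn):
    """Value of row_id as seen by a query running at query_scn."""
    if query_scn < imcu.scn:
        raise ValueError("query SCN predates the IMCU; the IMCU cannot serve it")
    value = imcu.column[row_id]
    # Apply, in commit order, every change that is visible to this query.
    for commit_scn, new_value in sorted(smu.changes.get(row_id, []), key=lambda c: c[0]):
        if commit_scn <= query_scn:
            value = new_value
    return value


imcu = IMCU(scn=100, column={1: "a", 2: "b"})
smu = SMU()
smu.record(1, 105, "a2")
smu.record(1, 110, "a3")
```

A query at SCN 107 sees `"a2"` for row 1, while a query at SCN 120 sees `"a3"`. The change list grows with every update, which is consistent with the note that the IMCU has to be repopulated regularly.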
Distributed SQL Execution
Distributed SQL Execution
● Index vs scan
– Extrapolate cost from home location index
● Scan:
– Determine degree of parallelism
– Allocate nodes
– For 1-safe, select first or secondary
Distributed SQL Execution
● Hierarchy of (sub)distributors
● Distribute work based on home location index
● Align to IMCUs and NUMA boundaries
– All block ranges within same memory
[Figure: Query, Instance, NUMA node]
Home Location Aware Scanning
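The idea of handing out scan work according to the home location index can be sketched like this. The index contents, the instance names, and the `plan_scan` helper are hypothetical; the sketch shows only the grouping step, not parallel execution.

```python
from collections import defaultdict

# Hypothetical home location index: block range -> (instance, NUMA node).
home_location_index = {
    (0, 99): ("inst1", 0),
    (100, 199): ("inst1", 1),
    (200, 299): ("inst2", 0),
    (300, 399): ("inst2", 1),
}


def plan_scan(index):
    """Group block ranges first by instance, then by NUMA node."""
    plan = defaultdict(lambda: defaultdict(list))
    for block_range, (instance, numa_node) in sorted(index.items()):
        plan[instance][numa_node].append(block_range)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {inst: dict(nodes) for inst, nodes in plan.items()}


plan = plan_scan(home_location_index)
```

Each instance receives only the block ranges whose IMCUs it hosts, and within an instance each unit of work stays on one NUMA node, so a scan never reads column data from remote memory.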
Uniqueness of Architecture
● SAP HANA
– More centralized
– Poor load balancing
– No redundancy
● NoSQL
– Focus on performance
– Not ACID
● IBM DB2 + BLU
– Per-node in-memory column DB
– No in-memory redundancy
Preliminary Evaluation
Did it work?
With data from TPC-H
Distribution
● Non-partitioned table
– «atomics» table
– Constant size
● Composite-partitioned table
– «lineitem» table
– Increasing size
● 84-way partitioned, each subpartitioned 256 ways (hash)
Speedup seems to be linear
Query Execution
● 4 query sets:
Query Set 1
● Selects counts
● Where clauses with increasing complexity
Query Set 2
● Select max
● Increasing complexity in select clause
Query Set 3
● Different like predicates
Query Set 4
● Simple '<=' predicate
● Increasing selectivity
In-Memory Distribution Awareness
● Auto distributed, no redundancy
NUMA Aware Query Execution
● Scale up
In-Memory Fault Tolerance
● 1-safe redundancy
● First on 8 instances, then after killing one
● (availability)
Conclusion
Or more like a summary?
Conclusion
● Seamless real-time analytics on huge data volumes with redundancy → mixed OLTAP
● Oracle DBIM should solve this
– Application transparent
– In-memory
– Distributed
– Uses Oracle's SQL execution framework (consistent interface)