Calpont InfiniDB®
Accelerating Data Insights
Scalable Analytics for Your NoSQL Big Data
Jim Tommaney, CTO, Calpont
NoSQL Now
August 24, 2011
Calpont Proprietary and Confidential
Key Takeaways
• Calpont and InfiniDB
• Architecture – Columnar Storage
• Architecture – Map Reduction Distribution of Work
• Performance Characteristics
• Ease of Use and Flexibility
• Extensibility
InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
Calpont Corporation
• Company
  o Privately held and backed
  o Headquartered in Frisco, TX
• Products
  o InfiniDB Enterprise: launched February 2010
  o InfiniDB Community: launched October 2009

Our Mission
To provide a scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
InfiniDB Release Highlights
• Version 1.0 – Oct. 2009/Feb. 2010
  o Columnar storage.
  o Map-reduction distribution of work.
  o High speed data load.
• Version 1.5 – June 2010
  o Sub-query added to map-reduction framework.
    Select, From, Where clause support.
    Correlated, Non-Correlated sub-query.
• Version 2.0 – November 2010
  o Compression with real-time decompression.
  o User-defined functions, fully parallel and distributed.
    Latitude/longitude distance calculation.
    Geo-Fencing: is a location within a polygon?
  o Enhanced partition elimination.
  o Enhanced parallelization of reduction operations.
• Version 2.1 – March 2011
  o Statistical aggregate functions.
  o View support.
  o Auto-increment.
  o Insert-select.
• Version 2.2 – June 2011
  o Group_concat and bit aggregate functions.
  o Additional scalar functions made parallel and distributed.
  o Improved performance and memory for large strings.
• Version 3.0 – Q4/Q1
  o Cloud shared nothing.
  o Distributed/parallel load.
Technology Trends
[Figure: "Moore's Law and Beyond". Y-axis: percent increase; x-axis: years. Series, labeled with annual rates: Data Warehouse Growth 75%, Memory Capacity 60%, Disk Capacity 50%, Moore's Law (CPU) 45%, Disk Bandwidth 40%, Memory Bandwidth 20%, Disk Latency 10%, Memory Latency 10%.]
Trends Drive Demand for Alternate Solutions
[Figure: "Moore's Law and Beyond" chart, repeated from the Technology Trends slide.]
Traditional Row/Index Based DBMS for Analytics
[Figure: "Moore's Law and Beyond" chart, repeated from the Technology Trends slide and annotated "Index Operations".]
InfiniDB Technology Foundations
[Figure: "Moore's Law and Beyond" chart, repeated from the Technology Trends slide and annotated "No Random I/O Operations".]
• Scalable Disk
• Scalable Cache
• Real-time Decompression
• Efficient I/O from cache
• Efficient I/O from disk
InfiniDB Architecture
Columnar Storage
InfiniDB Architecture – Columnar Storage
What is Columnar Storage?
• Stores each column for a table in a different file/block on disk.
  o Column 1 values stored in file 1.
  o Column 2 values stored in file 2.
  o Column 3 values stored in file 3.
[Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3.]
InfiniDB Architecture – Columnar Storage
• Rows are identified by offset. Row 101 can be found at:
  o Column 1 value is at offset 101 in file 1.
  o Column 2 value is at offset 101 in file 2.
  o Column 3 value is at offset 101 in file 3.
[Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3. Offset 101 holds 1234, 2012-01-01, and Smith respectively.]
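The offset rule above can be sketched in a few lines of Python. This is only an illustration: the in-memory lists stand in for column files, and the column names (order_id, order_date, customer) and the helper fetch_row are hypothetical, chosen so that offset 101 reproduces the example row on the slide.

```python
# Hypothetical stand-ins for three column files; each list plays the role of one file.
columns = {
    "order_id": [1133 + i for i in range(200)],   # offset 101 holds 1234
    "order_date": ["2012-01-01"] * 200,
    "customer": ["Smith"] * 200,
}

def fetch_row(columns, offset, wanted=None):
    """Rebuild a row by reading the same offset from each requested column."""
    names = wanted if wanted is not None else list(columns)
    return {name: columns[name][offset] for name in names}

row = fetch_row(columns, 101)                       # touches all three columns
name_only = fetch_row(columns, 101, ["customer"])   # touches one column only
print(row)
print(name_only)
```

Because a row exists only as a shared offset, asking for one column never requires reading the other two.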
InfiniDB Architecture – Column Restriction
Restriction: find rows based on filters.
• Column Filter (filter 1, filter 2, filter 3)
• Table Expression/Functions (exp 1, exp 2)
• Join Filter (join 1, join 2, join 3)
Just-in-time column access defers I/O until needed.
[Diagram: Col 1 in File 1, Col 2 in File 2, Col 3 in File 3, … Col 90 in File 90.]
InfiniDB Architecture – Column Projection
Projection: display columns as selected.
• Select Column Filter (filter 1, filter 2, filter 3, etc.)
Just do I/O for:
• Columns selected
• Rows that pass the filters
[Diagram: Col 1 in File 1, Col 2 in File 2, Col 3 in File 3, … Col 90 in File 90.]
Column Restriction and Projection
[Diagram: Column # Four, Column # Six, and Column # Seventeen, each spanning Extent # 5 through Extent # 27, labeled with Filter 1, Filter 2, Filter 3, and Projection steps.]
• Automatic Vertical Partitioning and Horizontal Partitioning
• Just-In-Time Materialization
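A minimal Python sketch of restriction followed by projection, assuming columns are simple in-memory lists. The function names restrict and project, the sample table, and its column names are hypothetical; the sketch only shows the order of operations, not InfiniDB's internals.

```python
def restrict(columns, filters):
    """Apply filters one column at a time, narrowing the surviving row offsets."""
    n_rows = len(next(iter(columns.values())))
    offsets = list(range(n_rows))
    touched = []  # columns actually read during restriction
    for name, predicate in filters:
        col = columns[name]
        touched.append(name)
        offsets = [i for i in offsets if predicate(col[i])]
    return offsets, touched

def project(columns, offsets, selected):
    """Read the selected columns only at offsets that passed every filter."""
    return [tuple(columns[name][i] for name in selected) for i in offsets]

columns = {
    "region": ["ASIA", "EUROPE", "ASIA", "AMERICA"],
    "year": [2010, 2011, 2011, 2011],
    "revenue": [10, 20, 30, 40],
    "comment": ["a", "b", "c", "d"],  # never read by the query below
}
filters = [("region", lambda v: v == "ASIA"), ("year", lambda v: v == 2011)]

offsets, touched = restrict(columns, filters)
result = project(columns, offsets, ["revenue"])
print(offsets, touched, result)
```

The revenue column is read only after both filters have run, and only at the surviving offsets, which is the just-in-time materialization idea in miniature.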
InfiniDB Architecture – Columnar Storage
InfiniDB Adds:                  InfiniDB Eliminates:
• Efficient I/O                 • Full Table Scan
• Real-time Compression         • Random I/O
• Fast, predictable Load        • Index Load Overhead
• Predictable Performance       • Conditional Performance
InfiniDB Architecture
Map Reduction Framework
InfiniDB – Two Tier Architecture
Purpose built for big data analytics.
• User Module (UM)
  Understands SQL.
• Performance Module (PM)
  Operates on data blocks.
[Diagram: Single Server, or …]
Tiered MPP Building Blocks
Module Process Functionality Value
Process: MySQL
  Functionality:
  • Hosts MySQL
  • Connection management
  • SQL parsing & optimization
  Value:
  • Familiar DBMS interface
  • Leverages existing partner integrations
  • Delivers full SQL syntax support

Process: Extent Map
  Functionality:
  • Abstracts physical and logical storage
  • Metadata store
  Value:
  • Enables shared nothing and shared everything storage
  • Enables partition elimination
  • Built-in failover

Process: ExeMgr
  Functionality:
  • Work distribution
  • Final results management and aggregation
  Value:
  • Independent scalability and tunable concurrency
  • Multi-threaded to take advantage of multi-core HW platforms
Tiered MPP Building Blocks
Module Process Functionality Value
Process: PrimProc
  Functionality:
  • Scale-out cache management
  • Distributed scan, filter, join and aggregation operations
  • Resource management
  Value:
  • Independent scalability and tunable performance
  • Multi-threaded to take advantage of multi-core HW platforms

Process: Data
  Functionality:
  • High Speed Bulk Load
  • Transactional DML and DDL
  • Online schema extensions
  Value:
  • Enables concurrent reads and writes, non-blocking read enabled
  • Multi-threaded to take advantage of multi-core HW platforms
Tiered MPP Building Blocks
What is the basic unit of work within the Performance Module?
• One thread working on a range of rows, typically 1/2 million rows stored in a few hundred blocks of data.
• Execute all column operations required (restriction and projection).
• Execute any group by/aggregation against local data.
• Return results to User Module.
• Primitives are run in parallel and fully distributed (MPP).
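The unit of work described above can be sketched as a small map-reduce in Python, assuming in-memory lists in place of column blocks. The names primitive and run_query are hypothetical, the row ranges are tiny for readability (the slide cites roughly half a million rows per unit), and Python threads only illustrate the structure rather than real multi-core or multi-node parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def primitive(keys, values, start, stop, predicate):
    """One unit of work: filter a row range and aggregate it locally."""
    partial = Counter()
    for i in range(start, stop):
        if predicate(values[i]):
            partial[keys[i]] += values[i]   # local SUM(values) GROUP BY keys
    return partial

def run_query(keys, values, predicate, rows_per_unit=4, workers=4):
    """Split rows into ranges, run primitives concurrently, merge partial results."""
    ranges = [(s, min(s + rows_per_unit, len(values)))
              for s in range(0, len(values), rows_per_unit)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda r: primitive(keys, values, r[0], r[1], predicate), ranges)
        for partial in partials:            # the final merge plays the User Module role
            total.update(partial)
    return dict(total)

keys = ["a", "b"] * 5
values = list(range(10))
totals = run_query(keys, values, lambda v: v >= 2)
print(totals)
```

Each primitive returns an already-aggregated partial result, so the final merge handles one small dictionary per row range instead of every qualifying row.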
InfiniDB Performance Characteristics
InfiniDB Load Performance
• Load rate capable of 1 million rows/second, depending on disk and data model.
• Consistent load rate over time.
[Figure: load rate over time.]
InfiniDB Load Performance
• Through 60 billion rows.
• Through 225 billion rows.
• Through 1.031 trillion rows.
InfiniDB Query Performance – Percona SSB
Performance Benchmark – Percona SSB
[Figure: Percona External Test vs. Internal Tests vs. 16 PMs @ AWS; cached queries, scale factor 1000. Y-axis: seconds; x-axis: SSB queries Q1.1 through Q4.3. Series: 1 PM, 2 PMs, 4 PMs, 16 PMs (AWS), InfoBright - Percona, Lucid - Percona, InfiniDB - Percona. Labeled values: 9,694.53 and 6,867.74.]
SSB Queries on Amazon Web Services (AWS)
[Figure: InfiniDB Internal vs. InfiniDB @ AWS; cached queries, scale factor 1000. Y-axis: seconds; x-axis: SSB queries Q1.1 through Q4.3. Series: 1 PM, 2 PMs, 4 PMs, 16 PMs (AWS). Labeled value: 7.83.]
Asia Region Distributor Benchmark
[Figure legend: InfiniDB (1 PM), InfiniDB (2 PMs), Legacy Columnar, DBMS-X Row-Based.]
Typical Proof‐of‐Concept Results
InfiniDB Ease of Use
InfiniDB Ease of Use – Load and Go
InfiniDB Load and Go Experience:
1. Create Table.
2. Load Data.
3. Enjoy Performance.
InfiniDB Ease of Use – Automatic Everything
• Column storage happens automatically.
• Compression happens automatically.
• Which compression to use happens automatically.
• No index build or maintenance.
• Extent map partition behavior happens automatically.
• Distribution of data across server/disk resources happens automatically.
• Distribution of work happens automatically.
• Ad-hoc performance happens automatically.
Full Featured SQL to Map‐Reduction Mapping
Robust Column-Aware Optimizer Handles:
  o Filter order optimization.
  o Join order optimization.

Powerful Join Optimizations Handle:
  o Inner join, outer join, semi-join (sub-query).
  o N-table single step hash-join (up to 60).

Queue-Based Scheduling of Performance Module Handles:
  o Automatically parallelizes query.
  o Allows small queries to get in, and return, while larger query is running.
Full Featured Mapping from SQL to Map‐Reduce
Robust Tools to Maximize Physical I/O Efficiency:
  o Reading only the columns selected to avoid I/O.
  o Just-in-time materialization to avoid I/O.
  o Automatic partition elimination to avoid I/O.
  o Scalable data buffer cache to avoid I/O from disk.
  o Compression to minimize the bytes read from disk.

Extensible User Defined Function (UDF):
  o UDFs run as full-featured functions within InfiniDB.
  o Gain full benefits of Optimizer, Join, Scheduler, and Physical I/O features.
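Partition elimination can be illustrated with a short Python sketch. It assumes each extent carries the minimum and maximum value of its column, a detail the slides do not spell out; the Extent class and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    min_value: int   # assumed per-extent metadata, kept outside the data itself
    max_value: int
    values: list

def make_extent(values):
    values = list(values)
    return Extent(min(values), max(values), values)

def scan_with_elimination(extents, low, high):
    """Return matching values and the number of extents actually read."""
    matches, extents_read = [], 0
    for ext in extents:
        if ext.max_value < low or ext.min_value > high:
            continue  # value range cannot overlap the filter: skip without reading
        extents_read += 1
        matches.extend(v for v in ext.values if low <= v <= high)
    return matches, extents_read

extents = [make_extent(range(0, 100)),
           make_extent(range(100, 200)),
           make_extent(range(200, 300))]
matches, extents_read = scan_with_elimination(extents, 150, 160)
print(len(matches), extents_read)
```

Under this assumption, a range filter on a column whose values arrive roughly in order (dates, for example) reads only the few extents that can contain a match, with no index to build or maintain.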
InfiniDB Ease of Use – Avoiding Trade‐Offs
Traditional (and some current) DBMS technologies often involve significant trade‐offs that just don’t exist within InfiniDB.
• Load Rate vs. More Indexes
• More Attributes vs. Better Performance
• Summary Tables vs. Real-time access to data
• Save Space vs. Query Performance
InfiniDB Extensibility
Extensibility for Big Data
Big Data and Extensibility
• Data size continues to escalate.
• New uses of data to drive business actions.
• New attributes and dimensions are continually being included.
InfiniDB Extensibility – Scale Efficiently
Handling Data Scale
• InfiniDB scales with your data.
• Scalability combined with very efficient I/O.
  o Columnar storage.
  o Just-in-time materialization.
  o Partition elimination.
  o Scalable cache.
  o Columnar compression.
InfiniDB Extensibility – Online Schema Changes
Schema Changes
• The InfiniDB columnar architecture eliminates table rebuilds.
• New column files are added without change to existing columns.
• InfiniDB also allows for these column additions to be handled as on-line operations.
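Why a column add avoids a table rebuild can be seen in a toy Python model, where each list stands in for one column file. The helper add_column and the sample table are hypothetical and say nothing about how InfiniDB implements the operation.

```python
def add_column(table, name, default=None):
    """Add a new column 'file' sized to the table; existing columns are not rewritten."""
    if name in table:
        raise ValueError(f"column {name!r} already exists")
    n_rows = len(next(iter(table.values()))) if table else 0
    table[name] = [default] * n_rows

table = {
    "order_id": [1, 2, 3],
    "customer": ["Smith", "Jones", "Lee"],
}
before = {name: id(col) for name, col in table.items()}  # identity of each existing column

add_column(table, "discount", 0.0)

untouched = all(id(table[name]) == before[name] for name in before)
print(untouched, table["discount"])
```

In a row store the same change widens every row; here only one new list is created and the existing ones are left exactly as they were.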
InfiniDB Extensibility – Business Logic
The Data Driven Business
• Extend your analytics capability with InfiniDB's User Defined (parallel and distributed) Functions.
• Reactive and predictive analysis of your data:
  o Quickly
  o Predictably
• Remove Barriers
  o No waiting for new aggregates to be built.
  o No waiting for new code to be written.
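The release highlights name two geographic user-defined functions: latitude/longitude distance and geo-fencing. The Python sketch below shows the kind of logic such functions carry, using the haversine formula and a ray-casting point-in-polygon test. It is plain illustrative code, not InfiniDB's UDF interface, and the function names and Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_polygon(x, y, polygon):
    """Ray casting: count polygon edges crossed by a ray running right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

fence = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(in_polygon(1, 1, fence), in_polygon(3, 1, fence))
print(haversine_km(0, 0, 0, 180))
```

Both functions depend only on the values of a single row, which is the property that lets a function of this kind be evaluated independently on each row range.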
InfiniDB Connectivity with Hadoop™
The bi‐directional InfiniDB‐Hadoop connector is designed to transfer data between the InfiniDB database and the Hadoop Cluster by implementing Hadoop versions of InfiniDBInputFormat and InfiniDBOutputFormat Classes for the Hadoop framework.
Calpont InfiniDB® – Hadoop™ Connector ‐ Coming September 2011
InfiniDB Customer Experience
Customer case studies are available at www.calpont.com. The key differentiators that lead customers to choose InfiniDB include:
• Performance at scale.
• Large number of dimensions.
• Ad-hoc query performance.
• Unique record analysis.
• Near real-time load capability.
• Faster time to market.
• Predictable query performance.
Key Takeaways
The InfiniDB Performance Architecture
• Architecture – Columnar Storage
• Architecture – Map Reduction Distribution of Work

The InfiniDB Deployment Experience
• Performance Characteristics
• Ease of Use and Flexibility
• Extensibility