Calpont InfiniDB®: Accelerating Data Insights. Scalable Analytics for Your NoSQL Big Data. Jim Tommaney, CTO, Calpont. NoSQL Now, August 24, 2011. Calpont Proprietary and Confidential.

Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data


Page 1: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Calpont InfiniDB®
Accelerating Data Insights
Scalable Analytics for Your NoSQL Big Data
Jim Tommaney, CTO, Calpont
NoSQL Now
August 24, 2011
Calpont Proprietary and Confidential

Page 2: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Key Takeaways

• Calpont and InfiniDB
• Architecture – Columnar Storage
• Architecture – Map Reduction Distribution of Work
• Performance Characteristics
• Ease of Use and Flexibility
• Extensibility

InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.

Page 3: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Calpont Corporation

• Company
o Privately held and backed
o Headquartered in Frisco, TX
• Products
o InfiniDB Enterprise: launched February 2010
o InfiniDB Community: launched October 2009

Our Mission
To provide a scalable data platform that enables analytic business decisions as timely as customers and markets dictate.

Page 4: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Release Highlights

• Version 1.0 – Oct. 2009/Feb. 2010
o Columnar storage.
o Map-reduction distribution of work.
o High speed data load.
• Version 1.5 – June 2010
o Sub-query added to map-reduction framework: Select, From, Where clause support; correlated and non-correlated sub-query.

Page 5: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Release Highlights

• Version 2.0 – November 2010
o Compression with real-time decompression.
o User-defined functions, fully parallel and distributed: latitude/longitude distance calculation; geo-fencing (is a location within a polygon).
o Enhanced partition elimination.
o Enhanced parallelization of reduction operations.
• Version 2.1 – March 2011
o Statistical aggregate functions.
o View support.
o Auto-increment.
o Insert-select.
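The two user-defined functions named for Version 2.0 can be illustrated with a minimal Python sketch. This is not InfiniDB's implementation: the function names, the haversine formula, the ray-casting test, and the Earth-radius constant are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_polygon(lat, lon, polygon):
    """Ray-casting geo-fence test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat_a, lon_a = polygon[i]
        lat_b, lon_b = polygon[(i + 1) % n]
        # Only edges that straddle the point's longitude can be crossed.
        if (lon_a > lon) != (lon_b > lon):
            # Latitude at which this edge crosses the point's longitude.
            cross = lat_a + (lon - lon_a) * (lat_b - lat_a) / (lon_b - lon_a)
            if lat < cross:
                inside = not inside
    return inside


fence = [(0, 0), (0, 10), (10, 10), (10, 0)]  # hypothetical square fence
```

Both functions are row-at-a-time and stateless, which is the property that lets a function of this kind be evaluated in parallel on each node's local rows.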

Page 6: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Release Highlights

• Version 2.2 – June 2011
o Group_concat and bit aggregate functions.
o Additional scalar functions made parallel and distributed.
o Improved performance and memory for large strings.
• Version 3.0 – Q4/Q1
o Cloud shared nothing.
o Distributed/parallel load.

Page 7: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Technology Trends

[Chart: "Moore's Law and Beyond". Y axis: Percent Increase (0 to 300). X axis: Years (5 to 10). Series: Data Warehouse Growth 75%, Memory Capacity 60%, Disk Capacity 50%, Moore's Law (CPU) 45%, Disk Bandwidth 40%, Memory Bandwidth 20%, Disk Latency 10%, Memory Latency 10%.]

Page 8: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Trends Drive Demand for Alternate Solutions

[Chart: "Moore's Law and Beyond". Y axis: Percent Increase (0 to 300). X axis: Years (5 to 10). Series: Data Warehouse Growth 75%, Memory Capacity 60%, Disk Capacity 50%, Moore's Law (CPU) 45%, Disk Bandwidth 40%, Memory Bandwidth 20%, Disk Latency 10%, Memory Latency 10%.]

Page 9: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Traditional Row/Index Based DBMS for Analytics

[Chart: "Moore's Law and Beyond". Y axis: Percent Increase (0 to 300). X axis: Years (5 to 10). Series: Data Warehouse Growth 75%, Memory Capacity 60%, Disk Capacity 50%, Moore's Law (CPU) 45%, Disk Bandwidth 40%, Memory Bandwidth 20%, Disk Latency 10%, Memory Latency 10%. Callout: Index Operations.]

Page 10: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Technology Foundations

[Chart: "Moore's Law and Beyond". Y axis: Percent Increase (0 to 300). X axis: Years (5 to 10). Series: Data Warehouse Growth 75%, Memory Capacity 60%, Disk Capacity 50%, Moore's Law (CPU) 45%, Disk Bandwidth 40%, Memory Bandwidth 20%, Disk Latency 10%, Memory Latency 10%.]

Callouts on the chart:
• Scalable Disk
• Scalable Cache
• Real-time Decompression
• Efficient I/O from cache
• Efficient I/O from disk
• No Random I/O Operations

Page 11: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Columnar Storage

Page 12: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Columnar Storage

What is Columnar Storage?
• Stores each column for a table in a different file/block on disk.
o Column 1 values stored in file 1.
o Column 2 values stored in file 2.
o Column 3 values stored in file 3.

[Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3.]

Page 13: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Columnar Storage

• Rows are identified by offset. Row 101 can be found at:
o Column 1 value is at offset 101 in file 1.
o Column 2 value is at offset 101 in file 2.
o Column 3 value is at offset 101 in file 3.

[Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3. Offset 101 holds: 1234, 2012-01-01, Smith.]
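The offset rule can be sketched in a few lines of Python. This is a toy model under stated assumptions: an in-memory list stands in for each column file, and the class name and column names (`order_id`, `order_date`, `last_name`) are hypothetical, not taken from InfiniDB.

```python
class ColumnStore:
    """Toy column store: one list per column stands in for one file per column."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}

    def append_row(self, row):
        """Append one value to every column and return the new row's offset."""
        for name, values in self.columns.items():
            values.append(row[name])
        return len(next(iter(self.columns.values()))) - 1

    def read_row(self, offset, names=None):
        """Rebuild a row by reading the same offset from each requested column."""
        names = names or list(self.columns)
        return {name: self.columns[name][offset] for name in names}


store = ColumnStore(["order_id", "order_date", "last_name"])
for i in range(101):  # filler rows occupy offsets 0..100
    store.append_row({"order_id": i, "order_date": "2011-12-31", "last_name": "n/a"})
offset = store.append_row(
    {"order_id": 1234, "order_date": "2012-01-01", "last_name": "Smith"}
)
```

Because a row is just a shared offset, reading one column of row 101 touches one "file" and leaves the other two unread.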

Page 14: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Column Restriction

Restriction - find rows based on filters
• Column Filter (filter 1, filter 2, filter 3)
• Table Expression/Functions (exp 1, exp 2)
• Join Filter (join 1, join 2, join 3)

Just-in-time column access defers I/O until needed.

[Diagram: Col 1 in File 1, Col 2 in File 2, Col 3 in File 3, … Col 90 in File 90.]

Page 15: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Column Projection

Projection – display columns as selected.
• Select Column Filter (filter 1, filter 2, filter 3, etc.)

Just do I/O for:
• Columns selected
• Rows that pass the filters

[Diagram: Col 1 in File 1, Col 2 in File 2, Col 3 in File 3, … Col 90 in File 90.]
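Restriction followed by projection can be sketched as two small functions. This is an illustrative model, not InfiniDB code: the function names, the dictionary-of-lists table, and the sample data are assumptions. Each filter reads only its own column, and only at offsets that survived the earlier filters; projection then reads only the selected columns for the surviving rows.

```python
def restrict(columns, filters):
    """Apply (column_name, predicate) filters in order; return surviving offsets."""
    n_rows = len(next(iter(columns.values())))
    offsets = range(n_rows)
    for name, predicate in filters:
        values = columns[name]  # this column is first touched here
        offsets = [i for i in offsets if predicate(values[i])]
    return list(offsets)


def project(columns, offsets, selected):
    """Materialize only the selected columns, only for the surviving rows."""
    return [tuple(columns[name][i] for name in selected) for i in offsets]


columns = {
    "region": ["ASIA", "EUROPE", "ASIA", "AMERICA", "ASIA"],
    "year": [2010, 2011, 2011, 2011, 2011],
    "revenue": [100, 250, 300, 125, 50],
    "comment": ["a", "b", "c", "d", "e"],  # never read by the query below
}
hits = restrict(
    columns,
    [("region", lambda v: v == "ASIA"), ("year", lambda v: v == 2011)],
)
rows = project(columns, hits, ["revenue"])
```

In this example the `comment` column is never accessed, and `revenue` is read for only two of the five rows.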

Page 16: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Column Restriction and Projection

[Diagram: Column # Four, Column # Six, and Column # Seventeen laid out across extents (Extent # 5 through Extent # 27), with Filter 1, Filter 2, and Filter 3 applied, followed by two Projection steps.]

• Automatic Vertical Partitioning and Horizontal Partitioning
• Just-In-Time Materialization

Page 17: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Columnar Storage

InfiniDB Adds:
• Efficient I/O
• Real-time Compression
• Fast, predictable Load
• Predictable Performance

InfiniDB Eliminates:
• Full Table Scan
• Random I/O
• Index Load Overhead
• Conditional Performance

Page 18: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Architecture – Map Reduction Framework

Page 19: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB – Two Tier Architecture

Purpose built for big data analytics.
• User Module (UM)
o Understands SQL.
• Performance Module (PM)
o Operates on data blocks.

[Diagram: Single Server or …]

Page 20: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Tiered MPP Building Blocks

Process: MySQL
• Functionality
o Hosts MySQL
o Connection management
o SQL parsing & optimization
• Value
o Familiar DBMS interface
o Leverages existing partner integrations
o Delivers full SQL syntax support

Process: Extent Map
• Functionality
o Abstracts physical and logical storage
o Metadata store
• Value
o Enables shared nothing and shared everything storage
o Enables partition elimination
o Built-in failover

Process: ExeMgr
• Functionality
o Work distribution
o Final results management and aggregation
• Value
o Independent scalability and tunable concurrency
o Multi-threaded to take advantage of multi-core HW platforms

Page 21: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Tiered MPP Building Blocks

Process: PrimProc
• Functionality
o Scale-out cache management
o Distributed scan, filter, join and aggregation operations
o Resource management
• Value
o Independent scalability and tunable performance
o Multi-threaded to take advantage of multi-core HW platforms

Process: Data
• Functionality
o High Speed Bulk Load
o Transactional DML and DDL
o Online schema extensions
• Value
o Enables concurrent reads and writes, non-blocking read enabled
o Multi-threaded to take advantage of multi-core HW platforms

Page 22: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Tiered MPP Building Blocks

What is the basic unit of work within the Performance Module?

• One thread working on a range of rows. Typically 1/2 million rows, stored in a few hundred blocks of data.
• Execute all column operations required (restriction and projection).
• Execute any group by/aggregation against local data.
• Return results to User Module.
• Primitives are run in parallel and fully distributed (MPP).
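The unit of work described above can be sketched as a small map-reduce in Python. This is a single-process illustration under assumptions: the names `primitive` and `run_query` are hypothetical, a thread pool stands in for distributed Performance Modules, and the range size is tiny so the example stays readable.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def primitive(keys, values, start, stop, predicate):
    """One unit of work: filter a row range and aggregate it locally."""
    partial = Counter()
    for i in range(start, stop):
        if predicate(values[i]):
            partial[keys[i]] += values[i]  # local SUM(value) GROUP BY key
    return partial


def run_query(keys, values, predicate, rows_per_unit, workers=4):
    """Split rows into ranges, run primitives in parallel, merge the partials."""
    ranges = [
        (start, min(start + rows_per_unit, len(values)))
        for start in range(0, len(values), rows_per_unit)
    ]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda r: primitive(keys, values, r[0], r[1], predicate), ranges
        )
        for partial in partials:
            total.update(partial)  # final merge, the User Module's role here
    return dict(total)


keys = ["a", "b", "a", "c", "b", "a", "c", "a"]
values = [10, 20, 30, 40, 50, 60, 70, 80]
result = run_query(keys, values, lambda v: v >= 30, rows_per_unit=3)
```

Only the small per-range partial aggregates travel back for the final merge; the row data itself stays where the primitive ran.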

Page 23: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Performance Characteristics

Page 24: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Load Performance

• Load rate capable of 1 million rows/second, depending on disk and data model.
• Consistent load rate over time.

[Chart: load rate plotted against time.]

Page 25: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Load Performance

• Through 60 billion rows.
• Through 225 billion rows.
• Through 1.031 trillion rows.
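For a sense of scale, the arithmetic below converts those row counts into load time. It assumes, purely for illustration, that the quoted 1 million rows/second is sustained for the whole load; the slides state only that the rate depends on disk and data model, so these are not measured results.

```python
def load_time_hours(rows, rows_per_second=1_000_000):
    """Hours needed to load `rows` at a constant, assumed rate."""
    return rows / rows_per_second / 3600


for rows in (60e9, 225e9, 1.031e12):
    print(f"{rows:.3e} rows: {load_time_hours(rows):,.1f} hours")
```

Under that assumption the three milestones work out to roughly 16.7 hours, 62.5 hours, and a little under 12 days of continuous loading.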

Page 26: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Query Performance – Percona SSB


Page 27: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Performance Benchmark – Percona SSB

[Chart: "Percona External Test vs. Internal Tests vs. 16 PMs @ AWS, cached queries, scale factor 1000". Y axis: Seconds (0 to 50000). X axis: SSB queries Q1.1, Q1.2, Q1.3, Q2.1, Q2.2, Q2.3, Q3.1, Q3.2, Q3.3, Q3.4, Q4.1, Q4.2, Q4.3. Series: 1 PM, 2 PMs, 4 PMs, 16 PMs (AWS), InfoBright - Percona, Lucid - Percona, InfiniDB - Percona. Data labels shown: 9,694.53 and 6,867.74.]

Page 28: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

SSB Queries on Amazon Web Services (AWS)

[Chart: "InfiniDB Internal vs. InfiniDB @ AWS - cached queries, scale factor 1000". Y axis: seconds (0 to 1200). X axis: SSB queries Q1.1, Q1.2, Q1.3, Q2.1, Q2.2, Q2.3, Q3.1, Q3.2, Q3.3, Q3.4, Q4.1, Q4.2, Q4.3. Series: 1 PM, 2 PMs, 4 PMs, 16 PMs (AWS). Data label shown: 7.83.]

Page 29: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Asia Region Distributor Benchmark

[Chart legend: InfiniDB (1 PM), InfiniDB (2 PMs), Legacy Columnar, DBMS-X Row-Based.]

Page 30: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Typical Proof‐of‐Concept Results


Page 31: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Ease of Use 

Page 32: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Ease of Use – Load and Go

InfiniDB Load and Go Experience:

1. Create Table.
2. Load Data.
3. Enjoy Performance.

Page 33: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Ease of Use – Automatic Everything

• Column storage happens automatically.
• Compression happens automatically.
• Which compression to use happens automatically.
• No index build or maintenance.
• Extent map partition behavior happens automatically.
• Distribution of data across server/disk resources happens automatically.
• Distribution of work happens automatically.
• Ad-hoc performance happens automatically.

Page 34: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Full Featured SQL to Map‐Reduction Mapping

Robust Column-Aware Optimizer Handles:
o Filter order optimization.
o Join order optimization.

Powerful Join Optimizations Handle:
o Inner join, outer join, semi-join (sub-query).
o N-table single step hash-join (up to 60).

Queue-Based Scheduling of Performance Module Handles:
o Automatically parallelizes query.
o Allows small queries to get in, and return, while larger query is running.
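An N-table single-step hash join can be sketched as follows. This is a generic textbook formulation, not InfiniDB's join code: the function name, the dictionary-based hash tables, and the sample tables are assumptions. Each dimension table is hashed once (the build phase), then the fact table is scanned a single time and probed against every hash table in that one pass.

```python
def hash_join(fact_rows, dimensions):
    """Inner-join a fact table to several dimension tables in one pass.

    `dimensions` maps a fact-table key column to {key: dimension_row}.
    Those dicts are the pre-built hash tables; the loop is the probe phase.
    """
    joined = []
    for fact in fact_rows:
        out = dict(fact)
        for key_column, table in dimensions.items():
            match = table.get(fact[key_column])
            if match is None:
                break  # no match in this dimension: drop the fact row
            out.update(match)
        else:
            joined.append(out)  # matched every dimension
    return joined


facts = [
    {"cust_id": 1, "part_id": 10, "qty": 5},
    {"cust_id": 2, "part_id": 11, "qty": 3},
    {"cust_id": 3, "part_id": 10, "qty": 7},  # cust_id 3 has no customer row
]
customers = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}}
parts = {10: {"brand": "X"}, 11: {"brand": "Y"}}
result = hash_join(facts, {"cust_id": customers, "part_id": parts})
```

The point of the single step is that the large table is read once regardless of how many smaller tables it joins to.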

Page 35: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Full Featured Mapping from SQL to Map‐Reduce

Robust Tools to Maximize Physical I/O Efficiency:
o Reading only the columns selected to avoid I/O.
o Just-in-time materialization to avoid I/O.
o Automatic partition elimination to avoid I/O.
o Scalable data buffer cache to avoid I/O from disk.
o Compression to minimize the bytes read from disk.

Extensible User Defined Function (UDF):
o UDFs run as full-featured functions within InfiniDB.
o Gain full benefits of Optimizer, Join, Scheduler, and Physical I/O features.
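Partition elimination can be illustrated with a short sketch. It assumes that the extent map keeps a minimum and maximum value for each extent of a column; the slides do not spell out that mechanism, so treat the `Extent` class, its fields, and the helper names as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    number: int
    min_value: int  # assumed per-extent metadata
    max_value: int  # assumed per-extent metadata
    values: list


def build_extents(values, rows_per_extent):
    """Cut a column into fixed-size extents and record each extent's min/max."""
    extents = []
    for n, start in enumerate(range(0, len(values), rows_per_extent)):
        chunk = values[start:start + rows_per_extent]
        extents.append(Extent(n, min(chunk), max(chunk), chunk))
    return extents


def scan_range(extents, low, high):
    """Return (matching values, number of extents actually read)."""
    matches, extents_read = [], 0
    for ext in extents:
        if ext.max_value < low or ext.min_value > high:
            continue  # eliminated from metadata alone, no block I/O
        extents_read += 1
        matches.extend(v for v in ext.values if low <= v <= high)
    return matches, extents_read


dates = list(range(20110101, 20110113))  # 12 ascending date keys
extents = build_extents(dates, rows_per_extent=4)
matches, extents_read = scan_range(extents, 20110106, 20110107)
```

Here two of the three extents are skipped without being read. The benefit depends on the data being loaded in roughly sorted order, as with time-ordered loads.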

Page 36: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Ease of Use – Avoiding Trade‐Offs

Traditional (and some current) DBMS technologies often involve significant trade‐offs that just don’t exist within InfiniDB.

• Load Rate vs. More Indexes
• More Attributes vs. Better Performance
• Summary Tables vs. Real-time access to data
• Save Space vs. Query Performance

Page 37: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Extensibility

Page 38: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Extensibility for Big Data

Big Data and Extensibility
• Data size continues to escalate.
• New uses of data to drive business actions.
• New attributes and dimensions are continually being included.

Page 39: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Extensibility – Scale Efficiently

Handling Data Scale
• InfiniDB scales with your data.
• Scalability combined with very efficient I/O.
o Columnar storage.
o Just-in-time materialization.
o Partition elimination.
o Scalable cache.
o Columnar compression.

Page 40: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Extensibility – Online Schema Changes

Schema Changes
• The InfiniDB columnar architecture eliminates table rebuilds.
• New column files are added without change to existing columns.
• InfiniDB also allows for these column additions to be handled as on-line operations.
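Why a columnar layout avoids table rebuilds can be shown with a toy model in which each column is its own list, standing in for its own file. The function name, table, and default value are illustrative assumptions, not InfiniDB's DDL path.

```python
def add_column(columns, name, default=None):
    """Add a column by creating one new value list; existing lists are untouched."""
    if name in columns:
        raise ValueError(f"column {name!r} already exists")
    n_rows = len(next(iter(columns.values()))) if columns else 0
    columns[name] = [default] * n_rows
    return columns


table = {"order_id": [1, 2, 3], "amount": [9.5, 20.0, 3.25]}
# Remember the identity of each existing column object before the change.
before = {name: id(values) for name, values in table.items()}
add_column(table, "channel", default="web")
```

The existing column objects are the same ones as before the change, which mirrors the claim that existing column files are not rewritten. A row store would have to rewrite every row to widen it.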

Page 41: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Extensibility – Business Logic

The Data Driven Business
• Extend your analytics capability with InfiniDB's User Defined (parallel and distributed) Functions.
• Reactive and predictive analysis of your data:
o Quickly
o Predictably
• Remove Barriers
o No waiting for new aggregates to be built.
o No waiting for new code to be written.

Page 42: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Connectivity with Hadoop™

The bi‐directional InfiniDB‐Hadoop connector is designed to transfer data between the InfiniDB database and the Hadoop Cluster by implementing Hadoop versions of InfiniDBInputFormat and InfiniDBOutputFormat Classes for the Hadoop framework. 

Calpont InfiniDB® – Hadoop™ Connector ‐ Coming September 2011


Page 43: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Customer Experience

Page 44: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

InfiniDB Customer Experience

A number of customer case studies are available at www.calpont.com for further detail. The key differentiators customers cite for choosing InfiniDB include:

• Performance at scale.
• Large number of dimensions.
• Ad-hoc query performance.
• Unique record analysis.
• Near real-time load capability.
• Faster time to market.
• Predictable query performance.

Page 45: Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data

Key Takeaways

The InfiniDB Performance Architecture
• Architecture – Columnar Storage
• Architecture – Map Reduction Distribution of Work

The InfiniDB Deployment Experience
• Performance Characteristics
• Ease of Use and Flexibility
• Extensibility