
InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


DESCRIPTION

Calpont CTO Jim Tommaney provides an overview of InfiniDB 3, Calpont’s analytic data platform.

Discussion Topics
• How InfiniDB is architected for Big Data analytics
• How InfiniDB is provisioned for Amazon EC2 with an AMI
• How to quickly create a small or large cluster
• How InfiniDB’s parallel load capabilities deliver linear load scaling


Page 1: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

Calpont InfiniDB® Accelerating Data Insights

InfiniDB 3: Speeding Big Data Analytics in Amazon EC2 Jim Tommaney, CTO Calpont June 2012


Page 2: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

InfiniDB® Scalable. Fast. Simple. © 2012 Calpont. All Rights Reserved.

Today’s Presenter - Jim Tommaney

• Calpont’s Chief Technologist
• Architect of InfiniDB
• 25 years experience in applied data technologies for BI and analytics
• Drives InfiniDB roadmap and futures
• Closely engaged in client deployments and POCs


Page 3: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


Today’s Discussion

• Introduction
• InfiniDB Architecture for Big Data Analytics
• InfiniDB 3
  o Provisioned for Amazon EC2
  o Demo – Creating a small or large cluster
  o Parallel load options for load scaling
  o Demo – Cpimport Load for MPP


Page 4: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


How Fast is the World’s Big Data Footprint Growing?

How Big is a Byte?
1 gigabyte = 1,000,000,000 bytes
1 terabyte = 1,000 gigabytes
1 exabyte = 1 million terabytes
1 zettabyte = 1 billion terabytes
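The unit ladder above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, as the slide's "1 gigabyte = 1,000,000,000 bytes" implies; the `UNITS` table and `to_bytes` helper are illustrative names, not part of the presentation.

```python
# Decimal (SI) storage units, matching the slide's 1 GB = 10**9 bytes.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes (hypothetical helper)."""
    return value * UNITS[unit]


# Each row of the slide, expressed as an equality between byte counts.
assert to_bytes(1000, "gigabyte") == to_bytes(1, "terabyte")
assert to_bytes(10**6, "terabyte") == to_bytes(1, "exabyte")
assert to_bytes(10**9, "terabyte") == to_bytes(1, "zettabyte")
```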

How is Data Growing Daily?
• 15 petabytes of new information are created each day, 8x more information than in all the libraries in the United States
• Experiments at the CERN laboratory generate 40 TB of data every second

IT Implications of Big Data
• Wal-Mart handles one million transactions every hour, feeding databases that store 2.5 petabytes, 167 times the books in America’s Library of Congress

Page 5: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


Evolution of the Analytic Platform

Timeline, 1960s to 2010+: First DBMS (IDS), Relational Prototypes, Relational Systems, Commercialize, DBMS Extensions, OLAP / MOLAP / Cubes, Analytic Platforms

Page 6: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


What is InfiniDB?


Columnar Performance Efficiency

Widely used MySQL Interface

MPP, MapReduce style Query Execution

Simple, Powerful Platform for Big Data Analytics
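As a rough illustration of what "columnar" and "MPP, MapReduce style" mean together, the sketch below stores a table column by column, splits one column into chunks, computes a partial aggregate per chunk in parallel (the map step), and combines the partials (the reduce step). It is a generic toy in Python, not InfiniDB's implementation; the function names, the thread pool, and the chunking scheme are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(column_chunk):
    """Map step: a worker scans only its own chunk of a single column."""
    return sum(column_chunk), len(column_chunk)


def parallel_average(column, n_workers):
    """Average one column by combining per-chunk partial results."""
    if not column:
        raise ValueError("column is empty")
    chunk_size = -(-len(column) // n_workers)  # ceiling division
    chunks = [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Reduce step: merge the (sum, count) pairs from every worker.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# A columnar table keeps each column in its own list, so a query on
# "amount" never has to read "store_id".
table = {"store_id": [1, 1, 2, 2, 3], "amount": [10, 20, 30, 40, 50]}
average_amount = parallel_average(table["amount"], n_workers=2)
```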

Page 7: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


Benefits of InfiniDB


Real-time, Consistent Query Performance

Linear Scale for Massive Data

Removes Limits to Dimensions and Granularity

Easy to Deploy and Maintain

Page 8: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


How InfiniDB is Used

Diagram columns: Analytic Needs, Analytic Data Environment, Big Data Sources, Data Integration

Diagram labels: Data Warehouse, Hadoop, Operational, Transactional, Dimensional Analytics, Data Discovery, Predictive Analytics, Analytic Data Store, ETL, MDM, Direct Load Model, Legacy RDBMS

Page 9: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


Big Data Reference Architecture

Diagram columns: Analytic Needs, Analytic Data Environment, Big Data Sources, Data Integration

Diagram labels: Hadoop, Operational, Transactional, Dimensional Analytics, Data Discovery, Predictive Analytics, ETL, MDM, Legacy RDBMS

Page 10: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


InfiniDB Product Evolution

InfiniDB 1.0
• 100% Columnar
• Full scale-out MPP
• Fully integrated map reduction operations
• High speed data load

InfiniDB 1.5
• Full parallel sub-query
• UTF-8 Support
• Expanded SQL support
• Added support for additional Linux platforms

InfiniDB 2.0
• UDFs for In-database analytics
• Real-time compression
• Enhanced partitioning
• Enhanced parallelization

InfiniDB 3
• Parallel Load for Big Data
• Transparent provisioning and run time operations on Amazon EC2

Page 11: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


InfiniDB 3


Increasing Flexibility while Preserving Simplicity and Speed

Easier to…
• Take advantage of Cloud deployments
• Load Massive Data for Distributed HW

Deployment Flexibility

Unmatched Simplicity and Speed

Page 12: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


InfiniDB 3 - New Capabilities


• Prepackaged AMI for automatic provisioning of InfiniDB nodes on EC2
• Transparent support of EC2 virtual storage and data redundancy (EBS) policies

Page 13: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


Accessing the InfiniDB AMI Trial


1. Calpont.com/tryinfiniDB

2. Select the AMI option

3. Provide AWS #

4. Calpont will provide access within 24 hrs

Page 14: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


Big Data Reference Architecture

Diagram columns: Analytic Needs, Analytic Data Environment, Big Data Sources, Data Integration

Diagram labels: Hadoop, Operational, Transactional, Dimensional Analytics, Data Discovery, Predictive Analytics, ETL, MDM, Legacy RDBMS, User Module, Performance Module

Page 15: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

InfiniDB AMI DEMO

Page 16: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


InfiniDB 3 - New Capabilities

• Parallel Data Load designed for Big Data


Same simple command

Several data load configurations possible

Linear performance as more nodes participate in the loading

No query performance degradation during data load

SIMPLE

SCALABLE

FAST
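To make the "linear performance" claim concrete, here is a back-of-the-envelope model: if every loading node sustains the same rate and nothing else is a bottleneck, load time is total rows divided by the combined rate. The function name and the row counts and rates in the example are hypothetical figures chosen for illustration, not measurements from the presentation.

```python
def estimated_load_seconds(total_rows, rows_per_second_per_node, n_nodes):
    """Ideal linear-scaling estimate: every node loads at the same rate."""
    if n_nodes < 1 or rows_per_second_per_node <= 0:
        raise ValueError("need at least one node and a positive rate")
    return total_rows / (rows_per_second_per_node * n_nodes)


# Hypothetical workload: 1 million rows at 100,000 rows/s per node.
one_node = estimated_load_seconds(1_000_000, 100_000, n_nodes=1)
four_nodes = estimated_load_seconds(1_000_000, 100_000, n_nodes=4)
```

Under this idealized model, quadrupling the nodes cuts the load time to a quarter; real clusters would also have to account for network and source-file throughput.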

Page 17: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


InfiniDB 3 - Parallel Data Load Options


Bulk Load, Central
• 1 data source
• 1 single command
• Auto distribution across S/N

Single Bulk Load, Partitioned
• n partitioned data sources
• 1 single command
• n Performance Module nodes

Parallel Bulk Load, Partitioned
• n partitioned data sources
• n bulk load commands
• n Performance Modules
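The three options differ in how many data sources and how many load commands are involved. The toy model below captures only that distinction: it does not invoke cpimport, and the function names, the round-robin distribution, and the option keys are assumptions made for illustration rather than InfiniDB behavior.

```python
def central_load(rows, n_modules):
    """Bulk Load, Central: one source, rows spread across all modules.

    Round-robin placement is an assumption of this sketch.
    """
    targets = [[] for _ in range(n_modules)]
    for i, row in enumerate(rows):
        targets[i % n_modules].append(row)
    return targets


def partitioned_load(partitions):
    """Partitioned options: source i is loaded into Performance Module i."""
    return [list(partition) for partition in partitions]


def commands_needed(option, n_modules):
    """Number of load commands an operator issues for each option."""
    return {
        "central": 1,
        "single_partitioned": 1,
        "parallel_partitioned": n_modules,
    }[option]


placement = central_load(list(range(6)), n_modules=3)
# placement == [[0, 3], [1, 4], [2, 5]]
```

In the partitioned options the data is already split per module before loading starts, which is why the third option can run one command per Performance Module in parallel.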

Page 18: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

InfiniDB Cpimport Load DEMO

Page 19: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2


InfiniDB 3 – Key Takeaways

• Scalable with Amazon but same platform you are used to on-premise
• Easier to deploy with AMI
• Extended load performance to MPP deployments

Andy Hayler, Information Difference, 2011 DW Landscape Survey:

“Based on this survey, the data warehouse vendor with the happiest customers in 2011 was Teradata, followed by CALPONT, then IBM, followed by Kognitio and Kalido”

Page 20: InfiniDB 3 - Speeding Big Data Analytics in Amazon EC2

www.calpont.com @Calpont, @InfiniDB
