Optiq A dynamic data management framework @julianhyde Friday, October 4, 13

Optiq: A dynamic data management framework


DESCRIPTION

Optiq is a dynamic data management framework. In this talk, which I gave at the Pentaho Community Meetup in Sintra, Portugal in October, 2013, I describe how Optiq can be combined with Pentaho tools such as Mondrian to build high-performance analytics on Hadoop, NoSQL, and heterogeneous big data systems.


Page 1: Optiq: A dynamic data management framework

Optiq: A dynamic data management framework

@julianhyde

Friday, October 4, 13

Page 2: Optiq: A dynamic data management framework

3 things

1. Databases are good.

2. It is hard to build analytics if your data is “all over the place.”

3. Optiq makes heterogeneous data look and behave more like a database.


Page 3: Optiq: A dynamic data management framework

Mondrian on Optiq on hybrid data + memory


Page 4: Optiq: A dynamic data management framework

Big Data


Page 5: Optiq: A dynamic data management framework


Page 6: Optiq: A dynamic data management framework

“Data all over the place”

• Different locations (HDFS, memory, DBMS)

• Different formats

• Different workloads

• Transactional: Mainly write, targeted read

• Analytic: Mainly read (bulk write)

• Query latency: Interactive vs. batch

• Data latency: Can we show out-of-date data?


Page 7: Optiq: A dynamic data management framework

Databases are good

• Central point to manage data

• Simple, standard API for apps

• Powerful modeling techniques (e.g. star schemas)

• Data independence (i.e. tune your data after you write your application)

• Query optimization


Page 8: Optiq: A dynamic data management framework

Optiq


Page 9: Optiq: A dynamic data management framework

Conventional DB architecture

Page 10: Optiq: A dynamic data management framework

Optiq architecture

Page 11: Optiq: A dynamic data management framework

Examples


Page 12: Optiq: A dynamic data management framework

Example #1: CSV

• Uses CSV adapter (optiq-csv)

• Demo using sqlline

• Easy to run this for yourself:

    $ git clone https://github.com/julianhyde/optiq-csv
    $ cd optiq-csv
    $ mvn install
    $ ./sqlline
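The idea behind a CSV adapter is that plain files can be queried as if they were tables. The Python sketch below illustrates that idea only; it does not use optiq-csv. It copies the rows into an in-memory SQLite database instead of reading the file at query time, and the CSV content, the `csv_as_table` helper, and the column names are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; with a real adapter this would be a file on disk.
EMPS_CSV = """empno,name,deptno
100,Fred,10
110,Eric,20
120,Wilma,20
"""


def csv_as_table(conn, table_name, csv_text):
    """Load CSV text into a SQLite table (hypothetical helper, all columns TEXT)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )


conn = sqlite3.connect(":memory:")
csv_as_table(conn, "emps", EMPS_CSV)

# Once loaded, the CSV data answers ordinary SQL.
rows = conn.execute(
    "SELECT deptno, COUNT(*) AS c FROM emps GROUP BY deptno ORDER BY deptno"
).fetchall()
print(rows)
```

The application only sees SQL; where the rows came from is hidden behind the table name.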


Page 13: Optiq: A dynamic data management framework

More adapters

Adapters    Embedded               Planned
CSV         Cascading (Lingual)    HBase (Phoenix)
JDBC        Apache Drill           Spark
MongoDB                            Cassandra
Splunk                             Mondrian
linq4j


Page 14: Optiq: A dynamic data management framework

Example #2: Splunk + MySQL

    SELECT p.product_name, COUNT(*) AS c
    FROM splunk.splunk AS s
      JOIN mysql.products AS p
      ON s.product_id = p.product_id
    WHERE s.action = 'purchase'
    GROUP BY p.product_name
    ORDER BY c DESC
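To see what this query returns without a Splunk or MySQL installation, the sketch below runs the same SQL in Python against two in-memory SQLite databases attached under the schema names `splunk` and `mysql`. This is a stand-in, not Optiq: SQLite holds both tables itself, whereas the point of the slide is that the two schemas live in different systems. The sample rows and product names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for the two back ends.
conn.execute("ATTACH DATABASE ':memory:' AS splunk")
conn.execute("ATTACH DATABASE ':memory:' AS mysql")

conn.execute("CREATE TABLE splunk.splunk (product_id INTEGER, action TEXT)")
conn.execute(
    "CREATE TABLE mysql.products (product_id INTEGER, product_name TEXT)"
)

# Invented event log: three purchases and two views.
conn.executemany(
    "INSERT INTO splunk.splunk VALUES (?, ?)",
    [(1, "purchase"), (1, "view"), (2, "purchase"), (1, "purchase"), (3, "view")],
)
# Invented product dimension table.
conn.executemany(
    "INSERT INTO mysql.products VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (3, "gizmo")],
)

QUERY = """
SELECT p.product_name, COUNT(*) AS c
FROM splunk.splunk AS s
  JOIN mysql.products AS p
  ON s.product_id = p.product_id
WHERE s.action = 'purchase'
GROUP BY p.product_name
ORDER BY c DESC
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

The query text is unchanged from the slide; only the storage behind the schema names differs.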


Page 15: Optiq: A dynamic data management framework

Expression tree


Page 16: Optiq: A dynamic data management framework

Optimized tree


Page 17: Optiq: A dynamic data management framework

Analytics on heterogeneous data


Page 18: Optiq: A dynamic data management framework

Simple analytics problem

• 100M U.S. census records

• 1KB each record, 100GB total

• 4 SATA3 disks, total 1.2GB/s

• How to count all records in under 5s?


Page 20: Optiq: A dynamic data management framework

Simple analytics problem

• 100M U.S. census records

• 1KB each record, 100GB total

• 4 SATA3 disks, total 1.2GB/s

• How to count all records in under 5s?

• Not possible?! It takes 80s just to read the data.
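The arithmetic behind that figure, as a short Python calculation. It assumes decimal units (1KB = 1,000 bytes, 1GB = 10^9 bytes) and that the four disks sustain their combined 1.2GB/s for the whole scan. The exact result is about 83s, which the slide rounds to 80s.

```python
records = 100_000_000       # 100M census records
record_bytes = 1_000        # 1KB per record (decimal units assumed)
disk_bytes_per_s = 1.2e9    # 4 SATA3 disks, 1.2GB/s combined

total_bytes = records * record_bytes
scan_seconds = total_bytes / disk_bytes_per_s

# Throughput that a brute-force scan would need to meet the 5s target.
required_bytes_per_s = total_bytes / 5

print(f"total data: {total_bytes / 1e9:.0f} GB")
print(f"full scan:  {scan_seconds:.1f} s")
print(f"needed:     {required_bytes_per_s / 1e9:.0f} GB/s")
```

A brute-force scan would need roughly 17 times the available disk bandwidth, so the answer has to come from reading less data.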


Page 22: Optiq: A dynamic data management framework

Solution: Cheat!

• Compress data

• Column-oriented storage

• Store data in sorted order

• Put data in memory

• Cache previous query results

• Pre-compute (materialize) aggregates
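As an illustration of the last item, the Python sketch below pre-computes a per-state count once and then answers "how many records?" from the aggregate instead of the base table. It uses SQLite with 1,000 synthetic rows as a stand-in for the census data; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, age INTEGER)")

# Synthetic stand-in for the 100M census records.
rows = [("CA", 20 + i % 50) for i in range(600)]
rows += [("NY", 30 + i % 40) for i in range(400)]
conn.executemany("INSERT INTO census VALUES (?, ?)", rows)

# Pre-compute (materialize) the aggregate once, ahead of query time.
conn.execute(
    """
    CREATE TABLE census_by_state AS
    SELECT state, COUNT(*) AS cnt FROM census GROUP BY state
    """
)

# Slow path: scan every base record.
full_scan = conn.execute("SELECT COUNT(*) FROM census").fetchone()[0]

# Fast path: add up the handful of pre-computed rows.
from_aggregate = conn.execute(
    "SELECT SUM(cnt) FROM census_by_state"
).fetchone()[0]
aggregate_rows = conn.execute(
    "SELECT COUNT(*) FROM census_by_state"
).fetchone()[0]

print(full_scan, from_aggregate, aggregate_rows)
```

Both paths give the same count, but the fast path reads one row per state rather than one row per person. The price is that the aggregate must be refreshed when the base data changes.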


Page 24: Optiq: A dynamic data management framework

How Optiq helps you to cheat

• Materialized views

• Pre-defined aggregate tables

• Cached query results = In-memory tables

• Smart cache maintenance

• Quickly bring materializations online & offline

• Materializations over a subset of the data

• Spark distributed, in-memory processing & cache

• Application thinks it is talking to a single SQL database
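A toy Python sketch of how several of these points fit together: the application asks one question, and a routing layer decides whether a pre-defined aggregate table that is currently online can answer it. This is not Optiq's planner or API. The `Materialization` class, the `choose_table` function, and the table names are hypothetical, and the rule covers only COUNT queries, which can be rolled up from any aggregate grouped by a superset of the requested columns.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Materialization:
    name: str             # table holding the pre-computed rows
    group_by: frozenset   # columns the aggregate is grouped by
    online: bool = True   # offline materializations are skipped


def choose_table(query_group_by, base_table, materializations):
    """Return the table a COUNT query grouped by query_group_by should read."""
    needed = frozenset(query_group_by)
    candidates = [
        m for m in materializations if m.online and needed <= m.group_by
    ]
    if not candidates:
        return base_table  # nothing usable: fall back to the raw data
    # Fewer grouping columns generally means fewer rows to read.
    return min(candidates, key=lambda m: len(m.group_by)).name


by_state = Materialization("census_by_state", frozenset({"state"}))
by_state_age = Materialization(
    "census_by_state_age", frozenset({"state", "age"})
)
mats = [by_state, by_state_age]

total = choose_table([], "census", mats)
per_age = choose_table(["age"], "census", mats)
per_zip = choose_table(["zip"], "census", mats)

# Taking a materialization offline changes the routing, not the query.
mats_offline = [replace(by_state, online=False), by_state_age]
per_state_offline = choose_table(["state"], "census", mats_offline)

print(total, per_age, per_zip, per_state_offline)
```

The caller never names an aggregate table, which is what lets materializations come and go without changes to the application.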


Page 25: Optiq: A dynamic data management framework

Mondrian on Optiq on hybrid data + memory


Page 26: Optiq: A dynamic data management framework

Summary


Page 27: Optiq: A dynamic data management framework

3 things (reprise)

1. Databases are good. Especially the flexibility that SQL gives us.

2. It is hard to build analytics if your data is all over the place. Different workloads (operational vs. analytic, small write vs. bulk read) require different data structures.

3. Optiq is not a database. But Optiq creates a federated data architecture that performs well, and looks like a database to your tools.


Page 28: Optiq: A dynamic data management framework

@julianhyde

optiq https://github.com/julianhyde/optiq
mondrian http://mondrian.pentaho.com
blog http://julianhyde.blogspot.com
