Presto: Fast SQL-on-Anything including Delta Lake, Snowflake, Elasticsearch and more! Kamil Bajda-Pawlikowski Co-founder/CTO @ Starburst

Presto: Fast SQL-on-Anything · 2020. 12. 1. · Presto: Fast SQL-on-Anything including Delta Lake, Snowflake, Elasticsearch and more! Kamil Bajda-Pawlikowski Co-founder/CTO @ Starburst

Presto: Fast SQL-on-Anything including Delta Lake, Snowflake, Elasticsearch and more!

Kamil Bajda-Pawlikowski, Co-founder/CTO @ Starburst

Agenda

▪ Presto & Starburst
▪ Delta Lake Integration
▪ Data Platform Architecture
▪ Use Cases

Presto & Starburst

What is Presto?

High performance MPP SQL engine
• Interactive ANSI SQL queries
• Proven scalability
• High concurrency

Separation of compute & storage
• Scale storage & compute independently
• SQL-on-anything
• Federated queries

Community-driven open source project

Deploy Anywhere
• Kubernetes
• Cloud
• On premises

Presto Users

Facebook: 10,000+ nodes, 1000s of users

Uber: 2,000+ nodes, 160K+ queries daily

LinkedIn: 500+ nodes, 200K+ queries daily

Lyft: 400+ nodes, 100K+ queries daily

Starburst

Our Platform
• Enterprise Grade Security
• On-Prem or Cloud
• Rapid Time to Insights
• Low Cost of Ownership
• 24x7 Expert Support
• ANSI SQL MPP Query Engine
• High Concurrency
• Massive Scale

• Named Open Source Startup to Watch 2020
• 600% Growth YoY
• 100+ Enterprise Customers
• NPS Score 80+

Starburst Enterprise Presto

Performance
• From petabytes to exabytes – query data from disparate sources using SQL – with high concurrency
• Control your price/performance with the latest cost-based optimizer
• Caching available for frequently accessed data

Connectivity
• 30+ supported enterprise connectors
• High performance parallel connectors for Oracle, Teradata, Snowflake and more

Security
• Kerberos & LDAP integration
• Global Security for fine-grained Access Control
• Data encryption
• Data masking
• Query auditing

Management
• Configuration
• Autoscaling
• High availability
• Monitoring
• Deploy anywhere

Support
• The largest team of Presto experts in the world
• Fully-tested, stable releases, curated by the Presto creators
• Hot fixes & security patches
• 24x7 support, 365 – we’ve got your back

Starburst Customers

• Tech
• Retail
• Media & Telco
• Finance & Insurance
• Healthcare & Pharma
• Other

Delta Lake Integration

Why Delta Lake?

▪ ACID properties over data lake

▪ Open source table format

▪ Stored as Parquet files

▪ Object storage support

▪ Schema evolution

▪ Time travel feature

▪ Metadata & statistics

▪ Data skipping & z-ordering

Native Presto Delta Lake Reader

Supports data skipping & dynamic filtering

Optimizes query using file statistics

Supports reading the Delta transaction log

Native connector written from scratch
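
As a rough illustration of how reading the transaction log and data skipping fit together, the Python sketch below replays "add" and "remove" actions to find the files that are still part of the table, then uses per-file min/max statistics to skip files that cannot match a predicate. It is not the connector's implementation: the commit contents, file names, and the helper names live_files and files_to_scan are assumptions made for this example, and real Delta commits carry more fields and action types.

```python
import json


def stats(min_date, max_date, rows):
    """Build a Delta-style stats string (stats are stored as JSON text)."""
    return json.dumps({
        "numRecords": rows,
        "minValues": {"event_date": min_date},
        "maxValues": {"event_date": max_date},
    })


# Hypothetical, heavily simplified commits. Real commits are JSON-lines
# files under the table's _delta_log/ directory.
COMMITS = [
    [  # version 0: two data files added
        {"add": {"path": "part-0.parquet", "stats": stats("2020-01-01", "2020-01-31", 100)}},
        {"add": {"path": "part-1.parquet", "stats": stats("2020-02-01", "2020-02-29", 120)}},
    ],
    [  # version 1: part-0 rewritten as part-2 (for example after a merge)
        {"remove": {"path": "part-0.parquet"}},
        {"add": {"path": "part-2.parquet", "stats": stats("2020-01-01", "2020-01-31", 101)}},
    ],
    [  # version 2: new data appended
        {"add": {"path": "part-3.parquet", "stats": stats("2020-03-01", "2020-03-31", 90)}},
    ],
]


def live_files(commits):
    """Replay commits in order and return {path: stats} for files still in the table."""
    files = {}
    for actions in commits:
        for action in actions:
            if "add" in action:
                add = action["add"]
                files[add["path"]] = json.loads(add["stats"])
            elif "remove" in action:
                files.pop(action["remove"]["path"], None)
    return files


def files_to_scan(files, column, lower_bound):
    """Data skipping for `column >= lower_bound`: drop files whose max is too small."""
    return sorted(
        path for path, file_stats in files.items()
        if file_stats["maxValues"][column] >= lower_bound
    )


snapshot = live_files(COMMITS)
selected = files_to_scan(snapshot, "event_date", "2020-02-15")
print(selected)
```

Replaying only the first N commits yields the table as of an earlier version, which is the idea behind the time travel feature listed earlier.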

Native Delta Lake Reader Performance

Standard TPC-H benchmark:

▪ 2x average speedup across 22 queries

▪ 6x best query speedup

Feedback from customers:

▪ “What we have here is game changing for our industry. Especially now that the native Delta reader works as fast as it does. We have people lining up to now use this data”

▪ “We have queries that were running in 10 minutes that are now running in 47 seconds”

Data Platform Architecture

Starburst Platform

[Architecture diagram]

Consumers: Data Scientists, Data Analysts, Finance, Marketers

The Data Consumption Layer: existing analytics tools

Security: Data Masking, Global Security, Column + Row-level permissions, Query Auditing, Fine-grained access control, Data Encryption

Data sources: Data Lakes, Relational Databases, NoSQL Stores, Publish/Subscribe (including Azure Event Hub)

Different SQL Technologies In Your Toolbelt

Streaming Ingestion
Machine Learning
Data Investigation
Large Batch Jobs

Fast Federated Queries
High Concurrency SQL Engine
High Performance Ad Hoc Reporting/Analytics
Optionality

Cloud Data Warehouse
Rapid Ad Hoc Reporting/Analytics
Fast, but everything must live in Snowflake (ETL/ELT is required)
Vendor and data lock-in

Cloud Data Platform Ecosystem

Deployment Architecture

Use Cases

Data Flow Diagram

Using a combination of Databricks and Starburst Presto to bring a full data ingestion and analytical environment to life

Data Ingestion and Transformation

● Real-time ingestion of event data into Delta tables

● Customer and inventory data ingested every hour

● Modified customer information merged into Delta Lake table

● Data marts created using streaming and batch data
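
The merge step above follows upsert semantics: rows whose key already exists in the target are updated, and rows with new keys are inserted. The Python sketch below shows that logic on plain dictionaries. It only illustrates the semantics, not how Databricks or Delta Lake executes a merge, and the function name merge_customers, the customer_id key, and the sample rows are assumptions made for this example.

```python
def merge_customers(target, updates, key="customer_id"):
    """Upsert `updates` into `target`: update rows with matching keys, insert the rest."""
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        # setdefault inserts an empty row for unseen keys; update then fills
        # in (or overwrites) the columns carried by the incoming row.
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical current table contents and an hourly batch of changes.
current = [
    {"customer_id": 1, "name": "Ann", "region": "US"},
    {"customer_id": 2, "name": "Bob", "region": "EU"},
]
hourly_batch = [
    {"customer_id": 2, "name": "Bob", "region": "US"},  # modified customer
    {"customer_id": 3, "name": "Cy", "region": "US"},   # new customer
]

result = merge_customers(current, hourly_batch)
for row in result:
    print(row)
```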

Query-time Data Federation

● Single point of access to numerous data sources

● Query Delta Lake and federate with legacy databases as well as many NoSQL data stores

● Enforce table, column and row level policies to ensure maximum data security

● Mask column data for different groups and users
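
To make the row-level policy and column masking bullets concrete, here is a minimal Python sketch of how such rules could be applied to query results per group. It is not Starburst's access control implementation or configuration format: the POLICIES structure, the group names, and the helpers mask_email and apply_policy are all hypothetical.

```python
def mask_email(value):
    """Hide the local part of an e-mail address and keep the domain."""
    _, _, domain = value.partition("@")
    return "***@" + domain


# Hypothetical policies keyed by group name: a row filter plus per-column masks.
POLICIES = {
    "analyst_us": {
        "row_filter": lambda row: row["region"] == "US",
        "column_masks": {"email": mask_email},
    },
    "admin": {
        "row_filter": lambda row: True,
        "column_masks": {},
    },
}


def apply_policy(rows, group):
    """Return the rows a group may see, with masked columns rewritten."""
    policy = POLICIES[group]
    visible = []
    for row in rows:
        if not policy["row_filter"](row):
            continue  # row-level policy: the group never sees this row
        masked = dict(row)
        for column, mask in policy["column_masks"].items():
            masked[column] = mask(masked[column])
        visible.append(masked)
    return visible


CUSTOMERS = [
    {"id": 1, "region": "US", "email": "ann@example.com"},
    {"id": 2, "region": "EU", "email": "bob@example.org"},
]

analyst_view = apply_policy(CUSTOMERS, "analyst_us")
admin_view = apply_policy(CUSTOMERS, "admin")
print(analyst_view)
print(admin_view)
```

Applying the policy in one place, in front of every source, is what lets the same rules cover Delta Lake, relational databases and NoSQL stores alike.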

Data Consumption & Analytics

BI Reporting Tools

SQL Query Tools

• Connect using a variety of BI and SQL tools including Looker, Tableau, Power BI and DBeaver

• JDBC, ODBC and many libraries including Python, R and Java

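-- Federated query: Delta Lake events joined with Snowflake customers,
-- restricted to customers that also appear in Elasticsearch web logs.
SELECT c.id, COUNT(*), SUM(active_seconds)
FROM delta.iot.events e
JOIN snowflake.sales.customer c ON (e.customer_id = c.id)
WHERE e.event_date >= current_date
  AND c.region = 'US'
  AND c.id IN
      (SELECT l.customer_id
       FROM elastic.web.logs l
       WHERE l.visit_date >= date '2020-01-01')
GROUP BY c.id;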

Thank You!

Try Presto with Delta:

www.starburstdata.com/delta-lake-reader

Feedback

Your feedback is important to us.

Don’t forget to rate and review the sessions.