Presto - Analytical Database, Wojciech Biela, Łukasz Osipiuk, https://prestodb.io

Presto - Analytical Database. Overview and use cases

Page 1: Presto - Analytical Database. Overview and use cases

Presto - Analytical Database

Wojciech Biela, Łukasz Osipiuk

https://prestodb.io

Page 2: Presto - Analytical Database. Overview and use cases

Who are we?

Center for Hadoop

Page 3: Presto - Analytical Database. Overview and use cases

History of Presto

➔ Fall 2008: Facebook open sources Hive

➔ Fall 2012: 6 developers start Presto development

➔ Spring 2013: Presto rolled out within Facebook

➔ Fall 2013: Facebook open sources Presto

➔ Fall 2014: 88 releases, 41 contributors, 3943 commits

➔ Fall 2015: 132 releases, 105 contributors, 6300 commits

➔ Teradata part of Presto community & offers support

Page 4: Presto - Analytical Database. Overview and use cases

What is Presto?

➔ 100% open source distributed ANSI SQL engine for Big Data

➔ Optimized for low latency, interactive querying
◆ Cross platform query capability, not only SQL on Hadoop
◆ Distributed under the Apache license, now supported by Teradata
◆ Used by a community of well known, well respected technology companies
◆ Modern code base
◆ Proven scalability

Page 5: Presto - Analytical Database. Overview and use cases

High level architecture

[Diagram: a Client, a Coordinator (Parser/analyzer, Planner, Scheduler) and three Workers. Pluggable interfaces: Metadata API, Data location API, Data stream API.]

Page 6: Presto - Analytical Database. Overview and use cases

Plan execution: Hive vs. Presto

[Diagram comparing the two engines. Labels: map, reduce and several I/O steps (Hive); multiple tasks and I/O (Presto).]
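The contrast this slide draws can be illustrated with a toy sketch. It assumes, as the labels suggest, that the Hive side writes and re-reads intermediate results between stages, while the Presto side streams rows from task to task in memory. All names here (`staged`, `pipelined`, `io_log`) are hypothetical; this is not code from either engine.

```python
from typing import Iterable, Iterator, List

# Records every simulated write/read of intermediate data (illustration only).
io_log: List[str] = []


def staged(rows: Iterable[int]) -> List[int]:
    """Stage-at-a-time: the first stage finishes and persists its output
    before the second stage starts."""
    mapped = [r * 2 for r in rows]            # first stage, fully materialized
    io_log.append("write map output")         # simulated intermediate write
    io_log.append("read map output")          # simulated intermediate read
    return [r for r in mapped if r % 3 == 0]  # second stage


def pipelined(rows: Iterable[int]) -> Iterator[int]:
    """Pipelined: generators pass rows along one at a time, so no
    intermediate result is ever stored."""
    mapped = (r * 2 for r in rows)
    return (r for r in mapped if r % 3 == 0)
```

Both functions compute the same answer; only the staged variant touches the simulated intermediate storage.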

Page 7: Presto - Analytical Database. Overview and use cases

Presto Extensibility – connector interfaces

[Diagram: Coordinator (Parser/analyzer, Planner, Scheduler) and Worker, with connector implementations plugged into each interface.]

➔ Metadata API: Hive, Cassandra, Kafka, MySQL

➔ Data location API: Hive, Cassandra, Kafka, MySQL

➔ Data stream API: Hive, Cassandra, Kafka, MySQL

Page 8: Presto - Analytical Database. Overview and use cases

Presto Extensibility – plugins

➔ Connectors

➔ Data types

➔ Extra functions

➔ Security providers
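To make the connector idea concrete, here is a minimal sketch of how the three interfaces named in the deck (Metadata API, Data location API, Data stream API) might fit together. Presto's real plugin interface is written in Java and looks different; every class and method name below (`Connector`, `columns`, `splits`, `read`, `InMemoryConnector`, `scan`) is invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple[object, ...]


class Connector(ABC):
    """Hypothetical interface mirroring the three APIs named on the slides."""

    @abstractmethod
    def columns(self, table: str) -> List[str]:
        """Metadata API: describe a table."""

    @abstractmethod
    def splits(self, table: str) -> List[int]:
        """Data location API: list independently readable chunks."""

    @abstractmethod
    def read(self, table: str, split: int) -> Iterator[Row]:
        """Data stream API: stream the rows of one chunk."""


class InMemoryConnector(Connector):
    """Toy backend: each table is (column names, list of chunks of rows)."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[List[Row]]]]):
        self._tables = tables

    def columns(self, table: str) -> List[str]:
        return self._tables[table][0]

    def splits(self, table: str) -> List[int]:
        return list(range(len(self._tables[table][1])))

    def read(self, table: str, split: int) -> Iterator[Row]:
        yield from self._tables[table][1][split]


def scan(connector: Connector, table: str) -> List[Row]:
    """Engine-side code depends only on the interface, not on the backend."""
    return [row for s in connector.splits(table) for row in connector.read(table, s)]
```

Because `scan` only calls the abstract methods, a second backend could be swapped in without touching it, which is the point of a pluggable design.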

Page 9: Presto - Analytical Database. Overview and use cases

Some usage facts

➔ Facebook
◆ Multiple production clusters (100s of nodes total)
● Including 300PB Hadoop data warehouse
● Single cluster size order of 10s of nodes
◆ 1000s of internal daily active users
◆ Millions of queries each month
◆ Multiple PBs scanned every day
◆ Trillions of rows a day
◆ ORC format

➔ Netflix
◆ Over 250-node production cluster on EC2
◆ Over 15 PB in S3 (Parquet format)
◆ Over 300 users and 2.5K queries daily
◆ presto-cli, R, Python, BI tools
◆ 50% queries under 4s

Page 10: Presto - Analytical Database. Overview and use cases

Netflix Data Pipeline

[Diagram labels: TVs, mobile, laptop (sources of events and dimensions data); Suro / Kafka; Cassandra; Ursula; Aegisthus; Amazon S3; TD.]

Page 11: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook

➔ three use cases

◆ Data warehouse - big data

◆ User facing - small data

◆ User facing - medium data

Page 12: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (data warehouse)

HDFS data warehouse

Page 13: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (data warehouse)

➔ Multiple clusters

➔ O(10^3) users

➔ O(10^6) queries per month

➔ petabytes of data scanned every day

➔ 100s of concurrent queries

Page 14: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (data warehouse)

[Diagram labels: Client; Loader; Hive; a row of Data Nodes, each paired with a Presto or M/R process.]

Page 15: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (data warehouse)

[Diagram labels: Client; PrestoDispatcher; several Presto instances.]

Page 16: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (realtime)

Real time user facing

Page 17: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (realtime)

Requirements

➔ User facing

➔ 0.1-5 seconds latency

➔ Support for data updates

➔ highly available

➔ 10-15 way joins

Page 18: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (realtime)

[Diagram labels: Loader; Client; several Presto instances; several MySQL instances.]

Page 19: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (semi realtime)

Requirements

➔ Large data sets (smaller than warehouse)

➔ seconds to minutes latency

➔ predictable performance

➔ 5-15 minutes load latency

➔ 100s concurrent queries

Page 20: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (semi realtime)

Raptor

Page 21: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (semi realtime)

[Diagram labels: Client; Raptor Loader; Loader; Kafka (several); Presto nodes paired with Flash storage; MySQL; Gluster (backup tier).]

Page 22: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (semi realtime)

[Diagram labels: Client; Raptor Loader; Loader; Kafka (several); Presto nodes paired with Flash storage; MySQL; Gluster (backup tier). Annotated with the load steps:]

INSERT INTO raptor_table SELECT * FROM kafka_table WHERE token BETWEEN ${last_token} AND ${next_token}

MARK LOAD in PROGRESS in MySQL
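The templated statement above suggests that the loader fills in a token range for each batch. The sketch below shows one way such statements could be rendered. It assumes integer tokens and disjoint ranges, since SQL `BETWEEN` is inclusive at both ends and a shared endpoint would load that token twice. The helper names (`token_ranges`, `render_loads`) and the batch size are hypothetical, and the "mark load in progress" bookkeeping in MySQL is not modelled.

```python
from string import Template
from typing import Iterator, List, Tuple

# The statement from the slide, with the same ${...} placeholders.
LOAD_SQL = Template(
    "INSERT INTO raptor_table SELECT * FROM kafka_table "
    "WHERE token BETWEEN ${last_token} AND ${next_token}"
)


def token_ranges(start: int, end: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive, non-overlapping inclusive ranges covering [start, end]."""
    low = start
    while low <= end:
        high = min(low + step - 1, end)
        yield low, high
        low = high + 1  # next batch starts just past the previous upper bound


def render_loads(start: int, end: int, step: int) -> List[str]:
    """Render one INSERT statement per token range (assumed batching scheme)."""
    return [
        LOAD_SQL.substitute(last_token=lo, next_token=hi)
        for lo, hi in token_ranges(start, end, step)
    ]
```

For example, `render_loads(0, 9, 4)` produces three statements covering tokens 0 to 3, 4 to 7 and 8 to 9.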

Page 23: Presto - Analytical Database. Overview and use cases

Presto use-cases at Facebook (semi realtime)

Extra features

➔ Physical data reorganization

➔ Fully fledged and atomic DDL

➔ Atomic data loading

➔ Tiered architecture

Page 24: Presto - Analytical Database. Overview and use cases

Presto = Performance

➔ Data stays in memory during execution and is pipelined across nodes MPP-style

➔ Vectorized columnar processing

➔ Presto is written in highly tuned Java
◆ Efficient in-memory data structures
◆ Very careful coding of inner loops
◆ Bytecode generation

➔ Optimized ORC reader

➔ Predicates push-down

➔ Query optimizer
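Predicate push-down, listed on this slide, means evaluating a filter as close to the data as possible so that irrelevant data is never read. The toy sketch below shows the general idea with per-chunk min/max statistics. It is an assumption-laden illustration, not Presto's ORC reader: the `Chunk` layout and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A block of values plus precomputed min/max statistics (hypothetical layout)."""
    lo: int
    hi: int
    values: List[int]


def make_chunk(values: List[int]) -> Chunk:
    """Compute the statistics once, when the chunk is written."""
    return Chunk(min(values), max(values), values)


def scan_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    """Return the values above `threshold` and how many chunks were actually read."""
    chunks_read = 0
    matches: List[int] = []
    for chunk in chunks:
        if chunk.hi <= threshold:
            continue  # statistics prove that no value here can match: skip the read
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read
```

With three chunks holding 1 to 3, 4 to 6 and 7 to 9, a filter for values greater than 6 reads only the last chunk.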

Page 25: Presto - Analytical Database. Overview and use cases

How can I contribute?

www.github.com/facebook/presto
www.github.com/prestodb

Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto User’s Group: www.groups.google.com/group/presto-users

Interested in joining Teradata?
● Presto development
● Other Hadoop related development and consulting

Contact our Recruitment Partner: Renata Rosłoniec (VBC), tel. 514 035 237, [email protected]