Presto for the Enterprise @ Hadoop Meetup

  • Published on
    12-Apr-2017

Warsaw Hadoop User Group
Wojciech Biela, Łukasz Osipiuk

www.teradata.com/presto

2014 Teradata

Who are we? - Teradata Center for Hadoop!

History of Teradata Center for Hadoop
  • Formerly Hadapt
  • Founded in July 2010 by Justin Borgman, Kamil Bajda-Pawlikowski, and Daniel Abadi
  • Pioneered the SQL-on-Hadoop market
  • Based on work done by the database research group in the Yale Computer Science Department
  • Hybrid of Hadoop scalability and DBMS performance

Today
  • Acquired by Teradata in July 2014, renamed Teradata Center for Hadoop
  • 20+ developers with deep Hadoop and database expertise
  • Headquarters in Boston, MA
  • Teams in US (MA, CA) and Poland (Warsaw)
  • Contributors to open source project Presto

Supported SQL features:
  • aggregation, scalar, most popular joins
  • limited subqueries
  • approximate at ?
  • simple data types: .
  • structural data types: row, map, array

Not supported SQL features:
  • Keyword end used as a column name: not able to use it in a projection unless wrapped in quotes
  • TPCH query 6 returns invalid result
  • No support for NUMERIC / DECIMAL type. We are getting rounding errors due to this. This is current work in progress.
  • NATURAL JOIN is not supported
  • Scalar subqueries not supported: "where x = (select y from ...)"
  • Correlated subqueries not supported
  • Non-equi joins only supported for inner join: ("n_name" < "p_name")
  • EXISTS, EXCEPT, INTERSECT not supported
  • DROP TABLE IF EXISTS is unsupported (and errors out with a confusing message: " mismatched input 'exists' expecting {, '.'}")
  • SEMI JOIN is not supported
  • ROLLUP is not supported
  • LIMIT ALL and OFFSET clause not supported

Talk Agenda
  • What is Presto?
  • What is Teradata doing?
  • Can I see a Demo?
  • How can I contribute?

What is Presto?
  • 100% open source distributed ANSI SQL engine for Big Data
  • Modern code base
  • Proven scalability
  • Optimized for low latency, interactive querying
  • Cross platform query capability, not only SQL on Hadoop
  • Distributed under the Apache license, now supported by Teradata
  • Used by a community of well known, well respected technology companies

History of Presto
  • FALL 2008: Facebook open sources Hive
  • FALL 2012: 6 developers start Presto development
  • SPRING 2013: Presto rolled out within Facebook
  • FALL 2013: Facebook open sources Presto
  • FALL 2014: 88 releases, 41 contributors, 3943 commits
  • SPRING 2015: 98 releases, 65 contributors, 4587 commits
  • Teradata joins Presto community & offers support

Query Execution

[Diagram: Client; Coordinator with Parser/analyzer, Planner, and Scheduler; Workers. Pluggable interfaces: Metadata API, Data location API, and Data stream API (attached to the Workers).]

  • Classic MPP database engine
  • External data, no internal metadata
  • Single coordinator, multiple workers
  • Worker autodiscovery
  • Java 8
  • In-memory vector processing; data is read in vector form if possible
  • Reads only a subset of data: partitions, predicate pushdown (SQL, ORC)
  • ANSI SQL compliant (subset)
  • Plan -> bytecode generation
  • Data pipelining (no spill to disk); no checkpointing (no failover)
  • Client (CLI / poor man's JDBC)
  • Cluster management through REST API and SQL (sys schema)
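
The note about reading only a subset of the data can be illustrated with a toy model. The sketch below is not Presto code: `DataChunk`, `scan`, and the `max_price` statistic are hypothetical names made up for illustration. It first skips chunks in the wrong partition, then skips chunks whose stored statistics rule out the predicate, which is roughly the idea behind partition pruning and ORC-style predicate pushdown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataChunk:
    """Hypothetical unit of stored data (think: a file or stripe in one partition)."""
    partition: str
    rows: List[Dict[str, int]]
    stats: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once, when the chunk is "written".
        self.stats = {"max_price": max(row["price"] for row in self.rows)}


def scan(chunks: List[DataChunk], partition: str, min_price: int) -> Tuple[list, int]:
    """Return matching rows and the number of chunks whose rows were actually read."""
    result, chunks_read = [], 0
    for chunk in chunks:
        if chunk.partition != partition:
            continue  # partition pruning: wrong partition, never opened
        if chunk.stats["max_price"] < min_price:
            continue  # predicate pushdown: statistics prove no row can match
        chunks_read += 1
        result.extend(row for row in chunk.rows if row["price"] >= min_price)
    return result, chunks_read


CHUNKS = [
    DataChunk("2015-01-01", [{"price": 10}, {"price": 20}]),
    DataChunk("2015-01-02", [{"price": 5}, {"price": 7}]),
    DataChunk("2015-01-02", [{"price": 50}, {"price": 8}]),
]
rows, chunks_read = scan(CHUNKS, "2015-01-02", 40)
print(rows, chunks_read)
```

Only one of the three chunks is read here; the other two are rejected from metadata alone.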

Client sends query over HTTP to coordinator.

Clients:
  • presto-cli
  • jdbc (needs work)
  • odbc (needs work)
  • (contributions) R client, python, ruby, ...

HTTP RESTful API:
  • Pretty stable
  • JSON based
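
A rough sketch of what a client does with that API. The talk does not spell out the protocol; the `/v1/statement` endpoint, the `X-Presto-User` header, and the `nextUri` / `data` response fields are taken from Presto's client protocol as I understand it, so treat them as assumptions. The transport is injected as two callables, and `FakeCoordinator` is a made-up in-memory stand-in so the example runs without a cluster. Error handling is left out.

```python
import json
from typing import Callable, Dict, List


def run_query(sql: str,
              post: Callable[[str, str, Dict[str, str]], str],
              get: Callable[[str], str],
              coordinator: str = "http://localhost:8080",
              user: str = "demo") -> List[list]:
    """Submit a statement, then follow nextUri links until none is returned."""
    headers = {"X-Presto-User": user}  # assumed header name
    response = json.loads(post(coordinator + "/v1/statement", sql, headers))
    rows: List[list] = []
    while True:
        rows.extend(response.get("data", []))  # a response may carry no rows yet
        next_uri = response.get("nextUri")
        if next_uri is None:
            return rows
        response = json.loads(get(next_uri))


class FakeCoordinator:
    """Hypothetical stand-in that serves pre-baked result batches as JSON."""

    def __init__(self, batches: List[List[list]]) -> None:
        self.batches = batches

    def post(self, url: str, body: str, headers: Dict[str, str]) -> str:
        return self._page(0)

    def get(self, url: str) -> str:
        return self._page(int(url.rsplit("/", 1)[1]))

    def _page(self, index: int) -> str:
        page = {"data": self.batches[index]}
        if index + 1 < len(self.batches):
            page["nextUri"] = "http://fake/v1/statement/q1/%d" % (index + 1)
        return json.dumps(page)


fake = FakeCoordinator([[[1, "a"]], [[2, "b"], [3, "c"]]])
result = run_query("select 1", fake.post, fake.get)
print(result)
```

Against a real coordinator, `post` and `get` would be thin wrappers around an HTTP library.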

Query parsing:
  • ANSI SQL, plus some rarer features like APPROXIMATE AT CONFIDENCE
  • written from scratch
  • ANTLR based
  • no catalog: all metadata is provided by connectors
  • limited support for extending the set of data types

Query analyzing:
  • assign types to entities

QueryPlanner.class
  • LogicalPlanner.plan(analysis) -> logical Plan
  • PlanFragmenter.createSubPlans(Plan) -> fragmented plan

SubPlan
  • planFragment
  • children (list)
  • defines data flow from parent to children (replication, hashing, round robin).
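
The three data-flow modes named above (replication, hashing, round robin) can be sketched in a few lines. This is an illustration only, not how Presto implements exchanges: the function names are hypothetical, and CRC32 is used simply because it gives a hash that is stable between runs.

```python
import zlib
from typing import Dict, List

Row = Dict[str, int]


def replicate(rows: List[Row], consumers: int) -> List[List[Row]]:
    """Every consumer receives a full copy (e.g. the small side of a join)."""
    return [list(rows) for _ in range(consumers)]


def hash_partition(rows: List[Row], consumers: int, key: str) -> List[List[Row]]:
    """Rows with equal key values always land on the same consumer."""
    buckets: List[List[Row]] = [[] for _ in range(consumers)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        buckets[digest % consumers].append(row)
    return buckets


def round_robin(rows: List[Row], consumers: int) -> List[List[Row]]:
    """Rows are dealt out in turn, balancing load without regard to content."""
    buckets: List[List[Row]] = [[] for _ in range(consumers)]
    for index, row in enumerate(rows):
        buckets[index % consumers].append(row)
    return buckets


ROWS = [{"regionkey": k} for k in (0, 1, 2, 0, 1, 2, 0)]
replicated = replicate(ROWS, 3)
hashed = hash_partition(ROWS, 3, "regionkey")
rotated = round_robin(ROWS, 3)
print([len(bucket) for bucket in rotated])
```

Hashing is what a distributed join or group-by needs, since matching keys must meet on one node; round robin only balances load.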

Logical and fragmented plan

    select shipdate, count(*) count, cast(sum(extendedprice) as bigint) price
    from h_lineitem
    where returnflag = 'R'
    group by shipdate
    order by count
    limit 20

Logical and fragmented plan

    select *
    from hive.default.h_nation, psql.public.p_region
    where h_nation.regionkey = p_region.regionkey;

  • Responsible for assigning nodes to tasks from the fragmented plan.
  • Takes data location into consideration (code needs rework).
  • Monitors task execution and schedules the execution process.
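
A minimal sketch of location-aware assignment, assuming a simple "prefer a node that already holds the data, otherwise pick the least loaded node" policy. The policy, the function name `assign_splits`, and the host names are assumptions for illustration; the talk does not describe Presto's actual algorithm.

```python
from typing import Dict, List


def assign_splits(splits: Dict[str, List[str]], nodes: List[str]) -> Dict[str, str]:
    """Map each split id to a worker node.

    splits: split id -> hosts that store that split's data.
    nodes:  worker hosts available to the scheduler.
    """
    load = {node: 0 for node in nodes}
    assignment: Dict[str, str] = {}
    for split_id, hosts in splits.items():
        local = [host for host in hosts if host in load]
        candidates = local or nodes           # fall back to any node if none is local
        chosen = min(candidates, key=lambda node: load[node])
        assignment[split_id] = chosen
        load[chosen] += 1
    return assignment


NODES = ["w1", "w2", "w3"]
SPLITS = {
    "s1": ["w1", "w2"],
    "s2": ["w1"],
    "s3": ["w9"],        # data lives on a host that is not a worker
    "s4": ["w1", "w2"],
}
assignment = assign_splits(SPLITS, NODES)
print(assignment)
```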

  • Data is exposed by the connector.
  • Packaged into a sequence of memory pages.
  • Each page has a vectorized structure:
      • sequence of Blocks (each column is a block)
      • different block implementations for different data types / phases of processing
  • HTTP communication between workers. Not JSON: an efficient binary protocol.
  • Huge requests, so the transport layer is not a problem.
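
The page and block layout described above can be modelled in a few lines. This is a simplified illustration, not Presto's classes: only the names Page and Block come from the talk, while the fields, `to_pages`, and `sum_column` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass
class Block:
    """One column's values for the rows of a single page."""
    values: list


@dataclass
class Page:
    """A batch of rows stored column by column."""
    blocks: List[Block]

    @property
    def position_count(self) -> int:
        return len(self.blocks[0].values) if self.blocks else 0

    def row(self, position: int) -> Tuple:
        return tuple(block.values[position] for block in self.blocks)


def to_pages(rows: Sequence[Tuple], page_size: int) -> Iterator[Page]:
    """Cut row-oriented input into column-oriented pages of at most page_size rows."""
    for start in range(0, len(rows), page_size):
        columns = zip(*rows[start:start + page_size])
        yield Page([Block(list(column)) for column in columns])


def sum_column(pages: Sequence[Page], channel: int) -> int:
    """Aggregate one column by touching only that column's blocks."""
    return sum(sum(page.blocks[channel].values) for page in pages)


ROWS = [("R", 10), ("N", 20), ("R", 30)]
pages = list(to_pages(ROWS, page_size=2))
print(len(pages), sum_column(pages, 1))
```

An aggregation over one column never has to look at the other blocks, which is the point of the columnar layout.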

[Diagram: a stream of pages (page 1, page ...), each containing blockA and blockB]

Plan execution

[Diagram: plan execution in Hive (map and reduce stages, with I/O) compared with Presto (tasks, with I/O)]

MapReduce:
  • Synchronisation points between stages
  • Dumping stage result after each map/reduce task
  • Failover support (resume computation from disk snapshot)

Presto:
  • Streaming data between tasks
  • After the initial read from some storage (depending on connector), data is always in memory.
  • Some operations require all data to fit in memory (joins)
  • No checkpoints (simplifies architecture) -> no failover
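
The difference between the two execution styles can be shown with a toy scan, filter, limit plan. This is a sketch under simplifying assumptions: Python lists stand in for stage output dumped between stages, generators stand in for streaming between tasks, and all function names are hypothetical. Failover is not modelled.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read(source: Iterable[int], log: List[int]) -> Iterator[int]:
    """Yield rows from storage, recording each row that is actually read."""
    for row in source:
        log.append(row)
        yield row


def staged_plan(source: Iterable[int], limit: int) -> Tuple[List[int], int]:
    """Each stage materializes its full output before the next one starts."""
    log: List[int] = []
    scanned = list(read(source, log))                  # stage 1 runs to completion
    filtered = [row for row in scanned if row % 2 == 0]  # stage 2 runs to completion
    return filtered[:limit], len(log)


def streaming_plan(source: Iterable[int], limit: int) -> Tuple[List[int], int]:
    """Rows flow through all operators one at a time; nothing is materialized."""
    log: List[int] = []
    filtered = (row for row in read(source, log) if row % 2 == 0)
    return list(islice(filtered, limit)), len(log)


staged_rows, staged_reads = staged_plan(range(10), limit=2)
streamed_rows, streamed_reads = streaming_plan(range(10), limit=2)
print(staged_reads, streamed_reads)
```

Both plans return the same rows, but in this toy the streaming plan stops reading as soon as the limit is satisfied. The price is the one listed above: with no materialized intermediate results there is nothing to resume from after a failure.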

Presto Extensibility: plugins
  • Connectors
  • Data types
  • Extra functions
  • (new) Security providers

Presto Extensibility: connector interfaces

[Diagram: the Coordinator (Parser/analyzer, Planner, Scheduler) and the Worker sit on top of three connector interfaces: Metadata API, Data location API, and Data stream API. Each interface is shown with Hive, Cassandra, Kafka, and MySQL implementations.]

Other connectors:
  • PostgreSQL / generic SQL
  • Raptor
  • Teradata (in progress)

Hive/HDFS connector:
  • optimized to read ORC and RCFile
  • other formats available via Hive storage handlers (with SerDe overhead)

Presto Extensibility: connector interfaces

    public interface Connector
    {
        ConnectorHandleResolver getHandleResolver();
        ConnectorMetadata getMetadata();
        ConnectorSplitManager getSplitManager();
        ConnectorPageSourceProvider getPageSourceProvider();
        ConnectorRecordSetProvider getRecordSetProvider();
        ConnectorPageSinkProvider getPageSinkProvider();
        ConnectorRecordSinkProvider getRecordSinkProvider();
        ConnectorIndexResolver getIndexResolver();
        Set<SystemTable> getSystemTables();
        List<PropertyMetadata<?>> getTableProperties();
        ConnectorAccessControl getAccessControl();
        // no-op unless the connector holds resources to release
        default void shutdown() {}
    }
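
To make the division of labour between the metadata, data location, and data stream APIs concrete, here is a toy in-memory connector. It is written in Python and is not the Presto SPI: the class, its method names, and the split representation are all hypothetical, chosen only to mirror the three roles.

```python
from typing import Dict, Iterator, List, Tuple

Split = Tuple[str, int, int]  # (table name, first row index, end row index)


class InMemoryConnector:
    """Hypothetical connector over tables held in a dict: name -> (columns, rows)."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[tuple]]],
                 split_size: int = 2) -> None:
        self._tables = tables
        self._split_size = split_size

    def get_columns(self, table: str) -> List[str]:
        """Metadata role: describe the table to the parser/analyzer."""
        return list(self._tables[table][0])

    def get_splits(self, table: str) -> List[Split]:
        """Data location role: tell the scheduler which pieces can be read in parallel."""
        row_count = len(self._tables[table][1])
        return [(table, start, min(start + self._split_size, row_count))
                for start in range(0, row_count, self._split_size)]

    def read_split(self, split: Split) -> Iterator[tuple]:
        """Data stream role: hand the rows of one split to a worker."""
        table, start, end = split
        yield from self._tables[table][1][start:end]


def full_scan(connector: InMemoryConnector, table: str) -> Tuple[List[str], List[tuple]]:
    """What an engine would do for 'select * from table', on a single node."""
    rows: List[tuple] = []
    for split in connector.get_splits(table):
        rows.extend(connector.read_split(split))
    return connector.get_columns(table), rows


connector = InMemoryConnector({
    "nation": (["nationkey", "name"],
               [(0, "ALGERIA"), (1, "ARGENTINA"), (2, "BRAZIL")]),
})
columns, rows = full_scan(connector, "nation")
print(connector.get_splits("nation"), columns, rows)
```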

Presto = Performance
  • Data stays in memory during execution and is pipelined across nodes MPP-style
  • Vectorized columnar processing
  • Presto is written in highly tuned Java
      • Efficient in-memory data structures
      • Very careful coding of inner loops
      • Bytecode generation
  • Optimized ORC reader
  • Predicate push-down
  • Query optimizer

  • designed for short queries, but still OLAP, not OLTP
  • use Hive for batch/ETL type jobs

Presto in Production

Facebook
  • Multiple production clusters (100s of nodes total)
  • Including 300PB Hadoop data warehouse
  • 1000s of internal daily active users
  • Millions of queries each month
  • Multiple PBs scanned every day
  • Trillions of rows a day

Netflix
  • Over 200-node production cluster on EC2
  • Over 15 PB in S3 (Parquet format)
  • Over 300 users and 2.5K queries daily

What is Teradata Doing?
  • 100% open source contributions to Presto to increase adoption in the enterprise
  • A multi-year roadmap commitment to phased enhancements of the open source code
  • The first ever commercial support offering for Presto

Teradata Certified Presto: www.teradata.com/presto

Why is Teradata Contributing to Presto?
  • Hadoop Distro Agnostic
  • Modern Code Base: Presto is well-designed open source software with proper database architecture
  • Strong Like-Minded Community
  • Push down processing across multiple data platforms
  • Leverage Teradata expertise to make SQL for Hadoop viable

Teradata Contributions to Presto

Phases:
  • Phase 1, Implement: June 8, 2015
  • Phase 2, Integrate: Q4 2015
  • Phase 3, Proliferate: 2016

Items on the roadmap:
  • Installer
  • Documentation
  • Monitoring & Support Tools
  • ODBC / JDBC Drivers
  • BI Certification
  • Security
  • Connectors
  • Commercial Support
  • Expanding ANSI SQL Coverage
  • Management Tools Integration
  • YARN Integration

Teradata's Contributions
  • Ease of install and management via the Presto-Admin tool: www.github.com/prestodb/presto-admin
  • Packaging Presto as an RPM
  • Testing framework for Presto: www.github.com/prestodb/tempto
  • Added a large number of tests
  • JDBC driver for Java 6
  • Various SQL improvements

Teradata's Contribution Product Roadmap
  • Continued SQL Improvements
  • Security (Authentication & Authorization)
  • More Connectors, e.g. HBase
  • ODBC & JDBC Drivers that actually work
  • BI tool certifications, e.g. Tableau
  • YARN Integration
  • Ambari Integration
  • Open Source our Docker based Dev Env - WIP
  • Open our Continuous Integration platform to the community

Teradata Engineers Dedicated to Presto

Early Feedback is Extremely Positive

"Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions."
- James Mayfield, product lead, Airbnb

"We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole."
- Steve Deasy, vice president of Engineering, Groupon

Demo Time!

How can I contribute?
  • www.github.com/facebook/presto
  • www.github.com/prestodb
  • Certified Distro: www.teradata.com/presto
  • Website: www.prestodb.io
  • Presto Users Group: www.groups.google.com/group/presto-users
  • Facebook Page: www.facebook.com/prestodb
  • Twitter: #prestodb

Join us!

We're hiring!
  • Warsaw
  • Boston

Job Offer: bit.do/presto

Contact: Wojciech.Biela@teradata.com

Presto 101t certified by Teradata

Available for Download
  • Presto 101t Server, CLI, JDBC
  • Presto-Admin 0.1
  • Documentation
  • HDP w/ Presto VM Sandbox
  • CDH w/ Presto VM Sandbox

www.teradata.com/presto

Wojciech.Biela@teradata.com

Lukasz.Osipiuk@teradata.com
