1. Boston Hadoop User Group Meetup, July 7, 2015. Kamil Bajda-Pawlikowski and Matt Fuller.
2. Who are we? Teradata Center for Hadoop! History of Teradata Center for Hadoop: formerly Hadapt; founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi; pioneered the SQL-on-Hadoop market; based on work done by the database research group in the Yale Computer Science Department; a hybrid of Hadoop scalability and DBMS performance. Today: acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop; 30 developers with deep Hadoop and database expertise; headquarters in Boston, MA; contributors to the open source project Presto.
3. Talk Agenda: What is Presto? What is Teradata doing? Can I see a demo? How can I contribute?
4. What is Presto? A 100% open source distributed ANSI SQL engine for Big Data. Modern code base; proven scalability; optimized for low-latency, interactive querying; cross-platform query capability, not only SQL on Hadoop; distributed under the Apache license, now supported by Teradata; used by a community of well-known, well-respected technology companies.
5. History of Presto. Fall 2008: Facebook open sources Hive. Fall 2012: 4 developers start Presto development. Spring 2013: Presto rolled out within Facebook. Fall 2013: Facebook open sources Presto. Fall 2014: 88 releases, 41 contributors, 3943 commits. Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins the Presto community and offers support. (Timeline image courtesy of Facebook.)
6. Presto Architecture [diagram]: Client; Coordinator (Parser/analyzer, Planner, Scheduler); Workers; pluggable Metadata API, Data location API, and Data stream API. Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
7. Presto Extensibility: connectors [diagram]: Coordinator (Parser/analyzer, Planner, Scheduler) and Worker; the Metadata API, Data location API, and Data stream API each have Hive, Cassandra, Kafka, and MySQL implementations. Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
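The three pluggable APIs named on the architecture and connector slides can be illustrated with a toy model. The Python sketch below is not Presto's actual connector interface (Presto's is written in Java); every name in it (`Connector`, `Split`, `InMemoryConnector`, `scan_table`) is hypothetical and chosen only to show how a metadata call, a data location call, and a data stream call might divide the work between a coordinator and its workers.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol, Tuple

Row = Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical unit of work: one chunk of a table that can be read independently."""
    table: str
    chunk: int


class Connector(Protocol):
    """Hypothetical connector contract mirroring the three APIs on the slide."""

    def columns(self, table: str) -> List[str]: ...      # metadata API
    def splits(self, table: str) -> List[Split]: ...     # data location API
    def read(self, split: Split) -> Iterator[Row]: ...   # data stream API


class InMemoryConnector:
    """Toy connector backed by a dict; stands in for a Hive, Cassandra, Kafka, or MySQL source."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[Row]]], chunk_size: int = 2):
        self._tables = tables          # table name -> (column names, rows)
        self._chunk_size = chunk_size

    def columns(self, table: str) -> List[str]:
        return list(self._tables[table][0])

    def splits(self, table: str) -> List[Split]:
        rows = self._tables[table][1]
        # Ceiling division, with at least one split so empty tables still scan.
        count = max(1, -(-len(rows) // self._chunk_size))
        return [Split(table, i) for i in range(count)]

    def read(self, split: Split) -> Iterator[Row]:
        rows = self._tables[split.table][1]
        start = split.chunk * self._chunk_size
        yield from rows[start:start + self._chunk_size]


def scan_table(catalogs: Dict[str, Connector], catalog: str, table: str):
    """Walk the three APIs in order: metadata, then data location, then data stream."""
    connector = catalogs[catalog]
    column_names = connector.columns(table)      # coordinator: analysis and planning
    rows: List[Row] = []
    for split in connector.splits(table):        # coordinator: scheduling
        rows.extend(connector.read(split))       # worker: reading the data
    return column_names, rows
```

In this sketch a new data source is added by supplying another object with the same three methods, which is the sense in which the slide calls the APIs pluggable. A real engine would hand the splits to different worker processes instead of looping over them in one place.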
8. Presto = Performance: data stays in memory during execution and is pipelined across nodes, MPP-style; vectorized columnar processing; Presto is written in highly tuned Java (efficient in-memory data structures, very careful coding of inner loops, bytecode generation); optimized ORC reader.
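The ideas of pipelining and columnar batches on the performance slide can be shown with a small, single-process Python sketch. It is only an illustration under stated assumptions: the function names (`scan_batches`, `filter_greater_than`, `sum_column`) are hypothetical, Python generators stand in for operators that stream data between nodes, and nothing here reflects Presto's Java implementation or its bytecode generation.

```python
from typing import Dict, Iterator, List

# One batch holds a slice of every column: column name -> values for those rows.
Batch = Dict[str, List[int]]


def scan_batches(columns: Dict[str, List[int]], batch_size: int) -> Iterator[Batch]:
    """Emit the table as a stream of columnar batches instead of one row at a time."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, batch_size):
        yield {name: values[start:start + batch_size] for name, values in columns.items()}


def filter_greater_than(batches: Iterator[Batch], column: str, threshold: int) -> Iterator[Batch]:
    """Keep only the row positions whose value in `column` exceeds `threshold`."""
    for batch in batches:
        keep = [i for i, value in enumerate(batch[column]) if value > threshold]
        yield {name: [values[i] for i in keep] for name, values in batch.items()}


def sum_column(batches: Iterator[Batch], column: str) -> int:
    """Aggregate one column across all batches as they arrive."""
    total = 0
    for batch in batches:
        total += sum(batch[column])
    return total


table = {"price": [5, 20, 7, 30, 1], "qty": [1, 2, 3, 4, 5]}
# Each stage pulls batches from the previous one, so no stage materializes the whole table.
result = sum_column(filter_greater_than(scan_batches(table, 2), "price", 6), "qty")
```

Because each stage is a generator, a batch flows through scan, filter, and aggregate before the next batch is read, which is the pipelining idea in miniature. Working on whole column slices per batch is the idea behind the slide's "vectorized columnar processing".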
9. Presto in Production. Facebook: multiple production clusters (100s of nodes total), including a 300 PB Hadoop data warehouse; 1000s of internal daily active users; millions of queries each month; multiple PBs scanned every day; trillions of rows a day. Netflix: over 200-node production cluster on EC2; over 15 PB in S3 (Parquet format); over 300 users and 2.5K queries daily.
10. What is Teradata Doing? 100% open source contributions to Presto to increase adoption in the enterprise; a multi-year roadmap commitment to phased enhancements of the open source code; the first-ever commercial support offering for Presto. Teradata Certified Presto: www.teradata.com/presto
11. Why is Teradata Contributing to Presto? Hadoop distro agnostic; modern code base; Presto is well-designed open source software with proper database architecture; strong like-minded community; push-down processing across multiple data platforms; leverage Teradata expertise to make SQL for Hadoop viable.
12. Demo Time!
13. Teradata Contributions to Presto. Phase 1, Implement (June 8, 2015): Installer; Documentation; Monitoring & Support Tools. Phase 2, Integrate (Q4 2015): Management Tool Integration; YARN Integration. Phase 3, Proliferate (2016): ODBC / JDBC Drivers; BI Certification; Security; Connectors. Also listed: Commercial Support; Expanding ANSI SQL Coverage.
14. Teradata's Contributions: ease of install and management via the Presto-Admin tool (www.github.com/prestodb/presto-admin); packaging Presto as an RPM; testing framework for Presto (www.github.com/prestodb/tempto); added a large number of tests; improvements to the JDBC driver. To be open sourced on www.github.com/prestodb soon! Various SQL improvements.
15. Teradata's Contribution Product Roadmap: YARN integration; Ambari integration; ODBC & JDBC drivers that actually work; security (authentication & authorization); continued SQL improvements; BI tool certifications, e.g. Tableau; more connectors, e.g. HBase; open source our Docker-based dev environment; open our continuous integration platform to the community.
16. How can I contribute? www.github.com/facebook/presto; www.github.com/prestodb; Certified Distro: www.teradata.com/presto; Website: www.prestodb.io; Presto Users Group: www.groups.google.com/group/presto-users; Facebook Page: www.facebook.com/prestodb; Twitter: #prestodb
17. Available for Download: Presto 101t certified by Teradata. Presto 101t (Server, CLI, JDBC); Presto-Admin 0.1; Documentation; HDP w/ Presto VM Sandbox; CDH w/ Presto VM Sandbox. www.teradata.com/presto