Presto @ Treasure Data - Presto Meetup Boston 2015

  • Published on
    12-Apr-2017

Transcript

  • Designing An Evolving Database Service with Presto

    Taro L. Saito leo@treasure-data.com

    Oct 6th, 2015. Presto Meetup @ Boston

  • Presto Usage at Treasure Data

    100~ customers are actively using Presto

    30,000~ Presto queries every day

    Importing 1,000,000~ records / sec.

    Import Export

    Store Analyze with Presto/Hive

  • Mobile and Web Sources

    Mobile SDKs

    JavaScript SDK (web access logs)

  • Stream Sources

    Streaming

    Apache logs, nginx logs

    syslog, JSON logs

  • Existing Data Sources

    Bulk Import

    Data files (CSV, TSV, etc.)

    MySQL, PostgreSQL, Oracle

  • Embedded Devices

    Collect data from Embedded linux, serial devices, MQTT, XBee Radio, etc.

  • Import data, now.

  • Treasure Data Architecture

    [Architecture diagram]

    Logs arrive in Real-Time Storage as many small log files.

    A log merge job (Hadoop MapReduce) merges them into 1-hour partitions in Archive Storage, e.g. 2015-09-29 01:00:00, 2015-09-29 02:00:00, 2015-09-29 03:00:00.

    Time column-based partitioning

    Columnar format

    Storage: S3 (AWS), Riak CS (IDCF)

    Query engines: Hive, Presto (distributed SQL query engine)
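
    The time column-based partitioning on this slide can be illustrated with a minimal sketch. It assumes that the time column holds Unix seconds and that partitions are aligned to the hour; the function names are hypothetical and this is not Treasure Data's actual storage code.

```python
from collections import defaultdict

PARTITION_SECONDS = 3600  # 1-hour partitions, as on the slide


def partition_start(unix_time: int) -> int:
    """Return the start (Unix seconds) of the 1-hour partition containing unix_time."""
    return unix_time - (unix_time % PARTITION_SECONDS)


def group_by_partition(records):
    """Group records by the hourly partition of their 'time' column."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_start(record["time"])].append(record)
    return dict(partitions)


def partitions_for_range(start: int, end: int):
    """List the partition starts that overlap the half-open range [start, end)."""
    if end <= start:
        return []
    first = partition_start(start)
    last = partition_start(end - 1)
    return list(range(first, last + 1, PARTITION_SECONDS))


# Sample records (timestamps are illustrative).
records = [
    {"time": 1412380700, "user": 1},
    {"time": 1412381000, "user": 2},
    {"time": 1412390000, "user": 3},
]
partitions = group_by_partition(records)
```

    A query restricted to a time range then only needs the partitions returned by partitions_for_range, which is the usual motivation for partitioning on the time column.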

  • Extensible Columnar Store

    JSON data: {time: 1412380700, user: 1}

    Additional column: {time: 1412381000, user: 2, status: 200}

    Type escalation (int -> string): {time: 1412390000, user: "U01", status: 200}

    MessagePack: a fast and compact JSON-like format

    Auto type conversion between the table schema and MessagePack types
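
    The additional-column and type-escalation behavior above can be sketched as follows. This is an illustration only, assuming just two column types (int and string) and plain Python dicts in place of MessagePack records; infer_schema and read_column are hypothetical names, not part of any Treasure Data API.

```python
def infer_schema(records):
    """Build a column -> type-name map, escalating int to str on conflict."""
    schema = {}
    for record in records:
        for column, value in record.items():
            observed = type(value).__name__  # 'int' or 'str' in this sketch
            known = schema.get(column)
            if known is None:
                # A column seen for the first time is simply added.
                schema[column] = observed
            elif known != observed:
                # Mixed int/str values: escalate the column to string.
                schema[column] = "str"
    return schema


def read_column(records, column, schema):
    """Read one column, converting each value to the schema type.

    Records written before the column existed yield None.
    """
    convert = str if schema[column] == "str" else int
    return [convert(r[column]) if column in r else None for r in records]


# The three records from the slide.
records = [
    {"time": 1412380700, "user": 1},
    {"time": 1412381000, "user": 2, "status": 200},
    {"time": 1412390000, "user": "U01", "status": 200},
]

schema = infer_schema(records)
users = read_column(records, "user", schema)
statuses = read_column(records, "status", schema)
```

    Here the user column ends up as a string column because of the value U01, and the first record has no status value.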

  • Use Cases

  • E-COMMERCE

    BEFORE

    AFTER

    Biggest Mobile Shopping

    WISH.COM

    Reduced costs

    Scalability

    Single data warehouse

  • GAMING

    BEFORE

    AFTER

    Daily upload

    Delay of 1-2 days

    2500+ servers

    Real-time

    2500+ servers

    1 Billion records/day

    Reduced TCO

    Real-time collection

    Real-time access to KPIs

    Top 10 globally; 40M+ users

    x 20

  • AD TECH

    Publishers Dashboard

    Advertisers Dashboard

    800 B/month

    Live in 2 weeks with 1 engineer!

    300% growth

    Europe's largest mobile ad exchange

    More than 50 billion impressions/month

  • LOYALTY

    Aggregation

    E-Commerce

    Marketing Campaigns; Promotions

    Customer Segmentation

    A/B Testing

  • Challenges

    Handle huge query result output

      SELECT * / CREATE TABLE AS / INSERT INTO

      Parallel result upload to S3

      Bypass JSON result generation at the coordinator

    td-presto connector

      Accesses the MessagePack-based columnar store

      Handles S3 access retry / pipelining

    Future: Better query plan visualization

      Quickly find the performance bottleneck and memory-consuming tasks

    Future: Storing intermediate query results to disks

      Process large joins, query resource limitation
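
    The slide mentions S3 access retry without showing how it is done. A common approach is retry with exponential backoff and jitter; the sketch below shows that general technique only and makes no claim about how the td-presto connector implements it. TransientError, with_retry, and make_flaky_fetch are hypothetical names, and the storage read is simulated rather than a real S3 call.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def with_retry(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Delay doubles on each attempt; jitter spreads out retries.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


def make_flaky_fetch(failures):
    """Build a stand-in storage read that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated transient storage error")
        return b"column-chunk"

    return fetch, state


# Usage: two simulated failures, then success on the third call.
fetch, state = make_flaky_fetch(2)
data = with_retry(fetch, sleep=lambda seconds: None)  # no real sleeping here
```

    Passing sleep as a parameter keeps the sketch testable without waiting; a real reader would use the default time.sleep.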

  • Extensible Schema SQL via Hive, Presto

    Unlimited Users, Queries

    Enterprise Apps

    Data Science Tools

    REST API

    Ingestion: Streaming, Bulk

    BI Tools

    treasuredata.com/request_demo
