Presto meetup 2015-03-19 @Facebook

  • Published on
    14-Jul-2015

Transcript

  • Copyright 2015 Treasure Data. All Rights Reserved.

    Presto as a Service
    Tips for operation and monitoring

    Dongmin Yu
    Treasure Data, Inc.
    min@treasure-data.com
    JeroMQ / ZeroMQ committer & maintainer

    Mar 19, 2015
    Presto Meetup @ Facebook

  • Topics

    • Presto as a Service in Treasure Data
      • Error recovery
      • Presto deployment
    • Tips for monitoring
      • Presto JSON API
      • Presto + Fluentd
    • Custom changes

  • Treasure Data: Presto as a Service

    [Figure: architecture diagram. Labels: TD API / Web Console; Hive (batch query); Presto (interactive query); Treasure Data PlazmaDB: MessagePack Columnar Storage; td-presto connector. Annotation: Presto Public Release.]

  • Deployment

    • Building Presto takes more than 20 minutes
    • Facebook frequently releases new versions
    • Let CircleCI build Presto
      • Deploy jar files to a private Maven repository
    • We sometimes use non-release versions
      • for fixing serious bugs (hot-fix patches)
    • Integration test
      • td-presto connector: PlazmaDB, multi-tenant query scheduler, query optimizer
      • Run test queries on the staging cluster
      • Presto Verifier

  • Production: Blue-Green Deployment

    http://martinfowler.com/bliki/BlueGreenDeployment.html

    • 2 Presto coordinators (Blue/Green)
    • Route Presto queries to the active cluster
    • No downtime upon deployment
    • Launch Presto worker instances with Chef
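
A minimal sketch of the routing idea above, assuming two coordinator URLs and a single "active" flag that is flipped once the idle cluster has been upgraded. The class name and host names are hypothetical; the deck does not show how TD implements the switch.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Routes queries to whichever coordinator is currently active."""
    blue_url: str
    green_url: str
    active: str = "blue"

    def active_url(self) -> str:
        """URL that new queries should be sent to."""
        return self.blue_url if self.active == "blue" else self.green_url

    def switch(self) -> str:
        """Flip the active cluster, e.g. after the idle one has been upgraded."""
        self.active = "green" if self.active == "blue" else "blue"
        return self.active


# Hypothetical host names, for illustration only.
router = BlueGreenRouter("http://presto-blue:8080", "http://presto-green:8080")
```

Queries already running on the old cluster can be left to finish before it is shut down, which is what makes the switch free of downtime.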

  • Error Recovery

    • Presto has no fault tolerance
    • Error types
      • User error
        • Syntax errors: SQL syntax, missing function
        • Semantic errors: missing tables/columns
      • Insufficient resource
        • Exceeded task memory size
      • Internal failure
        • I/O error (S3/Riak CS)
        • worker failure
        • etc.

    Worth a retry!

  • Failed Query Rate

    [Figure: failed query rate]

  • Query Retry Patterns used in TD

    • Error code + message pattern
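
A minimal sketch of matching on "error code + message pattern". The rule table is illustrative only: the error codes, the message regexes, and the choice to retry just the internal-failure class are assumptions, not TD's production list.

```python
import re

# Illustrative rules only: (error-code regex, message regex, retry?).
# Codes and message patterns are assumptions, not TD's actual patterns.
RETRY_RULES = [
    (r"SYNTAX_ERROR|SEMANTIC_ERROR", r".*", False),   # user error: a retry cannot help
    (r"EXCEEDED_MEMORY_LIMIT", r".*", False),         # insufficient resource
    (r"INTERNAL_ERROR", r"(?i)(i/o error|slow ?down|worker .* (failed|gone))", True),
]


def should_retry(error_code: str, message: str) -> bool:
    """Return the retry decision of the first rule matching code and message."""
    for code_pattern, message_pattern, retry in RETRY_RULES:
        if re.fullmatch(code_pattern, error_code) and re.search(message_pattern, message):
            return retry
    return False  # unknown failures are not retried by default
```

Keeping the rules in a table makes it cheap to add a pattern whenever a new transient failure shows up in the query logs.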

  • Monitoring Presto with Fluentd

  • Monitoring Presto

    • REST API for monitoring Presto state
      • JSON format
    • (presto server IP):8080/v1/query
      • List of recent queries (BasicQueryInfo class)
    • (presto server IP):8080/v1/query/(query id)
      • Detailed query state information
      • Query plan, tasks and running worker IDs
      • Processed rows/data size
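
A minimal sketch of polling the query list endpoint and counting queries per state. It assumes the response is a JSON array whose entries carry a "state" field; the helper names are hypothetical, and the sample payload is hand-written rather than captured from a real coordinator.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_queries(coordinator: str) -> list:
    """GET /v1/query from the coordinator and decode the JSON array."""
    with urlopen(f"http://{coordinator}:8080/v1/query", timeout=10) as resp:
        return json.load(resp)


def count_by_state(queries: list) -> Counter:
    """Count queries per state; assumes each entry has a "state" field."""
    return Counter(q.get("state", "UNKNOWN") for q in queries)


# Offline example with a hand-written payload shaped like the assumed response.
sample = json.loads('[{"queryId": "q1", "state": "RUNNING"},'
                    ' {"queryId": "q2", "state": "FINISHED"},'
                    ' {"queryId": "q3", "state": "FINISHED"}]')
state_counts = count_by_state(sample)
```

Against a live cluster, `count_by_state(fetch_queries("(presto server IP)"))` would give the same kind of summary, which can then be forwarded to a metrics collector.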

  • Query List: /v1/query

  • Detailed Query Info: /v1/query/(query id)

  • /ui/query-execution/(query id)

  • Complex Queries

  • Presto Coordinator

    • Organizes query execution pipelines
      • Coordinates Presto workers
      • Retrieves table partition and split location from connectors
      • Creates distributed query plans
    • Full GC stalls the coordinator
      • When memory is insufficient
      • Use a memory-rich machine
      • GC tuning: -XX:+UseG1GC

  • presto-metrics (Ruby)

    https://github.com/xerial/presto-metrics

  • Query Collection in TD

    • SQL query logs
      • query, detailed query plan, elapsed time, processed rows, etc.
      • newSetBinder(binder, EventClient.class).addBinding()
            .to(FluentEventClient.class)
    • Presto is used for analyzing the query history

  • Daily/Hourly Query Usage

    [Figure: daily/hourly query usage]

  • Query Running Time

    • More than 90% of queries finish within 2 min.
      • Expected response time for interactive queries

  • Detecting Anomaly

    • Started query rate (in 5 min/15 min)
      • If no query has started, the cluster may be down (or not started properly)
    • Processed rows in a query
      • Sum up the number of processed rows from all of the sub stages
      • Simple, but the most reliable measure
    • Send an alert
      • Slack notification
      • PagerDuty call
        • JP/US team rotation
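
A minimal sketch of the two checks above. It assumes the detailed query info can be reduced to a tree of stages; the field names "processedRows" and "subStages" are placeholders for illustration, not confirmed names from the Presto JSON, and the sample tree is made up.

```python
def total_processed_rows(stage: dict) -> int:
    """Sum processed rows over a stage and all of its sub stages, recursively.

    "processedRows" and "subStages" are assumed field names.
    """
    own = stage.get("processedRows", 0)
    return own + sum(total_processed_rows(s) for s in stage.get("subStages", []))


def no_query_started(start_times: list, now: float, window_sec: float = 300.0) -> bool:
    """True when no query started within the last window_sec seconds."""
    return not any(now - window_sec <= t <= now for t in start_times)


# Made-up stage tree for illustration.
stage_tree = {
    "processedRows": 10,
    "subStages": [
        {"processedRows": 1000, "subStages": []},
        {"processedRows": 250, "subStages": [{"processedRows": 5000}]},
    ],
}
```

When `no_query_started(...)` returns True for the 5-minute or 15-minute window, that is the condition that would trigger the Slack or PagerDuty alert.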

  • Benchmarking

    • Query performance comparison between two versions of Presto
    • Benchmark
      • Run the query set multiple times
      • Store the results to TD
      • Report the result with Presto
        • Aggregation query
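
A minimal sketch of the benchmark loop, assuming a caller-supplied `run_query` function that blocks until the query finishes. Using the median as the summary statistic and reporting an old/new ratio are choices made for this example; the deck does not say which aggregation TD uses.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run_query: Callable[[str], None],
              queries: List[str],
              repeats: int = 3) -> Dict[str, float]:
    """Run every query `repeats` times and keep the median wall-clock time."""
    medians = {}
    for sql in queries:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)
            timings.append(time.perf_counter() - start)
        medians[sql] = statistics.median(timings)
    return medians


def speedup(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Ratio old/new per query; values above 1.0 mean the new version is faster."""
    return {sql: old[sql] / new[sql] for sql in old if sql in new and new[sql] > 0}
```

Running `benchmark` once per Presto version and passing both results to `speedup` gives a per-query comparison that can be stored and reported.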

  • Presto Operation Tool

    • Prestop
      • Our internal tool for managing multiple Presto clusters, written in Scala
    • Query monitoring
    • Benchmarking
    • Workload simulation
      • stress testing
    • Monitoring
      • Datadog
      • PagerDuty
      • ChartIO (query stats)

  • Optimizing Scan Performance

    • Storage Manager
      • Fully utilize the network bandwidth from S3
      • CPU becomes the bottleneck in TD Presto
    • Retry GET request on
      • 500 (internal error)
      • 503 (slow down)
      • 404 (not found): eventual consistency
    • S3 read, decompression
    • msgpack-java v07: on-demand deserialization

    [Figure: scan pipeline. Components: TableScanOperators (S3 file list, table schema, header request); Request Queue (priority queue, max connections limit); S3 / RiakCS; HeaderReader (callback to HeaderParser); HeaderParser (parses the MPC file header: column block offsets, column names; issues column block requests); ColumnBlockReader (prepares buffers); Buffer (buffer size limit, reuse allocated buffers, release(Buffer)); MessageUnpacker (pull records). MPC1 file layout: Header, Column Block 0 (column names), Column Block 1, ..., Column Block i, ..., Column Block m.]
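
A minimal sketch of retrying a GET on the status codes named in the slide (500, 503, 404). The `fetch` callable, the attempt limit, and the exponential backoff are assumptions for this example; the deck does not describe the actual retry schedule.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUS = {500, 503, 404}  # status codes listed on the slide


def get_with_retry(fetch: Callable[[], Tuple[int, bytes]],
                   max_attempts: int = 5,
                   base_delay: float = 0.1,
                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch() until it succeeds, hits a non-retryable status, or runs out of attempts."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if 200 <= status < 300:
            return body
        if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            raise IOError(f"GET failed with status {status}")
        sleep(base_delay * (2 ** attempt))  # exponential backoff (an assumption)
    raise IOError("no attempts were made")
```

Treating 404 as retryable only makes sense because the object is expected to exist; under eventual consistency a freshly written file can be briefly invisible.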

  • Multi-tenancy: Resource Allocation

    • Price-plan based resource allocation
    • Parameters
      • The number of worker nodes to use (min-candidates)
      • The number of hash partitions (initial-hash-partitions)
      • The maximum number of running tasks per account
    • If running queries exceed the allowed number of tasks, the next queries need to wait (queued)
    • Presto: SqlQueryExecution class
      • Controls query execution state: planning -> running -> finished
      • No resource allocation policy
    • Extended TDSqlQueryExection class monitors running tasks and limits resource usage
      • Rewriting SqlQueryExecutionFactory at run-time by using the ASM library
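
A minimal sketch of the per-account limit described above: queries run while the account stays within its task budget, and later queries wait in a queue. The class name, the FIFO policy, and the bookkeeping are assumptions for this example and are not taken from TD's implementation.

```python
from collections import deque


class AccountTaskLimiter:
    """Admits queries while an account stays under its running-task limit."""

    def __init__(self, max_running_tasks: int):
        self.max_running_tasks = max_running_tasks
        self.running = {}       # query id -> number of tasks
        self.queued = deque()   # (query id, tasks) waiting for capacity

    def _running_tasks(self) -> int:
        return sum(self.running.values())

    def submit(self, query_id: str, tasks: int) -> str:
        """Start the query if it fits and nothing is queued ahead of it, else queue it."""
        fits = self._running_tasks() + tasks <= self.max_running_tasks
        if fits and not self.queued:
            self.running[query_id] = tasks
            return "running"
        self.queued.append((query_id, tasks))
        return "queued"

    def finish(self, query_id: str) -> list:
        """Release a query's tasks and start queued queries that now fit (FIFO)."""
        self.running.pop(query_id, None)
        started = []
        while self.queued and (self._running_tasks() + self.queued[0][1]
                               <= self.max_running_tasks):
            qid, tasks = self.queued.popleft()
            self.running[qid] = tasks
            started.append(qid)
        return started
```

In this sketch a query that needs more tasks than the account limit would wait forever, so a real scheduler would need to reject or cap such queries.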

  • WE ARE HIRING!

    Check: www.treasuredata.com
