© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Eva Tse and Daniel Weeks, Netflix
October 2015
BDT303
Running Presto and Spark on the Netflix Big Data Platform

Page 1: Running Presto and Spark on the Netflix Big Data Platform

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Eva Tse and Daniel Weeks, Netflix

October 2015

BDT303

Running Presto and Spark on the Netflix Big Data Platform

Page 2: Running Presto and Spark on the Netflix Big Data Platform

What to Expect from the Session

Our big data scale
Our architecture
For Presto and Spark
- Use cases
- Performance
- Contributions
- Deployment on Amazon EMR
- Integration with Netflix infrastructure

Page 3: Running Presto and Spark on the Netflix Big Data Platform
Page 4: Running Presto and Spark on the Netflix Big Data Platform
Page 5: Running Presto and Spark on the Netflix Big Data Platform
Page 6: Running Presto and Spark on the Netflix Big Data Platform
Page 7: Running Presto and Spark on the Netflix Big Data Platform

Our Biggest Challenge is Scale

Page 8: Running Presto and Spark on the Netflix Big Data Platform

Netflix Key Business Metrics

65+ million members

50 countries

1000+ devices supported

10 billion hours / quarter

Page 9: Running Presto and Spark on the Netflix Big Data Platform

Our Big Data Scale

Total ~25 PB DW on Amazon S3
Read ~10% DW daily
Write ~10% of read data daily

~ 550 billion events daily

~ 350 active platform users
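
As a rough back-of-the-envelope reading of these figures, the daily volumes can be worked out as below. This is only a sketch: it assumes decimal units (1 PB = 1000 TB), takes the approximate slide numbers at face value, and spreads the events evenly over the day.

```python
# Back-of-the-envelope arithmetic for the figures on this slide.
# Assumptions: decimal units (1 PB = 1000 TB); every input is approximate.
DW_SIZE_PB = 25.0        # total warehouse size on Amazon S3
READ_FRACTION = 0.10     # ~10% of the warehouse is read daily
WRITE_FRACTION = 0.10    # ~10% of the read volume is written daily
EVENTS_PER_DAY = 550e9   # ~550 billion events daily

daily_read_pb = DW_SIZE_PB * READ_FRACTION
daily_write_tb = daily_read_pb * WRITE_FRACTION * 1000
events_per_second = EVENTS_PER_DAY / 86_400  # seconds in a day

print(f"Daily read:  ~{daily_read_pb:.1f} PB")
print(f"Daily write: ~{daily_write_tb:.0f} TB")
print(f"Event rate:  ~{events_per_second / 1e6:.1f} million events/s on average")
```

Under these assumptions the warehouse serves roughly 2.5 PB of reads and takes roughly 250 TB of writes per day.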

Page 10: Running Presto and Spark on the Netflix Big Data Platform

Global Expansion
200 countries by end of 2016

Page 11: Running Presto and Spark on the Netflix Big Data Platform

Architecture Overview

Page 12: Running Presto and Spark on the Netflix Big Data Platform

Data Pipelines

Event Data: Cloud apps → Kafka → Ursula → AWS S3 (15 min)
Dimension Data: Cassandra → SS Tables → Aegisthus → AWS S3 (Daily)

Page 13: Running Presto and Spark on the Netflix Big Data Platform

Storage: AWS S3, Parquet FF
Compute: (Amazon EMR Hadoop clusters)
Service: Metacat (Federated metadata service), (Federated execution service)
Tools: Pig workflow visualization, Data movement, Data visualization, Job/Cluster perf visualization, Data lineage, Data quality

Page 14: Running Presto and Spark on the Netflix Big Data Platform

Different Big Data Processing Needs

Analytics
ETL
Interactive data exploration
Interactive slice & dice
RT analytics & iterative/ML algo and more ...

Page 15: Running Presto and Spark on the Netflix Big Data Platform

Amazon S3 as our DW

Page 16: Running Presto and Spark on the Netflix Big Data Platform

Amazon S3 as Our DW Storage

Amazon S3 as single source of truth (not HDFS)
Designed for 11 9’s durability and 4 9’s availability
Separate compute and storage
Key enablement to
- multiple heterogeneous clusters
- easy upgrade via red/black (r/b) deployment

Page 17: Running Presto and Spark on the Netflix Big Data Platform

What About Performance?

Amazon S3 is a much bigger fleet than your cluster
Offload network load from cluster
Read performance
- Single stage read-only job has 5-10% impact
- Insignificant when amortized over a sequence of stages
Write performance
- Can be faster because Amazon S3 is eventually consistent w/ higher throughput
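
To see why the read penalty fades over a multi-stage job, here is an illustrative calculation. The model is an assumption, not something stated in the talk: only the first stage reads from Amazon S3, every stage takes the same time, and the read stage is 10% slower.

```python
# Illustrative arithmetic only: how a per-read penalty shrinks across stages.
# Assumptions (not from the talk): only the first stage reads from Amazon S3,
# every stage takes one time unit, and the read stage pays the penalty.
def amortized_overhead(num_stages: int, read_penalty: float = 0.10) -> float:
    """Return the fractional slowdown of the whole job."""
    baseline = float(num_stages)                       # one unit per stage
    with_s3 = (1.0 + read_penalty) + (num_stages - 1)  # slower first stage
    return with_s3 / baseline - 1.0

for stages in (1, 2, 5, 10):
    print(f"{stages:>2} stages: {amortized_overhead(stages):.1%} slower overall")
```

In this simplified model a 10% penalty on a single-stage job drops to about 1% for a ten-stage job.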

Page 18: Running Presto and Spark on the Netflix Big Data Platform

Presto

Page 19: Running Presto and Spark on the Netflix Big Data Platform

Presto is an open-source, distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes

Page 20: Running Presto and Spark on the Netflix Big Data Platform

Why We Love Presto?

Hadoop friendly - integration with Hive metastore
Works well on AWS - easy integration with Amazon S3
Scalable - works at petabyte scale
User friendly - ANSI SQL
Open source - and in Java!
Fast

Page 21: Running Presto and Spark on the Netflix Big Data Platform
Page 22: Running Presto and Spark on the Netflix Big Data Platform

Usage Stats

~3500 queries/day

> 90%

Page 23: Running Presto and Spark on the Netflix Big Data Platform

Expanding Presto Use Cases

Data exploration and experimentation
Data validation
Backend for our interactive a/b test analytics application
Reporting
Not ETL (yet?)

Page 24: Running Presto and Spark on the Netflix Big Data Platform

Presto Deployment

Page 25: Running Presto and Spark on the Netflix Big Data Platform

Our Deployment

Version 0.114 + patches + one non-public patch (Parquet vectorized read integration)

Deployed via bootstrap action on Amazon EMR
Separate clusters from our Hadoop YARN clusters
Not using Hadoop services
Leverage Amazon EMR for cluster management

Page 26: Running Presto and Spark on the Netflix Big Data Platform

Two Production Clusters

Resource isolation

Ad hoc cluster
- 1 coordinator (r3.4xl) + 225 workers (r3.4xl)

Dedicated application cluster
- 1 coordinator (r3.4xl) + 4 workers + dynamic workers (r3.xl, r3.2xl, r3.4xl)
- Dynamic cluster sizing via Netflix Spinnaker API

Page 27: Running Presto and Spark on the Netflix Big Data Platform

Dynamic Cluster Sizing

Page 28: Running Presto and Spark on the Netflix Big Data Platform

Integration with Netflix Infrastructure

[Diagram: a Presto cluster on Amazon EMR (Sidecar, Coordinator, Presto Worker) connected to Atlas (for monitoring) and Kafka (for data lineage; queries + users info).]

Page 29: Running Presto and Spark on the Netflix Big Data Platform

Presto Contributions

Page 30: Running Presto and Spark on the Netflix Big Data Platform

Our Contributions

Parquet File Format Support
- Schema evolution
- Predicate pushdown
- Vectorized read**

Complex Types
- Various functions for map, array and struct types
- Comparison operators for array and struct types

Amazon S3 File System
- Multi-part upload
- AWS instance credentials
- AWS role support*
- Reliability

Query Optimization
- Single distinct -> group by
- Joins with similar sub-queries*
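
The slide does not spell out the "single distinct -> group by" rewrite, so the following is one plausible reading rather than the actual Presto optimizer rule: a lone `COUNT(DISTINCT x)` is replaced by a `GROUP BY x` followed by a plain count. The sketch uses SQLite from the Python standard library, and the `plays` table and its rows are made up purely to show that the two query shapes agree.

```python
import sqlite3

# Hypothetical table and data, used only to show that the two query shapes agree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (member_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (3, "c"), (3, "c")],
)

# Original shape: a single DISTINCT aggregate.
distinct_sql = "SELECT COUNT(DISTINCT member_id) FROM plays"

# Rewritten shape: de-duplicate with GROUP BY, then count the non-NULL groups.
group_by_sql = """
    SELECT COUNT(member_id) FROM (
        SELECT member_id FROM plays GROUP BY member_id
    )
"""

distinct_count = conn.execute(distinct_sql).fetchone()[0]
group_by_count = conn.execute(group_by_sql).fetchone()[0]
print(distinct_count, group_by_count)
```

The outer query counts `member_id` rather than `*` so that a NULL group, if present, is ignored just as `COUNT(DISTINCT ...)` ignores NULLs.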

Page 31: Running Presto and Spark on the Netflix Big Data Platform

Parquet

Columnar file format
Supported across Hive, Pig, Presto, Spark
Performance benefits across different processing engines
Good performance on Amazon S3
Majority of our DW is on Parquet FF

Page 32: Running Presto and Spark on the Netflix Big Data Platform

Parquet

RowGroup x N
- Column Chunk: Dict Page, Data Page, Data Page
- Column Chunk: Data Page, Data Page, Data Page
- Column Chunk: Dict Page, Data Page, Data Page

Footer: schema, version, etc.
- RowGroup Metadata x N: row count, size, etc.
- Column Chunk Metadata (one per column chunk): encoding, size, min, max

Page 33: Running Presto and Spark on the Netflix Big Data Platform

Vectorized Read

Parquet: read column chunks in batches instead of row-by-row

Presto: replace ParquetHiveRecordCursor with ParquetPageSource

Performance improvement up to 2x for Presto

Beneficial for Spark, Hive, and Drill also

Pending PARQUET-131 commit before we can publish a Presto patch
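
The difference between the two read styles can be illustrated with a toy model. This is not Parquet or Presto code: the column data and the `read_rows` / `read_batches` helpers are hypothetical, and the sketch only contrasts assembling one record at a time with handing back a slice of every column per call.

```python
from typing import Dict, Iterator, List, Tuple

# Toy column chunks: each column is stored contiguously, as in a columnar file.
COLUMNS: Dict[str, List[int]] = {
    "abtest_id": [10, 10, 11, 12, 10, 13],
    "views": [3, 5, 2, 8, 1, 4],
}

def read_rows(columns: Dict[str, List[int]]) -> Iterator[Tuple[int, ...]]:
    """Row-at-a-time: assemble one tuple per record (cursor style)."""
    num_rows = len(next(iter(columns.values())))
    for i in range(num_rows):
        yield tuple(col[i] for col in columns.values())

def read_batches(columns: Dict[str, List[int]],
                 batch_size: int) -> Iterator[Dict[str, List[int]]]:
    """Batch style: return a slice of every column per call (page style)."""
    num_rows = len(next(iter(columns.values())))
    for start in range(0, num_rows, batch_size):
        yield {name: col[start:start + batch_size] for name, col in columns.items()}

# Both styles see the same data; the batch reader makes far fewer calls.
row_total = sum(views for _, views in read_rows(COLUMNS))
batch_total = sum(sum(batch["views"]) for batch in read_batches(COLUMNS, batch_size=4))
print(row_total, batch_total)
```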

Page 34: Running Presto and Spark on the Netflix Big Data Platform

Predicate Pushdown

Dictionary pushdown
- column chunk stats [5, 20]
- dictionary page <5,11,18,20>
- skip this row group

Statistics pushdown
- column chunk stats [20, 30]
- skip this row group

Example: SELECT … WHERE abtest_id = 10;
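
The two skipping rules on this slide can be sketched as a single check. `ColumnChunkMeta` and `can_skip_row_group` are hypothetical names for illustration, not the Parquet or Presto API; the numbers are the ones shown on the slide for the predicate `abtest_id = 10`.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class ColumnChunkMeta:
    """Hypothetical stand-in for Parquet column chunk metadata."""
    min_value: int
    max_value: int
    dictionary: Optional[Set[int]] = None  # None if not dictionary-encoded

def can_skip_row_group(meta: ColumnChunkMeta, wanted: int) -> bool:
    """Return True if no row in this row group can satisfy `column = wanted`."""
    # Statistics pushdown: the value lies outside [min, max].
    if wanted < meta.min_value or wanted > meta.max_value:
        return True
    # Dictionary pushdown: the value is in range but absent from the dictionary.
    if meta.dictionary is not None and wanted not in meta.dictionary:
        return True
    return False

# The two row groups from the slide, for WHERE abtest_id = 10.
stats_only = ColumnChunkMeta(min_value=20, max_value=30)
with_dict = ColumnChunkMeta(min_value=5, max_value=20, dictionary={5, 11, 18, 20})

print(can_skip_row_group(stats_only, 10))  # outside [20, 30]
print(can_skip_row_group(with_dict, 10))   # inside [5, 20] but not in dictionary
```

Both row groups are skipped: the first on statistics alone, the second only after consulting the dictionary page.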

Page 35: Running Presto and Spark on the Netflix Big Data Platform

Predicate Pushdown

Works best if data is clustered by predicate columns

Achieves data pruning like Hive partitions w/o the metadata overhead

Can also be implemented in Spark, Hive and Pig

Page 36: Running Presto and Spark on the Netflix Big Data Platform

Atlas Analytics Use Case

[Figure: four bar charts comparing Pushdown vs. No Pushdown on CPU (s), Wall-clock Time (s), Rows Processed (Mils), and Bytes Read (GB); the bars are annotated 170x, 170x, 170x, and 10x.]

Example query: Analyze 4xx errors from Genie for a day

High cardinality/selectivity for app name and metrics name as predicates

Data staged and clustered by predicate columns

Page 37: Running Presto and Spark on the Netflix Big Data Platform

Stay Tuned…

Two upcoming blog posts on techblog.netflix.com

- Parquet usage @ Netflix Big Data Platform

- Presto + Parquet optimization and performance

Page 38: Running Presto and Spark on the Netflix Big Data Platform

Spark

Page 39: Running Presto and Spark on the Netflix Big Data Platform

Apache Spark™ is a fast and general engine for large-scale data processing.

Page 40: Running Presto and Spark on the Netflix Big Data Platform

Why Spark?

Batch jobs (Pig, Hive)
• ETL jobs
• Reporting and other analysis

Interactive jobs (Presto)

Iterative ML jobs (Spark)

Programmatic use cases

Page 41: Running Presto and Spark on the Netflix Big Data Platform

Deployments @ Netflix

Spark on Mesos
• Self-serving AMI
• Full BDAS (Berkeley Data Analytics Stack)
• Online streaming analytics

Spark on YARN
• Spark as a service
• YARN application on Amazon EMR Hadoop
• Offline batch analytics

Page 42: Running Presto and Spark on the Netflix Big Data Platform

Version Support

$ spark-shell –ver 1.5 …

Application
s3://…/spark-1.4.tar.gz
s3://…/spark-1.5.tar.gz
s3://…/spark-1.5-custom.tar.gz

Configuration
s3://…/1.5/spark-defaults.conf
s3://…/h2prod/yarn-site.xml
s3://../h2prod/core-site.xml
…

Page 43: Running Presto and Spark on the Netflix Big Data Platform

Multi-tenancy

Page 44: Running Presto and Spark on the Netflix Big Data Platform

Multi-tenancy

Page 45: Running Presto and Spark on the Netflix Big Data Platform

Dynamic Allocation on YARN

Optimize for resource utilization

Harness cluster’s scale

Still provide interactive performance

Page 46: Running Presto and Spark on the Netflix Big Data Platform

Task Execution Model

[Diagram: Traditional MapReduce vs. Spark Execution. Labels: Container, Task, Pending Tasks, Persistent.]

Page 47: Running Presto and Spark on the Netflix Big Data Platform

Dynamic Allocation [SPARK-6954]

Page 48: Running Presto and Spark on the Netflix Big Data Platform

Cached Data

Spark allows data to be cached
• Interactive reuse of data set
• Iterative usage (ML)

Dynamic allocation
• Removes executors when no tasks are pending

Page 49: Running Presto and Spark on the Netflix Big Data Platform

Cached Executor Timeout [SPARK-7955]

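import sqlContext.implicits._  // enables the $"col" column syntax

val data = sqlContext
  .table("dse.admin_genie_job_d")
  .filter($"dateint" >= 20150601 and $"dateint" <= 20150830)
data.persist()  // mark the data set for caching
data.count()    // action that materializes the cache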
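
The slide gives only the ticket number, so the following is a toy model of the idea rather than Spark's implementation: an idle executor is released after a short timeout unless it holds cached blocks, in which case a separate and typically longer timeout applies. In Spark these correspond to the `spark.dynamicAllocation.executorIdleTimeout` and `spark.dynamicAllocation.cachedExecutorIdleTimeout` settings; the `Executor` class, the `should_remove` helper, and the timeout values below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Executor:
    """Hypothetical view of an executor as seen by the allocation manager."""
    executor_id: str
    idle_seconds: float
    has_cached_blocks: bool

def should_remove(executor: Executor,
                  idle_timeout: float = 60.0,
                  cached_idle_timeout: float = float("inf")) -> bool:
    """Idle executors expire after idle_timeout; executors holding cached
    data use cached_idle_timeout instead."""
    timeout = cached_idle_timeout if executor.has_cached_blocks else idle_timeout
    return executor.idle_seconds >= timeout

executors = [
    Executor("exec-1", idle_seconds=120, has_cached_blocks=False),
    Executor("exec-2", idle_seconds=120, has_cached_blocks=True),
    Executor("exec-3", idle_seconds=10, has_cached_blocks=False),
]
# Illustrative values: 60 s for plain executors, 600 s for ones with cached data.
removed = [e.executor_id for e in executors if should_remove(e, cached_idle_timeout=600)]
print(removed)
```

With these values only the idle executor without cached blocks is released, so a persisted data set like the one above survives short pauses between interactive queries.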

Page 50: Running Presto and Spark on the Netflix Big Data Platform

Preemption [SPARK-8167]

Symptom
• Spark tasks randomly fail with “executor lost” error

Cause
• YARN preemption is not graceful

Solution
• Preempted tasks shouldn’t be counted as failures but should be retried
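
The solution above can be sketched as a retry loop in which preemptions are retried without consuming the failure budget. This is a simplified illustration, not Spark's scheduler code; the `Preempted` exception, `run_with_retries`, and the `max_failures` value are hypothetical.

```python
from typing import Callable

class Preempted(Exception):
    """Hypothetical signal that the executor was preempted by the cluster manager."""

def run_with_retries(task: Callable[[], str], max_failures: int = 4) -> str:
    """Retry a task; only genuine failures count toward max_failures."""
    failures = 0
    while True:
        try:
            return task()
        except Preempted:
            continue  # retried, but not counted as a failure
        except Exception:
            failures += 1
            if failures >= max_failures:
                raise

# A task that is preempted five times before it finally succeeds.
attempts = {"count": 0}

def flaky_task() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 5:
        raise Preempted()
    return "done"

result = run_with_retries(flaky_task, max_failures=4)
print(result, attempts["count"])
```

Even though the task is interrupted more often than `max_failures` allows, it completes, because none of the interruptions were real task failures.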

Page 51: Running Presto and Spark on the Netflix Big Data Platform

Reading / Processing / Writing

Page 52: Running Presto and Spark on the Netflix Big Data Platform

Amazon S3 Listing Optimization

Problem: Metadata is big data
• Tables with millions of partitions
• Partitions with hundreds of files each

Clients take a long time to launch jobs

Page 53: Running Presto and Spark on the Netflix Big Data Platform

Input split computation

mapreduce.input.fileinputformat.list-status.num-threads
• The number of threads to use to list and fetch block locations for the specified input paths.

Setting this property in Spark jobs doesn’t help

Page 54: Running Presto and Spark on the Netflix Big Data Platform

File listing for partitioned table

[Diagram: each Partition path maps to an Input dir and its own HadoopRDD, collected in a Seq[RDD]; every input dir is listed through S3N.]

Sequentially listing input dirs via S3N file system.

Page 55: Running Presto and Spark on the Netflix Big Data Platform

SPARK-9926, SPARK-10340

Symptom
• Input split computation for partitioned Hive table on Amazon S3 is slow

Cause
• Listing files on a per partition basis is slow
• S3N file system computes data locality hints

Solution
• Bulk list partitions in parallel using AmazonS3Client
• Bypass data locality computation for Amazon S3 objects
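
The parallel-listing idea can be sketched with a thread pool. The talk's fix lives inside Spark and uses AmazonS3Client from the AWS SDK for Java; the Python below makes no real S3 calls, and `FAKE_BUCKET`, `list_partition`, and `bulk_list` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical in-memory "bucket": partition path -> object keys.
FAKE_BUCKET: Dict[str, List[str]] = {
    f"dateint=20150801/hour={h}": [
        f"dateint=20150801/hour={h}/part-{i}.parquet" for i in range(3)
    ]
    for h in range(24)
}

def list_partition(path: str) -> List[str]:
    """Stand-in for one listing request against the object store."""
    return FAKE_BUCKET[path]

def bulk_list(paths: List[str],
              lister: Callable[[str], List[str]] = list_partition,
              num_threads: int = 8) -> Dict[str, List[str]]:
    """List every partition path concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so results line up with paths.
        return dict(zip(paths, pool.map(lister, paths)))

listing = bulk_list(sorted(FAKE_BUCKET))
total_files = sum(len(files) for files in listing.values())
print(len(listing), total_files)
```

Because listing is dominated by request latency rather than CPU, issuing the requests concurrently shortens job launch time roughly in proportion to the number of threads, up to the limits of the store.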

Page 56: Running Presto and Spark on the Netflix Big Data Platform

Amazon S3 Bulk Listing

[Diagram: each Partition path maps to an Input dir and its own HadoopRDD, collected in a ParArray[RDD]; the input dirs are listed through AmazonS3Client.]

Amazon S3 listing input dirs in parallel via AmazonS3Client

Page 57: Running Presto and Spark on the Netflix Big Data Platform

Performance Improvement

[Chart: seconds vs. # of partitions, comparing 1.5 RC2 with S3 bulk listing.]

SELECT * FROM nccp_log WHERE dateint=20150801 and hour=0 LIMIT 10;

Page 58: Running Presto and Spark on the Netflix Big Data Platform

Hadoop Output Committer

How it works
• Each task writes output to a temp dir.
• Output committer renames first successful task’s temp dir to final destination

Challenges with Amazon S3
• Amazon S3 rename is copy and delete (non-atomic)
• Amazon S3 is eventually consistent

Page 59: Running Presto and Spark on the Netflix Big Data Platform

Amazon S3 Output Committer

How it works
• Each task writes output to local disk
• Output committer copies first successful task’s output to Amazon S3

Advantages
• Avoid redundant Amazon S3 copy
• Avoid eventual consistency
• Always write to new paths
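
The commit flow described above can be sketched with local directories. This is an illustration only, not Netflix's committer: a plain directory stands in for Amazon S3, and `commit_first_success`, the attempt directory names, and the `batch_id=1` path are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

def commit_first_success(attempt_dirs: List[Path],
                         succeeded: List[bool],
                         final_dir: Path) -> Path:
    """Copy the first successful attempt's local output to the final location."""
    for attempt_dir, ok in zip(attempt_dirs, succeeded):
        if ok:
            # Single copy from local disk; no rename inside the destination store.
            shutil.copytree(attempt_dir, final_dir)
            return attempt_dir
    raise RuntimeError("no successful task attempt to commit")

with tempfile.TemporaryDirectory() as root:
    root_path = Path(root)
    attempts = []
    for i in range(2):  # two attempts of the same task, both on local disk
        attempt = root_path / f"attempt_{i}"
        attempt.mkdir()
        (attempt / "part-00000").write_text(f"output of attempt {i}")
        attempts.append(attempt)

    # A fresh, never-before-used path stands in for the Amazon S3 destination.
    final = root_path / "s3_stand_in" / "batch_id=1"
    final.parent.mkdir()
    committed = commit_first_success(attempts, [False, True], final)
    committed_name = committed.name
    committed_text = (final / "part-00000").read_text()

print(committed_name, committed_text)
```

Writing each commit to a new path means readers never observe a partially overwritten location, which is how this approach sidesteps the eventual-consistency issues noted on the previous slide.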

Page 60: Running Presto and Spark on the Netflix Big Data Platform

Our Contributions

SPARK-6018, SPARK-6662, SPARK-6909, SPARK-6910, SPARK-7037, SPARK-7451, SPARK-7850

SPARK-8355, SPARK-8572, SPARK-8908, SPARK-9270, SPARK-9926, SPARK-10001, SPARK-10340

Page 61: Running Presto and Spark on the Netflix Big Data Platform

Next Steps for Netflix Integration

Metrics

Data lineage

Parquet integration

Page 62: Running Presto and Spark on the Netflix Big Data Platform

Key Takeaways

Page 63: Running Presto and Spark on the Netflix Big Data Platform

Our DW source of truth is on Amazon S3

Run custom Presto and Spark distros on Amazon EMR
• Presto as stand-alone clusters
• Spark co-tenant on Hadoop YARN clusters

We are committed to open source; you can run what we run

Page 64: Running Presto and Spark on the Netflix Big Data Platform

What’s Next

Page 65: Running Presto and Spark on the Netflix Big Data Platform

On Scaling and Optimizing Infrastructure…

Graceful shrink of Amazon EMR +

Heterogeneous instance groups in Amazon EMR +

Netflix Atlas metrics +

Netflix Spinnaker API =

Load-based expand/shrink of Hadoop YARN clusters

Page 66: Running Presto and Spark on the Netflix Big Data Platform

On Presto and Spark…

Expand new Presto use cases
Integrate Spark in Netflix big data platform
Explore Spark for ETL use cases

Page 67: Running Presto and Spark on the Netflix Big Data Platform

Remember to complete your evaluations!

Page 68: Running Presto and Spark on the Netflix Big Data Platform

Thank you!
Eva Tse & Daniel Weeks

Office hour: 12:00pm – 2pm Today @ Netflix Booth #326