Extending the Yahoo! Streaming Benchmark
Jamie Grier, @jamiegrier, [email protected]


Page 1: Extending the Yahoo Streaming Benchmark

Extending the Yahoo! Streaming Benchmark

Jamie Grier
@jamiegrier
[email protected]

Page 2: Extending the Yahoo Streaming Benchmark

Who am I?

• Director of Applications Engineering at data Artisans
• Previously working on streaming computation at Twitter, Gnip and Boulder Imaging
• Involved in various kinds of stream processing for about a decade
• High-speed video, social media streaming, general frameworks for stream processing

Page 3: Extending the Yahoo Streaming Benchmark

Overview

• Yahoo! performed a benchmark comparing Apache Flink, Storm and Spark
• The benchmark never actually pushed Flink to its throughput limits but stopped at Storm’s limits
• I knew Flink was capable of much more, so I repeated the benchmarks myself
• I did a follow-up blog post explaining my findings and will summarize them here

Page 4: Extending the Yahoo Streaming Benchmark

Yahoo! Benchmark

• Count ad impressions grouped by campaign
• Compute aggregates over a 10 second window
• Emit current value of window aggregates to Redis every second for query
• Map ads to campaigns using Redis as well
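The core of the job described above can be sketched in a few lines of plain Python. This is only an illustration of the counting logic, not the benchmark's actual Storm or Flink code: the event layout, the "view" event type, and the names `count_impressions` and `ad_to_campaign` are assumptions, a dict stands in for the Redis ad-to-campaign lookup, and the once-per-second emit to Redis is left out.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # window length taken from the slide

def window_start(event_time):
    """Start (in whole seconds) of the 10-second window that contains event_time."""
    return int(event_time // WINDOW_SECONDS) * WINDOW_SECONDS

def count_impressions(events, ad_to_campaign):
    """Count impressions per (campaign, window start).

    events: iterable of (ad_id, event_type, event_time_seconds); hypothetical layout.
    ad_to_campaign: dict standing in for the Redis ad -> campaign mapping.
    """
    counts = defaultdict(int)
    for ad_id, event_type, event_time in events:
        if event_type != "view":          # assumption: impressions are "view" events
            continue
        campaign = ad_to_campaign.get(ad_id)
        if campaign is None:              # unknown ad: nothing to count against
            continue
        counts[(campaign, window_start(event_time))] += 1
    return dict(counts)

# Tiny made-up input to exercise the sketch.
ad_to_campaign = {"ad1": "campA", "ad2": "campA", "ad3": "campB"}
events = [
    ("ad1", "view", 0.5),
    ("ad2", "view", 9.9),
    ("ad3", "click", 3.0),
    ("ad3", "view", 10.0),
    ("ad1", "view", 12.3),
]
result = count_impressions(events, ad_to_campaign)
```

In a real streaming job the counts would be kept as running state and the current value pushed out every second, rather than returned once at the end as here.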

Page 5: Extending the Yahoo Streaming Benchmark

Any questions so far?

Page 6: Extending the Yahoo Streaming Benchmark

Storm Code

Page 7: Extending the Yahoo Streaming Benchmark

Flink Code

Page 8: Extending the Yahoo Streaming Benchmark

Hardware Specs

• 10 Kafka brokers with 2 partitions each
• 10 compute nodes (Flink / Storm)
• Each machine has 1 Xeon [email protected] CPU
• 4 cores w/ hyperthreading
• 32 GB RAM (only 8 GB allocated to JVMs)
• 10 GigE Ethernet between compute nodes
• 1 GigE Ethernet between Kafka cluster and compute nodes

Page 9: Extending the Yahoo Streaming Benchmark

Logical Deployment

[Diagram: Data Generator -> Kafka -> Stream Processor (Source -> Filter -> Project -> Join -> Window -> Sink) -> Redis, with a second Redis box at the Join.]

Page 10: Extending the Yahoo Streaming Benchmark

Apache Storm Deployment

[Diagram: Flink Data Generator, Kafka (x3), Apache Storm running Source, Filter, Project, Join, Window and Sink as separate stages, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 11: Extending the Yahoo Streaming Benchmark

[Diagram, build step: Flink Data Generator, Kafka (x3), Source, Filter, Project, Join, Window and Sink as separate stages, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 12: Extending the Yahoo Streaming Benchmark

[Diagram, build step: same layout, with Source / Filter grouped into one stage, followed by Project, Join, Window and Sink. Legend: 10 GigE link, 1 GigE link.]

Page 13: Extending the Yahoo Streaming Benchmark

[Diagram, build step: same layout, with Source / Filter / Project grouped into one stage, followed by Join, Window and Sink. Legend: 10 GigE link, 1 GigE link.]

Page 14: Extending the Yahoo Streaming Benchmark

[Diagram, build step: same layout, with Source / Filter / Project / Join grouped into one stage, followed by Window and Sink. Legend: 10 GigE link, 1 GigE link.]

Page 15: Extending the Yahoo Streaming Benchmark

[Diagram, build step: same layout, now with two grouped stages, Source / Filter / Project / Join and Window / Sink, plus Shuffle. Legend: 10 GigE link, 1 GigE link.]

Page 16: Extending the Yahoo Streaming Benchmark

[Diagram, build step: Flink Data Generator, Kafka (x3), the two grouped stages Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 17: Extending the Yahoo Streaming Benchmark

Apache Flink Deployment

[Diagram: Flink Data Generator, Kafka (x3), Apache Flink running two grouped stages, Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]
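The deployment slides appear to show Source / Filter / Project / Join collapsed into a single stage, with a shuffle in front of Window / Sink. A rough way to picture that in plain Python is below. It is a sketch of the idea only, not Flink's runtime: `chained_stage`, `shuffle`, `PARALLELISM`, the event layout and the CRC32-based routing are all assumptions made for illustration.

```python
import zlib

PARALLELISM = 4  # hypothetical number of parallel Window / Sink tasks

def chained_stage(raw_events, ad_to_campaign):
    """Source / Filter / Project / Join run back to back in one loop, with no hand-off between them."""
    for ad_id, event_type, event_time in raw_events:   # source
        if event_type != "view":                        # filter (assumed "view" events)
            continue
        projected = (ad_id, event_time)                 # project: keep only the needed fields
        campaign = ad_to_campaign.get(projected[0])     # join: dict stands in for Redis
        if campaign is not None:
            yield campaign, projected[1]

def shuffle(records, parallelism=PARALLELISM):
    """Route each record to a partition by a stable hash of its campaign key."""
    partitions = [[] for _ in range(parallelism)]
    for campaign, event_time in records:
        index = zlib.crc32(campaign.encode("utf-8")) % parallelism
        partitions[index].append((campaign, event_time))
    return partitions

# Tiny made-up input to exercise the sketch.
ad_to_campaign = {"ad1": "campA", "ad2": "campA", "ad3": "campB"}
raw_events = [
    ("ad1", "view", 0.5),
    ("ad2", "view", 9.9),
    ("ad3", "click", 3.0),
    ("ad3", "view", 10.0),
    ("ad1", "view", 12.3),
]
partitions = shuffle(chained_stage(raw_events, ad_to_campaign))
```

The point of the sketch is that only the records crossing the shuffle need to move between tasks; everything before it stays in one loop, and all records for a given campaign land in the same partition so a window task can count them locally.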

Page 18: Extending the Yahoo Streaming Benchmark

Processing Guarantees: Apples and Oranges

Apache Storm                      Apache Flink
At least once semantics           Exactly once semantics
Double counting after failures    No double counting
Lost state after failures         No state loss
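The double-counting row can be illustrated with a toy model. The sketch below is not how Storm or Flink implement recovery; it only assumes a single counter, one crash, and replay from the last checkpointed offset. The function `process` and its parameters are hypothetical. When the counter is rolled back together with the offset, replayed events are counted once; when it is not, they are counted twice. The "lost state" row is not modeled.

```python
def process(events, crash_at, checkpoint_every, rollback_state):
    """Count events, crash once just before offset `crash_at`, then replay from the last checkpoint.

    rollback_state=True : the counter is restored to its checkpointed value (exactly-once result).
    rollback_state=False: the counter keeps its pre-crash value, so replayed events are counted again.
    """
    count = 0
    checkpoint = (0, 0)      # (offset to resume from, count at that offset)
    offset = 0
    crashed = False
    while offset < len(events):
        if not crashed and offset == crash_at:
            crashed = True
            saved_offset, saved_count = checkpoint
            offset = saved_offset          # replay from the last checkpointed offset
            if rollback_state:
                count = saved_count        # state and offset restored together
            continue
        count += 1                         # "process" one event
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, count)   # snapshot offset and state at the same point
    return count

events = list(range(10))
exactly_once = process(events, crash_at=7, checkpoint_every=5, rollback_state=True)
at_least_once = process(events, crash_at=7, checkpoint_every=5, rollback_state=False)
```

With ten events, a checkpoint every five events and a crash before the eighth, the rolled-back run ends at 10 while the other run ends at 12, because the two events processed between the checkpoint and the crash are counted twice.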

Page 19: Extending the Yahoo Streaming Benchmark

Benchmark: Baseline

[Bar chart: throughput in msgs/sec for Storm and Flink; axis from 0 to 3,750,000.]

Page 20: Extending the Yahoo Streaming Benchmark

Bottleneck Analysis: Apache Storm

[Diagram: Flink Data Generator, Kafka (x3), Apache Storm running Source, Filter, Project, Join, Window and Sink as separate stages, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 21: Extending the Yahoo Streaming Benchmark

Bottleneck Analysis: Apache Storm

[Diagram: same Apache Storm deployment as on the previous slide, now with the label "CPU". Legend: 10 GigE link, 1 GigE link.]

Page 22: Extending the Yahoo Streaming Benchmark

Bottleneck Analysis: Apache Flink

[Diagram: Flink Data Generator, Kafka (x3), Apache Flink running Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 23: Extending the Yahoo Streaming Benchmark

Bottleneck Analysis: Apache Flink

[Diagram: same Apache Flink deployment as on the previous slide, now with the label "Network". Legend: 10 GigE link, 1 GigE link.]

Page 24: Extending the Yahoo Streaming Benchmark

Eliminate the Bottleneck

[Diagram: Flink Data Generator, Kafka (x3), Apache Flink running Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 25: Extending the Yahoo Streaming Benchmark

Eliminate the Bottleneck

[Diagram, build step: the Kafka boxes are gone; Flink Data Generator, Apache Flink running Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2). Legend: 10 GigE link, 1 GigE link.]

Page 26: Extending the Yahoo Streaming Benchmark

Eliminate the Bottleneck

[Diagram, build step: Data Generator, Apache Flink running Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2); no Kafka. Legend: 10 GigE link, 1 GigE link.]

Page 27: Extending the Yahoo Streaming Benchmark

Apache Flink Deployment: Round 2

[Diagram: Data Generator, Apache Flink running Source / Filter / Project / Join and Window / Sink, Shuffle, Redis (x2); no Kafka. Legend: 10 GigE link, 1 GigE link.]

Page 28: Extending the Yahoo Streaming Benchmark

Benchmark: Baseline

[Bar chart: throughput in msgs/sec for Storm and Flink; axis from 0 to 3,750,000.]

Page 29: Extending the Yahoo Streaming Benchmark

Benchmark Round 2: 10 GigE end-to-end

[Bar chart: throughput in msgs/sec for Storm, Flink and Flink (10 GigE); axis from 0 to 16,000,000.]

Page 30: Extending the Yahoo Streaming Benchmark

Results

• Apache Flink achieved 15 million messages / sec on the Yahoo! benchmark
• Much stronger processing guarantees: exactly once
• 80x higher than what was reported in the original Yahoo! benchmark on similar hardware
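As a quick sanity check on the two figures above, the throughput implied for the original benchmark can be worked out directly. This is only arithmetic on the rounded numbers from the slide, not a figure quoted from the original Yahoo! report.

```python
flink_throughput = 15_000_000   # msgs/sec, from the slide
speedup = 80                    # "80x higher", from the slide

# Throughput the original benchmark would have reported if both slide figures were exact.
implied_original = flink_throughput / speedup
print(f"Implied original throughput: {implied_original:,.0f} msgs/sec")
```

Since both inputs are rounded, the result is best read as "a bit under 200,000 msgs/sec" rather than as a precise value.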

Page 31: Extending the Yahoo Streaming Benchmark

Questions?

Page 32: Extending the Yahoo Streaming Benchmark

Storm Compatibility

• Lots of companies already have applications written using the Storm API
• Flink provides a Storm compatibility layer
• Run your Storm jobs on Flink with a one-line code change
• Flink also allows you to reuse your existing Storm spout and bolt code from a Flink job
• Give it a try!

Page 33: Extending the Yahoo Streaming Benchmark

Thanks!