Extending the Yahoo Streaming Benchmark for Apache Apex
San Jose Apache Apex Meetup, May 4th 2016
Sandesh Hegde


Background

• Yahoo created a benchmark for stream processing systems and used it to compare Storm, Flink, and Spark Streaming [1]

• dataArtisans extended the benchmark, comparing Flink and Storm under different scenarios [2]

• No comparison of stream processing systems is complete without Apache Apex.


Yahoo Streaming Benchmark

Simple advertisement application: counts how many times each ad campaign has been seen in a window.

• Read ads from Kafka

• Deserialize JSON string

• Filter unnecessary ads

• Projection of fields (remove non-essential fields)

• Join ad id with campaign id from Redis

• Windowed count per campaign and output to Redis
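The steps above can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not the Apex or Yahoo implementation: the `AD_TO_CAMPAIGN` dictionary is a hypothetical stand-in for the Redis lookup, and the 10-second window and the `"view"` event filter are assumptions made for the example.

```python
import json
from collections import Counter

# Hypothetical stand-in for the ad_id -> campaign_id mapping kept in Redis.
AD_TO_CAMPAIGN = {"600589859": "campaign-1", "600589860": "campaign-2"}

# Assumed window length in milliseconds (not stated on the slide).
WINDOW_MS = 10_000


def count_campaign_views(raw_events, wanted_event_type="view"):
    """Return a Counter keyed by (campaign_id, window_start_ms)."""
    counts = Counter()
    for raw in raw_events:
        event = json.loads(raw)                        # deserialize JSON string
        if event["event_type"] != wanted_event_type:   # filter unnecessary ads
            continue
        # Projection: keep only the fields the later steps need.
        ad_id = event["ad_id"]
        event_time = int(event["event_time"])
        campaign = AD_TO_CAMPAIGN.get(ad_id)           # join ad id with campaign id
        if campaign is None:
            continue
        window_start = event_time - event_time % WINDOW_MS
        counts[(campaign, window_start)] += 1          # windowed count per campaign
    return counts
```

In the real application each step is a separate operator and the final counts are written to Redis rather than returned.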


Application - with Kafka


[Diagram: Kafka → Kafka Input → Deserialize → Filter → Filter Fields → Redis Join → Redis Output → Redis]

Setup

• Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
• 10GigE between compute nodes
• 4 Kafka brokers (2 partitions each & 1 replica)
• Kafka version: 0.8.2
• Apex (3.4-SNAPSHOT & 3.3) & Flink (1.0.2)
• YARN container size: 16GB
• 1 ZooKeeper
• Message size: 218 bytes
• Sample message: {"user_id":"e5e0db4b-05ea-4ac5-af7a-4bba5ed27c4c","page_id":"80f60d0a-b02b-40e2-a667-5548a1120dda","ad_id":"600589859","ad_type":"banner78","event_type":"purchase","event_time":"1462374087774","ip_address":"1.2.3.4"}
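As a rough back-of-the-envelope check on this setup, the message size and link speed give an upper bound on how many messages a single 10GigE link could carry. The sketch below is only that arithmetic; it ignores Kafka, TCP, and Ethernet framing overhead, so real numbers would be lower. The function and constant names are made up for the example.

```python
MESSAGE_SIZE_BYTES = 218             # message size from the setup
LINK_BITS_PER_SECOND = 10 * 10**9    # 10GigE, raw line rate
KAFKA_BROKERS = 4
PARTITIONS_PER_BROKER = 2


def max_messages_per_second(link_bps, message_bytes):
    """Upper bound on messages per second over one link, ignoring all overhead."""
    return link_bps // (8 * message_bytes)


def total_partitions(brokers, partitions_per_broker):
    """Total Kafka partitions available for parallel reads."""
    return brokers * partitions_per_broker


link_bound = max_messages_per_second(LINK_BITS_PER_SECOND, MESSAGE_SIZE_BYTES)
partitions = total_partitions(KAFKA_BROKERS, PARTITIONS_PER_BROKER)
print(f"link bound: {link_bound} msgs/s, partitions: {partitions}")
```

Under these assumptions a single link tops out at roughly 5.7 million messages per second, spread over 8 partitions.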


Apex Application


Physical Plan


Quick Primer on Locality


• CONTAINER_LOCAL
  ■ Deployed in the same process, different threads
  ■ No serialization
  ■ Queue between the operators

• THREAD_LOCAL
  ■ Same thread
  ■ No serialization
  ■ Use it only when operators do light work
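The difference between the two localities can be illustrated with a small Python analogy. This is not the Apex API: `upstream` and `downstream` are hypothetical operators, and the sketch only mirrors the two properties listed above (a direct call in the same thread versus an in-memory queue between two threads, with no serialization in either case).

```python
import queue
import threading


def upstream(x):
    """Hypothetical upstream operator."""
    return x * 2


def downstream(x):
    """Hypothetical downstream operator."""
    return x + 1


def run_thread_local(items):
    # THREAD_LOCAL analogy: both operators run in the same thread, so the
    # downstream operator is just a function call on the upstream result.
    return [downstream(upstream(x)) for x in items]


_DONE = object()  # sentinel marking the end of the stream


def run_container_local(items):
    # CONTAINER_LOCAL analogy: same process, different threads. Tuples are
    # handed over by reference through a queue; nothing is serialized.
    buffer = queue.Queue()
    results = []

    def producer():
        for x in items:
            buffer.put(upstream(x))
        buffer.put(_DONE)

    def consumer():
        while True:
            item = buffer.get()
            if item is _DONE:
                break
            results.append(downstream(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both variants produce the same results; the queue version pays for the hand-off between threads but lets the two operators run concurrently, which is why the same-thread option suits operators that do light work.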

Note: [New feature] Anti Affinity is not covered here.

Benchmarking Against Previous Releases

https://www.datatorrent.com/blog/blog-apex-performance-benchmark/

Part of Release Certification

Application - with Kafka


https://github.com/sandeshh/streaming-benchmarks

Application - With Generator


[Diagram: operators Kafka Input, Deserialize, Filter, Filter Fields, Redis Join, Redis Output, with a Generator operator added]

Application - With Generator


https://github.com/sandeshh/streaming-benchmarks

Setup: Single Partition

State of the Art & Streaming


[Diagram: Generator → Filter → Filter Fields → Redis Join → Redis Output → Redis]

What’s our recommendation for querying the state? An in-memory key-value store in the operators?

Application - State Store & Query


[Diagram: operators Generator, Filter, Filter Fields, Redis Join, Dimensional Computation, Store (HDHT), Query, Result]

1. Durable state (HDHT is a key-value store native to Hadoop) [4]
2. Single system, scales with your application
3. Easy integration with external consoles [7]
4. Low operability cost
5. Complex dimensional computation [5][6]
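To make the idea of a queryable store fed by a dimensional computation concrete, here is a minimal in-memory sketch. It is not HDHT and has no durability: `DimensionalStore`, its methods, and the choice of `campaign` and `ad_type` as dimensions are all hypothetical names picked for illustration. It keeps one count per combination of dimension values, so a query on any subset of dimensions is a single lookup.

```python
from collections import defaultdict
from itertools import combinations


class DimensionalStore:
    """In-memory counts for every combination of the configured dimensions."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.counts = defaultdict(int)

    def add(self, event):
        # Update one aggregate per subset of dimensions, including the empty
        # subset, which holds the overall total.
        for size in range(len(self.dimensions) + 1):
            for dims in combinations(self.dimensions, size):
                key = tuple((d, event[d]) for d in dims)
                self.counts[key] += 1

    def query(self, **filters):
        # Build the key in the same dimension order used by add().
        key = tuple((d, filters[d]) for d in self.dimensions if d in filters)
        return self.counts.get(key, 0)


store = DimensionalStore(("campaign", "ad_type"))
store.add({"campaign": "c1", "ad_type": "banner"})
store.add({"campaign": "c1", "ad_type": "modal"})
store.add({"campaign": "c2", "ad_type": "banner"})
print(store.query(campaign="c1"))  # count of events for campaign c1
```

Precomputing every combination trades memory for query speed, which is the usual motivation for dimensional aggregation in front of a query layer.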

Demo


Q&A


References


1. https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at

2. http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

3. https://www.datatorrent.com/blog/blog-apex-performance-benchmark/

4. https://www.datatorrent.com/blog/data-store-for-scalable-stream-processing/

5. https://www.datatorrent.com/blog/blog-dimensions-computation-aggregate-navigator-part-1-intro/

6. https://www.datatorrent.com/blog/dimensions-computation-aggregate-navigator-part-2-implementation/

7. http://docs.datatorrent.com/app_data_framework/

© 2016 DataTorrent

Resources


• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  o @ApacheApex; Follow - https://twitter.com/apacheapex
  o @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
  o https://www.datatorrent.com/product/startup-accelerator/


We Are Hiring



• Developers/Architects

• QA Automation Developers

• Information Developers

• Build and Release

• Community Leaders