Enterprise Grade Streaming under 2ms on Hadoop

@vijaysbhat

X (predictor): spend amount, geo

Y (response)

Simple, Velocity, Advanced

Hard Metrics    Goal
Latency         < 40 ms; ideally < 16 ms
Throughput      2,000 events/second
Durability      No loss; every message gets exactly one response
Availability    99.5% uptime (1.83 days of downtime/year); ideally 99.999% uptime (5.26 minutes of downtime/year)
Scalability     Can add resources and still meet latency requirements
Integration     Transparently connected to existing systems: hardware, messaging, HDFS

Soft Metrics    Goal
Open Source     All components licensed as open source
Extensibility   Rules can be updated; model is regularly refreshed
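The availability row converts an uptime percentage into a yearly downtime budget. The following is a minimal sketch of that arithmetic. It assumes a 365-day year, and `downtime_per_year_minutes` is a hypothetical helper rather than anything from the deck:

```python
# Convert an uptime target into allowed downtime per year.
# Assumes a 365-day year (no leap-year adjustment).
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_DAY = 24 * 60


def downtime_per_year_minutes(uptime_pct: float) -> float:
    """Allowed downtime per year, in minutes, for an uptime percentage such as 99.5."""
    return (1 - uptime_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.5, 99.999):
        minutes = downtime_per_year_minutes(target)
        print(f"{target}% uptime: {minutes:.2f} min/year = {minutes / MINUTES_PER_DAY:.2f} days/year")
```

For 99.5% this works out to 2,628 minutes (about 1.83 days). For 99.999% it is about 5.26 minutes. Both agree with the goals in the table.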

Onyx

Enterprise Readiness

Roadmap

Performance

Community

YARN

Failure Handling

Performance

• Avg. 0.25 ms at 70k records/sec, with 600 GB RAM

Thread local, on ~54M events. Latency percentiles (in ms):

Throughput  Count       Avg (ms)  90%  95%  99%  99.9%  99.99%  99.999%  99.9999%
70k/sec     54,126,122  0.19      1    1    1    2      2       5        6
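The deck does not say how its percentiles were computed. The sketch below shows one common way to produce the table's columns from raw per-event latencies. It assumes the latencies (in ms) fit in memory and uses the nearest-rank percentile definition. `nearest_rank` and `latency_summary` are hypothetical helpers:

```python
import math
from typing import Dict, Sequence

# Percentile columns from the table (99.99 = "4 nines", and so on).
PERCENTILES = (90, 95, 99, 99.9, 99.99, 99.999, 99.9999)


def nearest_rank(sorted_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not sorted_ms:
        raise ValueError("need at least one latency sample")
    rank = math.ceil(pct * len(sorted_ms) / 100)  # 1-based rank
    return sorted_ms[max(rank, 1) - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Count, average and tail percentiles for a batch of per-event latencies (ms)."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("need at least one latency sample")
    summary: Dict[str, float] = {
        "count": len(ordered),
        "avg_ms": sum(ordered) / len(ordered),
    }
    for pct in PERCENTILES:
        summary[f"p{pct}"] = nearest_rank(ordered, pct)
    return summary
```

Sorting every sample is fine for a sketch. At tens of millions of events, a bucketed histogram such as HdrHistogram would likely be preferable because it avoids holding every sample.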

Durability

• Two physically independent pipelines on the same cluster, processing identical data

• For each tuple, we take the best-case time across the two pipelines
  – 39 records out of 5.2M exceeded 16 ms
  – 173 records out of 5.2M exceeded 16 ms in one pipeline but succeeded in the other

• 99.99925% success rate ("Five Nines")

• Average latency of 0.0981 ms
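The best-case comparison across the two redundant pipelines can be sketched as follows. This assumes each pipeline's per-tuple latencies (ms) are available and keyed by a shared tuple id. The function `best_of_two` and that data layout are hypothetical, and 16 ms is the SLA taken from the latency goal:

```python
from typing import Mapping, Tuple

SLA_MS = 16.0  # the "ideally < 16 ms" latency goal


def best_of_two(
    a: Mapping[str, float], b: Mapping[str, float], sla_ms: float = SLA_MS
) -> Tuple[int, int, float]:
    """Compare per-tuple latencies (ms) from two redundant pipelines keyed by tuple id.

    Returns (tuples whose best-case latency still exceeded the SLA,
             tuples that exceeded the SLA in one pipeline but not the other,
             success rate of the best-case latency).
    """
    shared = a.keys() & b.keys()  # only tuples seen by both pipelines
    if not shared:
        raise ValueError("no tuples in common")
    both_missed = rescued = 0
    for tuple_id in shared:
        fast, slow = sorted((a[tuple_id], b[tuple_id]))
        if fast > sla_ms:
            both_missed += 1  # even the faster pipeline missed the SLA
        elif slow > sla_ms:
            rescued += 1  # the other pipeline covered for the slow one
    return both_missed, rescued, 1 - both_missed / len(shared)
```

Plugging in the slide's counts gives 1 - 39 / 5,200,000 = 0.9999925, which matches the reported 99.99925% success rate.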

@vijaysbhat

linkedin.com/in/vijaysbhat
