Enterprise Grade Streaming under 2ms on Hadoop


@vijaysbhat

VS.

X (predictor): Spend amount, geo

Y (response)

Simple Velocity Advanced

Hard Metrics: Goal

Latency: < 40 ms; ideally < 16 ms

Throughput: 2000 events / second

Durability: No loss, every message gets exactly one response

Availability: 99.5% uptime (downtime of 1.83 days / year); ideally 99.999% uptime (downtime of 5.26 minutes / year)

Scalability: Can add resources, still meet latency requirements

Integration: Transparently connected to existing systems (hardware, messaging, HDFS)
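The downtime figures quoted for the availability goal follow from simple arithmetic. A minimal sketch, assuming a 365-day year (the function and constant names are illustrative, not from the talk):

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # assumption: a 365-day year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


# 99.5% uptime: about 1.825 days per year (quoted as 1.83 in the goal).
print(f"{downtime_minutes_per_year(99.5) / MINUTES_PER_DAY:.3f} days")
# 99.999% uptime: about 5.256 minutes per year (quoted as 5.26 in the goal).
print(f"{downtime_minutes_per_year(99.999):.3f} minutes")
```

Moving from the minimum goal to the ideal one shrinks the yearly downtime budget from days to minutes, which is why the two targets imply very different failure-handling requirements.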

Soft Metrics: Goal

Open Source: All components licensed as open source

Extensibility: Rules can be updated, model is regularly refreshed

Onyx

Enterprise Readiness

Roadmap

Performance

Community

YARN

Failure Handling

• Avg. 0.25 ms at 70k records/sec, with 600 GB RAM

Thread Local on ~54M events (percentiles in ms)

Throughput  Count       Avg (ms)  90%  95%  99%  99.9%  4 9's  5 9's  6 9's
70k/sec     54,126,122  0.19      1    1    1    2      2      5      6
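The talk does not say how these percentiles were computed. As one possible reading, here is a minimal sketch using the nearest-rank method on recorded latencies; the sample data and the function name are hypothetical, and percentiles are passed as strings so that values like 99.99 are handled exactly:

```python
import math
from fractions import Fraction


def nearest_rank_percentile(sorted_samples, pct):
    """Smallest sample such that at least pct percent of samples are <= it.

    pct is a string such as "99.9" so it converts to an exact fraction.
    """
    if not sorted_samples:
        raise ValueError("no samples")
    rank = math.ceil(Fraction(pct) / 100 * len(sorted_samples))
    return sorted_samples[max(rank, 1) - 1]


# Hypothetical latencies in ms; not data from the talk.
latencies_ms = sorted([0.1] * 990 + [1.0] * 9 + [6.0])

for label, pct in [("90%", "90"), ("99%", "99"), ("99.9%", "99.9"), ("4 9's", "99.99")]:
    print(label, nearest_rank_percentile(latencies_ms, pct))
```

With a skewed distribution like this one, the average stays far below the tail values, which matches the pattern in the table: a sub-millisecond average alongside multi-millisecond figures at five and six nines.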

Performance

Durability

• Two physically independent pipelines on the same cluster processing identical data

• For the same tuple, we find the best-case time between the two pipelines

– 39 records out of 5.2M exceeded 16ms

– 173 out of 5.2M exceeded 16ms in one pipeline but succeeded in the other

• 99.99925% success rate – “Five Nines”

• Average latency of 0.0981ms
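The best-case comparison between the two pipelines can be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation: the tuple ids, latencies, and function names are hypothetical, and "5.2M" is taken to mean exactly 5,200,000 tuples.

```python
SLA_MS = 16.0  # the "ideal" latency target from the hard metrics


def best_case_latencies(pipeline_a, pipeline_b):
    """For each tuple id seen by both pipelines, keep the faster latency (ms)."""
    shared_ids = pipeline_a.keys() & pipeline_b.keys()
    return {tid: min(pipeline_a[tid], pipeline_b[tid]) for tid in shared_ids}


def success_rate_percent(total, misses):
    """Percentage of tuples whose best-case latency met the target."""
    return (1 - misses / total) * 100


# Hypothetical latencies keyed by tuple id; not data from the talk.
pipeline_a = {1: 0.1, 2: 20.0, 3: 18.0}
pipeline_b = {1: 0.2, 2: 0.1, 3: 17.0}

best = best_case_latencies(pipeline_a, pipeline_b)
# Tuples that missed the target even in the best case.
missed = sorted(tid for tid, ms in best.items() if ms > SLA_MS)
# Tuples that missed in exactly one pipeline and were covered by the other.
rescued = sorted(
    tid for tid in best
    if (pipeline_a[tid] > SLA_MS) != (pipeline_b[tid] > SLA_MS)
)
print(missed, rescued)

# Slide figures, assuming "5.2M" means exactly 5,200,000 tuples.
print(f"{success_rate_percent(5_200_000, 39):.5f}%")
```

Under that assumption, 39 misses out of 5,200,000 gives the 99.99925% success rate quoted above.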

@vijaysbhat

linkedin.com/in/vijaysbhat
