Hadoop Is Dead

The report of Hadoop's death has been greatly exaggerated.

HADOOP: NOT DEAD YET

Clint Green (+clintgreen)

Volume Velocity Variety

“Big Data” is when the size of the data itself becomes part of the problem.

- Big Data Now, O’Reilly

PERCEPTION IS REALITY

Hadoop is Flawed

“You can’t install it without an expert.”

“Fine for R&D, but not for real production.”

“Hadoop is just for batch processing.”

“The dirty-little-secret with Hadoop is…”

Hadoop isn’t for RealWork™

1. Adopt Hadoop for pilot projects.
2. Scale Hadoop to production use.
3. Observe an unacceptable performance penalty.
4. Morph to a real parallel DBMS.

- Michael Stonebraker, CACM, May 2012 (Vertica, VoltDB, SciDB)

Availability

Partition Tolerance

Consistency

Atomicity

Consistency

Isolation

Durability

ACID

“4. Morph to a real parallel DBMS.”

REALITY IS RELATIVE

Evolve

“Hadoop has become the kernel of the distributed operating system for Big Data… No one uses the kernel alone.”

- Doug Cutting, Strata 2012 (Cloudera, ASF)

Hadoop + MapReduce

“There is nothing really embarrassing about embarrassingly parallel applications.”

- Luiz André Barroso, ACM 2011 (Distinguished Engineer, Google)
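To make the "embarrassingly parallel" point concrete, here is a minimal word-count sketch of the map, shuffle, and reduce phases. It is plain Python running in a single process, not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are illustrative assumptions. The point it tries to show is that every call to map_phase and reduce_phase is independent of the others, which is what lets a framework spread them across many machines.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated token.
    # Each line is processed independently of every other line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework would do
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Each key is reduced independently of every other key.
    return key, sum(values)


def word_count(lines):
    # Hypothetical single-process driver; a real cluster would run the
    # map and reduce calls on different nodes.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["hadoop is not dead", "hadoop is evolving"])
print(counts)
```

Because no map call needs the result of another, and no reduce call needs another key's values, the only coordination point is the shuffle.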

Not Just for Batch Anymore…

Apache Hama

Apache Drill

Apache Hadoop YARN

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
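The division of labor described above can be sketched as a toy model. This is not the YARN API: the classes below only borrow the role names, and the slot counts, the allocate and launch methods, and the task names are all assumptions made for illustration. The sketch shows the ApplicationMaster asking the ResourceManager for containers and then working with the NodeManagers that host them to start tasks and record where they run.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    # Toy stand-in for a worker node with a fixed number of task slots.
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def launch(self, task):
        # Occupy one slot and record the task as running on this node.
        self.free_slots -= 1
        self.running.append(task)


class ResourceManager:
    # Toy stand-in for the cluster-wide arbiter of resources.
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, requested):
        # Grant at most `requested` containers, one per free slot,
        # filling nodes in order. Returns the hosting node per container.
        granted = []
        for node in self.nodes:
            for _ in range(node.free_slots):
                if len(granted) == requested:
                    return granted
                granted.append(node)
        return granted


class ApplicationMaster:
    # Toy per-application coordinator: negotiates, launches, monitors.
    def __init__(self, tasks):
        self.pending = list(tasks)
        self.placement = {}

    def run(self, resource_manager):
        containers = resource_manager.allocate(len(self.pending))
        for node in containers:
            task = self.pending.pop(0)
            node.launch(task)
            self.placement[task] = node.name
        return self.placement


nodes = [NodeManager("node-1", 2), NodeManager("node-2", 1)]
am = ApplicationMaster(["t1", "t2", "t3", "t4"])
placement = am.run(ResourceManager(nodes))
print(placement, am.pending)
```

With three free slots and four tasks, one task stays pending until a slot frees up, which is the kind of bookkeeping the ApplicationMaster keeps out of the ResourceManager.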

No Secret Here

Help is on the Way

ACTUAL PROBLEMS

VS

VS

THANK YOU

Clint Green (+clintgreen)
