QCon London 2012

Hadoop: Scalable Infrastructure for Big Data

Parand Tony Darugar
Founder and CEO, Xpenser
parand@xpenser.com

What is Hadoop?

Hadoop is the Linux of Big Data Processing

Infrastructure for Large Scale Computation & Data Processing on a network of Commodity Hardware.

Why Hadoop?

Scale

Cost

Freedom

Does Anyone Use Hadoop?

IBM, VISA, Microsoft, Facebook, Yahoo, AOL, ...

eHarmony, Zions Bank, NY Times, Twitter, eBay, LinkedIn, ...

Alternatives

Build your own

Get creative with RDBMS architecture

What's the idea?

Commodity Hardware

Distributed Operation

Wisdom:

Embrace Failure (hardware)

Be Resilient (software)

What's in the box?

Hadoop Distributed File System

Distributed Computation Framework

Map-Reduce Programming Model

HDFS

● Your data in triplicate
● Built-in resiliency to large scale failures
● Intelligent Data Distribution
● Very large data sizes
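
As a rough illustration of "your data in triplicate", the sketch below splits a byte string into fixed-size blocks and assigns each block to three distinct nodes. The round-robin placement, the tiny block size, and the names `split_into_blocks`, `place_replicas`, and `readable_blocks` are assumptions made for this example. It is a toy model and not HDFS's actual placement policy.

```python
BLOCK_SIZE = 4    # bytes; deliberately tiny (real HDFS blocks are tens of megabytes or more)
REPLICATION = 3   # "your data in triplicate"


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the data into consecutive fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement: dict, failed_nodes: set) -> set:
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block_id
        for block_id, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }
```

With three replicas on distinct nodes, any two simultaneous node failures still leave every block readable, which is the kind of resiliency the slide refers to.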

Distributed Computation

● Built-in resiliency to large scale failures

● Distribute work to workers, collect results from fastest

● Move computation to data (not data to computation)
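
The "collect results from fastest" idea can be sketched in a few lines: hand the same task to more than one worker and keep whichever answer arrives first. The example below simulates workers with threads and artificial delays. `run_on_worker`, `run_with_backup`, and the delay values are hypothetical names and numbers chosen for illustration, not part of Hadoop's API.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_on_worker(name: str, delay: float, task: list) -> tuple:
    """Simulate a worker: sleep for `delay` seconds, then sum the task's numbers."""
    time.sleep(delay)
    return name, sum(task)


def run_with_backup(task: list, worker_delays: dict) -> tuple:
    """Run the same task on every worker and return the first (name, result) to finish."""
    pool = ThreadPoolExecutor(max_workers=len(worker_delays))
    futures = [
        pool.submit(run_on_worker, name, delay, task)
        for name, delay in worker_delays.items()
    ]
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    for future in not_done:
        future.cancel()          # best effort; an already running copy simply finishes
    pool.shutdown(wait=False)    # do not block on the slower copies
    return next(iter(done)).result()
```

Calling `run_with_backup([1, 2, 3, 4], {"fast": 0.01, "slow": 0.5})` returns the answer from the quick worker without waiting for the slow one, so a single straggling machine does not hold up the job.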

Map Reduce

Very simple programming model:
Map(anything) -> key, value
Sort, partition on key
Reduce(key, value) -> key, value
No parallel processing or message passing semantics
Programmable in Java or any other language (streaming)
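
To make the map, sort/partition, reduce flow concrete, here is a minimal single-process word count sketch. It only mimics the programming model and does not use Hadoop itself. The names `map_fn`, `reduce_fn`, and `run_mapreduce`, and the CRC32-based partitioner, are assumptions made for this example. In practice the reducer is handed all values that share a key, which the sketch models as a list.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_fn(line: str):
    """Map(anything) -> key, value: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(key: str, values: list) -> tuple:
    """Reduce(key, values) -> key, value: add up the counts for one word."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer, num_partitions: int = 2) -> list:
    """Run map, then partition and sort on key, then reduce, all in one process."""
    partitions = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Partition on key: the same key always lands in the same partition.
            pid = zlib.crc32(key.encode("utf-8")) % num_partitions
            partitions[pid].append((key, value))

    output = []
    for pid in sorted(partitions):
        # Sort on key so that equal keys are adjacent, then group them.
        pairs = sorted(partitions[pid], key=itemgetter(0))
        for key, group in groupby(pairs, key=itemgetter(0)):
            output.append(reducer(key, [value for _, value in group]))
    return output
```

The user code (`map_fn` and `reduce_fn`) contains no threading or message passing. In a real cluster each partition could be reduced on a different machine, and the framework would take care of moving the data.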

Ecosystem

HBase: NoSQL BigTable clone

Hive: Somewhat-SQL data store

Pig: SQL-like programming model

Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...

Commercial Support

Cloudera

Hortonworks

IBM

...

How?

Try it in non-distributed mode

Try it on a few spare machines

Try it on EC2

Try it! http://hadoop.apache.org/

Case Studies

eHarmony

Biz360 (Attensity)

Yahoo!

You!

Start with ETL

Start with batch, non time-critical tasks

Start with storing your large data on HDFS

Move batch processing to Hadoop

Serve from RDBMS

Embrace. Be One With The Hadoop.

Parand Tony Darugar
parand@xpenser.com

Questions?
