Hadoop: Scalable Infrastructure for Big Data
QCon London 2012
Parand Tony Darugar, Founder and CEO, Xpenser

What is Hadoop?

Hadoop is the Linux of Big Data Processing

Infrastructure for large scale computation and data processing on a network of commodity hardware.

Why Hadoop?

Scale

Cost

Freedom

Does Anyone Use Hadoop?

IBM, VISA, Microsoft, Facebook, Yahoo, AOL, ...

eHarmony, Zions Bank, NY Times, Twitter, eBay, LinkedIn, ...

Alternatives

Build your own

Get creative with RDBMS architecture

What's the idea?

Commodity Hardware

Distributed Operation

Wisdom:

Embrace Failure (hardware)

Be Resilient (software)

What's in the box?

Hadoop Distributed File System

Distributed Computation Framework

Map-Reduce Programming Model

HDFS

● Your data in triplicate

● Built-in resiliency to large scale failures

● Intelligent Data Distribution

● Very large data sizes
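To make "your data in triplicate" concrete, here is a toy sketch of why three copies of each block tolerate node failures. The functions `place_replicas` and `surviving_blocks` and the node names are hypothetical, and the round-robin placement is a simplification chosen for illustration; HDFS itself uses a rack-aware placement policy.

```python
def place_replicas(block_id, nodes, replication=3):
    """Assign a block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def surviving_blocks(placement, failed_nodes):
    """Return the ids of blocks that still have a replica on a healthy node."""
    return {
        block_id
        for block_id, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3", "n4"]  # hypothetical cluster
    placement = {block_id: place_replicas(block_id, nodes) for block_id in range(10)}
    # With three replicas per block, losing any two nodes loses no data.
    print(len(surviving_blocks(placement, {"n1", "n2"})), "of", len(placement), "blocks survive")
```

In this sketch a block is lost only when all three nodes holding it fail at once, which is the property the slide is pointing at.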

Distributed Computation

● Built-in resiliency to large scale failures

● Distribute work to workers, collect results from fastest

● Move computation to data (not data to computation)
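"Collect results from fastest" resembles what Hadoop calls speculative execution: the same piece of work runs on more than one worker and the first answer wins. The sketch below imitates that with threads on one machine; the worker names, the delays, and the functions `attempt` and `run_speculatively` are illustrative assumptions, not Hadoop APIs.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def attempt(worker, delay, chunk):
    """Simulate one worker processing a chunk; `delay` stands in for a slow node."""
    time.sleep(delay)
    return worker, sum(chunk)


def run_speculatively(chunk, workers):
    """Run the same task on every worker and return the first result to arrive."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(attempt, name, delay, chunk) for name, delay in workers]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


if __name__ == "__main__":
    # Hypothetical workers: one healthy, one straggler.
    print(run_speculatively([1, 2, 3], [("straggler", 0.3), ("healthy", 0.01)]))
```

A real scheduler would also cancel the losing attempt; here the slow thread simply finishes and its result is discarded.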

Map Reduce

Very simple programming model:

Map(anything) -> key, value

Sort, partition on key

Reduce(key, value) -> key, value

No parallel processing or message passing semantics

Programmable in Java or any other language (streaming)
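The three steps of the model can be sketched in a single process. This is a minimal illustration of the map, sort, reduce flow and not Hadoop code; the record format ("host bytes") and the names `map_record`, `reduce_bytes`, and `run_job` are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Map: turn one 'host bytes' record into a (key, value) pair."""
    host, size = line.split()
    yield host, int(size)


def reduce_bytes(key, values):
    """Reduce: collapse every value seen for one key into a single pair."""
    return key, sum(values)


def run_job(lines, mapper, reducer):
    """Simulate map -> sort on key -> reduce, all in one process."""
    pairs = [pair for line in lines for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))  # the framework's sort/partition step
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    records = ["a.example 120", "b.example 30", "a.example 80"]
    print(run_job(records, map_record, reduce_bytes))
```

Note that neither `map_record` nor `reduce_bytes` knows anything about threads, machines, or messages; on a cluster the framework supplies all of that.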

Ecosystem

HBase: NoSQL BigTable clone

Hive: Somewhat-SQL data store

Pig: SQL-like programming model

Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...

Commercial Support

Cloudera

Hortonworks

IBM

...

How?

Try it in non-distributed mode

Try it on a few spare machines

Try it on EC2

Try it! http://hadoop.apache.org/
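One low-friction way to try the model before touching a cluster is a streaming-style script: a mapper and a reducer that read lines and write tab-separated key/value lines, which is the convention Hadoop Streaming uses. The sketch below is a word count under that convention; the file name `wordcount.py` and the `map`/`reduce` command-line argument are assumptions for the example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Runs only when a stage is named on the command line, e.g. `python wordcount.py map`.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for output_line in stage(sys.stdin):
        print(output_line)
```

Locally, a shell pipeline such as `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` stands in for the cluster: `sort` plays the role of the framework's sort step between the two stages.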

Case Studies

eHarmony

Biz360 (Attensity)

Yahoo!

You!

Start with ETL

Start with batch, non-time-critical tasks

Start with storing your large data on HDFS

Move batch processing to Hadoop

Serve from RDBMS
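A rough sketch of that split: the batch job writes tab-separated results, and a small loader pushes them into a relational table that answers lookups. SQLite stands in for the RDBMS here, and the table name `word_counts` and the functions `load_batch_output` and `lookup` are hypothetical; in practice a tool such as Sqoop could handle the transfer.

```python
import sqlite3


def load_batch_output(conn, lines):
    """Load 'key<TAB>total' lines produced by a batch job into a serving table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts (word TEXT PRIMARY KEY, total INTEGER)"
    )
    rows = (line.rstrip("\n").split("\t", 1) for line in lines)
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts (word, total) VALUES (?, ?)",
        ((word, int(total)) for word, total in rows),
    )
    conn.commit()


def lookup(conn, word):
    """Serve one low-latency read from the table; returns None if absent."""
    row = conn.execute(
        "SELECT total FROM word_counts WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_batch_output(conn, ["big\t2", "cluster\t1", "data\t1"])
    print(lookup(conn, "big"))
```

The heavy computation stays in the batch job, while the table only has to answer small indexed queries.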

Embrace. Be One With The Hadoop.

Parand Tony Darugar

Questions?