QCon London 2012
Hadoop: Scalable Infrastructure for Big Data
Parand Tony Darugar
Founder and CEO, [email protected]
What is Hadoop?
Hadoop is the Linux of Big Data Processing
Infrastructure for
Large Scale Computation & Data Processing
on a network of
Commodity Hardware.
Why Hadoop?
Scale
Cost
Freedom
Does Anyone Use Hadoop?
IBM, VISA, Microsoft, Facebook, Yahoo, AOL, ...
eHarmony, Zions Bank, NY Times, Twitter, eBay, LinkedIn, ...
Alternatives
Build your own
Get creative with RDBMS architecture
What's the idea?
Commodity Hardware
Distributed Operation
Wisdom:
Embrace Failure (hardware)
Be Resilient (software)
What's in the box?
Hadoop Distributed File System
Distributed Computation Framework
Map-Reduce Programming Model
HDFS
● Your data in triplicate
● Built-in resiliency to large scale failures
● Intelligent Data Distribution
● Very large data sizes
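
To make "your data in triplicate" concrete, here is a toy sketch of splitting a file into blocks and keeping three copies of each block on distinct machines. It is not HDFS's actual placement policy (which is rack-aware and uses far larger blocks); the node names, the round-robin rule, and the helper functions are all assumptions made up for illustration.

```python
from itertools import cycle, islice


def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy rule)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        # Take `replication` consecutive nodes, wrapping around the node list.
        placement[block_id] = list(islice(cycle(nodes), start, start + replication))
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the block ids that still have at least one live replica."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if set(holders) - failed]


# Hypothetical five-node cluster and a 1000-byte file cut into 256-byte blocks.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), nodes)
print(readable_blocks(placement, failed_nodes=["node-a", "node-b"]))
```

With three distinct replicas per block, losing any two machines still leaves every block readable; only a third simultaneous failure can make a block unavailable.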
Distributed Computation
● Built-in resiliency to large scale failures
● Distribute work to workers, collect results from fastest
● Move computation to data (not data to computation)
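
The "collect results from fastest" idea can be sketched with Python's standard thread pool: the same task is handed to more than one worker and the first answer wins. This is only an illustration of the principle, not how Hadoop schedules tasks; the worker names, delays, and the `run_task` function are hypothetical.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task(worker_name, delay_seconds, chunk):
    """Hypothetical task: sum a chunk of numbers after a simulated delay."""
    time.sleep(delay_seconds)
    return worker_name, sum(chunk)


def first_result(chunk, workers):
    """Run the same task on every worker and return whichever finishes first."""
    pool = ThreadPoolExecutor(max_workers=len(workers))
    futures = [pool.submit(run_task, name, delay, chunk) for name, delay in workers]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    # Do not block on the stragglers; their results are simply ignored.
    pool.shutdown(wait=False)
    return next(iter(done)).result()


# Assumed setup: one slow machine and one fast machine given identical work.
workers = [("slow-node", 0.5), ("fast-node", 0.05)]
winner, total = first_result([1, 2, 3, 4], workers)
print(winner, total)
```

Because every copy of the task computes the same answer, a slow or failed machine costs some wasted work but never delays or changes the result.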
Map Reduce
Very simple programming model:
Map(anything) -> key, value
Sort, partition on key
Reduce(key, value) -> key, value
No parallel processing or message passing semantics
Programmable in Java or any other language (streaming)
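
The three steps above can be simulated in a single process to show how little the programmer has to write. The sketch below is plain Python, not the Hadoop API: `run_job`, `map_words`, and `reduce_counts` are illustrative names, and the word-count job is just an assumed example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: fold all values seen for one key into one (key, value)."""
    return key, sum(values)


def run_job(records, mapper, reducer, num_partitions=2):
    # Map: each record is processed independently of every other record.
    pairs = [pair for record in records for pair in mapper(record)]

    # Partition on key: every occurrence of a key lands in the same partition.
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))

    # Sort within each partition, then reduce each run of equal keys.
    results = []
    for index in sorted(partitions):
        ordered = sorted(partitions[index], key=itemgetter(0))
        for key, group in groupby(ordered, key=itemgetter(0)):
            results.append(reducer(key, [value for _, value in group]))
    return results


lines = ["big data on commodity hardware", "big clusters of commodity hardware"]
counts = dict(run_job(lines, map_words, reduce_counts))
print(counts["big"], counts["commodity"], counts["data"])  # prints: 2 2 1
```

The mapper and reducer never mention threads, machines, or messages; in this sketch, as in the model on the slide, all of the distribution lives in the framework function.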
Ecosystem
HBase: NoSQL BigTable clone
Hive: Somewhat-SQL data store
Pig: SQL-like programming model
Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...
Commercial Support
Cloudera
Hortonworks
IBM
...
How?
Try it in non-distributed mode
Try it on a few spare machines
Try it on EC2
Try it! http://hadoop.apache.org/
Case Studies
eHarmony
Biz360 (Attensity)
Yahoo!
You!
Start with ETL
Start with batch, non-time-critical tasks
Start with storing your large data on HDFS
Move batch processing to Hadoop
Serve from RDBMS
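
One way to picture this split: the batch job writes small summary files, and a loader copies them into a relational table that the application queries. The sketch below uses SQLite from the Python standard library as a stand-in for the RDBMS; the tab-separated line format, the `page_views` table, and the sample rows are all assumptions for illustration.

```python
import sqlite3


def load_batch_output(connection, lines):
    """Load tab-separated key/count lines (assumed batch output) into a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS page_views (page TEXT PRIMARY KEY, views INTEGER)"
    )
    rows = []
    for line in lines:
        page, views = line.rstrip("\n").split("\t")
        rows.append((page, int(views)))
    # Replace earlier results so the batch job can be re-run safely.
    connection.executemany("INSERT OR REPLACE INTO page_views VALUES (?, ?)", rows)
    connection.commit()


def top_pages(connection, limit):
    """Serving-side query: reads the summary table, never the raw data."""
    cursor = connection.execute(
        "SELECT page, views FROM page_views ORDER BY views DESC LIMIT ?", (limit,)
    )
    return cursor.fetchall()


# Made-up summary lines standing in for the output of a batch job.
batch_output = ["/home\t120\n", "/pricing\t45\n", "/docs\t300\n"]
db = sqlite3.connect(":memory:")
load_batch_output(db, batch_output)
print(top_pages(db, 2))  # prints: [('/docs', 300), ('/home', 120)]
```

The heavy lifting stays in the batch layer, while the database only ever holds the small, already-aggregated result that users actually query.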
Embrace. Be One With The Hadoop.