The Hadoop Ecosystem & HBase
Kai Voigt, Cloudera Inc.
Warsaw Hadoop User Group, July 11th, 2012
Friday, July 13, 2012
A Hadoop Cluster
Part 1: Hadoop Ecosystem
• HDFS (Java)
• MapReduce (Java)
• HBase (Java)
• Hive (SQL)
• Pig (Script)
• Mahout (Java)
• Streaming (Script)
• Sqoop (RDBMS)
• Flume (Events)
• hadoop fs (Command line)
• FUSE (POSIX)
• Oozie
• Whirr
• Hue
CDH 4.0
• Cloudera's Distribution Including Hadoop
• http://www.cloudera.com/
• Packages and Virtual Machines
• True Apache
Part 2: Apache HBase
Data Model
Columns: RowID, Col1, Col2, Col3, Col4, Col5
• 62: aaa, bbb, ccc
• 89: ddd, eee, 111
• 121: fff, 222
• 219: ggg, hhh
• 328: iii, jjj, kkk, lll, 333
• 342: mmm, nnn
Regions
Columns: RowID, Col1, Col2, Col3, Col4, Col5
• 62: aaa, bbb, ccc
• 89: ddd, eee, 111
• 121: fff, 222
Columns: RowID, Col1, Col2, Col3, Col4, Col5
• 219: ggg, hhh
• 328: iii, jjj, kkk, lll, 333
• 342: mmm, nnn
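One way to picture regions is as contiguous ranges of sorted row keys. The Python sketch below assumes each region is identified by its first row key (62 and 219, read off the two tables above) and uses integer keys as on the slide. `REGION_START_KEYS` and `region_for` are hypothetical names for illustration, not HBase APIs, and HBase itself compares row keys as byte arrays rather than as integers.

```python
from bisect import bisect_right

# Assumption: region boundaries are the first RowID of each table on the slide.
REGION_START_KEYS = [62, 219]


def region_for(row_id):
    """Return the index of the region whose key range contains row_id."""
    idx = bisect_right(REGION_START_KEYS, row_id) - 1
    # Keys smaller than the first start key still fall into the first region.
    return max(idx, 0)


rows = [62, 89, 121, 219, 328, 342]
assignment = {row_id: region_for(row_id) for row_id in rows}
print(assignment)
```

Keeping the start keys sorted lets a lookup use binary search, so finding the region for a key stays cheap as the number of regions grows.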
Column Families
Columns: RowID, Col1, Col2
• 62: aaa, bbb
• 89: ddd
• 121: (no values)
Columns: RowID, Col3, Col4, Col5
• 62: ccc
• 89: eee, 111
• 121: fff, 222
Columns: RowID, Col1, Col2
• 219: ggg
• 328: iii, jjj
• 342: mmm
Columns: RowID, Col3, Col4, Col5
• 219: hhh
• 328: kkk, lll, 333
• 342: nnn
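The split into column families can be sketched as grouping each row's cells by family, so that every family ends up with its own store. This is a toy Python illustration, not HBase code: the family names `cf1` and `cf2`, the `FAMILY_OF` mapping, and `split_by_family` are hypothetical, and placing `ccc` in Col3 is an assumption because the slide does not show which column of the second family holds it.

```python
# Assumed grouping, read off the slide: Col1-Col2 form one family, Col3-Col5 another.
FAMILY_OF = {
    "Col1": "cf1", "Col2": "cf1",
    "Col3": "cf2", "Col4": "cf2", "Col5": "cf2",
}


def split_by_family(rows):
    """Group the cells of every row into one store per column family."""
    stores = {}
    for row_id, cells in rows.items():
        for column, value in cells.items():
            family = FAMILY_OF[column]
            stores.setdefault(family, {}).setdefault(row_id, {})[column] = value
    return stores


rows = {
    # Assumption: "ccc" sits in Col3; the slide only shows it belongs to Col3-Col5.
    62: {"Col1": "aaa", "Col2": "bbb", "Col3": "ccc"},
    328: {"Col1": "iii", "Col2": "jjj", "Col3": "kkk", "Col4": "lll", "Col5": "333"},
}
stores = split_by_family(rows)
print(stores["cf1"])
print(stores["cf2"])
```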
Multiple Versions
RowID: 627
ColumnName: Col7
• 21:09 Foo
• 22:34 Bar
• 23:12 'DEL'
(RowID, ColumnName, Timestamp) -> Value
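The mapping (RowID, ColumnName, Timestamp) -> Value can be sketched as a small in-memory store. This is a minimal Python sketch, not the HBase client API: the class `VersionedCells`, its method names, and the `DELETE` sentinel are hypothetical, and the HH:MM strings from the slide stand in for timestamps (HBase itself stores timestamps as long integers).

```python
DELETE = object()  # hypothetical stand-in for the 'DEL' marker on the slide


class VersionedCells:
    """Toy store keeping every version of a cell, keyed by timestamp."""

    def __init__(self):
        self._cells = {}  # (row_id, column) -> {timestamp: value}

    def put(self, row_id, column, timestamp, value):
        self._cells.setdefault((row_id, column), {})[timestamp] = value

    def delete(self, row_id, column, timestamp):
        # A delete is recorded as one more version instead of erasing data.
        self.put(row_id, column, timestamp, DELETE)

    def get(self, row_id, column):
        """Return the newest value, or None if the cell is absent or deleted."""
        versions = self._cells.get((row_id, column))
        if not versions:
            return None
        newest = versions[max(versions)]
        return None if newest is DELETE else newest


cells = VersionedCells()
cells.put(627, "Col7", "21:09", "Foo")
first = cells.get(627, "Col7")
cells.put(627, "Col7", "22:34", "Bar")
second = cells.get(627, "Col7")
cells.delete(627, "Col7", "23:12")
third = cells.get(627, "Col7")
print(first, second, third)
```

Zero-padded HH:MM strings compare in chronological order, which is why `max` picks the newest version here.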
Simple API
• PUT 'table', 'rowid', 'column', 'value'
• GET 'table', 'rowid', 'column'
• GET 'table', 'rowid'
• DELETE 'table', 'rowid', 'column'
• DELETE 'table', 'rowid'
• SCAN 'table'
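The operations listed above can be mimicked with a toy in-memory table. The Python sketch below only illustrates the shape of the calls and is not the HBase shell or client API: `ToyTable` and its methods are hypothetical, versions and column families are left out, and the row and column names are made up.

```python
class ToyTable:
    """Toy table supporting put, get, delete and scan on single-version cells."""

    def __init__(self):
        self._rows = {}  # rowid -> {column: value}

    def put(self, rowid, column, value):
        self._rows.setdefault(rowid, {})[column] = value

    def get(self, rowid, column=None):
        # Without a column, return a copy of the whole row.
        row = self._rows.get(rowid, {})
        return dict(row) if column is None else row.get(column)

    def delete(self, rowid, column=None):
        if column is None:
            self._rows.pop(rowid, None)
        elif rowid in self._rows:
            self._rows[rowid].pop(column, None)
            if not self._rows[rowid]:
                del self._rows[rowid]

    def scan(self):
        # Yield rows in sorted row-key order.
        for rowid in sorted(self._rows):
            yield rowid, dict(self._rows[rowid])


table = ToyTable()
table.put("row2", "col1", "x")
table.put("row1", "col1", "a")
table.put("row1", "col2", "b")
table.delete("row1", "col2")
print(table.get("row1"))
print(list(table.scan()))
```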
Additional Features
• MapReduce Input/Output Format
• Hive Interface
• Thrift API
• RESTful API
• Sqoop Connector
• Flume Sink
Thank You!