CS378 – Big Data Programming
Lecture 18: Hadoop Ecosystem
CS378-Fall 2017, Big Data Programming
Hadoop Ecosystem
• Many other tools have been implemented on
  – Hadoop
  – HDFS (Hadoop Distributed File System)
• We'll briefly discuss a few
  – HBase
  – ZooKeeper
  – Pig, Impala
  – Hive
Hadoop Ecosystem
[Figure: Hadoop ecosystem diagram, from thebigdatablog.weebly.com]
HBase
• Column-oriented database
  – Implemented on top of HDFS
  – Distributed
• Goal is to scale to very large data sets
  – With real-time read/write access
Column-oriented Database
• Table cells are the unit of access
  – Content is an uninterpreted array of bytes
  – A cell is versioned (it can have multiple versions)
• A table cell is accessed by
  – Row, column, and version (often a timestamp)
• Columns are grouped into families
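The cell model above can be sketched in a few lines. This is a hypothetical in-memory class (`ToyTable`), not the HBase client API: a cell is addressed by row and a "family:qualifier" column name, stores uninterpreted bytes, and keeps a bounded number of timestamped versions. The `max_versions` parameter and the millisecond timestamps are assumptions chosen to resemble HBase's defaults.

```python
import time


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase API).

    A cell is addressed by (row, "family:qualifier") and keeps several
    timestamped versions of an uninterpreted byte string.
    """

    def __init__(self, families, max_versions=3):
        self.families = set(families)     # families are declared up front
        self.max_versions = max_versions  # versions retained per cell
        self.rows = {}                    # row -> column -> [(ts, value)], newest first

    def put(self, row, column, value: bytes, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # default version: current time in milliseconds
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # keep newest first
        del versions[self.max_versions:]                  # drop the oldest extras

    def get(self, row, column, ts=None):
        """Newest value of the cell, or the newest one written at or before ts."""
        for version_ts, value in self.rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None


table = ToyTable(families=["info"])
table.put("user1", "info:email", b"old@example.com", ts=1)
table.put("user1", "info:email", b"new@example.com", ts=2)
table.put("user1", "info:city", b"Austin", ts=1)  # new family member, no schema change
```

A read without a version returns the newest value; passing `ts` reads the cell as it was at that time, which is how versioning doubles as a simple history.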
Column-oriented Database
• New column family members can be added dynamically
• Column family members are stored together
• For best performance, family members should be accessed together
• Rows are partitioned into regions (contiguous ranges of row keys)
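Because each region holds a contiguous, sorted range of row keys, finding the region for a row is a binary search over the region start keys. The sketch below assumes a hypothetical list of start keys and splits a region at the median of its row keys; that split rule is a simplification, since HBase's actual split policies are size-based and configurable.

```python
import bisect


def region_for(region_starts, row_key):
    """Index of the region whose key range contains row_key.

    region_starts is sorted; region i covers [region_starts[i], region_starts[i+1]),
    and "" as the first start key means "from the beginning of the table".
    """
    return bisect.bisect_right(region_starts, row_key) - 1


def split_region(region_starts, index, row_keys):
    """Return new start keys after splitting region `index` at its median row key.

    Simplification: real HBase split policies are size-based and configurable.
    """
    keys = sorted(row_keys)
    split_key = keys[len(keys) // 2]
    if split_key == region_starts[index]:
        return list(region_starts)  # cannot split on the region's own start key
    return region_starts[: index + 1] + [split_key] + region_starts[index + 1:]


starts = ["", "g", "p"]  # hypothetical: three regions
```

For example, `region_for(starts, "hbase")` lands in region 1, and splitting that region over the keys `["hadoop", "hive", "impala", "kafka"]` inserts `"impala"` as a new start key, so the rows after it move to a new region.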
HBase
[Figure 13-1 from Hadoop: The Definitive Guide, 3rd Edition]
ZooKeeper
• Messaging and synchronization in a distributed environment
  – Distributed queues, locks
  – Leader election among a group of peers
• High availability (tolerates failures)
• Loosely coupled interactions
  – Rendezvous mechanism
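Leader election is a standard ZooKeeper recipe: every peer creates an ephemeral, sequential node under a shared parent, the peer with the lowest sequence number leads, and each other peer watches only its immediate predecessor. The class below is a hypothetical in-memory simulation of that recipe, not the ZooKeeper client API: a counter stands in for the server-assigned sequence numbers, and deleting an entry stands in for a session ending.

```python
import itertools


class ToyElection:
    """In-memory simulation of ZooKeeper's leader-election recipe (not the real API)."""

    def __init__(self):
        self._seq = itertools.count()
        self.nodes = {}  # node name -> peer id

    def join(self, peer):
        # Zero-padded to 10 digits, like ZooKeeper's sequential-node suffix,
        # so string order matches numeric order.
        name = f"n_{next(self._seq):010d}"
        self.nodes[name] = peer
        return name

    def leader(self):
        # The lowest sequence number wins.
        return self.nodes[min(self.nodes)] if self.nodes else None

    def watched_node(self, name):
        """The node `name` watches: its immediate predecessor (None if it leads)."""
        lower = [n for n in self.nodes if n < name]
        return max(lower) if lower else None

    def crash(self, name):
        # An ephemeral node disappears when its owner's session ends.
        del self.nodes[name]
```

Watching only the predecessor, rather than the leader, means a leader failure wakes exactly one peer instead of all of them.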
Pig
• Higher-level data structures and operations
  – Higher level than Java code for a map-reduce job
• Language: Pig Latin
  – Operations and transformations on data
  – Pig converts these to map-reduce jobs for you
• Think of it as a query language for data in HDFS
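A Pig Latin script is a chain of named relations, each produced by an operator such as LOAD, FILTER, or FOREACH ... GENERATE, and Pig only runs the resulting plan when output is requested (DUMP or STORE). The Python sketch below mimics that style with lazy generators on made-up input; the helper names `load`, `filter_by`, and `foreach_generate` are hypothetical and are not part of Pig.

```python
def load(lines):
    # LOAD ... AS (name, age): parse comma-separated lines into tuples
    for line in lines:
        name, age = line.split(",")
        yield name, int(age)


def filter_by(relation, predicate):
    # FILTER relation BY predicate
    return (t for t in relation if predicate(t))


def foreach_generate(relation, fn):
    # FOREACH relation GENERATE expression
    return (fn(t) for t in relation)


lines = ["alice,34", "bob,17", "carol,22"]  # made-up input
users = load(lines)
adults = filter_by(users, lambda t: t[1] >= 18)
names = foreach_generate(adults, lambda t: t[0])
# Nothing has been computed yet; forcing the result plays the role of DUMP.
result = list(names)
```

The laziness is the point of the analogy: defining `users`, `adults`, and `names` only builds a pipeline, much as Pig builds a plan before compiling it into map-reduce jobs.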
Pig Examples: Summarization
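A summarization in Pig typically groups records by a key and then computes aggregates for each group, using GROUP followed by FOREACH ... GENERATE with functions such as COUNT, SUM, and AVG. The Python sketch below reproduces those semantics on a small made-up sales table; the function name `summarize` and the data are hypothetical, and this is not Pig code.

```python
from collections import defaultdict

sales = [("east", 10.0), ("west", 5.0), ("east", 30.0), ("west", 15.0), ("north", 8.0)]


def summarize(records):
    # GROUP records BY key: collect each key's values into a bag
    groups = defaultdict(list)
    for key, amount in records:
        groups[key].append(amount)
    # FOREACH group GENERATE group, COUNT, SUM, AVG
    return {key: (len(vals), sum(vals), sum(vals) / len(vals))
            for key, vals in groups.items()}


stats = summarize(sales)
```

Each entry of `stats` maps a region to its (count, sum, average) triple, for example `"east"` to `(2, 40.0, 20.0)`.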
Pig Examples: Binning
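Binning maps each value to a fixed-width bucket and then counts the records per bucket, which in Pig is usually a FOREACH ... GENERATE that computes the bucket with integer division, followed by GROUP and COUNT. The sketch below assumes non-negative integer values (Python's `//` floors, which differs from truncation for negatives); `bin_counts` and the sample ages are hypothetical.

```python
from collections import Counter


def bin_counts(values, width):
    # FOREACH ... GENERATE (value / width) * width AS bin  (integer division)
    bins = ((v // width) * width for v in values)
    # GROUP ... BY bin; FOREACH ... GENERATE group, COUNT(...)
    return dict(sorted(Counter(bins).items()))


ages = [23, 37, 31, 45, 29, 52, 38]
histogram = bin_counts(ages, 10)
```

With a width of 10, each key of `histogram` is the lower edge of a decade, for example 30 for the three ages 31, 37, and 38.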
Pig Examples: Join
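One common way to run a JOIN as map-reduce is a reduce-side join: the map phase tags each record with its source and emits it under its join key, the shuffle brings records with the same key together, and the reduce phase pairs every left record with every right record for that key. The sketch below shows that strategy for an inner join on made-up data; the function and data are hypothetical, and Pig also offers other join strategies.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(left, right, left_key, right_key):
    # "Map": emit each record under its join key, tagged by source (slot 0 or 1)
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[left_key(rec)][0].append(rec)
    for rec in right:
        groups[right_key(rec)][1].append(rec)
    # "Shuffle" visits keys in sorted order; "reduce" pairs left x right per key.
    # A key missing from either side yields no output (inner join).
    return [l + r for key in sorted(groups) for l, r in product(*groups[key])]


users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1, "book"), (102, 1, "pen"), (103, 3, "lamp")]
joined = reduce_side_join(users, orders, lambda u: u[0], lambda o: o[1])
```

Here `bob` has no orders, so no row for user 2 appears, while `alice` appears once per matching order.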
Impala
• Interactive SQL for data in HDFS, HBase
• SQL processing engine
  – Parallel execution
  – Horizontal scaling
• Runs on each data node
  – Direct access to HDFS, HBase (no map-reduce)
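Running on every data node lets an engine like Impala push work to where the data lives: each node computes a partial result over its local data, and a coordinator merges the small partial results. The sketch below illustrates that scatter-gather pattern for a per-value count, assuming a hypothetical layout of rows across three nodes; threads stand in for data nodes, and this is a simplified model rather than Impala's actual query planner.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: the rows of a one-column table, spread across data nodes
node_data = {
    "node1": ["hdfs", "hive", "hdfs"],
    "node2": ["hbase", "hdfs"],
    "node3": ["hive", "hive", "impala"],
}


def local_count(rows):
    # Runs "on each data node" against its local rows: a partial aggregate
    return Counter(rows)


def coordinator_merge(partials):
    # The coordinator adds the partial counts into the final answer
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


with ThreadPoolExecutor() as pool:
    partials = list(pool.map(local_count, node_data.values()))
result = coordinator_merge(partials)
```

Adding more nodes adds more parallel `local_count` work while the merge stays small, which is the essence of horizontal scaling for this kind of query.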
Hive
• Data warehouse on top of Hadoop
• SQL for access
  – Hive converts a query into a series of map-reduce steps
• Various Hive clients are available
  – JDBC, ODBC, Thrift, …
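To make "a series of map-reduce steps" concrete, a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` fits a single map, shuffle, and reduce pass. The Python sketch below spells out those three phases on a made-up `employees` table; the table and function names are hypothetical, and the plan Hive actually generates may differ in detail.

```python
from collections import defaultdict

# Made-up table rows: (name, dept)
employees = [("ann", "eng"), ("bo", "ops"), ("cy", "eng"), ("di", "eng")]


def map_phase(rows):
    # Map: emit (dept, 1) for every row
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    # Shuffle/sort: bring all values for the same key together, sorted by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    # Reduce: COUNT(*) per dept is the sum of the emitted 1s
    return [(key, sum(values)) for key, values in grouped]


result = reduce_phase(shuffle(map_phase(employees)))
```

More complex queries (joins, multi-level aggregation, ORDER BY) typically need several such passes chained together, which is why Hive can be slow for interactive use.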