Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
The Hadoop Stack, Part 2 Introduction to HBase
CSE40822–CloudComputing–Fall2018Prof.DouglasThain
UniversityofNotreDame
Three Case Studies
• Workflow:PigLatin• AdataflowlanguageandexecutionsystemthatprovidesanSQL-likewayofcomposingworkflowsofmultipleMap-Reducejobs.
• Storage:HBase• ANoSQlstoragesystemthatbringsahigherdegreeofstructuretotheflat-filenatureofHDFS.
• Execution:Spark• Anin-memorydataanalysissystemthatcanuseHadoopasapersistencelayer,enablingalgorithmsthatarenoteasilyexpressedinMap-Reduce.
References
• FayChangetal,Bigtable:ADistributedStorageSystemforStructuredData,OSDI2006• http://research.google.com/archive/bigtable.html
• IntroductiontoHbaseSchemaDesign,AmandeepKhurana,Login;Magazine,2012.• https://www.usenix.org/publications/login/october-2012-volume-37-number-5/introduction-hbase-schema-design
• ApacheHBaseDocumentation• http://hbase.apache.org
From HDFS to HBase
• HDFSprovidesuswithafilesystemconsistingofarbitrarilylargefilesthatcanonlybewrittenonceandareshouldbereadsequentially,endtoend.ThisworksfineforsequentialOLAPquerieslikePig.• OLTPworkloadswanttoreadandwriteindividualcellsinalargetable.(e.g.updateinventoryandpriceasorderscomein.)• HBaseimplementsOLTPinteractionsontopofHDFSbyusingadditionalstorageandmemorytoorganizethetables,andwritingthembacktoHDFSasneeded.• HBaseisanopensourceimplementationoftheBigTabledesignpublishedbyGoogle.
HBase Data Model
• Adatabaseconsistsofmultipletables.• Eachtableconsistsofmultiplerows,sortedbyrowkey.• Eachrowcontainsarowkeyandoneormorecolumnfamilies.• Eachcolumnfamilyisdefinedwhenthetableiscreated.• Columnfamiliescancontainmultiplecolumns.(family:column)• Acellisuniquelyidentifiedby(table,row,family:column).• Acellcontainsanuninterpretedarrayofbytesandatimestamp.
Data in Tabular Form
Name Home Office
Key First Last Phone Email Phone Email
101 Florian Krepsbach 555-1212 [email protected]
666-1212
102 Marilyn Tollerud 555-1213 666-1213
103 Pastor Inqvist 555-1214 [email protected]
Data in Tabular Form
Name Home Office Social
Key First Middle Last Phone Email Phone Email FacebookID
101 Florian Garfield Krepsbach 555-1212
666-1212
102 Marilyn Tollerud 555-1213
666-1213
103 Pastor Inqvist 555-1214
Newcolumnscanbeaddedatruntime.
Columnfamiliescannotbeaddedatruntime.
Don’t be fooled by the picture: Hbase is really a sparse table.
Nested Data Representation TablePeople(Name,Home,Office){
101:{ Timestamp:T403; Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”}, Home:{Phone=“555-1212”,Email=“[email protected]”}, Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{ Timestamp:T593; Name:{First=“Marilyn”,Last=“Tollerud”}, Home:{Phone=“555-1213”}, Office:{Phone=“666-1213”}},…
}
Nested Data Representation GETPeople:101
101:{ Timestamp:T403; Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”}, Home:{Phone=“555-1212”,Email=“[email protected]”}, Office:{Phone=“666-1212”,Email=“[email protected]”}}
GETPeople:101:Name
People:101:Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”}GETPeople:101:Name:First
People:101:Name:First=“Florian”
Fundamental Operations
• CREATEtable,families• PUTtable,rowid,family:column,value• PUTtable,rowid,whole-row• GETtable,rowid• SCANtable(WITHfilters)• DROPtable
Consistency Model
• Atomicity:Entirerowsareupdatedatomicallyornotatall.• Consistency:
• AGETisguaranteedtoreturnacompleterowthatexistedatsomepointinthetable’shistory.(Checkthetimestamptobesure!)• ASCANmustincludealldatawrittenpriortothescan,andmayincludeupdatessinceitstarted.
• Isolation:Notguaranteedoutsideasinglerow.• Durability:Allsuccessfulwriteshavebeenmadedurableondisk.
The Implementation
• I’llgiveyoutheideainseveralsteps:• Idea1:Putanentiretableinonefile.(Itdoesn’twork.)• Idea2:Log+onefile.(Better,butdoesn’tscaletolargedata.)• Idea3:Log+onefilepercolumnfamily.(Gettingbetter!)• Idea4:Partitiontableintoregionsbykey.
• Keepinmindthat“onefile”meansonegiganticfilestoredinHDFS.Wedon’thavetoworryaboutthedetailsofhowthefileissplitintoblocks.
Idea 1: Put the Table in a Single File
• Howdowedothefollowingoperations?• CREATE• DELETE• SCAN• GET• PUT
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
File“People”
Variable-Length Data is Fundamentally Hard!
Funreadingonthetopic:http://www.joelonsoftware.com/articles/fog0000000319.html
ID FirstName LastName Phone101 Florian Krepsbach 555-3434102 Marilyn Tollerud 555-1213103 Pastor Ingvist 555-1214
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
SQLTable:People(ID:Integer,FirstName:CHAR[20],LastName:Char[20],Phone:CHAR[8])UPDATEPeopleSETPhone=“555-3434”WHEREID=403;
HBaseTablePeople(ID,Name,Home,Office)PUTPeople,403,Home:Phone,555-3434
Eachrowisexactly52byteslong.Tomovetothenextrow,justfseek(file,+52);TogettoRow401,fseek(file,401*52);Overwritethedatainplace.
?
Idea 2: One Tablet + Transaction Log
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
TableforPeople
PUT101:Office:Phone=“555-3434”PUT102:Home:[email protected]….
TransactionLogforTablePeople:
Changesareappliedonlytothelogfile,Thentheresultingrecordiscachedinmemory.Readsmustconsultbothmemoryanddisk.
MemoryCacheforTablePeople:
101 102
GETPeople:101 GETPeople:103PUTPeople:101:Office:Phone=“555-3434”
Idea 2 Requires Periodic Log Compression
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“666-1212”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”},Office:{Phone=“666-1213”}},...
TableforPeopleonDisk:(Old)
PUT101:Office:Phone=“555-3434”PUT102:Home:[email protected]….
TransactionLogforTablePeople:
101:{Timestamp:T403;Name:{First=“Florian”,Middle=“Garfield”,Last=“Krepsbach”},Home:{Phone=“555-1212”,Email=“[email protected]”},Office:{Phone=“555-3434”,Email=“[email protected]”}},102:{Timestamp:T593;Name:{First=“Marilyn”,Last=“Tollerud”},Home:{Phone=“555-1213”,Email=“[email protected]”},...
TableforPeopleonDisk:(New)
Writeoutanewcopyofthetable,withallofthechangesapplied.Deletethelogandmemorycache,andstartover.
Idea 3: Partition by Column Family
DataforColumnFamilyName
TabletsforPeopleonDisk:(Old)
PUT101:Office:Phone=“555-3434”PUT102:Home:[email protected]….
TransactionLogforTablePeople:
TabletsforPeopleonDisk:(New)
Writeoutanewcopyofthetablet,withallofthechangesapplied.Deletethelogandmemorycache,andstartover.
DataforColumnFamilyHome
DataforColumnFamilyOffice
DataforColumnFamilyHome(Changed)
DataforColumnFamilyOffice(Changed)
DataforColumnFamilyName
Idea 4: Split Into Regions
Region1:Keys100-200
Region2:Keys200-300
Region3:Keys300-400
Region4:Keys400-500
RegionServer
RegionMaster
RegionServer
RegionServer
RegionServer
TransactionLog
MemoryCache
Tablet
Tablet
Tablet
HbaseClient
(DetailofOneRegionServer)
ColumnFamilyName
ColumnFamilyHome
ColumnFamilyOffice
WherearetheserversfortablePeople?
Accessdataintables
ColumnFamilyName ColumnFamilyHome ColumnFamilyOffice
Region1Keys101-200
Region2Keys201-300
Region3Keys301-400
TablePeople
The consistency model is tightly coupled to the scalability of the implementation!
Consistency Model
• Atomicity:Entirerowsareupdatedatomicallyornotatall.• Consistency:
• AGETisguaranteedtoreturnacompleterowthatexistedatsomepointinthetable’shistory.(Checkthetimestamptobesure!)• ASCANmustincludealldatawrittenpriortothescan,andmayincludeupdatessinceitstarted.
• Isolation:Notguaranteedoutsideasinglerow.• Durability:Allsuccessfulwriteshavebeenmadedurableondisk.
Questions for Discussion
• Whatarethetradeoffsbetweenhavingaverytallverticaltableversushavingaverywidehorizontaltable?• Supposeatablestartsoffsmall,andthenalargenumberofrowsareaddedtothetable.Whathappensandwhatshouldthesystemdo?• UsingthePeopleexample,comeupwithsomeexamplesofsimplequeriesthatareveryfast,andsomethatareveryslow.Isthereanythingthatcanbedoneabouttheslowqueries?• WoulditbepossibletohaveSCANreturnaconsistentsnapshotofthesystem?Howwouldthatwork?