Miscellanea Assignment #3 Extensions Summary Conclusion References
CS-695 NoSQL Database
HBase (part 2 of 2)
Dr. Chuck Cartledge
1 Oct. 2015
Table of contents I
1 Miscellanea
2 Assignment #3
3 Extensions
4 Summary
5 Conclusion
6 References
Corrections and additions since last lecture.
Assignment #03 is available.
Corrected typos in lecture #005.
Words of explanation.
The full text is available at: http://www.cs.odu.edu/~ccartled/Teaching/2015-Fall/NoSQL/Assignments/03/
In general terms:
1 Parse data
2 Create columnar database
3 Alter the database for new data
4 Query database
5 Create histograms
They are minimal.
HBase is more of a data store than a database.
A large omission is the lack of join capability.
Can be implemented via MapReduce
Can be implemented via a client app
Nested loop: scan each entry of one table against all entries of the other table
Lots of I/O means slow
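The client-side nested loop join described above can be sketched as follows. This is a minimal illustration, not HBase code: the two tables are stood in for by in-memory Python lists, and the names `nested_loop_join`, `users`, and `orders` are hypothetical. The comparison counter shows why the approach is slow: every row of one table is checked against every row of the other.

```python
def nested_loop_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of row dicts by comparing every left row with every right row."""
    joined = []
    comparisons = 0
    for left in left_rows:            # outer loop: one pass over the left table
        for right in right_rows:      # inner loop: a full pass over the right table
            comparisons += 1
            if left[left_key] == right[right_key]:
                joined.append({**left, **right})
    return joined, comparisons


# Hypothetical sample data standing in for rows fetched from two HBase tables.
users = [
    {"user_id": "u1", "name": "Ada"},
    {"user_id": "u2", "name": "Bob"},
]
orders = [
    {"order_id": "o1", "buyer": "u1"},
    {"order_id": "o2", "buyer": "u1"},
    {"order_id": "o3", "buyer": "u3"},
]

rows, comparisons = nested_loop_join(users, orders, "user_id", "buyer")
print(rows)
print(comparisons)  # always len(users) * len(orders)
```

In a real client each inner pass would be another scan of the second table, so the number of reads grows with the product of the two table sizes.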
A selection of HBase shell commands¹
General: status, version, whoami
Management: alter, create, describe, disable(_all), drop(_all), enable(_all), list, show_filters (part of scan)
Data manipulation: count, delete(all), get, get_counter, incr, put, scan, truncate
"Surgery tools": assign, balancer(_switch), close_region, compact, flush, major_compact, move, split, unassign, zk_dump
Cluster replication: a raft of commands
Security tools: grant, revoke, user_permission
¹ https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/
Example scan commands²
The scan command applies any filters and returns the rows and columns that pass the filter.
scan 'hbase:meta'
scan 'hbase:meta', LIMIT => 1
scan 'hbase:meta', LIMIT => 10, COLUMNS => 'info:regioninfo'
scan 'hbase:meta', LIMIT => 1, STARTROW => 'bethu'
² https://wiki.apache.org/hadoop/Hbase/Shell
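To make the effect of STARTROW and LIMIT concrete, here is a small in-memory sketch of a scan. It is a simplified model, not the HBase implementation: the table is a plain Python dict, the function `scan` and the sample table `people` are hypothetical, and Python string ordering stands in for HBase's byte-wise row key ordering (the two agree for ASCII keys).

```python
from bisect import bisect_left


def scan(table, startrow=None, limit=None):
    """Return (row_key, row) pairs in key order, like a simplified shell scan."""
    keys = sorted(table)                      # rows are visited in sorted key order
    start = 0 if startrow is None else bisect_left(keys, startrow)
    selected = keys[start:]                   # STARTROW is inclusive
    if limit is not None:
        selected = selected[:limit]           # LIMIT caps the number of rows returned
    return [(key, table[key]) for key in selected]


# Hypothetical table: row key -> {column: value}
people = {
    "carol": {"info:city": "Norfolk"},
    "alice": {"info:city": "Hampton"},
    "dave": {"info:city": "Suffolk"},
    "bethu": {"info:city": "Chesapeake"},
}

print(scan(people, startrow="bethu", limit=1))
print(scan(people, limit=2))
```

The first call mirrors `LIMIT => 1, STARTROW => 'bethu'`: the scan begins at the first key at or after the start row and stops after one row.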
Example filters³
Filters return rows that pass the filter criteria (this may be a different type of operation than some people are used to performing).
Syntax: FilterName (argument, argument, ..., argument)
Compound operators: AND, OR, SKIP, and WHILE
Execution order: parentheses, SKIP and WHILE, AND, OR
Comparison operators: <, <=, =, !=, >=, >
Example: "PrefixFilter ('Row') AND PageFilter (1) AND FirstKeyOnlyFilter ()"
A complete list of filters is described in Chapter 75 [2]. Available filters are found using the HBase show_filters command.
³ http://www.hadooptpoint.com/filters-in-hbase-shell
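Because AND binds more tightly than OR, it can help to build filter strings with explicit parentheses. The sketch below only assembles strings in the syntax shown above; it does not talk to HBase. The helpers `filter_call` and `combine` are hypothetical names, and only string and integer arguments are handled.

```python
def filter_call(name, *args):
    """Render one filter as: FilterName (arg, arg, ...). Strings are single-quoted."""
    rendered = ", ".join(f"'{a}'" if isinstance(a, str) else str(a) for a in args)
    return f"{name} ({rendered})"


def combine(operator, *filters):
    """Join filter strings with AND or OR and wrap the result in parentheses."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return "(" + f" {operator} ".join(filters) + ")"


# The compound filter from the slide, with the grouping made explicit.
first_page = combine(
    "AND",
    filter_call("PrefixFilter", "Row"),
    filter_call("PageFilter", 1),
    filter_call("FirstKeyOnlyFilter"),
)
print(first_page)
```

Nesting `combine` calls keeps the intended grouping visible instead of relying on the execution order.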
A sampling of companies using HBase⁴
Adobe: small cluster
Caree.rs: high-tech hiring firm; backend data, machine learning, hire recommendations
Facebook: powers the Messages infrastructure
Infolinks: In-Text ad provider, to optimize ad placement
OCLC: catalogs collections from 72,000 libraries
OpenLogic: the world's Open Source packages
Pacific Northwest National Laboratory: biological data warehouse
StumbleUpon: real-time data storage and analytics
Twitter: backup of all MySQL tables in the production backend
WorldLingo: machine translation
⁴ https://wiki.apache.org/hadoop/Hbase/PoweredBy
Bloom Filters
A space-efficient probabilistic data structure designed to answer the question: have I seen this data before?
A small number of hash functions each set a single bit in a bit vector
The same hash functions are used to see if the bits are set
May return a FALSE POSITIVE
Will not return a FALSE NEGATIVE
As the filter becomes fuller, more FALSE POSITIVEs
Image from [1].
Conceived by Burton Howard Bloom in 1970; his motivating example was a hyphenation program.
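The Bloom filter behavior listed above can be sketched in a few lines. This is a minimal teaching version, not how HBase implements its Bloom filters: the class `BloomFilter`, its default sizes, and the use of seeded SHA-256 digests as the hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, item):
        """Yield one bit position per hash function for the given item."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1            # each hash function sets a single bit

    def might_contain(self, item):
        # True means "possibly seen" (false positives are possible);
        # False means "definitely not seen" (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for word in ["hbase", "hadoop", "zookeeper"]:
    seen.add(word)

print(seen.might_contain("hbase"))
```

Every added item sets all of its bits, so a lookup for it can never fail. As more bits are set, unrelated items are more likely to find all of their bits already set, which is where the false positives come from.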
Strengths and weaknesses
Good and not so good
Strengths
Autoversioning of data
Scalability (gigabyte or terabyte databases)
Community support
Weaknesses
Scalability (gigabyte or terabyte databases), not small databases
Small data sets (a few thousand/million rows [2])
Lack of a strong query language
Requires Hadoop- and ZooKeeper-type infrastructure (i.e., hardware)
Lack of sorting and indexing capabilities
Lack of data types
Applicabilities
Good for, and not so good for
Good fit
Event logging
Content management systems, blogging platforms
Counters, and expiring usage cases
Not so good fit
If ACID is required
Aggregating data (SUM, AVG, etc.)
Prototyping applications
What have we covered?
Reviewed Assignment #03
Covered some of the HBase "querying" capabilities
Remember: Assignment #03 is due before the next class
Next time: MongoDB
References I
[1] Ben Podgursky, Bloomjoin: Bloomfilter + cogroup, http://liveramp.com/engineering/bloomjoin-bloomfilter-cogroup/, 2013.
[2] Apache Staff, Apache HBase Reference Guide, http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture, 2015.