
Miscellanea Assignment #3 Extensions Summary Conclusion References

CS-695 NoSQL Database
HBase (part 2 of 2)

Dr. Chuck Cartledge

1 Oct. 2015


Table of contents I

1 Miscellanea

2 Assignment #3

3 Extensions

4 Summary

5 Conclusion

6 References


Corrections and additions since last lecture.

Assignment #03 is available.

Corrected typos in lecture #005.


Words of explanation.

The full text is available at: http://www.cs.odu.edu/~ccartled/Teaching/2015-Fall/NoSQL/Assignments/03/

In general terms:

1 Parse data

2 Create columnar database

3 Alter the database for new data

4 Query database

5 Create histograms


Extensions: they are minimal.

HBase is more of a data store than a database.

A large omission is the lack of join capability.

Joins can be implemented via MapReduce.

Joins can be implemented via a client application.

Nested loop: scan each table entry against all other table entries.

Lots of I/O means slow.
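The client-side nested-loop join can be sketched as follows. This is a minimal illustration, not HBase client code: the two "tables" are in-memory lists of dicts standing in for the results of two table scans, and the names nested_loop_join, users, and orders are hypothetical. The comparison counter shows why the approach is slow: every row of one table is checked against every row of the other.

```python
def nested_loop_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dict rows by comparing every pair of rows.

    Returns the joined rows and the number of comparisons performed.
    """
    joined = []
    comparisons = 0
    for left in left_rows:            # outer loop: one pass over the left table
        for right in right_rows:      # inner loop: full pass over the right table
            comparisons += 1
            if left[left_key] == right[right_key]:
                joined.append({**left, **right})
    return joined, comparisons


# Hypothetical sample data standing in for two scanned tables.
users = [
    {"user_id": "u1", "name": "Ann"},
    {"user_id": "u2", "name": "Bob"},
]
orders = [
    {"order_id": "o1", "buyer": "u1"},
    {"order_id": "o2", "buyer": "u2"},
    {"order_id": "o3", "buyer": "u1"},
]

rows, comparisons = nested_loop_join(users, orders, "user_id", "buyer")
print(len(rows), "joined rows after", comparisons, "comparisons")
```

The number of comparisons is the product of the two table sizes, so against a real cluster each comparison implies reading data, which is where the I/O cost comes from.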


A selection of HBase shell commands (footnote 1)

General: status, version, whoami

Management: alter, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, list, show_filters (part of scan)

Data manipulation: count, delete, deleteall, get, get_counter, incr, put, scan, truncate

"Surgery tools": assign, balancer, balance_switch, close_region, compact, flush, major_compact, move, split, unassign, zk_dump

Cluster replication: a raft of commands

Security tools: grant, revoke, user_permission

Footnote 1: https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/


Example scan commands (footnote 2)

The scan command applies any filters and returns the rows and columns that pass the filter.

scan 'hbase:meta'

scan 'hbase:meta', LIMIT => 1

scan 'hbase:meta', LIMIT => 10, COLUMNS => 'info:regioninfo'

scan 'hbase:meta', LIMIT => 1, STARTROW => 'bethu'

Footnote 2: https://wiki.apache.org/hadoop/Hbase/Shell


Example filters (footnote 3)

Filters return rows that pass the filter criteria (this may be a different type of operation than some people are used to performing).

Syntax: FilterName (argument, argument, ..., argument)

Compound operators: AND, OR, SKIP, and WHILE

Execution order: parentheses, SKIP and WHILE, AND, OR

Comparison operators: <, <=, =, !=, >=, >

Example: "PrefixFilter ('Row') AND PageFilter (1) AND FirstKeyOnlyFilter ()"

A complete list of filters is described in Chapter 75 [2]. Available filters are found using the HBase show_filters command.

Footnote 3: http://www.hadooptpoint.com/filters-in-hbase-shell
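To make the compound filter example concrete, here is a rough in-memory emulation of what "PrefixFilter ('Row') AND PageFilter (1) AND FirstKeyOnlyFilter ()" selects. It is a sketch under stated assumptions, not HBase code: the table is a plain dict of row key to columns, the function name scan_with_filters is hypothetical, and a real cluster applies PageFilter per region server, so a real scan may return more rows than the page size.

```python
def scan_with_filters(table, prefix, page_size):
    """Emulate PrefixFilter(prefix) AND PageFilter(page_size) AND FirstKeyOnlyFilter().

    table maps row key -> {column name: value}. All three conditions must
    hold (AND), so each one narrows the result further.
    """
    results = []
    for row_key in sorted(table):             # rows are visited in row-key order
        if len(results) >= page_size:         # PageFilter: stop after page_size rows
            break
        if not row_key.startswith(prefix):    # PrefixFilter: keep matching row keys only
            continue
        columns = table[row_key]
        if not columns:                       # skip rows with no cells
            continue
        first_column = min(columns)           # FirstKeyOnlyFilter: first cell of the row
        results.append((row_key, first_column, columns[first_column]))
    return results


# Hypothetical sample table.
table = {
    "Row1": {"cf:a": "1", "cf:b": "2"},
    "Row2": {"cf:a": "3"},
    "Other": {"cf:a": "4"},
}

first_page = scan_with_filters(table, "Row", 1)
print(first_page)
```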


Misc. sampling of companies using HBase (footnote 4)

Adobe: small cluster

Caree.rs: high tech hiring firm; backend data, machine learning, hire recommendations

Facebook: powers the Messages infrastructure

Infolinks: In-Text ad provider; uses HBase to optimize ad placement

OCLC: catalog collections from 72,000 libraries

OpenLogic: stores the world's Open Source packages

Pacific Northwest National Laboratory: biological data warehouse

StumbleUpon: real-time data storage and analytics

Twitter: backup of all MySQL tables in the production backend

WorldLingo: machine translation

Footnote 4: https://wiki.apache.org/hadoop/Hbase/PoweredBy


Bloom Filters

A space-efficient probabilistic data structure designed to answer the question: have I seen this data before?

A small number of hash functions each set a single bit in a bit vector.

The same hash functions are used to see if the bits are set.

May return a FALSE POSITIVE.

Will not return a FALSE NEGATIVE.

As the filter becomes fuller, it returns more FALSE POSITIVEs.

Image from [1].

Conceived by Burton Howard Bloom in 1970 (his motivating example was a hyphenation dictionary; spell checkers are a well-known later use).
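The behavior described on this slide can be sketched in a few lines. This is a minimal illustration, not the Bloom filter implementation HBase uses: the class name, the default sizes, and the choice of salted SHA-256 digests as the hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: num_hashes hash functions over a bit vector."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Each hash function sets a single bit.
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[position] for position in self._positions(item))


seen = BloomFilter()
for word in ["hbase", "hadoop", "zookeeper"]:
    seen.add(word)

print(seen.might_contain("hbase"))     # an added item is always reported as seen
print(seen.might_contain("mongodb"))   # usually False, but a false positive is possible
```

An added item always tests as present because its bits were set and are never cleared. An item that was never added tests as present only if other items happened to set all of its bits, which becomes more likely as the bit vector fills.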


Strengths and weaknesses

Good and not so good

Strengths

Autoversioning of data

Scalability (gigabyte or terabyte databases)

Community support

Weaknesses

Scalability (gigabyte or terabyte databases), not small databases

Small data sets (a few thousand/million rows [2])

Lack of a strong query language

Requires Hadoop and ZooKeeper type infrastructure (i.e., hardware)

Lack of sorting and indexing capabilities

Lack of data types


Applicabilities

Good for, and not so good for

Good fit

Event logging

Content management systems, blogging platforms

Counters, and expiring usage cases

Not so good fit

If ACID is required

Aggregating data (SUM, AVG, etc.)

Prototyping applications


What have we covered?

Reviewed Assignment #03

Covered some of the HBase "querying" capabilities

Remember: Assignment #03 is due before next class

Next time: MongoDB


References I

[1] Ben Podgursky, Bloomjoin: Bloomfilter + cogroup, http://liveramp.com/engineering/bloomjoin-bloomfilter-cogroup/, 2013.

[2] Apache Staff, Apache HBase reference guide, http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture, 2015.