7/24/2019 No SQL HBase Overview v 1.0
Overview of NoSQL Databases
& HBase
Agenda
Why NoSQL: Problems with RDBMS
What is NoSQL
CAP Theorem
NoSQL Break-down
HBase Overview & Architecture
Why NoSQL now?
What are the problems with RDBMS?
Fig. The growth in database transactions and volumes has a large impact on response times
Caching
Master/Slave
Master/Master
Cluster
Table Partitioning
Federated tables
Sharding
Distributed DBs
Database Sharding: The Shared-Nothing A
Database sharding provides a method for scalability across independent servers, each with their own CPU, memory and disk.
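As a rough illustration of how rows might be spread across independent servers, the sketch below routes each key to a shard with a stable hash. The shard names and the `shard_for` helper are hypothetical and not taken from the slides; real sharding products may use range-based or directory-based schemes instead.

```python
import hashlib

# Hypothetical shard names; in a real deployment each would be an
# independent server with its own CPU, memory and disk.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Route a few sample customer IDs to their shards.
placement = {cid: shard_for(cid) for cid in ["alice", "bob", "carol"]}
```

The same key always lands on the same shard, so no coordination between servers is needed to locate a row. The trade-off of plain modulo hashing is that changing the number of shards moves most keys.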
Database Sharding: Advantages
Smaller databases are easier to manage.
Smaller databases are faster.
Database sharding can reduce costs.
What is NoSQL?
"Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable, schema-free, easy replication support, simple API, eventually consistent"
- nosql-database.org
Non-Relational
Distributed
Open-source
Horizontally Scalable
Schema-Free
Replication Support
Simple API
Eventually Consistent
What is NoSQL
It's about using the right tool for the job.
Not all systems have the same data needs.
SQL is not the only option, nor is it always the best one.
Consider all options carefully and choose wisely.
Not Only SQL
It's not about flaming SQL.
It's about opening our minds to new technologies.
CAP Theorem
Requirements for distributed systems:
Consistency: the system is in a consistent state after an operation
All clients see the same data
Strong consistency (ACID) vs. eventual consistency (BASE)
Availability: the system is always on, no downtime
Node failure tolerance: all clients can find some available replica
Software/hardware upgrade tolerance
Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption), i.e. tolerance to network partition. Not only for reads, but for writes as well!
CAP Theorem (E. Brewer, N. Lynch): You can satisfy at most 2 out of the 3 requirements.
Consistency or Availability?
Fig. Partition: a choice between C (consistency) and A (availability)
CAP Theorem - Explained
CA: Single site clusters (easier to ensure all nodes are always in contact)
e.g. 2PC
When a partition occurs, the system blocks
CP: Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate
e.g. sharded database
Requests can complete at nodes that have quorum
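To make the quorum rule concrete, here is a minimal sketch assuming "quorum" means a strict majority of the cluster. The function names are hypothetical, and real systems may define quorum differently (for example with separate read and write quorum sizes).

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if the reachable nodes form a strict majority of the cluster."""
    return reachable > cluster_size // 2


def sides_that_can_serve(partition_sizes, cluster_size):
    """For each side of a network split, report whether it keeps quorum."""
    return [has_quorum(n, cluster_size) for n in partition_sizes]


# A 5-node cluster split into groups of 3 and 2 nodes:
# only the 3-node side can keep completing requests.
five_node_split = sides_that_can_serve([3, 2], cluster_size=5)

# A 4-node cluster split evenly: neither side has a majority.
four_node_split = sides_that_can_serve([2, 2], cluster_size=4)
```

Because at most one side of any split can hold a strict majority, two sides can never both accept conflicting writes, which is how a CP system trades availability for consistency.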
CAP Theorem - Explained
AP: System is still available under partitioning, but some of the data returned may be inaccurate
e.g. DNS, caches, Master/Slave replication
Need some conflict resolution strategy
Requests can complete at any live node, possibly violating strong consistency.
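One simple conflict resolution strategy is last-write-wins, sketched below. This is only an illustration chosen for brevity; the slides do not say which strategy to use, and the `merge_lww` helper and the `(timestamp, value)` layout are assumptions. Alternatives such as vector clocks or application-level merging avoid silently discarding writes.

```python
def merge_lww(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas that each map key -> (timestamp, value).

    For every key, keep the entry with the larger timestamp.
    """
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two replicas accepted writes independently during a partition.
a = {"cart": (10, ["book"]), "email": (5, "old@example.com")}
b = {"cart": (12, ["book", "pen"]), "email": (3, "older@example.com")}

merged = merge_lww(a, b)
```

After the merge, `cart` comes from replica `b` (timestamp 12) and `email` from replica `a` (timestamp 5). The write that lost is dropped, which is the price of this strategy.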
Eventual Consistency for Availability
BASE (Basically Available, Soft state, Eventually consistent)
Weak consistency (stale data OK)
Availability first
Faster
Approximate answers OK
ACID (Atomicity, Consistency, Isolation, Durability)
Strong consistency
Isolation
Safer
Availability?
CAP Theorem
NoSQL Break-down
Key-Value stores
Column Families
Document-Oriented
Graph Databases
Object Databases
XML Databases
Focus of Different Data Models
When to use NoSQL
Bigness
Massive write performance
Fast key-value access
Flexible schema and flexible datatypes
Schema migration
Write availability
Easier maintainability, administration and operations
No single point of failure
Generally available parallel computing
Use the right data model for the right problem
Distributed systems support
Tunable CAP tradeoffs
Drawbacks of NoSQL
OLTP: Outside VoltDB, complex multi-object transactions are generally not supported.
Data integrity: Developers need to take care of it themselves.
SQL: Still very few NoSQL systems provide a SQL interface.
Ad-hoc queries: If you need to answer real-time questions about your data that you can't predict in advance, an RDBMS is generally the winner here.
Complex relationships: Some NoSQL systems support relationships, but an RDBMS is still the winner at relating.
Maturity and stability: RDBMSs still have the edge here.
HBase - Google BigTable Paper
Sparse, distributed, persistent multi-dimensional sorted map indexed by
(row_key, column_key, timestamp)
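The data model can be pictured as a map keyed by a `(row_key, column_key, timestamp)` tuple. The toy class below is a sketch of that idea only: the class name and methods are hypothetical, it holds everything in memory, and it ignores distribution and persistence. BigTable-style stores usually keep newer versions first; this sketch simply picks the largest timestamp on read.

```python
class SparseSortedMap:
    """Toy in-memory map indexed by (row_key, column_key, timestamp)."""

    def __init__(self):
        # Sparse: only cells that were actually written take up space.
        self._cells = {}

    def put(self, row_key, column_key, timestamp, value):
        self._cells[(row_key, column_key, timestamp)] = value

    def get(self, row_key, column_key):
        """Return the value with the newest timestamp, or None if absent."""
        versions = [
            (ts, value)
            for (row, col, ts), value in self._cells.items()
            if row == row_key and col == column_key
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, start_row, stop_row):
        """Return cells in sorted key order for start_row <= row < stop_row."""
        return [
            (key, self._cells[key])
            for key in sorted(self._cells)
            if start_row <= key[0] < stop_row
        ]


table = SparseSortedMap()
table.put("com.example.www", "contents:html", 1, "<html>v1</html>")
table.put("com.example.www", "contents:html", 2, "<html>v2</html>")
table.put("com.example.blog", "anchor:home", 1, "Blog")

latest = table.get("com.example.www", "contents:html")
rows = [key[0] for key, _ in table.scan("com.example.a", "com.example.z")]
```

Because keys are sorted, rows with a common prefix sit next to each other, which is what makes range scans cheap in this kind of store.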
HBase - Storage
HBase
Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This makes it very suitable for tasks such as high-speed counter aggregation.
Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are automatically split and redistributed as your data grows.
Automatic RegionServer failover
Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.
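The automatic sharding point can be illustrated with a toy model in which each region covers a contiguous range of row keys and splits in two once it grows too large. This is a sketch under simplifying assumptions, not how HBase is implemented: the `RegionMap` class is hypothetical, it splits on row count rather than on store size, and it does not model assigning regions to RegionServers.

```python
import bisect


class RegionMap:
    """Toy model: region i covers [start_keys[i], start_keys[i + 1])."""

    def __init__(self, max_rows_per_region=4):
        self.start_keys = [""]   # the first region starts at the empty key
        self.regions = [[]]      # sorted row keys held by each region
        self.max_rows = max_rows_per_region

    def region_index(self, row_key):
        """Find the region whose key range contains row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self.region_index(row_key)
        rows = self.regions[i]
        if row_key not in rows:
            bisect.insort(rows, row_key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split region i at its middle row key into two regions."""
        rows = self.regions[i]
        mid = len(rows) // 2
        self.start_keys.insert(i + 1, rows[mid])
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]


table = RegionMap(max_rows_per_region=4)
for key in ["a", "b", "c", "d", "e", "f"]:
    table.put(key)
```

After the fifth insert the single region exceeds four rows and splits at `"c"`, so lookups for `"b"` and `"f"` are served by different regions.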
HBase
MapReduce: HBase supports massively parallelized processing via MapReduce for using HBase as both source and sink.
Java Client API: HBase supports an easy to use Java API for programmatic access.
Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.
Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high volume query optimization.
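A Bloom filter answers "might this key be present?" without reading the underlying data, so a store can skip files that definitely do not contain a row. The sketch below shows the general technique only; the `BloomFilter` class, its sizes and its hashing scheme are assumptions for illustration and do not reflect HBase's own implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        """Derive num_hashes bit positions from seeded SHA-256 digests."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
for row_key in ["row-1", "row-2", "row-3"]:
    bloom.add(row_key)

present = bloom.might_contain("row-2")
```

A key that was added always reports `True`. A key that was never added usually reports `False`, but may report `True` by chance, in which case the store falls back to reading the data.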
When to use HBase?
HBase isn't suitable for every problem.
When you need random writes or random reads on HDFS
Write-dominated workloads: examples like time-series data, user analytics, etc. are very write-heavy
Large-scale workloads: if data is growing at 250MB/month, it is hard to manage this with a traditional RDBMS.
HBase has been insufficient for analytics because its design is to support simple operations such as create, read, update, and delete rather than other operations such as aggregation.
When to use Hive and Impala?
Hive: MapReduce as an execution engine
High latency, low throughput queries
Fault-tolerance model based on MapReduce's on-disk checkpointing; materializes all intermediate results
Java runtime allows for easy late-binding of functionality: file formats and UDFs.
Extensive layering imposes high runtime overhead
Impala: Does not use MapReduce as an execution engine
Direct, process-to-process data exchange
In memory
No fault tolerance
An execution engine designed for low runtime overhead
References
PHP conference 2011 presentation on NoSQL: http://lanyrd.com/2011/phpuk2011/video/
http://www.codefutures.com/database-sharding/
http://codahale.com/you-cant-sacrifice-partition-tolerance/
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
http://horicky.blogspot.com/2009/11/nosql-patterns.html
http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
http://nosql-database.org/
http://hbase.apache.org/book.html
http://wiki.toadforcloud.com/index.php/HBase_Storage