No SQL HBase Overview v 1.0

  • 7/24/2019 No SQL HBase Overview v 1.0

    1/31

    Overview of NoSQL Databases & HBase

    2/31

    Agenda

    Why NoSQL: Problems with RDBMS

    What is NoSQL

    CAP Theorem

    NoSQL break-down

    HBase Overview & Architecture

    3/31

    Why NoSQL now?

    4/31

    What are the problems with RDBMS?

    Fig. The growth in database transactions and volumes has a large impact on response times

    Caching

    Master/Slave

    Master/Master

    Cluster

    Table Partitioning

    Federated tables

    Sharding

    Distributed DBs

    5/31

    Database Sharding: The Shared-Nothing A

    Database Sharding provides a method for scalability across independent servers, each with their own CPU, memory and disk.
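
To make the shared-nothing idea concrete, here is a minimal sketch of hash-based shard routing. The `ShardRouter` class, the key format and the use of in-memory dicts as stand-ins for independent servers are all assumptions made for illustration; the slides do not prescribe a particular routing scheme.

```python
import hashlib


class ShardRouter:
    """Routes each key to one of several independent shards (hypothetical helper)."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate server with its own CPU, memory and disk.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash())
        # so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


router = ShardRouter(num_shards=4)
for user_id in ("user:1", "user:2", "user:3", "user:42"):
    router.put(user_id, {"id": user_id})
```

A lookup such as `router.get("user:42")` touches only the shard that owns the key. Note that simple modulo routing forces most keys to move when the number of shards changes, which is one reason real systems often use range-based or consistent-hash schemes instead.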

    6/31

    Database Sharding: Advantages

    Smaller databases are easier to manage.

    Smaller databases are faster.

    Database Sharding can reduce costs.

    7/31

    What is NoSQL?

    Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable, schema-free, easy replication support, simple API, eventually consistent

    - nosql-database.org

    Non-Relational

    Distributed

    Open-source

    Horizontally Scalable

    Schema-Free

    Replication Support

    Simple API

    Eventually Consistent

    8/31

    What is NoSQL

    It's about using the right tool for the job. Not all systems have the same data needs.

    SQL is not the only option, nor is it always the best one.

    Consider all options carefully and choose wisely.

    Not Only SQL: it's not about flaming SQL.

    It's about opening our minds to new technologies.

    9/31

    CAP Theorem

    Requirements for distributed systems:

    Consistency: the system is in a consistent state after an operation

    All clients see the same data

    Strong consistency (ACID) vs. eventual consistency (BASE)

    Availability: the system is always on, no downtime

    Node failure tolerance: all clients can find some available replica

    Software/hardware upgrade tolerance

    Partition tolerance: the system continues to function even when split into subsets (by a network disruption), i.e. tolerance to network partition. Not only for reads, but writes as well!

    CAP Theorem (E. Brewer, N. Lynch): you can satisfy at most 2 out of the 3 requirements.

    10/31

    Consistency or Availability?

    11/31

    Consistency or Availability?

    [Figure: under a Partition, the choice is between C (consistency) and A (availability)]

    12/31

    CAP Theorem - Explained

    CA: Single-site clusters (easier to ensure all nodes are always in contact)

    e.g. 2PC

    When a partition occurs, the system blocks

    CP: Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate

    e.g. sharded database

    Requests can complete at nodes that have quorum
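
The quorum rule for a CP system can be sketched in a few lines. This is only an illustration under the assumption of a simple majority quorum; the function names `has_quorum` and `handle_write` are hypothetical and do not come from any particular database.

```python
def has_quorum(reachable_nodes, total_nodes):
    """Return True if this side of a partition holds a strict majority of nodes."""
    return reachable_nodes > total_nodes // 2


def handle_write(reachable_nodes, total_nodes):
    # A CP system refuses the request rather than risk inconsistent data,
    # so availability is what gets sacrificed on the minority side.
    if not has_quorum(reachable_nodes, total_nodes):
        return "rejected: no quorum"
    return "accepted"
```

With five nodes split 3/2 by a partition, `handle_write(3, 5)` is accepted while `handle_write(2, 5)` is rejected. In an even 2/2 split of four nodes neither side has a strict majority, so both sides reject writes.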

    13/31

    CAP Theorem - Explained

    AP: System is still available under partitioning, but data returned may be inaccurate

    e.g. DNS, caches, Master/Slave replication

    Need some conflict resolution strategy

    Requests can complete at any live node, possibly violating strong consistency.
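
One simple conflict resolution strategy for an AP system is last-write-wins. The sketch below assumes each replica stores `key -> (timestamp, value)` and that timestamps are comparable across nodes; both the representation and the `merge_last_write_wins` name are assumptions for illustration, and other strategies (vector clocks, application-level merges) exist.

```python
def merge_last_write_wins(replica_a, replica_b):
    """Merge two diverged replicas; each maps key -> (timestamp, value)."""
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        # Keep the version with the newer timestamp; ties keep replica_a's copy.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two replicas that accepted different writes while partitioned.
a = {"cart": (10, ["book"]), "name": (5, "Ann")}
b = {"cart": (12, ["book", "pen"]), "email": (7, "ann@example.com")}
merged = merge_last_write_wins(a, b)
```

After the merge, `merged["cart"]` holds the version written at timestamp 12. The trade-off is that the older concurrent write is silently discarded, which is acceptable only when stale or approximate data is tolerable.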

    14/31

    Eventual Consistency for Availability

    BASE (Basically Available, Soft state, Eventually consistent)

    Weak consistency (stale data OK)

    Availability first

    Faster

    Approximate answers OK

    ACID (Atomicity, Consistency, Isolation, Durability)

    Strong consistency

    Isolation

    Safer

    Availability?

    15/31

    CAP Theorem

    16/31

    NoSQL Break-down

    Key-Value stores

    Column Families

    Document-Oriented

    Graph Databases

    Object Databases

    XML Databases

    17/31

    Focus of Different Data Models

    20/31

    When to use NoSQL

    Bigness

    Massive write performance

    Fast key-value access

    Flexible schema and flexible datatypes

    Schema migration

    Write availability

    Easier maintainability, administration and operations

    No single point of failure

    Generally available parallel computing

    Use the right data model for the right problem

    Distributed systems support

    Tunable CAP tradeoffs

    21/31

    Drawbacks of NoSQL

    OLTP: Outside VoltDB, complex multi-object transactions are generally not supported.

    Data Integrity: Developers need to take care.

    SQL: Still very few NoSQL systems provide a SQL interface.

    Ad-hoc queries: If you need to answer real-time questions about your data that you can't predict in advance, RDBMS is generally the winner here.

    Complex relationships: Some NoSQL systems support relationships, but RDBMS is still the winner in relating.

    Maturity and Stability: RDBMS still have the edge here.

    22/31

    HBase - Google BigTable Paper

    Sparse, distributed, persistent multi-dimensional sorted map indexed by (row_key, column_key, timestamp)
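
The data model described above can be illustrated with a toy in-memory structure. This is a sketch of the concept only, not HBase's or BigTable's implementation: the `SparseSortedMap` class, its method names and the `cf:name` style column keys are assumptions for illustration, and persistence and distribution are left out.

```python
import bisect


class SparseSortedMap:
    """Toy sorted map indexed by (row_key, column_key, timestamp)."""

    def __init__(self):
        self._keys = []    # sorted list of (row_key, column_key, -timestamp)
        self._values = {}  # same tuple -> value

    def put(self, row_key, column_key, timestamp, value):
        # Negating the timestamp sorts the newest version of a cell first.
        key = (row_key, column_key, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row_key, column_key):
        """Return the newest value of a cell, or None if the cell is absent."""
        start = (row_key, column_key, float("-inf"))
        i = bisect.bisect_left(self._keys, start)
        if i < len(self._keys) and self._keys[i][:2] == (row_key, column_key):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, col, neg_ts = self._keys[i]
            yield row, col, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("row1", "cf:name", 1, "old")
table.put("row1", "cf:name", 2, "new")
table.put("row2", "cf:age", 1, 30)
```

The map is sparse because only cells that were written occupy space: `table.get("row2", "cf:name")` returns `None` without any placeholder being stored. Keeping keys sorted is what makes range scans over adjacent rows cheap.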

    23/31

    HBase - Storage

    24/31

    HBase

    Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This makes it very suitable for tasks such as high-speed counter aggregation.

    Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are automatically split and redistributed as your data grows.

    Automatic RegionServer failover

    Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.
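
The automatic sharding point can be sketched as a range-partitioned table whose regions split when they grow too large. This is a simplified illustration, not HBase's actual split policy: the `RegionedTable` class, the row-count threshold and the median split point are assumptions, and redistribution of regions across servers is omitted.

```python
import bisect


class RegionedTable:
    """Toy table whose sorted row-key space is divided into regions."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]               # region i covers keys >= start_keys[i]
        self.regions = [{}]                  # one dict of rows per region

    def _region_index(self, row_key):
        # The last region whose start key is <= row_key owns the row.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, value):
        i = self._region_index(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)

    def _split(self, i):
        # Split at the median row key; the upper half becomes a new region.
        keys = sorted(self.regions[i])
        mid_key = keys[len(keys) // 2]
        upper = {k: v for k, v in self.regions[i].items() if k >= mid_key}
        self.regions[i] = {k: v for k, v in self.regions[i].items() if k < mid_key}
        self.start_keys.insert(i + 1, mid_key)
        self.regions.insert(i + 1, upper)


table = RegionedTable(max_rows_per_region=4)
for n in range(10):
    table.put(f"row{n:02d}", n)
```

Because rows are inserted in increasing key order here, every split happens in the last region. That mirrors a well-known concern with monotonically increasing row keys: all new writes land in one region.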

    25/31

    HBase

    MapReduce: HBase supports massively parallelized processing via MapReduce for using HBase as both source and sink.

    Java Client API: HBase supports an easy to use Java API for programmatic access.

    Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.

    Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high volume query optimization.
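
To show why a Bloom filter helps with query optimization, here is a minimal generic sketch. It is not HBase's implementation: the `BloomFilter` class, the bit-array size, the number of hashes and the salted SHA-256 scheme are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


row_filter = BloomFilter()
for row_key in ("row1", "row2", "row3"):
    row_filter.add(row_key)
```

A store can consult such a filter before reading a file from disk: if `might_contain` returns `False`, the row is certainly not in that file and the read can be skipped. A `True` answer still requires the actual lookup, because false positives are possible.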

    28/31

    When to use HBase?

    HBase isn't suitable for every problem.

    When you need random writes or random reads on HDFS

    Write-dominated workloads: examples such as time-series data and user analytics are very write-heavy

    Large-scale workloads: if data is growing at 250MB/month, it is hard to manage this with a traditional RDBMS.

    HBase is insufficient for analytics because it is designed to support simple operations such as create, read, update and delete rather than other operations such as aggregation.

    29/31

    When to use Hive and Impala?

    Hive: MapReduce as an execution engine

    High latency, low throughput queries

    Fault-tolerance model based on MapReduce's on-disk checkpointing; materializes all intermediate results

    Java runtime allows for easy late-binding of functionality: file formats and UDFs.

    Extensive layering imposes high runtime overhead

    Impala: Does not use MapReduce as an execution engine

    Direct, process-to-process data exchange

    In memory

    No fault tolerance

    An execution engine designed for low runtime overhead

    31/31

    References

    PHP conference 2011 presentation on NoSQL: http://lanyrd.com/2011/phpuk2011/video/

    http://www.codefutures.com/database-sharding/

    http://codahale.com/you-cant-sacrifice-partition-tolerance/

    http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

    http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

    http://horicky.blogspot.com/2009/11/nosql-patterns.html

    http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html

    http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html

    http://nosql-database.org/

    http://hbase.apache.org/book.html

    http://wiki.toadforcloud.com/index.php/HBase_Storage