Doug Judd (co-founder / CEO) of Hypertable talks about the Hypertable store at Big Data Guru's meetup on 2014-10-21
Hypertable
An Open Source, High Performance,
Massively Scalable Database
Doug Judd CEO Hypertable Inc.
Three Reasons to Choose Hypertable
• High Performance • Open Source • Future Direction SQL
Introduction
Highlights
• Modeled after Google’s Bigtable database
• High Performance Implementation (C++)
• Apache Thrift interface for all popular languages (Java, PHP, Ruby, Python, Perl, etc.)
• Broad Hadoop distribution support
  o Apache 2
  o Cloudera CDH3, CDH4, CDH5
  o IBM BigInsights 3
  o Hortonworks HDP2
  o MapR
• Actively developed for 8 years
Open Source
• Licensed under the GPL
• Hosted on GitHub
  o git://github.com/hypertable/hypertable.git
  o https://github.com/hypertable/hypertable.git
• Online source documentation
• Mailing Lists
  o groups.google.com/group/hypertable-user
  o groups.google.com/group/hypertable-dev
Bigtable
• Google’s most successful scalable database
• Bigtable underpins 100+ Google services
• YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database …
• Data is physically ordered by primary key – it’s not a distributed hash table
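The practical consequence of physically ordered keys is that range and prefix scans touch only a contiguous slice of the data, which a hash table cannot offer. The sketch below illustrates the idea with a sorted in-memory list; `ROWS`, `range_scan`, and `prefix_scan` are hypothetical names for illustration and are not part of the Hypertable API.

```python
import bisect

# Hypothetical stand-in for a table whose rows are stored sorted by row key.
ROWS = sorted([
    ("com.example/about", 1),
    ("com.example/index", 2),
    ("org.sample/index", 3),
])


def range_scan(sorted_rows, start, end):
    """Return rows with start <= key < end, located by binary search."""
    keys = [key for key, _ in sorted_rows]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return sorted_rows[lo:hi]


def prefix_scan(sorted_rows, prefix):
    """Return rows whose key starts with prefix; they are contiguous when sorted."""
    keys = [key for key, _ in sorted_rows]
    i = bisect.bisect_left(keys, prefix)
    out = []
    while i < len(sorted_rows) and keys[i].startswith(prefix):
        out.append(sorted_rows[i])
        i += 1
    return out
```

With a hash-partitioned store, the same prefix query would have to visit every partition, because neighbouring keys are scattered.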
How Hypertable Differs From A Traditional RDBMS
• Horizontally Scalable
• Sparse Table Structure
  o Variable number of columns per-row
  o Rows can have billions of columns
• Cells can have multiple time stamped versions
Database Model
• Sparse, two-dimensional tables
• Cells can have multiple versions
• Cells addressed by 4-part key
  o Row
  o Column family
  o Column qualifier
  o Timestamp
Conceptual Table Representation
Actual Table Representation
Anatomy of a Key
• Column Family is 8-bit
• Timestamp and Revision are 64-bit integers (nanoseconds since Epoch)
• Simple byte-wise comparison
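To make "simple byte-wise comparison" concrete, here is a minimal sketch of one way a 4-part key could be serialized so that plain byte comparison yields the desired order. The layout is an assumption for illustration, not Hypertable's actual on-disk format: NUL terminators after the row and qualifier, a single family byte, and an inverted big-endian timestamp so that newer versions sort first. The revision field is omitted, and `encode_key` is a hypothetical helper.

```python
import struct

MAX_U64 = 2**64 - 1


def encode_key(row, family_code, qualifier, timestamp_ns):
    """Serialize (row, family, qualifier, timestamp) into order-preserving bytes.

    Assumed layout (illustrative only):
    row + NUL + 1-byte family + qualifier + NUL + 8-byte inverted timestamp.
    """
    if not 0 <= family_code <= 255:
        raise ValueError("column family code must fit in 8 bits")
    if b"\x00" in row or b"\x00" in qualifier:
        raise ValueError("row and qualifier must not contain NUL bytes")
    # Inverting the timestamp makes larger (newer) values compare as smaller.
    inverted = MAX_U64 - timestamp_ns
    return (row + b"\x00" + bytes([family_code]) + qualifier + b"\x00"
            + struct.pack(">Q", inverted))
```

Because the fields are concatenated in significance order, sorting the encoded keys as raw bytes groups cells by row, then family, then qualifier, with the newest version of each cell first.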
Architecture
Table Growth Process
How Scaling Works
High Level Architecture
RangeServer Insert Handling
RangeServer Query Handling
Cellstore Format
Bloom Filter
Request Routing
Administration
Cluster Task Automation Tool
• ht_cluster
• Modeled after Capistrano
• Role
  o Designates a function or service and the set of machines that will perform that function or service
  o Examples: Hyperspace, Master, Slave (RangeServer), ThriftBroker
  o Machines can belong to one or more roles
• Task
  o Script written for specific roles and used to manage the associated function or service
  o Examples: start_hyperspace, stop_hyperspace
cluster.def
INSTALL_PREFIX=/opt/hypertable
HYPERTABLE_VERSION=0.9.8.2
PACKAGE_FILE=/tmp/hypertable-0.9.8.2-linux-x86_64.tar.gz
FS=hadoop
HADOOP_DISTRO=cdh4
ORIGIN_CONFIG_FILE=/root/hypertable.cfg
PROMPT_CLEAN=true
role: source test00
role: master test[00-02]
role: hyperspace test[00-02]
role: slave test[03-99] - test37
role: thriftbroker
role: spare
include: "core.tasks"
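The role lines use a compact host notation: a bracketed numeric range, optionally followed by `-` and a host to exclude. The sketch below shows how such a specification might expand into a host list. It is a hypothetical re-implementation for illustration, not the ht_cluster parser, and it assumes that each `-` excludes only the pattern immediately after it and that zero-padding follows the width of the range start.

```python
import re

_RANGE = re.compile(r"^(.*)\[(\d+)-(\d+)\](.*)$")


def expand_pattern(pattern):
    """Expand 'test[00-02]' into ['test00', 'test01', 'test02']."""
    match = _RANGE.match(pattern)
    if not match:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep the zero-padding of the range start
    return [f"{prefix}{n:0{width}d}{suffix}"
            for n in range(int(start), int(end) + 1)]


def expand_role(spec):
    """Expand a role's host spec, e.g. 'test[03-99] - test37'."""
    hosts, excluded = [], set()
    subtract_next = False
    for token in spec.split():
        if token == "-":
            subtract_next = True
            continue
        names = expand_pattern(token)
        if subtract_next:
            excluded.update(names)
            subtract_next = False
        else:
            hosts.extend(names)
    return [host for host in hosts if host not in excluded]
```

Under these assumptions, the `slave` role above covers test03 through test99 with test37 left out.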
Common Tasks
ht cluster start
ht cluster stop
ht cluster push_config
ht cluster install_package
ht cluster upgrade
Monitoring
Ganglia Metrics
Thrift Broker Metrics
Metric            Units
Connections       count
Requests          requests/s
Errors            errors/s
Virtual Memory    GB
Resident Memory   GB
Heap Size         GB
Heap Slack Bytes  GB
CPU user          percentage
CPU sys           percentage
Version           string
Range Server Metrics
Metric                Units
Scans                 scans/s
Updates               updates/s
Bytes Returned        bytes/s
Bytes Scanned         bytes/s
Byte Scan Yield       percentage
Bytes Written         bytes/s
Cells Returned        cells/s
Cells Scanned         cells/s
Cell Scan Yield       percentage
Outstanding Scanners  count
Request Backlog       count
Major Compactions     count
Minor Compactions     count
Merging Compactions   count
GC Compactions        count
Virtual Memory        GB
Resident Memory       GB
Heap Size             GB
Heap Slack Bytes      GB
Tracked Memory        GB
CPU user              percentage
CPU sys               percentage
Ranges                count
CellStores            count
Block Cache Hits      percentage
Block Cache Memory    GB
Block Cache Fill      GB
Query Cache Hits      percentage
Query Cache Memory    GB
Query Cache Fill      GB
Version               string
FS Broker Metrics
Metric            Units
Read Throughput   MB/s
Write Throughput  MB/s
Syncs             syncs/s
Sync Latency      milliseconds
Errors            count
JVM GCs           count
JVM GC Time       milliseconds
JVM Heap Size     GB
Virtual Memory    GB
Resident Memory   GB
Heap Size         GB
Heap Slack Bytes  GB
CPU user          percentage
CPU sys           percentage
Version           string
Master and Hyperspace Metrics
Master
Metric            Units
Operations        operations/s
Virtual Memory    GB
Resident Memory   GB
Heap Size         GB
Heap Slack Bytes  GB
CPU user          percentage
CPU sys           percentage
Version           string
Hyperspace
Metric            Units
Requests          requests/s
Virtual Memory    GB
Resident Memory   GB
Heap Size         GB
Heap Slack Bytes  GB
CPU user          percentage
CPU sys           percentage
Version           string
Slow Query Log
• ThriftBroker feature
• Logs queries that take longer than 10 seconds
• Log line format
  o End time (seconds)
  o Start time (seconds)
  o Function called
  o Client IP/port
  o Latency (milliseconds)
  o Sub-scanner count
  o Bytes Returned
  o Bytes Scanned
  o Disk read
  o Servers contacted
  o Namespace
  o HQL representation of query
Features
Namespaces
USE '/';
CREATE NAMESPACE foo;
USE foo;
CREATE NAMESPACE bar;
CREATE TABLE mytable (a, b, c);
GET LISTING;
(bar) namespace mytable
Atomic Counters
• Column option:
CREATE TABLE counts ( url COUNTER );
• Modified via existing API using specially formatted values:
Value Format  Description
[+]n          Increment counter by n
-n            Decrement counter by n
=n            Reset counter to n
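The value formats above can be read as a tiny instruction set applied to the current counter. The following sketch interprets them in plain Python; `apply_counter_update` is a hypothetical helper that mirrors the table, not code from Hypertable, and it assumes `[+]n` means the plus sign is optional.

```python
def apply_counter_update(current, value):
    """Apply a counter-formatted value string to the current count.

    '+n' or 'n' increments, '-n' decrements, '=n' resets to n.
    """
    value = value.strip()
    if value.startswith("="):
        return int(value[1:])
    # int() accepts an optional leading sign, so '+5', '5' and '-3' all parse.
    return current + int(value)
```

Because the update is expressed as a delta in the cell value itself, a client can increment a counter through the ordinary insert path without first reading the current count.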
Secondary Indexes
Total Cells Inserted: 1 billion
Total Time Taken: 45 minutes
Aggregate Throughput (inserts/s): 372,362
Aggregate Throughput (bytes/s): 14,763,300
§ Six test machines
  - Dual Six-core Opteron HE Processors
  - 24 GB RAM
  - 4X 2TB SATA drives
§ Single Indexed column
  - Key: randomly generated 20-byte integer
  - Value: two randomly chosen words from /usr/share/dict/words
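As a sanity check, the reported figures can be cross-checked against each other with simple arithmetic. The snippet below uses only the numbers stated on the slide and derives the elapsed time and average bytes per cell; the variable names are illustrative.

```python
# Figures as reported on the slide.
total_cells = 1_000_000_000
inserts_per_second = 372_362
bytes_per_second = 14_763_300

# Elapsed time implied by the aggregate insert rate.
elapsed_seconds = total_cells / inserts_per_second
elapsed_minutes = elapsed_seconds / 60

# Average payload per inserted cell implied by the two throughput figures.
bytes_per_cell = bytes_per_second / inserts_per_second

print(f"elapsed: {elapsed_minutes:.1f} minutes")
print(f"average cell size: {bytes_per_cell:.1f} bytes")
```

The implied elapsed time comes out just under the quoted 45 minutes, and the average of roughly 40 bytes per cell is plausible for a 20-byte key plus two dictionary words.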
Secondary Indexes (HQL)
CREATE TABLE products (
  title,
  section,
  info,
  category,
  INDEX section,
  INDEX info,
  QUALIFIER INDEX info,
  QUALIFIER INDEX category
);
Secondary Indexes
SELECT title FROM products WHERE info:actor = "Jack Nicholson";
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
Secondary Indexes
SELECT title, info:author FROM products WHERE info:author =~ /^Stephen [PK]/;
0307743659 title The Shining Mass Market Paperback
0307743659 info:author Stephen King
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
0321776402 info:author Stephen Prata
Secondary Indexes
SELECT title FROM products WHERE Exists(info:studio);
B00002VWE0 title Five Easy Pieces (1970)
B000Q66J1M title 2001: A Space Odyssey [Blu-ray]
B002VWNIDG title The Shining (1980)
Secondary Indexes
SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =~ /^Anchor/;
0307743659 title The Shining Mass Market Paperback
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
Secondary Indexes
SELECT title FROM products WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =~ /^Anchor/;
0307743659 title The Shining Mass Market Paperback
Secondary Indexes
SELECT title FROM products WHERE ROW =^ 'B' AND info:actor = 'Jack Nicholson';
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
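Conceptually, a value index maps (column, value) to row keys, while a qualifier index maps a column qualifier to the rows that contain it, which is what makes equality and `Exists` predicates cheap. The sketch below models both with dictionaries, using cells that appear in the sample output of these slides. It is an illustration of the idea only and makes no claim about how Hypertable stores its indexes; all function names are hypothetical.

```python
from collections import defaultdict

# (row, family, qualifier, value) cells taken from the slides' sample output.
CELLS = [
    ("B00002VWE0", "info", "actor", "Karen Black"),
    ("B00002VWE0", "info", "actor", "Jack Nicholson"),
    ("B000Q66J1M", "info", "actor", "Gary Lockwood"),
    ("B000Q66J1M", "info", "actor", "Keir Dullea"),
    ("B002VWNIDG", "info", "actor", "Shelley Duvall"),
    ("B002VWNIDG", "info", "actor", "Jack Nicholson"),
    ("0307743659", "info", "author", "Stephen King"),
    ("0321776402", "info", "author", "Stephen Prata"),
]


def build_indexes(cells):
    """Build a value index and a qualifier index over the given cells."""
    value_index = defaultdict(set)      # (family, qualifier, value) -> rows
    qualifier_index = defaultdict(set)  # (family, qualifier) -> rows
    for row, family, qualifier, value in cells:
        value_index[(family, qualifier, value)].add(row)
        qualifier_index[(family, qualifier)].add(row)
    return value_index, qualifier_index


VALUE_INDEX, QUALIFIER_INDEX = build_indexes(CELLS)


def rows_where_equals(family, qualifier, value):
    """Rows matching e.g. info:actor = 'Jack Nicholson' (value index lookup)."""
    return sorted(VALUE_INDEX.get((family, qualifier, value), set()))


def rows_where_exists(family, qualifier):
    """Rows matching e.g. Exists(info:author) (qualifier index lookup)."""
    return sorted(QUALIFIER_INDEX.get((family, qualifier), set()))
```

Looking up `info:actor = "Jack Nicholson"` in this model returns the same two row keys as the first query above, without scanning the table.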
Regex Filtering
• Google’s RE2 regular expression engine
  o Extremely fast (up to 50X Java regex)
  o Searches run in time linear in the size of the input
  o Searches constrained to a fixed amount of memory
• Supported Searches:
  o Row key
  o Column qualifier
  o Value
Regex Filtering
SELECT info:/^a/ FROM products;
0307743659 info:author Stephen King
0321321928 info:author Stephen C. Dewhurst
0321776402 info:author Stephen Prata
B00002VWE0 info:actor Karen Black
B00002VWE0 info:actor Jack Nicholson
B000Q66J1M info:actor Gary Lockwood
B000Q66J1M info:actor Keir Dullea
B002VWNIDG info:actor Shelley Duvall
B002VWNIDG info:actor Jack Nicholson
Regex Filtering
SELECT title FROM products
WHERE ROW REGEXP "2";
0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback]
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
Regex Filtering
SELECT title FROM products WHERE VALUE REGEXP "\(";
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
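The row and value filters above behave like an unanchored regex search over each candidate cell. The sketch below reproduces the two `REGEXP` queries over the six title cells that appear in these slides. It uses Python's standard `re` module, which is a backtracking engine and not RE2, so it illustrates the matching semantics only and says nothing about performance; `rows_matching` is a hypothetical helper.

```python
import re

# (row key, title) pairs as they appear in the sample output of these slides.
TITLES = [
    ("0307743659", "The Shining Mass Market Paperback"),
    ("0321321928", "C++ Common Knowledge: Essential Intermediate Programming [Paperback]"),
    ("0321776402", "C++ Primer Plus (6th Edition) (Developer's Library)"),
    ("B00002VWE0", "Five Easy Pieces (1970)"),
    ("B000Q66J1M", "2001: A Space Odyssey [Blu-ray]"),
    ("B002VWNIDG", "The Shining (1980)"),
]


def rows_matching(cells, row_regex=None, value_regex=None):
    """Return row keys whose row key and/or value match the given patterns."""
    row_re = re.compile(row_regex) if row_regex else None
    value_re = re.compile(value_regex) if value_regex else None
    matched = []
    for row, value in cells:
        if row_re and not row_re.search(row):
            continue
        if value_re and not value_re.search(value):
            continue
        matched.append(row)
    return matched
```

Calling `rows_matching(TITLES, row_regex="2")` and `rows_matching(TITLES, value_regex=r"\(")` selects the same rows as the two queries shown above.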
Hadoop MapReduce
• MapReduce Input/Output formats
  o Normal (mapreduce)
  o Streaming (mapred)
• Load data from HT to Hive and vice-versa
• Use Hive types
• Use Hive QL (joins, aggregations)
• Low latency data warehousing
• Uses Hypertable’s native MapReduce Input/Output format
Column Family Options
• TTL=<t>
  o “time to live”
  o Remove cells that are older than <t>
• MAX_VERSIONS=<n>
  o Keep only most recent <n> cell versions
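The two options compose naturally: TTL discards versions by age, and MAX_VERSIONS then caps how many of the survivors are kept. A minimal sketch of that pruning rule for a single cell follows; `prune_versions` is a hypothetical helper, timestamps and TTL are assumed to share the same unit, and the order in which the two rules are applied is an assumption of this sketch.

```python
def prune_versions(versions, now, ttl=None, max_versions=None):
    """Return the cell versions that survive TTL and MAX_VERSIONS, newest first.

    versions: list of (timestamp, value) pairs for one cell.
    """
    kept = sorted(versions, key=lambda version: version[0], reverse=True)
    if ttl is not None:
        # Drop versions older than ttl relative to 'now'.
        kept = [version for version in kept if now - version[0] <= ttl]
    if max_versions is not None:
        # Keep only the most recent max_versions entries.
        kept = kept[:max_versions]
    return kept
```

For example, with versions at timestamps 100, 200, and 300 and `now=310`, a TTL of 150 keeps the two newest, and adding `max_versions=1` leaves only the version at 300.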
Access Groups
CREATE TABLE User (
  name,
  address,
  photo,
  profile,
  ACCESS GROUP default (name, address, photo),
  ACCESS GROUP profile (profile)
);
Adaptive Memory Allocation
Group Commit
• Supports highly concurrent updates
• Trades average latency for better throughput
• By default, commit log writes are auto-coalesced
• Commit log write interval can be statically configured per-table:
CREATE TABLE counts ( url, domain ) GROUP_COMMIT_INTERVAL=100;
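The latency-for-throughput trade can be pictured as bucketing updates by arrival time and writing each bucket to the commit log once. The sketch below models only that coalescing step; `group_commits` is a hypothetical helper, and it assumes the interval is expressed in the same unit as the arrival times (the slide does not state the unit of GROUP_COMMIT_INTERVAL).

```python
def group_commits(arrivals, interval):
    """Coalesce updates into one batch per interval window.

    arrivals: list of (arrival_time, update) pairs.
    Returns a list of batches in time order; each batch models one log write.
    """
    batches = {}
    for arrival_time, update in sorted(arrivals, key=lambda a: a[0]):
        window = arrival_time // interval
        batches.setdefault(window, []).append(update)
    return [batches[window] for window in sorted(batches)]
```

Five updates arriving at times 5, 40, 99, 100, and 250 with an interval of 100 would result in three log writes instead of five, at the cost of the early arrivals in each window waiting for the batch to close.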
Caching
• Block Cache
  o Caches CellStore blocks
  o Can be configured to store blocks compressed or uncompressed (default = compressed)
  o Dynamically adjusted size based on workload
• Query Cache
  o Caches query results
  o Caches single row queries only
Compression
• Cell Store blocks are compressed
• Commit Log updates are compressed
• Supported Compression Schemes: bmz, lzo, quicklz, snappy, zlib, none
• Quicklz performance numbers:
Language  Compression Speed (MB/s)  Decompression Speed (MB/s)
C++       308                       358
Java      127                       95
Performance Study
Hypertable vs. HBase
• Modeled after test described in Bigtable paper
• Hypertable 0.9.5.5 vs. HBase 0.90.4
• 16-node Cluster
  o CPU: 2X AMD C32 Six-core model 4170 HE 2.1GHz
  o RAM: 24GB
  o Disk: 4X 2TB SATA
• Tests Run
  o Random Write
  o Scan
  o Random Read Zipfian
  o Random Read Uniform
Random Write
Scan
Random Read Zipfian
Case Studies
Case Study: Noah System
• Operational Data Store
• System metrics
  o CPU
  o Memory
  o IO
  o Network
• Application metrics
  o Web
  o DB
  o Caches
• Business metrics
  o Usage
  o Revenue
System Requirements
• Storage Capacity
  o Up to 100TB
  o Up to 1 trillion records
• Automatic Sharding
  o Irregular data growth patterns
• Heavy Writes
  o ~30K inserts/s
• Fast Reads of Recent Data
• Table Scans
Architecture Diagram
Case Study: Rediff
• 2nd Largest Indian Internet Portal
• Rediffmail
  o One of the world’s largest email services
  o Over 100 Million registered users
• Active Deployments
  o Rediffmail
  o Email SPAM classification
  o News Crawl Database
  o Recommendation System
Architectural Overview
Query Latency
Summary
• High Performance • Open Source • Future Direction SQL
The End