1
CS 440 Database Management Systems
Lecture 14: NoSQL & NewSQL, Cont’d.
Some slides due to Magda Balazinska
2
Scaling by partitioning & replication
• Partition the data across machines
• Replicate the partitions
  – Good:
    • spread read queries across replicas
  – Bad:
    • must keep the replicas consistent after write queries
  – Ugly:
    • difficult to scale transactions
      – two-phase commit is expensive
    • difficult to scale complex operations
3
NoSQL: Not Only SQL/ Not relational
• Goals
  – highly scalable data management system
  – flexible data model: records with different schemas
• They are willing to give up
  – Complex queries
    • e.g., no joins
  – ACID guarantees
    • weaker versions, e.g., eventual consistency
  – Multi-object transactions
• Not all NoSQL systems give up all these properties
4
NoSQL key features
• Scale simple operations horizontally
  – key lookups, reads and writes of one record or a small number of records, simple selections
• Replicate/distribute data over many servers
• Simple call-level interface (contrast w/ SQL)
• Weaker concurrency model than ACID
• Efficient use of distributed indexes and RAM
• Flexible schema
5
Different types of NoSQL
Taxonomy based on the data models:
• Key-value stores
  – e.g., Dynamo, Project Voldemort, Memcached
• Document stores
  – e.g., SimpleDB, CouchDB, MongoDB
• Extensible record stores
  – e.g., Bigtable, HBase, Cassandra
• NewSQL: new type of RDBMSs
  – e.g., Megastore, VoltDB
6
Key-Value stores features
• Data model: (key, value) pairs
  – values are binary objects
  – no further schema
• Operations
  – insert, delete, and lookup operations on keys
  – no operation across multiple data items
• Consistency
  – replication with eventual consistency
    • e.g., vector clocks in Dynamo
  – goal is to NEVER reject any writes (rejecting writes is bad for business!)
  – multiple versions with conflict resolution during reads
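A minimal sketch of how vector clocks can separate superseded versions from truly concurrent ones at read time. This is an illustration only, not Dynamo's actual code; the function names (increment, descends, reconcile) and the node names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions):
    """Drop versions superseded by another version's clock.

    versions is a list of (clock, value) pairs. Whatever survives is a set
    of concurrent siblings that the reader has to merge.
    """
    survivors = []
    for clock, value in versions:
        superseded = any(
            other != clock and descends(other, clock)
            for other, _ in versions
        )
        if not superseded:
            survivors.append((clock, value))
    return survivors


# Two replicas (hypothetical nodes "B" and "C") accept writes on top of the
# same base version without seeing each other: neither write is rejected.
base = increment({}, "A")
left = increment(base, "B")
right = increment(base, "C")
siblings = reconcile([(base, "v1"), (left, "v2"), (right, "v3")])
```

Here the base version is dropped because both later clocks descend from it, while the two later writes survive as siblings for the reader to resolve.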
7
Key-Value stores features
• Use replication to provide fault-tolerance
• Quorum replication in Dynamo
  – Each update creates a new version of an object
  – Vector clocks track causality between versions
  – Parameters:
    • N = number of copies (replicas) of each object
    • R = minimum number of nodes that must participate in a successful read
    • W = minimum number of nodes that must participate in a successful write
    • Quorum: R + W > N
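Why R + W > N matters: it guarantees that every read set shares at least one replica with every write set, so a read always reaches a node that saw the latest successful write. The brute-force check below is a small sketch of that argument; the function name is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Check every possible read set (size r) against every write set (size w)."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# With N = 3: R = W = 2 forces an overlap, R = W = 1 does not.
strict = quorums_always_overlap(3, 2, 2)
sloppy = quorums_always_overlap(3, 1, 1)
```

Small R gives fast reads and small W gives fast writes, but the two cannot both be small if reads are to see the latest write.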
8
Key-Value stores internals
• Only primary index: lookup by key
  – No secondary indexes!
• Data remains in main memory
• Most systems also offer a persistence option
• Some offer ACID transactions, others do not
  – Multiversion concurrency control or locking
Multiversion Concurrency Control
• Idea: Let writers make a “new” copy while readers use an appropriate “old” copy.
[Figure: object O in the MAIN SEGMENT (current versions of DB objects); older versions O’ and O’’ in the VERSION POOL (older versions that may be useful for some active readers).]
• Readers are always allowed to proceed.
  – But may be blocked until writer commits.
Multiversion CC (Contd.)
• Each version of an object has its writer’s TS as its WTS, and the TS of the Xact that most recently read this version as its RTS.
• Versions are chained backward; we can discard versions that are “too old to be of interest”.
• Each Xact is classified as Reader or Writer.
  – Writer may write some object; Reader never will.
  – Xact declares whether it is a Reader when it begins.
Reader Xact
• For each object to be read:
  – Finds newest version with WTS < TS(T). (Starts with current version in the main segment and chains backward through earlier versions.)
• Assuming that some version of every object exists from the beginning of time, Reader Xacts are never restarted.
  – However, might block until writer of the appropriate version commits.
[Figure: WTS timeline from old to new, with T marked.]
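The reader protocol can be sketched as a walk down the version chain. This is a simplified illustration: the Version class and read function are hypothetical names, and blocking until the writer of the chosen version commits is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    wts: int                          # TS of the Xact that wrote this version
    rts: int                          # TS of the Xact that most recently read it
    prev: Optional["Version"] = None  # backward pointer to the older version


def read(current, ts):
    """Return the value of the newest version with WTS < ts."""
    version = current
    while version is not None and not (version.wts < ts):
        version = version.prev        # chain backward through older versions
    if version is None:
        raise LookupError("no version with WTS < TS(T)")
    version.rts = max(version.rts, ts)
    return version.value


# Chain from oldest to newest: WTS 0, then 5, then 10.
v0 = Version("initial", wts=0, rts=0)
v5 = Version("second", wts=5, rts=5, prev=v0)
v10 = Version("third", wts=10, rts=10, prev=v5)
seen_by_t7 = read(v10, ts=7)
```

A reader with TS 7 skips the version written at 10 and reads the one written at 5, so it never needs to be restarted.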
Writer Xact
• To read an object, follows reader protocol.
• To write an object:
  – Finds newest version V s.t. WTS < TS(T).
  – If RTS(V) < TS(T), T makes a copy CV of V, with a pointer to V, with WTS(CV) = TS(T), RTS(CV) = TS(T). (Write is buffered until T commits; other Xacts can see TS values but can’t read version CV.)
  – Else, reject write.
[Figure: WTS timeline from old to new, showing V, RTS(V), T, and the new copy CV.]
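A rough sketch of the write rule, under simplifying assumptions: the Version class, write function, and WriteRejected exception are hypothetical names, and buffering the write until T commits is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    wts: int                          # TS of the Xact that wrote this version
    rts: int                          # TS of the Xact that most recently read it
    prev: Optional["Version"] = None  # backward pointer to the older version


class WriteRejected(Exception):
    """Raised when a later reader has already read the version to be replaced."""


def write(head, ts, value):
    """Apply the writer protocol and return the head of the version chain."""
    newer, version = None, head
    while version is not None and not (version.wts < ts):
        newer, version = version, version.prev
    if version is None or not (version.rts < ts):
        raise WriteRejected(f"cannot write at TS {ts}")
    copy = Version(value, wts=ts, rts=ts, prev=version)  # CV points back to V
    if newer is None:
        return copy                   # CV becomes the current version
    newer.prev = copy                 # CV slots in between V and a newer version
    return head


head = Version("a", wts=0, rts=0)
head = write(head, ts=5, value="b")
head.rts = 9                          # an Xact with TS 9 has since read "b"
```

With this state, a writer with TS 7 is rejected: the reader with TS 9 already saw version "b", and a new version stamped 7 would invalidate that read.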
13
Check out DynamoDB!
http://aws.amazon.com/dynamodb/
15
Document stores
• A "document" is a pointer-less object
  – e.g., JSON
  – nested or not
  – schema-less
• They may have secondary indexes.
• Scalability
  – Replication (e.g., SimpleDB, CouchDB: the entire db is replicated)
  – Sharding (MongoDB)
  – Both
16
Amazon SimpleDB (1/3)
• Partitioning
  – Data partitioned into domains: queries run within a domain
  – Domains seem to be the unit of replication. Limit 10GB
  – Can use domains to manually create parallelism
• Data Model / Schema
  – No fixed schema
  – Objects are defined with attribute-value pairs
17
Amazon SimpleDB (2/3)
• Indexing
  – Automatically indexes all attributes
• Support for writing
  – PUT and DELETE items in a domain
• Support for querying
  – GET by key
  – Selection + sort:
    SELECT output_list FROM domain_name [where expression] [sort_instructions] [limit limit]
  – A simple form of aggregation: count
  – Query is limited to 5s and 1MB output (but can continue)
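To make the "index every attribute" idea concrete, here is a toy in-memory domain. It is not the SimpleDB API: the Domain class and its method names are invented for illustration, and select only supports an equality test on one attribute.

```python
from collections import defaultdict


class Domain:
    """Toy schema-less domain that indexes every attribute of every item."""

    def __init__(self):
        self.items = {}                                     # item name -> attributes
        self.index = defaultdict(lambda: defaultdict(set))  # attr -> value -> names

    def put(self, name, attrs):
        self.delete(name)                                   # replace any old copy
        self.items[name] = dict(attrs)
        for attr, value in attrs.items():
            self.index[attr][value].add(name)

    def delete(self, name):
        for attr, value in self.items.pop(name, {}).items():
            self.index[attr][value].discard(name)

    def get(self, name):
        return self.items.get(name)

    def select(self, attr, value, sort_by=None, limit=None):
        """Names of items where attr == value, optionally sorted and limited."""
        names = sorted(self.index[attr][value])
        if sort_by is not None:
            names = [n for n in names if sort_by in self.items[n]]
            names.sort(key=lambda n: self.items[n][sort_by])
        return names[:limit]


domain = Domain()
domain.put("item1", {"color": "red", "size": "2"})
domain.put("item2", {"color": "red", "size": "1"})
domain.put("item3", {"color": "blue"})             # different attributes are fine
red_by_size = domain.select("color", "red", sort_by="size")
red_count = len(domain.select("color", "red"))     # the "count" aggregate
```

Because every attribute is indexed on write, a selection on any attribute is an index lookup rather than a scan of the domain.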
18
Amazon SimpleDB (3/3)
• Availability and consistency
  – Data is stored redundantly across multiple servers
  – Takes time for the update to propagate to all locations
    • Eventually consistent, but an immediate read might not show the change
  – Choose between consistent or eventually consistent read
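The difference between the two kinds of read can be simulated in a few lines. This is a deliberately simplified model, not how SimpleDB is implemented; the ReplicatedItem class and its methods are hypothetical.

```python
class ReplicatedItem:
    """One item stored on several replicas as (version, value) pairs."""

    def __init__(self, replicas=3):
        self.replicas = [(0, None)] * replicas

    def write(self, value):
        """Apply the update on replica 0 only; the others lag behind."""
        version = max(v for v, _ in self.replicas) + 1
        self.replicas[0] = (version, value)

    def propagate(self):
        """Background step: copy the newest version to every replica."""
        newest = max(self.replicas, key=lambda pair: pair[0])
        self.replicas = [newest] * len(self.replicas)

    def read(self, replica, consistent=False):
        if consistent:
            # Consult all replicas and return the newest version.
            return max(self.replicas, key=lambda pair: pair[0])[1]
        # Eventually consistent: answer from whichever replica was asked.
        return self.replicas[replica][1]


item = ReplicatedItem()
item.write("new address")
stale = item.read(replica=2)                    # may not show the change yet
fresh = item.read(replica=2, consistent=True)   # always shows it
item.propagate()
after = item.read(replica=2)
```

The consistent read costs more because it has to consult more replicas, which is why the choice is left to the application.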
20
Extensible record stores
• Data model is rows and columns
• Typical access: Row ID, Column ID, Timestamp
• Scalability by splitting rows and columns over nodes
  – Rows: sharding on primary key
  – Columns: "column groups" = indication of which columns are to be stored together (e.g., customer name/address group, financial info group, login info group)
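The two splits can be sketched as follows. Hash-based sharding is an assumption made for this example (range partitioning on the key is equally possible), and the column group names and columns are hypothetical.

```python
import hashlib

# Hypothetical column groups: columns in the same group are stored together.
COLUMN_GROUPS = {
    "contact": ["name", "address"],
    "financial": ["balance"],
    "login": ["username", "last_login"],
}


def shard_for(row_id, num_shards):
    """Horizontal split: choose a shard from a stable hash of the primary key."""
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def split_row(row):
    """Vertical split: break one row into its per-group pieces."""
    return {
        group: {col: row[col] for col in columns if col in row}
        for group, columns in COLUMN_GROUPS.items()
    }


shard = shard_for("customer-42", num_shards=4)
pieces = split_row({"name": "Ann", "address": "1 Main St", "balance": 10})
```

A query that touches only the login columns then reads one small group instead of the whole row.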
21
Google Bigtable
• Distributed storage system
• Designed to store structured data
• Scale to thousands of servers
• Store up to several hundred terabytes (maybe even petabytes)
• Perform backend bulk processing
• Perform real-time data serving
• To scale, Bigtable has a limited set of features
22
Bigtable data model
• Sparse, multidimensional sorted map
(row:string, column:string, time:int64) → string
Columns are grouped into families
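One way to picture the map is a dictionary keyed by (row, column, time). This is a conceptual sketch, not Bigtable's client API; the SparseTable class is hypothetical, and the "family:qualifier" column names and row key are illustrative.

```python
class SparseTable:
    """Toy version of the map (row, column, time) -> string."""

    def __init__(self):
        self.cells = {}                 # only cells that exist take up space

    def put(self, row, column, time, value):
        self.cells[(row, column, time)] = value

    def get(self, row, column, time=None):
        """Newest value at or before time (newest overall if time is None)."""
        versions = [
            (t, value)
            for (r, c, t), value in self.cells.items()
            if r == row and c == column and (time is None or t <= time)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, row_prefix):
        """Cell keys in sorted order for all rows starting with row_prefix."""
        return sorted(key for key in self.cells if key[0].startswith(row_prefix))


table = SparseTable()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 3, "<html>v3</html>")
table.put("com.example.www", "anchor:other.example", 2, "Example")
latest = table.get("com.example.www", "contents:")
as_of_2 = table.get("com.example.www", "contents:", time=2)
```

The part of the column name before the colon plays the role of the column family.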
23
Bigtable key features
• Reads/writes of data under a single row key are atomic
  – Only single-row transactions!
• Data is stored in lexicographical order by row key
  – Improves data access locality
  – Horizontally partitioned into tablets
  – Tablets are unit of distribution and load balancing
• Column families are unit of access control
• Data is versioned (old versions garbage collected)
  – Ex: most recent three crawls of each page, with times
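Two of these features are easy to sketch: locating the tablet for a row key in a sorted key space, and garbage-collecting all but the newest versions of a cell. The split keys and function names below are hypothetical; this is not Bigtable's implementation.

```python
import bisect


def tablet_for(row_key, split_keys):
    """Index of the tablet holding row_key.

    split_keys is a sorted list; tablet i holds the keys from split_keys[i - 1]
    (inclusive) up to split_keys[i] (exclusive).
    """
    return bisect.bisect_right(split_keys, row_key)


def garbage_collect(versions, keep=3):
    """Keep only the newest `keep` versions of a cell (timestamp -> value)."""
    newest = sorted(versions, reverse=True)[:keep]
    return {time: versions[time] for time in newest}


splits = ["g", "n"]                         # three tablets: < "g", "g".."n", >= "n"
tablets = [tablet_for(key, splits) for key in ["apple", "grape", "melon", "zebra"]]
crawls = garbage_collect({1: "crawl-1", 2: "crawl-2", 3: "crawl-3", 4: "crawl-4"})
```

Because keys are sorted, rows with nearby keys land in the same tablet, which is where the locality benefit comes from.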
24
Bigtable API
• Data definition
  – Creating/deleting tables or column families
  – Changing access control rights
• Data manipulation
  – Writing or deleting values
  – Looking up values from individual rows
  – Iterating over subset of data in the table
• Can select on rows, columns, and timestamps
25
HBase
• Open source implementation of Bigtable
  http://hbase.apache.org/
27
Scalable RDBMS: NewSQL
• Means RDBMSs that offer sharding
• Key difference:
  – NoSQL systems make it difficult or impossible to perform large-scope operations and transactions (to ensure performance), while scalable RDBMSs do not preclude these operations; users pay a price only when they need them.
• Megastore, VoltDB, MySQL Cluster, Clustrix, ScaleDB
28
Megastore
• Implemented over Bigtable, used within Google
• Megastore is a layer on top of Bigtable
  – Transactions that span nodes
  – A database schema defined in a SQL-like language
  – Hierarchical paths that allow some limited joins
• Megastore is made available through the Google App Engine Datastore
29
VoltDB
• Main-memory RDBMS: no disk I/O, no buffer management!
• Sharded across a shared-nothing cluster
  – One transaction = one stored procedure
  – So both the data and processing are partitioned
• Transaction processing
  – SQL execution single-threaded for each shard
  – Avoids all locking and latching overheads
• Synchronous multi-master replication for HA
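The "one thread per shard" idea can be sketched with a queue in front of each shard. This only illustrates the execution model and is not VoltDB's API: the Shard and Cluster classes and the deposit procedure are hypothetical, and replication is not modeled.

```python
import queue
import threading


class Shard:
    """Owns one partition of the data and runs procedures one at a time."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The only thread that ever touches self.data, so no locks or latches.
        while True:
            procedure, args, reply = self.inbox.get()
            try:
                reply.put((True, procedure(self.data, *args)))
            except Exception as exc:
                reply.put((False, exc))

    def call(self, procedure, *args):
        reply = queue.Queue(maxsize=1)
        self.inbox.put((procedure, args, reply))
        ok, result = reply.get()
        if not ok:
            raise result
        return result


class Cluster:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def call(self, procedure, partition_key, *args):
        """Route the whole transaction to the shard that owns the key."""
        shard = self.shards[hash(partition_key) % len(self.shards)]
        return shard.call(procedure, partition_key, *args)


def deposit(data, account, amount):
    """A 'stored procedure': the entire transaction runs on one shard."""
    data[account] = data.get(account, 0) + amount
    return data[account]


cluster = Cluster(num_shards=2)
balance = cluster.call(deposit, "acct-1", 10)
```

Many client threads can call the cluster at once; each shard still executes its procedures serially, so the read-modify-write inside deposit needs no lock.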
30
Application 1
• Web application that needs to display lots of customer information; the users' data is rarely updated, and when it is, you know when it changes because updates go through the same interface.
31
Application 2
• Department of Motor Vehicles: look up objects by multiple fields (driver's name, license number, birth date, etc.); "eventual consistency" is ok, since updates are usually performed at a single location.
32
Application 3
• eBay-style application. Cluster customers by country; separate the rarely changed "core" customer information (address, email) from frequently updated info (current bids).
33
Application 4
• Everything else (e.g. a serious DMV application)
34
Criticism (from Stonebraker, CACM 2011)
• No ACID Equals No Interest in enterprises
  – Screwing up mission-critical data is no-no-no
• Low-level Query Language is Death
  – Before SQL
• NoSQL means NoStandards
  – One (typical) large enterprise has 10,000 databases. These need accepted standards