
CS 405G: Introduction to Database Systems

24 NoSQL

Reuses some slides by Jennifer Widom

Chen Qian University of Kentucky

Summary

Tree-based indexes: O(log N) for search and update, support range queries

Hash-based indexes: best for equality searches, O(1) expected; cannot support range searches

Static and dynamic
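The contrast between the two index families can be sketched in a few lines. This is only an illustration under stated assumptions: a Python dict stands in for a hash-based index, a sorted list searched with `bisect` stands in for a tree-based index (it shows the O(log N) search and the range query, but not the O(log N) update of a real B+-tree), and the `entries` data and the `range_query` helper are hypothetical.

```python
import bisect

# Hypothetical (key, row_id) entries, for illustration only.
entries = [(17, "r1"), (42, "r2"), (8, "r3"), (23, "r4"), (35, "r5")]

# Stand-in for a hash-based index: expected O(1) equality lookup,
# but the keys carry no order, so a range query would need a full scan.
hash_index = {key: row for key, row in entries}

# Stand-in for a tree-based index: keys kept in sorted order.
sorted_keys = sorted(key for key, _ in entries)

def range_query(lo, hi):
    """Return all keys k with lo <= k <= hi using two O(log N) bisections."""
    left = bisect.bisect_left(sorted_keys, lo)
    right = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[left:right]

print(hash_index[23])        # equality search on the hash stand-in
print(range_query(10, 36))   # range search on the sorted stand-in
```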

04/19/23 Chen Qian @ University of Kentucky

NoSQL: The Name

“SQL” = Traditional relational DBMS
Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS
“NoSQL” = “No SQL” = Not using traditional relational DBMS
“No SQL” ≠ Don’t use SQL language

“NoSQL” = “Not Only SQL”

NoSQL Systems: Motivation

Not every data management/analysis problem is best solved using a traditional DBMS

Database Management System (DBMS) provides….

… efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.


NoSQL Systems

Alternative to traditional relational DBMS
+ Flexible schema
+ Quicker/cheaper to set up
+ Massive scalability
+ Relaxed consistency → higher performance & availability
– No declarative query language → more programming
– Relaxed consistency → fewer guarantees

Example #1: Web log analysis

Each record: UserID, URL, timestamp, additional-info

Task: Load into database system
  Data cleaning
  Data extraction
  Verification
  Schema
None of the above is needed for NoSQL!

Task: Find all records for…
  Given UserID
  Given URL
  Given timestamp
  Certain construct appearing in additional-info
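As a rough illustration of this task without any load step or schema, the records can be scanned directly from raw log lines. The tab-separated layout, the sample lines, and the `parse` and `find` helpers are assumptions made for this sketch; the slides do not specify a file format.

```python
# Hypothetical raw log lines: UserID, URL, timestamp, additional-info,
# assumed here to be tab-separated.
log_lines = [
    "u1\thttp://a.example/x\t2023-04-19T10:00:00\tbrowser=Firefox",
    "u2\thttp://b.example/y\t2023-04-19T10:05:00\tbrowser=Chrome ref=ad",
    "u1\thttp://b.example/y\t2023-04-19T11:00:00\tbrowser=Firefox ref=ad",
]

def parse(line):
    """Split one raw line into its four fields; no schema is declared anywhere."""
    user_id, url, timestamp, info = line.split("\t", 3)
    return {"user": user_id, "url": url, "ts": timestamp, "info": info}

def find(lines, predicate):
    """Scan every record and keep those for which predicate(record) is true."""
    return [rec for rec in map(parse, lines) if predicate(rec)]

by_user = find(log_lines, lambda r: r["user"] == "u1")
by_url = find(log_lines, lambda r: r["url"] == "http://b.example/y")
by_time = find(log_lines, lambda r: r["ts"] == "2023-04-19T10:05:00")
by_info = find(log_lines, lambda r: "ref=ad" in r["info"])
```

Every query here is a full scan, which is highly parallelizable but does no indexing; that trade-off is part of what the following slides discuss.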

Each record: UserID, URL, timestamp, additional-info
Separate records: UserID, name, age, gender, …

Task: Find average age of users accessing a given URL

May not require strict consistency.
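A minimal sketch of this task, assuming small in-memory collections: the `accesses` and `users` data and the `average_age` helper are hypothetical, and counting each distinct user once (rather than once per access) is a choice made for this sketch, not something the slide specifies.

```python
# Hypothetical access records: (UserID, URL, timestamp, additional-info).
accesses = [
    ("u1", "http://a.example/x", "2023-04-19T10:00:00", ""),
    ("u2", "http://a.example/x", "2023-04-19T10:05:00", ""),
    ("u3", "http://b.example/y", "2023-04-19T10:07:00", ""),
    ("u1", "http://a.example/x", "2023-04-19T11:00:00", ""),
]

# Hypothetical separate user records, keyed by UserID.
users = {
    "u1": {"name": "Ann", "age": 30},
    "u2": {"name": "Bob", "age": 40},
    "u3": {"name": "Cat", "age": 50},
}

def average_age(url):
    """Average age of the distinct users who accessed url, or None if nobody did."""
    visitor_ids = {uid for uid, accessed_url, _, _ in accesses if accessed_url == url}
    ages = [users[uid]["age"] for uid in visitor_ids if uid in users]
    return sum(ages) / len(ages) if ages else None

print(average_age("http://a.example/x"))
```

An aggregate like this is barely affected if a few recent accesses are missing or stale, which is why strict consistency may not be needed.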


Example #2: Social-network graph

Each record: UserID1, UserID2

Separate records: UserID, name, age, gender, …

Task: Find all friends of friends of friends of … friends of given user

Large number of joins?
Not efficient at all!
A specially designed graph database may be better
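To make the friends-of-friends task concrete, here is a sketch of a breadth-first traversal over an adjacency structure, which is roughly the access pattern a graph system is built around. The `friend_pairs` data and the helper names are hypothetical, and friendship is assumed to be symmetric.

```python
from collections import deque

# Hypothetical (UserID1, UserID2) friendship records.
friend_pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "f")]

def build_adjacency(pairs):
    """Turn pair records into a user -> set-of-friends mapping (symmetric)."""
    adjacency = {}
    for u, v in pairs:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def within_hops(adjacency, start, max_hops):
    """Users reachable from start in at most max_hops friendship steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in adjacency.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    seen.discard(start)
    return seen

adjacency = build_adjacency(friend_pairs)
print(sorted(within_hops(adjacency, "a", 2)))
```

In a relational system each extra hop corresponds to one more self-join on the friendship table; here it is one more level of the traversal.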

Example #3: Wikipedia pages

Large collection of documents
Combination of structured and unstructured data

Task: Retrieve introductory paragraph of all pages about U.S. presidents before 1900

Mix of structured and unstructured data
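The mix can be sketched as documents that carry a few structured fields next to free text. Everything in this example is an assumption for illustration: the field names (`category`, `term_start`), the placeholder page text, the `intro_paragraphs` helper, and the reading of "before 1900" as "term started before 1900".

```python
# Hypothetical pages: structured fields plus unstructured text.
# Paragraphs in the text are assumed to be separated by blank lines.
pages = [
    {"title": "George Washington", "category": "U.S. president", "term_start": 1789,
     "text": "Intro paragraph about Washington.\n\nLater section text."},
    {"title": "Abraham Lincoln", "category": "U.S. president", "term_start": 1861,
     "text": "Intro paragraph about Lincoln.\n\nLater section text."},
    {"title": "Theodore Roosevelt", "category": "U.S. president", "term_start": 1901,
     "text": "Intro paragraph about Roosevelt.\n\nLater section text."},
    {"title": "Kentucky", "category": "U.S. state", "term_start": None,
     "text": "Intro paragraph about Kentucky.\n\nLater section text."},
]

def intro_paragraphs(pages, category, before_year):
    """Filter on the structured fields, then extract from the unstructured text."""
    result = {}
    for page in pages:
        start = page["term_start"]
        if page["category"] == category and start is not None and start < before_year:
            result[page["title"]] = page["text"].split("\n\n", 1)[0]
    return result

intros = intro_paragraphs(pages, "U.S. president", 1900)
```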


NoSQL Systems

Overview

NoSQL Systems

Several incarnations
  MapReduce framework: OLAP
  Key-value stores: OLTP
  Document stores
  Graph database systems

NoSQL Systems: Overview

MapReduce Framework

Originally from Google, open source Hadoop
No data model, data stored in files
User provides specific functions
  map()
  reduce()
System provides data processing “glue”, fault-tolerance, scalability

Map and Reduce Functions

Map: Divide problem into subproblems

Reduce: Do work on subproblems, combine results


MapReduce Architecture


MapReduce Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info
Task: Count number of accesses for each domain (inside URL)
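The domain-count task can be sketched as a map function and a reduce function plus a small single-process driver. This is not Hadoop's API: `run_mapreduce` is a hypothetical stand-in for the grouping "glue" that a real framework provides (along with distribution and fault tolerance), and the sample `records` are made up.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical log records: (UserID, URL, timestamp, additional-info).
records = [
    ("u1", "http://a.example/x", "t1", ""),
    ("u2", "http://b.example/y", "t2", ""),
    ("u1", "http://a.example/z", "t3", ""),
]

def map_fn(record):
    """Emit one (domain, 1) pair per log record."""
    _, url, _, _ = record
    yield urlparse(url).netloc, 1

def reduce_fn(domain, counts):
    """Combine all values emitted for one domain into a total."""
    return domain, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy driver: apply map, group values by key, apply reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())

domain_counts = run_mapreduce(records, map_fn, reduce_fn)
print(domain_counts)
```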

MapReduce Example (modified #1)

Each record: UserID, URL, timestamp, additional-info
Task: Total “value” of accesses for each domain based on additional-info
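Only the map side needs to change for this variant: instead of emitting 1 per access, it emits a score derived from additional-info. The slides do not define how "value" is computed, so the `value=N` field and the `score` helper below are purely hypothetical, and the grouping loop is a toy stand-in for the framework.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical records; additional-info is assumed to hold space-separated
# name=value fields, one of which may be "value=N".
records = [
    ("u1", "http://a.example/x", "t1", "value=5"),
    ("u2", "http://b.example/y", "t2", "value=2 ref=ad"),
    ("u1", "http://a.example/z", "t3", "value=3"),
    ("u3", "http://b.example/y", "t4", ""),
]

def score(info):
    """Hypothetical scoring rule: the integer in a 'value=N' field, else 0."""
    for field in info.split():
        name, _, raw = field.partition("=")
        if name == "value":
            return int(raw)
    return 0

def map_fn(record):
    """Emit (domain, score) instead of (domain, 1)."""
    _, url, _, info = record
    yield urlparse(url).netloc, score(info)

def reduce_fn(domain, values):
    """Unchanged from the counting case: sum the values for one domain."""
    return domain, sum(values)

# Toy stand-in for the framework's grouping step.
groups = defaultdict(list)
for record in records:
    for domain, value in map_fn(record):
        groups[domain].append(value)
domain_totals = dict(reduce_fn(d, vs) for d, vs in groups.items())
print(domain_totals)
```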


MapReduce Framework

Schemas and declarative queries are missed
Hive – schemas, SQL-like query language
Pig – more imperative but with relational operators
Both compile to “workflow” of Hadoop (MapReduce) jobs

Key-Value Stores

Extremely simple interface
  Data model: (key, value) pairs
  Operations: Insert(key,value), Fetch(key), Update(key), Delete(key)

Implementation: efficiency, scalability, fault-tolerance
  Records distributed to nodes based on key
  Replication
  Single-record transactions, “eventual consistency”
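The interface and the key-based placement can be sketched in one small class. This is a single-process toy under several assumptions: `TinyKVStore` and its method names are hypothetical, each "node" is just a dict, placement is a hash of the key modulo the node count, replicas are written synchronously (so eventual consistency is not modeled), and `update` takes the new value as a second argument.

```python
import hashlib

class TinyKVStore:
    """Toy key-value store: records are placed on nodes by hashing the key."""

    def __init__(self, num_nodes=3, replicas=2):
        # replicas is assumed to be <= num_nodes.
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas

    def _owners(self, key):
        """Indexes of the nodes holding this key: a primary plus its successors."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def insert(self, key, value):
        for n in self._owners(key):
            self.nodes[n][key] = value

    def update(self, key, value):
        if self.fetch(key) is None:
            raise KeyError(key)
        self.insert(key, value)

    def fetch(self, key):
        # Read from the primary replica only.
        return self.nodes[self._owners(key)[0]].get(key)

    def delete(self, key):
        for n in self._owners(key):
            self.nodes[n].pop(key, None)

store = TinyKVStore()
store.insert("user:1", {"name": "Ann"})
print(store.fetch("user:1"))
```

Each operation touches a single key, which is what makes single-record transactions and key-based distribution straightforward.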

Some systems allow (non-uniform) columns within value
Some allow Fetch on range of keys

Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …

Document Stores

Like Key-Value Stores except value is document
  Data model: (key, document) pairs
  Document: JSON, XML, other semistructured formats
  Basic operations: Insert(key,document), Fetch(key), Update(key), Delete(key)
  Also Fetch based on document contents

Example systems: CouchDB, MongoDB, SimpleDB, …
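What distinguishes a document store from a plain key-value store is that the system can look inside the value. A minimal sketch, assuming JSON documents held in memory: `TinyDocumentStore` and its `find` method are hypothetical and do not mirror the API of CouchDB, MongoDB, or SimpleDB.

```python
import json

class TinyDocumentStore:
    """Toy document store: (key, JSON document) pairs plus content-based fetch."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        # Stored as JSON text to mimic a semistructured on-disk format.
        self._docs[key] = json.dumps(document)

    def fetch(self, key):
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, key):
        self._docs.pop(key, None)

    def find(self, **conditions):
        """Keys of documents whose top-level fields equal every given condition."""
        matches = []
        for key, raw in self._docs.items():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in conditions.items()):
                matches.append(key)
        return matches

docs = TinyDocumentStore()
docs.insert("p1", {"name": "Ann", "age": 30})
docs.insert("p2", {"name": "Bob", "city": "Lexington"})  # different fields: no fixed schema
print(docs.find(name="Ann"))
```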

Graph Database Systems

Data model: nodes and edges
Nodes may have properties (including ID)
Edges may have labels or roles
Interfaces and query languages vary
Single-step versus “path expressions” versus full recursion
Example systems: Neo4j, FlockDB, Pregel, …
RDF “triple stores” can map to graph databases
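The data model and the difference between a single step and a path expression can be sketched with plain Python structures. The sample `nodes` and `edges`, the directed-edge assumption, and the `step` and `follow_path` helpers are all hypothetical; real systems such as Neo4j expose their own query languages for this.

```python
# Hypothetical property graph: nodes carry properties, edges carry labels.
nodes = {
    1: {"name": "Ann"},
    2: {"name": "Bob"},
    3: {"name": "Cat"},
    4: {"name": "Acme"},
}
edges = [  # (source ID, label, destination ID), treated as directed
    (1, "friend", 2),
    (2, "friend", 3),
    (3, "works_at", 4),
]

def step(sources, label):
    """Single step: follow every edge with the given label out of sources."""
    return {dst for src, lbl, dst in edges if src in sources and lbl == label}

def follow_path(start, labels):
    """Path expression: apply one single step per label, in order."""
    current = {start}
    for label in labels:
        current = step(current, label)
    return current

employers = follow_path(1, ["friend", "friend", "works_at"])
print([nodes[n]["name"] for n in employers])
```

Full recursion, the third option on the slide, would repeat a step until no new nodes appear instead of following a fixed list of labels.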

NoSQL Systems

“NoSQL” = “Not Only SQL”
Not every data management/analysis problem is best solved exclusively using a traditional DBMS

Current incarnations
– MapReduce framework
– Key-value stores
– Document stores
– Graph database systems
