CS 405G: Introduction to Database Systems
24 NoSQL
Reuse some slides of Jennifer Widom
Chen Qian, University of Kentucky


Summary

Tree-based indexes: O(log N) for search and update; support range queries

Hash-based indexes: best for equality searches, O(1) expected; cannot support range searches

Static and dynamic
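A rough illustration of the two index families summarized above. This is only a sketch: a Python dict stands in for a hash index, and a sorted list with binary search stands in for a tree index (a real B+-tree also keeps updates at O(log N), which a sorted list does not). The records and function names are hypothetical.

```python
import bisect

# Hypothetical records: (key, payload).
records = [(17, "q"), (3, "a"), (42, "z"), (8, "c"), (23, "m")]

# Hash-style index: expected O(1) equality lookup, no key ordering.
hash_index = {key: payload for key, payload in records}

# Tree-style stand-in: keys kept sorted, so binary search gives
# O(log N) lookup and a range scan is one contiguous slice.
sorted_keys = sorted(key for key, _ in records)


def equality_search(key):
    """Return the payload stored under `key`, or None if absent."""
    return hash_index.get(key)


def range_search(low, high):
    """Return all keys k with low <= k <= high, in sorted order."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]
```

The hash index has no efficient way to answer `range_search`, because hashing scatters neighboring keys.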

04/19/23 Chen Qian @ University of Kentucky 2


NoSQL: The Name

“SQL” = Traditional relational DBMS
Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS
“NoSQL” = “No SQL” = Not using traditional relational DBMS

“No SQL” ≠ “Don’t use SQL language”

“NoSQL” = “Not Only SQL”

NoSQL Systems: Motivation


Not every data management/analysis problem is best solved using a traditional DBMS

Database Management System (DBMS) provides…

… efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.


NoSQL Systems

Alternative to traditional relational DBMS
+ Flexible schema
+ Quicker/cheaper to set up
+ Massive scalability
+ Relaxed consistency → higher performance & availability

– No declarative query language → more programming
– Relaxed consistency → fewer guarantees


Example #1: Web log analysis

Each record: UserID, URL, timestamp, additional-info

Task: Load into database system
Data cleaning
Data extraction
Verification
Schema
None of the above is needed up front for NoSQL!


Task: Find all records for…
Given UserID
Given URL
Given timestamp
Certain construct appearing in additional-info


Each record: UserID, URL, timestamp, additional-info
Separate records: UserID, name, age, gender, …

Task: Find average age of user accessing given URL

May not require strict consistency.
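The average-age task above combines the two kinds of records. A minimal sketch in plain Python, assuming small in-memory data; the sample values and the choice to count each user once per URL are assumptions, since the slide does not say whether the average is per access or per user.

```python
# Hypothetical sample data; field names follow the slide.
access_log = [
    {"user_id": 1, "url": "http://example.com/a", "timestamp": 100},
    {"user_id": 2, "url": "http://example.com/a", "timestamp": 105},
    {"user_id": 1, "url": "http://example.com/b", "timestamp": 110},
    {"user_id": 3, "url": "http://example.com/a", "timestamp": 120},
]
users = {
    1: {"name": "Ann", "age": 30},
    2: {"name": "Bob", "age": 40},
    3: {"name": "Cy", "age": 50},
}


def average_age(url):
    """Average age over distinct users who accessed `url`, or None."""
    # Step 1: select the log records for the URL, keeping user IDs.
    user_ids = {rec["user_id"] for rec in access_log if rec["url"] == url}
    # Step 2: "join" with the separate user records to get ages.
    ages = [users[uid]["age"] for uid in user_ids if uid in users]
    return sum(ages) / len(ages) if ages else None
```

If a few log records are stale or missing when this runs, the average shifts only slightly, which is why strict consistency may not be required here.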


Example #2: Social-network graph

Each record: UserID1, UserID2

Separate records: UserID, name, age, gender, …

Task: Find all friends of friends of friends of … friends of given user

Large number of joins? Not efficient at all!
Specially designed graph database may be better
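In a relational system, each extra "friends of" step is another self-join on the (UserID1, UserID2) table. A graph-style traversal instead walks the edges directly. The sketch below is a plain breadth-first search, assuming friendships are undirected; the data and function names are hypothetical, and it is not the API of any particular graph database.

```python
from collections import deque

# Hypothetical friendship records (UserID1, UserID2), treated as undirected.
friend_pairs = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]


def build_adjacency(pairs):
    """Turn (UserID1, UserID2) records into an adjacency map."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Users reachable from `start` in 1..max_hops friendship steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    reached = set()
    while frontier:
        user, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in adjacency.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                reached.add(friend)
                frontier.append((friend, depth + 1))
    return reached
```

Each user is visited at most once regardless of `max_hops`, whereas a chain of joins recomputes overlapping intermediate results.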


Example #3: Wikipedia pages

Large collection of documents
Combination of structured and unstructured data

Task: Retrieve introductory paragraph of all pages about U.S. presidents before 1900

Mix of structured and unstructured data


NoSQL Systems

Overview

Page 14: CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky

NoSQL Systems

Several incarnations
MapReduce framework: OLAP
Key-value stores: OLTP
Document stores
Graph database systems

NoSQL Systems: Overview


MapReduce Framework

Originally from Google, open source Hadoop
No data model, data stored in files
User provides specific functions: map(), reduce()
System provides data processing “glue”, fault-tolerance, scalability


Map and Reduce Functions

Map: Divide problem into subproblems

Reduce: Do work on subproblems, combine results


MapReduce Architecture


MapReduce Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info
Task: Count number of accesses for each domain (inside URL)
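The domain-count task can be sketched as a single-process simulation of the map and reduce steps. This is not Hadoop's API: `run_mapreduce` is a hypothetical stand-in for the framework's "glue", and the log records are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical log records: (UserID, URL, timestamp, additional-info).
log = [
    (1, "http://a.example/x", 100, ""),
    (2, "http://b.example/y", 101, ""),
    (1, "http://a.example/z", 102, ""),
]


def map_fn(record):
    """Emit (domain, 1) for one log record."""
    _user_id, url, _timestamp, _info = record
    yield urlparse(url).netloc, 1


def reduce_fn(domain, counts):
    """Combine all counts emitted for one domain."""
    return domain, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    # "Glue": group mapped values by key, then reduce each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

Only `map_fn` and `reduce_fn` are task-specific; in a real framework the grouping step is also where data is moved between machines.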


MapReduce Example (modified #1)

Each record: UserID, URL, timestamp, additional-info
Task: Total “value” of accesses for each domain based on additional-info
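For the modified task, only the emitted value changes: map outputs a per-access value instead of 1. The slide does not say how "value" is derived from additional-info, so the sketch below simply assumes additional-info is a dict with a numeric `value` field. Sorting and grouping stands in for the framework's shuffle phase; all names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from urllib.parse import urlparse

# Hypothetical records; additional-info is assumed to be a dict with a
# numeric "value" field, which the slide does not specify.
log = [
    (1, "http://a.example/x", 100, {"value": 2.5}),
    (2, "http://b.example/y", 101, {"value": 1.0}),
    (1, "http://a.example/z", 102, {"value": 4.0}),
]


def map_fn(record):
    """Emit (domain, value) instead of (domain, 1)."""
    _user_id, url, _timestamp, info = record
    yield urlparse(url).netloc, info.get("value", 0.0)


def reduce_fn(domain, values):
    """Sum the values emitted for one domain."""
    return domain, sum(values)


def total_value_by_domain(records):
    # Sort-then-group stands in for the framework's shuffle phase.
    pairs = sorted((kv for rec in records for kv in map_fn(rec)), key=itemgetter(0))
    return dict(
        reduce_fn(domain, [value for _, value in group])
        for domain, group in groupby(pairs, key=itemgetter(0))
    )
```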


MapReduce Framework

Schemas and declarative queries are missed
Hive: schemas, SQL-like query language
Pig: more imperative but with relational operators
Both compile to “workflow” of Hadoop (MapReduce) jobs


Key-Value Stores

Extremely simple interface
Data model: (key, value) pairs
Operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)

Implementation: efficiency, scalability, fault-tolerance
Records distributed to nodes based on key
Replication
Single-record transactions, “eventual consistency”

Some allow (non-uniform) columns within value
Some allow Fetch on range of keys

Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …
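A toy sketch of the key-value interface and of "records distributed to nodes based on key" with replication. It is an in-memory model, not the design of any system listed above: the class name, node count, and replica placement (the next node in order) are assumptions, and writes reach all replicas synchronously, so eventual consistency is not modeled.

```python
import hashlib


class TinyKVStore:
    """Toy key-value store: records are placed on nodes by key hash."""

    def __init__(self, num_nodes=3, replicas=2):
        self.nodes = [{} for _ in range(num_nodes)]
        self.replicas = min(replicas, num_nodes)

    def _node_ids(self, key):
        # Stable hash so placement does not change between runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def insert(self, key, value):
        # Write the record to every replica node for this key.
        for node_id in self._node_ids(key):
            self.nodes[node_id][key] = value

    update = insert  # in this sketch an update just overwrites the value

    def fetch(self, key):
        # Read from the first replica only.
        return self.nodes[self._node_ids(key)[0]].get(key)

    def delete(self, key):
        for node_id in self._node_ids(key):
            self.nodes[node_id].pop(key, None)
```

Every operation touches a single key, which is what lets such stores spread load across nodes without coordinating multi-record transactions.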


Document Stores

Like Key-Value Stores except value is document
Data model: (key, document) pairs
Document: JSON, XML, other semistructured formats
Basic operations: Insert(key, document), Fetch(key), Update(key, document), Delete(key)
Also Fetch based on document contents

Example systems: CouchDB, MongoDB, SimpleDB, …
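What distinguishes a document store from a plain key-value store is fetching by document contents. A minimal in-memory sketch, assuming documents are JSON-like Python dicts and matching only on top-level fields; the class and method names are hypothetical and do not reflect the query API of the systems listed above.

```python
class TinyDocumentStore:
    """Toy document store: values are JSON-like dicts."""

    def __init__(self):
        self.docs = {}

    def insert(self, key, document):
        self.docs[key] = document

    def fetch(self, key):
        """Key-based fetch, as in a key-value store."""
        return self.docs.get(key)

    def find(self, **criteria):
        """Keys of documents whose top-level fields equal all criteria."""
        return sorted(
            key
            for key, doc in self.docs.items()
            if all(
                field in doc and doc[field] == wanted
                for field, wanted in criteria.items()
            )
        )
```

Documents need not share the same fields; a document missing a queried field simply does not match.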


Graph Database Systems

Data model: nodes and edges
Nodes may have properties (including ID)
Edges may have labels or roles

Interfaces and query languages vary
Single-step versus “path expressions” versus full recursion

Example systems: Neo4j, FlockDB, Pregel, …
RDF “triple stores” can map to graph databases
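One way to picture how triples map to a graph, and how a single-step query differs from a path expression: each (subject, predicate, object) triple becomes a labeled edge. The sketch below is an assumption-laden toy, with made-up triples and function names, not the query language of any system named above.

```python
# Hypothetical (subject, predicate, object) triples.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
]


def to_graph(triples):
    """Map triples to labeled edges: node -> list of (label, neighbor)."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
        graph.setdefault(obj, [])  # make sure every object is also a node
    return graph


def step(graph, nodes, label):
    """Single-step query: follow one edge with the given label."""
    return {
        nbr
        for node in nodes
        for lbl, nbr in graph.get(node, [])
        if lbl == label
    }


def follow_path(graph, start, labels):
    """Path expression: follow a fixed sequence of edge labels."""
    nodes = {start}
    for label in labels:
        nodes = step(graph, nodes, label)
    return nodes
```

Full recursion would repeat `step` until no new nodes appear, rather than for a fixed list of labels.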


NoSQL Systems

“NoSQL” = “Not Only SQL”
Not every data management/analysis problem is best solved exclusively using a traditional DBMS

Current incarnations
– MapReduce framework
– Key-value stores
– Document stores
– Graph database systems
