108
CS165 Section Niv Dayan Demystifying the Zoo of Contemporary Database Systems 1

CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

CS165 Section

Niv Dayan

Demystifying the Zoo of Contemporary Database Systems

1

Page 2: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Introduction

1980 1990 2000 2010

2Time

Page 3: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Introduction

1980 1990 2000 2010

3Time

Page 4: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Introduction

• Different architectures

– Performance

– Data integrity

– User interface

4

Page 5: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Introduction

• Different architectures

– Performance

– Data integrity

– User interface

5

Page 6: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Introduction

• Theme: any trend in database technology can be traced to a trend in hardware

• Claim: The new database technologies are adaptations to changes in hardware

Database designer Hardware

6

Page 7: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

DBHistory

7

Page 8: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• 3 goals of database design

– Speed

– Affordability

– Resilience to system failure

• How you achieve them depends on hardware

8

Page 9: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Two storage media:

Main Memory Fast, expensive, volatile

DiskSlow, cheap, non-volatile

9

Page 10: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• How should data be stored across them?

• Main memory is volatile and expensive

Frequently accessed data is here

All data is here

10

Page 11: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• To make a system fast, address bottleneck

• Disk is extremely slow

Fetch Retrieve

Main memory

Disk11

Page 12: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• To make a system fast, address bottleneck

• Disk is extremely slow

Fetch Retrieve

Pluto

Earth

Fetch Retrieve

Main memory

Disk12

Page 13: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Why so slow?

• Two questions:– Question 1: How to minimize disk access?

– Question 2: What to do during a disk access?

Disk hand moving

13

Page 14: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Problem: How to minimize disk accesses?

• Solution: Store data that is frequently co-accessed at the same physical location

• Consolidates many disk accesses to one

Data item 1

Data item 2

14

Page 15: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

ID Name Balance

1 Bob 1002 Sara 1003 Will 450

History

• Example: Bank

• Co-locate all information about each customer

• Customer Sara deposits $100

Main Memory

Disk

Database

2 Sara 100

Add 1002 Sara 200

2 disk accesses, since data about sara is co-located

15

Page 16: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• What to do during a disk access?

• Start running the next operation(s)

• Improves performance

• But data can get corrupted16

Page 17: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 0

17

Page 18: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 0

balance = 0

Fetch

Retrieve

18

Page 19: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 0

balance = 0

19

Page 20: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 0

balance = 0 balance = 0

Fetch

Retrieve

20

Page 21: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 0

balance = 0 balance = 0

21

Page 22: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 0

balance = 100 balance = 100

22

Page 23: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 100

balance = 100 balance = 100

write write

23

Page 24: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 100

24

Page 25: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• A couple, Bob and Sara, share a bank account

• Both deposit $100 at same time

balance = 100

Account balance should be 200!Bob and Sara lost money.

25

Page 26: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Question: how to achieve concurrency while maintaining data integrity?

• Insight: transactions can be concurrent, as long as they don’t modify the same data

• Solution: locking– Bob locks data, modifies it, releases lock

– Sara waits until lock is released

• Downside: – transactions may need to wait for locks.

26

Page 27: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• 3 goals of database design

– Speed

– Affordability

– Resilience to system failure

27

Page 28: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• 3 goals of database design

– Speed

–Affordability

– Resilience to system failure

28

Page 29: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Disk was cheap, but not so cheap

• 1 gigabyte for $10000 in 1980

• Avoid storing replicas of same data

ID name account-ID balance

1 Bob 1 100

2 Sara 1 100

3 Trudy 2 450

29

Page 30: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Solution: “Normalization”. Break tables.

ID name account-ID balance

1 Bob 1 100

2 Sara 1 100

3 Trudy 2 450ID name account-ID

1 Bob 1

2 Sara 1

3 Trudy 2

ID balance

1 100

2 450

Customers Accounts

• Bonus: Easier to maintain data integrity 30

Page 31: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Normalization: – Saves storage space

– Easier to maintain data integrity

• Downside: reads are more expensive

– Need to join tables

ID name account-ID

1 Bob 1

2 Sara 1

3 Trudy 2

ID balance

1 100

2 450

Customers Accounts

31

Page 32: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Data is decomposed accross tables

• Query Language: SQL

– select balance from Customers c, Accounts a where c.account-ID = a.ID and c.name = “Bob”

ID name account-ID

1 Bob 1

2 Sara 1

3 Trudy 2

ID balance

1 100

2 450

Customers Accounts

32

Page 33: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• 3 goals of database design

– Speed

–Affordability

– Resilience to system failure

33

Page 34: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• 3 goals of database design

– Speed

– Affordability

–Resilience to system failure

34

Page 35: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Many things can go wrong

– Power failure

– Hardware failure

– Natural disaster

• Data is precious (e.g. bank)

• Provide recovery mechanism

35

Page 36: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Sara’sbalance = 100Sara’sbalance = 100

History

• Example: Sara transfers $100 to Anna

• Power stops in the middle

Anna’s balance = 450

Sara’sbalance = 0

36

Page 37: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Sara’sbalance = 100

History

• Example: Sara transfers $100 to Anna

• Power stops in the middle

Anna’s balance = 450

Sara’sbalance = 0

37

Page 38: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Anna’s balance = 450Anna’sbalance = 450

Sara’sbalance = 0

History

• Example: Sara transfers $100 to Anna

• Power stops in the middle

Anna’sbalance = 550

38

Page 39: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Anna’s balance = 450Anna’s balance = 450

Sara’sbalance = 0

History

• Example: Sara transfers $100 to Anna

• Power stops in the middle

Anna’sbalance = 550

At this point, power fails

39

Page 40: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Transaction: a sequence of operations all takes place, or none take place.

• Transactions should be atomic

40

Page 41: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Problem: how to guarantee atomicity?

• Solution: use a log (on disk )

• All data changes are recorded in the log

• After power failure, examine log

• Undo changes by unfinished transactions

Data Log

41

Page 42: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Data integrity– Concurrency (fix with locking)

– System failure (fix with logging)

• ACID– Atomicity

– Consistency

– Isolation

– Durability

42

Page 43: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

History

• Summary– Speed

– Affordability

– Resilience to system failure

• Relational databases:– Normalize data into multiple tables

– ACID (locking & logging)

– SQL

• Design decisions are motivated by hardware43

Page 44: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

44

Page 45: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• What changed in hardware?

• How does it affect database design?

45

Page 46: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Disk is 107 times cheaper

• Main memory is 106 times cheaper

46

Page 47: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Disk is now dirt cheap

• Organizations keep all historical data

• Business intelligence

• E.g. Amazon

– revenues from product X on date Y

– which products are bought together

47

Page 48: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Traditional system architecture:

End usersTransactional

DatabaseBusiness

Intelligence

transactionsAnalytical

queries

48

Page 49: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Problem:

–Analytical queries are expensive • Touch a lot of data

• Disk access

• Locks

–They slow down transactions.

–End-users wait longer

49

Page 50: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Solution: split database

End usersTransactional

DatabaseData

warehouseBusiness

Intelligence

50

Page 51: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Different workloads

• Different internal design

Transactional Database Data

warehouse51

Page 52: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Example analytical queries

– How long is delivery? (2 columns, all rows)

– Revenue from product X? (2 columns, all rows)

• Problem:

– Data is stored row by row

orderid

custid

productid

price orderdate

receiptdate

priority status comment

... ... ... ... ... ... ... ... ...

52

Page 53: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

• Solution: column-store – Each column is stored separately

– Good for analytical queries

– Changes entire architecture

– Examples: Vertica, Vectorwise, Greenplum, etc.

orderid

custid

status price orderdate

receiptdate

priority clerk comment

... ... ... ... ... ... ... ... ...

Today

53

Page 54: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• How are transactional databases affected by hardware changes?

Transactional Database

54

Page 55: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Main memory is cheaper

• Terabytes are affordable

• Enough to store all transactional data

• E.g. Amazon

– Products list

– User accounts

55

Page 56: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Main memory was expensive

• Now it’s cheaper

Data Log

56

Page 57: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Transactional databases are main memory databases

• Bottleneck used to be disk access

• The new bottleneck is ACID (logging, locking)

Useful work

ACID

Disk access

Useful work

ACID

57

Page 58: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today• More challenges

• Due to internet, 100% availability is key

• Data is replicated

58

Page 59: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Joins become more expensive

ID name account-ID

1 Bob 1

2 Sara 1

3 Trudy 2

ID balance

1 100

2 450

Customers Accounts

Joins

59

Page 60: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Replication and locks become more expensive

Accounts Accounts

Replication, locks

ID balance

1 100

2 450

ID balance

1 100

2 450

60

Page 61: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

61

Page 62: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Today

• NoSQL and NewSQL address these

–NoSQL simplifies

–NewSQL engineers

62

Page 63: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL(MongoDB)

63

Page 64: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL(MongoDB)

64http://db-engines.com/en/ranking

Page 65: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Name popularized in 2009

• Conference on “open source distributed non-relational databases”

• NoSQL was a hashtag

65

Page 66: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Different types

– Document stores

– Column-oriented

– Key-value-stores

– Graph databases

Similar

Different

66

Page 67: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• MongoDB - Main decisions

1. No joins

• Aggregate related data into “documents”• Reduces network traffic

• Data modeling is harder

2. No ACID

• Faster

• Concurrency & system failure can corrupt data

67

Page 68: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

68

Page 69: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• To avoid joins, data is de-normalized

ID name account-ID

1 Bob 1

2 Sara 1

3 Trudy 2

ID balance

1 100

2 450

Customers Accounts

ID name account-ID balance

1 Bob 1 100

2 Sara 1 100

3 Trudy 2 450 69

Page 70: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• In MongoDB

• db.customers.find(name:“Sara”)

Collection of customer Documents

{ name: “Bob”,account-ID: 1,balance: 100 }

{name: “Sara”,account-ID: 1,balance: 100 }

{ name: “Trudy”,account-ID: 2,balance: 450 }

70

Page 71: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Documents are flexible

{ name: “Bob”,account-ID: 1,balance: 100, favorite-color: “red”credit-score: 3.0

}

{name: “Sara”,account-ID: 1,balance: 100 hobbies: [“rowing”, “running”]

}

71

Page 72: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Main point: no need for joins

• All related data is in one place

72

{ name: “Bob”,account-ID: 1,balance: 100, favorite-color: “red”credit-score: 3.0

}

Page 73: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

73

Page 74: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• MongoDB does not lock

• Recall Bob and Sara

• Deposit $100 at same time to shared account

• Overwrite each other’s update

No general way to prevent this

74

Page 75: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

75

Page 76: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Eventual consistency

• Different operation order across replicas

• E.g. concurrent addition and multiplication

DepositAdd 100

Interest Multiply by 1.1

{…balance: 100

... }

{…balance: 100

... }76

Page 77: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Eventual consistency

• Different operation order across replicas

• E.g. concurrent addition and multiplication

DepositAdd 100

Interest Multiply by 1.1

{…balance: 100

... }

{…balance: 100

... }77

Page 78: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Eventual consistency

• Different operation order across replicas

• E.g. concurrent addition and multiplication

DepositAdd 100

Interest Multiply by 1.1

Inconsistent replicas

{…balance: 220

... }

{…balance: 210

... }78

Page 79: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

79

Page 80: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NoSQL

• When to use MongoDB?

• Non-interacting entities

– No sharing (e.g. bank account)

– No exchanging (e.g. money transfers)

• Commutative operations on data

• You need a flexible data model

80

Page 81: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

81

Page 82: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• ACID & good performance

• Redesign internal architecture.

82

Page 83: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Data is normalized into multiple tables

• Tables partitioned and replicated across machines

ID name account-ID

1 Bob 1

2 Sara 1

3 Trudy 2

ID balance

1 100

2 450

Customers Accounts

83

Page 84: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Single machine bottlenecks:

• Multiple machines bottlenecks:

Replication, locks, joins

Logging & locking

84

Page 85: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• History: concurrent transactions were introduced since disk was slow

• Today: Now all data is in main memory

• Transactions in main memory are fast

• Less need for concurrency

• VoltDB removes concurrency

• Thus, no need for locking

85

Page 86: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Recall Bob and Sara

• Deposit $100 at same time to shared account

• Overwrite each other’s update

In VoltDB, this cannot happen

86

Page 87: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

87

Page 88: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• History: log introduced for recovery

• Today: it takes too long to recover from log

• Instead, replicate data across machines

• If one machine fails, others continue working

• Simplifies logging

88

Page 89: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

89

Page 90: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Try to avoid joins across machines

• Store data that is commonly accessed at same time on same machine

ID name account-ID

1 Bob 1

2 Sara 1

ID balance

1 100

Customers

ID name account-ID

3 Trudy 2

ID balance

2 450

Customers

AccountsAccounts

90

Page 91: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

ID name account-ID

1 Bob 1

2 Sara 1

ID balance

1 100

Customers

ID name account-ID

3 Trudy 2

ID balance

2 450

Customers

AccountsAccounts

• Alleviates problem

• Does not solve it (e.g. money transfer)91

Page 92: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

92

Page 93: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 100

ID balance

... 100

93

Page 94: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 100

ID balance

... 100

94

Page 95: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 200

ID balance

... 200

95

Page 96: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 200

ID balance

... 200

96

Page 97: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 200

ID balance

... 200

97

Page 98: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 220

ID balance

... 220

98

Page 99: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Tables are replicated

• Enforce operation order across replicas

DepositAdd 100

Interest Multiply by 1.1

ID balance

... 220

ID balance

... 220

99

Page 100: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• Single machine bottlenecks:

• Multiple machine bottlenecks:

Replication, locks, joins

Logging & locking

100

Page 101: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

NewSQL

• When to use VoltDB?

– run at scale

–You need 100% availability

–You need ACID

101

Page 102: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

102

Page 103: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

• Hardware is cheaper

103

Page 104: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

Transactional Database

Data warehouses

104Row-store Column-store

Page 105: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

• Single machine bottlenecks:

• Multiple machines bottlenecks:

Replication, locks, joins

Logging & locking

105

Page 106: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

• NoSQL adapts by simplifying

– No ACID

– No joins

• NewSQL adapts by reengineering

– ACID

– Removes concurrency

– Simplifies logging

– Smart but limited partitioning across servers

106

Page 107: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

• Disclaimer: there is much more

– Scientific databases:

– Time series databases:

– Graph databases

• Caveat: rapid changes

• But hopefully now you have reasoning tools

107

Page 108: CS165 Section Niv Dayan - Harvard Universitydaslab.seas.harvard.edu/.../S11_ACID_intro.pdf · NoSQL • MongoDB - Main decisions 1. No joins • Aggregate related data into “documents”

Conclusion

• Thanks!

108