Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt


The Rebirth of Database Machines

Dina Bitton

Jim Gray


Outline

• Active Disks are coming

• Disk Tutorial (not presented, but slides in deck)

• Disk Arms are important (optimize them)

• The Rebirth of Database Machines


Disks of 30 Years Ago

• 10 MB

• Failed every few weeks

• Cost more than 400$


Disk Arrays

• 24 CPUs

• 384 disks

• More MIPS in the disks than in the CPUs


Year 2003 Disks

• Big disk (10 $/GB)
– 3”
– 200 GB
– 150 kaps (k accesses per second)
– 30 MBps sequential

• Small disk (20 $/GB)
– 2”
– 40 GB
– 100 kaps
– 20 MBps sequential

• Both running DBMS, Mail, Web, and OS


• From CMU Active Disk web site: http://www.pdl.cs.cmu.edu/Active/


Research Problem: When every disk is a super-computer…

And there are thousands of them...

• Who manages data placement?

• Query plans among 1,000 servers?

• How does

– mirroring work?

– backup work?

• Where does my program run?


Relevant University Research on Active Disks

• Kim Keeton & Dave Patterson @ UC Berkeley: http://www.cs.berkeley.edu/~pattrsn/talks/sigmod98-keynote.ppt

• Erik Riedel & Garth Gibson @ CMU: http://www.pdl.cs.cmu.edu/Active/

• Mike Franklin @ U Maryland: http://www.cs.umd.edu/projects/bdisk

• Anurag Acharya, Mustafa Uysal @ UC SB: http://www.cs.ucsb.edu/TRs/TRCS98-06.html


Outline

• Active Disks are coming

• Disk Tutorial (not presented, but slides in deck)

• Disk Arms are important

• The Rebirth of Database Machines


Disk Access Time

• Access time = SeekTime (6 ms) + RotateTime (3 ms) + ReadTime (1 ms)

• Rotate time:
– 5,000 to 10,000 rpm
– ~ 12 to 6 milliseconds per rotation
– ~ 6 to 3 ms rotational latency
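As a quick check of the rotation figures above, here is a minimal Python sketch. The function name `rotation_ms` is illustrative and not from the talk. It converts spindle speed to time per rotation and takes half a rotation as the average rotational latency, then adds up the three access-time components quoted on the slide.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm

for rpm in (5_000, 10_000):
    full = rotation_ms(rpm)
    # On average the target sector is half a rotation away.
    print(f"{rpm} rpm: {full:.0f} ms per rotation, {full / 2:.0f} ms average latency")

# Slide figures: seek 6 ms + rotate 3 ms + read 1 ms.
access_ms = 6 + 3 + 1
print(f"{access_ms} ms per access, about {1000 // access_ms} accesses per second per arm")
```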


Disk Access Time Improves Slowly

• Access time = SeekTime (6 ms, 8%/y) + RotateTime (3 ms, 8%/y) + ReadTime (1 ms, 40%/y)

• Other useful facts:
– Power rises more than size^3 (small is indeed beautiful)
– Small devices are more rugged
– Small devices can use plastics (forces are much smaller), e.g. bugs fall without breaking anything


Disk Seek Time

• Seek time is ~ Sqrt(distance) (distance = 1/2 × acceleration × time^2)

• Specs assume seek is 1/3 of disk

• Short seeks are common. (over 50% are zero length)

• Typical 1/3 seek time: 6 ms

• 4x improvement in 20 years.

[Figure: arm speed vs. time during a seek: Full Accelerate, then Full Stop]
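The square-root relationship can be sketched in a few lines of Python. This is an idealized model and not the talk's own calculation: it assumes constant acceleration for the first half of the seek and equal braking for the second half, with no settle time and no top-speed limit. The name `seek_time` and the arbitrary units are assumptions for illustration.

```python
import math

def seek_time(distance: float, accel: float = 1.0) -> float:
    """Bang-bang seek: accelerate over the first half, brake over the second.

    Each half covers distance/2 = 1/2 * accel * t_half**2, so
    t_half = sqrt(distance / accel) and the total is twice that.
    Units are arbitrary; settle time and any speed limit are ignored.
    """
    return 2.0 * math.sqrt(distance / accel)

base = seek_time(1.0)
for d in (0.25, 1.0, 4.0):
    # Quadrupling the distance only doubles the time.
    print(f"distance {d:>4}: time {seek_time(d) / base:.2f}x")
```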


Disk Access Ratios Have Changed

• Key metrics:
– $/GB
– Kaps/GB (KB accesses per second per GB)
– SCAN: time to scan the disk

• Scan going from minutes to days

• Disk arms are precious resource (disk capacity is no longer the precious resource)

• Kaps/GB went from 500 to 7 and going to 1

Year   Capacity (GB)   $/GB     kaps   kaps/GB   Scan Sequential   Scan Random
1988   0.25            20,000   30     1200      2 minutes         20 minutes
1998   18              50       120    7         20 minutes        5 hrs
2003   200             5        200    1         2 hrs             1.2 days
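Two of these metrics are simple ratios, sketched below in Python. The function names are illustrative. The kaps and capacity inputs are the 1998 and 2003 rows of the table, and the 30 MBps sequential rate is the figure quoted for the big disk on the "Year 2003 Disks" slide; treating 1 GB as 1,000 MB is an assumption.

```python
def kaps_per_gb(kaps: float, capacity_gb: float) -> float:
    """Accesses per second available per GB stored."""
    return kaps / capacity_gb

def sequential_scan_hours(capacity_gb: float, mb_per_sec: float) -> float:
    """Time to read the whole disk at its sequential rate (1 GB taken as 1,000 MB)."""
    return capacity_gb * 1000.0 / mb_per_sec / 3600.0

print(f"1998: {kaps_per_gb(120, 18):.1f} kaps/GB")
print(f"2003: {kaps_per_gb(200, 200):.1f} kaps/GB")
print(f"2003 sequential scan: {sequential_scan_hours(200, 30):.1f} hours")
```

The arm gets busier as capacity grows faster than access rate, which is the sense in which arms, not bytes, become the scarce resource.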


Stripe For More Bandwidth

• N-stores have N-times the bandwidth

• Works great!

• Supported by most file systems
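One common way to stripe is round-robin block placement, sketched here in Python. This layout is an assumption for illustration, not necessarily the scheme the talk had in mind, and `stripe_location` is a hypothetical helper name.

```python
def stripe_location(block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    return block % n_disks, block // n_disks

# Eight consecutive logical blocks spread evenly over four disks,
# so a sequential read keeps all four arms streaming at once.
for block in range(8):
    disk, offset = stripe_location(block, 4)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```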


Mirrors: Replicate Stores for Availability

• Read one, write all

• If one fails, rebuild from survivor

• Run scrubber in background to fix faults

• N-replicas can give N-times the bandwidth

• Unavailability ~ MTTF^2 / (repair time): (50 years)^2 / (1 day) ≈ 1,000,000 years

A Million Years!!!
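The million-year figure can be reproduced with a short calculation. The sketch below reads the slide's numbers as a 50-year MTTF per disk and a one-day repair window; interpreting "1 day" as the time to rebuild a failed mirror is an assumption, as are the names used.

```python
DAYS_PER_YEAR = 365

def mirrored_pair_mttf_years(mttf_years: float, repair_days: float) -> float:
    """Approximate time until both copies are down at once: MTTF^2 / repair time."""
    repair_years = repair_days / DAYS_PER_YEAR
    return mttf_years ** 2 / repair_years

pair = mirrored_pair_mttf_years(mttf_years=50, repair_days=1)
print(f"{pair:,.0f} years")  # on the order of a million years
```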


RAID5: Parity Saves Storage Space

• Mirrors: 50% storage overhead
– read one, write both

• RAID5: 12% storage overhead
– read one, write one plus parity

[Figure label: PARITY]
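A minimal Python sketch of the parity idea follows. It uses byte-wise XOR, the usual RAID5 parity function, to rebuild a lost block, and computes storage overhead as the redundant fraction of raw capacity. The 7-data-plus-1-parity group is an assumption chosen because it gives 12.5%, close to the 12% on the slide; the block contents and function names are hypothetical.

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def overhead(data_disks: int, redundant_disks: int) -> float:
    """Fraction of raw capacity spent on redundancy."""
    return redundant_disks / (data_disks + redundant_disks)

data = [b"disk", b"arms", b"rule"]   # hypothetical data blocks
p = parity(data)

# Lose data[1]; XOR of the survivors and the parity block restores it.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])

print(f"mirror overhead: {overhead(1, 1):.1%}")
print(f"7+1 parity group overhead: {overhead(7, 1):.1%}")
```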


Interesting Fact: Mirrored Disks Optimize Disk Arms

• Doubles read bandwidth
– Sequential read: stagger reads from each drive (stripe)
– Random read: use the closest arm; seek is min seek

• Doubles write cost (write both)
– Write time increases because seek is max seek

[Figure: Seek Distance vs Disks. x-axis: number of disks (1 to 8); y-axis: fraction of disk surface for seek (0 to 0.8); series: Write Seek, Read Seek]
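The min-seek and max-seek effect can be explored with a small Monte Carlo sketch in Python. It assumes each mirror's arm sits at an independent, uniformly random position before every request, which is a simplification and not necessarily the model behind the chart. A read moves only the closest arm; a write finishes when the farthest arm arrives.

```python
import random

def mean_seek_distances(n_disks: int, trials: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Return (mean read distance, mean write distance) as fractions of the surface."""
    rng = random.Random(seed)
    read_total = write_total = 0.0
    for _ in range(trials):
        target = rng.random()
        distances = [abs(rng.random() - target) for _ in range(n_disks)]
        read_total += min(distances)   # read: only the closest arm moves
        write_total += max(distances)  # write: done when the farthest arm arrives
    return read_total / trials, write_total / trials

for n in range(1, 9):
    read, write = mean_seek_distances(n)
    print(f"{n} disks: read {read:.2f}, write {write:.2f}")
```

With one disk both numbers come out near 1/3 of the surface, consistent with the "seek is 1/3 of disk" convention on the seek-time slide.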


If Mix Reads & Writes, Mirror is Better Than Partition

• 2 servers are better than one

• Benefit is better than 2x write cost if reads > writes

[Figure: Seek Distance vs Disks. x-axis: number of disks (1 to 8); y-axis: fraction of disk surface for seek (0.0 to 1.0); series: Write Seek, Read Seek]

[Figure: Normalized Seek Time. x-axis: number of disks (1 to 8); y-axis: 0.0 to 2.0; series: Write, 25% Read, 50% Read, 75% Read, Read]


What if you have LOTS of Disks

• When you have BIG disks (200 GB), arms are precious, space is cheap.

• If you replicate 1000x
– write seek time asymptotically approaches 1.7x
– read seek time asymptotically approaches zero.

[Figure: Distance to Seek. x-axis: disks (1, 10, 100, 1000); series: Write, Read]

[Figure: Time to Seek. x-axis: disks (1 to 10); y-axis: time (0 to 2); series: Read, Write]

[Figure: Time to Seek. x-axis: disks (0 to 1000); y-axis: time (0 to 2); series: Read, Write]
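The limiting behaviour can be estimated in closed form under an idealized model. The sketch below assumes seek time proportional to the square root of distance, a uniformly random target, and replica arms at independent uniform positions. The talk's own model may differ; this one gives a write ratio of about 1.6x, in the same neighbourhood as the 1.7x quoted.

```python
# Single arm: the distance d between two uniform positions has density 2 * (1 - d),
# so E[sqrt(d)] = integral of 2 * (1 - d) * sqrt(d) over [0, 1] = 8 / 15.
single_arm = 8 / 15

# Many replicas, write: some arm sits near the farther end of the surface, so the
# worst arm travels max(x, 1 - x) for a target at x.
# E[sqrt(max(x, 1 - x))] = (4 / 3) * (1 - 0.5 ** 1.5).
write_limit = (4 / 3) * (1 - 0.5 ** 1.5)

# Many replicas, read: some arm sits almost on the target, so the seek shrinks to zero.
read_limit = 0.0

write_ratio = write_limit / single_arm
print(f"write seek time limit: {write_ratio:.2f}x a single disk")
print(f"read seek time limit: {read_limit:.2f}x a single disk")
```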


Outline

• Active Disks are coming

• Disk Tutorial (not presented, but slides in deck)

• Disk Arms are important

• The Rebirth of Database Machines


The Rebirth of Database Machines

Dina Bitton (IDS)

Jim Gray (Microsoft)


Outline

• Performance hungry databases

• History: life and death of database machines

• What has changed that can make database machines work today

• Shared-Nothing Database Machine

• Where is the required bandwidth

• DMP: Shared-Nothing & Shared-Everything


Demand for Database Performance

• Larger Databases:

– marketing data warehouses: TB of historical data

– daily news broadcasts: 1 TB of searchable video/audio data

• Large Scans: Searches require access to large fraction of database

• Repeated Scans: DSS queries, Data mining algorithms


Life, Death & Reincarnation

• Database Machines are coming, Database Machines are coming ... (Hsiao 1979)

• Then there was Britton-Lee, Direct, ICL …
– Teradata builds highly-parallel shared-nothing SQL server
– many university “paper” designs

• “Database Machines, An Idea whose time has Passed?” (Boral-DeWitt 1983)

• Then there was MMDBs, Grace, Gamma and more Teradata

• Then there was Software (Parallel Database Query)

• Next: PDQ + lots of disks with power controllers


And All Along

Stonebraker’s Opinion:

“The history of DBMS research is littered with innumerable proposals to construct hardware database machines to provide high performance operations. In general these have been proposed by hardware types with a clever solution in search of a problem on which it might work.”

Readings in Database Systems, Morgan-Kaufmann


Why Not then, but Yes now

• Too early: small databases on 1 disk
  Now: TB databases span thousands of disks, need partitioning

• Disk filter designs: addressed only small part of DBMS requirements
  Now: disk controllers are fast computers

• Exotic technologies (bubbles, CCD…) went away

• Special purpose hardware increased design time and cost
  Now: higher level of integration, VLSI design tools better

• Parallel query processing was not well-understood
  Now: large body of research, successful commercial implementations


Parallel Query Processing [DeWitt-Gray CACM91]

[Figure: five parallel pipelines, each Source Data, Scan, Sort, all feeding a single Merge]

• Pipelining: data streams flow from one operator to the next

• Partitioning: tables are partitioned to allow concurrent processing on partitions
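The two ideas can be illustrated with a small Python sketch using generators. It is a single-process illustration and not how a parallel DBMS is implemented: each partition gets its own scan and sort pipeline, and one merge combines the sorted streams. In a real system each pipeline would run on its own processor; here they run one after another. The data and function names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator

def scan(partition: Iterable[int]) -> Iterator[int]:
    """Stream rows out of one partition (a generator, so rows flow one at a time)."""
    for row in partition:
        yield row

def sort(rows: Iterable[int]) -> Iterator[int]:
    """Sort one partition's stream; a sort must see all its input before emitting."""
    yield from sorted(rows)

def merge(streams: list[Iterator[int]]) -> Iterator[int]:
    """Combine already-sorted streams into one sorted stream."""
    return heapq.merge(*streams)

partitions = [[9, 2, 7], [4, 8, 1], [6, 3, 5]]   # hypothetical source data
result = list(merge([sort(scan(p)) for p in partitions]))
print(result)
```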


Data Pathway Contention [Patterson Sigmod 1998]

• Disk: external I/O bus bottleneck to transfer rate, cost

• Network: internal I/O bus interface is bottleneck to delivered bandwidth

• Memory-Processor: processor-memory interface (cache+memory bus) is bottleneck to delivered bandwidth


A Shared-Nothing Database Machine

[Figure: several Processor & Memory nodes (. . .) connected by a Scalable Interconnect]

• No contention in memory access or parallel disk access => “Embarrassingly Parallel” Scan [Patterson]

• But: how fast need Interconnect be?

• Each processor has own OS, communication protocols, DB instance

• Exchange data streams for pipelining ops, for sort, merge

• Can’t support M:N mapping between disks & threads


Share-Everything?

• Need more bandwidth for shipping data streams than network can provide

• Need M:N mapping from disks to processors for sort/merge

• Control & synchronization: Data-flow best to synchronize processors


Where to Get the Bandwidth?

Level of Integration (Components, Links, Throughput, Latency):

• Chip
– Components: Transistor / Gate
– Links: Connection lines
– Throughput: 30 GB/Sec (16 64-bit registers at 200 MHz)
– Latency: 1-8 internal clocks

• Board
– Components: Chips / Discrete components
– Links: 1. Point-to-point connections; 2. Buses
– Throughput: 1. 800 MB/Sec; 2. 150 MB/Sec
– Latency: 1. Half of transaction (10 clocks of the slowest device); 2. 10-50 bus clocks

• System
– Components: Board / Interface
– Links: 1. Crossbars; 2. Buses
– Throughput: 1. 200-500 MB/Sec; 2. 80 MB/Sec
– Latency: 1. 10 crossbar clocks; 2. 10-50 bus clocks

• Network
– Components: Node (Systems) / Bridges
– Links: Fibre Channel; Ethernet / VIA
– Throughput: Fibre Channel: 100–200 MB/Sec
– Latency: Sender overhead + Receiver overhead + transmission latency + link availability


The Data Manipulation Platform

[Figure: DMP BOARD. Labels: P 1 ... P 4; NP 1, NP 2 ... NP 16; 1 ... 80; BAM; RAM; I/O interface adapter; Bus adapter; Direct connection; To Host Computer; To other DMP Boards via high-speed switch]

• Massive Parallel Operation data-flow control

• M:N thread-to-disk RFM

Direct processor to disk access

Direct disk to memory connect


A DSS Query Execution Plan

  Select sum(tabX.amount*.08), tabY.region
  from tabX, tabY
  where tabX.key = tabY.region
  group by tabY.region
  order by tabY.region;

[Figure: query plan. Operators: Scan tabX 1, 2 ... 32; Scan tabY 1 ... 3; Exchange 1 to Exchange 5; HJoin (three instances); Group 1, Group 2; Sort 1, Sort 2. Storage: Database Disks (1, 2 ... 32 and 1 ... 3), Temp Disks. Annotations: 1/3 selected, 1/10 joined, 1/10 grouped]


Bandwidth Requirements

[Figure: query plan (Scan tabX 1, 2 ... 32; Scan tabY 1 ... 3; Exchange 1 to Exchange 5; HJoin; Group 1, Group 2; Sort 1, Sort 2) annotated with bandwidths: 32*20MB/s = 640 MB/s at the Database Disks, then 210 MB/s, 21 MB/s, and 2.1 MB/s; Temp Disk Contention]
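These bandwidth figures appear to follow from the selectivities on the "A DSS Query Execution Plan" slide (1/3 selected, 1/10 joined, 1/10 grouped). The Python sketch below makes that arithmetic explicit; pairing each fraction with a specific stage is an assumption, and the variable names are illustrative.

```python
disks = 32
mb_per_sec_per_disk = 20

scan_rate = disks * mb_per_sec_per_disk   # data streaming off the database disks
after_select = scan_rate / 3              # 1/3 selected
after_join = after_select / 10            # 1/10 joined
after_group = after_join / 10             # 1/10 grouped

for label, rate in [("scan", scan_rate), ("select", after_select),
                    ("join", after_join), ("group", after_group)]:
    print(f"{label:>6}: {rate:6.1f} MB/s")
```

The results, roughly 213, 21, and 2.1 MB/s, match the slide's 210, 21, and 2.1 MB/s after rounding.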


Conclusion

DMP: shared-nothing and shared-everything

IT ISN’T THAT YOU CAN’T SHARE

IT IS WHERE YOU SHARE

• ON A CHIP

• ON A BOARD

• ON A NETWORK