Store Everything Online In A Database

Jim Gray
Microsoft Research
Gray@Microsoft.com
http://research.microsoft.com/~gray/talks

Slides: http://research.microsoft.com/~gray/talks/Gray_GriPhyN.ppt


Outline

• Store Everything
• Online (Disk not Tape)
• In a Database


How Much is Everything?

• Soon everything can be recorded and indexed
• Most bytes will never be seen by humans.
• Data summarization, trend detection, anomaly detection are key technologies

See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html

See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/

[Figure: storage scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers for A Book, A Photo, Movie, All LoC books (words), All Books MultiMedia, and Everything! Recorded]

24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli


[Figure: Disk TB Shipped per Year, 1988 to 2000, log scale from 1E+3 to 1E+7 with an "ExaByte" marker. Series: disk TB growth, 112%/y; Moore's Law, 58.7%/y. Source: 1998 Disk Trend (Jim Porter), http://www.disktrend.com/pdf/portrpkg.pdf]

Storage capacity beating Moore's law

• 3 k$/TB today (raw disk)
• 1 k$/TB by end of 2002

Annual rates:
– Moore's law: 58.70%/year
– Revenue: 7.47%
– TB growth: 112.30% (since 1993)
– Price decline: 50.70% (since 1993)
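
The annual rates quoted here are easier to compare as doubling (or halving) times, using ln 2 / ln(1 + r). The sketch below is only that arithmetic; the function names are illustrative and not from the talk.

```python
import math

def doubling_time_years(annual_growth):
    """Years to double at a compound annual growth rate (0.587 means 58.7%/year)."""
    return math.log(2) / math.log(1 + annual_growth)

def halving_time_years(annual_decline):
    """Years to halve at a compound annual decline rate (0.507 means 50.7%/year)."""
    return math.log(2) / -math.log(1 - annual_decline)

moore = doubling_time_years(0.587)    # Moore's law rate quoted on the slide
disk_tb = doubling_time_years(1.123)  # disk TB shipped, growth since 1993
price = halving_time_years(0.507)     # disk price decline since 1993

print(f"Moore's law doubles in {moore * 12:.1f} months")
print(f"Disk TB shipped doubles in {disk_tb * 12:.1f} months")
print(f"Disk price halves in {price * 12:.1f} months")
```

58.7%/year is the familiar doubling every 18 months, while 112.3%/year doubles in a little under a year, which is the sense in which storage is beating Moore's law.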


Outline

• Store Everything
• Online (Disk not Tape)
• In a Database


Online Data

• Can build 1PB of NAS disk for 5M$ today
• Can SCAN (read or write) entire PB in 3 hours.
• Operate it as a data pump: continuous sequential scan
• Can deliver 1PB for 1M$ over Internet
  – Access charge is 300$/Mbps bulk rate
• Need to Geoplex data (store it in two places).
• Need to filter/process data near the source,
  – To minimize network costs.
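
The scan and delivery figures above can be checked with back-of-envelope arithmetic. The sketch below assumes a decimal petabyte, a 30-day month, and that the 300$/Mbps access charge is a monthly rate; none of these is stated on the slide, and the function names are illustrative.

```python
PB = 1e15                            # bytes; decimal petabyte (assumption)
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: the access charge is per month

def aggregate_scan_rate(bytes_total, hours):
    """Bytes per second needed to read or write bytes_total in the given time."""
    return bytes_total / (hours * 3600)

def shipping_cost(bytes_total, dollars_per_mbps_month, months=1.0):
    """Cost of a link just fast enough to move bytes_total in `months` months."""
    mbps_needed = bytes_total * 8 / (months * SECONDS_PER_MONTH) / 1e6
    return mbps_needed * dollars_per_mbps_month * months

scan_rate = aggregate_scan_rate(PB, 3)
disks_in_parallel = scan_rate / 35e6   # 35 MBps per disk, the figure from the Disk vs Tape slide
cost = shipping_cost(PB, 300)

print(f"3-hour scan needs {scan_rate / 1e9:.1f} GB/s, about {disks_in_parallel:.0f} disks streaming at once")
print(f"Shipping 1 PB costs about {cost / 1e6:.2f} M$")
```

Under these assumptions the cost does not depend on how long the transfer takes: a slower link is cheaper per month but is rented for more months, so the total stays close to the 1M$ on the slide.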


The “Absurd” Disk

• 2.5 hr scan time (poor sequential access)

• 1 access per second / 5 GB (VERY cold data)

• It’s a tape!

[Figure: a disk labeled 1 TB, 100 MB/s, 200 Kaps]
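
The two headline numbers follow directly from the disk's parameters. This is a minimal sketch of the arithmetic, reading "200 Kaps" as 200 random accesses per second (an assumption about the unit) and using decimal units.

```python
def scan_hours(capacity_bytes, bandwidth_bytes_per_s):
    """Time to read the whole device sequentially, in hours."""
    return capacity_bytes / bandwidth_bytes_per_s / 3600

def gb_per_access_per_second(capacity_bytes, accesses_per_s):
    """How many GB of stored data share each access per second."""
    return capacity_bytes / accesses_per_s / 1e9

hours = scan_hours(1e12, 100e6)                 # 1 TB at 100 MB/s
density = gb_per_access_per_second(1e12, 200)   # 1 TB behind 200 accesses/s

print(f"scan time: {hours:.1f} hours")
print(f"one access per second per {density:.0f} GB")
```

With decimal units the scan comes out slightly above the slide's 2.5 hr, which is the same ballpark; the access density matches the slide's 1 access per second per 5 GB.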


Disk vs Tape

Disk
– 80 GB
– 35 MBps
– 5 ms seek time
– 3 ms rotate latency
– 3$/GB for drive, 2$/GB for ctlrs/cabinet
– 15 TB/rack
– 1 hour scan

Tape
– 40 GB
– 10 MBps
– 10 sec pick time
– 30-120 second seek time
– 2$/GB for media, 8$/GB for drive+library
– 10 TB/rack
– 1 week scan

The price advantage of disk is growing; the performance advantage of disk is huge!
At 10K$/TB, disk is competitive with nearline tape.

Guestimates
– Cern: 200 TB
– 3480 tapes
– 2 col = 50GB
– Rack = 1 TB = 12 drives
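
The per-gigabyte prices and the scan times can be worked through from the figures on this slide. In the sketch below the dictionary keys and function names are illustrative, and the number of tape drives per rack is a hypothetical parameter: the slide gives only the rack capacity and the one-week scan time.

```python
disk = {"capacity_gb": 80, "mbps": 35, "device_per_gb": 3, "infrastructure_per_gb": 2}
tape = {"capacity_gb": 40, "mbps": 10, "device_per_gb": 2, "infrastructure_per_gb": 8}

def dollars_per_tb(medium):
    """Device or media cost plus controllers, cabinet, or library, per TB."""
    return (medium["device_per_gb"] + medium["infrastructure_per_gb"]) * 1000

def unit_scan_minutes(medium):
    """Time to stream one drive or one cartridge end to end."""
    return medium["capacity_gb"] * 1000 / medium["mbps"] / 60

def rack_scan_days(rack_tb, streams, mbps):
    """Time to scan a rack when only `streams` devices can transfer at once."""
    return rack_tb * 1e6 / (streams * mbps) / 86400

print(dollars_per_tb(disk), dollars_per_tb(tape))        # $/TB for each medium
print(unit_scan_minutes(disk), unit_scan_minutes(tape))  # minutes per unit
print(rack_scan_days(10, 2, 10))   # 10 TB tape rack; 2 drives is a hypothetical count
```

Disk totals 5 k$/TB against 10 k$/TB for tape, which is the comparison behind the 10K$/TB remark. Every disk in a rack can stream at once, so a rack scans in about the time of one drive. A tape rack is limited by its drives; with a couple of drives the scan takes most of a week, which is one plausible reading of the slide's figure rather than something it states.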


Building a Petabyte Disk Store

• Cadillac ~ 500k$/TB = 500M$/PB
  plus FC switches plus… 800M$/PB
• TPC-C SANs (Brand PC 18GB/…) 60 M$/PB
• Brand PC local SCSI 20M$/PB
• Do it yourself ATA 5M$/PB

[Figure: bar chart with a 0 to 500 axis comparing Premium SAN, Dell/3ware, and DIY]


Cheap Storage and/or Balanced System

• Low cost storage (2 x 3k$ servers) 5K$/TB
  – 2x (800 MHz, 256 MB + 8x80GB disks + 100MbE)
  – raid5 costs 6K$/TB
• Balanced server (5k$/.64 TB)
  – 2x800 MHz (2k$)
  – 512 MB
  – 8 x 80 GB drives (2K$)
  – Gbps Ethernet + switch (300$/port)
  – 9k$/TB, 18K$/mirrored TB

[Figure label: 2x800 MHz, 512 MB]


Next step in the Evolution

• Disks become supercomputers
  – Controller will have 1bips, 1 GB ram, 1 GBps net
  – And a disk arm.

• Disks will run full-blown app/web/db/os stack

• Distributed computing

• Processors migrate to transducers.


It’s Hard to Archive a Petabyte
It takes a LONG time to restore it.

• At 1GBps it takes 12 days!
• Store it in two (or more) places online (on disk?).
  A geo-plex
• Scrub it continuously (look for errors)
• On failure,
  – use other copy until failure repaired,
  – refresh lost copy from safe copy.

• Can organize the two copies differently (e.g.: one by time, one by space)
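
The restore time and the scrub-and-refresh cycle above can be sketched in a few lines. This is an illustration under stated assumptions, not the talk's design: the two sites are plain dictionaries, SHA-256 checksums stand in for whatever error detection a real store would use, and all names are hypothetical.

```python
import hashlib

def restore_days(bytes_total, bytes_per_s):
    """Days needed to copy bytes_total at a sustained rate."""
    return bytes_total / bytes_per_s / 86400

def checksum(chunk):
    return hashlib.sha256(chunk).hexdigest()

def intact(store, chunk_id, digest):
    """True if the site holds the chunk and it matches its recorded checksum."""
    data = store.get(chunk_id)
    return data is not None and checksum(data) == digest

def scrub(primary, replica, expected):
    """Check every chunk at both sites; refresh a bad copy from the good one."""
    repaired = []
    for chunk_id, digest in expected.items():
        ok_primary = intact(primary, chunk_id, digest)
        ok_replica = intact(replica, chunk_id, digest)
        if ok_primary and not ok_replica:
            replica[chunk_id] = primary[chunk_id]
            repaired.append((chunk_id, "replica"))
        elif ok_replica and not ok_primary:
            primary[chunk_id] = replica[chunk_id]
            repaired.append((chunk_id, "primary"))
        elif not ok_primary and not ok_replica:
            raise RuntimeError(f"chunk {chunk_id!r} is lost at both sites")
    return repaired

chunks = {"a": b"alpha", "b": b"beta"}
expected = {chunk_id: checksum(data) for chunk_id, data in chunks.items()}
site1, site2 = dict(chunks), dict(chunks)
del site1["a"]            # simulated loss at the first site
site2["b"] = b"bet\x00"   # simulated corruption at the second site

repaired = scrub(site1, site2, expected)
print(repaired)
print(f"{restore_days(1e15, 1e9):.1f} days to restore 1 PB at 1 GBps")
```

A petabyte at 1 GBps is a million seconds, a little under twelve days, which is why the slide argues for keeping a second online copy rather than relying on restore.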


Outline

• Store Everything
• Online (Disk not Tape)
• In a Database


Why Not file = object + GREP?

• It works if you have thousands of objects (and you know them all)
• But hard to search millions/billions/trillions with GREP
• Hard to put all attributes in file name.
  – Minimal metadata

• Hard to do chunking right.

• Hard to pivot on space/time/version/attributes.
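
To make the file-plus-GREP approach concrete, here is a small sketch of what a query looks like when the only metadata lives in file names. The naming convention and attribute names are hypothetical, invented for illustration.

```python
import re

# Hypothetical convention: run<number>_band<letter>_v<version>.dat
NAME = re.compile(r"run(?P<run>\d+)_band(?P<band>[a-z])_v(?P<version>\d+)\.dat$")

def parse(name):
    """Recover the attributes encoded in a file name, or None if it does not match."""
    m = NAME.search(name)
    if m is None:
        return None
    return {"run": int(m["run"]), "band": m["band"], "version": int(m["version"])}

def find(names, **wanted):
    """Linear scan: every query parses every file name."""
    hits = []
    for name in names:
        attrs = parse(name)
        if attrs is not None and all(attrs[k] == v for k, v in wanted.items()):
            hits.append(name)
    return hits

names = [f"run{r:04d}_band{b}_v{v}.dat" for r in range(3) for b in "gr" for v in (1, 2)]
hits = find(names, band="r", version=2)
print(hits)
```

This is fine for a dozen files. Every query still touches every name, any attribute that was not squeezed into the name cannot be queried at all, and changing the convention means renaming files and rewriting the parser.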


The Reality: it’s build vs buy

• If you use a file system you will eventually build a database system:
  – metadata,
  – Query,
  – parallel ops,
  – security,….
  – reorganize,
  – recovery,
  – distributed,
  – replication,


OK: so I’ll put lots of objects in a file
Do It Yourself Database

• Good news:
  – Your implementation will be 10x faster than the general purpose one, and easier to understand and use than the general purpose one.
• Bad news:
  – It will cost 10x more to build and maintain
  – Someday you will get bored maintaining/evolving it
  – It will lack some killer features:
    • Parallel search
    • Self-describing via metadata
    • SQL, XML, …
    • Replication
    • Online update – reorganization
    • Chunking is problematic (what granularity, how to aggregate)


Top 10 reasons to put Everything in a DB

1. Someone else writes the million lines of code
2. Captures data and Metadata,
3. Standard interfaces give tools and quick learning
4. Allows Schema Evolution without breaking old apps
5. Index and Pivot on multiple attributes: space-time-attribute-version….
6. Parallel terabyte searches in seconds or minutes
7. Moves processing & search close to the disk arm (moves fewer bytes (qestons return datons)).
8. Chunking is easier (can aggregate chunks at server).
9. Automatic geo-replication
10. Online update and reorganization.
11. Security
12. If you pick the right vendor, ten years from now, there will be software that can read the data.
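
A few of these reasons (metadata captured with the data, schema evolution, indexing and pivoting on several attributes) can be shown with any SQL engine. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in; the table, column, and index names are hypothetical, and it does not attempt the parallel-search or geo-replication items.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunk (id INTEGER PRIMARY KEY, run INTEGER, band TEXT, "
    "version INTEGER, payload BLOB)"
)
db.executemany(
    "INSERT INTO chunk (run, band, version, payload) VALUES (?, ?, ?, ?)",
    [(run, band, version, bytes(4))
     for run in range(3) for band in "gr" for version in (1, 2)],
)

# Index on several attributes instead of encoding them in a file name.
db.execute("CREATE INDEX chunk_band_version ON chunk (band, version)")

# Schema evolution: add an attribute; queries written earlier are unaffected.
db.execute("ALTER TABLE chunk ADD COLUMN observed_at TEXT")

matches = db.execute(
    "SELECT run FROM chunk WHERE band = ? AND version = ? ORDER BY run", ("r", 2)
).fetchall()

# Pivot: aggregate at the server and return a summary rather than the chunks.
pivot = db.execute(
    "SELECT band, version, COUNT(*) FROM chunk "
    "GROUP BY band, version ORDER BY band, version"
).fetchall()

print(matches)
print(pivot)
```

The pivot query returns four summary rows instead of twelve payloads, a small-scale version of moving the search close to the data and sending fewer bytes back.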


DB Centric Examples

• TerraServer
  – All images and all data in the database (chunked as small tiles).
  – www.TerraServer.Microsoft.com/
  – http://research.microsoft.com/~gray/Papers/MSR_TR_99_29_TerraServer.doc
• SkyServer & Virtual Sky
  – Both image and semantic data in a relational store.
  – Parallel search & NonProcedural access are important.
  – http://research.microsoft.com/~gray/Papers/MS_TR_99_30_Sloan_Digital_Sky_Survey.doc
  – http://dart.pha.jhu.edu/sdss/getMosaic.asp?Z=1&A=1&T=4&H=1&S=10&M=30
  – http://virtualsky.org/servlet/Page?F=3&RA=16h+10m+1.0s&DE=%2B0d+42m+45s&T=4&P=12&S=10&X=5096&Y=4121&W=4&Z=-1&tile.2.1.x=55&tile.2.1.y=20


OK… Why don’t they use our stuff?

• Wrong metaphor: HDF with hyper-slab is better match.

• Impedance mismatch: getting stuff in/out of DB is too hard

• We sold them OODBs and they did not work (unreliable, poor performance, no tools).

• …


So, why will the future be different?

• They have MUCH more data (10^8 files?)

• Java / C# eases impedance mismatch: rowsets == ragged arrays.

• Tools are better
  – Optimizers are better
  – CPU and disk parallelism actually works now
  – Statistical packages are better.


Outline

• Store Everything
• Online (Disk not Tape)
• In a Database


But… The title of the talk was… “The Future of Distributed Database Systems”

Nobody wants to share his database.

Blocks, files, tables are the wrong abstraction for networks (too low level).

“Objects are the right abstraction”

So, UDDI / WSDL / SOAP is the solution (not SQL)

XML is the wire format, XLANG is the workflow protocol, Query will be in there somewhere.


DDB technology GREAT in a Cluster

• Uniform architecture

• Trust among nodes

• High bandwidth-low latency communication

• Programs have single system image

• Queries run in parallel

• Global optimizer does query decomposition


But in a Distributed System

• Heterogeneous architecture makes query planning much harder

• No trust

• Communication is slow and expensive (minimize it).

Higher level abstraction to minimize round trips
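
The point about round trips can be illustrated without a network. In the sketch below a hypothetical `RemoteCatalog` class counts calls in place of real messages; a client that pulls rows one at a time is compared with one that sends a single higher-level request and lets the server filter and aggregate.

```python
class RemoteCatalog:
    """Stand-in for a remote service; counts round trips instead of using a network."""

    def __init__(self, rows):
        self._rows = rows
        self.round_trips = 0

    def get_ids(self):
        self.round_trips += 1
        return list(self._rows)

    def get_row(self, row_id):
        self.round_trips += 1
        return self._rows[row_id]

    def total_where(self, min_size):
        """Higher-level request: filter and sum at the server, return one number."""
        self.round_trips += 1
        return sum(size for size in self._rows.values() if size >= min_size)

rows = {i: i * 10 for i in range(100)}

# Low-level interface: one call for the ids, then one call per row.
chatty = RemoteCatalog(rows)
sizes = (chatty.get_row(row_id) for row_id in chatty.get_ids())
chatty_total = sum(size for size in sizes if size >= 500)

# Higher-level interface: a single request carries the whole question.
batched = RemoteCatalog(rows)
batched_total = batched.total_where(500)

print(chatty.round_trips, batched.round_trips)
```

Both clients get the same answer, but the first pays one round trip per row. When each round trip is slow and expensive, that difference dominates everything else.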


DDB the Trust Issue

DDB Grocery
• Customers serve themselves
• Follow the rules posted on the door
• No Overhead, no staff!

Client/Server Groceries
• Clerks serve Customers
• Take order, fill order, fill out invoice, collect money.
• Overhead: staff, training, rules,