What's Next for Database?
Jim Gray, Microsoft
http://research.microsoft.com/~Gray
Keynote ▪ 30 September 2005 ▪ 9:00
Outline
Looking at the past: old problems now look easy
Looking forward: data avalanche here; integrate ALL kinds of data
Watershed: The new world
  Programs + data: Info Ecosystem
  All data classes (Objectifying Information)
  Approximate answers
Old Problems Now Look Easy
1985 goal: 1,000 transactions per second
  Couldn’t do it at the time
At the time:
  100 transactions/second
  $50M for the computer (y2005 dollars)
Now: easy
  Laptop does 8,200 debit-credit tps
  ~$400 desktop
Source: Thousands of DebitCredit Transactions-Per-Second: Easy and Inexpensive, Gray & Levine, MSR-TR-2005-39, ftp://ftp.research.microsoft.com/pub/tr/TR-2005-39.doc
Hardware & Software Progress
Throughput 2x per 2 years; tracks MHz
[Chart: X86 & X64 tpmC per CPU over time, 1995 to 2006, log scale from 100 to 100,000 tpmC/cpu. 30x in 10 years; 41%/year; doubles every 2 years.]
[Chart: X86 & X64 tpmC per MHz over time, 1995 to 2006, scale from 0 to 20.]
Throughput/$ 2x per 1.5 years: 40%/y hardware, 20%/y software
[Chart: TPC-A and TPC-C tps/$ trends, 1990 to 2004, log scale from 0.01 to 1000 throughput/k$. ~100x in 10 years; ~2x per 1.5 years. No obvious end in sight!]
Source: A Measure of Transaction Processing 20 Years Later, ftp://ftp.research.microsoft.com/pub/tr/TR-2005-57.doc, IEEE Data Engineering Bulletin, V. 28.2, pp. 3-4, June 2005
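The growth figures quoted with these charts can be cross-checked with a few lines of compound-growth arithmetic. The sketch below is only an illustration; the function and variable names are assumptions made for this example and do not come from the talk.

```python
import math


def annual_rate(doubling_years: float) -> float:
    """Compound annual growth rate implied by doubling every `doubling_years` years."""
    return 2 ** (1 / doubling_years) - 1


def growth_over(years: float, doubling_years: float) -> float:
    """Total growth factor after `years` when doubling every `doubling_years` years."""
    return 2 ** (years / doubling_years)


def doubling_time(rate: float) -> float:
    """Years needed to double at a compound annual growth rate `rate`."""
    return math.log(2) / math.log(1 + rate)


# Throughput per CPU: doubling every 2 years.
per_cpu_rate = annual_rate(2.0)           # roughly 41% per year
per_cpu_decade = growth_over(10, 2.0)     # 2**5 = 32, i.e. "about 30x in 10 years"

# Throughput per dollar: doubling every 1.5 years.
per_dollar_decade = growth_over(10, 1.5)  # roughly 100x in 10 years

# 40%/y from hardware compounded with 20%/y from software.
combined_rate = 1.40 * 1.20 - 1           # 68% per year
combined_doubling = doubling_time(combined_rate)
```

Under these assumptions the combined 40% and 20% rates double in roughly 1.3 years, slightly faster than the rounded "2x per 1.5 years" on the slide.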
100x Improvement Every Decade
$1B job becomes $10M job
$1M job becomes $10K job
Terabytes common now (~$500 today)
Petabytes in a decade.
Challenge: We can capture & store everything.
  What’s interesting?
  What can you tell me about X?
Q: How much is “Everything”? A: About 15 exabytes
Q: How much is digital? A: 70% and growing
Q: Where does it come from? A: Video, voice, sensors, …
Q: How fast is it growing? A: Growing 10%/y now, 55%/y when ALL digital

Information Growth vs Storage Media
Medium     PB/y    CAG
print       0.2     2%
film        427     4%
video       300     5%
computer  1,693    55%
Source: Lyman & Varian, “How Much Information”: as of 2003 http://www.sims.berkeley.edu/research/projects/how-much-info/
Where is the Data? Smart Objects Everywhere
Phones, PDAs, cameras, … have small DBs.
Disk drives have enough CPU and memory to run a full-blown DBMS.
All these devices want and need to share data.
  Need a simple-but-complete DBMS
  They need an Esperanto: a data exchange language and paradigm.
Billions of Clients, Millions of Servers
The Perfect System
Knows everything
Knows what you want to know
Tells you the answer…
  in an easy-to-understand way;
  just before you ask
Tells you what you should have asked
And…
  It is inexpensive to buy
  It is inexpensive to own.
Well, maybe not everyone wants this… but every organization does.
Oh! And the PEOPLE COSTS are HUGE!
People costs have always exceeded IT capital.
But now that hardware is “free” …
Self-managing, self-configuring, self-healing, self-organizing and … is a key goal.
No DBAs for cell phones or cameras.
Requires
  Clear and simple knobs on modules
  Software manages these knobs
Our Challenge
Capture, Store, Organize, Search, Display all information:
  Personal
  Organizational
  Societal
There is a huge gap between what we have today and what we need.
  Data capture is relatively easy
  Curate, Organize, Search, Display still too hard.
DBMS Re-conceptualization
Re-Unification of Programs & Data
Allows Objectification of Information, e.g.:
  What is a gene? What properties & methods?
  What is a person? What properties & methods?
  What is an X? What properties & methods?
Need to “glue” all these models together
  Time, space, text, … are core types
  Person, event, document, gene, … are extensions.
  The “Action” is in these extensions.
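One way to picture "core types plus extensions" is a class whose properties are built from text, time and space values and whose methods carry the domain logic. The sketch below is a hypothetical illustration: `Point`, `Person`, `age_on` and the sample values are assumptions made for this example, not anything defined in the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Point:
    """Stand-in for a core spatial type (hypothetical)."""
    lat: float
    lon: float


@dataclass
class Person:
    """An extension type assembled from core types."""
    name: str    # text
    born: date   # time
    home: Point  # space

    def age_on(self, day: date) -> int:
        """Whole years between `born` and `day`."""
        years = day.year - self.born.year
        # Subtract one if the birthday has not yet occurred in that year.
        if (day.month, day.day) < (self.born.month, self.born.day):
            years -= 1
        return years


# Made-up sample values for illustration only.
example = Person("Example Person", date(1980, 6, 15), Point(47.6, -122.3))
age_before_birthday = example.age_on(date(2005, 6, 14))
age_on_birthday = example.age_on(date(2005, 6, 15))
```

The properties answer "what is a person?" and the method answers "what can you ask of one?", which is where the slide says the action is.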
Code and Data: Separated at Birth
COBOL divisions:
  IDENTIFICATION: document
  ENVIRONMENT: OS
  DATA: Files/Records
  PROCEDURE: code
AUTHOR, PROGRAM-ID, INSTALLATION, SOURCE-COMPUTER, OBJECT-COMPUTER, SPECIAL-NAMES, FILE-CONTROL, I-O-CONTROL, DATE-WRITTEN, DATE-COMPILED, SECURITY.
CONFIGURATION SECTION. INPUT-OUTPUT SECTION.
FILE SECTION. WORKING-STORAGE SECTION. LINKAGE SECTION. REPORT SECTION. SCREEN SECTION.
CODASYL - DBTG (COnference on DAta SYstems Languages, Data Base Task Group)
  Defined DDL for a network data model
  Set-Relationship semantics
  Cursor Verbs
Isolated from procedures.
No encapsulation
[Diagram labels: “knowledge”, “data”]
The Object-Relational World: marry programming languages and DBMSs
Stored procedures evolve to “real” languages: VB, Java, C#, … with real object models.
Data encapsulated: a class with methods
Tables are enumerable & indexable record sets with foreign keys
Records are vectors of objects
Opaque or transparent types
Set operators on transparent classes
Transactions:
  Preserve invariants
  A composition strategy
  An exception strategy
Ends Inside-DB Outside-DB dichotomy
Niklaus Wirth: Programs = Algorithms + Data Structures
[Diagram label: Business Objects]
Ask not “How to add objects to databases?”, ask “What kind of object is a database?”
Q: Given an object model, what is a DB?
A: DataSet class and methods (nested relation with metadata)
The basis for the ecosystem:
  Distributed DB
  Extensible DB
  Interoperable DB
  …
Implicit in ODBC, OleDB; explicit within the DBMS ecosystem
Input: Command (any language)
Output: Dataset (tables, or text, or cube, or …)
[Diagram: Question in, Dataset out]
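The "command in, DataSet out" contract can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the `DataSet` class, the `execute` helper and the `Books` sample table are hypothetical names for this example, and the result here is a flat relation with column metadata rather than the nested relation the slide describes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DataSet:
    """Rows plus the metadata needed to interpret them (hypothetical class)."""
    columns: list  # metadata: column names
    rows: list     # data: one tuple per row

    def to_dicts(self):
        """Return each row as a column-name -> value mapping."""
        return [dict(zip(self.columns, row)) for row in self.rows]


def execute(conn, command, params=()):
    """Run a command and wrap whatever comes back in a DataSet."""
    cur = conn.execute(command, params)
    # cursor.description is None for statements that return no rows.
    columns = [d[0] for d in cur.description] if cur.description else []
    return DataSet(columns, cur.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("create table Books (BookID integer primary key, Title text)")
conn.executemany(
    "insert into Books (Title) values (?)",
    [("First sample title",), ("Second sample title",)],
)
result = execute(conn, "select BookID, Title from Books order by BookID")
```

Because the caller only sees a `DataSet`, the same interface could in principle front a table, a text search or a cube, which is the point of the slide.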
DB System Architecture
The classic DBMS model: OS, records, sets, utilities
Added:
  + Text, Time, Space
  + Triggers and queues
  + Replication, Pub/sub
  + Extract-Transform-Load
  + Cubes, Data mining
  + XML, XQuery
  + Programming Languages
  + Many more extensions coming
[Diagram: the classic core (OS, records, sets, utilities) with extensions: Replication, ETL, Text, Cubes, Data Mine, Time, Space, Notification, Procedures, Queues, XML, …]
A Mess? But applications need to query other data types.
Evolving to be Information Services Container
Develop, deploy, and execution environment
Classic ++
  + Programming Languages
  + Triggers and queues
  + Replication, Pub/sub
  + Extract-Transform-Load
  + Text, Time, Space
  + Cubes, Data mining
  + XML, XQuery
  + Many more extensions coming
DBMS is an ecosystem
OO is the key structuring strategy:
  Everything is a class
  Database is a complex object
  Core object is DataSet
  Classes publish/consume them
  Depends on strong Object Model
[Diagram labels: OS, records, sets, utilities, DataSet]
What’s Outside?
[Diagram labels: Internet, Applications, Our API, catalogs, Query Processor, iterators, Buffer Pool, data, Remote Node, Other us, Competitor1, Competitor2]
Classic: What’s Outside? Three Tier Computing
Clients gather input, do presentation, do some workflow (script)
Send high-level requests to ORB (Object Request Broker)
ORB dispatches workflows, orchestrates flows & queues
Workflows invoke business objects
Business objects read/write database
[Diagram tiers: Presentation, workflows, Business Objects, Databases]
DBMS is Web Service! Client/server is back; the revenge of TP-lite
Web servers and runtimes (Apache, IIS, J2EE, .NET) displaced TP monitors & ORBs
  Give persistent objects
  Holistic programming model & environment
Web services (SOAP, WSDL, XML) are displacing current brokers
DBMS listening to port 80, publishing WSDL, DISCO, WS-Sec; servicing SOAP calls.
DBMS is a web service
  Basis for distributed systems.
  A consequence of OR DBMS
[Diagram tiers: Presentation, workflows, Business Objects, Databases, DBMS]
Queues & Workflows
Apps are loosely connected via queued messages
Queues are databases.
Basis for workflow
Queues: the first class to add to an OR DBMS
Queues fire triggers.
Active databases
Synergy with DBMS: security, naming, persistence, types, query, …
Workflow: Script, Execute, Administer & Expedite; all built on queues
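"Queues are databases" and "queues fire triggers" can be illustrated with an ordinary table and an insert trigger. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in; the table names, trigger name and helper functions are assumptions made for this example, and a production queue would also need locking and poison-message handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table OrderQueue (
    id      integer primary key autoincrement,
    payload text not null
);
create table AuditLog (queue_id integer, note text);

-- The queue fires a trigger: every enqueue leaves an audit record.
create trigger OnEnqueue after insert on OrderQueue
begin
    insert into AuditLog values (new.id, 'enqueued: ' || new.payload);
end;
""")


def enqueue(conn, payload):
    """Add a message; the insert and the trigger's work commit together."""
    with conn:
        conn.execute("insert into OrderQueue (payload) values (?)", (payload,))


def dequeue(conn):
    """Remove and return the oldest message, or None if the queue is empty."""
    with conn:
        row = conn.execute(
            "select id, payload from OrderQueue order by id limit 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("delete from OrderQueue where id = ?", (row[0],))
        return row[1]


enqueue(conn, "order-1")
enqueue(conn, "order-2")
first = dequeue(conn)
audit_count = conn.execute("select count(*) from AuditLog").fetchone()[0]
```

Keeping the queue in the database means it inherits the same security, naming, persistence and query facilities the slide lists as synergy.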
What’s new here?
DBMS have tight integration with language classes (Java, C#, VB, …)
The DB is a class
You can add classes to DB.
Adding indices is “easy” if you have a new idea.
Now have solid queue systems
Adding workflow is “easy” if you have a new idea.
This is a vehicle for publishing data on the Web.
[Diagram: Question in, Dataset out (tables, or text, or cube, or …), published over the Internet as a Web service]
Text, Temporal, and Spatial Data Access
Q: What comes after queues?
A: Basic types: text, time, space, …
Great application of OR technology
Key idea: table-valued functions == indices
  An index is a table, organized differently
  Query executor uses index to map: Key → set (aka sequence of rows)
  Table-valued function can do this map
  Optimizer can use it.
  + extras: cost function, cardinality, …
BIG DEAL: Approximate answers: Rank and Support
select Title, Abstract, T.Rank
from Books join FreeTextTable(Title, Abstract, 'XML semistructured') T
  on BookID = T.Key

select store, holiday, sum(sales)
from Sales join HolidayDates(2004) T
  on Sales.day = T.day
group by store, holiday

select galaxy, distance
from GetNearbyObjEQ(22,37)
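The HolidayDates query above joins a table against a function that returns rows. A rough Python analogue is sketched below, assuming a made-up `holiday_dates` generator and sample sales rows; it is not how any particular DBMS implements table-valued functions, only an illustration of the "key → set of rows" mapping.

```python
from collections import defaultdict


def holiday_dates(year):
    """Hypothetical table-valued function: yields one row per holiday."""
    yield {"day": f"{year}-01-01", "holiday": "New Year"}
    yield {"day": f"{year}-07-04", "holiday": "Independence Day"}
    yield {"day": f"{year}-12-25", "holiday": "Christmas"}


def build_index(rows, key):
    """An index is the same table organized differently: key -> list of rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def sales_by_store_and_holiday(sales_rows, year):
    """Join sales to holidays on day, then sum sales per (store, holiday)."""
    by_day = build_index(sales_rows, "day")
    totals = defaultdict(float)
    for h in holiday_dates(year):
        for s in by_day.get(h["day"], []):
            totals[(s["store"], h["holiday"])] += s["sales"]
    return dict(totals)


# Made-up sample data for illustration only.
sales = [
    {"store": "A", "day": "2004-07-04", "sales": 100.0},
    {"store": "A", "day": "2004-07-04", "sales": 50.0},
    {"store": "B", "day": "2004-12-25", "sales": 70.0},
    {"store": "B", "day": "2004-03-15", "sales": 999.0},  # not a holiday
]
totals = sales_by_store_and_holiday(sales, 2004)
```

Both `holiday_dates` and `build_index` produce something the join can probe by key, which is the sense in which the slide equates table-valued functions with indices.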
Data Mining and Machine Learning
Tasks: classification, association, prediction
Tools: decision trees, Bayes, Apriori, clustering, regression, neural net, …
  now unified with DBs
Create table T (x,y,z,u,v,w)
  Learn “x,y,z” from “u,v,w” using <algorithm>
Train T with data.
Then can ask:
  Probability x,y,z,u,v,w
  What are the u,v,w probabilities given x,y,z
Example: Learn height from age.
Anyone with a data mining algorithm has full access to the DBMS infrastructure.
Challenge: Better learning algorithms.
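The "learn height from age" example can be sketched with ordinary least-squares regression, one of the tools the slide lists. The training rows below are made up (and deliberately exactly linear), and `fit_line` and `predict` are hypothetical helper names; the slide does not say which algorithm a DBMS would use.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training table T(age_years, height_cm), for illustration only.
training_rows = [(2, 86.0), (4, 100.0), (6, 114.0), (8, 128.0)]
ages = [age for age, _ in training_rows]
heights = [height for _, height in training_rows]

model = fit_line(ages, heights)       # "train T with data"
height_at_10 = predict(model, 10)     # "then can ask"
```

Training and prediction map onto the slide's "train T with data" and "then can ask" steps; a probabilistic model would return a distribution rather than a single number.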
Notification: Stream and Sensor Processing
Traditionally: query billions of facts
Streams: millions of queries, one new fact
  New protein compared to all DNA
  Change in price or time
Implications
  New aggregation operators (extension)
  New programming style
Streams in products:
  Queries represented as records
  New query optimizations.
Sensor networks push queries out to sensors.
  Simpler programming model
  Optimizes power & bandwidth
[Diagram labels: facts, Q?, A!; Q, fact, fact, fact…, Notification]
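The inversion described here, many stored queries matched against each arriving fact, can be sketched by treating queries as records and indexing them. Everything in the block below (the `StandingQuery` fields, the `QueryTable` class, the symbols and prices) is a hypothetical illustration rather than a description of any product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StandingQuery:
    """A query stored as a record: notify `subscriber` when price >= min_price."""
    subscriber: str
    symbol: str
    min_price: float


class QueryTable:
    """Stored queries, indexed by symbol so one fact probes few queries."""

    def __init__(self):
        self._by_symbol = defaultdict(list)

    def register(self, query):
        self._by_symbol[query.symbol].append(query)

    def match(self, fact):
        """Return the subscribers whose standing queries the new fact satisfies."""
        candidates = self._by_symbol.get(fact["symbol"], [])
        return [q.subscriber for q in candidates if fact["price"] >= q.min_price]


table = QueryTable()
table.register(StandingQuery("alice", "AAA", 30.0))
table.register(StandingQuery("bob", "AAA", 25.0))
table.register(StandingQuery("carol", "BBB", 80.0))

notified = table.match({"symbol": "AAA", "price": 27.0})
```

Indexing the queries instead of the facts is one simple instance of the "new query optimizations" the slide mentions.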
Semi-Structured Data
“Everyone starts with the same schema: <stuff/>. Then they refine it.” J. Widom
“Strong schema” has pros and cons.
Files <stuff/> and XML <<foo/> <bar/>> are here to stay. Get over it!
File directories are databases;
  Pivot on any attribute
  Folders are standing queries.
  Freetext+schema search (better precision/recall)
Cohabit with row-stores
Publish-Subscribe, Replication, Extract-Transform-Load (ETL)
Data has many users
  Replicas for availability and/or performance
  Mobile users do local updates, synchronize later.
Classic warehouse
  Replicate to data warehouse
  Data marts subscribe to publications
Disaster recovery geoplex
ETL is a major application & component
  Data loading
  Data scrubbing
  Publish/subscribe workflows.
Key to data integration (capture / scrub)
Restatement: DB systems evolved to be containers for information services
Develop, deploy, and execution environment
DBMS is an ecosystem
Key structuring strategy:
  Everything is a class
  Database is a complex object
  Core object is DataSet
Approximate answers
This architecture lets you add your new ideas.
[Diagram labels: OS, records, sets, utilities, DataSet]
Summary:
Looking at the past: old problems now look easy
Looking forward: data avalanche here; integrate ALL kinds of data
Watershed: The new world
  Programs + data: Info Ecosystem
  All data classes (Objectifying Information)
  Approximate answers
Additional Resources
Papers at: http://research.microsoft.com/~gray/JimGrayPublications.htm
Talks at: http://research.microsoft.com/~gray/JimGrayTalks.htm
Basis for this talk: “The Revolution in Database Architecture” http://research.microsoft.com/research/pubs/view.aspx?tr_id=735
Very interesting & related: David Campbell, “Service Oriented Database Architecture: App Server-Lite?” http://research.microsoft.com/research/pubs/view.aspx?tr_id=983
Thank you!
Thank you for attending this session and the 2005 PASS Community Summit in Grapevine!