SIGMOD ‘97 Industrial Session 5 Standard Benchmarks for Database Systems Chair: Dave DeWitt (Jim Gray, in absentia)





SIGMOD '97 Industrial Session 5

Standard Benchmarks for Database Systems

Chair: Dave DeWitt

(Jim Gray, in absentia)


TPC-C: The OLTP Benchmark

Charles Levine, Microsoft

[email protected]


Benchmarks: What and Why

What is a benchmark?
- Domain specific; no single metric is possible. The more general the benchmark, the less useful it is for anything in particular.
- A benchmark is a distillation of the essential attributes of a workload.

Desirable attributes:
- Relevant: meaningful within the target domain
- Understandable
- Good metric(s): linear, orthogonal, monotonic
- Scaleable: applicable to a broad spectrum of hardware/architectures
- Coverage: does not oversimplify the typical environment
- Acceptance: vendors and users embrace it


Benefits and Liabilities

Good benchmarks:
- Define the playing field
- Accelerate progress: engineers do a great job once the objective is measurable and repeatable
- Set the performance agenda: measure release-to-release progress; set goals (e.g., 10,000 tpmC, < 50 $/tpmC); something managers can understand (!)

Benchmark abuse:
- Benchmarketing
- Benchmark wars: more $ on ads than development


Benchmarks have a Lifetime

Good benchmarks drive industry and technology forward. At some point, all reasonable advances have been made, and benchmarks can become counterproductive by encouraging artificial optimizations. So, even good benchmarks become obsolete over time.


What is the TPC?

TPC = Transaction Processing Performance Council
- Founded in Aug/88 by Omri Serlin and 8 vendors
- Membership of 40-45 for the last several years; everybody who's anybody in software & hardware
- De facto industry standards body for OLTP performance

Administered by: Shanley Public Relations, 777 N. First St., Suite 600, San Jose, CA 95112-6311; ph: (408) 295-8894; fax: (408) 295-9768; email: [email protected]

Most TPC specs, info, and results are on the web page: http://www.tpc.org


TPC-C Overview

- Moderately complex OLTP
- The result of 2+ years of development by the TPC
- Application models a wholesale supplier managing orders
- Order-entry provides a conceptual model for the benchmark; underlying components are typical of any OLTP system
- Workload consists of five transaction types
- Users and database scale linearly with throughput
- Spec defines full-screen end-user interface
- Metrics are new-order txn rate (tpmC) and price/performance ($/tpmC)
- Specification was approved July 23, 1992


TPC-C's Five Transactions

OLTP transactions:
- New-order: enter a new order from a customer
- Payment: update customer balance to reflect a payment
- Delivery: deliver orders (done as a batch transaction)
- Order-status: retrieve status of customer's most recent order
- Stock-level: monitor warehouse inventory

Transactions operate against a database of nine tables. Transactions do update, insert, delete, and abort, with primary and secondary key access. Response time requirement: 90% of each type of transaction must have a response time of at most 5 seconds, except Stock-Level, which is 20 seconds.


TPC-C Database Schema

Table cardinalities, as a function of the number of warehouses W (the original diagram draws one-to-many relationships from Warehouse down through District, Customer, and Order to Order-Line, and marks secondary indexes):
- Warehouse: W
- District: W*10 (10 per warehouse)
- Customer: W*30K (3K per district)
- History: W*30K+ (1+ per customer)
- Order: W*30K+ (1+ per customer)
- New-Order: W*5K (0-1 per order)
- Order-Line: W*300K+ (10-15 per order)
- Stock: W*100K (100K per warehouse, W per item)
- Item: 100K (fixed)


TPC-C Workflow

1. Select txn from menu: New-Order 45%, Payment 43%, Order-Status 4%, Delivery 4%, Stock-Level 4%. (Measure menu Response Time.)
2. Input screen; keying time. (Measure txn Response Time.)
3. Output screen; think time. Go back to 1.

Cycle Time Decomposition (typical values, in seconds, for weighted average txn):
- Menu = 0.3
- Keying = 9.6
- Txn RT = 2.1
- Think = 11.4
- Average cycle time = 23.4


Data Skew

NURand - Non-Uniform Random:

  NURand(A,x,y) = (((random(0,A) | random(x,y)) + C) % (y-x+1)) + x

- Customer Last Name: NURand(255, 0, 999)
- Customer ID: NURand(1023, 1, 3000)
- Item ID: NURand(8191, 1, 100000)

The bitwise OR of two random values skews the distribution toward values with more bits on: there is a 75% chance that a given bit is one (1 - 1/2 * 1/2). The skewed data pattern repeats with the period of the smaller random number.
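A minimal Python sketch of the NURand formula above; the constant C here is an assumption (a per-field run-time constant chosen once from [0, A]):

```python
import random

def nurand(a, x, y, c):
    """TPC-C non-uniform random: OR two uniforms, shift by c, wrap into [x, y]."""
    return (((random.randint(0, a) | random.randint(x, y)) + c) % (y - x + 1)) + x

# Customer IDs in [1, 3000] with A = 1023; C is fixed per field for a run.
C = random.randint(0, 1023)
samples = [nurand(1023, 1, 3000, C) for _ in range(100_000)]
assert all(1 <= s <= 3000 for s in samples)
```

A histogram of `samples` shows the repeating hot/cold pattern: the OR turns on roughly 75% of the bits in the low-order A range before the modulo wraps it into [x, y].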


NURand Distribution

[Chart: TPC-C NURand function, relative frequency of access (0 to 0.1) for each record identity in [0..255], with the cumulative distribution overlaid.]


ACID Tests

TPC-C requires that transactions be ACID. Tests are included to demonstrate that the ACID properties are met:
- Atomicity: verify that all changes within a transaction commit or abort.
- Consistency
- Isolation: ANSI repeatable reads for all but Stock-Level transactions; committed reads for Stock-Level.
- Durability: must demonstrate recovery from loss of power, loss of memory, and loss of media (e.g., disk crash).


Transparency

TPC-C requires that all data partitioning be fully transparent to the application code (see TPC-C Clause 1.6):
- Both horizontal and vertical partitioning are allowed
- All partitioning must be hidden from the application

Most DBMSs do this today for single-node horizontal partitioning. Much harder: multiple-node transparency.

For example, in a two-node cluster with warehouses 1-100 on Node A and 101-200 on Node B:

  Node A: select * from warehouse where W_ID = 150
  Node B: select * from warehouse where W_ID = 77

Any DML operation must be able to operate against the entire database, regardless of physical location.


Transparency (cont.)

How does transparency affect TPC-C?
- Payment txn: 15% of Customer table records are non-local to the home warehouse.
- New-order txn: 1% of Stock table records are non-local to the home warehouse.

In a distributed cluster, the cross-warehouse traffic causes cross-node traffic and either 2-phase commit, distributed lock management, or both.

For example, with distributed txns:

  Number of nodes    % Network Txns
  1                  0
  2                  5.5
  3                  7.3
  n                  10.9
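The table's values can be roughly reproduced from the transaction mix. This is a sketch under assumptions (43% Payment with 15% remote customers, 45% New-Order with ~10 order lines at 1% remote stock each, remote warehouses spread uniformly across nodes), not the spec's exact derivation:

```python
def pct_network_txns(n_nodes, lines_per_order=10):
    # Payment (43% of the mix): 15% of customer accesses are remote.
    payment = 0.43 * 0.15
    # New-Order (45% of the mix): chance that at least one of the
    # ~10 order lines hits a remote warehouse's stock row.
    new_order = 0.45 * (1 - 0.99 ** lines_per_order)
    # A remote warehouse sits on a different node with probability (n-1)/n.
    return 100 * (payment + new_order) * (n_nodes - 1) / n_nodes

for n in (1, 2, 3, 1000):
    print(n, round(pct_network_txns(n), 1))
```

For n = 2 and 3 this lands near the slide's 5.5% and 7.3%, and the (n-1)/n factor explains why the figure saturates near 10.9% for large n.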


TPC-C Rules of Thumb

- 1.2 tpmC per user/terminal (maximum)
- 10 terminals per warehouse (fixed)
- 65-70 MB/tpmC priced disk capacity (minimum)
- ~0.5 physical IOs/sec/tpmC (typical)
- 250-700 KB main memory/tpmC (how much $ do you have?)

So use the rules of thumb to size a 10,000 tpmC system:
- How many terminals?  8,340 = 10,000 / 1.2
- How many warehouses?  834 = 8,340 / 10
- How much memory?  2.5 - 7 GB
- How much disk capacity?  650 GB = 10,000 * 65 MB
- How many spindles?  Depends on MB capacity vs. physical IO. Capacity: 650 / 8 = 82 spindles. IO: 10,000 * 0.5 / 82 = 61 IO/sec per spindle -- TOO HOT!
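The sizing walk-through above is just arithmetic; a hypothetical helper (the 8 GB-per-spindle divisor is read off the slide's 650/8 step):

```python
def size_tpcc_system(tpmc, mb_per_tpmc=65, gb_per_spindle=8):
    terminals = tpmc / 1.2                   # 1.2 tpmC per terminal (max)
    warehouses = terminals / 10              # 10 terminals per warehouse
    disk_gb = tpmc * mb_per_tpmc / 1000      # priced disk capacity (min)
    spindles = disk_gb / gb_per_spindle      # capacity-sized spindle count
    io_per_spindle = tpmc * 0.5 / spindles   # ~0.5 physical IO/s per tpmC
    return terminals, warehouses, disk_gb, spindles, io_per_spindle

t, w, d, s, io = size_tpcc_system(10_000)
print(f"{t:.0f} terminals, {w:.0f} warehouses, {d:.0f} GB over "
      f"{s:.0f} spindles, {io:.0f} IO/s per spindle")
```

At 10,000 tpmC the capacity-sized ~82 spindles would each see ~61 IO/s, the "TOO HOT" case: in practice you buy spindles for IO rate, not capacity.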


Typical TPC-C Configuration (Conceptual)

Driver System --[Term. LAN]--> Client --[C/S LAN]--> Database Server

- Driver System (hardware + software): emulated user load, generated by an RTE, e.g., Empower, preVue, LoadRunner. Response Time is measured here.
- Client: presentation services; TPC-C application + txn monitor and/or database RPC library, e.g., Tuxedo, ODBC.
- Database Server: database functions; TPC-C application (stored procedures) + database engine + txn monitor, e.g., SQL Server, Tuxedo.


Competitive TPC-C Configuration Today

8,070 tpmC; $57.66/tpmC; 5-yr COO = 465 K$; 2 GB memory; disks: 37 x 4 GB + 48 x 9.1 GB (560 GB total); 6,700 users


TPC-C Current Results

- Best performance is 30,390 tpmC @ $305/tpmC (Digital)
- Best price/perf. is 7,693 tpmC @ $42.53/tpmC (Dell)

[Scatter plot: Price/Performance ($/tpmC) vs. Throughput (tpmC) for Compaq, Dell, Digital, HP, IBM, NCR, SGI, and Sun. TPC-C results as of 5/9/97.]


TPC-C Results (by OS)

[Scatter plot: Price/Performance ($/tpmC) vs. Throughput (tpmC), Unix vs. Windows NT. TPC-C results as of 5/9/97.]


TPC-C Results (by DBMS)

[Scatter plot: Price/Performance ($/tpmC) vs. Throughput (tpmC) for Informix, Microsoft, Oracle, and Sybase. TPC-C results as of 5/9/97.]


Analysis from 30,000 ft.

- Unix results are 2-3x more expensive than NT, regardless of DBMS
- Unix results are more scalable: Unix on 10-, 12-, 16-, and 24-way SMPs; NT on 4-way Intel SMPs and 8-way SMPs on Digital Alpha
- Highest performance is on clusters: only a few results (trophy numbers?)


TPC-C Summary

- Balanced, representative OLTP mix: five transaction types; database intensive, with substantial IO and cache load
- Scaleable workload
- Complex data: data attributes, size, skew
- Requires transparency and ACID properties
- Full-screen presentation services
- De facto standard for OLTP performance


Reference Material

- TPC web site: www.tpc.org
- TPC results database: www.microsoft.com/sql/tpc
- IDEAS web site: www.ideasinternational.com
- Jim Gray, The Benchmark Handbook for Database and Transaction Processing Systems, Morgan Kaufmann, San Mateo, CA, 1991.
- Raj Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, John Wiley & Sons, New York, 1991.
- William Highleyman, Performance Analysis of Transaction Processing Systems, Prentice Hall, Englewood Cliffs, NJ, 1988.


TPC-D: The Industry Standard Decision Support Benchmark

Jack Stephens, Informix

[email protected]


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


TPC-D Overview

- Complex decision support workload
- The result of 5 years of development by the TPC
- Benchmark models ad hoc queries: an extract database with concurrent updates, in a multi-user environment
- Specification was approved April 5, 1995

[Positioning diagram: TPC-A, TPC-B, and TPC-C exercise OLTP transactions in support of business operations; TPC-D exercises DSS queries in support of business analysis.]


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


TPC-D Schema

Table cardinalities (SF = Scale Factor; arrows in the original diagram point in the direction of one-to-many relationships):
- Region: 5
- Nation: 25
- Supplier: SF*10K
- Customer: SF*150K
- Part: SF*200K
- PartSupp: SF*800K
- Order: SF*1500K
- LineItem: SF*6000K
- Time: 2557 (optional; so far, not used by anyone)
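Since every scaled table grows linearly with SF, the row counts are easy to tabulate; a small sketch (the fixed-size Nation and Region tables and the optional Time table are omitted):

```python
BASE_ROWS = {            # rows at SF = 1
    "SUPPLIER": 10_000,
    "CUSTOMER": 150_000,
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "ORDER": 1_500_000,
    "LINEITEM": 6_000_000,
}

def tpcd_row_counts(sf):
    """Row count per table for a given Scale Factor."""
    return {table: rows * sf for table, rows in BASE_ROWS.items()}

# Nominal 100 GB database:
counts = tpcd_row_counts(100)
print(counts["LINEITEM"])   # 600000000 rows in the largest table
```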


Schema Usage

[Matrix slide: queries 1-17 plotted against the columns of each table they touch. Columns per table:]
- PART: partkey, name, mfgr, brand, type, size, container, retailprice, comment
- SUPPLIER: suppkey, name, address, nationkey, phone, acctbal, comment
- PARTSUPP: partkey, suppkey, availqty, supplycost, comment
- CUSTOMER: custkey, name, address, nationkey, phone, acctbal, mktsegment, comment
- LINEITEM: orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment
- ORDER: orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment
- NATION: nationkey, name, regionkey, comment
- REGION: regionkey, name, comment


TPC-D Database Scaling and Load

- Database size is determined from fixed Scale Factors (SF): 1, 10, 30, 100, 300, 1000, 3000, 10000 (note that 3 is missing; not a typo). These correspond to the nominal database size in GB (i.e., SF 10 is approx. 10 GB, not including indexes and temp tables). Indices and temporary tables can significantly increase the total disk capacity (3-5x is typical).
- Database is generated by DBGEN, a C program which is part of the TPC-D spec. Use of DBGEN is strongly recommended; TPC-D database contents must be exact.
- Database load time must be reported. Includes time to create indexes and update statistics; not included in primary metrics.


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


TPC-D Query Set

- 17 queries written in SQL-92 to implement business questions
- Queries are pseudo ad hoc: QGEN replaces substitution parameters with random constant values; no host variables; no static SQL
- Queries cannot be modified -- "SQL as written". There are some minor exceptions; all variants must be approved in advance by the TPC.


Sample Query Definition

2.3 Forecasting Revenue Query (Q6)
This query quantifies the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year. Asking this type of "what if" query can be used to look for ways to increase revenues.

2.3.1 Business Question
The Forecasting Revenue Change Query considers all the lineitems shipped in a given year with discounts between DISCOUNT-0.01 and DISCOUNT+0.01. The query lists the amount by which the total revenue would have decreased if these discounts had been eliminated for lineitems with item quantities less than QUANTITY. Note that the potential revenue increase is equal to the sum of (L_EXTENDEDPRICE * L_DISCOUNT) for all lineitems with discounts and quantities in the qualifying range.

2.3.2 Functional Query Definition

  SELECT SUM(L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE
  FROM LINEITEM
  WHERE L_SHIPDATE >= DATE '[DATE]'
    AND L_SHIPDATE < DATE '[DATE]' + INTERVAL '1' YEAR
    AND L_DISCOUNT BETWEEN [DISCOUNT] - 0.01 AND [DISCOUNT] + 0.01
    AND L_QUANTITY < [QUANTITY]

2.3.3 Substitution Parameters
Values for the following substitution parameters must be generated and used to build the executable query text:
1. DATE is the first of January of a randomly selected year within [1993 .. 1997]
2. DISCOUNT is randomly selected within [0.02 .. 0.09]
3. QUANTITY is randomly selected within [24 .. 25]
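The Q6 parameter rules are simple enough to sketch; `qgen_q6_params` below is a hypothetical stand-in for what QGEN does for this query:

```python
import random

def qgen_q6_params(rng=random):
    """Hypothetical stand-in for QGEN's Q6 substitution-parameter generation."""
    date = f"{rng.randint(1993, 1997)}-01-01"   # Jan 1 of a random year
    discount = rng.randint(2, 9) / 100          # 0.02 .. 0.09
    quantity = rng.randint(24, 25)              # 24 or 25
    return date, discount, quantity

date, discount, quantity = qgen_q6_params()
sql = (f"SELECT SUM(L_EXTENDEDPRICE*L_DISCOUNT) AS REVENUE FROM LINEITEM "
       f"WHERE L_SHIPDATE >= DATE '{date}' "
       f"AND L_SHIPDATE < DATE '{date}' + INTERVAL '1' YEAR "
       f"AND L_DISCOUNT BETWEEN {discount - 0.01:.2f} AND {discount + 0.01:.2f} "
       f"AND L_QUANTITY < {quantity}")
print(sql)
```

For the validation run the spec fixes these to DATE = 1994-01-01, DISCOUNT = 0.06, QUANTITY = 24.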


Sample Query Definition (cont.)

2.3.4 Query Validation
For validation against the qualification database, the query must be executed using the following values for the substitution parameters and must produce the following output:

Values for substitution parameters:
1. DATE = 1994-01-01
2. DISCOUNT = 0.06
3. QUANTITY = 24

Query validation output data (1 row returned):

  | REVENUE     |
  | 11450588.04 |

Query validation demonstrates the integrity of an implementation:
- Query phrasings are run against a 100 MB data set
- The data set must mimic the design of the test database
- Answer sets must match those in the specification almost exactly
- If the answer sets don't match, the benchmark is invalid!


Query Variations

Formal Query Definitions are ISO SQL-92. The EQT (executable query text) must match, except for Minor Query Modifications:
- Date/time syntax
- AS clauses
- Table naming conventions
- Ordinal Group By/Order By
- Statement terminators
- Coding style (i.e., white space)

Any other phrasing must be a Pre-Approved Query Variant:
- Variants must be justifiable based on criteria similar to those in Clause 0.2
- Approved variants are included in the specification

An implementation may use any combination of Pre-Approved Variants, Formal Query Definitions, and Minor Query Modifications.


TPC-D Update Functions

- Updates touch 0.1% of the data per query stream; each runs about as long as a medium-sized TPC-D query
- Implementation of the updates is left to the sponsor, except: ACID properties must be maintained, and the update functions must be a set of logically consistent transactions
- New Sales Update Function (UF1): insert new rows into the ORDER and LINEITEM tables, equal to 0.1% of table size
- Old Sales Update Function (UF2): delete rows from the ORDER and LINEITEM tables, equal to 0.1% of table size


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


TPC-D Execution Rules

Power Test
- Queries submitted in a single stream (i.e., no concurrency)
- Each query set is a permutation of the 17 read-only queries
- Sequence: warm-up, untimed (cache flush; optional run of Query Set 0), then the timed sequence UF1, Query Set 0, UF2

Throughput Test
- Multiple concurrent query streams plus a single update stream
- Sequence: Query Sets 1 through N run concurrently, while the update stream executes UF1/UF2 pairs


TPC-D Execution Rules (cont.)

Load Test
- Measures the time to go from an empty database to reproducible query runs
- Not a primary metric; appears on the executive summary
- Sequence: DBMS initialized and DBGEN run (preparation, untimed); then data loaded, indexes built, and stats gathered (timed), ending ready for queries


TPC-D Metrics

The Power metric (QppD) is based on a geometric mean; the Throughput metric (QthD) on an arithmetic mean. Both metrics represent "queries per gigabyte hour".

  QppD@Size = (3600 * SF) / [ Prod(i=1..17) QI(i,0) * Prod(j=1..2) UI(j,0) ]^(1/19)

where:
  QI(i,0) = timing interval for Query i, stream 0
  UI(j,0) = timing interval for Update j, stream 0
  SF = Scale Factor

  QthD@Size = (S * 17 * 3600 / Ts) * SF

where:
  S = number of query streams
  Ts = elapsed time of the test (in seconds)


TPC-D Metrics (cont.)

Composite Query-Per-Hour Rating (QphD): the Power and Throughput metrics are combined to get the composite queries per hour:

  QphD@Size = sqrt( QppD@Size * QthD@Size )

Reported metrics are:
- Power: QppD@Size
- Throughput: QthD@Size
- Price/Performance: $/QphD@Size

Comparability: results within a size category (SF) are comparable. Comparisons among different size databases are strongly discouraged.
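The two means and the composite can be sketched directly from the definitions; the timing intervals below are made up for illustration:

```python
import math

def qppd(query_intervals, update_intervals, sf):
    # Power: 3600*SF over the geometric mean of the 19 stream-0 intervals.
    intervals = query_intervals + update_intervals
    geo_mean = math.prod(intervals) ** (1 / len(intervals))
    return 3600 * sf / geo_mean

def qthd(num_streams, elapsed_secs, sf):
    # Throughput: S streams of 17 queries each, scaled to per-hour.
    return num_streams * 17 * 3600 / elapsed_secs * sf

def qphd(power, throughput):
    # Composite query-per-hour rating.
    return math.sqrt(power * throughput)

qi = [30.0] * 17          # hypothetical query timing intervals, seconds
ui = [60.0, 60.0]         # hypothetical UF1/UF2 intervals
p = qppd(qi, ui, sf=1)
t = qthd(num_streams=2, elapsed_secs=1800, sf=1)
print(round(p, 1), round(t, 1), round(qphd(p, t), 1))
```

Note how the geometric mean makes QppD sensitive to the slowest single query, while QthD only sees total elapsed time.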


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


Disclosure Requirements

All results must comply with standard TPC disclosure policies:
- Results must be reviewed by a TPC auditor certified for TPC-D
- A Full Disclosure Report and Executive Summary must be on file with the TPC before a result is publicly announced

All results are subject to standard TPC review policies:
- Once filed, results are "in review" for sixty days
- While in review, any member company may file a challenge against a result that they think failed to comply with the specification
- All challenges and compliance issues are handled by the TPC's judiciary, the Technical Advisory Board (TAB), and affirmed by the membership


TPC-D Current Results

[Chart: performance (QppD@100G, QthD@100G, QppD@300G, QthD@300G, QppD@1000G, QthD@1000G) and price/performance ($/QphD@100G, $/QphD@300G, $/QphD@1000G) for results published from 15-Apr-96 through 2-May-97. TPC-D results as of 5/9/97.]


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


Do Good, Do Well, and To-Do

First, the good news...
- TPC-D has improved products: first real quantification of optimizer performance for some vendors
- TPC-D has increased competition

Then some areas that bear watching...
- The workload is maturing; indexing and query fixes are giving way to engineering
- The SMP/MPP price barrier is disappearing, but so is some of the performance difference
- Meta-knowledge of the data is becoming critical: better stats, smarter optimizers, wiser data placement


Things We Missed...

And finally, the trouble spots...
- No metric will please customers, engineers, and marketing managers
- TPC-D has failed to showcase multi-user decision support
- No results yet on 10 GB or 30 GB
- Decision support is moving faster than the TPC: OLAP, data marts, data mining, SQL3, ADTs, Universal {IBM, Informix, Oracle}


Outline

- Overview
- The Database
- The Queries
- The Execution Rules
- The Results
- Early Lessons
- The Future


TPC-D, Version 2: Overview

Goal: define a workload to "take over" from TPC-D 1.x in line with its lifecycle (~2 years from now)

Two areas of focus:
- Address the known deficiencies of the 1.x specification:
  - Introduce data skew
  - Require multi-user executions (What number of streams is interesting? Should updates scale with users? With data volume?)
- Broaden the scope of the query set and data set:
  - "Snowstorm" schema
  - Larger query set
  - Batch and trickle update models


An Extensible TPC Workload?

Make TPC-D extensible, with three types of extensions:
- Query: new questions on the same schema
- Schema: new representations and queries on the same data
- Data: new data types and operators

Simpler adoption model than a full specification:
- Mini-spec presented by three sponsors
- Evaluation period for prototype/refinement (Draft status)
- Acceptance as an extension
- Periodic review for renewal, removal, or promotion to the base workload

The goal is an adaptive workload: more responsive to the market and more inclusive of new technology, without losing comparability or relevance.


Want to Learn More About TPC-D?

- TPC WWW site: www.tpc.org -- the latest specification, tools, and results; the version 2 white paper
- TPC-D Training Video: a six-hour video by the folks who wrote the spec; explains, in detail, all major aspects of the benchmark
- Available from the TPC: Shanley Public Relations, 777 N. First St., Suite 600, San Jose, CA 95112-6311; ph: (408) 295-8894; fax: (408) 295-9768; email: [email protected]