29
Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman (Waterloo) Physical Design: State of Art 1 / 13

Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

  • Upload
    ledieu

  • View
    217

  • Download
    4

Embed Size (px)

Citation preview

Page 1: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Fundamentals of Physical Design:State of Art

David Toman

D. R. Cheriton School of Computer Science

D. Toman (Waterloo) Physical Design: State of Art 1 / 13

Page 2: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Benefits of Database Technology

1 High-level/declarative DML (query) Languages

Physical Data IndependenceAbility to develop/change the physical schema without changingthe conceptual (logical) schema.

⇒ essential to fully realize productivity gains in development.

2 Transactions and Concurrency Control

3 Recovery

4 . . .

D. Toman (Waterloo) Physical Design: State of Art 2 / 13

Page 3: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Benefits of Database Technology

1 High-level/declarative DML (query) Languages

Physical Data IndependenceAbility to develop/change the physical schema without changingthe conceptual (logical) schema.

⇒ essential to fully realize productivity gains in development.

2 Transactions and Concurrency Control

3 Recovery

4 . . .

D. Toman (Waterloo) Physical Design: State of Art 2 / 13

Page 4: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Benefits of Database Technology

1 High-level/declarative DML (query) Languages

Physical Data IndependenceAbility to develop/change the physical schema without changingthe conceptual (logical) schema.

⇒ essential to fully realize productivity gains in development.

2 Transactions and Concurrency Control

3 Recovery

4 . . .

D. Toman (Waterloo) Physical Design: State of Art 2 / 13

Page 5: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Conceptual/Logical vs. Physical Data

IDEA:Insulate Users/Applications

from Physical Design issues. . . essentially ADTs for DATA

Issues:• DDL Languages?

• How are the schematalinked together?

• How to executeDML requests?

ANSI SPARC Architecture

D. Toman (Waterloo) Physical Design: State of Art 3 / 13

Page 6: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Conceptual/Logical vs. Physical Data

IDEA:Insulate Users/Applications

from Physical Design issues. . . essentially ADTs for DATA

Issues:• DDL Languages?

• How are the schematalinked together?

• How to executeDML requests?

ANSI SPARC Architecture

D. Toman (Waterloo) Physical Design: State of Art 3 / 13

Page 7: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Queries vs. Query Processing

(1) User/App Queries:∗ formulated w.r.t. the conceptual/external schema∗ high level (declarative) query languages (SQL, OQL)

logic-based semantics based on satisfaction“does a database D and a tuple t make a query Q true?”

(2) Query PLANS:∗ formulated w.r.t. physical schema∗ low-level iterator-based language (relational algebra)

Problem:How to get from (1) to (2)? Query Optimization!

D. Toman (Waterloo) Physical Design: State of Art 4 / 13

Page 8: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Queries vs. Query Processing

(1) User/App Queries:∗ formulated w.r.t. the conceptual/external schema∗ high level (declarative) query languages (SQL, OQL)

logic-based semantics based on satisfaction“does a database D and a tuple t make a query Q true?”

(2) Query PLANS:∗ formulated w.r.t. physical schema∗ low-level iterator-based language (relational algebra)

Problem:How to get from (1) to (2)? Query Optimization!

D. Toman (Waterloo) Physical Design: State of Art 4 / 13

Page 9: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Queries vs. Query Processing

(1) User/App Queries:∗ formulated w.r.t. the conceptual/external schema∗ high level (declarative) query languages (SQL, OQL)

logic-based semantics based on satisfaction“does a database D and a tuple t make a query Q true?”

(2) Query PLANS:∗ formulated w.r.t. physical schema∗ low-level iterator-based language (relational algebra)

Problem:How to get from (1) to (2)? Query Optimization!

D. Toman (Waterloo) Physical Design: State of Art 4 / 13

Page 10: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Standard Approach to Physical Design (and Queries)

First (BAD) IDEA:Logical schema is closely tied to Physical Schema

. . . this simplifies Query Optimization (hence mostly focus on costs)

Example (RDBMS)The DDL statement CREATE TABLE:

1 declares a new relation (conceptual; includes keys, . . . )2 creates a base file (physical; includes structure, placement, . . . )

This approach has been followed for ∼ 30 years. . . the IBM DB2’s CREATE TABLE now has 3 pages of options.

But there are other related DDL statements:. . . CREATE VIEW (conceptual) and CREATE INDEX (physical)

D. Toman (Waterloo) Physical Design: State of Art 5 / 13

Page 11: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Standard Approach to Physical Design (and Queries)

First (BAD) IDEA:Logical schema is closely tied to Physical Schema

. . . this simplifies Query Optimization (hence mostly focus on costs)

Example (RDBMS)The DDL statement CREATE TABLE:

1 declares a new relation (conceptual; includes keys, . . . )2 creates a base file (physical; includes structure, placement, . . . )

This approach has been followed for ∼ 30 years. . . the IBM DB2’s CREATE TABLE now has 3 pages of options.

But there are other related DDL statements:. . . CREATE VIEW (conceptual) and CREATE INDEX (physical)

D. Toman (Waterloo) Physical Design: State of Art 5 / 13

Page 12: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Standard Approach to Physical Design (and Queries)

First (BAD) IDEA:Logical schema is closely tied to Physical Schema

. . . this simplifies Query Optimization (hence mostly focus on costs)

Example (RDBMS)The DDL statement CREATE TABLE:

1 declares a new relation (conceptual; includes keys, . . . )2 creates a base file (physical; includes structure, placement, . . . )

This approach has been followed for ∼ 30 years. . . the IBM DB2’s CREATE TABLE now has 3 pages of options.

But there are other related DDL statements:. . . CREATE VIEW (conceptual) and CREATE INDEX (physical)

D. Toman (Waterloo) Physical Design: State of Art 5 / 13

Page 13: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Common Advice to “DB tuning”

Second (BAD, but practical and expedient) IDEA:(In closely-tied approaches) Physical Design can be

greatly influenced by changes to the Conceptual/Logical Schema

. . . hence we don’t have to change query optimizer

Example (in RDBMS)1 Denormalization (and NULLs)

2 Horizontal/Vertical Partitioning

D. Toman (Waterloo) Physical Design: State of Art 6 / 13

Page 14: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Common Advice to “DB tuning”

Second (BAD, but practical and expedient) IDEA:(In closely-tied approaches) Physical Design can be

greatly influenced by changes to the Conceptual/Logical Schema

. . . hence we don’t have to change query optimizer

Example (in RDBMS)1 Denormalization (and NULLs)

2 Horizontal/Vertical Partitioning

D. Toman (Waterloo) Physical Design: State of Art 6 / 13

Page 15: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Denormalization

Golden Rule for Conceptual to Logical MappingsIndependent Entities (Relationships) are kept separate

⇒ normal forms (relational: BCNF, 3NF)

. . . makes “complex object reconstruction” harder (joins)

ADVICE:Settle for lower Normal Form

(i.e., combine multiple entities/relationships into one logical unit).

. . . avoids joins

PROBLEMS:• update anomalies (often leads to proliferation of NULLs!)• increase of storage space (in the standard approach)• queries/updates have to be reformulated

D. Toman (Waterloo) Physical Design: State of Art 7 / 13

Page 16: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Denormalization

Golden Rule for Conceptual to Logical MappingsIndependent Entities (Relationships) are kept separate

⇒ normal forms (relational: BCNF, 3NF)

. . . makes “complex object reconstruction” harder (joins)

ADVICE:Settle for lower Normal Form

(i.e., combine multiple entities/relationships into one logical unit).

. . . avoids joins

PROBLEMS:• update anomalies (often leads to proliferation of NULLs!)• increase of storage space (in the standard approach)• queries/updates have to be reformulated

D. Toman (Waterloo) Physical Design: State of Art 7 / 13

Page 17: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Denormalization

Golden Rule for Conceptual to Logical MappingsIndependent Entities (Relationships) are kept separate

⇒ normal forms (relational: BCNF, 3NF)

. . . makes “complex object reconstruction” harder (joins)

ADVICE:Settle for lower Normal Form

(i.e., combine multiple entities/relationships into one logical unit).

. . . avoids joins

PROBLEMS:• update anomalies (often leads to proliferation of NULLs!)• increase of storage space (in the standard approach)• queries/updates have to be reformulated

D. Toman (Waterloo) Physical Design: State of Art 7 / 13

Page 18: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Denormalization

Golden Rule for Conceptual to Logical MappingsIndependent Entities (Relationships) are kept separate

⇒ normal forms (relational: BCNF, 3NF)

. . . makes “complex object reconstruction” harder (joins)

ADVICE:Settle for lower Normal Form

(i.e., combine multiple entities/relationships into one logical unit).

. . . avoids joins

PROBLEMS:• update anomalies (often leads to proliferation of NULLs!)• increase of storage space (in the standard approach)• queries/updates have to be reformulated

D. Toman (Waterloo) Physical Design: State of Art 7 / 13

Page 19: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Denormalization: Example

Normalized Design:

employee(id, name, dept)

department(dept, address)

Denormalization:

empdept(id, name, dept, address)

D. Toman (Waterloo) Physical Design: State of Art 8 / 13

Page 20: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Partitioning

ADVICE:Make logical units correspond to frequent requests:

Vertical Partition:⇒ attributes together in a query form fragments

Horizontal Partition:⇒ value ranges in queries form fragments

PROBLEMS:• lossless vs. efficient designs• queries/updates have to be reformulated

D. Toman (Waterloo) Physical Design: State of Art 9 / 13

Page 21: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Partitioning

ADVICE:Make logical units correspond to frequent requests:

Vertical Partition:⇒ attributes together in a query form fragments

Horizontal Partition:⇒ value ranges in queries form fragments

PROBLEMS:• lossless vs. efficient designs• queries/updates have to be reformulated

D. Toman (Waterloo) Physical Design: State of Art 9 / 13

Page 22: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Problems?

Second (BAD, but practical and expedient) IDEA:(In closely-tied approaches) Physical Design can be

greatly influenced by changes to the Conceptual/Logical Schema

. . . completely breaks Physical Data Independence

• Materialized Views (essentially additional tables)• Data Cubes (summary tables)

. . . while preserving Data Integrity?

Physical design ∼ changes to logical schema + index selection

D. Toman (Waterloo) Physical Design: State of Art 10 / 13

Page 23: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Problems?

Second (BAD, but practical and expedient) IDEA:(In closely-tied approaches) Physical Design can be

greatly influenced by changes to the Conceptual/Logical Schema

. . . completely breaks Physical Data Independence

• Materialized Views (essentially additional tables)• Data Cubes (summary tables)

. . . while preserving Data Integrity?

Physical design ∼ changes to logical schema + index selection

D. Toman (Waterloo) Physical Design: State of Art 10 / 13

Page 24: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Problems?

Second (BAD, but practical and expedient) IDEA:(In closely-tied approaches) Physical Design can be

greatly influenced by changes to the Conceptual/Logical Schema

. . . completely breaks Physical Data Independence

• Materialized Views (essentially additional tables)• Data Cubes (summary tables)

. . . while preserving Data Integrity?

Physical design ∼ changes to logical schema + index selection

D. Toman (Waterloo) Physical Design: State of Art 10 / 13

Page 25: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Built-in Assumptions: 2-level Storage and Clustering

Additional complications:

• implicit assumption of two-level storage

. . . most current DBMS assume this

. . . specialized main-memory DBMS

• impact on data structures (indices)

. . . primary vs. secondary indices

. . . clustered vs. unclustered indices

D. Toman (Waterloo) Physical Design: State of Art 11 / 13

Page 26: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Built-in Assumptions: 2-level Storage and Clustering

Additional complications:

• implicit assumption of two-level storage

. . . most current DBMS assume this

. . . specialized main-memory DBMS

• impact on data structures (indices)

. . . primary vs. secondary indices

. . . clustered vs. unclustered indices

D. Toman (Waterloo) Physical Design: State of Art 11 / 13

Page 27: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Goals and Solutions

GOAL 1:Develop approaches and technology that allow for loose couplingbetween conceptual/logical designs and physical design.

. . . allows logical design to closely follow the conceptual design. . . while supporting a wide variety of physical designs

GOAL 2:Design a small number of primitives that support all of the above.

1 uniform DDL for both conceptual/logical and physical objects2 capabilities (index) declarations for physical objects3 integrity constraints to establish links between objects4 no built-in assumptions (e.g., 2-level store)

D. Toman (Waterloo) Physical Design: State of Art 12 / 13

Page 28: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Goals and Solutions

GOAL 1:Develop approaches and technology that allow for loose couplingbetween conceptual/logical designs and physical design.

. . . allows logical design to closely follow the conceptual design. . . while supporting a wide variety of physical designs

GOAL 2:Design a small number of primitives that support all of the above.

1 uniform DDL for both conceptual/logical and physical objects2 capabilities (index) declarations for physical objects3 integrity constraints to establish links between objects4 no built-in assumptions (e.g., 2-level store)

D. Toman (Waterloo) Physical Design: State of Art 12 / 13

Page 29: Fundamentals of Physical Design: State of Artdavid/cs848s10/lec1.pdf · Fundamentals of Physical Design: State of Art David Toman D. R. Cheriton School of Computer Science D. Toman

Organization of the Lectures

1 Uniform Approach to Physical Design and Schema Languages

2 How do we execute queries? (take 1: conjunctive queries)

3 How do we execute queries? (take 2: first-order queries)

4 Look into the Future (discussion/seminar)

D. Toman (Waterloo) Physical Design: State of Art 13 / 13