CPSC 504: Data Management
Review of Relational Model 2/2
Laks V.S. Lakshmanan
Dept. of CS, UBC


Getting at the data – Querying

• Relational DBs are queried with SQL. But where did SQL come from, and what is the basis for it?

• Relational DBs can be queried using logic.

• In fact, we will review some logic-based QLs.

• SQL = logic + some practically crucial features like aggregation & nesting.

Logic Query Language(s)

• stocks(Ticker, Company), prices(Date, Ticker, Type, Value), indexes(Date, DOW, TSX, S&P).

• Find the ticker of “Syncrude Corp.”:
  – {T.Ticker | stocks(T) & T.Company = “Syncrude Corp.”}.

• Find the tickers, company names, dates, and closing prices for those days when the DOW was at least 12,000.
  – {(T.Ticker, T.Company, P.Date, P.Value) | stocks(T) & prices(P) & indexes(I) & T.Ticker=P.Ticker & P.Date=I.Date & I.DOW>=12000 & P.Type=`closing’}.
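The two TRC queries above map closely onto set comprehensions: each tuple variable becomes a loop variable and each condition becomes a filter. The following is a minimal sketch, not part of the original slides; the sample rows and the `SP` field name (standing in for the S&P column) are assumptions made for illustration.

```python
from collections import namedtuple

Stock = namedtuple("Stock", "Ticker Company")
Price = namedtuple("Price", "Date Ticker Type Value")
Index = namedtuple("Index", "Date DOW TSX SP")  # SP stands in for the S&P column

# Hypothetical sample data, invented for illustration.
stocks = {Stock("SYN", "Syncrude Corp."), Stock("ACM", "Acme Inc.")}
prices = {
    Price("2011-08-15", "SYN", "closing", 40.0),
    Price("2011-08-15", "ACM", "closing", 25.0),
    Price("2011-08-16", "SYN", "closing", 41.0),
    Price("2011-08-16", "SYN", "opening", 39.5),
}
indexes = {
    Index("2011-08-15", 11900, 12500, 1200),
    Index("2011-08-16", 12050, 12600, 1210),
}

# {T.Ticker | stocks(T) & T.Company = "Syncrude Corp."}
q1 = {T.Ticker for T in stocks if T.Company == "Syncrude Corp."}

# {(T.Ticker, T.Company, P.Date, P.Value) | stocks(T) & prices(P) & indexes(I) & ...}
q2 = {
    (T.Ticker, T.Company, P.Date, P.Value)
    for T in stocks for P in prices for I in indexes
    if T.Ticker == P.Ticker and P.Date == I.Date
    and I.DOW >= 12000 and P.Type == "closing"
}

print(q1)
print(q2)
```

The comprehension enumerates every combination of tuples, which is how the calculus is read declaratively; a real engine would choose a cheaper evaluation plan.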

Logic QL(s) – Tuple Relational Calculus

• TRC key features:
  – Tuple variables (basic unit)
  – Output tuple assembled from pieces of tuple vars
  – Conditions imposed as “built-in” predicates
  – Quantifiers

• Quantifier example: Find stocks (tickers) which had a higher closing price than every other company on August 15, 2011.

{(T.Ticker) | stocks(T) & (∃P1)[prices(P1) & T.Ticker=P1.Ticker & P1.Type=`closing’ & P1.Date=2011/08/15 & (∀P2)[prices(P2) & P2.Date=2011/08/15 & P2.Type=`closing’ → P2.Value ≤ P1.Value]]}.
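The nested quantifiers in the query above correspond to Python's `any` (existential) and `all` (universal). This is a small sketch under assumed data; the rows below are hypothetical and relations are plain tuples in the column order of the schema.

```python
# Hypothetical sample data: stocks(Ticker, Company), prices(Date, Ticker, Type, Value).
stocks = [("SYN", "Syncrude Corp."), ("ACM", "Acme Inc.")]
prices = [
    ("2011/08/15", "SYN", "closing", 40.0),
    ("2011/08/15", "ACM", "closing", 25.0),
    ("2011/08/15", "ACM", "opening", 45.0),  # not a closing price, so ignored
]

day = "2011/08/15"
# All closing prices on the day of interest (the range of both P1 and P2).
closing = [p for p in prices if p[0] == day and p[2] == "closing"]

top = {
    ticker
    for (ticker, _company) in stocks
    # "exists P1" becomes any(...); "for all P2" becomes all(...)
    if any(
        p1[1] == ticker and all(p2[3] <= p1[3] for p2 in closing)
        for p1 in closing
    )
}
print(top)
```

Because the comparison is ≤, two stocks tied for the highest closing price would both be returned.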

Logic QL – Datalog (in lieu of Domain Relational Calculus)

• Rule-based query language.
• Syntax similar to DRC.
• Supports recursion.
• E.g.:
  Q1: q1(T) ← stocks(T, `Syncrude Corp.’).
  Q2: q2(T, C, D, P) ← stocks(T, C) & prices(D, T, `closing’, P) & indexes(D, DJ, W1, W2) & DJ >= 12000.

Datalog (contd.)

• Note the use of variables and constants as predicate arguments.

• Database predicates vs. built-in predicates.
• Base tables vs. derived tables (aka views).
• Rule ::= Head ← Body.
• Head – a DB predicate.
• Body – a conjunction of DB and built-in predicates.
• Query – a set of rules, defining a query predicate.
• Rules need to be safe.

Datalog (contd.)

• There is an implicit ∃ in front of every rule body (over the variables that do not appear in the head).
  – e.g.?
• Can we express ∀ at all?
• E.g.:
  Q3: q3(T) ← stocks(T, C) & ¬bad(T).
  bad(T1) ← stocks(T1, C1) & stocks(T2, C2) & prices(2007/08/15, T1, `closing’, V1) & prices(2007/08/15, T2, `closing’, V2) & V2 > V1.
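One way to read Q3 is in two layers: first compute the derived predicate `bad`, then keep the stocks that are not in it. The sketch below follows that reading with Python sets; the sample rows (including the unpriced ticker `ZZZ`) are hypothetical.

```python
# Hypothetical sample data.
stocks = {("SYN", "Syncrude Corp."), ("ACM", "Acme Inc."), ("ZZZ", "Zed Ltd.")}
prices = {
    ("2007/08/15", "SYN", "closing", 40.0),
    ("2007/08/15", "ACM", "closing", 25.0),
}
day = "2007/08/15"

# bad(T1): some stock T2 has a strictly higher closing price than T1 on that day.
bad = {
    t1
    for (t1, _c1) in stocks
    for (t2, _c2) in stocks
    for (d1, pt1, ty1, v1) in prices
    for (d2, pt2, ty2, v2) in prices
    if d1 == day and d2 == day and ty1 == "closing" and ty2 == "closing"
    and pt1 == t1 and pt2 == t2 and v2 > v1
}

# q3(T): stocks(T, C) and not bad(T). Negation is applied only after bad is complete.
q3 = {t for (t, _c) in stocks if t not in bad}
print(bad, q3)
```

Note that a ticker with no closing price on that day (here `ZZZ`) is never `bad`, so the rule as written returns it as well.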

Datalog (contd.)

• Datalog can go beyond what we have just seen.

• Recursion: e.g., let flights(F, T) denote there is a direct flight from city F to city T. Find all cities you can fly to from Vancouver, possibly in a series of hops.

flyTo(X, Y) ← flights(X, Y).
flyTo(X, Y) ← flights(X, Z) & flyTo(Z, Y).
?– flyTo(`Vancouver’, Y).
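The recursive rules above can be evaluated bottom-up: start from the base rule and keep applying the recursive rule until no new facts appear. This is a sketch of naive fixpoint evaluation, one of several possible strategies; the flight data is invented for illustration.

```python
# Hypothetical direct flights (from, to).
flights = {
    ("Vancouver", "Calgary"),
    ("Calgary", "Toronto"),
    ("Toronto", "Montreal"),
    ("Seattle", "Vancouver"),
}

# Base rule: flyTo(X, Y) <- flights(X, Y).
fly_to = set(flights)

# Recursive rule: flyTo(X, Y) <- flights(X, Z) & flyTo(Z, Y).
# Re-apply it until a pass derives nothing new (the fixpoint).
while True:
    derived = {(x, y) for (x, z) in flights for (z2, y) in fly_to if z == z2}
    if derived <= fly_to:
        break
    fly_to |= derived

# ?- flyTo('Vancouver', Y).
reachable = {y for (x, y) in fly_to if x == "Vancouver"}
print(sorted(reachable))
```

The loop terminates because the set of possible city pairs is finite and `fly_to` only grows.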

Datalog wrap up.

• Efficient query answering is the main challenge, esp. when recursion, negation, aggregation (will see shortly), or combinations of these are present.

• Powerful QL.
• Numerous efficient QP strategies have been developed.

Relational Algebra

• RA is based on five simple ops – select, project, Cartesian (aka cross) product, union, minus.

• When combined, these ops make for a rather powerful QL, equivalent in expressive power to (safe) TRC or Datalog w/o recursion.

• You just need efficient algorithms for basic ops and useful macros.

• And a query optimizer that chooses the best plan for evaluating a query based on estimated cost, using a cost model.

RA

• Select: σ_{Company=`Syncrude Corp.’}(stocks) – keep only the tuples whose value for Company is `Syncrude Corp.’
• Project: π_{Ticker}(stocks) – find all tickers.
• Product: stocks × prices – find all combinations of tuples from the two relations.
• Union: π_{Ticker}(stocks) ∪ π_{Ticker}(prices).
• Minus: π_{Ticker}(stocks) − π_{Ticker}(prices).
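The five operators listed above can each be written in a few lines over sets of tuples. The sketch below is an illustration, not an engine: the `(attribute names, rows)` representation, the function names, and the sample rows are all assumptions.

```python
# A relation is a pair: (attribute names, set of tuples). Sample rows are hypothetical.
stocks = (("Ticker", "Company"),
          {("SYN", "Syncrude Corp."), ("ACM", "Acme Inc."), ("ZZZ", "Zed Ltd.")})
prices = (("Date", "Ticker", "Type", "Value"),
          {("2011-08-15", "SYN", "closing", 40.0),
           ("2011-08-15", "ACM", "closing", 25.0)})

def select(rel, pred):
    """Keep the tuples for which pred (given an attribute->value dict) holds."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

def project(rel, keep):
    """Keep only the named columns; set semantics removes duplicates."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}

def product(r1, r2):
    """All concatenations of a tuple from r1 with a tuple from r2."""
    return r1[0] + r2[0], {x + y for x in r1[1] for y in r2[1]}

def union(r1, r2):
    assert r1[0] == r2[0], "union needs matching schemas"
    return r1[0], r1[1] | r2[1]

def minus(r1, r2):
    assert r1[0] == r2[0], "minus needs matching schemas"
    return r1[0], r1[1] - r2[1]

syncrude = select(stocks, lambda t: t["Company"] == "Syncrude Corp.")
tickers_s = project(stocks, ["Ticker"])
tickers_p = project(prices, ["Ticker"])
all_pairs = product(stocks, prices)
either = union(tickers_s, tickers_p)
unpriced = minus(tickers_s, tickers_p)
print(syncrude[1], unpriced[1])
```

Here `unpriced` holds the tickers that appear in stocks but have no row in prices.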

RA

• Example “macros”:
• Join and division – examples.
• Other macros: In implementing operators, you want to piggyback when it makes sense: e.g., if we want to compute a Join;select;project cascade, we can do select and project “for free” on the fly, while paying only for joining.

• Exercise: Express Q1–Q3 in RA.
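The piggybacking idea can be made concrete by computing the same result two ways: as a literal product, select, project cascade, and as a single join pass that filters and projects on the fly. This is a sketch with hypothetical rows; the hash-based join is one possible join method, chosen here only for brevity.

```python
# Hypothetical rows: stocks(Ticker, Company), prices(Date, Ticker, Type, Value).
stocks = {("SYN", "Syncrude Corp."), ("ACM", "Acme Inc.")}
prices = {
    ("2011-08-15", "SYN", "closing", 40.0),
    ("2011-08-15", "SYN", "opening", 39.0),
    ("2011-08-15", "ACM", "closing", 25.0),
}

# Cascade: materialize the full product, then select, then project.
# Product columns: 0 Ticker, 1 Company, 2 Date, 3 Ticker, 4 Type, 5 Value.
prod = {s + p for s in stocks for p in prices}
joined = {r for r in prod if r[0] == r[3]}            # join condition as a select
cascade = {(r[1], r[5]) for r in joined if r[4] == "closing"}

# Piggybacked: one hash join, with the select applied while building the
# hash table and the project applied while emitting results.
by_ticker = {}
for p in prices:
    if p[2] == "closing":
        by_ticker.setdefault(p[1], []).append(p)
fused = {(s[1], p[3]) for s in stocks for p in by_ticker.get(s[0], [])}

print(cascade == fused, sorted(fused))
```

Both versions produce the same relation; the fused version never builds the six-row product or the intermediate join result.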

SQL (Structured Query Language)

• Inspired mostly by TRC.
• Ad hoc additions – partly inspired by RA and partly by need.
  – “Natural join”, “left outer join”, etc.
  – SUM(Sal), AVG(Height), etc.
  – Nesting queries inside others.

• SQL can also express updates, unlike the “pure” QLs seen so far.

SQL review (contd.)

• Q1: select Ticker from stocks where Company='Syncrude Corp.'
• What is the connection to TRC?
• Q2: select S.Ticker, Company, P.Date, Value
  from stocks S, prices P, indexes I
  where S.Ticker=P.Ticker AND P.Date=I.Date AND I.DOW>=12000 AND P.Type='closing'
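To try Q1 and Q2, the schema can be loaded into an in-memory SQLite database from Python. This is a sketch under assumptions: the column types, the `SP` column name (since `S&P` is not a valid identifier), and the sample rows are invented, and Q2 includes the `closing` condition used in the TRC and Datalog versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stocks(Ticker TEXT, Company TEXT);
CREATE TABLE prices(Date TEXT, Ticker TEXT, Type TEXT, Value REAL);
CREATE TABLE indexes(Date TEXT, DOW REAL, TSX REAL, SP REAL);
-- Hypothetical sample rows.
INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.'), ('ACM', 'Acme Inc.');
INSERT INTO prices VALUES
  ('2011-08-15', 'SYN', 'closing', 40.0),
  ('2011-08-16', 'SYN', 'closing', 41.0),
  ('2011-08-16', 'SYN', 'opening', 39.5);
INSERT INTO indexes VALUES
  ('2011-08-15', 11900, 12500, 1200),
  ('2011-08-16', 12050, 12600, 1210);
""")

q1 = con.execute(
    "SELECT Ticker FROM stocks WHERE Company = 'Syncrude Corp.'"
).fetchall()

q2 = con.execute("""
    SELECT S.Ticker, Company, P.Date, Value
    FROM stocks S, prices P, indexes I
    WHERE S.Ticker = P.Ticker AND P.Date = I.Date
      AND I.DOW >= 12000 AND P.Type = 'closing'
""").fetchall()

print(q1)
print(q2)
```

The FROM clause plays the role of the tuple-variable declarations in TRC, and the WHERE clause carries the conditions.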

SQL review (contd.)

• Q3: select S.Ticker
  from stocks S
  where NOT EXISTS (
    select * from stocks S2, prices P1, prices P2
    where P1.Date='2007/08/15' AND P2.Date='2007/08/15'
      AND P1.Type='closing' AND P2.Type='closing'
      AND S.Ticker=P1.Ticker AND S2.Ticker=P2.Ticker
      AND P1.Value < P2.Value )
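Q3 can be checked the same way. The sketch below runs the NOT EXISTS formulation in SQLite; the sample rows are hypothetical, dates are stored as quoted strings, and the `closing` conditions from the Datalog version are included.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stocks(Ticker TEXT, Company TEXT);
CREATE TABLE prices(Date TEXT, Ticker TEXT, Type TEXT, Value REAL);
-- Hypothetical sample rows.
INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.'), ('ACM', 'Acme Inc.');
INSERT INTO prices VALUES
  ('2007/08/15', 'SYN', 'closing', 40.0),
  ('2007/08/15', 'ACM', 'closing', 25.0),
  ('2007/08/15', 'ACM', 'opening', 45.0);
""")

# The subquery is correlated: it refers to S.Ticker from the outer block.
q3 = con.execute("""
    SELECT S.Ticker
    FROM stocks S
    WHERE NOT EXISTS (
        SELECT * FROM stocks S2, prices P1, prices P2
        WHERE P1.Date = '2007/08/15' AND P2.Date = '2007/08/15'
          AND P1.Type = 'closing' AND P2.Type = 'closing'
          AND S.Ticker = P1.Ticker AND S2.Ticker = P2.Ticker
          AND P1.Value < P2.Value
    )
    ORDER BY S.Ticker
""").fetchall()
print(q3)
```

The subquery corresponds to the `bad` predicate of the Datalog version, and NOT EXISTS corresponds to its negation.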

SQL review wrap up

• Q3 can be expressed more concisely using grouping and aggregation.

• Q4: Find the average value of each type of price.

select Type, AVG(Value) from prices group by Type
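The sketch below runs Q4 and one possible aggregate-based formulation of Q3 in SQLite. The MAX-based query is an assumption about what "more concisely" might look like, not the slides' own answer, and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE prices(Date TEXT, Ticker TEXT, Type TEXT, Value REAL);
-- Hypothetical sample rows.
INSERT INTO prices VALUES
  ('2007/08/15', 'SYN', 'closing', 40.0),
  ('2007/08/15', 'ACM', 'closing', 20.0),
  ('2007/08/15', 'SYN', 'opening', 38.0);
""")

# Q4: one output row per price type, with the average over that group.
q4 = con.execute(
    "SELECT Type, AVG(Value) FROM prices GROUP BY Type ORDER BY Type"
).fetchall()

# Q3 with aggregation: keep the closing prices equal to the day's maximum.
q3_agg = con.execute("""
    SELECT Ticker FROM prices
    WHERE Date = '2007/08/15' AND Type = 'closing'
      AND Value = (SELECT MAX(Value) FROM prices
                   WHERE Date = '2007/08/15' AND Type = 'closing')
""").fetchall()

print(q4)
print(q3_agg)
```

Unlike the NOT EXISTS version, this formulation only returns tickers that actually have a closing price on that day.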

SQL updates

• We can explicitly insert a tuple of values into a table.

• Can modify selected fields of a specific tuple.

• Can perform query-driven updates.
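The three kinds of update listed above can be sketched with SQLite as follows. The table contents, the `OLD` ticker, and the choice of a DELETE with a subquery as the query-driven example are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stocks(Ticker TEXT, Company TEXT);
CREATE TABLE prices(Date TEXT, Ticker TEXT, Type TEXT, Value REAL);
-- Hypothetical sample rows; 'OLD' has prices but no stocks entry.
INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.');
INSERT INTO prices VALUES
  ('2011-08-15', 'SYN', 'closing', 40.0),
  ('2011-08-15', 'OLD', 'closing', 5.0);
""")

# 1. Explicitly insert a tuple of values.
con.execute("INSERT INTO stocks VALUES (?, ?)", ("ACM", "Acme Inc."))

# 2. Modify a selected field of a specific tuple.
con.execute("UPDATE stocks SET Company = 'Acme Corp.' WHERE Ticker = 'ACM'")

# 3. Query-driven update: the affected rows are chosen by a subquery.
con.execute("DELETE FROM prices WHERE Ticker NOT IN (SELECT Ticker FROM stocks)")
con.commit()

stocks_now = con.execute(
    "SELECT Ticker, Company FROM stocks ORDER BY Ticker"
).fetchall()
prices_now = con.execute("SELECT Ticker FROM prices").fetchall()
print(stocks_now, prices_now)
```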

SQL DDL

• Can define schema.
• Can define ICs and triggers.
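A small DDL sketch in SQLite shows a schema with integrity constraints (keys, a foreign key, a CHECK) and a trigger. The specific constraints, the `price_audit` table, and the `log_price` trigger are hypothetical choices for illustration; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
con.executescript("""
CREATE TABLE stocks(
  Ticker  TEXT PRIMARY KEY,
  Company TEXT NOT NULL
);
CREATE TABLE prices(
  Date   TEXT,
  Ticker TEXT REFERENCES stocks(Ticker),
  Type   TEXT,
  Value  REAL CHECK (Value >= 0),
  PRIMARY KEY (Date, Ticker, Type)
);
CREATE TABLE price_audit(Ticker TEXT, Value REAL);
-- Trigger: record every price that is successfully inserted.
CREATE TRIGGER log_price AFTER INSERT ON prices
BEGIN
  INSERT INTO price_audit VALUES (NEW.Ticker, NEW.Value);
END;
""")

con.execute("INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.')")
con.execute("INSERT INTO prices VALUES ('2011-08-15', 'SYN', 'closing', 40.0)")

rejected = []
for row in [("2011-08-15", "SYN", "opening", -1.0),   # violates the CHECK
            ("2011-08-15", "XXX", "closing", 5.0)]:   # violates the foreign key
    try:
        con.execute("INSERT INTO prices VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

audit_rows = con.execute("SELECT Ticker, Value FROM price_audit").fetchall()
print(len(rejected), audit_rows)
```

Both bad rows are rejected by the DBMS itself, and the trigger fires only for the row that was accepted.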

Intro. to Conjunctive Queries

• In datalog, a rule of the form: H ← B1, ..., Bm.
  – range-restricted and safe.
  – e.g., p(X,Y) ← a(X,Z), b(Z,W), c(Z,Y), W>1.
• In SQL, single block queries w/ no agg or grouping.
• In RA, SPJ queries.
• Tableau Queries.
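The example rule p(X,Y) can be evaluated as a single comprehension: one loop per body atom, equality on the shared variable Z, and the built-in condition as a filter. The relation contents below are hypothetical.

```python
# Hypothetical base relations a(X, Z), b(Z, W), c(Z, Y).
a = {(1, 10), (2, 20)}
b = {(10, 5), (20, 0)}
c = {(10, "u"), (20, "v")}

# p(X, Y) <- a(X, Z), b(Z, W), c(Z, Y), W > 1.
# The repeated variable Z becomes an equality test across the three atoms.
p = {
    (x, y)
    for (x, z) in a
    for (z2, w) in b
    for (z3, y) in c
    if z == z2 == z3 and w > 1
}
print(p)
```

In RA terms this is a select-project-join query: the loops are the joins, the filter is the select, and the output tuple is the project.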

Concurrency control

• Supports access by multiple users/processes, while preserving integrity of data.

• E.g.:
  – Child checking account balance.
  – Father depositing money into account.
  – Mother making a withdrawal.
• Each transaction = read; change; write.
• Should be interleaved carefully to prevent incorrect state!
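The danger of careless interleaving can be shown with a tiny deterministic simulation of the read; change; write steps. The amounts, the starting balance, and the `run` helper are hypothetical; no real threads or DBMS are involved.

```python
def run(schedule, start=100):
    """Execute (txn, action, amount) steps against one shared balance."""
    balance = start
    local = {}                      # each transaction's private copy
    for txn, action, amount in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "change":
            local[txn] += amount
        elif action == "write":
            balance = local[txn]
    return balance

# Father deposits 50, mother withdraws 30, one transaction after the other.
serial = [
    ("father", "read", 0), ("father", "change", 50), ("father", "write", 0),
    ("mother", "read", 0), ("mother", "change", -30), ("mother", "write", 0),
]

# Same steps, but both read the balance before either writes.
interleaved = [
    ("father", "read", 0), ("mother", "read", 0),
    ("father", "change", 50), ("mother", "change", -30),
    ("father", "write", 0), ("mother", "write", 0),
]

print(run(serial), run(interleaved))
```

In the interleaved schedule the mother's write overwrites the father's, so the deposit is lost: the final balance is 70 instead of 120.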

Transactions

• Atomicity: either a transaction as a whole succeeds, or fails; nothing part way.

• Consistency: only transactions that respect DB’s ICs are allowed.

• Isolation: at any time, the schedule of actions (coming from diff. transactions) being performed is serializable, i.e., is equivalent to running them one transaction at a time.

• Durability: after a commit, the effect of a transaction persists.
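Atomicity and consistency can be observed with a two-step transfer in SQLite that violates an integrity constraint halfway through. This is a sketch: the `accounts` table, its CHECK constraint, and the amounts are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE accounts(name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0));
INSERT INTO accounts VALUES ('a', 100.0), ('b', 50.0);
""")

# Transfer 150 from a to b. The credit succeeds, but the debit would make
# a's balance negative, which the CHECK constraint (an IC) forbids.
try:
    with con:  # commits on success, rolls back if an exception escapes
        con.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'b'")
        con.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'a'")
except sqlite3.IntegrityError:
    outcome = "rolled back"
else:
    outcome = "committed"

balances = con.execute(
    "SELECT name, balance FROM accounts ORDER BY name"
).fetchall()
print(outcome, balances)
```

The credit to `b` is undone along with the failed debit, so the transaction leaves no partial effect.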

Recovery

• From disk failures – done through RAID.

• From power failures – done by keeping a detailed log of transactions (actions) performed. Roll back if need be to preserve correct state.
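The idea of rolling back from a log can be sketched with an undo-only log kept in memory. This is a deliberate simplification, not any particular DBMS's recovery algorithm: the record formats and the `write`, `commit`, and `recover` helpers are hypothetical, and a real system would force the log to stable storage.

```python
db = {"A": 100, "B": 50}   # stands in for the data on disk
log = []                   # stands in for the log on stable storage

def write(txn, key, value):
    # Log the old value before changing the data, so the change can be undone.
    log.append(("update", txn, key, db[key]))
    db[key] = value

def commit(txn):
    log.append(("commit", txn))

def recover():
    """Undo, in reverse order, every update made by an uncommitted transaction."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _kind, _txn, key, old_value = rec
            db[key] = old_value

write("T1", "A", 70)
write("T1", "B", 80)
commit("T1")
write("T2", "A", 0)        # power fails here, before T2 commits

recover()
print(db)
```

After recovery the committed work of T1 survives and the partial work of T2 is gone.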

DBMS Architecture

Summing it all up

• DBMS – one of the most sophisticated mission-critical software systems.

• Real DBMSs – tend to be complex with many components.

• Query Optimizer, Transaction Manager, Disk Space Manager – key components.

• Based on decades of solid research.
• In some ways, RDBMS as a model and as a technology – a gold standard:
  – For data models.
  – For software systems.

Further Reading

• In addition to the list already seen:
• P. Bernstein, V. Hadzilacos, and N. Goodman: Concurrency Control and Recovery in Database Systems.

• J. Gray and A. Reuter: Transaction Processing: Concepts and Techniques.

• M. Stonebraker and J. Hellerstein: Readings in DB Systems (the red book) – contains several great papers (on CC & Recovery and other topics).