Getting at the data – Querying
• Relational DBs are queried with SQL. But where did SQL come from, and what is its basis?
• Relational DBs can be queried using logic.
• In fact, we will review some logic-based QLs.
• SQL = logic + some practically crucial features like aggregation & nesting.
Logic Query Language(s)
• stocks(Ticker, Company), prices(Date, Ticker, Type, Value), indexes(Date, DOW, TSX, S&P).
• Find the ticker of “Syncrude Corp.”:
– {T.Ticker | stocks(T) & T.Company = “Syncrude Corp.”}.
• Find the tickers and names of companies, together with their closing prices, on those days when the DOW was at least 12,000.
– {(T.Ticker, T.Company, P.Date, P.Value) | stocks(T) & prices(P) & indexes(I) & T.Ticker=P.Ticker & P.Date=I.Date & I.DOW>=12000 & P.Type='closing'}.
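A TRC expression reads almost directly as a set comprehension: each tuple variable ranges over a relation, and the conditions become the filter. The sketch below shows the two queries above in Python. The sample rows are invented for illustration, and the attribute name SP (standing in for S&P) is an assumption.

```python
from collections import namedtuple

Stock = namedtuple("Stock", "Ticker Company")
Price = namedtuple("Price", "Date Ticker Type Value")
Index = namedtuple("Index", "Date DOW TSX SP")  # SP stands in for S&P

# Invented sample rows, for illustration only.
stocks = {Stock("SYN", "Syncrude Corp."), Stock("ACM", "Acme Inc.")}
prices = {
    Price("2011-08-15", "SYN", "closing", 41.0),
    Price("2011-08-15", "ACM", "closing", 17.5),
    Price("2011-08-16", "SYN", "closing", 42.0),
    Price("2011-08-16", "SYN", "opening", 40.0),
}
indexes = {
    Index("2011-08-15", 11900, 12500, 1200),
    Index("2011-08-16", 12100, 12600, 1210),
}

# {T.Ticker | stocks(T) & T.Company = "Syncrude Corp."}
q1 = {T.Ticker for T in stocks if T.Company == "Syncrude Corp."}

# Three tuple variables T, P, I; the output tuple is assembled from their pieces.
q2 = {
    (T.Ticker, T.Company, P.Date, P.Value)
    for T in stocks
    for P in prices
    for I in indexes
    if T.Ticker == P.Ticker
    and P.Date == I.Date
    and I.DOW >= 12000
    and P.Type == "closing"
}

print(q1)
print(q2)
```

The nested loops are only a way to read the semantics; a real query processor would pick a join order and join method instead of enumerating all combinations.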
Logic QL(s) – Tuple Relational Calculus
• TRC key features:
– Tuple variables (basic unit)
– Output tuple assembled from pieces of tuple vars
– Conditions imposed as “built-in” predicates
– Quantifiers
• Quantifier example: Find stocks (tickers) whose closing price on August 15, 2011 was at least as high as every other company's.
{(T.Ticker) | stocks(T) & (∃P1)[prices(P1) & T.Ticker=P1.Ticker & P1.Type='closing' & P1.Date=2011/08/15 & (∀P2)[prices(P2) & P2.Date=2011/08/15 & P2.Type='closing' ⇒ P2.Value ≤ P1.Value]]}.
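The quantifier example can be read as an existential quantifier over P1 wrapped around a universal quantifier over P2, where the universal part is an implication ("if P2 is a closing price on that day, then P2.Value ≤ P1.Value"). A minimal Python sketch of that reading follows, with `any` playing the role of the existential and `all` the universal. The sample rows are invented.

```python
from collections import namedtuple

Stock = namedtuple("Stock", "Ticker Company")
Price = namedtuple("Price", "Date Ticker Type Value")

# Invented sample rows, for illustration only.
stocks = [
    Stock("SYN", "Syncrude Corp."),
    Stock("ACM", "Acme Inc."),
    Stock("NOP", "No Price Ltd."),
]
prices = [
    Price("2011/08/15", "SYN", "closing", 41.0),
    Price("2011/08/15", "ACM", "closing", 17.5),
    Price("2011/08/15", "ACM", "opening", 50.0),
    Price("2011/08/16", "ACM", "closing", 99.0),
]
DAY = "2011/08/15"

top = {
    T.Ticker
    for T in stocks
    if any(  # there exists a tuple P1 in prices ...
        P1.Ticker == T.Ticker
        and P1.Type == "closing"
        and P1.Date == DAY
        and all(  # ... such that for every tuple P2 in prices:
            # (P2 is a closing price on DAY) implies P2.Value <= P1.Value,
            # written as (not antecedent) or consequent.
            not (P2.Date == DAY and P2.Type == "closing")
            or P2.Value <= P1.Value
            for P2 in prices
        )
        for P1 in prices
    )
}

print(top)
```

Rows that do not satisfy the antecedent (the opening price and the price from another day) make the implication true and so do not affect the answer.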
Logic QL – Datalog (in lieu of Domain Relational Calculus)
• Rule-based query language.
• Syntax similar to DRC.
• Supports recursion.
• E.g.:
Q1: q1(T) ← stocks(T, 'Syncrude Corp.').
Q2: q2(T, C, D, P) ← stocks(T, C) & prices(D, T, 'closing', P) & indexes(D, DJ, W1, W2) & DJ >= 12000.
Datalog (contd.)
• Note the use of variables and constants as predicate arguments.
• Database predicates vs. built-in predicates.
• Base tables vs. derived tables (aka views).
• Rule ::= Head ← Body.
• Head – a DB predicate.
• Body – a conjunction of DB and built-in predicates.
• Query – a set of rules, defining a query predicate.
• Rules need to be safe.
Datalog (contd.)
• There is an implicit ∃ in front of every rule body, over the variables that appear only in the body. – E.g.?
• Can we express ∀ at all? Yes, through negation.
• E.g.: Q3: q3(T) ← stocks(T, C) & ¬bad(T).
bad(T1) ← stocks(T1, C1) & stocks(T2, C2) & prices(2007/08/15, T1, 'closing', V1) & prices(2007/08/15, T2, 'closing', V2) & V2 > V1.
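One way to read Q3 is as two layers: first derive bad (stocks for which some other stock closed higher that day), then keep the stocks that are not bad. Evaluating the negated predicate only after it has been fully computed is the usual stratified approach. The sketch below follows that order in Python; the sample rows are invented.

```python
# Invented sample rows, for illustration only.
stocks = {("SYN", "Syncrude Corp."), ("ACM", "Acme Inc."), ("BIG", "Big Co.")}
prices = {
    ("2007/08/15", "SYN", "closing", 41.0),
    ("2007/08/15", "ACM", "closing", 17.5),
    ("2007/08/15", "BIG", "closing", 41.0),
    ("2007/08/15", "ACM", "opening", 90.0),
}
DAY = "2007/08/15"

tickers = {t for (t, _company) in stocks}
closing = {(t, v) for (d, t, typ, v) in prices if d == DAY and typ == "closing"}

# Stratum 1: bad(T1) holds if some stock T2 closed strictly higher than T1.
bad = {
    t1
    for (t1, v1) in closing
    for (t2, v2) in closing
    if t1 in tickers and t2 in tickers and v2 > v1
}

# Stratum 2: q3(T) holds for stocks that are not bad.
# Negation is applied only after bad is completely known.
q3 = tickers - bad

print(sorted(bad))
print(sorted(q3))
```

Two details of this reading are worth noting: ties survive (both SYN and BIG are returned), and a stock with no closing price on that day would also be returned, since nothing makes it bad.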
Datalog (contd.)
• Datalog can go beyond what we have just seen.
• Recursion: e.g., let flights(F, T) denote there is a direct flight from city F to city T. Find all cities you can fly to from Vancouver, possibly in a series of hops.
flyTo(X, Y) ← flights(X, Y).
flyTo(X, Y) ← flights(X, Z) & flyTo(Z, Y).
?– flyTo('Vancouver', Y).
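The two flyTo rules can be evaluated bottom-up: start from the direct flights, then keep applying the recursive rule until no new facts appear. The following is a minimal sketch of this naive fixpoint evaluation in Python, not an optimized strategy such as semi-naive evaluation. The flight data is invented.

```python
def fly_to(flights):
    """Naive bottom-up evaluation of the two flyTo rules until a fixpoint."""
    reach = set(flights)  # base rule: every direct flight is a flyTo fact
    while True:
        # recursive rule: join flights(X, Z) with flyTo(Z, Y)
        derived = {
            (x, y)
            for (x, z) in flights
            for (z2, y) in reach
            if z == z2
        }
        if derived <= reach:  # nothing new: fixpoint reached
            return reach
        reach |= derived


# Invented sample flights; note the Toronto <-> Halifax cycle.
flights = {
    ("Vancouver", "Calgary"),
    ("Calgary", "Toronto"),
    ("Toronto", "Halifax"),
    ("Halifax", "Toronto"),
    ("Seattle", "Vancouver"),
}

# The query: all Y such that flyTo('Vancouver', Y) holds.
answer = {y for (x, y) in fly_to(flights) if x == "Vancouver"}
print(sorted(answer))
```

The loop terminates even with cycles in the flight graph because the set of possible facts is finite and it only ever grows.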
Datalog wrap up.
• Efficient query answering is a concern, esp. when recursion, negation, aggregation (will see shortly), or combinations of these are present.
• Powerful QL.
• Numerous efficient query processing (QP) strategies have been developed.
Relational Algebra
• RA is based on five simple ops – select, project, Cartesian (aka cross) product, union, minus.
• When combined, they make for a rather powerful QL, equivalent in expressive power to TRC or to Datalog without recursion.
• You just need efficient algorithms for basic ops and useful macros.
• And a query optimizer that chooses the best plan for evaluating a query based on estimated cost, using a cost model.
RA
• Select: σ_{Company='Syncrude Corp.'}(stocks) – keep only the tuples whose value for Company is 'Syncrude Corp.'
• Project: π_{Ticker}(stocks) – find all tickers.
• Product: stocks × prices – find all combinations of tuples from the two relations.
• Union: π_{Ticker}(stocks) ∪ π_{Ticker}(prices).
• Minus: π_{Ticker}(stocks) − π_{Ticker}(prices).
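To make the five operators concrete, here is a minimal Python sketch in which a relation is a pair (attribute names, set of tuples). The representation and function names are assumptions made for illustration, and the sample rows are invented.

```python
def select(rel, pred):
    """Keep the tuples for which pred (given a dict view of the tuple) holds."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}


def project(rel, keep):
    """Keep only the listed attributes; duplicates vanish because rows form a set."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}


def product(r1, r2):
    """Cartesian product. A fuller version would rename clashing attributes."""
    (a1, rows1), (a2, rows2) = r1, r2
    return a1 + a2, {x + y for x in rows1 for y in rows2}


def union(r1, r2):
    """Union of two relations over the same schema."""
    return r1[0], r1[1] | r2[1]


def minus(r1, r2):
    """Tuples of r1 that do not occur in r2 (same schema)."""
    return r1[0], r1[1] - r2[1]


# Invented sample rows, for illustration only.
stocks = (("Ticker", "Company"),
          {("SYN", "Syncrude Corp."), ("ACM", "Acme Inc.")})
prices = (("Date", "Ticker", "Type", "Value"),
          {("2011/08/15", "SYN", "closing", 41.0)})

q1 = project(select(stocks, lambda t: t["Company"] == "Syncrude Corp."), ["Ticker"])
all_tickers = union(project(stocks, ["Ticker"]), project(prices, ["Ticker"]))
unpriced = minus(project(stocks, ["Ticker"]), project(prices, ["Ticker"]))
pairs = product(stocks, prices)

print(q1, all_tickers, unpriced, len(pairs[1]))
```

Each operator takes relations and returns a relation, which is what allows them to be composed freely, as in the select-then-project expression for q1.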
RA
• Example “macros”:
– Join and division – examples.
• Other macros: in implementing operators, you want to piggyback when it makes sense. E.g., if we want to compute a join; select; project cascade, we can do the select and project “for free” on the fly, while paying only for the join.
• Exercise: Express Q1–Q3 in RA.
SQL (Structured Query Language)
• Inspired mostly by TRC.
• Ad hoc additions – partly inspired by RA and partly by need.
– “Natural join”, “left outer join”, etc.
– SUM(Sal), AVG(Height), etc.
– Nesting queries inside others.
• SQL can also express updates, unlike the “pure” QLs seen so far.
SQL review (contd.)
• Q1: select Ticker from stocks where Company='Syncrude Corp.'
• What is the connection to TRC?
• Q2: select S.Ticker, Company, P.Date, Value from stocks S, prices P, indexes I where S.Ticker=P.Ticker AND P.Date=I.Date AND I.DOW>=12000 AND P.Type='closing'
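Q1 and Q2 can be tried out against a small in-memory database. The sketch below uses Python's standard sqlite3 module; the sample rows are invented, and the column name SP (for S&P) is an assumption, since S&P is not a valid unquoted identifier. Q2 here includes the Type = 'closing' filter so that it matches the TRC and Datalog formulations of the same query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stocks  (Ticker TEXT, Company TEXT);
    CREATE TABLE prices  (Date TEXT, Ticker TEXT, Type TEXT, Value REAL);
    CREATE TABLE indexes (Date TEXT, DOW REAL, TSX REAL, SP REAL);

    -- Invented sample rows, for illustration only.
    INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.'), ('ACM', 'Acme Inc.');
    INSERT INTO prices VALUES
        ('2011-08-15', 'SYN', 'closing', 41.0),
        ('2011-08-16', 'SYN', 'closing', 42.0),
        ('2011-08-16', 'SYN', 'opening', 40.0),
        ('2011-08-16', 'ACM', 'closing', 17.5);
    INSERT INTO indexes VALUES
        ('2011-08-15', 11900, 12500, 1200),
        ('2011-08-16', 12100, 12600, 1210);
""")

q1 = con.execute(
    "SELECT Ticker FROM stocks WHERE Company = 'Syncrude Corp.'"
).fetchall()

# The aliases S, P, I play the role of the TRC tuple variables.
q2 = con.execute("""
    SELECT S.Ticker, Company, P.Date, Value
    FROM stocks S, prices P, indexes I
    WHERE S.Ticker = P.Ticker
      AND P.Date = I.Date
      AND I.DOW >= 12000
      AND P.Type = 'closing'
    ORDER BY S.Ticker
""").fetchall()

print(q1)
print(q2)
```

The correspondence with TRC is close: the FROM clause declares the tuple variables and their ranges, the WHERE clause holds the conditions, and the SELECT list assembles the output tuple.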
SQL review (contd.)
• Q3: select S.Ticker from stocks S where NOT EXISTS (
select * from stocks S2, prices P1, prices P2
where P1.Date='2007/08/15' AND P2.Date='2007/08/15'
AND P1.Type='closing' AND P2.Type='closing'
AND S.Ticker=P1.Ticker AND S2.Ticker=P2.Ticker
AND P1.Value < P2.Value )
SQL review wrap up
• Q3 can be expressed more concisely using grouping and aggregation.
• Q4: Find the average value of each type of price.
select Type, AVG(Value) from prices group by Type
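As one possible aggregation-based formulation of Q3, the sketch below compares each closing price against the day's maximum, computed with MAX in a scalar subquery, and then runs Q4 as written. It uses Python's sqlite3 module with invented rows. Unlike a NOT EXISTS formulation over stocks, this version returns only stocks that actually have a closing price on that day.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE prices (Date TEXT, Ticker TEXT, Type TEXT, Value REAL);

    -- Invented sample rows, for illustration only.
    INSERT INTO prices VALUES
        ('2007-08-15', 'SYN', 'closing', 41.0),
        ('2007-08-15', 'ACM', 'closing', 17.5),
        ('2007-08-15', 'SYN', 'opening', 39.0),
        ('2007-08-15', 'ACM', 'opening', 18.5);
""")

# Q3 via aggregation: tickers whose closing price equals the day's maximum.
q3 = con.execute("""
    SELECT Ticker
    FROM prices
    WHERE Date = '2007-08-15' AND Type = 'closing'
      AND Value = (SELECT MAX(Value)
                   FROM prices
                   WHERE Date = '2007-08-15' AND Type = 'closing')
""").fetchall()

# Q4: one output row per price type, with the average over that group.
q4 = con.execute("""
    SELECT Type, AVG(Value)
    FROM prices
    GROUP BY Type
    ORDER BY Type
""").fetchall()

print(q3)
print(q4)
```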
SQL updates
• We can explicitly insert a tuple of values into a table.
• Can modify select fields of a specific tuple.
• Can perform query-driven updates.
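The three kinds of update can be sketched as follows, again with Python's sqlite3 module. The rows, the renamed company, and the doubling of prices are invented purely to show the statement shapes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stocks (Ticker TEXT PRIMARY KEY, Company TEXT)")
con.execute("CREATE TABLE prices (Date TEXT, Ticker TEXT, Type TEXT, Value REAL)")

# 1. Explicitly insert tuples of values.
con.execute("INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.')")
con.execute("INSERT INTO stocks VALUES ('ACM', 'Acme Inc.')")
con.execute("INSERT INTO prices VALUES ('2007-08-15', 'SYN', 'closing', 41.0)")
con.execute("INSERT INTO prices VALUES ('2007-08-15', 'ACM', 'closing', 17.5)")

# 2. Modify selected fields of one specific tuple.
con.execute("UPDATE stocks SET Company = 'Acme Corp.' WHERE Ticker = 'ACM'")

# 3. Query-driven update: the rows to change are chosen by a subquery.
con.execute("""
    UPDATE prices
    SET Value = Value * 2
    WHERE Ticker IN (SELECT Ticker FROM stocks WHERE Company LIKE 'Syncrude%')
""")
con.commit()

stock_rows = con.execute("SELECT * FROM stocks ORDER BY Ticker").fetchall()
price_rows = con.execute(
    "SELECT Ticker, Value FROM prices ORDER BY Ticker"
).fetchall()
print(stock_rows)
print(price_rows)
```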
Intro. to Conjunctive Queries
• In Datalog, a rule of the form H ← B1, ..., Bm that is range-restricted and safe, e.g., p(X,Y) ← a(X,Z), b(Z,W), c(Z,Y), W>1.
• In SQL, single-block queries with no aggregation or grouping.
• In RA, SPJ queries.
• Tableau queries.
Concurrency control
• Supports access by multiple users/processes, while preserving integrity of data.
• E.g.:
– Child checking account balance.
– Father depositing money into account.
– Mother making a withdrawal.
• Each transaction = read; change; write.
• Should be interleaved carefully to prevent incorrect state!
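The danger of careless interleaving can be shown with a small deterministic simulation. Each transaction reads the balance into a private copy, changes the copy, and writes it back. The amounts and the `run` helper are hypothetical, chosen only to illustrate the classic lost-update problem.

```python
def run(schedule, balance=100):
    """Execute a schedule of (transaction, action) steps on one account."""
    deltas = {"father": +50, "mother": -30}  # hypothetical deposit / withdrawal
    local = {}  # each transaction's private copy of the balance
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "change":
            local[txn] += deltas[txn]
        elif action == "write":
            balance = local[txn]
    return balance


steps = ("read", "change", "write")

# Serial schedule: father's transaction runs completely, then mother's.
serial = [("father", a) for a in steps] + [("mother", a) for a in steps]

# Bad interleaving: both read the old balance before either writes.
interleaved = [
    ("father", "read"), ("mother", "read"),
    ("father", "change"), ("mother", "change"),
    ("father", "write"), ("mother", "write"),
]

print(run(serial))       # 100 + 50 - 30
print(run(interleaved))  # father's deposit is overwritten by mother's write
```

In the interleaved schedule the mother's write is based on a stale read, so the deposit is lost and the final balance matches no serial order of the two transactions.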
Transactions
• Atomicity: a transaction either succeeds as a whole or fails as a whole; nothing is left part way.
• Consistency: only transactions that respect DB’s ICs are allowed.
• Isolation: at any time, the schedule of actions (coming from diff. transactions) being performed is serializable, i.e., is equivalent to running them one transaction at a time.
• Durability: after a commit, the effect of a transaction persists.
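Atomicity and consistency can be observed with a small transfer example. In the sketch below, a CHECK constraint plays the role of an integrity constraint, and Python's sqlite3 connection is used as a context manager, which commits on success and rolls back if an exception escapes. The accounts table and the `transfer` helper are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE accounts (
        Owner   TEXT PRIMARY KEY,
        Balance REAL CHECK (Balance >= 0)  -- integrity constraint: no overdraft
    )
""")
con.execute("INSERT INTO accounts VALUES ('father', 100), ('mother', 20)")
con.commit()


def transfer(con, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with con:  # commit if the block succeeds, roll back if it raises
            con.execute(
                "UPDATE accounts SET Balance = Balance + ? WHERE Owner = ?",
                (amount, dst),
            )
            con.execute(
                "UPDATE accounts SET Balance = Balance - ? WHERE Owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(con):
    return dict(con.execute("SELECT Owner, Balance FROM accounts"))


# The debit would overdraw mother's account, so the credit is undone as well.
failed = transfer(con, "mother", "father", 50)
after_failed = balances(con)

ok = transfer(con, "father", "mother", 50)
after_ok = balances(con)

print(failed, after_failed)
print(ok, after_ok)
```

The credit is deliberately issued before the debit so that the failed transfer has a partial effect to undo; after the rollback neither balance has changed.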
Recovery
• From disk failures – done through RAID.
• From power failures – done by keeping a detailed log of transactions (actions) performed. Roll back if need be to preserve correct state.
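A highly simplified sketch of log-based rollback follows. It assumes an undo-only log in which every write record stores the old value, and it restores the old values for transactions that never logged a commit. The record layout and the `recover` function are hypothetical; real recovery managers also redo committed work and use checkpoints.

```python
def recover(db, log):
    """Undo the writes of every transaction with no COMMIT record in the log."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):  # undo the newest writes first
        if rec[0] == "WRITE" and rec[1] not in committed:
            _kind, _txn, key, old_value = rec
            db[key] = old_value  # restore the value from before the write
    return db


# Hypothetical log at the moment of a power failure.
log = [
    ("WRITE", "T1", "father", 100),  # T1 changed father's balance from 100
    ("COMMIT", "T1"),
    ("WRITE", "T2", "mother", 20),   # T2 changed mother's balance from 20
    # crash here: T2 never committed
]

# State found on disk after the crash; T2's uncommitted write got through.
db_on_disk = {"father": 150, "mother": 70}

recovered = recover(dict(db_on_disk), log)
print(recovered)
```

T1's committed deposit survives, while T2's partial effect is rolled back, which restores a correct state.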
Summing it all up
• DBMS – one of the most sophisticated mission-critical software systems.
• Real DBMSs – tend to be complex with many components.
• Query Optimizer, Transaction Manager, Disk Space Manager – key components.
• Based on decades of solid research.
• In some ways, RDBMS as a model and as a technology – a gold standard:
– For data models.
– For software systems.
Further Reading
• In addition to the list already seen:
• P. Bernstein, V. Hadzilacos, and N. Goodman: Concurrency Control and Recovery in Database Systems.
• J. Gray and A. Reuter: Transaction Processing: Concepts and Techniques.
• M. Stonebraker and J. Hellerstein: Readings in DB Systems (the red book) – contains several great papers (on CC & Recovery and other topics).