Università degli Studi di Pisa Speaker: Giovanni Conforti Joint work with: Orlando Ferrara and Giorgio Ghelli TQL Algebra and its Implementation IFIP TCS @ 2002 Montreal, 28th Aug


Università degli Studi di Pisa

Speaker: Giovanni Conforti

Joint work with: Orlando Ferrara and Giorgio Ghelli

TQL Algebra and its Implementation

IFIP TCS @ 2002 Montreal, 28th August


What I’m going to talk about…

• Short introduction to SSD and SSD query languages.

• Tree logic and TQL overview.

• TQL Algebra motivations.

• TQL Algebra presentation.

• Translation algorithm.

• Translation correctness.

• Our implementation model.

• Conclusions and future work.


Semi-structured Data

• Semi-Structured Data (SSD) are used to:

  • model and query the web (HTML, XML, …);
  • store experimental data;
  • integrate heterogeneous databases;
  • …

• The structure of Semi-Structured Data (SSD) is:

  • irregular;
  • implicit;
  • always evolving;
  • …


Data model: SSD as labelled trees (Example)

[Figure: the labelled tree corresponding to the term below.]

articles[
  article[
    author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000]
  ] |
  article[
    author[Ghelli] | title[TQL] | conf[ETAPS] | date[ month[Feb] | year[2001] ]
  ]
]


SSD query languages

• Just as tabular data have SQL and relational algebra, we would like a query language and an algebra for SSD.

• Specifying and developing a good query language for SSD (in particular for XML) is one of the main current challenges of the database and web research communities.

• After several proposals (Lorel, YATL, XML-QL, XDuce, etc.) the W3C has introduced the XQuery standard, whose specification and implementation are work in progress.


TQL – the idea

• Extend the ambient logic to describe properties of SSD, obtaining a tree logic.

• The tree logic is a modal logic well suited to expressing:

  • properties of the horizontal and vertical structure of SSD;
  • properties whose specification requires negation, recursion or universal quantification;
  • constraints and types of SSD.

• Introduce free variables inside tree logic formulas and use a pattern-matching approach to bind these variables to values inside a given data source ⇒ a new SSD query strategy: TQL.


TQL – the language

• Based on three clauses:

  • matching;
  • filtering;
  • reconstruction.

Fused in the binding operator: Q ⊨ A

• Integrating logic expressions and queries in the same language gives several advantages in expressivity and optimization (e.g. rewriting based on types).

• This talk is not about the TQL language but about the TQL Algebra, so I will introduce only the aspects of TQL needed to understand our work on the algebra.

• To learn more about TQL, see the two articles [WebDB2002] and [ETAPS2000].


Tree Logics - syntax

A, B ::= T | ¬A | A ∧ B | ∃x. A | ∃X. A
       | 0 | L[A] | A | B | L ~ L’ | X
       | ξ | μξ. A

Negation allows the definition of derived operators:

F | A ∨ B | ∀x. A | ∀X. A | L[⇒A] | A || B

Path Expressions:

• regular expressions;

• a compact way to express constraints on paths over trees;

• can be defined using Tree Logics formulas.

E.g. .m.n[A] is defined as m[ n[A] | T ] | T


Tree Logics – describing sets of trees (forests)

F ⊨ 0 iff F = 0

F ⊨ A ∧ B iff F ⊨ A and F ⊨ B

F ⊨ m[A] iff ∃F’. F = m[F’] and F’ ⊨ A

F ⊨ A | B iff ∃F’, F’’. F = F’ | F’’ and F’ ⊨ A and F’’ ⊨ B

F ⊨ m[⇒A] iff ∀F’. F = m[F’] ⇒ F’ ⊨ A

F ⊨ A || B iff ∀F’, F’’. F = F’ | F’’ ⇒ F’ ⊨ A or F’’ ⊨ B

F ⊨ T always

F ⊨ X iff F = ρ(X), where ρ is the current substitution

F ⊨ ¬A iff ¬(F ⊨ A)

… … …
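The satisfaction relation for the variable-free core of the logic (0, T, ¬A, A ∧ B, m[A], A | B) can be prototyped directly from its clauses. The sketch below is an illustration, not the TQL implementation: the tuple encoding of forests and formulas and the names `sat`, `splits` and `lor` are assumptions, and variables, quantifiers and recursion are left out.

```python
from itertools import product

# Assumed encodings: a forest is a tuple of (label, forest) pairs;
# a formula is a tagged tuple such as ("node", "m", A) for m[A].
ZERO = ("zero",)
TRUE = ("true",)


def splits(forest):
    """Yield every decomposition F = F' | F'' of a forest."""
    for mask in product((False, True), repeat=len(forest)):
        left = tuple(t for t, keep in zip(forest, mask) if keep)
        right = tuple(t for t, keep in zip(forest, mask) if not keep)
        yield left, right


def sat(forest, formula):
    """Decide F |= A by structural recursion on the formula A."""
    kind = formula[0]
    if kind == "zero":
        return forest == ()
    if kind == "true":
        return True
    if kind == "not":
        return not sat(forest, formula[1])
    if kind == "and":
        return sat(forest, formula[1]) and sat(forest, formula[2])
    if kind == "node":  # m[A]: exactly one tree, labelled m, whose content satisfies A
        _, label, body = formula
        return len(forest) == 1 and forest[0][0] == label and sat(forest[0][1], body)
    if kind == "par":   # A | B: some split satisfies A on one side and B on the other
        return any(sat(l, formula[1]) and sat(r, formula[2]) for l, r in splits(forest))
    raise ValueError(f"unknown formula: {kind}")


def lor(a, b):
    """Derived operator: A or B, defined through negation and conjunction."""
    return ("not", ("and", ("not", a), ("not", b)))


# The path expression .m.n[0], unfolded as m[ n[0] | T ] | T
path = ("par", ("node", "m", ("par", ("node", "n", ZERO), TRUE)), TRUE)
forest = (("m", (("n", ()), ("k", ()))), ("j", ()))
print(sat(forest, path))
```

Enumerating all splits is exponential in the number of top-level trees, so this is a specification-level checker rather than an evaluation strategy.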


TQL Queries

Syntax: Q, Q’ ::= 0 | X | L[Q] | f(Q) | Q | Q’ | from Q ⊨ A select Q’

Example:

result[ from $articles ⊨ articles[ article[ title[$T] | date[$D] | T ] | T ]
        select article[ title[$T] | date[$D] ] ]

Bindings computed by the from clause:

  $T            $D
  {Anywhere}    {Apr, 2000}
  {TQL}         { month[Feb] | year[2001] }

Result:

result[ article[ title[Anywhere] | date[Apr, 2000] ] | article[ title[TQL] | date[ month[Feb] | year[2001] ] ] ]
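To make the two phases of this example concrete, here is a hand-written Python evaluation of this one query. It is a sketch under assumptions, not a general TQL engine: trees are encoded as (label, children) pairs, and `bindings` and `select` are hypothetical helpers hard-coded for the formula and the select clause above.

```python
def tree(label, *children):
    """A tree is (label, children); a text leaf is a label without children."""
    return (label, tuple(children))


DATA = tree(
    "articles",
    tree("article",
         tree("author", tree("Cardelli")), tree("author", tree("Gordon")),
         tree("title", tree("Anywhere")), tree("date", tree("Apr, 2000"))),
    tree("article",
         tree("author", tree("Ghelli")), tree("title", tree("TQL")),
         tree("conf", tree("ETAPS")),
         tree("date", tree("month", tree("Feb")), tree("year", tree("2001")))),
)


def bindings(source):
    """Rows for: source |= articles[ article[title[$T] | date[$D] | T] | T ]."""
    label, articles = source
    if label != "articles":
        return []
    rows = []
    for child_label, fields in articles:
        if child_label != "article":
            continue
        titles = [sub for name, sub in fields if name == "title"]
        dates = [sub for name, sub in fields if name == "date"]
        # each (title, date) pair inside one article is a distinct match
        rows.extend({"$T": t, "$D": d} for t in titles for d in dates)
    return rows


def select(rows):
    """Build result[ article[title[$T] | date[$D]] | ... ], one article per row."""
    return tree("result", *(
        ("article", (("title", row["$T"]), ("date", row["$D"])))
        for row in rows
    ))


rows = bindings(DATA)
result = select(rows)
print(len(rows), result[0])
```

The list `rows` plays the role of the binding table and `select` plays the role of the reconstruction step.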


TQL Algebra motivations – in general

In general, an intermediate algebra ensures:

• transformability

• executability

[Figure: query-processing pipeline. A TQL query goes through the Parser, Translation (producing an algebraic expression) and Execution stages; the optimization steps shown are TQL rewriting, TQL Algebra rewriting and physical optimization.]


TQL Algebra motivations – TQL case

• No current algebra for XML supports the TQL operators (negation, quantification, horizontal navigation, etc.) ⇒ we define a new one.

• Due to negation and derived operators, this algebra must support infinite bindings (a variable bound to an infinite number of values).

• We want an algebra whose semantics is formally specified, in order to prove its correctness w.r.t. the TQL semantics.

• We want a running prototype, so we have to implement the data structures and the translation and evaluation algorithms for the TQL Algebra.


TQL Algebra – sorts and their semantics

• It is an algebra of tables and trees, defined on four sorts:

  • label expressions L: denoting labels;
  • tree expressions Q: denoting forests (sets of trees);
  • row expressions R^V: denoting rows over V (tuples with type V);
  • table expressions T^V: denoting finite or infinite tables (sets of rows) with schema V.

• The basic sort is the table sort, which is used to represent the evaluation of a TQL binding operation Q ⊨ A.

• SSD and TQL query results are naturally represented by tree expressions.


TQL Algebra – table expressions

• One-row tables

  {R^V} | {(x ↦ L)} | {(x ↦ Q)}

• Relational operators (union, cartesian product, projection and restriction)

  T ∪^{V,V’} T | T ×^{V,V’} T | Π_V T | σ_{L ~ L’} T

• Universe and Complement

  1^V | Co^V(T)

• Vertical test and horizontal iterator of trees

  if Q = y[Y] then T_{Y,y} else T | ⋃_{Q = Y|Y’} T_{Y,Y’}

• Recursion

  letrec M = λY. T_{M,Y} in T_M | M(Q)


TQL Algebra – tree expressions

Q ::= R(X) | Y | 0 | Q | Q’ | L[Q] | f(Q) | Par_{r ∈ T} Q_r

The tree algebra mirrors the TQL operators used to build trees (queries). The differences are:

• X does not denote a variable, but a name of a row;

• there is a new metavariable Y ranging over tree variables;

• the from-select clause is replaced by the tree construction (multiset union) Par_{r ∈ T} Q_r, whose informal semantics is:

“Compute the union of all Q_r where r is a row belonging to T”.
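The informal semantics of the Par construction can be written down for the finite case. The following Python sketch is an assumption-laden illustration: a forest is modelled as a `collections.Counter` (a multiset) of hashable trees, a table as a finite list of rows, and the function name `par` is hypothetical. Infinite tables, which the algebra must also handle, are outside the scope of this sketch.

```python
from collections import Counter


def par(table, q):
    """Par over r in T of Q_r: multiset union of q(r) for every row r of a finite table."""
    result = Counter()
    for row in table:
        result += q(row)   # Counter addition is multiset union with multiplicities
    return result


# Two rows of a binding table and a tree expression that depends on the row.
table = [{"$T": "Anywhere"}, {"$T": "TQL"}, {"$T": "TQL"}]
forest = par(table, lambda r: Counter([("title", r["$T"])]))
print(forest)
```

Because the union is a multiset union, the two identical rows contribute two copies of the same tree instead of collapsing into one.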


TQL Algebra – derived table expressions

• We can define several useful table expressions by translation:

  • intersection, join, extension;
  • co-projection (dual of projection);
  • other structural tests on the tree.

• These operators are very useful for translating the derived operators of the tree logic!

• All of them are implemented in the current system.


Translation from TQL to TQL Algebra

• The core of the translation is the binder translation. We perform a semantic inversion, transforming a formula (a function from substitutions to sets of trees) into a function that, given a tree, returns a set of substitutions (a table expression):

  ⌈A⌉_{Q, R^V, Γ}

• The translation is defined by structural recursion on A.

• It actually depends on the current schema V.

• Q and R are only plugged in somewhere inside the expression.

• Γ is an environment mapping logical recursion variables to algebraic ones.


Translation from TQL to TQL Algebra - example

• Example, with A = ∃x. x[$Z]:

  ⌈from Q ⊨ A select Q’⌉_{R^V} = Par_{r ∈ T} ⌈Q’⌉_{R^V ; r}

  T = ⌈A⌉_{Q, R^V, Γ} = Π_{{$Z}} ⌈x[$Z]⌉_{…} = …


Translation – operators

  Formula         Algebraic operator    Dual formula    Dual algebraic op.
  T               Universe              F               Empty
  ¬A              Complement
  A ∧ B           Join                  A ∨ B           Ext. Union
  ∃x. A, ∃X. A    Projection            ∀x. A, ∀X. A    Co-Projection
  0, L[A]         Test                  L[⇒A]           Test (inv)
  L ~ L’          Restriction
  A | B           Union Iterator        A || B          Join Iterator
  X               Singleton
  μξ. A           Recursion (minfix)    νξ. A           maxfix


Translation correctness

• The formal approach we have taken allows us to prove the correctness of the translation, that is:

Theorem

  If FV(R^V) ⊆ dom(ε) and FV(Q) ⊆ V, then

  [[ Q ]]_{ε(R^V)} = ⌊⌈Q⌉_{R^V}⌋_ε

The semantics of the query Q in ε(R^V) is equivalent to the semantics of the translation of Q in R^V.

• The core of the proof is the from-select case, in which we prove the correctness of the binder translation.


Implementing the algebra – model description

• Representing possibly infinite tables in finite space.

• We use disjunctive constraints (closely related to proposals in constraint databases).

• For each algebraic operator we define and implement a corresponding operator that works on disjunctive constraints.

• New algorithms for complex operators (complement, co-projection, tree navigation).

Example of a table represented by constraints:

  $X       $Y
  { b }    { a }
  { a }    NotIn { a, b }
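The idea of representing a possibly infinite table by disjunctive constraints can be illustrated with a small Python sketch. This is not the authors' algorithm: it is a plain De Morgan construction restricted to label-valued columns, and every name in it (`ANY`, `neg`, `meet`, `intersect`, `complement`, `member`) is hypothetical. A cell is either ("in", S) or ("notin", S), a row is a conjunction of cells, and a table is a disjunction of rows.

```python
# A value constraint is ("in", S) or ("notin", S), with S a frozenset of labels.
ANY = ("notin", frozenset())          # satisfied by every label


def neg(c):
    """Negate a single value constraint."""
    kind, s = c
    return ("notin" if kind == "in" else "in", s)


def meet(c1, c2):
    """Conjunction of two value constraints."""
    (k1, s1), (k2, s2) = c1, c2
    if k1 == "in" and k2 == "in":
        return ("in", s1 & s2)
    if k1 == "in":
        return ("in", s1 - s2)
    if k2 == "in":
        return ("in", s2 - s1)
    return ("notin", s1 | s2)


def intersect(t1, t2):
    """Pairwise conjunction of rows, dropping unsatisfiable rows."""
    out = []
    for r1 in t1:
        for r2 in t2:
            row = {v: meet(r1[v], r2[v]) for v in r1}
            if all(c != ("in", frozenset()) for c in row.values()):
                out.append(row)
    return out


def complement(table, schema):
    """Rows over `schema` that satisfy no row of `table`."""
    result = [{v: ANY for v in schema}]            # the universe table
    for row in table:
        # not (c1 and c2 and ...) == (not c1) or (not c2) or ...
        negated = [{**{v: ANY for v in schema}, var: neg(row[var])} for var in schema]
        result = intersect(result, negated)
    return result


def member(table, values):
    """Does the concrete row `values` belong to the represented table?"""
    def ok(c, x):
        kind, s = c
        return (x in s) == (kind == "in")
    return any(all(ok(row[v], values[v]) for v in row) for row in table)


schema = ["$X", "$Y"]
table = [
    {"$X": ("in", frozenset("b")), "$Y": ("in", frozenset("a"))},
    {"$X": ("in", frozenset("a")), "$Y": ("notin", frozenset("ab"))},
]
co = complement(table, schema)
print(member(table, {"$X": "a", "$Y": "c"}), member(co, {"$X": "a", "$Y": "c"}))
```

The second row of `table` stands for infinitely many concrete rows, yet both the table and its complement stay finite as lists of constraint rows. The number of rows in the complement can grow quickly, which is one reason dedicated algorithms are needed for the complex operators.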


Implementing the algebra – The TQL System

[Figure: architecture of the TQL system. Components shown: Tql Engine, Sys Interface, Tql GUI, Tql Applet, Tql Servlet; data sources shown: XML, DB, file system, World Wide Web.]

• Implemented in Java and ported to C#.

• Some stats:

  • ~20,000 LoC;
  • 182 classes.

• Download at: http://tql.di.unipi.it/tql


Conclusions

• TQL Algebra:

  • built as a tool for executing TQL;
  • seems to be quite general;
  • is implemented (with some restrictions);
  • deals with infinite tables.

• Future work:

  • rewritings (with types and constraints);
  • static safety analysis;
  • cost model and physical optimizations;
  • extension to the graph model (graph logic).


The End
