Part 3: Query Languages

Managing XML and Semistructured Data

2

In this section…

Lorel (A Lightweight Object REpository Language, developed at Stanford)
XPath specification
  • data model
  • examples [xpath, axis]
  • syntax
XQuery
  • FLWR expressions
  • FOR and LET expressions
  • collections and sorting
(XML-QL, the earlier language from AT&T Labs)

Resources:
  • The Lorel Query Language for Semistructured Data, by Abiteboul, Quass, McHugh, Widom, Wiener, in International Journal on Digital Libraries, 1997.
  • A Formal Semantics of Patterns in XSLT, by Phil Wadler.
  • XML Path Language (XPath): www.w3.org/TR/xpath
  • XQuery: A Query Language for XML, by Chamberlin, Florescu, et al. W3C recommendation: www.w3.org/TR/xquery/

3

Querying XML Data

  • A core query language (extracting + restructuring)
  • XPath (core expressions) allows simple navigation through the tree
  • XQuery is used as the SQL of XML
  • XSLT (Extensible Stylesheet Language Transformations) = recursive traversal based on pattern matching; will not be discussed here

4

Sample Data for Queries

<biblio>
  <paper>…</paper>
  <book>
    <author> Smith </author>
    <date> 1999 </date>
    <title> Database Systems </title>
  </book>
  <book>
    <author> Roux </author>
    <author> Combalusier </author>
    <date> 1976 </date>
    <title> Database Systems </title>
  </book>
</biblio>

5

A Core Query Language

A SQL-like language for querying semistructured data.

Will illustrate with: XML DB =

[Figure: the sample data drawn as an edge-labeled graph with object identifiers (&o1, &o12, &o24, &o29, …). Edge labels: biblio, paper, book, author, date, title. Leaf values: Smith, 1999, Database Systems; Roux, Combalusier, 1976, Database Systems.]

6

Query 1:

SELECT author: X
FROM biblio.book.author X

[Figure: the sample data graph, plus an answer node with three author edges pointing to the selected authors.]

Answer =
{author: "Smith", author: "Roux", author: "Combalusier"}

7

Query 2:

SELECT row: X
FROM biblio._ X
WHERE "Smith" in X.author

[Figure: the sample data graph, plus an answer node with row edges pointing to the matching objects.]

Answer =
{row: {author: "Smith", date: 1999, title: "Database…"}, row: …}

8

Query 3:

SELECT row: ( SELECT author: Y
              FROM X.author Y )
FROM biblio.book X

[Figure: the sample data graph, plus an answer node with two row edges to new objects &a1 and &a2; &a1 has one author edge, &a2 has two.]

Answer =
{row: {author: "Smith"}, row: {author: "Roux", author: "Combalusier"}}

9

Query 4:

SELECT ( SELECT row: {author: Y, title: T}
         FROM X.author Y, X.title T )
FROM biblio.book X
WHERE "Roux" in X.author

[Figure: the sample data graph, plus an answer node with two row edges to new objects &a1 and &a2, each with one author edge and one title edge.]

Answer =
{row: {author: "Roux", title: "Database…"}, row: {author: "Combalusier", title: "Database…"}}
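The nested SELECT of Query 4 can be read as a nested loop. The following Python sketch assumes a simplified, hypothetical representation of the two books as dictionaries whose author and title entries are lists; it is only meant to show which (author, title) pairs become rows.

```python
# Hypothetical in-memory form of the two <book> elements of the sample data.
books = [
    {"author": ["Smith"], "date": 1999, "title": ["Database Systems"]},
    {"author": ["Roux", "Combalusier"], "date": 1976,
     "title": ["Database Systems"]},
]

# SELECT (SELECT row: {author: Y, title: T} FROM X.author Y, X.title T)
# FROM biblio.book X WHERE "Roux" in X.author
rows = [
    {"author": y, "title": t}
    for x in books if "Roux" in x["author"]   # outer FROM + WHERE
    for y in x["author"]                      # inner FROM X.author Y
    for t in x["title"]                       # inner FROM X.title T
]
print(rows)
```

The WHERE clause selects the whole book, so Combalusier appears in the answer even though the condition only mentions Roux.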

10

Lorel

Minor syntactic differences in regular path expressions (% instead of _, # instead of _*)

Common path convention:

SELECT biblio.book.author
FROM biblio.book
WHERE biblio.book.year = 1999

becomes:

SELECT X.author
FROM biblio.book X
WHERE X.year = 1999

11

Lorel

Existential variables:
  • What happens with books having multiple authors?

SELECT biblio.book.year
FROM biblio.book
WHERE biblio.book.author = "Roux"

Author is existentially quantified:

SELECT X.year
FROM biblio.book X, X.author Y
WHERE Y = "Roux"
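A small Python sketch of the two readings, under the assumption that each book is a dictionary with a list of authors and a year (the year values are borrowed from the date fields of the sample data). The first form tests "some author equals Roux"; the second makes the author variable Y explicit, as in the rewritten query.

```python
# Hypothetical data: year values taken from the sample data's date fields.
books = [
    {"author": ["Smith"], "year": 1999},
    {"author": ["Roux", "Combalusier"], "year": 1976},
]

# WHERE biblio.book.author = "Roux": true if SOME author of the book matches.
implicit = [x["year"] for x in books
            if any(y == "Roux" for y in x["author"])]

# FROM biblio.book X, X.author Y WHERE Y = "Roux": Y ranges over the authors.
explicit = [x["year"] for x in books for y in x["author"] if y == "Roux"]

print(implicit, explicit)
```

The book with two authors qualifies because one of its authors matches; the other author does not have to.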

12

Lorel

Path variables: @P in

SELECT @P
FROM biblio.# @P X

  • What happens on graphs with cycles?

Constructing new results
  • Several default rules

Casting between datatypes
  • Very useful in practice

13

XPath

http://www.w3.org/TR/xpath (11/99)

Building block for other W3C standards:
  • XSL Transformations (XSLT)
  • XML Link (XLink)
  • XML Pointer (XPointer)
  • XML Query

Was originally part of XSL

14

XPath: Summary

bib                 matches a bib element
*                   matches any element
/                   matches the root
/bib                matches a bib element under the root
bib/paper           matches a paper in bib
bib//paper          matches a paper in bib, at any depth
//paper             matches a paper at any depth
paper|book          matches a paper or a book
@price              matches a price attribute
bib/book/@price     matches the price attribute in book, in bib
bib/book[@price<"55"]/author/lastname   matches…

15

Example for XPath Queries

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

16

Data Model for XPath

[Figure: the example document as a tree. The root sits above the root element bib; bib has two book children; the first book's children include publisher (Addison-Wesley), author (Serge Abiteboul), and so on.]

17

XPath: Simple Expressions

/bib/book/year

Result: <year> 1995 </year>
        <year> 1998 </year>

/bib/paper/year

Result: empty (there were no papers)

18

XPath: Restricted Kleene Closure

//author

Result: <author> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>

/bib//first-name

Result: <first-name> Rick </first-name>
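These path expressions can be tried out with Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. This is a sketch under two assumptions: the example document is embedded as a string (with whitespace around values removed), and because `ET.fromstring` returns the root element bib rather than the root, the leading `/bib` step is dropped and `//` is written as `.//`.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB_XML)  # the root *element*, not the document root

# /bib/book/year, written relative to the bib element
years = [y.text for y in bib.findall("book/year")]

# /bib/paper/year: there are no paper elements, so nothing matches
paper_years = bib.findall("paper/year")

# //author: author elements at any depth
authors = bib.findall(".//author")

# /bib//first-name
first_names = [f.text for f in bib.findall(".//first-name")]

print(years, len(paper_years), len(authors), first_names)
```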

19

XPath: Text Nodes

/bib/book/author/text()

Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman

! Rick Hull doesn't appear because his author element contains first-name and last-name elements rather than text.

Functions in XPath:
  • text() = matches the text value
  • node() = matches any node (= * or @* or text())
  • name() = returns the name of the current tag
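Python's `xml.etree.ElementTree` has no `text()` step, but an element's `.text` attribute holds the text that directly follows its start tag, which is enough to approximate the query. The sketch below assumes a trimmed copy of the example document restricted to the author elements; `.text` is `None` for the author that contains only child elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (authors only), an assumption
# made to keep the sketch short.
bib = ET.fromstring("""<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>""")

# Approximation of /bib/book/author/text(): keep authors with direct text.
texts = [a.text.strip()
         for a in bib.findall("book/author")
         if a.text and a.text.strip()]
print(texts)
```

The author with first-name and last-name children is skipped because its name lives in nested elements, not in a text node directly under author.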

20

XPath: Wildcard

//author/*

Result: <first-name> Rick </first-name>
        <last-name> Hull </last-name>

* matches any element

21

XPath: Attribute Nodes

/bib/book/@price

Result: "55"

@price means that price has to be an attribute

22

XPath: Predicates

/bib/book/author[first-name]

Result: <author> <first-name> Rick </first-name>
                 <last-name> Hull </last-name>
        </author>

23

XPath: More Predicates

/bib/book/author[firstname][address[//zip][city]]/lastname

Result: <lastname> … </lastname>
        <lastname> … </lastname>

24

XPath: More Predicates

/bib/book[@price < "60"]

/bib/book[author/@age < "25"]

/bib/book[author/text()]
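Predicates can be partly reproduced with `xml.etree.ElementTree`. Its XPath subset supports existence tests such as `[tag]` and `[@attrib]` but not comparisons with `<`, so in this sketch the price comparison is done in Python after selecting books that have a price attribute. The embedded document is a trimmed copy of the example, which is an assumption for brevity.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>""")

# /bib/book/author[first-name]: authors that have a first-name child
with_first_name = bib.findall("book/author[first-name]")

# /bib/book[@price < "60"]: select on attribute existence, compare in Python
cheap_books = [b for b in bib.findall("book[@price]")
               if float(b.get("price")) < 60]

print([a.find("last-name").text for a in with_first_name])
print([b.find("title").text for b in cheap_books])
```

A book without a price attribute never satisfies the comparison, which matches XPath's behavior of treating a comparison against an empty node set as false.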

25

XQuery

Based on Quilt (which is based on XML-QL)
http://www.w3.org/TR/xquery/ (2/2001)
XML Query data model
  • Ordered!

FLWOR ("flower") expressions:

FOR ...
LET ...
WHERE ...
ORDER BY ...
RETURN ...

26

XQuery

Query: Find all book titles published after 1995:

FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title

Result: <title> Principles of Database… </title>

* bib.xml is shown on slide 15
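The FOR / WHERE / RETURN structure of this query maps directly onto a loop with a filter. Here is a minimal Python sketch using `xml.etree.ElementTree`; the embedded document is a trimmed copy of bib.xml (titles and years only), which is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>""")

titles = []
for x in bib.findall("book"):               # FOR $x IN .../bib/book
    if int(x.find("year").text) > 1995:     # WHERE $x/year > 1995
        titles.append(x.find("title").text) # RETURN $x/title

print(titles)
```

The 1995 book is excluded because the comparison is strict.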

27

XQuery

Query: Find book titles by the coauthors of "Foundations of Databases":

FOR $x IN bib/book[title/text() = "Foundations …"]/author,
    $y IN bib/book[author/text() = $x/text()]/title
RETURN <answer> $y/text() </answer>

Result: <answer> Foundations … </answer>
        <answer> Foundations … </answer>

The answer will contain duplicates!

28

XQuery

Same as before, but eliminate duplicates:

FOR $x IN bib/book[title/text() = "Foundations …"]/author,
    $y IN distinct(bib/book[author/text() = $x/text()]/title)
RETURN <answer> $y/text() </answer>

Result: <answer> Foundations … </answer>

distinct = a function that eliminates duplicates
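The following Python sketch shows where the duplicates come from: each author of the book produces its own pass over the matching titles. It assumes a trimmed copy of the example document, and it applies the de-duplication to the final list of answers, which is one way to read the intent of `distinct`; `dict.fromkeys` is used because it drops repeats while keeping the first-seen order.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>""")

answers = []
for book in bib.findall("book"):
    if book.find("title").text != "Foundations of Databases":
        continue
    for x in book.findall("author"):            # FOR $x IN .../author
        if x.text is None:                      # no direct text to compare
            continue
        for b in bib.findall("book"):           # $y IN bib/book[...]/title
            if any(a.text == x.text for a in b.findall("author")):
                answers.append(b.find("title").text)

distinct_answers = list(dict.fromkeys(answers))  # drop repeated titles
print(answers, distinct_answers)
```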

29

SQL and XQuery Side-by-side

Product(pid, name, maker)
Company(cid, name, city)

Query: Find all products made in Seattle

SQL:

SELECT x.name
FROM Product x, Company y
WHERE x.maker = y.cid and y.city = "Seattle"

XQuery:

FOR $x IN /db/Product/row,
    $y IN /db/Company/row
WHERE $x/maker/text() = $y/cid/text()
  and $y/city/text() = "Seattle"
RETURN $x/name

Cool XQuery:

FOR $y IN /db/Company/row[city/text() = "Seattle"],
    $x IN /db/Product/row[maker/text() = $y/cid/text()]
RETURN $x/name
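Both XQuery formulations describe the same join; the second simply pushes the conditions into the path expressions. A Python sketch of the two shapes, with hypothetical rows for Product and Company (the product and company names are invented for illustration):

```python
# Hypothetical rows; only the column names come from the schema above.
products = [
    {"pid": 1, "name": "Gizmo", "maker": "c1"},
    {"pid": 2, "name": "Widget", "maker": "c2"},
    {"pid": 3, "name": "Gadget", "maker": "c1"},
]
companies = [
    {"cid": "c1", "name": "GizmoWorks", "city": "Seattle"},
    {"cid": "c2", "name": "WidgetCo", "city": "Portland"},
]

# FOR $x IN Product, $y IN Company WHERE join condition and city test
join_then_filter = [
    x["name"]
    for x in products
    for y in companies
    if x["maker"] == y["cid"] and y["city"] == "Seattle"
]

# FOR $y IN Company[city = "Seattle"], $x IN Product[maker = $y/cid]
filter_then_join = [
    x["name"]
    for y in companies if y["city"] == "Seattle"
    for x in products if x["maker"] == y["cid"]
]

print(join_then_filter, filter_then_join)
```

The two forms can return the names in a different order, since the outer loop differs; the set of results is the same.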

30

XQuery: Nesting

Query: For each author of a book by Morgan Kaufmann, list all books s/he published:

FOR $a IN distinct(document("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
RETURN <result> {
         $a,
         FOR $t IN /bib/book[author = $a]/title
         RETURN $t
       } </result>

Result:

<result>
  <author> Jones </author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>

31

XQuery

FOR $x IN expr   -- binds $x to each value in the list expr
LET $x := expr   -- binds $x to the entire list expr
  • Useful for common subexpressions and for aggregations

<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>

count = an (aggregate) function that returns the number of elements

32

XQuery

Query: Find books whose price is larger than average:

FOR $a IN /bib/book
LET $b := avg(/bib/book/price/text())
WHERE $a/price/text() > $b
RETURN $a

33

XQuery

Query: Find all publishers that published more than 100 books:

<big_publishers> {
  FOR $p IN distinct(//publisher/text())
  LET $b := document("bib.xml")//book[publisher/text() = $p]
  WHERE count($b) > 100
  RETURN <publisher> $p </publisher>
} </big_publishers>

$b is a collection of elements, not a single element.
count = an (aggregate) function that returns the number of elements.
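A Python sketch of the same FOR / LET / WHERE pattern. The function name `big_publishers`, the `threshold` parameter (standing in for the constant 100), and the sample (title, publisher) pairs are all assumptions for illustration. The point is that the LET variable holds a whole collection, which is then aggregated with a count.

```python
def big_publishers(books, threshold):
    """Return publishers with more than `threshold` books (hypothetical helper)."""
    # distinct(//publisher/text()), keeping first-seen order
    publishers = list(dict.fromkeys(pub for _, pub in books))
    result = []
    for p in publishers:                                  # FOR $p IN ...
        b = [title for title, pub in books if pub == p]   # LET $b := ...
        if len(b) > threshold:                            # WHERE count($b) > n
            result.append(p)                              # RETURN $p
    return result

# Hypothetical (title, publisher) pairs.
books = [
    ("abc", "Morgan Kaufmann"),
    ("def", "Morgan Kaufmann"),
    ("ghi", "Freeman"),
]

print(big_publishers(books, 1))
```

With a threshold of 1, only the publisher with two books qualifies.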

34

FOR vs. LET

FOR  binds node variables        → iteration
LET  binds collection variables  → one value

Examples:

FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>

Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...

LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>

Returns:
<result>
  <book>...</book>
  <book>...</book>
  <book>...</book>
  ...
</result>
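The difference can be mimicked in Python with plain strings standing in for book elements (the three placeholder books are an assumption): FOR produces one result per book, while LET produces a single result that wraps the whole collection.

```python
# Placeholder stand-ins for the <book> elements of bib.xml.
books = ["<book>A</book>", "<book>B</book>", "<book>C</book>"]

# FOR $x IN .../book RETURN <result> $x </result>  : one result per binding
for_results = ["<result>" + x + "</result>" for x in books]

# LET $x := .../book RETURN <result> $x </result>  : $x is the whole list
let_result = "<result>" + "".join(books) + "</result>"

print(len(for_results))
print(let_result)
```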

35

Sorting in XQuery

<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher> {
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book> { $b/title , $b/@price } </book>
                  SORTBY(price DESCENDING)
         } </publisher>
         SORTBY(name)
</publisher_list>

36

Sorting in XQuery

  • Sorting arguments refer to the name space of the RETURN clause, not the FOR clause.
  • To sort on an element you don't want to display, first return it, then remove it with an additional query.

<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher> {
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book> { $b/title , $b/price } </book>
                  ORDER BY price DESCENDING
         } </publisher>
         ORDER BY name
</publisher_list>

37

Collections in XQuery

Ordered and unordered collections:
  • /bib/book/author = an ordered collection
  • distinct(/bib/book/author) = an unordered collection

LET $b := /bib/book     $b is a collection
$b/author               a collection (several authors...)

RETURN <result> $b/author </result>

Returns:
<result>
  <author>...</author>
  <author>...</author>
  <author>...</author>
  ...
</result>

38

If-Then-Else

FOR $h IN //holding
RETURN <holding> {
         $h/title,
         IF $h/@type = "Journal"
           THEN $h/editor
           ELSE $h/author
       } </holding>
       ORDER BY title

39

Quantifiers

Existential quantifiers:

FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
        contains($p, "sailing")
        AND contains($p, "windsurfing")
RETURN $b/title

Universal quantifiers:

FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
        contains($p, "sailing")
RETURN $b/title
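SOME and EVERY correspond to Python's built-in `any` and `all`. The sketch below uses hypothetical books with a few paragraphs each; note that `all` over an empty list is true, just as EVERY is satisfied when a book has no para elements at all.

```python
# Hypothetical books; each has a title and a list of paragraph texts.
books = [
    {"title": "A", "paras": ["sailing and windsurfing are fun", "sailing only"]},
    {"title": "B", "paras": ["sailing today", "sailing tomorrow"]},
    {"title": "C", "paras": ["windsurfing", "sailing"]},
    {"title": "D", "paras": []},
]

# WHERE SOME $p IN $b//para SATISFIES contains(sailing) AND contains(windsurfing)
some_titles = [
    b["title"] for b in books
    if any("sailing" in p and "windsurfing" in p for p in b["paras"])
]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [
    b["title"] for b in books
    if all("sailing" in p for p in b["paras"])
]

print(some_titles, every_titles)
```

Book C fails the existential query because the two words occur in different paragraphs; the condition must hold within a single paragraph.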

40

Other Stuff in XQuery

BEFORE and AFTER
  • for dealing with order in the input

FILTER
  • deletes some edges in the result tree

Recursive functions
  • Currently: arbitrary recursion
  • Perhaps more restrictions in the future?

41

Group-By in XQuery ??

  • No GROUPBY currently in XQuery
  • A recent proposal (next)
  • What do YOU think?

42

Group-By in XQuery ??

With GROUPBY:

FOR $b IN document("http://www.bn.com")/bib/book,
    $y IN $b/@year
WHERE $b/publisher = "Morgan Kaufmann"
RETURN GROUPBY $y
       WHERE count($b) > 10
       IN <year> $y </year>

Equivalent SQL:

SELECT year
FROM Bib
WHERE Bib.publisher = "Morgan Kaufmann"
GROUP BY year
HAVING count(*) > 10

43

Group-By in XQuery ??

With GROUPBY:

FOR $b IN document("http://www.bn.com")/bib/book,
    $a IN $b/author,
    $y IN $b/@year
RETURN GROUPBY $a, $y
       IN <result> $a, <year> $y </year>, <total> count($b) </total> </result>

Without GROUPBY:

FOR $Tup IN distinct(FOR $b IN document("http://www.bn.com")/bib/book,
                         $a IN $b/author,
                         $y IN $b/@year
                     RETURN <Tup> <a> $a </a> <y> $y </y> </Tup>),
    $a IN $Tup/a/node(),
    $y IN $Tup/y/node()
LET $b := document("http://www.bn.com")/bib/book[author = $a and @year = $y]
RETURN <result> $a, <year> $y </year>, <total> count($b) </total> </result>
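The grouping above counts books per (author, year) pair. A Python sketch of the same aggregation with `collections.Counter`; the three sample books are hypothetical, and sorting the groups is only done to make the output order predictable.

```python
from collections import Counter

# Hypothetical books; each has a list of authors and a year.
books = [
    {"title": "abc", "authors": ["Smith", "Jones"], "year": 1999},
    {"title": "def", "authors": ["Smith"], "year": 1999},
    {"title": "ghi", "authors": ["Smith"], "year": 2000},
]

# FOR $b IN book, $a IN $b/author, $y IN $b/@year ... GROUPBY $a, $y
totals = Counter((a, b["year"]) for b in books for a in b["authors"])

# One <result> per group: author, year, and count($b)
results = [
    {"author": a, "year": y, "total": n}
    for (a, y), n in sorted(totals.items())
]
print(results)
```

Each distinct (author, year) pair plays the role of one `<Tup>` in the version without GROUPBY, and the counter value is `count($b)` for that pair.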

44

Group-By in XQuery ??

Nested GROUPBY's:

FOR $b IN document("http://www.bn.com")/bib/book,
    $a IN $b/author,
    $y IN $b/@year,
    $t IN $b/title,
    $p IN $b/publisher
RETURN GROUPBY $p, $y
       IN <result>
            $p,
            <year> $y </year>,
            GROUPBY $a
            IN <authorEntry>
                 $a,
                 GROUPBY $t IN $t
               </authorEntry>
          </result>

45

XQuery

Summary: FOR-LET-WHERE-RETURN = FLWR   [Demo]

FOR/LET clauses
   ↓  list of tuples of bound variables
WHERE clause
   ↓  list of pruned tuples of bound variables
RETURN clause
   ↓  instance of XQuery data model
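The three stages of this summary can be sketched as a small Python pipeline. The helper name `flwr` and the two sample books are assumptions for illustration; the structure (generate tuples of bound variables, prune them, build one result per surviving tuple) is what the summary describes.

```python
def flwr(bindings, where, ret):
    """Hypothetical helper: run a FOR/LET -> WHERE -> RETURN pipeline."""
    tuples = list(bindings())                 # FOR/LET: tuples of bound variables
    pruned = [t for t in tuples if where(t)]  # WHERE: prune the tuple list
    return [ret(t) for t in pruned]           # RETURN: one result per tuple

# Hypothetical data, mirroring the titles and years of the example document.
books = [
    {"title": "Foundations of Databases", "year": 1995},
    {"title": "Principles of Database and Knowledge Base Systems", "year": 1998},
]

def bindings():
    for x in books:          # FOR $x IN .../bib/book
        yield {"x": x}       # one tuple per binding of $x

result = flwr(
    bindings,
    where=lambda t: t["x"]["year"] > 1995,    # WHERE $x/year > 1995
    ret=lambda t: t["x"]["title"],            # RETURN $x/title
)
print(result)
```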