Management of XML and Semistructured Data
Lecture 5: Query Languages
Wednesday, 4/1/2001
Strudel and StruQL
• Strudel = a Website management tool
• Idea: separate the following three tasks
– Management of data
• use some database
– Management of the site’s structure
• use StruQL
– Management of the site’s presentation
• use HTML templates (this was before XML...)
Example: Bibliography Data
{Bib: { paper: { author: “Jones”,
author: “Smith”,
title: “The Comma”,
year: 1994 },
paper: { author: “Jones”,
title: “The Dot”,
year: 1998 },
paper: { author: “Mark”,
.... }
. . .
}
}
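Input data:
[Figure: the Bib data drawn as a graph. Below Bib there are three paper edges; the first paper node has edges author, author, title, and year leading to “Jones”, “Smith”, “The Comma”, ...]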
Simple Website Definition in StruQL
WHERE Root -> “Bib.paper.author” -> A
CREATE Root(), HomePage(A)
LINK Root() -> “person” -> HomePage(A),
     HomePage(A) -> “name” -> A,
     HomePage(A) -> “home” -> Root()
Result:
[Figure: Root() has a “person” edge to each of HomePage(“Smith”), HomePage(“Jones”), and HomePage(“Mark”); each HomePage has a “name” edge to its author string (“Smith”, “Jones”, “Mark”) and a “home” edge back to Root().]
Root(), HomePage(A) = Skolem Functions (more later)
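A minimal Python sketch of how the simple site definition could be evaluated, assuming the input graph is stored as a set of (source, label, target) triples. The node ids (`root`, `b`, `p1`, ...) and the helper `follow_path` are hypothetical, the fields elided for the third paper are left out, and Skolem terms are modelled as tuples so that equal arguments always give the same node.

```python
# Input graph as labeled edges (source, label, target); node ids are made up.
EDGES = {
    ("root", "Bib", "b"),
    ("b", "paper", "p1"), ("b", "paper", "p2"), ("b", "paper", "p3"),
    ("p1", "author", "Jones"), ("p1", "author", "Smith"),
    ("p1", "title", "The Comma"), ("p1", "year", 1994),
    ("p2", "author", "Jones"), ("p2", "title", "The Dot"), ("p2", "year", 1998),
    ("p3", "author", "Mark"),
}

def follow_path(start, path):
    """Return every node reachable from start along the dotted label path."""
    frontier = {start}
    for label in path.split("."):
        frontier = {t for (s, lab, t) in EDGES if s in frontier and lab == label}
    return frontier

def root():
    """Skolem term Root(): no arguments, so always the same node."""
    return ("Root",)

def home_page(a):
    """Skolem term HomePage(A): one node per distinct author A."""
    return ("HomePage", a)

def build_site():
    nodes, links = set(), set()
    for a in follow_path("root", "Bib.paper.author"):   # WHERE
        nodes |= {root(), home_page(a)}                 # CREATE
        links |= {                                      # LINK
            (root(), "person", home_page(a)),
            (home_page(a), "name", a),
            (home_page(a), "home", root()),
        }
    return nodes, links

nodes, links = build_site()
```

“Jones” is an author of two papers, but HomePage(“Jones”) is created only once because the Skolem term depends only on its argument.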
Complex Website Definition in StruQL
WHERE Root -> “Bib” -> X, X -> “paper” -> P, P -> “author” -> A, P -> “title” -> T, P -> “year” -> Y
CREATE Root(), HomePage(A), YearPage(A,Y), PubPage(P)
LINK Root() -> “person” -> HomePage(A),
     HomePage(A) -> “yearentry” -> YearPage(A,Y),
     YearPage(A,Y) -> “publication” -> PubPage(P),
     PubPage(P) -> “author” -> HomePage(A),
     PubPage(P) -> “title” -> T
Example: A Complex Web Site
[Figure: Root() has “person” edges to HomePage(“Smith”), HomePage(“Jones”), and HomePage(“Mark”). The home pages have “yearentry” edges to YearPage(“Smith”,1994), YearPage(“Smith”,1996), YearPage(“Jones”,1994), YearPage(“Jones”,1998), and YearPage(“Mark”,1996). The year pages have “publication” edges to PubPage(“The Comma”) and PubPage(“The Dot”), which have “author” edges back to home pages and “title” edges to “The Comma” and “The Dot”.]
Skolem Functions
• Maier, 1986
– in OO systems
• Kifer et al., 1989
– F-logic
• Hull and Yoshikawa, 1990
– deductive db (ILOG)
• Papakonstantinou et al., 1996
– semistructured db (MSL)
Skolem Functions in Logic
Origins: First Order Logic
The satisfiability problem: given a formula $\varphi$, does it have a model?
Skolem Functions in Logic
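• Example: does $\varphi = \forall x. \forall y. (R(x,y) \rightarrow \exists z. (R(x,z) \wedge R(y,z)))$ have a model?
Skolem functions: replace each $\exists$-quantified variable with a function of the enclosing $\forall$ variables, drop $\exists$
$\varphi' = \forall x. \forall y. (R(x,y) \rightarrow (R(x,f(x,y)) \wedge R(y,f(x,y))))$
Fact: $\varphi$ has a model iff $\varphi'$ has a model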
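A small Python sketch of the idea behind Skolem functions, using a simpler formula than the slide's example: "for all x there exists y with R(x, y)" (this formula, the finite domain, and the relation R are assumptions made for illustration). On a fixed finite structure, the formula holds exactly when a witness function f with R(x, f(x)) for all x can be built.

```python
def holds_forall_exists(domain, R):
    """Check the formula: for all x there exists y with R(x, y)."""
    return all(any((x, y) in R for y in domain) for x in domain)

def skolem_witness(domain, R):
    """Pick one witness y for each x; the result plays the role of f."""
    f = {}
    for x in domain:
        for y in domain:
            if (x, y) in R:
                f[x] = y
                break
    return f

def holds_skolemized(domain, R, f):
    """Check the Skolemized formula: for all x, R(x, f(x))."""
    return all(x in f and (x, f[x]) in R for x in domain)

domain = [0, 1, 2]
R = {(0, 1), (1, 2), (2, 0)}
f = skolem_witness(domain, R)
```

If some x has no witness, `skolem_witness` leaves it out of f and `holds_skolemized` returns False, which matches the original formula failing on that structure.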
Skolem Functions in Databases
Recall Datalog:
Answer(title, author) :- Paper(author, title, year)
Means:
$\forall author. \forall title. \forall year. (Answer(title, author) \leftarrow Paper(author, title, year))$
Skolem Functions in Databases
Now consider:
Answer(author, x) :- Paper(author, title, year)
I want to “create a new object x”. What is the meaning?
$\forall author. \forall title. \forall x. \forall year. (Answer(author, x) \leftarrow Paper(author, title, year))$
$\forall author. \forall title. \exists x. \forall year. (Answer(author, x) \leftarrow Paper(author, title, year))$
$\forall author. \exists x. \forall title. \forall year. (Answer(author, x) \leftarrow Paper(author, title, year))$
Skolem Functions in Databases
Better: use Skolem functions directly in Datalog
Choices:
Answer(author, NewObj(author)) :- Paper(author, title, year)
Answer(author, NewObj(author,title)) :- Paper(author, title, year)
Answer(author, NewObj(title,year)) :- Paper(author, title, year)
Answer(author, NewObj()) :- Paper(author, title, year)
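The choice of Skolem arguments decides how many new objects are created. A rough Python sketch, assuming the Paper relation is a list of (author, title, year) tuples flattened from the bibliography example; the helpers `answer` and `count_objects` are hypothetical names, and a Skolem term is modelled as a tuple.

```python
# Paper(author, title, year), one tuple per (paper, author) pair.
PAPER = [
    ("Jones", "The Comma", 1994),
    ("Smith", "The Comma", 1994),
    ("Jones", "The Dot", 1998),
]

def answer(skolem_args):
    """Evaluate Answer(author, NewObj(args)) :- Paper(author, title, year)."""
    result = set()
    for author, title, year in PAPER:
        binding = {"author": author, "title": title, "year": year}
        new_obj = ("NewObj",) + tuple(binding[name] for name in skolem_args)
        result.add((author, new_obj))
    return result

def count_objects(skolem_args):
    """Number of distinct new objects the rule creates."""
    return len({obj for _, obj in answer(skolem_args)})

per_author = count_objects(("author",))
per_author_title = count_objects(("author", "title"))
per_title_year = count_objects(("title", "year"))
single = count_objects(())
```

On this data the four choices create 2, 3, 2, and 1 objects respectively: one per author, one per (author, title) pair, one per (title, year) pair, and a single shared object.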
Skolem Functions in StruQL
StruQL’s semantics:
• Input graph: (Node, Edge)
• Output graph:(Node’, Edge’)
Example:
WHERE Root -> “Bib.paper.author” -> A
CREATE Root(), HomePage(A)
LINK Root() -> “person” -> HomePage(A),
     HomePage(A) -> “name” -> A,
     HomePage(A) -> “home” -> Root()
Node’(Root()) :-
Node’(HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge’(Root(),person,HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge’(HomePage(A),name,A) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge’(HomePage(A),home,Root()) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
XPath
• http://www.w3.org/TR/xpath (11/99)
• Building block for other W3C standards:
– XSL Transformations (XSLT)
– XML Link (XLink)
– XML Pointer (XPointer)
– XML Query
• Was originally part of XSL
Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
Data Model for XPath
[Figure: a tree. At the top is the root; below it is the root element bib, with two book children. The first book has children publisher (Addison-Wesley), author (Serge Abiteboul), ...]
Much like the XQuery data model
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
<year> 1998 </year>
/bib/paper/year
Result: empty (there were no papers)
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>
/bib//first-name
Result: <first-name> Rick </first-name>
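These queries can be tried with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. A sketch under one assumption worth stating: ElementTree paths are evaluated relative to an element, so with `bib` as the context the leading `/bib` is dropped and `//` is written `.//`. Whitespace inside the elements is trimmed here for readability.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)  # the <bib> element is the context node

# /bib/book/year, written relative to <bib>
years = [y.text for y in bib.findall("book/year")]

# /bib/paper/year: there are no papers, so nothing matches
paper_years = bib.findall("paper/year")

# //author: authors at any depth
authors = bib.findall(".//author")

# /bib//first-name
first_names = [e.text for e in bib.findall(".//first-name")]
```

For full XPath 1.0 support a dedicated XPath engine would be needed; this subset is enough for the child and descendant steps shown so far.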
XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman
Rick Hull doesn’t appear because his name is split into first-name and last-name child elements
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or @* or text())
– name() = returns the name of the current tag
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>
* Matches any element
XPath: Attribute Nodes
/bib/book/@price
Result: “55”
@price means that price has to be an attribute
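The text, wildcard, and attribute steps can be approximated with Python's standard `xml.etree.ElementTree`. This is only a sketch: ElementTree has no `text()` or `@price` selection steps, so those two are emulated in plain Python, and the document below is a trimmed copy of the bibliography with `bib` as the context node.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# /bib/book/author/text(): keep authors whose leading text is non-empty
author_text = [a.text.strip() for a in bib.findall("book/author")
               if a.text and a.text.strip()]

# //author/*: element children of any author
author_children = [c.tag for c in bib.findall(".//author/*")]

# /bib/book/@price: attribute values of the books that have a price
prices = [b.get("price") for b in bib.findall("book[@price]")]
```

Note that `Element.text` covers only the text before the first child element, which is sufficient for this data but is not a general replacement for `text()`.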
XPath: Qualifiers
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
XPath: More Qualifiers
/bib/book/author[first-name][address[//zip][city]]/last-name
Result: <last-name> … </last-name>
<last-name> … </last-name>
XPath: More Qualifiers
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
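A sketch of the qualifier examples with Python's standard `xml.etree.ElementTree`, again on a trimmed copy of the bibliography with `bib` as the context node. ElementTree supports the `[tag]` and `[@attr]` predicates but not comparisons such as `<` or `text()` inside a predicate, so those filters are written in Python here.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# /bib/book/author[first-name]: authors that have a first-name child
with_first_name = bib.findall("book/author[first-name]")

# /bib/book[@price < "60"]: select priced books, then compare numerically
cheap = [b for b in bib.findall("book[@price]") if float(b.get("price")) < 60]

# /bib/book[author/text()]: books with at least one author carrying text
with_author_text = [
    b for b in bib.findall("book")
    if any(a.text and a.text.strip() for a in b.findall("author"))
]
```

A qualifier only filters the nodes selected by the step it is attached to; the result is still the `author` or `book` element, not the node tested inside the brackets.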
XPath: Summary
bib                  matches a bib element
*                    matches any element
/                    matches the root element
/bib                 matches a bib element under root
bib/paper            matches a paper in bib
bib//paper           matches a paper in bib, at any depth
//paper              matches a paper at any depth
paper|book           matches a paper or a book
@price               matches a price attribute
bib/book/@price      matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname      matches…
XPath: More Details
• An XPath expression, p, establishes a relation between:
– A context node, and
– A node in the answer set
• In other words, p denotes a function:
– S[p] : Nodes -> {Nodes}
• Examples:
– author/firstname
– . = self
– .. = parent
– part/*/*/subpart/../name = part/*/*[subpart]/name
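The view of a path expression as a function from a context node to a set of nodes can be sketched in Python. The function name `S` follows the slide's notation and simply wraps `findall` from the standard `xml.etree.ElementTree`; the small document is made up to check the last equivalence.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
  <part>
    <a><b><subpart/><name>n1</name></b></a>
    <a><b><name>n2</name></b></a>
  </part>
</root>"""

context = ET.fromstring(DOC)

def S(p, node):
    """S[p]: map a context node to the list of nodes that p selects from it."""
    return node.findall(p)

# Go down to subpart, step back up with "..", then take name
via_parent = [e.text for e in S("part/*/*/subpart/../name", context)]

# Same nodes, expressed with a qualifier instead of ".."
via_qualifier = [e.text for e in S("part/*/*[subpart]/name", context)]
```

Only the first `b` element has a `subpart` child, so both expressions select the same single `name` element.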
The Root and the Root
• <bib> <paper> 1 </paper> <paper> 2 </paper> </bib>
• bib is the “document element”
• The “root” is above bib
• /bib = returns the document element
• / = returns the root
• Why ? Because we may have comments before and after <bib>; they become siblings of <bib>
• This is advanced xmlogy
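The difference between the root and the document element can be observed with Python's standard `xml.dom.minidom`. A short sketch, with comments added before and after `<bib>` as an assumed variation of the example above:

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<!-- before --><bib><paper>1</paper><paper>2</paper></bib><!-- after -->"
)

# Children of the root (the Document node): the comments are siblings of <bib>
top_level = [n.nodeName for n in doc.childNodes]

# The document element is the single element child of the root
document_element = doc.documentElement.nodeName
```

Here the Document node plays the role of `/`, while `doc.documentElement` corresponds to `/bib`.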
XPath: More Details
• We can navigate along 13 axes:
– ancestor
– ancestor-or-self
– attribute
– child
– descendant
– descendant-or-self
– following
– following-sibling
– namespace
– parent
– preceding
– preceding-sibling
– self
XPath: More Details
• Examples:
– child::author/child::lastname = author/lastname
– child::author/descendant::zip = author//zip
– child::author/parent::* = author/..
– child::author/attribute::age = author/@age
• What does this mean?
– paper/publisher/parent::*/author
– /bib//address[ancestor::book]
– /bib//author/ancestor::*//zip
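To get a feel for the ancestor axis, here is a Python sketch that emulates `/bib//address[ancestor::book]`. The standard `xml.etree.ElementTree` has no axis syntax, so the sketch builds a parent map by hand; the document, the zip codes, and the helper `ancestors` are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<bib>
  <book><author><address><zip>98195</zip></address></author></book>
  <paper><author><address><zip>10001</zip></address></author></paper>
</bib>"""

bib = ET.fromstring(DOC)

# Map every element to its parent so that we can walk upwards
parent = {child: p for p in bib.iter() for child in p}

def ancestors(node):
    """Yield the ancestors of node, nearest first (the ancestor axis)."""
    while node in parent:
        node = parent[node]
        yield node

# /bib//address[ancestor::book]: addresses that sit somewhere inside a book
in_books = [a for a in bib.iter("address")
            if any(x.tag == "book" for x in ancestors(a))]
zips = [a.find("zip").text for a in in_books]
```

The address inside the paper is filtered out because none of its ancestors is a book.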
XPath: Even More Details
• name() = the name of the current node
– /bib//*[name()="book"] same as /bib//book
• What does this mean? /bib//*[ancestor::*[name()!="book"]]
– In a different notation bib.[^book]*._
• Navigation axes give us strictly more power!