
Lecture 5: XML and XQuery - Kent State Universityjin/teaching/AdvancedDataBases/xml-xquery.pdf · XML and Semistructured Data Well-Formed XML with nested tags is exactly the same



Lecture 5: XML and XQuery


Semistructured Data

◆Another data model, based on trees.

◆Motivation: flexible representation of data.

◗ Often, data comes from multiple sources with differences in notation, meaning, etc.

◆Motivation: sharing of documents among systems and databases.


Graphs of Semistructured Data

◆Nodes = objects.

◆Labels on arcs (attributes, relationships).

◆Atomic values at leaf nodes (nodes with no arcs out).

◆Flexibility: no restriction on:

◗ Labels out of a node.

◗ Number of successors with a given label.
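The graph model above can be sketched in a few lines of Python. This is only an illustration: the representation (an object as a list of labeled arcs), the helper name `successors`, and the particular arcs chosen are assumptions, with values borrowed from the data-graph example.

```python
# Each object is a list of (label, target) arcs; a target is either another
# object or an atomic value (a leaf). Labels may repeat or be absent freely.
bud = [("name", "Bud"), ("manf", "A.B.")]
joes = [("name", "Joe's"), ("addr", "Maple")]
bud.append(("servedAt", joes))          # an arc leading to another object
root = [("beer", bud), ("bar", joes)]


def successors(obj, label):
    """Return every target reachable from obj by an arc with this label."""
    return [target for arc_label, target in obj if arc_label == label]


print(successors(bud, "name"))      # the atomic value at the end of the name arc
print(successors(joes, "prize"))    # empty list: a missing label is allowed
```

Nothing forces two objects reached by the same label to have the same arcs, which is the flexibility the slide describes.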


Example: Data Graph

[Figure: data graph. Node and arc labels: root, beer, beer, bar, name, manf, servedAt, prize, year, award, addr. Leaf values: Bud, M’lob, A.B., Joe’s, Maple, 1995, Gold. Callouts mark the beer object for Bud and the bar object for Joe’s Bar.]

Notice a new kind of data.


XML

◆XML = Extensible Markup Language.

◆While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., “this is an address”).

◆Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.


Well-Formed and Valid XML

◆Well-Formed XML allows you to invent your own tags.

◗ Similar to labels in semistructured data.

◆Valid XML involves a DTD (Document Type Definition), a grammar for tags.


Well-Formed XML

◆Start the document with a declaration, surrounded by <?xml … ?>.

◆Normal declaration is:

<?xml version = "1.0" standalone = "yes" ?>

◗ “Standalone” = “no DTD provided.”

◆Balance of document is a root tag surrounding nested tags.


Tags

◆Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO>.

◆Tags may be nested arbitrarily.

◆XML tags are case sensitive.


Example: Well-Formed XML

<?xml version = "1.0" standalone = "yes" ?>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callouts: a NAME subobject; a BEER subobject.
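Well-formedness can be checked mechanically by handing the text to any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the trimmed document (the elided second BAR is left out) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

# Tags are case sensitive, so </name> does not close <NAME>.
NOT_WELL_FORMED = "<BARS><BAR><NAME>Joe's Bar</name></BAR></BARS>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

Note that this parser only checks well-formedness; it does not validate against a DTD.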


XML and Semistructured Data

◆Well-Formed XML with nested tags is exactly the same idea as trees of semistructured data.

◆We shall see that XML also enables nontree structures, as does the semistructured data model.


Example

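◆The <BARS> XML document is:

[Figure: tree for the document. The root BARS has BAR children (one shown in full, the others elided). That BAR has a NAME child with text “Joe’s Bar” and two BEER children; each BEER has a NAME and a PRICE, with texts Bud and 2.50, and Miller and 3.00.]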


DTD Structure

<!DOCTYPE <root tag> [

<!ELEMENT <name>(<components>)>

. . . more elements . . .

]>


DTD Elements

◆The description of an element consists of its name (tag), and a parenthesized description of any nested tags.

◗ Includes order of subtags and their multiplicity.

◆Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags.


Example: DTD

<!DOCTYPE BARS [

<!ELEMENT BARS (BAR*)>

<!ELEMENT BAR (NAME, BEER+)>

<!ELEMENT NAME (#PCDATA)>

<!ELEMENT BEER (NAME, PRICE)>

<!ELEMENT PRICE (#PCDATA)>

]>

A BARS object has zero or more BAR’s nested within.

A BAR has one NAME and one or more BEER subobjects.

A BEER has a NAME and a PRICE.

NAME and PRICE are text.


Element Descriptions

◆Subtags must appear in order shown.

◆A tag may be followed by a symbol to indicate its multiplicity.

◗ * = zero or more.

◗ + = one or more.

◗ ? = zero or one.

◆Symbol | can connect alternative sequences of tags.


Example: Element Description

◆A name is an optional title (e.g., “Prof.”), a first name, and a last name, in that order, or it is an IP address:

<!ELEMENT NAME (

(TITLE?, FIRST, LAST) | IPADDR

)>
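An element description behaves like a regular expression over the sequence of child tags. As a rough sketch, the NAME content model above can be translated by hand into a Python regex; the encoding (each tag name followed by a space) and the helper name `matches_name_model` are assumptions of this example, not part of any DTD tool.

```python
import re

# Hand translation (an assumption of this sketch) of the content model
#   (TITLE?, FIRST, LAST) | IPADDR
# into a regular expression over space-terminated child tag names.
NAME_MODEL = re.compile(r"(?:(?:TITLE )?FIRST LAST |IPADDR )")


def matches_name_model(child_tags):
    """Check whether a sequence of child tag names fits the NAME model."""
    encoded = "".join(tag + " " for tag in child_tags)
    return NAME_MODEL.fullmatch(encoded) is not None


print(matches_name_model(["TITLE", "FIRST", "LAST"]))   # optional title present
print(matches_name_model(["FIRST", "LAST"]))            # optional title absent
print(matches_name_model(["LAST", "FIRST"]))            # wrong order
```

The `?` and `|` of the DTD map directly onto the same regex operators; `*` and `+` would do likewise.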


Use of DTD’s

1. Set standalone = "no".

2. Either:

a) Include the DTD as a preamble of the XML document, or

b) Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.


Example (a)

<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

The DOCTYPE block is the DTD; the BARS element is the document.


Example (b)

◆ Assume the BARS DTD is in file bar.dtd.

<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

The DOCTYPE line says: get the DTD from the file bar.dtd.


Attributes

◆Opening tags in XML can have attributes.

◆In a DTD,

<!ATTLIST E . . . >

declares an attribute for element E, along with its datatype.


Example: Attributes

◆Bars can have an attribute kind, a character string describing the bar.

<!ELEMENT BAR (NAME, BEER*)>

<!ATTLIST BAR kind CDATA #IMPLIED>

CDATA: character string type; no tags.

#IMPLIED: attribute is optional; the opposite is #REQUIRED.


Example: Attribute Use

◆ In a document that allows BAR tags, we might see:

<BAR kind = "sushi">
  <NAME>Akasaka</NAME>
  <BEER><NAME>Sapporo</NAME>
    <PRICE>5.00</PRICE></BEER>
  ...
</BAR>

Note that attribute values are quoted.


ID’s and IDREF’s

◆Attributes can be pointers from one object to another.

◗ Compare to HTML’s NAME = "foo" and HREF = "#foo".

◆Allows the structure of an XML document to be a general graph, rather than just a tree.


Creating ID’s

◆Give an element E an attribute A of type ID.

◆When using tag <E> in an XML document, give its attribute A a unique value.

◆Example:

<E A = "xyz">


Creating IDREF’s

◆To allow objects of type F to refer to another object with an ID attribute, give F an attribute of type IDREF.

◆Or, let the attribute have type IDREFS, so the F-object can refer to any number of other objects.


Example: ID’s and IDREF’s

◆ Let’s redesign our BARS DTD to include both BAR and BEER subelements.

◆ Both bars and beers will have ID attributes called name.

◆ Bars have SELLS subobjects, consisting of a number (the price of one beer) and an IDREF theBeer leading to that beer.

◆ Beers have attribute soldBy, which is an IDREFS leading to all the bars that sell it.


The DTD

<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*, BEER*)>
  <!ELEMENT BAR (SELLS+)>
    <!ATTLIST BAR name ID #REQUIRED>
  <!ELEMENT SELLS (#PCDATA)>
    <!ATTLIST SELLS theBeer IDREF #REQUIRED>
  <!ELEMENT BEER EMPTY>
    <!ATTLIST BEER name ID #REQUIRED>
    <!ATTLIST BEER soldBy IDREFS #IMPLIED>
]>

Bar elements have name as an ID attribute and have one or more SELLS subelements.

SELLS elements have a number (the price) and one reference to a beer.

Beer elements have an ID attribute called name, and a soldBy attribute that is a set of Bar names.

EMPTY: explained next.


Example Document

<BARS>
  <BAR name = "JoesBar">
    <SELLS theBeer = "Bud">2.50</SELLS>
    <SELLS theBeer = "Miller">3.00</SELLS>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>
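To see how ID and IDREF attributes turn the tree into a graph, the references can be resolved by hand. The sketch below uses Python's `xml.etree.ElementTree`, which does not read DTDs, so the ID index is built explicitly. The SuesBar element, its price, and the Miller BEER element are invented here to complete the elided parts of the example and are not from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical completion of the example document (SuesBar and the
# Miller BEER element are assumptions made for this sketch).
DOC = """<BARS>
  <BAR name="JoesBar">
    <SELLS theBeer="Bud">2.50</SELLS>
    <SELLS theBeer="Miller">3.00</SELLS>
  </BAR>
  <BAR name="SuesBar">
    <SELLS theBeer="Bud">2.40</SELLS>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar"/>
  <BEER name="Miller" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)

# Index every BAR and BEER by its ID attribute, which is called name.
by_id = {elem.get("name"): elem for elem in root if elem.get("name")}

# An IDREFS value is a whitespace-separated list of IDs.
bud = by_id["Bud"]
bars_selling_bud = [by_id[ref] for ref in bud.get("soldBy").split()]
print([bar.get("name") for bar in bars_selling_bud])

# Follow a single IDREF from a SELLS element to the BEER it names.
first_sells = by_id["JoesBar"].find("SELLS")
print(by_id[first_sells.get("theBeer")].tag)
```

A validating parser would also check that every ID is unique and every IDREF points to an existing ID; this sketch skips those checks.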


Empty Elements

◆We can do all the work of an element in its attributes.

◗ Like BEER in previous example.

◆Another example: SELLS elements could have attribute price rather than a value that is a price.


Example: Empty Element

◆In the DTD, declare:

<!ELEMENT SELLS EMPTY>

<!ATTLIST SELLS theBeer IDREF #REQUIRED>

<!ATTLIST SELLS price CDATA #REQUIRED>

◆Example use:

<SELLS theBeer = "Bud" price = "2.50"/>

Note the exception to the “matching tags” rule.


XPath

Path Expressions

Conditions


Paths in XML Documents

◆XPath is a language for describing paths in XML documents.

◆Really think of the semistructured data graph and its paths.


Example DTD

<!DOCTYPE BARS [

<!ELEMENT BARS (BAR*, BEER*)>

<!ELEMENT BAR (PRICE+)>

<!ATTLIST BAR name ID #REQUIRED>

<!ELEMENT PRICE (#PCDATA)>

<!ATTLIST PRICE theBeer IDREF #REQUIRED>

<!ELEMENT BEER EMPTY>

<!ATTLIST BEER name ID #REQUIRED>

<!ATTLIST BEER soldBy IDREFS #IMPLIED>

]>


Example Document

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>


Path Descriptors

◆Simple path descriptors are sequences of tags separated by slashes (/).

◆If the descriptor begins with /, then the path starts at the root and has those tags, in order.

◆If the descriptor begins with //, then the path can start anywhere.


Value of a Path Descriptor

◆Each path descriptor, applied to a document, has a value that is a sequence of elements.

◆An element is an atomic value or a node.

◆A node is matching tags and everything in between.

◗ I.e., a node of the semistructured graph.


Example: /BARS/BAR/PRICE

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>

/BARS/BAR/PRICE describes the set with these two PRICE elements as well as the PRICE elements for any other bars.


Example: //PRICE

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>

//PRICE describes the same PRICE elements, but only because the DTD forces every PRICE to appear within a BARS and a BAR.
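The two path descriptors can be tried out with the limited XPath subset in Python's `xml.etree.ElementTree`. This is a sketch under assumptions: the document is a trimmed copy of the example with the elided parts left out, and because ElementTree paths are evaluated relative to an element, `/BARS/BAR/PRICE` is written as `./BAR/PRICE` from the BARS root.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)   # root is the BARS element

# /BARS/BAR/PRICE: follow the tags in order, starting at the root.
from_root = root.findall("./BAR/PRICE")

# //PRICE: PRICE elements at any depth below the root.
anywhere = root.findall(".//PRICE")

print([p.text for p in from_root])
print(from_root == anywhere)   # same elements for this document
```

With a document whose structure allowed PRICE elsewhere, the two results could differ.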


Wild-Card *

◆A star (*) in place of a tag represents any one tag.

◆Example: /*/*/PRICE represents all price objects at the third level of nesting.


Example: /BARS/*

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>

/BARS/* captures all BAR and BEER elements, such as these.


Attributes

◆In XPath, we refer to attributes by prepending @ to their name.

◆Attributes of a tag may appear in paths as if they were nested within that tag.


Example: /BARS/*/@name

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>

/BARS/*/@name selects all name attributes of immediate subelements of the BARS element.
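A rough Python counterpart of the wildcard and attribute examples follows. ElementTree supports `*` in paths but cannot return attributes as path results, so the `@name` step is emulated with `Element.get`; that workaround and the trimmed document are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)   # the BARS element

# /BARS/*: every immediate subelement of BARS, whatever its tag.
children = root.findall("./*")
print([child.tag for child in children])

# /BARS/*/@name: the name attribute of each such subelement, if present.
names = [child.get("name") for child in children
         if child.get("name") is not None]
print(names)

# /*/*/PRICE: PRICE elements at the third level of nesting.
third_level_prices = root.findall("./*/PRICE")
print([p.text for p in third_level_prices])
```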


Selection Conditions

◆A condition inside […] may follow a tag.

◆If so, then only paths that have that tag and also satisfy the condition are included in the result of a path expression.


Example: Selection Condition

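◆/BARS/BAR/PRICE[. < 2.75]

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …

The condition that the PRICE be < $2.75 makes the Bud price but not the Miller price satisfy the path descriptor. The condition is attached to PRICE, the tag being filtered; written as /BARS/BAR[PRICE < 2.75]/PRICE, it would instead select every PRICE of any bar that has at least one price below 2.75.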


Example: Attribute in Selection

◆/BARS/BAR/PRICE[@theBeer = "Miller"]

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …

Now, the Miller PRICE element is selected, along with any other prices for Miller.
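Both kinds of selection condition can be imitated in Python. ElementTree understands attribute equality tests such as `[@theBeer='Miller']` but not numeric comparisons, so the price test is done in ordinary Python over each PRICE element; that split, and the trimmed document, are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)

# Numeric condition on the PRICE value, applied to each PRICE element.
cheap = [p for p in root.findall("./BAR/PRICE") if float(p.text) < 2.75]
print([p.get("theBeer") for p in cheap])

# Attribute condition, written directly in the path.
miller = root.findall("./BAR/PRICE[@theBeer='Miller']")
print([p.text for p in miller])
```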


Axes

◆In general, path expressions allow us to start at the root and execute steps to find a sequence of nodes at each step.

◆At each step, we may follow any one of several axes.

◆The default axis is child:: (go to all the children of the current set of nodes).


Example: Axes

◆/BARS/BEER is really shorthand for /BARS/child::BEER.

◆@ is really shorthand for the attribute:: axis.

◗ Thus, /BARS/BEER[@name = "Bud"] is shorthand for /BARS/BEER[attribute::name = "Bud"]


More Axes

◆ Some other useful axes are:

1. parent:: = parent(s) of the current node(s).

2. descendant-or-self:: = the current node(s) and all descendants.

◗ Note: // is really shorthand for this axis.

3. ancestor::, ancestor-or-self::, etc.
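The axes listed above can be emulated over an ElementTree document. ElementTree elements do not store a pointer to their parent, so this sketch first builds a child-to-parent map; the map, the helper name `ancestors`, and the trimmed document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar"/>
</BARS>"""

root = ET.fromstring(DOC)

# parent:: needs a lookup table from each element to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Return the parent, grandparent, ... of node, nearest first."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


price = root.find("./BAR/PRICE")
print(parent_of[price].tag)                 # parent::
print([a.tag for a in ancestors(price)])    # ancestor::
print([e.tag for e in root.iter()])         # descendant-or-self:: from the root
```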


XQuery

Values

FLWR Expressions

Other Expressions


XQuery

◆XQuery extends XPath to a query language that has power similar to SQL.

◆XQuery is an expression language.

◗ Like relational algebra: any XQuery expression can be an argument of any other XQuery expression.

◗ Unlike RA, with the relation as the sole datatype, XQuery has a subtle type system.


The XQuery Type System

1. Atomic values: strings, integers, etc.

◆ Also, certain constructed values like true(), date("2004-09-30").

2. Nodes.

◆ Seven kinds.

◆ We’ll only worry about four, on next slide.


Some Node Types

1. Element Nodes are like nodes of semistructured data.

◆ Described by !ELEMENT declarations in DTD’s.

2. Attribute Nodes are attributes, described by !ATTLIST declarations in DTD’s.

3. Text Nodes = #PCDATA.

4. Document Nodes represent files.


Example Document

<BARS>
  <BAR name = "JoesBar">
    <PRICE theBeer = "Bud">2.50</PRICE>
    <PRICE theBeer = "Miller">3.00</PRICE>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …"/> …
</BARS>


Example Nodes

[Figure: node tree for the example document. Element nodes (green): BARS, BAR, BEER, PRICE, PRICE. Attribute nodes (gold): name = "JoesBar", theBeer = "Bud", theBeer = "Miller", name = "Bud", soldBy = "…". Text nodes (purple): 2.50, 3.00.]


Document Nodes

◆Form: document("<file name>").

◆Establishes a document to which a query applies.

◆Example: document("/usr/ullman/bars.xml")


FLWR Expressions

1. One or more for and/or let clauses.

2. Then an optional where clause.

3. A return clause.


Semantics of FLWR Expressions

◆Each for creates a loop.

◗ let produces only a local definition.

◆At each iteration of the nested loops, if any, evaluate the where clause.

◆If the where clause returns TRUE, invoke the return clause, and append its value to the output.
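The evaluation order described above can be mimicked with an ordinary loop. The following Python sketch is an analogy only, not an XQuery implementation; the function `flwr`, its parameters, and the sample data are all hypothetical names introduced for this illustration.

```python
def flwr(items, let, where, ret):
    """Evaluate a FLWR-style expression over a Python sequence."""
    output = []
    for item in items:                        # for: one iteration per item
        local = let(item)                     # let: a local definition, no loop
        if where(item, local):                # where: keep or skip this iteration
            output.append(ret(item, local))   # return: append to the output
    return output


prices = [("Bud", "2.50"), ("Miller", "3.00")]
cheap = flwr(
    prices,
    let=lambda item: float(item[1]),
    where=lambda item, price: price < 3.00,
    ret=lambda item, price: "<CHEAP>" + item[0] + "</CHEAP>",
)
print(cheap)
```

Several for clauses would correspond to nested loops, with the where test evaluated once per combination of loop values.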


FOR Clauses

for <variable> in <expression>, . . .

◆Variables begin with $.

◆A for-variable takes on each item in the sequence denoted by the expression, in turn.

◆Whatever follows this for is executed once for each value of the variable.


Example: FOR

for $beer in document("bars.xml")/BARS/BEER/@name
return
  <BEERNAME> {$beer} </BEERNAME>

◆ $beer ranges over the name attributes of all beers in our example document.

◆ Result is a list of tagged names, like <BEERNAME>Bud</BEERNAME> <BEERNAME>Miller</BEERNAME> . . .

Braces { … }: “Expand the enclosed string by replacing variables and path exps. by their values.”


LET Clauses

let <variable> := <expression>, . . .

◆Value of the variable becomes the sequence of items defined by the expression.

◆Note let does not cause iteration; for does.


Example: LET

let $d := document("bars.xml")
let $beers := $d/BARS/BEER/@name
return
  <BEERNAMES> {$beers} </BEERNAMES>

◆Returns one element with all the names of the beers, like:

<BEERNAMES>Bud Miller …</BEERNAMES>


Following IDREF’s

◆XQuery (but not XPath) allows us to use paths that follow attributes that are IDREF’s.

◆If x denotes a sequence of one or more IDREF’s, then x=>y denotes all the elements with tag y whose ID’s are one of these IDREF’s.


Example

◆ Find all the beer elements where the beer is sold by Joe’s Bar for less than 3.00.

◆ Strategy:

1. $beer will for-loop over all beer elements.

2. For each $beer, let $joe be either the Joe’s-Bar element, if Joe sells the beer, or the empty sequence if not.

3. Test whether $joe sells the beer for < 3.00.


Example: The Query

let $d := document("bars.xml")
for $beer in $d/BARS/BEER
let $joe := $beer/@soldBy=>BAR[@name="JoesBar"]
let $joePrice := $joe/PRICE[@theBeer=$beer/@name]
where $joePrice < 3.00
return <CHEAPBEER> {$beer} </CHEAPBEER>

$joe: attribute soldBy is of type IDREFS. Follow each ref to a BAR and check if its name is Joe’s Bar.

$joePrice: find that PRICE subelement of the Joe’s Bar element that represents whatever beer is currently $beer.

where: only pass the values of $beer, $joe, $joePrice to the RETURN clause if the string inside the PRICE element $joePrice is < 3.00.
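The same strategy can be traced step by step in Python. This is a sketch, not a translation produced by any XQuery engine: the IDREFS dereference is done with a hand-built index, the result is a list of beer names rather than CHEAPBEER elements, and the SuesBar and Miller BEER entries are invented to give the loop something to reject.

```python
import xml.etree.ElementTree as ET

# Hypothetical completion of bars.xml (SuesBar and the Miller BEER
# element are assumptions of this sketch).
DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Miller">2.00</PRICE>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar"/>
  <BEER name="Miller" soldBy="JoesBar SuesBar"/>
</BARS>"""

root = ET.fromstring(DOC)
bars_by_id = {bar.get("name"): bar for bar in root.findall("./BAR")}

cheap_beers = []
for beer in root.findall("./BEER"):                          # for $beer
    refs = beer.get("soldBy", "").split()                    # the IDREFS value
    joe = [bars_by_id[r] for r in refs if r == "JoesBar"]    # let $joe
    joe_prices = [                                           # let $joePrice
        p for bar in joe for p in bar.findall("PRICE")
        if p.get("theBeer") == beer.get("name")
    ]
    if any(float(p.text) < 3.00 for p in joe_prices):        # where
        cheap_beers.append(beer.get("name"))                 # return

print(cheap_beers)
```

Miller is rejected even though Sue sells it for 2.00, because only Joe's price is tested.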


Order-By Clauses

◆FLWR is really FLWOR: an order-by clause can precede the return.

◆Form: order by <expression>

◗ With optional ascending or descending.

◆The expression is evaluated for each output element.

◆Determines placement in output sequence.


Example: Order-By

◆List all prices for Bud, lowest first.

let $d := document("bars.xml")
for $p in $d/BARS/BAR/PRICE[@theBeer="Bud"]
order by $p
return { $p }
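In Python terms, the order-by clause corresponds to sorting the loop's items before producing output. The sketch below assumes a small document with two bars selling Bud (the SuesBar entry is invented) and sorts numerically, which is an assumption about how the prices should be compared.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the SuesBar element is an assumption of this sketch.
DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">2.40</PRICE>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)

bud_prices = root.findall("./BAR/PRICE[@theBeer='Bud']")     # the for clause
ordered = sorted(bud_prices, key=lambda p: float(p.text))    # order by, ascending
print([p.text for p in ordered])
```

Passing `reverse=True` to `sorted` would play the role of descending.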


Predicates

◆Normally, conditions imply existential quantification.

◆Example: /BARS/BAR[@name] means “all the bars that have a name.”

◆Example: /BARS/BAR[@name="JoesBar"]/PRICE = /BARS/BAR[@name="SuesBar"]/PRICE means “Joe and Sue have at least one price in common.”
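The existential reading of `=` between two sequences can be spelled out as a double loop. In this Python sketch the helper names `prices_at` and `have_common_price`, and the SuesBar and MoesBar data, are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: SuesBar and MoesBar are assumptions of this sketch.
DOC = """<BARS>
  <BAR name="JoesBar">
    <PRICE theBeer="Bud">2.50</PRICE>
    <PRICE theBeer="Miller">3.00</PRICE>
  </BAR>
  <BAR name="SuesBar">
    <PRICE theBeer="Bud">3.00</PRICE>
    <PRICE theBeer="Miller">2.00</PRICE>
  </BAR>
  <BAR name="MoesBar">
    <PRICE theBeer="Bud">4.00</PRICE>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)


def prices_at(bar_name):
    """Return the prices charged at the named bar, as numbers."""
    path = "./BAR[@name='%s']/PRICE" % bar_name
    return [float(p.text) for p in root.findall(path)]


def have_common_price(bar_a, bar_b):
    """True if some price at bar_a equals some price at bar_b."""
    return any(a == b for a in prices_at(bar_a) for b in prices_at(bar_b))


print(have_common_price("JoesBar", "SuesBar"))
print(have_common_price("JoesBar", "MoesBar"))
```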


Path Expression Examples

[Figure: the document Doc drawn as a graph of objects with identifiers such as &o1, &o12, &o24, &o29, and so on. Below the element Bib are paper, book, and paper objects; arcs are labeled references, author, title, year, http, publisher, page, firstname, lastname, first, and last; leaf values include “Serge”, “Abiteboul”, 1997, “Victor”, “Vianu”, 122, and 133.]

Bib/paper = <&o12,&o29>

Bib/book/publisher = <&o51>

Bib/paper/author/lastname = <&o71,&206>

Note that order of elements matters!


FOR vs. LET: Example

FOR $x IN document("bib.xml")/bib/book
RETURN <result> {$x} </result>

Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...

LET $x := document("bib.xml")/bib/book
RETURN <result> {$x} </result>

Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ...</result>
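The difference between the two queries is the difference between looping over a sequence and binding the whole sequence to one name. A minimal Python analogy, with made-up book strings standing in for the book elements:

```python
# Hypothetical stand-ins for the book elements of bib.xml.
books = ["<book>A</book>", "<book>B</book>", "<book>C</book>"]

# FOR: the return clause runs once per book, giving one result per book.
for_result = ["<result>" + book + "</result>" for book in books]

# LET: the variable holds the whole sequence, so return runs once.
let_result = ["<result>" + "".join(books) + "</result>"]

print(len(for_result), len(let_result))
```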


XQuery Example 1

Find all book titles published after 1995:

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year > 1995

RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>


XQuery Example 2

For each author of a book by Morgan Kaufmann, list all books she published:

FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         {$a,
          FOR $t IN /bib/book[author=$a]/title
          RETURN $t}
       </result>

distinct = a function that eliminates duplicates (after converting inputs to atomic values)


Results for Example 2

<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>

Observe how nested structure of result elements is determined by the nested structure of the query.
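The nesting can be reproduced in Python with one loop inside another, which makes the correspondence between query structure and result structure visible. The bibliography below is invented to match the sample result (titles abc, def, ghi and authors Jones, Smith); the publisher "Other Press" is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents chosen to match the sample result.
BIB = """<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Other Press</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>"""

bib = ET.fromstring(BIB)

# distinct(...): authors of Morgan Kaufmann books, duplicates removed.
authors = []
for book in bib.findall("./book[publisher='Morgan Kaufmann']"):
    for author in book.findall("author"):
        if author.text not in authors:
            authors.append(author.text)

results = []
for name in authors:                          # outer FOR $a
    result = ET.Element("result")
    ET.SubElement(result, "author").text = name
    for book in bib.findall("./book"):        # inner FOR $t
        if any(a.text == name for a in book.findall("author")):
            for title in book.findall("title"):
                ET.SubElement(result, "title").text = title.text
    results.append(result)

for result in results:
    print(ET.tostring(result, encoding="unicode"))
```

The inner loop ranges over all books by the author, not only the Morgan Kaufmann ones, which is why title def appears under Jones.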


XQuery Example 3

count = (aggregate) function that returns the number of elements

<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>

For each publisher p:

- Let b be the list of books published by p.

- Count the books in b, and return p if the count exceeds 100.
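The grouping and counting logic can be sketched in Python as follows. This is an illustration under assumptions: the sample data and the function name `big_publishers` are hypothetical, and the threshold is a parameter so that a three-book sample can exercise it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; a threshold of 1 is used instead of 100.
BIB = """<bib>
  <book><publisher>P1</publisher></book>
  <book><publisher>P2</publisher></book>
  <book><publisher>P1</publisher></book>
</bib>"""


def big_publishers(root, threshold):
    """Publishers with more than `threshold` books, in first-seen order."""
    counts = Counter(book.findtext("publisher") for book in root.iter("book"))
    return [name for name, n in counts.items() if n > threshold]


root = ET.fromstring(BIB)
print(big_publishers(root, 1))  # ['P1']
```

The sketch counts in a single pass, whereas the query as written rescans the books once per publisher; both give the same answer.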


XQuery Example 4

Find books whose price is above the average price:

LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
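For comparison, the same two-step computation (bind the average once, then filter) in a short Python sketch. The prices are made-up sample values.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample data.
BIB = """<bib>
  <book><title>abc</title><price>30</price></book>
  <book><title>def</title><price>50</price></book>
  <book><title>ghi</title><price>100</price></book>
</bib>"""

books = ET.fromstring(BIB).findall("book")

# LET $a := avg(...): computed once, before the loop over books.
average = mean(float(b.findtext("price")) for b in books)

# FOR ... WHERE $b/price > $a ... RETURN $b (titles shown for brevity).
expensive = [b.findtext("title") for b in books
             if float(b.findtext("price")) > average]

print(average, expensive)  # 60.0 ['ghi']
```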


Collections in XQuery

◆ Ordered and unordered collections

◗ /bib/book/author = an ordered collection

◗ distinct(/bib/book/author) = an unordered collection

◆ Examples:

◗ LET $a := /bib/book: $a is a collection; the statement iterates over all books in the collection

◗ $b/author is also a collection (several authors...)

◗ However, RETURN <result> $b/author </result> returns a single collection:

<result> <author>...</author> <author>...</author> <author>...</author> ... </result>


Collections in XQuery

What about collections in expressions?

◆ $b/price: list of n prices

◆ $b/price * 0.7: list of n numbers??

◆ $b/price * $b/quantity: list of n x m numbers??

◗ Valid only if the two sequences have at most one element

◗ Atomization

◆ $book1/author eq "Kennedy" - Value Comparison

◆ $book1/author = "Kennedy" - General Comparison
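The two kinds of comparison can be modelled in Python, with lists of already-atomized values standing in for sequences. This is a sketch of the semantics as standardized in XQuery 1.0, not XQuery code; `general_eq`, `value_eq`, and the author names are hypothetical.

```python
def general_eq(left, right):
    """Model of '=': true if some left item equals some right item."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Model of 'eq': each operand must hold at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs operands with at most one item")
    if not left or not right:
        return None  # stands in for the empty sequence
    return left[0] == right[0]


authors = ["Kennedy", "Nixon"]  # hypothetical values of $book1/author

print(general_eq(authors, ["Kennedy"]))      # True
print(value_eq(["Kennedy"], ["Kennedy"]))    # True
```

Calling `value_eq(authors, ["Kennedy"])` raises `TypeError`, which mirrors the restriction that a value comparison is valid only when each sequence has at most one element.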


Sorting in XQuery

<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  ORDERBY $p
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           ORDERBY $b/price DESCENDING
           RETURN <book>
                    $b/title ,
                    $b/price
                  </book>
         </publisher>
</publisher_list>
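The two sort levels (publishers ascending, then each publisher's books by descending price) can be sketched in Python. The sample data is hypothetical, and tuples stand in for the constructed elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = """<bib>
  <book><title>abc</title><publisher>Zeta</publisher><price>10</price></book>
  <book><title>def</title><publisher>Alpha</publisher><price>20</price></book>
  <book><title>ghi</title><publisher>Alpha</publisher><price>35</price></book>
</bib>"""

books = ET.fromstring(BIB).findall("book")

publisher_list = []
# Outer ORDERBY $p: distinct publisher names in ascending order.
for name in sorted({b.findtext("publisher") for b in books}):
    own = [b for b in books if b.findtext("publisher") == name]
    # Inner ORDERBY $b/price DESCENDING: compare prices as numbers.
    own.sort(key=lambda b: float(b.findtext("price")), reverse=True)
    publisher_list.append(
        (name, [(b.findtext("title"), b.findtext("price")) for b in own])
    )

print(publisher_list)
# [('Alpha', [('ghi', '35'), ('def', '20')]), ('Zeta', [('abc', '10')])]
```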


Conditional Expressions: If-Then-Else

FOR $h IN //holding
ORDERBY $h/title
RETURN <holding>
         $h/title,
         IF $h/@type = "Journal"
         THEN $h/editor
         ELSE $h/author
       </holding>


Existential Quantifiers

FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing")
      AND contains($p, "windsurfing")
RETURN $b/title


Universal Quantifiers

FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
RETURN $b/title
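The existential and universal queries correspond closely to Python's `any` and `all`. The sketch below is an illustration on hypothetical data (the `<library>` wrapper and the book contents are assumptions); it runs both queries over the same three books.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: book C has no <para> elements at all.
DOC = """<library>
  <book><title>A</title>
        <para>sailing and windsurfing</para><para>cooking</para></book>
  <book><title>B</title>
        <para>sailing</para><para>more sailing</para></book>
  <book><title>C</title></book>
</library>"""

books = ET.fromstring(DOC).findall("book")


def paras(book):
    """Text of every <para> descendant, modelling $b//para."""
    return [p.text or "" for p in book.iter("para")]


# SOME $p ... SATISFIES contains(sailing) AND contains(windsurfing)
some = [b.findtext("title") for b in books
        if any("sailing" in p and "windsurfing" in p for p in paras(b))]

# EVERY $p ... SATISFIES contains(sailing)
every = [b.findtext("title") for b in books
         if all("sailing" in p for p in paras(b))]

print(some, every)  # ['A'] ['B', 'C']
```

Book C is returned by the universal query even though it has no paragraphs: a universal quantifier over an empty sequence is true.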


Other Stuff in XQuery

◆ Before and After

◗ for dealing with order in the input

◆ Filter

◗ deletes some edges in the result tree

◆ Recursive functions

◆ Namespaces

◆ References, links …

◆ Lots more stuff …


Appendix: XML Schema and XQuery Data Model


XML Schema

◆Includes primitive data types (integers, strings, dates, etc.)

◆Supports value-based constraints (integers > 100)

◆User-definable structured types

◆Inheritance (extension or restriction)

◆Foreign keys

◆Element-type reference constraints


Sample XML Schema

<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="author" type="string" />
  <element name="date" type="date" />
  <element name="abstract">
    <type> … </type>
  </element>
  <element name="paper">
    <type>
      <attribute name="keywords" type="string" />
      <element ref="author" minOccurs="0" maxOccurs="*" />
      <element ref="date" />
      <element ref="abstract" minOccurs="0" maxOccurs="1" />
      <element ref="body" />
    </type>
  </element>
</schema>


XML-Query Data Model

◆Describes XML data as a tree

◆Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode

http://www.w3.org/TR/query-datamodel/ (2/2001)


XML-Query Data Model

Element node (simplified definition):

◆ elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode

◆ QNameValue means "a tag name"

Reads: "Give me a tag, a set of attributes, a list of elements/values, and I will return an element"


XML Query Data Model

Example:

<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>

book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])

price2 = attrNode(…) /* next */
currency3 = attrNode(…)
title4 = elemNode(title, string9)
…


XQuery Values

◆ Item = node or atomic value.

◆ Value = ordered sequence of zero or more items.

◆ Examples:

1. () = empty sequence.

2. ("Hello", "World")

3. ("Hello", <PRICE>2.50</PRICE>, 10)


Nesting of Sequences Ignored

◆A value can, in principle, be an item of another value.

◆But nested list structures are expanded.

◆Example: ((1,2),(),(3,(4,5))) = (1,2,3,4,5) = 1,2,3,4,5.

◆Important when values are computed by concatenating other values.
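The flattening rule can be made concrete with a short Python sketch, in which nested tuples stand in for nested XQuery sequences. The helper name `flatten` is an assumption of this illustration.

```python
def flatten(value):
    """Expand nested tuples into one flat list, as XQuery does for sequences."""
    flat = []
    for item in value:
        if isinstance(item, tuple):
            flat.extend(flatten(item))  # a nested sequence contributes its items
        else:
            flat.append(item)           # an ordinary item is kept as is
    return flat


print(flatten(((1, 2), (), (3, (4, 5)))))  # [1, 2, 3, 4, 5]
```

The empty sequence contributes nothing, so concatenating values never introduces an extra level of structure.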


Effective Boolean Values

◆ The effective boolean value (EBV) of an expression is:

1. The actual value if the expression is of type boolean.

2. FALSE if the expression evaluates to 0, “” [the empty string], or () [the empty sequence].

3. TRUE otherwise.
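The three rules translate directly into a small Python function. This sketch follows only the simplified rules listed on this slide, with tuples standing in for sequences; the full XQuery definition has further cases that it leaves out, and the name `ebv` is chosen for this illustration.

```python
def ebv(value):
    """Effective boolean value, following the three rules on the slide."""
    if isinstance(value, bool):
        return value    # rule 1: a boolean is its own EBV
    if value == 0 or value == "" or value == ():
        return False    # rule 2: 0, the empty string, and () are false
    return True         # rule 3: everything else is true


print(ebv(0), ebv(""), ebv(()))            # False False False
print(ebv(7), ebv("false"), ebv(("x",)))   # True True True
```

The boolean check must come first, because in Python `False == 0` holds and rule 2 would otherwise absorb rule 1. Note also that the non-empty string "false" has EBV TRUE.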


EBV Examples

1. @name="JoesBar" has EBV TRUE or FALSE, depending on whether the name attribute is "JoesBar".

2. /BARS/BAR[@name="GoldenRail"] has EBV TRUE if some bar is named the Golden Rail, and FALSE if there is no such bar.


Boolean Operators

◆ E1 and E2, E1 or E2, not(E), if (E1) then E2 else E3 apply to any expressions.

◆ Take EBV's of the expressions first.

◆ Example: not(3 eq 5 or 0) has value TRUE.

◆ Also: true() and false() are functions that return values TRUE and FALSE.


Quantifier Expressions

some $x in E1 satisfies E2

1. Evaluate the sequence E1.

2. Let $x (any variable) be each item in the sequence, and evaluate E2.

3. Return TRUE if E2 has EBV TRUE for at least one $x.

◆ Analogously: every $x in E1 satisfies E2


Document Order

◆Comparison by document order: << and >>.

◆Example: $d/BARS/BEER[@name="Bud"] << $d/BARS/BEER[@name="Miller"] is true iff the Bud element appears before the Miller element in the document $d.


Set Operators

◆union, intersect, except operate on sequences of nodes.

◗ Meanings analogous to SQL.

◗ Result eliminates duplicates.

◗ Result appears in document order.
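A Python sketch of these three properties (node sequences, duplicate elimination, document order) is given below. It is an illustration rather than XQuery: nodes are compared by identity, the `BARS`/`BEER` names follow the Document Order slide, and the beer "Export" as well as all helper names are assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<BARS><BEER name='Bud'/><BEER name='Miller'/><BEER name='Export'/></BARS>"
)
# Position of every node in document order, keyed by node identity.
order = {id(node): pos for pos, node in enumerate(root.iter())}


def in_doc_order(nodes):
    """Drop duplicate nodes (by identity) and sort by document position."""
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: order[id(n)])


def union(a, b):
    return in_doc_order(a + b)


def intersect(a, b):
    ids = {id(n) for n in b}
    return in_doc_order([n for n in a if id(n) in ids])


def except_(a, b):
    ids = {id(n) for n in b}
    return in_doc_order([n for n in a if id(n) not in ids])


def names(nodes):
    return [n.get("name") for n in nodes]


bud, miller, export = root.findall("BEER")
print(names(union([export, bud], [bud, miller])))      # ['Bud', 'Miller', 'Export']
print(names(intersect([export, bud], [bud, miller])))  # ['Bud']
print(names(except_([export, bud], [bud, miller])))    # ['Export']
```

The union lists Bud once and returns the nodes in document order, even though the inputs were given in a different order.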


Other Operators

◆Use Fortran comparison operators to compare atomic values only.

◗ eq, ne, gt, ge, lt, le.

◆Arithmetic operators: +, - , *, div, idiv, mod.

◗ Apply to any expressions that yield arithmetic or date/time values.
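The three division-related operators are easy to confuse with their counterparts in other languages. The Python sketch below models integer operands only, following the XQuery 1.0 definitions: idiv truncates toward zero, and mod takes the sign of the dividend. The helper names `idiv` and `mod` are chosen for this illustration.

```python
def idiv(a, b):
    """Integer division that truncates toward zero, like XQuery's idiv."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def mod(a, b):
    """Remainder with the sign of the dividend, like XQuery's mod."""
    return a - idiv(a, b) * b


print(7 / 2, idiv(7, 2), mod(7, 2))   # 3.5 3 1
print(idiv(-7, 2), mod(-7, 2))        # -3 -1
print(-7 // 2, -7 % 2)                # -4 1
```

The last line shows Python's own `//` and `%`, which round toward negative infinity and therefore disagree with idiv and mod on negative operands. Plain `/` plays the role of div here: 7 div 2 yields 3.5 rather than 3.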