
Introduction to XPATH

Adapted from "XML How To Program" by Deitel

Chapter 5 – XML Path Language (XPath)

– Reading: XML Path Language (XPath), http://www.w3.org/TR/xpath

Introduction

• XPath is a language for specifying navigation within an XML document

• It also provides basic facilities for manipulating strings, numbers, and booleans

• XPath models an XML document as a tree of nodes
– The most common nodes are document (d), element (e), attribute (a), and text (t) nodes

• XPath defines a way to compute a string value for each type of node

• XPath is used by other XML technologies
– e.g., XSLT, XPointer, XQuery
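As a small illustration of the string-value rule, the sketch below uses Python's standard xml.etree.ElementTree module. It assumes XPath's definition that an element's string value is the concatenation of all text it contains, in document order, and that an attribute's string value is simply its value; string_value is a hypothetical helper name, and the XML snippets are compact versions of nodes from faculty.xml.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """String value of an element node: all contained text, in document order."""
    return "".join(elem.itertext())


# Student e104 and the attributes of course e102 from faculty.xml (compact form).
student = ET.fromstring("<student><sid>x000123</sid><grade>A</grade></student>")
course = ET.fromstring('<course cid="c2313" year="2009"/>')

print(string_value(student))   # x000123A
print(course.get("year"))      # 2009: an attribute's string value is its value
```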

Nodes

• XML document
– XML documents are treated as trees of nodes
– The root of the tree is called the document node (or root node)
– Each node represents part of the XML document

• Seven node types
– Root (document node)
– Element
– Attribute
– Text
– Comment
– Processing instruction
– Namespace

• Attribute and namespace nodes are not children of their parent element
– They describe their parent node
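The node kinds can be observed with Python's xml.etree.ElementTree, with some caveats: ElementTree has no separate document node or namespace nodes, stores text as .text/.tail strings rather than as text nodes, and drops comments and processing instructions unless asked to keep them (the insert_comments/insert_pis options need Python 3.8+). The kind helper and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SOURCE = ('<faculty name="Computing"><!-- partial listing -->'
          '<?sort by-cid?><course cid="c2313">ADA</course></faculty>')

# Keep comments and processing instructions, which ElementTree drops by default.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(SOURCE, parser=ET.XMLParser(target=builder))


def kind(node):
    """Classify an ElementTree node using XPath's node-kind names."""
    if node.tag is ET.Comment:
        return "comment"
    if node.tag is ET.ProcessingInstruction:
        return "processing-instruction"
    return "element"


child_kinds = [kind(child) for child in root]
course = root.find("course")
print(child_kinds)    # ['comment', 'processing-instruction', 'element']
print(root.attrib)    # {'name': 'Computing'}: attributes are not among the children
print(course.text)    # ADA (a text node in the XPath model)
```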

Partial faculty.xml

Node identifiers: d = document node, e = element, a = attribute, t = text node.

d101 faculty.xml
  e101 faculty (a101 name = "Computing")
    e102 course (a102 cid = "c2313", a103 year = "2009")
      e103 name: t101 "ADA"
      e104 student
        e105 sid: t102 "x000123"
        e106 grade: t103 "A"
      e107 student
        e108 sid: t104 "x0008787"
        e109 grade: t105 "B+"
    e110 course (a104 cid = "c2314", a105 year = "2009")
      e111 name: t106 "CIS"
      e112 student
        e113 sid: t107 "x000123"
        e114 grade: t108 "A"
      e115 student
        e116 sid: t109 "x0008787"
        e117 grade: t110 "C"
    e118 course (a106 cid = "c2313", a107 year = "2008")
      e119 name: t111 "IMD"
      e120 lecturer: t112 "Jones"
      e121 student
        e122 sid: t113 "x0008787"
        e123 grade: t114 "C+"
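To experiment with the faculty.xml tree (nodes e101 to e123), it can be loaded into Python's xml.etree.ElementTree. The XML text below is written out from the node diagram with no whitespace between tags (an assumption, since the full file is not shown), and elements are labelled e101, e102, ... in document order to match the diagram's identifiers.

```python
import xml.etree.ElementTree as ET

# Reconstruction (assumed) of the partial faculty.xml tree, written compactly.
FACULTY_XML = (
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><name>CIS</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)
root = ET.fromstring(FACULTY_XML)

# iter() walks elements in document order (pre-order), starting with root,
# which is the order the diagram uses for e101, e102, ...
labels = {f"e{number}": elem for number, elem in enumerate(root.iter(), start=101)}

print(len(labels))                # 23
print(labels["e118"].attrib)      # {'cid': 'c2313', 'year': '2008'}
print(labels["e123"].text)        # C+
```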

XPath Axes

• Within an XPath step, the axis specifies the “direction” in which to navigate through the document
– For example, in the step child::student the axis is child:: and the node test is student; the step selects all children of the context node that are named student

• XPath 1.0 defines 13 axes for navigation

• The child:: axis is the most commonly used

• Some of the others are:
– attribute:: (accesses the attributes of the context node)
– descendant:: (accesses the descendants of the context node)
– self:: (accesses the context node itself)
– descendant-or-self:: (accesses the context node and its descendants, returning the nodes that satisfy the node test)
– parent:: (accesses the parent of the context node)
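The axes listed above can be sketched as plain functions over xml.etree.ElementTree elements. ElementTree elements do not keep a link to their parent, so the sketch builds a parent lookup table first. The function names are hypothetical, attributes are returned as (name, value) pairs, and because ElementTree has no document node, the document element gets an empty parent axis here (in XPath its parent is the document node).

```python
import xml.etree.ElementTree as ET

# Only course e118 of faculty.xml is kept here, for brevity.
root = ET.fromstring(
    '<faculty name="Computing"><course cid="c2313" year="2008">'
    '<name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student>'
    '</course></faculty>'
)
# ElementTree elements do not link back to their parents, so build a lookup.
PARENT = {child: parent for parent in root.iter() for child in parent}


def child_axis(node):
    return list(node)


def attribute_axis(node):
    # Attributes live in node.attrib, not among the children.
    return list(node.attrib.items())


def self_axis(node):
    return [node]


def descendant_axis(node):
    return list(node.iter())[1:]   # iter() yields node itself first


def descendant_or_self_axis(node):
    return list(node.iter())


def parent_axis(node):
    return [PARENT[node]] if node in PARENT else []


course = root.find("course")
print([e.tag for e in child_axis(course)])       # ['name', 'lecturer', 'student']
print(attribute_axis(course))                    # [('cid', 'c2313'), ('year', '2008')]
print([e.tag for e in descendant_axis(course)])  # ['name', 'lecturer', 'student', 'sid', 'grade']
print([e.tag for e in parent_axis(course)])      # ['faculty']
```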

XPath axes

Axis name – Ordering – Description
self:: – none – The context node itself. See Note
parent:: – reverse – The context node’s parent, if one exists. See Note
child:: – forward – The context node’s children, if they exist. The default if no axis is provided. See Note
ancestor:: – reverse – The context node’s ancestors (parent, grandparent, etc.), if they exist.
ancestor-or-self:: – reverse – The context node’s ancestors and the context node itself.
descendant:: – forward – The context node’s descendants (children, grandchildren, etc.).
descendant-or-self:: – forward – The context node’s descendants and the context node itself.
following:: – forward – Everything in the document after the closing tag of the context node.
following-sibling:: – forward – The sibling nodes following the context node.
preceding:: – reverse – Everything in the document before the start tag of the context node, excluding its ancestors.
preceding-sibling:: – reverse – The sibling nodes preceding the context node.
attribute:: – forward – The attribute nodes of the context node. See Note
namespace:: – forward – The namespace nodes of the context node.

NOTE: The axes marked “See Note” can be written in an abbreviated form.
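Under the same assumptions (plain Python functions over xml.etree.ElementTree, with a hand-built parent map and a document-order index), the remaining structural axes in the table can be sketched as follows. Reverse axes are returned nearest node first, which is the order in which XPath counts positions on them; the helper names are hypothetical, and attribute and namespace nodes are ignored.

```python
import xml.etree.ElementTree as ET

# Course e118 of faculty.xml, used as the whole tree for brevity.
root = ET.fromstring(
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
)
PARENT = {child: parent for parent in root.iter() for child in parent}
ORDER = {node: i for i, node in enumerate(root.iter())}   # document order


def ancestor_axis(node):
    """Reverse axis: returned nearest ancestor first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def following_sibling_axis(node):
    if node not in PARENT:
        return []
    siblings = list(PARENT[node])
    return siblings[siblings.index(node) + 1:]


def preceding_sibling_axis(node):
    """Reverse axis: returned nearest sibling first."""
    if node not in PARENT:
        return []
    siblings = list(PARENT[node])
    return siblings[:siblings.index(node)][::-1]


def following_axis(node):
    """Nodes after node in document order, excluding its descendants."""
    inside = set(node.iter())
    return [n for n in root.iter() if ORDER[n] > ORDER[node] and n not in inside]


def preceding_axis(node):
    """Nodes before node in document order, excluding its ancestors (reversed)."""
    ancestors = set(ancestor_axis(node))
    before = [n for n in root.iter() if ORDER[n] < ORDER[node] and n not in ancestors]
    return before[::-1]


name, lecturer, student = list(root)
sid, grade = list(student)
print([n.tag for n in ancestor_axis(sid)])               # ['student', 'course']
print([n.tag for n in following_sibling_axis(name)])     # ['lecturer', 'student']
print([n.tag for n in preceding_sibling_axis(student)])  # ['lecturer', 'name']
print([n.tag for n in following_axis(lecturer)])         # ['student', 'sid', 'grade']
print([n.tag for n in preceding_axis(grade)])            # ['sid', 'lecturer', 'name']
```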

Some location-path abbreviations

Location path – Description
child:: – Used by default if no axis is supplied, and may therefore be omitted.
attribute:: – The attribute axis may be abbreviated as @.
/descendant-or-self::node()/ – This location path is abbreviated as two slashes (//).
self::node() – The context node is abbreviated with a period (.).
parent::node() – The context node’s parent is abbreviated with two periods (..).
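The abbreviation rules in this table are purely textual, so they can be illustrated with a small rewriting function. This is a hypothetical Python sketch that handles only predicate-free paths built from names, *, ., .., @name and //; it does not validate its input.

```python
def expand_step(step):
    """Rewrite one abbreviated, predicate-free step in unabbreviated form."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step:
        return step                  # already has an explicit axis
    return "child::" + step          # child:: is the default axis


def expand_path(path):
    """Expand an abbreviated location path that contains no predicates."""
    path = path.replace("//", "/descendant-or-self::node()/")
    steps = [expand_step(s) for s in path.split("/") if s]
    prefix = "/" if path.startswith("/") else ""
    return prefix + "/".join(steps)


print(expand_path("student/sid"))   # child::student/child::sid
print(expand_path(".//student"))    # self::node()/descendant-or-self::node()/child::student
print(expand_path("../lecturer"))   # parent::node()/child::lecturer
print(expand_path("//student"))     # /descendant-or-self::node()/child::student
```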

Node-set operators

Operator – Description
pipe (|) – Performs the union of two node-sets.
slash (/) – Separates location steps.
double-slash (//) – Abbreviation for the location path /descendant-or-self::node()/

Other operators
div – Division.
=, !=, <, <=, >, >= – Comparison operators (also supported).
and – Logical AND.
or – Logical OR.
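The union operator | merges node-sets, removes duplicates and yields the nodes in document order. Python's xml.etree.ElementTree findall() does not support |, so the hypothetical union helper below combines separate findall() results; course e118 serves as the sample document.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
)
ORDER = {node: i for i, node in enumerate(root.iter())}   # document order


def union(*node_sets):
    """Model XPath's '|': merge node-sets, drop duplicates, keep document order."""
    merged = {node for nodes in node_sets for node in nodes}
    return sorted(merged, key=ORDER.__getitem__)


# Roughly: lecturer | student/sid | lecturer | name
result = union(root.findall("lecturer"), root.findall("student/sid"),
               root.findall("lecturer"), root.findall("name"))
print([n.tag for n in result])   # ['name', 'lecturer', 'sid']
```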

Some node-set and string functions

Function – Description
last() – Returns the number of nodes in the node-set being tested (the context size), so [last()] selects the last node.
position() – Returns the position number of the current node in the node-set being tested.
count() – Returns the number of nodes in the node-set argument.
name() – Returns a string containing the name of the node in the node-set argument that is first in document order.
string-length() – Returns the number of characters in the string.
starts-with() – Returns true if the first argument string starts with the second argument string; otherwise returns false.
contains() – Returns true if the first argument string contains the second argument string; otherwise returns false.

See http://www.w3.org/TR/xpath for more functions
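To make the function descriptions concrete, the sketch below maps each one onto an ordinary Python operation on xml.etree.ElementTree results. These correspondences are illustrative assumptions, not an XPath library API; the three courses are a cut-down faculty.xml.

```python
import xml.etree.ElementTree as ET

faculty = ET.fromstring(
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name></course>'
    '<course cid="c2314" year="2009"><name>CIS</name></course>'
    '<course cid="c2313" year="2008"><name>IMD</name></course>'
    '</faculty>'
)
courses = faculty.findall("course")     # the node-set being tested

count = len(courses)                    # count(course)
# course[position() = last()]: position() counts from 1; last() is the set size.
last_course = [c for pos, c in enumerate(courses, start=1) if pos == len(courses)]
first_name = courses[0].tag             # name(course): first node in document order
title = courses[1].findtext("name")     # string value of the second course's name
length = len(title)                     # string-length(...)
flags = (title.startswith("CI"), "IS" in title)   # starts-with(...), contains(...)

print(count, last_course[0].findtext("name"), first_name)   # 3 IMD course
print(title, length, flags)                                  # CIS 3 (True, True)
```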

The Node-test of an XPath Step

• A node test specifies a simple test on the XML nodes found along the step’s axis

• Nodes that pass that test are candidates for the next step

• The node test can be based on the
– Node name, or
– Node kind

Node-Test Based on Names

• Each axis has a main node kind
– the attribute:: axis has attribute as its main node kind (and namespace:: has namespace)
– the other axes (child::, descendant::, parent::, ...) have element as their main node kind

• A name test can only be true for nodes of the axis’s main node kind

• Suppose course e118 is the context node:
– descendant::sid returns (e122)
– child::* returns (e119, e120, e121)
– attribute::year returns (a107), whose value is 2008
– attribute::name returns () (an empty sequence of nodes), because name is a child element of the course, not an attribute
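The e118 examples can be checked with Python's xml.etree.ElementTree. In this sketch (course e118 written out as XML, an assumption about its exact text) descendant::sid is approximated with iter(), child::* with list(), and attribute::year with get(); get() returns None where XPath would return an empty sequence.

```python
import xml.etree.ElementTree as ET

# Course e118 from faculty.xml.
course = ET.fromstring(
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
)

descendant_sid = [e for e in course.iter("sid") if e is not course]   # descendant::sid
children = list(course)                                               # child::*
year = course.get("year")                                             # attribute::year
name_attr = course.get("name")   # attribute::name: <name> is an element, not an attribute

print([e.text for e in descendant_sid])   # ['x0008787']
print([e.tag for e in children])          # ['name', 'lecturer', 'student']
print(year, name_attr)                    # 2008 None
```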

Node-Test Based on the Node Kind

• The most common node tests that are based on the node kind are:
– node(), which selects each node, regardless of its kind
– text(), which selects each t-node
– element(), which selects each e-node
– attribute(), which selects each a-node
– comment(), which selects each comment node
(element() and attribute() are XPath 2.0 kind tests; XPath 1.0 offers node(), text(), comment() and processing-instruction().)

• Suppose student node e121 is the context node; then

child::grade/child::text()

returns the sequence (t114), whose string value is C+ (in practice, a query processor usually displays just the string C+)
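ElementTree does not create text nodes; it stores text as strings in .text (before the first child) and .tail (after each child). Under that model, child::text() can be sketched with a hypothetical helper that gathers those strings; the mixed-content paragraph is an invented example showing why an element can have more than one text child.

```python
import xml.etree.ElementTree as ET


def text_children(elem):
    """child::text() for one element: the text chunks directly inside it.

    ElementTree keeps the text before the first child in elem.text and the
    text after each child in that child's .tail; None means "no text".
    """
    chunks = [elem.text] + [child.tail for child in elem]
    return [t for t in chunks if t]


# Student e121 from faculty.xml: child::grade/child::text()
student = ET.fromstring("<student><sid>x0008787</sid><grade>C+</grade></student>")
result = [t for grade in student.findall("grade") for t in text_children(grade)]
print(result)                  # ['C+']

# Mixed content (an invented example): text on both sides of a child element.
para = ET.fromstring("<p>Hello <b>big</b> world</p>")
print(text_children(para))     # ['Hello ', ' world']
```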

XPath Location Paths

• Navigation through an XML document is declared using location path expressions

• Location paths can be expressed using either an unabbreviated or an abbreviated syntax

• Location paths are made up of steps

Evaluation of a Location Path

• A location path is evaluated step by step, from left to right

• A step is applied to a single node, the so-called context node

• Applying a step to a context node selects a sequence of result nodes

• Each node of a result sequence is then used as a context node in the following step

• The result of the expression is the union of the node sequences selected by the last step (in document order, without duplicates)
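The left-to-right evaluation described above can be sketched as a tiny evaluator over xml.etree.ElementTree. It is a hypothetical illustration that understands only child steps (a name or *) and .. for parent::node(); after every step the results from all context nodes are merged, de-duplicated and put back in document order, which is why the course/student/.. example yields three courses rather than five. faculty.xml is written out from the node diagram (an assumption).

```python
import xml.etree.ElementTree as ET

FACULTY_XML = (
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><name>CIS</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)
root = ET.fromstring(FACULTY_XML)
PARENT = {child: parent for parent in root.iter() for child in parent}
ORDER = {node: i for i, node in enumerate(root.iter())}   # document order


def apply_step(context, step):
    """Apply one step to a single context node (only '..', '*' and names)."""
    if step == "..":                                       # parent::node()
        return [PARENT[context]] if context in PARENT else []
    return [c for c in context if step in ("*", c.tag)]    # child::name / child::*


def evaluate(start, steps):
    """Evaluate steps left to right; each result node is the next context node."""
    current = [start]
    for step in steps:
        selected = set()
        for context in current:
            selected.update(apply_step(context, step))
        current = sorted(selected, key=ORDER.__getitem__)  # union, document order
    return current


# Context node: faculty (e101). Path: child::course/child::student/child::sid
sids = evaluate(root, ["course", "student", "sid"])
print([s.text for s in sids])  # ['x000123', 'x0008787', 'x000123', 'x0008787', 'x0008787']
# course/student/.. : five students, but only three distinct parents survive
print(len(evaluate(root, ["course", "student", ".."])))   # 3
```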

Unabbreviated Syntax of Location Paths

• A location path has the following syntax:

Path ::= Step1/…/Stepn

where each Step is a triple (Axis, Node-test, Predicate*) and is defined as follows:

Step ::= Axis::Node-test Predicate*
– The axis specifies the direction to move in the document tree
– The node test selects nodes along the specified axis, and
– The predicates (if any) filter the nodes selected

• The separator “/” between two consecutive steps makes each node selected by the first step the context node of the next step; with the child:: axis this expresses a direct parent-child relationship
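Because a step is a triple (Axis, Node-test, Predicate*), an unabbreviated path can be taken apart mechanically. The hypothetical Python helpers below split a path on / characters that are not inside predicates or quoted strings, then split each step into its axis, node test and list of predicates; they assume well-formed input and perform no further validation.

```python
def split_steps(path):
    """Split a location path on '/' characters outside predicates and quotes."""
    steps, current, depth, quote = [], "", 0, None
    for ch in path:
        if quote:                      # inside a string literal
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "/" and depth == 0:
            steps.append(current)      # step separator
            current = ""
            continue
        current += ch
    steps.append(current)
    return steps


def parse_step(step):
    """Split 'axis::node-test[p1][p2]...' into (axis, node_test, [p1, p2, ...])."""
    head, bracket, tail = step.partition("[")
    axis, sep, node_test = head.partition("::")
    if not sep:                        # no explicit axis: child:: is the default
        axis, node_test = "child", head
    predicates, current, depth, quote = [], "", 0, None
    for ch in bracket + tail:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            depth += 1
            if depth == 1:             # opening bracket of a new predicate
                continue
        elif ch == "]":
            depth -= 1
            if depth == 0:             # closing bracket: predicate complete
                predicates.append(current)
                current = ""
                continue
        current += ch
    return axis, node_test, predicates


path = ("/child::faculty/child::course[attribute::cid='c2313']"
        "/child::student[child::sid='x0008787']")
for step in split_steps(path):
    if step:                           # a leading '' marks an absolute path
        print(parse_step(step))
# ('child', 'faculty', [])
# ('child', 'course', ["attribute::cid='c2313'"])
# ('child', 'student', ["child::sid='x0008787'"])
```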

What Does an XPath Expression Return?

• A location path expression returns a sequence of result nodes, shown with their contents in the form of an XML document

• This XML document does not have to be well formed

• XPath expression:

/child::faculty/child::course[attribute::cid=“c2313”]/child::student[child::sid=“x0008787”]

• Result:

<student><sid>x0008787</sid><grade>B+</grade></student>
<student><sid>x0008787</sid><grade>C+</grade></student>
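The same query can be tried with Python's xml.etree.ElementTree, which supports a subset of abbreviated XPath that includes [@attr='value'] and [child='text'] predicates. ElementTree paths are relative to the element they are called on, so starting from the faculty element stands in for /child::faculty; faculty.xml is written out compactly from the node diagram (an assumption).

```python
import xml.etree.ElementTree as ET

FACULTY_XML = (
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><name>CIS</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)
root = ET.fromstring(FACULTY_XML)

# Abbreviated form of child::course[attribute::cid=...]/child::student[child::sid=...]
students = root.findall("course[@cid='c2313']/student[sid='x0008787']")
for s in students:
    print(ET.tostring(s, encoding="unicode"))
# <student><sid>x0008787</sid><grade>B+</grade></student>
# <student><sid>x0008787</sid><grade>C+</grade></student>
```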

Predicates of a Step

• An XPath step can also include a sequence of predicates in square brackets:

[<predicate>][<predicate>]

• Predicates are applied to the nodes selected by the node test

• Only nodes for which all predicates evaluate to true belong to the result of the step

• A predicate compares a node property with a value using operators from the set {=, <, >, <=, >=, !=}

• A node property can be:
– The value of an attribute,
– The text content (#PCDATA) of an element, or
– The position of a node among the nodes selected (returned by the function position())
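A sketch of how a sequence of predicates filters nodes: each predicate is applied in turn, and position() and last() are counted within the list left by the previous predicate. The helper names are hypothetical and the predicates are written as Python functions; the example shows that course[@cid="c2313"][2] and course[2][@cid="c2313"] give different answers on faculty.xml.

```python
import xml.etree.ElementTree as ET

# The three courses of faculty.xml, reduced to their attributes.
faculty = ET.fromstring(
    '<faculty><course cid="c2313" year="2009"/><course cid="c2314" year="2009"/>'
    '<course cid="c2313" year="2008"/></faculty>'
)


def apply_predicates(nodes, *predicates):
    """Keep nodes for which every predicate holds, applying them left to right.

    Each predicate sees position() and last() relative to the list produced
    by the previous predicate, not the original list.
    """
    for predicate in predicates:
        size = len(nodes)                                  # last()
        nodes = [n for pos, n in enumerate(nodes, start=1)
                 if predicate(n, pos, size)]
    return nodes


def cid_is_c2313(node, pos, size):     # [attribute::cid="c2313"]
    return node.get("cid") == "c2313"


def is_second(node, pos, size):        # [position()=2]
    return pos == 2


courses = list(faculty)                                    # child::course
print([c.get("year") for c in apply_predicates(courses, cid_is_c2313, is_second)])  # ['2008']
print([c.get("year") for c in apply_predicates(courses, is_second, cid_is_c2313)])  # []
```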

Examples of XPath Predicates

• Let faculty e101 be the context node:
– child::course[position()=2] selects the second child element of the context node that has the name course, and returns e110
– child::course[attribute::cid=“c2313”] selects all course children of the context node that have the attribute cid=“c2313”, and returns (e102, e118)
– descendant::student[child::sid=“x000123”] selects the student descendants of the context node that have a sid child with a string value equal to “x000123”, and returns (e104, e112)
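The three examples can be run with Python's xml.etree.ElementTree, starting from the faculty element as context node. One caveat: ElementTree's [n] counts among siblings with the same tag, which matches XPath here only because every child of faculty is a course. faculty.xml is reconstructed from the node diagram (an assumption).

```python
import xml.etree.ElementTree as ET

FACULTY_XML = (
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><name>CIS</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)
faculty = ET.fromstring(FACULTY_XML)            # context node: faculty (e101)

second_course = faculty.findall("course[2]")            # child::course[position()=2]
c2313 = faculty.findall("course[@cid='c2313']")         # child::course[attribute::cid="c2313"]
x000123 = faculty.findall(".//student[sid='x000123']")  # descendant::student[child::sid="x000123"]

print([c.get("cid") for c in second_course])   # ['c2314']  (e110)
print([c.get("year") for c in c2313])          # ['2009', '2008']  (e102, e118)
print(len(x000123))                            # 2  (e104, e112)
```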

Abbreviated Syntax of Location Path (1)

• The most important abbreviation is that the child:: axis can be omitted from a location step

• In fact, child:: is the default axis. For example,
– student/sid is short for
– child::student/child::sid

• There is also an abbreviation for attributes: attribute:: can be abbreviated to @

• For example,
– course[@year=“2009”] is short for
– child::course[attribute::year=“2009”] and will select all course children of the context node whose year is “2009” (e102, e110)

Abbreviated Syntax of Location Path (2)

• If a predicate expression evaluates to a number, that value is taken as the position of the node to be selected
– For example, the step student[2] would select the second student child of the context node

• The double slash ‘//’ is also a frequently used abbreviation; it specifies that the element that follows may be nested anywhere below the context node (or anywhere within the document, when the path starts with //)
– //student would select all student nodes anywhere within the document
– course[@cid=“c2313”][@year=“2008”]//grade will select all grade elements nested within the course element with cid=“c2313” and year=“2008”
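These abbreviations can be tried with Python's xml.etree.ElementTree, which accepts several predicates on one step as well as // descendant searches. Since ElementTree paths must be relative, .//student from the faculty element stands in for //student (equivalent here because faculty is the document element); faculty.xml is reconstructed from the node diagram (an assumption).

```python
import xml.etree.ElementTree as ET

FACULTY_XML = (
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><name>CIS</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)
faculty = ET.fromstring(FACULTY_XML)

# //student: ElementTree needs a leading '.' when searching from an element.
all_students = faculty.findall(".//student")
# course[@cid="c2313"][@year="2008"]//grade: both predicates must hold.
grades = faculty.findall("course[@cid='c2313'][@year='2008']//grade")

print(len(all_students))          # 5
print([g.text for g in grades])   # ['C+']
```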

Abbreviated Syntax of Location Path (3)

• A location step of “.” is short for self::node(), where self:: refers to the context node and node() matches nodes of any type. For example:
– .//student is short for
– self::node()/descendant-or-self::node()/child::student
and will select all student elements that are children of the context node itself or of any of its descendants

• A location step of “..” is short for parent::node(). For example,
– ../lecturer is short for
– parent::node()/child::lecturer and will select all lecturer children of the parent of the context node
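ElementTree also understands . and .., with one important limitation: .. can only reach parents inside the element that findall() was called on, because ElementTree elements do not store parent links. The sketch below, using a reconstructed faculty.xml (an assumption), therefore walks down from faculty to reach each student's parent, and shows that the same ../lecturer step finds nothing when started at student e121.

```python
import xml.etree.ElementTree as ET

FACULTY_XML = (
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><name>ADA</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><name>CIS</name>'
    '<student><sid>x000123</sid><grade>A</grade></student>'
    '<student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)
faculty = ET.fromstring(FACULTY_XML)

itself = faculty.findall(".")                              # self::node()
# ../lecturer for every student, reached by starting the walk at faculty.
lecturers = faculty.findall("course/student/../lecturer")
student = faculty.find("course[3]/student")                # student e121

print(itself[0].tag)                    # faculty
print([e.text for e in lecturers])      # ['Jones']
print(student.findall("../lecturer"))   # [] because the parent is not visible from e121
```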

<?xml version = "1.0"?>

<!-- books.xml -->
<books>
   <!-- XML book list -->
   <book>
      <title>Java How to Program</title>
      <translation edition = "1">Spanish</translation>
      <translation edition = "1">Chinese</translation>
      <translation edition = "1">Japanese</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Japanese</translation>
      <price>75</price>
   </book>

   <book>
      <title>C++ How to Program</title>
      <translation edition = "1">Korean</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Spanish</translation>
      <translation edition = "3">Italian</translation>
      <translation edition = "3">Japanese</translation>
      <price>65</price>
   </book>
</books>

Predicate Exercises for books.xml

• Examine the XPath expressions and
– In your own words, explain what will be returned
– Execute them. Did you get it right?

1. /books/book[2]

2. /child::books/child::book[position()=2]

3. /books/book[price>70]

4. /books/book[price>70]/title/text()

5. /books/book[last()]

6. /books/book/translation[@edition="1" and text() ="Chinese"]/preceding-sibling::title/text()
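One way to execute the exercises without a full XPath engine is Python's xml.etree.ElementTree. It supports expressions 1 and 5 directly, but not numeric comparisons, text() or the preceding-sibling:: axis, so expressions 3, 4 and 6 are emulated with ordinary Python code (an approximation, not XPath itself). The results are printed rather than listed here, so that you can check your own predictions first.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = (
    '<books>'
    '<book><title>Java How to Program</title>'
    '<translation edition="1">Spanish</translation>'
    '<translation edition="1">Chinese</translation>'
    '<translation edition="1">Japanese</translation>'
    '<translation edition="2">French</translation>'
    '<translation edition="2">Japanese</translation>'
    '<price>75</price></book>'
    '<book><title>C++ How to Program</title>'
    '<translation edition="1">Korean</translation>'
    '<translation edition="2">French</translation>'
    '<translation edition="2">Spanish</translation>'
    '<translation edition="3">Italian</translation>'
    '<translation edition="3">Japanese</translation>'
    '<price>65</price></book>'
    '</books>'
)
books = ET.fromstring(BOOKS_XML)        # context node: the <books> element

ex1 = books.findall("book[2]")          # 1./2. /books/book[2]
ex5 = books.findall("book[last()]")     # 5. /books/book[last()]

# 3./4. No numeric comparisons in ElementTree, so [price>70] is done in Python.
ex3 = [b for b in books.findall("book") if float(b.findtext("price")) > 70]
ex4 = [b.findtext("title") for b in ex3]

# 6. No preceding-sibling:: axis: test the translations of each book, then read
# that book's title (in books.xml the title precedes every translation).
ex6 = [b.findtext("title") for b in books.findall("book")
       if any(t.get("edition") == "1" and t.text == "Chinese"
              for t in b.findall("translation"))]


def titles(result):
    """Show element results by their title; strings are shown as-is."""
    return [r if isinstance(r, str) else r.findtext("title") for r in result]


for label, result in [("1/2", ex1), ("3", ex3), ("4", ex4), ("5", ex5), ("6", ex6)]:
    print(label, titles(result))
```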

Write some XPATH Expressions

• Which books have Japanese translations?
– Hint: use a predicate
– A predicate is a Boolean expression for filtering nodes from the search
– Compare the string value of the current node to the string ‘Japanese’

• Find the name of the textbook that has a 3rd edition and an Italian translation

• What translations of the C++ How to Program textbook are in the first and second editions?

Summary

• XPath is a language for specifying navigation through an XML document

• XPath models an XML document as a tree of nodes

• A location path has the following syntax:

Path ::= Step1/…/Stepn

where each Step is a triple (Axis, Node-test, Predicate):
– The axis specifies the direction to move in the document tree
– The node test selects nodes along the specified axis, and
– The predicates (if any) filter the nodes selected

• A location path can be either relative or absolute
– A relative location path is declared with regard to a context node, and its evaluation starts from this node
– An absolute location path starts with / and its evaluation starts from the document (root) node
