
XPath

By Laouina Marouane

Outline

• Introduction
• Data Model
• Expression
• Patterns
• Location Paths
• Example
• XPath 2.0
• Practice
• Conclusion

What is XPath?

• A scheme for locating documents and identifying sub-structures within them.
• A language designed to be used by both XSL Transformations (XSLT) and XPointer.
• Provides common syntax and semantics for functionality shared between XSLT and XPointer.
• Primary purpose: to address ‘parts’ of an XML document, and to provide basic facilities for manipulating strings, numbers and booleans.
• W3C Recommendation, November 16, 1999
• Latest version: http://www.w3.org/TR/xpath

Why XPath?

• Unique identifiers are not sufficient
  • Assigning a unique identifier to every element is a burden
  • The identity of an element may be unknown
  • Identifiers cannot handle ranges of text
  • It may be inconvenient to identify a large number of objects by listing their identifiers

Introduction

• XPath uses a compact, string-based syntax rather than an XML element-based syntax.
• Operates on the abstract, logical structure of an XML document (a tree of nodes) rather than its surface syntax.
• Uses a path notation (like URLs) to navigate through this hierarchical tree structure, from which it got its name.
• A subset of it can be used for matching, i.e. testing whether or not a node matches a pattern.
• Models an XML document as a tree of nodes of types such as element, attribute and text.
• Supports namespaces.
• The name of a node is a pair consisting of a local part and a namespace URI.
• Example of an XPath expression: /bib/book/publisher

Data Model

• Treats an XML document as a logical tree
• This tree consists of seven types of node:
  • Root node: the root of the document, not the document element
  • Element nodes: one for each element in the document
    • Unique IDs
  • Attribute nodes
  • Namespace nodes
  • Processing instruction nodes
  • Comment nodes
  • Text nodes
• The tree structure is ordered and reads from top to bottom and left to right

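Data Model

[Figure: tree diagram of the bib document. Labelled parts: the root, the root element (bib), a processing instruction and a comment. bib contains two book elements; one book is shown with publisher (Addison-Wesley) and author (Serge Abiteboul) children, and further children are elided.]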

Example

For this simple doc:

    <doc>
    <?Pub Caret?>
    <para>Some <em>emphasis</em> here. </para>
    <para>Some more stuff.</para>
    </doc>

Might be represented as:

    root
      <doc>
        <?Pub Caret?>
        <para>
          text
          <em>
            text
          text
        <para>
          text
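
The tree above can be reproduced with a short Python sketch. It assumes the standard-library `xml.dom.minidom` parser, which keeps processing instructions as nodes; the `outline` helper is a hypothetical name introduced here for illustration.

```python
from xml.dom import minidom

# The sample document from the slide, written without extra whitespace
# so that no whitespace-only text nodes appear in the tree.
DOC = (
    "<doc><?Pub Caret?>"
    "<para>Some <em>emphasis</em> here. </para>"
    "<para>Some more stuff.</para></doc>"
)


def outline(node, depth=0):
    """Return one indented label per node, in document order."""
    if node.nodeType == node.DOCUMENT_NODE:
        label = "root"
    elif node.nodeType == node.ELEMENT_NODE:
        label = f"<{node.tagName}>"
    elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        label = f"<?{node.target} {node.data}?>"
    else:
        # Only text nodes remain in this particular document.
        label = "text"
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(minidom.parseString(DOC))
print("\n".join(tree_lines))
```

Note that the root node is the parent of `<doc>`, which matches the distinction the data model draws between the root and the document element.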

Expressions

• A text string to select an element, attribute, processing instruction, or text
• The primary syntactic construct in XPath.
• An expression is evaluated to yield an object, which has one of the following four basic types:
  1. node-set (an unordered collection of nodes without duplicates)
  2. boolean (true or false)
  3. number (a floating-point number)
  4. string (a sequence of UCS characters)

Element Context

• The meaning of an element can depend upon its context

    <book><title>…</title></book>
    <person><title>…</title></person>

• Want to search for, e.g., the title of a book, not the title of a person
• XPath exploits the sequential and hierarchical context of XML to specify elements by their context (i.e. location in hierarchy)
  • title
  • book/title
  • person/title
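
As a minimal sketch of selecting by context, the following uses Python's standard-library `xml.etree.ElementTree`, which implements only a limited subset of XPath. The `library` wrapper element and the sample titles are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same element name in two contexts.
LIBRARY = """
<library>
  <book><title>Foundations of Databases</title></book>
  <person><title>Dr</title></person>
</library>
"""

root = ET.fromstring(LIBRARY)

# Any title at any depth below the context node.
all_titles = [t.text for t in root.findall(".//title")]

# Only titles whose parent is a book, or a person.
book_titles = [t.text for t in root.findall("book/title")]
person_titles = [t.text for t in root.findall("person/title")]

print(all_titles, book_titles, person_titles)
```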

Context

• Expression evaluation occurs with respect to a context.
• The context consists of:
  1. a node (the context node)
  2. a pair of positive integers (the context position and the context size)
  3. a set of variable bindings
  4. a function library
  5. the set of namespace declarations in scope for the expression

More on context types

• The context position is always less than or equal to the context size
• The variable bindings consist of a mapping from variable names to variable values
• The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result
• The namespace declarations consist of a mapping from prefixes to namespace URIs

Patterns

• A pattern is an expression used not to find objects, but to establish whether a specific object matches certain criteria
• Very important in the XSLT specification
• The '|' symbol is used to specify alternative patterns for matching
  • note|warning|/book/intro

Location Paths

• One important kind of expression is a location path (a special case of an expression)
• The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path
• Location paths can recursively contain expressions that are used to filter sets of nodes
• A location path (the most important construct) describes a path from one point to another.
  • Analogy: a set of street directions.
  • "Second store on the left after the third light"
• Two types of paths: relative and absolute
• Composed of a series of steps (one or more) and optional predicates

Relative Paths

• A relative location path consists of a sequence of one or more location steps separated by /
• Each step selects a set of nodes relative to a context node; each node in that set is used as a context node for the following step
• E.g. para will select children of the current node that are named 'para'

    <chapter>                //Current node
      <title>…</title>
      <para>…</para>         //Selected
      <note>
        <para>…</para>       //Not selected until note is the current node
      </note>
    </chapter>

• Verbose expression is child::para
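
A small sketch of the child step, assuming Python's standard-library `xml.etree.ElementTree` (a limited XPath subset) and made-up element text in place of the elided content:

```python
import xml.etree.ElementTree as ET

# The chapter element is the context (current) node.
chapter = ET.fromstring(
    "<chapter>"
    "<title>T</title>"
    "<para>first</para>"
    "<note><para>nested</para></note>"
    "</chapter>"
)

# 'para' means child::para, so only the direct child is selected.
children = [p.text for p in chapter.findall("para")]

# The nested para is selected only once note is the context node.
note = chapter.find("note")
from_note = [p.text for p in note.findall("para")]

print(children, from_note)
```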

Absolute Paths

• An absolute location path consists of / optionally followed by a relative location path
• A / by itself selects the root node of the document containing the context node

Location Steps

• A location step has three parts:
  1. an axis, which specifies the tree relationship between the nodes selected by the location step and the context node,
  2. a node test, which specifies the node type and expanded-name of the nodes selected by the location step, and
  3. zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.

Location Steps parts explained

• Axes
  • 13 axes defined in XPath
    • ancestor, ancestor-or-self
    • attribute
    • child
    • descendant, descendant-or-self
    • following
    • preceding
    • following-sibling, preceding-sibling
    • namespace
    • parent
    • self
• Node test
  • Identifies the type of node. Evaluates to true/false
  • Can be a name or a node-type test such as text() or node()
• Predicate
  • XPath boolean expressions in square brackets following the basis (axis and node test)

Location Steps in syntax

• The syntax for a location step is the axis name and node test separated by a double colon, followed by zero or more expressions each in square brackets.
• For example, in child::para[position()=1], child is the name of the axis, para is the node test and [position()=1] is a predicate

Abbreviated Syntax

• child:: can be omitted from a location step (child is the default axis)
  • div/para is equivalent to child::div/child::para
• attribute:: can be abbreviated to @
• // is short for /descendant-or-self::node()/
• A location step of . is short for self::node()
  • ex: .//para is short for self::node()/descendant-or-self::node()/child::para
• A location step of .. is short for parent::node()
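
The abbreviation rules can be sketched as a small string rewriter. This is only an illustration in Python: `expand` and `expand_step` are hypothetical helpers, and the sketch assumes simple paths whose predicates contain no '/' or '::'.

```python
def expand_step(step):
    """Rewrite one abbreviated location step into its verbose form."""
    if step == "":
        return ""                      # empty step before a leading '/'
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if "::" in step:
        return step                    # axis already written out
    if step.startswith("@"):
        return "attribute::" + step[1:]
    return "child::" + step            # child is the default axis


def expand(path):
    """Expand an abbreviated location path step by step."""
    path = path.replace("//", "/descendant-or-self::node()/")
    return "/".join(expand_step(step) for step in path.split("/"))


print(expand("div/para"))
print(expand(".//para"))
print(expand("../title"))
print(expand("feature/@type"))
```

A real XPath processor parses the expression into a grammar rather than rewriting text, so this is a reading aid for the rules above and not a parser.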

Wildcards

• Sometimes don't or can't know names
• Can use the wildcard '*' for any single element
  • book/intro/title and book/chapter/title are matched by book/*/title (but so is book/appendix/title)
• Verbose: child::*
• Multiple asterisks can match several levels
  • But must know the exact level, and that inappropriate matches won't be made
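
A minimal sketch of the wildcard, assuming Python's standard-library `xml.etree.ElementTree` and made-up title text. The `book` element is the context node, so the slide's `book/*/title` becomes `*/title` here.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<title>Book</title>"
    "<intro><title>Intro</title></intro>"
    "<chapter><title>Chapter</title></chapter>"
    "<appendix><title>Appendix</title></appendix>"
    "</book>"
)

# '*' stands for exactly one element level, whatever its name.
one_level = [t.text for t in book.findall("*/title")]

print(one_level)
```

The book's own title is not matched because it sits one level higher than the pattern requires, while the appendix title is matched whether or not that was intended.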

Descendants

• Rather than use a wildcard, recursively search through descendants
• chapter//para will go through the chapter hierarchy and select any para elements

    <chapter>                //Starting node
      <title>…</title>
      <para>…</para>         //Selected
      <note>
        <para>…</para>       //Selected
      </note>
    </chapter>

• Verbose: child::chapter/descendant-or-self::node()/child::para
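
A short sketch of the descendant search, assuming Python's standard-library `xml.etree.ElementTree` and made-up element text. Because the chapter element itself is the context node here, the path is written `.//para`.

```python
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<chapter>"
    "<title>T</title>"
    "<para>first</para>"
    "<note><para>nested</para></note>"
    "</chapter>"
)

# './/para' reaches para elements at any depth below the chapter.
all_paras = [p.text for p in chapter.findall(".//para")]

print(all_paras)
```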

Ancestors

• To signify the parent of the context element
  • '..'
  • parent::node()
• To find all 'title' elements that share the parent of the context node
  • ../title
  • parent::node()/child::title
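
A sketch of the parent step, assuming Python's standard-library `xml.etree.ElementTree`, which supports `..` only for nodes below the element the search starts from. The sample document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<chapter>"
    "<title>Chapter title</title>"
    "<note><title>Note title</title></note>"
    "</chapter>"
)

# Step to note, back up to its parent, then down to that parent's
# title children: the titles that are siblings of note.
sibling_titles = [t.text for t in chapter.findall("note/../title")]

print(sibling_titles)
```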

Other Relationships

• May move around siblings of the current context element
  • preceding-sibling::
  • following-sibling::

[Figure: tree diagram labelling the preceding-sibling::, following-sibling::, parent:: and child:: axes relative to the context element.]

Other Relationships (2)

• Can access all ancestors and descendants of the current context element
  • ancestor::
  • descendant::
• These axes don't select siblings

[Figure: tree diagram labelling the descendant:: and ancestor:: axes.]

Other Relationships (3)

• Can access all ancestors and descendants of the current context element, together with the context element itself
  • ancestor-or-self::
  • descendant-or-self::
• These axes don't select siblings

[Figure: tree diagram labelling the descendant-or-self:: and ancestor-or-self:: axes.]

Other Relationships (4)

• Can access all preceding and following completed nodes of the current context element (its ancestors and descendants are excluded)
  • preceding::
  • following::
• Can access attributes
  • attribute::

[Figure: tree diagram labelling the following::, preceding:: and attribute:: axes.]

Predicate Filters

• Location paths are indiscriminate
  • May get a list of items that are selected
• A predicate filter is used to filter the list
  • The filter is held between '[ ]'
• Simplest is the position() function predicate
  • exon[position() = 1]     //1st exon
  • intron[2]                //2nd intron
• Can combine tests with 'and' and 'or'
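
A sketch of positional predicates, assuming Python's standard-library `xml.etree.ElementTree`. It accepts the abbreviated forms `[1]` and `[last()]` but not the `position()` function itself, and the `id` attributes are made up so the results can be told apart.

```python
import xml.etree.ElementTree as ET

transcript = ET.fromstring(
    "<transcript>"
    "<exon id='e1'/><intron id='i1'/>"
    "<exon id='e2'/><intron id='i2'/>"
    "<exon id='e3'/>"
    "</transcript>"
)

# Positions count only the nodes selected by the step (same name here).
first_exon = [e.get("id") for e in transcript.findall("exon[1]")]
second_intron = [i.get("id") for i in transcript.findall("intron[2]")]
last_exon = [e.get("id") for e in transcript.findall("exon[last()]")]

print(first_exon, second_intron, last_exon)
```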

Position Tests

• The last() function
  • Locates the last sibling in the list
• The count() function
  • Evaluates the number of items in a list
  • child::transcript[count(child::intron) = 1]
• The id() function
  • Selects the element with the given unique identifier
  • id("ENS0001")

Attribute Tests

• Attributes can be selected
  • feature/@type
• Elements can be selected depending upon an attribute value
  • feature[@type="exon"]
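
A sketch of attribute tests, assuming Python's standard-library `xml.etree.ElementTree`. That module cannot return attribute nodes, so `feature/@type` is approximated by reading the attribute from each selected element; the `gene` wrapper and `name` attributes are made up.

```python
import xml.etree.ElementTree as ET

gene = ET.fromstring(
    "<gene>"
    "<feature type='exon' name='a'/>"
    "<feature type='intron' name='b'/>"
    "<feature type='exon' name='c'/>"
    "<feature name='d'/>"
    "</gene>"
)

# Approximates feature/@type: the type of every feature that has one.
types = [f.get("type") for f in gene.findall("feature[@type]")]

# feature[@type="exon"]: elements filtered by attribute value.
exon_names = [f.get("name") for f in gene.findall("feature[@type='exon']")]

print(types, exon_names)
```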

Functions

• Functions and node tests in XPath:
  • text() = node test that matches text nodes
  • node() = node test that matches any node (e.g. * or @* or text())
  • name() = function that returns the name of the current node

Booleans

• A boolean can only have two values: true or false
• The following operators yield a boolean:
  • or
  • and
  • =, !=
  • <=, <, >=, >

Example

• Operations perform boolean tests on conditions
  • exon[not(position() = 1)]
  • transcript[not(exon)]
  • intron[position() != last()]
  • exon[position() > 2]
  • exon[position() >= 3]
  • exon[position() = 1 or position() = last()]
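
The position tests above can be read as filters over (position, size) pairs. The following Python sketch only emulates that reading; `select` is a hypothetical helper and the exon labels are made up.

```python
def select(nodes, predicate):
    """Keep nodes whose 1-based position satisfies predicate(position, last)."""
    last = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if predicate(pos, last)]


exons = ["e1", "e2", "e3", "e4"]

# exon[not(position() = 1)]
not_first = select(exons, lambda pos, last: not (pos == 1))
# exon[position() != last()]
not_last = select(exons, lambda pos, last: pos != last)
# exon[position() > 2]
after_second = select(exons, lambda pos, last: pos > 2)
# exon[position() = 1 or position() = last()]
first_or_last = select(exons, lambda pos, last: pos == 1 or pos == last)

print(not_first, not_last, after_second, first_or_last)
```

Writing `or last()` without a comparison would be true for every node, since a non-zero number converts to true, which is why the last test compares position() on both sides.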

Numbers

• A number represents a floating-point number
• The numeric operators convert their operands to numbers
• Operators include:
  • +, -, *, div, mod
  • Since XML allows - in names, the - operator typically needs to be preceded by whitespace
  • Example: 5 mod 2 returns 1
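
XPath's mod returns the remainder of a truncating division, so its sign follows the left operand. A minimal Python sketch, assuming finite operands and a non-zero divisor (`xpath_mod` is a hypothetical helper name):

```python
import math


def xpath_mod(a, b):
    """Remainder of a truncating division, as XPath's mod defines it."""
    return math.fmod(a, b)


print(xpath_mod(5, 2))    # the slide's example: 5 mod 2
print(xpath_mod(-5, 2))   # sign follows the left operand
print(-5 % 2)             # Python's % follows the right operand instead
```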

Strings

• Strings consist of a sequence of zero or more characters
• A character is defined in the XML Recommendation

Example

• Strings can be tested for characters and substrings
  • <note>hello there</note>
    • note[contains(text(), "hello")]
  • <note><b>hello</b> there</note>
    • note[contains(., "hello")]
    • The '.' is the current node, and its string-value goes through the text of all its children
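
The difference between text() and '.' can be sketched in Python with the standard-library `xml.etree.ElementTree`. The helpers `first_child_text` and `string_value` are hypothetical names that emulate how XPath converts each argument to a string.

```python
import xml.etree.ElementTree as ET


def first_child_text(elem):
    """String form of text(): the first text node that is a direct child."""
    parts = [elem.text] + [child.tail for child in elem]
    texts = [p for p in parts if p]
    return texts[0] if texts else ""


def string_value(elem):
    """String form of '.': all descendant text, concatenated in order."""
    return "".join(elem.itertext())


plain = ET.fromstring("<note>hello there</note>")
nested = ET.fromstring("<note><b>hello</b> there</note>")

plain_by_text = "hello" in first_child_text(plain)
nested_by_text = "hello" in first_child_text(nested)
nested_by_dot = "hello" in string_value(nested)

print(plain_by_text, nested_by_text, nested_by_dot)
```

In the nested note, "hello" belongs to the b element, so only the test on '.' finds it.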

Example (2)

• starts-with(string, pattern)
  • note[starts-with(., "hello")]
• string(exp)
  • note[contains(., string(2))]
• substring-after(string, terminator)
• substring-before(string, terminator)
• substring(string, offset, length)

Example (3)

• normalize-space(string)
  • Removes leading and trailing whitespace and collapses internal runs of whitespace to a single space
• translate(string, source, replace)
  • translate(., ";+", ",")
• concat(strings)
• string-length(string)
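
For readers more familiar with Python, rough equivalents of some of these string functions are sketched below. The helper names are hypothetical, and the sketch assumes a non-empty terminator for the substring functions.

```python
import re


def substring_before(s, terminator):
    """Text before the first terminator, or '' when it does not occur."""
    head, sep, _ = s.partition(terminator)
    return head if sep else ""


def substring_after(s, terminator):
    """Text after the first terminator, or '' when it does not occur."""
    _, _, tail = s.partition(terminator)
    return tail


def normalize_space(s):
    """Trim and collapse XML whitespace (space, tab, CR, LF)."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(" \t\r\n")))


def translate(s, source, replace):
    """Map source[i] to replace[i]; delete characters with no counterpart."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:       # the first occurrence wins
            table[ord(ch)] = replace[i] if i < len(replace) else None
    return s.translate(table)


print(substring_before("hello there", " "))
print(substring_after("hello there", " "))
print(normalize_space("  hello   there \n"))
print(translate("a;b+c", ";+", ","))
```

In the translate call, ';' becomes ',' and '+' is removed because the replacement string is shorter than the source string.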

Core Function Library

• XPath defines a core set of functions and operators
• All implementations of XPath must implement the core function library
• Node Set Functions
  • list/item[position() mod 2 = 1] selects all odd-numbered elements of a list
  • id("foo")/child::para[position()=5] selects the 5th para child of the element with the unique ID foo
• String Functions
  • substring("12345", 0, 3) returns "12"
• Boolean Functions
  • boolean true() returns true
• Number Functions
  • number sum(node-set) returns the sum of the numeric values of the nodes
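
The substring result above follows from XPath counting characters from position 1 and rounding its numeric arguments. A Python sketch of that rule, assuming finite arguments (`xpath_round` and `xpath_substring` are hypothetical helper names):

```python
import math


def xpath_round(x):
    """Round to the nearest integer, with halves going up."""
    return math.floor(x + 0.5)


def xpath_substring(s, start, length=None):
    """Keep characters whose 1-based position p satisfies
    round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    if length is None:
        return "".join(ch for p, ch in enumerate(s, start=1) if p >= first)
    end = first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)


print(xpath_substring("12345", 0, 3))
print(xpath_substring("12345", 2, 3))
print(xpath_substring("12345", 2))
```

With a start of 0 and a length of 3, positions 0, 1 and 2 are requested, and only positions 1 and 2 exist in the string.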

Example for XPath Queries

    <bib>
      <book>
        <publisher> Addison-Wesley </publisher>
        <author> Serge Abiteboul </author>
        <author>
          <first-name> Rick </first-name>
          <last-name> Hull </last-name>
        </author>
        <author> Victor Vianu </author>
        <title> Foundations of Databases </title>
        <year> 1995 </year>
      </book>
      <book price="55">
        <publisher> Freeman </publisher>
        <author> Jeffrey D. Ullman </author>
        <title> Principles of Database and Knowledge Base Systems </title>
        <year> 1998 </year>
      </book>
    </bib>

Example summary

    bib                 matches a bib element
    *                   matches any element
    /                   matches the root node
    /bib                matches a bib element under root
    bib/paper           matches a paper in bib
    bib//paper          matches a paper in bib, at any depth
    //paper             matches a paper at any depth
    paper|book          matches a paper or a book
    @price              matches a price attribute
    bib/book/@price     matches price attribute in book, in bib
    bib/book[@price<"55"]/author/last-name     matches…

XPath 2.0

• Latest version: http://www.w3.org/TR/xpath20/
• W3C Working Draft 22 August 2003
• Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages

XPath 2.0 (2)

• XPath 2.0 is a much more powerful language that operates on a much larger domain of data types
• A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents
• Driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements.
• XPath 2.0 is a strict syntactic subset of XQuery 1.0

XPath 2.0 (3)

• XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.
• In addition, a number of functions and operators are provided for processing and constructing these different data types

XPath 2.0 (4)

• Everything is a sequence
• Sequences are ordered
• In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets.
• In XPath 2.0, the concept of the node-set has been generalized and extended.
• Sequences may contain simple-typed values as well as nodes
• The "for" expression enables iteration over sequences

XPath 2.0 (5)

• "for" expression:
  • sum(for $x in /order/item return $x/price * $x/quantity)
• Conditional expression:
  • if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
• Quantifiers:
  • some $x in /students/student/name satisfies $x = "Fred"
  • every $x in /students/student/name satisfies $x = "Fred"

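The XPath 2.0 "for", conditional and quantified expressions have close analogues in Python. The sketch below is only an analogy and not an XPath 2.0 evaluator: it uses the standard-library `xml.etree.ElementTree`, and the order, widget and student data are made up.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10</price><quantity>1</quantity></item>"
    "</order>"
)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(
    float(item.findtext("price")) * float(item.findtext("quantity"))
    for item in order.findall("item")
)

# if (...) then $widget1 else $widget2, with dictionaries as stand-ins
widget1 = {"name": "w1", "unit-cost": 3.0}
widget2 = {"name": "w2", "unit-cost": 2.0}
cheaper = widget1 if widget1["unit-cost"] < widget2["unit-cost"] else widget2

students = ET.fromstring(
    "<students>"
    "<student><name>Fred</name></student>"
    "<student><name>Ann</name></student>"
    "</students>"
)
names = [n.text for n in students.findall("student/name")]

# some $x in ... satisfies $x = "Fred"
some_fred = any(name == "Fred" for name in names)
# every $x in ... satisfies $x = "Fred"
every_fred = all(name == "Fred" for name in names)

print(total, cheaper["name"], some_fred, every_fred)
```
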
XPath 2.0 (6)

• Intersections, differences, unions:
  • The except operator selects all of a given node-set, except for certain nodes
    • @* except @exc:foo
  • The intersect operator
    • $x intersect /foo/bar

Some Practice

• Try XPath Visualizer.
• You can download it from: http://www.vbxml.com/downloads/files/xpathvisualiserseptember.zip
• It can help you with:
  • Learning and playing with XPath expressions.
  • Composing and visually verifying the exact XPath expression when designing an XSLT stylesheet.
  • Obtaining the quantitative characteristics of an XML document: counts, sums, arithmetical and relational results, strings, substrings, etc.

Conclusion

• XPath provides a concise and intuitive way to address into XML documents
• Standard part of the XSLT and XPointer specifications
• Implementing XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library

References

• http://www.w3.org/TR/xpath
• http://www.w3.org/TR/xpath20/
• http://www.vbxml.com/xpathvisualizer/default.asp
• http://www.xml.com/pub/a/2002/03/20/xpath2.html
• XML in a Nutshell