
Querying XML

Sameer S. Pradhan

The Problem (DBMS Vs Docs)

3-level hierarchy: table, record and field
Order is not part of the information
Strings in separate fields are separate
Location of data is not generally significant
Linking is far more often part of the data, not part of the schema representing data

Goals

Data Model Based on XML Infoset

Query Operators

Query Language

Usage Scenarios

Human readable documents
Data-oriented documents
Mixed-model documents
Administrative data
Filtering streams
Multiple syntactic environments

General Requirements

Syntax Binding: MAY have more than one syntax binding
Declarativity: MUST be declarative
Protocol Independence: MUST be defined independently of any protocols
Error Conditions

XML Query Functionality (1)

Quantifiers: MUST include support for both universal and existential quantifiers
Hierarchy and Sequence: MUST support operations on hierarchy and sequence of document structures
Aggregation: MUST allow computing summary information


Combination: MUST be able to combine information from multiple documents or from different parts of the same document
Sorting: MUST be able to sort query results
Structural Preservation: MUST preserve structure of original document


Structural Transformation: MUST be able to transform and create new structures
References: MUST be able to traverse intra- and inter-document references
Text and Element Boundaries: MUST handle text across element boundaries


Operation on Schemas: MUST be able to access Schemas or DTDs
Extensibility: SHOULD support the use of externally defined functions
Operation on Names: MUST perform simple operations on names; MAY perform more powerful operations
Closure: MUST be closed with respect to the XML Query data model
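A few of the functional requirements above (quantifiers, aggregation, sorting) can be illustrated with a minimal Python sketch. This is not an XML query language; it only shows what each operation means. The `<orders>` document and the helper `prices` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-document, invented for this illustration.
doc = ET.fromstring(
    "<orders>"
    "<order id='a'><item price='5'/><item price='7'/></order>"
    "<order id='b'><item price='20'/></order>"
    "</orders>"
)

def prices(order):
    """Return the item prices of one order as floats."""
    return [float(item.get("price")) for item in order.findall("item")]

# Existential quantifier: some item in the order costs more than 10.
some_expensive = [o.get("id") for o in doc.findall("order")
                  if any(p > 10 for p in prices(o))]

# Universal quantifier: every item in the order costs less than 10.
all_cheap = [o.get("id") for o in doc.findall("order")
             if all(p < 10 for p in prices(o))]

# Aggregation: summary information (total price per order).
totals = {o.get("id"): sum(prices(o)) for o in doc.findall("order")}

# Sorting: order ids by descending total.
by_total = sorted(totals, key=totals.get, reverse=True)
```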

XML Query Data Model (1)

Datatypes: MUST represent XML 1.0 data as well as simple and complex types of XML Schema
References: MUST include support for references, both internal and external
Schema Availability: MUST support queries even in the absence of a schema

XML Query Data Model (2)

Trees
  Node-labeled
  Edge-labeled

The XML Query data model is a node-labeled, tree-constructor representation

Node functions
  Constructors
  Accessors

Node Accessors

A node has eight accessors: isDocNode, isElemNode, isValueNode, isAttrNode, isNSNode, isPINode, isCommentNode, isInfoItemNode

Value Constructors

Fourteen primitive XML Schema datatypes: stringValue, boolValue, floatValue, doubleValue, decimalValue, timeDurValue, recurDurValue, binaryValue, urirefValue, idValue, idrefValue, qnameValue, entityValue, notationValue

Note: ValueNode replaces XPath’s TextNode

Example

<?xml version="1.0"?>
<p:part xmlns:p="http://www.mywebsite.com/PartSchema"
        xsi:schemaLocation="http://www.mywebsite.com/PartSchema
                            http://www.mywebsite.com/PartSchema"
        name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>

Data-Model (1)

children(D1) = [ Ref(E1) ]
root(D1) = Ref(E1)
name(E1) = QNameValue("http://www.mywebsite.com/PartSchema", "part", Ref(Def_QName))
children(E1) = [ Ref(E2), Ref(E3) ]
attributes(E1) = { Ref(A1) }
namespaces(E1) = { Ref(N1) }
type(E1) = Ref(Def_part_type)
parent(E1) = Ref(D1)
name(A1) = QNameValue(null, "name", Ref(Def_QName))
value(A1) = Ref(StringValue("nutbolt", Ref(Def_string)))

Data-Model (2)

parent(A1) = Ref(E1)
prefix(N1) = Ref(StringValue("p", Ref(Def_string)))
uri(N1) = URIRefValue("http://www.mywebsite.com/PartSchema", Ref(Def_uriReference))
parent(N1) = Ref(E1)
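As a rough analogy only, some of the accessors listed above can be mimicked with Python's standard `xml.etree.ElementTree`. This is a sketch, not the XML Query data model: ElementTree has no parent accessor and no namespace nodes. The sketch also leaves out the `xsi:schemaLocation` attribute, because the example does not declare the `xsi` prefix and a namespace-aware parser would reject it. The names `PART_XML`, `E1`, `children` and `attributes` are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# The example document without xsi:schemaLocation (assumption: dropped here
# because the xsi prefix is not declared in the example).
PART_XML = (
    '<?xml version="1.0"?>'
    '<p:part xmlns:p="http://www.mywebsite.com/PartSchema" name="nutbolt">'
    '<mfg>Acme</mfg><price>10.50</price>'
    '</p:part>'
)

E1 = ET.fromstring(PART_XML)

# name(E1): ElementTree stores a qualified name as "{namespace-uri}local-name".
uri, local = E1.tag[1:].split("}")

# children(E1) and attributes(E1), loosely mirroring the accessors above.
children = [child.tag for child in E1]
attributes = dict(E1.attrib)
```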

Constraints on Data Model

Node References: defined by the query system, NOT by the query language
Node Identity: the function ref is one-to-one and onto; ref_equal(ref(n1), ref(n2)) if and only if equal(n1, n2)
Unique parent
Duplicate-free list of children

XQL

XQL: XML Query Language
The name was an ad hoc selection, but it seems to have stuck and is likely to survive for quite some time

XQL Design (1)

Compact, easy to type and read
Simple for common cases
Embeddable in programs, scripts, URLs
Unique identification of each node
Declarative, NOT procedural
Evaluation at any level in the document
Results in document order; no repeated nodes

XQL Design (2)

Superset of XSL
Closure is guaranteed ONLY if the implementation returns well-formed XML documents

XQL: Syntax (1)

Mimics the URI navigation syntax
Notation
  /   : Root context
  ./  : Current context
  //  : Recursive descent from root
  .// : Recursive descent from current node
  @   : Attribute
  *   : Any element

Sample Document

<?xml version='1.0'?>
<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50' xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>

XQL: Examples (1)

./author
author
/bookstore
//author
.//author
book[bookstore/@specialty = @style]
author/first-name
author/*
bookstore//title
bookstore/*/title
*[@specialty]
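Several of these path expressions have close counterparts in the limited XPath subset of Python's `xml.etree.ElementTree`. The sketch below runs them against the sample document; it only approximates XQL (for example, an unprefixed name in ElementTree matches only elements without a namespace). The `wrapper` element is a device introduced here so that the root element can itself be matched.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50'
           xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)   # root is the <bookstore> element

# XQL  //author           -> every unprefixed <author> in the document
authors = root.findall(".//author")

# XQL  bookstore//title   -> unprefixed <title> descendants of the bookstore
titles = [t.text for t in root.findall(".//title")]

# XQL  author/first-name  -> evaluated here with each <author> as context
first_names = [f.text for a in authors for f in a.findall("first-name")]

# XQL  author/*           -> all element children of <author>
author_children = [c.tag for a in authors for c in a.findall("*")]

# XQL  *[@specialty]      -> elements carrying a specialty attribute;
# the context is a wrapper above the root so the root itself can match.
wrapper = ET.Element("wrapper")
wrapper.append(root)
with_specialty = [e.tag for e in wrapper.findall("*[@specialty]")]
```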

XQL: Examples (2)

book[@style]
book/@style
book[excerpt]/author[degree]
book[excerpt][title]
book[excerpt $and$ title]
author[name = …]
author[name $eq$ …]
author[. = 'Bob']
author[text() = 'Bob']
author[first-name!text() = 'Bob']
degree[index() $lt$ 3]
degree[index() < 3]

XQL: Examples (3)

<x> <y/> <y/> </x>
<x> <y/> <y/> </x>

x/y[index() = 0]
x/y[0]
(x/y)[0]
x[0]/y[0]
book[end()]
author[first-name][2]
price[@intl!value() = 'canada']
my:*
*:book
book/@my:style

XQL: Examples (4)

author[publications!count() > 10]
books[pub_date < date('1995-01-01')]
books[pub_date < date(@first)]
bookstore/(book | magazine)
//comment()[1]
ancestor(book/author)
author[0, 2 $to$ 4, -1]

XML-QL

SQL-like
Features of query languages for semi-structured data
Supports joins and aggregates

XML-QL: Sample Document

<bib>
  <book year="1995">
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>

XML-QL: Flattening Query (1)

WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t</title>
        <author> $a</author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a

Note: Flattening is not possible with XQL

XML-QL: Result (1)

<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to Database Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
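The flattened result above has one entry per (author, title) binding, so a book with two authors appears twice. A minimal Python sketch of that behaviour follows, assuming the sample document with surrounding whitespace trimmed; the function name `flatten` is introduced here for illustration and is not part of XML-QL.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""

def flatten(bib, publisher):
    """Return one (author lastname, title) pair per matching binding."""
    pairs = []
    for book in bib.findall("book"):
        if book.findtext("publisher/name") != publisher:
            continue
        title = book.findtext("title")
        for author in book.findall("author"):   # one binding of $a per author
            pairs.append((author.findtext("lastname"), title))
    return pairs

results = flatten(ET.fromstring(BIB), "Addison-Wesley")
```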

XML-QL: Nested Queries (2)

WHERE <book> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </>,
      <publisher><name>Addison-Wesley</></> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>

XML-QL: CONTENT_AS

WHERE <book>
        <title> $t </>
        <publisher><name>Addison-Wesley </> </>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>

XML-QL: Result (2)

<result>
  <title> An Introduction to Database Systems </title>
  <author> <lastname> Date </lastname> </author>
</result>
<result>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
  <author> <lastname> Date </lastname> </author>
  <author> <lastname> Darwen </lastname> </author>
</result>
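In contrast to the flattened form, the nested query yields one result per book, with all of that book's authors inside it. A hedged Python sketch of this shape, again over a whitespace-trimmed copy of the sample document; `nested` is a name chosen here, not an XML-QL construct.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""

def nested(bib, publisher):
    """Build one <result> per matching book, holding its title and all authors."""
    out = []
    for book in bib.findall("book"):
        if book.findtext("publisher/name") != publisher:
            continue
        result = ET.Element("result")
        ET.SubElement(result, "title").text = book.findtext("title")
        result.extend(book.findall("author"))   # inner query: authors of this book
        out.append(result)
    return out

results = nested(ET.fromstring(BIB), "Addison-Wesley")
author_counts = [len(r.findall("author")) for r in results]
```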

XML-QL: Query (3)

WHERE <article>
        <author>
          <firstname> $f </>   // firstname $f
          <lastname> $l </>    // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </>   // join on same firstname $f
          <lastname> $l </>    // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>

XML-QL: ELEMENT_AS

WHERE <article>
        <author>
          <firstname> $f </>   // firstname $f
          <lastname> $l </>    // lastname $l
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml"
      ...
CONSTRUCT $e

XML-QL: Tag Variables

WHERE <$p>
        <title> $t </title>
        <year>1995</>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>

Note: XQL does not support tag variables
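Tag variables bind to element names rather than to content. The idea can be sketched in Python by iterating over elements without fixing their tags and then testing the tag against a set. The data below is invented for illustration (the deck does not show a matching document), and `smith_1995` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, invented for this illustration.
BIB = """<bib>
  <book><title>A</title><year>1995</year><author>Smith</author></book>
  <article><title>B</title><year>1995</year><editor>Smith</editor></article>
  <book><title>C</title><year>1996</year><author>Smith</author></book>
  <book><title>D</title><year>1995</year><author>Jones</author></book>
</bib>"""

def smith_1995(bib):
    """Return (publication tag, title, role tag) for 1995 items by Smith."""
    matches = []
    for pub in bib:                          # $p: any element tag
        if pub.findtext("year") != "1995":
            continue
        for role in pub:                     # $e: restricted to {author, editor}
            if role.tag in {"author", "editor"} and role.text == "Smith":
                matches.append((pub.tag, pub.findtext("title"), role.tag))
    return matches

results = smith_1995(ET.fromstring(BIB))
```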

XML-QL: Regular Expressions

<!ELEMENT part (name brand part*)>
<!ELEMENT name CDATA>
<!ELEMENT brand CDATA>

WHERE <part*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>

WHERE <$*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>

WHERE <part+.(subpart|component.piece)>$r</> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r</>

Note: XQL does not support regular expressions
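The part* path expression follows chains of nested part elements to any depth. One rough way to picture it is a recursive walk that descends only through `<part>` children, as in the Python sketch below. The document is invented to fit the DTD shown above, and the traversal is an approximation of the path semantics rather than an XML-QL evaluator.

```python
import xml.etree.ElementTree as ET

# Hypothetical document conforming to the part/name/brand DTD above.
PARTS = """<part>
  <name>engine</name><brand>Ford</brand>
  <part><name>piston</name><brand>Ford</brand></part>
  <part><name>belt</name><brand>Acme</brand>
    <part><name>clip</name><brand>Ford</brand></part>
  </part>
</part>"""

def ford_names(part):
    """Collect names of Ford parts reachable through nested <part> elements."""
    names = []
    if part.findtext("brand") == "Ford":
        names.append(part.findtext("name"))
    for sub in part.findall("part"):         # follow one more "part" edge
        names.extend(ford_names(sub))
    return names

result = ford_names(ET.fromstring(PARTS))
```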

XML-QL: Joins

WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $i
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $i </>
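The join works because the same variable, $ssn, appears in both patterns. A minimal Python sketch of the equivalent value join across two documents follows; the two inline documents stand in for the remote files and are invented, and the dictionary index is just one possible join strategy.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two remote documents.
PEOPLE = ("<data>"
          "<person><name>Ann</name><ssn>111</ssn></person>"
          "<person><name>Bo</name><ssn>222</ssn></person>"
          "</data>")
TAXPAYERS = ("<taxpayers>"
             "<taxpayer><ssn>222</ssn><income>50000</income></taxpayer>"
             "<taxpayer><ssn>333</ssn><income>70000</income></taxpayer>"
             "</taxpayers>")

def join_on_ssn(people, taxpayers):
    """Pair each person's <name> with the <income> of taxpayers sharing the ssn."""
    by_ssn = {}                               # index taxpayers by ssn
    for t in taxpayers.findall("taxpayer"):
        by_ssn.setdefault(t.findtext("ssn"), []).append(t)
    results = []
    for p in people.findall("person"):
        for t in by_ssn.get(p.findtext("ssn"), []):
            result = ET.Element("result")
            result.append(p.find("name"))     # $n: the whole <name> element
            result.append(t.find("income"))   # $i: the whole <income> element
            results.append(result)
    return results

results = join_on_ssn(ET.fromstring(PEOPLE), ET.fromstring(TAXPAYERS))
```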

XML-QL: Ordering

WHERE <pub> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </> IN $p,
      <year> $y </> IN $p,
      <month> $z </> IN $p
ORDER-BY $y, $z
CONSTRUCT $t

Note: XQL does not support ordering
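The effect of ORDER-BY $y, $z can be sketched in Python as a sort on a (year, month) key. The publications below are invented, and comparing year and month as integers is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical publications, invented for this illustration.
PUBS = """<bib>
  <pub><title>C</title><year>1999</year><month>3</month></pub>
  <pub><title>A</title><year>1998</year><month>11</month></pub>
  <pub><title>B</title><year>1999</year><month>1</month></pub>
</bib>"""

def titles_in_order(bib):
    """Return titles sorted by year, then by month (both compared as integers)."""
    rows = [(int(p.findtext("year")), int(p.findtext("month")), p.findtext("title"))
            for p in bib.findall("pub")]
    rows.sort(key=lambda row: row[:2])        # ORDER-BY $y, $z
    return [title for _, _, title in rows]

ordered = titles_in_order(ET.fromstring(PUBS))
```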

XML-QL: Grouping

CONSTRUCT <results> {
  WHERE <bib><book>
          <title>$t</title>
          <author><last>$l</last><first>$f</first></author>
        </book></bib> IN "www.bn.com/bib.xml"
  CONSTRUCT <result ID=author($l,$f)>
              <title>$t</title>
              <author><last>$l</last><first>$f</first></author>
            </result>
} </results>

Note: Explicit grouping is not possible with XQL
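In the grouping query, ID=author($l,$f) gives every result built for the same author the same identifier, so their titles end up together. A dictionary keyed by (last, first) plays a similar role in the Python sketch below. The data is invented, and the dictionary is only an analogy for the identifier function.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in the <bib>/<book>/<author> shape used by the query.
BIB = """<bib>
  <book><title>T1</title>
    <author><last>Date</last><first>Chris</first></author></book>
  <book><title>T2</title>
    <author><last>Date</last><first>Chris</first></author>
    <author><last>Darwen</last><first>Hugh</first></author></book>
</bib>"""

def group_by_author(bib):
    """Map each (last, first) author key to the titles that author wrote."""
    groups = {}                                # key plays the role of author($l,$f)
    for book in bib.findall("book"):
        for author in book.findall("author"):
            key = (author.findtext("last"), author.findtext("first"))
            groups.setdefault(key, []).append(book.findtext("title"))
    return groups

groups = group_by_author(ET.fromstring(BIB))
```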

XML-QL: Functions

FUNCTION findDeclaredIncomes($Taxpayers, $Employees)
  WHERE <taxpayer> <ssn> $s </> <income> $x </> </> IN $Taxpayers,
        <employee> <ssn> $s </> <name> $n </> </> IN $Employees
  CONSTRUCT <result> <name> $n </> <Income> $x </> </>
END

findDeclaredIncomes("www.irs.gov/taxpayers.xml", "www.a.b.c/employees.xml")

XQuery

Builds directly on XPointer
Special type for the results
Ability to return ranges (spans)

XQuery: Syntax

? : Selects element with given id
^ : Selects among containers of current node
< : Preceding sibling
> : Following sibling
« : All preceding nodes
» : All following nodes
@ : Attribute
$ : Selects a range by matching a string

XQuery: Queries

descendant(FOOTNOTE & TYPE='CITATION').(REF)
descendant(SEC & descendant(LEVEL = 'SECRET'))
descendant(FOOTNOTE & TYPE='CITATION').(REF){1-2}.link(role=AUTHOR)
descendant(FOOTNOTE & (child(AUTHOR).attr(TYPE) = *(ancestor(CHAPTER).attr(AUTHOR))))
union(id(foo), id(bar), descendant(SEC))
intersection(descendant(ITEM & string('dog')), descendant(ITEM & string('cat')))
difference(fsibling(div), ID(SECRET))
^TI P* [^UI OL DL] {1,3} SUMMARY $

Other Query Languages

Lorel (Lightweight Object REpository Language)
YATL
Xtract
Xmlquery
XML Query Engine

And...

QUILT

The problem with most query languages is that they are either document oriented or database oriented

QUILT is derived from both domains and promises substantial coverage of both areas

It has a FLWR (pronounced as ‘flower’) construct
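FLWR stands for FOR, LET, WHERE, RETURN. As a loose analogy only (this is Python, not Quilt syntax), the four clauses map onto an ordinary loop as sketched below; the document and the price threshold are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
BIB = ET.fromstring(
    "<bib>"
    "<book year='1995'><title>A</title><price>30</price></book>"
    "<book year='1998'><title>B</title><price>55</price></book>"
    "</bib>"
)

results = []
for book in BIB.findall("book"):                 # FOR: iterate over bindings
    price = float(book.findtext("price"))        # LET: bind a derived value
    if price > 40:                               # WHERE: filter the bindings
        results.append(book.findtext("title"))   # RETURN: build the output
```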

