ICDE’2001, Heidelberg, Germany. Daniela Florescu (Propel), Jérôme Siméon (Bell Laboratories)

XML Data: From Research to Standards


Page 1: XML Data: From Research to Standards

XML Data: From Research to Standards

Daniela Florescu, Propel

Jérôme Siméon, Bell Laboratories

Page 2: XML Data: From Research to Standards

Data and the Web: A bit of history

• Research:
> 1950’s: Lisp [McCarthy]
> 1960’s: Tree languages [Büchi]
> 1970’s: Relational DBs [Codd]
> 1990: Graphlog [Univ. Toronto]
> 1994: O2 extensions [INRIA]
> 1995: Tsimmis & OEM [Stanford]
> 1995: UnQL [UPenn]
Need to handle irregular Web data. Use graph data models.

• Internet industry:
> 1957: Sputnik launches ARPA
> 1972: First demonstration of ARPANET
> 1989: Number of hosts breaks 100,000
> 1991: CERN releases the World Wide Web; HTML as the support for information
> 1997: 20 million hosts, 1 million Web sites
> 1998: W3C releases XML to represent information on the Web
XML provides a syntax for irregular textual Web information.

Page 3: XML Data: From Research to Standards

The secret of HTML success

• Everybody can write it:
> HTML is simple
> HTML is textual: it is human readable, you can use any editor, ...

• Everybody can read it:
> HTML is portable on any platform
> The browser is the universal application

• It connects pieces of information together:
> Through hypertext links

Page 4: XML Data: From Research to Standards

But new applications = new needs

• Infomediaries:
– Search engines
– Web portals
– Digital libraries
– Virtual enterprises

• Electronic services:
– On-line catalogs and procurement
– Comparison shoppers
– Market places

• Scientific applications
• Manufacturing engineering
etc.

More than HTML: data on the Web

More than the browser: applications on the Web

Page 5: XML Data: From Research to Standards

The Secret of XML Popularity

It looks like HTML...
> Simple, familiar, easy to learn, human-readable
> Universal and portable
> Supported by the W3C: trusted and quickly adopted by the industry

…but it’s more than HTML!
> Flexible: you can represent any information
> Extensible: you can represent it the way you want!

<book>
  <title> Foundations… </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <publisher> Addison Wesley </publisher>
  <year> 1995 </year>
</book>
…

Page 6: XML Data: From Research to Standards

XML Is Only the Beginning...

• How do you build applications?
> There is an urgent need for XML tools

• Designing XML tools is a data management problem:
> XML 1.0 to describe structured documents
~ Syntax for trees
> XML data models to describe the information content
~ Data model for trees
> XML schemas to describe the structure of information
~ Data definition language for trees
> XML languages to describe information processing
~ Data manipulation language for trees

Page 7: XML Data: From Research to Standards

About the Tutorial

• XML through database glasses
• Contains:
> Up-to-date information about standards
> Relationship with research
> Convergences and divergences

• Divided into 4 parts:
1. Introduction to XML 1.0
2. Data models
3. Schema languages
4. Query languages

Please, please, please, ask questions!

Page 8: XML Data: From Research to Standards

Part I: XML 1.0

Page 9: XML Data: From Research to Standards

About the W3C

• Membership organization

• Different types of groups inside the W3C:
– Working groups
– Interest groups
– Coordination groups

• Status of W3C documents:
– Note
– Working draft
– Last Call
– Candidate/proposed recommendation
– Recommendation ~ Standard

Page 10: XML Data: From Research to Standards

XML activities inside W3C

• Core XML
> eXtensible Markup Language (XML 1.0), namespaces, Infoset

• XML Linking
> XML Pointer Language (XPointer), XML Linking Language (XLink)

• XML Schema

• XML Query
> XML Data Model, Algebra and Query Language

• Document Object Model

• XSL
> XPath
> XSLT/XSL: Transformation and stylesheet language

Page 11: XML Data: From Research to Standards

XML 1.0: Well-formed documents

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <ref isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>

• An XML document is composed of:
> markup: elements, attributes
> text: #PCDATA, CDATA

• Well-formed document:
> verifies XML lexical conventions
> contains properly nested elements with a single root element
> can contain empty elements, mixed text and elements
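The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the two broken inputs are illustrative assumptions, not part of the tutorial.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested, single root, one empty element, mixed text and elements.
good = (
    '<book year="1967"><title>The politics of experience</title>'
    '<ref isbn="1341-1444-555"/>'
    '<section>Some text <title>Persons and experience</title> more text</section>'
    '</book>'
)
# Improper nesting: <book> is closed before <title>.
bad_nesting = '<book><title>The politics of experience</book></title>'
# Two root elements instead of one.
two_roots = '<book/><book/>'

print(is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots))
```

Only the first document is accepted; the parser rejects the other two without needing any DTD or schema.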

Page 12: XML Data: From Research to Standards

XML 1.0: Valid documents

<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, author*, publisher?, section+)>
  <!ATTLIST book year CDATA #IMPLIED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT section (#PCDATA | title | section)*>
]>
...

• A valid XML document verifies a Document Type Definition (DTD):
> grammar for the document
> constraints on the structure of elements, attributes, entities, notations...
> a DTD is optional

(We will see more about DTDs in the schema part of the tutorial)

Page 13: XML Data: From Research to Standards

Some additional features

• General entities &myentity;
> Declared as part of XML 1.0 or in a DTD
> Used to escape characters, as macros for pieces of documents
  &amp; = &
> An XML document contains Unicode characters
  &#60; = &lt; = <

• Parameter entities %myentity;
> Declared in a DTD, used as macros for pieces of DTDs

<!ENTITY % macro "publisher (#PCDATA)">
…
<!ELEMENT %macro;>
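As a quick check of the general-entity rules, here is a small sketch with Python's standard `xml.etree.ElementTree`. The document strings and the entity name `pub` are made up for illustration; parameter entities are not exercised here.

```python
import xml.etree.ElementTree as ET

# Predefined entities and a character reference all resolve to plain characters.
escaped = "<title>War &amp; Peace: 1 &#60; 2 and 1 &lt; 2</title>"
escaped_text = ET.fromstring(escaped).text

# A general entity declared in the internal DTD subset acts as a macro.
with_macro = (
    '<!DOCTYPE publisher [<!ENTITY pub "Addison Wesley">]>'
    "<publisher>&pub;</publisher>"
)
macro_text = ET.fromstring(with_macro).text

print(escaped_text)
print(macro_text)
```

After parsing, the application sees only the expanded characters; the entity references belong to the serialization syntax.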

Page 14: XML Data: From Research to Standards

Even more additional features

• Namespaces mynames:name
> a set of names identified by a URI
> tags and attribute names become qualified names (QName)

• Processing instructions
> to embed processing in a document (e.g. Java applet in HTML)

• Comments

<myns:section xmlns:myns="http://caravel.inria.fr/mySchema">
  <myns:title> Persons and experience</myns:title>
</myns:section>

<!-- This is a comment -->
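A short sketch of what "qualified name" means in practice, using Python's standard `xml.etree.ElementTree`. The second document and the prefixes `x` and `s` are assumptions added to show that the prefix itself carries no meaning.

```python
import xml.etree.ElementTree as ET

doc = (
    '<myns:section xmlns:myns="http://caravel.inria.fr/mySchema">'
    "<myns:title> Persons and experience</myns:title>"
    "</myns:section>"
)
root = ET.fromstring(doc)

# The parser reports the qualified name as {namespace URI}local-name.
print(root.tag)

# A different prefix bound to the same URI yields the same qualified name.
other = ET.fromstring('<x:section xmlns:x="http://caravel.inria.fr/mySchema"/>')
print(root.tag == other.tag)

# Lookups also go through the URI, using whatever prefix the caller picks.
ns = {"s": "http://caravel.inria.fr/mySchema"}
title = root.find("s:title", ns)
print(title.text.strip())
```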

Page 15: XML Data: From Research to Standards

Part II: Data Model

Page 16: XML Data: From Research to Standards

Why a data model for XML?

• As a support for physical/logical independence
> XML can be stored in files, a native XML repository, a relational database
> XML can be virtual, as a view of a repository, integrated sources
> XML can be in memory, using data structures in C, C++, Java, etc.
> XML can be streamed between processes

• To describe information content of XML documents
> to agree and reason about information content, preservation

• To define semantics of operations:
> equality, etc.

For old & well-known (but good!) reasons

Page 17: XML Data: From Research to Standards

But XML has specifics

• Serialization syntax

• Some information exists only after schema validation
> price is not a string but a decimal value
> refs is not a string but a list of references

• One more motivation for a data model:
To isolate the user from syntactic details of XML

<xsd:attribute name="price" type="xsd:decimal"/>
<xsd:attribute name="bookid" type="xsd:ID"/>
<xsd:attribute name="refs" type="xsd:IDREFS"/>

<book bookid="b1" price="10.50">
  <title>War &amp; Peace</title>
  <author>Tolstoi</author>
  <biblio refs="b1 b2 b3"/>
</book>
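The difference between the parsed document and its schema-validated form can be sketched as follows. This is not a schema validator: the `ATTRIBUTE_TYPES` table is a hypothetical stand-in for the three attribute declarations, and `typed_attributes` is an illustrative helper.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical stand-in for the xsd:attribute declarations:
# attribute name -> function turning the lexical form into a typed value.
ATTRIBUTE_TYPES = {
    "price": Decimal,    # xsd:decimal
    "bookid": str,       # xsd:ID, kept as a string identifier
    "refs": str.split,   # xsd:IDREFS, a whitespace-separated list
}


def typed_attributes(elem):
    """Return the element's attributes with schema-driven typed values."""
    return {
        name: ATTRIBUTE_TYPES.get(name, str)(value)
        for name, value in elem.attrib.items()
    }


doc = ET.fromstring(
    '<book bookid="b1" price="10.50">'
    "<title>War &amp; Peace</title><author>Tolstoi</author>"
    '<biblio refs="b1 b2 b3"/></book>'
)
book = typed_attributes(doc)
biblio = typed_attributes(doc.find("biblio"))

print(doc.attrib["price"], book["price"], biblio["refs"])
```

Before "validation" the price is the string "10.50"; afterwards it is a decimal that compares equal to 10.5, and refs is a list of three references.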

Page 18: XML Data: From Research to Standards

Existing data models

• Graph and tree models used in research

• Document Object Model (DOM)
> status: recommendation
> programmatic interface for XML (with an object-oriented flavor)

• XML Information Set (Infoset)
> describes the information content exported by XML processors
> can be generated after parsing or after validation

• XML languages’ data models:
> required for language semantics
> XPath: the recommendation has its own data model
> XML Query Data Model: working draft

Page 19: XML Data: From Research to Standards

Semistructured model

• Graph based, unordered, edge-labeled (here OEM)
> But XML is ordered, tree based
> Node-labeled seems more natural (e.g., like in DOM)

[Figure: OEM graph with object identifiers &b0, &b1, &b2, &b3; edge labels Bib, book, title, author, price, publisher, biblio, references, refs; leaf values “War & Peace”, “Tolstoi”, 10.50.]

Page 20: XML Data: From Research to Standards

Ordered model

• Node-labeled, ordered trees, with references (YAT)
> But what about attributes (unordered!), namespaces, processing instructions, etc.?

[Figure: YAT tree with root b0: bib and book nodes b1:, b2:, b3:; child labels title, author, price, biblio, refs; leaf values “War & Peace”, “Tolstoi”, 10.50; refs holding &b1 &b2 &b3.]

Page 21: XML Data: From Research to Standards

XML Infoset

• Specifies a description of information in a well-formed XML document

• Abstract way to think about XML data

• Other processors (e.g. XML Schema) can contribute information

Here is an example in a made-up syntax:

b1 = Element [ local name = “book”;
               children = [ Element [ local name = “title” ... ];
                            Element [ local name = “author” ... ]; ... ]
               attributes = [ Attribute [ local name = “price”;
                                          children = [ Character [ code = ‘1’ ];
                                                       Character [ code = ‘0’ ];
                                                       Character [ code = ‘.’ ];
                                                       Character [ code = ‘5’ ];
                                                       Character [ code = ‘0’ ] ];
                                          attribute type = “xsd:decimal” ] ... ] ]

Page 22: XML Data: From Research to Standards

XML Query Data Model

• A node-labeled, tree model with references
> Very close to XPath data model

• Generated after validation
> also provides pointers to schema information

• Uses a functional notation
> no explicit data structure

• Defines a mapping from post-schema validated Infoset to XML Query Data Model
> preserves original infoset (e.g., characters)

Page 23: XML Data: From Research to Standards

XML Query Data Model

• Nodes
Node = DocNode | ElemNode | AttrNode | ValueNode
     | NSNode | PINode | CommentNode | InfoItemNode

• XML Schema primitive types
string, boolean, ID, IDREF, decimal, QName, ...

• Collections
sequence: [T]
bag: {T}
union: T1 | T2

• References
ref(T)

Page 24: XML Data: From Research to Standards

Constructors & accessors

• Attribute constructor
attrNode : (QNameValue, ValueNode) -> AttrNode
ValueNode = StringValue | DecimalValue | ...
qnameValue : (uriReference | null, string) -> QNameValue

• Attribute accessors
name : AttrNode -> QNameValue
value : AttrNode -> ValueNode
type : AttrNode -> ElemNode

• Example:
<book price="10.50"/>

A1 = attrNode(qnameValue(null, “price”), decimalValue(10.50))

name(A1) = qnameValue(null, “price”)
value(A1) = decimalValue(10.50)
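The functional notation maps naturally onto immutable records plus plain functions. The sketch below is one possible encoding in Python, not the working draft's definition: the class and field names are assumptions, the `type` accessor is left out, and only two value kinds are modeled.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Union


@dataclass(frozen=True)
class QNameValue:
    uri: Optional[str]   # None plays the role of `null`
    local: str


@dataclass(frozen=True)
class StringValue:
    value: str


@dataclass(frozen=True)
class DecimalValue:
    value: Decimal


ValueNode = Union[StringValue, DecimalValue]


@dataclass(frozen=True)
class AttrNode:
    qname: QNameValue
    content: ValueNode


# Constructors
def qname_value(uri: Optional[str], local: str) -> QNameValue:
    return QNameValue(uri, local)


def attr_node(qname: QNameValue, content: ValueNode) -> AttrNode:
    return AttrNode(qname, content)


# Accessors
def name(attr: AttrNode) -> QNameValue:
    return attr.qname


def value(attr: AttrNode) -> ValueNode:
    return attr.content


# <book price="10.50"/>
A1 = attr_node(qname_value(None, "price"), DecimalValue(Decimal("10.50")))
print(name(A1), value(A1))
```

Users of the model only call constructors and accessors, so the underlying data structure stays hidden, which is the point of the "no explicit data structure" design.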

Page 25: XML Data: From Research to Standards

XML Data Model: Conclusion

• Research focuses on simple formal models

• Many standards related to the need for a data model

• XML Query Data Model reconciles both worlds
> Complete with respect to XML
> Simple design with a clear connection to a formal model: ordered trees, node-labeled, with references
> Clear relationship to other W3C standards: mapping to XML Infoset, based on XPath + typed values and unordered collections
> Less clear relationship with DOM

Page 26: XML Data: From Research to Standards

Part III: Data Definition Language

Page 27: XML Data: From Research to Standards

Why a DDL for XML?

For old & well-known (but good!) reasons

• As an ontology & modeling tool:
> to describe the structure of information: entities, relationships...
> to share common descriptions between actors/applications
> to guide query formulation and application development

• For error detection & safety:
> to verify that documents comply with what the application expects
> to make sure that the application accesses valid data
> to enforce safe operations (e.g., don’t do float arithmetic on trees!)
> to check that compositions of operations make sense

• For performance:
> to design storage (saving space, improving clustering, etc.)
> to process queries (algebraic laws, rewriting path expressions, etc.)

Page 28: XML Data: From Research to Standards

But XML deals with new needs

• XML data created from legacy repositories
> Need to capture schemas from heterogeneous sources
– Relational schemas: Simple but with integrity constraints
– Object-oriented schemas: Typed references, inheritance...
– Document grammars: Regular expressions, mixed text and structure

• XML used on the Web, for data exchange
> Need to remain flexible
– Web sources: From strict schemas to well-formed documents (smooooothly........)
– Many applications use the same information: we should be able to type the same document in multiple ways

Page 29: XML Data: From Research to Standards

Existing schema languages

• DTDs (W3C recommendation as part of XML 1.0)
> powerful for documents: regular expressions, mixes of text and structure
> limited for other applications: cannot capture relational or object schemas

• XML Schema (Candidate recommendation)
> Many new features: data types, forms of subtyping, etc.
> More powerful but quite complex

• Schemas for unordered semistructured models:
> Data guides, Graph schemas, using Datalog
> Used for optimization, schema inference from data

• Schemas for ordered tree models
> Regular tree grammars, YAT, lotos, XDuce, Relax, TRex etc.
> Used for optimization, type checking and inference from queries

Page 30: XML Data: From Research to Standards

DDL Roadmap

3.1. Describing atomic values
> integer, string, float, date, images, etc.

3.2. Describing structures
> elements: tag-coupled approach vs. tag-decoupled approach
> attributes

3.3. More semantics
> identity, references, relationships intra or inter documents
> isa: notion of inheritance...

3.4. Simplifying schema reuse
> import/export abilities
> refinement of existing descriptions

Page 31: XML Data: From Research to Standards

Values in XML: easy?

• DTD says it’s easy:
Recipe: #PCDATA = string
        CDATA = other strings, ...
I.e.: Everything is a string
Unfortunately: Strings are not a panacea...

• Database research says it’s easy:
Recipe: Take a data model with atomic types
        Each value is in a different type...
I.e.: Don’t deal with syntax but data model
Unfortunately: XML = file = syntax

Page 32: XML Data: From Research to Standards

Values in XML: many issues...

• Addressing numerous needs:
> float, string, int, date, URI, telephone number, gif, applet, etc.

• Living with XML 1.0 syntax
> The same lexical representation can correspond to several values

<book><title>Haystacks at Chailly </title><author>Monet</author>
      <date>1865</date><price>1865</price></book>

> The same value can have several lexical representations

<book><ref>Monet1865</ref><in_stock>true</in_stock></book>
<book><ref>Monet1865</ref><in_stock>1</in_stock></book>

> binary formats (images, etc.) must be serialized in a portable way

• Compatible with other standards

• Compatible with internationalization
> World Wide Web!

Page 33: XML Data: From Research to Standards

XML Schema Part 2: Datatypes

• Defines 14 built-in types (basic types)
> general purpose types
> types for compatibility with DTDs

• Relies on other existing standards whenever possible
> IEEE 754-1985 for floats
> UCS [ISO 10646] & Unicode for internationalization
> ISO 8601 for dates

• Gives the ability to define new types (derived types)

• Single lexical representation for many values?
> document is interpreted with respect to a given schema
> if no schema, the value is given the type string

Page 34: XML Data: From Research to Standards

Datatypes: base types

• Base types cover essential needs
> “classic” values: string, boolean, float, double, decimal
> temporal values: timeDuration, recurringDuration
> binary values: binary
> Web-related types: uriReference, QName
> DTD types: ID, IDREF, ENTITY, NOTATION

• One value for several syntaxes
> Each base type has a set of values (value space)
> Values may have several lexical representations (lexical space)
> Equality and order are defined in terms of the value space
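The split between lexical space and value space can be illustrated in a few lines of Python. The parsing helpers are illustrative assumptions that cover only the lexical forms mentioned in this tutorial, not the full datatype definitions.

```python
from decimal import Decimal

# Lexical forms of boolean mentioned in the tutorial, mapped to the value space.
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def parse_boolean(lexical: str) -> bool:
    """Map a boolean lexical representation to its value."""
    return BOOLEAN_LEXICAL[lexical.strip()]


def parse_decimal(lexical: str) -> Decimal:
    """Map a decimal lexical representation to its value."""
    return Decimal(lexical.strip())


# Several lexical representations, one value.
same_boolean = parse_boolean("true") == parse_boolean("1")
same_decimal = parse_decimal("12") == parse_decimal("12.00")

# Order is defined on values: 9 < 10, even though "9" > "10" as strings.
value_order = parse_decimal("9") < parse_decimal("10")
string_order = "9" < "10"

print(same_boolean, same_decimal, value_order, string_order)
```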

Page 35: XML Data: From Research to Standards

Base types: examples

Datatype            Examples                   Notes
string              Victor Hugo
boolean             true, false, 1, 0
float               12, 12.00, 1.2E-2, INF     m × 2^e where m < 2^24, -149 <= e <= 104
double              12, 12.00, 1.2E-2, INF     m × 2^e where m < 2^53, -1075 <= e <= 970
decimal             0, -0, 1.23, 123.4         Arbitrary precision
timeDuration        P29Y2MT1H30M1.3S           29 years, 2 months, 1 hour, 30 minutes, 1.3 seconds
recurringDuration   --08-29T19:05:00           August 29th at 7:05pm every year
uriReference        http://www.w3.org/

Page 36: XML Data: From Research to Standards

Datatypes: facets

• Each base type has facets (read: properties)

• Some facets are fundamental
> equality, order
> bounded, cardinality, numeric

• Some facets are constraining
> length, minLength, maxLength: for string, binary or lists
> maxInclusive, maxExclusive, minInclusive, minExclusive
> precision, scale: for decimal numbers
> encoding: hex or base64 for binary
> enumeration, pattern
> duration, period

Page 37: XML Data: From Research to Standards

Datatypes: derived types

• One can derive types by restriction of facets

<simpleType name='integer' base='xsd:decimal'>
  <scale value='0'/>
</simpleType>

<simpleType name='int' base='xsd:integer'>
  <maxInclusive value='2147483647'/>
  <minInclusive value='-2147483648'/>
</simpleType>

• One can derive types by list

<simpleType name='IDREFS' base='xsd:IDREF' derivedBy='xsd:list'/>

• XML Schema offers predefined derived types
> integer, nonPositiveInteger, int, date, year, century, timeInstant, language, etc.
> IDREFS, NMTOKENS, etc.

Page 38: XML Data: From Research to Standards

Now you can practice...

> Using a range facet

<simpleType name='auctionprice' base='xsd:decimal'>
  <minInclusive value='10'/>
</simpleType>

> Using an enumeration facet

<simpleType name='booktype' base='xsd:string'>
  <xsd:enumeration value="Book"/>
  <xsd:enumeration value="Collection"/>
  ...

> Using a pattern facet

<xsd:simpleType name="isbn" base='xsd:string'>
  <xsd:pattern value="ISBN \d{10}"/>
</xsd:simpleType>

> Using a list type

<xsd:simpleType name="auctions" base="xsd:auctionprice" derivedBy="xsd:list"/>

> etc.
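To make the four exercises concrete, here is a hedged Python sketch of what each facet checks. The function names are assumptions; the enumeration lists only the two values visible above (the original declaration is elided), and the pattern is applied to the whole string, as XML Schema patterns are.

```python
import re
from decimal import Decimal, InvalidOperation


def is_auction_price(lexical: str) -> bool:
    """Range facet: a decimal with minInclusive = 10."""
    try:
        return Decimal(lexical) >= Decimal("10")
    except InvalidOperation:
        return False


def is_booktype(lexical: str) -> bool:
    """Enumeration facet (only the two values shown in the example)."""
    return lexical in {"Book", "Collection"}


ISBN_PATTERN = re.compile(r"ISBN \d{10}")


def is_isbn(lexical: str) -> bool:
    """Pattern facet: the whole string must match."""
    return ISBN_PATTERN.fullmatch(lexical) is not None


def is_auctions(lexical: str) -> bool:
    """List type: whitespace-separated items, each an auction price."""
    return all(is_auction_price(item) for item in lexical.split())


print(is_auction_price("10"), is_booktype("Journal"),
      is_isbn("ISBN 0201537710"), is_auctions("10 12.5 300"))
```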

Page 39: XML Data: From Research to Standards

Describing Values: Conclusion

• Not addressed in research

• XML Schema Part 2: Datatypes does a good job
> Quite complete
> Deals with complex requirements (e.g., internationalization)

• Defines values but not operations!
> Needed by XPath, XQuery…

Page 40: XML Data: From Research to Standards

Describing XML structures

• element names
> with the names themselves: book, title, etc.
> possibly with wildcards: ~ = any tag, !a = not a, etc.

• element children
> using regular expressions

• element attributes
> unordered attribute-value pairs

• Main question: types vs. element names
> does the element name determine the type?
> tag-coupled types vs. tag-decoupled types

Page 41: XML Data: From Research to Standards

Coupled types

• Approach taken by DTDs
> two elements with the same name always have the same type
> children = regular expression over elements

• Properties
> easy to parse: => no depth look-ahead
> no closure under union, no local names allowed
> cannot express relational, object-oriented schemas

<!ELEMENT book (title, author+, price, publisher, section, conclusion?)>
<!ELEMENT title (#PCDATA)>
....
<!ELEMENT author (name, affiliation)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
....
<!ELEMENT publisher (name, address)>
...
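Because a coupled content model is a regular expression over child element names, checking it needs no look-ahead into the children's own content. The sketch below hand-translates two of the declarations into Python regular expressions; the `CONTENT_MODELS` table and `children_match` helper are illustrative assumptions, not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by one space.
CONTENT_MODELS = {
    "book": re.compile(r"title (author )+price publisher section (conclusion )?"),
    "name": re.compile(r"first last "),
}


def children_match(elem) -> bool:
    """Check the element's child names against the model for its tag."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # no declaration in this sketch: not checked
    word = "".join(child.tag + " " for child in elem)
    return model.fullmatch(word) is not None


good = ET.fromstring(
    "<book><title/><author/><author/><price/><publisher/><section/></book>"
)
missing_author = ET.fromstring("<book><title/><price/><publisher/><section/></book>")
author_name = ET.fromstring("<name><first/><last/></name>")
publisher_name = ET.fromstring("<name>Addison Wesley</name>")

print(children_match(good), children_match(missing_author),
      children_match(author_name), children_match(publisher_name))
```

The last case shows the cost of coupling: since `name` has a single type, a publisher's `name` holding plain text is rejected, even though that would be the natural model for it.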

Page 42: XML Data: From Research to Standards

Decoupled types

• Approach taken by YAT, XDuce, lotos, etc.
> types are decoupled from element names
> children are defined by regular expressions over types
> different types can have the same tag

• Properties
> equivalent to regular tree grammars
> closure under intersection, complement, union...
> more precise type for documents and queries
> harder to parse (might require look-ahead and backtracking)

type Book = book [ Title, Author+, Price, Publisher, Section, Conclusion? ]
type Title = title [ String ]
type Author = author [ Name, Affiliation ]
type Name = name [ first [ String ], last [ String ] ]
...
type Publisher = publisher [ PName, Address ]
type PName = name [ String ]

Page 43: XML Data: From Research to Standards

Decoupled types cont’d

• They are simple to define
> basic entities: datatypes, tags, type names
> one construct: types

schema ::= type type_name = type ...
type ::= String | Boolean | ...   (* datatypes *)
       | type_name                (* type name *)
       | tag [ type ]             (* element *)
       | ~ [ type ]               (* element with wildcard *)
       | type, type               (* sequence *)
       | type | type              (* union *)
       | type*                    (* Kleene star *)

Page 44: XML Data: From Research to Standards

Decoupled types cont’d

• They can easily describe mixed content

type Section = section [ title [ String ], Body ]
type Body = content [ (b [ Body ] | footnote [ String ] | Section | String)* ]

• They can easily describe all well-formed documents

type UrScalar = (String | Boolean | Float | Double ...)
type UrTree = UrScalar | ~[ UrTree* ]

• They support a notion of subtyping via inclusion

type Body2 = content [ String, (b [ String ] | footnote [ String ] | String)*, Section* ]
Body2 <: Body <: UrTree

> all documents of type Body2 are also of type Body and UrTree

• But they can be ambiguous

type Section2 = section [ title [ String ], Body2*, Body* ]

> deciding between Body and Body2 can be expensive

Page 45: XML Data: From Research to Standards

Decoupled types & full XML

• How do you describe attributes?

type Book = book [ @isbn [ String ], Title, Author+, Price, Publisher, Section, Conclusion? ]

> but attributes are unordered, without duplicates
> they do not interact with the children of the element
> they cannot contain complex values

• How do you describe references?
> Like in object schemas [Cluet et al 1998]:

type Author = author [ name [ first [ String ], last [ String ] ] ]
type Book = book [ title [ String ], &Author+, &Publisher ]
type Publisher = publisher [ name [ String ] ]

> but it’s even harder to parse because of cycles [Beeri, Milo 1999]

• How do you deal with XML specifics?
> entities, processing instructions, namespaces, serialization, etc.

Page 46: XML Data: From Research to Standards

What about XML Schema?

• Tries to get the expressive power of decoupled types + the ease of parsing of coupled types

• Advanced features: “subtyping”, constraints...

• Deals with all the specifics of XML

• XML Schema syntax is in XML

Results in a pretty complex specification

<xsd:element name="book">
  <xsd:complexType>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:element name="first" type="xsd:string"/>
        <xsd:element name="last" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>
    ………
  </xsd:complexType>
</xsd:element>

Page 47: XML Data: From Research to Standards

Element & attribute declarations

• Element decl. ~ associate element names to types
> have a name and their content is described by a type

<xsd:element name="title" type="xsd:string"/>                title [ String ]
<xsd:element name="affiliation" type="publisher"/>            affiliation [ Publisher ]

• Attribute decl. ~ associate attribute names to types
> have a name and contain an atomic value
> can be required or optional
> can only appear inside elements (through complex types)

<xsd:attribute name="price"/>                                 @price [ String ]?
<xsd:attribute name="auctionhistory" type="auctions"          @auctionhistory [ Auctions ]
               use="required"/>                               type Auctions = Decimal*

Page 48: XML Data: From Research to Standards

Model groups

• Defines content models (i.e., type for the children of an element)
~ equivalent to regular expressions over elements

<xsd:sequence>                                          title[Title], price[Price]
  <xsd:element name="title" type="Title"/>
  <xsd:element name="price" type="Price"/>
</xsd:sequence>

<xsd:choice>                                            ( publisher[Publisher]
  <xsd:element name="publisher" type="Publisher"/>      | editor[Author] )
  <xsd:element name="editor" type="Author"/>
</xsd:choice>

<xsd:sequence minOccurs="0" maxOccurs="unbounded">      book[ Book ]*
  <xsd:element name="book" type="Book"/>
</xsd:sequence>

<xsd:all>                                               (title[Title], price[Price])
  <xsd:element name="title" type="Title"/>              | (price[Price], title[Title])
  <xsd:element name="price" type="Price"/>
</xsd:all>
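The reading of xsd:all as a union of sequences can be spelled out mechanically. This small Python sketch assumes every element of the group occurs exactly once; the helper name `all_group` is hypothetical.

```python
from itertools import permutations


def all_group(*names):
    """Child-name sequences accepted by an xsd:all group over `names`,
    assuming each element occurs exactly once."""
    return {tuple(order) for order in permutations(names)}


accepted = all_group("title", "price")
three = all_group("title", "price", "publisher")

print(sorted(accepted))
print(len(three))
```

Two elements give the two alternatives written in the type notation above; with n elements the equivalent regular expression has n! alternatives, which is why a dedicated construct is convenient.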

Page 49: XML Data: From Research to Standards

Complex type definitions

> they contain a content model and attribute declarations
> they can be empty
> they can be recursive
> they can be mixed (i.e., strings + sub elements)

<xsd:complexType name="Book">                                    type Book = @isbn [ String ],
  <xsd:sequence>                                                             title [ String ],
    <xsd:element name="title" type="xsd:string"/>                            author [ Name ]+
    <xsd:element name="author" maxOccurs="unbounded"
                 type="AuthorName"/>
  </xsd:sequence>
  <xsd:attribute name="isbn" type="xsd:string"/>
</xsd:complexType>

<xsd:complexType name="RefBib" content="empty">                  type RefBib = @refto [ &UrTree ]
  <xsd:attribute name="refto" type="xsd:IDREF"/>
</xsd:complexType>

<xsd:complexType name="Body" content="mixed">                    type Body = (b[Body]|String)*
  <xsd:element name="b" type="Body" minOccurs="0"
               maxOccurs="unbounded"/>
</xsd:complexType>

Page 50: XML Data: From Research to Standards

Some feature interactions

• Local element restrictions
> local elements with same name can have different types

<xsd:element name="author">                            type Author = author [ name [ AuthorName ] ]
  <xsd:complexType>
    <xsd:element name="name" type="AuthorName"/>
  </xsd:complexType>
</xsd:element>
<xsd:element name="publisher">                         type Publisher = publisher [ name [ String ] ]
  <xsd:complexType>
    <xsd:element name="name" type="xsd:string"/>
    ...
  </xsd:complexType>
</xsd:element>

> but they must have the same type among siblings

<xsd:complexType name="Names">                         type Names = name [ AuthorName ],
  <xsd:element name="name" type="AuthorName"/>                      name [ String ]?
  <xsd:element name="name" type="xsd:string" minOccurs="0"/>
</xsd:complexType>

• To be simple or not to be simple...

<internationalPrice currency='EU'>423.46</internationalPrice>

> requires a complexType defined by extension over decimals

Page 51: XML Data: From Research to Standards

Describing Structures: Conclusion

• Research: formal models with good properties

• XML Schema Part 1: Structures is complex
> Deals with XML syntactic aspects
> Focuses on validation
> Many features with complex interactions

• Need for some middle ground
> We need to reason about schemas (e.g., for typing)
> XML Schema: Formalism has just been released

Page 52: XML Data: From Research to Standards

Integrity constraints

• Come from relational
> practical viewpoint: key & foreign key constraints
> theoretical viewpoint: functional & inclusion dependencies
> studied in depth in the literature

• Many useful applications of ICs
> used to preserve information when mapping ER model to relational
> used for safety and verification (e.g., controlling updates)
> used for optimization (e.g., dropping useless joins)

• Reasoning about ICs is hard:
> implication of functional + inclusion dependencies is undecidable
> etc.

Book (isbn, title, price, publisher)           isbn is a key for the relation Book
Author (authorid, first, last, affiliation)    authorid and first,last are both keys for the relation Author
Wrote (isbn, authorid)                         isbn and authorid are foreign keys to Book and Author

Page 53: XML Data: From Research to Standards

ID/IDREF mechanism in DTDs

• Very simple ICs to model identity and references

• ID attributes must have distinct values
> they identify elements uniquely in a document
> but they are not exactly like keys: publisher’s stickers and book’s isbns must be different

• IDREF attributes must have values from ID attributes
> they can capture references to other elements
> but: they allow refs to point to publishers!

<!ELEMENT book (title, author+, price, publisher, section, bibliography?)>
<!ATTLIST book isbn ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT publisher (name, address)>
<!ATTLIST publisher sticker ID #REQUIRED>
<!ELEMENT bibliography EMPTY>
<!ATTLIST bibliography refs IDREFS #IMPLIED>
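Both limitations of ID/IDREF can be reproduced with a small checker. The sketch below is not a DTD validator: the two tables transcribe the ATTLIST declarations by hand, the `<books>` wrapper element and the sample data are made up, and `check_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-transcribed from the ATTLIST declarations: element name -> attribute.
ID_ATTRS = {"book": "isbn", "publisher": "sticker"}
IDREFS_ATTRS = {"bibliography": "refs"}


def check_ids(root):
    """Return (ids, errors): one document-wide ID space, untyped references."""
    ids, errors = {}, []
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.get(attr)
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids[value] = elem.tag
    for elem in root.iter():
        attr = IDREFS_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            for ref in elem.get(attr).split():
                if ref not in ids:
                    errors.append(f"dangling IDREF {ref!r}")
    return ids, errors


ok = ET.fromstring(
    "<books>"
    '<book isbn="b1"><publisher sticker="p1"/><bibliography refs="b2 p1"/></book>'
    '<book isbn="b2"><publisher sticker="p2"/></book>'
    "</books>"
)
clash = ET.fromstring('<books><book isbn="b1"><publisher sticker="b1"/></book></books>')

ids, errors = check_ids(ok)
_, clash_errors = check_ids(clash)
print(ids, errors, clash_errors)
```

The first document is accepted even though its bibliography refers to publisher p1, and the second is rejected only because a sticker happens to equal an isbn.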

Page 54: XML Data: From Research to Standards

Adding constraints to DTDs

• We can replace IDs by real keys:

book.isbn -> book                          isbn is a key for the relation book
publisher.sticker -> publisher             sticker is a key for the relation publisher
author.authorid -> author                  authorid is a key for the relation author
wrote.isbn, wrote.authorid -> wrote        isbn and authorid are a key for the relation wrote

• We can replace IDREFs by real foreign keys:

biblio.refs <= book.isbn                   refs is a multi-valued foreign key from biblio to book
wrote.isbn <= book.isbn                    isbn is a foreign key from wrote to book
wrote.authorid <= author.authorid          authorid is a foreign key from wrote to author

> Reasoning about simple ICs for XML is possible [FanSimeon 2000]
> Reasoning about ICs with DTDs is very hard [FanLibkin 2001]


Constraints in XML Schema

• XML Schema can define powerful constraints
> Using XPath expressions

• One can define keys:
> the selector gives the collection on which the constraint applies

<key name="Isbn">
  <selector>books/book</selector>
  <field>@isbn</field>
</key>

<key name="Publisher">
  <selector>books/book/publisher</selector>
  <field>@sticker</field>
</key>

• One can define foreign keys:

<keyref refer="Isbn">
  <selector>books/book/biblio</selector>
  <field>@refs</field>
</keyref>

• Many open issues
> is XPath too powerful for reasoning (predicates, function calls)?
> which notion of equality is used?
> interaction between ICs and structural constraints?
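The selector/field reading of keys and foreign keys can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: it uses the small path subset of the standard ElementTree module rather than full XPath, selectors are written relative to the books root element, fields are plain attribute names, and the function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def key_values(root, selector, field):
    """The selector picks the collection; the field is the attribute holding the value."""
    return [e.get(field) for e in root.findall(selector)]

def check_key(root, selector, field):
    """A key holds if every selected element has the field and all values are distinct."""
    values = key_values(root, selector, field)
    return None not in values and len(values) == len(set(values))

def check_keyref(root, selector, field, key_selector, key_field):
    """A keyref holds if every (possibly multi-valued) reference is among the key values."""
    keys = set(key_values(root, key_selector, key_field))
    refs = [v for e in root.findall(selector) for v in (e.get(field) or "").split()]
    return all(v in keys for v in refs)

# Hypothetical sample data (not from the slides).
books = ET.fromstring(
    '<books>'
    '<book isbn="1"><publisher sticker="1"/><biblio refs="2"/></book>'
    '<book isbn="2"><publisher sticker="7"/><biblio refs="1 7"/></book>'
    '</books>'
)
isbn_is_key = check_key(books, "book", "isbn")
sticker_is_key = check_key(books, "book/publisher", "sticker")
refs_ok = check_keyref(books, "book/biblio", "refs", "book", "isbn")
```

Unlike ID attributes, each key has its own value space here, so a sticker may coincide with an isbn; and the reference 7 is rejected because it names a publisher, not a book.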


Unified Constraint Model

• Based on XML Query Algebra type system
• Key/Foreign Key domains are defined by Types
• Very simple path expression for key components
> Powerful: relational keys/fkeys, object references, ID/IDREFs
> Close to relational approach
> Simple enough to reason about satisfiability

[Fan Kuper Simeon 2001]

type Book = book [ title [ String ], Author*, publisher [ Publisher ] … ]

type Author = author [ name [ String ], wrote [ String* ] ]

key book = Book [| ./title/data() |]

fkey authorbooks = Author [| ./wrote/data() |] references book


Reusing schemas

• Many benefits
> sharing existing definitions
> faster development

• Traditional techniques for schema reuse:
> some notion of import and the ability to resolve name conflicts

Import Person, Company from StdClass

class Person
  tuple(name : tuple(first : string, last : string))
class Company
  tuple(name : string)

> inheritance, based on subtyping

class Author inherit Person
  tuple(affiliation : Publisher)
  tuple(first:string, last:string, affiliation:Publisher) <: tuple(first:string, last:string)
class Publisher inherit Company
  tuple(address:string)
  tuple(name:string, address:string) <: tuple(name:string)

• We need means to access schemas over the Web


Reusing XML Schemas

• Means to import types from other schemas
> access and import through URIs
> name conflict resolution based on namespaces

• Mechanisms for limited “inheritance” or subtyping
> notions of extension and restriction
> abstract types and “equivalence classes”

<schema xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="uri:mybiblio"
        xmlns:my="uri:mybiblio">


Extension

• Extension allows adding new fields to a complex type

<complexType name="ContactAuthor" base="Author" derivedBy="extension">
  <element name="telephone" type="xsd:string"/>
</complexType>

• Now you can use both types
> but you might need to mark the data with xsi:type attributes
> you cannot export the document without its type anymore...

<author xsi:type="Author">
  <name><first>Serge</first><last>Abiteboul</last></name>
  <affiliation>INRIA</affiliation>
</author>
<author xsi:type="ContactAuthor">
  <name><first>Jerome</first><last>Simeon</last></name>
  <affiliation>Bell Laboratories</affiliation>
  <telephone>+1 908 582 5473</telephone>
</author>


Restriction

• Restricts the scope of a type definition

• 5x5 table across schema features to define restriction

• Spirit is to allow:
> smaller datatypes
> narrowed range for sequences: t{n,m} < t{n',m'} iff n >= n' && m <= m'
> reduced alternative: t1 < (t1|t2)
> propagation of restriction: t1 < t1' implies t1 < (t1'|t2)

<xsd:element name="book2" base="book" derivedBy="restriction">
  <xsd:complexType>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="2" maxOccurs="10">
    .......
  </xsd:complexType>
</xsd:element>


“Equivalence classes”

• Allows defining elements that can be used in place of other elements
> allow an element named contact to be used whenever an author element is expected
> the corresponding type can be a derived type
> of course, “equivalence classes” are not based on equivalence

<element name="contact" type="ContactAuthor" equivClass="author"/>

<author>
  <name><first>Serge</first><last>Abiteboul</last></name>
  <affiliation>INRIA</affiliation>
</author>
<contact>
  <name><first>Jerome</first><last>Simeon</last></name>
  <affiliation>Bell Laboratories</affiliation>
  <telephone>+1 908 582 5473</telephone>
</contact>


Some short-comings

• Restriction is very syntactic
> the following two types are not restrictions of one another!

a[A],(b[B],c[C])
<xsd:sequence>
  <xsd:element name="a" type="A"/>
  <xsd:sequence>
    <xsd:element name="b" type="B"/>
    <xsd:element name="c" type="C"/>
  </xsd:sequence>
</xsd:sequence>

(a[A],b[B]),c[C]
<xsd:sequence>
  <xsd:sequence>
    <xsd:element name="a" type="A"/>
    <xsd:element name="b" type="B"/>
  </xsd:sequence>
  <xsd:element name="c" type="C"/>
</xsd:sequence>

• Restriction and extension are not possible together:

Person1 = person [ name [ UrTree ], age [ Integer ] ]

Person2 = person [ name [ String ], age [ Integer ], address [ Address ] ]
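To see why the first shortcoming is purely syntactic, one can check that the two nested sequences accept exactly the same children. The sketch below is an illustration only: it assumes each element name is encoded as one letter and each content model as a regular expression, and the helper accepted is a hypothetical name, not part of any XML Schema tool.

```python
import re
from itertools import product

left = re.compile(r"a(bc)")    # a[A],(b[B],c[C])
right = re.compile(r"(ab)c")   # (a[A],b[B]),c[C]

def accepted(model, alphabet="abc", max_len=4):
    """All child-name sequences of length <= max_len accepted by the content model."""
    return {
        "".join(seq)
        for n in range(max_len + 1)
        for seq in product(alphabet, repeat=n)
        if model.fullmatch("".join(seq))
    }

# Both models describe the same set of documents, yet neither is a
# restriction of the other under a grouping-sensitive (syntactic) rule.
same_language = accepted(left) == accepted(right)
```

A set-inclusion (structural) notion of subtyping would treat the two types as equivalent, which is the contrast drawn with XDuce in the subtyping conclusion.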


Subtyping: Conclusion

• Subtyping and inheritance in programming languages

• By-name subtyping in XML Schema: relies on user declaration

• Structural subtyping in XDuce relies on set inclusion

• Subsumption for semistructured data [Buneman et al 1997] and for XML [Kuper Simeon 2001] proposes a trade-off between by name and structural subtyping

Still an open problem


XML DDL: Conclusion

• Much research work with interesting and complementary properties

• Complete but complex XML Schema specification...

• Yet no approach that reconciles all of the above

• And still some difficult problems to solve:
> concrete integrity constraint language that is tractable
> syntactic vs. semantic notion of subtyping?
> use of types for language typing
> use of types for query processing
> use of types for storage


Part IV
XML Query Languages


Plan of the rest of the talk

• Querying XML: problem definition

• Previous query languages for XML and graph-based data

• XQuery as a “standard” query language for XML
– Syntax and semantics
– Functionalities and expressive power
– Open issues

• Other desirable features for XQuery

• Research problems related to XML data management

• Conclusion


In search of a query language...

• What do we call a query language?

The language used to describe, in a declarative fashion, the mapping from an input instance of the data model to an output instance of the data model.

What data model for XML ?


XML data models

• XML is just a syntax and did not have any standard data model for many years (still doesn’t!)

• Graph data models have been used to model irregular data even before XML

• All query languages for graph-based data models are relevant to XML

• XQuery data model (www.w3c.org/TR/query-datamodel)
– First formal and complete data model for XML
– Used in the formal semantic specification of XQuery


XML example

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <ref isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>


XML data model in a slide

• An instance of the data model = a forest of nodes

• Eight types of nodes:
– Document, element, attribute, value, namespace, processing-instruction, comment, reference nodes

• Each type of node has accessors (e.g. name(element)) and constructors (e.g. comment("this is a comment"))

• Nodes have an optional (unique) parent

• Nodes have an identity that can be queried and preserved

• Support for ordered and unordered collections

• No support for nested collections

• Document order can be queried and preserved

• Data model instances are described and constrained by a type system


XML query language requirements (1)

1. Select portions of an XML document

2. Copy portions of a document while preserving the hierarchy and the order of the nodes

3. Combine (join) two documents

4. Construct new documents

5. Navigate irregular or unknown documents


XML query language requirements (2)

6. Formulate predicates on the tag names and attribute names

7. Query and preserve the nodes' global topological order

8. Apply aggregation and sorting functions

9. Apply existential and universal quantifiers

10. Apply full-text predicates and text operations


Relevant query languages

• Query languages for graph data
– e.g. GOOD, GraphLog, Clean

• Query languages/scripting languages for the Web
– e.g. WebSQL, WebOQL, WebL

• Query languages for semi-structured data
– e.g. MSL, UnQL, StruQL, YATL

• Research query languages for XML
– e.g. XML-QL, Lorel, XML-GL, Quilt, XDuce

• Industry query languages for XML
– e.g. XQL, OQL extensions to query SGML documents

• Standard processing languages for XML (W3C standards)
– e.g. XPath, XSLT

• Standard W3C XML Query Language: XQuery

“XML Query Languages: Experiences and Exemplars”, M. Fernandez, J. Simeon, P. Wadler
“Comparative Analysis of Five XML Query Languages”, Angela Bonifati, Stefano Ceri


XQuery

• Current working drafts inside the W3C
  www.w3c.org/XML/Query

• Basis of the future “standard” XML query language

• XQuery will have: (a) a human-readable (non-XML) syntax and (b) an XML syntax (ABQL)

• XML Algebra:
– Formal data model, type system
– Formal semantics for the query language

Caveat: many features and design decisions are stable; some will change


XQuery as a functional language

• XQuery:
– consumes an instance of the XML data model as input
– produces an instance of the XML data model as output

• XQuery is a functional language (like OQL)
• XQuery is a strongly typed language
• A query is an expression
• Static semantics:
– Given an expression, computes the type of the result

• Dynamic semantics:
– Given an expression and an environment, determines the resulting value

• Environment binds functions and variables


XQuery expressions

• Constants (all XML Schema atomic types)
– "string literal", 1345.46E23, etc.

• Variables
– $x, $y

• XPath expressions (for navigation)
– $x/girls, $y/*, $x/@name

• Expression OP Expression
– 1 + 3, true and false, $x/girls union $x/boys

• f(exp1, ..., expN)
– descendents($x)

• FLWR expressions (for iteration)
• SORTBY expressions
• Quantified expressions
• Conditional expressions
• XML node constructors (elements, attributes, etc.)


XQuery functions and operators

• Arithmetic operators
– +, -, *, div

• Logical operators
– And, Not, Or

• Collection oriented operators
– Union, intersection, difference, empty(), distinct()

• XML specific functions
– Document(), name(), value(), string(), etc.

• Work in progress
• Many open semantic issues: what is the semantics of a + operator when the input is not a value of a numerical type but a list of strings? See the type coercion problem later on.


Navigation using XPath

• General syntax:
  expression ‘/’ step

• Step:
  axis ‘::’ nodeTest

• The axis controls the direction
– ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self

• Node test by
– Name (e.g. employee, myNS:employee, *:employee, myNS:*, *:*)
– Type (e.g. node(), comment(), text())

• Examples of path expressions
  document("employees.xml")/child::employee
  $x/parent::*
  $x/ancestor::*/descendant::comment()


Semantics of path expressions

• Semantics of path expressions in XPath 1.0
(1) Ordered forests of nodes as input, ordered forests of nodes as output
(2) For each root node in the input forest, select the nodes in the same document that lie on the given axis; among those, select and return the ones that satisfy the node test
(3) No duplicates are allowed in the output
(4) Output nodes are ordered by the document order
(5) Nodes preserve their identity

• No type error for $book/firstname

• A list of lists is automatically flattened
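The five rules for a single path step can be sketched as follows. This is a minimal Python sketch assuming ElementTree elements as nodes and only the child and descendant axes with name tests; the function name step and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def step(doc_root, inputs, axis, name):
    """Evaluate one step axis::name over an input forest (a list of elements)."""
    # Document order: position of each node in a depth-first traversal.
    order = {id(n): i for i, n in enumerate(doc_root.iter())}
    selected = {}
    for node in inputs:
        if axis == "child":
            candidates = list(node)
        elif axis == "descendant":
            candidates = [d for d in node.iter() if d is not node]
        else:
            raise ValueError(f"axis not covered by this sketch: {axis}")
        for c in candidates:
            if name == "*" or c.tag == name:      # node test
                selected[id(c)] = c               # no duplicates; identity preserved
    # Output ordered by document order, flattened into one forest.
    return sorted(selected.values(), key=lambda n: order[id(n)])

doc = ET.fromstring(
    "<bib><book><title>A</title><section><title>B</title></section></book>"
    "<book><title>C</title></book></bib>"
)
books = step(doc, [doc], "child", "book")
# Input given out of order and with a duplicate: output is still ordered and duplicate-free.
titles = [t.text for t in step(doc, [books[1], books[0], books[0]], "descendant", "title")]
no_match = step(doc, books, "child", "firstname")   # empty forest, not a type error
```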


XML example

<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <ref isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>


Shortcuts in XPath (1)

• Axis is not mandatory
– By default it is child
  $x/child::person -> $x/person

• Short-hands for common axes
– Descendants
  $x/descendant::comment() -> $x//comment()
– Parent
  $x/parent::* -> $x/..
– Attribute
  $x/attribute::name -> $x/@name
– Self
  $x/self::* -> $x/.


Shortcuts in XPath (2)

• Implicit root node
  $root/department -> /department
  $root -> /
  where $root is implicitly bound to the current document node

• Implicit current node
  $self/title -> ./title
  $self/title -> title
  where $self is implicitly bound to the ‘current’ node
  (eliminates the need for an explicit variable declaration in second-order operators like sortby and filter predicates)


Iteration

• Syntax:
  for variable in expression0 return expression1

• Example:
» for $y in document("books.xml")/book return $y/authors
» for $x in //text() return value($x)
» for $z in ( for $y in //book return $y/authors ) return $z
» for $z in //book return ( for $y in $z/authors return $y )

• Semantics:
– bind the variable to each root node of the forest returned by expression0; for each such binding evaluate expression1; concatenate the resulting forests.


Local variable declaration

• Syntax:
  let variable := expression1 return expression2

• Example:
» let $y := document("books.xml")/book return count($y)
» let $a := f(2) return $a + $a

• Semantics:
– Evaluate expression1 and add a binding of the variable with this value to the current environment; evaluate expression2 in this environment; remove the local variable from the environment.

• Usage:
– Avoid repetition of common sub-expressions
– Split large expressions into smaller, more manageable sub-expressions.


Conditional expressions

• Syntax:
  if expression1 then expression2 else expression3

• Example:
» if $book/year < 1980 then "old book" else "new book"
» if count($company//employee) > 200
  then BigCompanyTaxCalculation($company)
  else SmallCompanyTaxCalculation($company)

• Semantics:
– If expression1 evaluates to true then return the result of the evaluation of expression2 else return the result of the evaluation of expression3.


FLWR expressions

• Syntactic sugar that combines FOR, LET, IF
• Syntax:
  ( ( for (for_variable_binding)+ )
  | ( let (let_variable_binding)+ )
  | ( where expression ) )+
  return expression

  for_variable_binding := variable IN expression
  let_variable_binding := variable := expression

• Example:
  for $x in //employee, $y in //department
  let $z := $x/name
  where $x/@department = $y/name
  return $z


FLWR example

• FLWR expression:
  for $x in //employee, $y in //department
  let $z := $x/name
  where $x/@department = $y/name
  return $z

• Syntactic sugar for:
  for $x in //employee
  return ( for $y in //department
           return ( let $z := $x/name
                    return if ( $x/@department = $y/name )
                           then $z
                           else [] /* empty list */ ) )
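The equivalence between the FLWR form and its desugared form can be mirrored in Python, where a comprehension plays the role of the FLWR expression and explicit nested loops play the role of the expansion. This is only an analogy under stated assumptions: the sample company document is hypothetical and ElementTree stands in for the XML data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data (not from the slides).
doc = ET.fromstring(
    '<company>'
    '<department><name>RD</name></department>'
    '<department><name>Sales</name></department>'
    '<employee department="Sales"><name>Ann</name></employee>'
    '<employee department="RD"><name>Bob</name></employee>'
    '<employee department="Legal"><name>Eve</name></employee>'
    '</company>'
)
employees = doc.findall(".//employee")
departments = doc.findall(".//department")

# FLWR form: for, for, let, where, return in a single expression.
flwr = [
    z
    for x in employees
    for y in departments
    for z in [x.find("name")]                       # let $z := $x/name
    if x.get("department") == y.findtext("name")    # where
]

# Desugared form: nested iteration; the conditional returns the empty list on failure.
def desugared():
    result = []
    for x in employees:
        for y in departments:
            z = x.find("name")
            result += [z] if x.get("department") == y.findtext("name") else []
    return result

names = [z.text for z in flwr]
```

Both forms return the same nodes in the same order; Eve is dropped because no department matches.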


Filter predicates

• Syntactic sugar that simplifies some FLWR expressions

• Syntax:
  expression1 [ expression2 ]
  where expression2 is allowed to use the $self implicit variable (or the equivalent . )

• Semantics:
– if expression2 is of type boolean, shorthand for
  for $self in expression1
  where expression2
  return $self
– if expression2 is of type integer, return the Nth root element of the forest returned by expression1


Filter predicates (2)

• Filtering by predicate:
» //employee[./name/firstname = "jerome"]
» //book[price < 25]
» //book[count(author[@sex = "female"]) > 0]

• Filtering by position:
» /book[3]
» /book[3]/author[1]
» /book[3]/author[1 to 4]

• Same syntax, different semantics based on the type of the expression!
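The type-dependent behaviour of a filter predicate can be sketched as a single function that dispatches on the type of the predicate's value. This is a hedged Python illustration: filter_forest is a hypothetical name, the forest is a plain list, and only Boolean and single-integer predicates are covered (ranges such as 1 to 4 are left out).

```python
def filter_forest(forest, predicate):
    """Apply expression1[expression2] to a list of nodes.

    `predicate` is a function of the current node ($self). A Boolean value
    keeps or drops the node; an integer value selects by 1-based position.
    """
    result = []
    for position, node in enumerate(forest, start=1):
        value = predicate(node)
        if isinstance(value, bool):      # test bool first: bool is a subclass of int
            keep = value
        elif isinstance(value, int):
            keep = position == value
        else:
            raise TypeError("predicate must be Boolean or integer")
        if keep:
            result.append(node)
    return result

prices = [30, 12, 25, 8]                          # hypothetical data
cheap = filter_forest(prices, lambda p: p < 25)   # like //book[price < 25]
third = filter_forest(prices, lambda p: 3)        # like /book[3]
```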


Quantifiers

• Syntax:
  some variable in expression1 satisfies expression2
  every variable in expression1 satisfies expression2

• Examples:
» some $x in //book satisfies $x/price < 200
» //book[some $x in author satisfies $x/@sex = "female"]
» for $x in //department
  where every $y in $x/employee satisfies $y/salary > 1000
  return $x/manager/name


Sorting

• Syntax:
  expression0 SORTBY ‘(’ expression1 [ ASCENDING | DESCENDING ], …., expressionK [ ASCENDING | DESCENDING ] ‘)’

• Semantics:
– Second order operator
– Stable sort using the comparison function defined on the domains 1..K
– The implicit self variable is allowed in expression1, …, expressionK

• Examples:
» //employee sortby (./name/firstname)
» //person sortby (./income descending, ./name ascending)
» for $x in //departments where count($x/employee) > 2000 return $x sortby (revenue)
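A stable multi-key sort with per-key direction, as in the second example, can be sketched by sorting on the least significant key first and relying on stability. This is a minimal Python sketch; the function sortby and the sample records are hypothetical and only illustrate the ordering semantics.

```python
def sortby(forest, *keys):
    """keys: (key_function, descending) pairs, most significant first."""
    result = list(forest)
    # Sort by the least significant key first; each later (more significant)
    # stable sort keeps ties in the order established by the earlier ones.
    for key, descending in reversed(keys):
        result.sort(key=key, reverse=descending)
    return result

# Hypothetical data (not from the slides).
people = [
    {"name": "Dana", "income": 50},
    {"name": "Alex", "income": 70},
    {"name": "Carl", "income": 50},
    {"name": "Bea", "income": 70},
]
# Mirrors: //person sortby (./income descending, ./name ascending)
ordered = sortby(people, (lambda p: p["income"], True), (lambda p: p["name"], False))
names = [p["name"] for p in ordered]
```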


Global (document) order queries

• Syntax:
  expression1 ( before | after ) expression2

• Semantics:
– return all the roots of the first forest that are located before (resp. after) at least one root node in the second forest according to the global topological order of the document

• Examples:
– //incision before //anesthesia[1]
– //paragraph after //section[name="introduction"] before //paragraph[contains("Xquery")]
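The before/after semantics can be sketched with an explicit document-order index. This is a hedged Python sketch: the helper names and the sample document are assumptions, and document order is taken to be the depth-first traversal order of ElementTree.

```python
import xml.etree.ElementTree as ET

def document_order(root):
    """Map each node to its position in a depth-first traversal."""
    return {id(n): i for i, n in enumerate(root.iter())}

def before(root, first, second):
    """Nodes of `first` located before at least one node of `second`."""
    if not second:
        return []
    order = document_order(root)
    last = max(order[id(n)] for n in second)
    return [n for n in first if order[id(n)] < last]

def after(root, first, second):
    """Nodes of `first` located after at least one node of `second`."""
    if not second:
        return []
    order = document_order(root)
    earliest = min(order[id(n)] for n in second)
    return [n for n in first if order[id(n)] > earliest]

# Hypothetical sample document (not from the slides).
op = ET.fromstring(
    "<operation><incision n='1'/><anesthesia/><incision n='2'/>"
    "<anesthesia/><incision n='3'/></operation>"
)
incisions = op.findall("incision")
anesthesias = op.findall("anesthesia")
early = [i.get("n") for i in before(op, incisions, anesthesias[:1])]  # //incision before //anesthesia[1]
late = [i.get("n") for i in after(op, incisions, anesthesias)]
```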


Element constructors (1)

• Normal XML elements:

<section title="Introduction">
  This is the introduction of the book entitled <title>Data on the Web</title> written by
  <author> Dan Suciu </author> <author>Peter Buneman</author> <author> Serge Abiteboul </author> .
</section>

• XML elements with dynamically computed data

<section title = $s/title >
  "This is the introduction of the book entitled ", $s/ancestor::book/title, " written by ",
  for $a in $s/ancestor::book/author
  return <author> concat($a/firstname, $a/lastname) </author>
</section>


Element constructors (2)

• Example: “For each book with an author, return the book and its authors; for each book with an editor return the book’s title and the editor’s affiliation”.

<bibliography>
  for $x in //book return
    if (empty($x/author))
    then <book> $x/title, $x/editor/affiliation </book>
    else <book> $x/title, $x/author </book>
</bibliography>

Note the deep copy semantics!


Constructing other types of nodes

• Eight types of nodes:
– Document, elements, attributes, values, references, namespaces, comments, processing-instructions

• Elements are constructed using an XML notation

• All the others use specific functions
– comment("Please look at this issue!")
– makeAttribute("age", 25)


FILTER

• Example: “Retrieve the table of contents of a specific book”

filter(document("input.xml")//book[@ISBN=10],
       //book | //section | //title | //section/title/text() )

• Copy from the input document only the book elements, the section elements, the section titles and their text content (but not their children)
• For the copied nodes, preserve their relative order and their hierarchical structure.


FILTER example

Input document:

<?xml version="1.0"?>
<bib>
  …………………………….
  <book ISBN="10" year="1967">
    <title>The politics of experience</title>
    <author>
      <firstname>R.D.</firstname>
      <lastname>Laing</lastname>
    </author>
    <section>
      <title>Persons and experience</title>
      The great and true Amphibian
      <section>
        Exploitation must not ....
      </section>
    </section>
  </book>
  ………………………..
</bib>

Result:

<?xml version="1.0"?>
<book>
  <title>The politics of experience</title>
  <section>
    <title>Persons and experience</title>
    <section> ..................... </section>
  </section>
</book>


Dealing with node identity

• All nodes in the data model have node identity

• Node identity is preserved through queries:
– All the constructs in XQuery preserve node identity, except
– The element constructor, which makes copies of the input nodes and generates new nodes with new identity

• Two nodes can be compared using the identity equality operator (‘==’)


XQueries

• … we talked until now about expressions

• What is a query?

• An XQuery is defined as:
– A list of context definitions
– A list of function definitions
– A main expression

• The result of the query is the result of the evaluation of the main expression

• Context definition:
– Namespace definitions


Local function declarations

• Syntax:
  function functionName ‘(’ Parameter list ‘)’ return dataType ‘{’ expression ‘}’

• Example:
  function total_cost($x myNS:component) return xsd:float
  {
    if (simpleComponent($x))
    then $x/price/data()
    else sum(for $y in $x/* return total_cost($y))
  }

  total_cost(/component[1])

• Functions can be recursive; no restrictions on the type of the recursion

• Functions obey the “implicit mapping rule”
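The recursive total_cost function above translates directly into a recursive Python function. This sketch makes two assumptions that are not in the slides: a component is "simple" when it has no nested component children, and a simple component carries a price child; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

def total_cost(component):
    """Price of a simple component, or the sum of the costs of its sub-components."""
    subcomponents = component.findall("component")
    if not subcomponents:                 # plays the role of simpleComponent($x)
        return float(component.findtext("price"))
    return sum(total_cost(c) for c in subcomponents)

# Hypothetical sample document (not from the slides).
machine = ET.fromstring(
    "<component>"
    "<component><price>2.5</price></component>"
    "<component>"
    "<component><price>1</price></component>"
    "<component><price>4</price></component>"
    "</component>"
    "</component>"
)
cost = total_cost(machine)
```

As on the slide, nothing restricts the recursion; termination depends on the data being a finite tree.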


Static semantics for path expressions

”Retrieve the titles of all the books.”

• Input:
  type Bib = bib [ Book* ]
  type Book = book [ title [ String ], year [ Integer ], author [ String ]* ]

• Query:
  document("bib0.xml")/book/title

• Result:
  <title>Data on the Web</title>
  <title>Foundations of Databases</title>
  : title[String]*


Static semantics for the iteration

Example: ”Retrieve all the books written before 1967.”

• Query:
  for $v in document("bib0.xml")/book
  return if $v/year < 1967 then $v else []

• Result:
  <book>…..</book>
  <book>…..</book>
  : book[ title [ String ], year [ Integer ], author [ String ]* ]


Plan of the rest of the talk

• Querying XML: problem definition

• Previous query languages for XML and graph-based data

• XQuery as a “standard” query language for XML
– Syntax and semantics
– Functionalities and expressive power
– Open issues

• Other desirable features for XQuery

• Research problems related to XML data management

• Conclusion


Joins

• Example: “For each book found at both amazon.com and bn.com, list the title of the book and the price from each vendor”.

<book-with-prices>
  for $a in document("amazon.xml")/book,
      $b in document("bn.xml")/book
  where $b/isbn = $a/isbn
  return
    <book>
      $a/title,
      <price-amazon>$a/price</price-amazon>,
      <price-bn>$b/price</price-bn>
    </book>
</book-with-prices>


Left-outer joins

• Example: “For each book found at amazon.com, list the title of the book and its price. If the book also appears in bn.com, list also the bn price”.

<book-with-prices>
  for $a in document("amazon.xml")/book
  return
    <book>
      $a/title,
      <price-amazon>$a/price</price-amazon>,
      for $b in document("bn.xml")/book
      where $b/isbn = $a/isbn
      return <price-bn>$b/price</price-bn>
    </book>
</book-with-prices>


Full-outer joins

• Example: “For each book found at either amazon.com or bn.com, list its price(s).”

let $allISBNs := distinct(document("amazon.xml")/book/isbn union document("bn.xml")/book/isbn)
return
  <book-with-prices>
    for $isbn in $allISBNs
    return
      <book>
        ( for $a in document("amazon.xml")/book
          where $a/isbn = $isbn
          return <price-amazon>$a/price</price-amazon> ),
        ( for $b in document("bn.xml")/book
          where $b/isbn = $isbn
          return <price-bn>$b/price</price-bn> )
      </book>
  </book-with-prices>
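The full-outer-join pattern above (take the distinct union of the join keys, then look each key up on both sides) can be sketched in Python. The dictionaries stand in for the two vendor documents and contain hypothetical data; sorting the keys is an addition made here only to obtain a deterministic order.

```python
# Hypothetical price lists keyed by ISBN (not from the slides).
amazon = {"111": 30.0, "222": 45.0}
bn = {"222": 43.5, "333": 12.0}

def full_outer_join(left, right):
    """One row per ISBN found on either side, with the price(s) that exist."""
    rows = []
    for isbn in sorted(set(left) | set(right)):    # distinct(... union ...)
        row = {"isbn": isbn}
        if isbn in left:                           # inner for/where over the first source
            row["price-amazon"] = left[isbn]
        if isbn in right:                          # inner for/where over the second source
            row["price-bn"] = right[isbn]
        rows.append(row)
    return rows

rows = full_outer_join(amazon, bn)
```

Dropping the lookup on one side from the key set (iterating only over the left keys) gives the left-outer join of the previous slide.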


Group-by and Having

• Example: “For each author with more than 10 books, list the name of the author and the list of the first 10 books that he/she wrote”.

for $a in distinct(//author)
let $books := for $b in //book[author=$a] return $b
where count($books) > 10
return <result> $a/name, $books[1 to 10] </result>
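The same group-by/having pattern (group books per distinct author, keep groups above a threshold, truncate each group) can be sketched in Python. The function name prolific and its threshold and limit parameters are hypothetical; the small thresholds in the usage lines are chosen only to keep the sample data short.

```python
from collections import defaultdict

def prolific(books, threshold, limit):
    """Authors with more than `threshold` books, each with their first `limit` titles."""
    by_author = defaultdict(list)
    for title, authors in books:                   # group: let $books := //book[author=$a]
        for author in authors:
            by_author[author].append(title)
    return {
        author: titles[:limit]                     # $books[1 to limit]
        for author, titles in by_author.items()
        if len(titles) > threshold                 # having: where count($books) > threshold
    }

books = [
    ("Data on the Web", ["Abiteboul", "Buneman", "Suciu"]),
    ("Foundations of Databases", ["Abiteboul", "Hull", "Vianu"]),
]
result = prolific(books, threshold=1, limit=1)
```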


Views and parameterized views

• Support for views is a must

• Views are supported via functions

• Non-parameterized views are functions with no arguments; parameterized views are functions with at least one argument

• XQuery can support recursive views (unrestricted form of recursion)

• Termination is ensured by the programmer


Open issues

• Three-valued logic:
– XML Schema supports elements with nil content
– XQuery has to deal with the absence of information

• Extensibility:
– Some functions will be written in programming languages other than XQuery
– How are those functions declared and invoked in XQuery?

• Exceptions and exception handling mechanisms:
– What is the semantics of a query in case of exceptions?
– What is the semantics of Boolean operators in case of exceptions?
– How should we raise and catch exceptions?

• Type coercion rules:
– XML has no mandatory Schema; does this imply that data should be converted on the fly to the types expected by the operators?
– E.g. lists to singletons, strings to float, float to string


Implicit type casting in XPath 1.0

• Data model has 4 types:
– untyped set, string, integer, Boolean

• The evaluation uses implicit type casting rules:

/person[child/age = 19]
  implicit existential quantifier

/person[child/age + 1 = 20]
  the age of the first child equals 19

/book[@year]
  implicit existential quantifier

/book[@year+1-1]
  two type conversions: string->int, int->Boolean
  will return a book written in 1999 if it happens that this is the 1999th book in the document

/book[title=""]
  empty set to string conversion
  returns also the books without a <title> subelement


Plan of the rest of the talk

• Querying XML: problem definition

• Previous query languages for XML and graph-based data

• XQuery as a “standard” query language for XML
– Syntax and semantics
– Functionalities and expressive power
– Open issues

• Other desirable features for XQuery

• Research problems related to XML data management

• Conclusion


XML patterns and pattern matching

• UnQL, XML-QL, YATL
• Example:
– “Retrieve the titles of the books written by Laing before 1967”

WHERE <bib>
        <book year=$y ISBN=$isbn>
          <title> $t </title>
          <author> <lastname>Laing</lastname> </author>
        </book>
      </bib> in "bib.xml", $y < 1967
CONSTRUCT <resultBook ISBN=$isbn>
            <resultTitle> $t </resultTitle>
          </resultBook>

• No distinction between For and Where
• Pattern matching semantics


Skolem functions

• UnQL, XML-QL, Lorel
• Example:
– “Retrieve the titles of all the books, grouped by year of publication”

WHERE <bib>
        <book year=$y>
          <title> $t </title>
        </book>
      </bib> in "bib.xml"
CONSTRUCT <groupPerYear id=F($y)>
            <resultTitle> $t </resultTitle>
          </groupPerYear>
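The grouping effect of the Skolem function F can be sketched as a memoized constructor: calling F with the same argument always yields the same output node, so every title with the same year lands under one groupPerYear element. This is a hedged Python sketch; the class name Skolem, the dictionary representation of output nodes and the placeholder titles are assumptions made for illustration.

```python
class Skolem:
    """F(args) returns the same object whenever it is called with the same args."""

    def __init__(self, make):
        self._make = make
        self._cache = {}

    def __call__(self, *args):
        if args not in self._cache:
            self._cache[args] = self._make(*args)   # fresh node on first use only
        return self._cache[args]

# One output "node" per distinct year.
F = Skolem(lambda year: {"year": year, "titles": []})

# Variable bindings ($y, $t) produced by the WHERE pattern; placeholder values.
bindings = [("1967", "Title A"), ("1999", "Title B"), ("1967", "Title C")]
for year, title in bindings:
    F(year)["titles"].append(title)                 # CONSTRUCT under the node F($y)

groups = [F("1967"), F("1999")]
```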


Vertical regular expression

• UnQL, XML-QL, Lorel, YATL
• Example:
– “Retrieve the titles of all the sections or chapters”

WHERE <bib>
        <book>
          <(section | chapter)*>
            <title> $t </title>
          </>
        </book>
      </bib> in "bib.xml"
CONSTRUCT <resultTitle> $t </resultTitle>


Horizontal regular expressions

• YATL

• A Tree Pattern = type expression without union, and with annotated variables ($v)

• Example: ”Retrieve the first author after the book title”

• Process DTDs like: <!ELEMENT bib (title, author+)*>

• Example: “Create a bibliography for each author”

book[ title [ String ] book($b) [ title [ $t ], author[String]+, +author [ $a ]+, UrTree* ] _ ]

MAKE $a
MATCH book WITH book [ _ , title , _, author[$a] , *author, _ ]

MAKE *($a) bib [ author [ $a ], *title [ $t ] ]
MATCH bib WITH bib[*(title [ $t ], +author [ $a ] )]

Page 116: XML Data: From Research to Standards

XML-related research problems(1)

• Update languages for XML

• XML views of object-relational databases

• Storing XML data in object-relational DBMSs
– new challenges for the traditional DBMSs and for SQL

• Alternative storage methods for XML data

• Indexing XML

• Query processing algorithms for XQuery

• Efficient (streamed) processing of XML transformations

• Mixing structured search with full-text search

• Distributed execution of XML queries

• XML benchmarks

Page 117: XML Data: From Research to Standards

XML-related research problems (2)

• XML-based information mediation

• XML data cleaning

• XML data compression

• XML-based information brokering

• XML-based workflow systems

• XML scripting languages

and many more...

Page 118: XML Data: From Research to Standards

Conclusion

• XML is the lingua franca of the Web

• XML is the next big challenge for the database community

• Large quantities of a new type of data
– textual, irregular, self-organizing, distributed, replicated, etc.

• Many orders of magnitude larger:
– the volume of XML data
– the number of XML data repositories

• We now have good-quality standards:
– XML data model, XML schemas, XML query and transformation languages

• Very clear need from the industry

• Extraordinary opportunity for database research!

Page 119: XML Data: From Research to Standards

XSLT (1)

• Paper:
– “XSL Transformations (XSLT)”, W3C recommendation

• XML to XML rule based transformation language

• An XSLT program is an XML document itself

[Figure: three copies of the bib tree (bib → book → title, author, publisher; sample values “The divided self”, “R.D. Laing”, “Pantheon Books”); diagram labels: DOM, XML, HTML, data, transformation, result]

Page 120: XML Data: From Research to Standards

XSLT(2)

• An XSLT program is a valid XML document containing:
– elements in the <xsl:> namespace (i.e. the XSLT statements)
– elements in other namespaces (i.e. the user-defined data)

• The result of evaluating an XSLT program on an input XML document := the XSLT document where each <xsl:> element has been replaced with the result of its “evaluation”

• Uses XPath as a sublanguage

• Used mostly as a stylesheet language

Page 121: XML Data: From Research to Standards

XSLT programs

• An XSLT program
– is an element of type <xsl:stylesheet>

1. XSL elements describing rewriting rules
– <xsl:template>

2. XSL elements describing rule execution control
– <xsl:apply-templates>
– <xsl:call-template>

3. XSL elements describing instructions
– <xsl:element>, <xsl:attribute>, <xsl:for-each>, <xsl:if>, <xsl:copy>, <xsl:copy-of>, <xsl:sort>, <xsl:value-of>, etc.

Page 122: XML Data: From Research to Standards

XSLT processing model

• Process an XML document (procedure PD):
1. Apply the procedure PL (below) to a list with a single node: the root of the document

• Process a list L of nodes (procedure PL):
1. Process each node N in the list with the procedure P (below), with current node = N and current list = L
2. Return the concatenation (in list order) of the partial results

PL([x1, x2, …, xn]) = [P(x1), P(x2), …, P(xn)]

• Process a node N (procedure P):
1. Find all the templates applicable to the node N
2. Find the “best” template among them
3. Instantiate the content of the template
4. Return this result
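
The three procedures PD, PL and P can be sketched in Python as follows. This is only an illustration under stated assumptions, not how an XSLT engine is implemented: templates are plain Python functions, the names `TEMPLATES`, `book_rule`, `section_rule` and `default_rule` are hypothetical, matching uses Python predicates instead of XPath patterns, and the document element stands in for the document root.

```python
import xml.etree.ElementTree as ET

# Hypothetical template bodies: each returns a list of output items.
def book_rule(node, process_list):
    return ["<book:%s>" % node.findtext("title")] + process_list(node.findall("section"))

def section_rule(node, process_list):
    return ["<section:%s>" % node.findtext("title")] + process_list(node.findall("section"))

def default_rule(node, process_list):
    # Loosely mirrors a built-in element rule: recurse on the children.
    return process_list(list(node))

# Hypothetical template table: (match predicate, priority, body).
TEMPLATES = [
    (lambda n: True, -1.0, default_rule),
    (lambda n: n.tag == "book", 0.0, book_rule),
    (lambda n: n.tag == "section", 0.0, section_rule),
]

def process_node(node):
    """Procedure P: find the applicable templates, pick the best, instantiate it."""
    applicable = [(prio, body) for match, prio, body in TEMPLATES if match(node)]
    _, best = max(applicable, key=lambda pair: pair[0])
    return best(node, process_list)

def process_list(nodes):
    """Procedure PL: concatenate the per-node results in list order."""
    result = []
    for node in nodes:
        result.extend(process_node(node))
    return result

def process_document(xml_text):
    """Procedure PD: run PL on the single-node list holding the root."""
    return process_list([ET.fromstring(xml_text)])
```

Calling `process_document` on a small bibliography yields one output item per matched book or section, in document order, because each rule only recurses into the nodes it selects.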

Page 123: XML Data: From Research to Standards

<xsl:template>

• Basic XSLT concept: describes a rewriting rule

• It has:
– attributes to describe the acceptable input
– content to describe the output

• Attributes:
– match: XPath expression describing the elements to which this template applies
– name: the name of the template rule
– priority: guides the choice of the best template to apply

• The content is a legal XML fragment with:
– Elements from the xsl namespace
– Other elements (user data)

Page 124: XML Data: From Research to Standards

<xsl:template> example

<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear">
      <xsl:value-of select="./@year"/>
    </xsl:attribute>
    The title of this book is
    <resultTitle>
      <xsl:value-of select="./title"/>
    </resultTitle>
    and it was....
  </resultBook>
</xsl:template>

Page 125: XML Data: From Research to Standards

Instantiating an <xsl:template>

• ... on a node N:
» returns the content of the template where the <xsl:> elements from the content of the template have been replaced with the result of their “evaluation” (with the current node = N)
» Two types of <xsl:> elements in the content:

1. Instruction elements
» <xsl:copy>, <xsl:copy-of>, <xsl:value-of>, <xsl:for-each>
» return a certain list of nodes according to their particular semantics

2. Rule control elements
» <xsl:apply-templates>, <xsl:call-template>
» recursive calls to the rule engine (see below)

• Maps an XML node into a list of XML nodes

Page 126: XML Data: From Research to Standards

<xsl:template> example

<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear">
      <xsl:value-of select="./@year"/>
    </xsl:attribute>
    The title of this book is
    <resultTitle>
      <xsl:value-of select="./title"/>
    </resultTitle>
    and it was....
  </resultBook>
</xsl:template>

Page 127: XML Data: From Research to Standards

Example of instantiation

Input XML

<book ISBN="10" year="1967">
  <title>The politics of experience</title>
  <author>R.D.Laing</author>
  <section> The great and tr
    <title>Persons and experience</title>
    <section> Exploitation must not been….
    </section>
  </section>
</book>

Output XML

<resultBook resultYear="1967">
  The title of this book is
  <resultTitle>
    The politics of experience
  </resultTitle>
  and it was ….
</resultBook>

Page 128: XML Data: From Research to Standards

Recursive <xsl:template>

<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear">
      <xsl:value-of select="./@year"/>
    </xsl:attribute>
    <resultTitle>
      <xsl:value-of select="./title"/>
    </resultTitle>
    <xsl:apply-templates select="./section"/>
  </resultBook>
</xsl:template>

Invokes the procedure PL with current list = “./section”.

Page 129: XML Data: From Research to Standards

Recursive calls

• <xsl:apply-templates>
– recursively invokes the procedure PL
– the argument is a new list of nodes
» explicitly specified in the select attribute
» by default, the list of children of the current node

<xsl:apply-templates select="./section"/>

• <xsl:call-template>
– triggers the instantiation of a specific template identified by name
– does not change the context node and the context list

<xsl:call-template name="myTemplate"/>

Page 130: XML Data: From Research to Standards

XSLT execution control

<xsl:stylesheet>
  <xsl:template name="myTemplate">
    <xsl:apply-templates select="./ancestor::book"/>
  </xsl:template>

  <xsl:template match="section">
    This is a section of the book
    <xsl:call-template name="myTemplate"/>
    and its name is
    <xsl:value-of select="./title"/>.
  </xsl:template>

  <xsl:template match="book">
    <xsl:value-of select="./title"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="//section[title]"/>
  </xsl:template>
</xsl:stylesheet>

Page 131: XML Data: From Research to Standards

Built-in templates

<xsl:template match="*|/">
  <xsl:apply-templates select="./node()"/>
</xsl:template>
(if element: apply recursively on the children)

<xsl:template match="@*|text()">
  <xsl:value-of select="."/>
</xsl:template>
(if text node or attribute: print the content)

<xsl:template match="processing-instruction()|comment()"/>
(if processing instruction or comment: ignore, do nothing)

Page 132: XML Data: From Research to Standards

TOC of a certain book

<xsl:template match="/">
  <xsl:apply-templates select="//book[@ISBN=10]"/>
</xsl:template>

<xsl:template match="book">
  <xsl:apply-templates select="./section"/>
</xsl:template>

<xsl:template match="section">
  Section <xsl:value-of select="title"/>
  <xsl:apply-templates select="./section"/>
</xsl:template>

Page 133: XML Data: From Research to Standards

XSLT

• Like XQuery, it describes general XML to XML transformations

• Built-in processing model

• Full recursion

• Possible to write non-terminating programs even on trees

• XSLT vs. XQuery
– same expressive power
– differences: programming style, XML vs. non-XML syntax

• Could be considered as a query language

• Is it “declarative” ?

Page 134: XML Data: From Research to Standards

Part IV: Data Manipulation Language

Page 135: XML Data: From Research to Standards

Query languages for XML

• problem definition

• overview of different approaches

• overview of representative research languages
– query languages for semistructured data
– research and industry query languages for XML

• status of the XML Query Working Group
– XML Query Algebra (working draft)
– XQuery: a query language for XML (working draft)

Page 136: XML Data: From Research to Standards

In search of a query language...

• What do we call a query language?

The language used to describe, in a declarative fashion, the mapping from an input instance of the data model to an output instance of the data model.

Page 137: XML Data: From Research to Standards

XML vs. graph-based models

• XML document content could be modeled as a graph
– components (elements, attributes) in a hierarchical structure

• ...but XML is more complicated than that
– several distinct types of nodes
» text, elements, attributes, comments, processing instructions, etc.
– some parts are ordered (e.g. children of an element) and some other parts are not ordered (e.g. attributes)
– in the absence of a DTD or schema, the document is a tree; otherwise it could be a graph

• We will consider not only XML query languages, but also query languages for graph-based data

Page 138: XML Data: From Research to Standards

Some relevant query languages

• Query languages for graph data
e.g. GOOD, GraphLog, Clean

• Query languages for the Web
e.g. WebSQL, WebOQL

• Query languages for semi-structured data
e.g. MSL, UnQL, StruQL

• Research query languages for XML
e.g. XML-QL, Lorel, YATL, XML-GL, Quilt, XDuce

• Industry query languages for XML
e.g. XQL, OQL extensions to query SGML documents

• Standard processing languages for XML (W3C standards)
e.g. XPath, XSLT

“XML Query Languages: Experiences and Exemplars”, M. Fernandez, J. Simeon, P. Wadler

“Comparative Analysis of Five XML Query Languages”, Angela Bonifati, Stefano Ceri

Page 139: XML Data: From Research to Standards

XML languages: the big picture

[Figure: query languages placed by data model (simple graphs, idealized XML data model, real XML) and by expressive power (navigation & selection; SPJ + RegExp; SPJ + RegExpr + grouping; OQL + RegExpr; OQL + conditional + full recursion). Languages shown: UnQL (1), XML-QL (2), Lorel (3), YATL (4), XPath (5), XQuery (6), XSLT (7).]

Page 140: XML Data: From Research to Standards

DDL Roadmap

3.1. XPath
> Building block for several other languages

3.2. XQuery and the XML Query Algebra
> Both working drafts
> Design based on requirements and use cases

3.3. Other languages and features
> XML-QL, Lorel, YATL, XDuce, etc.
> Focusing on specific features

3.4. XSLT
> Already a W3C recommendation
> Already widely used

Page 141: XML Data: From Research to Standards

XPath: Overview

• Syntax for XML document navigation and node selection

• Papers:
– “XML Path Language (XPath)”, W3C recommendation

• Building block for other W3C activities:
– XSL Transformations (XSLT)
– XML Link (XLink)
– XML Pointer (XPointer)
– XML Query (XQuery)

Page 142: XML Data: From Research to Standards

XPath Expressions

• A query is an expression (Location Path)
– describes a single navigation path in an XML document

• A query simply selects a list of nodes from the input document

• A Location Path consists of:
– a context node
– a series of Location Steps separated by /

• A verbose Location Step consists of:
– an axis, a node test, a list of predicates

document("bib.xml") / child::book [./attribute::ISBN=10] / descendant::section [position()=1]
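
The location path above can be followed step by step with Python's standard `xml.etree.ElementTree`, which supports only a small subset of XPath. The sketch below is an approximation under stated assumptions: the sample document `BIB` is made up for illustration, and the `position()=1` predicate is applied in Python rather than inside the path.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the shape used by the slides.
BIB = """<bib>
  <book ISBN="10" year="1967">
    <title>The politics of experience</title>
    <section><title>Persons and experience</title>
      <section><title>Nested</title></section>
    </section>
    <section><title>Second</title></section>
  </book>
  <book ISBN="11" year="1960"><title>The divided self</title></book>
</bib>"""

root = ET.fromstring(BIB)                 # context node: the <bib> element
book = root.find("book[@ISBN='10']")      # child::book [./attribute::ISBN=10]
sections = book.findall(".//section")     # descendant::section, in document order
first_section = sections[0]               # [position()=1]
```

Each location step narrows the node list produced by the previous one, which is why the path reads left to right as a single navigation.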

Page 143: XML Data: From Research to Standards

XPath

• Location step:
– an axis, a node test, a list of predicates

• 13 Axes:
– ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self

• Node Test:
– name test (e.g. section, *, myNs:myTag)
– type test (e.g. text(), comment(), node())

document("bib.xml") / child::bib / child::* [./attribute::ISBN=10] / descendant::section [position()=1] / child::comment()

Page 144: XML Data: From Research to Standards

XPath abbreviated syntax

Abbreviated            Verbose
book                   CN/child::book
book/@ISBN             CN/child::book/attribute::ISBN
section[1]             CN/child::section[position()=1]
.                      CN
..                     CN/parent::*
../text()              CN/parent::*/child::text()
//section              ROOT/descendant-or-self::section
/section               ROOT/child::section
//                     ROOT/descendant-or-self::*
//section[last()]      ROOT/descendant-or-self::section[position()=last()]

(CN = context node, ROOT = document root)

//section [5] [title="introduction"]
//section [title="introduction"] [5]

Page 145: XML Data: From Research to Standards

Semantic aspects of XPath

• Data model has 4 types:
– untyped set, string, integer, Boolean

• The evaluation uses implicit type casting rules:

/person[child/age = 19]        implicit existential quantifier
/person[child/age + 1 = 20]    the age of the first child equals 19
/book[@year]                   implicit existential quantifier
/book[@year+1-1]               two type conversions: string->int, int->Boolean; will return a book written in 1999 if it happens that this is the 1999th book in the document
/book[title=""]                empty set to string conversion; returns also the books without a <title> sub-element

preceding::foo[1] and (preceding::foo)[1] are not the same
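
The last remark, that `preceding::foo[1]` and `(preceding::foo)[1]` differ, can be illustrated with plain Python lists. This is a sketch under simplifying assumptions: the node names are invented, the document is flattened into one list in document order, and ancestors (which the preceding axis excludes) are ignored.

```python
# Invented nodes in document order; the context node is the last one.
document_order = ["foo#1", "bar#1", "foo#2", "foo#3", "context"]

def preceding_foo(context_index):
    """Nodes named foo that come before the context node, in document order."""
    return [n for n in document_order[:context_index] if n.startswith("foo")]

ctx = document_order.index("context")

# preceding::foo[1]: the predicate belongs to the step, and on a reverse axis
# positions are counted backwards from the context node.
step_predicate = list(reversed(preceding_foo(ctx)))[0]

# (preceding::foo)[1]: the parenthesised node set is put in document order
# before the predicate is applied.
filter_predicate = preceding_foo(ctx)[0]
```

Here `step_predicate` is the nearest preceding foo, while `filter_predicate` is the first foo in the document.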

Page 146: XML Data: From Research to Standards

XML Query Working Group

• XML Query Requirements (WD)
– What should be achieved with the language

• XML Query Use Cases (WD)
– Many examples of queries for a lot of applications

• XML Query Algebra (WD)
– Formal basis for the language(s)

• XQuery: a traditional syntax (WD)

• An XML syntax (not here yet)

Page 147: XML Data: From Research to Standards

XML Query Requirements

• Declarative

• Expressive (joins, manipulation of documents, etc.)
– Supporting both database applications and document applications

• Formally specified
– Precise semantics

• Two syntaxes: ‘user-readable’ and XML

• Should allow updates in the future

Page 148: XML Data: From Research to Standards

XML Query Use Cases

• Illustrate the query language with examples
– Access to relational databases
– Access to documents
– Full-text queries
– Recursive queries
– Queries that use references
– Metadata querying
– Etc.

• Decide what XQuery should and should not do
– Make the 80/20 cut

• ‘Benchmark’ for the language design
– Important queries should be easy to write

Page 149: XML Data: From Research to Standards

XML Query Algebra

• Based on the XML Query Data Model

• ‘Minimal’ set of operations

• Static semantics (type checking)
– Can infer the type of your query

• Dynamic semantics (result of the query)

• Expressive enough to support XQuery
– Iteration (and join)
– Navigation
– Functions with full recursion

• Contains a tutorial on types and expressions

Page 150: XML Data: From Research to Standards

Static semantics for path expressions

”Retrieve the titles of all the books.”

• Input:
type Bib = bib [ Book* ]
type Book = book [ title [ String ], year [ Integer ], author [ String ]* ]

• Query: document("bib0.xml")/book/title

• Result:
<title>Data on the Web</title>
<title>Foundations of Databases</title>
: title [ String ]*

Page 151: XML Data: From Research to Standards

Static semantics for the iteration

Example: ”Retrieve all the books written before 1967.”

• Query:
for $v in document("bib0.xml")/book
return if $v/year < 1967 then $v else []

• Result:
<book>…..</book>
<book>…..</book>
: book [ title [ String ], year [ Integer ], author [ String ]* ]

Page 152: XML Data: From Research to Standards

XQuery

• First Working Draft in February

• Coming from work on Quilt
– Already a number of test implementations

• Supports XML Query use cases

• Draft of semantics on top of the XML Query Algebra

• Test parsers are available

Page 153: XML Data: From Research to Standards

XQuery

• Data model:
– the XML Query working group data model

• Language description:
– borrows features from OQL, XML-QL, Lorel, XQL, ML
– like ML, OQL, Lorel: it is a functional language
– includes a subset of XPath as a sub-language
– like ML, it uses IF-THEN-ELSE and LET constructs
– like YATL, it uses local function definitions
– like XQL, it uses BEFORE and AFTER operators (global topological order of the XML document)
– new FILTER operator to do projection while preserving the hierarchy and the order

Page 154: XML Data: From Research to Standards

XQuery

• A query := a list of local function definitions + the main expression to evaluate

• An XQuery expression:
– constant (all XML Schema atomic types)
– variable
– f(exp1, ..., expN)
» +, -, and, or, union, intersection, etc.
– LET var := expr1 in expr2
– XPath expression (for navigation)
– FLWR expression
– SORT expr1 by expr2
– XML node constructors (elements, attributes, etc.)

Page 155: XML Data: From Research to Standards

XPath in XQuery

• Query1: “Retrieve the titles of all the books written before 1967.”

document("bib.xml")//book[@year<1967]/title

• An XPath expression is an XQuery expression

• Returns the selected forest of the input document

• XPath queries can be used as building blocks for more complex expressions

Page 156: XML Data: From Research to Standards

FLWR expressions

• Query1: “Retrieve the titles of the books written by Laing before 1967, together with their reviews.”

FOR $b in document("bib.xml")//book[@year<1967],
    $r in document("reviews.xml")//review
WHERE $b/authors/lastname="Laing" and $b/@ISBN=$r/@ISBN
RETURN
  <resultBook ISBN=$b/@ISBN>
    <title> $b/title/text() </title>,
    $r
  </resultBook>
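
Read operationally, the FLWR expression above is a nested iteration with a filter and a constructor. The Python sketch below mirrors that reading with the standard `xml.etree.ElementTree`; it is an illustration only, not a description of how XQuery is evaluated. The two sample documents and the function name `flwr` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up sample inputs standing in for bib.xml and reviews.xml.
BIB = ET.fromstring(
    '<bib>'
    '<book ISBN="10" year="1960"><title>The divided self</title>'
    '<authors><lastname>Laing</lastname></authors></book>'
    '<book ISBN="20" year="1995"><title>Foundations of Databases</title>'
    '<authors><lastname>Abiteboul</lastname></authors></book>'
    '</bib>')
REVIEWS = ET.fromstring(
    '<reviews><review ISBN="10">A classic.</review>'
    '<review ISBN="20">Thorough.</review></reviews>')

def flwr(bib, reviews):
    results = []
    for b in bib.iter("book"):                          # FOR $b in ...//book
        if int(b.get("year")) >= 1967:                  # [@year<1967]
            continue
        for r in reviews.iter("review"):                # FOR $r in ...//review
            # WHERE $b/authors/lastname="Laing" and $b/@ISBN=$r/@ISBN
            by_laing = any(n.text == "Laing" for n in b.findall("authors/lastname"))
            if by_laing and b.get("ISBN") == r.get("ISBN"):
                out = ET.Element("resultBook", ISBN=b.get("ISBN"))   # RETURN
                ET.SubElement(out, "title").text = b.findtext("title")
                out.append(r)
                results.append(out)
    return results

joined = flwr(BIB, REVIEWS)
```

With the sample data, only the book with ISBN 10 satisfies both the year predicate and the join condition, so `joined` holds one `resultBook` element.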

Page 157: XML Data: From Research to Standards

Local variables

• Query1: “Retrieve the titles of the books written by Laing before 1967, together with their reviews.”

FOR $b in document("input.xml")//book[@year<1967]
LET $R := document("input.xml")//review[@ISBN=$b/@ISBN]
WHERE $b/authors/lastname="Laing"
RETURN
  <resultBook ISBN=$b/@ISBN>
    <resultTitle> $b/title/text() </resultTitle>
    <bookReviews> $R </bookReviews>
  </resultBook>

Page 158: XML Data: From Research to Standards

Global order operators

• Query4: “Retrieve the titles of the first 4 sections (and of their subsections) of a specific book.”

LET $b := /bib/book[@ISBN=10] IN
  $b//section/title BEFORE $b/section[5]

– $b: the book with ISBN = 10
– $b//section/title: the list of all the titles of the sections of the book $b
– $b/section[5]: the fifth section of the book $b
– result: the list of all the titles that appear before the fifth section (in the global topological order of the document)

Page 159: XML Data: From Research to Standards

FILTER

• Query1: “Retrieve the table of contents of a specific book”

document("input.xml")//book[@ISBN=10]
FILTER //book | //section | //title | //section/title/text()

• Erase all the nodes from the input document except the book element, the section elements, the section titles and their text content

• For the remaining nodes, preserve their relative order and their hierarchical structure.
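
The projection described by the two bullets above can be sketched as a recursive copy that drops unwanted nodes and promotes their kept descendants. This is a hedged illustration, not the XQuery definition of FILTER: `KEEP_TAGS`, `filter_tree` and the sample `BOOK` are hypothetical, and text is kept only for titles directly under a section, following the path list literally.

```python
import xml.etree.ElementTree as ET

KEEP_TAGS = {"book", "section", "title"}   # //book | //section | //title

def filter_tree(node, parent_tag=None):
    """Return the projected copies of node, preserving order and nesting."""
    kept_children = []
    for child in node:
        kept_children.extend(filter_tree(child, node.tag))
    if node.tag not in KEEP_TAGS:
        return kept_children               # drop the node, promote kept descendants
    copy = ET.Element(node.tag)
    if node.tag == "title" and parent_tag == "section":
        copy.text = node.text              # //section/title/text()
    copy.extend(kept_children)
    return [copy]

# Made-up sample in the shape of the slides' bibliography.
BOOK = ET.fromstring(
    '<book ISBN="10" year="1967"><title>The politics of experience</title>'
    '<author><lastname>Laing</lastname></author>'
    '<section><title>Persons and experience</title>The great and true Amphibian'
    '<section>Exploitation must not ....</section></section></book>')

toc = filter_tree(BOOK)[0]
```

The author element disappears, while the nested section stays below its parent section, so the relative order and the hierarchy of the remaining nodes are unchanged.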

Page 160: XML Data: From Research to Standards

FILTER example

Input:

<?xml version="1.0"?>
<bib>
  …
  <book ISBN="10" year="1967">
    <title>The politics of experience</title>
    <author>
      <firstname>R.D.</firstname>
      <lastname>Laing</lastname>
    </author>
    <section>
      <title>Persons and experience</title>
      The great and true Amphibian
      <section>
        Exploitation must not ....
      </section>
    </section>
  </book>
  …
</bib>

Output:

<?xml version="1.0"?>
<book>
  <title>The politics of experience</title>
  <section>
    <title>Persons and experience</title>
    <section> ..................... </section>
  </section>
</book>

Page 161: XML Data: From Research to Standards

XQuery: conclusion

• XQuery design goals:
– learn from previous experience
– keep it simple
– make sure it is useful
– make sure it is semantically clean :)

• Still many issues:
– Which additional features to add (full regular expressions, text operators, etc.)
– Relationship with XPath
– Relationship with XML Query Algebra
– Relationship with XML Schema

Page 162: XML Data: From Research to Standards

UnQL (1)

• Authors:
– P. Buneman, D. Suciu, M. Fernandez

• Papers:
– “UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion”, P. Buneman, M. Fernandez and D. Suciu, VLDB Journal 9(1), 2000.
– More information at: http://www.research.att.com/~suciu/unql-home.html

Page 163: XML Data: From Research to Standards

UnQL(2)

• Initial data model:
– trees with labeled edges and labeled leaves

• A query = a function
– takes a tree as input and returns a tree as output

• Language description:
– based on structural recursion

“The form of the program follows the form of the data.”

Page 164: XML Data: From Research to Standards

UnQL tree data model

• 4 constructs to build a tree
(1) the empty set is a tree (with no nodes and no edges)
(2) if V is a value then {V} is a tree (leaf node)
(3) if T is a tree and L is a label then {L:T} is a tree (edge construction)
(4) if T1 and T2 are trees then T1 U T2 is a tree (union)

{book : {title: "The divided self"} {author: "R.D.Laing"} {publisher: "Pantheon Books"}}

[Figure: the bib tree, with book edges leading to title, author and publisher edges; leaf values “The divided self”, “R.D. Laing”, “Pantheon Books”]

Page 165: XML Data: From Research to Standards

UnQL query language

• A query = a function

• A function = an ordered set of rules

• A rule:
– left-hand side:
» a pattern: when the rule has to be applied
– right-hand side:
» an expression that describes how to create the resulting tree

• 4 types of patterns
F({"a"}) = {"A"}
F({"b": T}) = {"B": F(T)}

• Syntactic restrictions on the expression in the right-hand side in order to guarantee nice behavior

Page 166: XML Data: From Research to Standards

UnQL in action (3)

• Query1: “Retrieve the titles of all the books.”

Specific rules:
F({L:T}) = if L="title" then {"result":T} else F(T)

Fixed in the language:
F(T1 U T2) = F(T1) U F(T2)
F({}) = {}

[Figure: the bib tree (book → title, author, publisher; “The divided self”, “R.D. Laing”, “Pantheon Books”)]
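
The rules for Query1 can be transcribed almost directly into a recursive Python function. This is a minimal sketch under assumptions that are not in the slides: a tree is a Python list whose items are either `(label, subtree)` edges or bare leaf values, the function name `titles` is hypothetical, and a leaf value is assumed to contribute nothing because Query1 gives no rule for it.

```python
def titles(tree):
    """Structural recursion for Query1 over a list-of-edges tree."""
    result = []                       # F({}) = {}
    for item in tree:                 # F(T1 U T2) = F(T1) U F(T2)
        if isinstance(item, tuple):
            label, subtree = item
            if label == "title":
                result.append(("result", subtree))   # {"result": T}
            else:
                result.extend(titles(subtree))       # F(T)
        # Assumption: a leaf value {V} yields the empty tree for this query.
    return result

# The sample tree from the UnQL data model slide, in the list encoding.
BIB = [("bib", [("book", [("title", ["The divided self"]),
                          ("author", ["R.D. Laing"]),
                          ("publisher", ["Pantheon Books"])])])]

found = titles(BIB)
```

The loop over items plays the role of the two rules that are fixed in the language, so only the edge rule is specific to the query, which is the sense in which the form of the program follows the form of the data.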

Page 167: XML Data: From Research to Standards

UnQL in action (4)

• Query2: “Copy the document while translating the edge labels into French and omitting the sections and their descendants.”

F(T1 U T2) = F(T1) U F(T2)
F({}) = {}
--------------------
F({"book":T}) = {"livre": F(T)}
F({"title":T}) = {"titre": F(T)}
F({"year":T}) = {"annee": F(T)}
F({L:T}) = {}
F({V}) = V

[Figure: F maps the input tree T (edge book) to the output tree F(T) (edge livre)]

Page 168: XML Data: From Research to Standards

Alternative SELECT-WHERE syntax

• Query2: “Copy the books written before 1967 while translating the edge labels into French and omitting the sections and their descendants.”

SELECT {livre: {titre: T}
               {annee: Y}
       }                          /* output tree pattern */
WHERE {bib: {book: {title: T}
                   {year: Y}
      }} in db,                   /* input tree pattern */
      Y < 1967

• Can be translated into the previous formalism

[Figure: table of bindings for the variables T and Y]

Page 169: XML Data: From Research to Standards

Vertical regular expressions

• Introduced by POQL (INRIA)

• Query4: “Retrieve the books that have a section or a chapter entitled ‘Persons and experience’”

SELECT {title: T}
WHERE {bib: {book: {title: T}
                   {(section | chapter)*.title: "Persons and experience"}
      }} in db

• Any regular expression can be expressed using structural recursion

Page 170: XML Data: From Research to Standards

Cyclic data in UnQL

[Figure: the bib graph (book → title, author, publisher; labels include “The divided self”, “R.D. Laing”, “Western studies”) extended with citation edges between books, which create cycles]

• Normal evaluation would create infinite loops

• Two (equivalent) solutions:
– memoization (do not visit the same node twice)
– bulk semantics (apply the function on each edge in parallel and group the resulting graph at the end)

F({"title":T}) = {"result":T}
F({L:T}) = F(T)
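
The memoization option can be sketched as a graph traversal that never expands a node twice. The encoding below is an assumption for illustration: the graph is a dictionary from node identifiers to lists of `(label, target)` edges, the identifiers `b1` and `b2` and the function name `collect_titles` are hypothetical, and the sample titles are only meant to show two books citing each other.

```python
def collect_titles(graph, start):
    """Apply the two rules above while visiting each node at most once."""
    visited = set()
    results = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue                      # memoization: skip already expanded nodes
        visited.add(node)
        for label, target in graph.get(node, []):
            if label == "title":
                results.append(("result", target))   # F({"title":T}) = {"result":T}
            else:
                stack.append(target)                 # F({L:T}) = F(T)
    return results

# Two books that cite each other, which makes the data cyclic.
GRAPH = {
    "bib": [("book", "b1"), ("book", "b2")],
    "b1": [("title", "The divided self"), ("citation", "b2")],
    "b2": [("title", "Western studies"), ("citation", "b1")],
}

cyclic_titles = collect_titles(GRAPH, "bib")
```

Without the `visited` set the citation edges would send the traversal round the cycle forever; with it, each title is reported exactly once.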

Page 171: XML Data: From Research to Standards

UnQL: final conclusion

• Structural recursion as a programming style

• Defined on trees but also on cyclic data

• Well-defined semantics

• Well-studied properties
– expressive power (FO+TC)
– computable in PTIME
– compositional: q1 o q2 = q3
– allows for traditional optimization
– structural recursion guarantees termination even for cyclic data

• Very interesting study but not usable as such for XML. XML is not a simple graph.

Page 172: XML Data: From Research to Standards

XML-QL (1)

• Authors:
– A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu

• Papers:
– “XML-QL: a Query Language for XML”, A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu, Proc. Int. Conf. of WWW, 1999.

• Implementation:
– available at http://www.research.att.com/~mff/xmlql/doc
– home-grown main memory XML data repository
– query optimizer and execution engine

Page 173: XML Data: From Research to Standards

XML-QL (2)

• Data model:
– node and edge labeled graph (elements & attributes)
– a (totally) ordered or a (totally) unordered graph

• Language description:
– WHERE clause to bind variables and to test predicates
– CONSTRUCT clause to create new XML structures

• Features:
– as UnQL: XML patterns for both the WHERE clause and the CONSTRUCT clause
– as UnQL: regular expressions for navigation
– in addition: joins on multiple input sources
– in addition: Skolem functions to create nested structures

Page 174: XML Data: From Research to Standards

XML patterns

• Query1: “Retrieve the titles of the books written by Laing before 1967”

WHERE <bib>
        <book year=$y ISBN=$isbn>
          <title> $t </title>
          <author> <lastname>Laing</lastname> </author>
        </book>
      </bib> in "bib.xml", $y < 1967
CONSTRUCT <resultBook ISBN=$isbn>
            <resultTitle> $t </resultTitle>
          </resultBook>

[Figure: table of bindings for the variables $y, $isbn and $t]

Page 175: XML Data: From Research to Standards

Joins in XML-QL

• Query2: “Retrieve all the reviews about books written by Laing.”

WHERE <bib>
        <book ISBN=$i>
          <author> <lastName>Laing</lastName> </author>
        </book>
      </bib> in "bib.xml",
      <reviews>
        <review ISBN=$i> </review> ELEMENT_AS $e
      </reviews> in "reviews.xml"
CONSTRUCT $e

Page 176: XML Data: From Research to Standards

Outer-joins in XML-QL

• Using nested queries

• Query3: “Retrieve the titles of the books written by Laing before 1967, together with their reviews (if any).”

WHERE <bib>
        <book year=$y ISBN=$i>
          <title> $t </title>
          <author> <lastname>Laing</lastname> </author>
        </book>
      </bib> in "bib.xml", $y < 1967
CONSTRUCT <resultBook ISBN=$i>
            <title> $t </title>,
            ( WHERE <reviews>
                      <review ISBN=$i> </review> ELEMENT_AS $r
                    </reviews> in "reviews.xml"
              CONSTRUCT $r )
          </resultBook>

Outer-join semantics.
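
The outer-join behaviour comes from nesting the second query inside the constructor: a book is emitted even when the inner query finds nothing. A small Python sketch makes this visible; the dictionary encoding, the function name `outer_join` and the sample records (including the book with ISBN 12) are assumptions made up for illustration.

```python
def outer_join(books, reviews):
    """One result per qualifying book; the review list may be empty."""
    results = []
    for b in books:                                   # outer WHERE
        if b["author"] == "Laing" and b["year"] < 1967:
            # Nested WHERE: evaluated once per binding of the outer query.
            matches = [r["text"] for r in reviews if r["isbn"] == b["isbn"]]
            results.append({"isbn": b["isbn"], "title": b["title"],
                            "reviews": matches})
    return results

# Made-up sample data.
BOOKS = [
    {"isbn": "10", "title": "The divided self", "author": "Laing", "year": 1960},
    {"isbn": "12", "title": "Hypothetical title", "author": "Laing", "year": 1961},
    {"isbn": "20", "title": "Foundations of Databases", "author": "Abiteboul", "year": 1995},
]
REVIEWS = [{"isbn": "10", "text": "A classic."}]

joined_books = outer_join(BOOKS, REVIEWS)
```

The book with ISBN 12 has no review but still appears, with an empty review list, which is what distinguishes this from the plain join on the previous slide.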

Page 177: XML Data: From Research to Standards

Meta-data queries

• Query4: “Which kinds of elements can be found in the content of the element corresponding to the book with ISBN=10?”

WHERE <bib>
        <book ISBN="10"> <$tagName> </> </book>
      </bib> in "bib.xml"
CONSTRUCT <result> $tagName </result>

Fusion using Skolem functions

• Fusion introduced by MSL (TSIMMIS)
• Query5: "Retrieve the titles of all the books, grouped first by year and then by publisher".

    WHERE <bib>
            <book year=$y>
              <title> $t </title>
              <publisher> $p </publisher>
            </book>
          </bib>
    CONSTRUCT <bookPerYear id=F1($y)>
                <bookPerYear&Publisher id=F2($y,$p)>
                  <bookTitle> $t </bookTitle>
                </bookPerYear&Publisher>
              </bookPerYear>

Automatic fusion of all the bookPerYear elements with the same id attribute.

[Figure: table of variable bindings with columns $y, $p, $t]
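Fusion can be pictured as a dictionary lookup: each Skolem term (F1($y), F2($y,$p)) is a key, and every binding that produces the same key contributes to the same node. The sketch below uses nested Python dicts for this; the function name `group_titles` and the sample bindings are assumptions for illustration only.

```python
def group_titles(bindings):
    """Group ($y, $p, $t) bindings the way Query5 fuses elements.

    Keys play the role of Skolem terms: ("F1", y) identifies a bookPerYear
    node and ("F2", y, p) a bookPerYear&Publisher node inside it.
    """
    per_year = {}
    for y, p, t in bindings:
        # Same id => same node: setdefault returns the existing node if any.
        year_node = per_year.setdefault(("F1", y), {})
        publisher_node = year_node.setdefault(("F2", y, p), [])
        publisher_node.append(t)
    return per_year

# Hypothetical bindings; the slides do not list actual rows.
BINDINGS = [
    (1967, "Pantheon Books", "A"),
    (1967, "Pantheon Books", "B"),
    (1967, "Penguin", "C"),
    (1960, "Tavistock", "D"),
]
print(group_titles(BINDINGS))
```

Because F2 takes both $y and $p, each inner node has exactly one parent and the result is a tree. Dropping $y from F2 would make one publisher node reachable from several year nodes.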

Skolem functions issues

• Query5: "Retrieve the titles of the books published by "Pantheon Books", grouped by year and by publisher".

    WHERE <bib>
            <book year=$y>
              <title> $t </title>
              <publisher> $p </publisher>
            </book>
          </bib>
    CONSTRUCT <bookPerYear id=F1($y)>
                <bookPerYear&Publisher id=F2($p)>
                  <bookTitle> $t </bookTitle>
                </bookPerYear&Publisher>
              </bookPerYear>

Creates graphs with cycles and sharing.
Several possible XML serializations.

Skolem functions issues

• Query5: "Retrieve the titles of the books published by "Pantheon Books", grouped by year and by publisher".

    WHERE <bib>
            <book year=$y>
              <title> $t </title>
              <publisher> $p </publisher>
            </book>
          </bib>
    CONSTRUCT <bookPerYear id=F1($y)>
                <newElement> We have an order problem </newElement>
                <bookPerYear&Publisher id=F2($y, $p)>
                  <bookTitle> $t </bookTitle>
                </bookPerYear&Publisher>
              </bookPerYear>

Creates graphs with cycles and sharing.
Several possible XML serializations.

XML-QL: final conclusion

• Advantages:
– XML templates look very familiar
– can express selection, projection, join, grouping
– can construct deeply nested XML elements

• Limitations:
– problems with the semantics of Skolem functions:
» order
» nested Skolem functions
– preserving structure and hierarchy is difficult
– no disjunction, aggregates, quantifiers, etc.
– data model ignores some important XML details

Lorel

• Authors:
– S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener

• Paper:
– "The Lorel Query Language for Semistructured Data", S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener, Journal of Digital Libraries, 1(1), 1997
– Semistructured data (OEM), reconverted to XML

• Lorel is an extension of OQL for OEM:
– functional language
– applies type coercion (relaxes the strong typing constraint of OQL)
– performs path navigation with full regular expressions
– adds an XML element creation operator
– adds Skolem functions for grouping

OQL-like queries for XML

• Query1: "Retrieve the books written by Laing before 1967."

    SELECT xml(result: $b)
    FROM $b in bib.book
    WHERE $b.author.lastname?="Laing" and $b.@year<1967

• UnQL & XML-QL vs. Lorel:
– No more patterns and pattern matching but path expressions.
– Different syntax. Equivalent expressive power.

Type coercion

• Query1: "Retrieve the books written by Laing before 1967."

    SELECT xml(result: $b)
    FROM $b in bib.book
    WHERE $b.author.lastname="Laing" and $b.@year<1967

  is interpreted, after type coercion, as:

    SELECT xml(result: $b)
    FROM $b in bib.book
    WHERE exists $l in $b.author.lastname?: $l="Laing" and
          real($b.@year) < real(1967)

Type coercion in Lorel

• Basic comparison operators for atomic types
– conversion to the most general type (real)

• Coercion for equality
– "set=value" => existential quantifier
– "set=atomic object" => existential quantifier
– "set, value=complex object" => false
– complex object equality defined recursively

• Example: price="12.5" satisfies price<13 but not price<"013"

• Traditional operators lose their convenient properties (transitivity, distributivity, etc.)

• Problem for query processing!
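The price example can be reproduced with a small Python sketch. It is a deliberately simplified model of the rules listed above, not Lorel's full coercion table: the helper names `coerce_less_than` and `set_equals` are hypothetical, and the rule "two strings compare as strings, otherwise convert both to real" is an assumption inferred from the slide's example.

```python
def coerce_less_than(left, right):
    """'<' with coercion: strings compare as strings, otherwise both
    operands are converted to the most general type (real)."""
    if isinstance(left, str) and isinstance(right, str):
        return left < right
    return float(left) < float(right)

def set_equals(values, atom):
    """'set = value' read as an existential quantifier."""
    return any(v == atom for v in values)

price = "12.5"
print(coerce_less_than(price, 13))      # numeric comparison: 12.5 < 13.0
print(coerce_less_than(price, "013"))   # string comparison: "12.5" < "013"
print(set_equals(["Laing", "Cooper"], "Laing"))
```

The first comparison holds and the second does not, even though 13 and "013" denote the same number after coercion. This is the kind of lost regularity that makes rewriting and optimizing such predicates harder.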

Lorel: final conclusion

• Extends OQL in the following way:
– relaxes the strong typing constraint (type coercion)
– adds regular path expressions for the navigation
– adds Skolem functions

• Advantages:
– builds on a powerful and well defined language (OQL)
– type coercion deals with irregular data

• Limitations:
– type coercion is not always good
– data model ignores some important XML details

YATL

• Authors: Jérôme Siméon, Sophie Cluet

• Papers: "Your Mediators Need Data Conversion!", SIGMOD 1998; "The New YATL: Design and Specifications", INRIA 1999

• Initial goal: data conversion and integration

• Data model: ordered trees, references, node-labeled

• Language description:– like OQL & Lorel: functional language

– like others: database iterator (make...match...where)

– like others: Skolem functions to manipulate references

– pattern matching with horizontal regular expressions

– local functions with full recursive functions for conversions

• Implementation: v1 INRIA in 1998 & v2 Bell Labs in 2000

YATL

• Papers: "Your Mediators Need Data Conversion!", SIGMOD 1998; "The New YATL: Design and Specifications", INRIA 1999

• Initial goal: data conversion and integration

• Data model: ordered trees, references, node-labeled

• Language description:– like OQL & Lorel: functional language

– like others: database iterator (make...match...where)

– like others: Skolem functions to manipulate references

– pattern matching with horizontal regular expressions

– full recursive functions and case expression for conversions

• Implementation: v1 INRIA in 1998 & v2 Bell Labs in 2000

Tree patterns in YATL

• Query1: "Retrieve the titles of the books published in 1967 by 'Pantheon Books'."

    MAKE result[ $t ]
    MATCH "bib.xml" WITH book[ @year[$y],
                               title[$t],
                               publisher[$p] ]
    WHERE $p = "Pantheon Books" and $y = 1967

Different semantics for matching:
• no additional children allowed in a book
• the cardinality of each @year, title and publisher has to be respected
• the order of @year, title and publisher has to be respected

Tree patterns in YATL

• Query1: "Retrieve the titles of the books published in 1967 by 'Pantheon Books'."

    MAKE result[ $t ]
    MATCH input.xml WITH book[ _, @year[$y], _,
                               title[$t], _,
                               publisher[$p], _ ]
    WHERE $p = "Pantheon Books" and $y = 1967

Different semantics for the patterns:
• DO allow additional children in a book
• the cardinality of each @year, title and publisher has to be respected
• the order of @year, title and publisher has to be respected

Horizontal regular expressions

• A Tree Pattern = type expression without union, and with annotated variables ($v)

    type:     book[ title[String], author[String]+, UrTree* ]
    pattern:  book($b)[ title[$t], +author[$a], _ ]

• Query: "Retrieve the first author after the book title".

    MAKE $a
    MATCH book WITH book[ _, title, _, author[$a], *author, _ ]

• Process DTDs like: <!ELEMENT bib’ (title, author+)*>
  Ex: "Create a bibliography for each author"

    MAKE *($a) bib[ author[$a], *title[$t] ]
    MATCH bib’ WITH bib[ *(title[$t], +author[$a]) ]

Recursive functions

• Query1: "Retrieve the table of contents of a book."

• Problem: how to enforce termination?!

    define function toc($b) =
      case $b of
      | title[$t]        -> title[$t]
      | section[*$child] -> section[ *toc($child) ]
      | _[*$child]       -> [ *toc($child) ];

    toc(bib/book);
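The same case analysis can be written as an ordinary recursive function. The Python sketch below is only an analogue of the YATL function, using the standard library's ElementTree and a hypothetical sample book. It recurses on children only (structural recursion), so it always terminates on a finite tree; the termination question on the slide arises because YATL functions are not restricted in this way.

```python
import xml.etree.ElementTree as ET

def toc(node):
    """Return the table-of-contents nodes for `node` as a list."""
    if node.tag == "title":
        # title[$t] -> title[$t]: copy the title.
        copy = ET.Element("title")
        copy.text = node.text
        return [copy]
    if node.tag == "section":
        # section[*$child] -> section[*toc($child)]: keep the section shell.
        out = ET.Element("section")
        for child in node:
            out.extend(toc(child))
        return [out]
    # _[*$child] -> [*toc($child)]: drop the element, keep what its
    # children contribute.
    result = []
    for child in node:
        result.extend(toc(child))
    return result

# Hypothetical sample input.
BOOK = ET.fromstring(
    "<book><title>T</title><author>A</author>"
    "<section>text<title>S1</title>"
    "<section><title>S1.1</title></section></section></book>")

for n in toc(BOOK):
    print(ET.tostring(n, encoding="unicode"))
```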

YATL: final conclusion

• YATL design goals:
– Orthogonal constructs + functional glue
– Regular expressions = XML types = YATL primitive operation
– Recursion and case statement: very expressive to support queries, conversion and integration
– Efficient on the classical database queries

• Open issues:
– no termination!
– optimization of recursion and case?

XSLT(1)

• Paper:
– "XSL Transformations (XSLT)", W3C recommendation

• XML to XML rule based transformation language

• An XSLT program is an XML document itself

[Figure: three copies of the bibliography tree (bib, book, title, author, publisher; "The divided self", R.D. Laing, Pantheon Books), labeled data, transformation and result, with the labels DOM, XML and HTML]

XSLT(2)

• An XSLT program is a valid XML document containing:
– elements in the <xsl:> namespace (i.e. the XSLT statements)
– elements in other namespaces (i.e. the user-defined data)

• The result of the evaluation of an XSLT program on an input XML document := the XSLT document where each <xsl:> element has been replaced with the result of its "evaluation"

• Uses XPath as a sublanguage

• Used mostly as a stylesheet language

XSLT programs

• An XSLT program is an element of type <xsl:stylesheet>. It contains:

1. XSL elements describing rewriting rules
– <xsl:template>

2. XSL elements describing rule execution control
– <xsl:apply-templates>
– <xsl:call-template>

3. XSL elements describing instructions
– <xsl:element>, <xsl:attribute>, <xsl:for-each>, <xsl:if>, <xsl:copy>, <xsl:copy-of>, <xsl:sort>, <xsl:value-of>, etc.

XSLT processing model

• Process an XML document (procedure PD):
1. Apply the procedure PL (below) to a list with a single node: the root of the document

• Process a list L of nodes (procedure PL):
1. Process each node N (procedure P below) in the list (with current node=N and current list=L)
2. Return the concatenation (in the right order) of the partial results

    PL([x1, x2, …, xn]) = [ P(x1), P(x2), …, P(xn) ]

• Process a node N (procedure P):
1. Find all applicable templates to the node N
2. Find the "best" template among them
3. Instantiate the content of the template
4. Return this result
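The three procedures translate almost directly into code. The Python sketch below is a toy model, not an XSLT processor: a template is represented as a hypothetical triple (priority, match predicate, instantiate function), the "best" template is simply the one with the highest priority, and nodes without an applicable template fall back to processing their children (an assumption that loosely mirrors the built-in rule for elements). ElementTree has no separate document node, so the root element stands in for the document root.

```python
import xml.etree.ElementTree as ET

def process_document(root, templates):          # procedure PD
    return process_list([root], templates)

def process_list(nodes, templates):             # procedure PL
    out = []
    for node in nodes:                          # keeps document order
        out.extend(process_node(node, templates))
    return out

def process_node(node, templates):              # procedure P
    applicable = [t for t in templates if t[1](node)]
    if not applicable:
        # Assumed fallback: recurse on the children.
        return process_list(list(node), templates)
    _, _, instantiate = max(applicable, key=lambda t: t[0])
    return instantiate(node, templates)

# Two hypothetical templates producing a flat table of contents.
TEMPLATES = [
    (0, lambda n: n.tag == "section",
     lambda n, ts: ["Section " + n.findtext("title", "")]
                   + process_list(n.findall("section"), ts)),
    (0, lambda n: n.tag == "book",
     lambda n, ts: process_list(n.findall("section"), ts)),
]

DOC = ET.fromstring(
    "<bib><book><title>B</title>"
    "<section><title>S1</title><section><title>S1.1</title></section></section>"
    "<section><title>S2</title></section></book></bib>")

print(process_document(DOC, TEMPLATES))
```

The recursive call to `process_list` inside a template plays the role of <xsl:apply-templates>: the template chooses the new node list, and the engine chooses which template handles each node.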

<xsl:template>

• Basic XSLT concept: describes a rewriting rule

• It has:
– attributes to describe the acceptable input
– content to describe the output

• Attributes:
– match: XPath expression describing the elements to which this template applies
– name: the name of the template rule
– priority: guides the choice of the best template to apply

• The content is a legal XML fragment with:
– Elements from the xsl namespace
– Other elements (user data)

<xsl:template> example

    <xsl:template name="myTemplate" match="book[title]">
      <resultBook>
        <xsl:attribute name="resultYear">
          <xsl:value-of select="./@year"/>
        </xsl:attribute>
        The title of this book is
        <resultTitle>
          <xsl:value-of select="./title"/>
        </resultTitle>
        and it was....
      </resultBook>
    </xsl:template>

Instantiating an <xsl:template>

• ... on a node N:
» returns the content of the template where the <xsl:> elements have been replaced with the result of their "evaluation" (with the current node = N)
» Two types of <xsl:> elements in the content:

1. Instruction elements
» <xsl:copy>, <xsl:copy-of>, <xsl:value-of>, <xsl:for-each>
» return a certain list of nodes according to their particular semantics

2. Rule control elements
» <xsl:apply-templates>, <xsl:call-template>
» recursive calls to the rule engine (see below)

• Maps an XML node into a list of XML nodes

Example of instantiation

Input XML:

    <book ISBN="10" year="1967">
      <title>The politics of experience</title>
      <author>R.D.Laing</author>
      <section> The great and tr
        <title>Persons and experience</title>
        <section> Exploitation must not been….
        </section>
      </section>
    </book>

Output XML:

    <resultBook resultYear="1967">
      The title of this book is
      <resultTitle> The politics of experience </resultTitle>
      and it was ….
    </resultBook>

Recursive <xsl:template>

    <xsl:template name="myTemplate" match="book[title]">
      <resultBook>
        <xsl:attribute name="resultYear">
          <xsl:value-of select="./@year"/>
        </xsl:attribute>
        <resultTitle>
          <xsl:value-of select="./title"/>
        </resultTitle>
        <xsl:apply-templates select="./section"/>
      </resultBook>
    </xsl:template>

Invokes the procedure PL with current list = "./section".

Recursive calls

• <xsl:apply-templates>
– invokes recursively the procedure PL
– the argument is a new list of nodes
» explicitly specified in the select attribute
» by default is the list of children of the current node

    <xsl:apply-templates select="./section"/>

• <xsl:call-template>
– triggers the instantiation of a specific template identified by name
– does not change the context node and the context list

    <xsl:call-template name="myTemplate"/>

XSLT execution control

    <xsl:stylesheet>

      <xsl:template name="myTemplate">
        <xsl:apply-templates select="./ancestor::book"/>
      </xsl:template>

      <xsl:template match="section">
        This is a section of the book
        <xsl:call-template name="myTemplate"/>
        and its name is
        <xsl:value-of select="./title"/>.
      </xsl:template>

      <xsl:template match="book">
        <xsl:value-of select="./title"/>
      </xsl:template>

      <xsl:template match="/">
        <xsl:apply-templates select="//section[title]"/>
      </xsl:template>

    </xsl:stylesheet>

Built-in templates

    <xsl:template match="*|/">
      <xsl:apply-templates select="./node()"/>
    </xsl:template>

– if element (or root): apply recursively on the children

    <xsl:template match="@*|text()">
      <xsl:value-of select="."/>
    </xsl:template>

– if text node or attribute: print the content

    <xsl:template match="processing-instruction()|comment()"/>

– if processing instruction or comment: ignore (do nothing)

TOC of a certain book

    <xsl:template match="/">
      <xsl:apply-templates select="//book[@ISBN=10]"/>
    </xsl:template>

    <xsl:template match="book">
      <xsl:apply-templates select="./section"/>
    </xsl:template>

    <xsl:template match="section">
      Section <xsl:value-of select="title"/>
      <xsl:apply-templates select="./section"/>
    </xsl:template>

XSLT: final conclusion

• Describes general XML to XML transformations

• Built-in processing model

• Full recursion (not only structural recursion like UnQL!)

• Possible to write non-terminating programs even on trees

• XSLT vs. Quilt
– equivalent expressive power
– differences: programming style, XML vs. non-XML syntax

• Could be considered as a query language

• Is it “declarative” ? Should it be a QL candidate?

XML-related research problems (1)

• Update languages for XML

• XML views of object-relational databases

• Storing XML data in object-relational DBMSs
– new challenges for the traditional DBMSs

• Alternative storage methods for XML data

• Indexing XML

• Query processing algorithms for XML data

• Mixing structured search with full-text search

• XML benchmarks

XML-related research problems (2)

• Distributed execution of XML queries

• XML-based information mediation

• XML data cleaning

• XML data compression

• Efficient (streamed) processing of XML transformations

• XML-based information brokering

• XML-based workflow systems

and many more...

Conclusion

• XML is the lingua franca of the Web
• XML is the next big challenge for the database community
• Large quantities of a new type of data
– textual, irregular, self-organizing, distributed, replicated, etc.

• Many orders of magnitude larger:
– the volume of XML data
– the number of XML data repositories

• The need for such a technology is here
• The solutions are not here!
• Myriad of standards and products issued from industry

What is the role of the research?

Typeswitch

• Goal:
– control the evaluation using the type of a certain expression

• Syntax (the "as variable" part is optional):

    typeswitch expression0 [ as variable ]
      case type1 return expression1
      ...
      case typeK return expressionK
      else return expressionK+1

• Semantics:
– compute the dynamic type of expression0
– if the dynamic type of expression0 and typeI (1 ≤ I ≤ K) have a non-empty intersection, the entire expression evaluates to the result of expressionI
– if no case clause satisfies this requirement, return the result of expressionK+1

Typeswitch (2)

• Example:

    for $x in /department[name="operations"]/personnel/*
    return typeswitch $x
      case manager return $x/salary + 1000
      case regular_employee return $x/salary
      else error
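For readers more used to general-purpose languages, the example resembles a dispatch on the runtime type of each item. The Python sketch below is a loose analogue: the classes and the function name `adjusted_salary` are hypothetical, and `isinstance` is a subtype test, which is simpler than the "non-empty intersection of types" rule stated for typeswitch.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the element types in the example.
@dataclass
class Manager:
    salary: int

@dataclass
class RegularEmployee:
    salary: int

@dataclass
class Contractor:
    salary: int

def adjusted_salary(x):
    """Mimic the typeswitch example: branch on the dynamic type of x."""
    if isinstance(x, Manager):              # case manager
        return x.salary + 1000
    if isinstance(x, RegularEmployee):      # case regular_employee
        return x.salary
    # else error
    raise TypeError(f"unexpected personnel type: {type(x).__name__}")

personnel = [Manager(5000), RegularEmployee(3000)]
print([adjusted_salary(p) for p in personnel])
```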