about XML/Xquery/RDF


Page 2: about  XML/Xquery/RDF

HTML vs. XML

HTML:

    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999

XML:

    <bibliography>
      <book>
        <title> Foundations… </title>
        <author> Abiteboul </author>
        <author> Hull </author>
        <author> Vianu </author>
        <publisher> Addison Wesley </publisher>
        <year> 1995 </year>
      </book>
      …
    </bibliography>

“Self-describing”

-Schema info part of the data

-Good for data exchange (albeit baroque for storage)

Page 3: about  XML/Xquery/RDF

Why are Database folks so excited about XML?

• XML is just a syntax for (self-describing) data
• This is still exciting because
  – No standard syntax for relational data
  – With XML, we can
    • Translate any legacy data to XML
    • Can exchange data in XML format
      – Ship over the web, input to any application

Page 4: about  XML/Xquery/RDF

XML machine accessible meaning

This is what a web page in natural language looks like to a machine

Jim Hendler

Page 5: about  XML/Xquery/RDF

XML machine accessible meaning

[Figure: a CV document whose parts are marked with tags: CV, name, education, work, private]

XML allows “meaningful tags” to be added to parts of the text

Jim Hendler

Page 6: about  XML/Xquery/RDF

XML machine accessible meaning

[Figure: the tagged CV document (CV, name, education, work, private), with the tags drawn as opaque "< >" symbols]

But to your machine, the tags look like this….

Jim Hendler

Page 7: about  XML/Xquery/RDF

XML machine accessible meaning

Schemas help…. …by relating common terms between documents

[Figure: two tagged CV documents, each with the parts CV, name, education, work, private]

Jim Hendler

Page 8: about  XML/Xquery/RDF

But other people use other schemas

[Figure: a tagged CV document with the parts CV, name, education, work, private]

Someone else has one like this….

Jim Hendler

Page 9: about  XML/Xquery/RDF

But other people use other schemas

[Figure: three tagged CV documents, each with the parts CV, name, education, work, private]

…which don’t fit in

Moral: There is still need for ontology mapping.

Jim Hendler

Page 10: about  XML/Xquery/RDF

11/18

Page 11: about  XML/Xquery/RDF

The X-standards…

• XML: an on-the-wire representation for data
  – Xquery: a query language for XML
  – Xschema: a schema description language for XML data
• RDF: a language for meta-data description
• WSDL/SOAP/UDDI: languages for describing services

Page 12: about  XML/Xquery/RDF

XML Terminology

• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document: single root element

well-formed XML document: if it has matching, properly nested tags
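
As a rough illustration of the well-formedness rule (matching, properly nested tags under a single root), the sketch below leans on Python's standard `xml.etree.ElementTree` parser, which rejects input that is not well formed. The helper name `is_well_formed` and the sample strings are assumptions made for this example; they do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single-rooted, properly nested XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<book><title>Foundations of Databases</title><red/></book>"
bad_nesting = "<book><title>Foundations of Databases</book></title>"  # tags cross
two_roots = "<book/><book/>"                                          # no single root

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample))
```

Well-formedness is purely syntactic: it says nothing about whether the document conforms to a DTD or schema.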

Page 13: about  XML/Xquery/RDF

    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999

HTML describes presentation

    <bibliography>
      <book>
        <title> Foundations… </title>
        <author> Abiteboul </author>
        <author> Hull </author>
        <author> Vianu </author>
        <publisher> Addison Wesley </publisher>
        <year> 1995 </year>
      </book>
      …
    </bibliography>

XML describes content


Page 16: about  XML/Xquery/RDF

More XML: Attributes

    <book price="55" currency="USD">
      <title> Foundations of Databases </title>
      <author> Abiteboul </author>
      …
      <year> 1995 </year>
    </book>

• Attributes are single-valued
• No guidance on when to use them

Page 17: about  XML/Xquery/RDF

More XML: Oids and References

    <person id="o555"> <name> Jane </name> </person>
    <person id="o456"> <name> Mary </name> <children idref="o123 o555"/> </person>
    <person id="o123" mother="o456"> <name> John </name> </person>

Object identifiers: the id attributes (o555, o456, o123)

oids and references in XML are just syntax
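
Because oids and references are "just syntax", following a reference is left to the application. A minimal sketch of that in Python is shown here, assuming the three person elements are wrapped in a hypothetical `<people>` root (the slide shows no root element); the helper names `name_of`, `children_of` and `mother_of` are likewise made up for the example.

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is an assumption; the person elements follow the slide.
DOC = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""

root = ET.fromstring(DOC)

# Index every person by its oid so that references can be followed.
by_id = {p.get("id"): p for p in root.iter("person")}


def name_of(oid):
    """Name of the person with the given oid."""
    return by_id[oid].findtext("name").strip()


def children_of(oid):
    """Names of the persons listed in the whitespace-separated idref attribute."""
    refs = by_id[oid].find("children")
    if refs is None:
        return []
    return [name_of(ref) for ref in refs.get("idref", "").split()]


def mother_of(oid):
    """Name of the person referenced by the mother attribute, if any."""
    ref = by_id[oid].get("mother")
    return name_of(ref) if ref is not None else None


print(children_of("o456"))
print(mother_of("o123"))
```

Nothing in the parser checks that a reference points at an existing id; a dangling reference would surface here as a `KeyError`.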

Page 18: about  XML/Xquery/RDF

XML vs. Relational Data

• XML is meant as a language that supports both Text and Structured Data
  – Conflicting demands...
• XML supports semi-structured data
  – In essence, the schema can be union of multiple schemas
    • Easy to represent books with or without prices, books with any number of authors etc.
• XML supports free mixing of text and data
  – using the #PCDATA type
• XML is ordered (while relational data is unordered)

[Figure: a spectrum from less structure to more structure: Text, XML, Structured (relational) Data]

Page 19: about  XML/Xquery/RDF

DTDs

    <!DOCTYPE paper [
      <!ELEMENT paper (section*)>
      <!ELEMENT section ((title,section*) | text)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT text (#PCDATA)>
    ]>

    <paper>
      <section> <text> </text> </section>
      <section>
        <title> </title>
        <section> … </section>
        <section> … </section>
      </section>
    </paper>

Notice that the DTD is not in XML syntax…

Semi-structured
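
To make the content model concrete, here is a small hand-written checker for this one DTD, as a sketch only. Python's standard library does not validate against DTDs, so the rules `paper (section*)` and `section ((title,section*) | text)` are coded directly; the function names are hypothetical, and stray character data between child elements is not checked.

```python
import xml.etree.ElementTree as ET


def is_pcdata(el):
    """#PCDATA content: the element has no child elements."""
    return len(el) == 0


def valid_section(sec):
    """section ((title, section*) | text)"""
    kids = list(sec)
    tags = [k.tag for k in kids]
    if tags == ["text"]:
        return is_pcdata(kids[0])
    if tags and tags[0] == "title" and all(t == "section" for t in tags[1:]):
        return is_pcdata(kids[0]) and all(valid_section(k) for k in kids[1:])
    return False


def valid_paper(xml_text):
    """paper (section*)"""
    root = ET.fromstring(xml_text)
    if root.tag != "paper":
        return False
    return all(child.tag == "section" and valid_section(child) for child in root)


ok = ("<paper><section><text>t</text></section>"
      "<section><title>A</title><section><text>x</text></section></section></paper>")
bad = "<paper><section><title>A</title><text>x</text></section></paper>"

print(valid_paper(ok), valid_paper(bad))
```

The second document is well formed but not valid: a section may hold either a title followed by subsections or a single text, never both.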

Page 20: about  XML/Xquery/RDF

XML Schemas

• More recent proposal (with XML syntax)
• unifies previous schema proposals
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
  – http://www.w3.org/TR/xmlschema-1
  – http://www.w3.org/TR/xmlschema-2

Page 21: about  XML/Xquery/RDF

RDF: Meta-data Standard for Web

    <rdf:Description about="www.mypage.com">
      <about> birds, butterflies, snakes </about>
      <author>
        <rdf:Description>
          <firstname> John </firstname>
          <lastname> Smith </lastname>
        </rdf:Description>
      </author>
    </rdf:Description>

[Figure: the same description drawn as a graph: the node www.mypage.com has an "about" edge to "birds, butterflies, snakes" and an "author" edge to a node with "firstname" John and "lastname" Smith]

Good ol’ semantic networks..?

Page 22: about  XML/Xquery/RDF

Querying XML

• Requirements:
  – Need to handle lack of schema.
    • We may not know much about the data, so we need to navigate the XML.
  – Need to support both “information retrieval” and “SQL-style” queries.
    • Ordered vs. un-ordered XML
  – “Human readable”
    • like SQL?
• Candidates
  – Many… based on conflicting requirements
    • XSL: Makes IR folks happy
    • XML-QL: Makes DB folks happy
    • Xquery: W3C’s attempt to make everybody (un)happy

Page 23: about  XML/Xquery/RDF

11/20

Agenda: Xquery examples

Information Integration

Page 24: about  XML/Xquery/RDF

Xquery Resources

• XQuery 1.0: An XML Query Language
  – W3C Working Draft 20 December 2001
• XML Query Use Cases
  – W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
  – http://131.107.228.20/
  – Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
  – www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

Page 25: about  XML/Xquery/RDF

FLoWeR Expressions

Xquery queries are made up of FLWR expressions that work on “paths”

• For binds variables to nodes
• Let computes aggregates
• Where applies a formula to find matching elements
• Return constructs the output elements

Path expressions are of the form: element//element/element[attrib=value]
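
The four clauses map fairly directly onto an ordinary loop. The sketch below mimics a FLWR expression in Python with `xml.etree.ElementTree`; it is an analogy rather than Xquery itself. The abridged `BIB` data, the `authors` attribute and the `results` wrapper are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical bibliography data.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

bib = ET.fromstring(BIB)
out = ET.Element("results")

for b in bib.findall("book"):                # For: bind b to each book node
    n = len(b.findall("author"))             # Let: compute an aggregate per binding
    if n > 1:                                # Where: keep only matching bindings
        r = ET.SubElement(out, "book", {"authors": str(n)})  # Return: build output
        r.text = b.findtext("title")

result = ET.tostring(out, encoding="unicode")
print(result)
```

Here the path step `book` plays the role of the path expression, and the filter on `n` the role of a predicate.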

Page 26: about  XML/Xquery/RDF

Comparison to SQL

• Look at the use case description on Xquery manual
• Supports all (?) SQL style queries (with different syntax of course) [default queries in the demo]
• Has support for
  – “construction”: outputting the answers in arbitrary XML formats (use case XMP)
  – “path expressions”: navigating the XML tree (use case seq)
  – Simple text queries [use case text]
  – Allows queries on “Tag” elements
    • Removes the “data/meta-data” barrier in queries
• For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]

Page 27: about  XML/Xquery/RDF

DTD for http://www.bn.com/bib.xml

    <!ELEMENT bib (book* )>
    <!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
    <!ATTLIST book year CDATA #REQUIRED >
    <!ELEMENT author (last, first )>
    <!ELEMENT editor (last, first, affiliation )>
    <!ELEMENT title (#PCDATA )>
    <!ELEMENT last (#PCDATA )>
    <!ELEMENT first (#PCDATA )>
    <!ELEMENT affiliation (#PCDATA )>
    <!ELEMENT publisher (#PCDATA )>
    <!ELEMENT price (#PCDATA )>

Page 28: about  XML/Xquery/RDF

Example Query

Query:

    <bib> {
      for $b in /bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return <book year={ $b/@year }> { $b/title } </book>
    } </bib>

“For all books after 1991, return with Year changed from a tag to an attribute”

Result:

    <bib>
      <book year="1994"> <title>TCP/IP Illustrated</title> </book>
      <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
    </bib>

Page 29: about  XML/Xquery/RDF

Example Query (2)

• Return the books that cost more at amazon than fatbrain

    let $amazon := document("http://www.amazon.com/books.xml"),
        $fatbrain := document("http://www.fatbrain.com/books.xml")
    for $am in $amazon/books/book,
        $fat in $fatbrain/books/book
    where $am/isbn = $fat/isbn
      and $am/price > $fat/price
    return <book>{ $am/title, $am/price, $fat/price }</book>

Join

Page 30: about  XML/Xquery/RDF

XML frenzy in the DB Community

• Now that XML is there, what can we do with it?
  – Convert all databases from Relational to XML?
    • Or provide XML views of relational databases?
  – Develop theory of native XML databases?
    • Or assume that XML data will be stored in relational databases..
  – Issues: What sort of storage mechanisms? What sort of indices?

Page 31: about  XML/Xquery/RDF

XML middleware for Databases

• XML adapters (middle-ware) received significant attention in DB community
  – SilkRoute (AT&T)
  – Xperanto (IBM)
• Issues:
  – Need to convert relational data into XML
    • Tagging (easy)
  – Need to convert Xquery queries into equivalent SQL queries
    • Trickier as Xquery supports schema querying

[Figure: an adapter that accepts Xquery and returns XML, on top of a database that accepts SQL and returns relations]

Page 32: about  XML/Xquery/RDF

Xquery Tutorial

Craig Knoblock
University of Southern California

Page 33: about  XML/Xquery/RDF

References

• XQuery 1.0: An XML Query Language
  – W3C Working Draft 20 December 2001
• XML Query Use Cases
  – W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
  – http://131.107.228.20/
  – Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
  – www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

Page 34: about  XML/Xquery/RDF

DTD for http://www.bn.com/bib.xml

    <!ELEMENT bib (book* )>
    <!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
    <!ATTLIST book year CDATA #REQUIRED >
    <!ELEMENT author (last, first )>
    <!ELEMENT editor (last, first, affiliation )>
    <!ELEMENT title (#PCDATA )>
    <!ELEMENT last (#PCDATA )>
    <!ELEMENT first (#PCDATA )>
    <!ELEMENT affiliation (#PCDATA )>
    <!ELEMENT publisher (#PCDATA )>
    <!ELEMENT price (#PCDATA )>

Page 35: about  XML/Xquery/RDF

Data for www.bn.com/bib.xml

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price> 65.95</price>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>

Page 36: about  XML/Xquery/RDF

Data for www.bn.com/bib.xml (cont.)

      <book year="2000">
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
        <publisher>Morgan Kaufmann Publishers</publisher>
        <price> 39.95</price>
      </book>
      <book year="1999">
        <title>The Economics of Technology and Content for Digital TV</title>
        <editor>
          <last>Gerbarg</last><first>Darcy</first>
          <affiliation>CITI</affiliation>
        </editor>
        <publisher>Kluwer Academic Publishers</publisher>
        <price>129.95</price>
      </book>
    </bib>

Page 37: about  XML/Xquery/RDF

Document References

• Document can either be referenced explicitly or in the default namespace
• In the Microsoft Demo
  – /Bib = document("http://www.bn.com/bib.xml")/bib
• We will use /bib throughout, but you must use the expansion to run the demo
• In Theseus the document for xquery is passed as input

Page 38: about  XML/Xquery/RDF

Projection

• Return the names of all authors of books

    /bib/book/author
    =
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>

Page 39: about  XML/Xquery/RDF

Project (cont.)

• The same query can also be written as a for loop

    /bib/book/author
    =
    for $bk in /bib/book return
      for $aut in $bk/author return $aut
    =
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>

Page 40: about  XML/Xquery/RDF

Selection

• Return the titles of all books published before 1997

    /bib/book[@year < "1997"]/title
    =
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>

Page 41: about  XML/Xquery/RDF

Selection (cont.)

• Return the titles of all books published before 1997

    /bib/book[@year < "1997"]/title
    =
    for $bk in /bib/book
    where $bk/@year < "1997"
    return $bk/title
    =
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>

Page 42: about  XML/Xquery/RDF

Selection (cont.)

• Return book with the title “Data on the Web”

    /bib/book[title = "Data on the Web"]
    =
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price> 39.95</price>
    </book>

Page 43: about  XML/Xquery/RDF

Selection (cont.)

• Return the price of the book “Data on the Web”

    /bib/book[title = "Data on the Web"]/price
    =
    <price> 39.95</price>

How would you return the book with a price of $39.95?

Page 44: about  XML/Xquery/RDF

Selection (cont.)

• Return the book with a price of $39.95

    for $bk in /bib/book
    where $bk/price = " 39.95"
    return $bk
    =
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price> 39.95</price>
    </book>
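
For readers without an Xquery engine at hand, the selection queries can be approximated in Python with `xml.etree.ElementTree`. This is a sketch under assumptions: the `BIB` string is an abridged copy of the bib.xml data (authors and publishers omitted), and since ElementTree's path language has no `<` predicate, the year test is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the bib.xml data; the leading spaces in two prices are kept.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price> 65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price> 39.95</price></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

bib = ET.fromstring(BIB)

# Counterpart of /bib/book[@year < "1997"]/title
early_titles = [b.findtext("title") for b in bib.findall("book")
                if int(b.get("year")) < 1997]

# Counterpart of /bib/book[title = "Data on the Web"]/price
# (equality predicates on child text are supported by ElementTree).
web_price = bib.find("book[title='Data on the Web']/price").text

# Books priced at 39.95, compared as numbers so that the leading space
# in " 39.95" does not matter.
priced_3995 = [b.findtext("title") for b in bib.findall("book")
               if float(b.findtext("price")) == 39.95]

print(early_titles, repr(web_price), priced_3995)
```

Comparing prices as numbers sidesteps the whitespace problem that forces the string literal " 39.95" in the query above.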

Page 45: about  XML/Xquery/RDF

Construction

• Return year and title of all books published before 1997

    for $bk in /bib/book
    where $bk/@year < "1997"
    return <book>{ $bk/@year, $bk/title }</book>
    =
    <book year="1994"> <title>TCP/IP Illustrated</title> </book>
    <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>

Page 46: about  XML/Xquery/RDF

Grouping

• Return titles for each author

    for $author in distinct(/bib/book/author/last)
    return
      <author name={ $author/text() }>
        { /bib/book[author/last = $author]/title }
      </author>
    =
    <author name="Stevens">
      <title>TCP/IP Illustrated</title>
      <title>Advanced Programming in the Unix environment</title>
    </author>
    <author name="Abiteboul">
      <title>Data on the Web</title>
    </author>
    …
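
The grouping query amounts to collecting titles under each distinct author last name. A minimal Python sketch of the same idea follows; the abridged `BIB` data and the `<authors>` wrapper element are assumptions for the example, and a plain dict stands in for `distinct(...)`.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical copy of the bibliography (last names only).
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>"""

bib = ET.fromstring(BIB)

# Group titles by author last name; dict keys give the distinct names
# in order of first appearance.
titles_by_author = {}
for book in bib.findall("book"):
    for last in book.findall("author/last"):
        titles_by_author.setdefault(last.text, []).append(book.findtext("title"))

# Construct one <author name="..."> element per group.
out = ET.Element("authors")  # wrapper element, an assumption
for name, titles in titles_by_author.items():
    author = ET.SubElement(out, "author", {"name": name})
    for t in titles:
        ET.SubElement(author, "title").text = t

print(ET.tostring(out, encoding="unicode"))
```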

Page 47: about  XML/Xquery/RDF

Join

• Return the books that cost more at amazon than fatbrain

    let $amazon := document("http://www.amazon.com/books.xml"),
        $fatbrain := document("http://www.fatbrain.com/books.xml")
    for $am in $amazon/books/book,
        $fat in $fatbrain/books/book
    where $am/isbn = $fat/isbn
      and $am/price > $fat/price
    return <book>{ $am/title, $am/price, $fat/price }</book>
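
The join can be sketched in Python as a lookup on isbn followed by a price comparison. The two inline documents below are invented sample data (the real contents of the amazon and fatbrain files are not given in the slides); only the element names `books`, `book`, `isbn`, `title` and `price` follow the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the two remote documents.
AMAZON = """<books>
  <book><isbn>111</isbn><title>Data on the Web</title><price>39.95</price></book>
  <book><isbn>222</isbn><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><isbn>333</isbn><title>Only at Amazon</title><price>10.00</price></book>
</books>"""

FATBRAIN = """<books>
  <book><isbn>111</isbn><title>Data on the Web</title><price>34.95</price></book>
  <book><isbn>222</isbn><title>TCP/IP Illustrated</title><price>65.95</price></book>
</books>"""

amazon = ET.fromstring(AMAZON)
fatbrain = ET.fromstring(FATBRAIN)

# Build an isbn -> price index for one side, then probe it with the other.
fat_price = {b.findtext("isbn"): float(b.findtext("price"))
             for b in fatbrain.findall("book")}

dearer_at_amazon = []
for am in amazon.findall("book"):
    isbn = am.findtext("isbn")
    price = float(am.findtext("price"))
    if isbn in fat_price and price > fat_price[isbn]:
        dearer_at_amazon.append((am.findtext("title"), price, fat_price[isbn]))

print(dearer_at_amazon)
```

The query as written describes a nested loop over both documents; the index used here is just one way an engine might evaluate it.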

Page 48: about  XML/Xquery/RDF

Example Query 1

    <bib> {
      for $b in /bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return <book year={ $b/@year }> { $b/title } </book>
    } </bib>

What does this do?

Page 49: about  XML/Xquery/RDF

Result Query 1

    <bib>
      <book year="1994"> <title>TCP/IP Illustrated</title> </book>
      <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
    </bib>

Page 50: about  XML/Xquery/RDF

Example Query 2

    <results> {
      for $b in document("http://www.bn.com/bib.xml")/bib/book,
          $t in $b/title,
          $a in $b/author
      return <result> { $t } { $a } </result>
    } </results>

Page 51: about  XML/Xquery/RDF

Result Query 2

    <results>
      <result> <title>TCP/IP Illustrated</title> <author><last>Stevens</last><first>W.</first></author> </result>
      <result> <title>Advanced Programming in the Unix environment</title> <author><last>Stevens</last><first>W.</first></author> </result>
      <result> <title>Data on the Web</title> <author><last>Abiteboul</last><first>Serge</first></author> </result>
      <result> <title>Data on the Web</title> <author><last>Buneman</last><first>Peter</first></author> </result>
      <result> <title>Data on the Web</title> <author><last>Suciu</last><first>Dan</first></author> </result>
    </results>

Page 52: about  XML/Xquery/RDF

Example Query 3

    <books-with-prices> {
      for $b in document("http://www.bn.com/bib.xml")//book,
          $a in document("http://www.amazon.com/reviews.xml")//entry
      where $b/title = $a/title
      return
        <book-with-prices>
          { $b/title }
          <price-amazon>{ $a/price/text() }</price-amazon>
          <price-bn>{ $b/price/text() }</price-bn>
        </book-with-prices>
    } </books-with-prices>

Page 53: about  XML/Xquery/RDF

Result Query 3

    <books-with-prices>
      <book-with-prices>
        <title>TCP/IP Illustrated</title>
        <price-amazon>65.95</price-amazon>
        <price-bn> 65.95</price-bn>
      </book-with-prices>
      <book-with-prices>
        <title>Advanced Programming in the Unix environment</title>
        <price-amazon>65.95</price-amazon>
        <price-bn>65.95</price-bn>
      </book-with-prices>
      <book-with-prices>
        <title>Data on the Web</title>
        <price-amazon>34.95</price-amazon>
        <price-bn> 39.95</price-bn>
      </book-with-prices>
    </books-with-prices>

Page 54: about  XML/Xquery/RDF

Example Query 4

    <bib> {
      for $b in document("www.bn.com/bib.xml")//book
      where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
      return <book> { $b/@year } { $b/title } </book>
      sortby (title)
    } </bib>

Page 55: about  XML/Xquery/RDF

Example Result 4

    <bib>
      <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
      <book year="1994"> <title>TCP/IP Illustrated</title> </book>
    </bib>
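
The filter, construct and sort steps of Query 4 can be mirrored in Python as a sketch. The `BIB` string is an abridged copy of the bib.xml data (authors, editors and prices omitted, which is an assumption of this example), and Python's `sort` plays the role of `sortby (title)`.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the bib.xml data.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <publisher>Kluwer Academic Publishers</publisher></book>
</bib>"""

bib = ET.fromstring(BIB)

# where: Addison-Wesley books published after 1991
selected = [b for b in bib.findall("book")
            if b.findtext("publisher") == "Addison-Wesley"
            and int(b.get("year")) > 1991]

# sortby (title)
selected.sort(key=lambda b: b.findtext("title"))

# return: a new <book> carrying the year attribute and the title
out = ET.Element("bib")
for b in selected:
    book = ET.SubElement(out, "book", {"year": b.get("year")})
    ET.SubElement(book, "title").text = b.findtext("title")

result = [(b.get("year"), b.findtext("title")) for b in out]
print(result)
```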

Page 56: about  XML/Xquery/RDF

Impact of XML on Integration

• If and when all sources accept Xqueries and exchange data in XML format, then
  – Mediator can accept user queries in Xquery
  – Access sources using Xquery
  – Get data back in XML format
  – Merge results and send to user in XML format
• How about now?
  – Sources can use XML adapters (middle-ware)

[Figure: a mediator that receives Xquery and returns XML, and that sends Xquery to and receives XML from the sources; a relational source is reached through an adapter that turns Xquery/XML into SQL/relations]

Page 57: about  XML/Xquery/RDF

Is XML standardization a magical solution for Integration?

If all WEB sources standardize into XML format
  – Source access (wrapper generation issues) become easier to manage
  – BUT all other problems remain
    • Still need to relate source (XML) schemas to mediator (XML) schema
    • Still need to reason about source overlap, source access limitations etc.
    • Still need to manage execution in the presence of source/network uncertainties

[Figure: information-integration architecture. Sources: services, web pages, structured data, sensors (streaming data). Components: Source Fusion/Query Planning (needs to handle multiple objectives, service composition, source quality & overlap); Executor (needs to handle source/network interruptions, runtime uncertainty, replanning); Monitor (updating statistics); source trust, ontologies, source/service descriptions. Flows: query, preference/utility model, answers, probing queries, source calls, replanning requests. The mediator exchanges Xquery and XML.]

Page 58: about  XML/Xquery/RDF

“Semantic Web”

• The LAV/GAV approaches assume that some human expert will do the actual schema mapping
• The “semantic-web” initiative attempts to automate schema mapping
  – Idea: Allow pages to write logical axioms relating their vocabulary (tags) to other external tags
  – Support automatic inference of relations between source and mediator schema using these rules
    • DAML+OIL