28
© 2017 A. Alawini, S. Davidson XML and XQuery Susan B. Davidson CIS 700: Advanced Topics in Databases MW 1:30-3 Towne 309 http://www.cis.upenn.edu/~susan/cis700/homepage.html

XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

© 2017 A. Alawini, S. Davidson

XMLandXQuery

SusanB.DavidsonCIS700:AdvancedTopicsinDatabases

MW1:30-3

Towne309

http://www.cis.upenn.edu/~susan/cis700/homepage.html

Page 2: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

2

XML Anatomy

<?xml version="1.0" encoding="ISO-8859-1" ?> <dblp> <mastersthesis mdate="2002-01-03" key="ms/Brown92"> <author>Kurt P. Brown</author> <title>PRPL: A Database Workload Specification Language</title> <year>1992</year> <school>Univ. of Wisconsin-Madison</school> </mastersthesis> <article mdate="2002-01-03" key="tr/dec/SRC1997-018"> <editor>Paul R. McJones</editor> <title>The 1995 SQL Reunion</title> <journal>Digital System Research Center Report</journal> <volume>SRC1997-018</volume> <year>1997</year> <ee>db/labs/dec/SRC1997-018.html</ee> <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee> </article>

Processing Instr.

Element

Attribute

Close-tag

Open-tag

Page 3: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

3

XML Data Model Visualized (and simplified!)

Root

?xml dblp

mastersthesis article

mdate key author title year school editor title year journal volume ee ee

mdate key

2002…

ms/Brown92

Kurt P….

PRPL…

1992

Univ….

2002…

tr/dec/…

Paul R.

The…

Digital…

SRC…

1997

db/labs/dec

http://www.

attribute root

p-i element

text

Page 4: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

4

Structural Constraints: Document Type Definitions (DTDs)

TheDTDisanEBNFgrammardefiningXMLstructure• XMLdocumentspecifiesanassociatedDTD,plustherootelement

• DTDspecifieschildrenoftheroot(andsoon)

DTDdefinesspecialsignificanceforattributes:• IDs–specialattributesthatareanalogoustokeysforelements

• IDREFs–referencestoIDs

• IDREFS–anastyhackthatrepresentsalistofIDREFs

Page 5: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

5

An Example DTD

ExampleDTD:<!ELEMENT dblp((mastersthesis | article)*)> <!ELEMENT mastersthesis(author,title,year,school,committeemember*)> <!ATTLIST mastersthesis(mdate CDATA #REQUIRED

key ID #REQUIRED advisor CDATA #IMPLIED>

<!ELEMENT author(#PCDATA)>

… ExampleuseofDTDinXMLfile:

<?xml version="1.0" encoding="ISO-8859-1" ?> <!DOCTYPEdblpSYSTEM“my.dtd"><dblp>…

Page 6: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

6

Representing Graphs and Links in XML: Basically Using Foreign Keys

<?xml version="1.0" encoding="ISO-8859-1" ?> <!DOCTYPE graph SYSTEM “special.dtd"> <graph>

<author id=“author1”> <name>John Smith</name> </author> <article> <author ref=“author1” /> <title>Paper1</title> </article> <article> <author ref=“author1” /> <title>Paper2</title> </article>

Page 7: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

7

Graph Data Model

Root

!DOCTYPE graph

author article

name title

ref ref

John Smith

author1 author1

Paper2

?xml article

id

author1

author author title

Paper1

Page 8: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

8

Graph Data Model

Root

!DOCTYPE graph

author article

name title

ref ref

John Smith

Paper2

?xml article

id

author1

author author title

Paper1

Page 9: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

9

Querying XML

Howdoyouqueryadirectedgraph?atree?

ThestandardapproachusedbymanyXML,semistructured-data,andobjectquerylanguages:• Definesomesortofatemplatedescribingtraversalsfromtherootofthedirectedgraph

• InXML,thebasisofthistemplateiscalledanXPath

Page 10: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

10

XPaths

Initssimplestform,anXPathislikeapathinafilesystem:/mypath/subpath/*/morepath

• TheXPathreturnsanodesetrepresentingtheXMLnodes(andtheirsubtrees)attheendofthepath

• XPathscanhavenodetestsattheend,returningonlyparticularnodetypes,e.g.,text(),processing-instruction(),comment(),element(),attribute()

• XPathisfundamentallyanorderedlanguage:itcanqueryinorder-awarefashion,anditreturnsnodesinorder

Page 11: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

11

Some Example XPath Queries

• /dblp/mastersthesis/title• /dblp/*/editor• //title• //title/text()

Page 12: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

12

Context Nodes and Relative Paths

XPathhasanotionofacontextnode:it’sanalogoustoacurrentdirectory• “.”representsthiscontextnode• “..”representstheparentnode• Wecanexpressrelativepaths:

subpath/sub-subpath/../..getsusbacktothecontextnode

Ø Bydefault,thedocumentrootisthecontextnode

Page 13: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

13

Predicates – Selection Operations

Apredicateallowsustofilterthenodesetbasedonselection-likeconditionsoversub-XPaths:

/dblp/article[title=“Paper1”]

whichisequivalentto:/dblp/article[./title/text()=“Paper1”]

Page 14: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

14

Axes: More Complex Traversals

Thusfar,we’veseenXPathexpressionsthatgodownthetree(anduponestep)• Butwemightwanttogoup,left,right,etc.

• Theseareexpressedwithso-calledaxes:•  self::path-step•  child::path-step parent::path-step

• descendant::path-step ancestor::path-step

• descendant-or-self::path-step ancestor-or-self::path-step

•  preceding-sibling::path-step following-sibling::path-step

•  preceding::path-step following::path-step

• ThepreviousXPathswesawwerein“abbreviatedform”

Page 15: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

15

Querying Order

• Wesawinthepreviousslidethatwecouldqueryforprecedingorfollowingsiblingsornodes

• Wecanalsoqueryanodeforitspositionaccordingtosomeindex:• fn::first() ,fn::last()returnindexof0th&lastelementmatchingthelaststep:

• fn::position() givestherelativecountofthecurrentnode

child::article[fn::position()=fn::last()]

Page 16: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

16

Beyond XPath: XQuery

Astrongly-typed,Turing-completeXMLmanipulationlanguage• AttemptstodostatictypecheckingagainstXMLSchema• BasedonanobjectmodelderivedfromSchema

UnlikeSQL,fullycompositional,highlyorthogonal:• Inputs&outputscollections(sequencesorbags)ofXMLnodes

• Anywhereaparticulartypeofobjectmaybeused,mayusetheresultsofaqueryofthesametype

• DesignedmostlybyDBandfunctionallanguagepeople

Page 17: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

17

XQuery’s Basic Form

• HasananalogousformtoSQL’sSELECT..FROM..WHERE..GROUPBY..ORDERBY

• Themodel:bindnodes(ornodesets)tovariables;operateovereachlegalcombinationofbindings;produceasetofnodes

• “FLWOR”statement[notecasesensitivity!]:for{iteratorsthatbindvariables}let{collections}where{conditions}orderby{order-paths}return{outputconstructor}

• MixesXML+XQuerysyntax;use{}as“escapes”

Page 18: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

18

XML Data Model Visualized

Root

?xml dblp

mastersthesis article

mdate key author title year school editor title year journal volume ee ee

mdate key

2002…

ms/Brown92

Kurt P….

PRPL…

1992

Univ….

2002…

tr/dec/…

Paul R.

The…

Digital…

SRC…

1997

db/labs/dec

http://www.

attribute root

p-i element

text

Page 19: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

19

“Iterations” in XQuery Aseriesof(possiblynested)FORstatementsassigningtheresultsofXPathstovariables

for$rootindoc(“http://my.org/my.xml”) for$subin$root/rootElement, $sub2in$sub/subElement,…• Somethinglikeatemplatethatpattern-matches,producesa“bindingtuple”

• Foreachofthese,weevaluatetheWHEREandpossiblyoutputtheRETURNtemplate

• document()ordoc()functionspecifiesaninputfileasaURI

Page 20: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

20

Two XQuery Examples <root-tag>{for$pindoc(“dblp.xml”)/dblp/proceedings,$yrin$p/yrwhere$yr=“1999”return<proc>{$p}</proc>

}</root-tag>for$iindoc(“dblp.xml”)/dblp/inproceedings[author/text()=“JohnSmith”]

return<smith-paper> <title>{$i/title/text()}</title> <key>{$i/@key}</key> {$i/crossref} </smith-paper>

Page 21: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

21

Nesting in XQuery

NestingXMLtreesisperhapsthemostcommonoperationInXQuery,it’seasy–putasubqueryinthereturnclausewhereyouwantthingstorepeat!

for$uindoc(“dblp.xml”)/dblp/universitywhere$u/country=“USA”return<ms-theses-99> {$u/name}{ for$mtin$u/../mastersthesis where$mt/year/text()=“1999”and____________ return$mt/title} </ms-theses-99>

Page 22: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

22

Collections & Aggregation in XQuery InXQuery,manyoperationsreturncollections

• XPaths,sub-XQueries,functionsoverthese,…• Theletclauseassignstheresultstoavariable

Aggregationappliesafunctionoveracollection(elegant!)

let$allpapers:=doc(“dblp.xml”)/dblp/articlereturn<article-authors><count>{fn:count(fn:distinct-values($allpapers/authors))}</count>

{ for$paperindoc(“dblp.xml”)/dblp/articlelet$pauth:=$paper/authorreturn<paper>{$paper/title} <count>{fn:count($pauth)}</count> </paper>

}</article-authors>

Page 23: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

23

Collections, Ctd.

UnlikeSQL,wecancomposeaggregationsandcreatenewcollectionsfromold:

<result>{let$avgItemsSold:=fn:avg(for$orderindoc(“my.xml”)/orders/orderlet$totalSold=fn:sum($order/item/quantity)return$totalSold)return$avgItemsSold

}</result>

Page 24: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

24

Distinct-ness

InXQuery,DISTINCT-nesshappensasafunctionoveracollection• Butsincewehavenodes,wecandoduplicateremovalaccordingtovalueornode

• Candofn:distinct-values(collection)toremoveduplicatevalues,orfn:distinct-nodes(collection)toremoveduplicatenodes

for$yearsinfn:distinct-values(doc(“dblp.xml”)//year/text()

return$years

Page 25: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

25

Sorting in XQuery

• SQLactuallyallowsyoutosortitsoutput,withaspecialORDERBYclause

• InXQuery,whatweorderisthesequenceof“resulttuples”outputbythereturnclause:

for$xindoc(“dblp.xml”)/proceedingsorderby$x/title/text()return$x

Page 26: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

26

What If Order Doesn’t Matter?

Bydefault:• SQLisunordered

• XQueryisorderedeverywhere!

• Butunorderedqueriesaremuchfastertoanswer

XQueryhasawayoftellingthequeryenginetoavoidpreservingorder:• unordered{for$xin(mypath)…}

Page 27: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

27

Querying & Defining Metadata – Can’t Do This in SQL

Cangetanode’snamebyqueryingname():for$xindoc(“dblp.xml”)/dblp/*

returnname($x)

Canconstructelementsandattributesusingcomputednames:

for$xindoc(“dblp.xml”)/dblp/*,

$yearin$x/year,

$titlein$x/title/text()return

element{name($x)}{

attribute{“year-”+$year}{$title}

}

Page 28: XML and XQuery - Information and Computer Science · 2018-02-06 · XQuery Summary Very flexible and powerful language for XML • Clean and orthogonal: can always replace a collection

28

XQuery Summary

VeryflexibleandpowerfullanguageforXML• Cleanandorthogonal:canalwaysreplaceacollectionwithanexpressionthatcreatescollections

• DBanddocument-oriented(withkeywordsearchextensions)

• ThecoreisrelativelycleanandeasytounderstandTuringComplete–thereareseveralXQueryfunctionsthatenablethis(notdiscussed).