1
Chapter 10: XML
  What is XML
  Basic Components of XML
  XPath
  XQuery
2
What is XML?
  Extensible Markup Language
  Structured markup
  Simplified SGML
  Next-generation HTML
  W3C Recommendation (spec)
    World Wide Web Consortium
3
Family Tree
  GML (1969)
    SGML (1986)
      HTML (1993)
      XML (1998)
4
HTML Example
  <HTML>
    <HEAD>
      <TITLE>HTML example</TITLE>
    </HEAD>
    <BODY>
      <H1>HTML example</H1>
      <P>This is an example of HTML markup codes.</P>
    </BODY>
  </HTML>
5
HTML and XML
  HTML:
    Content and presentation are mixed; where is the structure?
    Tags, e.g. <H1>, <LI>, are fixed and specify presentation
  XML:
    Content, presentation, and structure are separated
    Users can define new tags with meaningful annotation
6
Basic Syntax
  Starts with the XML declaration
    <?xml version="1.0" standalone="yes"?>
  Rest of the document goes inside the "root element"
    <TEI.2>…</TEI.2>
    <state>
      <sname> Texas </sname>
      <scode> TX </scode>
    </state>
7
Two Kinds of XML
  Standalone
    <?xml version="1.0" standalone="yes"?>
  Using a Document Type Definition (DTD)
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE state SYSTEM "state.dtd">
    The DTD is the metadata that describes the available tags
    <!DOCTYPE state [
      <!ELEMENT state (sname, scode)>
      <!ELEMENT sname (#PCDATA)>
      <!ELEMENT scode (#PCDATA)>
    ]>
8
HTML is an application of SGML; XHTML, of XML
  Available tags, e.g. <P>, are used to describe presentation
  Where is the DTD of HTML?
9
Well-formed vs. Valid
  XML must be well-formed
    Correct syntax
    Tags match, tags nest, all characters legal
    Parser must reject the document if it is not well-formed
  XML may be valid with respect to a DTD (Document Type Definition)
    Tags are used correctly
    Tags are all declared
    Attributes are declared
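The well-formedness rule can be tried out with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and does not validate against a DTD; the helper name `is_well_formed` and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # A conforming parser must reject input that is not well-formed.
        return False

good = "<state><sname>Texas</sname><scode>TX</scode></state>"
bad = "<state><sname>Texas</scode></state>"  # end tag does not match start tag

print(is_well_formed(good), is_well_formed(bad))
```

Validity is a separate, stricter check that needs a validating parser and a DTD; this sketch does not attempt it.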
10
Validity Checking
  Checks everything specified in a DTD
  Can't check text (currency, spelling)
  Checks against DTD: this is a valid memo, book, bibliography, ...
11
XML Syntax
  The XML declaration
  Elements
  Entities
  Text
  Declarations and Notations
  Processing Instructions
  Comments
12
The XML Declaration
  At the very beginning of the file
  Officially optional, but always use it
  Can declare version, encoding, standalone
    Must be in that order
    version is required whenever the declaration is present; encoding and standalone are optional
  Encodings other than UTF-8 and UTF-16 must be declared
    <?xml version="1.0" encoding="Big5"?>
    <?xml version="1.0" encoding="ISO-8859-1"?>
13
Elements
  Basic building block of XML
  Start and end tag
    <person>Nico</person>
  Attributes: <date format="iso8601"></date>
    May be abbreviated as: <date format="iso8601"/>
  Elements can be arbitrarily nested to describe very rich information structure
14
Elements and Attributes
  Attributes can parameterize an element
    <state region="Southern">
      <sname> Texas </sname>
      <scode> TX </scode>
    </state>
  The same information can be represented by a sub-element
    <state>
      <region> Southern </region>
      <sname> Texas </sname>
      <scode> TX </scode>
    </state>
15
Attribute Syntax
  Names may contain Unicode letters, digits, '.', '-', and '_' (but may not start with a digit, '.', or '-')
  Cannot repeat: the same attribute name cannot appear more than once in an element
  Order doesn't matter
  Values must be quoted (single or double)
  Values may not contain "<"
  Values may have defaults in the DTD
16
Attributes and Sub-elements
  A matter of preference
  Main differences:
    An attribute name cannot repeat in the same element
    A sub-element can repeat
    Attribute values are always string data
    Sub-elements can have further sub-elements
17
Special Attributes
  id holds a unique identifier for the element
  idref references an id
    <state id="texas">
      <sname> Texas </sname>
      <scode> TX </scode>
      <cityin idref="dallas"/>
    </state>
    <city id="dallas">
      <ccode> DAL </ccode>
      <cname> Dallas </cname>
      <stateof idref="texas"/>
    </city>
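How an application might follow an idref back to the element carrying the matching id can be sketched as below. This is an illustration only: the wrapping `<geo>` root, the `by_id` dictionary, and the `deref` helper are assumptions added here, and without a DTD the parser treats `id` and `idref` as ordinary attributes, so the lookup is done by hand.

```python
import xml.etree.ElementTree as ET

# The <geo> root is hypothetical; it is only needed to hold both elements.
DOC = """
<geo>
  <state id="texas">
    <sname>Texas</sname>
    <scode>TX</scode>
    <cityin idref="dallas"/>
  </state>
  <city id="dallas">
    <ccode>DAL</ccode>
    <cname>Dallas</cname>
    <stateof idref="texas"/>
  </city>
</geo>
"""

root = ET.fromstring(DOC)

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def deref(element, child_tag):
    """Follow the idref stored on the named child of element."""
    ref = element.find(child_tag).get("idref")
    return by_id[ref]

state = by_id["texas"]
city = deref(state, "cityin")
print(city.findtext("cname"))
```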
18
Entities
  A unit of text
  Five predefined entities
    &amp; (&)   &apos; (')   &lt; (<)   &gt; (>)   &quot; (")
  Define your own in the DTD
    <!ENTITY euro "&#8364;">
  Use numeric character references
    &#8364;   &#x20AC;   (both denote €)
19
Text
  Character strings
  Use predefined entities (&lt; &amp; …)
    XML example: &gt; (>)   &amp; (&)   &lt; (<)
  CDATA ("character data") section for raw text without using entities
    <![CDATA[ if a < b then print a is less than b ]]>
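The two ways of carrying raw text, entity escaping and a CDATA section, can be compared in a short sketch. It assumes Python's standard library (`xml.sax.saxutils.escape` and `xml.etree.ElementTree`); the `<code>` element name and the sample string are made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if a < b & b > c"

# Option 1: replace the special characters with predefined entities.
escaped = escape(raw)
doc_with_entities = "<code>" + escaped + "</code>"

# Option 2: wrap the raw text in a CDATA section, no escaping needed.
doc_with_cdata = "<code><![CDATA[" + raw + "]]></code>"

# Both documents carry the same character data once parsed.
text_from_entities = ET.fromstring(doc_with_entities).text
text_from_cdata = ET.fromstring(doc_with_cdata).text

print(escaped)
print(text_from_entities == text_from_cdata == raw)
```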
20
Declarations
  Allow validity checking
  Optional
  May be internal (in document), external, or both
  DTD (Document Type Definition) is all active declarations
  Use existing DTDs when possible
21
External DTD
  Most common
  Use a DOCTYPE declaration before the root element
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>
22
Internal (standalone) DTD
  For custom documents
  Also uses a DOCTYPE declaration
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>
  Specify in the XML declaration
    <?xml version="1.0" standalone="yes"?>
23
External plus Internal DTD
  Usually to declare entities
  Use a DOCTYPE declaration before the root element
    <!DOCTYPE greeting SYSTEM "hello.dtd" [
      <!ENTITY excl "!">
    ]>
    <greeting>Hello, world&excl;</greeting>
24
Element Type Declarations
  Declare name
  Declare allowed content
    <!ELEMENT a EMPTY>
    <!ELEMENT either (one | theother)>
    <!ELEMENT ordered (first, second)>
    <!ELEMENT list (item+)>
    <!ELEMENT dl ((dt?, dd?)*)>
    <!ELEMENT text (#PCDATA)>
    <!ELEMENT mixed (#PCDATA | b | i | em)*>
25
Attribute List Declarations
  Declare attributes for an element
  Declare value types
  Declare defaults
    <!ATTLIST termdef
      id   ID    #REQUIRED
      name CDATA #IMPLIED>
    <!ATTLIST list
      type (bullets|ordered|glossary) "ordered">
    <!ATTLIST form
      method CDATA #FIXED "POST">
26
Entity Declarations
  <!ENTITY copy "&#169;">
  <!ENTITY copyright "&copy; Infoseek Corp. 1999, All rights reserved">
27
Processing Instructions
  Instructions to applications
    fonts?
    security?
    correctness checks?
  Linking to a style sheet
    <?xml-stylesheet href="mystyle.css" type="text/css"?>
  Instructions to indexing robots
    <?robots index="no" follow="yes"?>
28
Comments
  Like HTML and SGML
    <!-- a comment -->
  Anything is OK inside a comment, except the string "--"
    <!-- <head> & <tail> are elements -->
    <!-- <?xml?> declaration goes here -->
29
What is a DTD?
  "Document Type Definition"
  Bunch of XML declarations
  Usually external to the document
  Designed for some purpose (use one that matches your needs)
  Best left to experts
30
A Bug Report Document
  <?xml version="1.0"?>
  <bugreport>
    <product>xmltron</product>
    <version>1.1</version>
    <os>RTE</os>
    <osversion>4.0</osversion>
    <date scheme="ISO8601">1999-11-03</date>
    <report>
      <summary>doesn't work</summary>
      <detail>at all</detail>
    </report>
    <solution>none yet</solution>
  </bugreport>
31
Make a Document Type
  <!DOCTYPE bugreport [
    <!-- declarations go here -->
  ]>
  <bugreport> ...
  Doctype and root element must match
32
Declarations for Elements
  <!DOCTYPE bugreport [
    <!-- bugreport itself: wait 'til next slide -->
    <!ELEMENT product (#PCDATA)>
    <!ELEMENT version (#PCDATA)>
    <!ELEMENT os (#PCDATA)>
    <!ELEMENT osversion (#PCDATA)>
    <!ELEMENT date (#PCDATA)>
    <!ELEMENT report (summary, detail)>
    <!ELEMENT summary (#PCDATA)>
    <!ELEMENT detail (#PCDATA)>
    <!ELEMENT solution (#PCDATA)>
  ]>
33
Declaration for Root Element
  <!DOCTYPE bugreport [
    <!ELEMENT bugreport (product, version, os, osversion, date, report, solution?)>
  <solution> is optional; the others are required and must be in this order.
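What a sequence content model with an optional last member demands can be illustrated with a hand-rolled check. This is only a sketch: Python's standard library has no validating parser, so the model is transcribed into a list by hand, and `MODEL`, `model_to_regex`, and `children_match` are names invented here. It handles just a flat sequence with optional `?` items, not general DTD content models.

```python
import re
import xml.etree.ElementTree as ET

# Content model transcribed from the root-element declaration.
MODEL = ["product", "version", "os", "osversion", "date", "report", "solution?"]

def model_to_regex(model):
    """Turn a flat sequence model into a regex over space-terminated tag names."""
    parts = []
    for item in model:
        name = re.escape(item.rstrip("?"))
        suffix = "?" if item.endswith("?") else ""
        parts.append("(?:" + name + " )" + suffix)
    return re.compile("".join(parts))

def children_match(element, model=MODEL):
    """True if the element's children appear exactly as the model requires."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None

REPORT = (
    "<bugreport><product>xmltron</product><version>1.1</version>"
    "<os>RTE</os><osversion>4.0</osversion>"
    '<date scheme="ISO8601">1999-11-03</date>'
    "<report><summary>doesn't work</summary><detail>at all</detail></report>"
    "<solution>none yet</solution></bugreport>"
)

print(children_match(ET.fromstring(REPORT)))
```

Dropping `<solution>` still passes the check, while reordering any of the required children fails it.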
34
Declarations for Attributes
  <!ATTLIST date scheme CDATA #IMPLIED>
  "CDATA" instead of "PCDATA" means the value is plain character data with no markup (entity references in it are still expanded)
35
Declarations for Attributes
  "CDATA" instead of "PCDATA" means the value is plain character data with no markup (entity references in it are still expanded)
  #IMPLIED means optional (value implied by the document)
  Separate ATTLIST declarations for the same element are OK
  Internal ATTLIST declarations override external ones
    <!ATTLIST date scheme CDATA #IMPLIED>
36
documents = contents + style
  Extensible Stylesheet Language (XSL)
  Specifications still in draft
  But implementations keeping pace
37
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="xmlpartstyle.css"?>
<PARTS>
  <TITLE>Computer Parts</TITLE>
  <PART>
    <ITEM>Motherboard</ITEM>
    <MANUFACTURER>ASUS</MANUFACTURER>
    <MODEL>P3B-F </MODEL>
    <COST> 123.00</COST>
  </PART>
  <PART>
    <ITEM>Video Card</ITEM>
    <MANUFACTURER>ATI</MANUFACTURER>
    <MODEL>All-in-Wonder Pro</MODEL>
    <COST> 160.00</COST>
  </PART>
  <PART>
    <ITEM>Sound Card</ITEM>
    <MANUFACTURER>Creative Labs</MANUFACTURER>
    <MODEL>Sound Blaster Live</MODEL>
    <COST> 80.00</COST>
  </PART>
  <PART>
    <ITEM> inch Monitor</ITEM>
    <MANUFACTURER>LG Electronics</MANUFACTURER>
    <MODEL> 995E</MODEL>
    <COST> 290.00</COST>
  </PART>
</PARTS>
Using a cascading style sheet, we will see the styled output (rendering not reproduced here)
38
XPath
  Used to access part of an XML document
  Compact, non-XML syntax
  Uses a pattern expression to identify nodes in an XML document
  Has a library of standard functions
  W3C Standard
39
XPath Example
  Sample XML
  The root element
    /STATES
  The SCODE of all STATE elements of the STATES element
    /STATES/STATE/SCODE
  All the CAPITAL elements that have a CNAME sub-element with value 'Atlanta', within the STATE elements of the STATES element
    /STATES/STATE/CAPITAL[CNAME='Atlanta']
  All CITIES elements in the XML document
    //CITIES
40
More XPath Examples
  Element AA with two ancestors
    /*/*/AA
  First BB element of the AA element
    /AA/BB[1]
  All the CC elements of the BB elements which have a sub-element A with value '3'
    /BB[A='3']/CC
  Any elements AA or elements CC of elements BB
    //AA | /BB/CC
41
Even More XPath Examples
  Select all sub-elements of elements AA of elements BB
    /BB/AA/*
    Useful when you do not know the sub-elements
    Different from /BB/AA
  Select all attributes named 'aa'
    //@aa
  Select all CITIES elements with an attribute named aa
    //CITIES[@aa]
  Select all CITIES elements with an attribute named aa with value '123'
    //CITIES[@aa = '123']
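Several of the path expressions above can be tried with a short sketch. It assumes Python's `xml.etree.ElementTree`, which supports only a subset of XPath and evaluates paths relative to an element, so `/STATES/...` becomes `./...` on the root; selecting attributes directly (`//@aa`) is not supported there and is emulated with a loop. The sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped to fit the STATES examples.
DOC = """
<STATES>
  <STATE>
    <SCODE>GA</SCODE>
    <CAPITAL><CNAME>Atlanta</CNAME></CAPITAL>
    <CITIES aa="123"><CITY>Savannah</CITY></CITIES>
  </STATE>
  <STATE>
    <SCODE>TX</SCODE>
    <CAPITAL><CNAME>Austin</CNAME></CAPITAL>
    <CITIES><CITY>Dallas</CITY></CITIES>
  </STATE>
</STATES>
"""

root = ET.fromstring(DOC)  # root is the STATES element

# /STATES/STATE/SCODE
scodes = [el.text for el in root.findall("./STATE/SCODE")]

# /STATES/STATE/CAPITAL[CNAME='Atlanta']
atlanta_capitals = root.findall("./STATE/CAPITAL[CNAME='Atlanta']")

# //CITIES, //CITIES[@aa], //CITIES[@aa = '123']
all_cities = root.findall(".//CITIES")
cities_with_aa = root.findall(".//CITIES[@aa]")
cities_aa_123 = root.findall(".//CITIES[@aa='123']")

# //@aa has no ElementTree equivalent, so collect the attribute values by hand.
aa_values = [el.get("aa") for el in root.iter() if el.get("aa") is not None]

print(scodes, len(all_cities), len(cities_with_aa), aa_values)
```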
42
Axis
  Context node
    Evaluation of XPath is from left to right
    The context node is the current node (set) being evaluated
  Axis
    Specifies the relationship of the resulting nodes relative to the context node
  Example:
    /child::AA: AA elements that are children of the root, abbreviated as /AA
    //AA/ancestor::BB: BB elements that are ancestors of any AA element
43
Axes
  ancestor: //BBB/ancestor::*
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
44
Axes
  ancestor: //BBB/ancestor::DDD
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
45
Axes
  attribute: contains all attributes of the current node
    //BBB/attribute::* (abbreviated as //BBB/@*)
    <AAA>
      <BBB aa='1'/>
      <CCC/>
      <BBB aa='2'/>
      <BBB aa='3'/>
      <DDD>
        <BBB bb='31'/>
      </DDD>
      <CCC/>
    </AAA>
    //BBB/attribute::bb
46
Axes
  child
    /AAA/DDD/child::BBB (child can be omitted for abbreviation)
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
47
Axes
  descendant
    /AAA/descendant::*
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
    /AAA/descendant::CCC ?
48
Axes
  parent
    //BBB/parent::*
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
    //BBB/parent::DDD ?
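The parent and ancestor axes can be emulated on the sample tree with a small sketch. It assumes Python's `xml.etree.ElementTree`, whose elements keep no parent pointer and whose path language has no `ancestor::` axis, so a child-to-parent map is built first; `parent_of`, `ancestors`, and `parents_of_tag` are names invented for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
root = ET.fromstring(DOC)

# Map every element to its parent (the root has no entry).
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Yield the ancestors of node, nearest first (the ancestor axis)."""
    while node in parent_of:
        node = parent_of[node]
        yield node

def parents_of_tag(tag):
    """Emulate //tag/parent::* and return each parent once, in document order."""
    seen = []
    for node in root.iter(tag):
        parent = parent_of.get(node)
        if parent is not None and parent not in seen:
            seen.append(parent)
    return seen

inner_bbb = root.find("./DDD/BBB")
print([p.tag for p in parents_of_tag("BBB")])
print([a.tag for a in ancestors(inner_bbb)])
```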
49
Axes
  descendant-or-self
  following
  following-sibling
  preceding
  preceding-sibling
  self
50
Predicates
  Filter an element set
  A predicate is placed inside square brackets ( [ ] )
  Example: //BBB[position() mod 2 = 0]
    <AAA>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <CCC/>
      <CCC/>
      <CCC/>
    </AAA>
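The effect of the position predicate can be sketched by hand. This assumes Python's `xml.etree.ElementTree`, which does not implement `position()` or `mod`, so the filter is written as a loop; the `n` attribute is an addition made here only so the selected nodes can be told apart, and `even_positioned` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Same shape as the example tree: eight BBB children followed by three CCC.
DOC = (
    "<AAA>"
    + "".join(f'<BBB n="{i}"/>' for i in range(1, 9))
    + "<CCC/>" * 3
    + "</AAA>"
)
root = ET.fromstring(DOC)

def even_positioned(parent, tag):
    """Children named tag whose 1-based position among those children is even."""
    same_tag = parent.findall("./" + tag)
    return [el for pos, el in enumerate(same_tag, start=1) if pos % 2 == 0]

# //BBB means "BBB anywhere", so apply the predicate under every parent.
selected = [el for parent in root.iter() for el in even_positioned(parent, "BBB")]

print([el.get("n") for el in selected])
```

The position is counted among the BBB siblings only, which is why the CCC elements do not shift the numbering.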
51
Predicates
  //BBB[@bb='31']
    <AAA>
      <BBB aa='1'/>
      <CCC/>
      <BBB aa='2'/>
      <BBB aa='3'/>
      <DDD>
        <BBB bb='31'/>
      </DDD>
      <CCC/>
    </AAA>
  Is it different from //BBB/attribute::bb?
52
XQuery
  XQuery is a general-purpose query language for XML data
  XQuery uses a for … let … where … return … syntax
    for corresponds to SQL from
    where corresponds to SQL where
    return corresponds to SQL select
    let allows temporary variables, and has no equivalent in SQL
53
FLWR Syntax in XQuery
  Simple FLWR expression in XQuery
    Find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag
      for $x in /bank-2/account
      let $acctno := $x/@account-number
      where $x/balance > 400
      return <account-number> $acctno </account-number>
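To see what each clause of the FLWR expression contributes, the same query can be mimicked in ordinary code. This is a sketch in Python with `xml.etree.ElementTree`, not an XQuery engine; the bank-2 sample data (account numbers, branches, balances) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 data with account-number stored as an attribute.
DOC = """
<bank-2>
  <account account-number="A-101"><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account account-number="A-102"><branch-name>Perryridge</branch-name><balance>400</balance></account>
  <account account-number="A-201"><branch-name>Brighton</branch-name><balance>900</balance></account>
</bank-2>
"""
root = ET.fromstring(DOC)

results = []
for x in root.findall("./account"):             # for $x in /bank-2/account
    acctno = x.get("account-number")            # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:      # where $x/balance > 400
        out = ET.Element("account-number")      # return <account-number> ... </account-number>
        out.text = acctno
        results.append(out)

serialized = [ET.tostring(el, encoding="unicode") for el in results]
print(serialized)
```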
54
Path Expressions and Functions
  The function distinct( ) can be used to remove duplicates in path expression results
  The function document(name) returns the root of the named document
    E.g. document("bank-2.xml")/bank-2/account
  Aggregate functions such as sum( ) and count( ) can be applied to path expression results
55
Joins
  Joins are specified in a manner very similar to SQL
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/account-number = $d/account-number
      and $c/customer-name = $d/customer-name
    return <cust-acct> $c $a </cust-acct>
56
  The same query can be expressed with the selections specified as XPath selections:
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[
               account-number = $a/account-number and
               customer-name = $c/customer-name]
    return <cust-acct> $c $a </cust-acct>
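The join semantics can be made concrete with nested loops: every combination of account, customer, and depositor is considered, and the two equality conditions keep only the matching ones. This is a Python sketch over invented sample data, and it returns name/number pairs rather than building `<cust-acct>` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank data with flat account, customer, and depositor elements.
DOC = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Hayes</customer-name><customer-city>Harrison</customer-city></customer>
  <customer><customer-name>Jones</customer-name><customer-city>Rye</customer-city></customer>
  <depositor><account-number>A-101</account-number><customer-name>Jones</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
"""
root = ET.fromstring(DOC)

pairs = []
for a in root.findall("./account"):              # $a in /bank/account
    for c in root.findall("./customer"):         # $c in /bank/customer
        for d in root.findall("./depositor"):    # $d in /bank/depositor
            same_account = a.findtext("account-number") == d.findtext("account-number")
            same_customer = c.findtext("customer-name") == d.findtext("customer-name")
            if same_account and same_customer:   # the where clause
                pairs.append((c.findtext("customer-name"), a.findtext("account-number")))

print(pairs)
```

Moving the two conditions into a predicate on the depositor path, as in the second formulation, filters the same combinations and gives the same result.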
57
Changing Nesting Structure
  <bank-1>
    for $c in /bank/customer
    return
      <customer>
        $c/*
        for $d in /bank/depositor[customer-name = $c/customer-name],
            $a in /bank/account[account-number=$d/account-number]
        return $a
      </customer>
  </bank-1>
58
XQuery Path Expressions
  $c/text() gives the text content of an element without any subelements/tags
  XQuery path expressions support the "->" operator for dereferencing IDREFs
    Equivalent to the id( ) function of XPath, but simpler to use
    Can be applied to a set of IDREFs to get a set of results
    June 2001 version of the standard has changed "->" to "=>"
59
Sorting in XQuery
  A sortby clause can be used at the end of any expression. E.g. to return customers sorted by name:
    for $c in /bank/customer
    return <customer> $c/* </customer> sortby(name)
60
  Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)
    <bank-1>
      for $c in /bank/customer
      return
        <customer>
          $c/*
          for $d in /bank/depositor[customer-name=$c/customer-name],
              $a in /bank/account[account-number=$d/account-number]
          return <account> $a/* </account> sortby(account-number)
        </customer> sortby(customer-name)
    </bank-1>
61
Application Program Interface
  There are two standard application program interfaces to XML data:
    SAX (Simple API for XML)
      Based on a parser model; the user provides event handlers for parsing events
        E.g. start of element, end of element
      Not suitable for database applications
    DOM (Document Object Model)
      XML data is parsed into a tree representation
      Variety of functions provided for traversing the DOM tree
      E.g.: the Java DOM API provides a Node interface (with subtypes such as Element) with methods
        getParentNode( ), getFirstChild( ), getNextSibling( )
        getAttribute( ), getData( ) (for text node)
        getElementsByTagName( ), …
      Also provides functions for updating the DOM tree
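The difference between the two interfaces can be seen side by side in a short sketch. It uses Python's standard `xml.sax` and `xml.dom.minidom` rather than the Java API named above; Python's DOM exposes `parentNode`, `firstChild`, and `nextSibling` as properties instead of getter methods. The `TagRecorder` class and the sample document are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<bank><account><account-number>A-101</account-number>"
    "<balance>500</balance></account></bank>"
)

class TagRecorder(xml.sax.ContentHandler):
    """SAX style: the parser calls these handlers as it meets each event."""
    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def startElement(self, name, attrs):
        self.starts.append(name)

    def endElement(self, name):
        self.ends.append(name)

handler = TagRecorder()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# DOM style: the whole document becomes a tree that can be navigated freely.
dom = minidom.parseString(DOC)
number = dom.getElementsByTagName("account-number")[0]
number_text = number.firstChild.data       # text node under <account-number>
parent_tag = number.parentNode.tagName     # enclosing element
sibling_tag = number.nextSibling.tagName   # next element at the same level

print(handler.starts, handler.ends)
print(number_text, parent_tag, sibling_tag)
```

SAX keeps nothing in memory beyond what the handlers store, while DOM holds the full tree, which is the trade-off behind the two bullet groups above.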
62
Storage of XML Data
  XML data can be stored in
    Non-relational data stores
      Flat files
        Natural for storing XML
        But has all the problems discussed in Chapter 1 (no concurrency, no recovery, …)
      XML database
        Database built specifically for storing XML data, supporting the DOM model and declarative querying
        Currently no commercial-grade systems
    Relational databases
      Data must be translated into relational form
      Advantage: mature database systems
      Disadvantages: overhead of translating data and queries
63
Storage of XML in Relational Databases
  Alternatives:
    String Representation
    Tree Representation
    Map to relations
64
String Representation
  Store each top-level element as a string field of a tuple in a relational database
    Use a single relation to store all elements, or
    Use a separate relation for each top-level element type
      E.g. account, customer, depositor relations
      Each with a string-valued attribute to store the element
  Indexing:
    Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
      E.g. customer-name or account-number
    Oracle 9 supports function indices, which use the result of a function as the key value
      The function should return the value of the required subelement/attribute
65
String Representation (Cont.)
  Benefits:
    Can store any XML data even without a DTD
    As long as there are many top-level elements in a document, strings are small compared to the full document
      Allows fast access to individual elements
  Drawback: need to parse strings to access values inside the elements
    Parsing is slow
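The string representation, with one subelement value copied out into an indexed column, can be sketched as follows. This assumes Python's standard `sqlite3` and `xml.etree.ElementTree`; the table layout, the index name, and the sample accounts are illustrative choices, not a prescribed schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One relation per top-level element type: the element itself is kept as a string,
# and account_number is the extra field copied out for indexing.
conn.execute("CREATE TABLE account (account_number TEXT, data TEXT)")
conn.execute("CREATE INDEX account_number_idx ON account (account_number)")

elements = [
    "<account><account-number>A-101</account-number><balance>500</balance></account>",
    "<account><account-number>A-102</account-number><balance>400</balance></account>",
]
for text in elements:
    key = ET.fromstring(text).findtext("account-number")
    conn.execute("INSERT INTO account VALUES (?, ?)", (key, text))

def balance_of(account_number):
    """Find the row through the indexed field, then parse the stored string."""
    row = conn.execute(
        "SELECT data FROM account WHERE account_number = ?", (account_number,)
    ).fetchone()
    return ET.fromstring(row[0]).findtext("balance")

print(balance_of("A-102"))
```

The lookup itself uses the index, but reading the balance still requires parsing the stored string, which is the drawback noted above.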
66
Tree Representation
  Tree representation: model XML data as a tree and store it using the relations
      nodes(id, type, label, value)
      child(child-id, parent-id)
    Each element/attribute is given a unique identifier
    Type indicates element/attribute
    Label specifies the tag name of the element/name of the attribute
    Value is the text value of the element/attribute
    The relation child notes the parent-child relationships in the tree
      Can add an extra attribute to child to record the ordering of children
  Example tree:
    bank (id: 1)
      customer (id: 2)
        customer-name (id: 3)
      account (id: 5)
        account-number (id: 7)
67
Tree Representation (Cont.)
  Benefit: can store any XML data, even without a DTD
  Drawbacks:
    Data is broken up into too many pieces, increasing space overheads
    Even simple queries require a large number of joins, which can be slow
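Both the generality and the join cost of the tree representation show up in a small sketch. It assumes Python's `sqlite3` and `xml.etree.ElementTree`; column names use underscores instead of hyphens, the `child_order` column is the optional ordering attribute, and the `store` helper, sample document, and id numbering are choices made here for illustration.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER, child_order INTEGER)")

next_id = itertools.count(1)

def store(element, parent_id=None, order=0):
    """Insert one element, its attributes, and its subtree; return the element's id."""
    node_id = next(next_id)
    value = (element.text or "").strip() or None
    conn.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)", (node_id, element.tag, value))
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (node_id, parent_id, order))
    position = 0
    for name, attr_value in element.attrib.items():
        attr_id = next(next_id)
        conn.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)", (attr_id, name, attr_value))
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (attr_id, node_id, position))
        position += 1
    for sub in element:
        store(sub, node_id, position)
        position += 1
    return node_id

DOC = (
    "<bank><customer><customer-name>Hayes</customer-name></customer>"
    "<account><account-number>A-102</account-number></account></bank>"
)
store(ET.fromstring(DOC))

# /bank/customer/customer-name needs one join pair per path step.
names = conn.execute(
    """
    SELECT n3.value
    FROM nodes n1
    JOIN child c1 ON c1.parent_id = n1.id
    JOIN nodes n2 ON n2.id = c1.child_id
    JOIN child c2 ON c2.parent_id = n2.id
    JOIN nodes n3 ON n3.id = c2.child_id
    WHERE n1.label = 'bank' AND n2.label = 'customer' AND n3.label = 'customer-name'
    """
).fetchall()
print(names)
```

A three-step path already takes four joins here, which illustrates the second drawback.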
68
Mapping XML Data to Relations
  Map to relations
    If the DTD of the document is known, can map data to relations
    A relation is created for each element type
      Elements (of type #PCDATA) and attributes are mapped to attributes of relations
      More details on next slide …
  Benefits:
    Efficient storage
    Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML
  Drawbacks: need to know the DTD, translation overheads still present
69
Mapping XML Data to Relations (Cont.)
  The relation created for each element type contains
    An id attribute to store a unique id for each element
    A relation attribute corresponding to each element attribute
    A parent-id attribute to keep track of the parent element
      As in the tree representation
      Position information (i-th child) can be stored too
  All subelements that occur only once can become relation attributes
    For text-valued subelements, store the text as the attribute value
    For complex subelements, can store the id of the subelement
  Subelements that can occur multiple times are represented in a separate table
    Similar to the handling of multivalued attributes when converting ER diagrams to tables
70
Mapping XML Data to Relations (Cont.)
  E.g. for the bank-1 DTD with account elements nested within customer elements, create relations
    customer(id, parent-id, customer-name, customer-street, customer-city)
      parent-id can be dropped here since the parent is the sole root element
      All other attributes were subelements of type #PCDATA, and occur only once
    account(id, parent-id, account-number, branch-name, balance)
      parent-id keeps track of which customer an account occurs under
      The same account may be represented many times with different parents
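The bank-1 mapping can be tried out with a minimal sketch. It assumes Python's `sqlite3` and `xml.etree.ElementTree`; column names use underscores in place of hyphens, the customer relation omits parent-id as the slide allows, and the sample customers and accounts are invented.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical bank-1 document: account A-102 appears under two customers.
DOC = """
<bank-1>
  <customer>
    <customer-name>Hayes</customer-name>
    <customer-street>Main</customer-street>
    <customer-city>Harrison</customer-city>
    <account><account-number>A-102</account-number><branch-name>Perryridge</branch-name><balance>400</balance></account>
  </customer>
  <customer>
    <customer-name>Jones</customer-name>
    <customer-street>Park</customer-street>
    <customer-city>Rye</customer-city>
    <account><account-number>A-101</account-number><branch-name>Downtown</branch-name><balance>500</balance></account>
    <account><account-number>A-102</account-number><branch-name>Perryridge</branch-name><balance>400</balance></account>
  </customer>
</bank-1>
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT, "
    "customer_street TEXT, customer_city TEXT)"
)
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "account_number TEXT, branch_name TEXT, balance REAL)"
)

root = ET.fromstring(DOC)
for cust in root.findall("./customer"):
    cursor = conn.execute(
        "INSERT INTO customer (customer_name, customer_street, customer_city) VALUES (?, ?, ?)",
        (cust.findtext("customer-name"), cust.findtext("customer-street"),
         cust.findtext("customer-city")),
    )
    customer_id = cursor.lastrowid  # becomes parent_id of the nested accounts
    for acct in cust.findall("./account"):
        conn.execute(
            "INSERT INTO account (parent_id, account_number, branch_name, balance) "
            "VALUES (?, ?, ?, ?)",
            (customer_id, acct.findtext("account-number"),
             acct.findtext("branch-name"), float(acct.findtext("balance"))),
        )

rows = conn.execute(
    "SELECT c.customer_name, a.account_number FROM customer c "
    "JOIN account a ON a.parent_id = c.id "
    "ORDER BY c.customer_name, a.account_number"
).fetchall()
print(rows)
```

Account A-102 ends up as two rows with different parent_id values, matching the remark that the same account may be represented many times.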