5 Processing XML

Page 1: Processing XML

5

Processing XML

Page 2: Processing XML

Overview

Parsing XML documents
Document Object Model (DOM)
Simple API for XML (SAX)
Class generation

Page 3: Processing XML

What's the Problem?

<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>

[Figure: the XML document, a question mark, and a "Book" object with a question mark]

Page 4: Processing XML

Parsing XML Documents

[Figure: the parser reads the document and its DTD / Schema. DOM: the parser builds a document tree. SAX: the application implements DocumentHandler and the parser calls startDocument, startElement, startElement, endElement, endElement, endDocument.]

Page 5: Processing XML

Parser

Project X (Sun Microsystems)
Ælfred (Microstar Software)
XML4J (IBM)
Lark (Tim Bray)
MSXML (Microsoft)
XJ (Data Channel)
Xerces (Apache)
...

Page 6: Processing XML

The Document Object Model

XML Document

<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>

Structure

[Figure: tree with the root "books" and "book" nodes below it. The first "book" has the children title, author, publisher, pages, isbn with the text values "The XML Handbook", "Goldfarb", "Prescod", "Prentice Hall", "655"; further nodes are elided (...).]

Page 7: Processing XML

The Document Object Model

Provides a standard interface for access to and manipulation of XML structures.

Represents documents in the form of a hierarchy of nodes.

Is platform- and programming-language-neutral.

Is a W3C Recommendation (DOM Level 1, October 1, 1998).

Is implemented by many parsers.
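As an illustration of the "hierarchy of nodes", here is a minimal sketch that uses Python's standard `xml.dom.minidom` as one DOM implementation. The language choice, the `outline` helper, and the shortened `BOOKS_XML` sample are assumptions made for this example, not part of the slides.

```python
from xml.dom import minidom

# Shortened copy of the books document (assumption: one book, no whitespace
# between elements, so the tree contains no whitespace-only text nodes).
BOOKS_XML = (
    '<?xml version="1.0"?>'
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "</books>"
)


def outline(node, depth=0):
    """Return one indented line per node of the DOM subtree (hypothetical helper)."""
    label = node.nodeValue if node.nodeType == node.TEXT_NODE else node.nodeName
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


doc = minidom.parseString(BOOKS_XML)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

Every element and every piece of text becomes its own node, so the text "The XML Handbook" sits one level below its `title` element.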

Page 8: Processing XML

DOM - Structure Model

[Figure: the tree of the books document (books, book, title, author, publisher, pages, isbn, ...) annotated with the DOM interfaces Document, Node, NodeList, and Element]

Page 9: Processing XML

The Document Interface

Method                         Result
doctype                        DocumentType
implementation                 DOMImplementation
documentElement                Element
getElementsByTagName(String)   NodeList
createTextNode(String)         Text
createComment(String)          Comment
createElement(String)          Element
createCDATASection(String)     CDATASection
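The following sketch exercises the Document members listed above, again with Python's `xml.dom.minidom` standing in for a DOM implementation (an assumption of this example). The one-book input and the comment text are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<books><book><title>XML Design</title></book></books>")

root = doc.documentElement                  # Element: <books>
titles = doc.getElementsByTagName("title")  # NodeList of all <title> elements
has_doctype = doc.doctype is not None       # False here: the input has no DOCTYPE

# The create* methods only build nodes owned by the document;
# they become part of the tree once they are appended somewhere.
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Spencer"))
note = doc.createComment("added by the application")
raw = doc.createCDATASection("a < b")

book = root.firstChild
for node in (author, note, raw):
    book.appendChild(node)

print(root.toxml())
```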

Page 10: Processing XML

The Node Interface

Method                            Result
nodeName                          String
nodeValue                         String
nodeType                          short
parentNode                        Node
childNodes                        NodeList
firstChild                        Node
lastChild                         Node
previousSibling                   Node
nextSibling                       Node
attributes                        NamedNodeMap
insertBefore(Node new, Node ref)  Node
replaceChild(Node new, Node old)  Node
removeChild(Node)                 Node
hasChildNodes()                   boolean

Page 11: Processing XML

Node Types / Node Names

Result: NodeType / NodeName

Node Fields                   Node Type   Node Name
ELEMENT_NODE                  1           tagName
ATTRIBUTE_NODE                2           name of attribute
TEXT_NODE                     3           "#text"
CDATA_SECTION_NODE            4           "#cdata-section"
ENTITY_REFERENCE_NODE         5           name of entity referenced
ENTITY_NODE                   6           entity name
PROCESSING_INSTRUCTION_NODE   7           target
COMMENT_NODE                  8           "#comment"
DOCUMENT_NODE                 9           "#document"
DOCUMENT_TYPE_NODE            10          document type name
DOCUMENT_FRAGMENT_NODE        11          "#document-fragment"
NOTATION_NODE                 12          notation name
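To see a few rows of this table in practice, the sketch below reads `nodeType` and `nodeName` from several kinds of node. It assumes Python's `xml.dom.minidom`; the small `book` fragment with a comment is invented for the example.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<book><!-- sample --><price currency="USD">44.95</price></book>'
)
book = doc.documentElement
comment, price = book.childNodes  # exactly two children: the comment and <price>

nodes = [
    doc,                                 # the document itself
    book,                                # an element
    comment,                             # a comment
    price.getAttributeNode("currency"),  # an attribute
    price.firstChild,                    # the text inside <price>
]
kinds = [(n.nodeType, n.nodeName) for n in nodes]
for node_type, node_name in kinds:
    print(node_type, node_name)
```

The numeric values are also available as named constants, for example `Node.ELEMENT_NODE`, which reads better in comparisons than a bare `1`.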

Page 12: Processing XML

The NodeList Interface

Method      Result
length      int
item(int)   Node

Page 13: Processing XML

The Element Interface

Method                                     Result
tagName                                    String
getAttribute(String)                       String
setAttribute(String name, String value)    (none)
removeAttribute(String)                    (none)
getAttributeNode(String)                   Attr
setAttributeNode(Attr)                     Attr
removeAttributeNode(Attr)                  Attr
getElementsByTagName(String)               NodeList
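A short sketch of the attribute methods of the Element interface, assuming Python's `xml.dom.minidom`. The `vat` attribute and the `EUR` value are hypothetical and exist only to have something to set and remove.

```python
from xml.dom import minidom

doc = minidom.parseString('<book><price currency="USD">44.95</price></book>')
price = doc.getElementsByTagName("price").item(0)

tag = price.tagName                        # "price"
currency = price.getAttribute("currency")  # "USD"
missing = price.getAttribute("vat")        # "" because the attribute is absent

price.setAttribute("currency", "EUR")      # overwrite an existing attribute
price.setAttribute("vat", "included")      # add a new (hypothetical) attribute

vat_node = price.getAttributeNode("vat")   # the Attr node itself
vat_value = vat_node.value
price.removeAttributeNode(vat_node)        # remove by node
price.removeAttribute("currency")          # remove by name

print(price.toxml())
```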

Page 14: Processing XML

DOM Methods for Navigation

firstChild
lastChild
nextSibling
previousSibling
parentNode
getElementsByTagName
childNodes (length, item())
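Each navigation member can be tried on the books document. This is a sketch under the assumption that Python's `xml.dom.minidom` is used and that the input contains no whitespace between elements; the `steps` dictionary is just a convenient way to collect the results.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>"
)
books = doc.documentElement
first_book = books.firstChild
last_book = books.lastChild
title = first_book.firstChild

steps = {
    "firstChild": first_book.firstChild.nodeName,                       # title
    "lastChild": first_book.lastChild.firstChild.data,                  # Prescod
    "nextSibling": title.nextSibling.firstChild.data,                   # Goldfarb
    "previousSibling": last_book.previousSibling is first_book,         # True
    "parentNode": title.parentNode is first_book,                       # True
    "childNodes.length": books.childNodes.length,                       # 2
    "getElementsByTagName": doc.getElementsByTagName("author").length,  # 3
}
for name, result in steps.items():
    print(name, "->", result)
```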

Page 15: Processing XML

DOM Methods for Manipulation

appendChild
insertBefore
replaceChild
removeChild

createElement
createAttribute
createTextNode
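The manipulation methods can be combined as in the sketch below, which assumes Python's `xml.dom.minidom`. The misspelled title, the `draft` element, and the `lang` attribute are hypothetical and serve only to give each method something to do.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>XML Desing</title><draft/></book>")
book = doc.documentElement
title, draft = book.childNodes

# replaceChild(new, old): swap the misspelled text node for a corrected one
title.replaceChild(doc.createTextNode("XML Design"), title.firstChild)

# createAttribute: build an Attr node, give it a value, attach it
lang = doc.createAttribute("lang")
lang.value = "en"
title.setAttributeNode(lang)

# insertBefore(new, ref): place <author> in front of <draft>
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Spencer"))
book.insertBefore(author, draft)

# appendChild: add <publisher> at the end
publisher = doc.createElement("publisher")
publisher.appendChild(doc.createTextNode("Wrox Press"))
book.appendChild(publisher)

# removeChild: drop the placeholder element
book.removeChild(draft)

print(book.toxml())
```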

Page 16: Processing XML

Example

[Figure: tree with the root "books" and two "book" nodes; the first book has the authors Goldfarb and Prescod, the second book has the author Spencer]

doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data

doc                               DOM object
.documentElement                  root node
.childNodes                       books
.item(0)                          first book
.getElementsByTagName("author")   authors
.item(1)                          second author
.childNodes                       subnodes
.item(0)                          first thereof
.data                             text
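The same access path can be written almost verbatim with Python's `xml.dom.minidom` (an assumption of this sketch; the slides use MSXML). One caveat worth showing: `item(0)` is only the first book if the parser does not keep whitespace between elements. minidom keeps it, so the second half of the sketch parses an indented variant to make the difference visible.

```python
from xml.dom import minidom

# Compact input: no whitespace between elements.
doc = minidom.parseString(
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>"
)
second_author = (
    doc.documentElement.childNodes.item(0)   # first book
    .getElementsByTagName("author").item(1)  # second author element
    .childNodes.item(0).data                 # its text
)
print(second_author)

# Indented input: the first child of <books> is now a whitespace text node.
pretty = minidom.parseString(
    "<books>\n  <book><author>Goldfarb</author></book>\n</books>"
)
first_child = pretty.documentElement.childNodes.item(0)
print(first_child.nodeName)

# A tag-name lookup is not affected by the whitespace nodes.
first_book = pretty.getElementsByTagName("book").item(0)
```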

Page 17: Processing XML

Script

<HTML>
<HEAD><TITLE>DOM Example</TITLE></HEAD>
<BODY>
<H1>DOM Example</H1>
<SCRIPT LANGUAGE="JavaScript">
  var doc, root, book1, authors, author2;
  // MSXML DOM document (Internet Explorer only), loaded synchronously
  doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;
  doc.load("books.xml");
  // errorCode is 0 when the document was parsed without errors
  if (doc.parseError.errorCode != 0)
    alert(doc.parseError.reason);
  else {
    root = doc.documentElement;
    document.write("Name of Root node: " + root.nodeName + "<BR>");
    document.write("Type of Root node: " + root.nodeType + "<BR>");
    book1 = root.childNodes.item(0);
    authors = book1.getElementsByTagName("author");
    document.write("Number of authors: " + authors.length + "<BR>");
    author2 = authors.item(1);
    document.write("Name of second author: " + author2.childNodes.item(0).data);
  }
</SCRIPT>
</BODY>
</HTML>

Page 18: Processing XML

SAX - Simple API for XML

[Figure: the parser reads the document and its DTD and calls the application with the events startDocument, startElement, startElement, endElement, endElement, endDocument]

Page 19: Processing XML

SAX - Simple API for XML

Event-driven parsing model
"Don't call the DOM, the parser calls you."
Developed by the members of the XML-DEV mailing list
Released on May 11, 1998
Supported by many parsers ...
... but Ælfred is the Saxon king.
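The event-driven model can be made concrete with a handler that simply records the calls it receives. This sketch assumes Python's standard `xml.sax`, whose handler base class is named `ContentHandler` (the SAX 2 successor of `DocumentHandler`); the `EventLog` class is a hypothetical name.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Records every callback the parser makes (hypothetical example handler)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement " + name)

    def endElement(self, name):
        self.events.append("endElement " + name)


handler = EventLog()
xml.sax.parseString(b"<books><book></book></books>", handler)
print("\n".join(handler.events))
```

The application never asks the parser for a node; it only reacts to the calls, and no tree is kept in memory.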

Page 20: Processing XML

Procedure

DOM
Creating a parser instance
Parsing the whole document
Processing the DOM tree

SAX
Creating a parser instance
Registering event handlers with the parser
Parser calls the event handlers during parsing
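Both procedures can be sketched side by side on the same task, collecting all author names. The sketch assumes Python's `xml.dom.minidom` and `xml.sax`; the `AuthorCollector` class and the shortened `BOOKS_XML` input are made up for this example.

```python
import io
import xml.sax
from xml.dom import minidom

BOOKS_XML = (
    b"<books>"
    b"<book><author>Goldfarb</author><author>Prescod</author></book>"
    b"<book><author>Spencer</author></book>"
    b"</books>"
)

# DOM: parse the whole document first, then process the finished tree.
# (minidom.parseString creates the parser instance internally.)
doc = minidom.parseString(BOOKS_XML)
dom_authors = [a.firstChild.data for a in doc.getElementsByTagName("author")]


# SAX: the handler collects text while the parser is still reading.
class AuthorCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.authors = []
        self._buffer = None  # a list while inside <author>, otherwise None

    def startElement(self, name, attrs):
        if name == "author":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "author":
            self.authors.append("".join(self._buffer))
            self._buffer = None


parser = xml.sax.make_parser()        # 1. create a parser instance
collector = AuthorCollector()
parser.setContentHandler(collector)   # 2. register the event handler
parser.parse(io.BytesIO(BOOKS_XML))   # 3. the parser calls the handler

print(dom_authors, collector.authors)
```

The DOM variant is shorter because the tree does the bookkeeping; the SAX variant has to track its own state but never holds more than one author name in its buffer.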

Page 21: Processing XML

Namespace Support

<?xml version="1.0"?>
<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books"
       xmlns:cust="http://www.net-standard.com/namespaces/customer">
  ...
  <bk:book>
    <bk:title>XML Handbook</bk:title>
    <bk:isbn>0130811521</bk:isbn>
  </bk:book>
  ....
</order>

Page 22: Processing XML

Access to Qualified Elements

Node "book"                                    DOM Level 2, Interface "Node" (Method)   SAX 2.0, startElement
bk:book                                        nodeName                                 qName
http://www.net-standard.com/namespaces/books   namespaceURI                             uri
bk                                             prefix
book                                           localName                                localName
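The namespace-aware members can be tried on a shortened copy of the order document. This is a sketch that assumes Python's `xml.dom.minidom` and `xml.sax`; `NamespaceLog` is a hypothetical handler name. In Python the SAX 2 callback is called `startElementNS` and receives the URI and local name as a pair; its `qname` argument may be `None` with the default driver, so the sketch does not rely on it.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import feature_namespaces

BOOKS_NS = "http://www.net-standard.com/namespaces/books"
ORDER_XML = (
    b'<order xmlns="http://www.net-standard.com/namespaces/order"'
    b' xmlns:bk="http://www.net-standard.com/namespaces/books">'
    b"<bk:book><bk:title>XML Handbook</bk:title></bk:book>"
    b"</order>"
)

# DOM Level 2: namespace information is available on every node.
doc = minidom.parseString(ORDER_XML)
book = doc.getElementsByTagNameNS(BOOKS_NS, "book").item(0)
dom_view = (book.nodeName, book.namespaceURI, book.prefix, book.localName)


# SAX 2: with namespace processing switched on, the parser reports
# each element as a (uri, localname) pair.
class NamespaceLog(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name
        self.seen.append((uri, localname))


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
log = NamespaceLog()
parser.setContentHandler(log)
parser.parse(io.BytesIO(ORDER_XML))

print(dom_view)
print(log.seen)
```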

Page 23: Processing XML

Generation of Data Structures

Generation: DTD / Schema 'yacht' to Class

01 yacht
   05 name
   05 details
      10 type

Processing: XML document to Object

<?xml version="1.0"?>
<yacht yachtid='147'>
  <name>Mona Lisa</name>
  <image file='yacht147.jpg'/>
  <description> Any text describing this yacht 147</description>
  <details>
    <type>GULFSTAR 55</type>
    <length>1700</length>
    <width>480</width>
    <draft>170</draft>
    <sailsurface>112</sailsurface>
    <motor>84</motor>
    <headroom>202</headroom>
    <bunks>8</bunks>
  </details>
</yacht>

01 yacht
   05 VENTANA
   05 details
      10 GULFSTAR 55
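To give an idea of what a generated class and its population from a document might look like, here is a hand-written sketch in Python. It is not the output of any real generator: the `Yacht` and `Details` classes, the `text_of` helper, and the choice of fields are assumptions that mirror only part of the yacht structure.

```python
from dataclasses import dataclass
from xml.dom import minidom


@dataclass
class Details:
    type: str
    length: int


@dataclass
class Yacht:
    yachtid: str
    name: str
    details: Details


def text_of(parent, tag):
    """Text content of the first <tag> element below parent (hypothetical helper)."""
    return parent.getElementsByTagName(tag).item(0).firstChild.data.strip()


def yacht_from_xml(xml_text):
    """Build a Yacht object from a yacht document."""
    root = minidom.parseString(xml_text).documentElement
    details = root.getElementsByTagName("details").item(0)
    return Yacht(
        yachtid=root.getAttribute("yachtid"),
        name=text_of(root, "name"),
        details=Details(
            type=text_of(details, "type"),
            length=int(text_of(details, "length")),
        ),
    )


yacht = yacht_from_xml(
    "<yacht yachtid='147'><name>Mona Lisa</name>"
    "<details><type>GULFSTAR 55</type><length>1700</length></details></yacht>"
)
print(yacht)
```

The application then works with typed fields such as `yacht.details.length` instead of walking the DOM tree at every access.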

Page 24: Processing XML

Summary

To avoid expensive text processing, applications use an XML parser that creates a DOM tree of a document.

The DOM provides a standardized API to access the content of documents and to manipulate them.

Alternatively or additionally, applications can work event-based using the SAX interface, which is provided by many parsers.