
Processing XML with Java

A comprehensive tutorial about XML processing with Java

XML tutorial of W3Schools

Parsers

• What is a parser?

[Figure: the Input and a Formal grammar are fed to the Parser, which produces Analyzed Data: the structure(s) of the input, according to the atomic elements and their relationships (as described in the grammar)]

XML-Parsing Standards

• We will consider two standard methods for parsing and accessing XML

• DOM (a W3C standard)
  - converts XML into a tree of objects
  - “random access” protocol

• SAX (a de facto standard, not a W3C recommendation)
  - “serial access” protocol
  - event-driven parsing

XML Examples

<?xml version="1.0"?>

<!DOCTYPE countries SYSTEM "world.dtd">

<countries>

<country continent="&as;">

<name>Israel</name>

<population year="2001">6,199,008</population>

<city capital="yes"><name>Jerusalem</name></city>

<city><name>Ashdod</name></city>

</country>

<country continent="&eu;">

<name>France</name>

<population year="2004">60,424,213</population>

</country>

</countries>

world.xml

Callouts:
- countries is the root element
- world.dtd is the validating DTD file
- &as; and &eu; are references to entities

XML Tree Model

[Figure: world.xml drawn as a tree. Legend: element, attribute, simple content. The root element countries has two country children. Each country carries a continent attribute (Asia, Europe), a name (Israel, France) and a population with a year attribute (2001: 6,199,008; 2004: 60,424,213). The city elements carry a capital attribute (yes for Jerusalem, the default no for Ashdod) and a name.]

world.dtd

<!ELEMENT countries (country*)>
<!ELEMENT country (name,population?,city*)>
<!ATTLIST country continent CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT city (name)>
<!ATTLIST city capital (yes|no) "no">
<!ELEMENT population (#PCDATA)>
<!ATTLIST population year CDATA #IMPLIED>
<!ENTITY eu "Europe">
<!ENTITY as "Asia">
<!ENTITY af "Africa">
<!ENTITY am "America">
<!ENTITY au "Australia">

Callouts:
- "no" is the default value of the capital attribute
- #IMPLIED, as opposed to #REQUIRED
- #PCDATA is parsed, CDATA is not parsed

Open world.xml in your browser

Check world2.xml for a #PCDATA example


<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:em>DBI:</xhtml:em>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

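Namespaces (sales.xml)

Callouts:
- xmlns:xhtml="http://www.w3.org/1999/xhtml" is the “xhtml” namespace declaration
- xmlns="http://www.cs.huji.ac.il/~dbi/comments" is a default namespace declaration
- <xhtml:b> inside <comment> shows namespace overriding
- the <![CDATA[ ... ]]> section is (non-parsed) character data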


<?xml version="1.0"?>

<forsale date="12/2/03"

xmlns:xhtml="http://www.w3.org/1999/xhtml">

<book>

<title> <xhtml:h1> DBI </xhtml:h1>

<![CDATA[Where I Learned <xhtml>.]]>

</title>

<comment

xmlns="http://www.cs.huji.ac.il/~dbi/comments">

<par>My <xhtml:b> favorite </xhtml:b> book!</par>

</comment>

</book>

</forsale>

sales.xml

How element names in sales.xml are resolved:

Element <xhtml:h1>
  Namespace: “http://www.w3.org/1999/xhtml”
  Local name: “h1”
  Qualified name: “xhtml:h1”

Element <par>
  Namespace: “http://www.cs.huji.ac.il/~dbi/comments”
  Local name: “par”
  Qualified name: “par”

Element <title>
  Namespace: “”
  Local name: “title”
  Qualified name: “title”

Element <xhtml:b>
  Namespace: “http://www.w3.org/1999/xhtml”
  Local name: “b”
  Qualified name: “xhtml:b”

DOM - Document Object Model

DOM Parser

• DOM = Document Object Model

• Parser creates a tree object out of the document

• User accesses data by traversing the tree
  - The tree and its traversal conform to a W3C standard

• The API allows for constructing, accessing and manipulating the structure and content of XML documents


<?xml version="1.0"?>

<!DOCTYPE countries SYSTEM "world.dtd">

<countries>

<country continent="&as;">

<name>Israel</name>

<population year="2001">6,199,008</population>

<city capital="yes"><name>Jerusalem</name></city>

<city><name>Ashdod</name></city>

</country>

<country continent="&eu;">

<name>France</name>

<population year="2004">60,424,213</population>

</country>

</countries>

The DOM Tree

[Figure: the DOM tree of world.xml. A Document node sits above the root element countries; below it the tree matches the XML tree model: two country elements with their continent attributes (Asia, Europe), name and population children (with year attributes), and city elements with a capital attribute (yes for Jerusalem, the default no for Ashdod) and a name.]

Using a DOM Tree

[Figure: the DOM Parser reads the XML File and builds the DOM Tree in memory; the Application works with the tree through the API]

Creating a DOM Tree

• A DOM tree is generated by a DocumentBuilder

• The builder is generated by a factory, in order to be implementation independent

• The factory is chosen according to the system configuration

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("world.xml");
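The three steps above (factory, builder, document) can be sketched as a complete program. This is a minimal sketch, not the course's own code: the class name CreateDomTreeDemo and the sample XML string are illustrative assumptions, and the input is parsed from memory rather than from world.xml so that the example runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CreateDomTreeDemo {

    // Parses an XML string into a DOM tree and returns the root element's tag name.
    static String rootElementName(String xml) throws Exception {
        // The factory hides which parser implementation is actually used
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, shaped like world.xml
        String xml = "<countries><country><name>Israel</name></country></countries>";
        System.out.println("Root element: " + rootElementName(xml));
    }
}
```

Parsing from an InputSource instead of a file name keeps the sketch independent of the working directory; builder.parse("world.xml") works the same way once the file is present.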

Configuring the Factory

• The methods of the document-builder factory enable you to configure the properties of the document building

• For example
  - factory.setValidating(true)
  - factory.setIgnoringComments(false)

Read more about DocumentBuilderFactory Class, DocumentBuilder Class
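As a small illustration of factory configuration, the sketch below parses the same input twice, once keeping comments and once ignoring them, and counts the children of the root element. The class name FactoryConfigDemo and the sample string are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FactoryConfigDemo {

    // Returns the number of child nodes of the root element,
    // with comment nodes either kept or dropped by the builder.
    static int rootChildCount(String xml, boolean ignoreComments) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(ignoreComments); // configure before creating the builder
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<country><!--israel--><name>Israel</name></country>";
        System.out.println("comments kept:    " + rootChildCount(xml, false));
        System.out.println("comments ignored: " + rootChildCount(xml, true));
    }
}
```

The configuration is applied to the factory, so every builder created afterwards inherits it.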

The Node Interface

• The nodes of the DOM tree include
  - a special root (denoted document)
  - element nodes
  - text nodes and CDATA sections
  - attributes
  - comments
  - and more ...

• Every node in the DOM tree implements the Node interface

• The Document interface returned by builder.parse(…) actually extends the Node interface
  - This is very unintuitive (some would even say this is a bloated design)

Interfaces in a DOM Tree

Node
  - DocumentFragment (a light-weight fragment of the document; can hold several sub-trees)
  - Document
  - CharacterData
      - Text
          - CDATASection
      - Comment
  - Attr
  - Element
  - DocumentType
  - Notation
  - Entity
  - EntityReference
  - ProcessingInstruction

NodeList

NamedNodeMap

Figure as appears in “The XML Companion” - Neil Bradley

Run FragmentVsElement with 1st argument fragment/element

Interfaces in the DOM Tree

[Figure: an example DOM tree in which every node is labelled with its interface. The Document node has a Document Type child and a root Element; the nodes below are labelled Element, Attribute, Text, Entity Reference and Comment.]

Node Navigation

• Every node has a specific location in the tree

• The Node interface specifies methods for tree navigation
  - Node getFirstChild();
  - Node getLastChild();
  - Node getNextSibling();
  - Node getPreviousSibling();
  - Node getParentNode();
  - NodeList getChildNodes();
  - NamedNodeMap getAttributes();

Node Navigation (cont)

[Figure: a node and its neighbours in the tree, with arrows labelled getParentNode(), getPreviousSibling(), getNextSibling(), getFirstChild(), getLastChild() and getChildNodes()]
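The navigation methods can be tried on a small tree. The following is a minimal sketch; the class name NodeNavigationDemo, its helper methods and the sample XML are assumptions introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeNavigationDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Walks the children with getFirstChild()/getNextSibling()
    // and collects the names of the element children.
    static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<country><name>Israel</name><population>6,199,008</population><city/></country>");
        Node country = doc.getDocumentElement();
        System.out.println("children: " + childElementNames(country));
        // Going down and back up lands on the same node
        System.out.println("parent of first child: "
            + country.getFirstChild().getParentNode().getNodeName());
        System.out.println("before last child: "
            + country.getLastChild().getPreviousSibling().getNodeName());
    }
}
```

The sibling loop is the same idiom the EchoWithDom example uses; getChildNodes() with an index loop is an equivalent alternative.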

Node Properties

• Every node has
  - a type
  - a name
  - a value
  - attributes

• The roles of these properties differ according to the node types

• Nodes of different types implement different interfaces (that extend Node)

Notes:
- This is a dubious design and not a very OO approach; it reminds one of simulating OO using unions in C
- Only Element nodes have attributes; some would say this should have been a property of the Element derived class and not of Node
- Very few nodes have both a significant name and a significant value

Names, Values and Attributes

Interface              nodeName                   nodeValue                  attributes
Attr                   name of attribute          value of attribute         null
CDATASection           "#cdata-section"           content of the section     null
Comment                "#comment"                 content of the comment     null
Document               "#document"                null                       null
DocumentFragment       "#document-fragment"       null                       null
DocumentType           doc-type name              null                       null
Element                tag name                   null                       NamedNodeMap
Entity                 entity name                null                       null
EntityReference        name of entity referenced  null                       null
Notation               notation name              null                       null
ProcessingInstruction  target                     entire content             null
Text                   "#text"                    content of the text node   null
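A few rows of the table can be checked directly. This is a minimal sketch under stated assumptions: the class name NodePropertiesDemo, the describe helper and the sample XML are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodePropertiesDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Shows the name and value that a node reports through the Node interface.
    static String describe(Node n) {
        return n.getNodeName() + " -> " + n.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<city capital=\"yes\">Jerusalem<!--a comment--></city>");
        Element city = doc.getDocumentElement();
        System.out.println(describe(doc));                              // #document -> null
        System.out.println(describe(city));                             // city -> null
        System.out.println(describe(city.getAttributeNode("capital"))); // capital -> yes
        System.out.println(describe(city.getFirstChild()));             // #text -> Jerusalem
        System.out.println(describe(city.getLastChild()));              // #comment -> a comment
    }
}
```

Note that the element itself has a null value; its text lives in a separate Text child node.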

Node Types - getNodeType()

ELEMENT_NODE = 1

ATTRIBUTE_NODE = 2

TEXT_NODE = 3

CDATA_SECTION_NODE = 4

ENTITY_REFERENCE_NODE = 5

ENTITY_NODE = 6

PROCESSING_INSTRUCTION_NODE = 7

COMMENT_NODE = 8

DOCUMENT_NODE = 9

DOCUMENT_TYPE_NODE = 10

DOCUMENT_FRAGMENT_NODE = 11

NOTATION_NODE = 12

if (myNode.getNodeType() == Node.ELEMENT_NODE) {
  // process node …
}

Read more about Node Interface
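The type constants are typically used to filter nodes during a traversal. The sketch below counts the nodes of a given type in a tree; the class name NodeTypeCountDemo, its methods and the sample XML are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeCountDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Recursively counts the nodes of the given type in the sub-tree rooted at n.
    // Attribute nodes are not children, so they are never visited here.
    static int count(Node n, short type) {
        int total = (n.getNodeType() == type) ? 1 : 0;
        for (Node child = n.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            total += count(child, type);
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<country><name>Israel</name><city><name>Ashdod</name></city></country>");
        System.out.println("elements:   " + count(doc, Node.ELEMENT_NODE));
        System.out.println("text nodes: " + count(doc, Node.TEXT_NODE));
    }
}
```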

import org.w3c.dom.*;
import javax.xml.parsers.*;

public class EchoWithDom {

  public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Drop ignorable whitespace, e.g. white space used for indentation
    // in elements without mixed content
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("world.xml");
    new EchoWithDom().echo(doc);
  }

  private void echo(Node n) {
    print(n);
    if (n.getNodeType() == Node.ELEMENT_NODE) {
      NamedNodeMap atts = n.getAttributes();
      ++depth;
      for (int i = 0; i < atts.getLength(); i++) echo(atts.item(i));
      --depth;
    }
    depth++;
    // Attribute nodes are not included among the children
    for (Node child = n.getFirstChild(); child != null;
         child = child.getNextSibling()) echo(child);
    depth--;
  }

  private int depth = 0;

  private String[] NODE_TYPES = {
    "", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",
    "ENTITY_REF", "ENTITY", "PROCESSING_INST",
    "COMMENT", "DOCUMENT", "DOCUMENT_TYPE",
    "DOCUMENT_FRAG", "NOTATION" };

  private void print(Node n) {
    for (int i = 0; i < depth; i++) System.out.print(" ");
    System.out.print(NODE_TYPES[n.getNodeType()] + ":");
    System.out.print("Name: " + n.getNodeName());
    System.out.print(" Value: " + n.getNodeValue() + "\n");
  }
}

run EchoWithDom, pay attention to the default values

Another Example

import org.w3c.dom.*;
import javax.xml.parsers.*;

public class WorldParser {

  public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("world.xml");
    printCities(doc);
  }

Another Example (cont)

  public static void printCities(Document doc) {
    // getElementsByTagName searches within all descendants
    NodeList cities = doc.getElementsByTagName("city");
    for (int i = 0; i < cities.getLength(); ++i) {
      printCity((Element) cities.item(i));
    }
  }

  public static void printCity(Element city) {
    Node nameNode = city.getElementsByTagName("name").item(0);
    String cName = nameNode.getFirstChild().getNodeValue();
    System.out.println("Found City: " + cName);
  }
}

run WorldParser

Normalizing the DOM Tree

• Normalizing a DOM Tree has two effects:
  - Combine adjacent textual nodes
  - Eliminate empty textual nodes

• Such nodes are created by node manipulation

• To normalize, apply the normalize() method to the document element
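Both effects of normalization can be observed on a hand-built element. This is a minimal sketch; the class name NormalizeDemo and the helper buildFragmentedName are hypothetical names introduced here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {

    // Builds <name> with three text children: "Jeru", "" and "salem".
    // Fragmentation like this is what node manipulation tends to leave behind.
    static Element buildFragmentedName() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element name = doc.createElement("name");
        doc.appendChild(name);
        name.appendChild(doc.createTextNode("Jeru"));
        name.appendChild(doc.createTextNode(""));
        name.appendChild(doc.createTextNode("salem"));
        return name;
    }

    public static void main(String[] args) throws Exception {
        Element name = buildFragmentedName();
        System.out.println("before: " + name.getChildNodes().getLength() + " text nodes");
        name.normalize(); // merges adjacent text nodes and drops empty ones
        System.out.println("after:  " + name.getChildNodes().getLength() + " text node: "
                + name.getFirstChild().getNodeValue());
    }
}
```

Calling normalize() on the document element applies the same clean-up to the whole tree below it.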

Node Manipulation

• Children of a node in a DOM tree can be manipulated - added, edited, deleted, moved, copied, etc.

• To construct new nodes, use the methods of Document
  - createElement, createAttribute, createTextNode, createCDATASection etc.

• To manipulate a node, use the methods of Node
  - appendChild, insertBefore, removeChild, replaceChild, setNodeValue, cloneNode(boolean deep) etc.
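The construction and manipulation methods can be combined as follows. This is a minimal sketch, assuming a small hand-built tree; the class name NodeManipulationDemo and the helpers buildSample and cityNames are illustrative and not part of the original material.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeManipulationDemo {

    // Builds <country><city><name>Ashdod</name></city></country> from scratch.
    static Document buildSample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element country = doc.createElement("country");
        doc.appendChild(country);
        Element city = doc.createElement("city");
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("Ashdod"));
        city.appendChild(name);
        country.appendChild(city);
        return doc;
    }

    // Text content of every child of the given element, in document order.
    static List<String> cityNames(Element country) {
        List<String> names = new ArrayList<>();
        for (Node c = country.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildSample();
        Element country = doc.getDocumentElement();
        Node city = country.getFirstChild();

        Node shallow = city.cloneNode(false); // copies <city> only
        Node deep = city.cloneNode(true);     // copies <city> and its sub-tree
        System.out.println("shallow has children: " + shallow.hasChildNodes());

        // Edit the deep copy, then insert it before the original
        deep.getFirstChild().getFirstChild().setNodeValue("Jerusalem");
        country.insertBefore(deep, city);
        System.out.println(cityNames(country));

        country.removeChild(city);
        System.out.println(cityNames(country));
    }
}
```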

Node Manipulation (cont)

[Figure: insertBefore places the New node before the Ref node; replaceChild swaps the Old node for the New node; cloneNode with deep = 'false' copies only the node itself, while deep = 'true' copies the node together with its sub-tree]

Figure as appears in “The XML Companion” - Neil Bradley

SAX - Simple API for XML

SAX Parser

• SAX = Simple API for XML

• XML is read sequentially

• When a parsing event happens, the parser invokes the corresponding method of the corresponding handler.

• This is called event-driven programming. Most GUI programs are written using this paradigm.

• The handlers are the programmer’s implementations of standard Java API types (i.e., interfaces and classes)


<?xml version="1.0"?>

<!DOCTYPE countries SYSTEM "world.dtd">

<countries>

<country continent="&as;"> <!--israel-->

<name>Israel</name>

<population year="2001">6,199,008</population>

<city capital="yes"><name>Jerusalem</name></city>

<city><name>Ashdod</name></city>

</country>

<country continent="&eu;">

<name>France</name>

<population year="2004">60,424,213</population>

</country>

</countries>

Parsing events, in the order in which the slides highlight them while the document above is read:

1. Start Document
2. Start Element
3. Start Element
4. Comment
5. Start Element
6. Characters
7. End Element
8. End Element
9. End Document
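The event sequence can be reproduced with a handler that simply records what it is told. This is a minimal sketch rather than the course's code: the class name SaxEventTraceDemo is an assumption, the parser is obtained through JAXP's SAXParserFactory so that the example needs no extra configuration, and comments are left out because ContentHandler has no callback for them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTraceDemo {

    // Parses the XML and returns the events in the order the parser reported them.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String ns, String localName,
                    String qName, Attributes attrs) {
                events.add("startElement " + qName);
            }
            @Override public void endElement(String ns, String localName, String qName) {
                events.add("endElement " + qName);
            }
            @Override public void characters(char[] buf, int offset, int len) {
                // Text may arrive in several chunks; whitespace-only chunks are skipped
                String text = new String(buf, offset, len).trim();
                if (!text.isEmpty()) events.add("characters " + text);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<countries><country><name>Israel</name></country></countries>";
        for (String event : trace(xml)) System.out.println(event);
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is the main contrast with the DOM approach.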

SAX Parsers

[Figure: the XML document (<?xml version="1.0"?>...) is fed to the SAX Parser, which has been given instructions: when you see the start of the document do …; when you see the start of an element do …; when you see the end of an element do …]

[Figure: the components involved in SAX parsing]

- XMLReaderFactory: used to create a SAX Parser
- XMLReader: the parser, reads the XML
- ContentHandler: handles document events: start tag, end tag, etc.
- ErrorHandler: handles Parser Errors
- DTDHandler: handles DTD
- EntityResolver: handles Entities

Creating a Parser

• The SAX interface is an accepted standard

• There are many implementations from many vendors
  - The standard API does not include an actual implementation, but Sun provides one with the JDK

• We would like to be able to change the implementation used, without changing any code in the program
  - How is this done?

Factory Design Pattern

• Have a “factory” class that creates the actual parsers
  - org.xml.sax.helpers.XMLReaderFactory

• The factory checks configurations, mainly the value of a system property, that specify the implementation
  - Can be set outside the Java code: a configuration file, a command-line argument, etc.

• In order to change the implementation, simply change the system property

Read more about XMLReaderFactory Class
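The same factory idea is also available through JAXP. The sketch below obtains an XMLReader from javax.xml.parsers.SAXParserFactory, which recent JDK releases recommend over XMLReaderFactory (the latter is marked deprecated since Java 9). The class name ReaderFactoryDemo is an assumption; the implementation that gets printed depends on the local configuration, for example the javax.xml.parsers.SAXParserFactory system property.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ReaderFactoryDemo {

    // Asks the factory for a namespace-aware reader; the calling code never
    // names a concrete parser class.
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        // The concrete class is an implementation detail chosen by the factory
        System.out.println("Implementation: " + reader.getClass().getName());
    }
}
```

Switching the implementation then means changing configuration outside the program, which is exactly the goal stated above.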

Creating a SAX Parser

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class EchoWithSax {

  public static void main(String[] args) throws Exception {
    // Select the implementation; the Xerces SAXParser class implements XMLReader
    System.setProperty("org.xml.sax.driver",
                       "org.apache.xerces.parsers.SAXParser");
    XMLReader reader = XMLReaderFactory.createXMLReader();
    reader.parse("world.xml");
  }
}

Read more about XMLReader Interface, Xerces SAXParser class

Implementing the Content Handler

• A SAX parser invokes methods such as startDocument, startElement and endElement of its content handler as it runs

• In order to react to parsing events we must:
  - implement the ContentHandler interface
  - set the parser’s content handler with an instance of our ContentHandler implementation

ContentHandler Methods

• startDocument - parsing begins

• endDocument - parsing ends

• startElement - an opening tag is encountered

• endElement - a closing tag is encountered

• characters - character data (text, including the content of CDATA sections) is encountered

• ignorableWhitespace - white spaces that should be ignored (according to the DTD)

• and more ...

Read more about ContentHandler Interface

The Default Handler

• The class DefaultHandler implements all handler interfaces (usually, in an empty manner)
  - i.e., ContentHandler, EntityResolver, DTDHandler, ErrorHandler

• An easy way to implement the ContentHandler interface is to extend DefaultHandler

Read more about DefaultHandler Class

A Content Handler Example

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class EchoHandler extends DefaultHandler {

  int depth = 0;

  public void print(String line) {
    for (int i = 0; i < depth; ++i) System.out.print(" ");
    System.out.println(line);
  }

A Content Handler Example

  public void startDocument() throws SAXException {
    print("BEGIN");
  }

  public void endDocument() throws SAXException {
    print("END");
  }

  // The Attributes interface will be discussed later
  public void startElement(String ns, String localName,
      String qName, Attributes attrs) throws SAXException {
    print("Element " + qName + "{");
    ++depth;
    for (int i = 0; i < attrs.getLength(); ++i)
      print(attrs.getLocalName(i) + "=" + attrs.getValue(i));
  }

A Content Handler Example

  public void endElement(String ns, String lName,
      String qName) throws SAXException {
    --depth;
    print("}");
  }

  public void characters(char buf[], int offset, int len)
      throws SAXException {
    String s = new String(buf, offset, len).trim();
    ++depth; print(s); --depth;
  }
}

Fixing The Parser

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class EchoWithSax {

  public static void main(String[] args) throws Exception {
    XMLReader reader = XMLReaderFactory.createXMLReader();
    // What would happen without this line? Run EchoWithSax2 to find out
    reader.setContentHandler(new EchoHandler());
    reader.parse("world.xml");
  }
}

run EchoWithSax

Empty Elements

• What do you think happens when the parser parses an empty element?

<rating stars="five" />

run EchoWithSax3
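One way to answer the question is to record the callbacks for the empty element. This is a minimal sketch, assuming a reader obtained through SAXParserFactory; the class name EmptyElementDemo is made up for this example and is not the EchoWithSax3 program mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EmptyElementDemo {

    // Records the element callbacks the parser makes for the given XML.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String ns, String localName,
                    String qName, Attributes attrs) {
                log.add("start " + qName + " stars=" + attrs.getValue("stars"));
            }
            @Override public void endElement(String ns, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        // An empty element still produces a start event followed by an end event
        System.out.println(events("<rating stars=\"five\" />"));
    }
}
```

From the handler's point of view, <rating stars="five" /> and <rating stars="five"></rating> are indistinguishable.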

Attributes Interface

• The Attributes interface provides access to all attributes of an element
  - getLength(), getQName(i), getValue(i), getType(i), getValue(qname), etc.

• The following are possible types for attributes: CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION

• There is no distinction between attributes that are specified explicitly and those that take a default value from the DTD

run EchoWithSax and check the “capital” attribute, compare to the XML source

Read more about Attributes Interface
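The point about default values can be seen with a document that carries its DTD inline. This is a sketch under stated assumptions: the class name AttributeDefaultsDemo and the SAMPLE document are invented for illustration (a cut-down variant of world.xml with an internal DTD subset), and it relies on the parser reading the internal subset, which the JDK's built-in parser does in its default configuration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDefaultsDemo {

    // Hypothetical sample: the second city does not specify "capital",
    // so its value comes from the default declared in the DTD.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE country [\n"
        + "  <!ELEMENT country (city*)>\n"
        + "  <!ELEMENT city (#PCDATA)>\n"
        + "  <!ATTLIST city capital (yes|no) \"no\">\n"
        + "]>\n"
        + "<country><city capital=\"yes\">Jerusalem</city><city>Ashdod</city></country>";

    // Collects the value of the "capital" attribute of every city element.
    static List<String> capitalValues(String xml) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String ns, String localName,
                    String qName, Attributes attrs) {
                if (qName.equals("city")) values.add(attrs.getValue("capital"));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(capitalValues(SAMPLE));
    }
}
```

The handler receives the defaulted value through the same getValue call as the explicit one, so it cannot tell the two cases apart from the Attributes interface alone.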

Page 63: 1 Processing XML with Java A comprehensive tutorial about XML processing with JavaXML processing with Java XML tutorial of W3SchoolsW3Schools

63

ErrorHandler Interface

• We implement ErrorHandler to receive error events (similar to implementing ContentHandler)

• DefaultHandler already implements ErrorHandler (warning() and error() do nothing, fatalError() throws the exception), so we can extend it (as before)

• An ErrorHandler is registered with
- reader.setErrorHandler(handler);

• Three methods:
- void error(SAXParseException ex);
- void fatalError(SAXParseException ex);
- void warning(SAXParseException ex);
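A minimal sketch of registering an ErrorHandler might look as follows. The names ErrorHandlerDemo and CollectingHandler are hypothetical, and the reader is obtained through SAXParserFactory as an assumption. The handler records which of the three methods was called while parsing a document that is not well formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorHandlerDemo {

    /** Records the kind of every error event it receives. */
    static class CollectingHandler extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void warning(SAXParseException ex) { seen.add("warning"); }

        @Override
        public void error(SAXParseException ex) { seen.add("error"); }

        @Override
        public void fatalError(SAXParseException ex) throws SAXException {
            seen.add("fatal");
            throw ex; // stop parsing, as the default implementation does
        }
    }

    /** Parses the text and returns the kinds of error events that were reported. */
    static List<String> parse(String xml) throws Exception {
        CollectingHandler handler = new CollectingHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException ex) {
            // Parsing was aborted by the fatal error; the handler has already recorded it.
        }
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<a><b></b></a>")); // well formed
        System.out.println(parse("<a><b></a>"));     // <b> is never closed
    }
}
```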


Parsing Errors

• Fatal errors prevent the parser from continuing to parse
- For example, the document is not well formed, an unknown XML version is declared, etc.

• Errors (that is, recoverable ones) occur, for example, when the parser is validating and validity constraints are violated

• Warnings occur when abnormal (yet legal) conditions are encountered
- For example, an entity is declared twice in the DTD
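To make the difference between recoverable and fatal errors concrete, here is a small sketch that parses a well-formed but invalid document once with validation and once without. ValidationErrorDemo and Counter are illustrative names, and validation is switched on through SAXParserFactory.setValidating(), which is an assumption of this example rather than something shown on the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationErrorDemo {

    /** Counts recoverable errors and notes whether the end of the document was reached. */
    static class Counter extends DefaultHandler {
        int errors = 0;
        boolean finished = false;

        @Override
        public void error(SAXParseException ex) { errors++; }

        @Override
        public void endDocument() { finished = true; }
    }

    static Counter parse(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Counter counter = new Counter();
        reader.setContentHandler(counter);
        reader.setErrorHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Well formed, but the internal DTD requires <a> to contain a <b>.
        String xml = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a/>";
        Counter validating = parse(xml, true);
        System.out.println("validating: errors=" + validating.errors
                + ", finished=" + validating.finished);
        Counter plain = parse(xml, false);
        System.out.println("non-validating: errors=" + plain.errors
                + ", finished=" + plain.finished);
    }
}
```

The validating run reports the violation through error() and still reaches the end of the document, which is what "recoverable" means here.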

Read more about ErrorHandler Interface


EntityResolver and DTDHandler

• The interface EntityResolver enables the programmer to specify a new source for the translation of external entities (e.g., an external DTD)

• The interface DTDHandler enables the programmer to react to declarations of notations and unparsed entities inside the DTD
- These usually appear with external non-XML resources and describe their type

Read more about EntityResolver Interface, DTDHandler Interface
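One way an EntityResolver can be used is sketched below: every external entity request is recorded and answered with an empty replacement, so the external DTD file is never read. EntityResolverDemo and requestedSystemIds() are hypothetical names, and replacing the DTD with empty content is only one possible policy, chosen here to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityResolverDemo {

    /** Parses the text and returns the system IDs the parser asked the resolver for. */
    static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Supply an empty replacement instead of letting the parser open the file.
            return new InputSource(new StringReader(""));
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE countries SYSTEM \"world.dtd\"><countries/>";
        // The parser may report the system ID as an absolute URI.
        System.out.println(requestedSystemIds(xml));
    }
}
```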


Features and Properties

• SAX parsers can be configured by setting their features and properties

• Syntax:
- reader.setFeature("feature-url", boolean)
- reader.setProperty("property-url", Object)

• Standard feature URLs have the form: http://xml.org/sax/features/feature-name

• Standard property URLs have the form: http://xml.org/sax/properties/prop-name

• Using URLs is a (confusing) technique used to ensure unique names for non-standard features and properties added by different implementations. These URLs do not represent web addresses!
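The calls above can be sketched as follows. FeatureDemo, configuredReader() and isRecognized() are illustrative names, and the feature name under example.org is deliberately made up to show how an unknown name is rejected.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureDemo {

    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Creates a reader with namespace processing on and validation off. */
    static XMLReader configuredReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);
        reader.setFeature(VALIDATION, false);
        return reader;
    }

    /** Returns false if the reader does not know or cannot report the given feature name. */
    static boolean isRecognized(XMLReader reader, String featureName) {
        try {
            reader.getFeature(featureName);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = configuredReader();
        System.out.println("namespaces: " + reader.getFeature(NAMESPACES));
        System.out.println("validation: " + reader.getFeature(VALIDATION));
        // A made-up name: the string is only an identifier, nothing is fetched from the web.
        System.out.println("unknown recognized? "
                + isRecognized(reader, "http://example.org/no-such-feature"));
    }
}
```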


Feature/Property Examples

• Features:
- namespaces - are namespaces supported?
- validation - does the parser validate (against the declared DTD)?
- http://apache.org/xml/features/nonvalidating/load-external-dtd - should the external DTD be loaded, or ignored? (specific to the Xerces implementation)

• Properties:
- xml-string - the actual text that caused the current event (read-only, with getProperty())
- lexical-handler - see the next slide...

Read more about Features

Read more about Properties
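As an illustration of the Xerces-specific feature, the sketch below turns load-external-dtd off and then parses a document whose DTD file does not exist. It assumes a Xerces-based parser (such as the one bundled with the JDK); other implementations may reject the feature name. SkipExternalDtd and rootName() are names invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalDtd {

    // Xerces-specific feature name; not part of the SAX standard (assumption: Xerces-based parser).
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the text without loading the external DTD and returns the root element name. */
    static String rootName(String xml) throws Exception {
        String[] root = new String[1];
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(LOAD_EXTERNAL_DTD, false);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first (outermost) element
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // The referenced DTD file is never opened because the feature is off.
        System.out.println(rootName("<!DOCTYPE countries SYSTEM \"no-such-file.dtd\"><countries/>"));
    }
}
```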


Lexical Events

• Lexical events have to do with the way that a document was written and not with its content

• Examples:
- A comment is a lexical event (<!-- comment -->)
- The use of an entity is a lexical event (&gt;)

• These can be dealt with by implementing the LexicalHandler interface, and setting the lexical-handler property to an instance of the handler


LexicalHandler Methods

• comment(char[] ch, int start, int length)

• startCDATA()

• endCDATA()

• startEntity(java.lang.String name)

• endEntity(java.lang.String name)

• and more...
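A small sketch of receiving lexical events: the handler extends DefaultHandler2 (from org.xml.sax.ext), which already implements LexicalHandler, and is registered through the lexical-handler property. LexicalDemo and lexicalEvents() are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalDemo {

    /** Parses the text and returns the comment and CDATA events in the order reported. */
    static List<String> lexicalEvents(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                log.add("comment:" + new String(ch, start, length));
            }

            @Override
            public void startCDATA() { log.add("startCDATA"); }

            @Override
            public void endCDATA() { log.add("endCDATA"); }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Lexical events are delivered only if a handler is set through this property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lexicalEvents("<a><!-- comment --><![CDATA[x < y]]></a>"));
    }
}
```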

Read more about LexicalHandler Interface


SAX vs. DOM


Parser Efficiency

• The DOM object built by DOM parsers is usually complicated (composed of many, many objects) and requires more memory storage than the XML file itself.
- A lot of time is spent on construction before use.
- For some very large documents, this may be impractical.

• SAX parsers store only local information that is encountered during the serial traversal and do not create many temporary objects during traversal.

• Hence, programming with SAX parsers is, in general, more efficient.
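The point about local information can be sketched with a handler whose entire state is a single counter, regardless of document size. CountWithSax and count() are names made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CountWithSax {

    /** Counts the elements with the given name; the handler keeps only one integer of state. */
    static int count(String xml, String elementName) throws Exception {
        int[] counter = {0};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    counter[0]++;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return counter[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<countries><country/><country/><country/></countries>";
        System.out.println(count(xml, "country"));
    }
}
```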


Programming using SAX is Difficult

• In some cases, programming with SAX might seem difficult at first glance:
- How can we find, using a SAX parser, elements e1 with ancestor e2?
- How can we find, using a SAX parser, elements e1 that have a descendant element e2?
- How can we find the element e1 referenced by the IDREF attribute of e2?

• In other cases, using SAX can be more elegant.
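For the first question (elements e1 with ancestor e2), one possible approach is to maintain a stack of the currently open elements inside the handler. The sketch below, with the hypothetical names AncestorWithSax and namesInsideCity(), collects the text of every name element that has a city ancestor. It assumes name elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class AncestorWithSax {

    /** Returns the text of every <name> element that has a <city> ancestor. */
    static List<String> namesInsideCity(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> open = new ArrayDeque<>(); // currently open elements
            private StringBuilder text; // non-null while inside a matching <name>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("name") && open.contains("city")) {
                    text = new StringBuilder();
                }
                open.push(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                open.pop();
                if (text != null && qName.equals("name")) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<country><name>Israel</name>"
                + "<city><name>Jerusalem</name></city>"
                + "<city><name>Ashdod</name></city></country>";
        System.out.println(namesInsideCity(xml));
    }
}
```

The bookkeeping (stack and text buffer) is exactly the state that a DOM tree would otherwise hold for us.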


Node Navigation

• SAX parsers do not provide access to elements other than the one currently visited in the serial (DFS) traversal of the document

• In particular,
- They do not read backwards
- They do not enable access to elements by ID or name

• DOM parsers enable any traversal method

• Hence, using DOM parsers can be more comfortable
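By contrast, a DOM tree can be searched by name and walked upwards. The sketch below looks up a city by name and then moves back to its parent country. DomNavigation and countryOfCity() are illustrative names, and the sample document (without whitespace between elements) is an assumption made to keep the navigation simple.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomNavigation {

    /** Returns the name of the country that contains the given city, or null if not found. */
    static String countryOfCity(String xml, String cityName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList cities = doc.getElementsByTagName("city"); // access by name
        for (int i = 0; i < cities.getLength(); i++) {
            Element city = (Element) cities.item(i);
            String name = city.getElementsByTagName("name").item(0).getTextContent();
            if (name.equals(cityName)) {
                // Walk backwards to the enclosing <country>; its own <name> comes first.
                Element country = (Element) city.getParentNode();
                return country.getElementsByTagName("name").item(0).getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<countries>"
                + "<country><name>Israel</name><city><name>Jerusalem</name></city></country>"
                + "<country><name>France</name><city><name>Paris</name></city></country>"
                + "</countries>";
        System.out.println(countryOfCity(xml, "Paris"));
    }
}
```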


More DOM Advantages

• A DOM object is, in effect, compiled XML

• You can save time and effort if you send and receive DOM objects instead of XML files
- But DOM objects are generally larger than the source

• DOM parsers provide a natural integration of XML reading and manipulating
- e.g., “cut and paste” of XML fragments
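A "cut and paste" can be sketched in a few lines of DOM code: appending a node that is already in the tree first removes it from its old position. DomCutAndPaste, moveFirstCity() and cityCount() are hypothetical helper names for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomCutAndPaste {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Moves the first <city> found under 'from' to the end of 'to'. */
    static void moveFirstCity(Element from, Element to) {
        Node city = from.getElementsByTagName("city").item(0);
        if (city != null) {
            to.appendChild(city); // the node is detached from 'from' before being appended
        }
    }

    static int cityCount(Element element) {
        return element.getElementsByTagName("city").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<countries>"
                + "<country><city>A</city></country>"
                + "<country><city>B</city></country>"
                + "</countries>");
        NodeList countries = doc.getElementsByTagName("country");
        Element first = (Element) countries.item(0);
        Element second = (Element) countries.item(1);
        moveFirstCity(first, second);
        System.out.println(cityCount(first) + " " + cityCount(second));
    }
}
```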


Which should we use? DOM vs. SAX

• If your document is very large and you only need a few elements – use SAX

• If you need to manipulate (i.e., change) the XML – use DOM

• If you need to access the XML many times – use DOM (assuming the file is not too large)

• Depending on the task at hand, it might be easier for you to visualize it implemented using one of the APIs and not the other – so use it!