Before today’s lecture
• Personal Project
– Due date (including a demo of your work): 4/12
– Grading scheme:

Part            Items                                                      Weight
Application     All XML documents, schema documents, application source    50%
                code, web-based interfaces, other source code
Paper           Project paper                                              40%
Demonstration   Design layout and functionalities                          10%
Before today’s lecture (cont.)
• Final Project
– Group members:
  • Deadline for forming your group: before 4/10
  • Send the list of your group members to 尚純 or 紹楷
  • For those who can’t form a team, we’ll make a group for you. The group members will be posted on 4/12
  • If you want to make a change, the deadline is 4/15
– Project topics:
  • Topics will be posted on the web; pick one and send your choice to 尚純 or 紹楷
  • Alternatively, send a proposal for a topic of your own
  • The proposal should include reference information on the topic and the scope of the project
• Teaching assistants: 吳尚純 (knaughtally@hotmail.com), 李紹楷 (inses1982@yahoo.com.tw)
Simple API for XML (SAX)
Is SAX too hard for mortal programmers? And is the domination of DOM a bad thing?
• Introduction
• XML Parsing Operations
• The SAX API
• How SAX Processing Works
• SAX-based parsers
• Events
• A SAX Example: Step by Step
• Example (SAX1.0): Tree Diagram
• SAX 2.0
• Example: Printing the notes in an XML document
• Summary
Introduction
• Processing XML
– Create a parser object, point the object to an XML document, then process the document
• Basic operations for processing an XML document
– A basic XML processing architecture
– 3 key layers: the XML documents, the application, and the infrastructure for working with XML documents
[Figure: basic XML processing architecture. Labels: XML document(s), character stream, parser, serializer, standardized XML APIs, application]
Introduction (cont.)
• Basic operations (cont.)
– Parsing is the first step that enables an application to work with an XML document
– The parsing process breaks up the text of an XML document into small identifiable pieces (nodes)
– The parser recognizes these pieces as start and end tags, attribute-value pairs, chunks of text content, processing instructions, comments, and so on
– These pieces are fed to the application through well-defined APIs that implement a particular parsing model
Introduction (cont.)
• Basic operations (cont.)
– Four parsing models are commonly in use:
1. Pull parsing
① The application always asks the parser to give it the next piece of information
② It is as if the application has to “pull” the information out of the parser; the application drives the communication
③ The XML community has not yet defined standard APIs for pull parsing
④ That could happen soon because of its popularity!
2. Push parsing
① The parser sends notifications to the application during the parsing process
② The notifications are sent in “reading” order (i.e., the order in which the items appear in the document)
Introduction (cont.)
• Basic operations (cont.)
2. Push parsing (cont.)
③ Notifications are typically implemented as event callbacks in the application
④ This is known as event-based parsing
⑤ Simple API for XML (SAX) is the standard for push parsing
3. One-step parsing
① The parser reads the whole XML document and generates a data structure (a parse tree) describing its entire contents (elements, attributes, etc.)
② W3C standard: the XML DOM (Document Object Model) specifies the types of objects that will be included in the parse tree, their properties, and their operations
③ The DOM is a language- and platform-independent API
④ Its biggest problems are memory overhead and computational cost
Introduction (cont.)
• Basic operations (cont.)
4. Hybrid parsing
① Combines characteristics of the other parsing models to create efficient parsers for special scenarios
② To see how, separate the concept of loading from that of parsing:
– Loading the document: to the application, this looks like one-step parsing
– Parsing the rest of the document: done on demand, providing the application with partial information extracted from the document
③ For example, push + one-step parsing:
– The application thinks it is working with a one-step parser; in reality, the parsing process has just begun
– As the application keeps accessing more objects in the DOM tree, the parsing continues incrementally
– Just enough of the XML document is parsed at any given point to give the application the objects it wants to see
An example of hybrid parsing
• In Sun's reference implementation, the DOM API builds on the SAX API, as shown in the diagram.
• Sun's implementation of the Document Object Model (DOM) API uses the SAX libraries to read in XML data and construct the tree of data objects that constitutes the DOM.
• Sun's implementation also provides a framework to help output the object tree as XML data.
Introduction (cont.)
• Why define so many models?
– There are trade-offs between memory efficiency, computational efficiency, and ease of programming
– The following table compares the trade-offs of the models

Model             Control of parsing   Control of context   Memory efficiency   Computational efficiency   Ease of programming
Pull              Application          Application          High                Highest                    Low
Push (SAX)        Parser               Application          High                High                       Low
One-step (DOM)    Parser               Parser               Lowest              Lowest                     High
One-step (JDOM)   Parser               Parser               Low                 Low                        Highest
Hybrid (DOM)      Parser               Parser               Medium              Medium                     High
Hybrid (JDOM)     Parser               Parser               Medium              Medium                     Highest
Introduction (cont.)
• How to choose between SAX and DOM: the choice depends on several factors:
– Purpose of the application:
  • If you need to make changes to the data and output it as XML, then in most cases DOM is the way to go.
  • Doing this with SAX is much more complex to program, as you'd have to make changes to a copy of the data rather than to the data itself.
– Amount of data: for large files, SAX is a better bet.
– How the data will be used: if only a small amount of the data will actually be used, you may be better off using SAX to extract it into your application. On the other hand, if you know that you will need to refer back to large amounts of information that has already been processed, SAX is probably not the right choice.
– The need for speed: SAX implementations are normally faster than DOM implementations.
• It's important to remember that SAX and DOM are not mutually exclusive:
  • You can use DOM to create a stream of SAX events.
  • You can use SAX to create a DOM tree.
  • In fact, most parsers used to create DOM trees actually use SAX to do it!
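The point that SAX and DOM are not mutually exclusive can be sketched in both directions with the JAXP transformation API (javax.xml.transform), which is one possible bridge and not necessarily what the slides had in mind. The class name SaxDomBridge and its method names are hypothetical; the sample document is the one used later in the deck.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name SaxDomBridge is not from the slides.
final class SaxDomBridge {

    // SAX -> DOM: events from a SAX reader are collected into a DOM tree.
    static Document saxToDom(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult();
            identity().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // DOM -> SAX: the in-memory tree is replayed as SAX events to a handler.
    static List<String> domToSaxElementNames(Document doc) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        };
        try {
            identity().transform(new DOMSource(doc), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    // A Transformer created without a stylesheet copies its input unchanged.
    private static Transformer identity() throws Exception {
        return TransformerFactory.newInstance().newTransformer();
    }

    public static void main(String[] args) {
        Document doc = saxToDom("<samples><server>UNIX</server><monitor>color</monitor></samples>");
        System.out.println(domToSaxElementNames(doc));
    }
}
```

The identity transformer is used here only because it accepts either kind of source and result; a DOM builder that reads SAX events directly would serve the same purpose.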
The SAX APIs
• SAX (the Simple API for XML)
– SAX was originally a Java-only API
– SAX was the first widely adopted API for XML in Java, and is a “de facto” standard
– The current version is SAX 2.0.x, and there are versions for several programming language environments other than Java
– It is another method for accessing an XML document’s contents
– It was developed by members of the XML-DEV mailing list
– It uses an event-based model
  • Notifications (events) are raised as the document is parsed
The SAX APIs (cont.)
• SAX parsing architecture: uses the common abstract factory design pattern
1. Create an instance of SAXParserFactory, which is used to create an instance of SAXParser
2. The SAXParser wraps a SAXReader (in SAX 2.0 this is an org.xml.sax.XMLReader, available through getXMLReader()), which is the event trigger: when the parse() method is invoked, the reader starts firing events to the application by invoking the registered callbacks
3. Those callback methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver
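The three parts of this architecture can be sketched as follows. This is a minimal illustration, assuming the JAXP classes bundled with the JDK; the class name ElementLister and the method list() are hypothetical and only collect element names so that the wiring has something to show.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; ElementLister is not a name from the slides.
final class ElementLister extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    // Callback defined by ContentHandler; DefaultHandler supplies the rest.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        names.add(qName);
    }

    static List<String> list(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance(); // 1. the abstract factory
            SAXParser parser = factory.newSAXParser();                 //    creates the parser
            XMLReader reader = parser.getXMLReader();                  // 2. the wrapped reader
            ElementLister handler = new ElementLister();
            reader.setContentHandler(handler);                         // 3. register the callbacks
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));      // events start firing here
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(list("<priceList><coffee><name>Mocha Java</name></coffee></priceList>"));
    }
}
```

Calling SAXParser.parse(source, handler) directly, as the later slides do, performs the same registration internally; going through the reader just makes each step visible.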
The SAX APIs (cont.)
• Here is a summary of the key objects in the SAX APIs:
• SAXParserFactory
Creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory
• SAXParser
Defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
• SAXReader
Carries on the conversation with the SAX event handlers you define
The SAX APIs (cont.)
• DefaultHandler
Implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you're interested in.
• ContentHandler
Defines methods such as startDocument, endDocument, startElement, and endElement, which are invoked when the corresponding markup is recognized, as well as characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
• ErrorHandler
Defines the methods warning, error, and fatalError, which are invoked in response to various parsing errors.
• DTDHandler
Defines methods you will generally never be called upon to use. Used when processing a DTD to recognize and act on declarations for an unparsed entity.
The SAX APIs (cont.)
• Being event-based means that the parser reads an XML document from beginning to end.
• Each time it recognizes a syntax construct, it notifies the application that is running it.
• The SAX parser notifies the application by calling methods of the ContentHandler interface.
• For example, when the parser recognizes a start tag (which begins with the less-than symbol, "<"), it calls the startElement method;
• when it comes to character data, it calls the characters method;
• when it recognizes an end tag (which begins with the less-than symbol followed by a slash, "</"), it calls the endElement method.
• To illustrate, let's look at an example XML document and walk through what the parser does for each line.
How SAX Processing Works
• SAX analyzes an XML stream as it goes by, much like an old ticker tape.
• Consider the following XML code snippet:

    <?xml version="1.0"?>
    <samples>
      <server>UNIX</server>
      <monitor>color</monitor>
    </samples>

• A SAX processor analyzing this code snippet would generate, in general, the following events:

    Start document
    Start element (samples)
    Characters (white space)
    Start element (server)
    Characters (UNIX)
    End element (server)
    Characters (white space)
    Start element (monitor)
    Characters (color)
    End element (monitor)
    Characters (white space)
    End element (samples)
    End document
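The event sequence for the samples snippet can be reproduced with a small handler that records each callback. This is a sketch only: the class name EventTrace is hypothetical, and it buffers character data until the next tag because a SAX parser is allowed to deliver one run of text in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records SAX events in the order they arrive.
final class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Emit one "Characters" entry for the text collected since the last tag.
    private void flushText() {
        if (text.length() == 0) {
            return;
        }
        String s = text.toString();
        events.add(s.trim().isEmpty() ? "Characters (white space)" : "Characters (" + s + ")");
        text.setLength(0);
    }

    @Override
    public void startDocument() {
        events.add("Start document");
    }

    @Override
    public void endDocument() {
        events.add("End document");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        events.add("Start element (" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("End element (" + qName + ")");
    }

    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n<samples>\n  <server>UNIX</server>\n"
                + "  <monitor>color</monitor>\n</samples>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```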
How SAX Processing Works (cont.)
• The SAX API allows a developer to capture these events and act on them
– What does “the developer” stand for here?
• SAX processing involves the following steps:
1. Create an event handler.
2. Create the SAX parser.
3. Assign the event handler to the parser.
4. Parse the document, sending each event to the handler.
How SAX Processing Works (cont.)
• The pros and cons of event-based processing
– The advantages of this kind of processing are much like the advantages of streaming media. (Like an interpreter?)
– Analysis can get started immediately, rather than waiting for all of the data to be processed.
– Because the application simply examines the data as it goes by, it doesn't need to store it in memory.
– This is a huge advantage when it comes to large documents.
How SAX Processing Works (cont.)
• The pros and cons of event-based processing (cont.)
– In fact, an application doesn't even have to parse the entire document; it can stop when certain criteria have been satisfied.
– In general, SAX is also much faster than the alternative, the DOM.
– On the other hand, because the application is not storing the data in any way, it is impossible to make changes to it using SAX, or to move backwards in the data stream.
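Stopping before the end of the document is commonly done by throwing a SAXException from a callback once the criterion is met; the slides do not show this, so the following is a sketch of one common idiom. The class name FirstValueFinder and its fields are hypothetical, and the handler assumes the target element contains only text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: finds the text of the first element with a given name,
// then aborts the parse instead of reading the rest of the document.
final class FirstValueFinder extends DefaultHandler {
    private final String target;
    private final StringBuilder text = new StringBuilder();
    private boolean inTarget;
    String value;       // text of the first matching element, or null
    int elementsSeen;   // how many start tags were reported before stopping

    FirstValueFinder(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elementsSeen++;
        inTarget = qName.equals(target);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTarget) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inTarget && qName.equals(target)) {
            value = text.toString();
            throw new SAXException("stop: value found"); // the criterion is satisfied
        }
    }

    static FirstValueFinder run(String xml, String target) {
        FirstValueFinder handler = new FirstValueFinder(target);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (handler.value == null) {
                throw new IllegalStateException(e); // a genuine parse error, not our stop signal
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        FirstValueFinder f = run("<samples><server>UNIX</server><monitor>color</monitor></samples>", "server");
        System.out.println(f.value + " after " + f.elementsSeen + " elements");
    }
}
```

Because events are delivered while the document is being read, the monitor element in the sample is never reported to the handler.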
SAX-based Parsers
• SAX-based parsers
– The textbook uses Sun Microsystems’ JAXP
• Tools
– A text editor: XML files are simply text. To create and read them, a text editor is all you need.
– Java 2 SDK, Standard Edition version 1.4.x: SAX support has been built into the latest version of Java (available at http://java.sun.com/j2se/1.4.2/download.html), so you won't need to install any separate classes. If you are using an earlier version of Java, such as Java 1.3.x, you'll also need
  • an XML parser such as the Apache project's Xerces-Java (available at http://xml.apache.org/xerces2-j/index.html),
  • or Sun's Java API for XML Parsing (JAXP), part of the Java Web Services Developer Pack (available at http://java.sun.com/webservices/downloads/webservicespack.html).
  • You can also download the official version from SourceForge (available at http://sourceforge.net/project/showfiles.php?group_id=29449).
– Other languages: should you wish to adapt the examples, SAX implementations are also available in other programming languages. You can find information on C, C++, Visual Basic, Perl, and Python implementations of a SAX parser at http://www.saxproject.org/?selected=langs.
Some SAX-based parsers

Product      Description
JAXP         Sun’s JAXP is available from java.sun.com/xml. JAXP supports both SAX and DOM.
Xerces       Apache’s Xerces parser is available at www.apache.org. Xerces supports both SAX and DOM.
MSXML 3.0    Microsoft’s msxml parser is available at msdn.microsoft.com/xml. This parser supports both SAX and DOM.
Setup
• Java applications to illustrate the SAX API
– Java 2 Standard Edition required
  • Download at www.java.sun.com/j2se
  • Installation instructions: www.deitel.com/faq/java3install.htm
– JAXP required
  • Download at java.sun.com/xml/download.html
Events
• SAX parser
– Invokes certain methods (Fig. 9.2) when events occur
– Programmers override these methods to process data

Fig. 9.2 Methods invoked by the SAX parser

Method name             Description
setDocumentLocator      Invoked at the beginning of parsing.
startDocument           Invoked when the parser encounters the start of an XML document.
endDocument             Invoked when the parser encounters the end of an XML document.
startElement            Invoked when the start tag of an element is encountered.
endElement              Invoked when the end tag of an element is encountered.
characters              Invoked when text characters are encountered.
ignorableWhitespace     Invoked when whitespace that can be safely ignored is encountered.
processingInstruction   Invoked when a processing instruction is encountered.
The SAX API – an Example

    <priceList>                    [parser calls startElement]
      <coffee>                     [parser calls startElement]
        <name>Mocha Java</name>    [parser calls startElement, characters, and endElement]
        <price>11.95</price>       [parser calls startElement, characters, and endElement]
      </coffee>                    [parser calls endElement]
    </priceList>                   [parser calls endElement]

• The default implementations of the methods that the parser calls do nothing
• You need to write a subclass implementing the appropriate methods to get the functionality you want
• For example, suppose you want to get the price per pound for Mocha Java.
• You would write a class extending DefaultHandler (the default implementation of ContentHandler) in which you write your own implementations of the methods startElement and characters
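A handler for the Mocha Java lookup might look like the sketch below. The class name MochaPriceHandler is hypothetical, and this version also overrides endElement (in addition to the startElement and characters methods the slide mentions) so that it knows when an element's text is complete. The second coffee in the usage example is made-up test data.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that remembers the price of the coffee named "Mocha Java".
final class MochaPriceHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String currentName;  // name of the coffee currently being read
    private String mochaPrice;   // result, or null if not found

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0);       // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (qName.equals("name")) {
            currentName = value;
        } else if (qName.equals("price") && "Mocha Java".equals(currentName)) {
            mochaPrice = value;
        } else if (qName.equals("coffee")) {
            currentName = null;  // leaving this coffee entry
        }
        text.setLength(0);
    }

    static String findPrice(String xml) {
        MochaPriceHandler handler = new MochaPriceHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.mochaPrice;
    }

    public static void main(String[] args) {
        String xml = "<priceList>"
                + "<coffee><name>Sumatra</name><price>12.50</price></coffee>"   // made-up entry
                + "<coffee><name>Mocha Java</name><price>11.95</price></coffee>"
                + "</priceList>";
        System.out.println(findPrice(xml));
    }
}
```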
The SAX API – an Example (cont.)
• Your code has three tasks:
– Scan the command line for the name (or URI) of an XML file.
– Create a parser object.
– Tell the parser object to parse the XML file named on the command line, and tell it to send your code all of the SAX events it generates.
• Step I: Scan the command line
– Look for an argument. If there isn't one, print an error message and exit.
– Otherwise, assume that the first argument is the name or URI of an XML file.

    public static void main(String argv[]) {
        if (argv.length == 0 || (argv.length == 1 && argv[0].equals("-help"))) {
            // Print an error message and exit...
        }
        PrintOutline s1 = new PrintOutline();
        s1.parseURI(argv[0]);
    }
The SAX API – an Example (cont.)
• Step II: Create a parser object
– To create a parser object, use JAXP's SAXParserFactory API to create a SAXParser

    public void parseURI(String uri) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            . . .
The SAX API – an Example (cont.)
• Step III: Parse the file and handle any events
– Now that we've created our parser object, we need to have it parse the file. That's done with the parse() method.
– Notice that the parse() method takes two arguments. The first is the URI of the XML document, while the second is an object that implements the SAX event handlers.

    public void parseURI(String uri) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            sp.parse(uri, this);
        } catch (Exception e) {
            System.err.println(e);
        }
    }
The SAX API – an Example (cont.)
– In the case of PrintOutline, you're extending the SAX DefaultHandler class.
– DefaultHandler has an implementation of each event handler. These implementations do nothing, which means all your code has to do is implement handlers for the events you care about.
– Note: The exception handling above is sloppy; as an exercise for the reader, feel free to handle specific exceptions, such as SAXException or java.io.IOException.
– A major benefit of the DefaultHandler class is that it shields you from having to implement all of the event handlers.

    public class PrintOutline extends DefaultHandler {
        …….
    }
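One way to take up the exercise on exception handling is sketched below. The class name CarefulParse and the returned status strings are hypothetical; the point is only the order and granularity of the catch blocks. SAXParseException must be caught before SAXException because it is a subclass.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example of handling each failure mode of a SAX parse separately.
final class CarefulParse {
    static String parse(String xml) {
        try {
            SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
            sp.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {            // malformed XML, with a location
            return "parse error at line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {                 // any other SAX problem
            return "SAX error: " + e.getMessage();
        } catch (ParserConfigurationException e) { // the factory could not build a parser
            return "parser configuration error: " + e.getMessage();
        } catch (IOException e) {                  // the input could not be read
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<a/>"));
        System.out.println(parse("<a><b></a>"));
    }
}
```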
The SAX API – an Example (cont.)
• Step IV: Implementing event handlers
– The startDocument() event handler
– It simply writes out a basic XML declaration, regardless of whether one was in the original XML document or not.
– Currently the base SAX API doesn't return the details of the XML declaration.

    public void startDocument() {
        System.out.println("<?xml version=\"1.0\"?>");
    }
The SAX API – an Example (cont.)
• Next, here's what you do for startElement():
– Print the name of the element and its attributes
– With namespace processing, an element is identified by its namespace URI (conventionally written in braces before the element's local name)
– rawName contains the raw XML 1.0 name, which is what you use when the element doesn't have a namespace URI

    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes attrs) {
        System.out.print("<");
        System.out.print(rawName);
        if (attrs != null) {
            int len = attrs.getLength();
            for (int i = 0; i < len; i++) {
                System.out.print(" ");
                System.out.print(attrs.getQName(i));
                System.out.print("=\"");
                System.out.print(attrs.getValue(i));
                System.out.print("\"");
            }
        }
        System.out.print(">");
    }
The SAX API – an Example (cont.)
• More event handling
– characters(): since you're printing the XML document to the console, you simply print the portion of the character array that relates to this event

    public void characters(char ch[], int start, int length) {
        System.out.print(new String(ch, start, length));
    }

– endElement(): simply writes out the end tag
– endDocument(): included just for completeness; it only prints a closing message

    public void endElement(String namespaceURI, String localName, String rawName) {
        System.out.print("</");
        System.out.print(rawName);
        System.out.print(">");
    }

    public void endDocument() {
        System.out.println("End of Document");
    }
The SAX API – an Example (cont.)
• Step V: Error handling
– SAX defines the ErrorHandler interface
– It is implemented by DefaultHandler
– It contains three methods: warning, error, and fatalError (the categories are defined by the XML specification)
  • warning(): issued in response to a warning
  • error(): issued in response to an error condition
  • fatalError(): issued in response to a fatal error

    public void warning(SAXParseException ex) {
        System.err.println("[Warning] " + getLocationString(ex) + ": " + ex.getMessage());
    }

    public void error(SAXParseException ex) {
        System.err.println("[Error] " + getLocationString(ex) + ": " + ex.getMessage());
    }

    public void fatalError(SAXParseException ex) throws SAXException {
        System.err.println("[Fatal Error] " + getLocationString(ex) + ": " + ex.getMessage());
        throw ex;
    }
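The handlers above call getLocationString(), which the slides do not show. A plausible version, offered purely as an assumption about what such a helper does, builds a "file:line:column" string from the accessors on SAXParseException. The class name LocatingHandler, the field lastFatal, and the method firstFatal() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler showing one way getLocationString() could be written.
final class LocatingHandler extends DefaultHandler {
    String lastFatal; // message recorded by fatalError(), or null

    // Assumed format: "<file name>:<line>:<column>", without the file part
    // when the input has no system identifier (e.g. an in-memory string).
    static String getLocationString(SAXParseException ex) {
        StringBuilder sb = new StringBuilder();
        String systemId = ex.getSystemId();
        if (systemId != null) {
            int slash = systemId.lastIndexOf('/');
            sb.append(slash >= 0 ? systemId.substring(slash + 1) : systemId).append(':');
        }
        return sb.append(ex.getLineNumber()).append(':').append(ex.getColumnNumber()).toString();
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        lastFatal = "[Fatal Error] " + getLocationString(ex) + ": " + ex.getMessage();
        throw ex;
    }

    static String firstFatal(String xml) {
        LocatingHandler handler = new LocatingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException expected) {
            // already recorded by fatalError()
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.lastFatal;
    }

    public static void main(String[] args) {
        System.out.println(firstFatal("<a>\n<b></a>"));
    }
}
```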
Example: Tree Diagram
• Java application
– Parses an XML document with a SAX-based parser
– Outputs the document data as a tree diagram
– Extends org.xml.sax.HandlerBase, which
  • implements interface EntityResolver: handles external entities
  • implements interface DTDHandler: handles notations and unparsed entities
  • implements interface DocumentHandler: handles parsing events
  • implements interface ErrorHandler: handles errors
Fig. 9.3 Application to create a tree diagram for an XML document.
• import specifies location of classes needed by application
• Method spacer assists in formatting
• Override method to output parsed document’s URL

    // Fig. 9.3 : Tree.java
    // Using the SAX Parser to generate a tree diagram.

    import java.io.*;
    import org.xml.sax.*; // for HandlerBase class
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    public class Tree extends HandlerBase {
       private int indent = 0; // indentation counter

       // returns the spaces needed for indenting
       private String spacer( int count )
       {
          String temp = "";

          for ( int i = 0; i < count; i++ )
             temp += " ";

          return temp;
       }

       // method called before parsing
       // it provides the document location
       public void setDocumentLocator( Locator loc )
       {
          System.out.println( "URL: " + loc.getSystemId() );
       }
Fig. 9.3 Application to create a tree diagram for an XML document. (Part 2)
• Overridden method called when root node encountered
• Overridden method called when end of document is encountered
• Overridden method called when start tag is encountered
• Output each attribute’s name and value (if any)

       // method called at the beginning of a document
       public void startDocument() throws SAXException
       {
          System.out.println( "[ document root ]" );
       }

       // method called at the end of the document
       public void endDocument() throws SAXException
       {
          System.out.println( "[ document end ]" );
       }

       // method called at the start tag of an element
       public void startElement( String name,
          AttributeList attributes ) throws SAXException
       {
          System.out.println( spacer( indent++ ) +
             "+-[ element : " + name + " ]");

          if ( attributes != null )

             for ( int i = 0; i < attributes.getLength(); i++ )
                System.out.println( spacer( indent ) +
                   "+-[ attribute : " + attributes.getName( i ) +
                   " ] \"" + attributes.getValue( i ) + "\"" );
       }
Fig. 9.3 Application to create a tree diagram for an XML document. (Part 3)
• Overridden method called when end of element is encountered
• Overridden method called when processing instruction is encountered
• Overridden method called when character data is encountered

       // method called at the end tag of an element
       public void endElement( String name ) throws SAXException
       {
          indent--;
       }

       // method called when a processing instruction is found
       public void processingInstruction( String target,
          String value ) throws SAXException
       {
          System.out.println( spacer( indent ) +
             "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
       }

       // method called when characters are found
       public void characters( char buffer[], int offset,
          int length ) throws SAXException
       {
          if ( length > 0 ) {
             String temp = new String( buffer, offset, length );

             System.out.println( spacer( indent ) +
                "+-[ text ] \"" + temp + "\"" );
          }
       }
Fig. 9.3 Application to create a tree diagram for an XML document. (Part 4)
• Overridden method called when ignorable whitespace is encountered
• Overridden method called when error (usually validation) occurs
• Overridden method called when problem is detected (but not considered error)
• Method main starts application

       // method called when ignorable whitespace is found
       public void ignorableWhitespace( char buffer[],
          int offset, int length )
       {
          if ( length > 0 ) {
             System.out.println( spacer( indent ) + "+-[ ignorable ]" );
          }
       }

       // method called on a non-fatal (validation) error
       public void error( SAXParseException spe )
          throws SAXParseException
       {
          // treat non-fatal errors as fatal errors
          throw spe;
       }

       // method called on a parsing warning
       public void warning( SAXParseException spe )
          throws SAXParseException
       {
          System.err.println( "Warning: " + spe.getMessage() );
       }

       // main method
       public static void main( String args[] )
       {
          boolean validate = false;
Fig. 9.3 Application to create a tree diagram for an XML document. (Part 5)
• Allow command-line arguments (if we want to validate document)
• SAXParserFactory can instantiate SAX-based parser

          if ( args.length != 2 ) {
             System.err.println( "Usage: java Tree [validate] " +
                "[filename]\n" );
             System.err.println( "Options:" );
             System.err.println( " validate [yes|no] : " +
                "DTD validation" );
             System.exit( 1 );
          }

          if ( args[ 0 ].equals( "yes" ) )
             validate = true;

          SAXParserFactory saxFactory =
             SAXParserFactory.newInstance();

          saxFactory.setValidating( validate );
Fig. 9.3 Application to create a tree diagram for an XML document. (Part 6)
• Instantiate SAX-based parser
• Handles errors (if any)

          try {
             SAXParser saxParser = saxFactory.newSAXParser();
             saxParser.parse( new File( args[ 1 ] ), new Tree() );
          }
          catch ( SAXParseException spe ) {
             System.err.println( "Parse Error: " + spe.getMessage() );
          }
          catch ( SAXException se ) {
             se.printStackTrace();
          }
          catch ( ParserConfigurationException pce ) {
             pce.printStackTrace();
          }
          catch ( IOException ioe ) {
             ioe.printStackTrace();
          }

          System.exit( 0 );
       }
    }
Fig. 9.4 XML document spacing1.xml.
• XML document does not reference a DTD
• XML document with elements test, example and object
• Root element test contains attribute name with value “ spacing 1 ”
• Note that whitespace is preserved: attribute value (line 7), line feed (end of line 7), indentation (line 8) and line feed (end of line 8)

    1 <?xml version = "1.0"?>
    2
    3 <!-- Fig. 9.4 : spacing1.xml -->
    4 <!-- Whitespaces in nonvalidating parsing -->
    5 <!-- XML document without DTD -->
    6
    7 <test name = " spacing 1 ">
    8    <example><object>World</object></example>
    9 </test>

Output:

    URL: file:C:/Tree/spacing1.xml
    [ document root ]
    +-[ element : test ]
     +-[ attribute : name ] " spacing 1 "
     +-[ text ] ""
     +-[ text ] " "
     +-[ element : example ]
      +-[ element : object ]
       +-[ text ] "World"
     +-[ text ] ""
    [ document end ]
Fig. 9.5 XML document spacing2.xml.
• DTD checks document’s characters, so any “removable” whitespace is ignorable
• Line feed at line 14, spaces at beginning of line 15 and line feed at line 15 are ignored

    1 <?xml version = "1.0"?>
    2
    3 <!-- Fig. 9.5 : spacing2.xml -->
    4 <!-- Whitespace and nonvalidated parsing -->
    5 <!-- XML document with DTD -->
    6
    7 <!DOCTYPE test [
    8 <!ELEMENT test (example)>
    9 <!ATTLIST test name CDATA #IMPLIED>
    10 <!ELEMENT example (object*)>
    11 <!ELEMENT object (#PCDATA)>
    12 ]>
    13
    14 <test name = " spacing 2 ">
    15    <example><object>World</object></example>
    16 </test>

Output:

    URL: file:C:/Tree/spacing2.xml
    [ document root ]
    +-[ element : test ]
     +-[ attribute : name ] " spacing 2 "
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ element : example ]
      +-[ element : object ]
       +-[ text ] "World"
     +-[ ignorable ]
    [ document end ]
Fig. 9.6 Well-formed XML document.
• Invalid document because element example cannot contain element item
• Validation disabled, so document parses successfully
• Parser does not process text in CDATA section and returns character data

    <?xml version = "1.0"?>

    <!-- Fig. 9.6 : notvalid.xml -->
    <!-- Validation and non-validation -->

    <!DOCTYPE test [
    <!ELEMENT test (example)>
    <!ELEMENT example (#PCDATA)>
    ]>

    <test>
       <?test message?>
       <example><item><![CDATA[Hello & Welcome!]]></item></example>
    </test>

Output (validation disabled):

    URL: file:C:/Tree/notvalid.xml
    [ document root ]
    +-[ element : test ]
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ proc-inst : test ] "message"
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ element : example ]
      +-[ element : item ]
       +-[ text ] "Hello & Welcome!"
     +-[ ignorable ]
    [ document end ]
Fig. 9.6 Well-formed XML document. (Part 2)
• Validation enabled
• Parsing terminates when fatal error occurs at element item

Output (validation enabled):

    URL: file:C:/Tree/notvalid.xml
    [ document root ]
    +-[ element : test ]
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ proc-inst : test ] "message"
     +-[ ignorable ]
     +-[ ignorable ]
     +-[ element : example ]
    Parse Error: Element "example" does not allow "item"
Fig. 9.7 Checking an XML document without a DTD for validity.
• Validation disabled in first output, so document parses successfully
• Validation enabled in second output, and parsing fails because DTD does not exist

    <?xml version = "1.0"?>

    <!-- Fig. 9.7 : valid.xml -->
    <!-- DTD-less document -->

    <test>
       <example>Hello &amp; Welcome!</example>
    </test>

Output (validation disabled):

    URL: file:C:/Tree/valid.xml
    [ document root ]
    +-[ element : test ]
     +-[ text ] ""
     +-[ text ] " "
     +-[ element : example ]
      +-[ text ] "Hello "
      +-[ text ] "&"
      +-[ text ] " Welcome!"
     +-[ text ] ""
    [ document end ]

Output (validation enabled):

    URL: file:C:/Tree/valid.xml
    [ document root ]
    Warning: Valid documents must have a <!DOCTYPE declaration.
    Parse Error: Element type "test" is not declared.
Example: Tree Diagram (Summary)
• SAX 1.0 is supported!
• When compiling, the following messages appear:
“Tree.java uses or overrides a deprecated API”
“Recompile with -deprecation for details”
• After compiling, 3 warnings (class has been deprecated) were issued:
1. HandlerBase should be replaced by DefaultHandler
2. & 3. AttributeList should be replaced by Attributes
• It is better to replace SAX 1.0 with SAX 2.0
• Problem with Xerces vs. JAXP
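The two replacements named in the warnings can be sketched by porting the core of the tree printer to SAX 2.0. This is an illustrative rewrite, not the textbook's code: the class name Sax2Tree is hypothetical, output is collected in a string instead of being printed, whitespace-only text is skipped, and three spaces per level is a choice made for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX 2.0 port: DefaultHandler replaces HandlerBase,
// and Attributes replaces AttributeList.
final class Sax2Tree extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int indent = 0;

    private static String spacer(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("   ");
        }
        return sb.toString();
    }

    // SAX 2.0 signature: namespace URI, local name, qualified name, attributes.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        out.append(spacer(indent++)).append("+-[ element : ").append(qName).append(" ]\n");
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append(spacer(indent)).append("+-[ attribute : ").append(attributes.getQName(i))
               .append(" ] \"").append(attributes.getValue(i)).append("\"\n");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        indent--;
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        String text = new String(buffer, offset, length);
        if (!text.trim().isEmpty()) {
            out.append(spacer(indent)).append("+-[ text ] \"").append(text).append("\"\n");
        }
    }

    static String render(String xml) {
        Sax2Tree handler = new Sax2Tree();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("<test name=\"x\"><example>World</example></test>"));
    }
}
```");
assert tree.equals("+-[ element : test ]\n   +-[ attribute : name ] \"x\"\n   +-[ element : example ]\n      +-[ text ] \"World\"\n");
</test>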
SAX 2.0
• SAX 2.0
– Recently released
– We have been using JAXP
– The Xerces parser (Apache) supports SAX 2.0
SAX 2.0 (cont.)
• SAX 2.0 major changes
– Class HandlerBase replaced with DefaultHandler
– AttributeList replaced with Attributes
– Element and attribute processing supports namespaces
– The loading and parsing processes have changed
  • Alternative methods can be applied
– Methods for retrieving and setting parser properties
  • e.g., whether the parser performs validation
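Two of these changes, namespace support and configurable parser settings, can be sketched together. The feature URIs below are the standard SAX 2.0 identifiers for namespace processing and validation; the class name NamespaceDemo and the sample namespace urn:example:coffee are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each element as {namespace URI}localName.
final class NamespaceDemo extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        names.add("{" + uri + "}" + localName);
    }

    static List<String> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // ask JAXP for namespace processing
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // SAX 2.0 features are read and written by URI.
            boolean namespaces = reader.getFeature("http://xml.org/sax/features/namespaces");
            reader.setFeature("http://xml.org/sax/features/validation", false);
            if (!namespaces) {
                throw new IllegalStateException("namespace processing is off");
            }

            NamespaceDemo handler = new NamespaceDemo();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.names;
        } catch (IllegalStateException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<c:coffee xmlns:c=\"urn:example:coffee\"><c:name/></c:coffee>"));
    }
}
```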
Outline 52
Fig. 9.10Java application that indents an XML document.
Replace class HandlerBase with class DefaultHandler
Provides same service as that of SAX 1.0
1 // Fig. 9.10 : printXML.java2 // Using the SAX Parser to indent an XML document.34 import java.io.*;5 import org.xml.sax.*;6 import org.xml.sax.helpers.*;7 import javax.xml.parsers.SAXParserFactory;8 import javax.xml.parsers.ParserConfigurationException;9 import javax.xml.parsers.SAXParser;1011 public class PrintXML extends DefaultHandler {12 private int indent = 0; // indention counter1314 // returns the spaces needed for indenting15 private String spacer( int count )16 {17 String temp = "";1819 for ( int i = 0; i < count; i++ )20 temp += " ";2122 return temp;23 }2425 // method called at the beginning of a document26 public void startDocument() throws SAXException27 {28 System.out.println( "<?xml version = \"1.0\"?>" );29 }30
Outline 53
Fig. 9.10 Java application that indents an XML document. (Part 2)
Provides same service as that of SAX 1.0
Method startElement now has four arguments (namespace URI, element name, qualified element name and element attributes)
Attributes are now stored in Attributes object
Method endElement now has three arguments (namespace URI, element name and qualified element name)
31    // method called at the end of the document
32    public void endDocument() throws SAXException
33    {
34       System.out.println( "---[ document end ]---" );
35    }
36
37    // method called at the start tag of an element
38    public void startElement( String uri, String eleName,
39       String raw, Attributes attributes ) throws SAXException
40    {
41
42       System.out.print( spacer( indent ) + "<" + raw );
43
44       if ( attributes != null )
45
46          for ( int i = 0; i < attributes.getLength(); i++ )
47             System.out.print( " "+ attributes.getLocalName( i ) +
48                " = " + "\"" +
49                attributes.getValue( i ) + "\"" );
50       System.out.println( ">" );
51       indent += 3;
52    }
53
54    // method called at the end tag of an element
55    public void endElement( String uri, String eleName,
56       String raw ) throws SAXException
57    {
58       indent -= 3;
59       System.out.println( spacer(indent) + "</" + raw + ">");
60    }
61
Outline 54
Fig. 9.10 Java application that indents an XML document. (Part 3)
Provides same service as that of SAX 1.0
62    // method called when characters are found
63    public void characters( char buffer[], int offset,
64       int length ) throws SAXException
65    {
66       if ( length > 0 ) {
67          String temp = new String( buffer, offset, length );
68
69          if ( !temp.trim().equals( "" ) )
70             System.out.println( spacer(indent) + temp.trim() );
71       }
72    }
73
74    // method called when a processing instruction is found
75    public void processingInstruction( String target,
76       String value ) throws SAXException
77    {
78       System.out.println( spacer( indent ) +
79          "<?" + target + " " + value + "?>");
80    }
81
82    // main method
83    public static void main( String args[] )
84    {
85
Outline 55
Fig. 9.10 Java application that indents an XML document. (Part 4)
Create Xerces SAX-based parser
SAX-based parser parses InputSource
86       try {
87          XMLReader saxParser = ( XMLReader ) Class.forName(
88             "org.apache.xerces.parsers.SAXParser" ).newInstance();
89
90          saxParser.setContentHandler( new PrintXML() );
91          FileReader reader = new FileReader( args[ 0 ] );
92          saxParser.parse( new InputSource( reader ) );
93       }
94       catch ( SAXParseException spe ) {
95          System.err.println( "Parse Error: " + spe.getMessage() );
96       }
97       catch ( SAXException se ) {
98          se.printStackTrace();
99       }
100       catch ( Exception e ) {
101          e.printStackTrace();
102       }
103
104       System.exit( 0 );
105    }
106 }
Lines 86-93: to create the parser through JAXP instead of Xerces, replace with the following code:
XMLReader xmlReader = null;
try {
   SAXParserFactory spfactory = SAXParserFactory.newInstance();
   SAXParser saxParser = spfactory.newSAXParser();
   xmlReader = saxParser.getXMLReader();
   xmlReader.setContentHandler( new PrintXML() );
   xmlReader.setErrorHandler( new PrintXML() );
   FileReader reader = new FileReader( args[ 0 ] );
   xmlReader.parse( new InputSource( reader ) );
}
Outline 56
Fig. 9.11 Sample execution of printXML.java
Processing instruction that links to stylesheet
<?xml version = "1.0"?>

<!-- Fig. 9.11 : test.xml -->

<?xml:stylesheet type = "text/xsl" href = "something.xsl"?>

<test>
   <example value = "100">Hello and Welcome!</example>

   <a>
      <b>12345</b>
   </a>
</test>
Output
<?xml version = "1.0"?>
<?xml:stylesheet type = "text/xsl" href = "something.xsl"?>
<test>
   <example value = "100">
      Hello and Welcome!
   </example>
   <a>
      <b>
         12345
      </b>
   </a>
</test>
---[ document end ]---
57Summary
• SAX is a faster, more lightweight way to read and manipulate XML data than the Document Object Model (DOM).
• SAX is an event-based processor that lets you deal with elements, attributes, and other data as they show up in the original document (streaming events).
• Because of this architecture, SAX is a read-only system, but that doesn't prevent you from using the data. Make a copy and process it!
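"Make a copy and process it" can be sketched as a handler that copies the text of selected elements into a list while the events stream past. The class name TextCollector and the helper collect() are hypothetical, and the sketch assumes the JDK's default (non-namespace-aware) JAXP parser, so elements are matched on their qualified name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps a copy of the text inside every element named `wanted`.
class TextCollector extends DefaultHandler {
    private final String wanted;
    private final List<String> values = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inside = false;

    TextCollector(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(wanted)) {
            inside = true;
            current.setLength(0); // start a fresh copy for this element
        }
    }

    // The parser may deliver one text node in several chunks, so append rather than assign.
    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(wanted)) {
            values.add(current.toString().trim());
            inside = false;
        }
    }

    // Parses the XML text and returns the copied values in document order.
    static List<String> collect(String xml, String elementName) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TextCollector handler = new TextCollector(elementName);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<test><example value=\"100\">Hello and Welcome!</example>"
                   + "<a><b>12345</b></a></test>";
        System.out.println(collect(xml, "b"));
    }
}
```

Once the values are in an ordinary list, the application can sort, sum, or modify them freely; the source document itself is never changed.";
java.util.List<String> b = TextCollector.collect(xml, "b");
if (!b.equals(java.util.Arrays.asList("12345"))) throw new AssertionError(b);
java.util.List<String> e = TextCollector.collect(xml, "example");
if (!e.equals(java.util.Arrays.asList("Hello and Welcome!"))) throw new AssertionError(e);
</test>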
58Summary (cont.)
• Resources
– For a basic grounding in XML, read through the "Introduction to XML" tutorial (developerWorks, August 2002).
– See the official SAX 2.0 page (http://www.saxproject.org).
– Learn to use a SAX filter to manipulate data (developerWorks, October 2001).
– Read about using SAX filters for flexible processing (developerWorks, March 2003).
– Find out how to build SAX-like apps in PHP (developerWorks, March 2003).
– Learn how to set up a SAX parser (developerWorks, July 2003).
– Learn more about validation and the SAX ErrorHandler interface (developerWorks, June 2001).
– Understand how to stop a SAX parser when you have enough data (developerWorks, June 2002).
– Explore XSL transformations to and from a SAX stream (developerWorks, July 2002).
– Turn a SAX stream into a DOM or JDOM object with "Converting from SAX" (developerWorks, April 2001).
– Download the Java 2 SDK, Standard Edition version 1.4.2 (http://java.sun.com/j2se/1.4.2/download.html).
– SAX was developed by the members of the XML-DEV mailing list. Try the Java version, now a SourceForge project (http://sourceforge.net/project/showfiles.php?group_id=29449).
– Try SAX implementations available in other programming languages.
– Get IBM's XML-related tools such as the DB2 XML Extender, which provides a bridge between XML and relational systems. Visit the DB2 Developer Domain to learn more about DB2.
– Find out how you can become an IBM Certified Developer in XML and related technologies.
59
That’s it for today!
Have a nice and lovely spring holiday!
• Do not forget to check the web site for important messages regarding the demo date of your personal project.
60getLocationString()
• The private method gives more details about the error.
• The SAXParseException class defines methods such as getLineNumber() and getColumnNumber() to provide the line and column number where the error occurred.
• getLocationString merely formats this information into a useful string
• Putting this code into a separate method means you don't have to include this code in every error handler
private String getLocationString(SAXParseException ex) {
   StringBuffer str = new StringBuffer();
   String systemId = ex.getSystemId();
   if (systemId != null) {
      int index = systemId.lastIndexOf('/');
      if (index != -1)
         systemId = systemId.substring(index + 1);
      str.append(systemId);
   }
   str.append(':');
   str.append(ex.getLineNumber());
   str.append(':');
   str.append(ex.getColumnNumber());
   return str.toString();
}
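As a rough illustration of where such a helper is called from, the sketch below wires a simplified location formatter into the three ErrorHandler callbacks. The class name LocationDemo, the helper check() and the malformed sample document are hypothetical; the formatter here omits the system ID because the input comes from a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each reported problem together with its location.
class LocationDemo extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    // Simplified stand-in for getLocationString: "line:column" only.
    private static String location(SAXParseException ex) {
        return ex.getLineNumber() + ":" + ex.getColumnNumber();
    }

    @Override
    public void warning(SAXParseException ex) {
        problems.add("warning at " + location(ex));
    }

    @Override
    public void error(SAXParseException ex) {
        problems.add("error at " + location(ex));
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        problems.add("fatal error at " + location(ex));
        throw ex; // a fatal error ends the parse
    }

    // Parses the XML text and returns the problems that were reported.
    static List<String> check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LocationDemo handler = new LocationDemo();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException ex) {
            // already recorded by fatalError()
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // <b> is never closed, so the document is not well formed.
        System.out.println(check("<a>\n<b></a>"));
    }
}
```

All three callbacks share the one formatting method, which is the point the slide makes about keeping it separate.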
61Processing Instruction
• Processing Instructions
• An XML file can also contain processing instructions that give commands or information to an application that is processing the XML data.
• Processing instructions have the following format:
<?target instructions?>
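A SAX application receives each processing instruction through the processingInstruction callback, already split into target and instructions. The following is a minimal sketch; the class name PiDemo and the helper list() are hypothetical, and the sample uses the target xml-stylesheet as an assumption for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: lists every processing instruction in a document.
class PiDemo extends DefaultHandler {
    final List<String> found = new ArrayList<>();

    // Called once per <?target instructions?>; the two parts arrive separately.
    @Override
    public void processingInstruction(String target, String data) {
        found.add(target + " -> " + data);
    }

    // Parses the XML text and returns the processing instructions in document order.
    static List<String> list(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PiDemo handler = new PiDemo();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                   + "<?xml-stylesheet type=\"text/xsl\" href=\"something.xsl\"?>"
                   + "<test/>";
        for (String pi : list(xml)) {
            System.out.println(pi);
        }
    }
}
```

The XML declaration on the first line looks like a processing instruction but is not reported as one, so only the stylesheet instruction reaches the handler.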
62
• At the most basic level:
– An application can directly output XML markup
– In the figure, this is indicated by the application working with a character stream
– Simple? Not really: the application must handle all the basic syntax rules itself (start and end tags, attribute quoting, etc.). A good topic for a final project!
• Parsing and serialization:
– Parsing the XML document first,
– Constructing a data structure describing the XML document
– Utilizing the process of emitting XML markup from a data structure
– Utilizing the API for the processing methods
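To make the "not really simple" point concrete, here is a small sketch of an application writing XML markup directly to a character stream. The class XmlWriterSketch and its methods are hypothetical and cover only one element with one attribute; even so, the code has to escape special characters and quote the attribute value by hand.

```java
// Hypothetical helper: emits one element as raw XML text, escaping by hand.
class XmlWriterSketch {

    // Replaces the characters that would otherwise break the markup.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds <name attrName="attrValue">text</name> with matching start and end tags.
    static String element(String name, String attrName, String attrValue, String text) {
        return "<" + name + " " + attrName + "=\"" + escape(attrValue) + "\">"
             + escape(text)
             + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(element("example", "value", "100", "Hello & Welcome!"));
    }
}
```

A serializer behind a standardized API takes over exactly these details, which is why the parsing and serialization layer exists.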