hicham-qaissi
An easy reference to SAX, DOM and JDOM parsers
Parsing XML with SAX, DOM & JDOM
Contents
0. What is an XML parser?
1. Describing the example to develop
2. SAX
3. DOM
4. JDOM
5. Conclusion
0. What is an XML parser?
An XML parser lets us analyze and compose XML documents. By analyzing the data and
structure of an XML document, we can build objects in a programming language (Java in our
case). We can also do the inverse: build an XML document from some data objects (see Fig. 1).
In this manual, I analyze three kinds of parsers with examples: SAX, DOM and JDOM.
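The inverse process (building an XML document from data) can be sketched with the JDK's own DOM and javax.xml.transform APIs. This is a minimal illustration and not part of the original example. The class BookXmlWriter and its toXml method are hypothetical names, and the element names simply mirror the book example used in this manual.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: composes <books><book isbn="...">...</book></books> from plain values.
class BookXmlWriter {

    static String toXml(String isbn, String name, String author, String price, String editorial)
            throws Exception {
        // Start from an empty in-memory DOM tree
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element books = doc.createElement("books");
        doc.appendChild(books);

        Element book = doc.createElement("book");
        book.setAttribute("isbn", isbn);
        books.appendChild(book);

        // One child element per field, in the same order as books.xml
        String[][] fields = {
            {"name", name}, {"author", author}, {"price", price}, {"editorial", editorial}
        };
        for (String[] field : fields) {
            Element child = doc.createElement(field[0]);
            child.setTextContent(field[1]);
            book.appendChild(child);
        }

        // An identity Transformer copies the DOM tree to the output as text
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("0000000003", "Book 3", "Author name 3", "29.45", "Editorial 3"));
    }
}
```

Parsing, as done in the rest of this manual, is the opposite direction: from this text back to objects.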
1. Describing the example to develop
The example I build is a fun one, and it is the same for all three APIs (SAX, DOM and
JDOM). It consists of analyzing an XML document that contains information about some books
(ISBN code, stored in the isbn attribute; name; author name; price; editorial, i.e. the
publisher). The program expects a book code (ISBN) and searches for that book in the XML. If
the book exists, all its information is printed to standard output; otherwise, we print a
message saying that the book doesn't exist in the XML. Are you finding it as amusing as I do?
Let’s go!!!
The example XML (books.xml) is the following:
<books>
    <book isbn="0000000001">
        <name>Book 1</name>
        <author>Author name 1</author>
        <price>12.54</price>
        <editorial>Editorial 1</editorial>
    </book>
    <book isbn="0000000002">
        <name>Book 2</name>
        <author>Author name 2</author>
        <price>58.25</price>
        <editorial>Editorial 2</editorial>
    </book>
    <book isbn="0000000003">
        <name>Book 3</name>
        <author>Author name 3</author>
        <price>29.45</price>
        <editorial>Editorial 3</editorial>
    </book>
    <book isbn="0000000004">
        <name>Book 4</name>
        <author>Author name 4</author>
        <price>78.95</price>
        <editorial>Editorial 4</editorial>
    </book>
    <book isbn="0000000005">
        <name>Book 5</name>
        <author>Author name 5</author>
        <price>61.25</price>
        <editorial>Editorial 5</editorial>
    </book>
</books>
For all parsers (SAX, DOM & JDOM), I use this DTO (Data Transfer Object):
public class MyBook {
    private String isbn;
    private String name;
    private String author;
    private String price;
    private String editorial;

    public String getIsbn() { return isbn; }
    public void setIsbn(String isbn) { this.isbn = isbn; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }

    public String getEditorial() { return editorial; }
    public void setEditorial(String editorial) { this.editorial = editorial; }
}
2. SAX
SAX (Simple API for XML) works through events and associated methods. As the parser
reads the XML document, whenever it finds a component of the document (elements,
attributes, values, etc.) or detects an error, it invokes the methods that the programmer has
associated with that event. You can find more information about SAX on
www.saxproject.org.
First, make sure the SAX classes are available on the classpath. Since Java 1.4 they ship
with the JDK (packages org.xml.sax and javax.xml.parsers), so no extra jar is normally needed;
for older JVMs, the jar file can be downloaded from http://sourceforge.net/projects/sax/files/.
Next, we instantiate the reader. This reader implements the XMLReader interface, and we
obtain it from the abstract class SAXParser, which I obtain from a SAXParserFactory. The parse
method of XMLReader analyzes the XML document:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class MySAXSearcher {
    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // books.xml declares no DTD, so DTD validation stays off
            factory.setValidating(false);
            SAXParser saxParser = factory.newSAXParser();
            XMLReader xr = saxParser.getXMLReader();
            // args[0]: path (system ID) of the XML document
            xr.parse(args[0]);
        } catch (IOException ioe) {
            System.out.println("Error: " + ioe.getMessage());
        } catch (SAXException saxe) {
            System.out.println("Error: " + saxe.getMessage());
        } catch (ParserConfigurationException pce) {
            System.out.println("Error: " + pce.getMessage());
        }
    }
}
If the program compiles, Java and the SAX classes are set up correctly. Nevertheless, the
program doesn't do anything yet, because we haven't registered interest in any event so far.
It's important to catch the exceptions java.io.IOException,
org.xml.sax.SAXException and
javax.xml.parsers.ParserConfigurationException.
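To see what happens when the document itself is broken, the sketch below parses an in-memory string with a missing end tag. It assumes the JDK's built-in SAX implementation; the class MalformedXmlDemo and the method firstErrorLine are hypothetical names. The error arrives as a SAXParseException, the subclass of SAXException that carries the position of the problem.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical demo: reports where a well-formedness error was detected.
class MalformedXmlDemo {

    // Returns the line of the first well-formedness error, or -1 if the XML is well formed.
    static int firstErrorLine(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return -1;
        } catch (SAXParseException e) {
            // SAXParseException carries the line and column of the problem
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        // <book> is opened on line 2 but never closed
        String broken = "<books>\n  <book isbn=\"1\">\n</books>";
        System.out.println("Error detected on line " + firstErrorLine(broken));
    }
}
```

Catching SAXParseException before the more general SAXException is what lets a program print a precise location instead of just a message.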
To handle the events, we can use org.xml.sax.helpers.DefaultHandler (our main class
extends it). DefaultHandler implements the following interfaces, with default, mostly empty
implementations of their methods:
org.xml.sax.ContentHandler: events about the document's data (the most commonly used)
org.xml.sax.ErrorHandler: events about errors
org.xml.sax.DTDHandler: DTD-related events
org.xml.sax.EntityResolver: resolution of external entities
We can also write our own classes implementing ContentHandler and ErrorHandler to handle
the events we are interested in:
Data: implement ContentHandler and register it with the reader (parser) using the
method setContentHandler().
Errors: implement ErrorHandler and register it with the reader (parser) using the
method setErrorHandler().
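As an alternative to registering separate handler objects, a class can extend DefaultHandler and override only the callbacks it needs, then pass itself to SAXParser.parse(). The sketch below assumes the JDK's built-in parser and an inline string instead of books.xml; the class IsbnCollector and its collect method are hypothetical. With the default (non namespace-aware) factory, the element name arrives in the qName argument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Only startElement() is overridden; every other callback keeps DefaultHandler's behavior.
class IsbnCollector extends DefaultHandler {

    final List<String> isbns = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("book".equals(qName)) {
            // Attributes are delivered together with the start tag
            isbns.add(attrs.getValue("isbn"));
        }
    }

    static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        IsbnCollector handler = new IsbnCollector();
        // parse(InputSource, DefaultHandler) registers the handler for content and errors
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.isbns;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book isbn=\"0000000001\"/><book isbn=\"0000000002\"/></books>";
        System.out.println(collect(xml)); // [0000000001, 0000000002]
    }
}
```

This is shorter than implementing ContentHandler directly, at the cost of mixing parsing logic into the class that extends DefaultHandler.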
The most important methods in the interface ContentHandler (implemented by
DefaultHandler, which is extended by our class MySAXSearcher) are:
• startDocument(): Receive notification of the beginning of a document.
• endDocument(): Receive notification of the end of a document.
• startElement(): Receive notification of the beginning of an element.
• endElement(): Receive notification of the end of an element.
• characters(): Receive notification of character data.
See more about ContentHandler on
http://download.oracle.com/javase/1.4.2/docs/api/org/xml/sax/ContentHandler.html.
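A quick way to see these callbacks in action is to record them. The sketch below (hypothetical class EventLogger, parsing an inline string with the JDK's built-in parser) extends DefaultHandler and logs each event in the order the parser reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Records the ContentHandler callbacks in the order the parser fires them.
class EventLogger extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks; each chunk is logged separately
        events.add("characters(" + new String(ch, start, length) + ")");
    }

    static List<String> log(String xml) throws Exception {
        EventLogger logger = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), logger);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : log("<book isbn=\"1\"><name>Book 1</name></book>")) {
            System.out.println(event);
        }
    }
}
```

The log starts with startDocument and ends with endDocument. The isbn attribute produces no event of its own: it arrives in the Attributes argument of startElement().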
Now, MySAXSearcher is the following (I've written my own ContentHandler and
ErrorHandler; this is much cleaner than overriding the relevant ContentHandler and
ErrorHandler methods in our class that extends DefaultHandler):
MySAXSearcher.java:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class MySAXSearcher extends DefaultHandler {

    public static void main(String[] args) {
        // args[0]: XML file, args[1]: ISBN to search
        MySAXSearcher searcher = new MySAXSearcher();
        searcher.searchBook(args[0], args[1]);
    }

    private void searchBook(String xml, String isbn) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            SAXParser saxParser = factory.newSAXParser();
            XMLReader xr = saxParser.getXMLReader();
            // Assigning my own ContentHandler to my XMLReader.
            MyContentHandler ch = new MyContentHandler();
            ch.isbnSearched = isbn;
            xr.setContentHandler(ch);
            // Assigning my own ErrorHandler to my XMLReader.
            xr.setErrorHandler(new MyErrorHandler());
            // Features can also be set directly on the reader:
            // no DTD validation (books.xml has no DTD), namespace support on.
            xr.setFeature("http://xml.org/sax/features/validation", false);
            xr.setFeature("http://xml.org/sax/features/namespaces", true);
            long before = System.currentTimeMillis();
            xr.parse(xml);
            long after = System.currentTimeMillis();
            printResult(xml, ch, after - before);
        } catch (IOException ioe) {
            System.out.println("Error: " + ioe.getMessage());
        } catch (SAXException saxe) {
            System.out.println("Error: " + saxe.getMessage());
        } catch (ParserConfigurationException pce) {
            System.out.println("Error: " + pce.getMessage());
        }
    }

    public void printResult(String xml, MyContentHandler ch, long time) {
        System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
        if (ch.book != null) {
            System.out.println("Book found:");
            System.out.println(" Isbn: " + ch.book.getIsbn());
            System.out.println(" Name: " + ch.book.getName());
            System.out.println(" Author: " + ch.book.getAuthor());
            System.out.println(" Price: " + ch.book.getPrice());
            System.out.println(" Editorial: " + ch.book.getEditorial());
        } else {
            System.out.println("Book not found");
        }
    }
}
MyContentHandler.java:
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

public class MyContentHandler implements ContentHandler {
    boolean isBookFound = false;
    String isbnSearched = "";
    MyBook book = null;
    // Text of the current element; SAX may deliver it in several characters() calls,
    // so it is collected here and used only when the element ends.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() throws SAXException {
        System.out.println("***Start document***");
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("***End document***");
    }

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        text.setLength(0);
        if ("book".equals(local) && book == null) {
            // The book element only has one attribute (isbn)
            if (isbnSearched.equals(attrs.getValue("isbn"))) {
                isBookFound = true;
                book = new MyBook();
                book.setIsbn(isbnSearched);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only buffer the text here
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String raw) throws SAXException {
        if (!isBookFound) {
            return;
        }
        String value = text.toString().trim();
        if ("name".equals(local)) {
            book.setName(value);
        } else if ("author".equals(local)) {
            book.setAuthor(value);
        } else if ("price".equals(local)) {
            book.setPrice(value);
        } else if ("editorial".equals(local)) {
            book.setEditorial(value);
        } else if ("book".equals(local)) {
            // End of the searched book: stop collecting data
            isBookFound = false;
        }
    }

    // The remaining ContentHandler methods are not needed for this example.
    @Override
    public void endPrefixMapping(String prefix) throws SAXException { }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException { }

    @Override
    public void processingInstruction(String target, String data) throws SAXException { }

    @Override
    public void setDocumentLocator(Locator locator) { }

    @Override
    public void skippedEntity(String name) throws SAXException { }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException { }
}
MyErrorHandler.java:
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler {

    @Override
    public void warning(SAXParseException ex) {
        System.err.println("[Warning] : " + ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) {
        System.err.println("[Error] : " + ex.getMessage());
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        System.err.println("[Error!] : " + ex.getMessage());
        // The document is not well formed: stop parsing
        throw ex;
    }
}
With our XML (books.xml) and the book code to search, 0000000003, we can execute
our program with:
java MySAXSearcher books.xml 0000000003
The output should look like this (the parse time will vary from run to run):
***Start document***
***End document***
Document books.xml. Parsed in : 141 ms
Book found:
 Isbn: 0000000003
 Name: Book 3
 Author: Author name 3
 Price: 29.45
 Editorial: Editorial 3
3. DOM
DOM (Document Object Model): whereas SAX reports every component of the document as
an event while reading it, DOM represents the parsed document as a tree that can be traversed
and transformed. DOM has some disadvantages and advantages compared with SAX:
Disadvantages:
• The data can be accessed only once the entire document has been parsed.
• The tree is an object loaded in memory; this is problematic for big and
complex documents.
Advantages:
• With DOM we can manipulate (update, delete and add elements) the XML
document. We can also create a new XML document.
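As a minimal sketch of those manipulation operations, assuming an in-memory string in place of books.xml: the hypothetical DomEditDemo class updates the price of the first book, removes the second one, adds a new book and serializes the modified tree with an identity Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo of update / delete / add on a DOM tree.
class DomEditDemo {

    static String edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        NodeList books = root.getElementsByTagName("book");

        // Update: change the price of the first book
        Element first = (Element) books.item(0);
        first.getElementsByTagName("price").item(0).setTextContent("10.00");

        // Delete: remove the second book (the NodeList is live and shrinks afterwards)
        root.removeChild(books.item(1));

        // Add: append a new book that only carries its isbn attribute
        Element added = doc.createElement("book");
        added.setAttribute("isbn", "0000000006");
        root.appendChild(added);

        // Serialize the modified tree back to text
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book isbn=\"1\"><price>12.54</price></book>"
                + "<book isbn=\"2\"><price>58.25</price></book>"
                + "</books>";
        System.out.println(edit(xml));
    }
}
```

Because the whole tree stays in memory, the changes can be made in any order before writing the result out, which is exactly what an event-based SAX parser cannot offer.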
To work with an XML document, we need an object implementing the org.w3c.dom.Document
interface (which extends the Node interface). We obtain it with the classes
javax.xml.parsers.DocumentBuilderFactory and javax.xml.parsers.DocumentBuilder: the factory
creates a DocumentBuilder, and its parse() method returns the Document object.
To manipulate an XML document with DOM, these are the most important interfaces:
org.w3c.dom.Document (the entire XML document), org.w3c.dom.Element (an element of the
XML document), org.w3c.dom.Node (the base interface of every node in the tree: elements,
attributes, text, etc.) and org.w3c.dom.Attr (an attribute of an element).
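The Node/Element distinction matters in practice: getChildNodes() returns every child node, including the whitespace text between tags, so a DOM search has to check each child's name or type. The sketch below (hypothetical class NodeTypesDemo, parsing an indented inline string rather than books.xml) counts the Element and Text children of the first book element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: what getChildNodes() actually returns for an indented <book>.
class NodeTypesDemo {

    // Returns {number of Element children, number of Text children} of the first <book>.
    static int[] countChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element book = (Element) doc.getElementsByTagName("book").item(0);
        int elements = 0;
        int texts = 0;
        NodeList children = book.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            short type = children.item(i).getNodeType();
            if (type == Node.ELEMENT_NODE) {
                elements++;
            } else if (type == Node.TEXT_NODE) {
                texts++; // the line breaks and indentation between tags
            }
        }
        return new int[] {elements, texts};
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>\n  <book isbn=\"1\">\n    <name>Book 1</name>\n"
                + "    <price>12.54</price>\n  </book>\n</books>";
        int[] counts = countChildren(xml);
        System.out.println(counts[0] + " elements, " + counts[1] + " text nodes");
    }
}
```

The isbn attribute is not among these children: attributes are reached through getAttributes() or Element.getAttribute(), not getChildNodes().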
Now let's look at the Java code. As DTO (Data Transfer Object), I use the same
MyBook class.
MyDOMSearcher.java:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class MyDOMSearcher {

    public static void main(String[] args) {
        // args[0]: XML file, args[1]: ISBN to search
        MyDOMSearcher searcher = new MyDOMSearcher();
        searcher.searchBook(args[0], args[1]);
    }

    private void searchBook(String xml, String isbn) {
        long before = System.currentTimeMillis();
        MyBook book = null;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // books.xml declares no DTD, so DTD validation is switched off
            factory.setValidating(false);
            DocumentBuilder parser = factory.newDocumentBuilder();
            // I assign my own ErrorHandler to my parser
            parser.setErrorHandler(new MyErrorHandler());
            File file = new File(xml);
            Document doc = parser.parse(file);
            // I obtain all the <book> elements.
            // NodeList is an interface that has 2 methods:
            //   1. item(int): returns the Node at the given position.
            //   2. getLength(): returns the length of the list.
            NodeList booksNodes = doc.getElementsByTagName("book");
            // Stop looping over the books as soon as the searched one is found
            for (int i = 0; i < booksNodes.getLength() && book == null; i++) {
                Node node = booksNodes.item(i);
                if (node == null || !node.hasAttributes()) {
                    continue;
                }
                Node isbnNode = node.getAttributes().getNamedItem("isbn");
                if (isbnNode != null && isbnNode.getNodeValue().equals(isbn)) {
                    // I've caught the isbn searched
                    book = new MyBook();
                    book.setIsbn(isbn);
                    // Children include whitespace text nodes, so each one is identified by name
                    NodeList bookChildsNodes = node.getChildNodes();
                    for (int j = 0; j < bookChildsNodes.getLength(); j++) {
                        Node child = bookChildsNodes.item(j);
                        String nodeName = child.getNodeName();
                        if ("name".equals(nodeName)) {
                            book.setName(child.getTextContent());
                        } else if ("author".equals(nodeName)) {
                            book.setAuthor(child.getTextContent());
                        } else if ("price".equals(nodeName)) {
                            book.setPrice(child.getTextContent());
                        } else if ("editorial".equals(nodeName)) {
                            book.setEditorial(child.getTextContent());
                            // I've found my book. Ending the for iteration
                            break;
                        }
                    }
                }
            }
        } catch (IOException ioe) {
            System.err.println("[Error] : " + ioe.getMessage());
        } catch (ParserConfigurationException pce) {
            System.err.println("[Error] : " + pce.getMessage());
        } catch (SAXException se) {
            System.err.println("[Error] : " + se.getMessage());
        }
        long after = System.currentTimeMillis();
        printResults(xml, book, after - before);
    }

    public void printResults(String xml, MyBook book, long time) {
        System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
        if (book != null) {
            System.out.println("Book found:");
            System.out.println(" Isbn: " + book.getIsbn());
            System.out.println(" Name: " + book.getName());
            System.out.println(" Author: " + book.getAuthor());
            System.out.println(" Price: " + book.getPrice());
            System.out.println(" Editorial: " + book.getEditorial());
        } else {
            System.out.println("Book not found");
        }
    }
}
4. JDOM
All the previous APIs are available for many programming languages, but using them from
Java is laborious. JDOM is an API designed specifically for Java: it builds on Java's own
capabilities and features (for example, children are returned as standard java.util.List
objects), which makes XML parsing easier. You can find more information on www.jdom.org.
Now, let's implement the same example (searching for a book in our XML) with JDOM. Make
sure the JDOM jar is on your classpath; you can download it from
http://www.jdom.org/dist/binary/. The code below uses the JDOM 1.x package names (org.jdom);
JDOM 2 moved the classes to org.jdom2.
MyJDOMSearcher.java:
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class MyJDOMSearcher {
    private String isbn;
    private MyBook book;
    private boolean noSearchMore = false;

    public static void main(String[] args) {
        try {
            long before = System.currentTimeMillis();
            MyJDOMSearcher searcher = new MyJDOMSearcher();
            // The second parameter is the isbn to search
            searcher.isbn = args[1];
            SAXBuilder saxBuilder = new SAXBuilder();
            Document document = saxBuilder.build(args[0]);
            searcher.searchBook(document.getRootElement());
            long after = System.currentTimeMillis();
            searcher.printResults(args[0], after - before);
        } catch (JDOMException jde) {
            System.err.println("[Error] JDOMException: " + jde.getMessage());
        } catch (IOException ioe) {
            System.err.println("[Error] IOException: " + ioe.getMessage());
        }
    }

    // Recursively descend the tree, inspecting every element
    private void searchBook(Element element) {
        inspect(element);
        // In JDOM 1.x, getContent() returns a raw List (elements, text, comments...)
        List content = element.getContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
            Object object = iterator.next();
            // Only child elements are visited; text and comments are skipped
            if (object instanceof Element) {
                searchBook((Element) object);
            }
        }
    }

    public void inspect(Element element) {
        // Once the book is complete, there is nothing left to do
        if (noSearchMore) {
            return;
        }
        if ("book".equals(element.getQualifiedName()) && book == null) {
            // getAttributeValue() returns null when the attribute is missing
            if (isbn.equals(element.getAttributeValue("isbn"))) {
                book = new MyBook();
                book.setIsbn(isbn);
            }
        }
        if (book != null) {
            if ("name".equals(element.getQualifiedName())) {
                book.setName(element.getValue());
            }
            if ("author".equals(element.getQualifiedName())) {
                book.setAuthor(element.getValue());
            }
            if ("price".equals(element.getQualifiedName())) {
                book.setPrice(element.getValue());
            }
            if ("editorial".equals(element.getQualifiedName())) {
                book.setEditorial(element.getValue());
                noSearchMore = true;
            }
        }
    }

    private void printResults(String xml, long time) {
        System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
        if (book != null) {
            System.out.println("Book found:");
            System.out.println(" Isbn: " + book.getIsbn());
            System.out.println(" Name: " + book.getName());
            System.out.println(" Author: " + book.getAuthor());
            System.out.println(" Price: " + book.getPrice());
            System.out.println(" Editorial: " + book.getEditorial());
        } else {
            System.out.println("Book not found");
        }
    }
}
5. Conclusion
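Running the same example with the three APIs (MySAXSearcher, MyDOMSearcher
and MyJDOMSearcher), with the same XML file and the same ISBN to search ("0000000003") as
parameters, gave the following times:
MySAXSearcher    MyDOMSearcher    MyJDOMSearcher
93 ms            750 ms           609 ms
In this test, the SAX API was faster than DOM and JDOM (but it is more laborious to use).
Keep in mind that the three programs do not time exactly the same work: MySAXSearcher
measures only the call to parse() (during which the search also happens), while
MyDOMSearcher also includes creating the factory and the builder, and MyJDOMSearcher
includes creating the SAXBuilder and the recursive search.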
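To compare parsing cost more evenly, one option is to time the same work for each API, repeat it several times and keep the best run. The sketch below does that for SAX and DOM using only the JDK (JDOM is left out because it needs an external jar). The class ParseTiming, the generated sample document and the run count are hypothetical choices for illustration, and the numbers it prints depend on the JVM, the machine and the document size.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: times one full parse per API, best of several runs.
class ParseTiming {

    // Builds a <books> document with n <book> elements shaped like books.xml.
    static byte[] sampleXml(int n) {
        StringBuilder sb = new StringBuilder("<books>");
        for (int i = 1; i <= n; i++) {
            sb.append("<book isbn=\"").append(String.format("%010d", i)).append("\">")
              .append("<name>Book ").append(i).append("</name>")
              .append("<price>").append(i).append(".00</price></book>");
        }
        return sb.append("</books>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns {best SAX time, best DOM time} in nanoseconds. Each measured interval
    // covers creating the parser/builder and parsing the whole document.
    static long[] bestTimes(byte[] xml, int runs) throws Exception {
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        long bestSax = Long.MAX_VALUE;
        long bestDom = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            // An empty DefaultHandler: SAX reads everything but keeps nothing
            saxFactory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler());
            long t1 = System.nanoTime();
            // DOM builds the complete tree in memory
            domFactory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            long t2 = System.nanoTime();
            bestSax = Math.min(bestSax, t1 - t0);
            bestDom = Math.min(bestDom, t2 - t1);
        }
        return new long[] {bestSax, bestDom};
    }

    public static void main(String[] args) throws Exception {
        long[] t = bestTimes(sampleXml(10_000), 5);
        System.out.printf("SAX: %.1f ms, DOM: %.1f ms%n", t[0] / 1e6, t[1] / 1e6);
    }
}
```

Using System.nanoTime() and keeping the best of several runs reduces the influence of JIT warm-up and class loading that a single System.currentTimeMillis() measurement includes.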