Internet Technology 1 Presentation 10: XML technologies


Ingeniørhøjskolen i Århus

Outline

• W3C & heritage of XML
• XML markup & namespaces
• DTDs
• XML Schemas
• DOM/SAX


W3C & the legacy of XML

• World Wide Web Consortium (W3C)
  – Founded in 1994 to lead the WWW into the future
  – Works on standardization for the Internet
  – Founder and Director: Sir Tim Berners-Lee
  – Run by a Chairman, a Director and staff
  – Member organizations submit proposals and help formulate the standards
  – Ensures standardization of WWW technologies
    • Such as XHTML, XML, XSL, CSS and SOAP
    • Members include Microsoft, IBM, Sun and Oracle
  – http://www.w3c.org


SGML

• Standard Generalized Markup Language (ISO 8879:1986 SGML)
• ISO-standard technology for defining generalized markup languages for documents
• The original SGML standard was accepted in October 1986
• Based on the Generalized Markup Language (GML)
• GML was developed in the 1960s by Charles Goldfarb, Edward Mosher and Raymond Lorie

[Figure: front-of-screen photograph from a 3279 mainframe-attached screen showing the LEXX Editor for the OED (1985/1986), segment of a sample entry]


XML markup

• eXtensible Markup Language
• XML is based on SGML (it is a subset of SGML)
• Like SGML, it describes structure, not layout (as HTML does)
• XML targets the Internet, but is also used for application exchange formats (Open Office, XMI), much as CSV files are
• XML is a W3C Recommendation
  – http://www.w3.org/TR/REC-xml
• Structure is decided by a DTD or a Schema (more later)
• Widespread support for XML (hype)


Examples of XML usage

• GUI for “thin” clients
  – XHTML
  – WML (we shall look closer at this shortly)
• Inter-process communication
  – SOAP
  – BizTalk
  – ebXML
• Databases
  – XML databases
  – XQuery
  – XLink, XPointer
• Representation / exchange of data
  – XMI (exchange format for UML diagrams)
  – MathML
  – CML
  – Proprietary formats
    • Example: EPJ XML (a good thing when every Danish county makes its own)
    • Easy to comprehend due to the nature of XML
    • Open Office


Presenting XML documents

• Examples are taken from Deitel
• First we will look at a standalone XML document and its components (elements)
  – Note: an XML document must be well formed (syntactically correct)
• See http://www.w3schools.com/default.asp for more in-depth examples of XML usage


Article.xml

<?xml version = "1.0"?>

<!-- Fig. 20.1: article.xml      -->
<!-- Article structured with XML -->

<article>

   <title>Simple XML</title>

   <date>September 19, 2001</date>

   <author>
      <firstName>Tem</firstName>
      <lastName>Nieto</lastName>
   </author>

   <summary>XML is pretty easy.</summary>

   <content>Once you have mastered XHTML, XML is easily
      learned. You must remember that XML is not for
      displaying information but for managing information.
   </content>

</article>

Element article is the root element.

Elements title, date, author, summary and content are child elements of article.

Element author is a “container element”: it contains the child elements firstName and lastName.
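The structure described above can be inspected programmatically. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser and a shortened copy of article.xml held in a string; the names ARTICLE_XML and is_well_formed are illustrative and not part of the original example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of article.xml (assumption: content text abbreviated).
ARTICLE_XML = """<?xml version="1.0"?>
<article>
  <title>Simple XML</title>
  <date>September 19, 2001</date>
  <author>
    <firstName>Tem</firstName>
    <lastName>Nieto</lastName>
  </author>
  <summary>XML is pretty easy.</summary>
  <content>Once you have mastered XHTML, XML is easily learned.</content>
</article>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(ARTICLE_XML)

# Direct children of the root element, in document order.
child_names = [child.tag for child in root]

# Children of the container element author.
author_children = [child.tag for child in root.find("author")]

print(root.tag)
print(child_names)
print(author_children)

# A mismatched closing tag makes the document not well formed.
print(is_well_formed("<article><title>Broken</article>"))
```

A parser of this kind only checks well-formedness; it says nothing about whether the document follows a particular structure.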


Browser displaying XML (unformatted)

IE5.5 displaying article.xml.


Small “stop up” exercise 1

• We will do some small exercises together.
• Spend 5 minutes on this exercise:
• Set up an XML document containing data about your merits at IHA and other institutions
  – Include the “groups of subjects” you have taken part in and your trainee period, if any
  – A course might have a name, a code, a type, a course holder, a type of assessment, and other properties you find relevant. Start with 2-3 courses
  – Your course grades: how do they relate to the courses, and what are they?


Use of XML Namespaces

• XML namespaces are used to avoid naming conflicts
• Needed when elements from several different vocabularies are involved
• <book> isn't always a book
• Keyword “xmlns”
• Remember: xmlns does nothing more than organize names into spaces.


Namespace.xml

<?xml version = "1.0"?>

<!-- Fig. 20.4 : namespace.xml -->
<!-- Demonstrating Namespaces  -->

<text:directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</text:directory>

Keyword xmlns creates two namespace prefixes, text and image.

URIs (Uniform Resource Identifiers) ensure that a namespace is unique.

filename, width and height are attributes.


Defaultnamespace.xml

<?xml version = "1.0"?>

<!-- Fig. 20.5 : defaultnamespace.xml -->
<!-- Using Default Namespaces         -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>

The xmlns attribute without a prefix declares the default namespace.

The first file element uses the default namespace.

The second file element uses the namespace prefix image.
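To see that the two file elements really are different names, the document can be parsed and the qualified names printed. This is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module, which writes a qualified name as {namespace-URI}localname. The prefixes t and img in the lookup table are chosen for this sketch only; what has to match is the URI, not the prefix used in the document.

```python
import xml.etree.ElementTree as ET

# Copy of defaultnamespace.xml without the comments.
DIRECTORY_XML = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <file filename="book.xml">
    <description>A book list</description>
  </file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

root = ET.fromstring(DIRECTORY_XML)

# Each tag is reported together with the URI of its namespace.
qualified_names = [child.tag for child in root]

# Prefixes used for searching are local to this script (illustrative names).
ns = {"t": "urn:deitel:textInfo", "img": "urn:deitel:imageInfo"}
text_files = [f.get("filename") for f in root.findall("t:file", ns)]
image_files = [f.get("filename") for f in root.findall("img:file", ns)]

print(qualified_names)
print(text_files)
print(image_files)
```

Both elements have the local name file, but a search in one namespace never returns the element from the other.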


Stop up exercise 2

• Use namespace(s) in your XML document (from stop up exercise 1)


DTDs

• Document Type Definition
• Written in Extended Backus-Naur Form (EBNF)
• Defines how an XML document is structured
  – Required elements
  – Nesting of elements
  – Does not define types or behaviour
• If a DTD is used, validating parsers can decide whether the XML document is “valid”, which is more than just “well formed”


Letter.dtd

<!-- Fig. 20.4: letter.dtd           -->
<!-- DTD document for letter.xml     -->

<!ELEMENT letter ( contact+, salutation, paragraph+,
   closing, signature )>

<!ELEMENT contact ( name, address1, address2, city, state,
   zip, phone, flag )>
<!ATTLIST contact type CDATA #IMPLIED>

<!ELEMENT name ( #PCDATA )>
<!ELEMENT address1 ( #PCDATA )>
<!ELEMENT address2 ( #PCDATA )>
<!ELEMENT city ( #PCDATA )>
<!ELEMENT state ( #PCDATA )>
<!ELEMENT zip ( #PCDATA )>
<!ELEMENT phone ( #PCDATA )>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "M">

<!ELEMENT salutation ( #PCDATA )>
<!ELEMENT closing ( #PCDATA )>
<!ELEMENT paragraph ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>

The first ELEMENT element type declaration defines the rules for element letter.

The plus sign (+) occurrence indicator specifies that the DTD allows one or more occurrences of an element (letter.xml contains 2 contact elements).

The contact element definition specifies that element contact contains the child elements name, address1, address2, city, state, zip, phone and flag, in that order.

CDATA (written without a leading #): character data, used here as an attribute type. #PCDATA: parsed character data.
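A DTD content model behaves like a regular expression over the names of an element's children. The sketch below illustrates that idea in Python; it is not a validating parser (the Python standard library does not validate against DTDs), and the two patterns in CONTENT_MODELS are hand translations of the letter and contact declarations made for this illustration. It checks direct children only.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (assumption: written for this sketch).
# Each child element name is followed by one space, so "contact+" in the
# DTD becomes "(contact )+" here, and "," becomes plain concatenation.
CONTENT_MODELS = {
    "letter": r"(contact )+salutation (paragraph )+closing signature ",
    "contact": r"name address1 address2 city state zip phone flag ",
}


def matches_content_model(element):
    """Check an element's direct children against its content model."""
    model = CONTENT_MODELS[element.tag]
    children = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, children) is not None


# Two contact elements and one paragraph: allowed by contact+ and paragraph+.
GOOD = ET.fromstring(
    "<letter><contact/><contact/><salutation/><paragraph/>"
    "<closing/><signature/></letter>"
)

# No contact element at all, so contact+ is not satisfied.
BAD = ET.fromstring(
    "<letter><salutation/><paragraph/><closing/><signature/></letter>"
)

# Children in the wrong order: zip appears before state.
SWAPPED = ET.fromstring(
    "<contact><name/><address1/><address2/><city/><zip/><state/>"
    "<phone/><flag/></contact>"
)

print(matches_content_model(GOOD))
print(matches_content_model(BAD))
print(matches_content_model(SWAPPED))
```

A real validating parser does this for every element in the document, and also checks attributes and their default values.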


The ATTLIST attribute-list declaration defines an attribute (i.e., type) for the contact element.

Keyword #IMPLIED specifies that the type attribute is optional: if the parser finds a contact element without a type attribute, the document is still valid and no default value is supplied.

Flag #PCDATA specifies that the element can contain parsed character data (i.e., text).

The document letter.xml follows.


Letter.xml

<?xml version = "1.0"?>

<!-- Fig. 20.3: letter.xml               -->
<!-- Business letter formatted with XML  -->

<!DOCTYPE letter SYSTEM "letter.dtd">

<letter>

   <contact type = "from">
      <name>John Doe</name>
      <address1>123 Main St.</address1>
      <address2></address2>
      <city>Anytown</city>
      <state>Anystate</state>
      <zip>12345</zip>
      <phone>555-1234</phone>
      <flag gender = "M"/>
   </contact>

   <contact type = "to">
      <name>Joe Schmoe</name>
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender = "M"/>
   </contact>

   <salutation>Dear Sir:</salutation>

   <paragraph>It is our privilege to inform you about our new
      database managed with XML. This new system allows
      you to reduce the load of your inventory list server by
      having the client machine perform the work of sorting
      and filtering the data.</paragraph>
   <closing>Sincerely</closing>
   <signature>Mr. Doe</signature>

</letter>

Program Output

XML validator: http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/welcome.asp or http://kurser.iha.dk/eit/net1/iw3_htp_3e_examples/ch20_XML/xml_validator.exe


Stop up exercise 3

• Take 5 minutes:
• Write a DTD that your XML document educations.xml conforms to, and extend the XML document with a reference to the DTD


XML Schema

• DTDs work OK, but
  – They are written in Extended Backus-Naur Form: why not use XML to describe XML?
  – They cannot declare the data type of an element's content, so <amount>hundrede kr</amount> (Danish for “a hundred kroner”) is accepted
    • This could give problems
  – Several other problems
• W3C XML Schema
  – Uses XML to describe the structure of XML documents
  – Makes it possible to give type information to XML definitions
• Not supported by all parsers yet
• Will coexist with DTDs for a while
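The problem with the untyped amount element can be made concrete. The sketch below is a hand-written stand-in for the kind of check a schema type such as xsd:decimal would make; it is not schema validation, and the function name amount_is_decimal is made up for this illustration. Python's Decimal also accepts forms such as exponents that a schema decimal type would reject, so the check is only approximate.

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET


def amount_is_decimal(xml_text):
    """Return True if the element's text parses as a decimal number."""
    amount = ET.fromstring(xml_text)
    text = (amount.text or "").strip()
    try:
        Decimal(text)
        return True
    except InvalidOperation:
        return False


# Both documents are well formed, and a DTD declaring
# <!ELEMENT amount ( #PCDATA )> would accept both of them.
print(amount_is_decimal("<amount>100.00</amount>"))
print(amount_is_decimal("<amount>hundrede kr</amount>"))
```

With a DTD, every application reading the document has to repeat a check like this itself; with a schema, the type is declared once next to the element definition.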


Book.xsd

<?xml version = "1.0"?>

<!-- Fig. 20.8 : book.xsd                -->
<!-- Simple W3C XML Schema document      -->

<xsd:schema xmlns:xsd = "http://www.w3.org/2000/10/XMLSchema"
   xmlns:deitel = "http://www.deitel.com/booklist"
   targetNamespace = "http://www.deitel.com/booklist">

   <xsd:element name = "books" type = "deitel:BooksType"/>

   <xsd:complexType name = "BooksType">
      <xsd:element name = "book" type = "deitel:BookType"
         minOccurs = "1" maxOccurs = "unbounded"/>
   </xsd:complexType>

   <xsd:complexType name = "BookType">
      <xsd:element name = "title" type = "xsd:string"/>
   </xsd:complexType>

</xsd:schema>

xsd is the namespace prefix for the XML Schema vocabulary.

Element element defines an element to be included in the XML document structure.

Attributes name and type specify the element’s name and data type, respectively.

Element complexType defines an element type; BooksType has a child element named book.

Attribute minOccurs specifies that books must contain a minimum of one book element.

Attribute targetNamespace names the namespace to which the declared elements belong.


A BookType has an element named title of type “xsd:string”, which is defined in the namespace “http://www.w3.org/2000/10/XMLSchema”.


Stop up exercise 4

• Spend 5 minutes discussing with your neighbor how to replace the DTD with a Schema


How to use XML?

• You need a parser (or a parser API) to access XML (as with CSV)
• Two commonly used methods:
  – DOM (Document Object Model)
    • W3C Recommendation
    • Builds a tree-structure representation of an XML document in memory
  – SAX (Simple API for XML)
    • Supported by different vendors
    • Parses the document sequentially and sends events to subscribers
    • Needs to parse again every time access to the XML document is needed
• DOM characteristics
  – Slow to load the XML document (all of it is needed)
  – Quick random read or update access to the XML (like the BOM in a WWW browser)
  – Requires a lot of memory (the entire XML document is held in memory)
• SAX characteristics
  – Good for applications subscribing to certain parts of the XML (event subscription)
  – Slow for random access to the XML document (must parse every time)
• Think of an XML document as a kind of persistent data (“a database”)
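The event-subscription style of SAX can be sketched as follows in Python, assuming the standard-library xml.sax module. The handler class TitleCollector and the small books document are invented for this illustration; the handler reacts only to title elements and never holds the whole document in memory.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Subscribe to <title> elements only and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside <title>

    def startElement(self, name, attrs):
        # Event: an opening tag was read.
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Event: text was read; it may arrive in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Event: a closing tag was read.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


# Hypothetical input document for the sketch.
BOOKS_XML = (
    b"<books>"
    b"<book><title>XML basics</title></book>"
    b"<book><title>DOM and SAX</title></book>"
    b"</books>"
)

handler = TitleCollector()
xml.sax.parseString(BOOKS_XML, handler)
print(handler.titles)
```

To read a different part of the document later, the input has to be parsed again with another handler, which is the random-access cost listed above.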


What is DOM

• DOM: Document Object Model
  – http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/
• W3C definition:
  – Standard for accessing structured documents
  – Core DOM used with XML
  – HTML DOM used with HTML
  – Represents a document as a tree of objects
  – Provides a uniform interface for programming and scripting languages
  – APIs available for JavaScript, Java, C++, C# etc.


DOM Tree Structure

• Tree structure of an XML document
• … or of an HTML document

[Figure: DOM tree of an HTML table: table at the root, then tbody, then several tr nodes, each with td children, one of which holds a text node]

<table> <tbody> <tr> <td> text </td> …


Example – using DOM on Article.xml

• We have looked at Article.xml
• We will:
  – Look at the Article.xml document again
  – Look at the tree structure formed by loading it into a DOM
  – Use JavaScript to work on it


XML document: article.xml (Fig. 20.1, the same document as shown under Article.xml)


DOM Methods

[Figure: tree structure for article.xml: root node article with child nodes title, date, author, summary and content; author has child nodes firstName and lastName]


DOMExample.html

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<!-- Fig. 20.15 : DOMExample.html -->
<!-- DOM with JavaScript          -->

<head>
   <title>A DOM Example</title>
</head>

<body>

<script type = "text/javascript" language = "JavaScript">
<!--
   // ActiveXObject is available in Internet Explorer only
   var xmlDocument = new ActiveXObject( "Microsoft.XMLDOM" );

   xmlDocument.load( "article.xml" );

   // get the root element
   var element = xmlDocument.documentElement;

   document.writeln(
      "<p>Here is the root node of the document: " +
      "<strong>" + element.nodeName + "</strong>" +
      "<br />The following are its child elements:" +
      "</p><ul>" );

   // traverse all child nodes of root element
   for ( var i = 0; i < element.childNodes.length; i++ ) {
      var curNode = element.childNodes.item( i );

      // print node name of each child element
      document.writeln( "<li><strong>" + curNode.nodeName
         + "</strong></li>" );
   }

   document.writeln( "</ul>" );

   // get the first child node of root element
   var currentNode = element.firstChild;

   document.writeln( "<p>The first child of root node is: " +
      "<strong>" + currentNode.nodeName + "</strong>" +
      "<br />whose next sibling is:" );

   // get the next sibling of first child
   var nextSib = currentNode.nextSibling;

   document.writeln( "<strong>" + nextSib.nodeName +
      "</strong>.<br />Value of <strong>" +
      nextSib.nodeName + "</strong> element is: " );

   var value = nextSib.firstChild;

   // print the text value of the sibling
   document.writeln( "<em>" + value.nodeValue + "</em>" +
      "<br />Parent node of <strong>" + nextSib.nodeName +
      "</strong> is: <strong>" +
      nextSib.parentNode.nodeName + "</strong>.</p>" );
-->
</script>

</body>
</html>

The script instantiates a Microsoft XML Document Object Model object and assigns it to reference xmlDocument.

Method load loads article.xml (Fig. 20.1) into memory.

Property documentElement corresponds to the root element in the document (i.e., article).

The for loop iterates through the root node’s children using property childNodes.

Property firstChild retrieves the root node’s first child node (i.e., title).

Property parentNode returns a node’s parent node.
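The same DOM properties exist in other language bindings. The following is a rough Python counterpart, assuming the standard-library xml.dom.minidom module and a shortened copy of article.xml; it is a sketch for comparison, not a port of the browser script. One difference is worth noting: minidom keeps whitespace-only text nodes, so the root's firstChild is a text node here, whereas the annotation above (first child is title) assumes a parser that drops them.

```python
from xml.dom import minidom

# Shortened copy of article.xml (assumption: content element left out).
ARTICLE_XML = """<article>
  <title>Simple XML</title>
  <date>September 19, 2001</date>
  <author><firstName>Tem</firstName><lastName>Nieto</lastName></author>
  <summary>XML is pretty easy.</summary>
</article>"""

document = minidom.parseString(ARTICLE_XML)

# documentElement is the root element, as in the JavaScript example.
root = document.documentElement

# The raw first child is the line break and indentation before <title>.
raw_first_child = root.firstChild.nodeName

# Keep element nodes only, skipping whitespace text nodes.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
child_names = [n.nodeName for n in elements]

first = elements[0]          # title
next_sibling = elements[1]   # date

# The text of an element is stored in a child text node.
value = next_sibling.firstChild.nodeValue
parent_name = next_sibling.parentNode.nodeName

print(root.nodeName)
print(child_names)
print(raw_first_child)
print(first.nodeName, next_sibling.nodeName, value, parent_name)
```

Because the whole tree is in memory, any node can be reached again without reparsing, which is the DOM trade-off described earlier.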


Program Output


Tools

• XML-Spy: www.xml-spy.com
• Stylus Studio: www.stylusstudio.com
• Others:
  – APIs for programmatic access


Formatting XSL & CSS

• XML is only content, with no formatting

• It is possible to transform the data to XHTML (or another format) using JavaScript or server-side code

• The W3C ideal is to use CSS or XSL (eXtensible Stylesheet Language)

• CSS is most common today, but XSL has more features


The 3 Main Technologies of XSL

• XSLT, a language for transforming information

• XSL or XSL-FO, a language for formatting information

• XPath, a language for addressing parts of an XML document and accessing them

• Each of these technologies could fill an entire class.
• We will be dealing with them in a later course
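As a small foretaste of XPath, the sketch below selects parts of a shortened article document by path. It assumes Python's standard-library xml.etree.ElementTree, which supports only a limited subset of XPath, so it illustrates the idea of path expressions rather than the full language.

```python
import xml.etree.ElementTree as ET

# Shortened article document (assumption: only two children kept).
root = ET.fromstring(
    "<article><title>Simple XML</title>"
    "<author><firstName>Tem</firstName><lastName>Nieto</lastName></author>"
    "</article>"
)

# Path from the root: child author, then its child lastName.
last_name = root.findtext("./author/lastName")

# "*" selects every child element of author.
names_under_author = [e.tag for e in root.findall("./author/*")]

# ".//" searches all descendants, whatever their depth.
first_names = [e.text for e in root.findall(".//firstName")]

print(last_name)
print(names_under_author)
print(first_names)
```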