XML An introduction

XML An introduction. xml XML like HTML is created from the Standard Generalized Markup Language, SGML

XML

• XML, like HTML, is derived from the Standard Generalized Markup Language (SGML); XML itself is a simplified subset of SGML.

A brief introduction to XML: a simple XML document

<?xml version = "1.0"?>
<!-- a simple XML example; this is a comment -->
<mymessage>
   <message>Welcome to XML!</message>
</mymessage>

In validator: file is in examples\ch05\intro.xml

XML documents and format

• An XML document contains data, not formatting information. As we'll learn, formatting for XML can be supplied separately (by XSL and XSL-FO files, for example), much as CSS supplies formatting for HTML.

XML

• XML documents are typically stored in files with the suffix .xml, though this is not required. They can be created with any text editor (save as plain text). Many packages, such as MS Word, can also save files as type .xml.

• An XML document contains a single root element, which contains all other elements. Anything appearing before the root element (for example, the XML declaration and comments) is called the prolog. Elements directly under the root are its children; each child may contain children of its own, so the structure is recursive.

• In the example, the root's child message contains the text "Welcome to XML!".
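The following is a minimal sketch of what a DOM parser gives an application for the simple document. It assumes the JAXP DOM parser bundled with the JDK (javax.xml.parsers) and a document written with straight quotes and a standard <!-- --> comment. The class SimpleDocDemo and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses the simple document with the JDK's DOM parser.
class SimpleDocDemo {
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!-- a simple XML example; this is a comment -->\n"
        + "<mymessage>\n"
        + "   <message>Welcome to XML!</message>\n"
        + "</mymessage>\n";

    // Parse a string into a DOM tree; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Name of the single root element.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getTagName();
    }

    // Text of the first <message> element below the root.
    static String messageText(String xml) {
        Element root = parse(xml).getDocumentElement();
        return root.getElementsByTagName("message").item(0).getTextContent();
    }

    // The prolog comment is the document node's first child
    // (the XML declaration itself does not become a DOM node).
    static boolean prologStartsWithComment(String xml) {
        return parse(xml).getFirstChild().getNodeType() == Node.COMMENT_NODE;
    }
}
```

For this document, rootName(XML) returns "mymessage" and messageText(XML) returns "Welcome to XML!".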

The character set

• XML characters are tab, CR, LF and most other Unicode characters (other control characters are not allowed).
• An XML document consists of markup and character data.
• Markup is enclosed in angle brackets (as in HTML): < >
• Character data appears between the start and end tags.
• An XML parser passes whitespace characters to the application. Insignificant whitespace can be collapsed in a process called normalization.

• It is a good idea to add whitespace to an XML document for readability.

• &, <, >, ' and " are reserved characters. An entity reference makes it possible to use them as literal characters in the character data part of an XML document. XML predefines five: &amp;, &lt;, &gt;, &apos; and &quot;.

• Entity references begin with & and end with ;. In this way character data is not confused with markup.
• Single and double quotes are used to delimit attribute values.
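Escaping reserved characters is mechanical, so a short sketch can show it. The hypothetical helper below is not part of any library. It replaces each reserved character with one of the five predefined entity references. Because it works one character at a time, an & it has just written is never escaped a second time.

```java
// Hypothetical helper: replaces reserved characters with predefined entity references.
class XmlEscape {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // would otherwise start a reference
                case '<':  out.append("&lt;");   break; // would otherwise start a tag
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break; // matters inside '...' attribute values
                case '"':  out.append("&quot;"); break; // matters inside "..." attribute values
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, escape("a < b && c") returns "a &lt; b &amp;&amp; c".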

More on syntax

• There must be exactly one root element.
• Elements must be properly nested.
• Every start tag requires a matching end tag (an empty element may instead use a single tag ending in />).
• Unlike HTML, XML lets authors define their own tags.
• Tags are case sensitive: <message> and </Message> do not match.
• The parser needs to distinguish markup from character data.
• Typically, whitespace is normalized: each run of whitespace is reduced to a single whitespace character.
• Entity references are marked with an ampersand. They let us use metacharacters ('<', '>' and so on), which are part of the language syntax, as ordinary characters.

• Entity references (for example, "&lt;") allow us to represent and distinguish the reserved characters <, > and & in XML.

• Outside CDATA sections, < and & may appear in character data only as entity references.
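A non-validating parser checks exactly the syntax rules above. The following minimal sketch assumes JAXP's DocumentBuilder. The hypothetical method isWellFormed treats any fatal parse error as "not well-formed". A custom ErrorHandler rethrows fatal errors so they end up as a false result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: true if the string parses as well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                // Well-formedness violations are fatal errors: abort the parse.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) { // parser configuration or I/O problems
            throw new RuntimeException(e);
        }
    }
}
```

With this sketch, "<a></A>" (case mismatch), "<a/><b/>" (two roots) and "<a><b></a></b>" (bad nesting) all return false. A declaration written with typographic quotes (“1.0”) is rejected too, because only straight quotes delimit values.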

XML intro continued

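• A DOM-based parser returns a tree structure. A DOM parser must process the entire document to build an in-memory (Java) object tree, which may be 3 or 4 times the size of the original document. DOM is therefore not advisable when memory is constrained.

• A SAX-based parser (SAX: Simple API for XML) reports events, such as the start of an element, character data and the end of an element, to the application as it reads the document. SAX parsers have a smaller footprint because they do not build a tree.

• Many parsers can be downloaded for free, and Java 1.4 and later include DOM and SAX parsers (through JAXP).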
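The following minimal SAX sketch makes the DOM/SAX difference concrete. It uses the parser bundled with the JDK. The hypothetical class SaxEventLog builds no tree; it only records the callbacks the parser makes. SAX may deliver one run of text in several characters() calls, and this sketch logs each non-blank piece separately.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: records the events a SAX parser reports instead of building a tree.
class SaxEventLog {
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("text " + text); // skip whitespace-only runs
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }
}
```

For the simple mymessage document, the log reads: start mymessage, start message, text Welcome to XML!, end message, end mymessage.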

A brief introduction to XML

• An XML validator parses an XML document and reports whether it is well-formed and, if the document references a DTD or schema, whether it is valid.

• A number of free validators are available, including one from Microsoft, which I downloaded and used in this presentation.

Validator

Microsoft provides a validating program free for download (with javascript and VBscript versions) at

http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/xml/xml_validator/default.asp

Or search for "MSDN validator". There are others out there:

http://validator.w3.org/

http://www.stg.brown.edu/service/xmlvalid/

http://www.w3schools.com/XML/xml_validator.asp

Link to validator program on my w drive

• JavaScript validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm

• VBScript validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_vbs.htm

Parser continued

• The parser will indicate whether the document is well-formed.

• In the validator's tree view of a DOM-parsed document, a '+' in the left margin indicates a node whose children can be expanded, and a '-' indicates that all of its child nodes have been expanded.

• The MS validator uses color coding to indicate that child nodes can be expanded.

• An element that contains other elements is called a container element.

• If the document is well-formed, the parser makes its content available for further processing.

Validator example

Validator

Reserved characters

• <message>&lt;&gt;&amp;</message> lets the message's character data contain the characters <, > and &.
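In the other direction, the parser replaces entity references with the characters they stand for before handing the text to the application. This is a small sketch using the JDK's DOM parser; the class name EntityDecodeDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: returns the root element's character data after parsing.
class EntityDecodeDemo {
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent(); // entity references are already replaced here
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

rootText("<message>&lt;&gt;&amp;</message>") returns the three-character string "<>&".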

DTD: document type definition

• A DTD file may contain the definition of an XML structure.
• XML files may refer to a DTD.
• If an XML document has a DTD or schema, a validating parser can determine not merely whether it is well-formed XML, but whether it is valid.

• Valid means conforming to a DTD or schema.
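With JAXP, the difference between well-formed and valid can be shown by turning on DTD validation. This is a minimal sketch under a few assumptions:

- The DTD is given inline as an internal subset, so no lang.dtd file is needed.
- Validity errors, which the parser reports through ErrorHandler.error rather than fatalError, are turned into a false result.
- The class name DtdValidation is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: true only if the document is well-formed AND valid against its DTD.
class DtdValidation {
    // Internal-subset DTD modeled on the element declarations in lang.dtd.
    static final String DTD =
          "<!DOCTYPE welcome ["
        + "<!ELEMENT welcome (from, subject)>"
        + "<!ELEMENT from (#PCDATA)>"
        + "<!ELEMENT subject (#PCDATA)>"
        + "]>";

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DOCTYPE's DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors (e.g. elements in the wrong order) arrive here.
                public void error(SAXParseException e) throws SAXException { throw e; }
                // Well-formedness errors arrive here.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) { // parser configuration or I/O problems
            throw new RuntimeException(e);
        }
    }
}
```

A document with from before subject passes. Swapping the two elements still gives well-formed XML, but the document is no longer valid.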

Another example: Unicode

• Lang.xml (next slide) uses Unicode character references (such as &#1583;) to represent Arabic words.

• lang.dtd (also shown in a later slide) declares the entities assoc and text, whose replacement text consists of Arabic Unicode characters; lang.xml uses them as the entity references &assoc; and &text;.

DTD: document type definition: a dtd file may contain the definition of an xml structure.

<?xml version = "1.0"?>
<!-- Fig. 5.4 : lang.xml -->
<!-- Demonstrating Unicode -->
<!DOCTYPE welcome SYSTEM "lang.dtd">
<welcome>
   <from>
      <!-- Deitel and Associates -->
      &#1583;&#1575;&#1610;&#1578;&#1614;&#1604;
      &#1571;&#1606;&#1583;
      <!-- entity -->
      &assoc;
   </from>
   <subject>
      <!-- Welcome to the world of Unicode -->
      &#1571;&#1607;&#1604;&#1575;&#1611;
      &#1576;&#1603;&#1605;
      &#1601;&#1610;&#1616;
      &#1593;&#1575;&#1604;&#1605;
      <!-- entity -->
      &text;
   </subject>
</welcome>

Lang.dtd

<!-- lang.dtd -->
<!ELEMENT welcome ( from, subject )>
<!ELEMENT from ( #PCDATA )>
<!ELEMENT subject ( #PCDATA )>
<!ENTITY assoc "&#1571;&#1587;&#1617;&#1608;&#1588;&#1616;&#1610;&#1614;&#1578;&#1618;&#1587;">
<!ENTITY text "&#1575;&#1604;&#1610;&#1608;&#1606;&#1610;&#1603;&#1608;&#1583;">

Lang.xml in validator

Lang.xml in IE

About the example

• The DTD reference contains: DOCTYPE, the name of the root, the SYSTEM flag indicating the DTD file is external, and the name of that file.

• Root element welcome contains two elements: from and subject.

• Some lines contain character references for Unicode characters.

• The DTD also defines some other entity references.
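Once parsed, both mechanisms in lang.xml become ordinary Unicode text:

- A character reference such as &#1583; is replaced by the character with that decimal code point (U+062F).
- An entity declared in the DTD is replaced by its replacement text.

The following is a minimal sketch using JAXP. The DTD is inlined and cut down to a two-character entity, and the class name UnicodeEntityDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: character and entity references are expanded by the parser.
class UnicodeEntityDemo {
    // Shortened stand-in for lang.dtd: the entity "text" expands to two Arabic letters.
    static final String XML =
          "<!DOCTYPE welcome ["
        + "<!ELEMENT welcome (#PCDATA)>"
        + "<!ENTITY text \"&#1575;&#1604;\">"
        + "]>"
        + "<welcome>&#1583;&text;</welcome>";

    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent(); // references already expanded to characters
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The result is the three characters U+062F, U+0627 and U+0644. No entity or character reference syntax remains.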

More about markup

• An empty element may be written as a single tag ending in />, as in <emptyelt xxxx />.
• Otherwise an element must have a complete end tag, as in: <sometag> xxxxxxxxxxx </sometag>
• Elements may or may not have content (child elements or character data).
• Elements may have 0 or more attributes associated with them. Attributes appear in the element's start tag: <car doors = "4"/>
• Attribute values must appear in single or double quotes.
• Element and attribute names may not contain blanks.
• Here, element car has attribute doors with value 4.
• Attribute names may be of any length but must start with a letter or underscore. Attribute values may contain any characters, except that <, & and the delimiting quote character must be written as entity references.
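This sketch reads the attribute from the car example with DOM. It assumes the JDK's JAXP parser, and the class name AttributeDemo is hypothetical. Attribute values always come back as strings, so doors is "4", not a number. Single and double quotes give the same value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: reads an attribute of the root element with DOM.
class AttributeDemo {
    static String rootAttribute(String xml, String name) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // DOM returns "" (not null) when the attribute is absent.
            return root.getAttribute(name);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```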

Usage.xml uses a stylesheet

<?xml version = "1.0"?>

<!-- Fig. 5.5 : usage.xml -->
<!-- Usage of elements and attributes -->

<?xml:stylesheet type = "text/xsl" href = "usage.xsl"?>

<book isbn = "999-99999-9-X">
   <title>Deitel&apos;s XML Primer</title>

   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>

   <chapters>
      <preface num = "1" pages = "2">Welcome</preface>
      <chapter num = "1" pages = "4">Easy XML</chapter>
      <chapter num = "2" pages = "2">XML Elements?</chapter>
      <appendix num = "1" pages = "9">Entities</appendix>
   </chapters>

   <media type = "CD"/>
</book>

Usage.xsl (in notes)

The <? ... ?> construct in usage.xml is a PI (that is, a processing instruction). A PI consists of a PI target (xml:stylesheet, in this example) and a PI value. Note the syntax; the W3C-standard target for attaching a stylesheet is xml-stylesheet.

PIs let authors embed application-specific data in an XML document. If the application processing the XML doesn't use a PI, the PI has no effect on the document's content.
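A DOM parser exposes each PI as a node with a target and a value (its data). This sketch makes three assumptions:

- It uses the JDK's JAXP parser.
- It uses the standard xml-stylesheet target instead of the xml:stylesheet form in usage.xml.
- The class PiDemo is hypothetical.

An application that doesn't care about PIs can simply skip these nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical demo: lists "target | value" for each PI at the top level of a document.
class PiDemo {
    static List<String> topLevelPIs(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        List<String> result = new ArrayList<>();
        // The XML declaration is not a PI node, so it never shows up here.
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                result.add(pi.getTarget() + " | " + pi.getData());
            }
        }
        return result;
    }
}
```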

Usage.xml in validator

Usage.XML document loaded into IE: Browser uses stylesheet to generate HTML

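CDATA

• Text in a CDATA section is not parsed as markup: the parser passes it to the application as plain character data.

• CDATA sections might be used for JavaScript or VBScript code, which often contains characters such as < and &.

• A CDATA section starts with <![CDATA[ and ends with ]]>; the keyword CDATA is case sensitive.

• A CDATA section may contain reserved characters, but not the text "]]>".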
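The two sample elements in cdata.xml deliver the same text to the application. One uses entity references; the other uses a CDATA section. The following sketch shows this with a shortened version of the C++ line (the names x, y and z are hypothetical) and the JDK's DOM parser. The class name CDataDemo is also hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo: CDATA text reaches the application as ordinary character data.
class CDataDemo {
    static final String ESCAPED = "<sample>if ( x &lt; 5 &amp;&amp; y ) cerr &lt;&lt; z;</sample>";
    static final String CDATA   = "<sample><![CDATA[if ( x < 5 && y ) cerr << z;]]></sample>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // By default (no coalescing) the DOM keeps a CDATA section as its own node type.
    static boolean firstChildIsCData(String xml) {
        return parseRoot(xml).getFirstChild().getNodeType() == Node.CDATA_SECTION_NODE;
    }
}
```

Calling getTextContent() on either root returns "if ( x < 5 && y ) cerr << z;".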

Text example 5.7

<?xml version = "1.0"?>
<!-- Fig. 5.7 : cdata.xml -->
<!-- CDATA section containing C++ code -->
<book title = "C++ How to Program" edition = "3">
   <sample>
      // C++ comment
      if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
         cerr &lt;&lt; this-&gt;displayError();
   </sample>
   <sample>
      <![CDATA[
         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>
   C++ How to Program by Deitel &amp; Deitel
</book>

CData example from text 5.7

Cdata.xml in MS validator (file is in examples\ch05)

letter.xml (blank lines removed to fit the slide)

<?xml version = "1.0"?>
<letter>
   <contact type = "from">
      <name>Jane Doe</name>
      <address1>Box 12345</address1>
      <address2>15 Any Ave.</address2>
      <city>Othertown</city>
      <state>Otherstate</state>
      <zip>67890</zip>
      <phone>555-4321</phone>
      <flag gender = "F"/>
   </contact>
   <contact type = "to">
      <name>John Doe</name>
      <address1>123 Main St.</address1>
      <address2></address2>
      <city>Anytown</city>
      <state>Anystate</state>
      <zip>12345</zip>
      <phone>555-1234</phone>
      <flag gender = "M"/>
   </contact>
   <salutation>Dear Sir:</salutation>
   <paragraph>It is our privilege to inform you about our new database managed with <bold>XML</bold>. This new system allows you to reduce the load on your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>
   <paragraph>The data in an XML element is normalized, so plain-text diagrams such as /---\ | | \---/ will become gibberish.</paragraph>
   <closing>Sincerely</closing>
   <signature>Ms. Doe</signature>
</letter>

letter.xml in Validator

namespaces

• Naming collisions can occur when XML authors use the same tag names for different things.

• Namespaces provide a mechanism for making tag references unambiguous.

• A namespace prefix appears in the start and end tags, separated from the element name by a colon. So,

• <movie:character>Scrooge</movie:character> can be differentiated from <ascii:character>colon</ascii:character>

• Each namespace prefix is bound to a unique URI in the XML document (with an xmlns:prefix attribute). Almost any name can be used as a namespace prefix.

• In this example, ascii and movie are namespace prefixes. Namespace prefixes can precede element and attribute names to avoid collisions.

• A URL may be used as the URI. The only requirement, though, is uniqueness: the parser does not visit the URL.
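A namespace-aware parser reports each prefixed name as a prefix, a local name and the URI the prefix is bound to. Elements with the same local name can therefore still be told apart. This is a minimal sketch using JAXP, with a cut-down version of namespace.xml; the class name NamespaceDemo is hypothetical. Note that namespace awareness must be switched on explicitly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: how a namespace-aware parser sees prefixed element names.
class NamespaceDemo {
    static List<String> describeElements(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        List<String> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Node n = all.item(i);
            // prefix, local name, and the URI the prefix is bound to
            result.add(n.getPrefix() + " " + n.getLocalName() + " " + n.getNamespaceURI());
        }
        return result;
    }
}
```

Both file elements have local name file, but one is in urn:deitel:textInfo and the other in urn:deitel:imageInfo.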

Namespace example 5.8

<?xml version = "1.0"?>
<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->
<text:directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</text:directory>

Namespace.xml in validator: file is in examples\ch05

Namespace.xml example 5.8 in IE

Namespaces continued

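• Providing a prefix on every element can be tedious. A default namespace can be declared with xmlns = "URI", and elements from that namespace then need no prefix. A default namespace applies only to unprefixed element names; unprefixed attributes are in no namespace.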

Default namespaces

<?xml version = "1.0"?>

<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->

<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">

   <file filename = "book.xml">
      <description>A book list</description>
   </file>

   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>

</directory>

Default namespaces

• Now the unprefixed elements directory, file and description are in the default namespace, urn:deitel:textInfo.

• Compare this example to the earlier namespace example, where text and image were both explicit namespace prefixes.
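A namespace-aware parser shows the default namespace at work. The unprefixed file element gets urn:deitel:textInfo, and image:file keeps urn:deitel:imageInfo. The unprefixed filename attribute gets no namespace at all. This is a minimal sketch using JAXP, with a cut-down version of defaultnamespace.xml; the class name DefaultNamespaceDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: namespaces of unprefixed elements and attributes under a default namespace.
class DefaultNamespaceDemo {
    static final String XML =
          "<directory xmlns=\"urn:deitel:textInfo\" xmlns:image=\"urn:deitel:imageInfo\">"
        + "<file filename=\"book.xml\"/>"
        + "<image:file filename=\"funny.jpg\"/>"
        + "</directory>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Namespace URI of the index-th element in document order (0 = root).
    static String elementNamespace(String xml, int index) {
        return parse(xml).getElementsByTagName("*").item(index).getNamespaceURI();
    }

    // Namespace URI of an attribute on that element; null means "no namespace".
    static String attributeNamespace(String xml, int index, String attrName) {
        Element e = (Element) parse(xml).getElementsByTagName("*").item(index);
        return e.getAttributeNode(attrName).getNamespaceURI();
    }
}
```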

Defaultnamespace.xml in IE

Day planner case study…to be continued…

<?xml version = "1.0"?>
<!-- Fig. 5.10 : planner.xml -->
<!-- Day Planner XML document -->
<planner>
   <year value = "2000">
      <date month = "7" day = "15">
         <note time = "1430">Doctor&apos;s appointment</note>
         <note time = "1620">Physics class at BH291C</note>
      </date>
      <date month = "7" day = "4">
         <note>Independence Day</note>
      </date>
      <date month = "7" day = "20">
         <note time = "0900">General Meeting in room 32-A</note>
      </date>
      <date month = "7" day = "20">
         <note time = "1900">Party at Joe&apos;s</note>
      </date>
      <date month = "7" day = "20">
         <note time = "1300">Financial Meeting in room 14-C</note>
      </date>
   </year>
</planner>

Planner.xml in validator

Day planner using a Java GUI. A SAX parser is used to parse the document.

(in text chapter 8)
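The chapter 8 program is not reproduced here. As a hedged sketch of the SAX approach only, the hypothetical class PlannerNotes streams through a planner document. It collects the text of every note for a given month and day, and it keeps a few fields of state instead of a tree. Text is accumulated across characters() calls because SAX may split it, for example around &apos;.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX sketch: collects the notes of every <date> matching month/day.
class PlannerNotes {
    static List<String> notesFor(String xml, String month, String day) {
        List<String> notes = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inMatchingDate = false;
            private StringBuilder text = null; // non-null while inside a matching <note>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("date")) {
                    inMatchingDate = month.equals(atts.getValue("month"))
                                  && day.equals(atts.getValue("day"));
                } else if (qName.equals("note") && inMatchingDate) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length); // may arrive in pieces
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("note") && text != null) {
                    notes.add(text.toString());
                    text = null;
                } else if (qName.equals("date")) {
                    inMatchingDate = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return notes;
    }
}
```

For planner.xml, notesFor(xml, "7", "20") would collect the three July 20 notes in document order.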

Homework on this section

• Install an XML validator.

• Create your own XML file and validate it.

• Post screenshots of your XML file and of the validator's output.