XML


What is XML?
• XML stands for EXtensible Markup Language. XML is a markup language much like HTML.
• XML was designed to describe data. XML was created to structure, store and send information.
• An XML document is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it. (APIs have been developed in C, C++, Java and other languages that help in creating, reading and manipulating XML documents, so writing such software is easy.)
• XML is a cross-platform, software- and hardware-independent tool for transmitting information.
• XML tags are not predefined. You must define your own tags.

Example 1:

<priceList>
  <coffee>
    <name>Mocha Java</name>
    <price>11.95</price>
  </coffee>
  <coffee>
    <name>Espresso</name>
    <price>12.50</price>
  </coffee>
</priceList>

Example 2:

<student-list>
  <batch year="2005">
    <student>
      <name>ABC</name>
      <roll-no>11667</roll-no>
    </student>
    <student>
      <name>ffs</name>
      <roll-no>16667</roll-no>
    </student>
  </batch>
</student-list>
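As a rough illustration of the "piece of software" that has to read such a document, the sketch below loads the student list with Python's standard xml.etree.ElementTree module. Python is an assumption here (the slides name C, C++ and Java APIs), and the function name read_students is made up for this example.

```python
import xml.etree.ElementTree as ET

# The student list from Example 2, with plain ASCII quotes around the attribute.
STUDENT_LIST = """<student-list>
  <batch year="2005">
    <student><name>ABC</name><roll-no>11667</roll-no></student>
    <student><name>ffs</name><roll-no>16667</roll-no></student>
  </batch>
</student-list>"""


def read_students(xml_text):
    """Return a (year, name, roll-no) tuple for every student in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for batch in root.findall("batch"):
        year = batch.get("year")  # attribute value of <batch>
        for student in batch.findall("student"):
            rows.append((year, student.findtext("name"), student.findtext("roll-no")))
    return rows


students = read_students(STUDENT_LIST)
print(students)
```

The tag names carry the meaning, so the reading code needs no knowledge of column positions or file offsets.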

Uses
• XML is used to exchange data: In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been exchanging data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.
• XML can be used to share data: Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data. This makes it much easier to create data that different applications can work with. It also makes it easier to expand or upgrade a system to new operating systems, servers, applications and browsers.
• XML can be used to store data: XML provides many of the things found in databases: storage (XML documents), schemas (DTDs, XML Schemas, RELAX NG, and so on), query languages (XQuery, XPath, XQL, XML-QL, QUILT, etc.), programming interfaces (SAX, DOM, JDOM), and so on. On the minus side, it lacks many of the things found in real databases: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, queries across multiple documents, and so on.
• XML can be used to create new languages: WML, the markup language used with WAP, is defined in XML.
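To make the "store data" point concrete, here is a small sketch that queries the price list from Example 1 as if it were a tiny data store. It assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath rather than any of the full query languages listed above.

```python
import xml.etree.ElementTree as ET

# The price list from Example 1, used here as a tiny "database".
PRICE_LIST = """<priceList>
  <coffee><name>Mocha Java</name><price>11.95</price></coffee>
  <coffee><name>Espresso</name><price>12.50</price></coffee>
</priceList>"""

root = ET.fromstring(PRICE_LIST)

# XPath-style lookup: the price of the coffee whose <name> is "Espresso".
espresso_price = root.findtext("./coffee[name='Espresso']/price")

# Anything beyond simple path matching is done in ordinary code.
under_twelve = [
    coffee.findtext("name")
    for coffee in root.findall("coffee")
    if float(coffee.findtext("price")) < 12.0
]

print(espresso_price, under_twelve)
```

There is no index behind these lookups: every query walks the parsed tree, which is one of the gaps compared with a real database.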

Origin
• XML and its related technologies are developed and approved by the W3C.
• XML 1.0 was published as a W3C Proposed Recommendation in December 1997 and became a W3C Recommendation in February 1998.
• SGML (Standard Generalized Markup Language, an ISO standard that grew out of IBM's GML) was an earlier language used to describe data.
• XML is the successor to SGML, simplified and adapted to the Internet. XML is a subset of SGML. (HTML was also derived from SGML.)
• XML has been used to define the successor of HTML, called XHTML.
• SGML and XML are meta-languages.

XML applications
a) WML (Wireless Markup Language), MathML (Mathematical Markup Language)
b) All deployment descriptors in the J2EE specification use XML
c) JSP (yes, JSPs can be coded as XML)

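XML structure

<?xml version="1.0"?>
<note>
  <to style="bold">Harry Potter </to>
  <from>Ron</from>
  <heading>Reminder</heading>
  <horizontal_line/>
  <body>Please, get your magic wand.</body>
  <!--letter format-->
  &quot;
</note>

The parts of this document:
• Processing instructions: <?xml version="1.0"?> (strictly, this one is the XML declaration, which uses the same <? ... ?> syntax)
• Root element: <note>
• Elements: <to>, <from>, <heading>, <body>
• Attributes: style="bold"
• Empty elements: <horizontal_line/>
• Comment: <!--letter format-->
• Entity references: &quot;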

• There are 5 predefined entities:

&lt; (<), &gt; (>), &quot; ("), &apos; ('), &amp; (&)
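A short sketch of how the five predefined entities are used in practice: text is escaped before it is placed in a document, and the parser turns the references back into the original characters. Python's standard xml.sax.saxutils and xml.etree.ElementTree modules are an assumption here, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Tom & Jerry's <\"show\">"

# escape() handles &, < and > by default; the two quote characters are
# passed in explicitly so that all five predefined entities appear.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# The parser replaces each entity reference with the character it stands for.
element = ET.fromstring("<title>" + escaped + "</title>")
round_trip = element.text

print(escaped)
print(round_trip)
```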

Redefining XML
• An XML document is an information unit that can be viewed in two ways: as a linear sequence of characters that contains character data, markup and entity references, or as an abstract data structure that is a tree of nodes.
• Entity references are replaced by whatever they refer to (their referents).
• Element names must not begin with a digit and cannot contain spaces.

Well-formed Constraints
• All XML elements must be properly nested.
• XML tags are case sensitive. The case of the start tag and its corresponding end tag must match.
• All XML documents must have one and only one root element.
• All elements other than the root must have one and only one parent.
• Attribute values must always be quoted.
• Empty-element tags must end with '/>'.
• An XML document that conforms to the above rules is called a "well-formed" XML document.
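The rules above can be tried out with any non-validating parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which raises ParseError on input that is not well-formed. The helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One sample per rule; only the first one obeys all of them.
samples = {
    "properly nested": "<a><b></b></a>",
    "badly nested": "<a><b></a></b>",
    "case mismatch": "<Note></note>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a x=1/>",
    "unclosed empty tag": "<a><br></a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

This checks well-formedness only. Validity against a DTD or schema needs a validating parser.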

DTD
• DTD stands for Document Type Definition.
• The DTD defines the structure of the XML document and how content is nested.
• An XML document is valid only when it is well-formed and conforms to the DTD (or XML Schema) defined for it.
• The DTD defines the grammar rules for forming an XML document.

Two ways to link a DTD with XML
• The XML document itself contains an internal DTD.
• Create a separate file with any name and the extension .dtd, and link it to the XML file like this:

<!DOCTYPE note SYSTEM "x.dtd">
or
<!DOCTYPE note SYSTEM "http://ser/x.dtd">

In both forms, the name after DOCTYPE (here note) is the root element.

DTD components
• Declarations:
<!DOCTYPE[…]>
<!ELEMENT ...>
<!ATTLIST ...>
<!ENTITY ...>

DOCTYPE:
<!DOCTYPE priceList[..]>
Here priceList is the root element.

ELEMENT Declaration
• Text only: <!ELEMENT name (#PCDATA)>
• Element only: <!ELEMENT name (child1,child2)>
• Mixed: <!ELEMENT name (#PCDATA|child1|child2)*>
• Anything: <!ELEMENT name ANY>
• Empty: <!ELEMENT name EMPTY>

Cardinality
• none: the absence of a cardinality operator indicates one and only one
• *: 0 or more
• +: 1 or more
• ?: 0 or 1

Sequence list: comma-separated list
Choice list: items separated by |

Example:

<?xml version="1.0" ?>
<!DOCTYPE priceList [
<!ELEMENT priceList (coffee)+>
<!ELEMENT coffee (name, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<priceList>
  <coffee>
    <name>Mocha Java</name>
    <price>11.95</price>
  </coffee>
</priceList>

Embedding a DTD in XML: the DTD sits inside the DOCTYPE declaration of the document itself.
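A small sketch of how a program can see the embedded DOCTYPE, assuming Python's standard xml.dom.minidom module. minidom is a non-validating parser, so it reads the DOCTYPE but does not check the document against the ELEMENT declarations. The sketch only confirms that the name in the DOCTYPE matches the root element.

```python
from xml.dom import minidom

# The price list with its internal DTD, as in the example above.
DOC = """<?xml version="1.0" ?>
<!DOCTYPE priceList [
<!ELEMENT priceList (coffee)+>
<!ELEMENT coffee (name, price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<priceList>
  <coffee>
    <name>Mocha Java</name>
    <price>11.95</price>
  </coffee>
</priceList>"""

dom = minidom.parseString(DOC)

doctype_name = dom.doctype.name          # name written after <!DOCTYPE
root_name = dom.documentElement.tagName  # actual root element of the document
coffee_count = len(dom.getElementsByTagName("coffee"))

print(doctype_name, root_name, coffee_count)
```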

Attribute Declarations
Possible ways to declare attributes with the ATTLIST tag. Each line lists the attribute name, its type and its default:

<!ATTLIST review
  title     CDATA     #REQUIRED
  ISBN      ID        #REQUIRED
  Similar   IDREF     #IMPLIED
  libno     NMTOKEN   #IMPLIED
  authors   NMTOKENS  #REQUIRED
  hardbound (YES|NO)  "NO"
  source    CDATA     #FIXED "BOOK"
>

Types:
• CDATA: character data (any text in quotes).
• ID: text whose value must be unique in the document.
• IDREF: text that is equal to the value of an ID attribute of some element in the document.
• NMTOKEN: a name token, which cannot contain white space.
• NMTOKENS: a whitespace-separated list of NMTOKEN items.

Defaults:
• #REQUIRED: the attribute is required.
• #IMPLIED: the attribute is optional.
• #FIXED with a default value: the attribute must always have the default value.
• A quoted value on its own, such as "NO": the value used when the attribute is omitted.

Names that begin with 'xml' are reserved.

Example 1

<?xml version="1.0" ?>
<!DOCTYPE bookreview [
<!ELEMENT bookreview (review)+>
<!ELEMENT review (#PCDATA)>
<!ATTLIST review
  title     CDATA     #REQUIRED
  ISBN      ID        #REQUIRED
  Similar   IDREF     #IMPLIED
  libno     NMTOKEN   #IMPLIED
  authors   NMTOKENS  #REQUIRED
  hardbound (YES|NO)  "NO"
  source    CDATA     #FIXED "BOOK"
>
]>
<bookreview>
<review title="The 7 habits of highly effective people" ISBN="o-7434-0885-3" libno="a1" authors="StephenR.Covey">
Powerful lessons in personal change
</review>
<review title="Failing Forward" ISBN="o-81-7809-077-5" libno="a1" Similar="o-7434-0885-3" authors="JohnC.Maxwell" hardbound="YES">
Turning mistakes into stepping stones for success
</review>
</bookreview>
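The ID and IDREF rules can be illustrated with a hand-written check. This is only a sketch, assuming Python's standard xml.etree.ElementTree module. It is not a DTD validator, and the attribute names ISBN and Similar are hard-coded from the ATTLIST in the example rather than read from the DTD.

```python
import xml.etree.ElementTree as ET

# The two reviews from the example, without the DTD (this check is manual).
BOOK_REVIEWS = """<bookreview>
<review title="The 7 habits of highly effective people" ISBN="o-7434-0885-3"
        libno="a1" authors="StephenR.Covey">Powerful lessons in personal change</review>
<review title="Failing Forward" ISBN="o-81-7809-077-5" libno="a1"
        Similar="o-7434-0885-3" authors="JohnC.Maxwell"
        hardbound="YES">Turning mistakes into stepping stones for success</review>
</bookreview>"""

root = ET.fromstring(BOOK_REVIEWS)
reviews = root.findall("review")

# ID rule: every ISBN value must be unique within the document.
isbns = [review.get("ISBN") for review in reviews]
ids_are_unique = len(isbns) == len(set(isbns))

# IDREF rule: every Similar value must equal the ID of some element.
dangling_refs = [
    review.get("Similar")
    for review in reviews
    if review.get("Similar") is not None and review.get("Similar") not in isbns
]

print(ids_are_unique, dangling_refs)
```

A validating parser performs these checks itself once the attributes are declared as ID and IDREF.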

Entity declaration
• All entities except the predefined entities need to be declared.
• Two types of entities:
a) Parameter entity: an entity that is referenced within the DTD.
b) General entity: an entity that is referenced within the XML document.

In the DTD:
<!ENTITY GenEntity "xml doc entity">
<!ENTITY % ParEntity "dtd doc entity">
%ParEntity; (used in the DTD, gets replaced by its value)

In the XML:
&GenEntity; (gets replaced by its value)

• Parameter entity references are recognized only inside the DTD, so they cannot be used in the XML document. It is not an error, however, to put a general entity reference in the DTD when defining the value of another entity. The reference is not resolved until the entity is used in the document.

• Example:

In the DTD:
<!ENTITY rights "All rights reserved">
<!ENTITY book "Designed by WW. &rights; ">

In the XML:
&book; expands to: Designed by WW. All rights reserved
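The nested expansion in this example can be observed with a parser that reads the internal DTD. The sketch below assumes Python's standard xml.etree.ElementTree module. The surrounding note element is made up for illustration, and the trailing space in the book entity is dropped to keep the result easy to read.

```python
import xml.etree.ElementTree as ET

# Two general entities declared in an internal DTD; "book" refers to "rights".
DOC = """<!DOCTYPE note [
<!ENTITY rights "All rights reserved">
<!ENTITY book "Designed by WW. &rights;">
]>
<note>&book;</note>"""

note = ET.fromstring(DOC)

# The parser resolves &book; and then the &rights; reference inside its value.
expanded = note.text
print(expanded)
```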

Drawbacks of DTDs
1. Their syntax is not XML, so XML parsers cannot easily parse them into component parts.
2. They have a very primitive system of data types.
3. They are not modular. It is not easy to reuse a DTD.
4. They are not easily extensible. (There is no inheritance in DTDs.)
DTDs are written in a notation based on EBNF (Extended Backus-Naur Form).
• XML Schemas are a replacement for DTDs and are intended to give XML all these features.

XML Parser
• XML parsers (processors) check whether an XML document is well-formed (non-validating parser) or valid (validating parser).
• XML parsing is required so that our application can inspect, retrieve and modify the document contents. The XML parser sits between the XML document and our application.

Examples of freely available validating XML parsers:
• Xerces-C (C++) and Xerces-J (Java) from Apache
• XML4C (C++) and XML4J (Java) from IBM
• The Oracle XML Parser that comes with Oracle 8i
• Sun has included the JAXP and JAXB APIs, which provide support for other parsers. Sun also has a validating parser called Java Project X TR2, which requires JDK 1.1.6.

XML Support
[Figure: an XML document is passed to the XML parser, which sits in front of the XML application. Labels: "xml doc", "return valid/invalid document".]

CDATA section

<?xml version="1.0"?>
<?noisemaker noise="sound.wav"?>
<note>
  <to style="bold">Harry Potter </to>
  <from>Ron</from>
  <body>Please, get your magic wand.</body>
  <java>
    <![CDATA[
    if(a>b && a<10) doThis(); ]]>
  </java>
</note>

A CDATA section holds character data that you don't want to be parsed.
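The effect of a CDATA section can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree module. The same Java fragment is parsed once inside a CDATA section and once as ordinary element content.

```python
import xml.etree.ElementTree as ET

WITH_CDATA = """<note>
  <body>Please, get your magic wand.</body>
  <java><![CDATA[if(a>b && a<10) doThis();]]></java>
</note>"""

# Same content without the CDATA wrapper: the bare & and < are read as markup.
WITHOUT_CDATA = """<note>
  <java>if(a>b && a<10) doThis();</java>
</note>"""

# Inside CDATA the characters are handed to the application untouched.
java_code = ET.fromstring(WITH_CDATA).findtext("java")

try:
    ET.fromstring(WITHOUT_CDATA)
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False

print(java_code)
print(parsed_without_cdata)
```

Escaping the characters as &amp; and &lt; would also work. CDATA simply keeps the embedded code readable.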

DOM and SAX Parsers

Parsing
• In an attempt to standardize the way parsers should work, two specifications have come out that spell out the interfaces an application can expect from a parser:
• SAX, the Simple API for XML: SAX processes the XML document a tag at a time and generates events.
• DOM, the Document Object Model: DOM describes the document as a data structure in the form of a tree. It first loads the entire XML document as a tree. The application can then traverse and edit any node.

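SAX programming model in Java
[Figure, reconstructed from its labels]
1. SAXParserFactory creates the SAXParser.
2. The XML source, the DTD (optional) and your class implementing the ContentHandler interface are input to the SAXParser.
• As it reads the document, the SAXParser raises events by calling handler methods on your class: startDocument, startElement, characters, endElement, endDocument, etc.
• Your class produces the output.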
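The event flow in this model can be sketched in a few lines. Python's standard xml.sax module is used here as a stand-in for the Java API (an assumption, chosen for brevity). Its ContentHandler has the same callback names as the ones listed in the figure. The class name EventLogger is made up for this example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback so that the order of events is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement " + name)

    def characters(self, content):
        # The parser may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append("characters " + content)

    def endElement(self, name):
        self.events.append("endElement " + name)

    def endDocument(self):
        self.events.append("endDocument")


PRICE_LIST = (
    b"<priceList><coffee><name>Mocha Java</name>"
    b"<price>11.95</price></coffee></priceList>"
)

logger = EventLogger()
xml.sax.parseString(PRICE_LIST, logger)  # the parser calls back into the handler

for event in logger.events:
    print(event)
```

The handler never sees the whole document at once, which is why SAX needs little memory but cannot revisit earlier nodes.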

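DOM programming model in Java
[Figure, reconstructed from its labels]
1. DocumentBuilderFactory creates the DocumentBuilder.
2. The XML source and the DTD (optional) are input to the DocumentBuilder.
3. The DocumentBuilder parses the input and builds the tree: a Document (DOM) made up of Node objects.
• Your class then works with the Document.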
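A comparable sketch for the DOM model, using Python's standard xml.dom.minidom module as a stand-in for the Java DocumentBuilder (an assumption, chosen for brevity). The whole document is loaded into a tree first, after which any node can be visited, changed or added.

```python
from xml.dom import minidom

PRICE_LIST = (
    "<priceList><coffee><name>Mocha Java</name>"
    "<price>11.95</price></coffee></priceList>"
)

# Parse and build the complete tree in memory.
document = minidom.parseString(PRICE_LIST)
root = document.documentElement

# Modify an existing node: change the text of the first <price>.
price = root.getElementsByTagName("price")[0]
price.firstChild.data = "12.95"

# Add a new node: a second <coffee> element with a <name> child.
coffee = document.createElement("coffee")
name = document.createElement("name")
name.appendChild(document.createTextNode("Espresso"))
coffee.appendChild(name)
root.appendChild(coffee)

# Random access: jump to any node from anywhere in the tree.
names = [node.firstChild.data for node in root.getElementsByTagName("name")]
serialized = root.toxml()

print(names)
print(serialized)
```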

SAX versus DOM

SAX
1. Fast, efficient for reading XML data
2. Less memory
3. Cannot go back to an earlier visited node or leap ahead to a different position.
4. Cannot be used to modify, add or delete nodes.
5. Not a W3C standard

DOM
1. Slow reading of XML data
2. More memory
3. Can go to any position from anywhere.
4. Can be used to modify, add or delete nodes.
5. W3C standard