29
XML Introduction XML Introduction

XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Embed Size (px)

Citation preview

Page 1: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML XML IntroductionIntroduction

Page 2: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

What is XML?What is XML? XML stands for eXtensible Markup XML stands for eXtensible Markup

LanguageLanguage XML is a markup language much like XML is a markup language much like

HTML that is used to describe dataHTML that is used to describe data XML tags are not predefined - you must XML tags are not predefined - you must

define your own tagsdefine your own tags XML uses a Document Type Definition XML uses a Document Type Definition

(DTD) or an XML Schema to describe the (DTD) or an XML Schema to describe the datadata

XML with a DTD or XML Schema is XML with a DTD or XML Schema is designed to be self-descriptivedesigned to be self-descriptive

Page 3: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML vs HTMLXML vs HTML

XML was designed to carry dataXML was designed to carry data HTML and XML were designed with HTML and XML were designed with

different goalsdifferent goals HTML is about displaying HTML is about displaying

information, while XML is about information, while XML is about describing informationdescribing information

Page 4: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML exampleXML example

<?xml version=“1.0” encoding=“ISO-8859-1”?> <?xml version=“1.0” encoding=“ISO-8859-1”?>

<note><note>

<to> Tom </to><to> Tom </to>

<from> Jane </from><from> Jane </from>

<heading> Reminder </heading><heading> Reminder </heading>

<body> Meeting </body><body> Meeting </body>

</note></note> The XML example was created to structure The XML example was created to structure

and store information. It does not DO and store information. It does not DO anything because someone must write anything because someone must write programs to send, receive or display it.programs to send, receive or display it.

Page 5: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML example (cont.)XML example (cont.)

New tags and the corresponding New tags and the corresponding document structure are “invented” document structure are “invented” in our examplein our example

Page 6: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

How XML complements How XML complements HTMLHTML

XML separates Data from HTMLXML separates Data from HTML Data can be exchanged between Data can be exchanged between

incompatible systemsincompatible systems With XML, plain text files can be With XML, plain text files can be

used to share dataused to share data XML can also be used to create new XML can also be used to create new

languages, such as the Wireless languages, such as the Wireless Markup Language (WML)Markup Language (WML)

Page 7: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML ElementsXML Elements

<?xml version=“1.0” encoding=“ISO-8859-1”?> <?xml version=“1.0” encoding=“ISO-8859-1”?>

<note><note>

<to> Tom </to><to> Tom </to>

<from> Jane </from><from> Jane </from>

<heading> Reminder </heading><heading> Reminder </heading>

<body> Meeting </body><body> Meeting </body>

</note></note> The first line in the document – the XML The first line in the document – the XML

declaration – defines the XML version and declaration – defines the XML version and character encoding used in the documentcharacter encoding used in the document

Page 8: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML Elements (cont.)XML Elements (cont.) In the example, the document conforms In the example, the document conforms

to the 1.0 specification of XML and uses to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) the ISO-8859-1 (Latin-1/West European) character setcharacter set

The next line describes the root element The next line describes the root element of the document <note>of the document <note>

The next 4 lines describe 4 child The next 4 lines describe 4 child elements of the root: <to>, <from>, elements of the root: <to>, <from>, <heading> and <body><heading> and <body>

And finally the last line ends the root And finally the last line ends the root element </note>element </note>

Page 9: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML Elements (cont.)XML Elements (cont.) All XML elements must have a closing tagAll XML elements must have a closing tag In HTML, some elements do not have to have In HTML, some elements do not have to have

a closing tag, for example:a closing tag, for example:<p> This is a paragraph<p> This is a paragraph

XML tags are case sensitive, whereas HTML XML tags are case sensitive, whereas HTML tags are case insensitivetags are case insensitive

All XML elements must be properly nested All XML elements must be properly nested and improper nesting of tags makes no sense and improper nesting of tags makes no sense to XML.to XML.<b><i>This is bold and italic</b></i> <b><i>This is bold and italic</b></i> - -

IncorrectIncorrect<b><i>This is bold and italic</i></b> <b><i>This is bold and italic</i></b> - Correct- Correct

Page 10: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML Elements (cont.)XML Elements (cont.)

All XML documents must have a root All XML documents must have a root elementelement

All other elements are within this root All other elements are within this root elementelement

All elements can have sub elements All elements can have sub elements (child elements) that are properly (child elements) that are properly nested within their parent element:nested within their parent element:<root><root>

<child><child><subchild>….</subchild><subchild>….</subchild>

</child></child></root></root>

Page 11: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML Elements (cont.)XML Elements (cont.) XML elements can have attributes in XML elements can have attributes in

name/value pairsname/value pairs The attribute value must always be The attribute value must always be

quotedquoted<?xml version=“1.0” encoding=“ISO-8859-<?xml version=“1.0” encoding=“ISO-8859-

1”?>1”?>

<note date=“6/1/2004”><note date=“6/1/2004”>

<to>Tom</to><to>Tom</to>

</note></note> The syntax for writing comments in XML The syntax for writing comments in XML

is similar to that of HTML: is similar to that of HTML:

<!-- this is a comment --><!-- this is a comment -->

Page 12: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML Elements (cont.)XML Elements (cont.) XML elements have relationshipsXML elements have relationships For example, a hierarchy such as:For example, a hierarchy such as:

My First XMLMy First XMLIntroduction to XMLIntroduction to XML

What is HTMLWhat is HTML

What is XMLWhat is XML

XML SyntaxXML SyntaxElements must have a closing tagElements must have a closing tag

Elements must be properly nestedElements must be properly nested

Page 13: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

XML Elements (cont.)XML Elements (cont.) Can be organized into:Can be organized into:<book><book>

<title>My First XML</title><title>My First XML</title><chapter>Introduction to XML<chapter>Introduction to XML

<para>What is HTML</para><para>What is HTML</para><para>What is XML</para><para>What is XML</para>

</chapter></chapter><chapter>XML Syntax<chapter>XML Syntax

<para>Elements must have a closing tag</para><para>Elements must have a closing tag</para><para>Elements must be properly nested</para><para>Elements must be properly nested</para>

</chapter></chapter>

</book></book>

Page 14: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Element NamingElement Naming

XML elements must follow these XML elements must follow these naming rules:naming rules: Names can contain letters, numbers, Names can contain letters, numbers,

and other charactersand other characters Names must not start with a number or Names must not start with a number or

punctuation characterpunctuation character Names must not start with the letters Names must not start with the letters

xml (or XML or Xml ..)xml (or XML or Xml ..) Names cannot contain spacesNames cannot contain spaces

Page 15: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Element Naming (cont.)Element Naming (cont.)

Non-English letters like Non-English letters like 轵 轵 are are perfectly legal in XML element perfectly legal in XML element names, but watch out for problems if names, but watch out for problems if the software vendor doesn't support the software vendor doesn't support themthem

The ":" should not be used in The ":" should not be used in element names because it is element names because it is reserved for reserved for namespaces namespaces (discussed (discussed later)later)

Page 16: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

"Well Formed" XML "Well Formed" XML documentsdocuments

XML with correct syntax is Well Formed XML with correct syntax is Well Formed XMLXML

XML validated against a Document XML validated against a Document Type Definition (DTD) is Valid XMLType Definition (DTD) is Valid XML

A DTD defines the legal elements of an A DTD defines the legal elements of an XML documentXML document

A Schema is an alternative to a DTD but A Schema is an alternative to a DTD but we are not going to cover it because we are not going to cover it because Greenstone does not use XML SchemaGreenstone does not use XML Schema

Page 17: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Introduction to DTDIntroduction to DTD

The purpose of a Document Type Definition is The purpose of a Document Type Definition is to define the legal building blocks of an XML to define the legal building blocks of an XML documentdocument

A DTD defines the document structure with a A DTD defines the document structure with a list of legal elementslist of legal elements

A DTD can be declared inline in your XML A DTD can be declared inline in your XML document or as an external referencedocument or as an external reference

If a DTD is included in your XML document, it If a DTD is included in your XML document, it should be wrapped in a DOCTYPE definition:should be wrapped in a DOCTYPE definition:<!DOCTYPE root-element [element-declarations]> - <!DOCTYPE root-element [element-declarations]> -

InlineInline

<!DOCTYPE note SYSTEM “Outside.dtd"> - External <!DOCTYPE note SYSTEM “Outside.dtd"> - External Reference Reference

Page 18: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Introduction to DTD Introduction to DTD (cont.)(cont.) An Inline DTD example:An Inline DTD example:

1: <?xml version="1.0"?>1: <?xml version="1.0"?>2: 2: <!DOCTYPE note [<!DOCTYPE note [3: <!ELEMENT note (to,from,heading,body)>3: <!ELEMENT note (to,from,heading,body)>4: 4: <!ELEMENT to (#PCDATA)> <!ELEMENT to (#PCDATA)>5: 5: <!ELEMENT from (#PCDATA)> <!ELEMENT from (#PCDATA)>6: 6: <!ELEMENT heading (#PCDATA)> <!ELEMENT heading (#PCDATA)>7: 7: <!ELEMENT body (#PCDATA)> <!ELEMENT body (#PCDATA)>8: 8: ]>]>9: 9: <note><note>

10: 10: <to> Tom </to> <to> Tom </to>11: <from> Jane </from>11: <from> Jane </from>12: <heading>Reminder</heading>12: <heading>Reminder</heading>13: <body>Don't forget me this 13: <body>Don't forget me this weekend!</body>weekend!</body>

14:14: </note></note>

Page 19: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Introduction to DTD Introduction to DTD (cont.)(cont.)

!DOCTYPE note (in line 2) defines !DOCTYPE note (in line 2) defines that this is a document of type notethat this is a document of type note

!ELEMENT note (in line 3) defines !ELEMENT note (in line 3) defines the note element as having four the note element as having four elements: "to,from,heading,body"elements: "to,from,heading,body"

!ELEMENT to (in line 4) is defined to !ELEMENT to (in line 4) is defined to be of type "#PCDATA".be of type "#PCDATA".

!ELEMENT from (in line 5) is !ELEMENT from (in line 5) is defined to be of type "#PCDATA"defined to be of type "#PCDATA"

and so on.....and so on.....

Page 20: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Introduction to DTD Introduction to DTD (cont.)(cont.)

External Reference example:External Reference example:<?xml version="1.0"?><?xml version="1.0"?>

<!DOCTYPE note SYSTEM "note.dtd"><!DOCTYPE note SYSTEM "note.dtd"><note><note>

<to> Tom </to><to> Tom </to><from> Jane </from><from> Jane </from><heading>Reminder</heading><heading>Reminder</heading><body>Don't forget me this <body>Don't forget me this

weekend!</body>weekend!</body> </note> </note>

Page 21: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Introduction to DTD Introduction to DTD (cont.)(cont.)

Note.dtdNote.dtd<!ELEMENT note <!ELEMENT note

(to,from,heading,body)>(to,from,heading,body)>

<!ELEMENT to (#PCDATA)><!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)><!ELEMENT from (#PCDATA)>

<!ELEMENT heading (#PCDATA)><!ELEMENT heading (#PCDATA)>

<!ELEMENT body (#PCDATA)><!ELEMENT body (#PCDATA)>

Page 22: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Introduction to DTD Introduction to DTD (cont.)(cont.)

PCDATAPCDATA PCDATA is text that will be parsed by a PCDATA is text that will be parsed by a

parser parser Tags inside the text will be treated as Tags inside the text will be treated as

markup and entities will be expandedmarkup and entities will be expanded CDATACDATA

CDATA is text that will NOT be parsed by a CDATA is text that will NOT be parsed by a parser parser

Tags inside the text will NOT be treated as Tags inside the text will NOT be treated as markup and entities will not be expandedmarkup and entities will not be expanded

Page 23: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Declaring a DTD ElementDeclaring a DTD Element

In the DTD, XML elements are In the DTD, XML elements are declared with an element declared with an element declaration:declaration:

<!ELEMENT element-name category><!ELEMENT element-name category>

oror<!ELEMENT element-name (element-<!ELEMENT element-name (element-

content)>content)>

There are various types of elements There are various types of elements that we can declare in a DTDthat we can declare in a DTD

Page 24: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Types of DTD elementsTypes of DTD elements Empty ElementsEmpty Elements

<!ELEMENT element-name EMPTY><!ELEMENT element-name EMPTY> Elements with only character dataElements with only character data

<!ELEMENT element-name (#PCDATA)><!ELEMENT element-name (#PCDATA)> Elements with any contentsElements with any contents

<!ELEMENT element-name ANY><!ELEMENT element-name ANY> Elements with children (sequences)Elements with children (sequences)

<!ELEMENT element-name (child-element-<!ELEMENT element-name (child-element-name, …)>name, …)>

Note that when children are declared in a Note that when children are declared in a sequence separated by commas, the children sequence separated by commas, the children must appear in the same sequence in the must appear in the same sequence in the document.document.

Page 25: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Types of DTD elements Types of DTD elements (cont.)(cont.)

Declaring only one occurrence of the same Declaring only one occurrence of the same element element <!ELEMENT element-name (child-name)><!ELEMENT element-name (child-name)>

Declaring minimum one occurrence of the Declaring minimum one occurrence of the same elementsame element <!ELEMENT element-name (child-name+)><!ELEMENT element-name (child-name+)>

Declaring zero or more occurrences of the Declaring zero or more occurrences of the same element same element <!ELEMENT element-name (child-name*)><!ELEMENT element-name (child-name*)>

Declaring zero or one occurrences of the Declaring zero or one occurrences of the same element same element <!ELEMENT element-name (child-name?)><!ELEMENT element-name (child-name?)>

Page 26: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Types of DTD elements Types of DTD elements (cont.)(cont.)

Declaring either/or contentDeclaring either/or content <!ELEMENT note (to,from,header,<!ELEMENT note (to,from,header,

(message|body))>(message|body))> Declaring mixed contentDeclaring mixed content

<!ELEMENT note (#PCDATA|to|from|<!ELEMENT note (#PCDATA|to|from|header|message)*>header|message)*>

DTD attributes are declared with an DTD attributes are declared with an ATTLIST declaration - not covered ATTLIST declaration - not covered because Greenstone does not use because Greenstone does not use themthem

Page 27: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

DTD ValidationDTD Validation

Internet Explorer 5.0 and above can Internet Explorer 5.0 and above can validate your XML against a DTDvalidate your XML against a DTD

An ill-formed or illegal XML An ill-formed or illegal XML document will generate an errordocument will generate an error

Page 28: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

Why use a DTD?Why use a DTD?

Each XML file can carry a description of Each XML file can carry a description of its own format with it its own format with it

With a DTD, independent groups of With a DTD, independent groups of people can agree to use a common DTD people can agree to use a common DTD for interchanging data for interchanging data

A standard DTD can be used to verify that A standard DTD can be used to verify that the data one receives from the outside the data one receives from the outside world is valid world is valid

We can also use a DTD to verify our own We can also use a DTD to verify our own datadata

Page 29: XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like

ReferencesReferences

DTD Tutorial DTD Tutorial http://www.w3schools.com/dtd/default.http://www.w3schools.com/dtd/default.aspasp

XML TutorialXML Tutorial

http://http://www.xmlfiles.comwww.xmlfiles.com/xml//xml/

http://www.w3schools.com/xml/default.http://www.w3schools.com/xml/default.aspasp

HTML TutorialHTML Tutorial

http://www.w3schools.com/html/defaulthttp://www.w3schools.com/html/default.asp.asp