36
University of Nottingham ool of Computer Science & Information Technology Introduction to XML 1. The XML Language Tim Brailsford

University of Nottingham School of Computer Science & Information Technology Introduction to XML 1. The XML Language Tim Brailsford

Embed Size (px)

Citation preview

University of Nottingham

School of Computer Science & Information Technology

Introduction to XML1. The XML Language

Tim Brailsford

2

University of Nottingham

Markup Languages The word “Markup” is derived from the printing

industry Detailed stylistic instructions for typesetting Usually hand-written on the copy (eg underlining some text

that is to be set in italics). Markup languages do the same job for computerised

documentation systems. Markup adds logical structure to a document, or

indicates how it is to be laid out (on paper or screen). Markup languages are a set of instructions that are

amenable to automatic processing.

3

University of Nottingham

Markup Languages (cont.) Usually a sequence of characters in a text file that

indicate structure or behaviour of the content. For example (in HTML)

This is <B>bold</B> and this is <I>italic</I> <TITLE>This is the title.</TITLE>

Markup may be created by directly editing the symbols, but is more usually hidden from end-users.

Examples HTML RTF Hytime

4

University of Nottingham

Generalised Markup Languages Proprietary markup languages are problematic.

Generalised markup languages are langauges for defining markup languages.

Metalanguages

SGML

5

University of Nottingham

SGML - History Standard Generalised Markup Language 1969 - GML from IBM

text editing formatting information retrieval

1980 SGML first published 1980’s SGML adopted by US IRS & DOD 1986 - ISO standard

ISO 8879: Information processing--Text and office systems--Standard Generalized Markup Language (SGML), ([Geneva]: ISO, 1986).

6

University of Nottingham

SGML SGML defines a system of tag markup

<TAG>This is a pair of SGML tags</TAG> SGML is a standard for how to specify a tag set. Document Type Definition (DTD) SGML documents contain structural elements that

can be described without consideration of how they are displayed.

SGML application. HTML is an SGML application.

7

University of Nottingham

Benefits of SGML Documents are created by thinking in terms of

structure rather than appearance (which may change over time).

Documents are portable because any SGML compliant software can interpret them by reference to the DTD.

Documents originally intended for one medium can easily be re-purposed for other media, such as the computer display screen.

8

University of Nottingham

What is XML? XML is based upon SGML, but is substantially

simplified for use on the WWW. Like SGML, XML is a metalanguage

arbitrary definition of elements <TITLE> <PARAGRAPH> <ChapterHeading> <PRICE>

<PARTNUMBER> <MANUFACTUER> <ExamGrade>

Syntax may optionally be described by a DTD Valid documents - have a DTD Well formed documents do not have a DTD

Style and content are completely separate XML documents contain content Style is specified by stylesheets

9

University of Nottingham

Example XML Applications MathML - maths CML - chemistry SVG - vector graphics XHTML - WWW SMIL - synchronised multimedia MusicML - sheet music FpML - financial products RETML - real estate transactions and many, many others

10

University of Nottingham

XML Elements XML documents consist of one or more elements. Elements consist of a pair of tags and (optionally) enclosed text.

<TITLE>The XML Companion</TITLE>

Elements may have attributes.

<TITLE type=“book”>The XML Companion</TITLE>

Elements may contain other elements.

<REFERENCE> <TITLE type=“book”>The XML Companion</TITLE></REFERENCE>

Empty elements may be self closing.

<PICTURE src=“mypic.jpg”> </PICTURE>

<PICTURE src=“mypic.jpg” />

11

University of Nottingham

Contents vs Style XML tags contain meaning not appearance. This allows extra information to be extracted Consider the example of the scientific names of

animals. scientific names are in latin by convention they are always printed in italics

The scientific name of the domestic dogis Canis familiaris, and of the domestic cat is Felis catus.

12

University of Nottingham

Contents vs Style XML tags contain meaning not appearance. This allows extra information to be extracted Consider the example of the scientific names of

animals. scientific names are in latin by convention they are always printed in italics

The scientific name of the domestic dogis Canis familiaris, and of the domestic cat is Felis catus.

In HTML

<P>The <I>scientific</I> name of the domestic dog is <I>Canis familiaris</I>, and of the domestic cat is <I>Felis catus.</I></P>

NB there is no distinction between scientific names and emphasis.

In HTML

<P>The <I>scientific</I> name of the domestic dog is <I>Canis familiaris</I>, and of the domestic cat is <I>Felis catus.</I></P>

NB there is no distinction between scientific names and emphasis.

13

University of Nottingham

Contents vs Style XML tags contain meaning not appearance. This allows extra information to be extracted Consider the example of the scientific names of

animals. scientific names are in latin by convention they are always printed in italics

The scientific name of the domestic dogis Canis familiaris, and of the domestic cat is Felis catus.

In XML

<P>The <emph>scientific</emph> name of the domestic dog is <sci>Canis familiaris</sci>, and of the domestic cat is <sci>Felis catus.</sci></P>

NB emphasis and scientific names are different tags. They may both be displayed as italic, but they can be treated separately.

In XML

<P>The <emph>scientific</emph> name of the domestic dog is <sci>Canis familiaris</sci>, and of the domestic cat is <sci>Felis catus.</sci></P>

NB emphasis and scientific names are different tags. They may both be displayed as italic, but they can be treated separately.

14

University of Nottingham

Rendering of XML XML files contain content not appearance Stylesheets contain appearance and behaviour XML data is rendered by being transformed into

some form suitable for display RTF (for simple printing) PDF or PostScript (for printing or display) HTML (for display over the web) HTML 4.0 / DHTML (for complex interfaces)

The transformation is defined by a stylesheet Rendering may be done by standalone software, or

by a web browser, or on a web server.

15

University of Nottingham

Standalone Rendering

HTMLXML

XSL

16

University of Nottingham

Client Side Rendering

XML

XSLHTML

Server (any) Browser with XSL engine(eg MS IE > 5.0)

XML

XSL

17

University of Nottingham

Server Side Rendering

HTML

Browser (any)Server with XSL engineeg Apache/Tomcat/Cocoon

XML

XSLHTML

18

University of Nottingham

Client vs Server Stylesheets Client side stylesheets are processed in client XML is delivered to the client

XSL/CSS must be supported by client MS IE supports CSS & XSLT

(non-standard in 5.x mostly standard in 6.x) Netscape 7 & Mozilla supports CSS and possibly XSLT via plugins.

Server side stylesheets are processed in server XML is not delivered to the client, it is transformed usually

to HTML or PDF XSL/CSS must be supported by server Cocoon is an Open Source project, implementing XSL as a Java servlet Any browser can then be used

19

University of Nottingham

Cocoon on Nottingham Servers Any file placed in a directory called public_html

is accessible within the Nottingham network with the url:http://www.cs.nott.ac.uk/~username/filename

Files with the .xml extension are automatically processed by cocoon.

Providing that they have an XSL stylesheet and the correct Cocoon processing instructions they will be “transformed” into (usually) HTML.

20

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?>

<booklist title="Some XML Books">

</booklist>

21

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?>

<booklist title="Some XML Books">

</booklist>

Root element (one per document)

XML declaration

22

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" >

<booklist title="Some XML Books">

</booklist>

23

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" >

<booklist title="Some XML Books">

</booklist>

Define root element and specify DTD.

24

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" ><!-- This is a comment -->

<booklist title="Some XML Books">

</booklist>

25

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" ><!-- This is a comment -->

<booklist title="Some XML Books">

</booklist> This is a comment (as SGML / HTML)

26

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" ><!-- This is a comment --><?xml-stylesheet type="text/xsl" href=”iti-xml2.xsl"?>

<booklist title="Some XML Books">

</booklist>

27

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" ><!-- This is a comment --><?xml-stylesheet type="text/xsl" href=”iti-xml2.xsl"?>

<booklist title="Some XML Books">

</booklist> This defines the XSL stylesheet

28

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" ><!-- This is a comment --><?xml-stylesheet type="text/xsl" href="books3.xsl"?> <?cocoon-process type="xslt"?>

<booklist title="Some XML Books">

</booklist>

29

University of Nottingham

A Simple XML Document

<?xml version="1.0" ?><!DOCTYPE booklist SYSTEM "books.dtd" ><!-- This is a comment --><?xml-stylesheet type="text/xsl" href="books3.xsl"?> <?cocoon-process type="xslt"?>

<booklist title="Some XML Books">

</booklist>

This is a Cocoon processing directive (NB not standard XML, but required by Cocoon 1.7.4).

30

University of Nottingham

Adding Content

<booklist title="Some XML Books">

<book> <author>

<name>St. Laurent</name><initial>S</initial>

</author> <date>1998</date> <title edition="Second">XML: A Primer</title> <publisher>MIS Press</publisher> <website href="http://www.simonstl.com/xmlprim/" /> <rating stars="4"/> </book>

</booklist>

31

University of Nottingham

Benefits of a DTD DTDs are optional in XML

DTD allows validation of documents

DTD defines the application Vital for collaborative development IPR implications

DTD allows entity definitions (ie symbols, shortcuts, “foreign” characters etc.).

32

University of Nottingham

XML Namespaces Namespaces are mechanisms to ensure that

elements are unique Namespaces in XML are optional Consider the following:

<title>The Title</title>

<title text=“The Title” />

<title> <text>The Title</text></title>

33

University of Nottingham

Ensuring uniqueness Unique element names

Unique attribute content

<title-one>The Title</titleone>

<title-two text=“The Title” />

<title-three> <text>The Title</text></title-three>

<title ns=“one”>The Title</title>

<title ns=“two” text=“The Title” />

<title ns=“3”> <text>The Title</text></title>

34

University of Nottingham

xmlns attribute The xmlns attribute is used to declare

namespaces This must be a URI

<title xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-one”> The Title</title>

<title xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-two” text=“The Title” />

<title xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-three”> <text>The Title</text></title>

35

University of Nottingham

Namespace Abbreviations If an element doesn’t have a namespace

defined it inherits that of its parent.

Where multiple namespaces are used together aliases may be declared

<demo xmlns=“http://www.cs.nott.ac.uk/~tjb/NSdemo-two” > <title text=“The Title” /></demo>

<demo xmlns:first =“http://www.cs.nott.ac.uk/~tjb/NSdemo-one” xmlns:second =“http://www.cs.nott.ac.uk/~tjb/NSdemo-two” > <first:title>The Title</first:title> <second:title text=“The Title” /></demo>

36

University of Nottingham

XML Namespaces Namespaces in XML are optional Namespaces ensure that elements are unique In different contexts a given tag might mean

different things - eg consider <BOOK> To me it might mean a book in a bibliography To a bookshop it might contain stock details To a travel agent it might contain information about flight

bookings!

Namespaces attach unique labels to a given tag set. URLs are usually used as namespace labels.