Introduction to XML February 07, 2002

From HTML to XML

As mentioned in previous classes, if you know HTML, then you already know XML… really!

In this class, we will look at the basic conventions of XML and you will see the ways in which they mirror those of HTML.

The Building Blocks of XML

XML uses the same building blocks that HTML does: elements, attributes, and values.

An XML element is the most basic unit of your document. It can contain practically anything else, including other elements and text.

An element is delimited by an opening tag (<…>) and a closing tag (</…>). The opening tag may or may not contain attributes and values; the closing tag never does.

XML Elements

In XML, an element is literally whatever you make it and ‘name’ it.

The name, which you invent yourself, should describe the element’s purpose and, in particular, its contents. For example:

<chaptertitle>The Beginning</chaptertitle>

By ‘marking up’ the text like this, you are providing additional information about the tagged text.

XML Elements: An Example

HTML: <H1>The Beginning</H1>

XML: <chaptertitle>The Beginning</chaptertitle>

The XML element you create provides metadata for the chosen text.

XML Attributes

XML attributes, which are contained within the element’s start tag, have quotation-mark delimited values that further describe the purpose and content of the particular tag.

e.g. <title language="english">The World</title>
or <title language="german">Die Welt</title>

An element can have as many attributes as needed, as long as each has a unique name.
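
As a small illustration of several attributes on one element, the fragment below gives <title> two attributes. The edition attribute and its value are hypothetical, invented here only to show that each attribute name appears once in the start tag.

```xml
<!-- Two attributes on one element; each attribute name is used only once. -->
<!-- "edition" is a hypothetical attribute added for illustration. -->
<title language="english" edition="2">The World</title>
```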

Writing XML Code

Writing XML code is almost identical to writing HTML code.

Like HTML, you can insert spaces and line breaks to make your code easier to view and edit. For example, <book>…<title>…<chapter>…<subheading>, etc. can be written as

<book>…
  <title>…
  <chapter>…
    <subheading>…

Rules for Writing XML

A ‘root’ element is required for each XML document
– Every XML document must contain one root element that contains all of the other elements in the document. The only pieces of XML allowed outside the root element are the XML declaration, a document type (DOCTYPE) declaration, comments, and processing instructions.
– You can think of it like a container for the content.

Closing tags are required for all elements
– Every element must have a closing tag. Empty elements should either use the all-in-one tag with a forward slash before the final ">" (e.g., <image/>) or use both opening and closing tags (<image></image>).
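
The two rules above can be sketched in one small document. This is only an illustration: the <chapter> content is placeholder text, and the element names are borrowed from the examples used elsewhere in this class.

```xml
<?xml version="1.0"?>
<!-- A comment is allowed outside the root element. -->
<book>
  <!-- <book> is the single root element: everything else sits inside it. -->
  <chapter>Some text</chapter>   <!-- opening tag and closing tag -->
  <image/>                       <!-- empty element, all-in-one form -->
  <image></image>                <!-- empty element, two-tag form -->
</book>
```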

Rules for XML… cont’d

Elements must be properly nested
– If you start element 1, then start element 2, you must first close element 2, and then element 1.
• e.g. <book><chapter>…</chapter></book>

Case matters
– XML is case sensitive. BOOK, Book, and book elements would all be considered different and unrelated!
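
A minimal sketch of both rules follows. The <section> element and all text content are placeholders chosen for illustration; the two counter-examples are shown inside comments so that the document itself stays well-formed.

```xml
<?xml version="1.0"?>
<book>
  <chapter>
    <section>Text</section>  <!-- section closes before chapter -->
  </chapter>                 <!-- chapter closes before book -->
  <Chapter>A different element from chapter, because the case differs.</Chapter>
</book>
<!-- Not well-formed: <book><chapter>Text</book></chapter> (tags overlap) -->
<!-- Not well-formed: <chapter>Text</Chapter> (case does not match) -->
```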

Rules for XML… cont’d

Values must be enclosed in quotation marks
– An attribute’s values (numbers, words, etc.) must always be enclosed in matching quotation marks, either double or single.
• e.g., language="english" or language='english', NOT language=english

Entity references must be declared
– Unlike HTML, any entity reference used in XML, apart from the five predefined ones (&lt; &gt; &quot; &apos; &amp;), must be declared in a DTD (document type definition) before being used.
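
As a rough sketch of what declaring an entity looks like, the document below declares one entity in a small DTD placed inside the document and then uses it. The entity name pub and its replacement text are hypothetical; the DTD syntax itself is covered later in this class.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book language CDATA #IMPLIED>
  <!-- "pub" is a hypothetical entity name declared here for illustration. -->
  <!ENTITY pub "Example Press">
]>
<!-- The attribute value is quoted; &pub; works only because it is declared above. -->
<book language="english">Published by &pub;.</book>
```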

Rules for XML… cont’d

Declare the XML version used
– Every time you create an XML document, you should declare the version of XML used to create it. (XML 1.0 recommends the declaration rather than strictly requiring it, but make a habit of always including it.) In this case, since there has only been one version of XML, your declaration will look like this:
• <?xml version="1.0"?>
– This should always be your first line of XML.
– The <? and ?> delimiters enclose processing instructions. In addition to declaring the XML version, processing instructions can specify style sheets, among other things.
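
For instance, a style sheet can be attached with the xml-stylesheet processing instruction defined by the W3C. The sketch below assumes a CSS file named class.css sits in the same directory as the document; the file name is only an example.

```xml
<?xml version="1.0"?>
<!-- Processing instruction pointing at a style sheet (class.css is assumed to exist). -->
<?xml-stylesheet type="text/css" href="class.css"?>
<book>All other XML and content here.</book>
```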

Writing XML

With your first line of XML being the version declaration, the first real XML that you will create is the root element.

The root element acts like a container for all the other elements and content in your document. You can liken it to the <HTML> tag used in HTML.

It is your first ‘structural’ statement.

Only the XML declaration, a DOCTYPE declaration, comments, and processing instructions should exist outside the root element.

Root elements: Examples

For example, if you were creating an XML version of a book, you might create the element <book>. The <book> root would then contain all other content, looking something like this:

<?xml version="1.0"?>
<book>
  All other XML and content here.
</book>

Root Element: Examples

Alternatively, if you were creating an XML version of a sheet of music, you might specify the root element as <opus3>, resulting in a structure like:

<?xml version="1.0"?>
<opus3>
  All other XML and content here.
</opus3>

Writing XML… cont’d

Think of the root element, then, as the largest unit of structure for your document.

You can then plan other lesser units to fit within the root. Using <book> as our example, one can imagine lesser structural units such as <chapter> elements and <section> elements, as well as presentation elements such as <chaptertitle> and <sectiontitle>.

Even more than with HTML, with XML it is important to plan ahead rather than trying to create elements on the fly.
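
One possible plan, sketched with the element names mentioned above. The nesting shown here is only one reasonable arrangement, and the text content is placeholder material.

```xml
<?xml version="1.0"?>
<book>                                    <!-- largest unit: the root -->
  <chapter>                               <!-- lesser structural unit -->
    <chaptertitle>Criticism</chaptertitle>
    <section>                             <!-- smaller unit inside the chapter -->
      <sectiontitle>Introduction</sectiontitle>
      Text of the first section.
    </section>
  </chapter>
</book>
```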

Writing XML… cont’d

Like SGML and HTML, XML also allows for comments within the XML code. Comments are useful annotations or instructions that you put in the code so that future users/designers, including yourself, can understand what you had originally intended.

Comments are created by using the <!-- and --> start and end delimiters, such as:

<!--The following section uses the "class.css" style sheet. You will need to ensure that "class.css" is in the proper directory.-->

Writing XML… cont’d

Writing Special Characters/Symbols
– Unlike HTML, which allows for a whole bunch of special characters delineated by an ampersand (&) and a semi-colon (;), such as "&amp;" for "&", XML predefines only five. All other special characters and symbols must be defined in your DTD (document type definition).
– The five special characters/symbols allowed in XML are:
• &lt; (<)
• &gt; (>)
• &quot; (")
• &apos; (')
• &amp; (&)
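
A short sketch using all five in running text. The <note> element is a hypothetical name chosen for this example; no DTD is needed because these five references are built in.

```xml
<?xml version="1.0"?>
<note>
  <!-- The parser reads the line below as: if a < b & b > c, then say "yes" or 'no' -->
  if a &lt; b &amp; b &gt; c, then say &quot;yes&quot; or &apos;no&apos;
</note>
```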

Writing XML… cont’d

Observing these few rules, you will be able to create your XML documents just as you would HTML documents.

Remember that XML requires you to plan ahead, particularly with defining elements (tags) and entities (such as special characters or repeated text).

Take a look at the examples that follow…

HTML vs. XML

<html>
<head>
  <title>Gerhardt Rudner’s The Fall</title>
</head>
<body>
  <h1>Gerhardt Rudner’s The Fall</h1>
  <h3>Criticism</h3>
  <h5>Introduction</h5>
  Gerhardt Rudner’s The Fall is considered by most to be one of the most influential books of 2001….
</body>
</html>

HTML vs. XML

<?xml version="1.0"?>
<book>
  <booktitle>Gerhardt Rudner’s The Fall</booktitle>
  <chapterone>
    <chapteronetitle>Criticism</chapteronetitle>
    <sectionone>
      <sectiontitle>Introduction</sectiontitle>
      Gerhardt Rudner’s The Fall is considered by most to be one of the most influential books of 2001….
    </sectionone>
  </chapterone>
</book>

Any Questions?

Valid and Well-Formed XML

You may have heard two adjectives bandied about by XML authors and technical writers: Valid and Well-Formed.

Both terms refer to the process of validating your XML document and require that your document meet certain standards. For those of you who have taken the Database class, this process is similar to the ‘ordered form’ requirements of databases.

Valid and Well-Formed… cont’d

Valid
– A valid document must have a DTD: a set of rules that define what tags can appear in the document and how they must nest within each other. The DTD must also declare all entities apart from those five special ones we looked at previously. Entities are reusable bits of data that can be used many times, but need be transmitted only once (more on this later).
– Thus, an XML document is valid when it conforms to the rules established in the DTD. That’s it!

Valid and Well-Formed…cont’d

Well-Formed
– A document that is well-formed is easy for a computer program to read and ready for network delivery.
– Specifically, well-formed documents must have these characteristics:
• All the beginning and end tags match up
• Empty tags use special XML syntax (e.g., <empty/>)
• All the attribute values are quoted (e.g., id="dog")
• All the entities (reusable text, special characters, etc.) are properly declared.
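
To make the difference between the two terms concrete, here is a sketch of a document that is well-formed but deliberately not valid. The <review> element is hypothetical, and the DTD syntax at the top is explained later in this class.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (booktitle)>
  <!ELEMENT booktitle (#PCDATA)>
]>
<book>
  <booktitle>The Fall</booktitle>
  <!-- Well-formed (tags match and nest properly) but not valid: -->
  <!-- the DTD above never declares a <review> element. -->
  <review>Influential.</review>
</book>
```

Removing the <review> element would make the same document both well-formed and valid.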

Creating DTDs

As mentioned, a document type definition, or DTD, specifies the valid syntax, structure, and format for defining your own markup language.

If you do not follow the rules in your DTD, your XML parser or browser will complain bitterly. A validating parser/browser cannot properly display and process an XML document that does not conform to its DTD.

Therefore, it is important that you gain a good understanding of DTDs and how they work!

Creating DTDs… cont’d

In your DTD, you will define the elements, attributes, and values to be used in your XML markup.

Internal or External DTDs
– For individual XML documents, it is often simplest to create the DTD within the XML document itself.
– However, if you want to use the DTD with a set of documents, to avoid duplication, you will want to create an external DTD.

Creating Internal DTDs

To create an Internal DTD:
– At the top of your document, after your XML declaration (i.e., <?xml version="1.0"?>), type: <!DOCTYPE root [ where root is the name of your root element
– For example, <!DOCTYPE book [
– Leave some space so that you will have room to put in your definitions of elements, attributes, and values
– Then type ]> to end your internal DTD

Creating Internal DTDs… cont’d

Example:

<?xml version="1.0"?>
<!DOCTYPE book [

]>
<book>….
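
With the space between the brackets filled in, a complete document might look like the sketch below. The two element definitions are a deliberately tiny, assumed DTD; how to write such definitions is covered in the following slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Definitions go between the square brackets. -->
  <!ELEMENT book (booktitle)>
  <!ELEMENT booktitle (#PCDATA)>
]>
<book>
  <booktitle>Gerhardt Rudner's The Fall</booktitle>
</book>
```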

Creating External DTDs

To write an External DTD:
– Create a new text file with any simple text editor
– Define the rules attributed to your elements, attributes, and values
– Save the file as text only with the ".dtd" extension
– There are certain conventions that you need to follow when naming your DTD…
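
As a rough idea of what such a file contains, here is a sketch of a very small book.dtd. The two definitions are assumed for illustration; note that an external DTD file holds only the definitions, with no <!DOCTYPE …[ ]> wrapper and no root element.

```xml
<!-- book.dtd: definitions only; no DOCTYPE wrapper and no document content here. -->
<!ELEMENT book (booktitle)>
<!ELEMENT booktitle (#PCDATA)>
```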

Creating External DTDs… cont’d

There are two kinds of external DTDs that you need to be aware of:
– Personal External DTDs - private DTDs created solely for your documents
– Public External DTDs - public DTDs created for use by anyone

Declaring personal DTDs

To declare a personal DTD, you must first add an attribute to your XML declaration and then specify where the parser/browser can find your DTD:
– In the XML version declaration, add the attribute standalone="no"
• e.g. <?xml version="1.0" standalone="no"?>
– Next, type <!DOCTYPE root where root is the name of your root element (e.g. <!DOCTYPE book)
– Add a space and type SYSTEM, to indicate a personal DTD on your system, then the absolute URL of your DTD file in quotation marks, followed by a closing >
• e.g. <!DOCTYPE book SYSTEM "http://is.dal.ca/~sboon/book.dtd">

XML personal DTD Declaration

In the end, it should look like this:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE book SYSTEM "http://is.dal.ca/~sboon/book.dtd">
<book>….

Creating External DTDs… cont’d

Naming public External DTDs:
– You should name your public DTD using a formal public identifier or FPI. To create an FPI:
• First, type “+” if your DTD is approved by the ISO, or “-” if it is not a recognized standard
• Next, type “//yourname//DTD”, where yourname is the name of the individual or organization who created the DTD
• Type a space and then a label (often the root element) where the label describes the DTD.
• Finally, type “//xx//” where xx is the two letter abbreviation for the language of the XML document (e.g., EN for English, FR for French)

Declaring Public DTDs

To declare a public DTD, you must first add an attribute to your XML declaration and then specify where the parser/browser can find your DTD:
– In the XML version declaration, add the attribute standalone="no"
• e.g. <?xml version="1.0" standalone="no"?>
– Next, type <!DOCTYPE root where root is the name of your root element (e.g. <!DOCTYPE book)
– Add a space and type PUBLIC, to indicate a public DTD, and then your FPI in double-quotes
• e.g. <!DOCTYPE book PUBLIC "-//Sven_Svenson//DTD book//EN//"
– Then add another space and the absolute URL of your DTD file in quotation marks, followed by a closing >.

XML Public DTD Declaration

Your final public external DTD declaration should look something like:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE book PUBLIC "-//Stuart_Boon//DTD book//EN//" "http://is.dal.ca/~sboon/book.dtd">
<book>….

Defining Elements and Attributes

A DTD must define rules for each and every element and attribute that will appear in your XML document. Otherwise, the document will not be valid.

Whenever you change your XML, remember to make the corresponding changes to your DTD, particularly if you are adding elements or attributes to your document.

Defining Elements

In order to define your XML markup, you must first define the content and structure of each element contained within your XML documents.

To define an element, type <!ELEMENT yourtag where yourtag is the tag you are creating and wish to define.

Next, type (contents) where contents describes the elements contained within the element you are defining, or type EMPTY if the element you are defining has no content. Then type > to complete the definition.

Keep in mind that you must not forget the parentheses around contents and that EMPTY elements will often contain attributes.

Examples

Defining our root element
– <!ELEMENT book (chapterone, chaptertwo,…, sectionone, sectiontwo…, etc., etc.)>

Defining an image element
– <!ELEMENT image EMPTY>
– Empty elements are often used to reference external files (such as images) and binary data

Always remember that XML is case-sensitive!

Structural Definitions

Definitions like the root definition on the previous slide describe structures within your XML document. That is, we define our root element as containing a number of other elements.

Elements can be defined so as to contain only one other element (e.g., <!ELEMENT dog (dogname)> ) or a sequence of elements (such as our root example).

These definitions define how the structure of your XML breaks down, forming a hierarchy or tree pattern.

Text Definitions

However, not all elements contain structural information. Many will contain only content or textual information.

To define an element to contain text:
– Type <!ELEMENT yourtag where yourtag is the tag you are creating and wish to define.
– Next, type a space and (#PCDATA)>
– This states that the element you define will only contain text
– PCDATA stands for parsed character data and refers to the text content of your document, i.e., everything except the markup itself.

Defining Elements… cont’d

So, for example, your DTD will contain structural elements, such as your root element, which describes what other elements are contained within it, as well as textual elements that contain only text:

<!ELEMENT book (chapterone, chaptertwo,…, sectionone, sectiontwo…, etc., etc.)>

<!ELEMENT booktitle (#PCDATA)>….
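
Put together, a small but complete document with both kinds of definitions might look like the sketch below. It reuses element names from the earlier book example; the content model for <book> is shortened to two children purely to keep the example brief.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Structural elements: they contain other elements. -->
  <!ELEMENT book (booktitle, chapterone)>
  <!ELEMENT chapterone (chapteronetitle)>
  <!-- Text elements: they contain only text. -->
  <!ELEMENT booktitle (#PCDATA)>
  <!ELEMENT chapteronetitle (#PCDATA)>
]>
<book>
  <booktitle>Gerhardt Rudner's The Fall</booktitle>
  <chapterone>
    <chapteronetitle>Criticism</chapteronetitle>
  </chapterone>
</book>
```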

Defining Elements… cont’d

To place further constraints on the number of times that a given element can appear in your document (e.g., you don’t want 2 book titles), XML provides three special symbols: ? + *
– Placing a “?” after an element indicates that this element can appear only once, if at all, in your document
– Placing a “+” after an element indicates that the element must appear at least once, and can appear as many times as needed
– Placing an “*” after an element indicates that the element can appear as many times as needed, or not at all. Furthermore, adding an asterisk to a parenthesized group of choices separated by vertical bars, e.g. (a | b)*, means that the elements can appear in any number and in any order. Adding it to a comma-separated sequence, e.g. (a, b)*, means that the whole sequence can repeat, but always in the stated order.

Number constraints example

If we take our book element example, we can augment it like this:

<!ELEMENT book (booktitle?, chapter+, chaptertitle*,…, etc., etc.)>

• Here we limit <booktitle> so that it can appear at most once, indicate that a book must have at least one <chapter>, and allow a book to contain as many <chaptertitle>s as necessary (or none).
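
A complete, runnable variation on this idea is sketched below. The <appendix> element is hypothetical and stands in for the elided parts of the definition above; each child is defined as plain text only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (booktitle?, chapter+, appendix*)>
  <!ELEMENT booktitle (#PCDATA)>
  <!ELEMENT chapter (#PCDATA)>
  <!ELEMENT appendix (#PCDATA)>
]>
<book>
  <booktitle>The Fall</booktitle>   <!-- ? : zero or one -->
  <chapter>Criticism</chapter>      <!-- + : one or more -->
  <chapter>Reception</chapter>
  <!-- * : zero or more; there is no <appendix> here, which is allowed -->
</book>
```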

Creating attributes

While you can break down an element into smaller and smaller units of information, it is sometimes more useful to add supplementary data to the element itself rather than to the element’s content.

In other words, information contained in attributes tends to be about your XML document, rather than your content. They are primarily metadata.

Attributes are very commonly used with empty elements to point or link to the content of the element.

Attributes… cont’d

To define attributes:
– Type <!ATTLIST yourtag where yourtag is the name of the element in which the attribute will appear.
– Type the name of the attribute
– Then, either type CDATA (not #PCDATA) for any combination of numbers or text (basically for anything), or type (value1 | value2 | etc.) where one of value1, value2 (etc.) is the ONLY value acceptable. You can make the list of values as long as you need by continuing to place a vertical bar between values.

Defining Attributes… cont’d

Finally, you must type one of the following:
– "value" where value will be the default value if none is explicitly set
– #FIXED "value" where value is the default and ONLY value for that attribute (i.e., it is fixed)
– #REQUIRED to specify that the attribute must contain some (not pre-specified) value
– Or, #IMPLIED to specify that there is no default value, and the value may be omitted if desired.

Finish the <!ATTLIST with a > to complete your definition
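
The four endings can be seen side by side in the sketch below. All four attribute names, the value list, and the isbn value are hypothetical placeholders invented for this example.

```xml
<?xml version="1.0"?>
<!-- language: default value; format: fixed value; isbn: required; edition: optional -->
<!DOCTYPE book [
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book
    language (english | german) "english"
    format   CDATA              #FIXED "print"
    isbn     CDATA              #REQUIRED
    edition  CDATA              #IMPLIED>
]>
<!-- Only isbn has to be written out; language falls back to "english". -->
<book isbn="0-000-00000-0">Text</book>
```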

Attribute examples

<!ELEMENT date (#PCDATA)>
<!ATTLIST date year CDATA #IMPLIED>

• This attribute definition says that the date element may contain an optional (#IMPLIED) year attribute that contains any number of characters (CDATA).

<!ELEMENT date (#PCDATA)>
<!ATTLIST date year (1999 | 2000 | 2001 | 2002) #REQUIRED>

• This attribute definition says that the year attribute of the date element must be supplied (#REQUIRED) and that its value must be one of 1999, 2000, 2001, or 2002. Those are the only choices (from value list).