domenica-rinaldi
XML BASIC
Master's Degree in Computer Science (Laurea Magistrale in Informatica)
Chapter 02
Course module: Technologies for Innovation
XML Basic
Agenda
Syntax: elements and attributes
XML Prolog
Examples
Additional Resources
DTD and XML Schema: introduction
Well-Formed and Valid Documents
Validation
XML Document Syntax (I)
An XML document is a text file containing tags, attributes, and text, arranged according to well-defined syntax rules.
An XML document is inherently hierarchical.
It is made up of components called elements.
Each element represents a logical component of the document and can contain other elements (sub-elements) or text.
XML Document Syntax (II)
Elements can carry additional information that describes their properties. This information is stored in attributes.
Elements are organized in a hierarchical tree with a single top-level element, called the root element (or simply the root).
The root contains all the other elements of the document. The structure of an XML document can therefore be drawn as a tree, generally known as the document tree.
Document Tree Example (I)
[Figure: document tree of the article below. Root articolo (attribute titolo); child paragrafo elements (attribute titolo); leaves testo, immagine (attribute file), codice]
<?xml version="1.0" ?>
<articolo titolo="Titolo dell’articolo">
  <paragrafo titolo="Titolo del primo paragrafo">
    <testo> Blocco di testo del primo paragrafo </testo>
    <immagine file="immagine1.jpg"> </immagine>
  </paragrafo>
  <paragrafo titolo="Titolo del secondo paragrafo">
    <testo> Blocco di testo del secondo paragrafo </testo>
    <codice> Esempio di codice </codice>
    <testo> Altro blocco di testo </testo>
  </paragrafo>
  <paragrafo tipo="bibliografia">
    <testo> Riferimento ad un articolo </testo>
  </paragrafo>
</articolo>
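The document tree can be recovered mechanically from the markup. The following is a minimal sketch, assuming Python 3 and its standard-library `xml.etree.ElementTree` module; the helper `tree_lines` is hypothetical and written only for this illustration. It parses a copy of the article (with text trimmed and a straight apostrophe) and prints one line per element, indented by depth, with attributes in brackets:

```python
import xml.etree.ElementTree as ET

# Copy of the "articolo" example (text content trimmed, straight apostrophe).
ARTICLE = """<?xml version="1.0" ?>
<articolo titolo="Titolo dell'articolo">
  <paragrafo titolo="Titolo del primo paragrafo">
    <testo>Blocco di testo del primo paragrafo</testo>
    <immagine file="immagine1.jpg"></immagine>
  </paragrafo>
  <paragrafo titolo="Titolo del secondo paragrafo">
    <testo>Blocco di testo del secondo paragrafo</testo>
    <codice>Esempio di codice</codice>
    <testo>Altro blocco di testo</testo>
  </paragrafo>
  <paragrafo tipo="bibliografia">
    <testo>Riferimento ad un articolo</testo>
  </paragrafo>
</articolo>"""

def tree_lines(elem, depth=0):
    """Return one line per element: indentation shows depth, attributes follow the tag."""
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    lines = ["  " * depth + elem.tag + (f" [{attrs}]" if attrs else "")]
    for child in elem:  # iterating an Element yields its child elements in order
        lines.extend(tree_lines(child, depth + 1))
    return lines

root = ET.fromstring(ARTICLE)
print("\n".join(tree_lines(root)))
```

The recursion mirrors the tree itself: each element prints its own line and then delegates to its children one level deeper.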
Document Tree Example (Newspaper)
<newspaper>
  <section>
    <page>
      <article>
        <headline>XML 8 Announced</headline>
        <byline>Jan Doe</byline>
        <body>The W5C today announced...</body>
      </article>
      <ad>
        <client>Crazy Ed's Cars</client>
        <size>1/4 page</size>
        <run>2 weeks</run>
      </ad>
    </page>
  </section>
</newspaper>
The structure of the document reflects the structure of the newspaper: The newspaper contains sections, which in turn have pages, and on each page are articles and advertisements.
Trees and Relationships
As you can see from the preceding example, XML documents are structured as trees, and there are relationships that exist between the elements in an XML document.
For example, with these elements:
<newspaper>
<section>
</section>
</newspaper>
the <newspaper> element is the parent of the <section> element, and the <section> element is the child of the <newspaper> element.
These relationships become very important as you move into more advanced areas of XML, as you will use these relationships for navigating and locating information within the XML tree with technologies such as XPath.
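Parent/child relationships can also be followed programmatically. Here is a minimal sketch, assuming Python 3's standard-library `xml.etree.ElementTree`. That module supports only a limited subset of XPath (full XPath needs a third-party library such as lxml), and its elements store links to their children but not to their parents, so the sketch builds a child-to-parent map by hand. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

NEWSPAPER = """<newspaper><section><page><article>
<headline>XML 8 Announced</headline><byline>Jan Doe</byline>
<body>The W5C today announced...</body></article>
<ad><client>Crazy Ed's Cars</client><size>1/4 page</size><run>2 weeks</run></ad>
</page></section></newspaper>"""

root = ET.fromstring(NEWSPAPER)

# Elements only know their children, so build a child -> parent map to walk upwards.
parent_of = {child: parent for parent in root.iter() for child in parent}

section = root.find("section")          # a direct child of <newspaper>
print(parent_of[section].tag)           # the parent of <section>

# ".//" in ElementTree's XPath subset searches all descendants.
headline = root.find(".//article/headline")
print(headline.text)
print([child.tag for child in root.find(".//page")])  # children of <page>
```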
ELEMENTS
The bulk of actual data in your XML documents will be in the form of elements.
Elements are delimited by a pair of tags, a start tag and an end tag; tag names are case-sensitive.
The name of the element itself is called the element type, whereas within a document, when the element occurs it is referred to as an instance of the element.
<example>An Example Element</example>
The element type here is "example". The element itself is the entire string: start tag, content, and end tag together. The text contained between the tags is called the element content.
ELEMENTS: different types of content
PCData (text): When elements have PCData (text) content, they contain no child elements, only text. "PCData" stands for "Parsed Character Data", which is simply text that is read by the XML parser.
Element: If an element has only child elements as its content, as in <example><child>Some text...</child></example>, the element is said to have element content.
Mixed: If an element has both text and child elements as its content, as in <example>Text and <bold>emphasized</bold> text.</example>, the element is said to have mixed content.
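The three kinds of content can be told apart by looking at an element's children and text. The following is a minimal sketch in Python 3 using the standard-library `xml.etree.ElementTree`. The function `content_kind` is hypothetical. It treats whitespace-only text between child elements as insignificant, which is a simplifying assumption: a parser that ignores the DTD cannot tell whether such whitespace matters.

```python
import xml.etree.ElementTree as ET

def content_kind(elem):
    """Classify an element's content as 'empty', 'text', 'element' or 'mixed'.

    Whitespace-only text is treated as insignificant (a simplifying assumption).
    """
    has_children = len(elem) > 0
    # ElementTree keeps text before the first child in .text and text after
    # each child in that child's .tail.
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "text"
    return "empty"

for snippet in [
    "<example>Just some text</example>",
    "<example><child>Some text...</child></example>",
    "<example>Text and <bold>emphasized</bold> text.</example>",
    "<example/>",
]:
    print(content_kind(ET.fromstring(snippet)), snippet)
```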
Empty Tags
Sometimes an element is empty: it contains no text and no child elements.
If this is the case, you can write the element with both start and end tags:
<empty></empty>
However, there is also a shorthand that can be used for elements that do not have any content:
<empty/>
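A parser makes no distinction between the two spellings. The following sketch assumes Python 3's standard-library `xml.etree.ElementTree`. It parses both forms and shows that the resulting elements are identical. When serializing, this module writes childless, textless elements in the short form.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<empty></empty>")
short_form = ET.fromstring("<empty/>")

# Both spellings produce the same element: same tag, no attributes, no text, no children.
for elem in (long_form, short_form):
    print(elem.tag, elem.attrib, elem.text, len(elem))

# On output, ElementTree writes the empty element in the abbreviated form.
print(ET.tostring(long_form, encoding="unicode"))
```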
ATTRIBUTES
Not all data in XML documents is stored in element content. Some information may be stored in attributes.
Attributes are simply a means for associating named values with elements.
HTML example: in <img src="myimage.gif">, src is an attribute of the img tag.
Attributes are placed in the start-tag of the element, separated with a space. The content of the attribute is enclosed in quotation marks, either single or double, and an element can have any number of attributes, so long as each attribute name is unique.
<shirt size="medium"/>
<pants size="30">Bell Bottoms</pants>
As you can see, attributes can be used with empty elements or elements with text or mixed content as well.
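Attributes are exposed to applications as named values on the element. The following minimal sketch assumes Python 3's standard-library `xml.etree.ElementTree`, where `Element.get(name, default)` reads an attribute. The `color` attribute is hypothetical and is used only to show a missing attribute. The last step shows that a repeated attribute name makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

shirt = ET.fromstring('<shirt size="medium"/>')
pants = ET.fromstring("<pants size='30'>Bell Bottoms</pants>")  # single quotes work too

print(shirt.get("size"))              # attribute on an empty element
print(pants.get("size"), pants.text)  # attribute plus text content
print(pants.get("color", "n/a"))      # missing attribute -> supplied default

# Attribute names must be unique within a start tag; the parser rejects duplicates.
try:
    ET.fromstring('<pants size="30" size="32"/>')
except ET.ParseError as err:
    print("not well-formed:", err)
```

All attribute values come back as strings; converting `"30"` to a number is up to the application.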
Structure: XML Declaration
Every XML document should begin with the XML declaration, which takes the following form: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The XML declaration always starts with "<?xml" and always ends with "?>". It can carry three attributes:
version: The version attribute is required. It tells the XML processor which version of XML was used to author the document. The versions defined are "1.0" and "1.1", and "1.0" is by far the most widely used.
encoding: The encoding attribute specifies the character encoding of the document. Any encoding the parser supports may be named (every parser must support UTF-8 and UTF-16). This attribute is not required; if it is omitted, the default is UTF-8 (or UTF-16 when the document starts with a UTF-16 byte order mark).
standalone: The standalone attribute (value "yes" or "no", in lowercase) indicates whether the document relies on markup declarations stored outside it, for example in an external DTD, that affect its content. "yes" means no such external declarations are needed; "no" means they may be. This attribute is not required; if the document has external markup declarations and the attribute is omitted, "no" is assumed.
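The encoding attribute matters because a parser receives raw bytes. The following minimal sketch, assuming Python 3's standard-library `xml.etree.ElementTree`, encodes the same text as ISO-8859-1. With a matching declaration, the parser decodes the text correctly. Without a declaration, the parser assumes the default UTF-8 and rejects the byte sequence as not well-formed.

```python
import xml.etree.ElementTree as ET

# "perché" encoded as ISO-8859-1: the "é" becomes the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
body = "<testo>perché</testo>".encode("iso-8859-1")

# With an encoding declaration the parser decodes the bytes correctly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
print(ET.fromstring(declared).text)

# Without it, the parser falls back to UTF-8 and rejects the byte.
try:
    ET.fromstring(body)
except ET.ParseError as err:
    print("not well-formed:", err)
```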
Structure: XML Prolog
The XML prolog consists of two main parts: the XML declaration we have just discussed, and a DOCTYPE declaration. Comments and processing instructions may also appear in it.
The DOCTYPE declaration is used to associate an internal set of declarations with the document, or to link the XML document to an external DTD file for validation.
The DOCTYPE declaration is not required to work with well-formed XML; however, to validate a document against a DTD you will need to use it.
EXAMPLE (I)
In this example, we're going to create a simple XML document for a technical journal. This journal XML document will contain some elements that describe the cover, a table of contents, the journal articles themselves, and an index.
First, we need to start our XML document with the XML declaration and the root element:
<?xml version="1.0"?>
<journal>
</journal>
The XML declaration contains the mandatory version attribute, but because we are not going to do anything with special character sets or with validation, we can leave out the encoding and standalone attributes. By not specifying them, the default values will be used.
Next, we will create the element for the cover of our journal, and call it <cover>:
<cover art="photo.jpg">
  <slug>Learn the Secrets of XML</slug>
</cover>
The cover element has an art attribute, which specifies the cover art. The cover element has element content: it contains another element, called slug, which holds the text of the slug (or teaser) that will also appear on the cover. The slug element contains PCData content, which is just text.
EXAMPLE (II)
Next, we want to create the element for the table of contents. We'll call this element contents; like the cover element, it will have element content, in the form of a title element. The title element will contain the title of the article as it would appear in the table of contents. The other piece of information we need in the table of contents is the page number on which the article appears, which we store in a page attribute:
<contents>
  <title page="3">Authoring XML Documents</title>
</contents>
For the articles, we're going to use a number of elements to describe the article:
• article: This element will contain the child elements that hold the data for the article and its author.
• headline: The headline of the article.
• byline: The byline for the author of the article.
• body: The text of the article.
The resulting XML code looks like this:
<article>
  <headline>Authoring XML Documents</headline>
  <byline>Joe Smith</byline>
  <body>So you want to work with XML...</body>
</article>
EXAMPLE (III)
Finally, we want to create an index to track references to technologies within the article. The index element will store each reference that will appear in the index, and it will contain a child element for each reference. Each reference element also needs a page number, so we can once again use a page attribute to track the page number of the reference.
The resulting XML code is as follows:
<index>
<reference page="4">XML Prolog</reference>
</index>
EXAMPLE complete listing
<?xml version="1.0"?>
<journal>
  <cover art="photo.jpg">
    <slug>Learn the Secrets of XML</slug>
    <slug>XSLT Transforms the Web</slug>
    <slug>Namespaces and Why You Need Them</slug>
  </cover>
  <contents>
    <title page="3">Authoring XML Documents</title>
    <title page="6">Transforming the Web with XSLT</title>
    <title page="9">What's in a Namespace?</title>
    <title page="12">Graphics and XML with SVG</title>
  </contents>
  <article>
    <headline>Authoring XML Documents</headline>
    <byline>Joe Smith</byline>
    <body>So you want to work with XML...</body>
  </article>
  <article>
    <headline>Transforming the Web with XSLT</headline>
    <byline>Jane Doe</byline>
    <body>XML can easily be turned into HTML...</body>
  </article>
  <article>
    <headline>What's in a Namespace?</headline>
    <byline>Jane Jones</byline>
    <body>When is a name not a name...</body>
  </article>
  <article>
    <headline>Graphics and XML with SVG</headline>
    <byline>Sally Smith</byline>
    <body>Drawing on the Web with SVG...</body>
  </article>
  <index>
    <reference page="4">XML Prolog</reference>
    <reference page="8">apply-templates</reference>
    <reference page="11">xmlns</reference>
    <reference page="15">SVG</reference>
  </index>
</journal>
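Storing page numbers in attributes makes them easy for a program to extract. The following is a minimal sketch, assuming Python 3's standard-library `xml.etree.ElementTree`. It works on an abbreviated copy of the listing that keeps only `contents` and `index`, rebuilds the table of contents, and looks up an index entry. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the journal listing: only <contents> and <index> are kept.
JOURNAL = """<?xml version="1.0"?>
<journal>
  <contents>
    <title page="3">Authoring XML Documents</title>
    <title page="6">Transforming the Web with XSLT</title>
    <title page="9">What's in a Namespace?</title>
    <title page="12">Graphics and XML with SVG</title>
  </contents>
  <index>
    <reference page="4">XML Prolog</reference>
    <reference page="8">apply-templates</reference>
    <reference page="11">xmlns</reference>
    <reference page="15">SVG</reference>
  </index>
</journal>"""

journal = ET.fromstring(JOURNAL)

# Table of contents: attribute values are strings, so convert page numbers explicitly.
toc = [(int(t.get("page")), t.text) for t in journal.iterfind("contents/title")]
for page, title in toc:
    print(f"{page:>3}  {title}")

# Index lookup: which page mentions a given technology?
index = {ref.text: int(ref.get("page")) for ref in journal.iterfind("index/reference")}
print(index["xmlns"])
```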
Syntax summary (I)
An XML prolog, with which every XML document should begin, e.g. <?xml version="1.0" ?>
Every XML document must contain exactly one top-level element (the root) that contains all the other elements of the document.
Every element must have a closing tag; empty elements may instead use the abbreviated form (/>).
Elements must be properly nested: closing tags must appear in the reverse order of their corresponding opening tags.
XML is case-sensitive.
Attribute values must always be enclosed in single or double quotes.
Syntax summary (II)
Violating any of these rules means that the resulting document is not considered well-formed. Although these rules may seem simple, they require close attention when you write XML in a plain text editor. Code such as
<articolo titolo=test>...</Articolo>
will cause problems (the attribute value is not quoted, and the end tag does not match the case of the start tag), and the same applies to situations like the following, where the tags are not properly nested:
<paragrafo><testo>abcdefghi...</paragrafo></testo>
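A non-validating parser is enough to catch these mistakes. The following minimal sketch assumes Python 3's standard-library `xml.etree.ElementTree`, which stops at the first well-formedness error it finds. The helper `well_formedness_error` is hypothetical. It returns the parser's message, or `None` when the text is well-formed. The sketch runs the helper on both snippets and on a corrected version.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text):
    """Return None if `text` parses as well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

examples = [
    "<articolo titolo=test>...</Articolo>",                # unquoted value, case mismatch
    "<paragrafo><testo>abcdefghi...</paragrafo></testo>",  # improper nesting
    '<articolo titolo="test">...</articolo>',              # corrected version
]
for text in examples:
    print(well_formedness_error(text) or "well-formed", "<-", text)
```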
Syntax summary (III)
The text enclosed by the root tags may contain an arbitrary number of XML elements.
The basic syntax for one element is:
<element_name attribute_name="attribute_value">Element Content</element_name>
The opening <element_name ...> and the closing </element_name> are referred to as the start-tag and the end-tag, respectively.
"Element Content" is some text, which may itself contain further XML elements.
So, a generic XML document contains a tree-based data structure.
Recipe Data Structure
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Bake in the oven at 180(degrees)C for 30 minutes.</step>
  </instructions>
</recipe>
Syntax summary (IV)
Tag names must also follow some rules. A name can start with a letter or an underscore (_) and can contain letters, digits, periods (.), underscores (_), and hyphens (-). Spaces and other characters are not allowed (the colon is also permitted, but it is reserved for namespaces). A document may also need to contain special characters that would make it not well-formed. For example, if the text contains the < symbol, it risks being interpreted as the start of a new tag, as in the following example:
<testo> il simbolo < indica minore di</testo>
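The naming rules above can be expressed as a simple pattern. The following minimal sketch is written in Python 3. The regular expression and the function `is_simple_name` are hypothetical and cover only the ASCII subset described on this slide. The full XML rules also allow ":" and many non-ASCII letters, so this check is deliberately stricter than the specification. A real parser (here the standard-library `xml.etree.ElementTree`) agrees on the invalid case.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only version of the naming rules: a letter or "_" first,
# then letters, digits, ".", "_" or "-".
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_simple_name(name):
    return SIMPLE_NAME.fullmatch(name) is not None

for name in ["testo", "_id", "prima-pagina", "v1.2", "2nd", "nome cognome"]:
    print(f"{name!r}: {is_simple_name(name)}")

# The parser also rejects a tag name that starts with a digit.
try:
    ET.fromstring("<2nd/>")
except ET.ParseError as err:
    print("parser:", err)
```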
Entity references
In markup languages, a character entity reference is a reference to a particular kind of named entity that has been predefined or explicitly declared in a Document Type Definition (DTD).
The replacement text of the entity consists of a single character from the Universal Character Set (Unicode). The purpose of a character entity reference is to provide a way to refer to a character that is not universally encodable.
XML actually has two related concepts:
a "predefined entity reference" is a reference to one of the special characters <, >, &, ", and ', written &lt;, &gt;, &amp;, &quot;, and &apos; respectively;
while a "character reference" (or "numeric character reference") is a construct such as &#nnnn; (decimal) or &#xhhhh; (hexadecimal) that refers to a character by means of its numeric Unicode code point.
Entity Reference Examples (I)
Here is an example using a predeclared XML entity to represent the ampersand in the name "AT&T":
<company_name>AT&amp;T</company_name>
and the less-than sign in the earlier example:
<testo> il simbolo &lt; indica minore di</testo>
An example of a numeric character reference is "&#x20AC;", which refers to the Euro symbol (€) by means of its Unicode code point in hexadecimal.
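The parser, not the application, resolves these references. This minimal sketch assumes Python 3's standard library. `xml.etree.ElementTree` returns the replaced characters, and `xml.sax.saxutils.escape` performs the reverse step for `&`, `<`, and `>`. The `<price>` element is a hypothetical example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces predefined entities and numeric character references
# with the characters they stand for.
print(ET.fromstring("<company_name>AT&amp;T</company_name>").text)
print(ET.fromstring("<testo> il simbolo &lt; indica minore di</testo>").text)
print(ET.fromstring("<price>&#x20AC;10</price>").text)

# Going the other way, escape() replaces &, < and > with entity references.
print(escape("AT&T uses < and >"))
```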
Entity references DTD declaration
Additional entities (beyond the predefined ones) can be declared in the document's Document Type Definition (DTD). Declared entities can describe single characters or pieces of text, and can reference each other.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
<!ENTITY copy "&#xA9;">
<!ENTITY copyright-notice "Copyright &copy; 2006, XYZ Enterprises">
]>
<example>
&copyright-notice;
</example>
When viewed in a suitable browser, the XML document above appears as:
Copyright © 2006, XYZ Enterprises
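The expansion can be checked without a browser. This minimal sketch assumes Python 3's standard-library `xml.etree.ElementTree`. That module is built on the expat parser. Expat expands general entities declared in the internal DTD subset, including nested references like `&copy;` inside `copyright-notice`, but it does not fetch external DTDs. The Python documentation also warns that entity expansion in untrusted input can be abused.

```python
import xml.etree.ElementTree as ET

# The document from the slide: the internal DTD subset declares two entities,
# and "copyright-notice" itself refers to "copy".
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
  <!ENTITY copy "&#xA9;">
  <!ENTITY copyright-notice "Copyright &copy; 2006, XYZ Enterprises">
]>
<example>&copyright-notice;</example>"""

# The parser substitutes the replacement text, resolving the nested reference too.
print(ET.fromstring(DOC).text)
```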
Numeric character references
Numeric character references look like entity references, but instead of a name, they contain the "#" character followed by a number.
The number (in decimal or "x"-prefixed hexadecimal) represents a Unicode code point.
They have typically been used to represent characters that are not easily encodable, such as an Arabic character in a document produced on a European computer.
The ampersand in the "AT&T" example could also be escaped like this (decimal 38 and hexadecimal 26 both represent the Unicode code point for the "&" character):
<company_name>AT&#38;T</company_name>
<company_name>AT&#x26;T</company_name>
Similarly, in the previous example, notice that "&#xA9;" is used to generate the “©” symbol.
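A quick check confirms that the three spellings are interchangeable. This minimal sketch assumes Python 3's standard-library `xml.etree.ElementTree`. It parses each form and compares the resulting text. It also confirms that decimal 38 and hexadecimal 26 are the same code point.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character: a predefined entity,
# a decimal character reference, and a hexadecimal one.
forms = ["AT&amp;T", "AT&#38;T", "AT&#x26;T"]
texts = [ET.fromstring(f"<company_name>{form}</company_name>").text for form in forms]
print(texts)

# 38 (decimal) and 0x26 (hexadecimal) are the same Unicode code point.
print(ord("&"), hex(ord("&")))
```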
CDATA SECTION
In some situations many characters may need to be replaced with entities, which can make the text unreadable for a human. Consider the case where a block of text shows XML code itself:
<codice> &lt;libro&gt; &lt;capitolo&gt; &lt;/capitolo&gt; &lt;/libro&gt;</codice>
In this case, instead of replacing every occurrence of the special symbols with the corresponding entities, you can use a CDATA section.
Character Data
A CDATA (Character DATA) section is a block of text that is always treated as text, even if it contains XML code or other special characters. To mark a CDATA section, enclose it between the character sequences <![CDATA[ and ]]>; consequently, the sequence ]]> itself cannot appear inside the section. Our example becomes:
<codice> <![CDATA[ <libro> <capitolo> </capitolo> </libro> ]]></codice>
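Escaping and CDATA are two spellings of the same character data. This minimal sketch assumes Python 3's standard-library `xml.etree.ElementTree`. It parses an escaped version and a CDATA version of the `codice` example. Both give the application identical text and no child elements.

```python
import xml.etree.ElementTree as ET

escaped = "<codice> &lt;libro&gt; &lt;capitolo&gt; &lt;/capitolo&gt; &lt;/libro&gt; </codice>"
cdata = "<codice> <![CDATA[<libro> <capitolo> </capitolo> </libro>]]> </codice>"

# In both cases the markup inside <codice> is plain text, not child elements.
a = ET.fromstring(escaped)
b = ET.fromstring(cdata)
print(repr(a.text))
print(repr(b.text))
print(len(a), len(b))  # number of child elements
```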
Comments
Comments can be placed anywhere in the tree, including inside the text of an element whose content is text (#PCDATA).
XML comments start with <!-- and end with -->.
Two consecutive dashes (--) may not appear anywhere in the text of the comment.
<!-- This is a comment. -->
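Comments are normally invisible to applications. This minimal sketch assumes Python 3.8 or later, whose standard-library `xml.etree.ElementTree.TreeBuilder` accepts `insert_comments=True`. By default, the parser discards comments. The builder option keeps comments that appear inside the root element. The last step shows that `--` inside a comment is a well-formedness error. The `<note>` element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

DOC = "<note><!-- This is a comment. -->Remember the meeting.</note>"

# By default ElementTree discards comments...
print(ET.tostring(ET.fromstring(DOC), encoding="unicode"))

# ...but a TreeBuilder with insert_comments=True keeps those inside the root.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(DOC, parser=parser)
print([child.text for child in kept if child.tag is ET.Comment])

# "--" inside a comment makes the document not well-formed.
try:
    ET.fromstring("<note><!-- bad -- comment --></note>")
except ET.ParseError as err:
    print("not well-formed:", err)
```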
Processing Instruction(PI)
XML provides the processing instruction as an alternative means of passing information to particular applications that may read the document.
A processing instruction begins with <? and ends with ?>.
Immediately following the <? is an XML name called the target, which may be the name of the application the processing instruction is intended for, or simply an identifier for this particular processing instruction.
The rest of the processing instruction contains text in a format appropriate for the applications for which the instruction is intended.
PI EXAMPLE
The most common processing instruction, xml-stylesheet, is used to attach stylesheets to documents.
It must appear in the prolog, before the root element.
In this example, the xml-stylesheet processing instruction tells browsers to apply the CSS stylesheet person.css to this document before showing it to the reader.
<?xml-stylesheet href="person.css" type="text/css"?>
<person>
Alan Turing
</person>
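An application receives a processing instruction as a target plus free-form data. This minimal sketch assumes Python 3's standard-library `xml.parsers.expat`. It registers a `ProcessingInstructionHandler` on the parser. The handler collects what the parser reports for the `xml-stylesheet` example. Interpreting `href` and `type` is then up to the application, for example a browser.

```python
import xml.parsers.expat

DOC = """<?xml-stylesheet href="person.css" type="text/css"?>
<person>
  Alan Turing
</person>"""

instructions = []

def on_pi(target, data):
    # expat passes the target name and the rest of the instruction separately.
    instructions.append((target, data))

parser = xml.parsers.expat.ParserCreate()
parser.ProcessingInstructionHandler = on_pi
parser.Parse(DOC, True)  # True: this is the final (and only) chunk of input

print(instructions)
```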
SCHEMAS
Example: EndML, a system for cataloguing endangered species
Elements: Animal, Subspecies, Population, Threats
1. You do not write documents in XML itself.
2. You use XML to create specific, custom markup languages (XML applications).
3. You write documents in those languages.
A specific language is defined by stating which elements and attributes are allowed or required in a conforming document.
Set of rules = document schema
DTD & XML Schema
Both define rules for producing structured documents.
A DTD (Document Type Definition) contains the definitions of element types, attributes, entities, and notations. A DTD declares
which elements, types, entities, and notations are legal
... and where in the document they are legal.
XML Schema: the successor to DTDs
XML-based, it provides a more powerful alternative to DTDs.
Document Type Definition DTD
The oldest schema format for XML, inherited from SGML.
It has no support for newer features of XML, most importantly namespaces.
It lacks expressiveness: certain formal aspects of an XML document cannot be captured in a DTD.
It uses a custom, non-XML syntax to describe the schema.
DTD is still used in many applications because it is considered the easiest to read and write.
XML Schema
The XML Schema language is described by the W3C as the successor of DTDs.
Schemas written in it are usually referred to by the initialism XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages: they use a rich datatyping system, allow more detailed constraints on an XML document's logical structure, and must be processed in a more robust validation framework.
XSDs also use an XML-based format, which makes it possible to use ordinary XML tools to help process them, although XSD implementations require much more than just the ability to read XML.
Well-formed and valid XML documents: two levels of correctness
Well-formed. A well-formed document conforms to all of XML's syntax rules. For example, if a start-tag appears without a corresponding end-tag, the document is not well-formed. A document that is not well-formed is not considered to be XML; a conforming parser must report a fatal error and must not continue normal processing.
Valid. A valid document additionally conforms to some semantic rules. These rules are either user-defined or expressed in an XML schema, typically a DTD. For example, if a document contains an element that the DTD does not declare, it is not valid. A validating parser must report such validity errors; unlike well-formedness errors, they are not fatal, so the parser may continue processing.
Validating and non-validating parsers
The heart of an XML application is the parser, the module that reads the XML document and builds an internal representation of it for later processing (such as display).
A validating parser, when a DTD is present, can check the validity of the document or report the markup errors it contains.
A non-validating parser, by contrast, can only check the syntactic well-formedness of the document, even when a DTD is present.
A non-validating parser is much simpler and faster to write, but it performs fewer checks. In some applications, however, documents do not need to be validated; checking that they are well-formed is enough.
Markup language validation
[Figure: XML document + XML schema linked to the document → validating parser → the document is valid if it conforms to all the rules]
[Figure: XML document (+ XML schema linked to the document) → non-validating parser → the document is well-formed if it is syntactically correct]
Well-formed verification: book markup
[Figure: document tree of the book markup below: book → title; author (first name, last name); chapters (preface, chapter 1, chapter 2, appendix); media]
<?xml version = "1.0"?>
<!-- Use of XML elements and attributes -->
<book isbn = "999-99999-9-X">
<title> XML Primer</title>
<author>
<firstName>Paul</firstName>
<lastName>Deitel</lastName>
</author>
<chapters>
<preface num = "1" pages = "2">Welcome</preface>
<chapter num = "1" pages = "4">Easy XML</chapter>
<chapter num = "2" pages = "2">XML Elements</chapter>
<appendix num = "1" pages = "9">Entities</appendix>
</chapters>
<media type = "CD"/>
</book>
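Well-formed verification: book markup (II)
In this version the </chapters> end tag is commented out, so the chapters element is never closed and the document is no longer well-formed: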
<?xml version = "1.0"?>
<!-- Use of XML elements and attributes -->
<book isbn = "999-99999-9-X">
<title> XML Primer</title>
<author>
<firstName>Paul</firstName>
<lastName>Deitel</lastName>
</author>
<chapters>
<preface num = "1" pages = "2">Welcome</preface>
<chapter num = "1" pages = "4">Easy XML</chapter>
<chapter num = "2" pages = "2">XML Elements</chapter>
<appendix num = "1" pages = "9">Entities</appendix>
<!-- </chapters> -->
<media type = "CD"/>
</book>
Book markup with output produced by a stylesheet
[Figure: applying the stylesheet Usage.xsl to Usage.xml produces the displayed output]
Processing instruction (PI):
<?xml-stylesheet type="text/xsl" href="usage.xsl"?>
1. <? and ?> delimit the PI
2. Target (xml-stylesheet)
3. Value: type="text/xsl" href="usage.xsl"
Validation analysis
DTD document intro.dtd
<!ELEMENT myMessage ( message )>
Declares the element myMessage, used here as the root, with exactly one child element named message
<!ELEMENT message ( #PCDATA )>
Declares that the message element must contain parsed character data (text read by the XML parser)
XML document intro.xml
<?xml version = "1.0"?>
<!DOCTYPE myMessage SYSTEM "intro.dtd">
Document prolog: the !DOCTYPE document type declaration
myMessage: the name of the document type (the name of the root element)
SYSTEM: the declarations are external to the document and are located at the URI intro.dtd
<myMessage>
<message>Welcome to XML!</message>
</myMessage>
<?xml version = "1.0"?>
<!-- Fig. 6.3: intro-invalid.xml -->
<!-- Simple introduction to XML markup -->
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<!-- Root element missing child element message -->
<myMessage>
</myMessage>
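This document is well-formed but invalid, which is exactly the case a non-validating parser cannot detect. This minimal sketch assumes Python 3's standard-library `xml.etree.ElementTree`. That parser is non-validating: it neither loads intro.dtd nor checks the content model, so it accepts the document. The function `satisfies_intro_dtd` is hypothetical and hand-codes the single rule from intro.dtd for illustration only. Real validation requires a validating parser.

```python
import xml.etree.ElementTree as ET

# intro-invalid.xml: well-formed, but <myMessage> lacks its required <message> child.
INVALID = """<?xml version = "1.0"?>
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<myMessage>
</myMessage>"""

# The non-validating parser only checks well-formedness, so this succeeds.
root = ET.fromstring(INVALID)
print(root.tag, len(root))

def satisfies_intro_dtd(elem):
    # Hand-written version of: <!ELEMENT myMessage ( message )> and
    # <!ELEMENT message ( #PCDATA )> (message must have no child elements).
    return (elem.tag == "myMessage" and len(elem) == 1
            and elem[0].tag == "message" and len(elem[0]) == 0)

print(satisfies_intro_dtd(root))
```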
Additional Resources (I)
XML 1.0 Recommendation (http://www.w3.org/TR/REC-xml) The XML 1.0 Recommendation from the W3C (now in its Fifth Edition) is the final word on XML. If you have a question about a technical aspect of XML, this should be the first source you consult.
Annotated XML Recommendation (http://www.xml.com/axml/testaxml.htm) The Annotated XML Recommendation is an excellent resource for making sense of the sometimes difficult-to-read XML Recommendation. Written by Tim Bray (one of the XML 1.0 editors), it clarifies confusing areas of the Recommendation and offers some historical tidbits as well.
Additional Resources (II)
XML-DEV ([email protected]) The XML-DEV mailing list is a good resource for developers actively working with XML. Discussion ranges from Recommendation debates to practical tips. To subscribe, send an e-mail to the address with "subscribe" in the body of the message.
comp.text.xml The comp.text.xml USENET newsgroup can also be a great resource for interacting with other XML developers.
The XML FAQ (http://www.ucc.ie/xml/) The XML Frequently Asked Questions address issues such as why XML is structured the way it is, and when it might be appropriate to use XML as a solution.
Additional Resources (III)
XML.com (http://www.xml.com) XML.com is a commercial Web site dedicated to tracking and reporting on XML and XML-related issues. The site covers not only XML 1.0 but also all related activities, and can be a great source of tutorials, articles, and general XML information.
XML.org (http://www.xml.org) XML.org is another commercial site, billing itself as the industry portal for XML. The site features the XML Cover Pages, Robin Cover's news column tracking developments in SGML/XML.
PARSER
AltovaXML: free parser from Altova, also included in XMLSpy, MapForce, and StyleVision.
RomXML: embedded XML commercial toolkit written in ANSI C.
XDOM: open-source XML parser (and DOM and XPath implementation) in Delphi/Kylix.
XML resources at the Open Directory Project
TinyXml: simple and small C++ XML parser.
FoX: fully validating XML parser library, written in Fortran.
Intel_XSS: XML parsing, validation, XPath, XSLT.
sw8t.xml: lightweight, high-performance, intuitive JavaScript XML parser. Includes API docs and developer's guide.