ruggiero-ippolito
XML Introduction
Master's Degree in Computer Science (Laurea Magistrale in Informatica)
Chapter 01, Course module
Technologies for Innovation
XML - Introduction
Agenda
What is XML
XML in 10 points
History and evolution
Technologies that add functionality
The XML family
XML application areas
Electronic Data Interchange
XML: what it is
The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.
A markup language is an artificial language that uses a set of annotations to text to give instructions about how the text is to be displayed. A well-known example of a markup language in use in computing is the HyperText Markup Language (HTML).
XML is classified as an extensible language because it allows its users to define their own elements.
XML: what it is (II)
XML is a metalanguage that makes it possible to define the syntax of markup languages.
It defines a set of (meta)syntactic rules through which a markup language, called an XML application, can be formally described.
Every XML application inherits a set of common syntactic features from XML.
Every XML application in turn defines its own particular formal syntax.
XML makes it possible to state the structure(s) of a document explicitly and formally, by means of markers (markup) embedded in the text (character data).
The markup represents the logical structure of the document.
Markup is distinguished from the rest of the text because it is enclosed between delimiters, informally: <xxxx> &yyyy;
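Because `<` and `&` open markup, character data that contains them must be written with entity references such as `&lt;` and `&amp;`. A minimal sketch using the standard Python helpers in `xml.sax.saxutils`; the sample string is invented for illustration.

```python
from xml.sax.saxutils import escape, unescape

# Character data that happens to contain the markup delimiters '<', '>' and '&'
raw = "price < 10 & qty > 2"

# escape() replaces &, < and > with the predefined entity references
escaped = escape(raw)
print(escaped)  # price &lt; 10 &amp; qty &gt; 2

# unescape() reverses the substitution, recovering the original text
restored = unescape(escaped)
print(restored == raw)  # True
```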
XML in 10 Points http://www.w3.org/XML/1999/XML-in-10-points.html
1. XML is for structuring data
XML documents reflect the structure of the data that they contain. For example, if the document were a book, it might contain <section> elements, which would in turn contain <chapter> elements, and so on.
XML is a set of rules (you may also think of them as guidelines or conventions) for designing text formats that let you structure your data.
XML makes it easy for a computer to generate data, read data, and ensure that the data structure is unambiguous.
XML avoids common pitfalls in language design: it is extensible, platform-independent, and it supports internationalization and localization, since it is fully Unicode-compliant.
2. XML looks a bit like HTML
Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form name="value").
While HTML specifies what each tag and attribute means, and often how the text between them will look in a browser, XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it. In other words, if you see "<p>" in an XML file, do not assume it is a paragraph. Depending on the context, it may be a price, a parameter, a person, a p... (and who says it has to be a word with a "p"?).
3. XML is text, but isn't meant to be read
Although XML is verbose and is plain (Unicode) text, it is designed primarily to be used by automated systems, not necessarily read by humans.
Like HTML, XML files are text files that people shouldn't have to read, but may when the need arises.
Compared to HTML, the rules for XML files allow fewer variations. A forgotten tag or an attribute without quotes makes an XML file unusable, while in HTML such practice is often explicitly allowed.
4. XML is verbose by design
Since XML is a text format and it uses tags to delimit the data, XML files are nearly always larger than comparable binary formats.
That was a conscious decision by the designers of XML. The advantages of a text format are evident, and the disadvantages can usually be compensated for at a different level. Disk space is less expensive than it used to be, and compression programs like zip and gzip can compress files very well and very fast.
In addition, communication protocols such as modem protocols and HTTP/1.1, the core protocol of the Web, can compress data on the fly, saving bandwidth as effectively as a binary format.
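To illustrate the point about compression, the sketch below builds a repetitive, made-up XML document and compresses it with Python's standard `gzip` module. The exact sizes depend on the data and on the zlib version, so no figures are claimed here; for markup this repetitive, the compressed form is typically much smaller than the original.

```python
import gzip

# A made-up, highly repetitive document: every record repeats the same tag names
records = "".join(
    f"<record><id>{i}</id><name>item {i}</name></record>" for i in range(1000)
)
xml_text = f'<?xml version="1.0"?><records>{records}</records>'
raw = xml_text.encode("utf-8")

# Compress in memory with the same gzip format used by the gzip tool and HTTP
compressed = gzip.compress(raw)
print(f"original: {len(raw)} bytes, gzip: {len(compressed)} bytes")

# Decompression restores the exact original bytes, so no information is lost
assert gzip.decompress(compressed) == raw
```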
5. XML is a family of technologies
The core of XML is the XML 1.0 Recommendation. Beyond XML 1.0, "the XML family" is a growing set of modules that offer useful services to accomplish important and frequently demanded tasks.
XLink describes a standard way to add hyperlinks to an XML file.
XPointer is a syntax in development for pointing to parts of an XML document. An XPointer is a bit like a URL, but instead of pointing to documents on the Web, it points to pieces of data inside an XML file.
CSS, the style sheet language, is applicable to XML as it is to HTML.
XSL is the advanced language for expressing style sheets. It is based on XSLT, a transformation language used for rearranging, adding and deleting tags and attributes.
The DOM is a standard set of function calls for manipulating XML (and HTML) files from a programming language.
XML Schemas 1 and 2 help developers to precisely define the structures of their own XML-based formats.
6. XML is new, but not that new
Development of XML started in 1996 and it has been a W3C Recommendation since February 1998, which may make you suspect that this is rather immature technology.
In fact, the technology isn't very new. Before XML there was SGML, developed in the early '80s, an ISO standard since 1986, and widely used for large documentation projects.
The designers of XML simply took the best parts of SGML, guided by the experience with HTML, and produced something that is no less powerful than SGML, and vastly more regular and simple to use.
7. XML leads HTML to XHTML
There is an important XML application that is a document format: W3C's XHTML, the successor to HTML. XHTML has many of the same elements as HTML.
The syntax has been changed slightly to conform to the rules of XML. A format that is "XML-based" inherits the syntax from XML and restricts it in certain ways (e.g., XHTML allows "<p>", but not "<r>"); it also adds meaning to that syntax (XHTML says that "<p>" stands for "paragraph", and not for "price", "person", or anything else).
8. XML is modular
Using XML, you can define vocabularies that are designed to be reused.
By creating DTDs or XML Schemas, you can create sets of documents that are all based on common vocabularies.
Similarly, using XML Namespaces, you can publish and share those vocabularies without conflicts. Since two formats developed independently may have elements or attributes with the same name, care must be taken when combining those formats (does "<p>" mean "paragraph" from this format or "person" from that one?).
9. XML is the basis for RDF and the Semantic Web
RDF, or the Resource Description Framework, and the Semantic Web are both initiatives of the W3C to help refine the way information is organized on the Web.
XML is the basis of these technologies, and will help organize the information on the Web, making it easier for users to find and access the information they need.
10. XML is license-free, platform-independent and well-supported
XML is not owned by any corporation, nor is it controlled by a corporation.
It is a publication of the W3C, and as such, it can be used freely by anyone.
And although some may have issues with the W3C process, or with what ends up in the final Recommendations, the bottom line is that this makes XML a fairly open standard. (An open standard is a standard that is publicly available and has various rights of use associated with it.)
References in Italian
"XML in 10 punti" (XML in 10 points): this 10-point summary tries to gather some basic concepts that help the newcomer see a little light through the fog. By Andrea Benassi, 26 November 2003.
http://www.indire.it/content/index.php?action=read&id=313
XML and the W3C
XML is recommended by the World Wide Web Consortium (W3C).
The Recommendation specifies both the lexical grammar and the requirements for parsing.
The lexical grammar consists of the rules governing how a character sequence is divided into subsequences of characters, each of which represents an individual token.
Parsing, or more formally syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given (more or less) formal grammar.
History
XML started as a simplified subset of the Standard Generalized Markup Language (SGML). The versatility of SGML for dynamic information display was understood by early digital media publishers in the late 1980s, before the rise of the Internet.
By the mid-1990s some practitioners of SGML had gained experience with the World Wide Web, and believed that SGML offered solutions to some of the problems the Web was likely to face as it grew. Dan Connolly added SGML to the list of W3C's activities when he joined the staff in 1995; work began in mid-1996 when Sun Microsystems engineer Jon Bosak developed a charter and recruited collaborators.
Evolution
XML was compiled by a working group of eleven members, supported by an (approximately) 150-member Interest Group. Technical debate took place on the Interest Group mailing list and issues were resolved by consensus or, when that failed, majority vote of the Working Group.
The XML Working Group never met face-to-face; the design was accomplished using a combination of email and weekly teleconferences. The major design decisions were reached in twenty weeks of intense work between July and November 1996, when the first Working Draft of an XML specification was published.
Further design work continued through 1997, and XML 1.0 became a W3C Recommendation on February 10, 1998.
The Working Group's goals
Internet usability
General-purpose usability
SGML compatibility
Facilitation of easy development of processing software
Minimization of optional features
Legibility, formality, conciseness, and ease of authoring
Like its antecedent SGML, XML allows for some redundant syntactic constructs and includes repetition of element identifiers. In these respects, terseness was not considered essential in its structure.
The name “XML”: other names considered (curiosity)
"MAGMA" (Minimal Architecture for Generalized Markup Applications)
"SLIM" (Structured Language for Internet Markup)
"MGML" (Minimal Generalized Markup Language)
Why not SGML?
SGML has many strengths, but it is notably complex to use and to understand.
It was not designed for the network.
XML contains all the features of SGML needed to build general applications...
...without descending to the level of detail and pedantry required by SGML.
Moreover, the success of HTML showed that:
the developer community is ready to embrace the markup-based model;
simplicity is a fundamental strength.
The differences between SGML and XML are highlighted in a note published by the W3C, which can be found at: http://www.w3.org/TR/NOTE-sgmlxml-971215 .
XML versions
The first version, XML 1.0, was initially defined in 1998. It has undergone minor revisions since then without being given a new version number: its fourth edition was published on August 16, 2006, and a fifth edition followed on November 26, 2008. It is widely implemented and still recommended for general use.
The second version, XML 1.1, was initially published on February 4, 2004, the same day as XML 1.0 Third Edition, and is in its second edition, published on August 16, 2006. XML 1.1 is not very widely implemented and is recommended for use only by those who need its unique features.
XML 1.0 and XML 1.1 differ in the requirements for characters used in element and attribute names: up to its fourth edition, XML 1.0 only allowed characters defined in Unicode 2.0, which includes most world scripts but excludes those added in later Unicode versions. The fifth edition of XML 1.0 relaxed this restriction.
The case of HTML
XML is not a replacement for HTML.
HTML was born as an SGML DTD for publishing simple text documents with a few images and hyperlinks.
Over time, many proprietary extensions were implemented, creating barriers to interoperability between tools.
Browsers (parsers) relax the syntax rules and also interpret "incorrect" HTML documents.
HTML is for presenting information; XML is for describing information.
Many Technologies Contribute to the Power of XML
If you wanted to use XML as a file format for storing information, and then publishing that information in print, on CD-ROM, and on the World Wide Web, you would need to make use of some other technologies that are not specifically XML, but might be based on XML, or be supplementary to XML.
You might have an XML document that you want to display on the Web; however, XML documents do not contain any information about display formatting. To transform the XML data into HTML or XHTML for display on the Web, you might need to use a style sheet written in a language such as the Extensible Stylesheet Language (XSL).
Document Type Definition
You might also need to specify exactly how XML files are to be structured, using a set of rules called a Document Type Definition (DTD). DTDs are an integral part of creating valid XML, but they have no specification of their own: they are a holdover from SGML, maintained for compatibility reasons, and the syntax used for DTD declarations is defined as part of the XML 1.0 Recommendation.
DTDs are useful: without them or another type of schema, it is impossible to verify that an XML file is structured properly within the rules the author had in mind.
But DTDs are not required in order to use XML.
Note: XML documents come in two varieties: well-formed and valid.
Well-formed XML means that the XML is written in the proper format and complies with all the rules for XML set forth in the XML 1.0 Recommendation.
Valid XML means that the XML document is well-formed and has also been validated against a rule set, or schema, such as a DTD or an XML Schema.
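A well-formedness check can be sketched with Python's standard `xml.etree.ElementTree`, which rejects any input that breaks the XML syntax rules. `is_well_formed` is a hypothetical helper name, and the test strings are invented. This parser does not perform DTD or XML Schema validation, so the sketch covers only the first of the two varieties.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML.

    Checks well-formedness only: no DTD or XML Schema validation is done.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<note><to>Ann</to></note>"))  # True
print(is_well_formed("<note><to>Ann</note>"))       # False: <to> is never closed
print(is_well_formed("<note lang=en>hi</note>"))    # False: attribute value not quoted
print(is_well_formed("<body>text</BODY>"))          # False: names are case-sensitive
```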
The XML 1.0 Recommendation defines the basic structures of XML:
elements, attributes, entities, notations, CDATA sections, parsed character data (PCDATA), and comments.
This includes defining the conventions for names, case sensitivity, start tags, end tags, and so on. Everything you need to work with well-formed XML is contained within this one Recommendation.
XML-Related Recommendations
There are also a number of W3C Recommendations that are very closely related to the core XML technology.
In this category, the Recommendations define some technologies that are designed specifically to add functionality to XML 1.0.
These technologies include XML Namespaces and XML Schemas
Namespaces
XML allows developers to create their own markup languages, for use in a variety of applications.
However, there is nothing to stop two developers from developing markup languages that have similar tags, but with different structure or meaning.
If both of these developers were using their markup languages internally only, this might not be a problem.
But what if these developers start sharing their vocabularies with their clients, vendors, and the general public? The result could be confusion about what tag means what, and in what context.
Namespace example (I)
Developer One designs a <name> element that looks like this:
<name>
<first>John</first>
<last>Doe</last>
</name>
Developer Two, however, prefers to use a <name> element with no children:
<name>John Doe</name>
So what happens if a vendor is working with both organizations? The same element name, <name>, would then carry two different structures.
Namespace example (II)
The solution is to declare elements as part of a specific namespace.
This means that when they are used, the parser is aware that they belong to a namespace, and if a similar element is used but belongs to a different namespace, there is no conflict.
Namespaces make use of a special attribute, xmlns (written xmlns:prefix when a prefix is declared), that binds a prefix to a namespace URI.
<?xml version="1.0"?>
<customers xmlns:vendor="http://www.vendor.com"
           xmlns:supplier="http://www.supplier.com">
  <vendor:name>John Dough</vendor:name>
  <supplier:name>
    <first>Jane</first>
    <last>Doe</last>
  </supplier:name>
</customers>
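The following sketch parses the namespaced customers document with Python's standard `xml.etree.ElementTree`. ElementTree replaces each prefix with its namespace URI, so the two `name` elements get distinct expanded names; the `ns` prefix map passed to `find` is an illustrative choice, not part of the document.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<customers xmlns:vendor="http://www.vendor.com"
           xmlns:supplier="http://www.supplier.com">
  <vendor:name>John Dough</vendor:name>
  <supplier:name><first>Jane</first><last>Doe</last></supplier:name>
</customers>"""

root = ET.fromstring(doc)

# Prefix map used only for lookups; what matters is the URI, not the prefix
ns = {"vendor": "http://www.vendor.com", "supplier": "http://www.supplier.com"}

vendor_name = root.find("vendor:name", ns)
supplier_name = root.find("supplier:name", ns)

# ElementTree stores each tag as {namespace-URI}local-name
print(vendor_name.tag)    # {http://www.vendor.com}name
print(supplier_name.tag)  # {http://www.supplier.com}name
print(vendor_name.text)   # John Dough
# <first> and <last> carry no prefix and no default namespace is declared
print(supplier_name.find("first").text, supplier_name.find("last").text)  # Jane Doe
```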
XML Schemas
In order to be considered valid, an XML document needs to be checked against either a DTD or an XML Schema.
XML Schema is a formal schema language for defining the structure of XML documents.
The XML Schema specification deals with some of the shortcomings of DTDs, such as the lack of robust data types, and also abandons the cryptic syntax of DTDs for an easier-to-use XML-based syntax.
XML Family
There are also a number of W3C Recommendations that deal with various aspects of XML that are not necessarily related to the structure of an XML document but provide mechanisms for implementing XML in practical solutions. These Recommendations are related to the display or navigation of XML documents.
XML is in fact a family of languages.
Some aspire to become standards; others are merely proposals from private parties or interested industries. Some are general-purpose; others are specific applications for narrow domains.
Extensible Stylesheet Language (XSL)
XSL is a stylesheet language designed to aid in the presentation of XML.
As a stylesheet language, it is similar to Cascading Style Sheets (CSS), although there are some significant differences.
XSL uses an XML syntax to specify how elements within an XML document should be displayed.
Extensible Stylesheet Language (XSL) example
<document>
  <title>Introducing XML</title>
  <byline>John Doe</byline>
  <body>Learning about XML is not complicated...</body>
</document>
If we wanted to display the title of the document in italic, we could use an XSL stylesheet containing a template like this:
<xsl:template match="title">
  <fo:block font-style="italic">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
When the stylesheet and the XML document are processed by an XSL-capable processor, the result will be a document displayed with the title in italic.
Extensible Stylesheet Language Transformations (XSLT)
XSLT is a technology that allows developers to author a stylesheet which, when processed, transforms the elements and attributes of an XML document into another format.
For example, by using XSLT it is possible to transform an XML element:
<byline>John Doe</byline>
into an HTML tag set:
<b>John Doe</b>
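Python's standard library does not ship an XSLT processor, so the sketch below is not XSLT: it mimics a single template rule (every `byline` becomes a `b` element with the same text) using `xml.etree.ElementTree`, only to show what such a transformation does to the tree. `transform_byline` is a hypothetical name. With a real XSLT engine, the same rule would be written as an `xsl:template` matching `byline`.

```python
import xml.etree.ElementTree as ET

source = (
    "<document><title>Introducing XML</title>"
    "<byline>John Doe</byline></document>"
)
root = ET.fromstring(source)


def transform_byline(elem: ET.Element) -> ET.Element:
    """Rule analogous to a template matching 'byline': emit <b> with the same text."""
    b = ET.Element("b")
    b.text = elem.text
    return b


# Apply the rule to every <byline> in the source tree and serialize the results
out = [ET.tostring(transform_byline(e), encoding="unicode") for e in root.iter("byline")]
print(out)  # ['<b>John Doe</b>']
```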
XPath/XPointer
XPath is a Recommendation that was developed specifically for locating components within an XML document.
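Python's standard `xml.etree.ElementTree` supports a limited subset of XPath syntax in `findall`, enough to show how a location path selects components of a document. The book/section/chapter document is invented, following the book example from point 1 of "XML in 10 Points".

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<book>"
    "<section id='s1'><chapter>Intro</chapter><chapter>Syntax</chapter></section>"
    "<section id='s2'><chapter>Namespaces</chapter></section>"
    "</book>"
)

# All <chapter> elements anywhere below the root, in document order
all_chapters = [c.text for c in doc.findall(".//chapter")]

# Only the chapters inside the <section> whose id attribute equals 's2'
s2_chapters = [c.text for c in doc.findall("./section[@id='s2']/chapter")]

print(all_chapters)  # ['Intro', 'Syntax', 'Namespaces']
print(s2_chapters)   # ['Namespaces']
```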
XPointer is a Recommendation that allows developers to easily refer to and locate XML document fragments.
This is very useful for several types of applications, including having multiple authors work on a single large XML document, or making extremely large XML documents more manageable for editing purposes.
XPointer enables you to specify points and ranges within your XML documents, which can then be treated as "mini" documents in their own right.
XLink/XInclude/XML Base
One of the most powerful aspects of information on the World Wide Web is the ability to link together documents of interest. Therefore, a linking mechanism for XML documents naturally increases the power of XML.
The XLink and XML Base Recommendations are both used to specify information about linking XML documents together: XLink defines the links themselves, and XML Base defines the base URI against which relative links are resolved.
Linking in XML is more complicated than in HTML, because there are more types of links available to developers.
There are also applications where simply linking between documents might not be ideal, and you might want to build a large XML document from a set of smaller documents. For that purpose, there is the XInclude Recommendation, which provides the means to include sets of XML documents into a single document structure.
Processing XML files
Three traditional techniques for processing XML files are:
using a programming language and the SAX API;
using a programming language and the DOM API;
using a transformation engine and a filter (XSL).
An application programming interface (API) is a set of functions, procedures, methods or classes that an operating system, library or service provides to support requests made by computer programs
Document Object Model, or DOM
XML documents, like other structured documents, are trees, and the DOM is essentially an API for manipulating the document tree.
Rather than being based on user events (such as mouse clicks), the DOM is based on the structure of the document itself.
The DOM is likely to be best suited for applications where the document must be accessed repeatedly or out of sequence. If the application is strictly sequential and one-pass, the SAX model is likely to be faster and use less memory.
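A minimal DOM sketch with Python's standard `xml.dom.minidom`, reusing part of the document from the XSL example: the whole tree is built in memory first, after which nodes can be read in any order and modified in place.

```python
from xml.dom.minidom import parseString

# Parse the whole document into an in-memory tree of nodes
dom = parseString(
    "<document><title>Introducing XML</title><byline>John Doe</byline></document>"
)

# Nodes can be looked up in any order, and as often as needed
byline = dom.getElementsByTagName("byline")[0]
title = dom.getElementsByTagName("title")[0]
print(title.firstChild.data)  # Introducing XML

# Modify the tree in place: change the text node inside <byline>
byline.firstChild.data = "Jane Doe"

# Append a new <body> element as the last child of the root element
body = dom.createElement("body")
body.appendChild(dom.createTextNode("Learning about XML is not complicated..."))
dom.documentElement.appendChild(body)

print(dom.documentElement.toxml())
```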
Simple API for XML, or SAX
SAX is an event-driven API, which means that rather than working with the document structure as a whole, SAX allows you to deal with specific parts of a document as the document is parsed.
The amount of memory that a SAX parser needs in order to function is typically much smaller than that of a DOM parser. DOM parsers must have the entire tree in memory before any processing can begin. The memory footprint of a SAX parser, by contrast, depends only on the maximum depth of the XML file.
Because of the event-driven nature of SAX, processing documents can often be faster than with DOM-style parsers. Memory allocation takes time, so the larger memory footprint of the DOM is also a performance issue.
Due to the nature of DOM, streamed reading from disk is impossible. Processing XML documents that could never fit into memory is only possible through the use of a stream XML parser, such as a SAX parser.
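A minimal SAX sketch with Python's standard `xml.sax`: the handler receives start and end events while the parser reads the input and keeps only a few counters, never the tree. `DepthCounter` is a hypothetical class name and the sample document is invented. Tracking the current depth also hints at why a SAX parser's own bookkeeping grows with nesting depth rather than with document size.

```python
import xml.sax


class DepthCounter(xml.sax.ContentHandler):
    """Counts elements and tracks nesting depth as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current nesting level
        self.max_depth = 0  # deepest level seen so far
        self.elements = 0   # number of start-tag events

    def startElement(self, name, attrs):
        self.elements += 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1


handler = DepthCounter()
xml.sax.parseString(
    b"<customers><customer><name><first>Jane</first></name></customer>"
    b"<customer><name>John Doe</name></customer></customers>",
    handler,
)
print(handler.elements, handler.max_depth)  # 6 4
```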
XML and Data: Document Repositories
There are a number of tools called document repositories, which are designed specifically for maintaining large documents or sets of documents.
Because these tools are based on SGML, most have rapidly adapted to XML and are available for use now.
Document repositories can be viewed as specialized databases, designed to work with large documents.
They often have special features, such as the capability to enable users to edit only a part of a document and then integrate that part back into the whole document.
XML and Data: XQuery
The proper design of your database structure (the schema) is essential.
The best data in the world is useless without proper queries.
Because XML documents are now being stored in relational databases, object databases, document repositories, and as simple flat files, the W3C wanted to create a common query language that would enable users to create queries that work across all these different kinds of data applications.
One way to look at XQuery is as an XML-specific SQL.
The advantage of XQuery for XML is that XQuery was designed specifically for XML, with the structure of XML documents in mind.
The Related Technologies
There is another category of XML technologies called XML vocabularies. These are individual markup languages that have been written using XML 1.0.
XML vocabularies can be treated just like any other XML document, because they are well-formed (and in many cases, valid) XML.
When you are developing XML documents, what you are really doing is developing your own XML vocabularies. However, there may already be an existing XML vocabulary that meets your needs.
There are literally hundreds of XML vocabularies in existence. Some of these vocabularies are being developed privately for use within a specific organization, and some are being developed publicly for anyone to use.
The vocabularies we have chosen to cover here are vocabularies that are being developed in conjunction with the W3C, and either are, or will likely become, W3C Recommendations.
Different Vocabularies : XHTML
XHTML (Extensible HyperText Markup Language) is simply HTML rewritten to comply with the XML rules for being well-formed.
The reasoning behind this move is that XHTML will allow XML applications to read and treat HTML as if it were just another XML document
One critical difference is that unlike HTML, XHTML is case sensitive, and all the tags have to appear in lower case. That is because XML is case sensitive, so <body> and <BODY> are not the same tag.
Additionally, XHTML requires that all tags be properly closed and nested; HTML does not.
Different Vocabularies
To make wireless communication easier between devices, and to serve documents to wireless devices, there is an XML-based vocabulary in use (and in ongoing development) designed specifically for wireless: the Wireless Markup Language (WML).
Scalable Vector Graphics (SVG) is an XML-based specification for creating graphics, which could be used on the Web or in print. SVG enables these graphics to be created in a text file, based on the geometry of the graphic.
Synchronized Multimedia Integration Language (SMIL) is an XML-based language that allows developers to create multimedia presentations. It offers features similar to those of PowerPoint or Flash, such as animated graphics, sounds, and the ability to interact with the presentation on some level (such as following links).
Resource Description Framework (RDF) is a framework for expressing metadata about information on the Web, commonly written in an XML-based syntax (RDF/XML). Metadata is data about data; for example, a table of contents in a book might be considered metadata because it describes the contents of each chapter in the book.
Reasons for using XML
Transmitting data between different systems (and often between different platforms)
Sending information in a format independent of its representation (separation of content and presentation)
Exchanging information together with its semantic structure
Transmitting data that is easily understood by both humans and computers
Allowing companies to speed up integration with their business partners
Improving the dissemination of information within the company and on the Web
Enabling the management of documents previously handled by EDI
XML technology: advantages
User-oriented data presentation
  The XML+XSL combination:
  separates business logic from presentation logic;
  frees the application from constraints tied to the presentation device.
Data exchange between applications
  Integration between applications is possible with an effort that is a fraction of the traditional one in the EDI area.
Publishing data directly in XML
  The machine-readable format (Unicode) can be combined with other data and processed further (impossible with HTML).
MAIN APPLICATION AREAS
In their book "The XML Handbook", Goldfarb and Prescod divide all XML applications into two broad categories: POP (Presentation Oriented Publishing) and MOM (Message Oriented Middleware).
POP deals with documents whose end user is a human reader. Publishing texts, manuals and presentations are typical POP goals.
POP's goals are similar to those of HTML. With XML, however, texts can be given a richer structure (see DocBook).
Stylesheets transform documents that represent the logical structure into documents that describe the physical layout. By changing the stylesheet, one can change how the documents are displayed or printed.
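The idea of one logical source with several stylesheets can be sketched as follows. In an XML toolchain the two renderings would normally be XSL stylesheets. Python's standard library has no XSLT processor, so this sketch substitutes two plain functions for them. The `<manual>`/`<section>` vocabulary is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Logical structure only: no fonts, sizes or layout.
SOURCE = """<manual>
  <title>Quick Start</title>
  <section heading="Install">Unpack the archive.</section>
  <section heading="Run">Start the program.</section>
</manual>"""

def to_html(doc: ET.Element) -> str:
    """Rendering 1: HTML for the Web."""
    parts = [f"<h1>{escape(doc.findtext('title'))}</h1>"]
    for sec in doc.iter("section"):
        parts.append(f"<h2>{escape(sec.get('heading'))}</h2><p>{escape(sec.text)}</p>")
    return "\n".join(parts)

def to_text(doc: ET.Element) -> str:
    """Rendering 2: plain text, e.g. for a printed handout."""
    title = doc.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    for n, sec in enumerate(doc.iter("section"), start=1):
        lines.append(f"{n}. {sec.get('heading')}: {sec.text}")
    return "\n".join(lines)

doc = ET.fromstring(SOURCE)
print(to_html(doc))
print(to_text(doc))
```

The source document never changes. Only the "stylesheet" chosen at publishing time changes.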
MOM is based on exchanging XML documents between programs in order to carry out a coordinated function in a distributed environment. An example of MOM is the automatic management of orders between suppliers and customers.
MOM can involve different kinds of resources (e.g., databases and message-queuing systems), for which XML-based interfaces are becoming widespread.
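The order example can be sketched as two programs that talk only through XML messages. The sketch assumes hypothetical `<order>` and `<orderAck>` vocabularies, and uses direct function calls where a real MOM system would use a network or a message queue.

```python
import xml.etree.ElementTree as ET

def build_order(order_id: str, customer: str, items: list) -> bytes:
    """Customer side: serialize an order as an XML message (bytes on the wire)."""
    order = ET.Element("order", id=order_id)
    ET.SubElement(order, "customer").text = customer
    lines = ET.SubElement(order, "lines")
    for sku, qty in items:
        ET.SubElement(lines, "line", sku=sku, qty=str(qty))
    return ET.tostring(order, encoding="utf-8")

def acknowledge(message: bytes) -> bytes:
    """Supplier side: parse the order and reply with an acknowledgement message."""
    order = ET.fromstring(message)
    n_lines = len(order.findall("./lines/line"))
    status = "accepted" if n_lines > 0 else "rejected"
    ack = ET.Element("orderAck", orderId=order.get("id"), lines=str(n_lines), status=status)
    return ET.tostring(ack, encoding="utf-8")

# In a real deployment these bytes would travel through a queue or over HTTP.
request = build_order("PO-1", "ACME S.p.A.", [("A-100", 3), ("B-200", 1)])
reply = acknowledge(request)
print(reply.decode("utf-8"))
```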
Presentation Oriented Publishing
POP was SGML's killer application
It brought huge savings to companies doing electronic publishing in the 1980s
Instead of creating formatted documents, human authors create unformatted abstractions
The file represents what is in the document, not how it should look
The POP user is concerned not with the data but with its rendition
To obtain the desired result, one specifies stylesheets: one for print, one for CD-ROM, one for the Web, etc.
Message Oriented Middleware
MOM is XML's killer application on the Web
MOM radically changes the concept of middleware
XML APPLICATION AREAS
Content management: presentation-oriented publishing; one common data format, multiple rendering styles (XSL)
Data interchange/EDI: data interchange; interfacing of heterogeneous products; inter-process communication (IPC)
Application integration: application-to-application communication; Internet message formats (protocols); client/middle tier/server
Data aggregation/portal: enterprise information portals
Electronic Data Interchange
The transfer of structured data, by agreed message standards, from one computer system to another without human intervention. Even in this era of technologies such as XML web services, the Internet and the World Wide Web, EDI is still the data format used by the vast majority of electronic commerce transactions in the world.
EDI comprises: a set of syntax rules for structuring data; a protocol for interactive exchange; standard messages.
In EDI terminology, the organizations that send or receive documents are called "trading partners".
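To make the notion of EDI syntax rules concrete, the sketch below reads a short message written in UN/EDIFACT-style syntax and rewrites it as generic XML. It assumes the default service characters: `'` ends a segment, `+` separates data elements, `:` separates components, and `?` is the release (escape) character. It ignores the UNA/UNB envelope and does not validate against any message standard. The sample message and the `<segment>`/`<element>`/`<component>` vocabulary are illustrative only.

```python
import xml.etree.ElementTree as ET

RELEASE = "?"  # escape character: "?+" means a literal "+"

def split_keep(text: str, sep: str) -> list:
    """Split on `sep`, skipping separators escaped by RELEASE (escapes are kept)."""
    parts, start, i = [], 0, 0
    while i < len(text):
        if text[i] == RELEASE:
            i += 2  # jump over the escaped character
            continue
        if text[i] == sep:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])
    return parts

def unescape(text: str) -> str:
    """Remove RELEASE characters, keeping the characters they protect."""
    out, i = [], 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def edifact_to_xml(message: str) -> ET.Element:
    """Map segments/elements/components onto a generic XML tree."""
    root = ET.Element("message")
    for raw in split_keep(message, "'"):
        seg = raw.strip()
        if not seg:
            continue
        tag, *elements = split_keep(seg, "+")
        seg_node = ET.SubElement(root, "segment", tag=tag)
        for element in elements:
            el_node = ET.SubElement(seg_node, "element")
            for comp in split_keep(element, ":"):
                ET.SubElement(el_node, "component").text = unescape(comp)
    return root

sample = "UNH+1+ORDERS:D:96A:UN'BGM+220+PO?+123'UNT+3+1'"
print(ET.tostring(edifact_to_xml(sample), encoding="unicode"))
```

Escapes are kept while splitting and removed only at the component level, so an escaped separator is never split at any level.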
Essential elements of EDI
the use of an electronic transmission medium (originally a value-added network, but increasingly the open, public Internet) rather than the despatch of physical storage media such as magnetic tapes and disks;
the use of structured, formatted messages based on agreed standards (such that messages can be translated, interpreted and checked for compliance with an explicit set of rules);
relatively fast delivery of electronic documents from sender to receiver (generally implying receipt within hours, or even minutes); and
direct communication between applications (rather than merely between computers).
The old EDI
A different format for each application
Application code has no single, unified view
New participants have a devastating impact
Only previously defined elements can be shared
New needs cannot easily be met
XML can be the solution
Different formats for each application
XML provides a single logical view
The flexible architecture supports new components
Distributed Computing (I)
Slow reaction to change
High maintenance costs
Limited flexibility
Data changes propagate to all layers
Distributed Computing (II)
More standard
Simpler
More easily extensible
Lower maintenance costs
Greater responsiveness
Standard APIs and template languages
Example: electronic invoicing
"Processable" electronic invoicing, that is, invoicing aimed at automating accounting entries, is based on systems for transmitting commercial and administrative data. Using telematic networks or national and international telecommunication networks, these systems automatically exchange structured messages, formatted according to an agreed standard, between two software applications. Examples include traditional EDI (Electronic Data Interchange) systems, which exchange data according to international standard record layouts over private networks; the more innovative and less expensive WEBEDI solutions, based on web transmission technologies; and the most recent XML-based solutions, where data are exchanged using the XML (eXtensible Markup Language) metalanguage, following either the same standards as EDI or new international standards.
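"Processable" means that software can read the invoice and produce the accounting entries itself. The sketch below illustrates this with a hypothetical invoice layout. It is not FatturaPA or any other official e-invoicing schema, and the 22% VAT rate and account names are example values. `Decimal` is used so that amounts are not subject to binary floating-point rounding.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical invoice layout, for illustration only.
INVOICE = """<invoice number="17" date="2024-03-01">
  <line description="Consulting" amount="1000.00"/>
  <line description="Travel expenses" amount="250.50"/>
  <vatRate>22</vatRate>
</invoice>"""

def journal_entries(xml_text: str) -> list:
    """Turn an invoice into (account, side, amount) double-entry records."""
    inv = ET.fromstring(xml_text)
    net = sum((Decimal(line.get("amount")) for line in inv.iter("line")), Decimal("0"))
    rate = Decimal(inv.findtext("vatRate")) / 100
    vat = (net * rate).quantize(CENT, rounding=ROUND_HALF_UP)
    gross = net + vat
    return [
        ("Accounts receivable", "debit", gross),
        ("Sales revenue", "credit", net),
        ("VAT payable", "credit", vat),
    ]

for account, side, amount in journal_entries(INVOICE):
    print(f"{account:20} {side:6} {amount:>10}")
```

Debits equal credits by construction: the gross amount is the sum of net amount and VAT.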
Message-based XML/EDI approach
Figure: Piero De Sabbata, ENEA
Message transmission and security
Figure: Piero De Sabbata, ENEA
The message-based scenario
Figure: Piero De Sabbata, ENEA
Some References
Specifications:
W3C XML homepage
The XML 1.0 specification
The XML 1.1 specification
Sources:
Introduction to Generalized Markup (Charles Goldfarb)
Making Mistakes with XML (Sean Kelly)
The Multilingual WWW (Gavin Nicol)
Retrospective on Extended Reference Concrete Syntax (Rick Jelliffe)
XML Based languages
Essential XML Quick Reference
XML, Java and the Future of the Web (Jon Bosak)
XML tutorials in w3schools
XML.gov
Retrospectives:
Thinking XML: The XML decade (Uche Ogbuji)
XML: Ten year anniversary (Elliot Kimber)
Closing Keynote, XML 2006 (Jon Bosak)
Five years later, XML... (Simon St. Laurent)
23 XML fallacies to watch out for (Sean McGrath)
XML is Ten! (W3C, XML 10 years press release)
W3C Recommendations, Notes and Working Drafts
Recommendations: Canonical XML · CDF · CSS · DOM · HTML · MathML · OWL · PLS · RDF · RDF Schema · SISR · SMIL · SOAP · SRGS · SSML · SVG · SPARQL · Timed Text · VoiceXML · WSDL · XForms · XHTML · XML · XML Base · XML Events · XML Information Set · XML Schema (W3C) · XML Signature · XPath · XPointer · XQuery · XSL Transformations · XSL-FO · XSL · XLink
Notes: XHTML+SMIL · XAdES
Working Drafts: CCXML · CURIE · InkML · XFrames · XFDL · WICD · XHTML+MathML+SVG · XBL · XProc · HTML 5
UNICODE
Unicode is an encoding system that assigns a unique number to every character used in writing text, independently of the language, the computing platform and the program in use.
The code assigned to a character is written as U+ followed by the four to six hexadecimal digits of its number.
The Unicode standard does not yet represent every character in use in the world.
Since it is still evolving, it aims to cover all representable characters while guaranteeing compatibility with, and no overlap among, the codes of characters already defined. It also reserves well-defined ranges of "unused" codes (the Private Use Areas) for independent use within specific applications.
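Python's built-in `ord()` returns a character's code point, so the U+ notation can be produced directly. The helper name `code_point_label` is illustrative. Code points above U+FFFF naturally get five or six digits.

```python
def code_point_label(ch: str) -> str:
    """Write a character's code point in U+ notation (at least 4 uppercase hex digits)."""
    return f"U+{ord(ch):04X}"

# Latin letter, euro sign, Hebrew alef, and an emoji outside the 16-bit range.
for ch in ["A", "\u20ac", "\u05d0", "\U0001F600"]:
    print(code_point_label(ch), ch)
```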
Character encoding
Unicode can be implemented by different character encodings.
A character encoding is a code that maps a set of characters to a set of other objects, such as numbers (especially in computing), to make it easier to store text in a computer or to transmit it over a telecommunication network.
Common examples are Morse code and the ASCII encoding.
The most commonly used encoding is UTF-8.
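The same Unicode text produces different byte sequences, or cannot be represented at all, depending on the encoding chosen. The sketch below shows this with Python's built-in codecs. The helper `encoded_sizes` and the sample string are illustrative. The last lines show that an XML parser uses the encoding named in the XML declaration to decode the bytes.

```python
import xml.etree.ElementTree as ET

def encoded_sizes(text: str, encodings: list) -> dict:
    """Bytes needed for `text` in each encoding, or None if it cannot be encoded."""
    sizes = {}
    for enc in encodings:
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None  # e.g. Latin-1 has no euro sign
    return sizes

text = "Caffè €"
for enc, size in encoded_sizes(text, ["utf-8", "utf-16-le", "latin-1", "cp1252"]).items():
    print(f"{enc:10} {size if size is not None else 'not encodable'}")

# The XML declaration tells the parser how to turn bytes back into characters.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>Caffè</p>'.encode("latin-1")
print(ET.fromstring(doc).text)
```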
UTF-8
UTF-8 (Unicode Transformation Format, 8-bit) encodes Unicode characters as variable-length byte sequences.
It uses 1 to 4 bytes to represent a Unicode character.
For example, a single byte is enough for the 128 characters of the ASCII character set, which correspond to Unicode positions U+0000 to U+007F.
Examples:
http://it.wikipedia.org/wiki/UTF-8#Descrizione
http://en.wikipedia.org/wiki/UTF-8#Examples
Examples
Unicode range           UTF-8 encoding (binary)
0x000000 - 0x00007F     0xxxxxxx
0x000080 - 0x0007FF     110xxxxx 10xxxxxx
0x000800 - 0x00FFFF     1110xxxx 10xxxxxx 10xxxxxx
0x010000 - 0x10FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
For example, the character alef (א), Unicode U+05D0, is encoded in UTF-8 as follows:
it falls in the range 0x0080 to 0x07FF, so according to the table it is encoded with two bytes, 110xxxxx 10xxxxxx;
hexadecimal 0x05D0 is binary 101-1101-0000;
the eleven bits are copied in order into the positions marked "x": 110-10111 10-010000;
the final result is the byte pair 11010111 10010000, or 0xD7 0x90 in hexadecimal.
The dollar sign ($), which is Unicode U+0024 or binary 10 0100:
it falls into the first row of the table, range U+0000 through U+007F;
that row shows it will be encoded using one byte, 0xxxxxxx;
putting the binary right-justified into the 'x' bits gives 00100100;
this byte in hexadecimal is 0x24. Thus the ASCII dollar sign is encoded unchanged.
The euro sign (€), which is Unicode U+20AC or binary 10 0000 1010 1100:
it falls into the third row of the table, range U+0800 through U+FFFF;
that row shows it will be encoded using three bytes, 1110xxxx 10xxxxxx 10xxxxxx;
putting the binary right-justified into the 'x' bits gives 11100010 10000010 10101100;
these bytes in hexadecimal are 0xE2 0x82 0xAC, which is the UTF-8 encoding of the euro sign (€).
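The table and the worked examples translate directly into bit operations. The sketch below is a hand-written encoder for a single code point, meant to illustrate the table rather than replace Python's built-in `str.encode("utf-8")`. It rejects values outside U+0000 to U+10FFFF and the surrogate range U+D800 to U+DFFF, which UTF-8 does not encode. The function name `utf8_encode` is illustrative.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point following the UTF-8 table."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded in UTF-8")
    if cp <= 0x7F:                       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),     # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "$\u05d0\u20ac":
    print(f"U+{ord(ch):04X} ->", utf8_encode(ord(ch)).hex(" ").upper())
# U+0024 -> 24
# U+05D0 -> D7 90
# U+20AC -> E2 82 AC
```

Each `& 0x3F` keeps the next 6 payload bits, and each `0x80 |` adds the `10` continuation prefix from the table.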
World Wide Web Consortium
The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web (abbreviated WWW or W3).
It is arranged as a consortium where member organizations maintain full-time staff for the purpose of working together in the development of standards for the World Wide Web.
As of October 2008, the W3C had 418 members (http://www.w3.org/Consortium/Member/List )
W3C also engages in education and outreach, develops software and serves as an open forum for discussion about the Web.
It was founded by Sir Tim Berners-Lee, who long served as its Director.
What is a Recommendation?
Unlike an officially sanctioned standards body, such as the International Organization for Standardization (ISO), the W3C is not an official standards organization. The W3C simply publishes "Recommendations," which are not binding in any way. Simply put, they are a set of guidelines, published and copyrighted by the W3C.
The power of these "Recommendations" comes from the fact that people treat them as standards by consensus, and from the fact that claiming compliance with a Recommendation without actually complying with it would violate the W3C's copyright.
Charles F. Goldfarb was tasked with building a system for storing, retrieving, managing and publishing legal documents
Goldfarb found that many systems at IBM could not communicate with each other: the file formats of the different applications were proprietary ...and different from one another!
Three important facts:
The different programs needed to support a common representation of documents
The common language had to be specific to legal documents
The language had to be specified formally, so that it could delimit elements appropriately
The answer was GML (Generalized Markup Language), the precursor of SGML (Standard Generalized Markup Language), the language from which XML derives
Standard Generalized Markup Language (ISO 8879:1986 SGML)
is an ISO Standard metalanguage in which one can define markup languages for documents.
SGML is a descendant of IBM's Generalized Markup Language (GML), developed in the 1960s by Charles Goldfarb, Edward Mosher and Raymond Lorie (whose surname initials were used by Goldfarb to make up the term GML).
SGML provides an abstract syntax that can be realized in many different concrete syntaxes
SGML was originally designed to enable the sharing of machine-readable documents in large projects in government, law and industry, which have to remain readable for several decades.
It has also been used extensively in the printing and publishing industries, but its complexity has prevented its widespread application for small-scale general-purpose use. Primarily intended for text and database publishing, one of its first major applications was the second edition of the Oxford English Dictionary (OED), which was and is wholly marked up using an SGML-like markup.
W3C XML 10 Years
On 10 February 1998, W3C published Extensible Markup Language (XML) 1.0 as a W3C Recommendation. W3C would like to thank the dedicated communities -- including people who have participated in W3C's XML groups and mailing lists, the SGML community, and xml-dev -- whose efforts have created a successful family of technologies based on the solid XML 1.0 foundation.
"There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes," said Tim Bray of Sun Microsystems.
"This is a good thing, because it shows that information can be packaged and transmitted and used in a way that's independent of the kinds of computer and software that are involved. XML won't be the last neutral information-wrapping system; but as the first, it's done very well."
Il concetto di metalinguaggio (I)
In logic and linguistics, a metalanguage is a language used to make statements about another language, called the object language (that is, a formalism for rigorously describing another language).
Markup languages differ from metalanguages: they only describe how a document should be presented, not the syntax of a language. However, schema languages such as XML Schema can be used to describe rules for content.
XML is the metalanguage used to describe XHTML, just as SGML is used to describe HTML.
XHTML is much stricter than HTML; for example, XHTML is case sensitive, unlike HTML.
The concept of metalanguage (II)
Figure: three levels. Metasyntax (metalanguage): XML. Syntax (languages): MathML, XHTML, DocBook. Documents.
The concept of metalanguage (III)
Since XML is a metalanguage for specifying other languages, it provides a "common layer" for communication across different environments
XML says nothing about which tags to use; it only sets common rules for parsing the file correctly
XML can be used for the most varied purposes, depending on what the specific application does with the markup it uses
Figure: data (XML file), an XML parser applying the XML rules, and an application handling its specific tags
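The layering described above can be sketched in a few lines. A generic XML parser enforces only XML's rules, and an application gives meaning to its own tags. Here `xml.etree.ElementTree` plays the parser. The `HANDLERS` table, the `<greeting>`/`<sum>` tags and the policy of ignoring unknown tags are hypothetical choices for a toy application.

```python
import xml.etree.ElementTree as ET

# Application layer: meaning is attached only to this application's own tags.
HANDLERS = {
    "greeting": lambda el: f"Hello, {el.text}!",
    "sum": lambda el: str(sum(int(n.text) for n in el.findall("n"))),
}

def run(document: str) -> list:
    # Parser layer: knows XML's generic rules only; raises ET.ParseError
    # if the document is not well-formed.
    root = ET.fromstring(document)
    results = []
    for child in root:
        handler = HANDLERS.get(child.tag)
        results.append(handler(child) if handler else f"ignored <{child.tag}>")
    return results

print(run("<data><greeting>world</greeting><sum><n>2</n><n>3</n></sum><unknown/></data>"))
```

A second application could reuse the same parser with a completely different `HANDLERS` table. That is the sense in which XML is a common layer.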