INTRODUCTION TO XML


Underlying Technologies: XML Is the Glue

[Figure: three stages of Web technology and the innovation each enabled. Connect the Web: TCP/IP, connectivity (FTP, e-mail, Gopher). Browse the Web: HTML, presentation (Web pages). Program the Web: XML, connecting applications (Web services).]

Introducing XML

XML = Extensible Markup Language.

In information technology, extensible describes something, such as a program, programming language, or protocol, that is designed so that users or developers can expand or add to its capabilities.

Because XML is extensible, it can be used to create a wide variety of document types.

XML enables document authors to create entirely new markup languages for describing specific types of data, including mathematical formulas, chemical molecular structures, music, recipes, etc.

Markup is a text-based notation for describing data.

Some XML-based markup languages include XHTML, MathML (for mathematics), VoiceXML (for speech), SMIL (the Synchronized Multimedia Integration Language, for multimedia presentations), CML (Chemical Markup Language, for chemistry) and XBRL (Extensible Business Reporting Language, for financial data exchange).

   1  <?xml version = "1.0"?>
      ...
   6  <myMessage>
   7     <message>Welcome to XML!</message>
   8  </myMessage>
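The listing above can be handed to an XML parser. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser (the slides themselves use the browser's parser), the following shows the root element, its child element, and the text between the child's tags. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The document from the listing, written without the line numbers.
WELCOME = """<?xml version="1.0"?>
<myMessage>
   <message>Welcome to XML!</message>
</myMessage>"""

root = ET.fromstring(WELCOME)   # the parser returns the root element
child = root[0]                 # first (and only) child of the root

print(root.tag)     # name of the root element
print(child.tag)    # name of the child element
print(child.text)   # character data between the child's start and end tags
```

The parser never sees the line numbers, which is why they are left out of the string.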

Line numbers are not part of the XML document. We include them for clarity.

The document begins with a declaration that specifies XML version 1.0.
Element message is a child element of the root element myMessage.
XML documents contain text that represents content (i.e., data), such as Welcome to XML!
They also contain elements that specify the document's structure, such as myMessage.
XML documents delimit elements with start tags and end tags.
A start tag consists of the element name in angle brackets, e.g., <myMessage>.
An end tag consists of the element name preceded by a forward slash (/) in angle brackets, e.g., </myMessage>.
Every XML document must have exactly one root element that contains all the other elements, such as myMessage.

Viewing and Modifying XML Documents

XML documents are highly portable. Viewing or modifying an XML document, which typically ends with the .xml filename extension, does not require special software. Any text editor that supports ASCII/Unicode characters can open XML documents for viewing and editing. One important characteristic of XML is that it is both human readable and machine readable.

Processing XML Documents

Processing an XML document requires a software program called an XML parser (or an XML processor).
Parsers check an XML document's syntax and enable software programs to process marked-up data.
XML syntax requires:
a single root element
a start tag and an end tag for each element
properly nested tags (i.e., the end tag for a nested element must appear before the end tag of the enclosing element)

XML is case sensitive.

A document that conforms to this syntax is a well-formed XML document and is syntactically correct.
An XML parser is built into Internet Explorer 7 (IE7) and Firefox 2 (FF2).

Validating XML Documents

An XML document optionally can reference a document that defines that XML document's structure. This document is either a Document Type Definition (DTD) or a schema.

When an XML document references a DTD or schema, some parsers (called validating parsers) can read the DTD/schema and check that the XML document follows the structure that the DTD/schema defines. If the XML document conforms to the DTD/schema (i.e., the document has the appropriate structure), the XML document is valid.

Parsers that cannot check for document conformity against DTDs/schemas are non-validating parsers. If an XML parser (validating or non-validating) can process an XML document successfully, that XML document is well formed (i.e., it is syntactically correct). By definition, a valid XML document also is well formed.
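The well-formedness rules above can be tried out with a non-validating parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which raises ParseError for text that is not well formed; the helper name is_well_formed is made up for this illustration. It checks syntax only and says nothing about validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>ok</b></a>"))    # properly nested
print(is_well_formed("<a><b>bad</a></b>"))   # end tags in the wrong order
print(is_well_formed("<a>text</A>"))         # case mismatch: a is not A
print(is_well_formed("<a/><b/>"))            # more than one root element
```

Only the first call prints True; each of the others breaks one of the rules listed above.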

    1  <?xml version = "1.0"?>
    2
    3  <!-- ... -->
    4  <!-- ... -->
    5
    6  <article>
    7
    8     <title>Simple XML</title>
    9
   10     <date>September 19, 2001</date>
   11
   12     <author>
   13        <...>Tem</...>
   14        <...>Nieto</...>
   15     </author>
   16
   17     <summary>XML is pretty easy.</summary>
   18
   19     <content>Once you have mastered XHTML, XML is easily
   20        learned. You must remember that XML is not for
   21        displaying information but for managing information.
   22     </content>
   23
   24  </article>

Line 1: optional XML declaration.
Line 6: element article is the root element.
Elements title, date, author, summary and content are child elements of article.

[Figure: the document displayed in IE7.]

We begin with the optional XML declaration on line 1. Value version indicates the XML version to which the document conforms. XML 1.0 is the version in general use.
Blank lines (line 2), white space and indentation help improve readability.
Comments (lines 3-4) begin with <!-- and end with -->, and can be placed almost anywhere in a document.
Placing any characters, including white space, before the XML declaration is an error.
Lines 1-4 are the XML prolog; lines 6-24 are the root element.
XML element and attribute names can be of any length and may contain letters, digits, underscores, hyphens and periods. However, XML names must begin with either a letter or an underscore.
Using either a space or a tab in an XML element or attribute name is an error.
Beginning a name with xml, in any combination of uppercase and lowercase letters, is an error, because such names are reserved.
Elements form parent (container)/child relationships; all children at the same nesting level are siblings.
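The naming rules above translate almost directly into a pattern. The following is a simplified sketch in Python, restricted to ASCII letters; real XML names also allow many Unicode letters and the colon used for namespace prefixes, so this is an approximation and not the full rule from the XML specification. The function name is_valid_name is hypothetical.

```python
import re

# First character: a letter or underscore.
# Remaining characters: letters, digits, underscores, hyphens, periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name):
    """Apply the simplified naming rules to one element or attribute name."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")

for candidate in ["title", "_id", "a.b-c", "2nd", "my name", "XmlData"]:
    print(candidate, is_valid_name(candidate))
```

The first three candidates pass; 2nd starts with a digit, my name contains a space, and XmlData starts with the reserved prefix.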

Viewing an XML Document

Note that the XML document is simply a text file and does not contain formatting information for the article. This is because XML is a technology only for structuring data. Formatting and displaying data from an XML document are application-specific issues. For example, when Internet Explorer 7 loads an XML document, IE7's parser, MSXML (Microsoft XML Core Services), parses and displays the document data. Firefox has a similar capability. Each browser has a built-in style sheet to format the data.

[Figure: the document displayed in Firefox.]

Notice the minus sign (-) and plus sign (+). IE7 places these symbols next to all container elements. A minus sign indicates that IE7 is displaying that container element's child elements. Clicking the minus sign next to an element causes IE7 to hide that container element's children and replaces the minus sign with a plus sign. Clicking the plus sign next to an element causes IE7 to display that container element's children and replaces the plus sign with a minus sign.

Firefox 2 offers the same controls.

Document Type Definitions

A DTD enables an XML parser to verify whether an XML document is valid (i.e., its elements contain the proper attributes, are in the proper sequence, etc.).

XML Markup for a Business Letter

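This XML document references a DTD:

   <!DOCTYPE letter SYSTEM "letter.dtd">

   <letter>
      <contact type = "sender">
         <name>Jane Doe</name>
         <address1>Box 12345</address1>
         <address2>15 Any Ave.</address2>
         <city>Othertown</city>
         <state>Otherstate</state>
         <zip>67890</zip>
         <phone>555-4321</phone>
         <flag ... />
      </contact>

      <contact type = "...">
         <name>John Doe</name>
         <address1>123 Main St.</address1>
         <address2></address2>
         <city>Anytown</city>
         <state>Anystate</state>
         <zip>12345</zip>
         <phone>555-1234</phone>
         <flag ... />
      </contact>

      <salutation>Dear Sir:</salutation>

      <paragraph>It is our privilege to inform you about our new database managed with XML. This new system allows you to reduce the load on your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>

      <paragraph>Please visit our website for availability and pricing.</paragraph>

      <closing>Sincerely,</closing>
      <signature>Ms. Jane Doe</signature>
   </letter>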

<letter>: root element (lines 7-43).

Root element letter contains child elements contact, salutation, paragraph, closing and signature.

<contact type = "sender">: element with an attribute. The attribute name is type and the attribute value is sender.

Note: element and attribute names are case sensitive.

<name>Jane Doe</name>: element with data between the element's tags.

letter.dtd

To reference the DTD file we use:

   <!DOCTYPE letter SYSTEM "letter.dtd">

DOCTYPE: the reference to the DTD.
letter: the root element name.
"letter.dtd": the external file name and location.
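The parts of the DTD reference can be read back from a parsed document. This sketch assumes Python's standard xml.dom.minidom module, which records the DOCTYPE declaration but does not load or apply the external DTD, so nothing here is validated. The shortened letter document is a stand-in written for this illustration.

```python
from xml.dom import minidom

# Shortened stand-in for the business letter; only the DOCTYPE matters here.
LETTER = """<?xml version="1.0"?>
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
   <salutation>Dear Sir:</salutation>
</letter>"""

doc = minidom.parseString(LETTER)
doctype = doc.doctype

print(doctype.name)                   # root element name given in the DOCTYPE
print(doctype.systemId)               # external file name and location
print(doc.documentElement.tagName)    # actual root element of the document
```

For a valid document, the name in the DOCTYPE and the actual root element agree.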

DTD keywords: ELEMENT, EMPTY, ATTLIST, #REQUIRED, CDATA, #FIXED, #IMPLIED, #PCDATA.

The ELEMENT element type declaration defines the rules for element letter. In this case, letter contains one or more contact elements, one salutation element, one or more paragraph elements, one closing element and one signature element, in that sequence.

Occurrence indicators:
The plus sign (+) indicates that the DTD allows one or more occurrences of an element.
The asterisk (*) indicates an optional element that can occur any number of times (zero or more).
The question mark (?) indicates an optional element that can occur at most once (zero or one occurrence).
If an element does not have an occurrence indicator, the DTD allows exactly one occurrence.

The contact element definition (line 7) specifies that element contact contains child elements name, address1, address2, city, state, zip, phone and flag, in that order. The DTD requires exactly one occurrence of each element.

The ATTLIST attribute-list declaration defines an attribute (i.e., type) for the contact element.
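The occurrence indicators behave like the same symbols in regular expressions, which suggests a small experiment. The sketch below is not a DTD validator: it assumes a sequence-only content model written as a comma-separated string, checks only the direct children of one element, and uses hypothetical helper names. It is meant only to illustrate how +, * and ? constrain the number and order of child elements.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate a sequence-only content model into a regex over child names."""
    pieces = []
    for item in model.split(","):
        item = item.strip()
        indicator = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        # Each child name is followed by one space in the string we match.
        pieces.append("(?:" + re.escape(name) + " )" + indicator)
    return re.compile("".join(pieces))

def children_follow(element, model):
    """Check the order and number of an element's direct children."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None

# Content model for letter, as described in the text above.
LETTER_MODEL = "contact+, salutation, paragraph+, closing, signature"

good = ET.fromstring(
    "<letter><contact/><contact/><salutation/><paragraph/>"
    "<paragraph/><closing/><signature/></letter>")
bad = ET.fromstring(
    "<letter><salutation/><paragraph/><closing/><signature/></letter>")

print(children_follow(good, LETTER_MODEL))   # all required children, in order
print(children_follow(bad, LETTER_MODEL))    # no contact element
```

The empty contact elements in these test strings would not satisfy the contact declaration itself; a validating parser checks every element against its own declaration.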

Keyword #IMPLIED specifies that the attribute is optional: if the parser finds a contact element without a type attribute, the parser can choose an arbitrary value for the attribute or ignore the attribute, and the document will still be valid.

Keyword #REQUIRED specifies that the attribute must be present in the element.

Keyword #FIXED specifies that the attribute (if present) must have the given fixed value. For example, a declaration of the form

   <!ATTLIST ... zip CDATA #FIXED "01757">

indicates that attribute zip must have the value 01757 for the document to be valid.

Default value: if the declaration gives a quoted value in place of one of these keywords, the attribute takes that default value when it is omitted.

CDATA

CDATA specifies that attribute type contains character data (i.e., a string). The parser does not process the data as markup, but passes it on to the application.

#PCDATA

#PCDATA specifies that the element can contain parsed character data (i.e., text). Parsed character data should not contain markup characters, such as less than (<) and ampersand (&). The document author should replace any markup character with its corresponding entity (i.e., &lt; or &amp;).

Appendix A contains the special characters.

[Table: Entity References, listed by character.]
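Replacing markup characters by hand is error-prone, so most XML libraries offer a helper. As an illustration, assuming Python's standard xml.sax.saxutils module, the following escapes a string, embeds it in an element, and shows that the parser restores the original text. The note element is used here only as a container.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "if a < b & b > c"
safe = escape(raw)            # replaces &, < and > with entity references
print(safe)                   # if a &lt; b &amp; b &gt; c

# The escaped text can sit between tags; the parser turns it back into text.
element = ET.fromstring("<note>" + safe + "</note>")
print(element.text)

print(unescape(safe) == raw)  # escaping is reversible
```

Embedding the raw string without escaping would make the document not well formed.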

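   <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend</body>
   </note>

A Reference to a DTD

This XML document has a reference to a DTD:

   <!DOCTYPE note SYSTEM "...">
   <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
   </note>

A DTD File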

Using XML Schema

XML Schema is an XML-based alternative to DTD.
An XML schema describes the structure of an XML document.
The XML Schema language is also referred to as XML Schema Definition (XSD).

What is an XML Schema?

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
An XML Schema:
defines elements that can appear in a document
defines attributes that can appear in a document
defines which elements are child elements
defines the order of child elements
defines the number of child elements
defines whether an element is empty or can include text
defines data types for elements and attributes
defines default and fixed values for elements and attributes

XML Schemas are the Successors of DTDs

We think that very soon XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons:
XML Schemas are extensible to future additions
XML Schemas are richer and more powerful than DTDs
XML Schemas are written in XML
XML Schemas support data types
XML Schemas support namespaces

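A Simple XML Document

   <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
   </note>

An XML Schema

A Reference to an XML Schema

   <note ...>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
   </note>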

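The <schema> Element

The <schema> element is the root element of every XML Schema:

   <xs:schema>
   ...
   </xs:schema>

The <schema> element may contain some attributes. A schema declaration often looks something like this:

   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   targetNamespace="http://www.w3schools.com"
   xmlns="http://www.w3schools.com"
   elementFormDefault="qualified">
   ...
   </xs:schema>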

The fragment

   xmlns:xs="http://www.w3.org/2001/XMLSchema"

indicates that the elements and data types used in the schema come from the "http://www.w3.org/2001/XMLSchema" namespace. It also specifies that the elements and data types that come from the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed with xs:.

The fragment

   targetNamespace="http://www.w3schools.com"

indicates that the elements defined by this schema (note, to, from, heading, body) come from the "http://www.w3schools.com" namespace.

The fragment

   xmlns="http://www.w3schools.com"

indicates that the default namespace is "http://www.w3schools.com".

The fragment

   elementFormDefault="qualified"

indicates that any elements used by the XML instance document which were declared in this schema must be namespace qualified.
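What "namespace qualified" means for an instance document can be seen from the parser's side. The sketch below assumes Python's standard xml.etree.ElementTree module and an instance document that declares "http://www.w3schools.com" as its default namespace; the prefix n used in the lookup is an arbitrary choice for this example.

```python
import xml.etree.ElementTree as ET

# Instance document whose elements are all in the default namespace.
NOTE = """<note xmlns="http://www.w3schools.com">
   <to>Tove</to>
   <from>Jani</from>
   <heading>Reminder</heading>
   <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)
print(root.tag)   # the namespace becomes part of the element name

# Lookups must name the namespace too; the prefix here is local to this code.
ns = {"n": "http://www.w3schools.com"}
print(root.find("n:to", ns).text)   # found: the name is qualified
print(root.find("to"))              # None: an unqualified name does not match
```

ElementTree writes a qualified name as the namespace in braces followed by the local name, so the root tag prints as {http://www.w3schools.com}note.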

Referencing a Schema in an XML Document

Example

xsi:noNamespaceSchemaLocation

The noNamespaceSchemaLocation attribute is very similar to the schemaLocation attribute, in that it allows you to specify the location of a schema for use with your document. However, since a schema is not required to have a targetNamespace, this attribute can be used for schemas that do not have a target namespace.
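The attribute lives in the XML Schema instance namespace, "http://www.w3.org/2001/XMLSchema-instance", conventionally bound to the prefix xsi. As a hedged sketch, assuming Python's standard xml.etree.ElementTree module, the following reads the schema location from an instance document. The root element name family is an assumption made for this example; only the file name family.xsd appears in the slides. The parser does not fetch or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; the root element name is assumed.
DOC = """<family xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="family.xsd">
</family>"""

root = ET.fromstring(DOC)

# Attribute names in a namespace are written as {namespace}localname.
location = root.get("{%s}noNamespaceSchemaLocation" % XSI)
print(location)   # file name of the schema, relative to the document
print(root.tag)   # no namespace on the element itself
```

Because the value is a relative file name, a validating tool would look for the schema in the same folder as the XML file.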

Example

This is the family.xsd file, in the same folder as the XML file. Callouts on the listing: no target namespace; root element.

XML file:

Example

This is the note.xsd file, in the same folder as the XML file:

...

XML file:

Callouts on the listing: root element; target namespace.

Character Encodings

[Example document containing the text "Hello From XML" and "This is an XML document!"]

We can also use UTF-16 or ISO-8859-1.

The Components of XML Schema

The structures that make up an XML Schema are called components, and every XML Schema is essentially a set of components which outline the constraints placed on the content of an XML document.

Documents which contain XML elements and attributes are XML instance documents, and a schema can constrain those instances.

A Simple Element

A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes.

However, the "only text" restriction is quite misleading. The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself.

You can also add restrictions (facets) to a data type in order to limit its content, or you can require the data to match a specific pattern.

Defining a Simple Element

The syntax for defining a simple element is:

   <xs:element name="xxx" type="yyy"/>

where xxx is the name of the element and yyy is the data type of the element.

XML Schema has a lot of built-in data types. The most common types are:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
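To get a feel for what a data type adds to "only text", the common built-in types can be approximated with ordinary conversions. This is a rough sketch in Python, not a schema validator: the Python conversions are more lenient than the XML Schema lexical rules in places (for example, int accepts surrounding white space), and the names CONVERTERS and conforms are invented for this illustration.

```python
from datetime import date, time
from decimal import Decimal, InvalidOperation

def _boolean(text):
    """xs:boolean allows the literals true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(text)

# Approximate Python counterparts of the common built-in types.
CONVERTERS = {
    "xs:string": str,
    "xs:decimal": Decimal,
    "xs:integer": int,
    "xs:boolean": _boolean,
    "xs:date": date.fromisoformat,
    "xs:time": time.fromisoformat,
}

def conforms(text, type_name):
    """Return True if the text converts cleanly to the named type."""
    try:
        CONVERTERS[type_name](text)
        return True
    except (ValueError, InvalidOperation):
        return False

print(conforms("36", "xs:integer"))         # a whole number
print(conforms("Refsnes", "xs:integer"))    # text is not an integer
print(conforms("1970-03-27", "xs:date"))    # year-month-day
print(conforms("27/03/1970", "xs:date"))    # wrong date layout
```

Every value conforms to xs:string, which is why choosing a narrower type such as xs:integer or xs:date is what gives the schema its checking power.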

Here are some XML elements:

   <...>Refsnes</...>
   <...>36</...>
   <...>1970-03-27</...>

And here are the corresponding simple element definitions: