Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka


XML Namespaces

An XML document may contain multiple markup vocabularies

reuse of existing markup, e.g. including HTML markup in some document type

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names


Author A writes a document:

<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
</references>


Author B adds a rating:

<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <rating>5 stars</rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <rating>3 stars</rating>
</references>


Author C also wants to add a rating:

<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <rating>G</rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <rating>PG</rating>
</references>


Author D would like to combine the documents:

<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <rating>5 stars</rating>
  <rating>G</rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <rating>3 stars</rating>
  <rating>PG</rating>
</references>


Which rating? -> different names

<?xml version="1.0"?>
<references>
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa-rating>5 stars</qa-rating>
  <pa-rating>G</pa-rating>
  <name>ABC News</name>
  <link href="http://www.abcnews.com"/>
  <qa-rating>3 stars</qa-rating>
  <pa-rating>PG</pa-rating>
</references>


Namespaces give a disciplined method for naming

<?xml version="1.0"?>
<references xmlns:qa="http://joker.com/2000/star-rating"
            xmlns:pa="http://penguin.xmli.com/2000/review"
            xmlns="http://pineapplesoft.com/1999/ref">
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa:rating>5 stars</qa:rating>
  <pa:rating>G</pa:rating>
  ...
</references>


Namespaces

xmlns:qa="http://joker.com/2000/star-rating"

  qa: the prefix

  http://joker.com/2000/star-rating: the namespace

  a unique name (the URI guarantees this): no need to retrieve anything from the address

xmlns="http://pineapplesoft.com/1999/ref"

  the default namespace

  elements without prefixes belong to this namespace: references, name, link


Namespaces

qa:rating is a qualified name (QName)

scoping: a namespace declaration is valid for the element where it is declared and for all the elements within its content (unless an inner element redeclares the same prefix)
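The role of the prefix and of the default namespace can be checked with a namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the choice of Python and the prefix `stars` are assumptions made for illustration and are not part of the lecture material. ElementTree reports every element name in the form `{namespace URI}local name`, which shows that only the URI matters, not the prefix.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<references xmlns:qa="http://joker.com/2000/star-rating"
            xmlns:pa="http://penguin.xmli.com/2000/review"
            xmlns="http://pineapplesoft.com/1999/ref">
  <name>Macmillan</name>
  <link href="http://www.mcp.com"/>
  <qa:rating>5 stars</qa:rating>
  <pa:rating>G</pa:rating>
</references>"""

root = ET.fromstring(DOC)

# Unprefixed elements fall into the default namespace.
print(root.tag)

# The two rating elements share a local name but differ in namespace.
for child in root:
    print(child.tag, "->", child.text)

# Lookup goes by URI; "stars" is an arbitrary (hypothetical) prefix chosen here,
# different from the "qa" prefix used inside the document.
ns = {"stars": "http://joker.com/2000/star-rating"}
star_rating = root.find("stars:rating", ns).text
print(star_rating)
```

Looking the element up under a different prefix than the one used in the document works because the parser compares namespace URIs only.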


Scoping

<?xml version="1.0"?>
<ref:references xmlns:ref="http://pineapplesoft.com/1999/ref">
  <ref:name>Macmillan</ref:name>
  <ref:link href="http://www.mcp.com"/>
  <pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>
  <ref:name>ABC News</ref:name>
  <ref:link href="http://www.abcnews.com"/>
  <qa:rating xmlns:qa="http://joker.com/2000/star-rating">3 stars</qa:rating>
</ref:references>
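A prefix declared on one element is not visible to its siblings. As a rough illustration (a sketch in Python with the standard ElementTree parser; the second document and the helper name `parses_with_namespaces` are hypothetical), a prefix used outside the scope of its declaration makes a namespace-aware parser reject the document:

```python
import xml.etree.ElementTree as ET

# The pa prefix is declared on the rating element itself, so it is in scope.
IN_SCOPE = (
    '<references xmlns="http://pineapplesoft.com/1999/ref">'
    '<pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>'
    '</references>'
)

# Hypothetical variant: the second pa:rating is a sibling of the element that
# carries the declaration, so the prefix is unbound there.
OUT_OF_SCOPE = (
    '<references xmlns="http://pineapplesoft.com/1999/ref">'
    '<pa:rating xmlns:pa="http://penguin.xmli.com/2000/review">G</pa:rating>'
    '<pa:rating>PG</pa:rating>'
    '</references>'
)

def parses_with_namespaces(text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses_with_namespaces(IN_SCOPE))
print(parses_with_namespaces(OUT_OF_SCOPE))
```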


Namespaces and DTD

XML 1.0 DTDs are not namespace-aware

all the elements and attributes that are in some namespace have to be declared using the corresponding prefix

for elements with the prefix 'pre', an attribute 'xmlns:pre' has to be declared


Namespaces and DTD

<?xml version="1.0"?>
<!DOCTYPE ref:references [
<!ELEMENT ref:references (ref:name, ref:link, (pa:rating | qa:rating)*)+>
<!ATTLIST ref:references xmlns:ref CDATA #REQUIRED>
<!ELEMENT ref:name (#PCDATA)>
<!ELEMENT ref:link EMPTY>
<!ATTLIST ref:link href CDATA #REQUIRED>
<!ELEMENT pa:rating (#PCDATA)>
<!ATTLIST pa:rating xmlns:pa CDATA #REQUIRED>
<!ELEMENT qa:rating (#PCDATA)>
<!ATTLIST qa:rating xmlns:qa CDATA #REQUIRED>
]>


DTD: external and internal subsets

external and internal subset make up the DTD; internal has higher precedence

syntax: <!DOCTYPE root-type-name SYSTEM "ex.dtd" [ <!-- internal subset may come here --> ]>

the external subset is in the file ex.dtd

internal subset may declare new elements (with attributes) or new attributes for existing elements

namespaces facilitate the control of name conflicts


Namespaces and XML Schema

An XML Schema document contains declarations of the namespaces that are used in the document, e.g. xmlns:xsd="http://www.w3.org/2001/XMLSchema" for the elements with special XML Schema semantics

Target namespace: roughly, the definitions and declarations included in this schema define the names in this namespace, e.g. targetNamespace="uri:mywork"


Namespaces and XML Schema

In XML Schema, schema components from different target namespaces can be used together

-> enables the schema validation of instance content defined across multiple namespaces


XML Information set

An XML document’s information set consists of a number of information items

an information item is an abstract description of some part of an XML document mainly to be used in other specifications

each information item has a set of associated named properties


XML Information set

Tree structure provided by the processor (no special interface is specified)

e.g. entities expanded to their replacement text, attributes with their default values

properties: e.g. for each element its child elements and attributes


Information items

document information item

element information items

attribute information items

processing instruction information items

unexpanded entity reference information items

character information items


Information items (cont.)

comment information items

document type declaration information item

unparsed entity information items

notation information items

namespace information items


Example: document information item

There is exactly one document information item in the information set

all information items are accessible from the properties of the document information item, either directly or indirectly through the properties of other information items


Example: document information item

Properties: children, document element, notations, unparsed entities, base URI, character encoding scheme, standalone, version, all declarations processed


Example: element information items

There is an element information item for each element appearing in the XML document

one of the element information items is the value of the document element property of the document information item (root element)

all other element information items are accessible recursively


Example: element information items

An element information item has the following properties: namespace name, local name, prefix, children, attributes, namespace attributes, in-scope namespaces, base URI, parent


Example

<?xml version="1.0"?>
<msg:message doc:date="19990421"
             xmlns:doc="http://doc.example.org/namespaces/doc"
             xmlns:msg="http://message.example.org/"
>Phone home!</msg:message>


The information set for the sample document

a document information item

an element information item with namespace name "http://message.example.org/", local name "message", and prefix "msg"


The information set for the sample document (cont.)

an attribute information item with the namespace name "http://doc.example.org/namespaces/doc", local name "date", prefix "doc", and normalized value "19990421"

three namespace information items for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces


The information set for the sample document (cont.)

Two attribute information items for the namespace attributes

eleven character information items for the character data
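Some of these items can be observed through an ordinary parser API. The sketch below uses Python's standard ElementTree, which is an assumption for illustration; it gives only a partial view of the information set, since ElementTree exposes neither the prefix nor the namespace attributes.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<msg:message doc:date="19990421"
    xmlns:doc="http://doc.example.org/namespaces/doc"
    xmlns:msg="http://message.example.org/"
>Phone home!</msg:message>"""

root = ET.fromstring(DOC)

# Element information item: namespace name and local name.
# ElementTree stores them together as "{namespace name}local name".
namespace_name, local_name = root.tag[1:].split("}")

# Attribute information item, keyed by "{namespace name}local name".
# The xmlns:* declarations do not show up here.
attributes = dict(root.attrib)

# Character information items: one per character of the content.
character_count = len(root.text)

print(namespace_name, local_name, attributes, character_count)
```

The character count agrees with the eleven character information items listed for "Phone home!".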


XML 1.0 reporting requirements

For instance:

an XML processor must always provide all characters in a document that are not part of markup to the application

a validating XML processor must inform the application which of the character data in a document is white space appearing within element content

an XML processor must normalize line-ends to LF before passing them to the application


XML 1.0 reporting requirements (cont.)

A validating XML processor must include the replacement text of an entity in place of an entity reference

an XML processor must supply the default value of attributes declared in the DTD for a given element type but not appearing in the element’s start tag
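Three of these requirements can be observed directly in what a parser hands to the application. The following is a small sketch with Python's standard ElementTree (built on the non-validating Expat parser, which still reads the internal DTD subset); the element name `note`, the entity `who` and the attribute `lang` are made up for this example.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<!DOCTYPE note [\n'
    '  <!ENTITY who "world">\n'
    '  <!ATTLIST note lang CDATA "en">\n'
    ']>\n'
    # The content mixes CR-LF and bare CR line ends on purpose.
    '<note>hello &who;\r\nsecond line\rthird line</note>'
)

root = ET.fromstring(DOC)

# Entity reference replaced by its replacement text, line ends normalized to LF.
text = root.text

# Default value from the DTD supplied although the start tag has no attribute.
lang = root.get("lang")

print(repr(text), lang)
```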


What is not in the information set?

For instance,

the document type name

the difference between the two forms of an empty element: <foo/> and <foo></foo>

the order of attributes within a start-tag

white space within start-tags (other than significant white space in attribute values) and end-tags

the difference between CR, CR-LF, and LF line termination
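Because these differences are not part of the information set, two documents that differ only in them look identical to an application. A minimal sketch with Python's standard ElementTree (the helper `summary` is a hypothetical name used only here):

```python
import xml.etree.ElementTree as ET

def summary(text):
    """Reduce a one-element document to (tag, attributes, text, child count)."""
    elem = ET.fromstring(text)
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))

# Same element written in two ways: different empty-element form,
# different attribute order, different white space inside the tags.
a = summary('<foo a="1" b="2"/>')
b = summary('<foo   b="2"\n     a="1"  ></foo  >')

same = (a == b)
print(same)
```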


XML Schema

DTDs have drawbacks:

  they can only define the element structure and attributes

  they cannot define any database-like constraints for elements:
    value (min, max, etc.)
    type (integer, string, etc.)

  DTDs are not written in XML and thus cannot be processed with the same tools as XML documents, XSL(T), etc.

  it is difficult to combine different vocabularies (namespaces)

XML Schemas:

  are written in XML

  avoid most of the DTD drawbacks


XML Schema

XML Schema Part 1: Structures:

  element structure definition as with DTDs: elements, attributes, also enhanced ways to control structures

XML Schema Part 2: Datatypes:

  primitive datatypes (string, boolean, float, etc.)

  derived datatypes, derived from primitive datatypes (integer, positiveInteger, etc.)

  constraining facets for each datatype (minLength, maxLength, pattern, totalDigits, etc.)

The following is based on:

  XML Schema Part 0: Primer (2.5.2001)


Reminder: DTD declarations

<!ELEMENT name (fname+, lname)>

<!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>

<!ELEMENT contact (address, phone*, email?)>

<!ELEMENT fname (#PCDATA)>


A sample document

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>


Continues...

  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>


… continues

  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>


DTD

<!ELEMENT purchaseOrder (shipTo, billTo, comment?, items) >

<!ATTLIST purchaseOrder orderDate CDATA #REQUIRED>

<!ELEMENT shipTo (name, street, city, state, zip)>

<!ATTLIST shipTo country CDATA #REQUIRED>

<!ELEMENT billTo (name, street, city, state, zip)>

<!ATTLIST billTo country CDATA #REQUIRED>

<!ELEMENT comment (#PCDATA)>

<!ELEMENT items (item+)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT street (#PCDATA)>


DTD continues

<!ELEMENT city (#PCDATA)>

<!ELEMENT state (#PCDATA)>

<!ELEMENT zip (#PCDATA)>

<!ELEMENT item (productName, quantity, USPrice, (comment | shipDate))>

<!ATTLIST item partNum CDATA #REQUIRED>

<!ELEMENT productName (#PCDATA)>

<!ELEMENT quantity (#PCDATA)>

<!ELEMENT USPrice (#PCDATA)>

<!ELEMENT shipDate (#PCDATA)>


Complex and simple types

Schema defines types for elements and attributes

complex types: allow elements in their content and may have attributes

simple types: cannot have element content and cannot have attributes

elements can have complex or simple types, attributes can have simple types


XML Schema: structure

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation> … </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence> … </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
</xsd:schema>


USAddress type

<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string" />
    <xsd:element name="street" type="xsd:string" />
    <xsd:element name="city" type="xsd:string" />
    <xsd:element name="state" type="xsd:string" />
    <xsd:element name="zip" type="xsd:decimal" />
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US" />
</xsd:complexType>


PurchaseOrderType

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
    <xsd:element ref="comment" minOccurs="0" />
    <xsd:element name="items" type="Items" />
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>


Shared types, references

element declarations for shipTo and billTo associate different element names with the same complex type

attribute declarations must reference simple types

element comment declared on the top level of the schema (here reference only)


Occurrence constraints

minOccurs, maxOccurs (defaults: 1)

minOccurs: minimum number of times an element may appear

an element is optional if minOccurs = 0

maxOccurs: maximum number of times an element may appear

attributes may appear once or not at all


Attributes use, default and fixed (in attribute declarations)

The attribute "use" is used in an attribute declaration to indicate whether the attribute is 'required', 'optional' or 'prohibited'

a default value may be provided only if use is 'optional'; if the instance does not give a value, the default is used


Attributes use, default and fixed (in attribute declarations)

The attribute "fixed": if the attribute appears in the instance, its value must equal the value of "fixed"; if an optional attribute is absent, the fixed value is supplied

<xsd:attribute name="temp1" type="xsd:decimal" use="optional" default="37" />

<xsd:attribute name="temp2" type="xsd:decimal" use="optional" fixed="37" />

<xsd:attribute name="temp2" type="xsd:decimal" use="required" fixed="37" />


Items

<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string" />
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="USPrice" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="Sku" use="required"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>


Anonymous type definitions

Schemas can be constructed by defining sets of named types such as PurchaseOrderType on the top level and then declaring elements such as purchaseOrder

if a type is used only once, it is more compactly defined as an anonymous type


Anonymous type definitions

You can define an anonymous type by omitting 'type=' in an element declaration and placing an unnamed (simple or complex) type definition inside the element declaration

see the Items type definition


Global elements and attributes

Global elements and attributes have declarations that appear as the children of the schema element

global elements and attributes can be referenced in one or more declarations using the ref attribute


Global elements and attributes

global elements can appear in the instance document in the place where they have been referenced, or at the top level of the document

global declarations cannot contain references

global declarations cannot contain occurrence constraints


Simple types

Built-in types

  e.g. string, integer, positiveInteger, decimal, float, boolean, time, date, gDay, anyURI, language, ID, IDREF

  must have the XML Schema namespace prefix

Derived types

  derived from built-in and other derived types by defining restrictions to the base type

  each base type has a set of facets that can be used for restrictions


Facets

XML Schema defines 12 constraining facets

e.g. string has facets: length, minLength, maxLength, pattern, enumeration

e.g. integer has facets: pattern, enumeration, maxInclusive, maxExclusive, minInclusive, minExclusive, totalDigits, fractionDigits


Defining a new type of integer

<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>

New type whose range of values is between 10000 and 99999


Patterns

<xsd:simpleType name="Sku">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

”three digits followed by a hyphen followed by two upper-case ASCII letters”
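A pattern facet always has to match the whole value, not just a part of it. The following sketch mimics the Sku check with Python's `re` module; this is only an approximation for illustration (regular expression dialects differ in general, although for this simple pattern they agree), and `is_sku` is a hypothetical helper, not something a schema processor provides.

```python
import re

# XML Schema patterns are implicitly anchored at both ends,
# so fullmatch is used instead of search or match.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")

def is_sku(value):
    """Rough stand-in for the pattern facet of the Sku type."""
    return SKU_PATTERN.fullmatch(value) is not None

for candidate in ["872-AA", "926-AA", "872-AAA", "72-AA", "872-aa"]:
    print(candidate, is_sku(candidate))
```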


Enumeration facet

<xsd:simpleType name="USState">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AK"/>
    <xsd:enumeration value="AL"/>
    <xsd:enumeration value="AR"/>
    <!-- and so on -->
  </xsd:restriction>
</xsd:simpleType>

Limits values to a set of distinct values


List types

A list type consists of a sequence of values of a simple type

<xsd:element name="listOfMyInt" type="listOfMyIntType"/>

<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>

instance:

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>


Union types

The type can be chosen from a set:

<xsd:element name="zips" type="zipUnion"/>

<xsd:simpleType name="zipUnion">
  <xsd:union memberTypes="USState listOfMyIntType"/>
</xsd:simpleType>

instances:

<zips>CA</zips>

<zips>95630 95977 95945</zips>
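How the range restriction, the list type and the union type combine can be sketched in plain Python. This is a hand-written approximation of the checks a schema validator would perform, not a validator; all function names are hypothetical, and the state set contains only the values shown on the slides plus "CA", which is assumed to be in the elided part of the enumeration because the instance uses it.

```python
import re

# Values shown on the slides; the full state list is elided there, and "CA"
# is added only because the instance example uses it.
US_STATES = {"AK", "AL", "AR", "CA"}

INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def is_my_integer(token):
    """myInteger: an integer between 10000 and 99999 inclusive."""
    if INTEGER_LEXICAL.fullmatch(token) is None:
        return False
    return 10000 <= int(token) <= 99999

def is_list_of_my_int(value):
    """listOfMyIntType: white-space separated myInteger items."""
    return all(is_my_integer(token) for token in value.split())

def is_zip_union(value):
    """zipUnion: valid if any member type accepts the value."""
    return value in US_STATES or is_list_of_my_int(value)

print(is_zip_union("CA"))
print(is_zip_union("95630 95977 95945"))
print(is_zip_union("95630 123"))
```

The union is checked member by member: a value is accepted as soon as one of the member types accepts it.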


Element content

How to define attributes for elements with simple type content?

in the instance: <internationalPrice currency="EUR">423.45</internationalPrice>

in the sample schema, <xsd:element name="USPrice" type="xsd:decimal"/> comes close

but simple types cannot have attributes -> a complex type has to be defined


Element content

A new complex type is derived from the type decimal

<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="xsd:string" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>


Mixed content

Element contains both character data and subelements

<letterBody>
  <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
  Your order of <quantity>1</quantity> <productName>Baby Monitor</productName>
  shipped from our warehouse on <shipDate>1999-05-21</shipDate> …
</letterBody>


Mixed content

<xsd:element name="letterBody">
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="salutation">
        <xsd:complexType mixed="true">
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
      …
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
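On the processing side, mixed content means that character data appears between the child elements. As a rough sketch (Python's standard ElementTree, chosen here only for illustration; the letter is a shortened single-line version of the instance above, ended with a full stop where the slide elides the rest), text before the first child is found in `.text` and text following a child in that child's `.tail`:

```python
import xml.etree.ElementTree as ET

DOC = (
    "<letterBody>"
    "<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>"
    "Your order of <quantity>1</quantity> "
    "<productName>Baby Monitor</productName> shipped from our warehouse on "
    "<shipDate>1999-05-21</shipDate>."
    "</letterBody>"
)

root = ET.fromstring(DOC)
salutation = root.find("salutation")

# Character data interleaved with the subelements of letterBody.
pieces = [(child.tag, child.text, child.tail) for child in root]

# All character data in document order, with the markup removed.
full_text = "".join(root.itertext())

print(salutation.text, "|", salutation.tail)
print(full_text)
```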


Empty content

Assume we want the internationalPrice element to have both the unit of currency and the price as attribute values:

<internationalPrice currency="EUR" value="423.45" />

i.e. the element has no content

solution: no elements defined in the content model


Empty content

<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="currency" type="xsd:string" />
        <xsd:attribute name="value" type="xsd:decimal" />
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>


Shorthand for empty complex type

<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:attribute name="currency" type="xsd:string" />
    <xsd:attribute name="value" type="xsd:decimal" />
  </xsd:complexType>
</xsd:element>


anyType

The anyType seen in the definition for an empty content model represents an abstraction which is the base type from which all simple and complex types are derived

anyType does not constrain its content in any way

it can be used like other types

it is the default if no type is specified:

<xsd:element name="anything" />


Building content models

<xsd:sequence>: fixed order

<xsd:choice>: (1) choice of alternatives

<xsd:group>: grouping (also named)

<xsd:all>: no order specified


Nested choice and sequence groups

<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:choice>
      <xsd:group ref="shipAndBill" />
      <xsd:element name="singleUSAddress" type="USAddress" />
    </xsd:choice>
    <xsd:element name="items" type="Items" />
  </xsd:sequence>
</xsd:complexType>


Nested choice and sequence groups

<xsd:group name="shipAndBill">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
  </xsd:sequence>
</xsd:group>


An ’all’ group

An all group: all the elements in the group may appear once or not at all, and they may appear in any order

limited to the top level of any content model

has to be the only child at the top

the group's children must all be individual elements (no groups), and no element in the content model may appear more than once


An ’all’ group

<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress" />
    <xsd:element name="billTo" type="USAddress" />
    <xsd:element ref="comment" minOccurs="0" />
    <xsd:element name="items" type="Items" />
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date" />
</xsd:complexType>


Attribute groups

Attribute definitions can also be grouped and named

<xsd:element name="item">
  <xsd:complexType>
    <xsd:sequence> … </xsd:sequence>
    <xsd:attributeGroup ref="ItemDelivery" />
  </xsd:complexType>
</xsd:element>

<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum" type="Sku" />
  …
</xsd:attributeGroup>


Namespaces and XML Schema

An XML Schema document contains declarations of the namespaces that are used in the document, e.g. xmlns:xsd="http://www.w3.org/2001/XMLSchema" for the elements with special XML Schema semantics

Target namespace: roughly, the definitions and declarations included in this schema define the names in this namespace, e.g. targetNamespace="uri:mywork"


Namespaces and XML Schema

In XML Schema, schema components from different target namespaces can be used together

-> enables the schema validation of instance content defined across multiple namespaces


Importing schema declarations

Every top-level schema component is associated with a target namespace (or, explicitly, with none, if the target namespace is not defined)

a component may refer to another component that is in a different namespace, using an import element


Import

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="uri:mywork"
        xmlns:my="uri:mywork">
  <import namespace="http://www.w3.org/1999/xhtml"/>
  …
  <complexType name="myType">
    <sequence>
      <element ref="html:p" minOccurs="0"/>
    </sequence>
    …
  </complexType>
  <element name="myElt" type="my:myType"/>
</schema>


Type libraries

As XML schemas become more widespread, schema authors will want to create simple and complex types that can be shared and used as the basic building blocks for building new schemas

XML Schemas already provide types that play this role: the simple types

other examples: currency, units of measurement, business addresses


Example: currencies

<schema targetNamespace="http://www.example.com/Currency"
        xmlns:c="http://www.example.com/Currency"
        xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="Currency">
    <simpleContent>
      <extension base="decimal">
        <attribute name="name">
          <simpleType>
            <restriction base="string">
              <enumeration value="AED" />
              <enumeration value="AFA" />
              <enumeration value="ALL" />
              …


Extending content models

Mixed content models

  an element can contain, in addition to subelements, also arbitrary character data

import

  an element can contain elements whose types are imported from external namespaces

  e.g. this element may contain an HTML p element here

more flexible way: any element, any attribute


Example

<purchaseReport xmlns="http://www.example.com/Report">
  <regions> <!-- part sales by regions --> </regions>
  <parts> <!-- part descriptions --> </parts>
  <htmlExample>
    <table xmlns="http://www.w3.org/1999/xhtml" border="0" width="100%">
      <tr>
        <th align="left">Zip Code</th>
        <th align="left">Part Number</th>
        <th align="left">Quantity</th>
      </tr>
      <tr><td>95819</td><td> </td><td> </td></tr>
      <tr><td> </td><td>872-AA</td><td>1</td></tr>
      ...


Including an HTML table

To permit the appearance of HTML in the instance document we modify the report schema by declaring the content of the element htmlExample by the any element

in general, an any element specifies that any well-formed XML is permissible in a type’s content model

in the example, we require the XML to belong to the namespace http://www.w3.org/1999/xhtml -> the XML should be XHTML


Schema declaration with any

<element name="purchaseReport">
  <complexType>
    <sequence>
      <element name="regions" type="r:RegionsType"/>
      <element name="parts" type="r:PartsType"/>
      <element name="htmlExample">
        <complexType>
          <sequence>
            <any namespace="http://www.w3.org/1999/xhtml"
                 minOccurs="1" maxOccurs="unbounded"
                 processContents="skip"/>
          </sequence>
          ...


Schema validation

The attribute processContents

  skip: no validation

  strict: an XML processor is obliged to obtain the schema associated with the required namespace and validate the HTML appearing within the htmlExample element
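With processContents="skip" the only thing left to check for the wildcard is the namespace of the elements that appear directly in its place (and the occurrence constraints). The following Python sketch imitates that single check by hand; it is an illustration under these assumptions and not a schema validator, and the function name `html_example_ok` as well as the shortened instance are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
REPORT = "http://www.example.com/Report"

DOC = f"""<purchaseReport xmlns="{REPORT}">
  <htmlExample>
    <table xmlns="{XHTML}" border="0" width="100%">
      <tr><th align="left">Zip Code</th></tr>
    </table>
  </htmlExample>
</purchaseReport>"""

def html_example_ok(text):
    """Check only the namespace constraint of the any wildcard (as with skip)."""
    root = ET.fromstring(text)
    holder = root.find(f"{{{REPORT}}}htmlExample")
    if holder is None:
        return False
    children = list(holder)
    # minOccurs="1": at least one child; each one must be in the XHTML namespace.
    return len(children) >= 1 and all(
        child.tag.startswith(f"{{{XHTML}}}") for child in children
    )

print(html_example_ok(DOC))
print(html_example_ok(DOC.replace(XHTML, "urn:x-example:other")))
```

With "strict", a validator would additionally have to locate a schema for the XHTML namespace and validate the table against it, which this sketch does not attempt.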


anyAttribute

<element name="htmlExample">
  <complexType>
    <sequence>
      <any namespace="http://www.w3.org/1999/xhtml"
           minOccurs="1" maxOccurs="unbounded"
           processContents="skip"/>
    </sequence>
    <anyAttribute namespace="http://www.w3.org/1999/xhtml"/>
  </complexType>
</element>