8/14/2019 Session03 XML Validation DTD
2008 MindTree Consulting
XML Validation
DTD
Sep-2009
Agenda
Introduction to XML Validation
DTD
XML Schema
XML Validation
An Introduction to XML Validation
One of the important innovations of XML is the ability to place preconditions on the data that programs read, and to do this in a
simple, declarative way.
XML allows you to say, for example:
  that every Order element must contain exactly one Customer element,
  that each Customer element must have an id attribute that contains an XML name token,
  that every ShipTo element must contain one or more Street elements, one City, one State, and one Zip,
and so forth.
Checking an XML document against this list of conditions is called
validation.
Validation is an optional step but an important one.
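The three conditions listed above can be written down as DTD declarations. The following is a minimal sketch, not taken from the slides: it assumes that ShipTo is a child of Order placed after Customer, that the leaf elements hold plain text, and it uses made-up sample data.

```xml
<?xml version="1.0"?>
<!DOCTYPE Order [
  <!-- Exactly one Customer, followed by one ShipTo (the placement of ShipTo is an assumption) -->
  <!ELEMENT Order    (Customer, ShipTo)>
  <!ELEMENT Customer (#PCDATA)>
  <!-- The id attribute is mandatory and must be a name token -->
  <!ATTLIST Customer id NMTOKEN #REQUIRED>
  <!-- One or more Street elements, then exactly one City, State, and Zip -->
  <!ELEMENT ShipTo   (Street+, City, State, Zip)>
  <!ELEMENT Street   (#PCDATA)>
  <!ELEMENT City     (#PCDATA)>
  <!ELEMENT State    (#PCDATA)>
  <!ELEMENT Zip      (#PCDATA)>
]>
<Order>
  <Customer id="c1032">Example Customer</Customer>
  <ShipTo>
    <Street>1 Main Street</Street>
    <City>Springfield</City>
    <State>IL</State>
    <Zip>62701</Zip>
  </ShipTo>
</Order>
```

A validating parser would reject this document if, for instance, the id attribute were removed or a second Customer were added. One way to check it, assuming libxml2's xmllint is installed and the file is saved as order.xml (a hypothetical name), is `xmllint --valid --noout order.xml`.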
Validation
There are many reasons and opportunities to validate an XML document:
  when we receive one, before importing its data into a legacy system,
  when we have produced or hand-edited one,
  to test the output of an application, and so on.
Validation as a firewall. Validators can:
  serve as actual firewalls when we receive documents from the external world
  (as is commonly the case with Web Services and other XML communications),
  provide checkpoints when we design processes as pipelines of transformations.
Validation can take place at several levels:
  structural validation,
  data validation.
Schema Languages
There is more than one language in which you can express such validation conditions. Generically, these are called schema
languages, and the documents that list the constraints are called
schemas.
Different schema languages have different strengths and
weaknesses.
The document type definition (DTD) is the only schema language
built into most XML parsers and endorsed as a standard part of XML.
The W3C XML Schema Language ("schemas" for short, though it's
hardly the only schema language) addresses several limitations of
DTDs.
Many other schema languages have been invented that can easily
be integrated with your systems.
Document Type Definition (DTD)
Document Type Definition (DTD)
XML 1.0 included a set of tools for defining XML document structures, called Document Type Definitions (DTDs).
A DTD focuses on the element structure of a document. It says what
elements a document may contain, what each element may and must
contain and in what order, and what attributes each element has. DTDs also provide:
  a way to define reusable content (entities),
  a way to declare some kinds of metadata (notations),
  a mechanism for supplying default values for attributes.
DTDs serve two general purposes:
  They provide the syntax for describing and constraining the logical structure of
  a document (element and attribute declarations are used for this).
  They provide the syntax for composing a logical document from physical
  entities (entity and notation declarations are used for this).
DTD Declarations
DTDs contain several types of declarations: DOCTYPE, ENTITY, NOTATION, ELEMENT, and ATTLIST.
The DOCTYPE declaration is the container for all other DTD
declarations.
The document type declaration is placed in the instance
document's prolog, after the XML declaration but before the root
element's start-tag, to associate the document with a set of
declarations.
The name of the DOCTYPE must be the same as the name of the
document's root element.
Example:
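The slide's own example markup did not survive, so here is a minimal sketch of where a DOCTYPE declaration sits and how its name relates to the root element. The element name person and the sample text are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<!-- The DOCTYPE sits in the prolog: after the XML declaration, before the root start-tag -->
<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
]>
<!-- The root element name is the same as the DOCTYPE name -->
<person>Billy Bob</person>
```

If the root element were renamed without renaming the DOCTYPE, the document would still be well-formed but no longer valid.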
DOCTYPE Syntax
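A DOCTYPE declaration may contain internal declarations (referred to as the internal DTD subset), may refer to declarations in external files
(referred to as the external DTD subset), or may use a combination
of both techniques. The general forms are:
  <!DOCTYPE rootName [ internal declarations ]>
  <!DOCTYPE rootName SYSTEM "systemId" [ internal declarations ]>
  <!DOCTYPE rootName PUBLIC "publicId" "systemId" [ internal declarations ]>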
Internal Declarations
The simplest way to define a DTD is through internal declarations. In this case, all
declarations are simply placed between the opening and closing square brackets of the DOCTYPE declaration. The obvious
downside to this approach is that you can't reuse the declarations across different
XML document instances.
Example: a document that carries its declarations in the internal subset and holds the values Billy Bob and 33.
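A minimal sketch of a document with an internal subset follows. Only the values Billy Bob and 33 come from the slide; the element names person, name, and age are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- Internal subset: everything between the square brackets -->
  <!ELEMENT person (name, age)>
  <!ELEMENT name   (#PCDATA)>
  <!ELEMENT age    (#PCDATA)>
]>
<person>
  <name>Billy Bob</name>
  <age>33</age>
</person>
```

Because the declarations live inside this one file, a second person document would have to repeat them, which is the reuse problem described above.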
External Declarations
DOCTYPE can also contain a reference to an external resource containing the declarations. This type of declaration is useful
because it allows you to reuse the declarations in multiple
document instances.
The DOCTYPE declaration references the external resource through
public and system identifiers.
A system identifier is a URI that identifies the location of the
resource; a public identifier is a location-independent identifier.
Processors can use the public identifier to determine how to retrieve the
physical resource if necessary. The PUBLIC token identifies a public
identifier followed by a backup system identifier.
Using external declarations examples
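Using external declarations (public identifier): the document, with the values Billy Bob and 33, names its DTD through the PUBLIC token, a public identifier, and a backup system identifier.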
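A minimal sketch of the public-identifier form follows. The public identifier, the URL, and the element names are all hypothetical; the slide's actual identifiers were lost.

```xml
<?xml version="1.0"?>
<!-- PUBLIC: the public identifier comes first, then the backup system identifier -->
<!DOCTYPE person PUBLIC "-//Example//DTD Person 1.0//EN"
                        "http://example.com/dtds/person.dtd">
<person>
  <name>Billy Bob</name>
  <age>33</age>
</person>
```

A processor that recognizes the public identifier (for example through a catalog) may load a local copy of the DTD; otherwise it falls back to the system identifier.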
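Using external declarations (system identifier): the document, with the values Billy Bob and 33, names its DTD through the SYSTEM token and a system identifier.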
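A minimal sketch of the system-identifier form follows. The file name person.dtd and the element names are assumptions; person.dtd is assumed to sit next to the document and to declare person, name, and age.

```xml
<?xml version="1.0"?>
<!-- SYSTEM: a URI (here a relative one) that locates the external subset -->
<!DOCTYPE person SYSTEM "person.dtd">
<person>
  <name>Billy Bob</name>
  <age>33</age>
</person>
```

Any number of person documents can point at the same person.dtd, which is what makes external declarations reusable.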
Internal and external declarations
A DOCTYPE declaration can also combine internal and external declarations.
This is useful when you've decided to use external declarations but need to extend them
or override certain external declarations.
Note: only ENTITY and ATTLIST declarations may be overridden.
Example: a document, with the values Billy Bob and 33, that references an external subset and adds declarations in its internal subset.
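A minimal sketch of combining both subsets follows. It assumes a hypothetical person.dtd that declares person, name, and age; the status attribute is invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd" [
  <!-- The internal subset is read before the external one, so this declaration
       takes precedence over any declaration of the same attribute in person.dtd -->
  <!ATTLIST person status CDATA "active">
]>
<person>
  <name>Billy Bob</name>
  <age>33</age>
</person>
```

If person.dtd does not mention status at all, the internal declaration simply extends the vocabulary; if it does, the internal one wins because the first declaration of an attribute is the binding one.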
An ELEMENT declaration (<!ELEMENT name content-model>) defines an element of the specified name with the specified content model. The content model defines the element's allowed
children.
Content Model Basics
Syntax                Description
ANY                   Any child is allowed within the element.
EMPTY                 No children are allowed within the element.
(#PCDATA)             PCDATA stands for parsed character data and means the element can contain text.
(child1,child2,...)   Only the specified children, in the order given, are allowed within the element.
(child1|child2|...)   Only one of the specified children is allowed within the element.
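The content models in the table can be seen side by side in one small DTD. This is an illustrative sketch; the memo vocabulary and its sample data are invented and do not come from the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- Sequence containing a choice: title, then email or phone, then separator, then notes -->
  <!ELEMENT memo      (title, (email | phone), separator, notes)>
  <!-- Text-only elements -->
  <!ELEMENT title     (#PCDATA)>
  <!ELEMENT email     (#PCDATA)>
  <!ELEMENT phone     (#PCDATA)>
  <!-- EMPTY: no text and no children -->
  <!ELEMENT separator EMPTY>
  <!-- ANY: text and any declared elements, in any order -->
  <!ELEMENT notes     ANY>
]>
<memo>
  <title>Call back</title>
  <phone>555-0100</phone>
  <separator/>
  <notes>Free text, optionally with a nested <title>declared element</title>.</notes>
</memo>
```

Swapping phone for email keeps the document valid; supplying both, or neither, does not.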
Elements - Examples
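Element and text content models: a list of people in which each record is built from child elements that hold text. The surviving values are Billy, Smith, 43, 0.1 for the first record and Jill, J, Smith, 21 for the second.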
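A sketch of how element and text content models could describe such records follows. All element names are assumptions, the occurrence indicators (+ for one or more, ? for optional) are standard DTD syntax, and the value 0.1 is left out because its meaning cannot be recovered from the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!-- Element content: one or more person children -->
  <!ELEMENT people (person+)>
  <!-- Element content: a fixed sequence in which middle is optional -->
  <!ELEMENT person (first, middle?, last, age)>
  <!-- Text content -->
  <!ELEMENT first  (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last   (#PCDATA)>
  <!ELEMENT age    (#PCDATA)>
]>
<people>
  <person><first>Billy</first><last>Smith</last><age>43</age></person>
  <person><first>Jill</first><middle>J</middle><last>Smith</last><age>21</age></person>
</people>
```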
Mixed content model
<!DOCTYPE p SYSTEM "p.dtd">
<p>This is an example of mixed
content!</p>
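Mixed content lets text and child elements be interleaved. The sketch below declares it inline instead of in p.dtd; the child elements b and i are assumptions, since the inline markup of the original example was lost.

```xml
<?xml version="1.0"?>
<!DOCTYPE p [
  <!-- Mixed content: #PCDATA must come first, alternatives are separated by |,
       and the group must end with * -->
  <!ELEMENT p (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<p>This is an <i>example</i> of <b>mixed</b> content!</p>
```

A mixed content model constrains which children may appear but not their order or number.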
<!ATTLIST eName aName1 aType default aName2 aType default ...>
Attribute types: attribute types make it possible to constrain the attribute value in different ways.
See the following list of type identifiers for details.
Type        Description
CDATA       Arbitrary character data
ID          A name that is unique within the document
IDREF       A reference to an ID value in the document
IDREFS      A space-delimited list of IDREF values
ENTITY      The name of an unparsed entity declared in the DTD
ENTITIES    A space-delimited list of ENTITY values
NMTOKEN     A name token: a word made up only of XML name characters, without spaces
NMTOKENS    A space-delimited list of NMTOKEN values
Default declarations: after the attribute type, you must specify either a default value for the
attribute or a keyword that specifies whether it is
required.
Declaration       Description
"value"           Default value for the attribute. If the attribute is not explicitly used on the given element, it will still exist in the logical document with the specified default value.
#REQUIRED         Attribute is required on the given element.
#IMPLIED          Attribute is optional on the given element.
#FIXED "value"    Attribute always has the specified fixed value.
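The four kinds of default declaration can be compared in a single ATTLIST. This is an illustrative sketch; the item element and its attributes are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!-- sku must be written; note may be omitted; currency falls back to "USD";
       unit is always "each" -->
  <!ATTLIST item
    sku      CDATA #REQUIRED
    note     CDATA #IMPLIED
    currency CDATA "USD"
    unit     CDATA #FIXED "each">
]>
<!-- A parser that reads the DTD reports currency="USD" and unit="each" on this element -->
<item sku="A-100">Widget</item>
```

Writing unit="box" would make the document invalid, because a #FIXED attribute may only carry its declared value.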
Attribute enumerations
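It's also possible to define an attribute as an enumeration of tokens. The tokens may be of type NMTOKEN or NOTATION. In
either case, the attribute value must be one of the specified enumerated values.
<!ATTLIST eName aName (token1 | token2 | ...) default>
<!ATTLIST eName aName NOTATION (token1 | token2 | ...) default>
Here eName stands for the element name, which did not survive in the examples below.
Example - Using attribute types
<!ATTLIST eName
  name    CDATA   #REQUIRED
  species NMTOKEN #FIXED "human"
  id      ID      #REQUIRED
  mgr     IDREF   #IMPLIED
  manage  IDREFS  #IMPLIED>
Example - Using attribute enumerations
<!ATTLIST eName
  title (president|vice-pres|secretary|sales) #REQUIRED>
<!ATTLIST eName
  format NOTATION (cs|lf) "cs">
Sample content of the element that carries the format attribute: 1927 N 52 E, Layton, UT, 84041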
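A sketch of a complete document that uses these attribute types and an NMTOKEN enumeration follows. The element names staff and emp and the employee data are assumptions; the NOTATION enumeration is omitted because it would also require NOTATION declarations.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (emp+)>
  <!ELEMENT emp EMPTY>
  <!-- id values must be unique; mgr and manage must point at existing id values;
       title must be one of the four enumerated tokens -->
  <!ATTLIST emp
    name    CDATA   #REQUIRED
    species NMTOKEN #FIXED "human"
    id      ID      #REQUIRED
    mgr     IDREF   #IMPLIED
    manage  IDREFS  #IMPLIED
    title   (president|vice-pres|secretary|sales) #REQUIRED>
]>
<staff>
  <emp name="Ann" id="e1" manage="e2 e3" title="president"/>
  <emp name="Bob" id="e2" mgr="e1" title="sales"/>
  <emp name="Cy"  id="e3" mgr="e1" title="secretary"/>
</staff>
```

Changing mgr="e1" to a value that no emp uses as its id, or using a title outside the list, would make the document invalid.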
Entities are the most atomic unit of information in XML. Entities are used to construct logical XML documents (as well as DTDs) from physical
resources. There are several types of entities, each of which is declared
using an ENTITY declaration.
A given entity is either general or parameter, internal or external, and parsed or unparsed:
General      May only be referenced in an XML document (not the DTD).
Parameter    May only be referenced in a DTD (not the XML document).
Internal     Value is defined inline.
External     Value is contained in an external resource.
Parsed       Value is parsed by the processor as XML/DTD content.
Unparsed     Value is not parsed by the XML processor.
Entity Syntax
Distinct entity types      Syntax
Internal parameter         <!ENTITY % name "value">
External parameter         <!ENTITY % name SYSTEM "systemId">
Internal general           <!ENTITY name "value">
External parsed general    <!ENTITY name SYSTEM "systemId">
Unparsed                   <!ENTITY name SYSTEM "systemId" NDATA nname>
Entity references    Syntax
General              &name;
Parameter            %name;
Unparsed             The name is used as the value of an attribute of type ENTITY or ENTITIES.
Note that unparsed entities are always general and external, whereas parameter entities and internal entities are always parsed.
Internal parameter entities
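Internal parameter entities:
  are always parsed,
  can be referenced within ELEMENT, ATTLIST, NOTATION, and ENTITY declarations,
  are used to parameterize portions of the DTD.
A reference to an internal parameter entity (%name;) is replaced with the parsed content.
Example: Parameter entities in the internal subset. The surviving fragments show the reference %nameDecl; standing between declarations in the internal subset of a document whose content is Billy Bob.
It's common to override parameter entities defined in the
external subset with declarations in the internal subset.
In the internal subset, parameter entities may be referenced only between declarations, not within them;
in the external subset they may also be referenced within declarations.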
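A minimal sketch of a parameter entity used in the internal subset follows. The entity name nameDecl and the value Billy Bob come from the slide; the replacement text and the element names are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- The replacement text is one complete declaration -->
  <!ENTITY % nameDecl "<!ELEMENT name (#PCDATA)>">
  <!ELEMENT person (name)>
  <!-- Referenced between declarations, the only position allowed in the internal subset -->
  %nameDecl;
]>
<person>
  <name>Billy Bob</name>
</person>
```

Using %nameDecl; inside another declaration, for example inside the content model of person, would be an error here but is permitted in an external subset.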
External parameter entities
External parameter entities are
used to include declarations
from external resources.
External parameter entities are always parsed. A reference to an external parameter entity (%name;) is replaced with the parsed content.
Example: the internal subset declares an external parameter entity (decls) and references it as %decls; to include the set of declarations that are
contained in person-decls.dtd. The document content is Billy Bob and 33.
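A minimal sketch of an external parameter entity follows. The entity name decls and the file name person-decls.dtd come from the slide; the element names, and the assumption that person-decls.dtd declares person, name, and age, do not.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- Declare the entity: its content lives in a separate file -->
  <!ENTITY % decls SYSTEM "person-decls.dtd">
  <!-- Pull the declarations from that file into the DTD at this point -->
  %decls;
]>
<person>
  <name>Billy Bob</name>
  <age>33</age>
</person>
```

This achieves much the same reuse as an external subset, with the difference that several such files can be included and mixed with local declarations.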
Internal general entities
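Internal general entities always contain
parsed XML content. The parsed content is
placed in the logical XML document
everywhere it's referenced (&name;).
Example: Using internal general entities. The internal subset declares two entities, n and a; the value of n held the text Billy and Smith inside markup that was lost. The document body references them as &n; and &a;.
The resulting logical document could be serialized with the content Billy, Smith, and 33.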
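A minimal sketch of internal general entities follows. The entity names n and a and the values Billy, Smith, and 33 come from the slide; the element names inside the replacement text are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (name, age)>
  <!ELEMENT name   (first, last)>
  <!ELEMENT first  (#PCDATA)>
  <!ELEMENT last   (#PCDATA)>
  <!ELEMENT age    (#PCDATA)>
  <!-- Replacement text may contain markup; it is parsed where the entity is referenced -->
  <!ENTITY n "<name><first>Billy</first><last>Smith</last></name>">
  <!ENTITY a "<age>33</age>">
]>
<person>&n;&a;</person>
```

After entity expansion the logical document is the same as if the person element had been written with the name and age elements spelled out in place.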
External general parsed entities and Unparsed entities
External general parsed entities
External general parsed entities are used the same
way as internal general entities, except that they aren't defined inline. They always contain
parsed XML content that becomes part of the logical
XML document wherever it's referenced (&name;).
Example: the internal subset declares two external entities, and the document body references them as &n; and &a;.
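A minimal sketch of external general parsed entities follows. The entity names n and a come from the slide; the file names name.xml and age.xml are hypothetical and are assumed to contain a name element and an age element respectively.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (name, age)>
  <!ELEMENT name   (#PCDATA)>
  <!ELEMENT age    (#PCDATA)>
  <!-- The replacement text is read from the referenced files -->
  <!ENTITY n SYSTEM "name.xml">
  <!ENTITY a SYSTEM "age.xml">
]>
<person>&n;&a;</person>
```

The document is valid only if the two files exist and their content fits the content model of person once it is substituted for the references.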
Unparsed entities
<!ENTITY name SYSTEM "systemId" NDATA nname>
Unparsed entities make it possible to attach
arbitrary binary resources to an XML document.
Unparsed entities are always general and
external.
Because unparsed entities can reference any
binary resource, applications require additional
information to determine the resource's type.
The notation name (nname) provides exactly this
type of information.
Because unparsed entities don't contain XML
content, they aren't referenced the same way as
other general entities (&name;), but rather
through an attribute of type ENTITY or ENTITIES.
Example: an element with the content Aaron refers to an unparsed entity through such an attribute.
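A minimal sketch of an unparsed entity follows. Only the content Aaron comes from the slide; the notation name jpeg, its identifier, the file name aaron.jpg, the entity name pic, and the attribute name picture are all assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- The notation tells the application what kind of resource it is dealing with -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- NDATA marks the entity as unparsed and ties it to the notation -->
  <!ENTITY pic SYSTEM "aaron.jpg" NDATA jpeg>
  <!ELEMENT person (#PCDATA)>
  <!-- An ENTITY-typed attribute is the only way to refer to an unparsed entity -->
  <!ATTLIST person picture ENTITY #REQUIRED>
]>
<person picture="pic">Aaron</person>
```

The parser does not read aaron.jpg; it only hands the entity's system identifier and notation to the application, which decides what to do with the resource.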
Questions
Thank you
XML Technology, Semester 4
SICSR Executive MBA(IT) @ MindTree, Bangalore, India
By Neeraj Singh (toneeraj(AT)gmail(DOT)com)