Hands-on XML 1 ® GvdS Palstar 2001
Understanding XML
Gert van der Steen
Palstar bv, University of Utrecht
Hands-on XML 2 ® GvdS Palstar 2001
Understanding XML
• Elements
• Entities
• Attributes
• Miscellaneous
Hands-on XML 3 ® GvdS Palstar 2001
Elements
Syntax:
<!ELEMENT book ( title, ( chapter, notes )+ ) >
• Keyword: ELEMENT
• Element name: book
• Content model: ( title, ( chapter, notes )+ )
• Model group: ( chapter, notes )
XML name:
• any length
• case sensitive
• contains letters, digits and the punctuation characters ‘.’ ‘-’ ‘_’ ‘:’
• starts with a letter, ‘_’ or ‘:’
Hands-on XML 4 ® GvdS Palstar 2001
Element declarations: operators
If A and B are model groups:
A, B A followed by B
A | B either A or B
A? optional A: zero or one
A+ one or more A
A* zero or more A
( A ) grouping of A
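The operators above behave much like regular-expression operators applied to the sequence of child element names. The sketch below is only an illustration of that analogy, not a validating parser: the helper names child_sequence and conforms are hypothetical, and the content model of book is translated to a regular expression by hand.

```python
import re
import xml.etree.ElementTree as ET

def child_sequence(xml_text):
    """Return the names of the root's direct children, each followed by a space."""
    root = ET.fromstring(xml_text)
    return "".join(child.tag + " " for child in root)

# Hand translation of ( title, ( chapter, notes )+ ):
# ',' becomes concatenation, '+' stays '+', '( )' stays a group.
BOOK_MODEL = re.compile(r"title (?:chapter notes )+")

def conforms(xml_text, model=BOOK_MODEL):
    """True if the child sequence of the root matches the whole model."""
    return model.fullmatch(child_sequence(xml_text)) is not None

good = "<book><title/><chapter/><notes/><chapter/><notes/></book>"
bad = "<book><title/><chapter/></book>"   # notes is missing after chapter
print(conforms(good), conforms(bad))
```

The same translation works for '|' (regex alternation), '?' and '*'.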
Hands-on XML 5 ® GvdS Palstar 2001
(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR'
Why: Alternative Sequences
ex. 1a. DTD: <!ELEMENT a ( #PCDATA ) > <!ELEMENT b ( #PCDATA ) > <!ELEMENT c ( #PCDATA ) >
c. DOC:<a>Text for a.</a><b>..b..</b><c>..c1..</c>
Hands-on XML 6 ® GvdS Palstar 2001
(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR'
Why: Alternative Sequences
ex. 1a. DTD: <!ELEMENT scoc1 ( a, ( b | c ) ) >
b. Syntax diagram: scoc1 = a, followed by either b or c
c. DOC: <scoc1><a>..a1..</a><c>..c1..</c></scoc1>
Hands-on XML 8 ® GvdS Palstar 2001
(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR' (continued)
ex. 2a. DTD: <!ELEMENT scoc2 ( ( a | b ), c ) >
b. Syntax diagram: scoc2 = either a or b, followed by c
c. DOC: <scoc2><a>..a1..</a><c>..c1..</c></scoc2>
Hands-on XML 9 ® GvdS Palstar 2001
(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR' (continued)
ex. 3a. DTD: <!ELEMENT scoc3 ( ( a | b ), ( b | c ) ) >
b. Syntax diagram: scoc3 = either a or b, followed by either b or c
c. DOC: <scoc3><b>..b1..</b><b>..b2..</b></scoc3>
Hands-on XML 10 ® GvdS Palstar 2001
(OI) 'OCCURRENCE INDICATOR'
Why: To Repeat Elements
ex. 1 a DTD: <!ELEMENT oi1 ( a+, b*, c?, d ) >
b1 Syntax diagram: oi1 = one or more a, zero or more b, optional c, then d
c1a DOC: <oi1><a>..a1..</a><a>..a2..</a><d>....d1....</d></oi1>
c1b DOC: <oi1><a>..a1..</a><b>..b1..</b><b>..b2..</b><c>..c1..</c><d>..d1..</d></oi1>
c2 DOC: Correct, minimally, the following input: <b>.. b1..</b><c>..c1..</c><c>..c2..</c><d>..d1..</d>
c3 Construct some input and parse
Hands-on XML 11 ® GvdS Palstar 2001
(OI) 'OCCURRENCE INDICATOR' (CONTINUED)
ex. 2 a1 DTD: <!ELEMENT oi2 ( a | b )+ >
a2 DTD: <!ELEMENT oi3 ( a+ | b+ ) >
b2 Construct syntax diagrams for oi2 and oi3
Will oi2 and oi3 accept the same input?
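One way to explore the question is to enumerate short child sequences and compare what each model accepts. This is a rough sketch that treats the two content models as regular expressions over the one-letter element names; the helper accepted and the length limit of 3 are assumptions made for illustration.

```python
import re
from itertools import product

OI2 = re.compile(r"(?:a|b)+")     # ( a | b )+
OI3 = re.compile(r"(?:a+|b+)")    # ( a+ | b+ )

def accepted(model, max_len=3, alphabet="ab"):
    """All child sequences of length 1..max_len that the model accepts."""
    return {
        "".join(seq)
        for n in range(1, max_len + 1)
        for seq in product(alphabet, repeat=n)
        if model.fullmatch("".join(seq))
    }

# Sequences that oi2 accepts but oi3 rejects (mixtures of a and b).
only_oi2 = sorted(accepted(OI2) - accepted(OI3))
print(only_oi2)
```

Everything oi3 accepts is also accepted by oi2; the reverse does not hold as soon as a and b are mixed.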
Hands-on XML 12 ® GvdS Palstar 2001
(OI) 'OCCURRENCE INDICATOR' (CONTINUED)
ex. 3 a DTD: <!ELEMENT oi4 ( ( a?, b+ )*, ( c, d? )+ ) >
b Syntax diagram: oi4 = zero or more groups of (optional a, one or more b), followed by one or more groups of (c, optional d)
b3 Which elements are permissible before and after b? And for c? (Use the next sheet and check with the editor)
c1 DOC: <oi4><c>..c1..</c></oi4>
DOC: <oi4><b>..b1..</b><b>..b2..</b><c>..c1..</c><d>..d1..</d><c>..c2..</c></oi4>
DOC: <oi4><a>..a1..</a><b>..b1..</b><c>..c1..</c></oi4>
c2 DOC: Correct, minimally, the following input and parse: <a>..a1..</a><c>..c1..</c>
c3 Construct some more input and parse
Hands-on XML 13 ® GvdS Palstar 2001
Exercise sheet
          <oi4> a b c d </oi4>
before b
after b
before c
after c
Hands-on XML 14 ® GvdS Palstar 2001
(OI) 'OCCURRENCE INDICATOR' (CONTINUED)
ex. 4 a1 DTD: <!ELEMENT x1 ( a, ( b, a )+, b ) >
a2 DTD: <!ELEMENT x2 ( ( a, b )+ ) >
b2 Construct Syntax Diagrams for x1 and x2
Will x1 and x2 accept the same input?
Hands-on XML 15 ® GvdS Palstar 2001
(OIOC) 'OCCURRENCE INDICATOR' AND 'OR CONNECTOR'
ex. 1 a DTD: <!ELEMENT oioc1 ( ( a* | ( b?, c+ ) )* ) >
b1 Syntax diagram: oioc1 = zero or more repetitions of either (zero or more a) or (optional b, one or more c)
b3 Which elements are permissible before and after a, b, and c? (Use the next sheet and check with the editor)
c3 Construct some input and parse
Hands-on XML 16 ® GvdS Palstar 2001
Exercise sheet
immediately   <oioc1> a b c </oioc1>
before a
after a
before b
after b
before c
after c
Hands-on XML 17 ® GvdS Palstar 2001
(OIOC) 'OCCURRENCE INDICATOR' AND 'OR CONNECTOR' (CONTINUED)
ex. 2 a DTD: <!ELEMENT oioc2 ( ( ( a* | b? )* | c+ ) | ( d?, e ) ) >
b2 Construct the syntax diagram
c3 Construct some input and parse
Hands-on XML 18 ® GvdS Palstar 2001
(NE) 'NESTING ELEMENTS'
ex. a DTD: <!ELEMENT ne1 ( a, l1 ) >
DTD: <!ELEMENT l1 ( b ) >
b1 Syntax diagram: ne1 = a followed by l1
l1 = b
c1 DOC: <ne1><a>..a1..</a><l1><b>..b1..</b></l1></ne1>
Hands-on XML 19 ® GvdS Palstar 2001
(RE) 'RECURSIVE ELEMENTS'
ex. 1 a DTD: <!ELEMENT re1 ( ( a, re1, b ) | c ) >
b Syntax diagram: re1 = either (a - re1 - b) or c
c1 DOC: <re1><a>..a1..</a> <re1><a>..a2..</a> <re1><c>..c1..</c> </re1> <b>..b2..</b> </re1> <b>..b1..</b> </re1>
c2 DOC: <re1><c>..c1..</c></re1>
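A recursive content model maps naturally onto a recursive check. The following sketch tests a parsed element against ( ( a, re1, b ) | c ) by hand; the function name is_valid_re1 is hypothetical, and only the element names are checked, not the text inside a, b and c.

```python
import xml.etree.ElementTree as ET

def is_valid_re1(elem):
    """Check an element against ( ( a, re1, b ) | c ), recursing into the nested re1."""
    if elem.tag != "re1":
        return False
    names = [child.tag for child in elem]
    if names == ["c"]:                  # the non-recursive alternative ends the recursion
        return True
    if names == ["a", "re1", "b"]:      # the recursive alternative
        return is_valid_re1(elem[1])
    return False

nested = ET.fromstring(
    "<re1><a>..a1..</a>"
    "<re1><a>..a2..</a><re1><c>..c1..</c></re1><b>..b2..</b></re1>"
    "<b>..b1..</b></re1>"
)
flat = ET.fromstring("<re1><c>..c1..</c></re1>")
broken = ET.fromstring("<re1><a>..a1..</a><b>..b1..</b></re1>")  # inner re1 missing
print(is_valid_re1(nested), is_valid_re1(flat), is_valid_re1(broken))
```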
Hands-on XML 20 ® GvdS Palstar 2001
(RE) 'RECURSIVE ELEMENTS' (CONTINUED)
ex. 2 a DTD: <!ELEMENT l ( lh, li+ ) > <!ELEMENT lh ( #PCDATA ) > <!ELEMENT li ( #PCDATA | l )* >
c1 DOC:
<l><lh>Europe</lh>
  <li><l><lh>Netherlands</lh>
    <li>Amsterdam</li>
    <li>Zeist</li>
  </l></li>
  <li><l><lh>England</lh>
    <li>Swindon</li>
  </l></li>
</l>
c3 Add streets by another nesting of l and try out
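When trying out deeper nestings, it can help to print the list as an indented outline. This is a small sketch under the assumption that each li holds either plain text or exactly one nested l; the generator name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<l><lh>Europe</lh>
 <li><l><lh>Netherlands</lh><li>Amsterdam</li><li>Zeist</li></l></li>
 <li><l><lh>England</lh><li>Swindon</li></l></li>
</l>"""

def outline(l_elem, depth=0):
    """Yield (depth, text) for a list header and its items, recursing into nested lists."""
    yield depth, l_elem.findtext("lh")
    for li in l_elem.findall("li"):
        nested = li.find("l")
        if nested is not None:
            yield from outline(nested, depth + 1)
        else:
            yield depth + 1, li.text

for depth, text in outline(ET.fromstring(DOC)):
    print("  " * depth + text)
```

Adding streets means wrapping a city's li content in a further l with its own lh and li elements; the outline then shows a third level.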
Hands-on XML 21 ® GvdS Palstar 2001
'DETERMINISTIC'
Why: A content model must be deterministic (SGML: “not ambiguous”)
ex. 1 a DTD: <!ELEMENT am1 ( ( a, b ) | ( a, c ) ) >
DTD: <!ELEMENT am2 ( a, ( b | c ) ) >
b1 Syntax diagram: am1 = either (a - b) or (a - c)
Syntax diagram: am2 = a followed by either b or c
In XML, am2 is deterministic and am1 is not: on reading a, a parser cannot tell which branch of am1 it is in without looking ahead.
Hands-on XML 22 ® GvdS Palstar 2001
’DETERMINISTIC' (CONTINUED)
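ex. 2 a DTD: <!ELEMENT am3 ( ( a?, b* )*, a*, c ) >
b1 Syntax diagram: am3 = zero or more groups of (optional a1, zero or more b), then zero or more a2, then c (a1 and a2 mark the two occurrences of a)
b3 Is the content model deterministic?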
Hands-on XML 23 ® GvdS Palstar 2001
COMMENT DECLARATION
Between other declarations in the DTD
<!-- text of the comment -->
Between other text in the DOC
<!-- text of the comment -->
Can be any length
Hands-on XML 24 ® GvdS Palstar 2001
#PCDATA (PARSED CHARACTER DATA)
Why: to process text for entity references and to be sensitive to the appearance of tags
ex. a DTD: <!ELEMENT pcd ( #PCDATA ) >
c1 DOC: <pcd> the character entity &#196; is processed within pcdata, also "<!-- comment -->" will be treated as a comment </pcd>
Hands-on XML 25 ® GvdS Palstar 2001
CDATA (CHARACTER DATA)
Why: to keep text literally
ex. DTD: no special declaration required
DOC: <p>The character entity <![CDATA[&#196; is not processed within CDATA, also "<!-- comment -->"]]> will not be treated.</p>
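The difference between parsed character data and a CDATA section can be observed with any XML parser. The sketch below uses Python's xml.etree.ElementTree as one possible parser; the sample strings are shortened variants of the slide examples, and the comment is dropped because the default tree builder does not keep comments.

```python
import xml.etree.ElementTree as ET

# In #PCDATA the character reference is resolved and the comment is recognised as markup.
parsed = ET.fromstring("<pcd>&#196; <!-- comment --> end</pcd>")

# Inside a CDATA section the same characters are kept literally.
literal = ET.fromstring("<p><![CDATA[&#196; <!-- comment -->]]> end</p>")

print(repr(parsed.text))
print(repr(literal.text))
```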
Hands-on XML 26 ® GvdS Palstar 2001
Mixed content model
Why: Allows for “floating” or “in-line” elements in between text
Syntax in DTD: restricted to #PCDATA alternating with subelements
• DTD:
– <!ELEMENT par ( #PCDATA | warning )* >
• DOC:
– <par>Clean with an alcoholic <warning>flammable</warning> substance</par>
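How an application sees mixed content depends on its API. As an illustration, Python's xml.etree.ElementTree stores the text before the first subelement in text and the text following a subelement in that subelement's tail; this is a property of that library, not of XML itself.

```python
import xml.etree.ElementTree as ET

par = ET.fromstring(
    "<par>Clean with an alcoholic <warning>flammable</warning> substance</par>"
)
warning = par.find("warning")

print(repr(par.text))       # text before the floating element
print(repr(warning.text))   # text inside the floating element
print(repr(warning.tail))   # text after the floating element
print("".join(par.itertext()))
```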
Hands-on XML 27 ® GvdS Palstar 2001
Preserved space
• Within #PCDATA an application may normalize white space to a single space character
• The reserved attribute ‘xml:space’ signals that the white space must be preserved
• DOC:
<par xml:space="preserve">
O
--I--
I
/ \
</par>
Hands-on XML 28 ® GvdS Palstar 2001
Language attribute
• With the reserved attribute ‘xml:lang’ the natural language of the contained #PCDATA is encoded
• The values of the attribute are language identifiers as defined by [RFC1766], “Tags for the Identification of Languages”
• DOC:
<par xml:lang="en-GB">What colour is it?</par>
<par xml:lang="en-US">What color is it?</par>
Hands-on XML 29 ® GvdS Palstar 2001
EMPTY
Why: 1. to refer to objects which are internal or external to the document
2. To trigger special processing
DTD: <!ELEMENT pagebreak EMPTY >
DOC: ...The page will break here.<pagebreak/>..
or: ...The page will break here.<pagebreak></pagebreak>..
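Both notations for an empty element carry the same information. A quick check with Python's xml.etree.ElementTree (used here only as a convenient parser; the wrapping p element is an assumption added to make the fragments complete documents):

```python
import xml.etree.ElementTree as ET

short = ET.fromstring("<p>The page will break here.<pagebreak/>..</p>")
long_ = ET.fromstring("<p>The page will break here.<pagebreak></pagebreak>..</p>")

# After parsing, the two spellings are indistinguishable.
print(ET.tostring(short) == ET.tostring(long_))
```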
Hands-on XML 30 ® GvdS Palstar 2001
Entities
Syntax:
for DTD: <!ENTITY % nwc "note | warning | caution" >
for DOC: <!ENTITY XML "Extensible Markup Language" >
• Keyword: ENTITY
• Entity name: nwc, XML
• Replacement: the quoted text
Hands-on XML 31 ® GvdS Palstar 2001
Entities
• Entity references are requests for data to be embedded at the point of reference
• In a Document:
– Internal text entities: simple text replacement
– External text entities: inclusion of an external document
– Binary entities: reference to multimedia files
– Character defining entities: for characters outside the default character set
– Built-in entities: for characters used in markup
– Character entities: the number of a character in the default character set
• In a DTD:
– Parameter entities: simple text replacement
Hands-on XML 32 ® GvdS Palstar 2001
Internal text entities
• Purpose: simple text replacement; text stored in entity
• DTD: – <!ENTITY gca "Graphics Communications Association" >
• DOC:– ... the &gca; sponsor meetings ...
– ==> ... the Graphics Communications Association sponsor meetings ...
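The replacement can be watched in a parser that expands internal entities. In this sketch the declaration is placed in the internal subset of a hypothetical one-element document, and Python's xml.etree.ElementTree (which expands internal general entities) is used to read it.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE p [
  <!ENTITY gca "Graphics Communications Association">
]>
<p>... the &gca; sponsor meetings ...</p>"""

# The application only sees the replacement text, not the reference.
print(ET.fromstring(DOC).text)
```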
Hands-on XML 33 ® GvdS Palstar 2001
External text entities
• Purpose: inclusion of an external document; reference stored in entity
• DTD:– <!ENTITY ch1 SYSTEM "http://www.../ch1.xml">
• DOC: – <book>a book about xml &ch1; ... more content ... </book>
Hands-on XML 34 ® GvdS Palstar 2001
Binary entities
• Purpose: reference to multimedia files (“Non-XML data”)
• Syntax in DTD:
– <!NOTATION Name PUBLIC Datatype>
– <!ENTITY Name SYSTEM URL NDATA Datatype>
• DTD:
– <!NOTATION EPS PUBLIC "+//ISBN 0-7923-1::Graphic Notation//NOTATION Adobe Systems Encapsulated Postscript//EN">
– <!ENTITY figure1 SYSTEM "c:\graphics\figure1.pic" NDATA EPS>
– <!ELEMENT graphics EMPTY>
– <!ATTLIST graphics filename ENTITY #IMPLIED>
– <!ELEMENT p ( #PCDATA | graphics )* >
• DOC:
– <p>As is shown in the following diagram: <graphics filename="figure1"/></p>
– wrong: <p>As is shown in the following diagram: &figure1;</p>
Hands-on XML 35 ® GvdS Palstar 2001
Character defining entities
• Purpose: for characters outside the default characterset
• DTD:
– <!ENTITY % ISOnum PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN" "/ents/isonum.ent">
– %ISOnum;
• file isonum.ent:
– <!ENTITY frac34 "[frac34]"> <!-- fraction three-quarters -->
– <!ENTITY plusmn "[plusmn]"> <!-- plus-or-minus sign -->
– ...
• DOC:
– <p>..about &frac34; of the height..</p>
– => <p>..about ¾ of the height..</p>
Hands-on XML 36 ® GvdS Palstar 2001
Character entities
• Purpose: hard coding of characters, e.g. for UNICODE "&#169;" <=> ©
• DTD:
– <!ENTITY Copyright "&#169;" >
• DOC: &Copyright;
– Resolution by parser of "&Copyright;": "&#169;"
– Resolution by printer/browser of "&#169;": "©"
• Ranges:
– &#0; .. &#255; -- extended ASCII set: ISO 8859/1, used under Windows, Sun Unix and as the Web default
– &#256; .. &#65535; -- Unicode/ISO 10646
– larger -- any Unicode character
– alternative to decimal: hexadecimal, like &#xA9; or &#xFFF8;
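Decimal and hexadecimal references to the same code point resolve to the same character. A minimal check, again using Python's xml.etree.ElementTree as the parser:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<p>&#169; &#xA9;</p>")   # decimal 169 == hexadecimal A9

# Both references resolve to the copyright sign, U+00A9.
print(elem.text)
print([hex(ord(ch)) for ch in elem.text if ch != " "])
```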
Hands-on XML 37 ® GvdS Palstar 2001
XML built-in character entities
• Purpose: for characters used in markup
• DTD: no declaration required
• DOC:
– &lt; for ‘<’
– &gt; for ‘>’
– &amp; for ‘&’
– &apos; for ‘'’
– &quot; for ‘"’
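Programs that generate XML usually leave this substitution to a library. As an example, Python's xml.sax.saxutils.escape replaces &, < and > by default and accepts a dictionary for further characters; the sample sentence is made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = 'if a < b & b > c: say "hi"'

# &, < and > are escaped by default; the quote characters are added explicitly.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the built-in entities back into the original characters.
print(ET.fromstring("<p>" + escaped + "</p>").text == raw)
```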
Hands-on XML 38 ® GvdS Palstar 2001
Parameter entities
• Purpose: simple text replacement in a DTD
– DTD: <!ENTITY % subelems "(para | list | table | note)" >
– DTD: <!ELEMENT body (things, %subelems;) >
– ==> DTD: <!ELEMENT body (things, (para | list | table | note)) >
• Purpose: to keep text literally
– DOC: <!ENTITY % subelems "(para | list | table | note)" >
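The textual nature of parameter entities can be imitated with a few lines of string processing. This is a deliberately naive sketch, not a DTD parser: the function expand_parameter_entities is hypothetical and only handles internal parameter entities written with double quotes on one line.

```python
import re

DTD = '''<!ENTITY % subelems "(para | list | table | note)" >
<!ELEMENT body (things, %subelems;) >'''

ENTITY_DECL = r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>'

def expand_parameter_entities(dtd_text):
    """Replace each %name; reference by the text declared for name."""
    entities = dict(re.findall(ENTITY_DECL, dtd_text))
    # Drop the entity declarations themselves, keep the other declarations.
    body = re.sub(ENTITY_DECL + r"\n?", "", dtd_text)
    return re.sub(r"%([\w.-]+);", lambda m: entities[m.group(1)], body)

print(expand_parameter_entities(DTD))
```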
Hands-on XML 39 ® GvdS Palstar 2001
ATTRIBUTE DECLARATION
Why: to associate information with an Element: metadata, hypertext, multimedia, layout (!?), ...
Syntax:
DTD: <!ELEMENT el_name (............) >
<!ATTLIST el_name att_name1 type1 default1 att_name2 type2 default2 >
Spelling of attribute name: as an XML name (~ element name)
Allowed: more than one <!ATTLIST for an element
Parts of an attribute declaration: element name, attribute name, type of attribute, default value
Hands-on XML 40 ® GvdS Palstar 2001
TYPES FOR ATTRIBUTE DECLARED VALUES
Type:               Attribute value is:
CDATA               character data
ENTITY, ENTITIES    (list of) name(s) of external (unparsed) entities
ID                  unique identifier for the element
IDREF, IDREFS       (list of) reference(s) to an ID declared in the document
NMTOKEN, NMTOKENS   (list of) name token(s)
NOTATION            member of a list of notations
Name group          one of a finite set
Hands-on XML 41 ® GvdS Palstar 2001
ATTRIBUTE DECLARATION, declared values 1/3
DTD:
<!ELEMENT memo ( idinfo, body ) >
<!ATTLIST memo
  rev       CDATA     #REQUIRED
  size      NMTOKEN   #REQUIRED
  projects  NMTOKENS  #REQUIRED >
(columns: attribute name, type, default value)
Allowed values:
CDATA: [any character]+
NMTOKEN: [ letter | 0..9 | - | . | _ | : ]+
NMTOKENS: NMTOKEN, [" ", NMTOKEN]*
DOC:
<memo rev="27/1/96 - 3.2a" size=".17-.19" projects="2-a 3-b" >..... </memo>
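The value patterns above translate directly into a small checker. The sketch below is an ASCII-only simplification (real name tokens also allow many non-ASCII letters), and the function names are hypothetical.

```python
import re

# [ letter | 0..9 | - | . | _ | : ]+ restricted to ASCII letters for simplicity
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def is_nmtoken(value):
    """True if the whole value is a single name token."""
    return NMTOKEN.fullmatch(value) is not None

def is_nmtokens(value):
    """True if the value is one or more name tokens separated by white space."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(token) for token in tokens)

print(is_nmtoken(".17-.19"), is_nmtokens("2-a 3-b"), is_nmtoken("27/1/96 - 3.2a"))
```

The rev value fails the NMTOKEN test because of the slashes and spaces, which is why it is declared as CDATA.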
Hands-on XML 42 ® GvdS Palstar 2001
ATTRIBUTE DECLARATION, declared values 2/3
DTD:
<!NOTATION tex PUBLIC "-//local//NOTATION TeX Formula//EN" "c:\programs\show_tex" >
<!ENTITY pic1 SYSTEM "c:\proj3\file12" NDATA tex >
<!ENTITY pic2 SYSTEM "c:\proj4\file15" NDATA tex >
<!ELEMENT fig EMPTY >
<!ELEMENT figr EMPTY >
<!ELEMENT figrs EMPTY >
<!ATTLIST fig
  id      ID      #REQUIRED
  file    ENTITY  #REQUIRED >
<!ATTLIST figr
  refid   IDREF   #REQUIRED >
<!ATTLIST figrs
  refids  IDREFS  #REQUIRED >
DOC:
<fig id="oor" file="pic1" /> <fig id="neus" file="pic2" />
<figr refid="neus" /> <figrs refids="neus oor" />
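A validating parser checks that ID values are unique and that every IDREF points to an existing ID. The same check can be sketched by hand; the wrapping doc element and the function dangling_references are assumptions made for this illustration, and the attribute names are those of the declarations above.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <fig id="oor" file="pic1"/> <fig id="neus" file="pic2"/>
  <figr refid="neus"/> <figrs refids="neus oor"/>
</doc>"""

def dangling_references(root):
    """Return the referenced ids that no element declares, in sorted order."""
    ids = [e.get("id") for e in root.iter() if e.get("id") is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ID value")
    refs = [e.get("refid") for e in root.iter("figr") if e.get("refid")]
    for e in root.iter("figrs"):
        refs.extend(e.get("refids", "").split())   # IDREFS is a space-separated list
    return sorted(set(refs) - set(ids))

print(dangling_references(ET.fromstring(DOC)))
```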
Hands-on XML 43 ® GvdS Palstar 2001
ATTRIBUTE DECLARATION, declared values 3/3
DTD:
<!NOTATION eqn SYSTEM "c:\eqn.exe" >
<!NOTATION tex SYSTEM "c:\tex.exe" >
<!ELEMENT memo ( idinfo, body ) >
<!ELEMENT formula ( #PCDATA ) >
<!ATTLIST memo security ( ts | sec | unc ) #REQUIRED >
<!ATTLIST formula data NOTATION ( eqn | tex ) #REQUIRED >
DOC:
<memo security="sec">...</memo>
<formula data="eqn"> 3 over 4 </formula>
Hands-on XML 44 ® GvdS Palstar 2001
DEFAULT VALUES FOR ATTRIBUTE DECLARATIONS
Reserved Words:
#FIXED - used for attributes with constant values
#REQUIRED - demands a user-entered value (always the case when there is no DTD)
#IMPLIED - value supplied by the application if not entered explicitly
Example of a default value, declared in the DTD:
<!ATTLIST memo security ( ts | sec | unc ) "unc" >
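A parser that reads the declaration supplies the default when the attribute is omitted. The sketch below puts the ATTLIST in the internal subset of a hypothetical minimal memo and reads it with Python's xml.etree.ElementTree, whose expat-based parser reports defaulted attributes; other parsers may need an option to do the same.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE memo [
  <!ATTLIST memo security ( ts | sec | unc ) "unc" >
]>
"""

defaulted = ET.fromstring(DTD + "<memo/>")                  # attribute omitted
explicit = ET.fromstring(DTD + '<memo security="sec"/>')    # attribute given

print(defaulted.get("security"), explicit.get("security"))
```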
Hands-on XML 45 ® GvdS Palstar 2001
ATTRIBUTE EXERCISES
Experiment in ex.inp with attributes according to the attribute declarations in ex.dtd of:
- memo
- memo1
- fig
- figr
- figrs
Hands-on XML 46 ® GvdS Palstar 2001
CONDITIONAL SECTION in DTD
Why: to indicate which parts of a DTD should be selected
Example in DTD:
<!ENTITY % standard "INCLUDE" >
<!ENTITY % variant "IGNORE" >
......
<![ %standard; [<!ENTITY % Text "#PCDATA | emph1"> ]]>
<![ %variant; [<!ENTITY % Text "#PCDATA | emph2"> ]]>
......
Hands-on XML 47 ® GvdS Palstar 2001
Processing instructions (“PI”)
Why: to contain information that is not part of the document, e.g. to trigger processor functions
Can be (mis)used for many purposes.
DTD: not required
DOC:
<p> Here follows a pagebreak <?newpage?></p>
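An application has to look for processing instructions itself and decide what they mean. As an illustration, Python's xml.dom.minidom exposes them as nodes with a target and optional data; what newpage should trigger is entirely up to the processor.

```python
from xml.dom import minidom

dom = minidom.parseString("<p> Here follows a pagebreak <?newpage?></p>")

# Collect the targets of all processing instructions directly inside <p>.
targets = [
    node.target
    for node in dom.documentElement.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(targets)
```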
Hands-on XML 48 ® GvdS Palstar 2001
Parsing: Well-formed versus Valid
• Well-formed
– XML declaration recommended (XML 1.0 does not strictly require it)
– Every start-tag must have a matching end-tag, or be an empty-element tag
– All attribute values must be quoted
– No markup characters (< or &) allowed in the character data
– Properly nested elements
– Attributes are treated as type CDATA (if no DTD is used)
• Valid
– Well-formed plus conforms to DTD
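Well-formedness can be tested with any non-validating parser: if parsing succeeds, the document is well-formed. A minimal sketch with Python's xml.etree.ElementTree follows; the helper is_well_formed is hypothetical, and validity against a DTD is not checked because that library does not validate.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b>ok</b></a>"))      # balanced and properly nested
print(is_well_formed("<a><b>bad</a></b>"))     # improperly nested
print(is_well_formed("<a x=1/>"))              # unquoted attribute value
print(is_well_formed("<a>1 < 2</a>"))          # markup character in character data
```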
Hands-on XML 49 ® GvdS Palstar 2001
Parsing sequence
Parsing path through an XML document:
Prolog
  XML Declaration
  Document Type Declaration
    Internal subset
    External subset (referenced from the Document Type Declaration)
Text + Markup
Hands-on XML 50 ® GvdS Palstar 2001
Concise XML Syntax
Example.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE example SYSTEM "Example.dtd" [
<!ENTITY XML "eXtensible Markup Language">
<!ENTITY history SYSTEM "History.XML">
<!ENTITY wheelchair SYSTEM "c:/Wheelchair.tif" NDATA TIFF>
<!ENTITY % figs "INCLUDE">
]>
<example>
<par>The &XML; format is a very important move to bringing the benefits of structured markup to the masses.</par>
&history;
<par>The following figure shows a wheelchair:</par>
<fig filename="wheelchair" />
<par>The tags <![CDATA[<example>, <par> and <fig../> are used in this document]]>.</par>
</example>
History.xml:
<par>Superficially it looks like HTML because the tags have the same delimiters, &lt; and &gt; </par>
<par xml:space='preserve' xml:lang="en-GB">
 --- XML ---
 |         |
SGML      HTML
</par>
Example.dtd:
<!-- The example DTD -->
<!NOTATION TIFF SYSTEM "Showtiff.exe" >
<!ENTITY % figs "IGNORE" >
<![%figs;[
<!ENTITY % ExampleContent "par | fig">
]]>
<!ENTITY % ExampleContent "par">
<!ELEMENT example (%ExampleContent;)+>
<!ELEMENT par (#PCDATA)>
<!ELEMENT fig EMPTY>
<!ATTLIST fig filename ENTITY #REQUIRED>
In Example.xml the XML Declaration and the Document Type Declaration (with its Internal subset) form the Prolog; Example.dtd is the External subset; c:/Wheelchair.tif is the referenced binary file.
With thanks to Neil Bradley: “The XML Companion”, Addison Wesley Longman, 2nd ed., ISBN 0-201-342855
Hands-on XML 51 ® GvdS Palstar 2001
HOW TO WRITE DOCUMENT TYPE DEFINITIONS
• Left brain:
- Makes subdivisions
- Results in numbering and hierarchy
• Right brain:
- Makes associations
- Results in relations
• In XML:
- Hierarchical structure by rewriting elements in components
- Necessary: document analysis
- Associative structure by writing attributes to elements
- Necessary: inventory of useful relations
Hands-on XML 52 ® GvdS Palstar 2001
DESIGNER OF STRUCTURED DOCUMENTS
authors publishers
document analyst / designer
hypertext designer
database designer
information types designer
Hands-on XML 53 ® GvdS Palstar 2001
Exercises in adapting documents and DTDs
• Extend B.xml, among other things with a list
• Make modifications in EX.DTD and Ex.xml:
– create floating elements "fn" and "fnr" with matching ids and use them inside another element, e.g. "Z", that contains #PCDATA
– abbreviate a particular construct in EX.DTD with a parameter entity
– replace a piece of text in the input with a general entity, to be defined in the DTD
• Make/generate a document schema of B.DTD, analogous to the schema for the Memo and the Workshop Manual
Hands-on XML 54 ® GvdS Palstar 2001
Exercises in writing small DTDs yourself
Describe the regularity in the given patterns in a content model; test it using an extension of EX.DTD and Ex.xml.
To keep the document short we use the following element declarations:
<!ELEMENT p EMPTY>
<!ELEMENT q EMPTY> etcetera.
The document can then contain: <p/>, <q/> etcetera.
A. three documents: ppqrss pqqs pppqrsss
content model: use "(", ")", ",", "+" and "?"
B. four documents: p pq pqr pqrs
content model: use "(", ")", "," and "?"
C. five documents: pqp pqr qrp qqp prr
content model: use "(", ")", "," and "|"