What Are Real DTDs Like
Group Members: Xijie Zeng, Peiyu Cai
Presenter: Xijie Zeng
Outline
Overview
Introduction
Local properties
Global properties
Overview
XML is widely used in a variety of areas
DTDs with different structures define XML documents for different uses
This is a survey of a number of DTDs used in the real world
Introduction
DTDs are from the XML.org DTD repository
Three DTD categories:
app: describe objects interchanged between programs/applications
data: describe data stored in databases
meta: describe the structure of document markup
60 DTDs: 7 are app, 13 are data, 40 are meta
Introduction (cont.)
A DTD can be described as a collection of element declarations of the form e → α, where e is the element name and α is the content model.
The content model: α ::= ε | pcdata | e | α, α | α | α | α* | α+ | α?
Introduction (cont.)
Email DTD
<!ELEMENT email (head, body)>
<!ELEMENT head (from, to+, cc*, subject)>
<!ELEMENT from EMPTY>
<!ATTLIST from name CDATA #IMPLIED
               address CDATA #REQUIRED>
<!ELEMENT to EMPTY>
<!ATTLIST to name CDATA #IMPLIED
             address CDATA #REQUIRED>
<!ELEMENT cc EMPTY>
<!ATTLIST cc name CDATA #IMPLIED
             address CDATA #REQUIRED>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (text, attachment*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT attachment EMPTY>
<!ATTLIST attachment encoding (mime|binhex) "mime"
                     file CDATA #REQUIRED>
The corresponding element declarations:
email → (head, body)
head → (from, to+, cc*, subject)
from → (ε)
to → (ε)
cc → (ε)
subject → (pcdata)
body → (text, attachment*)
text → (pcdata)
attachment → (ε)
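The step from DTD syntax to the abstract declarations can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the survey: the names `ELEMENT_DECL` and `content_models` are made up here, and the regular expression assumes simple declarations without comments or parameter entities.

```python
import re

# Hypothetical helper: matches <!ELEMENT name content-model>.
# ATTLIST declarations are skipped, in line with the abstraction above.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.DOTALL)

def content_models(dtd_text):
    """Map each declared element name to its content model as a string."""
    models = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        if model == "EMPTY":
            model = "(ε)"
        models[name] = model.replace("#PCDATA", "pcdata")
    return models

sample = """
<!ELEMENT email (head, body)>
<!ELEMENT from EMPTY>
<!ATTLIST from name CDATA #IMPLIED
               address CDATA #REQUIRED>
<!ELEMENT subject (#PCDATA)>
"""

for name, model in content_models(sample).items():
    print(name, "→", model)
```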
Introduction (cont.)
Local properties: describe content models in individual element declarations
Global properties: describe the graph-theoretic structure of the whole DTD
Local properties
Content model classification
(1) pcdata
(2) ε
(3) any: no restriction on subelements
(4) Mixed content
Example: body1 → (pcdata, attachment*), compared with body → (text, attachment*) and text → (pcdata)
(5) “|” only, but not mixed content
(6) “,” only
(7) Complex content: contains both “|” and “,”
Example: directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
(8) List: α* or α+
(9) Single: α?
Local properties (cont.)
Content model classification
Local properties (cont.)
Syntactic complexity
depth(ε) = 0
depth(e) = depth(pcdata) = 1
depth(α*) = depth(α+) = depth(α?) = depth(α) + 1
depth(α1, α2, …, αn) = depth(α1 | α2 | … | αn) = max(depth(αi)) + 1
Local properties (cont.)
An example: head → (from, to+, cc*, subject)
depth(from, to+, cc*, subject) = depth(cc*) + 1 = depth(cc) + 1 + 1 = 1 + 1 + 1 = 3
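The depth definition translates directly into a recursive function. The sketch below assumes a small tuple-based representation of content models, which is an encoding chosen here for illustration and does not come from the slides: `("eps",)`, `("pcdata",)`, `("sym", name)`, `("star" | "plus" | "opt", child)` and `("seq" | "alt", [children])`.

```python
def depth(model):
    """Syntactic depth of a content model in the assumed tuple encoding."""
    kind = model[0]
    if kind == "eps":
        return 0
    if kind in ("sym", "pcdata"):
        return 1
    if kind in ("star", "plus", "opt"):
        return depth(model[1]) + 1
    if kind in ("seq", "alt"):
        return max(depth(child) for child in model[1]) + 1
    raise ValueError(f"unknown content model node: {kind!r}")

# head → (from, to+, cc*, subject)
head = ("seq", [
    ("sym", "from"),
    ("plus", ("sym", "to")),
    ("star", ("sym", "cc")),
    ("sym", "subject"),
])

print(depth(head))  # 3, matching the worked example
```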
Local properties (cont.)
Determinism
If a content model does not require lookahead when parsing, it is a deterministic content model.
Non-deterministic content model: (a, b) | (a, c)
Deterministic content model: a, (b | c)
Result: The survey finds 5 non-deterministic content models in 4 DTDs.
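One standard way to test this property is the position-based (Glushkov) check: number every symbol occurrence, compute which occurrences can start the model and which can follow each occurrence, and reject the model if two different occurrences of the same symbol ever compete. The slides do not say how the survey performed the check, so the code below is a sketch of that textbook technique. The tuple encoding of content models and the function names are assumptions made for this example.

```python
from itertools import count

def glushkov(model):
    """Return (first, follow, symbol_of) for a content model.

    Assumed encoding: ("eps",), ("sym", name),
    ("star" | "plus" | "opt", child), ("seq" | "alt", [children]).
    """
    ids = count()
    symbol_of, follow = {}, {}

    def walk(node):
        # Returns (nullable, first positions, last positions).
        kind = node[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            pos = next(ids)
            symbol_of[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind in ("star", "plus", "opt"):
            nullable, first, last = walk(node[1])
            if kind != "opt":              # repetition loops back to the start
                for pos in last:
                    follow[pos] |= first
            return (nullable or kind != "plus"), first, last
        parts = [walk(child) for child in node[1]]
        if kind == "alt":
            return (any(p[0] for p in parts),
                    set().union(*(p[1] for p in parts)),
                    set().union(*(p[2] for p in parts)))
        # kind == "seq": fold the parts left to right
        nullable, first, last = True, set(), set()
        for part_nullable, part_first, part_last in parts:
            for pos in last:
                follow[pos] |= part_first
            if nullable:
                first = first | part_first
            last = (last | part_last) if part_nullable else set(part_last)
            nullable = nullable and part_nullable
        return nullable, first, last

    _, first, _ = walk(model)
    return first, follow, symbol_of

def is_deterministic(model):
    """True if no two competing positions carry the same element name."""
    first, follow, symbol_of = glushkov(model)
    for competing in [first, *follow.values()]:
        names = [symbol_of[pos] for pos in competing]
        if len(names) != len(set(names)):
            return False
    return True

a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
print(is_deterministic(("alt", [("seq", [a, b]), ("seq", [a, c])])))  # False
print(is_deterministic(("seq", [a, ("alt", [b, c])])))                # True
```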
Local properties (cont.)
Ambiguity
Definition: An expression R is ambiguous if and only if there exists some string s in the language of R that can be parsed in more than one way.
Example: partner → (name?, onetime?, partnrid?, partnrtype?, syncind?, name*, parentid?, partnridx?, partnrratg*)
Here a single name element can be matched by either name? or name*.
Result: The survey finds 2 ambiguous content models.
Global properties
Reachability
Definition: An element name e’ is reachable from e, denoted by e ⇒ e’, if either e → α and e’ occurs in α, or e ⇒ e” and e” ⇒ e’.
An example:
email → (head, body)
head → (from, to+, cc*, subject)
Hence email ⇒ head, email ⇒ subject, and head ⇒ subject.
Definition: An element name e is reachable if r ⇒ e, where r is the name of the root element. Otherwise, e is called unreachable or useless.
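Reachability is ordinary graph search once each declaration is reduced to the set of element names occurring in its content model. The sketch below assumes that reduction has already been done by hand for the email DTD; the dictionary layout and function names are illustrative choices, not part of the survey.

```python
from collections import deque

# Element name -> element names occurring in its content model.
EMAIL = {
    "email": {"head", "body"},
    "head": {"from", "to", "cc", "subject"},
    "body": {"text", "attachment"},
    "from": set(), "to": set(), "cc": set(),
    "subject": set(), "text": set(), "attachment": set(),
}

def reachable_from(dtd, start):
    """All element names reachable from start in one or more steps."""
    seen = set()
    queue = deque(dtd.get(start, ()))
    while queue:
        name = queue.popleft()
        if name not in seen:
            seen.add(name)
            queue.extend(dtd.get(name, ()))
    return seen

def unreachable(dtd, root):
    """Declared element names that cannot be reached from the root."""
    return set(dtd) - reachable_from(dtd, root) - {root}

print("subject" in reachable_from(EMAIL, "email"))  # True
print(unreachable(EMAIL, "email"))                  # set()
```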
Global properties (cont.) Reachability
Unreachable element names in DTDs
Global properties (cont.)
Recursions
Definition: A content model α is derivable from an element name e, denoted by e ⇒ α, if either e → α, or e ⇒ α’, e’ → α”, and α = α’[e’/α”], where α’[e’/α”] denotes the content model obtained by substituting α” for all occurrences of e’ in α’.
An example:
email → (head, body)
head → (from, to+, cc*, subject)
email ⇒ (from, to+, cc*, subject, body)
Definition: A DTD is recursive if and only if it has an element name e such that e ⇒ e and e is reachable.
Global properties (cont.)
Recursions
Definition: A DTD is linear recursive if and only if it is recursive and, for any reachable element name e and any e ⇒ α, e occurs at most once in α and the occurrence is not enclosed in “*” or “+”. A DTD is said to be non-linear recursive if it is recursive but not linear recursive.
An example of non-linear recursive: directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
An example of linear recursive: e → (pcdata | e)
Result: No linear recursive DTD is found in the sample DTDs. There are 7, 2 and 26 non-linear recursive DTDs in the app, data and meta categories respectively.
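Whether a DTD is recursive at all can be checked on the element graph: look for a reachable element name that can reach itself. The sketch below covers only that test, not the linear versus non-linear distinction, which depends on the shape of the derived content models. The dictionary encoding and function names are assumptions made for the example.

```python
def reachable_from(dtd, start):
    """All element names reachable from start in one or more steps."""
    seen, stack = set(), list(dtd.get(start, ()))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(dtd.get(name, ()))
    return seen

def is_recursive(dtd, root):
    """True if some element reachable from root can reach itself."""
    return any(name in reachable_from(dtd, name)
               for name in reachable_from(dtd, root))

# directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
DIRECTORY = {
    "directory": {"dirname", "dirinfo", "dirdesc", "file", "directory"},
    "dirname": set(), "dirinfo": set(), "dirdesc": set(), "file": set(),
}

# email DTD from the introduction, reduced to element names
EMAIL = {
    "email": {"head", "body"},
    "head": {"from", "to", "cc", "subject"},
    "body": {"text", "attachment"},
}

print(is_recursive(DIRECTORY, "directory"))  # True
print(is_recursive(EMAIL, "email"))          # False
```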
Global properties (cont.)
Chain of stars
An example:
entity → (name*, contact*, location*, phone*, fax*)
location → (city*, otherinfo?)
There is a chain of 2 stars: location* in entity, then city* in location.
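The slides give only an example of a chain of stars, so the following sketch rests on an assumed reading: a chain is a path through the declarations in which every step enters a child marked with “*”, and its length is the number of such steps. Two further assumptions, both open to revision, are that “+” is not counted and that starred edges do not form a cycle. The encoding and the name `longest_star_chain` are made up for this example.

```python
def longest_star_chain(dtd):
    """Length of the longest path that follows only starred children.

    dtd maps an element name to a list of (child name, is_starred) pairs.
    """
    memo = {}

    def chain_from(element, on_path):
        if element in on_path:
            raise ValueError("starred edges form a cycle")
        if element not in memo:
            best = 0
            for child, starred in dtd.get(element, []):
                if starred:
                    best = max(best, 1 + chain_from(child, on_path | {element}))
            memo[element] = best
        return memo[element]

    return max((chain_from(name, frozenset()) for name in dtd), default=0)

EXAMPLE = {
    "entity": [("name", True), ("contact", True), ("location", True),
               ("phone", True), ("fax", True)],
    "location": [("city", True), ("otherinfo", False)],
}

print(longest_star_chain(EXAMPLE))  # 2
```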
Global properties (cont.)
Hubs
Definition: The fan-in of an element name e is the cardinality of the set {e’ | e’ → α and e occurs in α}. An element name with a large fan-in value is called a hub.
An example:
email → (head, body)
head → (from, to+, cc*, subject)
from → (ε)
to → (ε)
cc → (ε)
subject → (pcdata)
body → (text, attachment*)
text → (pcdata)
attachment → (ε)
The fan-in value of the email element is 0, and the fan-in value of every other element in this DTD is 1.
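Fan-in can be computed by counting, for each element name, how many declarations mention it. The sketch below assumes each declaration has already been reduced to the set of element names in its content model, so a parent counts once per child however often the child occurs. The names used are illustrative.

```python
def fan_in(dtd):
    """Map each element name to the number of declarations that mention it."""
    counts = {name: 0 for name in dtd}
    for children in dtd.values():
        for child in children:          # children is a set: one count per parent
            counts[child] = counts.get(child, 0) + 1
    return counts

EMAIL = {
    "email": {"head", "body"},
    "head": {"from", "to", "cc", "subject"},
    "body": {"text", "attachment"},
    "from": set(), "to": set(), "cc": set(),
    "subject": set(), "text": set(), "attachment": set(),
}

for name, value in sorted(fan_in(EMAIL).items()):
    print(name, value)
```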
Global properties (cont.)
Result:
Fan-in of elements in data DTDs
Fan-in of elements in meta DTDs
Summary
Local properties: content model classification, syntactic complexity, determinism, ambiguity
Global properties: reachability, recursions, chain of stars, hubs
One drawback of this survey: it does not study any properties of attributes.