Modeling Languages for Semi-Structured Documents
Comparison and Translation between DML and Its Competitors
Yudan Zhai, Dep. of Informatique
06/08/2009
Outline
Project Introduction
XML Modeling in General
The Comparison of Schema Languages
The Development of a Schema Translator
Conclusion
Project Introduction
Goal of the project
  XML modeling in general
  Compare DML with other XML schema languages
  Build a translation tool between DML and Relax NG
Project Introduction
Milestones
MS1 - Related study
  Understand XML modeling in general
  DTD, XML Schema and Relax NG
  Compare these three schema languages
MS2 - In-depth study of DML
  YML, DML, DGL
  Comparison: DML vs. Relax NG / XML Schema / DTD
MS3 - Implementation
XML Modeling in General
XML stands for eXtensible Markup Language
Motivation: exchange information
Valid document
  The document should be readable and understandable by XML-aware software.
  Sets of rules and constraints are defined, specified by XML schema languages.
Four Schema Languages
DTD
  Document Type Definitions
  Can be defined inline
XML Schema
  Published by the W3C
  More expressive power
  Overly complex syntax
Four Schema Languages
Relax NG
  Standardized by OASIS
  Clean, simple and powerful
  Treats attributes like elements in content models
DML
  Document Modeling Language
  Is a regular tree grammar-based schema language
  Supports inheritance
Comparison of Schema Languages
The easiest syntax: DTD
Richest built-in data types: XML Schema
Simple yet powerful enough: Relax NG
Part of an integrated system: DML
The Development of a Schema Translator
Project Introduction
Language: Java
Development environment: JDK 5.0
Function:
  Converting from Relax NG to DML
  Converting from DML to Relax NG
The Development of a Schema Translator
Abstract syntax
ASN.1 (Abstract Syntax Notation One)
Standard and notation
Describes data structures
Implementation
Abstract syntax for Relax NG
Grammar ::= srt : Start ; def : Define
Start ::= top : Top
Define ::= name : Identifier ; elt : Element
Element ::= nc : NameClass ; top : Top
Top ::= na : NotAllowed | pattern : Pattern
Pattern ::= empty : Empty | nep : NonEmptyPattern
NonEmptyPattern ::= txt : Text | data : Data | value : Value | list : NGList
                  | att : NGAttribute | ref : Ref | oom : OneOrMore
                  | choice : Choice | group : Group | itl : Interleave
Text ::= <text/>
Data ::= type : Identifier ; dtl : URI
Value ::= dtl : URI ; type : Identifier ; ns : String ; content : String
List ::= pattern : Pattern
NGAttribute ::= name : String ; pattern : Pattern
Ref ::= name : Identifier
OneOrMore ::= nep : NonEmptyPattern
Choice ::= nep : NonEmptyPattern
Group ::= nep : NonEmptyPattern
Interleave ::= nep : NonEmptyPattern
NameClass ::= anyName : AnyName | nsName : NsName | name : Name
Identifier ::= S
Implementation
Abstract syntax for DML
SCHEMA ::= ns:NS* ; str:STRUCT* ; type:TYPE*
NS ::= id:ID ; uri:URI | ns:NS*
STRUCT ::= sim:SIMPLE | named:NAMED | der:DERIVED | str:STRUCT*
TYPE ::= id:ID ; pattr:PATTERN | type:TYPE*
SIMPLE ::= att:ATTRIBUTE ; cnt:CONTENT
CONTENT ::= item:ITEM | ref:REF
ITEM ::= seq:SEQ | choice:CHOICE | elt:ELT | txt:TXT | any:ANY | item:ITEM*
REF ::= qn:QNAME
SEQ ::= occ:OCC ; item:ITEM
CHOICE ::= occ:OCC ; item:ITEM
ELT ::= val:VAL ; occ:OCC ; sim:SIMPLE
ATTRIBUTE ::= anyatt:ANYATT | use:USE ; val:VAL* | att:ATTRIBUTE*
TXT ::= val:VAL ; occ:OCC ; B
ANY ::= occ:OCC ; sim:SIMPLE
ANYATT ::= use:USE ; val:VAL
OCC ::= 1 | ? | + | *
USE ::= 1 | ?
VAL ::= tref:TYPEREF | pattr:PATTERN | id:ID | APP | CPY
NAMED ::= id:ID ; sim:SIMPLE
ID ::= id:string
B ::= Boolean
Relax NG to DML
Architecture
Relax NG Tree Builder
Relax NG Tree Builder
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar>
  <start>
    <ref name="simple-elt"/>
  </start>
  <define name="simple-elt">
    <element>
      <name ns="">a</name>
      <attribute>
        <name ns="">id</name>
        <text/>
      </attribute>
    </element>
  </define>
</grammar>
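As a rough illustration of what a tree builder does with this example, the following Python sketch parses the grammar into nested tuples. The original tool was written in Java; the function name `build_tree` and the tuple layout are assumptions made for this sketch, not the project's API. The XML declaration is left out of the sample string for brevity.

```python
import xml.etree.ElementTree as ET

# The example grammar from the slide, in Relax NG simple syntax.
EXAMPLE = """<grammar>
  <start><ref name="simple-elt"/></start>
  <define name="simple-elt">
    <element>
      <name ns="">a</name>
      <attribute><name ns="">id</name><text/></attribute>
    </element>
  </define>
</grammar>"""


def build_tree(node):
    """Turn an XML element into a (tag, attributes, children, text) tuple."""
    children = [build_tree(child) for child in node]
    text = (node.text or "").strip()
    return (node.tag, dict(node.attrib), children, text)


tree = build_tree(ET.fromstring(EXAMPLE))
```

The root tuple corresponds to Grammar in the abstract syntax, and its two children correspond to Start and Define.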
Relax NG Tree Builder
Corresponding abstract tree
Converter (Relax NG to DML)
Basic rules:
Grammar -> Schema
Top -> SimpleStructure
  Attribute -> Attribute
  Reference -> Reference
  Other Pattern -> Item
    • Empty -> NULL
    • NonEmptyPattern -> Item
        Text -> Text
        Data, Value -> Value
        List, OneOrMore, Group -> SEQ
        Choice -> Choice
        Interleave -> Choice and SEQ
Define -> NamedStructure
Element -> SimpleStructure
  • NameClass -> Value
  • Top -> SimpleStructure
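The basic rules above amount to a lookup from a Relax NG construct to a DML construct. A minimal Python sketch of that lookup follows; the original tool was written in Java, and the names `BASIC_RULES` and `dml_kind` are hypothetical, chosen only for this illustration.

```python
# Relax NG construct -> DML construct, following the basic rules listed above.
# Keys are Relax NG simple-syntax tag names; values are the DML names from the slide.
BASIC_RULES = {
    "grammar": "Schema",
    "define": "NamedStructure",
    "element": "SimpleStructure",
    "name": "Value",              # NameClass -> Value
    "attribute": "Attribute",
    "ref": "Reference",
    "empty": None,                # Empty -> NULL (nothing is emitted)
    "text": "Text",
    "data": "Value",
    "value": "Value",
    "list": "SEQ",
    "oneOrMore": "SEQ",
    "group": "SEQ",
    "choice": "Choice",
    "interleave": "Choice and SEQ",
}


def dml_kind(relaxng_tag):
    """Return the DML construct for a Relax NG tag, or raise if no rule applies."""
    if relaxng_tag not in BASIC_RULES:
        raise ValueError(f"no basic rule for Relax NG construct: {relaxng_tag}")
    return BASIC_RULES[relaxng_tag]
```

A table like this keeps the one-to-one rules in one place, so only the special cases (such as references) need dedicated code.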
Converter (Relax NG to DML)
Rules
But Reference -> Reference?
Relax NG input:
<start>
  <oneOrMore>
    <group>
      <ref name="elt-a"/>
      <ref name="elt-b"/>
    </group>
  </oneOrMore>
</start>
<define name="elt-a">
  <element>
    <name ns="">a</name>
    <text/>
  </element>
</define>
<define name="elt-b">
  <element>
    <name ns="">b</name>
    <text/>
  </element>
</define>
DML with the references kept:
<Seq>
  <ref name="elt-a"/>
  <ref name="elt-b"/>
</Seq>
DML with the references replaced by the referenced elements:
<seq occ="many">
  <elt occ="once">
    <name content="a"/>
    <text occ="once" eol="false">
      <value type="string"/>
    </text>
  </elt>
  <elt occ="once">
    <name content="b"/>
    <text occ="once" eol="false">
      <value type="string"/>
    </text>
  </elt>
</seq>
Result
Relax NG input:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- TWO ELEMENTS -->
<grammar>
  <start>
    <oneOrMore>
      <group>
        <ref name="elt-a"/>
        <ref name="elt-b"/>
      </group>
    </oneOrMore>
  </start>
  <define name="elt-a">
    <element>
      <name ns="">a</name>
      <text/>
    </element>
  </define>
  <define name="elt-b">
    <element>
      <name ns="">b</name>
      <text/>
    </element>
  </define>
</grammar>
DML output:
<?xml version="1.0" encoding="UTF-8"?>
<yml>
  <seq occ="many">
    <elt occ="once">
      <name content="a"/>
      <text occ="once" eol="false">
        <value type="string"/>
      </text>
    </elt>
    <elt occ="once">
      <name content="b"/>
      <text occ="once" eol="false">
        <value type="string"/>
      </text>
    </elt>
  </seq>
</yml>
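The result above suggests that references are replaced by the elements they point to, and that the group inside oneOrMore is merged into a single repeated sequence. The Python sketch below reproduces that behaviour for this one example. It is not the original Java converter: the function name `convert` is hypothetical, only the patterns used in the example are covered, and recursive definitions are assumed not to occur (they would loop forever here).

```python
import xml.etree.ElementTree as ET

RELAXNG = """<grammar>
  <start>
    <oneOrMore>
      <group>
        <ref name="elt-a"/>
        <ref name="elt-b"/>
      </group>
    </oneOrMore>
  </start>
  <define name="elt-a"><element><name ns="">a</name><text/></element></define>
  <define name="elt-b"><element><name ns="">b</name><text/></element></define>
</grammar>"""


def convert(pattern, defines):
    """Convert one Relax NG pattern into a DML element (sketch, partial coverage)."""
    tag = pattern.tag
    if tag == "ref":
        # Replace the reference by the element it points to.
        return convert(defines[pattern.get("name")], defines)
    if tag == "element":
        elt = ET.Element("elt", {"occ": "once"})
        ET.SubElement(elt, "name", {"content": pattern.find("name").text})
        for child in pattern:
            if child.tag != "name":
                elt.append(convert(child, defines))
        return elt
    if tag == "text":
        text = ET.Element("text", {"occ": "once", "eol": "false"})
        ET.SubElement(text, "value", {"type": "string"})
        return text
    if tag in ("group", "oneOrMore"):
        seq = ET.Element("seq", {"occ": "many" if tag == "oneOrMore" else "once"})
        for child in pattern:
            item = convert(child, defines)
            if tag == "oneOrMore" and child.tag == "group":
                # Merge the inner once-sequence into the repeated one.
                seq.extend(list(item))
            else:
                seq.append(item)
        return seq
    raise ValueError(f"pattern not covered by this sketch: {tag}")


grammar = ET.fromstring(RELAXNG)
defines = {d.get("name"): d.find("element") for d in grammar.findall("define")}
yml = ET.Element("yml")
yml.append(convert(grammar.find("start")[0], defines))
print(ET.tostring(yml, encoding="unicode"))
```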
Reverse Conversion
Architecture
DML Tree Builder
Converter (DML to Relax NG)
Basic rules:
Schema -> Grammar
SimpleStructure -> Top
  ATT -> ATT
  CNT -> Pattern
    • Item -> Pattern
    • Ref -> Ref
NamedStructure -> Define
  SimpleStructure -> Top
Converter (DML to Relax NG)
Exception rules:
Simple structure contains Element
  Element -> Reference
  Add the new Define structure
Element contains Element
  Element -> Reference
  Add the new Define structure
Converter (DML to Relax NG)
Exception rules
DML input:
<yml version="1.0" type="dml">
  <elt>
    <name content="addressbook"/>
    <elt occ="many">
      <name content="contact"/>
      <ref name="contact-content"/>
    </elt>
  </elt>
  <structure>
  ……..
  </structure>
</yml>
Relax NG output:
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="addressbook-NC"/>
  </start>
  <define name="addressbook-NC">
    <element>
      <name ns="">addressbook</name>
      <oneOrMore>
        <ref name="contact-NC"/>
      </oneOrMore>
    </element>
  </define>
  <define>
  ……
  </define>
</grammar>
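The exception rule can be sketched as follows: every DML element gets its own define, and a nested element is emitted as a reference to that define. This Python sketch is not the original Java code. The names `to_define` and `dml_to_relaxng` are hypothetical, the "-NC" suffix is taken from the slide, and the input is a reduced version of the addressbook example, with a text child standing in for the elided structure. Two elements with the same name would collide in this sketch.

```python
import xml.etree.ElementTree as ET

DML = """<yml version="1.0" type="dml">
  <elt>
    <name content="addressbook"/>
    <elt occ="many">
      <name content="contact"/>
      <text occ="once"/>
    </elt>
  </elt>
</yml>"""


def to_define(elt, grammar):
    """Emit a <define> for a DML elt and return the name that references it."""
    tag_name = elt.find("name").get("content")
    define_name = tag_name + "-NC"
    define = ET.SubElement(grammar, "define", {"name": define_name})
    element = ET.SubElement(define, "element")
    ET.SubElement(element, "name", {"ns": ""}).text = tag_name
    for child in elt:
        if child.tag == "elt":
            # Exception rule: a nested element becomes a reference
            # to a new define instead of being emitted in place.
            ref = ET.Element("ref", {"name": to_define(child, grammar)})
            if child.get("occ") == "many":
                ET.SubElement(element, "oneOrMore").append(ref)
            else:
                element.append(ref)
        elif child.tag == "text":
            ET.SubElement(element, "text")
    return define_name


def dml_to_relaxng(dml_root):
    """Build a Relax NG grammar whose start pattern references the root element."""
    # The namespace is written as a plain attribute, which is enough for serialization.
    grammar = ET.Element("grammar", {"xmlns": "http://relaxng.org/ns/structure/1.0"})
    start = ET.SubElement(grammar, "start")
    ET.SubElement(start, "ref", {"name": to_define(dml_root.find("elt"), grammar)})
    return grammar


grammar = dml_to_relaxng(ET.fromstring(DML))
print(ET.tostring(grammar, encoding="unicode"))
```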
Converter (DML to Relax NG)
How to deal with Occurrence?
Many
  <oneOrMore> P </oneOrMore>
Free
  <choice>
    <oneOrMore> P </oneOrMore>
    <empty/>
  </choice>
Optional
  <choice> P <empty/> </choice>
Once
  P
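The occurrence mapping above can be written as a small wrapper around an already converted pattern P. This is a Python sketch rather than the original Java code. The function name `apply_occurrence` is hypothetical; "once" and "many" appear as `occ` values in the slides, while the lowercase spellings "optional" and "free" are assumptions.

```python
import xml.etree.ElementTree as ET


def apply_occurrence(pattern, occ):
    """Wrap a Relax NG pattern according to a DML occurrence value."""
    if occ == "once":
        return pattern
    if occ == "many":
        wrapper = ET.Element("oneOrMore")
        wrapper.append(pattern)
        return wrapper
    if occ == "optional":
        # Optional: choice between P and empty.
        choice = ET.Element("choice")
        choice.append(pattern)
        ET.SubElement(choice, "empty")
        return choice
    if occ == "free":
        # Free (zero or more): choice between oneOrMore(P) and empty.
        choice = ET.Element("choice")
        ET.SubElement(choice, "oneOrMore").append(pattern)
        ET.SubElement(choice, "empty")
        return choice
    raise ValueError(f"unknown occurrence: {occ}")


wrapped = apply_occurrence(ET.Element("ref", {"name": "contact-NC"}), "many")
print(ET.tostring(wrapped, encoding="unicode"))
```

Building "free" and "optional" from choice and empty stays within the Relax NG simple syntax, which has no zeroOrMore or optional elements.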
Result (DML -> Relax NG)
Addressbook.dml
Addressbook.rng
Conclusion
DML
  Is part of an integrated system for the management of semi-structured documents
  Has stronger expressive power than DTD
  Is less complex than XML Schema
  Is very comparable to Relax NG but provides an inheritance mechanism
Implementation
  Based on Abstract Syntax Notation
  Good extensibility of the program
  Limitations:
    Weak data type conversion
    Relax NG support is limited to the simple syntax only
    Inheritance in DML is not considered