Yves Marcoux - Balisage 2008

Graph characterization of overlap-only TexMECS and other overlapping markup formalisms

Yves MARCOUX

GRDS – EBSI, Université de Montréal

Visiting researcher, Universitetet i Bergen

Overview of the talk

1. Graph representations of structured documents

2. Overlap, in markup and structure

3. OO-TexMECS (N.B.: Overlap-Only ;-)

4. Node-ordered DAGs (noDAGs)

5. Results and consequences

6. Future work

1. Graph representations of structured documents

XML document = tree

<top> <a> <b/> </a> <c/> </top>

[Figure: tree with root top; top has children a and c; a has child b]

Embedding in markup corresponds to child-parent in tree

Any tree gives an XML document

<top> <a> <b/> </a> <c/> </top>

[Figure: tree with root top; top has children a and c; a has child b]

Any tree gives an XML document

<top> <a> <b/> <d><e/></d> </a> <c/> </top>

[Figure: tree with root top; top has children a and c; a has children b and d; d has child e]

Perfect isomorphism!
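
The tree-to-markup direction of this correspondence can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the (name, children) tuple representation and the function name to_xml are assumptions made for the example.

```python
# A tree node is a (name, children) pair, where children is an ordered list
# of nodes. Both this representation and the name to_xml are illustrative
# assumptions, not part of the talk.

def to_xml(node):
    """Serialize a tree: child-parent in the tree becomes embedding in markup."""
    name, children = node
    if not children:
        return f"<{name}/>"          # a leaf becomes an empty element
    inner = "".join(to_xml(child) for child in children)
    return f"<{name}>{inner}</{name}>"


# The tree from the slide: top has children a and c, a has children b and d,
# and d has child e.
tree = ("top", [("a", [("b", []), ("d", [("e", [])])]), ("c", [])])
print(to_xml(tree))
# prints <top><a><b/><d><e/></d></a><c/></top>
```

Because every node has exactly one parent, each element can always be written entirely inside its parent's tags, which is why no tree fails to serialize.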

Document Object Models

• DOMs are essentially graph representations of structured documents

• "Patched" for attributes, namespaces, etc.

• DOM manipulations = graph modifications

• It suffices to make sure that the graph remains a tree
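
As a rough illustration of the last bullet, the check "the graph remains a tree" could look like the sketch below. The dict-of-children representation and the name is_tree are assumptions for this example; real DOM implementations enforce the same invariant through their own APIs.

```python
# Graph as a dict: node -> ordered list of children (an assumed representation).

def is_tree(children):
    """Return True if the directed graph is a tree: exactly one root, every
    other node has exactly one parent, and every node is reachable from the
    root."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)

    parent_count = {n: 0 for n in nodes}
    for kids in children.values():
        for kid in kids:
            parent_count[kid] += 1

    roots = [n for n in nodes if parent_count[n] == 0]
    if len(roots) != 1 or any(count > 1 for count in parent_count.values()):
        return False

    # Walk down from the root; nodes on a detached cycle are never reached.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == nodes


document = {"top": ["a", "c"], "a": ["b"]}
print(is_tree(document))            # prints True

# A manipulation that also attaches b under c gives b two parents.
document["c"] = ["b"]
print(is_tree(document))            # prints False
```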

2. Overlap, in markup and structure

Problem of overlap

• In real life (outside of XML documents!), information is often not purely hierarchical

• Classical examples:
  – verse structure vs sentence structure
  – speech structure vs line structure

• In general: multiple structures applied (at least in part) to the same contents

Overlap

Two views of overlap

• Geometric view: overlapping markup

• Common contents view: non-tree graph structure

Example (markup)

(Peer) Hvorfor bande? (Åse) Tvi, du tør ej!
Alt ihob er tøv og tant!

<vers> <peer>Hvorfor bande?</peer><åse>Tvi, du tør ej!</vers>
<vers> Alt ihob er tøv og tant!</åse></vers>
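
The geometric view of overlap can be made concrete with a small scanner that reports which elements are closed while a later-opened element is still open. This is a sketch under simplifying assumptions (XML-like tags without attributes, no empty-element tags, no escaping); the name overlapping_pairs is hypothetical.

```python
import re

# Matches <name> and </name>; attributes and empty-element tags are not handled.
TAG = re.compile(r"<(/?)([^<>/\s]+)>")


def overlapping_pairs(markup):
    """Return (closed, still_open) name pairs: an element is closed while an
    element opened after it is still open."""
    open_elements = []      # names of currently open elements, in opening order
    pairs = []
    for match in TAG.finditer(markup):
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            open_elements.append(name)
            continue
        # Most recently opened element with this name
        # (raises ValueError if there is none).
        index = len(open_elements) - 1 - open_elements[::-1].index(name)
        # Everything opened after it and still open overlaps with it.
        pairs.extend((name, later) for later in open_elements[index + 1:])
        del open_elements[index]
    return pairs


markup = ("<vers> <peer>Hvorfor bande?</peer><åse>Tvi, du tør ej!</vers>"
          "<vers> Alt ihob er tøv og tant!</åse></vers>")
print(overlapping_pairs(markup))
# prints [('vers', 'åse'), ('åse', 'vers')]
```

For properly nested markup the function returns an empty list, since every end-tag then closes the most recently opened element.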

Example (graph)

(Peer) Hvorfor bande? (Åse) Tvi, du tør ej!
Alt ihob er tøv og tant!

[Figure: graph with element nodes vers, vers, peer and åse above the text nodes "Hvorfor bande?", "Tvi, du tør ej!" and "Alt ihob er tøv og tant!"]

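Embedding corresponds to child-parent

[Figure: graph with element nodes vers, vers, peer and åse above the text nodes "Hvorfor bande?", "Tvi, du tør ej!" and "Alt ihob er tøv og tant!", shown next to the markup]

<vers> <peer>Hvorfor bande?</peer><åse>Tvi, du tør ej!</vers>
<vers> Alt ihob er tøv og tant!</åse></vers>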
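
One way to read "embedding corresponds to child-parent" is sketched below: each node gets a (start, end) position in the serialized document, and its parents are the innermost nodes whose span strictly contains it. The labels vers1, vers2, t1, t2, t3, the position numbers and the function names are all hypothetical, and taking only the innermost containers as parents is an assumption of this sketch, not a definition from the talk.

```python
def embeds(outer, inner):
    """True if the span `outer` strictly contains the span `inner`."""
    return outer[0] < inner[0] and inner[1] < outer[1]


def parent_edges(spans):
    """Derive (parent, child) edges from embedding: the parents of a node are
    the innermost nodes that embed it."""
    edges = []
    for child, child_span in spans.items():
        containers = [n for n, s in spans.items() if embeds(s, child_span)]
        for parent in containers:
            # Skip a container that itself embeds another container:
            # it is an ancestor of the child, not a direct parent.
            if not any(embeds(spans[parent], spans[mid]) for mid in containers):
                edges.append((parent, child))
    return edges


# Token positions in the Peer Gynt markup (hypothetical numbering):
# <vers>0 <peer>1 t1=2 </peer>3 <åse>4 t2=5 </vers>6 <vers>7 t3=8 </åse>9 </vers>10
spans = {
    "vers1": (0, 6), "peer": (1, 3), "t1": (2, 2),
    "åse": (4, 9), "t2": (5, 5), "vers2": (7, 10), "t3": (8, 8),
}
for parent, child in sorted(parent_edges(spans)):
    print(parent, "->", child)
```

In this example the text node t2 ("Tvi, du tør ej!") ends up with two parents, vers1 and åse, and t3 with åse and vers2, so the result is a DAG rather than a tree.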

Still perfect isomorphism?

• Not for graphs in general... cycles!

• Maybe for acyclic graphs?

• Let's try more examples...

What if the last verse is said in chorus by Peer & Åse?

[Figure: the graph with element nodes vers, vers, peer and åse, changed so that the last verse "Alt ihob er tøv og tant!" belongs to both peer and åse]

Still acyclic, but no corresponding markup!

So, imperfect isomorphism

• Some acyclic graphs have corresponding documents (i.e., are serializable)

• Some (apparently) don't...

• Manipulations of the graph (DOM) may leave it non-serializable!

• Which acyclic graphs are serializable?

3. OO-TexMECS

TexMECS

• A particular proposal to address the overlap problem with overlapping markup

• MECS (Huitfeldt 1992-1996)
  – Multi-Element Code System

• TexMECS (Huitfeldt & Sperberg-McQueen 2003)
  – "Trivially extended MECS"

• Markup Languages for Complex Documents (MLCD) project

Overlap-only TexMECS

• TexMECS allows overlapping markup...

• but also much more:
  – virtual elements, interrupted elements, etc.

• OO-TexMECS 101
  – Start-tags: <a|
  – End-tags: |a>
  – Overlapping elements allowed
  – Natural notion of well-formedness
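
The slide does not spell out the "natural notion of well-formedness". The sketch below assumes one plausible reading: every end-tag closes a currently open element of the same name, in any order, and nothing is left open at the end. The function name is_well_formed and the tokenizing regular expression are assumptions, and escaping of "<" and "|" in text is ignored.

```python
import re
from collections import Counter

# Start-tag <name| or end-tag |name>; exactly one of the two groups is set.
TAG = re.compile(r"<([^<>|\s]+)\||\|([^<>|\s]+)>")


def is_well_formed(document):
    """Assumed well-formedness for overlap-only markup: each end-tag matches
    an open element of the same name, and every element is eventually closed."""
    open_count = Counter()
    for match in TAG.finditer(document):
        start_name, end_name = match.group(1), match.group(2)
        if start_name is not None:
            open_count[start_name] += 1
        elif open_count[end_name] == 0:
            return False                 # end-tag with no matching start-tag
        else:
            open_count[end_name] -= 1
    return not any(open_count.values())  # no element may remain open


verses = ("<vers|<peer|Hvorfor bande?|peer><åse|Tvi, du tør ej!|vers>"
          "<vers|Alt ihob er tøv og tant!|åse>|vers>")
print(is_well_formed(verses))        # prints True
print(is_well_formed("|a><a|"))      # prints False
```

Unlike an XML well-formedness check, no stack discipline is imposed: an end-tag may close any open element, which is exactly what allows overlap.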

4. Node-ordered DAGs (noDAGs)

Contents ordering

• In a serialized document, the contents appear in some order

• The order is often significant
  – Procedure steps, verses in a poem, etc.

• Thus, the order must be present in graph representations
  – XML: children ordered in the tree

Node-ordered DAGs

• Node-ordered directed acyclic graphs

• Essentially a DAG with children ordered

• Why "node-ordered" (and not "child-ordered")?
  – Provisions for further ordering of nodes
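
A minimal sketch of the "DAG with children ordered" reading: each node maps to an ordered list of its children, and the structure is valid only if it contains no directed cycle. The dict representation, the node labels and the name is_acyclic are assumptions for illustration; the further node ordering mentioned in the last bullet is not modelled here.

```python
def is_acyclic(children):
    """Depth-first search with three colours: meeting a node that is still
    'in progress' means a directed cycle exists."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, in progress, finished
    nodes = set(children) | {k for kids in children.values() for k in kids}
    colour = {n: WHITE for n in nodes}

    def visit(node):
        colour[node] = GREY
        for kid in children.get(node, []):
            if colour[kid] == GREY:
                return False          # back edge: cycle found
            if colour[kid] == WHITE and not visit(kid):
                return False
        colour[node] = BLACK
        return True

    return all(colour[n] != WHITE or visit(n) for n in nodes)


# The Peer Gynt example as a child-ordered DAG (hypothetical labels):
# text nodes t2 and t3 each have two parents.
nodag = {
    "vers1": ["peer", "t2"],
    "peer": ["t1"],
    "åse": ["t2", "t3"],
    "vers2": ["t3"],
}
print(is_acyclic(nodag))                        # prints True
print(is_acyclic({"a": ["b"], "b": ["a"]}))     # prints False
```

Using lists rather than sets for the children keeps the order of contents, which the graph representation has to carry.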

Examples

[Figure: example noDAGs; node labels A, B, C, D, E, F, G, H in the first and A, B, C, D, E in the second]

5. Results and consequences

Main results

1. A noDAG is serializable in OO-TexMECS iff it is completion-acyclic (essentially)

2. Any OO-TexMECS well-formed document can be obtained by serializing some completion-acyclic noDAG

3. You don't gain any expressivity by allowing node-ordering over and above children-ordering

Consequences

• We now know how to define a DOM for overlapping markup

• A DOM-based editor is complete: by result 2, every well-formed OO-TexMECS document is the serialization of some noDAG

• Round-tripping is possible between noDAGs and OO-TexMECS

• Results also apply to similar formalisms
  – TexMECS with more features, except virtual and interrupted elements

Completions

• Intuition: the combination of parent-child relationships and children ordering tells us about the relative positioning of tags in any possible serialization

• If contradictory information can be derived, the graph is not serializable

Example: "starts-before"

[Figure: example noDAG with nodes A, B, C, D, E, F, G, H, illustrating the "starts-before" relation]

Example: "ends-after"

[Figure: example noDAG with nodes A, B, C, D, E, F, G, H, illustrating the "ends-after" relation]

Contradiction = cycle!

[Figure: example noDAG with nodes A, B, C, D, E, F, G, H]

Completion-acyclic means each completion is acyclic
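
The slides do not define completions precisely, so the sketch below is not the construction from the talk. It only illustrates the "contradiction = cycle" idea with hypothetical rules: a node's start-tag precedes its end-tag, a child's tags lie inside its parent's, and an earlier sibling starts and ends before a later one. If these constraints form a cycle, no tag order can satisfy them. An acyclic result is not claimed to guarantee serializability; the name tag_order is also an assumption.

```python
from collections import defaultdict, deque


def tag_order(children):
    """Return one ordering of tag events consistent with the assumed
    constraints, or None if the constraints are contradictory (cyclic)."""
    before = defaultdict(set)        # event -> events that must come later
    events = set()

    def must_precede(first, second):
        before[first].add(second)
        events.update((first, second))

    nodes = set(children) | {k for kids in children.values() for k in kids}
    for node in nodes:
        must_precede(("start", node), ("end", node))
    for parent, kids in children.items():
        for kid in kids:                          # a child lies inside its parent
            must_precede(("start", parent), ("start", kid))
            must_precede(("end", kid), ("end", parent))
        for left, right in zip(kids, kids[1:]):   # earlier sibling comes first
            must_precede(("start", left), ("start", right))
            must_precede(("end", left), ("end", right))

    # Kahn's algorithm: repeatedly emit an event with no pending predecessor.
    indegree = {event: 0 for event in events}
    for later_events in before.values():
        for event in later_events:
            indegree[event] += 1
    ready = deque(sorted(e for e in events if indegree[e] == 0))
    order = []
    while ready:
        event = ready.popleft()
        order.append(event)
        for later in sorted(before[event]):
            indegree[later] -= 1
            if indegree[later] == 0:
                ready.append(later)
    # Events stuck on a cycle are never emitted.
    return order if len(order) == len(events) else None


# P orders X before Y, Q orders Y before X: contradictory information.
print(tag_order({"P": ["X", "Y"], "Q": ["Y", "X"]}))    # prints None
```

When the constraints are consistent, the returned list is one candidate order of start and end events; the talk's completion-acyclicity condition is stronger than this illustrative check.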

Future work

• Optimal algorithm for verification

• Optimal serialization algorithm

• Relaxing conditions on the graph

• Relationships with GODDAGs (Sperberg-McQueen & Huitfeldt 2004)

• And, yes... intertextual semantics of overlap!

Thank you!

Questions?
