http://www.text-technology.de/projects/sekimo.html

Text Technological Modelling of Information

Sekimo

Solutions mentioned by the TEI

CONCUR: an optional feature of SGML (not XML) that allows multiple hierarchies to be marked up concurrently in the same document

milestone elements: empty elements that mark the boundaries between elements in a non-nesting structure

fragmentation of an item: the division of a single element into two or more parts, each of which nests properly within its context

virtual joins: the re-creation of a virtual element from fragments of text

redundant encoding: information encoded in multiple forms

Problems with milestones

milestones are empty elements

milestone elements have no content

consequences:

no content model restriction can be stated by a document grammar

standard SGML/XML editors cannot annotate these regions

SGML/XML parsers cannot ensure proper nesting of the milestone elements

to process these regions by means of a style sheet is

more difficult (XSLT) or

impossible (CSS)
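
The parser limitation can be illustrated with a minimal sketch. It assumes the `start`/`end` milestone form with `id` and `coid` attributes; the sample document and the function name `unmatched_milestones` are hypothetical. The XML parser accepts the document because it is well-formed, so pairing has to be checked by separate application code:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the end milestone points at 'bar', not at 'foo'.
DOC = "<p><start gi='q' id='foo'/>some text<end gi='q' coid='bar'/></p>"

def unmatched_milestones(xml_text):
    """Report start/end milestones that have no partner."""
    root = ET.fromstring(xml_text)  # succeeds: the document is well-formed
    opened, problems = set(), []
    for el in root.iter():  # document order
        if el.tag == "start":
            opened.add(el.get("id"))
        elif el.tag == "end":
            ref = el.get("coid")
            if ref in opened:
                opened.discard(ref)
            else:
                problems.append(f"end without start: {ref}")
    problems.extend(f"start without end: {i}" for i in sorted(opened))
    return problems

print(unmatched_milestones(DOC))
```

The check lives outside the document grammar, which is the point made on this slide.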

CLIX/Horse-milestones

Differing types of milestones

<milestone type='start' gi='q' id='foo'/> … <milestone type='end' gi='q' coid='foo'/>

<start gi='q' id='foo'/> … <end gi='q' coid='foo'/>

CLIX

Non-XML:

<B>s<I>xyz</B>t</I>

Would be:

<B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/>
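
As a rough sketch of how such milestones might be processed, the following code recovers the overlapping ranges from a CLIX-style string modelled on the example above. The wrapping `doc` root element and the function name `clix_ranges` are assumptions made for illustration, not part of CLIX:

```python
import xml.etree.ElementTree as ET

def clix_ranges(xml_text):
    """Return the plain text and a list of (tag, start, end) character ranges."""
    root = ET.fromstring(xml_text)
    text = root.text or ""
    open_at = {}   # (tag, id) -> start offset
    ranges = []
    for el in root:
        pos = len(text)  # offset of this milestone in the plain text
        if "sID" in el.attrib:
            open_at[(el.tag, el.get("sID"))] = pos
        elif "eID" in el.attrib:
            start = open_at.pop((el.tag, el.get("eID")))
            ranges.append((el.tag, start, pos))
        text += el.tail or ""  # text following the empty element
    return text, ranges

text, ranges = clix_ranges(
    "<doc><B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/></doc>"
)
for tag, start, end in ranges:
    print(tag, repr(text[start:end]))
```

The two ranges overlap on "xyz", which is exactly what the non-XML form expresses with crossing tags.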

Problems with the other TEI-solutions

CONCUR:

(de facto) not implemented (and not part of XML)

fragmentation of an item:

results in 'containers' containing only a part of the text, e.g. a fragmented sentence or paragraph element would not contain an entire sentence or paragraph, as its name implies

virtual joins:

require a separate interpretation of the SGML document

redundant encoding:

results in multiple files

the files are not integrated in a larger unit

there is no unit containing all the information
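
The "separate interpretation" needed for fragmentation and virtual joins can be sketched as follows. The sample document, the plain `id` and `next` attributes used to chain fragments, and the function name `join_fragments` are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one quotation split into two fragments,
# each of which nests properly inside its line element.
DOC = """<text>
<l><q id="q1a" next="q1b">Where are you going,</q></l>
<l><q id="q1b">my friend?</q> he asked.</l>
</text>"""

def join_fragments(root, tag):
    """Rebuild virtual elements by following the 'next' chain of fragments."""
    frags = {el.get("id"): el for el in root.iter(tag)}
    continued = {el.get("next") for el in frags.values() if el.get("next")}
    joined = []
    for fid, el in frags.items():
        if fid in continued:
            continue  # not the head of a chain
        parts = []
        while el is not None:
            parts.append("".join(el.itertext()))
            nxt = el.get("next")
            el = frags.get(nxt) if nxt else None
        joined.append(" ".join(parts))
    return joined

print(join_fragments(ET.fromstring(DOC), "q"))
```

Neither `q` element contains the whole quotation; the complete unit only exists after this extra processing step.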

Stand-off annotation

new layers of annotation are added by building a new tree whose nodes are SGML elements that contain no textual content, only links to another layer

in some respects a generalization of the virtual joins (although not mentioned by the TEI), because

not only contents of elements are joined, but also ranges between points within the document

link base:

Distinction 1: markup already contained in an annotation layer vs. text content, addressed by character offsets

Distinction 2: one (dedicated) layer as the link target vs. (free) interlinking of several layers

andreas
Link base: 'markup already contained in an annotation layer' is most often used in practical applications; 'text content, addressed by character offsets' is what Thompson and McKelvie meant in 1997. In most of its applications, stand-off annotation makes use of one layer as the link target of the new tier, but it is also possible to link to several already existing layers ([Carletta et al. 2003]). Sometimes this new layer is included in the same document, and sometimes the layers are separated.
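
A minimal sketch of the character-offset variant of the link base is given below. The primary text, the `layer`/`seg` element names and the `start`/`end` attributes are hypothetical and chosen only for illustration:

```python
import xml.etree.ElementTree as ET

# The primary text stays untouched; the layer holds links only.
PRIMARY = "John loves Mary"

LAYER = """<layer name="syntax">
  <seg type="NP" start="0" end="4"/>
  <seg type="VP" start="5" end="15"/>
  <seg type="NP" start="11" end="15"/>
</layer>"""

def resolve(primary, layer_xml):
    """Replace each offset link by the stretch of primary text it addresses."""
    root = ET.fromstring(layer_xml)
    return [
        (seg.get("type"), primary[int(seg.get("start")):int(seg.get("end"))])
        for seg in root.iter("seg")
    ]

for seg_type, covered in resolve(PRIMARY, LAYER):
    print(seg_type, repr(covered))
```

The annotation elements themselves are empty; their meaning only emerges once the links are resolved against the primary text.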
Advantages of stand-off annotation

Thompson & McKelvie (1997)

the source document might be read-only

annotation files can be distributed without distributing the source text

Michael Glass & Barbara Di Eugenio (2002)

discontinuous segments of text can be combined in a single annotation

independent parallel coders can produce independent annotations

different annotation files can contain different layers of information

Pianta & Bentivogli (2004)

elegance and clarity

processing conceptually simple

andreas
The new layers can only be interpreted by reference to the layer(s) they point to.
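
Two of the listed advantages, discontinuous segments in a single annotation and independent layers over the same source, can be sketched as follows. The `Annotation` class, the sample sentence and the two layers are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    label: str
    spans: tuple  # one or more (start, end) character ranges

    def text(self, source):
        """Return the covered text, with ' ... ' between discontinuous parts."""
        return " ... ".join(source[s:e] for s, e in self.spans)

SOURCE = "She picked the book up"  # read-only; never modified below

# Two layers, as they might be produced by independent coders.
layer_a = [Annotation("phrasal-verb", ((4, 10), (20, 22)))]
layer_b = [Annotation("NP", ((11, 19),))]

for layer in (layer_a, layer_b):
    for ann in layer:
        print(ann.label, repr(ann.text(SOURCE)))
```

Each layer could be stored and distributed as a separate file, since it holds offsets only and none of the source text.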
Drawbacks of stand-off annotation

new layers require a separate interpretation

the layers, although separate, depend on each other

the information, although included, is difficult to access using generic methods

standard parsing or editing software cannot be employed

standard document grammars can only be used for the level containing both markup and textual data

linking at a sub-element range is difficult

the primary layer should be a (primary) level
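
The dependency between layers can be illustrated with a small sketch. It assumes offset-based links and a hypothetical helper `dangling_links`; a generic XML parser would not detect any of the problems it reports:

```python
def dangling_links(primary, spans):
    """Return the (start, end) links that no longer address a valid region."""
    n = len(primary)
    return [(s, e) for s, e in spans if not (0 <= s <= e <= n)]

spans = [(0, 4), (11, 15)]  # links created against the original text

original = "John loves Mary"
edited = "John loves Ann"   # the primary layer was changed afterwards

print(dangling_links(original, spans))
print(dangling_links(edited, spans))
```

After the edit, the second link points past the end of the text, and even links that remain in range may silently address different content.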

Non-SGML-based Markup Languages

some non-SGML-based markup languages have been proposed, e.g. Multi-Element Code System (MECS) or TexMECS

their major extension with respect to SGML and XML is that overlapping ranges are admitted within documents

in 2002 the Layered Markup and Annotation Language (LMNL) was proposed (Tennison and Piez 2002)

LMNL is a markup language that allows not only the annotation of overlapping elements but also the connection of element names to corresponding annotation levels

LMNL solves both problems, but

(full) LMNL is not SGML-based
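
What "overlapping ranges are admitted" means can be sketched with a simple range model. This is not LMNL or TexMECS syntax; the function name `relation` and the half-open `(start, end)` tuples are assumptions for illustration:

```python
def relation(a, b):
    """Classify two half-open character ranges (start, end)."""
    (a0, a1), (b0, b1) = a, b
    if a1 <= b0 or b1 <= a0:
        return "disjoint"
    if (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1):
        return "nested"       # expressible as an SGML/XML tree
    return "overlapping"      # needs a range-based model

print(relation((0, 4), (1, 5)))
print(relation((0, 5), (1, 3)))
print(relation((0, 2), (2, 4)))
```

A tree-based document can only hold the first two relations between elements; a range-based language admits all three.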
