http://www.text-technology.de/projects/sekimo.html
Text Technological Modelling of Information
Sekimo
Solutions mentioned by the TEI
CONCUR: an optional feature of SGML (not XML) that allows multiple hierarchies to be marked up concurrently in the same document
milestone elements: empty elements that mark the boundaries between elements in a non-nesting structure
fragmentation of an item: the division of a single element into two or more parts, each of which nests properly within its context
virtual joins: the re-creation of a virtual element from fragments of text
redundant encoding: information encoded in multiple forms
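Fragmentation and virtual joins can be illustrated together. The following is a minimal Python sketch under stated assumptions: a sentence is split into two fragments, each nesting properly inside its own paragraph, and the fragments are chained by a `next` attribute loosely modeled on the TEI's linking attributes. Plain `id` is used instead of `xml:id` to avoid namespace handling; the document, the attribute usage, and the `virtual_join` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one sentence fragmented across two paragraphs.
# Each fragment nests properly; the chain is expressed by a "next" pointer.
DOC = """<text>
<p><s id="s1a" next="s1b">The sentence starts here</s></p>
<p><s id="s1b">and ends in the next paragraph.</s></p>
</text>"""

def virtual_join(root, start_id):
    """Re-create the virtual element by following next-pointers from start_id."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    parts, seen = [], set()
    current = by_id.get(start_id)
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))  # guard against cyclic pointers
        parts.append("".join(current.itertext()))
        current = by_id.get(current.get("next"))  # None ends the chain
    return " ".join(parts)

root = ET.fromstring(DOC)
print(virtual_join(root, "s1a"))
```

Note that the join exists only in the processing code: the XML itself still has two `s` elements, which is why virtual joins need a separate interpretation of the document.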
Problems with milestones
milestones are empty elements
milestone elements have no content
consequences:
no content model restriction can be stated by a document grammar
standard SGML/XML editors cannot annotate these regions
SGML/XML parsers cannot ensure proper nesting of the milestone elements
processing these regions by means of a style sheet is
more difficult (XSLT) or
impossible (CSS)
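Because a parser only checks well-formedness, the pairing of milestones has to be verified by a separate program. A minimal Python sketch, assuming a hypothetical convention of empty `<start id="x"/>` and `<end coid="x"/>` elements: the parser accepts both documents below, and only the extra check reports the dangling milestone. Overlap between different regions is deliberately allowed, since marking overlap is what the milestones are for.

```python
import xml.etree.ElementTree as ET

# Hypothetical milestone convention: <start id=".."/> opens a region,
# <end coid=".."/> closes it. Both documents are well-formed XML.
GOOD = '<doc><start id="q1"/>some <start id="q2"/>quoted<end coid="q1"/> text<end coid="q2"/></doc>'
BAD = '<doc><start id="q1"/>dangling quote</doc>'

def check_milestones(xml_text):
    """Return a list of start/end pairing problems (empty if all pair up)."""
    root = ET.fromstring(xml_text)  # the parser itself only checks well-formedness
    open_ids, problems = set(), []
    for el in root.iter():  # document order
        if el.tag == "start":
            sid = el.get("id")
            if sid is None:
                problems.append("start without id")
                continue
            if sid in open_ids:
                problems.append(f"duplicate start: {sid}")
            open_ids.add(sid)
        elif el.tag == "end":
            cid = el.get("coid")
            if cid in open_ids:
                open_ids.discard(cid)
            else:
                problems.append(f"end without start: {cid}")
    problems.extend(f"start without end: {sid}" for sid in sorted(open_ids))
    return problems

for name, doc in [("GOOD", GOOD), ("BAD", BAD)]:
    print(name, check_milestones(doc))
```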
CLIX/Horse-milestones
Differing types of milestones
<milestone type='start' gi='q' id='foo'/> … <milestone type='end' gi='q' coid='foo'/>
<start gi='q' id='foo'/>...<end gi='q' coid='foo'/>
CLIX
Non-XML:
<B>s<I>xyz</B>t</I>
would be encoded in CLIX as:
<B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/>
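CLIX milestones can be mapped back to ranges over the character data, which makes the overlap explicit again. The following Python sketch walks the document in order, records the current text offset at each `sID` milestone, and closes the range at the matching `eID`. The function name `clix_to_ranges` and the `<root>` wrapper (needed so the fragment parses as XML) are assumptions for illustration, not part of CLIX.

```python
import xml.etree.ElementTree as ET

def clix_to_ranges(xml_text):
    """Convert CLIX-style sID/eID milestones into (tag, start, end) ranges.

    Offsets index the concatenated text content; `end` is exclusive.
    Illustrative sketch only, not a CLIX reference implementation.
    """
    root = ET.fromstring(xml_text)
    parts = []   # text content collected in document order
    opened = {}  # sID -> (tag, start offset)
    ranges = []

    def offset():
        return sum(len(p) for p in parts)

    def walk(el):
        if el.get("sID") is not None:
            opened[el.get("sID")] = (el.tag, offset())
        elif el.get("eID") is not None:
            tag, start = opened.pop(el.get("eID"))  # KeyError if unmatched
            if tag != el.tag:
                raise ValueError(f"eID {el.get('eID')} closes {tag} with {el.tag}")
            ranges.append((tag, start, offset()))
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            if child.tail:  # text following an empty milestone
                parts.append(child.tail)

    walk(root)
    return "".join(parts), sorted(ranges, key=lambda r: (r[1], r[2]))

# The CLIX example, wrapped in a root element so it parses as XML.
text, ranges = clix_to_ranges("<root><B sID='1'/>s<I sID='2'/>xyz<B eID='1'/>t<I eID='2'/></root>")
for tag, start, end in ranges:
    print(tag, start, end, repr(text[start:end]))
```

For the example, `B` covers `sxyz` and `I` covers `xyzt`, the same overlapping spans as in the non-XML original.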
Problems with the other TEI-solutions
CONCUR:
(de facto) not implemented (and not part of XML)
fragmentation of an item:
results in 'containers' holding only part of the text, e.g. a fragmented sentence or paragraph element does not contain an entire sentence or paragraph, as its name implies
virtual joins:
require a separate interpretation of the SGML document
redundant encoding:
results in multiple files that are not integrated into a larger unit; no single unit contains all the information
Stand-off annotation
new layers of annotation are added by building a new tree whose nodes are SGML elements which do not contain textual content, but links to another layer
in some respects a generalization of the virtual joins (although not mentioned by the TEI), because
not only contents of elements are joined, but also ranges between points within the document
link base:
Distinction 1: markup already contained in an annotation layer vs. text content, addressed by character offsets
Distinction 2: one (dedicated) layer as the link target vs. (free) interlinking of several layers
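Both distinctions of the link base can be shown in a minimal Python sketch. The primary text, the `Token` record, the layer names, and the helper functions are hypothetical: the token layer addresses the text content by character offsets, while the phrase layer links to elements of the token layer rather than to the text.

```python
from dataclasses import dataclass

# Primary text: treated as read-only; annotation layers never modify it.
PRIMARY = "The cat sat on the mat."

@dataclass(frozen=True)
class Token:
    id: str
    pos: str
    start: int  # character offset into PRIMARY (inclusive)
    end: int    # character offset into PRIMARY (exclusive)

# Layer 1 links to the text content via character offsets.
TOKENS = [
    Token("t1", "DET", 0, 3), Token("t2", "NOUN", 4, 7),
    Token("t3", "VERB", 8, 11), Token("t4", "ADP", 12, 14),
    Token("t5", "DET", 15, 18), Token("t6", "NOUN", 19, 22),
]

# Layer 2 links to markup of layer 1 (token ids), not to the text itself.
PHRASES = {"np1": ["t1", "t2"], "pp1": ["t4", "t5", "t6"]}

def token_text(token):
    return PRIMARY[token.start:token.end]

def phrase_text(phrase_id):
    """Resolve a layer-2 phrase through layer 1 down to the primary text."""
    by_id = {t.id: t for t in TOKENS}
    members = [by_id[t] for t in PHRASES[phrase_id]]
    start = min(t.start for t in members)
    end = max(t.end for t in members)
    return PRIMARY[start:end]

print([token_text(t) for t in TOKENS])
print(phrase_text("np1"), "|", phrase_text("pp1"))
```

Resolving a phrase requires interpreting two layers in sequence, and changing the token offsets silently changes what the phrases refer to.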
Advantages of stand-off annotation
Thompson & McKelvie (1997)
the source document might be read-only
annotation files can be distributed without distributing the source text
Glass & Di Eugenio (2002)
discontinuous segments of text can be combined in a single annotation
independent parallel coders can produce independent annotations
different annotation files can contain different layers of information
Pianta & Bentivogli (2004)
elegance and clarity
processing conceptually simple
Drawbacks of stand-off annotation
new layers require a separate interpretation
the layers, although separate, depend on each other
the information, although included, is difficult to access using generic methods
standard parsing or editing software cannot be employed
standard document grammars can only be used for the level containing both markup and textual data
linking at a sub-element range is difficult
the primary layer should be a (primary) level
Non SGML-based Markup Languages
some non-SGML-based markup languages have been proposed, e.g. the Multi-Element Code System (MECS) or TexMECS
their major extension with respect to SGML and XML is that overlapping ranges are admitted within documents
in 2002, the Layered Markup and Annotation Language (LMNL) was proposed (Tennison and Piez 2002)
LMNL is a markup language that not only allows overlapping elements to be annotated but also connects element names to corresponding annotation levels
LMNL solves both problems, but
(full) LMNL is not SGML-based
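To illustrate how a non-SGML syntax can admit overlapping ranges directly, here is a Python sketch of a parser for a heavily simplified subset modeled on LMNL's sawtooth tags (`[name}` opens a range, `{name]` closes it). Annotations, layers, and the rest of LMNL are ignored; pairing same-named ranges last-in-first-out is an assumption of this sketch, and the function name `parse_sawtooth` is hypothetical. The B/I example from the CLIX slide can be written without milestones as `[b}s[i}xyz{b]t{i]`.

```python
import re

# Simplified sawtooth tags: [name} starts a range, {name] ends it.
TAG = re.compile(r"\[(\w+)\}|\{(\w+)\]")

def parse_sawtooth(src):
    """Return (primary text, sorted list of (name, start, end) ranges)."""
    text, ranges, open_ranges, pos = [], [], {}, 0
    for m in TAG.finditer(src):
        text.append(src[pos:m.start()])  # character data before this tag
        offset = sum(len(t) for t in text)
        if m.group(1) is not None:  # start tag [name}
            open_ranges.setdefault(m.group(1), []).append(offset)
        else:  # end tag {name]; KeyError/IndexError if unmatched
            start = open_ranges[m.group(2)].pop()
            ranges.append((m.group(2), start, offset))
        pos = m.end()
    text.append(src[pos:])
    return "".join(text), sorted(ranges, key=lambda r: (r[1], r[2]))

print(parse_sawtooth("[b}s[i}xyz{b]t{i]"))
```

Unlike the XML encodings, no wrapper element and no milestones are needed: the ranges `b` and `i` simply overlap in the source.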