http://www.text-technology.de
Text Technological Modelling of Information
Information Modelling of Language and Text:
XML-based, multi-level, semantics-oriented
- Some methods (only) -
Research Group „Texttechnological Information Modelling“
University of Bielefeld: D. Gibbon MODELEX
D. Metzing SEKIMO
associated: J.-T. Milde Multimodal Corpora TASX
University of Dortmund: A. Storrer HYTEX
University of Giessen: H. Lobin SEMDOC
University of Tübingen: U. Mönnich COMOD
The TASX-Annotator: http://tasxforce.lili.uni-bielefeld.de/
Methodological issues: Multidimensionality of linguistic data requires:
(1) multiple tiers of annotation (XML-based)
(2) connections between multiple tiers (specific methods)
(3) multi-annotation of identical raw data (multiple trees)
(4) specific relations between multi-level annotations
(5) a distinction between one or more conceptual levels (semantic markup) and one or more annotation layers (syntactic markup), as well as mappings between the two
(6) ways to use and to generate different annotation sets (annotation + data) on the basis of more uniform conceptual representations, with the aim of making corpora accessible for search, hypothesis testing, and comparative or typological analysis
Semdoc: Annotation
<segment id="s24" parent="g6" newtopic="illustration_bck" litref="s" footnoteref="s33a">From the now infamous McDonald's coffee spill case to litigation against Ford and Firestone for injuries caused by tire tread separation to tobacco litigation, high stakes civil cases have become familiar staples of our media diet (see e.g., Are lawyers burning America, 1995; Budiansky, 1995; Church, 1986; Langley, 1986; Stossel, 1996).5 </segment>
<sect1> <para> ... From the now infamous McDonald's coffee spill case to litigation against Ford and Firestone for injuries caused by tire tread separation to tobacco litigation, high stakes civil cases have become familiar staples of our media diet (see e.g., Are lawyers burning America, 1995; Budiansky, 1995; Church, 1986; Langley, 1986; Stossel, 1996). <footnoteref linkend="i5">5</footnoteref> </para></sect1>
<segment id="i17" parent="i56" relname="span">From the now infamous McDonald's coffee spill case to litigation against Ford and Firestone for injuries caused by tire tread separation to tobacco litigation, high stakes civil cases have become familiar staples of our media diet</segment><segment id="i18" parent="i17" relname="evidence"> (see e.g., Are lawyers burning America, 1995; Budiansky, 1995; Church, 1986; Langley, 1986; Stossel, 1996).5</segment>
Annotation levels shown: structural (<sect1>/<para> markup), thematic (newtopic attribute), rhetorical (relname attribute)
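The idea of annotating identical raw data several times can be illustrated with a minimal Python sketch. It is not the SemDoc tooling: the text is abbreviated, the `<text>` wrapper around the two rhetorical segments and the helper name `primary_data` are assumptions introduced only for this example. The sketch strips the markup from each layer and checks that all layers share the same primary data.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-ins for the three annotation layers (element and attribute
# names follow the slide; the sentence is shortened for readability).
THEMATIC = ('<segment id="s24" parent="g6" newtopic="illustration_bck">'
            'High stakes civil cases have become familiar staples of our media diet '
            '(see e.g., Church, 1986).</segment>')
STRUCTURAL = ('<sect1><para>High stakes civil cases have become familiar staples '
              'of our media diet (see e.g., Church, 1986).</para></sect1>')
# The <text> wrapper is hypothetical; it only gives the two segments one root.
RHETORICAL = ('<text>'
              '<segment id="i17" parent="i56" relname="span">High stakes civil cases '
              'have become familiar staples of our media diet</segment>'
              '<segment id="i18" parent="i17" relname="evidence"> '
              '(see e.g., Church, 1986).</segment></text>')


def primary_data(xml_string):
    """Return the text of one annotation layer with markup removed
    and whitespace normalised."""
    root = ET.fromstring(xml_string)
    return " ".join("".join(root.itertext()).split())


layers = {"thematic": THEMATIC, "structural": STRUCTURAL, "rhetorical": RHETORICAL}
texts = {name: primary_data(xml) for name, xml in layers.items()}

# All layers annotate the same raw data if their stripped texts coincide.
same_primary_data = len(set(texts.values())) == 1

# The rhetorical layer additionally encodes relations between segments.
relations = [(seg.get("id"), seg.get("parent"), seg.get("relname"))
             for seg in ET.fromstring(RHETORICAL).iter("segment")]

print(same_primary_data)
print(relations)
```

Comparing the stripped text is a simple way to confirm that separate annotation trees can be related to one another through their shared character sequence.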
Sekimo: Multiple annotations of Japanese dialogue corpora
Annotation categories are based upon widely used tag-sets such as IPADIC (ChaSen).
The results of corpus analysis can be used to
- compare the tag-sets empirically,
- augment tag-sets with conceptual information,
- reuse existing corpora which are based upon the same tag-sets.
Sekimo: Sample Annotation
Example: Modeling of congruency in Japanese and German
German: Ich heiße Meier ("My name is Meier")
Japanese: watashi ha murano to moushimasu ("My name is Murano")
Types of congruency: lexical-pragmatic congruency, morpho-syntactic congruency
Conceptual differences of congruency are reflected in different configurations of annotations, related via secondary information structuring:
[Diagram: nodes General, Ja-Germ-1, Ja-Germ-2, Ja-Germ-3, Ja-1]
Labels in the diagram:
- verb has marker
- sentence has subject
- subject has marker
- two annotation units have marker
- verb and utterance have marker
Visualisation as SVG graphic
Sekimo: Example for mapping annotations <-> concepts
Annotations (three variants) and their mapping expressions:
<noun>watashi</noun>                                             noun
<word pos="noun">watashi</word>                                  word[@pos="noun"]
<word><feature>pos</feature><value>noun</value>watashi</word>    word[feature="pos" & value="noun"]
Concepts: NOUN, WORD, NOUN KOPULA
Transformation result: <NOUN>watashi</NOUN>
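The mapping from differently encoded annotations to one concept can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Sekimo implementation: the function names `concept_of` and `to_concept_markup` are hypothetical, and the sketch assumes that all three annotation variants map to the concept NOUN, as the transformation result on the slide suggests.

```python
import xml.etree.ElementTree as ET


def concept_of(element):
    """Map an annotation element to a concept name (hypothetical rules that
    mirror the three mapping expressions on the slide)."""
    if element.tag == "noun":                       # noun
        return "NOUN"
    if element.tag == "word":
        if element.get("pos") == "noun":            # word[@pos="noun"]
            return "NOUN"
        if (element.findtext("feature") == "pos"    # word[feature="pos" & value="noun"]
                and element.findtext("value") == "noun"):
            return "NOUN"
    return None


def to_concept_markup(xml_fragment):
    """Transform one annotated token into uniform concept markup."""
    element = ET.fromstring(xml_fragment)
    concept = concept_of(element)
    if concept is None:
        raise ValueError("no concept mapping for: " + xml_fragment)
    # The token is the element's own text plus any text following child
    # elements (in the third variant it follows </value>).
    token = (element.text or "") + "".join(child.tail or "" for child in element)
    result = ET.Element(concept)
    result.text = token.strip()
    return ET.tostring(result, encoding="unicode")


variants = [
    "<noun>watashi</noun>",
    '<word pos="noun">watashi</word>',
    "<word><feature>pos</feature><value>noun</value>watashi</word>",
]
results = [to_concept_markup(v) for v in variants]
for r in results:
    print(r)
```

Each variant should yield the same concept-level element, which is what makes corpora with different tag-set encodings comparable at the conceptual level.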
ModeLex: Temporal Calculus (Allen) for multimodal annotations
● Relations between annotation layers
● Can be applied to
- Text: Order is given by character sequence
- Signal: Order is given by timestamps
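As a minimal sketch of how Allen's interval relations could be computed for two annotation units, the following Python function takes two (start, end) pairs. The function name `allen_relation` and the tuple representation are assumptions made for this example; the same code works whether the positions are character offsets (text) or timestamps (signal).

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end. Positions may be
    character offsets or timestamps; only their order matters.
    """
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"


# Text: character offsets of two annotation units on different layers.
print(allen_relation((0, 7), (8, 12)))        # before
# Signal: timestamps in seconds.
print(allen_relation((0.0, 1.5), (1.5, 2.0)))  # meets
print(allen_relation((2, 5), (0, 10)))         # during
```

Because the function relies only on comparisons, it can relate units from any two annotation layers that share a common ordering of positions.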
Lexicon Model: Subclassification of annotation units, based upon temporal relations
Corpus
Classification hierarchy    Properties
class                       properties of class
subclass                    properties of subclass
subsubclass                 properties of subsubclass
HyTex: Multi-level approach
HyTex Corpus: Documents about the domain of „text technology“ (syntactic level)
- Linguistic annotation: POS tagging, lemmatization, chunk parsing
- Text-grammatical annotation: definitions and technical terms, topical and rhetorical structures
Domain-specific knowledge (semantic level)
- TermNet: representation of knowledge about terms and concepts of the domain, and of the semantic relations between its technical terms (in the style of WordNet)
User models (pragmatic level)
- User model (static or dynamic): fixed user profiles, or dynamic generation of links according to the history of previous usage
Adaptive generation of hypertext views based on coherence criteria
HyTex: Research Cooperation and Contacts
DFG-Forschergruppe 437 (DFG Research Group 437): Text Technological Modelling of Information
Text-grammatical foundations for the (semi)automated text-to-hypertext conversion (HyTex)
DEREKO: Corpus Technology at the University of Tübingen
- Chunk parser for the syntactic annotation of the HyTex corpus
WordNet Project, Princeton University; GermaNet Project, University of Tübingen
- Exchange of entities and relations for the TermNet model
TEMIS: Text Mining Solutions, Heidelberg/Paris
- Annotation schema for anaphoric and co-reference relations in German texts. Use of the text mining tool Knowledge Extractor for the annotation of definitions
Intelligent Views: Knowledge Management, Darmstadt
- Use of the tool „K-Infinity“, which supports the convenient construction and maintenance of TermNet
Sekimo: Project Context
SFB Mehrsprachigkeit (Collaborative Research Centre on Multilingualism), Hamburg: Jadex
- Japanese and German expert discourse in mono- and multilingual constellations
Secondary information structuring and comparative discourse analysis
DFG-Forschergruppe 437: Texttechnologische Informationsmodellierung (Text Technological Modelling of Information)
NITE: Natural Interactivity Tools Engineering
- University of Southern Denmark
- Universitat Autònoma de Barcelona
- DFKI Saarbrücken
- HCRC Edinburgh
- IMS Stuttgart
- ILC Pisa
Research Group „Texttechnological Information Modelling“
International Conference „Modeling Linguistic Information Resources“
Center for Interdisciplinary Research, University of Bielefeld
January 2004
(1) Semantics of Generic Document Structures and Discourse Parsing
(2) Modelling Textual, Lexical and World Knowledge as a Basis for Hypertext Linking
(3) Multiple Annotation of Language Data
(4) Multimodal Lexical Information for Language Documentation