Clemens Beckstein, Harald Sack, Heiko Peter, Friedrich-Schiller-Universität Jena, Germany
SAAW2006 - 1st Semantic Authoring and Annotation Workshop, Athens, GA, USA, November 6th 2006
Tags and Dependencies: An Integrated View of Document Annotation
Clemens Beckstein, Harald Sack, Heiko Peter, Institut für Informatik, FSU Jena, D-07743 Jena, Germany
Outline
● Introduction
● Types of Annotation and their Dependencies
○ Logical Structure
○ Conceptual Structure
○ Referential Structure
● The Structures in Concert
● Conclusion
● Searching the WWW today
○ document retrieval
○ keyword based search
[Figure: the user sends a query to a search engine; the search engine index holds keywords drawn from the document content (text, images, video, audio, …) plus metadata; the result is a set of documents to read]
● Metadata
○ document + metadata (semantic annotation)?
○ Solution 1: manual annotation. Problem: not efficient (expensive)
○ Solution 2: data mining and automatic annotation. Problem: domain dependent, unreliable, …
○ Both solutions alone are unsatisfying …
● There is already (often unused) metadata
○ index (conceptual knowledge)
○ TOC (structural knowledge)
○ references (referential knowledge)
○ together: the basis of semantic document annotation
● Searching the WWW tomorrow (?)
○ fact retrieval (or at least extended document retrieval)
○ content based search
[Figure: the user sends a query to a personal search agent, which applies reasoning and data mining to documents with their dependency structure plus metadata and returns the answer]
● Documents, Tags, and Annotations
<b> Lorem</b> ipsum dolor sit amet, <br/>consectetuer adipiscing elit. <br/> <a href=“……“ title=“..“/>Sed orci purus, semper eget, <br/>tristique quis, adipiscing <br/><!--<rdf:annotation user=“…“ tag=“…“…/> posuere, erat. Aenean <br/> ultricies odio id sem.Sed <br/><h1> nec felis sit ametante </h1>tempor sagittis. Vestibulum <br/>est nunc, lobortis cursus, <br/>semper vel, pulvinar sed, <br/> odio. Vestibulum blandit…
○ a document consists of strings and annotations
○ strings: smallest addressable document unit
○ annotations: associate distinguished document parts with metadata
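The document model on this slide can be sketched as a small data structure. The Python below is only an illustration under assumptions that are not in the talk: units are addressed by integer position, an annotation covers a contiguous range of units, and the names `Annotation`, `Document`, `annotate`, and `annotations_at` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Metadata attached to a contiguous run of document units (assumed model)."""
    start: int   # index of the first covered unit (inclusive)
    end: int     # index of the last covered unit (inclusive)
    key: str     # kind of metadata, e.g. "format" or "structure"
    value: str   # the metadata itself


@dataclass
class Document:
    """A document as a sequence of smallest addressable units plus annotations."""
    units: list[str]
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, start: int, end: int, key: str, value: str) -> None:
        if not (0 <= start <= end < len(self.units)):
            raise ValueError("annotation range lies outside the document")
        self.annotations.append(Annotation(start, end, key, value))

    def annotations_at(self, position: int) -> list[Annotation]:
        """Return every annotation whose range covers the given unit."""
        return [a for a in self.annotations if a.start <= position <= a.end]


# Hypothetical usage: words are the smallest units, as in the book example.
doc = Document("Lorem ipsum dolor sit amet".split())
doc.annotate(0, 0, "format", "bold")
doc.annotate(0, 4, "structure", "sentence")
covering_first_word = [a.value for a in doc.annotations_at(0)]
covering_last_word = [a.value for a in doc.annotations_at(4)]
```

Keeping annotations separate from the units, rather than inline as markup, is one possible design choice; it lets several overlapping annotations refer to the same units.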
● Documents, Tags, and Annotations
○ Examples
book
• smallest document unit: word
• higher order units: sentence, paragraph, page, chapter, part, …
video
• smallest document unit: pixel
• higher order units: blocks, macro blocks, slices, frames, objects, scenes, acts, …
● Logical Document Structure
○ Structural tags
● can be specified
○ explicitly (structural information) or
○ implicitly (formatting information)
● can be associated with names/titles
● can be used for document navigation
[Figure: document units 1 to 16 grouped into Paragraphs 1.1, 1.2, 1.3, 2.1, and 2.2, which are grouped into Chapter 1 and Chapter 2]
● Logical Document Structure
○ Table of Contents (TOC) from structural tags
[Figure: pages 1 to 15 grouped into sections 1.1 to 1.3 of chapter 1 (Basic Description Logics) and sections 2.1 and 2.2 of chapter 2 (Complexity of Reasoning)]
1. Basic Description Logics 1
1.1 Introduction 1
1.2 Definition of the basic formalism 5
1.3 Reasoning algorithms 7
2. Complexity of Reasoning 11
2.1 Introduction 11
2.2 OR-Branching: finding a model 12
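How a TOC might be derived from structural tags can be sketched in a few lines of Python. This is an illustration rather than the authors' method: it assumes the structural tags arrive in document order as `(level, title, start_page)` tuples, and the function name `build_toc` is hypothetical. The sample data are the entries shown on the slide.

```python
def build_toc(tags):
    """Turn (level, title, start_page) structural tags into numbered TOC lines."""
    counters = []   # current section number at each nesting level
    lines = []
    for level, title, page in tags:
        if level < 1 or level > len(counters) + 1:
            raise ValueError(f"heading level {level} skips a level")
        # Forget counters of deeper levels, then advance or open this level.
        del counters[level:]
        if len(counters) == level:
            counters[-1] += 1
        else:
            counters.append(1)
        number = ".".join(str(n) for n in counters)
        indent = "  " * (level - 1)
        lines.append(f"{indent}{number} {title} {page}")
    return lines


structural_tags = [
    (1, "Basic Description Logics", 1),
    (2, "Introduction", 1),
    (2, "Definition of the basic formalism", 5),
    (2, "Reasoning algorithms", 7),
    (1, "Complexity of Reasoning", 11),
    (2, "Introduction", 11),
    (2, "OR-Branching: finding a model", 12),
]
toc = build_toc(structural_tags)
```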
● Conceptual Document Structure
○ Can be considered as a kind of ontological skeleton
○ Covers concepts of the document and their relationships
○ Using implicitly given conceptual structure requires understanding of document content
○ Explicitly given conceptual structure (only a small fraction of the entire conceptual structure) can be defined by
● document author (e.g., index entries, external metadata)
● document users (e.g., social tagging)
○ The conceptual document structure can also be used for document navigation
● Conceptual Document Structure
○ Using explicitly given conceptual document structure together with logical document structure to define the document index
[Figure: concept graph with the nodes rodent, field mouse, habitat, dentition, incisor, rotation of teeth, root, meadow vole, prairie vole, beaver, and hamster, connected by edges labeled SUB and one edge labeled SEA]
● Conceptual Document Structure
[Figure: the concept graph (conceptual structure; nodes rodent, field mouse, habitat, dentition, incisor, rotation of teeth, root, meadow vole, prairie vole, beaver, hamster; edges labeled SUB and SEA) linked to the logical structure (document units 1 to 16 grouped into Paragraphs 1.1, 1.2, and 1.3)]
● Conceptual Document Structure
○ Document Index
rodent, 1
beaver, 10, 11
dentition
incisor, 4
rotation of teeth, 5
hamster, 2 - 4; see also meadow vole
…
field mouse, 13, 15
prairie vole, 16
meadow vole, 16
habitat, 15; see also rodent
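Combining conceptual annotations with the logical structure to produce such an index can be sketched as an inversion of a unit-to-concepts mapping. The Python below is a simplified illustration under stated assumptions: concept annotations are given per document unit, the index is flat (the SUB hierarchy and the "see also" entries are not modeled), and the names `collapse` and `build_index` are hypothetical. The sample data follow the page numbers on the slide.

```python
from collections import defaultdict


def collapse(pages):
    """Render page numbers, merging runs of three or more consecutive pages."""
    pages = sorted(set(pages))
    parts = []
    i = 0
    while i < len(pages):
        j = i
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{pages[i]}-{pages[j]}")
        else:
            parts.extend(str(p) for p in pages[i:j + 1])
        i = j + 1
    return ", ".join(parts)


def build_index(unit_concepts):
    """Invert a unit -> concepts mapping into alphabetically sorted index lines."""
    occurrences = defaultdict(list)
    for unit, concepts in unit_concepts.items():
        for concept in concepts:
            occurrences[concept].append(unit)
    return [f"{concept}, {collapse(units)}"
            for concept, units in sorted(occurrences.items())]


unit_concepts = {
    1: ["rodent"],
    2: ["hamster"],
    3: ["hamster"],
    4: ["hamster", "incisor"],
    5: ["rotation of teeth"],
    10: ["beaver"],
    11: ["beaver"],
    13: ["field mouse"],
    15: ["field mouse", "habitat"],
    16: ["prairie vole", "meadow vole"],
}
index_lines = build_index(unit_concepts)
```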
● Referential Document Structure
○ Internal links: references between parts of the same document, e.g., see / see also, footnotes, figures, comments, …
○ External links: references between different documents, e.g., bibliographic references and citations, …
○ Only a fraction of the entire referential document structure is given explicitly
○ Graph visualization (link graph)
○ Together with the logical document structure: table of figures, references, …
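The distinction between internal and external links, and the link graph built from them, can be sketched as follows. This is an illustration only: it assumes every reference endpoint is written as a `document#unit` address, and the function names `document_of` and `build_link_graph` as well as the document identifiers `book` and `paper` are made up for the example.

```python
from collections import defaultdict


def document_of(address):
    """Return the document part of a 'document#unit' address."""
    return address.split("#", 1)[0]


def build_link_graph(references):
    """Build adjacency lists and separate internal from external links."""
    graph = defaultdict(list)
    internal, external = [], []
    for source, target in references:
        graph[source].append(target)
        if document_of(source) == document_of(target):
            internal.append((source, target))
        else:
            external.append((source, target))
    return dict(graph), internal, external


references = [
    ("book#p2", "book#p16"),    # a "see also" within the same document
    ("book#p15", "book#p1"),    # another internal cross reference
    ("book#p11", "paper#p1"),   # a citation of a different document
]
graph, internal, external = build_link_graph(references)
```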
● The Structures in Concert
[Figure: the conceptual structure (concept graph with the nodes rodent, field mouse, habitat, dentition, incisor, rotation of teeth, root, meadow vole, prairie vole, beaver, hamster; edges labeled SUB), the logical structure (document units 1 to 16 grouped into Paragraphs 1.1, 1.2, and 1.3), and the referential structure shown together]
[Figure, second view: the same conceptual, logical, and referential structures, with the concepts dentition, beaver, and hamster set apart from the rest of the concept graph]
● The Structures in Concert
○ All three structures in concert can be used for
● Document reading tours (extended document retrieval)
○ goal oriented selections of documents (what is mandatory to understand the topic under consideration?)
○ with additional reading directions (which document unit to read in what order)
○ by also considering user annotations, personalized reading tours can be suggested (dependent on prior knowledge of the user)
● Collaborative authoring (avoiding ambiguities or duplicates, supporting index generation and cross referencing, …)
● Computing answers … (with the help of sophisticated reasoning and additional means of data mining and content understanding)
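One way to picture a reading tour is as a topological ordering of the units a goal depends on. The Python below is a sketch, not the approach of the talk: it assumes dependencies are available as a mapping from each unit to its prerequisite units, it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later), and the function name `reading_tour` and the sample dependencies are invented for illustration.

```python
from graphlib import TopologicalSorter


def reading_tour(prerequisites, goal, known=frozenset()):
    """Order the units needed to understand `goal`, skipping units already known."""
    # Collect the goal and everything it transitively depends on.
    needed, stack = set(), [goal]
    while stack:
        unit = stack.pop()
        if unit in needed or unit in known:
            continue
        needed.add(unit)
        stack.extend(prerequisites.get(unit, ()))
    # Sort only the needed units so that prerequisites come first.
    sub = {u: [p for p in prerequisites.get(u, ()) if p in needed] for u in needed}
    return list(TopologicalSorter(sub).static_order())


# Made-up dependencies between sections, for illustration only.
prerequisites = {
    "2.2": ["2.1", "1.3"],
    "2.1": ["1.1"],
    "1.3": ["1.2"],
    "1.2": ["1.1"],
}
tour = reading_tour(prerequisites, "2.2")
personal_tour = reading_tour(prerequisites, "2.2", known={"1.1", "1.2"})
```

The `known` parameter stands in for prior knowledge taken from user annotations: units the reader already knows are dropped, which shortens the personalized tour. Cyclic dependencies make `static_order` raise `graphlib.CycleError`.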
● Conclusion
○ Documents have intrinsic logical, conceptual, and referential characteristics
○ There are complex dependencies among the document structures carrying those characteristics
○ Logical, conceptual, and referential structures along with their interdependencies should be made explicit (metadata)
○ Applications should maintain and use those metadata, e.g. for
● authoring
● navigation
● searching
● Related Work: Topic Maps
○ Topic Maps represent concepts and relationships (conceptual structure and relational structure)
○ Topic Maps do not include the logical document structure
[Figure: a Topic Map with the topics beaver, dentition, and rodent, a "part of" association with the roles part and whole, and the resources 1, 10, and 11; labeled elements: topic, type, association, association type, role]