Tags and Dependencies: An Integrated View of Document Annotation

Clemens Beckstein, Harald Sack, Heiko Peter
Friedrich-Schiller-Universität Jena, Germany

SAAW2006 - 1st Semantic Authoring and Annotation Workshop
Athens, GA, USA, November 6th 2006

Clemens Beckstein, Harald Sack, Heiko Peter, Institut für Informatik, FSU Jena, D-07743 Jena, Germany

Outline
  ● Introduction
  ● Types of Annotation and their Dependencies
    ○ Logical Structure
    ○ Conceptual Structure
    ○ Referential Structure
  ● The Structures in Concert
  ● Conclusion

● Searching the WWW today
  ○ document retrieval
  ○ keyword based search

[Figure: the user sends a query to a search engine and receives document(s) to read as the result. The search engine index holds keywords (+ metadata) drawn from the document content: text, images, video, audio, …]

● Metadata

[Figure: document + metadata ? (semantic annotation)]

Solution 1: manual annotation
Problem: not efficient (expensive)

Solution 2: data mining and automatic annotation
Problem: domain dependent, unreliable, …

Both solutions alone are unsatisfying …

● There is already (often unused) Metadata
  ○ index (conceptual knowledge)
  ○ TOC (structural knowledge)
  ○ references (referential knowledge)

basis of semantic document annotation

● Searching the WWW tomorrow (?)
  ○ fact retrieval (or at least extended document retrieval)
  ○ content based search

[Figure: the user sends a query to a personal search agent and receives an answer. The agent works on document(s) with their dependency structure (+ metadata) and arrives at the answer by reasoning and data mining.]

● Documents, Tags, and Annotations

    <b> Lorem</b> ipsum dolor sit amet, <br/>
    consectetuer adipiscing elit. <br/>
    <a href=“……“ title=“..“/>Sed orci purus, semper eget, <br/>
    tristique quis, adipiscing <br/>
    <!--<rdf:annotation user=“…“ tag=“…“…/> posuere, erat. Aenean <br/>
    ultricies odio id sem. Sed <br/>
    <h1> nec felis sit amet ante </h1>
    tempor sagittis. Vestibulum <br/>
    est nunc, lobortis cursus, <br/>
    semper vel, pulvinar sed, <br/>
    odio. Vestibulum blandit…

  ○ a document consists of strings and annotations
  ○ string: smallest addressable document unit
  ○ annotations associate distinguished document parts with metadata

● Documents, Tags, and Annotations
  ○ Examples

book
  • smallest document unit: word
  • higher order units: sentence, paragraph, page, chapter, part, …

video
  • smallest document unit: pixel
  • higher order units: blocks, macro blocks, slices, frames, objects, scenes, acts, …
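The document model described on these slides can be sketched in code. The following is a minimal illustration only, assuming a book whose smallest addressable unit is the word; the names `Annotation`, `Document`, `annotate` and `annotations_at` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Associates a range of document units with metadata (hypothetical model)."""
    start: int   # index of the first annotated unit (inclusive)
    end: int     # index of the last annotated unit (inclusive)
    kind: str    # e.g. "logical", "conceptual", "referential"
    value: str   # the metadata itself, e.g. a title or a concept name


@dataclass
class Document:
    units: list[str]  # smallest addressable units, here: words
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, start: int, end: int, kind: str, value: str) -> None:
        """Attach metadata to the units start..end."""
        if not (0 <= start <= end < len(self.units)):
            raise ValueError("annotation range lies outside the document")
        self.annotations.append(Annotation(start, end, kind, value))

    def annotations_at(self, index: int) -> list[Annotation]:
        """All annotations whose range covers the unit at `index`."""
        return [a for a in self.annotations if a.start <= index <= a.end]


# Eight words act as the document units; the annotation values are made up.
doc = Document("Lorem ipsum dolor sit amet consectetuer adipiscing elit".split())
doc.annotate(0, 4, "logical", "sentence 1")
doc.annotate(0, 0, "conceptual", "placeholder text")
covering = [a.value for a in doc.annotations_at(0)]
```

Higher order units (sentence, paragraph, chapter) are then simply annotations over longer ranges of the same smallest units.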

● Logical Document Structure
  ○ Structural tags
    ● can be specified
      ○ explicitly (structural information) or
      ○ implicitly (formatting information)
    ● can be associated with names/titles
    ● can be used for document navigation

[Figure: document units 1 to 16 grouped into Paragraphs 1.1, 1.2, 1.3, 2.1 and 2.2, which are in turn grouped into Chapter 1 and Chapter 2]

● Logical Document Structure
  ○ Table of Contents (TOC) from structural tags

[Figure: pages 1 to 15 spanned by the sections 1.1 Introduction, 1.2 Definition of the basic formalism, 1.3 Reasoning Algorithms, 2.1 Introduction and 2.2 OR-Branching: finding a model, grouped into the chapters 1. Basic Description Logics and 2. Complexity of Reasoning]

1. Basic Description Logics 1
   1. Introduction 1
   2. Definition of the basic formalism 5
   3. Reasoning algorithms 7
2. Complexity of Reasoning 11
   1. Introduction 11
   2. OR-Branching: finding a model 12
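Deriving a table of contents from structural tags can be illustrated with a short sketch. It assumes the structural tags have already been parsed into a tree of titled sections with start pages; the `Section` class and the `toc_lines` function are hypothetical names, and the titles and page numbers are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A titled logical unit with its start page (hypothetical model)."""
    title: str
    start_page: int
    children: list["Section"] = field(default_factory=list)


def toc_lines(sections, indent=0):
    """Walk the section tree depth-first and emit one TOC line per section."""
    lines = []
    for number, section in enumerate(sections, start=1):
        lines.append(f"{'  ' * indent}{number}. {section.title} {section.start_page}")
        lines.extend(toc_lines(section.children, indent + 1))
    return lines


book = [
    Section("Basic Description Logics", 1, [
        Section("Introduction", 1),
        Section("Definition of the basic formalism", 5),
        Section("Reasoning algorithms", 7),
    ]),
    Section("Complexity of Reasoning", 11, [
        Section("Introduction", 11),
        Section("OR-Branching: finding a model", 12),
    ]),
]
toc = toc_lines(book)
print("\n".join(toc))
```

Numbering is local to each level, as on the slide, so both chapters contain a section numbered 1.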

● Conceptual Document Structure
  ○ Can be considered as a kind of ontological skeleton
  ○ Covers concepts of the document and their relationships
  ○ Using implicitly given conceptual structure requires understanding of document content
  ○ Explicitly given conceptual structure (only a small fraction of the entire conceptual structure) can be defined by
    ● document author (e.g., index entries, external metadata)
    ● document users (e.g., social tagging)
  ○ The conceptual document structure can also be used for document navigation

● Conceptual Document Structure
  ○ Using explicitly given conceptual document structure together with logical document structure to define the document index

[Figure: concept graph with the nodes root, rodent, field mouse, meadow vole, prairie vole, beaver, hamster, habitat, dentition, incisor and rotation of teeth, connected by SUB and SEA edges]

● Conceptual Document Structure

[Figure: the conceptual structure (concept graph with the nodes root, rodent, field mouse, meadow vole, prairie vole, beaver, hamster, habitat, dentition, incisor and rotation of teeth; SUB and SEA edges) shown together with the logical structure (document units 1 to 16 grouped into Paragraphs 1.1, 1.2 and 1.3)]

● Conceptual Document Structure

Document Index

rodent, 1
beaver, 10, 11
dentition
incisor, 4
rotation of teeth, 5
hamster, 2 - 4
see also meadow vole
field mouse, 13, 15
prairie vole, 16
meadow vole, 16
habitat, 15
see also rodent
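How an index can be computed from conceptual annotations and the page structure is sketched below. This is an illustration under assumptions, not the authors' procedure: it takes (concept, page) occurrences as input, produces a flat alphabetical index rather than a nested one, and writes runs of three or more consecutive pages as a range. The function names `format_pages` and `build_index` are hypothetical; the sample entries are a subset of those on the slide.

```python
def format_pages(pages):
    """Render page numbers in order; runs of three or more become 'a - b'."""
    pages = sorted(set(pages))
    parts, i = [], 0
    while i < len(pages):
        j = i
        # Extend j to the end of the run of consecutive pages starting at i.
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{pages[i]} - {pages[j]}")
        else:
            parts.extend(str(p) for p in pages[i:j + 1])
        i = j + 1
    return ", ".join(parts)


def build_index(occurrences, see_also=None):
    """Group (concept, page) pairs into alphabetically sorted index lines."""
    see_also = see_also or {}
    pages_by_concept = {}
    for concept, page in occurrences:
        pages_by_concept.setdefault(concept, []).append(page)
    lines = []
    for concept in sorted(pages_by_concept):
        lines.append(f"{concept}, {format_pages(pages_by_concept[concept])}")
        for target in see_also.get(concept, []):
            lines.append(f"  see also {target}")
    return lines


occurrences = [
    ("hamster", 2), ("hamster", 3), ("hamster", 4),
    ("beaver", 10), ("beaver", 11),
    ("field mouse", 13), ("field mouse", 15),
    ("habitat", 15),
]
index_lines = build_index(
    occurrences,
    see_also={"hamster": ["meadow vole"], "habitat": ["rodent"]},
)
print("\n".join(index_lines))
```

The page lists come from the logical structure (where an annotated unit sits), while the entries and "see also" links come from the conceptual structure.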

● Referential Document Structure
  ○ Internal links: references between parts of the same document, e.g., see / see also, footnotes, figures, comments, …
  ○ External links: references between different documents, e.g., bibliographic references and citations, …
  ○ Only a fraction of the entire referential document structure is given explicitly
  ○ Graph Visualization (Link Graph)
  ○ Together with the logical document structure: table of figures, references, …
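A link graph over the referential structure can be sketched as follows. The representation is an assumption made for illustration: each end of a link is written as a hypothetical "document#unit" identifier, so that internal and external links can be told apart by comparing the document parts. The `Link` class and `link_graph` function are not taken from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One explicit reference between two document parts (hypothetical model)."""
    source: str  # "document#unit" identifier of the referring part
    target: str  # "document#unit" identifier of the referenced part
    label: str   # e.g. "see also", "footnote", "citation"

    @property
    def is_internal(self) -> bool:
        """True if both ends of the link lie in the same document."""
        return self.source.split("#")[0] == self.target.split("#")[0]


def link_graph(links):
    """Build an adjacency mapping: source part -> set of referenced parts."""
    graph = defaultdict(set)
    for link in links:
        graph[link.source].add(link.target)
    return dict(graph)


# Made-up identifiers: two internal "see also" links and one external citation.
links = [
    Link("book#p2", "book#p16", "see also"),
    Link("book#p15", "book#p1", "see also"),
    Link("book#p7", "paper#p1", "citation"),
]
graph = link_graph(links)
internal_flags = [link.is_internal for link in links]
```

Such an adjacency mapping is the kind of input a graph visualization of the links would need.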

● The Structures in Concert

[Figure: conceptual structure (concept graph with the nodes root, rodent, field mouse, meadow vole, prairie vole, beaver, hamster, habitat, dentition, incisor and rotation of teeth; SUB edges), logical structure (document units 1 to 16 grouped into Paragraphs 1.1, 1.2 and 1.3) and referential structure shown together]

● The Structures in Concert

[Figure: combined view of conceptual structure, logical structure (document units 1 to 16 grouped into Paragraphs 1.1, 1.2 and 1.3) and referential structure, with the concepts dentition, beaver and hamster set apart from the rest of the concept graph]

● The Structures in Concert
  ○ All three structures in concert can be used for
    ● Document reading tours (extended document retrieval)
      ○ goal oriented selections of documents (what is mandatory to understand the topic under consideration?)
      ○ with additional reading directions (which document unit to read in what order)
      ○ by also considering user annotations, personalized reading tours can be suggested (dependent on prior knowledge of the user)
    ● Collaborative authoring (avoiding ambiguities or duplicates, support index generation and cross referencing, …)
    ● Compute answers … (with the help of sophisticated reasoning and additional means of data mining and content understanding)
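One way to picture a reading tour is as an ordering of document units along their dependencies. The sketch below is an assumption made for illustration, not the authors' algorithm: it treats dependencies as "must be read before" edges, collects everything the goal unit transitively depends on, skips units the reader already knows (together with their prerequisites), and orders the rest with a topological sort from Python's standard `graphlib` module (Python 3.9+). The function name `reading_tour` and the dependency data are hypothetical; the section numbers reuse those of the TOC example.

```python
from graphlib import TopologicalSorter


def reading_tour(prerequisites, goal, known=frozenset()):
    """Return the units to read, prerequisites first, in order to reach `goal`."""
    # Collect the goal and everything it transitively depends on,
    # stopping at units the reader already knows.
    needed, stack = set(), [goal]
    while stack:
        unit = stack.pop()
        if unit in needed or unit in known:
            continue
        needed.add(unit)
        stack.extend(prerequisites.get(unit, ()))
    # Restrict the dependency graph to the needed units and sort it.
    subgraph = {
        unit: [p for p in prerequisites.get(unit, ()) if p in needed]
        for unit in needed
    }
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical dependencies: unit -> units that should be read first.
prerequisites = {
    "2.2": ["2.1", "1.3"],
    "2.1": ["1.1"],
    "1.3": ["1.2"],
    "1.2": ["1.1"],
}
full_tour = reading_tour(prerequisites, "2.2")
personal_tour = reading_tour(prerequisites, "2.2", known={"1.2"})
```

Passing the units a user has already annotated or read as `known` gives a simple form of personalization: the tour shrinks to what is still missing.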

● Conclusion
  ○ Documents have intrinsic logical, conceptual and referential characteristics
  ○ There are complex dependencies among the document structures carrying those characteristics
  ○ Logical, conceptual, and referential structures along with their interdependencies should be made explicit (meta data)
  ○ Applications should maintain and use those meta data, e.g. for
    ● authoring
    ● navigation
    ● searching

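● Related Work: Topic Maps
  ○ Topic Maps represent concepts and relationships (conceptual structure and relational structure)
  ○ Topic Maps do not include the logical document structure

[Figure: a Topic Map layer over a Resources layer. Labels: the topics beaver, dentition and rodent; the association type "part of" with the roles part and whole; the resources 1, 10 and 11; legend entries topic, type, association, association type and role]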