COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material
DELOS International Cooperation Workshop, May 30, 2003
Ingo Frommholz, Fraunhofer IPSI, Darmstadt
frommholz@ipsi.fraunhofer.de, http://ipsi.fraunhofer.de/
Valuable historic document collections exist, but are scattered in
national archives
Sources mostly not available online
Difficult-to-use database & referencing systems
Valuable expert domain knowledge exists, but mostly inaccessible to
externals
Tacit knowledge, insufficiently documented
Preserve historic documents in a distributed multimedia
repository
European historic film documentation (1920s and 1930s)
Historic film censorship (legal docs, applications & decisions,
correspondence, etc.), Press material (articles), Photos (stills,
portraits) & film posters, Digital film/video fragments
XML metadata (cataloguing & content indexing)
Ensure accessibility
Content- and context-based retrieval
DELOS International Cooperation Workshop Prague, May 30, 2003
XML metadata: to comply with common standards
COLLATE provides a work environment for content indexing and
annotation
Filmarchiv Austria, Vienna, Austria
Technology developers
Sword ICT S.r.l., Bari, Italy
Evaluation partner
Interpretative content analysis of documents
Reconstruct "unity" of cultural phenomena, interlinking scattered
knowledge sources
Offer new knowledge working environment
Organize collaborative work
Create enhanced cultural information services
Raise awareness & visibility of cultural archives
Newspaper Articles
1) ABC Ontology => http://metadata.net/harmony/
Description of single-medium atomic digital resources has advanced
in the past several years due to the development of metadata
standards such as Dublin Core, which provides a framework for
describing simple textual or image resources, and MPEG-7, which
will provide the same for audio, video and audiovisual
resources.
While such single-medium documents are certainly useful and
prevalent, the potential of digital libraries lies in their ability
to store and deliver complex multimedia resources that combine
text, image, audio and video components. The relationships between
these components are multifaceted (temporal, spatial, structural
and semantic), and any description of a multimedia resource must
account for these relationships.
2) CIDOC CRM => http://cidoc.ics.forth.gr/what_is_crm.html
The CIDOC Conceptual Reference Model (CRM) provides definitions and
a formal structure for describing the implicit and explicit
concepts and relationships used in cultural heritage
documentation.
The CIDOC CRM is intended to promote a shared understanding of
cultural heritage information by providing a common and extensible
semantic framework that any cultural heritage information can be
mapped to. It is intended to be a common language for domain
experts and implementers to formulate requirements for information
systems and to serve as a guide for good practice of conceptual
modelling. In this way, it can provide the "semantic glue" needed
to mediate between different sources of cultural heritage
information, such as that published by museums, libraries and
archives.
3) FRBR Functional Requirements for Bibliographic Records =>
http://www.ifla.org/VII/s13/frbr/frbr.htm
4) LC TGM II Thesaurus for Graphic Materials II: Genre &
Physical Characteristic Terms =>
http://www.loc.gov/rr/print/tgm2/
5) FIAF Classification Scheme for Literature on Film and
Television
[Figure: example of the event-based ontology. Nodes: film creation, original version, censorship, shortened version, Directing, censorship decision. Relations: precedes, follows, hasAction, hasParticipant.]
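The typed links in the figure can be pictured as a small set of triples. The following is a minimal sketch in Python, not COLLATE's actual data model: the triple layout, the helpers `build_index` and `chain`, and the choice of which node links to which are assumptions made for illustration. Only the node and relation labels are taken from the figure.

```python
from collections import defaultdict

# (subject, relation, object) triples. The labels come from the figure;
# which node links to which is an assumption made for illustration.
TRIPLES = [
    ("film creation", "precedes", "original version"),
    ("original version", "precedes", "censorship"),
    ("censorship", "precedes", "shortened version"),
    ("film creation", "hasAction", "Directing"),
    ("censorship", "hasAction", "censorship decision"),
]


def build_index(triples):
    """Index triples by (subject, relation) for direct lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def chain(index, start, relation="precedes"):
    """Follow one relation from `start` until no further link exists."""
    path = [start]
    seen = {start}
    while True:
        successors = index.get((path[-1], relation), [])
        if not successors or successors[0] in seen:
            return path
        path.append(successors[0])
        seen.add(successors[0])


INDEX = build_index(TRIPLES)
```

Calling `chain(INDEX, "film creation")` walks the assumed `precedes` links from the creation of a film to its shortened version.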
The architecture is based on the reference model for an Open
Archival Information System (OAIS).
The OAIS reference model was developed in the domain of space data
systems.
According to the definition in [1], “an OAIS is an archive,
consisting of an organization of people and systems, that has
accepted the responsibility to preserve information and make it
available to a designated community”.
[1] Consultative Committee for Space Data Systems. Reference Model
for an Open Archival Information System (OAIS), January 2002.
Discourse Structures
“Discourses represent extended communication between two or more
participants in a shared context.” (Rich & Sidner, 1998)
Establishing a discourse context
Annotation thread reflects scientific discourse
Typed links (DSR) between
interpersonal
elaboration
Document Retrieval in COLLATE
For a query q, a ranking of documents is returned. To this end, a
retrieval weight r is calculated for each document.
Documents are ranked according to descending retrieval
weights.
The retrieval is based on the document’s metadata (given by film
scientists or extracted from the digitized documents) and on the
annotation thread.
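The ranking step described above can be sketched as follows. This is a toy illustration, not COLLATE's retrieval function: the `Document` class, the term-overlap weight in `retrieval_weight`, and the `rank` helper are all assumptions. The only parts taken from the slide are that a weight r is computed per document from metadata and annotation thread, and that documents are sorted by descending r.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    metadata: str      # cataloguing and content-indexing text
    annotations: list  # texts taken from the annotation thread


def retrieval_weight(query: str, doc: Document) -> float:
    """Toy weight r: fraction of query terms found in the document's text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = " ".join([doc.metadata, *doc.annotations]).lower().split()
    vocabulary = set(words)
    return sum(term in vocabulary for term in terms) / len(terms)


def rank(query: str, docs: list) -> list:
    """Return (doc_id, r) pairs sorted by descending retrieval weight."""
    scored = [(doc.doc_id, retrieval_weight(query, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch a document whose annotations mention a query term is ranked above one that matches on metadata alone, because both sources feed the same weight.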
A document is seen in the light of its interpretations
We also consider at which point of the discourse a statement is
made and what relation exists between the statement and the entity
this statement refers to.
Example: Consider the query for “all censorship decisions made for
political reasons”.
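One way to picture how the position of a statement and its relation to the entity it refers to could enter the retrieval weight is sketched below. Everything specific here is hypothetical: the relation name `counterargument`, the numeric factors, the depth decay, and the function `thread_weight` are illustrative assumptions, not the weighting used in COLLATE.

```python
from dataclasses import dataclass

# Hypothetical factors: how strongly a statement counts, depending on its
# discourse structure relation to the entity it refers to.
RELATION_FACTOR = {"elaboration": 1.0, "counterargument": -0.5}
DEPTH_DECAY = 0.5  # hypothetical: statements deeper in the thread count less


@dataclass
class Annotation:
    text: str
    relation: str  # relation to the entity this statement refers to
    depth: int     # 1 = refers to the document, 2 = refers to an annotation


def matches(query_term: str, text: str) -> bool:
    """Case-insensitive substring match, kept deliberately simple."""
    return query_term.lower() in text.lower()


def thread_weight(query_term, metadata, thread, metadata_weight=1.0):
    """Combine a metadata match with weighted matches in the thread."""
    weight = metadata_weight if matches(query_term, metadata) else 0.0
    for note in thread:
        if matches(query_term, note.text):
            factor = RELATION_FACTOR.get(note.relation, 0.0)
            weight += factor * DEPTH_DECAY ** (note.depth - 1)
    return weight
```

With metadata alone, a decision whose recorded reasons do not mention politics gets weight 0 for the term "political". Once an annotation that sees a political background is taken into account the weight becomes positive, and a later objection lowers it again without cancelling it.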
Metadata Only
Metadata + Interpretation
Analysis of Discourse Structure Relations
[Figure, repeated on each of the three slides above: a document with its annotation thread. First annotation: "I think the reasons mentioned here are not the real reasons. I see a political background as the main reason." Second annotation (only partly recoverable): "... similar decisions with the political background, but I ... reason in this case."]
Current State
A first prototype was delivered to the archives and is used by
them
A second prototype will be delivered soon, introducing discourse
structure relations and advanced collaboration features to the
users
A third prototype will contain context-based retrieval
More information?
Descriptive Information (DigiProt)