COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material
DELOS International Cooperation Workshop, May 30, 2003
Ingo Frommholz
Fraunhofer IPSI, Darmstadt
frommholz@ipsi.fraunhofer.de
http://ipsi.fraunhofer.de/

Valuable historic document collections exist, but are scattered in national archives
Sources mostly not available online
Difficult-to-use database & referencing systems
Valuable expert domain knowledge exists, but mostly inaccessible to externals
Tacit knowledge, insufficiently documented
Preserve historic documents in a distributed multimedia repository
European historic film documentation (1920s and 1930s)
Historic film censorship (legal docs, applications & decisions, correspondence, etc.), Press material (articles), Photos (stills, portraits) & film posters, Digital film/video fragments
XML metadata (cataloguing & content indexing)
Ensure accessibility
Content- and context-based retrieval
DELOS International Cooperation Workshop Prague, May 30, 2003
XML metadata: to comply with common standards
COLLATE provides a work environment for content indexing and annotation
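As a rough illustration of what standards-oriented XML metadata for an archive document could look like, the following sketch builds a small record with Python's standard library. It borrows element names from Dublin Core; the wrapper element `record`, the function `build_record`, the identifier and all field values are invented for illustration and do not reproduce the actual COLLATE schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(identifier, title, doc_type, date, subjects):
    """Build a minimal metadata record ('record' is a hypothetical wrapper element)."""
    record = ET.Element("record")
    for tag, value in (
        ("identifier", identifier),
        ("title", title),
        ("type", doc_type),
        ("date", date),
    ):
        ET.SubElement(record, f"{{{DC_NS}}}{tag}").text = value
    # One dc:subject element per content-indexing term.
    for subject in subjects:
        ET.SubElement(record, f"{{{DC_NS}}}subject").text = subject
    return record


# All values below are made up for the example.
record = build_record(
    identifier="doc-0001",
    title="Censorship decision (example)",
    doc_type="censorship decision",
    date="1929",
    subjects=["film censorship", "political reasons"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Reusing a widely known element set keeps such a record readable by tools outside the project, which is the point of complying with common standards.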
Filmarchiv Austria, Vienna, Austria
Technology developers
Sword ICT S.r.l., Bari, Italy
Evaluation partner
Interpretative content analysis of documents
Reconstruct the “unity” of cultural phenomena by interlinking scattered knowledge sources
Offer new knowledge working environment
Organize collaborative work
Create enhanced cultural information services
Raise awareness & visibility of cultural archives
Newspaper Articles
1) ABC Ontology => http://metadata.net/harmony/
Description of single-medium atomic digital resources has advanced in the past several years due to the development of metadata standards such as Dublin Core, which provides a framework for describing simple textual or image resources, and MPEG-7, which will provide the same for audio, video and audiovisual resources.
While such single-medium documents are certainly useful and prevalent, the potential of digital libraries lies in their ability to store and deliver complex multimedia resources that combine text, image, audio and video components. The relationships between these components are multifaceted (temporal, spatial, structural and semantic), and any description of a multimedia resource must account for them.
2) CIDOC CRM => http://cidoc.ics.forth.gr/what_is_crm.html
The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.
The CIDOC CRM is intended to promote a shared understanding of cultural heritage information by providing a common and extensible semantic framework that any cultural heritage information can be mapped to. It is intended to be a common language for domain experts and implementers to formulate requirements for information systems and to serve as a guide for good practice of conceptual modelling. In this way, it can provide the "semantic glue" needed to mediate between different sources of cultural heritage information, such as that published by museums, libraries and archives.
3) FRBR (Functional Requirements for Bibliographic Records) => http://www.ifla.org/VII/s13/frbr/frbr.htm
4) LC TGM II (Thesaurus for Graphic Materials II: Genre & Physical Characteristic Terms) => http://www.loc.gov/rr/print/tgm2/
5) FIAF Classification Scheme for Literature on Film and Television
The architecture is based on the reference model for an Open Archival Information System (OAIS).
The OAIS reference model was developed in the domain of space data systems.
According to the definition in [1], “an OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available to a designated community”.
[1] Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS), January 2002.
Discourse Structures
“Discourses represent extended communication between two or more participants in a shared context.” (Rich & Sidner, 1998)
Establishing a discourse context
Annotation thread reflects scientific discourse
Typed links (DSR) between
Document Retrieval in COLLATE
For a query q, a ranking of documents is returned. To produce it, a retrieval weight r is calculated for each document.
Documents are ranked by descending retrieval weight.
The retrieval is based on the document’s metadata (given by film scientists or extracted from the digitized documents) and on the annotation thread.
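The ranking step can be sketched as follows. The slides do not say how r is computed, so the scoring here is only an assumption: r is the fraction of query terms found in a document's metadata plus a down-weighted fraction found in its annotation thread. The names `Document`, `term_overlap`, `retrieval_weight`, `rank` and the parameter `annotation_factor` are hypothetical, as are the sample documents.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    metadata: str                                    # indexing text for the document
    annotations: list = field(default_factory=list)  # texts of the annotation thread


def term_overlap(query_terms, text):
    """Fraction of query terms that occur in the text (case-insensitive)."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)


def retrieval_weight(query, doc, annotation_factor=0.5):
    """Assumed scoring: metadata match plus a down-weighted annotation match."""
    terms = query.lower().split()
    thread_text = " ".join(doc.annotations)
    return (term_overlap(terms, doc.metadata)
            + annotation_factor * term_overlap(terms, thread_text))


def rank(query, docs):
    """Return (doc_id, r) pairs sorted by descending retrieval weight r."""
    scored = [(d.doc_id, retrieval_weight(query, d)) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample collection.
docs = [
    Document("d1", "censorship decision film banned", ["I see a political background"]),
    Document("d2", "press article film premiere"),
    Document("d3", "censorship decision political reasons"),
]
ranking = rank("political censorship decision", docs)
print(ranking)
```

In this toy collection, d1 matches only two query terms in its metadata; the third match comes from its annotation thread, which is what lifts it above a document with no annotations and the same metadata.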
A document is seen in the light of its interpretations
We also consider at which point of the discourse a statement is made and what relation exists between the statement and the entity this statement refers to.
Example: Consider the query for “all censorship decisions made for political reasons”.
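One way to picture this is a sketch in which each annotation carries a typed relation to the entity it refers to, and that relation decides whether its evidence adds to or subtracts from the score. The relation names (`interpretation`, `support`, `counterargument`), their numeric factors, the `damping` parameter, the metadata string and the text of the counterargument are assumptions made for illustration; they are not the relations or weights defined in COLLATE.

```python
from dataclasses import dataclass, field

# Hypothetical relation types: +1 strengthens the referenced entity, -1 weakens it.
RELATION_FACTOR = {"interpretation": 1.0, "support": 1.0, "counterargument": -1.0}


@dataclass
class Annotation:
    text: str
    relation: str                                # typed link to the entity it refers to
    replies: list = field(default_factory=list)  # annotations referring to this one


def match(query_terms, text):
    """Fraction of query terms that occur in the text."""
    words = set(text.lower().replace(".", " ").split())
    return sum(term in words for term in query_terms) / len(query_terms)


def evidence(query_terms, annotation, damping=0.5):
    """Own match, adjusted by damped and signed evidence from replies."""
    own = match(query_terms, annotation.text)
    from_replies = sum(
        RELATION_FACTOR[r.relation] * evidence(query_terms, r, damping)
        for r in annotation.replies
    )
    return max(0.0, own + damping * from_replies)


def document_score(query_terms, metadata, thread, damping=0.5):
    """Metadata match plus damped, signed evidence from the annotation thread."""
    from_thread = sum(
        RELATION_FACTOR[a.relation] * evidence(query_terms, a, damping) for a in thread
    )
    return max(0.0, match(query_terms, metadata) + damping * from_thread)


terms = ["political"]
metadata = "censorship decision film banned for moral reasons"  # made up
claim = "I see a political background as the main reason."
doubt = Annotation("I doubt there was a political reason here.", "counterargument")

metadata_only = document_score(terms, metadata, [])
with_interpretation = document_score(terms, metadata, [Annotation(claim, "interpretation")])
with_discourse = document_score(terms, metadata, [Annotation(claim, "interpretation", [doubt])])
print(metadata_only, with_interpretation, with_discourse)
```

With metadata alone the document is not found for the political query; the interpretation makes it retrievable, and the counterargument lowers its weight again without removing it.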
Metadata Only
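Figure: annotation thread attached to a document. Only fragments of the annotation text survive: “[...] here are not the real reasons. I see a political background as the main reason.” and “[...] similar decisions with the political background, but I [...] reason in this case.”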
Metadata + Interpretation
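Figure: annotation thread attached to a document. Only fragments of the annotation text survive: “[...] here are not the real reasons. I see a political background as the main reason.” and “[...] similar decisions with the political background, but I [...] reason in this case.”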
Analysis of Discourse Structure Relations
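Figure: annotation thread attached to a document. First annotation: “I think the reasons mentioned here are not the real reasons. I see a political background as the main reason.” Second annotation (fragment): “[...] similar decisions with the political background, but I [...] reason in this case.”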
Current State
A first prototype was delivered to the archives and is used by them
A second prototype will be delivered soon, introducing discourse structure relations and advanced collaboration features to the users
A third prototype will contain context-based retrieval
More information?