SIGNIFICANT ENVIRONMENT INFORMATION FOR LTDP

Fabio Corubolo, Adil Hasan – University of Liverpool
Anna Eggers, Jens Ludwig – Göttingen State University Library
Mark Hedges, Simon Waddington – King’s College London

This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no FP7-601138 PERICLES.

IPRES 2014 paper presentation: significant environment information for LTDP



Objective and outline
• Aim: Ensure long-term usability of Digital Objects (DOs)
• Usability of a Digital Object usually requires access to parts of its environment
• Define a broad set of information (Environment information)
• Consider its significance (Significant environment information)
• Explore and test pragmatic methods to collect such information

Environment information definition
• All the entities (DOs, metadata, policies, rights, services, users, etc.) useful to correctly access, render and use the DO.
Refinement:
• The information about the set of relationships between the source DO and any related objects from its environment.

Environment for a DO
• Technical system information (OS, system architecture, etc.)
• DO metadata (descriptive, structural, technical)
• User, policy, process information (user background knowledge, …)
• Information necessary to make use of the object, including:
• Auxiliary data (e.g. calibration data to support sensor data)
• External documentation (e.g. specifications, related documents)
• Implicit knowledge about what data is useful to use the DO (e.g. the user knowledge about what is relevant and what is not in the collection)
• More…

No object is an island, entire of itself
• Digital objects are used in a rich environment
[Figure: a digital object within its environment; labels: Environment, Digital object, Ext. Metadata, Storage, Present, Future]

Digital object information
• Rich and varied terminology
• The scope of each term is not absolutely defined
• We are aiming to support object use: use-centric view
• First broad definition: Environment information is more or less all that sits outside of the DO

Standards, and coverage – initial analysis

Significant Environment Information (SEI)
• Use of a DO has a purpose
• The purpose gives a scope to the dependent environment information
• Weights can express the importance for a specific purpose
Definition: We define SEI as the set of relationships between a DO and its environment information, qualified with purpose and weights.
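The definition above can be read as a small data model. The following is a minimal sketch in Python, not part of the presentation or of PET: the class `SeiRelation`, the helper `significant_for`, the file names, the weight range [0, 1] and the threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeiRelation:
    """One relationship between a DO and an entity of its environment."""
    source: str    # identifier of the digital object (DO)
    target: str    # identifier of the related environment entity
    purpose: str   # the use for which the relationship matters
    weight: float  # importance for that purpose (assumed range: 0 to 1)


def significant_for(relations, source, purpose, threshold=0.5):
    """Return the targets relevant to `source` for `purpose`, heaviest first."""
    hits = [r for r in relations
            if r.source == source
            and r.purpose == purpose
            and r.weight >= threshold]
    hits.sort(key=lambda r: r.weight, reverse=True)
    return [r.target for r in hits]


# Hypothetical example data: the same target can carry different
# weights depending on the purpose.
relations = [
    SeiRelation("sensor_data.csv", "calibration.dat", "analysis", 0.9),
    SeiRelation("sensor_data.csv", "spec.pdf", "analysis", 0.6),
    SeiRelation("sensor_data.csv", "spec.pdf", "rendering", 0.1),
    SeiRelation("sensor_data.csv", "font.ttf", "rendering", 0.8),
]

print(significant_for(relations, "sensor_data.csv", "analysis"))
print(significant_for(relations, "sensor_data.csv", "rendering"))
```

Keeping the purpose on the relationship, rather than on the object, is what lets one dependency be significant for analysis and negligible for rendering.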

How to collect and measure SEI?
• Observe the use of DOs – in different phases of the lifecycle
• In the environment of creation and use
• Collect dependencies for use (relationships to other DOs)
• Measure significance, e.g. based on frequency of use
• Different semantics and factors for significance weights (value, …) – WIP
• Weights will change in time
• Sheer curation: curation activities integrated in the use workflow; lightweight and transparent
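One way to picture "significance based on frequency of use" is to count how often each dependency is observed when a DO is used. This is only a sketch under assumptions: the function `frequency_weights`, the log format of (DO, dependency) pairs and the file names are hypothetical, and the slides leave the actual weight semantics as work in progress.

```python
from collections import Counter


def frequency_weights(access_log):
    """Map each (DO, dependency) pair to its share of the DO's observations.

    `access_log` is a list of (do_id, dependency_id) pairs, one per
    observed use of the dependency while the DO was in use.
    """
    pair_counts = Counter(access_log)
    totals = Counter(do_id for do_id, _ in access_log)
    return {pair: count / totals[pair[0]]
            for pair, count in pair_counts.items()}


# Hypothetical observations.
log = [
    ("report.docx", "arial.ttf"),
    ("report.docx", "arial.ttf"),
    ("report.docx", "logo.png"),
    ("report.docx", "arial.ttf"),
]
weights = frequency_weights(log)
print(weights)
```

Because weights change in time, a function like this would be re-run over a recent window of the log rather than computed once.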

Pericles Extraction Tool (PET)
• Open source* framework – builds on the SEI concepts
• Uses a sheer curation approach – right time and place
• Generic, modular, domain agnostic
• Collection by observation – monitoring changes in time
• Snapshot of the system environment
• To observe unstructured workflows
• https://github.com/pericles-project/pet

* Release due soon, approved but waiting for final stamps

PET Architecture and modules
• Available and used system resources;
• File format identification and checksums;
• Currently running processes;
• Event information (file and network) from processes;
• Graphic configuration information;
• MS Office and PDF font dependencies;
• Native commands.
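To make the "file format identification and checksums" item concrete, here is a minimal sketch of what such an extraction step could record for one file. It is not PET code and does not use PET's API: `describe_file` is a hypothetical helper, and guessing the media type from the file extension with Python's `mimetypes` is a weak stand-in for real format identification.

```python
import hashlib
import mimetypes
from pathlib import Path


def describe_file(path):
    """Return size, SHA-256 checksum and a guessed media type for one file."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # Extension-based guess only; returns None when the extension is unknown.
    media_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "path": str(path),
        "size": path.stat().st_size,
        "sha256": digest.hexdigest(),
        "media_type": media_type,
    }
```

Calling `describe_file` on the same file at two points in time and comparing the checksums is a simple way to detect that the object changed between snapshots.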

The compulsory screenshots slide

How to set up PET for a use scenario
• PET is installed, configured and started on the machine where the DOs are used – stays in monitoring mode
• The profile (modules and configuration) is use-case specific
• The user interacts normally with the DOs while PET collects SEI in the background
• The environment information, DO events and changes are collected for future use and analysis
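The monitoring mode described above can be illustrated with a snapshot-and-compare loop over a watched directory. This is a simplified sketch and an assumption about one possible approach, not a description of how PET monitors files: `snapshot` and `diff_snapshots` are hypothetical names, and the sketch uses polling rather than operating-system file events.

```python
import os


def snapshot(directory):
    """Map every regular file under `directory` to (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            try:
                info = os.stat(full)
            except OSError:
                continue  # file vanished between listing and stat
            state[full] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(before, after):
    """Return (event, path) tuples describing changes between two snapshots."""
    events = []
    for path in sorted(after.keys() - before.keys()):
        events.append(("created", path))
    for path in sorted(before.keys() - after.keys()):
        events.append(("deleted", path))
    for path in sorted(before.keys() & after.keys()):
        if before[path] != after[path]:
            events.append(("modified", path))
    return events
```

A background loop would call `snapshot` at intervals, pass consecutive results to `diff_snapshots`, and store the events with a timestamp, so the user keeps working with the DOs without interruption.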

General scenario for PET
1. Use PET to collect environment information when and where the DOs are used, based on profiles
--- We are now here ---
2. Analyse the information collected to infer new relationships (also SEI) between DOs, forming a graph structure
3. Assign weights to relationships based on the purpose and significance – weighted graph
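Steps 2 and 3 are future work in the presentation, so the following is only a sketch of one possible heuristic, not the project's method: objects used close together in time are linked, and the edge weight counts how often that happens. The function `infer_graph`, the 60-second window and the event data are assumptions; purpose is not modelled here.

```python
from collections import defaultdict
from itertools import combinations


def infer_graph(events, window=60.0):
    """Build a weighted, undirected graph from (timestamp, object_id) events.

    Two different objects are linked each time they are used within
    `window` seconds of each other; the weight is the number of such
    co-occurrences.
    """
    graph = defaultdict(int)
    ordered = sorted(events)
    for (t1, a), (t2, b) in combinations(ordered, 2):
        if a != b and t2 - t1 <= window:
            edge = tuple(sorted((a, b)))
            graph[edge] += 1
    return dict(graph)


# Hypothetical event log: timestamps in seconds.
events = [
    (0, "handover.pdf"),
    (10, "manual.pdf"),
    (30, "telemetry.csv"),
    (500, "manual.pdf"),
    (520, "telemetry.csv"),
]
print(infer_graph(events))
```

Comparing all pairs is quadratic in the number of events, which is acceptable for a short session; a sliding window over the sorted events would scale better.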

Experiment: use case description
• Fictional scenario, based on operations for the ISS SOLAR payload
• Operator’s task: resolve anomalies
• Process: extensive search in the archived data and documents
• Issue: how to preserve implicit information and help with overload
• PET task: record SEI for a specific anomaly
• Monitor environment, record significant events, infer documentation useful to solve the anomaly
• SEI: what is needed to identify and debug a specific anomaly, that is, the implicit operator knowledge

Experimental results (1)
An anomaly is reported in a handover sheet.
The operator proceeds with documentation search and consultation, all tracked by PET.

Experimental results (2)
• Environment monitoring
• Events, extraction on occurrence of events
• Leads to dependency inference
• In future work we consider more complex issues:
• ‘Noise’ from multitasking
• Careful analysis of collected data in the next phases

Conclusions, Future work
• Define Significant Environment Information (SEI) for object reuse
• Base for dependency graphs weighted on significance and purpose
• Explain ways to obtain SEI and significance weights
• Present the PET tool – to collect SEI
• Show experimental results – initial dependency collection
Future:
• Improve: filtering, dependency inference
• Work on definition and semantics for significance weights
• Use weighted dependency graphs to support appraisal

Thank you! More information:

• https://github.com/pericles-project/pet

About the PERICLES project
• Promoting and enhancing reuse of information throughout the content lifecycle, taking account of evolving semantics
• Ensure availability and reuse of digital objects for the next generations
• Extensions to current preservation and lifecycle models to address the evolution of dynamic heterogeneous resources and their dependencies
• Models capturing intent and interpretative context: key to achieving “preservation by design”

Facts & Figures
• Collaborative FP7 project on digital preservation
• 12 million Euro, co-funded by the European Commission
• 11 partners: research institutions, IT development and application domain
• 6 European countries
• Feb 2013 – Feb 2017
• Project website: http://www.pericles-project.eu

Consortium
COORDINATOR: King’s College London – UK

ACADEMIC PARTNERS:
Hoegskolan i Borås – University of Borås – SE
Georg-August-Universität Göttingen – DE
University of Liverpool – UK
Centre for Research and Technology Hellas – GR
University of Edinburgh – UK

NON-ACADEMIC PUBLIC SECTOR ORGANISATIONS:
Tate – UK
Belgian User Service and Operation Centre (B.USOC) – BE

PRIVATE SECTOR ORGANISATIONS:
Dotsoft – GR
Space Applications Services NV/SA (SpaceApps) – BE
Xerox Research Centre Europe – FR