63
2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

Embed Size (px)

Citation preview

Page 1: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 1

Claude HUC (CNES – French Space Agency)

An Organisational Model for Digital Archive Centres

Page 2: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 2

SummarySummary

Our context: Long-term archival of space data

Digital preservation: a problem shared by everyone

The Organizational Model proposed Overview

Description of each service: functions, responsibilities, external interfaces, skills

required

Conclusions and References

Page 3: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 3

Space DataSpace Data

Nearly always produced in digital formatDiversity and wealth of scientific information - numerous subjects:

astronomy, earth observation, space physics, fundamental physics, biology, etc.

Observations from scientific instruments

Processing results of these observations, calibrations

Results from modelling processing

Documents describing instruments, experimental processes

Metadata

Publications

Page 4: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 4

Space DataSpace Data

Diversity of representation formats Text documents, sets of digital values, images,

vectorized graphics, video, sound, multimedia

documents, etc.

Very large volumes

Complexity

Geographical dispersion

Zoom in on the problem of preservation facing all

sectors of activity

Page 5: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 5

Why preserve space data?Why preserve space data?

Because the observations are often unique and cannot be reproduced

A comet or a solar flare, for example

Because scientific analysis often involves observations over very long periods of time

For example, studying changes in the earth's climate

Because these observations could, in the future, be processed again using new analysis methods not currently available

Etc.

Page 6: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 6

Lessons learnt at CNESLessons learnt at CNES

Space data in digital format for the last 40 yearsAccelerated obsolescence of technologies since 1990Backup program conducted from 1995 to 2000

Justified by the expected discontinuation of magnetic tape storage technologies

Most data has been saved, but useful scientific observations have nevertheless been lost

==> Some damaged media, but even more often the description of the information is incomplete, inaccurate or even unavailable

Large text documents entered on word processor in 1985 Entered under MS Word again in 1990 (Word 2) then entered again under MS 

Word in 1997 (Word 95) Compatibility chain broken in less than 10 years

Page 7: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 7

Lessons learnt at CNESLessons learnt at CNES

Progressive elaboration of pragmatic solutions

Large number of functions Large diversity of needed skills :

Physical preservation of files, storage media

Representation information formats

Knowledge and understanding of the archived information

Technologies for search and retrievalA relative paralysis in front of the growing obsolescence of

technologies Impacts on the costs

How to deal with such a situation ?

Page 8: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 8

Lessons learnt at CNESLessons learnt at CNES

An excessive focus on technical problems (media, data formats..)

The separation between The cases for which we have valid organisational and technical approaches to

propose

The cases for which we don’t have reliable solution : Example : the emulation ways are always part of the research activity and not part of

acceptable operational possibilities for organization in charge of information preservation

Page 9: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 9

The problem:The problem: reduction of time reduction of time scalesscales

Problems encountered: office documents: ==> compatibility chain broken in less than 10 years More recent scientific documents (1995!!) but for which all mathematical formulas

had to be re-typed. Scientific observations stored on magnetic tapes saved at the last minute

Technological change accelerating since the 1990's

This trend shows no signs of slackening off, quite the contrary (at least 5

versions of MS Word for Windows since 1995).

The preservation of a digital document saved 10 years ago, or even less, may already be vulnerable.

Page 10: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 10

Who is concerned today?Who is concerned today?

Almost everyone:

Administration: public records, etc.

Health care sector

Pension funds

Industry: oil, aeronautics, etc.

Scientific research, the space sector

Defence

Nuclear sector

…..as well as private individuals.

Page 11: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 11

The French PIN GroupThe French PIN Group

PIN: Pérennisation des Informations NumériquesPérennisation des Informations Numériques (Preservation of digital information) (Preservation of digital information) Inter-organizational work group set up in 2000

within the Aristote non-profit association (http://pin.cnes.fr) Fields of activity:

Terminology and standards Information representation formats Archiving systems Organizational: roles of the participants Information to be archived: classification, formats, life cycle, etc. Legal issues

Actions Methodology used to evaluate formats for preservation Creation of global, comprehensive and practical training

Page 12: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 12

Participants in theParticipants in thePIN GroupPIN Group

French National Archives, Centre of Contemporary Archives (Direction des Archives de France, Centre des Archives Contemporaines)

French National Library (BnF: Bibliothèque nationale de France) French Atomic Energy Commission (CEA) National Computer Centre of Higher Education (CINES: Centre Informatique

National de l'Enseignement Supérieur) French Agricultural Research Centre for International Development (CIRAD: Centre

de coopération Internationale en Recherche Agronomique pour le Développement) France Télécom Research and Development Groupe Médéric (pension funds) French Institute of Research on the Utilization of the Sea (IFREMER: Institut

Français de Recherche pour l'Exploitation de la MER) French National Audiovisual Institute (INA: Institut National de l'Audiovisuel) National Geographical Institute (IGN: Institut Géographique National) INRIA Ministries of Agriculture, Public Works, Justice and Defence City of Paris Institut Pasteur

Page 13: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 13

Vulnerability:Vulnerability: numerous causesnumerous causes

Technical factors Obsolescence of storage technologies, software and systems

Dependency between data created and the creation environment

Data or documents not describedOrganizational and financial factors

Preservation of information is an activity in its own right.

Need to review work organisation, sharing of responsibility,

setting up the right skills at the right placeNormative, legal, industrial, psychological factors, related to lack of

basic training...

Page 14: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 14

Understand the problem posed Understand the problem posed in order to solve itin order to solve it

This is the purpose of the 'Reference Model for an Open Archival Information System (OAIS)' Issue 1. January 2002

http://www.ccsds.org/CCSDS/recommandreports.jsp#interchange

CCSDS reference: CCSDS 650.0-B-1 (free)

http://www.iso.org/ ISO reference: ISO 14721:2002 (206 Swiss Francs…)

Translation in progress, conducted jointly between CNES and the BnF

Detailed analysis, definition of concepts, a functional model and an information model to understand all aspects involved in archiving information in digital format

Page 15: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 15

OAIS: What is an archive?OAIS: What is an archive?

A structure whose aim is to preserve information for access and use by a Defined Community of Users.

Data preservation

Long-term access to data

Along with data, preservation of all information required for its comprehension

and use

Definition taken from ISO standard 14721:2002 Archival is not:

a copy or a system backup

data filed away permanently when it is not expected to be used any more

Page 16: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 16

OAIS:OAIS: Data and informationData and information

Information any type of knowledge which can be exchanged

independent of the format (i.e. physical or digital) used to represent this

information

Data: information representation formats

Page 17: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 17

OAIS:OAIS: Functional EntitiesFunctional Entities

SIP = Submission Information PackageAIP = Archival Information PackageDIP = Dissemination Information Package

SIP

DescriptiveInfo.

AIP AIP DIP

Administration

PRODUCER

CONSUMER

queriesresult sets

MANAGEMENT

Ingest Access

DataManagement

ArchivalStorage

DescriptiveInfo.

Preservation Planning

orders

Page 18: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 18

Search for a practical structureSearch for a practical structure

Administration

PRODUCER

CONSUMER

queriesresult sets

MANAGEMENT

Ingest Access

DataManagement

ArchivalStorage

Preservation Planning

orders

Page 19: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 19

OAIS:OAIS: Functional EntitiesFunctional Entities

A very large number of functions and sub-fonctions we have to deal with

We do not know clearly how to start to solve the problem

The existing organization for traditional (non digital) archives is not really appropriate for digital information

In this framework, we propose an organization split in autonomous services – each service is in charge of precise functions with well defined interfaces with other services : reduce the size of the problem should help us to solve it

Link the abstract and global OAIS approach to the pragmatic practices

Page 20: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 20

Three coordinated servicesThree coordinated services

CONSUMER

PRODUCER

Ingest

ArchivalStorage

DataManagement

Access

Coordination

Page 21: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 21

Service?Service?

‘Service’ is taken to mean an administrative unit consisting of people, technical facilities and resources given a clear assignment

For each service proposed, it is necessary to: accurately define functions and responsibilities,

specify the external interfaces (relations with the other services and relations with the entities external to the archive),

specify the skills required in order for the service to operate.

It is assumed that each service implements its own set of procedures and standards and has the means and resources required to carry out its tasks.

Page 22: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 22

Three coordinated servicesThree coordinated services

Archival Storage

IngestData

managementand Access

Coordination

Page 23: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 23

Additional considerationsAdditional considerations

Administration

PRODUCER

CONSUMER

queriesresult sets

MANAGEMENT

Ingest Access

DataManagement

ArchivalStorage

Preservation Planning

orders

Page 24: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 24

Ingest serviceIngest service

Archival Storage

IngestData

managementand Access

Coordination

Page 25: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 25

Ingest Service

is responsible

for collecting data objects from producers and

for all tasks involved in transforming the objects handed over by producers into digital objects that meet all requirements applicable to archiving.

The Ingest Service has to deal with all actions and tasks identified in the CCSDS standard and the ISO draft standard

'Producer-Archive Interface Abstract Standard' negotiation: what the producer can and cannot do

transformations on the data and metadata which remain the responsibility of Archival

Storage

Page 26: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 26

Ingest Service

Main tasks: Receive objects from producers and check their compliance with the established plan.

Change the data and metadata format when necessary (files delivered in MS Word format, for example, may be transformed into PDF/Archive

format, and text files containing metadata may be transformed into structured XML files).

Assign to received digital objects a unique identifier consistent with the archive’s naming principles.

Add metadata by placing received objects in a contextual relationship with other archived objects or with documents available in other archives.

Transfer all archivable data objects (data and metadata) to Archival Storage service.

Transfer metadata and any objects that are to be accessible on line to Data Management and Access service.

Page 27: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 27

Ingest Service: external interface

Interface with the Archival Storage (AS) service The data and metadata files to be preserved are sent to Archival Storage where they are

organized in a 'virtual' tree structure.

Archival Storage is responsible for their preservation. Interfaces are extremely simple. They consist of a very small number of actions

which can be implemented from a workstation in the Ingest service:

A realistic description of the actions used to send a digital object to AS is given below: Connect to AS (always at the initiative of the Ingest Service), authentication. Request storage of a digital object, indicating its identifier and the service class expected for

this object. Send file. Receive acknowledgement from AS. Close session.

Page 28: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 28

Ingest Service: external interface

Output: interface with Data Management and Access

The metadata files, in standardized format, are sent to DMA.

These metadata files concern all levels: they may include: descriptions of collections and sub-collections (records groups and sub-groups),

descriptions and identification of single digital objects.

Page 29: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 29

Ingest Service:: essential essential choices choices

We made the following essential choices: The format of digital data (office documents, scientific observations, images,

etc.) must be: standardized, whenever possible, independent of the software used to create the data, described (syntax and semantics) exhaustively.

Metadata must be standardized.

We reject methods based on format migrations conducted regularly. These migrations could not be achieved with certainty.

Page 30: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

0010000000100000001000000011000100110010001100000010000000100000001000000010000000110101001110000010000000100000001101010010000000100000001000000011000100111001001110000011000100101101001100000011100100101101001100110011000001010100001100010011010000111010001100100011001000111010001100000011000100101110001101100011001000111001010110100010000000100000001000000011001100111000001110010011100000101110001101110011000000100000001000000010000000110010001100110011010100110010001011100011010000110101001000000010000000100000001000000010000000101101001100010010111000110110001101100010000000100000001000000010110100110111001011100110111001100100010000000100000001011010011000100111000001011100011011000110001001000000010000000101101001110000011000000101100011000000110000001000000010000000101101001110000011000000101110001100000011000000100000001000000010000000110001001000000010000000110100001100110010000000100000001000000011100100100000001000000010000000100000001110010011100000111001001011100011010000110010001000000010000000100000001000000010000000110001001101100010111000110000001100100010000000100000001000000010000000100000001101010011001100101110001110000011001100100000001000000010000000100000001000000011010100110110001011100011011000110010001000000010000000100000001000000010000000100000001101000010111000111000001100010010000000100000001000000010000000110111001110010011001100101110001101010011100000100000001000000010000000100000001000000010000000110011001011100011001100110100000010100010000000100000001000000011000100110010001100000010000000100000001000000010000000110101001110000010000000100000001101010010000000100000001000000011000100111001001110000011000100101101001100000011100100101101001100110011000001010100001100010011010000111010001100100011001000111010001100000011010100101110001101110011100000111000010110100010000000100000001000000011001100111000001110010011010000101110001101100011010000100000001000000010000000110010001101010011010000110110001011100011100000111000001000000010000000100000001000000010000000101101001100010010111000110100001110010010000000100000001000000010110100111001001011100011011100110110001000000010000000101101001100010011100000101110001110010011011000100000001000000010110100111000001100000010111000110000001100000010000000100000001011010011100000110000001011100011000000110000001000000010000000100000001100010010000000100000001101000011010000100000001000000010000000111001001000000010000000100000001000000011100100111001001100100010111000110001001100010010000000100000001000000010000000100000001100010011011000101110001100000011001100100000001000000010000000100000001000000011010100110100001011100011000000111000001000000010000000100000001000000010000000110101001101100010111000111000001101110010000000100000001000000010000000100000001000000011010000101110001110010011000000100000001000000010000000100000001101110011100100110000001011100011011000111000001000000010000000100000001000000010000000100000001100110010111000110011001110000000101000100000001000000010000000110001001100100011000000100000001000000010000000100000001101010011100000100000001000000011010100100000001000000010000000110001001110010011100000110001001011010011000000111001001011010011001100110000010101000011000100110100001110100011001000110010001110100011000100110000001011100011000000110010001101110101101000100000001000000010000000110011001110010011001000110101001011100011000000110101001000000010000000100000001100100011010000110010001110010010111000110001001101010010000000100000001000000010000000100000001011010011000100101110001101000011100100100000001000000010000000101101001110000010111000110111001110000010000000100000001011010011000100111000001011100011001000110111001000000010000000101101001110000011000000101110001100000011000000100000001000000010110100111000001100000010111000110000001100000010000000100000001000000011000100100000001000000011010000110100001000000010000000100000001110010010000000100000001000000010000000111001001110010011010000101110001101110011100100

An Example!!!

Page 31: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 31

How many standards ...in just a How many standards ...in just a few years?few years?

Hundreds and hundreds of data format standards can be found on the Web

Page 32: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 32

.9429221140.1217.6867.9671.2815.09634.628.071205851981-09-30T14:26:17.736Z2537.991884.430.11-80.00-80.00-14.78-17.0229221145.0617.7668.4271.7715.75629.758.361205851981-09-30T14:26:21.975Z3902.583974.89-1.46-18.69-25.60-80.00-80.0013391147.5217.8168.6572.0116.09627.348.511205851981-09-30T14:26:26.134Z3553.843160.87-1.35-15.79-23.01-80.00-80.0013391149.9917.8568.8872.2516.44624.938.671205851981-09-30T14:26:34.533Z3606.403068.59-1.13-15.65-21.63-80.00-80.0013391154.9217.9469.3472.7417.18620.148.991205851981-09-30T14:26:38.772Z3729.722871.54-1.10-13.96-19.98-80.00-80.0013391157.3817.9969.5772.9817.57617.779.161205851981-09-30T14:26:42.932Z3715.963019.52-1.10-14.96-20.79-80.00-80.0013391159.8518.0369.8073.2217.96615.409.33120585198109-30T14:26:51.330Z4047.072315.92-0.46-11.32-13.37-80.00-80.0013391164.7718.1370.2673.7018.79610.709.681205851981-09-30T14:26:55.570Z4073.732350.72-0.27-12.11-12.78-80.00-80.0013391167.2318.1870.4973.9319.23608.369.861205851981-09-30T14:26:59.

ASCII interpretation (ISO 646)

Page 33: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 33

120 58 5 1981-09-30T14:22:01.629Z 3898.70 2352.45 -1.66 -7.72 -18.61 -80.00 -80.00120 58 5 1981-09-30T14:22:05.788Z 3894.64 2546.88 -1.49 -9.76 -18.96 -80.00 -80.00120 58 5 1981-09-30T14:22:10.027Z 3925.05 2429.15 -1.49 -8.78 -18.27 -80.00 -80.00120 58 5 1981-09-30T14:22:18.426Z 3946.38 2500.82 -1.71 -8.42 -19.33 -80.00 -80.00120 58 5 1981-09-30T14:22:22.585Z 3964.01 2591.92 -1.76 -8.83 -19.86 -80.00 -80.00120 58 5 1981-09-30T14:22:35.222Z 4021.69 2406.25 -1.62 -7.75 -18.24 -80.00 -80.00120 58 5 1981-09-30T14:22:39.381Z 4007.80 2474.73 -1.69 -8.05 -18.88 -80.00 -80.00120 58 5 1981-09-30T14:22:43.620Z 4037.11 2329.44 -1.74 -6.70 -18.23 -80.00 -80.00120 58 5 1981-09-30T14:22:52.018Z 4099.12 2399.84 -1.33 -8.46 -16.84 -80.00 -80.00120 58 5 1981-09-30T14:22:56.177Z 4120.44 2447.08 -1.46 -8.25 -17.44 -80.00 -80.00120 58 5 1981-09-30T14:23:00.417Z 4114.44 2291.40 -1.57 -6.73 -17.11 -80.00 -80.00120 58 5 1981-09-30T14:23:08.815Z 4140.36 2347.36 -1.47 -7.41 -16.91 -80.00 -80.00120 58 5 1981-09-30T14:23:12.974Z 4172.42 2302.13 -1.27 -7.70 -15.82 -80.00 -80.00120 58 5 1981-09-30T14:23:17.213Z 4198.63 2434.23 -1.38 -8.11 -16.80 -80.00 -80.00120 58 5 1981-09-30T14:23:25.611Z 4182.57 2381.31 -1.30 -8.10 -16.30 -80.00 -80.00120 58 5 1981-09-30T14:23:29.770Z 4214.61 2112.57 -1.14 -6.63 -14.20 -80.00 -80.00120 58 5 1981-09-30T14:23:34.009Z 4245.87 2113.51 -0.96 -7.24 -13.35 -80.00 -80.00

Separate the various information fields

Page 34: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

Define the meaning

For each field

Page 35: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

ÐÏࡱá > þÿ O Q þÿÿÿ N ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿì¥Á Y ¿ ì& bjbjóWóW V ‘= ‘= ! ÿÿ ÿÿ ÿÿ ] Ò Ò Ò Ò Ò Ò Ò æ æ æ æ 8 $ B T æ ò  p 2 H H H H H H ~ € € € € € € Ÿ ô “ ´ € - Ò H H H H H € Ò Ò H H  H ü Ò H Ò H ~ æ æ Ò Ò Ò Ò H ~ * 8 V 2 @ Ò Ò ~ H – ,  0~,2]Áæ æ D Ê r l – PRESENTATION ET OBJECTIFS DU COLLOQUELa pérennisation des données scientifiques et techniques vise à préserver à long terme des informations, livrées sous forme de données numériques, dans le but de les restituer à une communauté diverse d’utilisateurs. On sait que des données stockées sans vision à long terme deviennent rapidement inutilisables du fait de ruptures technologiques qui peuvent frapper d’obsolescence des solutions que l’on croyait pérennes. On sait aussi que l’archivage organisé des données ne suffit pas, car ce qu’attend l’utilisateur d’une archive est, non pas la donnée, mais l’information dont la donnée n’est que le réceptacle. ' ‹ ß à ö ú û + 8 9 A E Q U V g … † › ¥ Ø ê - P ` r „ ¾ Ë & 1 k x Ž — ± ¼ Ù ç 0 A U a r € œ ¦ Â Í å ñ � ) 3 U ` z ‰ ¡ Ð ç g h ‘ ’ ú ü Ô Ö × \ ] ^ š › À Á û ÷ ô ô ñ ï ëñ ñ ñ ñ ñ ñ ñ ñ ñ �ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ ñ é ô ç ÷ ñ é é ç å éà j wð57B*5B*CJ B*CJ � � �CJ 5CJ 5CJ X ' ( ) * S T U V W X j k l „ … † ‡ ˆ� � ‰ Š ‹ ¶ · ¸ ¹ ý ý ý ý ú ú ú ú ú ú ú ú ú ú ú ú ú ú ú ú ú ú ú ú ø ö ö ö $ ' ( ) * S T U V W X j k l „ … † ‡ ˆ ‰ Š ‹ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë ß à B h i ˜ Ø M o ýýýý ú÷ôñîëèåâßÜÙÖÓÐÍÊÇÄÁ¾»µ²¬ñ£ š— “þÿÿÈþÿÿÿÿÿHÿÿ�ÿwÿÿÿxÿÿÿžÿÿÿ ëÿÿÿ ìÿÿÿ Àÿÿÿ ÁÿÿÿÂÿÿÿÃÿÿÿÄÿÿÿÅÿÿÿÆÿÿÿÇÿÿÿÈÿÿÿÉÿÿÿÊÿ

Example of an MS WORD 97 file!!!

Page 36: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 36

Ingest Service: technical resources required

The hardware, software and communication resources required to receive digital objects correctly after they have been sent by the producer services have no special characteristics.

It is necessary to decide on a case-by-case basis whether to secure the

transfers: to authenticate the objects received, and guarantee their fixity with respect to objects sent the producer.

These resources will need to be adapted depending on the volume of data to be handled and the frequency of transfers.

Obviously, a set of software programs will be required for computer-aided preparation of the data and metadata to be archived.

Page 37: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 37

Ingest Service:: s skills requiredkills required

There is clearly a need for two types of competence: The Archivist, who can:

define, jointly with the producer, the information to be preserved, check that this information is meaningful and complete, organize this information within a structured system.

The Computer Expert, specialist in data management and representation of information in digital format, in order to:

define the data and metadata formats acceptable for preservation, check their conformance, implement, if necessary, a format transformation process, specify the development of the computer tools required by this service, develop and

use them.

These specialized skills in digital representation also imply broad computer knowledge.

Page 38: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 38

Ingest Service:: s skills requiredkills required

These two areas of competence are combined in a new profession referred to as Digital Data Manager, strongly based on norms and standards applicable to data and metadata representation.

In addition, there is a need to communicate and negotiate with data and document producers:

long-term iterative work

Page 39: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 39

Ingest Service:lessons learnt at CNES

In practice, the collection of a complete set of correctly described organized digital objects represents the most difficult and, finally, the most costly task, especially in terms of human resources.

Page 40: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 40

Archival StorageArchival Storage

Archival Storage

IngestData

Managementand Access

Coordination

Page 41: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 41

Archival Storage:Archival Storage:Functions seen from the Functions seen from the

'customer' side'customer' side Scenario:

I am responsible for archiving digital data, images, documents, etc.

This data consists of sets of files i.e. bit streams of known format and content that I can process and present to the end users in meaningful format.

I expect an Archival Storage service to: take responsibility for these files for their long-term preservation guarantee the integrity of these files be able to restore these files within the delay agreed by the service contract provide a stable 'technical interface' whose services I can call upon (archive a file, restore,

rename, create a virtual tree structure, etc.) be able to handle technology changes (migrations of storage media, etc.) with no impact on the

interface and therefore no impact on my applications manage the access rights to this data.

Based on this concept: the organization of Archival Storage remains completely independent of the other services, the service can be reused in numerous contexts within the relevant organization.

Page 42: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 42

Archival Storage:Archival Storage: responsibility responsibility for file integrityfor file integrity

AS must take responsibility for all activities required to maintain the integrity of the digital objects:

storage of objects on storage media, together with one or more backup copies which

must be kept in separate locations,

continuous monitoring of the condition of the media (number of read operations carried

out on each medium, measurable bit error rate, etc.),

periodic replacement of media considered less reliable by new media,

integration of storage technology upgrades to carry out migrations (periodic or continuous

depending on the chosen policy) to the new media best adapted to its activities.

etc.

Page 43: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 43

Archival Storage:Archival Storage:SkillsSkills

Skills of computer experts specialized in: the management of large sets of stored files, duplicated on various types of medium,

high-speed network technologies in order to communicate with the 'customers' of the

service,

high-capacity storage technologies, storage robots, etc., the storage media, their

characteristics and reliability,

the means used to monitor the condition of the media, implementation of these means,

the ability to maintain in operational readiness a system open round the clock and to

adapt the system according to technological upgrades and increased demands

...

Page 44: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 44

Archival Storage:Archival Storage: the STAF the STAF service set up at CNESservice set up at CNES

STAF stands for "Service de Transfert et d’Archivage de Fichiers" (File Archive and Transfer Service)

Set up in 1994

The aim of the STAF service is to preserve CNES data files collections obtained

from scientific experiments.

This data consists of non-reproducible reference data, stable in time and

intended for long term use.

The STAF mission: to store a "collection" of files using an application logic that

remains valid regardless of changes in systems and storage technologies.

Page 45: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 45

Archival storage:Archival storage: the STAF the STAF service set up at CNESservice set up at CNES

Guarantee data integrity and confidentiality for each 'customer' using the service

Transparency of operations

Possibility of extending the storage capacities as required

Possibility of integrating new client machines

Currently: more than 3.8 million files for a volume of 145 Terabytes

Page 46: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 46

Lessons learnt at CNESLessons learnt at CNES

The STAF (File Archive and Transfer Service) concept has stood the test of time Ten years of experience No data lost Increasing number of customers Increasing volume of stored data

The concept also allows the archival storage system to be sharedbetween several sites of the same institutionbetween several separate institutions

A minimum critical size is essential in order to reduce costs (hardware, software and human resources)

Page 47: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 47

Lessons learnt at CNESLessons learnt at CNES

The operating principle of this service and the practical results obtained over the last 10 years have convinced the BnF of the benefits offered by this type of service.

A draft agreement is being discussed between CNES and the BnF, concerning the reuse, by the BnF, of the management software implemented in the service.

This type of service may be: specific to an institution, or shared by several separate institutions,

managed by a private company.

Page 48: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 48

Data Management and Communication

Archival Storage

IngestData

Managementand Access

Coordination

Data Management andCommunication

Page 49: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 49

Data Management and Access:: functionsfunctions

Manage the data collection preserved by the Archive and

communicate this information to authorized users.

Implement and maintain a computer system providing remote access

- via a graphic user interface - to a set of functions know the content of the archive,

find the data they need (selection criteria based on metadata, for example),

select the data corresponding to their needs,

order and retrieve this data,

if necessary, transform the archived data before supplying it to the user (change format, Added-Value Services, etc.).

Page 50: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 50

Data Management and Access:: functionsfunctions

The search for useful data is based on metadata together with additional techniques (browsing, data mining, etc.)

The data can be retrieved via the network or copied onto a standard medium (CD-ROM, DVD, DLT, etc.) depending on the volume

a dedicated service at the CNES Computer Centre (SEM: Service d ’Echange de Media / Medium Exchange Service) has been set up to allow data distribution and reception on physical media

Manage relations with the community of users

Page 51: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 51

Data Management and Access: technical resources required

Technical resources required The system set up by DMA relies largely on database and Internet information

communication technologies.

Systems partially or totally meeting the requirements of DMA are or will be available on

the market, limiting the costs of special computer developments.

DMA may need to have access to the resources required to copy digital objects onto the

distribution media. Lastly, in some cases, regardless of data managed by Archival

Storage, it may need to have its own storage area for data objects that must be

immediately available on line for users, hence the need for a certain amount of storage

capacity (usually on disk).

Page 52: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 52

Data Management and Access:: skills skills requiredrequired

Skills of computer experts specialized in: data models information search processes database technologies Internet technologies and languages (Human-machine Interface on browser,

etc.) maintaining in operational readiness systems open to large and small

communities of users.

General knowledge of archiving issues knowledge of the data categories handled knowledge of metadata and data selection criteria appropriate for user needs

Page 53: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 53

Data Management and Access:: feedback to CNESfeedback to CNES

These Data Management and Communication skills have been implemented

to provide space scientific data on various topics (astronomy, oceanography,

etc.).

Despite the diversity of digital objects and logics specific to each topic, the

problem, which will shortly be solved, is to reduce costs through the use of a

generic system that can be adapted to each topic.

Page 54: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 54

CoordinationCoordination

Archival Storage

IngestData

Managementand Access

Coordination

Page 55: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 55

The CoordinatorThe Coordinator

The Coordinator directs and has overall responsibility for archiving (as understood in the OAIS context)

Depending on the context, the coordinator is known as the data manager,

technical data collection manager, archivist, main archivist, etc.

Mission: Organize the workload between the various 'services'

Ensure that the interfaces between these services are clear

Coordinate the work for common areas of competence: the information model the dictionary of deliverable digital objects etc.

Page 56: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 56

Why an organizational Model ?Why an organizational Model ?

To define the appropriate organization (human resources et technical facilities) for a digital archive,

To define precisely the required skills for each service

The split in services make easier the cost analysis

The split in services should help for the definition of commercial products well adapted to the needs of ech service

To simplify the definition of an archive certification process

Page 57: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 57

Relations with the institution's Information System: STAF example

Scientific data center

Responsible for archiving observations

on a given topic

STAF: archival storage

Project A

Project B

Project C

Page 58: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 58

Relations with the institution's IS: information management and supply

Community of plasma physics

users

Community of oceanography

users

Community of PHOBOS

users

etc.

SIPAD - Generic system for preservation and access to data and information

------------------------------------------------Management and access to plasma physics data

-----------------------------------------------Management and access

to oceanography data-----------------------------------------------

Management and access to PHOBOS experimental data

-----------------------------------------------etc.

Page 59: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 59

Essential Critical PointEssential Critical Point

The critical point is collecting information and ensuring all activities required to create:

files in format suitable for long-term preservation,

'standardized' metadata.

This critical point concerns both: the content (completeness, accuracy, authenticity)

the format (open, standardized, etc.)

The critical point depends to a certain extent on the company's technical policy and office computer policy.

Page 60: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 60

ConclusionConclusion

Technology will continue to change but information will stay

Rather than trying to keep a particular technology alive in the long term, we therefore deliberately gave priority to the options based on knowledge of the structure, syntax and semantics of the information

The proposed Organizational Model is mainly based on this choice. Its vocation is to help identify concrete, applicable solutions.

It is also based on analysis of skills and professions.

Lastly, it is based on extensive feedback to CNES which has backed up our choice.

Page 61: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 61

ConclusionConclusion

It must be possible to carry out external inspections and audits on this type of organization

The digital archive must be able to demonstrate through its organization,

its resources,

its teams and applicable standards and procedures,

its ability to perform its mission and therefore guarantee long-term preservation of the digital information it is responsible for.

This opens the door to debate on 'Certification' of Digital Archives.

Page 62: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 62

Conclusion:Conclusion: our vision for the our vision for the futurefuture

The Ingest Service: there is a considerable degree of intellectual added value which cannot be replaced by automatic processes. Some software programs may make the work easier, but they cannot think for us.

The Archival Storage Service: fundamentally technological. For this type of service, companies must be able to provide turnkey solutions which are both reliable and financially attractive.

The Data Management and Access Service: highly technological, but depends to a large extent on the information model. Once again, turnkey data management and access systems easily be adapted for use in different fields can be developed.

Page 63: 2004/09/21 1 Claude HUC (CNES – French Space Agency) An Organisational Model for Digital Archive Centres

2004/09/21 63

ReferencesReferences

[1] CCSDS 650.0-B-1., Reference Model for an Open Archival Information System (OAIS). CCSDS Blue Book. Issue 1. January 2002, ISO 14721:2003,

[2] CCSDS 651.0-R-1., Producer-Archive Interface Methodology Abstract Standard (PAIMAS). CCSDS Blue Book Issue 1, May 2004. (ISO standardization process currently started)

[3] CCSDS 647.1-B-1., Data Entity Dictionary Specification Language (DEDSL) - Abstract Syntax (CCSD0011). CCSDS Blue Book. Issue 1. June 2001 ISO 21961:2003,

[4] CCSDS 647.3-B-1 Data Entity Dictionary Specification Language (DEDSL)—XML/DTD Syntax (CCSD0013). CCSDS Blue Book. Issue 1. January 2002, ISO 22643:2003,

All these References are freely available on the web site: http://www.ccsds.org/CCSDS/recommandreports.html