CASPAR : Early Results and Future Goals



    CODATA 2006, Beijing, China 23-25 Oct 2006 1

    CASPAR: Early results and future goals

    David Giaretta


    CASPAR aims
    Produce tools and techniques to support digital preservation and make it easier to share the cost
      must be relatively easy to use
      must have a low buy-in in terms of effort required for adoption
      must avoid requiring wholesale change of everyone else's systems
      must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
      must be preservable
      must be open: open source, open standards
    Cannot do everything but should do something broadly useful
    Working closely with the UK Digital Curation Centre


    Digital Preservation
    Easy to do, as long as you can provide money forever
    Easy to test claims about tools, as long as you live a long time


    Validation
    Demonstrate theoretical basis
    Accelerated lifetime tests
      Changes in hardware
      Changes in environment
      Changes in Designated Community
    Demonstrate increased trustworthiness
      Measured using draft Certification Standard


    Digital Preservation
    Need to preserve information & knowledge, not just the bits
      Documents, videos are rendered: simple?
      Data must be processed: harder
    Need to manage knowledge to keep archives alive through time
    Preservation is a process, not a one-time event
    Preservation is expensive: costs need to be shared
      The alternative is money: endless supplies of money
    Open Archival Information System Reference Model (ISO 14721) provides a general conceptual framework


    Immediate benefits of Digital Preservation: Use of Unfamiliar Data
    Global Cyber-Infrastructures allow users to find and try to use data from many sources
      Some sources will be familiar
      Most available sources will be unfamiliar
    How can one be sure that the unfamiliar data is used correctly?
      Garbage in, garbage out
    Need to be able to deal with unfamiliar data whether it is contemporary or old (preserved)


    OAIS Reference Model
    ISO 14721: Reference Model for an Open Archival Information System (OAIS). http://public.ccsds.org/publications/archive/650x0b1.pdf
    An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
    Long Term Preservation: The act of maintaining information, in a correct and Independently Understandable form, over the Long Term.
    Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.
    Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.
    Independently Understandable: Has sufficient documentation to allow the information to be understood and used by the Designated Community without having to resort to special resources not widely available, including named individuals.

    OASIS OAIXX


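    OAIS Information Model
    [Diagram: an Information Object consists of a Data Object interpreted using 1+ Representation Information; Representation Information is itself interpreted using further Representation Information. A Data Object is either a Physical Object or a Digital Object; a Digital Object consists of 1+ Bit Sequences.]
    Recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)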
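
    The recursion in this model can be sketched in a few lines of Python. This is an illustrative sketch only, not CASPAR or OAIS code: the class names RepInfo and InformationObject, the function is_understandable, and the example labels and links are all assumptions made for the example. The graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepInfo:
    """Representation Information; may itself need further RepInfo."""
    name: str
    interpreted_using: List["RepInfo"] = field(default_factory=list)


@dataclass
class InformationObject:
    """A Data Object (here just a bit sequence) plus the RepInfo used to interpret it."""
    data: bytes
    interpreted_using: List[RepInfo]


def is_understandable(rep: RepInfo, knowledge_base: Set[str]) -> bool:
    """True if every chain of RepInfo ends in the community's knowledge base."""
    if rep.name in knowledge_base:
        return True  # recursion ends at the knowledge base
    if not rep.interpreted_using:
        return False  # not known, and nothing further to consult
    return all(is_understandable(r, knowledge_base) for r in rep.interpreted_using)


# Hypothetical example: the link from XML to Unicode is assumed for illustration.
unicode_spec = RepInfo("Unicode specification")
xml_spec = RepInfo("XML specification", [unicode_spec])
obj = InformationObject(b"<a/>", [xml_spec])

community_knows = {"Unicode specification"}
print(all(is_understandable(r, community_knows) for r in obj.interpreted_using))
```

    With an empty knowledge base the same check fails, which is the case an archive has to guard against as the community's knowledge changes.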


    Rep. Info. Classification


    [Diagram: Representation Information network for a FITS file. Node labels: FITS FILE, FITS STANDARD, PDF STANDARD, FITS JAVA s/w, JAVA VM, PDF s/w, FITS DICTIONARY, DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION]


    Representation Information
    The Data Object is interpreted using the Representation Information (RepInfo)
    The Reference Model is designed to ensure that an OAIS is not set the impossible task of having to provide all possible RepInfo immediately
    Hence:
      Take account of the Designated Community and its associated Knowledge Base
      The amount of RepInfo is not fixed
      Additional RepInfo will be needed over time
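
    The point that the amount of RepInfo is not fixed can be illustrated with a small sketch. Only the node labels are taken from the FITS diagram; the dependency edges, the helper name repinfo_to_hold, and the two example knowledge bases are assumptions chosen for illustration.

```python
# Assumed "interpreted using" edges; only the labels come from the slides.
NEEDS = {
    "FITS file": ["FITS standard", "FITS dictionary"],
    "FITS standard": ["PDF standard"],
    "FITS dictionary": ["Dictionary specification"],
    "Dictionary specification": ["XML specification"],
    "XML specification": ["Unicode specification"],
}


def repinfo_to_hold(start, knowledge_base):
    """Return the set of RepInfo an archive would need to hold for `start`.

    Walks the dependency graph and stops at anything the Designated
    Community already knows.
    """
    required = set()
    stack = list(NEEDS.get(start, []))
    while stack:
        item = stack.pop()
        if item in knowledge_base or item in required:
            continue
        required.add(item)
        stack.extend(NEEDS.get(item, []))
    return required


# Hypothetical knowledge bases at two points in time.
today = {"FITS standard", "XML specification"}
later = {"XML specification"}  # the community no longer knows FITS

print(sorted(repinfo_to_hold("FITS file", today)))
print(sorted(repinfo_to_hold("FITS file", later)))
```

    Under these assumptions the second result is a superset of the first: as the community's knowledge shrinks, more RepInfo has to be supplied.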


    Early Results
    High level architecture for sharing cost and access to Representation Information
    Detailed examinations of specific datasets to understand what is really needed to keep them understandable and usable


    Rep. Info. Use and maintenance


    CASPAR information flow architecture
    [Diagram; surviving label: Rep Info]


    CASPAR Testbeds
    Three testbeds
      Cultural: UNESCO
      Performing Arts: INA, IRCAM
      Scientific: ESA and CCLRC
    Complex, multi-source, multifaceted data
    Many common preservation & evaluation & validation issues
    Some specific requirements on preservation (technical, delivery, legal)
    Specific user communities / Knowledge bases
    Also test the OAIS model


    Science: CCLRC example

    Ionosonde data

    World map of ionosondes


    Some Issues
    Difficult to derive physical quantities from data
    Can be analysed in multiple ways
    Raises fundamental questions about Representation Information
    Common automated method is proprietary
    Data structure also proprietary
    Paper documentation - restricted access
    Provenance and trust


    ESA example
    GOME: Global Ozone Monitoring Experiment on ERS-2


    GOME data processing


    GOME Level 4 product: Integration of GOME, other data and models

    GOME Level 3 product: Integration of time and space data

    GOME Level 2 product: Ozone profile at given location


    Some Issues
    Provenance and Context of processed data: relationship to
      Representation Information of raw data, and
      Knowledge base of Designated Community
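
    One way to picture the provenance question is as a chain of derived products that can be walked back to the raw data. The sketch below is an assumption-laden illustration, not ESA's processing system: the Product class, the lineage helper, and the derivation links between the GOME levels are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Product:
    """A data product and the products it was derived from."""
    name: str
    derived_from: Tuple["Product", ...] = ()


def lineage(product):
    """Return the names of all ancestors, nearest first, without duplicates."""
    seen = []
    queue = list(product.derived_from)
    while queue:
        parent = queue.pop(0)
        if parent.name not in seen:
            seen.append(parent.name)
            queue.extend(parent.derived_from)
    return seen


# Assumed derivation links, for illustration only.
raw = Product("GOME raw data")
level2 = Product("GOME Level 2", (raw,))
level3 = Product("GOME Level 3", (level2,))
other = Product("other data and models")
level4 = Product("GOME Level 4", (level3, other))

print(lineage(level4))
```

    Each name in the returned list is a product whose own Representation Information would have to remain available for the Level 4 product to stay interpretable.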


    UNESCO examples
    DATA:
      Scanned documents and maps
      Aerial and close range photography (Digital photogrammetry)
      Monument measurements (Laser scanning)
      Satellite images (Remote sensing and image processing)
      Multi-scale digital cartography (Geographic information systems (GIS) and CAD)
      3D models, virtual tours (Computer visualization)
    Mandatory Documentation:
      Identification of property
      Description of property
      Justification of inscription
      State of conservation and factors affecting the property
      Protection and Management
      Monitoring
      Documentation
      Contact information of responsible authorities
      Signature on behalf of the State Party(ies)
    World Heritage List


    Performing Arts examples
    Examples:
      Score
      MAX/MSP patches
      Additional instructions


    Some Issues
    What is Preservation of performability?
    Composer's intention
    Authenticity
    Proprietary software and hardware
    Copyright
    Digital Rights Management


    Shared Infrastructure
    Registries of Representation Information
    Persistent Identifier name resolvers
      DOI? ARK? URL? none are guaranteed
    Interfaces support preservation and interoperability
    Standards
    Preservation Description Information
      Fixity, Provenance, Reference, Context
    Accreditation/Certification for repositories
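
    The four parts of Preservation Description Information can be pictured as a small record kept alongside the data. The sketch below is an assumption, not a CASPAR interface: the PDI class, make_pdi, fixity_ok, the example identifier, and the choice of SHA-256 for Fixity are all illustrative.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PDI:
    """Preservation Description Information, reduced to four text fields."""
    reference: str   # identifier for the object (ideally a persistent identifier)
    provenance: str  # where the object came from and what was done to it
    context: str     # how the object relates to its environment
    fixity: str      # SHA-256 hex digest of the bit sequence (assumed algorithm)


def make_pdi(data: bytes, reference: str, provenance: str, context: str) -> PDI:
    """Build a PDI record, computing the fixity digest from the data."""
    return PDI(reference, provenance, context, hashlib.sha256(data).hexdigest())


def fixity_ok(data: bytes, pdi: PDI) -> bool:
    """True if the bits still match the digest recorded when the PDI was made."""
    return hashlib.sha256(data).hexdigest() == pdi.fixity


# Hypothetical example values.
sample = b"ionosonde sample"
record = make_pdi(sample, "example:object-1", "received from instrument", "testbed data")
print(fixity_ok(sample, record))
print(fixity_ok(sample + b"x", record))
```

    A fixity check of this kind only shows that the bits are unchanged; it says nothing about whether they can still be understood, which is what the Representation Information is for.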


    Knowledge at the heart of preservation

    Knowledge driven approach
    Knowledge management to support long-term preservation of concepts/information including:
      Single, complex, on demand, interactive objects
      DRM
      Authenticity
      Access
      Storage
      Designated Community descriptions
    Knowledge base definition
      ontologies


    WHEN
    Component architecture and prototypes by month 12
    Framework architecture month 18
    Component integration months 24-30
    Testbed implementations months 30-36
    Project completion month 42


    www.casparpreserves.eu


    Conclusions
    Science Data and Knowledge need more than just storing the bits
    Understanding and being able to process the vast amount of unfamiliar data which is available is hard
    It is expensive
      Costs must be shared
    So far the Open Archival Information System Reference Model is OK
      Many similarities can be exploited
      Many subtleties need to be explored

    Watch this space