APARSEN Metadata for preservation, curation and interoperability Workshop on Research Metadata in Context 7-8 Sept 2010, Nijmegen David Giaretta APA and STFC

Page 2

Digital Preservation
• Ensure that digitally encoded information is understandable and usable over the long term
  – Long term could start at just a few years
• Easy to make claims
  – Difficult to provide proof
• Reference Model for an Open Archival Information System (ISO 14721)
  – The basic standard for work in digital preservation
  – Defines terminology and compliance criteria

Page 3

Definitions (OAIS)

• Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term.

• Long Term: A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future.

Not just BIT preservation

Not just rendering

Information not just DATA or Documents

Authenticity

Page 4

Basic concept
• Digital preservation had been dominated by libraries and (state) archives
• However there was a focus there on “rendered objects” and
• Tendency to think data is an “easy” add-on
HOWEVER
• Need to deal with DATA – processed to new things, not just rendered
• Need to follow OAIS – finer grained view
• Need to test and prove that things work

“metadata”
“CASPAR banned the use of the term metadata unless absolutely necessary”

Page 5

Data…
Level 2 GOME satellite instrument data

Page 6

Contains numbers – need meaning

Page 7

...to process to this

Page 8

...or this

Page 9

...through complex processing schemes

Page 10

Just Format?

sfqsftfoubujpo jogpsnbujpo svmft

You have a file

JHOVE tells you it is WORD version 7

Page 11

..with some extra information..

representation information rules

Format Registries – useful but not enough: formats can be used for multiple purposes e.g. audio files used to store configuration parameters
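The unreadable string on page 10 appears to be the phrase on this page with every letter moved one place forward in the alphabet. A minimal Python sketch, assuming that simple shift is the only encoding used (the function name `shift_letters` is illustrative), shows how one small piece of extra information turns the characters back into something understandable:

```python
def shift_letters(text: str, offset: int) -> str:
    """Shift each lowercase ASCII letter by `offset` places, wrapping around the alphabet."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        else:
            shifted.append(ch)  # leave spaces and other characters untouched
    return "".join(shifted)


encoded = "sfqsftfoubujpo jogpsnbujpo svmft"
# The offset of -1 plays the role of the extra information:
# knowing the file format alone would not reveal it.
decoded = shift_letters(encoded, -1)
print(decoded)
```

Knowing that the file is a text file (its format) is not sufficient here; the shift rule is also needed.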

Page 12

Examples (cont)
• “504b0304140000000800f696….”
• “This is a ZIP file which contains Word files, each of which contains an encoded message which needs the key ‘!D$G^AJU*KI’ to decode it using encryption method SHA7”
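The hexadecimal string in the first bullet starts with the bytes `PK\x03\x04`, the signature of a ZIP local file header. As a rough sketch (only the 12 bytes quoted on the slide are used; the variable names are illustrative), the first header fields can be read like this:

```python
import struct

# First 12 bytes quoted on the slide; the slide truncates the rest.
header = bytes.fromhex("504b0304140000000800f696")

# ZIP local file header (little-endian): 4-byte signature, then three
# 2-byte fields: version needed, general-purpose flags, compression method.
signature, version_needed, flags, method = struct.unpack("<4sHHH", header[:10])

is_zip = signature == b"PK\x03\x04"
print(is_zip, version_needed, flags, method)
```

This identifies the container and its compression method (8 is Deflate), but nothing in these bytes says that the contents are Word files, or which key and method are needed to decode the messages inside them. That part has to be recorded separately.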

Page 13

Examples (cont)
• LaTeX file containing an EPS (Encapsulated PostScript) version of an image
• Web page containing Java Applet generating random numbers
• SWISS-PROT data
• Foreign Language emails

Page 14

XML enough? – can stare at this and probably understand it

<family> <father>John</father> <mother>Mary</mother> <son>Paul</son></family>
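As a small illustration, the fragment above can be parsed with Python's standard library. This sketch only shows what a generic XML parser recovers, namely element names and text; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

doc = "<family> <father>John</father> <mother>Mary</mother> <son>Paul</son></family>"
root = ET.fromstring(doc)

# The parser recovers structure: element names and their text content.
members = {child.tag: child.text for child in root}
print(root.tag, members)
```

The parser returns the tags as plain strings. That "father" and "son" describe a family relationship is knowledge the reader brings, not something the XML itself carries.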

Page 15

..but what about this?

<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1"><RESOURCE><TABLE name="6dfgs_E7_subset" nrows="875"><PARAM arraysize="*" datatype="char" name="Original Source"

value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz"><DESCRIPTION>URL of data file used to create this table.</DESCRIPTION></PARAM><PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo

usage."/><FIELD arraysize="15" datatype="char" name="TARGET"><DESCRIPTION>Target name</DESCRIPTION></FIELD><FIELD arraysize="11" datatype="char" name="DEC" unit="DMS"><DATA><FITS><STREAM encoding='base64'>U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBmb3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAgICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAvIE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg
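The `STREAM` element in the fragment above is declared as base64. A short sketch, decoding only the first 12 characters of that stream, gives a hint of what is embedded:

```python
import base64

# Leading characters of the base64 STREAM in the VOTable fragment above.
stream_prefix = "U0lNUExFICA9"

decoded = base64.b64decode(stream_prefix)
print(decoded)
```

The decoded bytes begin with the keyword `SIMPLE`, which is how a FITS primary header starts. Understanding this file therefore seems to need Representation Information for XML, VOTable, base64 and FITS, not for XML alone.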

Page 16

Page 17

Performance Viewer: side-by-side comparison and validation of the transformation. From left to right: 3D visualization in Ogre3D, 3D model of the stage including the virtual dancer in VRML.

Page 18

Figure 8 Some aspects of acousmatic production

Page 19

[Figure: diagram with the axes Rendered / Non-Rendered, Static / Dynamic, and Simple / Complex]

Page 20

Information Model & Representation Information

The Information Model is key

Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY

(this knowledge will change over time and region)

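[Diagram: OAIS Information Model]
• Information Object: a Data Object interpreted using 1+ Representation Information
• Representation Information is itself interpreted using further Representation Information
• Data Object: either a Physical Object or a Digital Object
• Digital Object: 1+ Bit Sequences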
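A toy sketch of the core idea, that a Data Object only becomes information when interpreted using Representation Information. The class fields and the example values are assumptions made for illustration; they are not part of OAIS:

```python
from dataclasses import dataclass


@dataclass
class RepresentationInformation:
    byte_order: str  # structure: "big" or "little"
    signed: bool     # structure: signed or unsigned integer
    unit: str        # semantics: what the number means


@dataclass
class DigitalObject:
    bits: bytes      # the bit sequence being preserved


def interpret(obj: DigitalObject, rep: RepresentationInformation) -> str:
    """Data Object interpreted using Representation Information yields information."""
    value = int.from_bytes(obj.bits, byteorder=rep.byte_order, signed=rep.signed)
    return f"{value} {rep.unit}"


obj = DigitalObject(bits=b"\x01\x2c")
as_kelvin = interpret(obj, RepresentationInformation("big", False, "K"))
as_counts = interpret(obj, RepresentationInformation("little", False, "counts"))
print(as_kelvin, "|", as_counts)
```

The same two bytes give different information under different Representation Information. The unit "K" is itself only meaningful to a Designated Community that knows what kelvin is, which is where the recursion ends.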

Page 21

Representation Information Network

Page 22

Page 23

Modules and Dependencies: defining the Designated Community

[Diagram: dependency graph containing the following modules]
• README.txt, TEXT EDITOR, ENGLISH LANGUAGE, WINDOWS XP
• FITS FILE, FITS STANDARD, FITS DICTIONARY, FITS JAVA s/w, JAVA VM
• PDF STANDARD, PDF s/w
• DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION
• MULTIMEDIA PERFORMANCE DATA, C3D, DirectX, MAX/MSP, 3D motion data files, 3D scene data files, motion to music mapping strategy
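One way to read this slide is that a Designated Community can be described by the set of modules it already has, and the archive must supply whatever the data depends on beyond that set. The sketch below assumes a set of dependency edges, since the arrows of the original diagram are not recoverable from the text. Both the `DEPENDS_ON` edges and the two example communities are hypothetical:

```python
from __future__ import annotations

# Hypothetical dependency edges between modules named on the slide.
DEPENDS_ON: dict[str, list[str]] = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY"],
    "FITS STANDARD": ["PDF STANDARD"],
    "PDF STANDARD": ["PDF s/w"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
}


def rep_info_to_supply(target: str, knowledge_base: set[str]) -> list[str]:
    """Collect modules the community lacks; stop descending at what it already knows."""
    needed: list[str] = []
    stack = list(DEPENDS_ON.get(target, []))
    while stack:
        module = stack.pop()
        if module in knowledge_base or module in needed:
            continue
        needed.append(module)
        stack.extend(DEPENDS_ON.get(module, []))
    return sorted(needed)


# Two illustrative community profiles (assumptions, not from the slide).
astronomers = {"FITS STANDARD", "FITS DICTIONARY"}
general_public = {"PDF STANDARD"}

print(rep_info_to_supply("FITS FILE", astronomers))
print(rep_info_to_supply("FITS FILE", general_public))
```

With these assumed edges, a community that already knows the FITS standard and dictionary needs nothing extra, while a community that only knows PDF needs most of the network.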

Page 24

[Diagram: Representation Information network for a FITS file. Nodes: FITS FILE, FITS DICTIONARY, FITS STANDARD, FITS JAVA SOFTWARE, JAVA VM, PDF STANDARD, PDF SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE]

Page 25

[Diagram as on page 24, annotated with the callouts below; “this” in each callout refers to the node it points at]

If we can run this then we can run the Java software to extract the numbers

If we cannot run this then we can use an emulator or use its RepInfo to re-create a Java VM

If we cannot run the Java Virtual Machine then we use this source code to re-write in another programming language such as C

If we can run this then we can use this in a generic application to extract the numbers

If we cannot run the DDL software then we can look at the DDL definition and write some software to extract the numbers

In principle we could use this, plus the Dictionaries in order to understand the keywords in order to extract the numbers
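The callouts describe alternative routes to the same goal, extracting the numbers. A rough sketch of that reasoning as an ordered list of fallbacks follows. The ordering, the capability names and the function `choose_route` are illustrative assumptions, not something stated on the slide:

```python
from typing import Optional

# Each route: (things that must be usable, what we then do).
# Names and ordering are assumptions made for this sketch.
ROUTES = [
    ({"JAVA VM", "FITS JAVA SOFTWARE"},
     "run the Java software"),
    ({"JAVA VM REPINFO", "FITS JAVA SOFTWARE"},
     "emulate or re-create a Java VM, then run the Java software"),
    ({"FITS JAVA SOURCE CODE"},
     "re-write the source code in another language such as C"),
    ({"DDL SOFTWARE", "DDL DESCRIPTION"},
     "use the DDL description in a generic application"),
    ({"DDL DEFINITION", "DDL DESCRIPTION"},
     "write software from the DDL definition"),
    ({"FITS STANDARD", "FITS DICTIONARY"},
     "read the standard and dictionaries and extract the numbers by hand-written code"),
]


def choose_route(usable: set) -> Optional[str]:
    """Return the first route whose requirements are all still usable, if any."""
    for required, action in ROUTES:
        if required <= usable:
            return action
    return None


today = choose_route({"JAVA VM", "FITS JAVA SOFTWARE", "FITS STANDARD"})
later = choose_route({"FITS STANDARD", "FITS DICTIONARY"})
print(today)
print(later)
```

The more routes a Representation Information network offers, the less the archive depends on any single piece of software staying runnable.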

Page 26

• RepInfo
/DISCIPLINE
• Virtualisation

Page 27

Virtualisation

Page 28

2-D array
• Height
• Width
• Bits per Pixel

2-D image
• Height
• Width
• Bits per Pixel
• Co-ordinate system
• Time

2-D astronomical image
• Height
• Width
• Bits per Pixel
• Astronomical co-ordinate system
• Time – EPOCH
• Bandpass
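The three levels above can be sketched as a small class hierarchy, where each level adds properties to the one before. The class names, the helper method and the sample values are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class Array2D:
    height: int
    width: int
    bits_per_pixel: int

    def size_in_bytes(self) -> int:
        # Works for every subtype because it only needs the 2-D array view.
        return self.height * self.width * self.bits_per_pixel // 8


@dataclass
class Image2D(Array2D):
    coordinate_system: str
    time: str


@dataclass
class AstronomicalImage2D(Image2D):
    # coordinate_system holds an astronomical system and time is given as an
    # epoch; bandpass is the property added at this level.
    bandpass: str


# Sample values are placeholders chosen for the sketch.
img = AstronomicalImage2D(1024, 2048, 16, "FK5", "J2000.0", "V")
print(img.size_in_bytes(), isinstance(img, Array2D))
```

Software written against the most general view (a 2-D array) keeps working on the specialised data, while discipline-specific tools can use the additional properties.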

Page 29

General Table
• Number of columns
• Names of columns
• Number of rows
• Value in cell at any row, column

Time series
• Number of columns
• Names of columns
• Number of rows
• Value in cell at any row, column
• Time corresponding to any row

Science data table
• Number of columns
• Names of columns
• Number of rows
• Value in cell at any row, column
• Type of column value
• Column “metadata”
• Table “metadata”
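A possible sketch of the table virtualisation: generic code is written against the general table operations, and a time series adds one more operation. All class and method names here are assumptions for illustration:

```python
from typing import Protocol, Sequence


class Table(Protocol):
    """The general table view: the operations listed for a General Table."""
    def column_names(self) -> Sequence[str]: ...
    def row_count(self) -> int: ...
    def cell(self, row: int, column: str) -> object: ...


class ListTable:
    """In-memory table; any other storage format could offer the same methods."""
    def __init__(self, names, rows):
        self._names = list(names)
        self._rows = [list(r) for r in rows]

    def column_names(self):
        return self._names

    def row_count(self):
        return len(self._rows)

    def cell(self, row, column):
        return self._rows[row][self._names.index(column)]


class TimeSeries(ListTable):
    """Adds the one extra operation of a time series: the time for any row."""
    def __init__(self, names, rows, time_column):
        super().__init__(names, rows)
        self._time_column = time_column

    def time_of(self, row):
        return self.cell(row, self._time_column)


def column_values(table: Table, column: str) -> list:
    """Generic code: uses only the general table operations."""
    return [table.cell(r, column) for r in range(table.row_count())]


ts = TimeSeries(["t", "flux"], [[0.0, 1.5], [1.0, 1.7]], time_column="t")
print(column_values(ts, "flux"), ts.time_of(1))
```

The generic function never needs to know how the table is stored, which is the point of describing data through a virtualised interface.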

Page 30

[Diagram: tree with a Root node and numbered child nodes]

• Get the Root
• Get the number of children for a node
• Get child number “i”
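The three operations listed are enough to visit a whole tree. A minimal sketch follows; the class names and the shape of the example tree are hypothetical, since the layout of the original diagram is not recoverable:

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self._children = list(children)


class Tree:
    """Exposes only the three operations named on the slide."""
    def __init__(self, root):
        self._root = root

    def get_root(self):
        return self._root

    def child_count(self, node):
        return len(node._children)

    def get_child(self, node, i):
        return node._children[i]


def walk(tree, node=None):
    """Depth-first listing of labels using only the three operations."""
    node = tree.get_root() if node is None else node
    labels = [node.label]
    for i in range(tree.child_count(node)):
        labels.extend(walk(tree, tree.get_child(node, i)))
    return labels


# Hypothetical tree shape, for illustration only.
tree = Tree(Node("Root", [Node("Node 1", [Node("Node 3")]), Node("Node 2")]))
labels = walk(tree)
print(labels)
```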

Page 31

[Diagram: hierarchy of image types]
• Image
• Cultural Heritage Image
• Artistic Image
• Astronomical Image
• Earth Observation Image
• Optical Astronomical Image
• X-ray Astronomical Image

Page 32

[Diagram: OAIS Archival Information Package. Nodes: Archival Information Package, Content Information, Preservation Description Information, Package Description, Packaging Information. Relations: further described by, derived from, described by, delimited by, identifies]

Page 33

Preservation Description Information
• Fixity Information
• Provenance Information
• Reference Information
• Context Information
• Access Rights Information
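As a simplified sketch of how the five kinds of Preservation Description Information might travel with the content in a package: here the content is reduced to raw bytes and fixity to a SHA-256 digest. Both choices, and all the field values, are assumptions made for illustration:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    fixity: str         # here: SHA-256 hex digest of the content bytes
    provenance: str
    reference: str
    context: str
    access_rights: str


@dataclass
class ArchivalInformationPackage:
    content: bytes      # simplification: real Content Information also has RepInfo
    pdi: PreservationDescriptionInformation

    def fixity_ok(self) -> bool:
        """Check that the content still matches the recorded fixity value."""
        return hashlib.sha256(self.content).hexdigest() == self.pdi.fixity


content = b"example bits"
# All descriptive values below are placeholders.
pdi = PreservationDescriptionInformation(
    fixity=hashlib.sha256(content).hexdigest(),
    provenance="placeholder: who produced and processed the data",
    reference="placeholder: identifier",
    context="placeholder: related datasets",
    access_rights="placeholder: licence",
)
aip = ArchivalInformationPackage(content, pdi)
tampered = ArchivalInformationPackage(b"example bitz", pdi)
print(aip.fixity_ok(), tampered.fixity_ok())
```

Fixity only shows that the bits are unchanged; the other PDI components are what support claims about authenticity.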

Page 34

[Diagram: combined view of the Archival Information Package. Labels: Archival Package, Content, Package, Packaging, Data Object, Physical Object, Digital Object, Bit, Structure, Reference, Other, Provenance, Context, Fixity, Access Rights. Relations: further described by, derived from, described by, delimited by, interpreted using, adds meaning to]

Page 35

[Diagram: Representation Information and Provenance, linked by two “has” relations]

Page 36

USE DATA
• Use application to find data in Repository
• Create DIP with enough RepInfo for the user (via DC profile)
• Obtain more RepInfo from Registry if necessary

DRM

Cost sharing

Preservable infrastructure

Page 37

APARSEN

Integration (1000)
• 1100: Common Vision
• 1200: Staff and experience exchange
• 1300: Common standards
• 1400: Common testing environments
• 1500: Internal W/S & symposia
• 1600: Common tools, software repository and market place

Technical (2000)
• 2100: Preservation Services
• 2200: Identifiers & citability
• 2300: Storage solutions
• 2400: Authenticity & Provenance
• 2500: Interoperability & intelligibility
• 2600: Annotation, Reputation & data quality
• 2700: Scalability

Economic/Legal (3000)
• 3100: Digital Rights & access management
• 3200: Cost/benefit data collection and modelling
• 3300: Peer Review & 3rd party Certification
• 3400: Brokerage services
• 3500: Data policies and governance
• 3600: Business cases

Spreading excellence (4000)
• 4100: External W/S & symposia
• 4200: Formal qualifications
• 4300: Training courses
• 4400: Awareness raising
• 4500: Liaison with other stakeholders
• 4600: International liaison

Management (5000)
• 5100: Financial management
• 5200: Technical co-ord.
• 5300: Evaluate impact of the Network of Excellence

Page 38

Technical (2000)
• 2100: Preservation Services
• 2200: Identifiers & citability
• 2300: Storage solutions
• 2400: Authenticity & Provenance
• 2500: Interoperability & intelligibility
• 2600: Annotation, Reputation & data quality
• 2700: Scalability

Economic/Legal (3000)
• 3100: Digital Rights & access management
• 3200: Cost/benefit data collection and modelling
• 3300: Peer Review & 3rd party Certification
• 3400: Brokerage services
• 3500: Data policies and governance
• 3600: Business cases

Page 39

Trust
• Certification of repositories
• Reputation and trustability of datasets, publications and people
• Authenticity

Sustainability
• Business cases
• Preservation
• Cost/benefit analysis
• Transfer of custody – who to hand over to and what to hand over
• Storage solutions

Usability
• Intelligibility
• Use by common tools
• Cross domain usability
• Interoperability

Access
• Identity of datasets, publications, people
• Rights and responsibilities
• Policies and governance

Page 40

FUTURE
• Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved
• Non-maintainability of essential hardware, software or support environment may make the information inaccessible
• The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
• Access and use restrictions may not be respected in the future
• Loss of ability to identify the location of data
• The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
• The ones we trust to look after the digital holdings may let us down

Page 41

Links
• CASPAR – http://www.casparpreserves.eu
• CASPAR Source code – http://sourceforge.net/projects/digitalpreserve/
• OAIS Reference Model – http://public.ccsds.org/publications/archive/650x0b1.pdf
• and the updated draft is available from http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Overview.aspx
• CASPAR Validation report – http://www.casparpreserves.eu/Members/cclrc/Deliverables/caspar-validation-evaluation-report/at_download/file
• PARSE.Insight – www.parse-insight.eu
• Alliance for Permanent Access – www.alliancepermanentaccess.eu
• Digital Curation Centre – www.dcc.ac.uk

Page 42

END