
Repositories, Workspaces, Web Services - some ideas -


Page 1: Repositories, Workspaces, Web Services - some ideas -

Repositories, Workspaces, Web Services - some ideas -

Peter Wittenburg
The Language Archive - Max Planck Institute

CLARIN Research Infrastructure
Nijmegen, The Netherlands

Page 2: Repositories, Workspaces, Web Services - some ideas -

scope of workshop

• clear focus on technology and architecture issues for preservation and access

• many other issues not in focus although relevant
  • IPR, license issues only partially
  • quality of data & metadata
  • certification (RAC, DSA, etc.)
  • AAI
  • cost aspects
  • etc.

• let's have interactive presentations
• should be able to extract essentials

Page 3: Repositories, Workspaces, Web Services - some ideas -

Definitions?

Page 4: Repositories, Workspaces, Web Services - some ideas -

so simple

repository

Page 5: Repositories, Workspaces, Web Services - some ideas -

- orange - 2010
- plum - 2010
- pear - 2010
- apple - 2010

+ Metadata

repository

metadata registry

?

? dangerous since physical paths may change, etc.

Page 6: Repositories, Workspaces, Web Services - some ideas -

- orange - 2010
- plum - 2010
- pear - 2010
- apple - 2010

+ replication due to preservation

repository

metadata registry

repository

?

? dangerous since metadata records can be re-used; metadata should be stable

transfer at physical level

Page 7: Repositories, Workspaces, Web Services - some ideas -

- orange - 2010
- plum - 2010
- pear - 2010
- apple - 2010

+ replication and PIDs

repository

metadata registry

repository

- PID4 - 2010
- PID3 - 2010
- PID2 - 2010
- PID1 - URL1 - URL2

PID registry

?

dangerous: another indirection layer

transfer at physical level

access possible - which rights?

same access rights
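
The PID indirection on this slide can be sketched in a few lines. This is a minimal illustration only, not the resolver used by any real PID system: the registry contents, the example URLs, and the function names `resolve` and `move_replica` are all assumptions made for the sketch.

```python
# Hypothetical in-memory PID registry: one PID maps to the URLs of all replicas.
pid_registry = {
    "PID1": [
        "https://repo-a.example.org/data/apple",
        "https://repo-b.example.org/mirror/apple",
    ],
}

# Metadata records cite the PID, never a physical path.
metadata_registry = [{"title": "apple", "year": 2010, "pid": "PID1"}]


def resolve(pid, is_reachable=lambda url: True):
    """Return the first reachable replica URL registered for a PID."""
    for url in pid_registry.get(pid, []):
        if is_reachable(url):
            return url
    raise LookupError(f"no reachable replica for {pid}")


def move_replica(pid, old_url, new_url):
    """Record a changed physical location; metadata records stay untouched."""
    urls = pid_registry[pid]
    urls[urls.index(old_url)] = new_url
```

When a physical path changes, only the PID registry entry is updated, so the metadata record stays stable. The price is the extra lookup, which is the "another indirection layer" the slide warns about.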

Page 8: Repositories, Workspaces, Web Services - some ideas -

- orange - 2010
- plum - 2010
- pear - 2010

what about collections

repository

metadata registry

repository

- PID4 - 2010
- PID3 - 2010
- PID2 - 2010
- PID1 - URL1 - URL2

PID registry

transfer at physical level

- collection - 2010

- apple - 2010

- PIDx - URL

PS: collections are dynamic

Page 9: Repositories, Workspaces, Web Services - some ideas -

topic of high relevance

• ESFRI Task Force on Repositories (report)

• e-IRG/ESFRI Task Force on Data Management (report)

• Blue Ribbon Task Force on Sustainable Digital Preservation and Access (report)

• EC High Level Expert Group on Scientific Data (report)

• ASIS&T Summit Phoenix on Research Data and Access (slides & summary)

• T. Hey et al. The Fourth Paradigm: Data-Intensive Scientific Discovery (book)

Page 10: Repositories, Workspaces, Web Services - some ideas -

summarizing the challenges

• how to
  • manage the data tsunami
  • maintain data visibility
  • preserve the data (just seen one solution)
  • protect the data integrity
  • ensure that we get the object we wanted
  • guarantee data authenticity (how to present)
  • maintain context and provenance information
  • protect privacy and rights in complex data world
  • maintain trust in data
  • federate repositories to (virtually) integrate data
  • achieve (partial) interoperability
  • exploit distributed data without copying

Page 11: Repositories, Workspaces, Web Services - some ideas -

speaking about metadata harvesting

[Figure: metadata harvesting. User functions:]

- Access Data: With Extraction and Analysis, Through Catalog, Direct to Partner Sites
- View Information on Data: Through Catalog, Link to Data at Partner Site
- Search Shared Catalog

[Components: Data Mirror, Metadata Catalog, Harvester, Online Catalog, Online Analysis]

Page 12: Repositories, Workspaces, Web Services - some ideas -

speaking about architectures

Page 13: Repositories, Workspaces, Web Services - some ideas -

speaking about federations

Page 14: Repositories, Workspaces, Web Services - some ideas -

speaking about federations

Page 15: Repositories, Workspaces, Web Services - some ideas -

speaking about federations

Page 16: Repositories, Workspaces, Web Services - some ideas -

general configuration

[Figure: repositories A, B, C and mirror repositories X, Y, Z, each with its own]

- architecture
- rights domain
- access paths
- etc.

[each repository is connected through its own adapter(s)]

adapters can be special; this does not scale

Page 17: Repositories, Workspaces, Web Services - some ideas -

general configuration

[Figure: repositories A, B, C and mirror repositories X, Y, Z, each with its own]

- architecture
- rights domain
- access paths
- etc.

[each repository exposes an API; all APIs connect to a common replication layer]
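
A rough count shows why a shared API scales where special adapters do not. The sketch below assumes the worst case, in which every repository must exchange data with every other one and each direction needs its own adapter; the function names are hypothetical.

```python
def pairwise_adapters(n: int) -> int:
    """Adapters needed if each of n repositories talks directly to every other one."""
    return n * (n - 1)


def shared_api_implementations(n: int) -> int:
    """API implementations needed if all n repositories plug into one replication layer."""
    return n


if __name__ == "__main__":
    for n in (3, 6, 20):
        print(n, pairwise_adapters(n), shared_api_implementations(n))
```

For the six repositories drawn on the slide this gives 30 directed adapters against 6 API implementations, and the gap grows quadratically as more data centers join.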

Page 18: Repositories, Workspaces, Web Services - some ideas -

generic HLEG figure

[Figure: generic HLEG layer model]

- Data generators, Users
- User functionalities: Data capture & transfer, Virtual Research Environments
- Community Support Services: Data discovery & navigation, Workflow generation, Annotation, Interpretability
- Common Data Services: Safe & persistent storage, Identifiers, Authenticity, Workflow execution, Mining
- spanning all layers: Data Curation, Trust

Page 19: Repositories, Workspaces, Web Services - some ideas -

requirements for intermediate layer

• needs to cope with large diversity of solutions and architectures
• may only minimally interfere with local repository solutions (too much has been invested along community traditions)
• needs to respect rights domains and preserve access rights
• needs to be transparent to proven utilization mechanisms
• needs to operate at logical level (canonical collections)
• needs to scale with number of (community) data centers

• only one way to go:
  • separate functionality into independent components (data, metadata, PIDs, etc.)
  • specify proper interfaces (of course)

Page 20: Repositories, Workspaces, Web Services - some ideas -

requirements for layer

• how to manage procedures/workflows in complex landscape
• how to assess quality and correctness of all workflows
• how to maintain provenance information

• only one way to go
  • make use of an easy-to-interpret declarative language
  • establish proper "policy rules on all levels"
  • map these rules to robust and proven activities
  • separate declarative language from interpretation engine

• iRODS is an attempt in this direction (respect to Reagan Moore and his team)

• at MPI such a declarative language has been used for some years to manage access rights for the million objects which need to be treated individually and which are part of collections
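
The separation between declarative rules and the engine that interprets them can be illustrated as follows. This is a minimal sketch under stated assumptions: it is neither the iRODS rule language nor the language used at MPI, and the rule format, paths, and group names are invented for the example.

```python
from fnmatch import fnmatchcase

# Declarative part: plain data, readable without knowing the engine.
# Each rule is (object path pattern, user group, permission).
# Assumption for this sketch: later rules override earlier ones.
RULES = [
    ("corpus/*", "public", "deny"),
    ("corpus/open/*", "public", "read"),
    ("corpus/*", "staff", "read"),
]


def decide(rules, object_path, group):
    """Interpretation engine: return the permission of the last matching rule."""
    decision = "deny"  # default when no rule matches
    for pattern, rule_group, permission in rules:
        if rule_group == group and fnmatchcase(object_path, pattern):
            decision = permission
    return decision
```

Because the rules are data, a collection-level rule can cover many objects while a more specific rule still treats an individual object differently, and the engine can be replaced without rewriting the policies.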

Page 21: Repositories, Workspaces, Web Services - some ideas -

Reagan's data environments

• moving not bytes but collections
• need to maintain integrity of collections (incl. relations)
• collections are assembled for a certain purpose
• collections have properties to ensure their purpose
• policies ensure maintenance of properties
• procedures implement policies
• procedures result in state information
• assessment step to validate state

• purpose, properties, policies, procedures, state info
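
The chain from purpose to state information can be made concrete with a single integrity policy. The sketch assumes one property (recorded SHA-256 checksums) and one procedure that checks it; the class and function names are hypothetical and not taken from iRODS.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Collection:
    purpose: str                 # why the collection was assembled
    objects: dict                # object name -> content bytes
    expected_sha256: dict        # property: object name -> recorded hex digest
    state: dict = field(default_factory=dict)  # filled in by procedures


def checksum_procedure(collection):
    """Procedure for the policy 'every object matches its recorded checksum'."""
    collection.state = {}
    for name, payload in collection.objects.items():
        actual = hashlib.sha256(payload).hexdigest()
        collection.state[name] = actual == collection.expected_sha256.get(name)


def assess(collection):
    """Assessment: every expected object was checked and passed, with no extras."""
    complete = set(collection.state) == set(collection.expected_sha256)
    return complete and all(collection.state.values())
```

Running the procedure after each replication and then calling `assess` checks the collection as a whole, so a missing member is detected as well as a corrupted one.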

Page 22: Repositories, Workspaces, Web Services - some ideas -

program - 1st part

• Larry Lannom (CNRI): about a digital object architecture
• Alex Wade (MS): approach from MS
• Malte Dreyer: thoughts about generic API
• John Kennedy: heterogeneity of repositories in DEISA
• Ken Galluppi: federating several repositories
• Willem Elbers: federation tests with iRODS
• Jean-Yves Nief: iRODS in professional use
• Peter and Johannes: summary + discussion

Page 23: Repositories, Workspaces, Web Services - some ideas -

utilization challenge

• utilization software must not be affected by replication
• utilization software should also make use of copies
• any replication solution needs to demonstrate this!

[Figure label: existing utilization software]

Page 24: Repositories, Workspaces, Web Services - some ideas -

work spaces and profiles

• users want to
  • store data
  • protect data
  • share data
  • enrich data
  • change data
  • etc.

• data is somewhere in this complex domain

• users want transparent access

• how to get this done?

[Figure labels: profiles, attributes, quotas, etc.]

Page 25: Repositories, Workspaces, Web Services - some ideas -

processing chains - specification

data metadata registries

tool metadata registries

data -> operation -> data* -> operation

workflow specification framework

this is very discipline specific - various possibilities: curation/annotation/enrichment/visualization pipelines, etc.
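
A processing chain specified against a tool metadata registry can be sketched as below. The registry layout, the type labels, and the two text operations are assumptions chosen for illustration; real frameworks describe tools in far more detail.

```python
# Hypothetical tool metadata registry: each tool declares its input and output type.
TOOL_REGISTRY = {
    "tokenize": {"in": "text", "out": "tokens",
                 "run": lambda text: text.split()},
    "lowercase": {"in": "tokens", "out": "tokens",
                  "run": lambda tokens: [t.lower() for t in tokens]},
}


def validate(chain, input_type):
    """Specification step: check that each tool accepts what the previous one emits."""
    current = input_type
    for name in chain:
        tool = TOOL_REGISTRY[name]
        if tool["in"] != current:
            raise TypeError(f"{name} expects {tool['in']}, got {current}")
        current = tool["out"]
    return current


def execute(chain, data, input_type):
    """Execution step: data -> operation -> data* -> operation, in order."""
    validate(chain, input_type)
    for name in chain:
        data = TOOL_REGISTRY[name]["run"](data)
    return data
```

Keeping validation separate from execution mirrors the split between a workflow specification framework and a workflow execution framework: a chain can be rejected from its metadata alone, before any data is moved.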

Page 26: Repositories, Workspaces, Web Services - some ideas -

processing chains - execution

workflow execution framework

Page 27: Repositories, Workspaces, Web Services - some ideas -

the challenges

• large amounts of data are at mirroring repositories
• let's execute operations on the mirroring sites
• how to easily deploy operators
• how to inform the execution environment about how to invoke them
• how to let them act on the user's behalf
• etc.

Page 28: Repositories, Workspaces, Web Services - some ideas -

program - 2nd part

• SARA colleagues: workspace in NL
• Morris Riedel (FZJ): workspace ideas
• Johannes & John (RZG): operational aspects
• Thomas & Erhard (U Tübingen): WebLicht example
• Mike Papazoglou (U Tilburg): generic SOA aspects
• Peter: wrap up and discussion

Page 29: Repositories, Workspaces, Web Services - some ideas -

thanks for the attention