Luis Faria [email protected], KEEP SOLUTIONS, www.keep-solutions.com. Open Repositories 2013, Charlottetown, PEI, Canada, 2013-07-09. Supporting the preservation lifecycle in repositories. http://goo.gl/V6142

Supporting the preservation lifecycle in repositories


DESCRIPTION

To accomplish effective digital preservation, repositories need to incorporate processes such as planning, monitoring and preservation operations. These processes feed into each other and create a continuous cycle that allows a repository to detect opportunities and risks and act accordingly. Each of these digital preservation processes has already been extensively studied (Antunes et al., 2011; CCSDS, 2002; Hunter & Choudhury, 2006), and tools to support each process have already been developed (Asseg et al., 2013; Becker et al., 2009; Faria et al., 2012), but many repository implementations still lack complete and continuous digital preservation features. This presentation gives a global view of digital preservation processes and how they fit together in a digital preservation cycle. Furthermore, it describes tools that support these processes and explains how to incrementally integrate them into digital repositories to provide a complete, systematic and semi-automatic digital preservation system.

Citation preview

Page 1: Supporting the preservation lifecycle in repositories

Supporting the preservation lifecycle in repositories

Luis Faria [email protected]

KEEP SOLUTIONS www.keep-solutions.com

Open Repositories 2013, Charlottetown, PEI, Canada, 2013-07-09

http://goo.gl/V6142

Page 2: Supporting the preservation lifecycle in repositories

KEEP SOLUTIONS

http://www.keep-solutions.com

• Company specialized in information management
• Digital preservation experts
• Open source: RODA, KOHA, DSpace, Moodle, etc.
• Scientific research
  • SCAPE: large-scale digital preservation environments
  • 4C: digital preservation cost modeling

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Page 3: Supporting the preservation lifecycle in repositories

KEEP SOLUTIONS research partners

Page 4: Supporting the preservation lifecycle in repositories

The past: RODA 1.0.0

• Presented in Open Repositories 2009
• Open source digital repository
• Based on Fedora Commons
• Modern web interface
• For archives
• For digital preservation

Page 5: Supporting the preservation lifecycle in repositories
Page 6: Supporting the preservation lifecycle in repositories

The present: RODA Community

• Adapted to be a true open-source project
• For users
  • Easy to install
  • Easy to test (virtual machine)
  • Support mailing lists and documentation
  • Free or paid support
• For developers
  • Development and translation guidelines
  • Easy build (maven)
  • Available on GitHub
  • Support mailing lists
  • Plenty more documentation
• More info: http://www.roda-community.org

Page 7: Supporting the preservation lifecycle in repositories
Page 8: Supporting the preservation lifecycle in repositories

Current practice problems

• Repository has content
• Organization has policies in place (e.g. no compression allowed)

P1: Does the content conform to policies? Are there any risks? Even with changing content, policies and environment?

• Found a preservation risk!

P2: How to easily and trustworthily decide which action to take?
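
Question P1 can be made concrete with a small check of content against the example policy "no compression allowed". The sketch below is only an illustration: the `FileRecord` fields and the `find_policy_violations` helper are hypothetical names, not part of any tool presented in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Minimal description of one stored file (hypothetical fields)."""
    path: str
    mime_type: str
    compressed: bool


def find_policy_violations(files, allow_compression=False):
    """Return the files that break the 'no compression allowed' policy."""
    if allow_compression:
        return []
    return [f for f in files if f.compressed]


files = [
    FileRecord("img/0001.tif", "image/tiff", compressed=False),
    FileRecord("img/0002.tif", "image/tiff", compressed=True),
]
violations = find_policy_violations(files)
for f in violations:
    print(f"policy violation: {f.path} is compressed")
```

Because content, policies and environment all change over time, a check like this has to be re-run continuously rather than once at ingest.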

Page 9: Supporting the preservation lifecycle in repositories

Current practice problems

• Know what action to take

P3: How to ensure and monitor the quality of chosen action and that the decision assumptions remain valid?

• Content grows exponentially in volume, heterogeneity and complexity

P4: How to do digital preservation in large-scale environments?

Page 10: Supporting the preservation lifecycle in repositories

Preservation lifecycle

[Diagram: Repository and Environment and users. Flow: "access, ingest, harvest".]

Page 11: Supporting the preservation lifecycle in repositories

Preservation lifecycle

[Diagram: Watch is added to Repository and Environment and users. Flows: "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 12: Supporting the preservation lifecycle in repositories

Preservation lifecycle

[Diagram: Planning is added to Watch, Repository and Environment and users. Flows: "create/re-evaluate plans", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 13: Supporting the preservation lifecycle in repositories

Preservation lifecycle

[Diagram: Operations is added to Planning, Watch, Repository and Environment and users. Flows: "create/re-evaluate plans", "deploy plan", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 14: Supporting the preservation lifecycle in repositories

Preservation lifecycle

[Diagram: Planning, Operations, Watch, Repository and Environment and users. Flows: "create/re-evaluate plans", "deploy plan", "execute action plan", "monitored actions", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 15: Supporting the preservation lifecycle in repositories

Preservation lifecycle

[Diagram: Policies is added to Planning, Operations, Watch, Repository and Environment and users. Flows: "create/re-evaluate plans", "deploy plan", "execute action plan", "monitored actions", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 16: Supporting the preservation lifecycle in repositories

Preservation lifecycle (in practice)

[Diagram: Planning, Operations, Watch, Policies, Repository and Environment and users. Flows: "create/re-evaluate plans", "deploy plan", "execute action plan", "monitored actions", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 17: Supporting the preservation lifecycle in repositories

Preservation lifecycle (in practice)

[Diagram: the lifecycle of Planning, Operations, Watch, Policies, Repository and Environment and users, annotated with the tools Scout, Plato and Workflow engine. Flows: "create/re-evaluate plans", "deploy plan", "execute action plan", "monitored actions", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]
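
One pass through the cycle (watch, then planning, then operations) can be sketched in a few lines. This is a toy model under loud assumptions: the repository is a plain dictionary, the policy is a set of allowed formats, and the functions `watch`, `make_plan` and `operate` are hypothetical stand-ins, not the interfaces of Scout, Plato or any workflow engine. In practice the planning step involves a human planner.

```python
def watch(repository, allowed_formats):
    """Watch: report items whose format is not allowed by policy."""
    return [name for name, fmt in repository.items() if fmt not in allowed_formats]


def make_plan(at_risk, target_format):
    """Planning: decide one action (a target format) per item at risk."""
    return {name: target_format for name in at_risk}


def operate(repository, action_plan):
    """Operations: execute the plan and return a log of monitored actions."""
    log = []
    for name, target in action_plan.items():
        log.append((name, repository[name], target))
        repository[name] = target  # stands in for a real migration
    return log


repository = {"report.doc": "application/msword", "scan.pdf": "application/pdf"}
allowed = {"application/pdf"}

at_risk = watch(repository, allowed)
log = operate(repository, make_plan(at_risk, "application/pdf"))
print(log)
print(watch(repository, allowed))  # the next watch pass finds nothing at risk
```

The log returned by `operate` plays the role of the "monitored actions" flow: it is what a later watch pass would use to verify that the chosen action had the intended effect.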

Page 18: Supporting the preservation lifecycle in repositories

SCAPE Preservation Suite

[Diagram: architecture of the suite. Components: Repository, Environment and users, Scout, Plato, Workflow engine, Planner. Interfaces: Scout Web UI & Email notification, Notification API, Report API, Scout Adaptors, Plan management API, Data Connector API, Plato Web UI, Workflow engine API. Flows: "access, ingest, harvest", "monitored events and actions", "monitored content", "create/re-evaluate plans", "deploy plan", "execute plan".]

Page 19: Supporting the preservation lifecycle in repositories

SCAPE Preservation Suite

[Diagram: architecture of the suite, with the components Repository, Environment and users, Scout, Plato, Workflow engine and Planner, and the Notification, Report, Plan management, Data Connector and Workflow engine APIs.]

Small scale: Taverna, http://www.taverna.org.uk
Large scale: SCAPE platform, http://wiki.opf-labs.org/display/SP/SCAPE+Platform

Page 20: Supporting the preservation lifecycle in repositories

SCAPE Preservation Suite

[Diagram: architecture of the suite, with the components Repository, Environment and users, Scout, Plato, Workflow engine and Planner, and the Notification, Report, Plan management, Data Connector and Workflow engine APIs. The problems P1, P2 and P3 are marked on the diagram, together with "P4: Automation and integration".]

Page 21: Supporting the preservation lifecycle in repositories

SCAPE Preservation Suite: Tools and APIs

Page 22: Supporting the preservation lifecycle in repositories

SCAPE Digital object model

• Standard model for representing digital objects
• Based on METS and PREMIS
• Specifies intellectual entity (SIP, AIP and DIP)
• Specification: https://github.com/openplanets/scape-platform-api

Page 23: Supporting the preservation lifecycle in repositories

Data Connector API

• Access and modify content on the repository
• HTTP REST API
• Methods:
  • Retrieve intellectual entity, metadata, representation, file or named bit stream
  • Ingest intellectual entity (sync or async)
  • Update intellectual entity, representation or file
  • Search intellectual entities, representations or files (SRU)
• API specification: https://github.com/openplanets/scape-platform-api
• Ref. implementation: Fall 2013 in Fedora 4 and RODA
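
To give a feel for how a client might address the retrieve and search methods listed above, here is a minimal sketch that only builds request URLs. The base URL and every path segment (`entity`, `metadata`, `sru/...`) are assumptions made for illustration; the authoritative paths are those in the API specification. The SRU query parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) are the standard SRU searchRetrieve parameters.

```python
from urllib.parse import quote, urlencode, urljoin

# Hypothetical base URL and path segments; consult the API specification
# for the real ones.
BASE_URL = "http://repository.example.org/scape/"

RESOURCE_PATHS = {
    "entity": "entity",
    "metadata": "metadata",
    "representation": "representation",
    "file": "file",
    "bitstream": "bitstream",
}


def retrieve_url(kind, *ids):
    """Build the GET URL for one resource, identified by one or more ids."""
    if kind not in RESOURCE_PATHS:
        raise ValueError(f"unknown resource kind: {kind}")
    segments = [RESOURCE_PATHS[kind]] + [quote(i, safe="") for i in ids]
    return urljoin(BASE_URL, "/".join(segments))


def search_url(kind, query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL for entities, representations or files."""
    params = urlencode({
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    })
    return urljoin(BASE_URL, f"sru/{kind}") + "?" + params


print(retrieve_url("entity", "entity-1"))
print(search_url("entities", "report"))
```

Keeping URL construction separate from the HTTP call makes it easy to swap the path table once the real specification is at hand.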

Page 24: Supporting the preservation lifecycle in repositories

Report API

• Provides access to repository events
• Events:
  • Ingest started and finished
  • Viewed or downloaded descriptive metadata or representation
  • Preservation plan executed
• OAI-PMH data provider
• PREMIS events metadata
  • Agent: who triggered the event
  • Date/time: when did the event occur
  • Details: what happened
• API specification: https://github.com/openplanets/scape-platform-api
• Ref. implementation: https://github.com/openplanets/roda
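
Since the Report API is exposed as an OAI-PMH data provider, a harvester would issue standard OAI-PMH requests and read back events described by agent, date/time and details. The sketch below builds a `ListRecords` request and models an event. The endpoint URL, the `premis-event` metadata prefix and the event type labels are assumptions for illustration only; `verb`, `metadataPrefix` and `from` are standard OAI-PMH arguments.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlencode


@dataclass(frozen=True)
class RepositoryEvent:
    event_type: str        # e.g. "IngestFinished" (label is an assumption)
    agent: str             # who triggered the event
    occurred_at: datetime  # when the event occurred
    details: str           # what happened


def list_records_url(base_url, metadata_prefix, since=None):
    """Build an OAI-PMH ListRecords request, optionally incremental.

    `since` must be a timezone-aware datetime; it is sent as a UTC datestamp.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return base_url + "?" + urlencode(params)


def events_by_agent(events):
    """Count events per agent, a typical first aggregation for monitoring."""
    counts = {}
    for e in events:
        counts[e.agent] = counts.get(e.agent, 0) + 1
    return counts


url = list_records_url(
    "http://repository.example.org/oai",
    "premis-event",
    since=datetime(2013, 7, 9, tzinfo=timezone.utc),
)
print(url)
```

Harvesting with a `from` datestamp lets a watch component poll the repository repeatedly and receive only the events recorded since the previous pass.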

Page 25: Supporting the preservation lifecycle in repositories

Scout: a preservation watch system

• Monitors aspects of the world to detect preservation risks and opportunities
• Triple store
• Adaptors
  • Data Connector & Report API
  • SCAPE Policy model
  • PRONOM
  • Web semantic extraction
  • Renderability experiments
• Web interface
  • Triggers: templates and SPARQL
  • Email notifications
• Demo: http://scout.scape.keep.pt

[Diagram: Content, Policies, Web, Human knowledge and Registries feed into Scout, which produces a Risk notification.]

http://openplanets.github.io/scout/
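
The idea of a trigger, a condition evaluated over collected knowledge that produces a notification when it matches, can be illustrated without a real triple store. The sketch below is a deliberately simplified stand-in: a list of tuples replaces the triple store, a pattern-matching function replaces SPARQL, and all subject and predicate names are invented for the example. It does not reflect Scout's actual data model or API.

```python
# Knowledge base as (subject, predicate, object) triples; names are illustrative.
TRIPLES = [
    ("collection-A", "hasFormat", "image/tiff"),
    ("collection-A", "usesCompression", "LZW"),
    ("collection-B", "hasFormat", "application/pdf"),
    ("collection-B", "usesCompression", "none"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def compression_trigger(triples):
    """Fire for every subject whose compression is anything other than 'none'."""
    return sorted(
        s for (s, _, o) in match(triples, predicate="usesCompression") if o != "none"
    )


def notify(subjects, send=print):
    """Send one risk notification per subject; `send` could be an email hook."""
    for s in subjects:
        send(f"Risk notification: {s} violates the no-compression policy")


notify(compression_trigger(TRIPLES))
```

A trigger template would fix the shape of the condition (here, "predicate value differs from the allowed one") and leave only the parameters for the user to fill in.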

Page 35: Supporting the preservation lifecycle in repositories

Plan management API

• Deploy and manage preservation plans in the repository
• HTTP REST API
• Methods:
  • Search and retrieve plans
  • Deploy a new plan
  • Retrieve or add a plan execution state (in progress, success or fail)
  • Update plan lifecycle status (enabled or disabled)
• Implementation can use:
  • Workflow engine: Taverna or SCAPE platform
  • Data connector API
• API specification: https://github.com/openplanets/scape-platform-api
• Ref. implementation: Fall 2013 for Fedora 4 and RODA
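
The two kinds of state named in the methods above, the plan lifecycle status (enabled or disabled) and the plan execution state (in progress, success or fail), can be modelled as follows. The class and method names are hypothetical, and two behaviours are assumptions of this sketch rather than statements about the API: a newly deployed plan starts enabled, and a disabled plan refuses new execution states.

```python
from dataclasses import dataclass, field
from enum import Enum


class LifecycleStatus(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"


class ExecutionState(Enum):
    IN_PROGRESS = "in progress"
    SUCCESS = "success"
    FAIL = "fail"


@dataclass
class DeployedPlan:
    plan_id: str
    # Assumption: a freshly deployed plan is enabled.
    status: LifecycleStatus = LifecycleStatus.ENABLED
    executions: list = field(default_factory=list)

    def add_execution_state(self, state):
        """Record a new execution state; assumed to be rejected when disabled."""
        if self.status is LifecycleStatus.DISABLED:
            raise RuntimeError(f"plan {self.plan_id} is disabled")
        self.executions.append(state)

    def latest_state(self):
        """Return the most recent execution state, or None if never executed."""
        return self.executions[-1] if self.executions else None


plan = DeployedPlan("plan-1")
plan.add_execution_state(ExecutionState.IN_PROGRESS)
plan.add_execution_state(ExecutionState.SUCCESS)
print(plan.latest_state().value)
```

Keeping the full list of execution states, rather than only the latest one, preserves the history that a watch component needs in order to monitor plan executions over time.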

Page 36: Supporting the preservation lifecycle in repositories

Plato: a preservation planning tool

• Systematic planning
• Traceable, documented, trustworthy
• Integrated:
  • Data Connector API (Content)
  • Scout (Watch, Content profile, sampling)
  • SCAPE Policy model
  • Plan management API (Operations)
  • Taverna compatible workflows

[Diagram: Planning, between Watch and Operations. Inputs shown: Content profile, Policies, Risks or Opportunities, Environment information, Representative sample content, Action alternatives. Steps: Define requirements, Evaluate alternatives, Analyse results, Build preservation plan. Output: Preservation plan.]

http://ifs.tuwien.ac.at/dp/plato
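
The steps "evaluate alternatives" and "analyse results" amount to scoring each action alternative against the defined requirements and comparing the totals. The sketch below uses a plain weighted sum, which is an assumption chosen for simplicity and not a description of Plato's actual scoring method. The criteria, weights, scores and alternative names are all invented for the example.

```python
def weighted_score(scores, weights):
    """Weighted sum of criterion scores; weights are normalised to sum to 1."""
    total = sum(weights.values())
    return sum(scores[c] * w / total for c, w in weights.items())


def rank_alternatives(alternatives, weights):
    """Return alternative names ordered from best to worst weighted score."""
    return sorted(
        alternatives,
        key=lambda name: weighted_score(alternatives[name], weights),
        reverse=True,
    )


# Requirements expressed as weighted criteria (illustrative values, scale 0-5).
weights = {"no_compression": 3, "image_fidelity": 2, "cost": 1}
alternatives = {
    "keep as is": {"no_compression": 0, "image_fidelity": 5, "cost": 5},
    "migrate to uncompressed TIFF": {"no_compression": 5, "image_fidelity": 5, "cost": 2},
}

ranking = rank_alternatives(alternatives, weights)
print(ranking[0])
```

Recording the weights and per-criterion scores alongside the chosen alternative is what makes a decision traceable: anyone can later re-check why one action was preferred.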

Page 37: Supporting the preservation lifecycle in repositories
Page 38: Supporting the preservation lifecycle in repositories

Preservation lifecycle (in practice)

[Diagram: the lifecycle of Planning, Operations, Watch, Policies, Repository and Environment and users, annotated with the tools Scout, Plato and Workflow engine. Flows: "create/re-evaluate plans", "deploy plan", "execute action plan", "monitored actions", "monitored content and events", "monitored environment and users", "access, ingest, harvest".]

Page 39: Supporting the preservation lifecycle in repositories

Conclusions

P1: Does the content conform to policies? Are there any risks? Even with changing content, policies and environment?

P2: How to easily and trustworthily decide which action to take?

S1: Use Scout: preservation watch system

S2: Use Plato: preservation planning tool

Page 40: Supporting the preservation lifecycle in repositories

Conclusions

P3: How to ensure and monitor the quality of chosen action and that the decision assumptions remain valid?

P4: How to do digital preservation in large-scale environments?

S3: Quality assurance (QA) in preservation plans (Plato), monitoring of QA (Report API & Scout), automatic Scout triggers created by Plato

S4: Automation and end-to-end integration of preservation processes.

Page 41: Supporting the preservation lifecycle in repositories

Roadmap

• Scout:
  • User support
  • More adaptors
  • More trigger templates
• Plato:
  • Automatically create Scout triggers
  • Automatically deploy using plan management API
• Repository reference implementations: RODA and Fedora 4

Page 42: Supporting the preservation lifecycle in repositories

Conclusions

• All APIs published
• Ref. implementations in RODA and Fedora 4 in Fall 2013
• All tools available on GitHub

Add preservation to your repository now!

Page 43: Supporting the preservation lifecycle in repositories

Supporting the preservation lifecycle in repositories

Luis Faria [email protected]

KEEP SOLUTIONS www.keep-solutions.com

Open Repositories 2013, Charlottetown, PEI, Canada, 2013-07-09

http://goo.gl/V6142