Supporting the preservation lifecycle in repositories

DESCRIPTION

To accomplish effective digital preservation, repositories need to incorporate processes such as planning, monitoring and preservation operations. These processes feed into each other and create a continuous cycle that allows a repository to detect opportunities and risks and act accordingly. Each of these digital preservation processes has already been extensively studied (Antunes et al., 2011; CCSDS, 2002; Hunter & Choudhury, 2006), and tools to support each process have already been developed (Asseg et al., 2013; Becker et al., 2009; Faria et al., 2012), but many repository implementations still lack complete and continuous digital preservation features. This presentation gives a global view of digital preservation processes and how they fit together in a digital preservation cycle. Furthermore, it describes tools that support these processes and explains how to integrate them incrementally into digital repositories, providing a complete, systematic and semi-automatic digital preservation system.

Luis Faria, lfaria@keep.pt

KEEP SOLUTIONS, www.keep-solutions.com

Open Repositories 2013, Charlottetown, PEI, Canada, 2013-07-09

Supporting the preservation lifecycle in repositories

http://goo.gl/V6142

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

KEEP SOLUTIONS

• Company specialized in information management
• Digital preservation experts
• Open source: RODA, KOHA, DSpace, Moodle, etc.
• Scientific research
  • SCAPE: large-scale digital preservation environments
  • 4C: digital preservation cost modeling

KEEP SOLUTIONS research partners

The past: RODA 1.0.0

• Presented in Open Repositories 2009
• Open source digital repository
• Based on Fedora Commons
• Modern web interface
• For archives
• For digital preservation

The present: RODA Community

• Adapted to be a true open-source project
• For users
  • Easy to install
  • Easy to test (virtual machine)
  • Support mailing lists and documentation
  • Free or paid support
• For developers
  • Development and translation guidelines
  • Easy build (Maven)
  • Available on GitHub
  • Support mailing lists
  • Plenty more documentation
• More info: http://www.roda-community.org

Current practice problems

• Repository has content
• Organization has policies in place (e.g. no compression allowed)

P1: Does the content conform to policies? Are there any risks, even as content, policies and environment change?

• Found a preservation risk!

P2: How to easily and trustworthily decide which action to take?

Current practice problems

• Know what action to take

P3: How to ensure and monitor the quality of the chosen action and that the decision assumptions remain valid?

• Content grows exponentially in volume, heterogeneity and complexity

P4: How to do digital preservation in large-scale environments?

Preservation lifecycle

[Figure, built up over several slides: the preservation lifecycle. Components: Repository, Environment and users, Watch, Planning, Operations, Policies. Flows: access, ingest, harvest; monitored content and events; monitored environment and users; monitored actions; create/re-evaluate plans; deploy plan; execute action plan.]

Preservation lifecycle (in practice)

[Figure: the preservation lifecycle diagram, annotated with the tools that support it: Scout, Plato and a workflow engine.]

SCAPE Preservation Suite

[Figure, built up over several slides: the SCAPE Preservation Suite. Components: Repository, Environment and users, Scout, Plato, Workflow engine, Planner. Interfaces: Data Connector API, Report API, Plan management API, Notification API, Workflow engine API, Scout Adaptors, Scout Web UI & Email notification, Plato Web UI. Flows: access, ingest, harvest; monitored content; monitored events and actions; create/re-evaluate plans; deploy plan; execute plan.]

Workflow engine:

• Small scale: Taverna, http://www.taverna.org.uk
• Large scale: SCAPE platform, http://wiki.opf-labs.org/display/SP/SCAPE+Platform

The final build labels the figure with P1, P2, P3 and P4 (automation and integration).

SCAPE Preservation Suite: Tools and APIs

SCAPE Digital object model

• Standard model for representing digital objects
• Based on METS and PREMIS
• Specifies intellectual entity (SIP, AIP and DIP)
• Specification: https://github.com/openplanets/scape-platform-api
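As a rough illustration, the levels that the Data Connector API slide lists (intellectual entity, representation, file, named bit stream) could be held in memory as nested records. This is only a sketch: every class and field name below is an assumption made for the example, and the actual model is defined by the specification in terms of METS and PREMIS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class File:
    # Hypothetical fields; the real model describes files through METS/PREMIS.
    identifier: str
    mimetype: str
    bitstreams: List[str] = field(default_factory=list)


@dataclass
class Representation:
    identifier: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    identifier: str
    descriptive_metadata: Dict[str, str] = field(default_factory=dict)
    representations: List[Representation] = field(default_factory=list)

    def all_files(self) -> List[File]:
        """Flatten the files of every representation into one list."""
        return [f for rep in self.representations for f in rep.files]


# Made-up example entity with an original and an access representation.
entity = IntellectualEntity(
    identifier="entity-1",
    descriptive_metadata={"title": "Example report"},
    representations=[
        Representation("rep-original", [File("file-1", "image/tiff")]),
        Representation("rep-access", [File("file-2", "image/jpeg")]),
    ],
)
print([f.mimetype for f in entity.all_files()])
```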

Data Connector API

• Access and modify content on the repository
• HTTP REST API
• Methods:
  • Retrieve intellectual entity, metadata, representation, file or named bit stream
  • Ingest intellectual entity (sync or async)
  • Update intellectual entity, representation or file
  • Search intellectual entities, representations or files (SRU)
• API specification: https://github.com/openplanets/scape-platform-api
• Ref. implementation: Fall 2013 in Fedora 4 and RODA
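A client for the retrieve and search methods might build its request URLs along these lines. This is a sketch under stated assumptions: the base URL and every resource path are hypothetical placeholders, so the real paths should be taken from the API specification. Only the SRU query parameters (operation, version, query) follow the SRU standard itself.

```python
from urllib.parse import quote, urlencode, urljoin

# Assumed base URL of a repository exposing the API (hypothetical).
BASE_URL = "http://repository.example.org/scape/"

# Hypothetical resource paths; check the API specification for the real ones.
RESOURCE_PATHS = {
    "entity": "entity",
    "metadata": "metadata",
    "representation": "representation",
    "file": "file",
    "bitstream": "bitstream",
}


def retrieve_url(kind: str, *ids: str) -> str:
    """Build the GET URL for one resource, e.g. a file inside a representation."""
    if kind not in RESOURCE_PATHS:
        raise ValueError(f"unknown resource kind: {kind}")
    segments = [RESOURCE_PATHS[kind]] + [quote(i, safe="") for i in ids]
    return urljoin(BASE_URL, "/".join(segments))


def search_url(kind: str, query: str) -> str:
    """Build an SRU searchRetrieve URL for entities, representations or files."""
    params = urlencode(
        {"operation": "searchRetrieve", "version": "1.2", "query": query}
    )
    return urljoin(BASE_URL, f"sru/{kind}") + "?" + params


print(retrieve_url("file", "entity-1", "rep-original", "file-1"))
print(search_url("entities", "report"))
```

Keeping URL construction separate from the HTTP call makes the path layout easy to swap once the real specification has been checked.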

Report API

• Provides access to repository events
• Events:
  • Ingest started and finished
  • Viewed or downloaded descriptive metadata or representation
  • Preservation plan executed
• OAI-PMH data provider
• PREMIS events metadata
  • Agent: who triggered the event
  • Date/time: when did the event occur
  • Details: what happened
• API specification: https://github.com/openplanets/scape-platform-api
• Ref. implementation: https://github.com/openplanets/roda
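Once an event record has been harvested, a consumer could pull out the agent, date/time and details fields roughly as follows. This is a sketch, assuming events are serialized with the PREMIS version 2 schema. The sample record is hand-written for illustration and is not real repository output, and the harvesting step over OAI-PMH is left out.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = {"premis": "info:lc/xmlschema/premis-v2"}

# Hand-written sample event for illustration; not real repository output.
SAMPLE_EVENT = """
<premis:event xmlns:premis="info:lc/xmlschema/premis-v2">
  <premis:eventIdentifier>
    <premis:eventIdentifierType>local</premis:eventIdentifierType>
    <premis:eventIdentifierValue>event-42</premis:eventIdentifierValue>
  </premis:eventIdentifier>
  <premis:eventType>ingest finished</premis:eventType>
  <premis:eventDateTime>2013-07-09T10:15:00Z</premis:eventDateTime>
  <premis:eventDetail>Ingest of entity-1 completed</premis:eventDetail>
  <premis:linkingAgentIdentifier>
    <premis:linkingAgentIdentifierType>user</premis:linkingAgentIdentifierType>
    <premis:linkingAgentIdentifierValue>archivist1</premis:linkingAgentIdentifierValue>
  </premis:linkingAgentIdentifier>
</premis:event>
"""


def parse_event(xml_text: str) -> dict:
    """Extract who, when and what from a single PREMIS event element."""
    event = ET.fromstring(xml_text.strip())

    def text(path: str):
        node = event.find(path, PREMIS_NS)
        return node.text if node is not None else None

    return {
        "type": text("premis:eventType"),
        "agent": text(
            "premis:linkingAgentIdentifier/premis:linkingAgentIdentifierValue"
        ),
        "datetime": text("premis:eventDateTime"),
        "details": text("premis:eventDetail"),
    }


print(parse_event(SAMPLE_EVENT))
```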

Scout: a preservation watch system

• Monitors aspects of the world to detect preservation risks and opportunities
• Triple store
• Adaptors
  • Data Connector & Report API
  • SCAPE Policy model
  • PRONOM
  • Web semantic extraction
  • Renderability experiments
• Web interface
  • Triggers: templates and SPARQL
  • Email notifications
• Demo: http://scout.scape.keep.pt
• http://openplanets.github.io/scout/

[Figure: Scout, with Content, Policies, Web, Human knowledge and Registries as sources and Risk notification as output.]
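The idea of a trigger over a triple store can be illustrated with a toy example. This is not Scout's implementation: Scout triggers are templates or SPARQL queries, while the sketch below swaps in a plain Python pattern match over an in-memory list of triples. All names, facts and the notification channel are invented for the example.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(store: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every non-None field of the pattern."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


class Trigger:
    """A named condition over the store plus a notification callback."""

    def __init__(self, name: str,
                 condition: Callable[[List[Triple]], List[Triple]],
                 notify: Callable[[str], None]) -> None:
        self.name = name
        self.condition = condition
        self.notify = notify

    def evaluate(self, store: List[Triple]) -> bool:
        """Run the condition; notify and return True if anything violates it."""
        violations = self.condition(store)
        if violations:
            subjects = ", ".join(sorted({t[0] for t in violations}))
            self.notify(f"{self.name}: {subjects}")
        return bool(violations)


# Facts as an adaptor might report them (made-up values).
store: List[Triple] = [
    ("file-1", "compression", "none"),
    ("file-2", "compression", "lzw"),
    ("file-3", "format", "image/tiff"),
]


def compressed_files(s: List[Triple]) -> List[Triple]:
    """Condition for the example policy 'no compression allowed'."""
    return [t for t in match(s, predicate="compression") if t[2] != "none"]


sent: List[str] = []  # stands in for the email notification channel
trigger = Trigger("Compressed content found", compressed_files, sent.append)
trigger.evaluate(store)
print(sent)
```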

Plan management API

• Deploy and manage preservation plans in the repository
• HTTP REST API
• Methods:
  • Search and retrieve plans
  • Deploy a new plan
  • Retrieve or add a plan execution state (in progress, success or fail)
  • Update plan lifecycle status (enabled or disabled)
• Implementation can use:
  • Workflow engine: Taverna or SCAPE platform
  • Data connector API
• API specification: https://github.com/openplanets/scape-platform-api
• Ref. implementation: Fall 2013 for Fedora 4 and RODA
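The state that a repository has to keep behind these methods can be sketched with an in-memory store. The class and method names are hypothetical and only mirror the operations listed above; they are not the API's actual resources. The rule that a disabled plan cannot start a new execution is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class LifecycleStatus(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"


class ExecutionState(Enum):
    IN_PROGRESS = "in progress"
    SUCCESS = "success"
    FAIL = "fail"


@dataclass
class Plan:
    plan_id: str
    title: str
    status: LifecycleStatus = LifecycleStatus.ENABLED
    executions: List[ExecutionState] = field(default_factory=list)


class PlanStore:
    """In-memory stand-in for the repository side of plan management."""

    def __init__(self) -> None:
        self._plans: Dict[str, Plan] = {}

    def deploy(self, plan: Plan) -> None:
        if plan.plan_id in self._plans:
            raise ValueError(f"plan already deployed: {plan.plan_id}")
        self._plans[plan.plan_id] = plan

    def retrieve(self, plan_id: str) -> Plan:
        return self._plans[plan_id]

    def search(self, text: str) -> List[Plan]:
        return [p for p in self._plans.values() if text.lower() in p.title.lower()]

    def add_execution_state(self, plan_id: str, state: ExecutionState) -> None:
        plan = self.retrieve(plan_id)
        # Assumption of this sketch: disabled plans cannot start executing.
        if (plan.status is LifecycleStatus.DISABLED
                and state is ExecutionState.IN_PROGRESS):
            raise RuntimeError(f"plan is disabled: {plan_id}")
        plan.executions.append(state)

    def set_status(self, plan_id: str, status: LifecycleStatus) -> None:
        self.retrieve(plan_id).status = status


plans = PlanStore()
plans.deploy(Plan("plan-1", "Remove compression from images"))
plans.add_execution_state("plan-1", ExecutionState.IN_PROGRESS)
plans.add_execution_state("plan-1", ExecutionState.SUCCESS)
plans.set_status("plan-1", LifecycleStatus.DISABLED)
print(plans.retrieve("plan-1"))
```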

Planning

[Figure: the planning workflow. Steps: Define requirements, Evaluate alternatives, Analyse results, Build preservation plan. Other labels: Content profile, Policies, Risks or Opportunities, Environment information, Representative sample content, Action alternatives, Preservation plan, Watch, Operations.]
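The four planning steps can be walked through with a small worked example. This is a sketch only: the criteria, weights, measurements and action names are invented, and a simple weighted sum is assumed as the way to analyse results, which may differ from how a planning tool actually aggregates its evaluation.

```python
from typing import Dict, List

# Step 1, define requirements: criteria with weights that sum to 1
# (hypothetical values, derived here from a "no compression" policy).
WEIGHTS: Dict[str, float] = {
    "no_compression": 0.5,
    "content_preserved": 0.4,
    "processing_speed": 0.1,
}

# Step 2, evaluate alternatives: each action scored 0 to 5 per criterion,
# as if measured on representative sample content (made-up numbers).
ALTERNATIVES: Dict[str, Dict[str, float]] = {
    "keep current files": {
        "no_compression": 0, "content_preserved": 5, "processing_speed": 5,
    },
    "convert to uncompressed TIFF": {
        "no_compression": 5, "content_preserved": 5, "processing_speed": 2,
    },
    "convert to JPEG": {
        "no_compression": 0, "content_preserved": 2, "processing_speed": 4,
    },
}


def score(measurements: Dict[str, float], weights: Dict[str, float]) -> float:
    """Step 3, analyse results: weighted sum over all criteria."""
    return sum(weights[c] * measurements[c] for c in weights)


def build_plan(alternatives: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> dict:
    """Step 4, build the plan: rank the alternatives and recommend the best."""
    ranking: List[str] = sorted(
        alternatives, key=lambda name: score(alternatives[name], weights),
        reverse=True,
    )
    return {"recommended_action": ranking[0], "ranking": ranking}


plan = build_plan(ALTERNATIVES, WEIGHTS)
print(plan["recommended_action"])
```

Recording the weights and measurements alongside the recommendation is what makes such a decision traceable afterwards.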

Plato: a preservation planning tool

• Systematic planning
• Traceable, documented, trustworthy
• Integrated:
  • Data Connector API (Content)
  • Scout (Watch, Content profile, sampling)
  • SCAPE Policy model
  • Plan management API (Operations)
  • Taverna compatible workflows
• http://ifs.tuwien.ac.at/dp/plato

Preservation lifecycle (in practice)

[Figure: the preservation lifecycle diagram shown again, annotated with Scout, Plato and the workflow engine.]

Conclusions

P1: Does the content conform to policies? Are there any risks, even as content, policies and environment change?

S1: Use Scout: preservation watch system

P2: How to easily and trustworthily decide which action to take?

S2: Use Plato: preservation planning tool

Conclusions

P3: How to ensure and monitor the quality of the chosen action and that the decision assumptions remain valid?

S3: Quality assurance (QA) in preservation plans (Plato), monitoring of QA (Report API & Scout), automatic Scout triggers created by Plato

P4: How to do digital preservation in large-scale environments?

S4: Automation and end-to-end integration of preservation processes.

Roadmap

• Scout:
  • User support
  • More adaptors
  • More trigger templates
• Plato:
  • Automatically create Scout triggers
  • Automatically deploy using plan management API
• Repository reference implementations: RODA and Fedora 4

Conclusions

• All APIs published
• Ref. implementations in RODA and Fedora 4 in Fall 2013
• All tools available on GitHub

Add preservation to your repository now!
