Developing data management expertise at King’s College London Experience of the PEKin project Gareth Knight, Centre for e-Research (CeRch) Lindsay Ould, Archives & Information Management (AIM)

Developing data management expertise at King’s College London

Experience of the PEKin project

Gareth Knight, Centre for e-Research (CeRch)

Lindsay Ould, Archives & Information Management (AIM)

Overview

1. Aims & objectives of PEKin project

2. Project methodology

3. Findings on current state of data management

4. Action taken to address issues

5. Further work to be performed

6. Lessons learnt

7. Potential for reuse of project deliverables

What is PEKin?

Title: Preservation Exemplar at King’s (PEKin)

Funder: JISC, Preservation strand of Information Environment 09-11

Time period: 1 April 09 – 31 October 10

Project partners:
• Centre for e-Research (CeRch)
• Archives & Information Management (AIM)
• Based at King’s College London

What is a Digital Record?

“Recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity”

International Council on Archives (ICA)

New archiving challenges

Changing state of digital information:
• Changing notion of what constitutes a record of business:
  • Core business: student information, committees, estates, etc.
  • Increasingly research outputs (data, papers) – funder requirements
• Changing composition:
  • Born digital content (static and dynamic resources)
  • Hybrid (paper+digital), digital only
• Lifecycles:
  • Creation process: Create, revise, publish 1st version, revise, publish 2nd version. Repeat.
  • Access lifecycle: Technology dependencies (hardware & software)

Implications:
• Archival process: Archive at earlier stage? Capture using different technologies?
• Data value: Can we be sure that everything has business value?

Methodology

1. Evaluate existing information management procedures and working practices at institutional level and revise accordingly

• What remains viable?
• Elements that require revision
• Gaps and omissions

2. Determine the data management needs of data producers and systems managers in academic units/professional services, and the most effective approach to addressing them

3. Implement a technical system capable of curating and preserving digital records of long-term archival value.

Review of existing frameworks

Reviewed DAF, DRAMBORA, DIRKS, etc. All required further refinement to apply to our own situation.

DAF: Data Asset/Audit Framework
+ Useful for gathering detailed information on data assets located in departments
+ Useful for analysing data management practices
- Time-consuming to perform
- Does not provide a method of evaluating problems & developing a mitigation strategy

DRAMBORA
+ Provides formal structure for identifying, describing & evaluating risks & developing a strategy to mitigate or avoid them
+ Well-defined list of risk categories and factors
- Intended for OAIS-like environments rather than less formalised research ‘systems’
- Focus upon OAIS workflow rather than data creation lifecycle
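
The kind of risk evaluation that DRAMBORA formalises can be illustrated with a small risk register. This is only a sketch: it assumes a simple probability × impact score, which is a common risk-register convention, and the 1 to 5 scales, the `Risk` class, the `rank_risks` helper and the example risks are all hypothetical illustrations rather than DRAMBORA templates or PEKin findings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    """One entry in a hypothetical data management risk register."""
    name: str
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed scoring rule: severity is probability multiplied by impact.
        return self.probability * self.impact


def rank_risks(risks: List[Risk]) -> List[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Illustrative entries only; not results from the project.
    register = [
        Risk("Data held only on a personal drive", probability=4, impact=4),
        Risk("Staff member leaves without handover", probability=3, impact=5),
        Risk("File format becomes obsolete", probability=2, impact=3),
    ]
    for risk in rank_risks(register):
        print(f"{risk.score:>2}  {risk.name}")
```

Ranking by score gives a first ordering for discussion with stakeholders; a mitigation strategy would still need to be agreed for each risk individually.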

Integrating frameworks

DAF & DRAMBORA are broadly similar, but some work needed:
• Normalised terminology and definitions & adopted some archival terminology
• Activity classification: Activities placed in different categories in DRAMBORA & DAF
• “Light touch” approach - establish balance between DRAMBORA system-level & DAF asset-level analysis
  • High-level analysis of data assets using DAF
  • Omitted various DRAMBORA risk categories unrelated to data management
• Adopted e-Research lifecycle model
• Stages were tied in with distinct project outputs

Audit Framework

Plan audit:
• Initial contact with unit manager
• Establish buy-in
• Select methodology and prepare material

Analyse documentary sources:
• Obtain and analyse publicly accessible resources
• Obtain and analyse internal resources
• Document information

Assess management approach:
• Analyse data assets
• Analyse data management activities
• Produce case study

Analyse risk:
• Identify risks and determine consequences
• Evaluate risks and determine consequences
• Prepare risk analysis report

Develop management strategy:
• Develop mitigation strategy
• Present mitigation strategies to stakeholders
• Develop mitigation plan

Implement management strategy:
• Implement mitigation plan
• Evaluate success of mitigation plan implementation

Administrative Case Studies

Business departments & content types examined:

• Committee: Council, Academic Board and sub-committees
• Estates: Project & operational records
• Student: Records held outside the Student system (SITS)

• Digital records of archival value
• Mapped to current College paper holdings

Research Case Studies

Research groups/projects/departments examined:

• Environmental Research Group (ERG)
  • Environment Monitoring group
  • Environment Modelling group
• Twins Early Development Study (TEDS)
• Regional Information Collection Centre (RICC)

• Period of change – since April 2010, IT provided centrally with storage provision review underway
• Archives have previously accepted pioneering research data
• Acquisition policy is now under review for born digital/digitised records

Administrative study findings

• Opportunity to redefine collections
• All areas required digital records management support before archives could be identified
• Quality control varied between records
• Duplication with paper and born-digital versions retained
• Lack of ownership of born digital records by administrative staff

Research study findings

• Challenge to identify data sets of archival value
• TEDS & ERG funded dedicated data management roles including back-up & information security processes
• However, majority of research groups do not have equivalent support, placing data at risk
• Funding bids lacked formal data management plans to provide assurance or influence further funding
• Continuing preservation of data not considered; focus is on current work

Comparison of research & admin data management

• Individual researchers & administrative staff lack understanding of risk and take a personal approach to managing data

• Understanding of digital environment is still outside their comfort zone - hybrid duplicated collections

• High risk when staff (Principal Investigators or Administrators) leave

• No point of contact for advice or support

Risk Assessment of research data management

• Multiple risks identified
• Active data management was good - recommendations made for best practice
• Mitigation
  • Content versioning system
  • Store multiple versions of each data file
  • Implement integrity monitoring
  • Data management plan to document approach
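
Integrity monitoring of the kind recommended above can be sketched as a checksum manifest that is rebuilt periodically and compared with the previous one. This is a minimal illustration, not the project's tooling: the choice of SHA-256 and the function names `sha256_of`, `build_manifest` and `compare_manifests` are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing, newly added, or changed in content."""
    shared = set(old) & set(new)
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }
```

A data manager might store the output of `build_manifest` alongside each backup and run `compare_manifests` on a schedule; any entry under "missing" or "changed" that does not correspond to a deliberate new version is a prompt to restore from an earlier copy.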

Risk Assessment of administrative data management

• More risks identified than with research data
• Lack of business owner for data sets
• ISS provide storage & systems management but little data management expertise
• ISS Data Management role now in place
• Move to digital capture will address risks
• Risk mitigation as for research records

Actions taken by project

1. Institution-level Policies

2. Work with departments to address data management risks

3. Documentation

4. Implementation of KCL Digital Archive

1. Institution-level Policies

Update of existing policies:
• Acquisition policy: Refinements to existing acquisition policies
• Retention Policy: Appraisal criteria for records of value
• Information Management: Appraisal criteria and advisory material

Develop new policies:
• Preservation Policy: content preservation strategy for institutional data of short and long-term value

See http://www.kcl.ac.uk/iss/igc/tools/staff.html for guidance currently available

2. Liaise with data creators & managers

•Enable management to gain a better understanding of data assets within their department/group and the potential risk factors that may limit data usage.

•Work with data producers & systems managers to address data management issues that they identified as a concern, e.g. versioning

•Make data producers & management aware of risk factors that exist and make recommendations for actions that may help to avoid or mitigate issues.

•Make them aware of support available within College & other departments/groups/projects that are working to resolve common issues.

3. Documentation

Self-help documentation to help data creators & managers to:
• Understand data management issues & key concepts
• Take practical steps to diagnose and address DM issues, and identify people to contact

Data Management ‘workbook’:
• Creating your data: Issues to consider prior to and in the early stages of development to ensure data is fit for purpose & usable over time.
• Organising your data: Methods for structuring & documenting data to enable it to be used & understood
• Maintaining access and use of data: Approaches that may be adopted to ensure continued access & use of data.
• Appraising your data: Recommendations for applying archival principles

Content Type Reports:
• Short pragmatic reports tailored to specific content types (raster images, audio, e-mail, documents)

To be published on KCL web site in near future

4. KCL Digital Archive

Implemented Alfresco ECM (Community Edition) to manage college data of long-term archival value

Standards compliance
• OAIS RM, U.S. Department of Defense 5015.2-STD, ISO 15489, TRAC when in full service

• Bitstream preservation:
  • Fixity creation/verification, online + offline storage
• Information Content Preservation:
  • Format conversion, event logging – audit trail
• Access:
  • Limited to archive reading room, catalogue descriptive metadata to common standard

Rules-based approach to data management

jBPM synchronous or asynchronous workflows
• Content model compliance
  • Conforms to defined structure & object types
• Fixity generation
  • All: MD5, SHA-1, CRC
• Format identification
  • All: file(1), DROID
• Technical metadata extraction
  • Format specific: JHOVE, MP3Info, others
• Conversion to preservation & dissemination derivative
  • Parameters for each format & metadata criteria (e.g. OpenOffice, ImageMagick)
  • Record action results as PREMIS Event
• Close collection to prevent further update
• Obsolescence monitoring?
  • Risk assessment based upon future development of PRONOM/UDFR
• Manual activity for future date?
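
The fixity generation rule, together with recording the result as an event, can be sketched in a few lines. This is an illustration under stated assumptions rather than the Alfresco/jBPM implementation: it reads "CRC" as CRC-32, and the event is a plain dictionary whose keys borrow names from PREMIS Event semantic units in simplified form, not a valid PREMIS record. The function names `generate_fixity` and `fixity_event` and the object identifier are hypothetical.

```python
import hashlib
import zlib
from datetime import datetime, timezone
from typing import Dict, Optional


def generate_fixity(data: bytes) -> Dict[str, str]:
    """Compute the three fixity values listed in the workflow rule."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA-1": hashlib.sha1(data).hexdigest(),
        # Assumption: the slide's "CRC" means CRC-32, shown as 8 hex digits.
        "CRC32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


def fixity_event(object_id: str, data: bytes,
                 expected: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Build a simplified, PREMIS-inspired event describing the action.

    With no expected values the event records digest creation; otherwise
    it records a fixity check and whether every digest still matches.
    """
    digests = generate_fixity(data)
    if expected is None:
        event_type, outcome = "message digest calculation", "success"
    else:
        event_type = "fixity check"
        outcome = "success" if digests == expected else "failure"
    return {
        "eventType": event_type,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "linkingObjectIdentifier": object_id,  # hypothetical identifier
        "eventOutcome": outcome,
        "eventDetail": digests,
    }
```

On ingest, `fixity_event("obj-1", content)` would record the initial digests; a later call with `expected=` set to those stored digests yields a "fixity check" event whose outcome feeds the audit trail.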

Future Plans

• Embedding approach into archives & wider institution

• Identify research management needs at an early stage (funding proposal, active/semi-active use) rather than at the end

• Skills audit & needs assessment
• Support & training for data management staff

• College Storage strategy
• Increased availability of College storage

Lessons Learnt

• Better understanding of data ‘ecosystem’ in college – data lifecycle, infrastructure

• Progress made with identifying & addressing data management support – need to ‘scale-up’ to college as whole.

• Need to manage semi-current records, in addition to active and archival records

• Requirements for storage
• Raised profile for Archives & CeRch
• Need for cross-disciplinary approach to managing data – combination of expertise & shared language

What may be used by other projects?

Output: Use
• Project methodology: Anyone wishing to combine archival/curation approach for managing digital records
• Audit methodology + templates: Anyone wishing to perform similar assessment and evaluation of DM activities
• Data Management workbook & Content Type reports: Anyone wishing to implement DM practices in their own institution/compare against others/staff wishing to improve DM practices
• Data management system: Experience & documentation on use of Alfresco as preservation system

Thank You

Any questions?

Gareth Knight, Centre for e-Research (CeRch), [email protected]

Lindsay Ould, Archives & Information Management (AIM), [email protected]