Repositioning for repositories: making the move to science data management Gerry Ryder CSIRO Information Management & Technology 21 January 2009

Repositioning for repositories: making the move to science data management

Gerry Ryder

CSIRO Information Management & Technology

21 January 2009

CSIRO. Repositioning for repositories. Information Online 2009, Sydney, Australia.

Managing science data

“Researchers need to be obliged to document and manage their data with as much professionalism as they devote to their experiments. And they should receive greater support in this endeavour than they are afforded at present.”

    - Editorial, Nature 455, 1 (4 September 2008)

Presentation overview

• About CSIRO

• About e-Science

• Managing the ‘data deluge’

• CSIRO’s e-Science Information Management Strategy

• e-SIM pilot project and data survey

• Developing the CSIRO Data Management Service

• Opportunities, issues and challenges

About CSIRO

• CSIRO: the Commonwealth Scientific and Industrial Research Organisation

• Australia’s national science agency

• 6,400 staff across 50 sites throughout Australia and overseas

• Researching agribusiness, energy and transport, environment and natural resources, health, information and communications technology, manufacturing and mineral resources

About e-Science

• Computationally intensive science carried out in highly distributed network environments

• Advanced ICT capabilities enabling greater access to broadband communication networks, research instruments and facilities, sensor networks and data repositories

• Scientists can collaborate with colleagues and utilise scientific facilities across the globe

• Creating huge volumes of data to store, share, manipulate and re-use

The “data deluge”

• Between 2006 and 2010, data will increase sixfold: from 161 exabytes to 988 exabytes

• 1 exabyte = 1 billion gigabytes!

• “Big Science”: in one month, the Square Kilometre Array Pathfinder will generate more information than is currently contained in the world’s academic libraries

• “Small Science”: produces images, videos, spreadsheets, specimens, samples, survey results, databases
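
The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, that is, 1 exabyte = 10^18 bytes and 1 gigabyte = 10^9 bytes; the variable names are illustrative only.

```python
# Check of the slide's figures, assuming decimal (SI) units.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18

data_2006_eb = 161  # exabytes in 2006, as quoted on the slide
data_2010_eb = 988  # exabytes projected for 2010, as quoted on the slide

# Ratio of the projected 2010 volume to the 2006 volume.
growth_factor = data_2010_eb / data_2006_eb

# Number of gigabytes in one exabyte.
gigabytes_per_exabyte = BYTES_PER_EXABYTE // BYTES_PER_GIGABYTE

print(f"Growth 2006-2010: {growth_factor:.1f}x")
print(f"1 exabyte = {gigabytes_per_exabyte:,} gigabytes")
```

The ratio comes out slightly above six, which is consistent with the rounded "sixfold" figure.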

Australian National Data Service (ANDS)

Create a collection of services and support infrastructure that will allow researchers to manage and share data across organisational and disciplinary boundaries

• Influence national data management policy

• Inform best practice for the curation of data

• Create a cohesive collection of research resources

• Facilitate access to data

e-Science Information Management Strategy

e-SIM Vision:

“CSIRO scientists will be able to engage in computation and data intensive, collaborative research across disciplines, organisations and countries by gathering, analysing and sharing scientific information securely and efficiently, leading to greater scientific outcomes for Australia”

e-SIM benefits

• Broadcast and share research outputs in a secure and reliable way

• Long term preservation of data assets at a lower cost

• Discover and re-use existing data within CSIRO and externally

• Collaborate with science partners in a trusted environment

• Protect copyright and intellectual property

• Identify new research partners and datasets

Developing the e-SIM program

• Small multi-disciplinary team – IT, BA, Library, Project Manager

• Partner with researchers to build knowledge

• Develop a technology package

• Document policy issues

• Identify skill-sets

e-SIM pilot project

• Develop a pilot repository service built on Fedora, Muradora and GeoNetwork components

• Map researcher workflows to understand the relationship between research and data lifecycles

• Document metadata requirements for data description and discovery

• Valuable learning exercise, but …

Conducting a data survey

85 Theme Leaders interviewed by 35 Library Services staff

16 structured questions + anecdotal information covering:

• sources of data

• data ownership

• data re-use and sharing

• confidentiality, privacy and IP issues

• data types and formats

• data storage and volume

• retention and disposal practices

Pilot and survey findings

• Researchers want to manage their data more effectively

• Many researchers would be willing to share their data more broadly, but lack skills, time and infrastructure to do so

• Much data is obtained from third parties such as clients, collaborators, public agencies or colleagues

• Lack of clarity around copyright, licensing and reuse of data and relevant legislation such as the Archives Act

• Volume of data expected to grow significantly over the next 5 years

• A high proportion of researchers retain all data

Developing the Data Management Service

Four components of the Data Management Service:

1. A set of skilled information managers who can assist researchers to describe their data using metadata and provide them with support in using the systems and tools that underpin the service

2. A registry service via a web front end that will enable researchers to deposit and describe their data sets

3. An identity provider service (AAF) that will enable researchers to manage access to data held in repositories based on user profile

4. Repositories for the secure storage of managed data
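
As a rough illustration of how the registry, identity and repository components might relate, the sketch below models a registry entry that describes a data set, points to where it is stored, and restricts access by user profile. It is not the CSIRO design: every class, field and `repo://` location here is hypothetical, and the user profile simply stands in for attributes an identity provider could supply.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical user profile, standing in for identity-provider attributes."""
    username: str
    groups: set = field(default_factory=set)


@dataclass
class DatasetRecord:
    """Hypothetical registry entry: descriptive metadata plus an access rule."""
    title: str
    description: str
    repository_location: str  # where the managed data is stored (made-up scheme)
    allowed_groups: set = field(default_factory=set)  # empty set = open to all

    def can_access(self, user: UserProfile) -> bool:
        # Open records are visible to everyone; restricted records require
        # the user to belong to at least one allowed group.
        if not self.allowed_groups:
            return True
        return bool(self.allowed_groups & user.groups)


class Registry:
    """In-memory stand-in for a registry that researchers deposit records into."""

    def __init__(self) -> None:
        self._records = []

    def deposit(self, record: DatasetRecord) -> None:
        self._records.append(record)

    def discover(self, user: UserProfile) -> list:
        # Return the titles of the records this user is allowed to see.
        return [r.title for r in self._records if r.can_access(user)]


registry = Registry()
registry.deposit(DatasetRecord("Soil survey", "Field samples", "repo://soil/1"))
registry.deposit(DatasetRecord("Client trial", "Confidential results",
                               "repo://trial/7", allowed_groups={"trial-team"}))

visitor = UserProfile("visitor")
member = UserProfile("member", groups={"trial-team"})
print(registry.discover(visitor))  # ['Soil survey']
print(registry.discover(member))   # ['Soil survey', 'Client trial']
```

Keeping the access rule on the record rather than in the repository is only one possible choice; it mirrors the idea that researchers manage access to their own data.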

The Information Design Service

Outreach and support

• promoting the Data Management Service to researchers

• training and supporting researchers to use data management tools and processes

• identifying candidate data collections

• assisting researchers to identify and search relevant data repositories

The Information Design Service

Metadata support

• profiling and translation of metadata

• defining CSIRO metadata requirements

• metadata enhancement and quality assurance

• assisting with bulk import and migration of metadata

The Information Design Service

Data curation and lifecycle management

• retention and disposal guidelines

• format obsolescence and migration

• rights management, incorporating embargo periods, licence restrictions and copyright

The Information Design Service

Information architecture and technical liaison

• internal organisation of repository content including collections, folders and file-naming

• external presentation of repository content

• interface and functionality of ingest and search tools

Challenges moving forward

• Standards fragmented or non-existent

• Data is complex – no title page or contents pages!

• Rapidly evolving area with many stakeholders

• Managing research inputs and outputs

• Open access, copyright, IP management issues

• Bridging the language gap

What we bring to data management

Skills and experience in:

• Describing resources

• Searching and retrieving resources

• Working in a standards based environment

• Managing databases and content in different formats

• Promoting and supporting information tools and services

But … we bring those skills to an environment that is more fragmented, more complex, more dynamic and less mature than we are used to

Summing up

“Researchers need to be obliged to document and manage their data with as much professionalism as they devote to their experiments. And they should receive greater support in this endeavour than they are afforded at present.”

    - Editorial, Nature, 455, 1 (4 September 2008)

“A repository is not a piece of software and a system, but a set of services”

    - Palmer, C. et al. (2008)

Contact Us

Phone: 1300 363 400 or +61 3 9545 2176

Email: [email protected]

Web: www.csiro.au

Thank you

CSIRO Information Management and Technology

Gerry Ryder

e-Science Information Management Program

Phone: 08 8303 8687

Email: [email protected]

Web: www.csiro.au