a centre of expertise in data curation and preservation
2nd International Digital Curation Conference, November 2006
Reflections on open scholarship: process, product and people
This work is licensed under a Creative Commons LicenceAttribution-ShareAlike 2.0
Funded by:
Dr Liz Lyon,
DCC Associate Director (Outreach); Director, UKOLN, University of Bath, UK
Three themes
• How? – Unpacking the title: open scholarship
• What? – Creating and using science-ready archives
• Who? – “Digital natives” as data scientists
What do we mean by “open”?
• Publicly available?
• Shared?
• Inclusive?
• Collaborative?
• Participative?
• Non-proprietary?
Scholarship today? “Open Access”
Data-centric 2020 vision
Data-driven science
Reference datasets as infrastructure
Research into neglected tropical diseases http://www.thesynapticleap.org/
“Open source science”
http://openwetware.org/wiki/Main_Page
Synthetic biology: materials for (bio) mash-ups? Interesting IPR issues…
Bioblog
Blogs, blogs and meta-blogs….
The Tool Box?
http://www.flickr.com/photos/64696485@N00/13146762/
The Peer Review Process?
The Scientific Paper?
http://www.ch.ic.ac.uk/wiki2/index.php/Mauveine
Crystal structure reports: data-rich scientific articles
• 3-D positional coordinates
• Atomic motions
• Molecular geometry
• Chemical bonding
• Crystal packing
• Chemical behaviour arising from structure
• Two dedicated IUCr journals: Acta Cryst. C, E
• Important part of scientific discussion in many other titles: Acta Cryst. B, D, F
Original slide: Brian McMahon, IUCr
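Because a crystal structure report carries the positional coordinates and the unit cell, a reader can recompute derived geometry from the published data. As a minimal sketch (not IUCr software; the function names and example cells are hypothetical), the distance between two atoms at fractional coordinates follows from the real-space metric tensor G built from the cell lengths a, b, c and angles α, β, γ, as d² = ΔxᵀGΔx.

```python
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Real-space metric tensor G for cell lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def distance(frac1, frac2, cell):
    """Interatomic distance (Å) between two fractional-coordinate triples.

    `cell` is a hypothetical (a, b, c, alpha, beta, gamma) tuple.
    """
    G = metric_tensor(*cell)
    d = [x2 - x1 for x1, x2 in zip(frac1, frac2)]
    # d^2 = d^T G d, summed explicitly over both indices
    sq = sum(d[i] * G[i][j] * d[j] for i in range(3) for j in range(3))
    return math.sqrt(sq)


# Example: two atoms 0.1 of a cell edge apart in a cubic 10 Å cell
print(distance((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)))
```

The metric-tensor form is used because it works for any cell shape; for an orthogonal cell it reduces to the familiar Cartesian distance.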
Validation of data through publication
• Data-centric scholarly “publications”
• Raw, primary, derived data integrated with interpretations
• Mandatory submission of data with text
The database publication?
http://declanbutler.info/blog/?p=58#more-58
The “mash-up”
Data from FAO, WHO + Google Earth
Pause for thought…
• Big science communities
– Grid-enabled applications
– Large managed open data archives
– Funder policy driver
• Small(er) science communities
– Collaborative and social software
– Evolving open wikis and blogs
– Grassroots driver
• Curation and preservation issues
– Burgeoning wiki and blog content
– Web archiving
• Positioning of repositories???
Big science
Funder-mandated sharing?
Top down
Small science
Community culture: Discipline? Institution?
Bottom up
“science-ready archives”
• Laboratory protocols: common practice
• Instrumentation: proprietary software
• Standard specifications and formats
Data capture
• Working towards standard specifications in the lab
– Open Microscopy Environment (OME)
– Medical imaging: DICOM
– Flow cytometry standard (FCS)
– Mass Spectrometry Standards Working Group: mzData vs mzXML
• Laboratory management data systems in development
DAME signal processing workflows using Grid Services
[Workflow diagram with swimlanes for Airport, Rolls-Royce and DS&S. Roles: Maintenance Engineer, Maintenance Analyst (Fleet Manager), Domain Expert. Activities include aircraft lands, visual inspection, brief and detailed diagnosis/prognosis, detailed analysis, requests for information and further details, analyst and expert decisions, sign-off of the diagnosis, maintenance procedure and engine release, with outcomes “fault resolved” and “fault unresolved”.]
RepoMMan: Repository Metadata and Management (Univ Hull) using WS-BPEL
Workflow: m2m?
e-Scientist desktop?
Slide: Carole Goble
Silchester: A VRE for Archaeology
Harmonisation and normalisation
• Standard Deposit API (GNU EPrints, DSpace, Fedora)
• Dublin Core Application Profile for ePrints (+ Eduserv)
• Requirements: richer metadata set, support for value-added services, version identification, appropriate copy (OA), citations
• Based on FRBR
• Data model for scholarly works
• Application profile includes simple and qualified DC properties
The ePrints application profile
• simple DC properties (the usual suspects…)
– identifier, title, abstract, subject, creator, publisher, type, language, format
• qualified DC properties
– access rights, licence, date available, bibliographic citation, references, date modified
• new properties
– grant number, affiliation institution, status, version, copyright holder
• properties from other schemes
– funder, supervisor, editor (MARC relators)
– name, family name, given name, workplace homepage, mailbox, homepage (FOAF)
• clearer use of existing relationships
– has version, is part of
• new relationship properties
– has adaptation, has translation, is expressed as, is manifested as, is available as
• vocabularies
– access rights, entity type, resource type and status
Slide: Julie Allinson, UKOLN, Andy Powell, Eduserv
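As a rough illustration of the simple DC layer only, the sketch below serialises a flat Dublin Core record with Python's standard `xml.etree.ElementTree`. It is not the ePrints Application Profile itself, which rests on an FRBR-based entity model and adds qualified and new relationship properties; the wrapper element `record`, the helper `simple_dc_record` and the sample values are hypothetical. Note that “abstract” is a qualified DC term (a refinement of `description`), so a simple-DC-only record carries it as `dc:description`.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1 ("simple DC")
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen DC 1.1 element names; anything else is rejected
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def simple_dc_record(fields):
    """Build a flat simple-DC record from (element, value) pairs.

    Repeated elements (e.g. several creators) are allowed, as in DC.
    The unqualified <record> wrapper is a hypothetical container.
    """
    root = ET.Element("record")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Usage with hypothetical values
record = simple_dc_record([
    ("identifier", "http://example.org/eprint/1"),
    ("title", "An example eprint"),
    ("creator", "Example, A."),
    ("description", "Abstract text goes here."),
    ("type", "Text"),
    ("language", "en"),
])
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown names keeps the record within simple DC; qualified properties such as access rights or date available would need the DCMI terms namespace instead.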
Use DC Application Profile for ePrints?
Data description and discovery
• Validation, publication & discovery of data models & schema
• eBank Application Profile
http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• Harmonisation and normalisation of metadata and semantics
• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation policy http://ecrystals.chem.soton.ac.uk/rights.html
• Crystallography: a community working together
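The DOI above is the persistent identifier that makes a dataset citable. A minimal sketch of turning a DOI string into a resolver URL follows; the helper name `doi_to_url` is hypothetical, the check only covers the basic `prefix/suffix` shape with a `10.` prefix (it is not a full DOI syntax validator), and the default resolver base is the `dx.doi.org` form shown on the slide, for which `https://doi.org/` is the current preferred equivalent.

```python
from urllib.parse import quote

# Resolver base as shown on the slide; https://doi.org/ is the modern form
RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str, resolver: str = RESOLVER) -> str:
    """Turn a DOI such as '10.1594/...' into a resolvable URL."""
    doi = doi.strip()
    prefix, sep, suffix = doi.partition("/")
    # A DOI prefix begins with the directory indicator "10."
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode URL-unsafe characters in the suffix, keeping '/'
    return resolver + prefix + "/" + quote(suffix, safe="/")


print(doi_to_url("10.1594/ecrystals.chem.soton.ac.uk/145"))
```

Keeping the resolver as a parameter means the same stored DOI can be resolved through either base URL without rewriting the identifier itself.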
eCrystals ‘Global Federation’ Model 23/10/2006
[Diagram components: data creation & capture in “Smart lab”; laboratory repository; institutional data repositories; subject repository; Institution Library & Information Services; publishers (peer-review journals, conference proceedings, etc.); aggregator services; presentation services / portals. Functions shown: deposit; validation; publication; curation; preservation; search, harvest; data discovery, linking, citation; data analysis, transformation, mining, modelling.]
Data deposit & sharing: roles and responsibilities
• Funder
• Institution
• Faculty
• Individual
Noor et al., PLoS Biol 4(7), 2006
eBank Project exemplar
Adding value: aggregating & linking data + interpretations
“Repository wow-factor”…
…or adding value through user interface tools…
Facilitating use and re-use: text mining tools
• Adding value
Second pause for thought…
• We need to work with instrument suppliers
• We need to understand more about workflow
• We need to develop new ways of adding value to datasets through innovative user tools and services
• We need more evidence of how data is used and re-used (or not…)
Getting the skills mix
• Communities, teams, individuals
• International Virtual Observatory Alliance
– Global community
– Virtual organisation
• Multi-disciplinary team approach
– eBank Project exemplar: computer scientists, domain scientists (chemists), digital library experts
– Lessons learnt: e-Science Human Factors Audit Report 2006, Roy Kalawsky, Loughborough
• NSF Report 2005: Long-lived digital data collections
– “Data scientist”
Wanted! data scientist
Digital natives as data scientists?
• eBank Project: assessing the role of research data in undergraduate Chemical Informatics and MChem courses at the University of Southampton
• Pedagogic evaluation by Grainne Conole
• Report imminent…
“Well basically I’ve done nothing like it before, so it’s the first time I’ve sort of delved into computing or computational chemistry … quite nice, quite enjoyed starting off with just like a string of data and pop it into say a database, just a flat string of numbers basically and then come out with a crystal structure, which is exactly what it should represent which is quite cool”
“There were several parts to the course – We started off with how to get 2D and 3D representations of molecules onto a computer using a one-dimensional format, a SMILE string …so just ways of like getting data into a format so that it can be easily shared between different computers or different people without having to change lots of things”
Source: Grainne Conole
New skills requirements:
• interdisciplinary
• quantitative
• data curation
Integrate within the curriculum
Wingreen & Botstein, Nat Rev Mol Cell Biol 7, 2006
Final pause for thought…
• Various approaches to develop and obtain digital curation skills
• Skills are there but often in discrete communities: we need to bring communities together (like at this conference…)
• Integration within the curriculum: undergraduate students, library & information science, archival studies, computer science
• Provide recognition and a career path for emerging data scientists
Take home messages
• Scholarship is changing fast
• Big science and open source science both create significant digital curation challenges
• Science-ready archives are the goal
• Native data scientists are coming
• The culture will change too…
Thank you….