a centre of expertise in data curation and preservation
2nd International Digital Curation Conference, November 2006

Reflections on open scholarship: process, product and people
Dr Liz Lyon, DCC Associate Director (Outreach); Director, UKOLN, University of Bath, UK
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0

Slide 2
Three themes
How? Unpacking the title: open scholarship
What? Creating and using science-ready archives
Who? Digital natives as data scientists

Slide 3
Publicly available? Shared? Inclusive? Collaborative? Participative? Non-proprietary?
What do we mean by open?

Slide 4
Scholarship today?
Open Access

Slide 5
Data-centric 2020 vision
Data-driven science

Slide 6
Reference datasets as infrastructure

Slide 7
Research into neglected tropical diseases
http://www.thesynapticleap.org/
Open source science

Slide 8
http://openwetware.org/wiki/Main_Page

Slide 9

Slide 10

Slide 11
Synthetic biology: materials for (bio) mash-ups?
Interesting IPR issues..

Slide 12
Bioblog
Blogs, blogs and meta-blogs.

Slide 13
The Tool Box?
http://www.flickr.com/photos/64696485@N00/13146762/

Slide 14
The Peer Review Process?

Slide 15
The Scientific Paper?
http://www.ch.ic.ac.uk/wiki2/index.php/Mauveine

Slide 16
Crystal Structure reports - data-rich scientific articles
  3-d positional coordinates
  Atomic motions
  Molecular geometry
  Chemical bonding
  Crystal packing
  Chemical behaviour arising from structure
Two dedicated IUCr journals: Acta Cryst. C, E
Important part of scientific discussion in many other titles: Acta Cryst. B, D, F
Original slide: Brian McMahon, IUCr
Validation of data through publication

Slide 17
Data-centric scholarly publications
Raw, primary, derived data integrated with interpretations
Mandatory submission of data with text

Slide 18
The database publication?

Slide 19
http://declanbutler.info/blog/?p=58#more-58
The mash-up
Data from FAO, WHO + Google Earth

Slide 20
Pause for thought..
Big science communities
  Grid-enabled applications
  Large managed open data archives
  Funder policy driver
Small(er) science communities
  Collaborative and social software
  Evolving open wikis and blogs
  Grassroots driver
Curation and preservation issues
  Burgeoning wiki and blog content
  Web archiving
  Positioning of repositories???

Slide 21
Big science
  Funder-mandated sharing?
  Top down
Small science
  Community culture
  Discipline? Institution?
  Bottom up
science-ready archives

Slide 22
Laboratory protocols: common practice
Instrumentation: proprietary software
Standard specifications and formats
Data capture

Slide 23
Working towards standard specifications in the lab
  Open Microscopy Environment (OME)
  Medical imaging: DICOM
  Flow cytometry standard (FCS)
  Mass Spectrometry Standards Working Group: mzData vs mzXML
  Laboratory management data systems in development

Slide 24
RepoMMan: Repository Metadata and Management (Univ Hull) using WS-BPEL
Workflow: m2m? e-Scientist desktop?
Slide: Carole Goble

Slide 25
Silchester: A VRE for Archaeology

Slide 26
Harmonisation and normalisation
Standard Deposit API (GNU EPrints, DSpace, Fedora)
Dublin Core Application Profile for ePrints (+ Eduserv)
  Requirements: richer metadata set, support for value-added services, version identification, appropriate copy (OA), citations
  Based on FRBR
  Data model for scholarly works
  Application profile includes simple and qualified DC properties

Slide 27
The ePrints application profile
simple DC properties (the usual suspects)
  identifier, title, abstract, subject, creator, publisher, type, language, format
qualified DC properties
  access rights, licence, date available, bibliographic citation, references, date modified
new properties
  grant number, affiliation institution, status, version, copyright holder
properties from other schemes
  funder, supervisor, editor (MARC relators)
  name, family name, given name, workplace homepage, mailbox, homepage (FOAF)
clearer use of existing relationships
  has version, is part of
new relationship properties
  has adaptation, has translation, is expressed as, is manifested as, is available as
vocabularies
  access rights, entity type, resource type and status
Slide: Julie Allinson, UKOLN; Andy Powell, Eduserv

Slide 28
Use DC Application Profile for ePrints?
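
Illustrative sketch (not part of the original slides): one way to picture the property groups listed for the ePrints application profile is as a small record type that only accepts property names from the profile. The property names come from the slide; the class EprintRecord, the helper group_of, the grouping labels and the sample values are hypothetical, and the strings are not the profile's official term identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Property names copied from the slide text. Grouping them into tuples,
# and every Python identifier below, is an assumption made for illustration.
SIMPLE_DC = ("identifier", "title", "abstract", "subject", "creator",
             "publisher", "type", "language", "format")
QUALIFIED_DC = ("access rights", "licence", "date available",
                "bibliographic citation", "references", "date modified")
NEW_PROPERTIES = ("grant number", "affiliation institution", "status",
                  "version", "copyright holder")

ALLOWED = frozenset(SIMPLE_DC + QUALIFIED_DC + NEW_PROPERTIES)


def group_of(name: str) -> str:
    """Return which group of the slide a property name belongs to."""
    if name in SIMPLE_DC:
        return "simple DC"
    if name in QUALIFIED_DC:
        return "qualified DC"
    if name in NEW_PROPERTIES:
        return "new"
    raise KeyError(f"not a property in this sketch: {name}")


@dataclass
class EprintRecord:
    """Hypothetical container for one eprint description."""
    values: Dict[str, str] = field(default_factory=dict)

    def set_property(self, name: str, value: str) -> None:
        # Reject names that are not listed on the slide.
        if name not in ALLOWED:
            raise KeyError(f"not a property in this sketch: {name}")
        self.values[name] = value

    def missing(self, required: List[str]) -> List[str]:
        # Report which of the required properties have no value yet.
        return [name for name in required if name not in self.values]


if __name__ == "__main__":
    record = EprintRecord()
    record.set_property("title", "Reflections on open scholarship")
    record.set_property("creator", "Lyon, Liz")
    print(group_of("licence"))
    print(record.missing(["title", "creator", "version"]))
```

The relationship properties (has version, is expressed as, and so on) and the MARC relator and FOAF properties are left out to keep the sketch short; they would link separate records rather than hold plain strings.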
Slide 29
Data description and discovery
Validation, publication & discovery of data models & schema
  eBank Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
Harmonisation and normalisation of metadata and semantics
DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
Rights & Citation policy http://ecrystals.chem.soton.ac.uk/rights.html
Crystallography: a community working together

Slide 30
eCrystals Global Federation Model (diagram, 23/10/2006)
Diagram labels: Aggregator services; Institutional data repositories; Deposit, Validation; Publication; Validation; Data analysis, transformation, mining, modelling; Search, harvest; Presentation services / portals; Data discovery, linking, citation; Laboratory repository; Deposit; Publishers: peer-review journals, conference proceedings, etc; Curation; Preservation; Subject Repository; Institution; Library & Information Services; Data creation & capture in Smart lab

Slide 31
Data deposit & sharing: roles and responsibilities
  Funder
  Institution
  Faculty
  Individual
Noor et al PLoS Biol 4(7) 2006

Slide 32
eBank Project exemplar
Adding value: aggregating & linking data + interpretations

Slide 33
Repository wow-factor or adding value through user interface tools

Slide 34
Facilitating use and re-use: text mining tools
Adding value

Slide 35
Second pause for thought
  We need to work with instrument suppliers
  We need to understand more about workflow
  We need to develop new ways of adding value to datasets through innovative user tools and services
  We need more evidence of how data is used and re-used (or not)

Slide 36
Getting the skills mix
Communities, teams, individuals
International Virtual Observatory Alliance
  Global community
  Virtual organisation
Multi-disciplinary team approach
  eBank Project exemplar: computer scientists, domain scientists (chemists), digital library experts
Lessons learnt: e-Science Human Factors Audit Report 2006, Roy Kalawsky, Loughborough
NSF Report 2005: Long-lived digital data collections
Data scientist

Slide 37
Wanted! data scientist?

Slide 38
Digital natives as data scientists?
eBank Project: assessing role of research data in u/g Chemical Informatics and MChem courses at Univ. of Southampton
Pedagogic evaluation by Grainne Conole
Report imminent.

Slide 39
"Well basically I've done nothing like it before, so it's the first time I've sort of delved into computing or computational chemistry; quite nice, quite enjoyed starting off with just like a string of data and pop it into say a database, just a flat string of numbers basically, and then come out with a crystal structure, which is exactly what it should represent, which is quite cool."
"There were several parts to the course. We started off with how to get 2D and 3D representations of molecules onto a computer using a one-dimensional format, a SMILE string, so just ways of like getting data into a format so that it can be easily shared between different computers or different people without having to change lots of things."
Source: Grainne Conole

Slide 40
New skills requirements: interdisciplinary quantitative data curation
Integrate within the curriculum
Wingreen & Botstein Mol Cell Biol 7, 2006

Slide 41
Final pause for thought
  Various approaches to develop and obtain digital curation skills
  Skills are there but often in discrete communities: we need to bring communities together (like at this conference)
  Integration within the curriculum: undergraduate students, library & information science, archival studies, computer science
  Provide recognition and a career path for emerging data scientists

Slide 42
Take home messages
  Scholarship is changing fast
  Big science and open source science both create significant digital curation challenges
  Science-ready archives are the goal
  Native data scientists are coming
  The culture will change too.

Slide 43
Thank you.