Better Management Reduces Data Loss Risk

  • 8/12/2019 Better Management Reduces Data Loss Risk

    Better management reduces data loss risk

    The impact of data loss can be staggering for research. Nathan Westgarth argues the case for better data management.

    The loss of scientific data can have a devastating impact on careers. After moving all of his data home to write up, biologist Billy Hinchen returned one afternoon to find that his laptop and all his backup hard drives had been stolen. All that remained was a disparate collection of data, spread around numerous small flash drives, email attachments and scribbled drawings that were difficult to piece together once the main bulk of information had been lost.

    The knock-on effect was disastrous. As Billy put it: "I was focussed on creating high-resolution, 3D time-lapse videos of developing crustacean embryos, so all of my work was digital-based. When I lost my laptop and backups, I lost 400GB of data and close to four years of work. As a direct result of this I ended up getting an MPhil rather than the PhD I'd been working towards. I was hoping to have an illustrious career in science and for a time it seemed like everything would be stopped in its tracks."

    The importance of data management

    While this is an extreme case of data loss, it does highlight how important it is to consider how scientific data is managed. From the surveys and interviews we've held with the academic community, we've heard a common theme: researchers seem to have difficulty managing and accessing their data. Furthermore, it appears to be an ongoing problem for research scientists at any stage of their careers.

    Former PhD student and subsequent founder of the figshare platform, Mark Hahnel, typified a common challenge: "During my PhD I was never good at managing my research data. I had so many different file names for my data that I always struggled to find the correct file quickly and easily when it was requested. My former PI was so horrified upon seeing the state of my data organisation that she held an emergency lab book meeting with the rest of my group when I was leaving."


    Research data management is becoming one of the most pressing issues facing the scientific community, not just for university management teams, but also for every individual researcher. Our investigations have revealed a concerning picture of the effect that poor data management is having on the quality and reliability of scientific outputs.

    More data, more complexity

    The amount of research data being generated is currently increasing by 30 per cent annually (Why manage research data? In G. Pryor (Ed.), Managing research data (pp. 1-16), Facet Publishing). This data is not being effectively managed, stored, and made easily accessible. One study found that the odds of sourcing datasets decline by 17 per cent each year and that a huge 80 per cent of scientific data is lost within two decades (The availability of research data declines rapidly with article age, Current Biology 24(1): 94-97).

    The information that remains is often poorly reported. In a second review, researchers found that 54 per cent of the resources used to perform experiments across 238 published studies could not be identified, making verification impossible (On the reproducibility of science: unique identification of research resources in the biomedical literature, PeerJ 1:e148). This means that much of the $1.5 trillion per year estimated total global spend on research and development is wasted (2013 Global R & D Funding Forecast, Advantage Business Media).
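
    As a rough illustration of how the annual rates quoted above compound, the Python sketch below computes the doubling time implied by 30 per cent yearly growth and the fraction of the starting odds of sourcing a dataset that remains after 20 years of a 17 per cent yearly decline. The function names are hypothetical, and the calculation assumes both rates stay constant from year to year, which the cited sources do not necessarily claim. It works in odds rather than probabilities, so it is not a derivation of the 80 per cent figure.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def remaining_odds_fraction(annual_decline: float, years: int) -> float:
    """Fraction of the starting odds left after a constant yearly decline."""
    return (1 - annual_decline) ** years


if __name__ == "__main__":
    # Assumed inputs: the 30% growth and 17% decline figures quoted in the text.
    print(f"Doubling time at 30% growth: {doubling_time(0.30):.1f} years")
    print(
        "Odds after 20 years at 17% decline: "
        f"{remaining_odds_fraction(0.17, 20):.1%} of the starting value"
    )
```

    Under those assumptions, the volume of data doubles roughly every 2.6 years, while the odds of sourcing a given dataset fall to about 2.4 per cent of their starting value after two decades.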

    As well as the financial consequences, poor data management can have a significant impact on time and other resources. For example, since the year 2000, over 80,000 patients have taken part in clinical trials based on research that was later retracted because of error or fraud (Problems with scientific research: How science goes wrong, The Economist). Meanwhile, the number of peer-reviewed paper retractions due to error has grown over fivefold since 1990 (Why has the number of scientific retractions increased?, PLOS ONE 8(7): e68397). At best, that's a lot of wasted time and effort, but, at worst, drug discovery is halted and careers are severely affected.

    Given the above, it is perhaps unsurprising that as many as 34 developed countries have signed up to the Declaration on Access to Research Data from Public Funding. In addition, key funding bodies such as the NIH, MRC and Wellcome Trust now request that data-management plans be part of applications.

    http://www.facetpublishing.co.uk/title.php?id=7562
    http://dx.doi.org/10.1016/j.cub.2013.11.014
    http://dx.doi.org/10.7717/peerj.148
    http://bit.ly/1fGZeR5
    http://econ.st/1clsPtI
    http://dx.doi.org/10.1371/journal.pone.0068397
    http://1.usa.gov/1cNNbNO

    Looking after your data

    The time has come to start better protecting our scientific data. The starting point is to make the capturing of research data more efficient through the better use of technology.

    There are a host of generic tools available that can be used to fit into existing research workflows. Some such tools proving popular are Evernote, cloud storage services like Google Drive and Dropbox, and code hosting sites like GitHub. While many offer a range of benefits, these tools haven't been designed with the scientific community in mind.

    For this reason, tools specifically for academics are starting to be developed to suit their needs. For example, Digital Science's figshare tool is a cloud-based repository where researchers can store their data outputs privately, share them with colleagues, or make them publicly available and citable with a permanent DOI. figshare is increasingly being used by institutions to host and manage all file types of research data, securely in the cloud. Institutions can also use it to promote collaboration internally and facilitate backup and organisation, without having to share data with the wider world until researchers are ready to publish.

    Digital Science has also recently developed Projects, an application that lets researchers safely manage and organise their research data on the desktop. It provides a visual timeline to make finding files easy, backup functionality to help seamlessly recover previous versions of files, annotation features and a structured hierarchy to encourage users to organise their files.

    In the future, we hope to see data management taken more seriously by everyone involved in making science happen, from individual researchers through to institutions and governments. While all are clearly dedicated to improving human existence through exploration and discovery, more energy must be put into safeguarding research data for the future benefit of science.

    Nathan Westgarth is a product manager for research tools at Digital Science. He manages Projects, which aims to help scientific researchers organise their data in a safe, simple and structured way.

    http://figshare.com/
    http://www.projects.ac/