RESEARCH Open Access

Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group

Tom Moritz 1*, S Krishnan 2, Dave Roberts 3, Peter Ingwersen 4,5, Donat Agosti 6, Lyubomir Penev 7, Matthew Cockerill 8, Vishwas Chavan 9

Abstract

Background: Data are the evidentiary basis for scientific hypotheses, analyses and publication, for policy formation and for decision-making. They are essential to the evaluation and testing of results by peer scientists, both present and future. There is broad consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way, and thus standards for 'free' and 'open' access to data have become well developed in recent years. The question of effective access to data remains highly problematic.

Discussion: Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis, evaluation - and if feasible - on the re-generation of data on which conclusions are based. It is not coincidental that in the recent 'climategate' controversies, the quality and integrity of data and their analytical treatment were central to the debate. There is recent evidence that even when scientific data are requested for evaluation they may not be available. The history of dissemination of scientific results has been marked by paradigm shifts driven by the emergence of new technologies. In recent decades, the advance of computer-based technology linked to global communications networks has created the potential for broader and more consistent dissemination of scientific information and data. Yet, in this digital era, scientists and conservationists, organizations and institutions have often been slow to make data available. Community studies suggest that the withholding of data can be attributed to a lack of awareness, to a lack of technical capacity, to concerns that data should be withheld for reasons of perceived personal or organizational self-interest, or to a lack of adequate mechanisms for attribution.

Conclusions: There is a clear need for institutionalization of a 'data publishing framework' that can address sociocultural, technical-infrastructural, policy, political and legal constraints, as well as addressing issues of sustainability and financial support. To address these aspects of a data publishing framework - a systematic, standard approach to the formal definition and public disclosure of data - in the context of biodiversity data, the Global Biodiversity Information Facility (GBIF, the single inter-governmental body most clearly mandated to undertake such an effort) convened a Data Publishing Framework Task Group. We conceive this data publishing framework as an environment conducive to ensuring free and open access to the world's biodiversity data. Here, we present the recommendations of that Task Group, which are intended to encourage free and open access to the world's biodiversity data.

* Correspondence: [email protected]
1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA
Full list of author information is available at the end of the article

Moritz et al. BMC Bioinformatics 2011, 12(Suppl 15):S1
http://www.biomedcentral.com/1471-2105/12/S15/S1

© 2011 Moritz et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Data usage and definitions

The term 'data' [1,2] has two primary uses. One, specific to the information technology community, refers to any machine-readable code that allows information to be read by, stored in, accessed by or shared by computers. For example, the United States National Science Foundation 'DataNet' program defines data as: "Any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc." [3]. Under this definition, theoretically, everything can be 'digitized' and become 'data'. This 'bits and bytes' definition of data challenges us to ask what cannot - if digitally captured - be considered 'data'.

A second usage refers to data in a more precise, epistemic way as: "Precise, well-defined representations of observations, descriptions or measurements of a referent (object, phenomena or event) recorded in some standard, well-specified way" [4].

In this report we use this latter definition, although stipulating that such data may be technically formatted as text (descriptions), as maps, as visual images or audio recordings, as signals, as symbols or as numbers. This clarity is essential because, in the context of biodiversity conservation in general and a biodiversity data publishing framework in particular, data have a foundational place in the wisdom/knowledge hierarchy [5,6]. Information, knowledge and wisdom are synthesized from factual data, which are thus the basis for informed policies, decision-making and sustainable use of biotic resources. We urge that careful attention be consistently paid to which usage of the term 'data' is intended. Of elemental importance is that, to be useful, descriptions of data and of their provenance, lineage [7] and structure, normally collected as 'metadata', must exist.

The volume of data

We are experiencing a tremendous increase in data generated by a variety of research processes. For example, a recent article reviewing the growth of data in the International Nucleotide Sequence Database Collaboration (INSDC) notes: "the INSDC databases have grown to contain over 95 billion base pairs, reflecting an exponential growth rate in which the amount of stored data has doubled every 18 months" [8].

This increase has major implications for data management, data processing, data archiving and data accessibility. The potential for a tremendous signal-to-noise problem - challenging data users to effectively select relevant, high-quality data from a rapidly expanding corpus of data - suggests the urgent need to implement well designed and deployed data management strategies extensively and consistently. These strategies must carefully evaluate the relative returns on investment for incremental investments in data creation and collection [9]. It seems possible that only certain selected 'canonical' datasets, of primary importance in guiding policy or in informing key decisions, will be managed in full accordance with optimal recommendations. Determination of which datasets merit this level of optimal management seems best left to community mechanisms. However, recent challenges to the Intergovernmental Panel on Climate Change (referred to as 'climategate') make clear the importance of exhaustive documentation for datasets on which policies with major global, national and even local economic consequences are based [10]. In general, each researcher is responsible for the quality and integrity of their data: by direct release of data, or by publication based on data, they are implicitly warranting that best professional practices have been followed in definition, creation and management of such data.
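The doubling behaviour quoted for the INSDC databases can be made concrete with a short calculation. The sketch below is illustrative only: the function name `projected_size` and the idea of projecting forward from the quoted figure are assumptions made here, not part of the cited analysis. Only the 18-month doubling time and the figure of 95 billion base pairs (as printed in the quotation) come from the text.

```python
def projected_size(current_size: float, months_ahead: float,
                   doubling_months: float = 18.0) -> float:
    """Project a corpus size forward, assuming steady exponential doubling."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    # Size doubles once per doubling period: N(t) = N0 * 2 ** (t / T).
    return current_size * 2 ** (months_ahead / doubling_months)


# Figure as printed in the quotation: over 95 billion base pairs.
base_pairs_now = 95e9

for years in (1.5, 3, 6):
    size = projected_size(base_pairs_now, years * 12)
    print(f"after {years} years: {size:.3g} base pairs")
```

Under this assumption a corpus grows sixteen-fold in six years, which is why selection and documentation of data cannot be deferred until after collection.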

Collections of data: databases, datasets and data tables

In colloquial scientific usage, collections of data are variously referred to as 'databases', 'datasets' and 'data tables', or merely as 'data'. In an effort to standardize usage for such collections, a recent publication [11] by several members of the Task Group has proposed a series of possible working definitions:

"'Data tables' represent precisely the set or sets of data upon which the analyses and conclusions of a given scientific paper are based. A data table is thus a discrete, fixed, time-bounded collection serving as a referent.
'Datasets' represent discrete collections of data underlying a scientific paper. Datasets are thus also fixed and time-bound, though functioning in a more general way as a referent.
'Databases' represent larger, dynamic and more extensively coherent collections of data. By this definition, databases are not fixed or time-bounded but have properties of quality control and integrity and should provide the capacity for version control and version retrospection."

In the context of this article, we propose a clear distinction between fixed data tables, which represent precisely the set or sets of data on which the specific analysis and conclusions of a scientific paper are based; datasets, understood as fixed and time-bound logical files presenting a collection of facts (observations, descriptions or measurements) formally structured into standard records; and dynamic databases, representing larger and more extensive collections of data that may or may not include the precise data tables or more general datasets that are the referent(s) for a given scientific paper. Each data record is structured in fields, with specifications for appropriate field content. In the context of this article, 'primary biodiversity data' is defined as digital text or multimedia data records providing facts about the instance of an organism: the what, where, when, how and by whom of the occurrence and the recording [12]. By this definition, data tables and datasets are inextricably linked to scientific papers, and the publisher must assure consistent and secure access in perpetuity to referent data tables and datasets [11]. Thus these collections of data impose the heaviest burden of responsibility on the publisher for sustained access.

With respect to the publishing of data, the customary practices of science suggest that data providing the evidence for conclusions drawn in a scientific paper or report should be available for review, evaluation and testing. This provision is fundamental to the objective practice of science as 'organized skepticism' [13]. Appropriate standards for testing data vary depending on the exact nature of the data. For example, in situ field data are evaluated by consideration of the field context; the methodology or apparatus used to collect data; the consistency or inconsistency with other comparable studies; the quality and detail of the reported observations, photographs or audio recordings; and material evidence (specimen, genetic sample, scat, tracks and so on). The actual practicability of testing and assessing data is highly dependent on the thoroughness with which data are described and how completely the context for data collection is described. This leads logically to the question of metadata as a source of necessary contextual information about data.
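One way to picture a fixed, time-bounded data table serving as the referent of a paper is to compute a content digest when the table is published, so that any later copy can be checked against it. The sketch below is a minimal illustration under stated assumptions: the `OccurrenceRecord` field names simply mirror the 'what, where, when, how and by whom' wording above, the sample records are invented, and the use of a SHA-256 digest is a choice made here rather than a practice described by the Task Group.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OccurrenceRecord:
    """One primary biodiversity data record (field names are illustrative)."""
    what: str      # organism observed, e.g. a scientific name
    where: str     # locality description
    when: str      # date of the occurrence
    how: str       # collection or observation method
    by_whom: str   # collector or observer


def fingerprint(records: list[OccurrenceRecord]) -> str:
    """Return a SHA-256 digest identifying a fixed data table."""
    # Sorted keys make the digest depend only on record content and order.
    payload = json.dumps([asdict(r) for r in records], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Invented sample records standing in for the data table behind a paper.
table = [
    OccurrenceRecord("Arctostaphylos sp.", "hypothetical site A",
                     "2009-06-01", "field observation", "observer 1"),
    OccurrenceRecord("Formicidae", "hypothetical site B",
                     "2009-06-02", "pitfall trap", "observer 2"),
]
published_digest = fingerprint(table)

# A later copy of the table can be verified against the published digest.
print(fingerprint(list(table)) == published_digest)
```

A dynamic database, by contrast, would change its digest with every update, which is why the definitions above ask databases for version control and version retrospection instead.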

How data have meaning: metadata

'2607' and '059998' are each an actual datum or 'data point'. It is immediately obvious that, without any description of context for the creation and capture of data, an isolated datum is meaningless. Descriptive information is necessary to impart meaning. The former datum was recorded by Henry Cavendish in his "Experiments to Determine the Density of the Earth" (21 June 1798) and was published in the Philosophical Transactions of the Royal Society of London [14]. The Cavendish datum was a result of a humanly contrived experiment using a specially designed apparatus. The latter datum is a reading obtained from automated data loggers recording sap flow in Manzanita plants at the University of California James Reserve, Mt San Jacinto, California (4 December 2007, 1137) and was recorded by a data logger in an as yet unpublished Microsoft Excel spreadsheet (Gary Geller, 2010, personal communication).

However, in the simple contexts disclosed above, we have learned only that some agent conducted a data gathering exercise at a given date and time and at a described place. Inference of a probable general scientific domain or discipline for the data - for example, physics or ecology or botany - provides only a very general delimiter of the probable character of the data. We do not, for example, know the actual type of automated data logger used, its proper calibration, the actual details of its deployment in this instance of use, or the competence of the person using the data logger. Lacking this information, and other information that would serve to validate the quality of the data presented, we are challenged with the need to develop and to provide more complete descriptions to make data fit for use and, in particular, fit for testing and evaluation.
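The gap between what the simple context discloses and what a tester would need can be sketched as a small data structure. This is an illustration only: the class `DescribedDatum`, its field names and the helper `missing_context` are hypothetical, and the datum value is kept as the raw string printed above rather than interpreted as a number.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DescribedDatum:
    """A datum together with whatever descriptive context is known."""
    value: str                         # raw datum exactly as recorded
    agent: Optional[str] = None        # who or what recorded it
    place: Optional[str] = None        # where it was recorded
    recorded_at: Optional[str] = None  # when it was recorded
    instrument: Optional[str] = None   # type of apparatus or logger
    calibration: Optional[str] = None  # how the instrument was calibrated
    deployment: Optional[str] = None   # details of this instance of use


def missing_context(datum: DescribedDatum) -> list[str]:
    """List the descriptive fields that are still empty for this datum."""
    return [f.name for f in fields(datum)
            if f.name != "value" and getattr(datum, f.name) is None]


# The sap-flow reading, with only the context disclosed in the text.
sap_flow = DescribedDatum(
    value="059998",
    agent="automated data logger",
    place="University of California James Reserve, Mt San Jacinto, California",
    recorded_at="4 December 2007",
)
print(missing_context(sap_flow))
```

The fields reported as missing are the ones named in the paragraph above: the type of logger, its calibration and the details of its deployment.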

Provision of metadata

To avoid the risks of overly intricate and elaborate metadata standards that fail by requiring inordinate investments of time and resources, we suggest that metadata be initially designed to provide minimally adequate description for discovery and access to data. We propose that, in the interests of optimal efficiency of effort, careful efforts be made to apply inference and recursion in creation of such minimally adequate metadata, and that metadata subsequently be available for the continuing addition of fresh increments of metadata. This recommendation implies that metadata creation should be a continuous, collaborative process, not a single event. Specifically with respect to museum collections, we recommend that links to relevant type specimens be included as a part of the metadata record.

Moreover, we believe that by careful application of qualified social tagging - that is, of indexing by expert users applying well-formed, ontologically suitable vocabularies and authority files - substantial development and enrichment of metadata records can be accomplished (this recommendation requires applications that can support a dynamic, coherent and iterative development of metadata over time) [15].

We also suggest that assessment of the fitness of metadata for use be considered from the 'demand side', by asking how data have typically been used to best effect in the creation of biodiversity knowledge and policy.

There are many technical publications - for example, Voss and Emmons' 'Mammalian diversity in neotropical lowland rainforests: a preliminary assessment' [16], the US Fish and Wildlife Service's 'Statistical guide to data analysis of avian monitoring programs' [17], or Agosti et al.'s 'Ants: standard methods for measuring and monitoring biodiversity' [18] - that provide detailed descriptions of common data collection methods or of statistical processes applied to biodiversity data. Recently, the European Union Framework Programme 6 project EDIT (European Distributed Institute for Taxonomy) has developed a complete workflow from data collection in the field to assembly of datasets and analyses [19,20]. These and many other works provide guidance in the development of standard ontologies for data description.

We recommend a research process that - from an ontological perspective - systematically reviews, analyzes and specifies how data can most efficiently be supplied to fit the needs of these primary biodiversity-monitoring processes. We suggest detailed survey and analysis of the primary and standard forms of processing that, by community consensus, are of greatest proven value and impact in biodiversity conservation. This assures that investments in data collection will have optimal probative force. Based on this analysis, standards can be 'reverse engineered' to produce data best suited to the demands of biodiversity conservation.

We also strongly recommend careful analysis of standards already under development. The Ecological Metadata Language (EML) [21], under continuing development, has made significant progress, but we believe that the issues raised elsewhere in this report have yet to be addressed. Specifically, significant ontological work remains to be accomplished regarding the analysis and standard definition of biological field techniques, data transformation methods and statistical processes.

We also believe that the scripting capacity of standard statistical packages [22] and still emergent applications for documenting scientific workflow (such as Kepler [23]) may both have direct utility in recording the process and context for scientific data capture. A notable example of such workflow capture is in the Galaxy genomics platform [24]. Ontological research and development, coupled with applications development, should provide the necessary foundations for required descriptions of data.

In the social sciences, the Data Documentation Initiative, based at the University of Michigan's Inter-university Consortium for Political and Social Research (ICPSR), has been underway for several years and is now at version 3.1 [25]. Similarly, a 2009 publication of the OECD has proposed a model template for metadata describing a published dataset [26]. The requirement of free text abstracts may provide an adequate frame for such detailed specification, but considerable additional work will be demanded, particularly in deriving minimal descriptive standards for discovery of biodiversity data.

Metadata become increasingly important in exposing data to discovery as the units into which data are assembled become smaller. The molecular sequence repositories developed and maintained by the International Nucleotide Sequence Database Collaboration (INSDC [27]), such as GenBank [28], ENA [29] and DDBJ [30], are perhaps among the best known examples of a data repository. Although the search interfaces and the utility of data contained within GenBank are very limited (and especially geared for molecular biologists), its global prominence makes it an obvious search target. Biodiversity data in general are far more complicated and tend to be made available in smaller blocks, for example the data associated with a single publication. Locating and combining data relevant to a particular purpose thus becomes a goal in itself, and is made possible through the existence of metadata using standard vocabularies.
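The idea of a minimally adequate metadata record that is enriched over time by qualified tagging can be sketched in a few lines. Everything named here is hypothetical: the `MetadataRecord` class, its three core fields, the tiny `VOCABULARY` set standing in for an authority file, and the example URL are assumptions made for illustration, not elements of EML, DDI or any GBIF standard.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary; a real one would come from an authority file.
VOCABULARY = {"occurrence", "specimen", "checklist", "sap flow"}


@dataclass
class MetadataRecord:
    """Minimal discovery metadata that accepts later increments."""
    title: str
    creator: str
    access_url: str
    tags: set[str] = field(default_factory=set)
    history: list[tuple[str, str]] = field(default_factory=list)  # (contributor, term)

    def add_tag(self, term: str, contributor: str) -> bool:
        """Add a vocabulary term; reject terms outside the vocabulary."""
        normalized = term.strip().lower()
        if normalized not in VOCABULARY:
            return False
        self.tags.add(normalized)
        # Recording who contributed each term keeps the enrichment attributable.
        self.history.append((contributor, normalized))
        return True


record = MetadataRecord("Hypothetical ant survey", "A. Researcher",
                        "https://example.org/dataset/1")
record.add_tag("Occurrence", "expert-1")
record.add_tag("nice data", "expert-2")  # rejected: not in the vocabulary
print(sorted(record.tags), record.history)
```

Restricting tags to a shared vocabulary is what distinguishes qualified tagging from free-form labels, and the history list reflects the point that metadata creation is a continuing, collaborative process.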

Open access and biodiversity data

Open access to primary biodiversity data is essential both for enabling effective decision making and for empowering stakeholders involved with, and affected by, the conservation of biodiversity [31-33]. Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis, evaluation and, if feasible, re-generation of data on which conclusions are based. Biodiversity science is not exempt from restrictions on access to data. For example, authors of a paper published on the failure of African game parks to successfully conserve large mammals were unable to present local data gathered from reserve operators, who wanted it to be kept confidential [34].

There is broad emerging consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way [35-38]. However, many existing primary biodiversity data are neither accessible nor discoverable [39]. This issue is further compounded by lack of appropriate representation and/or visualization of available data, and lack of linkability among distributed and heterogeneous data resources [40,41]. This adversely affects the optimal utility of the biodiversity data. Thus, an urgent need exists for the discovery of primary biodiversity data and its publication in the public domain.

For decades there have been declarations, statements, policies and guidelines encouraging open access to primary scientific data [31,42]. With the establishment of the Global Biodiversity Information Facility (GBIF) in 2001, an attempt has been made to develop a global infrastructure to consolidate the discovery of the world's primary biodiversity data and to provide coherent access. Currently the GBIF network facilitates access to nearly 304 million data records through its portal [43]. However, these primary biodiversity data records are just a fraction of the estimated volume of existing data [44-47]. This large volume of biodiversity data, collected by a vast number of biodiversity researchers and amateurs [31,47], remains largely undiscovered and unpublished. This is attributable, we believe, to a lack of encouragement, misperceptions of self-interest, or lack of infrastructural support. Although infrastructure support is increasingly available, the problem of appropriate professional recognition for institutions and individuals remains [31]. We believe that this lack of incentive remains a major impediment to the provision of free and open access to primary biodiversity data.

The GBIF data publishing framework task group

The foregoing discussion emphasizes the need for a data publishing framework to evolve metrics and indicators that provide incentives to the multiple actors involved in the generation of data. Recognizing the need for addressing social, policy, political and technical issues influencing discovery and publishing through the GBIF network, the GBIF Data Publishing Framework Task Group (DPF TG) was commissioned in March 2009 [48]. The DPF TG was tasked with providing recommendations on: (a) social, technical and policy interventions that would encourage publication of primary biodiversity data as a necessary and in-built step in the scientific data management cycle; (b) opportunities and mechanisms to incentivize and attribute credit for investment in primary biodiversity data publishing, from individual to institutional to national levels; and (c) mechanisms/processes for recognizing efforts of data publishers. The concept of the data publishing framework was described at the International Biodiversity Informatics Conference ('e-Biosphere 09') held in London in June 2009 [49]. In its meeting in June 2009, the DPF TG discussed issues influencing discovery and publishing of primary biodiversity data, and possible solutions in overcoming impediments.

A data publishing framework for primary biodiversity data

During its meeting in June 2009, the DPF TG invested significant time in defining and determining the scope and purpose of the data publishing framework for primary biodiversity data. The DPF TG recognized the need expressed by the data originators and information system/networks for data usage metrics and indicators to ensure that the overall utility and impact of their data management and publishing activities is objectively documented, leading to crediting of these activities as scientific activity on a par with the recognition received for conventional scholarly publication [31]. Furthermore, measures of scientists' productivity will be better informed through data publishing, which requires a professional cultural change in the recognition of scientific output [50]. Such an incentive mechanism would achieve increased data mobilization and increased recognition for data generation, both desirable outcomes for scientists.

Our discussion examined five primary components that comprise a data publishing framework. These components are (a) socio-cultural, (b) technical-infrastructural, (c) policy-political, (d) legal and (e) economic, and they support various activities of the data publishing cycle (see Figure 1 in [31]). These components are not only complementary but are inter-dependent. Thus there is no dependency on a sequence of components, as components need to be implemented concurrently. Therefore, we define a data publishing framework as an environment conducive to ensuring free and open access to the world's primary biodiversity data. The core purpose of the framework is to overcome barriers or impediments affecting access to data and the publishing of data.

Recommendations

On the basis of our understanding of issues influencing 'free and open access', discovery and publishing of the primary biodiversity data, and to encourage institutionalization of the data publishing framework for discovery, publishing and use of primary biodiversity data, we make specific recommendations. The key words 'must', 'must not', 'required', 'shall', 'shall not', 'should', 'should not', 'recommended', 'may' and 'optional' in this document are to be interpreted as described in RFC 2119, 'Key words for use in RFCs to Indicate Requirement Levels', of the Internet Engineering Task Force [51].

Sharing of biodiversity data must be the expected norm. We stipulate that withholding of data - to protect precise localities for collectible or marketable plants or animals, or for species of special concern - should be the exception and require explicit justification. We emphasize that such data represent a small fraction of biodiversity data and should not be allowed to dictate normal practice. We also stipulate that our call for access to biodiversity data does not supersede national or indigenous rights to regulate uses of biodiversity data as protection against commercial exploitation ('biopiracy'). To this end, we suggest close consultation and confirmation with CITES [52] and the TRAFFIC Secretariat [53] when questions of this kind occur. As a corollary, all contributors of data must receive appropriate, proportional recognition for their contributions of data. Against this backdrop, we offer 24 recommendations. Recommendation 1 is, however, the primary recommendation that leads to the other recommendations.

Recommendation 1: All data relevant to the understanding of biodiversity and to biodiversity conservation should be made freely, openly and effectively available.

Recommendation 2: GBIF must re-examine its current data resources endorsement model and scrutinize the current practice that national nodes or associate participant nodes are required to give endorsement before the data are discovered and indexed through the GBIF network.

Recommendation 3: GBIF must engage mainstream scholarly publishers and scientific societies with scholarly publications to be part of the GBIF network, as a majority of them would qualify to be thematic/global/regional associate-participants.

Recommendation 4: GBIF must support the development of a tool to convert tabular data into resource description framework (RDF) formats conforming to a standard ontology. This would be highly desirable for small custodians/publishers, but is primarily a tool for mainstream scholarly publishers. (Support for development of such an open source application should be sought from mainstream commercial publishers.) GBIF shall evaluate standards such as BioPax [54].

Recommendation 5: GBIF must facilitate discovery and mobilization of all streams/types of relevant biodiversity data. (This effort should - in close collaboration with others focusing on this development - include ontological analysis of the most important types of data to be considered, the elaboration of suitable working formats for that data, and the developing of mappings to/from such working formats to a standard RDF format for interchange purposes.)

Recommendation 6: GBIF should develop a set of supporting tools (such as templates) for biodiversity data to accommodate more than simple occurrence data. GBIF must increasingly engage with various biodiversity data communities.

Recommendation 7: GBIF must facilitate discovery of un-digitized and not yet published datasets, together with indexing of published datasets (potentially to include semantic indexing based on RDF, to allow datasets to be filtered and retrieved with SPARQL queries). In this regard, we strongly endorse the recommendation by the GBIF Global Strategy and Action Plan for Mobilization of Natural History Collections data [55].

Recommendation 8: GBIF should review the use of legacy literature, such as is stored in the Biodiversity Heritage Library (BHL), to explore uses of marked-up texts for data mining and capture of historical biodiversity information.

Recommendation 9: GBIF must explore and develop the capacity to run queries at the GBIF data portal to return harmonized, well-formed XML and/or RDF, such that fields can be extracted for subsequent analysis.

Recommendation 10: GBIF must expand and improve its metadata implementation framework such that fitness for use of the data resource for its intended use can be ascertained from metadata. For example, data records should identify lineage and provenance (where data originated and from which data resource) of all contributed data - at least to the previous phase of data transformation. Further, we strongly encourage early implementation of the recommendations of the GBIF Metadata Implementation Framework Task Group [56].

Recommendation 11: GBIF must strengthen its network of mirror sites and distributed network of 'trusted digital repositories' (also called data hosting centers). In this regard, we call on GBIF to ensure early implementation of the recommendations in this issue on data hosting infrastructure [57].

Recommendation 12: GBIF must explore the feasibility of using a cloud infrastructure to overcome barriers of investment and maintenance required for biodiversity data discovery and publishing, especially in the developing and under-developed regions of the world.

Recommendation 13: GBIF must ensure an early implementation of the recommendations of the GBIF Life Sciences Identifier (LSID)/globally unique identifier (GUID) Task Group [58]. We further emphasize the need for GBIF to adopt a stable and proven persistent identifier, such as the 'digital object identifier' (doi), rather than unstable persistent identifiers.

Recommendation 14: GBIF must explore the potential of the Data Usage Index (DUI) as a potential incentivization mechanism to recognize efforts required for publishing of biodiversity data [31,59]. GBIF should develop a prototype of such an implementation.

Recommendation 15: GBIF must institutionalize a 'data citation mechanism' and establish a 'data citation service' facilitating deep-data citation and registration and resolving of citations [26]. For the purposes of accountability and citation (attribution), all contributors of data to any aggregation should be identified and acknowledged. Individuals or institutions responsible for primary data have an obligation to make these ownership statements available to the aggregators, who are responsible for using them. The Dryad application, which uses DataCite to register dois, is an initial effort to address this concern [60]. In any data aggregation chain, the aggregator at each level is responsible for identification of data sources from the previous level of aggregation and its contributors. We believe that this provision avoids the complexity of comprehensive identity of all 'cascaded' data sources and contributors during the aggregation process. It is, of course, nevertheless the case that the validity and integrity of data are ultimately linked to the sum of the integrity and validity of all data processes in the lineage of data creation.

Recommendation 16: GBIF should investigate inno-

vative mechanisms for discovery and publishing of

Moritz et al BMC Bioinformatics 2011 12(Suppl 15)S1httpwwwbiomedcentralcom1471-210512S15S1

Page 6 of 10

primary biodiversity data in multiple languages GBIFshould commission a position paper detailing suchmechanisms for potential uptake by the communityRecommendation 17 GBIF must institutionalize the

lsquobiodiversity informatics potentialrsquo (BIP) Index todemonstrate the potential and urgency for nations toimplement biodiversity informatics [61] In the longterm GBIF must lead the periodic release of a lsquoglobalbiodiversity information outlookrsquo report analyzing thecurrent state of biodiversity information to meet thelocal-to-global scale biodiversity targetsRecommendation 18 GBIF must commission a strat-

egy paper demystifying the concernsissues related tointellectual property rights and primary biodiversitydata In this regard the substantial work done by theScience Commons (for example the Science CommonsProtocol for Implementing Open Access Data [62]) andthe Open Knowledge Foundation [63] should havedirect applicationRecommendation 19 GBIF should encourage spon-

sors of biodiversity research whether government agen-cies corporations or private foundations to setmandatory requirements for free and open access tobiodiversity data GBIF should encourage that negotia-tions for overhead (indirect) cost contributions fromfunders should include calculations of cost for sustaineddigital infrastructure that is adequate for free and opensharing and the sustained secure and persistent mainte-nance of data Proposals should be expected to includeadequate planning and financial provision for sustaineddata management and access We further recommendthat GBIF should encourage peer review processes thatinclude rigorous scrutiny of past histories of successfulsharing and should support the norm of state-of-the-artplanning for sharing not simply promises to ldquoput dataon the webrdquoRecommendation 20 GBIF must develop a plan to

foster linkages between scholarly publishers and datapublishers from the local to the global scale GBIFshould encourage that records of professional publica-tion be evaluated - at least in part - on the basis of pub-lication in open access journals that do not deny accessthrough lsquopaywallsrsquo and that provide support for sustain-able open access to dataRecommendation 21 GBIF should urge accreditation

bodies for educational institutions and museums torequire demonstrated evidence of capacity to supportdigital access and maintenance of dataRecommendation 22 GBIF should encourage profes-

sional societies and professional disciplines to requireevidence of effective sharing of data in evaluations forhiring promotion and tenureRecommendation 23 GBIF should develop a concep-

tual lsquolandscape maprsquo depicting GBIFrsquos position role

unique advantages and collaborative strategies amid themany biodiversity and biodiversity informatics initiativesat local to global scales This is very important given thebroad reach of the earlier recommendations It is impor-tant that the scope of the GBIFrsquos own vision and mis-sion is well defined with a clear picture of how GBIFrsquosrole fits into a wider framework of sustainable develop-ment and of free and open access to biodiversity dataRecommendation 24 GBIF must evaluate prioritize

and implement the recommendations made by its taskgroups - the Content Needs Assessment Task Group(CNA TG) [42] the Multimedia Resources Task Group(MRTG) [6465] the Metadata Implementation Frame-work Task Group (MIFTG) [56] the LSID-GUID TaskGroup (LGTG) [58] the Observational Data TaskGroup (ODTG) [66] - and in the Global Strategy andAction Plan for Natural History Collections Data(GSAP-NHC) [55] and recommendations on e-learningrecommendations [67] Knowledge Organization System(KOS) [68] and fitness for use [69]
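Recommendation 4 asks for a tool that converts tabular data into RDF. As a rough illustration of what such a conversion involves, the Python sketch below reads CSV rows and emits N-Triples, one triple per non-empty cell, using only the standard library. The base URI `http://example.org/occurrence/`, the choice of the Darwin Core terms namespace as the "standard ontology", the column names and the sample rows are all assumptions made for this example; none of them is specified by the Task Group or taken from an existing GBIF tool.

```python
import csv
import io

# Assumed namespaces: example.org is a placeholder, and the Darwin Core terms
# namespace stands in for whatever standard ontology a real tool would target.
BASE = "http://example.org/occurrence/"
DWC = "http://rs.tdwg.org/dwc/terms/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def table_to_ntriples(csv_text: str, id_column: str = "occurrenceID") -> list[str]:
    """Convert CSV text to N-Triples, one triple per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The identifier is used verbatim; a real tool would validate or encode it.
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            triples.append(f'{subject} <{DWC}{column}> "{escape_literal(value)}" .')
    return triples


# Hypothetical sample table (not real occurrence data).
sample = (
    "occurrenceID,scientificName,eventDate,recordedBy\n"
    "1,Puma concolor,2009-06-01,A. Collector\n"
    "2,Arctostaphylos glauca,,B. Observer\n"
)

for line in table_to_ntriples(sample):
    print(line)
```

Triples in this form could then be loaded into any RDF store and filtered with SPARQL, which is the kind of semantic indexing Recommendation 7 has in mind.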

Discussion
These recommendations grew out of our discussion in June 2009. Since then, there have been subsequent revisions and modifications of the recommendations, and some additions. Chavan and Ingwersen [31] further elaborated on various components of the data publishing framework, especially pertaining to the issues of persistent identifiers, the data usage index and a data citation mechanism. This was further discussed during the DataCite Summer Workshop 2010 [70]. Members of the Task Group were engaged in exploring solutions to various components of the data publishing framework, some of which are included in this issue [57,59,61,71] and some published elsewhere ([69,72,73], and MJ Costello, WK Michener et al., personal communication).

In January 2011, the US National Science Foundation (NSF) implemented a policy requiring all NSF grant applicants to submit data management plans as a part of any grant proposal [74]. This policy change seems to represent a very significant fulfillment of our recommendation, though the exact details of its implementation remain as yet unclear.

We believe that timely implementation of these recommendations and suggested solutions or approaches by the GBIF network will support much needed recognition for individual and institutional efforts in management and publishing of primary biodiversity data. GBIF’s support of these recommendations should be of critical importance in establishing their credibility and winning their widespread adoption. Implementation of these recommendations should substantially increase the volume of available primary biodiversity data, substantiating public investment in biodiversity science and conservation of biotic resources.

The DPF TG notes several preliminary efforts to implement these recommendations by the GBIF Secretariat. The DPF TG recommendation on incentivizing efforts for metadata authoring has led the GBIF Secretariat to commission Pensoft Publishers to create a ‘data paper’ [71] section in four of its journals (BioRisk, PhytoKeys, NeoBiota and ZooKeys), alongside a ‘push-button’ mechanism to generate XML-encoded manuscripts from metadata descriptions, to be submitted directly to the publisher for peer review, editorial evaluation and publication in the form of a data paper [71]. The BIP Index, an exploratory study to develop metrics to determine country-level biodiversity informatics potentials, has been undertaken [61]. GBIF was, moreover, invited to be part of the group of experts convened by CODATA (the Committee on Data for Science and Technology) to develop an approach to data citation.

We were mandated to make recommendations for potential uptake by the GBIF network. However, we believe that these recommendations apply to the broader biodiversity informatics and ecoinformatics community. Nevertheless, we reiterate that the GBIF network is the most natural venue to kick-start the early implementation of these recommendations. As GBIF enters into its third phase, in which it aspires to be the foremost global resource for biodiversity information [75], early leadership and a proactive step towards implementation of these recommendations are imperative for its success.

Conclusions and future work
The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value of and incentives for such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure, and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that ultimately implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development, and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details
1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA.
2 Aundh, Pune 411007, India.
3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK.
4 Royal School of Library and Information Science, Birketinget 6, Copenhagen DK 2300, Denmark.
5 Oslo University College, Pb 4 St Olavs Plass, 0130 Oslo, Norway.
6 Plazi, Zinggst 16, 3600 Bern, Switzerland, and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA.
7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences, and Pensoft Publishers, 13a Geomilev Street, 1111 Sophia, Bulgaria.
8 BioMed Central Ltd, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK.
9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100 Copenhagen, Denmark.

Competing interests
The authors declare that they have no competing interests.

Published: 15 December 2011

References
1. Merriam-Webster. [http://www.merriam-webster.com/dictionary/data]
2. Wikipedia. [http://en.wikipedia.org/wiki/Data]
3. National Science Foundation: Sustainable Digital Data Preservation and Access Network Partners (DataNet). Program Solicitation NSF 07-601. 2008 [http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm#toc]
4. AnthroDPA Metadata Working Group: Report of the AnthroDPA MetaData Working Group. May 2009. Sponsored by the Wenner-Gren Foundation and the US NSF. [httpanthrodatadpaorgMediaAnthroDataDPA20Reportpdf]
5. Ackoff RL: From data to wisdom. Journal of Applied Systems Analysis 1989, 16:3-9.
6. Bellinger C, Castro D, Mills A: Data, Information, Knowledge, and Wisdom. 2004 [http://www.systems-thinking.org/dikw/dikw.htm]
7. Bose R, Frew J: Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 2005, 37:1-28.
8. Lathe W, Williams J, Mangan M, Karolchik D: Genomic data resources: challenges and promises. Nature Education 2008, 1(3) [http://www.nature.com/scitable/topicpage/Genomic-Data-Resources-Challenges-and-Promises-743721]
9. Grantham HS, Moilanen A, Wilson KA, Pressey RL, Rebelo TG, Possingham HP: Diminishing return on investment for biodiversity data in conservation planning. Conservation Letters 1:190-198, doi: 10.1111/j.1755-263X.2008.00029.x.
10. Closing the Climategate. Nature 2010, 468:345, doi: 10.1038/468345a.
11. Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C: Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 2009, 11:1-8, doi: 10.3897/zookeys.11.210.
12. GBIF: GBIF Work Programme 2009-2010. Copenhagen: Global Biodiversity Information Facility; 2008.
13. Merton RK: The Normative Structure of Science. In The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press; 1979:267-278.
14. Cavendish H, Read AS: Experiments to determine the density of the earth. Philos Trans R Soc Lond 1798, II:469-526.
15. Michener WK: Meta-information concepts for ecological data management. Ecological Informatics 2006, 1:3-7, doi: 10.1016/j.ecoinf.2005.08.004.
16. Voss RS, Emmons L: Mammalian diversity in neotropical lowland rainforests: a preliminary assessment. Bulletin of the American Museum of Natural History 1996, 230.
17. Nur N, Jones SL, Geupel GR: Statistical Guide to Data Analysis of Avian Monitoring Programs. BTP-R6001-1999. Washington DC: US Department of the Interior, Fish and Wildlife Service; 1999:61 [http://library.fws.gov/Pubs9/avian_monitoring.pdf]


18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf]
19. EDIT: Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu]
20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi]
21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html]
22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205]
23. The Kepler Project. [https://kepler-project.org]
24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.
25. DDI Alliance: Metadata specification for social and behavioral sciences, ver. 3.1. [http://www.ddialliance.org]
26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing; 2009:9-11, doi: 10.1787/603233448430.
27. International Nucleotide Sequence Database Collaboration. [http://insdc.org]
28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html]
29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena]
30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp]
31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2, doi: 10.1186/1471-2105-10-S14-S2.
32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17, doi: 10.3897/zookeys.21.274.
33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703, doi: 10.1126/science.1197962.
34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa’s protected areas. Biol Conserv 2010, 143:2221-2228.
35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung]
36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren]
37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons]
38. Conservation Commons: Partners. [http://conservationcommons.net/partners]
39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.
40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.
41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.
42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.
43. GBIF Data Portal. [http://data.gbif.org]
44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.
45. Chavan V, Krishnan S: Natural history collections: a call for national information infrastructure. Curr Sci 2003, 84:34-42.
46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.
47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299, doi: 10.1353/lib.0.0036.
48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group]
49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09 International Conference on Biodiversity Informatics, June 2009, London [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144]
50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.
51. IETF RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt]
52. CITES. [http://www.cites.org]
53. TRAFFIC. [http://www.traffic.org]
54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org]
55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.
56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf]
57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.
58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf]
59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.
60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata]
61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.
62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol]
63. Open Knowledge Foundation. [http://okfn.org]
64. Morris R, Olson A, O’Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O’Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O’Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf]
68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2001 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.
69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]
70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4]


71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.
72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.
73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for ‘Data Discovery and Publishing Strategy and Action Plans’, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
74. NSF: Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp]
75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf]
76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011, 12(Suppl 15):S1.



Background
Data usage and definitions
The term ‘data’ [1,2] has two primary uses. One, specific to the information technology community, refers to any machine readable code that allows information to be read by, stored in, accessed by or shared by computers. For example, the United States National Science Foundation ‘DataNet’ program defines data as “Any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.” [3]. Under this definition, theoretically everything can be ‘digitized’ and become ‘data’. This ‘bits and bytes’ definition of data challenges us to ask what cannot - if digitally captured - be considered ‘data’.

A second usage refers to data in a more precise epistemic way, as “Precise, well-defined representations of observations, descriptions or measurements of a referent (object, phenomena or event) recorded in some standard, well-specified way” [4].

In this report we use this latter definition, although stipulating that such data may be technically formatted as text (descriptions), as maps, as visual images or audio recordings, as signals, as symbols or as numbers. This clarity is essential because, in the context of biodiversity conservation in general and a biodiversity data publishing framework in particular, data have a foundational place in the wisdom/knowledge hierarchy [5,6]. Information, knowledge and wisdom are synthesized from factual data, which are thus the basis for informed policies, decision-making and sustainable use of biotic resources. We urge that careful attention be consistently paid to which usage of the term ‘data’ is intended. Of elemental importance is that, to be useful, descriptions of data and of their provenance, lineage [7] and structure, normally collected as ‘metadata’, must exist.

The volume of data
We are experiencing a tremendous increase in data generated by a variety of research processes. For example, a recent article reviewing the growth of data in the International Nucleotide Sequence Database Collaboration (INSDC) notes: “the INSDC databases have grown to contain over 95 billion base pairs, reflecting an exponential growth rate in which the amount of stored data has doubled every 18 months” [8].

This increase has major implications for data management, data processing, data archiving and data accessibility. The potential for a tremendous signal to noise problem - challenging data users to effectively select relevant, high quality data from a rapidly expanding corpus of data - suggests the urgent need to implement well designed and deployed data management strategies extensively and consistently. These strategies must carefully evaluate the relative returns on investment for incremental investments in data creation and collection [9]. It seems possible that only certain selected ‘canonical’ datasets, of primary importance in guiding policy or in informing key decisions, will be managed in full accordance with optimal recommendations. Determination of which datasets merit this level of optimal management seems best left to community mechanisms. However, recent challenges to the Intergovernmental Panel on Climate Change (referred to as ‘climategate’) make clear the importance of exhaustive documentation for datasets on which policies with major global, national and even local economic consequences are based [10]. In general, each researcher is responsible for the quality and integrity of their data: by direct release of data, or by publication based on data, they are implicitly warranting that best professional practices have been followed in definition, creation and management of such data.
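To make the quoted growth rate concrete: a doubling time of 18 months means that a corpus grows by a factor of 2^(t/18) after t months, which is roughly tenfold in five years and about a hundredfold in ten. The short Python sketch below evaluates that formula. The function name and the chosen time horizons are illustrative, and the result simply extrapolates the quoted rate; it does not report actual INSDC figures.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Factor by which a quantity grows after `months`, given its doubling time."""
    return 2.0 ** (months / doubling_months)


# Extrapolate the quoted 18-month doubling time over a few horizons.
for years in (1.5, 3, 5, 10):
    factor = growth_factor(years * 12)
    print(f"after {years} years: x{factor:.1f}")
```

Growth of this kind is why the text argues that full, optimal curation may be affordable only for a selected subset of datasets.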

Collections of data: databases, datasets and data tables
In colloquial scientific usage, collections of data are variously referred to as ‘databases’, ‘datasets’ and ‘data tables’, or merely as ‘data’. In an effort to standardize usage for such collections, a recent publication [11] by several members of the Task Group has proposed a series of possible working definitions:

“‘Data tables’ represent precisely the set or sets of data upon which the analyses and conclusions of a given scientific paper are based. A data table is thus a discrete, fixed, time-bounded collection serving as a referent.
‘Datasets’ represent discrete collections of data underlying a scientific paper. Datasets are thus also fixed and time-bound, though functioning in a more general way as a referent.
‘Databases’ represent larger, dynamic and more extensively coherent collections of data. By this definition, databases are not fixed or time-bounded, but have properties of quality control and integrity and should provide the capacity for version control and version retrospection.”
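One way to read these working definitions is in terms of mutability: data tables and datasets are fixed snapshots, whereas a database changes over time but should be able to recover its earlier states. The Python sketch below expresses that distinction. The class and method names, the paper label and the sample records are hypothetical; they are offered only as an illustration of the quoted definitions, not as a schema proposed in [11].

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataTable:
    """Fixed, time-bounded set of records underlying one paper's analyses."""
    paper: str
    records: tuple  # a tuple of tuples, so records cannot be added or changed


@dataclass
class Database:
    """Dynamic collection that keeps every committed version for retrospection."""
    versions: list = field(default_factory=list)

    def commit(self, records) -> int:
        """Store a new immutable version and return its version number."""
        self.versions.append(tuple(records))
        return len(self.versions) - 1

    def snapshot(self, version: int, paper: str) -> DataTable:
        """Freeze one stored version as the referent data table for a paper."""
        return DataTable(paper=paper, records=self.versions[version])


# Hypothetical records: (taxon, individuals counted).
db = Database()
v0 = db.commit([("Puma concolor", 2)])
v1 = db.commit([("Puma concolor", 3)])

# The paper cites version 0; later commits do not alter what it refers to.
table = db.snapshot(v0, paper="hypothetical-paper-2011")
print(table.records, len(db.versions))
```

The design choice worth noting is that the database never overwrites a version, which is what lets a publisher keep serving the exact referent of a paper after the database has moved on.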

In the context of this article, we propose a clear distinction between: fixed data tables that represent precisely the set or sets of data on which the specific analysis and conclusions of a scientific paper are based; datasets, understood as fixed and time-bound logical files presenting a collection of facts (observations, descriptions or measurements) formally structured into standard records; and dynamic databases, representing larger and more extensive collections of data that may or may not include the precise data tables or more general datasets that are the referent(s) for a given scientific paper. Each data record is structured in fields, with specifications for appropriate field content. In the context of this article, ‘primary biodiversity data’ is defined as digital text or multimedia data records providing facts about the instance of an organism: the what, where, when, how and by whom of the occurrence and the recording [12]. By this definition, data tables and datasets are inextricably linked to scientific papers, and the publisher must assure consistent and secure access in perpetuity to referent data tables and datasets [11]. Thus these collections of data impose the heaviest burden of responsibility on the publisher for sustained access.

With respect to the publishing of data, the customary practices of science suggest that data providing the evidence for conclusions drawn in a scientific paper or report should be available for review, evaluation and testing. This provision is fundamental to the objective practice of science as ‘organized skepticism’ [13]. Appropriate standards for testing data vary depending on the exact nature of the data. For example, in situ field data are evaluated by consideration of the field context; the methodology or apparatus used to collect data; the consistency or inconsistency with other comparable studies; the quality and detail of the reported observations, photographs or audio recordings; and material evidence (specimen, genetic sample, scat, tracks and so on). The actual practicability of testing and assessing data is highly dependent on the thoroughness with which data are described and how completely the context for data collection is described. This leads logically to the question of metadata as a source of necessary contextual information about data.

How data have meaning: metadata
‘2607’ and ‘059998’ are each an actual datum or ‘data point’. It is immediately obvious that, without any description of context for the creation and capture of data, an isolated datum is meaningless. Descriptive information is necessary to impart meaning. The former datum was recorded by Henry Cavendish in his “Experiments to Determine the Density of the Earth” (21 June 1798) and was published in the Philosophical Transactions of the Royal Society of London [14]. The Cavendish datum was a result of a humanly contrived experiment using a specially designed apparatus. The latter datum is a reading obtained from automated data loggers recording sap flow in Manzanita plants at the University of California James Reserve, Mt. San Jacinto, California (4 December 2007, 11:37) and was recorded by a data logger in an as yet unpublished Microsoft Excel spreadsheet (Gary Geller, 2010, personal communication).

However, in the simple contexts disclosed above, we have learned that some agent conducted a data gathering exercise at a given date and time and at a described place. Inference of a probable general scientific domain or discipline for the data - for example, physics or ecology or botany - provides only a very general delimiter of the probable character of the data. We do not, for example, know the actual type of automated data logger used, its proper calibration, the actual details of its deployment in this instance of use, or the competence of the person using the data logger. Lacking this information, and other information that would serve to validate the quality of the data presented, we are challenged with the need to develop and to provide more complete descriptions to make data fit for use and, in particular, fit for testing and evaluation.
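The point that a datum only acquires meaning through its description can be illustrated with a minimal sketch. The record layout below is an assumption made for illustration only: the class name `DescribedDatum`, its field names and the `missing_context` helper are hypothetical and do not correspond to any GBIF or other metadata standard. The values are kept as text exactly as they are printed in this article.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DescribedDatum:
    """A single reading plus the context needed to interpret it (hypothetical layout)."""
    value: str                          # digits kept as text, as printed in the article
    agent: Optional[str] = None         # who or what recorded the datum
    recorded_on: Optional[str] = None   # date of recording
    place: Optional[str] = None         # where it was recorded
    method: Optional[str] = None        # apparatus or procedure
    calibration: Optional[str] = None   # instrument calibration details

    def missing_context(self) -> List[str]:
        """Names of descriptive fields that have not been filled in yet."""
        return [f.name for f in fields(self)
                if f.name != "value" and getattr(self, f.name) is None]


# An isolated datum: nothing is known beyond the digits themselves.
bare = DescribedDatum(value="2607")

# The sap flow reading, with the context disclosed in the text.
sap_flow = DescribedDatum(
    value="059998",
    agent="automated data logger",
    recorded_on="4 December 2007",
    place="University of California James Reserve",
    method="sap flow logging in Manzanita plants",
)

print(bare.missing_context())
print(sap_flow.missing_context())
```

Even the better described reading still lacks calibration details, which mirrors the gap identified above: the disclosed context says who, when and where, but not yet enough to test the quality of the reading.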

Provision of metadata
To avoid the risks of overly intricate and elaborate metadata standards that fail by requiring inordinate investments of time and resources, we suggest that metadata be initially designed to provide minimally adequate description for discovery and access to data. We propose that, in the interests of optimal efficiency of effort, careful efforts be made to apply inference and recursion in creation of such minimally adequate metadata, and that metadata subsequently be available for the continuing addition of fresh increments of metadata. This recommendation implies that metadata creation should be a continuous, collaborative process, not a single event. Specifically with respect to museum collections, we recommend that links to relevant type specimens be included as a part of the metadata record.

Moreover, we believe that by careful application of qualified social tagging - that is, of indexing by expert users applying well-formed, ontologically suitable vocabularies and authority files - substantial development and enrichment of metadata records can be accomplished (this recommendation requires applications that can support a dynamic, coherent and iterative development of metadata over time) [15].

We also suggest that assessment of the fitness of metadata for use be considered from the ‘demand side’, by asking how data have typically been used to best effect in the creation of biodiversity knowledge and policy.

There are many technical publications - for example, Voss and Emmons’ ‘Mammalian diversity in neotropical lowland rainforests: a preliminary assessment’ [16], the US Fish and Wildlife Service’s ‘Statistical guide to data analysis of avian monitoring programs’ [17], or Agosti et al.’s ‘Ants: standard methods for measuring and monitoring biodiversity’ [18] - that provide detailed descriptions of common data collection methods or of statistical processes applied to biodiversity data. Recently, the European Union Framework Programme 6 project EDIT (European Distributed Institute for Taxonomy) has developed a complete workflow from data collection in the field to assembly of datasets and analyses [19,20]. These and many other works provide guidance in the development of standard ontologies for data description.

We recommend a research process that - from an ontological perspective - systematically reviews, analyzes and specifies how data can most efficiently be supplied to fit the needs of these primary biodiversity-monitoring processes. We suggest detailed survey and analysis of the primary and standard forms of processing that, by community consensus, are of greatest proven value and impact in biodiversity conservation. This assures that investments in data collection will have optimal probative force. Based on this analysis, standards can be ‘reverse engineered’ to produce data best suited to the demands of biodiversity conservation.

We also strongly recommend careful analysis of standards already under development. The Ecological Metadata Language (EML) [21], under continuing development, has made significant progress, but we believe that the issues raised elsewhere in this report have yet to be addressed. Specifically, significant ontological work remains to be accomplished regarding the analysis and standard definition of biological field techniques, data transformation methods and statistical processes.

We also believe that the scripting capacity of standard statistical packages [22] and still emergent applications for documenting scientific workflow (such as Kepler [23]) may both have direct utility in recording the process and context for scientific data capture. A notable example of such workflow capture is in the Galaxy genomics platform [24]. Ontological research and development, coupled with applications development, should provide the necessary foundations for required descriptions of data.

In the social sciences, the Data Documentation Initiative, based at the University of Michigan’s Interuniversity Consortium for Political and Social Research (ICPSR), has been underway for several years and is now at version 3.1 [25]. Similarly, a 2009 publication of the OECD has proposed a model template for metadata describing a published dataset [26]. The requirement of free text abstracts may provide an adequate frame for such detailed specification, but considerable additional work will be demanded, particularly in deriving minimal descriptive standards for discovery of biodiversity data.

Metadata become increasingly important in exposing data to discovery as the units into which data are assembled become smaller. The molecular sequence repositories developed and maintained by the International Nucleotide Sequence Database Collaboration (INSDC [27]), such as GenBank [28], ENA [29] and DDBJ [30], are perhaps among the best known examples of data repositories. Although the search interfaces and the utility of data contained within GenBank are very limited (and especially geared for molecular biologists), its global prominence makes it an obvious search target. Biodiversity data in general are far more complicated and tend to be made available in smaller blocks, for example the data associated with a single publication. Locating and combining data relevant to a particular purpose thus becomes a goal in itself and is made possible through the existence of metadata using standard vocabularies.
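The role of standard vocabularies in locating small, scattered blocks of data can be sketched as follows. This is a toy illustration under stated assumptions, not a description of any existing GBIF service: the vocabulary in `CONTROLLED_TERMS`, the dataset identifiers and the functions `build_index` and `find` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical controlled vocabulary; a real one would come from an agreed standard.
CONTROLLED_TERMS: Set[str] = {"ants", "birds", "mammals", "neotropics"}


def build_index(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each controlled term to the ids of the datasets tagged with it."""
    index = defaultdict(list)
    for record in records:
        for term in record["keywords"]:
            normalized = term.strip().lower()
            # Terms outside the vocabulary are ignored, so they do not aid discovery.
            if normalized in CONTROLLED_TERMS:
                index[normalized].append(record["id"])
    return dict(index)


def find(index: Dict[str, List[str]], *terms: str) -> List[str]:
    """Return the ids of datasets tagged with every requested term."""
    hits = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*hits)) if hits else []


# Made-up metadata records for three small datasets.
records = [
    {"id": "ds-1", "keywords": ["Ants", "Neotropics"]},
    {"id": "ds-2", "keywords": ["mammals", "neotropics"]},
    {"id": "ds-3", "keywords": ["formicidae"]},  # free-text tag outside the vocabulary
]

index = build_index(records)
print(find(index, "neotropics"))
print(find(index, "ants", "neotropics"))
print(find(index, "formicidae"))
```

The third dataset is about ants as well, but because its only tag falls outside the shared vocabulary it cannot be found or combined with the others, which is the practical cost of non-standard description.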

Open access and biodiversity data
Open access to primary biodiversity data is essential both for enabling effective decision making and for empowering stakeholders involved with and affected by the conservation of biodiversity [31-33]. Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis, evaluation and, if feasible, re-generation of data on which conclusions are based. Biodiversity science is no exception to restrictions on access to such data. For example, authors of a paper published on the failure of African game parks to successfully conserve large mammals were unable to present local data gathered from reserve operators, who wanted it to be kept confidential [34].

There is broad emerging consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way [35-38]. However, many existing primary biodiversity data are neither accessible nor discoverable [39]. This issue is further compounded by lack of appropriate representation and/or visualization of available data and lack of linkability among distributed and heterogeneous data resources [40,41]. This adversely affects the optimal utility of the biodiversity data. Thus, an urgent need exists for the discovery of primary biodiversity data and its publication in the public domain.

For decades there have been declarations, statements, policies and guidelines encouraging open access to primary scientific data [31,42]. With the establishment of the Global Biodiversity Information Facility (GBIF) in 2001, an attempt has been made to develop a global infrastructure to consolidate the discovery of the world’s primary biodiversity data and to provide coherent access. Currently, the GBIF network facilitates access to nearly 304 million data records through its portal [43]. However, these primary biodiversity data records are just a fraction of the estimated volume of existing data [44-47]. This large volume of biodiversity data, collected by a vast number of biodiversity researchers and amateurs [31,47], remains largely undiscovered and unpublished. This is attributable, we believe, to a lack of encouragement, misperceptions of self-interest, or lack of infrastructural support. Although infrastructure support is increasingly available, the problem of appropriate professional recognition for institutions and individuals remains [31]. We believe that this lack of incentive remains a major impediment to the provision of free and open access to primary biodiversity data.

The GBIF data publishing framework task group
The foregoing discussion emphasizes the need for a data publishing framework to evolve metrics and indicators that provide incentives to the multiple actors involved in the generation of data. In recognition of the need to address social, policy, political and technical issues influencing discovery and publishing through the GBIF network, the GBIF Data Publishing Framework Task Group (DPF TG) was commissioned in March 2009 [48]. The DPF TG was tasked with providing recommendations on (a) social, technical and policy interventions that would encourage publication of primary biodiversity data as a necessary and in-built step in the scientific data management cycle; (b) opportunities and mechanisms to incentivize and attribute credit for investment in primary biodiversity data publishing, from individual to institutional to national levels; and (c) mechanisms/processes for recognizing efforts of data publishers. The concept of the data publishing framework was described at the International Biodiversity Informatics Conference (‘e-Biosphere 09’) held in London in June 2009 [49]. In its meeting in June 2009, the DPF TG discussed issues influencing discovery and publishing of primary biodiversity data and possible solutions in overcoming impediments.

A data publishing framework for primary biodiversity data
During its meeting in June 2009, the DPF TG invested significant time in defining and determining the scope and purpose of the data publishing framework for primary biodiversity data. The DPF TG recognized the need, expressed by the data originators and information system/networks, for data usage metrics and indicators to ensure that the overall utility and impact of their data management and publishing activities is objectively documented, leading to crediting of these activities as scientific activity on a par with the recognition received for conventional scholarly publication [31]. Furthermore, measures of scientists’ productivity will be better informed through data publishing, which requires a professional cultural change in the recognition of scientific output [50]. Such an incentive mechanism would achieve increased data mobilization and increased recognition for data generation, both desirable outcomes for scientists.

Our discussion examined five primary components that comprise a data publishing framework. These components are (a) socio-cultural, (b) technical-infrastructural, (c) policy-political, (d) legal and (e) economic, and they support various activities of the data publishing cycle (see Figure 1 in [31]). These components are not only complementary but are inter-dependent. Thus, there is no dependency on a sequence of components, as components need to be implemented concurrently. Therefore, we define a data publishing framework as an environment conducive to ensuring free and open access to the world’s primary biodiversity data. The core purpose of the framework is to overcome barriers or impediments affecting access to data and the publishing of data.

Recommendations
On the basis of our understanding of issues influencing ‘free and open access’, discovery and publishing of primary biodiversity data, and to encourage institutionalization of the data publishing framework for discovery, publishing and use of primary biodiversity data, we make specific recommendations. The key words ‘must’, ‘must not’, ‘required’, ‘shall’, ‘shall not’, ‘should’, ‘should not’, ‘recommended’, ‘may’ and ‘optional’ in this document are to be interpreted as described in RFC 2119, ‘Key words for use in RFCs to Indicate Requirement Levels’, of the Internet Engineering Task Force [51].

Sharing of biodiversity data must be the expected norm. We stipulate that withholding of data - to protect precise localities for collectible or marketable plants or animals, or for species of special concern - should be the exception and require explicit justification. We emphasize that such data represent a small fraction of biodiversity data and should not be allowed to dictate normal practice. We also stipulate that our call for access to biodiversity data does not supersede national or indigenous rights to regulate uses of biodiversity data as protection against commercial exploitation (‘biopiracy’). To this end, we suggest close consultation and confirmation with CITES [52] and the TRAFFIC Secretariat [53] when questions of this kind occur. As a corollary, all contributors of data must receive appropriate, proportional recognition for their contributions of data. Against this backdrop, we offer 24 recommendations. Recommendation 1 is, however, the primary recommendation that leads to the other recommendations.

Recommendation 1: All data relevant to the understanding of biodiversity and to biodiversity conservation should be made freely, openly and effectively available.


Recommendation 2: GBIF must re-examine its current data resources endorsement model and scrutinize the current practice that national nodes or associate participant nodes are required to give endorsement before the data are discovered and indexed through the GBIF network.

Recommendation 3: GBIF must engage mainstream scholarly publishers and scientific societies with scholarly publications to be part of the GBIF network, as a majority of them would qualify to be thematic/global/regional associate-participants.

Recommendation 4: GBIF must support the development of a tool to convert tabular data into resource description framework (RDF) formats conforming to a standard ontology. This would be highly desirable for small custodians/publishers but is primarily a tool for mainstream scholarly publishers. (Support for development of such an open source application should be sought from mainstream commercial publishers.) GBIF shall evaluate standards such as BioPax [54].

Recommendation 5: GBIF must facilitate discovery and mobilization of all streams/types of relevant biodiversity data. (This effort should - in close collaboration with others focusing on this development - include ontological analysis of the most important types of data to be considered, the elaboration of suitable working formats for that data, and the developing of mappings to/from such working formats to a standard RDF format for interchange purposes.)

Recommendation 6: GBIF should develop a set of supporting tools (such as templates) for biodiversity data to accommodate more than simple occurrence data. GBIF must increasingly engage with various biodiversity data communities.

Recommendation 7: GBIF must facilitate discovery of un-digitized and not yet published datasets, together with indexing of published datasets (potentially to include semantic indexing based on RDF, to allow datasets to be filtered and retrieved with SPARQL queries). In this regard, we strongly endorse the recommendation by the GBIF Global Strategy and Action Plan for Mobilization of Natural History Collections data [55].

Recommendation 8: GBIF should review the use of legacy literature, such as is stored in the Biodiversity Heritage Library (BHL), to explore uses of marked-up texts for data mining and capture of historical biodiversity information.

Recommendation 9: GBIF must explore and develop the capacity to run queries at the GBIF data portal to return harmonized, well formed XML and/or RDF such that fields can be extracted for subsequent analysis.

Recommendation 10: GBIF must expand and improve its metadata implementation framework such that the fitness of a data resource for its intended use can be ascertained from metadata. For example, data records should identify lineage and provenance (where data originated and from which data resource) of all contributed data - at least to the previous phase of data transformation. Further, we strongly encourage early implementation of the recommendations of the GBIF Metadata Implementation Framework Task Group [56].

Recommendation 11: GBIF must strengthen its network of mirror sites and distributed network of ‘trusted digital repositories’ (also called data hosting centers). In this regard, we call on GBIF to ensure early implementation of the recommendations in this issue on data hosting infrastructure [57].

Recommendation 12: GBIF must explore the feasibility of using a cloud infrastructure to overcome barriers of investment and maintenance required for biodiversity data discovery and publishing, especially in the developing and under-developed regions of the world.

Recommendation 13: GBIF must ensure an early implementation of the recommendations of the GBIF Life Sciences Identifier (LSID)/globally unique identifier (GUID) Task Group [58]. We further emphasize the need for GBIF to adopt a stable and proven persistent identifier, such as the ‘digital object identifier’ (doi), rather than unstable persistent identifiers.

Recommendation 14: GBIF must explore the potential of the Data Usage Index (DUI) as an incentivization mechanism to recognize efforts required for publishing of biodiversity data [31,59]. GBIF should develop a prototype of such an implementation.

Recommendation 15: GBIF must institutionalize a ‘data citation mechanism’ and establish a ‘data citation service’ facilitating deep-data citation and registration and resolving of citations [26]. For the purposes of accountability and citation (attribution), all contributors of data to any aggregation should be identified and acknowledged. Individuals or institutions responsible for primary data have an obligation to make these ownership statements available to the aggregators, who are responsible for using them. The Dryad application, which uses DataCite to register dois, is an initial effort to address this concern [60]. In any data aggregation chain, the aggregator at each level is responsible for identification of data sources from the previous level of aggregation and its contributors. We believe that this provision avoids the complexity of comprehensive identity of all ‘cascaded’ data sources and contributors during the aggregation process. It is, of course, nevertheless the case that the validity and integrity of data are ultimately linked to the sum of the integrity and validity of all data processes in the lineage of data creation.

Recommendation 16: GBIF should investigate innovative mechanisms for discovery and publishing of primary biodiversity data in multiple languages. GBIF should commission a position paper detailing such mechanisms for potential uptake by the community.

Recommendation 17: GBIF must institutionalize the ‘biodiversity informatics potential’ (BIP) Index to demonstrate the potential and urgency for nations to implement biodiversity informatics [61]. In the long term, GBIF must lead the periodic release of a ‘global biodiversity information outlook’ report analyzing the current state of biodiversity information to meet the local-to-global scale biodiversity targets.

Recommendation 18: GBIF must commission a strategy paper demystifying the concerns/issues related to intellectual property rights and primary biodiversity data. In this regard, the substantial work done by the Science Commons (for example, the Science Commons Protocol for Implementing Open Access Data [62]) and the Open Knowledge Foundation [63] should have direct application.

Recommendation 19: GBIF should encourage sponsors of biodiversity research, whether government agencies, corporations or private foundations, to set mandatory requirements for free and open access to biodiversity data. GBIF should encourage that negotiations for overhead (indirect) cost contributions from funders include calculations of cost for sustained digital infrastructure that is adequate for free and open sharing and the sustained, secure and persistent maintenance of data. Proposals should be expected to include adequate planning and financial provision for sustained data management and access. We further recommend that GBIF should encourage peer review processes that include rigorous scrutiny of past histories of successful sharing and should support the norm of state-of-the-art planning for sharing, not simply promises to “put data on the web”.

Recommendation 20: GBIF must develop a plan to foster linkages between scholarly publishers and data publishers from the local to the global scale. GBIF should encourage that records of professional publication be evaluated - at least in part - on the basis of publication in open access journals that do not deny access through ‘paywalls’ and that provide support for sustainable open access to data.

Recommendation 21: GBIF should urge accreditation bodies for educational institutions and museums to require demonstrated evidence of capacity to support digital access and maintenance of data.

Recommendation 22: GBIF should encourage professional societies and professional disciplines to require evidence of effective sharing of data in evaluations for hiring, promotion and tenure.

Recommendation 23: GBIF should develop a conceptual ‘landscape map’ depicting GBIF’s position, role, unique advantages and collaborative strategies amid the many biodiversity and biodiversity informatics initiatives at local to global scales. This is very important given the broad reach of the earlier recommendations. It is important that the scope of GBIF’s own vision and mission is well defined, with a clear picture of how GBIF’s role fits into a wider framework of sustainable development and of free and open access to biodiversity data.

Recommendation 24: GBIF must evaluate, prioritize and implement the recommendations made by its task groups - the Content Needs Assessment Task Group (CNA TG) [42], the Multimedia Resources Task Group (MRTG) [64,65], the Metadata Implementation Framework Task Group (MIFTG) [56], the LSID-GUID Task Group (LGTG) [58] and the Observational Data Task Group (ODTG) [66] - and in the Global Strategy and Action Plan for Natural History Collections Data (GSAP-NHC) [55], as well as the recommendations on e-learning [67], Knowledge Organization System (KOS) [68] and fitness for use [69].
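To make the kind of tool envisaged in Recommendation 4 more concrete, the following minimal sketch converts a small table into RDF statements serialized as N-Triples. It rests on several assumptions that are not part of the recommendations themselves: Darwin Core term URIs stand in for the ‘standard ontology’, the base URI `http://example.org/occurrence/` is a placeholder, the function names are hypothetical, and the sample row is made up. Every value is emitted as a plain string literal, which a production tool would refine with datatypes and links to other resources.

```python
import csv
import io
from typing import List
from urllib.parse import quote

DWC = "http://rs.tdwg.org/dwc/terms/"      # Darwin Core terms namespace (assumed ontology)
BASE = "http://example.org/occurrence/"    # placeholder base URI for record identifiers


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples does not allow raw inside a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def table_to_ntriples(csv_text: str, id_column: str = "occurrenceID") -> List[str]:
    """Emit one triple per non-empty cell; column headers are used as term names."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id becomes the subject; empty cells carry no statement
            lines.append(f'{subject} <{DWC}{column}> "{escape_literal(value)}" .')
    return lines


# Made-up sample table with one occurrence record and one empty cell.
SAMPLE = (
    "occurrenceID,scientificName,eventDate,country\n"
    "occ-1,Arctostaphylos glauca,2007-12-04,\n"
)

triples = table_to_ntriples(SAMPLE)
for triple in triples:
    print(triple)
```

Because the column headers map directly onto ontology terms, the same statements could later be loaded into a triple store and filtered with SPARQL, as Recommendation 7 anticipates. Headers that do not match a term in the chosen ontology would need an explicit mapping step, which this sketch omits.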

Discussion
These recommendations grew out of our discussion in June 2009. Since then, the recommendations have been revised and modified, and some have been added. Chavan and Ingwersen [31] further elaborated on various components of the data publishing framework, especially pertaining to the issues of persistent identifiers, the data usage index and a data citation mechanism. This was further discussed during the DataCite Summer Workshop 2010 [70]. Members of the Task Group were engaged in exploring solutions to various components of the data publishing framework, some of which are included in this issue [57,59,61,71] and some published elsewhere ([69,72,73] and MJ Costello, WK Michener et al., personal communication).

In January 2011, the US National Science Foundation (NSF) implemented a policy requiring all NSF grant applicants to submit data management plans as a part of any grant proposal [74]. This policy change seems to represent a very significant fulfillment of our recommendation, though the exact details of its implementation remain as yet unclear.

We believe that timely implementation of these recommendations and suggested solutions or approaches by the GBIF network will support much needed recognition for individual and institutional efforts in management and publishing of primary biodiversity data. GBIF’s support of these recommendations should be of critical importance in establishing their credibility and winning their widespread adoption. Implementation of these recommendations should substantially increase the volume of available primary biodiversity data, substantiating public investment in biodiversity science and conservation of biotic resources.

The DPF TG notes several preliminary efforts to implement these recommendations by the GBIF Secretariat. The DPF TG recommendation on incentivizing efforts for metadata authoring has led the GBIF Secretariat to commission Pensoft Publishers to create a ‘data paper’ [71] section in four of its journals (BioRisk, PhytoKeys, NeoBiota and ZooKeys), alongside a ‘push-button’ mechanism to generate XML-encoded manuscripts from metadata descriptions, to be submitted directly to the publisher for peer review, editorial evaluation and publication in the form of a data paper [71]. The BIP Index, an exploratory study to develop metrics to determine country-level biodiversity informatics potentials, has been undertaken [61]. GBIF was, moreover, invited to be part of the group of experts convened by CODATA (the Committee on Data for Science and Technology) to develop an approach to data citation.

We were mandated to make recommendations for potential uptake by the GBIF network. However, we believe that these recommendations apply to the broader biodiversity informatics and ecoinformatics community. Nevertheless, we reiterate that the GBIF network is the most natural venue to kick-start the early implementation of these recommendations. As GBIF enters into its third phase, in which it aspires to be the foremost global resource for biodiversity information [75], early leadership and proactive steps towards implementation of these recommendations are imperative for its success.

Conclusions and future work
The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value and incentives for such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that, ultimately, implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details
1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA.
2 Aundh, Pune 411007, India.
3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK.
4 Royal School of Library and Information Science, Birketinget 6, Copenhagen DK 2300, Denmark.
5 Oslo University College, Pb 4 St Olavs Plass, 0130 Oslo, Norway.
6 Plazi, Zinggst 16, 3600 Bern, Switzerland, and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA.
7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences, and Pensoft Publishers, 13a Geomilev Street, 1111 Sophia, Bulgaria.
8 BioMed Central Ltd, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK.
9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100 Copenhagen, Denmark.

Competing interests
The authors declare that they have no competing interests.

Published: 15 December 2011

References1 Merriam-Webster [httpwwwmerriam-webstercomdictionarydata]2 Wikipedia [httpenwikipediaorgwikiData]3 National Science Foundation Sustainable Digital Data Preservation and

Access Network Partners (DataNet) Program Solicitation NSF 07-601 2008[httpwwwnsfgovpubs2007nsf07601nsf07601htmtoc]

4 AnthroDPA Metadata Working Group Report of the AnthroDPA MetaDataWorking Group May 2009 Sponsored by the Wenner-Gren Foundationand the US NSF[httpanthrodatadpaorgMediaAnthroDataDPA20Reportpdf]

5 Ackoff RL From data to wisdom Journal of Applied Systems Analysis 1989163-9

6 Bellinger C Castro D Mills A Data Information Knowledge and Wisdom2004 [httpwwwsystems-thinkingorgdikwdikwhtm]

7 Bose R Frew J Lineage retrieval for scientific data processing a surveyACM Computing Surveys 2005 371-28

8 Lathe W Williams J Mangan M Karolchik D Genomic data resourceschallenges and promises Nature Education 2008 13[httpwwwnaturecomscitabletopicpageGenomic-Data-Resources-Challenges-and-Promises-743721]

9 Grantham HS Moilanen A Wilson KA Pressey RL Rebelo TGPossingham HP Diminishing return on investment for biodiversity datain conservation planning Conservation Letters 1190-198 doi 101111j1755-263X200800029x

10 Closing the Climategate Nature 2010 468345 doi 101038468345a11 Penev L Erwin T Miller J Chavan V Moritz T Griswold C Publication and

dissemination of datasets in taxonomy ZooKeys working exampleZooKeys 2009 111-8 doi 103897zookeys11210

12 GBIF GBIF Work Programme 2009-2010 Copenhagen Global BiodiversityInformation Facility 2008

13 Merton RK The Normative Structure of Science The Sociology of ScienceTheoretical and Empirical Investigations Chicago University of Chicago Press1979 267-278

14 Cavendish H Read AS Experiments to determine the density of theearth Philos Trans R Soc Lond 1798 II469-526

15 Michener WK Meta-information concepts for ecological datamanagement Ecological Informatics 2006 13-7 doi 101016jecoinf200508004

16 Voss RS Emmons L Mammalian diversity in neotropical lowlandrainforests a preliminary assessment Bulletin of the American Museum ofNatural History 1996 230

17 Nur N Jones SL Geupel GR Statistical Guide to Data Analysis of AvianMonitoring Programs BTP-R6001-1999 Washington DC US Departmentof the Interior Fish and Wildlife Service 1999 61[httplibraryfwsgovPubs9avian_monitoringpdf]


18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf].

19. EDIT: Platform for Cybertaxonomy [http://wp5.e-taxonomy.eu].

20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi].

21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language [http://knb.ecoinformatics.org/eml_metadata_guide.html].

22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205].

23. The Kepler Project [https://kepler-project.org].

24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.

25. DDI Alliance: Metadata specification for social and behavioral sciences, ver 3.1 [http://www.ddialliance.org].

26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing 2009, 9-11, doi: 10.1787/603233448430.

27. International Nucleotide Sequence Database Collaboration [http://insdc.org].

28. GenBank [http://www.ncbi.nlm.nih.gov/Genbank/index.html].

29. European Nucleotide Archive [http://www.ebi.ac.uk/ena].

30. DNA Data Bank of Japan [http://www.ddbj.nig.ac.jp].

31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2, doi: 10.1186/1471-2105-10-S14-S2.

32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17, doi: 10.3897/zookeys.21.274.

33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703, doi: 10.1126/science.1197962.

34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa’s protected areas. Biol Conserv 2010, 143:2221-2228.

35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung].

36. Berlin Declaration: Table of Signatories [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren].

37. About Conservation Commons [http://conservationcommons.net/cc_en_1-about-conservation-commons].

38. Conservation Commons: Partners [http://conservationcommons.net/partners].

39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.

40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.

41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.

42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.

43. GBIF Data Portal [http://data.gbif.org].

44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.

45. Chavan V, Krishnan S: Natural history collections: A call for national information infrastructure. Curr Sci 2003, 84:34-42.

46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.

47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299, doi: 10.1353/lib.0.0036.

48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009) [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group].

49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09: International Conference on Biodiversity Informatics, June 2009, London [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144].

50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.

51. IETF RFC 2119 (Released 1997) [http://www.ietf.org/rfc/rfc2119.txt].

52. CITES [http://www.cites.org].

53. TRAFFIC [http://www.traffic.org].

54. BioPAX - Biological Pathway Exchange [http://www.biopax.org].

55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.

56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf].

57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.

58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf].

59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.

60. DataCite Metadata [https://www.datadryad.org/wiki/DataCite_Metadata].

61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.

62. Science Commons: Protocol for Implementing Open Access Data [http://sciencecommons.org/projects/publishing/open-access-data-protocol].

63. Open Knowledge Foundation [http://okfn.org].

64. Morris R, Olson A, O’Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O’Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O’Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf].

68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2001 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.

69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]. Primary Biodiversity Data.

70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4].


71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.

72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.

73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for ‘Data Discovery and Publishing Strategy and Action Plans’, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

74. NSF: Data Management Plan Requirements [http://www.nsf.gov/eng/general/dmp.jsp].

75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf].

76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi: 10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011, 12(Suppl 15):S1.



or may not include the precise data tables or more general datasets/tables that are the referent(s) for a given scientific paper. Each data record is structured in fields, with specifications for appropriate field content. In the context of this article, ‘primary biodiversity data’ is defined as digital text or multimedia data records providing facts about the instance of an organism: the what, where, when, how and by whom of the occurrence and the recording [12]. By this definition, data tables and datasets are inextricably linked to scientific papers, and the publisher must assure consistent and secure access in perpetuity to referent data tables and datasets [11]. Thus, these collections of data impose the heaviest burden of responsibility on the publisher for sustained access.

With respect to the publishing of data, the customary practices of science suggest that data providing the evidence for conclusions drawn in a scientific paper or report should be available for review, evaluation and testing. This provision is fundamental to the objective practice of science as ‘organized skepticism’ [13]. Appropriate standards for testing data vary depending on the exact nature of the data. For example, in situ field data are evaluated by consideration of the field context, the methodology or apparatus used to collect data, the consistency or inconsistency with other comparable studies, the quality and detail of the reported observations, photographs or audio recordings, and material evidence (specimen, genetic sample, scat, tracks and so on). The actual practicability of testing and assessing data is highly dependent on the thoroughness with which data are described and how completely the context for data collection is described. This leads logically to the question of metadata as a source of necessary contextual information about data.
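The definition of ‘primary biodiversity data’ given above (the what, where, when, how and by whom of an occurrence) maps naturally onto a record with one field per question. The following is a minimal sketch in Python, not a GBIF schema: the class name, the field names and the example values are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OccurrenceRecord:
    """One primary biodiversity data record (hypothetical field names)."""
    what: str                        # the organism recorded
    where: str                       # locality of the occurrence
    when: str                        # date of the occurrence
    how: str                         # collection or observation method
    by_whom: str                     # person or agent who made the record
    evidence: Optional[str] = None   # e.g. specimen, photograph, audio recording

    def unanswered(self) -> list:
        """Names of the five core questions left blank in this record."""
        core = ("what", "where", "when", "how", "by_whom")
        return [name for name in core if not getattr(self, name).strip()]


# Invented example values, for illustration only.
complete = OccurrenceRecord(
    what="Formica rufa",
    where="52.10 N, 5.18 E",
    when="2010-06-14",
    how="pitfall trap",
    by_whom="A. Collector",
    evidence="specimen",
)
partial = OccurrenceRecord(
    what="Formica rufa", where="", when="2010-06-14", how="", by_whom="A. Collector"
)

print(complete.unanswered())
print(partial.unanswered())
```

A record that leaves any of the five questions blank is harder to review and test, which is the practical concern raised in the preceding paragraph.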

How data have meaning: metadata

‘2607’ and ‘059998’ are each an actual datum or ‘data point’. It is immediately obvious that, without any description of context for the creation and capture of data, an isolated datum is meaningless. Descriptive information is necessary to impart meaning. The former datum was recorded by Henry Cavendish in his “Experiments to Determine the Density of the Earth” (21 June 1798) and was published in the Philosophical Transactions of the Royal Society of London [14]. The Cavendish datum was a result of a humanly contrived experiment using a specially designed apparatus. The latter datum is a reading obtained from automated data loggers recording sap flow in Manzanita plants at the University of California James Reserve, Mt San Jacinto, California (4 December 2007, 11:37) and was recorded by a data logger in an as yet unpublished Microsoft Excel spreadsheet (Gary Geller, 2010, personal communication).

However, in the simple contexts disclosed above, we have learned that some agent conducted a data gathering exercise at a given date and time and at a described place. Inference of a probable general scientific domain or discipline for the data - for example, physics or ecology or botany - provides only a very general delimiter of the probable character of the data. We do not, for example, know the actual type of automated data logger used, its proper calibration, the actual details of its deployment in this instance of use, or the competence of the person using the data logger. Lacking this information, and other information that would serve to validate the quality of the data presented, we are challenged with the need to develop and to provide more complete descriptions to make data fit for use and, in particular, fit for testing and evaluation.
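The gap described above, between what the simple context discloses and what testing would require, can be made concrete with a small sketch. It assumes a hypothetical list of required context fields (the names are ours, not a published standard) and reports which of them a datum still lacks. The raw values are copied as they appear in the text.

```python
from dataclasses import dataclass, field

# Context a reviewer would need before a datum can be tested.
# These field names are illustrative assumptions, not a standard.
REQUIRED_CONTEXT = (
    "recorded_by", "recorded_on", "place", "domain",
    "instrument_type", "calibration", "deployment", "operator_competence",
)


@dataclass
class Datum:
    value: str                                   # raw value, exactly as recorded
    context: dict = field(default_factory=dict)  # descriptive information

    def missing_context(self) -> list:
        """Required context fields that are absent or empty."""
        return [k for k in REQUIRED_CONTEXT if not self.context.get(k)]

    def fit_for_testing(self) -> bool:
        """True only when every required context field is present."""
        return not self.missing_context()


# An isolated datum: no context at all.
bare = Datum(value="2607")

# The sap flow reading with only the context disclosed in the text.
sap_flow = Datum(
    value="059998",
    context={
        "recorded_by": "automated data logger",
        "recorded_on": "2007-12-04T11:37",
        "place": "University of California James Reserve, Mt San Jacinto, California",
        "domain": "ecology",  # inferred, and only a very general delimiter
    },
)

print(bare.missing_context())
print(sap_flow.missing_context())
```

Under these assumptions the sap flow reading still lacks the instrument type, calibration, deployment details and operator competence, which are exactly the items the paragraph identifies as unknown.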

Provision of metadata

To avoid the risks of overly intricate and elaborate metadata standards that fail by requiring inordinate investments of time and resources, we suggest that metadata be initially designed to provide minimally adequate description for discovery and access to data. We propose that, in the interests of optimal efficiency of effort, careful efforts be made to apply inference and recursion in creation of such minimally adequate metadata, and that metadata subsequently be available for the continuing addition of fresh increments of metadata. This recommendation implies that metadata creation should be a continuous, collaborative process, not a single event. Specifically, with respect to museum collections, we recommend that links to relevant type specimens be included as a part of the metadata record.

Moreover, we believe that by careful application of qualified social tagging - that is, of indexing by expert users applying well-formed, ontologically suitable vocabularies and authority files - substantial development and enrichment of metadata records can be accomplished (this recommendation requires applications that can support a dynamic, coherent and iterative development of metadata over time) [15].

We also suggest that assessment of the fitness of metadata for use be considered from the ‘demand side’, by asking how data have typically been used to best effect in the creation of biodiversity knowledge and policy.

There are many technical publications - for example, Voss and Emmons’ ‘Mammalian diversity in neotropical lowland rainforests: a preliminary assessment’ [16], the US Fish and Wildlife Service’s ‘Statistical guide to data analysis of avian monitoring programs’ [17] or Agosti et al.’s ‘Ants: standard methods for measuring and monitoring biodiversity’ [18] - that provide detailed descriptions of common data collection methods or of statistical processes applied to biodiversity data. Recently, the European Union Framework Programme 6 project EDIT (European Distributed Institute for Taxonomy) has developed a complete workflow from data collection in the field to assembly of datasets and analyses [19,20]. These and many other works provide guidance in the development of standard ontologies for data description.

We recommend a research process that - from an ontological perspective - systematically reviews, analyzes and specifies how data can most efficiently be supplied to fit the needs of these primary biodiversity-monitoring processes. We suggest detailed survey and analysis of the primary and standard forms of processing that, by community consensus, are of greatest proven value and impact in biodiversity conservation. This assures that investments in data collection will have optimal probative force. Based on this analysis, standards can be ‘reverse engineered’ to produce data best suited to the demands of biodiversity conservation.

We also strongly recommend careful analysis of standards already under development. The Ecological Metadata Language (EML) [21], under continuing development, has made significant progress, but we believe that the issues raised elsewhere in this report have yet to be addressed. Specifically, significant ontological work remains to be accomplished regarding the analysis and standard definition of biological field techniques, data transformation methods and statistical processes.

We also believe that the scripting capacity of standard statistical packages [22] and still emergent applications for documenting scientific workflow (such as Kepler [23]) may both have direct utility in recording the process and context for scientific data capture. A notable example of such workflow capture is in the Galaxy genomics platform [24]. Ontological research and development, coupled with applications development, should provide the necessary foundations for required descriptions of data.

In the social sciences, the Data Documentation Initiative, based at the University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR), has been underway for several years and is now at version 3.1 [25]. Similarly, a 2009 publication of the OECD has proposed a model template for metadata describing a published dataset [26]. The requirement of free text abstracts may provide an adequate frame for such detailed specification, but considerable additional work will be demanded, particularly in deriving minimal descriptive standards for discovery of biodiversity data.

Metadata become increasingly important in exposing data to discovery as the units into which data are assembled become smaller. The molecular sequence repositories developed and maintained by the International Nucleotide Sequence Database Collaboration (INSDC [27]), such as GenBank [28], ENA [29] and DDBJ [30], are perhaps among the best known examples of data repositories. Although the search interfaces and the utility of data contained within GenBank are very limited (and especially geared for molecular biologists), its global prominence makes it an obvious search target. Biodiversity data in general are far more complicated and tend to be made available in smaller blocks, for example the data associated with a single publication. Locating and combining data relevant to a particular purpose thus becomes a goal in itself, and is made possible through the existence of metadata using standard vocabularies.
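The approach proposed in this section (a minimally adequate record for discovery and access, enriched incrementally by expert tagging against a controlled vocabulary, and then used to locate small datasets) can be sketched as follows. This is an illustration under stated assumptions only: the vocabulary, the record fields, the identifiers and the example.org URLs are all hypothetical, and a real system would draw terms from a community-maintained ontology or authority file.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical controlled vocabulary standing in for a shared ontology.
VOCABULARY = {"occurrence", "sap-flow", "mammals", "ants", "birds", "monitoring"}


@dataclass
class MetadataRecord:
    # Minimally adequate description for discovery and access.
    identifier: str
    title: str
    access_url: str
    type_specimen_link: Optional[str] = None         # for museum collections
    keywords: set = field(default_factory=set)
    increments: list = field(default_factory=list)   # (contributor, note) pairs

    def tag(self, contributor: str, term: str) -> None:
        """Add one keyword, accepting only terms from the vocabulary."""
        if term not in VOCABULARY:
            raise ValueError(f"{term!r} is not in the controlled vocabulary")
        self.keywords.add(term)
        self.increments.append((contributor, f"tagged {term}"))


def discover(records, term):
    """Identifiers of all records tagged with the given vocabulary term."""
    return sorted(r.identifier for r in records if term in r.keywords)


records = [
    MetadataRecord("ds-001", "Ant pitfall survey", "https://example.org/ds-001"),
    MetadataRecord("ds-002", "Bird point counts", "https://example.org/ds-002"),
]

# Metadata creation as a continuing, collaborative process.
records[0].tag("expert-a", "ants")
records[0].tag("expert-b", "monitoring")
records[1].tag("expert-a", "birds")
records[1].tag("expert-a", "monitoring")

print(discover(records, "monitoring"))
```

Restricting tags to a shared vocabulary is what lets two small, independently published datasets be found by the same query; free-text tags would not guarantee that.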

Open access and biodiversity data

Open access to primary biodiversity data is essential both for enabling effective decision making and for empowering stakeholders involved with and affected by the conservation of biodiversity [31-33]. Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis, evaluation and, if feasible, re-generation of data on which conclusions are based. Biodiversity science is not exempt from restrictions on access to such data. For example, authors of a paper published on the failure of African game parks to successfully conserve large mammals were unable to present local data gathered from reserve operators, who wanted it to be kept confidential [34].

There is broad emerging consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way [35-38]. However, many existing primary biodiversity data are neither accessible nor discoverable [39]. This issue is further compounded by lack of appropriate representation and/or visualization of available data, and lack of linkability among distributed and heterogeneous data resources [40,41]. This adversely affects the optimal utility of the biodiversity data. Thus, an urgent need exists for the discovery of primary biodiversity data and its publication in the public domain.

For decades, there have been declarations, statements, policies and guidelines encouraging open access to primary scientific data [31,42]. With the establishment of the Global Biodiversity Information Facility (GBIF) in 2001, an attempt has been made to develop a global infrastructure to consolidate the discovery of the world’s primary biodiversity data and to provide coherent access. Currently, the GBIF network facilitates access to nearly 304 million data records through its portal [43]. However, these primary biodiversity data records are just a fraction of the estimated volume of existing data [44-47]. This large volume of biodiversity data, collected by a vast number of biodiversity researchers and amateurs [31,47], remains largely undiscovered and unpublished. This is attributable, we believe, to a lack of encouragement, misperceptions of self-interest, or lack of infrastructural support. Although infrastructure support is increasingly available, the problem of appropriate professional recognition for institutions and individuals remains [31]. We believe that this lack of incentive remains a major impediment to the provision of free and open access to primary biodiversity data.

The GBIF data publishing framework task group

The foregoing discussion emphasizes the need for a data publishing framework to evolve metrics and indicators that provide incentives to multiple actors involved in the generation of data. Recognizing the need for addressing social, policy, political and technical issues influencing discovery and publishing through the GBIF network, the GBIF Data Publishing Framework Task Group (DPF TG) was commissioned in March 2009 [48]. The DPF TG was tasked with providing recommendations on: (a) social, technical and policy interventions that would encourage publication of primary biodiversity data as a necessary and in-built step in the scientific data management cycle; (b) opportunities and mechanisms to incentivize and attribute credit for investment in primary biodiversity data publishing, from individual to institutional to national levels; and (c) mechanisms/processes for recognizing efforts of data publishers. The concept of the data publishing framework was described at the International Biodiversity Informatics Conference (‘e-Biosphere 09’) held in London in June 2009 [49]. In its meeting in June 2009, the DPF TG discussed issues influencing discovery and publishing of primary biodiversity data, and possible solutions for overcoming impediments.

A data publishing framework for primary biodiversity data

During its meeting in June 2009, the DPF TG invested significant time in defining and determining the scope and purpose of the data publishing framework for primary biodiversity data. The DPF TG recognized the need expressed by the data originators and information system/networks for data usage metrics and indicators to ensure that the overall utility and impact of their data management and publishing activities is objectively documented, leading to crediting of these activities as scientific activity on a par with the recognition received for conventional scholarly publication [31]. Furthermore, measures of scientists’ productivity will be better informed through data publishing, which requires a professional cultural change in the recognition of scientific output [50]. Such an incentive mechanism would achieve increased data mobilization and increased recognition for data generation, both desirable outcomes for scientists.

Our discussion examined five primary components that comprise a data publishing framework. These components are (a) socio-cultural, (b) technical-infrastructural, (c) policy-political, (d) legal and (e) economic, and they support various activities of the data publishing cycle (see Figure 1 in [31]). These components are not only complementary but are inter-dependent. Thus, there is no dependency on a sequence of components, as components need to be implemented concurrently. Therefore, we define a data publishing framework as an environment conducive to ensuring free and open access to the world’s primary biodiversity data. The core purpose of the framework is to overcome barriers or impediments affecting access to data and the publishing of data.

Recommendations

On the basis of our understanding of issues influencing ‘free and open access’, discovery and publishing of the primary biodiversity data, and to encourage institutionalization of the data publishing framework for discovery, publishing and use of primary biodiversity data, we make specific recommendations. The key words ‘must’, ‘must not’, ‘required’, ‘shall’, ‘shall not’, ‘should’, ‘should not’, ‘recommended’, ‘may’ and ‘optional’ in this document are to be interpreted as described in RFC 2119, ‘Key words for use in RFCs to Indicate Requirement Levels’, of the Internet Engineering Task Force [51].

Sharing of biodiversity data must be the expected norm. We stipulate that withholding of data - to protect precise localities for collectible or marketable plants or animals, or for species of special concern - should be the exception and require explicit justification. We emphasize that such data represent a small fraction of biodiversity data and should not be allowed to dictate normal practice. We also stipulate that our call for access to biodiversity data does not supersede national or indigenous rights to regulate uses of biodiversity data as protection against commercial exploitation (‘biopiracy’). To this end, we suggest close consultation and confirmation with CITES [52] and the TRAFFIC Secretariat [53] when questions of this kind occur. As a corollary, all contributors of data must receive appropriate, proportional recognition for their contributions of data. Against this backdrop, we offer 24 recommendations. Recommendation 1 is, however, the primary recommendation that leads to the other recommendations.

Recommendation 1: All data relevant to the understanding of biodiversity and to biodiversity conservation should be made freely, openly and effectively available.


Recommendation 2 GBIF must re-examine its cur-rent data resources endorsement model and scrutinizethe current practice that national nodes or associateparticipant nodes are required to give endorsementbefore the data are discovered and indexed throughGBIF networkRecommendation 3 GBIF must engage mainstream

scholarly publishers and scientific societies with scho-larly publications to be part of the GBIF network as amajority of them would qualify to be thematicglobalregional associate-participantsRecommendation 4 GBIF must support the develop-

ment of a tool to convert tabular data into resourcedescription framework (RDF) formats conforming to astandard ontology This would be highly desirable forsmall custodianspublishers but is primarily a tool formainstream scholarly publishers (Support for develop-ment of such an open source application should besought from mainstream commercial publishers) GBIFshall evaluate standards such as BioPax [54]Recommendation 5 GBIF must facilitate discovery

and mobilization of all streamstypes of relevant biodi-versity data (This effort should - in close collaborationwith others focusing on this development - includeontological analysis of the most important types of datato be considered the elaboration of suitable workingformats for that data and the developing of mappingstofrom such working formats to a standard RDF formatfor interchange purposes)Recommendation 6 GBIF should develop a set of

supporting tools (such as templates) for biodiversitydata to accommodate more than simple occurrencedata GBIF must increasingly engage with various biodi-versity data communitiesRecommendation 7 GBIF must facilitate discovery of

un-digitized and not yet published datasets togetherwith indexing of published datasets (potentially toinclude semantic indexing based on RDF to allow data-sets to be filtered and retrieved with SPARQL queries)In this regard we strongly endorse the recommendationby the GBIF Global Strategy and Action Plan for Mobili-zation of Natural History Collections data [55]Recommendation 8 GBIF should review the use of

legacy literature such as is stored in Biodiversity Heri-tage Library (BHL) to explore uses of marked-up textsfor data mining and capture of historical biodiversityinformationRecommendation 9 GBIF must explore and develop

the capacity to run queries at the GBIF data portal toreturn harmonized well formed XML andor RDF suchthat fields can be extracted for subsequent analysisRecommendation 10 GBIF must expand and

improve its metadata implementation framework tosuch that fitness for use of the data resource for

intended use can be ascertained from metadata Forexample data records should identify lineage and prove-nance (where data originated and from which dataresource) of all contributed data - at least to the pre-vious phase of data transformation Further we stronglyencourage early implementation of the recommenda-tions of the GBIF Metadata Implementation FrameworkTask Group [56]Recommendation 11 GBIF must strengthen its net-

work of mirror sites and distributed network of lsquotrusteddigital repositoriesrsquo (also called data hosting centers) Inthis regard we call on GBIF to ensure early implementa-tion of the recommendations in this issue on data host-ing infrastructure [57]Recommendation 12 GBIF must explore the feasibil-

ity of using a cloud infrastructure to overcome barriers of investment and maintenance required for biodiversity data discovery and publishing, especially in the developing and under-developed regions of the world.

Recommendation 13: GBIF must ensure an early implementation of the recommendations of the GBIF Life Sciences Identifier (LSID)/globally unique identifier (GUID) Task Group [58]. We further emphasize the need for GBIF to adopt a stable and proven persistent identifier such as the ‘digital object identifier’ (doi), rather than unstable persistent identifiers.

Recommendation 14: GBIF must explore the potential of the Data Usage Index (DUI) as a potential incentivization mechanism to recognize efforts required for publishing of biodiversity data [31,59]. GBIF should develop a prototype of such an implementation.

Recommendation 15: GBIF must institutionalize a ‘data citation mechanism’ and establish a ‘data citation service’ facilitating deep-data citation and the registration and resolving of citations [26]. For the purposes of accountability and citation (attribution), all contributors of data to any aggregation should be identified and acknowledged. Individuals or institutions responsible for primary data have an obligation to make these ownership statements available to the aggregators, who are responsible for using them. The Dryad application, which uses DataCite to register dois, is an initial effort to address this concern [60]. In any data aggregation chain, the aggregator at each level is responsible for identification of data sources from the previous level of aggregation and its contributors. We believe that this provision avoids the complexity of comprehensive identity of all ‘cascaded’ data sources and contributors during the aggregation process. It is, of course, nevertheless the case that the validity and integrity of data are ultimately linked to the sum of the integrity and validity of all data processes in the lineage of data creation.

Recommendation 16: GBIF should investigate innovative mechanisms for discovery and publishing of


primary biodiversity data in multiple languages. GBIF should commission a position paper detailing such mechanisms for potential uptake by the community.

Recommendation 17: GBIF must institutionalize the ‘biodiversity informatics potential’ (BIP) Index to demonstrate the potential and urgency for nations to implement biodiversity informatics [61]. In the long term, GBIF must lead the periodic release of a ‘global biodiversity information outlook’ report analyzing the current state of biodiversity information to meet the local-to-global scale biodiversity targets.

Recommendation 18: GBIF must commission a strategy paper demystifying the concerns/issues related to intellectual property rights and primary biodiversity data. In this regard, the substantial work done by the Science Commons (for example, the Science Commons Protocol for Implementing Open Access Data [62]) and the Open Knowledge Foundation [63] should have direct application.

Recommendation 19: GBIF should encourage sponsors of biodiversity research, whether government agencies, corporations or private foundations, to set mandatory requirements for free and open access to biodiversity data. GBIF should encourage that negotiations for overhead (indirect) cost contributions from funders include calculations of cost for sustained digital infrastructure that is adequate for free and open sharing and the sustained, secure and persistent maintenance of data. Proposals should be expected to include adequate planning and financial provision for sustained data management and access. We further recommend that GBIF should encourage peer review processes that include rigorous scrutiny of past histories of successful sharing, and should support the norm of state-of-the-art planning for sharing, not simply promises to “put data on the web”.

Recommendation 20: GBIF must develop a plan to foster linkages between scholarly publishers and data publishers from the local to the global scale. GBIF should encourage that records of professional publication be evaluated - at least in part - on the basis of publication in open access journals that do not deny access through ‘paywalls’ and that provide support for sustainable open access to data.

Recommendation 21: GBIF should urge accreditation bodies for educational institutions and museums to require demonstrated evidence of capacity to support digital access and maintenance of data.

Recommendation 22: GBIF should encourage professional societies and professional disciplines to require evidence of effective sharing of data in evaluations for hiring, promotion and tenure.

Recommendation 23: GBIF should develop a conceptual ‘landscape map’ depicting GBIF’s position, role, unique advantages and collaborative strategies amid the many biodiversity and biodiversity informatics initiatives at local to global scales. This is very important given the broad reach of the earlier recommendations. It is important that the scope of GBIF’s own vision and mission is well defined, with a clear picture of how GBIF’s role fits into a wider framework of sustainable development and of free and open access to biodiversity data.

Recommendation 24: GBIF must evaluate, prioritize and implement the recommendations made by its task groups - the Content Needs Assessment Task Group (CNA TG) [42], the Multimedia Resources Task Group (MRTG) [64,65], the Metadata Implementation Framework Task Group (MIFTG) [56], the LSID-GUID Task Group (LGTG) [58] and the Observational Data Task Group (ODTG) [66] - and in the Global Strategy and Action Plan for Natural History Collections Data (GSAP-NHC) [55], as well as the recommendations on e-learning [67], Knowledge Organization Systems (KOS) [68] and fitness for use [69].
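As an illustration only, the attribution provision in Recommendation 15 (each aggregator identifies the data sources and contributors of the previous level, while the complete lineage remains recoverable by following the chain) might be sketched in Python as follows. The class, the function names and the example sources are hypothetical and are not part of any GBIF, Dryad or DataCite interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One node in a hypothetical aggregation chain."""
    name: str
    contributors: list[str] = field(default_factory=list)
    # Only the previous level of aggregation is recorded at each node.
    direct_sources: list["DataSource"] = field(default_factory=list)


def direct_attribution(aggregator: DataSource) -> list[str]:
    """Sources and contributors this aggregator must itself acknowledge."""
    names = []
    for source in aggregator.direct_sources:
        names.append(source.name)
        names.extend(source.contributors)
    return names


def full_lineage(node: DataSource) -> list[str]:
    """Recover every upstream source by walking the chain level by level."""
    lineage = []
    for source in node.direct_sources:
        lineage.append(source.name)
        lineage.extend(full_lineage(source))
    return lineage


# Invented example chain: two primary sources, one national node, one portal.
museum = DataSource("Museum collection", contributors=["A. Collector"])
survey = DataSource("Field survey", contributors=["B. Observer"])
national = DataSource("National node", direct_sources=[museum, survey])
portal = DataSource("Global portal", direct_sources=[national])

print(direct_attribution(portal))  # previous level only
print(full_lineage(portal))        # all levels, reached by recursion
```

Under these assumptions no single aggregator has to enumerate all ‘cascaded’ sources, yet the integrity of the whole lineage still depends on every level recording its own sources faithfully.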

Discussion
These recommendations grew out of our discussion in June 2009. Since then, there have been subsequent revisions and modifications of the recommendations, and some additions. Chavan and Ingwersen [31] further elaborated on various components of the data publishing framework, especially pertaining to the issues of persistent identifiers, the data usage index and a data citation mechanism. This was further discussed during the DataCite Summer Workshop 2010 [70]. Members of the Task Group were engaged in exploring solutions to various components of the data publishing framework, some of which are included in this issue [57,59,61,71] and some published elsewhere [69,72,73] (and M.J. Costello, W.K. Michener et al., personal communication).

In January 2011, the US National Science Foundation (NSF) implemented a policy requiring all NSF grant applicants to submit data management plans as a part of any grant proposal [74]. This policy change seems to represent a very significant fulfillment of our recommendation, though the exact details of its implementation remain as yet unclear.

We believe that timely implementation of these recommendations and suggested solutions or approaches by the GBIF network will support much needed recognition for individual and institutional efforts in management and publishing of primary biodiversity data. GBIF’s support of these recommendations should be of critical importance in establishing their credibility and winning their widespread adoption. Implementation of these recommendations should substantially increase the volume of available primary


biodiversity data, substantiating public investment in biodiversity science and conservation of biotic resources.

The DPF TG notes several preliminary efforts by the GBIF Secretariat to implement these recommendations. The DPF TG recommendation on incentivizing efforts for metadata authoring has led the GBIF Secretariat to commission Pensoft Publishers to create a ‘data paper’ [71] section in four of its journals (BioRisk, PhytoKeys, NeoBiota and ZooKeys), alongside a ‘push-button’ mechanism to generate XML-encoded manuscripts from metadata descriptions, to be submitted directly to the publisher for peer review, editorial evaluation and publication in the form of a data paper [71]. The BIP Index, an exploratory study to develop metrics to determine country-level biodiversity informatics potential, has been undertaken [61]. GBIF was, moreover, invited to be part of the group of experts convened by CODATA (the Committee on Data for Science and Technology) to develop an approach to data citation.

We were mandated to make recommendations for potential uptake by the GBIF network. However, we believe that these recommendations apply to the broader biodiversity informatics and ecoinformatics community. Nevertheless, we reiterate that the GBIF network is the most natural venue to kick-start the early implementation of these recommendations. As GBIF enters its third phase, in which it aspires to be the foremost global resource for biodiversity information [75], early leadership and a proactive step towards implementation of these recommendations are imperative for its success.
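To make the idea of a ‘push-button’ conversion from a metadata description to an XML-encoded manuscript more concrete, a minimal Python sketch follows. It is an assumption-laden illustration: the element names, the attribute and the metadata fields are hypothetical and do not reproduce the schema or tooling actually used by the GBIF Secretariat or Pensoft Publishers.

```python
import xml.etree.ElementTree as ET


def metadata_to_manuscript(metadata: dict) -> str:
    """Build a skeletal XML manuscript from a flat metadata description.

    Element names (article, title, authors, ...) are illustrative
    assumptions, not any publisher's actual schema.
    """
    article = ET.Element("article", {"type": "data-paper"})
    ET.SubElement(article, "title").text = metadata["title"]

    authors = ET.SubElement(article, "authors")
    for name in metadata["creators"]:
        ET.SubElement(authors, "author").text = name

    ET.SubElement(article, "abstract").text = metadata["abstract"]
    ET.SubElement(article, "dataset-identifier").text = metadata["identifier"]
    return ET.tostring(article, encoding="unicode")


# Hypothetical metadata record, invented for this example.
example = {
    "title": "Example occurrence dataset",
    "creators": ["A. Author", "B. Author"],
    "abstract": "Hypothetical dataset used only to illustrate the idea.",
    "identifier": "hypothetical-id-0001",
}
xml_text = metadata_to_manuscript(example)
print(xml_text)
```

The point of the sketch is that a manuscript skeleton can be derived mechanically from metadata that the data publisher has already authored, so that the additional effort of preparing a data paper is small.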

Conclusions and future work
The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value of, and incentives for, such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that, ultimately, implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development, and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details
1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA. 2 Aundh, Pune 411007, India. 3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK. 4 Royal School of Library and Information Science, Birketinget 6, Copenhagen DK 2300, Denmark. 5 Oslo University College, Pb 4 St Olavs Plass, 0130 Oslo, Norway. 6 Plazi, Zinggst 16, 3600 Bern, Switzerland and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA. 7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences and Pensoft Publishers, 13a Geomilev Street, 1111 Sophia, Bulgaria. 8 BioMedCentral Ltd, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK. 9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100 Copenhagen, Denmark.

Competing interests
The authors declare that they have no competing interests.

Published: 15 December 2011

References
1. Merriam-Webster. [http://www.merriam-webster.com/dictionary/data]
2. Wikipedia. [http://en.wikipedia.org/wiki/Data]
3. National Science Foundation: Sustainable Digital Data Preservation and Access Network Partners (DataNet). Program Solicitation NSF 07-601; 2008 [http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm#toc].
4. AnthroDPA Metadata Working Group: Report of the AnthroDPA MetaData Working Group. May 2009. Sponsored by the Wenner-Gren Foundation and the US NSF [http://anthrodatadpa.org/Media/AnthroDataDPA%20Report.pdf].
5. Ackoff RL: From data to wisdom. Journal of Applied Systems Analysis 1989, 16:3-9.
6. Bellinger C, Castro D, Mills A: Data, Information, Knowledge, and Wisdom. 2004 [http://www.systems-thinking.org/dikw/dikw.htm].
7. Bose R, Frew J: Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 2005, 37:1-28.
8. Lathe W, Williams J, Mangan M, Karolchik D: Genomic data resources: challenges and promises. Nature Education 2008, 1:3 [http://www.nature.com/scitable/topicpage/Genomic-Data-Resources-Challenges-and-Promises-743721].
9. Grantham HS, Moilanen A, Wilson KA, Pressey RL, Rebelo TG, Possingham HP: Diminishing return on investment for biodiversity data in conservation planning. Conservation Letters 1:190-198, doi: 10.1111/j.1755-263X.2008.00029.x.
10. Closing the Climategate. Nature 2010, 468:345, doi: 10.1038/468345a.
11. Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C: Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 2009, 11:1-8, doi: 10.3897/zookeys.11.210.
12. GBIF: GBIF Work Programme 2009-2010. Copenhagen: Global Biodiversity Information Facility; 2008.
13. Merton RK: The Normative Structure of Science. The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press; 1979, 267-278.
14. Cavendish H, Read AS: Experiments to determine the density of the earth. Philos Trans R Soc Lond 1798, II:469-526.
15. Michener WK: Meta-information concepts for ecological data management. Ecological Informatics 2006, 1:3-7, doi: 10.1016/j.ecoinf.2005.08.004.
16. Voss RS, Emmons L: Mammalian diversity in neotropical lowland rainforests: a preliminary assessment. Bulletin of the American Museum of Natural History 1996, 230.
17. Nur N, Jones SL, Geupel GR: Statistical Guide to Data Analysis of Avian Monitoring Programs. BTP-R6001-1999. Washington DC: US Department of the Interior, Fish and Wildlife Service; 1999, 61 [http://library.fws.gov/Pubs9/avian_monitoring.pdf].


18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf].
19. EDIT Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu]
20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi].
21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html]
22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205].
23. The Kepler Project. [https://kepler-project.org]
24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.
25. DDI Alliance: Metadata specification for social and behavioral sciences, ver. 3.1. [http://www.ddialliance.org]
26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing; 2009, 9-11, doi: 10.1787/603233448430.
27. International Nucleotide Sequence Database Collaboration. [http://insdc.org]
28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html]
29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena]
30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp]
31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2, doi: 10.1186/1471-2105-10-S14-S2.
32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17, doi: 10.3897/zookeys.21.274.
33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703, doi: 10.1126/science.1197962.
34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa’s protected areas. Biol Conserv 2010, 143:2221-2228.
35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung].
36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren]
37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons]
38. Conservation Commons Partners. [http://conservationcommons.net/partners]
39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.
40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.
41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.
42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.
43. GBIF Data Portal. [http://data.gbif.org]
44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.
45. Chavan V, Krishnan S: Natural history collections: A call for national information infrastructure. Curr Sci 2003, 84:34-42.
46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.
47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299, doi: 10.1353/lib.0.0036.

48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group]
49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09: International Conference on Biodiversity Informatics, June 2009, London [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144].
50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.
51. IETF RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt]
52. CITES. [http://www.cites.org]
53. TRAFFIC. [http://www.traffic.org]
54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org]
55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.
56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf].
57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.
58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf].
59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.
60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata]
61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.
62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol]
63. Open Knowledge Foundation. [http://okfn.org]
64. Morris R, Olson A, O’Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O’Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O’Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf].
68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2001 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.
69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]. Primary Biodiversity Data.
70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4].


71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.
72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.
73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for ‘Data Discovery and Publishing Strategy and Action Plans’, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
74. NSF Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp]
75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [httpgbifddbjnigacjpgbif_newsuploadGBIF_Strategic_Plan_2012-16pdf].
76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011 12(Suppl 15):S1.



statistical processes applied to biodiversity data. Recently, the European Union Framework Programme 6 project EDIT (European Distributed Institute for Taxonomy) has developed a complete workflow from data collection in the field to assembly of datasets and analyses [19,20]. These and many other works provide guidance in the development of standard ontologies for data description.

We recommend a research process that - from an ontological perspective - systematically reviews, analyzes and specifies how data can most efficiently be supplied to fit the needs of these primary biodiversity-monitoring processes. We suggest detailed survey and analysis of the primary and standard forms of processing that, by community consensus, are of greatest proven value and impact in biodiversity conservation. This assures that investments in data collection will have optimal probative force. Based on this analysis, standards can be ‘reverse engineered’ to produce data best suited to the demands of biodiversity conservation.

We also strongly recommend careful analysis of standards already under development. The Ecological Metadata Language (EML) [21], under continuing development, has made significant progress, but we believe that the issues raised elsewhere in this report have yet to be addressed. Specifically, significant ontological work remains to be accomplished regarding the analysis and standard definition of biological field techniques, data transformation methods and statistical processes.

We also believe that the scripting capacity of standard

statistical packages [22] and still emergent applications for documenting scientific workflow (such as Kepler [23]) may both have direct utility in recording the process and context for scientific data capture. A notable example of such workflow capture is in the Galaxy genomics platform [24]. Ontological research and development, coupled with applications development, should provide the necessary foundations for required descriptions of data.

In the social sciences, the Data Documentation Initiative, based at the University of Michigan’s Inter-university Consortium for Political and Social Research (ICPSR), has been underway for several years and is now at version 3.1 [25]. Similarly, a 2009 publication of the OECD has proposed a model template for metadata describing a published dataset [26]. The requirement of free text abstracts may provide an adequate frame for such detailed specification, but considerable additional work will be demanded, particularly in deriving minimal descriptive standards for discovery of biodiversity data.

Metadata become increasingly important in exposing data to discovery as the units into which data are assembled become smaller. The molecular sequence repositories developed and maintained by the International Nucleotide Sequence Database Collaboration (INSDC [27]), such as GenBank [28], ENA [29] and DDBJ [30], are perhaps among the best known examples of data repositories. Although the search interfaces and the utility of data contained within GenBank are very limited (and especially geared to molecular biologists), its global prominence makes it an obvious search target. Biodiversity data in general are far more complicated and tend to be made available in smaller blocks, for example the data associated with a single publication. Locating and combining data relevant to a particular purpose thus becomes a goal in itself, and is made possible through the existence of metadata using standard vocabularies.
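The role of standard vocabularies in locating and combining small blocks of data can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the vocabulary terms, the dataset titles and the function names are hypothetical and do not correspond to EML, to GBIF services or to any existing standard.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary; real terms would come from a
# community standard, not from this sketch.
VOCABULARY = {"taxon:Formicidae", "taxon:Aves", "country:IN", "country:DK"}


@dataclass(frozen=True)
class DatasetMetadata:
    title: str
    terms: frozenset


def unknown_terms(record: DatasetMetadata) -> frozenset:
    """Return the terms of a record that are not in the shared vocabulary."""
    return record.terms - VOCABULARY


def discover(catalogue, wanted) -> list:
    """Return titles of datasets whose metadata carry every wanted term."""
    wanted = frozenset(wanted)
    return [record.title for record in catalogue if wanted <= record.terms]


# Three small, invented datasets, such as might accompany single publications.
catalogue = [
    DatasetMetadata("Ant survey, plot A", frozenset({"taxon:Formicidae", "country:IN"})),
    DatasetMetadata("Bird counts 2009", frozenset({"taxon:Aves", "country:DK"})),
    DatasetMetadata("Ant survey, plot B", frozenset({"taxon:Formicidae", "country:IN"})),
]

matches = discover(catalogue, {"taxon:Formicidae", "country:IN"})
print(matches)
```

Because both ant surveys use the same terms, they can be found and combined by a single query; had one of them used a free-text synonym instead, it would have been missed, which is the practical argument for shared vocabularies.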

Open access and biodiversity data
Open access to primary biodiversity data is essential both for enabling effective decision making and for empowering stakeholders involved with, and affected by, the conservation of biodiversity [31-33]. Specifically with respect to scientific publishing, the ability to critically evaluate a published scientific hypothesis or scientific report is contingent on the examination, analysis, evaluation and, if feasible, re-generation of data on which conclusions are based. Biodiversity is not an exception to such data restrictions. For example, authors of a paper published on the failure of African game parks to successfully conserve large mammals were unable to present local data gathered from reserve operators, who wanted it to be kept confidential [34].

There is broad emerging consensus in the scientific and conservation communities that data should be freely, openly available in a sustained, persistent and secure way [35-38]. However, many existing primary biodiversity data are neither accessible nor discoverable [39]. This issue is further compounded by lack of appropriate representation and/or visualization of available data, and lack of linkability among distributed and heterogeneous data resources [40,41]. This adversely affects the optimal utility of the biodiversity data. Thus, an urgent need exists for the discovery of primary biodiversity data and its publication in the public domain.

For decades, there have been declarations, statements, policies and guidelines encouraging open access to primary scientific data [31,42]. With the establishment of the Global Biodiversity Information Facility (GBIF) in 2001, an attempt has been made to develop a global infrastructure to consolidate the discovery of the world’s primary biodiversity data and to provide coherent access. Currently, the GBIF network facilitates access to nearly 304 million data records through its portal [43]. However, these primary biodiversity data records are just a fraction of the estimated volume of existing data


[44-47]. This large volume of biodiversity data, collected by a vast number of biodiversity researchers and amateurs [31,47], remains largely undiscovered and unpublished. This is attributable, we believe, to a lack of encouragement, misperceptions of self-interest, or lack of infrastructural support. Although infrastructure support is increasingly available, the problem of appropriate professional recognition for institutions and individuals remains [31]. We believe that this lack of incentive remains a major impediment to the provision of free and open access to primary biodiversity data.

The GBIF data publishing framework task group
The foregoing discussion emphasizes the need for a data publishing framework to evolve metrics and indicators that provide incentives to the multiple actors involved in the generation of data. Recognizing the need for addressing social, policy, political and technical issues influencing discovery and publishing through the GBIF network, the GBIF Data Publishing Framework Task Group (DPF TG) was commissioned in March 2009 [48]. The DPF TG was tasked with providing recommendations on (a) social, technical and policy interventions that would encourage publication of primary biodiversity data as a necessary and in-built step in the scientific data management cycle; (b) opportunities and mechanisms to incentivize and attribute credit for investment in primary biodiversity data publishing, from individual to institutional to national levels; and (c) mechanisms/processes for recognizing efforts of data publishers. The concept of the data publishing framework was described at the International Biodiversity Informatics Conference (‘e-Biosphere 09’) held in London in June 2009 [49]. At its meeting in June 2009, the DPF TG discussed issues influencing discovery and publishing of primary biodiversity data, and possible solutions for overcoming impediments.

A data publishing framework for primary biodiversity data
During its meeting in June 2009, the DPF TG invested significant time in defining and determining the scope and purpose of the data publishing framework for primary biodiversity data. The DPF TG recognized the need expressed by the data originators and information systems/networks for data usage metrics and indicators to ensure that the overall utility and impact of their data management and publishing activities is objectively documented, leading to crediting of these activities as scientific activity on a par with the recognition received for conventional scholarly publication [31]. Furthermore, measures of scientists’ productivity will be better informed through data publishing, which requires a professional cultural change in the recognition of scientific output [50]. Such an incentive mechanism would achieve increased data mobilization and increased recognition for data generation, both desirable outcomes for scientists.

Our discussion examined five primary components that comprise a data publishing framework. These components are (a) socio-cultural, (b) technical-infrastructural, (c) policy-political, (d) legal and (e) economic, and they support various activities of the data publishing cycle (see Figure 1 in [31]). These components are not only complementary but are inter-dependent. Thus, there is no dependency on a sequence of components, as components need to be implemented concurrently. Therefore, we define a data publishing framework as an environment conducive to ensuring free and open access to the world’s primary biodiversity data. The core purpose of the framework is to overcome barriers or impediments affecting access to data and the publishing of data.

Recommendations
On the basis of our understanding of issues influencing ‘free and open access’, discovery and publishing of primary biodiversity data, and to encourage institutionalization of the data publishing framework for discovery, publishing and use of primary biodiversity data, we make specific recommendations. The key words ‘must’, ‘must not’, ‘required’, ‘shall’, ‘shall not’, ‘should’, ‘should not’, ‘recommended’, ‘may’ and ‘optional’ in this document are to be interpreted as described in RFC 2119, ‘Key words for use in RFCs to Indicate Requirement Levels’, of the Internet Engineering Task Force [51].

Sharing of biodiversity data must be the expected norm. We stipulate that withholding of data - to protect precise localities for collectible or marketable plants or animals, or for species of special concern - should be the exception and require explicit justification. We emphasize that such data represent a small fraction of biodiversity data and should not be allowed to dictate normal practice. We also stipulate that our call for access to biodiversity data does not supersede national or indigenous rights to regulate uses of biodiversity data as protection against commercial exploitation (‘biopiracy’). To this end, we suggest close consultation and confirmation with CITES [52] and the TRAFFIC Secretariat [53] when questions of this kind occur. As a corollary, all contributors of data must receive appropriate, proportional recognition for their contributions of data. Against this backdrop, we offer 24 recommendations. Recommendation 1 is, however, the primary recommendation that leads to the other recommendations.

Recommendation 1: All data relevant to the understanding of biodiversity and to biodiversity conservation should be made freely, openly and effectively available.


Recommendation 2: GBIF must re-examine its current data resources endorsement model and scrutinize the current practice that national nodes or associate participant nodes are required to give endorsement before the data are discovered and indexed through the GBIF network.

Recommendation 3: GBIF must engage mainstream scholarly publishers and scientific societies with scholarly publications to be part of the GBIF network, as a majority of them would qualify to be thematic/global/regional associate participants.

Recommendation 4: GBIF must support the development of a tool to convert tabular data into resource description framework (RDF) formats conforming to a standard ontology. This would be highly desirable for small custodians/publishers, but is primarily a tool for mainstream scholarly publishers. (Support for development of such an open source application should be sought from mainstream commercial publishers.) GBIF shall evaluate standards such as BioPAX [54].

Recommendation 5: GBIF must facilitate discovery and mobilization of all streams/types of relevant biodiversity data. (This effort should - in close collaboration with others focusing on this development - include ontological analysis of the most important types of data to be considered, the elaboration of suitable working formats for that data, and the development of mappings to/from such working formats to a standard RDF format for interchange purposes.)

Recommendation 6: GBIF should develop a set of supporting tools (such as templates) for biodiversity data to accommodate more than simple occurrence data. GBIF must increasingly engage with various biodiversity data communities.

Recommendation 7: GBIF must facilitate discovery of un-digitized and not yet published datasets, together with indexing of published datasets (potentially to include semantic indexing based on RDF, to allow datasets to be filtered and retrieved with SPARQL queries). In this regard, we strongly endorse the recommendation by the GBIF Global Strategy and Action Plan for Mobilization of Natural History Collections data [55].

Recommendation 8: GBIF should review the use of legacy literature, such as is stored in the Biodiversity Heritage Library (BHL), to explore uses of marked-up texts for data mining and capture of historical biodiversity information.

Recommendation 9: GBIF must explore and develop the capacity to run queries at the GBIF data portal to return harmonized, well-formed XML and/or RDF, such that fields can be extracted for subsequent analysis.

Recommendation 10: GBIF must expand and improve its metadata implementation framework such that the fitness for use of the data resource for its intended use can be ascertained from metadata. For example, data records should identify the lineage and provenance (where data originated and from which data resource) of all contributed data - at least to the previous phase of data transformation. Further, we strongly encourage early implementation of the recommendations of the GBIF Metadata Implementation Framework Task Group [56].

Recommendation 11: GBIF must strengthen its network of mirror sites and distributed network of 'trusted digital repositories' (also called data hosting centers). In this regard, we call on GBIF to ensure early implementation of the recommendations in this issue on data hosting infrastructure [57].

Recommendation 12: GBIF must explore the feasibility of using a cloud infrastructure to overcome barriers of investment and maintenance required for biodiversity data discovery and publishing, especially in the developing and under-developed regions of the world.

Recommendation 13: GBIF must ensure an early implementation of the recommendations of the GBIF Life Sciences Identifier (LSID)/globally unique identifier (GUID) Task Group [58]. We further emphasize the need for GBIF to adopt a stable and proven persistent identifier, such as the 'digital object identifier' (DOI), rather than unstable persistent identifiers.

Recommendation 14: GBIF must explore the potential of the Data Usage Index (DUI) as a potential incentivization mechanism to recognize the efforts required for publishing of biodiversity data [31,59]. GBIF should develop a prototype of such an implementation.

Recommendation 15: GBIF must institutionalize a 'data citation mechanism' and establish a 'data citation service' facilitating deep-data citation and the registration and resolving of citations [26]. For the purposes of accountability and citation (attribution), all contributors of data to any aggregation should be identified and acknowledged. Individuals or institutions responsible for primary data have an obligation to make these ownership statements available to the aggregators, who are responsible for using them. The Dryad application, which uses DataCite to register DOIs, is an initial effort to address this concern [60]. In any data aggregation chain, the aggregator at each level is responsible for identification of the data sources from the previous level of aggregation and its contributors. We believe that this provision avoids the complexity of comprehensive identification of all 'cascaded' data sources and contributors during the aggregation process. It is, of course, nevertheless the case that the validity and integrity of data are ultimately linked to the sum of the integrity and validity of all data processes in the lineage of data creation.

Recommendation 16: GBIF should investigate innovative mechanisms for discovery and publishing of primary biodiversity data in multiple languages. GBIF should commission a position paper detailing such mechanisms for potential uptake by the community.

Recommendation 17: GBIF must institutionalize the 'biodiversity informatics potential' (BIP) Index to demonstrate the potential and urgency for nations to implement biodiversity informatics [61]. In the long term, GBIF must lead the periodic release of a 'global biodiversity information outlook' report analyzing the current state of biodiversity information to meet the local-to-global scale biodiversity targets.

Recommendation 18: GBIF must commission a strategy paper demystifying the concerns/issues related to intellectual property rights and primary biodiversity data. In this regard, the substantial work done by the Science Commons (for example, the Science Commons Protocol for Implementing Open Access Data [62]) and the Open Knowledge Foundation [63] should have direct application.

Recommendation 19: GBIF should encourage sponsors of biodiversity research, whether government agencies, corporations or private foundations, to set mandatory requirements for free and open access to biodiversity data. GBIF should encourage that negotiations for overhead (indirect) cost contributions from funders include calculations of cost for sustained digital infrastructure that is adequate for free and open sharing and the sustained, secure and persistent maintenance of data. Proposals should be expected to include adequate planning and financial provision for sustained data management and access. We further recommend that GBIF should encourage peer review processes that include rigorous scrutiny of past histories of successful sharing and should support the norm of state-of-the-art planning for sharing, not simply promises to "put data on the web".

Recommendation 20: GBIF must develop a plan to foster linkages between scholarly publishers and data publishers from the local to the global scale. GBIF should encourage that records of professional publication be evaluated - at least in part - on the basis of publication in open access journals that do not deny access through 'paywalls' and that provide support for sustainable open access to data.

Recommendation 21: GBIF should urge accreditation bodies for educational institutions and museums to require demonstrated evidence of capacity to support digital access and maintenance of data.

Recommendation 22: GBIF should encourage professional societies and professional disciplines to require evidence of effective sharing of data in evaluations for hiring, promotion and tenure.

Recommendation 23: GBIF should develop a conceptual 'landscape map' depicting GBIF's position, role, unique advantages and collaborative strategies amid the many biodiversity and biodiversity informatics initiatives at local to global scales. This is very important given the broad reach of the earlier recommendations. It is important that the scope of GBIF's own vision and mission is well defined, with a clear picture of how GBIF's role fits into a wider framework of sustainable development and of free and open access to biodiversity data.

Recommendation 24: GBIF must evaluate, prioritize and implement the recommendations made by its task groups - the Content Needs Assessment Task Group (CNA TG) [42], the Multimedia Resources Task Group (MRTG) [64,65], the Metadata Implementation Framework Task Group (MIFTG) [56], the LSID-GUID Task Group (LGTG) [58] and the Observational Data Task Group (ODTG) [66] - and in the Global Strategy and Action Plan for Natural History Collections Data (GSAP-NHC) [55], as well as the recommendations on e-learning [67], Knowledge Organization Systems (KOS) [68] and fitness for use [69].
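To make Recommendation 4 more concrete, the following is a minimal sketch of what a tabular-to-RDF conversion could look like. It is not the tool the Task Group envisaged. It assumes that column headers are already Darwin Core term names (a vocabulary the recommendations do not name), that record identifiers are safe to embed in a URI, and that records are published under a hypothetical base URI at example.org. Output is plain N-Triples, written with the Python standard library only.

```python
import csv
import io

# Assumed vocabulary: the Darwin Core term namespace (not named in the recommendations).
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical base URI used to mint one subject URI per record.
BASE = "http://example.org/occurrence/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def table_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Turn each non-empty cell of a CSV table into one N-Triples statement."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the id becomes the subject; empty cells yield no statement
            triples.append(f'{subject} <{DWC}{column}> "{escape_literal(value)}" .')
    return triples


# Illustrative sample data, not taken from any real dataset.
sample = (
    "occurrenceID,scientificName,decimalLatitude\n"
    "rec-1,Panthera leo,-2.33\n"
    "rec-2,Loxodonta africana,\n"
)
for line in table_to_ntriples(sample, "occurrenceID"):
    print(line)
```

A production converter would additionally need typed literals, validation of identifiers, and a mapping step for tables whose headers do not match the chosen ontology, which is the ontological analysis that Recommendation 5 calls for.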

Discussion

These recommendations grew out of our discussion in June 2009. Since then, there have been subsequent revisions and modifications of the recommendations, and some additions. Chavan and Ingwersen [31] further elaborated on various components of the data publishing framework, especially pertaining to the issues of persistent identifiers, the data usage index and a data citation mechanism. This was further discussed during the DataCite Summer Workshop 2010 [70]. Members of the Task Group were engaged in exploring solutions to various components of the data publishing framework, some of which are included in this issue [57,59,61,71] and some published elsewhere [69,72,73] (and MJ Costello, WK Michener et al., personal communication).

In January 2011, the US National Science Foundation (NSF) implemented a policy requiring all NSF grant applicants to submit data management plans as a part of any grant proposal [74]. This policy change seems to represent a very significant fulfillment of our recommendation, though the exact details of its implementation remain as yet unclear.

We believe that timely implementation of these recommendations and suggested solutions or approaches by the GBIF network will support much needed recognition for individual and institutional efforts in management and publishing of primary biodiversity data. GBIF's support of these recommendations should be of critical importance in establishing their credibility and winning their widespread adoption. Implementation of these recommendations should substantially increase the volume of available primary biodiversity data, substantiating public investment in biodiversity science and conservation of biotic resources.

The DPF TG notes several preliminary efforts to implement these recommendations by the GBIF Secretariat. The DPF TG recommendation on incentivizing efforts for metadata authoring has led the GBIF Secretariat to commission Pensoft Publishers to create a 'data paper' [71] section in four of its journals (BioRisk, PhytoKeys, NeoBiota and ZooKeys), alongside a 'push-button' mechanism to generate XML-encoded manuscripts from metadata descriptions, to be submitted directly to the publisher for peer review, editorial evaluation and publication in the form of a data paper [71]. The BIP Index, an exploratory study to develop metrics to determine country-level biodiversity informatics potentials, has been undertaken [61]. GBIF was, moreover, invited to be part of the group of experts convened by CODATA (the Committee on Data for Science and Technology) to develop an approach to data citation.

We were mandated to make recommendations for potential uptake by the GBIF network. However, we believe that these recommendations apply to the broader biodiversity informatics and ecoinformatics community. Nevertheless, we reiterate that the GBIF network is the most natural venue to kick-start the early implementation of these recommendations. As GBIF enters its third phase, in which it aspires to be the foremost global resource for biodiversity information [75], early leadership and a proactive step towards implementation of these recommendations are imperative for its success.
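As an illustration of the 'push-button' idea mentioned in this Discussion, in which an XML-encoded manuscript is generated from a metadata description, the sketch below builds a skeletal XML document from a flat metadata record. The element names, the attribute and the metadata keys are all hypothetical; they do not reproduce the schema used by the GBIF Secretariat or by Pensoft, which is described in [71].

```python
import xml.etree.ElementTree as ET


def metadata_to_manuscript(metadata: dict) -> str:
    """Build a skeletal XML manuscript from a flat metadata description."""
    # All element and attribute names below are illustrative assumptions.
    article = ET.Element("article", {"type": "data-paper"})
    ET.SubElement(article, "title").text = metadata["title"]
    authors = ET.SubElement(article, "authors")
    for name in metadata["creators"]:
        ET.SubElement(authors, "author").text = name
    ET.SubElement(article, "abstract").text = metadata["abstract"]
    coverage = ET.SubElement(article, "section", {"name": "taxonomic-coverage"})
    coverage.text = ", ".join(metadata["taxa"])
    return ET.tostring(article, encoding="unicode")


# Hypothetical metadata record, for illustration only.
example = {
    "title": "Ant occurrences from a hypothetical survey",
    "creators": ["A. Author", "B. Author"],
    "abstract": "Occurrence records collected during a survey.",
    "taxa": ["Formicidae"],
}
print(metadata_to_manuscript(example))
```

The point of the design is that authors describe their dataset once, as metadata, and the manuscript skeleton is derived from it rather than written separately.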

Conclusions and future work

The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value of, and incentives for, such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that ultimately implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development, and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements

This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details

1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA.
2 Aundh, Pune 411007, India.
3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK.
4 Royal School of Library and Information Science, Birketinget 6, Copenhagen DK 2300, Denmark.
5 Oslo University College, Pb 4, St Olavs Plass, 0130 Oslo, Norway.
6 Plazi, Zinggst 16, 3600 Bern, Switzerland and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA.
7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences and Pensoft Publishers, 13a Geomilev Street, 1111 Sophia, Bulgaria.
8 BioMed Central Ltd, Floor 6, 236 Gray's Inn Road, London WC1X 8HB, UK.
9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100 Copenhagen, Denmark.

Competing interests

The authors declare that they have no competing interests.

Published 15 December 2011

References

1. Merriam-Webster. [http://www.merriam-webster.com/dictionary/data]
2. Wikipedia. [http://en.wikipedia.org/wiki/Data]
3. National Science Foundation: Sustainable Digital Data Preservation and Access Network Partners (DataNet). Program Solicitation NSF 07-601. 2008. [http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm#toc]
4. AnthroDPA Metadata Working Group: Report of the AnthroDPA MetaData Working Group. May 2009. Sponsored by the Wenner-Gren Foundation and the US NSF. [http://anthrodatadpa.org/Media/AnthroDataDPA%20Report.pdf]
5. Ackoff RL: From data to wisdom. Journal of Applied Systems Analysis 1989, 16:3-9.
6. Bellinger C, Castro D, Mills A: Data, Information, Knowledge, and Wisdom. 2004. [http://www.systems-thinking.org/dikw/dikw.htm]
7. Bose R, Frew J: Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 2005, 37:1-28.
8. Lathe W, Williams J, Mangan M, Karolchik D: Genomic data resources: challenges and promises. Nature Education 2008, 1:3. [http://www.nature.com/scitable/topicpage/Genomic-Data-Resources-Challenges-and-Promises-743721]
9. Grantham HS, Moilanen A, Wilson KA, Pressey RL, Rebelo TG, Possingham HP: Diminishing return on investment for biodiversity data in conservation planning. Conservation Letters 1:190-198. doi: 10.1111/j.1755-263X.2008.00029.x
10. Closing the Climategate. Nature 2010, 468:345. doi: 10.1038/468345a
11. Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C: Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 2009, 11:1-8. doi: 10.3897/zookeys.11.210
12. GBIF: GBIF Work Programme 2009-2010. Copenhagen: Global Biodiversity Information Facility; 2008.
13. Merton RK: The Normative Structure of Science. The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press; 1979:267-278.
14. Cavendish H, Read AS: Experiments to determine the density of the earth. Philos Trans R Soc Lond 1798, II:469-526.
15. Michener WK: Meta-information concepts for ecological data management. Ecological Informatics 2006, 1:3-7. doi: 10.1016/j.ecoinf.2005.08.004
16. Voss RS, Emmons L: Mammalian diversity in neotropical lowland rainforests: a preliminary assessment. Bulletin of the American Museum of Natural History 1996, 230.
17. Nur N, Jones SL, Geupel GR: Statistical Guide to Data Analysis of Avian Monitoring Programs. BTP-R6001-1999. Washington, DC: US Department of the Interior, Fish and Wildlife Service; 1999:61. [http://library.fws.gov/Pubs9/avian_monitoring.pdf]


18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington, DC: Smithsonian Institution Press; 2000. [http://antbase.org/ants/publications/20330/20330.pdf]
19. EDIT Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu]
20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010. [http://www.abctaxa.be/volumes/volume-8-manual-atbi]
21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html]
22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214. [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205]
23. The Kepler Project. [https://kepler-project.org]
24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.
25. DDI Alliance: Metadata specification for social and behavioral sciences, ver. 3.1. [http://www.ddialliance.org]
26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing; 2009:9-11. doi: 10.1787/603233448430
27. International Nucleotide Sequence Database Collaboration. [http://insdc.org]
28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html]
29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena]
30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp]
31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2. doi: 10.1186/1471-2105-10-S14-S2
32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17. doi: 10.3897/zookeys.21.274
33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703. doi: 10.1126/science.1197962
34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa's protected areas. Biol Conserv 2010, 143:2221-2228.
35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003. [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung]
36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren]
37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons]
38. Conservation Commons: Partners. [http://conservationcommons.net/partners]
39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.
40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.
41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.
42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.

43. GBIF Data Portal. [http://data.gbif.org]
44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.
45. Chavan V, Krishnan S: Natural history collections: A call for national information infrastructure. Curr Sci 2003, 84:34-42.
46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.
47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299. doi: 10.1353/lib.0.0036
48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group]
49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09 International Conference on Biodiversity Informatics, June 2009, London. [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144]
50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.
51. IETF: RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt]
52. CITES. [http://www.cites.org]
53. TRAFFIC. [http://www.traffic.org]
54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org]
55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.
56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009. [http://www2.gbif.org/GBIF-MIFTG-Report.pdf]

57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.
58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009. [http://www2.gbif.org/Persistent-Identifiers.pdf]
59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.
60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata]
61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.
62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol]
63. Open Knowledge Foundation. [http://okfn.org]
64. Morris R, Olson A, O'Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008. [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O'Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009. [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O'Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008. [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010. [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf]
68. Catapano T, Hobern D, Lapp H, Morris RA, Morrison N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2011. [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf] Released on 04 Feb 2011.
69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010. [http://www2.gbif.org/GPP-Final.pdf]
70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany. [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4]


71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.
72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.
73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for 'Data Discovery and Publishing Strategy and Action Plans', version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010. [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
74. NSF Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp]
75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011. [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf]
76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011, 12(Suppl 15):S1.




A data publishing framework for primary biodiversitydataDuring its meeting in June 2009 the DPF TG investedsignificant time in defining and determining the scopeand purpose of the data publishing framework for pri-mary biodiversity data The DPF TG recognized theneed expressed by the data originators and informationsystemnetworks for data usage metrics and indicatorsto ensure that the overall utility and impact of their datamanagement and publishing activities is objectivelydocumented leading to crediting of these activities asscientific activity on a par with the recognition receivedfor conventional scholarly publication [31] Furthermoremeasures of scientistsrsquo productivity will be betterinformed through data publishing which requires a pro-fessional cultural change in the recognition of scientific

output [50] Such an incentive mechanism wouldachieve increased data mobilization and increased recog-nition for data generation both desirable outcomes forscientistsOur discussion examined five primary components

that comprise a data publishing framework These com-ponents are (a) socio-cultural (b) technical-infrastruc-tural (c) policy-political (d) legal and (e) economic andthey support various activities of the data publishingcycle (see Figure 1 in [31]) These components are notonly complementary but are inter-dependent Thusthere is no dependency on a sequence of componentsas components need to be implemented concurrentlyTherefore we define a data publishing framework as anenvironment conducive to ensuring free and openaccess to the worldrsquos primary biodiversity data Thecore purpose of the framework is to overcome barriersor impediments affecting access to data and the pub-lishing of data

Recommendations
On the basis of our understanding of issues influencing 'free and open access', discovery and publishing of the primary biodiversity data, and to encourage institutionalization of the data publishing framework for discovery, publishing and use of primary biodiversity data, we make specific recommendations. The key words 'must', 'must not', 'required', 'shall', 'shall not', 'should', 'should not', 'recommended', 'may' and 'optional' in this document are to be interpreted as described in RFC 2119, 'Key words for use in RFCs to Indicate Requirement Levels', of the Internet Engineering Task Force [51].

Sharing of biodiversity data must be the expected norm. We stipulate that withholding of data - to protect precise localities for collectible or marketable plants or animals, or for species of special concern - should be the exception and require explicit justification. We emphasize that such data represent a small fraction of biodiversity data and should not be allowed to dictate normal practice. We also stipulate that our call for access to biodiversity data does not supersede national or indigenous rights to regulate uses of biodiversity data as protection against commercial exploitation ('biopiracy'). To this end, we suggest close consultation and confirmation with CITES [52] and the TRAFFIC Secretariat [53] when questions of this kind occur. As a corollary, all contributors of data must receive appropriate, proportional recognition for their contributions of data. Against this backdrop, we offer 24 recommendations. Recommendation 1 is, however, the primary recommendation that leads to the other recommendations.

Recommendation 1: All data relevant to the understanding of biodiversity and to biodiversity conservation should be made freely, openly and effectively available.

Recommendation 2: GBIF must re-examine its current data resources endorsement model and scrutinize the current practice that national nodes or associate participant nodes are required to give endorsement before the data are discovered and indexed through the GBIF network.

Recommendation 3: GBIF must engage mainstream scholarly publishers and scientific societies with scholarly publications to be part of the GBIF network, as a majority of them would qualify to be thematic/global/regional associate-participants.

Recommendation 4: GBIF must support the development of a tool to convert tabular data into resource description framework (RDF) formats conforming to a standard ontology. This would be highly desirable for small custodians/publishers, but is primarily a tool for mainstream scholarly publishers. (Support for development of such an open source application should be sought from mainstream commercial publishers.) GBIF shall evaluate standards such as BioPAX [54].

Recommendation 5: GBIF must facilitate discovery and mobilization of all streams/types of relevant biodiversity data. (This effort should - in close collaboration with others focusing on this development - include ontological analysis of the most important types of data to be considered, the elaboration of suitable working formats for that data, and the developing of mappings to/from such working formats to a standard RDF format for interchange purposes.)

Recommendation 6: GBIF should develop a set of supporting tools (such as templates) for biodiversity data to accommodate more than simple occurrence data. GBIF must increasingly engage with various biodiversity data communities.

Recommendation 7: GBIF must facilitate discovery of un-digitized and not yet published datasets, together with indexing of published datasets (potentially to include semantic indexing based on RDF to allow datasets to be filtered and retrieved with SPARQL queries). In this regard, we strongly endorse the recommendation by the GBIF Global Strategy and Action Plan for Mobilization of Natural History Collections data [55].

Recommendation 8: GBIF should review the use of legacy literature, such as is stored in the Biodiversity Heritage Library (BHL), to explore uses of marked-up texts for data mining and capture of historical biodiversity information.

Recommendation 9: GBIF must explore and develop the capacity to run queries at the GBIF data portal to return harmonized, well-formed XML and/or RDF such that fields can be extracted for subsequent analysis.

Recommendation 10: GBIF must expand and improve its metadata implementation framework such that the fitness for use of a data resource for its intended use can be ascertained from metadata. For example, data records should identify lineage and provenance (where data originated and from which data resource) of all contributed data - at least to the previous phase of data transformation. Further, we strongly encourage early implementation of the recommendations of the GBIF Metadata Implementation Framework Task Group [56].

Recommendation 11: GBIF must strengthen its network of mirror sites and distributed network of 'trusted digital repositories' (also called data hosting centers). In this regard, we call on GBIF to ensure early implementation of the recommendations in this issue on data hosting infrastructure [57].

Recommendation 12: GBIF must explore the feasibility of using a cloud infrastructure to overcome barriers of investment and maintenance required for biodiversity data discovery and publishing, especially in the developing and under-developed regions of the world.

Recommendation 13: GBIF must ensure an early implementation of the recommendations of the GBIF Life Sciences Identifier (LSID)/globally unique identifier (GUID) Task Group [58]. We further emphasize the need for GBIF to adopt a stable and proven persistent identifier such as the 'digital object identifier' (doi), rather than unstable persistent identifiers.

Recommendation 14: GBIF must explore the potential of the Data Usage Index (DUI) as a potential incentivization mechanism to recognize efforts required for publishing of biodiversity data [31,59]. GBIF should develop a prototype of such an implementation.

Recommendation 15: GBIF must institutionalize a 'data citation mechanism' and establish a 'data citation service' facilitating deep-data citation and registration and resolving of citations [26]. For the purposes of accountability and citation (attribution), all contributors of data to any aggregation should be identified and acknowledged. Individuals or institutions responsible for primary data have an obligation to make these ownership statements available to the aggregators, who are responsible for using them. The Dryad application, which uses DataCite to register dois, is an initial effort to address this concern [60]. In any data aggregation chain, the aggregator at each level is responsible for identification of data sources from the previous level of aggregation and its contributors. We believe that this provision avoids the complexity of comprehensive identity of all 'cascaded' data sources and contributors during the aggregation process. It is, of course, nevertheless the case that the validity and integrity of data are ultimately linked to the sum of the integrity and validity of all data processes in the lineage of data creation.

Recommendation 16: GBIF should investigate innovative mechanisms for discovery and publishing of primary biodiversity data in multiple languages. GBIF should commission a position paper detailing such mechanisms for potential uptake by the community.

Recommendation 17: GBIF must institutionalize the 'biodiversity informatics potential' (BIP) Index to demonstrate the potential and urgency for nations to implement biodiversity informatics [61]. In the long term, GBIF must lead the periodic release of a 'global biodiversity information outlook' report analyzing the current state of biodiversity information to meet the local-to-global scale biodiversity targets.

Recommendation 18: GBIF must commission a strategy paper demystifying the concerns/issues related to intellectual property rights and primary biodiversity data. In this regard, the substantial work done by the Science Commons (for example, the Science Commons Protocol for Implementing Open Access Data [62]) and the Open Knowledge Foundation [63] should have direct application.

Recommendation 19: GBIF should encourage sponsors of biodiversity research, whether government agencies, corporations or private foundations, to set mandatory requirements for free and open access to biodiversity data. GBIF should encourage that negotiations for overhead (indirect) cost contributions from funders include calculations of cost for sustained digital infrastructure that is adequate for free and open sharing and the sustained, secure and persistent maintenance of data. Proposals should be expected to include adequate planning and financial provision for sustained data management and access. We further recommend that GBIF should encourage peer review processes that include rigorous scrutiny of past histories of successful sharing and should support the norm of state-of-the-art planning for sharing, not simply promises to "put data on the web".

Recommendation 20: GBIF must develop a plan to foster linkages between scholarly publishers and data publishers from the local to the global scale. GBIF should encourage that records of professional publication be evaluated - at least in part - on the basis of publication in open access journals that do not deny access through 'paywalls' and that provide support for sustainable open access to data.

Recommendation 21: GBIF should urge accreditation bodies for educational institutions and museums to require demonstrated evidence of capacity to support digital access and maintenance of data.

Recommendation 22: GBIF should encourage professional societies and professional disciplines to require evidence of effective sharing of data in evaluations for hiring, promotion and tenure.

Recommendation 23: GBIF should develop a conceptual 'landscape map' depicting GBIF's position, role, unique advantages and collaborative strategies amid the many biodiversity and biodiversity informatics initiatives at local to global scales. This is very important given the broad reach of the earlier recommendations. It is important that the scope of GBIF's own vision and mission is well defined, with a clear picture of how GBIF's role fits into a wider framework of sustainable development and of free and open access to biodiversity data.

Recommendation 24: GBIF must evaluate, prioritize and implement the recommendations made by its task groups - the Content Needs Assessment Task Group (CNA TG) [42], the Multimedia Resources Task Group (MRTG) [64,65], the Metadata Implementation Framework Task Group (MIFTG) [56], the LSID-GUID Task Group (LGTG) [58], the Observational Data Task Group (ODTG) [66] - and in the Global Strategy and Action Plan for Natural History Collections Data (GSAP-NHC) [55], together with the recommendations on e-learning [67], Knowledge Organization System (KOS) [68] and fitness for use [69].
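
As an illustration of the kind of tool that Recommendation 4 envisages, the following minimal Python sketch converts a small comma-separated table into RDF triples in N-Triples syntax. It is a sketch under stated assumptions, not a description of any existing GBIF tool: the Darwin Core terms namespace stands in for a 'standard ontology', and the base URI, the function names and the sample records are hypothetical. It also assumes that the identifier column holds values that are safe to append to a URI.

```python
import csv
import io

# Hypothetical base URI for record identifiers; not a real GBIF endpoint.
BASE = "http://example.org/occurrence/"
# Darwin Core terms namespace, assumed here as the target ontology for illustration.
DWC = "http://rs.tdwg.org/dwc/terms/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def table_to_ntriples(text, id_column="occurrenceID"):
    """Turn CSV text into a list of N-Triples lines, one per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            # The identifier becomes the subject; empty cells yield no triple.
            if column == id_column or not value:
                continue
            triples.append(
                f'{subject} <{DWC}{column}> "{escape_literal(value)}" .'
            )
    return triples


# Hypothetical sample table; the column names match Darwin Core term names.
SAMPLE = """occurrenceID,scientificName,country
1,Panthera leo,Kenya
2,Apis mellifera,
"""

if __name__ == "__main__":
    for line in table_to_ntriples(SAMPLE):
        print(line)
```

Each column heading is treated as a property name in the chosen namespace, so the sketch only works when the table headings already match ontology terms. A real tool would need an explicit mapping step of the kind described in Recommendation 5.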

Discussion
These recommendations grew out of our discussion in June 2009. Since then there have been subsequent revisions and modifications of the recommendations, and some additions. Chavan and Ingwersen [31] further elaborated on various components of the data publishing framework, especially pertaining to the issues of persistent identifiers, the data usage index and a data citation mechanism. This was further discussed during the DataCite Summer Workshop 2010 [70]. Members of the Task Group were engaged in exploring solutions to various components of the data publishing framework, some of which are included in this issue [57,59,61,71] and some published elsewhere ([69,72,73], and MJ Costello, WK Michener et al., personal communication).

In January 2011, the US National Science Foundation (NSF) implemented a policy requiring all NSF grant applicants to submit data management plans as a part of any grant proposal [74]. This policy change seems to represent a very significant fulfillment of our recommendation, though the exact details of its implementation remain as yet unclear.

We believe that timely implementation of these recommendations and suggested solutions or approaches by the GBIF network will support much needed recognition for individual and institutional efforts in management and publishing of primary biodiversity data. GBIF's support of these recommendations should be of critical importance in establishing their credibility and winning their widespread adoption. Implementation of these recommendations should substantially increase the volume of available primary biodiversity data, substantiating public investment in biodiversity science and conservation of biotic resources.

The DPF TG notes several preliminary efforts to implement these recommendations by the GBIF Secretariat. The DPF TG recommendation on incentivizing efforts for metadata authoring has led the GBIF Secretariat to commission Pensoft Publishers to create a 'data paper' [71] section in four of its journals (BioRisk, PhytoKeys, NeoBiota and ZooKeys), alongside a 'push-button' mechanism to generate XML-encoded manuscripts from metadata descriptions, to be submitted directly to the publisher for peer review, editorial evaluation and publication in the form of a data paper [71]. The BIP Index, an exploratory study to develop metrics to determine country-level biodiversity informatics potentials, has been undertaken [61]. GBIF was, moreover, invited to be part of the group of experts convened by CODATA (the Committee on Data for Science and Technology) to develop an approach to data citation.

We were mandated to make recommendations for potential uptake by the GBIF network. However, we believe that these recommendations apply to the broader biodiversity informatics and ecoinformatics community. Nevertheless, we reiterate that the GBIF network is the most natural venue to kick-start the early implementation of these recommendations. As GBIF enters into its third phase, in which it aspires to be the foremost global resource for biodiversity information [75], early leadership and a proactive step towards implementation of these recommendations are imperative for its success.
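
The 'push-button' generation of XML-encoded manuscripts from metadata descriptions, mentioned earlier in this Discussion, can be pictured with a minimal Python sketch. The element names (article, title, authors, author, abstract), the attribute and the metadata keys are hypothetical and chosen only for illustration; they are not the schema used by Pensoft or by GBIF.

```python
import xml.etree.ElementTree as ET


def metadata_to_manuscript(metadata):
    """Build a minimal XML manuscript skeleton from a metadata dictionary."""
    # Element and attribute names are illustrative assumptions, not a real schema.
    article = ET.Element("article", {"type": "data-paper"})
    ET.SubElement(article, "title").text = metadata["title"]
    authors = ET.SubElement(article, "authors")
    for name in metadata["creators"]:
        ET.SubElement(authors, "author").text = name
    ET.SubElement(article, "abstract").text = metadata["abstract"]
    # Serialize to a text string rather than bytes.
    return ET.tostring(article, encoding="unicode")


# Hypothetical metadata record standing in for a dataset description.
EXAMPLE_METADATA = {
    "title": "Occurrence records of ants from a hypothetical survey",
    "creators": ["A. Author", "B. Author"],
    "abstract": "A short description of the dataset, its scope and its methods.",
}

if __name__ == "__main__":
    print(metadata_to_manuscript(EXAMPLE_METADATA))
```

The point of the sketch is that a manuscript skeleton can be derived mechanically once the metadata are structured, so the author's remaining effort goes into the metadata itself.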

Conclusions and future work
The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value of, and incentives for, such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that ultimately implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details
1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA. 2 Aundh, Pune 411007, India. 3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK. 4 Royal School of Library and Information Science, Birketinget 6, Copenhagen DK 2300, Denmark. 5 Oslo University College, Pb 4 St Olavs Plass, 0130 Oslo, Norway. 6 Plazi, Zinggst 16, 3600 Bern, Switzerland and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA. 7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences and Pensoft Publishers, 13a Geomilev Street, 1111 Sophia, Bulgaria. 8 BioMed Central Ltd, Floor 6, 236 Gray's Inn Road, London WC1X 8HB, UK. 9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100, Copenhagen, Denmark.

Competing interests
The authors declare that they have no competing interests.

Published: 15 December 2011

References
1. Merriam-Webster. [http://www.merriam-webster.com/dictionary/data]
2. Wikipedia. [http://en.wikipedia.org/wiki/Data]
3. National Science Foundation: Sustainable Digital Data Preservation and Access Network Partners (DataNet), Program Solicitation NSF 07-601. 2008 [http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm#toc].
4. AnthroDPA Metadata Working Group: Report of the AnthroDPA MetaData Working Group, May 2009. Sponsored by the Wenner-Gren Foundation and the US NSF. [http://anthrodatadpa.org/Media/AnthroDataDPA%20Report.pdf]
5. Ackoff RL: From data to wisdom. Journal of Applied Systems Analysis 1989, 16:3-9.
6. Bellinger C, Castro D, Mills A: Data, Information, Knowledge, and Wisdom. 2004 [http://www.systems-thinking.org/dikw/dikw.htm].
7. Bose R, Frew J: Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 2005, 37:1-28.
8. Lathe W, Williams J, Mangan M, Karolchik D: Genomic data resources: challenges and promises. Nature Education 2008, 1:3 [http://www.nature.com/scitable/topicpage/Genomic-Data-Resources-Challenges-and-Promises-743721].
9. Grantham HS, Moilanen A, Wilson KA, Pressey RL, Rebelo TG, Possingham HP: Diminishing return on investment for biodiversity data in conservation planning. Conservation Letters 1:190-198. doi:10.1111/j.1755-263X.2008.00029.x
10. Closing the Climategate. Nature 2010, 468:345. doi:10.1038/468345a
11. Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C: Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 2009, 11:1-8. doi:10.3897/zookeys.11.210
12. GBIF: GBIF Work Programme 2009-2010. Copenhagen: Global Biodiversity Information Facility; 2008.
13. Merton RK: The Normative Structure of Science. The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press; 1979, 267-278.
14. Cavendish H, Read AS: Experiments to determine the density of the earth. Philos Trans R Soc Lond 1798, II:469-526.
15. Michener WK: Meta-information concepts for ecological data management. Ecological Informatics 2006, 1:3-7. doi:10.1016/j.ecoinf.2005.08.004
16. Voss RS, Emmons L: Mammalian diversity in neotropical lowland rainforests: a preliminary assessment. Bulletin of the American Museum of Natural History 1996, 230.
17. Nur N, Jones SL, Geupel GR: Statistical Guide to Data Analysis of Avian Monitoring Programs. BTP-R6001-1999. Washington DC: US Department of the Interior, Fish and Wildlife Service; 1999, 61 [http://library.fws.gov/Pubs9/avian_monitoring.pdf].

18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf].
19. EDIT Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu]
20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi].
21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html]
22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205].
23. The Kepler Project. [https://kepler-project.org]
24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.
25. DDI Alliance: Metadata specification for social and behavioral sciences, ver. 3.1. [http://www.ddialliance.org]
26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing; 2009, 9-11. doi:10.1787/603233448430
27. International Nucleotide Sequence Database Collaboration. [http://insdc.org]
28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html]
29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena]
30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp]
31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2. doi:10.1186/1471-2105-10-S14-S2
32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17. doi:10.3897/zookeys.21.274
33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703. doi:10.1126/science.1197962
34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa's protected areas. Biol Conserv 2010, 143:2221-2228.
35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung].
36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren]
37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons]
38. Conservation Commons Partners. [http://conservationcommons.net/partners]
39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.
40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.
41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.
42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.
43. GBIF Data Portal. [http://data.gbif.org]
44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.
45. Chavan V, Krishnan S: Natural history collections: A call for national information infrastructure. Curr Sci 2003, 84:34-42.
46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.
47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299. doi:10.1353/lib.0.0036

48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group]
49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09: International Conference on Biodiversity Informatics, June 2009, London [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144].
50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.
51. IETF: RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt]
52. CITES. [http://www.cites.org]
53. TRAFFIC. [http://www.traffic.org]
54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org]
55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.
56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf].
57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.
58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf].
59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.
60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata]
61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.
62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol]
63. Open Knowledge Foundation. [http://okfn.org]
64. Morris R, Olson A, O'Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O'Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O'Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf].
68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2001 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.
69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]. Primary Biodiversity Data.
70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4].

71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.
72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.
73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for 'Data Discovery and Publishing Strategy and Action Plans', version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].
74. NSF Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp]
75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf].
76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011, 12(Suppl 15):S1.




vative mechanisms for discovery and publishing of

Moritz et al BMC Bioinformatics 2011 12(Suppl 15)S1httpwwwbiomedcentralcom1471-210512S15S1

Page 6 of 10

primary biodiversity data in multiple languages GBIFshould commission a position paper detailing suchmechanisms for potential uptake by the communityRecommendation 17 GBIF must institutionalize the

lsquobiodiversity informatics potentialrsquo (BIP) Index todemonstrate the potential and urgency for nations toimplement biodiversity informatics [61] In the longterm GBIF must lead the periodic release of a lsquoglobalbiodiversity information outlookrsquo report analyzing thecurrent state of biodiversity information to meet thelocal-to-global scale biodiversity targetsRecommendation 18 GBIF must commission a strat-

egy paper demystifying the concernsissues related tointellectual property rights and primary biodiversitydata In this regard the substantial work done by theScience Commons (for example the Science CommonsProtocol for Implementing Open Access Data [62]) andthe Open Knowledge Foundation [63] should havedirect applicationRecommendation 19 GBIF should encourage spon-

sors of biodiversity research whether government agen-cies corporations or private foundations to setmandatory requirements for free and open access tobiodiversity data GBIF should encourage that negotia-tions for overhead (indirect) cost contributions fromfunders should include calculations of cost for sustaineddigital infrastructure that is adequate for free and opensharing and the sustained secure and persistent mainte-nance of data Proposals should be expected to includeadequate planning and financial provision for sustaineddata management and access We further recommendthat GBIF should encourage peer review processes thatinclude rigorous scrutiny of past histories of successfulsharing and should support the norm of state-of-the-artplanning for sharing not simply promises to ldquoput dataon the webrdquoRecommendation 20 GBIF must develop a plan to

foster linkages between scholarly publishers and datapublishers from the local to the global scale GBIFshould encourage that records of professional publica-tion be evaluated - at least in part - on the basis of pub-lication in open access journals that do not deny accessthrough lsquopaywallsrsquo and that provide support for sustain-able open access to dataRecommendation 21 GBIF should urge accreditation

bodies for educational institutions and museums torequire demonstrated evidence of capacity to supportdigital access and maintenance of dataRecommendation 22 GBIF should encourage profes-

sional societies and professional disciplines to requireevidence of effective sharing of data in evaluations forhiring promotion and tenureRecommendation 23 GBIF should develop a concep-

tual lsquolandscape maprsquo depicting GBIFrsquos position role

unique advantages and collaborative strategies amid themany biodiversity and biodiversity informatics initiativesat local to global scales This is very important given thebroad reach of the earlier recommendations It is impor-tant that the scope of the GBIFrsquos own vision and mis-sion is well defined with a clear picture of how GBIFrsquosrole fits into a wider framework of sustainable develop-ment and of free and open access to biodiversity dataRecommendation 24 GBIF must evaluate prioritize

and implement the recommendations made by its taskgroups - the Content Needs Assessment Task Group(CNA TG) [42] the Multimedia Resources Task Group(MRTG) [6465] the Metadata Implementation Frame-work Task Group (MIFTG) [56] the LSID-GUID TaskGroup (LGTG) [58] the Observational Data TaskGroup (ODTG) [66] - and in the Global Strategy andAction Plan for Natural History Collections Data(GSAP-NHC) [55] and recommendations on e-learningrecommendations [67] Knowledge Organization System(KOS) [68] and fitness for use [69]
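To make the kind of tool envisaged in Recommendation 4 more concrete, the following is a minimal sketch, not a description of any existing GBIF software. It turns a small table of occurrence records into RDF serialized as N-Triples. The choice of Darwin Core term URIs as the target vocabulary, the `example.org` base URI, the sample rows and the function names are all assumptions made for illustration; the recommendation itself does not name a specific ontology.

```python
import csv
import io

# Assumption: Darwin Core term URIs stand in for "a standard ontology".
DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical base URI used to mint one subject identifier per record.
BASE = "http://example.org/occurrence/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def table_to_ntriples(csv_text: str, id_column: str = "occurrenceID") -> list[str]:
    """Turn each non-empty cell of a CSV table into one N-Triples statement.

    Column headers are assumed to be Darwin Core term names, and identifiers
    are assumed to contain only characters that are legal in an IRI.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the identifier is the subject; empty cells add nothing
            triples.append(f'{subject} <{DWC}{column}> "{escape_literal(value)}" .')
    return triples


# Illustrative sample data, not taken from any real dataset.
SAMPLE = """occurrenceID,scientificName,country,eventDate
occ-1,Panthera tigris,India,2009-06-01
occ-2,Apis mellifera,Denmark,
"""

if __name__ == "__main__":
    for line in table_to_ntriples(SAMPLE):
        print(line)
```

A production tool would additionally need typed literals, validation of identifiers and a mapping step for columns that do not match the target vocabulary; those are omitted here to keep the sketch short.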
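The aggregation-chain provision in Recommendation 15 can be read as a simple data structure: each aggregator records only its immediate sources and their contributors, yet the full ‘cascade’ remains recoverable by following those one-level links. The sketch below illustrates that reading only; the class, function and dataset names are hypothetical and do not correspond to any GBIF or DataCite interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataResource:
    """A dataset at one level of an aggregation chain (hypothetical model)."""
    name: str
    contributors: list[str]
    # Only the immediately preceding level is recorded by each aggregator.
    sources: list["DataResource"] = field(default_factory=list)


def immediate_attribution(resource: DataResource) -> list[str]:
    """List what an aggregator must identify: direct sources and their contributors."""
    return [f"{s.name} ({', '.join(s.contributors)})" for s in resource.sources]


def full_lineage(resource: DataResource) -> list[str]:
    """Recover the whole chain by walking one-level links back to primary data."""
    lineage = []
    for source in resource.sources:
        lineage.append(source.name)
        lineage.extend(full_lineage(source))
    return lineage


# Illustrative chain: two primary resources, one national node, one global portal.
museum = DataResource("Museum collection", ["A. Curator"])
survey = DataResource("Field survey", ["B. Observer"])
national = DataResource("National node", ["C. Manager"], [museum, survey])
portal = DataResource("Global portal", ["D. Aggregator"], [national])

if __name__ == "__main__":
    print(immediate_attribution(portal))
    print(full_lineage(portal))
```

In this model the portal cites only the national node, which keeps each citation short, while the lineage walk shows why the integrity of the final aggregate still depends on every earlier step.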

Discussion

These recommendations grew out of our discussion in June 2009. Since then, there have been subsequent revisions and modifications of the recommendations, and some additions. Chavan and Ingwersen [31] further elaborated on various components of the data publishing framework, especially pertaining to the issues of persistent identifiers, the data usage index and a data citation mechanism. This was further discussed during the DataCite Summer Workshop 2010 [70]. Members of the Task Group were engaged in exploring solutions to various components of the data publishing framework, some of which are included in this issue [57,59,61,71] and some published elsewhere ([69,72,73] and MJ Costello, WK Michener et al., personal communication).

In January 2011, the US National Science Foundation (NSF) implemented a policy requiring all NSF grant applicants to submit data management plans as a part of any grant proposal [74]. This policy change seems to represent a very significant fulfillment of our recommendation, though the exact details of its implementation remain as yet unclear.

We believe that timely implementation of these recommendations and suggested solutions or approaches by the GBIF network will support much needed recognition for individual and institutional efforts in management and publishing of primary biodiversity data. GBIF’s support of these recommendations should be of critical importance in establishing their credibility and winning their widespread adoption. Implementation of these recommendations should substantially increase the volume of available primary biodiversity data, substantiating public investment in biodiversity science and conservation of biotic resources.

The DPF TG notes several preliminary efforts by the GBIF Secretariat to implement these recommendations. The DPF TG recommendation on incentivizing efforts for metadata authoring has led the GBIF Secretariat to commission Pensoft Publishers to create a ‘data paper’ [71] section in four of its journals (BioRisk, PhytoKeys, NeoBiota and ZooKeys), alongside a ‘push-button’ mechanism to generate XML-encoded manuscripts from metadata descriptions, to be submitted directly to the publisher for peer review, editorial evaluation and publication in the form of a data paper [71]. The BIP Index, an exploratory study to develop metrics to determine country-level biodiversity informatics potentials, has been undertaken [61]. GBIF was, moreover, invited to be part of the group of experts convened by CODATA (the Committee on Data for Science and Technology) to develop an approach to data citation.

We were mandated to make recommendations for potential uptake by the GBIF network. However, we believe that these recommendations apply to the broader biodiversity informatics and ecoinformatics community. Nevertheless, we reiterate that the GBIF network is the most natural venue to kick-start the early implementation of these recommendations. As GBIF enters into its third phase, in which it aspires to be the foremost global resource for biodiversity information [75], early leadership and proactive steps towards implementation of these recommendations are imperative for its success.

Conclusions and future work

The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value of, and incentives for, such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure, and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that, ultimately, implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements

This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details

1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA. 2 Aundh, Pune 411007, India. 3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK. 4 Royal School of Library and Information Science, Birketinget 6, Copenhagen, DK 2300, Denmark. 5 Oslo University College, Pb. 4 St. Olavs Plass, 0130 Oslo, Norway. 6 Plazi, Zinggst. 16, 3600 Bern, Switzerland and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA. 7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences and Pensoft Publishers, 13a Geomilev Street, 1111 Sophia, Bulgaria. 8 BioMed Central Ltd, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK. 9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100, Copenhagen, Denmark.

Competing interests

The authors declare that they have no competing interests.

Published: 15 December 2011

References

1. Merriam-Webster. [http://www.merriam-webster.com/dictionary/data]
2. Wikipedia. [http://en.wikipedia.org/wiki/Data]
3. National Science Foundation: Sustainable Digital Data Preservation and Access Network Partners (DataNet). Program Solicitation NSF 07-601. 2008 [http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm#toc]
4. AnthroDPA Metadata Working Group: Report of the AnthroDPA MetaData Working Group. May 2009. Sponsored by the Wenner-Gren Foundation and the US NSF. [http://anthrodatadpa.org/Media/AnthroDataDPA%20Report.pdf]
5. Ackoff RL: From data to wisdom. Journal of Applied Systems Analysis 1989, 16:3-9.
6. Bellinger C, Castro D, Mills A: Data, Information, Knowledge, and Wisdom. 2004 [http://www.systems-thinking.org/dikw/dikw.htm]
7. Bose R, Frew J: Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 2005, 37:1-28.
8. Lathe W, Williams J, Mangan M, Karolchik D: Genomic data resources: challenges and promises. Nature Education 2008, 1:3 [http://www.nature.com/scitable/topicpage/Genomic-Data-Resources-Challenges-and-Promises-743721]
9. Grantham HS, Moilanen A, Wilson KA, Pressey RL, Rebelo TG, Possingham HP: Diminishing return on investment for biodiversity data in conservation planning. Conservation Letters 1:190-198, doi:10.1111/j.1755-263X.2008.00029.x.
10. Closing the Climategate. Nature 2010, 468:345, doi:10.1038/468345a.
11. Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C: Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 2009, 11:1-8, doi:10.3897/zookeys.11.210.
12. GBIF: GBIF Work Programme 2009-2010. Copenhagen: Global Biodiversity Information Facility; 2008.
13. Merton RK: The Normative Structure of Science. In The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press; 1979:267-278.
14. Cavendish H, Read AS: Experiments to determine the density of the earth. Philos Trans R Soc Lond 1798, II:469-526.
15. Michener WK: Meta-information concepts for ecological data management. Ecological Informatics 2006, 1:3-7, doi:10.1016/j.ecoinf.2005.08.004.
16. Voss RS, Emmons L: Mammalian diversity in neotropical lowland rainforests: a preliminary assessment. Bulletin of the American Museum of Natural History 1996, 230.
17. Nur N, Jones SL, Geupel GR: Statistical Guide to Data Analysis of Avian Monitoring Programs. BTP-R6001-1999. Washington, DC: US Department of the Interior, Fish and Wildlife Service; 1999:61 [http://library.fws.gov/Pubs9/avian_monitoring.pdf]


18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington, DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf]
19. EDIT Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu]
20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi]
21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html]
22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205]
23. The Kepler Project. [https://kepler-project.org]
24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.
25. DDI Alliance: Metadata specification for social and behavioral sciences, ver. 3.1. [http://www.ddialliance.org]
26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing; 2009:9-11, doi:10.1787/603233448430.
27. International Nucleotide Sequence Database Collaboration. [http://insdc.org]
28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html]
29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena]
30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp]
31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2, doi:10.1186/1471-2105-10-S14-S2.
32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17, doi:10.3897/zookeys.21.274.
33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703, doi:10.1126/science.1197962.
34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa’s protected areas. Biol Conserv 2010, 143:2221-2228.
35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung]
36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren]
37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons]
38. Conservation Commons Partners. [http://conservationcommons.net/partners]
39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.
40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.
41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.
42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.

43. GBIF Data Portal. [http://data.gbif.org]
44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.
45. Chavan V, Krishnan S: Natural history collections: A call for national information infrastructure. Curr Sci 2003, 84:34-42.
46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.
47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299, doi:10.1353/lib.0.0036.
48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group]
49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09 International Conference on Biodiversity Informatics, June 2009, London. [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144]
50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.
51. IETF RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt]
52. CITES. [http://www.cites.org]
53. TRAFFIC. [http://www.traffic.org]
54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org]
55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.
56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf]
57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.
58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf]
59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.
60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata]
61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.
62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol]
63. Open Knowledge Foundation. [http://okfn.org]
64. Morris R, Olson A, O’Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O’Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O’Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf]
68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2001 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.
69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]. Primary Biodiversity Data.
70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany. [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4]


71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.
72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.
73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for ‘Data Discovery and Publishing Strategy and Action Plans’, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports]
74. NSF Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp]
75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf]
76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011 12(Suppl 15):S1.



primary biodiversity data in multiple languages GBIFshould commission a position paper detailing suchmechanisms for potential uptake by the communityRecommendation 17 GBIF must institutionalize the

lsquobiodiversity informatics potentialrsquo (BIP) Index todemonstrate the potential and urgency for nations toimplement biodiversity informatics [61] In the longterm GBIF must lead the periodic release of a lsquoglobalbiodiversity information outlookrsquo report analyzing thecurrent state of biodiversity information to meet thelocal-to-global scale biodiversity targetsRecommendation 18 GBIF must commission a strat-

egy paper demystifying the concernsissues related tointellectual property rights and primary biodiversitydata In this regard the substantial work done by theScience Commons (for example the Science CommonsProtocol for Implementing Open Access Data [62]) andthe Open Knowledge Foundation [63] should havedirect applicationRecommendation 19 GBIF should encourage spon-

sors of biodiversity research whether government agen-cies corporations or private foundations to setmandatory requirements for free and open access tobiodiversity data GBIF should encourage that negotia-tions for overhead (indirect) cost contributions fromfunders should include calculations of cost for sustaineddigital infrastructure that is adequate for free and opensharing and the sustained secure and persistent mainte-nance of data Proposals should be expected to includeadequate planning and financial provision for sustaineddata management and access We further recommendthat GBIF should encourage peer review processes thatinclude rigorous scrutiny of past histories of successfulsharing and should support the norm of state-of-the-artplanning for sharing not simply promises to ldquoput dataon the webrdquoRecommendation 20 GBIF must develop a plan to

foster linkages between scholarly publishers and datapublishers from the local to the global scale GBIFshould encourage that records of professional publica-tion be evaluated - at least in part - on the basis of pub-lication in open access journals that do not deny accessthrough lsquopaywallsrsquo and that provide support for sustain-able open access to dataRecommendation 21 GBIF should urge accreditation

bodies for educational institutions and museums torequire demonstrated evidence of capacity to supportdigital access and maintenance of dataRecommendation 22 GBIF should encourage profes-

sional societies and professional disciplines to requireevidence of effective sharing of data in evaluations forhiring promotion and tenureRecommendation 23 GBIF should develop a concep-

tual lsquolandscape maprsquo depicting GBIFrsquos position role

unique advantages and collaborative strategies amid themany biodiversity and biodiversity informatics initiativesat local to global scales This is very important given thebroad reach of the earlier recommendations It is impor-tant that the scope of the GBIFrsquos own vision and mis-sion is well defined with a clear picture of how GBIFrsquosrole fits into a wider framework of sustainable develop-ment and of free and open access to biodiversity dataRecommendation 24 GBIF must evaluate prioritize

and implement the recommendations made by its taskgroups - the Content Needs Assessment Task Group(CNA TG) [42] the Multimedia Resources Task Group(MRTG) [6465] the Metadata Implementation Frame-work Task Group (MIFTG) [56] the LSID-GUID TaskGroup (LGTG) [58] the Observational Data TaskGroup (ODTG) [66] - and in the Global Strategy andAction Plan for Natural History Collections Data(GSAP-NHC) [55] and recommendations on e-learningrecommendations [67] Knowledge Organization System(KOS) [68] and fitness for use [69]

DiscussionThese recommendations grew out of our discussion inJune 2009 Since then there have been subsequentrevisions and modifications of the recommendationsand some additions Chavan and Ingwersen [31]further elaborated on various components of the datapublishing framework especially pertaining to theissues of persistent identifiers the data usage indexand a data citation mechanism This was further dis-cussed during the DataCite Summer Workshop 2010[70] Members of the Task Group were engaged inexploring solutions to various components of the datapublishing framework some of which are included inthis issue [57596171] and some published elsewhere[697273] and MJ Costello WK Michener et al per-sonal communicationIn January 2011 the US National Science Foundation

(NSF) implemented a policy requiring all NSF grantapplicants to submit data management plans as a partof any grant proposal [74] This policy change seems torepresent a very significant fulfillment of our recom-mendation though the exact details of its implementa-tion remain as yet unclearWe believe that timely implementation of

these recommendations and suggested solutions orapproaches by the GBIF network will support muchneeded recognition for individual and institutionalefforts in management and publishing of primary bio-diversity data GBIFrsquos support of these recommenda-tions should be of critical importance in establishingtheir credibility and winning their widespread adop-tion Implementation of these recommendations shouldsubstantially increase the volume of available primary

Moritz et al BMC Bioinformatics 2011 12(Suppl 15)S1httpwwwbiomedcentralcom1471-210512S15S1

Page 7 of 10

biodiversity data substantiating public investment inbiodiversity science and conservation of bioticresourcesThe DPF TG notes several preliminary efforts to

implement these recommendations by the GBIF Secre-tariat The DPF TG recommendation on incentivizingefforts for metadata authoring has led the GBIF secre-tariat to commission Pensoft Publishers to create a lsquodatapaperrsquo [71] section in four of its journals (BioRisks Phy-toKeys NeoBiota and ZooKeys) alongside a lsquopush-buttonrsquomechanism to generate XML-encoded manuscripts frommetadata descriptions to be submitted directly to thepublisher for peer review and editorial evaluation andpublication in a form of a data paper [71] The BIPIndex an exploratory study to develop metrics to deter-mine country-level biodiversity informatics potentialshas been undertaken [61] GBIF was moreover invitedto be part of the group of experts convened by theCODATA (the Committee on Data for Science andTechnology) to develop an approach to data citationWe were mandated to make recommendations for

potential uptake by the GBIF network However webelieve that these recommendations apply to thebroader biodiversity informatics and ecoinformaticscommunity Nevertheless we reiterate that the GBIFnetwork is the most natural venue to kick-start the earlyimplementation of these recommendations As GBIFenters into its third phase in which it aspires to be theforemost global resource for biodiversity information[75] an early leadership and proactive step towardsimplementation of these recommendations is imperativefor its success

Conclusions and future work
The effective sharing of research data has become a goal of the international research community. Implementation of these recommendations should expedite the progress of archiving, curation, discovery and publishing of primary biodiversity data, because scientists and originators of data will realize the value of, and incentives for, such efforts. We believe that implementation of our recommendations by the GBIF network, and their adoption by similar initiatives such as GEO-BON, IPBES and CBD, will contribute to a much needed global research infrastructure and specifically to an open access regime in biodiversity and conservation science. We further believe that adoption should encourage the evolution of a richly informed virtual research space for future studies in biodiversity [76]. However, we believe that, ultimately, implementation of these recommendations will depend less on policy-political decisions or technical-infrastructural development and primarily on cultural, normative and attitudinal changes by individuals, institutions and organizations.

Acknowledgements
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 15, 2011: Data publishing framework for primary biodiversity data. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S15. Publication of the supplement was supported by the Global Biodiversity Information Facility.

Author details
1 1968½ South Shenandoah Street, Los Angeles, California 90034-1208, USA.
2 Aundh, Pune 411007, India.
3 Zoology Microbiology Research Group, Zoology Department, Natural History Museum, Cromwell Road, London SW7 5BD, UK.
4 Royal School of Library and Information Science, Birketinget 6, Copenhagen DK 2300, Denmark.
5 Oslo University College, Pb 4 St Olavs Plass, 0130 Oslo, Norway.
6 Plazi, Zinggst 16, 3600 Bern, Switzerland and American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA.
7 Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences and Pensoft Publishers, 13a Geomilev Street, 1111 Sofia, Bulgaria.
8 BioMed Central Ltd, Floor 6, 236 Gray's Inn Road, London WC1X 8HB, UK.
9 Global Biodiversity Information Facility Secretariat, Universitetsparken 15, DK 2100 Copenhagen, Denmark.

Competing interests
The authors declare that they have no competing interests.

Published: 15 December 2011

References
1. Merriam-Webster. [http://www.merriam-webster.com/dictionary/data]

2. Wikipedia. [http://en.wikipedia.org/wiki/Data]

3. National Science Foundation: Sustainable Digital Data Preservation and Access Network Partners (DataNet). Program Solicitation NSF 07-601; 2008 [http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm#toc].

4. AnthroDPA Metadata Working Group: Report of the AnthroDPA MetaData Working Group, May 2009. Sponsored by the Wenner-Gren Foundation and the U.S. NSF [http://anthrodatadpa.org/Media/AnthroDataDPA%20Report.pdf].

5. Ackoff RL: From data to wisdom. Journal of Applied Systems Analysis 1989, 16:3-9.

6. Bellinger C, Castro D, Mills A: Data, Information, Knowledge, and Wisdom. 2004 [http://www.systems-thinking.org/dikw/dikw.htm].

7. Bose R, Frew J: Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 2005, 37:1-28.

8. Lathe W, Williams J, Mangan M, Karolchik D: Genomic data resources: challenges and promises. Nature Education 2008, 1:3 [http://www.nature.com/scitable/topicpage/Genomic-Data-Resources-Challenges-and-Promises-743721].

9. Grantham HS, Moilanen A, Wilson KA, Pressey RL, Rebelo TG, Possingham HP: Diminishing return on investment for biodiversity data in conservation planning. Conservation Letters 1:190-198, doi: 10.1111/j.1755-263X.2008.00029.x.

10. Closing the Climategate. Nature 2010, 468:345, doi: 10.1038/468345a.

11. Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C: Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 2009, 11:1-8, doi: 10.3897/zookeys.11.210.

12. GBIF: GBIF Work Programme 2009-2010. Copenhagen: Global Biodiversity Information Facility; 2008.

13. Merton RK: The Normative Structure of Science. In The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press; 1979, 267-278.

14. Cavendish H, Read AS: Experiments to determine the density of the earth. Philos Trans R Soc Lond 1798, II:469-526.

15. Michener WK: Meta-information concepts for ecological data management. Ecological Informatics 2006, 1:3-7, doi: 10.1016/j.ecoinf.2005.08.004.

16. Voss RS, Emmons L: Mammalian diversity in neotropical lowland rainforests: a preliminary assessment. Bulletin of the American Museum of Natural History 1996, 230.

17. Nur N, Jones SL, Geupel GR: Statistical Guide to Data Analysis of Avian Monitoring Programs. BTP-R6001-1999. Washington DC: U.S. Department of the Interior, Fish and Wildlife Service; 1999, 61 [http://library.fws.gov/Pubs9/avian_monitoring.pdf].

18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf].

19. EDIT Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu]

20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi].

21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html]

22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205].

23. The Kepler Project. [https://kepler-project.org]

24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.

25. DDI Alliance: Metadata specification for social and behavioral sciences, ver. 3.1. [http://www.ddialliance.org]

26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing; 2009, 9-11, doi: 10.1787/603233448430.

27. International Nucleotide Sequence Database Collaboration. [http://insdc.org]

28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html]

29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena]

30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp]

31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2, doi: 10.1186/1471-2105-10-S14-S2.

32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17, doi: 10.3897/zookeys.21.274.

33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703, doi: 10.1126/science.1197962.

34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa's protected areas. Biol Conserv 2010, 143:2221-2228.

35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung].

36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren]

37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons]

38. Conservation Commons Partners. [http://conservationcommons.net/partners]

39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.

40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.

41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.

42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.

43. GBIF Data Portal. [http://data.gbif.org]

44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.

45. Chavan V, Krishnan S: Natural history collections: a call for national information infrastructure. Curr Sci 2003, 84:34-42.

46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.

47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299, doi: 10.1353/lib.0.0036.

48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group]

49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09: International Conference on Biodiversity Informatics, June 2009, London [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144].

50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.

51. IETF RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt]

52. CITES. [http://www.cites.org]

53. TRAFFIC. [http://www.traffic.org]

54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org]

55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.

56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf].

57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.

58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf].

59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.

60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata]

61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.

62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol]

63. Open Knowledge Foundation. [http://okfn.org]

64. Morris R, Olson A, O'Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O'Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O'Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf].

68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2011 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.

69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]. Primary Biodiversity Data.

70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4].

71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.

72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.

73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for 'Data Discovery and Publishing Strategy and Action Plans', version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

74. NSF Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp]

75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf].

76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011, 12(Suppl 15):S1.

18. Agosti D, Majer J, Alonso E, Schultz TR: Ants: Standard Methods for Measuring and Monitoring Biodiversity. Biological Diversity Handbook Series. Washington DC: Smithsonian Institution Press; 2000 [http://antbase.org/ants/publications/20330/20330.pdf].

19. EDIT: Platform for Cybertaxonomy. [http://wp5.e-taxonomy.eu].

20. EDIT: Volume on field recording techniques and protocols for all taxa biodiversity inventories. 2010 [http://www.abctaxa.be/volumes/volume-8-manual-atbi].

21. Knowledge Network for Biodiversity: an Introduction to Ecological Metadata Language. [http://knb.ecoinformatics.org/eml_metadata_guide.html].

22. Borer ET, Seabloom EW, Jones MB, Schildhauer M: Some simple guidelines for effective data management. ESA Bulletin 2009, 90:206-214 [http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205].

23. The Kepler Project. [https://kepler-project.org].

24. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A: Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005, 15:1451-1455.

25. DDI Alliance: Metadata specification for social and behavioral sciences, ver 3.1. [http://www.ddialliance.org].

26. Green T: We need publishing standards for datasets and data tables. White paper. OECD Publishing 2009, 9-11. doi: 10.1787/603233448430.

27. International Nucleotide Sequence Database Collaboration. [http://insdc.org].

28. GenBank. [http://www.ncbi.nlm.nih.gov/Genbank/index.html].

29. European Nucleotide Archive. [http://www.ebi.ac.uk/ena].

30. DNA Data Bank of Japan. [http://www.ddbj.nig.ac.jp].

31. Chavan VS, Ingwersen P: Towards a data publishing framework for primary biodiversity data: challenges and potentials for the biodiversity informatics community. BMC Bioinformatics 2009, 10(Suppl 14):S2. doi: 10.1186/1471-2105-10-S14-S2.

32. Penev L, Sharkey M, Erwin T, van Noort S, Buffington M, Seltmann K, Johnson N, Taylor M, Thompson FC, Dallwitz MJ: Data publication and dissemination of interactive keys under the open access model: ZooKeys working example. ZooKeys 2009, 21:1-17. doi: 10.3897/zookeys.21.274.

33. Reichman OJ, Jones MB, Schildhauer MP: Challenges and opportunities of open data in ecology. Science 2011, 331:703. doi: 10.1126/science.1197962.

34. Craigie ID, Baillie JEM, Balmford A, Carbone C, Collen B, Green RE, Hutton JM: Large mammal population declines in Africa's protected areas. Biol Conserv 2010, 143:2221-2228.

35. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. 2003 [http://oa.mpg.de/lang/en-uk/berlin-prozess/berliner-erklarung].

36. Berlin Declaration: Table of Signatories. [http://oa.mpg.de/lang/en-uk/berlin-prozess/signatoren].

37. About Conservation Commons. [http://conservationcommons.net/cc_en_1-about-conservation-commons].

38. Conservation Commons Partners. [http://conservationcommons.net/partners].

39. Chavan V, Watve AV, Londhe MS, Rane NS, Pandit AT, Krishnan S: Cataloguing Indian biota: the electronic catalogue of known Indian fauna. Curr Sci 2004, 87:749-763.

40. Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Brief Bioinf 2007, 8:347-357.

41. Page RDM: Biodiversity informatics: the challenge of linking data and the role of shared identifiers. Brief Bioinf 2008, 9:345-354.

42. Faith DP, Collen B, Arino AH, Koleff P, Guinotte J, Kerr J, Chavan V: Bridging the biodiversity data gaps: recommendations of the GBIF Content Needs Assessment Task Group. Biodiversity Informatics 2011.

43. GBIF Data Portal. [http://data.gbif.org].

44. Butler D, Gee H, Macilwain C: Museum research comes off list of endangered species. Nature 1998, 394:115-117.

45. Chavan V, Krishnan S: Natural history collections: A call for national information infrastructure. Curr Sci 2003, 84:34-42.

46. Arino AH: Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 2010, 7:81-92.

47. Heidorn PB: Shedding light on the dark data in the long-tail of science. Library Trends 2008, 57:280-299. doi: 10.1353/lib.0.0036.

48. GBIF: GBIF commissions Data Publishing Framework Task Group (10 March 2009). [http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-commissions-data-publishing-framework-task-group].

49. Chavan V: Data Publishing = Scholarly Publishing. e-Biosphere 09: International Conference on Biodiversity Informatics, June 2009, London [http://www.slideshare.net/vishwaschavan/ebiosphere09-vc-final-1734144].

50. Roberts D, Chavan V: Standards identifier could mobilize data and free time. Nature 2008, 453:449-450.

51. IETF RFC 2119 (Released 1997). [http://www.ietf.org/rfc/rfc2119.txt].

52. CITES. [http://www.cites.org].

53. TRAFFIC. [http://www.traffic.org].

54. BioPAX - Biological Pathway Exchange. [http://www.biopax.org].

55. Berendsohn WG, Chavan V, Macklin JA: Recommendations of the GBIF Task Group on the Global Strategy and Action Plan for the mobilization of the natural history collections data. Biodiversity Informatics 2010, 7:67-71.

56. Global Biodiversity Information Facility: Report of the GBIF Metadata Implementation Framework Task Group (MIFTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/GBIF-MIFTG-Report.pdf].

57. Goddard A, Wilson N, Cryer P, Yamashita G: Data hosting infrastructure for primary biodiversity data. BMC Bioinformatics 2011, 12(Suppl 15):S5.

58. GBIF: Adoption of Persistent Identifiers for Biodiversity Informatics: Recommendations of the GBIF LSID GUID Task Group. Copenhagen: Global Biodiversity Information Facility; 2009 [http://www2.gbif.org/Persistent-Identifiers.pdf].

59. Ingwersen P, Chavan V: Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics 2011, 12(Suppl 15):S3.

60. DataCite Metadata. [https://www.datadryad.org/wiki/DataCite_Metadata].

61. Arino AH, Chavan V, King N: The Biodiversity Informatics Potential Index. BMC Bioinformatics 2011, 12(Suppl 15):S4.

62. Science Commons: Protocol for Implementing Open Access Data. [http://sciencecommons.org/projects/publishing/open-access-data-protocol].

63. Open Knowledge Foundation. [http://okfn.org].

64. Morris R, Olson A, O'Tuama E, Riccardi G, Whitbread G, Hagedorn G, Teage I, Heikkinen M, Leary P, Barve V, Chavan V: Recommendations of the GBIF Multimedia Resources Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

65. Morris R, Olson A, Freeland C, Hagedorn G, Riccardi G, Carausu M-C, O'Tuama E, Chavan V: Mobilising Multimedia Resources in Biodiversity: 2nd Report of the GBIF Multimedia Resources Task Group (MRTG). Copenhagen: Global Biodiversity Information Facility; 2009 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

66. Kelling S, Ingole B, Daly B, Stein B, Lepage D, O'Tuama E, Cooper J, Jones M, Lahti T, Chavan V: Recommendations of the GBIF Observational Data Task Group. Copenhagen: Global Biodiversity Information Facility; 2008 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

67. Balde O, Encinas Escribano M, González-Talaván A, Martens MJM, Norton GA, Talukdar GH: GBIF Task Group on Electronic Learning: Final Report, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://links.gbif.org/gbif_elearning_task_group_en_v1.pdf].

68. Catapano T, Hobern D, Lapp H, Morris RA, Morrision N, Noy N, Schildhauer M, Thau D: Recommendations for the Use of Knowledge Organisation Systems by GBIF. Copenhagen: Global Biodiversity Information Facility; 2011 [http://links.gbif.org/gbif_kos_whitepaper_v1.pdf]. Released on 04 Feb 2011.

69. Hill AW, Otegui J, Ariño AH, Guralnick RP: GBIF Position Paper on Future Directions and Recommendations for Enhancing Fitness-for-Use Across the GBIF Network, version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www2.gbif.org/GPP-Final.pdf]. Primary Biodiversity Data.

70. Chavan V: Towards Data Publishing Framework. DataCite Summer Meeting, 7-8 June 2010, Hannover, Germany [httpflowcastsmediaelearninguni-hannoverde2010-07-05datacite2010AcquiringhighqualityresearchdataAndreasHense-640-video-O3hD9ZOmmp4].


71. Chavan V, Penev L: The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 2011, 12(Suppl 15):S2.

72. Berents P, Hamer M, Chavan V: Towards demand driven publishing: approaches to the prioritization of digitization of natural history collections data. Biodiversity Informatics 2010, 7:113-119.

73. Chavan VS, Sood RK, Arino AH: Best Practice Guide for 'Data Discovery and Publishing Strategy and Action Plans', version 1.0. Copenhagen: Global Biodiversity Information Facility; 2010 [http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/reports].

74. NSF: Data Management Plan Requirements. [http://www.nsf.gov/eng/general/dmp.jsp].

75. GBIF: GBIF Strategic Plan 2012-2016: Seizing the Future. Copenhagen: Global Biodiversity Information Facility; 2011 [http://gbif.ddbj.nig.ac.jp/gbif_news/upload/GBIF_Strategic_Plan_2012-16.pdf].

76. Gaikwad J, Chavan V: Open access and biodiversity conservation: challenges and potentials for the developing world. Data Science Journal 2006, 5:1-17.

doi:10.1186/1471-2105-12-S15-S1
Cite this article as: Moritz et al.: Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics 2011, 12(Suppl 15):S1.
