Addressing primary “modalities of constraint” on open and effective access to data in “small science” -- in the context of full-life-cycle data management. Tom Moritz, National Science Foundation, March 30, 2009

Page 1

Addressing primary “modalities of constraint” on open and effective access to data in “small science” -- in the context of full-life-cycle data management.

Tom Moritz, National Science Foundation

March 30, 2009

Page 2

Wednesday, January 21st, 2009 at 12:00 am

Freedom of Information Act

MEMORANDUM FOR THE HEADS OF EXECUTIVE DEPARTMENTS AND AGENCIES

SUBJECT: Freedom of Information Act

A democracy requires accountability, and accountability requires transparency. As Justice Louis Brandeis wrote, "sunlight is said to be the best of disinfectants." In our democracy, the Freedom of Information Act (FOIA), which encourages accountability through transparency, is the most prominent expression of a profound national commitment to ensuring an open Government. At the heart of that commitment is the idea that accountability is in the interest of the Government and the citizenry alike.

The Freedom of Information Act should be administered with a clear presumption: In the face of doubt, openness prevails. The Government should not keep information confidential merely because public officials might be embarrassed by disclosure, because errors and failures might be revealed, or because of speculative or abstract fears. Nondisclosure should never be based on an effort to protect the personal interests of Government officials at the expense of those they are supposed to serve. In responding to requests under the FOIA, executive branch agencies (agencies) should act promptly and in a spirit of cooperation, recognizing that such agencies are servants of the public.

All agencies should adopt a presumption in favor of disclosure, in order to renew their commitment to the principles embodied in FOIA, and to usher in a new era of open Government. The presumption of disclosure should be applied to all decisions involving FOIA.

http://www.whitehouse.gov/the_press_office/Freedom_of_Information_Act/

Page 3

Repatriation of biodiversity information through Clearing House Mechanism of the Convention on Biological Diversity and Global Biodiversity Information Facility; Views and experiences of Peruvian and Bolivian non-governmental organizations. Ulla Helimo, Master’s Thesis, University of Turku, Department of Biology, 6.10.2004, p. 11. http://enbi.utu.fi/Documents/Ulla%20Helimo%20PRO%20GRADU.pdf [06-06-05]

KNOWLEDGE RESOURCES:

Technology

Page 4

[Figure: Political Power and Knowledge (Poder Político y Conocimiento). One axis: Responsibility and Power (Responsabilidad y Poder). Other axis: Knowledge, in Western scientific terms (Conocimiento, en términos científicos-occidentales). Axis labels: Low (Bajo), High (Alto). Groups shown: Politicians (Políticos), Administrators or Managers (Administradores o Gestores), Technical Analysts (Analistas-Técnicos), Scientists (Científicos).]

(Sutton, 1999)

From: Organizaciones que aprenden, paises que aprenden: lecciones y AP en Costa Rica by Andrea Ballestero Directora ELAP

Page 5

Data?

The DataNet solicitation defines “data” as:

“any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc” (this is a broadly inclusive definition of “data” as an electronic medium…)

Disciplinarily / epistemically, “data” also refers to structured, conventional expressions of facts (observations, descriptions or measurements). This definition suggests the importance of discipline-specific ontological analyses…

Page 6

A work is “open” if its manner of distribution satisfies the following conditions:

• Access
• Redistribution
• Reuse
• Absence of Technological Restriction
• Attribution
• Integrity
• No Discrimination Against Persons or Groups
• No Discrimination Against Fields of Endeavor
• Distribution of License
• License Must Not Be Specific to a Package
• License Must Not Restrict the Distribution of Other Works

http://opendefinition.org/1.0 [February 20, 2009]

Page 7

1. Access: The work shall be available as a whole and at no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.

[Comment: This can be summarized as 'social' openness - not only are you allowed to get the work but you can get it. 'As a whole' prevents the limitation of access by indirect means, for example by only allowing access to a few items of a database at a time.]

2. Redistribution: The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.

3. Reuse: The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work. The license may impose some form of attribution and integrity requirements: see principle 5 (Attribution) and principle 6 (Integrity) below.

[Comment: Note that this clause does not prevent the use of 'viral' or share-alike licenses that require redistribution of modifications under the same terms as the original.]

4. Absence of Technological Restriction: The work must be provided in such a form that there are no technological obstacles to the performance of the above activities. This can be achieved by the provision of the work in an open data format, i.e. one whose specification is publicly and freely available and which places no restrictions monetary or otherwise upon its use.

5. Attribution: The license may require as a condition for redistribution and re-use the attribution of the contributors and creators to the work. If this condition is imposed it must not be onerous. For example if attribution is required a list of those requiring attribution should accompany the work.

6. Integrity: The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work.

http://opendefinition.org/1.0 [February 20, 2009]

Page 8

7. No Discrimination Against Persons or Groups: The license must not discriminate against any person or group of persons.

[Comment: In order to get the maximum benefit from the process, the maximum diversity of persons and groups should be equally eligible to contribute to open knowledge. Therefore we forbid any open-knowledge license from locking anybody out of the process.]

8. No Discrimination Against Fields of Endeavor: The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for military research.

[Comment: The major intention of this clause is to prohibit license traps that prevent open source from being used commercially. We want commercial users to join our community, not feel excluded from it.]

9. Distribution of License: The rights attached to the work must apply to all to whom the work is redistributed without the need for execution of an additional license by those parties.

[Comment: This clause is intended to forbid closing of the work by indirect means such as requiring a non-disclosure agreement.]

10. License Must Not Be Specific to a Package: The rights attached to the work must not depend on the work being part of a particular package. If the work is extracted from that package and used or distributed within the terms of the work's license, all parties to whom the work is redistributed should have the same rights as those that are granted in conjunction with the original package.

11. License Must Not Restrict the Distribution of Other Works: The license must not place restrictions on other works that are distributed along with the licensed work. For example, the license must not insist that all other works distributed on the same medium are open.

[Comment: Distributors of open knowledge have the right to make their own choices. Note that 'share-alike' licenses are conformant since those provisions only apply if the whole forms a single work.]

http://opendefinition.org/1.0 [February 20, 2009]

Page 9

Protocol for Implementing Open Access Data

1. Intellectual foundation for the protocol

The motivation behind this memorandum is interoperability of scientific data.

The volume of scientific data, and the interconnectedness of the systems under study, makes integration of data a necessity. For example, life scientists must integrate data from across biology and chemistry to comprehend disease and discover cures, and climate change scientists must integrate data from wildly diverse disciplines to understand our current state and predict the impact of new policies.

The technical challenge of such integration is significant, although emerging technologies appear to be helping. But the forest of terms and conditions around data make integration difficult to legally perform in many cases. One approach might be to develop and recommend a single license: any data with this license can be integrated with any other data under this license.

But this approach, which implicitly builds on intellectual property rights and the ideas of licensing as understood in software and culture, is difficult to scale for scientific uses. There are too many databases under too many terms already, and it is unlikely that any one license or suite of licenses will have the correct mix of terms to gain critical mass and allow massive-scale machine integration of data.

Therefore we instead lay out principles for open access data and a protocol for implementing those principles, and we distribute an Open Access Data Mark and metadata for use on databases and data available under a successful implementation of the protocol.

http://sciencecommons.org/projects/publishing/open-access-data-protocol/

Page 10

L. Lessig, “The new Chicago School,” The Journal of Legal Studies, Vol 27 (2) (Pt. 2) June, 1998 pp 661-691.

www.lessig.org/content/articles/works/LessigNewchicschool.pdf

MODALITIES OF CONSTRAINT

Page 11

OECD Follow Up Group on Issues of Access to Publicly Funded Research Data. Promoting Access to Public Research Data for Scientific, Economic, and Social Development: Final Report, March 2003

Page 12

The “DataNet” Program defines “the full data preservation and access lifecycle” as

• “acquisition”
• “documentation”
• “protection”
• “access”
• “analysis and dissemination”
• “migration”
• “disposition”

“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601, US National Science Foundation, Office of Cyberinfrastructure, Directorate for Computer & Information Science & Engineering

Page 13

www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf

Page 14

http://wiki.esipfed.org/images/c/c4/IWGDD.ppt

Page 15

Data – Key Current Conditions

Page 16

Data is now more than ever available in highly diverse formats from very disparate sources:

(Validation of data and critical analysis of data sources is essential.)

Page 17

Digital “Explosion”?

• “The digital universe in 2007 -- at 2.25 x 10^21 bits (281 exabytes or 281 billion gigabytes) -- was 10% bigger than we thought. The resizing comes as a result of faster growth in cameras, digital TV shipments, and better understanding of information replication.

• “By 2011, the digital universe will be 10 times the size it was in 2006.

• “As forecast, the amount of information created, captured, or replicated exceeded available storage for the first time in 2007. Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home.”

The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth through 2011. An IDC Whitepaper www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf
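The unit conversion in the quoted IDC figure can be checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the whitepaper; it assumes decimal (SI) prefixes, so that one gigabyte is 10^9 bytes and one exabyte is 10^18 bytes.

```python
# Sanity check of the quoted figure: 2.25 x 10^21 bits expressed in
# exabytes and gigabytes. Decimal (SI) prefixes are an assumption here.
BITS_PER_BYTE = 8
GIGABYTE = 10**9    # bytes
EXABYTE = 10**18    # bytes

digital_universe_bits = 2.25e21  # value quoted on the slide

size_bytes = digital_universe_bits / BITS_PER_BYTE
size_exabytes = size_bytes / EXABYTE
size_billion_gigabytes = size_bytes / GIGABYTE / 1e9

print(f"{size_exabytes:.2f} exabytes")
print(f"{size_billion_gigabytes:.2f} billion gigabytes")
```

Under these assumptions both values come out at about 281, which agrees with the "281 exabytes or 281 billion gigabytes" in the quotation.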

Page 18

Page 19

• Many data sets, to be fully useful, must be significantly longitudinal

• Thus we must address both legacy data and current / prospective data

• Older data sets, while essential, may be much more problematic
– Russian Chronicles of Nature / zapovedniks
– US LTER Trout Lake, WI example

Page 20

NCAR Research Data Archive (RDA)

C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 7.

www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]

Page 21

The NCAR Research Data Archive (RDA)

“The NCAR Research Data Archive (RDA) is a comparatively small (currently 246 TB, less than 5% of the MSS [Mass Storage System] total size), but very important, part of the MSS stored data. The RDA has been curated by the staff in the Computational and Information Systems Laboratory for over 40 years, [emphasis added] and as such contains reference datasets used by large numbers of scientists. The RDA contents are long-term atmospheric (surface and upper air) and oceanographic observations, grid analyses of observational datasets, operational weather prediction model output, reanalyses, satellite derived datasets, and ancillary datasets, such as topography/bathymetry, vegetation, and land use. The RDA is not a static collection; it is now over 580 datasets with about 100 routinely updated and 10-20 new ones added each year.”

C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 5.

www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]

Page 22

http://www.ncdc.noaa.gov/img/climate/globalwarming/ar4-fig-3-9.gif

Page 23

Big Science / Small Science?

Page 24

The $3.6 billion Large Hadron Collider (LHC) will sample and record the results of up to 600 million proton collisions per second, producing roughly 15 petabytes (15 million gigabytes) of data annually in search of new fundamental particles. To allow thousands of scientists from around the globe to collaborate on the analysis of these data over the next 15 years (the estimated lifetime of the LHC), tens of thousands of computers located around the world are being harnessed in a distributed computing network called the Grid. Within the Grid, described as the most powerful supercomputer system in the world, the avalanche of data will be analyzed, shared, re-purposed and combined in innovative new ways designed to reveal the secrets of the fundamental properties of matter.

Source: http://public.web.cern.ch/public/en/LHC/LHC-en.html

Page 25

[Figure contrasting “BIG Science” and “Small Science”. Labels: Individuals; Local / Personal Archiving; Individual Libraries; Cooperative Projects; National Disciplinary Initiatives; International Collaborative Research Effort; Data Centers; GRIDS.]

Page 26

• Norms and standards for sharing vary by discipline and even within disciplines

• In “big science” (astrophysics / astronomy / meteorology / oceanography / genomics) sharing is expected (if not required) and contributions to a common fund of knowledge are assumed
– Standards are relatively clear
– Mechanisms for sharing are well-developed
– Collective / collaborative authorship is commonplace

• In “small science” such norms are weaker

Page 27

• Data does not respect “sectors”: integral data sets must sometimes be drawn from public / private for-profit / not-for-profit sectors

• [At a recent US NAS hearing the Dow Chemical Company reported that it had several hundred thousand technical reports in a proprietary corporate collection…
– The greatly extended latency (?) of public access to this work seems a serious challenge to a fundamental principle of science]

• Free/ open access and use should be the norm in science -- exceptions to sharing should require special justification

Page 28

“Sakhalin Energy relocates offshore pipelines to protect whales”

30/03/2005

“Yuzhno-Sakhalinsk, Russian Federation, 30 March 2005: Sakhalin Energy will reroute offshore pipelines in its oil and gas development in the Russian Far East to help protect the endangered western gray whale.”

http://www.shell.com/home/content/media/news_and_library/press_releases/2005/sakhalin_energy_relocates_pipeline_30032005.html

Page 29

So… “small science”? (Just a “Cabinet of Curiosities”?)

http://en.wikipedia.org/wiki/Cabinet_of_curiosities [clipped 03 29 09]

"Musei Wormiani Historia", the frontispiece from the Museum Wormianum depicting Ole Worm's cabinet of curiosities.

Page 30

The “small science,” independent investigator approach traditionally has characterized a large area of experimental laboratory sciences, such as chemistry or biomedical research, and field work and studies, such as biodiversity, ecology, microbiology, soil science, and anthropology. The data or samples are collected and analyzed independently, and the resulting data sets from such studies generally are heterogeneous and unstandardized, with few of the individual data holdings deposited in public data repositories or openly shared. The data exist in various twilight states of accessibility, depending on the extent to which they are published, discussed in papers but not revealed, or just known about because of reputation or ongoing work, but kept under absolute or relative secrecy. The data are thus disaggregated components of an incipient network that is only as effective as the individual transactions that put it together. Openness and sharing are not ignored, but they are not necessarily dominant either. These values must compete with strategic considerations of self-interest, secrecy, and the logic of mutually beneficial exchange, particularly in areas of research in which commercial applications are more readily identifiable.

The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Julie M. Esanu and Paul F. Uhlir, Eds. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 8

Page 31

“A mishmash of non-standardized databases of raw results and unevenly reported study designs is not a strong foundation for clinical research data sharing.”

Sim, et al., “Keeping Raw Data in Context” (letter to) Science VOL 323 6 FEBRUARY 2009 www.sciencemag.org

Page 32

Small Science: Data Deposit and Access

• Data are typically held in diverse formats

• Discovery of data is very weakly supported by standards-development

• Access to and use of data are highly variable

• [However, some progress has been made, as in the case of museum specimen data in the past 20 years (see, for example, GBIF and many allied projects)]

• Some progress has also been made respecting observational and other data

• Ecological and conservation field data remain highly problematic

Page 33

Nominal / Descriptive element (Sex) [manuscript -- icon]

Date element (mm-dd-yyyy) [manuscript -- alphanumeric]

Responsibility (collectors) [print -- alpha]

Nominal / Descriptive element (Scientific Name) [manuscript -- alpha]

Spatial element (geographic place name) [manuscript -- alpha]

Address element (Institutional Name) [print -- alpha] + (Specimen #) [manuscript -- numeric]

Responsibility (expedition name) [print -- alpha]

Specimen Label

Page 34

Specimen Label + Verso

Address element (Specimen Field #) [manuscript -- numeric]

Nominal/ Descriptive element (Notes) [manuscript -- alpha]

Page 35

“Darwin Core” – Access Points

1. ScientificName
2. Kingdom
3. Phylum
4. Class
5. Order
6. Family
7. Genus
8. Species
9. Subspecies
10. InstitutionCode
11. CollectionCode
12. CatalogNumber
13. Collector
14. Year
15. Month
16. Day
17. Country
18. State/Province
19. County
20. Locality
21. Longitude
22. Latitude
23. BoundingBox
24. Julian Day

Dave Vieglais Species Analyst 4/20/2000

http://habanero.nhm.ukans.edu/presentations/Gainesville_May2000_files/v3_document.htm
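One way to picture how a transcribed specimen label maps onto these access points is a record keyed by the 24 names listed on the slide. The sketch below is illustrative only: the helper `make_record` and the sample label values are hypothetical, and it is not the schema or API of GBIF or of any Darwin Core software.

```python
# The 24 access points exactly as named on the slide.
DARWIN_CORE_ACCESS_POINTS = [
    "ScientificName", "Kingdom", "Phylum", "Class", "Order", "Family",
    "Genus", "Species", "Subspecies", "InstitutionCode", "CollectionCode",
    "CatalogNumber", "Collector", "Year", "Month", "Day", "Country",
    "State/Province", "County", "Locality", "Longitude", "Latitude",
    "BoundingBox", "Julian Day",
]


def make_record(values):
    """Return a dict with every access point present; unset ones are None.

    Hypothetical helper: rejects keys that are not on the slide's list.
    """
    unknown = sorted(set(values) - set(DARWIN_CORE_ACCESS_POINTS))
    if unknown:
        raise KeyError(f"not a Darwin Core access point: {unknown}")
    return {name: values.get(name) for name in DARWIN_CORE_ACCESS_POINTS}


# Hypothetical transcription of a label; the catalog number is a placeholder.
label = {
    "ScientificName": "Taphozous mauritianus",
    "Genus": "Taphozous",
    "Species": "mauritianus",
    "InstitutionCode": "AMNH",
    "CatalogNumber": "0000",
}

record = make_record(label)
still_empty = [name for name, value in record.items() if value is None]
print(len(record), "access points,", len(still_empty), "still empty")
```

Listing the unfilled access points makes visible how much of a handwritten label may remain uncaptured after a first transcription pass.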

Page 36

GBIF – October, 2008 (as a result of the Darwin Core reductionist data analysis…)

GBIF UDDI Registry
Data Providers: 259
Datasets: 7481
Searchable Records: 147,539,975

http://www.gbif.org/ [clipped Oct 8, 2008]

Page 37

“Small Science” DATA SETS: some examples with “native metadata”

2-d_soil_temps.csv
surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer, subsurface temperatures with a thermocouple.
----------------------------
5-minute_light_data_for_4_continuous_days_plus_reference.xls
PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along the transect are receiving.
----------------------------
CO2_of_air_at_different_heights_July_9.xls
concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the evening.
----------------------------
Fern_light_response.xls
Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction data (below) for physiological characterization of the ferns.
----------------------------
La_Selva_species_photosyntheis_table.xls
incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a shade house in Costa Rica.
----------------------------
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root grown and CO2 production.
----------------------------
moisture_release_curves.xls
percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory for calibration of water content with water potential. soil is from the James Reserve in California.
----------------------------
Photosynthetic_induction.xls
a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as mol CO2 m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.
----------------------------
run_2_24-h_data_for_mesh.xls
measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the forest for about 30 meters. Pyronometers facing up and down, pyrgeometer facing up and down, PAR, air temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for examining edge effects in forests.
----------------------------
Segment_of_wallflower_compare_colorspaces_blur.xls
pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces. segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in images collected after this training data was collected (and used to determine the best color space for this task).

Page 38: "Addressing primary “modalities of constraint" on open and effective access to data in "small science" -- in the context of full-life-cycle data management." National Science Foundation,

sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 Manz3Sap9 Manz4Sap11 timestamp Datagap Julian
2 12.365 1196796112 2018.8 0.5585 0.51029 0.55517 0.54354 0.6067 0.52858 0.55351 0.59008 0.59506 0.60337 0.56514 12/4/07 11:21 4.47351
3 12.348 1196796232 2017.9 0.55682 0.51028 0.5535 0.54352 0.60669 0.52857 0.55017 0.59007 0.59505 0.60336 0.56513 12/4/07 11:23 0 4.47490
4 12.357 1196796352 2018.6 0.55514 0.51027 0.55348 0.54351 0.60501 0.52855 0.55016 0.59005 0.59504 0.60501 0.56512 12/4/07 11:25 0 4.47628
5 12.354 1196796472 2017.6 0.55514 0.51026 0.55181 0.5435 0.60334 0.52855 0.54849 0.59004 0.59503 0.60334 0.56511 12/4/07 11:27 0 4.47767
6 12.334 1196796592 2018.3 0.55347 0.51026 0.55015 0.5435 0.60333 0.52854 0.54682 0.59004 0.59502 0.605 0.56511 12/4/07 11:29 0 4.47906
7 12.34 1196796712 2018.5 0.55014 0.50859 0.55014 0.54349 0.60332 0.53019 0.54349 0.59003 0.59501 0.60498 0.56676 12/4/07 11:31 0 4.48045
8 12.337 1196796832 2017.8 0.55013 0.50692 0.55013 0.54348 0.60332 0.53019 0.54182 0.59002 0.59501 0.60498 0.56675 12/4/07 11:33 0 4.48184
9 12.328 1196796952 2017.5 0.5468 0.50691 0.5468 0.54347 0.60331 0.53018 0.53849 0.59001 0.595 0.60497 0.56674 12/4/07 11:35 0 4.48323
10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462
11 12.328 1196797192 2018.9 0.54679 0.50191 0.54512 0.5418 0.59665 0.53017 0.53349 0.59 0.59498 0.60496 0.56673 12/4/07 11:39 0 4.48601
12 12.319 1196797312 2017.7 0.54345 0.49857 0.54178 0.54178 0.59663 0.53015 0.53015 0.58998 0.5933 0.60327 0.56671 12/4/07 11:41 0 4.48740
13 12.311 1196797432 2017.3 0.54343 0.4969 0.54011 0.54177 0.59661 0.53014 0.52848 0.58997 0.59329 0.6016 0.5667 12/4/07 11:43 0 4.48878
14 12.316 1196797552 2018.6 0.5401 0.49357 0.53678 0.54176 0.59328 0.53013 0.5268 0.58995 0.59328 0.60325 0.56669 12/4/07 11:45 0 4.49017
15 12.31 1196797672 2016.8 0.53844 0.4919 0.53511 0.54176 0.59494 0.53013 0.52514 0.58995 0.59328 0.60325 0.56503 12/4/07 11:47 0 4.49156
16 12.31 1196797792 2017.1 0.53676 0.48856 0.53343 0.54174 0.59326 0.53011 0.5218 0.58993 0.59326 0.60323 0.56501 12/4/07 11:49 0 4.49295
17 12.31 1196797912 2017.1 0.53342 0.48523 0.5301 0.54173 0.59324 0.5301 0.51846 0.58826 0.59324 0.60321 0.56499 12/4/07 11:51 0 4.49434
18 12.301 1196798031 2017.5 0.53174 0.48521 0.52842 0.53839 0.59156 0.53008 0.51845 0.58824 0.59323 0.6032 0.56498 12/4/07 11:53 0 4.49573
19 12.301 1196798151 2016.3 0.53007 0.48188 0.52509 0.53838 0.59155 0.53007 0.51512 0.58823 0.59321 0.60152 0.5633 12/4/07 11:55 0 4.49712
20 12.303 1196798271 2016.6 0.5284 0.47855 0.52175 0.53837 0.59154 0.5284 0.5151 0.58821 0.59154 0.60151 0.56163 12/4/07 11:57 0 4.49851

manzanita_sapflow_12-5-07_to_7-7-08.xls: Instantaneous sap flow data (as temperature differences on a constant-temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. Used to correlate physiological activity with below-ground measures of root growth and CO2 production.

Datum: “0.59998” From an Excel Spreadsheet
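As a minimal sketch of why the bare datum needs its surrounding metadata, the Python below pairs row 10 of the spreadsheet extract above with the slide's column names and looks up which column holds 0.59998. The helper names `label_row` and `locate` are hypothetical, and treating the timestamp column as two whitespace-separated tokens (date and time) is an assumption inferred from the flattened rows, not something the spreadsheet states.

```python
# Column names as listed on the slide.
HEADER = ("sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 "
          "Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 "
          "Manz3Sap9 Manz4Sap11 timestamp Datagap Julian").split()

# Row 10 of the slide's extract, copied verbatim.
ROW_10 = ("10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 "
          "0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462")


def label_row(header, row):
    """Pair each value in a flattened row with its column name (hypothetical helper)."""
    tokens = row.split()
    ts = header.index("timestamp")
    # Assumption: the timestamp is split into a date token and a time token;
    # re-join them so the remaining values line up with the header.
    tokens[ts:ts + 2] = [" ".join(tokens[ts:ts + 2])]
    if len(tokens) != len(header):
        raise ValueError("row does not match header")
    return dict(zip(header, tokens))


def locate(datum, record):
    """Return the column names whose value equals the bare datum (hypothetical helper)."""
    return [name for name, value in record.items() if value == datum]


record = label_row(HEADER, ROW_10)
print(locate("0.59998", record), record["sbid"], record["timestamp"])
```

Under these assumptions the datum resolves to the Manz2Sap5 column of record 10, logged at 12/4/07 11:37; stripped of the header row, the same number carries none of that context.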


Field notes from the AMNH “Lang-Chapin” expedition to the Belgian Congo (1909-1915) http://diglib1.amnh.org/cgi-bin/database/index.cgi


http://diglib1.amnh.org/galleries/bats/taphozous_mauritianus.html


View from the north of the Ngoc Linh Mountain Range in Vietnam's Central Highlands. This image was created by draping a LandSat scene (1998) over a three-dimensional model.

Courtesy AMNH Center for Biodiversity and Conservation


Rheinardia ocellata, the Crested Argus. Photographed at night by an automatic camera-trap in the Ngoc Linh foothills (Quang Nam Province).

Courtesy AMNH Center for Biodiversity and Conservation


• AGS Alto Golfo Sustentable
• ASM American Society of Mammalogists
• CEC Commission for Environmental Cooperation
• CEDO Intercultural Center for the Study of Deserts and Oceans
• CI Conservation International
• CIRVA International Committee for the Recovery of the Vaquita
• CICESE Centro de Investigación Científica y de Educación Superior de Ensenada
• CILA International Boundary and Water Commission
• CITES Convention on International Trade in Endangered Species of Wild Fauna and Flora
• Conagua National Water Commission
• Conanp National Commission for Protected Natural Areas, Semarnat (Comisión Nacional de Áreas Naturales Protegidas, Semarnat)
• Conapesca National Fisheries and Aquaculture Commission, Sagarpa (Comisión Nacional de Pesca y Acuacultura, Sagarpa)
• Profepa Federal Attorney for Environmental Protection
• Secretariat of Agriculture, Livestock, Rural Development, Fisheries, and Food (Mexico)
• Salud Secretariat of Health (Mexico)
• COSEWIC Committee on the Status of Endangered Wildlife in Canada
• Department of Fisheries and Oceans (Canada)
• United States Department of the Interior
• European Cetacean Society
• US Environmental Protection Agency
• US Food and Drug Administration
• GEF Global Environmental
• IBWC International Boundary and Water Commission
• National Institute of Ecology, Semarnat
• Inapesca National Fisheries Institute, Sagarpa
• IUCN World Conservation Union
• International Whaling Commission
• Local Economic and Employment Development program
• United States Marine Mammal Commission

VAQUITA STAKEHOLDERS


• Marine Stewardship Council
• NAMPAN North American Marine Protected Areas Network (CEC)
• US National Academy of Sciences
• North American Wildlife Enforcement Group (CEC)
• US National Marine Fisheries Service, NOAA, Department of Commerce
• US National Oceanic and Atmospheric Administration, Department of Commerce
• United States National Ocean Service (NOAA)
• PACE Species Conservation Action Programs, Conanp
• PGR Attorney General Office (Mexico)
• POEMGC Marine Ecological Planning of the Gulf of California Program, Semarnat
• Procer Conservation Program for Species at Risk
• Secretariat of Economy (Mexico)
• Sectur Secretariat of Tourism (Mexico)
• Sedesol Secretariat for Social Development (Mexico)
• Semar Secretariat of the Navy
• Semarnat Secretariat of the Environment and Natural Resources
• Society for Marine Mammalogy
• Solamac Latin American Society for Aquatic Mammals
• Somemma Mexican Society for Marine Mammalogy
• SWFSC Southwest Fisheries Science Center (US NMFS, NOAA)
• The Nature Conservancy
• Universidad Autónoma de Baja California Sur
• University of California
• United Nations
• United States Coast Guard
• United States Fish and Wildlife Service
• World Wildlife Fund


Illustration by Serge Bloch in NYT: Natalie Angier, “Tracking forest creatures on the move.” NYT Feb 2, 2009. SEE:

http://www.nytimes.com/2009/02/03/science/03angier.html?_r=1&scp=1&sq=tracking%20mammals&st=cse


How many data sources contributed to this analysis?


The Ecology of Data Sharing


Stages of Digital Library Development

Stage | Date | Sponsor | Purpose
I: Experimental | 1994 | NSF/ARPA/NASA | Experiments on collections of digital materials
II: Developing | 1998/1999 | NSF/ARPA/NASA, DLF/CLIR | Begin to consider custodianship, sustainability, user communities
III: Mature | ? | Funded through normal channels? | Real sustainable interoperable digital libraries

Adapted from Howard Besser, “The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries,” First Monday, volume 7, number 6 (June 2002). URL: http://firstmonday.org/issues/issue7_6/besser/index.html


THE ROLE OF SCIENTIFIC AND TECHNICAL DATA AND INFORMATION IN THE PUBLIC DOMAIN PROCEEDINGS OF A SYMPOSIUM Julie M. Esanu and Paul F. Uhlir, Editors Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain Office of International Scientific and Technical Information Programs Board on International Scientific Organizations Policy and Global Affairs Division, National Research Council of the National Academies, p. 5

“Research Commons”

The Public Domain Knowledge Commons

“the institutional ecology of the digital environment” (Yochai Benkler)


References to “Intellectual Property” in U.S. federal cases

“Professor Hank Greely” Cited in Lessig, L. The future of ideas: the fate of the commons in a connected world. NY, Random House, 2001. P. 294.


Julian Birkinshaw and Tony Sheehan, “Managing the Knowledge Life Cycle,” MIT Sloan Management Review, 44 (2) Fall, 2002: 77.

???

Is scientific knowledge a “commodity” ???


Association of University Technology Managers. AUTM U.S. Licensing Activity Survey: FY2006. Survey Summary: A Survey of Technology Licensing (and Related) Activity for U.S. Academic and Nonprofit Institutions and Technology Investment Firms


Reductionists

Current Norms

Expansionists

Maximalists

Intellectual Property Rights

BENEFITS

Differing Interpretations of IPR Regulation

Brotherhood of Painters, Decorators, and Paperhangers of America.; Screen Cartoonists Local Union No. 852 (Hollywood, Calif.); Animation Guild and Affiliated Optical Electronic and Graphic Arts, Local 839 I.A.T.S.E. (North Hollywood, Los Angeles, Calif.); Motion Pictures Screen Cartoonists Local 839, I.A.T.S.E.


Perhaps certain types of “knowledge” (or “cultural properties”) will inevitably be “commodities”?

But this should not restrict us from making most knowledge resources available for productive / creative use


Ethical Context for Sharing Knowledge Resources

• Knowledge Equity as a fundamental good
• Ethos of Science
• Ethos of Conservation
• Human Rights
• Governmental / Organizational Transparency and Accountability
• Civic Responsibility and Science Literacy


“The field of knowledge is the common property of all mankind”

Thomas Jefferson 1807


Edward R. Murrow: “Who owns the patent on this vaccine?”

Jonas Salk: “Well, the people, I would say. There is no patent. Could you patent the sun?”

CBS Television interview, on “See It Now” (12 April 1955) from: http://en.wikiquote.org/wiki/Jonas_Salk


“Another source of tension is the decade-old boom in commercial biotechnology. Walter Gilbert, a Harvard biochemist, says companies have continued to publish descriptions of new strains or genetic discoveries, but, beginning in 1978 or so, they sometimes declined to give out all the information or material. The aim was to guard proprietary interests - staking a claim while keeping competitors at least partially ignorant of the details. Before that, Gilbert says, the rule was considered "absolute" that everything must be made available after publication. "That's how the entire field of immunology developed - through the free exchange of material," he says. Joshua Lederberg of Rockefeller University also has spoken out, saying it may be necessary to reinforce the old standards because people seem to be neglecting them.”

“Data Sharing: A Declining Ethic? -- Commercial pressures and heightened competition are testing the notion that scientific data and materials should be widely shared.” Science vol. 248, p. 952-957, 25 May 1990


ALL knowledge? Or perhaps, an ethical spectrum? – the polemics of support for the Science Knowledge Commons

Human Health Agriculture

Earth Science/Conservation

[ Nuclear Technology ]

[Biotechnology]

Education

Science-Tech


http://www.aaas.org/spp/rd/fy08.htm


http://www.aaas.org/spp/rd/fy08.htm


Agents Instrumental to Sharing

• Individuals
• Professions / Disciplines
• Institutions / Organizations
• Governments / IGOs
• Funding Agencies / Philanthropic Organizations

A comprehensive structure of incentives (and disincentives) is needed.


Cost Benefit Calculations of Change

High Cost

Low Cost

Tangible Personal Benefit

Intangible Societal Benefit

-- Clear, direct benefits

-- Change is easy

-- Communication & information are key

-- Intangible direct benefits

-- Change is easy

-- Ultimate benefit should be stressed

--Convenience is key

Cell C Cell D

Cell B Cell A

-- Clear, direct benefits

--Change is difficult

--Balancing communication with a strong support system is key

-- Intangible, indirect benefits

--Change is difficult

-- Try to reposition into “Cell D” – leveraging enthusiasm / supply-side persuasion

Adapted from VK Rangan et al. “Do better at doing good,” in Harvard Business Review on Non-Profits Harvard, Cambridge, 1999, p. 173 ff.


Personal Incentives for Sharing? (The “Reputational Economy”)

• Ethics and the ethos of conservation or of science
– Ethical imperative
• The “Reputation Economy”
– Personal recognition: priority / prestige (evidence of substantial increases in citation)
– Professional credential for hiring, for promotion and for job security (also requires professional/disciplinary change)


Individual’s willingness to share: Core functions of Scholarly Communication

• “Registration, which allows claims of precedence for a scholarly finding.

• “Certification, which establishes the validity of a registered scholarly claim.

• “Awareness, which allows participants in the scholarly system to remain aware of new claims and findings.

• “Archiving, which preserves the scholarly record over time.

• “Rewarding, which rewards participants for their performance in the communication system based on metrics derived from that system.”

Roosendaal, H., Geurts, P in Cooperative Research Information Systems in Physics (Oldenburg, Germany, 1997).


The Benefits of Open Access

“The influence of OA is more modest than many have proposed, at ~8% for recently published research, but our work provides clear support for its ability to widen the global circle of those who can participate in science and benefit from it.”

J. A. Evans and J. Reimer, Open access and global participation in science. Science v. 323 20 February, 2009 p. 1025.


University of Peshawar Library

Founded in 1951

Librarian Mr. Riaz Ahmad

Total Volumes 150,000

Urdu  

English  

Other languages  

Microfilms 39

Periodicals 200

Audio-Visual section

Manuscripts 800

Other facilities  

Address:1 Administration Block University of Peshawar, Peshawar-25120, Pakistan

Tel: (+92-91)921-6483

Fax: (+92-91)921-4670

Telex:  


NWFP University of Engineering and Technology Library

Founded in 1952

Librarian Abdur Raseed

Total Volumes 66,000

Urdu  

English  

Other languages  

Microfilms 34

Periodicals Scientific Journals and periodicals collection

Audio-Visual section

Manuscripts  

Other facilities  

Address: P.O. Box 814 University Campus , Peshawar, Pakistan

Tel: (+92-91)40573

Fax:  

Telex:  


The Social Enterprise Spectrum

Spectrum: Purely Philanthropic / (intermediate) / Purely Commercial

Motives: Appeal to Goodwill / Mixed Motives / Appeal to Self Interest

Methods: Mission Driven / Mission and Market Driven / Market Driven

Goals: Social Value / Social and Economic Value / Economic Value

JG Dees, “Enterprising Non-profits” in Harvard Business Review on Non-Profits Harvard, Cambridge, 1999, p.147


Full-Life-Cycle Strategy


• Data deposition/acquisition/ingest – Provide systems, tools, procedures, and capacity for efficient data and metadata deposition by authors and others; acquisition from appropriate sources; and ingest in accordance with well-developed and transparent policies and procedures that are responsive to community needs, maximize the potential for re-use, and ensure preservation and access over a decades timeline.

• Data curation and metadata management – Provide for appropriate data curation and indexing, including metadata deposition, acquisition and/or entry and continuing metadata management for use in search, discovery, analysis, provenance and attribution, and integration. Develop and maintain transparent policies and procedures for ongoing collection management, including deaccessioning of data as appropriate.

• Data protection – Provide systems, tools, policies, and procedures for protecting legitimate privacy, confidentiality, intellectual property, or other security needs as appropriate to the data type and use.

• Data discovery, access, use, and dissemination – Provide systems, tools, procedures, and capacity for discovery of data by specialist and non-specialist users, access to data through both graphical and machine interfaces, and dissemination of data in response to users needs.

• Data interoperability, standards, and integration – Promote the efficient use and continuing evolution of existing standards (e.g. ontologies, semantic frameworks, and knowledge representation strategies). Support community-based efforts to develop new standards and merge or adapt existing standards. Provide systems, tools, procedures, and capacity to enhance data interoperability and integration.

• Data evaluation, analysis, and visualization – Provide systems, tools, procedures, and capacity to enable data driven visual understanding and integration and to enhance the ability of diverse users to evaluate, analyze, and visualize data.

“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601 US National Science Foundation Office of Cyberinfrastructure

Directorate for Computer & Information Science & Engineering


8”

Migration / Emulation ???


objet trouvé – gutter, 10th & Colorado, Santa Monica, California

Preservation


Future research…?

• Data Citation standards?
– M. Altman and G. King “A Proposed Standard for the Scholarly Citation of Quantitative Data” D-Lib Magazine March/April 2007 Vol.13:3/4 http://www.dlib.org/dlib/march07/altman/03altman.html
• URI’s for data sets?
• “De-combination” algorithms? + formulae for proportional accreditation…
– GBIF Digital Publishing Framework Task Group (DPFTG)
• Disciplinary / domain ontologies?
• Reductionist analyses of optimal categories for discovery?
• “Native” metadata / automated metadata extractions?


Kirtland’s Warbler / Abaco Island, The Bahamas


and 5 CALIFORNIA CONDORS!

DEAD HARBOR SEAL

“NATIVE” METADATA


http://www.mikero.com/blog/2009/02/20/more-darwin

http://www.zazzle.com/darwin2009