Slide show for the inaugural iEvoBio meeting in Portland, OR, in 2010
Biodiversity Discovery and Documentation in the Information and Attention Age
Presented by: Rob Guralnick
Authors: Rob Guralnick and Andrew Hill
Contributors: Meredith Lane, Dan Janies, Walter Jetz, and lots of other folks.
Funding support: Global Biodiversity Information Facility, National Biological Information Infrastructure, Defense Advanced Research Projects Agency, National Science Foundation.
#ievobio
WHAT IS BIODIVERSITY DISCOVERY AND DOCUMENTATION?
ACTION: Discovering and documenting new units of biodiversity
IMPEDIMENT: Linnean shortfall (too few taxonomists, antiquated and laborious process)
ACTION: Discovering and documenting distributions of lineages
IMPEDIMENT: Wallacean shortfall (very coarse resolution, scattered data, no integration)
ACTION: Discovering and documenting relationships among lineages
IMPEDIMENT: Darwinian shortfall (trees scattered in literature, no “mother of all trees”)
ACTION: Discovering and documenting lineage traits from genomes to phenotype
IMPEDIMENT: Multiple repositories storing genetic and phenotypic data do not communicate well. Phenotypic knowledge-bases lag behind.
From the State of Observed Species report, http://species.asu.edu/files/SOS2010.pdf
Pace of Species Description and Documentation for 2008
Approx. 1.922 million named species (all taxa)
~4-30 million undiscovered
Assuming a relatively conservative number (e.g. 10 million undescribed species), it will take another 360 years to discover and document them at our current pace. Why is discovery and documentation so slow?
1. Taxonomists proceed in the same manner today as they did one hundred years ago.
2. Few products are generated along the way. This also means the process is vulnerable (from the loss of computers to the loss of taxonomists themselves).
3. Discovery and documentation are coupled.
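The 360-year estimate quoted above can be unpacked with simple arithmetic. The slide does not state the annual description rate; the sketch below back-computes the rate implied by the slide's own figures (10 million undescribed species, 360 years) and applies it to the 4-30 million range. The function name and the constant-rate assumption are illustrative only.

```python
# Figures taken from the slide.
ASSUMED_UNDESCRIBED = 10_000_000   # "relatively conservative" estimate
YEARS_QUOTED = 360                 # time to describe them at the current pace

# Implied average pace (species per year). This is derived here,
# not stated on the slide, and assumes the pace stays constant.
implied_rate = ASSUMED_UNDESCRIBED / YEARS_QUOTED


def years_to_describe(undescribed, rate=implied_rate):
    """Years needed to describe `undescribed` species at a constant rate."""
    return undescribed / rate


# Apply the same pace to the low, middle, and high estimates on the slide.
for n in (4_000_000, 10_000_000, 30_000_000):
    print(f"{n:>10,} undescribed -> about {years_to_describe(n):,.0f} years")
```

Under these assumptions, even the lowest estimate of undiscovered species would take well over a century to document.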
State and Scale of Knowledge in Environmental Sciences
[Figure: data products plotted by scale (grain), from World, Continents/Realms, and Ecoregion down through 200 km, 50 km, 1 km, 100 m, and 1 m. Columns: Topography; Landcover (current); Landcover (future); Vertebrate distributions. Labelled products: 1996: GTOPO30; 2009: SRTM V4; 2006: WWF; 2005-9: IUCN, misc. atlas data, surveys; 2003: GLC2000; 2009: GlobCover; 1992: BIOME; 2001: Image 2.2; regional models. Annotation: Knowledge Gap.]
Slide from Walter Jetz (thanks Walter)
90 meter resolution SRTM elevation data for a portion of Colorado
100 times as coarse
1000 times as coarse
A view of the world at different resolutions
Distribution Knowledge Is Scattered
• Points are from GBIF data portal
• Expert opinion range map from IUCN Red List
• IUCN also lists some habitat preferences (cropland, meadows, mountain valleys)
Microtus montanus
Documenting our biodiversity matters because it is under increasing threat.
“Overall, we are locked into a race. We must hurry to acquire the knowledge on which a wise policy of conservation and development can be based for centuries to come.”
- E. O. Wilson
HOW DO WE DO THIS?
DEVELOP KNOWLEDGEBASES OF SPECIES DISTRIBUTIONS AND SPECIES RELATIONSHIPS.
PROVIDE MEANS TO INTEGRATE ACROSS THESE KNOWLEDGE-BASES
PROVIDE TOOLS TO RAPIDLY AND EASILY EXPLORE THESE DATA ACROSS SPACE AND TIME
MAKE THIS A COMMUNITY EFFORT – LEVERAGE COMMUNITY SOURCING
Raw global data: lineage, occurrence, environmental
Initial Research Questions
Analytical methods: means to summarize data and select hypotheses
Phylo-, Biodiversity and Ecological Informatics
New Research Questions
Processed global data: species, distributions, new environmental layers
Growing data and information repositories
Tools encoding analytical methods
Application services: automated workflow for biodiversity science
Growing Toolbox
Concepts and ideas
Concepts and ideas: Tree of Life; Population Genetics
Growing toolbox: Paup, Phyml/Raxml, MrBayes, Beast, Mesquite, etc.; TCS/NCA, MsBayes, BayesSCC, Structure, etc.
Growing data repositories, formats: GenBank, TreeBase (Nexus/Newick/PhyloXML, etc.)

Concepts and ideas: Earth Surface; Climate; Ecosystem fluxes
Growing toolbox: Satellites (Modis/GOES/Landsat, etc.); Satellites, historical and current in-situ, GCMs, etc.; Infrared Imaging Spectrometer, etc.
Instrument-based raw; statistical/inferential; inference-based
Growing data repositories, formats: Satellite image repositories, Worldclim, PRISM, PMIP (erdapp, netCDF, GIS formats)

Concepts and ideas: Species named; Species traits; Species distributions
Growing toolbox: TaxonX, automated species name extraction; Lucid, Ontologies, RDF; GIS, habitat suitability models, SDMs/ENMs, Survey Gap, etc.
Growing data repositories, formats: ITIS, Catalog of Life, Zoobank, Zookeys, etc.; Morphbank, TraitNet, etc.; GBIF, VertNet, OBIS (species occurrence), Map of Life, IUCN
Observations and model-based
From Peterson et al., in press, Systematics and Biodiversity
The Interconnected Nature of Biodiversity Ideas, Outputs, Repositories
DECOUPLING SPECIES DISCOVERY AND DOCUMENTATION
(OR GET IT OUT THERE FOR OTHERS TO USE AND REPURPOSE)
(OR CLAIM NEW BIODIVERSITY, PROVISIONALLY, BEFORE FORMAL PUBLICATION)
Generate new data from specimens
GenBank
Morphbank
TreeBase
Link new unit of biodiversity onto tree of life (claim discovery)
Formal publication (documentation)
Comparative analyses
Publish step 1 repositories
Publish step 2
Community sourcing
Scratch-pads
Life-desks
TAKE HOME MESSAGE 1:
We need to use the web as a collaborative work environment for biodiversity knowledge generation
We need to claim knowledge of the existence of new species before all of the formal steps to document it are complete
We need to publish new data about species soon after generation and prior to publication
Questions:
• How are drug resistant strains of H5N1 circulating around the globe?
• How did drug resistance arise in the H5N1 population?
• Are mutations that give rise to drug resistance in H5N1 under positive selection?
• Can we provide ways for researchers and the general public to track this spread in near real time?
Hosts and strains of avian influenza A
What about monitoring an evolving Earth System?
Tracking the spread of disease lineages with known important mutations through time & space
Viral structure
Methods:
• Collect public genome data for H5N1 avian influenza (676 full genomes).
• Use tools for more efficient alignment and phylogenetic analysis of data
• Test whether mutations in the M2 gene (L26I, V27A/I, A30S, S31N) that provide resistance to adamantanes (a class of drugs used to treat influenza A) are under positive selection, purifying selection, or are neutral (across the full sampled population of H5N1 influenza A)
• Make Google Earth™ visualizations available
Global View of Spread of H5N1 (blue branches are lineages with mutation for higher transmissibility among mammals)
Resistant mutant found at position 31 of the M2 protein – colored red below
Altitude of node X = a + [(n − 1) × b]
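The altitude rule above is easy to check with a worked example. The slide does not define a, n, or b; the sketch below assumes a is a base altitude in meters, b is a fixed altitude increment per level, and n is a node's level counted from 1. These interpretations, the function name, and the sample values are all assumptions for illustration.

```python
def node_altitude(n, a, b):
    """Altitude of a tree node: a + (n - 1) * b.

    Assumed meanings (not defined on the slide):
      n: level of the node, counted from 1
      a: base altitude in meters
      b: altitude increment per level in meters
    """
    if n < 1:
        raise ValueError("n is assumed to start at 1")
    return a + (n - 1) * b


# Hypothetical values: 10 km base altitude, 5 km per level.
for level in (1, 2, 3):
    print(level, node_altitude(level, a=10_000, b=5_000))
```

With these sample values, a node at level 1 sits at the base altitude and each further level is raised by one increment, which keeps the branches of a tree drawn on a globe visually separated.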
dN/dS measurements across the M2 protein (high dN/dS ratios (>1) suggest that more non-synonymous substitutions are occurring than expected and are therefore likely being maintained in the population)
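The interpretation of the dN/dS ratio described above can be written as a small helper. This is a simplified sketch, not the analysis used in the study: the `tolerance` band around 1 is a hypothetical parameter, the input values are made up, and a real test of selection would rely on a statistical method rather than a fixed cutoff.

```python
def classify_selection(dn, ds, tolerance=0.1):
    """Return (omega, label) for non-synonymous and synonymous rates.

    omega = dN/dS. Values well above 1 suggest positive selection,
    values well below 1 suggest purifying selection, and values near 1
    are consistent with neutrality. `tolerance` is an illustrative cutoff.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return omega, "positive selection"
    if omega < 1 - tolerance:
        return omega, "purifying selection"
    return omega, "neutral"


# Hypothetical per-site estimates, for illustration only.
for dn, ds in [(0.5, 0.25), (0.1, 0.4), (0.2, 0.2)]:
    omega, label = classify_selection(dn, ds)
    print(f"dN={dn}, dS={ds}, dN/dS={omega:.2f}: {label}")
```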
Table 2. Amantadine use in chicken farms in Northern China in 1 year (from October 2004 to September 2005). From He, 2007, Antiviral Research.
Farm   No. of chickens   Days of medication in 1 year   Dosage (%; w/w)   Route of administration
A      8,300             37                             0.03              Feedstuff
B      15,600            26                             0.025             Feedstuff
C      10,400            43                             0.022             Feedstuff
D      26,100            21                             0.01              Drinking water
E      7,200             64                             0.015             Drinking water
F      13,300            25                             0.032             Feedstuff
G      4,300             63                             0.01              Drinking water
H      21,700            38                             0.012             Drinking water
I      5,400             42                             0.025             Feedstuff
J      14,700            59                             0.01              Drinking water
So What Did We Find Out?
• Drug resistance to adamantanes is under positive selection for at least some mutations (S31N and V27A/I).
• Drug resistant lineages can spread quickly across the globe
• Emergence of drug resistance has been through mutation not recombination and hitch-hiking (results not shown)
• Effectively treating a potential H5N1 pandemic depends on continued monitoring of the evolution and spread of resistance to adamantanes and oseltamivir (Tamiflu)
TAKE HOME MESSAGES 2:
• It is possible to not just develop observing systems of species but of evolving lineages.
• These monitoring or observing systems can provide a unique view into evolution, selection and adaptation.
• Such systems are essential for more accurate forecasting.
• Developing such a system means creating automated workflows.
http://geophylo.appspot.com/
Hill and Guralnick, in press, Ecography
Google App Engine application
WHAT ABOUT ALLOWING OTHERS TO MAKE THEIR OWN GEOPHYLOGENY?
GeoPhylo Engine - Written in Python, open source, and deployed on Google App Engine.
Advantages of cloud-based deployment:
• Scalable (computing resources grow with demand)
• All versioning kept intact so developers can easily link to the latest version
• Storage of persistent KMLs for users who want to share and modify their KMLs.
• Easily deployable as a web service
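To make the idea of a persistent, shareable KML concrete, here is a minimal sketch of writing one tree branch as a KML line with altitude. It is not the GeoPhylo Engine's code: the function names, the coordinates, and the branch label are hypothetical, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def branch_placemark(name, parent, child):
    """Build a KML Placemark for one branch.

    `parent` and `child` are (longitude, latitude, altitude_m) tuples.
    """
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    line = ET.SubElement(placemark, "LineString")
    # "absolute" makes the viewer honor the altitude values.
    ET.SubElement(line, "altitudeMode").text = "absolute"
    ET.SubElement(line, "coordinates").text = " ".join(
        f"{lon},{lat},{alt}" for lon, lat, alt in (parent, child)
    )
    return placemark


def build_kml(branches):
    """Wrap (name, parent, child) branches in a KML document string."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for name, parent, child in branches:
        document.append(branch_placemark(name, parent, child))
    return ET.tostring(kml, encoding="unicode")


# Hypothetical branch from an internal node down to a sampled tip.
kml_text = build_kml(
    [("example branch", (-105.27, 40.01, 20000), (116.4, 39.9, 10000))]
)
print(kml_text)
```

The resulting string can be saved with a .kml extension and opened in a KML viewer; a web service would return the same text in its response.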
TAKE HOME MESSAGES 3:
Geophylogenies provide rich visualizations of multidimensional data that can be examined at multiple spatial (and temporal) scales
Such visualizations may appeal beyond our community of evolutionary biologists to the broader scientific and policy community
Automated approaches and workbench-oriented tools allow for updating, community-driven content to be generated
Our ultimate goal should be an ever-growing “mother of all trees” from which we can attach new “twigs” as we discover them.
Can We Really Track Distributions of Lineages Through Space and Time?
Map of Life Will:
• Provide expert opinion range maps for almost all terrestrial vertebrates (and means to accumulate more maps for other taxa)
• Provide means for the community to annotate those maps
• Assemble point occurrences, habitat preference data and environmental data (e.g. climate, landcover, soil)
• Provide a modeling approach to generate much finer scale distribution models (on the order of a kilometer resolution)
Overlaying expert opinion maps and model outputs
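One simple way to picture such an overlay is to clip a coarse expert range map with a finer habitat suitability surface. The sketch below is illustrative only and is not Map of Life's modeling approach: the grids, the 0.5 threshold, and the function name are assumptions.

```python
def refine_range(expert_range, suitability, threshold=0.5):
    """Keep cells inside the expert range whose suitability meets the threshold.

    `expert_range` is a grid of 0/1 flags and `suitability` a grid of
    scores between 0 and 1, both with the same shape.
    """
    return [
        [bool(inside) and score >= threshold
         for inside, score in zip(range_row, score_row)]
        for range_row, score_row in zip(expert_range, suitability)
    ]


# Hypothetical 2 x 3 grids for illustration.
expert_range = [
    [1, 1, 0],
    [1, 1, 1],
]
suitability = [
    [0.9, 0.2, 0.8],
    [0.6, 0.4, 0.7],
]

refined = refine_range(expert_range, suitability)
for row in refined:
    print(["#" if cell else "." for cell in row])
```

Cells outside the expert range are dropped even when habitat looks suitable, and cells inside the range are dropped when suitability is low, so the result is never larger than the expert map.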
[Diagram: Map of Life linked to biodiversity encyclopedias, species occurrence databases, national biological data, online conservation tools, environmental data, and ITIS. Items exchanged: point occurrences, valid taxonomies, range maps, validation services, species data.]
Map of Life Connections
- Common data model for range maps
- Web-services based for sharing maps
- Focus on improvement through both modeling and community involvement
Integrating phylogenetic and distributional data in Google Earth™
Workflows combining phylogenetic approaches, conservation status, and species occurrence
TAKE HOME MESSAGES 4:
Map of Life fills a critical gap in our global biodiversity knowledge by integrating different sources of species distribution into high resolution range maps for community use.
The ultimate goal is to integrate such species distribution knowledge with knowledge about relationships among species and conservation knowledge
Such integration, at global scale, and across large taxonomic groups, is the next step forward
Patterns, Predictions, Relational Modeling
Community Sourcing and the Attention Age
At the heart of the message here today is also a challenge:
The vision here suggests that data publishing and “sharing” is as important as academic “kudos”
Can we act for the collective good of our community and, by so doing, see gains for all?
Let's change our model of credit!