This talk describes the potential of semantic web technology to make the practice of taxonomy easier. It was presented at the 2011 Phycological Society of America conference in Seattle, WA, USA.
The Future of Microalgal Taxonomy
Anne Thessen, [email protected]
Patterson, [email protected]
(Data Conservancy, Life Sciences)
Scientist’s Dream
Computer, what is the trajectory of the planet Ceti Alpha 5?
Taxonomist’s Dream
How many algal species can be found on this planet?
Taxonomist’s Dream
What species is this?
Setting the stage for a ‘big new biology’
• BIG = data-centric (like particle physics and astronomy)
• Characterized by data sharing via a virtual pool
• New = new skill sets, tools, cyber-infrastructure to exploit the data pool
• Data-driven discovery as a new means of understanding
• GenBank as a model within the Life Sciences
Small science: a large number of providers with small amounts of data.
Big science: a small number of providers with lots of data.
Aa paleacea
Limulus polyphemus
Kiwa hirsuta
Osedax frankpressi
Kingia australis
Names
Pieris japonica
Pieris rapae
Trypanosoma brucei
Homo sapiens
Many names for one taxon
Didimosphenia geminata
Didymosphenia geminata
Rock snot
Didymo
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
Reconciliation Group
Didymosphenia geminata
Didimosphenia geminata
Didymo
Rock Snot
Echinella geminata
Gomphonema geminatum
Gomphonema vulgare
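The reconciliation group above can be read as a lookup table: any name string in the group resolves to one preferred name. The following is a minimal sketch of that idea, not the Global Names implementation. The function names, the `INDEX` variable, and the choice of the first entry as the preferred name are assumptions made for illustration.

```python
import unicodedata

# Reconciliation group copied from the slide. Treating the first entry as the
# preferred name is an assumption of this sketch.
RECONCILIATION_GROUPS = [
    [
        "Didymosphenia geminata",
        "Didimosphenia geminata",
        "Didymo",
        "Rock snot",
        "Echinella geminata",
        "Gomphonema geminatum",
        "Gomphonema vulgare",
    ],
]


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


def build_index(groups):
    """Map every normalized name string to its group's preferred name."""
    index = {}
    for group in groups:
        preferred = group[0]
        for name in group:
            index[normalize(name)] = preferred
    return index


def reconcile(name, index):
    """Return the preferred name for a known string, or None if unknown."""
    return index.get(normalize(name))


INDEX = build_index(RECONCILIATION_GROUPS)
```

Calling `reconcile("Gomphonema vulgare", INDEX)` returns the preferred name of the group, while a string outside every group returns `None`. Exact lookup after normalization does not catch new misspellings; those would need fuzzy matching, which this sketch leaves out.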
One name for many taxa
Cyclophora Castracane 1878 (Cyclophora tenuis)
Cyclophora Hübner 1822 (Cyclophora porata)
Contextual data
Diatom, Chloroplast, Frustule, Benthic, Marine
Disambiguate by authority, species, contextual data
Contextual data
Food, Moth, Wings, Exoskeleton, Caterpillar
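Disambiguating a homonym by contextual data can be sketched as a term-overlap score: the words around the name are compared with the context terms attached to each candidate. The example below is a rough illustration under that assumption. The record layout, the `disambiguate` function, and the pairing of context terms with authorities are hypothetical, with the terms taken from the two "Contextual data" slides.

```python
# Two hypothetical records for the homonym "Cyclophora". The context terms are
# the ones shown on the slides; the record layout is an assumption.
CANDIDATES = [
    {
        "name": "Cyclophora",
        "authority": "Castracane 1878",
        "context": {"diatom", "chloroplast", "frustule", "benthic", "marine"},
    },
    {
        "name": "Cyclophora",
        "authority": "Hübner 1822",
        "context": {"food", "moth", "wings", "exoskeleton", "caterpillar"},
    },
]


def disambiguate(name, text, candidates=CANDIDATES):
    """Pick the candidate whose context terms overlap most with the text.

    Returns None when no candidate shares a term with the text or when the
    best score is tied, so that unclear cases are left for a human expert.
    """
    words = {w.strip(".,;:()").lower() for w in text.split()}
    scored = sorted(
        ((len(c["context"] & words), c) for c in candidates if c["name"] == name),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]
```

A sentence mentioning a benthic marine diatom resolves to the Castracane 1878 record, and one mentioning a moth and its caterpillar resolves to the Hübner 1822 record. Authority strings and species epithets, when present, are stronger evidence than word overlap and would be checked first in practice.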
Global Names Architecture
Provider Services
DATA AND SERVICE CONSUMERS
DATA AND SERVICE PROVIDERS
EXPERTS
Consumer Services
GNA
Names-based cyberinfrastructure
• Managing names to manage biodiversity data
- All names (scientific, vernacular, surrogate)
- For all organisms
- Many names for one species reconciled
- One name for many species disambiguated
• Global Names Architecture - a virtual layer, using names services to link together distributed data
• Globalnames.org
• Micro*scope (microscope.mbl.edu) and Encyclopedia of Life (eol.org)
Legacy Data
• Narrative tradition in biology
• Too much for a human
• Can we get a machine to do the work?
• NLP!!!
Legacy Data
• Use NLP/machine learning to extract names and characters
• Hong Cui
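As a much simpler stand-in for the NLP and machine learning approach named above, name extraction can be sketched as a pattern match checked against a genus lexicon. This is a dictionary lookup, not the method used in the work the slide refers to. The `KNOWN_GENERA` set, the `BINOMIAL` pattern, and the `find_names` function are assumptions made for illustration.

```python
import re

# Hypothetical genus lexicon; a real system would draw on a names index.
KNOWN_GENERA = {"Didymosphenia", "Spirogyra", "Gomphonema", "Cyclophora"}

# A capitalized word followed by a lowercase word: the shape of a binomial.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")


def find_names(text, genera=KNOWN_GENERA):
    """Return binomial-shaped strings whose first word is a known genus."""
    return [
        f"{genus} {epithet}"
        for genus, epithet in BINOMIAL.findall(text)
        if genus in genera
    ]
```

The lexicon check removes most ordinary sentence openings, but a known genus followed by a common word (for example "Spirogyra filaments") still yields a false positive, which is one reason trained models are attractive for this task.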
Legacy Data
• Spirogyra:chloroplasts:present
Legacy Data
• Spirogyra:chloroplasts:present:attribution
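The colon-separated statements above can be turned into structured records with a few lines of code. The sketch below assumes the `taxon:character:state[:attribution]` layout shown on the slides. The `Statement` type, the `parse_statement` function, and the `"unattributed"` default are hypothetical.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    taxon: str
    character: str
    state: str
    attribution: str  # who or what the statement came from


def parse_statement(line: str, default_attribution: str = "unattributed") -> Statement:
    """Parse 'taxon:character:state[:attribution]' into a Statement."""
    # Split at most three times so the attribution may itself contain colons.
    parts = [p.strip() for p in line.split(":", 3)]
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"expected taxon:character:state, got {line!r}")
    attribution = parts[3] if len(parts) == 4 and parts[3] else default_attribution
    return Statement(parts[0], parts[1], parts[2], attribution)
```

Keeping the attribution as a field of its own means that each extracted character can be traced back to its source, which matters when statements from many legacy documents are pooled.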
Coffee Ontology
coffee is a drink
Existing Ontology
Semantic Web
Data Discovery and Aggregation
Future Data
Triple Store
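A triple store holds subject, predicate, object statements such as "coffee is a drink" and answers pattern queries over them. The toy in-memory version below is only a sketch of the idea and is not tied to any real triple store product or to RDF syntax. The class and method names are assumptions.

```python
class TripleStore:
    """A toy in-memory triple store; None in a query acts as a wildcard."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return all stored triples matching the pattern, in sorted order."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )

    def is_a(self, subject, target):
        """Follow 'is a' links transitively from subject, looking for target."""
        seen, frontier = set(), [subject]
        while frontier:
            node = frontier.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            frontier.extend(o for _, _, o in self.query(node, "is a"))
        return False
```

For example, after adding `("coffee", "is a", "drink")` and a hypothetical `("espresso", "is a", "coffee")`, the call `is_a("espresso", "drink")` follows both links and returns `True`. Taxon statements such as `("Spirogyra", "chloroplasts", "present")` can sit in the same store and be found with `query(subject="Spirogyra")`.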
The New Workforce
• Informatics/computing training
• Modified workflows
• Importance of data management and preservation
In Summary
• Big New Biology is coming, and taxonomy can benefit from being a part of it
• Existing data can be made machine-readable using information extraction algorithms
• Existing workflows can be modified to capture data close to the source
• Data can be shared using the semantic web
Acknowledgments
• Dima Mozzherin
• David Shorthouse
• Sayeed Choudhury
• Pete DeVries