Generation Challenge Programme Workshop, 13th January 2014, at the Plant and Animal Genomics Conference, San Diego, USA, 11-15th January 2014
Elizabeth Arnaud1*, Luca Matteis1, Marie Angelique Laporte1, Herlin Espinosa2, Glenn Hyman2, Rosemary Shrestha3, Arlett Portugal4, Pierre Yves Chibon5, Medha Devare6, Akinnola Akintunde7, Jeffrey W. White8, Mark Wilkinson9, Caterina Caracciolo10, Fabrizio Celli10, Graham McLaren4
1Bioversity International, France; 2International Center for Tropical Agriculture (CIAT), Colombia; 3Genetic Resources Program (GRP), Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Mexico; 4Generation Challenge Programme (GCP) c/o CIMMYT; 5UR Plant Breeding, Univ. of Wageningen, The Netherlands; 6International Maize and Wheat Improvement Center - South Asia Regional Office (CIMMYT-SARO), Nepal; 7International Black Sea University (IBSU), Georgia; 8Arid-Land Agricultural Research Center, USDA-ARS, USA; 9Centro de Biotecnología y Genómica de Plantas UPM-INIA, Spain; 10Food and Agriculture Organization (FAO) of the United Nations, Office for Partnership, Italy
The Crop Ontology: a resource for enabling access to breeders’ data
http://www.cropontology.org
CGIAR Crop Lead Centers
Since 2008
The scientific context
Understand the relationships between plant genotype and environment, develop adaptive traits that respond to biotic and abiotic stress, promote adequate agronomic practices for cultivation, and understand the heritability of adaptive traits
The Knowledge domain: plant breeding
[Figure: Dimensions of a phenotype. Labels: Environmental Conditions (Light, Water, Nutrients, Temperature, Soil); Molecular, Physiological, Chemical, Developmental, Agronomic; Socio-Economic; Cultural; Time]
Understanding the GxE interaction and the
heritability of adaptive traits
High Throughput Data Generation needs standardized trait concepts
• Next Generation Sequencing (NGS) platforms for detailed analysis of the largest plant genomes
• Phenotyping platforms measure a wide range of structural and functional plant traits while collecting meticulous metadata on the environment and experimental setup [Fiorani and Schurr, 2013]
• Genome-wide association studies (GWAS) typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits
Developing the Crop Ontology content as
a Community of Practice
Harmonization and access to data
Maize Kernel Colour
Rice grain or caryopsis colour
Bean pod color
• Breeders’ data are often unstructured: complex free text is used to describe phenotypes
• No semantic coherence:
  • The same trait is given different names by different scientists
  • One trait name is used for various species but refers to different plant structures
• Data and metadata are NOT interoperable and often not online
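The naming problem above can be sketched in a few lines of Python. This is only an illustration: the `SYNONYMS` table, the `resolve` helper and the `TERM:...` identifiers are hypothetical placeholders, not real Crop Ontology IDs. The lookup key combines crop and local name, so the same local name can point to different concepts in different crops.

```python
from typing import Optional

# Hypothetical lookup table: (crop, local trait name) -> harmonized term ID.
# The TERM:... identifiers are placeholders, not real Crop Ontology IDs.
SYNONYMS = {
    ("maize", "kernel colour"): "TERM:maize_kernel_colour",
    ("maize", "grain colour"): "TERM:maize_kernel_colour",
    ("rice", "caryopsis colour"): "TERM:rice_grain_colour",
    ("rice", "grain colour"): "TERM:rice_grain_colour",
    ("bean", "pod color"): "TERM:bean_pod_colour",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so free-text names compare equal."""
    return " ".join(text.lower().split())


def resolve(crop: str, local_name: str) -> Optional[str]:
    """Return the harmonized term ID, or None if the name is not yet mapped."""
    return SYNONYMS.get((normalise(crop), normalise(local_name)))
```

Unmapped names return `None`, which in practice would be the cue to submit a new term to the curators.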
‘Fruit colour‘
Integrated Breeding Platform www.integratedbreeding.net
• One-stop shop for services to design and carry out breeding projects – Integrated breeding workflow
• Breeders’ databases share a common schema and are being published online
• IB Fieldbook is available with a standard list of traits per crop
Phenotype
It is a composite of an entity (e.g. fruit) and an attribute (e.g. shape) with a value (e.g. round):
Entity + Attribute = Trait
Entity + (Attribute + Value) = Phenotype (observed)
fruit + (shape + round) = fruit shape round
-> round fruit is the phenotype
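A minimal Python sketch of this composition, assuming two illustrative classes (`Trait` and `Phenotype` are names chosen here, not part of the Crop Ontology tools):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trait:
    """Entity + Attribute = Trait."""
    entity: str      # e.g. "fruit"
    attribute: str   # e.g. "shape"

    def label(self) -> str:
        return f"{self.entity} {self.attribute}"


@dataclass(frozen=True)
class Phenotype:
    """Entity + (Attribute + Value) = Phenotype (observed)."""
    trait: Trait
    value: str       # e.g. "round"

    def label(self) -> str:
        return f"{self.trait.label()} {self.value}"

    def short_label(self) -> str:
        # Value followed by entity, e.g. "round fruit"
        return f"{self.value} {self.trait.entity}"


fruit_shape = Trait(entity="fruit", attribute="shape")
observed = Phenotype(trait=fruit_shape, value="round")
```

Keeping the trait separate from the value means the same trait can be reused across observations while only the value changes.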
A range of controlled vocabularies
From the controlled vocabularies, build valid semantic ontologies consumable by Web 2.0, following best practices
Web 2.0
Crop Ontology
• Crop Ontology is primarily an application ontology for fieldbooks
• A visualization tool supporting community-based development of trait dictionaries and crop-specific ontologies
• Compare and validate terms in common
Rosemary Shrestha, CIMMYT CO coordinator until 2012
Community based development process
• Domain experts (breeders, pathologists, agronomists, etc.) and data managers identify the list of concepts
• For a variety evaluation project, data managers and breeders produce the IB Fieldbook template with the traits and submit new terms
• Crop ontology curators in the Crop Lead centers curate, validate, compile the list and upload on the site
• The Global Crop Ontology Curator curates the crop ontology with the Crop Lead Centers’ curators
• Web development expert maintains the site
Crop curators and associated scientists
Crop Ontology themes
General germplasm information
Phenotype and traits
Plant anatomy and development
Location and environment
Trial management and experimental design
Structural and functional genomics
Traits and Phenotypes
Crop Ontology www.cropontology.org
14 GCP crops:
• Banana • Cassava • Chickpea • Common beans • Cowpea • Groundnut • Maize
• Pearl millet • Pigeon Pea • Potato • Rice • Sorghum • Wheat • Yam
For 2014, adding: Barley, Lentil, Soybean, Sweet Potato
Ontology Engineering
• With OBO-Edit - http://oboedit.org/
• Creating multi-relationships between concepts
• Cross-referencing with Plant Ontology and Trait Ontology
Trait Description
Crop Trait Dictionary Template simple to share with breeders
Submission: Name of submitting scientist; Institution; Language of submission; Date of submission; Bibliographic Reference; Comments
Trait: Crop Name; Name of Trait; Abbreviated name; Synonyms (separate by commas); Trait ID for modification, Blank for New; Description of Trait; How is this trait routinely used?; Trait Class
Method: Method ID; Name of Method; Describe how measured (method); Growth Stage; Field, greenhouse
Scale: Scale ID; Type of Measure (Continuous, Discrete or Categorical)
  For Continuous: units of measurement, reporting units, minimum, maximum
  For Discrete: Name of scale or units of measurement
  For Categorical: Name of rating scale, Class # - value = meaning
[Diagram cardinality labels between the template blocks: 1, n, n, 1]
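The three scale types in the template suggest a simple value check. The sketch below is an assumption-laden illustration: the `Scale` class and its field names are hypothetical, and treating a discrete scale as whole-number counts is an interpretation, not something the template states.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Scale:
    """Illustrative scale record; field names are assumptions, not the template's."""
    scale_id: str
    kind: str                                  # "continuous", "discrete" or "categorical"
    unit: Optional[str] = None                 # units of measurement
    minimum: Optional[float] = None            # continuous scales only
    maximum: Optional[float] = None            # continuous scales only
    classes: Dict[str, str] = field(default_factory=dict)  # class value -> meaning

    def is_valid(self, value) -> bool:
        """Check one recorded value against this scale."""
        if self.kind == "continuous":
            try:
                number = float(value)
            except (TypeError, ValueError):
                return False
            if self.minimum is not None and number < self.minimum:
                return False
            if self.maximum is not None and number > self.maximum:
                return False
            return True
        if self.kind == "discrete":
            # Assumption: discrete values are whole-number counts.
            return isinstance(value, int) and not isinstance(value, bool)
        if self.kind == "categorical":
            return str(value) in self.classes
        raise ValueError(f"unknown scale type: {self.kind}")


weight_g = Scale("S1", "continuous", unit="g", minimum=0, maximum=50)
colour = Scale("S2", "categorical", classes={"1": "white", "2": "yellow"})
count = Scale("S3", "discrete", unit="pods per plant")
```

A fieldbook import could run such a check on every cell before the data are stored.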
Online visualization of Trait dictionaries
Methods & Scales for annotations
• Precomposed relationships between Traits, Methods and Scales are required for annotations in phenotype databases
• Ongoing discussion on revising the structure to separate the three (Trait, Method, Scale) into three namespaces
Methods & scales for the standard lists of the Breeders’ fieldbook
Visualization & download In Crop database and Fieldbook template
The site is easy to use: partners have published their trait ontologies
Grape
Soybean
Solanaceae
France
Barley
Multilingual versions of the crop ontologies
Multiple languages
Experimental design ontology
• CROP - PLANTING
• SEED TREATMENT
• IRRIGATION
• FERTILIZER
• PESTICIDE
• SOIL
• BIOTIC STRESS
• ABIOTIC STRESS
• HARVEST-YIELD
Trial management tasks
Akinnola Akintunde, International Black Sea Univ. (IBSU), Georgia: development of the ontology and fieldbook
Medha Devare, CSISA-Nepal Coordinator, CIMMYT-SARO: design of the fieldbook and coordination
Dictionary for Trial Management Concepts
From Medha Devare, CSISA-Nepal Coordinator, CIMMYT-SARO
Environmental Ontology
Jeffrey W. White, Research Plant Physiologist & Research Leader, Arid-Land Agricultural Research Center, USDA-ARS, Arizona, USA
Sheryl Porter, Coordinator, Computer Research Applications, University of Florida, Gainesville, FL, USA
Environment Ontology and Trial management Ontology
• Improve the current list of concepts
• International Consortium for Agricultural Systems Applications (ICASA)
• Integration of a master list of 600 variables for describing crop management and recording plant responses
• ICASA promotes the use of standards for crop field research and for ecophysiological models
• One objective is the application of ICASA variables by the Agricultural Model Intercomparison and Improvement Project (AgMIP) (http://www.agmip.org/)
Environmental Ontology
Synchronization with the Crop databases and IBWS
Synchronization of Crop Ontology with Integrated Breeding Workflow
Graham McLaren, Generation Challenge Programme
Rebecca Berrigan, Efficio Technology Service
Luca Matteis, CO Web Site developer, Bioversity International
Arlett Portugal, IBP Data Management Leader
Harold Durufle, CO curator, Bioversity International
Application Programming Interface (API)
• Developed by Luca Matteis
• Provides access services to third-party web sites or software
• Supports open collaboration and use of the Crop Ontology
Local Databases Breeders & Data Managers
Crop Database Data Manager
Fieldbook Template
Breeders’ Trait Dictionaries
Curation of the Crop Ontology
Cross-referencing terms with Plant Ontology & Trait Ontology
Submission of new traits through the term tracker
Data Annotation & new terms addition
IBWS - Key elements of the Logical Data Model to store phenotypic data
Annotation for storing phenotypic data in the IBWS
Property (Trait) - CO_ID
Method - CO_ID
Scale - CO_ID (continuous, discrete, categorical)
Class1-value - CO_ID
Class2-value - CO_ID
Class3-value - CO_ID
A unique combination of IDs for P+M+S+C = A Standard Variable
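One way to picture the uniqueness rule is a registry keyed on the combination of IDs. This is a sketch under stated assumptions, not the IBWS data model: `StandardVariableRegistry` and its methods are hypothetical names, and the IDs used with it would be whatever the ontology assigns.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Key: (property ID, method ID, scale ID, set of class-value IDs)
Key = Tuple[str, str, str, FrozenSet[str]]


class StandardVariableRegistry:
    """Hypothetical registry: one standard variable per unique P+M+S+C combination."""

    def __init__(self) -> None:
        self._by_key: Dict[Key, str] = {}

    @staticmethod
    def make_key(property_id: str, method_id: str, scale_id: str,
                 class_ids: Iterable[str] = ()) -> Key:
        # frozenset: the order in which class values are listed does not matter
        return (property_id, method_id, scale_id, frozenset(class_ids))

    def register(self, name: str, property_id: str, method_id: str,
                 scale_id: str, class_ids: Iterable[str] = ()) -> Key:
        key = self.make_key(property_id, method_id, scale_id, class_ids)
        existing = self._by_key.get(key)
        if existing is not None and existing != name:
            raise ValueError(f"combination already registered as {existing!r}")
        self._by_key[key] = name
        return key

    def lookup(self, property_id: str, method_id: str, scale_id: str,
               class_ids: Iterable[str] = ()) -> Optional[str]:
        key = self.make_key(property_id, method_id, scale_id, class_ids)
        return self._by_key.get(key)
```

Registering the same combination under a second name raises an error, which mirrors the idea that the combination itself identifies the variable.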
Is_a_valid_value_of
Data
Term ID
Controlled vocabulary
Requires 3 namespaces
Synchronization flow: the IBWS accepts updates sent by crop ontologies
Schema from Rebecca Berrigan, Efficio LLC
Synchronization flow
Schema from Rebecca Berrigan, Efficio LLC
The Crop Ontology accepts new additions from local ontologies
The Crop Ontology web site: a concept name server on the Cloud
Luca Matteis, Web developer, Bioversity International
Crop Ontology
API access by 3rd Party Web sites
[Diagram: sites and tools accessing the Crop Ontology through the API: Genotype Data MS; EU-SOL Solanaceae Breeding DB, Wageningen; IB Fieldbook; Agtrials - CCAFS, International cassava DB; Phenomics Ontology Driven DB (PODD); IBP Crop Databases]
Global Agricultural Trial Repository and database www.agtrials.org
Glenn Hyman, geographer, CIAT
Herlin R. Espinosa G., web developer, CIAT
Luca Matteis, Web developer, Bioversity International
Global Agricultural Trial Repository http://www.agtrials.org/
1,029 trials for Cassava
• To store evaluation data files described with metadata
• To produce an Atlas of the trials
1. Annotating Evaluation data files
2. Searching evaluation data files
Agtrials uses the Crop Ontology trait terms
3. Display the Trial Information
Access to the definition of the Trait in
the Crop Ontology
Luca Matteis, CO Web developer, Bioversity International
Fred Okono, IBP Project Administrator
Brandon Tooke, IBP web developer
Integration of Crop Ontology in IBP
CO Semantic Web Compliance
Luca Matteis, CO Web developer, Bioversity International
Marie Angelique Laporte, Ontology development, RDF & SKOS conversion, Bioversity International
Mark Wilkinson, Centro de Biotecnología y Genómica de Plantas UPM-INIA, Spain
Linked Open Data Cloud
• A term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information and knowledge
• It builds upon standard Web technologies such as HTTP, RDF and URIs
• Rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers
• This enables data from different sources to be connected and queried
(Source: Wikipedia)
Crop Ontology in the Linked Open Data recommended format
• Conversion from OBO to RDF/SKOS with resolvable HTTP URIs
• A conversion into Simple Knowledge Organization System (SKOS) is under way
<http://www.cropontology.org/rdf/CO_324:0000002> a skos:Concept ;
    rdfs:label "Flag leaf weight"@en ;
    dc:creator _:b1 ;
    skos:definition "Weight of the flag leaf (the one just below the panicle)." ;
    skos:inScheme co:sorghum ;
    skosxl:prefLabel [ a skosxl:Label ;
        co:acronym [ a skosxl:Label ;
            skosxl:literalForm "FLGWT" ] ;
        skosxl:literalForm "Flag leaf weight"@en ] .
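A rough Python sketch of how one term could be written out in this shape. It is not the actual conversion code: `term_to_turtle` and `turtle_literal` are hypothetical helpers, only the base URI and prefix names are taken from the snippet above, and the output is a fragment that still needs the matching `@prefix` declarations (omitted here, as in the snippet) and leaves out the `skosxl` labels.

```python
BASE = "http://www.cropontology.org/rdf/"  # base URI as seen in the snippet above


def turtle_literal(text: str, lang: str = "") -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def term_to_turtle(term_id: str, name: str, definition: str, scheme: str) -> str:
    """Render one term as a skos:Concept fragment (prefix declarations not included)."""
    lines = [
        f"<{BASE}{term_id}> a skos:Concept ;",
        f"    rdfs:label {turtle_literal(name, 'en')} ;",
        f"    skos:definition {turtle_literal(definition)} ;",
        f"    skos:inScheme co:{scheme} .",
    ]
    return "\n".join(lines)


fragment = term_to_turtle(
    "CO_324:0000002",
    "Flag leaf weight",
    "Weight of the flag leaf (the one just below the panicle).",
    "sorghum",
)
```

For real data an RDF library would be the safer choice, since it handles escaping and prefixes consistently; the string version only shows the mapping from term fields to SKOS properties.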
Linked Open Data publishing and Aligning Crop Ontology with
AGROVOC
Caterina Caracciolo, Food and Agriculture Organization (FAO), AIMES, Italy
Fabrizio Celli, Food and Agriculture Organization (FAO), AIMES, Italy
Luca Matteis, Bioversity International
Marie Angelique Laporte, Bioversity International
AGROVOC - Agricultural Thesaurus
• 32,000 concepts organized in a hierarchy
• Each concept may have labels in up to 22 languages
• Now available as a published linked data set, aligned (linked) with several vocabularies
Release of AGRIS 2.0 agris.fao.org
• AGRIS bibliographic records contain rich metadata and are largely indexed by AGROVOC, FAO’s multilingual thesaurus
AGRIS 2.0 and Phenotypic Data
• AGRIS 2.0 uses the Linked Open Data methodology to link various sources of data in the mash-up site
• Proof of concept done with the collecting mission database of Bioversity International
• 3 steps:
1. The AGRIS datasets were converted to RDF, creating some 200 million triples. AGROVOC was aligned to other thesauri.
2. SPARQL endpoints, web services and APIs were discovered.
3. AGRIS RDF was interlinked, using AGROVOC LOD as a backbone, to external datasets.
• Align Crop Ontology with AGROVOC in SKOS/RDF
• Promote the publishing of phenotypic data into RDF
• Objective: retrieve bibliographic references and data from phenotypic databases in the mash-up site
Partners collaborating on informatics and integration formats
• IB Fieldbook and IBWS teams and Efficio LLC
• Plant Breeding dept. of Wageningen for the Resource Description Framework (RDF)
• CIAT-DAPA, for the synchronization of The Global Repository of Evaluation trials (Agtrials) of CCAFS
• FAO-AIMES for the use of Linked Open data with AGRIS 2.0
Partners collaborating on content engineering and looking forward to a Reference Ontology for plants
• Plant Ontology, Jaiswal Lab., Oregon State University, USA
• Soybase, USDA-ARS, USA
• Solanaceae Genomic Network (SGN), USA
• Cornell University, USA
• Institut National de la Recherche Agronomique (INRA), France
• Centro de Biotecnología y Genómica de Plantas UPM-INIA, Spain
• POLAPGEN, Poland
• Australian Plant Phenomics Data Repository
Any questions, please contact us
Poster #981 Plant Genomics Outreach Booth # 305
Send an email to: [email protected] [email protected] [email protected]