Integrating Phenotypic Data With Genomic, Genetic and Genotypic Data Using Chado
Sook Jung, Taein Lee, Stephen Ficklin, Jing Yu, Dorrie Main
Outline
• Introduction of GDR and CottonGen
• Chado, the generic schema
• Storing Stock Data
• Storing Phenotypic Data (trait, dataset, etc.)
• Storing Genotypic Data
• Integration with Genetic and Genomic Data
• Conclusion
Database projects of Main lab
Major databases with genomic, genetic, phenotypic and genotypic data
1. GDR: Genome Database for Rosaceae
Genomic, Genetic and Breeding data (private data and data from the RosBreed project)
• Fruit and Nut, Sat, 12 PM
• Computer Demo, Mon, 1:35 PM
• P0946, RosBreed BIM System, Mon, 10-11:30 AM
2. CottonGen: Replaced CottonDB and Cotton Marker Database
• Cotton Genome Initiative, Sun, 3:50 PM
• Computer Demo, Mon, 1:50 PM
Other databases:
Citrus Genome Database, Cool Season Food Legume Database, Genome Database for Vaccinium
Built using Chado schema and Tripal (Drupal front end for Chado)
Tripal presentation, GMOD workshop, Wed 11:50 AM
Chado: Modular, Generic and Ontology-driven schema
Modules: general, cv, pub (publication), organism, map, genetic, mage, companalysis, sequence, stock, phenotype, natural diversity
Chado: Modular, Generic and Ontology-driven schema
feature: feature_id, name, uniquename, type_id, organism_id, residues
  type_id examples: gene, mRNA, marker, QTL, etc.
feature_relationship: feature_relationship_id, subject_id, object_id, type_id
  Example: Abc-mRNA (subject_id) part_of Abc-gene (object_id)
featureprop: featureprop_id, feature_id, type_id, value, rank
  type_id examples: Repeat_motif, Product_size
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
cv: cv_id, name, definition
  Examples: Sequence Ontology, Gene Ontology, etc.
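The way type_id ties every feature and relationship back to a cvterm can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the tables are heavily reduced stand-ins for the Chado tables on the slide, the cv name `demo_cv` and the helper names `build_feature_demo` and `describe_relationships` are hypothetical, and all terms are placed in a single cv for brevity.

```python
import sqlite3

# Reduced stand-ins for the Chado cv, cvterm, feature and
# feature_relationship tables (real Chado has more columns and constraints).
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type_id INTEGER);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER, object_id INTEGER, type_id INTEGER);
"""

def build_feature_demo():
    """Create the demo tables and load the Abc-mRNA part_of Abc-gene example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical cv holding every term used below.
    conn.execute("INSERT INTO cv VALUES (1, 'demo_cv')")
    conn.executemany("INSERT INTO cvterm VALUES (?, 1, ?)",
                     [(1, "gene"), (2, "mRNA"), (3, "part_of")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                     [(1, "Abc-gene", 1), (2, "Abc-mRNA", 2)])
    # subject_id = Abc-mRNA, object_id = Abc-gene, type_id = part_of
    conn.execute("INSERT INTO feature_relationship VALUES (1, 2, 1, 3)")
    return conn

def describe_relationships(conn):
    """Resolve every type_id through cvterm so rows read as plain statements."""
    return conn.execute("""
        SELECT s.uniquename, st.name, rt.name, o.uniquename, ot.name
        FROM feature_relationship fr
        JOIN feature s ON s.feature_id = fr.subject_id
        JOIN feature o ON o.feature_id = fr.object_id
        JOIN cvterm st ON st.cvterm_id = s.type_id
        JOIN cvterm ot ON ot.cvterm_id = o.type_id
        JOIN cvterm rt ON rt.cvterm_id = fr.type_id
    """).fetchall()
```

Because the feature type and the relationship type are both rows in cvterm, a new kind of feature or relationship needs a new term, not a new table.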
Storing Stock (from samples to population; pedigree)
stock: stock_id, name, uniquename, type_id, organism_id
  type_id examples: population, cultivar, breeding line, clone, sample, etc.
stock_relationship: stock_relationship_id, subject_id, object_id, type_id
  Example: Gala-001 (subject_id) sample_of Gala (object_id)
  Example (pedigree): Gala maternal_parent_of Sonya
stockprop: stockprop_id, stock_id, type_id, value
  type_id examples: description, population_size
stockcollection: stockcollection_id, name, uniquename, type_id, contact_id
  Example: stock center
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
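The sample and pedigree examples from this slide can be sketched as rows in stock and stock_relationship. This is a minimal illustration, assuming reduced versions of the tables; the helper names `build_stock_demo` and `related_stocks` are hypothetical and not part of Chado.

```python
import sqlite3

def build_stock_demo():
    """Load Gala, its sample Gala-001 and its offspring Sonya into reduced tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT, type_id INTEGER);
        CREATE TABLE stock_relationship (
            stock_relationship_id INTEGER PRIMARY KEY,
            subject_id INTEGER, object_id INTEGER, type_id INTEGER);
    """)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "cultivar"), (2, "sample"),
                      (3, "sample_of"), (4, "maternal_parent_of")])
    conn.executemany("INSERT INTO stock VALUES (?, ?, ?)",
                     [(1, "Gala", 1), (2, "Gala-001", 2), (3, "Sonya", 1)])
    conn.executemany("INSERT INTO stock_relationship VALUES (?, ?, ?, ?)",
                     [(1, 2, 1, 3),   # Gala-001 sample_of Gala
                      (2, 1, 3, 4)])  # Gala maternal_parent_of Sonya
    return conn

def related_stocks(conn, object_name, relationship):
    """Return every subject S for which 'S <relationship> object_name' is stored."""
    rows = conn.execute("""
        SELECT s.uniquename
        FROM stock_relationship sr
        JOIN stock s ON s.stock_id = sr.subject_id
        JOIN stock o ON o.stock_id = sr.object_id
        JOIN cvterm t ON t.cvterm_id = sr.type_id
        WHERE o.uniquename = ? AND t.name = ?
        ORDER BY s.uniquename
    """, (object_name, relationship)).fetchall()
    return [row[0] for row in rows]
```

Asking for `related_stocks(conn, "Sonya", "maternal_parent_of")` walks one step up the pedigree, while the same function with `"sample_of"` lists the samples taken from a cultivar.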
Storing phenotype data (from measurements to projects)
stock: stock_id, name, uniquename, type_id, organism_id
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
  type_id examples: phenotyping, genotyping, cross_experiment
nd_geolocation: nd_geolocation_id, description, latitude, longitude, geodetic_datum
phenotype: phenotype_id, uniquename, value, attr_id
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
project (properties: type_id, value)
project_relationship
Linking tables: NE_stock (nd_experiment_stock), NE_phenotype (nd_experiment_phenotype), NE_project (nd_experiment_project)
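How a single measurement is reached from a stock through nd_experiment and its linking tables can be sketched as one join. The tables below are reduced versions of those on the slide, and the project name `demo_project`, the location `demo orchard`, and the function names are hypothetical placeholders, not values from the source data.

```python
import sqlite3

def build_phenotype_demo():
    """Create reduced natural diversity tables holding one phenotyping experiment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE nd_geolocation (nd_geolocation_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE nd_experiment (
            nd_experiment_id INTEGER PRIMARY KEY,
            nd_geolocation_id INTEGER, type_id INTEGER);
        CREATE TABLE phenotype (
            phenotype_id INTEGER PRIMARY KEY, uniquename TEXT,
            value TEXT, attr_id INTEGER);
        CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
        CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER, phenotype_id INTEGER);
        CREATE TABLE nd_experiment_project (nd_experiment_id INTEGER, project_id INTEGER);
    """)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "phenotyping"), (2, "SkinCol_0")])
    conn.execute("INSERT INTO stock VALUES (1, 'Gala-001')")
    conn.execute("INSERT INTO project VALUES (1, 'demo_project')")        # hypothetical
    conn.execute("INSERT INTO nd_geolocation VALUES (1, 'demo orchard')")  # hypothetical
    conn.execute("INSERT INTO nd_experiment VALUES (1, 1, 1)")
    conn.execute("INSERT INTO phenotype VALUES (1, 'Gala-001_SkinCol_0', '2', 2)")
    conn.execute("INSERT INTO nd_experiment_stock VALUES (1, 1)")
    conn.execute("INSERT INTO nd_experiment_phenotype VALUES (1, 1)")
    conn.execute("INSERT INTO nd_experiment_project VALUES (1, 1)")
    return conn

def phenotypes_for_stock(conn, stock_name):
    """Return (project, location, trait, value) for every measurement on a stock."""
    return conn.execute("""
        SELECT pr.name, g.description, t.name, p.value
        FROM stock s
        JOIN nd_experiment_stock nes ON nes.stock_id = s.stock_id
        JOIN nd_experiment e ON e.nd_experiment_id = nes.nd_experiment_id
        JOIN nd_geolocation g ON g.nd_geolocation_id = e.nd_geolocation_id
        JOIN nd_experiment_project nep ON nep.nd_experiment_id = e.nd_experiment_id
        JOIN project pr ON pr.project_id = nep.project_id
        JOIN nd_experiment_phenotype nph ON nph.nd_experiment_id = e.nd_experiment_id
        JOIN phenotype p ON p.phenotype_id = nph.phenotype_id
        JOIN cvterm t ON t.cvterm_id = p.attr_id
        WHERE s.uniquename = ?
    """, (stock_name,)).fetchall()
```

The experiment row is the hub: stock, phenotype, project and location never reference each other directly, which is what lets the same structure hold genotyping and cross experiments as well.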
Storing phenotype data (enabling comparison among datasets)
stock: stock_id, name, uniquename, type_id, organism_id
nd_experiment
phenotype: phenotype_id, uniquename, value, attr_id
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
cv: cv_id, name, definition
cvtermprop: cvtermprop_id, cvterm_id, type_id, value, rank
Example: attr_id: SkinCol_0, value: 2
SkinCol_0 is a cvterm in the RB cv; its scoring scale is stored in cvtermprop:
value        rank
Orange       1
Orange-red   2
Pink-red     3
Red          4
Dark red     5
If skin_color_harvest is scored 1-10 in the Standard cv, the same observation can be stored again under the standard descriptor:
attr_id: Skin_color_harvest, value: 4
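Decoding a stored score through the scale held in cvtermprop can be sketched as follows. The scale rows are the ones shown on the slide; the reduced table layouts and the helper names `build_scale_demo` and `decode_score` are assumptions made for illustration. The conversion from the RB scale to the Standard scale is not shown, because the slide does not give the mapping.

```python
import sqlite3

def build_scale_demo():
    """Store the SkinCol_0 scale of the RB cv as cvtermprop rows (value + rank)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
        CREATE TABLE cvtermprop (
            cvtermprop_id INTEGER PRIMARY KEY, cvterm_id INTEGER,
            value TEXT, rank INTEGER);
    """)
    conn.execute("INSERT INTO cv VALUES (1, 'RB')")
    conn.execute("INSERT INTO cvterm VALUES (1, 1, 'SkinCol_0')")
    labels = ["Orange", "Orange-red", "Pink-red", "Red", "Dark red"]
    conn.executemany(
        "INSERT INTO cvtermprop (cvterm_id, value, rank) VALUES (1, ?, ?)",
        [(label, rank) for rank, label in enumerate(labels, start=1)])
    return conn

def decode_score(conn, cv_name, trait_name, score):
    """Translate a numeric score into its label, or None if the scale lacks it."""
    row = conn.execute("""
        SELECT cp.value
        FROM cvtermprop cp
        JOIN cvterm t ON t.cvterm_id = cp.cvterm_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE cv.name = ? AND t.name = ? AND cp.rank = ?
    """, (cv_name, trait_name, score)).fetchone()
    return row[0] if row else None
```

Keeping the scale next to the trait term means a phenotype value such as 2 stays interpretable even when another dataset scores the same trait on a different scale.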
Genotypic data integrated with genomic/genetic data
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
genotype: genotype_id, name, uniquename, description
  Example: uniquename: CPSCT038_190|192, description: 190:192
feature: feature_id, name, uniquename, type_id, organism_id, residues
  Example: uniquename: CPSCT038, type: microsatellite
Linking tables: NE_genotype (nd_experiment_genotype), feature_genotype
Also linked: project, stock, map
Explore sequences around the marker in GBrowse
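The link from a stock to its genotype and on to the marker feature can be sketched as below. The marker and genotype values come from the slide; assigning that genotype to the stock Gala-001 is an assumption made only so the join has something to return, and the reduced tables and function names are hypothetical.

```python
import sqlite3

def build_genotype_demo():
    """Create reduced tables linking one stock, one genotype and one marker."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type_id INTEGER);
        CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT, description TEXT);
        CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);
        CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY, type_id INTEGER);
        CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
        CREATE TABLE nd_experiment_genotype (nd_experiment_id INTEGER, genotype_id INTEGER);
    """)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "microsatellite"), (2, "genotyping")])
    conn.execute("INSERT INTO stock VALUES (1, 'Gala-001')")  # assumed owner of the genotype
    conn.execute("INSERT INTO feature VALUES (1, 'CPSCT038', 1)")
    conn.execute("INSERT INTO genotype VALUES (1, 'CPSCT038_190|192', '190:192')")
    conn.execute("INSERT INTO feature_genotype VALUES (1, 1)")
    conn.execute("INSERT INTO nd_experiment VALUES (1, 2)")
    conn.execute("INSERT INTO nd_experiment_stock VALUES (1, 1)")
    conn.execute("INSERT INTO nd_experiment_genotype VALUES (1, 1)")
    return conn

def genotypes_for_stock(conn, stock_name):
    """Return (marker, marker type, alleles) for each genotype scored on a stock."""
    rows = conn.execute("""
        SELECT f.uniquename, t.name, g.description
        FROM stock s
        JOIN nd_experiment_stock nes ON nes.stock_id = s.stock_id
        JOIN nd_experiment_genotype neg ON neg.nd_experiment_id = nes.nd_experiment_id
        JOIN genotype g ON g.genotype_id = neg.genotype_id
        JOIN feature_genotype fg ON fg.genotype_id = g.genotype_id
        JOIN feature f ON f.feature_id = fg.feature_id
        JOIN cvterm t ON t.cvterm_id = f.type_id
        WHERE s.uniquename = ?
    """, (stock_name,)).fetchall()
    # The description on the slide separates alleles with ':' (190:192).
    return [(marker, mtype, desc.split(":")) for marker, mtype, desc in rows]
```

Because the marker is an ordinary feature row, anything already attached to it, such as map positions or genome locations, becomes reachable from the genotype without extra tables.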
Relationship between genotype and phenotype (haplotype and haplotype effect)
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
genotype: genotype_id, name, uniquename, description
  Example: uniquename: MA_H3|H4b, description: H3|H4b
feature: feature_id, name, uniquename, type_id, organism_id, residues
  Example: uniquename: Ma, type: MTL
phenotype: phenotype_id, uniquename, value, attr_id
  Example: attr_id: crisp, value: 2.2
phenstatement: phenstatement_id, type_id, genotype_id, phenotype_id, environment_id, pub_id
Linking tables: NE_genotype, NE_phenotype, feature_genotype
Also linked: project, stock, map
Germplasm with the H3|H4b alleles of the MA locus has a value of 2.2 for crisp.
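The haplotype effect statement above can be sketched as a phenstatement row that joins a genotype to a phenotype. This is a reduced illustration: the environment and publication columns are left out, and the term name `haplotype_effect` and the function names are hypothetical.

```python
import sqlite3

def build_phenstatement_demo():
    """Store 'genotype MA_H3|H4b has crisp = 2.2' as one phenstatement row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT, description TEXT);
        CREATE TABLE phenotype (
            phenotype_id INTEGER PRIMARY KEY, uniquename TEXT,
            value TEXT, attr_id INTEGER);
        CREATE TABLE phenstatement (
            phenstatement_id INTEGER PRIMARY KEY, type_id INTEGER,
            genotype_id INTEGER, phenotype_id INTEGER);
    """)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "crisp"), (2, "haplotype_effect")])  # second term is hypothetical
    conn.execute("INSERT INTO genotype VALUES (1, 'MA_H3|H4b', 'H3|H4b')")
    conn.execute("INSERT INTO phenotype VALUES (1, 'MA_H3|H4b_crisp', '2.2', 1)")
    conn.execute("INSERT INTO phenstatement VALUES (1, 2, 1, 1)")
    return conn

def effects_of_genotype(conn, genotype_uniquename):
    """Return (trait, numeric value) pairs asserted for a genotype."""
    rows = conn.execute("""
        SELECT t.name, p.value
        FROM phenstatement ps
        JOIN genotype g ON g.genotype_id = ps.genotype_id
        JOIN phenotype p ON p.phenotype_id = ps.phenotype_id
        JOIN cvterm t ON t.cvterm_id = p.attr_id
        WHERE g.uniquename = ?
    """, (genotype_uniquename,)).fetchall()
    # Values are stored as text here, so convert them for numeric use.
    return [(trait, float(value)) for trait, value in rows]
```

Unlike the nd_experiment route, which records what was measured on a stock, phenstatement records a general claim about a genotype, so it applies to any germplasm carrying those alleles.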
Conclusion
• The flexibility and generic design of Chado enable us to store and integrate complex biological data from widely different projects and species.
• The ontology-driven design makes adding new data types relatively easy.
• Performance issues are mostly resolved by the use of materialized views.
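The idea behind a materialized view, precomputing an expensive multi-table join into a flat table that is refreshed on demand, can be sketched as follows. SQLite has no native materialized views, so this sketch emulates one with a plain table; on PostgreSQL, `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` are the native equivalents. The table name `stock_trait_mview`, the reduced schema and the function names are hypothetical, and nothing here is meant to describe the mechanism the databases above actually use.

```python
import sqlite3

# The join that would otherwise run on every page request.
MVIEW_SELECT = """
    SELECT s.uniquename AS stock, t.name AS trait, p.value AS value
    FROM stock s
    JOIN nd_experiment_stock nes ON nes.stock_id = s.stock_id
    JOIN nd_experiment_phenotype nph ON nph.nd_experiment_id = nes.nd_experiment_id
    JOIN phenotype p ON p.phenotype_id = nph.phenotype_id
    JOIN cvterm t ON t.cvterm_id = p.attr_id
"""

def build_base_tables():
    """Create reduced base tables with one measurement on one stock."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, value TEXT, attr_id INTEGER);
        CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
        CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER, phenotype_id INTEGER);
    """)
    conn.execute("INSERT INTO cvterm VALUES (1, 'crisp')")
    conn.execute("INSERT INTO stock VALUES (1, 'Gala-001')")
    conn.execute("INSERT INTO phenotype VALUES (1, '2.2', 1)")
    conn.execute("INSERT INTO nd_experiment_stock VALUES (1, 1)")
    conn.execute("INSERT INTO nd_experiment_phenotype VALUES (1, 1)")
    return conn

def refresh_mview(conn):
    """Rebuild the flat table from the join; searches then read only this table."""
    conn.execute("DROP TABLE IF EXISTS stock_trait_mview")
    conn.execute("CREATE TABLE stock_trait_mview AS " + MVIEW_SELECT)
    conn.commit()

def mview_rows(conn):
    """Read the precomputed rows without touching the base tables."""
    return conn.execute(
        "SELECT stock, trait, value FROM stock_trait_mview ORDER BY value"
    ).fetchall()
```

The trade-off is staleness: rows added to the base tables after a refresh do not appear in the flat table until `refresh_mview` runs again.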
Acknowledgement
Natural diversity module working group
Naama Menda, Seth Redmond, Robert M. Buels, Maren Friesen, Yuri Bendana, Lacey-Anne Sanderson, Hilmar Lapp, Taein Lee, Bob MacCallum, Kirstin E. Bett, Scott Cain, Dave Clements, Lukas A. Mueller and Dorrie Main
Main Lab team
Taein Lee, Stephen Ficklin, Chun-Huai Cheng, Ping Zheng, Anna Blenda, Sushan Ru, Dorrie Main, Jing Yu
All Project CoPIs (tfGDR, RosBreed and CottonGen)
Funding Sources
USDA NIFA SCRI, NSF Plant Genome Program, USDA-ARS, Washington Tree Fruit Research Commission, Cotton Incorporated, Washington State University, Clemson University, University of Florida, Boyce Thompson Institute, North Carolina State University