26
Ontological Implications from Sequencing the Global Rice Collections Kenneth L. McNally Sr. Scientist II A Team 4 Computational Genomics T.T. Chang Genetic Resources Center International Rice Research Institute Los Baños, Laguna, Philippines IRRI

Ontological Implications from ... - Plant Ontology Wikiwiki.plantontology.org/images/f/f2/KMcNally_IRRI.pdf · • Build consortium of partners with expertise in particular traits

  • Upload
    vunhi

  • View
    215

  • Download
    2

Embed Size (px)

Citation preview

Ontological Implications from Sequencing the Global Rice Collections

Kenneth L. McNallySr. Scientist II

A Team 4 Computational GenomicsT.T. Chang Genetic Resources CenterInternational Rice Research Institute

Los Baños, Laguna, Philippines

IRRI

IRRI - Int. Rice Research Institute• Founded in 1960 by Rockefeller and Ford Foundations

• Flagship center of CGIAR

• Green Revolution in Asia initiated through the release of IR8 with sd1 for semi-dwarf stature

Mission

• To reduce poverty and hunger, improve the health of rice farmers and consumers, and ensure environmental sustainability through collaborative research, partnerships, and the strengthening of national agricultural research and extension systems.

QuickTime™ and a

decompressor

are needed to see this picture.

IRGC – the International Rice Genebank Collection

World’s largest collection of rice germplasm held in trustfor the world community and source countries

• Over 108,000 registered and incoming accessions from 117 source countries

• Two cultivated species

Oryza sativaOryza glaberrima

• 22 wild species

• Relatively few accessions have donated alleles to current, high-yielding varieties

• http://www.irri.org/GRC

Rice Diversity

O. sativa Panicles

GCP SSR data set for O. sativa

Allelic series of induced and natural variants

Traditional Germplasm

IRGC – 108,000 accessions

Elite Lines/VarietiesIR64 Mutant Collection

Breeding/Mapping

populations

Array of Genetic Resources

Develop a genetic diversity platform

Single genome

Nipponbare(Temp Japonica)

NGS/3GS

>10K lines from Gene Bank

20 varieties

genome-wide SNP

OryzaSNP

2000+ lines genome-wide SNP

Association genetics platform

SNP haplotype-phenotype association

QTL prediction

Parental choices

Pedigree/trait tracking

Natural reverse genetics system

Probe deep into available, useful diversity

Selective trait evaluation

Large SNP dataset to query new germplasm

Breeding history

Abundant markers

Benchmarkphysical map

Enabling -omics technology

“e-cloning”

2005 20112008 2012

Use

Conserved Germplasm

Breeding Lines

Specialized Genetic Stocks

Current

problems

Drought tolerance

conservation

dissemination

Phenotype-

genotype

association

Durable

disease-pest

resistance

Problem soils

Future

challenges

Public Genetic Diversity Research Platform

C4 Rice Grain quality

Global Rice Scientific Partnership (GRiSP)PL 1.2: Characterizing genetic diversity and

creating novel gene pools

• 1.2.1 Rice SNP Consortium for high density genotypes

http://www.ricesnp.org

Cornell, USDA, EMBRAPA, …

• 1.2.2 Global phenotyping network for key traits

• 1.2.3 Whole genome sequencing of genebank stocks

• 1.2.4 Specialized populations for genetic studies

IRRI

GW

AS

+++

1.2.3 Whole genome sequencing of germplasm stocks

Knowledge of genotype / phenotype / agronomic value

Nu

mb

er o

f li

ne

s

Genetic Resources: Genotype/Phenotype information

Detail required

to evaluate

agronomic performance

Minimal knowledge

of most materials

Solution:

Data on whole collection

to predict performance –

ecogeographic data;

sequence collection

10,000 GeneBank accessions1

Cultivated + close wild relatives

Rice SNP

Consortium

1M Affymetrix

genotyping chip

2000 lines

Phenotyping network

2000+ lines $20 M (combined funding sources)

Association genetics and

QTL mapping

Predict genotype-phenotype

relationships at kb resolution

Specialized genetic

stocks: MAGIC

populations, biparental

RILs, CSSL,

Genebank as a

reverse

genetics

system

Select

accessions

based on QTL

prediction for

targeted

phenotyping of

specific traits

Discover novel

phenotypes

Bioinformatics and database to

Integrate sequence-phenotype data

BGI de novo sequencing 200 @ 50X depth

1000 @ 10-20X depthrest @ 5-10X depth

Use in

breeding

programs

$15 M

Participating institutions: CIRAD, CAAS, BGI, IRRI, ……

1 Include publicly accessible germplasm from IRRI, CIRAD, AfricaRice , CIAT and regional collections

Tapping into the unknown

IRG Traditional Germplasm

100,000 cultivated accessions

Iterative sampling200 @ 50X800 @ 30X9K @ 6X90K @ 2X

Apply low-cost sequencingby next generation and 3rd

generation technologies

• Obtain WGS by NGS or 3rdGS for all? accessions

•Cost-benefit analysis for return on investment

• Use the association data between 2500 lines and trait phenotypes to select materials for specific evaluation

• Isolate novel genes and rare alleles contributing to these traits

103 Genomes by Illumina NGS for 1M SNP Affy chip Cornell, IRRI, USDA, DevGen, Academia Sinica, EMBRAPA, Uni Aberdeen, JBEI/JGI, NIAS, Uni Delaware, …

• 15 indica

• 6 indica/admixed (unique type in some analyses)

• 12 aus

• 17 temperate japonica

• 7 aromatic

• 16 tropical japonica

• 14 O. rufipogon and nivara (AA genome)

• 1 O. meridionalis (AA genome)

• 7 O. glaberrima (African cultivated, AA genome

• 7 O. barthii (AA genome)

• 1 O. punctata (BB genome,outgroup)

52 genomes from W Wang (Kunming Zoo Institute) & FY Hu (YAAS)• 5 indica, 4 aus, 2 deep-water, 6 aromatic, 5 trop. japonica, 4 temp. japonica,

• 25 O. rufipogon and nivara, 1 O. longistaminata

Germplasm for Genotyping/WGS

• NSF-TV (500)

• GCP genotyping set (2339)

• GCP drought (800)

• GCP Aus (300)

• Orytage/Eurigen (600) • Nominations of pure lines

• Others from IRRI Core (13,000)

• Madagascar (50)• O. rufipogon/nivara (100)• MAGIC parents (16)• ACIAR chalk (1300)• Various donors (5)• O. glaberrima (300)

• USDA core (1500)

Now have >4100 SSD genetic stocks~2000 for SNPing on 1M feature Affy arraysRest are in line for sequencing by Illumina NGS

Diversity (coverage), utility, trait donors, nominations

Multiplication of 1200 SSD lines

2 Hta Upland farm DS2010

PL 1.2.2 Global phenotyping network for key traits

Phenotyping consortium for traits with impact(under GRiSP 1.2.3)

• Build consortium of partners with expertise in particular traits

• Rely on existing networks and sites as much as possible

• Identify and prioritize traits where impact is needed

• Sample from the Rice SNP set of 2500 lines for subsets targeted to specific traits and environments

• Phenotype these traits using standardized procedures

• Capture meta-data about experimental design,

o Method ontology, Experimental ontology

• Use controlled vocabulary and trait ontology for data

• Centralized database for data capture

o Pedigrees, germplasm stocks, phenotype studies, SNP data

o new schema “eRice”, Chiangzhi Liang, IRRI

• Link to external genome annotation DBs

o Full use of PO, EO, MO, TO, GO, Crop Ontology (GCP)

Phenotyping network: example traits for impactFocus on traits affected by global climate change

Root properties relevant

drought tolerance

TW16

RTV

TN1

RTV

TN1

Healthy

TW16

Healthy

Gene for virus resistanceTrait Site

Yield components Field

Disease resistance GH + disease nursery

Salinity (vegetative and reproductive

GH + Field

Drought GH + Field

Heat (humid and dry)

Growth chamber +

Field

Grain quality Laboratory

Seed physiology Laboratory

Phenotyping OryzaSNP set for traits.

Chalk, Grain quality, Disease (BLB, blast, SB, +), salinity, drought at reproductive stage, root traits, … (IRRI & GCP collaborations)

Biomass, nutritional effects on immunostimulation in mice (Jan Leach, Colorado State University)

WS2007 & WS2008 for Morpho-agron/Yield (GRC)

Aus lines 2010DS

220 lines + increasing 36

3 environments

• Early vigor

• Canopy temp

• Yield

GCP G3008.6 “Targeting Drought-Avoidance Root Traits …” A. Henry

Linking root architecture with root function

Drought treatment

0

1

2

3

4

5

6

7

0.5 1.5 2.5 3.5 4.5

week after draining cylinders

Cu

mu

lati

ve w

ate

r lo

ss

(kg

) Dular

IR64

unplanted

IRRI lysimeter facility

GCP G3008.6, A. Henry

Ontological implications of 100K+ rice genomes

• Need for a non-coding “sequence” ontology?

o Cover classes not in GO such as transposons, REs, CNVs, SNVs

• A controlled vocabulary for population structure and haplotypes?

• Extend Trait ontology?

• How does a metabolic network intrinsically differ from an ontology (apart from not necessarily being a DAG) ?

• Can they be merged using network computational methods?

Genotyping/phenotyping strategy

SNPdiscovery

OryzaSNP

resequencing

20 rice varieties

(2008-2009)

Illumina 1536-plex and

BeadXpress 384-plex

(Cornell & IRRI, ongoing)

Next-genResequencing

>100 varieties (2010)

Affymetrix 1M

2,000 varieties

(2011)

Trait association studies,

allele mining for key genes

and functional SNPs

QTL mapping, genetic

diversity analysis, MAS,

DNA fingerprinting

Sequencing the entire genebank

10,000 varieties

(2011-2012)

Rest

(2012-2015)

Phenotyping Network: Large-scale, precise

phenotyping and phenomics

(2011-2015)

Mapping populations

diverse germplasm(2 MAGIC, 20 RILs NAM)

(ongoing)

SNPgenotyping

Phenotyping

& Population

Development

Langub, Sipalay, Negros Occ.

Palawan

Panay,Mindoro,Luzon

Sulu Sea