Mining CIMMYT germplasm data to inform breeding targets for CC adaptation
Zakaria KEHEL, Jose CROSSA, Thomas PAYNE and Matthew REYNOLDS
Rabat, Morocco, 24-27 June 2014
Collection              Wild  Landrace  Breeding materials  Genetic stocks  Cultivars  Unknown or Other    TOTAL
Bread Wheat              213    32,428              41,995           8,150      6,278               331   89,395
Durum Wheat               25     5,578              14,262           1,089      1,156                58   22,356
Triticale                  0         0              16,964           3,402        345                 9   20,720
Barley                     0       669              13,898             200      1,755                11   16,533
Species & other        6,541     1,658                 155             820         30                15    9,219
Rye                       36       109                 132             168        219                13      677
Total                  6,816    40,442              91,057          13,829      9,783               437  158,713
TOTAL (excl. barley)                                                                                     142,180
CIMMYT Wheat Germplasm Bank
WGB: Opportunities, Challenges and Gaps
● Pedigrees, for GWAS or GS precision
● Phenotypes, so expensive (Curation)
● Core reference sets (SeeD, GCP, WGB, FIGS)
● GRIN Global and GeneSys
● Actions as a “global system”
● Little overlap with USDA and ICARDA
The phenotypic values, representing over 11.2M data points, are
held in CIMMYT’s IWIS database.
The value of these phenotypic data exceeds USD 100M, if the
trials that produced them were to be repeated today.
WGB: Opportunities, Challenges and Gaps
● Species accessions
Too many!
Yet, extent of in situ diversity?
Generate new diversity with existing accessions?
● Frustration of limited access to new, improved
germplasm (this might also extend to collecting
landraces).
● Most exchange is bank-to-bank
● “my institution/government owns the germplasm”
Data quality control (single field analysis)
Identification of outliers
Verification (field books)
Data storage
Database with meta data available
Data control of wheat nurseries
Data verified by trait and by nursery
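The slides do not say how outliers are identified in the single-field analysis. As one possible illustration only, the sketch below flags plot values with a large modified z-score (median and median absolute deviation). The function name, the 3.5 threshold and the plot yields are all assumptions made for this example.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    Hypothetical helper: the method and threshold are assumptions,
    not the procedure used for the CIMMYT nurseries.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # All values (or most of them) are identical; nothing to scale by.
        return []
    # 0.6745 makes the MAD comparable to a standard deviation under normality.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

# Made-up plot yields (t/ha) for one trait in one trial.
plot_yields = [4.1, 4.4, 3.9, 4.6, 4.2, 13.0, 4.0, 4.3]
outlier_idx = flag_outliers(plot_yields)
print(outlier_idx)
```

Flagged plots would then be checked against the field books rather than dropped automatically.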
LOC_NO  COUNTRY       LOCDESCRIP                           INSTITUTEN
10601   MAURITIUS     REDUIT                               Agricultural Research and Exte
19011   ALGERIA       ITGC-DAHMOWNE                        ITGC
19012   ALGERIA       EL HARRACH                           ITGC
19121   EGYPT         SERS EL-LIYAN                        Agr. Res. Center
20701   LEBANON       BEKA'A VALLEY                        Agric. Res. Inst.
21221   TURKEY        AGRICULTURE FACULTY                  University of Trakya
22243   INDIA         NAGAON EXP. STA.                     DWR
24059   CHINA         AN DA ALKALI SALINE SOIL INST.       Heilongjiang Academy
27121   THAILAND      NONGKAI RICE EXP. STN.               Rice Research Inst.
41303   UNITEDSTATES  ALABAMA AMU                          Alabama A & M Univ.
42109   MEXICO        MEXICALI                             CIMMYT
42138   MEXICO        CIANO - FULL IRRIGATION              CIMMYT
65001   GREECE        KENTZIKO THERMI                      NA
65004   GREECE        CEREAL INSTITUTE (EPANOMI)           NAGREF-DW Dept.
65009   GREECE        SCHOOL OF AGRICULTURE                YPSILON SA
65124   ITALY         S.S. DI GRANICOLTURA PER LA SICILIA  NA
65127   ITALY         PIETRANERA                           Univ. di Palermo
65451   SPAIN         LA CABANA                            CIFA-Alameda del Obispo
LOC_NO  Point:COUNTRY  Polyg:COUNTRY  LOCDESCRIP                INSTITUTE
12308   KENYA          Ethiopia       ENDEBESS                  Kenya Seed Company Ltd.
19013   ALGERIA        Morocco        AIN EL HADJAR             ITGC
19126   EGYPT          India          KHATTARA                  Agr. Res. Center
20011   AFGHANISTAN    Kazakhstan     TAKHAR-TALOQAN            CIMMYT
20330   IRAN           Russia         BIRJAND AGRIC. RES. STN.  SPII
21115   SYRIA          Turkey         AL RQA                    Ministry of Agriculture
21117   SYRIA          Turkey         HRAN                      Ministry of Agriculture
21121   SYRIA          Iraq           HIMO                      Ministry of Agriculture
21222   TURKEY         Syria          AGRICULTURAL FACULTY      University of Dicle
22241   INDIA          Bangladesh     NEPZ, UBKV                DWR
24022   CHINA          Taiwan         KIMMEN A.E.S.             Qinghai Academy
29501   TAJIKISTAN     Afghanistan    TAJIK A.R.I.              Kazakh Scientific Res. Inst.
42124   MEXICO         United States  VALLE DE MEXICALI         Agric. Int. de Mexico
47702   DOMINICANREP   Haiti          QUINIGUA                  C. de Des. Agrop.
61403   HUNGARY        Serbia         SZEGED C.R.I.             Cereal Res. Inst.
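The table above contrasts the country recorded for a location (Point:COUNTRY) with the country whose boundary polygon contains its coordinates (Polyg:COUNTRY). A minimal sketch of such a check is shown below, assuming a ray-casting point-in-polygon test. The border polygons, coordinates, location numbers and function names are all hypothetical, and the actual tooling behind the slide is not described.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon (list of (lon, lat))?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def country_from_polygons(lon, lat, borders):
    """Return the first country whose polygon contains the point, else None."""
    for country, polygon in borders.items():
        if point_in_polygon(lon, lat, polygon):
            return country
    return None

def find_mismatches(records, borders):
    """List (loc_no, recorded country, polygon country) where the two differ."""
    mismatches = []
    for rec in records:
        polygon_country = country_from_polygons(rec["lon"], rec["lat"], borders)
        if polygon_country != rec["country"]:
            mismatches.append((rec["loc_no"], rec["country"], polygon_country))
    return mismatches

# Toy rectangular "countries" and made-up locations, for illustration only.
borders = {
    "COUNTRY_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "COUNTRY_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
records = [
    {"loc_no": 1, "country": "COUNTRY_A", "lon": 5, "lat": 5},
    {"loc_no": 2, "country": "COUNTRY_A", "lon": 12, "lat": 3},
]
mismatches = find_mismatches(records, borders)
print(mismatches)
```

A mismatch does not say which field is wrong: either the recorded country or the coordinates may need correcting.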
FIGS without roots, or imbalanced passport,
characterization & evaluation data
Data control: a continuing process
• The same location under different management systems has only
one planting and harvest date
• Full-irrigation or irrigated locations with “NO” in the
corresponding irrigation field
• Same location, IRR YLD less than RF YLD
• 13 t/ha in an RF location (Obregon, Mexico), as an example
Other control methods added over time
• Outliers across locations and years
• Validating dates using earlier years or neighboring locations
• RF versus IRR
• …
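Some of the checks listed above can be written as simple rules over trial records. The sketch below is one way to do that, under stated assumptions: the record fields (`system`, `irrigated`, `yield_t_ha`), the 10 t/ha ceiling for rainfed yield and the example data are all hypothetical.

```python
def check_trial(trial, max_rainfed_yield=10.0):
    """Return warnings for one trial record (a dict with hypothetical fields)."""
    warnings = []
    # Irrigated management system but the irrigation field says "NO".
    if trial["system"] == "IRR" and trial["irrigated"] == "NO":
        warnings.append("irrigated location recorded with IRRIGATED = NO")
    # Rainfed yield above an assumed plausibility ceiling.
    if trial["system"] == "RF" and trial["yield_t_ha"] > max_rainfed_yield:
        warnings.append("implausibly high rainfed yield")
    return warnings

def irr_below_rf(trials):
    """Return locations where the irrigated yield is below the rainfed yield."""
    by_loc = {}
    for t in trials:
        by_loc.setdefault(t["location"], {})[t["system"]] = t["yield_t_ha"]
    return sorted(
        loc for loc, y in by_loc.items()
        if "IRR" in y and "RF" in y and y["IRR"] < y["RF"]
    )

# Made-up records, for illustration only.
trials = [
    {"location": "LOC_A", "system": "IRR", "irrigated": "NO", "yield_t_ha": 6.2},
    {"location": "LOC_A", "system": "RF", "irrigated": "NO", "yield_t_ha": 3.1},
    {"location": "LOC_B", "system": "IRR", "irrigated": "YES", "yield_t_ha": 4.0},
    {"location": "LOC_B", "system": "RF", "irrigated": "NO", "yield_t_ha": 13.0},
]
for t in trials:
    print(t["location"], t["system"], check_trial(t))
print(irr_below_rf(trials))
```

Rules like these only raise flags; deciding whether the yield, the management code or the irrigation field is wrong still needs the field books.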
MET Analysis
MET data (All, RF, IRR), with meta data stored in the DB
● GxE analysis: variance components, G corr, BLUPs, stability
● Identification of co-variables (factors, variates)
● GxE with covariables
● Patterns of GxE (spatially changing relationships)
[Figure: mean, max and min series, each with a linear trend line. Fitted lines as printed: y = 0.0176x + 4.3539 (R² = 0.1952); y = 0.113x + 8.0045 (R² = 0.5792); y = 0.0013x + 1.0054 (R² = 0.0005).]
[Figure: Vg and Vgxe series, each with a linear trend line. Fitted lines as printed: y = -0.3232x + 15.653 (R² = 0.4153); y = 0.308x + 82.476 (R² = 0.3959).]
Change in yield variability in Wheat Nursery
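The fitted lines and R² values reported for these charts have the form of simple linear trends. As a hedged illustration, and assuming they are ordinary least-squares fits (the slides do not say), the sketch below computes slope, intercept and R² for one series. The function name and the example values are made up and are not the nursery data.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up series: nursery index versus a variance component (percent).
nursery_index = [1, 2, 3, 4, 5, 6]
variance_pct = [15.1, 15.6, 14.0, 14.9, 13.2, 13.8]
slope, intercept, r2 = linear_fit(nursery_index, variance_pct)
print(f"y = {slope:.4f}x + {intercept:.3f}, R2 = {r2:.4f}")
```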
[Figure: two charts by nursery (ESWYT20, ESWYT22 to ESWYT30). First chart series: tmin_veg, tmin_rep, tmin_gf, tmin_seas (axis 0 to 0.7). Second chart series: tmax_veg, tmax_rep, tmax_gf, tmax_seas (axis 0 to 0.45).]
Climate/stage driving variability
tavg_gf    tavg_rep   tavg_seas  tavg_veg   tmax_gf    tmax_rep   tmax_seas  tmax_veg   tmin_gf    tmin_rep   tmin_seas  tmin_veg
5398434    5390631    5398434    5390631    5398434    5390631    5398434    5390631    5398434    5551629    5551629    5551629
5534459    5551747    5551629    5551629    5534459    5551747    5390631    5552140    5534459    5551747    5398434    5390631
2430154    5551765    5390631    5551765    2430154    2430154    2430154    5551765    2430154    5390631    5398450    5552189
5398450    5551629    5398450    5552140    5398450    5398434    5398450    5551629    5534344    5551765    5390631    5552010
5398424    2430154    2430154    5552010    5398424    5551765    5551629    5398450    5398424    5534344    2430154    5552193
5551747    5534459    5551747    5398450    5390631    5534459    5551747    5398434    5398450    5534459    5534312    5534312

FDgf      FDrep     FDveg     prec_gf   prec_veg  R10mmCL   R10mmgf   R10mmrep  R10mmveg  R5mmCL    R5mmrep   R5mmveg
5398530   5398530   5535500   5534326   5534326   5534312   5398530   5534312   5534312   5534312   5534312   5534312
5534312   2673706   5534475   5551704   5551704   5535534   5534312   5535534   5535534   5535534   5535534   5535534
5534459   4893489   5398136   5551798   5551798   5552327   5534459   5552327   5552327   5552327   5552327   5552327
5551690   5398471   2673706   5398160   5535534   5398530   5551690   5398530   5398530   5398530   5398530   5398530
5552193   5535415   5535514   5534335   5534335   5535415   5552193   5535415   5535415   5535415   5535415   5535415
2430154   5535428   5398471   5534339   5398160   5390809   2430154   5390809   5390809   5390809   5390809   5390809
How to use all these outputs? Genotypic sensitivities
[Figure: genotypic sensitivities (axis -0.25 to 0.25) for the following entries: PHB30R73, 1368/(9071XBABAMGOYO)-1//9091, PHB30H83, 1368/9071//9091, SC621, PHB30H37, 102/1368//9071, CZH99052, CZH99063(QPM), CZH99044, PAN6573, CZH99055, CZH99053(QPM), SC713, CZH99049(QPM), PAN67, CZH00023, 9071/(KU1403X1368)-2-1//1393, PAN5503, CZH00025, TZ9043DMRSR/9071, SC633, CZH00028, CZH99038, CZH99040, 983WH23, DK8051, CZH99061, SC627, CZH00029, CZH00030, CZH99020, CZH99037, CZH00026, PHB30G97, CZH99025, CZH00024, CZH00027, SC709, CZH99021, SC715, CZH99030.]
Post-silk Tmax (3% of the variability of grain weight in the African maize nursery)
To build stress populations, or to use pedigrees to identify useful parents
Again, genotypes’ sensitivities to climate are useful!
Basic Model: YLD = Line + Location + Location × Year + Error
Full Model: YLD = Line + Location + Location × Year + Climate + Genetic Markers + Genetic Markers × Climate + Genetic Markers × Location + Error
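One possible way to write these two models with indices is sketched below, for line i, location j and year k. The notation is an assumption introduced here, not taken from the slides, which also do not state which terms are treated as fixed or random.

```latex
\begin{align*}
\text{Basic:}\quad y_{ijk} &= \mu + g_i + l_j + (ly)_{jk} + \varepsilon_{ijk} \\
\text{Full:}\quad  y_{ijk} &= \mu + g_i + l_j + (ly)_{jk} + c_{jk} + m_i
                             + (mc)_{ijk} + (ml)_{ij} + \varepsilon_{ijk}
\end{align*}
```

Here g_i is the line effect, l_j the location effect, (ly)_{jk} the location-by-year effect, c_{jk} the effect of the climate covariates in environment jk, m_i the marker-based genetic effect, (mc)_{ijk} and (ml)_{ij} the marker-by-climate and marker-by-location interactions, and ε_{ijk} the error.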
Attempt to dissect GxE
[Figure: Basic Model versus Full Model for LOC1 to LOC5 (axis 0.2 to 0.45).]
Predicting all genotypes in a single location (4 years)
Best matching clusters for all ESWYT locations
[Figure: M1 versus M5, x-axis categories 1 to 14 (axis 0.1 to 0.4).]
Basic Model: Line + Loc × Year
Full Model: Line + Loc × Year + Line × Loc × Year + W + Line × W + G + G × Loc × Year + G × W
Yield prediction on Elite nursery worldwide
[Figure: M1 versus M5, x-axis categories 1 to 7 (axis -0.15 to 0.3).]
Yield prediction on Elite nursery SEA
       25 - 75   50 - 50
RF        82.2      88.9
IRR       46.7      57.8
Can we predict some genotypes in all locations?
Latest years
Genetic structure of 32 years of ESWYT
[Figure: two panels, Model and CV (axis 0 to 0.9). For each method (Linear Regression, SVM Regression, Random Forest, PLS Regression), results for ALL, ENV and GENO; series: Drought, Optimal, LowN.]
Modeling Maize African nurseries (EIHYB)
Maize landraces in Latin America
[Figure: two panels, Training and CV (axis 0 to 1), for LR, RF, SVM and KNN; series: TS=C, TS=C+G(PCs), TS=C+Pop, TS=PCs, PC1=C, PC2=C.]
Modeling TS and genetic structure with long-term climate
Environmental data
Cycle  SOW_julian  Emergence_Julian  HARVEST_julian  FOLIAR_DISEASE_DEVELOPMENT  IRRIGATED  LODGING
2005   11/19/2005                    4/21/2006       TRACES                      YES        SLIGHT
Varieties tested
PBW343
CHAM 6
KLEIN CHAMACO
HIDDAB
CHAKWAL 86
DHARWAR DRY
MILAN/KAUZ//PASTOR
FLORKWA-1/DHARWAR DRY
PASTOR/BAV92
CNDO/R143//ENTE/MEXI_2/3/AEGILOPS SQUARROSA (TAUS)/4/WEAVER/5/PASTOR
VEBOW/IRENA
PASTOR/DHARWAR DRY
PASTOR//MILAN/KAUZ
BJY/COC//PRL/BOW/3/FRTL
RL6043/4*NAC//PASTOR
BERKUT
SERI*3//RL6010/4*YR/3/PASTOR/4/BAV92
SOROCA
PARUS/PASTOR
ASTREB
PASTOR//HXL7573/2*BAU
PASTOR//HXL7573/2*BAU
PASTOR/3/BJY/COC//PRL/BOW
SOKOLL
SOKOLL
SRMA/TUI//PASTOR
ALTAR 84/AE.SQUARROSA (224)//2*CUPE/3/BAV92
SKAUZ/PASTOR/3/CROC_1/AE.SQUARROSA (224)//OPATA
CNO79//PF70354/MUS/3/PASTOR/4/BAV92
CNO79//PF70354/MUS/3/PASTOR/4/BAV92
MILAN/KAUZ//PRINIA/3/BAV92
MILAN/KAUZ//DHARWAR DRY/3/BAV92
MILAN/KAUZ/3/URES/JUN//KAUZ/4/CROC_1/AE.SQUARROSA (224)//OPATA
KABY/BAV92/3/CROC_1/AE.SQUARROSA (224)//OPATA
PASTOR/FLORKWA-1//BAV92
BOW//BUC/BUL/3/KAUZ/4/BAV92/5/MILAN/KAUZ
PASTOR//MILAN/KAUZ/3/VEE/PJN//2*TUI
PASTOR//MILAN/KAUZ/3/BAV92
BJY/COC//PRL/BOW/3/MILAN/KAUZ/4/BAV92
RL6043/4*NAC//PASTOR/3/BAV92
RL6043/4*NAC//PASTOR/3/BAV92
KAUZ/BAV92/3/BJY/COC//PRL/BOW
ATTILA/PASTOR
FRAME/BUCHIN
SLVS/PASTOR
PASTOR*2/BAV92
SKAUZ/BAV92//PASTOR
ATTILA/BAV92//PASTOR
TEMPORALERA M 87*2/KONK
Traits
Grain yield
Days to heading
Plant height
Agronomic score
Can feed the phenology
table presented earlier
The Wheat Atlas Website
Table of genotype by
location values +
mean, max, min and
SD of genotypes and
locations
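A table of this kind, with mean, max, min and SD for each genotype and each location, can be derived from the genotype-by-location values. The sketch below shows one way to compute it; the function names, the nested-dict layout and the values are hypothetical and do not describe how the Wheat Atlas builds its tables.

```python
from statistics import mean, stdev

def summarize(values):
    """Mean, max, min and sample SD of a list with at least two values."""
    return {
        "mean": mean(values),
        "max": max(values),
        "min": min(values),
        "sd": stdev(values),
    }

def genotype_by_location_summary(table):
    """table maps genotype -> {location: value}; returns two summary dicts."""
    by_genotype = {g: summarize(list(row.values())) for g, row in table.items()}
    locations = sorted({loc for row in table.values() for loc in row})
    by_location = {
        loc: summarize([row[loc] for row in table.values() if loc in row])
        for loc in locations
    }
    return by_genotype, by_location

# Made-up grain yields (t/ha) for three genotypes at two locations.
table = {
    "G1": {"LOC1": 4.0, "LOC2": 6.0},
    "G2": {"LOC1": 5.0, "LOC2": 7.0},
    "G3": {"LOC1": 6.0, "LOC2": 5.0},
}
by_genotype, by_location = genotype_by_location_summary(table)
print(by_genotype["G1"])
print(by_location["LOC1"])
```

Missing genotype-location cells are simply skipped in the location summaries, which is one of several reasonable choices for unbalanced data.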
Establish a win-win
relationship with
collaborators: they send
data, we provide analysis
and reports
The CIMMYT IWIS web-page
http://apps.cimmyt.org/wpgd/index.htm
IWIN-DAP: An Excel Add-In to analyze CIMMYT data
● Curation is important
● Very helpful to complete the information at the genebank;
creating stress populations accelerates
germplasm exchange
● Pipelines for prediction and genomic selection:
Pedigrees and markers
● Data management and sharing; analytical and
visualization tools
● Collaborations
Conclusions