
Genomic Prediction and Selection for Multi-Environments with Big Data using the BGLR statistical package

Biometrics and Statistics Unit, Global Maize and Wheat programs

June, 2015.

CIMMyT · Genomic Prediction and Selection for Multi-Environments with Big Data using the BGLR statistical package


Contents

1 BGLR

2 Prediction in multi-environments

3 Models

4 Cross validation

5 Application examples


BGLR


• Novel software for whole-genome regression and prediction of continuous (censored or uncensored) and discrete traits.
• Suitable for large-p, small-n problems.
• Many parametric and non-parametric models implemented in a consistent manner.
• Large collection of Bayesian models included:
  ◦ Bayesian ridge regression.
  ◦ Bayesian LASSO.
  ◦ BayesA, BayesB, BayesC-π.
  ◦ Reproducing Kernel Hilbert Spaces (RKHS).
  ◦ RKHS with kernel averaging.
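For reference, the priors that distinguish several of these models can be summarized as follows. This is a textbook-style sketch for marker effects $\beta_j$, where $\chi^{-2}(df, S)$ denotes a scaled-inverse chi-square distribution, $\delta_0$ a point mass at zero and $\pi$ the prior probability that a marker has a non-zero effect; BGLR's exact parameterization and default hyperparameters may differ, so its documentation should be checked before relying on these forms.

```latex
\begin{aligned}
\text{Bayesian ridge regression:}\quad & \beta_j \mid \sigma^2_\beta \sim N(0, \sigma^2_\beta) \\
\text{Bayesian LASSO:}\quad & p(\beta_j \mid \lambda, \sigma^2_\varepsilon) \propto \exp\!\left(-\lambda \lvert \beta_j \rvert / \sigma_\varepsilon\right) \\
\text{BayesA:}\quad & \beta_j \mid \sigma^2_j \sim N(0, \sigma^2_j), \quad \sigma^2_j \sim \chi^{-2}(df, S) \\
\text{BayesB:}\quad & \beta_j \mid \sigma^2_j, \pi \sim \pi\, N(0, \sigma^2_j) + (1 - \pi)\, \delta_0, \quad \sigma^2_j \sim \chi^{-2}(df, S) \\
\text{BayesC-}\pi\text{:}\quad & \beta_j \mid \sigma^2_\beta, \pi \sim \pi\, N(0, \sigma^2_\beta) + (1 - \pi)\, \delta_0, \quad \pi \sim \mathrm{Beta}(a, b) \\
\text{RKHS:}\quad & u \mid \sigma^2_u \sim N(0, K \sigma^2_u) \\
\text{Kernel averaging:}\quad & \textstyle\sum_k u_k, \quad u_k \mid \sigma^2_{u_k} \sim N(0, K_k \sigma^2_{u_k})
\end{aligned}
```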


BGLR in a nutshell

• Data equation: $y = \eta + \varepsilon$, where $\eta = \mathbf{1}\mu + \sum_{j} X_j \beta_j + \sum_{l} u_l$.
• Priors: different priors can be assigned to the regression coefficients $\beta_j$ and to the random effects $u_l$, which leads to different models.
• Model fitting uses efficiently implemented MCMC algorithms (Gibbs sampler and Metropolis-Hastings).
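To make the MCMC idea concrete, here is a minimal single-site Gibbs sampler for the simplest case, $\eta = \mathbf{1}\mu + X\beta$ with the Bayesian ridge regression prior, written in plain Python. It is a teaching sketch, not BGLR's implementation: it assumes a flat prior on $\mu$ and scaled-inverse chi-square priors on both variances, and the hyperparameters `df0` and `S0` and all function names are hypothetical.

```python
import random


def rinvchisq(df, scale, rng):
    """Draw scale / chi2(df), i.e. a scaled-inverse chi-square variate."""
    # chi2(df) equals Gamma(shape=df/2, scale=2)
    return scale / rng.gammavariate(df / 2.0, 2.0)


def gibbs_brr(y, X, n_iter=2000, burn_in=500, df0=5.0, S0=1.0, seed=1):
    """Gibbs sampler for y = 1*mu + X beta + e with beta_j ~ N(0, var_b)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    mu, beta = sum(y) / n, [0.0] * p
    var_e, var_b = 1.0, 1.0
    resid = [yi - mu for yi in y]  # e = y - mu - X beta, kept up to date
    xx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    post_mu, post_beta, kept = 0.0, [0.0] * p, 0
    for it in range(n_iter):
        # intercept: flat prior gives N(mean(y - X beta), var_e / n)
        partial = [r + mu for r in resid]
        mu = rng.gauss(sum(partial) / n, (var_e / n) ** 0.5)
        resid = [pr - mu for pr in partial]
        # marker effects, one at a time, from their full conditionals
        lam = var_e / var_b
        for j in range(p):
            # x_j' r_j, where r_j is the residual without marker j's effect
            rhs = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            c = xx[j] + lam
            new_b = rng.gauss(rhs / c, (var_e / c) ** 0.5)
            diff = new_b - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * diff
            beta[j] = new_b
        # variance components from their conjugate full conditionals
        var_e = rinvchisq(df0 + n, S0 + sum(r * r for r in resid), rng)
        var_b = rinvchisq(df0 + p, S0 + sum(b * b for b in beta), rng)
        if it >= burn_in:  # accumulate posterior means after burn-in
            kept += 1
            post_mu += mu
            post_beta = [a + b for a, b in zip(post_beta, beta)]
    return post_mu / kept, [b / kept for b in post_beta]
```

Swapping the Gaussian full conditional for $\beta_j$ for the one implied by another prior (for example a marker-specific variance, as in BayesA) is what turns this into a different member of the family.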


Prediction in multi-environments


• In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype × environment interaction (G×E).
• Plant-breeding researchers have developed multiple methods for accounting for, and exploiting, G×E in multi-environment trials.
• Genomic selection is gaining ground in plant breeding.
• Most applications so far are based on single-environment, single-trait models.
• Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models.
• These ideas can be taken one step further by incorporating information on environmental covariates.


Models


Model 1 (EL, Environment + Line, no pedigree)

$y_{ij} = \mu + E_i + L_j + e_{ij}$

Model 2 (EA, Environment + Line, with markers)

$y_{ij} = \mu + E_i + g_j + e_{ij}$

Model 3 (Environment, Line, and marker × environment interaction)

$y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$
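One way to see how the three models nest is to simulate from Model 3. In the sketch below, which is an illustrative data-generating process and not a description of the datasets in these slides, $g_j$ comes from marker effects shared by all environments and $Eg_{ij}$ from hypothetical environment-specific marker deviations; the function name and all standard deviations are assumptions. Setting `sd_bE=0` gives Model 2, and replacing $g_j$ by an independent line effect $L_j$ would give Model 1.

```python
import random


def simulate_model3(markers, n_env, mu=5.0, sd_E=1.0, sd_b=0.3, sd_bE=0.15,
                    sd_e=0.5, seed=0):
    """Simulate y_ij = mu + E_i + g_j + Eg_ij + e_ij for every (env i, line j).

    markers: one list of marker codes (e.g. 0/1/2) per line.
    Returns a list of (env, line, y) records.
    """
    rng = random.Random(seed)
    n_markers = len(markers[0])
    E = [rng.gauss(0, sd_E) for _ in range(n_env)]         # environment effects
    b = [rng.gauss(0, sd_b) for _ in range(n_markers)]     # marker effects shared by all envs
    bE = [[rng.gauss(0, sd_bE) for _ in range(n_markers)]  # env-specific marker deviations
          for _ in range(n_env)]
    records = []
    for i in range(n_env):
        for j, x in enumerate(markers):
            g_j = sum(xk * bk for xk, bk in zip(x, b))        # genomic main effect
            eg_ij = sum(xk * bk for xk, bk in zip(x, bE[i]))  # marker x environment term
            y = mu + E[i] + g_j + eg_ij + rng.gauss(0, sd_e)
            records.append((i, j, y))
    return records
```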


Assumptions

It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, with $G$ being the genomic relationship matrix. $Eg_{ij}$ is the interaction term between genotypes and environments, with $Eg \sim N\left(0, [(Z_g G Z_g^T) \circ (Z_E Z_E^T)]\, \sigma^2_{Eg}\right)$, where $Z_g$ connects phenotypes with genotypes, $Z_E$ connects phenotypes with environments, and $\circ$ stands for the Hadamard (element-wise) product of two matrices.
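The interaction covariance can be assembled directly from its pieces. The plain-Python sketch below builds $G$ from a 0/1/2 marker matrix using VanRaden's first method (an assumption: the slides do not say how $G$ was computed), forms $Z_g G Z_g^T$ and $Z_E Z_E^T$ from incidence matrices, and takes their Hadamard product. All function names are hypothetical; in BGLR, a kernel like this would typically be supplied as an RKHS term.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def hadamard(A, B):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def vanraden_g(M):
    """G = W W' / (2 sum_k p_k (1 - p_k)), with W = M - 2p (markers coded 0/1/2).

    Assumes at least one polymorphic marker, otherwise the denominator is 0.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[k] for row in M) / (2.0 * n) for k in range(m)]  # allele frequencies
    W = [[row[k] - 2 * p[k] for k in range(m)] for row in M]
    denom = 2.0 * sum(pk * (1 - pk) for pk in p)
    return [[sum(W[a][k] * W[b][k] for k in range(m)) / denom for b in range(n)]
            for a in range(n)]


def incidence(levels, n_levels):
    """Z[r][c] = 1 if record r belongs to level c (a line or an environment)."""
    return [[1.0 if lev == c else 0.0 for c in range(n_levels)] for lev in levels]


def gxe_kernel(M, env_of_record, line_of_record, n_env):
    """Return (Z_g G Z_g', Z_E Z_E', their Hadamard product) over records."""
    G = vanraden_g(M)
    Zg = incidence(line_of_record, len(M))
    ZE = incidence(env_of_record, n_env)
    K_g = matmul(matmul(Zg, G), transpose(Zg))
    K_E = matmul(ZE, transpose(ZE))  # 1 if two records share an environment, else 0
    return K_g, K_E, hadamard(K_g, K_E)
```

Because $Z_E Z_E^T$ is 1 for records from the same environment and 0 otherwise, the Hadamard product keeps the genomic covariance only within environments, which is what lets the interaction term capture environment-specific genetic signal.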


Cross validation


1 CV1: Prediction of the performance of newly developed lines (i.e., lines that have not been evaluated in any field trial).

2 CV2: Prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others.

See Figure 1.


Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5). Source: Jarquín et al. (2014).
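The two schemes differ only in what gets masked. A minimal sketch, assuming a hypothetical record layout of `(line, environment)` pairs and random fold assignment: CV1 assigns folds by line, so every record of a line is hidden together; CV2 assigns folds by individual line-environment record, so a line hidden in one environment usually remains observed in others. The function `assign_folds` is illustrative, not code from the authors.

```python
import random


def assign_folds(records, scheme, k=5, seed=123):
    """Return a fold id (0..k-1) for every (line, env) record under CV1 or CV2."""
    rng = random.Random(seed)
    if scheme == "CV1":
        # newly developed lines: all records of a line share one fold
        lines = sorted({line for line, _ in records})
        rng.shuffle(lines)
        fold_of_line = {line: pos % k for pos, line in enumerate(lines)}
        return [fold_of_line[line] for line, _ in records]
    if scheme == "CV2":
        # incomplete field trials: folds are drawn per line-environment record
        order = list(range(len(records)))
        rng.shuffle(order)
        folds = [0] * len(records)
        for pos, r in enumerate(order):
            folds[r] = pos % k
        return folds
    raise ValueError("scheme must be 'CV1' or 'CV2'")
```

In each round, the records in one fold are set to missing, the model is fitted to the rest, and predictions for the masked records are compared with their observed values.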


Application examples

Example 1: Wheat dataset (Ravi, Jessica et al.)

The phenotypic information consists of wheat grain yield in 5 mega-environments.

Table 1. Number of lines evaluated in each environment

The problem is to predict 9,000 unobserved individuals in all the environments.
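Unobserved lines can be predicted because they are connected to phenotyped lines through a relationship matrix. In BGLR this is typically done by coding the phenotypes to be predicted as `NA`; the sketch below only shows the underlying idea for a single environment, assuming the variance ratio $\lambda = \sigma^2_e / \sigma^2_g$ is known (BGLR would sample the variances by MCMC) and estimating $\mu$ by the training mean. It computes $\hat g_{test} = K_{test,train}(K_{train,train} + \lambda I)^{-1}(y_{train} - \mu)$; `predict_unobserved` is a hypothetical name.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def predict_unobserved(K, y, train, test, lam):
    """Kernel (GBLUP-style) predictions for the `test` lines.

    K: relationship matrix over all lines; y: phenotypes (entries for test
    lines are ignored); train/test: line indices; lam: sigma2_e / sigma2_g.
    """
    mu = sum(y[i] for i in train) / len(train)
    K_tt = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(K_tt, [y[i] - mu for i in train])
    return [mu + sum(K[t][i] * a for i, a in zip(train, alpha)) for t in test]
```

A line with no relatives in the training set (a zero row in $K$) is predicted at the mean, while close relatives of high-yielding lines are pulled upward; at the scale of thousands of lines, dense elimination like this is far too slow, which is one reason dedicated software is needed.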


Table 2. Phenotypic correlations between environments.
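A table like Table 2 can be computed from long-format records by correlating, for each pair of environments, the phenotypes of the lines observed in both. The sketch assumes a hypothetical dictionary keyed by `(line, environment)` and skips pairs with fewer than `min_common` shared lines (a threshold chosen only for illustration); it also assumes non-constant values within each pair, since Pearson's correlation is undefined otherwise.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def env_correlations(pheno, min_common=3):
    """pheno maps (line, env) -> value; returns {(env1, env2): (r, n_common)}."""
    envs = sorted({env for _, env in pheno})
    out = {}
    for e1, e2 in combinations(envs, 2):
        in_e1 = {line for line, env in pheno if env == e1}
        in_e2 = {line for line, env in pheno if env == e2}
        common = sorted(in_e1 & in_e2)  # lines observed in both environments
        if len(common) < min_common:
            continue
        a = [pheno[(line, e1)] for line in common]
        b = [pheno[(line, e2)] for line in common]
        out[(e1, e2)] = (pearson(a, b), len(common))
    return out
```

Reporting the number of shared lines next to each correlation matters with unbalanced trials, because some environment pairs may share only a handful of lines.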


For model fitting we used the coefficient of parentage (COP) and markers (genotyping-by-sequencing, GBS).

1 COP: We computed a relationship matrix (A). The matrix has about 50k × 50k = 2,500,000,000 entries. With BROWSE, the computation took several days; with an ad-hoc version of the R package pedigreemm, we obtained the matrix in about 3 hours.

2 Markers: Information for about 21,000 individuals and 14,000 individuals was available.
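For a small pedigree, A can be computed with the classical tabular method, sketched below for individuals ordered so that parents precede their offspring. This is a textbook illustration, not necessarily how BROWSE or pedigreemm compute it, and it ignores complications such as the selfing generations common in wheat pedigrees. The hypothetical helper `dense_storage_gb` reproduces the size arithmetic from the slide: for n = 50,000 there are 2.5 × 10⁹ entries, about 20 GB as 8-byte doubles, or about 10 GB if only one triangle of the symmetric matrix is stored.

```python
def tabular_a(sire, dam):
    """Numerator relationship matrix by the tabular method.

    sire[i], dam[i]: index of a parent (must be < i) or None if unknown.
    """
    n = len(sire)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sire[i], dam[i]
        # diagonal: 1 + F_i, with F_i = 0.5 * A[s][d] when both parents are known
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
        for j in range(i):
            # off-diagonal: average relationship of j with i's parents
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
    return A


def dense_storage_gb(n, bytes_per_entry=8, triangle=False):
    """Memory (in 10^9 bytes) for an n x n matrix, optionally one triangle only."""
    entries = n * (n + 1) // 2 if triangle else n * n
    return entries * bytes_per_entry / 1e9
```

The double loop is quadratic in the number of individuals, which illustrates why a 50k-individual pedigree needs specialized, often sparse, algorithms.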


Benchmark: Predicting 2014 using previous records

Figure 2: Predictions in testing
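The benchmark is a forward, time-based validation: records from years before 2014 form the training set and the 2014 records form the testing set. The sketch below shows the split and a per-environment accuracy summary, assuming a hypothetical record layout `(line, environment, year, observed)` and a `predictions` mapping produced by a model fitted to the training set only; using Pearson's correlation between predicted and observed values as the accuracy measure is a common choice, assumed here rather than taken from Figure 2.

```python
from math import sqrt


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def split_by_year(records, test_year=2014):
    """Train on earlier years, test on test_year (records: line, env, year, obs)."""
    train = [r for r in records if r[2] < test_year]
    test = [r for r in records if r[2] == test_year]
    return train, test


def accuracy_by_env(test, predictions):
    """Correlation between predicted and observed values within each environment."""
    by_env = {}
    for line, env, _, obs in test:
        pred_list, obs_list = by_env.setdefault(env, ([], []))
        pred_list.append(predictions[(line, env)])
        obs_list.append(obs)
    return {env: pearson(p, o) for env, (p, o) in by_env.items()}
```

Computing accuracy within environments avoids inflating it with differences in environmental means, which any model captures easily.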


The real problem


Example 2: Biparental tropical maize populations (Xuecai et al.)

• Genotypic and phenotypic information for about 20 biparental populations.
• Low-density (about 200) and high-density (about 60,000) markers.
• Individuals evaluated in several environments.


Figure 3: Results from CV1


Figure 4: Results from CV2


Collaborators in this work

J. Crossa, Juan Burgueño, G. de los Campos, Jessica Rutkoski, Ravi Singh, Enrique Autrique, Jesse Poland, Juan Carlos Alarcón,

Susanne Dreisigacker, Paulino Pérez, X. Zhang, K. Semagn, Y. Beyene, R. Babu, F. San Vicente, M. Olsen, Newman Samayoua


References

Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa. (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science, 52: 707-719.

Jarquín, D., J. Crossa, X. Lacaze, P. Cheyron, J. Daucourt, J. Lorgeou, F. Piraux, et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127(3): 595-607.
