Analysis of Experiments using ASReml:
with emphasis on breeding trials ©
Salvador A. Gezan [email protected]
Patricio R. Muñoz
October, 2014
CONTENTS
Session
1 Introduction to ASReml
2 Introduction to Linear Mixed Models
3 Job Structure in ASReml
4 Breeding Theory
5 Genetic Analyses: Parental Models
6 Genetic Analyses: Animal Models
5 Variance Structures in ASReml
6 Multivariate Analysis
7 Multi-environment Analysis
8 Spatial Analyses
9 Generalized Linear Mixed Models
10 Introduction to GBLUP
11 GBLUP in ASReml
Session 1
Introduction to
ASReml
Gezan and Munoz (2014). Analysis of Experiments using ASReml: with emphasis on breeding trials ©
“ASReml is a statistical package that fits linear mixed models using
Residual Maximum Likelihood (REML)”
“Typical applications include the analysis of (un)balanced longitudinal data,
repeated measures analysis, the analysis of (un)balanced designed
experiments, the analysis of multi-environment trials, the analysis of both
univariate and multivariate animal breeding, genetics data and the analysis
of regular or irregular spatial data.”
ASReml uses the Average Information (AI) algorithm and sparse matrix
methods.
• Useful for the analysis of large and complex datasets.
• Very flexible to model a wide range of variance models for random effects
or error structures (however, complex to program).
WHAT is ASReml?
Distributor Page
http://www.vsni.co.uk/products/asreml (version 3)
Platforms
Windows 98/ME/2000/XP/Vista/Windows7
Linux
Apple Macintosh
Interface
DOS (edit)
Windows (Notepad, ASReml-W)
R (or S-plus)
Text editors (e.g. ConTEXT)
GSView (graphical viewer)
Terminal (Mac)
HOW TO GET ASReml?
WHERE TO GET HELP?
Official Documentation c:\Program Files\Asreml3\Doc\
UserGuide.pdf (use Find window for searching)
UpdateR3.pdf
Webpages
uncronopio.org/ASReml/HomePage (cookbook)
http://www.vsni.co.uk/software/asreml/htmlhelp/ (distributor page)
www.vsni.co.uk/forum (user forum)
STEPS FOR AN ANALYSIS
• Identify the problem and experimental design / observational study.
• Detail treatment and design structure.
• Specify hypotheses / components of interest.
• Collect and prepare data file (e.g. Excel, Access).
• Perform initial data validation and exploratory data analysis (EDA) in
statistical software (e.g. SAS, R, GenStat).
• Definition / modification of linear model.
• Running / fitting of linear model.
• Checking output.
• Extract final output.
• Report analysis.
STEPS FOR AN ANALYSIS IN ASReml
• Prepare ASCII data file (any ASCII editor).
• Prepare a job file (.as, e.g. ASReml-W, ConTEXT).
• Run analysis in ASReml (submit job).
• Check diagnostic plots and output.
• Extract results from output files (e.g. .asr, .sln, .yht).
• Review, revise, re-run fitted model.
• Report analysis.
ALFALFA EXPERIMENT
Source Variety Bk1 Bk2 Bk3 Bk4 Bk5 Bk6
1 A 2.17 1.88 1.62 2.34 1.58 1.66
1 B 1.58 1.26 1.22 1.59 1.25 0.94
1 C 2.29 1.60 1.67 1.91 1.39 1.12
1 D 2.23 2.01 1.82 2.10 1.66 1.10
2 E 2.33 2.01 1.70 1.78 1.42 1.35
2 F 1.38 1.30 1.85 1.09 1.13 1.06
2 G 1.86 1.70 1.81 1.54 1.67 0.88
2 H 2.27 1.81 2.01 1.40 1.31 1.06
3 I 1.75 1.95 2.13 1.78 1.31 1.30
3 J 1.52 1.47 1.80 1.37 1.01 1.31
3 K 1.55 1.61 1.82 1.56 1.23 1.13
3 L 1.56 1.72 1.99 1.55 1.51 1.33
An experiment was established to compare 12 alfalfa varieties (labeled A-L).
These correspond to 3 different sources, but the objective is to estimate the
heritability of varieties regardless of their source. A total of 6 plots per variety
were established, arranged in a RCB design. The response variable corresponds
to yield (tons/acre) at harvest time.
Example: /Day1/Alfalfa/ALFALFA.txt
ALFALFA EXPERIMENT
Consider a model with block as fixed and variety as random effects.
yield = µ + block + variety + error
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
$y_{ij}$ observation from the ith block and jth variety
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
ALFALFA EXPERIMENT
Source Variety Block Resp
1 A 1 2.17
1 B 1 1.58
1 C 1 2.29
1 D 1 2.23
2 E 1 2.33
2 F 1 1.38
2 G 1 1.86
2 H 1 2.27
3 I 1 1.75
3 J 1 1.52
3 K 1 1.55
3 L 1 1.56
...
3 J 6 1.31
3 K 6 1.13
3 L 6 1.33
Data file: /Day1/Alfalfa/ALFALFA.txt
ALFALFA EXPERIMENT
Alfalfa experiment - 12 varieties - Response Yield
 Source 3 !I # Not used
 Variety 12 !A !SORT
 Block 6 !I
 yield
ALFALFA.txt !SKIP 1
!DISPLAY 7 !SUMMARY
yield ~ mu Block !r Variety
predict Variety !SED !TDIFF !PLOT
Job file: /Day1/Alfalfa/Alfalfa.as
Some syntax
~ separates response from the list of fixed and random terms.
! Used for identification of option.
# Comment following (skips rest of line).
ALFALFA EXPERIMENT
ASReml 3.0 [01 Jan 2009] Alfalfa experiment - 12 varieties - Response Yield
Build gt [26 Nov 2010] 32 bit
28 Sep 2013 16:28:33.369 32 Mbyte Windows Alfalfa
Licensed to: UFL 31-dec-2013
***********************************************************
* Contact [email protected] for licensing and support *
***************************************************** ARG *
Folder: C:\WORK\ASReml\ASReml_2013\Distribute_Instr\Day1\Alfalfa
Source 3 !I
Variety 12 !A !SORT
Block 6 !I
QUALIFIERS: !SKIP 1 !DISPLAY 7 !SUMMARY
Reading Alfalfa.txt FREE FORMAT skipping 1 lines
Univariate analysis of yield
Summary of 72 records retained of 72 read
Model term Size #miss #zero MinNon0 Mean MaxNon0 StndDevn
1 Source 3 0 0 1 2.0000 3
2 Variety 12 0 0 1 6.5000 12
3 Block 6 0 0 1 3.5000 6
4 yield Variate 0 0 0.8800 1.597 2.340 0.3584
5 mu 1
QUALIFIERS: predict Variety !SED !TDIFF !PLOT
Forming 19 equations: 7 dense.
Initial updates will be shrunk by factor 0.316
Notice: 1 singularities detected in design matrix.
1 LogL= 48.7345 S2= 0.61974E-01 66 df 0.1000 1.000
2 LogL= 50.0218 S2= 0.57316E-01 66 df 0.1705 1.000
3 LogL= 51.1506 S2= 0.52550E-01 66 df 0.2957 1.000
4 LogL= 51.6976 S2= 0.48748E-01 66 df 0.4902 1.000
5 LogL= 51.7366 S2= 0.47751E-01 66 df 0.5717 1.000
6 LogL= 51.7370 S2= 0.47654E-01 66 df 0.5808 1.000
7 LogL= 51.7370 S2= 0.47653E-01 66 df 0.5809 1.000
ALFALFA EXPERIMENT
Final parameter values 0.58087 1.0000
- - - Results from analysis of yield - - -
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
Variety 11.00 0.213732 6.0 1.0
Residual Variance 55.00 0.476526E-01 0.0 1.0
Source Model terms Gamma Component Comp/SE % C
Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 11.0 858.95 <.001
3 Block 5 55.0 17.42 <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using algebraic derivatives.
Solution Standard Error T-value T-prev
3 Block
2 -0.180833 0.891185E-01 -2.03
3 -0.875000E-01 0.891185E-01 -0.98 1.05
4 -0.206667 0.891185E-01 -2.32 -1.34
5 -0.501667 0.891185E-01 -5.63 -3.31
6 -0.687500 0.891185E-01 -7.71 -2.09
5 mu
1 1.87417 0.792320E-01 23.65
2 Variety 12 effects fitted
SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section 1
0.87
Finished: 28 Sep 2013 16:28:34.002 LogL Converged
ALFALFA EXPERIMENT
[Figure: residuals vs fitted values (residuals -0.3828 to 0.4563; fitted values 0.95733 to 2.09034) and histogram of residuals (range -0.382836 to 0.456330; peak count 6)]
ALFALFA EXPERIMENT
Interpreting output
Source of variation   Num df   Den df   Variance ratio   P-value
Block                 5        55       17.42            < 0.001
$g_j \sim N[0, \sigma_g^2]$, $s_g^2 = 0.0277$
$e_{ij} \sim N[0, \sigma^2]$, $s^2 = 0.0477$
$H^2 = 0.0277 / (0.0277 + 0.0477) = 0.367$
LogL Converged
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
Variety 11.00 0.213732 6.0 1.0
Residual Variance 55.00 0.476526E-01 0.0 1.0
Source Model terms Gamma Component Comp/SE % C
Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 11.0 858.95 <.001
3 Block 5 55.0 17.42 <.001
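The heritability calculation above can be reproduced from the two variance components in the output. The following is a minimal sketch; the variable and function names are illustrative assumptions, and only the two component values come from the ASReml output.

```python
# Variance components copied from the ASReml output above.
var_variety = 0.0276798   # Variety component (sigma_g^2)
var_residual = 0.0476526  # residual component (sigma^2)


def broad_sense_heritability(var_g, var_e):
    """Return H2 = var_g / (var_g + var_e) on a single-plot basis."""
    return var_g / (var_g + var_e)


# The "Gamma" column is the ratio of a component to the residual variance.
gamma = var_variety / var_residual
h2 = broad_sense_heritability(var_variety, var_residual)
print(f"gamma = {gamma:.4f}, H2 = {h2:.3f}")
```

The ratio agrees with the Gamma value reported for Variety, which is a quick check that the right rows were read from the table.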
ALFALFA EXPERIMENT
Interpreting output
Alfalfa experiment - 12 varieties - Response Yield 19 Feb 2012 20:34:11
Alfalfa
Ecode is E for Estimable, * for Not Estimable
The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors
in the averaging and classify sets.
Use !AVERAGE to move ignored factors into the averaging set.
---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----
Predicted values of yield
The SIMPLE averaging set: Block
Variety Predicted_Value Standard_Error Ecode
A 1.8130 0.0795 E
B 1.3714 0.0795 E
C 1.6485 0.0795 E
D 1.7702 0.0795 E
E 1.7275 0.0795 E
F 1.3675 0.0795 E
G 1.5812 0.0795 E
H 1.6330 0.0795 E
I 1.6796 0.0795 E
J 1.4542 0.0795 E
K 1.5086 0.0795 E
L 1.6071 0.0795 E
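For a balanced design such as this one, the predicted value of a random variety can be checked by hand: it is the overall mean plus the variety's raw deviation shrunk by a factor that depends on the variance components and the number of plots. The sketch below assumes that balanced-case shortcut (it is not how ASReml computes predictions in general) and uses illustrative names.

```python
# Yields of variety A in blocks 1-6 (from the data table) and values from the output.
yield_a = [2.17, 1.88, 1.62, 2.34, 1.58, 1.66]
overall_mean = 1.597      # mean of yield reported in the data summary
var_variety = 0.0276798   # sigma_g^2
var_residual = 0.0476526  # sigma^2
n_blocks = 6


def shrinkage(var_g, var_e, r):
    """Weight given to a variety's raw deviation when it has r plots."""
    return var_g / (var_g + var_e / r)


mean_a = sum(yield_a) / len(yield_a)
k = shrinkage(var_variety, var_residual, n_blocks)
blup_a = k * (mean_a - overall_mean)   # predicted random effect of variety A
pred_a = overall_mean + blup_a         # prediction averaged over blocks
print(f"k = {k:.3f}, BLUP = {blup_a:.4f}, predicted = {pred_a:.4f}")
```

The result is close to the value listed for variety A in the table of predictions, which illustrates why predictions of random effects lie nearer to the overall mean than the raw variety means do.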
ASReml FILES
.apj Project file created with ASReml-W
.as Model and job specifications
.ass Summary statistics for variables from data set
.asr Report output of analysis and summary job
.aov Details of ANOVA table calculations
.sln Solutions of fixed and random effects
.pvs Report predictions and their standard errors
.res Residual statistics and basic residual plots
.ps Graphic files in PS format
.vvp Matrix of variance of variance components
.yht Residuals, predicted and hat values
.pin Calculations of functions of variance components
.pvc Report calculations of functions of variance components
Session 2
Introduction to
Linear Mixed Models
MIXED MODELS
• Mixed models extend the linear model by allowing a more flexible
specification of the errors (and other random factors). Hence, they allow a
different type of inference and can incorporate correlation and
heterogeneous variances among the observations.
• Fixed effects: factors whose levels are selected by a nonrandom
process or whose levels consist of the entire population of possible levels.
Inferences are made only about those levels included in the study. Hint: all
levels of interest are in your data set.
• Random effects: factors whose levels are a random sample of
levels from a population of possible levels. The inference is about the
population of levels, not just the subset of levels included in the study.
• Mixed linear models contain both random and fixed effects.
ALFALFA EXPERIMENT
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
$y_{ij}$ observation from the ith block and jth variety
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
MODEL FOR A RCBD
Dataset: two factors to consider: one defining the block to which each
experimental unit is allocated, and the other the treatment applied
to each unit.
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
where,
$y_{ij}$ observation from the ith block and jth treatment, i = 1 … r, j = 1 … t
$\mu$ is the population mean
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
$g_j \sim N[0, \sigma_g^2]$
$e_{ij} \sim N[0, \sigma^2]$
Structural component (or blocking structure)
• Concerns the underlying variability (heterogeneity) and structure of the
experimental or measurement units.
• “Controls” different sources of natural variation amongst the units using
factors (e.g. blocks) or variates (e.g. covariates).
Explanatory component (or treatment structure)
• Defines the different treatments (or treatment combinations) applied to the
experimental units.
• Provides information about the differences in response caused by the
different treatments and answers the questions of interest.
Multi-stratum ANOVA: makes explicit the separation between blocks (or the
more general structure of units) and treatments.
MODEL COMPONENTS
response = systematic component + random component
response = structural component + explanatory component + random component
MIXED MODELS
Hypothesis of interest
Fixed effects:
H0: µ1 = µ2 = … = µt
H1: µi ≠ µj for some i, j in the set 1 … t
(i.e. is there a significant treatment effect)
Test statistic: F or t
Random effects:
H0: $\sigma_g^2 = 0$
H1: $\sigma_g^2 > 0$
(i.e. is there a significant variation due to the random effects)
Test statistic: Chi-square (likelihood ratio test)
ALFALFA EXPERIMENT
Consider a model with block as fixed and variety as random effects.
yield = µ + block + variety + error
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
$y_{ij}$ observation from the ith block and jth variety
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
ALFALFA EXPERIMENT
$y = X\beta + Zg + e$
yield = µ + block + variety + error
$G = \mathrm{diag}(\sigma_g^2, \ldots, \sigma_g^2)$, $R = \mathrm{diag}(\sigma^2, \ldots, \sigma^2)$
[Matrix layout: $y$ stacks the $n = rt$ observations; $X$ and $Z$ are 0/1 incidence matrices for the fixed (µ, block) and random (variety) effects; $g = (g_1, \ldots, g_t)'$; $e$ stacks the matching $n$ errors]
LINEAR MIXED MODEL
X (n x r) design matrix for fixed effects
β (r x 1) vector of fixed effects
Z (n x t) design matrix for random effects
g (t x 1) vector of random effects
e (n x 1) vector of random errors
G (t x t) matrix of variance-covariance of random effects
R (n x n) matrix of variance-covariance of random errors
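$y = X\beta + Zg + e$
$E\begin{bmatrix} g \\ e \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $Var\begin{bmatrix} g \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$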
ALFALFA EXPERIMENT
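$G = \mathrm{diag}(\sigma_g^2, \ldots, \sigma_g^2) = \sigma_g^2 I_t$
$R = \mathrm{diag}(\sigma^2, \ldots, \sigma^2) = \sigma^2 I_{rt}$
$g' = [g_1 \; g_2 \; \ldots \; g_t]$
$e' = [e_{11} \; e_{12} \; \ldots \; e_{rt}]$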
Assumptions
• Random effects: E(g) = 0, V(g) = G = G(θ)
• Deviations: E(e) = 0, V(e) = R = R(θ)
• g and e independent.
hence, E(y) = Xβ
Var(y) = V = V(θ) = V(y) = ZGZ’ + R
Note: normality assumptions can be made about g and e.
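To make Var(y) = ZGZ’ + R concrete, the sketch below builds V for a hypothetical toy layout of 2 varieties with 2 records each, using G = σg² I and R = σ² I. The layout, the helper functions and the rounded variance values are illustrative assumptions; plain Python lists are used to keep the example self-contained.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def scaled_identity(n, s):
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical toy layout: records 1-2 belong to variety 1, records 3-4 to variety 2.
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]
var_g, var_e = 0.0277, 0.0477      # rounded components from the alfalfa example

G = scaled_identity(2, var_g)      # G = sigma_g^2 * I_t
R = scaled_identity(4, var_e)      # R = sigma^2 * I_n
ZGZt = matmul(matmul(Z, G), transpose(Z))
V = [[ZGZt[i][j] + R[i][j] for j in range(4)] for i in range(4)]

for row in V:
    print(["%.4f" % v for v in row])
```

Records of the same variety share the covariance σg², records of different varieties have covariance 0, and every diagonal element is σg² + σ².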
LINEAR MIXED MODEL
g ~ MVN(0, G) and e ~ MVN(0, R)
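$y = X\beta + Zg + e$
$E\begin{bmatrix} g \\ e \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $Var\begin{bmatrix} g \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$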
VARIANCE COMPONENTS
• Variance components need to be estimated before obtaining estimates of
fixed/random effects and performing any type of inference.
• Restricted/residual maximum likelihood (REML) is a likelihood-based
method used to estimate these variance components; it assumes
that both g and e follow a multivariate normal distribution.
• The REML variance component estimates are later used to estimate the
solutions of fixed and random effects.
• Henderson (1950) derived the Mixed Model Equations (MME) to obtain
the solutions of all effects simultaneously.
$\theta \rightarrow \hat{\theta}$: $G = G(\theta) \rightarrow \hat{G}$, $R = R(\theta) \rightarrow \hat{R}$, $V = V(\theta) = V(y) = ZGZ' + R \rightarrow \hat{V}$
$\hat{\beta} = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}^{-1} y$ (BLUE → EBLUE)
$\hat{g} = \hat{G} Z' \hat{V}^{-1} (y - X\hat{\beta})$ (BLUP → EBLUP)
VARIANCE STRUCTURES
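ID: identity
$\sigma^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
DIAG: diagonal
$\begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix}$
CORUV: uniform correlation
$\sigma^2 \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}$
AR1V: autocorrelation 1st order
$\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}$
US: unstructured
$\begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$
CORUH: uniform heterogeneous
$\begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 & \rho\sigma_1\sigma_4 \\ \rho\sigma_2\sigma_1 & \sigma_2^2 & \rho\sigma_2\sigma_3 & \rho\sigma_2\sigma_4 \\ \rho\sigma_3\sigma_1 & \rho\sigma_3\sigma_2 & \sigma_3^2 & \rho\sigma_3\sigma_4 \\ \rho\sigma_4\sigma_1 & \rho\sigma_4\sigma_2 & \rho\sigma_4\sigma_3 & \sigma_4^2 \end{bmatrix}$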
CORRELATION STRUCTURES
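CORU: uniform correlation
$\begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}$
AR1: autocorrelation 1st order
$\begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}$
CORG: general correlation
$\begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14} \\ \rho_{12} & 1 & \rho_{23} & \rho_{24} \\ \rho_{13} & \rho_{23} & 1 & \rho_{34} \\ \rho_{14} & \rho_{24} & \rho_{34} & 1 \end{bmatrix}$
CORUB: banded correlation
$\begin{bmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ \rho_1 & 1 & \rho_1 & \rho_2 \\ \rho_2 & \rho_1 & 1 & \rho_1 \\ \rho_3 & \rho_2 & \rho_1 & 1 \end{bmatrix}$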
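PROPERTIES OF EBLUE (optional)
$\hat{\beta} = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}^{-1} y$
• $V(\hat{\beta}) = (X'V^{-1}X)^{-1}$
• $V(L\hat{\beta}) = L(X'V^{-1}X)^{-1}L'$
• $L\hat{\beta}$ is the best linear unbiased estimate of $L\beta$
• Test of H0: $L\beta = 0$
$\hat{\beta}'L'[L(X'V^{-1}X)^{-1}L']^{-1}L\hat{\beta} \, / \, r(L) \sim$ F (approx) with df1 = r(L) and df2
(Satterthwaite or Kenward-Roger)
• 100(1-α)% confidence interval for $l'\beta$
$l'\hat{\beta} \pm z_{\alpha/2} \sqrt{l'(X'V^{-1}X)^{-1}l}$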
PROPERTIES OF EBLUP (optional)
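Mixed model equations:
$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\ Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix}$
$\begin{bmatrix} \hat{\beta} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} C^{xx} & C^{xz} \\ C^{zx} & C^{zz} \end{bmatrix} \begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix}$
$Var(\hat{\beta}) = C^{xx}$
$Var(\hat{g}) = G - C^{zz}$
$Var(\hat{g} - g) = C^{zz}$
• Linear combination of a function of fixed and random effects (predictions):
$\hat{P} = L'\hat{\beta} + M'\hat{g}$
$Var(\hat{P}) = L'C^{xx}L + M'(G - C^{zz})M$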
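As a small illustration of solving the mixed model equations, the sketch below sets them up for a hypothetical one-way model (overall mean fixed, variety random, R = σ² I, G = σg² I) and solves them by Gauss-Jordan elimination. The data, the variance values and the helper `solve` are assumptions made for the example. Multiplying the equations through by σ² turns the G⁻¹ term into λI with λ = σ²/σg².

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical data: 3 varieties with 2 records each; model y = mu + g_j + e.
y = [2.0, 2.2, 1.4, 1.6, 1.8, 1.8]
variety = [0, 0, 1, 1, 2, 2]
var_g, var_e = 0.05, 0.02
lam = var_e / var_g                  # sigma^2 * G^-1 = lam * I
t, n = 3, len(y)

counts = [variety.count(j) for j in range(t)]                      # diagonal of Z'Z
totals = [sum(v for v, j in zip(y, variety) if j == k) for k in range(t)]  # Z'y

# Coefficient matrix [[X'X, X'Z], [Z'X, Z'Z + lam*I]] and right-hand side.
lhs = [[n] + counts]
for j in range(t):
    lhs.append([counts[j]] + [counts[j] + lam if k == j else 0.0 for k in range(t)])
rhs = [sum(y)] + totals

mu_hat, *g_hat = solve(lhs, rhs)
print(mu_hat, g_hat)
```

Because this toy data set is balanced, the solutions can be checked by hand: the estimate of the mean is the overall mean, and each predicted effect is the variety deviation multiplied by 2 / (2 + λ).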
PROPERTIES OF EBLUP (optional)
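PEV: prediction error variance
$r^2$: reliability (squared correlation between true and predicted BV)
$r$: accuracy (correlation between true and predicted BV)
SE(BLUP): standard error of a random effect
$PEV(\hat{g}_i) = c^{ii} \sigma_e^2 = (1 - r_i^2) \sigma_g^2$
$SE(\hat{g}_i) = \sqrt{PEV(\hat{g}_i)}$
$r_i^2 = 1 - PEV(\hat{g}_i) / \sigma_g^2 = 1 - c^{ii} \sigma_e^2 / \sigma_g^2$
$r_i = \sqrt{1 - PEV(\hat{g}_i) / \sigma_g^2}$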
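The relationship between the standard error of a BLUP, its PEV, reliability and accuracy can be sketched as follows. The standard error used here is a hypothetical value, not one taken from the ASReml output, and the function names are illustrative.

```python
import math


def reliability(pev, var_g):
    """r^2 = 1 - PEV / sigma_g^2."""
    return 1.0 - pev / var_g


def accuracy(pev, var_g):
    """r = sqrt(r^2): correlation between true and predicted value."""
    return math.sqrt(reliability(pev, var_g))


var_g = 0.0277     # genetic variance (rounded value from the alfalfa example)
se_blup = 0.08     # hypothetical standard error of one predicted effect
pev = se_blup ** 2  # PEV is the squared standard error

r2 = reliability(pev, var_g)
r = accuracy(pev, var_g)
print(f"PEV = {pev:.4f}, reliability = {r2:.3f}, accuracy = {r:.3f}")
```

In practice the standard errors of the random effects would be read from the solutions file, and a PEV larger than the genetic variance would signal a problem with the inputs.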
TESTING VAR. COMPONENTS
LRT: likelihood ratio test
• Based on asymptotic derivations.
• Used to compare nested models; valid only if the fixed effects are the same
in both models (under REML).
• Examples:
H0: ρ = 0 against H1: ρ ≠ 0 (two-sided)
H0: $\sigma_g^2 = 0$ against H1: $\sigma_g^2 > 0$ (one-sided)
• Test statistic: $d = 2[\log L_2 - \log L_1] \sim \chi^2_{r_2 - r_1}$
Hypothesis   P-value
Two-sided    $Prob(\chi^2_{r_2 - r_1} > d)$
One-sided    $0.5(1 - Prob(\chi^2_1 \le d))$
TESTING VAR. COMPONENTS
Critical values
                  α = 0.05                α = 0.01
Δdf (r2 - r1)     Two-sided   One-sided   Two-sided   One-sided
1                 3.84        2.71        6.63        5.41
2                 5.99        4.61        9.21        7.82
3                 7.81        6.25        11.34       9.84
4                 9.49        7.78        13.28       11.67
5                 11.07       9.24        15.09       13.39
Goodness-of-fit statistics
• AIC and BIC can be used to select/rank non-nested models
AIC = – 2×logL + 2×t
BIC = – 2×logL + t×log(v)
t number of variance parameters in the model
v residual degrees of freedom, v = n – p
ALFALFA EXPERIMENT
Model with Variety
7 LogL= 51.7370 S2= 0.47653E-01 66 df 0.5809 1.000
Source Model terms Gamma Component Comp/SE % C
Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P
Model without Variety
2 LogL= 44.8781 S2= 0.75332E-01 66 df 1.000
Source Model terms Gamma Component Comp/SE % C
Variance 72 66 1.00000 0.753324E-01 5.74 0 P
Testing Variety
H0: $\sigma_g^2 = 0$ against H1: $\sigma_g^2 > 0$
d = 2 [51.737 – 44.878] = 13.72 , Δdf = 1
$\chi^2_{0.05}$ = 2.71 (one-sided critical value), p-value < 0.001
Testing Genetic variation
H0: $H^2 = 0$ against H1: $H^2 > 0$
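The one-sided likelihood ratio test for Variety can be reproduced from the two log-likelihoods. This is a minimal sketch using only the Python standard library; it relies on the identity P(χ²₁ > d) = erfc(√(d/2)), and the function name is an illustrative assumption.

```python
import math


def lrt_one_sided(logl_full, logl_reduced):
    """LRT for a single variance component tested on the boundary (H0: variance = 0).

    Returns the test statistic d and the one-sided p-value
    0.5 * (1 - Prob(chi2_1 <= d)).
    """
    d = max(2.0 * (logl_full - logl_reduced), 0.0)
    # Survival function of a chi-square with 1 df: P(X > d) = erfc(sqrt(d / 2)).
    p_two_sided = math.erfc(math.sqrt(d / 2.0))
    return d, 0.5 * p_two_sided


# Log-likelihoods from the models with and without Variety.
d, p = lrt_one_sided(51.7370, 44.8781)
print(f"d = {d:.2f}, one-sided p-value = {p:.2e}")
```

Both models must have the same fixed effects for this comparison of REML log-likelihoods to be meaningful, as noted earlier.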
Session 3
Job Structure in
ASReml
JOB FILE
STRUCTURE .as FILE
PART A: Data definition and reading of data set.
PART B: Definition of analysis (options, linear model, output).
[Job title]
[Data definition]
[Specification of file(s) to read]
[Options]
[Linear model: factors and variables]
[Linear model: variance structure(s)]
[Additional output]
[Restrictions on variance components]
Note: some options can be indicated in this file or they can be added in the
batch command line.
General Relevant File Syntax
~ separates response from the list of fixed and random terms.
! used for identification of option.
# comment following (skips rest of line).
, model specification continues on next line.
$ specifies a user-input option from commands.
Basic Model Syntax Operators
. interaction (e.g. A.B, interaction A and B).
/ forms nested factor expansion.
* forms crossed factor expansion.
+ treated as a space.
- excludes model term from model.
JOB FILE
READING / MANIPULATING DATA
Column variables (continuous or discrete)
• Indented by a single space.
• Case sensitive!
• Should follow the same order of the data in original file.
• Less than 16 characters (recommended).
• Should start with a letter.
• No spaces in field name.
Examples of name and type of variables
yield yield is a continuous variable.
treatment * treatments is a simple coded factor (as 1, 2, ... ).
Variety 12 !A variety is an alphabetically coded factor.
dose 4 !I dose is a numerically coded factor (any number).
sex 2 !I !L m f assigns labels to numerical values.
mother [n] !P mother is link to a pedigree structure.
• ASCII file (delimited by: tab, comma or space).
• “NA”, “*” and “.” identify missing values.
• First s lines can be skipped by using !SKIP s
• Labels are stored in the order in which they are read.
Some manipulations/transformations
!FILTER f filters the data on the variable f
!SELECT v selects observations equal to v in the filter variable f (above)
!=v creates/overwrites a variable with all values equal to v
!+o adds the number o to the variable
!-o subtracts the number o from the variable
!*o multiplies the variable by the number o
!/o divides the variable by the number o
!^p raises the variable to the power p
!^0 calculates the natural logarithm of the variable
!D v discards records with missing values or with value v
!M v converts values of v to missing values
!REPLACE o n replaces data values o with n
READING / MANIPULATING DATA
Examples
Yield !*100 variable yield is multiplied by 100 as it is read
Yield !M-9 observations with -9 are changed to missing
Yield !^0 calculates the natural log of variable yield
Ymean !=0 !+Y1 !+Y2 !/2 mean of two variables
Relevant Options
!SUMMARY provides a histogram, correlations, counts, etc. (see file .ass)
!OUTLIER performs additional outlier checks (see files .res and .yht)
!X x !Y y produces a scatter plot for variables x and y
!SORT re-orders labels in alphabetical order
!MVINCLUDE missing values in a factor or variate are treated as zeros.
!WORKSPACE m assigns m Mbytes of memory for the fitting model
!EXTRA n forces n additional iterations after the model converges
!MAXIT m indicates a maximum of m iterations
!DOPART $A indicates that different parts will be done
!PART n a specific model n within a job file (may list several parts)
!CONTINUE re-starts fitting of model from last iteration
READING / MANIPULATING DATA
Relevant Options
!DISPLAY n selects type(s) of diagnostic plot
!NODISPLAY suppresses diagnostic plot output
!PS saves plots in ps format
!EPS saves plots in eps format
!PNG saves plots in png format
!WMF saves plots in wmf format
!BMP saves plots in bmp format
Coding !DISPLAY n
1 = variogram
2 = histogram
4 = row and column trends
8 = perspective plot of residuals
e.g. 1 + 8 = 9 !DISPLAY 9 (default)
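The !DISPLAY value is the sum of the codes of the requested plots, so any value can be decoded bit by bit. The helper below is a hypothetical illustration of that coding and is not part of ASReml.

```python
# Plot codes as listed in the coding table for !DISPLAY n.
DISPLAY_CODES = {
    1: "variogram",
    2: "histogram",
    4: "row and column trends",
    8: "perspective plot of residuals",
}


def decode_display(n):
    """Return the plot types selected by the summed code n."""
    return [name for code, name in DISPLAY_CODES.items() if n & code]


print(decode_display(9))   # the default, 1 + 8
print(decode_display(7))   # the value used in the alfalfa job file, 1 + 2 + 4
```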
GRAPHICAL OUTPUT
Specification of Linear Models
Univariate case
y ~ <fixed dense> !r <random sparse> !f <fixed sparse>
mu the constant term or intercept (overall mean)
!r random effects to follow
!f sparse fixed effects to follow (not in ANOVA table)
mv term to estimate missing values (as fixed effects)
Examples
yield ~ mu Variety !r Block
Volume ~ mu Site Site.Block !r Mother Mother.Site !f mv
JOB FILE
Specification of Linear Models
• ASReml uses the Wilkinson and Rogers (1973) notation.
A.B indicates crossed factors
A*B = A + B + A.B SAS: A + B + A*B
A/B = A + A.B SAS: A + B(A)
• Note that the model term A.B denotes interaction or nested effects
depending on which other terms are previously included in the model.
Examples
Volume ~ mu Site !r Genotype Site.Genotype
Volume ~ mu Site !r Site.Genotype
Yield ~ mu A.B !r Block
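The expansion rules for the `*` and `/` operators can be written out mechanically. The function below is a hypothetical helper that handles only a single two-factor term; it mirrors the two rules stated above and is not an ASReml facility.

```python
def expand(term):
    """Expand a two-factor term written with '*' or '/' into its model terms."""
    if "*" in term:
        a, b = term.split("*")
        return [a, b, f"{a}.{b}"]      # A*B = A + B + A.B
    if "/" in term:
        a, b = term.split("/")
        return [a, f"{a}.{b}"]         # A/B = A + A.B
    return [term]


print(expand("A*B"))          # crossed factors
print(expand("Site/Block"))   # Block nested within Site
```

For example, `Site/Block` expands to the terms `Site` and `Site.Block`, which is how blocks within sites appear in the Volume example given earlier.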
JOB FILE
Model functions
(to be used after a specified column, or to create new model variables).
and(t) overlays a design matrix for a model term into an existing one
at(f,n) creates a binary variable for the condition specified in a factor
fac(v) forms a factor with the values of a continuous variable
lin(f) transforms the factor f into a covariate
uni(v) creates a factor with a level for every record in the data file
fav(v,y) forms a factor with the levels of a combination of 2 factors
ide(f) fits an additional factor without its genetic relationship matrix
inv(v) calculates inverse of variable v
log(v) calculates the natural logarithm of v
pow(y,p) raises the variable y to the power p
sqrt(v) calculates the square root of v
spl(v,n) fits a spline for variable v with n knots
pol(y,n) forms a set of orthogonal polynomials of order n
JOB FILE
Some options in the variance components
!GP restricts to the positive parameter space
!GU unrestricted
!GF fixed at a given supplied value (e.g. starting value)
!VCC c indicates the number of variance parameters constraints
Example
Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU
JOB FILE
RUNNING ASReml (BATCH MODE)
>asreml –<options> <filename> <arguments>
<options> single letter that identifies output or job options.
<filename> file “.as” with job details.
<arguments> allows for specific user-defined arguments.
Some options
-c re-start iteration from latest one (continue)
-p calculation of a function of variance components (.pin)
-sm assigns different memory space to job (usually 4, 5, 6, 7 or 8)
-rn renames the file with the argument n (default n = 1)
-n suppress interactive graphics
Examples
>asreml –rs3c Alfalfa 1
RUNNING ASReml (JOB FILE MODE)
Add commands/arguments in the first line of job file.
Equivalent to using batch mode but useful within ASReml-W
Some options
!RENAME renames the file with the arguments
!ARGS n specifies the arguments (can be more than one)
!NOGRAPHS suppress interactive graphics
!WORKSPACE w sets workspace to w Mbytes (e.g. 1600)
!CONTINUE re-start iteration from the latest one
Example
!RENAME !ARGS 1 2 !WORKSPACE 1600
Session 4
Breeding Theory
Discrete variation
• Different phenotypic classes are easily distinguished among genotypes
• Few genes with large effect (i.e. major genes).
xi ~ Bin(n, p)
Quantitative variation
• No clear classes between genotypes. Corresponds to most economically
important traits in animal and plant breeding.
• Due to the effect of many genes that contribute to the phenotypic
variation. Every gene with a small additive effect, plus some
environmental variation (infinitesimal model, Fisher 1918).
Probability distribution gi ~ N(0, σ2)
GENETIC VARIATION
p = μ + g + e
• Phenotypic value (p) deviates from the mean (μ) because of the genotypic
component (g) and the environmental deviation (e).
• To isolate g we need to test the progeny!!!
g = a + d + i
p = μ + a + d + i + e
a is the additive component, i.e. cumulative effect of the genes or breeding
value (also known as GCA).
d is the dominance deviation, i.e. interaction between alleles or within-locus
interaction (also known as SCA).
i is the epistatic deviation, i.e. between-loci interaction and higher order
interactions.
e is the random deviation or residual.
PHENOTYPIC VALUE
• Partition of the variance is central to quantitative genetics and breeding,
because it is how we quantify the relative importance of genetic and
environmental influences (e.g. heritability).
• Partition is possible with data where the resemblance among relatives can be
used to estimate genetic variance components.
Vp = Vg + Ve
Vp = Va + Vna + Ve
where, Vna = Vd + Vi is the non-additive variance.
• In the statistical analysis (MM) the genetic variance estimates (e.g. Va) are
obtained by relating them to the causal component (e.g. σa2)
VARIANCE COMPONENTS
Broad sense heritability or degree of genetic determination
H2 = Vg / Vp How much of the total variation is due to genetic
causes (g). Important when working with clonally
replicated individuals.
Narrow sense heritability
h2 = Va / Vp Extent to which phenotypes are determined by the
genes transmitted from parents. Determines the degree
of resemblance among relatives. The most important
measure for breeding programs.
Heritabilities vary from 0 to 1 (e.g. 0.5 could be considered high).
Other definitions: family, plot-mean heritabilities and clonal repeatability
HERITABILITY
Definition
• The average effect of the parental alleles passed to the offspring determines
the mean genotypic value of the offspring, or
• The genetic value of an individual (or cross) judged by mean value of its
progeny.
- Sum of average effects across loci (theoretical, now molecular).
- Mean value of offspring (practical).
• Not equivalent concepts if interaction between loci is present or if mating is
not at random.
Estimation
• By BLUP (Best Linear Unbiased Predictor), i.e. the prediction of the
random effects from linear mixed models.
BREEDING VALUE (BLUP)
BLUP (or EBLUP)
$\hat{g} = GZ'V^{-1}(y - X\beta)$ (BLUP)
$\hat{g} = \hat{G}Z'\hat{V}^{-1}(y - X\hat{\beta})$ (EBLUP)
$\hat{g}$ vector of random effect predictions.
$C' = GZ'$ covariance matrix between observations and random
(genetic) effects to be predicted.
$V$ variance-covariance matrix for the observations.
$(y - X\hat{\beta})$ individual observations ‘corrected’ by fixed effects.
Gain
$\hat{g}_i = [\hat{\sigma}_a^2 / \hat{\sigma}_p^2](y_i - \bar{y}) = \hat{h}^2 (y_i - \bar{y})$
Note: the expression changes depending on what trait is being evaluated (y).
• All kinds of selection aim to increase the frequency of favourable
alleles at loci influencing the selected trait(s).
• Types: mass, parental, family, combined, indirect, forward, backward.
SELECTION
[Figure: breeding cycle with base population, selected population and propagation population; labels: increase genetic gain, increase diversity]
Example
Assuming normal distribution, truncated selection and h2 = 0.4
S = μselected – μpopulation = 35 – 25 =10 cm
SELECTION DIFFERENTIAL (S)
[Figure: normal distributions marked at 25 cm (population mean), 35 cm (selected mean) and 29 cm]
GENETIC GAIN (GA)
• In mass selection, genetic gain can be quantified as the difference between the
average breeding (e.g. additive) values of the selected and original
populations, i.e.
$\Delta G_a = \bar{a}_S - \bar{a}_P = h^2 S$
But $i = S / \sigma_p$, then
$\Delta G_a = h^2 S = i h^2 \sigma_p$
• Genetic gain depends on the selection intensity (i), heritability (h2) and the
phenotypic standard deviation.
• Here i corresponds to the selection differential
(S = μselected – μpopulation) expressed in terms of phenotypic standard deviations.
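The selection example (population mean 25 cm, selected mean 35 cm, h2 = 0.4) can be worked through numerically. In the sketch below the phenotypic standard deviation is a hypothetical value, added only to show that both forms of the gain expression agree; the function names are illustrative.

```python
def genetic_gain(h2, selection_differential):
    """Expected gain from mass selection: h2 * S."""
    return h2 * selection_differential


def selection_intensity(selection_differential, sigma_p):
    """i = S / sigma_p: selection differential in phenotypic standard deviations."""
    return selection_differential / sigma_p


mean_population = 25.0   # cm
mean_selected = 35.0     # cm
h2 = 0.4
sigma_p = 5.0            # hypothetical phenotypic standard deviation (cm)

S = mean_selected - mean_population
gain = genetic_gain(h2, S)
i = selection_intensity(S, sigma_p)
print(f"S = {S} cm, gain = {gain} cm, expected new mean = {mean_population + gain} cm")
print(f"i = {i}, i * h2 * sigma_p = {i * h2 * sigma_p} cm")
```

Under these assumptions the expected mean after selection is about 29 cm, which is consistent with the third value marked in the selection differential example.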
TYPE-A CORRELATIONS
Definition: Correlation between traits (pleiotropy)
• Property of genes of influencing more than one phenotypic trait.
• It could be negative or positive (-1 to 1).
• Informs about the biological relationships among traits.
• Assists in the selection of ‘good’ individuals by looking into two traits
simultaneously.
$rg_{A(p)} = \frac{Cov(p_1, p_2)}{\sqrt{Var(p_1) Var(p_2)}}$
$rg_{A(g)} = \frac{Cov(g_1, g_2)}{\sqrt{Var(g_1) Var(g_2)}}$
Indirect Selection
$\Delta G_{a_1} = i_2 h_1 h_2 \, rg_{A(a)} \, \sigma_{p_1}$
TYPE-B CORRELATIONS
Definition: Correlation between sites
• Is a relative expression of genotype-by-environment interaction.
• It could be zero or positive (0 to 1).
• A value close to 0 indicates that the rank in one environment is very
different from the rank in another environment (i.e. low stability).
• A value close to 1 indicates that a single ranking can be used across all
environments without loss of information (i.e. high stability).
• Vaxs is the estimated variance of the site by genotype interaction.
• The following expressions represent the average correlation between sites
(if more than 2 sites are analyzed).
$rg_{B(g)} = \frac{V_g}{V_g + V_{gxs}}$
$rg_{B(a)} = \frac{V_a}{V_a + V_{axs}}$
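A type-B correlation is the genetic variance divided by the sum of the genetic variance and the genotype-by-site interaction variance. The sketch below uses hypothetical variance estimates purely to illustrate the calculation.

```python
def type_b_correlation(var_genetic, var_gxs):
    """Average genetic correlation between sites: V / (V + V_xs)."""
    return var_genetic / (var_genetic + var_gxs)


# Hypothetical estimates from a multi-site analysis.
var_a = 0.30      # additive variance
var_axs = 0.10    # additive-by-site interaction variance

rg_b = type_b_correlation(var_a, var_axs)
print(f"type-B correlation = {rg_b:.2f}")
```

With these values the correlation is 0.75; a larger interaction variance relative to the genetic variance pushes the value towards 0 and indicates less stable rankings across sites.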
Session 5
Genetic Analyses:
Parental Models
Parental Models
• Half-sib crosses / sire model.
– One parent known. Parent selection.
• Full-sib crosses model.
– Both parents known. Parent/cross selection. Add and Dom effects estimable.
• Family model.
– Both parents known. Cross selection. Add and Dom effects confounded.
• Clonal model.
– Clonally replicated individuals. Parent/cross/individual selection.
Individual Models
• Animal model.
– One or two parents known. Individual/parent selection.
• Reduced animal model.
– One or two parents known. Individual/parent selection (only individuals with
records).
GENETIC MODELS
General aspects
• One parent is known (mother, sire, variety).
• The other parent is assumed to be unknown and to mate at random.
• Only additive component (Va) can be estimated.
• Useful for selection of parents (backward selection).
• Parental pedigree can (and should) be incorporated.
• Runs faster than other models (e.g. animal model).
Difficulties
• Concern about situations under non-random mating.
• Selection does not capture non-additive genetic variability.
HALF-SIB / SIRE MODEL
$y = X\beta + Z_1 b + Z_2 s + e$
y vector of observations
β vector of fixed effects
b vector of random design effects (e.g. block or plot effect), $b \sim N(0, I\sigma_b^2)$
s vector of random sire effects (i.e. ½ breeding value), $s \sim N(0, A\sigma_s^2)$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$
X, Z1 and Z2 are incidence matrices
A is the numerator relationship matrix for sires. Replace by I if no pedigree.
I is an identity matrix
$V_a = 4\sigma_s^2$, $V_p = \sigma_s^2 + \sigma^2$
$h^2 = V_a / V_p = 4\sigma_s^2 / [\sigma_s^2 + \sigma^2]$
In a tree genetic study, seeds were collected from a total of 28 female parents
obtained by mass selection and tested in a RCBD together with 3 control female
parents. The experiment consisted of 10 replicates with 34 plots each of size 2 x 3.
The response variables of interest are total height (HT, cm) and diameter at breast
height (DBH, cm). For now we will concentrate on the response HT. The objective is
to rank the female parents for future selections and seed production. In this analysis
parental pedigree will be ignored. Note that a model can be fitted with or without
the controls included as parents.
OPEN POLLINATION
Example: /Day1/OpenPol/OPENPOL.txt
ID REP PLOT FEMALE TYPE DBH HT
1 1 1 FEM1 Test 23.8 12.4
2 1 1 FEM1 Test 24.4 12.1
3 1 1 FEM1 Test 25.4 10.9
4 1 1 FEM1 Test 28.0 12.7
5 1 1 FEM1 Test 20.9 11.9
6 1 1 FEM1 Test 22.6 11.2
7 1 2 FEM15 Test 22.4 10.7
8 1 2 FEM15 Test 21.9 11.6
9 1 2 FEM15 Test 20.8 11.3
...
OPEN POLLINATION
!RENAME !ARGS 1
Open pollination trial
ID
REP 10 !I
PLOT 34 !I
FEMALE 31 !A !SORT
TYPE 2 !A !SORT
DBH
HT
OPENPOL.TXT !SKIP 1
!MAXIT 40 !DISPLAY 2 !DOPART $A
!PART 1
HT ~ mu REP !r FEMALE REP.PLOT
predict FEMALE
Example: /Day1/OpenPol/OpenPol_.as
OPEN POLLINATION
Interpreting variance components
$f_i \sim N(0, \sigma^2_s)$, $s^2_f = 0.196$
$p_{ij} \sim N(0, \sigma^2_p)$, $s^2_p = 0.053$
$e_{ijk} \sim N(0, \sigma^2)$, $s^2 = 1.020$
Source       Model  terms  Gamma         Component     Comp/SE  % C
FEMALE          31     31  0.192379      0.196155         3.48  0 P
REP.PLOT       340    340  0.518915E-01  0.529102E-01     2.58  0 P
Variance      1876   1866  1.00000       1.01963         27.74  0 P
$V_a = 4 s^2_f = 4 \times 0.196 = 0.785$
$V_p = s^2_f + s^2_p + s^2 = 0.196 + 0.053 + 1.020 = 1.269$
$h^2 = V_a / V_p = 0.785 / 1.269 = 0.619$
Extract solutions for every parent and rank!!! (.sln file)
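The arithmetic above can be reproduced directly from the components printed in the output table. The following is a minimal sketch in Python (not part of ASReml); the function name is hypothetical, and the plot variance is included in Vp as in the worked example.

```python
def half_sib_heritability(var_female, var_plot, var_resid):
    """Va, Vp and h2 for the open-pollinated (half-sib) model.

    Assumes Va = 4 * female variance and Vp = sum of all three components,
    as in the worked example above.
    """
    va = 4.0 * var_female
    vp = var_female + var_plot + var_resid
    return va, vp, va / vp


# Components taken from the "Component" column of the output table
va, vp, h2 = half_sib_heritability(0.196155, 0.0529102, 1.01963)
print(f"Va = {va:.3f}  Vp = {vp:.3f}  h2 = {h2:.3f}")
```

Because the unrounded components are used here, the heritability can differ in the last decimal from the value obtained above with components rounded to three decimals.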
FULL-SIB MODELS
General aspects
• Both parents are known (mother, father, family or cross).
• Mating is often planned (e.g. diallels).
• Additive and dominance components (Va and Vd) can be estimated.
• Some studies also allow estimating common environment effects, reciprocal effects, etc.
• Useful for selection of parents (backward selection) or specific crosses.
• Increased gain as dominance effects can be ‘captured’.
• Parental pedigree can be incorporated.
Difficulties
• Dominance effects usually estimated with low precision, or confounded with
other effects.
• Better results obtained with a proper planning of crosses (e.g. connected
diallels).
• Need to check connectivity and number of crosses per parent (male and
female) otherwise this model cannot be fitted.
FULL-SIB: CLASSIC APPROACH
$$y = X\beta + Z_1 b + Z_2 m + Z_3 f + Z_4 mf + e$$
β vector of fixed effects (e.g. μ, replicate)
b vector of random design effects (e.g. block or plot effect), $b \sim N(0, I\sigma^2_b)$
m vector of random male effects (i.e. ½ BV), $m \sim N(0, A\sigma^2_m)$
f vector of random female effects (i.e. ½ BV), $f \sim N(0, A\sigma^2_f)$
mf vector of random interaction male by female effects, $mf \sim N(0, I\sigma^2_{mf})$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$
$V_a = 2(\sigma^2_m + \sigma^2_f)$ or $V_a = 4\sigma^2_m$ (when $\sigma^2_m = \sigma^2_f$)
$V_d = 4\sigma^2_{mf}$
$V_p = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2$
$h^2 = V_a / V_p = 2(\sigma^2_m + \sigma^2_f) / (\sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2)$
$d^2 = V_d / V_p = 4\sigma^2_{mf} / (\sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2)$
Example: /Day1/ContPol/CONTPOL.txt
A total of 177 families and 8 checklots were planted in a test using a RCBD with 25
blocks. For all families planted both parents are known. In this analysis parental
pedigree will be ignored. The objective is to estimate the different variance
components, and calculate heritabilities for the response variable YIELD.
FULL-SIB: CLASSIC
REP FAMILY FEMALE MALE YIELD
1 FAM007 PAR0001 PAR0024 128.68
1 FAM163 PAR0059 PAR0041 119.462
1 C10 C10 PAR0043 .
1 FAM040 PAR0020 PAR0053 103.641
1 FAM114 PAR0051 PAR0001 .
1 FAM053 PAR0032 PAR0032 .
1 FAM048 PAR0031 PAR0018 .
1 FAM057 PAR0033 PAR0035 155.226
1 FAM120 PAR0051 PAR0051 .
1 FAM165 PAR0059 PAR0059 193.982
1 FAM133 PAR0053 PAR0009 184.308
1 FAM057 PAR0035 PAR0033 .
1 C30 C30 PAR0043 141.912
1 FAM082 PAR0044 PAR0006 288.692
1 FAM060 PAR0034 PAR0037 .
1 FAM169 PAR0015 PAR0024 245.664
1 FAM047 PAR0031 PAR0016 .
...
FULL-SIB: CLASSIC
!RENAME !ARG 1
Control Crosses trial
REP 25 !I
FAMILY 182 !A
FEMALE 54 !A
MALE 57 !A
YIELD
ANALYSIS
FULLSIB
CHECKLOT
CONTPOLL2.TXT !SKIP 1
!MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A
!PART 1
YIELD ~ mu REP !r FEMALE MALE FEMALE.MALE
Example: /Day1/ContPol/ContPol_.as
FULL-SIB
Interpreting variance components
$f_i \sim N(0, \sigma^2_f)$, $s^2_f = 295.9$
$m_j \sim N(0, \sigma^2_m)$, $s^2_m = 315.6$
$F_{ij} \sim N(0, \sigma^2_F)$, $s^2_F = 961.4$
$e_{ijk} \sim N(0, \sigma^2)$, $s^2 = 3816.4$
$V_a = 2(s^2_f + s^2_m) = 2(295.9 + 315.6) = 1223.0$
$V_d = 4 s^2_F = 4 \times 961.4 = 3845.6$
$V_p = s^2_f + s^2_m + s^2_F + s^2 = 295.9 + 315.6 + 961.4 + 3816.4 = 5389.3$
$h^2 = V_a / V_p = 1223.0 / 5389.3 = 0.23$
$d^2 = V_d / V_p = 3845.6 / 5389.3 = 0.71$
Extract solutions for every parent and family and rank!!! (.sln file)
Source       Model  terms  Gamma         Component     Comp/SE  % C
FEMALE          54     54  0.775232E-01  295.857          2.32  0 P
MALE            57     57  0.826941E-01  315.590          2.28  0 P
FAMILY         182    182  0.251902      961.350          6.07  0 P
Variance      3879   3854  1.00000       3816.36         42.73  0 P
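As a cross-check of the full-sib calculations, the sketch below recomputes Va, Vd, h2 and d2 from the components in the output table. It is an illustrative Python helper with a hypothetical name, assuming the classic full-sib formulas given earlier (Va = 2(σ²m + σ²f), Vd = 4σ²mf).

```python
def full_sib_parameters(var_female, var_male, var_family, var_resid):
    """Causal components and ratios for the classic full-sib model."""
    va = 2.0 * (var_female + var_male)   # additive variance
    vd = 4.0 * var_family                # dominance variance
    vp = var_female + var_male + var_family + var_resid
    return {"Va": va, "Vd": vd, "Vp": vp, "h2": va / vp, "d2": vd / vp}


# Components from the FEMALE, MALE, FAMILY and Variance rows
fs = full_sib_parameters(295.857, 315.590, 961.350, 3816.36)
for name, value in fs.items():
    print(f"{name:>3} = {value:.4f}")
```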
FAMILY MODEL (Optional)
General aspects
• More common in animal breeding
• Occurs when parents are only present in a single cross.
• Parents might, or might not, be known.
• Additive and dominance components (Va and Vd) cannot be separated, unless
there is a well connected parental pedigree.
• Useful for family selection or forward selection.
• Of practical use when dominance variance is known to be negligible.
Difficulties
• Dominance effects are confounded with additive effects.
• Potentially it could over-estimate future genetic gain.
$$y = X\beta + Z_1 b + Z_2 F + e$$
β vector of fixed effects (e.g. μ, replication)
b vector of random design effects (e.g. block or plot effect), $b \sim N(0, I\sigma^2_b)$
F vector of random family effects, $F \sim N(0, A\sigma^2_F)$ or $N(0, I\sigma^2_F)$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$
$\sigma^2_F = V_a/2 + V_d/4$
$V_p = \sigma^2_F + \sigma^2$
$h^2_{cross} = V_{family} / V_p = \sigma^2_F / (\sigma^2_F + \sigma^2)$
Va and Vd cannot be separated unless we assume that Vd = 0.
If Vd = 0 then $V_a = 2\sigma^2_F$ and
$h^2 = V_a / V_p = 2\sigma^2_F / (\sigma^2_F + \sigma^2)$
FAMILY MODEL (Optional)
Example: /Day1/FamilyModel/FISHF.txt
A total of 459 fish were derived from single parental crosses composed of 32 sires
and 32 females to generate 32 families. The number of individuals per family varied
from 2 to 40. The idea is to rank the families and progeny for selection by using the
variable WEIGHT.
ID SireID DamID Family Weight
1001 120 125 22 88.3
1002 120 125 22 84.9
1003 120 125 22 76.8
1004 121 114 23 95.4
1005 121 114 23 85.4
1006 121 114 23 74.8
1007 121 114 23 103.4
1008 121 114 23 78.7
1009 121 114 23 109.5
1010 121 114 23 113.1
1011 121 114 23 95.4
1012 121 114 23 91.1
1013 121 114 23 85.4
1014 121 114 23 85.4
1015 121 114 23 86.0
...
FAMILY MODEL (Optional)
!RENAME !ARGS 1
Family fish experiment
ID
SireID 32 !A
DamID 32 !A
Family 32 !A !SORT
Weight
Fish_Family.txt !SKIP 1
!MAXIT 40 !DOPART $A
!PART 1
Weight ~ mu !r Family
Example: /Day1/FamilyModel/FishF_.as
FAMILY MODEL (Optional)
Interpreting variance components
$F_i \sim N(0, \sigma^2_F)$, $s^2_F = 8.12$
$e_{ijk} \sim N(0, \sigma^2)$, $s^2 = 105.65$
$V_{family} = s^2_F = 8.12$
$V_p = s^2_F + s^2 = 8.12 + 105.65 = 113.77$
$h^2_{cross} = V_{family} / V_p = 8.12 / 113.77 = 0.071$
Extract solutions for every parent and rank!!! (.sln file)
Source       Model  terms  Gamma         Component     Comp/SE  % C
Family          32     32  0.768666E-01  8.12079          1.98  0 P
Variance       459    458  1.00000       105.648         14.71  0 P
CLONAL MODEL (Optional)
General aspects
• It can estimate total genetic variability (Vg).
• If both parents are known (mother, father, family or cross) then the additive,
dominance and epistasis components (Va, Vd and Vi) can be reasonably
estimated.
• Useful for selection of parents (backward selection), crosses or specific
genotypes.
• Allows capturing, in new generations, additive, dominance and epistasis
effects.
Difficulties
• Presents same difficulties as full-sib models.
• Some confounding of the epistasis component occurs (higher order terms).
• Occasionally produces negative causal variance components.
$$y = X\beta + Z_1 b + Z_2 m + Z_3 f + Z_4 mf + Z_5 mf.c + e$$
β and b as defined before
m vector of random male effects, $m \sim N(0, A\sigma^2_m)$
f vector of random female effects, $f \sim N(0, A\sigma^2_f)$
mf vector of random interaction male by female effects, $mf \sim N(0, I\sigma^2_{mf})$
mf.c vector of random clonal within family effects, $mf.c \sim N(0, I\sigma^2_c)$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$
$V_a = 2(\sigma^2_m + \sigma^2_f)$ or $V_a = 4\sigma^2_m$ (when $\sigma^2_m = \sigma^2_f$)
$V_d = 4\sigma^2_{mf}$
$V_i = \sigma^2_c - (\sigma^2_m + \sigma^2_f) - 3\sigma^2_{mf}$ (approx.)
$V_g = V_a + V_d + V_i$
$V_p = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2_c + \sigma^2$
$H^2 = V_g / V_p$, $h^2 = V_a / V_p$, $d^2 = V_d / V_p$
CLONAL MODEL (Optional)
Example: /Day1/Clonal/CLONES.txt
A clonal test derived from a total of 61 families crossed in a circular mating
design was established in a field trial with 3 repetitions and incomplete blocks.
Each family has several clones. The objective of this study is to estimate all
variance components (additive, dominance and epistasis).
IDSORT FamilyID Female Male cloneid Rep IncBlock Tree VOL
1 46 Par927 Par931 677 1 1 1 537.7436
2 33 Par908 Par914 476 1 1 2 492.1155
3 53 Par924 Par907 775 1 1 3 704.826
4 41 Par913 Par917 608 1 1 4 494.6012
6 27 Par923 Par905 391 1 2 1 622.0541
7 14 Par925 Par908 192 1 2 2 425.1107
8 22 Par913 Par923 304 1 2 3 298.8255
9 11 Par929 Par920 144 1 2 4 513.8072
11 23 Par901 Par924 320 1 3 1 457.7191
12 60 Par929 Par904 838 1 3 2 709.3598
15 12 Par917 Par921 162 1 3 5 *
16 53 Par924 Par907 763 1 4 1 392.4941
17 13 Par901 Par916 179 1 4 2 463.7218
19 24 Par915 Par904 340 1 4 4 445.3584
20 40 Par922 Par917 592 1 4 5 623.984
21 30 Par904 Par903 424 1 5 1 439.2273
...
CLONAL MODEL (Optional)
!RENAME !ARGS 1
Clonal Analysis of Pinus
IDSORT
FAMILY 61 !A
FEMALE 44 !P
MALE 44 !P
CLONE 868 !A
REP 3 !A
IBLOCK 110 !A
TREE
VOL
PEDPAR.TXT !SKIP 1 !MAKE !ALPHA
CLONES.TXT !SKIP 1
!MAXIT 50 !DISPLAY 2 !DOPART $A
!PART 1
VOL ~ mu REP !r REP.IBLOCK FEMALE MALE FAMILY CLONE !f mv
!PART 2
VOL ~ mu REP !r REP.IBLOCK FEMALE and(MALE) FAMILY CLONE !f mv
Example: /Day1/Clonal/Clonal_.as
Interpreting variance components
Same var. comp. for Male and Female
Source       Model  terms  Gamma         Component     Comp/SE  % C
FAMILY          61     61  0.294966E-01  518.846          1.12  0 P
REP.IBLOCK     330    330  0.353801E-06  0.622336E-02     0.00  0 B
FEMALE          44     44  0.714393E-01  1256.62          2.51  0 P
CLONE          868    868  0.428337      7534.46          8.44  0 P
Variance      2604   1766  1.00000       17590.0         22.65  0 P
Different var. comp. for Male and Female
Source       Model  terms  Gamma         Component     Comp/SE  % C
FEMALE          44     44  0.100569      1769.13          2.04  0 P
MALE            44     44  0.218970E-01  385.195          0.70  0 P
FAMILY          61     61  0.433857E-01  763.208          1.13  0 P
REP.IBLOCK     330    330  0.350074E-06  0.615822E-02     0.00  0 B
CLONE          868    868  0.427149      7514.07          8.42  0 P
Variance      2604   1766  1.00000       17591.2         22.65  0 P
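The clonal-model formulas can be applied to the output that lists separate FEMALE and MALE components. The sketch below is an illustrative Python helper (hypothetical name), using the approximate expression for Vi given earlier and, as in the Vp formula of the slides, leaving the incomplete-block component out of Vp.

```python
def clonal_parameters(var_male, var_female, var_family, var_clone, var_resid):
    """Causal components for the clonal model (Vi is approximate)."""
    va = 2.0 * (var_male + var_female)
    vd = 4.0 * var_family
    vi = var_clone - (var_male + var_female) - 3.0 * var_family
    vg = va + vd + vi
    vp = var_male + var_female + var_family + var_clone + var_resid
    return {"Va": va, "Vd": vd, "Vi": vi, "Vg": vg, "Vp": vp,
            "H2": vg / vp, "h2": va / vp, "d2": vd / vp}


# MALE, FEMALE, FAMILY, CLONE and Variance components from the output
cl = clonal_parameters(385.195, 1769.13, 763.208, 7514.07, 17591.2)
for name, value in cl.items():
    print(f"{name:>3} = {value:.4f}")
```

A negative value of Vi from this calculation would be an instance of the negative causal components mentioned under Difficulties.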
Session 6
Genetic Analyses:
Animal Models
Parental Models
• Half-sib crosses / sire model.
– One parent known. Parent selection.
• Full-sib crosses model.
– Both parents known. Parent/cross selection. Add and Dom effects estimable.
• Family model.
– Both parents known. Cross selection. Add and Dom effects confounded.
• Clonal model.
– Clonally replicated individuals. Parent/cross/individual selection.
Individual Models
• Animal model.
– One or two parents known. Individual/parent selection.
• Reduced animal model.
– One or two parents known. Individual/parent selection (only individuals with
records).
GENETIC MODELS
• Why worry about the pedigree in genetic analyses?
Statistically, random genetic effects (i.e. BLUPs) are not independent and
their matrix of correlations or co-variances (G or A) needs to be specified.
Genetically, it is important to consider information about relatives as they
will share some alleles, and therefore their response is correlated.
• How to incorporate this information?
Genetic relationships can be calculated using genetic theory (expected
values) or molecular information (e.g. SNPs), and included into the linear
mixed model by specifying a pedigree file.
• Are there other benefits?
Many. It is a more efficient use of the information about individuals, and
genetic values of individuals not tested, but with relatives tested, can also be
predicted and used for selection.
INCORPORATING PEDIGREE
Example
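Pedigree of a group of individuals:
Individual  Male  Female
3           1     2
4           1     Unknown
5           4     3
6           5     2
[Figure: pedigree diagram. Individuals 1 and 2 are the parents of 3; 1 and an unknown parent (?) are the parents of 4; 4 and 3 are the parents of 5; 5 and 2 are the parents of 6.]
PEDIGREE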
Numerator relationship matrix (A)
• Linked to the concept of identity by descent.
• Diagonal aii = 1 + Fi (inbreeding coefficient on individual i)
Twice the probability that two gametes taken at random from animal i will
carry identical alleles by descent.
• Off-diagonal aij numerator of the coefficient of relationship between animal
i and j.
• Several algorithms are available in ASReml to obtain this matrix.
PEDIGREE
A (upper triangle shown; the matrix is symmetric)
        1      2      3      4      5      6
1    1.00   0.00   0.50   0.50   0.50   0.25
2           1.00   0.50   0.00   0.25   0.625
3                  1.00   0.25   0.625  0.563
4                         1.00   0.625  0.313
5                                1.125  0.688
6                                       1.125
CALCULATING THE A MATRIX
• Let A = {aij} be the relationship matrix.
• Let $a_{i,-j}$ be the i-th row of A except for the j-th element.
• Assume the relationship matrix for the base animals is known (e.g.
unrelated, non-inbred). This will form a base matrix (e.g. identity).
• The row of the relationship matrix for the progeny i of two parents (sire s
and dam d) is generated as the average of the relationship matrix rows for the parents:
$a_{i,-i} = (a_{s,-i} + a_{d,-i})/2$
• The diagonal element, $a_{i,i}$, of this new individual is:
$a_{i,i} = 1 + a_{s,d}/2 = 1 + F_i$
where $F_i$ is the inbreeding coefficient.
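The recursive rules above can be sketched in a few lines of Python. This is only an illustration of the tabular method for a small pedigree, not the algorithm ASReml uses internally; the function name and the use of 0 for an unknown parent are assumptions of this sketch. Base animals are taken as unrelated and non-inbred.

```python
def relationship_matrix(pedigree, unknown=0):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (individual, sire, dam) with parents listed
    before their progeny; `unknown` marks a missing parent.
    """
    ids = [ind for ind, _, _ in pedigree]
    index = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (ind, sire, dam) in enumerate(pedigree):
        s = index.get(sire) if sire != unknown else None
        d = index.get(dam) if dam != unknown else None
        for parent, p in ((sire, s), (dam, d)):
            if parent != unknown and (p is None or p >= i):
                raise ValueError(f"parent {parent} of {ind} not defined earlier")
        # Off-diagonals: average of the parents' rows
        for j in range(i):
            a_sj = A[s][j] if s is not None else 0.0
            a_dj = A[d][j] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_sj + a_dj)
        # Diagonal: 1 + F_i, with F_i = a_sd / 2
        a_sd = A[s][d] if (s is not None and d is not None) else 0.0
        A[i][i] = 1.0 + 0.5 * a_sd
    return A


# Example pedigree from the slides (founders 1 and 2 added explicitly)
ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
A = relationship_matrix(ped)
for row in A:
    print("  ".join(f"{a:6.4f}" for a in row))
```

The printed matrix can be compared, after rounding, with the A matrix shown for this pedigree.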
PEDIGREE
Analysis Trial AB23
Indiv 6 !P
Sire 3 !A
Dam 2 !A
Sex 2 !I
weight
PEDIGREE.PED !SKIP 1
DATA.DAT !SKIP 1
weight ~ mu Sex !r Indiv
PEDIGREE FILE
Indiv Male Female
1 0 0
2 0 0
3 1 2
4 1 0
5 4 3
6 5 2
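The listing above is the pedigree file as used in ASReml.
[Figure: the same pedigree shown graphically, with an unknown parent (?) for individual 4.]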
Some useful options
!MAKE always generates the A inverse (instead of using a stored one).
!ALPHA accepts alphanumeric names for individuals.
!REPEAT ignores repeated individuals/entries in the pedigree file.
!GIV writes the A inverse matrix in ASCII format (.giv).
!INBRED generates the pedigree for inbred lines.
!SELF s allows for partial selfing according to variable s.
!GROUPS g includes genetic groups in the pedigree according to variable g.
In ASReml
• The pedigree file can be part of the data file
(first 3 columns: individual, parent1 and parent2).
• The method used to construct the A inverse is based on the algorithm of
Meuwissen and Luo (1992).
• Genetic groups can be defined here.
PEDIGREE FILE
Construction / Check
• Pedigree information is associated with proper management and
validation/check of data.
• Individuals need to be ordered by generation (e.g. parents need to be
defined before progeny).
• All parents need to be defined in pedigree file (the inclusion of founder
parents is optional).
• All individuals present in dataset (i.e. levels associated with pedigree file)
need to be defined in pedigree file.
• The same individual can appear as a male or as a female parent (this should be
checked when it is not biologically possible).
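Some of the checks listed above are easy to automate before running ASReml. The following Python sketch is one possible approach; the function name, the tuple layout and the use of 0 for unknown parents are assumptions made for illustration.

```python
def check_pedigree(pedigree, data_ids, unknown=0):
    """Return a list of problems; an empty list means the checks passed.

    pedigree: list of (individual, parent1, parent2) in file order.
    data_ids: individuals that appear in the data file.
    """
    problems = []
    seen = set()
    for ind, p1, p2 in pedigree:
        if ind in seen:
            problems.append(f"{ind}: repeated entry")
        for parent in (p1, p2):
            # Parents must be defined before their progeny
            if parent != unknown and parent not in seen:
                problems.append(f"{ind}: parent {parent} not defined before progeny")
        seen.add(ind)
    for ind in data_ids:
        # Every individual with records must be in the pedigree
        if ind not in seen:
            problems.append(f"{ind}: in data but missing from pedigree")
    return problems


good = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
bad = [(3, 1, 2), (1, 0, 0), (2, 0, 0)]
print(check_pedigree(good, data_ids=[3, 4, 5, 6]))
print(check_pedigree(bad, data_ids=[3, 7]))
```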
ANIMAL / INDIVIDUAL MODEL
General aspects
• Requires defining individual and parental pedigree.
• A breeding value (or GCA) is obtained for each individual in the dataset,
and for all individuals (e.g. parents) in pedigree file.
• Typically used to estimate the additive component (Va) only, but it can be
extended to non-additive and maternal effects.
• Useful for selection of individuals based on additive values (forward
selection) but can be also used to select parents.
• GCA values (or EBV) of parents will be proportional to a parental model.
Difficulties
• For large datasets it can be computationally costly.
• Pedigree file could be difficult to construct/maintain and it needs to be
checked carefully.
$$y = X\beta + Z_1 b + Z_2 a + e$$
β vector of fixed effects
b vector of random design effects (e.g. block effect), $b \sim N(0, I\sigma^2_b)$
a vector of random additive effects (i.e. BV), $a \sim N(0, A\sigma^2_a)$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$
$V_a = \sigma^2_a$
$V_p = \sigma^2_a + \sigma^2$
$h^2 = V_a / V_p = \sigma^2_a / (\sigma^2_a + \sigma^2)$
Note: any individual included in the pedigree file will have a
prediction of its breeding value (even those that are not measured).
ANIMAL / INDIVIDUAL MODEL
The dataset for a fish breeding program contains a total of 933 records of fish.
The objective is to fit an animal model that considers the complete pedigree. The parental pedigree is found in the file PEDPAR.txt, but an individual pedigree
needs to be constructed. For fitting the model consider the factor SEX as a
covariate. The response of interest is days to market size (DAYSM).
ANIMAL / INDIVIDUAL MODEL
Example: /Day1/Fish/FISH.txt
INDIV Sire Dam DaysM Sex Market
1001 564 727 741.46 1 1
1002 564 727 500.09 2 1
1003 564 727 495.07 1 1
1004 564 727 506.25 2 1
1005 564 727 593.21 2 1
1006 564 727 671.10 1 1
1007 564 727 523.48 1 1
1008 564 727 531.33 1 1
1009 564 727 446.02 2 1
1010 564 727 599.20 1 0
1011 564 727 509.38 2 0
...
ANIMAL / INDIVIDUAL MODEL
!RENAME !ARGS 1
Breeding Program Fish
INDIV 2040 !P !SORT
SIRE 100 !I
DAM 100 !I
DAYSM
SEX 2 !I
MARKET
PEDIND.TXT !SKIP 1 !MAKE
FISH.TXT !SKIP 1
!MAXIT 40 !DISPLAY 2 !FCON !DOPART $A
!PART 1
DAYSM ~ mu SEX !r INDIV
Example: /Day1/Fish/Fish_.as
$V_a = s^2_a = 2046.39$
$V_p = s^2_a + s^2 = 2046.39 + 3500.52 = 5546.91$
$h^2 = V_a / V_p = 0.369$
ANIMAL / INDIVIDUAL MODEL
Source       Model  terms  Gamma         Component     Comp/SE  % C
INDIV         1380   1380  0.584596      2046.39          4.52  0 P
Variance       933    931  1.00000       3500.52         10.21  0 P
Wald F statistics
Source of Variation NumDF DenDF_con F-inc F-con M P-con
7 mu 1 77.6 15677.14 15677.14 . <.001
5 SEX 1 888.2 21.88 21.88 A <.001
SEX 1 0.000 0.000
SEX 2 21.57 4.612
mu 1 549.8 5.172
INDIV 501 6.527 37.33
INDIV 502 6.074 35.14
INDIV 503 -27.03 36.32
INDIV 504 -23.94 37.53
INDIV 505 0.6396 35.30
INDIV 506 7.579 38.26
INDIV 507 -8.798 35.33
...
ANIMAL / INDIVIDUAL MODEL
Breeding Program Fish
Ecode is E for Estimable, * for Not Estimable
The predictions are obtained by averaging across the hypertable
calculated from model terms constructed solely from factors
in the averaging and classify sets.
Use !AVERAGE to move ignored factors into the averaging set.
---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----
Predicted values of DAYSM
The SIMPLE averaging set: SEX
INDIV Predicted_Value Standard_Error Ecode
501 567.1392 37.3393 E
502 566.6863 35.0927 E
503 533.5860 36.4141 E
504 536.6737 37.5067 E
505 561.2515 35.2528 E
506 568.1914 38.2210 E
507 551.8138 35.2626 E
508 526.8242 36.6684 E
509 525.0169 37.9278 E
510 523.4792 37.2501 E
511 616.2975 36.1484 E
512 563.8451 37.7190 E
513 541.1283 38.2338 E
514 532.9948 37.0123 E
515 541.1283 38.2338 E
516 538.0093 38.6922 E
105 586.5505 40.6930 E
1 586.5505 40.6930 E
...
ANIMAL / INDIVIDUAL MODEL
Additional comments
• When pedigree is available from several generations, including more than 3
generations usually does not produce a significant improvement in the precision of
estimates.
• Incorporation of genetic groups is critical in order to consider previously
achieved genetic gains, and to describe the proper structure of the data.
• The reduced animal model (RAM) is an alternative that runs faster as only
animals with records are considered.
• Other variants of the animal model exist that consider:
• Environmental effects
• Maternal effects
• Genetic maternal effects
• Non-additive genetic effects (mainly dominance)
• Common environment effects
COMMON ENVIRON. EFFECTS
$$y = X\beta + Z_1 b + Z_2 a + Z_3 ce + e$$
β vector of fixed effects
b vector of random design effects (e.g. block effect), $b \sim N(0, I\sigma^2_b)$
a vector of random additive effects (i.e. BV), $a \sim N(0, A\sigma^2_a)$
ce vector of random common environmental effects, $ce \sim N(0, I\sigma^2_{ce})$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$
$V_a = \sigma^2_a$
$V_p = \sigma^2_a + \sigma^2_{ce} + \sigma^2$
$h^2 = V_a / V_p = \sigma^2_a / (\sigma^2_a + \sigma^2_{ce} + \sigma^2)$
Note: common environment effects are non-genetic effects that cause
resemblance between members of the same family.
Session 7
Variance Structures in
ASReml
VARIANCE STRUCTURES
Direct Product
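• Variance structures are specified by using direct products of two or more
matrices ($\otimes$, or Kronecker product).
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix}$$
Example
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} \sigma^2_1 & \sigma_{12} \\ \sigma_{12} & \sigma^2_2 \end{bmatrix}$$
$$A \otimes B = \begin{bmatrix}
\sigma^2_1 & \sigma_{12} & 0 & 0 & 0 & 0 \\
\sigma_{12} & \sigma^2_2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2_1 & \sigma_{12} & 0 & 0 \\
0 & 0 & \sigma_{12} & \sigma^2_2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2_1 & \sigma_{12} \\
0 & 0 & 0 & 0 & \sigma_{12} & \sigma^2_2
\end{bmatrix}$$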
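To make the direct product concrete, the sketch below builds A ⊗ B for a 3 x 3 identity matrix and a 2 x 2 covariance matrix using plain Python lists. The function name and the numeric values of B are hypothetical, chosen only to show the block-diagonal pattern.

```python
def kron(A, B):
    """Direct (Kronecker) product of two matrices given as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
B = [[1.0, 0.3], [0.3, 2.0]]     # hypothetical variances and covariance
R = kron(I3, B)
for row in R:
    print(row)
```

Each 2 x 2 diagonal block of the result is a copy of B and every off-diagonal block is zero, which is the structure implied when the same B applies independently to each of three units.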
VARIANCE STRUCTURES
Direct Sum
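• The desired matrix is specified by several square matrices arranged in a block
diagonal matrix.
Example
$$R = \oplus_{j=1}^{3} R_j = \mathrm{diag}(A_1, A_2, A_3) = \begin{bmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{bmatrix}$$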
ALFALFA EXPERIMENT
Alfalfa experiment - 12 varieties - Response Yield
Source 3 !I
Variety 12 !A !SORT
Block 6 !I
yield
ALFALFAS.TXT !SKIP 1 !DISPLAY 7 !SUMMARY
yield ~ mu Block !r Variety
3 1 0
24 0 ID
24 0 ID
24 0 ID
An experiment was established to compare 12 alfalfa varieties (labeled A-L).
These correspond to 3 different sources but the objective is to estimate
heritability of varieties regardless of their source. A total of 6 plots per variety
were established, arranged in a RCB design. The response variable
corresponds to yield (tons/acre) at harvest time. It is of interest to fit a linear
model with a specific error variance for each of the different sources.
Example: /Day2/VarStruct/AlfalfaS_.as
ALFALFA EXPERIMENT
Interpreting output
Source       Model  terms  Gamma         Component     Comp/SE  % C
Residual        72     66
Variety         12     12  0.222267E-01  0.222267E-01     1.64  0 P
Variance         0      0  0.928105E-01  0.928105E-01     2.93  0 P
Variance         0      0  0.602051E-01  0.602051E-01     2.99  0 P
Variance         0      0  0.146949E-01  0.146949E-01     2.49  0 P
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 9.9 990.35 <.001
3 Block 5 25.0 24.05 <.001
Variety A 0.1685 0.1000
Variety B -0.1668 0.1000
Variety C 0.4365E-01 0.1000
Variety D 0.1361 0.1000
Variety E 0.1211 0.9013E-01
Variety F -0.1983 0.9013E-01
Variety G -0.8737E-02 0.9013E-01
Variety H 0.3721E-01 0.9013E-01
Variety I 0.1027 0.6540E-01
Variety J -0.1585 0.6540E-01
Variety K -0.9548E-01 0.6540E-01
Variety L 0.1860E-01 0.6540E-01
VARIANCE STRUCTURES
Variance models (VCODE)
VCODE        Description                                  No. of parameters
Common structures
ID           Identity                                     1
DIAG         Diagonal                                     w
US           Unstructured                                 w(w + 1)/2
AINV         Numerator relationship matrix (A)            0 or 1
CORU         Uniform correlation                          1
Correlation/Spatial structures
CORB         Banded correlation                           w - 1
AR1          First order autoregressive                   1
AR2          Second order autoregressive                  2
ARMA         Autoregressive and moving average            2
CORG         General correlation (homogeneous)            w(w - 1)/2
ANTE1        Antedependence of order 1                    w(w - 1)/2
LVR          Linear variance                              1
Correlation-variance structures (homogeneous)
AR1V         First order autoregressive (homogeneous)     2
CORUV        Uniform correlation (homogeneous)            2
CORBV        Banded correlation (homogeneous)             w
CORGV        General correlation (homogeneous)            w(w - 1)/2 + 1
Heterogeneous structures
IDH = DIAG   Identity (heterogeneous)                     w
AR1H         First order autoregressive (heterogeneous)   1 + w
CORUH        Uniform correlation (heterogeneous)          1 + w
CORBH        Banded correlation (heterogeneous)           2w - 1
CORGH = US   General correlation (heterogeneous)          w(w - 1)/2 + w
Special structures
IEXP         Isotropic exponential                        1
AEXP         Anisotropic exponential                      2
OWNk         User supplied G matrix                       k
GIVk         User supplied general (inverse) matrix       0 or 1
VARIANCE STRUCTURES
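ID: identity
$$\sigma^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
DIAG: diagonal
$$\begin{bmatrix} \sigma^2_1 & 0 & 0 & 0 \\ 0 & \sigma^2_2 & 0 & 0 \\ 0 & 0 & \sigma^2_3 & 0 \\ 0 & 0 & 0 & \sigma^2_4 \end{bmatrix}$$
CORUV: uniform correlation
$$\sigma^2 \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}$$
AR1V: autocorrelation 1st order
$$\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}$$
US: unstructured
$$\begin{bmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma^2_2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma^2_4 \end{bmatrix}$$
CORUH: uniform heterogeneous
$$\begin{bmatrix} \sigma^2_1 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 & \rho\sigma_1\sigma_4 \\ \rho\sigma_2\sigma_1 & \sigma^2_2 & \rho\sigma_2\sigma_3 & \rho\sigma_2\sigma_4 \\ \rho\sigma_3\sigma_1 & \rho\sigma_3\sigma_2 & \sigma^2_3 & \rho\sigma_3\sigma_4 \\ \rho\sigma_4\sigma_1 & \rho\sigma_4\sigma_2 & \rho\sigma_4\sigma_3 & \sigma^2_4 \end{bmatrix}$$
CORRELATION STRUCTURES
CORU: uniform correlation
$$\begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}$$
CORG: general correlation
$$\begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14} \\ \rho_{12} & 1 & \rho_{23} & \rho_{24} \\ \rho_{13} & \rho_{23} & 1 & \rho_{34} \\ \rho_{14} & \rho_{24} & \rho_{34} & 1 \end{bmatrix}$$
AR1: autocorrelation 1st order
$$\begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}$$
CORB: banded correlation
$$\begin{bmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ \rho_1 & 1 & \rho_1 & \rho_2 \\ \rho_2 & \rho_1 & 1 & \rho_1 \\ \rho_3 & \rho_2 & \rho_1 & 1 \end{bmatrix}$$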
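Two of the structures above, AR1 and CORUH, can be generated numerically to see how few parameters define a full matrix. This is a small Python sketch with hypothetical function names and illustrative parameter values; it only mirrors the definitions shown and is unrelated to how ASReml stores these matrices.

```python
def ar1_corr(w, rho):
    """w x w first-order autoregressive correlation matrix: rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(w)] for i in range(w)]


def coruh(variances, rho):
    """Uniform correlation with heterogeneous variances.

    Diagonal: sigma_i^2; off-diagonal: rho * sigma_i * sigma_j.
    """
    sd = [v ** 0.5 for v in variances]
    w = len(sd)
    return [
        [variances[i] if i == j else rho * sd[i] * sd[j] for j in range(w)]
        for i in range(w)
    ]


ar1 = ar1_corr(4, 0.5)            # one parameter (rho)
het = coruh([4.0, 9.0], 0.5)      # 1 + w parameters (rho and w variances)
print(ar1)
print(het)
```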
VARIANCE STRUCTURES
Variance Header Line
• Required whenever random effects or residuals are not identically and
independently distributed.
<sections> <dimensions> <number of G structures>
<sections>
• Number of residual (Rj) structures to define.
Example. If several experiments are combined into a single analysis, then
each experiment will have an error structure with its own variance:
$$R = \oplus_{j=1}^{s} R_j$$
However, it is also possible to define each error structure with a direct product:
$$R_j = R_{j1} \otimes R_{j2}$$
VARIANCE STRUCTURES
Variance Header Line
<dimensions>
• Number of variance structures in the direct product required to define
each of the residual, Rj, structures.
Example. A spatial analysis will have an error structure defined by two
elements: correlations across rows and correlations across columns.
<number of G structures>
• Number of random effects (Gi, or any interaction) that are defined with
structures other than identically and independently distributed.
Example. Pedigree matrix can be defined here (G = A)
Note: each of these components will have to be defined in greater detail later.
VAR. STRS. - EXAMPLES
<sections>
• Number of residual structures to define.
3 1 0
1280 0 ID
1320 0 ID
2300 0 ID
!SECTIONS n number of residual structures to define.
3: acts as a counter (here, 3 sites)
1: only a single structure on each of the residual structures
0: no G structures defined
1280: number of observations in site 1 (sorted by site)
0: sortkey (sorting variable not specified here)
ID: VCODE corresponding to independent errors.
1320: number of observations in site 2 (sorted by site)
2300: number of observations in site 3 (sorted by site)
VARIANCE STRUCTURES
<dimensions>
• Number of variance structures in the direct product required to define
each of the residual structures.
1 2 0
16 row AR1
20 col AR1
1: a single residual structure (1 site here)
2: two direct products that define the residual structure
0: no G structures defined
16: number of rows in experiment (it could be replaced by a number)
row: sortkey for order of rows within dataset
AR1: VCODE corresponding to auto-correlated structure
20: number of columns in experiment (it could be replaced by a number)
col: sortkey for order of columns within dataset
VARIANCE STRUCTURES
<number of G structures>
• Number of random effects (or interactions) that are defined.
3 1 1
1280 0 ID
1320 0 ID
2300 0 ID
site.genotype 2
site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05
genotype 0 AINV
3: acts as a counter (here, 3 sites)
1: a single structure in each of the 3 residual elements
1: a single G structure defined
Note: the command !f mv keeps the missing observations and is useful for
counting observations over multiple R structures
VARIANCE STRUCTURES
3 1 1
1280 0 ID
1320 0 ID
2300 0 ID
site.genotype 2
site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05
genotype 0 AINV
site.genotype G structure term to be defined
2 number of factors to define for this G structure
site (or 3): acts as a counter (as before with a value of 3)
0: sortkey (not specified)
CORGH: VCODE heterogeneous general correlation matrix
genotype: acts as counter for the genotype factor
0: sortkey (not specified)
AINV: VCODE inverse of the relationship matrix from pedigree file
Some options in the variance components
!GP restricts to the positive parameter space
!GU unrestricted
!GF fixed at a given supplied value (e.g. starting value)
!VCC c indicates the number of variance parameter constraints
!S2==1 qualifier required to fix the error variance at 1.0 and prevent
ASReml trying to estimate two confounded parameters (usually required for
cases where variance, instead of correlation, matrices are specified)
Example
Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU
VARIANCE STRUCTURES
• Starting values and restrictions can be added next to the parameters.
• Important to aid convergence and to speed up fitting.
VARIANCE STRUCTURES
• Order of starting values for variance and correlation matrices is important.
Variance Matrices
 1                     1  2  4  7
 2  3          or         3  5  8
 4  5  6                     6  9
 7  8  9 10                    10
Correlation Matrices
 7                     7  1  2  3
 1  8          or         8  4  5
 2  4  9                     9  6
 3  5  6 10                    10
Note: for most complex variance structures it is critical to specify starting values.
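For a variance matrix, the order shown above amounts to reading the lower triangle row by row. The Python sketch below (hypothetical function name) flattens a symmetric matrix in that order, which can help when typing starting values for a US structure.

```python
def us_starting_values(V):
    """Lower triangle of a symmetric matrix, read row by row."""
    return [V[i][j] for i in range(len(V)) for j in range(i + 1)]


# 4 x 4 matrix whose entries are labelled with their position in the list
labels = [
    [1, 2, 4, 7],
    [2, 3, 5, 8],
    [4, 5, 6, 9],
    [7, 8, 9, 10],
]
print(us_starting_values(labels))

# A 2 x 2 example: variance 1, covariance, variance 2
print(us_starting_values([[1.01, 1.82], [1.82, 7.25]]))
```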
Next to model terms
!GP positive variance component
!GU unrestricted variance component (default)
!GF fixed variance component
Volume ~ mu Block !r Mother 0.25 Plot 0.4 !GF
After model terms
!VCC n reads n lines of variance component constraints, placed after the G structure definitions.
25 26 # V25 = V26
2 -3 # V2 = -V3
4 5 * 4 # V4 = V5*4
!=ABA all parameters with the same letter in the G or R structure
are treated as the same parameter.
2 0 US 0.2 0.3 0.5 !=ABA
CONSTRAINTS IN VAR-COV COMP.
ALFALFA EXPERIMENT
!PART 3
yield ~ mu Block !r Variety
1 2 0
3 Source DIAG 0.8 0.8 0.8 !=ABA
24 0 ID !S2==1
!PART 4
!VCC 1
yield ~ mu Block !r Variety
3 1 0
24 0 ID
24 0 ID
24 0 ID
4 6
It is of interest to fit a linear model with a common error variance for
sources 1 and 3, and a different one for source 2.
Example: /Day2/VarStruct/AlfalfaS_.as
• Post-analysis procedure to calculate functions of variance components
(e.g. heritability or genetic correlations).
• Based on approximations using the delta method (i.e. Taylor series approximation).
• It should not be used for statistical inference, only as a rough reference.
F Linear functions of variance components
$V_a = 4\sigma^2_s$, $V_p = \sigma^2_s + \sigma^2$
H Ratio of variance components
$h^2 = V_a / V_p = 4\sigma^2_s / (\sigma^2_s + \sigma^2)$
R Correlations based on 3 variance components
$r_g = \mathrm{Cov}(g_1, g_2) / \sqrt{\mathrm{Var}(g_1)\,\mathrm{Var}(g_2)}$
• A .pin file needs to be created with the functions to be calculated
following the order of the variance components presented in the .asr file,
and also uses output from .vvp file.
• Output is presented in file .pvc
FUNCTIONS OF VAR. COMPS.
ASReml options
• Alternatively, the functions can be placed in the job file after the
qualifiers !PIN !DEFINE, which generates the .pin file automatically
and then runs it.
!PART 5
yield ~ mu Block !r Variety
!PIN !DEFINE
F Vg 1 #3
F Vtotal 1 + 2 #4
H Herit 3 4
Variance component estimates
Source       Model  terms  Gamma         Component     Comp/SE  % C
Variety         12     12  0.580868      0.276798E-01     1.81  0 P
Variance        72     66  1.00000       0.476526E-01     5.24  0 P
pvc file
1 Variety     0.276798E-01
2 Variance    0.476526E-01
3 Vg 1        0.27680E-01   0.15265E-01
4 Vtotal 1    0.75332E-01   0.16972E-01
Herit = Vg 1 3/Vtotal 4 = 0.3674 0.1397
Notice: The parameter estimates are followed by
their approximate standard errors.
ALFALFA EXPERIMENT
Example: /Day2/VarStruct/AlfalfaS_.as
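The standard error reported for the heritability comes from the delta method. The Python sketch below shows that calculation for a ratio Vg / (Vg + Ve). ASReml takes the variances and covariances of the estimates from the .vvp file, which is not shown here, so as an assumption made only for illustration the required covariance is back-calculated from the standard errors printed in the pvc output and the rounded Comp/SE value; the function name is hypothetical.

```python
import math


def ratio_se(vg, ve, var_vg, var_ve, cov_vg_ve):
    """h2 = vg / (vg + ve) and its delta-method standard error."""
    vt = vg + ve
    h2 = vg / vt
    d_vg = ve / vt ** 2       # partial derivative of h2 with respect to vg
    d_ve = -vg / vt ** 2      # partial derivative of h2 with respect to ve
    var_h2 = (d_vg ** 2 * var_vg + d_ve ** 2 * var_ve
              + 2.0 * d_vg * d_ve * cov_vg_ve)
    return h2, math.sqrt(var_h2)


vg, ve = 0.0276798, 0.0476526   # Variety and residual components
se_vg = 0.015265                # SE of Vg from the pvc output
se_ve = ve / 5.24               # from the rounded Comp/SE of the residual
se_vt = 0.016972                # SE of Vtotal from the pvc output
# Var(Vg + Ve) = Var(Vg) + Var(Ve) + 2 Cov(Vg, Ve), solved for the covariance
cov = (se_vt ** 2 - se_vg ** 2 - se_ve ** 2) / 2.0

h2, se_h2 = ratio_se(vg, ve, se_vg ** 2, se_ve ** 2, cov)
print(f"h2 = {h2:.4f}  SE = {se_h2:.4f}")
```

Because some inputs are rounded, the standard error obtained this way should be close to, but not necessarily identical to, the value in the pvc output.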
Session 8
Multivariate Analysis /
Repeated Measures
MULTIVARIATE ANALYSIS
General Uses
• More efficient analysis that combines information on two or more response
variables.
• Produces an improvement in the precision of the breeding values (BLUPs).
• Allows estimation of correlations among traits (e.g. phenotypic and genetic
correlations).
• Assists in predicting individual breeding values for traits that were not
measured (but the traits need to be correlated).
• Relevant to assess importance of indirect selection.
• Can be used to combine different sources of data, complete or
incomplete.
• Generates the required matrices to construct a selection index.
• Recommended analysis for cases where a prior selection was done based on a
trait.
BIVARIATE ANALYSIS
• Considers a 2 x 2 matrix for each effect, e.g.
$$V(g_i) = \begin{bmatrix} \sigma^2_{g_{t1}} & \sigma_{g_{t1t2}} \\ \sigma_{g_{t1t2}} & \sigma^2_{g_{t2}} \end{bmatrix}$$
In ASReml
• Uses individual stacked responses: yi = [yi(1) yi(2)]’
• The word Trait is used to define the stacked response vector.
• Typically genetic and error effects are defined with a US (unstructured) variance structure.
• Other effects can be defined as US or DIAG structures.
• It is also recommended to use some of the correlation structures to keep
estimates within the parameter space.
Strategy for fitting models in ASReml
• Sensitive to starting values (as is any multivariate analysis).
• Strategy: start with univariate analyses and add one variable at a time.
• Get rough estimates: Estimate phenotypic or genetic correlations /
covariances using univariate solutions, or prior knowledge.
• Use !CONTINUE or -c to start from the results of previous runs.
In a tree genetic study, seeds from a total of 28 female parents were
collected from mass selection and tested in a RCBD together with 3 control female
parents. The experiment consisted of 10 replicates with 34 plots each, of size 2 x 3.
The response variables of interest are total height (HT, cm) and diameter at breast
height (DBH, cm). For now we will concentrate on the response HT. The objective is
to rank the female parents for future selections and seed production. Note that a
model can be fitted with and without the controls included as parents.
OPEN POLLINATION
Example: /Day2/BivarOpen/OPENPOL.txt
ID REP PLOT FEMALE TYPE DBH HT
1 1 1 FEM1 Test 23.8 12.4
2 1 1 FEM1 Test 24.4 12.1
3 1 1 FEM1 Test 25.4 10.9
4 1 1 FEM1 Test 28.0 12.7
5 1 1 FEM1 Test 20.9 11.9
6 1 1 FEM1 Test 22.6 11.2
7 1 2 FEM15 Test 22.4 10.7
8 1 2 FEM15 Test 21.9 11.6
9 1 2 FEM15 Test 20.8 11.3
10 1 2 FEM15 Test 21.6 13.3
...
OPEN POLLINATION (bivariate)
!RENAME !ARGS 2
Open pollination trial
ID
REP 10 !I
PLOT 34 !I
FEMALE 31 !A !SORT
TYPE 2 !A !SORT
DBH
HT
OPENPOL.TXT !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A
!PART 2
HT DBH ~ Trait Trait.REP !r Trait.FEMALE Trait.REP.PLOT
1 2 2
0 0 ID
Trait 0 US 1.01 1.82 7.25
Trait.FEMALE 2
Trait 0 US 0.19 0.31 0.61
FEMALE 0 ID
Trait.REP.PLOT 3
Trait 0 US 0.05 0.001 0.001 !GUFF
REP 0 ID
PLOT 0 ID
Example: /Day2/BivarOpen/BivarOpen_.as
OPEN POLLINATION (bivariate)
Source Model terms Gamma Component Comp/SE % C
Residual UnStructured 1 1 1.00196 1.00196 29.17 0 U
Residual UnStructured 2 1 1.83449 1.83449 23.69 0 U
Residual UnStructured 2 2 7.43730 7.43730 29.20 0 U
Trait.FEMALE UnStructured 1 1 0.191142 0.191142 3.44 0 U
Trait.FEMALE UnStructured 2 1 0.310167 0.310167 3.16 0 U
Trait.FEMALE UnStructured 2 2 0.705031 0.705031 3.39 0 U
Trait.REP.PLOT DIAGonal 1 0.790064E-01 0.790064E-01 5.19 0 U
Trait.REP.PLOT DIAGonal 2 -0.201829 -0.201829 -2.84 0 U
Covariance/Variance/Correlation Matrix UnStructured Residual
1.002 0.6720
1.834 7.437
Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE
0.1911 0.8449
0.3102 0.7050
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
8 Trait 2 29.2 9584.21 <.001
9 Trait.REP 18 643.1 4.82 <.001
Interpreting analysis
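The correlations reported in the upper triangle of the output matrices can be recovered by hand from the variance and covariance components. The sketch below is only an illustration (the helper name `correlation` is ours, not part of ASReml); it uses the Residual and Trait.FEMALE components listed above, with trait 1 = HT and trait 2 = DBH as given by the order of the responses in the model line.

```python
import math

def correlation(cov_12, var_1, var_2):
    """Correlation implied by a covariance and the two corresponding variances."""
    return cov_12 / math.sqrt(var_1 * var_2)

# Components copied from the output table (order: 1 1, 2 1, 2 2)
r_residual = correlation(1.83449, 1.00196, 7.43730)
r_genetic = correlation(0.310167, 0.191142, 0.705031)

print(f"residual correlation HT-DBH:       {r_residual:.4f}")
print(f"genetic correlation HT-DBH (FEMALE): {r_genetic:.4f}")
```

Both values should agree, up to rounding, with the off-diagonal correlations printed by ASReml (0.6720 and 0.8449).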
MULTIVARIATE ANALYSIS
Extensions
• Consider different sites (or years) as different traits (e.g. helps to classify
sites).
• Variance-covariance matrices can be used to ‘study’ genetic structure
(e.g. evaluating / separating genetic groups).
Strategy for fitting models in ASReml
• To fit the model, use the same strategies as for bivariate analysis.
• Standardize responses, particularly when variables have different scales.
• Implement simple structures first (e.g. ID, DIAG, CORUH, CORGH).
• Correlation variance structures (CORUH, CORBH, CORGH) tend to give
better results.
• Consider constraining some parameters, e.g. !GPFPUP
• Be aware that it might not fit at all!
REPEATED MEASURES
• Very similar to multivariate analysis but every measurement point (time) is
considered as a different trait.
• Requires modelling of the mean effects (patterns) and variance structures.
• Additional modelling of fixed effects of time points is possible (e.g.
polynomials or splines).
• Convergence problems are still present, but to a lesser extent.
• Two modelling approaches:
- Multiple vectors: parallel vectors with, typically, US error structure.
- Single vector: stacked responses with, typically, AR1V correlations.
Relevant functions in ASReml
pol(y,n)     forms a set of orthogonal polynomials of order n
lin(f)       transforms the factor f into a covariate
spl(v,k)     defines a spline model term for the variable v with k knots
!{ and !}    placed around model terms so that the terms are not reordered
             (important for specifying covariances between random terms)
Example: /Day2/MultiVar/MVCOLS.txt
A total of 824 individuals were measured at 4 equally spaced time points (2, 4, 6
and 8 years after establishment). They are the offspring of 26 parents and were
planted as an RCBD with 4 blocks.
REPEATED MEASURES: AS MV
IDD Indiv Female Rep HT1 HT2 HT3 HT4
1 1 F09 1 62.0 108.0 240.0 411.5
2 2 F02 1 66.0 154.0 275.0 442.0
3 3 F21 1 65.0 116.0 245.0 323.1
4 4 F25 1 68.0 102.0 225.0 350.5
5 5 F13 1 58.0 170.0 325.0 457.2
6 6 F14 1 117.0 265.0 445.0 588.3
7 7 F14 1 * * * *
8 8 F15 1 75.0 162.0 315.0 484.6
9 9 F18 1 74.0 182.0 340.0 493.8
10 10 F03 1 100.0 230.0 350.0 518.2
11 11 F07 1 72.0 148.0 310.0 313.9
12 12 F14 1 69.0 164.0 310.0 469.4
13 13 F11 1 87.0 208.0 340.0 493.8
14 14 F24 1 50.0 148.0 290.0 454.2
15 15 F02 1 66.0 173.0 350.0 521.2
16 16 F21 1 75.0 164.0 305.0 469.4
17 17 F15 1 78.0 166.0 315.0 493.8
...
Example: /Day2/MultiVar/MV_.as
REPEATED MEASURES: AS MV
!RENAME !ARGS 1
Multivariate Analysis of HT - 4 meas
IDD
INDIV
FEMALE !A
REP !A
HT1 HT2 HT3 HT4
MVCols.txt !SKIP 1 !MAXIT 40 !DISPLAY 2 !DOPART $A
!PART 1
HT1 HT2 HT3 HT4 ~ Trait Trait.REP !r Trait.FEMALE
1 2 1
0 0 ID
Trait 0 US 419
556 1405
698 1846 3801
821 2306 4624 7154
Trait.FEMALE 2
Tr 0 US 36
48 74
38 70 117
61 126 223 410
FEMALE 0 ID
REPEATED MEASURES: AS MV
Interpreting analysis
Covariance/Variance/Correlation Matrix UnStructured Residual
419.7 0.7241 0.5527 0.4744
556.1 1405. 0.7989 0.7275
698.1 1847. 3801. 0.8868
822.0 2307. 4625. 7155.
Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE
35.60 0.9375 0.5843 0.5068
48.08 73.88 0.7499 0.7245
37.78 69.86 117.5 1.019
61.25 126.2 223.8 410.4
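With a US structure the estimated matrix is not forced to stay inside the parameter space, so it is worth checking the implied correlations. The following sketch (helper name `cov_to_corr` is hypothetical) converts the lower triangle of the Trait.FEMALE matrix above into correlations and lists any pair whose absolute correlation exceeds 1.

```python
import math

def cov_to_corr(lower):
    """Full correlation matrix from a covariance matrix given as lower-triangular rows."""
    n = len(lower)
    sd = [math.sqrt(lower[i][i]) for i in range(n)]
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            corr[i][j] = corr[j][i] = lower[i][j] / (sd[i] * sd[j])
    return corr

# Lower triangle of the Trait.FEMALE matrix (HT1..HT4) from the output above
female = [
    [35.60],
    [48.08, 73.88],
    [37.78, 69.86, 117.5],
    [61.25, 126.2, 223.8, 410.4],
]

corr = cov_to_corr(female)
# Pairs (row, column, correlation) that fall outside [-1, 1]
out_of_bounds = [
    (i + 1, j + 1, round(corr[i][j], 3))
    for i in range(len(corr))
    for j in range(i)
    if abs(corr[i][j]) > 1.0
]
print(out_of_bounds)
```

Only the HT3-HT4 pair is flagged, matching the value 1.019 in the output. This suggests trying a constrained or simpler structure (e.g. CORGH) for this term.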
Example: /Day2/RepMeas/REPCOLS.txt
REPEATED MEASURES: AS UNIV
IDD Indiv Female Rep Time HT
1 1 F09 1 1 62
2 1 F09 1 2 108
3 1 F09 1 3 240
4 1 F09 1 4 411.5
5 2 F02 1 1 66
6 2 F02 1 2 154
7 2 F02 1 3 275
8 2 F02 1 4 442
9 3 F21 1 1 65
10 3 F21 1 2 116
11 3 F21 1 3 245
12 3 F21 1 4 323.1
13 4 F25 1 1 68
14 4 F25 1 2 102
15 4 F25 1 3 225
16 4 F25 1 4 350.5
17 5 F13 1 1 58
18 5 F13 1 2 170
19 5 F13 1 3 325
20 5 F13 1 4 457.2
...
Example: /Day2/RepMeas/RepCols_.as
REPEATED MEASURES
!RENAME !ARGS 1
Repeated Measures Analysis of HT - 4 meas
IDD
INDIV
FEMALE 26 !A
REP 4 !I
TIME 4 !I
HT
REPCOLS.txt !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A
!PART 1
!FILTER TIME !SELECT 1
HT ~ mu REP !r FEMALE
!PART 2
log(HT) ~ mu lin(TIME) TIME.REP !r,
!{ FEMALE lin(TIME).FEMALE !} !f mv
1 2 1
824 0 ID !S2==1
TIME 0 AR1H 0.8 0.05 0.05 0.05 0.05
FEMALE 2
2 0 CORUH -0.8 0.004 0.0001
FEMALE
Source Model terms Gamma Component Comp/SE % C
Residual AR=AutoR 4 0.798949 0.798949 82.75 0 U
Residual AR=AutoR 4 0.641964E-01 0.641964E-01 19.31 0 U
Residual AR=AutoR 4 0.464063E-01 0.464063E-01 19.32 0 U
Residual AR=AutoR 4 0.365361E-01 0.365361E-01 20.19 0 U
Residual AR=AutoR 4 0.310505E-01 0.310505E-01 20.57 0 U
FEMALE CORRelat 2 -0.807724 -0.807724 -6.53 0 U
FEMALE CORRelat 2 0.362337E-02 0.362337E-02 1.87 0 U
FEMALE CORRelat 2 0.262804E-03 0.262804E-03 2.06 0 U
Covariance/Variance/Correlation Matrix CORRelation FEMALE
0.3625E-02 -0.8078
-0.7884E-03 0.2629E-03
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
8 mu 1 23.2 0.37E+06 <.001
9 lin(TIME) 1 24.0 17256.68 <.001
10 TIME.REP 14 2096.0 256.63 <.001
REPEATED MEASURES
Interpreting analysis
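To interpret the AR1H residual estimates above, it can help to expand them into the implied 4 x 4 covariance matrix among the time points. This is a minimal sketch, assuming the usual AR1H form in which the correlation between measurements i and j is rho^|i-j| and each time point has its own variance; the function name is ours.

```python
def ar1h_covariance(rho, variances):
    """Covariance matrix with heterogeneous variances and AR1 correlations rho**|i-j|."""
    n = len(variances)
    sd = [v ** 0.5 for v in variances]
    return [[sd[i] * sd[j] * rho ** abs(i - j) for j in range(n)] for i in range(n)]

# Estimates from the Residual AR=AutoR rows (response is log(HT))
rho = 0.798949
variances = [0.0641964, 0.0464063, 0.0365361, 0.0310505]

cov = ar1h_covariance(rho, variances)
for row in cov:
    print(" ".join(f"{value:9.5f}" for value in row))

# Correlation between the first and last measurement (lag 3)
print(f"lag-3 correlation: {rho ** 3:.3f}")
```

The diagonal reproduces the four time-specific variances, and the correlation decays from about 0.80 at lag 1 to about 0.51 at lag 3.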
Session 9
Multi-environment
Analysis
MET ANALYSIS
General Uses
• Incorporates information from several experiments (over different sites or
years) to obtain overall BVs.
• Allows estimation of Genotype-by-Environment (or Genotype-by-Year)
effects, and their variance structure. Hence, it separates genetic effects into
their pure component and their interaction with site (or year).
• Provides unbiased estimates of heritability and Type-B correlations.
• Critical to understand the genotype structure of the population and to
define breeding strategies.
Difficulties
• Every site (or year) has its own ‘personality’ (i.e. error structure, design
effects, etc.) that needs to be combined into a single analysis.
• The amount of data can be large, leading to difficulties in fitting and convergence.
• Requires additional prior checks (e.g. EDA, coding, etc.).
In ASReml
• Flexible and fast enough to incorporate many datasets.
• Each site will have its own model specification (fixed effects, random
components and error structure).
• Allows the use of a 2-stage analysis (see !TWOSTAGEWEIGHTS).
MET ANALYSIS
Some useful options
Variance header line: <sections> <dimensions> <number of G structures>
at(f,n)       creates a binary variable that is 1 for level n of the factor f
mv            fits missing values as fixed effects (extra columns in the design matrix)
!SECTION n    number of residual structures to define.
!MVINCLUDE    missing values in a factor are treated as zeros
Strategy for fitting MET models in ASReml
• Careful cleaning process (same factors, values, etc.).
• Start analyzing every site individually determining all necessary (and
significant) design effects and error structure.
• Evaluate which sites to consider for full analysis (sites with low
heritability contribute little to ranking).
• Consider implementing a data standardization.
• Incorporate and evaluate which variables or factors will act as
‘covariates’ through all trials.
• Combine all trials into a simple single analysis (e.g. heterogeneous error
variances but with common additive variance).
• Progress slowly to more complex variance structure for different model terms (e.g. DIAG for additive).
• Consider favouring the simplest model that suits your requirements
(practical, operational).
MET ANALYSIS
Complex Variance Structures
• Ideal objective: to fit a US structure to the GxE matrix to understand the
genetic structure and evaluate stability of genotypes and breeding zones.
• A US structure is difficult to fit, but other simpler (approximate) structures
are available.
• ASReml allows other structures based on multivariate techniques to be
considered (e.g. factor analytic covariance).
MET ANALYSIS
TYPE-B CORRELATIONS
Definition: Correlation between sites
\[
r_{gB(g)} = \frac{V_g}{V_g + V_{g \times s}}
\qquad
r_{gB(a)} = \frac{V_a}{V_a + V_{a \times s}}
\]
• Is a relative expression of genotype-by-environment interaction.
• It could be zero or positive (0 to 1).
• A value close to 0 indicates that the rank in one environment is very
different than the rank in another environment (i.e. low stability)
• A value close to 1 indicates that a single ranking can be used across all
environments without loss of information (i.e. high stability).
MET ANALYSIS
Option 1: Simple GxE structure
• Aims at modelling a common GxE correlation.
• Common structures are: DIAG, CORUH.
• The correlation corresponds to an average value across all sites.
• It is simpler to fit and converges easily.
• It does not allow for a better understanding of the GxE.
Option 2: Complex GxE structure
• Aims at modelling the ‘full’ GxE correlation structure.
• Common structures are: CORGH, US, FAk, FACVk.
• Provides a different GxE correlation for each pair of sites.
• It is difficult to fit, particularly with several sites.
• Simplifications are usually required, e.g. standardization.
MET ANALYSIS
Variant 1: Explicit GxE
yield ~ mu Site !r Genotype Site.Genotype
• Provides average genetic values across all sites, together with GxE
deviations for each site.
• Useful for generating a ranking across all sites.
• Allows for simplification of the GxE term.
Variant 2: Implicit GxE
yield ~ mu Site !r Site.Genotype
• Provides a different genetic value for each site.
• Useful for generating rankings for each site.
• It can make use of the full correlation structure of the GxE.
• Typically used to understand the dynamics of GxE.
MET HALF-SIB / SIRE MODEL
Explicit GxE
\[ y = X_1\beta + X_2 l + Z_1 b + Z_2 s + Z_3 sl + e \]
y vector of observations
β vector of fixed design or covariate effects
l vector of fixed location (sites or years) effects
b vector of random design effects (e.g. block effect), $\sim N(0, I\sigma^2_b)$
s vector of random sire effects (i.e. ½ breeding value), $\sim N(0, A\sigma^2_s)$
sl vector of random sire-by-location interactions, $\sim N(0, I\sigma^2_{sl})$
e vector of random residual effects, $\sim N(0, D)$ or $N(0, \oplus_{i=1}^{s} R_i)$
\[ V_a = 4\sigma^2_s \qquad V_{a \times s} = 4\sigma^2_{sl} \]
\[ V_p = \sigma^2_s + \sigma^2_{sl} + \sigma^2 \]
\[ h^2 = V_a / V_p = 4\sigma^2_s / [\sigma^2_s + \sigma^2_{sl} + \sigma^2] \]
\[ r_{gB(a)} = V_a / [V_a + V_{a \times s}] = \rho_s \]
Example: /Day2/MultiEnv/TRIALS4.txt
A set of 4 trials was established as part of a breeding program. A total of 61
unrelated parents were considered (i.e. half-sib model). All trials were
incomplete block designs (IBD) with 4 full replicates. The response variable of
interest is HT. We are interested in obtaining an analysis using all four sites simultaneously.
MET ANALYSIS
IDD Test Genotype Rep Iblock Row Column Surv DBH HT
10001 1 G41 1 1 1 1 1 736.6 557.8
10002 1 G33 1 1 2 1 1 685.8 588.3
10003 1 G22 1 1 3 1 1 838.2 551.7
10004 1 G31 1 1 4 1 1 660.4 539.5
10005 1 G18 1 1 5 1 1 406.4 411.5
10006 1 G01 1 1 6 1 1 508.0 417.6
10007 1 G05 1 1 7 1 1 711.2 518.2
10008 1 G54 1 2 8 1 1 609.6 463.3
10009 1 G30 1 2 9 1 1 482.6 466.3
10010 1 G17 1 2 10 1 1 736.6 527.3
10011 1 G58 1 2 11 1 1 584.2 472.4
10012 1 G37 1 2 12 1 1 431.8 442.0
10013 1 G07 1 2 13 1 1 736.6 600.5
10014 1 G42 1 2 14 1 1 711.2 566.9
10015 1 G38 1 3 15 1 1 711.2 518.2
10016 1 G33 1 3 16 1 1 736.6 606.6
10017 1 G50 1 3 17 1 1 736.6 576.1
10018 1 G20 1 3 18 1 1 660.4 539.5
...
Example Variant 1: /Day2/MultiEnv/GxE_.as
MET ANALYSIS
!RENAME !ARGS 2
Four trials to study GxE for HT
IDD
Test 4 !A !SORT
Genotype 61 !A
REP 4 !A
IBlock 110 !A
Row 56 !A
Col 32 !A
Surv
DBH
HT
TRIALS4.txt !SKIP 1 !MAXIT 50 !DISPLAY 2 !DOPART $A
!PART 2
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Genotype Test.Genotype !f mv
4 1 0
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID
Note: individual site heritabilities can also be calculated.
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
Genotype 100 100 301.167 301.167 4.60 0 P
Test.Genotype 400 400 158.584 158.584 6.74 0 P
at(Test,1).REP.IBloc 4400 4400 1159.04 1159.04 9.75 0 P
at(Test,2).REP.IBloc 4400 4400 1960.32 1960.32 10.84 0 P
at(Test,3).REP.IBloc 4400 4400 815.989 815.989 9.18 0 P
at(Test,4).REP.IBloc 4400 4400 206.324 206.324 4.77 0 P
Variance 0 0 4390.59 4390.59 44.30 0 P
Variance 0 0 3871.67 3871.67 43.39 0 P
Variance 0 0 4130.69 4130.69 42.40 0 P
Variance 0 0 3812.02 3812.02 42.26 0 P
Va = 4 σ2g = 4 x 301.2 = 1204.7
Vaxs = 4 σ2gs = 4 x 158.6 = 634.3
Vp = 301.2 + 158.6 + (4141.7)/4 + (16205.0)/4 = 5546.4
h2 = Va / Vp = 1204.7 / 5546.4 = 0.217
rgB(a) = Va / [Va + Vaxs] = 1204.7 / [1204.7 + 634.3] = 0.655
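The calculation above can be scripted so that it is easy to repeat after each run. This sketch uses the components from the Variant 1 output and, as in the worked example, averages the incomplete-block and residual variances over the four sites when building Vp; the variable names are ours.

```python
# Variance components from the explicit GxE (Variant 1) output
genotype = 301.167                                  # Genotype
test_genotype = 158.584                             # Test.Genotype
iblock = [1159.04, 1960.32, 815.989, 206.324]       # at(Test,i).REP.IBlock
residual = [4390.59, 3871.67, 4130.69, 3812.02]     # site residual variances

def mean(values):
    return sum(values) / len(values)

va = 4 * genotype              # additive variance (half-sib model)
vaxs = 4 * test_genotype       # additive-by-site variance
vp = genotype + test_genotype + mean(iblock) + mean(residual)

h2 = va / vp                   # narrow-sense heritability across sites
rgb = va / (va + vaxs)         # Type-B genetic correlation

print(f"Va = {va:.1f}, Vaxs = {vaxs:.1f}")
print(f"h2 = {h2:.3f}, rgB(a) = {rgb:.3f}")
```

The script gives h2 close to 0.217 and a Type-B correlation close to 0.655.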
MET HALF-SIB / SIRE MODEL
Implicit GxE
\[ y = X_1\beta + X_2 l + Z_1 b + Z_3 sl + e \]
y vector of observations
β vector of fixed design or covariate effects
l vector of fixed location (sites or years) effects
b vector of random design effects (e.g. block effect), $\sim N(0, I\sigma^2_b)$
sl vector of random sire-by-location interactions, $\sim N(0, U \otimes A)$
e vector of random residual effects, $\sim N(0, D)$
U matrix of variance-covariances
A numerator relationship matrix
D diagonal matrix
MET ANALYSIS
!PART 3
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Test.Genotype !f mv
4 1 1
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID
Test.Genotype 2
Test 0 US 520.7
392.2 563.6
256.7 376.6 392.1
384.1 268.8 200.0 356.8
Genotype 0 ID
Example Variant 2: /MultiEnv/GxE_.as
MET ANALYSIS
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
at(Test,1).REP.IBloc 440 440 1161.07 1161.07 9.76 0 P
at(Test,2).REP.IBloc 440 440 1961.80 1961.80 10.84 0 P
at(Test,3).REP.IBloc 440 440 816.001 816.001 9.18 0 P
at(Test,4).REP.IBloc 440 440 207.978 207.978 4.79 0 P
Variance[ 1] 4480 0 4388.87 4388.87 44.30 0 P
Variance[ 2] 4480 0 3871.39 3871.39 43.38 0 P
Variance[ 3] 4608 0 4131.87 4131.87 42.38 0 P
Variance[ 4] 4400 0 3811.58 3811.58 42.26 0 P
Test.Genotype UnStructured 1 1 520.722 520.722 4.86 0 U
Test.Genotype UnStructured 2 1 392.218 392.218 4.21 0 U
Test.Genotype UnStructured 2 2 563.561 563.561 4.94 0 U
Test.Genotype UnStructured 3 1 256.719 256.719 3.43 0 U
Test.Genotype UnStructured 3 2 376.619 376.619 4.44 0 U
Test.Genotype UnStructured 3 3 392.056 392.056 4.65 0 U
Test.Genotype UnStructured 4 1 304.148 304.148 4.04 0 U
Test.Genotype UnStructured 4 2 268.839 268.839 3.59 0 U
Test.Genotype UnStructured 4 3 200.202 200.202 3.20 0 U
Test.Genotype UnStructured 4 4 356.775 356.775 4.66 0 U
Covariance/Variance/Correlation Matrix UnStructured Test.Genotype
520.7 0.7240 0.5682 0.7056
392.2 563.6 0.8012 0.5995
256.7 376.6 392.1 0.5353
304.1 268.8 200.2 356.8
Interpreting variance components
MET ANALYSIS
BLUP values: Variant 1
Effect Level BLUP SE(BLUP)
Genotype G22 11.03 7.085
Test.Genotype 1.G22 10.43 8.368
Test.Genotype 2.G22 7.668 8.238
Test.Genotype 3.G22 -13.59 8.386
Test.Genotype 4.G22 1.297 8.198
BLUP values: Variant 2
Effect Level BLUP SE(BLUP)
Test.Genotype 1.G22 23.17 7.485
Test.Genotype 2.G22 17.8 7.12
Test.Genotype 3.G22 -1.8 7.147
Test.Genotype 4.G22 12.36 6.817
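The two sets of BLUPs can be compared directly: under Variant 1 the site-specific value of a genotype is the sum of its main effect and its interaction deviation, while Variant 2 reports one value per site. The sketch below does this for genotype G22 using the values listed above (dictionary names are ours).

```python
# Variant 1 (explicit GxE): main effect plus site deviations for genotype G22
main_effect = 11.03
interaction = {1: 10.43, 2: 7.668, 3: -13.59, 4: 1.297}

# Variant 2 (implicit GxE): one BLUP per site for genotype G22
variant2 = {1: 23.17, 2: 17.8, 3: -1.8, 4: 12.36}

# Site-specific genetic value implied by Variant 1
site_value = {site: main_effect + dev for site, dev in interaction.items()}

for site in sorted(site_value):
    print(f"site {site}: variant 1 = {site_value[site]:7.2f}   variant 2 = {variant2[site]:7.2f}")

# Order of sites from lowest to highest value under each variant
rank_v1 = sorted(site_value, key=site_value.get)
rank_v2 = sorted(variant2, key=variant2.get)
print(rank_v1, rank_v2)
```

The values are not identical because the two models assume different variance structures, but for G22 they order the sites in the same way.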
Factor Analytic models
• Useful approximations for modelling a U matrix in GxE or multivariate
analyses.
• Flexible models that require fewer variance components than US, and tend
to converge better and quicker.
• Allow for additional interpretation of the underlying environmental factors
associated with the matrix of correlations.
• Finding solutions for FA models can be difficult, requiring proper
specification of initial values.
• Several alternative models are available within ASReml: FAk, FACVk and
XFAk.
• Based on the parameterization:
\[ \Sigma = \Gamma\Gamma' + \Psi \]
MET ANALYSIS
FA model: FAk
\[ \Sigma = D C D \]
D is a diagonal matrix such that $DD = \mathrm{diag}(\Sigma)$
C is a correlation matrix of the form $C = FF' + E$
F is a matrix of loadings on the correlation scale
E is a diagonal matrix defined by difference (remnant).
FA model: FACVk
\[ \Sigma = \Gamma\Gamma' + \Psi \]
Γ is a matrix of loadings on the covariance scale, with $\Gamma = DF$
Ψ is a diagonal matrix, with $\Psi = DED$
MET ANALYSIS
!PART 4
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Test.Genotype !f mv
4 1 1
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID
Test.Genotype 2
Test 0 FA1
0.8 0.9 0.1 0.2 # 1st factor
520.7 563.6 392.1 356.8 # Site Variances
Genotype 0 ID
Example Variant 2: /MultiEnv/GxE_.as
MET ANALYSIS
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
at(Test,1).REP.IBloc 440 440 1159.40 1159.40 9.75 0 P
at(Test,2).REP.IBloc 440 440 1961.62 1961.62 10.84 0 P
at(Test,3).REP.IBloc 440 440 815.999 815.999 9.18 0 P
at(Test,4).REP.IBloc 440 440 207.516 207.516 4.79 0 P
Variance[ 1] 4480 0 4389.44 4389.44 44.29 0 P
Variance[ 2] 4480 0 3871.43 3871.43 43.38 0 P
Variance[ 3] 4608 0 4131.95 4131.95 42.38 0 P
Variance[ 4] 4400 0 3811.38 3811.38 42.26 0 P
Test.Genotype FA D(LL'+E)D 1 1 0.787009 0.787009 10.71 0 U
Test.Genotype FA D(LL'+E)D 1 2 0.931814 0.931814 17.28 0 U
Test.Genotype FA D(LL'+E)D 1 3 0.818246 0.818246 11.66 0 U
Test.Genotype FA D(LL'+E)D 1 4 0.695414 0.695414 7.58 0 U
Test.Genotype FA D(LL'+E)D 0 1 519.153 519.153 4.83 0 U
Test.Genotype FA D(LL'+E)D 0 2 563.923 563.923 4.94 0 U
Test.Genotype FA D(LL'+E)D 0 3 391.055 391.055 4.63 0 U
Test.Genotype FA D(LL'+E)D 0 4 359.863 359.863 4.63 0 U
Covariance/Variance/Correlation Matrix FA D(LL'+E)D Test.Genotype
519.1 0.7333 0.6440 0.5472
396.8 563.9 0.7625 0.6480
290.2 358.1 391.1 0.5690
236.5 291.9 213.4 359.9
Interpreting variance components
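The FA1 estimates can be turned back into the correlation and covariance matrices shown in the output. The sketch below assumes the FAk form used in the output label, D(LL'+E)D, with one loading per site on the correlation scale and the site variances on the diagonal of DD; the function name is hypothetical.

```python
import math

def fa1_matrices(loadings, variances):
    """Correlation, covariance and specific variances implied by a one-factor FA model."""
    n = len(loadings)
    # Off-diagonal correlations are products of loadings; the diagonal is 1
    corr = [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]
    sd = [math.sqrt(v) for v in variances]
    cov = [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]
    # E is obtained by difference so that LL' + E has a unit diagonal
    specific = [1.0 - f * f for f in loadings]
    return corr, cov, specific

# Estimates from the Test.Genotype FA rows of the output
loadings = [0.787009, 0.931814, 0.818246, 0.695414]
variances = [519.153, 563.923, 391.055, 359.863]

corr, cov, specific = fa1_matrices(loadings, variances)
print(f"correlation sites 1-2: {corr[0][1]:.4f}")
print(f"covariance sites 1-2:  {cov[1][0]:.1f}")
print("specific (remnant) proportions:", [round(e, 3) for e in specific])
```

The reconstructed values should match the printed matrix (e.g. 0.7333 and 396.8 for sites 1 and 2). The specific proportions show how much of each site's genetic variance is not explained by the common factor.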
MET ANALYSIS
Two-Stage Analyses
• An MET analysis with several sites (> 5) is difficult to fit, particularly if there are too many variance components to estimate (e.g. US).
• It is possible to use a two-stage analysis that is decomposed as:
1st Stage
• Every site is analysed individually with its own characteristics.
• Genotype effects are assumed fixed.
• Means and SEMs are obtained for each site.
2nd Stage
• All means (and SEMs) are combined into a single file.
• The use of !TWOSTAGEWEIGHTS generates weights (and covariance) for
each prediction and combines the analyses into a single run.
Session 10
Spatial
Analysis
SPATIAL ANALYSIS
General Uses
• It corresponds to an extension to the single vector repeated measures analysis.
• Incorporates information from physical positions (x and y coordinates).
• Effect: improves estimates (BLUPs) and allows for a better control of errors.
Hence, it tends to increase heritability and genetic gains.
• More efficient analysis (under presence of correlation) as it ‘borrows’
information from neighbours.
• ASReml can handle regular or irregular grids.
• Can be used for unreplicated trials!
Difficulties
• At present it is more of an ‘art’ that requires evaluating several options.
• Requires the knowledge of the position of each individual experimental unit
(e.g. plant or plot).
• Additional variance components need to be estimated (i.e. convergence
problems).
SPATIAL ANALYSIS
• Gradients or Trends
Linear trends
Polynomial functions, e.g. $f(x_c, y_c) = \beta_0 + \beta_1 x_c + \beta_2 y_c + \beta_3 x_c^2 y_c + \beta_4 x_c y_c^2$
Row or Column effects (random).
• Patches
Incomplete Blocks
Spatial Error Structures, e.g. AR1⊗AR1 + nugget
\[ \mathrm{Var}(e_{ij}) = \sigma^2_e + \sigma^2_0 \]
\[ \mathrm{Cov}(e_{ij}, e_{i'j'}) = \sigma^2_e \, \rho_x^{h_x} \rho_y^{h_y} \]
SPATIAL ANALYSIS
Strategy in ASReml (regular grid)
• Begin with a separable autoregressive error structure: AR1⊗AR1. This is
a first order autoregressive model that assumes separate correlations ρx and ρy for columns and rows, respectively (i.e. an AR1 in each direction).
• Evaluate if a nugget effect is required (i.e. !r units).
• Check variogram and incorporate additional random or fixed effects for
trends.
• Use a likelihood ratio test (LRT), BIC or AIC to compare models.
Strategy in ASReml (irregular grid)
• Begin with an isotropic exponential (i.e. IEXP) and then move to more
complex models (e.g. AEXP) .
• As before, evaluate if a nugget effect is required (i.e. !r units), check
variogram and incorporate additional random or fixed effects.
VARIANCE STRUCTURES
Correlation/Spatial structures
AR1 First order autoregressive 1
AR2 Second order autoregressive 2
ARMA Autoregressive and moving average 2
LVR Linear variance 1
IEXP Isotropic Exponential 1
AEXP Anisotropic Exponential 2
Relevant functions in ASReml
!S2==1 used to fix the R variance to 1.0
!f mv to include dummy missing values in sparse form
units includes nugget (microsite) random error
pol(y,n) forms a set of orthogonal polynomials of order n
lin(f) transforms the factor f into a covariate
fac(v) forms a factor with the values of a continuous variable
spl(v,k) defines a spline model term for the variable v with k knots
SPATIAL ANALYSIS
Heritability in spatial models
• Traditional expression is only valid when distance between individuals is
assumed to be zero.
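• Generic expression for spatial analyses:
\[
h^2(d) = \frac{4\sigma^2_g}{\sigma^2_g + \sigma^2_0 + \sigma^2_e \, \rho_x^{|d_x|} \rho_y^{|d_y|}}
\]
• An alternative is to use the PEVs to approximate the mean parental
heritability:
\[
h^2_{PEV} = 1 - \frac{\mathrm{mean}\{PEV(\hat{g})\}}{\sigma^2_g}
\]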
SPATIAL ANALYSIS
Comparing spatial models
• Use LRT when models are nested and have the same fixed effect terms.
• Compare AIC (Akaike Information Criterion) and BIC (Bayesian
Information Criterion) to select among non-nested models (but with the same
fixed effect terms).
• Use h2PEV to compare among different models.
• Calculate one of the proposed R2 expressions for mixed models.
t number of variance parameters in the model
v residual degrees of freedom, v = n – p
AIC = – 2×logL + 2×t
BIC = – 2×logL + t×log(v)
Example: /Day2/Spatial/ROWCOL.TXT
SPATIAL TRIAL
ID REP ROW COL PLOT TREE FEMALE X Y YA
1 2 4 1 14 2 4 1 1 8.628352
2 2 4 1 14 1 4 1 2 7.718902
3 2 3 1 26 2 7 1 3 8.041164
4 2 3 1 26 1 7 1 4 9.593278
5 2 2 1 62 2 16 1 5 8.739841
6 2 2 1 62 1 16 1 6 8.456119
7 2 1 1 50 2 13 1 7 9.557565
8 2 1 1 50 1 13 1 8 10.639179
9 1 4 1 1 2 1 1 9 9.938713
10 1 4 1 1 1 1 1 10 8.332414
11 1 3 1 53 2 14 1 11 10.495654
12 1 3 1 53 1 14 1 12 10.130853
13 1 2 1 37 2 10 1 13 11.983712
14 1 2 1 37 1 10 1 14 12.080121
15 1 1 1 33 2 9 1 15 11.203263
16 1 1 1 33 1 9 1 16 10.757546
17 2 4 1 14 4 4 2 1 9.797591
18 2 4 1 14 3 4 2 2 9.206996
19 2 3 1 26 4 7 2 3 8.786462
...
An experiment was established to evaluate a group of open-pollinated families. The
experiment consisted of a row-column design with 4 replicates. The plants within the
experiment were arranged in a 16x16 grid, and it is of interest to rank the female parents
based on the response yield (YA) by fitting a spatial model.
SPATIAL TRIAL
!RENAME !ARGS 1 2
Genetic Spatial trial
ID
REP !I
ROW !I
COL !I
PLOT !I
TREE
FEMALE !A
X 16 # X coordinate
Y 16 # Y coordinate
YA
ROWCOL.TXT !SKIP 1 !MAXIT 40 !DISPLAY 15 !DOPART $A
!PART 1
YA ~ mu REP !r REP.ROW REP.COL FEMALE REP.PLOT !f mv
1 2 0
16 0 ID
16 0 ID
!PART 2
YA ~ mu REP fac(Y) fac(X) !r REP.ROW REP.COL FEMALE REP.PLOT !f mv
1 2 0
16 X AR1 0.3
16 Y AR1 0.3
Example: /Day2/Spatial/Spatial_.as
Interpreting variograms
SPATIAL TRIAL
Traditional Analysis
Spatial Analysis
SPATIAL TRIAL
LogL=-55.4337 S2= 0.40323 252 df
Source Model terms Gamma Component Comp/SE % C
REP.ROW 16 16 0.447880 0.180596 1.96 0 P
REP.COL 16 16 0.144506 0.582684E-01 1.32 0 P
FEMALE 16 16 0.260711 0.105125 1.81 0 P
REP.PLOT 64 64 0.105548 0.425594E-01 0.96 0 P
Variance 256 252 1.00000 0.403225 9.80 0 P
LogL=-61.9450 S2= 0.41594 224 df
Source Model terms Gamma Component Comp/SE % C
REP.ROW 16 16 0.245284 0.102024 1.20 0 P
REP.COL 16 16 0.101193E-06 0.420904E-07 0.00 0 B
FEMALE 16 16 0.279467 0.116242 2.02 0 P
REP.PLOT 64 64 0.503325E-07 0.209354E-07 0.00 0 B
Variance 256 224 1.00000 0.415943 8.85 0 P
Residual AR=AutoR 16 0.522643E-01 0.522643E-01 0.68 0 U
Residual AR=AutoR 16 0.210814 0.210814 2.85 0 U
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
11 mu 1 13.8 6143.27 <.001
2 REP 3 5.6 4.27 0.062
12 fac(Y) 14 14.8 3.33 0.014
13 fac(X) 14 43.1 1.60 0.119
SPATIAL ANALYSIS
BLUP values
            Traditional          Spatial
Female    BLUP   SE(BLUP)     BLUP   SE(BLUP)
1        -0.215    0.197     -0.277    0.189
2         0.204    0.197      0.191    0.190
3        -0.154    0.197     -0.129    0.188
4        -0.099    0.197     -0.207    0.189
Heritabilities
            Traditional   Spatial
Va             0.421       0.465
Vp             0.790       0.634
mean(PEV)      0.039       0.036
h2             0.532       0.733
h2pev          0.631       0.693
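The heritabilities in the table can be recomputed from the variance components of the two fitted models. This is a sketch under the assumptions used in the table: Va = 4 x the FEMALE variance, Vp is the sum of all variance components, and h2pev = 1 - mean(PEV) / FEMALE variance. Function names are ours, and mean(PEV) is taken as rounded in the table, so h2pev agrees only to about two decimals.

```python
def h2_traditional(sigma2_female, other_components):
    """Narrow-sense heritability for a half-sib model: 4*sigma2_f / Vp."""
    vp = sigma2_female + sum(other_components)
    return 4 * sigma2_female / vp

def h2_pev(mean_pev, sigma2_female):
    """Mean parental heritability approximated from prediction error variances."""
    return 1 - mean_pev / sigma2_female

# Traditional model: REP.ROW, REP.COL, REP.PLOT, residual
h2_trad = h2_traditional(0.105125, [0.180596, 0.0582684, 0.0425594, 0.403225])
# Spatial model: REP.ROW, REP.COL, REP.PLOT (both at the boundary), residual
h2_spat = h2_traditional(0.116242, [0.102024, 0.420904e-07, 0.209354e-07, 0.415943])

h2pev_trad = h2_pev(0.039, 0.105125)
h2pev_spat = h2_pev(0.036, 0.116242)

print(f"h2:    traditional = {h2_trad:.3f}, spatial = {h2_spat:.3f}")
print(f"h2pev: traditional = {h2pev_trad:.3f}, spatial = {h2pev_spat:.3f}")
```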
UNREPLICATED TRIALS (UR)
• Field experiments that allow testing several hundred genotypes with
little or no replication.
• Useful for initial stages of genotype screening.
• Most treatments (with the exception of controls or checks) have a single
replication.
• Checks are used for estimation of local control and to detect trends, and they
allow estimation of the residual variance.
• Typically augmented designs are the base for unreplicated trials.
• Using too many check plots could be expensive.
• Checks should have a response similar to that of the test genotypes.
• Statistical analysis can be based on simple (e.g. RCBD) or spatial models
(e.g. AR1⊗AR1).
11 C2 24 112 23 69 C1 96 22 6 34 C1
85 101 48 C1 28 7 89 60 C2 108 74 56
47 C1 10 43 C2 16 52 5 38 33 C2 93
65 111 64 100 81 104 C2 78 C1 113 21 106
12 C2 44 68 42 C1 97 17 32 73 C1 35
25 C1 27 C2 15 88 29 4 53 C2 55 75
102 84 1 49 C1 61 70 C2 18 95 37 C1
46 86 C2 63 2 51 79 39 59 92 C2 57
66 13 C1 82 41 98 C2 90 C1 77 20 36
C1 45 83 87 C2 62 3 30 72 54 105 76
26 C2 9 14 50 8 40 C1 31 19 C2 C1
110 103 67 C1 99 80 C2 71 91 58 109 94
UNREPLICATED TRIALS (UR)
General recommendations
• More control plots improve the efficiency of UR experiments.
• Important gains in efficiency are achieved by using spatial analyses.
An unreplicated pepper trial was established to evaluate a total of 824 genotypes
planted in single plots and arranged as an RCBD with 4 blocks. In addition, a total of
10 control genotypes were planted with 20 replications each (i.e. 5 replications per
block). All these individuals were arranged in a 32x32 grid, and the response variable
yield, YD, was obtained. It is of interest to rank all the singly replicated genotypes.
UNREPLICATED TRIALS (UR)
Example: /Day2/UnRep/PEPPER.TXT
Gens Control Rep X Y YD
6 0 1 1 25 7.91
16 0 1 7 17 9.04
18 0 1 11 26 9.53
19 0 1 16 20 10.08
22 0 1 2 27 9.78
35 0 1 10 26 9.21
39 0 1 4 30 8.86
40 0 1 8 24 9.15
42 0 1 11 25 9.38
45 0 1 15 22 10.64
48 0 1 10 32 10.32
50 0 1 10 31 11.22
51 0 1 8 26 11.45
...
UNREPLICATED TRIALS (UR)
!RENAME !ARGS 1
Augmented Design
Gens 824 !I !SORT
Control 2 !I !SORT
Rep 4 !I
X 32
Y 32
YD
PEPPER.TXT !SKIP 1
!DOPART $A !MAXIT 50
!PART 1
YD ~ mu !r Rep Gens !f mv
1 2 0
32 0 ID
32 0 ID
!PART 2
YD ~ mu !r Rep Gens
1 2 0
32 X AR1 0.5
32 Y AR1 0.5
Example: /Day2/UnRep/Unrep_.as
UNREPLICATED TRIALS (UR)
Traditional Analysis
LogL=-478.184 S2= 0.74805 1023 df
Source Model terms Gamma Component Comp/SE % C
Rep 4 4 0.101193E-06 0.756971E-07 0.00 0 B
Gens 834 834 0.282634 0.211424 2.66 0 P
Variance 1024 1023 1.00000 0.748048 10.43 0 P
Spatial Analysis
LogL=-468.587 S2= 0.77062 1023 df
Source Model terms Gamma Component Comp/SE % C
Rep 4 4 0.101193E-06 0.779810E-07 0.00 0 B
Gens 834 834 0.238505 0.183796 2.48 0 P
Variance 1024 1023 1.00000 0.770617 11.04 0 P
Residual AR=AutoR 32 0.113712 0.113712 2.98 0 U
Residual AR=AutoR 32 0.120829 0.120829 3.06 0 U
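The two pepper models share the same fixed effects (mu only) and are nested, so they can be compared with a likelihood ratio test as well as with AIC and BIC. The sketch below makes several assumptions that should be checked for your own run: t counts the variance parameters as specified (3 and 5), v is the residual df reported in the output (1023), BIC is computed in its usual form -2×logL + t×log(v), and the LRT is referred to a chi-square with 2 df (the two added autocorrelations), whose upper tail has the closed form exp(-x/2).

```python
import math

def aic(logl, t):
    """Akaike Information Criterion from a REML log-likelihood."""
    return -2 * logl + 2 * t

def bic(logl, t, v):
    """Bayesian Information Criterion with v residual degrees of freedom."""
    return -2 * logl + t * math.log(v)

def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square with 2 df (closed form)."""
    return math.exp(-stat / 2)

v = 1023
logl_trad, t_trad = -478.184, 3   # Rep, Gens, residual variance
logl_spat, t_spat = -468.587, 5   # plus two AR1 correlations

lrt = 2 * (logl_spat - logl_trad)
p_value = chi2_sf_2df(lrt)

print(f"LRT = {lrt:.3f}, p = {p_value:.2e}")
print(f"AIC: traditional = {aic(logl_trad, t_trad):.2f}, spatial = {aic(logl_spat, t_spat):.2f}")
print(f"BIC: traditional = {bic(logl_trad, t_trad, v):.2f}, spatial = {bic(logl_spat, t_spat, v):.2f}")
```

Under these assumptions all three criteria favour the spatial model.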
Session 11
Generalized
Linear Mixed Models
GLMM
General Uses
• It corresponds to an extension of linear mixed models to responses with a
distribution other than the Normal, typically Binomial or Poisson.
• It needs the specification of the distribution, together with a link function that
connects the response to the explanatory variables of the linear model.
• For generalized linear models, estimation of parameters is based on maximum
likelihood estimation (MLE), and therefore it can run into problems.
• For generalized linear mixed models, estimation of parameters is based on an
approximation to the MLE.
• Testing is done using a LRT, mainly by comparing the mean deviance.
Difficulties
• Interpretation and calculation of genetic parameters are more difficult as we
are on a different scale.
• Convergence problems are common, and with unbalanced data it is common
to obtain biologically inconsistent estimates.
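BINOMIAL RESPONSES
General expression
\[ \eta = g(\mu) = X\beta + Zg \]
Link: logit
\[ \log_e\left(\frac{p}{1-p}\right) = X\beta + Zg \]
Back-transformed model
\[ p = \mu = \frac{\exp(X\beta + Zg)}{1 + \exp(X\beta + Zg)} \]
Variance expression
\[ \mathrm{Var}(y_i / n_i) = \phi \, \frac{p_i (1 - p_i)}{n_i} \]
Note: $n_i = 1$ for binary data.
$\phi$: over- or under-dispersion parameter
POISSON RESPONSES
General expression
\[ \eta = g(\mu) = X\beta + Zg \]
Link: log
\[ \log_e(m) = X\beta + Zg \]
Back-transformed model
\[ m = \exp(X\beta + Zg) \]
Variance expression
\[ \mathrm{Var}(m) = \phi \, \exp(X\beta + Zg) \]
$\phi$: over- or under-dispersion parameter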
FITTING A GLMM
Relevant functions in ASReml
!BIN assumes a Binomial distribution for the response
!TOTAL specifies vector with the Binomial totals
!POISSON assumes a Poisson distribution for the response
!DISP k estimates the dispersion parameter, or fixes it to k
!LOGIT considers a logit link function
!PROBIT considers a probit link function
!AOD obtains the analysis of deviance table for fixed effects.
Alternatives
• Perform a transformation of the original data, and then back-transform
predictions.
• Assume a normal distribution (by the CLT), whenever values are relatively
large.
• Collapse data into a higher stratum (e.g. PLOT).
GLMM MODEL
Heritability in GLMM (Binomial)
• Calculation is not direct and it requires an approximation.
• Several alternatives are available in the literature
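Logit approach
\[
h^2_{logit} = \frac{4\sigma^2_s}{\sigma^2_s + \sigma^2_e}
\quad \text{with} \quad \sigma^2_e = \pi^2 / 3
\]
Distributional approach
\[
h^2_{Bin} = \frac{4\sigma^2_s}{\sigma^2_s + p(1 - p)}
\]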
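As a small illustration of the logit approach, the sketch below evaluates the expression for a sire (half-sib) model, where the residual variance on the logit scale is fixed at pi^2/3. The sire variance used here (0.15) is a hypothetical value chosen only for the example, not an estimate from the course data.

```python
import math

LOGIT_RESIDUAL = math.pi ** 2 / 3   # variance of the standard logistic distribution

def h2_logit(sigma2_sire):
    """Heritability on the logit scale for a sire model: 4*s2_s / (s2_s + pi^2/3)."""
    return 4 * sigma2_sire / (sigma2_sire + LOGIT_RESIDUAL)

sigma2_sire = 0.15   # hypothetical sire variance on the logit scale
print(f"residual variance on logit scale: {LOGIT_RESIDUAL:.3f}")
print(f"h2 (logit approach): {h2_logit(sigma2_sire):.3f}")
```

Because the residual term is a constant, the resulting heritability depends only on the estimated genetic variance.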
Example: /Day2/GLMM/SALMONAB.TXT
BINOMIAL MODEL
A salmon breeding program evaluated a total of 933 fish records originating
from 124 families. The objective is to select individuals that will constitute the
parents for the next generation. The response variables are MARKETA and
MARKETB, which are binary responses that indicate whether a given individual
qualifies for a given market category. The linear model to fit should consider the full
pedigree and the factor SEX as a covariate.
INDIV Sire Dam DaysM Sex MarketA MarketB
1001 564 727 741.46 1 1 1
1002 564 727 500.09 2 1 1
1003 564 727 495.07 1 1 1
1004 564 727 506.25 2 0 0
1005 564 727 593.21 2 1 1
1006 564 727 671.1 1 1 1
1007 564 727 523.48 1 1 1
1008 564 727 531.33 1 1 1
1009 564 727 446.02 2 1 0
1010 564 727 599.2 1 1 1
1011 564 727 509.38 2 1 1
1012 564 727 643.45 2 1 1
1013 607 707 711.68 1 1 1
...
!RENAME !ARGS 1
Breeding Program Salmon
INDIV 2040 !P !SORT
SIRE 115 !I
DAM 124 !I
DAYSM
SEX 2 !I
MARKETA
MARKETB
PEDIND.TXT !SKIP 1 !MAKE
SALMONAB.TXT !SKIP 1
!MAXIT 40 !DISPLAY 2 !FCON !DOPART $A
!PART 1
MARKETA !BIN !AOD ~ mu SEX !r INDIV
predict INDIV
!PART 2
MARKETA ~ mu SEX !r INDIV
Example: /Day2/GLMM/GLMFish_.as
BINOMIAL MODEL
Interpreting output
BINOMIAL MODEL
Analysis of Deviance Table for MARKETA
Source of Variation df Deviance Derived F
SEX 1 9.20 15.964
Deviance from GLM fit 931 536.33
Variance heterogeneity factor [Deviance/DF] 0.58
Notice: The Derived F is calculated assuming 931 degrees of freedom
which will usually be a false assumption under a mixed model.
The Analysis of Variance below is of the 'working' variable.
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
INDIV 6.68 0.575366 1.0
Source Model terms Gamma Component Comp/SE % C
INDIV 1380 1380 0.575366 0.575366 1.83 0 P
Variance 933 931 1.00000 1.00000 0.00 0 F
Wald F statistics
Source of Variation NumDF DenDF_con F-inc F-con M P-con
8 mu 1 162.4 276.33 134.54 . <.001
5 SEX 1 931.0 4.36 4.36 A 0.038
GLMM MODEL
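Heritability
The INDIV component estimates the additive variance, $\sigma^2_a = 4\sigma^2_s = 0.575$, and $\sigma^2_e = \pi^2/3 = 3.290$:
$$h^2_{logit} = \frac{4\sigma^2_s}{\sigma^2_s + \sigma^2_e} = \frac{0.575}{0.575/4 + 3.290} = 0.168$$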
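The heritability calculation can be checked numerically. The following is a minimal sketch, assuming the logit formula h2 = 4*s2_s / (s2_s + pi^2/3) with the sire variance taken as one quarter of the INDIV (additive) component, and the distributional formula h2 = 4*s2_s / (s2_s + p(1-p)). The function names and the values passed to `h2_binomial` are hypothetical and only illustrate the arithmetic.

```python
import math

def h2_logit_from_additive(sigma2_a):
    """Heritability on the logit scale from an additive (animal model) variance."""
    sigma2_s = sigma2_a / 4.0       # sire variance is one quarter of the additive variance
    sigma2_e = math.pi ** 2 / 3.0   # variance of the standard logistic distribution (about 3.29)
    return 4.0 * sigma2_s / (sigma2_s + sigma2_e)

def h2_binomial(sigma2_s, p):
    """Distributional approach: the residual variance is taken as p(1 - p)."""
    return 4.0 * sigma2_s / (sigma2_s + p * (1.0 - p))

# INDIV component reported in the MARKETA output
h2_marketa = h2_logit_from_additive(0.575366)

# Hypothetical sire variance and incidence, for illustration only
h2_example = h2_binomial(0.01, 0.5)

print(round(h2_marketa, 3), round(h2_example, 3))
```

With the reported component of 0.575366 the logit-scale value rounds to 0.168.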
Predictions
Predicted values of MARKETA
The SIMPLE averaging set: SEX
INDIV Logit_value Stand_Error Ecode Retransformed_value approx_SE
501 2.0548 0.7383 E 0.8864 0.0978
502 2.2073 0.7194 E 0.9009 0.0851
503 1.8882 0.7264 E 0.8686 0.1069
504 2.0722 0.7363 E 0.8882 0.0964
505 2.1586 0.7255 E 0.8965 0.0891
506 2.3017 0.7438 E 0.9090 0.0830
507 2.4341 0.7256 E 0.9194 0.0728
508 1.8930 0.7242 E 0.8691 0.1062
509 2.0337 0.7414 E 0.8843 0.0998
510 1.8163 0.7330 E 0.8601 0.1130
511 2.3666 0.7343 E 0.9142 0.0778
512 2.0696 0.7365 E 0.8879 0.0966
513 2.0390 0.7408 E 0.8848 0.0993
...
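The Retransformed_value column is the inverse logit of the Logit_value column. A minimal sketch of that back-transformation follows, using three rows of the table above; the function name `inv_logit` is an illustrative choice, and the approx_SE column is not reproduced because its calculation is not described here.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Logit_value for three individuals, copied from the prediction table
logit_values = {501: 2.0548, 502: 2.2073, 507: 2.4341}

# Probabilities rounded to four decimals, as in the Retransformed_value column
probabilities = {indiv: round(inv_logit(v), 4) for indiv, v in logit_values.items()}

for indiv, prob in probabilities.items():
    print(indiv, prob)
```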
Session 12
Genomic Selection
In ASReml
• Genetic improvement aims to select the best individuals for the production
and breeding populations. However, traditional breeding is a long and
expensive process, and many traits are difficult to measure.
• More than 20 years ago, molecular markers promised to aid
breeders in selection through Marker Assisted Selection (MAS). Performing
MAS required a QTL or association-genetics type of analysis.
• MAS did work in a few situations, where a marker-QTL association was
found to explain a significant portion of the variance, mainly for single
QTLs with large effect.
• However, most traits of interest in breeding programs are complex
quantitative traits controlled by a large number of genes.
• Meuwissen et al. (2001) proposed using all markers simultaneously as
random effects to predict genetic performance (a.k.a. Genomic Selection).
RATIONALE
• Construct prediction models from the phenotypes and molecular markers of the
current breeding population, capturing most of the quantitative variation.
[Figure: Supplementary Figure 1, Nature Genetics, doi: 10.1038/ng.608. Histograms of the diagonal and off-diagonal elements of the raw and adjusted estimates of the genetic relationship matrix.]
Prediction model construction:
Quantitative phenotypic information (Breeding Value, BV) + Genotypic information (Molecular Markers)
$$BV = 1\mu + \sum_{j=1}^{p} W_j m_j + e$$
GENOMIC SELECTION
• Future individuals are genotyped, and their genotypes are used as input to the
prediction models to select superior genotypes in the next cycles.
Prediction:
Genotypes of Generation i (Molecular Markers)
$$BV_F = \sum_{j=1}^{p} W_{jF} m_j$$
Deployment:
Selection of Generation i+1
GENOMIC SELECTION
• Decrease the generation cycle of breeding (e.g. Perennials, Cattle).
• Decrease the cost of testing (e.g. Cattle, Maize).
• Screening a larger number of genotypes without field testing, thus
increasing the selection pressure (e.g. Maize, other cereals).
• Predict performance for difficult and/or expensive traits (e.g. Cattle,
Salmon).
• Predict disease performance without challenging, and possibly losing, the
germplasm (all species).
• Can be used regardless of the genetic architecture of the trait.
Note
• To apply GS successfully the constructed models need to accurately predict
the genetic performance.
BENEFITS OF GS
Accuracy depends on:
• The level of linkage disequilibrium (LD) between the markers and the QTL
(effective population size and genotyping density).
• The number of individuals with phenotypes and genotypes in the reference
population (training set) from which the marker effects are estimated.
• The heritability of the trait in question, or, if deregressed breeding values
are used (clonal means or progeny testing), the reliability of these breeding
values.
• The distribution of QTL effects, i.e. number of loci involved.
• Quality of the phenotyping used to construct the prediction model.
GENOMIC SELECTION
• BLUP-Based: G-BLUP, RR-BLUP, RR-BLUP_B
• Bayes-Based: BayesA, BayesB, BayesCπ, BayesR
• LASSO-Based: Bayesian Lasso Regression, Improved Lasso
• Semi-Parametric Regression: RKHS
• Non-Parametric: Support Vector Machine, Neural Networks
• Others...
Meuwissen et al 2001; Habier et al 2011; De los Campos et al 2009; Legarra et al 2011;
Gianola et al 2006; Long et al 2011; Gianola et al 2011
ANALYTIC METHODS FOR GS
• Genomic BLUP (GBLUP) is a Genomic Selection method that uses the
same framework as BLUP analysis, but replaces:
– the numerator relationship matrix (A) derived from the pedigree with
– the realized relationship matrix (GA) derived from molecular
markers.
• GA is also known as the observed relationship matrix or genomic matrix.
GBLUP
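Example:
$$W = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 2 & 2 & 0 & 2 \\ 2 & 1 & 1 & 0 \\ 0 & 2 & 2 & 1 \end{bmatrix}, \quad m = \begin{bmatrix} 0.24 \\ 0.02 \\ -0.08 \\ 0.14 \end{bmatrix}, \quad a = \begin{bmatrix} 0.44 \\ 0.80 \\ 0.42 \\ 0.02 \end{bmatrix}$$
• If the markers are capturing all genetic variation, then we can assume that:
$$a = Wm$$
• If we also assume:
$$V(m) = I\sigma^2_m$$
• Then we get:
$$V(a) = WW'\sigma^2_m$$
which is a covariance matrix for the individual breeding values a.
FROM MARKERS TO GA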
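The relation a = Wm and the cross-product WW' can be verified with a few lines of code. This is a minimal sketch using a 4 x 4 marker matrix W and marker effects m for four individuals and four markers; the helper names and the value of `sigma2_m` are hypothetical.

```python
# Marker matrix (individuals in rows, markers in columns) and marker effects
W = [[1, 0, 1, 2],
     [2, 2, 0, 2],
     [2, 1, 1, 0],
     [0, 2, 2, 1]]
m = [0.24, 0.02, -0.08, 0.14]

def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(x * y for x, y in zip(row, v)) for row in M]

def gram(M):
    """Return M M' (cross-products between the rows of M)."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in M] for r1 in M]

# Breeding values as the sum of marker effects: a = W m
a = mat_vec(W, m)

# Covariance of breeding values: V(a) = W W' sigma2_m
WWt = gram(W)
sigma2_m = 0.05  # hypothetical marker variance, for illustration only
V_a = [[sigma2_m * x for x in row] for row in WWt]

print([round(x, 2) for x in a])
```

The rounded breeding values are 0.44, 0.80, 0.42 and 0.02.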
$$V(a) = WW'\sigma^2_m$$
• Ideally, we want to model this covariance using the same classical Linear
Mixed Model framework; therefore, it would be desirable to have this
matrix in terms of $\sigma^2_a$.
• If we recall
$$\sigma^2_a = 2\sum_{i=1}^{ALL\_SNPs} p_i q_i \sigma^2_m$$
then:
$$\sigma^2_m = \frac{\sigma^2_a}{2\sum_{i=1}^{ALL\_SNPs} p_i q_i}$$
$$V(a) = \frac{WW'}{2\sum_{i} p_i q_i}\sigma^2_a = G_A\sigma^2_a$$
by replacing $\sigma^2_m$.
FROM MARKERS TO GA
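The expression GA = WW' / (2 Σ p_i q_i) can be sketched in code as follows. Two points are assumptions: the allele frequencies p_i are estimated from the columns of a 0/1/2-coded W, and the `center` option subtracts 2p_i from each column, as in the first method of VanRaden (2008). The slide does not state whether W is centered, so both variants are shown. All function names are illustrative.

```python
def allele_frequencies(W):
    """Estimate p_i from 0/1/2 allele counts (column mean divided by 2)."""
    n = len(W)
    return [sum(row[j] for row in W) / (2.0 * n) for j in range(len(W[0]))]

def genomic_matrix(W, center=True):
    """G_A = Z Z' / (2 * sum(p_i * q_i)), with Z = W - 2p when center is True."""
    p = allele_frequencies(W)
    scale = 2.0 * sum(pi * (1.0 - pi) for pi in p)   # 2 * sum of p_i * q_i
    if center:
        Z = [[row[j] - 2.0 * p[j] for j in range(len(p))] for row in W]
    else:
        Z = [[float(x) for x in row] for row in W]
    return [[sum(x * y for x, y in zip(r1, r2)) / scale for r2 in Z] for r1 in Z]

# Small 0/1/2-coded marker matrix: 4 individuals by 4 markers
W = [[1, 0, 1, 2],
     [2, 2, 0, 2],
     [2, 1, 1, 0],
     [0, 2, 2, 1]]

G_centered = genomic_matrix(W, center=True)
G_raw = genomic_matrix(W, center=False)

for row in G_centered:
    print([round(x, 3) for x in row])
```

With only four markers the result is purely illustrative; in practice a large number of markers is needed for GA to describe the relationships well.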
$$y = X\beta + Z_1 b + Z_2 a + e$$
β: vector of fixed effects
b: vector of random design effects (e.g. block effect), $b \sim N(0, I\sigma^2_b)$
a: vector of random additive effects (i.e. BV), $a \sim N(0, G_A\sigma^2_a)$
e: vector of random residual effects, $e \sim N(0, I\sigma^2)$
Note:
• The variance-covariance matrix (GA) of the additive effects is now
derived from molecular markers, and it replaces the old A matrix.
ANIMAL MODEL GBLUP
GBLUP
Example of an A matrix and a GA matrix:
$$A = \begin{bmatrix} 1 & 0.5 & 0.25 & 0 \\ 0.5 & 1 & 0.25 & 0 \\ 0.25 & 0.25 & 1 & 0.25 \\ 0 & 0 & 0.25 & 1 \end{bmatrix}$$
$$G_A = \begin{bmatrix} 0.98 & 0.42 & 0.23 & 0.02 \\ 0.42 & 0.99 & 0.26 & 0.01 \\ 0.23 & 0.26 & 1.03 & 0.20 \\ 0.02 & 0.01 & 0.20 & 0.99 \end{bmatrix}$$
ADVANTAGES AND CONSIDERATIONS
• The use of GBLUP instead of pedigree-based BLUP has been shown to
better partition genetic from environmental variation.
• The A matrix is derived under the infinitesimal model and represents an
average relationship.
• The relationship matrix derived from the markers is more informative
because the relationship estimates include the Mendelian sampling.
• Finally, GBLUP is unbiased: E(GA) = A
GBLUP
GBLUP
ADVANTAGES AND CONSIDERATIONS (cont.):
• GBLUP uses the same framework as BLUP (Linear Mixed Models).
• Fewer normal equations need to be solved in the fitting of the model.
• GBLUP is equivalent to RR-BLUP but it is simpler to implement.
• Allows the direct estimation of each individual’s accuracy (i.e. from the SEP found in
the sln files).
• Permits the simultaneous analysis of genotyped and non-genotyped
individuals.
Problem:
• The GA matrix is usually not positive definite.
Solution:
• Bending the matrix, i.e. adding a small constant to the diagonal (e.g. diag(GA) + 0.00001).
• Blending the matrix with A (e.g. GA* = 0.99 GA + 0.01 A).
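Bending and blending can be illustrated with a small sketch. It assumes that positive definiteness is checked by attempting a Cholesky factorisation, and it uses a hypothetical 2 x 2 GA for two individuals with identical genotypes (a singular matrix) together with an identity A. The function names, the example matrices and the default weights are illustrative only.

```python
def is_positive_definite(M):
    """Attempt a Cholesky factorisation; it succeeds only for positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False      # non-positive pivot: not positive definite
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

def bend(G, eps=0.00001):
    """Add a small constant to every diagonal element."""
    n = len(G)
    return [[G[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]

def blend(G, A, w=0.99):
    """Weighted combination w * G + (1 - w) * A."""
    n = len(G)
    return [[w * G[i][j] + (1.0 - w) * A[i][j] for j in range(n)] for i in range(n)]

# Hypothetical example: two individuals with identical genotypes give a singular G
G = [[1.0, 1.0],
     [1.0, 1.0]]
A = [[1.0, 0.0],
     [0.0, 1.0]]

print(is_positive_definite(G))            # the raw matrix fails the check
print(is_positive_definite(bend(G)))      # bending makes the factorisation possible
print(is_positive_definite(blend(G, A)))  # so does blending with A
```

Both fixes change the matrix only slightly, so the estimated relationships are largely preserved while the matrix becomes invertible.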
GBLUP
COMPUTING THE RELATIONSHIP MATRIX
• There are several different algorithms to compute the GA matrix from SNP
data:
• Hayes and Goddard (2008)
• VanRaden (2008) – 2 methods
• Yang et al. (2010) – Human genetics
• Relationship matrices work well to model the variance-covariance of
additive effects assuming a large number of markers is used.
• Overall, the different algorithms to calculate GA do not differ considerably
in their predictive ability.
GBLUP in ASReml
User supplied special variance structures
• The relationship matrix (GA) is computed using a given algorithm from other
software (R, Fortran, etc.) based on molecular markers, and then supplied to
ASReml.
• The GA matrix is supplied as an independent file in ASCII format.
• It should be listed in the job file after the pedigree file, but before the
dataset file (up to 98 GA matrices can be supplied).
• The extension of the file is:
name.grm if the relationship matrix, GA, is provided.
name.giv if the inverse of the relationship matrix, GA-1, is provided.
GBLUP in ASReml
G matrix format
• Could be in dense format (lower triangular row-wise), in which case the !DENSE qualifier must be specified, or
• Can be read in SPARSE (default) format: row, column, value (lower
triangular, row-wise, sorted by column within rows).
• All diagonal elements of the matrix must be included in the file (even 1s).
Options
!SKIP [n]
!DENSEGRM, !DENSEGIV
!SAVEGIV [f] default dense format, use f = 1 for sparse format
Warning
• The number and order of levels must match exactly those of the associated factor (e.g. animalID) read from the data.
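The sparse layout (row, column, value; lower triangle, row-wise, with every diagonal element present) can be produced from a full matrix with a short helper. This is a sketch only: the function names are hypothetical, the matrix is the four-sire GA used in the GMATRIX.grm example later in this session, and omitting zero off-diagonal entries is assumed to be what "sparse" means here.

```python
def to_sparse_triplets(G, skip_zero_offdiag=True):
    """Return (row, column, value) for the lower triangle, row by row, 1-indexed."""
    triplets = []
    for i in range(len(G)):
        for j in range(i + 1):
            if i != j and skip_zero_offdiag and G[i][j] == 0.0:
                continue              # zero off-diagonals are left out; diagonals are always kept
            triplets.append((i + 1, j + 1, G[i][j]))
    return triplets

def format_grm(G, header="Row Col G"):
    """Text of a sparse file with one header line (to be skipped with !SKIP 1)."""
    lines = [header]
    lines += ["%d %d %s" % t for t in to_sparse_triplets(G)]
    return "\n".join(lines)

# Four-sire G_A matrix (sires 10, 20, 30, 40)
G = [[1.023, 0.012, -0.036, 0.364],
     [0.012, 0.992, 0.226, 0.023],
     [-0.036, 0.226, 1.016, 0.068],
     [0.364, 0.023, 0.068, 0.987]]

print(format_grm(G))
```

The rows of the matrix must follow the same level order as the associated factor in the data, as stated in the warning above.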
GBLUP in ASReml
How to associate the G matrix with the genetic factor?
A. In the variance specification lines, e.g.
DAYSM ~ mu SEX !r INDIV
0 0 1
INDIV 1
INDIV 0 GIV1 0.2
B. Directly in the model, e.g.
DAYSM ~ mu SEX !r giv(INDIV,1) 0.12
Warning: The number and order of levels have to match the ones used for
the associated factor read in the data.
An experiment evaluated a total of 10 individuals originating from
full-sib families of 4 sires and 4 dams. The objective is to fit a parental model
(i.e. to select sires) that considers the molecular pedigree information.
GBLUP in ASReml
DATA.txt
INDIV Sire Dam Resp
1001 10 50 155
1002 10 60 121
1003 10 70 130
1004 20 50 141
1005 20 60 130
1006 20 70 162
1007 30 50 118
1008 30 60 108
1009 30 70 119
1010 40 80 143
PEDSIRE.txt
INDIV Sire Dam
10 1 0
20 2 0
30 2 0
40 1 0
Example: /GBLUP/
GBLUP in ASReml
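$$A = \begin{bmatrix} 1 & 0 & 0 & 0.25 \\ 0 & 1 & 0.25 & 0 \\ 0 & 0.25 & 1 & 0 \\ 0.25 & 0 & 0 & 1 \end{bmatrix}$$
$$G_A = \begin{bmatrix} 1.023 & 0.012 & -0.036 & 0.364 \\ 0.012 & 0.992 & 0.226 & 0.023 \\ -0.036 & 0.226 & 1.016 & 0.068 \\ 0.364 & 0.023 & 0.068 & 0.987 \end{bmatrix}$$
$$G_A^{-1} = \begin{bmatrix} 1.130 & -0.020 & 0.073 & -0.421 \\ -0.020 & 1.062 & -0.237 & -0.001 \\ 0.073 & -0.237 & 1.046 & -0.093 \\ -0.421 & -0.001 & -0.093 & 1.175 \end{bmatrix}$$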
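The pedigree-based A matrix for the four sires can be reproduced from PEDSIRE.txt with the tabular method. The sketch below assumes that the unlisted parents 1 and 2 are unrelated, non-inbred founders, that 0 codes an unknown parent, and that parents appear before their offspring in the list. The function name is illustrative.

```python
def numerator_relationship(pedigree):
    """Tabular method. pedigree: (indiv, sire, dam) tuples, parents before offspring."""
    ids = [rec[0] for rec in pedigree]
    pos = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = pos.get(sire)   # None when the parent is unknown (coded 0)
        d = pos.get(dam)
        for j in range(i):
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        inbreeding = 0.5 * A[s][d] if (s is not None and d is not None) else 0.0
        A[i][i] = 1.0 + inbreeding
    return A

# Founders 1 and 2 are added explicitly (assumption), then the records of PEDSIRE.txt
pedigree = [(1, 0, 0), (2, 0, 0),
            (10, 1, 0), (20, 2, 0), (30, 2, 0), (40, 1, 0)]

A_full = numerator_relationship(pedigree)

# Keep only the rows and columns of the four sires 10, 20, 30, 40
A_sires = [row[2:] for row in A_full[2:]]
for row in A_sires:
    print(row)
```

Sires 10 and 40 share parent 1, and sires 20 and 30 share parent 2, which gives the two off-diagonal values of 0.25 (half-sib relationships).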
GMATRIX.grm
Row Col G
1 1 1.023
2 1 0.012
2 2 0.992
3 1 -0.036
3 2 0.226
3 3 1.016
4 1 0.364
4 2 0.023
4 3 0.068
4 4 0.987
GINVG.giv
Row Col GINV
1 1 1.130249244
2 1 -0.020490012
2 2 1.062319971
3 1 0.072807826
3 2 -0.2369711
3 3 1.045793666
4 1 -0.421368173
4 2 -0.0008723
4 3 -0.093379618
4 4 1.175023193
Order of sire levels (rows and columns of the matrices): 10 20 30 40
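The relationship between the .grm and .giv files can be checked by inverting the GA matrix. The following minimal sketch uses Gauss-Jordan elimination in plain Python; the function names are illustrative, and the comparison is limited to the first column of the GINV values listed above.

```python
def invert(M):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_mul(X, Y):
    """Matrix product X Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# G_A as listed in GMATRIX.grm (sires 10, 20, 30, 40)
G = [[1.023, 0.012, -0.036, 0.364],
     [0.012, 0.992, 0.226, 0.023],
     [-0.036, 0.226, 1.016, 0.068],
     [0.364, 0.023, 0.068, 0.987]]

# First column of the inverse as listed in the .giv file
giv_col1 = [1.130249244, -0.020490012, 0.072807826, -0.421368173]

Ginv = invert(G)
max_diff = max(abs(Ginv[i][0] - giv_col1[i]) for i in range(4))
print(max_diff)
```

Supplying either file should therefore describe the same covariance structure, as long as the level order matches the one listed above.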
GBLUP in ASReml
Job file using the GINV matrix:
!RENAME !ARGS 2
Evaluating GBLUP
INDIV 10 !I
Sire 4 !I
Dam 3 !I
Resp
GINVM.giv !SKIP 1
DATA.txt !SKIP 1 !DOPART $A
!PART 2 # Using GINVM.giv
Resp ~ mu !r giv(Sire,1) Dam
predict Sire
!PART 3 # Another way for GINV
Resp ~ mu !r Sire Dam
1 1 1
10 0 ID
Sire 1
Sire 0 GIV1 200
predict Sire
Job file using the G matrix:
!RENAME !ARGS 4
Evaluating GBLUP
INDIV 10 !I
Sire 4 !I #!P #!I
Dam 3 !I
Resp
GMATRIX.grm !SKIP 1
DATA.txt !SKIP 1 !DOPART $A
!PART 4 # Using GMATRIX.grm
Resp ~ mu !r giv(Sire,1) Dam
GBLUP in ASReml
!RENAME !ARGS 4
Evaluating GBLUP
INDIV 10 !I
Sire 4 !P #!I
Dam 3 !I
Resp
DUMMYPED.txt !MAKE !SKIP 1
GMATRIX6.grm !SKIP 1
DATA.txt !SKIP 1 !DISPLAY 7 !DOPART $A
!PART 5 # Doing Predictions GMATRIX6.grm
Resp ~ mu !r giv(Sire,1) Dam
Predictions for ‘new’ individuals
1.023 0.012 -0.036 0.364 0.083 0.176
0.012 0.992 0.226 0.023 0.023 0.508
-0.036 0.226 1.016 0.068 -0.011 0.136
0.364 0.023 0.068 0.987 0.123 0.495
0.083 0.023 -0.011 0.083 0.996 0.077
0.176 0.508 0.136 0.495 0.077 1.010
GA matrix, with rows and columns ordered by sire: 10 20 30 40 50 60
GBLUP in ASReml
Sire Predicted_Value Standard_Error Ecode
10 135.8410 7.3084 E
20 141.4311 7.3654 E
30 120.1485 7.3634 E
40 137.4303 9.8927 E
50 134.5924 15.2820 E
60 139.5677 11.4333 E
SED: Overall Standard Error of Difference 12.58
Predictions for ‘new’ individuals
Source Model terms Gamma Component Comp/SE % C
Dam 4 4 0.318666 46.7809 0.48 0 P
giv(Sire,1) 6 6 1.14196 167.642 0.81 0 P
Variance 10 9 1.00000 146.802 1.41 0 P
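In this output the Gamma column is the ratio of each variance component to the residual variance, so Component = Gamma x residual variance. The short sketch below checks this for the values reported above; the variable names are illustrative.

```python
# Residual variance and gammas copied from the output above
residual_variance = 146.802
gammas = {"Dam": 0.318666, "giv(Sire,1)": 1.14196}

# Component = Gamma * residual variance
components = {term: g * residual_variance for term, g in gammas.items()}

for term, value in components.items():
    print(term, round(value, 3))
```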
GBLUP in ASReml
FINAL COMMENTS
• Modifications can be made to incorporate the observed relationships of parents
and all offspring.
• Individuals with measurements correspond to the training population, and ‘new’
individuals in the GA matrix are treated as the prediction population.
• It is possible to combine pedigree data (A) with observed relationships (GA)
into a single matrix. This allows individuals without
molecular data to be considered.
• An observed dominance relationship matrix (GD) can also be incorporated to
model these interactions or higher-order interactions, e.g. A#D.
• Further understanding of the construction (and properties) of the GA matrix
is required.