
Analysis of Experiments using ASReml:

with emphasis on breeding trials ©

Salvador A. Gezan

Patricio R. Muñoz


October, 2014


CONTENTS

Session

1 Introduction to ASReml

2 Introduction to Linear Mixed Models

3 Job Structure in ASReml

4 Breeding Theory

5 Genetic Analyses: Parental Models

6 Genetic Analyses: Animal Models

7 Variance Structures in ASReml

8 Multivariate Analysis

9 Multi-environment Analysis

10 Spatial Analyses

11 Generalized Linear Mixed Models

12 Introduction to GBLUP

13 GBLUP in ASReml

Session 1

Introduction to

ASReml

Gezan and Munoz (2014). Analysis of Experiments using ASReml: with emphasis on breeding trials ©

“ASReml is a statistical package that fits linear mixed models using

Residual Maximum Likelihood (REML)”

“Typical applications include the analysis of (un)balanced longitudinal data,

repeated measures analysis, the analysis of (un)balanced designed

experiments, the analysis of multi-environment trials, the analysis of both

univariate and multivariate animal breeding, genetics data and the analysis

of regular or irregular spatial data.”

ASReml uses the Average Information (AI) algorithm and sparse matrix

methods.

• Useful for the analysis of large and complex datasets.

• Very flexible in modelling a wide range of variance structures for random effects and errors (although complex to program).

WHAT is ASReml?

Distributor Page

http://www.vsni.co.uk/products/asreml (version 3)

Platforms

Windows 98/ME/2000/XP/Vista/Windows7

Linux

Apple Macintosh

Interface

DOS (edit)

Windows (Notepad, ASReml-W)

R (or S-plus)

Text editors (e.g. ConTEXT)

GSView (graphical viewer)

Terminal (Mac)

HOW TO GET ASReml?

http://www.vsni.co.uk/products/asreml


WHERE TO GET HELP?

Official Documentation c:\Program Files\Asreml3\Doc\

UserGuide.pdf (use Find window for searching)

UpdateR3.pdf

Webpages

uncronopio.org/ASReml/HomePage (cookbook)

http://www.vsni.co.uk/software/asreml/htmlhelp/ (distributor page)

www.vsni.co.uk/forum (user forum)

http://www.animalgenome.org/bioinfo/resources/manuals/ASReml/ASReml.htm

STEPS FOR AN ANALYSIS

• Identify the problem and experimental design / observational study.

• Detail treatment and design structure.

• Specify hypotheses / components of interest.

• Collect and prepare data file (e.g. Excel, Access).

• Perform initial data validation and exploratory data analysis (EDA) in

statistical software (e.g. SAS, R, GenStat).

– Definition / modification of the linear model.

– Running / fitting of the linear model.

– Checking output.

• Extract final output.

• Report analysis.

STEPS FOR AN ANALYSIS IN ASReml

• Prepare ASCII data file (any ASCII editor).

• Prepare a job file (.as, e.g. ASReml-W, ConTEXT).

• Run analysis in ASReml (submit job).

• Check diagnostic plots and output.

• Extract results from output files (e.g. .asr, .sln, .yht).

• Review, revise, re-run fitted model.

• Report analysis.

ALFALFA EXPERIMENT

Source Variety Bk1 Bk2 Bk3 Bk4 Bk5 Bk6

1 A 2.17 1.88 1.62 2.34 1.58 1.66

1 B 1.58 1.26 1.22 1.59 1.25 0.94

1 C 2.29 1.60 1.67 1.91 1.39 1.12

1 D 2.23 2.01 1.82 2.10 1.66 1.10

2 E 2.33 2.01 1.70 1.78 1.42 1.35

2 F 1.38 1.30 1.85 1.09 1.13 1.06

2 G 1.86 1.70 1.81 1.54 1.67 0.88

2 H 2.27 1.81 2.01 1.40 1.31 1.06

3 I 1.75 1.95 2.13 1.78 1.31 1.30

3 J 1.52 1.47 1.80 1.37 1.01 1.31

3 K 1.55 1.61 1.82 1.56 1.23 1.13

3 L 1.56 1.72 1.99 1.55 1.51 1.33

An experiment was established to compare 12 alfalfa varieties (labeled A-L).

The varieties come from 3 different sources, but the objective is to estimate the heritability of varieties regardless of their source. A total of 6 plots per variety were established, arranged in an RCB design (one plot per variety in each of 6 blocks). The response variable is yield (tons/acre) at harvest time.

Example: /Day1/Alfalfa/ALFALFA.txt

ALFALFA EXPERIMENT

Consider a model with block as fixed and variety as random effects.

yield = µ + block + variety + error

$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$

where,

$y_{ij}$ observation in the ith block for the jth variety (treatment)

$\mu$ population mean

$\alpha_i$ fixed effect of the ith block

$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma^2_g$

$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$

i = 1, … , 6 (r blocks)

j = 1, … , 12 (t treatments)

ALFALFA EXPERIMENT

Source Variety Block Resp

1 A 1 2.17

1 B 1 1.58

1 C 1 2.29

1 D 1 2.23

2 E 1 2.33

2 F 1 1.38

2 G 1 1.86

2 H 1 2.27

3 I 1 1.75

3 J 1 1.52

3 K 1 1.55

3 L 1 1.56

...

3 J 6 1.31

3 K 6 1.13

3 L 6 1.33

Data file: /Day1/Alfalfa/ALFALFA.txt

ALFALFA EXPERIMENT

Alfalfa experiment - 12 varieties - Response Yield
 Source 3 !I # Not used
 Variety 12 !A !SORT
 Block 6 !I
 yield
ALFALFA.txt !SKIP 1
!DISPLAY 7 !SUMMARY
yield ~ mu Block !r Variety
predict Variety !SED !TDIFF !PLOT

Job file: /Day1/Alfalfa/Alfalfa.as

Some syntax

~ separates response from the list of fixed and random terms.

! Used for identification of option.

# Comment following (skips rest of line).

ALFALFA EXPERIMENT


ASReml 3.0 [01 Jan 2009] Alfalfa experiment - 12 varieties - Response Yield

Build gt [26 Nov 2010] 32 bit

28 Sep 2013 16:28:33.369 32 Mbyte Windows Alfalfa

Licensed to: UFL 31-dec-2013

***********************************************************

* Contact [email protected] for licensing and support *

***************************************************** ARG *

Folder: C:\WORK\ASReml\ASReml_2013\Distribute_Instr\Day1\Alfalfa

Source 3 !I

Variety 12 !A !SORT

Block 6 !I

QUALIFIERS: !SKIP 1 !DISPLAY 7 !SUMMARY

Reading Alfalfa.txt FREE FORMAT skipping 1 lines

Univariate analysis of yield

Summary of 72 records retained of 72 read

Model term Size #miss #zero MinNon0 Mean MaxNon0 StndDevn

1 Source 3 0 0 1 2.0000 3

2 Variety 12 0 0 1 6.5000 12

3 Block 6 0 0 1 3.5000 6

4 yield Variate 0 0 0.8800 1.597 2.340 0.3584

5 mu 1

QUALIFIERS: predict Variety !SED !TDIFF !PLOT

Forming 19 equations: 7 dense.

Initial updates will be shrunk by factor 0.316

Notice: 1 singularities detected in design matrix.

1 LogL= 48.7345 S2= 0.61974E-01 66 df 0.1000 1.000

2 LogL= 50.0218 S2= 0.57316E-01 66 df 0.1705 1.000

3 LogL= 51.1506 S2= 0.52550E-01 66 df 0.2957 1.000

4 LogL= 51.6976 S2= 0.48748E-01 66 df 0.4902 1.000

5 LogL= 51.7366 S2= 0.47751E-01 66 df 0.5717 1.000

6 LogL= 51.7370 S2= 0.47654E-01 66 df 0.5808 1.000

7 LogL= 51.7370 S2= 0.47653E-01 66 df 0.5809 1.000

ALFALFA EXPERIMENT

Final parameter values 0.58087 1.0000

- - - Results from analysis of yield - - -

Approximate stratum variance decomposition

Stratum Degrees-Freedom Variance Component Coefficients

Variety 11.00 0.213732 6.0 1.0

Residual Variance 55.00 0.476526E-01 0.0 1.0

Source Model terms Gamma Component Comp/SE % C

Variety 12 12 0.580868 0.276798E-01 1.81 0 P

Variance 72 66 1.00000 0.476526E-01 5.24 0 P

Wald F statistics

Source of Variation NumDF DenDF F-inc P-inc

5 mu 1 11.0 858.95 <.001

3 Block 5 55.0 17.42 <.001

Notice: The DenDF values are calculated ignoring fixed/boundary/singular

variance parameters using algebraic derivatives.

Solution Standard Error T-value T-prev

3 Block

2 -0.180833 0.891185E-01 -2.03

3 -0.875000E-01 0.891185E-01 -0.98 1.05

4 -0.206667 0.891185E-01 -2.32 -1.34

5 -0.501667 0.891185E-01 -5.63 -3.31

6 -0.687500 0.891185E-01 -7.71 -2.09

5 mu

1 1.87417 0.792320E-01 23.65

2 Variety 12 effects fitted

SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section 1

0.87

Finished: 28 Sep 2013 16:28:34.002 LogL Converged

ALFALFA EXPERIMENT

[Figure: ASReml diagnostic plots for the alfalfa analysis: residuals vs fitted values (residuals -0.3828 to 0.4563; fitted values 0.957 to 2.090) and histogram of residuals (range -0.382836 to 0.456330, peak count 6)]

ALFALFA EXPERIMENT

Interpreting output

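Source of variation   NumDF   DenDF   Variance ratio   P-value
Block                     5      55            17.42   < 0.001

$g_j \sim N(0, \sigma^2_g)$, $\hat\sigma^2_g = 0.0277$

$e_{ij} \sim N(0, \sigma^2)$, $\hat\sigma^2 = 0.0477$

$H^2 = \hat\sigma^2_g / (\hat\sigma^2_g + \hat\sigma^2) = 0.0277/(0.0277 + 0.0477) = 0.367$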
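The heritability arithmetic above can be checked from the unrounded components printed in the .asr file. Below is a minimal Python sketch. The function name is illustrative and is not part of ASReml. Within ASReml itself, functions of variance components like this one are normally computed with a .pin file.

```python
# Sketch: plot-level broad-sense heritability from the REML components in the .asr file.
# Values are copied from the "Component" column of the alfalfa output.
sigma2_variety = 0.276798e-01   # Variety component (sigma^2_g)
sigma2_resid = 0.476526e-01     # residual variance (sigma^2)


def broad_sense_h2(sigma2_g: float, sigma2_e: float) -> float:
    """H2 = sigma^2_g / (sigma^2_g + sigma^2) on a single-plot basis."""
    return sigma2_g / (sigma2_g + sigma2_e)


H2 = broad_sense_h2(sigma2_variety, sigma2_resid)
print(f"H2 = {H2:.3f}")
```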


ALFALFA EXPERIMENT

Interpreting output

Alfalfa experiment - 12 varieties - Response Yield 19 Feb 2012 20:34:11

Alfalfa

Ecode is E for Estimable, * for Not Estimable

The predictions are obtained by averaging across the hypertable

calculated from model terms constructed solely from factors

in the averaging and classify sets.

Use !AVERAGE to move ignored factors into the averaging set.

---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----

Predicted values of yield

The SIMPLE averaging set: Block

Variety Predicted_Value Standard_Error Ecode

A 1.8130 0.0795 E

B 1.3714 0.0795 E

C 1.6485 0.0795 E

D 1.7702 0.0795 E

E 1.7275 0.0795 E

F 1.3675 0.0795 E

G 1.5812 0.0795 E

H 1.6330 0.0795 E

I 1.6796 0.0795 E

J 1.4542 0.0795 E

K 1.5086 0.0795 E

L 1.6071 0.0795 E
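This design is balanced, with every variety appearing once in each of the r = 6 fixed blocks. Under that condition, the predicted values in the .pvs file reduce to a simple formula: the grand mean plus each variety's deviation, shrunk by the factor σ²g/(σ²g + σ²/r). The Python sketch below relies on that balance assumption. It does not carry over to unbalanced data, where the mixed model equations have to be solved, and it does not reproduce the standard errors.

```python
from statistics import mean

# Yield (tons/acre) of each variety in blocks 1..6, copied from the ALFALFA table.
yields = {
    "A": [2.17, 1.88, 1.62, 2.34, 1.58, 1.66],
    "B": [1.58, 1.26, 1.22, 1.59, 1.25, 0.94],
    "C": [2.29, 1.60, 1.67, 1.91, 1.39, 1.12],
    "D": [2.23, 2.01, 1.82, 2.10, 1.66, 1.10],
    "E": [2.33, 2.01, 1.70, 1.78, 1.42, 1.35],
    "F": [1.38, 1.30, 1.85, 1.09, 1.13, 1.06],
    "G": [1.86, 1.70, 1.81, 1.54, 1.67, 0.88],
    "H": [2.27, 1.81, 2.01, 1.40, 1.31, 1.06],
    "I": [1.75, 1.95, 2.13, 1.78, 1.31, 1.30],
    "J": [1.52, 1.47, 1.80, 1.37, 1.01, 1.31],
    "K": [1.55, 1.61, 1.82, 1.56, 1.23, 1.13],
    "L": [1.56, 1.72, 1.99, 1.55, 1.51, 1.33],
}
sigma2_g = 0.276798e-01  # Variety component from the .asr file
sigma2_e = 0.476526e-01  # residual variance from the .asr file
r = 6                    # blocks, i.e. plots per variety

grand_mean = mean(v for obs in yields.values() for v in obs)
# BLUP shrinkage of a variety mean based on r plots (balanced data only)
shrinkage = sigma2_g / (sigma2_g + sigma2_e / r)

predictions = {
    variety: grand_mean + shrinkage * (mean(obs) - grand_mean)
    for variety, obs in yields.items()
}
for variety, value in predictions.items():
    print(f"{variety} {value:.4f}")
```

Varieties with extreme raw means are pulled toward the grand mean more than those near it. How strongly they are pulled depends only on the ratio of σ²/r to σ²g.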

ASReml FILES

.apj Project file created with ASReml-W

.as Model and job specifications

.ass Summary statistics for variables from data set

.asr Report output of analysis and summary job

.aov Details of ANOVA table calculations

.sln Solutions of fixed and random effects

.pvs Report predictions and their standard errors

.res Residual statistics and basic residual plots

.ps Graphic files in PS format

.vvp Matrix of variance of variance components

.yht Residuals, predicted and hat values

.pin Calculations of functions of variance components

.pvc Report calculations of functions of variance components

Session 2

Introduction to

Linear Mixed Models


MIXED MODELS

• Mixed models extend the linear model by allowing a more flexible specification of the errors (and of other random factors). Hence, they support a different type of inference and can incorporate correlations and heterogeneous variances among the observations.

• Fixed effects: factors whose levels are selected by a nonrandom process or whose levels consist of the entire population of possible levels. Inferences are made only about the levels included in the study. Hint: all levels of interest are in your data set.

• Random effects: factors whose levels consist of a random sample of levels from a population of possible levels. The inference is about the population of levels, not just the subset of levels included in the study.

• Mixed linear models contain both random and fixed effects.


MODEL FOR A RCBD

Dataset: two factors must be considered: one defining the block to which each experimental unit is allocated, and the other defining the treatment applied to each unit.

$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$

where,

$y_{ij}$ observation in the ith block for the jth treatment, i = 1 … r, j = 1 … t

$\mu$ is the population mean

$\alpha_i$ fixed effect of the ith block

$g_j$ random effect of the jth variety, $g_j \sim N(0, \sigma^2_g)$

$e_{ij}$ random error, $e_{ij} \sim N(0, \sigma^2)$

Structural component (or blocking structure)

• Concerned with the underlying variability (heterogeneity) and structure of the experimental or measurement units.

• “Controls” different sources of natural variation among the units using factors (e.g. blocks) or variates (e.g. covariates).

Explanatory component (or treatment structure)

• Defines the different treatments (or treatment combinations) applied to the experimental units.

• Provides information about the differences in response caused by the different treatments and answers the questions of interest.

Multi-stratum ANOVA: makes explicit the separation between blocks (or the more general structure of units) and treatments.

MODEL COMPONENTS

response = systematic component + random component

response = structural component + explanatory component + random component

MIXED MODELS

Hypothesis of interest

Fixed effects (i.e. is there a significant treatment effect?):

H0: µ1 = µ2 = … = µt

H1: µi ≠ µj for some i, j in the set 1 … t

Test statistic: F or t

Random effects (i.e. is there significant variation due to the random effects?):

H0: $\sigma^2_g = 0$

H1: $\sigma^2_g > 0$

Test statistic: Chi-square (likelihood ratio test)


ALFALFA EXPERIMENT

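In matrix notation the model is $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}$. With the observations ordered by block and, within each block, by variety:

$$
\begin{bmatrix} y_{11} \\ \vdots \\ y_{1t} \\ \vdots \\ y_{r1} \\ \vdots \\ y_{rt} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \vdots \\ \alpha_r \end{bmatrix}
+
\begin{bmatrix}
1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & 1 \\
\vdots & & \vdots \\
1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} g_1 \\ \vdots \\ g_t \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ \vdots \\ e_{1t} \\ \vdots \\ e_{r1} \\ \vdots \\ e_{rt} \end{bmatrix}
$$

$$\mathbf{G} = \mathrm{diag}(\sigma^2_g, \ldots, \sigma^2_g), \qquad \mathbf{R} = \mathrm{diag}(\sigma^2, \ldots, \sigma^2)$$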

LINEAR MIXED MODEL

X (n x r) design matrix for fixed effects

β (r x 1) vector of fixed effects

Z (n x t) design matrix for random effects

g (t x 1) vector of random effects

e (n x 1) vector of random errors

G (t x t) matrix of variance-covariance of random effects

R (n x n) matrix of variance-covariance of random errors

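$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}, \qquad E\begin{bmatrix}\mathbf{g}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix}, \qquad \mathrm{Var}\begin{bmatrix}\mathbf{g}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{G} & \mathbf{0}\\ \mathbf{0} & \mathbf{R}\end{bmatrix}$$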

ALFALFA EXPERIMENT

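$$\mathbf{g} = \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_t \end{bmatrix}, \qquad \mathbf{G} = \begin{bmatrix}\sigma^2_g & 0 & \cdots & 0\\ 0 & \sigma^2_g & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma^2_g\end{bmatrix} = \sigma^2_g\mathbf{I}_t$$

$$\mathbf{e} = \begin{bmatrix} e_{11} \\ e_{12} \\ \vdots \\ e_{rt} \end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix}\sigma^2 & 0 & \cdots & 0\\ 0 & \sigma^2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma^2\end{bmatrix} = \sigma^2\mathbf{I}_{rt}$$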

Assumptions

• Random effects: E(g) = 0, V(g) = G = G(θ)

• Deviations: E(e) = 0, V(e) = R = R(θ)

• g and e independent.

hence, E(y) = Xβ

Var(y) = V = V(θ) = V(y) = ZGZ’ + R

Note: normality assumptions can be made about g and e.

LINEAR MIXED MODEL

g ~ MVN(0, G) and e ~ MVN(0, R)


• Variance components need to be estimated before obtaining estimates of

fixed/random effects and performing any type of inference.

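• Restricted/residual maximum likelihood (REML) is a likelihood-based method used to estimate these variance components. It assumes that both g and e follow multivariate normal distributions.

• The REML variance component estimates are then used to obtain the solutions for the fixed and random effects.

• Henderson (1950) derived the Mixed Model Equations (MME) to obtain the solutions of all effects simultaneously.

With $\mathbf{G} = \mathbf{G}(\boldsymbol\theta)$, $\mathbf{R} = \mathbf{R}(\boldsymbol\theta)$ and $\mathbf{V} = \mathbf{V}(\boldsymbol\theta) = \mathrm{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$, the REML estimate $\hat{\boldsymbol\theta}$ gives $\hat{\mathbf{G}}$, $\hat{\mathbf{R}}$ and $\hat{\mathbf{V}}$, and then

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}$$

$$\hat{\mathbf{g}} = \hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$$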

BLUE → EBLUE

BLUP → EBLUP

VARIANCE STRUCTURES

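ID: identity

$$\sigma^2\mathbf{I} = \begin{bmatrix}\sigma^2 & 0 & 0 & 0\\ 0 & \sigma^2 & 0 & 0\\ 0 & 0 & \sigma^2 & 0\\ 0 & 0 & 0 & \sigma^2\end{bmatrix}$$

DIAG: diagonal

$$\begin{bmatrix}\sigma^2_1 & 0 & 0 & 0\\ 0 & \sigma^2_2 & 0 & 0\\ 0 & 0 & \sigma^2_3 & 0\\ 0 & 0 & 0 & \sigma^2_4\end{bmatrix}$$

CORUV: uniform correlation

$$\sigma^2\begin{bmatrix}1 & \rho & \rho & \rho\\ \rho & 1 & \rho & \rho\\ \rho & \rho & 1 & \rho\\ \rho & \rho & \rho & 1\end{bmatrix}$$

AR1V: autocorrelation 1st order

$$\sigma^2\begin{bmatrix}1 & \rho & \rho^2 & \rho^3\\ \rho & 1 & \rho & \rho^2\\ \rho^2 & \rho & 1 & \rho\\ \rho^3 & \rho^2 & \rho & 1\end{bmatrix}$$

US: unstructured

$$\begin{bmatrix}\sigma^2_1 & \sigma_{12} & \sigma_{13} & \sigma_{14}\\ \sigma_{12} & \sigma^2_2 & \sigma_{23} & \sigma_{24}\\ \sigma_{13} & \sigma_{23} & \sigma^2_3 & \sigma_{34}\\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma^2_4\end{bmatrix}$$

CORUH: uniform heterogeneous

$$\begin{bmatrix}\sigma^2_1 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 & \rho\sigma_1\sigma_4\\ \rho\sigma_2\sigma_1 & \sigma^2_2 & \rho\sigma_2\sigma_3 & \rho\sigma_2\sigma_4\\ \rho\sigma_3\sigma_1 & \rho\sigma_3\sigma_2 & \sigma^2_3 & \rho\sigma_3\sigma_4\\ \rho\sigma_4\sigma_1 & \rho\sigma_4\sigma_2 & \rho\sigma_4\sigma_3 & \sigma^2_4\end{bmatrix}$$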

CORRELATION STRUCTURES

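CORU: uniform correlation

$$\begin{bmatrix}1 & \rho & \rho & \rho\\ \rho & 1 & \rho & \rho\\ \rho & \rho & 1 & \rho\\ \rho & \rho & \rho & 1\end{bmatrix}$$

CORG: general correlation

$$\begin{bmatrix}1 & \rho_{12} & \rho_{13} & \rho_{14}\\ \rho_{12} & 1 & \rho_{23} & \rho_{24}\\ \rho_{13} & \rho_{23} & 1 & \rho_{34}\\ \rho_{14} & \rho_{24} & \rho_{34} & 1\end{bmatrix}$$

AR1: autocorrelation 1st order

$$\begin{bmatrix}1 & \rho & \rho^2 & \rho^3\\ \rho & 1 & \rho & \rho^2\\ \rho^2 & \rho & 1 & \rho\\ \rho^3 & \rho^2 & \rho & 1\end{bmatrix}$$

CORUB: banded correlation

$$\begin{bmatrix}1 & \rho_1 & \rho_2 & \rho_3\\ \rho_1 & 1 & \rho_1 & \rho_2\\ \rho_2 & \rho_1 & 1 & \rho_1\\ \rho_3 & \rho_2 & \rho_1 & 1\end{bmatrix}$$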
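To make these patterns concrete, here is a minimal Python sketch that builds some of the matrices above from their parameters. The function names echo the ASReml labels only for readability. They are hypothetical helpers, not ASReml functions. Calling ar1v with σ² = 1 gives the AR1 correlation matrix.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def ar1v(sigma2: float, rho: float, n: int) -> Matrix:
    """AR1V-style matrix: sigma^2 * rho^|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def coruv(sigma2: float, rho: float, n: int) -> Matrix:
    """CORUV-style matrix: sigma^2 on the diagonal, sigma^2 * rho off the diagonal."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(n)] for i in range(n)]


def coruh(sigmas: Sequence[float], rho: float) -> Matrix:
    """CORUH-style matrix: sigma_i^2 on the diagonal, rho * sigma_i * sigma_j elsewhere."""
    n = len(sigmas)
    return [[sigmas[i] ** 2 if i == j else rho * sigmas[i] * sigmas[j]
             for j in range(n)] for i in range(n)]


def banded_corr(rhos: Sequence[float], n: int) -> Matrix:
    """Banded correlation: 1 on the diagonal and rho_k on the k-th off-diagonal."""
    if len(rhos) < n - 1:
        raise ValueError("need one correlation per off-diagonal band")
    return [[1.0 if i == j else rhos[abs(i - j) - 1] for j in range(n)] for i in range(n)]


for row in ar1v(sigma2=2.0, rho=0.5, n=4):
    print(row)
```

Counting the free parameters makes the trade-off visible. AR1V uses 2 parameters (σ², ρ), CORUH uses n + 1, and US uses n(n + 1)/2.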

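PROPERTIES OF EBLUE (optional)

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}$$

• $\mathrm{Var}(\hat{\boldsymbol\beta}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$

• $\mathrm{Var}(\mathbf{L}\hat{\boldsymbol\beta}) = \mathbf{L}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{L}'$

• $\mathbf{L}\hat{\boldsymbol\beta}$ is the best linear unbiased estimate of $\mathbf{L}\boldsymbol\beta$

• Test of H0: $\mathbf{L}\boldsymbol\beta = \mathbf{0}$:

$$F = \frac{\hat{\boldsymbol\beta}'\mathbf{L}'\left[\mathbf{L}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{L}'\right]^{-1}\mathbf{L}\hat{\boldsymbol\beta}}{r(\mathbf{L})}$$

approximately F distributed with df1 = r(L) and df2 obtained by Satterthwaite or Kenward-Roger.

• 100(1 − α)% confidence interval for $\mathbf{l}'\boldsymbol\beta$:

$$\mathbf{l}'\hat{\boldsymbol\beta} \pm z_{\alpha/2}\sqrt{\mathbf{l}'(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{l}}$$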

PROPERTIES OF EBLUP (optional)

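Mixed model equations:

$$\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{g}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$$

with solution

$$\begin{bmatrix}\hat{\boldsymbol\beta}\\ \hat{\mathbf{g}}\end{bmatrix} = \begin{bmatrix}\mathbf{C}_{xx} & \mathbf{C}_{xz}\\ \mathbf{C}_{zx} & \mathbf{C}_{zz}\end{bmatrix}\begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix}$$

where C is the inverse of the MME coefficient matrix, so that

$$\mathrm{Var}(\hat{\boldsymbol\beta}) = \mathbf{C}_{xx}, \qquad \mathrm{Var}(\hat{\mathbf{g}}) = \mathbf{G} - \mathbf{C}_{zz}, \qquad \mathrm{Var}(\hat{\mathbf{g}} - \mathbf{g}) = \mathbf{C}_{zz}$$

• Predictions: linear combinations of fixed and random effects:

$$\hat{P} = \mathbf{L}'\hat{\boldsymbol\beta} + \mathbf{M}'\hat{\mathbf{g}}, \qquad \mathrm{Var}(\hat{P}) = \mathbf{L}'\mathbf{C}_{xx}\mathbf{L} + \mathbf{M}'(\mathbf{G} - \mathbf{C}_{zz})\mathbf{M}$$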

PROPERTIES OF EBLUP (optional)

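PEV: prediction error variance

r²: reliability (squared correlation between true and predicted BV)

r: accuracy

SE(BLUP): standard error of a random effect

With $\hat c_{ii}$ the ith diagonal element of $\mathbf{C}_{zz}$ (the random-effects block of the inverse MME coefficient matrix):

$$\mathrm{SE}(\hat g_i) = \sqrt{\hat c_{ii}}$$

$$\mathrm{PEV}(\hat g_i) = \mathrm{Var}(\hat g_i - g_i) = \hat c_{ii} = (1 - r_i^2)\,\hat\sigma^2_g$$

$$r^2(g_i, \hat g_i) = 1 - \frac{\mathrm{PEV}}{\hat\sigma^2_g}$$

$$r(g_i, \hat g_i) = \sqrt{1 - \frac{\mathrm{PEV}}{\hat\sigma^2_g}}$$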
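Here is a small Python sketch of the reliability and accuracy formulas. It assumes that the PEV of a random effect equals the square of its reported standard error, as defined above. The numbers are hypothetical and not taken from any analysis in these notes.

```python
import math


def reliability(pev: float, sigma2_g: float) -> float:
    """r^2 = 1 - PEV / sigma^2_g."""
    if not 0.0 <= pev <= sigma2_g:
        raise ValueError("expected 0 <= PEV <= sigma^2_g")
    return 1.0 - pev / sigma2_g


def accuracy(pev: float, sigma2_g: float) -> float:
    """r = sqrt(r^2)."""
    return math.sqrt(reliability(pev, sigma2_g))


# Hypothetical values: SE(BLUP) = 0.3, so PEV = 0.3**2, with sigma^2_g = 0.25
pev = 0.3 ** 2
print(f"r2 = {reliability(pev, 0.25):.2f}, r = {accuracy(pev, 0.25):.2f}")
```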

TESTING VAR. COMPONENTS

LRT: likelihood ratio test

• Based on asymptotic derivations.

• Used to compare nested models and is valid if the fixed effects are the same

(under REML).

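• Test statistic: $d = 2[\log L_2 - \log L_1] \sim \chi^2_{r_2 - r_1}$, where model 2 (with $r_2$ variance parameters) contains model 1 (with $r_1$).

Hypothesis   P-value
Two-sided    $\mathrm{Prob}(\chi^2_{r_2 - r_1} > d)$
One-sided    $0.5\,(1 - \mathrm{Prob}(\chi^2_1 \le d))$

• Examples:

– H0: ρ = 0 against H1: ρ ≠ 0 (two-sided)

– H0: $\sigma^2_g = 0$ against H1: $\sigma^2_g > 0$ (one-sided, since $\sigma^2_g = 0$ lies on the boundary of the parameter space)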

TESTING VAR. COMPONENTS

Critical values

Δdf         α = 0.05               α = 0.01
(r2 − r1)   Two-sided  One-sided   Two-sided  One-sided
1           3.84       2.71        6.63       5.41
2           5.99       4.61        9.21       7.82
3           7.81       6.25        11.34      9.84
4           9.49       7.78        13.28      11.67
5           11.07      9.24        15.09      13.39

Goodness-of-fit statistics

• AIC and BIC can be used to select/rank non-nested models (with the same fixed effects under REML); lower values are preferred.

AIC = −2×logL + 2×t

BIC = −2×logL + t×log(v)

t number of variance parameters in the model

v residual degrees of freedom, v = n − p
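The critical values and the information criteria can be reproduced with a short Python sketch that uses only the standard library. The χ² upper tail comes from the closed-form series for integer degrees of freedom. The one-sided option halves the p-value, which reproduces the one-sided column of the table above. The helper names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability Prob(chi2_df > x) for a positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # odd df: erfc(sqrt(x/2)) plus a finite series in half-integer powers of x/2
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def lrt(logl_full: float, logl_reduced: float, ddf: int = 1, one_sided: bool = False):
    """Return (d, p) for d = 2 [logL_full - logL_reduced]."""
    d = 2.0 * (logl_full - logl_reduced)
    p = chi2_sf(d, ddf)
    return d, (0.5 * p if one_sided else p)


def aic(logl: float, t: int) -> float:
    return -2.0 * logl + 2.0 * t


def bic(logl: float, t: int, v: int) -> float:
    return -2.0 * logl + t * math.log(v)


print(round(chi2_sf(3.84, 1), 3), round(chi2_sf(5.99, 2), 3))
```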

ALFALFA EXPERIMENT


Model with Variety

Variety 12 12 0.580868 0.276798E-01 1.81 0 P

Variance 72 66 1.00000 0.476526E-01 5.24 0 P

7 LogL= 51.7370 S2= 0.47653E-01 66 df 0.5809 1.000

Model without Variety

2 LogL= 44.8781 S2= 0.75332E-01 66 df 1.000

Variance 72 66 1.00000 0.753324E-01 5.74 0 P

Testing Variety

H0: $\sigma^2_g = 0$ against H1: $\sigma^2_g > 0$

d = 2 [51.737 – 44.878] = 13.72, Δdf = 1

One-sided critical value (α = 0.05) = 2.71; p-value < 0.001

Testing genetic variation

H0: $H^2 = 0$ against H1: $H^2 > 0$ (the same test, since $H^2 = 0$ exactly when $\sigma^2_g = 0$)
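The calculation above for the alfalfa data can be scripted directly from the two LogL values. Below is a minimal Python sketch. For 1 df, the χ² upper tail equals erfc(√(d/2)). The AIC comparison is valid here because both models contain the same fixed effects (mu Block).

```python
import math

logl_with = 51.7370     # "7 LogL= 51.7370", model with Variety
logl_without = 44.8781  # "2 LogL= 44.8781", model without Variety

d = 2.0 * (logl_with - logl_without)
# One-sided test (sigma^2_g = 0 lies on the boundary): p = 0.5 * Prob(chi2_1 > d),
# and for 1 df Prob(chi2_1 > d) = erfc(sqrt(d / 2)).
p_one_sided = 0.5 * math.erfc(math.sqrt(d / 2.0))

# AIC = -2 logL + 2t, with t = 2 (Variety + residual) and t = 1 (residual only)
aic_with = -2.0 * logl_with + 2 * 2
aic_without = -2.0 * logl_without + 2 * 1
print(f"d = {d:.2f}, p = {p_one_sided:.1e}, AIC: {aic_with:.2f} vs {aic_without:.2f}")
```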

Session 3

Job Structure in

ASReml


JOB FILE

STRUCTURE .as FILE

PART A: Data definition and reading of data set.

PART B: Definition of analysis (options, linear model, output).

[Job title]

[Data definition]

[Specification of file(s) to read]

[Options]

[Linear model: factors and variables]

[Linear model: variance structure(s)]

[Additional output]

[Restrictions on variance components]

Note: some options can be indicated in this file or they can be added in the

batch command line.

General Relevant File Syntax

~ separates response from the list of fixed and random terms.

! used for identification of option.

# comment following (skips rest of line).

, model specification continues on next line.

$ specifies a user-input option (argument) from the command line.

Basic Model Syntax Operators

. interaction (e.g. A.B, interaction of A and B).

/ forms nested factor expansion.

* forms crossed factor expansion.

+ treated as a space.

- excludes model term from model.

JOB FILE

READING / MANIPULATING DATA

Column variables (continuous or discrete)

• Indented by a single space.

• Case sensitive!

• Should follow the same order as the columns in the data file.

• Fewer than 16 characters (recommended).

• Should start with a letter.

• No spaces in field name.

Examples of name and type of variables

yield yield is a continuous variable.

treatment * treatment is a simple coded factor (coded 1, 2, ...).

Variety 12 !A Variety is an alphabetically coded factor.

dose 4 !I dose is a numerically coded factor (any number).

sex 2 !I !L m f assigns labels to numerical values.

mother [n] !P mother is linked to a pedigree structure.

• ASCII file (delimited by tab, comma or space).

• “NA”, “*” and “.” identify missing values.

• The first s lines can be skipped by using !SKIP s

• Labels are stored in the order in which they are read.

Some manipulations/transformations

!FILTER f uses variable f to filter records

!SELECT v keeps observations equal to v in the variable f given by !FILTER (above)

!=v creates/overwrites a variable with all values equal to v

!+o adds the number o to the variable

!-o subtracts the number o from the variable

!*o multiplies the variable by the number o

!/o divides the variable by the number o

!^p raises the variable to the power p

!^0 calculates the natural logarithm of the variable

!D v deletes records with missing values or with value v

!M v converts values equal to v into missing values

!REPLACE o n replaces data values o with n

Examples

Yield !*100 variable Yield is multiplied by 100 as it is read

Yield !M-9 observations equal to -9 are changed to missing

Yield !^0 calculates the natural log of variable Yield

Ymean !=0 !+Y1 !+Y2 !/2 mean of two variables
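The Ymean example only works if the qualifiers on a line act one after another: first set the variable to 0, then add Y1, add Y2, and finally halve. The Python sketch below is a rough analogue of that sequential logic and is not ASReml code. The function, the (operator, operand) tuple encoding, and the use of None for missing values are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[float, str]   # a number, or the name of another column in the record


def apply_transforms(value: Optional[float], ops: List[Tuple[str, Operand]],
                     record: Dict[str, Optional[float]]) -> Optional[float]:
    """Apply (operator, operand) pairs left to right; None stands for a missing value."""
    for op, arg in ops:
        operand = record.get(arg) if isinstance(arg, str) else arg
        if op == "M":                      # like !M v: values equal to v become missing
            value = None if value == operand else value
        elif op == "=":                    # like !=v: overwrite with v
            value = operand
        elif value is None or operand is None:
            value = None                   # arithmetic on a missing value stays missing
        elif op == "+":
            value = value + operand
        elif op == "-":
            value = value - operand
        elif op == "*":
            value = value * operand
        elif op == "/":
            value = value / operand
        elif op == "^":                    # like !^p, with !^0 meaning natural log
            value = math.log(value) if operand == 0 else value ** operand
        else:
            raise ValueError(f"unsupported operator {op!r}")
    return value


# Analogue of: Ymean !=0 !+Y1 !+Y2 !/2
record = {"Y1": 4.0, "Y2": 6.0}
print(apply_transforms(None, [("=", 0), ("+", "Y1"), ("+", "Y2"), ("/", 2)], record))
```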

Relevant Options

!SUMMARY provides a histogram, correlations, counts, etc. (see file .ass)

!OUTLIER performs additional outlier checks (see files .res and .yht)

!X x !Y y produces a scatter plot of variables x and y

!SORT re-orders labels in alphabetical order

!MVINCLUDE missing values in a factor or variate are treated as zeros.

!WORKSPACE m assigns m Mbytes of memory for fitting the model

!EXTRA n forces n additional iterations after the model converges

!MAXIT m sets a maximum of m iterations

!DOPART $A selects which part of the job file is run (here taken from argument $A)

!PART n marks a specific model n within a job file (several parts may be listed)

!CONTINUE re-starts fitting of model from last iteration


Relevant Options

!DISPLAY n selects type(s) of diagnostic plot

!NODISPLAY suppresses diagnostic plot output

!PS saves plots in ps format

!EPS saves plots in eps format

!PNG saves plots in png format


!WMF saves plots in wmf format

!BMP saves plots in bmp format

Coding !DISPLAY n

1 = variogram

2 = histogram

4 = row and column trends

8 = perspective plot of residuals

e.g. 1 + 8 = 9 !DISPLAY 9 (default)

GRAPHICAL OUTPUT
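Because a !DISPLAY code is the sum of the values listed above, each requested plot type can be recovered bit by bit. For example, the alfalfa job used !DISPLAY 7 (1 + 2 + 4). The Python sketch below decodes such a code. The mapping simply restates the coding above, and the function is a hypothetical helper, not part of ASReml.

```python
DISPLAY_CODES = {
    1: "variogram",
    2: "histogram",
    4: "row and column trends",
    8: "perspective plot of residuals",
}


def decode_display(code: int) -> list:
    """Return the plot types requested by a !DISPLAY code (sum of 1, 2, 4, 8)."""
    if not 0 <= code <= 15:
        raise ValueError("code must be between 0 and 15")
    return [name for bit, name in DISPLAY_CODES.items() if code & bit]


print(decode_display(7))   # code used in the alfalfa job
print(decode_display(9))
```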

Specification of Linear Models

Univariate case

y ~ <fixed dense> !r <random sparse> !f <fixed sparse>

mu the constant term or intercept (overall mean)

!r random effects to follow

!f sparse fixed effects to follow (not in ANOVA table)

mv term to estimate missing values (as fixed effects)

Examples

yield ~ mu Variety !r Block

Volume ~ mu Site Site.Block !r Mother Mother.Site !f mv

JOB FILE

Specification of Linear Models

• ASReml uses the Wilkinson and Rogers (1973) notation.

A.B indicates crossed factors

A*B = A + B + A.B SAS: A + B + A*B

A/B = A + A.B SAS: A + B(A)

• Note that the model term A.B denotes interaction or nested effects

depending on which other terms are previously included in the model.

Examples

Volume ~ mu Site !r Genotype Site.Genotype

Volume ~ mu Site !r Site.Genotype

Yield ~ mu A.B !r Block

JOB FILE
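The * and / operators are shorthand for lists of terms. Below is an illustrative Python sketch of the expansion rules listed above (A*B = A + B + A.B, A/B = A + A.B), extended to more than two factors. ASReml performs this expansion itself when it reads the model line; these helpers are hypothetical.

```python
from itertools import combinations
from typing import List


def expand_crossed(*factors: str) -> List[str]:
    """A*B*... -> every main effect and interaction, joined with '.'."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append(".".join(combo))
    return terms


def expand_nested(*factors: str) -> List[str]:
    """A/B/... -> A, A.B, A.B.C, ... (each factor nested within those before it)."""
    return [".".join(factors[: k + 1]) for k in range(len(factors))]


print(" ".join(expand_crossed("A", "B")))        # terms generated by A*B
print(" ".join(expand_nested("Site", "Block")))  # terms generated by Site/Block
```

For example, Site/Block yields Site Site.Block, which are the two terms written out explicitly in the Volume example.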

Model functions

(to be used after a specified column, or to create new model variables).

and(t) overlays the design matrix of a model term onto an existing one

at(f,n) creates a binary variable for level n of factor f

fac(v) forms a factor from the values of a continuous variable

lin(f) transforms the factor f into a covariate

uni(v) creates a factor with a level for every record in the data file

fav(v,y) forms a factor with the levels of a combination of 2 factors

ide(f) fits an additional factor without its genetic relationship matrix

inv(v) calculates the inverse of variable v

log(v) calculates the natural logarithm of v

pow(y,p) raises the variable y to the power p

sqrt(v) calculates the square root of v

spl(v,n) fits a spline for variable v with n knots

pol(y,n) forms a set of orthogonal polynomials of order n

JOB FILE

Some options in the variance components

!GP restricts to the positive parameter space

!GU unrestricted

!GF fixed at a given supplied value (e.g. starting value)

!VCC c specifies the number of constraints on variance parameters

Example

Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU

JOB FILE

RUNNING ASReml (BATCH MODE)

>asreml -<options> <filename> <arguments>

<options> single letters that identify output or job options.

<filename> file “.as” with job details.

<arguments> allows for specific user-defined arguments.

Some options

-c re-start iteration from latest one (continue)

-p calculation of a function of variance components (.pin)

-sm assigns different memory space to job (usually 4, 5, 6, 7 or 8)

-rn renames the file with the argument n (default n = 1)

-n suppresses interactive graphics

Examples

>asreml -rs3c Alfalfa 1

RUNNING ASReml (JOB FILE MODE)

Add commands/arguments in the first line of job file.

Equivalent to using batch mode but useful within ASReml-W

Some options

!RENAME renames the file with the arguments

!ARGS n specifies the arguments (can be more than one)

!NOGRAPHS suppresses interactive graphics

!WORKSPACE w sets workspace to w Mbytes (e.g. 1600)

!CONTINUE re-start iteration from the latest one

Example

!RENAME !ARGS 1 2 !WORKSPACE 1600

Session 4

Breeding Theory


Discrete variation

• Different phenotypic classes are easily distinguished among genotypes

• Few genes with large effect (i.e. major genes).

xi ~ Bin(n, p)

Quantitative variation

• No clear classes between genotypes. Corresponds to most economically

important traits in animal and plant breeding.

• Due to the effect of many genes that contribute to the phenotypic variation, each with a small additive effect, plus some environmental variation (infinitesimal model, Fisher 1918).

Probability distribution gi ~ N(0, σ2)

GENETIC VARIATION

p = μ + g + e

• Phenotypic value (p) deviates from the mean (μ) because of the genotypic component (g) and the environmental deviation (e).

• To isolate g we need to test the progeny!!!

g = a + d + i

p = μ + a + d + i + e

a is the additive component, i.e. cumulative effect of the genes or breeding

value (also known as GCA).

d is the dominance deviation, i.e. interaction between alleles or within-locus

interaction (also known as SCA).

i is the epistatic deviation, i.e. between-loci interaction and higher order

interactions.

e is the random deviation or residual.

PHENOTYPIC VALUE

• Partition of the variance is central to quantitative genetics and breeding, because it is the way we quantify the relative importance of genetic and environmental influences (e.g. heritability).

• Partition is possible with data where the resemblance among relatives can be

used to estimate genetic variance components.

Vp = Vg + Ve

Vp = Va + Vna + Ve

where, Vna = Vd + Vi is the non-additive variance.

• In the statistical analysis (MM) the genetic variance estimates (e.g. Va) are

obtained by relating them to the causal component (e.g. σa2)

VARIANCE COMPONENTS

Broad sense heritability or degree of genetic determination

H2 = Vg / Vp How much of the total variation is due to genetic

causes (g). Important when working with clonally

replicated individuals.

Narrow sense heritability

h2 = Va / Vp Extent to which phenotypes are determined by the

genes transmitted from parents. Determines the degree

of resemblance among relatives. The most important

measure for breeding programs.

Heritabilities vary from 0 to 1 (e.g. 0.5 could be considered high).

Other definitions: family, plot-mean heritabilities and clonal repeatability

HERITABILITY

Definition

• The average effects of the alleles an individual passes to its offspring determine the mean genotypic value of those offspring, or

• The genetic value of an individual (or cross) judged by the mean value of its progeny.

- Sum of average effects across loci (theoretical, now molecular).

- Mean value of offspring (practical).

• Not equivalent concepts if interaction between loci is present or if mating is

not at random.

Estimation

• By BLUP (Best Linear Unbiased Predictor), i.e. the prediction of the

random effects from linear mixed models.

BREEDING VALUE (BLUP)

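BLUP (or EBLUP)

$$\hat{\mathbf{g}} = \hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) = \mathbf{C}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$$

where

$\hat{\mathbf{g}}$ vector of random effect predictions.

$\mathbf{C}' = \hat{\mathbf{G}}\mathbf{Z}'$ covariance matrix between observations and random (genetic) effects to be predicted.

$\hat{\mathbf{V}}$ variance-covariance matrix for the observations.

$(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$ individual observations ‘corrected’ by fixed effects.

For an individual with a single phenotypic record (mass selection) this reduces to

$$\hat{a}_i = \frac{\hat\sigma^2_a}{\hat\sigma^2_p}(y_i - \bar{y}) = \hat{h}^2(y_i - \bar{y})$$

Note: the expression changes depending on what trait is being evaluated (y).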

• All kinds of selection aim to increase the frequency of favourable alleles at loci influencing the selected trait(s).

• Types: mass, parental, family, combined, indirect, forward, backward.

SELECTION

[Diagram: base population, selected population and propagation population, annotated “Increase genetic gain” and “Increase diversity”]

Example

Assuming normal distribution, truncated selection and h2 = 0.4

S = μselected – μpopulation = 35 – 25 =10 cm

SELECTION DIFFERENTIAL (S)

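[Figure: phenotypic distributions with means 25 cm (population), 35 cm (selected parents) and 29 cm (offspring), where 29 cm = 25 cm + h²S = 25 + 0.4 × 10]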

GENETIC GAIN (GA)

• In mass selection, genetic gain can be quantified as the difference between the average breeding (i.e. additive) values of the selected and the original population:

$$G_a = \bar{a}_S - \bar{a}_P = h^2 S$$

But then

$$i = S / \sigma_p \quad\Rightarrow\quad G_a = h^2 S = i\,h^2\sigma_p$$

• Genetic gain depends on the selection intensity (i), the heritability (h²) and the phenotypic standard deviation ($\sigma_p$).

• Here i is the selection differential (S = μselected − μpopulation) expressed in phenotypic standard deviation units.
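The gain formulas can be evaluated with a minimal Python sketch. The sketch also computes the selection intensity for truncation selection on a normally distributed trait, using i = z/p. Here p is the proportion selected and z is the standard normal density at the truncation point. This is a standard result under the normal, truncation-selection assumptions of the example above. The function names are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """i = z / p for truncation selection on a normal trait."""
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion must be in (0, 1)")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - proportion_selected)  # truncation point
    return std_normal.pdf(threshold) / proportion_selected


def gain_from_differential(h2: float, s: float) -> float:
    """G_a = h^2 * S."""
    return h2 * s


def gain_from_intensity(i: float, h2: float, sigma_p: float) -> float:
    """G_a = i * h^2 * sigma_p."""
    return i * h2 * sigma_p


# Example from the slides: h2 = 0.4 and S = 35 - 25 = 10 cm (offspring mean 25 + gain)
print(gain_from_differential(0.4, 35 - 25))
print(round(selection_intensity(0.10), 3))  # intensity when the top 10% are selected
```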

Definition: Correlation between traits (pleiotropy)

• Property of genes that influence more than one phenotypic trait.

• It can be negative or positive (−1 to 1).

• Informs about the biological relationships among traits.

• Assists in the selection of ‘good’ individuals by looking at two traits simultaneously.

TYPE-A CORRELATIONS

$$r_{A(p)} = \frac{\mathrm{Cov}(p_1, p_2)}{\sqrt{\mathrm{Var}(p_1)\,\mathrm{Var}(p_2)}} \qquad r_{A(g)} = \frac{\mathrm{Cov}(g_1, g_2)}{\sqrt{\mathrm{Var}(g_1)\,\mathrm{Var}(g_2)}}$$

Indirect Selection

$$G_{a_2} = i\,h_1 h_2\, r_{A(g)}\,\sigma_{p_2}$$

(genetic gain in trait 2 when selection is applied to trait 1)
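Here is a short Python sketch of the type-A genetic correlation and the correlated (indirect) response. In the indirect-selection formula, h1 and h2 are the square roots of the two heritabilities, not the heritabilities themselves. All numbers and function names are hypothetical.

```python
import math


def genetic_correlation(cov_g12: float, var_g1: float, var_g2: float) -> float:
    """r_A(g) = Cov(g1, g2) / sqrt(Var(g1) * Var(g2))."""
    return cov_g12 / math.sqrt(var_g1 * var_g2)


def correlated_response(i: float, h1: float, h2: float, r_g: float, sigma_p2: float) -> float:
    """Gain in trait 2 when selecting on trait 1: i * h1 * h2 * r_A(g) * sigma_p2."""
    return i * h1 * h2 * r_g * sigma_p2


# Hypothetical components and parameters
r_g = genetic_correlation(cov_g12=0.6, var_g1=1.0, var_g2=0.64)
gain_2 = correlated_response(i=1.755, h1=math.sqrt(0.25), h2=math.sqrt(0.36),
                             r_g=r_g, sigma_p2=2.0)
print(f"r_A(g) = {r_g:.2f}, correlated response = {gain_2:.3f}")
```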

• It is a relative expression of genotype-by-environment interaction.

• It can be zero or positive (0 to 1).

• A value close to 0 indicates that the ranking in one environment is very different from the ranking in another environment (i.e. low stability).

• A value close to 1 indicates that a single ranking can be used across all environments without loss of information (i.e. high stability).

• Vgxs and Vaxs are the estimated variances of the site-by-genotype interaction for genotypic and additive effects, respectively.

• The following expressions represent the average correlation between sites (if more than 2 sites are analyzed).

$$r_{B(g)} = \frac{V_g}{V_g + V_{gxs}} \qquad r_{B(a)} = \frac{V_a}{V_a + V_{axs}}$$

TYPE-B CORRELATIONS

Definition: Correlation between sites
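A type-B correlation is simply a variance ratio. The Python sketch below takes either the pair (Vg, Vgxs) or the pair (Va, Vaxs). The component values are hypothetical, as is the function name.

```python
def type_b_correlation(v_main: float, v_interaction: float) -> float:
    """r_B = V / (V + V_xs); pass (Vg, Vgxs) or (Va, Vaxs)."""
    if v_main < 0 or v_interaction < 0:
        raise ValueError("variance components must be non-negative")
    if v_main + v_interaction == 0:
        raise ValueError("at least one component must be positive")
    return v_main / (v_main + v_interaction)


# Hypothetical additive and site-by-genotype components from a multi-site analysis
r_b_a = type_b_correlation(v_main=0.30, v_interaction=0.10)
print(f"r_B(a) = {r_b_a:.2f}")
```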

Session 5

Genetic Analyses:

Parental Models


Parental Models

• Half-sib crosses / sire model.

– One parent known. Parent selection.

• Full-sib crosses model.

– Both parents known. Parent/cross selection. Add and Dom effects estimable.

• Family model.

– Both parents known. Cross selection. Add and Dom effects confounded.

• Clonal model.

– Clonally replicated individuals. Parent/cross/individual selection.

Individual Models

• Animal model.

– One or two parents known. Individual/parent selection.

• Reduced animal model.

– One or two parents known. Individual/parent selection (only individuals with

records).

GENETIC MODELS

General aspects

• One parent is known (mother, sire, variety).

• The other parent is assumed to be unknown and to mate at random.

• Only additive component (Va) can be estimated.

• Useful for selection of parents (backward selection).

• Parental pedigree can (and should) be incorporated.

• Runs faster than other models (e.g. animal model).

Difficulties

• Concern about situations under non-random mating.

• Selection does not capture non-additive genetic variability.

HALF-SIB / SIRE MODEL

y vector of observations

β vector of fixed effects

b vector of random design effects (e.g. block or plot effect), ~ N(0, Iσ2b)

s vector of random sire effects (i.e. ½ breeding value), ~ N(0, Aσ2s)

e vector of random residual effects, ~ N(0, Iσ2)

X, Z1 and Z2 are incidence matrices

A is the numerator relationship matrix for sires. Replace by I if no pedigree.

I is an identity matrix

Va = 4 σ2s Vp = σ2

s + σ2

h2 = Va / Vp = 4 σ2s / [σ

2s + σ2]

HALF-SIB / SIRE MODEL

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{s} + \mathbf{e}$

A tree genetic study used seeds from a total of 28 female parents, collected from mass selection and tested in an RCBD together with 3 control female parents. The experiment consisted of 10 replicates with 34 plots each, every plot of size 2 x 3. The response variables of interest are total height (HT, cm) and diameter at breast height (DBH, cm). For now we will concentrate on the response HT. The objective is to rank the female parents for future selections and seed production. In this analysis the parental pedigree will be ignored. Note that a model can be fitted with and without the controls included as parents.

OPEN POLLINATION

Example: /Day1/OpenPol/OPENPOL.txt

ID REP PLOT FEMALE TYPE DBH HT

1 1 1 FEM1 Test 23.8 12.4

2 1 1 FEM1 Test 24.4 12.1

3 1 1 FEM1 Test 25.4 10.9

4 1 1 FEM1 Test 28.0 12.7

5 1 1 FEM1 Test 20.9 11.9

6 1 1 FEM1 Test 22.6 11.2

7 1 2 FEM15 Test 22.4 10.7

8 1 2 FEM15 Test 21.9 11.6

9 1 2 FEM15 Test 20.8 11.3

...

OPEN POLLINATION

!RENAME !ARGS 1

Open pollination trial

ID

REP 10 !I

PLOT 34 !I

FEMALE 31 !A !SORT

TYPE 2 !A !SORT

DBH

HT

OPENPOL.TXT !SKIP 1

!MAXIT 40 !DISPLAY 2 !DOPART $A

!PART 1

HT ~ mu REP !r FEMALE REP.PLOT

predict FEMALE

Example: /Day1/OpenPol/OpenPol_.as

OPEN POLLINATION

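Interpreting variance components

$f_i \sim N(0, \sigma^2_f) \qquad \hat{\sigma}^2_f = 0.196$

$p_{ij} \sim N(0, \sigma^2_p) \qquad \hat{\sigma}^2_p = 0.053$

$e_{ijk} \sim N(0, \sigma^2) \qquad \hat{\sigma}^2 = 1.020$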


FEMALE 31 31 0.192379 0.196155 3.48 0 P

REP.PLOT 340 340 0.518915E-01 0.529102E-01 2.58 0 P

Variance 1876 1866 1.00000 1.01963 27.74 0 P

$V_a = 4\hat{\sigma}^2_f = 4 \times 0.196155 = 0.785$

$V_p = \hat{\sigma}^2_f + \hat{\sigma}^2_p + \hat{\sigma}^2 = 0.196 + 0.053 + 1.020 = 1.269$

$h^2 = V_a / V_p = 0.785 / 1.269 = 0.619$

Extract the solutions (BLUPs) for every female parent from the .sln file and rank them.
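As a quick arithmetic check, the heritability calculation above can be sketched in Python. This is a minimal sketch, assuming the female (half-sib) variance equals Va/4 as in the formulas above; the function name `half_sib_heritability` is hypothetical, and the inputs are the unrounded components from the FEMALE, REP.PLOT and Variance lines of the .asr output.

```python
def half_sib_heritability(var_female, var_plot, var_error):
    """Return (Va, Vp, h2) for an open-pollinated half-sib model.

    Assumes each female effect is 1/2 of a breeding value, so the
    female variance component equals Va/4.
    """
    va = 4.0 * var_female                    # additive variance
    vp = var_female + var_plot + var_error   # phenotypic variance
    return va, vp, va / vp


# Unrounded components from the FEMALE, REP.PLOT and Variance lines
va, vp, h2 = half_sib_heritability(0.196155, 0.0529102, 1.01963)
print(f"Va = {va:.3f}  Vp = {vp:.3f}  h2 = {h2:.3f}")
```

With the unrounded components h2 comes out at about 0.618; the 0.619 above results from dividing the rounded values 0.785 / 1.269.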

FULL-SIB MODELS

General aspects

• Both parents are known (mother, father, family or cross).

• Mating is often planned (e.g. diallels).

• Additive and dominance component (Va and Vd) can be estimated.

• Some designs also allow estimation of common-environment effects, reciprocal effects, etc.

• Useful for selection of parents (backward selection) or specific crosses.

• Increased gain as dominance effects can be ‘captured’.

• Parental pedigree can be incorporated.

Difficulties

• Dominance effects usually estimated with low precision, or confounded with

other effects.

• Better results are obtained with proper planning of crosses (e.g. connected diallels).

• The connectivity and the number of crosses per parent (male and female) need to be checked; otherwise this model cannot be fitted.

β vector of fixed effects (e.g. μ, replicate)


m vector of random male effects (i.e. ½ BV), ~ N(0, Aσ2m)

f vector of random female effects (i.e. ½ BV), ~ N(0, Aσ2f)

mf vector of random interaction male by female effects, ~ N(0, Iσ2mf)


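$V_a = 2(\sigma^2_m + \sigma^2_f)$ or $V_a = 4\sigma^2_m$ (when $\sigma^2_m = \sigma^2_f$)

$V_d = 4\sigma^2_{mf}$

$V_p = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2$

$h^2 = \dfrac{V_a}{V_p} = \dfrac{2(\sigma^2_m + \sigma^2_f)}{\sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2}$

$d^2 = \dfrac{V_d}{V_p} = \dfrac{4\sigma^2_{mf}}{\sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2}$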

FULL-SIB: CLASSIC APPROACH

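$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{m} + \mathbf{Z}_3\mathbf{f} + \mathbf{Z}_4\mathbf{mf} + \mathbf{e}$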

Example: /Day1/ContPol/CONTPOL.txt

A total of 177 families and 8 check lots were planted in a test using an RCBD with 25 blocks. For all families planted, both parents are known. In this analysis the parental pedigree will be ignored. The objective is to estimate the different variance components and calculate heritabilities for the response variable YIELD.

FULL-SIB: CLASSIC

REP FAMILY FEMALE MALE YIELD

1 FAM007 PAR0001 PAR0024 128.68

1 FAM163 PAR0059 PAR0041 119.462

1 C10 C10 PAR0043 .

1 FAM040 PAR0020 PAR0053 103.641

1 FAM114 PAR0051 PAR0001 .

1 FAM053 PAR0032 PAR0032 .

1 FAM048 PAR0031 PAR0018 .

1 FAM057 PAR0033 PAR0035 155.226

1 FAM120 PAR0051 PAR0051 .

1 FAM165 PAR0059 PAR0059 193.982

1 FAM133 PAR0053 PAR0009 184.308

1 FAM057 PAR0035 PAR0033 .

1 C30 C30 PAR0043 141.912

1 FAM082 PAR0044 PAR0006 288.692

1 FAM060 PAR0034 PAR0037 .

1 FAM169 PAR0015 PAR0024 245.664

1 FAM047 PAR0031 PAR0016 .

...

FULL-SIB: CLASSIC

!RENAME !ARG 1

Control Crosses trial

REP 25 !I

FAMILY 182 !A

FEMALE 54 !A

MALE 57 !A

YIELD

ANALYSIS

FULLSIB

CHECKLOT

CONTPOLL2.TXT !SKIP 1

!MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A

!PART 1

YIELD ~ mu REP !r FEMALE MALE FEMALE.MALE

Example: /Day1/ContPol/ContPol_.as

FULL-SIB


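$f_i \sim N(0, \sigma^2_f) \qquad \hat{\sigma}^2_f = 295.9$

$m_j \sim N(0, \sigma^2_m) \qquad \hat{\sigma}^2_m = 315.6$

$F_{ij} \sim N(0, \sigma^2_F) \qquad \hat{\sigma}^2_F = 961.4$

$e_{ijk} \sim N(0, \sigma^2) \qquad \hat{\sigma}^2 = 3816.4$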

$V_a = 2(\hat{\sigma}^2_f + \hat{\sigma}^2_m) = 2(295.9 + 315.6) = 1223.0$

$V_d = 4\hat{\sigma}^2_F = 4 \times 961.4 = 3845.6$

$V_p = \hat{\sigma}^2_f + \hat{\sigma}^2_m + \hat{\sigma}^2_F + \hat{\sigma}^2 = 295.9 + 315.6 + 961.4 + 3816.4 = 5389.3$

$h^2 = V_a / V_p = 1223.0 / 5389.3 = 0.23$

$d^2 = V_d / V_p = 3845.6 / 5389.3 = 0.71$

Extract the solutions for every parent and family from the .sln file and rank them.
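The same arithmetic for the full-sib model, written as a small Python sketch. The function name is hypothetical; following the calculation above, the FAMILY line of the .asr output is used as the male-by-female component σ2mf, and the unrounded components are taken from the output lines.

```python
def full_sib_parameters(var_female, var_male, var_mf, var_error):
    """Genetic parameters for a full-sib model with female, male and
    male-by-female (family) random effects; parental pedigree ignored."""
    va = 2.0 * (var_female + var_male)  # each parent effect is 1/2 BV
    vd = 4.0 * var_mf                   # male-by-female (SCA) variance
    vp = var_female + var_male + var_mf + var_error
    return {"Va": va, "Vd": vd, "Vp": vp, "h2": va / vp, "d2": vd / vp}


# FEMALE, MALE, FAMILY and Variance components from the .asr output
params = full_sib_parameters(295.857, 315.590, 961.350, 3816.36)
for name, value in params.items():
    print(f"{name} = {value:.2f}")
```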


FEMALE 54 54 0.775232E-01 295.857 2.32 0 P

MALE 57 57 0.826941E-01 315.590 2.28 0 P

FAMILY 182 182 0.251902 961.350 6.07 0 P

Variance 3879 3854 1.00000 3816.36 42.73 0 P

FAMILY MODEL (Optional)

General aspects

• More common in animal breeding

• Occurs when parents are only present in a single cross.

• Parents might, or might not, be known.

• The additive and dominance components (Va and Vd) cannot be separated unless there is a well-connected parental pedigree.

• Useful for family selection or forward selection.

• Of practical use when dominance variance is known to be negligible.

Difficulties

• Dominance effects are confounded with additive effects.

• It can potentially overestimate future genetic gain.

β vector of fixed effects (e.g. μ, replication)


F vector of random family effects, ~ N(0, Aσ2F) or N(0, Iσ2F)

$\sigma^2_F = V_a/2 + V_d/4$

$V_p = \sigma^2_F + \sigma^2$

$h^2_{cross} = \dfrac{V_{family}}{V_p} = \dfrac{\sigma^2_F}{\sigma^2_F + \sigma^2}$

Va and Vd cannot be separated unless we assume that Vd = 0.

If Vd = 0, then $V_a = 2\sigma^2_F$ and

$h^2 = \dfrac{V_a}{V_p} = \dfrac{2\sigma^2_F}{\sigma^2_F + \sigma^2}$

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{F} + \mathbf{e}$


Example: /Day1/FamilyModel/FISHF.txt

A total of 459 fish were derived from single crosses (each parent used in only one cross) between 32 sires and 32 dams, generating 32 families. The number of individuals per family varied from 2 to 40. The idea is to rank the families and progeny for selection using the variable WEIGHT.

ID SireID DamID Family Weight

1001 120 125 22 88.3

1002 120 125 22 84.9

1003 120 125 22 76.8

1004 121 114 23 95.4

1005 121 114 23 85.4

1006 121 114 23 74.8

1007 121 114 23 103.4

1008 121 114 23 78.7

1009 121 114 23 109.5

1010 121 114 23 113.1

1011 121 114 23 95.4

1012 121 114 23 91.1

1013 121 114 23 85.4

1014 121 114 23 85.4

1015 121 114 23 86.0

...


!RENAME !ARGS 1

Family fish experiment

ID

SireID 32 !A

DamID 32 !A

Family 32 !A !SORT

Weight

Fish_Family.txt !SKIP 1

!MAXIT 40 !DOPART $A

!PART 1

Weight ~ mu !r Family

Example: /Day1/FamilyModel/FishF_.as



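$F_i \sim N(0, \sigma^2_F) \qquad \hat{\sigma}^2_F = 8.12$

$e_{ijk} \sim N(0, \sigma^2) \qquad \hat{\sigma}^2 = 105.65$

$V_{family} = \hat{\sigma}^2_F = 8.12$

$V_p = \hat{\sigma}^2_F + \hat{\sigma}^2 = 8.12 + 105.65 = 113.77$

$h^2_{cross} = V_{family} / V_p = 8.12 / 113.77 = 0.071$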

Extract the solutions for every family from the .sln file and rank them.
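A short sketch of the family-model calculation. The function name is hypothetical; the second heritability it returns is only meaningful under the assumption Vd = 0 stated above.

```python
def family_model_parameters(var_family, var_error):
    """Return (Vp, h2_cross, h2_if_vd_zero) for a family model."""
    vp = var_family + var_error
    h2_cross = var_family / vp
    # Only valid if dominance is assumed negligible (Vd = 0, Va = 2 * var_family)
    h2_if_vd_zero = 2.0 * var_family / vp
    return vp, h2_cross, h2_if_vd_zero


# Family and Variance components from the .asr output
vp, h2_cross, h2 = family_model_parameters(8.12079, 105.648)
print(f"Vp = {vp:.2f}  h2cross = {h2_cross:.3f}  h2 (Vd = 0) = {h2:.3f}")
```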


Family 32 32 0.768666E-01 8.12079 1.98 0 P

Variance 459 458 1.00000 105.648 14.71 0 P


CLONAL MODEL (Optional)

General aspects

• It can estimate the total genetic variance (Vg).

• If both parents are known (mother, father, family or cross) then the additive,

dominance and epistasis components (Va, Vd and Vi) can be reasonably

estimated.

• Useful for selection of parents (backward selection), crosses or specific

genotypes.

• Allows additive, dominance and epistatic effects to be captured in new generations.

Difficulties

• Presents same difficulties as full-sib models.

• Some confounding of the epistasis component occurs (higher order terms).

• Occasionally produces negative causal variance components.

β and b as defined before

m vector of random male effects, ~ N(0, Aσ2m)

f vector of random female effects, ~ N(0, Aσ2f)

mf vector of random interaction male by female effects, ~ N(0, Iσ2mf)

mf.c vector of random clonal within family effects, ~ N(0, Iσ2c)



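$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{m} + \mathbf{Z}_3\mathbf{f} + \mathbf{Z}_4\mathbf{mf} + \mathbf{Z}_5(\mathbf{mf.c}) + \mathbf{e}$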

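$V_a = 2(\sigma^2_m + \sigma^2_f)$ or $V_a = 4\sigma^2_m$ (when $\sigma^2_m = \sigma^2_f$)

$V_d = 4\sigma^2_{mf} \qquad V_i = \sigma^2_c - (\sigma^2_m + \sigma^2_f) - 3\sigma^2_{mf}$ (approx.)

$V_g = V_a + V_d + V_i$

$V_p = \sigma^2_m + \sigma^2_f + \sigma^2_{mf} + \sigma^2_c + \sigma^2$

$H^2 = V_g / V_p \qquad h^2 = V_a / V_p \qquad d^2 = V_d / V_p$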


Example: /Day1/Clonal/CLONES.txt

A clonal test derived from a total of 61 families crossed in a circular mating design was established in a field trial with 3 replicates and incomplete blocks. Each family has several clones. The objective of this study is to estimate all variance components (additive, dominance and epistasis).

IDSORT FamilyID Female Male cloneid Rep IncBlock Tree VOL

1 46 Par927 Par931 677 1 1 1 537.7436

2 33 Par908 Par914 476 1 1 2 492.1155

3 53 Par924 Par907 775 1 1 3 704.826

4 41 Par913 Par917 608 1 1 4 494.6012

6 27 Par923 Par905 391 1 2 1 622.0541

7 14 Par925 Par908 192 1 2 2 425.1107

8 22 Par913 Par923 304 1 2 3 298.8255

9 11 Par929 Par920 144 1 2 4 513.8072

11 23 Par901 Par924 320 1 3 1 457.7191

12 60 Par929 Par904 838 1 3 2 709.3598

15 12 Par917 Par921 162 1 3 5 *

16 53 Par924 Par907 763 1 4 1 392.4941

17 13 Par901 Par916 179 1 4 2 463.7218

19 24 Par915 Par904 340 1 4 4 445.3584

20 40 Par922 Par917 592 1 4 5 623.984

21 30 Par904 Par903 424 1 5 1 439.2273

...


!RENAME !ARGS 1

Clonal Analysis of Pinus

IDSORT

FAMILY 61 !A

FEMALE 44 !P

MALE 44 !P

CLONE 868 !A

REP 3 !A

IBLOCK 110 !A

TREE

VOL

PEDPAR.TXT !SKIP 1 !MAKE !ALPHA

CLONES.TXT !SKIP 1

!MAXIT 50 !DISPLAY 2 !DOPART $A

!PART 1

VOL ~ mu REP !r REP.IBLOCK FEMALE MALE FAMILY CLONE !f mv

!PART 2

VOL ~ mu REP !r REP.IBLOCK FEMALE and(MALE) FAMILY CLONE !f mv

Example: /Day1/Clonal/Clonal_.as


Same var. comp. for Male and Female (PART 2, FEMALE and(MALE)):

FAMILY 61 61 0.294966E-01 518.846 1.12 0 P

REP.IBLOCK 330 330 0.353801E-06 0.622336E-02 0.00 0 B

FEMALE 44 44 0.714393E-01 1256.62 2.51 0 P

CLONE 868 868 0.428337 7534.46 8.44 0 P

Variance 2604 1766 1.00000 17590.0 22.65 0 P

Different var. comp. for Male and Female (PART 1):

FEMALE 44 44 0.100569 1769.13 2.04 0 P

MALE 44 44 0.218970E-01 385.195 0.70 0 P

FAMILY 61 61 0.433857E-01 763.208 1.13 0 P

REP.IBLOCK 330 330 0.350074E-06 0.615822E-02 0.00 0 B

CLONE 868 868 0.427149 7514.07 8.42 0 P

Variance 2604 1766 1.00000 17591.2 22.65 0 P
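Applying the clonal-model formulas to the output block that fits separate FEMALE and MALE components can be sketched as below. This is an illustration only: the function name is hypothetical, the FAMILY and CLONE lines are taken as σ2mf and σ2c, REP.IBLOCK is left out of Vp as in the formula for Vp given earlier, and the epistatic term uses the approximate expression from that slide.

```python
def clonal_parameters(var_m, var_f, var_mf, var_c, var_error):
    """Causal components from a clonal model with male, female,
    male-by-female and clone-within-family random effects."""
    va = 2.0 * (var_m + var_f)
    vd = 4.0 * var_mf
    vi = var_c - (var_m + var_f) - 3.0 * var_mf   # approximate epistasis
    vg = va + vd + vi                             # total genetic variance
    vp = var_m + var_f + var_mf + var_c + var_error
    return {"Va": va, "Vd": vd, "Vi": vi, "Vg": vg, "Vp": vp,
            "H2": vg / vp, "h2": va / vp, "d2": vd / vp}


# MALE, FEMALE, FAMILY, CLONE and Variance components (separate male/female fit)
p = clonal_parameters(var_m=385.195, var_f=1769.13, var_mf=763.208,
                      var_c=7514.07, var_error=17591.2)
for name, value in p.items():
    print(f"{name} = {value:.3f}")
```

Note that Vg reduces algebraically to σ2m + σ2f + σ2mf + σ2c, so H2 does not depend on the approximation used for Vi; only the split of Vg into Va, Vd and Vi does.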


Session 6

Genetic Analyses:

Animal Models



• Why worry about the pedigree in genetic analyses?

Statistically, random genetic effects (i.e. BLUPs) are not independent and

their matrix of correlations or co-variances (G or A) needs to be specified.

Genetically, it is important to consider information about relatives as they

will share some alleles, and therefore their response is correlated.

• How to incorporate this information?

Genetic relationships can be calculated using genetic theory (expected

values) or molecular information (e.g. SNPs), and included into the linear

mixed model by specifying a pedigree file.

• Are there other benefits?

Many. Information about individuals is used more efficiently, and the genetic values of untested individuals that have tested relatives can also be predicted and selected.

INCORPORATING PEDIGREE

Example

Pedigree of a group of individuals:

PEDIGREE

Individual Male Female

3 1 2

4 1 Unknown

5 4 3

6 5 2

[Figure: pedigree diagram of individuals 1 to 6; the unknown parent of individual 4 is shown as ?]

Numerator relationship matrix (A)

• Linked to the concept of identity by descent.

• Diagonal aii = 1 + Fi (inbreeding coefficient on individual i)

Twice the probability that two gametes taken at random from animal i will

carry identical alleles by descent.

• Off-diagonal aij: numerator of the coefficient of relationship between animals i and j.

• Several algorithms are available in ASReml to obtain this matrix.

PEDIGREE

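$\mathbf{A} = \begin{pmatrix}
1.00 & 0.00 & 0.50 & 0.50 & 0.50 & 0.25 \\
0.00 & 1.00 & 0.50 & 0.00 & 0.25 & 0.625 \\
0.50 & 0.50 & 1.00 & 0.25 & 0.625 & 0.563 \\
0.50 & 0.00 & 0.25 & 1.00 & 0.625 & 0.313 \\
0.50 & 0.25 & 0.625 & 0.625 & 1.125 & 0.688 \\
0.25 & 0.625 & 0.563 & 0.313 & 0.688 & 1.125
\end{pmatrix}$ (rows and columns: individuals 1 to 6)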

CALCULATING THE A MATRIX

• Let A = {aij} be the relationship matrix.

• Let $a_{i,-j}$ be the i-th row of A except for the j-th element.

• Assume the relationship matrix for the base animals is known (e.g. unrelated, non-inbred). This forms a base matrix (e.g. identity).

• The row of the relationship matrix for the progeny i of two parents (sire s and dam d) is generated as the average of the relationship matrix rows of the parents:

$a_{i,-i} = (a_{s,-i} + a_{d,-i})/2$

• The diagonal element $a_{i,i}$ of this new individual is:

$a_{i,i} = 1 + a_{s,d}/2 = 1 + F_i$

where $F_i$ is the inbreeding coefficient.
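The recursive rules above (the tabular method) can be sketched in Python and checked against the A matrix of the six-animal example. This is a minimal illustration, not necessarily how ASReml computes the matrix; the function name is hypothetical, unknown parents are coded 0, and the pedigree must list parents before their progeny.

```python
def relationship_matrix(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (individual, sire, dam) tuples, parents listed
    before progeny; unknown parents coded as 0.
    """
    n = len(pedigree)
    index = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s, d = index.get(sire), index.get(dam)   # None if unknown
        # Off-diagonal: average of the parents' rows (earlier animals only)
        for j in range(i):
            a_sj = A[s][j] if s is not None else 0.0
            a_dj = A[d][j] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_sj + a_dj)
        # Diagonal: 1 + F_i, where F_i = a_sd / 2 if both parents are known
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + inbreeding
    return A


pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
for row in relationship_matrix(pedigree):
    print(" ".join(f"{x:.4f}" for x in row))
```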

PEDIGREE

Analysis Trial AB23

Indiv 6 !P

Sire 3 !A

Dam 2 !A

Sex 2 !I

weight

PEDIGREE.PED !SKIP 1

DATA.DAT !SKIP 1

weight ~ mu Sex !r Indiv

PEDIGREE FILE

Indiv Male Female

1 0 0

2 0 0

3 1 2

4 1 0

5 4 3

6 5 2

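[Figure: the pedigree above drawn graphically; the unknown female parent of individual 4, coded 0 in the ASReml pedigree file, is shown as ?]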

Some useful options

!MAKE always generates the A inverse (instead of using a stored one).

!ALPHA allows alphanumeric names for individuals.

!REPEAT ignores repeated individuals/entries in the pedigree file.

!GIV writes the A inverse matrix in ASCII format (.giv).

!INBRED generates a pedigree for inbred lines.

!SELF s allows for partial selfing according to variable s.

!GROUPS g includes genetic groups in the pedigree according to variable g.

In ASReml

• Pedigree file can be part of the data file

(first 3 columns: individual, parent1 and parent2).

• The method used to construct the A inverse is based on the algorithm of Meuwissen and Luo (1992).

• Genetic groups can be defined here.

PEDIGREE FILE


Construction / Check

• Pedigree information requires proper management and validation/checking of the data.

• Individuals need to be ordered by generation (e.g. parents need to be defined before their progeny).

• All parents need to be defined in the pedigree file (the inclusion of founder parents is optional).

• All individuals present in the dataset (i.e. levels associated with the pedigree file) need to be defined in the pedigree file.

• An individual can be defined as both a male and a female parent (but this should be checked where it is not biologically possible).
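The checks listed above are easy to automate before running ASReml. The following is a hypothetical helper (not part of ASReml), assuming the pedigree is a list of (individual, male parent, female parent) tuples with 0 for unknown parents, as in the pedigree file shown earlier.

```python
def check_pedigree(pedigree, data_ids, unknown=0):
    """Return a list of problems found in a pedigree (empty if none)."""
    problems = []
    defined = set()
    for ind, male, female in pedigree:
        if ind in defined:
            problems.append(f"{ind}: repeated entry")
        # Parents must appear earlier in the file than their progeny
        for parent in (male, female):
            if parent != unknown and parent not in defined:
                problems.append(f"{ind}: parent {parent} not defined before its progeny")
        defined.add(ind)
    # Every individual with records must be in the pedigree
    for ind in data_ids:
        if ind not in defined:
            problems.append(f"{ind}: present in the data but not in the pedigree")
    # Flag individuals used as both male and female parent (check the biology)
    males = {m for _, m, _ in pedigree if m != unknown}
    females = {f for _, _, f in pedigree if f != unknown}
    for ind in sorted(males & females):
        problems.append(f"{ind}: used as both male and female parent")
    return problems


good = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
print(check_pedigree(good, data_ids=[3, 4, 5, 6]))   # -> []
bad = [(3, 1, 2), (1, 0, 0)]
for problem in check_pedigree(bad, data_ids=[3, 7]):
    print(problem)
```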

ANIMAL / INDIVIDUAL MODEL

General aspects

• Requires defining individual and parental pedigree.

• A breeding value (or GCA) is obtained for each individual in the dataset, and for all individuals (e.g. parents) in the pedigree file.

• Typically used to estimate the additive component (Va) only, but it can be extended to non-additive and maternal effects.

• Useful for selection of individuals based on additive values (forward selection), but it can also be used to select parents.

• GCA values (or EBVs) of parents will be proportional to those from a parental model.

Difficulties

• For large datasets it can be computationally costly.

• The pedigree file can be difficult to construct and maintain, and it needs to be checked carefully.


b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)

a vector of random additive effects (i.e. BV), ~ N(0, Aσ2a)


$V_a = \sigma^2_a$

$V_p = \sigma^2_a + \sigma^2$

$h^2 = \dfrac{V_a}{V_p} = \dfrac{\sigma^2_a}{\sigma^2_a + \sigma^2}$

Note: any individual included in the pedigree file will have a predicted breeding value (even if it was not measured).


$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{a} + \mathbf{e}$

The dataset for a fish breeding program contains a total of 933 fish records. The objective is to fit an animal model that considers the complete pedigree. The parental pedigree is found in the file PEDPAR.txt, but an individual pedigree needs to be constructed. For fitting the model, consider the factor SEX as a fixed effect. The response of interest is days to market size (DAYSM).


Example: /Day1/Fish/FISH.txt

INDIV Sire Dam DaysM Sex Market

1001 564 727 741.46 1 1

1002 564 727 500.09 2 1

1003 564 727 495.07 1 1

1004 564 727 506.25 2 1

1005 564 727 593.21 2 1

1006 564 727 671.10 1 1

1007 564 727 523.48 1 1

1008 564 727 531.33 1 1

1009 564 727 446.02 2 1

1010 564 727 599.20 1 0

1011 564 727 509.38 2 0

...


!RENAME !ARGS 1

Breeding Program Fish

INDIV 2040 !P !SORT

SIRE 100 !I

DAM 100 !I

DAYSM

SEX 2 !I

MARKET

PEDIND.TXT !SKIP 1 !MAKE

FISH.TXT !SKIP 1

!MAXIT 40 !DISPLAY 2 !FCON !DOPART $A

!PART 1

DAYSM ~ mu SEX !r INDIV

Example: /Day1/Fish/Fish_.as

$V_a = \hat{\sigma}^2_a = 2046.39 \qquad V_p = \hat{\sigma}^2_a + \hat{\sigma}^2 = 2046.39 + 3500.52 = 5546.91$

$h^2 = V_a / V_p = 0.369$



INDIV 1380 1380 0.584596 2046.39 4.52 0 P

Variance 933 931 1.00000 3500.52 10.21 0 P

Wald F statistics

Source of Variation NumDF DenDF_con F-inc F-con M P-con

7 mu 1 77.6 15677.14 15677.14 . <.001

5 SEX 1 888.2 21.88 21.88 A <.001

SEX 1 0.000 0.000

SEX 2 21.57 4.612

mu 1 549.8 5.172

INDIV 501 6.527 37.33

INDIV 502 6.074 35.14

INDIV 503 -27.03 36.32

INDIV 504 -23.94 37.53

INDIV 505 0.6396 35.30

INDIV 506 7.579 38.26

INDIV 507 -8.798 35.33

...
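Ranking individuals from the .sln file can be sketched as follows. This assumes each solution line has the four whitespace-separated fields shown in the excerpt above (term, level, effect, standard error); the helper name is hypothetical. Since the response is days to market size, the sketch assumes that fewer days is the desirable direction and ranks the most negative BLUPs first.

```python
SLN_EXCERPT = """\
SEX 1 0.000 0.000
SEX 2 21.57 4.612
mu 1 549.8 5.172
INDIV 501 6.527 37.33
INDIV 502 6.074 35.14
INDIV 503 -27.03 36.32
INDIV 504 -23.94 37.53
INDIV 505 0.6396 35.30
INDIV 506 7.579 38.26
INDIV 507 -8.798 35.33
"""


def rank_solutions(text, term, lower_is_better=False):
    """Return (level, effect, se) tuples for one model term, best first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == term:
            rows.append((fields[1], float(fields[2]), float(fields[3])))
    return sorted(rows, key=lambda r: r[1], reverse=not lower_is_better)


ranked = rank_solutions(SLN_EXCERPT, "INDIV", lower_is_better=True)
for rank, (level, effect, se) in enumerate(ranked, start=1):
    print(f"{rank}. INDIV {level}: {effect:+.3f} (SE {se:.2f})")
```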


Breeding Program Fish

Ecode is E for Estimable, * for Not Estimable

The predictions are obtained by averaging across the hypertable

calculated from model terms constructed solely from factors

in the averaging and classify sets.

Use !AVERAGE to move ignored factors into the averaging set.

---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----

Predicted values of DAYSM

The SIMPLE averaging set: SEX

INDIV Predicted_Value Standard_Error Ecode

501 567.1392 37.3393 E

502 566.6863 35.0927 E

503 533.5860 36.4141 E

504 536.6737 37.5067 E

505 561.2515 35.2528 E

506 568.1914 38.2210 E

507 551.8138 35.2626 E

508 526.8242 36.6684 E

509 525.0169 37.9278 E

510 523.4792 37.2501 E

511 616.2975 36.1484 E

512 563.8451 37.7190 E

513 541.1283 38.2338 E

514 532.9948 37.0123 E

515 541.1283 38.2338 E

516 538.0093 38.6922 E

105 586.5505 40.6930 E

1 586.5505 40.6930 E

...
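The line "The SIMPLE averaging set: SEX" suggests that each predicted value is mu plus the simple average of the two SEX effects plus the INDIV BLUP. A quick check with the rounded .sln values listed earlier reproduces the table to within about 0.03, a rounding difference. This is a sketch of that reading of the output, not a description of ASReml's internal code.

```python
MU = 549.8                    # mu solution from the .sln excerpt
SEX_EFFECTS = [0.000, 21.57]  # SEX 1 and SEX 2 solutions
INDIV_BLUP = {"501": 6.527, "502": 6.074, "503": -27.03, "507": -8.798}


def predicted_value(indiv_blup):
    """mu + simple average over the SEX levels + individual BLUP."""
    return MU + sum(SEX_EFFECTS) / len(SEX_EFFECTS) + indiv_blup


for ind, blup in INDIV_BLUP.items():
    print(ind, round(predicted_value(blup), 2))
```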


Additional comments

• When pedigree is available from several generations, using more than 3 generations usually does not produce a significant improvement in the precision of estimates.

• Incorporation of genetic groups is critical in order to account for previously achieved genetic gains, and to describe the proper structure of the data.

• The reduced animal model (RAM) is an alternative that runs faster, as only animals with records are considered.

• Other variants of the animal model exist that consider:

• Environmental effects.

• Maternal effects

• Genetic maternal effects

• Model with non-additive genetic effects (mainly dominance)

• Common environment effects



a vector of random additive effects (i.e. BV), ~ N(0, Aσ2a)

ce vector of random common environmental effects, ~ N(0, Iσ2ce)


$V_a = \sigma^2_a$

$V_p = \sigma^2_a + \sigma^2_{ce} + \sigma^2$

$h^2 = \dfrac{V_a}{V_p} = \dfrac{\sigma^2_a}{\sigma^2_a + \sigma^2_{ce} + \sigma^2}$

Note: common environment effects are non-genetic effects that cause resemblance between members of the same family.

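$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{a} + \mathbf{Z}_3\mathbf{ce} + \mathbf{e}$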

COMMON ENVIRON. EFFECTS

Session 7

Variance Structures in

ASReml


VARIANCE STRUCTURES

Direct Product

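• Variance structures are specified by using direct products of two or more matrices (⊗, or Kronecker product).

$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \qquad \mathbf{A} \otimes \mathbf{B} = \begin{pmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} \end{pmatrix}$

Example

$\mathbf{A} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \mathbf{B} = \begin{pmatrix} \sigma^2_1 & \sigma_{12} \\ \sigma_{12} & \sigma^2_2 \end{pmatrix}$

$\mathbf{A} \otimes \mathbf{B} = \begin{pmatrix}
\sigma^2_1 & \sigma_{12} & 0 & 0 & 0 & 0 \\
\sigma_{12} & \sigma^2_2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2_1 & \sigma_{12} & 0 & 0 \\
0 & 0 & \sigma_{12} & \sigma^2_2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^2_1 & \sigma_{12} \\
0 & 0 & 0 & 0 & \sigma_{12} & \sigma^2_2
\end{pmatrix}$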
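The direct product in the example can be reproduced with a few lines of plain Python. The `kron` helper is written out by hand for illustration (NumPy users would normally call `numpy.kron`); B uses arbitrary illustrative values for σ2_1, σ12 and σ2_2.

```python
def kron(A, B):
    """Kronecker (direct) product of two matrices given as lists of lists."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
B = [[2.0, 0.5],   # illustrative sigma2_1, sigma_12
     [0.5, 1.0]]   # illustrative sigma_12, sigma2_2
K = kron(I3, B)    # 6 x 6 block-diagonal matrix with B repeated 3 times
for row in K:
    print(row)
```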

VARIANCE STRUCTURES

Direct Sum

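• The desired matrix is specified by several square matrices in a block-diagonal matrix.

Example

$\mathbf{R} = \bigoplus_{j=1}^{3} \mathbf{R}_j = \mathrm{diag}(\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3) = \begin{pmatrix} \mathbf{A}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A}_3 \end{pmatrix}$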
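A direct sum simply places square blocks along the diagonal. The sketch below (plain Python, hypothetical helper names) builds a residual matrix for two groups of observations, such as sites or sources, each with its own error variance; the sizes and variances are illustrative only.

```python
def scaled_identity(n, variance):
    """sigma^2 * I_n as a list of lists."""
    return [[variance if i == j else 0.0 for j in range(n)] for i in range(n)]


def direct_sum(*blocks):
    """Block-diagonal matrix with the given square blocks on the diagonal."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


R = direct_sum(scaled_identity(2, 0.09), scaled_identity(3, 0.06))
for row in R:
    print(row)
```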

ALFALFA EXPERIMENT

Alfalfa experiment - 12 varieties - Response Yield

Source 3 !I

Variety 12 !A !SORT

Block 6 !I

yield

ALFALFAS.TXT !SKIP 1 !DISPLAY 7 !SUMMARY


3 1 0

24 0 ID

24 0 ID

24 0 ID

An experiment was established to compare 12 alfalfa varieties (labeled A-L). These correspond to 3 different sources, but the objective is to estimate the heritability of varieties regardless of their source. A total of 6 plots per variety were established in an RCB design. The response variable is yield (tons/acre) at harvest time. It is of interest to fit a linear model with a specific error variance for each of the different sources.

Example: /Day2/VarStruct/AlfalfaS_.as

ALFALFA EXPERIMENT

Interpreting output

Source Model terms Gamma Component Comp/SE % C

Residual 72 66

Variety 12 12 0.222267E-01 0.222267E-01 1.64 0 P

Variance 0 0 0.928105E-01 0.928105E-01 2.93 0 P

Variance 0 0 0.602051E-01 0.602051E-01 2.99 0 P

Variance 0 0 0.146949E-01 0.146949E-01 2.49 0 P

Wald F statistics


5 mu 1 9.9 990.35 <.001

3 Block 5 25.0 24.05 <.001

Variety A 0.1685 0.1000

Variety B -0.1668 0.1000

Variety C 0.4365E-01 0.1000

Variety D 0.1361 0.1000

Variety E 0.1211 0.9013E-01

Variety F -0.1983 0.9013E-01

Variety G -0.8737E-02 0.9013E-01

Variety H 0.3721E-01 0.9013E-01

Variety I 0.1027 0.6540E-01

Variety J -0.1585 0.6540E-01

Variety K -0.9548E-01 0.6540E-01

Variety L 0.1860E-01 0.6540E-01

VARIANCE STRUCTURES

Variance models (VCODE)

Common structures

ID Identity 1

DIAG Diagonal w

US Unstructured w(w + 1)/2

AINV Numerator relationship matrix (A) 0 or 1

CORU Uniform correlation 1

Correlation/Spatial structures

CORB Banded correlation w-1

AR1 First order autoregressive 1

AR2 Second order autoregressive 2

ARMA Autoregressive and moving average 2

CORG General correlation (homogeneous) w(w - 1)/2

ANTE1 Antedependence of order 1 2w - 1

LVR Linear variance 1

VARIANCE STRUCTURES

Correlation-variance structures (homogeneous)

AR1V First order autoregressive (homog.) 2

CORUV Uniform correlation (homogeneous) 2

CORBV Banded correlation (homogeneous) w

CORGV General correlation (homogeneous) w(w - 1)/2 + 1

Heterogeneous structures

IDH = DIAG Identity (heterogeneous) w

AR1H First order autoregressive (heterog.) 1 + w

CORUH Uniform correlation (heterogeneous) 1 + w

CORBH Banded correlation (heterogeneous) 2w - 1

CORGH = US General correlation (heterogeneous) w(w - 1)/2 + w

Special structures

IEXP Isotropic Exponential 1

AEXP Anisotropic Exponential 2

OWNk User supplied G matrix k

GIVk User supplied General (Inverse) matrix 0 or 1

VARIANCE STRUCTURES

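ID: identity

$\sigma^2\mathbf{I} = \begin{pmatrix} \sigma^2 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{pmatrix}$

DIAG: diagonal

$\begin{pmatrix} \sigma^2_1 & 0 & 0 & 0 \\ 0 & \sigma^2_2 & 0 & 0 \\ 0 & 0 & \sigma^2_3 & 0 \\ 0 & 0 & 0 & \sigma^2_4 \end{pmatrix}$

CORUV: uniform correlation

$\sigma^2\begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}$

AR1V: autocorrelation 1st order

$\sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}$

US: unstructured

$\begin{pmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma^2_2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma^2_3 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma^2_4 \end{pmatrix}$

CORUH: uniform heterogeneous

$\begin{pmatrix} \sigma^2_1 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 & \rho\sigma_1\sigma_4 \\ \rho\sigma_1\sigma_2 & \sigma^2_2 & \rho\sigma_2\sigma_3 & \rho\sigma_2\sigma_4 \\ \rho\sigma_1\sigma_3 & \rho\sigma_2\sigma_3 & \sigma^2_3 & \rho\sigma_3\sigma_4 \\ \rho\sigma_1\sigma_4 & \rho\sigma_2\sigma_4 & \rho\sigma_3\sigma_4 & \sigma^2_4 \end{pmatrix}$

CORRELATION STRUCTURES

CORU: uniform correlation

$\begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix}$

CORG: general correlation

$\begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14} \\ \rho_{12} & 1 & \rho_{23} & \rho_{24} \\ \rho_{13} & \rho_{23} & 1 & \rho_{34} \\ \rho_{14} & \rho_{24} & \rho_{34} & 1 \end{pmatrix}$

AR1: autocorrelation 1st order

$\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}$

CORB: banded correlation

$\begin{pmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ \rho_1 & 1 & \rho_1 & \rho_2 \\ \rho_2 & \rho_1 & 1 & \rho_1 \\ \rho_3 & \rho_2 & \rho_1 & 1 \end{pmatrix}$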
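To see how a handful of parameters generates a full matrix, the sketch below builds two of the structures shown above, AR1V and CORUH, for given parameter values. The function names mirror the VCODEs but are hypothetical Python helpers, not ASReml calls; the parameter values are illustrative.

```python
def ar1v(n, sigma2, rho):
    """AR1V: element (i, j) is sigma^2 * rho^|i-j| (2 parameters)."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def coruh(sds, rho):
    """CORUH: common correlation rho with heterogeneous variances (1 + w parameters)."""
    w = len(sds)
    return [[sds[i] ** 2 if i == j else rho * sds[i] * sds[j]
             for j in range(w)] for i in range(w)]


for row in ar1v(4, sigma2=2.0, rho=0.5):
    print(row)
for row in coruh([1.0, 2.0, 3.0], rho=0.3):
    print(row)
```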

VARIANCE STRUCTURES

Variance Header Line

• Required whenever random effects or residuals are not identically and

independently distributed.

<sections> <dimensions> <number of G structures>

<sections>

• Number of residual (Rj) structures to define.

Example. If several experiments are combined into a single analysis, then each experiment will have an error structure with its own variance:

$\mathbf{R} = \bigoplus_{j=1}^{s} \mathbf{R}_j$

However, it is also possible to define each error structure with a direct product:

$\mathbf{R}_j = \mathbf{R}_{j1} \otimes \mathbf{R}_{j2}$

VARIANCE STRUCTURES

Variance Header Line

<dimensions>

• Number of variance structures whose direct product defines each of the residual structures, Rj.

Example. A spatial analysis will have an error structure defined by two

elements: correlations across rows and correlations across columns.

<number of G structures>

• Number of random effects (Gi, or any interaction) that are defined with

structures different than identically and independently distributed.

Example. Pedigree matrix can be defined here (G = A)

Note: each of these components will have to be defined in greater detail later.

VAR. STRS. - EXAMPLES

<sections>

• Number of residual structures to define.

3 1 0

1280 0 ID

1320 0 ID

2300 0 ID

!SECTIONS n number of residual structures to define.

3: acts as a counter (here, 3 sites)

1: only a single structure on each of the residual structures

0: no G structures defined

1280: number of observations in site 1 (sorted by site)

0: sortkey (sorting variable not specified here)

ID: VCODE corresponding to independent errors.
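The section sizes (1280, 1320, 2300) must match the number of records per site in the data file sorted by site, including records with missing values. A small hypothetical helper that counts them and writes the header and section lines in the layout above might look like this; it assumes one site label per record and that the data file is sorted by site in the same order as the sorted labels.

```python
from collections import Counter


def residual_section_lines(site_per_record, vcode="ID"):
    """Variance header line plus one '<size> 0 <vcode>' line per site."""
    counts = Counter(site_per_record)
    lines = [f"{len(counts)} 1 0"]   # sections, dimensions, no G structures
    for site in sorted(counts):
        lines.append(f"{counts[site]} 0 {vcode}")
    return lines


records = ["S1"] * 1280 + ["S2"] * 1320 + ["S3"] * 2300
print("\n".join(residual_section_lines(records)))
```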



VARIANCE STRUCTURES

<dimensions>

• Number of variance structures whose direct product defines each of the residual structures.

1 2 0

16 row AR1

20 col AR1

1: a single residual structure (1 site here)

2: two direct products that define the residual structure

0: no G structures defined

16: number of rows in experiment (it could be replaced by a number)

row: sortkey for order of rows within dataset

AR1: VCODE corresponding to auto-correlated structure

20: number of columns in experiment (it could be replaced by a number)

col: sortkey for order of columns within dataset

VARIANCE STRUCTURES

<number of G structures>

• Number of random effects (or interactions) that are defined.

3 1 1

1280 0 ID

1320 0 ID

2300 0 ID

site.genotype 2

site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05

genotype 0 AINV

3: acts as a counter (here, 3 sites)

1: a single structure in each of the 3 residual elements

1: a single G structure defined

Note: the command !f mv keeps the missing observations and is useful for

counting observations over multiple R structures

VARIANCE STRUCTURES

3 1 1

1280 0 ID

1320 0 ID

2300 0 ID

site.genotype 2

site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05

genotype 0 AINV

site.genotype: G structure term to be defined

2: number of factors to define for this G structure

site (or 3): acts as a counter (as before, with a value of 3)

0: sortkey (not specified)

CORGH: VCODE for a heterogeneous general correlation matrix

genotype: acts as counter for the genotype factor

0: sortkey (not specified)

AINV: VCODE for the inverse of the relationship matrix from the pedigree file

Some options in the variance components

!GP restricts to the positive parameter space

!GU unrestricted

!GF fixed at a given supplied value (e.g. starting value)

!VCC c indicates the number of variance parameters constraints

!S2==1 qualifier required to fix the error variance at 1.0 and prevent ASReml from trying to estimate two confounded parameters (usually required when variance, instead of correlation, matrices are specified)

Example

Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU

VARIANCE STRUCTURES

• Starting values and restrictions can be added next to the parameters.

• Important to aid convergence and to speed up fitting.

VARIANCE STRUCTURES

• Order of starting values for variance and correlation matrices is important

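Variance Matrices (numbers give the order of the starting values; lower triangle, row by row)

$\begin{pmatrix} 1 & & & \\ 2 & 3 & & \\ 4 & 5 & 6 & \\ 7 & 8 & 9 & 10 \end{pmatrix}$ or $\begin{pmatrix} 1 & 2 & 4 & 7 \\ & 3 & 5 & 8 \\ & & 6 & 9 \\ & & & 10 \end{pmatrix}$

Correlation Matrices (correlations first, then variances)

$\begin{pmatrix} 7 & & & \\ 1 & 8 & & \\ 2 & 4 & 9 & \\ 3 & 5 & 6 & 10 \end{pmatrix}$ or $\begin{pmatrix} 7 & 1 & 2 & 3 \\ & 8 & 4 & 5 \\ & & 9 & 6 \\ & & & 10 \end{pmatrix}$

Note: for most complex variance structures it is critical to specify starting values.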

Next to model terms

!GP positive variance component

!GU unrestricted variance component (default)

!GF fixed variance component

Volume ~ mu Block !r Mother 0.25 Plot 0.4 !GF

After model terms

!VCC n reads n lines of variance component restrictions (placed after the variance structure definitions).

25 26 # V25 = V26

2 -3 # V2 = -V3

4 5 * 4 # V4 = V5*4

!=ABA all parameters with the same letter in the G or R structure

are treated as the same parameter.

2 0 US 0.2 0.3 0.5 !=ABA

CONSTRAINTS IN VAR-COV COMP.

ALFALFA EXPERIMENT

!PART 3


1 2 0

3 Source DIAG 0.8 0.8 0.8 !=ABA

24 0 ID !S2==1

!PART 4

!VCC 1


3 1 0

24 0 ID

24 0 ID

24 0 ID

4 6

It is of interest to fit a linear model with a common error variance for sources 1 and 3, and a different one for source 2.


• Post-analysis procedure to calculate functions of variance components

(e.g. heritability or genetic correlations).

• Based on approximations using the delta method (i.e. a first-order Taylor series approximation).

• It should not be used for formal statistical inference, only as a rough reference.

Linear functions of variance components

Ratio of variance components

Correlations based on 3 variance components

$V_a = 4\sigma^2_s \qquad V_p = \sigma^2_s + \sigma^2$

$h^2 = \dfrac{V_a}{V_p} = \dfrac{4\sigma^2_s}{\sigma^2_s + \sigma^2}$

FUNCTIONS OF VAR. COMPS.

$r_{gA} = \dfrac{Cov(g_1, g_2)}{\sqrt{Var(g_1)\,Var(g_2)}}$

F Linear functions of variance components

H Ratio of variance components

R Correlations based on 3 variance components

• A .pin file needs to be created with the functions to be calculated, following the order of the variance components presented in the .asr file; the calculation also uses the output in the .vvp file.

• Output is presented in file .pvc

FUNCTIONS OF VAR. COMPS.

ASReml options

• Alternatively, the functions can be incorporated into the job file using !PIN !DEFINE, which will generate the .pin file automatically and then run it.

!PART 5


!PIN !DEFINE

F Vg 1 #3

F Vtotal 1 2 #4

H Herit 3 4


Variety 12 12 0.580868 0.276798E-01 1.81 0 P

Variance 72 66 1.00000 0.476526E-01 5.24 0 P

1 Variety 0.276798E-01

2 Variance 0.476526E-01

3 Vg 1 0.27680E-01 0.15265E-01

4 Vtotal 1 0.75332E-01 0.16972E-01

Herit = Vg 1 3/Vtotal 4= 0.3674 0.1397

Notice: The parameter estimates are followed by

their approximate standard errors.
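The delta-method approximation behind the H (ratio) line can be sketched for two estimates x and y with variances and covariance taken from the .vvp file: Var(x/y) ≈ (x/y)^2 [Var(x)/x^2 + Var(y)/y^2 - 2 Cov(x, y)/(xy)]. This is the standard first-order formula, shown as an illustration; the function name is hypothetical. The covariance between Vg and Vtotal is not shown in this output, so only the ratio itself is reproduced from the .pvc numbers.

```python
import math


def ratio_with_se(x, y, var_x, var_y, cov_xy):
    """Ratio x/y and its delta-method (first-order Taylor) standard error."""
    r = x / y
    var_r = r ** 2 * (var_x / x ** 2 + var_y / y ** 2 - 2.0 * cov_xy / (x * y))
    return r, math.sqrt(var_r)


# Vg and Vtotal from the .pvc output
herit = 0.27680e-01 / 0.75332e-01
print(f"Herit = {herit:.4f}")   # 0.3674, as in the .pvc file

# Toy example with known inputs
r, se = ratio_with_se(1.0, 2.0, var_x=0.01, var_y=0.04, cov_xy=0.0)
print(r, se)
```

Because Vg is part of Vtotal, their covariance is typically positive; setting it to zero therefore tends to overstate the standard error of a heritability.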

ALFALFA EXPERIMENT

Variance component estimates

pvc file


Session 8

Multivariate Analysis /

Repeated Measures


MULTIVARIATE ANALYSIS

General Uses

• More efficient analysis that combines information on two or more response variables.

• Produces an improvement in the precision of the breeding values (BLUPs).

• Allows estimation of correlations among traits (e.g. phenotypic and genetic correlations).

• Assists in predicting individual breeding values for traits that were not measured (provided they are correlated with measured traits).

• Relevant to assess the importance of indirect selection.

• Can be used to combine different sources of data, complete or incomplete.

• Generates the required matrices to construct a selection index.

• Recommended analysis for cases where a prior selection was done based on a trait.

BIVARIATE ANALYSIS

• Considers a 2 x 2 matrix for each effect, e.g.

$V(\mathbf{g}_i) = V\begin{pmatrix} g_{i(t_1)} \\ g_{i(t_2)} \end{pmatrix} = \begin{pmatrix} \sigma^2_{g_1} & \sigma_{g_1 g_2} \\ \sigma_{g_1 g_2} & \sigma^2_{g_2} \end{pmatrix}$

In ASReml

• Uses individual stacked responses: yi = [yi(1) yi(2)]'

• The word Trait is used to define the stacked response vector.

• Typically, genetic and error effects are defined with an unstructured (US) variance structure.

• Other effects can be defined with US or DIAG structures.

• It is also recommended to use some of the correlation structures to keep the estimates within the parameter space.

Strategy for fitting models in ASReml

• Sensitive to initial starting values (for any multivariate analysis).

• Strategy: start with univariate analyses and add one variable at a time.

• Get rough estimates: estimate phenotypic or genetic correlations / covariances using univariate solutions, or prior knowledge.

• Use !CONTINUE (or -c) to start from previous runs.

A tree genetic study used seeds from a total of 28 female parents, collected from mass selection and tested in an RCBD together with 3 control female parents. The experiment consisted of 10 replicates with 34 plots each, every plot of size 2 x 3. The response variables of interest are total height (HT, cm) and diameter at breast height (DBH, cm). For now we will concentrate on the response HT. The objective is to rank the female parents for future selections and seed production. Note that a model can be fitted with and without the controls included as parents.

OPEN POLLINATION

Example: /Day2/BivarOpen/OPENPOL.txt

ID REP PLOT FEMALE TYPE DBH HT

1 1 1 FEM1 Test 23.8 12.4

2 1 1 FEM1 Test 24.4 12.1

3 1 1 FEM1 Test 25.4 10.9

4 1 1 FEM1 Test 28.0 12.7

5 1 1 FEM1 Test 20.9 11.9

6 1 1 FEM1 Test 22.6 11.2

7 1 2 FEM15 Test 22.4 10.7

8 1 2 FEM15 Test 21.9 11.6

9 1 2 FEM15 Test 20.8 11.3

10 1 2 FEM15 Test 21.6 13.3

...

OPEN POLLINATION (bivariate)

!RENAME !ARGS 2

Open pollination trial

ID

REP 10 !I

PLOT 34 !I

FEMALE 31 !A !SORT

TYPE 2 !A !SORT

DBH

HT

OPENPOL.TXT !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A

!PART 2

HT DBH ~ Trait Trait.REP !r Trait.FEMALE Trait.REP.PLOT

1 2 2

0 0 ID

Trait 0 US 1.01 1.82 7.25

Trait.FEMALE 2

Trait 0 US 0.19 0.31 0.61

FEMALE 0 ID

Trait.REP.PLOT 3

Trait 0 US 0.05 0.001 0.001 !GUFF

REP 0 ID

PLOT 0 ID

Example: /Day2/BivarOpen/BivarOpen_.as

OPEN POLLINATION (bivariate)


Residual UnStructured 1 1 1.00196 1.00196 29.17 0 U



Trait.FEMALE UnStructured 1 1 0.191142 0.191142 3.44 0 U



Trait.REP.PLOT DIAGonal 1 0.790064E-01 0.790064E-01 5.19 0 U

Trait.REP.PLOT DIAGonal 2 -0.201829 -0.201829 -2.84 0 U

Covariance/Variance/Correlation Matrix UnStructured Residual

1.002 0.6720

1.834 7.437

Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE

0.1911 0.8449

0.3102 0.7050

Wald F statistics


8 Trait 2 29.2 9584.21 <.001

9 Trait.REP 18 643.1 4.82 <.001

Interpreting analysis
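ASReml prints each US matrix with covariances on and below the diagonal and correlations above it. A minimal Python sketch of how the printed correlations follow from the covariances, using the rounded HT/DBH values shown above (trait order HT, DBH). Because the inputs are rounded, the FEMALE correlation comes out as 0.845 rather than the printed 0.8449.

```python
import math


def cov_to_corr(cov):
    """Convert a symmetric covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Rounded values from the ASReml output above (trait order: HT, DBH)
residual = [[1.002, 1.834],
            [1.834, 7.437]]
female = [[0.1911, 0.3102],
          [0.3102, 0.7050]]

r_e = cov_to_corr(residual)[0][1]   # residual (error) correlation
r_g = cov_to_corr(female)[0][1]     # correlation between FEMALE effects
print(f"residual correlation: {r_e:.3f}")
print(f"FEMALE correlation:   {r_g:.3f}")
```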

MULTIVARIATE ANALYSIS

Extensions

• Consider different sites (or years) as different traits (e.g. this helps to classify sites).

• Variance-covariance matrices can be used to ‘study’ genetic structure

(e.g. evaluating / separating genetic groups).

Strategy for fitting models in ASReml

• To fit the model, use the same strategies as for bivariate analyses.

• Standardize responses, particularly when variables are on different scales.

• Implement simple structures first (e.g. ID, DIAG, CORUH, CORGH).

• Correlation variance structures (CORUH, CORBH, CORGH) tend to give

better results.

• Consider constraining some parameters, e.g. !GPFPUP

• Be aware that it might not fit at all!
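Standardizing puts every response on a mean-0, SD-1 scale before fitting. A minimal Python sketch, assuming standardization by the sample mean and standard deviation of each variable; the values are the first ten DBH and HT records of OPENPOL.txt shown earlier. In a real analysis you would standardize with the whole dataset (excluding missing values), and variance components estimated on this scale are expressed in units of the squared standard deviation used.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


# First ten records of OPENPOL.txt (illustration only)
dbh = [23.8, 24.4, 25.4, 28.0, 20.9, 22.6, 22.4, 21.9, 20.8, 21.6]
ht = [12.4, 12.1, 10.9, 12.7, 11.9, 11.2, 10.7, 11.6, 11.3, 13.3]

z_dbh = standardize(dbh)
z_ht = standardize(ht)
print([round(z, 2) for z in z_ht])
```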

REPEATED MEASURES

• Very similar to multivariate analysis but every measurement point (time) is

considered as a different trait.

• Requires modelling of the mean effects (patterns) and variance structures.

• Additional modelling of fixed effects of time points is possible (e.g.

polynomials or splines).

• Convergence problems are still present, but to a lesser extent.

• Two modelling approaches:

- Multiple vectors: parallel vectors with, typically, US error structure.

- Single vector: stacked responses with, typically, AR1V correlations.
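The two approaches differ mainly in how many variance parameters have to be estimated. A minimal Python sketch, assuming equally spaced time points and a hypothetical autocorrelation of 0.8: it builds the AR1 correlation matrix (correlation ρ^|i−j| between times i and j) and counts the parameters needed by US, AR1V and AR1H for t = 4 measurements.

```python
def ar1_corr(n_times, rho):
    """AR1 correlation matrix for equally spaced times: corr(i, j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


t = 4          # number of time points (as in the HT1..HT4 example)
rho = 0.8      # hypothetical autocorrelation

C = ar1_corr(t, rho)
for row in C:
    print(" ".join(f"{v:.3f}" for v in row))

# Number of variance parameters each residual structure needs for t time points
n_us = t * (t + 1) // 2   # US: every variance and covariance
n_ar1v = 1 + 1            # AR1V: one correlation + one common variance
n_ar1h = 1 + t            # AR1H: one correlation + one variance per time
print("US:", n_us, "AR1V:", n_ar1v, "AR1H:", n_ar1h)
```

The count grows quadratically for US but only linearly (AR1H) or not at all (AR1V) for the autoregressive structures, which is one reason the single-vector approach tends to converge more easily.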

Relevant functions in ASReml



spl(v,k) defines a spline model term for the variable v with k knots

!{ and !} placed around model terms so terms are not reordered

(important for specifying covariances between random terms)

Example: /Day2/MultiVar/MVCOLS.txt

A total of 824 individuals, offspring of 26 parents planted as an RCBD with 4 blocks, were measured at 4 equally spaced time points: 2, 4, 6 and 8 years after establishment.

REPEATED MEASURES: AS MV

IDD Indiv Female Rep HT1 HT2 HT3 HT4

1 1 F09 1 62.0 108.0 240.0 411.5

2 2 F02 1 66.0 154.0 275.0 442.0

3 3 F21 1 65.0 116.0 245.0 323.1

4 4 F25 1 68.0 102.0 225.0 350.5

5 5 F13 1 58.0 170.0 325.0 457.2

6 6 F14 1 117.0 265.0 445.0 588.3

7 7 F14 1 * * * *

8 8 F15 1 75.0 162.0 315.0 484.6

9 9 F18 1 74.0 182.0 340.0 493.8

10 10 F03 1 100.0 230.0 350.0 518.2

11 11 F07 1 72.0 148.0 310.0 313.9

12 12 F14 1 69.0 164.0 310.0 469.4

13 13 F11 1 87.0 208.0 340.0 493.8

14 14 F24 1 50.0 148.0 290.0 454.2

15 15 F02 1 66.0 173.0 350.0 521.2

16 16 F21 1 75.0 164.0 305.0 469.4

17 17 F15 1 78.0 166.0 315.0 493.8

...

Example: /Day2/MultiVar/MV_.as


!RENAME !ARGS 1

Multivariate Analysis of HT - 4 meas

IDD

INDIV

FEMALE !A

REP !A

HT1 HT2 HT3 HT4

MVCols.txt !SKIP 1 !MAXIT 40 !DISPLAY 2 !DOPART $A

!PART 1

HT1 HT2 HT3 HT4 ~ Trait Trait.REP !r Trait.FEMALE

1 2 1

0 0 ID

Trait 0 US 419

556 1405

698 1846 3801

821 2306 4624 7154

Trait.FEMALE 2

Tr 0 US 36

48 74

38 70 117

61 126 223 410

FEMALE 0 ID



Covariance/Variance/Correlation Matrix UnStructured Residual

419.7 0.7241 0.5527 0.4744

556.1 1405. 0.7989 0.7275

698.1 1847. 3801. 0.8868

822.0 2307. 4625. 7155.

Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE

35.60 0.9375 0.5843 0.5068

48.08 73.88 0.7499 0.7245

37.78 69.86 117.5 1.019

61.25 126.2 223.8 410.4
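Checking the implied correlations is a quick way to detect estimates outside the parameter space. A minimal Python sketch using the Trait.FEMALE covariances printed above (lower triangle, HT1 to HT4): it rebuilds the full matrix and flags any correlation whose absolute value exceeds 1, which here reproduces the 1.019 printed between HT3 and HT4. This is the kind of result that motivates using correlation structures or constraining parameters.

```python
import math


def lower_to_full(lower):
    """Expand a lower-triangular list of rows (as printed by ASReml) into a full symmetric matrix."""
    n = len(lower)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, v in enumerate(row):
            full[i][j] = full[j][i] = v
    return full


def correlations(cov):
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]


# Trait.FEMALE covariances from the output above (HT1..HT4)
female_lower = [[35.60],
                [48.08, 73.88],
                [37.78, 69.86, 117.5],
                [61.25, 126.2, 223.8, 410.4]]

r = correlations(lower_to_full(female_lower))
# (trait i, trait j, correlation) for every pair with |r| > 1
outside = [(i + 1, j + 1, round(r[i][j], 3))
           for i in range(4) for j in range(i) if abs(r[i][j]) > 1]
print(outside)
```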

Example: /Day2/RepMeas/REPCOLS.txt

REPEATED MEASURES: AS UNIV

IDD Indiv Female Rep Time HT

1 1 F09 1 1 62

2 1 F09 1 2 108

3 1 F09 1 3 240

4 1 F09 1 4 411.5

5 2 F02 1 1 66

6 2 F02 1 2 154

7 2 F02 1 3 275

8 2 F02 1 4 442

9 3 F21 1 1 65

10 3 F21 1 2 116

11 3 F21 1 3 245

12 3 F21 1 4 323.1

13 4 F25 1 1 68

14 4 F25 1 2 102

15 4 F25 1 3 225

16 4 F25 1 4 350.5

17 5 F13 1 1 58

18 5 F13 1 2 170

19 5 F13 1 3 325

20 5 F13 1 4 457.2

...

Example: /Day2/RepMeas/RepCols_.as

REPEATED MEASURES

!RENAME !ARGS 1

Repeated Measures Analysis of HT - 4 meas

IDD

INDIV

FEMALE 26 !A

REP 4 !I

TIME 4 !I

HT

REPCOLS.txt !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A

!PART 1

!FILTER TIME !SELECT 1

HT ~ mu REP !r FEMALE

!PART 2

log(HT) ~ mu lin(TIME) TIME.REP !r,

!{ FEMALE lin(TIME).FEMALE !} !f mv

1 2 1

824 0 ID !S2==1

TIME 0 AR1H 0.8 0.05 0.05 0.05 0.05

FEMALE 2

2 0 CORUH -0.8 0.004 0.0001

FEMALE


Residual AR=AutoR 4 0.798949 0.798949 82.75 0 U

Residual AR=AutoR 4 0.641964E-01 0.641964E-01 19.31 0 U




FEMALE CORRelat 2 -0.807724 -0.807724 -6.53 0 U

FEMALE CORRelat 2 0.362337E-02 0.362337E-02 1.87 0 U

FEMALE CORRelat 2 0.262804E-03 0.262804E-03 2.06 0 U

Covariance/Variance/Correlation Matrix CORRelation FEMALE

0.3625E-02 -0.8078

-0.7884E-03 0.2629E-03

Wald F statistics


8 mu 1 23.2 0.37E+06 <.001

9 lin(TIME) 1 24.0 17256.68 <.001

10 TIME.REP 14 2096.0 256.63 <.001

REPEATED MEASURES


Session 9

Multi-environment

Analysis


MET ANALYSIS

General Uses

• Incorporates information from several experiments (over different sites or

years) to obtain overall BVs.

• Allows estimation of genotype-by-environment (or genotype-by-year) effects and their variance structure. Hence, it separates genetic effects into their pure component and their interaction with site (or year).

• Provides unbiased estimates of heritability and Type-B correlations.

• Critical for understanding the genetic structure of the population and for defining breeding strategies.

Difficulties

• Every site (or year) has its own ‘personality’ (i.e. error structure, design

effects, etc.) that needs to be combined into a single analysis.

• The amount of data can be large, causing difficulties in fitting and convergence.

• Requires additional prior checks (e.g. EDA, coding, etc.).

Some useful options

In ASReml

• Flexible and fast enough to incorporate many datasets.

• Each site will have its own model specification (fixed effects, random

components and error structure).

• Allows a two-stage analysis (see !TWOSTAGEWEIGHTS).

MET ANALYSIS

<sections> <dimensions> <number of G structures>

at(f,n) creates a binary (0/1) variable that equals 1 when factor f is at level n

mv creates a missing value as fixed effect (design matrix)

!SECTION n number of residual structures to define.

!MVINCLUDE missing values in a factor are treated as zeros

Strategy for fitting MET models in ASReml

• Careful cleaning process (same factors, values, etc.).

• Start by analysing every site individually, determining all necessary (and significant) design effects and the error structure.

• Evaluate which sites to consider for the full analysis (sites with low heritability contribute little to ranking).

• Consider standardizing the data.

• Incorporate and evaluate which variables or factors will act as ‘covariates’ across all trials.

• Combine all trials into a simple joint analysis (e.g. heterogeneous error variances but a common additive variance).

• Progress slowly to more complex variance structures for different model terms (e.g. DIAG for additive).

• Consider favouring the simplest model that suits your requirements (practical, operational).

MET ANALYSIS

Complex Variance Structures

• Ideal objective: to fit a US structure to the GxE matrix to understand the

genetic structure and evaluate stability of genotypes and breeding zones.

• A US structure is difficult to fit, but other simpler (approximate) structures

are available.

• ASReml can also fit other structures based on multivariate techniques (e.g. factor analytic covariance).

MET ANALYSIS

TYPE-B CORRELATIONS

Definition: genetic correlation between sites (the same trait measured at different sites)

$$r_{B_g}(g) = \frac{V_g}{V_g + V_{g \times s}} \qquad r_{B_g}(a) = \frac{V_a}{V_a + V_{a \times s}}$$

• It is a relative measure of genotype-by-environment interaction.

• It ranges from 0 to 1.

• A value close to 0 indicates that the ranking in one environment is very different from the ranking in another environment (i.e. low stability).

• A value close to 1 indicates that a single ranking can be used across all environments without loss of information (i.e. high stability).

MET ANALYSIS

Option 1: Simple GxE structure

• Aims at modelling a common GxE correlation.

• Common structures are: DIAG, CORUH.

• Correlation corresponds to an average value across all sites.

• It is simpler to fit and converges easily.

• It does not allow for a better understanding of the GxE.

Option 2: Complex GxE structure

• Aims at modelling the ‘full’ GxE correlation structure.

• Common structures are: CORGH, US, FAk, FACVk.

• Provides a different GxE correlation for each pair of sites.

• It is difficult to fit, particularly for several sites.

• Simplifications are usually required, e.g. standardization.

MET ANALYSIS

Variant 1: Explicit GxE

yield ~ mu Site !r Genotype Site.Genotype

• Provides average genetic values across all sites, together with GxE deviations for each site.

• Useful for generating a ranking across all sites.

• Allows for simplification of GxE term.

Variant 2: Implicit GxE

yield ~ mu Site !r Site.Genotype

• Provides a different genetic value for each site.

• Useful for generating rankings for each site.

• It could make use of the full correlation structure of the GxE.

• Typically used to understand the dynamics of GxE.


MET HALF-SIB / SIRE MODEL

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{X}_1\mathbf{l} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{s} + \mathbf{Z}_3\mathbf{sl} + \mathbf{e}$$

β vector of fixed design or covariate effects

l vector of fixed location (sites or years) effects

s vector of random sire effects (i.e. ½ breeding value), ~ N(0, Aσ²_s)

sl vector of random sire-by-location interactions, ~ N(0, Iσ²_sl)

e vector of random residual effects, ~ N(0, D) or N(0, ⊕_{i=1}^{s} R_i)

Va = 4σ²_s    Vaxs = 4σ²_sl

Vp = σ²_s + σ²_sl + σ²

h² = Va / Vp = 4σ²_s / [σ²_s + σ²_sl + σ²]

r_B(a) = Va / [Va + Vaxs] = ρ_s

Explicit GxE

Example: /Day2/MultiEnv/TRIALS4.txt

A set of 4 trials was established as part of a breeding program. A total of 61 unrelated parents were considered (i.e. half-sib model). All trials were incomplete block designs (IBD) with 4 full replicates. The response variable of interest is HT. We are interested in obtaining an analysis using all four sites simultaneously.

MET ANALYSIS

IDD Test Genotype Rep Iblock Row Column Surv DBH HT

10001 1 G41 1 1 1 1 1 736.6 557.8

10002 1 G33 1 1 2 1 1 685.8 588.3

10003 1 G22 1 1 3 1 1 838.2 551.7

10004 1 G31 1 1 4 1 1 660.4 539.5

10005 1 G18 1 1 5 1 1 406.4 411.5

10006 1 G01 1 1 6 1 1 508.0 417.6

10007 1 G05 1 1 7 1 1 711.2 518.2

10008 1 G54 1 2 8 1 1 609.6 463.3

10009 1 G30 1 2 9 1 1 482.6 466.3

10010 1 G17 1 2 10 1 1 736.6 527.3

10011 1 G58 1 2 11 1 1 584.2 472.4

10012 1 G37 1 2 12 1 1 431.8 442.0

10013 1 G07 1 2 13 1 1 736.6 600.5

10014 1 G42 1 2 14 1 1 711.2 566.9

10015 1 G38 1 3 15 1 1 711.2 518.2

10016 1 G33 1 3 16 1 1 736.6 606.6

10017 1 G50 1 3 17 1 1 736.6 576.1

10018 1 G20 1 3 18 1 1 660.4 539.5

...

Example Variant 1: /Day2/MultiEnv/GxE_.as

MET ANALYSIS

!RENAME !ARGS 2

Four trials to study GxE for HT

IDD

Test 4 !A !SORT

Genotype 61 !A

REP 4 !A

IBlock 110 !A

Row 56 !A

Col 32 !A

Surv

DBH

HT

TRIALS4.txt !SKIP 1 !MAXIT 50 !DISPLAY 2 !DOPART $A

!PART 2

HT ~ mu Test Test.REP !r,

at(Test,1).REP.IBlock at(Test,2).REP.IBlock,


Genotype Test.Genotype !f mv

4 1 0

4480 0 ID

4480 0 ID

4608 0 ID

4400 0 ID

Note: individual site heritabilities can also be calculated.

MET ANALYSIS



Residual 17968 16537

Genotype 100 100 301.167 301.167 4.60 0 P

Test.Genotype 400 400 158.584 158.584 6.74 0 P

at(Test,1).REP.IBloc 4400 4400 1159.04 1159.04 9.75 0 P




Variance 0 0 4390.59 4390.59 44.30 0 P

Variance 0 0 3871.67 3871.67 43.39 0 P

Variance 0 0 4130.69 4130.69 42.40 0 P

Variance 0 0 3812.02 3812.02 42.26 0 P

Va = 4 σ²g = 4 x 301.2 = 1204.7

Vaxs = 4 σ²gs = 4 x 158.6 = 634.3

Vp = 301.2 + 158.6 + (4141.7)/4 + (16235.0)/4 = 5553.9

h² = Va / Vp = 1204.7 / 5553.9 = 0.217

r_B(a) = Va / [Va + Vaxs] = 1204.7 / [1204.7 + 634.3] = 0.655
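The same calculations can be scripted directly from the variance components. A minimal Python sketch, assuming the incomplete-block total of 4141.7 quoted in the Vp line (only one block component appears in the output excerpt) and the four site residual variances printed above. Note that those four residuals sum to 16205.0 rather than the 16235.0 used in the Vp line, so this sketch gives Vp ≈ 5546.4; h² and r_B(a) are unchanged to three decimals.

```python
sigma2_g = 301.167                            # Genotype
sigma2_gs = 158.584                           # Test.Genotype
resid = [4390.59, 3871.67, 4130.69, 3812.02]  # site residual variances
iblock_sum = 4141.7                           # block variances summed, as quoted in the text
n_sites = 4

va = 4 * sigma2_g        # additive variance (half-sib model)
vaxs = 4 * sigma2_gs     # additive-by-site variance
# Average the site-specific components over the 4 sites
vp = sigma2_g + sigma2_gs + iblock_sum / n_sites + sum(resid) / n_sites
h2 = va / vp
rb = va / (va + vaxs)    # Type-B genetic correlation
print(f"Va={va:.1f} Vaxs={vaxs:.1f} Vp={vp:.1f} h2={h2:.3f} rB={rb:.3f}")
```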


MET HALF-SIB / SIRE MODEL (Implicit GxE)

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{X}_1\mathbf{l} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_3\mathbf{sl} + \mathbf{e}$$

β vector of fixed design or covariate effects

l vector of fixed location (sites or years) effects

sl vector of random sire-by-location interactions, ~ N(0, U ⊗ A)

e vector of random residual effects, ~ N(0, D)

U matrix of variance-covariances

A numerator relationship matrix

D diagonal matrix

MET ANALYSIS

!PART 3




Test.Genotype !f mv

4 1 1

4480 0 ID

4480 0 ID

4608 0 ID

4400 0 ID

Test.Genotype 2

Test 0 US 520.7

392.2 563.6

256.7 376.6 392.1

384.1 268.8 200.0 356.8

Genotype 0 ID

Example Variant 2: /MultiEnv/GxE_.as

MET ANALYSIS







Variance[ 1] 4480 0 4388.87 4388.87 44.30 0 P

Variance[ 2] 4480 0 3871.39 3871.39 43.38 0 P

Variance[ 3] 4608 0 4131.87 4131.87 42.38 0 P

Variance[ 4] 4400 0 3811.58 3811.58 42.26 0 P

Test.Genotype UnStructured 1 1 520.722 520.722 4.86 0 U










Covariance/Variance/Correlation Matrix UnStructured Test.Genotype

520.7 0.7240 0.5682 0.7056

392.2 563.6 0.8012 0.5995

256.7 376.6 392.1 0.5353

304.1 268.8 200.2 356.8


MET ANALYSIS

BLUP values: Variant 1

Effect Level BLUP SE(BLUP)

Genotype G22 11.03 7.085

Test.Genotype 1.G22 10.43 8.368


Test.Genotype 3.G22 -13.59 8.386


BLUP values: Variant 2

Effect Level BLUP SE(BLUP)



Test.Genotype 3.G22 -1.8 7.147


Factor Analytic models

• Useful approximations for modelling a U matrix in GxE or multivariate analyses.

• Flexible models that require fewer variance components than US, and tend to converge better and more quickly.

• Allow additional interpretation of underlying environmental factors associated with the matrix of correlations.

• Finding solutions for FA models can be difficult and requires proper specification of initial values.

• Several alternative models are available within ASReml: FAk, FACVk and

XFAk.

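• Based on the parameterization: $\boldsymbol\Sigma = \boldsymbol\Lambda\boldsymbol\Lambda' + \boldsymbol\Psi$

MET ANALYSIS

FA model: FAk

$\boldsymbol\Sigma = \mathbf{D}\mathbf{C}\mathbf{D}$, where

D is a diagonal matrix such that $\mathbf{D}\mathbf{D} = \mathrm{diag}(\boldsymbol\Sigma)$

C is a correlation matrix of the form $\mathbf{C} = \mathbf{F}\mathbf{F}' + \mathbf{E}$

F is a matrix of loadings on the correlation scale

E is a diagonal matrix defined by difference (remnant).

FA model: FACVk

Λ is a matrix of loadings on the covariance scale, with $\boldsymbol\Lambda = \mathbf{D}\mathbf{F}$

Ψ is a diagonal matrix, with $\boldsymbol\Psi = \mathbf{D}\mathbf{E}\mathbf{D}$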

MET ANALYSIS

!PART 4




Test.Genotype !f mv

4 1 1

4480 0 ID

4480 0 ID

4608 0 ID

4400 0 ID

Test.Genotype 2

Test 0 FA1

0.8 0.9 0.1 0.2 # 1st factor

520.7 563.6 392.1 356.8 # Site Variances

Genotype 0 ID

Example Variant 2: /MultiEnv/GxE_.as

MET ANALYSIS







Variance[ 1] 4480 0 4389.44 4389.44 44.29 0 P

Variance[ 2] 4480 0 3871.43 3871.43 43.38 0 P

Variance[ 3] 4608 0 4131.95 4131.95 42.38 0 P

Variance[ 4] 4400 0 3811.38 3811.38 42.26 0 P

Test.Genotype FA D(LL'+E)D 1 1 0.787009 0.787009 10.71 0 U








Covariance/Variance/Correlation Matrix FA D(LL'+E)D Test.Genotype

519.1 0.7333 0.6440 0.5472

396.8 563.9 0.7625 0.6480

290.2 358.1 391.1 0.5690

236.5 291.9 213.4 359.9
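Under FA1, every between-site correlation is the product of two loadings, r_ij = f_i f_j, so a 4-site matrix needs 2t = 8 parameters instead of the 10 of US. A minimal Python sketch of Σ = DCD with C = FF' + E. The first loading (0.787) is the one reported above; the other three are back-calculated from the printed correlations (e.g. f2 = 0.7333 / 0.787) and rounded, so they are derived values, not ASReml output.

```python
import math

f = [0.787, 0.932, 0.818, 0.695]          # loadings on the correlation scale (F)
site_var = [519.1, 563.9, 391.1, 359.9]   # diagonal of the fitted matrix

n = len(f)
E = [1 - fi ** 2 for fi in f]             # remnant variances, so that diag(C) = 1
# C = FF' + E
C = [[f[i] * f[j] + (E[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
D = [math.sqrt(v) for v in site_var]      # D = diag(sqrt(site variances))
# Sigma = D C D
G = [[D[i] * C[i][j] * D[j] for j in range(n)] for i in range(n)]

print(round(C[0][1], 3), round(G[1][0], 1))
```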


MET ANALYSIS

Two-Stage Analyses

• An MET analysis with several sites (> 5) is difficult to obtain, particularly if there are too many variance components to estimate (e.g. US).

• It is possible to use a two-stage analysis that is decomposed as:

1st Stage

• Every site is analysed individually with its own characteristics.

• Genotype effects are assumed fixed.

• Means and SEMs are obtained for each site.

2nd Stage

• All means (and SEMs) are combined into a single file.

• The use of !TWOSTAGEWEIGHTS generates weights (and covariance) for

each prediction and combines the analyses into a single run.
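The idea behind second-stage weighting is that precise first-stage means should count more. A minimal Python sketch, assuming a diagonal approximation (weight = 1/SEM², ignoring covariances between predictions, which !TWOSTAGEWEIGHTS also handles) and hypothetical site means and SEMs. The inverse-variance weighted mean per genotype is shown only to illustrate the effect of the weights; a real second stage fits a mixed model with these weights.

```python
# Hypothetical first-stage results: (site, genotype, mean, SEM), genotypes fitted
# as fixed within each site. Values are illustrative only.
stage1 = [
    ("S1", "G01", 512.3, 21.0),
    ("S2", "G01", 498.7, 35.5),
    ("S1", "G02", 530.9, 20.4),
    ("S2", "G02", 541.2, 34.8),
]


def diagonal_weights(records):
    """Stage-2 weights under a diagonal approximation: w = 1 / SEM**2."""
    return {(site, g): 1.0 / sem ** 2 for site, g, _, sem in records}


def weighted_means(records, weights):
    """Inverse-variance weighted mean per genotype (illustration only)."""
    sums = {}
    for site, g, mean, _ in records:
        w = weights[(site, g)]
        s, sw = sums.get(g, (0.0, 0.0))
        sums[g] = (s + w * mean, sw + w)
    return {g: s / sw for g, (s, sw) in sums.items()}


w = diagonal_weights(stage1)
overall = weighted_means(stage1, w)
print({g: round(m, 1) for g, m in overall.items()})
```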

Session 10

Spatial

Analysis


SPATIAL ANALYSIS

General Uses

• It is an extension of the single-vector repeated measures analysis.

• Incorporates information from physical positions (x and y coordinates).

• Effect: improves estimates (BLUPs) and allows better control of errors. Hence, it can increase heritability and genetic gains.

• More efficient analysis (in the presence of spatial correlation), as it ‘borrows’ information from neighbours.

• ASReml can handle regular or irregular grids.

• Can be used for unreplicated trials!

Difficulties

• At present it is more of an ‘art’ that requires evaluating several options.

• Requires knowledge of the position of each experimental unit (e.g. plant or plot).

• Additional variance components need to be estimated (which can cause convergence problems).

SPATIAL ANALYSIS

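• Gradients or Trends

Linear trends

Polynomial functions, e.g. $f(x_c, y_c) = \beta_0 + \beta_1 x_c + \beta_2 y_c + \beta_3 x_c^2 y_c + \beta_4 x_c y_c^2$

Row or Column effects (random).

• Patches

Incomplete Blocks

Spatial Error Structures, e.g. AR1 ⊗ AR1 + η, with

$$\mathrm{Var}(e_{ij}) = \sigma^2 + \sigma^2_\eta \qquad \mathrm{Cov}(e_{ij}, e_{i'j'}) = \sigma^2 \rho_x^{|h_x|} \rho_y^{|h_y|}$$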
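The separable AR1 ⊗ AR1 structure is a Kronecker product of a column AR1 matrix and a row AR1 matrix. A minimal Python sketch, assuming a small 3 x 3 grid, hypothetical values ρx = 0.6, ρy = 0.4, σ² = 1 and a nugget of 0.2 added to the diagonal, with plots indexed so that rows change fastest within columns. The covariance between two plots then equals σ² ρx^|hx| ρy^|hy|.

```python
def ar1_corr(n, rho):
    """AR1 correlation matrix: rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (direct) product of two square matrices stored as lists of lists."""
    na, nb = len(a), len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(na * nb)]
            for i in range(na * nb)]


def ar1xar1_cov(n_col, n_row, rho_x, rho_y, sigma2, nugget=0.0):
    """Residual covariance for AR1 (columns, x) x AR1 (rows, y) plus an optional nugget.

    Plots are indexed as k = col * n_row + row (rows change fastest).
    """
    c = kron(ar1_corr(n_col, rho_x), ar1_corr(n_row, rho_y))
    n = n_col * n_row
    return [[sigma2 * c[i][j] + (nugget if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Hypothetical values on a small 3 x 3 grid
V = ar1xar1_cov(3, 3, rho_x=0.6, rho_y=0.4, sigma2=1.0, nugget=0.2)
# Plot 0 is (col 0, row 0); plot 4 is (col 1, row 1): lags hx = 1, hy = 1
print(V[0][0], V[0][4])
```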

SPATIAL ANALYSIS

Strategy in ASReml (regular grid)

• Begin with a separable autoregressive error structure, AR1 ⊗ AR1: a first-order autoregressive model that assumes separate correlations for columns (x) and rows (y).

• Evaluate whether a nugget effect is required (i.e. fit !r units).

• Check variogram and incorporate additional random or fixed effects for

trends.

• Use a likelihood ratio test (LRT), BIC or AIC to compare models.

Strategy in ASReml (irregular grid)

• Begin with an isotropic exponential (i.e. IEXP) and then move to more complex models (e.g. AEXP).

• As before, evaluate if a nugget effect is required (i.e. !r units), check

variogram and incorporate additional random or fixed effects.
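Deciding whether the nugget is needed is a typical LRT between nested models with the same fixed effects. A minimal Python sketch for models differing by one variance parameter: D = 2(logL_full − logL_reduced) is compared with a chi-square on 1 df, whose tail probability is erfc(√(D/2)). The optional halving of the p-value follows the usual 50:50 chi-square mixture used when a variance component is tested at its boundary of zero. The log-likelihoods in the usage line are hypothetical.

```python
import math


def lrt(loglik_reduced, loglik_full, boundary=False):
    """LRT for nested models differing by ONE variance parameter (same fixed effects).

    D = 2 * (logL_full - logL_reduced) is compared with a chi-square on 1 df:
    P(X > D) = erfc(sqrt(D / 2)). With boundary=True (variance component tested
    at zero) the p-value is halved.
    """
    d = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p = math.erfc(math.sqrt(d / 2.0))
    return d, (p / 2.0 if boundary else p)


# Hypothetical REML log-likelihoods: AR1xAR1 without and with a nugget (!r units)
d, p = lrt(-120.40, -118.15, boundary=True)
print(f"D = {d:.2f}, p = {p:.3f}")
```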

VARIANCE STRUCTURES

Correlation/Spatial structures

Code Description No. of parameters

AR1 First order autoregressive 1

AR2 Second order autoregressive 2

ARMA Autoregressive and moving average 2

LVR Linear variance 1

IEXP Isotropic Exponential 1

AEXP Anisotropic Exponential 2


!S2==1 used to fix the R variance to 1.0

!f mv to include dummy missing values in sparse form

units includes nugget (microsite) random error



fac(v) forms a factor with the values of a continuous variable

spl(v,k) defines a spline model term for the variable v with k knots

SPATIAL ANALYSIS

Heritability in spatial models

• Traditional expression is only valid when distance between individuals is

assumed to be zero.

• Generic expression for spatial analyses:

$$h^2 = \frac{4\sigma^2_g}{\sigma^2_g + \sigma^2_e\,\rho_x^{|d_x|}\,\rho_y^{|d_y|} + \sigma^2_0}$$

• An alternative is to use the PEVs to approximate the mean parental heritability:

$$h^2_{PEV} = 1 - \frac{\mathrm{mean}(PEV)}{\sigma^2_g}$$

SPATIAL ANALYSIS

Comparing spatial models

• Use an LRT when models are nested and have the same fixed effect terms.

• Compare AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) to select among non-nested models (but with the same fixed effect terms).

• Use h²_PEV to compare among different models.

• Calculate one of the proposed R² expressions for mixed models.

t number of variance parameters in the model

v residual degrees of freedom, v = n − p

AIC = −2×logL + 2×t

BIC = −2×logL + t×log(v)
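Both criteria are simple functions of the REML log-likelihood reported by ASReml. A minimal Python sketch, assuming the natural logarithm in BIC, applied to the traditional (non-spatial) model reported later in this section: LogL = −55.4337 with 252 residual df and t = 5 variance parameters (REP.ROW, REP.COL, FEMALE, REP.PLOT and the residual). Its values should not be compared with the spatial model, because that model adds fac(Y) and fac(X) as fixed effects.

```python
import math


def aic(loglik, t):
    """AIC = -2 logL + 2t (t = number of variance parameters)."""
    return -2.0 * loglik + 2.0 * t


def bic(loglik, t, v):
    """BIC = -2 logL + t log(v) (v = residual degrees of freedom, natural log)."""
    return -2.0 * loglik + t * math.log(v)


loglik, t, v = -55.4337, 5, 252
print(round(aic(loglik, t), 2), round(bic(loglik, t, v), 2))
```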

Example: /Day2/Spatial/ROWCOL.TXT

SPATIAL TRIAL

ID REP ROW COL PLOT TREE FEMALE X Y YA

1 2 4 1 14 2 4 1 1 8.628352

2 2 4 1 14 1 4 1 2 7.718902

3 2 3 1 26 2 7 1 3 8.041164

4 2 3 1 26 1 7 1 4 9.593278

5 2 2 1 62 2 16 1 5 8.739841

6 2 2 1 62 1 16 1 6 8.456119

7 2 1 1 50 2 13 1 7 9.557565

8 2 1 1 50 1 13 1 8 10.639179

9 1 4 1 1 2 1 1 9 9.938713

10 1 4 1 1 1 1 1 10 8.332414

11 1 3 1 53 2 14 1 11 10.495654

12 1 3 1 53 1 14 1 12 10.130853

13 1 2 1 37 2 10 1 13 11.983712

14 1 2 1 37 1 10 1 14 12.080121

15 1 1 1 33 2 9 1 15 11.203263

16 1 1 1 33 1 9 1 16 10.757546

17 2 4 1 14 4 4 2 1 9.797591

18 2 4 1 14 3 4 2 2 9.206996

19 2 3 1 26 4 7 2 3 8.786462

...

An experiment was established to evaluate a group of open-pollinated families. The experiment was a row-column design with 4 replicates. The plants were arranged in a 16x16 grid, and it is of interest to rank the female parents based on the response yield (YA) by fitting a spatial model.

SPATIAL TRIAL

!RENAME !ARGS 1 2

Genetic Spatial trial

ID

REP !I

ROW !I

COL !I

PLOT !I

TREE

FEMALE !A

X 16 # X coordinate

Y 16 # Y coordinate

YA

ROWCOL.TXT !SKIP 1 !MAXIT 40 !DISPLAY 15 !DOPART $A

!PART 1

YA ~ mu REP !r REP.ROW REP.COL FEMALE REP.PLOT !f mv

1 2 0

16 0 ID

16 0 ID

!PART 2

YA ~ mu REP fac(Y) fac(X) !r REP.ROW REP.COL FEMALE REP.PLOT !f mv

1 2 0

16 X AR1 0.3

16 Y AR1 0.3

Example: /Day2/Spatial/Spatial_.as

Interpreting variograms

SPATIAL TRIAL

Traditional Analysis

Spatial Analysis

SPATIAL TRIAL

LogL=-55.4337 S2= 0.40323 252 df


REP.ROW 16 16 0.447880 0.180596 1.96 0 P

REP.COL 16 16 0.144506 0.582684E-01 1.32 0 P

FEMALE 16 16 0.260711 0.105125 1.81 0 P

REP.PLOT 64 64 0.105548 0.425594E-01 0.96 0 P

Variance 256 252 1.00000 0.403225 9.80 0 P

LogL=-61.9450 S2= 0.41594 224 df


REP.ROW 16 16 0.245284 0.102024 1.20 0 P

REP.COL 16 16 0.101193E-06 0.420904E-07 0.00 0 B

FEMALE 16 16 0.279467 0.116242 2.02 0 P

REP.PLOT 64 64 0.503325E-07 0.209354E-07 0.00 0 B

Variance 256 224 1.00000 0.415943 8.85 0 P



Wald F statistics


11 mu 1 13.8 6143.27 <.001

2 REP 3 5.6 4.27 0.062

12 fac(Y) 14 14.8 3.33 0.014

13 fac(X) 14 43.1 1.60 0.119

SPATIAL ANALYSIS

BLUP values

               Traditional          Spatial
Female    BLUP     SE(BLUP)    BLUP     SE(BLUP)
1         -0.215   0.197       -0.277   0.189
2          0.204   0.197        0.191   0.190
3         -0.154   0.197       -0.129   0.188
4         -0.099   0.197       -0.207   0.189

Heritabilities

             Traditional   Spatial
Va           0.421         0.465
Vp           0.790         0.634
mean(PEV)    0.039         0.036
h2           0.532         0.733
h2pev        0.631         0.693
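The heritabilities in the table can be reproduced from the variance components listed earlier. A minimal Python sketch: Va = 4σ²_g (FEMALE component), Vp is the sum of all random components, and h²_PEV = 1 − mean(PEV)/σ²_g. The h²_PEV values differ slightly from the table (0.629 vs 0.631, 0.690 vs 0.693) because mean(PEV) is only printed to three decimals.

```python
def h2_components(sigma2_g, other_components):
    """Half-sib heritability: Va / Vp with Va = 4 * sigma2_g, Vp = sum of all components."""
    va = 4.0 * sigma2_g
    vp = sigma2_g + sum(other_components)
    return va, vp, va / vp


def h2_pev(mean_pev, sigma2_g):
    """PEV-based heritability: 1 - mean(PEV) / sigma2_g."""
    return 1.0 - mean_pev / sigma2_g


# Traditional: FEMALE, then REP.ROW, REP.COL, REP.PLOT, residual
va_t, vp_t, h2_t = h2_components(0.105125, [0.180596, 0.0582684, 0.0425594, 0.403225])
# Spatial: FEMALE, then REP.ROW, REP.COL, REP.PLOT, residual
va_s, vp_s, h2_s = h2_components(0.116242, [0.102024, 0.420904e-07, 0.209354e-07, 0.415943])

print(f"traditional h2={h2_t:.3f} h2pev={h2_pev(0.039, 0.105125):.3f}")
print(f"spatial     h2={h2_s:.3f} h2pev={h2_pev(0.036, 0.116242):.3f}")
```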

UNREPLICATED TRIALS (UR)

• Field experiments that allow testing several hundred genotypes with little or no replication.

• Useful for initial stages of genotype screening.

• Most treatments (with the exception of controls or checks) have a single replication.

• Checks are used for local control and to detect trends, and they allow estimation of the residual variance.

• Typically, augmented designs are the basis for unreplicated trials.

• Using too many check plots can be expensive.

• Checks should have a response similar to that of the test genotypes.

• Statistical analysis can be based on simple (e.g. RCBD) or spatial models (e.g. AR1 ⊗ AR1).

11 C2 24 112 23 69 C1 96 22 6 34 C1

85 101 48 C1 28 7 89 60 C2 108 74 56

47 C1 10 43 C2 16 52 5 38 33 C2 93

65 111 64 100 81 104 C2 78 C1 113 21 106

12 C2 44 68 42 C1 97 17 32 73 C1 35

25 C1 27 C2 15 88 29 4 53 C2 55 75

102 84 1 49 C1 61 70 C2 18 95 37 C1

46 86 C2 63 2 51 79 39 59 92 C2 57

66 13 C1 82 41 98 C2 90 C1 77 20 36

C1 45 83 87 C2 62 3 30 72 54 105 76

26 C2 9 14 50 8 40 C1 31 19 C2 C1

110 103 67 C1 99 80 C2 71 91 58 109 94


General recommendations

• More control plots improve the efficiency of UR experiments.

• Important gains in efficiency are achieved by using spatial analyses.

An unreplicated pepper trial was established to evaluate a total of 824 genotypes planted in single plots and arranged as an RCBD with 4 blocks. In addition, a total of 10 control genotypes were planted with 20 replications each (i.e. 5 replications per block). All these plots were arranged in a 32x32 grid, and the response variable yield, YD, was recorded. It is of interest to rank all the unreplicated genotypes.


Example: /Day2/UnRep/PEPPER.TXT

Gens Control Rep X Y YD

6 0 1 1 25 7.91

16 0 1 7 17 9.04

18 0 1 11 26 9.53

19 0 1 16 20 10.08

22 0 1 2 27 9.78

35 0 1 10 26 9.21

39 0 1 4 30 8.86

40 0 1 8 24 9.15

42 0 1 11 25 9.38

45 0 1 15 22 10.64

48 0 1 10 32 10.32

50 0 1 10 31 11.22

51 0 1 8 26 11.45

...


!RENAME !ARGS 1

Augmented Design

Gens 824 !I !SORT

Control 2 !I !SORT

Rep 4 !I

X 32

Y 32

YD

PEPPER.TXT !SKIP 1

!DOPART $A !MAXIT 50

!PART 1

YD ~ mu !r Rep Gens !f mv

1 2 0

32 0 ID

32 0 ID

!PART 2

YD ~ mu !r Rep Gens

1 2 0

32 X AR1 0.5

32 Y AR1 0.5

Example: /Day2/UnRep/Unrep_.as


Traditional Analysis

Spatial Analysis

LogL=-478.184 S2= 0.74805 1023 df


Rep 4 4 0.101193E-06 0.756971E-07 0.00 0 B

Gens 834 834 0.282634 0.211424 2.66 0 P

Variance 1024 1023 1.00000 0.748048 10.43 0 P

LogL=-468.587 S2= 0.77062 1023 df


Rep 4 4 0.101193E-06 0.779810E-07 0.00 0 B

Gens 834 834 0.238505 0.183796 2.48 0 P

Variance 1024 1023 1.00000 0.770617 11.04 0 P



Session 11

Generalized

Linear Mixed Models


GLMM

General Uses

• It corresponds to an extension of the linear mixed models to situations with a

distribution other than the Normal, typically, Binomial and Poisson.

• It needs the specification of the distribution, together with a link function that

connects the response to the explanatory variables of the linear model.

• For generalized linear models, parameters are estimated by maximum likelihood (MLE), which is iterative and can therefore run into convergence problems.

• For generalized linear mixed models, parameter estimation is based on an approximation to the MLE.

• Testing is done using an LRT, mainly by comparing mean deviances.

Difficulties

• Interpretation and calculation of genetic parameters are more difficult because estimates are on a different (link) scale.

• Convergence problems are common, and with unbalanced data biologically inconsistent estimates are frequent.

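BINOMIAL RESPONSES

General expression: $g(\mu) = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g}$

Link: logit: $\log_e\left(\frac{p}{1-p}\right) = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g}$

Back-transformed model: $p = \mu = \frac{\exp(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g})}{1 + \exp(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g})}$

Variance expression: $\mathrm{Var}(p_i) = \phi\,\frac{p_i(1-p_i)}{n_i}$

Note: n_i = 1 for binary data.

φ over- or under-dispersion parameter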

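POISSON RESPONSES

General expression: $g(\mu) = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g}$

Link: log: $\log_e(m) = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g}$

Back-transformed model: $m = \exp(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g})$

Variance expression: $\mathrm{Var}(m) = \phi\, m = \phi \exp(\mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{g})$

φ over- or under-dispersion parameter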

FITTING A GLMM


!BIN assumes a Binomial distribution for the response

!TOTAL specifies vector with the Binomial totals

!POISSON assumes a Poisson distribution for the response

!DISP k estimates the dispersion parameter, or fixes it at k

!LOGIT considers a logit link function

!PROBIT considers a probit link function

!AOD obtains the analysis of deviance table for fixed effects.

Alternatives

• Perform a transformation of the original data, and then back-transform

predictions.

• Assume a normal distribution (by the CLT), whenever values are relatively

large.

• Collapse data into a higher stratum (e.g. PLOT).

GLMM MODEL

Heritability in GLMM (Binomial)

• Calculation is not direct and it requires an approximation.

• Several alternatives are available in the literature

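Logit approach

$$h^2_{logit} = \frac{4\sigma^2_s}{\sigma^2_s + \sigma^2_e}, \quad \text{with } \sigma^2_e = \pi^2/3$$

Distributional approach

$$h^2_{Bin} = \frac{4\sigma^2_s}{\sigma^2_s + p(1-p)}$$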
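On the logit scale the residual variance is fixed at π²/3 ≈ 3.29, the variance of the standard logistic distribution. A minimal Python sketch of the logit approach for a sire model (4σ²_s in the numerator) and, as an assumption for animal models where the additive variance σ²_a is fitted directly, h² = σ²_a/(σ²_a + π²/3). The usage line plugs in the INDIV variance (0.575366) from the salmon example that follows.

```python
import math

# Variance of the standard logistic distribution: residual variance on the logit scale
SIGMA2_E_LOGIT = math.pi ** 2 / 3


def h2_logit_sire(sigma2_s):
    """Sire (half-sib) model: h2 = 4 * s2_s / (s2_s + pi^2 / 3)."""
    return 4 * sigma2_s / (sigma2_s + SIGMA2_E_LOGIT)


def h2_logit_animal(sigma2_a):
    """Animal model (additive variance fitted directly): h2 = s2_a / (s2_a + pi^2 / 3)."""
    return sigma2_a / (sigma2_a + SIGMA2_E_LOGIT)


print(round(SIGMA2_E_LOGIT, 3))             # 3.29
print(round(h2_logit_animal(0.575366), 3))  # 0.149
```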

Example: /Day2/GLMM/SALMONAB.TXT

BINOMIAL MODEL

A salmon breeding program evaluated a total of 933 records of fish originating from 124 families. The objective is to select individuals that will constitute the parents for the next generation. The response variables are MARKETA and MARKETB, which are binary responses that indicate whether a given individual qualifies for a given market category. The linear model to fit should consider the full pedigree and the factor SEX as a covariate.

INDIV Sire Dam DaysM Sex MarketA MarketB

1001 564 727 741.46 1 1 1

1002 564 727 500.09 2 1 1

1003 564 727 495.07 1 1 1

1004 564 727 506.25 2 0 0

1005 564 727 593.21 2 1 1

1006 564 727 671.1 1 1 1

1007 564 727 523.48 1 1 1

1008 564 727 531.33 1 1 1

1009 564 727 446.02 2 1 0

1010 564 727 599.2 1 1 1

1011 564 727 509.38 2 1 1

1012 564 727 643.45 2 1 1

1013 607 707 711.68 1 1 1

...

!RENAME !ARGS 1

Breeding Program Salmon

INDIV 2040 !P !SORT

SIRE 115 !I

DAM 124 !I

DAYSM

SEX 2 !I

MARKETA

MARKETB

PEDIND.TXT !SKIP 1 !MAKE

SALMONAB.TXT !SKIP 1

!MAXIT 40 !DISPLAY 2 !FCON !DOPART $A

!PART 1

MARKETA !BIN !AOD ~ mu SEX !r INDIV

predict INDIV

!PART 2

MARKETA ~ mu SEX !r INDIV

Example: /Day2/GLMM/GLMFish_.as

BINOMIAL MODEL

Interpreting output

BINOMIAL MODEL

Analysis of Deviance Table for MARKETA

Source of Variation df Deviance Derived F

SEX 1 9.20 15.964

Deviance from GLM fit 931 536.33

Variance heterogeneity factor [Deviance/DF] 0.58

Notice: The Derived F is calculated assuming 931 degrees of freedom

which will usually be a false assumption under a mixed model.

The Analysis of Variance below is of the 'working' variable.



INDIV 6.68 0.575366 1.0


INDIV 1380 1380 0.575366 0.575366 1.83 0 P

Variance 933 931 1.00000 1.00000 0.00 0 F

Wald F statistics

Source of Variation NumDF DenDF_con F-inc F-con M P-con

8 mu 1 162.4 276.33 134.54 . <.001

5 SEX 1 931.0 4.36 4.36 A 0.038

GLMM MODEL

Heritability

h²_logit = σ²_s / (σ²_s + σ²_e), with σ²_s = 0.5754 (INDIV) and σ²_e = π²/3 ≈ 3.290

Predictions

Predicted values of MARKETA

The SIMPLE averaging set: SEX

INDIV Logit_value Stand_Error Ecode Retransformed_value approx_SE

501 2.0548 0.7383 E 0.8864 0.0978

502 2.2073 0.7194 E 0.9009 0.0851

503 1.8882 0.7264 E 0.8686 0.1069

504 2.0722 0.7363 E 0.8882 0.0964

505 2.1586 0.7255 E 0.8965 0.0891

506 2.3017 0.7438 E 0.9090 0.0830

507 2.4341 0.7256 E 0.9194 0.0728

508 1.8930 0.7242 E 0.8691 0.1062

509 2.0337 0.7414 E 0.8843 0.0998

510 1.8163 0.7330 E 0.8601 0.1130

511 2.3666 0.7343 E 0.9142 0.0778

512 2.0696 0.7365 E 0.8879 0.0966

513 2.0390 0.7408 E 0.8848 0.0993

...
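The Retransformed_value column is the inverse logit of Logit_value. A minimal Python sketch that reproduces it for a few of the individuals listed above; the approx_SE column is not reproduced here. The function uses a numerically stable form so that very negative logits do not overflow.

```python
import math


def inv_logit(eta):
    """Back-transform from the logit scale: p = exp(eta) / (1 + exp(eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)          # stable for large negative eta
    return z / (1.0 + z)


logits = {501: 2.0548, 502: 2.2073, 507: 2.4341}
for indiv, eta in logits.items():
    print(indiv, round(inv_logit(eta), 4))
```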

Session 12

Genomic Selection

In ASReml


• Genetic improvement aims to select the best individuals for the production

and breeding populations. However, traditional breeding is a long and

expensive process, with many traits difficult to measure.

• More than 20 years ago, molecular markers held the promise of aiding breeders in selection through Marker Assisted Selection (MAS). To perform MAS, a QTL or association genetics analysis was required.

• MAS worked in only a few situations, where a marker-QTL association explained a significant portion of the variance, mainly from single QTLs with large effects.

• However, most traits of interest in breeding programs are complex quantitative traits, controlled by a large number of genes.

• Meuwissen et al. (2001) proposed using all markers simultaneously as random effects to predict genetic performance (a.k.a. Genomic Selection).

RATIONALE

• Construct prediction models using the phenotypes of the current breeding population and molecular markers that capture most of the quantitative variation.

[Figure: Supplementary Figure 1, histograms of the diagonal and off-diagonal elements of the genetic relationship matrix (raw and adjusted estimates) and density of Z-scores. Source: Nature Genetics: doi: 10.1038/ng.608]

Prediction model construction: quantitative phenotypic information + genotypic information (molecular markers)

$$\mathrm{BV} = \mathbf{1}\mu + \sum_{j=1}^{p} \mathbf{W}_j m_j + \mathbf{e}$$

GENOMIC SELECTION

• Future individuals are genotyped, and their genotypes are used as input to the prediction models to select superior genotypes in the next cycles.


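Genotypes of generation i (molecular markers) → prediction → selection of generation i+1 → deployment

$$\mathrm{BV}_F = \sum_{j=1}^{p} W_{jF}\, m_j$$

where $W_{jF}$ is the genotype of future individual $F$ at marker $j$.

GENOMIC SELECTION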
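Once the marker effects have been estimated in the training population, a future candidate needs only its genotypes to receive a predicted breeding value. The sketch below assumes that the estimated effects are held in a list and that genotypes are coded 0/1/2 in the same marker order. The function names, IDs and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict_bv(W_future: Sequence[Sequence[float]], m: Sequence[float]) -> List[float]:
    """BV_F = sum_j W_jF * m_j for every genotyped (unphenotyped) candidate F."""
    return [sum(w * mj for w, mj in zip(row, m)) for row in W_future]


def select_top(ids: Sequence[str], bv: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """Return the k candidates with the highest predicted BV."""
    ranked = sorted(zip(ids, bv), key=lambda t: t[1], reverse=True)
    return ranked[:k]


m_hat = [0.5, -0.25, 1.0]                      # estimated marker effects (hypothetical)
W_future = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]   # genotypes of candidates F1..F3 (hypothetical)
bv = predict_bv(W_future, m_hat)
print(select_top(["F1", "F2", "F3"], bv, k=2))  # → [('F1', 1.75), ('F3', 1.5)]
```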

• Shorten the breeding generation interval (e.g. perennials, cattle).

• Decrease the cost of testing (e.g. cattle, maize).

• Screen a larger number of genotypes without field testing, thus increasing the selection pressure (e.g. maize, other cereals).

• Predict performance for traits that are difficult and/or expensive to measure (e.g. cattle, salmon).

• Predict performance for disease traits without having to challenge (and possibly lose) the germplasm (all species).

• Can be used regardless of the genetic architecture of the trait.

Note

• To apply GS successfully, the constructed models need to predict genetic performance accurately.

BENEFITS OF GS

Accuracy depends on:

• The level of linkage disequilibrium (LD) between the markers and the QTL (which depends on the effective population size and the genotyping density).

• The number of individuals with both phenotypes and genotypes in the reference population (training set) from which the marker effects are estimated.

• The heritability of the trait in question or, if deregressed breeding values are used (clonal means or progeny testing), the reliability of these breeding values.

• The distribution of QTL effects, e.g. the number of loci involved.

• The quality of the phenotyping used to construct the prediction model.
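The deterministic approximation of Daetwyler et al. (2008), $r = \sqrt{N h^2 / (N h^2 + M_e)}$, shows how several of these factors work together. In this formula, N is the number of training individuals, h² is the heritability and M_e is the effective number of independent chromosome segments, which increases with the effective population size. The approximation comes from outside these notes and assumes that the markers capture all of the genetic variance. In the minimal sketch below, M_e is a user-supplied input, and the function name is hypothetical.

```python
import math


def expected_gs_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Deterministic GS accuracy approximation (Daetwyler et al. 2008).

    n_train: individuals with phenotypes and genotypes in the training set
    h2:      heritability of the trait
    m_e:     effective number of independent chromosome segments
    """
    if n_train <= 0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_train > 0, 0 < h2 <= 1 and m_e > 0")
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


for n in (500, 1000, 4000):
    print(n, round(expected_gs_accuracy(n, h2=0.3, m_e=1000), 3))
```

Accuracy increases with the size of the training set and with heritability, and it decreases as M_e (and therefore N_e) grows.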

GENOMIC SELECTION

• BLUP-based: G-BLUP, RR-BLUP, RR-BLUP_B

• Bayes-based: BayesA, BayesB, BayesCπ, BayesR

• LASSO-based: Bayesian Lasso Regression, Improved Lasso

• Semi-parametric regression: RKHS

• Non-parametric: Support Vector Machines, Neural Networks

• Others...

Meuwissen et al. 2001; Habier et al. 2011; De los Campos et al. 2009; Legarra et al. 2011; Gianola et al. 2006; Long et al. 2011; Gianola et al. 2011

ANALYTIC METHODS FOR GS

• Genomic BLUP (GBLUP) is a Genomic Selection method that uses the same framework as BLUP analysis, but replaces:

– the numerator relationship matrix (A), derived from the pedigree, with

– the realized relationship matrix (GA), derived from molecular markers.

• GA is also known as the observed relationship matrix or genomic matrix.

GBLUP

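Example:

• If the markers capture all the genetic variation, then we can assume that:

$$\mathbf{a} = \mathbf{W}\mathbf{m}$$

where $\mathbf{W}$ contains the marker genotypes (coded 0, 1, 2; one row per individual, one column per marker) and $\mathbf{m}$ is the vector of marker effects.

• If we also assume:

$$V(\mathbf{m}) = \mathbf{I}\sigma^2_m$$

• Then we get:

$$V(\mathbf{a}) = \mathbf{W}\mathbf{W}'\sigma^2_m$$

which is a covariance matrix for the individual breeding values $\mathbf{a}$.

FROM MARKERS TO GA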

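$$V(\mathbf{a}) = \mathbf{W}\mathbf{W}'\sigma^2_m$$

• Ideally, we want to model this covariance within the same classical Linear Mixed Model framework. It would therefore be desirable to express this matrix in terms of $\sigma^2_a$.

• Recall that, with $p_i$ and $q_i = 1 - p_i$ the allele frequencies of SNP $i$,

$$\sigma^2_a = \sum_{i=1}^{\text{ALL SNPs}} 2p_iq_i\,\sigma^2_{m_i}$$

Assuming a common marker variance ($\sigma^2_{m_i} = \sigma^2_m$ for all SNPs), this gives:

$$\sigma^2_m = \frac{\sigma^2_a}{2\sum_{i=1}^{\text{ALL SNPs}} p_iq_i}$$

• By replacing $\sigma^2_m$:

$$V(\mathbf{a}) = \frac{\mathbf{W}\mathbf{W}'}{2\sum_{i} p_iq_i}\,\sigma^2_a = \mathbf{G}_A\,\sigma^2_a$$

FROM MARKERS TO GA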
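The final expression for GA can be computed directly from the genotypes. The sketch below assumes 0/1/2 coding with individuals in rows and estimates the allele frequencies p_i from the same sample. VanRaden (2008, method 1) applies the same scaling to W after centring each column by 2p_i. The `center` flag switches between that centred version and the uncentred W written in the derivation above. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def allele_freqs(W: Matrix) -> List[float]:
    """p_i: frequency of the allele counted in W (coded 0/1/2), one per marker column."""
    n = len(W)
    return [sum(row[j] for row in W) / (2.0 * n) for j in range(len(W[0]))]


def genomic_relationship(W: Matrix, center: bool = True) -> Matrix:
    """G_A = Z Z' / (2 * sum_i p_i q_i).

    center=True : Z = W - 2p (column-centred, VanRaden 2008 method 1)
    center=False: Z = W      (uncentred, as in the derivation above)
    """
    p = allele_freqs(W)
    denom = 2.0 * sum(pi * (1.0 - pi) for pi in p)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    Z = [[(w - 2.0 * pi) if center else float(w) for w, pi in zip(row, p)] for row in W]
    n = len(Z)
    return [[sum(a * b for a, b in zip(Z[r], Z[c])) / denom for c in range(n)]
            for r in range(n)]


W = [[0, 1], [2, 1], [1, 0]]  # 3 individuals x 2 markers (hypothetical)
for row in genomic_relationship(W):
    print([round(g, 3) for g in row])
```

With centring, the elements of GA sum to zero. Real SNP panels have thousands of markers, so this loop-based version is only meant for illustration.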

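$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{a} + \mathbf{e}$$

where $\boldsymbol{\beta}$ is the vector of fixed effects, $\mathbf{b}$ is a vector of other random effects and $\mathbf{a}$ is the vector of random additive effects (i.e. BVs), with $\mathbf{a} \sim N(\mathbf{0}, \mathbf{G}_A\sigma^2_a)$.

Note:

• The variance-covariance matrix of the additive effects (GA) is now derived from molecular markers, and it replaces the pedigree-based A matrix.

ANIMAL MODEL GBLUP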


GBLUP

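$$\mathbf{A} = \begin{bmatrix} 1 & 0.5 & 0.25 & 0 \\ 0.5 & 1 & 0.25 & 0 \\ 0.25 & 0.25 & 1 & 0.25 \\ 0 & 0 & 0.25 & 1 \end{bmatrix} \qquad \mathbf{G}_A = \begin{bmatrix} 0.98 & 0.42 & 0.23 & 0.02 \\ 0.42 & 0.99 & 0.26 & 0.01 \\ 0.23 & 0.26 & 1.03 & 0.20 \\ 0.02 & 0.01 & 0.20 & 0.99 \end{bmatrix}$$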


ADVANTAGES AND CONSIDERATIONS

• Using GBLUP instead of pedigree-based BLUP has been shown to better separate the genetic variation from the environmental variation.

• The A matrix is derived from the infinitesimal model and represents an average (expected) relationship.

• The relationship matrix derived from the markers is more informative because its relationship estimates include the Mendelian sampling. For example, full sibs have an expected relationship of 0.5, but their realized relationships vary around this value.

• Finally, GA is unbiased with respect to the pedigree relationships: E(GA) = A.

GBLUP


ADVANTAGES AND CONSIDERATIONS (cont.):

• GBLUP uses the same framework as BLUP (Linear Mixed Models).

• Fewer mixed model equations need to be solved when fitting the model: one per individual rather than one per marker, which is fewer whenever there are more markers than individuals.

• GBLUP is equivalent to RR-BLUP, but it is simpler to implement.

• It allows the direct estimation of individual accuracies (i.e. from the SEP values found in the .sln files).

• It permits the simultaneous analysis of genotyped and non-genotyped individuals.
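An individual accuracy can be computed from the .sln output with a commonly used expression: $r_i = \sqrt{1 - \mathrm{PEV}_i / (g_{ii}\sigma^2_a)}$. Here $\mathrm{PEV}_i = \mathrm{SEP}_i^2$ is the prediction error variance, $\sigma^2_a$ is the estimated additive variance component and $g_{ii}$ is the diagonal element of GA for that individual ($1 + F_i$ in the pedigree case). The sketch below assumes that SEP and $\sigma^2_a$ are on the same scale, so the variance component must be used rather than a gamma ratio. The function name is hypothetical.

```python
import math


def accuracy_from_sep(sep: float, sigma2_a: float, g_ii: float = 1.0) -> float:
    """Accuracy r = sqrt(1 - PEV / (g_ii * sigma2_a)), with PEV = SEP**2.

    sep:      standard error of prediction of the BV (e.g. from the .sln file)
    sigma2_a: estimated additive variance component
    g_ii:     diagonal element of GA for this individual
    """
    reliability = 1.0 - sep ** 2 / (g_ii * sigma2_a)
    # Estimation or rounding error can push the reliability slightly below 0; clip it.
    return math.sqrt(max(reliability, 0.0))


print(round(accuracy_from_sep(sep=3.0, sigma2_a=25.0), 3))
```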

Problem:

• The GA matrix is usually not positive definite.

Solution:

• Bending the matrix, i.e. adding a small constant to its diagonal (e.g. diag(GA) + 0.00001).

• Blending the matrix (e.g. GA* = 0.99 GA + 0.01 A).

GBLUP

COMPUTING THE RELATIONSHIP MATRIX

• There are several algorithms to compute the GA matrix from SNP data:

• Hayes and Goddard (2008)

• VanRaden (2008), 2 methods

• Yang et al. (2010), human genetics

• Relationship matrices work well to model the variance-covariance of additive effects, provided that a large number of markers is used.

• Overall, the different algorithms for calculating GA do not differ considerably in their predictive ability.
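The WW'/(2Σp_iq_i) expression divides by a single scaling constant. VanRaden's (2008) second method instead standardizes each marker by its own expected variance 2p_iq_i and then averages over markers. The sketch below assumes 0/1/2 coding, individuals in rows and allele frequencies estimated from the same sample. Dropping monomorphic markers is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def vanraden2(W: Matrix) -> Matrix:
    """G = (1/m) * sum_i z_ji * z_ki / (2 p_i q_i)  (VanRaden 2008, method 2).

    W: individuals x markers, coded 0/1/2. Markers with 2 p_i q_i = 0 are
    skipped, and m counts only the markers actually used.
    """
    n, p_all = len(W), len(W[0])
    cols = []  # (centred column z_i, weight 1 / (2 p_i q_i)) per polymorphic marker
    for j in range(p_all):
        p = sum(row[j] for row in W) / (2.0 * n)
        pq2 = 2.0 * p * (1.0 - p)
        if pq2 > 0.0:
            cols.append(([row[j] - 2.0 * p for row in W], 1.0 / pq2))
    m = len(cols)
    if m == 0:
        raise ValueError("no polymorphic markers")
    return [[sum(z[r] * z[c] * w for z, w in cols) / m for c in range(n)]
            for r in range(n)]


for row in vanraden2([[0, 1], [2, 1], [1, 0]]):  # hypothetical genotypes
    print([round(g, 3) for g in row])
```

Because each marker is weighted by 1/(2p_iq_i), rare alleles contribute more under this method than under the single-constant scaling.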

GBLUP in ASReml

User-supplied special variance structures

• The relationship matrix (GA) is computed from the molecular markers with a chosen algorithm in other software (R, Fortran, etc.), and then supplied to ASReml.

• The GA matrix is supplied as a separate file in ASCII format.

• In the job file, it should be listed after the pedigree file but before the data file (up to 98 GA matrices can be supplied).

• The file extension is:

name.grm if the relationship matrix, GA, is provided.

name.giv if the inverse of the relationship matrix, GA⁻¹, is provided.

GBLUP in ASReml

G matrix format

• It can be in dense format (lower triangle, row-wise), in which case the !DENSE command must be specified, or

• It can be read in SPARSE format (the default): row, column, value (lower triangle, row-wise, with columns sorted within rows).

• All diagonal elements of the matrix must be included in the file (even 1s).

Options

!SKIP [n]: skip the first n lines of the file (e.g. a header line)

!DENSEGRM, !DENSEGIV

!SAVEGIV [f]: default dense format; use f = 1 for sparse format

Warning

• The number and order of levels must match exactly those of the associated factor read in the data (e.g. animalID).
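When GA is computed in other software, it can be written in the sparse row, column, value format described above. The sketch below assumes that GA is held as a dense symmetric matrix whose row order already matches the level order of the associated factor. Two further assumptions apply: exact-zero off-diagonal elements are omitted, and the optional header line is meant to be skipped with !SKIP 1. The function name is hypothetical.

```python
from typing import List, Sequence


def to_sparse_lines(G: Sequence[Sequence[float]], decimals: int = 6,
                    header: bool = True) -> List[str]:
    """Lower triangle of G, row-wise, columns sorted within rows: 'row col value'.

    Indices are 1-based. Diagonal elements are always written; exact-zero
    off-diagonal elements are omitted.
    """
    lines = ["Row Col G"] if header else []  # header line, to be skipped with !SKIP 1
    for i in range(len(G)):
        for j in range(i + 1):
            v = G[i][j]
            if i == j or v != 0.0:
                lines.append(f"{i + 1} {j + 1} {v:.{decimals}f}")
    return lines


G = [[1.023, 0.012], [0.012, 0.992]]  # hypothetical 2 x 2 GA
print("\n".join(to_sparse_lines(G, decimals=3)))
```

The joined lines can be saved with a .grm extension (or .giv for an inverse).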

GBLUP in ASReml

How to associate the G matrix with the genetic factor?

A. In the variance specification lines, e.g.

DAYSM ~ mu SEX !r INDIV
0 0 1
INDIV 1
INDIV 0 GIV1 0.2

B. Directly in the model, e.g.

DAYSM ~ mu SEX !r giv(INDIV,1) 0.12

Warning: The number and order of levels have to match the ones used for

the associated factor read in the data.

An experiment evaluating a total of 10 individuals from full-sib families of 4 sires and 4 dams. The objective is to fit a parental model (i.e. to select sires) that incorporates the molecular pedigree information.

GBLUP in ASReml

DATA.txt

INDIV Sire Dam Resp
1001 10 50 155
1002 10 60 121
1003 10 70 130
1004 20 50 141
1005 20 60 130
1006 20 70 162
1007 30 50 118
1008 30 60 108
1009 30 70 119
1010 40 80 143

PEDSIRE.txt

INDIV Sire Dam
10 1 0
20 2 0
30 2 0
40 1 0

Example: /GBLUP/

GBLUP in ASReml

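$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 & 0.25 \\ 0 & 1 & 0.25 & 0 \\ 0 & 0.25 & 1 & 0 \\ 0.25 & 0 & 0 & 1 \end{bmatrix} \qquad \mathbf{G}_A = \begin{bmatrix} 1.023 & 0.012 & -0.036 & 0.364 \\ 0.012 & 0.992 & 0.226 & 0.023 \\ -0.036 & 0.226 & 1.016 & 0.068 \\ 0.364 & 0.023 & 0.068 & 0.987 \end{bmatrix}$$

$$\mathbf{G}_A^{-1} = \begin{bmatrix} 1.130 & -0.020 & 0.073 & -0.421 \\ -0.020 & 1.062 & -0.237 & -0.001 \\ 0.073 & -0.237 & 1.046 & -0.093 \\ -0.421 & -0.001 & -0.093 & 1.175 \end{bmatrix}$$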

GMATRIX.grm

Row Col G
1 1 1.023
2 1 0.012
2 2 0.992
3 1 -0.036
3 2 0.226
3 3 1.016
4 1 0.364
4 2 0.023
4 3 0.068
4 4 0.987

GINVG.giv

Row Col GINV
1 1 1.130249244
2 1 -0.020490012
2 2 1.062319971
3 1 0.072807826
3 2 -0.2369711
3 3 1.045793666
4 1 -0.421368173
4 2 -0.0008723
4 3 -0.093379618
4 4 1.175023193

Rows and columns of the matrices correspond to sires 10, 20, 30 and 40.
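If the inverse is supplied instead (a .giv file), it can be computed in the same external software. The sketch below uses Gauss-Jordan elimination and assumes that GA is non-singular; a singular GA would need to be bent or blended first. Applied to the GMATRIX.grm values, it should reproduce the GINV values listed above up to rounding. For a large GA, a numerical library would normally be used, and the function name here is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def invert(M: Matrix) -> Matrix:
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                   # right half holds the inverse


G = [[1.023, 0.012, -0.036, 0.364],
     [0.012, 0.992, 0.226, 0.023],
     [-0.036, 0.226, 1.016, 0.068],
     [0.364, 0.023, 0.068, 0.987]]
Ginv = invert(G)
for i in range(4):          # lower triangle, row-wise, as in the .giv listing
    for j in range(i + 1):
        print(i + 1, j + 1, round(Ginv[i][j], 9))
```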

GBLUP in ASReml

GINV matrix:

!RENAME !ARGS 2
Evaluating GBLUP
  INDIV 10 !I
  Sire 4 !I
  Dam 3 !I
  Resp
GINVM.giv !SKIP 1
DATA.txt !SKIP 1 !DOPART $A

!PART 2 # Using GINVM.giv
Resp ~ mu !r giv(Sire,1) Dam
predict Sire

!PART 3 # Another way for GINV
Resp ~ mu !r Sire Dam
1 1 1
10 0 ID
Sire 1
Sire 0 GIV1 200
predict Sire

G matrix:

!RENAME !ARGS 4
Evaluating GBLUP
  INDIV 10 !I
  Sire 4 !I #!P #!I
  Dam 3 !I
  Resp
GMATRIX.grm !SKIP 1
DATA.txt !SKIP 1 !DOPART $A

!PART 4 # Using GMATRIX.grm

GBLUP in ASReml

!RENAME !ARGS 4
Evaluating GBLUP
  INDIV 10 !I
  Sire 4 !P #!I
  Dam 3 !I
  Resp
DUMMYPED.txt !MAKE !SKIP 1
GMATRIX6.grm !SKIP 1
DATA.txt !SKIP 1 !DISPLAY 7 !DOPART $A

!PART 5 # Doing Predictions GMATRIX6.grm

Predictions for ‘new’ individuals

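$$\mathbf{G}_A = \begin{bmatrix} 1.023 & 0.012 & -0.036 & 0.364 & 0.083 & 0.176 \\ 0.012 & 0.992 & 0.226 & 0.023 & 0.023 & 0.508 \\ -0.036 & 0.226 & 1.016 & 0.068 & -0.011 & 0.136 \\ 0.364 & 0.023 & 0.068 & 0.987 & 0.123 & 0.495 \\ 0.083 & 0.023 & -0.011 & 0.083 & 0.996 & 0.077 \\ 0.176 & 0.508 & 0.136 & 0.495 & 0.077 & 1.010 \end{bmatrix}$$

Rows and columns correspond to sires 10, 20, 30, 40, 50 and 60 (sires 50 and 60 have no phenotyped offspring in DATA.txt).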

GBLUP in ASReml

Sire Predicted_Value Standard_Error Ecode
10 135.8410 7.3084 E
20 141.4311 7.3654 E
30 120.1485 7.3634 E
40 137.4303 9.8927 E
50 134.5924 15.2820 E
60 139.5677 11.4333 E

SED: Overall Standard Error of Difference 12.58

Predictions for ‘new’ individuals


Source Model terms Gamma Component Comp/SE % C
Dam 4 4 0.318666 46.7809 0.48 0 P
giv(Sire,1) 6 6 1.14196 167.642 0.81 0 P
Variance 10 9 1.00000 146.802 1.41 0 P

GBLUP in ASReml

FINAL COMMENTS

• Modifications can be made to incorporate the observed relationships of parents and all offspring.

• Individuals with measurements correspond to the training population, and ‘new’ individuals in the GA matrix are treated as the prediction population.

• It is possible to combine pedigree data (A) with observed relationships (GA) into a single matrix. This allows individuals without molecular data to be considered.

• An observed dominance relationship matrix (GD) can also be incorporated to model dominance effects or higher-order interactions (e.g. A#D).

• Further understanding of the construction (and properties) of the GA matrix is required.
