DeltaGen: Quick start manual

agrubuntu.cloudapp.net/shiny-apps/PlantBreeding...


Dr. Zulfi Jahufer & Dr. Dongwen Luo

CONTENTS

Main operations tab commands
Uploading a data file
Matching variable identifiers
Data check
Univariate analysis
Pattern analysis (within univariate model option)
Univariate analysis - Two trait combination
Guide to calculating the cost of generating a GEBV - “Sample cost”
Estimation of genetic gain (G) and its simulation
Multivariate analysis
Using Plot
o Biplots (on raw data)
o Matrix plots (phenotypic correlation - on raw data)
MANOVA (additive variance/co-variance & correlation)
Smith-Hazel selection index
Pattern analysis for multiple traits
Trial designs
o Completely randomized
o Randomized complete block
o Factorial design
o Row & column design
o Row & column design with repeated checks
Save session and Quit

Please note that clicking on “Help” in the analysis screens will provide information on underlying theory & associated references.


Main operations tab commands

Clicking on any of the commands above will open dropdown menus.

Introduction

This window is opened when DeltaGen is started.

Trial Design

This command will open a screen that will provide a range of experimental designs that DeltaGen can generate. A full description of this command will be provided under experimental design, the last section in this manual.

Data Input: uploading a data file

Clicking on Data Input will open the screen shown on the next page.


Uploading a data file

The screen shows the data files opened.

Upload enables data files to be uploaded using Browse; Examples enables practice data sets within DeltaGen to be uploaded; Clipboard enables copied data to be uploaded (this option does not work in the online version of DeltaGen).

Follow STEPS 1 & 2 to upload a data set from an external file. The screen labels show where to click to upload external data files and the data file types accepted by DeltaGen.

! CSV files are preferred: save EXCEL data files in CSV format for uploading into DeltaGen.

! Missing values in the data matrix should be identified before uploading any data set. The dropdown menu provides 3 options: Empty, * or •.

! Data file names must not contain spaces between words.

Files can be saved in RData or CSV formats.
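The file-preparation notes above can be applied before opening DeltaGen. The following is a minimal Python sketch, not part of DeltaGen: the helper names `safe_file_name` and `to_csv_text`, the sample rows, and the choice of `*` as the missing-value symbol are assumptions made for illustration.

```python
import csv
import io


def safe_file_name(name):
    """Replace runs of whitespace in a file name with single underscores."""
    return "_".join(name.split())


def to_csv_text(rows, header, missing="*"):
    """Return CSV text in which every None is written as the missing-value symbol."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([missing if value is None else value for value in row])
    return buffer.getvalue()


# Hypothetical example data: the second record has no dry matter (DM) value.
header = ["Site", "Rep", "Line", "DM"]
rows = [["Rua", 1, "L001", 5.2], ["Rua", 2, "L001", None]]

csv_text = to_csv_text(rows, header)
file_name = safe_file_name("my trial data.csv")
```

Whichever symbol is written for missing values should be the one then chosen in the missing-value dropdown menu at upload.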

Matching variable identifiers (this step can be omitted when using the example data sets in DeltaGen)

After uploading a data set following STEPS 1 & 2, the column identifiers of variables in the data matrix have to be matched with those already defined in DeltaGen (Year, Season, Location, Replicates, Row, Column, Sample and Line), as shown in STEP 3.

Traits

STEP 3 - To match a DeltaGen variable with its associated column identifier in the data matrix, click on the relevant dropdown menu and choose the matching column name; e.g. by clicking on the dropdown menu for Location, column “Site” was selected. Similarly, “Rep” was selected for Replicates. If a variable is not in the uploaded data, it is left as “Null”.

STEP 4 - Clicking on Run will submit the data for analysis. You are now ready to check or analyse your data.
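The matching in STEP 3 amounts to a lookup from each DeltaGen variable to a column name, with “Null” for anything absent. The sketch below illustrates that idea only; the function `match_identifiers`, the example column list, and the rule that unmatched columns are treated as traits are assumptions, not DeltaGen's internal logic.

```python
DELTAGEN_VARIABLES = [
    "Year", "Season", "Location", "Replicates", "Row", "Column", "Sample", "Line",
]


def match_identifiers(csv_columns, chosen):
    """Map every DeltaGen variable to a column name, or to "Null" if not chosen."""
    mapping = {}
    for variable in DELTAGEN_VARIABLES:
        column = chosen.get(variable, "Null")
        if column != "Null" and column not in csv_columns:
            raise ValueError(f"{column!r} is not a column in the uploaded data")
        mapping[variable] = column
    return mapping


# Hypothetical uploaded columns; "Site" and "Rep" follow the example in STEP 3.
columns = ["Year", "Site", "Rep", "Line", "DM"]
mapping = match_identifiers(
    columns,
    {"Year": "Year", "Location": "Site", "Replicates": "Rep", "Line": "Line"},
)

# Assumption: columns not matched to a design variable are the traits.
traits = [c for c in columns if c not in mapping.values()]
```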

Data check: A graphical or tabular summary of the raw data is an optional data quality check before univariate or multivariate analysis.

The plot-type dropdown menu provides a range of plots (Histogram, Density, Scatter, Line, Box-plot) to illustrate the data.

First click on the X-variable, in this case DM (dry matter).

You can arrange the plots by defining the row and column layout; in the example presented using Histograms, the Locations are presented down rows and the dry matter in each of the 3 replicates within each location as columns.

Data check continued. The heat-map option from the dropdown menu under Pivot Table illustrates the actual values and spatial distribution of summer dry matter raw data across a field experiment based on a row-column experimental design consisting of 3 replicates. This can also be used to identify data entry errors.

Heat-map legend: High value, Missing data, Low value.

Clicking on these headings will show the associated data below.

All factors, e.g. Replicates, Column, Row, can be moved across: point to a factor, hold down the left mouse button and drag. This changes the configuration of the table.

Univariate analysis: Case study 1

After uploading the data, click on the “Models” command and select Univariate. This screen will open.

The Data Information panel provides a summary of the uploaded data.

The demonstration/practice data set consists of 107 entries of Perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.

The default settings for the linear mixed effects model are modelling and half sib family. If simulation of genetic gain is to be conducted, the choice of half sib (HS) or full sib (FS) family is important. Alternatively, if the analysis is not for estimation of genetic components of variance or is based on a fixed effects model, you can continue using the HS default option.

Simulation must be selected only after conducting the variance component analysis for HS or FS.

On opening the univariate analysis screen, the Primary Trait box will be at “Null”. Clicking on this box will open the dropdown menu with all the traits in the uploaded data set; in the example, NZGro (seasonal growth at 3 locations in New Zealand).

Fixed terms: clicking in the fixed terms box will open a dropdown menu that will enable you to select the appropriate factors in the data set (years, locations, seasons, in the example). Select “Null” if no fixed terms are to be included in the model.

Random terms: select the appropriate factors from the dropdown menu which opens in this box.

Traits for BLUP or BLUE estimates can be selected from the associated dropdown menus. Click Run to begin analysis.


Univariate analysis (Case study 1 continued): The linear model - One trait

Replicates nested within locations within seasons within years,

Lines

Line-by-year interaction,

Line-by-season interaction,

Line-by-year interaction

Select the Primary trait to be analysed (from our example data set, “NZGro” has been selected).

Select Fixed terms and their interactions as required (if an additional term that does not appear in the dropdown menu needs to be added, double click in the fixed terms box, enter the term and click on the “Add” button that appears).

Select Random terms and interactions as required (if an additional term that does not appear in the dropdown menu needs to be added, double click in the random terms box, enter the term and click on the “Add” button that appears).

Select Heritability if appropriate (in Case study 1, this was considered as Repeatability).

Select BLUP if lines are random or BLUE if fixed.

Click Run to begin analysis.

Univariate analysis (Case study 1 continued): linear mixed model analysis - output

As the lines were considered as random effects in the linear model and the BLUP estimate option was selected, clicking on the BLUP button has provided this analysis output: BLUP values for mean growth based on line performance across years, seasons and locations.

Results for Fixed effects

Genotypic variance (σ²g) among the 107 entries, with its associated ± standard error

Specifically for Case study 1, this estimate was considered as Repeatability.

Error CV of trial

Univariate analysis: Case study 2

The demonstration/practice data set consists of 90 half sib (HS) families of Perennial ryegrass (Lolium perenne L.) evaluated at 1 location over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.

Results for Fixed effects

Genetic variance (1/4 σ²A) among the 90 HS families, with its associated ± standard error

Narrow sense heritability (h²n)
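The factor of 1/4 in the genetic variance label reflects the standard quantitative-genetic expectation for half-sib families. Written out, and assuming non-inbred parents and negligible epistasis (an assumption stated here for illustration, not taken from DeltaGen's Help), the among-family variance component and the additive variance are related by:

```latex
\sigma^2_{HS} = \tfrac{1}{4}\,\sigma^2_A
\qquad\Longrightarrow\qquad
\sigma^2_A = 4\,\sigma^2_{HS}
```

Under that assumption, an among-family variance estimate is multiplied by 4 to express it as additive genetic variance.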

Pattern analysis (within Univariate model option) - Multi-location (more than 2 locations)

The demonstration/practice data set consists of 107 entries of Perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.

Locations: Rua, Ruakura; PN, Palmerston North; KERI, Kerikeri.

The first step in Pattern analysis is to select the Line-by-location interaction. This generates a two-way line-by-location BLUP matrix: BLUP estimates for each individual line within each location (in the example Rua, PN and KERI) will be generated.

Click on pattern analysis, then click Run.

Clicking on Pattern Analysis - Cluster generates Line groups based on performance across locations, and the associated dendrograms of locations and lines.

Location groups: 1, 2 & 3.

Locations: KERI, Rua, PN

Dendrograms of Location and Line grouping

Clicking on Pattern Analysis - PCA (Principal Component Analysis) generates a biplot based on PC1 and PC2, showing the line groups and individual line labels. The directional vectors are the locations.

Biplot labels: Line clusters; Directional vectors.

Univariate analysis (continued): The linear model - Two trait combination

The demonstration/practice data set consists of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using randomized complete block designs with 3 replicates. Data on 3 traits, dry matter yield (DMY), cell wall ethanol (CWE) and Klason lignin (KL), are included. Data file name: CaseStudy3 under Examples.

Analysis with the secondary trait included provides an opportunity to simultaneously estimate narrow sense heritability for each trait, and their genetic correlation. These outputs are then automatically integrated into the breeding strategy simulation models for estimation of Correlated Response to Selection of the primary trait based on secondary trait selection.

All the initial steps with regard to the fixed and random term models for the primary trait are similar to the single trait analysis.

For analysis of the Secondary trait:

Tick the box for secondary trait and select the trait from the dropdown menu (from our example data set, trait KL was selected).

Click the MANOVA box and select the terms in the dropdown menu to conduct a variance/covariance analysis.

Click Run.

Univariate analysis output - two trait (CWE/KL) combination - continued

Results from this analysis are similar to those from the single trait analysis, but also provide information on narrow sense heritability of the secondary trait as well as the genetic correlation between the two traits, CWE/KL.

Variance components for Random effects from primary trait analysis.

Narrow sense heritability of primary trait.

Listed below is a guide for calculating the cost of generating a single Genomic Estimated Breeding Value (GEBV), referred to as Sample Cost in GS simulation.

Step                                        Cost/sample
Genotyping                                  $53
    DNA isolation                           $7
    Library generation                      $9
    †DNA sequencing                         $37
SNP genotypes (bioinformatics)              $10
Prediction of GEBVs (statistical model)     $5
Total                                       $68

Other notes:

Assumes GBS as the genotyping method.

Sequencing uses an Illumina HiSeq 2500 with version 4 chemistry.

†Cost is for the 96-plex level and will change with the level of multiplexing. The suggested multipliers: (48-plex ×2), (192-plex ×0.5), (384-plex ×0.25).

‡Dodds, K.G.; McEwan, J.C.; Brauning, R.; Anderson, R.M.; van Stijn, T.C.; Kristjánsson, T.; Clarke, S.M. (2015). Construction of relatedness matrices using genotyping-by-sequencing data. BMC Genomics 16: 1047.
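The cost guide can be recomputed for other multiplexing levels. The sketch below is a small illustration using the per-sample costs and multipliers listed above; the function name `sample_cost` is hypothetical, and applying the multiplier to the DNA sequencing step only is an assumption based on where the † marker sits.

```python
# Per-sample costs in dollars at the 96-plex level, taken from the guide above.
BASE_COSTS = {
    "DNA isolation": 7,
    "Library generation": 9,
    "DNA sequencing": 37,
    "SNP genotypes (bioinformatics)": 10,
    "Prediction of GEBVs (statistical model)": 5,
}

# Suggested multipliers for the sequencing cost, relative to 96-plex.
SEQUENCING_MULTIPLIER = {48: 2.0, 96: 1.0, 192: 0.5, 384: 0.25}


def sample_cost(plex=96):
    """Return the cost of one GEBV, scaling only the sequencing step (assumption)."""
    multiplier = SEQUENCING_MULTIPLIER[plex]
    total = 0.0
    for step, cost in BASE_COSTS.items():
        total += cost * multiplier if step == "DNA sequencing" else cost
    return total


costs_by_plex = {plex: sample_cost(plex) for plex in SEQUENCING_MULTIPLIER}
```

At 96-plex this reproduces the $68 total in the guide; the value for the chosen level would be entered as Sample Cost in the GS simulation.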

Estimation of Genetic Gain (G) and its simulation

The data set in Case study 2 will be used to demonstrate application of three breeding strategies. The data set consists of 90 half sib (HS) families of Perennial ryegrass (Lolium perenne L.) evaluated at 1 location over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.

Clicking on simulation will open the breeding strategy window.

Clicking on the Strategy dropdown window will enable selection of any of the available breeding strategies.

The “Industry standard” can be the trait value of a commercial check or the mean of checks in the genetic family evaluation trial. Some may wish to use the long term average, across the target population of environments, of the best commercial cultivars. This value will provide a relative comparison (%) of the genetic gain estimated from family selection to an industry standard. The default value is 100. Any value can be entered, in any unit (mm, kg ha⁻¹, ...).

Estimation of Genetic Gain (G) and its simulation continued

The strategies are HS family based breeding models, including Genomic selection (GS). If Full Sib families, HS will change to FS. If two traits like CWE and KL, primary and secondary, respectively, are analysed, then models for Correlated response to selection will be available.

All the simulation variables have dropdown menus which provide a range of values to select from.

All estimated costs ($) should be entered manually.

Click Update every time a breeding strategy, simulation variable or cost ($) is changed. This will update G estimates and associated costs.

Screen labels:

These constants cannot be changed

Enter accuracy value manually

Inputs automatically transferred from linear model analysis

Gc, gain estimate per cycle

Ga, gain estimate per annum

% (relative to parental mean)

% (relative to Industry standard)
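The four output quantities (Gc, Ga and the two percentages) are related by simple arithmetic. The sketch below shows one plausible reading and is not DeltaGen's code: it assumes that gain per annum is gain per cycle divided by the years one cycle takes, and that each percentage is the per-cycle gain divided by the reference value. The function name `gain_summary` and all input numbers are hypothetical.

```python
def gain_summary(gain_per_cycle, years_per_cycle, parental_mean, industry_standard=100.0):
    """Summarise a genetic gain estimate (assumed relationships, see lead-in)."""
    return {
        "Gc": gain_per_cycle,
        # Assumption: gain per annum spreads one cycle's gain evenly over its years.
        "Ga": gain_per_cycle / years_per_cycle,
        "Gc % of parental mean": 100.0 * gain_per_cycle / parental_mean,
        "Gc % of industry standard": 100.0 * gain_per_cycle / industry_standard,
    }


# Hypothetical inputs in the trait's own units (e.g. kg per hectare).
summary = gain_summary(
    gain_per_cycle=12.0,
    years_per_cycle=3,
    parental_mean=240.0,
    industry_standard=300.0,
)
```

The definitions actually used by each breeding strategy are described under “Help” in the simulation screen.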

Multivariate analysis

To begin: click on the “Model” command and select Multivariate.

The demonstration/practice data (file name: CaseStudy3 in Examples; you need to first upload this file using Data Input) consists of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using randomized complete block designs at each location containing 3 replicates. Data on 3 traits, biomass dry matter yield (DMY), cell wall ethanol (CWE) and Klason lignin (KL), are included in the matrix.

Plot gives you options to generate a Biplot or a Matrix Plot of phenotypic correlation, based on raw data.

MANOVA (Multivariate analysis of variance) generates a variance and covariance matrix and genotypic or genetic correlation coefficients for the traits chosen from the dropdown menu in the Multiple traits box.

Clicking on Selection index activates a window that enables use of the Smith-Hazel index.

Clicking in the Multiple traits box will show you the list of traits in the uploaded data matrix, to be selected for multivariate analysis based on the three options: Plot, MANOVA and Selection Index.

Used for highlighting groups in the Plot option.

Using Plot

! This Biplot is based on raw data.

Matrix plot: phenotypic correlation based on raw data.

MANOVA

1) Click on MANOVA,

2) Click on Multiple traits box and choose traits,

3) Click on MANOVA terms box and choose the effects for the completely random linear model (keep this model simple by choosing main effects and only their two-way interactions),

4) Click Run.

Multivariate analysis output - estimates are genetic if the data are generated from HS or FS families.

Smith-Hazel selection index

1) Click on Selection Index,

2) Click on Multiple traits box and choose traits,

3) Manually enter the Index weightings,

4) Click on LME fixed terms box and select the fixed effects or leave as Null,

5) Click on LME random terms box and select the random effects,

6) Click on MANOVA terms box and choose the effects for the completely random linear model (keep this model simple by choosing main effects and only their two-way interactions),

7) Click on the Selection pressure box and choose the intensity of selection,

8) To estimate the genetic gain for each trait under selection (DMY, CWE, KL) tick G,

9) Click Run.

Help contains theory & references.

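Smith-Hazel index - Output window

The Smith-Hazel index equation:

$$[b] = [P]^{-1}[A][w]$$

where $[b]$ is the vector of index coefficients, $[P]$ the phenotypic variance-covariance matrix, $[A]$ the additive genetic variance-covariance matrix and $[w]$ the vector of index weightings. The output window shows $[b]$, $[P]^{-1}$ and $[A]$.

Smith-Hazel index (I): the genetic worth (breeding values) of the HS families.

Individual trait BLUPs

Gc, (%) gain estimate per selection cycle in units of measurement of each trait, at a 20% selection pressure.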
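The index equation can be worked through numerically. The following is a self-contained sketch for illustration, not DeltaGen's implementation: it solves [P][b] = [A][w] by Gauss-Jordan elimination rather than forming the inverse explicitly, and the two-trait matrices, weightings and trait values are invented numbers.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def smith_hazel_coefficients(P, A, w):
    """Index coefficients b = P^-1 A w, obtained by solving P b = A w."""
    return solve(P, mat_vec(A, w))


# Hypothetical two-trait example (not from any DeltaGen data set).
P = [[4.0, 1.0], [1.0, 3.0]]   # phenotypic variance-covariance matrix
A = [[2.0, 0.5], [0.5, 1.0]]   # additive genetic variance-covariance matrix
w = [1.0, 2.0]                 # index weightings

b = smith_hazel_coefficients(P, A, w)

# Index value for one family with hypothetical trait values 10.0 and 5.0.
index_value = sum(bi * xi for bi, xi in zip(b, [10.0, 5.0]))
```

Solving the linear system gives the same coefficients as multiplying by the inverse, and avoids numerical trouble when [P] is close to singular.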

Pattern analysis for Multiple Traits

Step 1, upload the Line-by-Trait mean data matrix into DeltaGen using Data Input. This example is based on the data file MultiTraitMatrix.csv found in Examples.

Step 2, click on Pattern Analysis.

Step 3, select the variables by clicking on them, and keep the standardized data option on.

Step 4, click on Run.

Cluster analysis will produce Line groups and a heat map with Line and Trait dendrograms.

The PCA BiPlot option will provide a graphical summary of the Line clusters and trait association (shown by the directional vectors).

Pattern analysis - output.

1, 2 & 3.

Line numbers

Dendrograms of Trait and Line grouping (heat-map axes: Traits, Lines)

Biplot labels: Line cluster groups; Directional vectors.

Magnification and quality of the contents of the biplot can be adjusted by moving the scale controllers.

The entries (lines, genotypes, ...) shown in the biplot can be shown as dots or labels by selecting the option in the dropdown menu.

Trial design instructions

Click to open design menu

Clicking on “Design Type” will display the range of trial designs available: completely randomized, randomized complete block, factorial and row-column (repeated spatial checks can also be included).

All these values can be entered manually.

Generating a Completely Randomized trial design: Example: generate a design for 6 treatments, each replicated 3 times. The total number of entries will therefore be 6×3 = 18. The row & column combinations could be: 2×9, 9×2, 6×3, 3×6.

Let’s generate a 2×9 design.

Click on Run: the data entry format sheet and the trial plan will be generated.

Each time Run is clicked, a new randomization is generated!

This trial design format can be saved as a CSV file.
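The completely randomized example can be reproduced in a few lines. This is a minimal sketch of the general technique (shuffle all plots, then cut them into rows), not DeltaGen's own generator; the function name, treatment labels and `seed` argument are assumptions for illustration.

```python
import random


def completely_randomized(treatments, replicates, n_rows, n_cols, seed=None):
    """Return an n_rows x n_cols plan with every treatment repeated `replicates` times."""
    plots = [t for t in treatments for _ in range(replicates)]
    if len(plots) != n_rows * n_cols:
        raise ValueError("rows x columns must equal treatments x replicates")
    rng = random.Random(seed)
    rng.shuffle(plots)  # every plot is randomized over the whole trial area
    return [plots[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


treatments = [f"T{i}" for i in range(1, 7)]  # 6 hypothetical treatment labels
plan = completely_randomized(treatments, replicates=3, n_rows=2, n_cols=9, seed=1)
```

Leaving `seed` as `None` gives a different plan on each call, which mirrors a new randomization being produced on every Run.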

Trial plan: Completely Randomized

Generating a Randomized Complete Block trial design: Example: generate a design consisting of 3 blocks (replicates) with 50 treatments each. Each treatment will appear once (1) in a replicate. As each treatment occurs only once in a replicate, 1 was entered. The row & column combinations per replicate: 5×10, 10×5, 2×25, 25×2.

Click on Run: the data entry format sheet and the trial plan will be generated.

Each time Run is clicked, a new randomization is generated!

Let’s generate a 5 row × 10 column per rep by 3 replicate design.
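For a randomized complete block design, each block holds every treatment once and is randomized independently. The sketch below also lists the candidate row and column layouts for a block; it illustrates the technique only, and the function names, labels and seed are hypothetical rather than taken from DeltaGen.

```python
import random


def layouts(n_plots):
    """Row x column pairs that fill n_plots exactly, excluding single-row/column ones."""
    return [(r, n_plots // r) for r in range(2, n_plots) if n_plots % r == 0]


def randomized_complete_block(treatments, n_blocks, n_rows, n_cols, seed=None):
    """Return one n_rows x n_cols grid per block, each a fresh shuffle of all treatments."""
    if len(treatments) != n_rows * n_cols:
        raise ValueError("rows x columns must equal the number of treatments")
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # randomization is restricted to within the block
        blocks.append([order[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)])
    return blocks


treatments = [f"T{i}" for i in range(1, 51)]  # 50 hypothetical treatment labels
options = layouts(len(treatments))
blocks = randomized_complete_block(treatments, n_blocks=3, n_rows=5, n_cols=10, seed=1)
```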


Trial plan: Randomized Complete Block

Generating a Factorial trial design: Example: generate a design for an experiment to determine the herbage dry weight response of 5 perennial ryegrass cultivars to 4 levels of application of a nitrogen fertilizer. The design has 4 replicates, each containing the 5×4 = 20 cultivar-by-nitrogen combinations.

Click on Run: the data entry format sheet and the trial plan will be generated.

The row & column combinations per replicate: 2×10, 10×2, 5×4, 4×5. Let’s generate a 5 row × 4 column per rep by 4 replicate design.
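The treatment list of a factorial design is the set of all factor-level combinations. The sketch below builds the 20 combinations of the example and shuffles them within each replicate; randomizing within replicates is an assumption made here, and the function name and seed are hypothetical. The label format follows the “Cv 5/Nfert 4” style used on the trial plan.

```python
import itertools
import random


def factorial_design(n_cultivars, n_fert_levels, n_replicates, seed=None):
    """Return one shuffled list of cultivar x nitrogen combinations per replicate."""
    combinations = [
        f"Cv {cv}/Nfert {level}"
        for cv, level in itertools.product(
            range(1, n_cultivars + 1), range(1, n_fert_levels + 1)
        )
    ]
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        order = list(combinations)
        rng.shuffle(order)  # assumption: combinations are randomized within a replicate
        replicates.append(order)
    return replicates


replicates = factorial_design(n_cultivars=5, n_fert_levels=4, n_replicates=4, seed=1)
```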


Trial plan: Factorial design (plots are labelled by cultivar and nitrogen level, e.g. Cv 5/Nfert 4)

Generating a Row & Column trial design: Example: generate a design for 50 treatments with 4 replicates. The total number of entries across all 4 replicates will be 200. The row & column combinations per replicate could be: 2×25, 25×2, 5×10, 10×5.

Let’s generate a 5×10 per rep by 4 replicate design. The total number of rows across all 4 replicates will be 20 (5 rows × 4 replicates).

Click on Run: the data entry format sheet and the trial plan will be generated.


Trial plan: Row and Column design (columns Col 1 to Col 10; rows Row 1 to Row 5 within each of the 4 replicates)

Generating a Row & Column trial design with repeated spatial checks: Example: generate a design for 80 treatments with 3 replicates having 2 checks with 4 repeats each in every replicate. The total number of entries per replicate will be 80 treatments plus 2 check entries, 82; with each check repeated 4 times, these occupy 80 + 2×4 = 88 plots per replicate.

Click on Run: the data entry format sheet and the trial plan will be generated.

Let’s generate an 8×11 (= 88 plots) per rep by 3 replicate design having 2 checks with 4 repeats each in every replicate. The total number of rows across all 3 replicates will be 24 (8 rows × 3 replicates).
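The plot count behind the 8×11 layout can be checked directly. The sketch below builds the entry list for one replicate and fills a grid with a plain shuffle; that shuffle is only a stand-in, since it does not reproduce the row-column constraints or the spatial placement of checks that a real design generator would apply. All names and labels are hypothetical.

```python
import random


def replicate_entries(n_treatments, checks, repeats_per_check):
    """List every plot entry in one replicate: each treatment once, each check repeated."""
    entries = [f"T{i}" for i in range(1, n_treatments + 1)]
    for check in checks:
        entries.extend([check] * repeats_per_check)
    return entries


n_rows, n_cols, n_replicates = 8, 11, 3
entries = replicate_entries(80, ["Check1", "Check2"], repeats_per_check=4)

# 80 treatments + 2 checks x 4 repeats = 88 plots, which fills 8 rows x 11 columns.
if len(entries) != n_rows * n_cols:
    raise ValueError("layout does not match the number of plots")

rng = random.Random(1)
shuffled = list(entries)
rng.shuffle(shuffled)  # stand-in randomization only, see lead-in
grid = [shuffled[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

total_rows = n_rows * n_replicates
```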


Trial plan: Row and Column design with 2 checks, each repeated 4 times within a replicate (columns Col 1 to Col 11; rows Row 1 to Row 8 within each of the 3 replicates)

Save session and Quit

Analysis reports can be saved as PDF, HTML and Word documents. Click on any of the three document format options, then select Download.

To quit DeltaGen, click on Quit App.