DeltaGen: Quick start manual
Dr. Zulfi Jahufer & Dr. Dongwen Luo
CONTENTS
Main operations tab commands
Uploading a data file
Matching variable identifiers
Data check
Univariate analysis
Pattern analysis (within univariate model option)
Univariate analysis – Two trait combination
Guide to calculating the cost of generating a GEBV – “Sample cost”
Estimation of genetic gain (ΔG) and its simulation
Multivariate analysis
Using Plot
o Biplots (on raw data)
o Matrix plots (phenotypic correlation, on raw data)
MANOVA (additive variance/co-variance & correlation)
Smith-Hazel selection index
Pattern analysis for multiple traits
Trial designs
o Completely randomized
o Randomized complete block
o Factorial design
o Row & column design
o Row & column design with repeated checks
Save session and Quit
Please note that clicking on “Help” in the analysis screens will provide information on underlying theory &
associated references.
Main operations tab commands
Clicking on any of the commands above will open dropdown menus.
Introduction
This window is opened when DeltaGen is started.
Trial Design
This command will open a screen that will provide a range of experimental designs that DeltaGen can generate. A full description of this command will be provided under experimental design, the last section in this manual.
Data Input: uploading a data file
Clicking on Data Input will open the screen shown on the next page.
Uploading a data file
Shows data files opened.
Upload enables data files to be uploaded using Browse; Examples enables practice data sets within DeltaGen to be uploaded; Clipboard enables copied data to be uploaded (this option does not work in the online version of DeltaGen).
Click to upload external data files.
Data file types accepted by DeltaGen.
! CSV files are preferred: save Excel data files in CSV format before uploading them into DeltaGen.
! Missing values in the data matrix should be identified before uploading any data set. The dropdown menu provides 3 options: Empty or * or •
! Data file names cannot contain spaces.
Follow STEPS 1 & 2 to upload a data set from an external file.
STEP 1
STEP 2
Files can be saved in RData or CSV formats.
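The file requirements above (CSV format, an explicit missing-value marker, no spaces in the file name) can be met before upload with a short script. The following is a minimal Python sketch, not part of DeltaGen; the function names, column names and data values are hypothetical, and "*" is used as the missing-value marker because it is one of the three options DeltaGen offers.

```python
import csv
import os
import tempfile

MISSING = "*"  # one of the three missing-value markers offered by DeltaGen


def safe_file_name(name):
    """Replace gaps between words so the file name contains no spaces."""
    return "_".join(name.split())


def write_deltagen_csv(rows, header, folder, name):
    """Write rows to a CSV file, marking empty or None cells with MISSING."""
    path = os.path.join(folder, safe_file_name(name))
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row in rows:
            writer.writerow([MISSING if cell in (None, "") else cell for cell in row])
    return path


# Hypothetical two-plot data set with one missing dry matter (DM) value.
header = ["Site", "Rep", "Line", "DM"]
rows = [["Rua", 1, "L001", 12.4], ["Rua", 2, "L001", None]]
out_path = write_deltagen_csv(rows, header, tempfile.mkdtemp(), "case study 1.csv")
```

The resulting file can then be selected with Browse, choosing * in the missing-value dropdown menu.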
Matching variable identifiers (this step can be omitted when using the example data sets in DeltaGen)
After uploading a data set following steps 1 & 2, the column identifiers of variables in the data matrix have to be matched with those already defined in DeltaGen (Year, Season, Location, Replicates, Row, Column, Sample and Line), as shown in STEP 3.
Traits
STEP 3 - To match a DeltaGen variable with an associated column identifier in the data matrix, click on the relevant dropdown menu and choose the matching column name; e.g. by clicking on the dropdown menu for Location, column “Site” was selected. Similarly, “Rep” was selected for Replicates. If a variable is not in the uploaded data, it is left as “Null”.
STEP 4 - Clicking on Run will submit the data for analysis. You are now ready to check or analyse your data.
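The matching step can be pictured as a lookup from each DeltaGen variable to a column in the uploaded file. The sketch below is illustrative only and does not reflect DeltaGen's internal code; the function name and the uploaded column names are hypothetical, and treating the unmatched columns as traits is an assumption.

```python
# The eight variables predefined in DeltaGen, as listed in the manual.
DELTAGEN_VARIABLES = ["Year", "Season", "Location", "Replicates",
                      "Row", "Column", "Sample", "Line"]


def match_identifiers(column_names, chosen):
    """Map each DeltaGen variable to a data column, or "Null" if not chosen."""
    mapping = {}
    for variable in DELTAGEN_VARIABLES:
        column = chosen.get(variable, "Null")
        if column != "Null" and column not in column_names:
            raise ValueError(f"{column!r} is not a column in the uploaded data")
        mapping[variable] = column
    return mapping


# Hypothetical column names in an uploaded file.
columns = ["Site", "Rep", "Line", "DM"]
mapping = match_identifiers(columns, {"Location": "Site", "Replicates": "Rep", "Line": "Line"})

# Assumption: columns not matched to a DeltaGen variable are candidate traits.
traits = [name for name in columns if name not in mapping.values()]
```

Here Location maps to "Site" and Replicates to "Rep", as in the manual's example, while Year, Season, Row, Column and Sample stay at "Null".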
Data check: a graphical or tabular summary of raw data is an optional data quality check before univariate or multivariate analysis.
The plot-type dropdown menu provides a range of plots (Histogram, Density, Scatter, Line, Box-plot) to illustrate the data.
First click on the X-variable, in this case DM (dry matter).
You can arrange the plots by defining the row and column layout; in the example presented using Histograms, the Locations are presented down rows and the dry matter in each of the 3 replicates within each location as columns.
Data check continued. The heat-map option, from the dropdown menu under Pivot Table, illustrates the actual values and spatial distribution of summer dry matter raw data across a field experiment based on a row-column experimental design consisting of 3 replicates. This can also be used to identify data entry errors.
Heat-map callouts: High value, Missing data, Low value.
Clicking on these headings will show the associated data below.
All factors, e.g. Replicates, Column, Row, can be moved across: point at a factor, hold down the left mouse button and drag. This changes the configuration of the table.
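The same row-by-column check can be reproduced outside DeltaGen when screening a file before upload. The following is a minimal Python sketch under stated assumptions: the records, the field names Row, Column and DM, and the helper names are hypothetical, and a single replicate is shown for brevity.

```python
def pivot(records, value_key="DM"):
    """Arrange plot values in a Row x Column grid; None marks a missing plot value."""
    n_rows = max(r["Row"] for r in records)
    n_cols = max(r["Column"] for r in records)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for r in records:
        grid[r["Row"] - 1][r["Column"] - 1] = r[value_key]
    return grid


def summarize(grid):
    """Locate missing cells and the lowest and highest observed values."""
    cells = [(value, i + 1, j + 1)
             for i, row in enumerate(grid) for j, value in enumerate(row)]
    missing = [(i, j) for value, i, j in cells if value is None]
    observed = [cell for cell in cells if cell[0] is not None]
    return {"missing": missing, "low": min(observed), "high": max(observed)}


# Hypothetical plot records for one replicate of a 2 x 2 layout.
records = [
    {"Row": 1, "Column": 1, "DM": 10.5},
    {"Row": 1, "Column": 2, "DM": 98.0},  # unusually high: possible data entry error
    {"Row": 2, "Column": 1, "DM": None},  # missing value
    {"Row": 2, "Column": 2, "DM": 11.2},
]
grid = pivot(records)
report = summarize(grid)
```

Each entry in `report` carries the value together with its row and column, so a suspect plot can be traced back to the field layout.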
Univariate analysis: Case study 1
After uploading the data, click on the “Models” command and select Univariate. This screen will open.
The Data Information panel provides a summary of the uploaded data.
The demonstration/practice data set consists of 107 entries of perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.
The default settings for the linear mixed effects model are modelling and half sib family.
If simulation of genetic gain is to be conducted, the choice of half sib (HS) or full sib (FS) family is important. Alternatively, if the analysis is not for estimation of genetic components of variance, or is based on a fixed effects model, you can continue using the HS default option.
Simulation must be selected only after conducting the variance component analysis for HS or FS.
On opening the univariate analysis screen, the Primary Trait box will be at “Null”. Clicking on this box will open the dropdown menu with all the traits in the uploaded data set; in the example, NZGro (seasonal growth at 3 locations in New Zealand).
Fixed terms: clicking in the fixed terms box will open a dropdown menu that will enable you to select the appropriate factors in the data set (years, locations and seasons in the example). Select “Null” if no fixed terms are to be included in the model.
Random terms: select the appropriate factors from the dropdown menu which opens in this box.
Traits for BLUP or BLUE estimates can be selected from the associated dropdown menus. Click Run to begin the analysis.
Univariate analysis (Case study 1 continued): The linear model - One trait
Replicates nested within locations within seasons within years,
Lines
Line-by-year interaction,
Line-by-season interaction,
Line-by-year interaction
Select the Primary trait to be analysed (from our example data set, “NZGro” has been selected).
Select Fixed terms and their interactions as required (if an additional term that does not appear in the dropdown menu needs to be added, double click in the fixed terms box, enter the term and click on the “Add” button that appears).
Select Random terms and interactions as required (if an additional term that does not appear in the dropdown menu needs to be added, double click in the random terms box, enter the term and click on the “Add” button that appears).
Select Heritability if appropriate (in Case study 1, this was considered as Repeatability).
Select BLUP if lines are random or BLUE if fixed.
Click Run to begin analysis.
Univariate analysis (Case study 1 continued): linear mixed model analysis output
As the lines were considered as random effects in the linear model and the BLUP estimate option was selected, clicking on the BLUP button provides this analysis output: BLUP values for mean growth based on line performance across years, seasons and locations.
Results for Fixed effects
Genotypic variance (σ²g) among the 107 entries
Associated ± standard error
Specifically for case study 1, this estimate was considered as Repeatability.
Error CV of trial
Univariate analysis: Case study 2
The demonstration/practice data set consists of 90 half sib (HS) families of perennial ryegrass (Lolium perenne L.) evaluated at 1 location over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.
Results for Fixed effects
Genetic variance (1/4 σ²A) among the 90 HS families
Associated ± standard error
Narrow sense heritability (h²n)
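The factor of 1/4 in the genetic variance label follows from the standard quantitative-genetic expectation for half-sib families. As a reminder only, and assuming non-inbred parents, random mating and negligible epistasis (assumptions not stated in this manual), the among-family variance, written here as \sigma^2_{HS} (a notation introduced for this note), relates to the additive variance \sigma^2_A as:

```latex
\sigma^2_{HS} = \tfrac{1}{4}\,\sigma^2_A
\quad\Longrightarrow\quad
\sigma^2_A = 4\,\sigma^2_{HS}
```

The Help button in the analysis screen remains the reference for the exact formulation DeltaGen uses.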
Pattern analysis (within Univariate model option): multi-location (more than 2 locations)
The first step in Pattern analysis is to select Line-by-location interaction. This generates a two-way line-by-location BLUP matrix: BLUP estimates for each individual line within each location (in the example Rua, PN and KERI).
Click on Pattern analysis.
Click Run.
The demonstration/practice data set consists of 107 entries of perennial ryegrass (Lolium perenne L.) evaluated at 3 locations over 3 years for seasonal growth. Data file name: CaseStudy 1 under Examples.
Locations:
Rua, Ruakura
PN, Palmerston North
KERI, Kerikeri
Clicking on Pattern Analysis – Cluster will result in generating
Line groups based on performance across locations and also
associated dendrograms of locations and lines.
Location groups: 1, 2 & 3.
Locations: KERI, Rua, PN
Dendrograms of Location and Line grouping
Clicking on Pattern Analysis – PCA (Principal Component Analysis) will generate a biplot based on PC1 and PC2, showing the line groups and individual line labels. The directional vectors are the locations.
Line clusters
Directional vectors
Univariate analysis (continued): The linear model - Two trait combination
The demonstration/practice data set consists of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using randomized complete block designs with 3 replicates. Data on 3 traits, dry matter yield (DMY), cell wall ethanol (CWE) and Klason lignin (KL), are included. Data file name: CaseStudy3 under Examples.
Analysis with the secondary trait included provides an opportunity to simultaneously estimate the narrow sense heritability of each trait and their genetic correlation.
These outputs are then automatically integrated into the breeding strategy simulation models for estimation of Correlated Response to Selection of the primary trait based on secondary trait selection.
All the initial steps with regard to the fixed and random term models for the primary trait are similar to the single trait analysis.
For analysis of the Secondary trait:
Tick the box for secondary trait and select the trait from the dropdown menu (from our example data set, trait KL was selected).
Click the MANOVA box and select the terms in the dropdown menu to conduct a variance/covariance analysis.
Click Run.
Univariate analysis output: two trait (CWE/KL) combination, continued
Results from this analysis are similar to those from the single trait analysis, but also provide information on the narrow sense heritability of the secondary trait as well as the genetic correlation between the two traits, CWE/KL.
Variance components for Random effects from primary trait analysis.
Narrow sense heritability of primary trait.
Listed below is a guide for calculating the cost of generating a single Genomic Estimated Breeding Value (GEBV), referred to as Sample Cost in GS simulation.
Step                                     Cost/sample
Genotyping                               $53
  DNA isolation                          $7
  Library generation                     $9
  †DNA sequencing                        $37
SNP genotypes (bioinformatics)           $10
Prediction of GEBVs (statistical model)  $5
Total                                    $68
The Genotyping cost is the sum of DNA isolation, library generation and DNA sequencing ($7 + $9 + $37 = $53); the total is Genotyping plus the two remaining steps ($53 + $10 + $5 = $68).
Other notes:
Assumes GBS as the genotyping method.
Sequencing uses an Illumina HiSeq 2500 with version 4 chemistry.
†Cost is for the 96-plex level and will change with the level of multiplexing. The suggested multipliers: (48-plex ×2), (192-plex ×0.5), (384-plex ×0.25)
‡Dodds, K.G.; McEwan, J.C.; Brauning, R.; Anderson, R.M.; van Stijn, T.C.; Kristjánsson, T.; Clarke, S.M. (2015). Construction of relatedness matrices using genotyping-by-sequencing data. BMC Genomics 16: 1047.
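The cost guide above can be recomputed for other multiplexing levels. The following is a minimal Python sketch; it assumes the suggested multipliers apply to the DNA sequencing step only (the step carrying the † footnote), and the function and variable names are hypothetical.

```python
# Per-sample costs ($) at the 96-plex level, taken from the cost guide.
COSTS_96_PLEX = {
    "DNA isolation": 7.0,
    "Library generation": 9.0,
    "DNA sequencing": 37.0,
    "SNP genotypes (bioinformatics)": 10.0,
    "Prediction of GEBVs (statistical model)": 5.0,
}

# Suggested sequencing-cost multipliers by multiplexing level.
SEQUENCING_MULTIPLIER = {48: 2.0, 96: 1.0, 192: 0.5, 384: 0.25}


def sample_cost(plex=96):
    """Cost of one GEBV; assumes only the sequencing step scales with multiplexing."""
    total = 0.0
    for step, cost in COSTS_96_PLEX.items():
        if step == "DNA sequencing":
            cost *= SEQUENCING_MULTIPLIER[plex]
        total += cost
    return total


costs_by_plex = {plex: sample_cost(plex) for plex in SEQUENCING_MULTIPLIER}
```

Under that assumption the 96-plex total reproduces the $68 in the guide, and the value for another level can be entered as the Sample Cost in the GS simulation.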
Estimation of Genetic Gain (ΔG) and its simulation
Clicking on the Strategy dropdown window will enable selection of any of the breeding strategies below:
The data set in Case study 2 will be used to demonstrate application of three breeding strategies. The data set consists of 90 half sib (HS) families of perennial ryegrass (Lolium perenne L.) evaluated at 1 location over 3 years for seasonal growth. Data file name: CaseStudy 2 under Examples.
Clicking on Simulation will open the breeding strategy window.
The “Industry standard” can be the trait value of a commercial check, or the mean of checks, in the genetic family evaluation trial. Some may wish to use the long term average, across the target population of environments, of the best commercial cultivars. This value provides a relative comparison (%) of the genetic gain estimated from family selection with an industry standard. The default value is 100. Any value can be entered, in the units of the trait (e.g. mm, kg ha⁻¹).
Estimation of Genetic Gain (ΔG) and its simulation continued
HS family based breeding models, including Genomic selection (GS).
All the simulation variables have dropdown menus which provide a range of values to select from.
All estimated costs ($) should be entered manually.
Click Update every time a breeding strategy, simulation variable or cost ($) is changed. This will update the ΔG estimates and associated costs.
These constants cannot be changed.
Enter the accuracy value manually.
Inputs are automatically transferred from the linear model analysis.
If Full Sib families are analysed, HS will change to FS. If two traits, such as CWE and KL (primary and secondary, respectively), are analysed, then models for Correlated response to selection will be available.
ΔGc, gain estimate per cycle
ΔGa, gain estimate per annum
% (relative to parental mean)
% (relative to Industry standard)
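The relationship between the gain outputs listed above can be illustrated with simple arithmetic. This is a hedged Python sketch, not DeltaGen's implementation: it assumes the percentage outputs are the gain divided by the reference value (parental mean or Industry standard) times 100, and that gain per annum is gain per cycle divided by the cycle length in years. The function names and all numbers are hypothetical.

```python
def relative_gain_percent(gain, reference):
    """Express a genetic gain estimate as a percentage of a reference value."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * gain / reference


def gain_per_annum(gain_per_cycle, years_per_cycle):
    """Assumed relation: annual gain is the per-cycle gain spread over the cycle length."""
    if years_per_cycle <= 0:
        raise ValueError("cycle length must be positive")
    return gain_per_cycle / years_per_cycle


# Hypothetical values in kg per hectare; none of these come from the case studies.
gain_cycle = 120.0         # gain estimate per selection cycle
industry_standard = 4000.0  # e.g. mean of commercial checks
cycle_years = 3

percent_of_standard = relative_gain_percent(gain_cycle, industry_standard)
annual_gain = gain_per_annum(gain_cycle, cycle_years)
```

With these inputs the gain per cycle is 3% of the industry standard and 40 kg per hectare per annum, which shows why the Industry standard should be entered in the same units as the trait.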
Multivariate analysis
To begin: click on the “Model” command and select Multivariate.
Plot gives you options to generate a Biplot or a Matrix Plot of phenotypic correlation, based on raw data.
MANOVA (Multivariate analysis of variance) generates a variance and covariance matrix and genotypic or genetic correlation coefficients for the traits chosen from the dropdown menu in the Multiple traits box.
Clicking on Selection index activates a window that enables use of the Smith-Hazel index.
Clicking in this box will show you the list of traits, in the uploaded data matrix, to be selected for multivariate analysis based on the three options: Plot, MANOVA and Selection Index.
Used for highlighting groups in the Plot option.
The demonstration/practice data (file name: CaseStudy3 in Examples; you need to first upload this file using Data Input) consists of 147 lines (half-sib families) of switchgrass (Panicum virgatum L.) evaluated across 2 locations over 2 years using randomized complete block designs at each location containing 3 replicates. Data on 3 traits, biomass dry matter yield (DMY), cell wall ethanol (CWE) and Klason lignin (KL), are included in the matrix.
Using Plot
! This Biplot is based on raw data.
Phenotypic correlation based on raw data.
MANOVA
1) Click on MANOVA,
2) Click on the Multiple traits box and choose traits,
3) Click on the MANOVA terms box and choose the effects for the completely random linear model (keep this model simple by choosing main effects and only their two way interactions),
4) Click Run.
Multivariate analysis output: estimates are genetic if the data are generated from HS or FS families.
Smith-Hazel selection index
1) Click on Selection Index,
2) Click on the Multiple traits box and choose traits,
3) Manually enter the Index weightings,
4) Click on the LME fixed terms box and select the fixed effects or leave as Null,
5) Click on the LME random terms box and select the random effects,
6) Click on the MANOVA terms box and choose the effects for the completely random linear model (keep this model simple by choosing main effects and only their two way interactions),
7) Click on the Selection pressure box and choose the intensity of selection,
8) To estimate the genetic gain for each trait under selection (DMY, CWE, KL) tick ΔG,
9) Click Run.
Contains theory & references
Smith-Hazel index - Output window
The Smith-Hazel index equation: $[b] = [P]^{-1}[A][w]$
Labelled in the output window: $[b]$, $[P]^{-1}$, $[A]$
Smith-Hazel index (I): the genetic worth (breeding values) of the HS families.
Individual trait BLUPs
ΔGc (%), gain estimate per selection cycle in units of measurement of each trait, at a 20% selection pressure.
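The index equation above can be evaluated by solving the linear system [P][b] = [A][w] rather than inverting [P] explicitly. The following pure-Python sketch is illustrative only and is not DeltaGen's code. It assumes the usual Smith-Hazel reading of the symbols ([P] the phenotypic variance-covariance matrix, [A] the additive genetic variance-covariance matrix, [w] the index weightings, [b] the index coefficients) and that the index for a family is the coefficient-weighted sum of its trait values. All matrices and values are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, matrix[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mat_vec(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def smith_hazel_coefficients(P, A, w):
    """b = P^-1 A w, computed by solving P b = A w."""
    return solve(P, mat_vec(A, w))


def index_value(b, trait_values):
    """Assumed index form: coefficient-weighted sum of a family's trait values."""
    return sum(bi * xi for bi, xi in zip(b, trait_values))


# Hypothetical two-trait example.
P = [[4.0, 1.0], [1.0, 3.0]]   # phenotypic variance-covariance
A = [[2.0, 0.5], [0.5, 1.0]]   # additive genetic variance-covariance
w = [1.0, 2.0]                 # index weightings entered manually
b = smith_hazel_coefficients(P, A, w)
family_index = index_value(b, [1.0, 1.0])
```

Solving the system instead of forming the inverse keeps the sketch short and avoids a dependency on a matrix library.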
Pattern analysis for Multiple Traits
Step 1: upload the Line-by-Trait mean data matrix into DeltaGen using Data Input. This example is based on the data file MultiTraitMatrix.csv found in Examples.
Step 2: click on Pattern Analysis.
Step 3: select the variables by clicking on them, and keep the standardized data option on.
Step 4: click on Run.
Cluster analysis will produce Line groups and a heat map with Line and Trait dendrograms.
The PCA BiPlot option will provide a graphical summary of the Line clusters and trait association (shown by the directional vectors).
Pattern analysis – output.
[Figure: line groups 1, 2 & 3 with their line numbers.]
[Figure: heat map of Traits by Lines, with dendrograms of Trait and Line grouping.]
Line cluster groups
Magnification and quality of the contents of the biplot can be adjusted by moving the scale controllers.
The entries (lines, genotypes, ...) shown in the biplot can be shown as dots or labels by selecting the option in the dropdown menu.
Directional vectors
Trial design instructions
Click to open design menu
Clicking on “Design Type” will display the range of trial designs available: completely randomized, randomized complete block, factorial and row-column (repeated spatial checks can also be included).
All these values can be entered manually.
Generating a Completely Randomized trial design. Example: generate a design for 6 treatments, each replicated 3 times. The total number of entries will therefore be 6×3 = 18. The row & column combinations could be: 2×9, 9×2, 6×3, 3×6.
Let’s generate a 2×9 design.
Click on Run: the data entry format sheet (shown below) and the trial plan (shown on the next page) will be generated.
Each time Run is clicked a new randomization is generated!
This trial design format can be saved as a CSV file.
Completely Randomized
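The logic of a completely randomized layout can be sketched in a few lines. This Python example is an illustration under stated assumptions and not DeltaGen's own randomization routine: the function name, the treatment labels T1 to T6 and the seed are hypothetical.

```python
import random


def completely_randomized(treatments, replicates, n_rows, n_cols, seed=None):
    """Allocate every treatment copy to a plot at random across the whole trial."""
    plots = [t for t in treatments for _ in range(replicates)]
    if len(plots) != n_rows * n_cols:
        raise ValueError("rows x columns must equal treatments x replicates")
    random.Random(seed).shuffle(plots)
    return [plots[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# 6 treatments x 3 replicates laid out as 2 rows x 9 columns.
treatments = [f"T{i}" for i in range(1, 7)]
plan = completely_randomized(treatments, replicates=3, n_rows=2, n_cols=9, seed=1)
```

Leaving `seed` unset gives a different randomization on every call, which mirrors the behaviour described for the Run button.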
Generating a Randomized Complete Block trial design. Example: generate a design consisting of 3 blocks (replicates) with 50 treatments each. Each treatment will appear once (1) in a replicate. The row & column combinations per replicate could be: 5×10, 10×5, 2×25, 25×2.
Let’s generate a 5 row × 10 column per rep by 3 replicate design.
As each treatment occurs only once in a replicate, 1 was entered.
Click on Run: the data entry format sheet (shown below) and the trial plan (shown on the next page) will be generated.
Each time Run is clicked a new randomization is generated!
This trial design format can be saved as a CSV file.
Randomized Complete Block
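In a randomized complete block design the randomization is restricted to each block, so every replicate holds each treatment exactly once. The following Python sketch illustrates that restriction; it is not DeltaGen's routine, and the function name, treatment labels and seed are hypothetical.

```python
import random


def randomized_complete_block(treatments, n_blocks, n_rows, n_cols, seed=None):
    """Shuffle the full treatment list independently within each block."""
    if len(treatments) != n_rows * n_cols:
        raise ValueError("rows x columns per block must equal the number of treatments")
    rng = random.Random(seed)
    design = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        design.append([order[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)])
    return design


# 50 treatments, 3 blocks, each block laid out as 5 rows x 10 columns.
treatments = [f"T{i}" for i in range(1, 51)]
design = randomized_complete_block(treatments, n_blocks=3, n_rows=5, n_cols=10, seed=1)
```

Each element of `design` is one block, stored as a list of rows.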
Generating a Factorial trial design. Example: generate a design for an experiment to determine the herbage dry weight response of 5 perennial ryegrass cultivars to 4 levels of application of a nitrogen fertilizer. The design has 4 replicates.
The row & column combinations per replicate could be: 2×10, 10×2, 5×4, 4×5. Let’s generate a 5 row × 4 column per rep by 4 replicate design.
Click on Run: the data entry format sheet (shown below) and the trial plan (shown on the next page) will be generated.
This trial design format can be saved as a CSV file.
Factorial design
[Trial plan figure: each plot is labelled with its cultivar and nitrogen level, e.g. Cv 5/Nfert 4.]
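A factorial layout crosses every level of one factor with every level of the other before randomizing. The Python sketch below builds the 5 cultivar by 4 nitrogen level combinations, using plot labels in the style of the trial plan (e.g. Cv 5/Nfert 4). It assumes the combinations are randomized independently within each replicate, which the manual does not state; the function name and seed are hypothetical.

```python
import itertools
import random


def factorial_design(cultivars, nitrogen_levels, n_reps, n_rows, n_cols, seed=None):
    """Randomize every cultivar x nitrogen combination within each replicate."""
    combos = [f"{cv}/{n}" for cv, n in itertools.product(cultivars, nitrogen_levels)]
    if len(combos) != n_rows * n_cols:
        raise ValueError("rows x columns must equal the number of treatment combinations")
    rng = random.Random(seed)
    design = []
    for _ in range(n_reps):
        order = list(combos)
        rng.shuffle(order)
        design.append([order[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)])
    return design


cultivars = [f"Cv {i}" for i in range(1, 6)]
nitrogen_levels = [f"Nfert {i}" for i in range(1, 5)]
design = factorial_design(cultivars, nitrogen_levels, n_reps=4, n_rows=5, n_cols=4, seed=1)
```

The 20 combinations fill one 5 row by 4 column replicate exactly, matching the layout chosen in the example.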
Generating a Row & Column trial design. Example: generate a design for 50 treatments with 4 replicates. The total number of entries across all 4 replicates will be 200. The row & column combinations per replicate could be: 2×25, 25×2, 5×10, 10×5.
Let’s generate a 5×10 per rep by 4 replicate design. The total number of rows across all 4 replicates will be 20 (5 rows × 4 replicates).
Click on Run: the data entry format sheet (shown below) and the trial plan (shown on the next page) will be generated.
This trial design format can be saved as a CSV file.
Row and Column design
[Trial plan figure: columns Col 1 to Col 10; rows Row 1 to Row 5 within each of the 4 replicates.]
Generating a Row & Column trial design with repeated spatial checks. Example: generate a design for 80 treatments with 3 replicates, having 2 checks with 4 repeats each in every replicate. Each replicate will contain 82 distinct entries (80 treatments plus 2 checks), which occupy 80 + 2×4 = 88 plots.
Let’s generate an 8×11 per rep by 3 replicate design having 2 checks with 4 repeats each in every replicate. The total number of rows across all 3 replicates will be 24 (8 rows × 3 replicates).
Click on Run: the data entry format sheet and the trial plan (shown on the next page) will be generated.
This trial design format can be saved as a CSV file.
Row and Column design with 2 checks, each repeated 4 times within a replicate
[Trial plan figure: columns Col 1 to Col 11; rows Row 1 to Row 8 within each of the 3 replicates.]
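The layout arithmetic used throughout the trial design examples (plots per replicate and the admissible row by column combinations) can be checked before entering values in the design menu. The following is a minimal Python sketch; the function names are hypothetical, and single-row or single-column strips are excluded because the manual's examples do not list them.

```python
def plots_per_replicate(n_treatments, n_checks=0, check_repeats=0):
    """Plots needed in one replicate: each treatment once plus every check repeat."""
    return n_treatments + n_checks * check_repeats


def layout_options(n_plots):
    """Row x column pairs that hold n_plots exactly, excluding 1 x n and n x 1."""
    return [(rows, n_plots // rows) for rows in range(2, n_plots) if n_plots % rows == 0]


# 80 treatments plus 2 checks repeated 4 times each in every replicate.
n_plots = plots_per_replicate(80, n_checks=2, check_repeats=4)
options = layout_options(n_plots)
total_rows = 8 * 3  # 8 rows per replicate x 3 replicates
```

For the repeated-checks example this gives 88 plots per replicate, of which the 8 row by 11 column layout is one admissible option.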
Save session and Quit
Analysis reports can be saved as PDF, HTML and Word documents. Click on any of the three document format options, then select Download.
To quit DeltaGen, click on Quit App.