Simulation Study on the Methods for Mapping Quantitative Trait Loci
in Inbred Line Crosses
A Dissertation Submitted in Partial Fulfilment of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY in
Genetics and Plant Breeding
by
SHENGCHU WANG
Zhejiang University
Hangzhou, Zhejiang, China 2000
A Ph.D. DISSERTATION
Simulation Study of the Methods for Mapping
Quantitative Trait Loci in Inbred Line Crosses
By
Shengchu Wang
Major: Genetics and Plant Breeding
Supervisors: Dr. Jun Zhu and Dr. Zhao-Bang Zeng
Zhejiang University
Hangzhou, Zhejiang China
2000
DEDICATION
To My Wife, Xiu-Juan Rong
And Daughter, Min-Xue Wang
Acknowledgments
I would like to express my special thanks to my advisor, Dr. Jun Zhu, for his important
guidance, encouragement and support during my doctoral study and dissertation research.
The experience of studying with Dr. Zhu was beneficial and unforgettable.
I would like to express my sincere thanks to Dr. Zhao-Bang Zeng for supporting
me financially to do part of my dissertation research in the US and for giving me a great
deal of help in my research work and my life while I stayed at NCSU. Thanks also to Dr.
Bruce Weir for providing me with a host lab and for good advice on my research work. I
would like to express my gratitude to my wife and daughter for their support and
patience.
I am grateful to Dr. Xin-Fu Yan, Dr. Yue-Fu Liu, Dr. Rong-Ling Wu, Hai-Ming Xu,
Ci-Xin He, and everyone who helped me during my dissertation research. I also wish
to express my thanks to my colleagues at the computer centre, Zhejiang University, for
their support of my doctoral study and dissertation research.
Abstract
With the fast advance of molecular genetics, it is now much easier to obtain well-distributed
genetic markers in almost every organism. Therefore, as a major
direction of quantitative genetics, various statistical methods have been developed to
detect or map quantitative trait loci (QTL) by using genetic marker information. In
this dissertation, the principles and models of various QTL
mapping methods are summarized. These methods include single marker analysis, interval mapping
(IM), composite interval mapping (CIM), mixed-model-based composite interval
mapping (MCIM), and multiple interval mapping (MIM).
Large-scale simulation studies were used to explore and compare
the various QTL mapping methods. The simulations indicated that although
single marker analysis can detect QTLs, it cannot locate the
positions of the QTLs or estimate their effects.
Simulations were also conducted to study and compare different
methods (IM, CIM, and MCIM) of QTL mapping under the simple additive situation.
By analysing the likelihood ratio (LR) profile, the power of QTL detection and the
probability of false QTL detection can be calculated for the three methods under various
situations. Estimates of QTL effects and positions, together with their 95% experimental
confidence intervals (ECI), were also obtained for the detected QTLs. The simulation results are
useful to those who use these three methods in QTL mapping practice. The
results can serve as one basis for choosing among
the available QTL mapping methods for a particular experimental design. The research can also
provide information to help in interpreting QTL mapping results.
In real QTL mapping experiments, however, more complicated situations
such as QTL by environment (QE) interaction and QTL epistasis generally exist. For the
IM and CIM methods, the simulation studies implied that unbiased estimates of QTL main
effects can be obtained by using the data from all environments together.
However, it is difficult to estimate the QE interaction effects, even by
mapping QTLs on the data from each environment separately. The MCIM method
can put all QTL main effects and QE interaction effects into one mixed
linear model and obtain unbiased estimates of both the main and QE effects, as
indicated by the simulation study.
The MCIM method can also use a mixed linear model to map QTLs with marginal
and epistatic effects. The simulation study indicated that the MCIM method can
obtain unbiased estimates of QTL marginal and epistatic effects at the same time.
Although IM and CIM can give unbiased estimates of marginal
QTL effects when QTL epistatic effects exist, the variance of the
estimated marginal effects increases greatly. In addition, the
power of QTL detection goes down and the probability of false QTL detection
goes up noticeably, especially for the CIM method, as the simulation study indicated.
MIM is a multiple-QTL-oriented method, and it can also analyse
QTL epistatic effects. However, one of the crucial parts of the MIM method is the
criterion or stopping rule for model selection. We propose a set of parameters for
measuring the fit between the selected model and the real model, and present an
experimental criterion, derived by simulation, for model selection in the framework of
QTL mapping. The criterion is a modification of BIC that
adds relevant factors such as heritability, marker density, sample size, and
chromosome number. The experimental criterion works well in the simulated cases.
A modified version of the QTL Cartographer software has been developed, and it is
called Windows QTL Cartographer. Unlike the original QTL Cartographer, Windows
QTL Cartographer is QTL mapping software with a user-friendly interface and
powerful graphical presentation of the mapping results. It has many users and
has been posted on the Internet (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm).
Key words: Computer simulation; QTL mapping methods; Quantitative trait loci; Model selection; BIC criterion
TABLE OF CONTENTS
1. INTRODUCTION
1-1 HISTORY OF THE QTL MAPPING WORK
1-2 MOLECULAR MARKERS
1-3 EXPERIMENTAL DESIGN
1-4 MODELS AND SOFTWARE
1-5 SIMULATION VS. REAL DATA
1-6 MAP FUNCTIONS AND MARKER ANALYSIS
    1. Map Functions
    2. Marker Order Analysis
    3. Marker Segregation Analysis
1-7 PURPOSE OF THIS RESEARCH
2. REVIEW OF MAJOR QTL MAPPING METHODS
2-1 ONE MARKER METHOD
    1. Statistic Bases for One Marker Method
    2. The t-test Method
    3. Likelihood Ratio Test Method
    4. Simple Regression Method
2-2 INTERVAL MAPPING METHOD
    1. Conditional Probabilities of QTL Genotypes
    2. Genetic Model
    3. Maximum Likelihood Analysis
    4. Likelihood Ratio Test
2-3 COMPOSITE INTERVAL MAPPING
    1. Properties of Multiple Regression Analysis
    2. Genetic Model
    3. Likelihood Analysis
    4. Hypothesis Test
    5. Marker Selection
2-4 MIXED LINEAR MODEL APPROACH
    1. Genetic Model
    2. Likelihood Analysis
    3. Hypothesis Test
    4. A Model for GE Interaction
    5. A Model for QTL Epistasis
2-5 MULTIPLE INTERVAL MAPPING
3. SIMULATION STUDIES
3-1 SIMULATION MODEL AND DATA
    1. Genetic Model for Simulation
    2. Parameter Setting
    3. Simulation Procedure
    4. Format of the Simulation Data
3-2 SINGLE MARKER ANALYSIS
3-3 COMPARING DIFFERENT MAPPING METHOD
    1. Parameters Setting
    2. Estimation of QTL Effects
    3. Power and False Positive
    4. Positions and Effects of Detected QTLs
    5. The LR Profile
3-4 CONSIDER THE COMPLICATED QTL MAPPING SITUATIONS
    1. Parameters Setting
    2. Performance of IM and CIM Methods
    3. Using MCIM Method
4. MODEL SELECTION AND CRITERIA
4-1 MIM AND MODEL SELECTION
4-2 MODEL EVALUATION STANDARD
4-3 MODEL SELECTION STRATEGY AND CRITERIA
4-4 PROCEDURE OF MODEL SELECTION
4-5 SUMMARY OF CRITERIA FOR MODEL SELECTION
    1. Adjusted R2
    2. Mallows' Cp (Mallows 1973)
    3. Mean Squared Error Prediction (Aitkin 1974, Miller 1990)
    4. BIC and Related Criteria
4-6 SIMULATION STUDIES OF CRITERIA
    1. FW and BW Methods
    2. Criteria and the Various Parameters
    3. Experimental Criteria
5. CONCLUSIONS AND DISCUSSION
5.1 CONCLUSION
5.2 THRESHOLD AND CRITERIA
5.3 SOFTWARE DESIGN
REFERENCE
1. Introduction
1-1 History of the QTL mapping work
It is believed that the rediscovery of Mendelian genetics in 1900 was the beginning of
modern genetics. The demonstration of the inheritance of discrete
characters, such as purple vs. white flowers and smooth vs. wrinkled seeds, made it clear that
traits are controlled by genetic factors, or genes, which are inherited from
generation to generation. Later on, great efforts were made to understand
how genes affect discrete characters or qualitative traits, and especially
how genes are transmitted from parents to their offspring.
However, most economically as well as biologically important traits are not
qualitative but quantitative in nature. Here quantitative means that the trait’s
values cannot be divided into a few categories; instead, they are distributed
continuously over a range in a population. Examples of quantitative traits are
crop yield, plant height, resistance to diseases, weight gain in mice and egg or milk
production in animals. Due to the complex nature of quantitative inheritance,
progress in quantitative genetics is far behind that in Mendelian genetics. The
traditional way to study quantitative traits is to partition the phenotypic variance
into various genetic and non-genetic variance components.
$$V_P = V_G + V_e = V_A + V_D + V_I + V_e$$
Here the phenotypic variance $V_P$ is partitioned into two components: the genetic part
$V_G$ and the environmental and residual part $V_e$. The genetic variance can be further
partitioned into additive ($V_A$), dominance ($V_D$) and epistatic ($V_I$) variances. It is also
possible to partition $V_G$ into other variance components according to the application.
For example:
$$V_G = V_A + V_D + V_L + V_M$$
where $V_L$ is the sex linkage component and $V_M$ is the maternal variance component
(Zhu and Weir, 1996).
These variance components can be estimated under special breeding designs
(Cockerham 1961, Eberhart et al. 1966, Falconer 1996, Zhu 1998). The estimates
allow us to evaluate the relative importance of the various determinants of the phenotypic
variance. The ratio $V_G/V_P$ is called the heritability in the broad sense, and $V_A/V_P$ is
called the heritability in the narrow sense, or just heritability ($h^2$). Heritability measures the
degree to which the effects of genes transmitted from parents to their offspring account for
phenotypic deviation, and it is useful in predicting the response to selection.
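As a small worked illustration of the two ratios defined above, the following sketch computes broad-sense and narrow-sense heritability from a set of variance components. The function name and the numerical values are hypothetical and chosen only for illustration; they are not estimates from this dissertation.

```python
def heritability(v_a, v_d, v_i, v_e):
    """Return (broad-sense, narrow-sense) heritability.

    v_a, v_d, v_i: additive, dominance and epistatic variances
    v_e: environmental and residual variance
    """
    v_g = v_a + v_d + v_i      # genetic variance V_G
    v_p = v_g + v_e            # phenotypic variance V_P
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return v_g / v_p, v_a / v_p


# Hypothetical variance components, for illustration only.
h2_broad, h2_narrow = heritability(v_a=4.0, v_d=1.0, v_i=1.0, v_e=4.0)
print(h2_broad, h2_narrow)
```

With these illustrative values the genetic variance is 6 and the phenotypic variance is 10, so the narrow-sense heritability is smaller than the broad-sense one because only the additive part enters its numerator.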
The questions of how genes contribute to quantitative trait values and why
the trait values are continuously distributed may be partially answered by polygene
theory (Johannsen 1909, Nilsson-Ehle 1909, East 1916). In this theory, a quantitative
trait is controlled by many genes with small effects and is also
easily influenced by environmental effects. However, it is very difficult to dissect the
individual genes controlling a quantitative trait by classical quantitative genetic
means. Therefore, breeders usually have no idea about the number, locations and
effects of the individual genes involved in the inheritance of target quantitative traits
(Comstock 1978). These genes are also called quantitative trait loci (QTLs). Without
information on the QTLs, such as their number, locations, and effects, it is
impossible to manipulate them by genetic engineering methods in order to
improve an organism’s traits.
The history of QTL mapping can be traced back to the 1920s. Sax (1923) used
morphological markers to demonstrate an association between seed weight and seed
coat colour in beans. Thoday (1961) used multiple genetic markers to systematically
map the individual polygenes that control a quantitative trait. He noted: “The
main practical limitation of the technique seems to be the availability of suitable
markers”. It is obvious that the numbers of morphological or protein markers are
very limited. Therefore, genetic markers are the natural choice for detecting or
mapping QTLs.
Nowadays, it is much easier to obtain well-distributed genetic markers in almost every
organism because of the fast advance of molecular genetic technology. Various statistical
methods have been developed to detect or map QTLs by using genetic marker
information. Lander and Botstein (1989) proposed the interval mapping method (IM),
which uses two adjacent markers to bracket a region and tests for the existence of a QTL
by performing a likelihood ratio test at every position in the region. The method has
been shown to be more powerful and to require fewer progeny than one-marker methods.
However, interval mapping has some drawbacks. Because it is a one-QTL
model, the mapped positions of QTLs can be seriously biased when more than one
QTL is located on the same chromosome (Knott and Haley 1992; Martinez and Curnow
1992).
Later on, several attempts have been made to solve this problem. Zeng (1993)
proved an important property of multiple regression analysis in relation to QTL
mapping: “If there is no epistasis, the partial regression coefficient of a trait on a
marker depends only on those QTLs that are in the interval bracketed by the two
neighbouring markers and is independent of QTLs located in other intervals”. Zeng
(1994) proposed an improved method called composite interval mapping (CIM) by
combining interval mapping with multiple regression analysis. Jansen (1993) also
proposed a similar strategy. Composite interval mapping has been shown to perform
better than interval mapping when there are multiple linked QTLs. Recently an
extended method called multiple interval mapping (MIM) has been proposed (Kao,
Zeng and Teasdale 1999). This method fits all QTLs into the model simultaneously and
can analyse QTL epistasis and the associated statistical issues.
A new methodology based on mixed linear model approaches (MCIM) was also proposed
(Zhu, 1998, 1999; Zhu and Weir, 1998) for systematically mapping QTLs.
The MCIM method performs very similarly to Zeng’s CIM
method (see Chapter 3). However, the MCIM method does not have the problems of
selecting background control markers and setting the mapping window size that
the CIM method has. The MCIM method also has the advantage that it is very easy to extend
to more complicated QTL mapping situations such as QTL epistasis and QTL by
environment interaction.
1-2 Molecular Markers
In the classical quantitative genetic approach, the units of analysis are genetic variances rather
than the underlying genes themselves. However, individual QTLs can be dissected by
using linked marker loci. This approach has long been recognized (Sax 1923;
Rasmusson 1933; Thoday 1961), but until recently it has been regarded as of minor
importance because of the lack of sufficient genetic markers. Thanks to modern
molecular biology, this situation has now been changed dramatically. The ability to
detect genetic variation directly at the DNA level has resulted in an essentially endless
supply of markers for any species of interest. Not surprisingly, there has been an
explosion in the use of marker-based methods in quantitative genetics.
The first molecular markers used were allozymes, protein variants detected by
differences in migration on starch gels in an electric field. This class of markers has
been extensively applied to a variety of genetic problems (Tanksley and Rick 1980;
Delourme and Eber 1992; Baes and Van Cutsem 1993; Kindiger and Vierling 1994).
Allozymic variants have the advantage of being relatively inexpensive to score in
large numbers of individuals, but there is often insufficient protein variation for
high-resolution mapping. This is the reason why the rapid development of QTL
mapping did not start with the advent of allozymic markers.
As methods for evaluating variation directly at the DNA level became widely
available during the mid-1980s, DNA-based markers largely replaced allozymes in
mapping studies. DNA is the genetic material of organisms and genetic differences
between individuals will be reflected directly by the nucleotide sequences of DNA
molecules. There are effectively no limitations on either the genomic location or the
number of DNA markers.
A wide variety of techniques can be used to measure DNA variation. Direct
sequencing of DNA provides the ultimate measure of genetic variation, but much
quicker scoring of variation is sufficient for most purposes. These methods include
Restriction Fragment Length Polymorphisms (RFLPs), Polymerase Chain Reaction
(PCR), Randomly Amplified Polymorphic DNAs (RAPDs) and microsatellite DNAs
etc. There are several recently developed methods that include Representational
Difference Analysis (RDA) and Genomic Mismatch Scanning (GMS).
RFLPs are one of the simplest and most widely used types of DNA marker. The approach
is to digest DNA with a variety of restriction enzymes, each of which cuts the DNA at
a specific sequence or restriction site. When the digested DNA is run on a gel under
an electric current, the fragments separate out according to size. DNA
from different individuals can show variation in fragment length. If we attempted to score the
entire genome for fragment lengths, the result would be a complete smear on the gel.
Instead, individual bands are isolated from this smear by using labelled DNA probes
that have base pair complementarity to particular regions of the genome. Each RFLP
probe generally scores a single marker locus, and the marker alleles are codominant,
as heterozygotes and homozygotes can be distinguished. RFLP markers were first used
in the construction of human genetic maps (Botstein et al. 1980; Donis-Keller et
al. 1987), and their use has been extended to the analysis of other species (Beckmann and
Soller 1983, 1986a, 1986b; Soller and Beckmann 1988).
PCR is a rather different molecular marker approach that uses short primers for
DNA replication to delimit fragment sizes. A region flanked by oppositely oriented
primer binding sequences that are sufficiently close together allows the PCR reaction
to replicate this region, generating an amplified fragment. If primer-binding sites are
missing or are too far apart, the PCR reaction fails and no fragments are generated for
that region. The RAPD method (Williams et al. 1990) follows a similar procedure, in which
sequence polymorphisms are detected by using random short sequences as primers.
The advantage is that a single probe can reveal several loci at once, each
corresponding to a different region of the genome with appropriate primer sites. RAPDs
also require smaller amounts of DNA. However, RAPD markers are dominant and
the marker genotype can be ambiguous. Ragot and Hoisington (1993) concluded that
RAPDs are suitable for modest numbers of individuals, while RFLPs are better for
larger studies.
Microsatellite DNAs, short arrays of simple repeated sequences, tend to be very
highly polymorphic. Since array length is scored, microsatellites are codominant, as
heterozygotes show two different lengths and hence can be distinguished from
homozygotes. This kind of marker is especially suitable for outbred populations
because it is most efficient with marker loci that have a large number of alleles.
RDA and GMS are two recently developed advanced methods. Both methods
examine the entire genome, allowing one to isolate only those sequences that are
shared by two populations (GMS) or those that differ between populations (RDA).
Good use of these methods will very likely provide powerful approaches for the
isolation of QTLs (Lander 1993, Aldhous 1994). Besides the commonly used
markers described above, other categories of markers can also be very useful in some cases.
The linear arrangement of markers along the chromosomes or genome of a
species is called a marker linkage map. The map information is very important for various
kinds of QTL mapping research. Many saturated marker maps, in which
markers cover the whole genome at reasonable spacing, have been published for
many organisms (Halward et al. 1993, Xu et al. 1994, Causse et al. 1994, Viruel et al.
1995, Hallden et al. 1996). Based on these saturated maps, many research
areas have become more likely to succeed. These include studies of the
evolutionary process of organisms through comparative mapping (Lagercrantz et al.
1996, Simon et al. 1997), marker-assisted selection to improve breeding efficiency
(Lee 1995, Hamalainen et al. 1997) and marker-based cloning (Xu 1994).
It is necessary to distinguish between the ideas of the physical maps and the
genetic maps. The set of hereditary material transmitted from parent to offspring is
known as the genome, and it consists of molecules of DNA (DeoxyriboNucleic Acid)
arranged in chromosomes. The DNA itself is characterized by its nucleotide sequence
that is the sequence of bases A, C, G or T. A physical map is an ordering of features
of interest along the chromosome in which the metric is the number of base pairs
between features. This is the level of detail needed for molecular studies, and there are
several techniques available for physical mapping of discrete genetic markers or traits.
However, in this dissertation genetic maps are the main concern; in a genetic map, distances
depend on the level of recombination expected between two points. An individual
receives one copy of each heritable unit (allele) from each parent at each location
(locus) of the genome. The combination of units (haplotype) at different locations
(loci) that the individual transmits to the next generation need not be one of the
parental sets. Recombination may have taken place during meiosis, the process that
produces eggs or sperm. That is, through crossing-over events in a diploid, alleles from
either of the two parental chromosomes may be combined to form the haploid egg or sperm.
Although there is generally a monotonic relation between physical and recombination
distance, the relation is not a simple one.
1-3 Experimental Design
Crosses between completely inbred lines that differ in the trait of interest offer
an ideal setting for detecting and mapping QTLs by marker-trait associations. The
reason is that all F1 individuals are genetically identical and show complete linkage
disequilibrium for genes differing between the inbred lines. A number of designs have
been proposed to exploit these features. These designs can produce various mapping
populations, including backcross, intercross, doubled haploid
and recombinant inbred line populations. Most inbred line cross
designs involve crop plants; however, they have also been applied to a number of
animal species, especially mice (reviewed by Frankel 1995).
Here we call the two different parental inbred lines P1 and P2; one is the low (L)
line and the other is the high (H) line. The F1 individuals receive a copy of each
chromosome from each of the two parental lines, and so, wherever the parental lines
differ, they are heterozygous. All F1 individuals will be genetically identical and have
the genotype HL at each locus. Almost all experimental designs start from
the F1 generation.
In a backcross design, the F1 individuals are crossed to one of the two parental
lines, for example, the high line. The backcross progeny, which may number from 100
to over 1000, receive one chromosome from the F1 and one from the high parental line.
Thus, at each locus, they have genotype either HH or HL. As a result of crossing over
during meiosis, the process by which gametes are formed, the
chromosome received from the F1 is a mosaic of the two parental chromosomes. At
each locus, there is a half chance of receiving the allele from the high parental line
and a half chance of receiving the allele from the low parental line. The chromosome
received will alternate between stretches of L’s and H’s.
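The mosaic structure described above can be sketched with a short simulation. This is only an illustrative sketch, not the simulation procedure used later in this dissertation: it assumes that recombination events in adjacent marker intervals are independent (no interference), and the function names, the marker spacing (recombination fraction 0.1 between neighbours) and the random seed are hypothetical choices.

```python
import random


def simulate_f1_gamete(recomb_fractions, rng):
    """Simulate the chromosome a backcross individual receives from the F1.

    recomb_fractions[i] is the recombination fraction between marker i and
    marker i + 1. Returns one allele ('H' or 'L') per marker.
    Assumes independent recombination between adjacent intervals.
    """
    alleles = [rng.choice("HL")]             # first locus: half chance each
    for r in recomb_fractions:
        prev = alleles[-1]
        if rng.random() < r:                 # crossover: switch parental strand
            alleles.append("L" if prev == "H" else "H")
        else:                                # no crossover: stay on same strand
            alleles.append(prev)
    return alleles


def backcross_genotypes(recomb_fractions, rng):
    """Marker genotypes (HH or HL) of one progeny of the cross F1 x high line."""
    return ["H" + a for a in simulate_f1_gamete(recomb_fractions, rng)]


rng = random.Random(1)                       # fixed seed, illustration only
progeny = [backcross_genotypes([0.1] * 9, rng) for _ in range(1000)]
freq_hh = sum(g[0] == "HH" for g in progeny) / len(progeny)
print("".join(a[1] for a in progeny[0]), freq_hh)
```

Each simulated chromosome is a run of H's and L's that changes only where a crossover falls, and across many progeny the HH and HL genotypes at any one marker appear in roughly equal proportions.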
Another common experimental design used in plants is the intercross design. An F2
population is produced by selfing or sib mating F1 individuals. The F2 individuals
receive two sets of chromosomes from the F1 generation, each of which will be a
combination of parental chromosomes. Thus, at each locus, the F2 individuals will
have the genotype HH, HL or LL. The F2 population provides the most genetic
information among the different types of mapping populations (Lander et al. 1987), and is
relatively easy to obtain.
A doubled haploid (DH) population is composed of many DH lines, which are
usually developed from the pollen of an F1 plant through anther culture and
chromosome doubling. Individuals of a DH line are homozygous,
with genotype HH or LL at each locus along the chromosome. DH populations are also
called permanent populations because there is no segregation in further
generations. The advantage of a DH population is that the marker data can be used
repeatedly in different locations and years under various experimental designs.
However, the rate at which pollen grains are successfully turned into DH plants may vary with
pollen genotype, and this can cause segregation distortion and false linkage
between some marker loci.
A recombinant inbred line (RIL) population is constructed by selfing or sib
mating individuals for many generations, starting from the F2, by the single seed descent
approach until almost all of the segregating loci become homozygous. Some RIL populations
have been developed in rice, maize, barley and other species in recent years (Burr et al.
1988, Reiter et al. 1992, and Li et al. 1995). The advantage of a RIL population is that the genetic
distances are enlarged compared with those obtained from F2 or BC populations. The
reason is that many generations of selfing or sib mating increase the chance of
recombination. Therefore, it may be useful for increasing the precision of QTL
mapping. However, it is not possible for all individuals in a RIL population to be
−10−
homogeneous at all segregating loci through the limited generations of selfing or sib
mating, which will decrease the efficiency for QTL mapping to some extent.
Different experimental populations are used for different QTL mapping studies.
In this dissertation, the B1 and DH populations are used as the chief examples
because of their simplicity: at each locus in the genome, the progeny of a B1 or DH
population have only two possible genotypes. However, the principles and results
obtained here are easy to extend to other experimental populations.
1-4 Models and Software
The QTL information of an experimental population (the number, positions, and effects
of the QTLs) is unobservable. Through the experiment, we can only observe the trait
phenotype and the marker genotypes of each individual. The basis of QTL mapping is the
idea that genetic markers which tend to be transmitted together with specific values of
the trait are likely to be close to a gene affecting that trait. Therefore, genetic and
statistical models are very important for describing the data and extracting the QTL
information from the data.
Genetic models describe an organism’s genetic behaviour, such as recombination
events and additive, dominance, or epistatic effects. For more than two markers on a
chromosome, the simplifying assumption is that recombination between any two of them
is independent of the other recombination events. This assumption is called no
interference, and under it the occurrence of crossovers along the chromosome can be
treated as a Poisson process. Therefore, Haldane’s mapping function (Haldane 1919)
can be used to describe the relationship between recombination fraction r and
genetic distance x.
Statistical models are the means of obtaining the QTL information from the
experimental data through association analysis and statistical calculation. Without an
appropriate statistical model, there is no way to retrieve the QTL information from the
experimental data, which include the quantitative phenotypes and molecular markers.
Therefore the statistical model is critical for mapping QTL, and a large number of new
models have been proposed since the 1980s (Weller 1986, Lander and Botstein 1989,
Haley and Knott 1992, Jansen 1992, 1993, Zeng 1993, 1994, Zhu 1998, Kao 1999
etc.).
We can classify these statistical models (methods) based on the number of markers
used or the techniques applied (Liu 1997, Hoeschele et al. 1991). The classification
according to marker number includes “single marker methods”, “flanking marker
methods” and “multiple marker methods”. The methods can also be grouped as “least
squares methods”, “regression methods”, “maximum likelihood methods”, and “mixed
linear model approaches”. In summary, these methods range from simple to
complicated, from detecting QTL-marker association to locating QTL positions and
estimating their effects, and from low resolution and power to high resolution and
power. We will discuss these methods in more detail in the later chapters.
It is possible to use a calculator to solve statistical problems when the data set is
not very large and the method is not too complicated. However, computer programs are
usually used to analyse data by statistical means. Several commercial general-purpose
statistical software packages currently exist, including SAS, SPSS, SPLUS, and
STATISTICA. It is possible to use this kind of software for QTL mapping
analysis (Haley and Knott, 1992). However, the methods for QTL mapping are
usually complicated and not standardized, so mapping QTL with these packages is
usually inefficient and sometimes even impossible. Therefore, many
computer programs based on specific statistical methods have been developed for
QTL mapping (Lander and Botstein 1989, Basten 1994, Wang 1999).
Mapmaker/QTL (Lander et al. 1987), which is based on the classical interval mapping
principles, is one of the most popular QTL mapping programs. The software has
versions for PC, Macintosh, and UNIX systems and uses a command-driven user
interface. This means that a series of commands must be executed for the different
stages, such as data input, the various mapping functions, and output of the results.
QTL Cartographer (Basten et al. 1994) is another popular QTL mapping program,
developed according to Zeng’s composite interval mapping method (Zeng 1994). The
software also has versions for PC and UNIX. However, the original software
uses several commands to carry out the mapping tasks, which is sometimes confusing. We
have developed a Windows version of QTL Cartographer that has a user-friendly
interface and graphical presentation of results. The new version of QTL Cartographer
should be much easier to use, and the software will be described in more detail later.
Other software is also available for QTL mapping, such as QTLSTAT (Liu and
Knapp, 1992), PGRI (Lu and Liu, 1995), MAPQTL (Van Ooijen and Maliepaard
1996) and Map Manager QTL (Manly et al. 1996). These programs are not as
popular as Mapmaker/QTL and QTL Cartographer. However, it is believed that QTL
mapping software based on new methods will gradually be accepted by genetic
researchers over time. Advanced statistical methods and a good user interface should
be the most important factors for this kind of software.
1-5 Simulation vs. Real Data
A statistical model is used to describe a real biological or genetic system.
Because such a system is so complicated and some factors are unknown, it is
impossible to include all the factors (parameters) in a model. Therefore, it is
reasonable that there are several statistical models for QTL mapping research. Some
of these are quite complicated and others may be very simple. The properties of
an estimator for a statistical model can be obtained analytically if the distribution
of the estimator is known and well characterized. However, in most models for QTL
mapping, it is too complicated to derive the properties of the estimators
analytically. Therefore, computer simulation is necessary for obtaining these
properties and checking the performance of the models and methods. There is no way
to examine a model’s performance by using real (experimental) data, because the
true parameters are unknown. The advantage of using computer simulation data is that
we know the true parameters, which can be compared with the estimates from the
model.
The data for QTL mapping have two components: the map information and the cross
information. The map information data set contains the marker positions and orders
for each chromosome or linkage group of an experimental organism. Figure 1-1 is the
estimated genetic map for the X chromosome of the mouse, and Table 1-1 gives the map
data in QTL Cartographer format.
Figure 1-1. Marker information for the X chromosome in the mouse data. The numbers are distances in cM between adjacent markers and the labels are the marker names.
Marker labels (as they appear in the figure): Tpm3-rs9, DXMit3, Hmg1-rs14, Hmg14-rs6, DXNds1, Rp118-rs17, Hmg1-rs13, DXMit97, DXMit109, DXMit48, Rps17-rs11, DXMit16, DXMit57
Intervals (cM): 14.3, 12.2, 4.1, 2.5, 4.1, 6.7, 6.9, 6.9, 4.1, 1.2, 1.5, 8.3
Table 1-1. Map data in QTL Cartographer format.
No¹   Labels²       Interval³   Position⁴
1     Tmp3-rs9      14.3         0.0
2     DXMit3        12.2        14.3
3     Hmg1-rs14      4.1        26.5
4     DXMit97        2.5        30.6
5     DXMit16        4.1        33.1
6     Hmg14-rs6      6.7        37.2
7     DXMit109       6.9        43.9
8     DXNds1         6.9        50.8
9     DXMit48        4.1        57.7
10    Rp118-rs17     1.2        61.8
11    Rps17-rs11     1.5        63.0
12    DXMit57        8.3        64.5
13    Hmg1-rs13      0.0        72.8
¹Marker number, ²Marker name, ³Marker position (cM) in ‘interval’ format and ⁴Marker position in ‘position’ format.
The cross information includes the trait values and the marker genotypes at each
marker position for the individuals in an experimental population. Table 1-2 shows
part of the cross information of the mouse data set, which is a backcross population.
In a simulation study, we can set the map information for a population and
produce (sample) the cross information of each individual in the population
according to various parameters such as QTL number, positions and distribution.
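The sampling step described above can be sketched for the marker part of the cross information. This is only an illustration under stated assumptions: a backcross population, no interference (Haldane's map function), and a hypothetical three-interval map. The function names and the example distances are not taken from the dissertation or from QTL Cartographer.

```python
import math
import random

def haldane_r(distance_cm):
    """Recombination fraction for a map distance in cM under Haldane's function."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

def simulate_backcross(intervals_cm, n_individuals, seed=0):
    """Sample marker genotypes (1 = AA, 0 = Aa) for a backcross population.

    intervals_cm: distances between adjacent markers in cM (hypothetical input).
    Recombination in adjacent intervals is drawn independently (no interference).
    """
    rng = random.Random(seed)
    rates = [haldane_r(d) for d in intervals_cm]
    population = []
    for _ in range(n_individuals):
        genotype = [rng.randint(0, 1)]        # first marker segregates 1:1
        for r in rates:
            recombined = rng.random() < r     # allele switches with probability r
            genotype.append(genotype[-1] ^ int(recombined))
        population.append(genotype)
    return population

# Hypothetical map with four markers separated by 20, 15 and 10 cM.
cross = simulate_backcross([20.0, 15.0, 10.0], n_individuals=200, seed=1)
```

Trait values would be sampled in a second step from the simulated QTL genotypes and the chosen QTL effects, which this sketch leaves out.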
Table 1-2. Cross data for the first 6 individuals on the X chromosome of the mouse.
               Markers on the X chromosome of the mouse³
Ind¹   BW²     1   2   3   4   5   6   7   8   9   10  11  12  13
1      50.0    1   1   1   1   1   1   1   1   1   1   1   1   1
2      54.0    1   1   1   1   1   1   1   1   1   1   1   1   0
3      49.0    0   1   1   1   1   1   1   1   1   1   1   1   1
4      41.0    0   0   0   0   0   0   0   0   0   0   0   0   0
5      36.0    1   1   1   1   1   1   1   1   1   1   1   1   1
6      48.0    0   0   0   0   0   0   0   0   0   0   0   0   0
⋮      ⋮       ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮   ⋮
¹Individual number. ²One of the traits: body weight. ³Marker genotype at each marker position: 1 = AA and 0 = Aa.
1-6 Map Functions and Marker Analysis
1. Map Functions
Obtaining the marker information, such as position and order along the
chromosomes, is very important for a QTL mapping study of an organism. The state of a
specific genetic marker is called the marker genotype. There are two marker genotype
states in a backcross (or DH) population. We can use 1 to represent the MM genotype
and 0 for Mm (−1 for mm) at marker M. Individuals sharing the same parents
may have different genotypes at the same genetic markers. These differences provide
the variation we need to statistically estimate the relationships between genetic
markers, for the purpose of resolving their linear order along the chromosomes of
the organism.
Recombination, or crossing over, during prophase I of meiosis is the reason why
individuals with the same parents may have different marker genotypes. That is,
during the production of gametes, an exchange of material between pairs of
chromosomes may occur. The resulting recombinants can be detected by laboratory
techniques and recorded as the marker genotype of each individual. There are
several facts about marker genotypes:
- The closer two markers are, the less likely a recombination event is to occur between them.
- Markers that reside on different chromosomes are unlinked.
- Two markers that never experience a recombination event between them are called
completely linked. They travel together during meiosis.
- If an even number of crossover events occurs between two genetic markers,
the recombination is undetectable.
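The number of crossovers (k) in an interval defined by two genetic markers
has a Poisson distribution with mean θ, and a recombination is observed only when k is odd. That is:
\[ \Pr(\text{recombination}) = r = \sum_{k\ \mathrm{odd}} \frac{e^{-\theta}\theta^{k}}{k!} = e^{-\theta}\left(\frac{\theta}{1!} + \frac{\theta^{3}}{3!} + \cdots\right) = e^{-\theta}\,\frac{e^{\theta} - e^{-\theta}}{2} = \frac{1}{2}\left(1 - e^{-2\theta}\right) \tag{1-1} \]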
where θ is the map distance between the two markers in Morgans (M); one Morgan
equals 100 cM (centiMorgans). Solving the above equation for θ gives Haldane’s map function:
\[ \theta = -\frac{1}{2}\ln(1 - 2r) \tag{1-2} \]
If r equals 0, θ is also 0, which is the case of complete linkage. If r
equals 0.5, θ becomes ∞, which means the markers are unlinked. This can
happen because the markers reside on different chromosomes, or because they are on
the same chromosome but far apart.
Table 1-3. Relationship between recombination frequency and map distance (M).
Recom.    0.0100   0.0500   0.1000   0.1500   0.2000   0.3000   0.4000   0.4900   0.4950
Haldane   0.0101   0.0527   0.1116   0.1783   0.2554   0.4581   0.8047   1.9560   2.3026
Kosambi   0.0100   0.0502   0.1014   0.1548   0.2118   0.3466   0.5493   1.1488   1.3233
If interference is taken into account, the Kosambi map function should be used:
\[ \theta = \frac{1}{4}\ln\frac{1 + 2r}{1 - 2r} \tag{1-3} \]
Table 1-3 shows the relationship between r and map distance under the two map
functions. Compared with the Haldane function, the Kosambi function gives a smaller
map distance for the same recombination frequency, and the difference grows as the two
markers become further apart. For very small recombination frequencies, however, both
the Haldane and the Kosambi map distances are close to the recombination frequency.
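The two map functions are simple enough to evaluate directly. The following sketch implements formulas (1-1), (1-2) and (1-3) with distances in Morgans and prints the rows of Table 1-3; the function names are chosen here for illustration only.

```python
import math

def haldane_distance(r):
    """Map distance in Morgans from recombination fraction r, formula (1-2)."""
    return -0.5 * math.log(1.0 - 2.0 * r)

def kosambi_distance(r):
    """Map distance in Morgans from recombination fraction r, formula (1-3)."""
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def haldane_recombination(theta):
    """Recombination fraction from distance in Morgans, formula (1-1)."""
    return 0.5 * (1.0 - math.exp(-2.0 * theta))

# Recombination frequencies used in Table 1-3.
for r in (0.01, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.49, 0.495):
    print(f"{r:.4f}  {haldane_distance(r):.4f}  {kosambi_distance(r):.4f}")
```

Both functions are undefined at r = 0.5, which corresponds to the unlinked case where the distance is infinite.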
2. Marker Order Analysis
It is necessary to estimate the probability of recombination between each pair of
genetic markers. Recombination that occurs in the F1 gametes is detectable in the
backcross (B1) generation. Assume we have two markers M and N, each having two
alleles, M1, M2 and N1, N2. The possible genotypes of the two genetic markers in a
B1 population are M1/M1, M1/M2 and N1/N1, N1/N2. If an
offspring’s genotype differs from the parental genotype at the markers, a
recombination event is observed. From Table 1-4 it is easy to see that the total
number of recombination events is n2 + n3. Therefore the estimate of the
recombination frequency between markers M and N is (n2+n3)/(n1+n2+n3+n4).
The maximum likelihood method can also be used to solve this problem.
The likelihood function describing this situation is
\[ L(r) = C\, r^{\,n_2+n_3}(1-r)^{\,n_1+n_4} \]
Taking the natural logarithm gives
\[ \ln L(r) = \ln C + (n_2+n_3)\ln r + (n_1+n_4)\ln(1-r) \]
Setting the partial derivative with respect to r to 0 and solving the equation for r gives
\[ \frac{\partial \ln L(r)}{\partial r} = \frac{n_2+n_3}{r} - \frac{n_1+n_4}{1-r} = 0 \quad\text{and}\quad \hat{r} = \frac{n_2+n_3}{n_1+n_2+n_3+n_4} \]
Table 1-4. The possible genotypes for markers M and N in a B1 population.
Marker genotypes   N1 / N1   N1 / N2
M1 / M1            n1        n2
M1 / M2            n3        n4
The above formula makes it easy to calculate the pairwise recombination
frequency between each pair of markers. From these calculations we can determine the
linkage groups. A linkage group is a group of markers in which each marker is linked (r
< 0.5) to at least one other marker. If a marker is not linked to any marker in a linkage
group, it does not belong to that group and most likely belongs to some other linkage
group. In theory, the number of linkage groups should equal the number of chromosomes.
However, the number of linkage groups is sometimes greater than the number of
chromosomes because of sampling variance and the limited sample size. In other words,
some of the recombination events are not detected by the experiment. In this case, it is
necessary to increase the sample size or to do more experiments.
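The estimator derived above can be checked numerically. The sketch below computes the estimate from the four cell counts of Table 1-4 and confirms on a grid that it maximizes the log-likelihood (the constant ln C is dropped because it does not depend on r). The counts in the example are hypothetical.

```python
import math

def estimate_recombination(n1, n2, n3, n4):
    """Estimate of r: recombinant classes (n2 + n3) over all individuals."""
    total = n1 + n2 + n3 + n4
    if total == 0:
        raise ValueError("no individuals observed")
    return (n2 + n3) / total

def log_likelihood(r, n1, n2, n3, n4):
    """ln L(r) without the constant ln C; requires 0 < r < 1."""
    return (n2 + n3) * math.log(r) + (n1 + n4) * math.log(1.0 - r)

# Hypothetical counts for the four genotype classes of Table 1-4.
counts = (40, 10, 8, 42)
r_hat = estimate_recombination(*counts)

# Grid search over r as an independent check of the closed-form estimate.
grid = [i / 1000 for i in range(1, 1000)]
r_grid = max(grid, key=lambda r: log_likelihood(r, *counts))
```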
Figure 1-2. A linkage group structure for the simulation study. Numbers above the markers are the distances between adjacent markers in cM; the numbers below are the marker labels.
Marker order:     2      3      5      1      4
Distance (cM):      20.3   17.6   7.8    21.0
Table 1-5. Simulation data set of marker genotypes for a backcross population.
              Markers                               Markers
Individual    1   2   3   4   5      Individual    1   2   3   4   5
1             AA  AA  AA  Aa  AA     16            AA  AA  Aa  AA  AA
2             AA  Aa  Aa  AA  AA     17            Aa  Aa  Aa  Aa  Aa
3             Aa  Aa  Aa  Aa  Aa     18            Aa  Aa  Aa  Aa  Aa
4             AA  Aa  Aa  AA  Aa     19            Aa  Aa  Aa  AA  Aa
5             AA  AA  AA  AA  AA     20            Aa  Aa  Aa  Aa  Aa
6             AA  Aa  AA  AA  AA     21            AA  AA  AA  AA  AA
7             AA  AA  Aa  Aa  Aa     22            AA  AA  AA  AA  AA
8             AA  AA  AA  AA  AA     23            Aa  Aa  Aa  Aa  Aa
9             Aa  Aa  Aa  Aa  Aa     24            Aa  AA  Aa  AA  Aa
10            AA  AA  AA  Aa  AA     25            Aa  Aa  Aa  Aa  Aa
11            Aa  Aa  Aa  Aa  Aa     26            AA  AA  AA  AA  AA
12            AA  AA  AA  AA  Aa     27            AA  Aa  Aa  AA  Aa
13            AA  Aa  Aa  AA  AA     28            AA  AA  AA  AA  AA
14            Aa  Aa  Aa  Aa  Aa     29            Aa  AA  Aa  Aa  Aa
15            AA  AA  AA  AA  AA     30            AA  AA  AA  AA  AA
Table 1-5 is a simulation data set that contains the marker genotypes of a B1 population
with 5 markers and 30 individuals, produced from the linkage structure shown in
Figure 1-2. The Haldane map function was used. The numbers of recombination events
between pairs of markers, which are the counts of changes from genotype AA to Aa or
from genotype Aa to AA, are presented in Table 1-6. Table 1-6 also includes the
recombination frequencies, which are the numbers of recombination events divided by
the total number of individuals, 30.
It is very important to know the markers’ order and positions along the linkage
group or chromosome. We can estimate this information from the table of
recombination frequencies (Table 1-6). From Table 1-6 we see that the smallest value is
0.13, which is the recombination frequency both between markers 1 and 5 and between
markers 3 and 5. Here we choose 3-5 as the starting point (1-5 could also be chosen).
We then find the smallest value either on the marker 3 side (2-3 is 0.17) or on the
marker 5 side (5-1 is 0.13), and the new order becomes 3-5-1. The next marker picked is
4 (1-4 is 0.17), and the new order is 3-5-1-4. Therefore the final order is 2-3-5-1-4.
After the marker order has been obtained, it is easy to estimate the map distances
between markers from the recombination frequencies and an appropriate map function.
For example, the recombination frequency between markers 2 and 3 is 0.17, and the
distance calculated from formula (1-2) is 20.8 cM. The final result is shown in
Figure 1-3.
Table 1-6. The counts (frequencies) of recombination events.
Markers   1         2         3         4          5
1         0(0.00)   7(0.23)   6(0.20)   5(0.17)    4(0.13)
2                   0(0.00)   5(0.17)   10(0.33)   7(0.23)
3                             0(0.00)   9(0.30)    4(0.13)
4                                       0(0.00)    7(0.23)
5                                                  0(0.00)
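The ordering procedure described in the text can be sketched as a greedy chain extension over the counts in Table 1-6. This is an illustration of that worked example, not the algorithm of any particular mapping package; the function names are hypothetical. Distances are computed from the two-decimal frequencies, as in the text, using Haldane's function (1-2).

```python
import math

# Recombination counts from Table 1-6 (30 individuals), keyed by marker pair.
COUNTS = {(1, 2): 7, (1, 3): 6, (1, 4): 5, (1, 5): 4, (2, 3): 5,
          (2, 4): 10, (2, 5): 7, (3, 4): 9, (3, 5): 4, (4, 5): 7}
N = 30

def count(a, b):
    return COUNTS[(min(a, b), max(a, b))]

def greedy_order(markers):
    """Start from the closest pair, then repeatedly attach the unplaced marker
    that is closest to either end of the current chain."""
    chain = list(min(COUNTS, key=COUNTS.get))
    remaining = set(markers) - set(chain)
    while remaining:
        # Candidate tuples: (count, marker, attach at the left end?)
        candidates = [(count(chain[0], m), m, True) for m in remaining]
        candidates += [(count(chain[-1], m), m, False) for m in remaining]
        _, marker, at_left = min(candidates)
        if at_left:
            chain.insert(0, marker)
        else:
            chain.append(marker)
        remaining.remove(marker)
    # A chain and its reverse describe the same order; report one orientation.
    return chain if chain[0] < chain[-1] else chain[::-1]

def haldane_cm(r):
    """Haldane map distance in cM for recombination frequency r."""
    return -50.0 * math.log(1.0 - 2.0 * r)

order = greedy_order([1, 2, 3, 4, 5])
distances = [round(haldane_cm(round(count(a, b) / N, 2)), 1)
             for a, b in zip(order, order[1:])]
print(order, distances)
```

With unrounded frequencies (for example 5/30 instead of 0.17) the distances come out slightly smaller, so the rounding step matters when comparing with the values quoted in the text.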
Figure 1-3. Estimated linkage group structure for the simulation data set.
Marker order:               2      3      5      1      4
Recombination frequency:      0.17   0.13   0.13   0.17
Distance (cM):                20.8   15.1   15.1   20.8
Comparing Figure 1-3 with Figure 1-2, the estimated marker order is correct, but
the distances between markers are not very accurate. This is quite reasonable
considering such a small sample size (only 30 individuals). As the sample size
increases, the estimates will be more precise.
From this simple example, it seems quite easy to obtain the marker order and the
estimators of the marker distances by counting the recombination events. However, as
the marker number increases, the problem of ordering a set of genetic markers becomes
very difficult. This problem is equivalent to the famous “Travelling Salesman
Problem”. One of the criteria for comparing two different orders is to minimize the
Sum of Adjacent Recombination Fractions (SAR). For the above example, the SAR
value for the final order 2-3-5-1-4 is 0.17+0.13+0.13+0.17 = 0.60. Another criterion
is SAL, which stands for Sum of Adjacent Likelihood Functions.
The main problem in ordering the markers is not the criterion but the
computation time. As the number of markers increases, the number of possible orders
quickly becomes computationally unmanageable. Therefore, the only practical
way to solve this problem is to find a good (not necessarily the best) order through
some kind of search procedure. Several methods have been proposed since 1980.
These methods include Branch and Bound (Thompson, 1984), Simulated Annealing
(Weeks and Lange, 1987), Seriation (Buetow and Chakravarti, 1987a, 1987b), and
Rapid Chain Delineation (Doerge, 1993). A number of software packages are available
for ordering markers and estimating distances between markers; MAPMAKER
(Lander et al. 1987) is one of them.
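For the five markers of the simulation example, the SAR criterion can still be minimized by exhaustive search, which also makes the growth in the number of orders concrete. The sketch below uses the counts of Table 1-6 and treats an order and its reverse as the same; it is an illustration only, not one of the search procedures cited above.

```python
import math
from itertools import permutations

# Recombination counts from Table 1-6 (30 individuals), keyed by marker pair.
COUNTS = {(1, 2): 7, (1, 3): 6, (1, 4): 5, (1, 5): 4, (2, 3): 5,
          (2, 4): 10, (2, 5): 7, (3, 4): 9, (3, 5): 4, (4, 5): 7}
N = 30

def sar(order):
    """Sum of adjacent recombination fractions for a marker order."""
    pairs = zip(order, order[1:])
    return sum(COUNTS[(min(a, b), max(a, b))] for a, b in pairs) / N

# Keep one orientation of each order (first label smaller than last label).
orders = [p for p in permutations(range(1, 6)) if p[0] < p[-1]]
best = min(orders, key=sar)

def n_orders(m):
    """Number of distinct orders of m markers, counting reversals once."""
    return math.factorial(m) // 2

print(best, round(sar(best), 2), len(orders), n_orders(20))
```

Five markers give only 60 distinct orders, but 20 markers already give more than 10^18, which is why heuristic search procedures are used in practice.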
3. Marker Segregation Analysis
It is also important to carry out a Mendelian segregation test for each marker in
order to detect segregation distortion. The expected segregation ratio is
1:1 for BC, DH, or RIL populations and 1:2:1 for the intercross population. In a
backcross population, a cross between AA and Aa produces the zygotes AA and
Aa, each with the same expected number n/2. Table 1-7 shows the expected and
observed numbers for the simulation data set in Table 1-5. A test statistic
can be constructed by using χ2 under the null hypothesis p(AA) = p(Aa) = 0.5
(Mendelian segregation), as shown in formula (1-4). In this example, the number of
individuals is n = 30, and n1 and n2 are the observed numbers of genotypes AA and Aa
at each marker position.
\[ \chi^2 = \sum_{i=1}^{2} \frac{(\mathrm{Obs}_i - \mathrm{Exp}_i)^2}{\mathrm{Exp}_i} = \frac{(n_1 - n_2)^2}{n} \sim \chi^2_{1} \tag{1-4} \]
Rejecting H0 means that the deviation from Mendelian segregation is significant;
this phenomenon is called segregation distortion. Segregation distortion can be caused
by sampling variation. However, it is sometimes caused by genetic factors, such as
different selection pressure on different types of zygotes. Significant segregation
distortion can bias the estimates of recombination frequency (distance) between markers.
It can also reduce the power to identify QTLs and bias the estimates of QTL
positions and effects.
Table 1-7. Marker segregation analysis for the simulation data set.
Markers                Marker 1    Marker 2    Marker 3    Marker 4    Marker 5
Genotypes              AA    Aa    AA    Aa    AA    Aa    AA    Aa    AA    Aa
Frequency under H0¹    ½     ½     ½     ½     ½     ½     ½     ½     ½     ½
Expected number        15    15    15    15    15    15    15    15    15    15
Observed number        18    12    15    15    12    18    17    13    14    16
χ2 value               1.20        0.00        1.20        0.53        0.13
p-value                >0.250      >0.995      >0.250      >0.250      >0.500
¹H0: null hypothesis.
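The χ2 values in Table 1-7 follow directly from formula (1-4). The sketch below recomputes them from the observed counts; the p-value uses the identity between the upper tail of a χ2 variable with 1 degree of freedom and the complementary error function, which is a standard result rather than something stated in the dissertation.

```python
import math

def segregation_chi2(n_aa, n_ab):
    """Chi-square statistic for a 1:1 segregation ratio, formula (1-4)."""
    n = n_aa + n_ab
    return (n_aa - n_ab) ** 2 / n

def chi2_pvalue_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Observed numbers of AA and Aa for markers 1 to 5 (Table 1-7).
observed = {1: (18, 12), 2: (15, 15), 3: (12, 18), 4: (17, 13), 5: (14, 16)}
results = {}
for marker, (n_aa, n_ab) in observed.items():
    chi2 = segregation_chi2(n_aa, n_ab)
    results[marker] = (chi2, chi2_pvalue_1df(chi2))
    print(f"marker {marker}: chi2 = {chi2:.2f}, p = {results[marker][1]:.3f}")
```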
1-7 Purpose of This Research
The purpose of QTL mapping is to identify or locate the QTLs
along the chromosomes of a species through special experimental designs and genetic
marker information. QTL information such as number, locations, and effects
can help geneticists and breeders to improve the quality and quantity of plants or
animals. However, QTL mapping methods are fundamentally based on statistical
principles. It is important to understand these principles before using a
particular QTL mapping method to analyse an experimental data set. Moreover, it is
useful to compare different QTL mapping methods in order to understand their
performance under different circumstances. This kind of
study can help users choose the appropriate QTL mapping method according to
their experimental requirements and provides a basis for understanding the results of
a QTL mapping analysis.
In this research, large-scale computer simulations have been conducted to
study and compare the performance of the major QTL mapping methods. These
methods include the Interval Mapping method, the Composite Interval Mapping method,
and the Mixed-model based CIM method. We have also conducted a series of simulation
studies to identify model selection criteria, which are a critical part of the
multiple QTL mapping methods. The computer software accompanying a particular
QTL mapping method is very important because the method is usually
too complicated to use without software. However, most existing QTL
mapping software uses a command-driven interface and is
usually not very convenient to use. We have developed a QTL mapping program
with a user-friendly interface and result visualization. The software is called
“Windows QTL Cartographer” (Wang et al. 1999); it has been posted on the Internet
and has many users.
2. Review of Major QTL Mapping Methods
2-1 One Marker Method
The one marker method is based on the simple idea that if there is an association
between marker type and trait value, it is likely that a QTL is close to that
marker locus. The approach has been applied in many studies of QTLs in various
organisms, such as Drosophila (Thoday, 1961), maize (Edwards et al., 1987) and
tomato (Weller, Soller and Brody, 1988).
Table 2-1. Trait mean and distribution for various populations.
Population   Genotype¹   Mean           Distribution
P1           MQ / MQ     µ1 = µ + a     N(µ1, σ2)
P2           mq / mq     µ2 = µ − a     N(µ2, σ2)
F1           MQ / mq     µ12 = µ + d    N(µ12, σ2)
¹M or m means marker and Q or q indicates QTL.
Table 2-2. Frequencies and mean effects for various marker-QTL genotypes in B1 population.
Genotype      MQ / MQ     MQ / Mq   MQ / mQ   MQ / mq
Frequency¹    (1−r)/2     r/2       r/2       (1−r)/2
Mean effect   µ + a       µ + d     µ + a     µ + d
¹r is the recombination frequency between marker and QTL.
1. Statistic Bases for One Marker Method
Suppose that two parental inbred lines differ sufficiently in a quantitative trait
that we are convinced there are QTLs responsible for the trait difference. Assume that
the trait values of the two parental lines and the F1 population are normally
distributed as shown in Table 2-1. The frequencies and mean effects of the various
marker-QTL genotypes in the B1 population (P1 × F1 = MQ / MQ × MQ / mq) are shown in
Table 2-2. Although we cannot observe the QTL genotypes, the marker genotypes are
observable, and the mean effects of the marker genotypes in the B1 population are
shown in Table 2-3.
If only one QTL is linked to marker M, the difference in means between the two
marker types in the B1 population is given by formula (2-1). If epistatic effects are
ignored, the mean difference when multiple QTLs are linked to marker M is
given by formula (2-2).
Table 2-3. Mean effects for various marker genotypes in B1 population.
Marker type                QQ       Qq       Mean effect
MM           Frequency¹    1 − r    r
             Effect        µ + a    µ + d    µMM = (1−r)(µ + a) + r(µ + d)
Mm           Frequency     r        1 − r
             Effect        µ + a    µ + d    µMm = r(µ + a) + (1−r)(µ + d)
¹The frequency of the various QTL genotypes; r is the recombination frequency between Q and M.
\[ \mu_{MM} - \mu_{Mm} = [(1-r)(\mu+a) + r(\mu+d)] - [r(\mu+a) + (1-r)(\mu+d)] = (1-2r)(a-d) \tag{2-1} \]
\[ \mu_{MM} - \mu_{Mm} = \sum_{k=1}^{m} (1 - 2r_{ik})(a_k - d_k) \tag{2-2} \]
2. The t-test Method
From formula (2-1), it is easy to see that if the difference in means between the two
marker genotypes is not zero, it can be inferred that $r \neq 0.5$, since it is known that
$\delta = (a - d) \neq 0$. Therefore, we can use the t-test statistic (formula 2-3) to test for
linkage between marker M and QTL Q.
Hypotheses: $H_0: \mu_{MM} - \mu_{Mm} = 0$, $H_1: \mu_{MM} - \mu_{Mm} \neq 0$
t-test statistic:
\[ t = \frac{\hat{\mu}_{MM} - \hat{\mu}_{Mm}}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t(n_1 + n_2 - 2) \tag{2-3} \]
Here $\hat{\mu}_{MM}$ and $\hat{\mu}_{Mm}$ are the sample means of the two marker classes, $n_1$ and $n_2$
represent the numbers of individuals belonging to the ‘MM’ and ‘Mm’ genotypic marker
classes, respectively, and the pooled variance is
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
where $s_1^2$ is the estimate of the variance for the ‘MM’ marker class individuals and
$s_2^2$ is the estimate of the variance for the ‘Mm’ marker class individuals.
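Formula (2-3) can be computed in a few lines. The sketch below returns the t statistic and its degrees of freedom for two lists of trait values, one per marker class. The function name and the example trait values are hypothetical.

```python
import math
from statistics import mean, variance

def marker_t_statistic(y_mm, y_mh):
    """Pooled two-sample t statistic of formula (2-3).

    y_mm: trait values of individuals with marker genotype MM
    y_mh: trait values of individuals with marker genotype Mm
    Returns (t, degrees of freedom).
    """
    n1, n2 = len(y_mm), len(y_mh)
    # Pooled variance from the two within-class sample variances.
    sp2 = ((n1 - 1) * variance(y_mm) + (n2 - 1) * variance(y_mh)) / (n1 + n2 - 2)
    t = (mean(y_mm) - mean(y_mh)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2

# Hypothetical trait values for the two marker classes.
t_value, df = marker_t_statistic([10.0, 12.0, 14.0], [7.0, 8.0, 9.0])
```

The resulting t value would then be compared with the critical value of the t distribution with the returned degrees of freedom.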
3. Likelihood Ratio Test Method
For a normally distributed variable $Y \sim N(\mu, \sigma^2)$, the likelihood for the
parameters is
\[ L(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(Y-\mu)^2}{2\sigma^2}} \]
The phenotypic distribution in the B1 population is a mixture of normal distributions:
\[ Y \sim (1-r)\,N(\mu_1, \sigma^2) + r\,N(\mu_{12}, \sigma^2) \quad \text{for the MM marker genotype} \]
\[ Y \sim r\,N(\mu_1, \sigma^2) + (1-r)\,N(\mu_{12}, \sigma^2) \quad \text{for the Mm marker genotype} \]
The likelihood function of any one marker for the backcross scenario is
\[ L(\mu_1, \mu_{12}, \sigma^2, r;\, x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^{n_1}\left[(1-r)N(\mu_1,\sigma^2) + rN(\mu_{12},\sigma^2)\right] \prod_{i=1}^{n_2}\left[rN(\mu_1,\sigma^2) + (1-r)N(\mu_{12},\sigma^2)\right] \]
The hypothesis of no linkage can be tested with the likelihood ratio statistic shown in
formula (2-4).
Hypotheses: H0: r = 0.5, Ha: r < 0.5
Likelihood ratio test statistic:
\[ \lambda = 2\ln\frac{L_a(\hat{\mu}_1, \hat{\mu}_{12}, \hat{\sigma}^2, r < 0.5)}{L_0(\hat{\mu}_1, \hat{\mu}_{12}, \hat{\sigma}^2, r = 0.5)} \tag{2-4} \]
The estimates of $\mu_1, \mu_{12}, \sigma^2$ differ depending on whether r is estimated or set to 0.5.
In practice, a set of different values of r is tried, and the LR score shows how
much more likely the data are if a QTL is present than if no QTL is present.
The peak of the LR score is then compared with the threshold value, which is derived
according to the significance level.
4. Simple Regression Method
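The simple regression model is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where i = 1, 2, …, n is the
individual index. $\beta_0$ is the overall mean and $\beta_1$ is the additive effect of the QTL
when an allele substitution is made from the recurrent parent to the non-recurrent
parent. $X_i$ is the indicator variable, which has the value ½ for an individual carrying
the non-recurrent marker (M) and –½ for an individual carrying the recurrent marker (m).
Table 2-4. Possible outcomes for one marker one QTL situation.
Genotypes   Frequency¹     X – value   Y – value
MQ / MQ     (1 – r) / 2    ½           µ1²
MQ / mQ     r / 2          –½          µ1
MQ / Mq     r / 2          ½           µ12³
MQ / mq     (1 – r) / 2    –½          µ12
¹r is the recombination frequency between the marker and the QTL. ²µ1 is the mean value of the P1 genotype. ³µ12 is the mean value of the F1 genotype.
From Table 2-4, we have:
\[ E(X) = \tfrac{1-r}{2}\left(\tfrac{1}{2}\right) + \tfrac{r}{2}\left(-\tfrac{1}{2}\right) + \tfrac{r}{2}\left(\tfrac{1}{2}\right) + \tfrac{1-r}{2}\left(-\tfrac{1}{2}\right) = 0 \]
\[ E(X^2) = \tfrac{1-r}{2}\left(\tfrac{1}{2}\right)^2 + \tfrac{r}{2}\left(-\tfrac{1}{2}\right)^2 + \tfrac{r}{2}\left(\tfrac{1}{2}\right)^2 + \tfrac{1-r}{2}\left(-\tfrac{1}{2}\right)^2 = \tfrac{1}{4} \]
\[ \sigma_X^2 = E(X^2) - [E(X)]^2 = \tfrac{1}{4} \]
Because E(X) = 0, the covariance is
\[ \sigma_{XY} = E(XY) = \tfrac{1-r}{2}\left(\tfrac{1}{2}\right)\mu_1 + \tfrac{r}{2}\left(-\tfrac{1}{2}\right)\mu_1 + \tfrac{r}{2}\left(\tfrac{1}{2}\right)\mu_{12} + \tfrac{1-r}{2}\left(-\tfrac{1}{2}\right)\mu_{12} = \tfrac{1}{4}(1-2r)(\mu_1 - \mu_{12}) = \tfrac{1}{4}(1-2r)(a-d) \]
\[ \beta_1 = \frac{\sigma_{XY}}{\sigma_X^2} = (1 - 2r_{MQ})(a - d) \]
where $r_{MQ}$ is the recombination frequency r between the marker and the QTL.
Therefore, testing whether the slope of the regression model is zero has the
same meaning as the t-test introduced above.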
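The expectation argument above can be verified numerically from the four rows of Table 2-4. The sketch below computes the population regression slope as covariance over variance and compares it with (1 − 2r)(a − d). The parameter values are hypothetical.

```python
def expected_slope(r, a, d, mu=0.0):
    """Population regression slope of Y on X implied by Table 2-4."""
    mu1, mu12 = mu + a, mu + d
    # Rows of Table 2-4 as (frequency, X value, Y value).
    rows = [((1 - r) / 2, 0.5, mu1), (r / 2, -0.5, mu1),
            (r / 2, 0.5, mu12), ((1 - r) / 2, -0.5, mu12)]
    ex = sum(f * x for f, x, _ in rows)
    ey = sum(f * y for f, _, y in rows)
    exx = sum(f * x * x for f, x, _ in rows)
    exy = sum(f * x * y for f, x, y in rows)
    var_x = exx - ex ** 2
    cov_xy = exy - ex * ey
    return cov_xy / var_x

# Hypothetical values: r = 0.1, a = 2.0, d = 0.5, mu = 10.
slope = expected_slope(0.1, 2.0, 0.5, mu=10.0)
```

At r = 0.5 the slope is zero whatever the values of a and d, which is the no-linkage case tested by the regression.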
2-2 Interval Mapping Method
When only one marker is used in QTL mapping, the QTL effects are
underestimated and the QTL position cannot be determined. In order to overcome these
drawbacks, Lander and Botstein (1989) introduced interval mapping as a
systematic way to scan the whole genome for evidence of QTL. The interval mapping
method is an extension of one marker analysis that uses two flanking markers to
construct an interval within which a putative QTL is searched for. The concept of
using complete marker linkage maps for genomic scanning of QTL is important, and
the idea of viewing QTL genotypes as missing data and using a mixture model for
maximum likelihood analysis is influential.
The basic idea of interval mapping is simple. We first consider an interval
between two observable markers M and N, each having two possible alleles in a
backcross population. The genetic distance or recombination frequency between the
two markers has been estimated previously. A map function (either Haldane or
Kosambi) is used to translate recombination frequency into distance or vice
versa. A LOD score is calculated at each increment (walking step) within the interval,
and finally the profile of LOD scores for the whole genome is obtained. When a peak
exceeds the threshold value, we declare that a QTL has been found at that location.
1. Conditional Probabilities of QTL Genotypes
The basic element upon which the formal theory of QTL mapping is built is the
probability of the QTL genotype conditional on the observed marker genotypes. From
the definition of a conditional probability, we have
\[ \Pr(Q \mid MN) = \frac{\Pr(QMN)}{\Pr(MN)} \tag{2-5} \]
The joint and marginal probabilities, Pr(QMN) and Pr(MN), are functions of the
experimental design and the linkage map. When computing joint probabilities
involving more than two loci, one must also account for recombination interference
between loci. When considering a single QTL flanked by two markers M and N, the
gamete frequencies depend on three parameters: the recombination frequency r12
between markers, the recombination frequency r1 between marker M and the QTL,
and the recombination frequency r2 between the QTL and marker N.
Table 2-5. The probability of the QTL genotype conditional on marker classes in B1 population.
Mk Class   Prob1¹         Genotype    Prob2²            Conditional³ (Prob2 / Prob1)
MN / MN    (1 – r12)/2    MQN / MQN   (1−r1)(1−r2)/2    Pr(QQ) = [(1−r1)(1−r2)] / (1– r12) ≈ 1
                          MQN / MqN   r1 r2 / 2         Pr(Qq) = r1 r2 / (1– r12) ≈ 0
MN / Mn    r12 / 2        MQN / MQn   (1−r1) r2 / 2     Pr(QQ) = (1−r1) r2 / r12 ≈ 1 - p
                          MQN / Mqn   r1(1−r2)/2        Pr(Qq) = r1(1−r2) / r12 ≈ r1 / r12 = p
MN / mN    r12 / 2        MQN / mQN   r1(1−r2)/2        Pr(QQ) = r1(1−r2) / r12 ≈ r1 / r12 = p
                          MQN / mqN   (1−r1) r2 / 2     Pr(Qq) = (1−r1) r2 / r12 ≈ 1 - p
MN / mn    (1 – r12)/2    MQN / mQn   r1 r2 / 2         Pr(QQ) = r1 r2 / (1– r12) ≈ 0
                          MQN / mqn   (1−r1)(1−r2)/2    Pr(Qq) = [(1−r1)(1−r2)] / (1– r12) ≈ 1
¹Probability of the marker class. ²Probability of the marker – QTL genotype. ³Conditional probability for the QTL genotype according to formula (2-5); here p equals r1 / r12.
Under the assumption of no interference (Haldane), the relationship
between r12 and r1, r2 is $r_{12} = r_1 + r_2 - 2r_1r_2$, while $r_{12} = r_1 + r_2$ under complete
interference (Kosambi). When r12 is small, the gamete frequencies are essentially
identical under either interference assumption. Because the QTL genotype is unobserved,
we can only use the observable marker genotypes to infer it. Table 2-5
shows the probability of the QTL genotype given the genotypes of the two flanking
markers.
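The exact conditional probabilities of Table 2-5 can be tabulated for any pair (r1, r2). The sketch below assumes no interference, so that r12 = r1 + r2 − 2 r1 r2, and returns Pr(QQ) and Pr(Qq) for each of the four flanking-marker classes. The function name and the example values are hypothetical.

```python
def qtl_conditional_probs(r1, r2):
    """Pr(QQ) and Pr(Qq) given the flanking-marker class in a B1 population.

    r1: recombination frequency between marker M and the QTL
    r2: recombination frequency between the QTL and marker N
    Assumes no interference; requires r1 + r2 > 0 so that r12 > 0.
    """
    r12 = r1 + r2 - 2.0 * r1 * r2
    return {
        "MN/MN": ((1 - r1) * (1 - r2) / (1 - r12), r1 * r2 / (1 - r12)),
        "MN/Mn": ((1 - r1) * r2 / r12, r1 * (1 - r2) / r12),
        "MN/mN": (r1 * (1 - r2) / r12, (1 - r1) * r2 / r12),
        "MN/mn": (r1 * r2 / (1 - r12), (1 - r1) * (1 - r2) / (1 - r12)),
    }

# Hypothetical position: QTL closer to marker M than to marker N.
probs = qtl_conditional_probs(0.05, 0.10)
for marker_class, (p_qq, p_qh) in probs.items():
    print(f"{marker_class}: Pr(QQ) = {p_qq:.4f}, Pr(Qq) = {p_qh:.4f}")
```

For small r1 and r2 the values for the recombinant classes approach 1 − p and p with p = r1/r12, which is the approximation given in the table.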
2. Genetic Model
For a backcross population, to analyse a QTL located in an interval flanked by
markers M and N, the interval mapping method assumes the following linear model:
\[ y_j = \mu + b^* x_j^* + e_j, \qquad j = 1, 2, \ldots, n \tag{2-6} \]
where
$b^*$ = the effect of the putative QTL,
$x_j^*$ = 1 if the QTL genotype is QQ and 0 if the QTL genotype is Qq,
$e_j \sim N(0, \sigma^2)$.
In the model, the variable x* indicates the QTL genotype, which is
unobserved. However, the probabilities of the possible QTL genotypes can be inferred
from the genotypes of the two flanking markers, as shown in Table 2-5 and summarized
in Table 2-6. For a backcross population, we can define
\[ p_{jk} = \mathrm{Prob}(x_j^* = k \mid M, N), \qquad k = 0, 1 \]
where $p = r_1/r_{12}$ and the approximation is obtained by assuming that double
recombination events can be ignored.
Table 2-6. The probabilities of possible QTL genotypes conditional on marker classes.
                         QTL genotype
Marker class   Number    QQ (1)                             Qq (0)
MN / MN        n1        (1−r1)(1−r2) / (1 − r12) ≈ 1       r1 r2 / (1 − r12) ≈ 0
MN / Mn        n2        (1−r1) r2 / r12 ≈ 1 − p            r1 (1−r2) / r12 ≈ p
MN / mN        n3        r1 (1−r2) / r12 ≈ p                (1−r1) r2 / r12 ≈ 1 − p
MN / mn        n4        r1 r2 / (1 − r12) ≈ 0              (1−r1)(1−r2) / (1 − r12) ≈ 1
3. Maximum Likelihood Analysis
For model (2-6), there are two possible QTL genotypes, each of which is true with a
certain probability. The distribution of the trait is therefore a mixture of normal
distributions, and the likelihood function can be defined as
$$ L(\mu, b^*, \sigma^2, p) = \prod_{j=1}^{n} \left[ p_{j1}\,\phi\!\left(\frac{y_j - \mu - b^*}{\sigma}\right) + p_{j0}\,\phi\!\left(\frac{y_j - \mu}{\sigma}\right) \right] \tag{2-7} $$
where $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ is the standard normal density function.
In likelihood function (2-7), the parameters include:
$\mu$ - the mean of the model
$b^*$ - the effect of the putative QTL
$p = r_1 / r_{12}$ - the position of the putative QTL relative to the flanking markers
$\sigma^2$ - the residual variance of the model
The data of the analysis include:
$y_j$ - the phenotypic value of a quantitative trait for each individual
Genotypes of markers for each individual, which contribute to the analysis through
$p_{jk}$, k = 0, 1; j = 1, 2, …, n
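The maximum likelihood analysis of a mixture model is usually carried out through an
Expectation-Maximization (EM) algorithm. EM is an iterative procedure, and the E-step
for likelihood function (2-7) is to calculate:
$$ P_j = \frac{p_{j1}\,\phi[(y_j - \mu - b^*)/\sigma]}{p_{j1}\,\phi[(y_j - \mu - b^*)/\sigma] + p_{j0}\,\phi[(y_j - \mu)/\sigma]} $$
The M-step is to calculate:
$$ \hat{\mu} = \sum_{j=1}^{n} (y_j - P_j b^*) \Big/ n $$
$$ \hat{b}^* = \sum_{j=1}^{n} (y_j - \mu) P_j \Big/ \sum_{j=1}^{n} P_j $$
$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} \left[ (y_j - \mu)^2 - P_j b^{*2} \right] $$
where the quantities on the right-hand sides are evaluated at the current estimates.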
This process is iterated until convergence of estimates.
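The E-step and M-step above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study: the function name `em_interval_mapping`, the starting values and the stopping rule are assumptions, and each M-step update uses the most recently updated values of the other parameters.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def em_interval_mapping(y, p1, max_iter=500, tol=1e-10):
    """EM estimates (mu, b, sigma2) for model (2-6) at one test position.

    y  : trait values
    p1 : p1[j] = Pr(x*_j = 1 | flanking markers), so p_j0 = 1 - p1[j]
    """
    n = len(y)
    mu = sum(y) / n                      # assumed starting values
    b = 0.0
    s2 = sum((v - mu) ** 2 for v in y) / n
    for _ in range(max_iter):
        s = math.sqrt(s2)
        # E-step: posterior probability that individual j carries QQ
        post = []
        for yj, pj in zip(y, p1):
            num = pj * phi((yj - mu - b) / s)
            den = num + (1.0 - pj) * phi((yj - mu) / s)
            post.append(num / den)
        # M-step: update mu, then b, then sigma^2
        mu_new = sum(yj - Pj * b for yj, Pj in zip(y, post)) / n
        b_new = sum((yj - mu_new) * Pj for yj, Pj in zip(y, post)) / sum(post)
        s2_new = sum((yj - mu_new) ** 2 - Pj * b_new ** 2
                     for yj, Pj in zip(y, post)) / n
        change = max(abs(mu_new - mu), abs(b_new - b), abs(s2_new - s2))
        mu, b, s2 = mu_new, b_new, s2_new
        if change < tol:
            break
    return mu, b, s2
```

When the QTL genotype is known exactly (all `p1[j]` equal to 0 or 1), the estimates reduce to the mean of the Qq group and the difference between the two group means, which gives a simple check of the updates.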
4. Likelihood Ratio Test
The test statistic can be constructed from a likelihood ratio expressed as a LOD
(logarithm of odds) score:
$$ LOD = -\log_{10} \frac{L(\hat{\mu}, b^* = 0, \hat{\sigma}^2)}{L(\hat{\mu}, \hat{b}^*, \hat{\sigma}^2)} $$
under the hypotheses
$$ H_0: b^* = 0 \quad \text{and} \quad H_1: b^* \neq 0 $$
By assuming that the putative QTL is located at the position indicated by
$p = r_1 / r_{12}$, we can get the maximum likelihood estimates of $\mu$, $b^*$,
$\sigma^2$ under $H_1$ as $\hat{\mu}$, $\hat{b}^*$, $\hat{\sigma}^2$, and under $H_0$ as
$\hat{\mu}$, $\hat{\sigma}^2$ with $b^*$ constrained to zero. The LOD score test
is essentially the same test as the usual likelihood ratio test:
$$ LR = -2 \ln \frac{L(\hat{\mu}, b^* = 0, \hat{\sigma}^2)}{L(\hat{\mu}, \hat{b}^*, \hat{\sigma}^2)} $$
The relationship between the LOD value and the LR value is
$$ LOD = \tfrac{1}{2} (\log_{10} e)\, LR = 0.217\, LR $$
The test can be performed at any position covered by markers, and thus the method
provides a systematic strategy for searching for QTL. The amount of support for a QTL
at a particular map position is often displayed graphically as a likelihood profile,
which plots the likelihood ratio test statistic as a function of the map position of
the putative QTL. If the LOD score in a region exceeds a pre-defined critical
threshold, a QTL is indicated in the neighbourhood of the maximum of the LOD score,
with the width of the neighbourhood defined by a one- or two-LOD support interval
(Lander and Botstein 1989). By the property of maximum likelihood analysis, the
estimates of locations and effects of QTL are asymptotically unbiased if the
assumption that there is at most one QTL on a chromosome is true.
The test statistic LR for a given position is expected to be asymptotically
chi-square distributed with one degree of freedom under the null hypothesis for the
backcross design and with two degrees of freedom for the F2 design (Lander and
Botstein 1989, Van Ooijen 1992, Zeng 1994). However, because the test is usually
performed over the whole genome, there is a multiple testing problem. The distribution
of the maximum LR or LOD score over the whole genome under the null hypothesis
becomes very complicated. An asymptotic theory for determining appropriate
genome-wise critical values, based on an Ornstein-Uhlenbeck diffusion process, has
been developed by Lander and Botstein (1989), Feingold et al. (1993) and Lander and
Schork (1994). Lander and Botstein (1989) suggested that a typical LOD score
threshold should be between 2 and 3 to ensure a 5% overall false positive error for
detecting QTL.
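The conversion between LOD and LR, and the pointwise (single-position) significance of an LR value for the backcross design, can be sketched as follows. The helper names are illustrative; the tail probability uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable, and it does not address the genome-wide multiple testing problem discussed above.

```python
import math

def lr_to_lod(lr):
    """LOD = (1/2) * log10(e) * LR, i.e. about 0.217 * LR."""
    return lr / (2.0 * math.log(10.0))

def lod_to_lr(lod):
    """Inverse conversion: LR = 2 * ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod

def chi2_sf_df1(x):
    """Pointwise P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

# Pointwise p-value for a LOD score of 3 in a backcross (1 df).
p_pointwise = chi2_sf_df1(lod_to_lr(3.0))
```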
2-3 Composite Interval Mapping
For the interval mapping method, the estimated locations and effects of QTL tend to
be asymptotically unbiased if there is only one segregating QTL on a chromosome.
However, if there is more than one QTL on a chromosome, the test statistic at the
position being tested will be affected by all those QTL, and the estimated positions
and effects of QTL identified by this method are likely to be biased (the 'ghost QTL'
problem). One of the reasons for these shortcomings is that the test used in the
interval mapping method is not an interval test. In an interval test, the effect of
the QTL within a defined interval is independent of the effects of QTL outside the
interval. Without this property, even when there is no QTL within an interval, the
likelihood profile on the interval can still exceed the threshold if there is a QTL
in some nearby region on the same chromosome.
In order to overcome the shortcoming of interval mapping method, Zeng (1994)
proposed an improved method called composite interval mapping by combining
interval mapping with multiple regression analysis. Let us first review some relevant
theory in multiple regression analysis for QTL mapping (Zeng 1993).
1. Properties of Multiple Regression Analysis
Due to the linear arrangement of genes on chromosomes, multiple regression analysis
has a very important property: the partial regression coefficient of a trait on a
marker is expected to depend only on those QTLs that are located in the interval
bracketed by the two neighbouring markers. It is independent of any other QTL outside
that region if there is no crossing-over interference and no epistasis. However,
interference and epistasis will introduce non-linearity in the model.
Suppose we regress the trait value y on t markers observed in a B1 population:
$$ y_j = \mu + \sum_{k=1}^{t} b_k x_{jk} + e_j $$
where $x_{jk}$ is the indicator value (1 or 0) of the k-th marker in the j-th
individual, and $b_k$ is the partial regression coefficient of the phenotype y on the
k-th marker conditional on all other markers. $b_k$ can also be denoted as
$b_{yk \cdot s_k}$, where $s_k$ denotes the set that includes all markers except the
k-th marker.
Since $x_{jk}$ takes a value of 1 or 0 with equal probability, the variance of the
k-th marker in the population is $\sigma_k^2 = 1/4$. It is easy to show that the
covariance between the i-th and k-th markers is $\sigma_{ik} = (1 - 2 r_{ik})/4$, where
$r_{ik}$ is the recombination frequency between marker i and marker k. The covariance
between the trait value y and the k-th marker is:
$$ \sigma_{yk} = \sum_{u=1}^{m} (1 - 2 r_{uk})\, \delta_u / 4 $$
where $\delta_u$ is the effect of the u-th QTL.
With these basic equations, any conditional variance and covariance can be
derived. The variance of marker k conditional on marker i is:
$$ \sigma^2_{k \cdot i} = \sigma_k^2 - \sigma_{ik}^2 / \sigma_i^2 = \left[ 1 - (1 - 2 r_{ik})^2 \right] / 4 = r_{ik} (1 - r_{ik}) $$
Because, without interference, we have
$$ (1 - 2 r_{ik}) = (1 - 2 r_{il})(1 - 2 r_{lk}) \quad \text{for order } ilk \text{ or } kli $$
the covariance between markers i and k conditional on marker l is:
$$ \sigma_{ik \cdot l} = \sigma_{ik} - \sigma_{il}\sigma_{lk} / \sigma_l^2 = \left[ (1 - 2 r_{ik}) - (1 - 2 r_{il})(1 - 2 r_{lk}) \right] / 4 = \begin{cases} 0 & \text{for order } ilk \text{ or } kli \\ r_{kl}(1 - r_{kl})(1 - 2 r_{ik}) & \text{for order } ikl \text{ or } lki \\ r_{il}(1 - r_{il})(1 - 2 r_{ik}) & \text{for order } lik \text{ or } kil \end{cases} $$
The above result shows that, conditional on an intermediate marker, the covariance
between two flanking markers is expected to be zero. From this property Zeng (1993)
shows:
$$ b_{yk \cdot s_k} = \sum_{k-1 < u \le k} \frac{r_{(k-1)u}\,(1 - r_{(k-1)u})\,(1 - 2 r_{uk})}{r_{(k-1)k}\,(1 - r_{(k-1)k})}\, a_u + \sum_{k < u \le k+1} \frac{r_{u(k+1)}\,(1 - r_{u(k+1)})\,(1 - 2 r_{ku})}{r_{k(k+1)}\,(1 - r_{k(k+1)})}\, a_u $$
where $a_u$ denotes the effect of the u-th QTL, the first summation is over all QTLs
located between markers k-1 and k, and the second summation is over all QTLs located
between markers k and k+1. This is a very desirable property: the regression
coefficient depends only on those QTLs that are located between markers k-1 and k+1.
This property can be used to create an interval test in which we can test whether
there are QTLs within a marker interval.
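The conditional covariance results above can be checked numerically from the marker covariances alone. The sketch below assumes the Haldane map function and loci identified by their map positions in cM; the function names are illustrative.

```python
import math

def r_haldane(d_cm):
    """Recombination frequency for a distance in cM under no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def cov(pos_a, pos_b):
    """Covariance of two 0/1 backcross indicators at the given positions (cM)."""
    return (1.0 - 2.0 * r_haldane(abs(pos_a - pos_b))) / 4.0

def cond_cov(pos_i, pos_k, pos_l):
    """Covariance between loci i and k conditional on locus l."""
    return cov(pos_i, pos_k) - cov(pos_i, pos_l) * cov(pos_l, pos_k) / cov(pos_l, pos_l)

# Order i-l-k: the conditioning locus lies between i and k.
middle = cond_cov(0.0, 20.0, 10.0)

# Order l-i-k: the conditioning locus lies outside the pair.
outside = cond_cov(10.0, 20.0, 0.0)
r_il, r_ik = r_haldane(10.0), r_haldane(10.0)
closed_form = r_il * (1.0 - r_il) * (1.0 - 2.0 * r_ik)
```

Here `middle` vanishes up to rounding error, while `outside` agrees with the closed-form expression for order lik.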
There are also other properties of multiple regression that have direct
relevance to QTL mapping. These are summarized as follows:
Conditioning on unlinked markers in the multiple regression analysis will reduce
the sampling variance of the test statistic by controlling some residual genetic
variation and thus will increase the power of QTL mapping.
Conditioning on linked markers in the multiple regression analysis will reduce the
chance of interference of possible multiple linked QTL on hypothesis testing and
parameter estimation, but with a possible increase of sampling variance.
Two sample partial regression coefficients of the trait value on two markers in a
multiple regression analysis are generally uncorrelated unless the two markers are
adjacent markers.
2. Genetic Model
Composite interval mapping is an extension of interval mapping in which some
selected markers are also fitted in the model as cofactors to control the genetic
variation of other possibly linked or unlinked QTL. To test for a QTL in an interval
between adjacent markers $M_i$ and $M_{i+1}$, the model is:
$$ y_j = \mu + b^* x_j^* + \sum_k b_k x_{jk} + e_j \tag{2-8} $$
where $x_j^*$ refers to the putative QTL and $x_{jk}$ refers to those markers selected
for genetic background control. Appropriate selection of markers as cofactors is
important and is discussed later.
3. Likelihood Analysis
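The likelihood function of formula (2-8) is specified as:
$$ L(b^*, B, \sigma^2) = \prod_{j=1}^{n} \left[ p_{j1}\,\phi\!\left(\frac{y_j - X_j B - b^*}{\sigma}\right) + p_{j0}\,\phi\!\left(\frac{y_j - X_j B}{\sigma}\right) \right] $$
where $X_j B = \mu + \sum_k b_k x_{jk}$. The maximum likelihood estimates of the
various parameters, obtained with the EM algorithm, are given below:
$$ P_j = \frac{p_{j1}\,\phi[(y_j - X_j B - b^*)/\sigma]}{p_{j1}\,\phi[(y_j - X_j B - b^*)/\sigma] + p_{j0}\,\phi[(y_j - X_j B)/\sigma]} $$
$$ \hat{b}^* = \sum_{j=1}^{n} (y_j - X_j \hat{B}) P_j \Big/ \sum_{j=1}^{n} P_j = (Y - X\hat{B})' P / c $$
where $c = \sum_{j=1}^{n} P_j$, $P = \{P_j\}_{n \times 1}$, $Y = \{y_j\}_{n \times 1}$, and the
prime denotes matrix transposition.
$$ \hat{B} = (X'X)^{-1} X' (Y - P \hat{b}^*) $$
$$ \hat{\sigma}^2 = \left[ (Y - X\hat{B})'(Y - X\hat{B}) - \hat{b}^{*2} c \right] / n $$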
4. Hypothesis Test
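The hypotheses to be tested are $H_0: b^* = 0$ and $H_1: b^* \neq 0$. The likelihood
function under the null hypothesis is:
$$ L(b^* = 0, B, \sigma^2) = \prod_{j=1}^{n} \phi\!\left(\frac{y_j - X_j B}{\sigma}\right) $$
The maximum likelihood estimates of $B$ and $\sigma^2$ are:
$$ \hat{B} = (X'X)^{-1} X'Y \quad \text{and} \quad \hat{\sigma}^2 = (Y - X\hat{B})'(Y - X\hat{B}) / n $$
The likelihood ratio (LR) test statistic is:
$$ LR = -2 \ln \frac{L(b^* = 0, \hat{B}, \hat{\sigma}^2)}{L(\hat{b}^*, \hat{B}, \hat{\sigma}^2)} $$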
As with the interval mapping method, the test can be performed at any position in a
genome covered by markers, so it is easy to perform a systematic search for QTLs
across the genome. Because the test statistics for different intervals are almost
independent, a test on each interval is more likely to test for a single QTL only.
5. Marker Selection
The main difficulty in using the composite interval mapping method is deciding
which markers should be added to the model before searching for QTL. There is no
simple solution to this question because the answer depends on the number and
positions of the underlying QTLs, and this information is not available before QTL
mapping. Selecting too few markers may not achieve the purpose of removing most of
the residual genetic variation, while selecting too many markers may reduce the power
of the analysis.
The practical implementation of marker selection in the QTL Cartographer software
has two steps. In the first step, a selection procedure such as forward, backward, or
stepwise regression selects $n_p$ markers that are significantly associated with the
trait. In the second step, a testing window is defined to block the markers inside the
window from being used in the test model. The window is constructed using a parameter
$W_s$, which is the distance (cM) between the testing interval (one for each
direction) and the nearest marker picked for the model. Then, those of the $n_p$
selected markers that are outside the testing window are also fitted into the model
to reduce the residual variance.
Different conditions of composite interval mapping can be created by changing the
values of $n_p$ and $W_s$. Generally $n_p$ should be much smaller than n, not
exceeding $2\sqrt{n}$ (Jansen 1994), or alternatively it can be determined
automatically by an F-to-enter or F-to-drop criterion in the forward or backward
regression analysis. $W_s$ should be at least 10 or 15 cM, depending on sample size.
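The second step of the marker selection, the testing window, can be sketched as a simple filter. This is only an illustration of the rule described above, not code from QTL Cartographer; the function name `pick_cofactors` and the data layout (tuples of chromosome and position in cM) are assumptions.

```python
def pick_cofactors(selected, interval, window_cm=10.0):
    """Return the pre-selected markers that stay in the model as cofactors.

    selected  : list of (chromosome, position_cm) chosen in the first step
    interval  : (chromosome, left_cm, right_cm) of the interval being tested
    window_cm : W_s, the blocked distance on each side of the testing interval
    """
    chrom, left, right = interval
    lo, hi = left - window_cm, right + window_cm
    # Markers on other chromosomes are always kept; markers on the tested
    # chromosome are kept only if they fall outside the window.
    return [(c, p) for (c, p) in selected if c != chrom or p < lo or p > hi]

# Hypothetical example: five pre-selected markers, test interval 40-50 cM on chromosome 1.
selected = [(1, 5.0), (1, 30.0), (1, 48.0), (1, 80.0), (2, 12.0)]
cofactors = pick_cofactors(selected, (1, 40.0, 50.0), window_cm=10.0)
```

A larger window removes more linked cofactors, which lowers the risk of absorbing the effect of the QTL under test but gives less protection against other linked QTL.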
2-4 Mixed Linear Model Approach
As introduced above, the CIM method is based on fixed-effect multiple regression
models. Zhu (1998) suggested a new methodology for mapping QTL using a mixed linear
model approach, called mixed-model based composite interval mapping (MCIM). Unlike
the CIM method, the MCIM method treats the marker effects as random effects. The
obvious advantage is that the model can be extended easily to more complicated QTL
mapping situations, such as QTL by environment interaction and QTL epistasis.
1. Genetic Model
For a B1 or DH population, to analyze a QTL located in an interval flanked by markers
$M_{i-}$ and $M_{i+}$, the MCIM method assumes the following model:
$$ y_j = \mu + a\, x_{A(j)} + \sum_k u_{M(kj)}\, e_{M(k)} + \varepsilon_j $$
where $y_j$ is the trait value for individual j, $\mu$ is the population mean, $a$ is
the additive effect of the putative QTL and $x_{A(j)}$ is the coefficient for the
additive effect, $e_{M(k)}$ is the random effect of marker k with its coefficient
$u_{M(kj)}$, and $\varepsilon_j$ is the random residual effect.
The model can also be expressed in mixed linear model form as follows:
$$ \begin{aligned} y &= Xb + U_M e_M + e_\varepsilon = Xb + \sum_{u=1}^{2} U_u e_u \sim N(Xb, V) \\ V &= \sigma_M^2\, U_M R_M U_M' + \sigma_e^2\, I = \sum_{u=1}^{2} \sigma_u^2\, U_u R_u U_u' \end{aligned} \tag{2-9} $$
where V is the variance of the model, $\sigma_1^2 = \sigma_M^2$ is the variance
component of markers and $\sigma_2^2 = \sigma_e^2$ is the residual variance component.
$R_1 = R_M$ is a known symmetric matrix of correlation coefficients and $R_2 = I$ is
an identity matrix:
$$ R_M = [\rho_{ff'}] \qquad (f, f' = 1, 2, \ldots, m) $$
In the above formula, m is the number of markers selected for background control and
$\rho_{ff'} = 1 - 2 r_{ff'}$ is the correlation coefficient between marker effects
$e_{M_f}$ and $e_{M_{f'}}$, where $r_{ff'}$ is the recombination frequency between
marker loci f and f'.
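The correlation matrix of the marker effects can be built directly from the marker map. The sketch below is illustrative only: it assumes the Haldane map function, treats markers on different chromosomes as unlinked (recombination frequency 0.5), and uses hypothetical names and a hypothetical three-marker map.

```python
import math

def haldane_r(d_cm):
    """Recombination frequency for a distance in cM under no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def marker_corr_matrix(markers):
    """Build R_M = [rho_ff'] with rho_ff' = 1 - 2 r_ff'.

    markers : list of (chromosome, position_cm)
    """
    m = len(markers)
    R = [[0.0] * m for _ in range(m)]
    for f in range(m):
        for g in range(m):
            (cf, pf), (cg, pg) = markers[f], markers[g]
            # Unlinked markers have r = 0.5 and therefore zero correlation.
            r = haldane_r(abs(pf - pg)) if cf == cg else 0.5
            R[f][g] = 1.0 - 2.0 * r
    return R

R_M = marker_corr_matrix([(1, 0.0), (1, 10.0), (2, 5.0)])
```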
2. Likelihood Analysis
The logarithm of the likelihood function for formula (2-9) is specified as:
$$ l(b, V) = \log L(b, V) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln|V| - \frac{1}{2} (y - Xb)' V^{-1} (y - Xb) \tag{2-10} $$
where the variance V of the model can be calculated according to formula (2-9) and the
variance components $\sigma_u^2$ can be estimated by the MINQUE-1 method (Rao 1971;
Rao 1997) or the REML method (Hartley and Rao 1967, Searle, 1970).
The estimate of the QTL effects b is obtained by the formula:
$$ \hat{b} = (X' V^{-1} X)^{-1} X' V^{-1} y \tag{2-11} $$
3. Hypothesis Test
As in the IM and CIM methods, putative QTL are searched for between each pair of
flanking markers $M_{i-}$ and $M_{i+}$ over the whole genome by setting a prior value
for the recombination frequency $\hat{r}_{M_{i-}Q}$ between marker $M_{i-}$ and the
putative QTL locus Q. The likelihood ratio statistic (LR) can be calculated by:
$$ LR = 2\, l_1(\hat{b}, \hat{V}, \hat{r}_{M_{i-}Q}) - 2\, l_0(\hat{b}, \hat{V}, r_{M_{i-}Q} = 0.5) \tag{2-12} $$
The LR profile for the whole genome can therefore be plotted, and the QTLs can be
located according to the LR profile.
4. A Model for GE Interaction
If a QTL mapping experiment is conducted in several environments with
individuals sampled from the same DH population, QTL genetic main effects and GE
interaction effects can be evaluated by the MCIM method. When experimental data
obtained from different environments need to be analyzed, environment effects are
usually treated as random effects. The additive model (2-9) can be expanded to
include environment and replication effects as well as the interactions of the
additive and marker effects with environments.
The trait value measured on the j-th individual in the h-th environment and k-th
replication can be expressed as:
$$ y_{hjk} = \mu + a\, x_{A(j)} + u_{E(hj)}\, e_{E(h)} + u_{AE(hj)}\, e_{AE(h)} + \sum_f u_{M(fj)}\, e_{M(f)} + \sum_l u_{ME(lhj)}\, e_{ME(lh)} + u_{B(k)}\, e_{B(k)} + \varepsilon_{hjk} $$
where $\mu$ is the population mean, $a$ is the additive main effect of the QTL being
searched for and $x_{A(j)}$ is the coefficient for the genetic main additive effect,
$e_{E(h)}$ is the environment effect with its coefficient $u_{E(hj)}$, $e_{AE(h)}$ is
the additive by environment interaction effect with its coefficient $u_{AE(hj)}$,
$e_{M(f)}$ is the random main effect over environments for the f-th marker genotype
with its coefficient $u_{M(fj)}$, $e_{ME(lh)}$ is the marker by environment
interaction effect with its coefficient $u_{ME(lhj)}$, $e_{B(k)}$ is the replication
effect with its coefficient $u_{B(k)}$, and $\varepsilon_{hjk}$ is the random residual
effect.
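The model can also be expressed in mixed linear model form:
$$ \begin{aligned} y &= Xb + U_E e_E + U_{AE} e_{AE} + U_M e_M + U_{ME} e_{ME} + U_B e_B + e_\varepsilon = Xb + \sum_{u=1}^{6} U_u e_u \sim N(Xb, V) \\ V &= \sigma_E^2\, U_E U_E' + \sigma_{AE}^2\, U_{AE} U_{AE}' + \sigma_M^2\, U_M R_M U_M' + \sigma_{ME}^2\, U_{ME} R_{ME} U_{ME}' + \sigma_B^2\, U_B U_B' + \sigma_e^2\, I = \sum_{u=1}^{6} \sigma_u^2\, U_u R_u U_u' \end{aligned} \tag{2-13} $$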
By using model (2-13) and formula (2-10), QTLs can be searched for (according to
the LR values) by mixed linear model approaches using the data for all individuals
across multiple environments and replications. When a QTL is found, its position on
the chromosome and its genetic main effects (formula 2-11), as well as its GE
interaction effects, can be obtained by the mixed linear model approaches. GE
interaction effects can be predicted by the BLUP method (Zhu 1999):
$$ \hat{e}_u = \hat{\sigma}_u^2\, R_u U_u' Q\, y, \qquad Q = V^{-1} - V^{-1} X (X' V^{-1} X)^{+} X' V^{-1} $$
5. A Model for QTL Epistasis
The following model can be used for a two-way search for QTLs with digenic
epistatic effects when the population is B1 or DH:
$$ y_k = \mu + a_i x_{ik} + a_j x_{jk} + aa_{ij}\, w_{ijk} + \sum_{f=1}^{m} u_{M(fk)}\, e_{M(f)} + \sum_{h=1}^{mm} u_{MM(hk)}\, e_{MM(h)} + \varepsilon_k $$
where $y_k$ is the trait value of individual k and $\mu$ is the mean of the
population. $a_i$ and $a_j$ are the additive effects of the two putative QTLs i and j
at the two testing points $p_i$ and $p_j$. $aa_{ij}$ is the digenic additive by
additive epistatic effect between QTLs i and j. $x_{ik}$, $x_{jk}$ and $w_{ijk}$ are
the coefficients for the effects of QTL i, QTL j, and QTL epistasis respectively.
$e_{M(f)}$ is the random effect of marker f and $e_{MM(h)}$ is the random effect of
the two-locus marker interaction between two markers; $u_{M(fk)}$ and $u_{MM(hk)}$
are their coefficients. $\varepsilon_k$ is the random residual effect.
The model can also be expressed in the following mixed linear model form:
$$ \begin{aligned} y &= Xb + U_M e_M + U_{MM} e_{MM} + e_\varepsilon = Xb + \sum_{u=1}^{3} U_u e_u \sim N(Xb, V) \\ V &= \sigma_M^2\, U_M R_M U_M' + \sigma_{MM}^2\, U_{MM} R_{MM} U_{MM}' + \sigma_e^2\, I = \sum_{u=1}^{3} \sigma_u^2\, U_u R_u U_u' \end{aligned} \tag{2-14} $$
where $R_{MM}$ is a known symmetric matrix of correlation coefficients for marker
interactions:
$$ R_{MM} = [\rho_{hh'}] \qquad (h, h' = 1, 2, \ldots, mm) $$
and
$$ \rho_{hh'} = \rho_{ij\_i'j'} = \frac{(1 - 2 r_{ab})(1 - 2 r_{cd}) - (1 - 2 r_{ij})(1 - 2 r_{i'j'})}{4 \sqrt{r_{ij}(1 - r_{ij})\, r_{i'j'}(1 - r_{i'j'})}}, \qquad i < j,\ i' < j' $$
The set (i, j, i', j') is equal to the set (a, b, c, d), with a < b < c < d in
whole-genome order.
The basic idea of mapping QTLs with marginal and two-way epistatic effects is a
two-dimensional search along the whole genome. For each pair of testing points $p_i$
and $p_j$, each located within an interval flanked by two markers, the LR value can
be calculated using formula (2-12) and the QTLs can be located by analysing the LR
profile.
2-5 Multiple Interval Mapping
Multiple interval mapping (MIM) is a multiple-QTL oriented method that combines
QTL mapping with the analysis of the genetic architecture of quantitative traits,
using a search algorithm to find the number, positions, effects and interactions of
significant QTL simultaneously. For m putative QTL, the multiple interval mapping
model for a B1 population is defined by:
$$ y_i = \mu + \sum_{r=1}^{m} \alpha_r x_{ir}^* + \sum_{r<s}^{t} \beta_{rs} \left( x_{ir}^* x_{is}^* \right) + e_i \tag{2-15} $$
where $y_i$ is the trait value of individual i and $\mu$ is the mean of the model.
$\alpha_r$ is the additive (marginal) effect of putative QTL r and $x_{ir}^*$ is its
coefficient, which is unobserved but can be inferred from marker data in a
probabilistic sense. $\beta_{rs}$ is the epistatic effect between putative QTL r and
s, t is the number of significant pairwise epistatic effects, and $e_i$ is the random
residual effect.
The likelihood function of the data given model (2-15) is a mixture of normal
distributions:
$$ L(E, \mu, \sigma^2) = \prod_{i=1}^{n} \left[ \sum_{j=1}^{2^m} p_{ij}\, \phi\!\left( y_i \mid \mu + D_j E,\ \sigma^2 \right) \right] $$
where $p_{ij}$ is the probability of each multilocus genotype conditional on marker
data, $E$ is a vector of QTL parameters ($\alpha$'s and $\beta$'s), $D_j$ is a vector
specifying the configuration of $x^*$'s associated with each $\alpha$ and $\beta$ for
the j-th QTL genotype, $\phi(y \mid \mu, \sigma^2)$ denotes a normal density function
for y with mean $\mu$ and variance $\sigma^2$, and n is the number of individuals.
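The mixture likelihood of the MIM model can be evaluated by enumerating the $2^m$ multilocus QTL genotypes. The sketch below is illustrative, not the algorithm of Kao and Zeng: the conditional probabilities $p_{ij}$ are taken as given, the genotype coding (½, −½) is an assumption passed in as a parameter, and all names are hypothetical.

```python
import itertools
import math

def normal_pdf(y, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mim_log_likelihood(y, p, mu, alpha, beta, var, codes=(0.5, -0.5)):
    """Log-likelihood of the MIM mixture model.

    y     : trait values
    p     : p[i][j] = Pr(genotype j | markers), genotypes in itertools.product order
    alpha : list of m marginal effects
    beta  : dict {(r, s): epistatic effect} with r < s (0-based QTL indices)
    codes : the two values taken by x* (assumed coding)
    """
    m = len(alpha)
    genotypes = list(itertools.product(codes, repeat=m))   # 2^m configurations
    means = [
        mu
        + sum(a * x for a, x in zip(alpha, g))
        + sum(b * g[r] * g[s] for (r, s), b in beta.items())
        for g in genotypes
    ]
    return sum(
        math.log(sum(pij * normal_pdf(yi, mj, var) for pij, mj in zip(pi, means)))
        for yi, pi in zip(y, p)
    )
```

The number of mixture components doubles with every added QTL, which is why an efficient search strategy matters for this method.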
The MIM method consists of the following four components:
- An evaluation procedure designed to analyse the likelihood of the data given a
genetic model (number, positions and epistasis of QTL) (Kao and Zeng 1997).
- A search strategy optimised to select the best (or a better) genetic model in the
parameter space.
- An estimation procedure for all parameters of the genetic architecture of the
quantitative traits simultaneously, given the selected genetic model.
- A prediction procedure to estimate or predict the genotypic values of individuals
based on the selected genetic model and estimated genetic parameter values for
marker-assisted selection.
Among these components, the second is the critical part of the MIM method. In the
next chapter, simulation studies are conducted on the selection criteria used in the
model selection framework.
3. Simulation Studies
3-1 Simulation Model and Data
In this section, the model and method for producing simulation data for QTL
mapping experiments are discussed. The simulation data include two parts: mapping
information and QTL information.
1. Genetic Model for Simulation
The following is a general genetic model for a B1 or DH population with m QTLs:
$$ y_i = \mu + \sum_{r=1}^{m} \alpha_r x_{ir} + \sum_{r<s \subset (1 \ldots m)} \beta_{rs} \left( x_{ir} x_{is} \right) + e_i \tag{3-1} $$
where $y_i$ is the trait value of individual i and i is the index of the individual
in the population (i = 1, 2, …, n). $\mu$ is the mean of the model. $\alpha_r$ is the
marginal effect of QTL r and $x_{ir}$ is an indicator variable denoting the genotype
of QTL r. $x_{ir}$ takes the values ½ and −½ for a B1 population and 1 and −1 for a
DH population. $\beta_{rs}$ is the epistatic effect between QTL r and QTL s, and m is
the number of QTLs chosen for simulation. $e_i$ is the residual effect of the model,
assumed to be normally distributed with mean zero and variance $\sigma^2 = V_e$.
The variance of model (3-1) can be partitioned into several components, such as
additive variance, epistatic variance, and residual variance. Formulas (3-2A) and
(3-3A) give the additive and epistatic variances for a B1 population, and formulas
(3-2B) and (3-3B) give them for a DH population.
$$ \mathrm{Var}(y) = \mathrm{Var}(G) + \mathrm{Var}(e) = E(G^2) - [E(G)]^2 + V_e = V_A + V_I + V_e $$
$$ V_A = \frac{1}{4} \sum_i \alpha_i^2 + \frac{1}{2} \sum_{i<j} \alpha_i \alpha_j (1 - 2 r_{ij}) \tag{3-2A} $$
$$ V_A = \sum_i \alpha_i^2 + 2 \sum_{i<j} \alpha_i \alpha_j (1 - 2 r_{ij}) \tag{3-2B} $$
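$$ V_I = \frac{1}{16} \left[ \sum_{i<j,\ k<l} \beta_{ij} \beta_{kl} (1 - 2 r_{ij})(1 - 2 r_{kl}) - \Big( \sum_{i<j} \beta_{ij} (1 - 2 r_{ij}) \Big)^2 \right] \tag{3-3A} $$
$$ V_I = \frac{1}{4} \left[ \sum_{i<j,\ k<l} \beta_{ij} \beta_{kl} (1 - 2 r_{ij})(1 - 2 r_{kl}) - \Big( \sum_{i<j} \beta_{ij} (1 - 2 r_{ij}) \Big)^2 \right] \tag{3-3B} $$
where $r_{ij}$ is the recombination frequency between QTL i and QTL j.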
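The additive variance of formulas (3-2A) and (3-2B) can be computed as a single quadratic form with a population-specific scale. The sketch below covers only the additive part; the function name and argument layout are assumptions.

```python
def additive_variance(alpha, r, population="B1"):
    """Additive variance V_A for a B1 (formula 3-2A) or DH (formula 3-2B) population.

    alpha : list of additive effects
    r     : r[i][j] = recombination frequency between QTL i and QTL j
    """
    m = len(alpha)
    total = sum(a * a for a in alpha)
    total += 2.0 * sum(
        alpha[i] * alpha[j] * (1.0 - 2.0 * r[i][j])
        for i in range(m) for j in range(i + 1, m)
    )
    # Genotype coding is +-1/2 in B1 and +-1 in DH, hence the factor 1/4.
    scale = 0.25 if population == "B1" else 1.0
    return scale * total
```

For two unlinked QTL with unit effects this gives 0.5 in B1 and 2.0 in DH; coupling linkage (effects of the same sign) increases the additive variance, and repulsion linkage reduces it.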
2. Parameter Setting
The first step in producing simulation data is to set the mapping parameters, such
as the experimental population (B1 or DH), sample size (n), trait mean (µ), map
function (Haldane or Kosambi), and marker genotype codes (for example, 1 for one
genotype and 0 for the other). In particular, it is important to define the
chromosome information, such as the number of chromosomes and the number and
positions of markers on each chromosome. Table 3-1 shows an example of parameter
settings for the QTL mapping information.
Table 3-1. An example of parameters setting for simulation mapping information.
                                                                      Marker genotype
Population   Sample Size   Trait Mean   Map Function   Chromosomes   Mm    MM
B1           200           15.8         Haldane        9             1     0
The second step is to set the parameters of the QTLs, such as the heritability
($h^2$), the ratio C of epistatic variance to additive variance, which is defined as
VI / VA (see formulas 3-2A and 3-3A), and the QTL number, positions, and effects. One
example of the parameter settings is shown in Table 3-2. Using this information, it
is easy to produce the additive (α) – epistatic (β) upper-triangle matrix shown in
Table 3-3. The QTL effects can be adjusted according to $h^2$, C, and $V_e$ as
follows.
Table 3-2. An example of parameters setting for QTL information.
QTL Number   Heritability   C = VI / VA   Additive Effect          Epistatic Effect
8            0.6            0.1           Sign¹: Both (1:3)        Sign: Same
                                          Distribution²: γ-2.1     Distribution: γ-0.3
¹Effects can be in the same direction or in both directions, in which case a ratio can be indicated. ²Effects can be drawn from different distributions, such as gamma (with one parameter), normal or even.
Assume that the heritability is $h^2$ and $C = V_I / V_A$; then
$V_e = \frac{1 - h^2}{h^2} V_G$, where $V_G = V_A + V_I$.
Note: We can use formulas (3-2A), (3-2B) and (3-3A), (3-3B) to calculate $V_I$ and
$V_A$. After setting the values of $\alpha_i$ and $\beta_{ij}$, the values of
$\beta_{ij}$ should be adjusted according to the value of C.
If $R = V_I / (C V_A) \neq 1$, then set $\beta_{ij} = \beta_{ij} / \sqrt{R}$ to ensure
that R = 1 and $C = V_I / V_A$.
Finally, the QTL effects are standardized by replacing $\alpha$ and $\beta$ with
$\alpha / \sqrt{V_e}$ and $\beta / \sqrt{V_e}$, so that the value of $V_e$ is equal
to 1.
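The adjustment of the effects can be sketched as two rescaling steps. This illustration assumes that $V_A$ is quadratic in the α's and $V_I$ is quadratic in the β's, so that dividing effects by a constant divides the corresponding variance by its square; the function name and the flat lists of effects are assumptions.

```python
import math

def rescale_effects(alpha, beta, va, vi, h2, c):
    """Rescale effects so that V_I / V_A = c and V_e = 1 at heritability h2.

    alpha, beta : lists of additive and epistatic effects
    va, vi      : additive and epistatic variances computed from those effects
    Returns (alpha, beta, va, vi) after rescaling.
    """
    # Step 1: adjust the epistatic effects so that V_I / V_A equals c.
    ratio = vi / (c * va)                      # R in the text
    beta = [b / math.sqrt(ratio) for b in beta]
    vi = vi / ratio
    # Step 2: residual variance implied by the heritability, then standardize.
    ve = (1.0 - h2) / h2 * (va + vi)
    s = math.sqrt(ve)
    return [a / s for a in alpha], [b / s for b in beta], va / ve, vi / ve

# Hypothetical starting values; h2 = 0.6 and C = 0.1 as in Table 3-2.
_, _, va_std, vi_std = rescale_effects([1.0, 1.0], [0.5], 2.0, 1.0, 0.6, 0.1)
```

With $h^2 = 0.6$ and C = 0.1 the standardized variances are fixed by the constraints, whatever the starting effects: $V_A + V_I = 1.5$ and $V_I = 0.1 V_A$, which agrees with the values quoted in the caption of Table 3-3.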
Table 3-3. An example of simulation parameter settings for positions and effects of QTLs (diagonal: additive effects α; above the diagonal: epistatic effects β). Here, VA = 1.364, VI = 0.136, Ve = 1.0, C = VI / VA = 0.10, h2 = 0.60.
Chromosome      1       1       3       3       7       7       7       9
Positions (cM)  11.7    31.8    9.1     43.1    11.8    40.2    65.9    21.9
QTLs            1       2       3       4       5       6       7       8
1               0.958   1.364   0.102   0.000   0.000   0.173   0.000   0.000
2                       -0.381  0.183   0.000   0.000   0.000   0.747   0.000
3                               0.559   0.000   0.000   0.000   0.000   0.083
4                                       -0.024  0.335   0.000   0.070   0.240
5                                               0.929   0.000   0.195   0.112
6                                                       0.482   0.000   0.182
7                                                               1.098   0.159
8                                                                       0.668
3. Simulation Procedure
The marker genotype data and trait value for each individual can be produced
from the mapping information and the QTL information. The basic simulation
strategy is to walk along the chromosomes and treat marker positions and QTL
positions alike. The difference between a marker and a QTL is that when a marker is
reached, the marker genotype (0 or 1) is simply recorded, whereas for a QTL, the QTL
additive and epistatic effects are added to the trait value of the current
individual.
For each individual, the simulation starts from the first marker of each
chromosome. The first marker genotype is set to 0 or 1 with 50% chance each and is
recorded. At the next marker or QTL position, the chance of obtaining a given
genotype depends on the recombination frequency between the previous position and
the current position. For example, if the distance between these two positions is
10 cM and the Haldane map function is used, then according to formula (2-1) the
recombination frequency is 0.091. Therefore, the current genotype differs from the
previous one with a probability of only 9.1%. After deciding the genotype for the
current position, we record the genotype value or add the QTL additive effect to the
trait value. The procedure continues until all markers and QTLs have been reached.
Then, the QTL epistatic effects are added. After adding the trait mean and the
random residual effect, the trait value for the current individual is obtained.
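The walk along one chromosome can be sketched as follows. This is a simplified illustration of the procedure described above, not the simulation program used in the study: it handles a single chromosome of a B1 individual, starts from the first locus in the list, uses the ½ / −½ coding of model (3-1), and leaves the epistatic effects, the trait mean and the residual to be added afterwards. All names are assumptions.

```python
import math
import random

def haldane_r(d_cm):
    """Recombination frequency for a distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def simulate_chromosome(loci, rng):
    """Simulate one chromosome of one backcross individual.

    loci : list of (position_cm, kind, effect) sorted by position, where kind
           is "marker" or "qtl" and effect is the additive effect of a QTL
    rng  : a random.Random instance
    Returns (marker genotypes as a list of 0/1, sum of QTL additive effects).
    """
    genotypes, additive = [], 0.0
    state, prev_pos = None, None
    for pos, kind, effect in loci:
        if state is None:
            state = rng.randint(0, 1)                # first locus: 50% chance each
        elif rng.random() < haldane_r(pos - prev_pos):
            state = 1 - state                        # recombination: switch genotype
        prev_pos = pos
        if kind == "marker":
            genotypes.append(state)                  # record the marker genotype
        else:
            additive += effect * (0.5 if state == 1 else -0.5)
    return genotypes, additive

# Hypothetical chromosome: two markers 10 cM apart with one QTL between them.
loci = [(0.0, "marker", None), (5.0, "qtl", 1.0), (10.0, "marker", None)]
markers, add_part = simulate_chromosome(loci, random.Random(1))
```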
4. Format of the Simulation Data
Table 3-4 shows an example of QTL mapping simulation data. The first part of
the data is the marker genotypes, recorded for every marker position in the whole
genome. For inbred line crosses, there are 3 possible marker genotypes for an
intercross population and 2 for backcross, DH, and RIL populations. Usually
different numbers are used to represent the different marker genotypes, for example
2 to denote genotype AA, 1 to denote Aa, and 0 to denote aa. The second part is the
trait value, which is the joint effect of several factors. These factors include the
trait mean, the heritability, and the QTL positions and effects (additive, dominance
for an intercross population, and possible epistatic effects). In order to analyse
the simulation data, other information besides the marker data and trait values is
needed as well. This includes the map information, such as the map function, marker
positions, and population type.
Table 3-4. An example of the simulation data with 5 individuals.
Individuals  Marker Data                                  Trait Value
1            111011111111111111111110000000000000000001   13.7218
2            11111111000000100000000000000111111111111    16.1372
3            11111110000000000000000000001111100000000    15.5285
4            11111100000001111000000011110000000000001    13.5461
5            00111100000000000000011111111000000000000    16.4589
3-2 Single Marker Analysis
In this section, a simulation study is conducted for single marker analysis.
The simulation design is based on 500 replications, a sample size of 200, a B1
population with a trait mean of 15.6, the Haldane mapping function, and 3
chromosomes with 12, 11, and 15 markers respectively, with an average marker
distance of 10 cM and some deviation in the positions (see Table 3-5). We set a
total of 5 QTLs for the whole genome, and the heritability is 0.6. Among the QTLs,
one QTL is set on chromosome 1 and one on chromosome 2. The other 3 linked QTLs
are set on chromosome 3.
In Table 3-5, the t statistic (t-Val) for each marker is calculated by formula (2-3)
and the LR value is obtained according to formula (2-4). The analysis also fits the
data to the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,
and the estimates of $\beta_0$ and $\beta_1$ for each marker are also given. The t
statistic tests the hypothesis that the marker is unlinked to the quantitative trait.
The column headed Pr(t-Val) is the p-value of the t statistic under the hypothesis
that the trait is unlinked to the marker. Significance at the 5%, 1%, 0.1% and 0.01%
levels is indicated by *, **, *** and ****, respectively.
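The regression part of the single marker analysis can be sketched as an ordinary least-squares fit of the trait on the 0/1 marker genotype, with a t statistic for the slope. This is a generic illustration; formulas (2-3) and (2-4) of the text are not reproduced here, and the function name is an assumption.

```python
import math

def single_marker_test(x, y):
    """Fit Y = b0 + b1 * X + e for one marker and return (b0, b1, t).

    x : marker genotypes coded 0/1
    y : trait values
    The t statistic tests b1 = 0 with n - 2 degrees of freedom.
    """
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    t = b1 / math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, t

# Hypothetical data: three individuals of each marker genotype.
b0, b1, t = single_marker_test([0, 0, 0, 1, 1, 1], [10.0, 10.2, 9.8, 12.0, 12.2, 11.8])
```

With a 0/1 marker, the intercept is the mean of the genotype-0 group and the slope is the difference between the two genotype means, so a QTL effect and its distance from the marker cannot be separated from the slope alone.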
For the QTL with medium effect (0.754) on chromosome one, the estimate of the
QTL position is reasonably accurate, as indicated by the significance levels.
However, the range for the estimated QTL position is much wider (marker 6 to
marker 10) for the QTL with large effect (-1.331) on chromosome two. In the
multiple-linked QTL situation on chromosome three, there is no way to distinguish
the three QTLs because almost all markers have very high significance levels. None
of the QTL effects can be estimated by the single marker method because the QTL
positions and effects are confounded. From this simulation study, it is clear that
the single marker method has the power to detect markers associated with the
existing QTLs, but it cannot resolve their positions and effects.
Table 3-5. Simulation result of single marker analysis (average of 500 replications).
Chr¹ QPos²  Effect³ Mk⁴ MPos⁵  t-Val LR     β0     β1     Pr(t-Val)
1                   1   0.0    1.79  3.17   14.84  0.33   0.07566
                    2   12.7   2.14  4.53   14.79  0.42   0.03375 *
                    3   23.7   2.56  6.46   14.75  0.52   0.01115 *
                    4   28.6   2.80  7.70   14.72  0.57   0.00559 **
     43.20  0.754   5   41.7   3.51  11.98  14.64  0.73   0.00055 ***
                    6   49.9   3.19  9.95   14.67  0.66   0.00164 **
                    7   59.9   2.65  6.93   14.73  0.54   0.00861 **
                    8   68.0   2.30  5.24   14.77  0.46   0.02239 *
                    9   74.7   2.09  4.31   14.80  0.41   0.03823 *
                    10  90.8   1.68  2.81   14.85  0.30   0.09424
                    11  98.1   1.49  2.21   14.88  0.25   0.13774
                    12  108.2  1.33  1.78   14.91  0.20   0.18347
2                   1   0.0    1.68  2.83   15.16  -0.31  0.09370
                    2   5.5    1.83  3.35   15.18  -0.34  0.06804
                    3   18.0   2.24  4.98   15.23  -0.45  0.02596 *
                    4   27.9   2.72  7.27   15.28  -0.55  0.00709 **
                    5   34.8   3.06  9.16   15.32  -0.63  0.00251 **
                    6   52.2   4.31  17.74  15.45  -0.89  0.00003 ****
                    7   63.0   5.37  26.91  15.55  -1.10  0.00000 ****
     71.60  -1.331  8   73.5   6.27  35.86  15.63  -1.26  0.00000 ****
                    9   79.7   5.44  27.58  15.56  -1.11  0.00000 ****
                    10  93.2   4.05  15.76  15.42  -0.84  0.00007 ****
                    11  97.5   3.71  13.28  15.39  -0.77  0.00027 ***
3                   1   0.0    5.79  30.96  14.42  1.17   0.00000 ****
                    2   7.2    6.87  42.44  14.32  1.37   0.00000 ****
     17.50  1.217   3   18.6   8.80  65.63  14.17  1.66   0.00000 ****
                    4   30.4   7.97  55.20  14.24  1.54   0.00000 ****
                    5   35.0   7.81  53.27  14.25  1.51   0.00000 ****
     46.30  0.484   6   50.3   7.31  47.38  14.29  1.43   0.00000 ****
                    7   58.9   6.70  40.48  14.34  1.33   0.00000 ****
                    8   79.0   6.14  34.53  14.39  1.24   0.00000 ****
     84.40  0.711   9   80.0   6.14  34.57  14.39  1.24   0.00000 ****
                    10  97.9   4.62  20.29  14.53  0.95   0.00001 ****
                    11  102.8  4.18  16.74  14.57  0.86   0.00004 ****
                    12  112.0  3.50  11.85  14.64  0.72   0.00058 ***
                    13  119.4  3.02  8.94   14.70  0.62   0.00284 **
                    14  138.0  2.19  4.72   14.80  0.42   0.03001 *
                    15  139.7  2.13  4.49   14.80  0.40   0.03444 *
¹Chromosome. ²QTL position in cM. ³QTL effect. ⁴Marker number. ⁵Marker position in cM.
3-3 Comparing Different Mapping Methods
It is helpful to know the advantages and disadvantages of QTL mapping methods
before choosing one for a particular QTL mapping experiment. In this section, using
DH as the model population, simulation studies are conducted to compare the
performance of three methods, IM, CIM, and MCIM, under the simple additive model.
Results are presented for the estimated positions and effects of QTLs, the detection
power, and the probability of detecting false QTLs.
1. Parameters Setting
In this study, the simulation design is based on 500 replications, a sample size of
200, a population mean of 15.6, the Haldane mapping function, 9 chromosomes in
total, 11 markers on each chromosome, and an average marker distance of 10 cM
with some deviation in the marker positions (Figure 3-1). We set 7 QTLs in total for
the whole genome, with a heritability of 0.6. Among these QTLs, 2 have large
effects, 3 have median effects, and 2 have small effects. One median-effect QTL and
one small-effect QTL have effects of opposite sign to the other QTLs. According to
the number of QTLs on one chromosome, we constructed two QTL models: Model-I
has only one QTL on any chromosome that carries a QTL, whereas Model-II has
multiple QTLs on each such chromosome.
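The design above can be sketched in code. This is a minimal sketch, not the simulation program used in this study. It assumes DH genotypes coded +1/-1, evenly spaced markers (the study lets the marker positions deviate), heritability defined as genetic variance over phenotypic variance, and a single hypothetical QTL; all function names are illustrative.

```python
import numpy as np

def haldane_recomb(d_cm):
    """Recombination fraction for a map distance in cM (Haldane mapping function)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(d_cm, dtype=float) / 100.0))

def simulate_dh_chromosome(positions_cm, n_ind, rng):
    """Simulate DH genotypes (+1 or -1) at ordered loci on one chromosome."""
    r = haldane_recomb(np.diff(positions_cm))
    geno = np.empty((n_ind, len(positions_cm)), dtype=int)
    geno[:, 0] = rng.choice([-1, 1], size=n_ind)
    for k, r_k in enumerate(r, start=1):
        # A recombination between adjacent loci flips the parental origin.
        recombined = rng.random(n_ind) < r_k
        geno[:, k] = np.where(recombined, -geno[:, k - 1], geno[:, k - 1])
    return geno

def residual_variance(genetic_variance, heritability):
    """Residual variance that gives the requested heritability."""
    return genetic_variance * (1.0 - heritability) / heritability

rng = np.random.default_rng(1)
markers = np.arange(0.0, 101.0, 10.0)   # 11 markers, 10 cM apart (assumed even spacing)
qtl_pos, qtl_effect = 43.2, 0.75        # hypothetical QTL, for illustration only
loci = np.sort(np.append(markers, qtl_pos))
geno = simulate_dh_chromosome(loci, n_ind=200, rng=rng)
qtl_geno = geno[:, np.searchsorted(loci, qtl_pos)]
genetic_value = qtl_effect * qtl_geno
sigma2_e = residual_variance(genetic_value.var(), heritability=0.6)
phenotype = 15.6 + genetic_value + rng.normal(0.0, np.sqrt(sigma2_e), size=200)
```

Under the Haldane function, adjacent markers 10 cM apart recombine with probability of about 0.09, so neighbouring marker genotypes are strongly correlated.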
2. Estimation of QTL Effects
The QTL effects estimated by the IM, CIM, and MCIM methods for the one-QTL
Model-I and the multiple-QTL Model-II are shown in Table 3-6. The estimates were
obtained by averaging the estimated effects at each known QTL position over the
500 replications. For Model-I, the estimated QTL effects are very close to the
parameter values. The results imply that the estimation of QTL effects is unbiased
for all three QTL mapping methods (IM, CIM, and MCIM) under the one-QTL model.
Unlike Model-I, the estimation of QTL effects shows some bias under the
multiple-QTL Model-II because of the linkage between QTLs. For the two QTLs with
large effects (Q2-1L and Q2-2L), the effects were clearly overestimated by the IM
method. The effects of the two QTLs with small effects (Q3-1S and Q3-2S) were
underestimated by all three mapping methods. The picture is mixed for the three
median-effect QTLs (Q1-1M, Q1-2M, and Q1-3M), with some QTL effects
underestimated and some overestimated. For Q1-3M in particular, the bias is quite
serious for the IM method. However, the bias is quite small for QTLs with large and
median effects, especially with the CIM and MCIM methods.
Table 3-6. Simulation results for the estimated QTL effects at the QTL positions for Model-I and Model-II.
MODEL-I                                     MODEL-II
QTLs    Eff     IM      CIM     MCIM        QTLs    Eff     IM      CIM     MCIM
Q1-1L    1.50    1.51    1.51    1.54       Q1-1M    0.64    0.92    0.78    0.89
Q2-1M    0.69    0.70    0.69    0.72       Q1-2M    0.64    0.84    0.52    0.76
Q3-1S    0.17    0.17    0.18    0.18       Q1-3M   -0.64   -0.17   -0.56   -0.43
Q4-1L    1.50    1.48    1.48    1.53       Q2-1L    1.39    1.71    1.40    1.46
Q5-1M   -0.69   -0.69   -0.69   -0.69       Q2-2L    1.39    1.71    1.38    1.48
Q7-1M    0.69    0.70    0.70    0.71       Q3-1S   -0.16   -0.06   -0.08   -0.06
Q9-1S   -0.17   -0.16   -0.17   -0.17       Q3-2S    0.16    0.09    0.09    0.10
QTLs = QTL with chromosome number and serial number followed by effect size (L = large, M = median, S = small); Eff = effect of the QTL.
3. Power and False Positive
Simulation results for the power of QTL detection and the probability of detecting
false QTLs under different thresholds are presented for the IM, CIM, and MCIM
methods under Model-I and Model-II (Table 3-7 and Table 3-8). This information
was obtained by analysing the LR peaks of the LR profile for each chromosome. A
detected QTL is defined as a valid LR peak whose highest LR value is greater than
a predefined threshold. If a detected QTL matches a predefined QTL, it is counted
towards the power of QTL detection. If the detected QTL cannot be matched with
any predefined QTL on the same chromosome, it is counted as a false QTL. The
predefined threshold value is therefore very important for mapping QTLs.
Decreasing the threshold increases both the power of QTL detection and the
probability of detecting false QTLs; increasing it has the opposite effect.
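The counting rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how close a peak must be to a predefined QTL to count as a match, so the `match_window_cm` parameter is hypothetical, as are the function names. The LOD threshold is converted to the LR scale with the standard relation LR = 2 ln(10) × LOD.

```python
import math

def lod_to_lr(lod):
    """Convert a LOD score to the likelihood ratio (LR) scale."""
    return 2.0 * math.log(10.0) * lod

def classify_peaks(peaks, true_qtl_positions, lod_threshold, match_window_cm=10.0):
    """Classify LR peaks on one chromosome.

    peaks: list of (position_cm, lr_value) pairs.
    Returns (set of indices of detected true QTLs, number of false QTLs).
    match_window_cm is an assumed matching rule, not taken from the study.
    """
    lr_threshold = lod_to_lr(lod_threshold)
    detected, n_false = set(), 0
    for pos, lr in peaks:
        if lr <= lr_threshold:
            continue  # peak is below the threshold and is ignored
        near = [(abs(pos - q), i) for i, q in enumerate(true_qtl_positions)
                if abs(pos - q) <= match_window_cm]
        if near:
            detected.add(min(near)[1])  # match the nearest predefined QTL
        else:
            n_false += 1                # no predefined QTL nearby: false QTL
    return detected, n_false

# Hypothetical peaks on a chromosome carrying QTLs at 12.1, 30.7, and 75.5 cM.
peaks = [(13.0, 40.0), (55.0, 12.5), (90.0, 5.0)]
qtls = [12.1, 30.7, 75.5]
result_lod25 = classify_peaks(peaks, qtls, lod_threshold=2.5)
result_lod30 = classify_peaks(peaks, qtls, lod_threshold=3.0)
```

With these hypothetical peaks, raising the threshold from LOD = 2.5 to LOD = 3.0 removes the one false QTL while the strong peak near 12.1 cM is still detected. Averaging such counts over replications gives power and false-QTL probabilities of the kind reported in Table 3-7 and Table 3-8.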
Table 3-7. Power of QTL detection and the probability of false QTL detected under different thresholds for Model-I.
1LOD = 2.0 LOD = 2.5 LOD = 3.0 QTL IM CIM MCIM IM CIM MCIM IM CIM MCIM
Q1-1L 100 100 100 100 100 100 100 100 100 Q2-1M 78.0 41.8 79.8 70.6 29.8 70.6 63.0 Q3-1S 2.6 3.6 0.6 0.6 1.8 0.4 0.4 0.8 Q4-1L 100 100 100 100 100 100 100 100 Q5-1M 53.6 90.8 80.2 37.6
53.4 84.8 1.0
99.8 81.6 70.8 24.0 71.0 60.4
Q7-1M 54.2 88.0 78.0 37.4 81.6 69.6 28.2 73.4 61.6 Q9-1S 2.8 2.8 5.0 0.6 2.2 2.2 0.4 1.0 0.8
2FQTL1 34.4 29.2 29.2 27.2 22.2 22.2 21.6 16.6 18.4
3FQTL2 9.8 5.8 12.0 5.6 2.4 7.8 3.0 0.4 4.0
4FQTL2+ 2.6 0.8 4.4 0.8 0.6 0.8 0.6 0.4 0.6
1LOD = threshold, 2FQTL1 = probability of falsely detecting one QTL in the whole genome, 3FQTL2 = probability of falsely detecting two QTLs, and 4FQTL2+ = probability of falsely detecting more than two QTLs.
For Model-I (Table 3-7), the three mapping methods were equally efficient in
detecting QTLs with large effects (Q1-1L and Q4-1L). However, QTLs with very
small effects (Q3-1S and Q9-1S) could be detected only with very low power. The
power of detecting QTLs with median effects depended on the QTL mapping method
and the threshold value. The CIM method tended to have the highest power among
the three mapping methods, and the MCIM method was more efficient than the IM
method. As for the probability of false QTL detection, the IM method in general
gave more false QTLs under all three threshold values. The CIM and MCIM methods
had a similar likelihood of finding one false QTL, but the CIM method was better
than the MCIM method when two or more false QTLs were considered.
For the multiple-QTL model (Model-II in Table 3-8), all three mapping methods are
highly efficient in detecting QTLs with large effects on the same chromosome
(Q2-1L and Q2-2L). QTLs with very small effects on one chromosome (Q3-1S and
Q3-2S) are almost undetectable by any of the three methods. The IM method
cannot detect the QTL with a negative median effect (Q1-3M), which is linked to
QTLs with positive effects (Q1-1M and Q1-2M). The CIM method tended to be more
efficient than the MCIM method, except for one QTL (Q1-2M) that is closely linked
to another (Q1-1M) with the same direction of effect. The IM method gave a high
probability of false QTL detection compared with the other two mapping methods.
The CIM method tended to have the smallest likelihood of finding false QTLs.
Table 3-8. Power of QTL detection and the probability of false QTL detected under different thresholds for Model-II.
          LOD = 2.0              LOD = 2.5              LOD = 3.0
QTL       IM     CIM    MCIM     IM     CIM    MCIM     IM     CIM    MCIM
Q1-1M     81.2   100    84.4     75.8   100    83.0     67.6   99.6   80.4
Q1-2M     14.4   18.8   22.8     13.4   14.6   21.6     11.2   10.0   20.2
Q1-3M      0.8   65.4   38.6      0.2   55.8   30.2      0.0   43.8   25.2
Q2-1L     99.0   99.8   100      99.0   99.8   100      99.0   99.8   99.6
Q2-2L     99.4   100    99.8     99.4   100    99.8     99.4   100    99.8
Q3-1S      0.4    0.6    2.2      0.0    0.0    0.6      0.0    0.0    0.0
Q3-2S      0.8    1.8    1.4      0.4    0.4    0.8      0.0    0.4    0.4
FQTL1     42.6   18.8   31.8     45.6   11.2   22.4     45.2    7.2   17.0
FQTL2     24.4    3.0   12.0     21.6    1.0    7.8     20.6    0.8    5.6
FQTL2+     7.8    0.0    3.4      6.2    0.0    1.0      6.0    0.0    0.4
Marker density affects both the power of QTL detection and the probability of
detecting false QTLs, as shown in Table 3-9. When marker density increases, there
is no apparent gain in power for detecting QTLs with large effects (Q2-1L and
Q2-2L) by any of the three QTL mapping methods. The MCIM method, however,
tends to be more powerful than the other two methods (IM and CIM) for detecting
QTLs with small effects. For the linked QTLs with opposite effects (Q1-2M and
Q1-3M), the MCIM method improves greatly, while the CIM method performs quite
poorly. This suggests that increasing marker density may sometimes even be
harmful for the CIM method. Q1-3M still cannot be detected by the IM method when
the marker density is increased.
The impact of sample size on the power of QTL detection and the probability of
detecting false QTLs is shown in Table 3-10. In general, the power of QTL
detection increases with sample size for all three mapping methods. In particular,
the CIM method improves considerably in both the power of QTL detection and the
probability of false QTL detection when the sample size is increased to 300.
Table 3-9. Power of QTL detection and the probability of false QTL detected under Model-II when chromosomes = 3, average marker distance = 4 cM, and threshold value is LOD = 2.5.
QTL      IM     CIM    MCIM
Q1-1M    72.8   99.8   76.8
Q1-2M    27.8   13.6   47.0
Q1-3M     1.6   41.8   44.2
Q2-1L    98.2   100    98.8
Q2-2L    99.6   100    99.8
Q3-1S     0.4    0.8    4.4
Q3-2S     0.0    0.4    3.0
FQTL     81.0   49.6   48.4
FQTL = probability of false QTL detected in the whole genome.
Table 3-10. Power of QTL detection and the probability of false QTL detected under Model-II for different sample sizes (threshold value with LOD = 2.5).
         Samples = 100          Samples = 300
QTL      IM     CIM    MCIM     IM     CIM    MCIM
Q1-1M    37.6   66.8   59.0     86.1   98.9   86.3
Q1-2M    10.2   21.4   18.4     16.1   44.9   29.3
Q1-3M     0.4   21.8   11.2      2.2   76.9   48.5
Q2-1L    96.0   98.8   97.2     100    100    100
Q2-2L    91.2   98.6   95.0     100    100    100
Q3-1S     0.0    0.6    1.2      0.7    0.9    1.3
Q3-2S     0.0    0.6    0.6      0.0    1.1    1.1
FQTL     42.0   50.2   33.4     87.0   15.9   29.1
The performance of a QTL mapping analysis is also affected by the tuning
parameters of the method itself. Before the analysis, the CIM method requires
parameters such as the “window size” and the “number of control markers” to be
set. In this simulation study, we simply used the default parameters: 10 cM for the
“window size” and 5 for the “number of control markers”. However, changing these
parameters can have a great influence on the power of QTL detection and the
probability of false QTL detection for the CIM method, as shown in Table 3-11.
Because the MCIM method treats the background control markers as random
effects, the influence of the control markers is much smaller than for the CIM
method.
Table 3-11. Power of QTL detection and the probability of false QTL detected under different numbers of background control markers in Model-II (threshold value with LOD = 2.5).
         CIM                             MCIM
QTL      Mn = 5   Mn = 10   Mn = 25      Mn = 5   Mn = 10   Mn = 25
Q1-1M    100      100       98.6         83.0     78.2      78.4
Q1-2M    14.6     25.9      26.2         21.6     28.2      28.4
Q1-3M    55.8     67.0      63.8         30.2     43.2      51.6
Q2-1L    99.8     100       100          100      100       100
Q2-2L    100      100       99.2         99.8     99.6      99.8
Q3-1S     0.0      0.9       1.0          0.6      1.8       2.8
Q3-2S     0.4      0.0       1.8          0.8      2.4       1.8
FQTL     12.2     31.2      65.6         31.2     34.8      39.4
Mn = number of control markers.
4. Positions and Effects of Detected QTLs
The estimated positions and the 95% experimental confidence intervals (ECI) for
the detected QTLs are summarized in Table 3-12 for Model-I and Model-II with the
threshold set to LOD = 2.5. For the two QTLs with large effects, the position
estimates are quite accurate, with small ECIs, for all three mapping methods. The
average ECI range is 14 cM, 8.3 cM, and 9.5 cM for the IM, CIM, and MCIM
methods, respectively. Unlike the CIM and MCIM methods, the IM method shows a
large increase in the average ECI range (from 11 cM to 17 cM) from Model-I to
Model-II. For QTLs with median effects, the position estimates become less
accurate and the ECIs become wider. For example, the average ECI range is almost
doubled for the median-effect QTLs in Model-I with the CIM and MCIM methods
(15 cM for CIM and 20 cM for MCIM). For the two small-effect QTLs, it is difficult
to obtain a good position estimate and a reasonable ECI, because such QTLs are
detected only a few times in 500 replications owing to the extremely low power
of QTL detection.
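An experimental confidence interval of this kind can be sketched as follows. This is a minimal sketch that assumes the 95% ECI is the range between the 2.5th and 97.5th empirical percentiles of the estimates from the replications in which the QTL was detected; this section does not spell out the computation, and the function name is illustrative.

```python
import numpy as np

def experimental_ci(estimates, level=0.95):
    """Mean and empirical percentile interval of estimates over replications.

    Replications without a detection are passed as NaN and ignored.
    Returns None when the QTL was never detected.
    """
    est = np.asarray(estimates, dtype=float)
    est = est[~np.isnan(est)]
    if est.size == 0:
        return None
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(est, [tail, 100.0 - tail])
    return est.mean(), lower, upper

# Hypothetical position estimates (cM); NaN marks a replication with no detection.
positions = np.append(np.arange(0.0, 101.0), [np.nan, np.nan])
mean_pos, lower, upper = experimental_ci(positions)
```

When a QTL is detected in only one or two replications, the percentiles collapse onto the few available values, which would explain degenerate intervals such as (75 to 75) for low-power QTLs.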
For the single-QTL Model-I, the estimated effects of the detected QTLs are
unbiased for the two large QTLs (Q1-1L and Q4-1L), as shown in Table 3-13.
However, the effects of QTLs with median and small effects tend to be
overestimated. The reason is that the detection power for such QTLs is much less
than 100%: in each replication, only an LR peak greater than the predefined
threshold value is accepted as an identified QTL, and a large LR peak tends to
come with a larger estimate of the QTL effect than a small LR peak. Therefore, in
a real QTL mapping situation, the effect of a QTL identified with a median or small
effect is likely to be somewhat overestimated. The overestimation can be larger for
two linked QTLs, such as Q2-1L and Q2-2L in Model-II, which implies that QTL
linkage affects the estimation of QTL effects. Comparing the three QTL mapping
methods, the CIM method performs well in the estimation of QTL effects and the
ECI for QTLs with median effects, partly because of its high power of detection
for this kind of QTL.
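The overestimation caused by keeping only the replications that pass the threshold can be illustrated with a small simulation. This is a minimal sketch, not the procedure used in this study: it assumes the QTL genotype is observed directly (so no interval mapping is involved), uses a simple regression LR test, and all parameter values are hypothetical.

```python
import numpy as np

def effect_estimates(true_effect, n_ind, resid_sd, lr_threshold, n_rep, rng):
    """Return (all estimates, estimates from replications whose LR exceeds the threshold)."""
    all_est, detected_est = [], []
    for _ in range(n_rep):
        x = rng.choice([-1.0, 1.0], size=n_ind)       # QTL genotype, assumed observed
        y = true_effect * x + rng.normal(0.0, resid_sd, size=n_ind)
        xc, yc = x - x.mean(), y - y.mean()
        a_hat = (xc @ yc) / (xc @ xc)                 # least-squares effect estimate
        rss0 = yc @ yc                                # residual SS without the QTL
        rss1 = rss0 - a_hat ** 2 * (xc @ xc)          # residual SS with the QTL
        lr = n_ind * np.log(rss0 / rss1)              # likelihood ratio statistic
        all_est.append(a_hat)
        if lr > lr_threshold:
            detected_est.append(a_hat)
    return np.array(all_est), np.array(detected_est)

rng = np.random.default_rng(7)
lr_threshold = 2.0 * np.log(10.0) * 2.5               # LOD = 2.5 on the LR scale
all_est, detected_est = effect_estimates(0.3, 200, 1.5, lr_threshold, 500, rng)
power = detected_est.size / all_est.size
```

In this setting the average of `all_est` stays close to the true effect of 0.3, while the average of `detected_est` is larger, because only replications with a large estimate pass the threshold. The lower the power, the stronger this selection effect.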
Table 3-12. Simulation results for the positions of the detected QTLs under Model-I and Model-II when the threshold value is set to LOD = 2.5.
                          IM                      CIM                     MCIM
Genome    QTL     Pos     Est     ECI             Est     ECI             Est     ECI
Model-I   Q1-1L   12.1    12.2    (8 to 17)       12.4    (8 to 17)       12.4    (8 to 16)
          Q2-1M   30.7    30.5    (18 to 45)      27.0    (20 to 36)      29.6    (18 to 42)
          Q3-1S   75.5    81.6    (69 to 89)      81.6    (77 to 89)      79.1    (62 to 90)
          Q4-1L   82.5    82.5    (76 to 89)      82.5    (78 to 87)      82.2    (76 to 88)
          Q5-1M   98.8    97.4    (85 to 102)     97.5    (87 to 102)     96.7    (84 to 102)
          Q7-1M    8.8     7.7    (0 to 21)        9.2    (5 to 19)        8.1    (0 to 18)
          Q9-1S   50.6    41.8    (41 to 45)      53.1    (39 to 66)      46.6    (36 to 56)
Model-II  Q1-1M   12.1    14.9    (4 to 26)       16.8    (8 to 23)       16.2    (4 to 26)
          Q1-2M   30.7    29.6    (28 to 37)      31.1    (26 to 41)      30.5    (28 to 42)
          Q1-3M   75.5    74.7    (75 to 75)      75.4    (66 to 83)      75.1    (64 to 84)
          Q2-1L    8.8    10.1    (4 to 20)        8.8    (6 to 12)        9.1    (4 to 12)
          Q2-2L   82.5    79.5    (68 to 86)      82.5    (77 to 86)      81.9    (76 to 86)
          Q3-1S   50.6                                                    46.0    (44 to 48)
          Q3-2S   98.8    85.9    (85 to 87)     106.0    (106 to 106)   101.0    (90 to 106)
Pos = position of QTLs; Est = estimated QTL position; ECI = 95% experimental confidence interval for QTL position.
Note: a blank table cell is caused by zero detection power.
Table 3-13. Summary of the effects of the detected QTLs under Model-I and Model-II when the threshold value is set to LOD = 2.5.
                          IM                        CIM                       MCIM
Genome    QTL     Eff     Est     ECI               Est     ECI               Est     ECI
Model-I   Q1-1L    1.50    1.54   (1.1 to 1.9)       1.53   (1.2 to 1.9)       1.59   (1.2 to 2.1)
          Q2-1M    0.69    0.94   (0.8 to 1.3)       0.77   (0.5 to 1.1)       0.86   (0.6 to 1.4)
          Q3-1S    0.17    0.85   (0.7 to 1.0)       0.59   (0.6 to 0.6)       0.67   (0.5 to 0.9)
          Q4-1L    1.50    1.50   (1.1 to 1.9)       1.49   (1.1 to 1.8)       1.54   (1.1 to 2.0)
          Q5-1M   -0.69   -0.92   (-1.2 to -0.7)    -0.75   (-1.0 to -0.6)    -0.79   (-1.2 to -0.6)
          Q7-1M    0.69    0.95   (0.7 to 1.3)       0.76   (0.6 to 1.0)       0.83   (0.6 to 1.2)
          Q9-1S   -0.17   -0.89   (-0.9 to -0.9)    -0.59   (-0.7 to -0.5)    -0.73   (-1.0 to -0.6)
Model-II  Q1-1M    0.64    1.05   (0.8 to 1.4)       1.08   (0.7 to 1.4)       1.09   (0.7 to 1.5)
          Q1-2M    0.64    1.00   (0.8 to 1.3)       0.93   (0.7 to 1.2)       1.07   (0.6 to 1.6)
          Q1-3M   -0.64   -0.81   (-0.8 to -0.8)    -0.71   (-1.0 to -0.6)    -0.80   (-1.5 to -0.5)
          Q2-1L    1.39    1.75   (1.4 to 2.1)       1.42   (1.1 to 1.7)       1.53   (1.1 to 2.0)
          Q2-2L    1.39    1.75   (1.4 to 2.1)       1.40   (1.1 to 1.7)       1.53   (1.1 to 2.0)
          Q3-1S   -0.16                                                       -0.73   (-1.0 to -0.6)
          Q3-2S    0.16    0.80   (0.8 to 0.8)       0.55   (0.5 to 0.6)       0.71   (0.6 to 0.8)
Eff = effect of QTLs; Est = estimated QTL effect; ECI = 95% experimental confidence interval for QTL effect.
5. The LR Profile
For the three QTL mapping methods (IM, CIM, and MCIM), the average mapping
results for the two QTL models are shown in Figure 3-1. For Model-I, all three
mapping methods performed quite well: the estimates of QTL positions and effects
were unbiased, and the LR values depended on the QTL effects. For Model-II, the
two QTLs with small effects (Q3-1S and Q3-2S) are undetectable by all three
methods. All three methods have very high power to detect the QTLs with large
effects (Q2-1L and Q2-2L). However, compared with the CIM and MCIM methods,
the IM method has more noise between these two large-effect QTLs, and this noise
could be harmful if the corresponding LR peaks were taken as QTLs. For the three
QTLs with median effects on chromosome 1, the highest LR value is obtained by
the CIM method for Q1-1M and Q1-3M, but by the MCIM method for Q1-2M. The IM
method has a very low LR value for Q1-3M, possibly because it has no detection
power for linked QTLs with opposite effects. For Q1-2M, the CIM method has a
very low LR value because of the closeness of the first two QTLs.
Figure 3-1. Average QTL mapping LR profiles and additive effect profiles over the 500 simulations for the two QTL models (Model-I QTLs: Q1-1L, Q2-1M, Q3-1S, Q4-1L, Q5-1M, Q7-1M, Q9-1S; Model-II QTLs: Q1-1M, Q1-2M, Q1-3M, Q2-1L, Q2-2L, Q3-1S, Q3-2S). The long vertical bars are chromosomes and the short vertical bars are QTL positions and effects. The small dots distributed along the horizontal bars are genetic markers. Only the chromosomes with QTLs (1, 2, 3) are shown for Model-II.
3-4 Considering Complicated QTL Mapping Situations
In Section 3-3, the performance of the IM, CIM, and MCIM methods was studied
under a simple additive situation. However, in many real QTL mapping
experiments, more complicated situations such as QTL by environment interaction
and QTL epistasis generally exist. In this section, we conduct simulation studies
for the IM and CIM methods under these complicated situations. The performance
of the MCIM method is also studied, using the corresponding mixed linear models
for the QTL by environment model (Model 2-13) and the QTL epistatic model
(Model 2-14).
1. Parameters Setting
To study QTL by environment interaction, the simulation of Model-AE is based on
the following parameter settings: 300 replications, a DH population of 100
individuals, 3 environments with 2 replicates each, a population mean of 15.6, the
Haldane mapping function, a genome of 3 chromosomes with 11 markers each, and
an average marker distance of 10 cM with some deviation in the marker positions.
Each chromosome has one QTL and the heritability is 0.36. The QTL positions, QTL
main effects, and QE interaction effects are shown in Table 3-14. Note that among
these three QTLs, Q1-1 has large QE interaction effects but no main effect. Q2-1
and Q3-1 have the same main effect, but Q2-1 has QE interaction effects whereas
Q3-1 has none.
Table 3-14. QTL parameter settings of Model-AE for the simulation study of QTL by environment interaction.
                                          QE Interaction Effects
QTLs    Chr    Pos     Main Effect (Q)    QE1      QE2      QE3
Q1-1    1      22.0    0.00               -0.34    -0.38     0.72
Q2-1    2      54.7    0.62                0.28     0.39    -0.67
Q3-1    3      96.4    0.62                0.00     0.00     0.00
Chr = chromosome; Pos = QTL position in cM; QE1, QE2, and QE3 are the QTL effects for environment 1, environment 2, and environment 3.
Table 3-15. QTL settings of Model-AA and Model-A for mapping QTLs with epistatic effects.
QTLs    Chr    Pos     Additive        QTL i    QTL j    Epistasis
Q1-1    1      12.2     0.80           Q1-1     Q2-2      0.48
Q1-2    1      30.7     0.80           Q1-2     Q1-3      0.96
Q1-3    1      75.5    -0.80           Q2-1     Q2-2     -0.96
Q2-1    2       8.8     1.61           Q2-2     Q3-1      1.92
Q2-2    2      82.5     0.00
Q3-1    3      50.6     0.27
Q3-2    3      98.8    -0.27
Chr = chromosome; Pos = position in cM; Additive = QTL additive effect; Epistasis = epistatic effect between QTL i and QTL j.
Model-AA and Model-A are used to study QTL epistasis. The simulation is based on
the following parameter settings: 300 replications, a DH or B1 population of 200
individuals, a population mean of 15.6, the Haldane mapping function, a genome of
3 chromosomes with 11 markers each, and an average marker distance of 10 cM
with some deviation in the marker positions. The QTL settings for Model-AA are
shown in Table 3-15; the settings for Model-A are the same except that there are
no epistatic effects between QTLs. Model-AA is therefore an additive-epistatic QTL
model, and Model-A is a simple additive model similar to Model-II in Section 3-3.
The only difference between Model-AA and Model-A is the QTL epistatic effects.
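The difference between the two models can be made concrete by computing the genotypic value of one individual under each. This is a minimal sketch that assumes genotypes coded +1/-1 and an epistatic term equal to the epistatic effect times the product of the two genotype codes; the coding and the function name are assumptions, and the effects are those of Table 3-15.

```python
ADDITIVE = {"Q1-1": 0.80, "Q1-2": 0.80, "Q1-3": -0.80, "Q2-1": 1.61,
            "Q2-2": 0.00, "Q3-1": 0.27, "Q3-2": -0.27}
EPISTATIC = {("Q1-1", "Q2-2"): 0.48, ("Q1-2", "Q1-3"): 0.96,
             ("Q2-1", "Q2-2"): -0.96, ("Q2-2", "Q3-1"): 1.92}

def genotypic_value(genotype, additive, epistatic=None):
    """Sum of additive terms a*x and epistatic terms aa*x_i*x_j (x coded +1/-1)."""
    value = sum(a * genotype[q] for q, a in additive.items())
    for (qi, qj), aa in (epistatic or {}).items():
        value += aa * genotype[qi] * genotype[qj]
    return value

all_plus = {q: 1 for q in ADDITIVE}
flipped = dict(all_plus, **{"Q2-2": -1})               # change the genotype at Q2-2 only

value_a = genotypic_value(all_plus, ADDITIVE)               # Model-A
value_aa = genotypic_value(all_plus, ADDITIVE, EPISTATIC)   # Model-AA
value_aa_flipped = genotypic_value(flipped, ADDITIVE, EPISTATIC)
```

Under these assumptions, changing the genotype at Q2-2 leaves the Model-A value unchanged, because its additive effect is 0.00, but changes the Model-AA value substantially, because Q2-2 takes part in three of the four epistatic pairs.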
2. Performance of IM and CIM Methods
- QTL by Environment Interaction
The estimates for Model-AE in Table 3-16 were obtained by averaging the
estimated effects at each known QTL position over the 300 replications. In this
simulation study, the estimation of the main effect is unbiased for the IM and CIM
methods when the data of all environments are used together. If the QTL has no
main effect, like Q1-1, the estimation of the QE interaction effects is also unbiased
when the data from the particular environment are used. For a QTL without QE
interaction effects (Q3-1), the effects estimated in the different environments are
very similar to the estimated main effect. However, for a QTL with both a main
effect and QE interaction effects (Q2-1), the estimate obtained from the data of a
specific environment is biased as an estimate of either component: what is actually
estimated in a specific environment is the sum of the QTL main effect and the QE
interaction effect.
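The sum of main effect and QE interaction effect can be checked directly from the parameter settings. This is a minimal sketch using the values of Table 3-14; the function name is illustrative.

```python
MAIN = {"Q1-1": 0.00, "Q2-1": 0.62, "Q3-1": 0.62}
QE = {"Q1-1": (-0.34, -0.38, 0.72),
      "Q2-1": (0.28, 0.39, -0.67),
      "Q3-1": (0.00, 0.00, 0.00)}

def effects_by_environment(qtl):
    """Expected QTL effect in each single environment: main effect + QE effect."""
    return [round(MAIN[qtl] + qe, 2) for qe in QE[qtl]]

expected = {qtl: effects_by_environment(qtl) for qtl in MAIN}
```

For Q2-1 this gives 0.90, 1.01, and -0.05 for environments 1, 2, and 3, close to the single-environment estimates in Table 3-16 (0.90, 1.03 to 1.04, and -0.05). The near-zero value in environment 3 shows how a QTL with a real main effect can become almost invisible in one environment.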
Table 3-16. Estimation of effects of Model-AE at the QTL positions for IM and CIM methods under various environments (Threshold LOD = 2.5).
        E1-3             E1               E2               E3
QTLs    IM      CIM      IM      CIM      IM      CIM      IM      CIM
Q1-1    0.00    0.00    -0.35   -0.36    -0.40   -0.39     0.74    0.74
Q2-1    0.62    0.62     0.90    0.90     1.03    1.04    -0.05   -0.05
Q3-1    0.63    0.62     0.62    0.63     0.63    0.65     0.63    0.63
E1-3 = data from all environments used together; E1, E2, and E3 = only the data from environment 1, environment 2, and environment 3, respectively.
The power of QTL detection and the probability of detecting false QTLs for the
different environment data sets are shown in Table 3-17. For the first QTL (Q1-1),
there is no detection power when the data from all environments are used
together, because its main effect is 0. The power is quite low with the data of
environment 1 or environment 2 because the effects QE1 and QE2 are small.
However, the power is quite high in environment 3 because the QE3 effect is
relatively large (0.72). Q2-1 has both a main effect and QE interaction effects, and
the power of detection is quite high for the combined data and for environments 1
and 2. Interestingly, the power of detection in environment 3 is almost 0. This is
not because the QTL has no interaction effect in environment 3, but because the
main effect and the QE3 effect cancel each other out. For the last QTL (Q3-1), the
difference in power between using data from all environments together and using
data from only one environment is caused by the change in sample size and not by
QE interaction effects.
In terms of detection power, the overall performance of the IM and CIM methods is
quite similar in this simulation. There are two possible reasons. First, the IM
method performs quite well under the one-QTL model. Second, the small sample
size (only 100 individuals; the 2 replicates do not seem to help much) harms the
CIM method more than the IM method. However, in terms of the probability of
detecting false QTLs, the CIM method is still much better than the IM method.
Table 3-17. The power of QTL detection and the probability of false QTL detected for IM and CIM methods under various environments (Threshold LOD = 2.5).
          E1-3              E1                E2                E3
QTLs      IM      CIM       IM      CIM       IM      CIM       IM      CIM
Q1-1       1.95    0.98     14.33   21.00     15.67   23.67     91.67   96.67
Q2-1      98.05   100       94.33   97.33     92.67   86.67      0.67    0.00
Q3-1      97.07   98.05     51.67   51.67     45.33   50.33     61.33   73.00
FQTL1     37.07   10.73     30.00   12.67     27.00   18.33     20.33    4.33
FQTL2     11.22    0.98      4.00    0.67      4.67    1.33      1.67    0.33
FQTL2+     1.95    0.00      0.00    0.00      0.33    0.00      0.67    0.00
Table 3-18. Estimation of positions and 95% ECI for detected QTLs under various environments (Threshold LOD = 2.5).
        E1-3                                  E1
QTLs    IM                 CIM                IM                 CIM
Q1-1    25.4(12.7−32.6)    27.2(23.7−30.6)    21.5(8.0−28.6)     21.5(10.0−32.6)
Q2-1    55.1(41.9−64.2)    56.0(46.8−66.2)    55.0(39.9−66.2)    55.8(44.8−66.2)
Q3-1    95.4(84.9−104.0)   95.2(84.9−104.0)   95.9(82.9−106.0)   96.0(84.9−106.0)
        E2                                    E3
QTLs    IM                 CIM                IM                 CIM
Q1-1    22.6(12.7−32.6)    22.8(12.7−34.6)    22.8(12.7−32.6)    23.0(14.7−34.6)
Q2-1    55.1(41.9−66.2)    55.8(46.8−64.2)    62.2(62.2−62.2)    --------
Q3-1    95.6(82.9−106.0)   90.3(82.9−104.0)   95.6(84.9−106.0)   96.6(90.0−106.0)
Note: bold numbers correspond to positions with high power of QTL detection (greater than 90.0).
Table 3-19. Estimation of effects and 95% ECI for detected QTLs under various environments (Threshold LOD = 2.5).
        E1-3                                      E1
QTLs    IM                   CIM                  IM                   CIM
Q1-1    -0.41(-0.46−0.35)    -0.35(-0.35−-0.35)   -0.66(-0.96−0.55)    -0.61(-0.90−0.49)
Q2-1    0.62(0.43−0.85)      0.62(0.44−0.78)      0.91(0.62−1.26)      0.91(0.60−1.20)
Q3-1    0.63(0.42−0.83)      0.64(0.47−0.82)      0.77(0.57−0.99)      0.76(0.57−0.98)
        E2                                        E3
QTLs    IM                   CIM                  IM                   CIM
Q1-1    -0.75(-1.03−0.54)    -0.65(-0.96−0.51)    0.78(0.55−1.04)      0.77(0.54−1.05)
Q2-1    1.05(0.73−1.36)      1.07(0.74−1.42)      -0.60(-0.61−0.59)    --------
Q3-1    0.78(0.57−1.06)      0.73(0.50−1.03)      0.73(0.54−0.98)      0.70(0.52−0.95)
The estimated positions of the detected QTLs and their 95% ECIs are shown in
Table 3-18. The position estimates are quite accurate, especially for the QTLs with
high detection power (bold numbers). The 95% ECI range is about 20 cM for most
QTLs. The overall results for the position estimates and the corresponding 95%
ECIs are quite similar for the IM and CIM methods. For the positions with high
detection power, the estimated effects of the detected QTLs (Table 3-19) are
compatible with the effects estimated at the known QTL positions (Table 3-16).
However, the estimated effects of the detected QTLs are not accurate for positions
with low detection power (see also Table 3-17).
- QTL Epistasis
The estimates of the QTL additive (main) effects in Table 3-20 were obtained by
averaging the estimated effects at each known QTL position over the 300
replications. Although Model-AA has extra epistatic effects compared with
Model-A, it is interesting to see that the average estimates of the QTL additive
effects are very similar for Model-AA and Model-A with both the IM and CIM
methods. This implies that epistatic effects may not affect the average estimates
of the QTL additive effects when a DH or B1 population is used. In other words,
the additive effects can still be estimated without bias from epistasis when a DH
or B1 population is used, even if epistatic effects exist. The estimates can,
however, still be biased by linkage between QTLs, as in the simple additive
Model-II in Section 3-3. In addition, the standard deviation of the estimated
additive effects is larger in Model-AA than in Model-A. Therefore, as indicated by
this simulation study, QTL epistatic effects may increase the variance of the
estimates of QTL additive effects.
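One way to see why epistasis leaves the average additive estimates unchanged while inflating their spread is the following sketch. It rests on assumptions not taken from this section: genotypes coded +1/-1 with equal frequencies (a DH population without segregation distortion), no crossover interference, and a simple single-locus regression rather than the likelihood-based IM or CIM analysis, so it illustrates the mechanism only.

```latex
% Assumed coding: x_k = +1 or -1 for the two parental genotypes at QTL k,
% each with probability 1/2; r_{kl} is the recombination fraction between k and l.
\begin{align*}
y &= \mu + \sum_{l} a_l x_l + \sum_{i<j} aa_{ij}\, x_i x_j + e, \\
\operatorname{Cov}(x_k, y)
  &= \sum_{l} a_l \operatorname{Cov}(x_k, x_l)
   + \sum_{i<j} aa_{ij}\, \mathrm{E}[x_k x_i x_j], \\
\mathrm{E}[x_k x_i x_j] &= 0
  \quad \text{(the genotype distribution is unchanged under } x \mapsto -x \text{)}, \\
\operatorname{Cov}(x_k, x_l) &= 1 - 2 r_{kl}, \qquad
\operatorname{Var}(x_i x_j) = 1 - (1 - 2 r_{ij})^2 .
\end{align*}
```

Under these assumptions the epistatic terms contribute nothing to the covariance between a QTL genotype and the phenotype, so the expected regression slope depends only on the additive effects and on linkage. Each epistatic term still adds variance to the phenotype that a purely additive analysis treats as noise, which is consistent with the larger standard deviations in Model-AA. The same symmetry argument would apply to a B1 population under the same coding.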
Table 3-21 summarizes the detection power and the probability of detecting false
QTLs under Model-AA and Model-A. For the CIM method, the detection power is
generally lower in Model-AA than in the simple additive Model-A, especially for
Q1-2, Q1-3, and Q3-2. In addition, the probability of detecting false QTLs is much
higher in Model-AA (75.67 versus 9.67). These results imply that, if epistatic
effects exist, the QTL mapping results of the CIM method, including detection
power and false QTLs, are seriously affected. The impact of epistatic effects on
the IM method, on the other hand, is not serious in this particular simulation.
Comparing the two methods in the presence of epistatic effects, CIM has the
higher detection power and IM has the lower probability of detecting false QTLs,
as demonstrated in this simulation study.
Table 3-20. Estimation of QTL additive effects for Model-AA and Model-A by using IM and CIM methods at the known QTL positions.
        IM & MdAA        IM & MdA         CIM & MdAA       CIM & MdA
QTLs    Est     SD       Est     SD       Est     SD       Est     SD
Q1-1     1.15   0.209     1.15   0.183     0.94   0.242     0.94   0.165
Q1-2     1.05   0.224     1.05   0.190     0.80   0.283     0.80   0.205
Q1-3    -0.22   0.207    -0.22   0.182    -0.70   0.174    -0.76   0.133
Q2-1     1.62   0.201     1.62   0.161     1.60   0.235     1.62   0.124
Q2-2     0.40   0.238     0.39   0.204     0.01   0.185     0.01   0.131
Q3-1     0.14   0.214     0.14   0.187     0.13   0.191     0.13   0.141
Q3-2    -0.13   0.241    -0.14   0.211    -0.13   0.195    -0.25   0.164
MdAA and MdA = QTL Model-AA and Model-A, respectively; Est = estimate of the QTL additive effect (average of 300 replications); SD = standard deviation.
Table 3-21. Detection power of QTLs and probability of false QTL detected for Model-AA and Model-A by using IM and CIM methods.
        IM Method               CIM Method
QTLs    Model-AA    Model-A     Model-AA    Model-A
Q1-1     88.67       91.67      100.00      100.00
Q1-2     17.33       16.33       30.67       60.00
Q1-3      1.00        3.00       68.67       98.00
Q2-1     99.67      100.00      100.00      100.00
Q2-2      4.67        5.33        0.33        0.33
Q3-1      0.67        0.00        1.00        2.33
Q3-2      0.67        1.00        1.00        6.00
FQTL     12.00       11.00       35.67        9.67
FQTL = probability of false QTL detected in the whole genome.
As shown in Table 3-22, the estimated position of Q1-1 has some bias, especially
for the CIM method. This bias is caused by QTL linkage (Q1-1 and Q1-2). For the
CIM method, the position estimates for the QTLs with median effects (Q1-2 and
Q1-3) are quite accurate in both Model-AA and Model-A. For the QTL with a large
effect (Q2-1), the estimation of QTL position is unbiased, especially for the simple
additive Model-A. In general, the position estimates for the detected QTLs are
quite accurate for both QTL models with both the IM and CIM methods.
The estimated additive effects of the detected QTLs (see also Table 3-22) are
unbiased in every situation for the QTL with a large effect (Q2-1). For the QTLs
with median effects, with the CIM method Q1-1 is overestimated because of QTL
linkage, and Q1-2 is overestimated in Model-AA but unbiased in Model-A. For the
IM method, the effects of the detected QTLs are somewhat overestimated,
especially for QTLs with median and small effects.
Table 3-22. Estimation of QTL positions and additive effects for the detected QTLs of Model-AA and Model-A.
        IM & MdAA        IM & MdA         CIM & MdAA       CIM & MdA
QTLs    Pos     Eff      Pos     Eff      Pos     Eff      Pos     Eff
Q1-1     14.6    1.2      15.3    1.2      17.3    1.4      17.2    1.4
Q1-2     28.7    1.2      28.0    1.2      30.4    1.1      30.7    0.9
Q1-3     83.4   -0.9      77.4   -0.7      75.8   -0.8      75.5   -0.8
Q2-1      8.7    1.7       8.7    1.6       9.6    1.6       8.8    1.6
Q2-2     83.2    0.9      81.1    0.8      89.7   -0.6      89.7    0.5
Q3-1     52.0    0.7      ---     ---      51.9    0.3      46.7    0.6
Q3-2    102.0   -0.8     104.7   -0.7     100.3   -0.6      98.1   -0.6
Pos = QTL position in cM; Eff = average QTL additive effect for the detected QTLs.
3. Using MCIM Method
- QTL by Environment Interaction
By using the mixed linear model approach (2-13), the MCIM method is able to
analyse the QTL mapping data of all environments together. As shown in Table
3-23, the simulation results indicate that the estimation of QTL main effects and
the prediction of QTL by environment interaction effects are unbiased. In contrast,
it is difficult to obtain unbiased estimates or predictions of the QE interaction
effects with the IM or CIM method (Table 3-16).
Table 3-23. Estimation of main effect and QE interaction effects at the QTL positions for Model-AE when using MCIM method with mixed linear model approach.
                       Main             E1               E2               E3
QTLs    Chr    Pos     Q       Est      QE1     Est      QE2     Est      QE3     Est
Q1-1    1      22.0    0.00    0.08    -0.34   -0.35    -0.38   -0.36     0.72    0.71
Q2-1    2      54.7    0.62    0.64     0.28    0.29     0.39    0.36    -0.67   -0.65
Q3-1    3      96.4    0.62    0.61     0.00   -0.02     0.00    0.02     0.00    0.00
Q, QE1, QE2, QE3 = real QTL effect (parameter); Est = estimate or prediction of the QTL effect (average of 300 replications).
Table 3-24 shows the power of QTL detection and the estimated positions and
effects of the detected QTLs for Model-AE with the MCIM method. For Q1-1, the
power of detection is quite low, and the estimates of the position and main effect
of the detected QTL have some bias. This could be caused by the QTL having no
main effect (0.00). However, the estimates of the QE interaction effects for the
detected Q1-1 are reasonably good. The MCIM method has high power to detect
Q2-1 and Q3-1, and the estimates of the main effects and QE interaction effects
for the detected Q2-1 and Q3-1 are unbiased. The range of the 95% ECI for these
three QTLs is about 20 cM.
Table 3-24. Estimation of powers, positions and effects for the detected QTLs of Model-AE when using MCIM method with mixed linear model approach.
                 Positions                    QE Interaction Effects
QTL     Power    Est     ECI        Main      QE1      QE2      QE3
Q1-1    12.5     25.6    12−34      0.17     -0.42    -0.38     0.80
Q2-1    96.4     55.0    44−64      0.64      0.31     0.38    -0.69
Q3-1    97.7     94.9    84−104     0.63     -0.03     0.01     0.02
ECI = 95% ECI of the QTL position; Main = QTL main effect; QE1, QE2, and QE3 = QTL by environment interaction effects for environment 1, environment 2, and environment 3.
- QTL Epistasis
Table 3-25. The estimation of additive and epistatic effects at the QTL positions for Model-AA by using MCIM method when the mixed linear model is used (300 replications).
Additive                     Epistatic
QTLs    Eff      Est         QTL i    QTL j    Eff      Est
Q1-1     0.80     1.02       Q1-1     Q2-2      0.48     0.51
Q1-2     0.80     0.92       Q1-2     Q1-3      0.96     0.98
Q1-3    -0.80    -0.82       Q2-1     Q2-2     -0.96    -0.97
Q2-1     1.61     1.63       Q2-2     Q3-1      1.92     1.89
Q2-2     0.00     0.03
Q3-1     0.27     0.24
Q3-2    -0.27    -0.25
Eff = parameter setting for the QTL effect; Est = estimate of the QTL effect.
According to the simulation study (3-4-2), epistatic effects reduce the
efficiency and distort the results of QTL mapping when the IM or CIM method is used. In
addition, these two methods offer no way to estimate QTL epistatic effects.
On the other hand, with the mixed linear model approach (2-14), the MCIM
method can analyse QTL additive effects and QTL epistatic
effects at the same time by fitting two intervals into the model. QTLs with additive
effects and (or) epistatic effects can be located through a two-dimensional search
procedure (Wang et al. 1999).
Table 3-25 shows the estimates of the QTL additive effects and QTL epistatic
effects for Model-AA obtained with the MCIM method. The simulation results indicate that the
estimates of most additive and epistatic effects are unbiased. The linkage between
Q1-1 and Q1-2 may cause the overestimation of the additive effects of these two
QTLs.
Table 3-26. Power of QTL detection and estimates of additive and epistatic effects at the QTL positions for Model-AA and Model-A, using the MCIM method with the mixed linear model (300 replications; threshold LOD = 2.5).

Power and additive effects
QTL    Power MdAA(1)  Power MdA(2)  Additive MdAA  Additive MdA
Q1-1   85.0           91.0          1.02           1.13
Q1-2   66.0           69.7          0.92           0.96
Q1-3   61.5           54.0          -0.82          -0.71
Q2-1   99.3           99.7          1.63           1.63
Q2-2   43.3           0.33          0.03           0.03
Q3-1   32.0           1.67          0.24           0.14
Q3-2   1.0            1.33          -0.25          -0.15

Epistatic effects
QTL i  QTL j  MdAA    MdA
Q1-1   Q2-2   0.51    -0.04
Q1-2   Q1-3   0.98    0.07
Q2-1   Q2-2   -0.97   -0.05
Q2-2   Q3-1   1.89    0.03

(1) Model-AA. (2) Model-A.
Table 3-26 shows the power of QTL detection and the estimates of QTL effects
at the known QTL positions for Model-AA and Model-A when the MCIM method is
used. Q2-2 and Q3-1 are detected with much higher power
under Model-AA than under Model-A, and the same is true, to a smaller degree, for Q1-3. This result
implies that QTL epistatic effects can improve the QTL detection power of
the MCIM method. The estimates of the QTL epistatic effects for the simple additive
Model-A are almost 0. This indicates that there is no harm in mapping a simple additive
model with the extended epistatic model of the MCIM method.
Table 3-27. Power of QTL detection, probability of false QTL detection, and estimates of QTL positions for the detected QTLs for Model-AA and Model-A, using the MCIM method with the simple additive model (300 replications; threshold LOD = 2.5).

QTL    Power Model-AA  Power Model-A  Position (ECI) Model-AA  Position (ECI) Model-A
Q1-1   83.33           91.67          15.6 (6, 26)             15.8 (8, 26)
Q1-2   29.33           38.33          28.9 (19, 35)            28.8 (17, 35)
Q1-3   35.67           57.67          75.0 (64, 85)            75.2 (64, 83)
Q2-1   100.00          100.00         8.6 (6, 12)              8.5 (6, 12)
Q2-2   1.67            0.67           82.9 (73, 96)            86.4 (83, 90)
Q3-1   2.33            2.33           49.6 (40, 60)            47.2 (37, 53)
Q3-2   0.67            1.33           93.0 (92, 94)            105.0 (104, 106)
FQTL   11.00           7.67
Table 3-27 gives the simulation results for the power of QTL detection, the probability of
false QTL detection, and the estimates of QTL positions for the detected QTLs when
the MCIM method is used with the simple additive model. The results imply that, under the
simple additive model, the power of QTL detection decreases and the probability
of false QTLs increases when epistatic effects exist. On the other hand, the
estimates of QTL positions remain unbiased in this situation.
4. Model Selection and Criteria
4-1 MIM and Model Selection
The Multiple Interval Mapping (MIM) method uses multiple marker intervals
simultaneously to fit multiple putative QTLs in the model for QTL mapping. It
is a multiple-QTL oriented method that combines QTL mapping analysis with the
analysis of the genetic architecture of quantitative traits. Through a search algorithm, the
method obtains detailed information about the QTLs simultaneously, such as the
number, positions, effects and interactions of the significant QTLs.
The search strategy of the MIM method is to select the best (or a better) genetic model
in the parameter space. In other words, it is a model selection problem. Model
selection is therefore the key component of the analysis and the basis of genetic
parameter estimation and data interpretation in any QTL mapping method that uses
multiple intervals. Model selection in a high and unknown dimension
is very complicated. Appropriate criteria or stopping rules for model
selection are very important but very difficult to decide.
In this research, we study the properties of the criteria for model selection in
the QTL mapping framework. Only the ideal case is considered here. First, the cross
design is a backcross from pure-breeding parental lines, chosen for its simplicity. However, the
results can be extended to other experimental designs with only two different marker
genotypes, such as DH and RIL populations. For more complicated populations such as
F2, the basic principles still hold. Second, all QTL effects are assumed to be
the same and all markers equally spaced, for the sake of standardizing the criteria.
Finally, all QTLs are located exactly on markers.
As an example, the parameter settings for the starting model (Model-S) are as follows: the sample
size is 300; the genome has 3 chromosomes, each with 14 evenly distributed markers
at a marker distance of 8 cM; and there are 8 QTLs with the same effect,
4 on chromosome 1, 1 on chromosome 2, and 3 on
chromosome 3. When the heritability is set to 0.8, 0.5, and 0.2, the QTL effects are
1.014, 0.507, and 0.169 respectively.
Under these conditions, model selection can be done directly on the markers by
least squares multiple regression. The results obtained under this assumption are
still useful for two reasons. First, if the marker density is high, the distance between a QTL
and the nearest marker is very small and can be ignored. Second, when markers are
sparse, MIM can use the maximum likelihood method (Kao and Zeng 1997) to estimate
the QTL position from the marker genotypes and positions.
The principle of model selection, however, should be the same, so our
results remain useful for MIM model selection in practice.
4-2 Model Evaluation Standard
Consider a multiple regression model for a backcross population:
$$y_i = \mu + \sum_{j=1}^{M} \beta_j X_{ij} + \varepsilon_i$$ (4-1)
where $y_i$ is the trait value of individual i, $\mu$ is the mean of the model and M is the
number of markers fitted in the model. $\beta_j$ is the partial regression coefficient (marker
effect) for marker j and $X_{ij}$ is the marker indicator variable for individual i and marker
j. For the backcross population, $X_{ij}$ has two possible values, for example 1 for the MM
and 0 for the Mm marker genotype. $\varepsilon_i$ is a random residual variable, assumed to be normally
distributed with mean 0 and variance $\sigma^2$.
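To make the data structure behind formula (4-1) concrete, the following is a minimal sketch of how marker indicators and trait values for one backcross chromosome could be simulated. It is an illustration only: the Haldane map function, the function names, and the chosen values of mu, sigma and the two QTL effects are assumptions made for this example and are not taken from the dissertation.

```python
import math
import random

def haldane_r(d_cm):
    """Recombination frequency for a map distance in cM (Haldane map function, assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def simulate_chromosome(n_markers, spacing_cm, rng):
    """Marker indicators X_ij (1 = MM, 0 = Mm) along one chromosome of one individual."""
    r = haldane_r(spacing_cm)
    x = [rng.randint(0, 1)]
    for _ in range(n_markers - 1):
        # The genotype switches between adjacent markers with probability r.
        x.append(1 - x[-1] if rng.random() < r else x[-1])
    return x

def simulate_trait(x, beta, mu, sigma, rng):
    """y_i = mu + sum_j beta_j * X_ij + e_i, with e_i ~ N(0, sigma^2)."""
    return mu + sum(b * v for b, v in zip(beta, x)) + rng.gauss(0.0, sigma)

rng = random.Random(1)
n_markers, spacing = 14, 8.0          # one chromosome laid out as in Model-S
beta = [0.0] * n_markers
beta[2] = beta[9] = 0.5               # hypothetical QTL effects placed on two markers
X = [simulate_chromosome(n_markers, spacing, rng) for _ in range(300)]
y = [simulate_trait(row, beta, mu=10.0, sigma=1.0, rng=rng) for row in X]
```

Placing the nonzero entries of `beta` exactly on markers mirrors the ideal case considered in this chapter, where every QTL coincides with a marker.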
The goal of model selection is to find a better (not necessarily the best) model with
M markers through a search procedure. Ideally, the selected markers are the QTLs in our ideal
situation. Two kinds of error are possible. The first type of
error (called $\alpha$) is that some selected markers in the final model are not QTLs. This
kind of error is related to the Type I error in some sense. The second type of error
(called $\beta$) is that some QTLs (markers) are not included in the final model; this
kind of error is related to the detection power ($1-\beta$) of QTL. It is very important to
balance these two types of error in model selection.
As a model selection standard, we define three parameters, $\alpha$, $\beta$ and $\chi$,
for measuring the degree of fit between the selected model and the real model.
Let the real QTL number be N and the identified QTL number be n, and let the real QTL
position (cM) be P and the identified QTL position be p, with positions measured
from the beginning of the chromosome.
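$$\alpha = \frac{1}{N}\sum_{c}(n_c - N_c), \quad c = 1, 2, \ldots, C, \ \text{if } n_c \ge N_c$$ (4-2)
$$\beta = \frac{1}{N}\sum_{c}(N_c - n_c), \quad c = 1, 2, \ldots, C, \ \text{if } n_c \le N_c$$ (4-3)
$$\chi = \frac{\sum_{c}\sum_{t}\left|P_{ct} - p_{ct}\right|}{\sum_{c} R_c}, \quad c = 1, 2, \ldots, C \ \text{and}\ t = 1, 2, \ldots, R_c, \quad R_c = \min(n_c, N_c)$$ (4-4)
where C is the chromosome number and the subscript c refers to chromosome c, $\alpha$ is the percentage of wrongly identified QTLs,
$\beta$ is the percentage of missed QTLs, and $\chi$ is the average distance between the
identified QTLs and the real QTLs.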
For each chromosome, if the real QTL number is not equal to the identified QTL
number, there are many ways to pair the real and identified QTL positions.
The criterion used here is to choose the pairing that minimizes the total distance for each chromosome.
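The three fit measures and the minimum-distance pairing rule can be sketched as follows. This is an illustrative reading of formulas (4-2) to (4-4), not the program used for the simulations: the function names and the example positions are hypothetical, and the pairing is found by brute force over subsets, which is only practical for the handful of QTLs per chromosome considered here.

```python
from itertools import combinations

def min_total_distance(real, found):
    """Smallest total |P - p| over pairings of R = min(n_c, N_c) positions."""
    short, long_ = sorted(real), sorted(found)
    if len(short) > len(long_):
        short, long_ = long_, short
    r = len(short)
    if r == 0:
        return 0.0, 0
    # For points on a line, pairing in sorted order is optimal once the
    # subset of the longer list is fixed, so only subsets are enumerated.
    best = min(sum(abs(a - b) for a, b in zip(short, subset))
               for subset in combinations(long_, r))
    return float(best), r

def fit_measures(real_by_chr, found_by_chr):
    """Return (alpha, beta, chi) following formulas (4-2) to (4-4)."""
    n_real = sum(len(real) for real in real_by_chr)        # N
    extra = missed = pairs = 0
    total_dist = 0.0
    for real, found in zip(real_by_chr, found_by_chr):
        extra += max(len(found) - len(real), 0)            # n_c - N_c when n_c >= N_c
        missed += max(len(real) - len(found), 0)           # N_c - n_c when n_c <= N_c
        dist, r = min_total_distance(real, found)
        total_dist += dist
        pairs += r
    chi = total_dist / pairs if pairs else float("nan")
    return extra / n_real, missed / n_real, chi

real = [[16, 40, 72, 104], [48], [8, 40, 80]]      # hypothetical true positions (cM)
found = [[16, 40, 72, 88, 104], [48], [8, 48]]     # hypothetical selected markers (cM)
alpha, beta, chi = fit_measures(real, found)
```

In this made-up example one extra marker is selected on the first chromosome and one QTL is missed on the third, so both alpha and beta equal 1/8, and the only nonzero pairing distance is the 8 cM between positions 40 and 48.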
4-3 Model Selection Strategy and Criteria
One of the difficulties of model selection is that there are too many potential
models to consider. In our situation, if the total marker number is M, there are
about $2^M$ possible models. For example, if the total marker number is 100
for the whole genome, there are more than $1.2 \times 10^{30}$ possible models, and it is
infeasible to test all of them to obtain the best model.
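The size of the model space is easy to check numerically. The short sketch below counts the subsets of M = 100 markers, grouped by the number of regressors p; the variable names are illustrative.

```python
from math import comb

M = 100                                               # total number of markers
class_sizes = [comb(M, p) for p in range(M + 1)]      # number of models with p regressors
total_models = sum(class_sizes)

print(total_models == 2 ** M)                         # the subset counts add up to 2^M
print(f"{float(total_models):.3e}")                   # order of magnitude of the search space
print(max(class_sizes) == comb(M, M // 2))            # the largest group is at p = M / 2
```

Even the single group with p = 50 regressors contains about 10^29 models, so restricting attention to models of one size does not make an exhaustive search feasible.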
Comparisons between models fall into two major groups: models with the same
number of regressors and models with different numbers of regressors. If the whole
genome has M markers, all possible models can be divided into M+1 classes, each
containing the models with the same number of regressors (from the model with M regressors to
the one with no regressor, only the model mean). The criterion for selecting the best
model among the models with the same number of regressors is relatively simple. The
best model is the model with the largest coefficient of determination ($R^2$) or the
smallest residual sum of squares (RSS). From formulas (4-5) and (4-6), it is easy to see
that $R^2$ can be considered a measure of the goodness of fit of the model and that
maximizing the value of $R^2$ is equivalent to minimizing the value of RSS.
$$R^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$$ (4-5)
$$RSS = (1 - R^2)\sum_i (Y_i - \bar{Y})^2$$ (4-6)
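The relation between formulas (4-5) and (4-6) can be checked on a small example. The sketch below fits a one-marker regression by ordinary least squares and confirms that the residual sum of squares equals (1 - R^2) times the total sum of squares. The data values and the helper name `fit_simple` are made up for illustration.

```python
def fit_simple(x, y):
    """Least squares intercept and slope for a single regressor."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

x = [0, 0, 1, 1, 0, 1, 1, 0]                          # marker indicator (hypothetical)
y = [9.8, 10.4, 11.1, 10.9, 10.1, 11.4, 10.6, 9.9]    # trait values (hypothetical)

intercept, slope = fit_simple(x, y)
fitted = [intercept + slope * v for v in x]
ybar = sum(y) / len(y)

tss = sum((v - ybar) ** 2 for v in y)                 # total sum of squares
ess = sum((f - ybar) ** 2 for f in fitted)            # numerator of (4-5)
rss = sum((v - f) ** 2 for v, f in zip(y, fitted))    # residual sum of squares
r2 = ess / tss                                        # formula (4-5)

# Formula (4-6): RSS = (1 - R^2) * TSS, so a larger R^2 means a smaller RSS.
assert abs(rss - (1.0 - r2) * tss) < 1e-9
```

Because the total sum of squares is fixed by the data, ranking models of the same size by R^2 or by RSS gives the same order.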
Comparing models with different numbers of regressors is the most difficult task in model
selection. The reason is that, as the number of regressors increases, the value of $R^2$
never decreases and the value of RSS never increases. Therefore, it is impossible to
decide which model is better by simply comparing the values of $R^2$ or RSS. One
must decide what increase in $R^2$ is required before accepting an
additional regressor or, thinking backward, what decrease in $R^2$ is acceptable
when dropping a regressor.
In summary, there are two kinds of problem to deal with in model
selection. The first is to find the best model within each class of models with the
same number of regressors. Here the criterion itself is simple (use $R^2$ or RSS), but
the difficulty is that there are too many models to consider. One way to solve this
problem is to use a search procedure that covers a limited part of the model space to find a better
model (there is no way to guarantee the best). Such search methods include forward,
backward, stepwise, and branch-and-bound selection. The second is to find the best
model among the models with different numbers of regressors by using certain criteria. Our
research deals with the first problem, but the focus is on the second one: the criteria
or stopping rules for model selection.
4-4 Procedure of Model Selection
The first step of model selection is to find the best (or at least a better) model for
each class with the same number of regressors without doing an exhaustive search. For
M markers, we can find the M+1 models
$$\eta_p, \quad p = 0, 1, \ldots, M$$
where p is the number of regressors in the model.
The forward stepwise selection (FW) or backward elimination (BW) method can be
used for this purpose. The FW method chooses the subset models by adding one regressor
at a time to the previously chosen model. It starts with the one-regressor
model, selecting the regressor that contributes the largest sum of squares (SS) to the
model. At each successive step, the regressor not already in the model that causes the
largest decrease in the RSS (that is, has the largest partial sum of squares) is added to the
model. This procedure can continue until all regressors are in the model. The BW method
chooses the subset models by starting with the full model and then eliminating, at
each step, the regressor whose deletion causes the RSS to increase the least. This
is the regressor in the current model that has the smallest partial sum of squares.
This procedure can also continue until the model contains no regressor (only the
model's mean is left).
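The backward elimination procedure described above can be sketched in a few lines of plain Python. This is a simplified illustration rather than the program used in the study: the helper names `rss_of` and `backward_path` are hypothetical, the least squares fit solves the normal equations directly, and the example data use independent markers for brevity, whereas markers on a real chromosome are linked.

```python
import random

def rss_of(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the columns in cols."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(rows[0])
    # Normal equations (Z'Z) b = Z'y stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for i in range(k):                                # Gauss-Jordan with partial pivoting
        piv = max(range(i, k), key=lambda m: abs(A[m][i]))
        A[i], A[piv] = A[piv], A[i]
        for m in range(k):
            if m != i:
                f = A[m][i] / A[i][i]
                A[m] = [a - f * b for a, b in zip(A[m], A[i])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, y))

def backward_path(X, y):
    """One selected model per class: {p: (columns kept, RSS)} for p = M, ..., 0."""
    cols = list(range(len(X[0])))
    path = {len(cols): (tuple(cols), rss_of(X, y, cols))}
    while cols:
        # Drop the regressor whose removal increases the RSS the least.
        drop = min(cols, key=lambda c: rss_of(X, y, [d for d in cols if d != c]))
        cols.remove(drop)
        path[len(cols)] = (tuple(cols), rss_of(X, y, cols))
    return path

# Hypothetical data: 6 independent 0/1 markers, with effects on columns 1 and 4.
rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(6)] for _ in range(200)]
y = [2.0 * row[1] + 2.0 * row[4] + rng.gauss(0.0, 0.5) for row in X]
path = backward_path(X, y)
print(path[2][0])                                     # columns kept in the two-regressor model
```

The returned dictionary holds one model for each number of regressors, which is the input needed by a stopping rule. A forward version would start from the empty model and add, at each step, the column that lowers the RSS the most.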
Compared with an exhaustive search, the FW or BW method saves a great amount of
computation time. The cost is that, once a regressor is included by the FW method, it
stays in all later models, and once a regressor is excluded by the BW method,
it has no chance to enter again. Therefore,
there is no guarantee that the model selected by the FW or BW method is the best model in
each class with the same number of regressors. However, in our situation, because of the
linear structure of marker positions and QTL locations, the model
selected by the FW or BW method is expected to be the best model in each class. Zeng (1993) proved an
important property of the partial regression coefficient in multiple regression
analysis: the partial regression coefficient is expected to depend only on
those QTLs that are located in the interval bracketed by the two neighboring markers
if there is no crossing-over interference and no epistasis, as shown in formula (4-7).
$$b_i \approx \sum_{x_{i-1} < g_k < x_i} \frac{r_{x_{i-1} g_k}}{r_{x_{i-1} x_i}}\,\delta_k + \sum_{x_i < g_k < x_{i+1}} \frac{r_{g_k x_{i+1}}}{r_{x_i x_{i+1}}}\,\delta_k = \delta_i$$ (4-7)
In our ideal situation (Model-S), the partial regression coefficient is expected to
equal the QTL effect for markers with a QTL and 0 for markers without a QTL. Even
when the QTLs are not exactly on the markers, the partial regression coefficients of
markers near the QTLs are expected to be large compared with those of markers far away from
the QTLs. Because the partial regression coefficient is directly
related to the partial sum of squares, markers with a QTL are expected
to be selected first by the FW method and markers without a QTL
to be eliminated first by the BW method.
Table 4-1 shows the simulation results for the average $R^2$ of the true model
and of the model selected by the BW method when Model-S is used. For each
replicated sample, the $R^2$ of the true model is calculated by fitting the true parameters
(real QTL number and positions) in the multiple regression model. The $R^2$ of the
selected model is calculated by selecting the model with 8 regressors from the full
model with the BW method. Therefore, the only difference between these two models
is the marker (QTL) positions. The BW method clearly does very well,
and its average $R^2$ is even larger than that of the true model in some cases.
Because of sampling variance, the true model is not necessarily the model with the maximum
$R^2$, but it is usually a good one (it has a small standard deviation).
Table 4-1. Comparison of the coefficient of determination (R2) between the model selected by the backward procedure and the true model for Model-S. The sample size is 300 and the number of replications is 1000.

              Backward Selected Models              True Models
Heritability  Low 2.5%(1)  Mean(2)  Up 2.5%(3)      Low 2.5%  Mean    Up 2.5%
0.8           0.8029       0.8041   0.8054          0.8027    0.8043  0.8052
0.5           0.5196       0.5219   0.5242          0.5109    0.5132  0.5154
0.2           0.2524       0.2549   0.2574          0.2194    0.2218  0.2243

(1) The value of R2 at the lower 2.5% position. (2) Average value of R2. (3) The value of R2 at the upper 2.5% position.
4-5 Summary of Criteria for Model Selection
After a series of models with different numbers of regressors has been obtained by
the BW or FW method, the more difficult question remains: which model
(with how many regressors) is the model we are looking for? This is the problem of
the stopping rule, or model selection criterion. Some of the criteria are as follows:
1. Adjusted R2
$$\bar{R}_p^2 = 1 - (1 - R_p^2)\frac{n}{n-p}$$
where p is the number of regressors and n is the sample size. Because the value of $R^2$
never decreases as the number of regressors increases, the adjusted $R^2$ uses
the number of regressors as a penalty to strike a balance. The model with the maximum
value of adjusted $R^2$ is the final model under this criterion.
2. Mallows' Cp (Mallows 1973)
$$C_p = \frac{RSS_p}{\hat{\sigma}^2} + 2p - n$$
where $RSS_p$ is the residual sum of squares for the model with p regressors and $\hat{\sigma}^2$ is
the estimate of the residual variance $\sigma^2$. The model with the minimum value of $C_p$
is the final model under this criterion.
3. Mean Squared Error of Prediction (Aitkin 1974, Miller 1990)
The MSEP statistic is often called the PRESS statistic:
$$PRESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad i = 1, 2, \ldots, n$$
where $y_i$ is the trait value and $\hat{y}_i$ is the estimate of the trait value for individual i.
The parameters for estimating $\hat{y}_i$ are obtained by the least squares method using all
samples except individual i. This method is usually called cross-validation, and it
is possible to drop several individuals at a time instead of just one. The model
with the minimum PRESS statistic is the final model under this criterion.
4. BIC and Related Criteria
$$B(p) = n\ln(\hat{\sigma}_p^2) + p\,g(n)$$ (4-8)
where p is the number of regressors in the model, n is the sample size, $\hat{\sigma}_p^2$ is the
least squares or maximum likelihood estimate of the residual variance for the model with
p regressors, and $g(n)$ is some function of the sample size n.
Setting $g(n) = 2$ gives the AIC criterion (Akaike 1970, 1973), and $g(n) = \ln(n)$ gives
the BIC criterion defined by Schwarz (1978) and Rissanen (1978). Other functions
$g(n)$ can be used (Debasis and Murali, 1996) as long as
they satisfy the following properties:
$$\lim_{n\to\infty}\frac{g(n)}{n} = 0 \quad \text{and} \quad \lim_{n\to\infty}\frac{g(n)}{\ln\ln n} = \infty$$
Table 4-2. Relationship between sample size and the g(n) functions.

Samples  [ln(n)]^0.1  n^0.1  AIC   [n ln(n)]^0.1  [ln(n)]^0.5  [ln(n)]^0.9  BIC   n^0.5  [n ln(n)]^0.5
100      1.16         1.58   2.00  1.85           2.15         3.95         4.61  10.00  21.46
150      1.17         1.65   2.00  1.94           2.24         4.26         5.01  12.25  27.42
200      1.18         1.70   2.00  2.01           2.30         4.48         5.30  14.14  32.55
250      1.19         1.74   2.00  2.06           2.35         4.65         5.52  15.81  37.15
300      1.19         1.77   2.00  2.11           2.39         4.79         5.70  17.32  41.37
350      1.19         1.80   2.00  2.14           2.42         4.91         5.86  18.71  45.28
400      1.20         1.82   2.00  2.18           2.45         5.01         5.99  20.00  48.95
450      1.20         1.84   2.00  2.21           2.47         5.10         6.11  21.21  52.43
500      1.20         1.86   2.00  2.23           2.49         5.18         6.21  22.36  55.74
1000     1.21         2.00   2.00  2.42           2.63         5.69         6.91  31.62  83.11
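The columns of Table 4-2 can be recomputed directly from the definitions of the g(n) functions. The sketch below is only a convenience for checking or extending the table to other sample sizes; the dictionary name and the labels are chosen for this example.

```python
import math

# Candidate penalty functions g(n), in the column order of Table 4-2.
G_FUNCTIONS = {
    "[ln(n)]^0.1":   lambda n: math.log(n) ** 0.1,
    "n^0.1":         lambda n: n ** 0.1,
    "AIC":           lambda n: 2.0,
    "[n ln(n)]^0.1": lambda n: (n * math.log(n)) ** 0.1,
    "[ln(n)]^0.5":   lambda n: math.log(n) ** 0.5,
    "[ln(n)]^0.9":   lambda n: math.log(n) ** 0.9,
    "BIC":           lambda n: math.log(n),
    "n^0.5":         lambda n: n ** 0.5,
    "[n ln(n)]^0.5": lambda n: (n * math.log(n)) ** 0.5,
}

def g_row(n):
    """Values of every g(n) function for one sample size, rounded as in the table."""
    return [round(g(n), 2) for g in G_FUNCTIONS.values()]

for n in (100, 200, 300, 500, 1000):
    print(n, g_row(n))
```

The functions are listed from the loosest penalty on the left to the strictest on the right for the sample sizes in the table, which is the ordering that matters when the criteria are compared.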
Table 4-2 lists possible $g(n)$ functions and their values for various sample
sizes. Formula (4-8) shows that the second term, $p\,g(n)$, is a positive penalty function
whose value decreases as the number of regressors decreases. This term
balances the first term, $n\ln(\hat{\sigma}_p^2)$, which tends to increase as the number
of regressors decreases. Therefore, a large value of $g(n)$ selects a
model with fewer regressors. On the other hand, if the value of $g(n)$ is small,
the criterion tends to select a model with more regressors.
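The way the penalty g(n) shifts the selected model size can be illustrated with a short sketch of the adjusted R2, Mallows' Cp and B(p) criteria. The RSS values below are invented for the example, the function names are hypothetical, and the residual variance in B(p) is taken as RSS_p / n, which is one of the two estimates the text allows.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R2 as written above: 1 - (1 - R2_p) * n / (n - p)."""
    return 1.0 - (1.0 - r2) * n / (n - p)

def mallows_cp(rss_p, sigma2_hat, n, p):
    """Mallows' Cp: RSS_p / sigma2_hat + 2p - n."""
    return rss_p / sigma2_hat + 2 * p - n

def b_criterion(rss_p, n, p, g):
    """Formula (4-8): B(p) = n ln(sigma2_p) + p g(n), with sigma2_p = RSS_p / n (assumed)."""
    return n * math.log(rss_p / n) + p * g(n)

def select_p(rss_by_p, n, g):
    """Number of regressors that minimizes B(p) over the candidate models."""
    return min(rss_by_p, key=lambda p: b_criterion(rss_by_p[p], n, p, g))

# Hypothetical RSS of the best model found for each number of regressors p.
rss_by_p = {0: 600.0, 1: 450.0, 2: 330.0, 3: 300.0, 4: 297.0, 5: 295.5, 6: 294.5}
n = 300

p_aic = select_p(rss_by_p, n, lambda n: 2.0)      # loose penalty, g(n) = 2
p_bic = select_p(rss_by_p, n, math.log)           # stricter penalty, g(n) = ln(n)
print(p_aic, p_bic)
```

With these made-up numbers the small drop in RSS from three to four regressors is enough to satisfy the AIC penalty but not the BIC penalty, so the looser criterion keeps one more regressor.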
[Figure 4-1: six bar-chart panels, one for each combination of heritability (h2 = 0.8, 0.5, 0.2) and search method (Backward, Forward), all using BIC (log n). Horizontal axis ticks run from 1 to 40 (markers); vertical axis ticks run from 0 to 100 (frequency).]
Figure 4-1. Comparison of FW and BW methods for model selection for the Model-S. The bars indicate the frequency of selected markers on 1000 replications.
4-6 Simulation Studies of Criteria
The purpose of this simulation work is to study the effects of using different
criteria for model selection. The properties of model selection examined include the number of
QTLs selected, the probability of wrongly identified QTLs, and the power and accuracy of QTL
detection. We also try to obtain an experimental criterion for model selection that can
be used specifically for QTL mapping in populations from inbred line
crosses.
1. FW and BW Methods
Figure 4-1 shows the results of model selection for Model-S at different
heritabilities (0.8, 0.5, and 0.2). Both the FW and BW methods were used for model
searching, and the BIC criterion (see Formula (4-8) with $g(n) = \ln n$) was used as the
stopping rule of model selection. The vertical bars represent the frequency with which each marker
was selected in the 1000 replications.
Both the FW and BW methods perform quite well here. Of the two,
the BW method performs better in this case,
especially when the heritability is high. It is interesting that the error rate of wrongly
identified markers is usually lower for the BW method than for the FW method, particularly on the
chromosomes with multiple QTLs (chromosome 1 and chromosome 3). However, the FW
method performs very well on the chromosome with a single QTL (chromosome 2).
The BW method is used as the main search method for model selection in
the rest of the simulation studies.
2. Criteria and the Various Parameters
Table 4-3 shows the estimates of various parameters for Model-S under
various criteria of model selection. The parameters Alfa, Beta, and Distance are defined by
Formulas (4-2), (4-3), and (4-4) respectively, and Mean is the average number of markers
included in the final model (the real QTL number is 8). The
heritability clearly has a great influence on these parameters. For example, with the BIC criterion, the parameter
Mean is 8.86 when h2 = 0.8 and falls to 4.76 when h2 = 0.2,
while the average Distance between the real QTL positions and the identified positions
increases from 0.26 cM to 7.41 cM.
The parameters Alfa and Beta are very important for evaluating the fit
between the selected model and the real model. Alfa is related to the probability
of wrongly identified QTLs and Beta is related to the QTL detection power (1.0 - Beta).
Loose criteria such as AIC and $[\ln(n)]^{0.5}$ tend to pick a model with more
regressors, so the Alfa value is high and the Beta value is low (power is
high). If strict criteria such as BIC and $n^{0.5}$ are applied, the
opposite occurs. The primary goal in finding a good criterion is to
balance the Alfa and Beta values.
Table 4-3. Estimation of various parameters for the Model-S (1000 replications).

                              Criteria of Model Selection
Parameter      Heritability   AIC     B^0.5(2)  B^0.9(3)  BIC    n^0.5
Mean(1)        0.8            15.34   13.84     9.40      8.86   7.90
               0.5            14.23   12.82     8.45      7.69   4.24
               0.2            12.17   10.64     5.66      4.76   1.85
Alfa           0.8            0.92    0.73      0.18      0.11   0.00
               0.5            0.78    0.61      0.14      0.08   0.00
               0.2            0.60    0.45      0.08      0.04   0.00
Beta           0.8            0.00    0.00      0.00      0.00   0.01
               0.5            0.01    0.01      0.08      0.12   0.47
               0.2            0.08    0.12      0.37      0.44   0.77
Distance (cM)  0.8            0.13    0.15      0.25      0.26   0.29
               0.5            2.62    2.92      3.37      3.42   2.65
               0.2            7.94    8.47      8.14      7.41   4.78

(1) Average number of markers selected in the final model. (2) $g(n) = [\ln(n)]^{0.5}$. (3) $g(n) = [\ln(n)]^{0.9}$.
Tables 4-4 to 4-6 give the parameter estimates under different
models, using the BW search method and different criteria of model selection such
as AIC and BIC. The purpose is to show how the parameter estimates change with
factors such as chromosome number, marker density, and
sample size. The model in Table 4-4 is the same as Model-S except that 2 extra
chromosomes without QTLs have been added.
The results show that the more chromosomes there are in the genome,
the stricter the criterion needs to be: under the same criterion, the Alfa
value increases as the chromosome number increases. The same conclusion holds
for the model with denser markers (a smaller average distance between markers)
shown in Table 4-5, in which the average marker distance is halved to 4 cM
compared with Model-S. However, for the model with a larger sample size (500),
shown in Table 4-6, a looser criterion should be applied.
Table 4-4. Estimation of various parameters for the model with 5 chromosomes.

                              Criteria of Model Selection
Parameter      Heritability   AIC     B^0.5   B^0.9   BIC    n^0.5
Mean           0.8            23.81   20.81   11.30   9.88   7.92
               0.5            22.68   19.98   10.52   8.85   4.22
               0.2            20.64   17.70   7.45    5.90   1.79
Alfa           0.8            1.98    1.60    0.41    0.23   0.00
               0.5            1.84    1.51    0.40    0.23   0.00
               0.2            1.64    1.31    0.30    0.18   0.00
Beta           0.8            0.00    0.00    0.00    0.00   0.01
               0.5            0.00    0.01    0.08    0.12   0.47
               0.2            0.06    0.10    0.37    0.45   0.78
Distance (cM)  0.8            0.18    0.20    0.31    0.34   0.37
               0.5            2.80    3.07    3.72    3.70   2.71
               0.2            8.02    8.66    8.58    7.84   4.63
Table 4-5. Estimation of various parameters for the model with marker density of 4 cM.

                              Criteria of Model Selection
Parameter      Heritability   AIC     B^0.5   B^0.9   BIC     n^0.5
Mean           0.8            27.47   23.89   12.24   10.56   7.88
               0.5            25.76   22.35   10.99   9.22    4.29
               0.2            24.41   20.72   8.34    6.36    1.88
Alfa           0.8            2.43    1.99    0.53    0.32    0.00
               0.5            2.22    1.79    0.43    0.25    0.00
               0.2            2.06    1.60    0.28    0.14    0.00
Beta           0.8            0.00    0.00    0.00    0.00    0.02
               0.5            0.00    0.00    0.05    0.10    0.46
               0.2            0.01    0.01    0.24    0.35    0.77
Distance (cM)  0.8            0.55    0.61    0.81    0.85    0.90
               0.5            2.61    3.05    4.82    4.87    3.46
               0.2            5.37    6.68    10.62   10.05   5.39
Table 4-6. Estimation of various parameters for the model with sample size of 500.

                              Criteria of Model Selection
Parameter      Heritability   AIC     B^0.5   B^0.9   BIC    n^0.5
Mean           0.8            14.97   13.22   9.05    8.60   8.00
               0.5            14.41   12.7    8.71    8.24   4.87
               0.2            12.49   10.75   6.03    5.29   2.14
Alfa           0.8            0.87    0.65    0.13    0.07   0.00
               0.5            0.80    0.59    0.11    0.06   0.00
               0.2            0.61    0.43    0.06    0.03   0.00
Beta           0.8            0.00    0.00    0.00    0.00   0.00
               0.5            0.00    0.00    0.02    0.03   0.39
               0.2            0.05    0.09    0.31    0.37   0.73
Distance (cM)  0.8            0.01    0.02    0.04    0.04   0.05
               0.5            1.20    1.36    1.66    1.70   1.33
               0.2            6.06    6.48    6.42    6.02   4.20
3. Experimental Criteria
The simulations above make it clear that the criterion is very important in model
selection. Loose criteria increase the power of QTL detection but, at
the same time, also increase the probability of detecting false QTLs. On the other
hand, strict criteria control the probability of false QTLs but
reduce the detection power. The ideal criterion should be one that
keeps the overall probability of false QTLs reasonably low. One way to do this
is to hold the value of Alfa at 0.05, because Alfa is the overall rate of
false QTL detection for the whole genome (see Formula (4-2)). With this kind of
criterion, the detection power of QTL will be reasonably high and the average
distance between the selected markers and the real QTLs will be optimised.
The question now is how to find this ideal criterion. It is not easy, because the
criterion is affected by various factors such as heritability, chromosome number,
marker density, and sample size. Because a statistical derivation of the formula
for the ideal criterion is complicated and difficult, we instead conduct
large-scale simulations to find an experimental criterion of model selection
for QTL mapping in inbred line crosses.
From the study above, we believe that BIC is a good criterion of model selection
in our situation, but it is not very accurate. We suggest the following modification of
BIC to take into account factors such as heritability, chromosome number, marker
density, and sample size.
$$BM(p) = n\ln(\hat{\sigma}_p^2) + p\ln(n) \times M$$ (4-9)
where M is a non-negative modifier for the BIC criterion; a larger M value means a
stricter criterion. The M value is determined by various factors: heritability (H), marker
density (D), sample size (S), and chromosome number (C). All of these factors are known
except the heritability. However, an estimate of the heritability
for a data set is easily obtained by regressing the trait value on all markers (i.e. using $R^2$).
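As a small illustration of formula (4-9), the sketch below evaluates the modified criterion for a set of candidate models and shows how the modifier M changes the selected size. The RSS values are invented, the function names are hypothetical, and the residual variance is taken as RSS_p / n, which is an assumption of this example.

```python
import math

def modified_bic(rss_p, n, p, m=1.0):
    """Formula (4-9): BM(p) = n ln(sigma2_p) + p ln(n) * M, with sigma2_p = RSS_p / n (assumed)."""
    return n * math.log(rss_p / n) + p * math.log(n) * m

def select_p(rss_by_p, n, m):
    """Number of regressors that minimizes BM(p) for a given modifier M."""
    return min(rss_by_p, key=lambda p: modified_bic(rss_by_p[p], n, p, m))

# Hypothetical RSS of the best model found for each number of regressors p.
rss_by_p = {2: 330.0, 3: 300.0, 4: 294.0, 5: 293.0}
n = 300

p_loose = select_p(rss_by_p, n, m=0.6)     # M < 1 loosens BIC
p_strict = select_p(rss_by_p, n, m=1.5)    # M > 1 tightens BIC
print(p_loose, p_strict)
```

Setting M = 1 recovers the ordinary BIC, so the modifier can be read as a multiplier on the BIC penalty per regressor.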
Table 4-7 is an example of how to obtain the M value for one combination of
the factors. In this example, the heritability is 0.5, the marker density
(average marker distance) is 8 cM, the sample size is 300, the chromosome number is 3, and
the M value for Alfa = 0.05 is about 1.15 ((1.1+1.2)/2). The values of M under
different combinations of factors are summarized in Table 4-8 and Figure 4-2.
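The M value for a target Alfa can also be read off by linear interpolation between the two neighbouring modifiers, which reproduces the midpoint used above when the target lies halfway between the tabulated Alfa values. The sketch uses the Alfa row of Table 4-7; the function name is hypothetical.

```python
def m_for_target_alpha(m_values, alphas, target=0.05):
    """Linearly interpolate the M at which Alfa crosses the target value."""
    pairs = list(zip(m_values, alphas))
    for (m0, a0), (m1, a1) in zip(pairs, pairs[1:]):
        if a0 >= target >= a1 and a0 != a1:
            return m0 + (a0 - target) / (a0 - a1) * (m1 - m0)
    raise ValueError("target is outside the tabulated range")

# Modifier values and the Alfa row of Table 4-7 (Model-S, h2 = 0.5).
m_values = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
alfa = [0.32, 0.23, 0.16, 0.12, 0.08, 0.06, 0.04, 0.03, 0.02, 0.01]

m_star = m_for_target_alpha(m_values, alfa)
print(round(m_star, 2))
```

Here Alfa is 0.06 at M = 1.1 and 0.04 at M = 1.2, so the interpolated value coincides with the average of the two modifiers.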
Table 4-7. Parameter estimates for Model-S with h2 = 0.5 using modified BIC criteria.

Criteria   B0.6(1)  B0.7  B0.8  B0.9  BIC   B1.1  B1.2  B1.3  B1.4  B1.5
Mean       10.29    9.42  8.71  8.20  7.71  7.32  6.97  6.67  6.43  6.20
Distance   3.39     3.46  3.43  3.48  3.43  3.40  3.34  3.32  3.25  3.15
Alfa       0.32     0.23  0.16  0.12  0.08  0.06  0.04  0.03  0.02  0.01
Beta       0.03     0.05  0.07  0.10  0.12  0.15  0.17  0.19  0.22  0.24

(1) B0.6 means M = 0.6.
Using multiple regression, it is easy to find the relationship between M
and the various factors from the data in Table 4-8. We propose the following
multiple regression model:
$$M = \mu + \beta_1 H + \beta_2 D + \beta_3 S + \beta_4 C + \varepsilon$$ (4-10)
where H is heritability, D is marker density, S is sample size, and C is chromosome
number. The parameters (the $\beta$s) in Formula (4-10) can be estimated by multiple
regression analysis (SAS v6.12) of the data in Table 4-8. The
experimental formula for M is:
$$\hat{M} = 1.5 + 0.5H - 0.60D - 0.20S + 0.1C$$ (4-11)
Table 4-8. The value of M under various parameter settings for α = 0.05.

h2    Start(1)  D-16(2)  D-4    D-2    S-150(3)  S-500  S-1000  C-5(4)  C-9
0.1   0.90      0.60     1.20   1.40   1.10      0.80   0.80    1.35    1.70
0.2   0.97      0.70     1.25   1.55   1.15      0.87   0.87    1.40    1.70
0.3   1.00      0.75     1.35   1.60   1.20      0.95   0.90    1.40    1.75
0.4   1.10      0.77     1.45   1.60   1.25      1.00   0.95    1.45    1.75
0.5   1.15      0.80     1.50   1.80   1.35      1.05   0.95    1.50    1.80
0.6   1.20      0.95     1.60   1.80   1.45      1.10   1.00    1.50    1.80
0.7   1.20      0.90     1.60   1.90   1.50      1.10   1.00    1.45    1.80
0.8   1.25      0.95     1.60   1.95   1.50      1.10   1.00    1.45    1.80
0.9   1.25      0.95     1.60   1.90   1.50      1.10   1.00    1.45    1.75

(1) Model-S with marker density = 8 cM, samples = 300 and chromosomes = 3. (2) Marker density = 16 cM. (3) Sample size = 150. (4) Chromosome number = 5.
[Figure 4-2: three line-chart panels of the M value (vertical axis, 0.70 to 2.20) against heritability (horizontal axis, h0.1 to h0.9). The panel legends list marker density (D-16, D-8, D-4, D-2) and sample size (S-150, S-300, S-500, S-1000); the third panel is for chromosome number.]
Figure 4-2. The value of M under various parameter settings for α = 0.05. From top to bottom: marker density, samples, chromosomes. The solid line is Model-S.
Table 4-9 is an example of using this experimental criterion in two simulation
cases. In the first case, 9 QTLs are placed on 3 chromosomes; the numbers (positions)
of the QTLs are 4 (16, 40, 72, 104) on the first chromosome, 2 (8, 32) on the second
chromosome, and 3 (8, 40, 80) on the last chromosome. The heritability is 0.6, the
marker density is 4 cM, and the sample size is 200. In the second case, the heritability is 0.82,
the marker density is 10 cM, and the sample size is 185. There are 7 QTLs distributed over
5 chromosomes, with numbers (positions) 2 (30, 70), 3 (10, 40, 80), 0, 0, and 2
(20, 50). The experimental criterion (Formulas (4-9) and (4-11)) works well in these two
cases, holding the average Alfa value at about the 0.05 level. The QTL detection power is
considerably high. However, the sample variances are quite high, especially for the
parameter Alfa. This may be caused by the small sample size; increasing the sample
size reduces the sample variance of Alfa (data not shown).
Table 4-9. Estimation of parameters in model selection by using the experimental criterion.

Case    M(1)  QTLs  n(2)  SD(3)  α      SD     β     SD    χ    SD
First   1.66  9     6.9   1.7    0.048  0.076  0.28  0.12  6.1  4.2
Second  1.58  7     7.4   0.9    0.058  0.095  0.00  0.00  0.3  0.7

(1) See Formula (4-11). (2) Average number of identified QTLs. (3) The standard deviation.
5. Conclusions and Discussion
5.1 Conclusion
One of the goals of this dissertation is to explore and compare the major QTL
mapping methods through simulation study. Single marker analysis is the simplest
method for QTL mapping practice. The simulation study implies that, with the
single marker method, the markers nearest the QTLs have the highest significance
levels. However, neighbouring markers can also reach very high significance levels
when the QTL effect is large or when more than one QTL exists on a chromosome.
Therefore, it is a method of QTL “detection”, not location.
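The behaviour described above can be illustrated with a minimal sketch, which is not the simulation program used in this dissertation. It assumes a DH population with genotypes coded +1 and -1, the Haldane map function, one additive QTL, unit residual variance, and a pooled two-sample t statistic at each marker. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def simulate_dh_line(positions, rng):
    """Genotypes (+1 or -1) at ordered loci on one chromosome of a DH line."""
    genotypes = [rng.choice((-1, 1))]
    for prev, cur in zip(positions, positions[1:]):
        recombined = rng.random() < haldane_r(cur - prev)
        genotypes.append(-genotypes[-1] if recombined else genotypes[-1])
    return genotypes


def t_statistic(genotypes, phenotypes):
    """Pooled two-sample t statistic comparing the two marker classes."""
    a = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    b = [y for g, y in zip(genotypes, phenotypes) if g == -1]
    pooled = ((len(a) - 1) * statistics.variance(a)
              + (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1.0 / len(a) + 1.0 / len(b)))


def single_marker_scan(markers, qtl_pos, effect, n, seed=1):
    """Simulate n DH lines and return {marker position: t statistic}.

    Assumes qtl_pos does not coincide with any marker position.
    """
    rng = random.Random(seed)
    loci = sorted(list(markers) + [qtl_pos])
    q = loci.index(qtl_pos)
    lines = [simulate_dh_line(loci, rng) for _ in range(n)]
    # Phenotype = additive QTL effect + standard normal residual.
    y = [effect * line[q] + rng.gauss(0.0, 1.0) for line in lines]
    return {pos: t_statistic([line[i] for line in lines], y)
            for i, pos in enumerate(loci) if i != q}


if __name__ == "__main__":
    scan = single_marker_scan(range(0, 101, 10), 42.0, 2.0, 2000)
    for pos in sorted(scan):
        print(pos, round(scan[pos], 1))
```

With a large QTL effect, the markers on either side of the QTL show large t statistics as well, so the test points to a chromosome region rather than to a position.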
Unlike single marker analysis, the IM method uses two markers to construct a testing
interval in which to search for a QTL, and it can use the estimated genetic map to
locate the QTL and estimate its effect at the same time. The CIM method is an extension
of the IM method that includes extra markers in the model to control the background
genetic variation. The MCIM method is similar to the CIM method, except that it
treats the markers used for background control as having random effects and uses a
mixed linear model approach.
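The idea of testing a position inside a marker interval can be made concrete with a small sketch. It computes the conditional probability of the QTL genotype at a test position given the two flanking marker genotypes, which is the quantity interval mapping uses to weight the two genotype classes. The sketch assumes a DH or backcross population with genotypes coded +1 and -1, no crossover interference (Haldane map function), and equal prior frequencies of the two QTL genotypes. The function names are hypothetical.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def qtl_prob_given_flanking(d_left, d_right, m_left, m_right):
    """P(QTL genotype = +1 | flanking marker genotypes).

    d_left, d_right: distances (cM) from the test position to the left and
    right flanking markers; at least one distance must be positive.
    m_left, m_right: marker genotypes, +1 or -1.
    """
    r1, r2 = haldane_r(d_left), haldane_r(d_right)
    # P(marker genotype | Q = +1); without interference the two sides are independent.
    p_left = 1.0 - r1 if m_left == 1 else r1
    p_right = 1.0 - r2 if m_right == 1 else r2
    joint_plus = p_left * p_right
    # P(marker genotype | Q = -1) is the complement on each side.
    joint_minus = (1.0 - p_left) * (1.0 - p_right)
    return joint_plus / (joint_plus + joint_minus)


if __name__ == "__main__":
    # Test position 3 cM from the left marker in a 10 cM interval.
    for m_left, m_right in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        p = qtl_prob_given_flanking(3.0, 7.0, m_left, m_right)
        print(m_left, m_right, round(p, 4))
```

For non-recombinant marker classes the QTL genotype is almost certain; for recombinant classes the probability depends on where the test position lies in the interval, which is what allows the likelihood to vary with position.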
Under the simple additive QTL model and a DH population, our simulation results
indicate that the IM method performs quite poorly in both QTL detection power and
the probability of false QTL detection. The performance of the MCIM method is
reasonably good, while the CIM method has the lowest probability of false QTL
detection and the highest power of QTL detection in most cases. However, unlike the
CIM method, the MCIM method is little affected by changes in the number of markers
used for background control. As implied by the simulation studies, the CIM method
improves greatly with increasing sample size, while the MCIM method benefits from
increasing marker density.
It cannot be denied that assuming QTLs act additively is a great simplification of
reality. Many studies supply strong evidence for QTL by environment (QE)
interaction and for epistatic interactions between QTLs (Long et al. 1995).
For the IM and CIM methods, the simulation studies imply that unbiased estimates of
QTL main effects can be obtained by using the data from all environments together.
However, it is difficult to obtain estimates of the QE interaction effects, even by
mapping QTLs on the data for each environment separately. In this case,
some of the QTLs may not be located and will be missed. The MCIM method can
put all QTL main effects and QE interaction effects into the mixed linear
model and, as indicated by the simulation study, obtains unbiased estimates of the
main and QE effects.
The MCIM method can use the mixed linear model for mapping QTLs with marginal and
epistatic effects. The simulation study has indicated that the MCIM method can obtain
unbiased estimates of QTL marginal and epistatic effects. Although IM and CIM
can obtain unbiased estimates of marginal QTL effects when QTL epistatic effects
exist, the variance of the marginal effect estimates increases greatly. In addition,
the detection power of QTLs goes down and the probability of false QTL detection
goes up markedly, especially for the CIM method, as the simulation study indicated.
MIM is a multiple-interval method for QTL mapping. The MIM method can obtain QTL
information, including the number, positions, effects, and interactions of
significant QTLs, simultaneously, and it can be used to analyse the genetic
architecture of an experimental species. Two crucial tasks, the evaluation
procedure and the search strategy, need to be fulfilled before the MIM method is
workable. Kao and Zeng (1999) have finished the first task, which is the algorithm for
analysing the likelihood of the data given a genetic model (QTL number, positions
and epistasis of QTL). However, the second task, which is to search for and select a
genetic model in the parameter space, is very complicated, and the problem of
criteria for model selection is not yet completely solved.
Another goal of this dissertation is to check the performance of various criteria of
model selection under the QTL mapping framework through simulation studies.
We have also defined a set of parameters for describing the degree of fit between
the selected model and the true model, and proposed an experimental criterion of
model selection that can be used for QTL mapping work such as MIM. The
experimental criterion is a modification of BIC that adds relevant factors such as
heritability, marker density, sample size, and chromosome number. The experimental
criterion can control the type I and type II errors at a reasonable level and is more
precise than the well-known BIC criterion in the QTL mapping situation. The
experimental criterion of model selection works well in our simulation cases.
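For orientation, the standard BIC for a regression-type model with Gaussian errors can be sketched as below. This is not the experimental criterion of this dissertation, whose formulas are given in Chapter 4. The multiplier `penalty_scale` is a hypothetical stand-in that only shows how strengthening or weakening the penalty changes the selected number of QTLs, and the residual sums of squares in the example are invented for illustration.

```python
import math


def bic(rss, n, k, penalty_scale=1.0):
    """BIC (up to an additive constant) for a Gaussian linear model.

    rss: residual sum of squares; n: sample size; k: number of fitted effects.
    penalty_scale: hypothetical multiplier on the k*ln(n) penalty
    (1.0 gives the ordinary BIC).
    """
    return n * math.log(rss / n) + penalty_scale * k * math.log(n)


def select_model(candidates, n, penalty_scale=1.0):
    """Return the label of the candidate with the smallest criterion value.

    candidates: list of (label, rss, k) tuples.
    """
    best = min(candidates, key=lambda c: bic(c[1], n, c[2], penalty_scale))
    return best[0]


if __name__ == "__main__":
    # Illustrative numbers only: adding a third QTL barely reduces the RSS.
    models = [("1 QTL", 150.0, 1), ("2 QTLs", 120.0, 2), ("3 QTLs", 118.0, 3)]
    for scale in (0.0, 1.0, 12.0):
        print(scale, select_model(models, n=200, penalty_scale=scale))
```

A weak penalty favours larger models (more false QTLs), and a strong penalty favours smaller models (more missed QTLs), which is the trade-off between type I and type II errors that any such criterion has to balance.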
There are a number of very important issues that have been neglected in this thesis.
We conclude our discussion with brief statements on several of these issues.
5.2 Threshold and Criteria
Unless the QTLs are very close together, the estimation of QTL effects at known QTL
positions is unbiased for the single-interval methods such as IM, CIM and
MCIM, as indicated by the simulation studies. However, in real QTL mapping
practice, the QTL positions are unknown. Therefore, the priority of QTL mapping
work is to find evidence of a QTL according to the LR value. The threshold is a
predefined value: if the LR value exceeds the threshold, a QTL is declared,
and its position and effect can then be estimated easily. The threshold value is
clearly important, because a high threshold decreases the detection power for QTLs
and, at the same time, lowers the probability of false QTL detection. A low
threshold does the opposite. Therefore, a high threshold is
needed if the purpose of the QTL mapping experiment is to find the precise position
for QTL cloning, and a low threshold is appropriate if the purpose of
the experiment is to find as many QTLs as possible.
The appropriate threshold value for the IM method can be obtained theoretically
because the IM method is a simple one-QTL model. For a DH or backcross population, the
distribution of the maximum of LR over the whole marker interval lies between
$\chi^2_1$ and $\chi^2_2$, closer to $\chi^2_1$ for relatively small intervals
(< 10 cM), and the LOD threshold is between 2 and 3 for many organisms. However,
the threshold can be affected by many experimental factors such as sample size,
genome size, marker density, missing data, and segregation distortion. Therefore,
for a particular experiment, the theoretical threshold is sometimes inaccurate. One
way to obtain a threshold value specific to the experimental data set is to use a
permutation test. This approach to estimating a significance threshold is based upon
the simple observation of marker-phenotype association: randomly reassigning the
trait values among individuals destroys any real association, so the test statistics
from the permuted data describe the distribution expected when no QTL is present.
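The permutation approach can be sketched as follows. This is a simplified illustration, not the procedure implemented in QTL Cartographer: the test statistic is the absolute marker-trait correlation at each marker instead of the LR score, and the function names, the default number of permutations, and the example data are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; x and y must both be non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(marker_columns, phenotypes, alpha=0.05,
                          n_perm=1000, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum statistic.

    marker_columns: one genotype list per marker, aligned with phenotypes.
    Each permutation shuffles the trait values among individuals and records
    the largest statistic over all markers.
    """
    rng = random.Random(seed)
    y = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)
        maxima.append(max(abs_corr(m, y) for m in marker_columns))
    maxima.sort()
    k = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return maxima[k]


if __name__ == "__main__":
    # Five artificial markers on 64 individuals; trait values are arbitrary.
    markers = [[1 if (j >> i) & 1 else -1 for j in range(64)] for i in range(5)]
    trait = [float(j % 7) + 0.1 * j for j in range(64)]
    print(permutation_threshold(markers, trait, alpha=0.05, n_perm=200))
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide: it accounts for the number of tests and for the correlation among linked markers in the data set at hand.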
Unlike the simple situation in the IM method, which tests the hypothesis of no QTL
against that of just one QTL, the CIM and MCIM methods have to consider the
possible existence of several QTLs on one chromosome. The theoretical derivation of
the threshold is complicated by the multiple-testing problem. Moreover, the
theoretical basis of the permutation test for the CIM or MCIM method is
questionable because of the difficulty of permuting the markers selected for
control of background variance. Therefore, how to obtain a reasonable threshold
value for the CIM and MCIM methods in the multiple-QTL situation is still an open
question. However, our simulation results, especially the power of QTL detection
and the probability of false QTL detection, indicate that LOD = 2.5 is a reasonable
threshold value.
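Because thresholds in this chapter are quoted both as LR values and as LOD scores, a short conversion sketch may help. It uses the standard relation LR = 2 ln(10) × LOD and the closed-form upper tail probabilities of the chi-square distributions with 1 and 2 degrees of freedom. The resulting probabilities are pointwise (single-test) values, not genome-wide significance levels, and the function names are hypothetical.

```python
import math


def lod_to_lr(lod):
    """Convert a LOD score (log10 likelihood ratio) to the LR statistic."""
    return 2.0 * math.log(10.0) * lod


def lr_to_lod(lr):
    """Convert the LR statistic back to a LOD score."""
    return lr / (2.0 * math.log(10.0))


def chi2_sf_df1(x):
    """Upper tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper tail probability of chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


if __name__ == "__main__":
    for lod in (2.0, 2.5, 3.0):
        lr = lod_to_lr(lod)
        print(lod, round(lr, 2), chi2_sf_df1(lr), chi2_sf_df2(lr))
```

A LOD of 2.5 corresponds to an LR of about 11.5, and the two tail probabilities bracket the pointwise significance level at a single test position.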
In addition to the difficulty of choosing an appropriate threshold value, the
problem of criteria (stopping rules) for model selection is critical for QTL
mapping methods that use multiple intervals (multiple QTLs). The MCIM method uses
a two-dimensional search strategy for mapping QTLs with epistatic effects. One
problem is that there are too many possible pairwise intervals to consider, so
appropriate selection methods should be used to reduce the analysis time.
Another difficult question is how to decide which pairwise interval is the one we
are looking for; this is the problem of the criterion for model selection. The problems
of search strategy and criteria for model selection are even more complicated for the
MIM method, which uses multiple intervals at the same time. In this thesis, we use
simulation studies to deal with these problems and propose the
experimental criterion for model selection under the QTL mapping framework. This
criterion is useful, as indicated by the simulation studies.
5.3 Software Design
The QTL mapping methods such as MIM, MCIM, CIM, IM and even single marker
analysis are complicated in algorithm and time-consuming in computation.
Appropriate computer software is therefore important for analysing the data of a
QTL mapping experiment, and developing good software is one of the important
issues for QTL mapping work. Good computer software for QTL mapping should
meet several requirements. The ability to compute quickly and accurately is
the most important requirement for any kind of software. Besides that, because of the
complicated nature of QTL mapping in both experimental data structure and
computational algorithm, a user-friendly interface is also critical for this kind of
software. Another important consideration for software development is the
presentation of results. Sometimes it is not sufficient to present QTL mapping
results only as raw data or tables. Presenting the results graphically
(visualization) is very useful and can provide more information about the
experimental organism.
Nowadays, the most popular computer software for QTL mapping includes
Mapmaker/QTL (Lander et al. 1987), which is based on the IM method of Lander and
Botstein, and QTL Cartographer (Basten et al. 1994), which is based on Zeng’s CIM
method. This software works very well in the sense of computation. However,
its user interface is not very friendly, and it is sometimes difficult to handle or
use. Moreover, this software lacks functions to visualize the results of QTL
mapping work. The results obtained by QTL Cartographer can be visualized with the
general-purpose graphics software “gnuplot”, but the major drawback is that the
graphic functions of “gnuplot” are limited for this purpose because the software is
not designed particularly for QTL mapping.
This is the motivation for developing QTL mapping software with a user-friendly
interface and a powerful graphic presentation of the mapping results. The
software is called Windows QTL Cartographer, and its functions are based on QTL
Cartographer. It has many uses and has been posted on the Internet
(http://statgen.ncsu.edu/qtlcart/cartographer.html).
Nowadays, personal computers are becoming more popular and powerful. The 32-bit
Windows operating systems, such as Windows 98 and Windows NT, are the most
popular and efficient systems in the PC environment. Therefore, it is natural
to develop the new version of QTL Cartographer for the PC and the Windows operating
system. Several software development systems exist under the Windows
environment, including Java, Visual Basic, and Visual C++. Visual C++ is an
extension of the C/C++ language, which is the most powerful and popular
programming language on the software market, and it adds thousands more functions
designed for Windows programs. Because MFC encapsulates most Windows functions
and can produce the skeleton program automatically, we developed the software by
using MFC under the Visual C++ software development environment.
The functions of the software can be roughly divided into three main parts: source
data management, data analysis, and result output and visualization.
- Source Data Management
The user interface for the source data has a great influence on the use of the
software. With a user-friendly interface, users can organize and input the source
data easily and run the software very quickly.
For convenience in constructing the input file, a “new file” function has been
implemented. The function uses menus and dialog boxes to guide the user in quickly
building the input file, which is much easier than editing the raw data and the tokens.
The ‘Import’ function of the software has been designed for the convenience of the
user. By using this function, users can use the source files of QTL mapping
experiments in MAPMAKER/QTL or QTL Cartographer formats directly.
The software can use the ‘Open’ function to load the source data from an “mcd”
format file (an example is included with the software). Limited editing, such as
deleting and inserting a character, can be applied to the displayed file. If the file
has been modified, the ‘Verity’ function should be used again for error checking. The
‘Save’ function keeps the modification permanently, and the ‘Save to’ function
saves the file under a different file name. The source data can be viewed in an
organized way by using the “View SrcData” function on the menu.
Through this function, the marker genotype data for each chromosome
and the trait values are displayed in a clear layout.
- Mapping Functions
After the source data is loaded and verified, it is critical to have a good method or
algorithm to analyse the data and produce the correct result. ‘QTL Cartographer’ has
been written and used by many people for several years. Therefore, Windows QTL
Cartographer calls the relevant programs of ‘QTL Cartographer’ directly from inside
the software to perform the calculations for the various QTL mapping methods. In
addition, the user-friendly interfaces and dialog boxes make it easy to change the
parameter settings for the various QTL mapping methods, and therefore this
software is much easier to use.
The implemented functions include Statistical Summary for the raw data, Single
Marker Analysis, Interval Mapping Method, and Composite Mapping Method. The
dialog boxes for Interval Mapping Method and Composite Mapping Method are very
similar; the only difference is that there is no ‘Control’ button in the dialog box
for Interval Mapping Method. The ‘Result File’ button is used for indicating the result
filename. The ‘Walk Speed’ spin control sets the QTL searching step, that is,
the distance (in cM) between two testing points for calculating the LR score along the
whole genome. For the chromosome, it is possible to test all chromosomes or just
one of them, as set by the spin control. The same is true for trait selection, and
only after one trait has been selected can the threshold value for this trait be
entered or calculated through a permutation test.
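The role of the searching step can be illustrated with a small sketch that lists the test positions on one chromosome for a given step. It is not taken from Windows QTL Cartographer or QTL Cartographer; the function name and its arguments are hypothetical, and it assumes that scanning starts at position 0 cM.

```python
import math


def scan_positions(chrom_length_cm, walk_speed_cm):
    """Test positions (cM) from 0 to the chromosome end at a fixed step.

    chrom_length_cm: map length of the chromosome in cM.
    walk_speed_cm: distance in cM between two adjacent test positions.
    """
    if walk_speed_cm <= 0:
        raise ValueError("walk speed must be positive")
    # Small tolerance so that an exact multiple is not lost to rounding.
    steps = int(math.floor(chrom_length_cm / walk_speed_cm + 1e-9))
    return [i * walk_speed_cm for i in range(steps + 1)]


if __name__ == "__main__":
    print(scan_positions(10.0, 2.0))
    print(len(scan_positions(104.0, 1.0)))
```

A smaller step gives a smoother LR profile at the cost of proportionally more likelihood evaluations per chromosome.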
When all the parameters have been set, QTL mapping analysis can be started by
clicking the ‘OK’ button. During the process, several DOS windows may
pop up because the programs of the QTL Cartographer software are called directly.
The best way to deal with them is to minimize these windows by clicking the minimize
button in the title bar. The result file will be loaded automatically into the software,
and the corresponding graph will be shown in the graphic dialog box immediately.
- Result Visualization
The result file is a text file with the filename extension “qrt”, and it can
also be opened as a graphic view file by Windows QTL Cartographer.
Many functions can be used through a menu in the graphic view
window; these include File, Chrom, Traits, Effect, and Setting. By using the
‘File’ item, the user can open a new result file, copy the graphic image to the
clipboard, and exit the view window. The ‘Chrom’ and ‘Traits’ items can be used for
selecting one or several chromosomes and traits for display. The estimates of QTL
additive and dominance effects can be displayed by using the ‘Effect’ item. For
simulation studies, the QTL effects and positions can also be shown by opening a
‘qtl’ file. Through the ‘Setting’ item, the user can adjust various properties of the
graph, such as changing the color, setting the threshold value, showing marker
names, and drawing the coordinates for various positions.
References
[1] Abler, B. S. B., M. D. Edwards, C. W. Stuber 1991. Isoenzymatic identification
of quantitative trait loci in crosses of elite maize inbreds. Crop Science 31:
267-274.
[2] Andersson, L. et al. 1994. Genetic mapping of quantitative trait loci for growth
and fatness in pigs. Science 263: 1771-1774.
[3] Aldhous, P. 1994. Fast tracks to disease genes. Science 265: 2008-2010.
[4] Baes P., and P. Van Cutsem 1993. Electrophoretic analysis of eleven isozyme
systems and their possible use as biochemical markers in breeding of Chicory
cichorium-intybus L. Plant Breeding 110: 16-23.
[5] Basten, C. J., Z.-B. Zeng and B. S. Weir 1994. Zmap – A QTL cartographer.
Proceedings of the 5th World Congress on Genetics Applied to Livestock Production,
22: 65-66.
[6] Beckmann, J. S. and M. Soller 1983. Restriction fragment length polymorphisms
in genetic improvement: methodologies, mapping and cost. Theor. Appl. Genet.
67: 35-43
[7] Beckmann, J. S. and M. Soller 1986a. Restriction fragment length polymorphisms
and genetic improvement in agricultural species. Euphytica 35: 111-124.
[8] Beckmann, J. S. and M. Soller 1986b. Restriction fragment length
polymorphisms in plant genetic improvement. Oxford Surveys Plant Mol. Cell
Biol. 3: 196-250.
[9] Botstein, D., R. L. White, M. Skolnick, and R. W. Davis 1980. Construction of a
genetic linkage map in man using restriction fragment length polymorphisms.
Am. J. Hum. Genet. 32: 314-331.
[10] Bradshaw, H. D. and R. F. Stettler 1995. Molecular genetics of growth and
development in Populus. IV. Mapping QTLs with large effects on growth, form,
and phenology traits in a forest tree. Genetics 139: 963-973.
[11] Breiman, L. 1995. Better Subset Regression Using the Nonnegative Garrote.
Technometrics 37: 373-384.
[12] Burr, B. F., A. Burr, K. H. Thompson, M. C. Albertson and C. Stuber 1988.
Gene mapping with recombinant inbreds in maize. Genetics 118: 519-526.
[13] Buetow, K. H. and A. Chakravarti 1987a. Multipoint gene mapping using
seriation. I. General methods. American Journal of Human Genetics 41:
180-188.
[14] Buetow, K. H. and A. Chakravarti 1987b. Multipoint gene mapping using
seriation. II. Analysis of simulated and empirical data. American Journal of
Human Genetics 41: 189-201.
[15] Causse, M. A., T. M. Fulton, Y. G. Cho, S. N. Ahn, J. Chunwongse, et al. 1994.
Saturated molecular map of the rice genome based on an interspecific backcross
population. Genetics 138: 1251-1274.
[16] Chase, K., F. R. Adler, K. G. Lark 1997. Epistat: A Computer Program for
Identifying and Testing Interactions Between Pairs Of Quantitative Trait Loci.
Theor. Appl. Genet. 94: 724-730.
[17] Churchill, G. A., and R. W. Doerge 1994. Empirical threshold values for
quantitative trait mapping. Genetics 138: 963-971.
[18] Cockerham, C. C. 1954. An extension of the concept of partitioning hereditary
variance for analysis of covariances among relatives when epistasis is present.
Genetics 39: 859-882.
[19] Comstock, R. E. 1978. Quantitative genetics in maize breeding. pp 191-206. In
maize breeding and genetics, New York.
[20] Delourme, R., and F. Eber 1992. Linkage between an isozyme marker and a
restorer gene in radish cytoplasmic male sterility of rapeseed (Brassica napus L.).
Theor. Appl. Genet. 85: 222-228.
[21] Doerge, R. W. 1993. Statistical methods for locating quantitative trait loci with
molecular markers. Ph. D. dissertation, Dept. Statistics, NCSU, Raleigh.
[22] Doerge, R. W. and G. A. Churchill 1995. Permutation tests for multiple loci
affecting a quantitative character. Genetics 142: 285-294.
[23] Donis-Keller, H. et al. 1987. A genetic linkage map of the human genome. Cell
51: 319-337.
[24] East, E. M. 1916. Studies on size inheritance in Nicotiana. Genetics 1: 164-176
[25] Eberhart, S. A., R. H. Moll, H. F. Robinson and C. C. Cockerham. 1966.
Epistatic and other genetic variances in two varieties of Maize. Crop Science 6:
275-280.
[26] Edwards, M. D., C. W. Stuber, J. F. Wendel 1987. Molecular-marker-facilitated
investigations of quantitative trait loci in maize. I. Numbers, genomic
distribution and types of gene action. Genetics 116: 113-125.
[27] Falconer, D. S. 1996. Introduction to quantitative genetics. Ed. 4. Longman.
New York.
[28] Frank, I., Friedman 1993. A Statistical View of Some Chemometrics
Regression Tools. Technometrics 35: 109-135.
[29] Frankel, W. N. 1995. Taking stock of complex trait genetics in mice. Trends
Genet. 11: 471-477.
[30] Gai J.-Y., J.-K. Wang 1998. Identification and estimation of a QTL model and
its effects. Theor Appl Genet 97: 1162-1168.
[31] Goffinet B., B. Mangin 1998. Comparing methods to detect more than one QTL
on a chromosome. Theor Appl Genet 96: 628-633.
[32] Haldane, J. B. S. 1919. The combination of linkage values and the calculation of
distance between the loci of linked factors. Journal of Genetics 8: 299-309.
[33] Haley, C. S., S. A. Knott 1992. A Simple Regression Method for Mapping
Quantitative Trait Loci in Line Crosses Using Flanking Markers. Heredity 69:
315-324.
[34] Hallden, C., A. Hjerdin, I. M. Rading, T. Sall, and B. Fridlundh, et al. 1996. A
high density RFLP linkage map of sugar beet. Genome 39: 634-645.
[35] Halward, T., H. T. Stalker and G. Kocher 1993. Development of an RFLP
linkage map in diploid peanut species. Theor. Appl. Genet. 87: 379-384.
[36] Hamalainen, J. H., K. N. Watanabe, J. P. T. Valkonen, A. Arihara, R. L. Plaisted,
et al. 1997. Mapping and marker assisted selection for a gene for extreme
resistance to potato virus Y. Theor. Appl. Genet. 94: 192-197.
[37] Hartley, H. O. and J. N. K. Rao 1967. Maximum-likelihood estimation for the
mixed analysis of variance model. Biometrika, 54: 93-108.
[38] Hoeschele, I., P. Uimari, F. E. Grignola, Q. Zhang and K. M. Gage 1997.
Advances in statistical methods to map quantitative trait loci in outbred
populations. Genetics 147: 1445-1457.
[39] Jansen, R. C. 1992. A general mixture model for mapping quantitative trait loci
by using molecular markers. Theor. Appl. Genet. 85: 252-260.
[40] Jansen, R. C. 1993. Interval mapping of multiple quantitative trait loci. Genetics
135: 205-211.
[41] Jansen, R. C. 1994. Controlling the Type I and Type II Errors in Mapping
Quantitative Trait Loci. Genetics 138: 871-881.
[42] Jansen R. C., D. L. Johnson., J. A. M. Van Arendonk 1998. A Mixture Model
Approach to the Mapping of Quantitative Trait Loci in Complex Populations
With an Application to Multiple Cattle Families. Genetics 148: 391-399.
[43] Jiang, C.-J., Z.-B. Zeng, 1995. Multiple Trait Analysis of Genetic Mapping for
Quantitative Trait Loci. Genetics 140: 1111-1127.
[44] Jiang, C.-J., Z.-B. Zeng, 1997. Mapping Quantitative Trait Loci With
Dominant And Missing Markers In Various Crosses From Two Inbred Lines.
Genetica 101: 47-58.
[45] Johannsen, W. 1909. Elemente der exakten Erblichkeitslehre. Fischer, Jena.
[46] Josee D. and D. Siegmund 1999. Statistical Method for Mapping Quantitative
Trait Loci From a Dense Set of Markers. Genetics 151: 373-386.
[47] Kao, C.-H., Z.-B. Zeng 1997. General Formulas For Obtaining The MLEs
And The Asymptotic Variance-Covariance Matrix In Mapping Quantitative
Trait Loci When Using The EM Algorithm. Biometrics 53: 653-665.
[48] Kao, C.-H., Z.-B. Zeng, and R. Teasdale 1999. Multiple interval mapping for
quantitative trait loci. Genetics 152: 1203-1216.
[49] Kindiger B., and R. A. Vierling 1994. Comparative isozyme polymorphisms of
North American eastern gamagrass, Tripsacum dactyloides var. dactyloides and
maize, Zea mays L. Genetica 94: 77-83.
[50] Knott, S. A., C. S. Haley, 1992. Aspects of maximum likelihood methods for the
mapping of quantitative trait loci in line crosses. Genet. Res. 60: 139-151.
[51] Kundu, D., G. Murali, 1996. Model Selection in Linear Regression.
Computational Statistics & Data Analysis 22: 461-469.
[52] Lagercrantz, U., and D. J. Lydiate 1996. Comparative genome mapping in
Brassica. Genetics 144: 1903-1910.
[53] Lander, E. S., P. Green, I. Abrahamson, A. Barlow, M. J. Daly, et al. 1987.
Mapmaker: an interactive computer package for constructing primary genetic
linkage maps of experimental populations. Genomics 1: 182-195.
[54] Lander, E. S., D. Botstein. 1989. Mapping Mendelian factors underlying
quantitative traits using RFLP linkage maps. Genetics 121: 185-199.
[55] Lander, E. S. 1993. Finding similarities and differences among genomes. Nature
Genetics 4: 5-6.
[56] Lee, G. H., L. M. Bennett, R. A. Carabeo, and N. R. Drinkwater 1995.
Identification of Hepatocarcinogen-resistance genes in DBA/2 mice. Genetics
139: 387-395.
[57] Lee, M. 1995. DNA markers and plant breeding programs. Adv. Agron. 55:
265-344
[58] Li, Z. K., S. R. M. Pinson, M. A. Marchetti, J. W. Stansel and W. D. Park 1995.
Characterization of quantitative trait loci contributing to field resistance to
sheath blight (Rhizoctonia solani) in rice. Theor. Appl. Genet. 91: 382-388.
[59] Liu, B. H. and S. J. Knapp 1992. QTLSTAT 1.0, a software for mapping
complex trait using nonlinear models. Oregon state university.
[60] Liu, B. H. and S. J. Knapp 1997. Computational tools for study of complex traits,
pp. 43-79 in Molecular dissection of complex traits. edited by A. H. Paterson,
CRC Press LLC, Boca Raton, Florida.
[61] Long, A. D., S. L. Mullaney, L. A. Reid, J. D. Fry, C. H. Langley and T. F. C.
Mackay 1995. High resolution mapping of genetic factors affecting abdominal
bristle number in Drosophila melanogaster. Genetics 139: 1273-1291.
[62] Lu, Y. Y. and B. H. Liu 1995. PGRI, a software for plant genome research.
Plant Genome III conference (Abstract): 105, San Diego, CA.
[63] Lynch, M., B. Walsh, 1998. Genetics and Analysis of Quantitative Traits.
Massachusetts: Sinauer Associates, Inc.
[64] Mallows, C. L. 1995. More Comments on Cp. Technometrics, 37: 362-372.
[65] Mangin, B., B. Goffinet and A. Rebai 1994. Constructing Confidence Intervals
for QTL Location. Genetics 138: 1301-1308.
[66] Manly, K. F. and E. H. Cudmore, Jr. 1996. New version of MAP manager
genetic mapping software. Plant Genome IV (Abstract): 105.
[67] Martinez, O., R. N. Curnow 1992. Estimating the Locations and the Sizes of
the Effects of Quantitative Trait Loci Using Flanking Markers. Theor. Appl.
Genet. 85: 480-488.
[68] Nienhuis, J., T. Helentjaris, M. Slocum, B. Ruggero and A. Schaefer 1987.
Restriction fragment length polymorphism analysis of loci associated with insect
resistance in tomato. Crop Science 27: 791-803.
[69] Nilsson-Ehle, H. 1909. Kreuzungsuntersuchungen an Hafer und Weizen. Lund.
[70] Paterson, A. H., S. Damon, J. D. Hewitt, D. Zamir, H. D. Rabinowitch, E. S.
Lander and S. D. Tanksley 1991. Mendelian factors underlying quantitative
traits in tomato: comparison across species, generations, and environments.
Genetics 127: 181-197.
[71] Plomin, R., G. E. McClearn and G. Gora-Maslak 1991. Quantitative trait loci
and psychopharmacology. Journal of Psychopharmacology 5: 1-9.
[72] Ragot, M., and D. A. Hoisington 1993. Molecular markers for plant breeding:
comparisons of RFLP and RAPD genotyping costs. Theor. Appl. Genet. 86:
957-984.
[73] Rao, C. R. 1971. Estimation of variance and covariance components MINQUE
theory. Journal of multivariate analysis 1: 257-275.
[74] Rao, P. S. R. S. 1997. Variance components estimation: mixed models,
methodologies and applications (Monographs on statistics and applied
probability 78). Chapman and Hall, London.
[75] Rasmusson, J. M. 1933. A contribution to the theory of quantitative character
inheritance. Hereditas 18: 245-261.
[76] Rebai, A., B. Goffinet and B. Mangin 1994. Approximate thresholds of interval
mapping test for QTL detection. Genetics 138: 235-240.
[77] Rebai, A., B. Goffinet and B. Mangin 1995. Comparing power of different
methods for QTL detection. Biometrics 51: 87-99.
[78] Reiter, R. S., J. G. Cors, M. R. Sussman and W. H. Gabelman 1991. Genetic
analysis of tolerance to low-phosphorus stress in maize using restriction
fragment length polymorphisms. Theor. Appl. Genet. 82: 561-568.
[79] Reiter, R. S., J. G. K. Williams, K. A. Feldmann, J. A. Rafalski, S. V. Tingey and
P. A. Scolnik 1992. Global and local genome mapping in Arabidopsis thaliana
by using recombinant inbred lines and random amplified polymorphic DNAs.
Proc. Natl. Acad. Sci. USA 89: 1477-1481.
[80] Sax, K. 1923. The association of size differences with seed-coat pattern and
pigmentation in Phaseolus vulgaris. Genetics 8: 552-560.
[81] Searle S. R. 1970. Large sample variances of maximum likelihood estimators of
variance components using unbalanced data. Biometrics 26: 505-524.
[82] Shao, J. 1996. Bootstrap Model Selection. Journal of the American Statistical
Association 91: 655-665.
[83] Sillanpaa M. J., E. Arjas 1998. Bayesian Mapping of Multiple Quantitative Trait
Loci From Incomplete Inbred Line Cross Data. Genetics 148: 1373-1388.
[84] Simon, C. J., and F. J. Muehlbauer 1997. Construction of a chickpea linkage
map and its comparison with maps of a pea and lentil. Journal of Heredity 88:
115-119.
[85] Soller, M., and J. S. Beckmann 1988. Genomic genetics and the utilization for
breeding purposes of genetic variation between populations. The second
international conference on quantitative genetics, pp 161-188. Sinauer Assoc.,
Sunderland, MA.
[86] Stuber, C. W., M. D. Edwards, J. F. Wendel 1987. Molecular marker facilitated
investigations of quantitative trait loci in maize II. Factors influencing yield and
its component traits. Crop Science 27: 639-648.
[87] Stuber, C. W., S. E. Lincoln, D. W. Wolff, T. Helentjaris, and E. S. Lander 1992.
Identification of genetic factors contributing to heterosis in a hybrid from two
elite maize inbred lines using molecular marker. Genetics 132: 823-839.
[88] Tanksley, S. D., and C. M. Rick 1980. Isozymic gene linkage map of the tomato:
Applications in genetic and breeding. Theor. Appl. Genet. 57: 161-170.
[89] Thoday, J. M. 1961. Location of polygenes. Nature 191: 368-379.
[90] Thompson, E. A. 1984. Information gain in joint linkage analysis. IMA Journal
of Mathematical Applied Medical Biology 1: 31-49.
[91] Thompson, E. A., T. R. Meagher 1998. Genetic linkage in the estimation of
pairwise relationship. Theor Appl Genet 97: 857-864.
[92] Tibshirani, R. 1996. Regression Shrinkage and Selection via the Lasso. Journal
of the Royal Statistical Society Series B, 58: 267-288.
[93] Van Ooijen, J. W. and C. Maliepaard 1996. MAPQTL version 3.0: Software for
the calculation of QTL position on genetic map. Plant Genome IV (Abstract):
105.
[94] Viruel, M. A., R. Messeguer, M. C. De Vicente, J. Garcia Mas, P.
Puigdomenech, et al. 1995. A linkage map with RFLP and isozyme markers for
almond. Theor. Appl. Genet. 91: 964-971.
[95] Visscher P. M., R. Thompson and C. S. Haley 1996. Confidence intervals in
QTL mapping by bootstrapping. Genetics 143: 1013-1020.
[96] Wang D., J. Zhu, Z. Li, A. H. Paterson (1999). Mapping QTLs with epistatic
effects and QTL × environment interactions by mixed linear model approaches.
Theor. Appl. Genet., 99:1255-1264.
[97] Wang S. and X. Yan 1998. Simulation data generation method and software
design for mixed linear model. Journal of Zhejiang Agricultural University
24(2):135-140
[98] Wang, S., C. Basten, and Z.-B. Zeng 1999. Windows QTL Cartographer.
Department of Statistics, North Carolina State University, Raleigh, NC.
(http://statgen.ncsu.edu/qtlcart/WQTLCart.htm).
[99] Weeks, D. and K. Lange 1987. Preliminary ranking procedures for multilocus
ordering. Genomics 1: 236-242.
[100] Weller, J. I. 1986. Maximum likelihood techniques for the mapping and
analysis of quantitative trait loci with the aid of genetic markers. Biometrics 42:
627-640.
[101] Williams, J. G. K. et al. 1990. DNA polymorphisms amplified by arbitrary
primers are useful as genetic markers. Nucl. Acids Res. 18: 6531-6535.
[102] Wu, W.-R. and W. M. Li 1995. Model fitting and model testing in the method
of joint mapping of quantitative trait loci. Theor. Appl. Genet. 92: 477-482.
[103] Xu, G. W., C. W. Magill, K. F. Schertz, and G. E. Hart 1994. A RFLP linkage
map of Sorghum bicolor (L.) Moench. Theor. Appl. Genet. 89: 139-145.
[104] Xu, Y. B. and L. H. Zhu 1994. Molecular quantitative genetics. China
Agriculture Press, Beijing.
[105] Zeng, Z.-B. 1992. Correcting the bias of Wright's estimates of the
number of genes affecting a quantitative character: A further improved
method. Genetics 131: 987-1001.
[106] Zeng, Z.-B. 1993. Theoretical basis for separation of multiple linked
gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA
90: 10972-10976.
[107] Zeng, Z.-B. 1994. Precision mapping of quantitative trait loci. Genetics
136: 1457-1468.
[108] Zhu, J. and B. S. Weir 1996. Diallel analysis for sex-linked and maternal
effects. Theor. Appl. Genet. 92: 1-9.
[109] Zhu, J. 1998. Mixed model approaches for mapping complex quantitative trait
loci, pp. 11-20, in Proceedings of the China National Conference on Plant
Breeding, edited by L. Z. Wang and J. R. Dai, Agricultural Science and
Technology Press of China, Beijing, China.
[110] Zhu, J. and B. S. Weir 1998. Mixed model approaches for genetic analysis of
quantitative traits, pp. 321-330, in Proceedings of the International Conference
on Mathematical Biology, edited by L. S. Chen, S. G. Ruan, and J. Zhu, World
Scientific Publishing Co., Singapore.
[111] Zhu, J. 1999. Mixed model approaches of mapping genes for complex
quantitative traits. Journal of Zhejiang University (Natural Science) 33(3):
327-335.
[112] Zhu, J. 1999. Principle of Linear Model Analysis. Scientific Publishing House,
Beijing, China.