
1-1

MultiQTL

Comprehensive interactive package for genetic mapping of quantitative traits, allowing for multiple-trait, multiple-environment, multiple-family, and multiple-interval QTL analysis

Institute of Evolution

Haifa University, Haifa, Israel

Tel: 972-4-8240449, Fax: 972-4-8246554

http://www.multiqtl.com


1-10

Table of Contents

Part 1: Loading and Editing Real Data
  Data input format
  Starting to load the data
  Project creation stage
  Optional addition and removal of data files
  Optional data revision

Part 2: Data Simulation (Step 1 - Step 6)

Part 3: Model Creation and Calculation stages
  Model Creation stage
  General model options
  About model
  Calculation panel description
  Save and Open Project options

Part 4: Single-QTL model, one and two traits
  Displaying the results
  Main window for analysis and LOD graph
  Models with “extended” parameters
  Selective genotyping model

Part 5: Two-linked QTL model
  Submodel and Estimate options
  Compare option
  Bootstrap and Distribution options

Part 6: Multiple-trait model
  Multiple trait example for single QTL model
  Multiple trait example for two-linked QTL model


1-11

Table of Contents (continued)

Part 7: Multiple chromosome analysis
  Creation of multi-chromosome set
  Fitting multilocus interval model
  Multiple Simulation options

Part 8: Multiple environments file
  Results for model with/without trait normalization
  Submodel option
  Submodel compare option

Part 9: Multiple families
  Computation results
  Submodel compare option
  Significance options
  Multichromosome set
  Format transition between “multiple-environment” and “multiple-trait” formats

Part 10: Selective genotyping
  Simulated data
  Computation results
  Multichromosome set

Part 11: Summarizing the results
  Total Significance option
  Creation of the result table with specified threshold significance
  Computation of the total significance by Benjamini-Hochberg (1995) method
  Creation of the report table
  Creation panel with LOD score graphs


Our algorithms are based on up-to-date theoretical papers from across the QTL mapping field, although some of them are unique to our group.

1-12

References

Korol A., Preygel I., Preygel S. 1994, Recombination Variability and Evolution. Chapman & Hall, London.
Korol A., Ronin Y., Kirzhner V. 1995, Interval mapping of quantitative trait loci employing correlated trait complexes. Genetics 140: 1137-1147.
Ronin Y., Kirzhner V., Korol A. 1995, Linkage between loci of quantitative traits and marker loci: multi-trait analysis with a single marker. Theor. Appl. Genet. 90: 776-786.
Ronin Y., Korol A., Fahima T., Kirzhner V., Nevo E. 1996, Censored estimation of linkage between PCR-generated markers and a target gene based on stepwise bulked analysis. Biometrics 52: 1428-1439.
Korol A., Ronin Y., Kirzhner V. 1996a, Linkage between loci of quantitative traits and marker loci. Resolution power of three statistical approaches in single marker analysis. Biometrics 52: 426-441.
Korol A., Ronin Y., Tadmor Y., Bar-Zur A., Kirzhner V., Nevo E. 1996b, Estimating variance effect of QTL: an important prospect to increase the resolution power of interval mapping. Genet. Res. 67: 187-194.
Weller J., Song J., Ronin Y., Korol A. 1997, Experimental designs and solutions to multiple trait comparisons. Animal Biotechnology 8: 107-122.
Korol A., Ronin Y., Hayes P., Nevo E. 1998a, Multi-interval mapping of correlated trait complexes: simulation analysis and evidence from barley. Heredity 80: 273-284.
Korol A., Ronin Y., Nevo E. 1998b, Approximated analysis of QTL-environment interaction with no limits on the number of environments. Genetics 148: 2015-2028.
Ronin Y., Korol A., Weller J. 1998, Selective genotyping to detect quantitative trait loci affecting multiple traits. Theor. Appl. Genet. 97: 1169-1178.
Ronin Y., Korol A., Nevo E. 1999, Single- and multiple-trait analysis of linked QTLs: some asymptotic analytical approximation. Genetics 151: 387-396.
Peng J., Korol A., Fahima T., Röder M., Ronin Y., Li Y-C., Nevo E. 2000, Molecular genetic maps in wild emmer wheat, Triticum dicoccoides: genome-wide coverage, massive negative interference, and putative quasi-linkage. Genome Res. 10: 1509-1531.
Korol A., Ronin Y., Itzcovich A., Peng J., Nevo E. 2001, Enhanced efficiency of QTL mapping analysis based on multivariate complexes of quantitative traits. Genetics 157: 1789-1803.


1-13

* References (continued)

Peng J., Ronin Y., Fahima T., Röder M., Li Y., Nevo E., Korol A. 2003, Domestication quantitative trait loci in Triticum dicoccoides, the progenitor of wheat. Proc. Natl. Acad. Sci. USA 100: 2489-2495.
Mester D.I., Ronin Y.I., Hu Y., Peng J., Nevo E., Korol A.B. 2003, Efficient multipoint mapping: making use of dominant repulsion-phase markers. Theor. Appl. Genet. 107: 1102-1112.
Mester D.I., Ronin Y.I., Minkov D., Nevo E., Korol A.B. 2003, Constructing large-scale genetic maps using an evolutionary strategy algorithm. Genetics 165: 2269-2282.
Hanotte O., Ronin Y., Agaba M., Nilsson P., Gelhaus A., Horstmann R., Sugimoto Y., Kemp S., Gibson J., Korol A., Soller M., Teale A. 2003, Mapping of quantitative trait loci (QTL) controlling resistance to trypanosomosis in an experimental cross of trypanotolerant West African N’Dama cattle (Bos taurus) and trypanosusceptible East African Boran cattle (Bos indicus). Proc. Natl. Acad. Sci. USA 100: 7443-7448.
Ronin Y., Korol A., Shtemberg M., Nevo E., Soller M. 2003, High resolution mapping of quantitative trait loci by selective recombinant genotyping. Genetics 164: 1657-1666.
Mester D., Korol A., Nevo E. 2004, Fast and high precision algorithms for optimization in large scale genomic problems. Computational Biology and Chemistry 28: 281-290.
Yagil C., Sapojnikov M., Wechsler A., Korol A.B., Yagil Y. 2006, Genetic dissection of proteinuria in the Sabra rat. Physiological Genomics 25: 121-133.
Atzmon G., Ronin Y., Korol A., Yonash N., Cheng H., Hillel J. 2006, QTLs associated with growth traits and abdominal fat weight and their interactions with gender and hatch in commercial meat-type chickens. Animal Genetics 37: 352-358.
Korol A., Frenkel Z., Cohen L., Lipkin E., Soller M. 2007, Fractioned DNA pooling: a new cost effective strategy for fine mapping of quantitative trait loci. Genetics 176: 2611-2623.
Korol A., Mester D., Frenkel Z., Ronin Y.I. 2009, Methods for genetic analysis in the Triticeae. Chapter 6 in: C. Feuillet and G. J. Muehlbauer (eds), Genetics and Genomics of the Triticeae. Springer, pp. 163-199.
Korol A., Frenkel Z., Orion O., Ronin Y. 2012, Some ways to improve QTL mapping accuracy. Animal Genetics 43 (Suppl. 1): 36-44.

* References to other papers cited in any of the Parts 1-12 are provided at the end of the document (i.e., at the end of Part 12)


1-14

Part 1: Loading and Editing Real Data

Table of Contents

Format input of real data  16
  Chromosome File Format  17
  Trait File format  20
Starting to load the data
  Definition and explanations  21
  New Input  22
    Regular data  23
    Environments data  25
    Family data  26
    Selective Genotype data  30
    Error correction  31
    General data formats (*.txt, *.dbf, *.xls, *.raw)  35
  Old Input
    Step 1 Enter a name for the problem  38
    Step 2 Select the type of population  38
    Step 3 Enter the number of environments and genotypes  39
    Step 4 Enter the symbols of the markers  40
    Step 5 Select the files of data  41

1-15

Table of contents (continued)

Project Creation Stage  44
  Errors in chromosome values  45
  Errors in trait values  46
  Errors in chromosome: recombination rate >0.5  47
New project creation  49
Graphical display of information file  50
Option of changing marker order in the chromosome  53
Option of adding data files  54
Option of adding marker files  56
Option of deleting data files from Problem folder  58
Extracting data and file information from Project  59
Data revision option  61
  Chromosome revision  62
  Trait revision  68


1-16

For detailed descriptions of the MultiQTL formats see the following pages.

Format input of real data

To analyze real QTL data, the data files must be prepared as plain text (ANSI) only, with the extensions indicated below. Word or Excel files must be converted to this format.

The current version of MultiQTL uses four main types of real data files: chromosome files with extension *.chr (one file per chromosome), trait files with extension *.trt (one file per trait) or *.tra (one file for all traits), and marker files with extension *.mrk (for separate marker loci added later, after the main body of data has been entered). No blanks (spaces) are allowed in file names. If you prepare these files with Windows Notepad, take care to save them with the proper extension: when saving, choose “all files” from the “save as type” options.
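As an illustration of the naming rules above, the following Python sketch classifies a data file by its extension and rejects names that contain blanks. The function name and the case-insensitive comparison of extensions are assumptions made for this example; they are not part of MultiQTL.

```python
from pathlib import Path

# File types named in the text, keyed by extension.
DATA_FILE_TYPES = {
    ".chr": "chromosome",
    ".trt": "single trait",
    ".tra": "all traits",
    ".mrk": "additional markers",
}


def classify_data_file(file_name):
    """Return the data type implied by the extension of file_name.

    Hypothetical helper: raises ValueError for names that contain blanks
    or that do not end in one of the four extensions.
    """
    if any(ch.isspace() for ch in file_name):
        raise ValueError(f"blank in file name: {file_name!r}")
    # Assumption: extensions are compared case-insensitively.
    extension = Path(file_name).suffix.lower()
    if extension not in DATA_FILE_TYPES:
        raise ValueError(f"unexpected extension in {file_name!r}")
    return DATA_FILE_TYPES[extension]
```

With this check, a file that Notepad silently saved as Chrom1.chr.txt is rejected, because its real extension is .txt.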


To conduct QTL analysis with MultiQTL software, you should first prepare linkage maps (ordering of markers along chromosomes). This can be done using various packages, such as MapMaker (Lander et al. 1987) or JoinMap (Stam 1993). A new package, MultiPoint, for efficient multipoint mapping based on Evolutionary Strategy algorithms (Mester et al. 2003a,b, 2004) is being developed by our team (to be released in 2005).

The format described below is mandatory for the Old Input (v. 2.4 or earlier). The New Input of v. 2.5 is much easier to use and accepts additional formats (described later under New Input).


1-17

The Chromosome File Format

Each chromosome file should be arranged as a set of rows. Each row starts with a unique marker name followed by a series of symbols representing individual scores of this marker for all genotypes.

The chromosome data file has extension *.chr

The number of genotypes in each row must be the same. Spaces between the genotype symbols are optional, but the first symbol must be separated from the marker name by a space. We assume that the order of markers on each chromosome is known (e.g., from a previous marker ordering with MapMaker, JoinMap, or other software), so the rows in the chromosome data file should follow this order. If the marker rows are not ordered, you can input your data file with an arbitrary marker order and provide an additional *.ord file that defines the order of markers in the chromosome. For more details, see page 53 of this part.
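The row layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the MultiQTL reader: the function name is hypothetical, each marker is assumed to occupy exactly one line, and each genotype symbol is assumed to be a single character.

```python
def parse_chr_rows(text):
    """Parse the text of a *.chr file into {marker_name: [symbols]}.

    Assumptions for this sketch: one marker per line, and every genotype
    symbol is a single character, so spaces between symbols are optional.
    """
    markers = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        parts = line.split(None, 1)
        if len(parts) < 2:
            raise ValueError(f"no genotype scores after marker name: {line!r}")
        name, scores = parts
        if name in markers:
            raise ValueError(f"marker name is not unique: {name}")
        # Drop optional whitespace; each remaining character is one score.
        markers[name] = [ch for ch in scores if not ch.isspace()]

    # Every row must hold the same number of genotypes.
    counts = {name: len(symbols) for name, symbols in markers.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"unequal numbers of genotypes: {counts}")
    return markers
```

The returned dictionary keeps the rows in file order, which matches the convention that row order is marker order.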

As mentioned above, we are going to provide you with a new, highly efficient and user-friendly, package for multipoint mapping, MultiPoint (Mester et al. 2003a,b, 2004) (expected release June-July 2005).



1-18

Any symbol can represent any genotype. The defaults are:

Missing genotype: 0

Backcross genotypes: 1 - Aa, 2 - aa

Dihaploid genotypes: 1 - AA, 2 - aa

RIL_selfing genotypes: 1 - AA, 2 - aa

RIL_sib_mating population genotypes: 1 - AA, 2 - aa

F2, F3, F4 population genotypes:

for a codominant marker locus: 1 - AA, 2 - aa, 3 - Aa
otherwise:
when maternal allele is dominant: 4 - A
when paternal allele is dominant: 5 - a

The defaults can be changed to any other symbols in Step 4 of the data loading procedure of the Old Input (see the next example).


Note that with such designations, the sign of the QTL substitution effect d represents the difference X(AA)-X(aa), or X(1)-X(2), or X(A)-X(B), where AA and aa (or 1 and 2, or A and B) are the two homozygotes at the marker locus (clearly, you may use other designations, but sign(d) is always in accordance with the indicated direction). Likewise, h is calculated as deviation from the mid-parent value, i.e., X(Aa)-0.5[X(AA)+X(aa)], or X(3)-0.5[X(1)+X(2)], or X(H)-0.5[X(A)+X(B)].
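The sign conventions for d and h can be illustrated with a small Python sketch. It uses plain marker-class means only to show the direction of the differences; it is not the estimation procedure used by MultiQTL, and the function names and trait values are hypothetical.

```python
def class_means(codes, values):
    """Mean trait value per marker genotype code; None marks a missing value."""
    sums, counts = {}, {}
    for code, value in zip(codes, values):
        if value is None:
            continue
        sums[code] = sums.get(code, 0.0) + value
        counts[code] = counts.get(code, 0) + 1
    return {code: sums[code] / counts[code] for code in sums}


def substitution_and_dominance(mean_AA, mean_aa, mean_Aa=None):
    """Return (d, h) with d = X(AA) - X(aa) and h = X(Aa) - 0.5[X(AA) + X(aa)]."""
    d = mean_AA - mean_aa
    h = None if mean_Aa is None else mean_Aa - 0.5 * (mean_AA + mean_aa)
    return d, h


# Default F2 codes: 1 - AA, 2 - aa, 3 - Aa (invented trait values).
codes = ["1", "2", "3", "1", "2", "3"]
values = [10.0, 6.0, 9.0, 14.0, 10.0, 13.0]
means = class_means(codes, values)
d, h = substitution_and_dominance(means["1"], means["2"], means["3"])
```

A positive d here means that the AA class has the higher mean, in line with the direction X(AA)-X(aa) stated above.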


Example 2: ChromNew.chr (F2: a - AA genotype, b - aa genotype, h - Aa genotype, missing symbol: -)

BMAG579

h a b h b b h b

b h h - h b h a

b h b b a b b a

b a b h a a a h

a b b

hE35M58la

h d b d b b h b

b h b - h b h b

b h b b a b b d

a a b h a a b h

a b d h h b h d

a b - b

Example 1: Chrom1.chr (Population F2, default)

mar1a 1 3 23 112 1 3 2 1 1
mar2a 102311210211
mar3a 2 22 3 11 2 1 1 2 1 1

1-19



1-20

The Trait File format

The trait data should be arranged in groups. Each group represents a trait and consists of a unique trait name and a line of numeric values, with a symbol standing in for any missing value. By default, the missing value symbol is $.

In the single-environment case, the number of numeric values (phenotypes) for each trait must equal the number of genotypes for each marker in the chromosome files. For the currently available QTL-E models (multiple-environment case), the number of phenotypes for each trait equals the number of genotypes (for each marker locus) multiplied by the number of environments. Trait values are listed environment by environment: all values for the first environment, then all values for the second environment, and so on.
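For the multiple-environment case, the ordering of trait values (all genotypes for the first environment, then all genotypes for the second, and so on) can be sketched in Python as below. The function name is hypothetical and the sketch only illustrates the layout.

```python
def split_by_environment(values, n_genotypes, n_environments):
    """Split one trait's values into per-environment lists.

    The values are expected in environment order: all genotypes for the
    first environment, then all genotypes for the second, and so on.
    """
    expected = n_genotypes * n_environments
    if len(values) != expected:
        raise ValueError(f"expected {expected} phenotypes, found {len(values)}")
    return [
        values[env * n_genotypes:(env + 1) * n_genotypes]
        for env in range(n_environments)
    ]
```

For example, six values for three genotypes in two environments are split into two lists of three, and genotype i has position i in each list.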


Single trait data files can also be created. The data may be organized in one group without a trait name. The name of the file with extension *.trt may represent the trait name:

Bur.trt:

54.5 68.5 87.5 88.5 $ 88.5 82.5 78.5 88.5 80.5 95.5 75.5

Yg2.trt:

4.28 4.10 6.45 2.25 11.25 12.55 $ 8.90 10.14 8.83 13.45 9.14

You can also write all trait groups together in a *.tra file:

Alltr.tra:

Bur 54.5 68.5 87.5 88.5 $ 88.5 82.5 78.5 88.5 80.5 95.5 75.5

Yg2 4.28 4.10 6.45 2.25 11.25 12.55 $ 8.90 10.14 8.83 13.45 9.14
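A reader for the *.tra layout shown above might look like the following Python sketch. It assumes that each trait occupies a single line beginning with the trait name and that missing values use the default symbol $; the function name is hypothetical.

```python
def parse_tra(text, missing="$"):
    """Parse *.tra text into {trait_name: [float or None, ...]}.

    Assumption for this sketch: each trait occupies one line that starts
    with the trait name; the missing-value symbol becomes None.
    """
    traits = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        name, raw_values = tokens[0], tokens[1:]
        if name in traits:
            raise ValueError(f"trait name is not unique: {name}")
        traits[name] = [None if tok == missing else float(tok) for tok in raw_values]
    return traits


alltr = (
    "Bur 54.5 68.5 87.5 88.5 $ 88.5 82.5 78.5 88.5 80.5 95.5 75.5\n"
    "Yg2 4.28 4.10 6.45 2.25 11.25 12.55 $ 8.90 10.14 8.83 13.45 9.14\n"
)
traits = parse_tra(alltr)
```

Both traits in the Alltr.tra example have twelve entries, with one missing value each.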


Starting to load the data

1-21

Definition and explanations

A “Problem” is a folder that holds the information for one mapping problem only, defined by its population, its numbers of genotypes and phenotypes, and its chromosome and trait files. In this folder, an information file that reflects the problem structure is created alongside the working files.

To create a problem, select File → New → Problem from the main menu.


<Old Input> was used in versions up to v. 2.4. Version 2.5 adds <New Input>, an improved input that can also read MapMaker (*.raw) files and the *.dbf, *.txt, and Excel (*.xls) formats.


1-22

Starting to load the data: New Input


Open the <Population type> window to select the population type from the list below. Thanks to the new section “Multiple families”, you can also analyze data generated using an advanced backcross design (see Part 9, page 2).

To choose the format of the input data, open the window of formats. We first consider an example in our old format (*.chr, *.trt, *.tra). Press the <Import Data> button to input data from any folder.

You can get Help for this option by pressing F1.


1-23

Regular data

An <Open> window will appear, in which you should choose the folder with your data files, select the files for input, and press the <Open> button.

If the entered information is correct (without errors), a list of the entered files appears in the right part of the window. Any of these files can be deleted: choose it in the Files section, press the <Delete> key on the keyboard, and confirm with <OK>. You can continue the input by adding new files to the list of selected files.


1-24


Press the <+> button next to a chromosome symbol to display its markers. Colors and the letters “C”, “R”, and “D” indicate marker dominance.

You can change the order of markers in the chromosome. Select the marker with the left mouse button, then double-click the right mouse button to open the menu with <Move Up> and <Move Dn> for moving the marker.

Press the <OK> button to finish the input. A standard <Save As> window will appear for choosing the folder for the entered information and service files. You can now move on to the Project Creation Stage (see page 1-44).


1-25


Environments data


In this case, first set the <Environments> check box to <On> and then press the <Import Data> button to input your data.

You can change the names of the environments. Press <+> next to the list of environments, and a list of default environment names will appear. To change a name, select the environment with the left mouse button and then click the chosen name again. Once the editing frame appears, you can change the name.


1-26


Family data


First set the <Multiple Families> check box to <On>, and then press the <Import Data> button to input your data. The following requirements apply:

1. The data on families are entered consecutively: all files for the first family, then all files for the second family, etc.
2. The chromosome and trait file names should be the same across families.
3. Marker names within chromosomes are shared among families.
4. In the chromosomes of some families a marker may be in the “missing data” state across the whole family, but all trait values should be present in all families.

During the input of each family, you are asked whether to merge the files. Merging is carried out for chromosome files (according to marker names) and trait files. A missing marker (not polymorphic or not scored within a family) is replaced in that family by the “missing marker” symbol. All files are shown on the right side of the panel.


1-27


Different markers may be missing in different families, but the marker order must be the same across families. During the input process you may use a special *.ord file that defines the order of the markers in each chromosome. Each line of this file begins with the chromosome name, followed by the names of the mapped markers in their true order. The marker names are separated by a space or tab. To enter the *.ord file, choose it in the Input window and press the <Import Data> button.

Two example lines are shown below:

Chr1 mar1 mar4 mar5 mar2 mar10
Chr2 mar3 mar4 mar1 mar6 mar7 mar8
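The *.ord layout for family data (chromosome name followed by marker names in their true order) can be read with a few lines of Python. The sketch below also checks whether the markers scored in one family follow the reference order, allowing for markers that are missing in that family. Both function names are hypothetical, and the code is an illustration rather than the MultiQTL implementation.

```python
def parse_ord(text):
    """Parse *.ord text into {chromosome_name: [marker names in true order]}."""
    order = {}
    for line in text.splitlines():
        tokens = line.split()  # names are separated by spaces or tabs
        if not tokens:
            continue
        chromosome, marker_names = tokens[0], tokens[1:]
        if len(set(marker_names)) != len(marker_names):
            raise ValueError(f"repeated marker name on {chromosome}")
        order[chromosome] = marker_names
    return order


def follows_order(family_markers, reference):
    """True if family_markers appear in the same relative order as reference.

    Markers absent from a family are allowed; unknown markers are not.
    """
    position = {name: index for index, name in enumerate(reference)}
    if any(name not in position for name in family_markers):
        return False
    indices = [position[name] for name in family_markers]
    return indices == sorted(indices)


order = parse_ord("Chr1 mar1 mar4 mar5 mar2 mar10\nChr2\tmar3 mar4 mar1 mar6 mar7 mar8\n")
```

A family that lacks mar5 still passes the check for Chr1 as long as its remaining markers keep the same relative order.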


1-28

As with multiple-environment data (page 25), you can change the family names. After all data have been entered, you may want to open all chromosomes (by pressing <+>) to check the order and names of the markers again. If the names of some markers are not the same among families, extra markers will appear. To find out in which family this happened, open each family by pressing the corresponding <+>, and then find the chromosome within the family where the error occurred. In the example below, the error is in the 2nd chromosome of Fam2: instead of “marker” we had “mark”. Delete this file by choosing it in the Files section (where the families are ordered) and pressing the <Delete> button. To fix the marker names, input this file again (after setting the <Always show the data table> check box to <On>).


1-29


Changing marker’s name

After the input, a table with information on the processed chromosome will appear, where the marker name can be changed. Select the column with the wrong name and type the correct name in the Column name field. Pressing <OK> inputs the chromosome with the changed names.


1-30


Selective Genotyping data


In this case, set the <Selective Genotyping> check box to <On> and then press the <Import Data> button.

Next, indicate the trait on which selective genotyping was based. Mark this trait with a double click; a red frame then appears near the trait name. If no trait has been marked when you press the <OK> button, a text explaining how to make the choice will appear.


1-31


Error correction


If there are errors in the information files, a corresponding message will appear. You do not need to remember these errors; just press the <OK> button.

A window with the details of the errors will then appear. Pressing <OK> in this window displays the error messages one after another. For instance:


1-32


To correct the indicated error, type the number given in the error message into the Column number field. The marker name then appears in the Column name field, and this column becomes the first one in the table. In the field next to the <Find> button, type the erroneous symbol «4» and press <Find>. Both the corresponding line and the erroneous symbol become highlighted. Change the highlighted value, e.g., to «2», and press <Enter> on the keyboard. After <OK> is pressed again, a message about the next error appears (if any errors remain).


1-33


It may happen that the number of entered genotype scores differs between markers.

To fix such an error, again type the column number «11» in the indicated field, enter a backspace in the field next to the <Find> button, and press <Find>. The line «439» with the missing symbol(s) will appear. Correct the error by entering the missing symbol(s) and pressing <OK>. Do not forget to press <Enter> on the keyboard after entering each symbol.




1-34


If the codes in the chromosome file do not correspond to the chosen standards, a special message appears.

[Figure: code correspondence window with the labels “Coding by default” (Missing, AA, aa, Aa, A dom, a dom) and “Codes occurred in this chromosome”]

First, mark all adjacent loci by choosing the first one and then the last one while holding the Shift key (all loci will be highlighted in black). You can now match the codes that occurred with their graphical display. To change a code, mark the target code value by pressing the left mouse button. A second click on the marked code opens a frame, in which you can change the code as needed.

Once the codes have been corrected, they are remembered and become the default codes for all other chromosomes.


1-35


Other formats


If your data are prepared in one of the common formats, first select the required format and then press the <Import Data> button to enter the file in this format. Data from different formats can also be combined when you prepare a job file for one mapping project. We show an example with data sets imported first from the *.raw (MapMaker) format, and then from the *.dbf (DataBaseFormat) format.


1-36


After the data are entered, an information table is displayed. Markers and traits may be displayed in one table. To show how to assign the markers to their chromosomes, let us select the first four markers and type “chrom1” in the “Chromosome” field. Then we select the remaining markers and call this group “chrom2”. The same table is obtained after import from the *.dbf format. In this table we may define the groups “chrom3” and “chrom4”.

Subdividing a file into chromosomes in this way during input is practical only with a small number of markers (especially if they are unordered). With many markers, it is better first to divide the files by chromosome and input the chromosomes separately. Each chromosome will be named after its data file. If the markers were already ordered in the initial data file, you may enter all markers as one “super-chromosome” and then subdivide this array into separate chromosomes (section <Data revision options>, p. 61).


1-37


After input from *.txt and *.xls formats, a transposed table may be obtained, with marker names occupying the first column. Let us press the button <TRANSPOSE TABLE>.

Now the marker names appear in the first line. To move the names to the title line we should indicate the current line with the names, then change the state of the check box <Column Names> to <On> and press button <Take>.



Starting to load the data: Old Input

1-38

Step 1: Enter a name for the Problem

Name the problem you are going to create. By default, the folder is created in the current directory. To change this, define another Location and then choose a name for the Problem.

If a folder with this name already exists in the current directory, this message will appear after you push the <Next> button. Answering “Yes” destroys all information in this folder. After answering “No”, you must choose a new folder name.

Step 2: Select the type of population and the mapping function

Currently, you may use the following types of populations: backcross, dihaploid, RIL_selfing, RIL_sib_mating, F2, F2_F3, and F2_F4 (F2 markers with phenotypes represented by means of F3 or F4 families, respectively). Mark the <QTL-E> radio button in the QTL-environment interaction case.


1-39

Step 3: Enter the number of environments, genotypes, and phenotypes

By default, there is one environment, and the number of phenotypes is equal to the number of genotypes. In the case of multi-environment analysis, the number of phenotypes is equal to the number of genotypes multiplied by the number of environments (we assume that the same genotypes are assayed in all environments). After the number of genotypes is entered, the number of phenotypes appears automatically.

For data from a multiple-environment design, this message will appear after you push the <Next> button.

By answering “Yes”, you can change the names of the environments (by default: env1, env2, etc.) using this table.


1-40

Step 4: Enter the symbols of the markers that are used in the data files

By default:

Backcross - 1,2

F2 - 1,2,3,4,5

[Figures: symbol entry window for Backcross (by default) and for an F2 population (example)]

Notes:

• missing marker symbol - one character only or a digit >5 (by default 0)

• missing trait symbol - one character only, not a digit (by default $)

• if your codes differ from the defaults, you should enter your codes. Then, by pressing the <Next> button, you will move to the next step




1-41

Step 5: Select the data files (chromosomes and traits) to use in the problem

First select the drive, then the folder, and finally the files. The icons of the files correspond to their type.

Click on the <Include> button to add the files to the problem.

Remark: Only one *.tra file may be selected for the Problem!

At this stage, the program checks that the number of genotypes for each marker and the number of phenotypes for each trait equal the numbers entered earlier (at Step 3). Only files that satisfy this condition may be included. Error messages give the names of the files that were not included in the problem because they violate this condition.



1-42

Step 5 (continued): Error messages

For chromosome file: The name of genotypes is incorrect if it begins with a digit or if there is no space between a marker name and its values. The number of genotypes may be incorrect if the name of the next marker is missing. Examples of marker name errors:

35EM47jw 1 1 2 3 1 2 2 3 2 1
EM47jw1 1 2 3 1 2 2 3 2 1

For traits file: Example of an error in the traits file: 0.1500.090 (a space is missing). In trait files with extension *.tra additional errors may occur, e.g., no space between the name of the trait and its values. Example: Wstrs0.17

All errors are saved in the file errors.txt in the Problem folder.
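The two marker-row errors described for chromosome files can be illustrated with a short Python check. The function name and the message wording are hypothetical and do not reproduce the messages that MultiQTL writes to errors.txt. The sketch assumes single-character genotype symbols and one marker per line.

```python
def check_marker_row(line, n_genotypes):
    """Return a list of problems found in one row of a chromosome file.

    Assumptions: single-character genotype symbols, one marker per line.
    """
    parts = line.split(None, 1)
    if not parts:
        return ["empty row"]
    problems = []
    name = parts[0]
    if name[0].isdigit():
        problems.append("marker name begins with a digit")
    scores = parts[1] if len(parts) > 1 else ""
    symbols = [ch for ch in scores if not ch.isspace()]
    # A missing space after the name fuses the first score with the name,
    # so the row then appears to be one score short.
    if len(symbols) != n_genotypes:
        problems.append(f"expected {n_genotypes} scores, found {len(symbols)}")
    return problems
```

Applied to the two example rows with ten genotypes, the first is flagged for its name and the second for its score count.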




1-43

Step 5 (continued):

When some of the included files have the correct number of objects, this message box appears:

By answering “Yes” and pressing the <Finish> button, we move to the next stage of data input. By answering “No” and pressing <Cancel>, we exit the program to correct the errors (in accordance with the messages on the detected errors concerning the number of objects, as on the previous page). If there are no files with the correct number of objects, we get the message:

Press <Cancel> and <Exit> for error correction.


1-44

Project Creation Stage

When creating a project, the <Select Data Files> window appears on the screen. A Project is a file with extension *.job that holds all the data and definitions of a problem, including the methods and the results of the calculations that were conducted. To create the project, some or all files from a directory may be chosen. Several Projects may be created in one Problem folder.

At this stage, the correctness of marker and trait values is verified; if there is an error, windows for correction appear on the screen.

The error correction process described on the following pages applies only to <Old Input>. For <New Input>, developed for v. 2.5, checking and correcting the data is much easier and is done directly during data loading, as explained above (see page 1-31).


1-45

Errors in the chromosome files

All markers from the entire chromosome are shown in the window. Click on the button near the error to display the error correction window.

Correct the errors and press the <Next file> button. The next errors will appear.

Please note that the program checks for correspondence between marker state designations for each of the dominant markers.



1-46

Errors in the trait values. Error message

Errors in the trait file: the missing-value symbol does not correspond to the symbol defined in the Problem folder, or a trait value is not a number.

Correct these errors and press <Save and exit>. If the <Cancel> button is clicked or some errors were not corrected, a window with a question appears. If no more errors are found, project creation is completed.

If <OK> is pressed in the current panel, error control is performed once again and the corresponding messages appear in the <Input Data Report> window.


1-47

Errors in chromosome: recombination rate >0.5

This data check applies to both Old Input and New Input.

In the chromosome file, a special error message may appear because some markers are in repulsion phase. This can be solved by changing the phase (but see the next page).

Click <Yes> to see the upper window. Click the <Apply> button to see the lower window; the phase is changed for markers 3 and 4. Click the <Save and Exit> button to save the chromosome under a special new name, atz8mn%1.


1-48

In this case, changing the phase for one interval does not improve the situation. The program therefore suggests dividing the chromosome into three parts.

Chromosomes with recombination rates >0.5 cannot be included in the project.


Important note: For data on families, the only message you may obtain is that a certain interval in a certain family shows a recombination rate >0.5. You can enter this family as a separate data set, obtain the transformed chromosome, and then use the transformed information to input the data sets for all families.
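The idea behind the check for recombination rates above 0.5 can be illustrated for a two-class population (for example, a backcross with the default codes 1 and 2 and missing symbol 0). The Python sketch below simply counts mismatching scores between adjacent markers; it is an assumption-laden illustration with hypothetical function names, not the estimator used by MultiQTL.

```python
def recombination_fraction(scores_a, scores_b, missing="0"):
    """Observed fraction of individuals whose scores differ at two markers.

    Individuals with a missing score at either marker are skipped.
    Returns None when no informative individuals remain.
    """
    informative = 0
    recombinant = 0
    for a, b in zip(scores_a, scores_b):
        if a == missing or b == missing:
            continue
        informative += 1
        if a != b:
            recombinant += 1
    if informative == 0:
        return None
    return recombinant / informative


def flag_suspect_intervals(markers, threshold=0.5):
    """List (left, right, r) for adjacent markers with r above threshold.

    markers maps marker names to score lists, in chromosome order.
    """
    names = list(markers)
    flagged = []
    for left, right in zip(names, names[1:]):
        r = recombination_fraction(markers[left], markers[right])
        if r is not None and r > threshold:
            flagged.append((left, right, r))
    return flagged


# Invented scores: m3 is coded in the opposite phase to m1 and m2.
example = {
    "m1": list("1122112211"),
    "m2": list("1122112211"),
    "m3": list("2211221121"),
}
suspect = flag_suspect_intervals(example)
```

In this invented example only the interval m2-m3 is flagged. Swapping the two codes of m3 would turn its observed fraction r into 1 - r, which is the effect of changing the phase.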


1-49

Creating a new project

It is possible to create several Projects in a single Problem folder. To do so, choose the menu item File → New → Project.

Then choose the Problem folder and its information file.

A <Select Data Files> window for project creation appears.



1-50

Graphical display of information file

After the Load Data operation is finished, all relevant files are automatically copied to the folder specified in Step 1. In this folder, an information file with extension *.inf is created that holds all definitions and file names used in the Problem. This information file can be examined using the Edit → Setting option of the main menu: first choose the Problem folder, and then a file with extension *.inf.

Information file with Problem name with one environment



1-51

For a Problem with multiple environments, the information file window looks like:

For a Problem with multiple families, the information file window looks like:

In these cases you may change the environment (or family) names. To do so, click <Change Environment Names> or <Change Families names>, and then change the names in the window that appears.


1-52

Graphical display of information file (continued)


For a Problem with Selective genotyping, the information file window looks like:

The window shows the name of the trait that was the basis for selecting individuals from the tails of the trait distribution for selective genotyping.

For the trait file with extension *.tra this frame shows the name of the file and names of traits in that file.


Option Change marker order

It is possible to revise (change) the order of the markers in any chromosome. To do so, it is necessary to create a special file with extension *.ord in the Problem folder for every chromosome to be changed. For example for chromosome chrom2 with 12 markers we create the file:

1-53


chrom2Ord.ord

3 6 2 7 8 9 11 1 5 12 4 10

Then we select <Edit→Change Marker Order> and choose the corresponding chromosome and order file from the folder.

The new chromosome should be saved under either a new name or the old one.

This option is used with the <Old Input>. With <New Input> you can change the order of the markers in the chromosome during the input (see page # 1-24).
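As an illustration of what such an order file expresses, here is a minimal sketch. It assumes that the numbers in the *.ord file are the old 1-based marker numbers listed in their new order; this reading, the marker names m1 … m12, and the function names are assumptions made for the example, not a description of the program's internals.

```python
def read_order(text):
    """Parse the whitespace-separated marker numbers of an order file."""
    return [int(token) for token in text.split()]


def reorder_markers(marker_names, order):
    """Return the markers in the new order (order holds old 1-based numbers)."""
    n = len(marker_names)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("the order must list each marker number 1..n exactly once")
    return [marker_names[i - 1] for i in order]


# Hypothetical marker names for chromosome chrom2 with 12 markers.
names = [f"m{i}" for i in range(1, 13)]
order = read_order("3 6 2 7 8 9 11 1 5 12 4 10")
new_names = reorder_markers(names, order)
print(new_names)
```

The validity check mirrors the practical requirement that every marker of the chromosome appears exactly once in the order file.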


Option Add data files

It is possible to add data files to an existing Problem folder. This is done using the option <Edit→Add to Problem→Add Data Files> from the main menu. Then choose the Problem folder and its information file in the window <Select Problem to add data files>.

1-54


Now, we choose a folder and its files that will be added to the chosen Problem folder in the <Select data files to add into Problem> window.


Option Add data files (continued)

The number of genotypes for a chosen chromosome file is compared to the number of genotypes in the Problem folder. The correspondence of marker symbols to those in the Problem folder is checked.

Only *.trt files may be used for adding trait files. The number of phenotypes in the new file is compared to the number in the Problem folder. A trait in a *.tra file may be overwritten by adding a *.trt file with the same trait name (in this example, the trait “front”).

Possible error messages appear.

1-55


If the names of added files coincide with the names of existing files, a message offering to overwrite the existing file appears.


1-56

Option Add marker files

Sometimes you may need to add a new marker to a chromosome that already exists in the Problem folder. The marker addition procedure is shown on the next page. First, place the marker file in the Problem folder. The marker file format is the same as the chromosome file format, but the file includes one marker only. The file’s extension should be *.mrk.

During input of a marker file, the user must know the chromosome name and the interval number for its marker. It is helpful if the marker file name contains this information. For example, we use a name like mona3h_6 to indicate that the marker in this file will be added to chromosome mona3h at interval 6.

Example of a file named mona4h_5.mrk containing marker E35M55:

E35M55 444024244422244244444244444202424424444444444


The option for adding a marker file, <Edit→Add to Problem→Add Marker Files>, works similarly to that for adding a data file. First, choose the Problem folder, then find the folder containing the marker files and select them.


1-57

Option Add marker files (continued)

The number of genotypes from the file with the additional marker as well as marker scores are compared against the corresponding values in the Problem folder. The interval number defining the position of the new marker must be less than the total number of markers in the chromosome. Otherwise, an error message will appear.

In the window <Inserting Markers>, choose the marker file name, the chromosome name in the Problem’s chromosome folder, and the input interval number. Then click <OK>.
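The checks described for marker insertion can be sketched as follows. This is only an illustration under assumptions: a chromosome is represented as a list of (name, scores) pairs, interval k is taken to lie between markers k and k+1, and the set of allowed score symbols is passed in explicitly. None of these names or representations come from MultiQTL itself.

```python
def insert_marker(chromosome, new_marker, interval, allowed_symbols):
    """Insert new_marker into the given 1-based interval of the chromosome.

    chromosome: list of (name, scores) pairs; new_marker: one such pair.
    """
    n_markers = len(chromosome)
    n_genotypes = len(chromosome[0][1])
    name, scores = new_marker
    if len(scores) != n_genotypes:
        raise ValueError("number of genotypes differs from the chromosome")
    if not set(scores) <= set(allowed_symbols):
        raise ValueError("marker scores use symbols unknown to the Problem")
    if not 1 <= interval < n_markers:
        raise ValueError("interval number must be less than the number of markers")
    # Interval k lies between markers k and k+1, i.e. list index k.
    return chromosome[:interval] + [new_marker] + chromosome[interval:]


# Toy chromosome with three markers and four genotypes (hypothetical scores).
chrom = [("M1", "4420"), ("M2", "4422"), ("M3", "2422")]
extended = insert_marker(chrom, ("E35M55", "4440"), 2, "024")
print([name for name, _ in extended])
```

With three markers there are two intervals, so interval numbers 1 and 2 are accepted and 3 is rejected, in line with the rule stated above.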



Deleting data files from the Problem folder

It is possible to delete data files from an existing Problem folder. This is done by selecting <Edit→Delete> from the main menu.

Then select the Problem folder and its information file.

In the window <Select files to delete>, select which files should be deleted. After clicking <OK>, those files will be deleted.

1-58



1-59

Extracting data and information file from Project

It is also possible to extract data and problem information from the Project file.

To do that, the Project must be opened using the <File→Open Project> option.

When we run the <File→Extract> option, we enter the full path of the folder where the extracted information will be placed. The extracted information file keeps the name of the initial *.inf file.

A Project file with extension *.job contains the problem information, fitted models, methods used, and results obtained. It is not restricted to the Problem folder and can be placed in any other folder.



1-60

Extracting data and file information from Project (continued)

Example of information file with extracted data


In case of multiple Families, the data for each family are placed in a separate sub-folder of the chosen folder (named Fam1, Fam2, …).


1-61

Data revision option

This option is used to change chromosome or trait data. We can delete (or add) a marker in the chromosome, or select a part of the chromosome. In addition to revising the chromosome, we can transform the trait data. This option is available for open Projects only. To revise the chromosome data, select <Data revision→Chromosome> from the menu or tool bar.

To revise the trait data, choose <Data revision→Trait> from the menu or tool bar.


This option is not provided for data with multiple Families, multiple Environments, or Selective Genotyping.


1-62

Data revision option (continued)

Chromosome revision

Select the chromosome. Blue and red denote dominant markers linked in repulsion phase, whereas green denotes co-dominant markers.

Note that our new MultiPoint software (under development) allows building high quality multilocus maps based on such information (for the method see Mester et al. 2003 a,b).

In the inside panel, we see marker names and segregation ratios, together with chi-square values for deviation from the expected ratios and the significance of the deviations marked by *, **, *** (for p < 5%, 1%, and 0.1%, respectively).
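The segregation test shown in the inside panel can be illustrated with a short sketch. It assumes an F2 population with expected ratios 1:2:1 for a co-dominant marker and 3:1 for a dominant marker; the function names and the example counts are invented for the illustration, and the program's own computation may differ in detail. The p-values use the closed forms of the chi-square tail probability for 1 and 2 degrees of freedom.

```python
import math


def chi_square(observed, ratio):
    """Chi-square statistic for observed class counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value(chi2, df):
    """Upper-tail probability of the chi-square distribution (df = 1 or 2)."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def stars(p):
    """Significance marks: * for p < 5%, ** for p < 1%, *** for p < 0.1%."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Dominant marker, 200 individuals, expected 3:1 (two classes, df = 1).
x2_dom = chi_square((165, 35), (3, 1))
# Co-dominant marker, expected 1:2:1 (three classes, df = 2).
x2_cod = chi_square((50, 100, 50), (1, 2, 1))
print(x2_dom, stars(p_value(x2_dom, 1)))
print(x2_cod, stars(p_value(x2_cod, 2)))
```

The number of degrees of freedom is the number of genotype classes minus one, which is why the dominant and co-dominant cases are treated separately.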

Click the <Rec.distances> button. The table of pairwise marker distances appears.


1-63

Data revision option

Chromosome revision (continued)

Deleting a marker

To delete a marker, choose the <Select marker to delete> option and select the desired marker (for example, marker #4).

The new chromosome is shown in the <Changed chromosome report> window.

To continue deleting markers, choose the <Select marker to delete> option again and select the desired marker in the <Changed chromosome report> window.



1-64

Adding a marker

To add a marker, choose the <Select interval to add> option. The panel for marker selection appears. After clicking the <OK> button of this panel, the new marker (3a) is added to the specified interval.

Any change made to the map can be cancelled by clicking the <Undo> button.





Selecting part of a chromosome

Choose the <Select part of chromosome> option. The <Select part of chromosome> window appears.

Click the upper (lower) radio button of this panel to choose the first (last) marker of the chromosome part in the <Chromosome information> window. Press the <OK> button to obtain the new version of the chromosome.

1-65





Selecting part of a chromosome

1-66



Chrom1_new Chrom2_new

We now use the described option to subdivide a large set of markers into parts corresponding to separate chromosomes (see p. 32). Each such part is selected by indicating its flanking markers: go to the <Chromosome revision> option and use the function <Select part of chromosome>. After selecting a part, save it under a new name (for more details see the next page).

You can see here an example with two selected parts, chrom1_new (markers 1-36) and chrom2_new (markers 38-74). Their LOD-score graphs for a multiple-trait model are shown in the figures.


1-67


Saving your changes

Each of the above changes creates a new chromosome, which may be saved. For that, press <Apply> button in <Chromosome information> window or close <Changed chromosome report> window. The window <Status of the new chromosome > appears. The <Exit without saving> option is available only if the <Changed chromosome report> window was closed.



Choose the first option from the Status window to add the new chromosome to current Project and save it in the Problem folder. Choose the second option to add the new chromosome to current Project without saving. Choose the third option to replace the old chromosome by the new one (the old chromosome is deleted), adding the new chromosome to current Project and saving it in the Problem folder.

The name for the new chromosome must be given.


1-68

Trait revision

To use this option, select the names of the traits to display or edit.

Select the chromosome and interval to examine.

You can see the effect of a chosen interval/marker on the trait distribution in the alternative QTL groups.

You can also choose the width of trait grouping in the histograms.



1-69

Trait revision (continued)

You can transform the scale of the trait distribution if needed. The decision can be based on the numerical values of parameters characterizing the trait distribution, e.g., asymmetry and kurtosis. You can control the transformation based on the parameters of the resulting distribution.

You can either replace the scores of the original trait with the transformed ones or add the new trait for QTL analysis. A window for the new trait name appears.
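How asymmetry and kurtosis can guide the choice of a transformation is shown in the sketch below. It uses the usual moment-based definitions of skewness and excess kurtosis and a logarithmic transformation as one possible choice; the trait values are invented, and the set of transformations actually offered by the program is not implied here.

```python
import math


def shape_parameters(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Invented right-skewed trait scores.
trait = [1.0, 1.2, 1.5, 2.0, 2.2, 3.0, 4.5, 7.0, 12.0, 20.0]
log_trait = [math.log(v) for v in trait]

skew_raw, kurt_raw = shape_parameters(trait)
skew_log, kurt_log = shape_parameters(log_trait)
print(round(skew_raw, 2), round(skew_log, 2))
```

For these scores the logarithm brings the skewness much closer to zero, which is the kind of control over the resulting distribution mentioned above.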



1-70

Outliers (extreme data points) can be displayed using the “Tail Cutting” option. To display these points, click on the red tails.





1-71

To see information about a certain genotype click on the corresponding point on the screen.

If the Two traits option is chosen, the diagram shows, for the alternative QTL groups, the “0.95-areas” that carry 95% of the genotypes from those groups.





2-1

Part 2

Data Simulation

Table of Contents

Introduction (p. 2)
Step 1: Input type and size of the data (p. 3)
Step 2: Input global simulation parameters (p. 4)
Step 3: Input marker number and chromosome length for regular and multi-environment data (p. 5)
Input marker number and chromosome length for families data (p. 7)
Step 4: Setting dominant markers (p. 9)
Step 5: Setting the QTL parameters for regular and multi-environment data (p. 10)
Setting the QTL parameters for families data (p. 12)
Step 6: Setting the epistasis values (p. 13)
Step 7: Simulation of general parameters (p. 14)


2-2

Part 2: Data simulation

Introduction

Input of real data (chromosomes and traits) was shown in Part 1.

In this part we show how to generate “artificial” (simulated) data.

To do that, choose <File→Simulation> from the main menu.


2-3

Step 1: Input type and size of the data


Regular Multi-environment Families (‘Environments’)

For Regular data, only the sample size (number of individuals) should be entered.

For Multi-environment data, the number of environments and the number of individuals in each environment (assumed equal) should be entered.

For Families data, the number of families and the number of genotypes in each family should be entered. If you set the <All equal> option to <On>, you can set equal family sizes.


2-4

Step 2: Input global simulation parameters

Input the number of chromosomes and traits (at most 15).

Select the mapping function (Haldane or Kosambi) and the population type:

We can simulate “missing marker scores” by setting the percentage of lost marker scores (0 means no missing marker data). In simulations of multiple families, only data without missing marker scores are possible.

Click <OK> to go to step 3.
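For readers who want to see what the choice of mapping function in Step 2 means numerically, the two functions can be written as follows. The formulas are the standard Haldane and Kosambi relations between map distance and recombination rate; the function names are chosen for this sketch only.

```python
import math


def haldane_r(distance_cm):
    """Recombination rate for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def kosambi_r(distance_cm):
    """Recombination rate for a map distance in cM (Kosambi)."""
    return 0.5 * math.tanh(2.0 * distance_cm / 100.0)


def haldane_cm(r):
    """Map distance in cM for a recombination rate r < 0.5 (Haldane)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance in cM for a recombination rate r < 0.5 (Kosambi)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for d in (5, 10, 20, 50):
    print(d, round(haldane_r(d), 4), round(kosambi_r(d), 4))
```

Both functions keep the recombination rate below 0.5 for any finite distance; for the same distance the Kosambi rate is slightly higher because it allows for interference.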


In simulations of data with multiple environments or multiple families, the number of traits is limited to two in the current version. By default, the families are named fam1, fam2 …, and the environments env1, env2 …


2-5

Step 3: Input marker number and chromosome length (for regular and multi-environment data)

Each red line corresponds to one chromosome.

Select a chromosome, input the number of markers and the chromosome length, and press the <Set markers> button.

To select all of the chromosomes, press the <All> radio button. Input the number of markers and the chromosome length. All chromosomes will have the same length and number of markers. However, chromosomes that were already defined will not be affected by pressing the <All> radio button.

A description of each operation is given at the bottom of the window.


2-6

Step 3 (for regular and multi-environment data) (continued):

In this example, the first and third chromosomes have 9 markers and length 80 cM, whereas the second chromosome has 11 markers and length 110 cM defined previously.


You can move specific markers on any selected chromosome by pressing the left mouse button and dragging them. The new marker positions appear on the screen. You can delete specific markers by pressing the right mouse button.

After finishing marking the chromosomes, press the <Next> button to go to the next step.


2-7

Step 3: Input marker number and chromosome length (for families)


In case of multiple families, these parameters should be set up for each chromosome separately. After entering the number of markers, we get a window for setting the chromosome length for each family. Using the <Equal> option, we can set equal lengths of this chromosome across families. Pressing <OK> switches on the <Set markers> function.

This results in the appearance of a marked chromosome and a <Families selection> window. If the chromosome lengths are equal among families, the lower radio button is called <Into all family>; if they are not equal, it is called <Proportionally to the length>.


2-8

Step 3 for families data (continued)


Now for each family you can move or delete any marker of the considered chromosome.

If we move a marker within the chromosome, the interval length changes equally among families when the chromosome lengths are equal among families, and proportionally to the chromosome length otherwise. By marking Fam1, or Fam2, etc., we affect the interval lengths only in the chosen families.

By choosing one of Fam1, Fam2 … and right-clicking a marker, we can delete this marker in the marked family. When <Proportionally to the length> was chosen, the system prevents deletion (to exclude situations of an empty marker across the data set). During a permitted deletion the system asks for confirmation; after it is provided, the deleted marker is highlighted in gray.

After defining the markers on all chromosomes, press <Next> to move to the next step of the simulation. For an F2 population, this step is defining dominant markers (if you need them). For family data this is done in the same way as for regular or multi-environment data.


Step 4: Setting dominant markers (for all types of data)

If the population structure allows for both homo- and heterozygous marker states (e.g., F2), you can set dominant markers on each chromosome without selecting it.

First, set a red-blue marker by pressing the left mouse button. Then select either the maternal (red) or paternal (blue) allele of the dominant marker by clicking the red or blue part.

After finishing marking the chromosomes, press the <Next> button to go to the next step. Press <Previous> to go to the previous step.

2-9



2-10

Step 5: Setting the QTL parameters (for regular and multi-environment data)

To set a QTL at any interval of any chromosome, left-click this interval. The <Setting QTL’s parameters> window appears on the screen.

You can set the QTL location within the interval by moving the ‘location’ indicator, and enter the QTL substitution effect (d) and the dominance (heterozygous) effect (h) for an F2 population [for RIL or dihaploid populations, effect (d) only].

You must input effect values for each trait. If the <Equal> option is selected, all traits will have the same substitution and heterozygous effects.


2-11

Step 5: Setting the QTL parameters (for regular and multi-environment data) (continued)

Click the <OK> button to set the QTL on the chromosome.

In the case of several environments (see step 1 of the simulation dialog), only one or two traits are possible in the current version.

Substitution and dominance effects are entered for every environment.


Every QTL is symbolized by a green triangle. Left-click it to open the <QTL’s set> window with the QTL parameters, to review or modify them. To delete a QTL, right-click its symbol.

Press <Next> to go to the next step; press <Previous> to go to the previous step.


2-12

Step 5: Setting the QTL parameters (for family data)


The effects are entered for each family separately, but if you mark the <Equal> radio button, the effect is set for all families, as in the simulation mode for multiple environments. Still, the relative position of the QTL within the chosen interval can be adjusted separately for each family. To do so, choose the family in the <Families selection> window, and then adjust the QTL position using the “location” indicator of the <Setting QTL’s> window.

You can also make the adjustment (if needed) for all families simultaneously by using the choice shown in the figure.


2-13

Step 6: Setting the epistasis values (for all types of data)

If you have set more than one QTL on the same chromosome, at this step you can select interacting QTL pairs and set epistasis value(s). To do so, click the two QTL symbols in turn. Values for each epistatic component are entered for each trait in the <Setting epistatic effects> window. If the <Equal> option is selected, all traits will have the same epistasis values. Click the <OK> button to set the epistasis values for the QTL pair. The <Setting Epistatic effects> window closes and the interacting QTL pair is shown by a dotted arc.


Press <Next> to go to the next step. Press <Previous> to go to the previous step.


2-14

Step 7: Simulation of general parameters (for all types of data)

You should also set general simulation parameters:

- mean value of the trait
- standard deviation for each trait
- residual correlation for each trait pair

The <Setting parameter values> window is used to do this. If the <Equal> option is selected, all the traits will have the same parameter values. The residual correlation matrix should be positive definite. A positive determinant is necessary for this but not sufficient: all leading principal minors of the matrix must be positive. This property cannot be ensured automatically for an arbitrary matrix with coefficients between -1 and +1. To guarantee it, use the <Correlation fitting> tool.
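Whether a residual correlation matrix is positive definite can be checked outside the program with a short sketch like the one below. It attempts a Cholesky factorization, which succeeds exactly when the symmetric matrix is positive definite; this is a generic numerical check written for illustration and says nothing about how the <Correlation fitting> tool works. The example matrices are invented.

```python
import math


def is_positive_definite(matrix):
    """True if the symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False          # a non-positive pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# A valid residual correlation matrix for three traits.
good = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]

# All coefficients lie between -1 and +1 and the determinant is positive
# (0.3844), yet the matrix has two negative eigenvalues.
bad = [[1.0, 0.9, 0.0, -0.9],
       [0.9, 1.0, 0.9, 0.0],
       [0.0, 0.9, 1.0, 0.9],
       [-0.9, 0.0, 0.9, 1.0]]

print(is_positive_definite(good), is_positive_definite(bad))
```

The second matrix shows why checking the determinant alone is not enough for four or more traits.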


Click the <OK> button to set global parameter values. The <Setting parameter values> window will be closed and the simulation for the project will begin. Once complete, we receive chromosomes and traits in the format of real data.


3-1

Part 3

Model Creation and Calculation stages

Table of contents

Model Creation stage (p. 2)
Page 1: Model Parameters (p. 3)
Two-trait two lists option: Example (p. 5)
Multiple-trait option: Example (p. 6)
Initial Submodel Default option (p. 7)
Trait normalization option (p. 7)
Selective Genotyping option (p. 7)
Page 2: Extended Parameters (p. 8)
Finish (p. 10)
About model (p. 11)
Calculation panel description (p. 14)
Results of calculation (p. 15)
Save and Open Project options (p. 18)


3-2

Part 3: Model Creation and Calculation stages

Model Creation stage

Input of real and simulated data to the system was shown in Parts 1 and 2. In this part we show how to create various models to analyze these data.

When the process of Project Creation or Data Simulation is finished, the window for setting the model appears.

If a Project is open, we can obtain this window using <Model→Create> from the main menu.

The Model Creation window consists of three panels.


3-3

Page 1: Model Parameters

Name of the window - the type of population created in the Problem folder or during the process of data simulation.

Name of Page1: Model Parameters.

It is necessary to:

- enter a unique name for each model;
- select the mapping function (Haldane or Kosambi);
- select the number of QTLs per chromosome to fit (“Single QTL” or “Two-linked QTL” model).

Global number of traits is displayed in the area <Number of selected traits>



3-4

Page 1: Model Parameters (continued)

You should select traits in the area <Names of traits>. The number of selected traits is displayed in the <Number of selected traits> area.

Select an option in the area <Number of traits in the model>.

Four options are possible:

- one-trait: calculations are performed separately for each of the selected traits;
- two-trait: calculations are performed for each pair of the selected traits (Korol et al. 1995);
- multiple trait: a multiple-trait model is fitted for the selected traits (no more than 40 traits can be included) (Korol et al. 2001);
- two-trait from two lists: the traits forming the pairs are selected from two lists (see next page).


3-5


Two-trait two lists option: example


After you select this option, two lists with the names of the traits appear. The traits forming the pairs are selected from the two lists: each chosen trait of the first list is combined with each chosen trait of the second list. The traits selected in the two lists must be different: if some coincide, an error message warns you.
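The difference between the two-trait option and the two-trait option with two lists can be made concrete with a small sketch. The trait names and function names are hypothetical; the sketch only enumerates which trait pairs would be analyzed and reproduces the rule that the two lists must not share a trait.

```python
from itertools import combinations, product


def all_pairs(selected_traits):
    """Two-trait option: every pair of the selected traits."""
    return list(combinations(selected_traits, 2))


def pairs_from_two_lists(first_list, second_list):
    """Two lists option: each trait of the first list with each of the second."""
    overlap = set(first_list) & set(second_list)
    if overlap:
        raise ValueError(f"traits selected in both lists: {sorted(overlap)}")
    return list(product(first_list, second_list))


print(all_pairs(["t1", "t2", "t3"]))
print(pairs_from_two_lists(["t1", "t2"], ["t3", "t4"]))
```

With n selected traits the two-trait option gives n(n-1)/2 pairs, whereas the two lists option gives the product of the two list sizes.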


3-6


Multiple-trait option: example

Note: The multiple-trait option is not available for Selective genotyping, Multiple-environment, and Multiple-family models. If the multiple-trait option was selected, it is possible to see the table with the number of objects for each trait. To do so, click the <Table of objects number> button.

In this example, the global number of objects was 152, and the global number of traits was 50.

Out of 50, 34 traits were selected, and the number of objects common to all of the 23 selected traits is 63.
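The "number of objects common to all selected traits" can be understood as the number of individuals that have a score for every selected trait. A minimal sketch follows, assuming that a missing score is represented by None; the table layout, trait names, and function name are invented for the illustration.

```python
def common_objects(trait_table, selected):
    """Indices of objects that have a score for every selected trait.

    trait_table maps a trait name to a list of scores, one per object;
    None marks a missing score (a convention assumed for this sketch).
    """
    n_objects = len(next(iter(trait_table.values())))
    return [i for i in range(n_objects)
            if all(trait_table[t][i] is not None for t in selected)]


traits = {
    "height": [1.2, None, 0.9, 1.1],
    "weight": [3.4, 3.1, None, 2.9],
    "yield": [None, 5.0, 4.8, 5.2],
}
print(len(common_objects(traits, ["height", "weight"])))
print(len(common_objects(traits, ["height", "weight", "yield"])))
```

Adding traits to the selection can only keep or reduce this number, which is why it is worth checking before fitting a multiple-trait model.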


3-7

Initial Submodel Default option

By default, it is assumed:

- no variance and covariance effects;
- no epistasis, for a model with two-linked QTLs.

To change the default, check the box <Specifying default form of initial model>. After clicking the <Add Model> button, a <Submodel> window appears that allows nullifying the effects chosen by default. This window may slightly change its form depending on the selected models and type of data. For more detail about Submodel options see Parts 4, 5, 8 and 9.

Trait normalization option

This option can be applied only to data with several environments (in QTL-Environment analysis). If the option <Trait normalization> is selected, trait normalization (scaling) is performed (for more detail see Part 8). Computation results may be presented in normalized (scaled) and non-normalized forms.

Selective genotyping option

This option is used in simulation of Selective genotyping data. See Part 10 for details.


3-8

Page 2: Extended Parameters

Click the tab <Extended Parameters> to move to page 2.

The model name is displayed. It is necessary to select:

- the calculation method: marker or interval analysis (by default, interval analysis);
- the marker restoration option (more details on the next page);
- the number of starting points (by default, 1).

In complicated cases (e.g., a two-QTL model across multiple environments, with a small sample size), the convergence of the optimization procedure for the log-likelihood function may strongly depend on the “starting” point (initial values of the genetic parameters). You may want to check the uniqueness of the obtained solution by choosing several starting points, thereby reducing the risk of not reaching the global maximum of the likelihood function.


3-9

Page 2: Extended Parameters (continued)

Marker Restoration option

This option is relevant in cases of real data or simulated data with nonzero percent of lost marker scores and/or missing information caused by marker dominance. If the option <Marker Restoration> is selected (default), missing markers will be restored.

If missing markers are not restored (when <Marker restoration> is off) then the number of objects may vary among marker intervals. In this case the option <LOD normalization> may be employed, although one should be careful in using it (see part 4 for details)

The option <Ignore Marker Loss> may be relevant in case of simulated data with nonzero loss of marker scores.


To reduce the effect of missing information, we calculate the probabilities of being a heterozygote and dominant homozygote for each dominant marker phenotype or missing marker phenotype, based on scores of the neighbor markers. These probabilities help us to calculate the likelihood function (e.g., Jiang & Zeng 1997; Jansen & deJong 1999; Peng et al. 2003).
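The idea of using neighbor markers can be illustrated in its simplest form: one dominant marker in an F2 population and one fully informative (co-dominant) neighbor at recombination rate r. The sketch below computes the probability that an individual with the dominant phenotype is a heterozygote or a dominant homozygote, given the neighbor genotype. It is a deliberately reduced illustration with invented names; the procedure used by the program and described in the cited papers is more general.

```python
def f2_genotype_given_neighbor(neighbor, r):
    """P(genotype at a locus | genotype at a neighbor locus) in an F2."""
    s = 1.0 - r
    table = {
        "AA": {"AA": s * s, "AB": 2 * r * s, "BB": r * r},
        "AB": {"AA": r * s, "AB": s * s + r * r, "BB": r * s},
        "BB": {"AA": r * r, "AB": 2 * r * s, "BB": s * s},
    }
    return table[neighbor]


def dominant_posterior(neighbor, r):
    """Probabilities of AA and AB for an individual showing the dominant phenotype."""
    p = f2_genotype_given_neighbor(neighbor, r)
    total = p["AA"] + p["AB"]          # the dominant phenotype excludes BB
    return {"AA": p["AA"] / total, "AB": p["AB"] / total}


print(dominant_posterior("AA", 0.1))
print(dominant_posterior("AA", 0.5))   # unlinked neighbor: the usual 1/3 and 2/3
```

With a closely linked neighbor the probabilities move far away from the uninformative values 1/3 and 2/3, which is how neighbor scores reduce the loss of information caused by dominance.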


3-10

Finish

By specifying the parameters of a model and clicking the <Add Model> button on the <Model Parameters> page, we add the model to the Project. You may use the <Model Scrolling> buttons to display all chosen parameters of the next or previous model. Click <Delete Model> to delete the chosen model (with all computation results related to it). Click <OK> to finish the Model Creation stage.

Upon finishing the Model Creation stage, the <Calculation Panel> window appears.

This window may also be accessed using the <Model Open> menu.

Click <About Model> to get a full description of the selected model (see next page). For a detailed description of the <Calculation Panel> see page 14.


3-11

About model

A description of the selected model is shown in the <About model> window.

By clicking <About Environment> we can display the number and names of the environments (where there are several). By clicking <About simulation> we see the <Setting the chromosomes> window (for simulated data only).

In case of Families data, the corresponding information can be obtained by pressing the <About families> button.


3-12

About model (continued)

About simulation for regular and multi-environment data

Click a chromosome name to get the number and lengths (in cM) of intervals.

By clicking the <Param> button we see the <Setting parameters Value> window.

The <QTL-s set> window appears when clicking a QTL’s triangle.


3-13


About model (continued)

About simulation for families data

In such cases, after the chromosome is selected, a <Families selection> window appears. Here, for each family, we can get information about the interval lengths in each chromosome.

Likewise, after clicking the QTL’s green triangle, we use the same <Families selection> window to get information on the relative position of the QTL within the target interval in each of the families.


3-14

Calculation Panel description

On this panel all chromosomes of the Problem and all selected traits are displayed. Traits are displayed according to the selected model: one trait, trait pairs, or several traits.

In order to compute a model for one square (e.g., chromosome - trait pair combination), click this square and then the <Compute> button. To analyze all combinations, click <All> then <Compute>. You may also choose any chromosome or trait separately and then click the <Compute> button.

Scrolling is available for chromosomes and traits.


For Families data you should also set the number of steps for calculation of the QTL position within the intervals (by default =10). For more detail see section Families.

We may choose any one model from all created models in the <Models> menu.


3-15

Results of calculation

While computing, the <Progress line> is updated in the window.

The calculation graphs are shown in compressed form. The maximum LOD value of each graph is also shown.



3-16

Results of calculation (continued)


Switch the check box <Global Bootstrap or Permutation test> to <On>; the window will change. We can now conduct Bootstrap analysis or the Permutation Test for several or all “cells” simultaneously. To do so, mark the cells of interest (they are highlighted in blue). The number of runs can be changed (by default, 1000). It is recommended to conduct the Permutation Test first and then the Bootstrap.

After the Permutation Test is conducted, the significance levels and the chosen numbers of permutation runs are shown in the cells.
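The logic of a permutation test can be sketched in a few lines. The example below is a simplified stand-in, not the program's procedure: instead of the LOD score it uses the absolute difference between the trait means of two marker classes as the test statistic, and the genotype codes, data, and function names are invented. The trait values are shuffled relative to the genotypes, the statistic is recomputed, and the significance level is the share of runs that reach the observed value.

```python
import random


def group_difference(genotypes, trait):
    """Absolute difference between the trait means of classes 'A' and 'B'."""
    a = [t for g, t in zip(genotypes, trait) if g == "A"]
    b = [t for g, t in zip(genotypes, trait) if g == "B"]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(genotypes, trait, runs=1000, seed=0):
    """Significance level of the observed statistic from `runs` permutations."""
    rng = random.Random(seed)          # fixed seed for a reproducible sketch
    observed = group_difference(genotypes, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        if group_difference(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (runs + 1)


genotypes = list("AAAAAAAABBBBBBBB")
trait = [10.1, 10.4, 9.8, 10.0, 10.3, 9.9, 10.2, 10.5,
         12.0, 12.3, 11.8, 12.1, 11.9, 12.4, 12.2, 11.7]
print(permutation_p_value(genotypes, trait))
```

More runs give a finer resolution of the significance level, which is why the number of runs is an adjustable setting.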


3-17

Results of calculation (continued)


For the same set of cells, let us now run the Bootstrap function. We should again mark these cells and press the <Bootstrap> button. As a result, the selected cells show that this function has been run, together with the number of runs.

Later, for any model from the <Calculation Panel>, we can switch the check box to <On> and, by pressing <All>, get full information about the conducted tests. More detailed information about the tests is provided later.


3-18

Save and Open the Project

We have created a Project and performed our first calculation on it. To save all the data, models and calculated results, select <File→Save Project> or <File→Save Project As>, or click the corresponding icon on the tool bar. A window for selecting the Project name will appear. Enter the file name and click <Save>. The extension *.job will be added to the file name.


To load a Project, choose <File→Open Project> from the main menu or click the green icon. The <Select JOB name to open> window appears. Select the desired Project by double-clicking the file name, or by selecting it and then clicking the <Open> button.


4-1

Part 4

Single-QTL model, one and two traits

Table of contents

Introduction
Description of the first Project (allMarker.job)
Displaying the results
Main Window for the Analysis and LOD graph
Occurred and Estimated minitables
  Interval analysis, one trait (Model mTr1int)
  Interval analysis, two traits (Model mTr2int)
Estimate option and Estimation table
  Interval analysis, one trait (Model mTr1Int)
  Marker analysis, one trait (Model mTr1mark)
  Interval analysis, two traits (Model mTr2int)
Defining and Fitting Submodel option
  Interval analysis, one trait (Model mTr1Int)


4-2


Part 4: Single-QTL model, one and two traits

Scanning option
Comparing hypotheses H1 H0 option
Compare Submodels option
Bootstrap analysis option
Distribution option
About submodel option
Main Window for the Analysis and LOD graph
Models with extended parameters (file “lossMarker.job”)
  Project with missing marker scores
  LOD Normalization option
  Ignore Marker Loss option
  Marker Restoration option


4-3

Introduction

In this part, we show how to perform the analysis, using simulated data. Two examples of such data are employed. Both examples are F2 populations, with scores from one environment, 200 genotypes (and phenotypes), and all markers on one chromosome.

The first example (in file “allMarker.job”) includes two traits and has no missing markers. We use it to show the overall result of performing an analysis.

The second example (file “lossMarker.job”) includes one trait with some proportion of missing markers. With this example we show how to use the model options <Marker restoration>, <LOD normalization> and <Ignore marker loss>. See also Part 3, page 9.



Description of the first project (allMarker.job)

This project includes 4 models: two models with one trait; and two models with two traits (interval and marker analysis in both cases).

Interval

Marker

4-4


Press any thumbnail graph to display the results of fitting the model for chosen chromosome- trait combination. After pressing the graph button, the LOD value is highlighted in red.


The information on each model can be displayed by clicking the <About Model> button on the <Calculation Panel>.

4-5

Description of the first project (continued)

We can display all the parameters of the simulated data by clicking the <About Simulation> button on the <About Model> panel. For details see Part 3, page 11.



The X axis of the LOD graph is in centiMorgans and carries marker bars. Placing the cursor on a vertical marker bar displays the marker name. For F2, F3, or F4 populations, red and blue denote dominant marker loci in repulsion phase, whereas green denotes codominant markers. For backcross (or doubled haploid, or RIL) populations, all markers are colored red.

Main Window for the Analysis and LOD graph

4-6

Displaying the results


First we select the model mTr1Int. For example, we can look at the trait2 graph (LOD=4.26).

The <Results menu> belongs to the new window with the LOD score graph.

For simulated data, the green triangle indicates the QTL. You can click it to see the QTL’s input parameters (user-specified) and the results of the simulation (the Occurred table).

4-7

Click any place in the interval space to see all the parameter values for this interval (the Estimate table for this interval). The selected interval is highlighted in gray.

Occurred and Estimated minitables


Interval analysis, one trait (Model mTr1Int)

Any graph can be printed by choosing the <OptionsPrint> option of the Results menu. The <Occurred> and <Estimated> tables, however, are not intended for printing.

Note that the sign of d represents the difference X(A)-X(B), or X(1)-X(2), where A and B (or 1 and 2) are the two homozygotes at the marker loci (clearly, you may use other designations, but the sign of d always follows the indicated direction). Likewise, h is calculated as the deviation of the heterozygote from the mid-parent value, i.e., X(H)-0.5[X(A)+X(B)], or X(3)-0.5[X(1)+X(2)].
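As a minimal illustration of these two definitions, the following Python sketch computes d and h from the mean trait values of the three genotype groups. The function name and the example means are hypothetical and are not part of MultiQTL, which obtains its estimates from the fitted model rather than from raw group means.

```python
def additive_and_dominance_effects(mean_a, mean_b, mean_h):
    """Return (d, h) from the mean trait values of the three genotype groups.

    mean_a, mean_b: means of the two homozygotes, X(A) and X(B)
    mean_h: mean of the heterozygote, X(H)
    """
    d = mean_a - mean_b                    # sign follows the direction A minus B
    h = mean_h - 0.5 * (mean_a + mean_b)   # deviation from the mid-parent value
    return d, h


# Hypothetical group means, used only to show the sign conventions.
d, h = additive_and_dominance_effects(mean_a=12.0, mean_b=8.0, mean_h=11.0)
print(d, h)  # 4.0 1.0
```

Swapping the designations of A and B changes the sign of d but leaves h unchanged.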


Choosing the <Estimates> option of the Results menu opens the Estimation table with the estimated parameter values for each interval. The interval with the global maximum LOD is highlighted in red, whereas intervals with a local maximum LOD are highlighted in blue.

4-8

The table can be opened in Excel by choosing the <OptionsOpen in EXCEL> option of the Results menu. The <Print> option is available for this table but is less effective than printing from Excel. P.E.V. [P.E.V.(ad)] is the percentage of explained variance [explained additive variance] of the trait.

Displaying the results: Estimates option

In this table, “Coordinate right end” is the distance (cM) from the beginning of the chromosome to the end of the interval, and “L” is the distance from the beginning of the chromosome to the point of maximum LOD in the interval.


Interval analysis, one trait (Model mTr1Int)


4-9

LOD Graph, Estimation table

Marker analysis, one trait (Model mTr1mark)

In the marker analysis case, the <Estimation table> shows the estimated parameter values at each marker locus.

We may click any marker to see the estimates of all the parameters for this marker.



4-10

LOD Graph, Occurred and Estimated minitables


For the two-trait analysis, the <Estimate> and <Occurred> tables show the estimated and occurred parameter values for each of the two traits (Korol et al. 1995).



4-11

Estimation table (two traits) Model mTr2int

The Estimation table shown here corresponds to the two-trait analysis case. The estimated parameter values are shown for each of the two traits.



4-12

Defining and Fitting Submodel option

Model mTr1int

The first calculation was performed (by default) under the assumption of equal residual variance in the QTL groups (i.e., “no variance effect”, for more details on variance effect model see Korol et al. 1996b).

We may analyze the data allowing for a variance effect or under other assumptions about the model parameters. This is done by defining the corresponding submodels, which are created with the <Submodel Add> option of the Results menu.

On the <Submodel> panel, all possible conditions for one trait model and F2 population are displayed.

Combinations of these conditions are also possible.



This window shows the LOD graphs for 4 submodels. The name of each submodel reflects the way it was computed. All options of the Results menu are applied to the selected submodel, which is selected by a left click. Up to 5 submodels can be created and represented by graphs simultaneously.

4-13

Model mTr1int

Defining and Fitting Submodel option (continued)


The selected submodel may be deleted with the <Submodel Delete> option.


Model mTr2int

Many more submodels may be created for the two-trait model.

For each selected submodel an <Estimated> table can be obtained. In this table the color of the rectangle denotes the current submodel.

4-14

Defining and Fitting Submodel option (continued)



The <Scanning> option of the Results menu can be used to show more detail (it also employs linear approximation, but with many more points per interval). The <Scanning Parameters Setup> panel is used to set it up.

4-15

Scanning option

The displayed LOD graph (for model mTr1int, trait1) is very coarse, due to linear approximation: maxLOD values in the intervals are connected to obtain the graph.

The resolution provided by the default scanning parameters is usually sufficient.



4-16

The result of scanning is shown in the current window. Note that the scanning option can improve the solution in some complicated situations. It complements the restricted maximum likelihood (ML) solutions, which represent the best point for each interval, by providing additional (intermediate) points for each interval. Because global multiparametric optimization of the ML function is challenging (in certain situations the landscape in the parameter space is very complicated), such scanning may help the optimization.

Clearly, the scanning option is not available for models with marker analysis. Choosing the <Unscanning> option restores the original graph.

Scanning option (continued)



4-17

Comparing Hypotheses H1 H0 option

To compare hypotheses H1 (there is a QTL in the chromosome) and H0 (no effect of the chromosome on the trait) you can use the permutation test (Churchill & Doerge 1994).

Choose the <SignificanceCompareHypotheses H1:H0> option of the Results menu. This test is an easy tool to check for significance. For example, we can examine model mTr1Int, trait2, submodel h=0 (yellow). Just press the <Start> button and the test starts. All the preset options can be changed. A permutation test can be continued: the <Number of Runs> parameter defines the number of runs to perform in the current session, whereas the <Permutation> window displays the total number of permutations performed so far.


Definitions: <Critical LOD value> is the maximum LOD value of the current submodel. <Overcome> is the number of permutation runs in which the computed result (max LOD) was above the critical LOD value. <Threshold LOD values> are the values exceeded in 5%, 1% or 0.1% of the permutation runs, respectively.
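The quantities defined above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the quantile convention used for the thresholds, and the significance taken as Overcome divided by the number of runs are our assumptions, not a description of the MultiQTL internals. The input is the list of maximum LOD values obtained in the permutation runs.

```python
import math


def permutation_summary(permuted_max_lods, critical_lod,
                        levels=(0.05, 0.01, 0.001)):
    """Summarize a permutation test from the max LOD of each permuted run."""
    runs = sorted(permuted_max_lods)
    n = len(runs)

    # "Overcome": runs whose max LOD is above the critical (observed) LOD.
    overcome = sum(1 for lod in runs if lod > critical_lod)

    thresholds = {}
    for alpha in levels:
        # Number of runs allowed to exceed the threshold (assumed convention).
        k = math.floor(alpha * n + 1e-9)
        # Value exceeded by exactly k runs when there are no ties.
        thresholds[alpha] = runs[max(n - k - 1, 0)]

    return {
        "overcome": overcome,
        "significance": overcome / n,  # assumed: share of runs above critical LOD
        "thresholds": thresholds,
    }


# Hypothetical permutation results: 1000 runs with max LOD values 0, 1, ..., 999.
summary = permutation_summary(list(range(1000)), critical_lod=950)
print(summary["overcome"], summary["significance"])
```

With only a few runs the 0.1% threshold is not meaningful, which is one reason the test can be continued to accumulate more permutations.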

Press the <Reset> button to clear the previous permutation results. After the permutation panel is closed, the significance value is displayed.


4-18

After fitting any two submodels you can test whether they differ statistically. Choose the <Significance Compare Submodels> option to compare all relevant submodels to the selected one (Ronin et al. 1999; Peng et al. 2003; Hanotte et al. 2003).

Compare Submodels option

For model mTr1int, the default submodel (see page 14) highlighted in red can be compared with three submodels. The submodel defined by maximal number of parameters is marked by a larger rectangle. In the <Comparison> window, select the submodel to compare to and click <OK>.


The <Comparison Test> window appears. Press the <Start> button and the test will run. All the initial parameters here may be changed. <Critical LOD increment> is the difference between the maximum LOD values of the compared submodels. A simulation method is used to compare the submodels (e.g., Peng et al. 2003).


Bootstrap analysis estimates the standard deviations of the main parameters by repeated re-sampling of the data (with replacement). Non-parametric selective and non-selective bootstrapping (Lebreton & Visscher 1998) is provided in MultiQTL.
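The idea of non-selective bootstrapping can be sketched in a few lines of Python. This is a generic illustration, not the MultiQTL implementation: the function name, the default number of samples and the use of a simple mean as the estimator are assumptions. In QTL analysis the estimator would be a full model fit returning, for example, the estimated QTL position.

```python
import random
import statistics


def bootstrap_sd(data, estimator, n_samples=1000, seed=0):
    """Estimate the standard deviation of `estimator` by re-sampling `data`.

    Each bootstrap sample draws len(data) items with replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_samples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Hypothetical trait values; the estimator here is just the sample mean.
trait_values = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
print(bootstrap_sd(trait_values, statistics.mean, n_samples=200))
```

The selective variant mentioned above is not shown here.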

4-19

Bootstrap analysis option

Choose the <SignificanceBootstrap> option of the Results menu. The <Bootstrap Test> window appears.

<Number of samples> may be changed. Press the <Start> button to start. If the permutation test has not been performed for the current submodel, <Threshold LOD values> are not created.


4-20

A histogram in the case of marker analysis looks as shown.

Bootstrap analysis option (continued)


If threshold LOD values are defined (in the permutation test), three new rows appear in the table. In this case, the Power value is given for the 0.05, 0.01, and 0.001 significance levels, respectively. The table of the <Bootstrap Test> window can be opened in Excel by clicking the <Open in EXCEL> button.
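One common way to obtain such a power value is to count the share of bootstrap samples whose maximum LOD exceeds each threshold. The Python sketch below follows this reading; it is an assumption about how the Power rows could be computed, and the function name and numbers are hypothetical.

```python
def detection_power(bootstrap_max_lods, threshold_lods):
    """Share of bootstrap samples whose max LOD exceeds each threshold.

    threshold_lods maps a significance level to its threshold LOD value,
    as obtained from a permutation test.
    """
    n = len(bootstrap_max_lods)
    return {
        alpha: sum(1 for lod in bootstrap_max_lods if lod > threshold) / n
        for alpha, threshold in threshold_lods.items()
    }


# Hypothetical values: four bootstrap samples and three thresholds.
power = detection_power([1.0, 2.0, 3.0, 4.0],
                        {0.05: 2.5, 0.01: 3.5, 0.001: 4.5})
print(power)  # {0.05: 0.5, 0.01: 0.25, 0.001: 0.0}
```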


4-21

Distribution option

Select a submodel and choose the <Distribution> option of the Results menu to see the distribution of the trait in the alternative marker or interval groups.

Trait2 of model mTr1int is shown in the current window. Interval number 4 is chosen. The number of steps in the histogram may be changed.


The <Distribution> panel of the two-trait model is shown.


4-22


Choose the <About submodel> option to display the submodel information. Examples of submodel names for the two-trait model and the one-trait model are shown below.

The rectangle color corresponds to the color of the submodel graph.

To mark the best submodel, use the <Mark this submodel> radio button. This submodel will be shown in the <Calculation Panel>.



4-23

Project with missing marker scores

Real data often have missing marker scores. In a special example (file lossMarker.job) we simulated data with 20% of marker scores missing. A simple model <missMark> was created for this project. Its LOD graph in the scanning form and its <Estimation table> are shown. In the <Estimation table> you can see a different number of genotypes (nObj) for every interval (due to missing data). Note that the largest LOD values did not appear here in the interval carrying the simulated QTL. The number of simulated phenotypes in the example was 200.

Models with extended parameters



LOD Normalization option

LOD value for every interval depends on the number of genotypes (nObj) for this interval. Because this number varies among intervals (due to missing data), the LOD graph (see previous page) may display a biased (non-objective) picture.

We can create a new model <normLod> with “corrected” nObj by scaling to the maximum nObj (see Part 3, page 9). The resulting graph is shown below.
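The exact correction used by MultiQTL is described in Part 3 and is not reproduced here. Purely as an illustration of what scaling to the maximum nObj could mean, the Python sketch below assumes that the LOD of an interval grows in proportion to its number of genotypes and rescales each value accordingly. Both the linear rule and the function name are assumptions.

```python
def normalize_lod(lods, n_obj):
    """Rescale each interval's LOD to the largest number of genotypes.

    Assumes (for illustration only) that LOD is proportional to nObj.
    """
    max_n = max(n_obj)
    return [lod * max_n / n for lod, n in zip(lods, n_obj)]


# Hypothetical intervals: the first one has only half of the genotypes scored.
print(normalize_lod([2.0, 3.0], [100, 200]))  # [4.0, 3.0]
```

Under this assumption an interval with many missing scores is no longer penalized merely for its smaller sample.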

4-24


Models with extended parameters (continued)


4-25

Ignore Marker Loss option

Using simulated data with missing marker scores, you can evaluate the impact of losing information (due to missing data) on mapping quality, compared to the “no missing data” case. For the latter case, you should calculate the results using the <Ignore Marker Loss> option.

Model <MLossIgnor> using this option was created. The result for this model is shown.




4-26

Marker Restoration option

The <Marker Restoration> option can be used for “virtual” restoration of missing marker scores based on the information on linked markers (see Part 3, page 9). This option is available for real and simulated data. By comparing the LOD graph below to the graph shown on the previous page (the same data set but with no missing marker scores) you can see how efficient this function was in the considered example.




5-1

Table of contents

Introduction

Parameters of simulated example (Qtl2Tr1_2.job)

Comparing the models

Computation results

Submodel option


Estimate option

Data analysis options Hypotheses H2H0 Hypotheses H2H1 Compare Submodels Bootstrap option

Distribution option

Part 5

Two-linked QTL model

2

3

4

7

10

13

14

16

17 18 19 21


5-2

Part 5: Two-linked QTL model

In order to show the analytical options of the two-linked QTL model, we give an example on simulated data with two QTLs.

Our example includes one chromosome with two interacting QTLs. The simulated data represent an F2 population phenotyped for two traits in one environment. The number of genotypes (and phenotypes) was 200. See the next page for details.

The project is named “Qtl2Tr1_2.job”. Open it and press the <AboutModel> button. All parameter values will be shown. To see the parameter values used in the simulation, press the <About Simulation> button.

Introduction


5-3

Parameters of simulated example

The simulated chromosome carries two QTLs, in the 3rd and 7th intervals. Parameters of these QTLs are shown in the <Setting QTL’s parameters> windows.

You can open the <Setting epistatic effect> window by left clicking on the point at the top of the dotted arc. The values of epistatic effects are shown in this window.



5-4

The created project includes 4 models: single-QTL analysis and two-linked QTL analysis, each for one trait and for two traits.

Interval mapping analysis was used in all four models. It is also possible to use marker analysis and all the techniques for restoration of missing marker scores.

Comparing the models



5-5

In this example, only one QTL (in the 7th interval) is clearly seen in the single-QTL analysis for both one and two traits.

One-trait analysis Two-trait analysis


Comparing the models (continued)


5-6

In the two-linked QTL analysis, we clearly see that the maximum LOD is reached for the pair of intervals 3 and 7, i.e., at the positions where the QTL effects were simulated.

One-trait analysis Two-trait analysis


Comparing the models (continued)


5-7

Computation results

For the two-linked QTL model, the results are shown in two windows:

- a small window with the names of the submodels

- a larger window with the LOD values for each pair of intervals of the chosen chromosome.

The global LOD maximum is highlighted in red; local maxima are marked in blue.

The three-dimensional graph may be rotated by pressing the <Rotate> button.



If you analyze simulated data, you may click the green QTL symbol to see the QTL’s input parameters and simulation results. The <Specified> and <Occurred> results will be shown.

5-8

Computation results (continued)

Click any cell to see its <Estimate table>.

The table contains all parameter values.



5-9



Computations in case of marker analysis


5-10

Submodel option

All submodel options available for single QTL analysis can now be used with two QTLs (see Korol et al. 1998a; Ronin et al. 1999; and references therein).

On the previous page we could see that the estimates of the main parameters may differ quite strongly from the simulated parameter values. This happens because we simulated a rather high level of epistasis, whereas the first (default) submodel was computed with zero epistasis values. Now let us create a new submodel allowing for epistasis. For that, select the <SubmodelAdd> menu option.

Set all epistasis=0 radio buttons to <OFF>.



5-11

The new submodel (green) is shown. Its <Occurred> and <Estimated> values are much closer to each other.

In the small window the two submodels are represented.

The ToolTip (prompt) shows the full name of the submodel.

Submodel option (continued)



5-12

Up to 5 submodels can be created and represented by windows simultaneously. All analytical options can be performed for the selected submodel. Select a submodel in the small window, and its full window becomes active.

Submodel option (continued)

The selected submodel may be deleted by choosing the <SubmodelDelete> option.

A submodel window may be closed without deleting the submodel. In this case the colored rectangle of the submodel disappears from the submodel name in the small window. If the submodel is selected again, its window reappears and becomes active. The <Order> button arranges the windows according to the submodel numbers in the small window.


5-13


Choose the <About submodel> menu option. This option shows submodel information. Examples of submodels for the one-trait model and the two-trait model are shown.

To mark the best submodel, click the <Mark this submodel> radio button.



The Estimate option gives you a great deal of information.

The interval pair with the global maximum LOD is highlighted in red, whereas interval pairs with local maxima of LOD are highlighted in blue.

As in single-QTL analysis and in all further models, P.E.V. [P.E.V.(ad)] is the percentage of explained variance [explained additive variance] of the trait relative to its phenotypic variation.

5-14

Estimate option

“Coordinate right end1” and “Coordinate right end2” in the <Estimate> table are the distances from the beginning of the chromosome to the end of the first and second intervals, respectively. “L1” and “L2” in the <Estimate> table are the distances from the beginning of the chromosome to the point of maximum LOD in the first and second intervals, respectively.


5-15


In this case the table differs slightly from that of the interval analysis.

The form of the table in case of marker analysis

Estimate option (continued)

The table can be opened automatically in an Excel spreadsheet with the <OptionOpen in EXCEL> menu option.


We can compare hypotheses H2 (two linked QTLs), H1 (a single QTL in the chromosome) and H0 (no QTL in the chromosome) by using the <SignificanceCompare> options.

For comparing hypotheses H2 vs. H0 we use the permutation test. It is identical to the test of H1 vs. H0 in the single-QTL case.

Comparing hypotheses H2 H0


5-16

Data analysis options


Data analysis options (continued)

For testing H2 vs. H1 we use the simulation method (parametric bootstrapping; Walling et al. 1998; Korol et al. 1998a; Ronin et al. 1999; Peng et al. 2003).

Both tests run much slower than in single-QTL models because the computations go over all pairs of intervals.


5-17

The results of the tests can be seen in the <About submodel> table.

Comparing hypotheses H2 H1


5-18

The <CompareSubmodels> option is identical to the same option in the single-QTL case.

Compare Submodels




5-19

Bootstrap option

Choose the <SignificanceBootstrap> menu option.

This test for the two-linked QTL model is identical to the one for the single-QTL case. However, instead of a histogram, color intensity is used to show the distribution of outcomes among the pairs of intervals. The numerical value of the intensity (“histogram bin”) is displayed by left-clicking a square.




5-20

Bootstrap option (continued)


The form of the table in case of marker analysis.



This option shows how the two intervals of QTL location affect the trait distribution. It is identical to the distribution option of the single-QTL case, except that here two intervals of QTL location must be defined.

5-21

Distribution option

The putative QTL location and the number of intervals on the histograms can be changed.

Choose <Distribution> menu option.



6-1

Part 6 Multiple-trait model

Table of contents Introduction

Single QTL model (MyExTr6_4chr.job and Tr6_Chr4F2.job) Parameters of simulation example Computation results Submodels option About submodel option Significance options

2

3 4 6 7

8

9 10

Permutation test (H1H0) Bootstrap Traits contribution

Two-linked QTL model Parameters of simulation data example Computation results Submodels option About submodel option

Significance options Compare Submodels option Permutation test (H1H0) and (H2H1) Traits contribution

Bootstrap

13 14 16 18

19 20 21 22


6-2


The details of the method adopted here for multiple-trait analysis are described in Korol et al. (2001) (see also Korol et al. 1995; Zeng & Jiang 1995; Ronin et al. 1999; and references therein). Note that in our algorithms the multivariate QTL mapping is based on ML analysis with interval-specific transformations of the trait space (Korol et al. 2001).

In the simplest case of two non-correlated traits, the advantage of joint analysis of two traits is in the increase of the “multivariate effect” according to $d^2 = (d_x/\sigma_x)^2 + (d_y/\sigma_y)^2$, where $d_x$ and $d_y$ are the substitution effects of the QTL for traits x and y, and $\sigma_x$ and $\sigma_y$ are the corresponding standard deviations within the QTL groups (residual standard deviations). In the case of correlated traits, the potential gains from joint mapping analysis, compared to single-trait analysis, are due to: (i) the pleiotropic effects of the QTL on x and y; (ii) residual correlation between x and y (within the QTL groups) caused by non-genetic effects and segregation of unlinked QTLs; and (iii) the combined effect of both factors (i) and (ii) (Korol et al. 2001). Joint analysis has proved to be an efficient tool, under certain conditions, for increasing QTL detection power, mapping resolution, and accuracy of the estimated parameters. It may also increase the power of discriminating among various hypotheses concerning the genetic architecture of the trait, such as linkage versus pleiotropy.
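For non-correlated traits, the multivariate effect is the square root of the sum of squared standardized effects, each substitution effect being divided by its residual standard deviation. The short Python sketch below evaluates this expression; the function name and the numbers are hypothetical, and the formula as written applies only when the residual correlation between the traits is zero.

```python
import math


def multivariate_effect(effects, residual_sds):
    """Combined standardized effect d for non-correlated traits.

    effects: substitution effects of the QTL, one per trait
    residual_sds: residual standard deviations, one per trait
    """
    return math.sqrt(sum((d / s) ** 2 for d, s in zip(effects, residual_sds)))


# Hypothetical standardized effects of 0.3 and 0.4 combine into roughly 0.5,
# which is larger than either single-trait effect.
print(multivariate_effect([0.3, 0.4], [1.0, 1.0]))
```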

Introduction

In order to demonstrate the options of the Multiple trait QTL analysis and the performance of the employed algorithms, we provide two examples of simulated data with several traits.


6-3

Parameters of simulation example

Parameters of QTL3 on chromosome chr2 and QTL4 on chromosome chr3, together with the general parameters, are displayed. Open the project and press the <AboutModel> button. All data parameters will be shown. To see the simulation parameters, press the <AboutSimulation> button.

Part 6: Multiple-trait model Single-QTL model

The file with the first example is named “MyExtr6_4chr.job”. The simulated data represent a backcross population with 4 chromosomes, with the phenotype (6 traits) scored in one environment. The number of genotypes (and phenotypes) is 200.


6-4

Computation results

We shall show the performance of the multi-trait single-QTL model on this example, for chromosomes 2 and 3. A model with all six traits was created. The scanned LOD graph of the second chromosome is displayed.

Estimated and simulated characteristics are shown.


Single-QTL model


6-5


The Estimates table can be obtained with the <Estimate> menu option. The interval with the global maximum LOD is highlighted in red.


The estimates of the genetic parameters for each trait are shown, including the pleiotropic effects eff.(d) of the putative QTL and the percentage of explained phenotypic variation (P.E.V.).

Single-QTL model (continued)

Important comment: Not every trait combination can be treated by multitrait QTL analysis. The analysis is impossible if the residual variance-covariance matrix is degenerate (i.e., its determinant is zero). This condition is therefore tested during the analysis, and if it holds, the trait(s) causing it are detected and reported. In the example shown here the traits that cause the condition are displayed in the “cells”. As you can see, the set of such traits may vary among the chromosomes. To continue the analysis, you should build a new multiple-trait model that does not include any of these troublesome traits and repeat the analysis.
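The degeneracy condition can be illustrated with a small pure-Python check. The sketch below builds a sample covariance matrix and tests whether its determinant is numerically zero. It is not the MultiQTL procedure: the function names and the tolerance are assumptions, and the sketch does not identify which trait causes the problem.

```python
def covariance_matrix(rows):
    """Sample covariance matrix; rows = individuals, columns = traits."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows)
             / (n - 1) for j in range(k)] for i in range(k)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # no usable pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_degenerate(matrix, tol=1e-9):
    """True if the determinant is zero within an (assumed) tolerance."""
    return abs(determinant(matrix)) < tol


# Hypothetical residuals in which the second trait is exactly twice the first.
residuals = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
print(is_degenerate(covariance_matrix(residuals)))  # True
```

A trait that is an exact linear combination of other traits, as in this example, is the typical cause of a zero determinant.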


6-6

Submodel option

In this case the submodel contains only a subset of the traits. Use the <SubmodelAdd> menu option, select the desired traits in the <Submodel> window and press the <OK> button.

A submodel with trait1, trait3, trait4 and trait5 was created in this example. Later we will consider how to employ the special options of MultiQTL for the analysis of multi-trait complexes in order to select a “reasonable” subset of traits to remain in the final model.

A submodel can be removed with the <SubmodelDelete> option.




6-7



The <About submodel> menu option displays information on the submodel selected in the graphs window. The color of the rectangle in the upper-left corner of the <About submodel> panel matches the color of the corresponding graph. Three submodels are shown in this example.


Note that in multitrait analysis, submodels that differ in the number of traits are compared using the Trait contribution function, whereas the Compare Submodels option serves for comparing two-QTL vs. single-QTL models or two-QTL models with and without epistasis.


6-8

Significance options: H1 H0

The <Distribution> menu option is not available for the multi-trait model. The <Significance Compare H1H0> option was chosen for analyzing the results on chromosome 2, submodel 3. It is identical to the same option in other single-QTL models.




6-9

The <SignificanceBootstrap> option is identical to the same option in other single-QTL models. Note that in the bootstrap table the parameters of all traits are displayed.



Significance options: Bootstrap


6-10

Significance options: Trait contribution

<SignificanceCompareTrait Contribution> is a special option for the multitrait model. Choose this option for the selected submodel. The trait significance panel appears. Press the <Start> button. Two tests are then conducted simultaneously for each of the traits, based on permutations of the trait values relative to the set of the remaining traits and the marker set (Korol et al. 2001): (a) a test of the trait “contribution” to the LOD, and (b) a test of the putative QTL effects, additive and/or heterozygous.

Single-QTL model (continued)

These tests are needed because joint analysis of a multitrait complex is not automatically more informative than the analysis of a sub-complex or even of some single traits (Korol et al. 1995, 2001). An improvement is expected when the interval (QTL) affects several traits and/or when some of the traits are correlated (see page 6-2). If too many traits are included, some may be non-informative or depend on different regions of the same chromosome, thereby increasing the uncertainty of the QTL location compared to the simple single-QTL analysis. Our advice is to use complexes of functionally related and/or correlated traits for joint analysis.


6-11


We obtain the significance of each trait for the current submodel. The significance is defined for the LOD, the substitution effect {d}, and the heterozygous effect {h} (for F2 or F3 populations). To demonstrate how to employ this function in order to “optimize” the trait complex, we will use the example in file Tr6_Chr4F2.job. The LOD graph for the 6-trait complex and the results of the trait contribution test are shown.

Clearly, the next steps are to remove trait #6 from the set, re-analyze the reduced set, then remove (if necessary) the next non-significant trait, etc. For this example, after step 3 no further reduction of irrelevant (“parasitic”) traits is possible. The results are shown on the graphs below. Please note that if you are going to try multiple-trait analysis with MIM (see part 7), the same trait set must be employed for all chromosomes. Thus, for MIM you should use either the initial set or a reduced set, with the same traits for all chromosomes.
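The stepwise reduction described above can be summarized as a simple loop. The Python sketch below is an outline under stated assumptions: `contribution_p_values` is a hypothetical stand-in for re-running the analysis and the trait contribution test on the current trait set, the 0.05 cut-off is an assumed choice, and removing the least significant trait first is our reading of the procedure. In the toy example the p-values are fixed, whereas in practice they change after each re-analysis.

```python
def reduce_trait_set(traits, contribution_p_values, alpha=0.05):
    """Drop non-significant traits one at a time until all remaining are significant.

    contribution_p_values(current_traits) must return a dict mapping each
    trait to the p-value of its contribution test for that trait set.
    """
    current = list(traits)
    while len(current) > 1:
        p = contribution_p_values(current)         # re-analyze the current set
        worst = max(current, key=lambda t: p[t])   # least significant trait
        if p[worst] <= alpha:
            break                                  # every trait contributes
        current.remove(worst)
    return current


# Hypothetical, fixed p-values used only to make the sketch runnable.
FAKE_P = {"trait1": 0.001, "trait2": 0.20, "trait3": 0.001,
          "trait4": 0.30, "trait5": 0.002, "trait6": 0.60}


def fake_test(current_traits):
    return {t: FAKE_P[t] for t in current_traits}


print(reduce_trait_set(sorted(FAKE_P), fake_test))
# ['trait1', 'trait3', 'trait5']
```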


Significance options: Trait contribution (continued)


6-12


For the simplified 3-trait model (highlighted in dark yellow) the QTL detection power was P=99.9% (at significance 0.001) and the accuracy (standard deviation) of the estimated QTL position was S_L=7.1 cM (the first bootstrap graph), whereas for the initial 6-trait model we obtained P=99.7% and S_L=7.8 cM, respectively.

For the traits with the strongest QTL effects on the considered chromosome the results were: for trait 3, P=84.4% and S_L=12.2 cM; for trait 5, P=92.2% and S_L=13.4 cM (the second graph).


Significance options: Trait contribution (continued)


6-13

Consider, for example, chromosome #1. We shall show how to use the options of multitrait two-linked QTL analysis. The simulated data represent a backcross population with one chromosome and 10 traits scored in one environment. The number of genotypes (and phenotypes) was 200.

Parameters of simulation data example

Parameters of QTL 1 and QTL 2

and epistatic parameters are displayed.

Two-linked QTL model



6-14

Computation results

One model with all six traits was created.

Simulation (left click on single QTL) and computation (left click on any square) results are shown.


Two-linked QTL model (continued)


6-15


A global estimates table can be obtained with the <Estimate> menu option. The interval pair with the global maximum LOD is highlighted in red, whereas interval pairs with a local maximum LOD are highlighted in blue.


We should note that in multiple-trait analysis the two-QTL model is computationally challenging, and convergence to the solution is not guaranteed for an arbitrary trait complex and marker data set.



6-16

Submodel option

Use the <SubmodelAdd> menu option, select the desired traits in the <Submodel> window and press the <OK> button.

In this case, the submodel contains only a subset of the traits, with the possibility to include epistasis (by default, no epistasis). For more details about submodel analysis combined with trait contribution analysis, please see the examples for single-QTL analysis (page 6-11).

In addition to the basic model of no epistasis, three submodels were created in this example:

- epistasis fitted for all 6 traits;

- traits 2, 3, 4, 5 selected, no epistasis;

- traits 2, 3, 4, 5 selected, with epistasis.




6-17


Three-dimensional graphs of the initial model (red) and all three submodels are displayed. The small window shows the corresponding radio buttons for the four submodels.

The ToolTip (prompt) shows the full name of the submodel.




6-18


The general information about each submodel may be displayed by the <About Submodel> menu option.




6-19

In this model the <SignificanceCompareSubmodel> option is available. Only submodels with identical sets of traits, with and without epistasis, may be compared.

In this example two submodels with traits #1-6 are compared.

Significance options: Compare Submodels option




6-20

Significance options: H2H0, H2H1

The <SignificanceCompare H2H0> and <SignificanceCompare H2H1> options are identical to the same options in other two-linked QTL models.




6-21

Traits contribution


This option is identical to the corresponding option for the single-QTL model (see page 11).



6-22

Bootstrap


This option is identical to the corresponding two-linked QTL option for the one- or two-trait model (see part 5, page 19). Note that in the bootstrap table the parameters of all traits are displayed.



7-1

Table of contents

Introduction

Simulated example (MultiSet.job) Windows for creation of a multi-chromosome set

Part 7

Multiple chromosome analysis

Selection of traits for single- or two-trait models Selection of traits for multitrait models

Creation of multi-chromosome set Multiple interval mapping (MIM) Results of the Multiple interval mapping (MIM)

Multi-chromosome set for two-trait analysis Multi-chromosome set for multiple-trait model

Multiple Simulation options

2 3 5

6

6 7 8

11 11 14 16


7-2

In the previous parts, we demonstrated how to conduct QTL analysis for each chromosome separately. Here we show how to use the MultiQTL tools for whole-genome analysis. The examples include Monte-Carlo simulation (for simulating multiple samples) and Multiple Interval Mapping (MIM) for a single (real) data set (Kao et al. 1999). It should be stressed that combining MIM with other “multiple” approaches, i.e., multiple-trait (part 6), multiple-environment (part 8), and multiple-family (part 9) analysis, allows high-quality mapping, including a further increase in QTL detection power and in the accuracy of the estimated QTL position. Our own experience indicates that even with a rather modest sample size (n=100-200) one may reach fine mapping (with the standard deviation of the QTL position being 2-3 cM or even less than 1 cM).

There are three possibilities to start multiple interval analysis:

1. From <MultiSet> menu option of Data Analysis window

2. From icon

3. From <MultiSet> option of the main menu

Part 7: Multiple chromosome analysis

Introduction


7-3

Simulated example

To show multiple interval mapping (MIM) analysis, a project based on simulated data, Tr6-4chrF2.job, is used. The simulated data represent an F2 population with six traits, one environment, and four chromosomes. The number of genotypes (and phenotypes) is 500. In the current 2.5 version of MultiQTL (unlike the 2.4 one), Multiple Interval Mapping analysis is also possible for multiple-trait models, multiple-environment models, and multiple-family models. To demonstrate single-trait analysis we have prepared the tr1Q1 and tr1Q2 models; for two-trait analysis, the tr2Q1 and tr2Q2 models; and for multiple-trait analysis, the mltrQ1Tr4_1 and mltrQ2Tr4_1 models.


QTL parameters on the first chromosome: parameters of QTL 1 and QTL 2


7-4

Simulated example (continued)


Parameters of QTL3 on chromosome chr2 and QTL4 on chromosome chr3, together with the general parameters, are displayed.


7-5

Windows for creating a multi-chromosome set

After you open the <MultiSet> option, the <Multichromosome set> window appears.

The window is either empty or lists the multiple chromosome sets created before. To open an existing set, click its name with the left mouse button and press the <Open Set> button. You can also delete a chosen set with the <Delete Set> option or copy it under a new name with the <Copy Set> option. To create a new multichromosome set, click <Add New Set>. The <Selection of traits> window will appear; it differs according to the type of models fitted at the single-chromosome stage of the analysis (see next page).


<Multichromosome set> window


7-6

In these windows you should enter a name for the multiset, choose the number of traits according to the model (one, two, or multiple) and select trait names for it. Press <OK> button.


<Selection of traits> window for single- or two-trait models

For multitrait models, we can choose a model from a set of models with different numbers of traits that were created and computed for each chromosome. Our example includes a model with six traits and two models with five traits. We enter the multiset name and choose the model class: Multitrait. To find out which traits were entered into the set, mark the set with the left mouse button. Then press the <Select> button to confirm the choice.

Windows for creation of a multi-chromosome set (continued)

<Selection of traits> window for multitrait models


7-7

Creation of multi-chromosome set


We can add a chromosome to the set. For that, select a chromosome name and a model name. Choose a submodel from the <Submodel name> window and click <Add to Set> button to add chromosomes to the set. To delete any chromosome from the set, choose it by left click on its name and press the <Delete from Set> button.

For the example, we consider a set twoTrait (tr5_2) with two traits.

As a result, we will obtain a MIM set with four chromosomes with different models and submodels. Click <Open Set> button for further work.


7-8

Multiple interval mapping (MIM)

A new window appears with graphs of all selected chromosomes and a menu with multi-chromosome analysis options. The menu becomes visible only after activation of the window. To perform multiple QTL analysis, select <MIM> option.


Our ‘Multiple Interval Mapping’ algorithm reduces the background (non-controlled) variation by taking into account QTL effects from other chromosomes (Jansen & Stam 1994; Zeng 1994; Kao et al. 1999). In our software, it consists of sequential chromosome analysis for QTL presence while subtracting the QTL effects of other chromosomes. Since the positions and effects of these QTLs are unknown in advance, the algorithm is iterative.
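The iterative idea described above (estimate one chromosome while subtracting the current effects of the others, and repeat until the estimates stop changing) can be illustrated with a minimal sketch. This is not the MultiQTL implementation: the function names are hypothetical, a single marker per chromosome stands in for the QTL position, effects are purely additive, and ordinary least squares replaces the likelihood-based fitting used by the package.

```python
def fit_effect(genotypes, values):
    """Least-squares slope of `values` on the genotype codes (with intercept)."""
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    v_mean = sum(values) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (v - v_mean) for g, v in zip(genotypes, values))
    return sxy / sxx if sxx > 0 else 0.0


def backfit_effects(markers, trait, tol=1e-8, max_iter=100):
    """Re-estimate one effect per chromosome until the estimates stabilize.

    markers: dict mapping a chromosome name to a list of genotype codes.
             One marker per chromosome stands in for the QTL (a simplification).
    trait:   list of trait values, one per individual.
    Returns (effects, number_of_iterations_used).
    """
    effects = {name: 0.0 for name in markers}
    for iteration in range(1, max_iter + 1):
        largest_change = 0.0
        for name, genotypes in markers.items():
            # Subtract the current effects of all OTHER chromosomes ...
            residual = [
                y - sum(effects[other] * markers[other][i]
                        for other in markers if other != name)
                for i, y in enumerate(trait)
            ]
            # ... and re-estimate this chromosome on what is left.
            new_effect = fit_effect(genotypes, residual)
            largest_change = max(largest_change, abs(new_effect - effects[name]))
            effects[name] = new_effect
        if largest_change < tol:
            return effects, iteration
    return effects, max_iter


# Hypothetical noise-free example: trait = 5 + 2*chr1 - 1*chr2
markers = {
    "chr1": [0, 1, 2, 0, 1, 2, 0, 1],
    "chr2": [0, 0, 1, 1, 2, 2, 1, 0],
}
trait = [5 + 2 * a - b for a, b in zip(markers["chr1"], markers["chr2"])]
effects, iterations = backfit_effects(markers, trait)
print(effects, iterations)
```

In this toy example the two markers are correlated, so the first pass gives biased estimates; the repeated passes remove the bias, which is the reason the procedure has to be iterative.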


7-9

Multiple interval mapping (MIM) (continued)


First, the most powerful QTL is found and its effect is subtracted. Then the next most powerful QTL is searched for and its effect is subtracted, and so on. This procedure is repeated until no more QTLs are found on the remaining chromosomes. Assuming no interaction, the QTL effects are re-evaluated by fitting the QTLs from other chromosomes in the order of their power. This procedure is applied repeatedly until the difference between the parameters of each QTL on two consecutive iterations is less than a preselected value.

Here we need to explain the meaning of the parameters appearing in the window. “Level” is the significance level that an individual chromosome effect must reach for the chromosome to be included in the MIM model, whereas “Precision” is the standard error of this Level. You may use the default value (0.05) or increase or decrease this level. For example, you may increase it to 0.1 in order to allow less significant effects to be temporarily included in the model (in the hope that the reduction of residual variation due to other included QTLs can make some small effects “significant”). Then, after the model is built, you may return to the initial 0.05 or even set a more stringent level. This is similar to the procedures of forward and backward stepwise regression. “Min. Permut” and “Max. Permut” are the minimum and maximum numbers of permutations conducted at each round of the MIM process.


7-10

Multiple interval mapping (MIM) (continued)


The yellow arrow icon shows the currently treated chromosome; the blue sign marks a chromosome without detected QTL, and a red-green “molecule” marks a detected QTL.

At any stage, if after the minimum number of permutations (Min. Permut) the trial chromosome does not meet the significance condition defined by the Level, the MIM algorithm moves on to test the next chromosome. If the chromosome is not dropped after the minimum number of permutations, testing continues until one of the following events occurs: (a) a non-significant level is reached before the number of permutations reaches Max. Permut, and the chromosome is marked as non-significant; or (b) the p-value (significance level) falls below Level - 2 × Precision before (or just when) the number of permutation runs reaches Max. Permut, and the chromosome is marked as significant for the current iteration. After a few or even a few dozen iterations the process converges. You may, however, want to continue the iterations with more demanding permutation numbers.
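The stopping rule described above can be sketched as a sequential permutation test. The sketch below is only one possible reading of the rule: the function name, the default Precision of 0.01, the use of a caller-supplied `draw_null` function for the permuted statistic, and the decision to treat a chromosome that is still undecided at Max. Permut as non-significant are all assumptions, not documented MultiQTL behaviour.

```python
import itertools


def sequential_permutation_test(observed, draw_null, level=0.05, precision=0.01,
                                min_permut=100, max_permut=1000):
    """Return (significant, p_estimate, permutations_used).

    observed:  test statistic (e.g. max LOD) of the real data.
    draw_null: function returning the statistic of one permuted data set.
    """
    exceed = 0
    for n in range(1, max_permut + 1):
        if draw_null() >= observed:
            exceed += 1
        if n < min_permut:
            continue  # no decision before the minimum number of runs
        p = exceed / n
        if p > level:
            return False, p, n  # event (a): marked as non-significant
        if p < level - 2.0 * precision:
            return True, p, n   # event (b): marked as significant
    # Assumption: still undecided at Max. Permut counts as non-significant.
    return False, exceed / max_permut, max_permut


# Hypothetical borderline case: every 25th permutation exceeds the observed
# statistic, so p stays near 0.04 and the test runs to Max. Permut.
counter = itertools.count(1)
borderline = sequential_permutation_test(
    10.0, lambda: 20.0 if next(counter) % 25 == 0 else 0.0, max_permut=200)
print(borderline)
```

A clearly significant or clearly non-significant chromosome is decided right after Min. Permut runs, whereas a borderline one consumes the full permutation budget, which is why larger permutation numbers mainly cost time on borderline effects.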

Click the <Start> button of the <Multiple interval mapping> window.

Press the <Close> button after the process finishes.


7-11

After the calculations, all LOD graphs appear. The blue triangle indicates the position of the QTL after MIM analysis. The <after> button is ON. To see the LOD graph of each chromosome before MIM analysis, i.e., based on single-chromosome analysis, click the <before> button. Click the chromosome name to see a large graph in the <before> or <after> state. Use the <view> menu option to switch the <before>/<after> buttons for all chromosomes simultaneously.


Results of the Multiple interval mapping (MIM)

P.E.V. [P.E.V.(ad)] is the percentage of explained variance [explained additive variance] of the trait
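As an illustration of what a “percentage of explained variance” measures, the following minimal sketch compares the residual variance (after subtracting fitted values) with the total trait variance. The manual does not state the exact formula used by MultiQTL, so the definition 100 × (1 - residual variance / total variance) and the function name are assumptions.

```python
from statistics import pvariance


def percent_explained_variance(trait, fitted):
    """100 * (1 - residual variance / total variance); an assumed definition."""
    total = pvariance(trait)
    residual = pvariance([y - f for y, f in zip(trait, fitted)])
    return 100.0 * (1.0 - residual / total)


# Hypothetical values: a perfect fit explains 100%, a constant fit explains 0%.
trait = [1.0, 2.0, 3.0, 4.0]
print(percent_explained_variance(trait, trait))
print(percent_explained_variance(trait, [2.5, 2.5, 2.5, 2.5]))
```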

Multi-chromosome set for two-trait analysis


7-12

On the graph obtained by MIM analysis you can see both green (only for simulated data) and blue (QTL fitted using MIM analysis) symbols. Click the blue QTL symbol to see the table of “Detected” values.

To compare the results from <before> and <after> graphs, open both of them. You can open a graph of a model obtained after MIM and further tune this model by creating its derivative submodels and compare these with the obtained MIM model.


Results of the Multiple interval mapping (MIM) (continued)


7-13


A graph for a “before”-MIM model

Submodel for the “after” MIM model and its general characteristics

A graph for an “after”-MIM model



7-14


By pressing buttons <General PEV> or <General PEV(ad)> we obtain PEVs for each of the traits


Multi-chromosome set for multiple-trait model


7-15


We can compare the results for each chromosome (chr. 2 in graphs below) obtained by single-chromosome analysis and by MIM (“before” and “after” variants).

You can work further with the results of MIM by building new submodels, e.g., based on “trait contribution” analysis in the case of combined multitrait-MIM analysis (see the green submodel above).



7-16

Multiple Simulation option

This regime is possible for simulated data only! To perform multiple simulations, select <MultiSimulation> from the menu.

The <Multiple simulation> window appears. It has two options. The <Calculating Empirical Threshold> option lets you evaluate the QTL detection power in multiple simulation analysis. Choosing this option opens the <Calculation Emp> window. Press the <Start/Continue> button of this window to start the process. New diagrams appear in the <Multichromosomes Set> window:

- for a multiple set with one or two traits
- for a multiple set with a multitrait set


7-17

After closing the <Calculation Empirical Threshold> window we can begin the multiple simulation process. This process may also be performed without first calculating the threshold. Press the <Multi-simulation> button and then the <Start/Continue> button in the <Multisimulation Test> window.

Multiple Simulation options (continued)


The diagrams in the <Multichromosome Set> window change continuously during the calculations. Close the <Multisimulation Test> window to see the results of the multiple simulation process.

Parameter <Samples> of this window can be changed.

Now close the <Multisimulation Test> window.


7-18

Click a chromosome name to see the results.

Click the <Open in EXCEL> button in the table of the result to open it in EXCEL.

Choose <Experimental levels> of the test statistics threshold to see the corresponding detection power for each chromosome.
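The relationship between an empirical threshold and detection power can be sketched as follows. The threshold is taken from test statistics simulated without a QTL, and the power is the share of simulated samples (with a QTL) whose statistic exceeds it. The function names and the simple order-statistic rule are assumptions for illustration; they are not taken from the MultiQTL code.

```python
def empirical_threshold(null_statistics, level=0.05):
    """Value exceeded by about `level` of the statistics simulated under no QTL."""
    ordered = sorted(null_statistics)
    n = len(ordered)
    # Number of null statistics allowed to lie strictly above the threshold.
    allowed = min(max(round(level * n), 0), n - 1)
    return ordered[n - allowed - 1]


def detection_power(statistics, threshold):
    """Fraction of simulated samples whose statistic exceeds the threshold."""
    return sum(1 for s in statistics if s > threshold) / len(statistics)


# Hypothetical numbers: 100 null statistics and 4 samples simulated with a QTL.
threshold = empirical_threshold(range(1, 101), level=0.05)
print(threshold, detection_power([90, 96, 100, 50], threshold))
```

A stricter level raises the threshold and therefore lowers the detection power computed from the same simulated samples.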

Multiple Simulation options (continued)

If the <Computed> radio button is ON, all rows show deviations from the mean values. If <Occurred> is selected, deviations from the occurred values are shown.


8-1

Part 8 Multiple environments

Table of contents

Introduction

Single-QTL analysis for one-trait model
Parameters of simulated data (multiEnv.job)
Results for model with trait normalization
Results for model without trait normalization
Submodel option
Submodel compare option
Distribution option
Significance options

8-2


Single-QTL and Two_linked QTL analysis for two-trait model
Parameters of simulated data (multiEnv3chrom.job)
Results of computation
Estimate option
Submodel option
Submodel compare option

Two_linked QTL analysis for two-trait model
Results of computation
Submodel option
Significance options

Multichromosome set
Multiple interval mapping (MIM)
Multisimulation option

Format transition between “multiple-environment” and “multiple-trait” formats


8-3

Introduction

In the current version of the package we consider only the case when the same mapping population is phenotyped in several environments (for description of the analytical model for such a situation see Jansen et al. 1995). The marker genotypes are the same for all environments, but the phenotypes are different for each environment; thus the number of phenotypes is equal to that of genotypes multiplied by the number of environments. For this type of data all models in the package are available, excluding multi-trait models that will be implemented in further versions. We will also implement the approximate model based on environment “bioindication” principle, with no limitation on the number of environments (Korol et al. 1998). Still, the current version allows two-trait analysis with multiple environments, as well as all the aforementioned options combined with MIM. The number of environments in the current version may be relatively large, hence the number of parameters may also be large. In order to reduce the number of parameters, it is possible to create a model with trait centering / normalization for each environment (see page 7 of Part 3). To show how to analyze data in this case, we give an example of simulated F2 data for one trait scored in six environments. The number of genotypes is 100; phenotype number is 600; file name multiEnv.job.

Part 8: Multiple environments

For data scored in multiple environments we provide a technical function, “normalization”. Its purpose is to transform the data so that the residual variance is the same in all environments. The Estimates tables can display the results in either normalized or non-normalized form, according to the user’s choice.
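A minimal sketch of per-environment centering and scaling is given below. It standardizes the total trait variance within each environment, which is only a simple stand-in for the transformation described above (which targets the residual variance); the function name and data layout are hypothetical.

```python
from statistics import mean, pstdev


def normalize_by_environment(scores):
    """Center and scale the trait scores separately within each environment.

    scores: dict mapping an environment name to a list of trait values.
    """
    normalized = {}
    for environment, values in scores.items():
        m = mean(values)
        s = pstdev(values)
        normalized[environment] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return normalized


# Hypothetical scores: the second environment has a tenfold larger scale.
scores = {"env1": [1.0, 2.0, 3.0], "env2": [10.0, 20.0, 30.0]}
print(normalize_by_environment(scores))
```

After the transformation both environments are on the same scale, so a single variance parameter can describe them, which is how normalization reduces the number of model parameters.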


8-4

Single-QTL analysis for one-trait model

In order to see the general simulation parameters press <Param> button and panel <Setting parameter value> will appear. To see the QTL parameters, press on the green QTL symbol and the panel <Setting QTL’s parameters> will appear.

Two models were created: m without trait normalization and m_tn with trait normalization.


Parameters of simulation data


8-5

Results for model with trait normalization

Click the left mouse button in any interval to get the estimates. Then choose whether to see the results in the normalized or non-normalized regime.

Estimates for normalization regime

LOD graph is shown in the scanning form.

Estimates for non-normalization regime


Single-QTL analysis for one-trait model (continued)


8-6

<Estimate> option

If the <Trait normalization> check box is ON, we see the estimates for the normalized model.

Otherwise we see the estimates for the non-normalized model.



Results for model with trait normalization (continued)


8-7

Results for model without trait normalization

Estimated values for any interval can be obtained by clicking the left mouse button.

The table of all estimated values is obtained by choosing <Estimate> from the menu options.




8-8

Submodel option

In multiple-environment analysis, the submodel options include two parts:
- You may impose assumptions on the problem parameters for selected environments (in the same way as in the case of one environment). Then only the conditions <no variance effect> and <no covariance effect> are taken into account.
- You may set some problem parameters equal across some or all environments.

By default, no special conditions except <no variance effect> and <no covariance effect> are taken into account. The first graph on the next page (shown in red) represents the default submodel. Select the <Submodel Add> menu option; the <Submodel> panel appears.



When the panel appears, the status of the <Constraints within selected environments> radio button is ON.

For the first example we choose constraint h=d/2 (dominant model) for all environments; we press <Apply>, and then <OK> buttons.


8-9

For the second example we choose the condition h=d/2 for environments 2, 3, 5, and 6. It is necessary to press the <Apply> button after each choice. We then turn to the lower part of the submodel panel by pressing the <Equating parameters in the selected environments> radio button. It is not necessary to choose anything in the upper panel.




For the second example, we first click the <dom.ef(h)> radio button in the lower part of the panel, then choose environments 2, 3, 4, and 5, and then press the <Apply> and <OK> buttons.

We obtain three graphs:
- the red graph is the default submodel;
- the green graph is our first example;
- the blue graph is our second example.


8-10

We now show how to derive a new submodel from an existing one. We select the “green” submodel and choose the <Submodel Add> option.

A panel appears, and in its upper part we see the constraint selected for this submodel. In the lower part of the panel, we choose the <ad.ef(d)> radio button and environments 1, 2, and 3, and then press the <Apply> and <OK> buttons.

A new graph (yellow) is displayed.





8-11


You may want to delete one (or more) of the reviewed submodels. For example, let us delete the “green” submodel using the <Submodel Delete> menu option. We can take one of the already defined models and put additional conditions on its parameters, thereby creating a new submodel. We choose the default submodel (red). In the lower part of the <Submodel> panel we choose the <ad.ef(d)> radio button and environments 1, 2, and 3.

Choose this new graph and the <Submodel Add> option for that.

Now we press the <Apply> button, and then choose the <dom.ef(h)> radio button and the same environments. We press the <Apply> and <OK> buttons. A new graph (yellow) is displayed. The ToolTip (prompt) shows the full name of this submodel.


8-12

In the lower part of the <Submodel> panel we see the condition defined on the previous page. By pressing the <dom.ef(h)> radio button, we see the selected environments (1, 2, 3), since this condition is also satisfied for this submodel. We may also delete a condition by pressing the <Remove> button.

Press the <dom.ef(h)> radio button, then the <Remove> and <OK> buttons. We create a new submodel for which only one parameter, <ad.ef(d)>, is equal across environments 1-3.





8-13

About submodel option

Choose the <About submodel> menu option.

Rectangle’s color corresponds to the submodel graph’s color.




8-14


Choose the <Significance Compare Submodel> menu option. Two submodels may be compared if the parameter set of one of them is a subset of the parameter set of the other. Therefore, all submodels may be compared to the default model. In this example only submodels 4 and 5 may be compared with one another.
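The nesting rule above can be sketched as a simple set comparison. In this illustration each submodel is represented by the set of constraints it imposes (an assumption made for convenience; the default submodel then has an empty set), and two submodels are comparable when one constraint set contains the other. The constraint labels are hypothetical.

```python
def can_compare(constraints_a, constraints_b):
    """True if one submodel is nested in the other.

    Each submodel is described by the set of constraints it imposes; the
    default (unconstrained) submodel is the empty set.
    """
    a, b = set(constraints_a), set(constraints_b)
    return a <= b or b <= a


# Hypothetical submodels modelled on the examples of the previous pages.
default = set()
equal_d = {("equal", "ad.ef(d)", (1, 2, 3))}
equal_d_and_h = {("equal", "ad.ef(d)", (1, 2, 3)), ("equal", "dom.ef(h)", (1, 2, 3))}
dominance = {("h=d/2", (1, 2, 3, 4, 5, 6))}

print(can_compare(default, equal_d_and_h))  # default is comparable with any submodel
print(can_compare(equal_d, equal_d_and_h))  # nested pair
print(can_compare(dominance, equal_d))      # neither contains the other
```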




8-15

For the “yellow” submodel the ‘Comparison’ window allows testing it against two other submodels. In the figure, the main (red) submodel is selected; click <OK> to compare the two submodels. The <Comparison test> panel appears.

All the initial parameters here may be changed. Press <Start> and the test will run.



Submodel compare option (continued)


8-16

Distribution option

Options of the <Distribution> menu are similar to those in one- and two-trait models. Note that in this case, trait distribution may be displayed for each selected environment.

For each environment, we may choose any interval to display the distribution in putative QTL groups.




8-17

<Significance Compare H1H0> and <Significance Bootstrap> are identical to those in single QTL models in single-environment problems.




Now, in the bootstrap table, parameters for all environments are displayed. By moving the Scroll bar you can see the parameters for each of the environments.


8-18


To show the features of the two-trait analysis and to demonstrate the multiChrom regimes (MIM and multiSimulation), we consider a simulated example, multiEnv3chrom. Backcross data were simulated with 3 chromosomes and 2 traits scored in 6 environments, affected by 3 QTLs; the number of genotypes was 200, hence the number of phenotypes is 1200.

Single-QTL and Two_linked QTL analysis for two-trait model



8-19


We show below the results of computation of two-trait model q1t2 with one QTL for chromosome 3. The figures display the results for two traits across all four environments.

Single-QTL analysis for two-trait model

Results of computation


8-20


Single-QTL analysis for two-trait model (continued)

Estimate option


8-21


In case of two-trait analysis the panel for creating submodels and <About submodel> panel for displaying the structure of the submodels are different from those of single-trait models (the graphs on this page are for chromosome 3).


Submodel option


8-22


In this example, the submodels can be created to test various hypotheses. Thus, for testing QTL-E interaction hypothesis, we can set equal the QTL effects (di) in the 3 environments. In comparison of such models, the criterion employed in MultiQTL is difference in likelihoods rather than Critical LOD increment of the compared submodels (although we have not changed the corresponding text in the Comparison Test panel).

Significance options are similar to those of other models.



From the comparison of LODs on the graphs on the previous page, we can see that the difference in maxLOD values for these models is about 2. We should note that when comparing different submodels in the Multiple-Environment analysis, the criterion is the difference of likelihoods (and not of LODs), which may differ from the difference of LODs.


8-23



Results of computation

Consider the results of computation of the two-trait model q2t2 with two linked QTLs on chromosome 1. The results are shown for the two-trait model across all environments.


8-24


Two linked QTL analysis for two-trait model (continued)

Submodel option

In this case, both the panel <Submodel> and window <About submodel> display parameters of two QTLs.


8-25



The Significance options H2H0 and Bootstrap practically do not differ from the corresponding options in other sections of the package. The <Significance Compare H2 H1> options are identical to those for two-linked QTL models in single-environment problems.


The results for the two-trait model q2t2 (for the two-QTL model, we had maxLOD=12.84)

The results for the single-trait model q2t1 (trait2) (for the two-QTL model, we had maxLOD=5.33)


8-26


Multichromosome set

Let us open the MultiSet option, create a set for the two-trait model, and choose the two-QTL model for chromosome 1 and the single-QTL model for the other two chromosomes (for the details of the simulated data set see page 18). After opening the set, select the MIM option.


8-27


Multichromosome set (continued)

Using MIM analysis of the foregoing data, two effects were detected, on chromosomes 2 and 3. By pressing corresponding buttons we can get corresponding values of P.E.V. and P.E.V.(ad). In this case these values are equal because of absence of epistasis in two-QTL model.

As in the general case, for each chromosome we can open windows for “after” MIM analysis, and create and compare submodels for each of these.



8-28


Multisimulation option

The Results table for all environments and all traits will appear after closing the <Multisimulation Window> and then pressing the button with the name of the chosen chromosome.



8-29


Creating a project with “single-environment” traits using “multiple-environment” project

Some options of single-environment analysis are limited when you are working within a multiple-environment project; e.g., you may want to consider the trait scores across environments as a multiple-trait complex. For a transition to single-environment analysis you can apply the <Extract> option of the main menu (see Part 1, page 59). By choosing this option, you will get a new window:

First, if we choose the <Environments> option, our data will appear in the multiple-environment format. Now, by selecting the <Traits> option, we get for each trait a multiple-trait complex with the number of traits equal to the number of environments. The names of these new traits are formed from the old name extended as *_e1, *_e2, *_e3, …
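The naming scheme of the extracted traits can be illustrated with a short sketch. The data layout (a dictionary of per-environment score lists) and the function name are hypothetical; only the *_e1, *_e2, … suffix convention is taken from the text above.

```python
def environments_to_traits(scores):
    """Turn each trait scored in several environments into a multiple-trait complex.

    scores: dict mapping a trait name to a list of per-environment score lists,
            in environment order.
    Returns a dict with one entry per (trait, environment) pair, named
    <trait>_e1, <trait>_e2, ...
    """
    traits = {}
    for name, per_environment in scores.items():
        for index, values in enumerate(per_environment, start=1):
            traits[f"{name}_e{index}"] = list(values)
    return traits


# Hypothetical trait "height" scored on two individuals in three environments.
print(environments_to_traits({"height": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}))
```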

We should now set a name for the new folder where the extracted data will be saved, and add a new folder name in the current folder (shown in the window) or in another folder. The new folder will host a new single-environment project that can be created using the <New Project> option of the main menu (see Part1, page 59).



8-30


Under some circumstances the opposite transformation may also be of interest: creating a “multiple-environment” project from an existing standard project. Again, we can employ the <Extract> option of the main menu. The corresponding windows now look a bit different:

If the check box is now set to <On>, the window will change: a list of all traits of the project will appear, together with a cell for setting the number of environments for the new multiple-environment problem.

We can select from the list the “traits” that represent scores of one trait across several environments, write the trait name, and press the <Add new trait> button to define the first trait of the new multiple-environment problem. Then the next group of scores, for the second trait, should be selected, and so on. Clearly, at each such step, the number of selected scores should be the same and equal to the number of environments. By default, the names of the environments are “env1, env2, …”. By pressing <OK> we finish defining the trait-environment combinations, and the system will prompt for the folder name. By pressing <OK> again we conclude the operation. Now a new multiple-environment project can be created using the <New Project> option of the main menu.
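The grouping step described above, including the check that every group contains as many scores as there are environments, can be sketched as follows. The function name and data layout are hypothetical; the default environment names env1, env2, … follow the text.

```python
def traits_to_environments(columns, groups, n_environments):
    """Group single-environment "traits" into multiple-environment traits.

    columns: dict mapping an existing trait name to its list of scores.
    groups:  dict mapping a new trait name to the ordered list of existing
             trait names that hold its scores in environments 1, 2, ...
    """
    project = {}
    for trait_name, selected in groups.items():
        if len(selected) != n_environments:
            raise ValueError(
                f"{trait_name}: {len(selected)} scores selected, "
                f"{n_environments} environments expected")
        project[trait_name] = {
            f"env{i}": list(columns[column])
            for i, column in enumerate(selected, start=1)
        }
    return project


# Hypothetical project with two score columns belonging to one trait.
columns = {"h_a": [1.0, 2.0], "h_b": [3.0, 4.0]}
print(traits_to_environments(columns, {"height": ["h_a", "h_b"]}, 2))
```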

Creating a “multiple-environment” project using a project with “multiple-trait” data



9-1

Part 9 Multiple families

Table of contents

Introduction
Parameters of simulated data (family3chrom.job)
Computation results
Submodel option
Submodel compare option
Distribution option
Significance options
Multichromosome set
MIM
Multisimulation


9-2

Introduction

Part 9: Multiple families

This section lets you conduct joint analysis of multiple families (populations) of the same structure. Such analysis was not available in previous versions of MultiQTL. It is noteworthy that the “multiple families” model can also be applied to “multiple-environment” data sets. Namely, it allows conducting QTL-E analysis when the data include different genotypes scored in different environments, unlike our other multiple-environment model (Part 8), where the same set of genotypes is scored in multiple environments. The “multiple families” model and algorithms implemented in the current version can be applied to mapping analysis of data that fit a few additional assumptions:

- The same population type is assumed across families (this restriction will be relaxed in future versions of the package).
- For each marker locus, at least one family must be polymorphic (this allows using the algorithms of the multiple-families section for QTL analysis in advanced backcross designs, e.g., Luo et al. 2002).
- Missing marker scores are automatically recovered based on information from polymorphic neighbor markers.
- Each trait should be scored in all families.
- Data input is conducted separately for each family (see Part 1).

There are also some limitations in the algorithm presented in the current version:

- Only single-trait and two-trait models are available.
- The selective genotyping model is not yet available.
- Each chromosomal interval is scanned over a user-defined number of steps, hence computations are relatively slow for two-QTL F2 models.


9-3

Introduction (continued)


The main options of the Multiple families analysis will be demonstrated using simulated data. As with real data, the number of individuals, chromosome lengths, relative QTL position within the interval, and QTL effect can be family-specific. As indicated above, this section of the package can also be applied to analyze data sets from multiple environments when the mapping population is composed of subpopulations (or families), each represented by its specific genotypes. Likewise, families derived from an advanced backcross design (which has become popular because it combines a breeding scheme with QTL mapping) can also be analyzed by multiple-family mapping. Due to the specificity of the multilocus marker haplotypes sampled in each family/environment, the linkage phase between marker loci (as well as between marker haplotypes and the QTL) may also be family-specific. Despite this difference and the corresponding differences in the mapping model and algorithms, the logic of the analysis and the main questions and tests are quite similar, allowing a less detailed description than that of Part 8. Still, possible variation in linkage phases between the markers and QTLs needs special consideration that differs from that in Part 8. Namely, if in the Multiple-environment analysis of Part 8 one found nearly equal but oppositely directed QTL effects in two environments, this could be interpreted as QTL-E interaction (because the same set of genotypes is scored in the two environments). This situation cannot be interpreted as QTL-E interaction in the new model with family/environment-specific genotypes (subpopulations). In that case one can only infer phase variation among the subpopulations.


9-4



The illustration examples will be based on data simulated in family3chrom.job file. Number of individuals, chromosome lengths, QTL positions/effects can be family-specific. Details about the parameters can be obtained in corresponding windows (see Part3 page 13).

Windows <Setting QTL’s parameters> show how the values of QTL effects and positions are displayed. The QTL location in interval 3 for family 2 and in interval 7 for family 1 can be seen after selecting the corresponding family in the window <Families Selection>.


9-5

Parameters of simulation data (continued)


Here the mean values of the traits are set different across families, whereas the values for Stand.dev and Correlation were equal in different families.


9-6


Computation results

Four models with one and two traits with single- and two-QTL were created. To conduct the computations the user should choose the number of points in each interval to fit the model (by default 10).
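To make the role of the “number of points in each interval” concrete, the sketch below generates evenly spaced trial positions between the two flanking markers of an interval. Whether MultiQTL includes the interval ends or places the points differently is not stated in the manual, so the spacing rule and the function name are assumptions; only the default of 10 points comes from the text.

```python
def scan_positions(left_cm, right_cm, n_points=10):
    """Evenly spaced trial QTL positions (in cM) between two flanking markers.

    Assumption: the first and last points coincide with the flanking markers.
    """
    if n_points < 2:
        return [(left_cm + right_cm) / 2.0]
    step = (right_cm - left_cm) / (n_points - 1)
    return [left_cm + k * step for k in range(n_points)]


# Hypothetical interval from 0 cM to 20 cM scanned at 5 points.
print(scan_positions(0.0, 20.0, 5))
```

Because the model is fitted at every point, the computing time grows with the number of points, and for two-QTL models with the number of point pairs.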

Analysis of two-trait model with single QTL is shown in the figures. In the <Estimated> table we can see the lengths of the marked interval for each of the 4 families whereas table <Occurred> shows the QTL position relative to the left flank of the interval. By moving the Scroll bar you can see the parameter estimates for each family.


9-7


As usual, the Estimates table displays the intervals with max LOD values (the local ones and the global one), the sample size, the coordinates of the intervals’ flanks for each family, and the location of max LOD (the local ones and the global one) in each family. By moving the Scroll bar you can see the parameter estimates for each family.

Computation result (continued)


9-8

Submodel option

The <Scanning> option is not provided for Family data because the basic computation is conducted by scanning along the interval. The <Submodel> option is identical to the corresponding option of the Multi-Environment analysis (see Part 8). Two examples of submodels are shown below.


9-9

Important comment: In creating new submodels with equal additive (d) effects, we actually equate the sizes (absolute values) of the effects, because the linkage phase between the markers and the QTL may vary among families (i.e., coupling or repulsion). A similar comment applies when heterozygous (h) effects are assumed equal between (some or all) families. It should be noted that if both d and h are assumed equal between families, we take care that the sign relationship between d and h is maintained within each family. An example for F2 is provided below:



We create a submodel with (d) and (h) equal in all families (i.e. the absolute values are equal).

Please compare the effects of the new submodel (green) with those of the initial model (red) for the interval with max LOD.


9-10


Choose the <Significance Compare Submodel> menu option.

This option is identical to the corresponding option of the Multi-Environment analysis.


9-11

Distribution option

Options of the <Distribution> menu are similar to those in one- and two-trait models.

Note that in this case, trait distribution may be displayed for each selected family.



9-12

<Significance Compare H1H0> is identical to that in single-QTL models with either single or multiple environments. However, due to specific aspects of the algorithm, the computation is slower here. We demonstrate this option on the second trait, using a single-trait model and the markers of chromosome 2.




9-13

Significance options (continued)

<Significance Bootstrap> option is identical to that in other models. However, in the bootstrap table parameters for all families are displayed. By moving the Scroll bar you can see the parameter estimates for each family. The displayed example is the same as was used in the demonstration on the previous pages.



9-14

The <Significance Compare H2H1> options are identical to those in two-linked QTL models for single-environment problems. An example comparing the two-QTL and single-QTL models for the first trait on chromosome 1 is shown.

Significance options (continued)



9-15


Multichromosome set

By calling the MultiSet option we create an example set with a single-trait model for trait 2. For one of the chromosomes (chrom1) the two-QTL model was selected, and for the other two chromosomes the single-QTL model was selected. After opening the set, we choose MIM.


9-16



By using MIM function we revealed three effects, on chromosomes 1, 2, and 3. By clicking on the QTL symbol (blue triangle), the table of results can be obtained.



9-17


Multisimulation option

As in the general case of simulated data, the Multisimulation option is available. After closing the <Multisimulation Window>, press the button with the name of the selected chromosome to display a table with results for all families (environments) and traits.



10-1

Part 10 Selective genotyping

Table of contents

Introduction 2
Data Simulation 3
Computation results
Results for four models on simulation data 5
Real data 6
Real data. One-trait model 7
Real data. Two-trait model 8
Estimate option 9
Multisimulation option 10
Multichromosome set 10


10-2

Part 10: Selective genotyping

Introduction

Selective genotyping (SG) is a cost-efficient approach to QTL mapping. It exploits the fact that individuals from the tails of the trait distribution carry much more information about QTL-marker association than those from the middle part of the distribution (Lander & Botstein, 1989). Consequently, in an SG design only a part of the objects, those with minimum and maximum values of the target trait, are genotyped for marker loci. However, in order to get unbiased estimates of the QTL effect, all phenotypes, non-genotyped and genotyped, are included in the analysis (for the advantages of SG see: Lander & Botstein, 1989; Ronin et al. 1998, 2003; and refs. therein). To demonstrate the SG functions of the package, simulated data will be employed. Selective Genotyping data sets can be simulated by creating special models with single or multiple traits (with only one of the multiple traits being the selected trait). Clearly, the weights of the left and right tails can differ in simulated as well as real data. To analyze the data, models with one or two QTLs per chromosome can be employed.


10-3

Data Simulation

To conduct the simulation and build the analytical model, select the <Selective Genotyping> option. From the <Selected trait> menu, choose the target trait; the tails of its distribution will define the objects for selective genotyping. A panel named <Selective genotyping> appears. Click in this panel to set the selective genotyping parameters.

To create the data for SG examples we should return to the corresponding section of Part 3 (see page 3-7). The simulated example included 2000 F2 individuals. Four data sets with different sizes of tails selected for genotyping were simulated:
- ms10: 10% of individuals for SG;
- ms20: 20% of individuals for SG;
- ms40: 40% of individuals for SG;
- ms100: all individuals were genotyped.
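The selection of tails can be pictured with a minimal Python sketch. This is not the code of the package; the function name `select_tails` and the equal split of the genotyped fraction between the two tails are our assumptions for illustration only (in the package the tails are set with sliders and may differ in weight).

```python
def select_tails(trait_values, left_fraction, right_fraction):
    """Return indices of individuals in the lower and upper tails.

    Hypothetical helper: left_fraction and right_fraction are the shares
    of the whole sample taken from the minimum and maximum ends.
    """
    n = len(trait_values)
    # Indices of individuals ordered by increasing trait value
    order = sorted(range(n), key=lambda i: trait_values[i])
    n_left = round(n * left_fraction)
    n_right = round(n * right_fraction)
    lower = order[:n_left]
    upper = order[n - n_right:]
    return sorted(lower), sorted(upper)


# Ten individuals, 20% taken from each tail (assumed split)
values = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 0.0]
lower, upper = select_tails(values, 0.2, 0.2)
```

For a data set like ms10 with 2000 individuals, an assumed symmetric design would call `select_tails(values, 0.05, 0.05)` and mark only the returned individuals as genotyped, while all phenotypes stay in the analysis.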



10-4

Data Simulation (continued)

When panel <Selective Genotyping> opens, the option <Rough Tuning> is on. Select the left part of the trait distribution (min. values) by moving the left slider. Then select the right part of trait distribution (max. values) by moving the right slider.

The percent of selected objects and the number of objects will appear in the corresponding areas. Note that in the current version, marker scores are simulated and employed for all individuals when calculating the interval lengths. For fitting the SG mapping model, only genotypes from the defined tails are used.

Fine-tuning the number of selected objects is possible. To do this, click <Fine Tuning> radio button and move the left or right slider. In this way we can create all four data sets.

To create a simulation project, it is necessary to choose the threshold phenotypic values of the chosen trait.



10-5

Computation Results

Results for the simulation data sets: Below we show the results for all four sets, which may give an idea of how the simulation tools can help you in designing experiments. Note the very close results in the ms100 and ms40 models despite the 2.5-fold smaller genotyped sample in ms40.

Calculation result for ms100 model.

Calculation results for ms10 model.

Calculation results for ms20 model.

Calculation results for ms40 model.

Scanning option was used in both cases. Estimated parameter values for the interval with maximum LOD value are displayed.



10-6

Computation Results (continued)

Real data

For illustration of real data analysis, we employ data on an F2 population with 1000 phenotypes (four traits, trait1…trait4), 400 of which were genotyped for markers of 2 chromosomes. Individuals for genotyping were selected from the tails of the phenotypic distribution of trait1. In the input data, the selected (genotyped) individuals are placed first, followed by the remainder (for more details on <Selective Genotyping> input data see Part 1, page 30).

Two models were created for this set, with single- and two-trait analysis. Note that although any two-trait combination can be used for two-trait analysis, we strongly recommend using only pairs that include the selected trait. For that, you may find the option “Two(2lists)” helpful, as shown in the illustration (note that the option <Selective Genotyping> is in state <On>; the selected trait is indicated). See also Part 3, page 5.



10-7

Computation Results (continued)

Real data. Single-trait model: It is noteworthy that if SG was based on the tails of a trait x, then single-trait analysis of any other trait y = yi correlated with x may result in biased estimates of QTL effects for y. This point can be illustrated by the results for Trait2 and Trait3.

Trait1 Trait2

Trait3 Trait4



10-8


Real data. Two-trait model: The aforementioned bias, i.e. the possibility of false positive detection for traits correlated with the selected trait, can be corrected by using two-trait analyses for (x, yi) (see Ronin et al. 1998).

Trait1 - Trait2 Trait1 - Trait3

Trait1 - Trait4

The obtained <Estimates> for the pairs indicated relatively high correlation coefficients (Cor.Coef) of Trait2 and Trait3 with Trait1 (~0.7) and a small Cor.Coef for Trait4 with Trait1. Accordingly, the estimated QTL effects for Trait2 and Trait3 vanish in the two-trait model compared to the single-trait model. Clearly, this does not mean that the effects detected for the correlated trait should always be zero.
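The origin of this bias can be illustrated with a small simulation sketch. It is not the method of the package: instead of the mapping model, it uses a naive comparison of genotype-class means, and all parameter values (sample size, effect d = 0.5, correlation 0.7, 20% tails, two genotype classes) are assumptions chosen for illustration. Trait y has no QTL effect, yet among individuals selected from the tails of x the genotype classes should differ clearly in y.

```python
import math
import random


def simulate(n=20000, d=0.5, rho=0.7, seed=1):
    """Simulate (genotype, x, y): x has QTL effect d, y has none."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = rng.randint(0, 1)                      # genotype class
        e1 = rng.gauss(0.0, 1.0)                   # residual of x
        # Residual of y, correlated with e1 (correlation rho)
        e2 = rho * e1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        rows.append((g, d * g + e1, e2))
    return rows


def class_difference(rows, column):
    """Naive effect estimate: mean of class 1 minus mean of class 0."""
    a = [r[column] for r in rows if r[0] == 1]
    b = [r[column] for r in rows if r[0] == 0]
    return sum(a) / len(a) - sum(b) / len(b)


def tails(rows, fraction):
    """Keep the given fraction of individuals from each tail of x."""
    ordered = sorted(rows, key=lambda r: r[1])
    k = round(len(ordered) * fraction)
    return ordered[:k] + ordered[len(ordered) - k:]


rows = simulate()
selected = tails(rows, 0.2)
full_diff_y = class_difference(rows, 2)       # close to zero
tail_diff_y = class_difference(selected, 2)   # spurious "effect" on y
```

In the whole sample the difference for y stays near zero, whereas in the selected tails it is distinctly positive, because the class carrying the increasing allele of x is over-represented in the upper tail, where y is also high.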



10-9

Estimate option


The <Estimate> table carries the information about the number of phenotypes and genotypes, as well as the name of the selected trait.

Significance options are identical to those in the usual analysis (with non-SG data) and, therefore, are not considered repeatedly here.



10-10

Multichromosome set

MIM analysis is not implemented for SG data when the non-genotyped phenotypes are included in the analysis together with the genotyped ones to reduce bias. For simulated data, however, it may be instructive to employ the MultiSimulation option to get a better understanding of the effects obtained in a single-run simulation analysis. As an example, we consider the previous example of an F2 population with 1000 phenotypes, of which 40% were selected for genotyping from the tails of the Trait1 distribution. The simulated additive QTL effect was 0.5 for Trait1 and 0 for Trait2. The results of two-trait analysis for a single simulated set look like:

We create a single chromosome MultiSet and employ the MultiSimulation option:

One can see that the average ad.ef(d) for Trait2 (equal to 0.0637) is clearly smaller than the value obtained for a single simulated experiment (0.155) and approaches the simulated zero effect.



11-1

Part 11

Summarizing the results

Table of contents

Introduction 2
Simulated example (TotalSign1.job) 3
Creating Second project for this example (TotalSign2.job) 4
Total Significance option 5
Significance value on the Calculation panel 6
Model groups 7
Creating the result table with specified threshold significance 8
Computation of the total significance by Benjamini & Hochberg method 11
Creating the report table 13
Creating a summary panel with LOD score graphs 18
Appending the second project 20
References (to Parts 1-11) 23


11-2

Part 11: Summarizing the results

Introduction

In the previous parts, it was shown how to perform data analysis for all models. Now we will see how to summarize the obtained results. This part includes three aspects: (a) combining the results obtained in different job files; (b) calculating the experiment-wise significance; and (c) generating output summary information (tables and figures). A real data set (problem) usually includes many chromosomes and traits. Consequently, many computationally intensive analyses have to be conducted, and some of the calculations (especially for two-linked QTL models) take a lot of time. Therefore, to expedite the analysis, it may be helpful to create several projects (job files) for one problem and to perform the calculations for different projects on several computers (in the current version only two projects can be merged in one step, but you can apply this option sequentially). Finally, we want to combine all the calculations and summarize the results for the whole problem. In particular, the calculated trait-chromosome significances should be corrected for multiple comparisons to obtain experiment-wise significance. All these steps will be shown on simulated examples.


11-3

Simulated example

Chromosomes and QTLs are shown on the <Setting the Chromosomes> panel. Trait information is displayed on the <Setting parameter values> panel.

We simulated data for a project with name “TotalSign1.job”.

These data correspond to a backcross population genotyped for markers of six chromosomes and phenotyped for five traits in one environment. The number of genotypes (and phenotypes) is 200.



11-4

Creating second project for this example

In our example, we defined and computed four models:
- one-trait with single- and two-linked QTL,
- two- and multiple-trait with single-QTL.

In order to show how to work with several projects, we created a second project for our example. For that, using the <File Extract> option of the main menu, a folder “Total” was created with chromosomes, traits and information files.

Using the <FileNew Project> option, we created in this folder the second project, file “TotalSign2.job”, with two chromosomes and three traits. In the <Select Data Files> panel we selected “chrom4”, “chrom5”, “trait3”, “trait4”, and “trait5”. For more details about these functions, see Part 1, pages 44 and 59.

In the second project, three models were defined and computed: one-, two- and multiple-trait analysis with two-linked QTL.



11-5

Total Significance option

For every chromosome-trait (or chromosome-[trait complex]) combination in both projects, data analysis was performed by creating submodels and testing significance. We should now obtain genome-wise (or experiment-wise) significance.

On the <Calculation panel> significances are displayed only for the chromosome-trait pairs that have already been calculated.

First, we choose the model “m1” (one-trait & single- QTL).

Let us open the first project “TotalSign1.job” by <FileOpen Project> main menu option.

Then, we choose <Total Significance> menu option.



11-6

Significance value(s) on the Calculation panel

We use the term cell for a chromosome-trait or chromosome-[trait complex] combination. There are three ways in which significances are displayed, as explained below:

- A significance value is displayed. This means that only one submodel was tested for significance for this cell.
- A colored rectangle is displayed in the cell. Its color refers to the color of the LOD graph of the submodel with the best significance.
- The term “Multiple match” is displayed. This means that significance was computed for several submodels and the best one has not been chosen yet. Clicking the left mouse button on such a cell opens a window with several graphs. We can choose the best submodel (e.g., the one obtained after removing the non-significant parameters) and close the window. A colored rectangle will then appear in this cell.



Model groups

Clearly, to correct for multiple comparisons, one cannot consider “heterogeneous” models simultaneously; e.g., it is impossible to combine single-trait and multiple-trait models within one summary test. Thus, the comparison should be performed within groups of the same kind of models. The simplest division of the models into such groups is to consider one-, two-, and multiple-trait models separately. But two one-trait models based on marker and interval analyses must also be classified into two different groups. A group should include models in which all initial parameters are the same, so that only single- and two-linked QTL models and their submodels are included in one group. For example, three groups of models are present in our first job file “TotalSign1.job”:

- The one-trait model group includes models m1 and m2, for single- and two-linked QTL, respectively.
- The two-trait model group includes one single-QTL model, m1tr2.
- The multiple-trait group includes one model with single QTL, m1mlt.

11-7

These groups are displayed on the <Total significance> panel.



11-8

Creating the Result table with specified threshold significance

In the <Total significance> panel, we choose the model group “One trait” with the m1 and m2 models of the first project. Then we examine the significance values on the <Calculation Panel> for all models of this group. For each cell labeled “Multiple match”, choose the submodel with the best significance. Estimate the maximal limit of the significance value for this model group.

Press the <Threshold sign.choice> button and set the maximal limit of the significance value in the <Declared threshold> panel. Press the <OK> button of this panel.

Labels in the cells of the <Calculation Panel> in which the significance is better than the maximal limit will be highlighted in red.

Press the <Display chosen cases> button to obtain the “total significance” table. It is possible that different models are represented for one cell (e.g., single-QTL and two-linked QTL). In such a case, a new panel for model choice appears (see next page).


11-9

Creating the Result table with specified threshold significance (continued)

You can choose one of the two models or exclude both. To choose, click the left mouse button on one of the two <Choice> radio buttons and then press the <Include one> button. To exclude, press the <Exclude both> button. This operation must be performed for every cell for which both single- and two-QTL models have been fitted previously.

Finally, we obtain the table with all selected significances. By default, the information in this table is sorted by chromosome number. It can also be sorted by trait number; for that, change the choice in the <Sorting> window.


11-10

Creating the Result table with specified threshold significance (continued)

By using this table we can:
- Calculate experiment-wise significance based on the method of Benjamini & Hochberg (1995) for controlling the False Discovery Rate (FDR).
- Obtain a combined estimates table for all significant results.
- Assemble the LOD score graphs for all significant results.

For that, it is necessary to choose all or some of the rows in the table and press <OK>. After that, one of the resulting options of the <Total significance> window may be chosen.


11-11

Computing total significance by Benjamini & Hochberg (1995) method

Computation is performed if corresponding option of <Total significance> window was chosen. It is necessary to define FDR parameter.

In case of choosing all rows of the significance table, this table will be shown again, but only rows whose significance fits the FDR criterion will be included in this table. In case of multitrait groups, computing of global significance is not performed.
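For readers who wish to see the logic behind this option, the standard Benjamini & Hochberg step-up procedure can be sketched in a few lines of Python. This is a generic textbook implementation, not the code of the package; the function name `benjamini_hochberg` and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of the cells declared significant at the given FDR.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= k * fdr / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * fdr / m:
            largest_rank = rank
    return sorted(order[:largest_rank])


# Hypothetical single-test significances of four trait-chromosome cells
cell_p_values = [0.02, 0.03, 0.035, 0.9]
significant = benjamini_hochberg(cell_p_values, fdr=0.05)
```

Note that in this example the two smallest p-values exceed their own thresholds (0.0125 and 0.025), but they are still retained because the third one passes its threshold (0.0375); this is the step-up property of the method.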



11-12

For example, “chrom1”, “chrom3”, and “chrom5” were chosen and their global significance was computed with FDR criterion = 0.01. The results are shown in the “choice 1” window. The same operation was performed for “chrom2”, “chrom4”, and “chrom6”, and the results are shown in the “choice 2” window. When the “Choice all” option is marked, the computed results are shown in the same (main) window.

In case of choosing all rows of the significance table, this table will be shown with rows whose significance fits the FDR criterion.


Computing total significance by Benjamini & Hochberg method (continued)


11-13

Creating the Report table

You can get a global report table for any significance table by pressing the <Report table> button, even without performing the global significance computation. We obtain the report table for the chosen rows. Every row of this table contains estimates for one cell (trait-chromosome combination). The estimated values may be taken from the bootstrap test, if it was performed during the mapping analysis. In this case, the report contains an estimate of “QTL detection power” and the parameter estimates accompanied by standard deviations/errors.

For that, it is necessary to choose the significance level (0.05, 0.01 or 0.001) that will define the threshold value of the test statistic under H0 (calculated earlier based on the permutation test). If the bootstrap test was not performed during the previous analysis, the parameter estimates are taken from the main computation.

If two-linked QTL analysis was performed for some cell, the corresponding line in the table will include estimated values for the two QTLs and “Epistasis” values (if the corresponding model with epistasis was created and fitted earlier). See the example on the next page.



11-14

Creating the Report table (continued)

In our example, this report is provided for one-trait group, for a backcross population. Bootstrap test was not performed for pairs chrom1_trait5, chrom4_trait4, chrom5_trait3, and chrom6_trait1. Two-linked QTL analysis was performed for pairs chrom4_trait1, chrom4_trait2, and chrom5_trait2. All computations, except for chrom4_trait1 and chrom4_trait2 pairs, were performed for the general submodel.

In order to see all results, use scrolling option. Press <Open in EXCEL> button to transform the report table to Excel format.



11-15


This report refers to the two-trait group for a backcross population. All estimated values are displayed for each trait.



11-16


This report refers to the multi-trait group of a backcross population. All estimated values are displayed for each trait.



11-17


A multi-environment example is displayed in this table (multEnvSign.job file). This report refers to one-trait group of a backcross population. All estimated values were displayed for each environment.

Submodel with equal effects (d) for env2, env4, and env6 is used for chrom1_trait1 pair. For chrom2_trait1 pair, the submodel was fitted with trait mean (m) equal among env1, env2, env3; QTL effects were equal across env1-env5 and variance effects were equal among env3, env4, and env6.



11-18

Creating a summary panel with LOD score graphs

It is possible to get a panel with LOD score graphs for any significance table when choosing all or some rows and pressing <LOD score graphs> button of the <Total significance> window. Use <Save to PCX format> menu option to get an output file with *.pcx extension. It is necessary to enter the name of this file. The graphical file is automatically placed into the folder with the project file. Up to 12 graphs can be placed in one page. If the number of chosen rows is >12, additional pages are created.



11-19

Creating a panel with LOD score graphs (continued)

Use option menu <ViewNext> or <ViewPrevious> for page turning.



11-20

Appending the second project

Now we will show how to add the results contained in the second project to this table.

We have to open our second project “TotalSign2.job” by choosing <FileOpen Project> main menu option. In order to save the best selected submodels in the first project, answer “YES” to the message that appears on the screen:

If the results of first project were received from bootstrap test, it is necessary to choose and to remember the significance limit level for each model group. A message with corresponding question will appear.



11-21

Appending the second project (continued)

In the second project we also choose the model group “One trait” in the <Total Significance> panel and the corresponding model “m1Qtl2” in the <Calculation Panel>.

The threshold significance of this model group (0.003) has already been defined. Therefore, we see red labels in those cells of the <Calculation Panel> in which the significance is below the allowed maximal limit. Note that the limit value for the current model group in the second project has to be equal to that in the first project.

Press the <Display chosen cases> button to obtain the combined significance table for both projects.



11-22

Appending the second project (continued)

Different significances may be obtained for the same cell in the two projects. In that case, choose the best submodel or exclude both, using the special panel.

Finally, we obtain the table with all selected significances of both projects.



Tutorial book of MultiQTL package, part 12

Description of the additions and changes included in version 2.6 as compared to 2.5

Table of contents

The main additions to version 2.6 as compared to 2.5 2
Separated storing of the results files 3
Extending Re-sampling options by Jackknife analysis 4
Dealing with traits that are proportions: Logit transformation 5
Global permutation test for “after fitting MIM” model 7
Changes and additions to Total significance option 9
• General option 9
• FDR analysis after fitting MIM model 13
Analyzing epistasis between non-homologous chromosomes 16
Allowing for half-sib family structure 25
Building submodels with variance effect for multiple-environment and multiple-family data 28

12-1


The main additions to version 2.6 as compared to 2.5

• Epistasis analysis for both linked and unlinked QTLs for single-, two-, and multiple-trait models, for single- and multiple-environment models, interval and marker analysis
• Jackknife analysis, in addition to bootstrap re-sampling analysis
• Global permutation test after fitting MIM model for a selected set
• FDR analysis after fitting MIM model (for genome-wise analysis)
• Extension of FDR analysis for multiple-trait models to facilitate selection of variants with non-overlapping traits for joint FDR analysis
• Allowing for half-sib family structure (via extended data input)
• Logit-transformation for QTL analysis of proportions
• Separated storing of the results of epistasis analysis for non-linked QTLs and MIM results (to prevent too large job files)

12-2


Separated storage of the results

All results of the analysis of epistasis between QTLs on non-homologous chromosomes are stored in a separate file in the same folder where the main job file of the problem was saved. The file for these epistasis results has the same name as the main job file, but a different extension, *.twc. Its existence is noted in the main file and is checked each time the main file is opened. But the *.twc file is opened only when the user selects the corresponding option of the main menu.

In the new version of the package, all information about multi-chromosome sets (multiChromosome set) treated using Multiple Interval Mapping (MIM) is excluded from the main *.job file. This information is stored in a special file with the same name but a different extension (*.mim). The program reads this file only when the user starts working with multi-chromosome sets or employs the option <Total significance> <After MIM>. When the *.job file is opened, the system checks whether the folder that harbors it also includes the file with the same name and the extension *.mim. If this file is absent for a problem where MIM analysis has already been conducted for some sets, a corresponding warning message will appear. Such a situation may result from moving/copying the job file to a new folder without simultaneously moving/copying the *.mim file.

The described subdivision of the results into three sections, stored in the main file (*.job) and two additional files (*.twc and *.mim), was done to reduce the size of the files.
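Before moving or copying a project by hand, it may help to check which companion files accompany the job file. The following Python sketch is only a convenience for the user and has nothing to do with the internal checks of the package; the helper name `missing_companions` is hypothetical, and the example creates empty placeholder files in a temporary folder.

```python
import tempfile
from pathlib import Path

# Extensions of the additional result files described above
COMPANION_EXTENSIONS = (".twc", ".mim")


def missing_companions(job_path):
    """List companion files that are absent next to the given *.job file."""
    job = Path(job_path)
    return [
        job.with_suffix(ext).name
        for ext in COMPANION_EXTENSIONS
        if not job.with_suffix(ext).exists()
    ]


# Demonstration with placeholder files: only the *.mim file is present
with tempfile.TemporaryDirectory() as folder:
    job = Path(folder) / "TotalSign1.job"
    job.touch()
    job.with_suffix(".mim").touch()
    missing = missing_companions(job)
```

A missing file is not necessarily an error: the *.twc and *.mim files exist only if the corresponding analyses have been carried out for the problem.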

IMPORTANT:

Comment A. It should be stressed that this new form of storing the results makes it impossible for previous versions of the package (2.4 and 2.5) to read new results (created using version 2.6 of MultiQTL). Clearly, the old results are readable by the new (2.6) version.

Comment B. If you have already used a previous version of MultiQTL, you may want to install the new version in a separate folder, in order to have both versions in parallel, at least for some transitory period.

12-3


12-4

Extending Resampling options by Jackknife analysis
(additions to Part 3 - Extended Parameters, Part 4, Part 5 - Bootstrap analysis option)

Previous versions included only one of the two main resampling options, Bootstrap analysis. Here we add a new option, Jackknife analysis. The choice between these two options is made at the stage of defining the model. The window <Extended Parameters> of version 2.6 includes a new box <Resampling method>. By default, the <Bootstrap> method is applied. If the user selects the Jackknife method, the size of the sampled part of the mapping population should be indicated in % (e.g., 60%).

In the next steps, all operations with this model that employ Re-sampling options are based on the choice made here. This includes single-, two-, and multi-chromosome sets. Global re-sampling analysis will also be based on this choice. We strongly recommend not using subsamples close to 90-100%.

The chosen method is indicated in the title of the window for Resampling analysis; if Jackknife was selected, the sampled part of the total sample size is indicated as well.
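The difference between the two resampling schemes can be sketched as follows. This is a generic illustration, not the code of the package: we assume that the bootstrap draws a sample of the full size with replacement, and that the Jackknife option draws the indicated percentage of individuals without replacement; the function names are hypothetical.

```python
import random


def bootstrap_sample(individuals, rng):
    """Draw a sample of the same size WITH replacement."""
    n = len(individuals)
    return [individuals[rng.randrange(n)] for _ in range(n)]


def jackknife_sample(individuals, percent, rng):
    """Draw the given percentage of individuals WITHOUT replacement
    (assumed behaviour of the Jackknife option)."""
    k = round(len(individuals) * percent / 100)
    return rng.sample(individuals, k)


rng = random.Random(0)
population = list(range(200))          # indices of 200 individuals
boot = bootstrap_sample(population, rng)
jack = jackknife_sample(population, 60, rng)
```

Under this assumption, a subsample close to 100% would reproduce nearly the same data in every run, so the variation among runs would say little about the precision of the estimates.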


12-5

Dealing with traits that are proportions: Logit transformation
(addition to Part 1 - Data revision option)

In QTL analysis of proportions, special measures are needed if the trait values are close to zero or one, due to the strong deviation of the distribution from normality. Our package includes a special section for normalization of the trait distribution by transformations preserving the trait range. In version 2.6 we included a new option for traits that are proportions: the logit transformation, the most suitable for such traits. In this case, the trait values P are replaced by P' = log(P/(1-P)). In version 2.6, the window <Data check and transformation> includes the box <Logit transformation>.

After choosing the trait for this transformation, the user will be asked for a confirmation.


12-6

Dealing with traits that are proportions: Logit transformation (continued)

Then, the system determines the minimum and maximum values of the trait. If one of these values is outside the permitted interval (0 to 1, or 0 to 100), a warning message is displayed. Otherwise (0 ≤ Pmin and Pmax ≤ 1), Pmin and Pmax are displayed together with the censoring values c1 and c2 (used to exclude too extreme values of the transformed trait): trait values with P < c1 and P > 1-c2 are replaced by c1 and 1-c2, respectively. By default c1 = c2 = 0.01, but the user can change these values.
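The censored logit transformation described above can be written as a short Python sketch. It is an illustration only: the function name `logit_transform` and the `percent` flag are hypothetical, we assume the natural logarithm, and we assume that values given on the 0 to 100 scale are first divided by 100.

```python
import math


def logit_transform(values, c1=0.01, c2=0.01, percent=False):
    """Replace each proportion P by log(P / (1 - P)) after censoring.

    Values below c1 are replaced by c1 and values above 1 - c2 by 1 - c2,
    so that the transformed trait has no infinite or too extreme values.
    """
    scale = 100.0 if percent else 1.0
    proportions = [v / scale for v in values]
    if min(proportions) < 0.0 or max(proportions) > 1.0:
        raise ValueError("trait values are outside the permitted interval")
    transformed = []
    for p in proportions:
        p = min(max(p, c1), 1.0 - c2)      # censoring
        transformed.append(math.log(p / (1.0 - p)))
    return transformed


# Example with default censoring values c1 = c2 = 0.01
transformed = logit_transform([0.0, 0.25, 0.5, 1.0])
```

With the default values, the transformed trait is confined to the interval from log(0.01/0.99) to log(0.99/0.01), i.e. approximately from -4.6 to 4.6.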


12-7

Global permutation test for after fitting MIM model (addition to Part 7)

It should be noted that during model fitting in the MIM process, small-scale significance testing based on permutations is conducted for stepwise selection of chromosomes to be included in or removed from the model. To get more precise (real) estimates of significance, a permutation test should then be conducted after the MIM analysis. To facilitate this process, a new option called <GlobalPerm. test> has been added to the MIM window in version 2.6 of the MultiQTL package.

Before starting this analysis, we should select the chromosomes targeted for permutation analysis. This is initiated by pressing the button <Choice for permutation test>. The windows of the selected chromosomes will be colored (in blue). To cancel the selection, the same button should be pressed again. After the selection is finished, the number of permutations should be set in the corresponding window.

The button <Start> of the <Global permutation test> window is then used to begin the permutation test. The button <Stop> allows interrupting the process after the current chromosome has been treated. After the process is stopped or finished, the color of the treated chromosomes will be restored and the <Global permutation test> window will be closed.
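The principle of a permutation test can be shown with a generic Python sketch. It does not reproduce the MIM-based test of the package: as a stand-in for the test statistic we use the absolute difference between the trait means of two marker genotype classes, and all names and data are hypothetical.

```python
import random


def mean_difference(trait, genotype):
    """Stand-in test statistic: |mean(class 1) - mean(class 0)|."""
    a = [t for t, g in zip(trait, genotype) if g == 1]
    b = [t for t, g in zip(trait, genotype) if g == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(trait, genotype, n_permutations=1000, seed=0):
    """Share of permuted data sets with a statistic >= the observed one."""
    rng = random.Random(seed)
    observed = mean_difference(trait, genotype)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling trait values destroys any trait-genotype association
        rng.shuffle(shuffled)
        if mean_difference(shuffled, genotype) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero
    return (hits + 1) / (n_permutations + 1)


genotype = [0] * 10 + [1] * 10
trait = [float(i) for i in range(10)] + [float(i) + 20.0 for i in range(10)]
p_value = permutation_p_value(trait, genotype)
```

The smallest p-value that can be obtained is 1/(n_permutations + 1), which is why the number of permutations should be chosen with the required significance level in mind.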


12-8

Global permutation test for after fitting MIM model (continued)

The results of the significance tests and the number of permutations for any chromosome can be seen by opening its window.

Important note:

In the new version of the package, all information about multi-chromosome sets (multiChromosome set) treated using MIM tools is excluded from the main *.job file. This information is stored in a special file with the same name but a different extension (*.mim). For more details see the section “Separated storage of the results”.


12-9

Changes and additions to Total significance option
(additions to Part 11)

Changes have been made in the <Total significance> section to fix a program bug that was causing inflated estimates of significance values for two- and multi-trait models. An extension of this section was included to allow estimating significance for genome-wise analysis based on MIM analysis. Thus, the <Total significance> option is now subdivided into two: <General> (the old one) and <After MIM> (a new one).

General option

Multiple trait models: In case of several multiple-trait models, two windows will appear: the old one, <Total significance>, and a new one, <Selection from the model(s)>, for selecting multiple-trait models consisting of different sets of traits (see the example based on the sim10trPerm_2 file). This example includes 5 models. After choosing one of these (mlt1_4), the names of all models whose traits are all different from the traits of the chosen model will be listed in the window <Next choice>. We can now choose one of these listed models, and this will bring up a new list of models with trait sets different from the traits of the selected two, etc. If no such models are available any more, the window will be empty. If from the beginning all models have shared traits, a corresponding message will appear and, consequently, only one model can be chosen to proceed with the analysis.

After pressing the button <OK>, the genome-wise significance based on controlling FDR will be calculated for the selected multi-trait models. If you do not proceed and just close the window, Total significance analysis will not be conducted for the selected models.
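The rule for building the list of further choices can be sketched in Python as follows. This is only an illustration of the non-overlapping-traits rule: the function name `next_choices` is hypothetical, and so are the trait compositions of the five models (only the model name mlt1_4 is taken from the example above; the other names and all trait sets are invented).

```python
def next_choices(models, chosen):
    """Names of models sharing no trait with any already chosen model."""
    used = set()
    for name in chosen:
        used |= models[name]
    return [
        name
        for name, traits in models.items()
        if name not in chosen and not (traits & used)
    ]


# Hypothetical trait sets of five multiple-trait models
models = {
    "mlt1_4": {"trait1", "trait2", "trait3", "trait4"},
    "mlt3_6": {"trait3", "trait4", "trait5", "trait6"},
    "mlt5_7": {"trait5", "trait6", "trait7"},
    "mlt7_9": {"trait7", "trait8", "trait9"},
    "mlt8_10": {"trait8", "trait9", "trait10"},
}

first_list = next_choices(models, ["mlt1_4"])
second_list = next_choices(models, ["mlt1_4", "mlt5_7"])
```

With these invented trait sets, choosing mlt1_4 removes mlt3_6 from the list (they share trait3 and trait4), and the subsequent choice of mlt5_7 leaves only mlt8_10.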


12-10

General option (continued)

Two-trait models: All the models that were scored for significance appear in the window <Total significance>. Single-trait and the selected multi-trait models can be analyzed as usual (including genome-wise significance testing and obtaining report tables and figures). When the user selects two-trait models, a message appears suggesting to choose trait pairs with no traits shared between the pairs.

Then a new button <OK> and a new check button <Chose different trait pairs>, in state <On>, appear in the window <Calculation panel>.

Assume the pair trait1/trait2 is chosen. The chromosomes of this pair that were analyzed for significance will be marked in blue. If, by using <Ctrl>, an additional pair is chosen that includes one of these two traits (e.g., the pair trait1/trait4), a message will be displayed that this choice is impossible.

Clearly, for this example, only the pair trait3/trait4 can be added to the previously selected trait1/trait2.

By pressing the button <OK>, the selected pairs for the current model will be included in the estimation of the Total significance.


12-11

In this example, we considered all two-trait combinations for the single-QTL model. Two-QTL models that were fitted and tested individually for significance should also be considered. Namely, before conducting the Total significance test, both single- and two-linked QTL models should be selected for the chosen non-overlapping trait pairs. This allows us to choose, for each pair of traits, the better of the two fitted models. By choosing Threshold sign (e.g., 0.05) and pressing <Display chosen case>, we obtain a window for choosing the best model for each of the selected trait pairs. We select the linked-QTL model for Chrom2 and the single-QTL model for Chrom3. After that, we get the window <Group two_ trait> only for the selected pairs of traits.


In the presented example, we selected 3 multitrait models.

In the latest version (2.6), the Report document has changed. It now includes PEV and PEVad for each trait, consistent with all other report documents of the system.


Multi_chromosome sets for MIM analysis can be created for single traits, for trait pairs, and for multi-trait complexes. For the same trait, a Multi-set may include different models for different chromosomes (single- and two-QTL). No more than one MIM set can be selected for one trait. Similarly, two-trait and multiple-trait sets should be selected in such a way that, within one round of Total significance analysis, each trait enters the selected sets no more than once. Thus, after the option <Total significance> <After MIM> is chosen, the program displays all the single-trait sets that share the same trait (if such sets exist at all) and allows the user to make the choice.

To assist in the selection of MIM sets, the user can see, for each set, the trait(s), chromosome, employed model, and the calculated single-chromosome significance. Only one of the suggested alternatives can be selected, followed by confirmation (using the <OK> button). If the window is closed without any choice, only traits represented by a single model will remain in the final documents.

FDR analysis after fitting MIM model


If Multi_chromosome sets include two-trait problems, all such sets are offered to the user for choosing sets with non-overlapping traits. In this case, each set is represented by its name and accompanied by trait names. For each set, its chromosomes, models (single- and two-QTL), and significance for each chromosome are displayed. A trait cannot be chosen more than once. Thus, if we select, e.g., set tr2 with traits trait4 and trait5, the window <Next choose> will show only sets whose traits do not overlap with those of the already selected set(s). This process can be repeated. When the selection is finished, we should press button <OK>. If, instead, you close the window using Cancel, none of the two-trait sets will be included in the analysis of significance.


For multi-trait combinations the program automatically detects and displays the sets with non-overlapping traits.

The user can choose any set or a group of sets with different traits by using the information in the window <Next

choose>. As before, the selection of sets should be confirmed by pressing button <OK>, whereas to reject the

chosen sets you just close the window.

All selected sets will be represented by their models in the <Total significance> window. Each of these can be analyzed for significance and characterized by a report table and graphs in the same way as for single-chromosome analysis. Clearly, the results may differ considerably between the single-chromosome and MIM analyses.


Analyzing epistasis between non-homologous chromosomes

This function is implemented for all population types supported so far in the package, as well as for multiple-environment analysis, but is not yet available for multiple-family analysis. It makes sense to move to this analysis after the single-chromosome analysis has been done. In the current version 2.6, the analysis can be conducted only with single-QTL chromosomes for single-trait, two-trait and multiple-trait models. As an example we use the problem for an F2 population with 5 chromosomes and 4 traits, represented in file F2TwoChrom.job. Four models were previously computed here: m1 – single-trait, m1_2 – the same but with two-linked QTLs (hence excluded from this example), m2 – two-trait, and mlt – multi-trait. To call this function, we use the option <TwoChromSet> of the main menu.

A window for chromosome pairs will

appear with lists of all models for all

chromosomes.



Using the window <trait/pair traits> you can choose any trait or trait pair for the analysis, whereas for the multiTraits model this window is not available because ALL TRAITS of the multiple-trait model are then selected. Let us choose model m1, trait3, and chromosomes 2 to 5. By pressing <OK> we obtain a window similar to our standard window, but this time it is built to represent all possible pairs of chromosomes (for the selected trait or trait combination) rather than the usual pattern of chromosomes versus traits.

In this panel we can define a request for computing a certain pair of chromosomes, or several pairs, or all pairs. For that, we should choose, as usual, the desired combination(s) or <All> and then press <Compute>.

This example shows a choice of all

chromosomes from one list (2-5);

naturally, the diagonal cells remain empty,

and the results are symmetric relative to

the diagonal.
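The layout of this panel can be mimicked with a short Python sketch: every unordered pair from the single list is computed once and mirrored across the diagonal, which stays empty. The function `pair_table` and the placeholder `compute` callback are hypothetical; the real two-chromosome computation is not reproduced here.

```python
from itertools import combinations


def pair_table(chromosomes, compute):
    """Build a symmetric table of results for all pairs from one chromosome list."""
    # Start with an empty table; diagonal cells stay None.
    table = {a: {b: None for b in chromosomes} for a in chromosomes}
    for a, b in combinations(chromosomes, 2):
        result = compute(a, b)  # computed once per unordered pair
        table[a][b] = result
        table[b][a] = result  # mirrored across the diagonal
    return table


chromosomes = [2, 3, 4, 5]
pairs = list(combinations(chromosomes, 2))
# Placeholder "analysis": just label the pair instead of fitting a model.
table = pair_table(chromosomes, lambda a, b: f"chrom{a}-chrom{b}")
```

For the four chromosomes 2 to 5 this gives six distinct pairs.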



Another option is to choose two sets of chromosomes, so that epistasis will be analyzed between chromosomes of these lists but not within lists. For that, we press button <Second list> and obtain the second list. Now we should choose chromosomes to define the targeted pairs. The system reports when a chromosome is selected in both lists (this is considered a wrong choice). By pressing <OK> we obtain the panel for the chosen combinations for the two lists. To return to the single-list variant, we should press the button <Second list> again. Note that the window <Computation for chromosome pairs > includes a small window <Variants>. This window lists the sets of chromosomes for the current model and trait (or trait pair). By choosing one of the created variants, we can see which chromosomes it includes (from one or two lists of chromosomes). By pressing <OK> we can see the results for this set. By pressing <Delete>, we can remove this set with all its results.
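The two-list variant described above can be sketched in Python as follows: pairs are formed only between the lists, and a chromosome appearing in both lists is rejected. The function name and the error message are hypothetical; the sketch only illustrates the selection rule.

```python
from itertools import product


def cross_list_pairs(first_list, second_list):
    """Return all pairs (a, b) with a from the first list and b from the second."""
    shared = set(first_list) & set(second_list)
    if shared:
        # Mirrors the "wrong choice" reported when a chromosome is in both lists.
        raise ValueError(f"chromosome(s) selected in both lists: {sorted(shared)}")
    return list(product(first_list, second_list))


pairs_between = cross_list_pairs([2, 3], [4, 5])
```

Here chromosomes 2 and 3 are each paired with chromosomes 4 and 5, while the within-list pairs 2-3 and 4-5 are not analyzed.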

In the process of choosing the pairs of chromosomes, you

may be interested to see the results of single-chromosome

analysis. This is provided by pressing <View of the single

chromosome results>. Repeated pressing of this button

removes the single-chromosome window.

The panel with the two-chromosome results can be printed or saved to an Excel file.




The window includes a table

of results for all pairs of

intervals of the analyzed pair

of chromosomes. In

particular, the estimates of

the effects can be displayed

for each pair of intervals.

By selecting a result from the computation panel (e.g., for the pair chrom1-chrom2), we obtain a window similar to the window for the two-linked QTL model from our standard analysis for one chromosome. It is accompanied by a menu that allows using all functions described earlier for linked-QTL analysis.

Please note that the main (default) model in this analysis assumes epistasis (and includes epistatic parameters) between the analyzed chromosomes.




Let us select the option <Estimates> of the menu. We then obtain a window very similar to that of the two-linked QTL model for a single chromosome.



The window of the Permutation test is analogous to that of the single-chromosome analysis, whereas the window for the re-sampling (bootstrap and jackknife) function differs slightly from that of single-chromosome analysis.



Testing H2:H1 is not among the functions of option <Significance>. This is based on the assumption that pairs of chromosomes to be tested for epistasis include only chromosomes with a significant individual effect on the trait(s); hence both members of the tested pair should affect the trait(s).



As in single-chromosome analysis, options <Global Resampling> and <Global permutation test> are available in the two-chromosome analysis.

For multiple-trait complexes, the two-chromosome analysis does not include the function Traits Contribution, which has a more exploratory character.




Here we demonstrate how to create sub-models and compare models. The example is for the pair chrom4 and chrom5.

The default model (with epistasis) is highlighted in red. We can see that the epistatic parameters e1-e4 are small (relative to d1, d2, h1 and h2). Is epistasis between chrom4 and chrom5 significant? We create a sub-model assuming no epistasis (green).

Now we can compare the models.
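One common way to compare such nested models is a likelihood-ratio test. The sketch below is an assumption, not a description of the <Compare> option itself, which may rely on a different procedure (for example permutation or re-sampling tests). It assumes that LOD is the base-10 log-likelihood ratio, that the QTL positions are fixed, and that the sub-model removes the four epistatic parameters e1-e4; the LOD values are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def compare_nested(lod_full, lod_sub, extra_params):
    """Likelihood-ratio statistic and asymptotic p-value for a nested sub-model."""
    lrt = 2.0 * math.log(10.0) * (lod_full - lod_sub)  # convert LOD difference to LRT
    return lrt, chi2_sf_even_df(lrt, extra_params)


# Hypothetical LOD scores: full model with epistasis vs. sub-model without it.
lrt, p_value = compare_nested(lod_full=6.0, lod_sub=5.2, extra_params=4)
```

A large p-value would suggest that dropping the epistatic parameters does not worsen the fit noticeably, in line with small estimates of e1-e4.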




Thus, all main functions employed for epistasis within a chromosome (for linked QTLs) are available for epistasis analysis of QTLs from non-homologous chromosomes.

Closing the <Computation of the pair chromosome> window returns the standard panel of single-chromosome analysis. Using options <Save> or <Save As>, or closing the problem, results in saving all results of the analysis of chromosome pairs. In the next round of analysis with the saved job file, we can see all the variants and results of the analysis conducted using function <TwoChromSet>. This information can be seen in the window <Computation of the pair chromosome>.

Important note:

In the new version of the package, the information resulting from the analysis of epistasis between non-homologous chromosomes is stored in a special file with the same name but a different extension (*.mim). For more details see section “Separate storage of the results”.




Data input for half-sibs population

The new version 2.6 allows analyzing data of the half-sib mapping design that is widely used in dairy cattle genetics. The usual way to treat such data is via the “backcross” model. In such a case, the estimated effect is half of the true additive effect. It is noteworthy that in MultiQTL, the output of the substitution effect is d=(XQQ-Xqq)=2D (which is twice the additive effect). Thus, in the case of the half-sib design, our d represents exactly the estimate of the additive effect.

Another specific aspect of using MultiQTL for the half-sib mapping design concerns the data input. For input, the data should be prepared in Excel format. In the input table (file *.xls), the first line contains the names of the markers and quantitative traits. The second line gives the names of the sire’s alleles. The remaining lines of the table represent the progeny alleles. Thus, the number of lines is equal to the progeny size plus 2. High flexibility is allowed for allele designation: alleles can be numbers, characters, or mixed (among marker loci). For example: 200 190, A B, 200 A. The two alleles of a diploid progeny can be separated, e.g., by a space, comma, or slash, or presented without any division symbol, e.g., AB.
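The table layout described above (one header line, one sire line, then one line per progeny) can be checked with a small Python sketch. The function name, the marker and trait names, and the sample values are all hypothetical, and the rows are assumed to have been read from the spreadsheet already; in particular, leaving the trait cell of the sire line empty is an assumption of this example.

```python
def split_half_sib_table(rows):
    """Split table rows into (header, sire line, progeny lines) and check the shape."""
    if len(rows) < 3:
        raise ValueError("need a header line, a sire line, and at least one progeny line")
    header, sire, progeny = rows[0], rows[1], rows[2:]
    # Every line must have one cell per column named in the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no} has {len(row)} cells, expected {len(header)}")
    return header, sire, progeny


rows = [
    ["M1", "M2", "M3", "milk"],            # marker and trait names
    ["200 190", "A B", "200 A", ""],       # sire's alleles
    ["200 200", "A A", "200 200", "31.5"], # progeny 1
    ["190 200", "B A", "A 180", "28.9"],   # progeny 2
]
header, sire, progeny = split_half_sib_table(rows)
```

The number of progeny lines is the total number of lines minus 2, matching the rule stated above.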

Data input is conducted as follows:

We start by calling the data table. For that, while creating a new project, we choose the option half sibs in Population Type. After pressing button Import Data, the data format xls is selected and then, on the tree of folders and files, we choose the required file. This selection makes the data table appear on the screen. The next steps should transform this data set into “backcross”-type data.



The data table looks as follows (screenshot with numbered callouts 1-8, which are referenced in the steps below):


- We should mark the line of marker names (1), mark the line Column Names (2) in box <Selected String>, and then, by pressing button <Take> (3), enter the marker names (and trait names, if present in the same file) into MultiQTL.

- Now we click on the line with the parental alleles (4), then click on Sire’s genotype (5) in box <Selected String>, and then press button <Take> to enter the names of the sire’s alleles.

- By choosing each marker with the button corresponding to the <Name of marker> (6) and then pressing button <Take>, we transform the information in the column into the internal format of MultiQTL. However, this input will be correct only for markers with a space as the division symbol. If no division symbol is present, then, before pressing button <Take>, we should mark <No delimiter> (7). If the division symbol is a comma or slash, then, before pressing button <Take>, we should enter the corresponding division symbol as <Allele delimiter> (8). After finishing the input of each marker, we should confirm it by pressing <OK>.
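The three delimiter cases in the last step can be illustrated with a Python sketch. It is not MultiQTL's parser: the function name is hypothetical, and for the no-delimiter case the sketch assumes that both alleles have the same number of characters (as in AB), which the manual does not state.

```python
def split_alleles(cell, delimiter=" "):
    """Split one genotype cell into two alleles.

    delimiter: " " (default), "," or "/", or None when no division symbol is used.
    """
    text = cell.strip()
    if delimiter is None:
        # Assumption: without a delimiter, the two alleles are of equal length.
        if not text or len(text) % 2:
            raise ValueError(f"cannot split {cell!r} into two equal-length alleles")
        half = len(text) // 2
        return text[:half], text[half:]
    if delimiter == " ":
        parts = text.split()  # tolerates repeated spaces
    else:
        parts = [part.strip() for part in text.split(delimiter)]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected two alleles in {cell!r}")
    return parts[0], parts[1]


space_case = split_alleles("200 190")        # default: space as division symbol
slash_case = split_alleles("A/B", "/")       # corresponds to <Allele delimiter>
plain_case = split_alleles("AB", None)       # corresponds to <No delimiter>
```

Choosing the wrong delimiter raises an error instead of silently producing a wrong genotype, which is the situation the manual warns about for markers that do not use a space.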



Building submodels with variance effect for multiple-environment and multiple-family data (to parts 8-9)

In the previous version, there was no possibility to create models and submodels with variance effect for multiple-environment (and multiple-family) data. In the current version, it is possible to create models with variance effect for single-trait analysis. Likewise, for a model with no variance effect, the variance effect can be defined for selected or all environments by using the corresponding sub-model options.

During creation of a model, you can define a special model (that will differ from the default one) by switching off the indicated function.

Then, upon adding a new model, a new window <Submodel> opens, allowing you to set the parameters of the special model. The same window is used for defining sub-models if the created multiple-environment model was without variance effect (by default). For example, we may have data for two environments with a single-trait model without variance effect.

We can use option <Submodel Add> to create several submodels with variance effect.



For that, in the panel <Submodel>, the check button <no variance effect> should be switched to state <Off>. Then, choose both environments in the list <Environments> and press buttons <Apply> and <Ok> to obtain the new solution (green graph).

You can also choose only one of the environments from the list. For each of these two choices you will then get a separate graph: yellow (variance effect allowed for the second but not the first environment) and blue (variance effect allowed for the first but not the second environment). Using the <About submodel> option, you can see the description of the submodels.
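The submodels discussed in this example correspond to the possible subsets of environments in which the variance effect is allowed. A minimal Python sketch of this enumeration follows; the function name and the environment labels are hypothetical.

```python
from itertools import combinations


def variance_effect_submodels(environments):
    """List every assignment of 'variance effect allowed' across the environments."""
    submodels = []
    for size in range(len(environments) + 1):
        for subset in combinations(environments, size):
            # True means the variance effect is allowed in that environment.
            submodels.append({env: env in subset for env in environments})
    return submodels


submodels = variance_effect_submodels(["env1", "env2"])
```

For two environments this yields four cases: no variance effect (the default model), variance effect in the first environment only (blue), in the second only (yellow), and in both (green).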



References

Our algorithms are based on theoretical papers of the QTL mapping community and on our own publications. A list of our relevant publications was provided in Part 1, pages 12-13. Here we provide references to all other papers cited in any of Parts 1-12.

Benjamini Y. and Hochberg Y. 1995, Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Stat. Soc., Ser. B 57: 289-300.

Crow J.F. 1990, Mapping functions. Genetics 125: 669-671.

Darvasi A. and Soller M. 1992, Selective genotyping for determination of linkage between a marker locus and a quantitative trait locus. Theor. Appl. Genet. 85: 353-359.

Churchill G.A. and Doerge R.W. 1994, Empirical threshold values for quantitative trait mapping. Genetics 138: 963-971.

Jansen J., de Jong A.G., and van Ooijen J.W. 2001, Constructing dense genetic linkage maps. Theor. Appl. Genet. 102: 1113-1122.

Jansen R.C., Van Ooijen J.W., Stam P., Lister C., and Dean C. 1995, Genotype-by-environment interaction in genetic mapping of multiple quantitative trait loci. Theor. Appl. Genet. 91: 33-37.

Jansen R.C. and Stam P. 1994, High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447-1455.

Jiang C. and Zeng Z.-B. 1995, Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111-1127.


Jiang C. and Zeng Z.-B. 1997, Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 47-58.

Kao C.-H., Zeng Z.-B., and Teasdale R.D. 1999, Multiple interval mapping for quantitative trait loci. Genetics 152: 1203-1216.

Lander E.S., Green P., Abrahamson J., et al. 1987, MapMaker: An interactive computer package for constructing genetic linkage maps of experimental and natural populations. Genomics 1: 174-181.

Lebreton C.M. and Visscher P.M. 1998, Empirical nonparametric bootstrap strategies in quantitative trait loci mapping: Conditioning on the genetic model. Genetics 148: 525-535.

Luo Z.W., Wu C.-I., and Kearsey M.J. 2002, Precision and high-resolution mapping of quantitative trait loci by use of recurrent selection, backcross and intercross schemes. Genetics 161: 915-929.

Stam P. 1993, Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3: 739-744.

Walling G.A., Visscher P.M., and Haley C.S. 1998, A comparison of bootstrap methods to construct confidence intervals in QTL mapping. Genet. Res. 71: 171-180.

Zeng Z.-B. 1994, Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.

Zeng Z.-B., Kao C.-H., and Basten C.J. 1999, Estimating the genetic architecture of quantitative traits. Genet. Res. 74: 279-289.


Good luck!

We will be glad to receive your comments, criticisms and suggestions.
