20
Copyright Ó 2007 by the Genetics Society of America DOI: 10.1534/genetics.107.074377 Predicting the Size of the Progeny Mapping Population Required to Positionally Clone a Gene Stephen J. Dinka,* Matthew A. Campbell, Tyler Demers* and Manish N. Raizada* ,1 *Department of Plant Agriculture, University of Guelph, Guelph, Ontario, Canada N1G 2W1 and The Institute for Genomic Research, Rockville, MD, 20850 Manuscript received April 10, 2007 Accepted for publication June 4, 2007 ABSTRACT A key frustration during positional gene cloning (map-based cloning) is that the size of the progeny mapping population is difficult to predict, because the meiotic recombination frequency varies along chromosomes. We describe a detailed methodology to improve this prediction using rice (Oryza sativa L.) as a model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop- ulation size by comparing these theoretical estimates to 41 successful positional cloning attempts. We then used each validated equation to test whether neighborhood meiotic recombination frequencies extracted from a reference RFLP map can help researchers predict the mapping population size. We developed a meiotic recombination frequency map (MRFM) for 1400 marker intervals in rice and anchored each published allele onto an interval on this map. We show that neighborhood recombination frequencies (R-map, .280-kb segments) extracted from the MRFM, in conjunction with the validated formulas, better predicted the mapping population size than the genome-wide average recombination frequency (R-avg), with improved results whether the recombination frequency was calculated as genes/cM or kb/cM. Our results offer a detailed road map for better predicting mapping population size in diverse eukaryotes, but useful predictions will require robust recombination frequency maps based on sampling more progeny. A limited number of forward genetics techniques exist to isolate an allele that underlies a mutant or polymorphic phenotype and that require no prior knowledge of the gene product. These include pro- tocols to isolate host DNA flanking insertional muta- gens (e.g., transposons) (Ballinger and Benzer 1989; Raizada 2003) and positional gene cloning techniques (Botstein et al. 1980; Paterson et al. 1988; Tanksley et al. 1995) that permit the discovery of alleles created by chemical mutagens, radiation, or natural genetic var- iation. Positional gene cloning is feasible when the fol- lowing conditions are met: (1) two parents exist that differ in a trait of interest; (2) the parents can be distin- guished at the chromosome level by polymorphic DNA markers (e.g., RFLP); and (3) in a population of progeny, the underlying gene can be mapped relative to nearby DNA segments that have previously been cloned (Botstein et al. 1980; Tanksley et al. 1995). Unfortunately, posi- tional gene cloning suffers from unpredictability in terms of the number of post-meiotic progeny that a researcher can expect to genotype to narrow a candidate chromosomal region to a small number of candidate genes (Dinka and Raizada 2006). For example, in rice (Oryza sativa L.), only 1160 gametes were genotyped to narrow the Pi36(t) allele to a resolution of 17 kb (Liu et al. 2005), whereas 18,944 gametes were genotyped to map the Bph15 allele to a lower resolution of 47 kb (Yang et al. 2004). During fine mapping, the physical distance between a known physical location on a chro- mosome (i.e., the molecular marker) and the target allele is inferred by the frequency of meiotic recombinants that can break cosegregation of the phenotype encoded by the target allele with physically anchored molecular markers (Botstein et al. 1980; Paterson et al. 1988). Ideally, a gene hunt ends once a molecular marker is found that always cosegregates with the target phenotype in a large population of genotyped and phenotyped F 2 (or post-F 2 ) progeny. Therefore, the frequency of mei- otic recombination in the vicinity of the target locus (defined as R ¼ kilobase/cM), along with the local den- sity of molecular markers, determines the size of the mapping population. We are interested in helping re- searchers predict mapping population size. As initial analysis assigns a target allele to a 1–5-cM map interval, the goal of this study is to determine whether the re- combination frequency at this interval size, obtained from a high-density molecular marker map, can be used to predict the number of progeny required for subsequent sub-centimorgan mapping in combination with user- friendly mathematical formulas. Durrett et al. (2002) used the kb/cM ratio (R) as the basis of an equation (which we will refer to as the Durrett–Tanksley equation) to predict genotyping 1 Corresponding author: Department of Plant Agriculture, University of Guelph, 50 Stone Rd., Guelph, Ontario, Canada N1G 2W1. E-mail: [email protected] Genetics 176: 2035–2054 (August 2007)

Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

Copyright � 2007 by the Genetics Society of AmericaDOI: 10.1534/genetics.107.074377

Predicting the Size of the Progeny Mapping Population Required toPositionally Clone a Gene

Stephen J. Dinka,* Matthew A. Campbell,† Tyler Demers* and Manish N. Raizada*,1

*Department of Plant Agriculture, University of Guelph, Guelph, Ontario, Canada N1G 2W1 and†The Institute for Genomic Research, Rockville, MD, 20850

Manuscript received April 10, 2007Accepted for publication June 4, 2007

ABSTRACT

A key frustration during positional gene cloning (map-based cloning) is that the size of the progenymapping population is difficult to predict, because the meiotic recombination frequency varies alongchromosomes. We describe a detailed methodology to improve this prediction using rice (Oryza sativa L.) as amodel system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop-ulation size by comparing these theoretical estimates to 41 successful positional cloning attempts. We thenused each validated equation to test whether neighborhood meiotic recombination frequencies extractedfrom a reference RFLP map can help researchers predict the mapping population size. We developed ameiotic recombination frequency map (MRFM) for �1400 marker intervals in rice and anchored eachpublished allele onto an interval on this map. We show that neighborhood recombination frequencies(R-map, .280-kb segments) extracted from the MRFM, in conjunction with the validated formulas, betterpredicted the mapping population size than the genome-wide average recombination frequency (R-avg),with improved results whether the recombination frequency was calculated as genes/cM or kb/cM. Ourresults offer a detailed road map for better predicting mapping population size in diverse eukaryotes, butuseful predictions will require robust recombination frequency maps based on sampling more progeny.

A limited number of forward genetics techniquesexist to isolate an allele that underlies a mutant or

polymorphic phenotype and that require no priorknowledge of the gene product. These include pro-tocols to isolate host DNA flanking insertional muta-gens (e.g., transposons) (Ballinger and Benzer 1989;Raizada 2003) and positional gene cloning techniques(Botstein et al. 1980; Paterson et al. 1988; Tanksley

et al. 1995) that permit the discovery of alleles createdby chemical mutagens, radiation, or natural genetic var-iation. Positional gene cloning is feasible when the fol-lowing conditions are met: (1) two parents exist thatdiffer in a trait of interest; (2) the parents can be distin-guished at the chromosome level by polymorphic DNAmarkers (e.g., RFLP); and (3) in a population of progeny,the underlying gene can be mapped relative to nearbyDNAsegmentsthathavepreviouslybeencloned(Botstein

et al. 1980; Tanksley et al. 1995). Unfortunately, posi-tional gene cloning suffers from unpredictability interms of the number of post-meiotic progeny that aresearcher can expect to genotype to narrow a candidatechromosomal region to a small number of candidategenes (Dinka and Raizada 2006). For example, in rice(Oryza sativa L.), only 1160 gametes were genotyped

to narrow the Pi36(t) allele to a resolution of 17 kb (Liu

et al. 2005), whereas 18,944 gametes were genotypedto map the Bph15 allele to a lower resolution of 47 kb(Yang et al. 2004). During fine mapping, the physicaldistance between a known physical location on a chro-mosome (i.e., themolecularmarker)andthetargetalleleis inferred by the frequency of meiotic recombinants thatcan break cosegregation of the phenotype encoded bythe target allele with physically anchored molecularmarkers (Botstein et al. 1980; Paterson et al. 1988).Ideally, a gene hunt ends once a molecular marker isfound that always cosegregates with the target phenotypein a large population of genotyped and phenotyped F2

(or post-F2) progeny. Therefore, the frequency of mei-otic recombination in the vicinity of the target locus(defined as R¼ kilobase/cM), along with the local den-sity of molecular markers, determines the size of themapping population. We are interested in helping re-searchers predict mapping population size. As initialanalysis assigns a target allele to a 1–5-cM map interval,the goal of this study is to determine whether the re-combination frequencyat this interval size,obtainedfroma high-density molecular marker map, can be used topredict the number of progeny required for subsequentsub-centimorgan mapping in combination with user-friendly mathematical formulas.

Durrett et al. (2002) used the kb/cM ratio (R) as thebasis of an equation (which we will refer to as theDurrett–Tanksley equation) to predict genotyping

1Corresponding author: Department of Plant Agriculture, University ofGuelph, 50 Stone Rd., Guelph, Ontario, Canada N1G 2W1.E-mail: [email protected]

Genetics 176: 2035–2054 (August 2007)

Page 2: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

requirements during positional cloning, the only suchequation we could find in the literature. Durrett et al.compared the results of their equation to empiricalevidence from 12 published positional cloning suc-cesses in Arabidopsis thaliana; the model often appearedto overestimate the number of progeny required to begenotyped. However, the accuracy of the model wasdifficult to assess, because only the genome-wide re-combination frequency was employed, rather than localrates of recombination. Perhaps as a result, it was simplyconcluded that some researchers were lucky or unlucky(Durrett et al. 2002).

Building upon the work of Durrett et al., we havetried to understand and predict when a researcher willbe lucky or unlucky during positional gene cloning byaccounting for: (1) over-genotyping (resulting in re-dundant crossovers between the target locus and theclosest molecular markers); (2) a low density of availablemolecular markers in the target interval (causing somecrossovers to be missed); and most important, (3) highor low local rates of local recombination (R) comparedto the genome-wide average (Nachman 2002). We havecompared the predictions of the Durrett–Tanksleyequation to empirical data obtained from 41 positionalcloning studies in rice (O. sativa L.), which is a modelsystem for the world’s most important crops, the cereals(Paterson et al. 2005). Specifically, we have measuredthe predictability of the Durrett–Tanksley equation andthen focused on whether ‘‘neighborhood’’ (,2 cM) re-combination values obtained from a reference geneticmap (Harushima et al. 1998) further improve the ac-curacy of the model compared to using the genome-wide average recombination rate (R-avg). In addition,we have derived and tested a simpler equation thatpredicts progeny mapping size. Finally, we have mea-sured the utility of employing R-values calculated asgenes/cM rather than kb/cM to predict mapping pop-ulation size, as the former allows the candidate genenumber to be estimated, which is of greater interest toresearchers targeting sequenced, annotated genomes.

MATERIALS AND METHODS

Use and modification of the Durrett–Tanksley equation:First, we used the Durrett–Tanksley equation (Durrett et al.2002) which estimates the number of F2/post-F2 meioticgametes required to positionally clone an allele as derivedfrom an F1 heterozygote, based on the following probability:

P ¼ 1� ½1 1 NT=ð100RÞ�e�NT=ð100RÞ;

where P is probability (P) that if a (proximal) crossover occursin the vicinity of a target allele that a second (distal) crossoverwill be carried by a sibling gamete; N is number of genotypedchromosomes (informative gametes) required; T is map reso-lution, the candidate kilobase or gene block distance betweenthe closest two molecular markers containing the targetallele; and R is recombination frequency (kb/cM or genes/cM).

As the equation is dependent only on the value NT/100R,then if the probability is set at 0.95, NT/100R ¼ 4.744, whichmay be rewritten as N ¼ (4.744 3 100R)/T.

To adjust for the target number of gametes containing aninformative crossover (lT), which we assume may decrease T(better map resolution), we introduced the empirically-derived T modifier, 4.744/lT (see results); the resultingmodified Durrett–Tanksley equation is as follows:

N ¼ (4.744 3 100R)/½T-marker 3 (4.744/lT)�,or simplified,

N ¼ ð100R 3 lTÞ=T -marker ;

where N is total number of informative chromosomes (game-tes) that must be genotyped with the probability of success setat P ¼ 0.95, R is the local recombination frequency (R-local)(kb/cM or genes/cM), T-marker is distance between the closesttwo molecular markers (in which crossovers are detected rel-ative to the target allele) (kilobases or gene block), and lT isnumber of crossovers between the closest two molecularmarkers ($2).

The Durrett–Tanksley equation assumes that the recombi-nation frequency (R) is constant in the vicinity T of the tar-get allele. This equation also requires that the genotype of thetarget allele (a) in F2/post-F2 progeny can be assigned. Thus,in the case of a recessive target allele, N equals the number ofF2 testcross progeny. Alternatively, where F2 progeny are theproduct of selfing F1 heterozygotes (such as in plants), thensince each F2 progeny is derived from two meioses, N equalstwo times the number of F2 progeny genotyped; this is onlytrue, however, when the F2 progeny genotype AA can be dis-tinguished from the genotype Aa since this is required todetermine whether a crossover occurred on the proximal ordistal side of the target allele. Such a determination requirestesting progeny for segregation of phenotypes in the F3 gen-eration (progeny testing).

Derivation of a simplified equation based on single-crossover probability: We developed the following user-friendly equation to estimate the fine-mapping populationsize, an estimate of the number of F2 testcross progeny re-quired to be genotyped to detect sufficient crossovers toachieve a desired kilobase or gene block resolution:

N ¼ Log ð1� PÞ=Logð1� T -marker=100RÞ;

where N is the number of meiotic gametes (chromosomes)that must be genotyped in which it can be determined whethera crossover is located proximal or distal to the target allele, P isthreshold probability of success (e.g., 0.95), T-marker is expecteddistance between flanking molecular markers (kilobases or can-didate genes), and R is local or genome-wide average recom-bination frequency (kb/cM or genes/cM).

This equation was based on the assumption that if a cross-over occurs in a segment (with length T) on the proximal sideof a target allele in a large population of F2 progeny (N), thenthere is an equal chance that a recombination event will becarried by a sibling F2 gamete on the distal side within adistance of ,T from the target allele as shown in Figure 1B.Hence, because the probability of only a single recombinationevent occurring within the mapping population must becalculated, the equation is simplified. However, it is recog-nized that the distance between the two crossovers will rangefrom zero to 2T; on average, however, the distance will beT, and likely ,T when there are more than two informativecrossovers and/or when the molecular marker resolution islimiting. However, since the majority of positional cloningstudies report more than two informative crossovers (l) (seeTable 2), and since the minimum distance between flanking

2036 S. J. Dinka et al.

Page 3: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

molecular markers (T-marker) is often limiting, then the prob-ability is high that the distance between the closest two cross-overs will be ,T-marker.

The detailed derivation of this equation is as follows:

1. P (failure) of a crossover in the target interval (T) pergamete ¼ (total genome crossovers � target interval cross-overs)/total genome crossovers.

2. Alternatively, P (failure) per gamete ¼ 1 � (fraction ofgenome 3 number of crossovers in whole genome).

3. Thus, P (failure) per gamete ¼ 1 � ½(kb resolution/kbgenome size 3 (genome map in cM/100)� or P(failure) pergamete ¼ 1 � ½(gene block resolution/genome-wide genenumber 3 (genome map in cM/100)�.

4. Since P (failure) ¼ (P failure per gamete)N, where N isnumber of informative gametes, then

N ¼ Log ðP failÞ=Log ðP fail per gameteÞand

N ¼ Log ð1� P successÞ=Log ðP fail per gameteÞ:

5. Therefore, N ¼ Log (1 � Psuccess)/Log ½1 � (gene block/genome gene number 3 genome map cM/100)� or N¼ Log(1 � Psuccess)/Log ½1 � (kb target/genome kb 3 genomemap cM/100)�.

6. Simplified, the above equation can be rewritten as:

N ¼ Log (1 � Psuccess)/Log ½1 � (kb target/100) 3 (totalcM/total genome kb)�,

or

N ¼ Log ð1� PÞ=Log ð1� T -marker=100RÞ;

where R is local or genome-wide recombination frequency.Additional assumptions of this model are as follows:

1. The equation assumes that the phenotype of the trait ofinterest can be readily scored to determine if a crossoveroccurred proximal or distal to the target allele; hence N isequivalent to the number of testcross progeny, 0.5 3 thenumber of F2 (selfed) progeny (if no progeny testingperformed), or 2 3 the number of F2 (selfed) progeny (ifF3 progeny testing is performed).

2. The equation assumes that the frequency of double-recombinants in a small interval is negligible due tocrossover interference.

3. The equation assumes that the crossover may occur any-where in the defined interval T such that the distance be-tween each informative crossover and the target locus is ,T.

4. The recombination frequency is assumed to be constant inthe region ,2T.

Modified single crossover equation: Based on empiricaldata, we then modified this equation by adjusting the geneticmap resolution T by the number of crossovers (see results),resulting in the equation:

N ¼ Log ð1� PÞ=Logf1� ½T -marker 33=lT�=100Rg;

where N is total number of informative chromosomes that mustbe genotyped with the probability of success, P¼ 0.95, R is thelocal recombination frequency (R-local) (kb/cM or genes/cM),T-marker is distance (kb or candidate gene block) between theclosest two molecular markers (in which crossovers are de-tected relative to the target allele), and lT is number of cross-overs between the closest two molecular markers ($2).

Analysis of published positional cloning studies: We ana-lyzed 41 published positional cloning/fine-mapping studies inrice to extract or calculate the three variables, N, T, and R

(Table 1). The candidate gene resolution (T) ½in kb or genenumber, T(kb) or T(gene)� was either reported in each studyor obtained by personal communication with the authors. Inthe latter case, these were confirmed by corroborating thekilobase resolution with the gene resolution using the TIGRPseudomolecules Release 4.0 database (Yuan et al. 2005);retroelements, transposons, and transposases were excludedfor gene resolution. The calculation of N gametes genotypedwas more complex; it required us to distinguish the actualnumber of progeny genotyped (g) from the number ofinformative chromosomes (N), defined as chromosomes thathad the potential of having a crossover between the tar-get allele and a flanking molecular marker, and where thelocation of that crossover (proximal or distal to the target) wasdistinguished (e.g., using progeny testing). To convert g to N,we multiplied g by a meiosis factor ( f ) as shown in Table 1 (alsosee footnotes to Table 1). This required us to classify themapping strategy used and note whether the target trait wasdominant, recessive, or was expressed in the haploid genera-tion (gamete or gametophyte). For example, for the cloning ofthe recessive bc1 allele (Y. Li et al. 2003), since only F2 recessiveprogeny were genotyped (7068 recessives genotyped out of30,000 F2 progeny) and hence the genotype of the target allelewas non-ambiguous, the total number of informative chromo-somes genotyped was 2 3 7068 (i.e., f¼ 2, hence N¼ 2 3 g). Incontrast, for the fine mapping of the dominant Psr1 allele(Nishimura et al. 2005), since 3800 (Backcross 3, BC3) F1

progeny were genotyped, and thus only 50% of the targetchromosomes underwent informative meioses, then f ¼ 0.5,and N ¼ 1900 informative chromosomes. For rice, it wasassumed that males and females had equal rates of recombi-nation, but in many species, such as zebrafish, this is not true(Singer et al. 2002; Lenormand and Dutheil 2005) and mustbe accounted for in the meiosis factor. Finally, to calculate thelocal recombination frequency (R-local) (Table 2), we usedthe following equation:

R -local ¼ T ðlocalÞ=mðlocalÞ;

where R is local recombination frequency (kilobases/cM),T(kb) is distance in kilobases between the closest two cross-overs, m is genetic map distance between the two crossovers incentimorgans, and m¼ 100 3 (l1 1 l2)/N, where l1 is numberof closest, proximal crossovers (Table 2), l2 is number of closest,distal crossovers (Table 2), and N is total number of informa-tive gametes (chromosomes) genotyped (Table 1). In a testcross,m ¼ 100 3 l/progeny, whereas in a selfed cross with progenytesting, m¼ 100 3 (l/2 3 progeny) since genotyping permitsboth chromosomes to contribute to the mapping population.

The only crossovers (lT) in the calculation were those thatwere in between the two molecular markers used to define T.For each of the 41 studies, we applied the values for R(local),T(kb) and set P at 0.95, to the Durrett–Tanksley equation andcompared the number of informative gametes (N) required bythis equation to the empirical numbers shown in Table 1. Weperformed both nonparametric correlation analysis (Spear-man coefficient) and linear regression analysis using thesoftware program Instat 3 (GraphPad Software).

Generation of a reference meiotic recombination frequencymap (MRFM) for rice: To determine whether recombinationfrequencies derived from a reference genetic map could beused to predict progeny sampling requirements using theDurrett–Tanksley equation, we first assembled such a map,inspired by a previous report (Wu et al. 2003), to generate twotypes of recombination values: R(gene), in genes/cM; andR(kb), in kilobases/cM (see supplemental Table 1 at http://www.genetics.org/supplemental/). The names and GenBankaccession numbers of RFLP markers genetically mapped in an

Predicting Positional Gene Cloning 2037

Page 4: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

F2 population between Nipponbare and Kasalath were ob-tained from the Rice Genome Project (RGP: http://rgp.dna.affrc.go.jp/) (Harushima et al. 1998). FASTA sequence filesfor the markers were obtained from NCBI. The RFLP markersequences from the RGP map were physically mapped ontothe version 4 TIGR rice pseudomolecules map (http://www.rice.tigr.org) using the Genomic Mapping and AlignmentProgram (GMAP) (Wu and Watanabe 2005). The physicalmap position of each marker was derived from the top hit thatexceeded a threshold of 95% identity over 90% of the length.After physically positioning the RFLP markers onto the pseu-domolecules, Perl scripts and manual inspection were used toremove all markers showing map incongruency (where thephysical and genetic position of the markers were at odds). Weobtained 1391 congruent markers for the RGP map. This es-tablished both physical and genetic locations and hence in-terval distances for each RFLP marker; from these values, thekb/cM recombination frequency was calculated for each markerpair. To generate the corresponding genes/cM frequencies,we queried the Osa1 database at TIGR: the coordinates of all42,535 non-transposable element-related transcription unitswere obtained (Yuan et al. 2005). Custom Perl scripts werewritten to bin these transcription units between each RFLPmarker pair. This established the number of non-transposableelement candidate genes for each interval along with thegenetic locations of these markers, and hence the followingparameters were calculated for each RFLP marker pair: thegenetic distance between each marker and the correspondinggenes/cM recombination rate.

Testing the predictive value of the Modified Durrett–Tanksley equation using R-map recombination frequencies:Next we assigned each target allele to a physical location onthe RGP physical map, which contains 1400 marker intervals.To accomplish this, each target allele was assigned a TIGRlocus number (if cloned) onto a BAC/PAC clone (if not cloned;TIGR Pseudomolecules Release 4.0); sometimes this informa-tion was published. In remaining examples, the GenBankgene sequence or molecular marker information was used toscreen the TIGR rice sequence database; the genetic mapposition, marker data, and BAC/PAC assignment helped toverify the physical assignment. The locus or BAC/PAC nameand sequence was then used to assign each allele to an intervalbetween two mapped markers on the RGP MRFM of rice(Table 2; supplemental Table 1 at http://www.genetics.org/supplemental/). The recombination frequency of the corre-sponding marker interval (R-map) was then employed; becausewe feared that chance crossovers might distort the recombina-tion frequency in small intervals (,277 kb, 1-cM average) onthis map, adjacent segments were sometimes added together(to achieve a .280-kb interval) before calculating an averageR-map value with the goal of situating the target allele at thephysical center of the larger interval. In rare situations, anR-map value for an interval of ,280 kb was accepted becauseadjacent intervals were unusually large. The choice to add ornot add marker intervals was done blindly from the R-localvalues in order to not bias R-map values. The R-map valueswere then applied to each equation.

Calculation of R-avg values: The genome-wide averagerecombination frequency in kilobases/cM was calculated bydividing the total genome size (�430 Mb) (IRGSP 2005) bythe total genetic map length (�1521 cM) (Harushima et al.1998); the average recombination frequency in genes/cMwas calculated by dividing the total number of non-transpos-able element-encoded transcription units (�42,535) (Yuan

et al. 2005) by the map length. The resulting genome-widerecombination frequency (R-avg) in rice is 277 kb/cM and 28genes/cM.

RESULTS

Initial equations to predict mapping population size:Initially, we employed two equations to predict the sizeof the fine-mapping population, one of which is de-veloped here. First, we used the Durrett–Tanksley equa-tion (Durrett et al. 2002), which estimates the numberof F2/post-F2 meiotic gametes required to positionallyclone an allele as generated from an F1 heterozygote; itcalculates the probability (P) that if a (proximal) cross-over occurs in the vicinity of a target allele that a second(distal) crossover will be carried by a sibling gamete,such that the distance between the two crossovers will bethe kilobase distance T (Figure 1A), for a prescribednumber of genotyped gametes (N) (informative chro-mosomes) and for a given recombination frequency (R),according to the following equation:

P ¼ 1� ½1 1 NT=ð100RÞ�e�NT=ð100RÞ:

The primary assumption of the equation is that theprogeny number will vary with the recombination fre-quency: the higher the frequency of recombination, thefewer progeny will be required to detect a crossover be-tween the target allele and flanking molecular markers.See materials and methods for additional details.

We then derived a second equation with the goal ofmaking it more user-friendly for researchers. This equa-tion was based on the following premise: if a crossoveroccurs in a segment (with length T ) on the proximalside of a target allele in a large population of F2 progeny(N), then there is an equal probability that a siblinggamete will carry a crossover on the distal side within adistance of ,T from the target allele as shown in Figure

Figure 1.—An explanation of how the map resolution (T)was calculated for the equations used in this study. (A) TheDurrett–Tanksley equation calculates the probability thattwo sibling post-meiotic (F2) gametes will carry informativecrossovers: a proximal crossover occurring (X) flanking thetarget allele (solid line) and a second, distal crossover occur-ring at a distance ,T from the first crossover. (B) The simpli-fied, single crossover equation divides the flanking region ofthe target allele into T segments, and calculates the probabil-ity that a crossover will occur in any T segment. Thus, on av-erage, within the F2 gamete population, the average distancebetween flanking crossovers will be T (range .0 to ,2T).

2038 S. J. Dinka et al.

Page 5: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

1B. This simplifies the equation by only having to calcu-late the probability of a single crossover within the pop-ulation, noting, however, that although on average anytwo crossovers will be distance T apart, they may rangefrom zero to 2T (see materials and methods for fur-ther details). The number of F2 testcross progeny re-quired to be genotyped to detect sufficient crossovers toachieve a desired kilobase or gene block resolution isthus as follows:

N ¼ Log ð1� PÞ=Log ð1� T -marker=100RÞ;

where N is the number of meiotic gametes (chromo-somes) that must be genotyped in which it can be deter-mined whether a crossover is located proximal or distalto the target allele, P is threshold probability of success(e.g., 0.95), T-marker is expected distance between flank-ing molecular markers (kilobases or candidate genes),and R is local or genome-wide average recombinationfrequency (kb/cM or genes/cM).

Similar totheDurrett–Tanksleyequation, thismodelas-sumes that the phenotype of the trait of interest can bereadily scored to determine if a crossover occurred prox-imal or distal to the target allele; hence N is equivalent tothe number of testcross progeny, 0.5 times the number ofF2 (selfed) progeny (if no progeny testing performed), ortwo times the number of F2 (selfed) progeny (if F3

progeny testing is performed). The derivation of thisequation is in the materials and methods section.

Empirical gamete number, mapping resolution, andlessons from published studies in rice: To validate theequations noted above, we first analyzed 41 publishedpositional cloning/fine-mapping studies in rice, to ex-tract or calculate N and T (Table 1) (see materials and

methods). We made several observations that might beuseful to future research groups who wish to undertakepositional cloning in rice. First, as in other species, inrice there was a wide range in the number of informativegametes (N) (potential recombinant chromosomes) thatwere genotyped to positionally clone target alleles: thisranged from only 416 gametes for the Pi-kh allele(Sharma et al. 2005) to �20,000 gametes for the allelesGn1a (Ashikari et al. 2005), qSH1 (Konishi et al. 2006),and Bph15 (Yang et al. 2004), an �25-fold range. Theaverage number of informative gametes genotyped was5686; the median was 4200. The median target resolution(T) achieved was 44.5 kb or five genes. There were sevenexamples of single-gene resolution mapping (Table 1),and to achieve this resolution, the number of informativegametes employed ranged from 2800 to 26,000 (�10-foldrange); the average was 11,593 gametes. Single generesolution mapping in a smaller genome, A. thaliana, hasbeen much rarer (Dinka and Raizada 2006).

Several fine-mapping strategies were used successfully:

1. Of 41 studies, 11 groups reported isolation of aquantitative trait locus (QTL); to reduce the effects

of minor QTL and/or to be able to employ a back-ground with well-characterized molecular markers,the target QTL was isolated by limited backcrossing(BC) or full introgression (near isogenic line, NIL)into a new genetic background. In other examples(e.g., qSH1) (Konishi et al. 2006), the original QTLgenome was used for mapping such that all but thetarget QTL was fixed (not segregating); to createheterozygosity in the region containing the targetallele for mapping, a corresponding chromosomesegment from a polymorphic genotype was crossed in½segment substitution line (SSL)� (Table 1).

2. Because outcrosses/testcrosses are challenging inrice, most studies involved selfing progeny, whichhas the potential of carrying informative crossoverevents on both diploid chromosomes, thus poten-tially doubling the effective number of informativegametes (N). One of the challenges created by selfing,however, for recessive alleles, is that it is not possible todetermine whether a crossover occurred proximal ordistal to the target without checking for the segrega-tion pattern (progeny testing, PT) in the subsequentgeneration (e.g., F3) to distinguish all genotype com-binations (aa, Aa, AA) at the target locus. Six groupsprogeny-tested to check the recessive genotype (e.g.,chl1) (H. T. Zhang et al. 2006). Alternatively, to avoid F3

generation phenotyping, 15 groups (e.g., bc1) (Y. Li

et al. 2003) preselected recessive (mutant) progeny byphenotyping and then only genotyped this subset, thusdiscarding 75% of all progeny.

3. There were 12 fully dominant alleles targeted; in thesecases, as in recessive alleles, because the proximal vs.distal location of flanking crossovers could not bedistinguished without distinguishing AA from Aa geno-types, researchers either progeny-tested in the sub-sequent generation (e.g., Pi-kh) (Sharma et al. 2005)or, cleverly, preselected only the recessive progeny classfor genotyping (e.g., Xa1) (Yoshimura et al. 1998).

4. Finally, there were four examples ½ f5-DU, Rf-1, S32(t),S5n� where the target alleles were expressed in thehaploid generation (e.g., pollen grain, embryo sac)and where the nature of the gene products oftenrequired generating outcross/testcross progeny formapping. In the case of f5-DU (Wang et al. 2006), anallele that boosts pollen viability in specific hybridgenotypes, testcross progeny were used for mapping,since phenotyping required a hybrid background tocheck for segregation of viable pollen grains (eitherhigh or low). Similarly, to fine map the S5n locus (Qiu

et al. 2005), which confers embryo sac viability to wide-cross hybrids, 8000 hybrids were generated by out-crossing a heterozygous NIL S5n/� parent (NIL F1)to a wide-cross tester; phenotyping was performed bymeasuring segregation of fertility of F2 embryo sacson hybrid rice spikelets. In the case of S32(t) (Li et al.2007), which also confers (post-meiotic, haploid)embryo sac viability, the segregation of embryo sac

Predicting Positional Gene Cloning 2039

Page 6: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

TA

BL

E1

An

alys

iso

fp

ub

lish

edp

osi

tio

nal

clo

nin

gan

dfi

ne-

map

pin

gst

ud

ies

inri

ce(O

ryza

sati

va)

Tar

get

alle

leT

IGR

ann

ota

tio

n

Inh

erit

ance

Map

pin

gst

rate

gya

To

tal

pro

gen

yP

roge

ny

gen

oty

ped

(g)

Mei

osi

sfa

cto

r(f

)b

Info

rmat

ive

gam

etes

(N)c

gen

oty

ped

Can

did

ate

reso

luti

on

(T)d

Ref

eren

ceT

ype

Tra

itkb

Gen

ese

bc1

LO

C_O

s03g

3025

0Si

mp

leR

ecF

2-R

ec�

30,0

007,

068

214

,136

3.3

1Y.

Li

etal

.(2

003)

bel

LO

C_O

s03g

5524

0Si

mp

leR

ecF

2-R

ec98

723

12

462

110

18P

an

etal

.(2

006)

Bph

15

fB

AC

20M

14/

BA

C64

O9

Sim

ple

Do

mR

ILF

2-A

ll1

PT

9,47

29,

472

218

,944

4711

Yan

get

al.

(200

4)ch

l1L

OC

_Os0

3g59

640

Sim

ple

Rec

F2-A

ll1

PT

�2,

000

477

295

471

.69

H.

T.

Zh

an

get

al.

(200

6)ch

l9L

OC

_Os0

3g36

540

Sim

ple

Rec

F2-A

ll1

PT

�10

,000

2,45

82

4,90

615

0013

7H

.T

.Z

ha

ng

etal

.(2

006)

cpt1

LO

C_O

s02g

3597

0Si

mp

leR

ecF

2-R

ec5,

000

1,40

02

2,80

032

523

Ha

ga

etal

.(2

005)

d11

LO

C_O

s04g

3943

0Si

mp

leR

ecSS

LF

2-R

ec�

15,0

003,

020

26,

040

9819

Ta

na

be

etal

.(2

005)

d2L

OC

_Os0

1g10

040

Sim

ple

Rec

SSL

F2-A

ll1

PT

3,00

03,

000

26,

000

6010

Ho

ng

etal

.(2

003)

Dbs

LO

C_O

s01g

3304

0Si

mp

leR

ecF

2-R

ec�

12,4

003,

100

26,

200

8615

Sazu

ka

etal

.(2

005)

dgl1

LO

C_O

s01g

4900

0Si

mp

leR

ecF

2-R

ec�

4,60

01,

150

22,

300

44.5

5K

om

or

iso

no

etal

.(2

005)

Eu

iL

OC

_Os0

5g40

384

Sim

ple

Rec

NIL

F2-R

ec5,

500

1,40

02

2,80

024

1Z

hu

etal

.(2

006)

eui1

LO

C_O

s05g

4038

4Si

mp

leR

ecF

2-R

ec�

10,0

002,

623

25,

246

303

Lu

oet

al.

(200

6)f5

-DU

fP

AC

P00

08A

07Q

TL

Gam

eteg

NIL

hyb

rid

test

cro

ss1,

993

1,99

31

1,99

370

9W

an

get

al.

(200

6)fo

n1

LO

C_O

s06g

5034

0Si

mp

leR

ecF

2-A

ll2,

419

2,41

91

2,41

915

010

Suza

ki

etal

.(2

004)

fon

4L

OC

_Os1

1g38

270

Sim

ple

Rec

F2-R

ec�

8,40

02,

100

24,

200

450

83H

.W

.C

hu

etal

.(2

006)

gh2

LO

C_O

s02g

0949

0Si

mp

leR

ecF

2-R

ec13

,000

3,25

62

6,51

130

3K

.W

.Z

ha

ng

etal

.(2

006)

gid1

LO

C_O

s05g

3373

0Si

mp

leR

ecF

2-R

ec�

7,20

01,

800

13,

600

384

Ueg

uc

hi-T

an

ak

aet

al.

(200

5)gl

-3f

BA

CO

SJN

3b00

74M

06Q

TL

Rec

SSL

BC

4F2

Rec

2,06

849

92

998

87.5

10W

an

etal

.(2

006)

Gn

1a

LO

C_O

s01g

1011

0Q

TL

Ad

dit

ive

NIL

BC

F2-A

ll1

PT�

13,0

0013

,000

226

,000

6.3

1A

sh

ik

ar

iet

al.

(200

5)H

d1L

OC

_Os0

6g16

370

QT

LD

om

BC

3F

3-R

ec9,

000

1,50

52

3,01

012

2Ya

no

etal

.(2

000)

Hd6

LO

C_O

s03g

5538

9Q

TL

Do

mN

ILB

C3F

2P

T2,

807

2,80

72

5,61

426

.41

Ta

ka

ha

sh

iet

al.

(200

1)H

td1

LO

C_O

s04g

4647

0Si

mp

leR

ecF

2-R

ec20

,000

4,60

02

9,20

030

6Z

ou

etal

.(2

005,

2006

)M

oc1

LO

C_O

s06g

4078

0Si

mp

leR

ecF

2-R

ec2,

010

2,01

02

4,02

020

2X

.L

iet

al.

(200

3)P

i36

(t)f

PA

CP

0443

G08

Sim

ple

Do

mF

2-R

ec4,

884

580

21,

160

172

Liu

etal

.(2

005)

Pib

LO

C_O

s02g

5731

0Si

mp

leD

om

BC

2F3/

F4-R

ec1

PT�

13,0

003,

305

26,

610

8012

Wa

ng

etal

.(1

999)

Pi-

d2L

OC

_Os0

6g29

810

Sim

ple

Do

mF

2-R

ec20

,000

4,00

02

8,00

018

033

Ch

en

etal

.(2

006)

Pi-

khL

OC

_Os1

1g42

010

Sim

ple

Do

mF

2-A

ll1

PT

208

208

241

614

3.5

18Sh

ar

ma

etal

.(2

005)

pla1

LO

C_O

s10g

2634

0Si

mp

leR

ecF

2-R

ec2,

312

578

21,

156

743

Miy

osh

iet

al.

(200

4)P

sr1

LO

C_O

s01g

2548

4Q

TL

Do

mN

ILB

C3F

23,

800

3,80

00.

51,

900

50.8

4N

ish

im

ur

aet

al.

(200

5)qS

h1L

OC

_Os0

1g62

920

QT

LD

om

SSL

BC

4F2

1P

T10

,388

10,3

882

20,7

660.

612

1K

on

ish

iet

al.

(200

6)qU

vr1

0L

OC

_Os1

0g08

580

QT

LA

dd

itiv

eN

ILF

2-A

ll1

PT

1,85

01,

850

23,

700

276

Ued

aet

al.

(200

5)R

f-1

LO

C_O

s10g

3543

6Si

mp

leG

amet

egN

ILC

MS

test

cro

ss5,

145

5,14

51

5,14

576

4K

om

or

iet

al.

(200

4)S3

2(t

)fP

AC

AP

0052

94Q

TL

Gam

eteg

Het

BC

4F2

gam

etes

1,05

01,

050

22,

100

647

Li

etal

.(2

007)

S5n

fP

AC

P00

21C

04Q

TL

Gam

eteg

NIL

hyb

rid

ou

tcro

ss8,

000

8,00

01

8,00

040

5Q

iu

etal

.(2

005)

Skc1

LO

C_O

s1g2

0160

QT

LD

om

BC

3F2-A

ll1

PT

2,97

32,

973

25,

946

7.4

1R

en

etal

.(2

005)

spl1

1L

OC

_Os1

2g38

210

Sim

ple

Rec

F2-R

ec;

F3-A

ll�

3,00

02,

143

2/0.

5h

1,53

727

3Z

en

get

al.

(200

4)sp

l7L

OC

_Os0

5g45

410

Sim

ple

Rec

SSL

F2-A

ll1

PT

2,94

42,

944

25,

888

11

Yam

an

ou

ch

iet

al.

(200

2)X

a1L

OC

_Os0

4g53

120

Sim

ple

Do

mF

3-R

ec4,

225

965

21,

930

257

Yosh

im

ur

aet

al.

(199

8)xa

13

LO

C_O

s08g

4235

0Si

mp

leR

ecF

2-A

ll1

PT

�8,

000

7,97

22

14,8

4214

.82

Z.

H.

Ch

uet

al.

(200

6)

(con

tin

ued

)

2040 S. J. Dinka et al.

Page 7: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

TA

BL

E1

(Co

nti

nu

ed)

Tar

get

alle

leT

IGR

ann

ota

tio

n

Inh

erit

ance

Map

pin

gst

rate

gya

To

tal

pro

gen

yP

roge

ny

gen

oty

ped

(g)

Mei

osi

sfa

cto

r(

f)b

Info

rmat

ive

gam

etes

(N)c

gen

oty

ped

Can

did

ate

reso

luti

on

(T)d

Ref

eren

ceT

ype

Tra

itkb

Gen

ese

Xa2

6L

OC

_Os1

1g47

000

Sim

ple

Do

mF

2/

NIL

-Rec

�1,

908

477

295

467

.212

Yan

get

al.

(200

3);

Sun

etal

.(2

004)

xa5

LO

C_O

s05g

0158

0Si

mp

leR

ecF

2-A

ll1

PT

2,34

52,

345

24,

790

8.1

2Iy

er

and

Mc

Co

uc

h(2

004)

Med

ian

�4,

884

2,41

94,

200

44.5

5

QT

L,

qu

anti

tati

vetr

ait

locu

s;R

ec,

rece

ssiv

e;D

om

,d

om

inan

t;G

amet

e,ga

met

op

hyt

ic;

RIL

,re

com

bin

ant

inb

red

lin

e;P

T,

pro

gen

yte

sted

;SS

L,

chro

mo

som

e/se

gmen

tsu

b-

stit

uti

on

lin

e;h

et,

het

ero

zygo

us.

aT

od

isti

ngu

ish

wh

eth

era

cro

sso

ver

occ

urr

edb

etw

een

the

targ

etal

lele

and

the

pro

xim

alvs

.d

ista

lm

ole

cula

rm

arke

r,th

ep

ost

-mei

oti

cge

no

typ

eo

fth

eta

rget

locu

s(A

A,

Aa,

aa)

mu

stb

ed

isce

rnib

le,a

sth

isd

eter

min

esth

en

um

ber

of

effe

ctiv

em

eio

ses

that

con

trib

ute

sto

the

fin

alm

app

ing

of

the

targ

etlo

cus.

Fo

rex

amp

le,f

or

are

cess

ive

trai

t,in

the

F2

gen

erat

ion

,wh

erea

sa

cro

sso

ver

can

be

det

ecte

db

yge

no

typ

ing

flan

kin

gm

ole

cula

rm

arke

rs,b

ecau

seA

Avs

.Aa

alle

les

can

no

tb

ed

isti

ngu

ish

ed,t

he

pro

xim

alvs

.dis

tall

oca

tio

no

fth

ecr

oss

ove

rca

nn

ot

be

assi

gned

wit

ho

ut

ph

eno

typ

ing

segr

egan

ts(P

T,

pro

gen

yte

stin

g)in

the

F3

gen

erat

ion

;w

ith

ou

tp

roge

ny

test

ing,

on

ly�

1/4

(aa

clas

s)o

fre

com

bin

ant

pro

gen

yco

ntr

ibu

teto

the

fin

alm

apas

sign

men

to

fth

eal

lele

.T

he

foll

ow

ing

gen

oty

pin

gan

dp

hen

oty

pin

gst

rate

gies

wer

eem

plo

yed

:F

2-R

ec,

on

lyF

2re

cess

ive

pro

gen

yge

n-

oty

ped

;F2-A

ll,a

llF

2p

roge

ny

gen

oty

ped

ran

do

mly

;F2-A

ll1

PT

,all

F2

pro

gen

yge

no

typ

ed,t

hen

F3

pro

gen

yp

hen

oty

ped

tod

isti

ngu

ish

AA

fro

mA

aal

lele

sat

the

targ

etlo

cus

inth

eF

2ge

ner

atio

n;R

IL/

NIL

/SS

L/

BC

,use

db

ackc

ross

ing

and

/o

rre

com

bin

ant

inb

red

lin

es,n

ear-

iso

gen

icli

nes

,an

d/

or

segm

ent

sub

stit

uti

on

lin

esto

dis

tin

guis

hth

eta

rget

QT

Lp

hen

oty

pe

or

toin

tro

gres

sth

eta

rget

alle

lein

toan

app

rop

riat

ech

rom

oso

me

bac

kgro

un

dsu

itab

lefo

rm

app

ing;

F1

or

F2

gam

etes

,sin

ceth

eta

rget

gen

ep

rod

uct

isex

pre

ssed

inth

eh

aplo

idga

met

op

hyt

e(p

oll

eno

rem

bry

osa

c),

use

dth

era

tio

of

gam

eto

ph

yte

ph

eno

typ

es(e

.g.,

po

llen

germ

inat

ion

)o

fth

esp

ikel

et,

inco

mb

inat

ion

wit

hge

no

typ

ing

the

pla

nts

carr

yin

gth

esp

ikel

ets,

tod

isti

ngu

ish

AA

/A

a/aa

gen

oty

pes

of

the

targ

etal

lele

.bT

he

mei

osi

sfa

cto

r(

f)is

mu

ltip

lied

by

the

nu

mb

ero

fp

ost

-mei

oti

cp

roge

ny

gen

oty

ped

togi

veth

en

um

ber

of

mei

ose

sth

atco

ntr

ibu

ted

toth

efi

nal

map

pin

gre

solu

tio

no

fth

eta

rget

alle

le.

Fo

rex

amp

le,

ifth

eF

2p

op

ula

tio

nw

asd

eriv

edfr

om

selfi

ng

F1

par

ents

,th

enea

chF

2p

lan

tre

pre

sen

tstw

om

eio

ses

(f¼

2)w

hen

the

pre

cise

F2

gen

oty

pe

was

dis

cern

edb

yp

roge

ny

test

ing

(ph

eno

typ

ing

segr

egan

tsin

the

F3

gen

erat

ion

).W

ith

ou

tp

roge

ny

test

ing,

then

on

ly1/

4re

cess

ive

(aa)

F2

pro

gen

yw

ere

con

sid

ered

tob

eu

sefu

l,b

ut

sin

cetw

om

eio

ses

con

trib

ute

dto

this

clas

s,f¼

(1/

4)3

0.5.

cN

um

ber

of

chro

mo

som

eso

ref

fect

ive

po

st-m

eio

tic

pro

du

cts

that

wer

ege

no

typ

ed(N

),w

her

eN¼

g3

f.dT

he

fin

alm

apre

solu

tio

n(T

)is

defi

ned

asth

en

um

ber

of

kilo

bas

eso

rge

nes

inth

ein

terv

alb

etw

een

the

mo

stp

roxi

mal

and

dis

tal

mo

lecu

lar

mar

ker

flan

kin

gth

eta

rget

alle

le,

inw

hic

hat

leas

to

ne

cro

sso

ver

was

fou

nd

.W

hen

this

valu

ew

asn

ot

pu

bli

shed

,it

was

esti

mat

edu

sin

gth

eV

ersi

on

4T

IGR

rice

pse

ud

om

ole

cule

sge

no

me

bro

wse

r,an

dw

hen

po

ssib

leco

nfi

rmed

by

per

son

alco

mm

un

icat

ion

wit

hth

est

ud

yau

tho

rs.

eO

nly

no

n-t

ran

spo

son

,n

on

-ret

roel

emen

tge

ne

mo

del

sar

ein

clu

ded

asd

efin

edb

yth

eV

ersi

on

4T

IGR

rice

pse

ud

om

ole

cule

san

no

tati

on

dat

abas

e.fG

ene

no

tye

tis

ola

ted

.gT

he

alle

leac

tsin

the

hap

loid

gam

ete-

der

ived

gen

erat

ion

,ei

ther

po

llen

or

emb

ryo

sac.

See

resu

lt

sfo

rd

etai

ls.

hT

wo

pro

gen

yp

op

ula

tio

ns,

wit

htw

oge

no

typ

ing

stra

tegi

es,

wer

eu

sed

.

Predicting Positional Gene Cloning 2041

Page 8: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

viability was measured in the spikelets of selfed F2

plants. Finally, in the case of Rf-1, a nuclear locus thatrestores male gamete (pollen) fertility by overcomingthe effects of a mitochondrial ½cytoplasmic male ster-ility (CMS)� gene, 5145 testcross F2 progeny (three-way cross: heterozygote restorer 3 non-restorer tester)were generated for mapping and the segregation ofpollen viability scored (Komori et al. 2003, 2004).

Lessons from calculating empirical local recombina-tion frequencies (R-local) and their use in validatingpredictive equations: To both validate the equationsnoted in this study and later understand any discrep-ancies between the experimental data and predictionsbased on the molecular marker map, we then calculatedthe experimental (local) recombination frequency(R-local) for each of the 41 successful fine-mappingstudies in rice (see materials and methods) (Table 2).From each study, we counted the number of crossoverslocated between the closest two markers used to definethe final map resolution (T); these are the first recom-binants used to define the edges of the candidate targetregion. Although we expected to find only 1 crossoveron each distal or proximal flank (2 total), in 32 of 41examples we found between 3 and 16 total crossovers,due to hotspots of recombination and/or poor markerdensity; such redundant crossover targets suggested thatan excess number of progeny were genotyped given theavailable marker density in the majority of rice posi-tional cloning attempts, an important observation.

Since a high density of molecular markers and largeprogeny numbers are used in positional cloning, theR-local values provide an interesting snapshot into thevariation in recombination frequency in the rice ge-nome: we found that though the genome-wide averageR was 277 kb/cM or 28.0 genes/cM in rice, locally,R-values ranged from 3.3 to 1344.2 genes/cM or 28.2 to14,718 kb/M, an �400-fold and �500-fold range, re-spectively. Strongly influenced by chance, such a widerange in recombination frequencies would largely ex-plain the wide range in the number of progeny that weregenotyped in rice (Table 1). The most hyper-recombi-nogenic region (3.3 genes/cM, 28.2 kb/cM) flankedthe Pi36(t) allele (Liu et al. 2005), which required only1160 informative gametes to achieve a map resolution of17 kb or two candidate genes. The region with the leastamount of recombination (1344.2 genes/cM or 14,718kb/cM) encompassed the chl9 allele; in this study,although 4906 informative chromosomes were geno-typed, the map resolution was 1500 kb or 137 genes(H. T. Zhang et al. 2006). These two groups define theextremes of good and bad ‘‘luck,’’ respectively, in rice,and as such may set upper and lower map-population-size boundaries for future positional cloning attemptsin this important species.

We then compared the empirical number of gametesthat were genotyped (N) in each study to the number

predicted by both equations (see above) given only thevariables T and R-local; this allowed us to first test thevalidity of the equations in rice and to modify the equa-tions if necessary. The size of the mapping population(informative chromosomes) (N) predicted by the Durrett–Tanksley equation compared to the empirical data, forgiven Tand R-local values (in kb/cM), is shown in Figure2A; we found a strong positive correlation between themapping size predicted by the Durrett–Tanksley equa-tion and the experimental results (Spearman r ¼ 0.85,P , 0.0001, n ¼ 41). In at least 10 examples (10/41),however, in spite of using the actual recombinationfrequencies, we found that the Durrett–Tanksley equa-tion overestimated the mapping population by atleast twofold, which would have caused researchers tounnecessarily genotype thousands of extra progeny.The simpler, Single Crossover model appeared to be aslightly better predictor of the progeny mapping pop-ulation size as shown in Figure 2B. Although this secondequation predicted the mapping population N witha near-equivalent correlation as the Durrett–Tanksleyequation (Spearman r ¼ 0.86; P , 0.0001; n ¼ 41),linear regression analysis of the two models (Figure 3, Aand B) demonstrated that the single crossover equationcame closer to a linear slope of m ¼ 1 on an x–y scatterplot of predicted vs. experimental N values; in the caseof the Durrett–Tanksley model, the best-fit line followedthe equation y¼ 1.70x� 1323 (goodness of fit r2¼ 0.76,Sy.x ¼ 5456), whereas for the single crossover equation,the best-fit line was y ¼ 1.07x � 833 (r2 ¼ 0.76, Sy.x ¼3426). Although one equation was slightly better thanthe other, these results demonstrate for the first timethat (both) simple formulas, if based on accurate localrecombination frequency values, can provide signifi-cant guidance in predicting the mapping populationsize in the majority of alleles targeted for positionalcloning.

Fine-tuning of the equations based on empiricalstudies: We then wondered if we could fine-tune bothpredictive models. We noticed that the Durrett–Tanksleyequation overestimated the number of progeny neededwhen the experimental number of crossovers found indistance Twas low (,5 total); when the number of cross-overs found was high (.5), this equation underestimatedthe number of progeny required (Figure 2A; Table 2).In the latter cases, it appeared as if T was limited by thelocal density of molecular markers; given this low density,the published studies appear to have ‘‘over-genotyped’’the progeny population. Restated, when many crossoverswere found within the interval T (final map resolution),then the actual candidate distance (in kilobases) mighthave been smaller (higher map resolution) had moremolecular markers been available in the vicinity. Byplotting the ratio Nmodel/N empirical relative to the numberof crossovers (lT) (where l ¼ l1 1 l2) (Table 2) on ascatter plot, we found that there was an inverse Powerrelationship between the two variables such that N model/

2042 S. J. Dinka et al.

Page 9: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

TA

BL

E2

Cal

cula

tio

no

flo

cal

reco

mb

inat

ion

freq

uen

cies

(R-l

oca

l)in

chro

mo

som

ein

terv

als

con

tain

ing

targ

etlo

ci

Tar

get

alle

le

R(L

oca

l)as

defi

ned

by

po

siti

on

alcl

on

ing

stu

die

sR

egio

nal

R(m

ap)

asd

eter

min

edb

yan

cho

rin

glo

cus

toth

eri

ceR

GP

1391

-mar

ker

map

a

Pro

xim

al(l

1)

cro

sso

vers

Dis

tal

(l2)

cro

sso

vers

R(l

oca

l)b

(kb

/cM

)R

(lo

cal)

b

(gen

es/

cM)

Ch

rom

oso

me

Ph

ysic

alin

terv

al(b

p)c

Pro

xim

alm

arke

rD

ista

lm

arke

rR

(map

)d

(kb

/cM

)R

(map

)d

(gen

es/

cM)

bc1

52

66.6

20.2

317

2153

34–1

7217

366

C12

819

E61

171

403.

646

.8be

l1

216

9.4

27.7

331

3796

97–3

1385

021

C00

12S2

722

140.

221

.4B

ph1

52

31,

780.

741

6.8

468

8283

5–72

3162

1C

0708

R02

8814

60.4

125.

4ch

l11

222

7.7

28.6

333

8942

27–3

3900

896

E50

580

C14

8436

9.9

53.8

chl9

23

14,7

18.0

1344

.23

2020

4631

–202

0230

5E

6134

0S2

992

6028

.851

2.0

cpt1

310

700.

049

.52

2160

2296

–192

7256

E36

34E

5085

017

5.3

20.7

d11

57

493.

395

.64

2325

2491

–232

4807

0S1

0702

R02

7827

2.7

36.3

d21

11,

800.

030

0.0

152

3342

5–52

4132

1E

6011

0C

1207

225

0.7

31.3

dbs

25

761.

713

2.9

118

4777

85–1

8484

690

C05

85E

6080

8A29

44.1

316.

7dg

l11

325

5.9

28.8

128

4632

15–2

8470

187

R10

12C

0922

169.

321

.3eu

i1

222

4.0

9.3

523

6453

62–2

3655

166

S148

89E

3000

312

7.0

18.7

eui1

12

524.

652

.55

2364

5362

–236

5516

6S1

4889

E30

003

127.

018

.7f5

-DU

132

93.0

12.0

512

7864

9–13

7942

1R

0830

S161

1310

6.7

11.9

fon

13

272

5.7

48.4

630

4734

75–3

0469

084

R14

79E

1013

918

5.5

26.3

fon

41

101,

718.

231

6.9

1122

1636

72–2

2162

134

E43

36C

0050

2171

.623

6.7

gh2

1e1e

976.

797

.72

4874

079–

4870

146

S151

1S1

3446

177.

523

.9gi

d11

168

4.0

72.0

519

7853

34–1

9788

132

S268

9C

1268

252.

228

.3gl

-33

314

5.5

16.6

316

4694

00–1

6556

900

E03

23C

6031

8A12

8.5

12.5

Gn

1a

11

819.

013

0.0

152

7232

4–52

6727

4E

6011

0C

1207

225

0.7

31.3

hd1

12

120.

420

.16

9335

377–

9337

570

C02

35R

1006

925

7.9

22.4

Hd6

43

211.

78.

03

3145

3402

–314

4775

5C

0012

S272

214

0.2

21.4

htd1

11

1,38

0.0

276.

04

2734

8729

–273

5139

3C

1137

8E

5119

628

3.7

40.7

moc

12

220

1.0

20.1

624

3103

11–2

4315

385

S103

24R

1559

151.

818

.7P

i36

(t)

25

28.2

3.3

827

6297

7–28

8890

9S4

144B

C60

293

139.

412

.5P

ib1

31,

322.

019

8.3

235

1129

00–3

5107

768

C03

79S1

0020

98.3

12.9

Pi-

d21

33,

600.

066

0.0

617

1593

37–1

7163

823

C00

58S2

0510

9503

.884

0.0

Pi-

kh6

459

.77.

511

2476

1902

–247

6292

2R

1082

9S5

730

198.

622

.3pl

a11e

1e42

7.7

17.3

1013

3291

13–1

3327

360

C09

61R

1738

A13

5.1

13.7

Psr

12

132

1.7

25.3

114

4293

94–1

4435

943

R25

01C

0045

428.

039

.1qS

h11

242

.469

.21

3677

6800

–367

7221

1R

0578

E20

351

1145

.613

1.3

qUvr

10

21

333.

074

.010

4433

064–

4429

249

E30

589B

E10

6429

1.7

28.2

Rf-

12

11,

303.

468

.610

1860

6065

–186

2412

6S1

1841

S211

7431

8.5

48.2

S32

(t)

42

224.

024

.52

2109

653–

2227

836

S203

51R

1047

920

1.1

22.6

S5n

12

1,06

6.7

133.

36

5606

631–

5758

152

R19

54R

2349

70.7

10.0

Skc1

11

220.

029

.71

1146

0220

–114

5573

4E

4175

E10

119

190.

418

.9sp

l11

21

138.

315

.412

2343

9842

–234

3465

2E

6197

6S5

375

95.2

13.3

spl7

11

29.4

29.4

526

2611

54–2

6263

625

C04

66C

1136

822

2.2

43.6

Xa1

32

96.5

27.0

431

4190

36–3

1425

732

C52

068

S154

429

8.3

45.0

xa1

33

243

9.3

59.4

826

5959

51–2

6593

109

R39

61A

R06

3910

9.1

16.0

(con

tin

ued

)

Predicting Positional Gene Cloning 2043

Page 10: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

TA

BL

E2

(Co

nti

nu

ed)

Tar

get

alle

le

R(L

oca

l)as

defi

ned

by

po

siti

on

alcl

on

ing

stu

die

sR

egio

nal

R(m

ap)

asd

eter

min

edb

yan

cho

rin

glo

cus

toth

eri

ceR

GP

1391

-mar

ker

map

a

Pro

xim

al(l

1)

cro

sso

vers

Dis

tal

(l2)

cro

sso

vers

R(l

oca

l)b

(kb

/cM

)R

(lo

cal)

b

(gen

es/

cM)

Ch

rom

oso

me

Ph

ysic

alin

terv

al(b

p)c

Pro

xim

alm

arke

rD

ista

lm

arke

rR

(map

)d

(kb

/cM

)R

(map

)d

(gen

es/

cM)

Xa2

62

1440

.17.

211

2768

8448

–276

9219

8R

1506

R33

4238

3.4

32.5

Xa5

11

194.

047

.95

3235

38–3

2546

8C

1201

5C

0568

111.

913

.1

Fo

rco

mp

aris

on

,th

ege

no

me-

wid

eav

erag

eR

(avg

277

kb/

cMo

r28

.0ge

nes

/cM

.a

Ric

eG

eno

me

Pro

ject

(RG

P)

RF

LP

map

bas

edo

nan

F2

po

pu

lati

on

bet

wee

nN

ipp

on

bar

ean

dK

asal

ath

(htt

p://

rgp

.dn

a.af

frc.

go.jp

/)

(Ha

ru

sh

im

aet

al.

1998

).bT

he

loca

lre

com

bin

atio

nfr

equ

ency

was

calc

ula

ted

asfo

llo

ws:

R-lo

cal¼

T(l

oca

l)/

m(l

oca

l),

wh

ere

T(l

oca

l)is

nu

mb

ero

fki

lob

ases

or

gen

esb

etw

een

the

clo

sest

flan

kin

gm

ole

cula

rm

arke

rsw

ith

atle

ast

on

ecr

oss

ove

rb

etw

een

each

mar

ker

and

the

targ

etlo

cus,

and

m(l

oca

l)is

gen

etic

dis

tan

ceb

etw

een

thes

etw

om

arke

rsin

cen

tim

org

ans.

Th

eva

lue

for

T(l

oca

l)is

rep

ort

edin

Tab

le1.

Th

ege

net

icm

apd

ista

nce

,m

(lo

cal)¼

100

3(l

11

l2)/

N,

wh

ere

l1

isn

um

ber

of

pro

xim

alcr

oss

ove

rs(T

able

2),

l2

isn

um

ber

of

dis

talc

ross

ove

rs(T

able

2),a

nd

Nis

tota

ln

um

ber

of

info

rmat

ive

chro

mo

som

esge

no

typ

ed(T

able

1).I

na

test

cro

ss,m¼

100

3l

/p

roge

ny

(hen

cem¼

100

3re

com

bin

ants

/p

roge

ny)

,wh

erea

sin

ase

lfed

cro

ssw

ith

pro

gen

yte

stin

g,m¼

100

3(l

/2

3p

roge

ny)

sin

cege

no

typ

ing

per

mit

sb

oth

chro

mo

som

esto

con

trib

ute

toth

em

app

ing

po

pu

lati

on

.cT

his

isth

ep

reci

sech

rom

oso

me

ph

ysic

allo

cati

on

of

the

targ

etlo

cus

or

can

did

ate

regi

on

on

the

rice

ph

ysic

alm

ap(V

ersi

on

4.0

TIG

Rri

cep

seu

do

mo

lecu

les)

.d

R(m

ap)¼

T(m

ap)/

m(m

ap),

wh

ere

T(m

ap)

isn

um

ber

of

kilo

bas

eso

rge

nes

bet

wee

nth

eR

GP

pro

xim

alan

dd

ista

lm

arke

rsco

nta

inin

gth

eta

rget

locu

sas

list

edin

Tab

le2,

and

m(m

ap)

isge

net

icm

apd

ista

nce

bet

wee

nth

ese

two

RG

Pm

arke

rsin

cen

tim

org

ans.

Th

ese

valu

esca

nb

efo

un

din

sup

ple

men

tal

Tab

le1

ath

ttp

://w

ww

.gen

etic

s.o

rg/

sup

ple

men

tal/

.eN

um

ber

of

reco

mb

inan

tsn

ot

rep

ort

ed,

sose

tat

l1¼

1an

dl

1.

2044 S. J. Dinka et al.

Page 11: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

N empirical ¼ 4.744/lT. Therefore, we adjusted T bymultiplying it by 4.744/lT, where lT is the total numberof crossovers in this region. Accordingly, we also rede-fined T as T-marker to note that marker density oftenrate-limits the physical resolution. The resulting modi-fied Durrett–Tanksley equation is

N ¼ð4:744 3 100RÞ=½T -marker 3 ð4:744=lTÞ�;

or simplified;

N ¼ ð100R 3 lTÞ=T -marker ;

where N is total number of informative chromosomesthat must be genotyped with the probability of successset at P ¼ 0.95, R is the local recombination frequency(R-local), T-marker is distance between the closest twomolecular markers (in which crossovers are detectedrelative to the target allele), and lT is number ofcrossovers between the closest two molecular markers($2). This is a rewritten version of the standard mapdistance calculation: m¼ 100 3 recombinants/progenyfor a testcross, assuming no double crossovers (Haldane

1919).

Figure 2.—Testing the validity of two mathematical equations as predictors of the size of the progeny mapping population (N)required to positionally clone target alleles using rice as a model system. We compared N values predicted by the Durrett–Tanksleyequation (A) and the Single Crossover model equation (B) to 41 published, empirical studies (shown in Table 1). In both graphs,the target alleles are shown on the x-axis; solid histograms denote the kilobase map resolution achieved, and the solid graphed lineis the number of informative post-meiotic gametes (N) genotyped, as calculated in Table 1; the spotted line is the number ofinformative gametes predicted. When the probability of success is set at 95%, then the Durrett–Tanksley equation is simplifiedsuch that N ¼ (4.744 3 100R)/T, where R is relevant meiotic recombination frequency and T is final map resolution achieved,notably the distance between the closest distal and proximal molecular markers that are subject to at least one crossover betweenthe marker and the target trait in the progeny population. The single crossover model predicts that N¼ Log (1 � P)/Log (1� T/100R). For both equations, we employed the published, empirical local recombination frequency (R-local) as shown in Table 2,and hence these graphs represent the upper limit of prediction possible by the equations. For the graphs above, we set the prob-ability of success at 95% (P ¼ 0.95).

Predicting Positional Gene Cloning 2045

Page 12: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

We then compared the predictions of the modifiedDurrett–Tanksley equation, using R-local values (Table2), to the published mapping size population values(N); as shown in Figure 3C, the modified equation was100% predictive (y ¼ 1.0x, r2 ¼ 1.0, F ¼ 0). Using asimilar approach, we also modified the Single Crossoverequation. By plotting the ratio N model/N empirical relative tothe number of crossovers (lT) (where lT ¼ l1 1 l2)(Table 2) on a scatter plot, we found that there was aninverse Power relationship between the two variablessuch that N model/N empirical � 3/lT. Therefore, we modi-fied the genetic map resolution T by the number ofcrossovers, resulting in the following modified SingleCrossover equation:

N ¼ Log ð1� PÞ=Log f1� ½T � marker ð3=lTÞ�=100Rg:

As shown in Figure 3D, again the modified equationwas close to 100% predictive of the empirical results (y¼1.0x � 1.5, r2 ¼ 1.0).

These modified equations offer some advantages forresearchers: these equations define probability explic-itly as the number of crossovers (informative gametes)that a researchers can expect to achieve for a givenprogeny population. A researcher is taking more of a risk

if the goal is to achieve only two informative gametes,each carrying a crossover on either side of thetarget allele (lT ¼ 2), compared to if the target is fiveinformative gametes. These equations also make itexplicit that the density of available molecular markersin the target region is critical: if there are few availablemolecular markers, a researcher does not achieve betterresolution by increasing the number of progeny geno-typed (N) beyond a certain threshold. We suggest thatusers of this equation who wish to predict N should selectT based on a realistic density of achievable molecularmarkers in the vicinity of the target allele, and adjust lT

according to their own risk assessment. For example, ifobtaining only two informative recombinant gametes istoo risky, N should be increased.

Predictive value of the equations using recombina-tion frequencies derived from a MRFM: In the analysisabove, we validated both Durrett–Tanksley equations andthe Single Crossover equations using published high-resolution, local recombination frequencies (R-local)derived from already fine-mapped alleles. Our goal wasto predict the progeny mapping population (N infor-mative gametes) in advance, however, whereas R-localdata is not available until the conclusion of a positionalcloning attempt. Previous a priori mapping population

Figure 3.—Linear re-gression analysis to validateand determine which math-ematical models predictmapping population sizeduring positional cloningattempts in rice. In eachcase, the y-axis is thenumberof gametes predicted byeach model, and the x-axisis the published, empiricalnumber of informative ga-metes genotyped. (A) Lin-ear regression analysis ofthe Durrett–Tanksley equa-tion, based on the calcula-tion P ¼ 1 � [1 1 NT/(100R)]e�NT/(100R), whereP is threshold probabilityof success, N is the numberof meiotic gametes (chro-mosomes) that must be gen-otyped in which it can bedetermined whether a cross-over is located proximal ordistal to the target allele,

T is expected distance between flanking molecular markers (kilobases or candidate genes), and R is local recombination frequency(kb/cM orgenes/cM). Theequationwas simplified by setting theprobability of successatP¼0.95, resulting inN¼(4.7443100R)/T.(B) Linear regression analysis of the Single Crossover equation, where N ¼ Log (1 � P)/Log (1 � T-marker/100R). (C) Linear re-gression analysis of the modified Durrett–Tanksley equation, calculated as N¼ (100R 3 lT)/T-marker, where lT is number of cross-overs between the closest two molecular markers ($2). (D) Linear regression analysis of the modified Single Crossover equation,calculated as N ¼ Log (1 � P)/Log {1 � ½T-marker(3/lT)�/100R}. For each model, experimentally-derived R-local values were usedfrom Table 2, andhence these graphs represent the upper accuracy limit of theequations as typically such high resolution frequenciesare not available before a positional cloning experiment. The results demonstrate that the Single Crossover model is moderatelybetter predictive of the mapping population size compared to the Durrett–Tanksley equation, but both models become accuratewhen the equations are adjusted for the number of gametes carrying crossovers immediately flanking the target locus.

2046 S. J. Dinka et al.

Page 13: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

estimates only used the genome-wide average recombi-nation frequency (R-avg) (Durrett et al. 2002), but aswe have confirmed (Table 2) and as many others havenoted (Wu et al. 2003; Crawford et al. 2004; McVean

et al. 2004), recombination frequencies vary tremen-dously along any chromosome. Therefore, we wonderedif we could more accurately predict N in advance byemploying regional meiotic recombination frequenciesfrom a high-density molecular marker map (R-map). Toaccomplish this, we first developed a MRFM for 1400marker intervals in rice, based on the Rice Genome Pro-ject (RGP) F2 ½Nipponbare (Japonica) 3 Kasalath (In-dica)� RFLP map (Harushima et al. 1998). Mean R-mapvalues were 33.5 genes/cM and 294 kb/cM, similar tocalculations of the whole-genome average recombinationfrequency (R-avg) for rice (28 genes/cM and 277 kb/cM).The entire R-map data set is located in supplementalTable 1 (http://www.genetics.org/supplemental/) and itshould serve as a useful reference for future positionalcloning studies in rice.

Next, in silico, we mapped each cloned allele onto aphysical and genetic interval on this map as shown inTable 2 (see materials and methods). We then usedthe corresponding ‘‘neighborhood’’ recombination fre-quencies (R-map) to calculate mapping population sizes(N). As shown in Figure 4, we found that there was amodest but significant improvement in predicting thenumber of informative gametes (N) required to be ge-notyped when recombination frequencies (calculated

as kilobases/cM) were based on rice RGP R-map values;as we suspected, we found that there was not a signif-icant correlation between the empirical mapping size(N) vs. mapping sizes predicted by either of the two(unmodified) equations when the R-avg value was used(Spearman r¼ 0.30, P¼ 0.0547, n¼ 41) (Figure 4, A andD). In contrast, the correlation was significant whenR-map values were used (Spearman r¼ 0.46, P¼ 0.0022,n ¼ 41) (Figure 4, B and E) and this correlation in-creased even further when several outliers were re-moved (Spearman r¼ 0.61, P , 0.0001, n¼ 36) (Figure4, C and F). Surprisingly, however, the correlation didnot improve even further when the modified equationswere used that took into account the number of im-mediate crossovers (lT) (for R-map, Spearman r¼ 0.35,P ¼ 0.0232, considered significant); however, the corre-lation was still a significant improvement over when theR-avg value was used in conjunction with the modifiedequations (Spearman r ¼ 0.21, P ¼ 0.19, n ¼ 41, notsignificant; data not shown). We conclude that mappingsize predictions based on neighborhood (.280-kb seg-ments) recombination frequencies (in kilobases/cM)better predict the number of progeny required to begenotyped to positionally clone a gene than predictionsbased on using the genome-wide average recombina-tion frequency.

The effect of using R-map recombination frequen-cies calculated as kb/cM vs. genes/cM: Although use ofR-map values better predicted the size of the progeny

Figure 4.—Modest improvement in predicting the size of the progeny mapping population (informative gametes) required tobe genotyped during positional cloning when using neighborhood recombination frequencies extracted from a reference geneticmap (R-map) compared to the whole genome average (R-avg) using R-values based on kilobase/cM calculations. On the x-axis isthe mapping population size from published positional cloning studies in rice (see Table 1). On the y-axis is the prediction. (A–C)Models based on the Durrett–Tanksley equation (unmodified). (D–F) Models based on the Single Crossover equation (unmod-ified). R-map values were calculated from the 1400-marker Rice Genome Project (RGP) RFLP map (F2 of Nipponbare 3 Kasalathcross) (see materials and methods). In C and F, five outliers were removed in comparison to B and E, respectively. Both equa-tions set the probability of success at 95%.

Predicting Positional Gene Cloning 2047

Page 14: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

mapping population compared to the genome-wide aver-age recombination frequency, we were disappointedthat the improvement was not more significant. In orderto understand the reason, we asked to what extentR-map values calculated as kilobases/cM (from the riceRGP 1400-marker map) in fact correlated with theR-local values that we extracted from the 41 publishedstudies. As shown in Figure 5A, the correlation was infact poor (Spearman r¼ 0.23, P¼ 0.1428, considered notsignificant); of course, there was no correlation whenR-local was compared to R-avg, so the R-map (kb/cM)values were still useful.

However, we then asked whether the correlation im-proved when R-map was calculated as genes/cM insteadof kb/cM. Limited evidence (Fu et al. 2001) suggestedthat the crossovers contributing to R-map values mightprimarily be occurring in and around genes. In fact, asshown in Figure 5B, we found a significantly improvedcorrelation between R-map values calculated as genes/cMto R-local values also calculated as genes/cM (Spearmanr ¼ 0.48, P ¼ 0.0016).

Therefore, we retested whether we could better pre-dict progeny mapping population sizes (N) when usingrice RGP R-map values calculated as genes/cM ratherthan kilobases/cM. Using R-map (genes/cM) calcula-tions shown in Table 2, Figure 6 demonstrates that in-deed the map population (N) predicted by both the(unmodified) Durrett–Tanksley equation and the (un-modified) Single-Crossover equation based on R-map(genes/cM) values better predicted the published re-sults over the genome-wide R-avg (28 genes/cM) orR-map values based on kb/cM (Figure 6 vs. Figure 4). Infact, with three outliers removed, the correlation be-tween the progeny size predictions based on R-map vs.the published data was extremely significant (Spearmanr ¼ 0.67, P , 0.0001, n ¼ 38) (Figure 6, C and F). Al-though the predictions did not improve further whenthe modified equations were used (for R-map, Spear-man r ¼ 0.38, P ¼ 0.0151, considered significant), thepredictions were significantly better than when theR-avg value was used in conjunction with the modifiedequations (Spearman r ¼ 0.05, P ¼ 0.7662, n ¼ 41, not

significant; data not shown). We conclude that mappingsize predictions based on neighborhood (.280-kb seg-ments) recombination frequencies (R-map) better pre-dict the number of progeny required to be genotypedfor positional gene cloning in rice when R-values arecalculated as genes/cM rather than kilobases/cM, andboth are significant improvements over calculationsbased on the genome-wide R-avg.

The limiting factor is that R-map values often do notreflect R-local frequencies, but when they do theprogeny mapping size can be accurately predicted: Ascalculated in Table 2 and shown in Figure 7A, thelimiting factor is that the neighborhood recombinationfrequency often does not reflect the local recombina-tion frequency, even though it is more reflective of localrates of recombination than the genome-wide average.The situation may or may not be better for other maps inother species, particularly as more robust, higher-resolution maps are constructed. Indeed, the rice mapgave us hope for the future; in spite of the problems withour use of this map (see discussion) as shown in Figure7A, we found 11 examples where the R-map values (cal-culated as genes/cM) were only ,30% different thanthe corresponding R-local value. These corresponded tothe following loci: f5-DU, spl11, gl-3, pla1, hd1, moc1,S32(t), bel, dl1, fon4, and Pi-d2. When the mapping pop-ulation size (N) was calculated for only these 11 alleles,shown in Figure 7, B–E, linear regression analysis showedthat both the modified Durrett–Tanksley equation aswell as the modified Single Crossover equation very ac-curately predicted the mapping population size (N)using recombination frequency (R-map) values fromthe RGP map: the best fit lines were linear (m¼ 1.2) andthe predictions matched the best-fit lines with very highr2 values (0.95–0.98). Similar results were obtained for10 examples where R-map values, calculated as kb/cM,were used; in that case, the predictions matched thebest-fit line also with r2 value of 0.98 (slope y ¼ 0.8x �590; data not shown).

The utility of our approach was best demonstrated bycomparing the data for bel (Pan et al. 2006) vs. Pi-d2(Chen et al. 2006); empirically, only 462 informative

Figure 5.—Meiotic re-combination frequencies(R-map) extracted from theRice Genome Project (RGP)RFLP map better correlatewith local frequencies(R-local) frompublishedpos-itional cloning studies whencalculated as genes/cMrather than kilobases/cM.(A) Linear regression analy-sisofR-mapvs.R-local valuesusing kilobase/cM ratios.

(B) Linear regression analysis using genes/cM ratios. The correlation between R-map vs. R-local will have to be calculated empiricallyfor each map and each species to determine if the methodology described in this study can be employed.

2048 S. J. Dinka et al.

Page 15: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

gametes (N) were genotyped to fine map bel to a mapresolution (T) of 18 genes; in contrast, 8000 informativegametes were required to fine map Pi-d2 to a mapresolution of 33 genes. The RGP map correctly pre-dicted that the recombination frequency (R-local)flanking Pi-d2 was �20-fold lower than that flankingbel. As a result, both modified equations would havepredicted in advance that mapping bel to this resolutionwould require �360 gametes, and that Pi-d2 wouldrequire �10,000 gametes. If such accurate predictionscould be made across the majority of target loci in thefuture, then researchers will be able generate appropri-ately sized map populations and properly allocate hu-man, growth room, and financial resources.

DISCUSSION

A key frustration during positional gene cloning, alsoknown as map-based cloning, has been that the size ofthe mapping population has been found to vary .25-fold within a species (Dinka and Raizada 2006) (Table1) depending on the target locus, and that this final sizehas been difficult to predict. As a result, researchersoften undertake positional cloning attempts with somefear. More importantly, it has been difficult to estimate

the time, resources, growth space, and personnel re-quired to generate, propagate, genotype, and pheno-type an appropriately sized progeny population. Thegoal of this research was to create a detailed method-ology to improve mapping size predictability acrosseukaryotic species once researchers have initially map-ped a target locus to a small interval (1–2 cM). As a sidebenefit, we have provided a detailed review of positionalcloning strategies and results in rice, which should beuseful information for the research community studyingrice, the world’s most important crop. Building upon thework of Durrett et al. (2002), we have demonstratedthe utility of a formula (the Durrett–Tanksley equation)that predicts progeny population size N (Figure 2). Byfurther fine-tuning the Durrett–Tanksley equation, tak-ing into account how many (redundant) crossovers de-fined the map resolution T (a measure of the localmarker density), we were able predict the size of themapping population with 100% accuracy when providedwith local, high-resolution recombination frequencies(Figure 3). We also derived and tested a simpler, moreuser-friendly equation, based on the probability ofachieving only one crossover within the progeny pop-ulation, instead of the two calculated by the Durrett–Tanksley equation. We found that the Single Crossovermodel was as predictive as the Durrett–Tanksley

Figure 6.—More significant improvement in predicting the size of the progeny mapping population (informative gametes)required to be genotyped during positional cloning in rice when employing neighborhood recombination frequencies (R-map) from the RGP map calculated as genes/cM rather than kb/cM. On the x-axis is the mapping population size from publishedpositional cloning studies in rice (see Table 1). On the y-axis is the prediction. (A–C) Models based on the Durrett–Tanksleyequation (unmodified). (D–F) Models based on the Single Crossover equation (unmodified). R-map values were calculated fromthe 1400-marker Rice Genome Project (RGP) RFLP map (F2 of Nipponbare 3 Kasalath cross) (see materials and methods). InC and F, three outliers were removed in comparison to B and E, respectively. Both equations set the probability of success at 95%. Asignificant improvement over use of the R-avg frequency (A and D) is demonstrated.

Predicting Positional Gene Cloning 2049

Page 16: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

equation, and that the number of crossovers (l) wasagain a useful equation modifier (Figures 2 and 3). Withvalidated equations, and researchers not having theluxury of having access to robust recombination fre-quencies in the vicinity of their target allele, wemeasured whether recombination frequencies derivedfrom a 1400-marker reference genetic map (supple-mental Table 1 at http://www.genetics.org/supplemental/)could be useful, and indeed the map population size wasmore accurately predicted when these values were usedinstead of the genome-wide average recombination fre-quency (Figures 4 and 6). Since researchers targeting afully sequenced genome care more about how manycandidate genes they must distinguish, not the numberof kilobases per se, we also determined that the modelscould predict gene resolution as well as or better thanthe kilobase resolution (Figures 5 and 6). Although therice map, in conjunction with our formulas, could haveaccurately predicted several unusually large or small

mapping population-requiring target alleles, includingalleles located near centromeres suffering from sup-pressed meiotic recombination (e.g., chl9, Pi-d2, andBph15), we found that the limiting factor was the cor-relation between R-map vs. R-local recombination fre-quencies (Table 2, Figure 7).

Understanding R-map vs. R-local discrepancies:There are likely several reasons for why recombinationfrequencies from a reference genetic map (R-map) inrice often did not match the frequency in the vicinity oftarget alleles (R-local), and these are important lessonsfor future attempts to predict mapping population size.First and most obvious, even within a .280-kb interval(�1 cM average), the rice RGP map demonstrated thatthe meiotic recombination frequency could vary signif-icantly (Wu et al. 2003) (supplemental Table 1 at http://www.genetics.org/supplemental/). Second, as is the casewith many whole-genome genetic maps, only small num-bers of progeny (typically 100–200) were genotyped to

Figure 7.—The underlying limiting factor is that the neighborhood (.280 kb) recombination frequency R-map often does notreflect the recombination frequency in the vicinity (,50 kb) of the target locus (R-local), but when the values do match, then theprogeny mapping size can be accurately predicted. (A) Comparison of recombination frequencies in the vicinity of the target gene(R-local) compared to neighborhood recombination frequencies (R-map) derived from the Rice Genome Project (RGP) RFLPmap. The graph shows at which loci R-map reflects R-local and where it does not. In the vicinity of the qSh1, dbs, fon4, chl-9, and Pi-d2 loci, R-map accurately predicted a low recombination frequency, unlike R-avg, and thus predicted that large numbers of prog-eny would need to be genotyped. (B–E) Linear regression analysis demonstrating that when R-map values were within 30% ofR-local values, including several high or low recombination intervals shown in A, then the modified Durrett–Tanksley equationand modified Single Crossover equation could accurately predict the final outcome, namely large or small mapping populations,respectively. The modified equations (C and E), which took into account the number of gametes carrying crossovers immediatelyflanking each target locus, were more accurate than the original equations (B and D).

2050 S. J. Dinka et al.

Page 17: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

generate the RGP map (Harushima et al. 1998); as aresult, the location of rare crossovers was more subjectto chance. In other words, had the RGP map been gen-erated multiple times using independent populations,the recombination frequencies would likely have variedsignificantly within 1–2-cM intervals. Third, whereas theRGP map was based on two parental genotypes, therice Indica variety (Kasalath) and the Japonica variety(Nipponbare) (Harushima et al. 1998), only 8 of 41 ofthe studies that we compared our models to also usedthese genotypes to generate their mapping populations.Differences between genotypes, such as the density ofrepetitive DNA or local cytogenetic rearrangements asseen in maize (Bennetzen and Ramakrishna 2002;Wang and Dooner 2006), might have caused R-mapvalues from the RGP map to differ from the publishedstudies. Indeed, it has been shown that domesticatedrice cultivars have an unusually high rate of ongoinggene duplications, vary considerably in the location anddensity of repetitive DNA (e.g., retroelements), and havevery high rates of intergenic nucleotide polymorphisms(SNPs, indels), perhaps in part due to human selectionin geographically isolated locations (Garris et al. 2005;Yuet al. 2005; Tanget al. 2006). Finally, the RGP map wasgenerated using F2 selfed progeny, whereas the map-ping populations used in the 41 published studies weregenerated by diverse methods, including the use of NILs,chromosome SSLs, and recombinant inbred lines (RILs),and in at least at one locus with low recombination rates,fon4-1, an �200-kb chromosome deletion was involved(H. W. Chu et al. 2006). It has been shown that when twochromatids differ in their relatedness to one another, asin RILs vs. NILs, the local recombination frequency maybe affected (Burr and Burr 1991; Lukacsovich andWaldman 1999; Li et al. 2006); in the most extremecase, unequal deletions between chromatids, suppres-sion of meiotic recombination has long been observed

(Rieseberg 2001). All of these factors might have con-tributed to our observation that R-map values from therice RGP map often did not match recombination fre-quencies in the vicinity of target alleles.

Applying these results: As for our recommendationsto researchers undertaking positional cloning, we rec-ommend that the R-map strategy should only be reliedupon when they have access to a reference genetic mapthat has been demonstrated to have a strong correlationbetween R-map values and R-local values. To make thispossible, higher resolution maps, with more markers,must be generated and/or employed to account for sub-centimorgan R variation. In potato, a genetic map with10,000 markers was recently constructed (van Os et al.2006), demonstrating progress in this area. Such high-resolution maps will provide researchers with a range ofrecombination frequencies across a 1–2-cM interval,and thus, at best, researchers could expect to predict anupper and lower range of N, not the precise number. Toimprove the robustness (reproducibility) of R-map fre-quencies, genetic maps must be generated based onsampling hundreds to thousands of progeny rather thanonly 100–200 individuals (Ferreira et al. 2006). To makereference map frequencies relevant to the genotypictargets of positional cloning, maps must be constructedfrom more parental genotype pairs. In addition, forsome species, the number of informative gametes (N)might need to be adjusted to account for male vs. femaledifferences in recombination frequency (Lenormand

and Dutheil 2005) by adjusting the meiosis factor (f )(see materials and methods). As to whether R-mapvalues based on genes/cM or kilobases/cM should beused, we had assumed, given that meiotic recombina-tion in plant genomes has been shown to be highlybiased to gene regions, rather than flanking hetero-chromatin (Fu et al. 2001), that if we ascribed mostrecombination as occurring within or flanking genes,

Figure 8.—The final goal ofthis research: a mapping popula-tion size prediction graph. Shownare the predictions for rice chro-mosome 3 of the number of prog-eny (informative gametes, N)required to positionally clone atarget allele to achieve a five-candidate gene map resolution(T) based on the Single Crossoverequation (unmodified) with a95% probability of success. Thex-axis denotes the physical basepair location along the sequencedchromosome. Arrows point topreviously isolated alleles in rice;the model was effective in predict-ing the relative mapping popula-tion size for these alleles (see

results text). For example, the graph accurately predicted that 20-fold more progeny would be required to positionally clonethe chl9 locus compared to the nearby gl-3 locus. The model is based on meiotic recombination frequencies (R-map in genes/cM)as calculated from the Rice Genome Project (RGP) map (see supplemental Table 1 at http://www.genetics.org/supplemental/).

Predicting Positional Gene Cloning 2051

Page 18: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

then the genes/cM ratio would be less variable than thekb/cM ratio; in other words, as the number of genesincreased in an interval, the frequency of crossoverswould also increase in proportion, keeping the genes/cMratio constant. However, in retrospect, two pieces of datanow suggest that this assumption was incorrect. First, inthe meiotic recombination frequency calculations wemade on the RGP rice map, we found that the genes/cMratio varied within the genome nearly as much as thekb/cM ratio; the coefficient of variation for R (genes/cM)was 98% across the rice genome (n¼ 971) compared to113% for R (kb/cM) (n ¼ 952). Second, if recombina-tion was biased to within or near genes, then the recom-bination frequencies from positional cloning studies(R-local) would be predicted to be higher than thegenome-wide average for rice (R-avg ¼ 277 kb/cM); infact, out of the 41 published studies, 20 studies had aR-local value below R-avg with 20 above the R-avg, sug-gesting no bias in recombination near genes (Table 2).It is therefore possible that the stronger correlation wefound for the RGP map between R-map vs. R-local, whencalculated as genes/cM, was random, but this should betested for more maps and for more species. Indeed, itwill be interesting to test the predictions of this paper inboth larger and more compact genomes.

As more robust, higher-resolution maps across moreparental genotypes become available, our hope is thatthe methodology we have described here will generateaccurate mapping population size graphs that predict arange of N-values for a given target allele. We concludeby showing an example of such a map in Figure 8, re-presenting our predictions for rice chromosome 3. Inspite of the challenges noted, this map did accuratelypredict the very different mapping population sizesrequired for the five alleles shown.

We thank the corresponding authors of the positional cloning stud-ies cited here for numerous personal communications. Funds for workon rice genome annotation at The Institute for Genomic Researchwere through a grant from the National Science Foundation (DBI0321538) to C. Robin Buell. This research was supported by an OntarioPremier’s Research Excellence Award, an Ontario Ministry of Agricul-ture and Food (OMAF) grant, and a Discovery Grant from the NaturalSciences and Engineering Research Council of Canada, to M.N.R.

LITERATURE CITED

Ashikari, M., H. Sakakibara, S. Y. Lin, T. Yamamoto, T. Takashi

et al., 2005 Cytokinin oxidase regulates rice grain production.Science 309: 741–745.

Ballinger, D. G., and S. Benzer, 1989 Targeted gene mutations inDrosophila. Proc. Natl. Acad. Sci. USA 86: 9402–9406.

Bennetzen, J. L., and W. Ramakrishna, 2002 Exceptional haplo-type variation in maize. Proc. Natl. Acad. Sci. USA 99: 9093–9095.

Botstein, D., R. L. White, M. Skolnick and R. W. Davis, 1980 Con-struction of a genetic linkage map in man using restriction frag-ment length polymorphisms. Am. J. Hum. Genet. 32: 314–331.

Burr, B., and F. A. Burr, 1991 Recombinant inbreds for molecularmapping in maize—theoretical and practical considerations.Trends Genet. 7: 55–60.

Chen, X. W., J. J. Shang, D. X. Chen, C. L. Lei, Y. Zou et al., 2006 AB-lectin receptor kinase gene conferring rice blast resistance.Plant J. 46: 794–804.

Chu, H. W., Q. Qian, W. Q. Liang, C. S. Yin, H. X. Tan et al.,2006 The floral organ number4 gene encoding a putative or-tholog of Arabidopsis CLAVATA3 regulates apical meristem sizein rice. Plant Physiol. 142: 1039–1052.

Chu, Z. H., B. Y. Fu, H. Yang, C. G. Xu, Z. K. Li et al., 2006 Targetingxa13, a recessive gene for bacterial blight resistance in rice. The-oret. Appl. Genet. 112: 455–461.

Crawford, D. C., T. Bhangale, N. Li, G. Hellenthal, M. J. Rieder

et al., 2004 Evidence for substantial fine-scale variation in recom-bination rates across the human genome. Nat. Genet. 36: 700–706.

Dinka, S. J., and M. N. Raizada, 2006 Inexpensive fine mappingand positional cloning in plants using visible, mapped trans-genes. Can. J. Bot. 84: 179–188.

Durrett, R. T., K. Y. Chen and S. D. Tanksley, 2002 A simple for-mula useful for positional cloning. Genetics 160: 353–355.

Ferreira, A., M. F. da Silva, L. Silva and C. D. Cruz, 2006 Estimatingthe effects of population size and type on the accuracy of geneticmaps. Genet. Molec. Biol. 29: 187–192.

Fu, H. H., W. K. Park, X. H. Yan, Z. W. Zheng, B. Z. Shen et al.,2001 The highly recombinogenic bz locus lies in an unusuallygene-rich region of the maize genome. Proc. Natl. Acad. Sci. USA98: 8903–8908.

Garris, A. J., T. H. Tai, J. Coborn, S. Kresovich and S. R. McCouch,2005 Genetic structure and diversity in Oryza sativa L. Genetics169: 1631–1638.

Haga, K., M. Takano, R. Neumann and M. Iino, 2005 The rice co-leoptile phototropism gene encoding an ortholog of ArabidopsisNPH3 is required for phototropism of coleoptiles and lateraltranslocation of auxin. Plant Cell 17: 103–115.

Haldane, J. B. S., 1919 The combination of linkage values and thecalculation of distances between the loci of linked factors. J.Genet. 8: 299–309.

Harushima, Y., M. Yano, P. Shomura, M. Sato, T. Shimano et al.,1998 A high-density rice genetic linkage map with 2275markers using a single F-2 population. Genetics 148: 479–494.

Hong, Z., M. Ueguchi-Tanaka, K. Umemura, S. Uozu, S. Fujioka

et al., 2003 A rice brassinosteroid-deficient mutant, ebisu dwarf(d2), is caused by a loss of function of a new member of cyto-chrome P450. Plant Cell 15: 2900–2910.

IRGSP, 2005 The map-based sequence of the rice genome. Nature436: 793–800.

Iyer, A. S., and S. R. McCouch, 2004 The rice bacterial blight re-sistance gene xa5 encodes a novel form of disease resistance.Mol. Plant Microbe Interact. 17: 1348–1354.

Komori, T., T. Yamamoto, N. Takemori, M. Kashihara, H.Matsushima et al., 2003 Fine genetic mapping of the nucleargene, Rf-1, that restores the BT-type cytoplasmic male sterility inrice (Oryza sativa L.) by PCR-based markers. Euphytica 129: 241–247.

Komori, T., S. Ohta, N. Murai, Y. Takakura, Y. Kuraya et al.,2004 Map-based cloning of a fertility restorer gene, Rf-1, in rice(Oryza sativa L.). Plant J. 37: 315–325.

Komorisono, M., M. Ueguchi-Tanaka, I. Aichi, Y. Hasegawa, M.Ashikari et al., 2005 Analysis of the rice mutant dwarf and gla-dius leaf 1. Aberrant katanin-mediated microtubule organizationcauses up-regulation of gibberellin biosynthetic genes indepen-dently of gibberellin signaling. Plant Physiol. 138: 1982–1993.

Konishi, S., T. Izawa, S. Y. Lin, K. Ebana, Y. Fukuta et al., 2006 AnSNP caused loss of seed shattering during rice domestication. Sci-ence 312: 1392–1396.

Lenormand, T., and J. Dutheil, 2005 Recombination difference be-tween sexes: A role for haploid selection. PLoS Biol. 3: 396–403.

Li, D. T., L. M. Chen, L. Jiang, S. S. Zhu, Z. G. Zhao et al., 2007 Finemapping of S32(t), a new gene causing hybrid embryo sac steril-ity in a Chinese landrace rice (Oryza sativa L.). Theoret. Appl.Genet. 114: 515–524.

Li, L. L., M. Jean and F. Belzile, 2006 The impact of sequence di-vergence and DNA mismatch repair on homeologous recombi-nation in Arabidopsis. Plant J. 45: 908–916.

Li, X. Y., Q. Qian, Z. M. Fu, Y. H. Wang, G. S. Xiong et al.,2003 Control of tillering in rice. Nature 422: 618–621.

2052 S. J. Dinka et al.

Page 19: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

Li, Y. H., O. Qian, Y. H. Zhou, M. X. Yan, L. Sun et al.,2003 BRITTLE CULM1, which encodes a COBRA-like protein,affects the mechanical properties of rice plants. Plant Cell 15:2020–2031.

Liu, X. Q., L. Wang, S. Chen, F. Lin and Q. H. Pan, 2005 Geneticand physical mapping of Pi36(t), a novel rice blast resistancegene located on rice chromosome 8. Mol. Genet. Genom. 274:394–401.

Lukacsovich, T., and A. S. Waldman, 1999 Suppression of intra-chromosomal gene conversion in mammalian cells by small de-grees of sequence divergence. Genetics 151: 1559–1568.

Luo, A. D., Q. Qian, H. F. Yin, X. Q. Liu, C. X. Yin et al., 2006 EUI1,encoding a putative cytochrome P450 monooxygenase, regulatesinternode elongation by modulating gibberellin responses inrice. Plant Cell Physiol. 47: 181–191.

McVean, G. A. T., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley

et al., 2004 The fine-scale structure of recombination rate vari-ation in the human genome. Science 304: 581–584.

Miyoshi, K., B. O. Ahn, T. Kawakatsu, Y. Ito, J. I. Itoh et al.,2004 PLASTOCHRON1, a timekeeper of leaf initiation in rice,encodes cytochrome P450. Proc. Natl. Acad. Sci. USA 101: 875–880.

Nachman, M. W., 2002 Variation in recombination rate across thegenome: evidence and implications. Curr. Opin. Genet. Dev.12: 657–663.

Nishimura, A., M. Ashikari, S. Lin, T. Takashi, E. R. Angeles et al.,2005 Isolation of a rice regeneration quantitative trait loci geneand its application to transformation systems. Proc. Natl. Acad.Sci. USA 102: 11940–11944.

Pan, G., X. Y. Zhang, K. D. Liu, J. W. Zhang, X. Z. Wu et al.,2006 Map-based cloning of a novel rice cytochrome P450 geneCYP81A6 that confers resistance to two different classes of herbi-cides. Plant Mol. Biol. 61: 933–943.

Paterson, A., E. Lander, J. Hewitt, S. Peterson, S. Lincoln et al.,1988 Resolution of quantitative traits into Mendelian factors byusing a complete linkage map of restriction fragment lengthpolymorphisms. Nature 335: 721–726.

Paterson, A. H., M. Freeling and T. Sasaki, 2005 Grains of knowl-edge: genomics of model cereals. Genome Res. 15: 1643–1650.

Qiu, S. Q., K. D. Liu, J. X. Jiang, X. Song, C. G. Xu et al., 2005 De-limitation of the rice wide compatibility gene S5(n) to a 40-kbDNA fragment. Theoret. Appl. Genet. 111: 1080–1086.

Raizada, M. N., 2003 RescueMu protocols for maize functional ge-nomics. Methods Mol. Biol. 236: 37–58.

Ren, Z. H., J. P. Gao, L. G. Li, X. L. Cai, W. Huang et al., 2005 A ricequantitative trait locus for salt tolerance encodes a sodium trans-porter. Nat. Genet. 37: 1141–1146.

Rieseberg, L. H., 2001 Chromosomal rearrangements and specia-tion. Trends Ecol. Evol. 16: 351–358.

Sazuka, T., I. Aichi, T. Kawai, N. Matsuo, H. Kitano et al.,2005 The rice mutant dwarf bamboo shoot 1: A leaky mutantof the NACK-type kinesin-like gene can initiate organ primordiabut not organ development. Plant Cell Physiol. 46: 1934–1943.

Sharma, T. R., M. S. Madhav, B. K. Singh, P. Shanker, T. K. Jana

et al., 2005 High-resolution mapping, cloning and molecularcharacterization of the Pi-k(h) gene of rice, which confers re-sistance to Magnaporthe grisea. Mol. Genet. Genom. 274: 569–578.

Singer, A., H. Perlman, Y. L. Yan, C. Walker, G. Corley-Smith et al.,2002 Sex-specific recombination rates in zebrafish (Danio rerio).Genetics 160: 649–657.

Sun, X. L., Y. L. Cao, Z. F. Yang, C. G. Xu, X. H. Li et al., 2004 Xa26,a gene conferring resistance to Xanthomonas oryzae pv. oryzae inrice, encodes an LRR receptor kinase-like protein. Plant J. 37:517–527.

Suzaki, T., M. Sato, M. Ashikari, M. Miyoshi, Y. Nagato et al.,2004 The gene FLORAL ORGAN NUMBER1 regulates floralmeristem size in rice and encodes a leucine-rich repeat receptorkinase orthologous to Arabidopsis CLAVATA1. Development131: 5649–5657.

Takahashi, Y., A. Shomura, T. Sasaki and M. Yano, 2001 Hd6, arice quantitative trait locus involved in photoperiod sensitivity,encodes the alpha subunit of protein kinase CK2. Proc. Natl.Acad. Sci. USA 98: 7922–7927.

Tanabe, S., M. Ashikari, S. Fujioka, S. Takatsuto, S. Yoshida et al.,2005 A novel cytochrome P450 is implicated in brassinosteroidbiosynthesis via the characterization of a rice dwarf mutant,dwarf11, with reduced seed length. Plant Cell 17: 776–790.

Tang, T., J. Lu, J. Huang, J. He, S. R. McCouch et al., 2006 Genomicvariation in rice: genesis of highly polymorphic linkage blocksduring domestication. PLoS Genet. 2: e199.

Tanksley, S. D., M. W. Ganal and G. B. Martin, 1995 Chromo-some landing: a paradigm for map-based gene cloning in plantswith large genomes. Trends Genet. 11: 63–68.

Ueda, T., T. Sato, J. Hidema, T. Hirouchi, K. Yamamoto et al.,2005 qUVR-10, a major quantitative trait locus for ultraviolet-B resistance in rice, encodes cyclobutane pyrimidine dimer pho-tolyase. Genetics 171: 1941–1950.

Ueguchi-Tanaka, M., M. Ashikari, M. Nakajima, H. Itoh, E. Katoh

et al., 2005 GIBBERELLIN INSENSITIVE DWARF1 encodes asoluble receptor for gibberellin. Nature 437: 693–698.

van Os, H., S. Andrzejewski, E. Bakker, I. Barrena, G. J. Bryan

et al., 2006 Construction of a 10,000-marker ultradense geneticrecombination map of potato: Providing a framework for accel-erated gene isolation and a genomewide physical map. Genetics173: 1075–1087.

Wan, X. Y., J. M. Wan, L. Jiang, J. K. Wang, H. Q. Zhai et al.,2006 QTL analysis for rice grain length and fine mapping ofan identified QTL with stable and major effects. Theoret. Appl.Genet. 112: 1258–1270.

Wang, G. W., Y. Q. He, C. G. Xu and Q. F. Zhang, 2006 Fine map-ping of f5-Du, a gene conferring wide-compatibility for pollenfertility in inter-subspecific hybrids of rice (Oryza sativa L.). The-oret. Appl. Genet. 112: 382–387.

Wang, Q. H., and H. K. Dooner, 2006 Remarkable variation inmaize genome structure inferred from haplotype diversity atthe bz locus. Proc. Natl. Acad. Sci. USA 103: 17644–17649.

Wang, Z. X., M. Yano, U. Yamanouchi, M. Iwamoto, L. Monna et al.,1999 The Pib gene for rice blast resistance belongs to the nu-cleotide binding and leucine-rich repeat class of plant disease re-sistance genes. Plant J. 19: 55–64.

Wu, J. Z., H. Mizuno, M. Hayashi-Tsugane, Y. Ito, Y. Chiden et al.,2003 Physical maps and recombination frequency of six ricechromosomes. Plant J. 36: 720–730.

Wu, T. D., and C. K. Watanabe, 2005 GMAP: a genomic mappingand alignment program for mRNA and EST sequences. Bioinfor-matics 21: 1859–1875.

Yamanouchi, U., M. Yano, H. X. Lin, M. Ashikari and K. Yamada,2002 A rice spotted leaf gene, SpI7, encodes a heat stress tran-scription factor protein. Proc. Natl. Acad. Sci. USA 99: 7530–7535.

Yang, H. Y., A. Q. You, Z. F. Yang, F. Zhang, R. F. He et al.,2004 High-resolution genetic mapping at the Bph15 locusfor brown planthopper resistance in rice (Oryza sativa L.). The-oret. Appl.Genet. 110: 182–191.

Yang, Z., X. Sun, S. Wang and Q. Zhang, 2003 Genetic and physicalmapping of a new gene for bacterial blight resistance in rice. The-oret. Appl. Genet. 106: 1467–1472.

Yano, M., Y. Katayose, M. Ashikari, U. Yamanouchi, L. Monna et al.,2000 Hd1, a major photoperiod sensitivity quantitative trait lo-cus in rice, is closely related to the Arabidopsis flowering timegene CONSTANS. Plant Cell 12: 2473–2483.

Yoshimura, S., U. Yamanouchi, Y. Katayose, S. Toki, Z. X. Wang

et al., 1998 Expression of Xa1, a bacterial blight-resistance genein rice, is induced by bacterial inoculation. Proc. Natl. Acad. Sci.USA 95: 1663–1668.

Yu, J., J. Wang, W. Lin, S. Li and H. E. A. Li, 2005 The genomes ofOryza sativa: a history of duplications. PLoS Biol 3: e38.

Yuan, Q. P., O. Y. Shu, A. H. Wang, W. Zhu, R. Maiti et al., 2005 TheInstitute for Genomic Research Osa1 rice genome annotation da-tabase. Plant Physiol. 138: 17–26.

Zeng, L. R., S. H. Qu, A. Bordeos, C. W. Yang, M. Baraoidan et al.,2004 Spotted leaf11, a negative regulator of plant cell deathand defense, encodes a U-box/armadillo repeat protein en-dowed with E3 ubiquitin ligase activity. Plant Cell 16: 2795–2808.

Zhang, H. T., J. J. Li, J. H. Yoo, S. C. Yoo, S. H. Cho et al., 2006 Ricechlorina-1 and chlorina-9 encode ChlD and ChlI subunits ofMg-chelatase, a key enzyme for chlorophyll synthesis and chloro-plast development. Plant Mol. Biol. 62: 325–337.

Predicting Positional Gene Cloning 2053

Page 20: Predicting the Size of the Progeny Mapping …...model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping pop We derived and/or validated, then

Zhang, K. W., Q. Qian, Z. J. Huang, Y. Q. Wang, M. Li et al.,2006 GOLD HULL AND INTERNODE2 encodes a primarilymultifunctional cinnamyl-alcohol dehydrogenase in rice1. PlantPhysiol. 140: 972–983.

Zhu, Y. Y., T. Nomura, Y. H. Xu, Y. Y. Zhang, Y. Peng et al.,2006 ELONGATED UPPERMOST INTERNODE encodes a cy-tochrome P450 monooxygenase that epoxidizes gibberellins in anovel deactivation reaction in rice. Plant Cell 18: 442–456.

Zou, J. H., Z. X. Chen, S. Y. Zhang, W. P. Zhang, G. H. Jiang et al.,2005 Characterizations and fine mapping of a mutant gene for

high tillering and dwarf in rice (Oryza sativa L.). Planta 222: 604–612.

Zou, J. H., S. Y. Zhang, W. P. Zhang, G. Li, Z. X. Chen et al.,2006 The rice HIGH-TILLERING DWARF1 encoding an ortho-log of Arabidopsis MAX3 is required for negative regulation ofthe outgrowth of axillary buds. Plant J. 48: 687–696.

Communicating editor: J. A. Birchler

2054 S. J. Dinka et al.