13
Copyright Ó 2007 by the Genetics Society of America DOI: 10.1534/genetics.107.081323 Gene-Based Sequence Diversity Analysis of Field Pea (Pisum) Runchun Jing,* ,1 Richard Johnson,* Andrea Seres, Gyorgy Kiss, Mike J. Ambrose, Maggie R. Knox, T. H. Noel Ellis and Andrew J. Flavell* ,2 *Plant Research Unit, University of Dundee at Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, United Kingdom, Institute of Genetics–Agricultural Biotechnology Center, Szent-Gyo ¨rgyi u. 4, 2100 Go ¨do ¨ll } o, Hungary and John Innes Centre, Colney, Norwich NR4 7UH, United Kingdom Manuscript received August 30, 2007 Accepted for publication October 11, 2007 ABSTRACT Sequence diversity of 39 dispersed gene loci was analyzed in 48 diverse individuals representative of the genus Pisum. The different genes show large variation in diversity parameters, suggesting widely differing levels of selection and a high overall diversity level for the species. The data set yields a genetic diversity tree whose deep branches, involving wild samples, are preserved in a tree derived from a polymorphic retrotransposon insertions in an identical sample set. Thus, gene regions and intergenic ‘‘junk DNA’’ share a consistent picture for the genomic diversity of Pisum, despite low linkage disequilibrium in wild and landrace germplasm, which might be expected to allow independent evolution of these very different DNA classes. Additional lines of evidence indicate that recombination has shuffled gene haplotypes efficiently within Pisum, despite its high level of inbreeding and widespread geographic distribution. Trees derived from individual gene loci show marked differences from each other, and genetic distance values between sample pairs show high standard deviations. Sequence mosaic analysis of aligned sequences identifies nine loci showing evidence for intragenic recombination. Lastly, phylogenetic network analysis confirms the non-treelike structure of Pisum diversity and indicates the major germplasm classes involved. Overall, these data emphasize the artificiality of simple tree structures for representing genomic sequence variation within Pisum and emphasize the need for fine structure haplotype analysis to accurately define the genetic structure of the species. T HE genetic diversity of a species is the sum of its total DNA sequence variation, resulting from millions of years of cumulative mutation, recombination, and selection. Understanding the pattern of the diversity within cultivated plant species and their wild relatives is both interesting and practically important from the viewpoint of conservation and use. Therefore, the ways by which genetic diversity in populations are estimated and represented are important. A popular approach for measuring the genetic diversity of accessions of a species is to analyze samples from gene banks (germplasm col- lections) by one or more of the many available molecular marker methods. The effectiveness of this strategy de- pends upon the suitability of the marker method for analyzing the diversity of the sample collection under investigation. Different marker methods can give differ- ent views of diversity, depending upon the evolutionary parameters of the underlying DNA sequence variation. Rapidly evolving DNAs, such as simple sequence repeats (SSRs), give high resolution views of relatedness; single nucleotide polymorphism (SNP)-based variation is more suited to deeper relationships, reflecting the slow mutation rate for this type of sequence variation; and transposon insertion-based marker methods should lie between these two, reflecting their intermediate muta- tion rate, although few studies have been carried out to test this. Furthermore, the genomic compartment in which the markers reside might affect the diversity pattern seen, and it is possible that markers residing mainly in junk DNA might produce different results from markers based upon genes, which are predomi- nantly euchromatic. For these reasons there is a need to compare the diversity patterns obtained using different molecular approaches for diversity assessment. The data sets that result from marker analysis of germplasm samples are often represented by trees, with the summed branch lengths separating any two samples taken as a measure of their relatedness. Trees are visually appealing but can be misleading. One of several serious limitations to this approach is the fact that it ignores introgression and recombination. Modeling has shown that recombination leads to long terminal branches, resulting in trees showing a ‘‘star phylogeny’’ (Schierup and Hein 2000). Recombination results in multiplica- tion of the number of trees, with every inherited crossover producing a corresponding extra tree, which Sequence data from this article have been deposited with the EMBL/ GenBank Data Libraries under accession nos. EU269859–EU271646. 1 Present Address: School of Biological Sciences, University of East Anglia, Norwich NR4 7TJ, United Kingdom. 2 Corresponding author: Division of Plant Sciences, University of Dundee at SCRI, Invergowrie, Dundee DD2 5DA, United Kingdom. E-mail: a.j.fl[email protected] Genetics 177: 2263–2275 (December 2007) Downloaded from https://academic.oup.com/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

Gene-Based Sequence Diversity Analysis of Field Pea - Genetics

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Copyright � 2007 by the Genetics Society of AmericaDOI: 10.1534/genetics.107.081323

Gene-Based Sequence Diversity Analysis of Field Pea (Pisum)

Runchun Jing,*,1 Richard Johnson,* Andrea Seres,† Gyorgy Kiss,† Mike J. Ambrose,‡

Maggie R. Knox,‡ T. H. Noel Ellis‡ and Andrew J. Flavell*,2

*Plant Research Unit, University of Dundee at Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, United Kingdom,†Institute of Genetics–Agricultural Biotechnology Center, Szent-Gyorgyi u. 4, 2100 Godoll}o, Hungary and

‡John Innes Centre, Colney, Norwich NR4 7UH, United Kingdom

Manuscript received August 30, 2007Accepted for publication October 11, 2007

ABSTRACT

Sequence diversity of 39 dispersed gene loci was analyzed in 48 diverse individuals representative of thegenus Pisum. The different genes show large variation in diversity parameters, suggesting widely differinglevels of selection and a high overall diversity level for the species. The data set yields a genetic diversity treewhose deep branches, involving wild samples, are preserved in a tree derived from a polymorphicretrotransposon insertions in an identical sample set. Thus, gene regions and intergenic ‘‘junk DNA’’ sharea consistent picture for the genomic diversity of Pisum, despite low linkage disequilibrium in wild andlandrace germplasm, which might be expected to allow independent evolution of these very different DNAclasses. Additional lines of evidence indicate that recombination has shuffled gene haplotypes efficientlywithin Pisum, despite its high level of inbreeding and widespread geographic distribution. Trees derivedfrom individual gene loci show marked differences from each other, and genetic distance values betweensample pairs show high standard deviations. Sequence mosaic analysis of aligned sequences identifies nineloci showing evidence for intragenic recombination. Lastly, phylogenetic network analysis confirms thenon-treelike structure of Pisum diversity and indicates the major germplasm classes involved. Overall, thesedata emphasize the artificiality of simple tree structures for representing genomic sequence variation withinPisum and emphasize the need for fine structure haplotype analysis to accurately define the geneticstructure of the species.

THE genetic diversity of a species is the sum of its totalDNA sequence variation, resulting from millions of

years of cumulative mutation, recombination, andselection. Understanding the pattern of the diversitywithin cultivated plant species and their wild relativesis both interesting and practically important from theviewpoint of conservation and use. Therefore, the waysby which genetic diversity in populations are estimatedand represented are important. A popular approach formeasuring the genetic diversity of accessions of a speciesis to analyze samples from gene banks (germplasm col-lections) by one or more of the many available molecularmarker methods. The effectiveness of this strategy de-pends upon the suitability of the marker method foranalyzing the diversity of the sample collection underinvestigation. Different marker methods can give differ-ent views of diversity, depending upon the evolutionaryparameters of the underlying DNA sequence variation.Rapidly evolving DNAs, such as simple sequence repeats

(SSRs), give high resolution views of relatedness; singlenucleotide polymorphism (SNP)-based variation ismore suited to deeper relationships, reflecting the slowmutation rate for this type of sequence variation; andtransposon insertion-based marker methods should liebetween these two, reflecting their intermediate muta-tion rate, although few studies have been carried out totest this. Furthermore, the genomic compartment inwhich the markers reside might affect the diversitypattern seen, and it is possible that markers residingmainly in junk DNA might produce different resultsfrom markers based upon genes, which are predomi-nantly euchromatic. For these reasons there is a need tocompare the diversity patterns obtained using differentmolecular approaches for diversity assessment.

The data sets that result from marker analysis ofgermplasm samples are often represented by trees, withthe summed branch lengths separating any two samplestaken as a measure of their relatedness. Trees are visuallyappealing but can be misleading. One of several seriouslimitations to this approach is the fact that it ignoresintrogression and recombination. Modeling has shownthat recombination leads to long terminal branches,resulting in trees showing a ‘‘star phylogeny’’ (Schierup

and Hein 2000). Recombination results in multiplica-tion of the number of trees, with every inheritedcrossover producing a corresponding extra tree, which

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. EU269859–EU271646.

1Present Address: School of Biological Sciences, University of East Anglia,Norwich NR4 7TJ, United Kingdom.

2Corresponding author: Division of Plant Sciences, University of Dundeeat SCRI, Invergowrie, Dundee DD2 5DA, United Kingdom.E-mail: [email protected]

Genetics 177: 2263–2275 (December 2007)

Dow

nloaded from https://academ

ic.oup.com/genetics/article/177/4/2263/6064433 by guest on 12 M

arch 2022

differs by a single branch translocation from its ante-cedent (Hudson 1983; Maddison 1995). One way ofrepresenting such ambiguity is to replace tree structureby a reticulated network (Bryant and Moulton 2004;Huson and Bryant 2006). In summary, the usefulnessof trees derived from unlinked sites in a genome thathas undergone significant recombination is question-able, and trees derived from linked sites must be con-sidered in the context of the ratio of recombination tomutation in that region in the corresponding lineage.

We have previously studied the genetic diversity of thefield pea (P. sativum) and its wild relatives, using severalmarker-based approaches. P. sativum is an Old Worldlegume crop first cultivated 10,000 years ago (Blixt

1972; Zohary 1996; Mithen 2003). Like all major cropspecies, cultivated Pisum has a condensed gene poolrelative to its wild relatives (in this case P. fulvum andP. elatius), and the relationships within the genus Pisumhave generated substantial debate. Studies by ourselvesand others have indicated that P. fulvum can reason-ably be considered as a distinct species, with P. sativumforming a subset of P. elatius (Vershinin et al. 2003;Baranger et al. 2004; Tar’anet al. 2005). Other claimedspecies such as P. humile and P. abyssinicum have littlesupport from molecular studies. Furthermore, there isextensive sharing of SSAP retrotransposon markersbetween all Pisum species, suggesting that there hasbeen significant outcrossing between them (Vershinin

et al. 2003), despite the predominantly inbreedingnature of the genus, which should restrict introgressionof haplotypes. Most, but not all wide crosses betweenPisum genotypes are fertile, the exceptions involvingP. fulvum and P. abyssinicum. P. abyssinicum and P. sativumare interfertile although success rates can be variable,and between some accessions reciprocal crosses maynot be interfertile. P. fulvum and P. sativum have a morerestricted interfertility, although crosses can be success-ful (Clement et al. 2002) and for practical purposesP. abyssinicum has been used in a bridging cross (e.g.,Forster et al. 1999). There are sporadic reports ofinfertility in wide crosses (e.g., Ochatt et al. 2004), butsuch reports should be considered against the back-ground of demonstrable success in these wide crosses.For these reasons Pisum is perhaps best considered as aspecies complex with multiple subspecies that inter-breed to a different degree.

The SSAP markers that have given rise to the aboveconclusion are based upon insertions of the PDR1 Ty1-copia group retrotransposon (Vershinin et al. 2003). Inlarge plant genomes such as maize and Pisum, retro-transposons are mainly located in intergenic nuclearDNA, and the antiquity of individual insertion eventstypically lies in the 0.1–5 MYA range (Sanmiguel et al.1998; Jing et al. 2005). One aim of the present study wasto compare SSAP-based diversity analysis with gene-based sequence diversity analysis, to see whether thesevery different genomic fractions produce similar or

different pictures of Pisum diversity. Such informationwould also provide pointers to an optimal way ofassessing diversity within Pisum and perhaps the othermajor inbreeding crop species with large, transposon-rich genomes such as wheat, barley, and rice. The othermain goal was to explore the variation in genic sequen-ces for Pisum and compare the contributions of re-combination and mutation to the evolution of itsgenetic diversity.

MATERIALS AND METHODS

Plant materials and DNA extraction: Forty-eight Pisumaccessions from the John Innes Pisum Collection (http://www.jic.ac.uk/germplas/pisum/) were selected (see supple-mental Table 1 at http://www.genetics.org/supplemental/).Forty-five of these are in a core set of 52 Pisum accessionschosen for our previous study of SSAP marker-based gene-tic diversity analysis (Vershinin et al. 2003). The three newP. sativum accessions ( JI15, JI281, and JI399) are the parentsof genetic mapping populations, providing polymorphismsthat would hopefully allow us to map the genes in this study,and the four samples missing from this study are P. abyssinicumaccessions that have been shown by extensive marker analysisto be almost identical to the single chosen P. abyssinicumaccession JI2385 (Vershinin et al. 2003). Each sample was theprogeny of a single selfed plant, to ensure homogeneity andderived from an inbred accession to ensure homozygosity.DNAs were extracted by the QIAGEN (Valencia, CA) DNeasy96 method.

PCR primers for amplification of gene loci: Primers usedfor gene segment amplification are shown in supplementalTable 2 at http://www.genetics.org/supplemental/. All pri-mers are derived from Pisum gene exon sequences, and allPCR amplicons contain at least one intron. Forty-two primerpairs were originally selected, 3 of which were discarded (datanot shown) because they produced mixed PCR products.

PCR amplification of pea gene-derived sequences: All PCRamplifications were carried out using a MJ Research PTC-225Tetrad Thermal Cycler. Twenty-five-microliter reactions con-tained the following: 30 ng of pea genomic DNA template,0.2 mm each of forward and reverse primer, 2.5 ml QIAGENHotStar 103 PCR buffer containing 15 mm MgCl2, 4 ml ofdNTPs (Roche 1250 mm), and 0.125 ml (0.625 units) Hot StarTaq DNA Polymerase (QIAGEN). Cycling conditions involvedan initial enzyme activation step for 15 min at 95�, followed by40 cycles of 94� for 1 min, 55� for 1 min and 72� for 1 min, witha final extension cycle of 72� for 7 min. Amplification productswere visualized by electrophoresis of 5 ml of PCR product on a1.5% agarose gel containing ethidium bromide. PCR productswere purified using NucleoFast 96 PCR plates (Macherey-Nagel). The PCR product yield was estimated by comparisonwith standard concentrations of l bacteriophage DNA usingagarose gel electrophoresis. Thirty-eight amplifications failedto produce a PCR product (2% of the complete set).

DNA sequencing: Sequencing PCR reactions used 0.33 mlBig Dye Terminator v3.1 Cycle Sequencing RR-100 (Perkin-Elmer, Norwalk, CT), 3.33 ml BetterBuffer (Web Scientific),0.44 mm primer, 6 ng template PCR DNA fragment in a finalvolume of 10 ml. Cycle conditions: Cycling conditions involved25 cycles of 96� for 30 sec, 50� for 15 sec, and 60� for 4 min withtemperature ramp rate of 1�/sec. Products were purifiedeither using genCLEAN plates (Genetix) or by ethanolprecipitation, as follows: 31 ml of a 0.1 m sodium acetate pH4.6 in 95% ethanol solution was added. This was allowed to

2264 R. Jing et al.D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

incubate for 15 min at room temperature, followed by a 4000rpm spin for 30 min in an Eppendorf 5810R plate centrifuge.Plates were inverted and spun at 700 rpm for 1 min. A total of150 ml 70% ethanol was added, and the plates were vortexedand incubated for 15 min before another spin at 4000 rpm for10 min. The ethanol wash step was then repeated to ensureefficient removal of unincorporated sequencing dye, andsamples were left to air dry for 20 min. Electrophoresis ofproducts from the sequencing reactions was carried out usingan ABI 3730 capillary sequencer.

DNA sequence analysis: The output sequence traces fromthe sequencer were pretrimmed using PHRED with a qualityscore of 13 or higher (Ewing et al. 1998; Ewing and Green

1998; http://www.phrap.com/phred/). Pretrimmed sequenceswere imported into Bioedit Sequence Alignment Editor(version 7.0.4.1; http://www.mbio.ncsu.edu/BioEdit/bioedit.html), and the sequences for each gene were aligned withClustalW (Thompsonet al. 1994). At this point all the sequencetraces were visually checked against the corresponding align-ment to identify regions of poor sequence quality leading toerrors in polymorphism assignment. This resulted in thediscarding of 3 complete alignments and 49 individual se-quences (2.6% of the remaining data set) due to unacceptablylow sequence quality. The alignments were then trimmed tothe lengths of the shortest members, and presumptive poly-morphisms were identified. To test the accuracy of polymorphismidentification a second, independent round of polymorphismsearching was performed. The sequence traces were importedinto Mutation Surveyor V3.01 (SoftGenetics LLC http://www.softgenetics.com/mutationSurveyor.html). This softwaredisplays polymorphisms in both graphical and tabular forms,linked to the sequence traces and with associated qualityscores. The Mutation Surveyor environment was used to crosscheck visually the sequence traces for all presumptive poly-morphisms identified in both searches in the completesequence set. For the remaining 39 alignments, 14 discrep-ancies between the two methods were identified out of 1021presumptive polymorphic sites in 13,503 bp (�0.1%). These14 discrepancies were all resolved after close inspection of thetraces.

DNA marker analysis and genetic linkage analysis: Singlenucleotide or insertion-deletion polymorphisms segregatingbetween sample JI399 and either or both of JI15 and JI281were identified in CLUSTALW sequence alignments. Allele-specific PCR marker primers are shown in supplemental Table3 at http://www.genetics.org/supplemental/. PCRs usingQIAGEN Hotstar Taq DNA polymerase contained 0.1 mm eachprimer, 20 ng template genomic DNA, and the followingprogram: 95� for 15 min, 40 cycles of 94� for 30 sec, 55� for30 sec, 72� for 30 sec followed by 72� for 7 min. For lociPis_GEN_27, Pis_GEN_28, and psat_EST_00176 the anneal-ing temperature was reduced to 51�. The resulting markerswere resolved by size difference on agarose gel electrophoresisby the MS-PCR method (Rust et al. 1993), the CEL Iendonuclease approach (Kulinski et al. 2000), and/or bythe tagged microarray (TAM) approach (Flavell et al. 2003;Jing et al. 2007). In the latter case e and g tags were appendedto X and Y tags (supplemental Table 3), respectively, byinclusion of TAM tag primers e-X (ACCGCATCCGAACATTTGTC½spacer C-18�CGTGCCGCAAGGACGGGC) and g-Y(GCCGATAATCACCTTGTCAC½spacer C-18�TATATTATGGGCCGCACTGACGGAC), and the e and g tags were detected bythe TAM approach. Markers were scored in either or bothrecombinant inbred line (RIL) mapping populations JI15 3JI399 and JI281 3 JI399 (Ellis et al. 1992). Loci Psat_EST_172,Psat_EST_185, Psat_EST_189, Psat_EST_191, Pis_GEN_15,and Pis_GEN_27 were scored by the CEL 1 approach (Kulinski

et al. 2000) in an attenuated RIL population of 16 and

positioned by matching scores to existing data in an Excelspread sheet.

Phylogenetic and diversity analysis: Combined trees for allgene introns were created using DNAdist ½neighbor joining(NJ)� and DNAml (maximum likelihood) in the Phylip pack-age (http://evolution.genetics.washington.edu/phylip.html)after concatenation of DNA sequence alignments createdusing ClustalW within the Bioedit package (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Individual gene trees wereconstructed in the same way using DNAdist. The SSAP tree wasa reanalysis of the data from Vershinin et al. (2003) based onDice coefficients, using the DARwin5 package (http://mendel.ethz.ch:8080/Darwin/) (Perrier et al. 2003). Where a se-quence was missing, a corresponding gap was inserted. Treeswere modified with TreeView v1.6.6 (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). Diversity data analysis (Table 1)was carried out using DnaSP v4.10 (http://www.ub.es/dnasp/).Distance matrices were obtained using DNAdist, and datamanipulations (Figures 4 and 5) were carried out in MicrosoftExcel spread sheets. Linkage disequilibrium analysis used theTASSEL (http://www.maizegenetics.net/tassel) (Bradbury

et al. 2007) and DNAsp packages.Putative recombination breakpoints within gene segments

were calculated from aligned single gene segment sequencesusing the TOPALi software (http://www.bioss.ac.uk/knowledge/topali/) (Milne et al. 2004) with the difference of squares(DSS) option. The statistical significance of DSS peaks, deter-mined by the f test (Bruen et al. 2006) within the TOPALipackage, was estimated with parametric bootstrapping. Net-work analysis of the combined aligned sequence set used thesplit decomposition approach in the software package SplitsTree4 (http://www-ab.informatik.uni-tuebingen.de/software/splitstree4/welcome.html) (Bandelt and Dress 1992; Bryant

and Moulton 2004; Huson and Bryant 2006). The statisticalf test (Bruen et al. 2006) was carried out within SplitsTree4.

RESULTS

To investigate gene-based diversity in pea we adopteda strategy of sequencing multiple gene regions in a set ofsamples that were selected to represent the full diversityof the genus Pisum (Table 1; supplemental Tables 1 and2). The Pisum sample set comprises 45 accessionsincluding 10 P. fulvum, 10 P. elatius, 2 P. humile, and 22P. sativum landraces and cultivars. This is a subset of acore set of 52 Pisum accessions used in a previous SSAPmarker-based genetic diversity study (Vershinin et al.2003), allowing us to compare the diversity patterns de-duced from the two experimental approaches. Addi-tionally, three cultivars ( JI15, JI281, and JI399) were alsoanalyzed because these are the parents of geneticmapping populations, to allow genetic mapping of theanalyzed genes.

Thirty-nine genes were analyzed, to both ensure againstbias associated with individual genes and to investigatethe partitioning of diversity in a variety of genomic sites.Five of these genes were selected because of theirsignificance in seed development and composition orplant architecture, and the rest were selected at randomfrom Pisum sequence databases. The PCR ampliconswere chosen to contain intron sequence, because of thehigher variation in such regions, and exon-derived PCR

Gene-Based Diversity Analysis of Pea 2265D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

TA

BL

E1

Div

ersi

typ

aram

eter

sfo

r3

9ge

ne

regi

on

sin

Pis

um

Gen

en

o.a

Pri

mer

pai

rn

ame

Gen

ean

no

tati

on

Seq

uen

cele

ngt

hN

o.

of

seq

uen

ces

No

.o

fp

oly

mo

rph

icsi

tes

Ind

els

Hap

loty

pes

Hap

loty

pe

div

ersi

tyD

iver

sity

pSD

pQ

w

1A

J418

375

Nor

kn

od

ula

tio

nfa

cto

rre

cep

tor

140

472

03

0.08

40.

0009

0.00

060.

0032

32

Psa

t_E

ST_1

90F

ibri

llin

325

434

05

0.44

50.

0029

0.00

060.

0056

43

Pis

_Gen

9F

ruct

ose

-1,6

-bis

ph

osp

hat

ase

gen

e27

547

122

100.

415

0.00

320.

0011

0.01

034

Psm

t_E

ST_2

01A

nio

nex

chan

gep

rote

infa

mil

y27

548

141

120.

499

0.00

370.

0001

0.01

151

5P

sat_

EST

_178

Acy

l-co

Ad

ehyd

roge

nas

e37

347

85

90.

775

0.00

420.

0005

0.00

652

6P

smt_

EST

_202

Un

nam

edp

rote

in(A

.th

alia

na)

218

476

27

0.49

40.

0044

0.00

10.

0064

77

Pis

_Gen

10sy

m29

gen

efo

rse

rin

e-th

reo

nin

ep

rote

inki

nas

e18

042

54

70.

623

0.00

520.

001

0.00

646

8P

smt_

EST

_200

Po

ly(A

)-b

ind

ing

pro

tein

314

478

47

0.51

00.

0053

0.00

120.

0129

79

X88

790

Gra

nu

le-b

ou

nd

star

chsy

nth

ase

406

4712

012

0.85

50.

0057

0.00

050.

0066

910

AF

0101

90P

isu

mU

nif

oli

ata

tran

scri

pti

on

fact

or

(Un

i)45

948

227

180.

866

0.00

610.

0007

0.01

109

11P

is_G

en14

Hsp

70ge

ne

for

hea

t-sh

ock

pro

tein

278

4713

114

0.64

10.

0061

0.00

130.

0107

12P

sat_

EST

_189

Pu

tati

veP

I/p

ho

sph

atid

ylch

oli

ne

tran

sfer

pro

tein

253

4411

411

0.80

90.

0062

0.00

080.

0101

13P

smt_

EST

_197

110

kDa

4SN

c-T

ud

or

do

mai

np

rote

in42

147

85

100.

803

0.00

630.

0006

0.00

642

14P

is_G

en28

Th

iore

do

xin

fge

ne,

com

ple

tecd

s.45

245

138

140.

838

0.00

690.

0006

0.00

859

15P

sat_

EST

_163

GT

Pas

eac

tiva

tin

gp

rote

in-li

ke45

446

246

190.

861

0.00

720.

0008

0.01

2516

Psa

t_E

ST_1

88P

uta

tive

pro

tein

115

4617

210

0.54

10.

0082

0.00

250.

0299

417

Psa

t_E

ST_1

71g-

Am

ino

bu

tyra

tetr

ansa

min

ase

sub

un

itp

recu

rso

ris

ozy

me

152

545

214

130.

778

0.00

830.

0011

0.00

951

18P

is_G

en15

1-am

ino

cycl

op

rop

ane-

1-ca

rbo

xyla

tesy

nth

ase

(AC

S1)

307

4524

616

0.71

40.

0086

0.00

154

0.01

823

19C

D85

9399

Cat

ion

icam

ino

acid

tran

spo

rter

-like

pro

tein

342

4821

317

0.85

40.

0089

0.00

070.

0145

620

Psa

t_E

ST_1

61P

uta

tive

dip

epti

dyl

pep

tid

ase

IV54

145

4010

150.

841

0.00

920.

0011

0.01

756

21P

is_G

en5

No

np

ho

sph

ory

lati

ng

GA

PD

Hge

ne

515

4640

725

0.90

20.

0095

0.00

10.

0179

22P

sat_

EST

_175

Un

nam

edp

rote

inp

rod

uct

(A.

thal

ian

a)14

146

82

100.

775

0.00

990.

001

0.01

291

23P

is_G

en7

Lh

cb3

gen

efo

rch

loro

ph

yll

a/b

-bin

din

gp

rote

in48

547

315

270.

918

0.01

010.

001

0.01

624

Psa

t_E

ST_1

72P

uta

tive

Tco

mp

lex

pro

tein

282

4525

1319

0.87

00.

0105

0.00

120.

0214

125

Psa

t_E

ST_1

79P

uta

tive

PST

Vd

RN

A-b

idin

gp

rote

in45

245

243

190.

884

0.01

090.

001

0.01

2226

Psm

t_E

ST_1

99A

t2g4

4950

(A.

thal

ian

a)38

342

338

160.

779

0.01

140.

0019

0.02

119

27P

is_G

en16

Gib

ber

elli

nc2

0-o

xid

ase

gen

e,co

mp

lete

cds

413

4833

1221

0.78

90.

0131

0.00

120.

0184

528

Psa

t_E

ST_1

62P

uta

tive

asp

arta

team

ino

tran

sfer

ase

185

4817

29

0.62

20.

0132

0.00

395

0.02

394

29P

sat_

EST

_165

Pu

tati

veam

idas

e22

148

181

130.

895

0.01

380.

0012

40.

0190

430

Psa

t_E

ST_1

76P

uta

tive

leu

kotr

ien

e-A

4h

ydro

lase

351

4638

322

0.90

80.

0138

0.00

160.

0255

131

Psa

t_E

ST_1

85P

uta

tive

pro

tein

294

4632

622

0.89

30.

0141

0.00

120.

0252

832

Pis

_Gen

27kd

sAge

ne

for

Kd

o-8

-ph

osp

hat

esy

nth

ase

319

3932

914

0.74

80.

0151

0.00

20.

0246

533

Psm

t_E

ST_2

03Si

mil

arto

absc

isic

acid

ind

uce

dp

rote

inh

om

olo

gue

283

4820

314

0.82

00.

0153

0.00

131

0.01

604

34P

smt_

EST

_198

Pu

tati

vean

ion

exch

ange

pro

tein

117

4610

210

0.66

90.

0165

0.00

110.

0203

235

Psa

t_E

ST_1

91tR

NA

pse

ud

ou

rid

ine

syn

thas

efa

mil

y53

847

382

200.

924

0.01

730.

0008

0.01

661

36A

J291

298

Pim

(Pea

m)

565

4754

1125

0.93

70.

0220

0.00

240.

0257

937

Pis

_Gen

12P

uta

tive

hem

eo

xyge

nas

e1

pre

curs

or

(HO

1)53

442

4611

130.

814

0.02

410.

001

0.02

44

38P

smt_

EST

_196

Cys

tath

ion

ine

gam

ma

syn

thas

e41

047

275

170.

891

0.02

600.

0012

0.02

5839

Pis

_Gen

8P

S-IA

A4/

5ge

ne

auxi

n-r

egu

late

dge

ne

295

4762

927

0.94

40.

0366

0.00

30.

0546

To

tal

13,4

361,

791

873

188

572

——

——

Mea

n34

546

225

150.

7494

0.01

080.

0012

0.01

64

aG

ene

ord

er,

by

incr

easi

ng

pva

lue,

asu

sed

inF

igu

re3.

2266 R. Jing et al.D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

primers were used because the primer sites are betterconserved, minimizing PCR failure and associated datalosses.

Gene sequence-based diversity in Pisum: Alignmentswere generated for each of the 39 gene segment se-quence sets. Nucleotide sequence diversity parameterswere then determined for each sequence set, aftertrimming to remove missing data (Table 1). The regionssequenced are in the range 115–565 bp, with an averageof 345 bp and a total aligned sequence set of 13,436 bpper sample. Haplotype diversity is high for most of theloci, averaging 0.749 (SD 0.184), with a notable excep-tion, AJ418375, encoding the Nork receptor-like kinaseinvolved in the early steps of nodulation signal trans-duction (Endre et al. 2002), with 0.084. The mean nu-cleotide diversity p varies over 40-fold, from 0.0009 forNork to 0.0366 for PS-1AA4/5 auxin-regulated gene,with a mean of 0.0108 (SD 0.0071), and Watterson’s Qw

(Watterson 1975), on the basis of the number of seg-regating sites in the sample, varies between 0.0032 forNork and 0.0546 for PS-1AA4/5. Overall, these data

reveal a high gene-based diversity of Pisum germplasmand show that this diversity is partitioned unevenlybetween genes. Tajima’s D values on all 39 sequencedloci (Tajima 1989) showed four with evidence for non-neutral evolution ½Pis_Gen 9, Pis_Gen 15, Psat_EST_188, and Psat_EST_201, with D values of �2.1714(P . 0.05), �1.8259 (P . 0.05), �2.258 (P . 0.01), and�2.088 (P . 0.05), respectively�.

Sequence-based diversity trees: To investigate thetree structure of the complete sequence set, the alignedgene-derived sequences were concatenated and an NJtree was derived from the distance matrix (Figure 1A).The most distinct clade in this tree, with 100% bootstrapsupport, comprises the ten P. fulvum accessions. This isflanked by seven other wild Pisum accessions with weakbootstrap support (64%). Only two other groups arestrongly supported by bootstrap analysis, comprisingtwo pairs of P. sativum cultivars.

The gene sequence-based tree was compared to a NJtree for an identical set of Pisum samples derived fromretrotransposon insertion SSAP marker data (Vershinin

Figure 1.—Phylogenetictrees for Pisum samples de-duced from 39 gene-derivedsequences and retrotrans-poson markers. Unrootedtrees were generated from(A) 39 gene segments and(B) 54 PDR1 SSAP ret-rotransposon markers(Vershinin et al. 2003) asdescribed in materials

and methods. Species des-ignations, all percentagebootstrap support values.90%, and branch lengthsin distance units are shown.

Gene-Based Diversity Analysis of Pea 2267D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

et al. 2003; Figure 1B). The SSAP tree shares the deeperfeatures described above for the gene sequence tree,such as a well-supported P. fulvum clade, with the sameflanking set of seven wild Pisums. This broad agreementfor the deeper structure of the genetic diversity forPisum, involving most of the wild samples in this analysis,between two approaches that utilize very different frac-tions of the genome, suggests that both are giving anaccurate picture at this level.

Single locus diversity trees: Despite the similarities intree structure mentioned above, there is little detailedsimilarity between the two trees. Furthermore, the treesin Figure 1 tend toward a ‘‘star’’ phylogeny (i.e., the pres-

ence of long terminal branches) and the low bootstrapsupport for most of the branches, which is suggestivethat these genomes have been recombining with oneanother (Schierup and Hein 2000). Consistent withthis suggestion, removal of individual sequences fromthe concatenated complete sequence set resulted insome rearrangements to the tree structure (data notshown). The reason for this became apparent when treesderived from individual gene-derived aligned sequenceswere investigated. A few examples are shown in Figure 2and all are included in supplemental Figure 1 at http://www.genetics.org/supplemental/. Several individualgene trees resemble the consensus tree quite well, for

Figure 2.—Phylogenetic trees for Pisum samples derived from single gene-derived sequences. Sequence diversity parametersfor the genes are shown in Table 1. Species designations and branch lengths in distance units are shown.

2268 R. Jing et al.D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

example both Psat_EST 171 and AJ291298 show closegrouping for all P. fulvum samples, with some membersof the intermediate group of seven wild Pisum, men-tioned above, nearby. However, many trees show struc-tures that depart from this. For example, the AF010190and Psat_EST_178 sequence trees split one of the P.fulvum samples, JI2530, from the other nine, and for PisGen 27 P. fulvum is widely distributed across the tree.Thus, the phylogeny inferred from these data dependsvery much on which gene is used.

Heterogeneity of gene-based genetic distance values:The variance in genetic distance inferred from individ-ual gene loci can be analyzed in a quantitative mannerusing the distance values. A few examples of the vari-ation in pairwise distances between gene loci, togetherwith the average for all pairwise distances, are shown inFigure 3. The 39 individual gene loci in the figure areordered by increasing diversity (p) for the combineddata set (i.e., in the same order as for Table 1). Figure 3ashows the distances between two cultivars, JI321 andJI399, which show the lowest mean distance separatingthem in the entire sample set (0.0036) and are tightlylinked in the combined NJ tree (Figure 2). Twenty-threeout of 37 loci are monomorphic between these samples,and the highest divergence corresponds to a geneticdistance of 0.0219 (for Pis_Gen_8). Figure 3b involvesJI321 again, this time with JI1267, a P. sativum landracefrom India. Thirteen out of 37 loci are monomorphicin this sample pair, and the mean genetic distance ishigher (0.0090), as expected. When JI321 is comparedto the wild P. elatius sample JI261 (Figure 3c), only threeloci are monomorphic in the sequenced region, and themean genetic distance is still higher (0.0138). Lastly,when two of the most distantly related samples are cross-compared ( JI261, P. elatius, and JI2517, P. fulvum) themean genetic distance is almost doubled (0.0234), with2 out of 36 loci monomorphic (Figure 3d).

Because the gene order in Figure 3 reflects the in-creasing global diversity in the combined gene set(Table 1), one would expect an increase in individualallele-to-allele genetic distance from left to right forindividual sample pairs. This is broadly true (Figure 3),but there is obviously a large amount of scatter. Forexample, gene 3 (Pis Gen 9) is one of the least diverse ofthe 39 genes in this study (Table 1), yet it shows one ofthe highest genetic distances between JI261 and JI2517(arrow in Figure 3d), while gene 22 (Psat EST 175, alsoarrow in Figure 3d), a medium diversity gene in thecomplete gene set, is monomorphic between these twohighly diverse plant samples.

To quantify this scatter in genetic distance, eachindividual sample–sample distance value for each locuswas divided by the corresponding average distance forall sample pairs for that locus, to normalize for varyingdiversity between gene loci. The normalized distances(D), with corresponding standard deviations (SD) andSD/D ratios for the complete dataset are shown in

Figure 4. The normalized distance values vary between0.233 and 1.888, and their associated SD values varybetween 0.398 and 2.659. In general, closely relatedsample pairs show SD values slightly greater than thedistances (SD/D $ 1), and distantly related samplestend to have SD/D ratios between 0.5 and 1.0.

Figure 3.—Distance values for 39 gene segments betweenselected Pisum sample pairs. (a) Cultivars JI321 and JI399. (b)Cultivar JI321 and landrace JI1267. (c) Cultivar JI321 and wildP. elatius sample JI261. (d) Wild P. elatius JI261 and P. fulvumJI2517. Examples denoted by arrows are discussed in the text.

Gene-Based Diversity Analysis of Pea 2269D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

Diversity, genetic linkage, and linkage disequilibrium:We suggest that the scatter in distance values betweengene loci is responsible for the distorted phylogeniesseen in individual gene trees. If this scatter derives fromintrogression between different Pisum lineages that hasshuffled gene-specific diversity in Pisum, then closelylinked genes might yield similar tree structures. Toinvestigate this possibility, we determined genetic mappositions for all the gene loci in this study showing

polymorphism in either or both of two recombinantinbred mapping populations (JI15 3 JI399 and JI281 3

JI399) derived from crosses between samples used inthis study. The derived genetic linkage map is shown inFigure 5. Several gene segment pairs are closely linked,notably Psat_EST_197 and AF010190, which cosegre-gate, Psat_EST_191 and Psat_EST_185, separated by5 cM, and Psat_EST_178 and Pis_GEN27, separated by2 cM. The first of these pairs are separated by �60kb,or �seven genes on Medicago genomic BAC cloneAC139708 (supplemental Table 2). The pea genome isknown to be larger than, but generally collinear withMedicago (Kalo et al. 2004). The corresponding treesbased on these two gene segments show some similaritiesin tree structure but several major differences (Figure2). The other pairs above seem to be no more similarthan any randomly chosen pair (supplemental Figures 1and 2 at http://www.genetics.org/supplemental/). Thereis therefore little support for the suggestion that closelylinked gene loci are showing coordinated DNA se-quence evolution in this collection of germplasm.

The largely independent evolution of closely linkedloci in Pisum led us to explore the conservation inlinkage disequilibrium (LD) in these samples. Three ofthe seven P. sativum linkage groups (LGs) had five or sixmapped gene loci available, and LD was investigatedwithin and between the loci on each of these threegroups. One example (LG III) is shown in Figure 6, and

Figure 4.—Variation in distance values. Normalized dis-tance values (D, line), corresponding SD values (solid trian-gles), and SD/D ratios (open squares) for all 1128 samplepairs in this study. The sample pairs are ordered by increasingnormalized D value.

Figure 5.—Genetic map positions for gene segments used in this study. DNA sequence polymorphisms in gene segments weremapped against an existing set of molecular markers in recombinant inbred populations JI281 3 JI399 and/or JI15 3 JI 399.Linkage groups and non-JI399 population parental lines are shown.

2270 R. Jing et al.D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

the other two, which produced broadly similar results,are in supplemental Figure 2. To perform the analysisthe gene loci sequences were concatenated in theirchromosome order. We had no direct information ofthe gene polarity but our main goal was intergenic LDdecay, and intragenic LD decay would be very lowcompared to intergenic effects. LD decay is only validrelative to the germplasm in which it is investigated, andrates of LD decay increase as the germplasm broadens(Morrell et al. 2005; Caldwell et al. 2006). In thePisum cultivars there is apparently extensive LD be-tween the loci of LG III, but there is no statisticalsupport for this conclusion because of the low samplenumber (only seven cultivars, supplemental Table 1).For landraces there is extensive LD within the se-quenced loci but little evidence for LD between loci,even for the very closely linked Psat_EST_197 andAF010190. This result is consistent with the lack of closesimilarity between the trees derived from these locimentioned above and underlines the fact that evenclosely linked loci appear to be evolving in a largelyindependent manner in noncultivar Pisum. Lastly, wildsamples show apparent sporadic LD conservation butthis is confined to relatively few markers, and we suggest

that this is an artifact deriving from the strong popula-tion substructure in the wild sample set (see discus-

sion). Indeed, there is evidence for LD decay withinsome of the loci in wild germplasm (Figure 6).

Evidence for recombination from mosaic detectionand phylogenetic network analysis: The combinedanalyses above suggest that recombination has shuffledgene loci in Pisum. To test whether intragenic recom-bination could be detected in any of these sequences,individual gene segment alignments were analyzed bythe TOPALi package, which searches for statisticallysignificant evidence of chimeric sequences in alignedsequences (Milne et al. 2004). This analysis revealed 9 ofthe 39 gene segments that show such evidence forrecombination (Pis_Gen_8, Pis_Gen_12, Pis_Gen_28,Psat_EST_163, Psat_EST_178, Psat_EST_190, Psat_EST_191,Psat_EST_196,andPsat_EST_202; supplemen-tal Figure 3 at http://www.genetics.org/supplemental/).

As a final test of the impact of recombination uponthe genetic diversity of Pisum and to determine thelineages most affected, we reanalyzed the completeconcatenated sequence set used for Figure 1 using thesplit decomposition approach, which identifies depar-tures from treelike structure in molecular diversity data

Figure 6.—Linkage disequilibrium within and between gene segments for Pisum LG III LD determinations (as described inmaterials and methods) were performed on the gene segments in LGIII (Figure 4), which were concatenated in their corre-sponding linkage order prior to analysis. The polarity of each gene with respect to linkage order is unknown. The top right quad-rant of each matrix shows each SNP with an allele frequency of $0.1, scored for LD value against all other corresponding SNPs.Boundaries of gene segments are shown by thick lines. Color coded r2 values (from red¼ 1.0 , r2 , 0.9 to white¼ 0.1 , r2 , 0) areshown at the side, and locations of the polymorphisms are shown at the bottom. The bottom left quadrants show correspondingP-values (from red ¼ P , 0.0001 to white ¼ P . 0.01). Sample sets (cultivars, landraces, and wild Pisums are described in sup-plemental Table 1). The bottom graphs plot LD decay with distance for all polymorphic SNPs in all samples.

Gene-Based Diversity Analysis of Pea 2271D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

sets and visualizes them as reticulated networks (Ban-

delt and Dress 1992; Bryant and Moulton 2004;Huson and Bryant 2006). The results of this approach(Figure 7) show extensive evidence for non-treelikestructure, indicated by reticulate structure in the net-work, and identify the samples exhibiting strongestevidence of recombination. Among the most affectedsamples are JI3147–JI261 (two P. elatius samples),JI3151–JI241 (P. elatius–P. humile), JI2078–JI228 (P.elatius–P. sativum), and the latter pair with adjacentsectors, mainly comprising P. sativum samples, in thenetwork. There is also evidence for network structureinvolving the P. fulvum group, both internally and withadjacent P. abysinicum ( JI2385), P. elatius ( JI3155, JI1094,JI3149, JI3147, and JI261), and P. humile ( JI1794). Astatistical f test (Bruen et al. 2006) showed significantevidence for recombination (P ¼ 0).

DISCUSSION

The first goal of this study was to compare the diversitypatterns obtained using nuclear gene region sequenceand PDR1 retrotransposon insertion data. Our observa-tion that the deep structures of the correspondingdiversity trees, comprising many of the wild samples, is

conserved between these two approaches is importantbecause it validates both methods, which rely upon verydifferent genomic compartments. In plant genomesretrotransposon insertions are mainly found as nestedblocks, either between ‘‘gene islands’’ or in centromericregions largely devoid of genes (Pelissier et al. 1995;Sanmiguel et al. 1996; Presting et al. 1998). Thecorresponding location(s) of Pisum retrotransposoninsertions is not yet clear but a Ty3-gypsy group elementhybridizes to multiple dispersed spots on in situ meta-phase chromosome spreads (Neumann et al. 2001), andmost of the identifiable PDR1 insertions in the Pisumgenome are either in transposable element DNA orunknown repetitious sequence ( Jing et al. 2005). Incontrast, the sequence data used here derive from 39expressed nuclear genes. Our data imply that bothsequence classes yield genetic diversity data that repre-sent faithfully the behavior of the genome as a whole.The rapid decay of LD shown here suggests that this isnot due to hitchhiking effects; rather, we propose thatthe great majority of both PDR1 insertions and genesequence mutations are selectively neutral and accumu-late at comparable rates across the Pisum genus.

The reasonably large number of gene loci studied(39) and extensive sequence set (.13 kb) obtained foreach give us reason to be confident that the diversity

Figure 7.—Split-decomposition network forPisum samples deduced from 39 gene-derivedsequences. An unrooted network (Huson andBryant 2006) was derived from the same genesegment sequence data set as that used for Figure1B. Bold lines indicate edges with 100% confi-dence level from 1000 bootstrap replicates.

2272 R. Jing et al.D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

picture obtained for this well-studied Pisum sample setis an accurate one. DNA sequence of genes shouldreveal deeper evolutionary relationships than retro-transposon insertions, because its rate of evolution(�10�8– 10�10 mutations site�1 yr�1) is orders of magni-tude lower than the estimated transposition rate ofPDR1 of 5 3 10�7 insertions element�1 yr�1 ( Jing et al.2005).

This study has also provided interesting data on thegene-based genetic diversity of Pisum, which is shown tobe a diverse genus, with one polymorphic site on averageevery 15 bp across the sequences studied here. A com-parison between our data and analogous gene-basedsequence data for two other important crop species,barley (Caldwell et al. 2006) and maize (Tenaillon

et al. 2001) is shown in Table 2. The data are partitionedbetween cultivars, landraces, and wild plants. All threespecies show high frequencies of mutations, which isconsistent with the fact that all still have a widespreadextant wild germplasm. Maize displays the highestdiversity, at least for landraces and cultivars (169% Qw

average, relative to Pisum, wild plants were not studied).This is unsurprising, as it is an outbreeder. Pisum showssomewhat higher values for Qw than barley (143%averaged across landraces and cultivars). The reasonfor this is unclear to us as both species show quite similarlow outcrossing rates, broad geographic distributions,and domestication histories. Nevertheless, the values ofQw obtained here, combined with the estimate ofpopulation size from Jing et al. 2005, are consistent witha credible mutation rate in the range 1.8 3 10�8–1.1 3

10�9 for those sequences that did not show non-neutralevolution in Tajima’s test (Tajima 1989; data not shown).

All of the Pisum gene loci analyzed here show highintrinsic genetic diversity (Table 1), with the singleexception of the Nork locus, which displays only threepolymorphisms at two sites in 2 out of 48 genotypes.Both sites are intronic, with the flanking exon sequence(65 bp) being monomorphic in the entire germplasmset. While only a very small sequence has been studiedand it would be presumptuous to attach too much

significance to this result, such low diversity in such adiverse species is striking and may reflects a high degreeof purifying selection in the genomic region. Norkencodes a receptor kinase mediating the nodulationresponse to the Nod factor of rhizobial bacteria (Endre

et al. 2002). It thus plays a crucial role in nitrogenfixation by pea and needs to interact with multiple otherfactors to achieve this, necessitating strong conserva-tion. Of course, it is not possible at this stage to saywhether Nork itself, rather than a closely linked locus, isunder selection, but the rapid LD decay in this sampleset argues for the former.

Multiple lines of evidence in this study suggest thatthe genetic structure of Pisum diversity is inadequatelyrepresented by a tree structure and the concept of asingle genetic distance between two Pisum samples is apoor approximation to the reality that each locus has itsown distance value, and the variation between loci forthese values (correcting for inherent differences ingene diversity between loci) is of the same order as thevalues themselves. Thus, different genetic loci in Pisumcan display very different pictures of the genetic di-versity of the species, and simply averaging these hidesthe true diversity pattern. Our LD and TOPALi analysesstrongly suggest that recombination is the major reasonfor this. The rapid decay of LD within the noncultivatedgermplasm, even between closely linked genes, togetherwith statistically significant evidence for intragenicrecombination in 23% of the genes studied heredemonstrate that recombination has been very effectivein shuffling Pisum genetic diversity between the majorlineages of the genus. This has also been found to be thecase for Arabidopsis thaliana (Nordborg et al. 2002; Kim

et al. 2007), another species with highly restrictedoutcrossing, and has been proposed to explain the pau-city of retrotransposon-based markers that are confinedto individual Pisum species (Vershinin et al. 2003).

The observations reported here have implications forthe management of plant genetic resources and the se-lection of germplasm for crop plant breeding. Crop plantgermplasm collections, typically containing thousands of

TABLE 2

Gene-based sequence diversity in crop plants

SpeciesNo. ofgenes

Basepairs

No. ofindividuals Mutations/bp Average Qw

a Reference

Pisum spp. 39 13,436 23 wild 0.067 0.0191 This study18 landraces 0.025 0.00797 cultivars 0.012 0.0053

Hordeum vulgare 6 7,243 34 wild 0.060 0.0144 Caldwell et al. (2006)15 landraces 0.015 0.005074 cultivars 0.017 0.0038

Zea mays 21 14,423 16 landraces 0.036 0.0123b Tenaillon et al. (2001)9 cultivars 0.0099b

a Watterson (1975).b Silent sites only.

Gene-Based Diversity Analysis of Pea 2273D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

wild, landrace, and cultivated samples, are seen as apowerful resource for the introduction of new geneallele combinations into cultivated germplasm. Theglobal diversity of such collections is often representedin a tree format, which is used for selecting samplesubsets (core collections) for further exploration orexploitation. This way of representing genetic diversityis shown here to be a potentially error-ridden process.Sample pairs that are very closely related by globaldiversity analysis may nevertheless carry many distantlyrelated gene alleles and distantly related samples, evenincluding samples supposedly from different species,and carry many identical gene alleles. At the momentsuch discrepancies cannot be predicted and this worksuggests that fine structure haplotype analysis of germ-plasm collections will be required to provide thesolution to this problem. Unfortunately, our LD dataindicate that this will be difficult to achieve in un-cultivated Pisum germplasm, which carries the greatestwealth of unexplored alleles needed for future cropimprovement, because the high rate of LD decay in suchgermplasm will necessitate a correspondingly detaileddefinition of the fine structure of haplotype diversitydown to the gene level.

We thank Linda Milne for substantial bioinformatics help, DavidMarshall, Jo Dicks, Robbie Waugh for many interesting discussions andcomments, and finally Deborah Charlesworth for many constructivecriticisms of this work. This work was supported by grant FP6-2002-FOOD-1-506223 (Grain Legumes Integrated Project) from the Euro-pean Commission (EC) under the EC Framework program VI.

LITERATURE CITED

Bandelt, H. J., and A. W. Dress, 1992 Split decomposition: a newand useful approach to phylogenetic analysis of distance data.Mol. Phylogenet. Evol. 1: 242–252.

Baranger, A., G. Aubert, G. Arnau, A. L. Laine, G. Deniot et al.,2004 Genetic diversity within Pisum sativum using protein-and PCR-based markers. Theor. Appl. Genet. 108: 1309–1321.

Blixt, S., 1972 Mutation genetics in Pisum. Agri. Hort. Genet. 30: 1–293.

Bradbury, P. J., Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ram-Doss et al., 2007 TASSEL: software for association mapping ofcomplex traits in diverse samples. Bioinformatics 23: 2633–2635.

Bruen, T. C., H. Philippe and D. Bryant, 2006 A simple and robuststatistical test for detecting the presence of recombination. Ge-netics 172: 2665–2681.

Bryant, D., and V. Moulton, 2004 Neighbor-Net: an agglomerativemethod for the construction of phylogenetic networks. Mol. Biol.Evol. 21: 255–265.

Caldwell, K. S., J. Russell, P. Langridge and W. Powell,2006 Extreme population-dependent linkage disequilibriumdetected in an inbreeding plant species, Hordeum vulgare. Genetics172: 557–567.

Clement, S. L., D. C. Hardie and L. R. Elberson, 2002 Variationamong accessions of Pisum fulvum for resistance to Pea Weevil.Crop Sci. 42: 2167–2173.

Ellis, T. H. N., L. Turner, R. P. Hellens, D. Lee, C. L. Harker et al.,1992 Linkage maps in pea. Genetics 130: 649–663.

Endre, G., A. Kereszt, Z. Kevei, P. Kalo and G. B. Kiss, 2002 A re-ceptor kinase gene regulating symbiotic nodule development.Nature 417: 962–966.

Ewing, B., and P. Green, 1998 Basecalling of automated sequencertraces using phred.: II. Error probabilities. Genome Res. 8: 186–194.

Ewing, B., L. Hillier, M. Wendl and P. Green, 1998 Basecalling ofautomated sequencer traces using phred: I. Accuracy assessment.Genome Res. 8: 175–185.

Flavell, A. J., V. N. Bolshakov, A. Booth, A. R. Jing, J. Russell

et al., 2003 A microarray-based high throughput molecularmarker genotyping method—the tagged microarray marker(TAM) approach. Nucleic Acids Res. 31: e115.

Forster, C., H. North, N. Afzal, C. Domone, A. Hornostaj et al.,1999 Molecular analysis of a null mutant for pea (Pisum sativumL.) seed lipoxygenase-2. Plant Mol. Biol. 39: 1209–1220.

Hudson, R. R., 1983 Properties of a neutral allele model with intra-genic recombination. Theor. Popul. Biol. 23: 183–201.

Huson, D. H., and D. Bryant, 2006 Application of phylogeneticnetworks in evolutionary studies. Mol. Biol. Evol. 23: 254–267.

Jing, R., M. R. Knox, J. M. Lee, A. V. Vershinin, M. J. Ambrose et al.,2005 Insertional polymorphism and antiquity of PDR1 retro-transposon insertions in Pisum species. Genetics 171: 741–752.

Jing, R., V. I. Bolshakov and A. J. Flavell, 2007 The tagged micro-array marker (TAM) method for high throughput detection ofsingle nucleotide and indel polymorphisms. Nat. Protoc. 2: 168–177.

Kalo, P., A. Seres, A. Taylor, J. Jakab, Z. Kevei et al.,2004 Comparative mapping between Medicago sativa and Pisumsativum Mol. Gen. Genomics 272: 235–246.

Kim, S., V. Plagnol, T. T. Hu, C. Toomajian, C. Clark et al.,2007 Recombination and linkage disequilibrium in Arabidopsisthaliana. Nature Genet. 39: 1151–1155.

Kulinski, J., D. Besack, C. A. Oleykowski, A. K. Godwin and A. T.Yeung, 2000 The CEL I enzymatic mutation detection assay.BioTechniques 29: 44–48.

Maddison, W. P., 1995 Phylogenetic histories within and amongspecies, pp. 273–287 in Experimental and Molecular Approaches toPlant Biosystematics, edited by P. C. Hoch and A. G. Stevenson.Monographs in Systematics, Missouri Botanical Garden, St. Louis.

Milne, I, F. Wright, G. Row, D. F. Marshal, D. Husmeier et al.,2004 TOPALi: software for automatic identification of recombi-nant sequences within DNA multiple alignments. Bioinformatics20: 1806–1807.

Mithen, S., 2003 After the Ice: A Global Human History 20,000–5,000BC. Weidenfield & Nicholson, London.

Morrell, P. L., D. M. Toleno, K. E. Lundy and M. T. Clegg,2005 Low levels of linkage disequilibrium in wild barley(Hordeum vulgare ssp. spontaneum) despite high rates of self-fertilization. Proc. Natl. Acad. Sci. USA 102: 2442–2447.

Neumann, P., M. Nouzova and J. Macas, 2001 Molecular andcytogenetic analysis of repetitive DNA in pea (Pisum sativumL.). Genome 44: 716–728.

Nordborg, M., J. O. Borevitz, J. Bergelson, C. C. Berry, J. Chory

et al., 2002 The extent of linkage disequilibrium in Arabidopsisthaliana. Nature Genet. 30: 190–193.

Ochatt, S. J., A. Benabdelmouna, P. Marget, G. Aubert, F. Moussy

et al., 2004 Overcoming hybridization barriers between pea andsome of its wild relatives. Euphytica 137: 353–359.

Pelissier, T., S. Tutois, J.-M. Deragon, S. Tourmente, S. Genestier

et al., 1995 Athila, a new retroelement from Arabidopsis thaliana.Plant Mol. Biol. 29: 441–452.

Perrier, X., A. Flori and F. Bonnot, 2003 Data analysis methods,pp 43–76 in Genetic Diversity of Cultivated Tropical Plants, edited byP. Hamon, M. Seguin, X. Perrier and J. C. Glaszmann. CIRAD/Science, Montpellier, France.

Presting, G. G., L. Malysheva, J. Fuchs and I. Schubert, 1998 ATY3/GYPSY retrotransposon-like sequence localises to the cen-tromeric regions of cereal chromosomes. Plant J. 16: 721–728.

Rust, S., H. Funke and G. Assman, 1993 Mutagenically-separatedPCR (MS-PCR): a highly specific one step procedure for easy mu-tation detection. Nucleic Acids Res. 21: 3623–3629.

Sanmiguel, P., A. Tikhonov, Y-K. Jin, N. Motchoulskaia, D.Zakharov et al., 1996 Nested retrotransposons in the inter-genic regions of the maize genome. Science 274: 765–768.

Sanmiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakijama and J. L.Bennetzen, 1998 The paleontology of intergene retrotranspo-sons of maize. Nat. Genet. 20: 43–45.

Schierup, M. H., and J. Hein, 2000 Consequences of recombina-tion on traditional phylogenetic analysis. Genetics 156: 879–891.

Tajima, F., 1989 Statistical methods to test for nucleotide mutationhypothesis by DNA polymorphism. Genetics 123: 585–595.

2274 R. Jing et al.D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022

Tar’an, B., C. Zhang, T. Wankertin, A. Tullu and A. Vandenberg,2005 Genetic diversity among varieties and wild species acces-sions of pea (Pisum sativum L.) based on molecular markers,and morphological and physiological characters. Genome 48:257–272.

Tenaillon, M. I., M. C. Sawkins, A. D. Long, R. L. Gaut, J. F.Doebley et al., 2001 Patterns of DNA sequence polymorphismalong chromosome 1 of maize (Zea mays ssp. maysL.). Proc. Natl.Acad. Sci. USA 98: 9161–9166.

Thompson, J. D., D. G. Higgins and T. J. Gibson, 1994 Clustal W:improving the sensitivity of progressive multiple sequence align-ment through sequence weighting, position-specific gap penal-ties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.

Vershinin, A. V., T. R. Alnutt, M. R. Knox, M. R. Ambrose and T. H.N. Ellis, 2003 Transposable elements reveal the impact of in-trogression, rather than transposition, in Pisum diversity, evolu-tion and domestication. Mol. Biol. Evol. 20: 2067–2075.

Watterson, G. A., 1975 On the number of segregating sites in ge-netic models without recombination. Theor. Popul. Biol. 7: 256–276.

Zohary, D., 1996 The mode of domestication of the founder cropsof near east agriculture, pp. 142–158 in The Origin and Spread ofAgriculture and Pastoralism in Eurasia, edited by D. R. Harris. Uni-versity College London Press, London.

Communicating editor: D. Voytas

Gene-Based Diversity Analysis of Pea 2275D

ownloaded from

https://academic.oup.com

/genetics/article/177/4/2263/6064433 by guest on 12 March 2022