Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Linkage, QTL, Mapping Populations and
Marker-Assisted-SelectionDave Douches, Allen Van Deynze and
Dave FrancisPAA SolCAP Workshop
August 2009
1
Detecting QTLs using Marker-based analysis (MBA) requires genotyping of individuals
Steps toward QTL analysis using MBA:
1- Segregating populations
2- Measure phenotypic variationThe more accurate measurement the more precise QTL identification.
3- Measure genotypic variationGenotype individualsGenetic map construction
4- Statistical analysis to find QTLs and QTL validation
Common Population StructuresF2;F3;F4;Recombinant Inbred (RI);F1-derived, intermated recombinant inbred (IRI);Doubled haploids (DH);Backcross (BC1);Association study in the base germplasm;Near Isogenic Lines (NIL).
3
QTL mapping in an F2
4
+ from Parent 2+ from Parent 1
1 2 5 6 7 8 9 103 4
…is an association study in a population where the confounding effect of pedigree has been removed.
Mapping Populations in Potato
4x crosses (between heterozygous parents)
• Mixed segregation patterns• Simplex and duplex segregation patterns
2x crosses• Testcross, F2, tri-allelic ratios• Doubled monoploid cross
– Segregation in one parent - testcross
Mappeddisease resistance
genes and QTL
Potato: Ultra-high density (UHD) genetic map
•70,344 unigenes
•150 SSRs (1000 more in ESTs)
•10,000+ AFLP markers (from 381 primer combinations)
• ordered on 136 progeny of diploid RH x SH cross
• most dense genetic map in any organism
https://gabi.rzpd.de/projects/Pomamo/Walter De Jong, Cornell University
The level of resolution needed depends on intended application
Combining QTLs between lines within a segregating population.Moving chromosome segments within heterotic pools Make germplasm wide inferences/claims about a particular chromosome segment (assuming IBD).Moving chromosome segments across heterotic pools of elite germplasm.Introgression of special trait (e.g. disease resistance) from exotic germplasm.QTL cloning
8
Incr
ease
d ne
ed fo
r res
olut
ion
Type of population depends on resolution needed
Comparison of resolution and research time for various approaches to dissect quantitative variation. The research times assume the target species has only two generations per year. NIL, near-isogenic line; RIL, recombinant inbred line
9Buckler and Thornsberry (2002)
Linkage
Linkage• Association of two or more loci on a chromosome with limited
recombination
Linkage Disequilibrium or Gametic Phase Disequilibrium• Non random association of alleles at two or more loci not
necessarily on the same chromosome• Measures co-segregation of alleles in a population• Mendel’s pea traits – showed complete linkage equilibrium and
hence independent assortment• Can arise from intermixture of populations with different gene
frequencies• Can also be produced or maintained by selection favoring one
combination of alleles over the other – e.g. selection for yield in a breeding population
Falconer and McKay (1996)10
Linkage disequilibrium
11
Linkage Disequilibrium Decay
12
r = Recombination Rate between two locir = 0.5 = Two loci are unlinkedr = 0 = Two loci are completely linked and do not independently assortFalconer and Mackay (1996)
What are QTL?
Genes are regions of DNA controlling a trait or phenotype
Quantitative Trait Loci (QTL) are regions of DNA associated with quantitative traits
• They can be defined using molecular markers that highlight specific DNA segments or genes
This can change how we manipulate these traits in breeding
A trait is quantitative/polygenic if its inheritance is controlled by two or more genes and their interaction with environment
1- A trait having a continuous distribution
2- Controlled by the interaction of many genes
Cont…
A trait is quantitative/polygenic if its inheritance is controlled by two or more genes and the their interaction with environment
Examples• Yield• Lycopene content in tomato• Early Blight (EB) resistance
3- Influenced by the environment
Trait Distribution in the Progeny
If we could observe directly the QTL we could see the 3 underlying trait distributions
Trait distribution in the F2 progeny
Modeling a Marker Locus and a Linked QTL
Single Marker Analysis modeling, assume:• One QTL locus A, • One Marker locus M,
Measure of association between A and M• Self pollinated crops:
compare AA to aa (no dominance effect)
• Cross Pollinated crop: compare AAT to aAT (dominance and additive effects are
confounded)
17
Assigned values of the genotypes at A and M
Let δ=d, d/2, and 0 for cloned, selfed, and tescrossed progenies, respectively.
18
Genotyping PhenotypingIndividual
GenotypedCloned Progeny
Selfed Progeny
Testcrossed Progeny
AA AA => a AA => a AT => aAa Aa => d 1AA:2Aa:1aa => d/2 ½ AT+ ½ aT => 0
aa aa => -a aa => -a aT => -a
Need to collect good data: Good phenotypes are important
ripeningin
fect
ion
Ann Powell UC Davis
What information do we need?
Information pertinent to the association between marker genotypes and phenotypic values.• Total number of individuals genotyped, N (and
number in each class, nMM, nMm, and nmm)• Values of a and δ.• Values of residual genetic variance and error
variance
20
Detection of a QTL
Detection of linkage between A and M, will occur only if both are segregating in the population and if they are physically linked;We cannot extrapolate results from one mapping population to the rest of our germplasm without considering disequilibrium.
21
Do we have disequilibrium between a linked marker and a QTL?
We cannot really assess very well the extent of gametic phase disequilibrium between loci in the population without very extensive mapping studies.
With SNPs we can look at marker-marker associations for a much smaller cost: we just need to genotype our germplasm.
22
The marker density needed depends on the population
23
( )
0.0
0.5
1.0
1.5
2.0
2.5
0 20 40 60 80 100
Distance of Marker from QTL (in cM)
(MM
-mm
)/a *
DH or F2 or 2xBC1F3F4IRI (n# gen RM=0)IRI (n# gen RM=2)IRI (n# gen RM=5)IRI (n# gen RM=10)IRI (n# gen RM=25)
Pow
er o
f det
ectin
g a
QTL
34.7 cM
2.4 cM
TG670
SSR10511CT6213SSR26614SSR192 TG12515SSR95 CT2012721
SSR31634CT14935CT20134I CT10725I41CT20268I42SSR13446CT10975I51TG27354CT10030I.258CT2011659LEOH106 TG5960CT10629 CT10811CT10945 LEVCOH1265LEVCOH1166CT19169SSR973SSR308 TG46581LEOH22286SSR42 TG260TG24587SSR3796CT1025998TG255103SSR65 SSR582SSR288 TG580113CT10126I115
Chr.1
CT105350LEOH3423LEOH342n7TG60812CT20522CT10682I29CT1019030TG16531SSR6633SSR9640CT1064942CT1092344TG1446SSR5 SSR60547CT10771 CT1015348CT1080151CT10279I55CT24457SSR3259LEOH34860SSR59861SSR2664TG46968LEOH11370TG64572TG33777TG53784TG16790LEOH319 LEOH17491TG15192TG154100
Chr.2
TG214 SSR6010SSR14 LEOH1271SSR3202CT10690I3CT20050 CT10772I4CT106786CT10042I13CT1045017CT8519CT10437I22TG24628LEOH11031LEOH18533TG12934CT10689I36CT10480I38SSR11144CD5145CT20195 CT10402I46CT20037 CT1043747CT10736 CT8250CT2002351LEOH22354CT1050655TG52059CT14169TG13B77TG13086TG11492
Chr.3
CT109520
SSR29610
TG1522CT10255I SSR4325LEOH36126SSR431127TG48337CT2014541CT1018447CT1032253SSR31056SSR450 CT10485CT157 SSR30657CT1080959CT1021564LEOH101 CT17868CT2002870CT19474TG16377CT1088878CT10184I79CT1013684CT1055685TG50090CT5092
CT10375121
Chr.4
CT1010CT102384TG4419CT167 SSR11514CT1003615
CT9330TG9637CT1096341CT10373I TG619CT10151I42CT10765I44CT20210I45CT10526 LEOH6346TG100A47CT11855LEOH19258LEOH31667CT1059176SSR16283SSR109 TG18584
Chr.5
CT2160CT102426CT10242I7CT1018711TG59017
CT10328I27SSR12830
TG35639LEOH24341
TG36549TG25355LEOH14657LEOH20958LEOH20061LEOH11266
CT20674TG31478
Chr.6
TG3420CT200173CT5210SSR28611L21J7a18LEOH10422SSR27627CD5728
TG18338TG17441CT1013845CD5446
CT1097455
SSR4564TG2070CT1003973LEOH22174TG49980
Chr.7
CT10152I0CT103961LEOH704
TG17616LEOH12319LEOH147 CT47CT1019220CT1001521CT92 SSR32722CT1016226TG34931TG30239SSR33544SSR6350TG33051SSR3853CT10367I62
CT26570
CT6882
Chr.8
GP390
TG189
CT14323SSR68 SSR7030CT2015931CT1000436
TG29149CT1002450SSR11056LEOH14458TG55160
CT7471LEOH11772LEOH17074TG42181
SSR333 TG328100
Chr.9
CT100820CT10082I1TG122 CT166CT1067013SSR3416SSR59617CT23419CT10105I30CT10464I31SSR31835CT1136CT1055440CT20342CT10419I43CT10078I CT1070146CT10386I CT1038653
TG40377
TG23388LEOH336 TG6389SSR22391
Chr.10
CT10683I0TG4974LEOH1765
SSR8017TG50822SSR7627
CT10120 CT20244I34
TG147 CT10781CT1091547TG38451CT10737I53CT10615I59
CT2018168TG54672TG3678
TG39389CT1002790
Chr.11
TG1800
TG689
CT21126CT10953I29
TG36038SSR2041TG56547TG11153
LEOH6664LEOH30167
CT156 CT10329I78CT1077879CT10796I LEOH27580LEOH19787CT27688
Chr.12
Tomato SNP and Indel Map
Matthew Robbins, Ohio State University
Potato SNP map
Potentially 7600 markers• Candidate genes and randomly distributed
High SNP polymorphism ratePotentially over 600 markers per chromosomeWe need over 3000 random markers to saturate the 4x genome• Bradshaw et al 2008
We need less than 1000 markers to saturate the 2x genome
Sample (population) sizeσ2 = error σ2 + residual genetic σ2
“error σ2 ” and “residual genetic σ2 ” are based on overall mean for each entry.Increasing the number of field replications will reduce error σ2 but not the between-line genetic σ2.To further reduce the denominator in the t-test. We need to add more genetic entries.
Typical split plot: – Sub-plot error: plot-level error σ2
– Plot error: genetic σ2 + (plot-level error σ2)/(rep number)
26
Statistical power
27
h2=0.10
h2=0.05
Hu and Shu 2008
28
Most popular QTL analysis methods for MBA
Single-Marker Analysis (SMA)Interval Mapping (IM)
Simple Interval Mapping (SIM)MapMaker/QTLQGENER/QTLMapQTL5
Composite Interval Mapping (CIM)QTL CartographerR/QTLMapQTL5 QTL Network 2.0
Multiple Interval Mapping (MIM)WinQTL cartographer
SMA is being used less nowadays due to its disadvantages and availability of more advanced statistical methods
Advantage:Simplest approach for detecting QTLsQuick and less memory intensive
Disadvantages:May not detect loosely linked QTLsQTL effects are often underestimatedDoes not determine the distance between marker and QTLDoes not determine the direction of QTL from marker
SIM was one the most commonly used methodology in QTL mapping
Advantage:The likelihood map represents the position of the QTL at various
points of the genome
The probable position of the QTL is given by support intervals
Requires less progenies than the SMA
Disadvantages:The number of QTL cannot be resolved
The locations of the QTL are sometimes not well resolved, the exact positions of the QTL cannot be determined
The statistical power is still relatively low
CIM has at least four advantages over SMA and SIM
Advantages:By focusing on one genome region, the multidimensional search is reduced to one-dimensional searchThe resolution of QTL locations obtained is much higher than SMA and IMThere are more variables in the model, making the model more efficientMarkers can be used as boundary conditions to narrow down the most likely QTL position
Comparison of SMA, SIM and CIM for EB resistance in tomato
SMA
SIM
CIM
EB Resistant EB Susceptible
Solanum lycopersicum X S. habrochaites
~850 BC1Plants
F1 X S. lycopersicum
BC1
(EB Susceptible) (EB Resistant)
Trait-based analysis (Selective Genotyping)
Zhang et al 2003 Mol. Breed. 12: 3-19 Early blight resistance
Foolad et al 1997 Mol. Breed. 3: 269-277 Salt tolerance
QTL mapping step by step: A practical example A raw file is used for the genetic map construction
QTL mapping step by step: A practical example A genetic map is constructed
QTL mapping step by step: A practical example Phenotypic data is collected and incorporated into the file
QTL mapping step by step: A practical example A map file is then created for QTL analysis
Key points of QTL mapping
QTL’s mapped in a bi-parental cross may not be appropriate for MAS in all populations• Marker allele and trait may not be linked in all populations.• Genetic background effects may be population specific.• Original association may be spurious.• QTL detection is dependent on magnitude of the difference
between alleles and the variance within marker classes.
Adding more genotypes is more efficient than replicating the same genotypes (provided that a minimum number of environments are sampled)h2 and size of effect important Results are trait specific
39
SolCAP Mapping PopulationsGermplasm panel• Over 300 genotypes• Over 20 programs represented
– Important advanced breeding lines– Current and Historical varieties– Parents of mapping populations– Important genetic stocks
• Phenotyping at NY, WI and OR of core traits– Evaluation of specific gravity, glucose and sucrose, chip
color, skin type, shape, vine maturity, tuber number, vitamin C, internal defects, etc.
– Accessible through the SolCAP and Solanaceae Genome Network (SGN) websites.
• Association mapping• Develop linkage hypotheses• Phenotyping of other traits
SolCAP Mapping Populations
4x population• Rio Grande X Premier Russet• 200 progeny• 7600 SNPs• 3 environments
– North Carolina, Minnesota, Idaho• Identify QTL for core traits (CHO, Vit C)
Other populations from the community• Core trait validation• Other traits of regional importance
SolCAP populations
2x population• DM1-3-516 44 x 84SD22 • 120 progeny• Testcross segregation• Map SNPs• Anchor scaffolds to the genetic map
– Link mapping to PGSC• Map core traits
Other populations from the community
Approach of MAS in Tomato
•Pathway approach for candidate gene identification and introduction to metabolic pathway databases.
•Identification of polymorphisms in data-based sequences
•Approach of potato breeding for the future
Dist MarkercM Name
CT233
17.1
TG672.1LEOH36
20.0
TG1252.0CT62
24.7
CT1497.2
LEOH17*
12.8
TG2739.3
TG599.9
CT191
10.7TG465
5.8TG2601.1LEOH7
15.1
TG25510.1
TG580
Chr 1
Dist MarkercM Name
TG608
13.9
CT205
10.7TG1653.8 LEOH23
10.9TG14
18.4
CT2446.1
TG4695.2
TG645
10.6TG537
7.0TG1673.1TG151
10.2TG154
Chr 2
Dist MarkercM Name
TG15
12.4
TG483
23.6
CT157
18.5
LEOH377.7
CT1786.5
CT194
16.5
CT501.6TG500
13.6
TG1630.0LEOH10
Chr 4
Dist MarkercM Name
CT1017.9
TG4413.6CT167
18.5
CT933.0LEOH163.0TG96
12.5
TG100A
19.5
CT118
28.3
TG185
Chr 5
IL1-
1
IL4-
3IL
4-4
IL1-
2IL
1-3
Chr 3
IL2-
4
LEOH15*
IL4-
1
LEOH17*
IL5-
2
LEOH17*
Dist MarkercM Name
TG1145.2
TG130
18.5
CT141
13.4
TG520
23.5
CT829.1
CD5110.1
TG1295.7
TG2468.2
CT85
18.2
TG214
IL3-
2
LEOH17*
IL3-
1
LEOH15*
IL3-
3
LEOH17*
LEOH17*
LEOH36
LEOH10
LEOH37
QTL Trait Origin2 L, YSD S. lyc.4 YSD S. lyc.6 L, Hue ogc
7 L, Hue S. hab.11 L, Hue S. lyc.
Audrey Darrigues, Eileen Kabelka
Example: QTL for color uniformity in elite crosses
Comments on SGN databases:
•There is a wealth of information organized and available.
•Information is not always complete/up to date.
•Display is not always optimal, and several steps may be needed to go from pathway > gene > potential marker.
•Sequence data has error associated with it. eSNPs are not the same as validated markers.
•SolCAP will be asking for feedback on how best to improve the SGN database and access via the Breeders Portal
Current SGN interfaces are aimed at the molecular biologist, with searches designed to facilitate molecular discovery• Need a portal that is trait and germplasm centered
Ability to search by traits relevant to (and defined by) breeders
When a marker is identified, a protocol for use in breeding will be provided.
Search option that only yields polymorphic marker, phenotype, or QTL results from elite germplasm
Ability to search for known parents or offspring of any given genotype
Ability to generate a list of markers that are polymorphic between any two parents
Detailed tutorial/definitions of terms and traits utilized within the database
Developing “Breeder Friendly” Tools
http://www.tomatomap.net
http://sgn.cornell.edu/
Ovate
•Missing data in SGN•Limited ability to generate tables, PCR conditions sometimes incomplete, Enzyme sometimes missing, SNP not described.
•Missing data in Tomatomap.net•SNP and sequence context requires BMC genomics supplemental table , ASPE primers, GoldenGate primers.
•2007. BMC Genomics 8:465 www.biomedcentral.com/content/pdf/1471-2164-8-465.pdf
Databases
Genotyping the core collections will impact breeding.
• Potential translational approaches: 1) introgression from other populations (domesticated or wild) 2) selection for coupling phase recombinants to establish linkage
blocks of favorable alleles (e.g. disease resistance loci)3) population development designed to maximize variation w/in
market classes4) association approaches5) whole genome approaches
• Other translational strategies will emerge under other CAPs or through innovation in public research.
Additional Thoughts from Tomato:
•Marker resources are currently not sufficient for QTL discovery in bi-parental or Association Mapping populations; in the future they should be.
•The best time to use genetic markers : early generation selection
•Restructuring of breeding program to integrate markers may include:
• 1) Increasing genotypic replication (population size) at the expense of replication (consider augmented designs).• 2) Collecting objective data.
Marker Assisted Selection in Potato
Limited number of markers available• PVY, RB, GN, VW useful• Invertase, sucrose synthase and LB QTL not validated in our
germplasm• More markers and QTL to validate from Li, et al. 2008 and Bradshaw
et al. 2008Some are PCR-based• Need for marker conversion• Use of high-throughput DNA isolation methodsApply at early generation selection phase• Seedling stage• After year 1 or 2 selection Need markers linked to core traits (e.g. glucose and starch)
Address regional, individual program and emerging needs within the Solanaceae community through a small grants program.
Six key areas we intend to allocate resources to are:
– Genotyping mapping populations in the core facilities requested by the greater breeding community
– Marker conversion – developing SNP markers linked to QTL into easily assayed (e.g. CAPs or dCAPs) markers that end-users can readily apply in their own research programs
– QTL validation and MAS
– Population development to address emerging needs
– Extension or education special projects
– New directions not envisioned at the time of proposal submission.
Small Grants Program
References
Dudley, J.W. and R.J. Lambert. 1992. Maydica 37:81-87.Falconer and McKay. 1996. Introduction to quantitative genetics. 4th ed. Longman group LTD. Essex, UK. pp 464.Laurie, C.C. et al. 2004. Genetics 168:2141-2155.Liu, B.H. 1998. Statistical Genomics. Linkage, mapping and QTL analysis.CRC press LLC. FL, USA. 611p.Schön, C.C. et al. 2004. Genetics 167: 485-498Tanksley S.D. et al. 1996. Theor. Appl. Genet. 92:213-224.Walsh, B. and Lynch, M. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates Inc. MA, USA. 980p.Winkler, C.R. et al. 2003. Genetics 164:741-745.Xiao, J. et al. 1998. Genetics 150:899-909. Kaepler, 1997. TAG 95:618-621.Frisch, et al., 1999. Crop Science 39: 1295-1301.Knapp and Bridges, 1990. Genetics 126: 769-777.Yu et al., 2006. Nature Genetics 38:203-308.Van Deynze et al., 2007. BMC Genomics 8:465www.biomedcentral.com/content/pdf/1471-2164-8-465.pdf
55