
Quantitative Genetics in the Age of Genomics


Theoretical Population Biology 59, 175-184 (2001)

MINIREVIEW

Quantitative Genetics in the Age of Genomics

Bruce Walsh
Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
E-mail: jbwalsh@u.arizona.edu

Received November 2, 2000

C. elegans Genome Project ... $100 million
Drosophila Genome Project ... $1 billion
Human Genome Project ... $10 billion
Working knowledge of multivariate statistics ... Priceless

T-shirt designed by Mike Wade for the Evolution 2000 meetings

Mike Lynch and I recently attempted to summarize the current state of quantitative genetics (Lynch and Walsh, 1998; reviewed for this journal by Baston, 2000). Already, parts of our treatment are somewhat dated due to the explosive growth and refinement of methods for mapping loci underlying complex traits (QTLs, for quantitative trait loci). While such growth is bad news for textbook writers trying to stay current, it is also the hallmark of a scientifically healthy and active field. As the age of genomics arrives, continued dramatic changes in the field of quantitative genetics are expected. The faster/cheaper trend for sequencing and genotyping that fuels genomics has a parallel trend in computing, and this will also have a significant impact on the field. Finally, a variety of other emerging biotechnologies in such areas as reproductive biology (whole-organism cloning, embryo transplantation) and recombinant DNA (transgenic organisms) will also have important consequences.

Given the availability of these new biological and computational tools, what will quantitative genetics become? In particular, if we wrote a second edition in (say) 2020, how much will the field have changed? Ultimately, any attempt to predict the long-term directions of a field results in a paper that future workers will find both somewhat amusing and hopelessly misguided. This caveat being stated for the record, let us press on.

doi:10.1006/tpbi.2001.1512, available online at http://www.idealibrary.com
0040-5809/01 $35.00. Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved.

The machinery of quantitative genetics is widely applied in such diverse fields as human genetics, evolution, and breeding. While the broad goals of these different users are the same (determining how genetic and environmental factors contribute to the observed variance, either within or between populations, of particular traits), their specific goals are different. Human geneticists are concerned with finding disease-susceptible genotypes and their associated risk factors (such as particular genotype × environment interactions). Plant and animal breeders are concerned with maximizing selection response and stability. Evolutionary geneticists are concerned with the genetic architecture of particular traits and inferring both their past evolutionary history and their potential for future evolution. Classical quantitative genetics has admirably served these different users in the past. How will it fare in the future and how will it be different?

CLASSICAL VS NEOCLASSICAL QUANTITATIVE GENETICS

The strength of classical (Fisherian) quantitative genetics is that one can use the variance component machinery developed by Fisher (1918) to analyze essentially any character in any organism. All that is needed is a collection of known relatives (often not a trivial task to obtain) and some very general knowledge about the genetic and mating systems (such as ploidy level and whether the species is an outcrosser or a selfer). Under classical quantitative genetics, we observe only the phenotypic value z of an individual, which we regard as the sum of an unseen genetic (g) and environment (e) value,

z = g + e    (1)

Using the covariance structure for the phenotypic resemblances between sets of known relatives, we can decompose the genotypic value into further components (e.g., g = a + d + i, an additive, dominance, and interaction term) and subsequently estimate the variance associated with each of these components. The flexibility inherent in classical quantitative genetics arises because (for many problems) knowledge of just the variance components is sufficient, with any further knowledge of the fine genetic details being largely irrelevant. For example, variance components are sufficient to predict the short-term response to selection and to estimate the increase in disease risk for different sets of relatives. This focus on variance components has left most of my molecular colleagues with an uneasy feeling about quantitative genetics, as the notion of having one's experimental analysis be heavily dependent on statistics has been anathema to many molecular biologists (indeed, I had a colleague who cheerily informed students in my undergraduate class that "if you need statistics to analyze an experiment, you have not designed it correctly"). This is most unfortunate because if we indeed knew all the fine details (for example, the values of all relevant genotypes and their population frequencies), in many cases we would first translate these into genetic variance components before using this information to make population-level predictions. This classical world of quantitative genetics where genotypes are unknown is dominated by random effects models, where any particular genotype (and its associated genotypic value g) is assumed to be a random draw from a population and our interest is in estimating the variance components of g.
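As a concrete sketch of this variance-component logic, the simulation below (all numbers hypothetical) generates phenotypes under the purely additive model z = g + e and then recovers the heritability from the classical regression of offspring phenotype on mid-parent phenotype, whose slope equals h² = V_A/V_P:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration: simulate z = g + e for parents and offspring
# under a purely additive model, then recover h^2 from the regression of
# offspring phenotype on mid-parent phenotype.
n_fam = 20_000
V_A, V_E = 0.4, 0.6            # assumed additive and environmental variances

g_sire = rng.normal(0, np.sqrt(V_A), n_fam)
g_dam = rng.normal(0, np.sqrt(V_A), n_fam)
# Offspring breeding value: mid-parent mean plus a Mendelian segregation
# term with variance V_A/2 (the infinitesimal-model segregation variance).
g_off = 0.5 * (g_sire + g_dam) + rng.normal(0, np.sqrt(V_A / 2), n_fam)

z_sire = g_sire + rng.normal(0, np.sqrt(V_E), n_fam)
z_dam = g_dam + rng.normal(0, np.sqrt(V_E), n_fam)
z_off = g_off + rng.normal(0, np.sqrt(V_E), n_fam)

midparent = 0.5 * (z_sire + z_dam)
slope = np.cov(midparent, z_off)[0, 1] / np.var(midparent)
print(f"estimated h^2 = {slope:.2f}  (true {V_A / (V_A + V_E):.2f})")
```

The point is exactly the one made above: the slope depends only on the variance components, not on which (or how many) loci generate V_A.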

In the emerging, more generalized framework for quantitative genetics, at least some genotypes at loci influencing trait variation are assumed known. Under this new framework (which I will refer to as neoclassical quantitative genetics), we know both the phenotype z and the multilocus genotype m for specific genes of interest. The recorded genotypes are either actual segregating sites contributing to character variance or markers tightly linked to these sites. If Gm is the genotypic value associated with this genotype, then the simplest model is

z = Gm + g + e    (2)

We can estimate the genotypic value Gm associated with m (if unknown) by treating it as a standard fixed effect. Since Gm likely depends on the distribution of background genotypes and environments in the population of interest, we expect its value to change (and thus have to be re-estimated) as the population of interest changes. The genotypic value contributed by unmeasured genes is given by g, whose variance components are estimated in the standard Fisherian framework. This basic model can easily be expanded to accommodate known-genotype × background-genotype epistatic interactions by adding a Gm × g term. Since g is a random effect, this term is also random, and we can estimate its associated variance (provided the genotype m is sufficiently frequent in our sample) again by standard Fisherian approaches. Similar modifications are also easily envisioned for other potential interactions between Gm, g, and e.
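A minimal sketch of Eq. (2), with all effect sizes and frequencies invented for illustration: one scored biallelic locus is fitted as a fixed effect (here simply as within-genotype-class means), leaving the unmeasured polygenic value g and the environment e in the residual:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sketch of z = Gm + g + e: one measured biallelic locus
# (genotype = 0, 1, or 2 copies of an allele) treated as a fixed effect,
# with the unmeasured polygenic value g and environment e left as residual.
n = 50_000
p = 0.3                                   # assumed allele frequency
m = rng.binomial(2, p, n)                 # measured genotype per individual
alpha = 0.5                               # assumed additive allelic effect
G_m = alpha * m                           # genotypic value of the scored locus
g = rng.normal(0, np.sqrt(0.3), n)        # unmeasured background genes
e = rng.normal(0, np.sqrt(0.5), n)
z = G_m + g + e

# Fixed-effect estimates: mean phenotype within each genotype class.
G_hat = np.array([z[m == k].mean() for k in (0, 1, 2)])
resid_var = np.var(z - G_hat[m])          # estimates Var(g) + Var(e) = 0.8
print(G_hat, resid_var)
```

The residual variance recovered here is exactly the quantity the Fisherian machinery then partitions further; re-estimating G_hat in a new population, as the text cautions, would generally give different values.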

The view of many of my molecular colleagues is that if the appropriate set of genes are known, then Gm accounts for a very significant fraction of the total genetic variance, and hence quantitative genetics (the contribution g) just deals with trivial residual variation. This presupposes that Gm either is known without error or has a very small associated error variance. It further supposes the stability of Gm across individuals. In fact, Gm is the expected value for the genotype m over the distribution of background genotypes and environments. If the loci scored for m have either significant epistatic interactions with the background genes and/or genotype-environment interactions, this can generate significant variance in Gm around its expected value (most of this error would be incorporated into e, although contributions from additive epistasis would be incorporated into g).

These important caveats aside, my molecular colleagues are partly correct in their assertion. Suppose the phenotypic trait of interest has not yet been displayed in an individual. For example, we are trying to determine the probability that an individual will get a late-onset disease, so that z is eventually coded as either 0 (disease free) or 1 (diseased). In the classical framework, we would use information on the disease status of relatives of this individual coupled with estimates of variance components to estimate the risk. In the neoclassical framework, we could look directly at the genotype m for candidate disease risk genes and use this to predict z. Clearly, if most of the genetic variation is accounted for by Gm, then inclusion of g (via variance components) will have little effect on our prediction. For many problems of interest, however, even if we were able to identify all the relevant genes underlying a trait, we would still not know the genotype of particular individuals. For example, suppose we are trying to predict the phenotypic values in the offspring from two parents, as would occur if we are trying to predict response to selection or to inform a couple about the disease risk their offspring face. Once such individuals are realized, we can genotype them, but before their realization, we are strongly in the realm of quantitative genetics. Let mi and mj be the (potentially multilocus) genotypes of the two parents. Given these known genotypes, the expected frequencies of gametes for each are easily computed (even when loci are linked), and the resulting distribution of genotypic values (over the known loci) in their offspring is given by

G(xi, xj) p(xi | mi) p(xj | mj)    (3)

where G(x, y) is the function giving the resulting genotypic value for an individual formed by gametes of types x and y, and p(x | m) is the distribution of gametes of type x given a parent of type m. Under the classical framework, it is generally assumed that genotypic values are normally distributed and hence the offspring distribution is completely specified by the mean and variance (mean vector and covariance matrix in the multivariate case). Assuming the infinitesimal model, the mean is the average breeding value of the parents and the segregation variance is a constant value, independent of the parents.

When genotypes are indeed known, the segregation kernel from any given family (Eq. 3) allows for distributions that depart significantly from normality. In such cases, variance components alone are not sufficient to describe the complete distribution. Note from Eq. (3) that the segregation variance (the offspring variance of G around the offspring mean) is clearly a function of the parental genotypes. With known genotypes, the segregation variance is heteroscedastic (taking different values for different parental genotypes), in contrast to classical quantitative genetics, which assumes a homoscedastic segregation variance (a constant value, independent of parental genotypes). This is obviously a point to be exploited by breeders, with some families having a smaller offspring variance, and hence being more predictable, than others. Likewise, families with larger than average segregation variance are of importance to breeders trying to select extreme individuals.
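A small sketch of the segregation kernel of Eq. (3), assuming (purely for illustration) two unlinked biallelic loci with equal, additive effects; a parental genotype is written as a pair of haplotypes, and gametes are enumerated with their Mendelian probabilities:

```python
import itertools

# Sketch of Eq. (3) for two unlinked biallelic loci with equal, purely
# additive effects (an assumption made only for this illustration).

def gametes(genotype):
    """All gametes p(x | m) for unlinked loci; genotype = (hap1, hap2)."""
    hap1, hap2 = genotype
    out = {}
    for picks in itertools.product(*zip(hap1, hap2)):
        out[picks] = out.get(picks, 0.0) + 1.0 / 2 ** len(hap1)
    return out

def offspring_G(par_i, par_j, effect=1.0):
    """Distribution of offspring genotypic values G(x_i, x_j)."""
    dist = {}
    for xi, pi in gametes(par_i).items():
        for xj, pj in gametes(par_j).items():
            G = effect * (sum(xi) + sum(xj))   # additive: count '1' alleles
            dist[G] = dist.get(G, 0.0) + pi * pj
    return dist

def seg_var(dist):
    mean = sum(G * p for G, p in dist.items())
    return sum(p * (G - mean) ** 2 for G, p in dist.items())

# Family A: both parents doubly heterozygous (haplotypes 11 / 00).
# Family B: both parents homozygous at locus 1, heterozygous at locus 2.
famA = offspring_G(((1, 1), (0, 0)), ((1, 1), (0, 0)))
famB = offspring_G(((1, 1), (1, 0)), ((1, 1), (1, 0)))
print(seg_var(famA), seg_var(famB))
```

The doubly heterozygous parents (family A) produce twice the segregation variance of the singly heterozygous parents (family B): exactly the heteroscedasticity that the classical homoscedastic assumption suppresses.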

Moving from this fictitious world, where all relevant genotypes are known, to the real world where (at best) only a subset are known, quantitative genetics further increases in importance. As mentioned, the genotypic values for particular loci are potentially functions of the background genotypes and environments (when epistasis and/or genotype-environment interactions are present). Furthermore, knowledge of important genotypes is expected to be fleeting. Mutation will generate new QTLs, and a candidate locus that works well in one population may be a very poor predictor (at best) in another (a particularly interesting example is given by Winkelman and Hodgetts, 1992, who examined candidate locus associations in a mouse line before and after selection). Indeed, if even a modest number of QTLs influence a trait, then (apart from clones) each individual is essentially unique in terms of its relevant genotypes and the particular environmental effects it has experienced. If epistasis and/or genotype-environment interactions are significant, any particular genotype may be a good, but not exceptional, predictor of phenotype. Quantitative genetics provides the machinery necessary for managing all this uncertainty in the face of some knowledge of important genotypes. Variance components allow one to quantify just how much of the variation is accounted for by the known genotypes. A critical feature of quantitative genetics is that it allows for the proper accounting of correlations between relatives in the unmeasured genetic values (g).

Ironically, with the age of genomics we are now at the point where it may be much easier to extract and sequence potential candidate genes from an organism of interest than it is to obtain a collection of known relatives. Under neoclassical quantitative genetics, we are potentially less dependent on known sets of relatives, replacing them to some extent with known genotypes at particular candidate loci. We can also use marker information to make inferences about the degree of relatedness between sampled individuals, and this information can be used in a classical quantitative genetics framework to estimate the associated variance components. The downside to this approach is that the estimation error for degree of relatedness rapidly increases once we pass beyond first- and second-degree relatives.
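A sketch of this marker-based relatedness idea, using the standard correlation-type estimator averaged over loci (simulated unlinked SNPs with known allele frequencies; all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sketch: estimating pairwise relatedness from markers alone,
# via the standard genomic-relationship (correlation) estimator
#   mean over loci of (x_i - 2p)(x_j - 2p) / (2 p (1 - p)).
n_loci = 5000
p = rng.uniform(0.1, 0.9, n_loci)         # assumed known allele frequencies

def child_of(mother, father):
    """One gamete from each parent at unlinked loci (genotypes in {0,1,2})."""
    return rng.binomial(1, mother / 2) + rng.binomial(1, father / 2)

mom = rng.binomial(2, p)
dad = rng.binomial(2, p)
kid1, kid2 = child_of(mom, dad), child_of(mom, dad)
stranger = rng.binomial(2, p)

def relatedness(x, y):
    return np.mean((x - 2 * p) * (y - 2 * p) / (2 * p * (1 - p)))

print(relatedness(mom, kid1))       # parent-offspring: expected 0.5
print(relatedness(kid1, kid2))      # full sibs:        expected 0.5
print(relatedness(kid1, stranger))  # unrelated:        expected 0.0
```

With a few thousand markers the first- and second-degree coefficients come out cleanly; the sampling noise visible even here is what makes more distant relationships, as noted above, hard to estimate.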

GENOMICS, MODEL SYSTEMS, AND CANDIDATE LOCI

Much of the above discussion of a more generalized view of quantitative genetics has assumed that we know the genotypes (and their effects) at a number of QTLs. Given that very few QTLs from natural or human populations have been fully isolated, we are still very far from achieving this goal. An often stated hope is that genomics projects will provide candidate loci, as candidate-trait association tests are inherently more powerful than marker-based whole-genome scans for QTLs. Even with a list of such potential candidates in hand, one is still faced with the difficult task of first testing for candidate genotype-trait associations, and then demonstrating that any such associations are not sampling artifacts. We will examine these two issues in turn: the impact of genomic projects on sharpening the search for potential candidates, and the subsequent tests for association.

While a variety of tools are currently available to search for candidates, at present they are restricted by economic (rather than biological) constraints to important traits in just a few organisms. One of the major trends expected over the next decade will be to make these tools economically feasible for just about any organism of interest. Clearly, the trend toward faster and cheaper genotyping and sequencing will continue, as will the continued expansion of newer technologies such as expression chips. Simply extrapolating this trend out (a dangerous thing to do, for past history suggests we will likely underestimate it), genomic projects for one's favorite organisms become a reality in the immediate future. With a genomic sequence in hand, the ability to score individuals rapidly (and cheaply) at a very large number of loci also becomes quite feasible.

What are the basic off-the-shelf genomic tools currently available in model systems (e.g., Drosophila, yeast, C. elegans, and humans)? Perhaps the most useful single tools for quantitative geneticists are dense marker maps, usually constructed from single nucleotide polymorphisms (SNPs) and/or microsatellites. It is these maps that allow for QTL mapping, association studies, and estimation of the relatedness between individuals in a random sample from the population. Complete genomic sequences offer the hope of fishing for candidate genes simply based on sequence information (we will have more to say about this shortly). With a complete genome sequence in hand, one can construct any number of DNA chips: microarrays of a large number of chosen DNA sequences for looking at gene expression in particular tissues (expression array analysis), probing a related genome for homologous genes of interest, and many other interesting possibilities we are only beginning to consider. To get somewhat of a feeling for the potential of such chips, the technology now exists to choose a very large number of specific DNA sequences (for example, 10,000 genes of interest in humans) and simply have a computer pull the sequences out of a database and print a custom chip optimized for our particular problem using what amounts to fairly standard ink-jet printer technology. With such a chip in hand one can examine the tissue- and developmental-specific patterns of expression in genes of interest, as well as examine their expression in the tissues of related species. Chips can also be used to rapidly genotype individuals at literally tens of thousands of SNP sites.

Besides faster and cheaper sequencing, a major factor facilitating future genomic projects is the ability to use sequence homology to bootstrap from a model system to a related species. For example, a fraction of microsatellite markers found in Drosophila may extend over to (say) a particular moth of interest. Likewise, a DNA chip from a model organism can be used to extract homologous genes of interest, and subtraction methods allow the unique genomic sequences of one organism (relative to a target model system) to be extracted (or at least enriched). There is thus a level of acceleration with genomics, in that, as more organisms are investigated, the phylogenetic space of model systems becomes increasingly dense, further facilitating genomic projects by providing systems with even closer sequence homology. For example, using results from Drosophila, one lab may obtain genomic sequences from the silkworm (Bombyx mori) and another from the corn earworm (Helicoverpa zea). With both these lepidopteran sequences in hand, workers using other moth species can use these more closely related species in place of Drosophila as their model system benchmarks. The World Wide Web greatly facilitates rapid exchange of this growing web of genomic information.

In the immediate future, it is thus clear that large amounts of genomic sequence can be obtained for just about any organism of interest. The impact this will have on quantitative-genetic studies in these (and related) organisms is twofold. First, obvious candidates for microsatellite markers can be chosen from sequence data. While scanning for population variation in potential markers is very straightforward, constructing the genetic map using these markers requires the ability to recover and score meiotic products. The ability to use the polymerase chain reaction (PCR) to sequence single haploid gametes allows such maps to be constructed even for long-lived, or otherwise difficult to breed, species. Likewise, methods for rapid detection of SNPs continue to be developed at a rapid pace. For example, the SNP Consortium (http://snp.cshl.org) announced in August 2000 that it had detected almost 300,000 human SNPs, while (not to be outdone) Celera (http://www.Celera.com) announced a month later that it had detected 2.4 million human SNPs.

The second impact from genomic sequences is the potential to intelligently suggest candidate genes for the character(s) of interest. This is not nearly as straightforward as using sequence data to find potential markers, and a variety of strategies have been suggested to find suitable candidates. The most obvious is to choose candidates based on homologies to genes known to be involved in trait variation in some well-studied model system. At the coarsest level, this could simply entail examining all genes that are known to have mutations that affect the trait of interest. For example, in Drosophila there are over 130 genes with known mutations affecting bristle number (T. Mackay, personal communication). Not only are these loci candidates for bristle number QTLs in Drosophila, but their homologues are bristle candidates for other insects as well. With such a collection of potential candidates in hand, one can do additional screening, for example by using microarrays to see which (if any) are indeed expressed in tissues that affect the character(s) of interest.

The next generation of strategies for candidate inclusion/rejection requires the ability to read a DNA sequence in much more biological detail than is currently possible. At present, we can translate any particular coding region of DNA into an amino acid sequence and, by looking for specific protein motifs, can make some general (and often somewhat intelligent) statements about its function (e.g., the resulting protein spans a membrane, or binds DNA, etc.). As more proteins are examined, the catalogue of structural motifs will grow, allowing us to make even more informed statements about protein function simply from its amino acid sequence. Results from such proteomic studies may suggest possible genes that otherwise would be overlooked as candidates. Equally important, they may allow other genes to be excluded as candidates. The third generation of candidate detection/exclusion strategies (which may be achieved in the next decade) will be deciphering the regulatory aspects of a DNA sequence, in particular being able to detect tissue- or developmental-specific regulatory signals. Such information will obviously be greatly informative for adding and removing genes as candidates. The most far-removed prospect is in understanding the gene circuits and pathways to the extent of predicting which genes are likely to have major effects on the flux through a pathway. If we can determine that a particular gene can have only a very minor effect on a pathway of interest, it may indeed be a QTL, but it is expected to account for very little trait variation.

The above collection of approaches, some fully developed, the others perhaps nothing more than pipe dreams, can suggest a set of candidates. Of course, we will have to tolerate inclusion of a large number of failed candidates to avoid exclusion of candidates that are indeed QTLs in the population of interest. One approach is to start with a collection of candidates, test for associations, and determine how much residual genetic variation remains after the successful candidates are included. If only a small fraction of genetic variation is accounted for, the search for additional candidates has the potential to be rewarding. The converse is not necessarily true. Even if the model based on the successful candidates accounts for most of the genetic variation, important alleles at other loci (having a large effect on the genotypic value, but a small effect on the variance because they are rare) can still be present.

The final point about candidates is that while much of the focus is on those genes contributing to within-population variation, genes contributing to between-population differences are equally (if not more) important to breeders and evolutionary geneticists. Sequencing projects offer the potential for rapidly scanning a number of candidates for such between-population differences. In populations where the character divergence is thought to have been at least partially driven by selection, one can search for signatures of selective sweeps around candidate genes (provided the divergence is sufficiently recent). Indeed, such an approach (using a cloned maize QTL) suggests that a single regulatory change in a QTL is responsible for some of the major differences in plant architecture between maize and its undomesticated ancestor teosinte (Wang et al., 1999).

While a number of clever strategies can be used to enrich the pool of candidate genes, it must be stressed that most candidates are likely to fail to show associations. Candidates influencing a character in one model system may have essentially no effect in a related species (some examples of this striking lack of correlation for mice vs humans are reviewed by Guo and Lange, 2000). Even when a significant correlation is found between candidate genotypes and trait values, it still must be demonstrated that this is a true association, as opposed to being an artifact of population stratification (subdivision). A classic example of such a stratification-induced correlation is that a marker highly associated with diabetes in Pima Native Americans is also a marker for admixture with the Caucasian population (which is a lower-risk population for diabetes). In a restricted sample looking only at full-heritage individuals, this association vanishes (Knowler et al., 1988). Association tests have been developed to control for admixture (e.g., Ewens and Spielman, 1995; Allison, 1997; Spielman and Ewens, 1998; Allison et al., 1999; Clayton, 1999; George et al., 1999; Monks and Kaplan, 2000). Unfortunately, these require collections of relatives. Ideally, by using sufficiently dense markers, SNP-trait associations can be detected in random samples from the entire population, using unlinked markers to control for population stratification. Such population-level associations are the current hope of many human geneticists trying to dissect complex traits. Further commentary on this approach was recently offered in this journal by Guo and Lange (2000), and elsewhere by Kruglyak (1999) and Risch (2000).
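One family-based test of the kind cited above is the transmission/disequilibrium test (TDT), which conditions on parental genotypes and is therefore robust to stratification. A minimal sketch, with the transmission counts invented for illustration:

```python
import math

# Sketch of the transmission/disequilibrium test (TDT): among heterozygous
# parents of affected offspring, compare the number of times allele A1 is
# transmitted (b) versus not transmitted (c). Counts here are made up.
def tdt(b, c):
    """McNemar-style TDT statistic; ~chi-square, 1 df, under no linkage."""
    chi2 = (b - c) ** 2 / (b + c)
    # Exact 1-df chi-square tail via the normal distribution (chi2_1 = Z^2).
    z = math.sqrt(chi2)
    p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return chi2, p

chi2, p = tdt(120, 80)   # 120 transmissions vs 80 non-transmissions
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Because each informative transmission is scored against the untransmitted parental allele, admixture of the Knowler et al. sort cannot by itself generate a signal; the cost, as the text notes, is the need for parent-offspring trios rather than a random population sample.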

When a large number of markers are examined, the problem of controlling for false positives while still maintaining statistical power is a difficult one. For example, using DNA chip technology, one could easily score 30,000 SNP sites in humans. Given the roughly 3000 cM map of humans, such markers (if suitably chosen) would be spaced every 0.1 cM, meaning that any potential QTL is less than 0.05 cM from a marker (roughly 50 kb). In sib-pair or other pedigree approaches to QTL mapping, such marker information can be efficiently combined to estimate the probability of identity by descent for any particular chromosomal region. However, the problem of dealing with such a large number of highly correlated tests (especially for markers on the same chromosomes) in population-level association studies is still largely unresolved. The advantage of scanning a small collection of candidates (versus a whole-genome SNP scan) is that far fewer tests are involved. The difficult issue is moving from a candidate-trait association (for example, a significant variance associated with a candidate locus) to identifying particular SNP sites within that candidate influencing the trait of interest. For any candidate, we expect a population sample to contain a number of SNP sites which are likely to be very highly correlated, greatly complicating association studies at the nucleotide level.
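To make the false-positive problem concrete, the sketch below (simulated, idealized independent p-values; real SNP tests are correlated, as just noted) compares the Bonferroni family-wise correction with the Benjamini-Hochberg false-discovery-rate procedure on a 30,000-test scan:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical 30,000-SNP scan: most tests are true nulls (uniform
# p-values), while 50 SNPs carry genuine signal (p-values skewed to 0).
m = 30_000
p_null = rng.uniform(size=m - 50)
p_true = rng.beta(0.05, 1.0, size=50)
pvals = np.concatenate([p_null, p_true])

# Bonferroni: reject only p < alpha / m (controls family-wise error).
bonf_hits = int(np.sum(pvals < 0.05 / m))

def benjamini_hochberg(p, q=0.05):
    """Number of rejections under the BH step-up rule at FDR level q."""
    p_sorted = np.sort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = np.nonzero(p_sorted <= thresh)[0]
    return 0 if below.size == 0 else int(below[-1]) + 1

print(bonf_hits, benjamini_hochberg(pvals))
```

The FDR procedure always rejects at least as many tests as Bonferroni at the same nominal level, which is why it is the usual compromise between false positives and power in dense scans; neither, however, addresses the between-marker correlation problem raised above.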

The false-positives versus power problem that complicates the search for candidates is even more problematic in tests of second- and higher-order epistasis between candidates. Restricting epistasis tests to only candidates that each show significant marginal effects is one approach, but one can easily conceive of situations where loci have small marginal effects, but specific combinations could have substantial effects.

HUMAN GENETICS

Given the above general considerations about the future of quantitative genetics, what can we say about the future impacts for specific fields?

We will say very little about the potential directions of human quantitative genetics over the next 20 years, as this subject has received much recent discussion (e.g., Guo and Lange, 2000; Risch, 2000). Briefly, the most obvious, and well-stated, goal of the human genome project as it relates to quantitative genetics is the search for QTLs for disease susceptibility. As Risch (2000) has stressed, genes of major effect are the low-lying fruit, and hence easily picked, while genes of more modest effect will prove considerably more difficult. Even with the complete human sequence in hand, large and carefully designed association studies are required to demonstrate that particular candidate loci do indeed have an impact on human disease. There is considerable debate as to whether the best approach for demonstrating such associations is to use a large genetically heterogeneous population (such as a random sample from a major cosmopolitan city) or to sample from more genetically homogeneous populations (such as Iceland). One of the unstated assumptions of many QTL mappers is that loci that overall have a modest effect may have a much larger effect in the right environment and/or genetic background. If genetic background is critical (i.e., there are large epistatic interactions between some QTLs), then using a more homogeneous population offers more power for detecting certain QTLs (those whose more favorable backgrounds are fixed or are at least at high frequencies) and will entirely miss others (if the background is fixed for the wrong loci). Even if epistasis is not significant, we fully expect different populations to be segregating different alleles with different effects. It is likely that association studies may first start with a collection of homogeneous populations to provide higher power for mapping a subset of QTLs and then use the total collection of detected associations to screen large heterogeneous populations for additional associations.

BREEDING

For centuries, selection on complex traits to improve domesticated plants and animals has been entirely on phenotype. While this has proven to be a fabulously successful approach, the age of genomics offers the prospect of shifting selection directly to genotypes. At first glance this appears to offer the potential to greatly enhance selection response, but it must be remembered that a gene of major effect rapidly increases in frequency under simple mass (phenotypic) selection. Thus, in many cases, the ability to select directly on genotypes might not offer dramatic increases in response. One obvious case where selection on genotypes can potentially increase response is when we can select individuals before the phenotypic trait is expressed. With the appropriate markers, favorable genes can be selected even in individuals where they are not expressed (for example, selection on milk production in males), but BLUP (best linear unbiased prediction), the prediction of breeding values using information from relatives, can accomplish the same task in many settings (but not all; for example, as pointed out by one of the reviewers, males from within the same full-sib family have the same BLUP, as they have the same parents).
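The full-sib limitation is easy to see in a deliberately simplified sketch. Assuming unrelated parents, a single record per parent, and offspring with no records of their own, a BLUP-style prediction reduces to the parent average; the function names and numbers below are hypothetical, chosen only to illustrate why pedigree information alone cannot rank full sibs.

```python
# Simplified breeding-value prediction, assuming unrelated parents,
# single records, and non-phenotyped offspring. Under these assumptions
# a BLUP-style prediction reduces to: parent EBV = h^2 * (y - mean),
# offspring EBV = average of the parental EBVs.

def parent_ebv(phenotype, pop_mean, h2):
    """Estimated breeding value for a parent with a single record."""
    return h2 * (phenotype - pop_mean)

def offspring_ebv(sire_ebv, dam_ebv):
    """Predicted breeding value of a non-phenotyped offspring: the
    parent average (the Mendelian-sampling term has expectation zero)."""
    return 0.5 * (sire_ebv + dam_ebv)

h2 = 0.3        # assumed heritability (illustrative)
mu = 100.0      # population mean (illustrative)
sire = parent_ebv(112.0, mu, h2)   # hypothetical parental records
dam = parent_ebv(104.0, mu, h2)

# Two full brothers, e.g. candidates for selection on milk production:
son1 = offspring_ebv(sire, dam)
son2 = offspring_ebv(sire, dam)
print(son1, son2)   # identical: pedigree alone cannot rank full sibs
```

Marker information, by contrast, tracks which alleles each sib actually inherited, and so can distinguish them where this pedigree-only prediction cannot.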

There are important reasons that phenotype should always be a component in selection schemes. Mutation is continuously generating a background of new variation, and selection on phenotypes is an easy way to capture much of this variation. A more subtle reason is that selection schemes targeting specific genotypes can result in a much greater reduction in effective population size (Ne) than selection based solely on phenotypic value. Given that a very significant component of selection response after the first few generations arises from new mutations (Hill, 1982a, b; Frankham, 1980; Weber and Diggins, 1990), a reduction in Ne reduces the accumulated mutational variance because of random genetic drift, lowering the long-term rate of response.

Equally important as the genomic revolution are advances in reproductive technologies. Already, the ability to select plant cells in tissue culture and then grow up the surviving individuals into fertile plants greatly increases the efficiency of selection for certain characters (such as pesticide resistance). Likewise, the relatively recent breakthroughs in cloning large mammals mean that animal breeders will now have to learn the schemes plant breeders have used for ages to exploit asexual reproduction. Further expected advances may allow a breeder to use cell cultures of individuals with elite breeding values, creating clones as needed. With sufficient breakthroughs in cell culture technology, it is conceivable that instead of cloning a series of individuals from (say) a prize bull, one could simply extract sperm from an appropriate cell culture. The downside to these approaches is that rapid evolution for adaptation to culture growth can result in rather dramatic genetic changes (such as chromosomal loss or duplication), which limits the shelf-life of such cultures as a powerful breeding tool. Further, as plant breeders are well aware, Mendelian reassortment and recombination are critical components for continued progress, so that cloning is not a substitute for sexual reproduction.

With embryo transplantation and cloning, we now have the experimental tools to ask very interesting questions about the importance of maternal effects (such as the importance of the maternal pre- and post-natal environment). A much deeper understanding of the importance of maternal effects is thus expected to be developed in the near future. This has important implications for breeders, as one can easily envision implanting embryos from in vitro crosses of elite parents into surrogate mothers selected for high-performing maternal environments.

Finally, while the impact of transgenic organisms cannot be overstated in terms of introducing important genetic variation, we presently have a very poor ability to predict the phenotypic consequences of any particular transgenic construction. Currently, novel genes are essentially inserted at random in the genome, usually under some strong promoter sequence to mitigate position effects. This may result in advantageous changes in one trait, but can also result in deleterious effects on others. While present transgenic technologies work well for major genes, given the variation in expression from position effects, their effectiveness for moving genes of modest effect is unclear. A deeper understanding of the regulatory control signals will certainly increase transgenic efficiencies, as will the development of truly homologous recombination vectors. Given the essentially unlimited variation from the introduction of novel genes, should breeders still worry about maintaining sufficient effective population sizes? The answer is clearly yes. Besides reducing the impact of inbreeding depression, larger populations are expected to harbor greater polygenic variation (everything else being equal). A large population has a far better chance of segregating modifier genes to ameliorate deleterious effects associated with the introduction of some major genes.

EVOLUTIONARY GENETICS

The genetic architecture of traits is of key concern to evolutionary geneticists. Many of these architectural issues can be addressed by analysis of a collection of tightly linked markers and the appropriate experimental design. Examples include the relative roles of deleterious recessives vs overdominant loci in inbreeding depression, the number of loci that contribute a significant fraction of variation, whether new mutations occur mainly at the set of already segregating loci or if entirely new loci are involved, etc. Hence, many of these fundamental issues can be examined without ever having to sequence a single QTL.

This being said, one ultimate aim of evolutionary quantitative geneticists is the full sequence analysis of a collection of QTLs in several model systems. Such comparative sequence data will allow us to address the fundamental question of the molecular causes of phenotypic variation (both within a population and between populations or species). Are modest phenotypic differences largely caused by genes very far along a developmental pathway, or can subtle differences in expression or function of genes at the base of a complex pathway generate small differences in the phenotypic end product? Are the loci involved in phenotypic variation fairly trait-specific, or are they involved in organism-wide signaling, such as hormones, hormonal receptors, ion channel genes, or transcription factors? Is the oft-stated view (e.g., Wilson, 1976; Carroll, 2000) that the majority of phenotypic differences (and hence adaptation) are due to regulatory, as opposed to structural, changes correct? When similar phenotypes arise independently in different species, are the same set of genes involved? Is there a difference in the strength of selection on regulatory vs structural differences contributing to trait variation? Are there genomic hot spots of quantitative trait variation? Are there genomic constraints to evolution that one can quantify? The resolution of these questions for even a small number of model systems will provide considerable insight into evolution. The machinery to attack many of these questions is already in hand. For example, with a population sample of DNA sequences from a QTL, one can use standard tests to look for signatures of selection as well as compare differences between species.

Finally, the ability to construct transgenic organisms offers the possibility (in certain settings) to do the once unthinkable, namely, to regenerate the sequence of key ancestral steps during the adaptation of a particular character. If one localizes (say) six major genes between an ancestral population and a derived one adapted to a particular environment, each major gene can be inserted into the otherwise ancestral background, and the fitness effects of these constructs can be examined in the appropriately controlled natural settings.

THE COMPUTATIONAL REVOLUTION: A BAYESIAN FUTURE?

In addition to the new technologies from genomics and molecular biology, the computer revolution (itself a key component of practical genomics) has very significant implications for the future of quantitative genetics. As computers continue to become faster and cheaper, computationally intensive algorithms will continue to gain importance. Resampling approaches (such as randomization tests and bootstrapping) have already had an impact on QTL mapping, allowing for the construction of confidence intervals and proper tests of significance. Likewise, variance component estimation using very large pedigrees, pioneered by animal breeders using the BLUP machinery developed by Henderson (reviewed in Lynch and Walsh, 1998), will continue to grow in importance in the analysis of both human and natural populations.

Indeed, such an approach has been suggested as a powerful tool for QTL mapping in humans (Almasy and Blangero, 1998). With a collection of major genes in hand, extensive studies of genotype-environment interactions become practical, and the AMMI (additive main effects, multiplicative interactions) model of plant breeders (Gauch, 1992) offers a powerful, yet usually overlooked, tool for the analysis of genotype-by-environment effects.
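The AMMI decomposition can be sketched in a few lines: fit genotype and environment main effects from row and column means, then summarize the interaction residuals by their leading multiplicative (singular) component. The trial data below are invented, and the hand-rolled power iteration stands in for the SVD routines real AMMI software would use.

```python
# AMMI in miniature: additive main effects for genotype and environment,
# plus the leading multiplicative term of the interaction residuals.
# Yields are hypothetical genotype-by-environment trial means.

yields = [
    [5.0, 6.1, 4.2],   # genotype 1 across three environments
    [4.4, 6.6, 3.1],   # genotype 2
    [5.9, 5.8, 5.6],   # genotype 3
]
G, E = len(yields), len(yields[0])

mu = sum(map(sum, yields)) / (G * E)
g_eff = [sum(row) / E - mu for row in yields]                       # genotype main effects
e_eff = [sum(yields[g][e] for g in range(G)) / G - mu for e in range(E)]  # environment main effects

# Interaction residuals after removing the additive main effects
Z = [[yields[g][e] - mu - g_eff[g] - e_eff[e] for e in range(E)] for g in range(G)]

# Power iteration for the leading singular triple (lam, u, v) of Z.
# Start away from the all-ones vector, which lies in Z's null space.
v = [(-1.0) ** e * (e + 1) for e in range(E)]
for _ in range(200):
    u = [sum(Z[g][e] * v[e] for e in range(E)) for g in range(G)]   # u = Z v
    nu = sum(x * x for x in u) ** 0.5
    u = [x / nu for x in u]
    v = [sum(Z[g][e] * u[g] for g in range(G)) for e in range(E)]   # v = Z' u
    lam = sum(x * x for x in v) ** 0.5
    v = [x / lam for x in v]

# AMMI1 prediction: mean + main effects + first multiplicative term
pred = [[mu + g_eff[g] + e_eff[e] + lam * u[g] * v[e] for e in range(E)] for g in range(G)]
print(f"interaction singular value: {lam:.3f}")
```

The genotype scores u and environment scores v describe which genotypes are unusually sensitive to which environments, which is the biological payoff of the multiplicative term.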

The most far-reaching of these computationally intensive methods are Markov chain Monte Carlo (MCMC) approaches for simulating draws from complex probability distributions (e.g., Geyer, 1992; Tierney, 1994; Tanner, 1996). In particular, the Gibbs sampler (Geman and Geman, 1984) allows for the efficient exploration of very complex likelihood surfaces and the calculation of Bayesian posterior distributions (Smith and Roberts, 1993). These resampling approaches (Gibbs and MCMC) allow for relatively easy analysis in models where standard likelihood calculations are, at best, extremely difficult (such as likelihoods over complex pedigrees, e.g., Thompson, 2000). A more profound result of these resampling methods is that, akin to the shift over the past 20 years in quantitative genetics towards more likelihood-based analysis, the next 20 years will likely be marked by a similar influx of Bayesian methods replacing their likelihood counterparts. For example, resampling-based Bayesian methods for multiple QTL mapping have recently been proposed (Sillanpaa and Arjas, 1998, 1999; Stephens and Fisch, 1998).

As most readers are undoubtedly aware, there is a close connection between likelihood and Bayesian approaches. Under a Bayesian analysis, the posterior probability density for a vector u of parameters is proportional to the prior density assumed for u times the likelihood function L for u given the observed data vector x,

Posterior(u | x) = constant × L(u | x) × Prior(u).

A likelihood ratio analysis is concerned with the local geometry of the likelihood surface around its maximal value. A Bayesian analysis is concerned with the entire shape of the posterior, which depends on both the likelihood function and the assumed prior. A key feature of such an analysis is that, by suitable integration of the full posterior density, marginal posterior distributions for any component(s) of u can be obtained.
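As a minimal concrete example of these ideas (the model and numbers are my own, not from the paper), consider a Gibbs sampler for a normal sample with unknown mean mu and variance sigma^2 under the Jeffreys prior 1/sigma^2. Alternately drawing from the full conditionals mu | sigma^2 ~ N(xbar, sigma^2/n) and sigma^2 | mu ~ Inverse-Gamma(n/2, SS/2) yields draws from the joint posterior; keeping only the mu draws gives exactly the marginal posterior described above, with the nuisance parameter integrated out for free.

```python
# Gibbs sampler for (mu, sigma^2) in a normal model with Jeffreys prior.
# Marginal posteriors fall out of the joint draws: keep only the component
# of interest and the nuisance parameter is integrated over automatically.
import random

random.seed(2)
x = [random.gauss(10.0, 2.0) for _ in range(50)]   # hypothetical data
n, xbar = len(x), sum(x) / len(x)

mu, sig2 = xbar, 1.0          # starting values
mu_draws, sig2_draws = [], []
for it in range(3000):
    # sigma^2 | mu, x ~ Inv-Gamma(n/2, SS/2), drawn as 1/Gamma(shape, scale)
    ss = sum((xi - mu) ** 2 for xi in x)
    sig2 = 1.0 / random.gammavariate(n / 2.0, 2.0 / ss)
    # mu | sigma^2, x ~ Normal(xbar, sigma^2 / n)
    mu = random.gauss(xbar, (sig2 / n) ** 0.5)
    if it >= 500:             # discard burn-in
        mu_draws.append(mu)
        sig2_draws.append(sig2)

post_mean_mu = sum(mu_draws) / len(mu_draws)
print(post_mean_mu, xbar)     # the marginal posterior for mu centres on xbar
```

The spread of mu_draws already reflects the uncertainty in sigma^2; no separate correction for the nuisance parameter is needed, which is the practical appeal noted below.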


While there are deep philosophical differences between Bayesian and frequentist statisticians (which at times take on the air of religious warfare), this predicted shift from likelihood to Bayesian approaches will be driven by practical and applied concerns. Many quantitative geneticists may not be deeply worried about the subtle differences between Bayesian and frequentist notions of probability (much to the anguish of both sides), but will immediately appreciate that a Bayesian marginal posterior for an unknown parameter gives a distribution that incorporates any uncertainty in estimating other, nuisance, parameters as well. For example, an estimate of an epistatic variance involves estimates of additive, dominance, and environmental variances, as well as potential fixed effects. The marginal posterior for the epistatic variance component, by integrating the posterior over all other parameters, automatically incorporates the uncertainty in these nuisance parameters into the uncertainty of the epistatic variance.

Besides debates over philosophical considerations (such as the choice of a prior), the major stumbling block to Bayesian approaches has been their mathematical intractability for the complexity inherent in most quantitative genetics models. This was also the case for likelihood methods 30 years ago, until faster computers and better algorithms made those approaches feasible. Likewise, ever faster computers and the Gibbs sampler (and other MCMC methods) have made Bayesian methods very widely applicable, and they are expected to play a major role in quantitative genetics. This is especially true in the analysis of the coming generation of high-dimensional models that must deal simultaneously with phenotypic, molecular, and environmental information. A concern expressed by one of the reviewers (with which I am in complete agreement) is that a negative consequence of resampling methods may well be an increased use of inappropriate models. The (relative) ease of these methods can tempt the unwary into using a highly complex model that does not fit the observed data. Resampling methods allow for extensive exploratory data analysis (for example, through jackknifing and bootstrapping), and it is hoped that quantitative biologists will be trained to fully examine data sets before proceeding with any particular underlying model of analysis.
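The bootstrap flavor of these resampling methods can be sketched in a few lines: here a percentile confidence interval is attached to a marker-effect estimate, of the kind used to express uncertainty in QTL results. The data, marker coding, and effect size are simulated and purely hypothetical; in a real analysis the resampled units would be the mapped individuals.

```python
# Percentile-bootstrap confidence interval for a marker-effect estimate.
import random

random.seed(1)

# Hypothetical biallelic marker (genotype code 0/1/2) with additive effect 2.0
genos = [random.choice((0, 1, 2)) for _ in range(200)]
phenos = [2.0 * g + random.gauss(0.0, 3.0) for g in genos]

def additive_effect(pairs):
    """Least-squares slope of phenotype on genotype code."""
    n = len(pairs)
    mg = sum(g for g, _ in pairs) / n
    mp = sum(p for _, p in pairs) / n
    sxy = sum((g - mg) * (p - mp) for g, p in pairs)
    sxx = sum((g - mg) ** 2 for g, _ in pairs)
    return sxy / sxx

data = list(zip(genos, phenos))
est = additive_effect(data)

# Resample individuals with replacement and re-estimate each time
boots = sorted(
    additive_effect([random.choice(data) for _ in data])
    for _ in range(1000)
)
lo, hi = boots[24], boots[974]   # approximate 95% percentile interval
print(f"estimate={est:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The same resampling loop doubles as a model check: a bootstrap distribution that is wildly skewed or multimodal is a warning that the assumed model fits the data poorly.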

SUMMARY

Quantitative genetics is indeed very healthy in this coming age of genomics, and will play an even greater role as genotypes of potential interest are investigated by human geneticists, breeders, and evolutionary geneticists. While we have (or soon will have) the ability to do experiments that the founders of quantitative genetics could never envision in their wildest imagination, the basic machinery they developed is easily adaptable to the new analyses that will be required.

Far from ``freeing'' molecular biologists from mathematics, the age of genomics has forced an appreciation of the importance of quantitative methods. As we start to mine this genomic information and attempt to map molecular variation into trait variation, quantitative genetics will move even more to the forefront. I'm afraid my molecular colleagues will have to develop a deeper appreciation of Fisher (1918).

ACKNOWLEDGMENT

Thanks to Sam Karlin and the anonymous reviewers for their excellent comments and suggestions.

REFERENCES

Allison, D. B. 1997. Transmission-disequilibrium tests for quantitative traits, Amer. J. Hum. Genetics 60, 676–690.

Allison, D. B., Heo, M., Kaplan, N., and Martin, E. R. 1999. Sibling-based tests of linkage and association for quantitative traits, Amer. J. Hum. Genetics 64, 1754–1764.

Almasy, L., and Blangero, J. 1998. Multipoint quantitative-trait linkage analysis in general pedigrees, Amer. J. Hum. Genetics 62, 1198–1211.

Basten, C. 2000. Review of Genetics and Analysis of Quantitative Traits, Theor. Popul. Biol. 57, 307.

Carroll, S. B. 2000. Endless forms: the evolution of gene regulation and morphological diversity, Cell 101, 577–580.

Clayton, D. 1999. A generalization of the transmission–disequilibrium test for uncertain-haplotype transmission, Amer. J. Hum. Genetics 65, 1170–1177.

Ewens, W. J., and Spielman, R. S. 1995. The transmission–disequilibrium test: history, subdivision, and admixture, Amer. J. Hum. Genetics 57, 455–464.

Fisher, R. A. 1918. The correlation between relatives on the supposition of Mendelian inheritance, Trans. Royal Soc. Edinburgh 52, 399–433.

Frankham, R. 1980. The founder effect and response to artificial selection in Drosophila, in ``Selection Experiments in Laboratory and Domestic Animals'' (A. Robertson, Ed.), pp. 87–90, Commonwealth Agricultural Bureaux.

Gauch, H. G. 1992. ``Statistical Analysis of Regional Yield Trials: AMMI Analysis of Factorial Designs,'' Elsevier, Amsterdam.

Geman, S., and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741.

George, V., Tiwari, H. K., Zhu, X., and Elston, R. C. 1999. A test for transmission–disequilibrium for quantitative traits in pedigree data, by multiple regression, Amer. J. Hum. Genetics 65, 236–245.

Geyer, C. J. 1992. Practical Markov chain Monte Carlo (with discussion), Stat. Sci. 7, 473–511.

Guo, S.-W., and Lange, K. 2000. Genetic mapping of complex traits: promises, problems and prospects, Theor. Popul. Biol. 57, 1–11.

Hill, W. G. 1982a. Rates of change in quantitative traits from fixation of new mutations, Proc. Natl. Acad. Sci. 79, 142–145.


Hill, W. G. 1982b. Predictions of response to artificial selection from new mutations, Genet. Res. 40, 255–278.

Knowler, W. C., Williams, R. C., Pettitt, D. J., and Steinberg, A. G. 1988. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture, Amer. J. Hum. Genetics 43, 520–526.

Kruglyak, L. 1999. Prospects for whole-genome linkage disequilibrium mapping of common disease genes, Nature Genetics 22, 139–144.

Lynch, M., and Walsh, B. 1998. ``Genetics and Analysis of Quantitative Traits,'' Sinauer Associates, Sunderland, MA.

Monks, S. A., and Kaplan, N. L. 2000. Removing the sampling restrictions from family-based tests of association for a quantitative-trait locus, Amer. J. Hum. Genetics 66, 576–592.

Risch, N. J. 2000. Searching for genetic determinants in the new millennium, Nature 405, 847–856.

Sillanpaa, M. J., and Arjas, E. 1998. Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data, Genetics 148, 1373–1388.

Sillanpaa, M. J., and Arjas, E. 1999. Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data, Genetics 151, 1605–1619.

Smith, A. F. M., and Roberts, G. O. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion), J. Royal Statist. Soc. Ser. B 55, 3–23.


Spielman, R. S., and Ewens, W. J. 1998. A sibship test for linkage in the presence of association: the sib transmission–disequilibrium test, Amer. J. Hum. Genetics 62, 450–458.

Stephens, D. A., and Fisch, R. D. 1998. Bayesian analysis of quantitative trait loci data using reversible jump Markov chain Monte Carlo, Biometrics 54, 1334–1347.

Tanner, M. A. 1996. ``Tools for Statistical Inference,'' Springer-Verlag, Berlin/New York.

Tierney, L. 1994. Markov chains for exploring posterior distributions (with discussion), Ann. Statist. 22, 1701–1762.

Thompson, E. A. 2000. ``Statistical Inference from Genetic Data on Pedigrees,'' IMS Books.

Wang, R.-L., Stec, A., Hey, J., Lukens, L., and Doebley, J. 1999. The limits of selection during maize domestication, Nature 398, 236–239.

Weber, K. E., and Diggins, L. T. 1990. Increased selection response in larger populations. II. Selection for ethanol vapor resistance in Drosophila melanogaster at two population sizes, Genetics 125, 585–597.

Wilson, A. C. 1976. Gene regulation in evolution, in ``Molecular Evolution'' (F. J. Ayala, Ed.), pp. 225–236, Sinauer Associates, Sunderland, MA.

Winkelman, D. C., and Hodgetts, R. B. 1992. RFLPs for somatotropic genes identify quantitative trait loci for growth in mice, Genetics 131, 929–937.
