© 2013, Elsevier Ltd. All rights reserved.

CHAPTER 12

Multifactorial Inheritance and Complex Diseases

Christine W Duarte, Laura K Vaughan, T Mark Beasley, and Hemant K Tiwari
Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL, USA

This article is a revision of the previous edition article by Hemant K Tiwari, T Mark Beasley, Varghese George and David B Allison, volume 1, pp 299–306, © 2007, Elsevier Ltd.

12.1 INTRODUCTION

If a disease or condition is caused by a single locus of large effect, it is called a single-gene or monogenic disease, disorder, or, more generically, condition. There are over 10,000 such examples, which include cystic fibrosis, Huntington disease, Duchenne muscular dystrophy, and Marfan syndrome. A single-gene disease may show locus heterogeneity if it is caused by single mutations in different genes, but this is more properly considered a special case of an oligogenic disorder. For example, osteogenesis imperfecta is caused by a single mutation in genes on either chromosome 7 or chromosome 17. Oligogenic disorders are explained by a few loci with large effects (for examples, see Reference (1)). In contrast to oligogenic traits, polygenic inheritance is due to many loci with small effects at each locus. Thus, the term polygenic is generally used to describe multiple factors that are exclusively genetic. Any of these genetic effects, with or without the combination of an environmental effect, can give rise to a multifactorial disorder. Multifactorial diseases are caused by the simultaneous action of multiple genetic and/or environmental factors.

In contrast to dichotomous traits (i.e. affected versus unaffected), quantitative traits are measured on a continuous scale, and most are thought to be multifactorial (e.g. blood pressure and body mass index). Some quantitative traits may be due to major gene effects with a multifactorial background.
Multifactorial inheritance is responsible for the majority of modern deleterious health conditions such as heart disease and diabetes. Atopic syndrome, diabetes, cancer, spina bifida/anencephaly, pyloric stenosis, cleft palate, congenital hip dysplasia, club foot, and many other disorders and complex phenotypes result from multifactorial inheritance.

12.2 DEFINITIONS AND TERMINOLOGY

The polygenic model has its origins in Fisher’s seminal work (2), which showed that “many small, equal and additive loci” would result in a Gaussian (or normal) distribution for a phenotype. Similarly, the combined additive effects of many genetic and environmental factors will also produce an approximately Gaussian phenotypic distribution. To illustrate, suppose (naïvely) that a quantitative trait such as percent body fat is determined by a single gene with two codominant¹ alleles, A and a, which have equal frequency (p = 0.50). Assume individuals with an A allele tend to have a higher value of the trait, and individuals with an a allele tend to have a lower value of the trait. If A has an additive effect, then there are three distinct phenotypic groups, namely, high (2), intermediate (1), and low (0). If the allele frequencies of A and a are both 0.50, then 25% of individuals would be expected to be aa and of low-percent fat, 50% would be expected to be Aa and of moderate-percent fat, and 25% would be expected to be AA and of high-percent fat. Figure 12-1 gives the distribution of the trait in a population.

Now, suppose that the trait is determined by two loci. The second locus also has two codominant alleles, B for high and b for low expression of the trait, with B having an allele frequency of 0.50 and the same effect magnitude as the A allele. There are now nine possible genotypes (see Table 12-1).
¹ “An allele a is said to be codominant with respect to the wild-type allele A if the A/a heterozygote fully expresses both the phenotypes associated with the a/a and A/A homozygotes.” From http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=glossaryTerm&key=codominant.


    An individual can possess 0, 1, 2, 3, or 4 high-trait alleles. Assuming that the combined effects of the two loci are also additive,² there are five distinct phenotypes with respect to the number of high-trait alleles (see Table 12-2).

    The distribution of the trait with respect to genotypic value is shown in Figure 12-2. As can be seen there, even with two loci, the distribution of the phenotype starts to look Gaussian. An example of a three-locus system with equal allele frequencies, no linkage disequilibrium,³ and equal additive effects is shown in Figure 12-3. It can be shown that six diallelic loci are enough to produce population frequencies virtually indistinguishable from a normal curve.
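The convergence toward a Gaussian shape can be checked numerically. The following is a minimal sketch, assuming independent diallelic loci with equal additive effects and a common high-allele frequency p, so that the number of high-trait alleles across n loci follows a Binomial(2n, p) distribution. The function names are illustrative only and do not come from the chapter.

```python
from math import comb, exp, pi, sqrt


def high_allele_distribution(n_loci, p=0.5):
    """Return P(k high-trait alleles) for k = 0..2*n_loci.

    Assumes independent diallelic loci, equal additive effects, and
    Hardy-Weinberg proportions at every locus (illustrative assumptions).
    """
    n_alleles = 2 * n_loci
    return [comb(n_alleles, k) * p**k * (1 - p) ** (n_alleles - k)
            for k in range(n_alleles + 1)]


def normal_density(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


# Two loci: the five phenotype classes with 0..4 high-trait alleles.
two_loci = high_allele_distribution(2)
print(two_loci)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]

# Six loci: compare each class frequency with the matching normal density.
six_loci = high_allele_distribution(6)
mean, var = 12 * 0.5, 12 * 0.5 * 0.5
max_gap = max(abs(freq - normal_density(k, mean, var))
              for k, freq in enumerate(six_loci))
print(max_gap)  # largest absolute difference from the normal curve
```

With one locus the same function returns the 25%, 50%, 25% split described earlier; with six loci the largest gap between a class frequency and the normal density is below 0.01 under these assumptions.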

    ² That is, not epistatic, where epistatic refers to an interaction (in the statistical, not necessarily biochemical, sense) between two different loci, such that the effect of genotype at one locus depends on the genotype at another locus.

    ³ Linkage disequilibrium is defined as the nonrandom association between alleles at linked (or adjacent) loci (133). Two loci are said to be linked if they are sufficiently close on the same chromosome that they do not segregate independently.

    FIGURE 12-1 Expected phenotype distribution for a trait with a single causal locus with an allele frequency of 50% and in Hardy–Weinberg equilibrium.

    TABLE 12-1 Frequency Distribution of Genotypic Values for Two Loci with No Linkage Disequilibrium

              AA       Aa       aa
    BB      0.0625   0.1250   0.0625
    Bb      0.1250   0.2500   0.1250
    bb      0.0625   0.1250   0.0625

    TABLE 12-2 Genotypic Values of Two Loci Genotypes

              AA   Aa   aa
    BB         4    3    2
    Bb         3    2    1
    bb         2    1    0

    FIGURE 12-2 Expected phenotype distribution for a trait with two independently segregating causal loci of equal effect and allele frequency.

    FIGURE 12-3 Expected phenotype distribution for a trait with three independently segregating causal loci of equal effect and allele frequency.

    Many traits (or diseases) are treated as dichotomous variables because they appear to be either present or absent (e.g. cancer). By definition, dichotomous variables do not approximate a Gaussian distribution. However, these diseases may still be considered polygenic or multifactorial because they do not follow the patterns expected of Mendelian (single-gene) diseases. A common explanation is that an underlying liability distribution exists for multifactorial diseases (3). Individuals on the low end of the distribution have little chance of developing the disease because they possess few of the alleles or environmental factors that jointly cause the disease. By contrast, individuals on the high end of the liability distribution have a greater chance of developing the disease because they possess many of the alleles and/or environmental factors that jointly cause the disease. The liability distribution is assumed to be continuous (representing the sum of a large number of independent genetic and environmental factors) and normally distributed within the population. It is also commonplace to assume that all correlations between relatives are due to shared genes but not shared environment. For multifactorial diseases that are either present or absent, there is a hypothesized threshold of liability that must be crossed before the disease is manifest (3).

    For example, consider the development of cleft palate. Early in embryonic development, the palatal arches are in a vertical position. Through embryonic and fetal development, the head grows larger, moving the arches farther apart, and the tongue increases in size, making it more difficult to move. In addition, the arches themselves are growing and turning horizontally. There is a critical stage in development by which the two arches must meet and fuse. Head growth, tongue growth, and palatal arch growth are all subject to many genetic and environmental factors. If the two arches start to grow in time, grow at the proper rate, and begin to move soon enough to the horizontal, they will meet and fuse in spite of head size and tongue growth. The result is no cleft palate. They may fuse well ahead of the critical developmental stage or just barely make it in time; it is impossible to know. However, if they do not meet at the critical stage, a cleft palate results. If they are close together at the critical stage, a small cleft will result, perhaps only a bifurcated uvula. If they are far apart, a more severe cleft will result. That critical difference in liability is called the threshold. Beyond the threshold, disease results. Below the threshold, normal development is observed. Thus the underlying liability is distributed as the normal curve shown in Figure 12-4.
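The relationship between the threshold and the population frequency of the disease can be sketched numerically. The example below assumes liability is scaled to a standard normal distribution (mean 0, variance 1), which is a common convention rather than something stated above, and uses a hypothetical prevalence of 1% purely for illustration.

```python
from statistics import NormalDist

# Assumption: liability is standardized to mean 0 and variance 1.
liability = NormalDist(mu=0.0, sigma=1.0)


def prevalence_from_threshold(threshold):
    """Proportion of the population whose liability exceeds the threshold."""
    return 1.0 - liability.cdf(threshold)


def threshold_from_prevalence(prevalence):
    """Liability threshold implied by a given population prevalence."""
    return liability.inv_cdf(1.0 - prevalence)


# Hypothetical disease affecting 1% of the population.
t = threshold_from_prevalence(0.01)
print(t)                             # threshold in standard-deviation units
print(prevalence_from_threshold(t))  # recovers the 1% prevalence
```

Under this scaling, a rarer disease corresponds to a threshold further into the upper tail of the liability distribution.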

    Some diseases may have more than one threshold, and commonly two liability thresholds are present as defined by factors such as gender, race, age of onset, and so on, causing different levels of severity (4). Examples include pyloric stenosis (sex dimorphism for liability) (5) and orofacial cleft syndrome/cleft lip and palate (two thresholds for fetal mortality and disease) (6,7). The latter model proposes a lower threshold level of liability resulting in cleft formation and a higher level causing fetal death (preferentially in males).

    It should be emphasized that, similar to any other loci, the individual loci underlying a polygenic or multifactorial trait are generally assumed to follow the Mendelian principles of random segregation and independent assortment.⁴ The difference is that they act together to influence the trait. Thus, the multifactorial model assumes:

    1. several, but not an unlimited number, of loci are involved in the expression of the trait;

    2. the loci act in concert in an additive manner, each adding or subtracting a small amount from the phenotype; and

    3. the environment interacts with the genotype to produce the final phenotype.

    ⁴ Good descriptions of these principles can be found in the following web pages: http://biology.about.com/library/weekly/aa110603a.htm and http://biology.about.com/library/weekly/aa100903a.htm.

    12.3 DETERMINING THE GENETIC COMPONENT OF A TRAIT

    Historically, the genetic study of any trait can be divided into four broad categories: familial aggregation, segregation analysis, linkage analysis, and association studies. This paradigm was useful in discovering genes for many monogenic disorders.

    12.3.1 Familial Aggregation

    The first step of any genetic analysis is to establish a genetic component to the disease. In addition, one must establish the relative size of the genetic effect in comparison to other sources of variation, such as common household effect and random environmental effect.

    FIGURE 12-4 Liability distribution for a multifactorial disease. To be affected with the disease, an individual must exceed the threshold.

    Familial aggregation can be established using family-based or twin/adoption studies. Family members share genes and environment; therefore, familial aggregation of the trait could be due to genetic and environmental factors together. In general, very few traits are influenced only by genes or only by the environment. Detection and estimation of familial aggregation is the first step in the genetic analysis of any multifactorial trait. Twin and adoption studies are traditionally used to determine the genetic component of the trait (8–11). As monozygotic (MZ) twins share all their nuclear genes, any difference between them regarding a particular trait should be due solely to environmental effects. If the trait is completely influenced by genes, then MZ twins should have essentially identical expression of the trait. This is not true for dizygotic (DZ) twins because, on average, they share only 50% of their genes.

    Twin studies to determine the genetic component of a threshold character are based on comparing concordance rates of MZ and DZ twins. If both members of a twin pair have the same status for a dichotomous trait (i.e. either both have the disease or both do not have the disease), they are concordant. If they do not share the trait status, they are discordant. The concordance rate is the proportion of concordant twin pairs among all those with the trait. A significantly higher concordance rate in MZ twins than in DZ twins is considered evidence for a significant genetic component of the disease. The significance of the difference can be easily tested in a 2 × 2 contingency table using a chi-square test (see Table 12-3).
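The chi-square test on the 2 × 2 table of concordant and discordant pairs can be sketched as follows. This is a minimal illustration using the standard one-degree-of-freedom statistic for a 2 × 2 table, without continuity correction; the function name and the pair counts are hypothetical and chosen only to show the arithmetic.

```python
from math import erfc, sqrt


def twin_concordance_chi2(n11, n12, n21, n22):
    """Chi-square test (1 df) for a 2 x 2 twin concordance table.

    n11, n12: concordant and discordant MZ pairs
    n21, n22: concordant and discordant DZ pairs
    All four margins must be non-zero, otherwise the statistic is undefined.
    """
    n_mz, n_dz = n11 + n12, n21 + n22   # row totals
    n_c, n_d = n11 + n21, n12 + n22     # column totals
    n = n_mz + n_dz                     # total number of pairs
    chi2 = n * (n11 * n22 - n12 * n21) ** 2 / (n_c * n_d * n_mz * n_dz)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 of 50 MZ pairs and 15 of 50 DZ pairs concordant.
chi2, p_value = twin_concordance_chi2(30, 20, 15, 35)
print(chi2, p_value)
```

In this made-up example the statistic is 100/11 (about 9.09), which would be taken as evidence of higher concordance in MZ pairs at conventional significance levels.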

    However, concordance rates are not appropriate for continuous traits, so correlation coefficients can be used instead of concordance rates (12).

    The method for determining the size of the genetic component of a continuous trait is based on comparing the variance of the differences between MZ twins with the variance of the differences between DZ twins. As MZ twins share all their genes, the variance of the trait between MZ twins ($V_{MZ}$) must be due to environmental variance ($V_E$), so in this case $V_{MZ} = V_E$. However, the variance of the trait between DZ twins ($V_{DZ}$) could be due to both environment ($V_E$) and genes ($V_G$), so that $V_{DZ} = V_G + V_E$. The genetic variance is therefore $V_G = V_{DZ} - V_{MZ}$, and the heritability, $h^2$, taken as the genetic share of the DZ variance, is defined as

    $$h^2 = \frac{V_{DZ} - V_{MZ}}{V_{DZ}}.$$

    Heritability ranges between 0 and 1, with 0 meaning a solely environmentally determined trait and 1 meaning a completely genetically determined trait.
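A small numerical sketch of this twin-based estimate follows. It assumes that heritability is the genetic share of the DZ variance, that is $V_G / (V_G + V_E)$ with $V_E = V_{MZ}$ and $V_G = V_{DZ} - V_{MZ}$, which is the form consistent with a value between 0 and 1. The within-pair variance is computed here as half the mean squared within-pair difference, which is one common convention and an assumption of this example; the trait values are invented.

```python
def within_pair_variance(pairs):
    """Half the mean squared within-pair difference (order-invariant)."""
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def twin_heritability(mz_pairs, dz_pairs):
    """Estimate h^2 = (V_DZ - V_MZ) / V_DZ from twin-pair trait values."""
    v_mz = within_pair_variance(mz_pairs)  # taken as environmental variance
    v_dz = within_pair_variance(dz_pairs)  # environmental plus genetic
    return (v_dz - v_mz) / v_dz


# Hypothetical trait values (twin 1, twin 2) for three pairs of each type.
mz_pairs = [(10, 11), (20, 19), (15, 16)]
dz_pairs = [(10, 12), (20, 18), (15, 17)]
h2 = twin_heritability(mz_pairs, dz_pairs)
print(h2)  # 0.75
```

With real data, sampling error can push this estimate below 0, so it is usually reported with a confidence interval rather than as a point value alone.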

    TABLE 12-3 Using Twin Concordance and Discordance Rates to Test for a Genetic Component of a Disease

    Twins    Concordant Pair    Discordant Pair    Total Pairs
    MZ       n11                n12                nMZ
    DZ       n21                n22                nDZ
    Total    nC                 nD                 n

    $$\chi^2_1 = \frac{n\,(n_{11} n_{22} - n_{12} n_{21})^2}{n_C\, n_D\, n_{MZ}\, n_{DZ}}$$

    Adoption studies provide a second familial aggregation strategy for estimating the influence of genes on multifactorial traits. The strategy consists of comparing disease rates among the adopted offspring of affected parents with the rates among adopted offspring of unaffected parents. Certain biases can influence these studies, namely, (1) parental environment could have long-lasting effects on an adopted child; (2) adoption agencies attempt to match the adoptive parents with natural parents in terms of socioeconomic status; (3) children might be several years old when adopted, introducing the potential for many environmental confounds. Moreover, these studies are reasonably good at estimating additive genetic effects that are not age specific, but poor at estimating nonadditive genetic effects or genetic effects that are expressed differently across the age span.

    For continuous traits, familial aggregation is usually measured by heritability, which is the proportion of variability of the trait explained by genetic variation. Heritability can be defined using either total genetic effects (the sum of additive, dominance, and epistatic effects) or only additive effects. The former quantity is called heritability in the broad sense and is given by

    $$h^2 = \frac{\mathrm{Var}(G)}{\mathrm{Var}(T)},$$

    where Var(G) and Var(T) are the genetic and total variance, respectively. The latter quantity is called heritability in the narrow sense and is given by

    $$h^2 = \frac{\mathrm{Var}(A)}{\mathrm{Var}(T)},$$

    where Var(A) is the additive genetic variance.

    One cannot infer the number of genes, or which genes, are involved in the etiology of the trait from a heritability estimate. Although the absence of familial aggregation is generally thought to rule out a genetic contribution to the trait, there are some unlikely yet plausible scenarios where this is not so, including but not limited to phenotypic competition within families (13) that counters genetic effects, and an extreme form of epistasis referred to by some as emergenesis (14). It is also important to emphasize that heritability is a population-specific estimate, and thus can vary from population to population.

    There are many other methods to detect and estimate familial aggregation using family data. For example, the recurrence risk is often used to determine the strength of familial aggregation for a discrete trait. The recurrence risk is the probability that a relative of an affected individual is also affected. The most commonly used measure is the sibling recurrence risk, i.e. the probability that a sibling of an affected individual is also affected. The ratio of the sibling recurrence risk to the overall disease prevalence is called the sibling relative risk. It is one of the measures of the magnitude of the genetic contribution to susceptibility for a dichotomous trait (affected versus unaffected). Examination of relative recurrence risk values for various classes of relatives could suggest that the trait is influenced by multiple loci (15). For a single ascertainment scheme, the sibling recurrence risk can be calculated from sibling data as follows (16):

    $$K_s = \frac{\sum_{s \ge 1} \sum_{a=1}^{s} (a - 1)\, n_s(a)}{\sum_{s \ge 1} \sum_{a=1}^{s} (s - 1)\, n_s(a)}$$

    where a = number of affected sibs in a sibship, s = number of siblings in the sibship, and n_s(a) = number of sibships of size s with a affected sibs.
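As a worked illustration, the sibling recurrence risk under single ascertainment can be computed from a table of sibship counts by pooling, over all sibships, the affected siblings of the proband (a − 1) and all siblings of the proband (s − 1). The sketch below makes that assumption explicit; the dictionary layout, the counts, and the prevalence of 2% are hypothetical.

```python
def sibling_recurrence_risk(sibship_counts):
    """Estimate K_s under single ascertainment.

    sibship_counts maps (s, a) -> n_s(a), the number of sibships of size s
    with a affected sibs. Every sibship must contain at least one affected
    sib (the proband), so 1 <= a <= s.
    """
    for (s, a) in sibship_counts:
        if not 1 <= a <= s:
            raise ValueError(f"invalid sibship class: s={s}, a={a}")
    affected_sibs = sum((a - 1) * n for (s, a), n in sibship_counts.items())
    all_sibs = sum((s - 1) * n for (s, a), n in sibship_counts.items())
    return affected_sibs / all_sibs


# Hypothetical data: counts of sibships by size and number affected.
counts = {(2, 1): 80, (2, 2): 20, (3, 1): 50, (3, 2): 10}
k_s = sibling_recurrence_risk(counts)

# Sibling relative risk: K_s divided by an assumed population prevalence.
prevalence = 0.02
lambda_s = k_s / prevalence
print(k_s, lambda_s)
```

Here 30 of the 220 siblings of probands are affected, so the estimate is 30/220. Sibships of size one contribute nothing to either sum, as expected, because the proband has no siblings.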

    Note that the familial aggregation methods given above use only trait information from the sample. Owing to the availability of genome-wide single nucleotide polymorphism (SNP) data, it is now feasible to calculate heritability using genome-wide SNP markers. One such method was proposed by Visscher et al. (17), who used genome-wide identity-by-descent (IBD) sharing probabilities between full sibs, estimated from genome-wide SNP data.

    12.3.2 Segregation Analysis

    Once a genetic basis of the trait has been established, traditionally, the next step has been to determine the genetic models that explain the segregation of a phenotype (continuous, dichotomous, or ordinal) in a given familial data set via segregation analysis. Segregation analysis requires phenotypic data on related individuals and does not require any molecular data. Segregation analysis is the statistical methodology to determine whether a model with one or more major genes and/or polygenes (i.e. a set of genes each with small quantitative effect that together produce a phenotype) is consistent with the observed pattern of phenotypic inheritance, and to estimate the parameters of the best-fitting genetic model. It entails determining the mode of inheritance (additive, recessive, or dominant), estimating disease allele frequency, and estimating penetrances (probability of being affected given genotype). At one time, segregation analysis was one of the most important tools for genetic analysis of familial data. In the late 1980s, large numbers of DNA markers became available, which rendered segregation analysis less popular.

    If the trait is monogenic and thus due to a single major gene effect, segregation analysis has proven to be a very effective tool in determining the parameters for mode of inheritance. Subsequently, these parameters have been used in model-based linkage analysis (see below for more detail) to find the location of putative disease-causing genes. This paradigm has been used successfully for the simple Mendelian traits, in which only one gene is segregating. For multifactorial traits, which may be due to the effects of many genes and environmental effects, estimation of the genetic model may be virtually impossible using segregation analysis.

    In order to determine the parameters of the genetic model using segregation analysis, the likelihood of a particular mode of inheritance can be formulated using three types of probability functions. First, there is a probability distribution for segregation of genotypes among the founders (individuals whose parents are not included in the observed pedigree data), where genotypes of the founders are independently drawn from the population based on the prevalence of the disease and mode of inheritance. Second, there is a probability distribution of the segregating genotypes of the non-founders (individuals with both parents in the pedigree) conditional on their parental genotypes. Third, there are penetrance functions (probability of being affected given a particular genotype). To test whether there is a segregation of a single gene, the likelihood under the assumed genetic model is compared with the likelihood under the null model of segregation with no genetic effect. A more complicated or general model could be included for testing a particular mode of inheritance, including polygenic or multifactorial components in modeling the disease. However, the number of possible genetic models with a given mode of inheritance may be too large to make any meaningful inference about the disease model.
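The three probability functions described above can be combined into a likelihood for a simple pedigree. The following is a minimal sketch for a nuclear family and a single diallelic locus, assuming that founder genotypes follow Hardy–Weinberg proportions for a disease allele of frequency q, that transmission is Mendelian, and that phenotypes are independent given genotypes. All function names and parameter values are hypothetical; real segregation analysis software handles general pedigrees, ascertainment, and polygenic components, which this sketch does not.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of copies of the disease allele


def founder_prob(g, q):
    """Hardy-Weinberg genotype probability for a founder (assumption)."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)[g]


def offspring_prob(child_g, father_g, mother_g):
    """Mendelian probability of the child's genotype given both parents."""
    tf, tm = father_g / 2.0, mother_g / 2.0  # P(parent transmits disease allele)
    return {0: (1 - tf) * (1 - tm),
            1: tf * (1 - tm) + (1 - tf) * tm,
            2: tf * tm}[child_g]


def phenotype_prob(affected, g, penetrance):
    """Penetrance function: P(observed affection status | genotype)."""
    return penetrance[g] if affected else 1.0 - penetrance[g]


def nuclear_family_likelihood(father, mother, children, q, penetrance):
    """Likelihood of affection statuses, summing over unobserved genotypes."""
    total = 0.0
    for gf, gm in product(GENOTYPES, repeat=2):
        term = (founder_prob(gf, q) * phenotype_prob(father, gf, penetrance)
                * founder_prob(gm, q) * phenotype_prob(mother, gm, penetrance))
        for child in children:
            term *= sum(offspring_prob(gc, gf, gm)
                        * phenotype_prob(child, gc, penetrance)
                        for gc in GENOTYPES)
        total += term
    return total


# Unaffected parents with one affected child, fully penetrant recessive model.
recessive = nuclear_family_likelihood(False, False, [True],
                                      q=0.5, penetrance=(0.0, 0.0, 1.0))
# Null model with no genetic effect: the same risk for every genotype.
null = nuclear_family_likelihood(False, False, [True],
                                 q=0.5, penetrance=(0.1, 0.1, 0.1))
print(recessive, null)
```

Comparing such likelihoods across families, maximized over q and the penetrances, gives the likelihood-ratio comparison between a single-gene model and the no-genetic-effect model described in the text.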

    12.3.3 Linkage Analysis

    Genetic linkage analysis is based on the observation that any two loci that are in close proximity to each other on the same chromosome will tend to cosegregate among related individuals more often than two random loci in the genome. Thus, affected individuals sharing a genotype at a putative disease locus would be more likely to share a genotype at linked marker loci. In the absence of linkage, the recombination fraction θ (i.e. the proportion of gametes in which two genes on the same parental chromosome are separated; for more details, see http://en.wikipedia.org/wiki/Genetic_linkage) is 1/2; however, if there is linkage, the recombination fraction is less than 1/2.


    Several assumptions are usually made in calculating lod scores: the mode of inheritance underlying the marker and the trait is known, the parameters such as penetrances and allele frequencies at both marker and trait loci are known without error, and all founders are unrelated to each other. Misspecification of any of these assumptions can affect the validity or power of the analysis and can result in an inconsistent estimate of the recombination fraction. Thus, the models used in model-based analysis must approximate the complexity of the disease being investigated. It is noteworthy that incorrect specification of a legitimate model-based linkage test may reduce power but generally does not lead to an inflated Type I error rate (false-positive rate).

    12.3.3.2 Model-Free Linkage Analysis

    The genetic mechanism underlying a complex disease is often unknown, and it is impossible to specify the correct genetic parameters such as mode of inheritance, disease allele frequency, and penetrance in complex diseases. Under these circumstances, model-free linkage analysis, which makes no assumption about the mode of inheritance of the trait, is usually preferred. If a disease susceptibility locus and a marker locus are linked and, by definition, cosegregating in a family, pairs of relatives who are concordant for the disease (i.e. both affected or neither affected) should share more alleles identical-by-descent (IBD⁵) than will an average pair of relatives with the same degree of kinship. Similarly, discordant (affected–unaffected) relative pairs should share fewer alleles IBD than will an average pair of relatives with the same degree of kinship at the disease locus or at a marker linked to the disease locus. Model-free methods were first derived for sib pairs (22,23), but were extended to other relative pairs (24,25). The Haseman–Elston method consists of regressing the squared phenotypic difference between the siblings within sib pairs on the estimated proportion of alleles the sibs share IBD at a marker locus. A negative slope suggests linkage because it indicates that greater similarity at a trait locus tends to occur with greater similarity at a marker locus. There have been a number of extensions to the Haseman–Elston method to increase its power (26–35). In 2003, Human Heredity published a special topic issue titled “Recent Advances in the Analysis of Genetic Traits” celebrating the thirtieth anniversary of the seminal paper by Haseman and Elston for quantitative trait linkage analysis (Human Heredity 2003, Vol. 55, No. 2–3). Several software programs are available to perform linkage analyses, including S.A.G.E. (36), GENEHUNTER (37), LINKAGE (38–40), Mx (http://www.vcu.edu/mx/) and MERLIN (41).
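The Haseman–Elston regression can be sketched with ordinary least squares. The example below regresses the squared sib-pair trait difference on the estimated proportion of alleles shared IBD and returns the slope and intercept. The data are invented for illustration, and the significance test on the slope (usually one-sided, against the alternative of a negative slope) is omitted.

```python
def haseman_elston_fit(sib_pairs):
    """Least-squares fit of squared trait difference on IBD proportion.

    sib_pairs is a list of (trait_sib1, trait_sib2, ibd_proportion) tuples,
    where ibd_proportion is the estimated share of alleles IBD at the marker.
    Returns (slope, intercept); a negative slope is suggestive of linkage.
    """
    x = [ibd for _, _, ibd in sib_pairs]
    y = [(t1 - t2) ** 2 for t1, t2, _ in sib_pairs]
    n = len(sib_pairs)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sib pairs: pairs sharing more alleles IBD differ less.
pairs = [(10, 14, 0.0), (12, 9, 0.5), (11, 13, 0.5), (10, 11, 1.0)]
slope, intercept = haseman_elston_fit(pairs)
print(slope, intercept)  # -15.0 15.0
```

In this toy data set the squared differences are 16, 9, 4, and 1 for IBD proportions 0, 0.5, 0.5, and 1, so the fitted slope is negative, the pattern the method interprets as evidence for linkage.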

    The model-free methods described above mostly utilize relative pairs such as sib pairs. However, large extended families could provide more linkage information than these relative pairs. Methods based on the variance components framework have become a popular choice for linkage analysis because of the ease of modeling covariates and gene–gene or gene–environment interactions, and because they can utilize large extended families in the model-free environment (42–46). There are several programs available to perform linkage analyses based on variance components methodology, such as ACT (43), SOLAR (42), and Mx (http://www.vcu.edu/mx/). Mx uses a structural equation modeling approach, which is equivalent to the other types of variance components approaches under most circumstances. There are many more statistical genetics packages freely available for public use, and a complete list along with download information for each package can be found at http://linkage.rockefeller.edu/soft/ and http://www.soph.uab.edu/ssg/linkage/lddac.

    ⁵ “IBD alleles in an individual or in two people are defined as alleles that are identical because they have both been inherited from the same common ancestor, as opposed to identity-by-state.” From http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.

    12.3.4 Transmission Disequilibrium Test and Association Analysis Using Familial Data

    In classical genetic studies, the identification of a chromosomal region with linkage analysis is the first step in the gene mapping process. Because linkage analysis provides information only at the level of a genomic region, a typical quantitative trait locus (QTL) may cover several million base pairs and may contain hundreds of genes. The initial detection of the QTL is followed by the addition of more markers within the QTL to narrow down the region as much as possible. Once the resolution limit of the linkage approach is reached, the most commonly employed follow-up is to fine map using association analyses with SNPs.

    The rationale for association analyses is to confirm the involvement of a putative allele in a trait of interest. The rationale of fine mapping is that the greater number of SNPs and the greater sensitivity of the association tests provide more detailed information about the target region. The SNPs located close to a disease locus may cosegregate because of linkage disequilibrium, i.e. allelic association due to linkage. This allelic association forms the theoretical basis for association mapping. Allelic association that is not due to linkage disequilibrium is of no interest in mapping disease genes. The simplest way to test for association is to perform a case–control study, where the cases are individuals with the disease and the controls are individuals without the disease. Association is then tested by ascertaining whether a particular marker allele is more frequent among the cases than among the controls. A significant result will be observed if the marker is in linkage disequilibrium with the disease locus, but it can also arise from a variety of confounding factors such as population stratification. Therefore, case–control association studies that do not control for stratification are prone to false-positive results with no biological significance, and for this reason, association studies were not popular until the mid-1990s, when methods to account for this stratification were established using familial data (47).

    To control for population stratification, Spielman et al. (47) proposed the transmission disequilibrium test (TDT), a family-based association test (FBAT) in the presence of linkage. The TDT controls for population stratification by comparing, among informative parental matings (i.e. matings with at least one heterozygous parent), the frequencies of alleles transmitted to an affected offspring with the frequencies of alleles that are not transmitted. This study design requires the collection of family trios that include two parents and an affected offspring. More than 225 extensions and variations of the original TDT have been proposed (see the exhaustive review of TDT procedures in Tiwari et al. (48)). A number of software programs are available for TDT and/or association analyses using family data, such as FBAT (http://www.biostat.harvard.edu/~fbat/fbat.htm), ASSOC (http://darwin.cwru.edu/sage/), and GASSOC (http://mayoresearch.mayo.edu/schaid_lab/software.cfm). A complete list of association programs can be found at http://linkage.rockefeller.edu/ and http://www.soph.uab.edu/ssg.asp?id=1087.
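
    As a minimal sketch of the basic TDT for a single biallelic marker, the statistic can be computed as a McNemar-type chi-square on the counts of transmissions and non-transmissions from heterozygous parents. The function name and the example counts are illustrative assumptions, not data from any study, and none of the extensions reviewed in (48) are covered.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Basic TDT for one marker allele, using heterozygous parents only.

    transmitted:   times the allele was passed to an affected offspring
    untransmitted: times the other allele was passed instead
    Returns the chi-square statistic (1 df) and its p value.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

    Because only transmissions from heterozygous parents enter the statistic, the comparison is made within families, which is what protects the test from population stratification.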

    Once the results from the association analyses are deemed adequate, the next step is to screen the candidate genes for DNA sequence variation by direct sequencing. The relevance of the detected mutations is confirmed with additional association studies in the original and other populations, as well as functional assays in vitro (expression studies in different cell lines) and in vivo (transgenic and knock-out animal models) (49).

    12.4 THE INTERNATIONAL HapMap PROJECT

    In the context presented above, studies progress from estimates of heritability to segregation analysis to linkage and then finally to familial association analysis to determine candidate genes for a trait of interest. However, this paradigm has changed recently. With the advent of high-dimensional genotyping technologies using microarrays, the approach for discovering new genetic variants for a disease or trait has changed drastically. In 1996, Lander proposed the common-disease common-variant (CDCV) hypothesis (50). The CDCV hypothesis is based on the idea that the genetic component of common diseases is attributable in part to common allelic variants (i.e. alleles with a frequency of at least 5%). The HapMap project was initiated to create a dense set of genetic markers to test the CDCV hypothesis. The draft of the complete human genome sequence was completed in 2001 and strongly influenced advances in genome sequencing technology (51). The International HapMap Project (www.hapmap.org) was an international partnership that was formed in 2002 to help researchers find genes associated with human disease by providing a public database of common genome-wide human variation across populations (52–54). The first stage of the HapMap project focused on four diverse populations: 30 trios (two parents and an adult child; 90 individuals) from the Yoruba in Ibadan, Nigeria; 30 trios from the Centre d'Etude du Polymorphisme Humain (CEPH) collection of Utah residents of Northern and Western European ancestry; 45 unrelated individuals from the Han Chinese in Beijing; and 45 unrelated individuals from the Japanese in Tokyo. This project genotyped over 1 million SNPs in Phase I and an additional 2.1 million in Phase II in the HapMap samples (53,54). It helped initiate advances in SNP array technologies that made genome-wide association studies (GWAS) feasible and affordable. Affymetrix and Illumina SNP arrays became available to researchers; these initially surveyed approximately 100,000 SNPs and now survey 2.5 million SNPs. During the most recent phase, HapMap 3, 1184 individuals representing 11 global populations were genotyped for approximately 1.6 million common SNPs (55; http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html; http://www.broadinstitute.org/~debakker/p3.html; http://hapmap.ncbi.nlm.nih.gov/).

    As a complement, the 1000 Genomes Project, which began in 2008, was initiated to provide a catalog of low-frequency SNPs and structural and sequence variants in the human genome (56; http://www.1000genomes.org/).

    12.5 GENOME-WIDE ASSOCIATION STUDIES

    GWAS is an approach that involves scanning thousands to a few million SNPs across the whole genome in many individuals to find association with a disease or trait. As mentioned earlier, GWAS became a popular choice of genetic study to detect putative loci associated with a disease or trait because of the availability of high-throughput SNP arrays, the decreased cost of genotyping, and methods to correct for population stratification (i.e. systematic differences in allele frequencies between subpopulations in a given population, possibly because of different ancestry). Before the HapMap era, investigators were reluctant to conduct association studies using population data because of concerns about population stratification. For example, in case-control studies, we usually test association of a particular SNP by comparing allele frequency between cases and controls. Allele frequencies are known to vary within and between populations depending on genetic ancestry (53,57). Genetic ancestry becomes a confounding variable leading to spurious associations if allele frequencies are different within or between race/ethnic groups. Methods for correcting population substructure are described later.

    12.5.1 Study Designs

    Pedigree, case-control, and population data sets are all appropriate choices for GWAS, but one has to adjust for familial correlations in pedigree data and for population stratification in population or case-control data sets to control for the confounding due to relatedness or population substructure. Case-control and population data have been commonly used for GWAS because of availability and convenience of ascertainment. There are some issues associated with the case-control design. If the disease is heterogeneous, extra attention should be paid to minimizing heterogeneity in case selection, e.g. by selecting the most extreme cases or selecting individuals from a familial disease cohort. There has been controversy over how to select optimal controls. Usually, controls from the same population and residing in the same geographic area are preferred, but these can be difficult to ascertain. The Wellcome Trust Case Control Consortium used 3000 UK controls and 2000 cases from each of seven different diseases and showed that using common controls was effective, had minimal impact on genotypic distributions, and did not lead to excess false positives (58,59). Misclassification error in control selection could affect the power of the association analysis. This is especially true for late-onset diseases, because some controls have not yet reached the age at which the disease develops; this issue can be resolved by increasing the sample size (58). Population stratification and cryptic relatedness (i.e. relatedness among the individuals in the study that is not known to the investigator) can also increase false-positive findings, as previously discussed. Family-based association studies are robust to population stratification, but it is difficult to ascertain all pedigree members, which leads to missing data within families and loss of power compared to case-control designs (60). There are other issues with study design selection; an excellent review is provided by McCarthy et al. (58).

    12.5.2 Quality Control

    The first step of GWAS analysis is the quality control (QC) of the genotypic and phenotypic data. A number of procedures are needed to ensure the quality of genotype data, both at the genotyping laboratory and after calling genotypes using statistical approaches. Here, we assume that the genotyping laboratory has used best practices to remove technical variation, and we present only the statistical methods that are used after completion of the genotyping. The QC and association analysis of GWAS data can be performed using the robust, freely available, and open source software PLINK developed by Purcell et al. (61). Two recent publications provide excellent reviews of the QC protocol for GWAS data (62,63). Here, we describe a few important QC steps in GWAS, following guidelines similar to those in Laurie et al. (62) and Turner et al. (63). Note that current genotyping technology is very reliable, but errors are still possible when genotyping large numbers of SNPs.

    12.5.3 Sex Inconsistency

    It is possible that the self-reported sex of an individual is incorrect. Sex inconsistency can be checked by comparing the reported sex of each individual with the sex predicted empirically from the heterozygosity of X-chromosome markers.
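
    The check can be sketched as follows, under the assumption that the markers lie outside the pseudoautosomal regions, so that males (with one X chromosome) should show essentially no heterozygous calls. The function names and both thresholds are hypothetical choices for illustration; real pipelines tune such cut-offs to the data.

```python
def x_heterozygosity(genotypes):
    """Fraction of called X-chromosome genotypes that are heterozygous.

    genotypes: list of (allele1, allele2) tuples; None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def predict_sex(genotypes, max_male_het=0.02, min_female_het=0.10):
    """Predict sex from X heterozygosity (thresholds are assumptions)."""
    rate = x_heterozygosity(genotypes)
    if rate <= max_male_het:
        return "male"
    if rate >= min_female_het:
        return "female"
    return "ambiguous"


def sex_mismatch(reported, genotypes):
    """True when the prediction is unambiguous and contradicts the report."""
    predicted = predict_sex(genotypes)
    return predicted != "ambiguous" and predicted != reported


# Toy sample with no heterozygous X calls but reported as female.
sample = [("A", "A")] * 50
print(predict_sex(sample), sex_mismatch("female", sample))
```

    Samples flagged in this way are candidates for a sample mix-up or a data entry error and are usually reviewed before being corrected or excluded.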

    12.5.4 Relatedness and Mendelian Errors

    Other errors that can occur in genotyping are due to sample mix-up, cryptic relatedness, duplications, and pedigree errors such as self-reported relationships that are not accurate. To detect sample relatedness, one can calculate, for each pair of individuals, the three probabilities of sharing 0, 1, and 2 alleles identical-by-descent (IBD), using software such as PLINK and a kinship coefficient matrix. Individuals sharing zero alleles IBD at every locus are unrelated, individuals sharing one allele IBD at every locus are parent–offspring pairs, individuals sharing two alleles IBD at every locus are MZ twins or a duplicated sample, and, on average, sib pairs share 0, 1, and 2 alleles IBD with probabilities 0.25, 0.5, and 0.25, respectively. Relationship errors can be corrected by consulting the self-reported relationships and/or using inferred genetic relationships. Cryptic relatedness can inflate the variance of the test statistic (e.g. if the test statistic is the difference in the overall allele counts between case and control samples in a trend test (64)). The presence of cryptic relatedness in case-control studies increases the false positives in association analysis. Devlin and Roeder provided a method to correct for the variance inflation (see References (64) and (65) for details).
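
    The following sketch shows how the three estimated IBD sharing probabilities for a pair might be turned into a relationship call. It assumes the probabilities have already been estimated (for example by PLINK); the function names, the tolerance, and the labels are illustrative assumptions.

```python
def proportion_ibd(z0, z1, z2):
    """Expected proportion of alleles shared IBD: P(IBD=1)/2 + P(IBD=2)."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities must sum to 1")
    return 0.5 * z1 + z2


def classify_pair(z0, z1, z2, tol=0.1):
    """Coarse relationship call from IBD probabilities (tol is an assumption)."""
    pi_hat = proportion_ibd(z0, z1, z2)
    if z2 > 1 - tol:
        return "duplicate or MZ twin"
    if z1 > 1 - tol:
        return "parent-offspring"
    # Sibs and parent-offspring pairs both share half their alleles on
    # average; sibs are distinguished by a nonzero P(IBD=0).
    if abs(pi_hat - 0.5) <= tol and z0 > tol:
        return "full sibling"
    if pi_hat < tol:
        return "unrelated"
    return "other relationship"


for probs in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.25, 0.5, 0.25), (0.0, 0.0, 1.0)]:
    print(probs, classify_pair(*probs))
```

    Comparing such calls with the self-reported pedigree highlights pairs whose recorded relationship is inconsistent with the genotype data.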

    12.5.5 Batch Effects

    For GWAS, samples are processed together for genotyping in a batch. The size and composition of the sample batch depend on the type of commercial array; for example, an Affymetrix array can genotype up to 96 samples, and an Illumina array can genotype up to 24 samples. To minimize batch effects, samples should be randomly assigned to plates so that each plate contains a mix of phenotypes, sexes, races, and ethnicities. Otherwise, the downstream association study can be confounded by batch effects. Several methods are available to detect batch effects. The most commonly used method is to compare the average minor allele frequencies (MAFs) and average genotyping call rates across all SNPs for each plate. Most genotyping laboratories perform batch effect detection and usually regenotype the samples if there is a batch effect, or discard a plate when it has a large amount of missing data.

    12.5.6 Marker and Sample Genotyping Efficiency or Call Rate

    Marker genotyping efficiency is defined as the proportion of samples with a genotype call for each marker. If a large number of samples are not called for a particular marker, that is an indication of a poor assay, and the marker should be removed from further analysis. The threshold for removing markers varies from study to study depending on the sample size of the study. Usually recommended call rates are 98–99%. If the quality of a DNA sample is poor, it leads to a low genotype call rate for the individual, i.e. the number of missing genotypes will be large, and the sample should be excluded from further analysis. Before performing the association analysis, one should filter out samples and markers using thresholds for marker and sample call rates.
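
    A minimal sketch of the call-rate filter is given below. It assumes genotypes are stored as a samples-by-markers list of lists with None for a missing call; the function names and the toy data are hypothetical, and the filter is applied in a single pass, whereas in practice call rates are often recomputed after each round of exclusions.

```python
def call_rates(genotypes):
    """Return (per-sample, per-marker) call rates.

    genotypes[i][j] is the genotype of sample i at marker j, None if missing.
    """
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    sample_rates = [
        sum(g is not None for g in row) / n_markers for row in genotypes
    ]
    marker_rates = [
        sum(row[j] is not None for row in genotypes) / n_samples
        for j in range(n_markers)
    ]
    return sample_rates, marker_rates


def filter_by_call_rate(genotypes, min_sample=0.98, min_marker=0.98):
    """Indices of samples and markers that pass the call-rate thresholds."""
    sample_rates, marker_rates = call_rates(genotypes)
    keep_samples = [i for i, r in enumerate(sample_rates) if r >= min_sample]
    keep_markers = [j for j, r in enumerate(marker_rates) if r >= min_marker]
    return keep_samples, keep_markers


# Toy data: 4 samples x 5 markers, genotypes coded as minor-allele counts.
toy = [
    [0, 1, 2, None, 0],
    [1, 1, 0, None, 2],
    [None, None, 1, None, None],
    [2, 0, 0, 1, 1],
]
# Lenient thresholds are used only because the toy data set is tiny.
keep_samples, keep_markers = filter_by_call_rate(toy, 0.75, 0.75)
print(keep_samples, keep_markers)
```

    In this toy example the third sample and the fourth marker are dropped because most of their genotypes are missing.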

    12.5.7 Population Stratification

    A number of methods have been proposed to correct for population substructure. Three commonly used methods to correct for the underlying variation in allele frequencies that leads to confounding due to population stratification are genomic control (4,64–71), structured association testing (72–74), and principal components (PCs) (75,76). The genomic control method controls for confounding due to substructure by estimating an inflation factor (the ratio of the variance of the test statistic to its variance under the null hypothesis), then adjusting the test statistics for all markers in the GWAS downward by the inflation factor. Usually, the inflation factor is calculated using a few hundred loci. Structured association testing (72,77) (1) estimates the ancestry proportions of each individual from the founding populations, using markers with different allele frequencies in the founder populations; and (2) uses these proportions to cluster individuals into homogeneous groups with similar ancestry profiles for the association analysis. Principal components analysis (PCA) uses thousands of markers to detect population stratification, and the PCs can then be used to correct for stratification by including them as covariates in the model (75,76). PCs can be calculated using the program Eigenstrat (75,76). There are two issues with using PCA: (1) how many SNPs to use, and (2) how many PCs should be included as covariates in the association analysis.
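
    The genomic control adjustment can be sketched as below. As an assumption, the inflation factor is estimated here with the widely used median-based estimator (the median of the observed 1-df chi-square statistics divided by the median of the null chi-square distribution) rather than a direct variance ratio; the function names and the toy statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Median-based estimate of the genomic control inflation factor."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention the statistics are never inflated, only deflated.
    return max(lam, 1.0)


def genomic_control(chi2_stats):
    """Divide every 1-df chi-square statistic by the inflation factor."""
    lam = inflation_factor(chi2_stats)
    return [x / lam for x in chi2_stats]


# Toy statistics whose median is twice the null median, so lambda = 2.
stats = [0.2, 0.6, 0.9098728, 1.5, 12.0]
lam = inflation_factor(stats)
adjusted = genomic_control(stats)
print(f"lambda = {lam:.2f}, adjusted = {[round(x, 2) for x in adjusted]}")
```

    The median is preferred in this sketch because it is insensitive to the few markers that are truly associated with the trait and would otherwise pull the estimate upward.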

    12.5.8 Marker Allele Frequency and Hardy–Weinberg Equilibrium Filter

    The Hardy–Weinberg equilibrium (HWE) test compares the observed genotypic proportions at a marker with the proportions expected under HWE. Deviation from HWE at a marker locus can be due to population stratification, inbreeding, selection, nonrandom mating, genotyping error, actual association with the disease or trait under study, or a deletion or duplication polymorphism. However, the HWE test is typically used to detect genotyping errors. SNPs that do not meet HWE at a certain threshold of significance are usually excluded from further association analysis. It is also important to discard SNPs based on MAF. Most GWAS are powered to detect a disease association with common SNPs (MAF ≥ 0.05). Rare SNPs may lead to spurious results because of the small number of homozygotes for the minor allele, genotyping errors, or population stratification.
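
    As an illustration, the sketch below computes the MAF and a 1-df chi-square goodness-of-fit test of HWE from the three genotype counts at a biallelic SNP. The function name and the example counts are hypothetical, and exact tests, which are often preferred when counts are small, are not shown.

```python
import math


def hwe_test(n_aa, n_ab, n_bb):
    """Return (MAF, chi-square, p value) for a biallelic marker.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )
    # Chi-square survival function with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return min(p, q), chi2, p_value


# Hypothetical counts with a deficit of heterozygotes.
maf, chi2, p_value = hwe_test(400, 400, 200)
print(f"MAF = {maf:.2f}, chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

    A marker would then be excluded if its p value falls below the chosen HWE threshold or if its MAF is below the study's cut-off.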

    12.6 ASSOCIATION METHODS/STATISTICAL ANALYSIS

    12.6.1 Discovery Phase of the GWAS

    The discovery phase consists of scanning the genome in at least 1000 cases and 1000 controls with an appropriate statistical test. The choice of the statistical test for association depends on the study design and the phenotype under consideration. In the case-control design, the goal is to compare the allele or genotypic frequencies between cases (affected) and controls (normal). This can be tested with Pearson's chi-square test, Fisher's exact test, or the Cochran–Armitage test. Pearson's chi-square test tests the null hypothesis of no association between the rows and columns of the 2 × 3 contingency table consisting of the counts of the three genotypes among cases and controls (78). Fisher's exact test is similar to Pearson's test, but the deviation from the null hypothesis is calculated exactly from all possible permutations of the data, and thus it does not rely on the asymptotic approximation that Pearson's test does (79). The Cochran–Armitage test for trend is a test of proportions of cases versus controls (80–82) and assumes an additive mode of inheritance, that is, a linear trend. There is a loss of power if the trend is not linear. Freidlin et al. (83) recommended using the maximum of the test statistics obtained from additive, dominant, and recessive effects models. Note that in the above statistical procedures, one cannot model covariates such as sex, age, race, age of onset, PCs (e.g. PCs reflecting admixture), and so on. To accommodate any relevant covariates in the analysis, one can use logistic regression. Logistic regression is more flexible in that it can model covariates, multiple SNPs as main effects, SNP by SNP interactions, SNP by environment interactions, and so on. If the phenotype is continuous, analysis of variance and general linear model approaches can be employed. One can also use a linear regression framework if extremes of the distribution are used to define case and control status. Huang and Lin have given an efficient association method using extreme phenotypes (84).
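
    To make the trend test concrete, the sketch below computes the Cochran–Armitage statistic for a 2 × 3 table of genotype counts using additive weights (0, 1, 2). The function name and the example counts are illustrative assumptions; covariate adjustment, which requires logistic regression, is not shown.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: counts of the three genotypes, ordered by the number
    of copies of the tested allele. Returns (chi-square with 1 df, p value).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * n for t, n in zip(weights, col))
    sum_t2n = sum(t * t * n for t, n in zip(weights, col))
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    if denominator == 0:
        raise ValueError("table has no variation in genotype or status")
    chi2 = N * (N * sum_tr - R * sum_tn) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for genotypes with 0, 1, and 2 copies of the allele.
chi2, p_value = cochran_armitage_trend(cases=(30, 50, 20), controls=(50, 40, 10))
print(f"trend chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

    Changing the weights to (0, 1, 1) or (0, 0, 1) gives the dominant and recessive versions of the test, which is how a maximum over the three models could be formed.
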
    The analysis of familial data requires correcting for the dependency of observations. Notable methods include the linear mixed model (85–87), FBAT (see the review by Laird and Lange (88)), and ASSOC (a module of the S.A.G.E. software suite (36)). After scanning 1–2 million SNPs (i.e. 1–2 million statistical tests) to determine significant associations, an appropriate multiple testing correction is required to control for false positives and to choose SNPs for follow-up studies. The guideline for significant association in GWAS is generally a p value of ~5 × 10⁻⁸ (60,89), but it is common practice to use a less stringent (higher) p value threshold for selecting SNPs for a follow-up study or replication. Balding (90) provides a comprehensive discussion of the advantages and disadvantages of these methods pertaining to GWAS.
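
    One way to see where a threshold of this magnitude comes from is a Bonferroni correction: dividing a family-wise error rate of 0.05 by one million tests gives 5 × 10⁻⁸. The sketch below assumes that simple correction; the function names and SNP identifiers are hypothetical, and other multiple testing procedures are not shown.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value cut-off that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def genome_wide_hits(p_values, alpha=0.05, n_tests=1_000_000):
    """Return the SNPs whose p value passes the Bonferroni cut-off."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return {snp: p for snp, p in p_values.items() if p < cutoff}


threshold = bonferroni_threshold(0.05, 1_000_000)
hits = genome_wide_hits({"snp_a": 1e-9, "snp_b": 1e-6})
print(f"threshold = {threshold:.1e}, hits = {sorted(hits)}")
```

    Because neighbouring SNPs are correlated through linkage disequilibrium, the Bonferroni correction is conservative, which is one reason looser thresholds are tolerated when selecting SNPs for replication.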

    12.6.2 Validation and Replication Phase

    Some investigators have recommended reanalysis of the original discovery phase GWAS data using a different genotyping platform for validation, which has been termed technical validation (58). Technical validation allows detection of technical errors in genotyping that might give rise to spurious association signals or false positives, but given the limited resources available to investigators, it may not be feasible. The replication phase, or follow-up study, is one of the most challenging aspects of GWAS and is required to control for false positives. Replication in an independent data set with a similar genetic background and phenotype is warranted. Usually several hundred or a few thousand SNPs are tested in a replication set, depending on the p value threshold used for significant association. The statistical methods are the same as in the discovery phase, depending on the study design and type of phenotype.

    12.7 ANALYSIS OF RARE VARIANTS USING NEW TECHNOLOGIES

    The introduction of the HapMap project and large-scale GWAS were driven by the common disease, common variant (CDCV) hypothesis that was first introduced in the 1990s (50,91). The CDCV hypothesis is based on the idea that the genetic component of common diseases is attributable in part to allelic variants that are present in more than 5% of the population. An extension of this hypothesis is that the same variants will be responsible for the disease across multiple populations (92). The early success of GWAS (e.g. in age-related macular degeneration) seemed to support the theory that a large proportion of the genetic variation underlying complex disease could be explained by the CDCV hypothesis. It is now becoming apparent that many common variants individually confer only a small portion of risk and also explain only a small portion of the heritability of common complex diseases (93). While GWAS have been successful in many ways, identifying hundreds of variants for a large number of traits (http://www.genome.gov/26525384), a large proportion of heritability has yet to be explained.

    When the CDCV hypothesis was first introduced, it was not without contention (94). One of the strongest counterarguments was based on the common disease, rare variant (CDRV) hypothesis, which is in essence the antithesis of the CDCV hypothesis (77,95). The rare variant hypothesis proposes that common complex diseases are due to the combined effect of multiple rare variants with moderate to low individual risk. Unlike under CDCV, it is generally thought that, owing to population history, these rare variants will be population specific (96). It is only recently, with the availability of affordable large-scale sequencing technology and advances in analytical methods (discussed below), that scientists have gained the ability to address the role of CDRV in human disease. In reality, it is likely that the genetic basis of complex disease lies somewhere between the two extremes, with multiple genes interacting together with a variety of common and rare variants and other genetic and environmental factors (97).

    As new high-throughput, massively parallel sequencing technologies emerged in 2005, direct sequencing became commonly used to interrogate whole genomic sequences for association with disease without prior specification of the SNPs currently available on commercial SNP chips (98). Such technologies overcome some of the shortcomings of GWAS methods, such as ascertainment bias in the set of currently available SNPs and the inability to assay rare or private variants. In addition, greater flexibility exists in the search for variants other than SNPs, such as copy number variants, insertions or deletions (indels), inversions, and so on. Whole exome sequencing, in which only the exon sequences are assayed, has been used to discover causal mutations in a number of Mendelian disorders such as Miller syndrome (99) and hereditary spastic paraparesis (100). Because of the enormous number of variants introduced by new sequencing technologies and the small sample sizes typically available, new bioinformatic and statistical methods have been developed to reduce the dimensionality and improve the probability of detecting causal variants. Prior bioinformatic processing may include filtering by IBD methods if family data are present (101), or filtering based on the expected mode of inheritance (99). In addition, if only rare variants are desired, common variants can be filtered out using dbSNP (102), and predicted functional variants (nonsense, missense, splice site variants, indels, frameshift mutations, etc.) can be discerned using tools such as SIFT and PolyPhen (102,103). Once likely nonfunctional variants have been filtered out, new statistical methods for summarizing the effect of multiple rare variants at a single gene can be applied.
    Some examples of these methods include the cohort allelic sums test, which compares the number of individuals with mutations within a gene between cases and controls (104); the combined multivariate and collapsing method (105), which collapses multiple rare variants and analyzes them in conjunction with common variants using multivariate analysis; methods that weight the counts of each variant using the estimated standard deviation of the total number of mutations (106,107); and a method that models these weights in a flexible Bayesian framework (108). A review of recent methods is given in reference (109). Whole-exome and whole-genome studies (in which the contribution of noncoding regions to disease can be assayed) are currently underway for complex (multifactorial) diseases. The next couple of years will show whether these technologies can help to fill in the gaps from GWAS, termed "missing heritability" (93), and to identify causal variants underlying multifactorial diseases. In addition, new sequencing technologies offer opportunities for functional characterization studies such as gene expression profiling using next generation sequencing (110), epigenetic profiling (111), and identification of somatic mutations occurring in cancer (112–114).
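
    The collapsing idea behind the cohort allelic sums test can be sketched as follows: each individual is scored as a carrier if he or she has at least one rare variant in the gene, and carrier counts are compared between cases and controls. Using a two-sided Fisher's exact test for that comparison is an assumption made here for illustration, as are the function names and the toy data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def collapsing_test(case_genotypes, control_genotypes):
    """Compare rare-variant carrier counts in a gene between cases and controls.

    Each argument is a list with one entry per individual; each entry is a
    list of rare-allele counts at the variants within the gene.
    """
    def carriers(group):
        return sum(1 for person in group if any(count > 0 for count in person))

    a, c = carriers(case_genotypes), carriers(control_genotypes)
    b, d = len(case_genotypes) - a, len(control_genotypes) - c
    return a, c, fisher_exact_two_sided(a, b, c, d)


# Toy gene with three rare variants: 6 of 10 cases and 1 of 10 controls carry one.
cases = [[1, 0, 0]] * 6 + [[0, 0, 0]] * 4
controls = [[0, 1, 0]] + [[0, 0, 0]] * 9
case_carriers, control_carriers, p_value = collapsing_test(cases, controls)
print(case_carriers, control_carriers, round(p_value, 4))
```

    Collapsing in this way trades per-variant resolution for power, since individual rare variants are seen too seldom to be tested one at a time.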

    12.8 INTEGRATION OF GENETIC, GENOMIC, AND FUNCTIONAL DATA FOR ANALYSIS OF MULTIFACTORIAL DISEASES

    Genes do not work in isolation, but instead work in concert to carry out complex biological functions (115). The field of systems biology is an emerging area of research that aims to study these complex interactions (116–118). Following the central dogma of molecular biology (DNA → RNA → protein), the ability to integrate genetic data with other types of information (e.g. gene expression, protein interaction, etc.) is essential to our ability to understand the underlying complex etiology. A systems biology approach to complex diseases is possible in part because of the ever increasing amount of "omics" data (e.g. genomics, proteomics, metabolomics, etc.) that is becoming available in public databases, the scientific literature, and elsewhere. Many methods have been developed, or are being developed, to utilize this wealth of data to help prioritize and interpret the results of genetic association studies. One of the most popular types of methods is pathway analysis, often referred to as gene set analysis, which can be broadly defined as methods that use preexisting information about SNPs, genes, transcripts, or other genomic information to interrogate subsets (e.g. pathways) of genomic variables as they relate to a complex trait or disease. These methods were first developed for genome-wide gene expression studies, but have recently been applied to GWAS and will have great utility in sequencing-based studies. These methods typically examine test statistics to determine whether the members of a group of genes are enriched for association with a trait (e.g. gene set enrichment analysis (119,120)), or to test whether the group itself is associated with the trait (e.g. gene set ridge regression in association studies (121,122)). For additional information on these methods, see references (123–129). Inclusion of prior network knowledge or de novo network discovery in conjunction with pathway methods has also been shown to further improve genomic data discoveries, with significant examples in cancer (130–132).

    12.9 CONCLUSIONS

    Genetic modeling is a challenging art and science. Advances in molecular technology, statistical methodology,⁶ and the increasing availability of large samples allow many new investigations to be undertaken on unprecedented scales. Interpretation of the resulting findings remains both difficult and one of the more exciting challenges facing today's biomedical researchers.

    ⁶ A steady stream of videos offering tutelage on these advances can be freely seen at: http://www.soph.uab.edu/ssg/courses/ssgseminars.

    REFERENCES

    1. Badano, J. L.; Katsanis, N. Beyond Mendel: An Evolving View of Human Genetic Disease Transmission. Nat. Rev. Genet. 2002 Oct, 3 (10), 779–789.

    2. Fisher, R. A. The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans. R. Soc. Edinb. 1918, 52, 399–433.

    3. Falconer, D. S. The Inheritance of Liability to Certain Diseases, Estimated from the Incidence among Relatives. Ann. Hum. Genet. 1965 Aug, 29, 51–76.

    4. Reich, T.; James, J. W.; Morris, C. A. The Use of Multiple Thresholds in Determining the Mode of Transmission of Semi-Continuous Traits. Ann. Hum. Genet. 1972 Nov, 36 (2), 163–184.

    5. Chakraborty, R. The Inheritance of Pyloric Stenosis Explained by a Multifactorial Threshold Model with Sex Dimorphism for Liability. Genet. Epidemiol. 1986, 3 (1), 1–15.

    6. Dronamraju, K. R.; Bixler, D.; Majumder, P. P. Fetal Mortality Associated with Cleft Lip and Cleft Palate. Johns Hopkins Med. J. 1982 Dec, 151 (6), 287–289.

    7. Dronamraju, K. R.; Bixler, D. Fetal Mortality in Oral Cleft Families (IV): The Doubling Effect. Clin. Genet. 1983 Jul, 24 (1), 22–25.

    8. Elston, R. C.; Boklage, C. E. An Examination of Fundamental Assumptions of the Twin Method. Prog. Clin. Biol. Res. 1978, 24A, 189–199.

    9. Hopper, J. L. Twin Concordance. In Encyclopedia of Biostatistics; John Wiley: New York, 1998; Vol. 6, pp 4626–4629.

    10. Karlin, S.; Cameron, E. C.; Williams, P. T. Sibling and Parent–Offspring Correlation Estimation with Variable Family Size. Proc. Natl. Acad. Sci. U.S.A. 1981 May, 78 (5), 2664–2668.

    11. Neale, M. C. Adoption Studies. In Encyclopedia of Biostatistics; John Wiley: New York, 1998; Vol. 1, pp 77–81.

    12. Neale, M. C.; Cardon, L. R. Methodology for Genetic Studies of Twins and Families; Kluwer: London, 1992.

    13. Carey, G. Sibling Imitation and Contrast Effects. Behav. Genet. 1986 May, 16 (3), 319–341.

    14. Lykken, D. T.; McGue, M.; Tellegen, A.; Bouchard, T. J., Jr. Emergenesis. Genetic Traits That May Not Run in Families. Am. Psychol. 1992 Dec, 47 (12), 1565–1577.

    15. Risch, N. Linkage Strategies for Genetically Complex Traits. I. Multilocus Models. Am. J. Hum. Genet. 1990 Feb, 46 (2), 222–228.

    16. Olson, J. M.; Cordell, H. J. Ascertainment Bias in the Estimation of Sibling Genetic Risk Parameters. Genet. Epidemiol. 2000 Mar, 18 (3), 217–235.

    17. Visscher, P. M.; Medland, S. E.; Ferreira, M. A.; Morley, K. I.; Zhu, G.; Cornes, B. K.; Montgomery, G. W.; Martin, N. G. Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. PLoS Genet. 2006 Mar, 2 (3), e41, Epub 2006 Mar 24.

    18. Morton, N. E. Sequential Tests for the Detection of Linkage. Am. J. Hum. Genet. 1955 Sep, 7 (3), 277–318.

    19. Elston, R. C.; Stewart, J. A General Model for the Genetic Analysis of Pedigree Data. Hum. Hered. 1971, 21 (6), 523–542.

    20. Elston, R. C.; Rao, D. C. Statistical Modeling and Analysis in Human Genetics. Annu. Rev. Biophys. Bioeng. 1978, 7, 253–286, Review.

    21. Lander, E. S.; Green, P. Construction of Multilocus Genetic Linkage Maps in Humans. Proc. Natl. Acad. Sci. U.S.A. 1987 Apr, 84 (8), 23632367.

    22. Haseman, J. K.; Elston, R. C. The Investigation of Linkage between a Quantitative Trait and a Marker Locus. Behav. Genet. 1972 Mar, 2 (1), 319.

    23. Penrose, L. S. The Detection of Autosomal Linkage in Data Which Consist of Pairs of Brothers and Sisters of Unspecified Parentage. Ann. Eugen. 1935, 6, 133138.

    24. Amos, C. I.; Elston, R. C.; Wilson, A. F.; Bailey-Wilson, J. E. A More Powerful Robust Sib-Pair Test of Linkage for Quan-titative Traits. Genet. Epidemiol. 1989, 6 (3), 435449.

    25. Olson, J. M.; Wijsman, E. M. Linkage between Quantitative Trait and Marker Loci: Methods Using All Relative Pairs. Genet. Epidemiol. 1993, 10 (2), 87102.

    26. Drigalenko, E. How Sib Pairs Reveal Linkage. Am. J. Hum. Genet. 1998 Oct, 63 (4), 12421245.

    27. Forrest, W. F. Weighting Improves the New HasemanElston Method. Hum. Hered. 2001, 52 (1), 4754.

    28. Gerhard, D.; Hothorn, L. A. Rank Transformation in HasemanElston Regression Using Scores for Location-Scale Alternatives. Hum. Hered. 2010, 69 (3), 143151.

    29. Sham, P. C.; Purcell, S. Equivalence between HasemanElston and Variance-Components Linkage Analyses for Sib Pairs. Am. J. Hum. Genet. 2001 Jun, 68 (6), 15271532.

    30. Sham, P. C.; Purcell, S.; Cherny, S. S.; Abecasis, G. R. Power-ful Regression-Based Quantitative-Trait Linkage Analysis of General Pedigrees. Am. J. Hum. Genet. 2002 Aug, 71 (2), 238253.

    31. Shete, S.; Jacobs, K. B.; Elston, R. C. Adding Further Power to the Haseman and Elston Method for Detecting Linkage in Larger Sibships: Weighting Sums and Differences. Hum. Hered. 2003, 55 (23), 7985.

    32. Visscher, P. M.; Hopper, J. L. Power of Regression and Max-imum Likelihood Methods to Map QTL from Sib-Pair and DZ Twin Data. Ann. Hum. Genet. 2001 Nov, 65 (Pt 6), 583601.

    33. Wang, T.; Elston, R. C. A Modified Revisited HasemanElston Method to Further Improve Power. Hum. Hered. 2004, 57 (2), 109116.

    34. Wright, F. A. The Phenotypic Difference Discards Sib-Pair QTL Linkage Information. Am. J. Hum. Genet. 1997 Mar, 60 (3), 740742.

    35. Xu, X.; Weiss, S.; Xu, X.; Wei, L. J. A Unified HasemanElston Method for Testing Linkage with Quantitative Traits. Am. J. Hum. Genet. 2000 Oct, 67 (4), 10251028.

    36. S.A.G.E. 6.x [2010]. Statistical Analysis for Genetic Epidemiology, http://darwin.cwru.edu/sage/.

    37. Kruglyak, L.; Daly, M. J.; Reeve-Daly, M. P.; Lander, E. S. Parametric and Nonparametric Linkage Analysis: A Unified Multipoint Approach. Am. J. Hum. Genet. 1996, 58, 1347–1363.

    38. Lathrop, G. M.; Lalouel, J. M. Easy Calculations of Lod Scores and Genetic Risks on Small Computers. Am. J. Hum. Genet. 1984 Mar, 36, 460–465.

    39. Lathrop, G. M.; Lalouel, J. M.; Julier, C.; Ott, J. Strategies for Multilocus Linkage Analysis in Humans. Proc. Natl. Acad. Sci. U.S.A. 1984 Jun, 81, 3443–3446.

    40. Lathrop, G. M.; Lalouel, J. M.; White, R. L. Construction of Human Linkage Maps: Likelihood Calculations for Multilocus Linkage Analysis. Genet. Epidemiol. 1986 Feb, 3, 39–52.

    41. Abecasis, G. R.; Cherny, S. S.; Cookson, W. O.; Cardon, L. R. Merlin – Rapid Analysis of Dense Genetic Maps Using Sparse Gene Flow Trees. Nat. Genet. 2002 Jan, 30 (1), 97–101.

    42. Almasy, L.; Blangero, J. Multipoint Quantitative-Trait Linkage Analysis in General Pedigrees. Am. J. Hum. Genet. 1998 May, 62 (5), 1198–1211.

    43. Amos, C. I. Robust Variance-Components Approach for Assessing Genetic Linkage in Pedigrees. Am. J. Hum. Genet. 1994 Mar, 54 (3), 535–543.

    44. Amos, C. I.; Zhu, D. K.; Boerwinkle, E. Assessing Genetic Linkage and Association with Robust Components of Variance Approaches. Ann. Hum. Genet. 1996 Mar, 60 (Pt 2), 143–160.

    45. Goldgar, D. E. Multipoint Analysis of Human Quantitative Genetic Variation. Am. J. Hum. Genet. 1990 Dec, 47 (6), 957–967.

    46. Schork, N. J. Extended Multipoint Identity-by-Descent Analysis of Human Quantitative Traits: Efficiency, Power, and Modeling Considerations. Am. J. Hum. Genet. 1993 Dec, 53 (6), 1306–1319.

    47. Spielman, R. S.; McGinnis, R. E.; Ewens, W. J. Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-Dependent Diabetes Mellitus (IDDM). Am. J. Hum. Genet. 1993 Mar, 52 (3), 506–516.

    48. Tiwari, H. K.; Barnholtz-Sloan, J.; Wineinger, N.; Padilla, M. A.; Vaughan, L. K.; Allison, D. B. Review and Evaluation of Methods Correcting for Population Stratification with a Focus on Underlying Statistical Principles. Hum. Hered. 2008, 66 (2), 67–86.

    49. Page, G. P.; George, V.; Go, R. C.; Page, P. Z.; Allison, D. B. Are We There Yet?: Deciding When One Has Demonstrated Specific Genetic Causation in Complex Diseases and Quantitative Traits. Am. J. Hum. Genet. 2003 Oct, 73 (4), 711–719, Review.

    50. Lander, E. S. The New Genomics: Global Views of Biology. Science 1996 Oct 25, 274 (5287), 536–539.

    51. Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M. C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W., et al.; International Human Genome Sequencing Consortium. Initial Sequencing and Analysis of the Human Genome. Nature 2001 Feb 15, 409 (6822), 860–921.

    52. International HapMap Consortium. The International HapMap Project. Nature 2003 Dec 18, 426 (6968), 789–796.

    53. International HapMap Consortium. A Haplotype Map of the Human Genome. Nature 2005 Oct 27, 437 (7063), 1299–1320.

    54. International HapMap Consortium; Frazer, K. A.; Ballinger, D. G.; Cox, D. R.; Hinds, D. A.; Stuve, L. L.; Gibbs, R. A.; Belmont, J. W.; Boudreau, A.; Hardenbol, P., et al. A Second Generation Human Haplotype Map of Over 3.1 Million SNPs. Nature 2007 Oct 18, 449 (7164), 851–861.

    55. International HapMap 3 Consortium; Altshuler, D. M.; Gibbs, R. A.; Peltonen, L.; Altshuler, D. M.; Gibbs, R. A.; Peltonen, L.; Dermitzakis, E.; Schaffner, S. F.; Yu, F., et al. Integrating Common and Rare Genetic Variation in Diverse Human Populations. Nature 2010 Sep 2, 467 (7311), 52–58.

    56. 1000 Genomes Project Consortium. A Map of Human Genome Variation from Population-Scale Sequencing. Nature 2010 Oct 28, 467 (7319), 1061–1073, Erratum in: Nature 2011 May 26, 473 (7348), 544.

    57. Stephens, J. C.; Schneider, J. A.; Tanguay, D. A.; Choi, J.; Acharya, T.; Stanley, S. E.; Jiang, R.; Messer, C. J.; Chew, A.; Han, J. H., et al. Haplotype Variation and Linkage Disequilibrium in 313 Human Genes. Science 2001 Jul 20, 293 (5529), 489–493.

    58. McCarthy, M. I.; Abecasis, G. R.; Cardon, L. R.; Goldstein, D. B.; Little, J.; Ioannidis, J. P.; Hirschhorn, J. N. Genome-Wide Association Studies for Complex Traits: Consensus, Uncertainty and Challenges. Nat. Rev. Genet. 2008 May, 9 (5), 356–369, Review.

    59. Wellcome Trust Case Control Consortium. Genome-Wide Association Study of 14,000 Cases of Seven Common Diseases and 3,000 Shared Controls. Nature 2007 Jun 7, 447 (7145), 661–678.

    60. Risch, N.; Merikangas, K. The Future of Genetic Studies of Complex Human Diseases. Science 1996 Sep 13, 273 (5281), 1516–1517.

    61. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M. A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P. I.; Daly, M. J., et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am. J. Hum. Genet. 2007 Sep, 81 (3), 559–575.

    62. Laurie, C. C.; Doheny, K. F.; Mirel, D. B.; Pugh, E. W.; Bierut, L. J.; Bhangale, T.; Boehm, F.; Caporaso, N. E.; Cornelis, M. C.; Edenberg, H. J., et al.; GENEVA Investigators. Quality Control and Quality Assurance in Genotypic Data for Genome-Wide Association Studies. Genet. Epidemiol. 2010 Sep, 34 (6), 591–602.

    63. Turner, S.; Armstrong, L. L.; Bradford, Y.; Carlson, C. S.; Crawford, D. C.; Crenshaw, A. T.; de Andrade, M.; Doheny, K. F.; Haines, J. L.; Hayes, G., et al. Quality Control Procedures for Genome-Wide Association Studies. Curr. Protoc. Hum. Genet. 2011 Jan, Chapter 1, Unit 1.19.

    64. Devlin, B.; Roeder, K. Genomic Control for Association Studies. Biometrics 1999 Dec, 55 (4), 997–1004.

    65. Voight, B. F.; Pritchard, J. K. Confounding from Cryptic Relatedness in Case-Control Association Studies. PLoS Genet. 2005 Sep, 1 (3), e32.

    66. Bacanu, S. A.; Devlin, B.; Roeder, K. The Power of Genomic Control. Am. J. Hum. Genet. 2000 Jun, 66 (6), 1933–1944.

    67. Dadd, T.; Weale, M. E.; Lewis, C. M. A Critical Evaluation of Genomic Control Methods for Genetic Association Studies. Genet. Epidemiol. 2009 May, 33 (4), 290–298, Review.

    68. Devlin, B.; Roeder, K.; Wasserman, L. Genomic Control, a New Approach to Genetic-Based Association Studies. Theor. Popul. Biol. 2001 Nov, 60 (3), 155–166, Review.

    69. Devlin, B.; Bacanu, S. A.; Roeder, K. Genomic Control to the Extreme. Nat. Genet. 2004 Nov, 36 (11), 1129–1130.

    70. Zheng, G.; Freidlin, B.; Li, Z.; Gastwirth, J. L. Genomic Control for Association Studies Under Various Genetic Models. Biometrics 2005 Mar, 61 (1), 186–192.

    71. Zheng, G.; Freidlin, B.; Gastwirth, J. L. Robust Genomic Control for Association Studies. Am. J. Hum. Genet. 2006 Feb, 78 (2), 350–356.

    72. Pritchard, J. K.; Rosenberg, N. A. Use of Unlinked Genetic Markers to Detect Population Stratification in Association Studies. Am. J. Hum. Genet. 1999 Jul, 65 (1), 220–228.

    73. Pritchard, J. K.; Stephens, M.; Rosenberg, N. A.; Donnelly, P. Association Mapping in Structured Populations. Am. J. Hum. Genet. 2000 Jul, 67 (1), 170–181.

    74. Redden, D.; Divers, J.; Vaughan, L.; Tiwari, H.; Beasley, T.; Fernandez, J.; Kimberly, R.; Feng, R.; Padilla, M.; Lui, N., et al. Regional Admixture Mapping and Structured Association Testing: Conceptual Unification and an Extensible General Linear Model. PLoS Genet. 2006 Aug 25, 2 (8), e137.

    75. Patterson, N.; Price, A. L.; Reich, D. Population Structure and Eigenanalysis. PLoS Genet. 2006 Dec, 2 (12), e190.

    76. Price, A. L.; Patterson, N. J.; Plenge, R. M.; Weinblatt, M. E.; Shadick, N. A.; Reich, D. Principal Components Analysis Corrects for Stratification in Genome-Wide Association Studies. Nat. Genet. 2006 Aug, 38 (8), 904–909.

    77. Pritchard, J. K.; Cox, N. J. The Allelic Architecture of Human Disease Genes: Common Disease–Common Variant...or Not? Hum. Mol. Genet. 2002 Oct 1, 11 (20), 2417–2423.

    78. Pearson, K. On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling. Philos. Mag. 1900, 50 (302), 157–175, Series 5.

    79. Fisher, R. A. On the Interpretation of χ² from Contingency Tables, and the Calculation of P. J. R. Stat. Soc. 1922, 85 (1), 87–94.

    80. Armitage, P. Tests for Linear Trends in Proportions and Frequencies. Biometrics 1955, 11 (3), 375–386.

    81. Cochran, W. G. Some Methods for Strengthening the Common Chi-Square Tests. Biometrics 1954, 10 (4), 417–451.

    82. Sasieni, P. From Genotypes to Genes: Doubling the Sample Size. Biometrics 1997 Dec, 53 (4), 1253–1261.

    83. Freidlin, B.; Zheng, G.; Li, Z.; Gastwirth, J. L. Trend Tests for Case–Control Studies of Genetic Markers: Power, Sample Size and Robustness. Hum. Hered. 2002, 53 (3), 146–152.

    84. Huang, B. E.; Lin, D. Y. Efficient Association Mapping of Quantitative Trait Loci with Selective Genotyping. Am. J. Hum. Genet. 2007, 80, 567–576.

    85. Aulchenko, Y. S.; de Koning, D. J.; Haley, C. Genome-Wide Rapid Association Using Mixed Model and Regression: A Fast and Simple Method for Genome-Wide Pedigree-Based Quantitative Trait Loci Association Analysis. Genetics 2007 Sep, 177 (1), 577–585.

    86. Kang, H. M.; Zaitlen, N. A.; Wade, C. M.; Kirby, A.; Heckerman, D.; Daly, M. J.; Eskin, E. Efficient Control of Population Structure in Model Organism Association Mapping. Genetics 2008 Mar, 178 (3), 1709–1723.

    87. Zhang, Z.; Ersoz, E.; Lai, C. Q.; Todhunter, R. J.; Tiwari, H. K.; Gore, M. A.; Bradbury, P. J.; Yu, J.; Arnett, D. K.; Ordovas, J. M., et al. Mixed Linear Model Approach Adapted for Genome-Wide Association Studies. Nat. Genet. 2010 Apr, 42 (4), 355–360.

    88. Laird, N. M.; Lange, C. Family-Based Designs in the Age of Large-Scale Gene-Association Studies. Nat. Rev. Genet. 2006 May, 7 (5), 385–394, Review.

    89. Hoggart, C. J.; Clark, T. G.; De Iorio, M.; Whittaker, J. C.; Balding, D. J. Genome-Wide Significance for Dense SNP and Resequencing Data. Genet. Epidemiol. 2008 Feb, 32 (2), 179–185.

    90. Balding, D. J. A Tutorial on Statistical Methods for Population Association Studies. Nat. Rev. Genet. 2006 Oct, 7 (10), 781–791, Review.

    91. Chakravarti, A. Population Genetics – Making Sense Out of Sequence. Nat. Genet. 1999 Jan, 21 (Suppl. 1), S56–S60, Review.

    92. Lohmueller, K. E.; Mauney, M. M.; Reich, D.; Braverman, J. M. Variants Associated with Common Disease Are Not Unusually Differentiated in Frequency Across Populations. Am. J. Hum. Genet. 2006 Jan, 78 (1), 130–136.

    93. Manolio, T. A.; Collins, F. S.; Cox, N. J.; Goldstein, D. B.; Hindorff, L. A.; Hunter, D. J.; McCarthy, M. I.; Ramos, E. M.; Cardon, L. R.; Chakravarti, A., et al. Finding the Missing Heritability of Complex Diseases. Nature 2009 Oct 8, 461 (7265), 747–753, Review.

    94. Terwilliger, J. D.; Hiekkalinna, T. An Utter Refutation of the Fundamental Theorem of the HapMap. Eur. J. Hum. Genet. 2006 Apr, 14 (4), 426–437.

    95. Terwilliger, J. D.; Göring, H. H. Update to Terwilliger and Göring's "Gene Mapping in the 20th and 21st Centuries" (2000): Gene Mapping When Rare Variants Are Common and Common Variants Are Rare. Hum. Biol. 2009 Dec, 81 (5–6), 729–733.

    96. Bodmer, W.; Bonilla, C. Common and Rare Variants in Multifactorial Susceptibility to Common Diseases. Nat. Genet. 2008 Jun, 40 (6), 695–701.

    97. Zondervan, K. T.; Cardon, L. R. The Complex Interplay among Factors That Influence Allelic Association. Nat. Rev. Genet. 2004 Feb, 5 (2), 89–100.

    98. Mardis, E. R. A Decade's Perspective on DNA Sequencing Technology. Nature 2011 Feb 10, 470 (7333), 198–203.

    99. Ng, S. B.; Buckingham, K. J.; Lee, C.; Bigham, A. W.; Tabor, H. K.; Dent, K. M.; Huff, C. D.; Shannon, P. T.; Jabs, E. W.; Nickerson, D. A., et al. Exome Sequencing Identifies the Cause of a Mendelian Disorder. Nat. Genet. 2010 Jan, 42 (1), 30–35.

    100. Erlich, Y.; Edvardson, S.; Hodges, E.; Zenvirt, S.; Thekkat, P.; Shaag, A.; Dor, T.; Hannon, G. J.; Elpeleg, O. Exome Sequencing and Disease-Network Analysis of a Single Family Implicate a Mutation in KIF1A in Hereditary Spastic Paraparesis. Genome Res. 2011 May, 21 (5), 658–664.

    101. Rödelsperger, C.; Krawitz, P.; Bauer, S.; Hecht, J.; Bigham, A. W.; Bamshad, M.; de Condor, B. J.; Schweiger, M. R.; Robinson, P. N. Identity-by-Descent Filtering of Exome Sequence Data for Disease-Gene Identification in Autosomal Recessive Disorders. Bioinformatics 2011 Mar 15, 27 (6), 829–836.

    102. Ng, S. B.; Turner, E. H.; Robertson, P. D.; Flygare, S. D.; Bigham, A. W.; Lee, C.; Shaffer, T.; Wong, M.; Bhattacharjee, A.; Eichler, E. E., et al. Targeted Capture and Massively Parallel Sequencing of 12 Human Exomes. Nature 2009 Sep 10, 461 (7261), 272–276.

    103. Sunyaev, S.; Ramensky, V.; Koch, I.; Lathe, W., III; Kondrashov, A. S.; Bork, P. Prediction of Deleterious Human Alleles. Hum. Mol. Genet. 2001 Mar 15, 10 (6), 591–597.

    104. Morgenthaler, S.; Thilly, W. G. A Strategy to Discover Genes That Carry Multi-Allelic or Mono-Allelic Risk for Common Diseases: A Cohort Allelic Sums Test (CAST). Mutat. Res. 2007 Feb 3, 615 (1–2), 28–56.

    105. Li, B.; Leal, S. M. Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data. Am. J. Hum. Genet. 2008 Sep, 83 (3), 311–321.

    106. Madsen, B. E.; Browning, S. R. A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic. PLoS Genet. 2009 Feb, 5 (2), e1000384.

    107. Price, A. L.; Kryukov, G. V.; de Bakker, P. I.; Purcell, S. M.; Staples, J.; Wei, L. J.; Sunyaev, S. R. Pooled Association Tests for Rare Variants in Exon-Resequencing Studies. Am. J. Hum. Genet. 2010 Jun 11, 86 (6), 832–838.

    108. Yi, N.; Zhi, D. Bayesian Analysis of Rare Variants in Genetic Association Studies. Genet. Epidemiol. 2011 Jan, 35 (1), 57–69.

    109. Bansal, V.; Libiger, O.; Torkamani, A.; Schork, N. J. Statistical Analysis Strategies for Association Studies Involving Rare Variants. Nat. Rev. Genet. 2010 Nov, 11 (11), 773–785.

    110. Ansorge, W. J. Next-Generation DNA Sequencing Techniques. N. Biotechnol. 2009 Apr, 25 (4), 195–203.

    111. Hirst, M.; Marra, M. A. Next Generation Sequencing Based Approaches to Epigenomics. Brief Funct. Genomics 2010 Dec, 9 (5–6), 455–465, Review.

    112. Meyerson, M.; Gabriel, S.; Getz, G. Advances in Understanding Cancer Genomes through Second-Generation Sequencing. Nat. Rev. Genet. 2010 Oct, 11 (10), 685–696, Review.

    113. Timmermann, B.; Kerick, M.; Roehr, C.; Fischer, A.; Isau, M.; Boerno, S. T.; Wunderlich, A.; Barmeyer, C.; Seemann, P.; Koenig, J., et al. Somatic Mutation Profiles of MSI and MSS Colorectal Cancer Identified by Whole Exome Next Generation Sequencing and Bioinformatics Analysis. PLoS One 2010 Dec 22, 5 (12), e15661.

    114. Wei, X.; Walia, V.; Lin, J. C.; Teer, J. K.; Prickett, T. D.; Gartner, J.; Davis, S.; NISC Comparative Sequencing Program; Stemke-Hale, K.; Davies, M. A., et al. Exome Sequencing Identifies GRIN2A as Frequently Mutated in Melanoma. Nat. Genet. 2011 May, 43 (5), 442–446, Epub 2011 Apr 15.

    115. Schadt, E. E. Molecular Networks as Sensors and Drivers of Common Human Diseases. Nature 2009 Sep 10, 461 (7261), 218–223, Review.

    116. Chuang, H. Y.; Hofree, M.; Ideker, T. A Decade of Systems Biology. Annu. Rev. Cell Dev. Biol. 2010 Nov 10, 26, 721–744, Review.

    117. Kohl, M. Standards, Databases, and Modeling Tools in Systems Biology. Methods Mol. Biol. 2011, 696, 413–427.

    118. Kohl, M. Development and Validation of Predictive Molecular Signatures. Curr. Mol. Med. 2010 Mar, 10 (2), 173–179, Review.

    119. Subramanian, A.; Kuehn, H.; Gould, J.; Tamayo, P.; Mesirov, J. P. GSEA-P: A Desktop Application for Gene Set Enrichment Analysis. Bioinformatics 2007 Dec 1, 23 (23), 3251–3253.

    120. Subramanian, A.; Tamayo, P.; Mootha, V. K.; Mukherjee, S.; Ebert, B. L.; Gillette, M. A.; Paulovich, A.; Pomeroy, S. L.; Golub, T. R.; Lander, E. S., et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proc. Natl. Acad. Sci. U.S.A. 2005 Oct 25, 102 (43), 15545–15550.

    121. Chen, L. S.; Hutter, C. M.; Potter, J. D.; Liu, Y.; Prentice, R. L.; Peters, U.; Hsu, L. Insights into Colon Cancer Etiology Via a Regularized Approach to Gene Set Analysis of GWAS Data. Am. J. Hum. Genet. 2010 Jun 11, 86 (6), 860–871.

    122. Chen, X.; Wang, L.; Hu, B.; Guo, M.; Barnard, J.; Zhu, X. Pathway-Based Analysis for Genome-Wide Association Studies Using Supervised Principal Components. Genet. Epidemiol. 2010 Nov, 34 (7), 716–724.

    123. Cantor, R. M.; Lange, K.; Sinsheimer, J. S. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. Am. J. Hum. Genet. 2010 Jan, 86 (1), 6–22, Review.

    124. Holmans, P. Statistical Methods for Pathway Analysis of Genome-Wide Data for Association with Complex Genetic Traits. Adv. Genet. 2010, 72, 141–179.

    125. Hong, M. G.; Pawitan, Y.; Magnusson, P. K.; Prince, J. A. Strategies and Issues in the Detection of Pathway Enrichment in Genome-Wide Association Studies. Hum. Genet. 2009 Aug, 126 (2), 289–301.

    126. Thomas, D. C.; Conti, D. V.; Baurley, J.; Nijhout, F.; Reed, M.; Ulrich, C. M. Use of Pathway Information in Molecular Epidemiology. Hum. Genomics 2009 Oct, 4 (1), 21–42, Review.

    127. Wang, K.; Li, M.; Hakonarson, H. Analysing Biological Pathways in Genome-Wide Association Studies. Nat. Rev. Genet. 2010 Dec, 11 (12), 843–854, Review.

    128. Wang, L.; Jia, P.; Wolfinger, R. D.; Chen, X.; Zhao, Z. Gene Set Analysis of Genome-Wide Association Studies: Methodological Issues and Perspectives. Genomics 2011 Jul, 98 (1), 1–8.

    129. Wang, L.; Jia, P.; Wolfinger, R. D.; Chen, X.; Grayson, B. L.; Aune, T. M.; Zhao, Z. An Efficient Hierarchical Generalized Linear Mixed Model for Pathway Analysis of Genome-Wide Association Studies. Bioinformatics 2011 Mar 1, 27 (5), 686–692.

    130. Carro, M. S.; Lim, W. K.; Alvarez, M. J.; Bollo, R. J.; Zhao, X.; Snyder, E. Y.; Sulman, E. P.; Anne, S. L.; Doetsch, F.; Colman, H., et al. The Transcriptional Network for Mesenchymal Transformation of Brain Tumours. Nature 2010 Jan 21, 463 (7279), 318–325.

    131. Cerami, E.; Demir, E.; Schultz, N.; Taylor, B. S.; Sander, C. Automated Network Analysis Identifies Core Pathways in Glioblastoma. PLoS One 2010 Feb 12, 5 (2), e8918.

    132. Torkamani, A.; Schork, N. J. Identification of Rare Cancer Driver Mutations by Network Reconstruction. Genome Res. 2009 Sep, 19 (9), 1570–1578.

    133. Satagopan, J. M.; Smith, A. D. Statistical Methods in Genomics Research. Heart Drug 2003, 3, 48–60.

    134. Reich, D. E.; Goldstein, D. B. Detecting Association in a Case-Control Study While Correcting for Population Stratification. Genet. Epidemiol. 2001 Jan, 20 (1), 4–16, Review.

    Biographies

    Christine Duarte, PhD. Dr Duarte is an assistant professor in the Department of Biostatistics at the University of Alabama at Birmingham. Her current research focuses on developing statistical methods for applying systems biology approaches to high-dimensional genomic data analysis. Her research interests fall into three main areas: (1) prediction of complex traits and disease from high-dimensional genomic data; (2) developing network models for decomposing genetic effects on complex traits into intermediate phenotypes; and (3) integrating cross-platform genomic data using network modeling to explain biological processes of interest such as disease etiology and treatment response.

    Laura Kelly Vaughan, PhD. Dr Vaughan is an assistant professor in the Section on Statistical Genetics in the Department of Biostatistics at the University of Alabama at Birmingham. Her research interests include integrating biological knowledge into the statistical analysis of complex genomic data, pathway analysis, and reproducible research practices.

    T Mark Beasley, PhD. Dr Beasley is an associate professor and has been a faculty member of the UAB Department of Biostatistics and the Section on Statistical Genetics (SSG) since 2001. Dr Beasley has published over 30 articles in the area of statistical methodology, focused on six major areas: (1) methodological problems in statistical genetics; (2) nonparametric statistics; (3) simulation studies; (4) the use of linear models; (5) longitudinal analysis; and (6) mediation analysis. He has also coauthored over 30 applied research articles in a variety of disciplines including education, psychology, medicine, genetics, gerontology, and pharmacology.

    Hemant K Tiwari, PhD. Dr Tiwari is the William "Student" Sealy Gosset Professor in Biostatistics and Interim Head of the Section on Statistical Genetics in the School of Public Health at the University of Alabama at Birmingham. His research interests include genetic linkage analysis, disequilibrium mapping, genome-wide association studies (GWAS), population genetics, structural variations in the genome, pharmacogenetics/pharmacogenomics, and bioinformatics. At present, he is involved in gene mapping studies of rheumatoid arthritis, SLE, cardiovascular diseases, and obesity-related traits.
