Lecture 2 - Segregation Analysis 1/15/04 Biomath 207B / Biostat 237 / HG 207B

1

If a trait is inherited in a recessive manner with complete penetrance, then the child of two unaffected carriers has a 25% chance of having the trait. Example: brown and blue eyes, where the blue allele (b) is recessive to the brown allele (B).

Parents: B-b x B-b

Offspring (each equally likely): B-B, B-b, B-b, b-b


2

If a trait is inherited in a dominant manner with complete penetrance, then the child of an affected heterozygous parent and an unaffected parent has a 50:50 chance of having the trait. Dominant and recessive are relative terms. Note that the brown eye allele is dominant to the blue eye allele.

Parents: B-b x b-b

Offspring (each equally likely): b-b, b-b, B-b, B-b

3

Simple segregation patterns:

(1) recessive pattern of inheritance
(2) disease is fully penetrant
(3) let D denote the disease allele
(4) p(d)=0.7, p(D)=0.3
(5) collect all families with exactly two children

What distribution of affecteds do we expect to see under Hardy-Weinberg equilibrium and random mating?

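Percentage of all two-child families, by parental status and number of affected children:

Parents                                 0 affected   1 affected   2 affected
Unaffected parents                      75.1%        6.6%         1.1%
One affected parent (male or female)    10.7%        3.8%         1.9%
Two affected parents                    0%           0%           0.8%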

4

A disease that is inherited in a dominant manner has a different pattern:

(1) disease is fully penetrant
(2) let D denote the disease allele
(3) p(d)=0.9, p(D)=0.1
(4) collect all families with exactly two children
(5) Hardy-Weinberg equilibrium and random mating

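Percentage of all two-child families, by parental status and number of affected children:

Parents                                 0 affected   1 affected   2 affected
Unaffected parents                      65.6%        0%           0%
One affected parent (male or female)    7.3%         14.6%        8.9%
Two affected parents                    0.2%         1.2%         2.2%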
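The percentages for the recessive and dominant examples can be checked by enumerating the parental mating types. The following is a minimal Python sketch, not part of the original lecture. The function names (`genotype_freqs`, `offspring_probs`, `family_distribution`) are hypothetical, and the code assumes Hardy-Weinberg proportions, random mating, full penetrance, and two children whose genotypes are independent given the parents.

```python
from itertools import product


def genotype_freqs(p_D):
    """Hardy-Weinberg genotype frequencies for disease-allele frequency p_D."""
    p_d = 1.0 - p_D
    return {"DD": p_D ** 2, "Dd": 2 * p_D * p_d, "dd": p_d ** 2}


def offspring_probs(father, mother):
    """Mendelian genotype probabilities for one child of the given parents."""
    probs = {"DD": 0.0, "Dd": 0.0, "dd": 0.0}
    # Each parent passes one of its two alleles with probability 1/2.
    for a, b in product(father, mother):
        child = "".join(sorted(a + b))  # 'D' sorts before 'd', so "dD" -> "Dd"
        probs[child] += 0.25
    return probs


def family_distribution(p_D, mode):
    """Return {(n_affected_parents, n_affected_children): probability}
    for two-child families under a fully penetrant single-gene model."""
    if mode not in ("recessive", "dominant"):
        raise ValueError("mode must be 'recessive' or 'dominant'")
    affected = {"DD"} if mode == "recessive" else {"DD", "Dd"}
    freqs = genotype_freqs(p_D)
    table = {(i, j): 0.0 for i in range(3) for j in range(3)}
    for f, m in product(freqs, repeat=2):
        mating = freqs[f] * freqs[m]          # random mating
        n_par = (f in affected) + (m in affected)
        child = offspring_probs(f, m)
        p_aff = sum(p for g, p in child.items() if g in affected)
        # Number of affected children among two is binomial(2, p_aff).
        counts = [(1 - p_aff) ** 2, 2 * p_aff * (1 - p_aff), p_aff ** 2]
        for k, p_k in enumerate(counts):
            table[(n_par, k)] += mating * p_k
    return table


if __name__ == "__main__":
    for label, p_D, mode in [("recessive", 0.3, "recessive"),
                             ("dominant", 0.1, "dominant")]:
        table = family_distribution(p_D, mode)
        print(label)
        for n_par in range(3):
            row = [f"{100 * table[(n_par, k)]:.1f}%" for k in range(3)]
            print(f"  {n_par} affected parent(s):", "  ".join(row))
```

Each entry is a share of all two-child families, so the nine entries of a table sum to 1. For example, `family_distribution(0.3, "recessive")[(0, 0)]` is about 0.751, the 75.1% of families with unaffected parents and no affected children.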

5

Why is it not always this simple?

- More than one gene can be involved and environment influences disease risk. That is, there are diseases with reduced penetrance and sporadic cases of disease.
- Can’t sample everyone. Complete ascertainment is impractical for rare diseases.
- Family structures will vary. Parents may not be available.

6

Most common diseases are examples of multi-factorial, or complex, traits.

Complex trait: more than one gene or gene(s) and environment play a role.

[Figure: three diagrams of factors acting on a TRAIT. (1) Two genes, additive effects: gene 1 and gene 2. (2) Gene-environment additive effects: gene and environment. (3) Genes-environment interactions: gene 1, gene 2, and environment.]

In a multi-factorial disease, genes that play a role in susceptibility to a disease may not be necessary or sufficient for disease expression. We do not observe Mendelian inheritance patterns.

Mendelian inheritance patterns include the transmission patterns expected if there is a single gene obeying Mendel’s law of segregation of alleles at a single locus, e.g. dominant, recessive.

7

Quantifying the Familial Aggregation

The first step of any genetic study is to ask one of the following related questions:

(1) Does the disease aggregate in families (more than expected by chance)?

(2) Are family members’ trait values more likely to be similar than the trait values of two randomly selected people from the same population?

One popular method of answering these questions is to calculate the recurrence risk to relatives.

Recurrence risk ratio to relatives of type R:

$$
\lambda_R = \frac{\text{Prob(relative of type } R \text{ affected} \mid \text{subject affected})}{\text{Prob(random person affected)}}
$$

where Prob(random person affected) = population prevalence.

The larger $\lambda_R$, the greater the degree of aggregation in families, but a large value of $\lambda_R$ does not prove that the disease has a genetic basis. Aggregation could be due to common environmental factors.

The observation that $\lambda_{offspring} > \lambda_{siblings}$ argues against a purely Mendelian trait.
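To see why a larger ratio in offspring than in siblings is unexpected for a simple Mendelian trait, the ratios can be worked out for one concrete model. This is a minimal Python sketch under assumptions not taken from the slides: a fully penetrant recessive disease with disease-allele frequency `q`, Hardy-Weinberg equilibrium, and random mating. The function names are hypothetical.

```python
def recurrence_risk_ratio(risk_to_relative, prevalence):
    """P(relative affected | subject affected) / P(random person affected)."""
    return risk_to_relative / prevalence


def recessive_lambdas(q):
    """Return (lambda_offspring, lambda_sibling) for a fully penetrant
    recessive disease with disease-allele frequency q (HWE, random mating)."""
    prevalence = q ** 2  # affected individuals are DD
    # An affected (DD) parent always transmits D; the spouse, drawn at
    # random from the population, transmits D with probability q.
    risk_offspring = q
    # Given an affected child, each parent carries at least one D. The
    # parent's other allele is D with probability q, so the parent passes
    # D to a second child with probability 1/2 + q/2 = (1 + q) / 2.
    risk_sibling = ((1 + q) / 2) ** 2
    return (recurrence_risk_ratio(risk_offspring, prevalence),
            recurrence_risk_ratio(risk_sibling, prevalence))


if __name__ == "__main__":
    lam_off, lam_sib = recessive_lambdas(0.3)
    print(f"lambda_offspring = {lam_off:.2f}, lambda_sibling = {lam_sib:.2f}")
```

Because ((1 + q)/2)^2 - q = ((1 - q)/2)^2 is never negative, the sibling ratio is at least as large as the offspring ratio for every allele frequency in this model.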

8

Segregation Analysis

• Goal of Segregation analysis: To identify the specific genetic mechanisms that may control traits associated with disease.

• Segregation analysis is used to determine if the observed familial aggregation has a genetic basis. In addition, it is used to estimate the relative effects of genetic and environmental factors shared among family members. It can also be used to test for gene-environment interactions.

• See Jarvik (1998), "Complex Segregation Analyses: Uses and Limitations," AJHG 63:942-946, for more information.

9

Why go to all the trouble of segregation analysis?

(1) Calculating relative risks isn’t good enough. Familial aggregation can be due to shared environment. A high sibling relative risk ($\lambda_s$) or heritability does not prove that the disease has a genetic component (see, for example, Guo, AJHG 1998). Segregation analysis increases the confidence that genes play a role in the susceptibility to the disease.

(2) The most powerful forms of linkage analysis require accurate knowledge of the inheritance mode and penetrance of the disease. Genetic model based gene mapping (classical linkage analysis) requires that the inheritance mode (dominant, recessive, etc.) for the major gene and the probability of disease given a particular genotype be known. If the genetic model is wrong, the false negative rate is increased (Martinez M. et al., Gen. Epi., 1989, 6:253-8).

10

Segregation analysis is a more difficult but more informative method of gathering evidence for substantial genetic involvement in susceptibility to the trait.

Familial aggregation can be due to:

(1) Shared genes
    (a) one gene acting in one of the following ways (let D be the disease risk allele):
        (i) dominant manner: P(disease|DD) = P(disease|Dd) > P(disease|dd)
        (ii) recessive manner: P(disease|DD) > P(disease|Dd) = P(disease|dd)
        (iii) additive manner: P(disease|Dd) = 1/2 (P(disease|DD) + P(disease|dd))
        (iv) codominant manner: P(disease|DD) > P(disease|Dd) > P(disease|dd)
    (b) several genes
    (c) many genes (polygene model)
(2) Shared environment
(3) A combination of both genes and environment that can include interactions between the genes and the environment.
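The four single-gene patterns are defined entirely by how the three genotype-specific disease probabilities compare. As an illustration only, here is a small Python sketch that labels a set of penetrances; the function name `inheritance_mode` and the tolerance argument are hypothetical. The additive case is checked before the codominant case because an additive model also satisfies the codominant ordering.

```python
import math


def inheritance_mode(f_DD, f_Dd, f_dd, tol=1e-9):
    """Label penetrances P(disease | genotype) with a single-gene pattern.

    f_DD, f_Dd, f_dd are probabilities of disease for each genotype,
    with D taken to be the risk allele.
    """
    def same(a, b):
        return math.isclose(a, b, abs_tol=tol)

    if same(f_DD, f_Dd) and same(f_Dd, f_dd):
        return "no effect of this gene"
    if same(f_DD, f_Dd) and f_Dd > f_dd:
        return "dominant"
    if same(f_Dd, f_dd) and f_DD > f_Dd:
        return "recessive"
    if f_DD > f_dd and same(f_Dd, 0.5 * (f_DD + f_dd)):
        return "additive"
    if f_DD > f_Dd > f_dd:
        return "codominant"
    return "other"


if __name__ == "__main__":
    print(inheritance_mode(0.8, 0.8, 0.05))   # dominant
    print(inheritance_mode(0.8, 0.05, 0.05))  # recessive
    print(inheritance_mode(0.9, 0.5, 0.1))    # additive
    print(inheritance_mode(0.9, 0.3, 0.1))    # codominant
```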

11

Segregation analysis involves:

(1) Specifying a mathematical model (similar to genetic model based linkage analysis).
(2) Computing the likelihood of the observed data under the model.
(3) Comparing various models to find the “best” fitting model.

Note that with segregation analysis, the best model is the best model among those examined. For example, if a polygene model is not among the choices for a disease caused by many loci, the best fitting model might end up being a major gene model with spurious environmental factors. Environmental factors must be identified and carefully documented for accurate results. The method of finding the families (ascertainment) should be included in the model.

12

The overall approach to segregation analysis is:

• Step (1): Specify null and alternative hypotheses.
• For example: no aggregation in families at all (sporadic model) for the null hypothesis and Mendelian inheritance (single gene) as the alternative hypothesis.
• Step (2): Translate into mathematical models.
• Step (3): Compute the maximum likelihood of the data and maximum likelihood estimates for the parameters in the mathematical model for both hypotheses.
• Step (4): If the null model is a special case of the alternative (nested models), then compare the models using likelihood ratio tests (LRT) to find the hypothesis that is best supported by the data (hierarchical testing). If the models are not nested, then use the Akaike information criterion (AIC) or simulation to test.

• Repeat these steps for as many hypotheses as you wish to test.

13

Comparing models:

(1) If the null hypothesis is a special case of the alternative model, then one way to compare them is a likelihood ratio test (LRT). For example, a dominant Mendelian model is a restriction of the codominant Mendelian model. Under this null hypothesis, twice the log likelihood ratio, $2\ln(L_{alt}/L_{null})$, has approximately a chi-square distribution. The degrees of freedom are determined by the difference in the number of parameters. When comparing the dominant and codominant Mendelian models, there is one degree of freedom. The chi-square statistic has an associated p-value. If it is less than 0.05, reject the null hypothesis in favor of the alternative. If it is greater than 0.05, do not reject the null hypothesis.

(2) If the null hypothesis is not a special case of the alternative, use the Akaike information criterion (AIC), which is 2 x (number of parameters) - 2 x (maximized log likelihood), to compare. For example, a dominant Mendelian model under Hardy-Weinberg equilibrium (HWE) is not a special case of a recessive Mendelian model where we do not assume HWE. The model with the lowest AIC is preferred.
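The two comparison rules can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure of any particular segregation analysis package: the function names are hypothetical, the log likelihoods in the usage example are made-up numbers, and the chi-square tail probability is computed with the standard library only (via `math.erfc` for odd degrees of freedom and a closed-form recurrence).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with integer degrees of freedom df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # exact for df = 1
    else:
        q, k = math.exp(-half), 2              # exact for df = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return (statistic, p_value) for nested models, where the statistic
    is 2 * (maximized log likelihood of alternative - that of the null)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


def aic(loglik, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2.0 * n_params - 2.0 * loglik


if __name__ == "__main__":
    # Hypothetical maximized log likelihoods, for illustration only.
    stat, p = likelihood_ratio_test(loglik_null=-512.4, loglik_alt=-509.1, df=1)
    print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
    print("AIC (null, 3 parameters):", aic(-512.4, 3))
    print("AIC (alt,  4 parameters):", aic(-509.1, 4))
```

The chi-square approximation assumes the usual regularity conditions for nested models. When those are in doubt, simulation is the alternative mentioned in the overall approach.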

14

Converting hypotheses into models:

• The mathematical models have three parts:

• The penetrance - a measure of how likely the trait value is, given that a person is in a particular risk group. In genetics, the most relevant parameters are those for the aa, Aa, and AA groups, representing the phenotype value for the aa group and the change in value for the Aa or AA group.

• The prior - the probability that a founder belongs to a particular risk group (under HWE, determined by the allele frequency $q_A$).

• The transmission probabilities - the probability that an offspring belongs to a particular risk group given their parents’ risk groups. The relevant parameters are $\tau_{aa}$, $\tau_{Aa}$, and $\tau_{AA}$. For example, $\tau_{aa}$ = P(A transmitted from an aa parent) and $\tau_{aa}\tau_{aa}$ = P(AA offspring from aa and aa parents). Under Mendelian inheritance, $\tau_{aa} = 0$, $\tau_{Aa} = 1/2$, and $\tau_{AA} = 1$.

15

With this information, determine the likelihood of the trait gene location given the marker genotypes for the family members. (Sum over the possible genotypes for the trait).

Each family is independent so the individual family likelihoods multiply.

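$$
\text{Prob for family } r = \sum_{G_1} \cdots \sum_{G_n} \prod_{i} \text{Pen}(X_i \mid G_i) \prod_{j} \text{Prior}(G_j) \prod_{\{k,l,m\}} \text{Trans}(G_m \mid G_k, G_l)
$$

Here i runs over all n family members, j over the founders, and {k, l, m} over the parent-parent-offspring trios.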
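For a small family, the sum over genotypes in the family likelihood can be evaluated by brute force. The sketch below is a minimal illustration, not the algorithm used by production software (which avoids the 3^n enumeration, for example with the Elston-Stewart peeling algorithm). It assumes a nuclear family, a binary trait, founders in Hardy-Weinberg proportions, and per-parent transmission probabilities `tau`; all function and variable names are hypothetical.

```python
from itertools import product

GENOTYPES = ("aa", "Aa", "AA")


def hwe_prior(q_A):
    """Prior(G) for a founder under Hardy-Weinberg equilibrium."""
    return {"aa": (1 - q_A) ** 2, "Aa": 2 * q_A * (1 - q_A), "AA": q_A ** 2}


def transmission(child, father, mother, tau):
    """Trans(G_child | G_father, G_mother), where tau[g] is the probability
    that a parent of type g passes A to an offspring."""
    tf, tm = tau[father], tau[mother]
    probs = {"aa": (1 - tf) * (1 - tm),
             "Aa": tf * (1 - tm) + (1 - tf) * tm,
             "AA": tf * tm}
    return probs[child]


def nuclear_family_likelihood(phenotypes, penetrance, q_A, tau):
    """Likelihood of one nuclear family.

    phenotypes: [father, mother, child_1, ...] with 1 = affected,
                0 = unaffected, None = unobserved.
    penetrance: {genotype: P(affected | genotype)}.
    """
    def pen(x, g):
        if x is None:
            return 1.0
        return penetrance[g] if x == 1 else 1.0 - penetrance[g]

    prior = hwe_prior(q_A)
    total = 0.0
    # Sum over every joint genotype assignment (G_1, ..., G_n).
    for genos in product(GENOTYPES, repeat=len(phenotypes)):
        father, mother, children = genos[0], genos[1], genos[2:]
        term = prior[father] * prior[mother]           # founders
        for x, g in zip(phenotypes, genos):            # penetrances
            term *= pen(x, g)
        for child in children:                         # parent-offspring trios
            term *= transmission(child, father, mother, tau)
        total += term
    return total


if __name__ == "__main__":
    mendelian_tau = {"aa": 0.0, "Aa": 0.5, "AA": 1.0}
    recessive_pen = {"aa": 0.0, "Aa": 0.0, "AA": 1.0}  # A is the disease allele
    families = [[0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
    likelihood = 1.0
    for fam in families:  # independent families multiply
        likelihood *= nuclear_family_likelihood(fam, recessive_pen, 0.3, mendelian_tau)
    print("joint likelihood of the three families:", likelihood)
```

In practice the product over families is accumulated as a sum of log likelihoods to avoid numerical underflow, and the result is maximized over the model parameters.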

16

Ousiotype model:

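Define $\tau_{AA}$, $\tau_{Aa}$, $\tau_{aa}$ to be the probabilities of "transmitting" type A to an offspring, depending on the parental type. They determine the transmission probabilities $\Pr(g_i \mid g_{fi}, g_{mi})$, where $g_i$ is person i's ousiotype, $g_{fi}$ is his father's ousiotype, and $g_{mi}$ is his mother's ousiotype.

P(offspring ousiotype | parents' ousiotypes), with columns for the father's ousiotype ($g_{fi}$), the mother's ousiotype ($g_{mi}$), and each offspring ousiotype ($g_i$):

$$
\begin{array}{cc|ccc}
g_{fi} & g_{mi} & aa & aA & AA \\
\hline
aa & aa & (1-\tau_{aa})^2 & 2\tau_{aa}(1-\tau_{aa}) & \tau_{aa}^2 \\
aa & Aa & (1-\tau_{aa})(1-\tau_{Aa}) & \tau_{aa}(1-\tau_{Aa}) + (1-\tau_{aa})\tau_{Aa} & \tau_{aa}\tau_{Aa} \\
aa & AA & (1-\tau_{aa})(1-\tau_{AA}) & \tau_{aa}(1-\tau_{AA}) + (1-\tau_{aa})\tau_{AA} & \tau_{aa}\tau_{AA} \\
Aa & aa & (1-\tau_{Aa})(1-\tau_{aa}) & \tau_{Aa}(1-\tau_{aa}) + (1-\tau_{Aa})\tau_{aa} & \tau_{Aa}\tau_{aa} \\
Aa & Aa & (1-\tau_{Aa})^2 & 2\tau_{Aa}(1-\tau_{Aa}) & \tau_{Aa}^2 \\
Aa & AA & (1-\tau_{Aa})(1-\tau_{AA}) & \tau_{Aa}(1-\tau_{AA}) + (1-\tau_{Aa})\tau_{AA} & \tau_{Aa}\tau_{AA} \\
AA & aa & (1-\tau_{AA})(1-\tau_{aa}) & \tau_{AA}(1-\tau_{aa}) + (1-\tau_{AA})\tau_{aa} & \tau_{AA}\tau_{aa} \\
AA & Aa & (1-\tau_{AA})(1-\tau_{Aa}) & \tau_{AA}(1-\tau_{Aa}) + (1-\tau_{AA})\tau_{Aa} & \tau_{AA}\tau_{Aa} \\
AA & AA & (1-\tau_{AA})^2 & 2\tau_{AA}(1-\tau_{AA}) & \tau_{AA}^2
\end{array}
$$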
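Every row of the ousiotype table follows one rule: the father passes A with his type's probability, the mother passes A with hers, and the two events are independent. A minimal Python sketch of that rule follows; the name `transmission_table` is hypothetical. The second usage case, in which all three probabilities are equal, is an illustrative assumption rather than something stated in the slides: it shows that equal probabilities make the offspring distribution the same for every pair of parents, so type is not transmitted.

```python
TYPES = ("aa", "Aa", "AA")


def transmission_table(tau):
    """Return {(father, mother): {offspring_type: probability}}, where
    tau[t] is the probability that a parent of type t transmits A."""
    table = {}
    for father in TYPES:
        for mother in TYPES:
            tf, tm = tau[father], tau[mother]
            table[(father, mother)] = {
                "aa": (1 - tf) * (1 - tm),              # neither transmits A
                "Aa": tf * (1 - tm) + (1 - tf) * tm,    # exactly one transmits A
                "AA": tf * tm,                          # both transmit A
            }
    return table


if __name__ == "__main__":
    mendelian = transmission_table({"aa": 0.0, "Aa": 0.5, "AA": 1.0})
    print("Aa x Aa under Mendelian inheritance:", mendelian[("Aa", "Aa")])

    # Illustrative assumption: equal tau for every parental type.
    flat = transmission_table({"aa": 0.3, "Aa": 0.3, "AA": 0.3})
    print("aa x aa with equal tau:", flat[("aa", "aa")])
    print("AA x AA with equal tau:", flat[("AA", "AA")])
```

Fixing the three probabilities at 0, 1/2, and 1 gives the Mendelian model, so a model with free transmission probabilities contains it as a special case and the two can be compared with a likelihood ratio test.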
