Note that the genetic map is different for men and women. Recombination frequency is higher in meiosis in women.


• The CEPH families were instrumental in constructing the map

• But our goal is to map human diseases

• You rarely get large, multi-generational, highly informative families

• How do we get to a lod score of 3 with small families?

How do we extend the map to identifying disease genes?

That’s the awesome power of logarithms

Science (2006) 312:279-282

Recall: in the small family (5 kids), the informative mother's phase was either:

D 2 / d 1

OR

D 1 / d 2

$$L = \frac{\left[(0.9)^5 + (0.1)^5\right]/2}{(0.5)^5}$$

$$Z = \log_{10} L = 0.97$$

What if there were no crossovers?

(For one crossover, Z = 0.021, the crossover penalty)

Parents: Dd 12 and dd 22

Children: dd 12, Dd 22, Dd 22, dd 12, Dd 12
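The lod calculation for this family can be sketched in a few lines. This is a minimal illustration, assuming a recombination fraction of 0.1 (the value implied by the 0.9 and 0.1 terms) and an informative parent of unknown phase; the function name `lod_phase_unknown` is made up for this example.

```python
from math import log10


def lod_phase_unknown(n_children, n_recombinants, theta):
    """Lod score for one family when the informative parent's phase is unknown."""
    n, k = n_children, n_recombinants
    # Likelihood under each of the two possible phases of the parent.
    phase_a = (1 - theta) ** (n - k) * theta ** k
    phase_b = (1 - theta) ** k * theta ** (n - k)
    # The two phases are equally likely, so average them.
    linked = (phase_a + phase_b) / 2
    # Under no linkage every child is a 50:50 outcome.
    unlinked = 0.5 ** n
    return log10(linked / unlinked)


z_no_crossover = lod_phase_unknown(5, 0, theta=0.1)
z_one_crossover = lod_phase_unknown(5, 1, theta=0.1)

print(f"no crossovers: Z = {z_no_crossover:.3f}")
print(f"one crossover: Z = {z_one_crossover:.3f}")
```

With no crossovers the score is just under 1; a single crossover among the five children drops it to about 0.02, which is why it is called the crossover penalty.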

Add the lod scores from different families

This is the same as multiplying probabilities

What is the probability of two coin flips and getting two heads?

0.5 x 0.5 = 0.25 (product rule in statistics)

If the same markers are scored in two different families, the results from the two families are independent

4 or 5 small families, and a small number of crossovers, should suffice
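A short sketch of why adding lod scores works, and of how many small families it takes to reach 3. The per-family odds of 9.45 correspond to a five-child family with no crossovers at a recombination fraction of 0.1; the list of four families and the helper `families_needed` are illustrative assumptions.

```python
from math import ceil, isclose, log10

# Hypothetical data: odds ratios from four independent families,
# each a five-child family with no crossovers.
family_odds = [9.45, 9.45, 9.45, 9.45]

# Product rule: independent probabilities (and odds) multiply.
combined_odds = 1.0
for odds in family_odds:
    combined_odds *= odds

# Adding the logs gives the same answer as taking the log of the product.
total_lod = sum(log10(odds) for odds in family_odds)
assert isclose(total_lod, log10(combined_odds))


def families_needed(lod_per_family, threshold=3.0):
    """Smallest number of identical families whose summed lod reaches the threshold."""
    return ceil(threshold / lod_per_family)


n_needed = families_needed(0.97)
print(f"total lod from {len(family_odds)} families: {total_lod:.2f}")
print(f"families needed at 0.97 each: {n_needed}")
```

Any crossover lowers a family's contribution, so in practice a few more families than this minimum may be needed.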

Works extremely well for DNA markers, more problematic for diseases

If the disease (phenotype) is caused by mutations in either of two genes, then mixing lod scores will confound the analysis (called heterogeneity)

Autosomal Recessive

Use IBD Mapping

Look for a region that is homozygous in affected individuals and not homozygous in unaffected individuals

IBD (identity by descent) preserves the haplotype

Similar principle as Linkage Disequilibrium

Except it is with individuals, not populations
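The search for a shared homozygous region can be sketched as a simple filter over markers. Everything below is hypothetical: the marker names, the genotypes, and the helper functions are invented for illustration, and real analyses work with many more markers and score runs of adjacent homozygous markers.

```python
# Hypothetical genotypes: one (allele, allele) pair per marker, in map order.
markers = ["M1", "M2", "M3", "M4", "M5"]
affected = [
    [(1, 2), (2, 2), (1, 1), (3, 3), (1, 2)],
    [(1, 1), (2, 2), (1, 1), (3, 3), (2, 3)],
]
unaffected = [
    [(1, 2), (2, 2), (1, 2), (1, 3), (1, 2)],
    [(2, 2), (1, 2), (2, 2), (2, 3), (1, 1)],
]


def shared_homozygous_allele(genotypes):
    """Return the allele if every genotype is homozygous for the same allele, else None."""
    first = genotypes[0]
    if first[0] != first[1]:
        return None
    if all(genotype == first for genotype in genotypes):
        return first[0]
    return None


def candidate_markers(markers, affected, unaffected):
    """Markers homozygous for one allele in all affected and in no unaffected person."""
    hits = []
    for i, name in enumerate(markers):
        allele = shared_homozygous_allele([person[i] for person in affected])
        if allele is None:
            continue  # affected individuals do not share a homozygous genotype
        if any(person[i] == (allele, allele) for person in unaffected):
            continue  # an unaffected person has the same homozygous genotype
        hits.append(name)
    return hits


region = candidate_markers(markers, affected, unaffected)
print(region)
```

In this toy data set the adjacent markers M3 and M4 pass the filter, which is the pattern expected when a haplotype has been inherited intact from a common ancestor.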

Disequilibrium Mapping

A way to map genes using populations

Instead of using pedigrees, use all of the patients

We are interested in haplotypes

GAAAGGAAAAGAAGATTTACTTCC[1396bp]GAAGCTCAGAAAGGCGATAATATAAAAAATAT[2502bp]TTGGGAATTTACAGAATAC

Haplotype 3

GAAAGGAAAAGAAGATTTCCTTCC[1396bp]GAAGCTCAGAAAGGTGATAATATAAAAAATAT[2502bp]TTGGGAATTTACAGAATAC

GAAAGGAAAAGAAGATTTACTTCC[1396bp]GAAGCTCAGAAAGGCGATAATATAAAAAATAT[2502bp]TTGGGAATTTACAAAATAC

Three variable positions, each with 2 alleles

Haplotype 2

Haplotype 1

A B C D E

A1 B1 C1 D1 E1

A2 B2 C2 D2 E2

Consider five loci each with two alleles

How many haplotypes?

Individual = two haplotypes:

A1 B2 C1 D2 E1

A2 B2 C1 D1 E2

In theory there are 2^5 (32) possibilities IF the combinations are independent

In practice, far fewer (5-10 in sub-Mb distances)
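The theoretical count can be checked by enumerating every combination of alleles. This is a small sketch using the five loci A to E with alleles numbered 1 and 2, as in the slides.

```python
from itertools import product

loci = ["A", "B", "C", "D", "E"]

# Every way of choosing allele 1 or 2 independently at each of the five loci.
all_haplotypes = [
    " ".join(f"{locus}{allele}" for locus, allele in zip(loci, alleles))
    for alleles in product((1, 2), repeat=len(loci))
]

print(len(all_haplotypes))  # 2**5 = 32
print(all_haplotypes[0])    # A1 B1 C1 D1 E1
```

The enumeration assumes independence between loci; the much smaller number seen in practice reflects the fact that most of these combinations never arose.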

A B C D E

A1 B1 C1 D1 E1

A2 B2 C2 D2 E2

Some SNPs are “old”

Example: A1 and A2, D1 and D2. If they are in Hardy-Weinberg equilibrium, then there are 4 haplotypes:

A1 D1, A1 D2, A2 D1, A2 D2

A new SNP arises (B2), but in just one haplotype

New haplotype: A1 B2 D1

Existing haplotypes: A1 B1 D1, A1 B1 D2, A2 B1 D1, A2 B1 D2

WHY?

A1 B2 C1 D1 E1, A1 B1 C1 D1 E1, A1 B1 C1 D2 E1, A2 B1 C1 D1 E1, A2 B1 C1 D2 E1

Even later, two new SNPs arise (C2 and E2)

A1 B1 C2 D1 E1

A2 B1 C1 D1 E2

So we end up with a total of 7 haplotypes for 5 SNPs
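The history described above can be replayed step by step to confirm the count of 7. This sketch encodes each haplotype as a tuple of allele numbers at loci A to E; the helper `mutate` is a made-up name, and recombination is ignored, as the slides assume for short distances.

```python
A, B, C, D, E = range(5)  # positions of the five loci within a haplotype tuple


def mutate(haplotype, locus_index, new_allele=2):
    """Return a copy of the haplotype with one locus changed to the new allele."""
    changed = list(haplotype)
    changed[locus_index] = new_allele
    return tuple(changed)


# Old SNPs A and D: all four combinations exist; B, C and E still carry allele 1.
haplotypes = {(a, 1, 1, d, 1) for a in (1, 2) for d in (1, 2)}

# Each new SNP arises once, on a single existing haplotype.
haplotypes.add(mutate((1, 1, 1, 1, 1), B))  # A1 B2 C1 D1 E1
haplotypes.add(mutate((1, 1, 1, 1, 1), C))  # A1 B1 C2 D1 E1
haplotypes.add(mutate((2, 1, 1, 1, 1), E))  # A2 B1 C1 D1 E2

print(len(haplotypes))  # 7, far fewer than the 32 possible combinations
```

Each new mutation adds exactly one haplotype because it lands on one existing background, which is why the new allele stays correlated with its neighbours.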

There is a possibility of recombination between SNPs

However, this is very slow and improbable, especially for short distances

Now consider that a disease mutation arises between C and D

Just like the SNPs, it is likely to have arisen once

And it is in only one of the common 7 haplotypes

Therefore the SNP alleles in that haplotype are correlated with the mutation

This is the principle of DISEQUILIBRIUM MAPPING

It depends on:

1. Age of the mutation

2. Age of the SNPs in the haplotype

3. Age of the population

4. Frequency of recombination between the SNPs (the distance between them)

Disequilibrium mapping is particularly useful when:

There is a relatively new disease mutation

Relatively isolated (and hopefully new) population (Finland)

Population allele frequencies

Allele   A     B     C     D     E
1        0.3   0.2   0.8   0.7   0.2
2        0.7   0.8   0.2   0.3   0.8

If equilibrium, patients should have same allele frequencies

If disequilibrium, patients should have increased frequencies near the disease gene

The degree of deviation should be maximal near the disease gene

Simple case:

An autosomal dominant disease mutation (*) arises between C and D on a particular haplotype:

A1 B2 C1 * D2 E1

Allele frequencies:

                                     A     B     C     D     E
Population                           0.3   0.8   0.3   0.3   0.2
Patients, a few generations later    1     1     1     1     1
Patients, over time                  0.6   0.9   1     1     0.9
Patients, later                      0.4   0.8   1     1     0.7

[Figure: deviation from population frequency plotted against distance along the chromosome; the deviation peaks at the disease gene]
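The deviation plot can be reproduced numerically from the allele frequencies given for the simple case. This sketch uses the population frequencies and the latest patient frequencies quoted in the slides; the function names are invented for illustration, and differences are rounded to two decimals to avoid floating-point noise.

```python
loci = ["A", "B", "C", "D", "E"]
population = [0.3, 0.8, 0.3, 0.3, 0.2]
patients_later = [0.4, 0.8, 1, 1, 0.7]


def deviation(patients, population):
    """Patient allele frequency minus population allele frequency at each locus."""
    return [round(p - q, 2) for p, q in zip(patients, population)]


def peak_loci(deviations, loci):
    """Loci where the deviation from the population frequency is largest."""
    top = max(deviations)
    return [locus for locus, d in zip(loci, deviations) if d == top]


dev = deviation(patients_later, population)
peak = peak_loci(dev, loci)

for locus, d in zip(loci, dev):
    print(f"{locus}: {d:+.1f}")
print("peak at:", peak)
```

The largest deviation sits at C and D, the two markers flanking the mutation, while markers further away have drifted back toward the population frequencies through recombination.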

So this is it……………….

How do we find the gene and the mutation?

We need to correlate the genetic map (for example, distance in cM) with the physical map (DNA)

Most important to have the physical map annotated

[Figure: position of the disease gene along the chromosome]

All methods give a map location (maximum likelihood)

Point your browser to genome.ucsc.edu

Identify the genes in the interval

Look for best candidate

Expression data (is the gene expressed in affected tissue?)

Is expression of the gene affected in patients?

Ultimately we must search for mutations

DNA sequencing is best (SSCP is usually done first)

Does the mutation make sense?

For example, recessive = loss of function

SNP chips: lots of possibilities

Great Dane x Mexican Chihuahua

F1 Big (Great Danes)

3 Big : 1 Small?

Not likely... It is the sum total of many genes (multigenic)

Many human disorders, conditions and predispositions are multigenic

Twin studies where identical twins are raised together or raised apart

Look at complex behaviors and ask if they are genetic or environmental

Answer: For almost every single behavior, it's a little of both

“Heritability”: the fraction of the condition that is genetic

But how many genes?

Association studies: use SNP chips and the awesome power of computational biology
