Summary Lecture

This class has been edited from several sources, primarily Terry Speed’s homepage at Stanford and the Technion course “Introduction to Genetics”, and several other courses as specified on some slides. Changes made by Dan Geiger.

Purpose of human linkage analysis

To obtain a crude chromosomal location of the gene or genes associated with a phenotype of interest, e.g. a genetic disease or an important quantitative trait.

Examples: cystic fibrosis (found), diabetes, Alzheimer's disease, and blood pressure.

Linkage Strategies I

Traditional (from the 1980s or earlier):

• Linkage analysis on pedigrees

• Association studies: candidate genes

• Allele-sharing methods: affected siblings

• Animal models: identifying candidate genes

• Cell hybrids

Newer (from the 1990s):

• Focus on special populations (Finland, Hutterites)

• Haplotype-sharing (many variants)

Linkage Strategies II

On the horizon (here):

• Single-nucleotide polymorphisms (SNPs)

• Functional analyses: finding candidate genes

Needed (starting to happen):

• New multilocus analysis techniques, especially ways of dealing with large pedigrees

• Better phenotypes: ones closer to gene products

• Large collaborations

Horses for courses

Each of these strategies has its domain of applicability

Each of them has a different theoretical basis and method of analysis

Which is appropriate for mapping genes for a disease of interest depends on a number of matters, most importantly the disease, and the population from which the sample comes.

The disease matters

Definition (phenotype), prevalence, features such as age at onset

Genetics: nature of genes (penetrance), number of genes, nature of their contributions (additive, interacting), size of effect

Other relevant variables: sex, obesity, etc.

Genotype-by-environment interactions, e.g. exposure to sun.

Example: Age at onset

The population matters

History: pattern of growth, immigration

Composition: homogeneous or melting pot, or in between

Mating patterns: family sizes, mate choice

Frequencies of disease-related alleles, and of marker alleles

Ages of disease-related alleles

Complex traits

Definition vague, but usually thought of as having multiple, possibly interacting loci, with unknown penetrances; and phenocopies.

Affected-only methods are widely used. The jury is still out on which, if any, will succeed.

Few success stories so far.

Important: heart disease, cancer susceptibility, diabetes, … are all “complex” traits.

We focused more on simple traits, where success has been demonstrated very often. About 6-8 percent of human diseases are thought to be simple Mendelian diseases.

Design of gene mapping studies

How good are your data implying a genetic component to your trait? Can you estimate the size of the genetic component?

Have you got, or will you eventually have enough of the right sort of data to have a good chance of getting a definitive result?

Power studies.

Simulations.

Genotyping

Choice of markers: highly polymorphic preferred.

Heterozygosity and polymorphism information content (PIC) value are measures commonly used.

Reliability of markers is important too.

Good quality data critical: errors can play a surprisingly large role.

A person is said to be typed if his or her markers have been genotyped.
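The slide names heterozygosity and PIC without giving formulas. The sketch below assumes the commonly used definitions, which are not stated on the slides: expected heterozygosity H = 1 − Σ p_i², and PIC = H − Σ_{i<j} 2 p_i² p_j² (the form usually attributed to Botstein et al., 1980). The function names and the allele frequencies are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content, assuming the usual definition:
    heterozygosity minus sum over allele pairs i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


# Hypothetical allele frequencies (illustration only, not real marker data).
two_allele = [0.5, 0.5]
four_allele = [0.25, 0.25, 0.25, 0.25]

for name, freqs in (("2 alleles", two_allele), ("4 alleles", four_allele)):
    print(name, "H =", heterozygosity(freqs), "PIC =", pic(freqs))
```

Under these definitions both measures increase with the number of equally frequent alleles, which is one way to read the preference for highly polymorphic markers.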

Preparing genotype data for analysis

Data cleaning is the big issue here.

Need much ancillary data…how good is it?

Analysis

A very large range of methods/programs are available.

Effort to understand their theory will pay off in leading to the right choice of analysis tools.

Trying everything is not recommended, but not uncommon.

Many opportunities for innovation.

Interpretation of results of analysis

An important issue here is whether you have established linkage. The standards seem to be getting increasingly stringent.

What p-value or LOD should you use?

Dealing with multiple testing, especially in the context of genome scans and the use of multiple models and multiple phenotypes, is one of the big issues. E.g., Bonferroni correction.
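To give a feel for how LOD scores and p-values relate, here is a rough sketch. It assumes the standard large-sample approximation, which is not stated on the slides: under no linkage, 2·ln(10)·LOD is treated as chi-square with 1 degree of freedom, and the test is one-sided. The function name is hypothetical, and the result is a pointwise p-value, before any correction for multiple testing.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a positive LOD score.

    Assumption: 2 * ln(10) * LOD follows a chi-square distribution with
    1 degree of freedom under the null hypothesis of no linkage.
    """
    if lod <= 0:
        raise ValueError("this approximation is only used for LOD > 0")
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    two_sided = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * two_sided


for lod in (1.0, 2.0, 3.0):
    print(lod, lod_to_pointwise_p(lod))
```

Under this approximation a LOD of 3 corresponds to a pointwise p-value on the order of 0.0001, far stricter than 0.05, which is one reason genome scans need more than the conventional threshold.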

Problem with standard P-values

If a single test were employed to test a null hypothesis at the 0.05 significance level, and the null hypothesis were actually true, the probability of reaching the right conclusion (i.e., not significant) would be 0.95.

If two such hypotheses were tested independently, the probability of reaching the right conclusion (i.e., not significant) on both occasions would be $0.95 \times 0.95 \approx 0.90$.

If more hypotheses ($n$) were tested and all of the null hypotheses were in fact true, the probability of being right on all occasions would decrease substantially ($0.95^n$).

In other words, the probability of being wrong at least once (or getting a significant result erroneously) would increase drastically ($1 - 0.95^n$).

Put simply, by running more tests on a given data set, there is an increasing likelihood of getting a significant result by chance alone.

Source: http://www.edu.rcsed.ac.uk/statistics/the%20bonferroni%20correction.htm
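The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch that assumes the tests are independent and that every null hypothesis is true; the function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of at least one erroneous 'significant' result among
    n_tests independent tests when all null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 2, 10, 100):
    print(n, round(prob_at_least_one_false_positive(n), 4))
```

With 10 tests the chance of at least one false positive is already about 40 percent, and with 100 tests it is close to certainty.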

The Bonferroni Correction for Non-statisticians

The Bonferroni correction for multiple significance testing is simply to multiply the p value by the number of tests k carried out. The corrected value kp is then compared against the level of 0.05 to decide if it is significant. If the corrected value is still less than 0.05, only then is the null hypothesis rejected.
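The rule above can be sketched as follows. The function name and the p-values are hypothetical, and capping the corrected value at 1 is an added convenience (so that it still reads as a probability), not something stated on the slide.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of tests k and compare the
    corrected value kp with alpha, as described on the slide."""
    k = len(p_values)
    # Cap at 1.0: an assumption added here, not part of the slide's rule.
    corrected = [min(1.0, k * p) for p in p_values]
    reject = [kp < alpha for kp in corrected]
    return corrected, reject


# Hypothetical p-values from k = 4 tests (illustration only).
p_values = [0.001, 0.02, 0.04, 0.30]
corrected, reject = bonferroni(p_values)
for p, kp, r in zip(p_values, corrected, reject):
    print(p, kp, "reject" if r else "do not reject")
```

In this made-up example three of the four uncorrected p-values are below 0.05, but only the smallest one survives the correction.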


Some Problems with the Bonferroni Correction [1]

1. The correction is intended for independent tests, not for dependent ones.

2. If one carries out multiple tests on a single set of data, the interpretation of a single relationship between two variables (or the p value) would actually depend on how many other tests were performed.

3. It is perhaps too cautious, which means that significant results are lost and the power of the study is reduced.

4. If the Bonferroni correction were made universal, authors wishing to keep their results significant might leave out many tests that gave non-significant results, and so would not apply the correction to the extent they should.

Also, tests published in other papers on the same set of patients, or tests done subsequently, would need to be corrected to take into account the number of previous tests.

Source (modified from): http://www.edu.rcsed.ac.uk/statistics/the%20bonferroni%20correction.htm

When to use the Bonferroni Correction?

Because of the above problems and the disagreements among statisticians over its universal use, the use of the Bonferroni correction may best be limited to instances like

• a group of cases and controls subjected to a number of independent tests of associations with different biological parameters

• the same test being repeated in many subsamples, such as when stratified by age, sex, income status, etc.

Even in these instances, if there is a biological explanation for rejecting the null hypothesis and only the uncorrected p value is significant, but kp is not, one is allowed to conclude (with appropriate explanations, of course!) that the findings are significant.


References on Bonferroni and other multiple test procedures

1. Perneger, T.V., What’s wrong with Bonferroni adjustments. BMJ, 1998. 316(7139): p. 1236-1238.

2. Bender, R. and S. Lange, Multiple test procedures other than Bonferroni’s deserve wider use. BMJ, 1999. 318(7183): p. 600-601.

3. Sankoh, A.J., M.F. Huque, and S.D. Dubey, Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Stat Med, 1997. 16(22): p. 2529-2542.

