Statistical Genetics Matt McQueen Assistant Professor Institute for Behavioral Genetics University of Colorado at Boulder

Page 1: Statistical Genetics - University of Colorado Boulder

Statistical Genetics

Matt McQueen
Assistant Professor

Institute for Behavioral Genetics
University of Colorado at Boulder

Page 2: Statistical Genetics - University of Colorado Boulder

Why am I here?

Statistical Genetics - Biodemography

Page 3: Statistical Genetics - University of Colorado Boulder

Perspectives…

Page 4: Statistical Genetics - University of Colorado Boulder

Perspectives…

Epidemiology

Page 5: Statistical Genetics - University of Colorado Boulder

Perspectives…

Biostatistics

Page 6: Statistical Genetics - University of Colorado Boulder

Perspectives…

Health Policy

Page 7: Statistical Genetics - University of Colorado Boulder

Perspectives…

Environmental Health

Page 8: Statistical Genetics - University of Colorado Boulder

Perspectives…

Society, Human Development and Health

Page 9: Statistical Genetics - University of Colorado Boulder

The View from Here…

GENES Outcome

Environment

Page 10: Statistical Genetics - University of Colorado Boulder

Overview

Background and Introduction

Linkage and Linkage Disequilibrium

Population Genetics

Linkage Analysis

Association Analysis

Page 11: Statistical Genetics - University of Colorado Boulder

Overview

Background and Introduction

Linkage and Linkage Disequilibrium

Population Genetics

Linkage Analysis

Association Analysis

Page 12: Statistical Genetics - University of Colorado Boulder

Statistical Genetics

Otherwise known as:
- Genetic Epidemiology
- Genetic Statistics

By definition, “integrative”
- Combines epidemiological, statistical, clinical, genetic and molecular approaches

Page 13: Statistical Genetics - University of Colorado Boulder

Genetic Discovery

Evidence for genetic effects? Familial aggregation

Mode of inheritance? Segregation Analysis

Where in the region? Fine Mapping

What chromosome / region? Linkage Analysis

What gene? Association Analysis

What is the effect of the gene? Characterization

Page 14: Statistical Genetics - University of Colorado Boulder

Why Hunt for Genes?

Page 15: Statistical Genetics - University of Colorado Boulder

Why Hunt for Genes?

Disease etiology

Page 16: Statistical Genetics - University of Colorado Boulder

Why Hunt for Genes?

Disease etiology

Refined diagnosis and/or prognosis

Page 17: Statistical Genetics - University of Colorado Boulder

Why Hunt for Genes?

Disease etiology

Refined diagnosis and/or prognosis

Drug development

Page 18: Statistical Genetics - University of Colorado Boulder

Why Hunt for Genes?

Disease etiology

Refined diagnosis and/or prognosis

Drug development

Disease prediction

Page 19: Statistical Genetics - University of Colorado Boulder

Challenges

Page 20: Statistical Genetics - University of Colorado Boulder

Challenges

Field is young and changes rapidly
- Technology drives the science
- We test because we can

Page 21: Statistical Genetics - University of Colorado Boulder

Challenges

Literature can be difficult
- Statisticians writing genetic papers
- Geneticists writing statistical papers

Page 22: Statistical Genetics - University of Colorado Boulder

Challenges

Software typically not well-tested or supported
- The cost of being “free”
- Use at your own risk!

Page 23: Statistical Genetics - University of Colorado Boulder

Challenges

Methods are often oversold
- Consequence of high-pressure field
- Rapid development creates sense of urgency

Page 24: Statistical Genetics - University of Colorado Boulder

Some Terminology

Locus
- A location in the genome

Gene
- A DNA segment characterized by sequence, transcription or homology

Allele
- Different forms of a gene: A, a; B, b

Polymorphism
- Allele present in the population with > 5% freq

Mutation
- Allele present in the population with < 5% freq

Page 25: Statistical Genetics - University of Colorado Boulder

Some Terminology

Phenotype
- Any measurable outcome

Quantitative Trait Locus (QTL)
- A region (gene) that contributes to a phenotype

Penetrance (binary, disease phenotypes)
- Prob(Phenotype | Genotype)

Heritability (quantitative traits)
- Proportion of phenotypic variance explained by genetic factors

Mendelian Disorder
- Disease influenced by a single gene

Complex Trait
- Disease influenced by multiple genes and environment

Page 26: Statistical Genetics - University of Colorado Boulder

Pedigree Notation

Male

Female

Founders

Page 27: Statistical Genetics - University of Colorado Boulder

Pedigree Notation

Male

Female

Affected

Page 28: Statistical Genetics - University of Colorado Boulder

Pedigree Notation

Male

Female

Affected

Deceased

Page 29: Statistical Genetics - University of Colorado Boulder

Mendel’s Laws

Page 30: Statistical Genetics - University of Colorado Boulder

Mendel’s Laws

Mendel’s First Law
- Segregation (each parent transmits one of its two alleles, chosen at random)

              Mom
              G     g
Dad    G      GG    Gg
       g      Gg    gg

Page 31: Statistical Genetics - University of Colorado Boulder

Mendel’s Laws

Mendel’s Second Law
- Independent Assortment

Page 32: Statistical Genetics - University of Colorado Boulder

Mendel’s Laws

What do peas have to do with people?
- Underlying principles of statistical genetics!

Page 33: Statistical Genetics - University of Colorado Boulder

Overview

A Brief Introduction

Linkage and Linkage Disequilibrium

Population Genetics

Linkage Analysis

Association Analysis

Page 34: Statistical Genetics - University of Colorado Boulder

Linkage

Page 35: Statistical Genetics - University of Colorado Boulder

Linkage

General Idea:
- Describes the relationship between two loci
- If two loci are close in proximity:
  - “linked”
- If two loci are far apart (different chromosomes):
  - “not linked”

Page 36: Statistical Genetics - University of Colorado Boulder

Recombinant Events

Page 37: Statistical Genetics - University of Colorado Boulder

Recombination

Parental haplotypes: A1-B1 and A2-B2

Page 38: Statistical Genetics - University of Colorado Boulder

Recombination

Parental haplotypes: A1-B1 and A2-B2

Gametes: A1B1, A2B2, A2B1, A1B2

Page 39: Statistical Genetics - University of Colorado Boulder

Recombination

Parental haplotypes: A1-B1 and A2-B2

θ = Recombination Rate

Gametes       A1B1             A2B2             A2B1          A1B2
Probability   $(1-\theta)/2$   $(1-\theta)/2$   $\theta/2$    $\theta/2$

Page 40: Statistical Genetics - University of Colorado Boulder

No Linkage

Parental haplotypes: A1-B1 and A2-B2

θ ~ 0.5

Gametes       A1B1    A2B2    A2B1    A1B2
Probability   1/4     1/4     1/4     1/4

Page 41: Statistical Genetics - University of Colorado Boulder

Where have we seen this before?

Mendel’s Second Law
- Independent Assortment

Page 42: Statistical Genetics - University of Colorado Boulder

Linkage

Parental haplotypes: A1-B1 and A2-B2

θ ~ 0

Gametes       A1B1    A2B2
Probability   1/2     1/2
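The three recombination slides above can be tied together with a small sketch. It assumes, as in the slides, a doubly heterozygous parent carrying haplotypes A1-B1 and A2-B2; the function name `gamete_probabilities` is hypothetical and chosen only for illustration.

```python
def gamete_probabilities(theta):
    """Gamete probabilities for a parent with haplotypes A1-B1 and A2-B2.

    theta is the recombination rate between the two loci (0 <= theta <= 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinant = (1.0 - theta) / 2.0  # A1B1 and A2B2
    recombinant = theta / 2.0              # A2B1 and A1B2
    return {
        "A1B1": non_recombinant,
        "A2B2": non_recombinant,
        "A2B1": recombinant,
        "A1B2": recombinant,
    }


# theta = 0.5 gives the "No Linkage" slide, theta = 0 the "Linkage" slide.
for theta in (0.0, 0.1, 0.5):
    print(theta, gamete_probabilities(theta))
```

With theta = 0.5 every gamete has probability 1/4 (independent assortment); with theta = 0 only the two non-recombinant gametes occur, each with probability 1/2.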

Page 43: Statistical Genetics - University of Colorado Boulder

Recombination Events

What predicts a recombination event?

What drives the recombination fraction?

Page 44: Statistical Genetics - University of Colorado Boulder

Genetic Distance

Definition:
- The expected number of crossover events between two loci

Units:
- Morgans
- 1 Morgan = 1 crossover event expected per meiosis (1 Morgan = 100 cM)

Genetic Map
- A linearly arranged set of loci with genetic distances between them
- Human Autosomes ~ 3900 cM

Page 45: Statistical Genetics - University of Colorado Boulder

Linkage Disequilibrium

Page 46: Statistical Genetics - University of Colorado Boulder

Linkage Disequilibrium

General Idea:
- Describes the relationship between alleles at two loci
- If the alleles at the two loci occur together more (or less) often than expected by chance:
  - “in linkage disequilibrium”

Page 47: Statistical Genetics - University of Colorado Boulder

Linkage Disequilibrium

Gametes     A1B1   A1B2   A2B1   A2B2
Frequency   x1     x2     x3     x4

Page 48: Statistical Genetics - University of Colorado Boulder

Linkage Disequilibrium

Gametes     A1B1   A1B2   A2B1   A2B2
Frequency   x1     x2     x3     x4

Allele      A1           A2           B1           B2
Frequency   pA1=x1+x2    pA2=x3+x4    pB1=x1+x3    pB2=x2+x4

Page 49: Statistical Genetics - University of Colorado Boulder

Linkage Disequilibrium

Gametes     A1B1   A1B2   A2B1   A2B2
Frequency   x1     x2     x3     x4

Allele      A1           A2           B1           B2
Frequency   pA1=x1+x2    pA2=x3+x4    pB1=x1+x3    pB2=x2+x4

D = Observed - Expected

$D = x_1 - p_{A1} p_{B1}$

$D = x_1 - (x_1 + x_2)(x_1 + x_3)$

$D = x_1 x_4 - x_2 x_3$ (using $x_1 + x_2 + x_3 + x_4 = 1$)

Page 50: Statistical Genetics - University of Colorado Boulder

Another Common LD Metric

$r^2 = \dfrac{D^2}{p_{A1}\, p_{A2}\, p_{B1}\, p_{B2}}$
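As a worked illustration of D and r2, the sketch below computes both from the four gamete frequencies x1..x4 defined on the preceding slides. The function name `ld_measures` and the example frequencies are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def ld_measures(x1, x2, x3, x4):
    """Return (D, r2) from gamete frequencies of A1B1, A1B2, A2B1, A2B2."""
    if abs((x1 + x2 + x3 + x4) - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    p_a1, p_a2 = x1 + x2, x3 + x4   # allele frequencies at locus A
    p_b1, p_b2 = x1 + x3, x2 + x4   # allele frequencies at locus B
    d = x1 - p_a1 * p_b1            # observed minus expected frequency of A1B1
    denominator = p_a1 * p_a2 * p_b1 * p_b2
    if denominator == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return d, d * d / denominator


# Hypothetical frequencies: A1 tends to travel with B1, and A2 with B2.
d, r2 = ld_measures(0.4, 0.1, 0.1, 0.4)
print(d, r2)

# The two forms of D on the slide agree: x1*x4 - x2*x3.
print(0.4 * 0.4 - 0.1 * 0.1)
```

For these frequencies D is 0.15 and r2 is 0.36; with all four gametes at 0.25, D is 0 (linkage equilibrium).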

Page 51: Statistical Genetics - University of Colorado Boulder

Reasons for LD

Mutation

Population Subdivision

Genetic Drift

Lack of Recombination

Selection

Non-Random Mating

Page 52: Statistical Genetics - University of Colorado Boulder

How does linkage relate to linkage disequilibrium?

Page 53: Statistical Genetics - University of Colorado Boulder

Linkage and LD

$D_t = (1-\theta)^t D_0$

After t generations of random mating…

LD is a function of recombination and time (generations)
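A minimal sketch of the decay relation, assuming random mating as stated on the slide. The function name `ld_after_generations` and the starting value of D are illustrative only.

```python
def ld_after_generations(d0, theta, t):
    """LD remaining after t generations of random mating: (1 - theta)**t * d0."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return (1.0 - theta) ** t * d0


d0 = 0.20  # hypothetical starting disequilibrium
for theta in (0.5, 0.1, 0.01):
    decay = [round(ld_after_generations(d0, theta, t), 4) for t in (0, 1, 10, 100)]
    print(f"theta={theta}: {decay}")
```

Unlinked loci (theta = 0.5) lose half of their LD every generation, while tightly linked loci keep it for many generations, which is why LD can be used for fine-scale mapping.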

Page 54: Statistical Genetics - University of Colorado Boulder

Linkage and LD

Key Concepts…
- Linkage : Location
- LD : Alleles
- There can be Linkage without LD
- There can be LD without Linkage

Page 55: Statistical Genetics - University of Colorado Boulder

Overview

Background and Introduction

Linkage and Linkage Disequilibrium

Population Genetics

Linkage Analysis

Association Analysis

Page 56: Statistical Genetics - University of Colorado Boulder

DNA Variation

DNA
- Adenine (A)
- Guanine (G)
- Cytosine (C)
- Thymine (T)

DNA double helix
- A pairs with T and G pairs with C

Codons
- Triplets of bases
- 64 possible codons
- 20 amino acids

Page 57: Statistical Genetics - University of Colorado Boulder

Mutations

Point
- Substitute one base for another

Deletions
- Base removed entirely

Insertions
- Base inserted

Duplications
- Base and/or sequence duplicated

Page 58: Statistical Genetics - University of Colorado Boulder

Mutations

Point
- Substitute one base for another

Deletions
- Base removed entirely

Insertions
- Base inserted

Duplications
- Base and/or sequence duplicated

Page 59: Statistical Genetics - University of Colorado Boulder

More on Point Mutations

Point Mutations
- Synonymous
  - No change in amino acid
- Nonsynonymous
  - Amino acid change
- Creates a new polymorphic site
  - “Single Nucleotide Polymorphism” (SNP)

Page 60: Statistical Genetics - University of Colorado Boulder

Mutation Becomes Polymorphism

Infinite Sites Model
- Each mutation creates a unique polymorphic site
- Mutation rate ~ $10^{-6}$

Page 61: Statistical Genetics - University of Colorado Boulder

Life After Mutation

Mutation is neutral
- Random Genetic Drift
- Eventually, the allele will “drift” out

Mutation is harmful
- Selective Pressure
- Allele may quickly disappear

Mutation is beneficial
- Selective Pressure
- Allele frequency may increase rapidly

Page 62: Statistical Genetics - University of Colorado Boulder

Human Genetic History

Page 63: Statistical Genetics - University of Colorado Boulder

Human Genetic History

National Geographic: The Genographic Project

Page 64: Statistical Genetics - University of Colorado Boulder

Human Genetic History

National Geographic: The Genographic Project

Page 65: Statistical Genetics - University of Colorado Boulder

Human Genetic History

National Geographic: The Genographic Project

Page 66: Statistical Genetics - University of Colorado Boulder

Human Genetic History

National Geographic: The Genographic Project

Page 67: Statistical Genetics - University of Colorado Boulder

Who Are We?

[Figure: genealogy of sampled sequences plotted against time]

Page 68: Statistical Genetics - University of Colorado Boulder

Who Are We?

[Figure: genealogy of sampled sequences plotted against time]

Page 69: Statistical Genetics - University of Colorado Boulder

Who Are We?

[Figure: genealogy plotted against time, with lineages coalescing at the MRCA]

Page 70: Statistical Genetics - University of Colorado Boulder

Who Are We?

All DNA sequences are derived from others
- Every sample has a genealogy

Eventually, all lineages coalesce
- Most Recent Common Ancestor (MRCA)

The “older” the genetic history…
- The less observed LD (Africans vs European)

The more isolated genetic history…
- The more observed LD (Mayan)

Page 71: Statistical Genetics - University of Colorado Boulder

Who Are We?

What does this have to do with gene-mapping?

Balding (2006) Nat Rev Genet.

Page 72: Statistical Genetics - University of Colorado Boulder

Overview

Background and Introduction

Linkage and Linkage Disequilibrium

Population Genetics

Linkage Analysis

Association Analysis

Page 73: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

Gene-Mapping
- Exploit the Properties of Linkage
- Using an observed locus (marker) to draw inferences about an unobserved locus (disease gene)

Family-Based Design
- Extended (grandparents, parents and kids)
- Nuclear (parents and kids)
- Sibling Pair (kids only, no parents)

Goal: Find genomic region “linked” to disease

Page 74: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

[Figure: genetic map from 0 to 70 cM (genetic distance) with genetic markers M1 to M8 and an unobserved disease gene]

Page 75: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

[Figure: genetic map from 0 to 70 cM with markers M1 to M8 and the disease gene]

Page 76: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

[Figure: genetic map from 0 to 70 cM with markers M1 to M8, the disease gene, and the linkage region]

Page 77: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

Parametric
- Affected / Unaffected
- Observed recombination events

Non-Parametric
- Affected / Unaffected
- Identity-by-Descent (IBD)

“Semi-Parametric”
- Quantitative
- IBD

MCMC
- Any phenotype
- IBD

Page 78: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

Parametric
- Affected / Unaffected
- Observed recombination events

Non-Parametric
- Affected / Unaffected
- Identity-by-Descent (IBD)

“Semi-Parametric”
- Quantitative
- IBD

MCMC
- Any phenotype
- IBD

Page 79: Statistical Genetics - University of Colorado Boulder

Linkage Analysis

Key Concepts
- Allele Sharing (IBS and IBD)
- Linkage Statistics (LOD Score, etc.)

Page 80: Statistical Genetics - University of Colorado Boulder

Allele Sharing

Page 81: Statistical Genetics - University of Colorado Boulder

Identity by State (IBS)

ac bd

IBS = 0

How many alleles are in common?

Page 82: Statistical Genetics - University of Colorado Boulder

Identity by State (IBS)

ac ad

IBS = 1

How many alleles are in common?

Page 83: Statistical Genetics - University of Colorado Boulder

Identity by State (IBS)

ac ac

IBS = 2

How many alleles are in common?
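The IBS counts on the three slides above can be reproduced with a short sketch. It treats each genotype as an unordered pair of single-letter alleles; the function name `ibs` is hypothetical.

```python
from collections import Counter


def ibs(genotype_1, genotype_2):
    """Number of alleles (0, 1 or 2) two genotypes share identical by state."""
    # Counter intersection keeps, for each allele, the smaller of the two counts,
    # so a homozygote "aa" shares only one allele with a heterozygote "ab".
    shared = Counter(genotype_1) & Counter(genotype_2)
    return sum(shared.values())


for pair in (("ac", "bd"), ("ac", "ad"), ("ac", "ac")):
    print(pair, "IBS =", ibs(*pair))
```

IBS only compares allele labels. IBD additionally requires knowing which parental chromosome each allele came from, which is why the later slides show the parents as well.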

Page 84: Statistical Genetics - University of Colorado Boulder

Identity by Descent (IBD)

Parents: ab, cd

Siblings: ac, bd

IBD = 0

How many alleles are common by descent?

Page 85: Statistical Genetics - University of Colorado Boulder

Identity by Descent (IBD)

Parents: ab, cd

Siblings: ac, ad

IBD = 1

How many alleles are common by descent?

Page 86: Statistical Genetics - University of Colorado Boulder

Identity by Descent (IBD)

Parents: ab, cd

Siblings: ac, ac

IBD = 2

How many alleles are common by descent?

Page 87: Statistical Genetics - University of Colorado Boulder

IBS and IBD

Parents: ab, cd

Siblings: ac, bd

IBS = 0, IBD = 0

Page 88: Statistical Genetics - University of Colorado Boulder

IBS and IBD

Parents: ab, cd

Siblings: ac, ad

IBS = 1, IBD = 1

Page 89: Statistical Genetics - University of Colorado Boulder

IBS and IBD

Parents: ab, cd

Siblings: ac, ac

IBS = 2, IBD = 2

Page 90: Statistical Genetics - University of Colorado Boulder

Ambiguous IBD

Parents: ab, cb

Siblings: bc, ab

IBS = 1, IBD = 0

Page 91: Statistical Genetics - University of Colorado Boulder

IBD Probabilities

Probability of Sharing IBD Alleles

Relative Pair            π0     π1     π2
MZ Twins                 0      0      1
Full Sibs                0.25   0.50   0.25
Parent-Offspring         0      1      0
First Cousin             0.75   0.25   0
Grandparent-Grandchild   0.50   0.50   0
Half-Sibs                0.50   0.50   0
Avuncular                0.50   0.50   0

Page 92: Statistical Genetics - University of Colorado Boulder

IBD and Sibling Pairs

Probability of Sharing IBD Alleles

Relative Pair            π0     π1     π2
MZ Twins                 0      0      1
Full Sibs                0.25   0.50   0.25
Parent-Offspring         0      1      0
First Cousin             0.75   0.25   0
Grandparent-Grandchild   0.50   0.50   0
Half-Sibs                0.50   0.50   0
Avuncular                0.50   0.50   0

Page 93: Statistical Genetics - University of Colorado Boulder

IBD and Sibling Pairs

Use of Sibling Pairs in linkage analysis
- Affected Sibling Pair (ASP) Design
  - Binary Trait
- Unascertained Sibling Pair Design
  - Quantitative Traits
- Ascertained Sibling Pair Design
  - Quantitative Traits

We look for regions that show deviation of IBD from what is expected under the null

Page 94: Statistical Genetics - University of Colorado Boulder

Linkage Analysis of Sibling Pairs

Basic Idea
- Sibling pairs sharing more alleles IBD than expected at a trait-influencing locus should have more similar phenotypes

Page 95: Statistical Genetics - University of Colorado Boulder

Affected Sibling Pairs

ASP DSP USP

If there is a shared genetic component…

P(IBD=0, IBD=1, IBD=2) ≠ 0.25, 0.50, 0.25

Page 96: Statistical Genetics - University of Colorado Boulder

Affected Sibling Pairs

Number of Alleles Shared IBD

            0     1     2     Total
Observed    20    45    35    100
Expected    25    50    25    100

H0: No Linkage
H1: Linkage
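One simple way to compare the observed and expected counts in this table is a goodness-of-fit chi-square with 2 degrees of freedom. This is a sketch under that assumption, not necessarily the test used in the lecture; dedicated ASP statistics (for example the MLS mentioned later in the deck) are constructed differently. The function name `asp_goodness_of_fit` is hypothetical.

```python
import math

# Counts of affected sib pairs sharing 0, 1, 2 alleles IBD (from the slide).
observed = {0: 20, 1: 45, 2: 35}
# Null (no linkage) IBD probabilities for full sibs.
null_probs = {0: 0.25, 1: 0.50, 2: 0.25}


def asp_goodness_of_fit(observed, null_probs):
    """Pearson chi-square statistic comparing observed IBD counts to the null."""
    n_pairs = sum(observed.values())
    statistic = 0.0
    for shared, prob in null_probs.items():
        expected = n_pairs * prob
        statistic += (observed[shared] - expected) ** 2 / expected
    return statistic


chi_square = asp_goodness_of_fit(observed, null_probs)
# For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-chi_square / 2.0)
print(chi_square, round(p_value, 4))
```

The statistic works out to 1 + 0.5 + 4 = 5.5, giving a p-value of roughly 0.064 under these assumptions. Note the excess sharing of two alleles IBD, which is the direction expected under linkage.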

Page 97: Statistical Genetics - University of Colorado Boulder

Sibling Pairs (Quantitative Traits)

If there is a shared genetic component…

P(IBD=0, IBD=1, IBD=2) ≠ 0.25, 0.50, 0.25

Page 98: Statistical Genetics - University of Colorado Boulder

Quantitative Traits

Haseman-Elston Algorithm
- Calculate number of alleles shared IBD and the squared phenotype difference for each sibpair
- Regress squared differences against IBD sharing

$E(\Delta^2) = \alpha + \beta\pi$

- $\Delta$ = trait difference between sibs
- $\alpha$ = regression intercept
- $\beta$ = slope
- $\pi$ = IBD sharing
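The Haseman-Elston steps can be sketched as an ordinary least-squares fit in plain Python. The data below are invented purely for illustration, IBD sharing is coded as the number of alleles shared (0, 1 or 2), and the function name `haseman_elston` is hypothetical.

```python
def haseman_elston(ibd_sharing, trait_pairs):
    """Regress squared sib-pair trait differences on IBD sharing.

    Returns (alpha, beta): the intercept and slope of E(delta^2) = alpha + beta * pi.
    """
    squared_diffs = [(y1 - y2) ** 2 for y1, y2 in trait_pairs]
    n = len(ibd_sharing)
    mean_x = sum(ibd_sharing) / n
    mean_y = sum(squared_diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_sharing)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ibd_sharing, squared_diffs))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical sib pairs: pairs sharing more alleles IBD have more similar traits.
ibd_sharing = [0, 0, 1, 1, 2, 2]
trait_pairs = [(10, 14), (9, 5), (10, 13), (8, 11), (10, 11), (7, 6)]

alpha, beta = haseman_elston(ibd_sharing, trait_pairs)
print(alpha, beta)
```

A negative slope, as in this toy data set, is the pattern expected when the marker is linked to a locus influencing the trait; a slope near zero is expected under no linkage.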

Page 99: Statistical Genetics - University of Colorado Boulder

Quantitative Traits

[Figure: plot with IBD sharing on the x-axis; fitted slope β = 0]

Page 100: Statistical Genetics - University of Colorado Boulder

Quantitative Traits

[Figure: plot with IBD sharing on the x-axis; fitted slope β < 0]

Page 101: Statistical Genetics - University of Colorado Boulder

Linkage Analysis Statistics

Page 102: Statistical Genetics - University of Colorado Boulder

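The LOD Score

Morton (1955)

Log10 of the ODds for linkage

Essentially a Likelihood Ratio
- Numerator: likelihood of the observed data at the estimated recombination fraction (theta < 0.5)
- Denominator: likelihood of the observed data under no linkage (theta = 0.5)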

Developed in the context of parametric linkage
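A minimal sketch of the LOD score for the simplest parametric setting, assuming phase-known, fully informative meioses in which recombinants can be counted directly. Real pedigrees need full likelihood calculations; the function name `lod_score` and the counts are illustrative only.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(theta = 0.5) for k recombinants in n meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 1 recombinant observed in 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 3))
```

For these counts the LOD is largest near theta = 0.1 (the observed recombinant fraction) and is exactly 0 at theta = 0.5, where the two likelihoods coincide.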

Page 103: Statistical Genetics - University of Colorado Boulder

Common Nonparametric Statistics

Maximum LOD Score
- “MLS” (or MLOD)
- ASP design only
- GENEHUNTER, ASPEX

Nonparametric Linkage Score
- “NPL Score”
- Any family design
- GENEHUNTER

Kong and Cox LOD Score
- “K&C LOD Score”
- Derived from the NPL
- MERLIN, ALLEGRO

Page 104: Statistical Genetics - University of Colorado Boulder

Interpreting Linkage Statistics

Traditional View…
- LOD > 3.0 for genome-wide significance

More Contemporary View…
- Simulate for empirically derived significance

Page 105: Statistical Genetics - University of Colorado Boulder

Examples from the Literature

Linkage Analysis

Page 106: Statistical Genetics - University of Colorado Boulder

Alcoholism

Reich et al (1998) Am J Med Genet (Neuropsychiatric Genetics)

Page 107: Statistical Genetics - University of Colorado Boulder

Antisocial Drug Dependence

Stallings et al (2005) Archives of Gen Psychiatry

Page 108: Statistical Genetics - University of Colorado Boulder

Bipolar Disorder

McQueen et al (2005) Am J Hum Genet

Page 109: Statistical Genetics - University of Colorado Boulder

Overview

Background and Introduction

Linkage and Linkage Disequilibrium

Population Genetics

Linkage Analysis

Association Analysis

Page 110: Statistical Genetics - University of Colorado Boulder

Association Analysis

Page 111: Statistical Genetics - University of Colorado Boulder

Association Analysis

Gene-Mapping
- Exploit the Properties of Linkage Disequilibrium
- Using an observed locus (marker) to draw inferences about an unobserved locus (disease gene)

Fine-Mapping
- Refine a linkage region

Candidate-Gene
- Evaluate the genetic variation as it relates to an outcome

Goal: Find genomic region and/or genes “associated” with disease

Page 112: Statistical Genetics - University of Colorado Boulder

Association Analysis

Family-Based
- Parent/Offspring Trios
- Sibling Pairs
- Nuclear Families
- Extended Pedigrees

Population-Based
- Case-Control
- Cohort

Page 113: Statistical Genetics - University of Colorado Boulder

Association Analysis

Key Concepts
- Genotype Coding
- Population Stratification
- Transmission Disequilibrium Test (TDT)
- Whole Genome Association

Page 114: Statistical Genetics - University of Colorado Boulder

Coding Genotypes

Assume a biallelic marker (SNP)

There are three possible genotypes
- AA
- Aa
- aa

Page 115: Statistical Genetics - University of Colorado Boulder

Coding Genotypes

Coding (for allele A)    AA       Aa       aa
Additive                 2        1        0
Genotype                 1,0,0    0,1,0    0,0,1
Dominant                 1        1        0
Recessive                1        0        0

Page 116: Statistical Genetics - University of Colorado Boulder

Genotype Coding

Marker Score = X
- Additive : X = (0, 1 or 2)
- Dominant : X = (0 or 1)
- Recessive : X = (0 or 1)
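The coding schemes can be written as a single small function. This is a sketch that assumes genotypes are given as two-character strings and that scores are defined with respect to allele A, as in the coding table; the function name `marker_score` is hypothetical.

```python
def marker_score(genotype, allele="A", model="additive"):
    """Marker score X for a biallelic genotype such as "AA", "Aa" or "aa"."""
    copies = genotype.count(allele)  # number of copies of the coded allele
    if model == "additive":
        return copies                # 0, 1 or 2
    if model == "dominant":
        return int(copies >= 1)      # at least one copy
    if model == "recessive":
        return int(copies == 2)      # two copies required
    raise ValueError(f"unknown model: {model}")


for genotype in ("AA", "Aa", "aa"):
    scores = {m: marker_score(genotype, model=m) for m in ("additive", "dominant", "recessive")}
    print(genotype, scores)
```

The score X is then used as the predictor of the phenotype Y, which is what the Additive, Dominant and Recessive model plots on the following slides depict.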

Page 117: Statistical Genetics - University of Colorado Boulder

Additive Model

[Figure: Y plotted against marker score X = 0, 1, 2]

Page 118: Statistical Genetics - University of Colorado Boulder

Dominant Model

[Figure: Y plotted against marker score X = 0, 1, 2]

Page 119: Statistical Genetics - University of Colorado Boulder

Recessive Model

[Figure: Y plotted against marker score X = 0, 1, 2]

Page 120: Statistical Genetics - University of Colorado Boulder

Population Stratification

Page 121: Statistical Genetics - University of Colorado Boulder

Genetic Associations

Truth
- Causal locus (direct)
- In LD with causal locus (indirect)

Chance
- If you run 100 tests under the null, you expect ~ 5 with p < 0.05
- No causal underpinning

Bias
- Association is not causal
- e.g. Population stratification

Page 122: Statistical Genetics - University of Colorado Boulder

Stratification

Essentially a confounder!

How does it happen?

Page 123: Statistical Genetics - University of Colorado Boulder

Common Cause

G P

A

Ancestry (A) predicts Genotype (G)

Ancestry (A) predicts Phenotype (P)

a.k.a.… Population Stratification

Page 124: Statistical Genetics - University of Colorado Boulder

Poor Epidemiologic Design

Source Population?

Two Necessary Components (across subpopulations):
- Different prevalence (mean) of disease
- Different allele frequency
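The two necessary components can be illustrated with expected counts. In this sketch the allele has no effect on disease within either subpopulation, yet pooling them produces an association. All numbers and function names are hypothetical.

```python
def expected_counts(n, carrier_freq, prevalence):
    """Expected 2x2 counts when genotype and disease are independent.

    Returns (carrier cases, carrier controls, non-carrier cases, non-carrier controls).
    """
    return (
        n * carrier_freq * prevalence,
        n * carrier_freq * (1 - prevalence),
        n * (1 - carrier_freq) * prevalence,
        n * (1 - carrier_freq) * (1 - prevalence),
    )


def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Odds ratio of disease for carriers versus non-carriers."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


# Subpopulation 1: allele common, disease common. Subpopulation 2: both rare.
pop_1 = expected_counts(1000, carrier_freq=0.8, prevalence=0.4)
pop_2 = expected_counts(1000, carrier_freq=0.2, prevalence=0.1)
pooled = tuple(a + b for a, b in zip(pop_1, pop_2))

print("subpopulation 1:", round(odds_ratio(*pop_1), 2))
print("subpopulation 2:", round(odds_ratio(*pop_2), 2))
print("pooled:", round(odds_ratio(*pooled), 2))
```

Each subpopulation has an odds ratio of 1, but the pooled sample shows an odds ratio of about 2.7, driven entirely by ancestry. If either the prevalence or the allele frequency were equal across the two groups, the pooled odds ratio would return to 1.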

Page 125: Statistical Genetics - University of Colorado Boulder

Famous Example

Knowler et al (1988) Am J Hum Genet.

Page 126: Statistical Genetics - University of Colorado Boulder

DRD2

Page 127: Statistical Genetics - University of Colorado Boulder

Stratification Happens

Strategies to deal with it
- Self-Reported Ancestry
  - Match (design) or Adjust (analysis)
- Use other genetic markers (ancestry informative)
  - Genomic Control (Devlin – U of Pittsburgh)
  - STRUCTURE (Pritchard – U of Chicago)
  - Eigenstrat (Reich – Broad Institute/Harvard)
- Use a family-based design

Page 128: Statistical Genetics - University of Colorado Boulder

The TDT

Page 129: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Page 130: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Parents: AB x AB

Possible offspring: AA, AB, BA, BB

Page 131: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Parents: AB x AB

Possible offspring: AA, AB, BA, BB

Under the null: Equally probable!

Page 132: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Parents: AB x AB

Offspring: AB

Father - “A” was transmitted and “B” wasn’t
Mother - “B” was transmitted and “A” wasn’t

Page 133: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Parents: AB x AB

Offspring: AB

Parent       Offspring
             AA    AB    BB
AAxAA
AAxAB
AAxBB
ABxAB        0     1     0
ABxBB
BBxBB

Page 134: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Parents: AB x AB

Offspring: AB

Parent       Offspring
             AA    AB    BB
AAxAA
AAxAB
AAxBB
ABxAB        0     1     0
ABxBB
BBxBB

                   Not Transmitted
                   A        B
Transmitted   A    nAA      nBA
              B    nAB      nBB

Page 135: Statistical Genetics - University of Colorado Boulder

Transmission Disequilibrium Test (TDT)

Parents: AB x AB

Offspring: AB

Parent       Offspring
             AA    AB    BB
AAxAA
AAxAB
AAxBB
ABxAB        0     1     0
ABxBB
BBxBB

                   Not Transmitted
                   A        B
Transmitted   A    0        1
              B    1        0

Page 136: Statistical Genetics - University of Colorado Boulder

TDT

                   Not Transmitted
                   A        B
Transmitted   A    nAA      nBA
              B    nAB      nBB

$\mathrm{TDT} = \dfrac{(n_{BA} - n_{AB})^2}{n_{BA} + n_{AB}} \sim \chi^2_1$

McNemar Test for Matched-Pair Data
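The TDT statistic depends only on the two discordant cells, exactly as in McNemar's test. The sketch below computes it together with a 1-degree-of-freedom chi-square p-value; the transmission counts are invented for illustration and the function name `tdt` is hypothetical.

```python
import math


def tdt(n_ab, n_ba):
    """TDT (McNemar) statistic and p-value from the two discordant counts.

    n_ab and n_ba are the off-diagonal cells of the transmitted / not-transmitted
    table, i.e. transmissions from heterozygous parents.
    """
    informative = n_ab + n_ba
    if informative == 0:
        raise ValueError("no informative (heterozygous parent) transmissions")
    statistic = (n_ba - n_ab) ** 2 / informative
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical data: 100 heterozygous parents, split 60 versus 40.
statistic, p_value = tdt(60, 40)
print(statistic, round(p_value, 4))
```

Homozygous parents contribute only to the diagonal cells and therefore carry no information. The statistic is symmetric in the two counts, so the subscript convention does not affect the result.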

Page 137: Statistical Genetics - University of Colorado Boulder

Generalized Extensions

- Multiple Offspring
- Missing Parents
- Non-Binary Phenotypes
  - Quantitative, time-to-onset, ordinal…

Page 138: Statistical Genetics - University of Colorado Boulder

Generalized Extensions

FBAT/PBAT (Laird/Lange - Harvard)

QTDT (Abecasis/Cardon - Michigan)

PDT (Monks/Kaplan - Duke)

Page 139: Statistical Genetics - University of Colorado Boulder

Population Stratification

Why are Family-Based Designs (TDT, etc.) robust to population stratification?

Page 140: Statistical Genetics - University of Colorado Boulder

Family-Based Data

G P

A

GP1 GP2

Page 141: Statistical Genetics - University of Colorado Boulder

Family-Based Data

G P

A

GP1 GP2

Condition on parental genotypes

Page 142: Statistical Genetics - University of Colorado Boulder

Family-Based Data

G P

A

GP1 GP2

Condition on parental genotypes

P(G|GP1,GP2,A) = P(G| GP1,GP2)
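The conditional-independence statement can be made concrete: once both parental genotypes are known, Mendel's first law alone fixes the offspring genotype distribution, so ancestry adds nothing. A minimal sketch, with the hypothetical function name `offspring_distribution`:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_1, parent_2):
    """P(offspring genotype | parental genotypes) under Mendelian transmission.

    Each parent transmits one of its two alleles with probability 1/2.
    Genotypes are unordered, so "BA" is reported as "AB".
    """
    counts = Counter(
        "".join(sorted(allele_1 + allele_2))
        for allele_1, allele_2 in product(parent_1, parent_2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(count, total) for genotype, count in counts.items()}


print(offspring_distribution("AB", "AB"))
print(offspring_distribution("AA", "AB"))
```

The function takes no ancestry argument at all, which is the point: allele frequencies in the source population never enter the calculation.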

Page 143: Statistical Genetics - University of Colorado Boulder

Paradigm Shift

From Linkage to Association

Page 144: Statistical Genetics - University of Colorado Boulder

Gene-Mapping

Monogenic ‘Mendelian’ Diseases
- Rare disease
- Rare variants
- Highly penetrant

Complex Disease
- Rare/Common disease
- Rare/Common variants
- Variable penetrance

Page 145: Statistical Genetics - University of Colorado Boulder

Gene-Mapping

Monogenic ‘Mendelian’ Diseases
- Rare disease
- Rare variants
- Highly penetrant

Linkage!

Complex Disease
- Rare/Common disease
- Rare/Common variants
- Variable penetrance

Page 146: Statistical Genetics - University of Colorado Boulder

Gene-Mapping

Monogenic ‘Mendelian’ Diseases
- Rare disease
- Rare variants
- Highly penetrant

Complex Disease
- Rare/Common disease
- Rare/Common variants
- Variable penetrance

Association

Page 147: Statistical Genetics - University of Colorado Boulder

Genetic Discovery

Evidence for genetic effects? Familial aggregation

Mode of inheritance? Segregation Analysis

Where in the region? Fine Mapping

What chromosome / region? Linkage Analysis

What gene? Association Analysis

What is the effect of the gene? Characterization

Page 148: Statistical Genetics - University of Colorado Boulder

Genetic Discovery

Evidence for genetic effects? Familial aggregation

Mode of inheritance? Segregation Analysis

Where in the region? Fine Mapping

What chromosome / region? Linkage Analysis

What gene? Association Analysis

What is the effect of the gene? Characterization

Page 149: Statistical Genetics - University of Colorado Boulder

Gene-Mapping

Where in the genome (1980s - 2005)?
- Linkage

Where in the genome (2006 - )?
- Association

Page 150: Statistical Genetics - University of Colorado Boulder

Foreshadowing the Paradigm Shift

c. 1996

Page 151: Statistical Genetics - University of Colorado Boulder
Page 152: Statistical Genetics - University of Colorado Boulder

Linkage and Complex Disease

Page 153: Statistical Genetics - University of Colorado Boulder
Page 154: Statistical Genetics - University of Colorado Boulder

Linkage of Complex Traits

Dismal and controversial picture

Page 155: Statistical Genetics - University of Colorado Boulder

The Power of Linkage vs Association

Page 156: Statistical Genetics - University of Colorado Boulder

Relative Power*

MAF     Prevalence   LINKAGE (NL)   ASSOCIATION (NA)
0.05    0.01         67,219         2,278
0.05    0.20         207,635        2,448
0.20    0.01         8,067          659
0.20    0.20         22,385         700

MAF = Minor allele frequency
NL = Number of affected sibling pairs
NA = Number of case-control pairs
Odds Ratio = 1.5

*Adapted from Roeder et al, Am J Hum Genet (2006)

Page 157: Statistical Genetics - University of Colorado Boulder

Rare Disease - Rare Variant

MAF     Prevalence   LINKAGE (NL)   ASSOCIATION (NA)
0.05    0.01         67,219         2,278
0.05    0.20         207,635        2,448
0.20    0.01         8,067          659
0.20    0.20         22,385         700

MAF = Minor allele frequency
NL = Number of affected sibling pairs
NA = Number of case-control pairs
Odds Ratio = 1.5

*Adapted from Roeder et al, Am J Hum Genet (2006)

Page 158: Statistical Genetics - University of Colorado Boulder

Common Disease - Rare Variant

MAF     Prevalence   LINKAGE (NL)   ASSOCIATION (NA)
0.05    0.01         67,219         2,278
0.05    0.20         207,635        2,448
0.20    0.01         8,067          659
0.20    0.20         22,385         700

MAF = Minor allele frequency
NL = Number of affected sibling pairs
NA = Number of case-control pairs
Odds Ratio = 1.5

*Adapted from Roeder et al, Am J Hum Genet (2006)

Page 159: Statistical Genetics - University of Colorado Boulder

Common Variant - Rare Disease

MAF     Prevalence   LINKAGE (NL)   ASSOCIATION (NA)
0.05    0.01         67,219         2,278
0.05    0.20         207,635        2,448
0.20    0.01         8,067          659
0.20    0.20         22,385         700

MAF = Minor allele frequency
NL = Number of affected sibling pairs
NA = Number of case-control pairs
Odds Ratio = 1.5

*Adapted from Roeder et al, Am J Hum Genet (2006)

Page 160: Statistical Genetics - University of Colorado Boulder

Common Disease - Common Variant

MAF     Prevalence   LINKAGE (NL)   ASSOCIATION (NA)
0.05    0.01         67,219         2,278
0.05    0.20         207,635        2,448
0.20    0.01         8,067          659
0.20    0.20         22,385         700

MAF = Minor allele frequency
NL = Number of affected sibling pairs
NA = Number of case-control pairs
Odds Ratio = 1.5

*Adapted from Roeder et al, Am J Hum Genet (2006)

Page 161: Statistical Genetics - University of Colorado Boulder

Why Now?

Page 162: Statistical Genetics - University of Colorado Boulder

The “-omics” Age

c. 1996
- Pre-genomic era
- 100’s of Markers
  - STRs

c. 2007
- Post-genomic era
- 100,000’s of markers
  - SNPs

Page 163: Statistical Genetics - University of Colorado Boulder

c. 2007

Page 164: Statistical Genetics - University of Colorado Boulder

Available Technology

Platforms available (or coming soon)
- 1 SNP
- Hundreds of SNPs
- Thousands of SNPs
- Hundreds of thousands of SNPs
- Millions of SNPs

Flexibility for Association
- Single Marker
- Candidate Gene
- Whole-Genome

Page 165: Statistical Genetics - University of Colorado Boulder

Examples from the Literature

Whole Genome Association

Page 166: Statistical Genetics - University of Colorado Boulder

What if we discover that genes have nothing to do with complex phenotypes?

Page 167: Statistical Genetics - University of Colorado Boulder

What if we discover that genes have nothing to do with complex phenotypes?

Good News: We may not have to cross that bridge

Page 168: Statistical Genetics - University of Colorado Boulder

Replicated Associations

- Type II Diabetes
- BMI / Obesity
- Crohn’s Disease
- Age-Related Macular Degeneration (AMD)
- Prostate Cancer
- Breast Cancer
- Heart Disease

Page 169: Statistical Genetics - University of Colorado Boulder
Page 170: Statistical Genetics - University of Colorado Boulder

Framingham Heart Study and BMI

Page 171: Statistical Genetics - University of Colorado Boulder

Framingham Heart Study and BMI

The SNP is close to (in LD with) INSIG2
- A plausible candidate for obesity
- Responds to insulin
- Involved in triglyceride synthesis

Page 172: Statistical Genetics - University of Colorado Boulder

Framingham Heart Study and BMI

Page 173: Statistical Genetics - University of Colorado Boulder

Framingham Heart Study and BMI

Replicated in 4 out of 5 studies
- Childhood sample
- African American Sample
- Europe and North America

Page 174: Statistical Genetics - University of Colorado Boulder

Hot off the press…

Page 175: Statistical Genetics - University of Colorado Boulder

Hot off the press…

Page 176: Statistical Genetics - University of Colorado Boulder

In Summary…

WGA is starting off successful
- More replicated associations in one year…

Page 177: Statistical Genetics - University of Colorado Boulder

Statistical Genetics

The Challenges We Face

Page 178: Statistical Genetics - University of Colorado Boulder

Analytic Challenges

Page 179: Statistical Genetics - University of Colorado Boulder

Wealth of Information

Whole Genome Association using SNPs
- Potentially use all of the data
- Covariates, interactions, effect size, etc.
- Statistical issues abound…

Page 180: Statistical Genetics - University of Colorado Boulder

Multiple Comparisons

Page 181: Statistical Genetics - University of Colorado Boulder

Multiple Comparisons

The 500K People Chip

Page 182: Statistical Genetics - University of Colorado Boulder

Multiple Comparisons

Which SNPs are “real”?
- 500K Chip
- 25,000 SNPs with p < 0.05 expected by chance alone

Multiple Phenotypes
- 10 Phenotypes, 500K chip
- 5,000,000 comparisons!!!!
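The arithmetic behind these numbers is short enough to write out. The sketch also shows the Bonferroni-adjusted threshold, a standard correction that is not named on the slides and is included here only as an assumption about how one might respond; function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests with p < alpha when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_snps = 500_000
n_phenotypes = 10
n_comparisons = n_snps * n_phenotypes

print("one phenotype, expected p < 0.05 by chance:", expected_false_positives(n_snps))
print("comparisons across phenotypes:", n_comparisons)
print("Bonferroni threshold, one phenotype:", bonferroni_threshold(n_snps))
print("Bonferroni threshold, ten phenotypes:", bonferroni_threshold(n_comparisons))
```

Bonferroni treats all tests as independent, so it is conservative for SNPs in LD with one another; simulation or permutation gives empirical thresholds at greater computational cost.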

Page 183: Statistical Genetics - University of Colorado Boulder

The P-Value Epidemic

Page 184: Statistical Genetics - University of Colorado Boulder

“My name is Matt McQueen and I have a P-value problem”

The smallest p-values
- Most addictive
- We’ve been trained to focus on them
- What do they mean?
  - Truth
  - Chance
  - Bias

Page 185: Statistical Genetics - University of Colorado Boulder
Page 186: Statistical Genetics - University of Colorado Boulder

Replicated Associations…

Scott et al (2007) Science

Page 187: Statistical Genetics - University of Colorado Boulder

The Phenotype Question

Page 188: Statistical Genetics - University of Colorado Boulder

What is a phenotype?

Depends on who you ask…

Page 189: Statistical Genetics - University of Colorado Boulder

What is a phenotype?

If we asked a gene…

[Figure: GENE linked to Traits 1 to 6, with contributions labelled 5%, 55%, 4%, 20%, 1%, 15%]

Page 190: Statistical Genetics - University of Colorado Boulder

What is a phenotype?

If we asked an environmental factor…

[Figure: ENV linked to Traits 1 to 6, with contributions labelled 10%, 10%, 30%, 5%, 5%, 40%]

Page 191: Statistical Genetics - University of Colorado Boulder

What is a phenotype?

[Figure: GENE and ENV both linked to Traits 1 to 6. GENE contributions labelled 5%, 55%, 4%, 20%, 1%, 15%; ENV contributions labelled 10%, 10%, 30%, 5%, 5%, 40%]

Page 192: Statistical Genetics - University of Colorado Boulder

The Genotype Question

Page 193: Statistical Genetics - University of Colorado Boulder

What is a Genotype?

We test SNPs for association because we can

What about epigenetic factors and structural variation?
- Methylation
- Copy Number Variation

Page 194: Statistical Genetics - University of Colorado Boulder

The Not-So Distant Future

Page 195: Statistical Genetics - University of Colorado Boulder

The $1000 Genome

NHGRI

RFA Number
- RFA-HG-06-020

Title
- “The $1000 Genome”

Goal
- Develop technology to enable investigators to sequence an entire human genome for $1000 within 10 years

Page 196: Statistical Genetics - University of Colorado Boulder

June 2017

Biodemography Short Course

Complete Genome Sequence Analysis