Transcript
Page 1: Pathway talk for IGES 2009 Hawaii

Using pathways to discover complex disease models

Gary Chen, Duncan Thomas
Department of Preventive Medicine
USC

1. Motivation
2. A stochastic search variable selection algorithm
3. Example using candidate genes
4. Ideas for GWAS

Using pathways to discover complex disease models

Gary Chen, Duncan Thomas
Department of Preventive Medicine

USC

October 20, 2009

Page 2: Pathway talk for IGES 2009 Hawaii

An outline

1. Motivation

2. A stochastic search variable selection algorithm

3. Example using candidate genes

4. Ideas for GWAS

Page 3: Pathway talk for IGES 2009 Hawaii

Common diseases have complex etiology

- GWAS have had great success in searching for genetic variants for common diseases
- Recent successes: AMD, BMI/obesity, type 2 diabetes, breast cancer, prostate cancer
- Marginal effects from single-SNP analyses do not explain all heritability. Can we move beyond the low-hanging fruit?

Page 5: Pathway talk for IGES 2009 Hawaii

Use biological knowledge to help search for disease models

- Hierarchical modeling
  - Stabilizes effect estimates β from an association test by assuming they come from a prior distribution derived from biological data
- Examples in genetic epidemiology
  - Model selection: Conti et al (Hum Her, 2003), Baurley et al (Stat Med, in review)
  - GWAS: Lewinger et al (Gen Epi 2007), Chen and Witte (AJHG 2007)
  - Review: Thomas et al (Hum Genomics 2009)

Page 8: Pathway talk for IGES 2009 Hawaii

Searching for independent main effects and their interactions

- Ideally fit all predictors in a single model if N > P
- Model selection: e.g. stepwise regression
  - P-values can be anti-conservative: they do not adjust for the number of tests
  - Can be computationally intractable
- An alternative: Bayesian model averaging
  - Probabilistically propose sub-models from a posterior distribution
  - Summary statistics of parameters are averaged across all proposed models
  - Appears to better control for multiple comparisons
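The averaging step above can be illustrated with a minimal sketch. It assumes the sampler has already produced a list of visited sub-models, each stored as a dictionary from variable name to sampled coefficient; the function names and the toy sample values are hypothetical and are not taken from the talk.

```python
from collections import Counter


def inclusion_probabilities(visited_models):
    """Fraction of sampled sub-models that contain each variable."""
    counts = Counter()
    for model in visited_models:
        counts.update(set(model))
    n = len(visited_models)
    return {var: c / n for var, c in counts.items()}


def model_averaged_effect(visited_models, var):
    """Average a coefficient over all sampled sub-models, counting it as 0 when absent."""
    total = sum(model.get(var, 0.0) for model in visited_models)
    return total / len(visited_models)


# Hypothetical sampler output: variable name -> sampled coefficient.
samples = [
    {"G1": 0.5, "G2": 0.25},
    {"G1": 0.75},
    {"G1": 0.25, "G1xE1": 1.0},
    {"G2": 0.5},
]
probs = inclusion_probabilities(samples)
avg_g1 = model_averaged_effect(samples, "G1")
```

Treating an absent variable as a zero coefficient is one common convention for model-averaged effects; averaging only over the models that include the variable is another, and the talk does not say which was used.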

Page 10: Pathway talk for IGES 2009 Hawaii

The model form: A two-level hierarchical model

- First level: a linear model
  - $\mathrm{logit}\, P(Y = 1 \mid \beta, X) = \beta_0 + \sum_{k=1}^{K} \beta_k X_k$
  - X can be G, E, GxG, GxE, etc.
- Second level: a mixture prior on each $\beta_k$ of univariate Gaussians:
  - $\beta_k \sim N\left(\phi \bar{\beta}_k + (1 - \phi)\, \pi^T Z_k,\ \phi \frac{\tau^2}{\mathrm{adj}_k} + (1 - \phi)\, \sigma^2\right)$
  - 1st component: neighborhood of gene k
  - 2nd component: pathway info on gene k
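As a rough illustration of the second-level prior, the sketch below computes its mean and variance for a single effect. It assumes that the bar-beta term is a summary (for example the mean) of the neighbors' coefficients and that adj_k is the number of neighbors of gene k; both readings are assumptions, and the function name and example values are hypothetical.

```python
def prior_moments(phi, beta_bar_k, pi, z_k, tau2, adj_k, sigma2):
    """Mean and variance of the Gaussian prior on beta_k.

    phi        weight on the neighborhood component, in [0, 1]
    beta_bar_k summary of neighboring effects (assumed: their mean)
    pi, z_k    second-level coefficients and the row of Z for effect k
    tau2       neighborhood variance, divided by adj_k (assumed: neighbor count)
    sigma2     residual variance of the pathway component
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if phi > 0.0 and adj_k <= 0:
        raise ValueError("adj_k must be positive when phi > 0")
    pathway_mean = sum(p * z for p, z in zip(pi, z_k))  # pi^T Z_k
    mean = phi * beta_bar_k + (1.0 - phi) * pathway_mean
    neighbor_var = tau2 / adj_k if phi > 0.0 else 0.0
    var = phi * neighbor_var + (1.0 - phi) * sigma2
    return mean, var


# Made-up values, for illustration only.
mean, var = prior_moments(0.5, 0.4, [2.0], [0.3], 0.2, 4, 0.5)
```

With phi near 1 the prior leans on the neighborhood of gene k; with phi near 0 it leans on the pathway information in Z.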

Page 12: Pathway talk for IGES 2009 Hawaii

How the parameters fit together

$\beta_k \sim N\left(\phi \bar{\beta}_k + (1 - \phi)\, \pi^T Z_k,\ \phi \frac{\tau^2}{\mathrm{adj}_k} + (1 - \phi)\, \sigma^2\right)$

Page 13: Pathway talk for IGES 2009 Hawaii

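Stochastic Search Variable Selection

- Propose a swap, addition, or deletion of a variable
- Perform a reversible jump Metropolis-Hastings step comparing posterior probabilities
- $H = \frac{P(Y = 1 \mid \beta', X)\, P(\beta' \mid Z, A, \pi, \sigma, \tau, \phi)}{P(Y = 1 \mid \beta, X)\, P(\beta \mid Z, A, \pi, \sigma, \tau, \phi)}$, where $\beta'$ denotes the coefficients of the proposed model
- Accept move with probability min(1, H)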
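The propose-and-accept loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the log of likelihood times prior is computed elsewhere for each model, it omits any proposal-ratio or dimension-matching correction that a full reversible jump sampler may need, and it assumes at least one candidate variable exists. The function names are hypothetical.

```python
import math
import random


def propose_move(current, candidates, rng):
    """Propose adding, deleting, or swapping one variable in the current model."""
    current = set(current)
    outside = sorted(set(candidates) - current)
    moves = []
    if outside:
        moves.append("add")
    if current:
        moves.append("delete")
    if outside and current:
        moves.append("swap")
    move = rng.choice(moves)
    proposed = set(current)
    if move in ("delete", "swap"):
        proposed.remove(rng.choice(sorted(current)))
    if move in ("add", "swap"):
        proposed.add(rng.choice(outside))
    return proposed


def accept_move(log_post_current, log_post_proposed, rng):
    """Accept with probability min(1, H), where log H is the log-posterior difference."""
    log_h = log_post_proposed - log_post_current
    return log_h >= 0 or rng.random() < math.exp(log_h)


rng = random.Random(0)
proposed = propose_move({"G1"}, ["G1", "G2", "G3"], rng)
```

Working on the log scale avoids underflow when the likelihood is a product over thousands of individuals.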

Page 17: Pathway talk for IGES 2009 Hawaii

Folate pathway

Reed et al J Nutr. 2006 Oct;136(10):2653-61

Page 18: Pathway talk for IGES 2009 Hawaii

Simulated data set

- Simulated data for 4000 individuals
  - 14 genes, 2 environmental variables
  - Pathway enzymes: genotype-specific rates
- Simulating disease status
  - Assign homocysteine as causal mechanism
  - 'Run' the pathway until steady state
  - Probabilistically assign disease status conditional on metabolite concentration
- Priors
  - Deposit half the genotypes into prior database
  - Z matrix, causal metabolite(s): correlation of prior genotypes to candidate metabolite
  - A matrix, network information: correlation of correlation profiles between two effects
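One possible reading of how the Z and A priors are built is sketched below. It assumes that each effect's "correlation profile" is its vector of Pearson correlations with every simulated metabolite, that Z is the correlation with the single causal metabolite, and that A holds the correlation between two effects' profiles. The data layout, function names, and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_priors(genotypes, metabolites, causal):
    """Return (z, a): z[gene] and a[(gene_i, gene_j)] built from correlations."""
    # Profile of each gene: its correlation with every metabolite.
    profiles = {
        g: [pearson(v, m) for m in metabolites.values()]
        for g, v in genotypes.items()
    }
    # Z: correlation of each gene with the candidate causal metabolite.
    z = {g: pearson(v, metabolites[causal]) for g, v in genotypes.items()}
    # A: correlation between the profiles of two distinct genes.
    genes = list(genotypes)
    a = {
        (i, j): pearson(profiles[i], profiles[j])
        for i in genes for j in genes if i != j
    }
    return z, a


# Toy prior database (made-up numbers).
genotypes = {
    "G1": [0, 1, 2, 1, 0, 2],
    "G2": [2, 1, 0, 1, 2, 0],
    "G3": [0, 0, 1, 1, 2, 2],
}
metabolites = {
    "homocysteine": [1.0, 2.0, 3.0, 2.0, 1.0, 3.0],
    "folate": [3.0, 2.5, 1.0, 2.0, 3.5, 0.5],
    "sam": [1.0, 1.0, 2.0, 2.0, 2.0, 3.0],
}
z, a = build_priors(genotypes, metabolites, "homocysteine")
```

The sketch needs at least two metabolites with non-constant values, otherwise the profile correlations are undefined.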

Page 21: Pathway talk for IGES 2009 Hawaii

Setting up the priors

Page 22: Pathway talk for IGES 2009 Hawaii

Comparison

Same interactions detected. Z matrix provides support.

Page 23: Pathway talk for IGES 2009 Hawaii

Sensitivity analysis

- How does our prior on β affect posterior inference?
- Compare four special cases of the prior density:
  - $\beta_k^{\mathrm{prior}} \sim N\left(\phi \bar{\beta}_k + (1 - \phi)\, \pi^T Z_k,\ \phi \frac{\tau^2}{n_k} + (1 - \phi)\, \sigma^2\right)$
  - 1. Non-informative: constrain φ = 0, π = 0
  - 2. Z matrix: constrain φ = 0
  - 3. Adjacency info: constrain π = 0
  - 4. Z matrix and adjacency info: no constraints
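Substituting each constraint into the prior density gives the four special cases explicitly. This is a direct substitution using only the symbols on the slide:

```latex
\begin{align*}
\text{1. } \phi = 0,\ \pi = 0:&\quad
  \beta_k^{\mathrm{prior}} \sim N\left(0,\ \sigma^2\right) \\
\text{2. } \phi = 0:&\quad
  \beta_k^{\mathrm{prior}} \sim N\left(\pi^T Z_k,\ \sigma^2\right) \\
\text{3. } \pi = 0:&\quad
  \beta_k^{\mathrm{prior}} \sim N\left(\phi \bar{\beta}_k,\
  \phi \frac{\tau^2}{n_k} + (1 - \phi)\, \sigma^2\right) \\
\text{4. no constraints}:&\quad
  \beta_k^{\mathrm{prior}} \sim N\left(\phi \bar{\beta}_k + (1 - \phi)\, \pi^T Z_k,\
  \phi \frac{\tau^2}{n_k} + (1 - \phi)\, \sigma^2\right)
\end{align*}
```

Whenever φ = 0 the term in τ² drops out of the variance, so τ² is not estimable in cases 1 and 2.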

Page 26: Pathway talk for IGES 2009 Hawaii

Model-averaged estimates of hyperparameters

- Results
  - A prior solely incorporating information in the Z matrix appeared to explain residual variation better than the adjacency-only prior
  - π estimated at 1.86, consistent with simulated effect size

Scenario          σ̂²       τ̂²    φ̂
Non-informative   .48      N/A   0
Z matrix          .00459   N/A   0
Adjacency         .48      .22   .56
Z mat + Adj       .00731   .23   .05

Page 27: Pathway talk for IGES 2009 Hawaii

Comparison among several priors

Page 28: Pathway talk for IGES 2009 Hawaii

Summary of simulated example

- Biomarker data incorporated as priors
  - Intermediate phenotypes believed to be causal in Z (mean) matrix
  - Global level pathway information encoded in A (adjacency) matrix
- Influence of prior estimated by observed data through π, τ, σ, φ
- Informative priors provided additional support for causal genes

Page 30: Pathway talk for IGES 2009 Hawaii

Can be applied in genome-wide association study

- Proof of concept: GWAS of breast cancer
  - 2000 cases, 2000 controls, ∼1M SNPs
  - Top SNP from each of 2755 genes, p < .05 from GWAS
- Gene Ontology used to define adjacency matrix and proposal kernel
  - Considered the 22 GO terms under Biological Process (Level 3)
  - Pair of SNPs considered neighbors if they share at least one GO term
- Define a proposal density for new variable $V'_i$ as:
  - $Q(V'_i) = I(A_{ij} \neq 0),\ i \neq j$
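A minimal sketch of the neighbor rule and an indicator-based proposal is given below. It assumes each SNP carries a set of GO term labels and that a proposal is drawn uniformly from the SNPs adjacent to the current one; the uniform normalization, the function names, and the placeholder term labels are assumptions rather than details from the talk.

```python
import random


def build_adjacency(go_terms):
    """Map each SNP to the set of other SNPs sharing at least one GO term."""
    snps = sorted(go_terms)
    return {
        i: {j for j in snps if j != i and go_terms[i] & go_terms[j]}
        for i in snps
    }


def propose_neighbor(current, adjacency, rng):
    """Draw a new variable uniformly from the neighbors of `current` (None if isolated)."""
    neighbors = sorted(adjacency[current])
    return rng.choice(neighbors) if neighbors else None


# Placeholder annotations; the term labels are made up.
go_terms = {
    "rs1": {"termA", "termB"},
    "rs2": {"termB"},
    "rs3": {"termC"},
}
adjacency = build_adjacency(go_terms)
pick = propose_neighbor("rs1", adjacency, random.Random(1))
```

Restricting proposals to annotated neighbors keeps the search inside biologically related sets of SNPs, at the cost of never proposing an isolated SNP from a neighbor move.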

Page 32: Pathway talk for IGES 2009 Hawaii

Analysis

- Stepwise regression:
  - Considered only first 100 SNPs
  - Retained 83/100 SNPs
  - Intractable for 2nd order interactions
- Our proposed algorithm:
  - Low posterior probability for interactions
  - Most sub-models contained variables with shared annotation

Page 34: Pathway talk for IGES 2009 Hawaii

Sensitivity analysis

- Compare non-informative prior to one using GO terms in A
  - 1. Non-informative: constrain φ = 0
  - 2. Adjacency info: no constraint on φ

Scenario          σ̂²    τ̂²      φ̂
Non-informative   .01   N/A     0
Adjacency         .01   .0004   .86

Page 35: Pathway talk for IGES 2009 Hawaii

Posterior inference

Page 36: Pathway talk for IGES 2009 Hawaii

Scaling up to larger sub-models

- Need to test larger sub-models in GWAS settings
- Partition models into submodels using ontology info
- Parallel processing: nodes fit submodels
- A parallelized MCMC algorithm - Poster 190

Page 37: Pathway talk for IGES 2009 Hawaii

Logical topology of sub-models

Page 38: Pathway talk for IGES 2009 Hawaii

Hierarchical model

Page 39: Pathway talk for IGES 2009 Hawaii

Summary for GWAS example

- External knowledge can be informative
  - MLEs of β are smoothed towards pathway means
  - Ontologies useful: WECARE study in breast cancer - Poster 189
  - For GWAS: genome-wide expression potentially more biologically informative in Z matrix
  - Priors can guide towards biologically relevant interactions
- Computational efficiency essential:
  - Defining proposal kernel: e.g. $\mathrm{expit}(\pi^T Z)$
  - More parsimonious sub-models desirable (e.g. fused LASSO)
  - Fisher scoring can be improved using parallel code (e.g. GPUs)
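The expit-based proposal kernel mentioned in the summary could look roughly like the sketch below. It assumes each candidate variable k has a row of Z and that proposal weights are expit(π^T Z_k) normalized to sum to one; the normalization and the function names are assumptions, and the Z values are made up (π = 1.86 is borrowed from the simulated example purely for illustration).

```python
import math


def expit(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def proposal_weights(pi, z_rows):
    """Proposal weights proportional to expit(pi^T Z_k), normalized to sum to 1."""
    scores = [expit(sum(p * z for p, z in zip(pi, row))) for row in z_rows]
    total = sum(scores)
    return [s / total for s in scores]


# One-column Z with three candidate variables (illustrative values).
weights = proposal_weights([1.86], [[0.9], [0.0], [-0.9]])
```

Variables with stronger prior support in Z are proposed more often, which concentrates sampling effort where the external data point.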

Page 41: Pathway talk for IGES 2009 Hawaii

Acknowledgements

- James Baurley
- David Conti
- Dataset: African American Breast Cancer GWAS Collaborators
- Funding: R01 ES016813

