Novel Statistical Methods Gary K. Chen University of Southern California May 17, 2011

Analysis update for GENEVA meeting 2011


Page 1: Analysis update for GENEVA meeting 2011

Novel Statistical Methods

Gary K. Chen
University of Southern California

May 17, 2011

Page 2: Analysis update for GENEVA meeting 2011

An outline

Association testing in admixed populations

Gene-gene interactions

Copy number inferences

Page 3: Analysis update for GENEVA meeting 2011

Local ancestry inference

• Assumption: 2 or more homogeneous populations gave rise to today’s admixed population, e.g. Hispanics, African Americans

• Software:
    • LAMP
    • HAPAA
    • HAPMIX

• Relevance:
    • Not taking ancestry into account can cause serious confounding
    • However, understanding local ancestry can enhance inference in gene mapping

Page 4: Analysis update for GENEVA meeting 2011

Hidden Markov Model of the HAPMIX program

Page 5: Analysis update for GENEVA meeting 2011

Combining evidence from both local ancestry and association

Page 6: Analysis update for GENEVA meeting 2011

Novel MIX score statistic

• A $\chi^2_1$ test combining SNP association and admixture association

• Likelihood:
    $L_{\text{combined}}(p_A, p_E, R) = L_{AA,AE,AA}(p_A, p_E, R)\, L_{\text{admix}}(\Omega(R))$

• Assumption: the SNP odds ratio $R$ is re-used in the ancestry odds ratio $\Omega(R)$

• $\text{MIX} = 2\left[\max_{p_{A,0},\, p_{E,0},\, R} \log L_{\text{combined}}(p_{A,0}, p_{E,0}, R) - \max_{p_{A,0},\, p_{E,0}} \log L_{\text{combined}}(p_{A,0}, p_{E,0}, 1)\right]$
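The MIX statistic has the form of a likelihood ratio test: twice the gap between the maximized log-likelihood with R free and with R fixed at 1, referred to a chi-square distribution with 1 degree of freedom. The sketch below shows only that last step. It assumes the two maximized log-likelihoods have already been computed elsewhere. The function names `chi2_1df_sf` and `lrt_1df` are hypothetical and do not come from any of the software named in these slides.

```python
import math

def chi2_1df_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)), because X is a squared standard normal.
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

def lrt_1df(loglik_alt_max, loglik_null_max):
    """Return (statistic, p-value) for a 1-df likelihood ratio test.

    loglik_alt_max:  maximized log-likelihood with the tested parameter free
    loglik_null_max: maximized log-likelihood with the parameter fixed (R = 1)
    Both inputs are assumed to come from an optimizer that is not shown here.
    """
    stat = max(2.0 * (loglik_alt_max - loglik_null_max), 0.0)
    return stat, chi2_1df_sf(stat)

# Illustrative (made-up) log-likelihood values, not results from the study.
stat, pval = lrt_1df(-1000.0, -1005.0)
print(stat, pval)
```

Using `math.erfc` keeps the sketch free of third-party dependencies, and it stays accurate far into the tail, where statistics of the size seen in a genome-wide scan fall.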

Page 7: Analysis update for GENEVA meeting 2011

Afr-Am Prostate Cancer GWAS

Page 8: Analysis update for GENEVA meeting 2011

Afr-Am Prostate Cancer Admixture scan

Page 9: Analysis update for GENEVA meeting 2011

Top results from scan of MIX statistic

chr  position   rs          adm      mix      snp      beta       se         pvalue
8    128187997  rs7844219   49.9614  83.6454  54.5324   0.279106  0.0366375  1.96509e-14
8    128193308  rs1551512   50.4199  81.3104  51.5792   0.266105  0.0364108  2.15938e-13
8    128198554  rs6989838   52.7453  80.7646  51.0694   0.266931  0.036335   1.61315e-13
8    128199669  rs7013255   50.4199  80.7332  51.7533   0.266914  0.0363362  1.62204e-13
8    128194098  rs16901979  49.9614  79.6505  51.0524   0.265175  0.0363043  2.22267e-13
8    128176062  rs6983561   49.9614  79.386   49.888    0.276348  0.037292   9.90319e-14
8    128194377  rs10505483  51.8086  78.1544  49.3351   0.260651  0.0363121  5.71765e-13
8    128174913  rs7012442   49.5051  77.371   48.4881   0.278969  0.0376443  9.78106e-14
8    128219343  rs6987409   49.5051  62.8232  46.5145   0.36881   0.0502564  1.41553e-13
8    128202258  rs7000307   49.9614  57.1498  37.9689   0.254356  0.0408703  4.13043e-10
8    128225845  rs7822987   54.6449  56.9699  40.3666   0.349385  0.0501898  2.4144e-12
8    128204516  rs7840773   52.2758  56.5461  36.8886   0.2491    0.0405111  6.68617e-10
8    128223073  rs7018243   49.051   56.4211  40.683    0.345422  0.0498431  3.02292e-12
8    128225870  rs7822995   49.9614  56.1409  40.2347   0.349646  0.0502108  2.37455e-12
8    128173525  rs13254738  52.7453  55.9177  37.4838  -0.255658  0.0386944  3.37064e-11
8    128204547  rs7824364   47.2561  55.7612  37.2464   0.2476    0.0404363  7.88135e-10
8    128173119  rs1456315   49.9614  54.9189  41.9607  -0.231627  0.0357647  8.22321e-11
8    128482487  rs6983267   55.6079  49.2272  19.0306  -0.280904  0.0593348  2.00619e-06
8    128168637  rs1840709   50.8806  44.0089  22.5331  -0.204043  0.0410319  6.27666e-07
8    128257237  rs16902003  50.4199  38.9456  29.1061   0.340009  0.0612992  2.29044e-08

Page 10: Analysis update for GENEVA meeting 2011

An outline

Association testing in admixed populations

Gene-gene interactions

Copy number inferences

Page 11: Analysis update for GENEVA meeting 2011

Detecting higher order interactions

• Statistical epistasis may account for some hidden heritability

• Statistical and computational challenges are obvious

• Possible approaches for variable selection
    • Constrain search to only variables with strong marginal effects
    • Place priors on the effect sizes, informed through biology (e.g. Chen and Thomas, Genetic Epi 2010)

• Search space can still be huge
    • Implement massively parallel optimization algorithms
    • Provide a good fit for hardware architecture of Graphics Processing Units

Page 12: Analysis update for GENEVA meeting 2011

Organization of gridblock of threadblocks on GPU

Page 13: Analysis update for GENEVA meeting 2011

Overview of algorithm

• Newton-Raphson kernel
    • Each threadblock maps to a block of 512 subjects (threads) for 1 variable
    • Each thread calculates its subject’s contribution to the gradient and Hessian
    • Sum (reduction) across the 512 subjects
    • Sum (reduction) across subject blocks in a new kernel

• Compute log-likelihood change for each variable (like above).

• Apply a max operator (log2 reduction) to select the variable with the greatest contribution to the likelihood.

• Iterate repeatedly until the likelihood increase is less than epsilon
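The steps above can be sketched serially in plain Python. This is a minimal illustration under stated assumptions, not the GPU implementation described in the slides. It fits an unpenalized logistic model with no intercept, it omits the LASSO penalty mentioned later in the deck, and Python sums stand in for the threadblock reductions. The helper names (`newton_delta`, `greedy_fit`) are hypothetical.

```python
import math

def log1pexp(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def loglik(y, eta):
    # Logistic log-likelihood given each subject's linear predictor eta.
    return sum(yi * e - log1pexp(e) for yi, e in zip(y, eta))

def newton_delta(col, y, eta):
    # Each term is one subject's contribution; on a GPU one thread would
    # compute it and a reduction would form the sums.
    grad = sum(x * (yi - sigmoid(e)) for x, yi, e in zip(col, y, eta))
    # hess is the negative second derivative (observed information).
    hess = sum(x * x * sigmoid(e) * (1.0 - sigmoid(e)) for x, e in zip(col, eta))
    return grad / hess if hess > 0.0 else 0.0

def greedy_fit(columns, y, eps=1e-6, max_iter=100):
    """columns: list of per-variable value lists; y: list of 0/1 outcomes."""
    beta = [0.0] * len(columns)
    eta = [0.0] * len(y)
    ll = loglik(y, eta)
    for _ in range(max_iter):
        # Score every variable by the log-likelihood gain of its own Newton step.
        candidates = []
        for j, col in enumerate(columns):
            delta = newton_delta(col, y, eta)
            new_eta = [e + delta * x for e, x in zip(eta, col)]
            candidates.append((loglik(y, new_eta) - ll, j, delta, new_eta))
        # Max operator: keep the variable with the largest gain.
        gain, j, delta, new_eta = max(candidates, key=lambda c: c[0])
        if gain < eps:  # stop once the likelihood increase falls below epsilon
            break
        beta[j] += delta
        eta, ll = new_eta, ll + gain
    return beta, ll

# Toy data (made up): x0 carries signal, x1 does not.
x0 = [-2, -1, -1, 0, 0, 1, 1, 2]
x1 = [1, -1, -1, 1, 1, -1, -1, 1]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta, ll = greedy_fit([x0, x1], y)
print(beta, ll)
```

The inner loop over variables is the part that maps onto independent threadblocks: no variable's Newton step depends on another's within a sweep.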

Page 14: Analysis update for GENEVA meeting 2011
Page 15: Analysis update for GENEVA meeting 2011

Evaluation on large dataset

• GWAS data
    • 6,806 African American subjects in a case-control study of prostate cancer
    • 1,047,986 SNPs typed

• Elapsed wall time for 1 LASSO iteration (sweep across all variables)
    • 15 minutes on optimized serial implementation across 2 slave CPUs
    • 5.8 seconds on parallel implementation across 2 NVIDIA Tesla C2050 GPU devices

• 155x speedup
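The speedup figure follows directly from the two wall times quoted above. A quick check, with both times converted to seconds:

```python
serial_seconds = 15 * 60   # 15 minutes on the serial implementation
gpu_seconds = 5.8          # 5.8 seconds on the two GPUs
speedup = serial_seconds / gpu_seconds
print(f"{speedup:.0f}x")   # rounds to 155x
```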

Page 16: Analysis update for GENEVA meeting 2011

Application

• Defined 28 risk regions (Haiman et al, PLoS Genet, in press)

• 6,256 SNPs typed

• Fit a model with 19,571,896 variables using LASSO penalized multivariate logistic regression

• Avg run time per variable: 1 min 40 seconds
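The variable count matches 6,256 main effects plus every pairwise interaction among the 6,256 SNPs. The slides do not state this breakdown, so it is an inference here, but the arithmetic reproduces the figure exactly:

```python
from math import comb

n_snps = 6256
n_main_effects = n_snps               # one term per SNP
n_pairwise = comb(n_snps, 2)          # one term per unordered SNP pair
total = n_main_effects + n_pairwise
print(n_pairwise, total)              # 19565640 19571896
```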

Page 17: Analysis update for GENEVA meeting 2011

Results

Table: 1st 10 variables to enter the model

                          Interaction β              1-df χ²
SNP 1       SNP 2         Multivariate  Univariate   Interaction  SNP 1    SNP 2
rs10050937  rs17794619    -0.472152     -0.512223    35.0549      6.6248   15.6842
rs12484747  rs5759052     -0.382707     -0.34638     27.221       3.89604  18.1621
rs12943477  rs7130881      0.243003      0.267117    42.1636      5.71494  31.5322
rs13417654  rs5759256      0.216708      0.240687    32.3129      16.0361  0.0104941
rs2625403   rs4872172     -0.12534      -0.148221    30.0041      11.1439  14.8556
rs266880    rs7949453      0.136513      0.152762    28.983       12.0237  10.7806
rs2963275   rs360802      -0.53471      -0.583309    29.8975      1.63451  7.26303
rs339319    rs7075009     -0.225684     -0.263573    31.4443      22.2988  6.06348
rs4129455   rs9333335     -1.33385      -1.78312     33.2588      6.37851  1.19051
rs6798749   rs8079894      0.179629      0.201867    29.0323      12.0371  9.97029

Page 18: Analysis update for GENEVA meeting 2011

An outline

Association testing in admixed populations

Gene-gene interactions

Copy number inferences

Page 19: Analysis update for GENEVA meeting 2011

Application to cancer tumor data

• Copy number inference in tumors more challenging
    • Tissues can be contaminated with normal cells
    • Furthermore, intra-tumor heterogeneity can lead to sub-clones with distinct CN profiles

• A large state space HMM
    • Consider differing normal-tumor copy number and genotype combinations
    • For each combination, a possible contamination proportion
    • Copy Num: $z = (1-\alpha)\, z_{\text{normal}} + \alpha\, z_{\text{tumor}}$
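The mixing formula can be evaluated directly. In the sketch below, α is taken to be the tumor fraction, since it weights the tumor copy number in the formula. The function name `mixed_copy_number` is hypothetical, and the default normal copy number of 2 is an assumption made for the example.

```python
def mixed_copy_number(alpha, z_tumor, z_normal=2.0):
    """Expected copy number of a sample that is a fraction alpha tumor cells.

    alpha:    proportion of tumor cells in the tissue, between 0 and 1
    z_tumor:  copy number in the tumor cells
    z_normal: copy number in the contaminating normal cells (assumed 2)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * z_normal + alpha * z_tumor

# A single-copy loss in the tumor, observed in a half-tumor, half-normal sample.
print(mixed_copy_number(0.5, 1))   # 1.5
```

At α = 0 the function returns the normal copy number and at α = 1 the tumor copy number, so each intermediate contamination level yields a distinct fractional value, which is what enlarges the HMM state space.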

Page 20: Analysis update for GENEVA meeting 2011

Simplified Example of a State Space

state  CNfrac  BACnormal  CNtumor  BACtumor
0      2       0          2        0
1      2       1          2        1
2      2       2          2        2
3      0       0          0        0
4      0       1          0        0
5      0       2          0        0
6      0.5     0          0        0
7      0.5     1          0        0
8      0.5     2          0        0
9      1       0          1        0
10     1       1          1        0
11     1       1          1        1
12     1       2          1        1
13     1.5     0          1        0
14     1.5     1          1        0
15     1.5     1          1        1
16     1.5     2          1        1
17     2.5     0          3        0
18     2.5     1          3        1
19     2.5     1          3        2
20     2.5     2          3        3
21     3       0          4        0
22     3       1          4        1
23     3       1          4        2
24     3       1          4        3
25     3       2          4        4
26     3.5     0          4        0
27     3.5     1          4        1
28     3.5     1          4        2
29     3.5     1          4        3
30     3.5     2          4        4

Page 21: Analysis update for GENEVA meeting 2011

Comparison of algorithms

• We implement 8 kernels. Examples:
    • Re-scaling transition matrix (for SNP spacing)
        • Serial: $O(2nm^2)$; Parallel: $O(n)$
    • Forward-backward
        • Serial: $O(2nm^2)$; Parallel: $O(n \log_2 m)$
    • Normalizing constant (Baum-Welch)
        • Serial: $O(nm)$; Parallel: $O(\log_2 n)$
    • MLE of transition matrix (Baum-Welch)
        • Serial: $O(nm^2)$; Parallel: $O(n)$
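The log2 terms in the parallel costs come from tree-style reductions: combining values in pairs halves the number of partial results each round, so n values need about log2(n) rounds when every pair in a round is handled concurrently. The sketch below runs the rounds serially in Python and counts them. It illustrates the pattern only, it is not one of the 8 kernels, and `tree_reduce` is a hypothetical name.

```python
import operator

def tree_reduce(values, op):
    """Pairwise (tree) reduction. Returns (result, number_of_rounds)."""
    vals = list(values)
    if not vals:
        raise ValueError("cannot reduce an empty sequence")
    rounds = 0
    while len(vals) > 1:
        # All pairs within a round are independent of each other, which is
        # what lets a GPU evaluate them at the same time.
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # carry an unpaired last element forward
            nxt.append(vals[-1])
        vals = nxt
        rounds += 1
    return vals[0], rounds

print(tree_reduce(range(1, 9), operator.add))   # (36, 3): 8 values, 3 rounds
print(tree_reduce(range(512), max))             # (511, 9): 512 values, 9 rounds
```

The same pattern works for any associative operator, so a sum (for a normalizing constant) and a max (for selecting the best variable) share one reduction routine.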

Page 22: Analysis update for GENEVA meeting 2011

Speedups

Table: 1 iteration of HMM training on Chr 1 (41,263 SNPs)

states  CPU      GPU      fold-speedup
128     9.5m     37s      15x
512     2h 35m   1m 44s   108x

Page 23: Analysis update for GENEVA meeting 2011

Chr 21 0 percent tumor

Page 24: Analysis update for GENEVA meeting 2011

Chr 21 100 percent tumor

Page 25: Analysis update for GENEVA meeting 2011

Chr 21 50 percent tumor

Page 26: Analysis update for GENEVA meeting 2011

Thanks to

• Admixture scoring: Bogdan Pasaniuc

• CNV work: Kai Wang, Christina Curtis

• Access to GPU server: Tim Triche, Zach Ramjan

• (Chris’s Acknowledgement slide)