Associating Genomic Variations with Phenotypes


1

Associating Genomic Variations with Phenotypes

Model comparison, rare variants, and analysis pipeline

Qunyuan Zhang
Division of Statistical Genomics & Genome Institute
Washington University School of Medicine

2

Data & Question

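Relationship between X and Y?

$$
\begin{array}{c|c|cccc}
i & Y & \multicolumn{4}{c}{X} \\ \hline
1 & y_1 & x_{11} & x_{12} & \cdots & x_{1m} \\
2 & y_2 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
n & y_n & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{array}
$$

X = Genotypes: SNP, insertion, deletion, duplication, inversion, translocation, …

Y = Phenotypes (quantitative, categorical)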

3

Linkage & Association

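Association: (Y, X)

Linkage: (Y, Q), where Q is unobservable

$$
\begin{array}{c|c|cccc}
i & Y & \multicolumn{4}{c}{X} \\ \hline
1 & y_1 & x_{11} & q_1 & x_{12} & \cdots \\
2 & y_2 & x_{21} & q_2 & x_{22} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \\
n & y_n & x_{n1} & q_n & x_{n2} & \cdots
\end{array}
$$

- Y: phenotype
- x: genotypes
- q: putative QTL

[Diagram: r1, Q, r2. The putative QTL Q sits between genotyped markers; r1 and r2 are the recombination fractions on either side of Q.]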

4

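A Fixed-effect Mixture Model For Linkage

Commonly used in plant genetics

Cross design: P1 × P2 => F1 => F2

[Diagram: SNP A, r1, Q, r2, SNP B. The putative QTL Q lies between the flanking markers SNP A and SNP B.]

$$
f(y_i) = \sum_{j=1}^{3} P(Q_j \mid X_i, r)\,\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left[-\frac{(y_i-\mu_j)^2}{2\sigma_j^2}\right]
$$

$$
L(Y) = \prod_{i=1}^{n} f(y_i)
$$

Here j indexes the three possible QTL genotypes in the F2.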
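The mixture likelihood on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phenotype values, the conditional QTL-genotype probabilities P(Q_j | X_i, r), and the component means and standard deviations are all made-up numbers, and in a real analysis the probabilities would be derived from the flanking marker genotypes and the recombination fractions.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_density(y, qtl_probs, mus, sigmas):
    """f(y_i) = sum_j P(Q_j | X_i, r) * N(y_i; mu_j, sigma_j^2)."""
    return sum(p * normal_pdf(y, m, s)
               for p, m, s in zip(qtl_probs, mus, sigmas))


def log_likelihood(ys, prob_rows, mus, sigmas):
    """log L(Y) = sum_i log f(y_i)."""
    return sum(math.log(mixture_density(y, probs, mus, sigmas))
               for y, probs in zip(ys, prob_rows))


# Hypothetical data: three F2 individuals, three QTL genotypes.
ys = [9.8, 12.1, 14.3]
prob_rows = [            # assumed P(Q_j | X_i, r), one row per individual
    [0.90, 0.09, 0.01],
    [0.05, 0.90, 0.05],
    [0.01, 0.09, 0.90],
]
mus = [10.0, 12.0, 14.0]     # assumed genotype means mu_j
sigmas = [1.0, 1.0, 1.0]     # assumed standard deviations sigma_j

loglik = log_likelihood(ys, prob_rows, mus, sigmas)
print(loglik)
```

Maximizing this quantity over the means, variances, and QTL position is what a linkage scan would do; the sketch only evaluates it at fixed values.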

5

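A Variance-component Model For Linkage

Commonly used in human genetics

[Diagram: SNP A, r1, Q, r2, SNP B]

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}\exp\left[-\frac{1}{2}(Y-\mu)^{T}V^{-1}(Y-\mu)\right]
$$

$$
V = \mathrm{Cov}(Y) = \Delta_Q\sigma_Q^2 + \Delta_g\sigma_g^2 + I\sigma_e^2
$$

- $\Delta_Q$: QTL IBD matrix
- $\Delta_g$: background IBD matrix
- $I$: diagonal unit matrix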
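As a rough illustration of the variance-component likelihood, the sketch below builds V from two IBD matrices and evaluates the multivariate normal log-likelihood with a hand-written Cholesky factorization, so that only the Python standard library is needed. The IBD matrices, variance components, and phenotypes are hypothetical values for a single sib pair, and the function names are invented for this example.

```python
import math


def build_v(delta_q, delta_g, var_q, var_g, var_e):
    """V = Delta_Q * var_q + Delta_g * var_g + I * var_e."""
    n = len(delta_q)
    return [[delta_q[i][j] * var_q + delta_g[i][j] * var_g
             + (var_e if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mvn_loglik(y, mu, v):
    """log L(Y) for Y ~ MVN(mu * 1, V), with a scalar mean mu."""
    n = len(y)
    low = cholesky(v)
    resid = [yi - mu for yi in y]
    # Forward substitution: solve L z = resid, so resid' V^-1 resid = z'z.
    z = []
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((resid[i] - s) / low[i][i])
    quad = sum(zi * zi for zi in z)
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


# Hypothetical sib pair: half of the alleles shared IBD at the QTL
# and in the genetic background.
delta_q = [[1.0, 0.5], [0.5, 1.0]]
delta_g = [[1.0, 0.5], [0.5, 1.0]]
v = build_v(delta_q, delta_g, var_q=0.3, var_g=0.4, var_e=0.3)
loglik = mvn_loglik([1.2, 0.7], mu=1.0, v=v)
print(v, loglik)
```

A linkage test would compare the maximized likelihood with and without the QTL component (var_q fixed at 0); that optimization step is not shown here.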

6

Variance-component Model = Random-effect Linear Model

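$$
V = \Delta_Q\sigma_Q^2 + \Delta_g\sigma_g^2 + I\sigma_e^2
$$

$$
Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e
$$

Random effects:

$$
\gamma_Q \sim MVN(0, \Delta_Q\sigma_Q^2), \qquad \gamma_g \sim MVN(0, \Delta_g\sigma_g^2), \qquad e \sim N(0, \sigma_e^2)
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}\exp\left[-\frac{1}{2}(Y-\mu)^{T}V^{-1}(Y-\mu)\right]
$$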

7

From Linkage to Association

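Linkage model, with QTL effect(s) $Z_Q\gamma_Q$:

$$
Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e
$$

Family-based association model, with marker effect(s) $X\beta$ as fixed effect(s):

$$
Y = \mu + X\beta + Z_g\gamma_g + e
$$

$$
V = \Delta_g\sigma_g^2 + I\sigma_e^2
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}\exp\left[-\frac{1}{2}(Y-\mu-X\beta)^{T}V^{-1}(Y-\mu-X\beta)\right]
$$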

8

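A Simple Association Model For Unrelated Subjects

$$
Y = \mu + X\beta + e
$$

$$
V = I\sigma_e^2
$$

$$
L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}}\exp\left[-\frac{1}{2}(Y-\mu-X\beta)^{T}V^{-1}(Y-\mu-X\beta)\right]
= \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\left[-\frac{(y_i-\mu-X_i\beta)^2}{2\sigma_e^2}\right]
$$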
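With V = Iσ_e², maximizing the likelihood over μ and β is the same as ordinary least squares, so a single-marker test for unrelated subjects can be sketched as below. The genotype codes and phenotype values are invented for illustration, and the function returns a t statistic rather than a p-value because the standard library has no t distribution.

```python
import math


def simple_association(y, x):
    """Least-squares fit of y = mu + beta * x + e for one marker.

    Returns (mu, beta, standard error of beta, t statistic for H0: beta = 0).
    """
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    mu = ybar - beta * xbar
    rss = sum((yi - mu - beta * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)            # unbiased estimate of sigma_e^2
    se = math.sqrt(sigma2 / sxx)
    return mu, beta, se, beta / se


# Hypothetical data: minor-allele counts (0, 1, 2) and a quantitative trait.
x = [0, 1, 2, 0, 1, 2]
y = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
mu, beta, se, t_stat = simple_association(y, x)
print(mu, beta, se, t_stat)
```

In a genome-wide scan the same fit is repeated for each marker, giving one test statistic per variant.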

9

Covariate(s): Adjusting For Confounder(s)

$$
Y = \mu + X\beta + X_C\beta_C + e
$$

Observed confounders: age, sex, etc.

Hidden confounders: population structure

Population structure can be estimated by:
- PCA
- Clustering
- Admixture/ancestry

10

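Modeling Hidden Genetic Correlation Between Subjects

$$
Y = \mu + X\beta + X_C\beta_C + Z_g\gamma_g + e
$$

- $X\beta$: marker fixed effect(s)
- $X_C\beta_C$: covariate fixed effect(s)
- $Z_g\gamma_g$: genetic background random effects

$$
V = \Delta_g\sigma_g^2 + I\sigma_e^2
$$

- Family data, pedigree => IBD matrix
- Population data (hidden relatedness), marker data => IBS matrix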
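For population data, one simple way to obtain an IBS matrix from marker data is to average, over markers, the proportion of alleles two subjects share identical by state. The sketch below assumes biallelic markers coded as minor-allele counts (0, 1, 2) with no missing values; the genotypes are made up, and this is only one of several similarity measures in use.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS sharing from allele counts.

    genotypes[i][m] is the minor-allele count (0, 1, or 2) of subject i at
    marker m. At each marker two subjects share 2 - |a - b| alleles IBS;
    the result is the shared proportion averaged over all markers.
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    ibs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            ibs[i][j] = shared / (2.0 * n_markers)
    return ibs


# Hypothetical genotypes: 3 subjects x 4 markers.
geno = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 2],
]
ibs = ibs_matrix(geno)
print(ibs)
```

The resulting matrix would take the place of the pedigree-based matrix in V when relatedness is not known in advance.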

11

Modeling Rare Variants

$$
Y = \mu + X\beta + X_C\beta_C + Z_g\gamma_g + e
$$

Common variants, tested individually:

$$
Y = \mu + X_1\beta_1 + \cdots
$$

H0: β1 = 0. One p-value per variant.

Rare variants, tested as an entire group (burden test), usually by gene:

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \cdots
$$

H0: β1 = β2 = … = βk = 0. One p-value per group of variants.

- Can be combined with variable selection, using loose criteria
- β can be treated as random effects (variance-components test) and can be weighted by prior information

12

Collapsing Model

Collapsing multiple variables into one:

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \cdots
$$

$$
Y = \mu + X\beta + \cdots
$$

subject   X1   X2   X3   X
1         0    0    0    0
2         0    1    1    1
3         1    0    0    1
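The collapsing step can be illustrated with the three-subject example on this slide: the collapsed variable X is 1 when a subject carries at least one variant in the group and 0 otherwise. The genotype values below are as read from the slide's example table, and the function name is made up for the sketch.

```python
def collapse(variant_row):
    """Collapsed indicator: 1 if any variant in the group is carried, else 0."""
    return 1 if any(count > 0 for count in variant_row) else 0


# Genotypes (X1, X2, X3) per subject, as read from the slide's example.
genotypes = {
    1: [0, 0, 0],
    2: [0, 1, 1],
    3: [1, 0, 0],
}

collapsed = {subject: collapse(row) for subject, row in genotypes.items()}
print(collapsed)  # {1: 0, 2: 1, 3: 1}
```

The single collapsed column then enters the regression in place of the k separate variant columns.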

13

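Weighted Sum Model

$$
Y = \mu + X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \cdots
$$

$$
Y = \mu + \Big(\sum_{j=1}^{k} w_j X_j\Big)\beta + \cdots
$$

Weighted sum score S, with weights w1 = 0.2, w2 = 0.5, w3 = 0.3:

subject   X1   X2   X3   S
1         0    0    0    0.0
2         0    1    1    0.8
3         1    0    0    0.2

$$
Y = \mu + S\beta + \cdots
$$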
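The weighted sum score is a one-line computation per subject. The sketch below uses the weights and genotypes as read from the slide's example (w = 0.2, 0.5, 0.3); the function name is hypothetical.

```python
def weighted_sum(variant_row, weights):
    """S = sum_j w_j * X_j for one subject."""
    return sum(w * x for w, x in zip(weights, variant_row))


weights = [0.2, 0.5, 0.3]      # w1, w2, w3 from the slide's example
genotypes = {                  # (X1, X2, X3) per subject
    1: [0, 0, 0],
    2: [0, 1, 1],
    3: [1, 0, 0],
}

scores = {subject: weighted_sum(row, weights)
          for subject, row in genotypes.items()}
for subject, score in scores.items():
    print(subject, round(score, 3))
```

Setting every weight to 1 gives a simple variant count, and replacing the sum with an any-variant indicator gives the collapsing model.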

14

Weighting Variants

- Based on allele frequency: continuous or binary (0,1) weight, variable threshold;
- Based on function annotation/prediction;
- Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);
- Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from data, requiring a permutation test;
- Any combination …
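As an illustration of frequency-based weighting only, the sketch below shows a binary weight defined by a frequency threshold and one possible continuous weight. Both the 0.01 threshold and the 1/sqrt(q(1-q)) form are assumptions chosen for the example, not choices stated in the slides; the continuous form simply up-weights rarer variants.

```python
import math


def binary_weights(mafs, threshold=0.01):
    """Weight 1 for variants rarer than the threshold, 0 otherwise.

    The default threshold is an assumption made for this example.
    """
    return [1 if maf < threshold else 0 for maf in mafs]


def continuous_weights(mafs):
    """One possible continuous weight (assumed here): w = 1 / sqrt(q(1-q)).

    Requires 0 < q < 1 for every allele frequency q.
    """
    return [1.0 / math.sqrt(q * (1.0 - q)) for q in mafs]


# Hypothetical minor-allele frequencies for three variants.
mafs = [0.001, 0.02, 0.3]
print(binary_weights(mafs))
print(continuous_weights(mafs))

# A variable-threshold scan recomputes the binary weights at each cutoff.
for cutoff in (0.005, 0.01, 0.05):
    print(cutoff, binary_weights(mafs, threshold=cutoff))
```

Because a variable-threshold scan tries several cutoffs, its significance is usually assessed by permutation rather than from a single test's nominal p-value.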

Grouping Variants

- By gene
- By transcript
- By exon
- By gene set / pathway
- By protein domain
- …

15

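Modeling More Data Types

Generalized Linear (Mixed) Model

$$
g(Y) = \mu + X\beta + \cdots + e
$$

where g is the link function.

For binary Y, logistic model:

$$
g(Y) = \mathrm{logit}(Y) = \log\frac{P(Y=1)}{P(Y=0)}
$$

$$
P(Y=1) = \frac{\exp(\mu + X\beta + \cdots + e)}{1 + \exp(\mu + X\beta + \cdots + e)}
$$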
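The logit link and its inverse are easy to check numerically. The sketch below is a minimal illustration; the intercept, effect size, and genotype codes are invented values, and the inverse is written in the same exp/(1 + exp) form as the slide rather than in a numerically hardened form.

```python
import math


def logit(p):
    """Link function: log odds, log(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))


def prob_case(mu, beta, x):
    """P(Y = 1) for linear predictor mu + x * beta."""
    return inv_logit(mu + beta * x)


# Hypothetical intercept and per-allele log odds ratio.
mu, beta = -2.0, 0.5
for allele_count in (0, 1, 2):
    print(allele_count, round(prob_case(mu, beta, allele_count), 4))
```

On the logit scale each extra allele adds beta to the log odds, which is why exp(beta) is read as the per-allele odds ratio.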

16

Longitudinal Data (quantitative)

Fixed effect, time as covariate

Repeated measures, random effect, correlation within subjects

[Figure omitted; axis label: Time]

17

Longitudinal Data (binary)

Linear model, time as covariate

Survival analysis, CoxPH model etc.

[Figure omitted; axis label: Time]

18

Tools

SAS Procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST

R Functions/Packages: lm(), glm(), gee, nlme, kinship2/coxme, lme4, survival

Other Programs: SOLAR, MMAP, EMMA, EMMAX, SKAT

19

Pipeline

[Flowchart, reconstructed as a list]

- Input (data + options)
- Job generating/submitting module (LSF bsub): job1, job2, ….., job N
  - Options.jobi => self-programmed modules (SAS, R, …)
  - Options.jobi => external program modules (MMAP, SKAT, ..)
- Job number controlling module
- Results: Result 1, Result 2, ….., Result N
- Job status monitoring module (all done?)
  - No: wait …
  - Yes: result summarizing module
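The control flow of the pipeline (generate jobs, cap the number running, poll until all are done, then summarize) can be sketched as a small loop. Everything here is a stand-in: the `submit`, `is_done`, and `summarize` callbacks are hypothetical hooks, and the toy scheduler below replaces the real LSF submission so the example can run anywhere.

```python
import time


def run_pipeline(jobs, submit, is_done, summarize, max_jobs=300,
                 poll_seconds=0.0):
    """Submit jobs with a cap on concurrency, wait for all, then summarize."""
    pending = list(jobs)
    running = []
    finished = []
    while pending or running:
        # Job number control: top up to max_jobs running jobs.
        while pending and len(running) < max_jobs:
            job = pending.pop(0)
            submit(job)          # a real pipeline would hand this to LSF
            running.append(job)
        # Job status monitoring: move completed jobs to the finished list.
        still_running = []
        for job in running:
            if is_done(job):
                finished.append(job)
            else:
                still_running.append(job)
        running = still_running
        if running:
            time.sleep(poll_seconds)   # "Wait ..." branch of the flowchart
    return summarize(finished)


# Toy scheduler (hypothetical): each job finishes on its second status poll.
polls = {}
peak = {"running": 0}


def fake_submit(job):
    polls[job] = 0
    active = sum(1 for count in polls.values() if count < 2)
    peak["running"] = max(peak["running"], active)


def fake_is_done(job):
    polls[job] += 1
    return polls[job] >= 2


job_names = ["job%d" % i for i in range(1, 6)]
n_done = run_pipeline(job_names, fake_submit, fake_is_done,
                      summarize=len, max_jobs=2)
print(n_done, peak["running"])
```

Keeping the scheduler behind callbacks means the same loop could drive self-programmed modules or external programs without changing the control logic.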

20

gwas.sh options.gwa

gwas.sh:

#!/bin/sh
OPFILE=$1
...

options.gwa:

[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
markerinfo_file=mapall
marker_selection=MAF>0.01
pedigree_file=pediall
subjectID=subject
pedgreeID=famid
markername=snp
…
[ANALYSIS]
phenolist_file=
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[OUTPUT]
output_dir=/dsguser/qunyuan/fhs/bmi
output_file=
output_replace=no
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
…
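The options file uses an INI-style layout (bracketed sections with key=value lines), so it could be read with Python's standard `configparser`. This is only a sketch of how such a file might be parsed, not the pipeline's actual reader (which the slides show as a shell script); the sample text is an abridged copy of a few keys, and the reading of `bmi/qt` as "phenotype name / type" is an assumption based on the phenotype table.

```python
import configparser

# Abridged sample of the options file; elided lines are left out.
OPTIONS_TEXT = """
[DATA]
database=SAS
marker_selection=MAF>0.01
subjectID=subject

[ANALYSIS]
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed

[RUN]
memsize=1000M
maxjobn=300
"""


def load_options(text):
    """Parse INI-style options into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str      # keep key case, e.g. subjectID
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


options = load_options(OPTIONS_TEXT)

# Assumed convention: "name/type", e.g. bmi as a quantitative trait (qt).
pheno_name, pheno_type = options["ANALYSIS"]["pheno_list"].split("/")
max_jobs = int(options["RUN"]["maxjobn"])
print(pheno_name, pheno_type, max_jobs)
```

Empty values such as `covariates=` come back as empty strings, so downstream code can treat "no covariates" and "key missing" differently if it needs to.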

Pheno   type   covar     program   analysis   run
Bmi     qt     age,sex   SASGLM    mixed      YES
Obes    ql     NA        SASGLM    gee        YES
HD      ql     age       SASGLM    gee        NO
Age     …
Sex     …
…

Program   language   location                 Maintainer
SASGLM    SAS        /dsg1/code/sas/glm.sas   Q.Zhang
GSTAT     R          /dsg1/code/R/gstat.R     Q.Zhang
MMAP      C          /dsg1/code/sas/mmap.sh   J. Czajkowski
…

21

Thanks!
