DESCRIPTION
Whole Genome Regression : Lasso, Ridge, Bayesian Lasso
Limitation of GWAS or Linkage analysis / Introduction of WGP
Lasso estimation / Bayesian inference of Lasso
Whole Genome Prediction Using Penalized Regression: Bayesian Lasso
Jinseob Kim, MD, MPH
GSPH, SNU
February 27, 2014
Jinseob Kim, MD, MPH Whole Genome Prediction Using Penalized Regression
Contents
1 Limitation of GWAS or Linkage analysis
2 Introduction of WGP
    Goals
3 Lasso estimation
4 Bayesian inference of Lasso
Limitation
1 GWAS & linkage analysis: results are not consistent across studies → poor prediction.
2 Complex traits: the overall effect of many loci matters (e.g., cardiovascular disease, cancer).
Example of GWAS
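In GWAS it is not possible to include many pieces of genetic information in a single model.
1 One trait vs. one locus → a test statistic is computed for each SNP, one at a time.
2 Because all SNPs are still being tested, a multiple-comparison-adjusted p-value threshold is used (e.g., p-value cutoff 5 × 10^-8).
3 Build a model from the significant SNPs only, or combine information via a Genetic Risk Score.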
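The three steps above can be sketched in a few lines. This is a minimal illustration on simulated allele counts, not the workflow of any particular GWAS tool: the function name `single_snp_scan`, the simulated data, and the normal approximation for the p-value are all assumptions made for the example. The Bonferroni cutoff 0.05 / p plays the role that 5 × 10^-8 plays for roughly one million tests.

```python
import math
import numpy as np

def single_snp_scan(X, y):
    """Regress y on each SNP separately; return slopes and two-sided p-values."""
    n, p = X.shape
    yc = y - y.mean()
    betas = np.empty(p)
    pvals = np.empty(p)
    for j in range(p):
        xc = X[:, j] - X[:, j].mean()
        sxx = xc @ xc
        betas[j] = (xc @ yc) / sxx
        resid = yc - betas[j] * xc
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        z = betas[j] / se
        # Normal approximation to the t distribution (assumes large n)
        pvals[j] = math.erfc(abs(z) / math.sqrt(2.0))
    return betas, pvals

rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # simulated allele counts 0/1/2
y = 0.8 * X[:, 0] + rng.normal(size=n)               # only SNP 0 is causal here

betas, pvals = single_snp_scan(X, y)
cutoff = 0.05 / p                                    # Bonferroni-adjusted threshold
significant = np.flatnonzero(pvals < cutoff)
grs = X[:, significant] @ betas[significant]         # Genetic Risk Score per subject
```

Note that the score uses only SNPs that pass the cutoff and weights them by marginal effects, so correlation (LD) between SNPs is not taken into account.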
Problem
1 Multiple comparison → loss of power.
2 Each SNP is analyzed against the trait one at a time → LD information is ignored.
3 What is a Genetic Risk Score? An imprecise measure.
Why this problem?
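Why not simply put every SNP into one regression model?
1 Multicollinearity issue → LD: nearby SNPs carry similar allele information.
2 n < p issue: when there are more variables (SNPs) than subjects, the regression coefficients cannot be estimated.
The variance of the regression coefficients becomes too large, so estimation fails.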
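The n < p problem can be seen numerically. The sketch below uses simulated genotypes (all names and sizes are made up for illustration): with fewer subjects than SNPs the design matrix is rank deficient, so least squares fits the training data exactly and many different coefficient vectors give the same fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                                   # fewer subjects than SNPs
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)                             # center each SNP
y = rng.normal(size=n)
yc = y - y.mean()

rank = np.linalg.matrix_rank(X)                 # at most n - 1 after centering

# Minimum-norm least-squares solution: reproduces the training data exactly
beta_min_norm, *_ = np.linalg.lstsq(X, yc, rcond=None)
fit_error = np.max(np.abs(X @ beta_min_norm - yc))

# Adding a null-space direction changes beta but not the fitted values
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]                               # X @ null_vec is numerically zero
beta_other = beta_min_norm + 5.0 * null_vec
same_fit = np.allclose(X @ beta_other, X @ beta_min_norm)
```

Because `beta_other` and `beta_min_norm` fit equally well, the data alone cannot identify the coefficients; some extra constraint on β is needed.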
Why?
Because we have not given up the unbiasedness of the β estimator. Variance-bias trade-off!
Figure: Summary of variance-bias tradeoff (panels (a) and (b))
Variance-bias tradeoff
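When $Y = f(x) + \varepsilon$ with $\varepsilon \sim N(0, \sigma_e)$ ($\sigma_e$ denoting the noise variance) and $\hat{f}$ is an estimate of $f$:
$$\mathrm{Err}(x) = E\big[(Y - \hat{f}(x))^2\big] \tag{1}$$
$$\mathrm{Err}(x) = \big(E[\hat{f}(x)] - f(x)\big)^2 + E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] + \sigma_e \tag{2}$$
$$\mathrm{Err}(x) = \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible error} \tag{3}$$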
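The step from (1) to (2) can be written out as follows. This is a standard derivation sketch, assuming that the noise $\varepsilon$ has mean zero and variance $\sigma_e$ and is independent of the fitted $\hat{f}(x)$:

```latex
\begin{aligned}
E\big[(Y - \hat{f}(x))^2\big]
  &= E\big[(f(x) - \hat{f}(x) + \varepsilon)^2\big] \\
  &= E\big[(f(x) - \hat{f}(x))^2\big]
     + 2\,E[\varepsilon]\,E\big[f(x) - \hat{f}(x)\big]
     + E[\varepsilon^2] \\
  &= E\big[(f(x) - \hat{f}(x))^2\big] + \sigma_e \\
  &= \big(E[\hat{f}(x)] - f(x)\big)^2
     + E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]
     + \sigma_e .
\end{aligned}
```

The last line adds and subtracts $E[\hat{f}(x)]$ inside the square; the cross term vanishes because $E\big[\hat{f}(x) - E[\hat{f}(x)]\big] = 0$.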
Goals
Core context of WGP
Give up the requirement that the estimator of β be unbiased!
1 WGP can use all available markers to regress phenotype onto genomic information.
    Ridge regression
    Lasso (Least absolute shrinkage and selection operator)
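Goals
1 It is enough to understand the core principles and ideas of Lasso and Bayesian inference.
2 Be able to prepare data for use with a Lasso package.
3 Automate the analysis as much as possible → produce tables and figures directly.
4 Input data and phenotype → tables and figures ready to include in a paper.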
Ridge VS Lasso
Ridge regression
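$$\text{minimize } (y - X\beta)^T (y - X\beta) \quad \text{s.t. } \sum_{j=1}^{p} \beta_j^2 \le t$$
$$\leftrightarrow \text{minimize } (y - X\beta)^T (y - X\beta) + \lambda \sum_{j=1}^{p} \beta_j^2 \tag{4}$$
Lasso
$$\text{minimize } (y - X\beta)^T (y - X\beta) \quad \text{s.t. } \sum_{j=1}^{p} |\beta_j| \le t$$
$$\leftrightarrow \text{minimize } (y - X\beta)^T (y - X\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \tag{5}$$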
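Both penalized problems can be solved with a few lines of NumPy. The sketch below is illustrative only: the function names and the simulated data are assumptions, ridge uses its closed-form solution, and the Lasso is solved by plain cyclic coordinate descent (one of several possible algorithms). Because the objective here has no factor 1/2 in front of the residual sum of squares, the soft-threshold level is λ/2.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimize (y - Xb)'(y - Xb) + lam * sum(b^2): closed form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (y - Xb)'(y - Xb) + lam * sum(|b|) by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    resid = y.astype(float).copy()          # equals y - X @ beta while beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]      # remove SNP j's current contribution
            rho = X[:, j] @ resid
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
            resid -= X[:, j] * beta[j]      # put the updated contribution back
    return beta

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 2.0, -1.5      # two real effects, eight nulls
y = X @ beta_true + rng.normal(size=n)

b_ridge = ridge(X, y, lam=60.0)
b_lasso = lasso_cd(X, y, lam=60.0)
n_zero_lasso = int(np.sum(b_lasso == 0.0))
n_zero_ridge = int(np.sum(b_ridge == 0.0))
```

In this simulated example the Lasso sets most of the null coefficients to exactly zero, while ridge shrinks every coefficient but leaves all of them nonzero.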
Ridge VS Lasso(2)
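1 Both methods shrink many β values toward 0: this resolves multicollinearity and reflects LD information.
2 Square (β²) vs. Abs (|β|)
3 0.04 vs. 0.2: for a small β such as 0.2, the squared penalty (0.04) is much smaller than the absolute penalty (0.2), so the absolute penalty pushes small β values to 0 more effectively.
4 The absolute value is the stronger constraint, i.e., it sends more βs to 0.
5 Lasso sends more βs to exactly 0 than ridge does; ridge shrinks coefficients but generally leaves them nonzero.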
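The difference is easiest to see in the special case of an orthonormal design (X'X = I), an assumption made here purely for illustration. With the objectives written as residual sum of squares plus λ times the penalty, ridge rescales each least-squares estimate by 1/(1 + λ), while the Lasso subtracts λ/2 from its magnitude and truncates at zero. The starting estimates below are made-up numbers.

```python
import numpy as np

b_ols = np.array([3.0, 1.0, 0.2, -0.2, 0.05])   # hypothetical least-squares estimates
lam = 1.0

# Penalty contribution of each coefficient
pen_square = b_ols ** 2                          # 0.2 -> 0.04
pen_abs = np.abs(b_ols)                          # 0.2 -> 0.2

# Orthonormal-design solutions
b_ridge = b_ols / (1.0 + lam)                    # proportional shrinkage, never exactly 0
b_lasso = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam / 2.0, 0.0)  # soft threshold
```

Here ridge maps 0.2 to 0.1, whereas the Lasso maps it to exactly 0; the large coefficient 3.0 becomes 1.5 under ridge and 2.5 under the Lasso.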
Ridge VS Lasso(3)
Choosing λ
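K-fold cross validation: split the data into K folds. Hold out one fold (k samples), fit the model on the remaining n − k samples, and apply it to the held-out k samples to obtain an error. Averaging these errors over the folds gives the CV error. Choose the λ that minimizes this average CV error.
$$(\text{CV error})^{(\lambda)} = E\big((\text{CV error})^{(\lambda)}_k\big) \tag{6}$$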
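A minimal sketch of this procedure is given below. To keep it self-contained it tunes λ for ridge regression (closed form) rather than the Lasso, and the function name `kfold_cv_error`, the λ grid, and the simulated data are assumptions for illustration; the same loop applies to any penalized fit.

```python
import numpy as np

def kfold_cv_error(X, y, lam, K=10, seed=0):
    """Average held-out mean squared error of ridge regression over K folds."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    errors = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False                  # hold this fold out
        Xtr, ytr = X[train], y[train]
        beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
        errors.append(np.mean((y[test_idx] - X[test_idx] @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(2)
n, p = 60, 100                                   # n < p, as in the genomic setting
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + rng.normal(size=n)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
cv = {lam: kfold_cv_error(X, y, lam) for lam in lambdas}
best_lam = min(cv, key=cv.get)                   # lambda with the smallest CV error
```

Using the same seed for every λ keeps the fold assignment fixed, so the CV errors are compared on identical splits.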
10 fold CV
Figure : 10 fold CV
Bayesian inference
Introduction → see the ThinkBayes lecture notes.
Lasso → Bayesian Lasso
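$$p(\beta_i \mid \sigma^2) = \frac{\lambda}{2\sigma}\, e^{-\lambda |\beta_i| / \sigma} \quad \text{: Laplace prior}$$
1 The Laplacian prior assigns more weight to regions near zero than the normal prior.
2 Interpreted as a mixture of hierarchical priors (Normal + exponential):
$$\frac{a}{2}\, e^{-a|z|} = \int_0^\infty \frac{1}{\sqrt{2\pi s}}\, e^{-z^2/(2s)}\, \frac{a^2}{2}\, e^{-a^2 s/2}\, ds, \quad a > 0$$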
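The mixture representation can be checked by simulation. In the sketch below (the values of `a` and the number of draws are arbitrary choices), the variance s is drawn from an exponential distribution with rate a²/2 and z is then drawn from N(0, s). If the identity holds, z follows a Laplace distribution with rate a, for which E|z| = 1/a and Var(z) = 2/a².

```python
import numpy as np

rng = np.random.default_rng(3)
a = 2.0
n_draws = 200_000

s = rng.exponential(scale=2.0 / a**2, size=n_draws)  # s ~ Exp(rate = a^2 / 2)
z = rng.normal(loc=0.0, scale=np.sqrt(s))            # z | s ~ N(0, s)

mean_abs = np.abs(z).mean()   # should be close to 1 / a
var_z = z.var()               # should be close to 2 / a^2
```

This two-stage sampling is what makes the Bayesian Lasso convenient: conditional on the variances, the marker effects are simply normal.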
Laplace prior
Figure : Normal VS Laplace prior
Example: Continuous case
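Whole model
$$\mu_i = \mu + \sum_{j=1}^{J} x_{ij}\gamma_j + \sum_{l=1}^{L} z_{il}\beta_l \tag{7}$$
Likelihood
$$p(y_i \mid \mu_i, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left\{-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right\} \tag{8}$$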
Likelihood
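$$p(y \mid \mu, \gamma, \beta, \sigma^2) = \prod_{i=1}^{n} N\Big(y_i \,\Big|\, \mu + \sum_{j=1}^{J} x_{ij}\gamma_j + \sum_{l=1}^{L} z_{il}\beta_l,\ \sigma^2\Big) \tag{9}$$
$y = \{y_i\}$, $\gamma = \{\gamma_j\}$, $\beta = \{\beta_l\}$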
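Prior construction: hierarchical model
1 Intercept (µ) & sex, smoking, BMI (γ): vague (non-informative) prior
2 Residual variance, the standard assumption of Bayesian regression: scaled-inverse Chi-square density $\chi^{-2}(\sigma^2 \mid df, S)$
3 Marker effects: Bayesian Lasso
$$p(\beta, \tau^2, \lambda^2 \mid H, \sigma^2) = p(\beta \mid \tau^2, \sigma^2)\, p(\tau^2 \mid \lambda^2)\, p(\lambda^2 \mid r, s) = \Big\{\prod_{l=1}^{L} N(\beta_l \mid 0, \tau_l^2\sigma^2)\, \mathrm{Exp}(\tau_l^2 \mid \lambda^2)\Big\}\, G(\lambda^2 \mid r, s)$$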
Prior
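$$p(\mu, \gamma, \sigma^2, \beta, \tau^2, \lambda^2 \mid H) \propto \chi^{-2}(\sigma^2 \mid df, S)\, \Big\{\prod_{l=1}^{L} N(\beta_l \mid 0, \tau_l^2\sigma^2)\, \mathrm{Exp}(\tau_l^2 \mid \lambda^2)\Big\}\, G(\lambda^2 \mid r, s) \tag{10}$$
$H = \{df = 5,\ S = 170,\ \delta = 1 \times 10^{4},\ s = 2\}$: hyperparameters chosen so that the priors have small influence on predictions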
Posterior
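$$p(\mu, \gamma, \sigma^2, \beta, \tau^2, \lambda^2 \mid y) \propto \prod_{i=1}^{n} N\Big(y_i \,\Big|\, \mu + \sum_{j=1}^{J} x_{ij}\gamma_j + \sum_{l=1}^{L} z_{il}\beta_l,\ \sigma^2\Big) \times \chi^{-2}(\sigma^2 \mid df, S)\, \Big\{\prod_{l=1}^{L} N(\beta_l \mid 0, \tau_l^2\sigma^2)\, \mathrm{Exp}(\tau_l^2 \mid \lambda^2)\Big\}\, G(\lambda^2 \mid r, s) \tag{11}$$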
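A posterior of this form is usually explored with a Gibbs sampler, since each full conditional has a known distribution. The sketch below is a simplified illustration and not the BLR or BGLR implementation. It rests on several assumptions: the covariates γ are omitted, the exponential prior is parametrized as Exp(τ² | rate λ²/2), the gamma prior as shape `r` and rate `delta`, the scaled-inverse chi-square prior as proportional to (σ²)^(-(df/2+1)) exp(-S/(2σ²)), and the default hyperparameter values are arbitrary.

```python
import numpy as np

def bayesian_lasso_gibbs(Z, y, n_iter=1500, burn_in=500,
                         df=5.0, S=1.0, r=1.0, delta=1.0, seed=0):
    """Gibbs sampler sketch for y_i = mu + sum_l z_il * beta_l + e_i."""
    rng = np.random.default_rng(seed)
    n, L = Z.shape
    ZtZ = Z.T @ Z
    mu, beta = y.mean(), np.zeros(L)
    sigma2, tau2, lam2 = y.var(), np.ones(L), 1.0
    kept = []
    for it in range(n_iter):
        # mu | rest ~ N(mean(y - Z beta), sigma2 / n), flat prior on mu
        mu = rng.normal((y - Z @ beta).mean(), np.sqrt(sigma2 / n))
        # beta | rest ~ N(A^-1 Z'(y - mu), sigma2 A^-1), A = Z'Z + diag(1 / tau2)
        A = ZtZ + np.diag(1.0 / tau2)
        chol = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Z.T @ (y - mu))
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(chol.T, rng.standard_normal(L))
        # sigma2 | rest: scaled inverse chi-square with df + n + L degrees of freedom
        resid = y - mu - Z @ beta
        scale = S + resid @ resid + np.sum(beta ** 2 / tau2)
        sigma2 = scale / rng.chisquare(df + n + L)
        # 1 / tau2_l | rest ~ InverseGaussian(mean sqrt(lam2 sigma2) / |beta_l|, shape lam2)
        ig_mean = np.sqrt(lam2 * sigma2) / np.maximum(np.abs(beta), 1e-8)
        tau2 = 1.0 / np.maximum(rng.wald(ig_mean, lam2), 1e-12)
        # lam2 | rest ~ Gamma(shape L + r, rate sum(tau2) / 2 + delta)
        lam2 = rng.gamma(L + r, 1.0 / (tau2.sum() / 2.0 + delta))
        if it >= burn_in:
            kept.append(beta.copy())
    return np.mean(kept, axis=0)          # posterior mean of the marker effects

# Simulated example: 2 real marker effects among 10
rng = np.random.default_rng(4)
n, L = 100, 10
Z = rng.normal(size=(n, L))
beta_true = np.zeros(L)
beta_true[0], beta_true[1] = 2.0, -1.5
y = 1.0 + Z @ beta_true + 0.5 * rng.normal(size=n)
post_mean = bayesian_lasso_gibbs(Z, y)
```

Unlike the Lasso point estimate, the posterior means of null markers are shrunk close to zero but are not exactly zero; λ is sampled along with the other parameters instead of being chosen by cross validation.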
Implementation
1 BLR (Bayesian Linear Regression) package in R
2 bayesm, splines and SuppDists for the sampler
→ BGLR (Bayesian Generalized Linear Regression) package in R
Goodness of fit, DIC
Practice
BGLR package practice: continuous trait (TG) & binomial trait (hyperTG)
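Precautions
1 Specify each variable's type in advance (continuous vs. categorical).
2 Separate the variables that enter the Lasso (genotypes) from ordinary covariates (e.g., age).
3 In theory, all x entering the Lasso should be standardized, so that the β values are measured on a fair scale. However, allele counts are always 0, 1, or 2, so this is not a concern.
4 There must be no missing values. GWAS software drops missing values and computes automatically, but BGLR does not. Moreover, because this is a prediction model, missing x values are even less acceptable: use imputation or the mean allele count.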
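The data preparation in items 3 and 4 can be sketched as follows. This is an illustration in NumPy with made-up genotypes, coding missing values as NaN; the function names `mean_impute` and `standardize` are hypothetical, and a real analysis would more likely use a dedicated imputation tool.

```python
import numpy as np

def mean_impute(G):
    """Replace missing allele counts (NaN) with that SNP's mean allele count."""
    G = np.array(G, dtype=float)          # copy, so the input is left untouched
    col_mean = np.nanmean(G, axis=0)
    rows, cols = np.where(np.isnan(G))
    G[rows, cols] = col_mean[cols]
    return G

def standardize(G):
    """Center each SNP and scale it to unit standard deviation (optional step)."""
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0                     # monomorphic SNPs: avoid division by zero
    return (G - G.mean(axis=0)) / sd

# Rows are subjects, columns are SNPs (allele counts 0/1/2, NaN = missing)
G = np.array([[0, 1, 2],
              [1, np.nan, 2],
              [2, 1, np.nan],
              [1, 2, 0]])
G_imp = mean_impute(G)
G_std = standardize(G_imp)
```

After imputation the matrix contains no missing entries, which is the form a Lasso or Bayesian Lasso fit needs.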
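Precautions 2
If you plan to validate:
1 The prediction model must be built using only the SNPs common to both sets.
2 The allele-count reference (the allele being counted) must be the same in both sets.
3 Both sets must contain the trait.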
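Items 1 and 2 amount to a small harmonization step before validation. The sketch below assumes biallelic SNPs, that each set records which allele is counted, and that a mismatch means the other allele of the same SNP was counted (strand problems are not handled). The function name `harmonize` and the SNP IDs and genotypes are made up for illustration.

```python
import numpy as np

def harmonize(G_a, snps_a, ref_a, G_b, snps_b, ref_b):
    """Keep SNPs shared by both sets and recode set B to set A's counted allele."""
    idx_a = {s: i for i, s in enumerate(snps_a)}
    idx_b = {s: i for i, s in enumerate(snps_b)}
    common = [s for s in snps_a if s in idx_b]
    A = np.asarray(G_a, dtype=float)[:, [idx_a[s] for s in common]]
    B = np.asarray(G_b, dtype=float)[:, [idx_b[s] for s in common]]  # a copy
    for k, s in enumerate(common):
        if ref_a[idx_a[s]] != ref_b[idx_b[s]]:
            B[:, k] = 2.0 - B[:, k]       # count the other allele instead
    return common, A, B

# Hypothetical training set (A) and validation set (B)
snps_a, ref_a = ["rs1", "rs2", "rs3"], ["A", "C", "G"]
G_a = [[0, 1, 2],
       [2, 0, 1]]
snps_b, ref_b = ["rs3", "rs1", "rs4"], ["G", "T", "A"]
G_b = [[1, 2, 0],
       [0, 0, 2]]

common, A, B = harmonize(G_a, snps_a, ref_a, G_b, snps_b, ref_b)
```

In this example only rs1 and rs3 are shared, and rs1 is flipped in set B because the two sets count different alleles for it.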
End
HP: 010-9192-5385, E-mail: secondmath85@gmail.com