20
Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics

Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Embed Size (px)

Citation preview

Page 1: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Disease Models and Association StatisticsDisease Models and Association Statistics

Nicolas Widman

CS 224- Computational GeneticsNicolas Widman

CS 224- Computational Genetics

Page 2: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

IntroductionIntroduction

Certain SNPs within genes may be associated with a disease phenotype

Statistical model used in class only considers inheritance of a single copy of an SNP location: Single Chromosome Model Expand the statistic to a diploid model and

take into account different expression patterns of a SNP

Certain SNPs within genes may be associated with a disease phenotype

Statistical model used in class only considers inheritance of a single copy of an SNP location: Single Chromosome Model Expand the statistic to a diploid model and

take into account different expression patterns of a SNP

Page 3: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Basic Statistic- Haploid ModelBasic Statistic- Haploid Model

: Relative Risk pA: Probability of disease-associated allele F: Disease prevalence

For this project, F is assumed to be very small +/-: Disease StateDerivation of case (p+) and control (p-) frequencies:

P(A)=pA p+A=P(A|+) p-

A=P(A|-) F=P(+)

P(A|+)=P(+|A)P(A)/P(+)

P(+|A)= P(+|¬A)

: Relative Risk pA: Probability of disease-associated allele F: Disease prevalence

For this project, F is assumed to be very small +/-: Disease StateDerivation of case (p+) and control (p-) frequencies:

P(A)=pA p+A=P(A|+) p-

A=P(A|-) F=P(+)

P(A|+)=P(+|A)P(A)/P(+)

P(+|A)= P(+|¬A)

Page 4: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Derivation- ContinuedDerivation- Continued

P(+)=F=pAP(+|A)+(1-pA)P(+|¬A)

P(+)=F= pAP(+|A)+(1-pA)P(+|A)/

P(+)=F=P(+|A)(pA+(1-pA)/)=P(+|A)(pA(-1)+1)/

P(+|A)= F/(pA(-1)+1)

P(A|+)=P(+|A)P(A)/P(+)=P(+|A)pA/F=pA/(pA(-1)+1)

P(-|A)=1-P(+|A)=1- F/(pA(-1)+1)

P(A|-)=P(-|A)P(A)/P(-)

If F is small, then 1-F ≈ 1 and P(-|A) ≈ 1

then, P(A|-) ≈ P(A) = pA

P(+)=F=pAP(+|A)+(1-pA)P(+|¬A)

P(+)=F= pAP(+|A)+(1-pA)P(+|A)/

P(+)=F=P(+|A)(pA+(1-pA)/)=P(+|A)(pA(-1)+1)/

P(+|A)= F/(pA(-1)+1)

P(A|+)=P(+|A)P(A)/P(+)=P(+|A)pA/F=pA/(pA(-1)+1)

P(-|A)=1-P(+|A)=1- F/(pA(-1)+1)

P(A|-)=P(-|A)P(A)/P(-)

If F is small, then 1-F ≈ 1 and P(-|A) ≈ 1

then, P(A|-) ≈ P(A) = pA

Page 5: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Haploid ModelHaploid Model

The relative risk formula:

Association Power:

The relative risk formula:

Association Power:

1)1( +−=+

A

AA p

pp

)1()/2(

)(

AA

AAA

ppN

ppN

−−

=−+

λ

))2/((1

))2/((1

1

N

N

A

A

λα

λα

+Φ−Φ−+

+ΦΦ−

Page 6: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

AssumptionsAssumptions

Low disease prevalence F ≈ 0: Allows p-

A ≈ pA

Uses Hardy-Weinberg Principle A-Major Allele a-Minor Allele

P(AA)=P(A)^2

P(Aa)=2*P(A)*(1-P(A))

P(aa)=(1-P(A))^2

Uses a balanced case-control study

Low disease prevalence F ≈ 0: Allows p-

A ≈ pA

Uses Hardy-Weinberg Principle A-Major Allele a-Minor Allele

P(AA)=P(A)^2

P(Aa)=2*P(A)*(1-P(A))

P(aa)=(1-P(A))^2

Uses a balanced case-control study

Page 7: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease ModelsDiploid Disease Models

When inheriting two copies of a SNP site, there are three common relationships between major and minor SNPs Dominant

Particular phenotype requires one major allele

Recessive Particular phenotype requires both minor alleles

Additive Particular phenotype varies based whether there are one or

two major alleles

When inheriting two copies of a SNP site, there are three common relationships between major and minor SNPs Dominant

Particular phenotype requires one major allele

Recessive Particular phenotype requires both minor alleles

Additive Particular phenotype varies based whether there are one or

two major alleles

Page 8: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease ModelsDiploid Disease Models

AA- Homozygous major Aa, aA- Heterozygous aa- Homozygous minor

AA- Homozygous major Aa, aA- Heterozygous aa- Homozygous minor

Page 9: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Modifying the Calculation for Relative Risk

Modifying the Calculation for Relative Risk

Previous relative risk formula only considered the haploid case of having a SNP or not having a SNP.

Approach:

Create a virtual SNP which replaces pA in the formula.

Previous relative risk formula only considered the haploid case of having a SNP or not having a SNP.

Approach:

Create a virtual SNP which replaces pA in the formula.

1)1( +−=+

A

AA p

pp

Page 10: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Virtual SNPsVirtual SNPs

Use Hardy-Weinberg Principle to calculate a new pA - the virtual SNP using the characteristics of diploid disease models. Recessive

pA=pd*pd

DominantpA=pd*pd+2*pd*(1-pd)

AdditivepA=pd*pd+c*pd*(1-pd)

Pd: Probability of disease-associated allele. In the calculations used to determine the association power, c was set to sqrt(2).

Use Hardy-Weinberg Principle to calculate a new pA - the virtual SNP using the characteristics of diploid disease models. Recessive

pA=pd*pd

DominantpA=pd*pd+2*pd*(1-pd)

AdditivepA=pd*pd+c*pd*(1-pd)

Pd: Probability of disease-associated allele. In the calculations used to determine the association power, c was set to sqrt(2).

Page 11: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =1.5Diploid Disease Models: =1.5

Page 12: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =1.5Diploid Disease Models: =1.5

Page 13: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =1.5Diploid Disease Models: =1.5

Page 14: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =2Diploid Disease Models: =2

Page 15: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =2Diploid Disease Models: =2

Page 16: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =2Diploid Disease Models: =2

Page 17: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

Diploid Disease Models: =3Diploid Disease Models: =3

Page 18: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

ResultsResults

Achieving significant association power with low relative risk SNPs (=1.5) Minimum of 200 cases and 200 controls required to

reach 80% power within strongest pd intervals for each type of SNP

At a sample size of 1000 cases and 1000 controls, dominant and additive SNPs show very significant power for almost all SNP probabilities below 50%

Difficult to obtain significant association for low probability recessive SNPs regardless of sample size

Achieving significant association power with low relative risk SNPs (=1.5) Minimum of 200 cases and 200 controls required to

reach 80% power within strongest pd intervals for each type of SNP

At a sample size of 1000 cases and 1000 controls, dominant and additive SNPs show very significant power for almost all SNP probabilities below 50%

Difficult to obtain significant association for low probability recessive SNPs regardless of sample size

Page 19: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

ResultsResults

SNP probability ranges for greatest association power Dominant: .10 - .30 Recessive: .45 - .70 Additive: .15 - .40

Higher relative risk SNPs require fewer cases and controls to achieve the same power.

As approaches 1, the association power to detect a recessive allele with probability p is the same as the power to detect dominant allele with probability 1-p.

SNP probability ranges for greatest association power Dominant: .10 - .30 Recessive: .45 - .70 Additive: .15 - .40

Higher relative risk SNPs require fewer cases and controls to achieve the same power.

As approaches 1, the association power to detect a recessive allele with probability p is the same as the power to detect dominant allele with probability 1-p.

Page 20: Disease Models and Association Statistics Nicolas Widman CS 224- Computational Genetics Nicolas Widman CS 224- Computational Genetics

ResultsResults

Diseases with higher relative risk have their range of highest association power skewed toward lower probability SNPs.

Challenges in obtaining high association power: Low probability recessive SNPs Low relative risk diseases, especially with small

sample sizes High probability dominant SNPs, however these are

unlikely due natural selection and that the majority of the population would be affected by such diseases.

Diseases with higher relative risk have their range of highest association power skewed toward lower probability SNPs.

Challenges in obtaining high association power: Low probability recessive SNPs Low relative risk diseases, especially with small

sample sizes High probability dominant SNPs, however these are

unlikely due natural selection and that the majority of the population would be affected by such diseases.