JCKBSE2010 Kaunas Predicting Combinatorial Protein-Protein Interactions from Protein Expression Data...
- Slide 1
- JCKBSE2010, Kaunas. Predicting Combinatorial Protein-Protein
Interactions from Protein Expression Data Based on Correlation
Coefficient. Sho Murakami, Takuya Yoshihiro, Etsuko Inoue and Masaru
Nakagawa. Faculty of Systems Engineering, Wakayama University.
- Slide 2
- Agenda: Background; Combinatorial Protein-Protein Interactions;
The Proposed Data Mining Method; Evaluation; Conclusion.
- Slide 3
- Background. Finding interactions among genes and proteins is
important. Many data-mining algorithms have been proposed to
discover gene-gene (or protein-protein) interactions. One of the
main data sources is gene or protein expression data: 2D
electrophoresis for protein expression (the size of a spot is the
expression level) and microarrays for gene expression (the color
strength is the expression level).
- Slide 4
- Related Work for Interaction Discovery: Bayesian Networks.
Bayesian networks discover interactions from expression data based
on conditional probabilities among events. Example: to discover
protein-protein interactions among proteins A, B and C, (1) define
events A, B and C (for instance, "C is expressed"), and (2) compute
the conditional probabilities relating A, B and C over the samples.
If a conditional probability is high, an interaction is predicted.
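The counting step behind such a conditional probability can be sketched as follows. This is only an illustration of the general idea, not the procedure of any particular Bayesian-network tool: the function name `conditional_probability`, the binarization of expression levels into True/False events, and the sample values are all assumptions made for the example.

```python
def conditional_probability(samples, given, target):
    """Estimate P(target | all events in `given`) by counting samples.

    `samples` is a list of dicts mapping an event name to True/False.
    Returns None when no sample satisfies the condition.
    """
    matching = [s for s in samples if all(s[g] for g in given)]
    if not matching:
        return None
    return sum(1 for s in matching if s[target]) / len(matching)


# Hypothetical binarized data: True means "the protein is expressed".
samples = [
    {"A": True, "B": True, "C": True},
    {"A": True, "B": True, "C": True},
    {"A": True, "B": False, "C": False},
    {"A": False, "B": True, "C": False},
    {"A": False, "B": False, "C": False},
    {"A": True, "B": True, "C": False},
]

p_c_given_ab = conditional_probability(samples, given=("A", "B"), target="C")
p_c_given_a = conditional_probability(samples, given=("A",), target="C")
```

Only three of the six samples fall into the region where both A and B are expressed, which illustrates why this kind of estimate needs many samples to be reliable.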
- Slide 5
- Problems of Bayesian Networks. Bayesian networks require a large
number of samples: many samples are necessary to obtain
statistically reliable results, because each region of the event
space (for example, the region where both A and B are expressed)
must contain sufficient samples. For genes, microarrays supply
cheap and high-speed experiments. For proteins, 2D electrophoresis
is time-consuming and expensive.
- Slide 6
- The objective of our study: finding combinatorial protein-protein
interactions from small-size protein expression data.
- Slide 7
- Expression Data. 2D electrophoresis is performed for each sample
and yields the expression level of each protein. Expression levels
are obtained by measuring the size of the areas; normalization is
applied as pre-processing. (Figure: gel images for sample1, sample2
and sample3. Each black area indicates a protein, and the size of
an area represents its expression level.)
- Slide 8
- Model of Protein-Protein Interaction. Considered model: two
proteins A and B affect the expression level of another protein C
only when both A and B are expressed. Usually only the sole effects
of A and of B on C are considered. In our model, a combinatorial
effect works on C only if both A and B exist. We want to estimate
this combinatorial effect. (Figure: a complex of A and B affecting
the expression level of C.)
- Slide 9
- Predicting Interactions by Correlation Coefficient. We compute
the correlation coefficient between (A,B) and C; a correlation
coefficient requires fewer samples than Bayesian networks. The
amount of the complex (A,B) in a sample is estimated by min(A,B),
the smaller of the expression levels of A and B in that sample,
because this is the amount that would affect C. We then compute the
correlation of min(A,B) and C: the total effect on C is high if the
correlation is high.
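A minimal sketch of this step, assuming the correlation coefficient is the Pearson coefficient (the slides do not name the variant) and using made-up expression levels for five samples. The helper names `pearson` and `complex_correlation` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # convention chosen here for constant input
    return cov / math.sqrt(vx * vy)


def complex_correlation(a, b, c):
    """Correlate the estimated complex amount min(A, B) with C."""
    complex_amount = [min(x, y) for x, y in zip(a, b)]
    return pearson(complex_amount, c)


# Hypothetical expression levels of A, B and C in five samples.
a = [1.0, 3.0, 2.0, 5.0, 4.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]
c = [1.1, 0.9, 2.2, 2.9, 4.1]

r_complex = complex_correlation(a, b, c)
r_a_alone = pearson(a, c)
```

In this toy data C tracks min(A,B) closely, so `r_complex` is much higher than the correlation of A alone with C.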
- Slide 10
- The problem of scale difference. The expression level that
corresponds to one molecule differs among proteins, so equal
expression levels of A and B do not always combine in equal
amounts. Therefore, taking the min of the raw levels cannot express
the correct amount of complex. Solution: correct the scale of A to
the expression level required for a complex; taking the min then
leads to the correct amount of complex. (Figure: scaling problem
and solution. Expression levels of proteins A and B and the
estimated number of complexes AB, before and after scaling.)
- Slide 11
- How to determine the correct scale? We try several scales of A
(k1·A, k2·A, k3·A, ...), compute the correlation coefficient of
min(A,B) and C for each scale, and select the scale which leads to
the maximum correlation coefficient. This maximum is the score S:
the total effect of (A,B) on C. If an interaction of our model
exists, a high correlation value must appear. (Figure: example
correlations of 0.1, 0.2, 0.3 and 0.7 obtained for different
scales.)
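The scale search can be sketched as a simple grid search. The grid of candidate scales, the function name `score_s`, and the sample values are assumptions for illustration; the slides do not say which scales are tried or how finely.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # convention chosen here for constant input
    return cov / math.sqrt(vx * vy)


def score_s(a, b, c, scales):
    """Return (score S, best scale): the maximum correlation of
    min(k*A, B) with C over the candidate scales k."""
    best_r, best_k = -math.inf, None
    for k in scales:
        complex_amount = [min(k * x, y) for x, y in zip(a, b)]
        r = pearson(complex_amount, c)
        if r > best_r:
            best_r, best_k = r, k
    return best_r, best_k


# Hypothetical data: A is measured on half the scale needed for the complex.
a = [0.5, 1.5, 1.0, 2.5, 2.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]
c = [1.1, 0.9, 2.2, 2.9, 4.1]

s, k = score_s(a, b, c, scales=(0.5, 1.0, 2.0, 4.0))
```

Here the search recovers k = 2, the scale at which min(k·A, B) follows C most closely. Scaling only A is enough in this sketch because correlation is unaffected by rescaling both proteins by a common factor.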
- Slide 12
- Estimating Combinatorial Effect from Score S. Score S consists of
the sole effects (A on C, and B on C) and the combinatorial effect.
We therefore compute a baseline score: the score S under the
assumption of no combinatorial effect. The difference between score
S and this baseline is the level of combinatorial effect. The
baseline is obtained by computing a statistical distribution.
- Slide 13
- How to compute the distribution of score S? Assume that the
expression levels of proteins A, B and C follow normal
distributions. A computer simulation yields the distribution of
score S under no combinatorial effect: (1) randomly create
distributions of A, B and C in which the correlation coefficient of
A-B and that of B-C take given values; (2) compute score S; (3)
repeat the computation of score S to obtain its distribution. We
can obtain the distribution for each pair of correlation values
(e.g. 0.5 and 0.3, or 0.5 and 0.4), and create a table of the
average and stddev for each pair. (Figure: table of simulated
scores; in each cell, upper: average, lower: stddev.)
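A rough Monte Carlo sketch of this simulation is given below. Several choices are assumptions rather than details from the slides: the two fixed correlations are taken to be those of A-C and B-C (the columns listed for the ranking table on Slide 20), A and B are generated independently, expression levels are centered at 10 with unit variance, and the scale grid is arbitrary. The names `simulate_null`, `r_ac` and `r_bc` are hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def score_s(a, b, c, scales):
    """Maximum correlation of min(k*A, B) with C over the scales k."""
    return max(
        pearson([min(k * x, y) for x, y in zip(a, b)], c) for k in scales
    )


def simulate_null(r_ac, r_bc, n_samples, n_trials, scales, seed=0):
    """Average and stddev of score S when C depends on A and B only
    linearly (sole effects), i.e. with no combinatorial effect."""
    rng = random.Random(seed)
    # Requires r_ac**2 + r_bc**2 <= 1 because A and B are independent here.
    resid = math.sqrt(1.0 - r_ac ** 2 - r_bc ** 2)
    scores = []
    for _ in range(n_trials):
        a, b, c = [], [], []
        for _ in range(n_samples):
            za, zb, zc = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            a.append(10.0 + za)
            b.append(10.0 + zb)
            c.append(10.0 + r_ac * za + r_bc * zb + resid * zc)
        scores.append(score_s(a, b, c, scales))
    return statistics.mean(scores), statistics.stdev(scores)


scales = (0.8, 0.9, 1.0, 1.1, 1.25)
avg, sd = simulate_null(0.5, 0.3, n_samples=50, n_trials=200, scales=scales)
```

Repeating the call over a grid of `(r_ac, r_bc)` values would fill the table of averages and standard deviations described above.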
- Slide 14
- Computing Combinatorial Effect as Z-score. Place the score S in
the distribution of the score under no combinatorial effect that
corresponds to the same correlation values. The z-score measures
the difference between score S and the average of that
distribution, counted in standard deviations:
Z-score = (score S - average) / stddev.
The higher the z-score is, the stronger the combinatorial effect
is.
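As a small worked instance of the formula, the sketch below computes a z-score from a score and the average and standard deviation of a no-combinatorial-effect distribution. The numeric inputs are hypothetical and chosen only to make the arithmetic easy to follow; `z_score` is an illustrative name.

```python
def z_score(score, null_average, null_stddev):
    """Distance of `score` from the no-effect average, in standard deviations."""
    if null_stddev <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - null_average) / null_stddev


# Hypothetical values: observed score S, and the average and stddev of the
# simulated score under no combinatorial effect.
z = z_score(0.8, null_average=0.525, null_stddev=0.05)
```

With these inputs the observed score lies 5.5 standard deviations above the no-effect average.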
- Slide 15
- Summary of the proposed algorithm. (1) Try all combinations of
A, B and C taken from the expression data (proteins A, B, C, D, ...
over sample1, sample2, sample3, ...): (A,B)C, (A,B)D, (A,B)E,
(A,B)F, (A,C)B, (A,C)D, (A,C)E, (A,C)F, (B,C)A, (B,C)D, (B,C)E,
(B,C)F, and so on. (2) For each combination, compute the maximum
correlation coefficient among all scales of A and B to obtain score
S; for example, correlations of 0.3, 0.8 and 0.5 give score S =
0.8. (3) Compute z-scores from the distribution of the score under
no combinatorial effect, and rank the list of all combinations by
z-score. Example ranking (combination, z-score): (A,C)B 5.5;
(B,C)E 4.9; (A,B)F 4.7.
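The three steps can be put together roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: `null_stats` is a hypothetical callback standing in for the lookup table of averages and standard deviations, the scale grid is arbitrary, and the expression data are invented.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def score_s(a, b, c, scales):
    """Maximum correlation of min(k*A, B) with C over the scales k."""
    return max(
        pearson([min(k * x, y) for x, y in zip(a, b)], c) for k in scales
    )


def rank_combinations(data, scales, null_stats):
    """Return [((A, B), C, z-score)] sorted by descending z-score."""
    results = []
    for a, b in itertools.combinations(sorted(data), 2):  # step 1: all pairs
        for c in sorted(data):
            if c in (a, b):
                continue
            s = score_s(data[a], data[b], data[c], scales)  # step 2: score S
            avg, sd = null_stats(
                pearson(data[a], data[c]), pearson(data[b], data[c])
            )
            results.append(((a, b), c, (s - avg) / sd))  # step 3: z-score
    results.sort(key=lambda item: item[2], reverse=True)
    return results


# Hypothetical expression data: protein name -> level in each sample.
data = {
    "A": [1.0, 3.0, 2.0, 5.0, 4.0],
    "B": [2.0, 1.0, 4.0, 3.0, 5.0],
    "C": [1.1, 0.9, 2.2, 2.9, 4.1],
    "D": [3.0, 1.0, 2.0, 2.5, 1.5],
}

# Placeholder null model returning a fixed average and stddev.
ranking = rank_combinations(
    data, scales=(0.5, 1.0, 2.0), null_stats=lambda r_ac, r_bc: (0.5, 0.1)
)
```

With four proteins this evaluates 6 pairs times 2 targets, i.e. 12 combinations; the cost grows cubically with the number of proteins.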
- Slide 16
- Evaluation. We applied our method to real expression data:
protein expression data of black cattle, with 195 samples and 879
proteins. We searched for combinatorial protein-protein
interactions using our method.
- Slide 17
- The Expression Data Follows Normal Distribution. Using the
Jarque-Bera test at a confidence level of 95%, we test whether the
expression data of each protein follow a normal distribution.
Result: 454 out of 879 proteins follow a normal distribution. Thus,
we use these 454 proteins for the evaluation.
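For reference, a minimal sketch of the Jarque-Bera statistic, using the standard definition JB = n/6 · (skewness² + (kurtosis − 3)² / 4) and the asymptotic chi-square distribution with 2 degrees of freedom for the p-value. How the authors ran the test (software, small-sample corrections) is not stated in the slides; the two data sets below are synthetic.

```python
import math
from statistics import NormalDist


def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value) for a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Synthetic, evenly spaced normal quantiles for 195 samples.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 195) for i in range(195)]
# Synthetic, strongly skewed data.
skewed = [0.0] * 95 + [100.0] * 5

_, p_normal = jarque_bera(normal_like)
_, p_skewed = jarque_bera(skewed)

# At the 95% level, normality is rejected when the p-value is below 0.05.
keep_normal_like = p_normal >= 0.05
keep_skewed = p_skewed >= 0.05
```

A protein would be kept for the evaluation only when the test does not reject normality, as with `normal_like` here.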
- Slide 18
- Results. We found many combinations of proteins which would have
a combinatorial effect. The maximum z-score is 11.0. The
combinations whose z-score is more than about 5.5 (p-value less
than 0.000000019 (= 0.05 / C(454, 3))) would have a combinatorial
effect at a confidence level of 95%. (Figure: the histogram of
z-scores; axes: z-score, # of combinations.)
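The arithmetic behind a threshold of this kind can be sketched as follows, assuming a Bonferroni-style correction over C(454, 3) trials and a one-sided tail of the standard normal distribution. Both are assumptions about how the quoted figures were derived. The sketch computes the corrected significance level and the tail probability at z = 5.5 so that they can be compared with the values quoted on the slide.

```python
import math

n_proteins = 454
alpha = 0.05

# Number of ways to choose 3 proteins out of 454.
n_trials = math.comb(n_proteins, 3)

# Bonferroni-style corrected significance level.
corrected_alpha = alpha / n_trials


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_at_5_5 = upper_tail(5.5)

print(f"C(454, 3)        = {n_trials}")
print(f"0.05 / C(454, 3) = {corrected_alpha:.3e}")
print(f"P(Z > 5.5)       = {p_at_5_5:.3e}")
```

Counting ordered roles, where each triple can assign the target C in three ways, would give three times as many trials; which count applies is not stated in the slides.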
- Slide 19
- Comparing z-scores with normal distribution. We compare the
histogram of the real data with the one expected without
combinatorial effect, which is created by scaling the normal
distribution by the number of trials (C(454, 3)). It is inferred
that this data includes a considerable amount of combinatorial
effect. (Figure: histogram of z-scores obtained from real data, and
estimated distribution of z-scores under the assumption of no
combinatorial effect; axes: z-score, # of combinations.)
- Slide 20
- The Ranking Based on Z-score. The ranking table shows that
combinations with a low score S are retrieved, and that the same
protein tends to appear many times. (Table: the ranking of z-scores
obtained from real data; columns: Rank, A Protein Num, B Protein
Num, C Protein Num, Correlation of A-C, Correlation of B-C, Score
S, Z-score.)
- Slide 21
- Conclusion. Summary: we proposed a method to estimate the
combinatorial effect of three proteins from protein expression
data. Applying the method to real data, we found many combinations
which would have a combinatorial effect. Future work: to confirm
the reliability, we plan to study whether the found combinations
include well-known protein-protein interactions.