A Random Walk Based Approach
for Improving Interaction Network and
Increasing Prediction Accuracy
Chengwei LEI, Ph.D.
Assistant Professor of Computer Science
Department of Electrical Engineering and Computer Science
McNeese State University
What is an Interaction Network?
• An interaction network is a network of nodes that are connected by features.
• If the feature is physical and molecular, the interaction network consists of the molecular interactions usually found in cells.
First Introduced in Biology
Network View of Protein Interaction Network
Sounds familiar?
Even In Mechanical Engineering
Real-world Classification
• Noisy data
• Overfitting problem
• Few true “driver” changes / vast number of “passenger” changes.
Good Bad
Current Methods
Classifier
Prediction
Statistical test
Pick the most significant ones
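The "statistical test, then pick the most significant ones" step can be sketched as below. This is only an illustration under stated assumptions: it ranks features by the absolute Welch t statistic instead of computing p-values, and the function names (`welch_t`, `top_features`) and the toy data are hypothetical, not taken from the slides.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def top_features(good, bad, k):
    """Rank features by |t| between the two groups; return the top-k indices.

    good, bad: lists of samples, each sample a list of feature values.
    """
    n_features = len(good[0])
    scores = []
    for f in range(n_features):
        a = [sample[f] for sample in good]
        b = [sample[f] for sample in bad]
        scores.append((abs(welch_t(a, b)), f))
    scores.sort(reverse=True)
    return [f for _, f in scores[:k]]


# Toy data (made up): feature 0 separates the groups, feature 1 does not.
good = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0]]
bad = [[3.0, 5.5], [3.1, 4.5], [2.9, 5.0]]
selected = top_features(good, bad, k=1)
```

Each feature is scored on its own here, so any relationship between features is ignored.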
Problem?
• Ignores the relationships between nodes/features/sensors
Our approach
• Improve prognosis by combining
– Node readout data
– Node-node interaction networks
[Diagram: Network, Transformation Matrix, Classifier, Prediction]
Transformation Matrix
• The transformation matrix is generated by applying the Random Walk with Restart (RWR) algorithm to the interaction network.
Random Walk
• A random walk is a mathematical formalization of a path that consists of a succession of random steps.
• A random walk from one node on a graph G is a walk on G where the next node is chosen uniformly at random from the set of neighbors of the current node
– when the walk is at node v, the probability of moving in the next step to the neighbor u is $P_{vu} = 1/d(v)$ if (v, u) is connected and 0 otherwise.
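As a small illustration of the transition rule, the sketch below builds the matrix of step probabilities from an adjacency matrix, taking d(v) to be the number of neighbors of v. The 4-node graph and the name `transition_matrix` are assumptions made for the example.

```python
def transition_matrix(adj):
    """Return P with P[v][u] = 1/d(v) if (v, u) is an edge, else 0."""
    n = len(adj)
    P = []
    for v in range(n):
        degree = sum(adj[v])  # d(v): number of neighbors of v
        row = [adj[v][u] / degree if degree else 0.0 for u in range(n)]
        P.append(row)
    return P


# Hypothetical undirected graph with edges 0-1, 0-2, 1-2, 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = transition_matrix(adj)
```

Every row of `P` sums to 1 as long as the node has at least one neighbor.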
[Figure sequence: Random Walk, Step 1, Step 2, Step 3, …, Step N]
Random Walk with Restart
• A random walker starts from a node v with
– uniform probability to visit its neighbors
– fixed probability c to revisit the start node v
• The probability for a random walker to be on node j after k steps is [equation]
– $f_{ij}^{k}(v)$ is the probability for a random walker to take the path from i to j at time k
– $F_j(v)$ at equilibrium is the probability for a random walker starting from node v to reach node j => similarity between patient v and j
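A minimal sketch of random walk with restart follows. It assumes the commonly used formulation in which, at every step, the walker returns to the start node with probability c and otherwise moves to a uniformly chosen neighbor; the exact update on the slide is not reproduced here. The graph, the value c = 0.3, and the names `rwr` and `T` are illustrative assumptions.

```python
def rwr(adj, start, c=0.3, tol=1e-10, max_iter=10000):
    """Iterate the restart walk from `start` until the distribution settles."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    restart = [1.0 if j == start else 0.0 for j in range(n)]
    F = restart[:]
    for _ in range(max_iter):
        # Probability c always flows back to the start node.
        nxt = [c * restart[j] for j in range(n)]
        for i in range(n):
            if degree[i] == 0:
                continue  # an isolated node passes nothing on
            share = (1.0 - c) * F[i] / degree[i]
            for j in range(n):
                if adj[i][j]:
                    nxt[j] += share
        if max(abs(a - b) for a, b in zip(nxt, F)) < tol:
            return nxt
        F = nxt
    return F


# Hypothetical undirected, unweighted graph with edges 0-1, 0-2, 1-2, 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
# One row per start node: row v holds the equilibrium probabilities F_j(v).
T = [rwr(adj, v) for v in range(len(adj))]
```

Stacking one equilibrium vector per start node is one plausible way to obtain a node-by-node matrix from the network; how the matrix is then combined with the readout data is not shown in this sketch.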
How about Two?
Experiments
• Biology Data
– Cancer prediction
Classification results
Wang’s Dataset
[Figure labels: Network, Transformation Matrix; dimensions 286, 7885, 2259, 10144; T-test (Good, Bad); counts 1247, 1678; 483, 1195, 52]
P-value comparison for Wang’s data
[Figure labels: significantly down-regulated genes; significantly up-regulated genes]
For Vijver’s dataset
146349 856
DE Genes
Further verification
• For verification, search each gene in the PubMed database
– pick the top DE genes from the original dataset and the enhanced dataset,
– with keyword “(GENE-NAME) AND Cancer AND (Metastasis or Metastatic)”.
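The keyword template can be filled in per gene with a small helper like the one below. The function name `pubmed_query` is hypothetical, and the sketch only builds the search string; it does not contact PubMed.

```python
def pubmed_query(gene_name):
    """Fill a gene name into the keyword template from the slide."""
    return f"({gene_name}) AND Cancer AND (Metastasis or Metastatic)"


# Two gene names that appear later in the slides, used here as examples.
queries = [pubmed_query(gene) for gene in ["RPS6", "RACGAP1"]]
```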
Top 15 DE genes in original dataset
Top 15 original non-significant genes in the enhanced dataset
• SLC26A8 is a gene related to male reproductive system diseases
• It is also related to breast cancer
– A. E. Dahm, A. L. Eilertsen, J. Goeman, “A microarray study on the effect of four hormone therapy regimens on gene transcription in whole blood from healthy postmenopausal women,” Thrombosis Research, vol. 130, no. 1, pp. 45–51, 2012.
– J.-H. Shin, E. Son, H. Lee, S. Kim, “Molecular and functional expression of anion exchangers in cultured normal human nasal epithelial cells,” Acta Physiologica, vol. 191, no. 2, pp. 99–110, 2007.
• RPS6 is a very important gene in cancer research, especially for cancer antibody drug development
– J. C. Potratz, D. N. Saunders, D. H. Wai, et al., “Synthetic lethality screens reveal RPS6 and MST1R as modifiers of insulin-like growth factor-1 receptor inhibitor activity in childhood sarcomas,” Cancer Research, vol. 70, no. 21, pp. 8770–8781, 2010.
– F. Henjes, C. Bender, S. von der Heyde, L. Braun, et al., “Strong EGFR signaling in cell line models of ERBB2-amplified breast cancer attenuates response towards ERBB2-targeting drugs,” Oncogenesis, vol. 1, no. 7, p. e16, 2012.
• G2E3 is a dual-function ubiquitin ligase required for early embryonic development, and also a nucleo-cytoplasmic shuttling protein with DNA damage responsive localization
– W. S. Brooks, E. S. Helton, S. Banerjee, “G2E3 is a dual function ubiquitin ligase required for early embryonic development,” Journal of Biological Chemistry, vol. 283, no. 32, pp. 22304–22315, 2008.
• RACGAP1 plays a regulatory role in cell growth, transformation and metastasis
– S. Saigusa, K. Tanaka, Y. Mohri, M. Ohi, T. Shimura, et al., “Clinical significance of RACGAP1 expression at the invasive front of gastric cancer,” Gastric Cancer, pp. 1–9, 2014.
– V. Kotoula, K. T. Kalogeras, G. Kouvatseas, D. Televantou, R. Kronenwett, “Sample parameters affecting the clinical relevance of RNA biomarkers in translational breast cancer research,” Virchows Archiv, vol. 462, no. 2, pp. 141–154, 2013.
– K. Pliarchopoulou, K. Kalogeras, R. Kronenwett, et al., “Prognostic significance of RACGAP1 mRNA expression in high-risk early breast cancer: a study in primary tumors of breast cancer patients participating in a randomized Hellenic Cooperative Oncology Group trial,” Cancer Chemotherapy and Pharmacology, vol. 71, no. 1, pp. 245–255, 2013.
Ongoing Experiment
Thank you