Topological analysis of coexpression networks in neoplastic tissues
R Anglani1, TM Creanza2, VC Liuzzi1, PF Stifanelli1, R Maglietta1, A Piepoli3, S Mukherjee4, PF Schena2, N Ancona1
1Bioinformatics & Systems Biology Lab, CNR-ISSIA, Bari, Italy; 2Dipartimento di Emergenza e Trapianti Organi, Università di Bari, Italy; 3Unità Operativa di Gastroenterologia, IRCCS, Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy; 4Institute for Genome Sciences and Policy, Duke University, Durham, USA
BITS 2012 Ninth Annual Meeting of the Bioinformatics Italian Society May 2-4, 2012, Catania, Italy
Gene co-expression networks are useful models for shedding light on the coordinated expression of groups of genes that are functionally co-regulated to provide an adaptive response to modifications of the system. In this framework, topology-based approaches to network analysis can yield unexpected insights into global properties of biological systems that cannot be revealed by single-gene approaches. We show that topological differences emerge from the comparison between normal and cancer networks and can identify non-differentially expressed genes that may play a role in the evolution of a specific disease. To this aim, we introduce a novel method for the characterization of disease genes, based on a new observable called “differential connection”, i.e. a statistically significant difference in the degree of a gene between two phenotypic conditions. Moreover, we find that preferential removal of “differentially connected” genes alters the average path length of the normal network differently from random failure and hub removal. Finally, we suggest a possible association between differential connectivity and the presence of mutations within protein interaction domains.
Fisher’s exact test, LUNG (19520 genes): DC(FDR<13%) = 7520; DE(FDR<13%) = 12258; DC&DE = 3293; p(DC&DE>3292) ~ 1
Fisher’s exact test, DC & Cancer census: DC(FDR<13%) = 1072; CENSUS(CRC) = 18; DC & CENSUS = 4; p(DC&CC>4) ~ 0.02
Fisher’s exact test, DE & Cancer census: DE(FDR<13%) = 6606; CENSUS(CRC) = 18; DE & CENSUS = 9; p(DE&CC>9) ~ 0.21
Fisher’s exact test, COLON (17400 genes): DC(FDR<13%) = 1072; DE(FDR<13%) = 6606; DC&DE = 260; p(DC&DE>260) ~ 1
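As a rough illustration of how such an overlap p-value can be obtained, the one-sided Fisher’s exact test is equivalent to the upper tail of a hypergeometric distribution. The sketch below uses the colon counts reported for the DC & Cancer census comparison (17400 genes, 1072 DC genes, 18 Census genes, overlap 4). The function name `hypergeom_upper_tail` is hypothetical, and the tail is assumed to include the observed overlap; the poster does not state which convention was used.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the marked set (one-sided Fisher's exact test)."""
    total = comb(N, n)
    largest = min(K, n)
    favourable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, largest + 1))
    return favourable / total

# Counts taken from the colon DC & Cancer census box:
# N = genes on the array, K = differentially connected genes,
# n = colorectal Census genes, k = observed overlap.
p_census = hypergeom_upper_tail(N=17400, K=1072, n=18, k=4)
print(f"p(DC & CENSUS >= 4) = {p_census:.3f}")
```

Exact integer arithmetic keeps the tail sum stable even when the binomial coefficients are very large; a library routine such as `scipy.stats.fisher_exact` would be the usual choice in practice.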
Dataset
(EMTAB829) colorectal cancer: Affymetrix GeneChip Human Exon 1.0 ST, 28 samples (14 cancer and 14 paired normal)
(GSE15852) breast cancer: Affymetrix GeneChip Human Genome U133A, 86 samples (43 cancer and 43 paired normal)
(GSE18842) non-small cell lung cancer: Affymetrix GeneChip Human Genome U133Plus2, 91 samples (46 cancer and 45 normal, all paired except 3)
Coexpression network generation
Coexpression networks are built by evaluating the Pearson correlation coefficient (PCC) r between pairs of genes in each dataset X and testing the hypothesis of no correlation. Under the null hypothesis, for approximately Gaussian data, the statistic

$$ t = r\,\sqrt{\frac{n-2}{1-r^{2}}} $$

follows Student’s t-distribution with n-2 degrees of freedom, where n is the sample size.
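A minimal sketch of the test statistic for a single pair of genes, assuming each gene is given as a plain list of expression values across the n samples. The helper names `pearson_r` and `t_statistic` and the toy vectors are illustrative and not taken from the poster.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1.0 - r * r))

# Toy expression profiles of two genes over five samples (made-up values).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(gene_a, gene_b)
t = t_statistic(r, len(gene_a))
print(r, t)
```

A two-sided p-value would then come from the Student’s t survival function with n-2 degrees of freedom (for instance `2 * scipy.stats.t.sf(abs(t), n - 2)`), and an edge would be drawn between the two genes when the null hypothesis is rejected.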
Degree distribution
The degree of a node in a graph is the number of edges connected to it. The degree distribution P(k) is the probability that a randomly selected node has k links. A random network is characterized by a Poisson degree distribution, while a ‘scale-free’ network [1] has a power-law distribution P(k) ~ k^{-γ}. We find that the degree distributions of the examined coexpression networks are scale-free, in agreement with Ref. [2].
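The definitions of degree and P(k) can be sketched as follows, assuming the network is available as an undirected edge list. The function name `degree_distribution` and the toy star graph are illustrative only.

```python
from collections import Counter

def degree_distribution(nodes, edges):
    """Return (degree per node, P(k)) for an undirected graph."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    # P(k): fraction of nodes that have exactly k links.
    p_k = {k: c / n for k, c in sorted(counts.items())}
    return degree, p_k

# Toy star graph: one hub connected to three leaves.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D")]
degree, p_k = degree_distribution(nodes, edges)
print(degree, p_k)
```

For a scale-free network, plotting P(k) against k on log-log axes would give an approximately straight line with slope -γ.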
Fisher’s exact test, BREAST (12157 genes): DC(FDR<13%) = 582; DE(FDR<13%) = 3127; DC&DE = 337; p(DC&DE>337) ~ 1E-64
Differential connectivity
Given Δi, the degree difference of the i-th gene between the normal and cancer conditions, a gene is said to be “differentially connected” when Δi is statistically significant. To assess significance, we randomly assign patients to one of two groups and evaluate Δi* for each permutation. We repeat the shuffle 1000 times to obtain the null distribution. The differential connection p-value is evaluated by comparing the observed Δi with this distribution. To control the expected proportion of incorrectly rejected null hypotheses, we compute the Benjamini-Hochberg False Discovery Rate and set the significance threshold to 13%.
Fisher’s exact tests for the colon and lung cases suggest that “differentially connected” genes can represent a population distinct from differentially expressed genes.
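The permutation scheme and the FDR step can be sketched as below. This is a simplified illustration under several assumptions that are not in the poster: edges are drawn with a fixed cutoff on |r| rather than with the t-test, samples are shuffled freely without respecting pairing, the p-value is two-sided with an add-one correction, and all function names and the toy data are hypothetical.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two lists (0.0 if either one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)

def degrees(samples, threshold):
    """Degree of each gene; samples is a list of per-patient expression vectors."""
    n_genes = len(samples[0])
    columns = [[s[g] for s in samples] for g in range(n_genes)]
    deg = [0] * n_genes
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            # Assumption: an edge exists when |r| reaches a fixed cutoff.
            if abs(pearson_r(columns[i], columns[j])) >= threshold:
                deg[i] += 1
                deg[j] += 1
    return deg

def differential_connection(normal, cancer, threshold=0.8, n_perm=1000, seed=0):
    """Observed degree differences and their permutation p-values."""
    rng = random.Random(seed)
    observed = [a - b for a, b in
                zip(degrees(normal, threshold), degrees(cancer, threshold))]
    pooled = normal + cancer
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # random reassignment of patients to two groups
        d1 = degrees(shuffled[:len(normal)], threshold)
        d2 = degrees(shuffled[len(normal):], threshold)
        for g, delta in enumerate(observed):
            if abs(d1[g] - d2[g]) >= abs(delta):
                exceed[g] += 1
    return observed, [(e + 1) / (n_perm + 1) for e in exceed]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data: 8 patients per group, 5 genes, random values (illustration only).
rng = random.Random(1)
normal = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(8)]
cancer = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(8)]
delta, pvals = differential_connection(normal, cancer, n_perm=200)
fdr = benjamini_hochberg(pvals)
differentially_connected = [g for g, q in enumerate(fdr) if q <= 0.13]
```

The pairwise loop is quadratic in the number of genes, so a real analysis on thousands of genes would need a vectorized correlation matrix; the structure of the permutation test stays the same.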
[Figure: average path length vs. fraction of removed nodes for the Breast and Lung networks; curves: random, hubs, diff conn, diff expr; with inset.]
[Figure: BH False Discovery Rate [%] vs. differential connection p-value (logarithmic axis) for Lung, Breast and Colon.]
System attack tolerance
The average path length (APL) is the mean geodesic length over all pairs of vertices and measures how efficiently information is transported across the network. Removing differentially connected genes alters the APL in a way that is clearly different from random failure, hub removal [3] and removal of differentially expressed genes.
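A minimal sketch of how an APL-versus-removal curve can be computed, assuming the network is stored as a dictionary of neighbour sets. Averaging over reachable pairs only when the graph becomes disconnected is an assumption of this sketch, since the poster does not say how disconnected pairs were treated; the function names and the toy path graph are illustrative.

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs of nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def remove_nodes(adj, removed):
    """Copy of the graph without the removed nodes and their edges."""
    removed = set(removed)
    return {u: nbrs - removed for u, nbrs in adj.items() if u not in removed}

def attack_curve(adj, order):
    """(fraction of removed nodes, APL) after removing nodes in the given order."""
    n = len(adj)
    curve = []
    for i in range(1, len(order) + 1):
        pruned = remove_nodes(adj, order[:i])
        curve.append((i / n, average_path_length(pruned)))
    return curve

# Toy path graph 0-1-2-3; hub removal takes nodes by decreasing degree.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
hubs_first = sorted(path, key=lambda v: len(path[v]), reverse=True)
curve = attack_curve(path, hubs_first)
```

Passing a different `order` (a random shuffle, the differentially connected genes, or the differentially expressed genes) would give the other removal strategies compared in this section.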
Literature validation: Cancer-related mutations
During cancer progression, mutations can occur in either regulatory or coding sites of genes. It is reasonable to expect that alterations of regulatory sites lead to modified gene expression, whereas missense and nonsense mutations in the binding regions of a protein could disrupt some of its interactions with other proteins. In our study, we find a significant intersection (Fisher’s exact test p-value ~ 0.02) between differentially connected genes and colon-cancer-related Census genes. Differentially expressed genes do not show the same significance.
Our results suggest that differentially connected genes can correspond to genes frequently mutated in colorectal cancer according to the Cancer Census Database (Wellcome Trust Institute).
[1] Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)
[2] Carter et al., Bioinformatics 20, 2242 (2004)
[3] Albert & Barabasi, Rev. Mod. Phys. 74, 47 (2002)
[Figure: average path length vs. fraction of removed nodes for the Colon network; curves: random, hubs, diff conn, diff expr; inset with labels FBXW7, KRAS, MAP2K4, VTI1A.]
[Figure: number of genes with degree k vs. degree k for the Breast, Colon and Lung networks (normal and cancer), each with a logarithmic inset.]