8
2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) A Computational Model for Genetic and Epigenetic Signals in Colon Cancer Irina A. Roznovat and Heather J. Ruskin Centre for Scientific Computing & Complex Systems Modelling (SCI-SYM), School of Computing, Dublin City University, Dublin, Ireland {iroznovat, hruskin}@computing.dcu.ie Abstract-Cancer, a class of diseases, which are characterized by abnormal cell growth, has one of the highest overall death rates world-wide. Its development has been linked to genetic and epigenetic events that aberrantly affect the regulation of key genes that control cellular mechanisms. However, a distinct problem in cancer research is the lack of precise information on tumor pathways, so the delineation of these and the processes underlying disease proliferation is an important area of investigation. This has motivated the effort to build computational models to help understand the molecular changes that lead to one or more types of malignancy. The current work aims to develop a prototype network- based model to describe cancer initiation and progression, with a focus on three main layers: micro-molecular events, gene interactions and cancer stages. Keywords-Genetic and epigenetic events; cancer; pathogenesis; gene interactions; computational model I. INTRODUCTION Over the last several decades, considerable effort has been made to identify the oncogenes and tumor suppressor genes that are most susceptible to mutation and hence implicated in the initiation of cancers. Familial risk has also been investigated and heritable tendencies identified for cancers of the colon, breast, lung and others [1], as well as for skin melanomas [2]. While familial cancer is less common than spontaneous manifestations, genetic patterns are gradually being extracted, which may contribute to the identification of those at greater risk. Further, development of targeted therapies is now feasible, where these aim to control or even neutralize the effects of changes in key genes. Over the last two decades, it has become increasingly evident that not only genetic, but also epigenetic inheritance is important in many pathological conditions, and relates to features, such as loss of imprinting 1, hyper and hypomethylation 2 [3], and chromatin remodeling [4]. Defined as non-genetic heritable changes in chromatin structure that do not include modifications in DNA sequence, but affect gene expression [4], epigenetic signals have been detected in the earliest stages of different neoplastic diseases 1 A genomic phenomenon whereby normally silent inherited alleles are activated 2 Epigenetic events associated with increased methylation / gene inactivation, and loss of imprinting / chromosomal instability 978-1-4673-2747-3/12/$31.00 ©2012 IEEE 188 [5], and act as markers for cancer initiation. Additionally, their unique reversibility property, (not shared by genetic mutations), is already exploited in some therapies. Cancer is caused by the deregulation of key genes that control cellular mechanisms, including mitosis 3 , meiosis 4 , apoptosis', and movement. Gene transcription is controlled by chromatin, which can change dynamically, due to different epigenetic events, such as DNA methylation (DNAm) and histone modifications (HM) [6], [7]. DNA methylation, the molecular process that involves the addition of a methyl group to cytosine ring, is linked in cancer [8], to unscheduled tumor suppressor gene silencing, oncogene activation and chromosomal instability [5]. Respectively, these can protect cells by inhibiting mutated cell division, or can promote cell proliferation and lead to increase in cell predisposition to mutation. Similarly, the modification of histones, (proteins that package the DNA nucleosome sequence), influences change in chromatin structure. Gene activity is dependent on different types of histone modification, which may be cumulative, and which include methylation, acetylation, and phosphorylation, (the addition, respectively of an acetyl or phosphate group to a chemical compound). For example, the deacetylated histone state has been reported to maintain nucleosomes in a transcriptional silent state [9]. Nowadays, treatments of incipient stages of cancer, for localized, non-metastasized tumors, are increasingly effective and it is known that the early detection of abnormal cellular changes within the body can contribute to their success in inhibiting tumor growth. Therefore, the study of DNA methylation and histone modification, alterations that have been linked to cancer initiation, are of particular interest in cancer research. In this paper, we focus on genetic and epigenetic events that induce aberrant DNA methylation in cells and propose a computational model to describe their interdependencies in cancer development. 3 The process by which a cell separates the chromosome into two identical sets 4 The cell division necessary for sexual reproduction 5 The process of programmed cell death

[IEEE 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) - Philadelphia, USA (2012.10.4-2012.10.7)] 2012 IEEE International Conference on Bioinformatics

Embed Size (px)

Citation preview

2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW)

A Computational Model for Genetic and Epigenetic Signals in Colon Cancer

Irina A. Roznovat and Heather J. RuskinCentre for Scientific Computing & Complex Systems Modelling (SCI-SYM),

School of Computing, Dublin City University,Dublin, Ireland

{iroznovat, hruskin}@computing.dcu.ie

Abstract-Cancer, a class of diseases, which are characterizedby abnormal cell growth, has one of the highest overall deathrates world-wide. Its development has been linked to geneticand epigenetic events that aberrantly affect the regulation ofkey genes that control cellular mechanisms. However, adistinct problem in cancer research is the lack of preciseinformation on tumor pathways, so the delineation of these andthe processes underlying disease proliferation is an importantarea of investigation. This has motivated the effort to buildcomputational models to help understand the molecularchanges that lead to one or more types of malignancy.

The current work aims to develop a prototype network­based model to describe cancer initiation and progression, witha focus on three main layers: micro-molecular events, geneinteractions and cancer stages.

Keywords-Genetic and epigenetic events; cancer;pathogenesis; gene interactions; computational model

I. INTRODUCTION

Over the last several decades, considerable effort hasbeen made to identify the oncogenes and tumor suppressorgenes that are most susceptible to mutation and henceimplicated in the initiation of cancers. Familial risk has alsobeen investigated and heritable tendencies identified forcancers of the colon, breast, lung and others [1], as well asfor skin melanomas [2]. While familial cancer is lesscommon than spontaneous manifestations, genetic patternsare gradually being extracted, which may contribute to theidentification of those at greater risk. Further, developmentof targeted therapies is now feasible, where these aim tocontrol or even neutralize the effects of changes in keygenes. Over the last two decades, it has become increasinglyevident that not only genetic, but also epigenetic inheritanceis important in many pathological conditions, and relates tofeatures, such as loss of imprinting 1, hyper andhypomethylation 2 [3], and chromatin remodeling [4].Defined as non-genetic heritable changes in chromatinstructure that do not include modifications in DNA sequence,but affect gene expression [4], epigenetic signals have beendetected in the earliest stages of different neoplastic diseases

1 A genomic phenomenon whereby normally silent inherited alleles areactivated2 Epigenetic events associated with increased methylation / geneinactivation, and loss of imprinting / chromosomal instability

978-1-4673-2747-3/12/$31.00 ©2012 IEEE 188

[5], and act as markers for cancer initiation. Additionally,their unique reversibility property, (not shared by geneticmutations), is already exploited in some therapies.

Cancer is caused by the deregulation of key genes thatcontrol cellular mechanisms, including mitosis 3

, meiosis 4,

apoptosis', and movement. Gene transcription is controlledby chromatin, which can change dynamically, due todifferent epigenetic events, such as DNA methylation(DNAm) and histone modifications (HM) [6], [7]. DNAmethylation, the molecular process that involves the additionof a methyl group to cytosine ring, is linked in cancer [8], tounscheduled tumor suppressor gene silencing, oncogeneactivation and chromosomal instability [5]. Respectively,these can protect cells by inhibiting mutated cell division, orcan promote cell proliferation and lead to increase in cellpredisposition to mutation. Similarly, the modification ofhistones, (proteins that package the DNA nucleosomesequence), influences change in chromatin structure. Geneactivity is dependent on different types of histonemodification, which may be cumulative, and which includemethylation, acetylation, and phosphorylation, (the addition,respectively of an acetyl or phosphate group to a chemicalcompound). For example, the deacetylated histone state hasbeen reported to maintain nucleosomes in a transcriptionalsilent state [9].

Nowadays, treatments of incipient stages of cancer, forlocalized, non-metastasized tumors, are increasinglyeffective and it is known that the early detection of abnormalcellular changes within the body can contribute to theirsuccess in inhibiting tumor growth. Therefore, the study ofDNA methylation and histone modification, alterations thathave been linked to cancer initiation, are of particular interestin cancer research. In this paper, we focus on genetic andepigenetic events that induce aberrant DNA methylation incells and propose a computational model to describe theirinterdependencies in cancer development.

3 The process by which a cell separates the chromosome into two identicalsets4 The cell division necessary for sexual reproduction5 The process of programmed cell death

II. COLON CANCER MODEL

A. Gene framework structureStructured in three main layers, (micromolecular events,

genetic interactions and cancer stages), the current modellooks at the effect on phenotype of a number of key geneticand epigenetic factors. These include DNA methylation(both hyper and hypomethylation), and histonemodifications, combined with tumor pathway informationregarding genetic mutations and gene expression.

While colon cancer is the initial focus, (and is generallywell-documented), the model framework presented here isdesigned to be generalizable to other types of cancer.Therefore, more general terms, such as "carcinoma in situ"or "invasive carcinoma" have been used to denote coloncancer stages in the current framework, instead of morespecific usage. Defined as a cell having high potential tobecome abnormal, carcinoma in situ is considered to be thefirst stage of cancer, (Stage 0). This corresponds e.g. toductal carcinoma in situ (DCIS) of breast cancer, to Bowen'sdisease in skin melanomas, or to "polyps" or "aberrant cryptfocus" phenotypes in the colon gene network. Thedesignation invasive carcinoma corresponds to Stages I, II,III and is labeled "carcinoma" or "near-carcinoma" in thecurrent model framework. In summary, this si~plifiedframework recognizes four main stages: healthy, carcinomain situ, invasive carcinoma and metastasis, (i.e. Stage IVtumor phase, cancer has spread throughout the body or toanother organ).

The framework builds a network for gene relations, basedprimarily on statistical data, mined from a manually- curated

database, StatEpigen [10]. This provides information forcolon cancer, specifically on gene relationships at differentpathology phenotype levels affected by recorded genetic andepigenetic events. Relationships are designated: simple(containing single gene information at a given cancer stage)and conditional (pairwise gene dependence at a given cancerstage). These data are used to inform the set of conditionalinterdependencies for a Bayesian network. In general, theconditional relationships (CR) from StatEpigen areexpressed mathematically by:

CR(G2, Gj, e2, e.. s, c) =P((G2e~I(Gl e.), s, c) (1)i.e. the probability of an event e2 for gene G2 given the

event el for G1 in the stage s of cancer type c. Similarly, asimple relationship (SiP) is just:

SiP(G, e, s, c) = P((G e)1 (s), (c)) (2)the probability that event e appears for gene G at stage s

of cancer type c.Although StatEpigen considers only two types of gene

relationship, the model also looks at "self-relationship"(SeP),where the gene relationship depends on the previous status ofthe same gene. This is illustrated by an example foradenomatous polyposis coli, (APC), a tumor suppressor genethat is detected to be frequently mutated and/orhypermethylated in the very early stages of colon cancer.From the StatEpigendata [10]:

P(APC mutationl APC hypermethylation, adenoma,colon) = 0.538, therefore

SeP(APC, mutation, adenoma, colon) = 0.538.The interdependence between current and extended

framework levels is illustrated in Fig. 1:

Figure 1. Dynamics of gene interactions in cancer pathways: Simple Model inclusions

189

A simple overview, based on StatEpigen data, assignsprobabilities to the middle diagram layer, (Fig. 1), as shown.This middle layer aims to describe gene relations, associatedwith three types of cancer, (colon, lung and stomach),through a connected graph, where the genes are nodes,linked by directed weighted edges. The weight information isgiven by the empirical probability value and the direction bythe relationship itself. When no reliable information isprovided on dependence between two genes, a cut edge isdrawn between them. Additionally, the node color indicatesgene affiliation to a specific cancer type. For example, geneG6 is found to be mutated in all three cancer types, withconditional relationships for these data with genes Gs, Gs andGIO' (for colon cancer), G7, (for stomach and lung cancertypes), and G2, (for lung cancer). An example of thebiological data from StatEpigen that feed into the gene layerin Fig. 1 is shown in Table I below:

TABLE I. GENES INVOLVED IN DIFFERENT GENETIC AND EPIGENETICEVENTS (HTIERlvIETHYLATION (H+), GENE EXPRESSION (GE), J\1UTATION

(M)) IN COLON, LUNG AND STOMACH CANCER PHENOTTIES [10]

Group Colon Lung Stomachfrom Genes cancer cancer cancertable (C I ) (Cz) (C3)

G1 CRABPI Present(H+)

Gz NOREI Present(H+)

G3 NR3Cl Present(H+)

G4 MLHI Present(H+; GE)

Gs MGMT Present(H+; GE)

G6 KRAS Present Present Present(M) (M) (M)

G7 RASSFIA Present Present(H+) (H+)

Gs APC Present(H+; M)

Gg TP53 Present(M; GE)

G10 CDKN2A: PresentP14 (H+)

Currently, the gene model framework focuses on coloncancer and, based on the pathology phenotype levels fromStatEpigen data, each stage of colon cancer, together withthe healthy phenotype is described computationally by agraph component. However, as a Bayesian-type network, thegene framework should contain only acyclic graphcomponents [11]. Hence, a potential loop problem is solvedby removing the lowest weighted edge, (the generelationship with the lowest probability). This refinementcontinues until the constraint is satisfied.

B. The most plausible pathways in cancerOf particular interest is the detection of the most

plausible pathways in cancer development. In the currentgene framework, each graph node has been labeled with oneof the three attributes: start, middle or final node, based on

190

its positron in the current acyclic graph, (following theprevious preprocessing step).

A gene Gi is defined to be a parent for a gene Gj if thereis a conditional relationship CR(Gj, Gi, ej, ei, s, c) in theanalyzed data, where ei, ej are the molecular events observedrespectively for Gi, and Gj, in the stage s of cancer type c,and i, j are indexes with values from 1 to the total number ofnetwork genes. Additionally, the gene Gj is referred as achild for Gi. Hence, a start node is a gene that has no parentsin the gene framework, while a final node is a gene that doesnot have any children. Additionally, a middle node illustratesa gene that is both influenced by and influences anothergene, i.e. can be either argument in different conditionalrelationships of the gene network (it has both parents andchildren).

Furthermore, if the assumption is made that a pathway isa directional connection of graph edges beginning in a startnode and containing zero, one or more middle nodes andending in a final node, it is relatively easy to determine allpathways in the gene network. Given this result, thedecision on which pathway is more plausible is made bycomparing the pathway score element and the networkthreshold (a value set a priori for the entire network). Thepathway score is defined as the sum of all logarithms ofboth the compound conditional gene relationshipprobabilities and the simple relationship probability for thestart gene, multiplied by the pathway contributioncoefficient. This factor is equal to the ratio between thepathway length, (the number of genes found in the currentpathway), and the total number of network genes. The scoreof the kth pathway from the gene network is described by thefollowing formula:

SCORE(k) = [logSiP(Gksb eksb s, c) +

+ Ini,j=I(log(CR(G\, Gkj, e\, ekj, s, c))] x nIV, (3)where:Gst = the start gene of the kth pathway from the network;est = the molecular event corresponding to the Gst gene;Gik,G·k = the i'",J" gene respectively of the kth pathway;eik, e/" = the molecular event corresponding respectively tothe Gik, to the Gjk genes;n = the gene number of the kth pathway;V = the total number of the network genes;s = the stage of cancer for the current pathway;c = the type of cancer for the kth gene pathway.

The computed value, (SCORE(k)), is compared with thenetwork threshold, and if it is higher, the kth gene pathway isconsidered one of the most plausible in the graph.

C. DNA methylationKnown to have a crucial role in cancer development,

DNA methylation is the main feature of the colon cancermodel to date. The methylation level update inside a gene isa complex process, influenced by a combination of fourfactors: cancer stages, gene relationship type, genetic andepigenetic events and histone modifications.

2) Methylation update stepBased on the relationship type, the methylation level of a

gene G is then updated, using the following formula:M'(G) = M(G) ± [Parent(G)] / (E+S) x

x [ IParent(G) k=1 Rk x Pk( G)], (4)

Cancer stages Methylation rangesHealthy 32 - 32.50

Carcinoma in situ 40 - 40.50

Invasive carcinoma 50 - 50.50

Metastasis 97 - 97.50

1) Methylation initialization stepThe methylation level depends on cancer stages. For

example, gene TP53 (tumor protein p53) is thought to havea higher methylation level in metastasis than in an incipientstage as polyps (carcinoma in situ). Therefore, four value­ranges, presented in Table II, have been chosen to illustratethe initial methylation level in the current framework, basedon the given stages.

where:M'(G) = DNAm of the gene G in the next Time-step;M(G) = DNAm of the gene G in the current Time-step;Parent(G) = the number of the gene G parents (consideredequal to 1 if G is a start gene or it is inside a "self­relationship").E = total number of edges from graph;S = total number of "self-relationships" from graph,considered zero in non "self-relationship" update step;R, = a random value from [10-3, 10-2] range forhypermethylation (H+)and hypomethylation (H-), or [5*10-4

,

10-3] for mutations; these ranges are different forhyper/hypomethylation than for mutations, because theformer events affect the methylation level directly, while the

Cancer stages Initial histone rangesHealthy 0.10 - 0.19

Carcinoma in situ 0.20 - 0.34

Invasive carcinoma 0.35 - 0.74

Metastasis 0.75 - 0.90

3) Histone modifications and DNA methylationIt has been recently established that DNAm and HM can

influence one another [12], [13]. However, these types ofcontributions are used by the current gene framework only ifthey have a high probability of occurring P, where P > 0.60.

Similarly to DNAm, the histone acetylation ormethylation levels are dependent on the cancer stages.Initially, randomly-generated values within the rangesillustrated in Table III are assigned to histone acetylation andmethylation levels, according to the cancer stage of the entiregene network.

In addition to the data drawn from StatEpigen, themethylation level of a gene G is updated using the following

TABLE III. INITIAL HISTONE ACETYLATION AND METHYLATIONRANGES BASED ON THE CANCER STAGES

latter can induce changes in methylation level. Therefore,the R, coefficient represents the contribution of the kth

parent to the new gene methylation level, based on themolecular event history observed for the current gene.Pk(G) is:a) The probability of the kth conditional relationship that

includes the gene G, if it is a middle or an end gene. Itshows the incidence of a molecular event of the currentgene given the molecular events observed for its kth

parent inside the gene network.b) The simple relationship probability of the gene G if it is

a start gene;c) The "self-relationship" probability if the gene G is

inside a "self-updating" methylation step.The sign '+' or '-' is assigned according to the observedmolecular event present in the gene relationship: '+' is forH+, '-' for H-. For mutation, a random variable inside thenetwork decides if it is '+' or '-'.

Based on expression (4), the methylation level is updatedat the time T+1 considering its value at the time T andseveral other factors. The number of parent genes(Parent(G)) indicates how a gene is connected to the rest ofthe genes in the network. This information can be useful toidentify the highly-connected genes ("hub genes") in anetwork and to analyze their contribution to cancer, incomparison with other genes. Additionally, the ratio 1 /(E+S) shows the strength of the gene network. For example,in a small network, (fewer edges and "self-relationships" inthe graph), the methylation level will be updated to a highervalue than in a larger network. As a result, the formernetwork will register a faster progression in tumordevelopment than the latter.

INITIAL METHYLATION RANGES FOLLO'NING THE CANCERSTAGES

TABLE II.

During this step, the methylation level value for eachgene is randomly- generated within these ranges, incompliance with the cancer stage information provided bythe gene relationship. For example, having the following twogene relationships in the network, (where "H+" and "M"denote respectively hypermethylation and mutation):P(MGMT (H+) IKRAS (M), carcinoma in situ) = 0.282,P( APC (H+) I TP53 (M), invasive carcinoma) = 0.688,then the initial methylation level for the MGMT (06

_

methylguanine-DNA methyltransferase) and KRAS (v-Ki­ras2 Kirsten rat sarcoma viral oncogene homolog) geneswill be a value in the 40 - 40.5 range, since this relationshipwas observed in carcinoma in situ. Additionally, for theAPC and TP53 genes, methylation level is initially higher,between 50 and 50.5, because the latter was observed to beplausible in invasive carcinoma, (a more advanced cancerstage than the first case).

191

6 There are five known families ofhistones Hl/H5, H2A, H2B, ill and H4,grouped in two categories: the core histones (H2A, H2B, ill, H4) and thelinker histones (HI and H5).

In the colon cancer model, the DNAm updates based onHM are labelled as the HM_DM step, while the evolution ofhistones given the DNAm level is referred to as the DM_HMstep.

4) Network timeGiven its importance, the DNAm update step is taken as

the time-step for the gene framework. The gene methylationlevel, from the simple and conditional relationships, isreferred to as STATIC_UPDATE. Similarly, themethylation updates based on "self-relationships" is labeled

Cancer stages Histone weight rangesHealthy 0.0010 - «) 0.0020

Carcinoma in situ 0.0020 - «) 0.0035

Invasive carcinoma 0.0035 - «) 0.0075

Metastasis 0.0075 - «) 0.0090

AVERAGE 11ETHYLATION LEVEL OF THE ENTIRE NETWORKBASED ON CANCER STAGES

Cancer stages Average of networkmethylation level

Healthy 32 - «) 41

Carcinoma in situ 41 - «) 52

Invasive carcinoma 52 - «) 102

Metastasis >= 102

TABLE V.

DYN_UPDATE. Chronologically, a network methylationcycle starts with one HM_DM step, followed by aSTATIC_UPDATE phase which is combined with anumber of DYN_UPDATE iterations, (--- 103 or 104

) . SinceDNAm has been reported to be a more stable action innature than HM [12], different dynamics are used to handleHM events in the current model. Hence, the methylationcycle ends with 'p' DM_HM iterations (p=3). The genenetwork is allowed to evolve over time and the networkmethylation level is checked after a number of methylationcycles (--- 105 or 106

) .

5) Network evaluation methodsThe gene network is rated as being in one of the possible

cancer stages in two ways:a) based on the average network methylation level

(network score);b) based on the percentage ofgenes highly methylated

due to the influence of the current genetic and epigeneticevents.

Firstly, these methods are used to test the networkmethylation level against predefined methylation valuesbased on cancer stages (standard approach (SA)).The relationship between the network score and the cancerstages is given in Table V. At the start, these values are usedto assess the initial cancer stage of the network. After aspecific predetermined number of methylation cycles, (e.g.--- 105 or 106

) , the network score is compared again withthese values. For example, a network score of 48 shows thatthe network is in the carcinoma in situ stage. If its score is54 after the given number of methylation cycles, then thegene network is considered to be in invasive carcinoma,meaning the tumor has progressed in this time period.

This method is limited by its sensitivity to outlier genes;a gene with a high methylation level can significantlyincrease the network score. The gene percentage methodhas been derived, based on this observation. Here, thedecision on network cancer stage is made by analyzing thepercentage of genes that have their methylation level withina specific range, as presented in Table VI. For example, if2% of genes from a network have their methylation level inthe 40-50 range, the network is still considered to be in a"healthy" stage, even if this range is specific to carcinomain situ in general. Similarly, if more than 90% of genes have

HISTONE 'HEIGHT RANGES FOR EACH CANCER STAGETABLE IV.

The DNAm contribution to histone evolution inside agene G is described by the following expression:

H'E(G) = IHE(G) ± [WE(G) + Mean(G) x R]I, (6)where:E = acetylation or methylation for the current histone;HE'(G) = the histone acetylation or methylation level(according to the event E) in the next Time-step;HE(G) = the histone acetylation or methylation level (basedon the event E) in the current Time-step;R = random number (E [10-5, 10-4] ) ;

Mean(G) = the average of DNAm level for all networkgenes found in the same cancer stage with gene G;WE(G) = the weight ratio for the event E based on thecancer stage of the gene G: (it is randomly-generated withinthe ranges given in Table IV).The sign '+' or '-' is decided by a random network value.

Notation: Further, we will use the notation «) L for anupper range limit L that is not reachable. For example, inTable IV the histone weight corresponding to the "healthy"stage is always lower than 0.0020.

formula (adapted from that given by [13]) that links themodifications of the core histones" H3 and H4 to DNAm:

D'(G) = ID(G) ± Mod x RI, (5)where:D'(G) = DNAm of the gene G in the next Time-step;D(G) = DNAm of the gene G in the current Time-step;Mod = average of Acetylation and Methylation in H3 andH4;R = random number (E [10-5, 10-4] ) .

The sign '+' or '-' is decided by a random network value.

192

their methylation level in the 50- 97 range, the respectivegene network is in the metastasis stage already, and not ininvasive carcinoma.

TABLE VI. THE LINKAGE BET'NEEN THE NETWORK GENE PERCENTAGEAND 11ETHYLATION RANGES IN DIFFERENT CANCER STAGES

Cancer stages Methylation Genelevel percentage

32-40 > 970/0Healthy 40 - 50 0.1 - «) 30/0

32-40 0.1 - «) 30/0

Carcinoma in 40 - 50 3 - «) 900/0situ

50 -97 0.1 - «) 30/0

40 - 50 90 - 1000/0

Invasive 50 - 97 3 - «) 900/0carcinoma

> 97 0.1 - «) 350/0

50 -97 90 -1000/0Metastasis > 97 35 - 1000/0

In the second approach, the current network is comparedwith other gene networks obtained from both the same ordifferent databases [10], [14] (training-testing approach(TTA)). TTA contains two distinct working phases, thetraining and the testing steps, where two networks groups areused: the training and testing set, respectively. In contrast tothe previous approach (SA), the values of the network scoreand gene percentages based on cancer stages in TTA are notknown apriori, but are obtained from the training networkset and are applied to evaluate the networks from the testingset, (the current network is included in this category).

The aim of the testing phase is to detect the ranges for thenetwork score and gene percentage, for each stage of cancer,similarly to selecting values from Table V and Table VI, butadjusted, according to the information provided by thenetworks from the training set. For example, if a network N1is considered in invasive carcinoma with a network score of51.7 and a network N2 is deemed at the same cancer stage,with a network score of 102.6, this will extend the range forthe network score corresponding to invasive carcinoma stagefrom 52-102 (as presented in Table V), to 51.7-102.6.However, a training network can influence these twoattributes only if the difference between new and existingvalues is lower than a predefined threshold,ACCEPTED_TRAIN. If this condition is not fulfilled, thenetwork is viewed as an outlier and it is not allowed tomodify the network score and gene percentage from thetraining set.

The network score and gene percentage can be changedalso during the testing phase. After the test network islabeled with a cancer stage, it is added to the training set andit can adjust these factors of interest if it is not an outlier (asdefined above). In conclusion, the new network cancontribute to the decisions about further networks.

193

6) Cancer progressionInitially, the network is in the "healthy" stage and

corresponds to relationships within the lowest cancer stage.If several gene relationships, containing the same genes, butin different cancer stages, are found in StatEpigen, the onesfrom the most advanced cancer stage are kept in a repository,for further refinement.

After a specified number of methylation cycles, thenetwork is rated, using the methods described in the previoussubsection. At this step, the gene network can be updated if ithas progressed to a more advanced cancer stage. Some of theactual gene relationships may be overwritten by updatedinformation for similar relationships from the repository.Additionally, new gene relationships corresponding to thecurrent cancer stage can be included. The gene framework isallowed lead-in time by repeating the entire process 'n' times(n = 102 or n = 103

) , in order to register its cancer progressionpotential.

The tumor progression is illustrated in Fig. 2. Thenetwork score was computed for two different genenetworks that were built data from StatEpigen, in thefollowing way:

a) the first network (NJ contains more generelationships found in carcinoma in situ than in invasivecarcinoma stage;

b) the majority ofgene relationships from the secondnetwork (NJJ have been observed in invasive carcinoma;

c) regarding the number ofgene relationships, "self­relationships" and gene number, N2 is much larger than N](the size ofN, is several times greater than the size ofNJ.

The execution time for this simulation takes 104

methylation cycles, each of them having 103

DYN UPDATE iterations.

Figure 2. Tumor progression based on the network score for two differentgene networks (N! and N2)

By analyzing the chart from Fig. 2, both gene networksare considered in carcinoma in situ stage at the end of thissimulation according to the first network evaluation method(Table V). A remarkable observation is that the tumor

progression was more aggressive in N1 than in N2, meaningthat the methylation level update rate was lower in N2 than inN1. This behavior is a consequence of the fact that the rate ofupdating the gene methylation level depends also on thenetwork size, (as described in the subsection 2 on"Methylation update step"). Therefore, it is lower in a large(N2) than in a small network (Ni).

III. CONCLUSIONS AND FUTURE WORK

In this paper, a novel model for colorectal cancerdynamics was presented. This has a multi-layered structureconsidering three important elements known to be involvedin cancer development: molecular signals (genetic andepigenetic), gene interactions and the transitions betweencancer stages. The main contribution of the model is theintroduction of an integrative framework to analyze differentevents and their interdependencies, focusing on their impacton DNA methylation levels. A limitation of the frameworkto date is the fact that the gene weight in the network iscalculated only based on its relationships with other genes, asdescribed in the subsection IL2 on "Methylation updatestep". This weight does not yet consider the importance of aspecific gene in tumor development. For example, it hasbeen reported that mutations of the gene KRAS [9] have ahigh impact on this phenomenon, information that has notbeen included in the current framework. Therefore, a furtherdevelopment step consists of integrating gene impact in themodel. Additionally, data from databases other thanStatEpigen, (such as [14]), will be used for validation.

According to [15], multiple successive "hits" are requiredto transform a normal cell into an abnormal one: fewermutations are thus needed to produce malignancies if somechanges are already inherited. The impact of heredity incancer can be measured through the familial relative riskratio (RR) coefficient. Breast cancer has been reported asone of the most hereditable cancer types with a value of 2.02for RR, followed by lung cancer (RR = --- 2.00) and prostatecancer (RR = 1.89) [1]. Clearly, the heredity factor will betaken into account in any extension of the model describedhere.

Furthermore, ageing, considered a major risk factor incancer development due to its influence on genetic andepigenetic events [16], must be incorporated in any realisticgene framework. In model terms, this is linked tomethylation level updates, given progressive accumulation ofDNA methylation and histone modifications over time [17].

In addition, the significant impact of viral and bacterialinfections on human tumors has been highlighted in manyrecent studies, [18] - [19]. Viruses, such as human papilloma(HPV), hepatitis B, and Epstein-Barr have been associatedwith cancer initiation, [19], due to their capacity to damagenormal cell control in the infected organisms by inducingunscheduled cell growth and by avoiding apoptosis. In [20],the authors have shown that Helicobacter pylori plays a keyrole in gastric cancer development owing to aberrant increaseof DNA methylation level in gastric cells. Data on bacterialand viral infection, therefore, must also be integrated, in amore comprehensive model, together with known propertiesof cell types, such as stem cells. For these, epigenetic and

194

genetic changes are known to be conserved during celldivision and transmitted from generation to generation [20].Discussion of the impacts of environment and lifestyle oncancer development, have also featured in the literature [21],[22] and can to some extent be quantified for inclusion in theframework presented here.

Finally, in considering genes, commonly mutated and/orfrequently hypermethylated in cancer, (such as TP53 [23],MutL homolog 1, colon cancer, nonpolyposis type 2 (MLH1),KRAS, APC), and, respectively, (such as MGMT, Rasassociation domain family 1 isoformA (RASSF1A)) [9], thereis potential to extend our initial colon cancer model to othercancer types.

ACKNOWLEDGMENT

This project acknowledges financial support fromCIESCI ERA-Net Complexity Project, (EC/IRCSET).Useful discussions with colleagues in the Centre forScientific Computing & Complex Systems Modelling (SCI­SYM) and access to StatEpigen are also gratefullyacknowledged.

REFERENCES

[1] A. Risch and C. Plass, "Lung cancer epigenetics and genetics,"International Journal ofCancer, vol. 123, no. 1, pp. 1-7,2008.

[2] K. Hemminki, H. Zhang, and K. Czene, "Familial and attributablerisks in cutaneous melanoma: effects of proband and age, " Journal ofInvestigative Dermatology, vol. 120, no. 2, pp. 217 - 223,2003.

[3] P. Jelinic and P. Shaw, "Loss of imprinting and cancer," The JournalofPathology, vol. 211, no. 3, pp. 261 - 268,2007.

[4] C.D. Allis, T. Jenuwein, D. Reinberg, and M.L. Caparros,Epigenetics.: Cold Spring Harbor Laboratory Press, 2007.

[5] 1. G. Herman and S. B. Baylin, "Gene silencing in cancer inassociation with promoter hypermethylation," New England JournalofMedicine, vol. 349, no. 21, pp. 2042-2054, 2003.

[6] C. Bock and T. Lengauer, "Computational epigenetics,"Bioinformatics, vol. 24, no. 1, pp. 1-10, 2008.

[7] S. 1. Lim, T. W. Tan, and 1.C. Tong, "Computational Epigenetics: thenew scientific paradigm," Bioinformation, vol. 4, no. 7, pp. 331 - 337,2010

[8] H.T. Bjornsson, D. M. Fallin, and A.P. Feinbe, "An integratedepigenetic and genetic approach to common human disease,"TRENDS in Genetics, vol. 20, no. 8, pp. 350-358, 2004.

[9] P.A. Jones and S.B. Baylin, "The fundamental role of epigeneticevents in cancer," Nature Reviews Genetics, vol. 3, no. 6, pp. 415­428,2002.

[10] Statepigen, Database of colon cancer epigenetic statistics. [Online].Available: http://statepigen.sci-sym.dcu.ie/[Accessed: 27 April2012].

[11] D. Heckerman, "A tutorial on learning with Bayesian networks," NatoAsi Series D Behavioural And Social Sciences, vol. 89, pp. 301-354,1998.

[12] H. Cedar and Y. Bergman, "Linking DNA methylation and histonemodification: patterns and paradigms," Nature Reviews Genetics, vol.10, no. 5, pp. 295 - 304, 2009.

[13] K. Raghavan and H. 1. Ruskin., "Computational EpigeneticMicromodel-Framework for Parallel Implementation and InformationFlow," in The Eighth International Conference on Complex Systems,vol. 8, Boston, 2011, pp. 340 - 353.

[14] PubMeth: Reviewed methylation database in cancer. [Online].Available: http://www.pubmeth.org/ [Accessed: 12 April 2012].

[15] A. Knudson, "Two genetic hits (more or less) to cancer," NatureReviews Cancer, vol. 1, no. 2, pp. 157-162, 2001.

[16] M.F. Fraga and M. Esteller, "Epigenetics and aging: the targets andthe marks," Trends in Genetics, vol. 23, no. 8, pp. 413-418, 2007.

[17] M.F. Fraga, R. Agrelo, and M. Esteller, "Cross-Talk between Agingand Cancer," Annals ofthe New YorkAcademy ofSciences, vol. 1100,no. 1,pp.60-74, 2007.

[18] V. Samaras, P.I. Rafailidis, E.G. Mourtzoukou, G. Peppas, and M.E.Falagas, "Chronic bacterial and parasitic infections and cancer: areview," The Journal ofInfection in Developing Countries, vol. 4, no.05,pp.267-281,2010.

[19] C. Carrillo-Infante, G. Abbadessa, L. Bagella, and A. Giordano,"Viral infections as a cause of cancer (Review)," InternationalJournal ofOncology, vol. 30, no. 6, pp. 1521-1528,2007.

195

[20] D. Perrin, H.1. Ruskin, and T. Niwa, "Cell type-dependent, infection­induced, aberrant DNA methylation in gastric cancer," Journal ofTheoretical Biology, vol. 264, no. 2, pp. 570-577, 2010.

[21] Latest reports on cancer statistics: Cancer Research UK. [Online].Available: http://info.cancerresearchuk. org/cancerstats/reports/[Accessed: 14 April 2012].

[22] Lifestyle and cancer risk in the UK population - statistics: CancerResearch UK. [Online]. Available:http://info .cancerresearchuk. org!cancerstats/causes/lifestyle/[Accessed: 14 April 2012].

[23] R. 1. C. Steele, A. M. Thompson, P. A. Hall, and D. P. Lane, "Thep53 tumor suppressor gene," British Journal ofSurgery, vol. 85, no.11,pp. 1460-1467, 1998.