12
Research Article Low-Rank and Sparse Matrix Decomposition for Genetic Interaction Data Yishu Wang, 1 Dejie Yang, 2 and Minghua Deng 1,3,4 1 Center for Quantitative Biology, Peking University, Beijing 100871, China 2 Institute of Computing Technology, Chinese Academy of Science, Beijing 100190, China 3 School of Mathematical Sciences, Peking University, Beijing 100871, China 4 Center for Statistical Sciences, Peking University, Beijing 100871, China Correspondence should be addressed to Minghua Deng; [email protected] Received 8 January 2015; Accepted 13 March 2015 Academic Editor: Junwen Wang Copyright © 2015 Yishu Wang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background. Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. One approach to analyze EMAP data is to identify gene modules with densely interacting genes. In addition, genetic interaction score ( score) reflects the degree of synergizing or mitigating effect of two mutants, which is also informative. Statistical approaches that exploit both modularity and the pairwise interactions may provide more insight into the underlying biology. However, the high missing rate in EMAP data hinders the development of such approaches. To address the above problem, we adopted the matrix decomposition methodology “low-rank and sparse decomposition” (LRSDec) to decompose EMAP data matrix into low-rank part and sparse part. Results. LRSDec has been demonstrated as an effective technique for analyzing EMAP data. We applied a synthetic dataset and an EMAP dataset studying RNA-related processes in Saccharomyces cerevisiae. Global views of the genetic cross talk between different RNA-related protein complexes and processes have been structured, and novel functions of genes have been predicted. 1. Introduction Genetic interactions, which represent the degree to which the presence of one mutation modulates the phenotype of a second mutation, could be measured systematically and quantitatively in recent years [1, 2]. Genetic interactions can reveal functional relationships between genes and path- ways. Furthermore, genetic networks measured via high- throughput technologies could reveal the schematic wiring of biological processes and predict novel functions of genes [3]. Recently, several high-throughput technologies have been developed to identify genetic interactions at genome scale, including Synthetic Genetic Array (SGA) [4], Diploid-Based Synthetic Lethality Analysis on Microarrays (dSLAM) [5], and epistatic miniarray profile (EMAP) [6]. In particular, EMAP systematically construct double deletion strains by crossing query strains with a library of test strains and identify genetic interactions by measuring a growth pheno- type. An score was calculated based on statistical methods for each pair of genes, while negative scores represent synthetic sick/lethal and positive scores indicate alleviating interactions [6]. Consequently, for each pair of genes, there are two different measures of relationship in EMAP platform. First, the genetic interaction score ( score) represents the degree of synergizing or mitigating effects of the two mutations in combination. Second, the similarity (typically measured as a correlation) of their genetic interaction profiles represents the congruency of the phenotypes of the two mutations across a wide variety of genetic backgrounds. So there are two important aspects in exploiting EMAP data. On the one hand, cellular functions and processes are carried out in series of interacting events, so genes participating in the same biological process tend to interact with each other. erefore, algorithms that detect gene modules composed of densely interacting genes are of great interest. Within these modules, genes tend to have similar genetic interaction profiles; thus the submatrix for these genes tends to have a low-rank Hindawi Publishing Corporation BioMed Research International Volume 2015, Article ID 573956, 11 pages http://dx.doi.org/10.1155/2015/573956

Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

Research ArticleLow-Rank and Sparse Matrix Decomposition forGenetic Interaction Data

Yishu Wang1 Dejie Yang2 and Minghua Deng134

1Center for Quantitative Biology Peking University Beijing 100871 China2Institute of Computing Technology Chinese Academy of Science Beijing 100190 China3School of Mathematical Sciences Peking University Beijing 100871 China4Center for Statistical Sciences Peking University Beijing 100871 China

Correspondence should be addressed to Minghua Deng dengmhmathpkueducn

Received 8 January 2015 Accepted 13 March 2015

Academic Editor Junwen Wang

Copyright copy 2015 Yishu Wang et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Background Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networksand generated large amounts of data in model organisms One approach to analyze EMAP data is to identify gene moduleswith densely interacting genes In addition genetic interaction score (119878 score) reflects the degree of synergizing or mitigatingeffect of two mutants which is also informative Statistical approaches that exploit both modularity and the pairwise interactionsmay provide more insight into the underlying biology However the high missing rate in EMAP data hinders the developmentof such approaches To address the above problem we adopted the matrix decomposition methodology ldquolow-rank and sparsedecompositionrdquo (LRSDec) to decompose EMAP data matrix into low-rank part and sparse part Results LRSDec has beendemonstrated as an effective technique for analyzing EMAP data We applied a synthetic dataset and an EMAP dataset studyingRNA-related processes in Saccharomyces cerevisiae Global views of the genetic cross talk between different RNA-related proteincomplexes and processes have been structured and novel functions of genes have been predicted

1 Introduction

Genetic interactions which represent the degree to whichthe presence of one mutation modulates the phenotype ofa second mutation could be measured systematically andquantitatively in recent years [1 2] Genetic interactionscan reveal functional relationships between genes and path-ways Furthermore genetic networks measured via high-throughput technologies could reveal the schematic wiring ofbiological processes and predict novel functions of genes [3]Recently several high-throughput technologies have beendeveloped to identify genetic interactions at genome scaleincluding Synthetic Genetic Array (SGA) [4] Diploid-BasedSynthetic Lethality Analysis on Microarrays (dSLAM) [5]and epistatic miniarray profile (EMAP) [6] In particularEMAP systematically construct double deletion strains bycrossing query strains with a library of test strains andidentify genetic interactions by measuring a growth pheno-type An 119878 score was calculated based on statistical methods

for each pair of genes while negative 119878 scores representsynthetic sicklethal and positive 119878 scores indicate alleviatinginteractions [6]

Consequently for each pair of genes there are twodifferent measures of relationship in EMAP platform Firstthe genetic interaction score (119878 score) represents the degreeof synergizing or mitigating effects of the two mutations incombination Second the similarity (typically measured asa correlation) of their genetic interaction profiles representsthe congruency of the phenotypes of the two mutationsacross a wide variety of genetic backgrounds So there aretwo important aspects in exploiting EMAP data On theone hand cellular functions and processes are carried out inseries of interacting events so genes participating in the samebiological process tend to interact with each otherThereforealgorithms that detect gene modules composed of denselyinteracting genes are of great interest Within these modulesgenes tend to have similar genetic interaction profiles thusthe submatrix for these genes tends to have a low-rank

Hindawi Publishing CorporationBioMed Research InternationalVolume 2015 Article ID 573956 11 pageshttpdxdoiorg1011552015573956

2 BioMed Research International

structureOn the other hand the cross talks betweenmodulesare usually indicated by gene pairs with high 119878 scores (sothat the genetic interaction is significant) Removing themresults in better low-rank structure Evocatively these genepairs are likely shadows over the low-rankmatrix and connectdifferent rank areas These cross talks reveal the relation-ships of different biological process or protein complexesMeanwhile gene pairs exhibiting high absolute value of 119878

scores may encode proteins that are physically associatedor be enriched in protein-protein interactions [7ndash9] Sothe investigation of 119878 score is equally important Howeverthe current methodologies in genetic interaction networksanalysis did not efficiently address these two important issuessimultaneously

In order to identify modules and between-module crosstalks in genetic interaction networks we employ the ldquolow-rank and sparse decompositionrdquo (LRSDec) to decomposeEMAP data matrix into a low-rank part and a sparse partWe propose that the low-rank structure accounts for genemodules in which genes have high correlations and thesparsity matrix captures the significant 119878 scores In particularentries in sparse matrix found by LRSDec correspond totwo sources of biologically meaningful interactions within-module interactions and between-module links In this paperwe focus our discussion of the sparse matrix on the resultsof between-module links while the results of within-moduleinteractions can be found in the Supplementary Mate-rial available online at httpdxdoiorg1011552015573956(Supplementary Data 1)

Low-rank and sparse of matrix structures have beenprofoundly studied in matrix completion and compressedsensing [10 11] The robust principal component analysis(RPCA) [12] proved that the low-rank and the sparse com-ponents of a matrix can be exactly recovered if it has a uniqueand precise ldquolow-rank + sparserdquo decomposition RPCA offersa blind separation of low-rank data and sparse noises whichassumedX = L+S (S is the sparse noise) and exactly decom-posesX into L and Swithout predefined rank(L) and card(S)Another successful matrix decomposition method GoDecstudied the approximated ldquolow-rank+ sparserdquo decompositionof amatrixX by estimating the low-rank part L and the sparsepart S from X allowing noise that is X = L + S + e andconstrained the rank range of L and the cardinality range of119878 [13] It has been stated that GoDec has outperformed otheralgorithms before

In this paper we modified the GoDec matrix decompo-sition method and developed ldquolow-rank and sparse decom-positionrdquo (LRSDec) to estimate the low-rank part L and thesparse part S of X LRSDec minimizes the nuclear normof L and predefines the cardinality range of S while con-sidering the additive noise e Different from GoDec whichdirectly constrains the rank range of L LRSDec minimizesits responding convex polytopes that is the nuclear norm ofL It has been proven that the nuclear norm outperforms therank-restricted estimator [14] Furthermore if in presence ofmissing data LRSDec could impute the missing entries whiledecomposing with no need for data pretreatment whileGoDec could not accomplish decomposition and imputationsimultaneously then we stated the convergence properties of

our algorithm and proved that given the two regularizationparameters the objective value of LRSDec monotonicallydecreases By applying both methods to a synthetic datasetwe demonstrated the superiority of LRSDec over GoDecFinally we analyzed a genetic interaction dataset (EMAP)using our algorithm and identified many biologically mean-ingful modules and cross talks between them

2 Model

LetX be an 119898 times 119899 matrix that represents a genetic interactiondataset where 119898 is the number of query genes and 119899 is thenumber of library genes We propose to decompose X as

X = L + S + e (1)

where L isin R119898times119899 denotes the low-rank part and S isin

R119898times119899 denotes the sparse part and e is the noise Here weintroduce L isin R119898times119899 to account for modules in whichgenes are highly correlated These modules correspond toprotein complexes pathways and biological pathways inwhich genes tend to share similar genetic interaction profiles[15] S isin R119898times119899 is introduced to account for significant119878 scores which are either gene pairs in the same modulethat have genetic interactions or cross talks among differentfunctional modules

Based on the assumptions above we propose to solve thefollowing optimization problem

minimize rank (L) minimize card (S)

subject to sum

(119894119895)

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(2)

where 120575 ge 0 is a regularization parameter that controls theerror tolerance and card(S) denote the number of nonzeroentries in matrix S

To make the minimization problem tractable we relaxthe rank operator on L with the nuclear norm which hasbeen proven to be an effective convex surrogate of the rankoperator [14]

minimize Llowast minimize card (S)

subject to sum

(119894119895)

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(3)

where Llowastis the nuclear norm of L (L

lowast= sum119903

119894=1120590119894 where

1205901 120590

119903are the singular values of L and 119903 is the rank of L)

However missing data is commonly encountered inEMAP data confounding techniques such as cluster analysisand matrix factorization Here we extend our basic model(3) to handle EMAP data with missing values by imputingmissing entries in thematrix simultaneouslywhen estimatinglow-rank matrix L and sparse matrix S Suppose that we onlyobserve a subset of X indexed by Ω and the missing entriesare indexed by Ω

perp In order to find a low-rank matrix L and

BioMed Research International 3

a sparse matrix S based on the observed data we propose tosolve the following optimization problem

minimize Llowast

minimize card (S)

subject to sum

(119894119895)isinΩ

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(4)

3 Algorithm

Similar to GoDec the optimization problem of (3) can besolved by alternatively optimizing the following two subprob-lems until convergence

L119905

= arg minLlowast

1003817100381710038171003817X minus L minus S119905minus1

1003817100381710038171003817

2

119865 (5a)

S119905

= arg mincard(S)le119896

1003817100381710038171003817X minus L119905minus S1003817100381710038171003817

2

119865 (5b)

In each iteration we optimize the objective function byalternatively updating L and S Firstly the subproblem (5a)can be solved by [14] For fixed S the solution of (5a) is

L119905

= T120582

(X minus S119905minus1

) (6)

Here 120582 ge 0 is a regularization parameter controlling thenuclear norm of estimated value L

119905 where

T120582

(W) = UD120582V1015840

with D120582

= diag [(1198891

minus 120582)+

(119889119903

minus 120582)+]

(7)

UDV1015840 is the Singular Value Decomposition (SVD) of W andhere 119905

+= max(119905 0) The notation T

120582(W) refers to soft-

thresholding [14]Next the subproblem (5b) in (3) could be updated via

entry-wise hard thresholding of X minus L119905for fixed L

119905 Before

giving the solution we define an orthogonal projectionoperator P Suppose there is a subset of dataset W indexedby Ω then the matrix W can be projected onto the linearspace of matrices supported by Ω

119875Ω

(W)(119894119895)

=

119882119894119895

if (119894 119895) isin Ω

0 if (119894 119895) notin Ω

(8)

AndPΩperp is its complementary projection that isP

Ω(W) +

PΩperp(W) = WThen the solution of (5b) could be given as follows

S = PΘ

(X minus L119905) (9)

where P is the orthogonal projection operator as definedabove Θ is the nonzero subset of the first 119896 largest entriesof |(X minus L

119905)| Then the matrix (X minus L

119905) can be projected onto

the linear space of matrices supported by Θ

(X minus L119905)(119894119895)

=

(X minus L119905)119894119895

if (119894 119895) isin Θ

0 if (119894 119895) notin Θ

(10)

So far we have developed the algorithm for solvingproblem (3) As for problem (4) due to the existence ofmissing values we took the optimization on the observeddata Ω We updated L

119905and S

119905of the following optimization

subproblems respectively

L119905

= arg minLlowast

1003817100381710038171003817PΩ (X minus L minus S119905minus1

)1003817100381710038171003817

2

119865 (11a)

S119905

= arg mincard(S)le119896

1003817100381710038171003817PΩ (X minus L119905minus S)

1003817100381710038171003817

2

119865 (11b)

The term PΩ

(X minus L minus S)2

119865is the sum of squared errors on

the observed entries indexed by ΩThe subproblem (11a) can be solved by updating Lwith an

arbitrary initialization using [14]

L119905

larr997888 T120582

(PΩ

(X minus S119905minus1

) + PΩperp (L)) (12)

The solution of subproblem (11b) is

S119905

= PΘ

(PΩ

(X minus L119905)) (13)

where Θ is the nonzero subset of the first 119896 largest entries of|PΩ

(X minus L119905)|

Now we have the following algorithm

Algorithm 1 (LRSDec) (i) Input X isin R119898times119899 Initialize S larr 0(ii) Iterate until convergence(a) L-step iteratively update L using (12)(b) S-step Solve S using (13)

(iii) Output L S

The convergence analysis of our algorithms is provided inthe Supplementary Material

4 Parameter Tuning

We have two parameters that need to be tuned in our models120582 and 119896 Here we propose a 10-fold cross validation strategyto select them The idea is as follows let Ω be the index ofobserved entries ofX We randomly partition Ω into 10 equalsize subsets and choose training entriesΩ

1and testing entries

Ω2 Ω1

cup Ω2

= Ω and Ω1

cap Ω2

= Oslash |Ω1| = 09 lowast |Ω| |Ω

2| =

01lowast |Ω| Wemay solve problem (15) on a grid of (120582 119896) valueson the training data

L119905

= arg minLlowast

10038171003817100381710038171003817PΩ1

(X minus L minus S119905minus1

)10038171003817100381710038171003817

2

119865

(14a)

S119905

= arg mincard(S)le119896

10038171003817100381710038171003817PΩ1

(X minus L119905minus S)

10038171003817100381710038171003817

2

119865

(14b)

Then we evaluate the prediction error (15) on the testingdata

Err (120582 119896) =1

2

10038171003817100381710038171003817PΩ2

(X minus L(120582 119896) minus S(120582 119896))10038171003817100381710038171003817

2

119865

(15)

The cross validation process is repeated for 10 timesThenwe can find the optimal parameter (120582

lowast

119896lowast

) which minimizesthe mean of the prediction error

4 BioMed Research International

0 50 100 150 200 250 300 3500

200400

6000

10

20

Rank of matrix L

Card of matrix S

Rela

tive e

rror

of

The minimum ofrelative error

25

250

deco

mpo

sitio

n

Figure 1 Results of parameter tuning on synthetic data 119909-axisdifferent cardinality corresponding to parameter 119896 119910-axis differentrank corresponding to parameter 120582 119911-axis the relative error X minus

X2

119865X

2119865

5 Results

51 Synthetic Data We simulated a synthetic data and thenapplied LRSDec algorithm andGoDec algorithm to it Specif-ically low-rank part sparse part and noises are generated asfollows

(i) Low-rank part the covariance matrix Σ is generatedbyHH119879 whereH isin R119898times119870 andH

119894119895sim N(0 1) Here119870

is the number of hiddenmodulesThe random entriesL119895are drawn fromN(0 120591Σ) Let L = [L

1 L

119899]

(ii) Sparse part the non-zero entries in sparse matrixare generated from the tail of Gaussian distributionN(1 2) whose upper quantile is 120572 = 001 Werandomly selected 70 of them to assign the oppositesignThis is consistent with EMAP datasets in whichnegative genetic interactions are much more preva-lent than the positive ones

(iii) e = 10minus2

lowastF wherein F is a standard Gaussianmatrix

A low-rank matrix 119871 with rank 25 and sparse matrix withcardinality 250 were generated respectively Now we have

X = L + S + e (16)

The first step is parameter training and the result isshowed in Figure 1 Minimal prediction error was achievedwhen 120582 = 250 and 119896 = 25 which coincides with the rankand cardinality of the synthetic data This demonstrated theeffectiveness of cross validation procedure

Next we compared the performance of LRSDec algo-rithm and GoDec algorithm by comparing their predictionerror The relative error is defined as

10038171003817100381710038171003817W minus W10038171003817100381710038171003817

2

119865

W2

119865

(17)

where W is the original matrix and W is an estimateap-proximation As both algorithms are influenced seriouslyby the parameters we compared the relative error of thetwo algorithms by given different parameters To make thecomparison simple we only changed one parameter with

another parameter fixed (Figure 2) One can see that bothalgorithms reach the smallest relative if adopting the correcttwo parameters and LRSDec outperforms GoDec In theSupplementaryMaterial we also compare the performance ofboth algorithms under different noise setting and the trendsare the same

52 Application to EMAP Data from Yeast We also appliedour method to EMAP data interrogating RNA processing inS cerevisiae which consists of 552 genes involved in RNA-related processes [9]This geneticmap contains about 152000pairwise genetic interaction measurements with about 29missing entries in data matrixWe applied ourmethod to thisEMAP data denoted asX to obtain twomatrices a low-rankmatrix L and a sparse matrix S X = L+S is the new completedata matrix with imputed missing entries To exploit thequantitative information from EMAP data we first subjectedthe entire low-rank matrix L to hierarchical clustering anapproach that groups genes with similar patterns of geneticinteractions It should be noted that using low-rank matrix Lin cluster improved the performance of hierarchical cluster-ing in detecting genetic interaction modules [16] Accordingto the clustering result we reordered rows and columnsof matrix X so that the protein complexes and biologicalprocesses showed in [9] could be found (Figure S1)

To help identify more modules of cellular functions andprocesses and reveal the relationships between them we fur-ther analyzed the matrices L and S Figure 3 is a flowchart ofour strategy 1 to detectmodules and cross talks between themthrough low-rank matrix L In this paper we define moduleas a cluster from hierarchical clustering (HC) that passesthrough GO-enriched filtering (Supplementary Section 3)The details are as follows Firstly we clustered the row genesof matrix L using hierarchical clustering (HC) Here weadopted the average-linkage hierarchical clustering algorithmin which the distance of gene 119860 and gene 119861 is defined as1 minus |cor(119860 119861)| where |cor(119860 119861)| is the absolute value of thecorrelation of genetic interaction profile of gene 119860 and gene119861 A cutoff needs to be applied for HC to cut the hierarchicalclustering tree We used the Jaccard index (SupplementarySection 4) to determine how well the predicted gene setscorrespond to benchmark (theoretical) gene sets [17] HereGO functional gene sets are used as benchmark (theoretical)gene setsThe cutoff atwhichHCachieved the highest Jaccardindex is used to cut the hierarchical tree We calculatedthe Jaccard index of every ldquoheightrdquo cutoff in hierarchicalclustering from 02 to 095 by 005 interval This step resultedin the best Jaccard index with height = 07 and 84 clustersNow we got the clusters of row genes in which genes act ina consistent manner across the entire column genes Thenwe filtered the clustering results by a hypergeometric testthat calculates the significance of enrichment of GO itemsand the 119901 value was set to 001 The clusters enriched in GOfunctional categories are defined as row modules Secondlyfor each row module we exploited modules of column genesbased on this row gene module in matrix X by clustering thecolumn genes of this submatrix of X Thirdly we screenedcolumn clusters whose interactions with the rowmodules are

BioMed Research International 5

0 50 100 150 200 250 300 3500

02

04

06

08

1

Number of ranks

The r

elat

ive e

rror

Relative error of matrix L

(a)

0 50 100 150 200 250 300 3500

02

04

06

08

1

12

14

Number of ranks

The r

elat

ive e

rror

Relative error of matrix S

(b)

0 100 200 300 400 500 6000

005

01

015

02

025

03

035

Number of cards

The r

elat

ive e

rror

Relative error of matrix L

GoDecLRSDec

(c)

0 100 200 300 400 500 6000

02

04

06

08

1

12

14

Number of cards

The r

elat

ive e

rror

Relative error of matrix S

GoDecLRSDec

(d)

Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card

HC row genes of L

Row modules

GO-enrichment filtering

Based oneach row module

HC column genes of L

Fisherrsquos exact testColumn modules

Connections between this row module and all column modules

Column modules that have functional

All column modules map to GO items

Connections among modules of the whole network

Repeat these three steps for

all significant

row modules

Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL

6 BioMed Research International

Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861

perp

denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861

Gene set 119861 (Gene set 119861)perp

Gene set 119860 119860119861 119860119861perp

(Gene set 119860)perp 119860perp

119861 119860perp

119861perp

Table 2 Clustering results

(a) GO slim as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24

(b) GO BP FAT as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms

significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously

The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match

the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)

Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization

Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]

In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material

Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

2 BioMed Research International

structureOn the other hand the cross talks betweenmodulesare usually indicated by gene pairs with high 119878 scores (sothat the genetic interaction is significant) Removing themresults in better low-rank structure Evocatively these genepairs are likely shadows over the low-rankmatrix and connectdifferent rank areas These cross talks reveal the relation-ships of different biological process or protein complexesMeanwhile gene pairs exhibiting high absolute value of 119878

scores may encode proteins that are physically associatedor be enriched in protein-protein interactions [7ndash9] Sothe investigation of 119878 score is equally important Howeverthe current methodologies in genetic interaction networksanalysis did not efficiently address these two important issuessimultaneously

In order to identify modules and between-module crosstalks in genetic interaction networks we employ the ldquolow-rank and sparse decompositionrdquo (LRSDec) to decomposeEMAP data matrix into a low-rank part and a sparse partWe propose that the low-rank structure accounts for genemodules in which genes have high correlations and thesparsity matrix captures the significant 119878 scores In particularentries in sparse matrix found by LRSDec correspond totwo sources of biologically meaningful interactions within-module interactions and between-module links In this paperwe focus our discussion of the sparse matrix on the resultsof between-module links while the results of within-moduleinteractions can be found in the Supplementary Mate-rial available online at httpdxdoiorg1011552015573956(Supplementary Data 1)

Low-rank and sparse of matrix structures have beenprofoundly studied in matrix completion and compressedsensing [10 11] The robust principal component analysis(RPCA) [12] proved that the low-rank and the sparse com-ponents of a matrix can be exactly recovered if it has a uniqueand precise ldquolow-rank + sparserdquo decomposition RPCA offersa blind separation of low-rank data and sparse noises whichassumedX = L+S (S is the sparse noise) and exactly decom-posesX into L and Swithout predefined rank(L) and card(S)Another successful matrix decomposition method GoDecstudied the approximated ldquolow-rank+ sparserdquo decompositionof amatrixX by estimating the low-rank part L and the sparsepart S from X allowing noise that is X = L + S + e andconstrained the rank range of L and the cardinality range of119878 [13] It has been stated that GoDec has outperformed otheralgorithms before

In this paper we modified the GoDec matrix decompo-sition method and developed ldquolow-rank and sparse decom-positionrdquo (LRSDec) to estimate the low-rank part L and thesparse part S of X LRSDec minimizes the nuclear normof L and predefines the cardinality range of S while con-sidering the additive noise e Different from GoDec whichdirectly constrains the rank range of L LRSDec minimizesits responding convex polytopes that is the nuclear norm ofL It has been proven that the nuclear norm outperforms therank-restricted estimator [14] Furthermore if in presence ofmissing data LRSDec could impute the missing entries whiledecomposing with no need for data pretreatment whileGoDec could not accomplish decomposition and imputationsimultaneously then we stated the convergence properties of

our algorithm and proved that given the two regularizationparameters the objective value of LRSDec monotonicallydecreases By applying both methods to a synthetic datasetwe demonstrated the superiority of LRSDec over GoDecFinally we analyzed a genetic interaction dataset (EMAP)using our algorithm and identified many biologically mean-ingful modules and cross talks between them

2 Model

LetX be an 119898 times 119899 matrix that represents a genetic interactiondataset where 119898 is the number of query genes and 119899 is thenumber of library genes We propose to decompose X as

X = L + S + e (1)

where L isin R119898times119899 denotes the low-rank part and S isin

R119898times119899 denotes the sparse part and e is the noise Here weintroduce L isin R119898times119899 to account for modules in whichgenes are highly correlated These modules correspond toprotein complexes pathways and biological pathways inwhich genes tend to share similar genetic interaction profiles[15] S isin R119898times119899 is introduced to account for significant119878 scores which are either gene pairs in the same modulethat have genetic interactions or cross talks among differentfunctional modules

Based on the assumptions above we propose to solve thefollowing optimization problem

minimize rank (L) minimize card (S)

subject to sum

(119894119895)

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(2)

where 120575 ge 0 is a regularization parameter that controls theerror tolerance and card(S) denote the number of nonzeroentries in matrix S

To make the minimization problem tractable we relaxthe rank operator on L with the nuclear norm which hasbeen proven to be an effective convex surrogate of the rankoperator [14]

minimize Llowast minimize card (S)

subject to sum

(119894119895)

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(3)

where Llowastis the nuclear norm of L (L

lowast= sum119903

119894=1120590119894 where

1205901 120590

119903are the singular values of L and 119903 is the rank of L)

However missing data is commonly encountered inEMAP data confounding techniques such as cluster analysisand matrix factorization Here we extend our basic model(3) to handle EMAP data with missing values by imputingmissing entries in thematrix simultaneouslywhen estimatinglow-rank matrix L and sparse matrix S Suppose that we onlyobserve a subset of X indexed by Ω and the missing entriesare indexed by Ω

perp In order to find a low-rank matrix L and

BioMed Research International 3

a sparse matrix S based on the observed data we propose tosolve the following optimization problem

minimize Llowast

minimize card (S)

subject to sum

(119894119895)isinΩ

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(4)

3 Algorithm

Similar to GoDec the optimization problem of (3) can besolved by alternatively optimizing the following two subprob-lems until convergence

L119905

= arg minLlowast

1003817100381710038171003817X minus L minus S119905minus1

1003817100381710038171003817

2

119865 (5a)

S119905

= arg mincard(S)le119896

1003817100381710038171003817X minus L119905minus S1003817100381710038171003817

2

119865 (5b)

In each iteration we optimize the objective function byalternatively updating L and S Firstly the subproblem (5a)can be solved by [14] For fixed S the solution of (5a) is

L119905

= T120582

(X minus S119905minus1

) (6)

Here 120582 ge 0 is a regularization parameter controlling thenuclear norm of estimated value L

119905 where

T120582

(W) = UD120582V1015840

with D120582

= diag [(1198891

minus 120582)+

(119889119903

minus 120582)+]

(7)

UDV1015840 is the Singular Value Decomposition (SVD) of W andhere 119905

+= max(119905 0) The notation T

120582(W) refers to soft-

thresholding [14]Next the subproblem (5b) in (3) could be updated via

entry-wise hard thresholding of X minus L119905for fixed L

119905 Before

giving the solution we define an orthogonal projectionoperator P Suppose there is a subset of dataset W indexedby Ω then the matrix W can be projected onto the linearspace of matrices supported by Ω

119875Ω

(W)(119894119895)

=

119882119894119895

if (119894 119895) isin Ω

0 if (119894 119895) notin Ω

(8)

AndPΩperp is its complementary projection that isP

Ω(W) +

PΩperp(W) = WThen the solution of (5b) could be given as follows

S = PΘ

(X minus L119905) (9)

where P is the orthogonal projection operator as definedabove Θ is the nonzero subset of the first 119896 largest entriesof |(X minus L

119905)| Then the matrix (X minus L

119905) can be projected onto

the linear space of matrices supported by Θ

(X minus L119905)(119894119895)

=

(X minus L119905)119894119895

if (119894 119895) isin Θ

0 if (119894 119895) notin Θ

(10)

So far we have developed the algorithm for solvingproblem (3) As for problem (4) due to the existence ofmissing values we took the optimization on the observeddata Ω We updated L

119905and S

119905of the following optimization

subproblems respectively

L119905

= arg minLlowast

1003817100381710038171003817PΩ (X minus L minus S119905minus1

)1003817100381710038171003817

2

119865 (11a)

S119905

= arg mincard(S)le119896

1003817100381710038171003817PΩ (X minus L119905minus S)

1003817100381710038171003817

2

119865 (11b)

The term PΩ

(X minus L minus S)2

119865is the sum of squared errors on

the observed entries indexed by ΩThe subproblem (11a) can be solved by updating Lwith an

arbitrary initialization using [14]

L119905

larr997888 T120582

(PΩ

(X minus S119905minus1

) + PΩperp (L)) (12)

The solution of subproblem (11b) is

S119905

= PΘ

(PΩ

(X minus L119905)) (13)

where Θ is the nonzero subset of the first 119896 largest entries of|PΩ

(X minus L119905)|

Now we have the following algorithm

Algorithm 1 (LRSDec) (i) Input X isin R119898times119899 Initialize S larr 0(ii) Iterate until convergence(a) L-step iteratively update L using (12)(b) S-step Solve S using (13)

(iii) Output L S

The convergence analysis of our algorithms is provided inthe Supplementary Material

4 Parameter Tuning

We have two parameters that need to be tuned in our models120582 and 119896 Here we propose a 10-fold cross validation strategyto select them The idea is as follows let Ω be the index ofobserved entries ofX We randomly partition Ω into 10 equalsize subsets and choose training entriesΩ

1and testing entries

Ω2 Ω1

cup Ω2

= Ω and Ω1

cap Ω2

= Oslash |Ω1| = 09 lowast |Ω| |Ω

2| =

01lowast |Ω| Wemay solve problem (15) on a grid of (120582 119896) valueson the training data

L119905

= arg minLlowast

10038171003817100381710038171003817PΩ1

(X minus L minus S119905minus1

)10038171003817100381710038171003817

2

119865

(14a)

S119905

= arg mincard(S)le119896

10038171003817100381710038171003817PΩ1

(X minus L119905minus S)

10038171003817100381710038171003817

2

119865

(14b)

Then we evaluate the prediction error (15) on the testingdata

Err (120582 119896) =1

2

10038171003817100381710038171003817PΩ2

(X minus L(120582 119896) minus S(120582 119896))10038171003817100381710038171003817

2

119865

(15)

The cross validation process is repeated for 10 timesThenwe can find the optimal parameter (120582

lowast

119896lowast

) which minimizesthe mean of the prediction error

4 BioMed Research International

0 50 100 150 200 250 300 3500

200400

6000

10

20

Rank of matrix L

Card of matrix S

Rela

tive e

rror

of

The minimum ofrelative error

25

250

deco

mpo

sitio

n

Figure 1 Results of parameter tuning on synthetic data 119909-axisdifferent cardinality corresponding to parameter 119896 119910-axis differentrank corresponding to parameter 120582 119911-axis the relative error X minus

X2

119865X

2119865

5 Results

51 Synthetic Data We simulated a synthetic data and thenapplied LRSDec algorithm andGoDec algorithm to it Specif-ically low-rank part sparse part and noises are generated asfollows

(i) Low-rank part the covariance matrix Σ is generatedbyHH119879 whereH isin R119898times119870 andH

119894119895sim N(0 1) Here119870

is the number of hiddenmodulesThe random entriesL119895are drawn fromN(0 120591Σ) Let L = [L

1 L

119899]

(ii) Sparse part the non-zero entries in sparse matrixare generated from the tail of Gaussian distributionN(1 2) whose upper quantile is 120572 = 001 Werandomly selected 70 of them to assign the oppositesignThis is consistent with EMAP datasets in whichnegative genetic interactions are much more preva-lent than the positive ones

(iii) e = 10minus2

lowastF wherein F is a standard Gaussianmatrix

A low-rank matrix 119871 with rank 25 and sparse matrix withcardinality 250 were generated respectively Now we have

X = L + S + e (16)

The first step is parameter training and the result isshowed in Figure 1 Minimal prediction error was achievedwhen 120582 = 250 and 119896 = 25 which coincides with the rankand cardinality of the synthetic data This demonstrated theeffectiveness of cross validation procedure

Next we compared the performance of LRSDec algo-rithm and GoDec algorithm by comparing their predictionerror The relative error is defined as

10038171003817100381710038171003817W minus W10038171003817100381710038171003817

2

119865

W2

119865

(17)

where W is the original matrix and W is an estimateap-proximation As both algorithms are influenced seriouslyby the parameters we compared the relative error of thetwo algorithms by given different parameters To make thecomparison simple we only changed one parameter with

another parameter fixed (Figure 2) One can see that bothalgorithms reach the smallest relative if adopting the correcttwo parameters and LRSDec outperforms GoDec In theSupplementaryMaterial we also compare the performance ofboth algorithms under different noise setting and the trendsare the same

52 Application to EMAP Data from Yeast We also appliedour method to EMAP data interrogating RNA processing inS cerevisiae which consists of 552 genes involved in RNA-related processes [9]This geneticmap contains about 152000pairwise genetic interaction measurements with about 29missing entries in data matrixWe applied ourmethod to thisEMAP data denoted asX to obtain twomatrices a low-rankmatrix L and a sparse matrix S X = L+S is the new completedata matrix with imputed missing entries To exploit thequantitative information from EMAP data we first subjectedthe entire low-rank matrix L to hierarchical clustering anapproach that groups genes with similar patterns of geneticinteractions It should be noted that using low-rank matrix Lin cluster improved the performance of hierarchical cluster-ing in detecting genetic interaction modules [16] Accordingto the clustering result we reordered rows and columnsof matrix X so that the protein complexes and biologicalprocesses showed in [9] could be found (Figure S1)

To help identify more modules of cellular functions andprocesses and reveal the relationships between them we fur-ther analyzed the matrices L and S Figure 3 is a flowchart ofour strategy 1 to detectmodules and cross talks between themthrough low-rank matrix L In this paper we define moduleas a cluster from hierarchical clustering (HC) that passesthrough GO-enriched filtering (Supplementary Section 3)The details are as follows Firstly we clustered the row genesof matrix L using hierarchical clustering (HC) Here weadopted the average-linkage hierarchical clustering algorithmin which the distance of gene 119860 and gene 119861 is defined as1 minus |cor(119860 119861)| where |cor(119860 119861)| is the absolute value of thecorrelation of genetic interaction profile of gene 119860 and gene119861 A cutoff needs to be applied for HC to cut the hierarchicalclustering tree We used the Jaccard index (SupplementarySection 4) to determine how well the predicted gene setscorrespond to benchmark (theoretical) gene sets [17] HereGO functional gene sets are used as benchmark (theoretical)gene setsThe cutoff atwhichHCachieved the highest Jaccardindex is used to cut the hierarchical tree We calculatedthe Jaccard index of every ldquoheightrdquo cutoff in hierarchicalclustering from 02 to 095 by 005 interval This step resultedin the best Jaccard index with height = 07 and 84 clustersNow we got the clusters of row genes in which genes act ina consistent manner across the entire column genes Thenwe filtered the clustering results by a hypergeometric testthat calculates the significance of enrichment of GO itemsand the 119901 value was set to 001 The clusters enriched in GOfunctional categories are defined as row modules Secondlyfor each row module we exploited modules of column genesbased on this row gene module in matrix X by clustering thecolumn genes of this submatrix of X Thirdly we screenedcolumn clusters whose interactions with the rowmodules are

BioMed Research International 5

0 50 100 150 200 250 300 3500

02

04

06

08

1

Number of ranks

The r

elat

ive e

rror

Relative error of matrix L

(a)

0 50 100 150 200 250 300 3500

02

04

06

08

1

12

14

Number of ranks

The r

elat

ive e

rror

Relative error of matrix S

(b)

0 100 200 300 400 500 6000

005

01

015

02

025

03

035

Number of cards

The r

elat

ive e

rror

Relative error of matrix L

GoDecLRSDec

(c)

0 100 200 300 400 500 6000

02

04

06

08

1

12

14

Number of cards

The r

elat

ive e

rror

Relative error of matrix S

GoDecLRSDec

(d)

Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card

HC row genes of L

Row modules

GO-enrichment filtering

Based oneach row module

HC column genes of L

Fisherrsquos exact testColumn modules

Connections between this row module and all column modules

Column modules that have functional

All column modules map to GO items

Connections among modules of the whole network

Repeat these three steps for

all significant

row modules

Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL

6 BioMed Research International

Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861

perp

denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861

Gene set 119861 (Gene set 119861)perp

Gene set 119860 119860119861 119860119861perp

(Gene set 119860)perp 119860perp

119861 119860perp

119861perp

Table 2 Clustering results

(a) GO slim as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24

(b) GO BP FAT as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms

significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously

The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match

the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)

Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization

Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]

In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material

Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

BioMed Research International 3

a sparse matrix S based on the observed data we propose tosolve the following optimization problem

minimize Llowast

minimize card (S)

subject to sum

(119894119895)isinΩ

(119883119894119895

minus 119871119894119895

minus 119878119894119895)2

⩽ 120575(4)

3 Algorithm

Similar to GoDec the optimization problem of (3) can besolved by alternatively optimizing the following two subprob-lems until convergence

L119905

= arg minLlowast

1003817100381710038171003817X minus L minus S119905minus1

1003817100381710038171003817

2

119865 (5a)

S119905

= arg mincard(S)le119896

1003817100381710038171003817X minus L119905minus S1003817100381710038171003817

2

119865 (5b)

In each iteration we optimize the objective function byalternatively updating L and S Firstly the subproblem (5a)can be solved by [14] For fixed S the solution of (5a) is

L119905

= T120582

(X minus S119905minus1

) (6)

Here 120582 ge 0 is a regularization parameter controlling thenuclear norm of estimated value L

119905 where

T120582

(W) = UD120582V1015840

with D120582

= diag [(1198891

minus 120582)+

(119889119903

minus 120582)+]

(7)

UDV1015840 is the Singular Value Decomposition (SVD) of W andhere 119905

+= max(119905 0) The notation T

120582(W) refers to soft-

thresholding [14]Next the subproblem (5b) in (3) could be updated via

entry-wise hard thresholding of X minus L119905for fixed L

119905 Before

giving the solution we define an orthogonal projectionoperator P Suppose there is a subset of dataset W indexedby Ω then the matrix W can be projected onto the linearspace of matrices supported by Ω

119875Ω

(W)(119894119895)

=

119882119894119895

if (119894 119895) isin Ω

0 if (119894 119895) notin Ω

(8)

AndPΩperp is its complementary projection that isP

Ω(W) +

PΩperp(W) = WThen the solution of (5b) could be given as follows

S = PΘ

(X minus L119905) (9)

where P is the orthogonal projection operator as definedabove Θ is the nonzero subset of the first 119896 largest entriesof |(X minus L

119905)| Then the matrix (X minus L

119905) can be projected onto

the linear space of matrices supported by Θ

(X minus L119905)(119894119895)

=

(X minus L119905)119894119895

if (119894 119895) isin Θ

0 if (119894 119895) notin Θ

(10)

So far we have developed the algorithm for solvingproblem (3) As for problem (4) due to the existence ofmissing values we took the optimization on the observeddata Ω We updated L

119905and S

119905of the following optimization

subproblems respectively

L119905

= arg minLlowast

1003817100381710038171003817PΩ (X minus L minus S119905minus1

)1003817100381710038171003817

2

119865 (11a)

S119905

= arg mincard(S)le119896

1003817100381710038171003817PΩ (X minus L119905minus S)

1003817100381710038171003817

2

119865 (11b)

The term PΩ

(X minus L minus S)2

119865is the sum of squared errors on

the observed entries indexed by ΩThe subproblem (11a) can be solved by updating Lwith an

arbitrary initialization using [14]

L119905

larr997888 T120582

(PΩ

(X minus S119905minus1

) + PΩperp (L)) (12)

The solution of subproblem (11b) is

S119905

= PΘ

(PΩ

(X minus L119905)) (13)

where Θ is the nonzero subset of the first 119896 largest entries of|PΩ

(X minus L119905)|

Now we have the following algorithm

Algorithm 1 (LRSDec) (i) Input X isin R119898times119899 Initialize S larr 0(ii) Iterate until convergence(a) L-step iteratively update L using (12)(b) S-step Solve S using (13)

(iii) Output L S

The convergence analysis of our algorithms is provided inthe Supplementary Material

4 Parameter Tuning

We have two parameters that need to be tuned in our models120582 and 119896 Here we propose a 10-fold cross validation strategyto select them The idea is as follows let Ω be the index ofobserved entries ofX We randomly partition Ω into 10 equalsize subsets and choose training entriesΩ

1and testing entries

Ω2 Ω1

cup Ω2

= Ω and Ω1

cap Ω2

= Oslash |Ω1| = 09 lowast |Ω| |Ω

2| =

01lowast |Ω| Wemay solve problem (15) on a grid of (120582 119896) valueson the training data

L119905

= arg minLlowast

10038171003817100381710038171003817PΩ1

(X minus L minus S119905minus1

)10038171003817100381710038171003817

2

119865

(14a)

S119905

= arg mincard(S)le119896

10038171003817100381710038171003817PΩ1

(X minus L119905minus S)

10038171003817100381710038171003817

2

119865

(14b)

Then we evaluate the prediction error (15) on the testingdata

Err (120582 119896) =1

2

10038171003817100381710038171003817PΩ2

(X minus L(120582 119896) minus S(120582 119896))10038171003817100381710038171003817

2

119865

(15)

The cross validation process is repeated for 10 timesThenwe can find the optimal parameter (120582

lowast

119896lowast

) which minimizesthe mean of the prediction error

4 BioMed Research International

0 50 100 150 200 250 300 3500

200400

6000

10

20

Rank of matrix L

Card of matrix S

Rela

tive e

rror

of

The minimum ofrelative error

25

250

deco

mpo

sitio

n

Figure 1 Results of parameter tuning on synthetic data 119909-axisdifferent cardinality corresponding to parameter 119896 119910-axis differentrank corresponding to parameter 120582 119911-axis the relative error X minus

X2

119865X

2119865

5 Results

51 Synthetic Data We simulated a synthetic data and thenapplied LRSDec algorithm andGoDec algorithm to it Specif-ically low-rank part sparse part and noises are generated asfollows

(i) Low-rank part the covariance matrix Σ is generatedbyHH119879 whereH isin R119898times119870 andH

119894119895sim N(0 1) Here119870

is the number of hiddenmodulesThe random entriesL119895are drawn fromN(0 120591Σ) Let L = [L

1 L

119899]

(ii) Sparse part the non-zero entries in sparse matrixare generated from the tail of Gaussian distributionN(1 2) whose upper quantile is 120572 = 001 Werandomly selected 70 of them to assign the oppositesignThis is consistent with EMAP datasets in whichnegative genetic interactions are much more preva-lent than the positive ones

(iii) e = 10minus2

lowastF wherein F is a standard Gaussianmatrix

A low-rank matrix 119871 with rank 25 and sparse matrix withcardinality 250 were generated respectively Now we have

X = L + S + e (16)

The first step is parameter training and the result isshowed in Figure 1 Minimal prediction error was achievedwhen 120582 = 250 and 119896 = 25 which coincides with the rankand cardinality of the synthetic data This demonstrated theeffectiveness of cross validation procedure

Next we compared the performance of LRSDec algo-rithm and GoDec algorithm by comparing their predictionerror The relative error is defined as

10038171003817100381710038171003817W minus W10038171003817100381710038171003817

2

119865

W2

119865

(17)

where W is the original matrix and W is an estimateap-proximation As both algorithms are influenced seriouslyby the parameters we compared the relative error of thetwo algorithms by given different parameters To make thecomparison simple we only changed one parameter with

another parameter fixed (Figure 2) One can see that bothalgorithms reach the smallest relative if adopting the correcttwo parameters and LRSDec outperforms GoDec In theSupplementaryMaterial we also compare the performance ofboth algorithms under different noise setting and the trendsare the same

52 Application to EMAP Data from Yeast We also appliedour method to EMAP data interrogating RNA processing inS cerevisiae which consists of 552 genes involved in RNA-related processes [9]This geneticmap contains about 152000pairwise genetic interaction measurements with about 29missing entries in data matrixWe applied ourmethod to thisEMAP data denoted asX to obtain twomatrices a low-rankmatrix L and a sparse matrix S X = L+S is the new completedata matrix with imputed missing entries To exploit thequantitative information from EMAP data we first subjectedthe entire low-rank matrix L to hierarchical clustering anapproach that groups genes with similar patterns of geneticinteractions It should be noted that using low-rank matrix Lin cluster improved the performance of hierarchical cluster-ing in detecting genetic interaction modules [16] Accordingto the clustering result we reordered rows and columnsof matrix X so that the protein complexes and biologicalprocesses showed in [9] could be found (Figure S1)

To help identify more modules of cellular functions andprocesses and reveal the relationships between them we fur-ther analyzed the matrices L and S Figure 3 is a flowchart ofour strategy 1 to detectmodules and cross talks between themthrough low-rank matrix L In this paper we define moduleas a cluster from hierarchical clustering (HC) that passesthrough GO-enriched filtering (Supplementary Section 3)The details are as follows Firstly we clustered the row genesof matrix L using hierarchical clustering (HC) Here weadopted the average-linkage hierarchical clustering algorithmin which the distance of gene 119860 and gene 119861 is defined as1 minus |cor(119860 119861)| where |cor(119860 119861)| is the absolute value of thecorrelation of genetic interaction profile of gene 119860 and gene119861 A cutoff needs to be applied for HC to cut the hierarchicalclustering tree We used the Jaccard index (SupplementarySection 4) to determine how well the predicted gene setscorrespond to benchmark (theoretical) gene sets [17] HereGO functional gene sets are used as benchmark (theoretical)gene setsThe cutoff atwhichHCachieved the highest Jaccardindex is used to cut the hierarchical tree We calculatedthe Jaccard index of every ldquoheightrdquo cutoff in hierarchicalclustering from 02 to 095 by 005 interval This step resultedin the best Jaccard index with height = 07 and 84 clustersNow we got the clusters of row genes in which genes act ina consistent manner across the entire column genes Thenwe filtered the clustering results by a hypergeometric testthat calculates the significance of enrichment of GO itemsand the 119901 value was set to 001 The clusters enriched in GOfunctional categories are defined as row modules Secondlyfor each row module we exploited modules of column genesbased on this row gene module in matrix X by clustering thecolumn genes of this submatrix of X Thirdly we screenedcolumn clusters whose interactions with the rowmodules are

BioMed Research International 5

0 50 100 150 200 250 300 3500

02

04

06

08

1

Number of ranks

The r

elat

ive e

rror

Relative error of matrix L

(a)

0 50 100 150 200 250 300 3500

02

04

06

08

1

12

14

Number of ranks

The r

elat

ive e

rror

Relative error of matrix S

(b)

0 100 200 300 400 500 6000

005

01

015

02

025

03

035

Number of cards

The r

elat

ive e

rror

Relative error of matrix L

GoDecLRSDec

(c)

0 100 200 300 400 500 6000

02

04

06

08

1

12

14

Number of cards

The r

elat

ive e

rror

Relative error of matrix S

GoDecLRSDec

(d)

Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card

HC row genes of L

Row modules

GO-enrichment filtering

Based oneach row module

HC column genes of L

Fisherrsquos exact testColumn modules

Connections between this row module and all column modules

Column modules that have functional

All column modules map to GO items

Connections among modules of the whole network

Repeat these three steps for

all significant

row modules

Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL

6 BioMed Research International

Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861

perp

denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861

Gene set 119861 (Gene set 119861)perp

Gene set 119860 119860119861 119860119861perp

(Gene set 119860)perp 119860perp

119861 119860perp

119861perp

Table 2 Clustering results

(a) GO slim as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24

(b) GO BP FAT as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms

significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously

The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match

the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)

Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization

Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]

In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material

Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

4 BioMed Research International

0 50 100 150 200 250 300 3500

200400

6000

10

20

Rank of matrix L

Card of matrix S

Rela

tive e

rror

of

The minimum ofrelative error

25

250

deco

mpo

sitio

n

Figure 1 Results of parameter tuning on synthetic data 119909-axisdifferent cardinality corresponding to parameter 119896 119910-axis differentrank corresponding to parameter 120582 119911-axis the relative error X minus

X2

119865X

2119865

5 Results

51 Synthetic Data We simulated a synthetic data and thenapplied LRSDec algorithm andGoDec algorithm to it Specif-ically low-rank part sparse part and noises are generated asfollows

(i) Low-rank part the covariance matrix Σ is generatedbyHH119879 whereH isin R119898times119870 andH

119894119895sim N(0 1) Here119870

is the number of hiddenmodulesThe random entriesL119895are drawn fromN(0 120591Σ) Let L = [L

1 L

119899]

(ii) Sparse part the non-zero entries in sparse matrixare generated from the tail of Gaussian distributionN(1 2) whose upper quantile is 120572 = 001 Werandomly selected 70 of them to assign the oppositesignThis is consistent with EMAP datasets in whichnegative genetic interactions are much more preva-lent than the positive ones

(iii) e = 10minus2

lowastF wherein F is a standard Gaussianmatrix

A low-rank matrix 119871 with rank 25 and sparse matrix withcardinality 250 were generated respectively Now we have

X = L + S + e (16)

The first step is parameter training and the result isshowed in Figure 1 Minimal prediction error was achievedwhen 120582 = 250 and 119896 = 25 which coincides with the rankand cardinality of the synthetic data This demonstrated theeffectiveness of cross validation procedure

Next we compared the performance of LRSDec algo-rithm and GoDec algorithm by comparing their predictionerror The relative error is defined as

10038171003817100381710038171003817W minus W10038171003817100381710038171003817

2

119865

W2

119865

(17)

where W is the original matrix and W is an estimateap-proximation As both algorithms are influenced seriouslyby the parameters we compared the relative error of thetwo algorithms by given different parameters To make thecomparison simple we only changed one parameter with

another parameter fixed (Figure 2) One can see that bothalgorithms reach the smallest relative if adopting the correcttwo parameters and LRSDec outperforms GoDec In theSupplementaryMaterial we also compare the performance ofboth algorithms under different noise setting and the trendsare the same

52 Application to EMAP Data from Yeast We also appliedour method to EMAP data interrogating RNA processing inS cerevisiae which consists of 552 genes involved in RNA-related processes [9]This geneticmap contains about 152000pairwise genetic interaction measurements with about 29missing entries in data matrixWe applied ourmethod to thisEMAP data denoted asX to obtain twomatrices a low-rankmatrix L and a sparse matrix S X = L+S is the new completedata matrix with imputed missing entries To exploit thequantitative information from EMAP data we first subjectedthe entire low-rank matrix L to hierarchical clustering anapproach that groups genes with similar patterns of geneticinteractions It should be noted that using low-rank matrix Lin cluster improved the performance of hierarchical cluster-ing in detecting genetic interaction modules [16] Accordingto the clustering result we reordered rows and columnsof matrix X so that the protein complexes and biologicalprocesses showed in [9] could be found (Figure S1)

To help identify more modules of cellular functions andprocesses and reveal the relationships between them we fur-ther analyzed the matrices L and S Figure 3 is a flowchart ofour strategy 1 to detectmodules and cross talks between themthrough low-rank matrix L In this paper we define moduleas a cluster from hierarchical clustering (HC) that passesthrough GO-enriched filtering (Supplementary Section 3)The details are as follows Firstly we clustered the row genesof matrix L using hierarchical clustering (HC) Here weadopted the average-linkage hierarchical clustering algorithmin which the distance of gene 119860 and gene 119861 is defined as1 minus |cor(119860 119861)| where |cor(119860 119861)| is the absolute value of thecorrelation of genetic interaction profile of gene 119860 and gene119861 A cutoff needs to be applied for HC to cut the hierarchicalclustering tree We used the Jaccard index (SupplementarySection 4) to determine how well the predicted gene setscorrespond to benchmark (theoretical) gene sets [17] HereGO functional gene sets are used as benchmark (theoretical)gene setsThe cutoff atwhichHCachieved the highest Jaccardindex is used to cut the hierarchical tree We calculatedthe Jaccard index of every ldquoheightrdquo cutoff in hierarchicalclustering from 02 to 095 by 005 interval This step resultedin the best Jaccard index with height = 07 and 84 clustersNow we got the clusters of row genes in which genes act ina consistent manner across the entire column genes Thenwe filtered the clustering results by a hypergeometric testthat calculates the significance of enrichment of GO itemsand the 119901 value was set to 001 The clusters enriched in GOfunctional categories are defined as row modules Secondlyfor each row module we exploited modules of column genesbased on this row gene module in matrix X by clustering thecolumn genes of this submatrix of X Thirdly we screenedcolumn clusters whose interactions with the rowmodules are

BioMed Research International 5

0 50 100 150 200 250 300 3500

02

04

06

08

1

Number of ranks

The r

elat

ive e

rror

Relative error of matrix L

(a)

0 50 100 150 200 250 300 3500

02

04

06

08

1

12

14

Number of ranks

The r

elat

ive e

rror

Relative error of matrix S

(b)

0 100 200 300 400 500 6000

005

01

015

02

025

03

035

Number of cards

The r

elat

ive e

rror

Relative error of matrix L

GoDecLRSDec

(c)

0 100 200 300 400 500 6000

02

04

06

08

1

12

14

Number of cards

The r

elat

ive e

rror

Relative error of matrix S

GoDecLRSDec

(d)

Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card

HC row genes of L

Row modules

GO-enrichment filtering

Based oneach row module

HC column genes of L

Fisherrsquos exact testColumn modules

Connections between this row module and all column modules

Column modules that have functional

All column modules map to GO items

Connections among modules of the whole network

Repeat these three steps for

all significant

row modules

Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL

6 BioMed Research International

Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861

perp

denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861

Gene set 119861 (Gene set 119861)perp

Gene set 119860 119860119861 119860119861perp

(Gene set 119860)perp 119860perp

119861 119860perp

119861perp

Table 2 Clustering results

(a) GO slim as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24

(b) GO BP FAT as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms

significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously

The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match

the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)

Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization

Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]

In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material

Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

BioMed Research International 5

0 50 100 150 200 250 300 3500

02

04

06

08

1

Number of ranks

The r

elat

ive e

rror

Relative error of matrix L

(a)

0 50 100 150 200 250 300 3500

02

04

06

08

1

12

14

Number of ranks

The r

elat

ive e

rror

Relative error of matrix S

(b)

0 100 200 300 400 500 6000

005

01

015

02

025

03

035

Number of cards

The r

elat

ive e

rror

Relative error of matrix L

GoDecLRSDec

(c)

0 100 200 300 400 500 6000

02

04

06

08

1

12

14

Number of cards

The r

elat

ive e

rror

Relative error of matrix S

GoDecLRSDec

(d)

Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card

HC row genes of L

Row modules

GO-enrichment filtering

Based oneach row module

HC column genes of L

Fisherrsquos exact testColumn modules

Connections between this row module and all column modules

Column modules that have functional

All column modules map to GO items

Connections among modules of the whole network

Repeat these three steps for

all significant

row modules

Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL

6 BioMed Research International

Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861

perp

denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861

Gene set 119861 (Gene set 119861)perp

Gene set 119860 119860119861 119860119861perp

(Gene set 119860)perp 119860perp

119861 119860perp

119861perp

Table 2 Clustering results

(a) GO slim as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24

(b) GO BP FAT as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms

significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously

The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match

the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)

Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization

Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]

In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material

Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

6 BioMed Research International

Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861

perp

denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861

Gene set 119861 (Gene set 119861)perp

Gene set 119860 119860119861 119860119861perp

(Gene set 119860)perp 119860perp

119861 119860perp

119861perp

Table 2 Clustering results

(a) GO slim as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24

(b) GO BP FAT as benchmark gene set

Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched

200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms

significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously

The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match

the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)

Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization

Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]

In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material

Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

BioMed Research International 7

Negativeinteractions

Positiveinteractions

LOC1LIN1

BUD13

IST3LEA1

STO1

PRP11SNU66

CBC2BRR1MUD2

TGS1YNR004WCWC15ECM2

YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12

PML1MUD1

NAM8

HMT1SNU56

PRP42TIF35DBR1YHC1

LOC1

LIN1

BUD13

IST3

LEA1

STO1

PRP1

1

SNU66

CBC2

BRR1

MU

D2

TGS1

YNR0

04

WCW

C15

ECM2

YPR1

52

CN

TC20

ISY1

CWC2

1

SYF2

CWC2

7

PRP1

9

CSN12

PML1

MU

D1

NA

M8

HM

T1SN

U56

PRP4

2

TIF3

5

DBR

1

YHC1

NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1

RTT109

RTT109

CTK1HTZ1SOH1

SOH1

DST1CSE2

CSE2

MED1

SDC1EAF7THP1SYC1RXT2

RXT2

SWR1VPS71

PHO23

SAP30

PHO23SAP30SIN3

SIN3

HIR2

PUF4

SIF2HOS2SET3CDC73LEO1

SOH1CSE2

SDC1EAF7

RXT2

PHO2

SAP3

0SI

N3

HIR2

HIR2

RTT109G

BP2

SWR1VPS71

GLC

7

HIR2

NPL

3

PUF4

THP2

RTT1

03

CTK2

SIF2

HO

S2SE

T3CD

C73

LEO1

RTT1

09

CTK1

HTZ

1

SOH1

DST

1

CSE2

MED

1

SDC1

EAF7

THP1

SYC1

RXT2

PHO23

SAP3

0

SIN3

HIR2

GBP

2YR

A1

THP2

RPN4

SOH1

SAC3

THP1

DO

A1

RAD23

RTT1

09

SIN3

SWD1

SWD3

SWD3

RPN

4CA

F40

BRE1

SWD1

BRE1

LGE1

CDC7

3EA

F6EA

F5EA

F7SR

C1PA

P2EA

F3

GBP2

RAD23

CSE2

PUF4

SWD1

SWD3

BRE1

LGE1

RAD52

SIF2

HO

S2

SOH1

SET3

CDC7

LEO1

SRC1

SDC1

EAF3

RTT1

0

SWR1

VPS

71

RAD5

RAD5

RAD2

EAF6

EAF7

BRE1SIF2HOS2SET3SEM1

SAC3THP1BUD13

RPN4

RAD52

GLC7SRC1TOR1RTT101RAD51

SOH1

CSE2SIN3

MLP2

BRE1

KEM

1

SIF2

HO

S2SE

T3SE

M1

SAC3

THP1

BUD13

RPN4

RAD52

GLC

7

SRC1

TOR1

RTT1

01

RAD51

SOH1

CSE2

SIN3

MLP

2

YBL104CYGR283CRIC1YPT6GIM4

YKE2GIM3

GIM5

YGR2

83

CRI

C1YP

T6G

IM4

YKE2

GIM

3

GIM

5

RNA splicing

Transcription

DNA metabolicprocess

Cellular proteinmodification

processTranslation

Cellular proteindegradation

Transport

Mitochondrionorganization

Ribosome

Chromosomeorganization

Response tostress

Cell-divisioncycle

Protein folding

LRS4

HEX

3SL

X8LR

P1RR

P6FP

R1TG

S1

Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes

indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)

Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

8 BioMed Research International

PRP4

YRA1

CDC73

LGE1SDC1

RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1

SWD1RTT109

SWD3BRE1EAF5EAF7KAP122

SSD1TAD3EAF6

SWR1VPS71HTZ1EAF3CUL3

SEM1THP2SAC3THP1

PUF4GLC7SRC1

TOP1NUP60EFB1DBP5HIR2

SEM

1TH

P2SA

C3TH

P1

PUF4

GLC

7SR

C1

TOP1

NU

P60

EFB1

DBP

5H

IR2

RTT103

GBP2

CTK1

CTK2

SIF2HOS2SET3LEO1NPL3

UBP6DOA1RPN4

RAD23

PRP4

YRA1

CDC73

LGE1

SDC1

RXT2PHO23

SAP30

RXT2

PHO23

SAP3

0

SIN3

TAD3

DST1

CSE2

MED1

SWD1

RTT109

SWD3BRE1

EAF5EAF7

KAP122

SSD1

SWR1VPS71

CUL3

SOP1

EAF6

SET3

HTZ1

EAF3

RTT103

GBP2

CTK1

CTK2

SIF2

HOS2

SIN

3

LEO1

NPL3

DOA1UBP6

RPN4

RAD23

CBC2

CSN12

LEA1

CWC21

YHC1

PRP19

ELP6

PRP11

CWC27NTC20

IKI1

ELP2

ELP3IKI3

ELP4

ECM2

MUD2TGS1

SNU56

BRR1

BUD13

MLP2SYF2

YNR004WLIN1

SNU66PML1

YPR152C

URM1

ELP6

IKI1

ELP2

ELP3

IKI3

ELP4

URM

1

MUD1

HMT1

GIM5

YKE2GIM3

GIM4

GIM5YKE2GIM3GIM4

ELP3

ELP6

IKI1

GIM

5

YKE2

GIM

3

GIM

4

HTZ

1

SWR1

VPS

71

SGF1

1

TGS1

SKI7

HCR

1

RPS8

NPL

3

DRS

1

IST3

TIF35

ISY1

NAM8

CBC2

CSN12

LEA1

CWC21

YHC1

PRP11

CWC27

NTC20

ECM2

MUD1

TGS1

SNU56BRR1

BUD13

MLP2

SYF2

YNR004W

LIN1

SNU66

PML1

YPR152C

MUD2

HMT1

IST3

TIF35

ISY1

NAM8

PRP4

YRA

1

CDC7

3LG

E1

SSD

1

RXT2

PHO

23SA

P30

SIN

3D

ST1

CSE2

MED

1SO

H1

SWD

1

RTT1

09

SWD

3BR

E1

EAF5

EAF7

EAF6

KAP1

22

SDC1

TAD

3

SWR1

VPS

71H

TZ1

EAF3

CUL3

RTT1

03

GBP

2

CTK1

CTK2

SIF2

HO

S2SE

T3LE

O1

NPL

3

UBP

6D

OA

1

RPN

4

RAD

23

RNA splicing

Transcription

Transcription

transcription

RNA transport RNA localization

Intracellulartransport

Ribonucleoproteincomplex

biogenesis

Ribonucleoproteincomplex

biogenesis

Deadenylationdependentdecapping of

nucleartranscribed

mRNA

biogenesis

Regulation ofcellular protein

metabolic ncRNA

mRNA catabolic

metabolic

Ribosomeprocess

process

process

processprocess

process

Posttranscriptionalregulation of

gene expression

dependentmacromolecule

catabolic

tRNAmodification

Modification

process

rRNA metabolic

mRNA metabolic

process

process

process

process

Macromoleculecatabolicprocess

aciddeacetylation

deacetylation

Protein amino

RNA catabolic

Macromoleculecatabolic

tRNA metabolic

Chromatinmodification

tRNA wobbleuridine

modification

rRNA metabolicMacromolecularcomplex subunit

organizationCellular protein

complexassembly

ncRNAmetabolic

Cellular proteinlocalization

Regulation oftranslation

Regulation of

from nucleusmRNA export Histone

Maturation ofSSU rRNA

tRNA

Tubulin complexassembly

RNA capping

processing

RNAprocessing

PRP19

aminoacylation

mRNA 3998400end

3998400end

Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes

the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)

Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein

folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878

score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

BioMed Research International 9

RNA transportMaturation of

SSU-rRNA

RPS6A

POM152

UAF30

DBP5 NUP157

RPS11B

RPS6B

THP1

SRC1

TOP1

SEM1

NUP60

ENP1

SAC3

PUF4 HIR2

GLC7EFB1

THP2

(a)

RNA transport RNA localization

RRP6

YRA2

DBP5

NUP53

LRP1

THP1SRC1

TOP1

SEM1

NUP60

SAC3

PUF4HIR2

GLC7EFB1

THP2

(b)

Regulation ofgene expression

mRNA metabolicprocess

POP1

PRP42

CDC36SNU71

PAB1

SUP35

PRP3

PRP5

PRP6

PRP21

CDC73

EAF3

RTT103

RTT109

SWD1

SWD3

SDC1SAP30

RXT2

CTK1SET3

RPN4

PRP4

PHO23

NPL3

MED1

LGE1

LEO1

KAP122 DST1CTK2

EAF5HOS2

TAD3CSE2 BRE1

VPS71EAF7EAF6

SOH1

SIF2DOA1

UBP6 SWR1

SSD1

SIN3

HTZ1

(c)

Tubulin complex assembly

(protein folding)mRNA

process3998400 end

NAB2

PAN3

CTK1

GIM3

YKE2

GIM5

GIM4

(d)

Tubulin complex assembly

TOP1

MLP2 HIR2

CDC36CKA1

UTP4

GIM3

YKE2

GIM5

GIM4

(protein folding)Regulation oftranscription

(e)

mRNA splicingncRNA metabolic

process

YPR152C

YNR004WCSN12

MUD1

YHC1

MUD2

NAM8

MSM1

MSD1

MSK1

MRM1

MSF1

PRP19

SNU66

SNU56

PRP11

PML1ECM2

BUD13

HMT1

CWC21CWC27 TGS1

IST3CBC2

ISY1

LIN1

LEA1

MLP2

BRR1

NTC20SYF2

TIF35

(f)

Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

10 BioMed Research International

6 Conclusion

In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)

References

[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919

[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007

[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008

[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001

[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004

[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative

epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006

[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007

[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008

[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008

[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260

[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006

[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011

[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011

[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010

[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004

[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014

[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009

[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997

[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014

[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013

[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012

[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012

[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

BioMed Research International 11

[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008

[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007

[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009

[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011

[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011

[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010

[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008

[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article Low-Rank and Sparse Matrix Decomposition ...downloads.hindawi.com/journals/bmri/2015/573956.pdfResearch Article Low-Rank and Sparse Matrix Decomposition for Genetic

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology