Research Article
Low-Rank and Sparse Matrix Decomposition for Genetic Interaction Data

Yishu Wang,1 Dejie Yang,2 and Minghua Deng1,3,4

1 Center for Quantitative Biology, Peking University, Beijing 100871, China
2 Institute of Computing Technology, Chinese Academy of Science, Beijing 100190, China
3 School of Mathematical Sciences, Peking University, Beijing 100871, China
4 Center for Statistical Sciences, Peking University, Beijing 100871, China
Correspondence should be addressed to Minghua Deng; dengmh@math.pku.edu.cn
Received 8 January 2015; Accepted 13 March 2015

Academic Editor: Junwen Wang

Copyright © 2015 Yishu Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background. Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. One approach to analyzing EMAP data is to identify gene modules with densely interacting genes. In addition, the genetic interaction score ($S$ score) reflects the degree of synergizing or mitigating effect of two mutants, which is also informative. Statistical approaches that exploit both modularity and the pairwise interactions may provide more insight into the underlying biology. However, the high missing rate in EMAP data hinders the development of such approaches. To address this problem, we adopted the matrix decomposition methodology "low-rank and sparse decomposition" (LRSDec) to decompose the EMAP data matrix into a low-rank part and a sparse part.

Results. LRSDec has been demonstrated to be an effective technique for analyzing EMAP data. We applied it to a synthetic dataset and to an EMAP dataset studying RNA-related processes in Saccharomyces cerevisiae. Global views of the genetic cross talk between different RNA-related protein complexes and processes have been constructed, and novel functions of genes have been predicted.
1. Introduction
Genetic interactions, which represent the degree to which the presence of one mutation modulates the phenotype of a second mutation, have become measurable systematically and quantitatively in recent years [1, 2]. Genetic interactions can reveal functional relationships between genes and pathways. Furthermore, genetic networks measured via high-throughput technologies can reveal the schematic wiring of biological processes and predict novel functions of genes [3]. Recently, several high-throughput technologies have been developed to identify genetic interactions at genome scale, including Synthetic Genetic Array (SGA) [4], Diploid-Based Synthetic Lethality Analysis on Microarrays (dSLAM) [5], and epistatic miniarray profile (EMAP) [6]. In particular, EMAP systematically constructs double deletion strains by crossing query strains with a library of test strains and identifies genetic interactions by measuring a growth phenotype. An $S$ score is calculated by statistical methods for each pair of genes: negative $S$ scores represent synthetic sick/lethal interactions, and positive $S$ scores indicate alleviating interactions [6].
Consequently, for each pair of genes there are two different measures of relationship in the EMAP platform. First, the genetic interaction score ($S$ score) represents the degree of synergizing or mitigating effects of the two mutations in combination. Second, the similarity (typically measured as a correlation) of their genetic interaction profiles represents the congruency of the phenotypes of the two mutations across a wide variety of genetic backgrounds. So there are two important aspects in exploiting EMAP data. On the one hand, cellular functions and processes are carried out in series of interacting events, so genes participating in the same biological process tend to interact with each other. Therefore, algorithms that detect gene modules composed of densely interacting genes are of great interest. Within these modules, genes tend to have similar genetic interaction profiles; thus the submatrix for these genes tends to have a low-rank structure. On the other hand, the cross talks between modules are usually indicated by gene pairs with high $S$ scores (so that the genetic interaction is significant). Removing them results in a better low-rank structure. Figuratively, these gene pairs act like shadows cast over the low-rank matrix, connecting its different low-rank areas. These cross talks reveal the relationships between different biological processes or protein complexes. Meanwhile, gene pairs exhibiting high absolute values of $S$ scores may encode proteins that are physically associated or be enriched in protein-protein interactions [7–9]. So the investigation of the $S$ score is equally important. However, current methodologies for genetic interaction network analysis do not efficiently address these two important issues simultaneously.

Hindawi Publishing Corporation, BioMed Research International, Volume 2015, Article ID 573956, 11 pages, http://dx.doi.org/10.1155/2015/573956
In order to identify modules and between-module cross talks in genetic interaction networks, we employ the "low-rank and sparse decomposition" (LRSDec) to decompose the EMAP data matrix into a low-rank part and a sparse part. We propose that the low-rank structure accounts for gene modules, in which genes have high correlations, and that the sparse matrix captures the significant $S$ scores. In particular, entries in the sparse matrix found by LRSDec correspond to two sources of biologically meaningful interactions: within-module interactions and between-module links. In this paper, we focus our discussion of the sparse matrix on the results for between-module links, while the results for within-module interactions can be found in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/573956 (Supplementary Data 1).
Low-rank and sparse matrix structures have been studied extensively in matrix completion and compressed sensing [10, 11]. Robust principal component analysis (RPCA) [12] showed that the low-rank and the sparse components of a matrix can be exactly recovered if it has a unique and precise "low-rank + sparse" decomposition. RPCA offers a blind separation of low-rank data and sparse noise: it assumes $\mathbf{X} = \mathbf{L} + \mathbf{S}$ (where $\mathbf{S}$ is the sparse noise) and exactly decomposes $\mathbf{X}$ into $\mathbf{L}$ and $\mathbf{S}$ without predefining $\operatorname{rank}(\mathbf{L})$ and $\operatorname{card}(\mathbf{S})$. Another successful matrix decomposition method, GoDec, studies the approximate "low-rank + sparse" decomposition of a matrix $\mathbf{X}$ by estimating the low-rank part $\mathbf{L}$ and the sparse part $\mathbf{S}$ from $\mathbf{X}$ while allowing noise, that is, $\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}$, and constrains the rank range of $\mathbf{L}$ and the cardinality range of $\mathbf{S}$ [13]. GoDec has been reported to outperform earlier algorithms.
In this paper, we modified the GoDec matrix decomposition method and developed "low-rank and sparse decomposition" (LRSDec) to estimate the low-rank part $\mathbf{L}$ and the sparse part $\mathbf{S}$ of $\mathbf{X}$. LRSDec minimizes the nuclear norm of $\mathbf{L}$ and predefines the cardinality range of $\mathbf{S}$ while considering the additive noise $\mathbf{e}$. Unlike GoDec, which directly constrains the rank range of $\mathbf{L}$, LRSDec minimizes the corresponding convex surrogate of the rank, that is, the nuclear norm of $\mathbf{L}$. The nuclear norm has been shown to outperform the rank-restricted estimator [14]. Furthermore, in the presence of missing data, LRSDec can impute the missing entries while decomposing, with no need for data pretreatment, whereas GoDec cannot accomplish decomposition and imputation simultaneously. We then stated the convergence properties of our algorithm and proved that, given the two regularization parameters, the objective value of LRSDec decreases monotonically. By applying both methods to a synthetic dataset, we demonstrated the superiority of LRSDec over GoDec. Finally, we analyzed a genetic interaction dataset (EMAP) using our algorithm and identified many biologically meaningful modules and cross talks between them.
2. Model
Let $\mathbf{X}$ be an $m \times n$ matrix that represents a genetic interaction dataset, where $m$ is the number of query genes and $n$ is the number of library genes. We propose to decompose $\mathbf{X}$ as

$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}, \tag{1}$$

where $\mathbf{L} \in \mathbb{R}^{m \times n}$ denotes the low-rank part, $\mathbf{S} \in \mathbb{R}^{m \times n}$ denotes the sparse part, and $\mathbf{e}$ is the noise. Here we introduce $\mathbf{L}$ to account for modules in which genes are highly correlated. These modules correspond to protein complexes and biological pathways, in which genes tend to share similar genetic interaction profiles [15]. $\mathbf{S}$ is introduced to account for significant $S$ scores, which arise either from gene pairs in the same module that have genetic interactions or from cross talks among different functional modules.
Based on the assumptions above, we propose to solve the following optimization problem:

$$\begin{aligned}
&\text{minimize } \operatorname{rank}(\mathbf{L}), \quad \text{minimize } \operatorname{card}(\mathbf{S}) \\
&\text{subject to } \sum_{(i,j)} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta,
\end{aligned} \tag{2}$$

where $\delta \ge 0$ is a regularization parameter that controls the error tolerance and $\operatorname{card}(\mathbf{S})$ denotes the number of nonzero entries in the matrix $\mathbf{S}$.
To make the minimization problem tractable, we relax the rank operator on $\mathbf{L}$ with the nuclear norm, which has been proven to be an effective convex surrogate of the rank operator [14]:

$$\begin{aligned}
&\text{minimize } \|\mathbf{L}\|_*, \quad \text{minimize } \operatorname{card}(\mathbf{S}) \\
&\text{subject to } \sum_{(i,j)} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta,
\end{aligned} \tag{3}$$

where $\|\mathbf{L}\|_*$ is the nuclear norm of $\mathbf{L}$ ($\|\mathbf{L}\|_* = \sum_{i=1}^{r} \sigma_i$, where $\sigma_1, \ldots, \sigma_r$ are the singular values of $\mathbf{L}$ and $r$ is the rank of $\mathbf{L}$).
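As a small numerical illustration of the nuclear norm just defined, the sketch below computes the sum of singular values of a rank-one matrix. It assumes NumPy; the function name `nuclear_norm` and the example matrix are illustrative choices and do not come from the original work.

```python
import numpy as np

def nuclear_norm(L):
    """Nuclear norm ||L||_*: the sum of the singular values of L."""
    singular_values = np.linalg.svd(L, compute_uv=False)
    return float(singular_values.sum())

# Illustrative rank-one matrix u v^T. Its only nonzero singular value is
# ||u|| * ||v|| = 5 * 1, so rank(L) = 1 while ||L||_* = 5.
u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0, 0.0])
L = np.outer(u, v)

rank_L = int(np.linalg.matrix_rank(L))
norm_L = nuclear_norm(L)
print(rank_L, norm_L)
```

The rank only counts nonzero singular values, whereas the nuclear norm sums their magnitudes, which is what makes it a convex quantity that can be shrunk by soft-thresholding.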
However, missing data is commonly encountered in EMAP data, confounding techniques such as cluster analysis and matrix factorization. Here we extend our basic model (3) to handle EMAP data with missing values by imputing the missing entries in the matrix while simultaneously estimating the low-rank matrix $\mathbf{L}$ and the sparse matrix $\mathbf{S}$. Suppose that we only observe a subset of $\mathbf{X}$ indexed by $\Omega$, and that the missing entries are indexed by $\Omega^{\perp}$. In order to find a low-rank matrix $\mathbf{L}$ and a sparse matrix $\mathbf{S}$ based on the observed data, we propose to solve the following optimization problem:

$$\begin{aligned}
&\text{minimize } \|\mathbf{L}\|_*, \quad \text{minimize } \operatorname{card}(\mathbf{S}) \\
&\text{subject to } \sum_{(i,j) \in \Omega} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta.
\end{aligned} \tag{4}$$
3. Algorithm
Similar to GoDec, the optimization problem (3) can be solved by alternately optimizing the following two subproblems until convergence:

$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right\|_F^2, \tag{5a}$$

$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right\|_F^2. \tag{5b}$$
In each iteration, we optimize the objective function by alternately updating $\mathbf{L}$ and $\mathbf{S}$. First, subproblem (5a) can be solved by the method of [14]. For fixed $\mathbf{S}$, the solution of (5a) is

$$\mathbf{L}_t = T_\lambda\left(\mathbf{X} - \mathbf{S}_{t-1}\right). \tag{6}$$

Here $\lambda \ge 0$ is a regularization parameter controlling the nuclear norm of the estimate $\mathbf{L}_t$, where

$$T_\lambda(\mathbf{W}) = \mathbf{U}\mathbf{D}_\lambda\mathbf{V}' \quad \text{with} \quad \mathbf{D}_\lambda = \operatorname{diag}\left[(d_1 - \lambda)_+, \ldots, (d_r - \lambda)_+\right], \tag{7}$$

$\mathbf{U}\mathbf{D}\mathbf{V}'$ is the singular value decomposition (SVD) of $\mathbf{W}$, and $t_+ = \max(t, 0)$. The notation $T_\lambda(\mathbf{W})$ refers to soft-thresholding [14].

Next, subproblem (5b) can be solved via entry-wise hard thresholding of $\mathbf{X} - \mathbf{L}_t$ for fixed $\mathbf{L}_t$. Before giving the solution, we define an orthogonal projection operator $P$. Suppose that a subset of the entries of $\mathbf{W}$ is indexed by $\Omega$; then the matrix $\mathbf{W}$ can be projected onto the linear space of matrices supported by $\Omega$:

$$P_\Omega(\mathbf{W})_{(i,j)} =
\begin{cases}
W_{ij} & \text{if } (i,j) \in \Omega, \\
0 & \text{if } (i,j) \notin \Omega.
\end{cases} \tag{8}$$
$P_{\Omega^{\perp}}$ is its complementary projection, that is, $P_\Omega(\mathbf{W}) + P_{\Omega^{\perp}}(\mathbf{W}) = \mathbf{W}$.

The solution of (5b) can then be given as follows:

$$\mathbf{S} = P_\Theta\left(\mathbf{X} - \mathbf{L}_t\right), \tag{9}$$

where $P$ is the orthogonal projection operator defined above and $\Theta$ is the nonzero subset of the first $k$ largest entries of $|\mathbf{X} - \mathbf{L}_t|$. The matrix $\mathbf{X} - \mathbf{L}_t$ is thus projected onto the linear space of matrices supported by $\Theta$:

$$P_\Theta\left(\mathbf{X} - \mathbf{L}_t\right)_{(i,j)} =
\begin{cases}
\left(\mathbf{X} - \mathbf{L}_t\right)_{ij} & \text{if } (i,j) \in \Theta, \\
0 & \text{if } (i,j) \notin \Theta.
\end{cases} \tag{10}$$
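The two operators used above, singular value soft-thresholding $T_\lambda$ in (6)-(7) and the top-$k$ hard thresholding $P_\Theta$ in (9)-(10), can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function names `soft_threshold_svd` and `hard_threshold_topk` and the small example matrices are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def soft_threshold_svd(W, lam):
    """T_lambda(W) as in (7): shrink every singular value of W by lam, floor at 0."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    d_shrunk = np.maximum(d - lam, 0.0)      # (d_i - lambda)_+
    return (U * d_shrunk) @ Vt               # U diag(d_shrunk) V'

def hard_threshold_topk(R, k):
    """P_Theta(R) as in (10): keep the k largest-magnitude entries of R."""
    S = np.zeros(R.shape)
    kk = min(k, R.size)
    if kk > 0:
        # Flat (row-major) indices of the kk largest absolute values.
        idx = np.argpartition(np.abs(R).ravel(), -kk)[-kk:]
        S.flat[idx] = R.flat[idx]
    return S

# Soft-thresholding a diagonal matrix with singular values 5, 2, 0.5 at
# lambda = 1 leaves singular values 4, 1, 0, so the rank drops from 3 to 2.
W = np.diag([5.0, 2.0, 0.5])
L_step = soft_threshold_svd(W, lam=1.0)

# Hard thresholding with k = 2 keeps only the entries -3 and 2.
R = np.array([[0.1, -3.0], [2.0, 0.2]])
S_step = hard_threshold_topk(R, k=2)
print(np.round(L_step, 6))
print(S_step)
```

Soft-thresholding reduces the rank by zeroing small singular values, while hard thresholding enforces $\operatorname{card}(\mathbf{S}) \le k$ directly.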
So far we have developed the algorithm for solving problem (3). As for problem (4), because of the missing values we carry out the optimization on the observed data $\Omega$. We update $\mathbf{L}_t$ and $\mathbf{S}_t$ by solving the following optimization subproblems, respectively:

$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|P_\Omega\left(\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right)\right\|_F^2, \tag{11a}$$

$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|P_\Omega\left(\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right)\right\|_F^2. \tag{11b}$$

The term $\|P_\Omega(\mathbf{X} - \mathbf{L} - \mathbf{S})\|_F^2$ is the sum of squared errors on the observed entries indexed by $\Omega$.

Subproblem (11a) can be solved by updating $\mathbf{L}$, starting from an arbitrary initialization, using [14]:

$$\mathbf{L}_t \longleftarrow T_\lambda\left(P_\Omega\left(\mathbf{X} - \mathbf{S}_{t-1}\right) + P_{\Omega^{\perp}}(\mathbf{L})\right). \tag{12}$$

The solution of subproblem (11b) is

$$\mathbf{S}_t = P_\Theta\left(P_\Omega\left(\mathbf{X} - \mathbf{L}_t\right)\right), \tag{13}$$

where $\Theta$ is the nonzero subset of the first $k$ largest entries of $|P_\Omega(\mathbf{X} - \mathbf{L}_t)|$.
Now we have the following algorithm.

Algorithm 1 (LRSDec).
(i) Input: $\mathbf{X} \in \mathbb{R}^{m \times n}$. Initialize $\mathbf{S} \leftarrow \mathbf{0}$.
(ii) Iterate until convergence:
    (a) L-step: iteratively update $\mathbf{L}$ using (12);
    (b) S-step: solve for $\mathbf{S}$ using (13).
(iii) Output: $\mathbf{L}$, $\mathbf{S}$.
The convergence analysis of our algorithm is provided in the Supplementary Material.
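Algorithm 1 can be sketched as follows. This is only an illustrative reading of updates (12) and (13), assuming NumPy: the function names, the number of inner and outer sweeps, the stopping rule, and the penalized objective $\tfrac{1}{2}\|P_\Omega(\mathbf{X}-\mathbf{L}-\mathbf{S})\|_F^2 + \lambda\|\mathbf{L}\|_*$ used to monitor convergence are assumptions made here, not details confirmed by the text. The toy data at the end are likewise hypothetical.

```python
import numpy as np

def soft_threshold_svd(W, lam):
    """T_lambda(W): shrink each singular value by lam (floored at zero)."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(d - lam, 0.0)) @ Vt

def hard_threshold_topk(R, k):
    """Keep the k largest-magnitude entries of R and zero the rest."""
    S = np.zeros(R.shape)
    kk = min(k, R.size)
    if kk > 0:
        idx = np.argpartition(np.abs(R).ravel(), -kk)[-kk:]
        S.flat[idx] = R.flat[idx]
    return S

def lrsdec(X, mask, lam, k, n_outer=100, n_inner=5, tol=1e-8):
    """Sketch of Algorithm 1; `mask` is True on the observed entries (Omega)."""
    Xo = np.where(mask, X, 0.0)        # missing entries (even NaN) become 0
    L = np.zeros(Xo.shape)
    S = np.zeros(Xo.shape)             # step (i): initialize S to zero
    prev_obj = None
    for _ in range(n_outer):
        # L-step, update (12): fill unobserved entries with the current L,
        # then soft-threshold the singular values; repeated a few times.
        for _ in range(n_inner):
            L = soft_threshold_svd(np.where(mask, Xo - S, L), lam)
        # S-step, update (13): top-k entries of the observed residual.
        S = hard_threshold_topk(np.where(mask, Xo - L, 0.0), k)
        # Assumed penalized objective, used only as a stopping criterion.
        resid = np.where(mask, Xo - L - S, 0.0)
        obj = 0.5 * np.sum(resid ** 2) + lam * np.linalg.norm(L, "nuc")
        if prev_obj is not None and abs(prev_obj - obj) <= tol * max(1.0, prev_obj):
            break
        prev_obj = obj
    return L, S

# Hypothetical toy data: rank-2 signal, 10 large sparse entries, small noise,
# and roughly 20% of the entries missing (stored as NaN).
rng = np.random.default_rng(0)
m, n, r, k = 30, 30, 2, 10
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
S0 = np.zeros((m, n))
S0.flat[rng.choice(m * n, size=k, replace=False)] = 8.0
X = L0 + S0 + 0.01 * rng.standard_normal((m, n))
mask = rng.random((m, n)) > 0.2
X_missing = np.where(mask, X, np.nan)

L_hat, S_hat = lrsdec(X_missing, mask, lam=0.5, k=k)
X_completed = L_hat + S_hat            # low-rank part fills the missing entries
```

Because the unobserved entries are refilled with the current $\mathbf{L}$ before each soft-thresholding step, imputation and decomposition happen in the same loop, and $\mathbf{S}$ is supported only on observed entries.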
4. Parameter Tuning
Two parameters need to be tuned in our model: $\lambda$ and $k$. Here we propose a 10-fold cross validation strategy to select them. The idea is as follows. Let $\Omega$ be the index set of the observed entries of $\mathbf{X}$. We randomly partition $\Omega$ into 10 equal-size subsets and choose training entries $\Omega_1$ and testing entries $\Omega_2$ such that $\Omega_1 \cup \Omega_2 = \Omega$, $\Omega_1 \cap \Omega_2 = \emptyset$, $|\Omega_1| = 0.9\,|\Omega|$, and $|\Omega_2| = 0.1\,|\Omega|$. We then solve problem (14) on a grid of $(\lambda, k)$ values on the training data:

$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|P_{\Omega_1}\left(\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right)\right\|_F^2, \tag{14a}$$

$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|P_{\Omega_1}\left(\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right)\right\|_F^2. \tag{14b}$$

Then we evaluate the prediction error (15) on the testing data:

$$\operatorname{Err}(\lambda, k) = \frac{1}{2}\left\|P_{\Omega_2}\left(\mathbf{X} - \mathbf{L}(\lambda, k) - \mathbf{S}(\lambda, k)\right)\right\|_F^2. \tag{15}$$

The cross validation process is repeated 10 times. We can then find the optimal parameter pair $(\lambda^*, k^*)$, which minimizes the mean of the prediction error.
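The cross validation procedure above can be sketched as follows, assuming NumPy. The helper names (`fold_masks`, `cv_error`, `fit_lrsdec`), the compact fixed-sweep solver used as a stand-in for Algorithm 1, the toy matrix, and the small parameter grid are all hypothetical choices made for illustration; only the fold structure and the error (15) follow the text.

```python
import numpy as np

def soft_threshold_svd(W, lam):
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(d - lam, 0.0)) @ Vt

def hard_threshold_topk(R, k):
    S = np.zeros(R.shape)
    kk = min(k, R.size)
    if kk > 0:
        idx = np.argpartition(np.abs(R).ravel(), -kk)[-kk:]
        S.flat[idx] = R.flat[idx]
    return S

def fit_lrsdec(X, mask, lam, k, n_iter=30):
    """Compact stand-in for Algorithm 1 (fixed number of sweeps, no stopping rule)."""
    Xo = np.where(mask, X, 0.0)
    L, S = np.zeros(Xo.shape), np.zeros(Xo.shape)
    for _ in range(n_iter):
        L = soft_threshold_svd(np.where(mask, Xo - S, L), lam)
        S = hard_threshold_topk(np.where(mask, Xo - L, 0.0), k)
    return L, S

def fold_masks(mask, n_folds=10, seed=0):
    """Yield (train, test) masks: Omega_1 and Omega_2 for each of the folds."""
    rng = np.random.default_rng(seed)
    observed = rng.permutation(np.flatnonzero(mask))   # shuffled indices of Omega
    for fold in np.array_split(observed, n_folds):
        test = np.zeros(mask.size, dtype=bool)
        test[fold] = True
        test = test.reshape(mask.shape)
        yield mask & ~test, test

def cv_error(X, mask, lam, k, n_folds=10, seed=0):
    """Mean over folds of Err(lambda, k) from (15), evaluated on held-out entries."""
    errors = []
    for train, test in fold_masks(mask, n_folds, seed):
        L, S = fit_lrsdec(X, train, lam, k)
        errors.append(0.5 * np.sum((X - L - S)[test] ** 2))
    return float(np.mean(errors))

# Hypothetical toy problem and grid; 5 folds keep the example fast.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 20))
mask = rng.random(X.shape) > 0.1
grid = [(lam, k) for lam in (0.1, 1.0, 5.0) for k in (0, 5)]
errors = {params: cv_error(X, mask, *params, n_folds=5) for params in grid}
best_lam, best_k = min(errors, key=errors.get)
print(best_lam, best_k)
```

Since $\mathbf{S}$ is supported on the training entries only, the held-out entries are predicted by the low-rank part alone, which is what makes the test error informative about both $\lambda$ and $k$.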
[Figure 1 graphic: 3D surface plot. Axes: "Rank of matrix L", "Card of matrix S", and "Relative error of decomposition". Annotation: the minimum of the relative error, marked at 25 and 250.]

Figure 1: Results of parameter tuning on synthetic data. $x$-axis: different cardinality, corresponding to parameter $k$; $y$-axis: different rank, corresponding to parameter $\lambda$; $z$-axis: the relative error $\|\mathbf{X} - \hat{\mathbf{X}}\|_F^2 / \|\mathbf{X}\|_F^2$.
5. Results
5.1. Synthetic Data. We simulated a synthetic dataset and then applied the LRSDec and GoDec algorithms to it. Specifically, the low-rank part, the sparse part, and the noise are generated as follows.

(i) Low-rank part: the covariance matrix $\Sigma$ is generated as $\mathbf{H}\mathbf{H}^T$, where $\mathbf{H} \in \mathbb{R}^{m \times K}$ and $H_{ij} \sim \mathcal{N}(0, 1)$. Here $K$ is the number of hidden modules. The random columns $\mathbf{L}_j$ are drawn from $\mathcal{N}(0, \tau\Sigma)$. Let $\mathbf{L} = [\mathbf{L}_1, \ldots, \mathbf{L}_n]$.

(ii) Sparse part: the nonzero entries of the sparse matrix are generated from the tail of the Gaussian distribution $\mathcal{N}(1, 2)$ beyond its upper quantile $\alpha = 0.01$. We randomly selected 70% of them and assigned them the opposite sign. This is consistent with EMAP datasets, in which negative genetic interactions are much more prevalent than positive ones.

(iii) Noise: $\mathbf{e} = 10^{-2} \ast \mathbf{F}$, where $\mathbf{F}$ is a standard Gaussian matrix.

A low-rank matrix $\mathbf{L}$ with rank 25 and a sparse matrix with cardinality 250 were generated, respectively. Now we have

$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}. \tag{16}$$
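The three generation steps above can be sketched as follows, assuming NumPy and the standard-library `statistics.NormalDist`. Several details are assumptions made for illustration because the text does not fix them: the matrix size ($m = n = 100$), $\tau = 1$, reading the "2" in $\mathcal{N}(1, 2)$ as a variance, and placing the sparse entries uniformly at random. The function name `make_synthetic` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def make_synthetic(m=100, n=100, K=25, card=250, tau=1.0, alpha=0.01,
                   frac_negative=0.7, noise_scale=1e-2, seed=0):
    rng = np.random.default_rng(seed)

    # (i) Low-rank part: columns L_j ~ N(0, tau * H H^T). Drawing
    # L_j = sqrt(tau) * H z_j with z_j ~ N(0, I_K) gives exactly this
    # covariance, and L then has rank K (almost surely).
    H = rng.standard_normal((m, K))
    L = np.sqrt(tau) * H @ rng.standard_normal((K, n))

    # (ii) Sparse part: values from the upper-alpha tail of N(1, 2), drawn by
    # inverse-CDF sampling. ASSUMPTION: "2" is the variance.
    tail = NormalDist(mu=1.0, sigma=2.0 ** 0.5)
    u = rng.uniform(1.0 - alpha, 1.0, size=card)
    u = np.clip(u, 1.0 - alpha, np.nextafter(1.0, 0.0))   # keep u strictly below 1
    values = np.array([tail.inv_cdf(p) for p in u])
    # Flip the sign of 70% of the entries so that negatives dominate.
    n_neg = int(round(frac_negative * card))
    values[rng.choice(card, size=n_neg, replace=False)] *= -1.0
    S = np.zeros((m, n))
    S.flat[rng.choice(m * n, size=card, replace=False)] = values

    # (iii) Noise: e = 1e-2 * F with F a standard Gaussian matrix.
    e = noise_scale * rng.standard_normal((m, n))
    return L + S + e, L, S

X, L, S = make_synthetic()
print(np.linalg.matrix_rank(L), np.count_nonzero(S), int((S < 0).sum()))
```

With these defaults the low-rank part has rank 25 and the sparse part has 250 nonzero entries, matching the rank and cardinality stated for the synthetic data.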
The first step is parameter tuning, and the result is shown in Figure 1. The minimal prediction error was achieved at a rank of 25 (the rank determined by $\lambda$) and a cardinality of $k = 250$, which coincide with the rank and cardinality of the synthetic data. This demonstrates the effectiveness of the cross validation procedure.
Next, we compared the performance of the LRSDec and GoDec algorithms by comparing their prediction errors. The relative error is defined as

$$\frac{\|\mathbf{W} - \hat{\mathbf{W}}\|_F^2}{\|\mathbf{W}\|_F^2}, \tag{17}$$

where $\mathbf{W}$ is the original matrix and $\hat{\mathbf{W}}$ is an estimate/approximation. As both algorithms are strongly influenced by their parameters, we compared the relative errors of the two algorithms under different parameter settings. To keep the comparison simple, we changed only one parameter at a time while keeping the other fixed (Figure 2). Both algorithms reach their smallest relative error when the two correct parameters are adopted, and LRSDec outperforms GoDec. In the Supplementary Material, we also compare the performance of both algorithms under different noise settings, and the trends are the same.
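The relative error (17) is a ratio of squared Frobenius norms and can be computed directly. The sketch below assumes NumPy; the function name `relative_error` and the small example matrices are illustrative only.

```python
import numpy as np

def relative_error(W, W_hat):
    """Relative error (17): ||W - W_hat||_F^2 / ||W||_F^2."""
    return float(np.sum((W - W_hat) ** 2) / np.sum(W ** 2))

# Worked example: the estimate misses a single entry of size 4, so the
# squared error is 16, while ||W||_F^2 = 9 + 16 = 25.
W = np.array([[3.0, 4.0], [0.0, 0.0]])
W_hat = np.array([[3.0, 0.0], [0.0, 0.0]])
err = relative_error(W, W_hat)
print(err)
```

The same function applies to the low-rank part (compare $\mathbf{L}$ with its estimate) and to the sparse part (compare $\mathbf{S}$ with its estimate), as in the panels of Figure 2.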
5.2. Application to EMAP Data from Yeast. We also applied our method to EMAP data interrogating RNA processing in S. cerevisiae, which consists of 552 genes involved in RNA-related processes [9]. This genetic map contains about 152,000 pairwise genetic interaction measurements, with about 29% missing entries in the data matrix. We applied our method to this EMAP data, denoted as $\mathbf{X}$, to obtain two matrices: a low-rank matrix $\mathbf{L}$ and a sparse matrix $\mathbf{S}$. $\hat{\mathbf{X}} = \mathbf{L} + \mathbf{S}$ is the new complete data matrix with imputed missing entries. To exploit the quantitative information in the EMAP data, we first subjected the entire low-rank matrix $\mathbf{L}$ to hierarchical clustering, an approach that groups genes with similar patterns of genetic interactions. It should be noted that using the low-rank matrix $\mathbf{L}$ for clustering improved the performance of hierarchical clustering in detecting genetic interaction modules [16]. According to the clustering result, we reordered the rows and columns of matrix $\mathbf{X}$ so that the protein complexes and biological processes shown in [9] could be found (Figure S1).
To help identify more modules of cellular functions and processes and to reveal the relationships between them, we further analyzed the matrices $\mathbf{L}$ and $\mathbf{S}$. Figure 3 is a flowchart of our strategy 1 for detecting modules and the cross talks between them through the low-rank matrix $\mathbf{L}$. In this paper, we define a module as a cluster from hierarchical clustering (HC) that passes GO-enrichment filtering (Supplementary Section 3). The details are as follows. First, we clustered the row genes of matrix $\mathbf{L}$ using hierarchical clustering (HC). Here we adopted the average-linkage hierarchical clustering algorithm, in which the distance between gene $A$ and gene $B$ is defined as $1 - |\operatorname{cor}(A, B)|$, where $|\operatorname{cor}(A, B)|$ is the absolute value of the correlation between the genetic interaction profiles of gene $A$ and gene $B$. A cutoff needs to be applied to cut the hierarchical clustering tree. We used the Jaccard index (Supplementary Section 4) to determine how well the predicted gene sets correspond to benchmark (theoretical) gene sets [17]. Here GO functional gene sets are used as the benchmark (theoretical) gene sets. The cutoff at which HC achieved the highest Jaccard index is used to cut the hierarchical tree. We calculated the Jaccard index for every "height" cutoff in the hierarchical clustering from 0.2 to 0.95 in steps of 0.05. This step resulted in the best Jaccard index at height = 0.7, with 84 clusters. We thus obtained clusters of row genes in which genes act in a consistent manner across all column genes. We then filtered the clustering results by a hypergeometric test that calculates the significance of enrichment of GO items, with the $p$ value set to 0.01. The clusters enriched in GO functional categories are defined as row modules. Second, for each row module, we identified modules of column genes based on this row module in matrix $\mathbf{X}$ by clustering the column genes of the corresponding submatrix of $\mathbf{X}$. Third, we screened column clusters whose interactions with the row modules are
[Figure 2 graphic: four line plots comparing GoDec and LRSDec. (a) Relative error of matrix L versus number of ranks; (b) relative error of matrix S versus number of ranks; (c) relative error of matrix L versus number of cards; (d) relative error of matrix S versus number of cards.]

Figure 2: Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters. ((a)-(b)) Fixed parameter card, different parameter rank; ((c)-(d)) fixed parameter rank, different parameter card.
[Figure 3 graphic: flowchart. Box labels, in order: HC row genes of L; GO-enrichment filtering; Row modules; Based on each row module: HC column genes of L; Fisher's exact test; Column modules; All column modules map to GO items; Connections between this row module and all column modules; Repeat these three steps for all significant row modules; Connections among modules of the whole network.]

Figure 3: Flowchart of our strategy 1 for detecting modules and the cross talks between them in a genetic interaction network by the low-rank matrix L.
Table 1: Fisher's exact test table. (Gene set $A$)$^{\perp}$ denotes the complementary set of gene set $A$. $AB$ denotes the number of connections between gene set $A$ and gene set $B$; $AB^{\perp}$ denotes the number of connections between gene set $A$ and the complementary set of gene set $B$.

                      Gene set B        (Gene set B)⊥
Gene set A            AB                AB⊥
(Gene set A)⊥         A⊥B               A⊥B⊥
Table 2: Clustering results.

(a) GO slim as benchmark gene set

            Low-rank matrix          Original matrix
Clusters    JC-index    Enriched     JC-index    Enriched
200         0.063       190          0.022       46
150         0.070       138          0.032       44
100         0.084       90           0.050       50
50          0.088       30           0.067       24

(b) GO BP FAT as benchmark gene set

            Low-rank matrix          Original matrix
Clusters    JC-index    Enriched     JC-index    Enriched
200         0.137       183          0.044       47
150         0.147       142          0.058       50
100         0.155       96           0.091       52
50          0.131       34           0.078       26

A hypergeometric test was applied to test the enrichment of gene sets. Significance level: FDR ≤ 0.05. Clusters: the number of clusters used to cut the hierarchical clustering tree. Enriched: the number of modules predicted by hierarchical clustering that are enriched in GO terms.
significantly enriched by Fisher's exact test (Table 1; $p$ value = 0.05). Here we defined positive genetic interactions as those gene pairs with genetic interaction scores $S \ge 2.0$ and negative ones as those with $S \le -2.5$ [9]. The reduced gene sets of column genes were defined as column modules (corresponding to the row module). Next, we identified the enriched GO functional categories of these column modules by mapping them to GO items (hypergeometric test). Finally, repeating these steps for all row modules, we identified the modules and the intermodule genetic cross talk of the whole genetic interaction network (Figures 4–6), where red and green represent a statistically significant enrichment of positive and negative interactions, respectively. The cross talks constructed in these figures are the $S$ scores among genes in the original matrix. In the following, we discuss many of the interesting connections that have been reported previously.
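Both tests used in this strategy, the hypergeometric test for GO enrichment and Fisher's exact test on the 2 × 2 layout of Table 1, reduce to an upper-tail hypergeometric probability. The sketch below uses only the Python standard library. The function names and the example counts are hypothetical, and treating the enrichment test as one-sided is an assumption, since the text does not state the alternative hypothesis.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes' (e.g., genes annotated to a GO term)."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / total

def fisher_enrichment_p(ab, ab_c, a_cb, a_cb_c):
    """One-sided Fisher's exact test for the Table 1 layout:
        ab   = AB,    ab_c   = AB^perp,
        a_cb = A^perpB, a_cb_c = A^perpB^perp.
    Returns the probability of at least `ab` connections between A and B
    given the table margins."""
    N = ab + ab_c + a_cb + a_cb_c
    return hypergeom_upper_tail(N, K=ab + a_cb, n=ab + ab_c, k=ab)

# GO enrichment, toy numbers: 10 genes in total, 4 annotated to a term,
# a cluster of 3 genes containing 2 annotated ones.
p_go = hypergeom_upper_tail(N=10, K=4, n=3, k=2)

# Fisher's exact test on a toy 2 x 2 table of connection counts.
p_fisher = fisher_enrichment_p(3, 1, 1, 3)
print(p_go, p_fisher)
```

In the strategy described above, a column cluster would be retained when the resulting $p$ value falls below the chosen threshold (0.05 for Fisher's exact test and 0.01 for the GO enrichment of row clusters).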
The low-rank matrix revealed more functional modules than the original matrix (Table 2). In Table 2, we cut the dendrogram at different heights and compared the clusters obtained from the low-rank matrix with those obtained from the original matrix. The Jaccard index [17] is used to determine how well the predicted clusters recapture the benchmark gene sets ((a) GO slim and (b) GO BP FAT). The definition of the Jaccard index can be found in the Supplementary Material. In the ideal situation, where the predicted clusters perfectly match the benchmark gene sets, the Jaccard index is 1. The larger the Jaccard index, the better the predictions. The clusters obtained from clustering the low-rank matrix are more enriched with GO functional categories at varying cutoffs (Table 2).
Figure 4 gives an overview of the relationships among biological processes when GO slim (downloaded from SGD [18]; a broad overview of all of the top GO categories) is used as the set of GO items. We found that several sets of genes known to function in the same biochemical processes contain predominantly positive or predominantly negative interactions, which was also observed in [9]. For example, genes classified as involved in RNA splicing and transcription are significantly enriched in negative genetic interactions (Figure 4, green nodes). In contrast, the module involved in protein folding has strong positive interactions (red node). In addition, Figure 4 also suggests that not all modules have a consistent pattern of interactions (yellow node), which is reasonable for biological processes. Finally, several connections have been discussed previously. For example, there is good evidence for functional interactions between splicing and transcription in [19] and for functional interactions between splicing and translation in [20]. Furthermore, [21] reported the cooperative relationship between protein folding and chromosome organization.
If we classify the GO functional categories into finer items (GO BP FAT, downloaded from DAVID, http://david.abcc.ncifcrf.gov), we obtain a more comprehensive network (Figure 5). Many of the interactions in Figure 5 have been reported before. For instance, genes involved in the tubulin complex assembly process have negative genetic interactions with genes involved in the tRNA wobble uridine modification process, as supported by SGD, and negative genetic interactions between the RNA splicing process and the tRNA metabolic process can also be found. The genetic interaction between RNA splicing and chromatin modification has been studied in [22]. The balance of the interactions between the processing of ribonucleoprotein assembly of intronic noncoding RNAs and the splicing process, which regulates the levels of ncRNA and host mRNA, is described in [23]. A functional relationship between tubulin and the roles of the elongator complex in tRNA wobble uridine modification is supported by [24]. Moreover, epistasis and chromatin immunoprecipitation experiments indicating that the loss of Rrp6 (regulation of transcription) function is paralleled by the recruitment of Hda1 (histone deacetylase) have been reported in [25]. Finally, cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11 was reported in [26].
In an effort to gain insight into the functional organization of RNA-related complexes, we used GO CC FAT (downloaded from DAVID) as the GO functional categories to create a map that highlights strong genetic trends both within and between these complexes. This result can be found in the Supplementary Material.
Now let us turn to the analysis of the sparse matrix $\mathbf{S}$. The sparse matrix gives two distinct measures for exploiting genetic information. First, extreme $S$ scores indicate cofunctional membership more efficiently [27]. Second, some $S$ scores
[Figure 4 graphic: network of functional-process nodes with heat maps of gene-level interaction scores; the heat map axes list individual gene names. Color legend: negative interactions, positive interactions. Node labels: RNA splicing; Transcription; DNA metabolic process; Cellular protein modification process; Translation; Cellular protein degradation; Transport; Mitochondrion organization; Ribosome; Chromosome organization; Response to stress; Cell-division cycle; Protein folding.]

Figure 4: Global view of the genetic cross talks between different RNA-related processes (GO slim items). Green and red represent a statistically significant enrichment of negative (genetic interaction score $S \le -2.5$) and positive (genetic interaction score $S > 2.0$) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
indicate significant genetic interactions between genes in different gene sets. Following this clue, by analyzing the matrix $\mathbf{S}$ we found much evidence of genes involved in the same functional modules and many cross talks between functional modules (Figure 6). Many of them support the networks in Figures 4 and 5. The information on gene pairs with extreme $S$ scores and the modules involved can be found in the Supplementary Material (Supplementary Data 2).
Strategy 2, for the analysis of the sparse matrix, is similar to strategy 1 (Figure 3). First, for every row module (defined as in Figure 3), cluster the column genes based on their genetic interaction profiles across the genes in this row module. Then select the column gene sets that contain genes belonging to nonzero entries of the sparse matrix and define them as column modules. Finally, map these column modules to GO items to identify their enriched functional categories (hypergeometric test). Similarly, for all row modules repeat
[Figure 5 graphic: network of functional-process nodes with heat maps of gene-level interaction scores; the heat map axes list individual gene names. Node labels are GO BP FAT process names, including RNA splicing; Transcription; RNA transport; RNA localization; Intracellular transport; Ribonucleoprotein complex biogenesis; tRNA modification; tRNA wobble uridine modification; tRNA metabolic process; rRNA metabolic process; mRNA metabolic process; ncRNA metabolic process; Chromatin modification; Regulation of translation; Cellular protein localization; Cellular protein complex assembly; Tubulin complex assembly; Maturation of SSU rRNA; RNA capping; RNA processing.]

Figure 5: Global view of the genetic cross talks between different RNA-related processes (GO BP FAT). Green and red represent a statistically significant enrichment of negative (genetic interaction score $S \le -2.5$) and positive (genetic interaction score $S > 2.0$) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)
Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152 [28], as well as physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects the cross talk between them. Another striking finding is the clearly negative interactions between protein folding and mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4, as reported in [9, 29, 30], with GIM5, as reported in [29], and with YKE2, as reported in [29, 30]. NAB2 was clustered together with PAN3 and showed no obvious genetic interactions with protein folding genes in the original dataset, but it does have physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although genes involved in regulation of transcription show low $S$ scores with each other, they are enriched in the same GO functional item, regulation of transcription ($p$ value = 0.017813). More results can be found in the Supplementary Material.
Figure 6: Functional associations between functional modules identified by sparse information. Panels: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score $S \le -2.5$) and positive (genetic interaction score $S > 2.0$) interactions. The emphasized (thicker) lines are the significant $S$ scores in the sparse matrix.
6. Conclusion
In this paper we have introduced a method named "LRSDec" to identify gene modules and the cross talks between them in a genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part $\mathbf{L}$ and the sparse part $\mathbf{S}$ of the original matrix $\mathbf{X}$. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, "GoDec", which has been reported to outperform other existing decomposition algorithms [13]. We then applied our algorithm to a genetic interaction dataset to identify modules and the cross talks between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which other decomposition algorithms such as GoDec cannot accomplish. LRSDec is not limited to yeast genetic interaction data. As long as a dataset has internal low-rank structure and some sparse information, the LRSDec algorithm can decompose the data matrix into the sum of two matrices, which can then be analyzed separately. The algorithm could be used widely, for example in genetic interaction data analysis and image processing. We also tried it on the genetic interaction data of C. elegans in the Supplementary Material.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, "XV. The correlation between relatives on the supposition of Mendelian inheritance," Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., "Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map," Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St. Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, "Defining genetic interaction," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., "Systematic genetic analysis with ordered arrays of yeast deletion mutants," Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., "A robust toolkit for functional profiling of the yeast genome," Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, "A strategy for extracting and analyzing large-scale quantitative epistatic interaction data," Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., "Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae," Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., "Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast," Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., "A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing," Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, "A gradient descent algorithm on the Grassman manifold for matrix completion," http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis," Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, "GoDec: randomized low-rank & sparse matrix decomposition in noisy case," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., "Global mapping of the yeast genetic interaction network," Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, "Imputing missing values for genetic interaction data," Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, "How and when should interactome-derived clusters be used to predict functional modules and protein function?" Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., "Genetic and physical maps of Saccharomyces cerevisiae," Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, "A splicing-dependent transcriptional checkpoint associated with prespliceosome formation," Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., "Yeast and human mitochondrial helicases," Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, "Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes," ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, "The yeast SR-like protein Npl3 links chromatin modification to mRNA processing," PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, "Intronic noncoding RNAs and splicing," Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, "Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11," Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, "Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae," Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, "Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11," Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, "Integrating multiple types of data to predict novel cell cycle-related genes," BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, "Lumenal interactions in nuclear pore complex assembly and stability," Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., "The genetic landscape of a cell," Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., "Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, "Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure," The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
structure. On the other hand, the cross talks between modules are usually indicated by gene pairs with high $S$ scores (so that the genetic interaction is significant). Removing them results in better low-rank structure. Figuratively, these gene pairs are like shadows over the low-rank matrix that connect different rank areas. These cross talks reveal the relationships between different biological processes or protein complexes. Meanwhile, gene pairs exhibiting high absolute values of $S$ scores may encode proteins that are physically associated or be enriched in protein-protein interactions [7–9]. The investigation of $S$ scores is therefore equally important. However, current methodologies for genetic interaction network analysis do not efficiently address these two important issues simultaneously.
In order to identify modules and between-module cross talks in genetic interaction networks, we employ "low-rank and sparse decomposition" (LRSDec) to decompose the EMAP data matrix into a low-rank part and a sparse part. We propose that the low-rank structure accounts for gene modules, in which genes have high correlations, and that the sparse matrix captures the significant $S$ scores. In particular, entries in the sparse matrix found by LRSDec correspond to two sources of biologically meaningful interactions: within-module interactions and between-module links. In this paper we focus our discussion of the sparse matrix on the results for between-module links, while the results for within-module interactions can be found in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/573956 (Supplementary Data 1).
Low-rank and sparse matrix structures have been studied extensively in matrix completion and compressed sensing [10, 11]. Robust principal component analysis (RPCA) [12] proved that the low-rank and sparse components of a matrix can be exactly recovered if it has a unique and precise "low-rank + sparse" decomposition. RPCA offers a blind separation of low-rank data and sparse noise: it assumes $\mathbf{X} = \mathbf{L} + \mathbf{S}$ (where $\mathbf{S}$ is the sparse noise) and exactly decomposes $\mathbf{X}$ into $\mathbf{L}$ and $\mathbf{S}$ without predefining $\operatorname{rank}(\mathbf{L})$ and $\operatorname{card}(\mathbf{S})$. Another successful matrix decomposition method, GoDec, studies the approximate "low-rank + sparse" decomposition of a matrix $\mathbf{X}$ by estimating the low-rank part $\mathbf{L}$ and the sparse part $\mathbf{S}$ from $\mathbf{X}$ while allowing noise, that is, $\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}$, and constrains the rank range of $\mathbf{L}$ and the cardinality range of $\mathbf{S}$ [13]. GoDec has been reported to outperform earlier algorithms.
In this paper we modified the GoDec matrix decomposition method and developed "low-rank and sparse decomposition" (LRSDec) to estimate the low-rank part $\mathbf{L}$ and the sparse part $\mathbf{S}$ of $\mathbf{X}$. LRSDec minimizes the nuclear norm of $\mathbf{L}$ and predefines the cardinality range of $\mathbf{S}$ while accounting for the additive noise $\mathbf{e}$. Unlike GoDec, which directly constrains the rank range of $\mathbf{L}$, LRSDec minimizes a convex surrogate of the rank, namely the nuclear norm of $\mathbf{L}$. The nuclear norm has been shown to outperform the rank-restricted estimator [14]. Furthermore, in the presence of missing data, LRSDec can impute the missing entries while decomposing, with no need for data pretreatment, whereas GoDec cannot perform decomposition and imputation simultaneously. We then stated the convergence properties of our algorithm and proved that, given the two regularization parameters, the objective value of LRSDec decreases monotonically. By applying both methods to a synthetic dataset, we demonstrated the superiority of LRSDec over GoDec. Finally, we analyzed a genetic interaction dataset (EMAP) using our algorithm and identified many biologically meaningful modules and the cross talks between them.
2. Model
Let $\mathbf{X}$ be an $m \times n$ matrix that represents a genetic interaction dataset, where $m$ is the number of query genes and $n$ is the number of library genes. We propose to decompose $\mathbf{X}$ as
$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}, \tag{1}$$
where $\mathbf{L} \in \mathbb{R}^{m \times n}$ denotes the low-rank part, $\mathbf{S} \in \mathbb{R}^{m \times n}$ denotes the sparse part, and $\mathbf{e}$ is the noise. Here $\mathbf{L}$ accounts for modules in which genes are highly correlated. These modules correspond to protein complexes and biological pathways in which genes tend to share similar genetic interaction profiles [15]. $\mathbf{S}$ accounts for significant $S$ scores, which arise either from gene pairs in the same module that have genetic interactions or from cross talks among different functional modules.
Based on the assumptions above, we propose to solve the following optimization problem:
$$\begin{aligned} &\text{minimize } \operatorname{rank}(\mathbf{L}), \quad \text{minimize } \operatorname{card}(\mathbf{S}) \\ &\text{subject to } \sum_{(i,j)} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta, \end{aligned} \tag{2}$$
where $\delta \ge 0$ is a regularization parameter that controls the error tolerance and $\operatorname{card}(\mathbf{S})$ denotes the number of nonzero entries in the matrix $\mathbf{S}$.
To make the minimization problem tractable, we relax the rank operator on $\mathbf{L}$ with the nuclear norm, which has been proven to be an effective convex surrogate of the rank operator [14]:
$$\begin{aligned} &\text{minimize } \|\mathbf{L}\|_*, \quad \text{minimize } \operatorname{card}(\mathbf{S}) \\ &\text{subject to } \sum_{(i,j)} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta, \end{aligned} \tag{3}$$
where $\|\mathbf{L}\|_*$ is the nuclear norm of $\mathbf{L}$ ($\|\mathbf{L}\|_* = \sum_{i=1}^{r} \sigma_i$, where $\sigma_1, \ldots, \sigma_r$ are the singular values of $\mathbf{L}$ and $r$ is the rank of $\mathbf{L}$).
However, missing data are commonly encountered in EMAP data, confounding techniques such as cluster analysis and matrix factorization. Here we extend our basic model (3) to handle EMAP data with missing values by imputing the missing entries of the matrix while simultaneously estimating the low-rank matrix $\mathbf{L}$ and the sparse matrix $\mathbf{S}$. Suppose that we observe only a subset of the entries of $\mathbf{X}$, indexed by $\Omega$, and that the missing entries are indexed by $\Omega^{\perp}$. In order to find a low-rank matrix $\mathbf{L}$ and a sparse matrix $\mathbf{S}$ based on the observed data, we propose to solve the following optimization problem:
$$\begin{aligned} &\text{minimize } \|\mathbf{L}\|_*, \quad \text{minimize } \operatorname{card}(\mathbf{S}) \\ &\text{subject to } \sum_{(i,j) \in \Omega} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta. \end{aligned} \tag{4}$$
3. Algorithm
Similar to GoDec, the optimization problem (3) can be solved by alternately optimizing the following two subproblems until convergence:
$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right\|_F^2, \tag{5a}$$
$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right\|_F^2. \tag{5b}$$
In each iteration we optimize the objective function by alternately updating $\mathbf{L}$ and $\mathbf{S}$. First, the subproblem (5a) can be solved as in [14]. For fixed $\mathbf{S}$, the solution of (5a) is
$$\mathbf{L}_t = T_\lambda\left(\mathbf{X} - \mathbf{S}_{t-1}\right). \tag{6}$$
Here $\lambda \ge 0$ is a regularization parameter controlling the nuclear norm of the estimate $\mathbf{L}_t$, where
$$T_\lambda(\mathbf{W}) = \mathbf{U}\mathbf{D}_\lambda\mathbf{V}', \quad \text{with } \mathbf{D}_\lambda = \operatorname{diag}\left[(d_1 - \lambda)_+, \ldots, (d_r - \lambda)_+\right]. \tag{7}$$
$\mathbf{U}\mathbf{D}\mathbf{V}'$ is the singular value decomposition (SVD) of $\mathbf{W}$, and $t_+ = \max(t, 0)$. The notation $T_\lambda(\mathbf{W})$ refers to soft-thresholding [14].
Next, the subproblem (5b) of (3) can be solved via entry-wise hard thresholding of $\mathbf{X} - \mathbf{L}_t$ for fixed $\mathbf{L}_t$. Before giving the solution, we define an orthogonal projection operator $P$. Suppose that a subset of the entries of a matrix $\mathbf{W}$ is indexed by $\Omega$; then $\mathbf{W}$ can be projected onto the linear space of matrices supported by $\Omega$:
$$P_\Omega(\mathbf{W})_{(i,j)} = \begin{cases} W_{ij} & \text{if } (i,j) \in \Omega, \\ 0 & \text{if } (i,j) \notin \Omega, \end{cases} \tag{8}$$
and $P_{\Omega^\perp}$ is its complementary projection, that is, $P_\Omega(\mathbf{W}) + P_{\Omega^\perp}(\mathbf{W}) = \mathbf{W}$. Then the solution of (5b) is given by
$$\mathbf{S}_t = P_\Theta\left(\mathbf{X} - \mathbf{L}_t\right), \tag{9}$$
where $P$ is the orthogonal projection operator defined above and $\Theta$ is the nonzero subset of the $k$ largest entries of $|\mathbf{X} - \mathbf{L}_t|$. The matrix $\mathbf{X} - \mathbf{L}_t$ is thus projected onto the linear space of matrices supported by $\Theta$:
$$P_\Theta\left(\mathbf{X} - \mathbf{L}_t\right)_{(i,j)} = \begin{cases} \left(\mathbf{X} - \mathbf{L}_t\right)_{ij} & \text{if } (i,j) \in \Theta, \\ 0 & \text{if } (i,j) \notin \Theta. \end{cases} \tag{10}$$
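For orientation, the two thresholding rules can be read as closed-form minimizers. The penalized form below is the standard characterization of soft-thresholding given in [14]; the factor $\tfrac{1}{2}$ and the small numerical examples are illustrative choices rather than values taken from this paper.

```latex
% Soft-thresholding (7) is the minimizer of a nuclear-norm-penalized fit:
T_\lambda(\mathbf{W}) \;=\; \arg\min_{\mathbf{L}}\;
  \tfrac{1}{2}\,\|\mathbf{W}-\mathbf{L}\|_F^2 \;+\; \lambda\,\|\mathbf{L}\|_* .

% Illustrative example: singular values (5, 2, 0.5) with \lambda = 1
(d_1, d_2, d_3) = (5,\ 2,\ 0.5)
  \;\longmapsto\;
\big((5-1)_+,\ (2-1)_+,\ (0.5-1)_+\big) = (4,\ 1,\ 0),
% so the rank drops from 3 to 2 and the nuclear norm from 7.5 to 5.

% Hard thresholding (9)-(10) with k = 2 on residual entries (3, -0.2, 0.1, -4):
(3,\ -0.2,\ 0.1,\ -4) \;\longmapsto\; (3,\ 0,\ 0,\ -4).
```

Soft-thresholding shrinks every retained singular value by $\lambda$, whereas hard thresholding leaves the retained entries unchanged, so only the $\mathbf{L}$-step introduces shrinkage.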
So far we have developed the algorithm for solving problem (3). For problem (4), because of the missing values, we carry out the optimization on the observed entries $\Omega$. We update $\mathbf{L}_t$ and $\mathbf{S}_t$ by solving the following optimization subproblems, respectively:
$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|P_\Omega\left(\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right)\right\|_F^2, \tag{11a}$$
$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|P_\Omega\left(\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right)\right\|_F^2. \tag{11b}$$
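The term $\left\|P_\Omega\left(\mathbf{X} - \mathbf{L} - \mathbf{S}\right)\right\|_F^2$ is the sum of squared errors on the observed entries indexed by $\Omega$.
The subproblem (11a) can be solved by updating $\mathbf{L}$, from an arbitrary initialization, using [14]
$$\mathbf{L}_t \longleftarrow T_\lambda\left(P_\Omega\left(\mathbf{X} - \mathbf{S}_{t-1}\right) + P_{\Omega^\perp}(\mathbf{L})\right). \tag{12}$$
The solution of subproblem (11b) is
$$\mathbf{S}_t = P_\Theta\left(P_\Omega\left(\mathbf{X} - \mathbf{L}_t\right)\right), \tag{13}$$
where $\Theta$ is the nonzero subset of the $k$ largest entries of $\left|P_\Omega\left(\mathbf{X} - \mathbf{L}_t\right)\right|$.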
Now we have the following algorithm.
Algorithm 1 (LRSDec).
(i) Input: $\mathbf{X} \in \mathbb{R}^{m \times n}$. Initialize $\mathbf{S} \leftarrow \mathbf{0}$.
(ii) Iterate until convergence:
(a) $\mathbf{L}$-step: iteratively update $\mathbf{L}$ using (12);
(b) $\mathbf{S}$-step: solve for $\mathbf{S}$ using (13).
(iii) Output: $\mathbf{L}$, $\mathbf{S}$.
The convergence analysis of our algorithm is provided in the Supplementary Material.
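Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the boolean mask `observed` standing for $\Omega$, the use of a single update (12) per outer iteration, and the stopping rule (`max_iter`, `tol`) are all choices made here for the example.

```python
import numpy as np


def soft_threshold_svd(W, lam):
    """T_lambda(W) from (7): shrink every singular value by lam, floor at 0."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(d - lam, 0.0)) @ Vt


def keep_k_largest(R, k):
    """P_Theta(R) from (10): keep the k largest-magnitude entries of R."""
    S = np.zeros_like(R)
    k = min(int(k), R.size)
    if k > 0:
        idx = np.argpartition(np.abs(R).ravel(), -k)[-k:]
        S.flat[idx] = R.flat[idx]
    return S


def lrsdec(X, observed, lam, k, max_iter=500, tol=1e-8):
    """Alternate the L-step (12) and the S-step (13) on the observed entries.

    `observed` is a boolean mask for Omega. `max_iter` and `tol` are
    illustrative stopping controls (assumptions, not from the paper).
    """
    X0 = np.where(observed, X, 0.0)  # ignore whatever sits in missing cells
    L = np.zeros_like(X0)
    S = np.zeros_like(X0)
    for _ in range(max_iter):
        L_old = L
        # L-step: observed cells carry X - S, missing cells carry the current L.
        L = soft_threshold_svd(np.where(observed, X0 - S, L), lam)
        # S-step: k largest residuals, restricted to the observed cells.
        S = keep_k_largest(np.where(observed, X0 - L, 0.0), k)
        if np.linalg.norm(L - L_old) <= tol * max(1.0, np.linalg.norm(L_old)):
            break
    return L, S
```

A call such as `L, S = lrsdec(X, ~np.isnan(X), lam, k)` returns the two parts, and `L + S` then gives a completed matrix in which the missing cells are filled by the low-rank estimate.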
4. Parameter Tuning
We have two parameters that need to be tuned in our model: $\lambda$ and $k$. Here we propose a 10-fold cross validation strategy to select them. The idea is as follows. Let $\Omega$ be the index set of the observed entries of $\mathbf{X}$. We randomly partition $\Omega$ into 10 equal-size subsets and choose training entries $\Omega_1$ and testing entries $\Omega_2$, with $\Omega_1 \cup \Omega_2 = \Omega$, $\Omega_1 \cap \Omega_2 = \emptyset$, $|\Omega_1| = 0.9\,|\Omega|$, and $|\Omega_2| = 0.1\,|\Omega|$. We solve problem (14) on a grid of $(\lambda, k)$ values on the training data:
$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|P_{\Omega_1}\left(\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right)\right\|_F^2, \tag{14a}$$
$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|P_{\Omega_1}\left(\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right)\right\|_F^2. \tag{14b}$$
Then we evaluate the prediction error (15) on the testing data:
$$\operatorname{Err}(\lambda, k) = \frac{1}{2}\left\|P_{\Omega_2}\left(\mathbf{X} - \mathbf{L}(\lambda, k) - \mathbf{S}(\lambda, k)\right)\right\|_F^2. \tag{15}$$
The cross validation process is repeated 10 times. We then take the optimal parameters $(\lambda^*, k^*)$ that minimize the mean of the prediction error.
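The cross validation procedure can be sketched as follows. This is an illustrative outline only: `decompose` is a hypothetical callable with signature `decompose(X, mask, lam, k) -> (L, S)` standing for any solver of (14), and the fold construction and the random seed are assumptions made for the example.

```python
import numpy as np


def cv_error(X, observed, decompose, lam, k, n_folds=10, seed=0):
    """Mean held-out error (15) over n_folds random splits of Omega."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(observed)
    folds = np.array_split(rng.permutation(rows.size), n_folds)
    errors = []
    for held_out in folds:
        r, c = rows[held_out], cols[held_out]   # testing entries Omega_2
        train = observed.copy()
        train[r, c] = False                     # training entries Omega_1
        L, S = decompose(X, train, lam, k)
        diff = X[r, c] - L[r, c] - S[r, c]
        errors.append(0.5 * np.sum(diff ** 2))
    return float(np.mean(errors))


def select_parameters(X, observed, decompose, lams, ks, **kwargs):
    """Evaluate every (lam, k) pair on the grid and return the best pair."""
    scores = {
        (lam, k): cv_error(X, observed, decompose, lam, k, **kwargs)
        for lam in lams
        for k in ks
    }
    return min(scores, key=scores.get), scores
```

Because each fold refits the model, the cost is the grid size times the number of folds times the cost of one decomposition, so a coarse grid refined around the best pair is a reasonable way to keep the search affordable.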
Figure 1: Results of parameter tuning on synthetic data. $x$-axis: different cardinality, corresponding to parameter $k$; $y$-axis: different rank, corresponding to parameter $\lambda$; $z$-axis: the relative error of decomposition, $\|\mathbf{X} - \hat{\mathbf{X}}\|_F^2 / \|\mathbf{X}\|_F^2$. The plot marks the minimum of the relative error (annotated values 25 and 250).
5. Results
5.1. Synthetic Data. We simulated a synthetic dataset and then applied the LRSDec and GoDec algorithms to it. Specifically, the low-rank part, the sparse part, and the noise were generated as follows:
(i) Low-rank part: the covariance matrix $\Sigma$ is generated as $\mathbf{H}\mathbf{H}^T$, where $\mathbf{H} \in \mathbb{R}^{m \times K}$ and $H_{ij} \sim N(0, 1)$. Here $K$ is the number of hidden modules. The random columns $\mathbf{L}_j$ are drawn from $N(0, \tau\Sigma)$. Let $\mathbf{L} = [\mathbf{L}_1, \ldots, \mathbf{L}_n]$.
(ii) Sparse part: the nonzero entries of the sparse matrix are generated from the tail of the Gaussian distribution $N(1, 2)$ whose upper quantile is $\alpha = 0.01$. We randomly selected 70% of them and assigned them the opposite sign. This is consistent with EMAP datasets, in which negative genetic interactions are much more prevalent than positive ones.
(iii) Noise: $\mathbf{e} = 10^{-2} \cdot \mathbf{F}$, where $\mathbf{F}$ is a standard Gaussian matrix.
A low-rank matrix $\mathbf{L}$ with rank 25 and a sparse matrix $\mathbf{S}$ with cardinality 250 were generated. Now we have
$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}. \tag{16}$$
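The generation steps (i)-(iii) can be sketched as below. Several details are assumptions made for the example because the text does not fix them: the matrix size ($m$, $n$), the value of $\tau$, reading the 2 in $N(1, 2)$ as a variance, sampling the tail by inverse-CDF draws above the $1 - \alpha$ quantile, and the random seed. The function name `make_synthetic` is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def make_synthetic(m=200, n=200, K=25, card=250, tau=1.0, alpha=0.01,
                   neg_frac=0.7, noise_scale=1e-2, seed=0):
    """Return (X, L, S, e) with X = L + S + e as in (16)."""
    rng = np.random.default_rng(seed)

    # (i) Columns L_j ~ N(0, tau * H H^T), i.e. L_j = sqrt(tau) * H z_j
    #     with z_j ~ N(0, I_K); the result has rank at most K.
    H = rng.standard_normal((m, K))
    L = np.sqrt(tau) * (H @ rng.standard_normal((K, n)))

    # (ii) Magnitudes from the upper alpha tail of N(1, 2) (2 read as variance).
    tail = NormalDist(mu=1.0, sigma=2.0 ** 0.5)
    u = rng.uniform(1.0 - alpha, 1.0, size=card)
    u = np.minimum(u, np.nextafter(1.0, 0.0))  # keep probabilities below 1
    values = np.array([tail.inv_cdf(float(p)) for p in u])
    # Flip the sign of 70% of the entries so that negatives dominate.
    flip = rng.choice(card, size=int(round(neg_frac * card)), replace=False)
    values[flip] *= -1.0
    S = np.zeros((m, n))
    S.flat[rng.choice(m * n, size=card, replace=False)] = values

    # (iii) Small dense Gaussian noise.
    e = noise_scale * rng.standard_normal((m, n))
    return L + S + e, L, S, e
```

Writing the low-rank part as $\sqrt{\tau}\,\mathbf{H}\mathbf{Z}$ avoids forming the $m \times m$ covariance matrix while giving the same column distribution.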
The first step is parameter tuning, and the result is shown in Figure 1. Minimal prediction error was achieved when the rank associated with $\lambda$ was 25 and $k = 250$, which coincide with the rank and cardinality of the synthetic data. This demonstrates the effectiveness of the cross validation procedure.
Next, we compared the performance of the LRSDec and GoDec algorithms by comparing their prediction errors. The relative error is defined as
$$\frac{\left\|\mathbf{W} - \hat{\mathbf{W}}\right\|_F^2}{\left\|\mathbf{W}\right\|_F^2}, \tag{17}$$
where $\mathbf{W}$ is the original matrix and $\hat{\mathbf{W}}$ is an estimate (approximation) of it. As both algorithms are strongly influenced by their parameters, we compared the relative errors of the two algorithms under different parameter settings. To keep the comparison simple, we varied one parameter at a time with the other held fixed (Figure 2). Both algorithms reach their smallest relative error when the two correct parameters are used, and LRSDec outperforms GoDec. In the Supplementary Material we also compare the performance of both algorithms under different noise settings, and the trends are the same.
5.2. Application to EMAP Data from Yeast. We also applied our method to EMAP data interrogating RNA processing in S. cerevisiae, which cover 552 genes involved in RNA-related processes [9]. This genetic map contains about 152,000 pairwise genetic interaction measurements, with about 29% of the entries in the data matrix missing. We applied our method to this EMAP dataset, denoted $\mathbf{X}$, to obtain two matrices: a low-rank matrix $\mathbf{L}$ and a sparse matrix $\mathbf{S}$. $\hat{\mathbf{X}} = \mathbf{L} + \mathbf{S}$ is the new complete data matrix with imputed missing entries. To exploit the quantitative information in the EMAP data, we first subjected the entire low-rank matrix $\mathbf{L}$ to hierarchical clustering, an approach that groups genes with similar patterns of genetic interactions. It should be noted that using the low-rank matrix $\mathbf{L}$ for clustering improved the performance of hierarchical clustering in detecting genetic interaction modules [16]. According to the clustering result, we reordered the rows and columns of matrix $\mathbf{X}$ so that the protein complexes and biological processes shown in [9] could be found (Figure S1).
To help identify more modules of cellular functions andprocesses and reveal the relationships between them we fur-ther analyzed the matrices L and S Figure 3 is a flowchart ofour strategy 1 to detectmodules and cross talks between themthrough low-rank matrix L In this paper we define moduleas a cluster from hierarchical clustering (HC) that passesthrough GO-enriched filtering (Supplementary Section 3)The details are as follows Firstly we clustered the row genesof matrix L using hierarchical clustering (HC) Here weadopted the average-linkage hierarchical clustering algorithmin which the distance of gene 119860 and gene 119861 is defined as1 minus |cor(119860 119861)| where |cor(119860 119861)| is the absolute value of thecorrelation of genetic interaction profile of gene 119860 and gene119861 A cutoff needs to be applied for HC to cut the hierarchicalclustering tree We used the Jaccard index (SupplementarySection 4) to determine how well the predicted gene setscorrespond to benchmark (theoretical) gene sets [17] HereGO functional gene sets are used as benchmark (theoretical)gene setsThe cutoff atwhichHCachieved the highest Jaccardindex is used to cut the hierarchical tree We calculatedthe Jaccard index of every ldquoheightrdquo cutoff in hierarchicalclustering from 02 to 095 by 005 interval This step resultedin the best Jaccard index with height = 07 and 84 clustersNow we got the clusters of row genes in which genes act ina consistent manner across the entire column genes Thenwe filtered the clustering results by a hypergeometric testthat calculates the significance of enrichment of GO itemsand the 119901 value was set to 001 The clusters enriched in GOfunctional categories are defined as row modules Secondlyfor each row module we exploited modules of column genesbased on this row gene module in matrix X by clustering thecolumn genes of this submatrix of X Thirdly we screenedcolumn clusters whose interactions with the rowmodules are
BioMed Research International 5
0 50 100 150 200 250 300 3500
02
04
06
08
1
Number of ranks
The r
elat
ive e
rror
Relative error of matrix L
(a)
0 50 100 150 200 250 300 3500
02
04
06
08
1
12
14
Number of ranks
The r
elat
ive e
rror
Relative error of matrix S
(b)
0 100 200 300 400 500 6000
005
01
015
02
025
03
035
Number of cards
The r
elat
ive e
rror
Relative error of matrix L
GoDecLRSDec
(c)
0 100 200 300 400 500 6000
02
04
06
08
1
12
14
Number of cards
The r
elat
ive e
rror
Relative error of matrix S
GoDecLRSDec
(d)
Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card
HC row genes of L
Row modules
GO-enrichment filtering
Based oneach row module
HC column genes of L
Fisherrsquos exact testColumn modules
Connections between this row module and all column modules
Column modules that have functional
All column modules map to GO items
Connections among modules of the whole network
Repeat these three steps for
all significant
row modules
Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL
6 BioMed Research International
Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861
perp
denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861
Gene set 119861 (Gene set 119861)perp
Gene set 119860 119860119861 119860119861perp
(Gene set 119860)perp 119860perp
119861 119860perp
119861perp
Table 2: Clustering results.

(a) GO slim as benchmark gene set

Clusters    Low-rank matrix          Original matrix
            JC-index    Enriched     JC-index    Enriched
200         0.063       190          0.022       46
150         0.070       138          0.032       44
100         0.084       90           0.050       50
50          0.088       30           0.067       24

(b) GO BP FAT as benchmark gene set

Clusters    Low-rank matrix          Original matrix
            JC-index    Enriched     JC-index    Enriched
200         0.137       183          0.044       47
150         0.147       142          0.058       50
100         0.155       96           0.091       52
50          0.131       34           0.078       26

A hypergeometric test was applied to test the enrichment of gene sets. Significance level: FDR <= 0.05. Clusters: the number of clusters at which the hierarchical clustering tree is cut. Enriched: the number of modules predicted by hierarchical clustering that are enriched in GO terms.
significantly enriched by Fisher's exact test (Table 1, p value = 0.05). Here we defined positive genetic interactions as gene pairs with genetic interaction scores S ≥ 2.0 and negative ones as gene pairs with S ≤ −2.5 [9]. The reduced gene sets of column genes were defined as column modules (corresponding to the row module). Next, we identified the enriched GO functional categories of these column modules by mapping them to GO items (hypergeometric test). Finally, repeating these steps for all row modules, we identified the modules and the intermodule genetic cross talk of the whole genetic interaction network (Figures 4–6), where red and green represent a statistically significant enrichment of positive and negative interactions, respectively. The cross talks shown in these figures are the S scores among genes in the original matrix. In the following, we discuss many of the interesting connections that have been reported previously.
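The screening step can be illustrated with a small sketch of Fisher's exact test on the 2×2 layout of Table 1. This is not the authors' code: the function name and the choice of a one-sided (enrichment) alternative are assumptions made for illustration, and the text does not state which alternative was used.

```python
from math import comb

def fisher_enrichment_p(ab, ab_c, a_c_b, a_c_b_c):
    """One-sided Fisher's exact test p-value for the 2x2 table of Table 1.

    ab      : connections between gene set A and gene set B
    ab_c    : connections between A and the complement of B
    a_c_b   : connections between the complement of A and B
    a_c_b_c : connections between the two complements
    Returns P(count in the A-B cell >= ab) under the hypergeometric null.
    (Hypothetical helper; the one-sided alternative is an assumption.)
    """
    row_a = ab + ab_c                      # all connections involving A
    col_b = ab + a_c_b                     # all connections involving B
    total = ab + ab_c + a_c_b + a_c_b_c    # all connections in the table
    upper = min(row_a, col_b)              # largest possible A-B count
    denom = comb(total, col_b)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(comb(row_a, x) * comb(total - row_a, col_b - x)
               for x in range(ab, upper + 1))
    return tail / denom
```

A column cluster would then be retained when this p value falls below the chosen threshold (0.05 in the text).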
The low-rank matrix found more functional modules than the original matrix (Table 2). In Table 2, we cut the dendrogram at different heights and compared the clusters obtained from the low-rank matrix with those obtained from the original matrix. The Jaccard index [17] is used to determine how well the predicted clusters recapture the benchmark gene sets ((a) GO slim and (b) GO BP FAT). The definition of the Jaccard index can be found in the Supplementary Material. In the ideal situation, where predicted clusters perfectly match the benchmark gene sets, the Jaccard index is 1. The larger the Jaccard index, the better the predictions. The clusters obtained from the low-rank matrix are more enriched with GO functional categories at varying cutoffs (Table 2).
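As a rough illustration of how such a score behaves, the sketch below computes one common pair-counting form of the Jaccard index: the number of gene pairs grouped together by both the prediction and the benchmark, divided by the number of pairs grouped together by either. This form is an assumption; the exact definition used in this paper is given in its Supplementary Material and may differ. The function names are hypothetical.

```python
from itertools import combinations

def co_membership_pairs(gene_sets):
    """All unordered gene pairs that share at least one set."""
    pairs = set()
    for genes in gene_sets:
        for a, b in combinations(sorted(genes), 2):
            pairs.add((a, b))
    return pairs

def pair_jaccard(predicted, benchmark):
    """Pair-counting Jaccard index between two collections of gene sets."""
    p = co_membership_pairs(predicted)
    b = co_membership_pairs(benchmark)
    union = p | b
    return len(p & b) / len(union) if union else 0.0
```

With this form, identical groupings score 1, and the score falls as the prediction groups pairs that the benchmark separates, or the reverse.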
Figure 4 gives an overview of the relationships among biological processes when GO slim (downloaded from SGD [18]; a broad overview of all of the top GO categories) is used as the set of GO items. We found that several sets of genes known to function in the same biochemical processes contain predominantly positive or negative interactions, which was also observed in [9]. For example, genes classed as involved in RNA splicing and transcription are significantly enriched in negative genetic interactions (Figure 4, green nodes). In contrast, the module involved in protein folding has strong positive interactions (red node). In addition, Figure 4 also suggests that not all modules have a consistent pattern of interactions (yellow node), which is reasonable for biological processes. Finally, several connections have been previously discussed. For example, there is good evidence for functional interactions between splicing and transcription in [19] and between splicing and translation in [20]. Furthermore, [21] reported the cooperative relationship between protein folding and chromosome organization.
If we classify the GO functional categories into finer items (GO BP FAT, downloaded from DAVID, http://david.abcc.ncifcrf.gov), we obtain a more comprehensive network (Figure 5). Many of the interactions in Figure 5 have been reported before. For instance, genes involved in the tubulin complex assembly process have negative genetic interactions with genes involved in the tRNA wobble uridine modification process, as supported by SGD, and negative genetic interactions between the RNA splicing process and the tRNA metabolic process can also be found. The genetic interaction between RNA splicing and chromatin modification has been studied in [22]. The balance of the interactions between the processing of ribonucleoprotein assembly of intronic noncoding RNAs and the splicing process, which regulates the levels of ncRNA and host mRNA, can be found in [23]. The functional relation of tubulin to the roles of the Elongator complex in tRNA wobble uridine modification is supported by [24]. Moreover, epistasis and chromatin immunoprecipitation experiments indicating that the loss of Rrp6 (regulation of transcription) function is paralleled by the recruitment of Hda1 (histone deacetylase) have been reported in [25]. Finally, cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11 was reported in [26].
In an effort to gain insight into the functional organization of RNA-related complexes, we used GO CC FAT (downloaded from DAVID) as the GO functional categories to create a map that highlights strong genetic trends both within and between these complexes. This result can be found in the Supplementary Material.
Now let us turn to the analysis of the sparse matrix S. The sparse matrix gives two distinct measures for exploiting genetic information. First, extreme S scores indicate cofunctional membership more efficiently [27]. Second, some S scores
[Figure 4 network diagram omitted. Legend: negative interactions, positive interactions. Process nodes: RNA splicing, transcription, DNA metabolic process, cellular protein modification process, translation, cellular protein degradation, transport, mitochondrion organization, ribosome, chromosome organization, response to stress, cell-division cycle, and protein folding. Gene labels on the heat maps are not reproduced here.]

Figure 4: Global view of the genetic cross talks between different RNA-related processes (GO slim items). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
indicate significant genetic interactions between genes in different gene sets. Following this clue, by analyzing the matrix S we found much evidence of genes involved in the same functional modules and many cross talks between functional modules (Figure 6). Many of them support the network in Figures 4 and 5. The gene pairs with extreme S scores and the modules involved can be found in the Supplementary Material (Supplementary Data 2).
Strategy 2, for sparse matrix analysis, is similar to strategy 1 (Figure 3). First, for every row module (defined as in Figure 3), cluster the column genes based on their genetic interaction profiles across the genes in this row module. Then select the column gene sets that contain genes corresponding to nonzero entries of the sparse matrix, and define them as column modules. Finally, map these column modules to GO items to identify their enriched functional categories (hypergeometric test). Similarly, for all row modules, repeat
[Figure 5 network diagram omitted. Nodes are GO BP FAT processes, including RNA splicing, chromatin modification, tRNA wobble uridine modification, tubulin complex assembly, and RNA capping. The remaining process labels and the gene labels on the heat maps are not reproduced here.]

Figure 5: Global view of the genetic cross talks between different RNA-related processes (GO BP FAT). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
the above steps. We thus obtained the connections between different functional modules (Figure 6).

Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152 [28] and also physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects the cross talk between them. Another striking finding is the clearly negative interactions between protein folding and the mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4, as stated in [9, 29, 30], with GIM5, as stated in [29], and with YKE2, as stated in [29, 30]. NAB2 was clustered together with PAN3 but showed no obvious genetic interactions with protein folding genes in the original dataset; in fact, it has physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although genes involved in regulation of transcription have low S scores with each other, they are enriched in the same GO functional item, regulation of transcription (p value = 0.017813). More results can be found in the Supplementary Material.
[Figure 6 network panels omitted. Panel pairs: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process. Gene labels are not reproduced here.]

Figure 6: Functional associations between functional modules identified by sparse information. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions. The emphasized (thicker) lines are the significant S scores in the sparse matrix.
6 Conclusion
In this paper, we have introduced a method named "LRSDec" to identify gene modules and cross talks between them in the genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part L and the sparse part S of the original matrix X. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, "GoDec," which has been shown to outperform other existing decomposition algorithms [13]. Then we applied our algorithm to a genetic interaction dataset to identify modules and cross talks between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which other decomposition algorithms cannot do. LRSDec is not limited to yeast genetic interaction data. As long as the dataset has an internal low-rank structure and some sparse information, we can use the LRSDec algorithm to decompose the data matrix into the sum of two matrices and then analyze them separately. This algorithm could be used widely in fields such as genetic interaction data analysis and image processing. We also applied it to the genetic interaction data of C. elegans in the Supplementary Material.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, "XV. The correlation between relatives on the supposition of Mendelian inheritance," Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., "Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map," Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St. Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, "Defining genetic interaction," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., "Systematic genetic analysis with ordered arrays of yeast deletion mutants," Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., "A robust toolkit for functional profiling of the yeast genome," Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, "A strategy for extracting and analyzing large-scale quantitative epistatic interaction data," Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., "Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae," Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., "Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast," Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., "A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing," Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, "A gradient descent algorithm on the Grassman manifold for matrix completion," http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, "GoDec: randomized low-rank & sparse matrix decomposition in noisy case," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., "Global mapping of the yeast genetic interaction network," Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, "Imputing missing values for genetic interaction data," Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, "How and when should interactome-derived clusters be used to predict functional modules and protein function?" Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., "Genetic and physical maps of Saccharomyces cerevisiae," Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, "A splicing-dependent transcriptional checkpoint associated with prespliceosome formation," Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., "Yeast and human mitochondrial helicases," Biochimica et Biophysica Acta: Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, "Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes," ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, "The yeast SR-like protein Npl3 links chromatin modification to mRNA processing," PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, "Intronic noncoding RNAs and splicing," Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, "Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11," Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, "Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae," Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, "Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11," Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, "Integrating multiple types of data to predict novel cell cycle-related genes," BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, "Lumenal interactions in nuclear pore complex assembly and stability," Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., "The genetic landscape of a cell," Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., "Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, "Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure," The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
a sparse matrix S based on the observed data, we propose to solve the following optimization problem:

$$
\begin{aligned}
&\text{minimize} && \|\mathbf{L}\|_* \\
&\text{minimize} && \operatorname{card}(\mathbf{S}) \\
&\text{subject to} && \sum_{(i,j)\in\Omega} \left(X_{ij} - L_{ij} - S_{ij}\right)^2 \le \delta.
\end{aligned}
\tag{4}
$$
3 Algorithm
Similar to GoDec, the optimization problem of (3) can be solved by alternately optimizing the following two subproblems until convergence:

$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right\|_F^2, \tag{5a}$$

$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right\|_F^2. \tag{5b}$$
In each iteration, we optimize the objective function by alternately updating L and S. Firstly, the subproblem (5a) can be solved as in [14]. For fixed S, the solution of (5a) is

$$\mathbf{L}_t = T_\lambda\left(\mathbf{X} - \mathbf{S}_{t-1}\right). \tag{6}$$

Here $\lambda \ge 0$ is a regularization parameter controlling the nuclear norm of the estimate $\mathbf{L}_t$, where

$$T_\lambda(\mathbf{W}) = \mathbf{U}\mathbf{D}_\lambda\mathbf{V}' \quad \text{with} \quad \mathbf{D}_\lambda = \operatorname{diag}\left[(d_1 - \lambda)_+, \ldots, (d_r - \lambda)_+\right]. \tag{7}$$
$\mathbf{U}\mathbf{D}\mathbf{V}'$ is the singular value decomposition (SVD) of $\mathbf{W}$, with $\mathbf{D} = \operatorname{diag}[d_1, \ldots, d_r]$, and $t_+ = \max(t, 0)$. The notation $T_\lambda(\mathbf{W})$ refers to soft-thresholding [14].

Next, the subproblem (5b) in (3) can be solved via entry-wise hard thresholding of $\mathbf{X} - \mathbf{L}_t$ for fixed $\mathbf{L}_t$. Before giving the solution, we define an orthogonal projection operator $P$. Suppose there is a subset of the entries of $\mathbf{W}$ indexed by $\Omega$; then the matrix $\mathbf{W}$ can be projected onto the linear space of matrices supported by $\Omega$:

$$
P_\Omega(\mathbf{W})_{(i,j)} =
\begin{cases}
W_{ij}, & \text{if } (i,j) \in \Omega, \\
0, & \text{if } (i,j) \notin \Omega.
\end{cases}
\tag{8}
$$
$P_{\Omega^\perp}$ is its complementary projection; that is, $P_\Omega(\mathbf{W}) + P_{\Omega^\perp}(\mathbf{W}) = \mathbf{W}$. Then the solution of (5b) can be given as follows:

$$\mathbf{S}_t = P_\Theta\left(\mathbf{X} - \mathbf{L}_t\right), \tag{9}$$

where $P$ is the orthogonal projection operator defined above and $\Theta$ is the nonzero subset of the first $k$ largest entries of $|\mathbf{X} - \mathbf{L}_t|$. The matrix $\mathbf{X} - \mathbf{L}_t$ is thus projected onto the linear space of matrices supported by $\Theta$:

$$
P_\Theta\left(\mathbf{X} - \mathbf{L}_t\right)_{(i,j)} =
\begin{cases}
\left(\mathbf{X} - \mathbf{L}_t\right)_{ij}, & \text{if } (i,j) \in \Theta, \\
0, & \text{if } (i,j) \notin \Theta.
\end{cases}
\tag{10}
$$
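The projection of (8) and the hard-thresholding step of (9) and (10) can be sketched in a few lines. This is a minimal illustration on nested lists, not the authors' implementation; the function names are hypothetical, and ties among equal magnitudes are broken arbitrarily by sort order.

```python
def project(W, support):
    """P_support(W): keep entries whose (i, j) index is in `support`, zero the rest."""
    return [[W[i][j] if (i, j) in support else 0.0
             for j in range(len(W[0]))] for i in range(len(W))]

def top_k_support(R, k):
    """Index set Theta: positions of the k largest |R_ij| among nonzero entries."""
    idx = [(i, j) for i in range(len(R)) for j in range(len(R[0])) if R[i][j] != 0]
    idx.sort(key=lambda ij: abs(R[ij[0]][ij[1]]), reverse=True)
    return set(idx[:k])

def hard_threshold(R, k):
    """S-step of (9): project the residual R = X - L_t onto its top-k support."""
    return project(R, top_k_support(R, k))
```

For example, with the residual `[[0.1, -5.0, 0.0], [3.0, 0.2, -0.4]]` and k = 2, only the entries −5.0 and 3.0 are kept. Projecting onto Θ and onto its complement and adding the results recovers the residual, as stated for the complementary projection.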
So far we have developed the algorithm for solving problem (3). As for problem (4), due to the existence of missing values, we perform the optimization on the observed entries $\Omega$. We update $\mathbf{L}_t$ and $\mathbf{S}_t$ by solving the following optimization subproblems, respectively:

$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|P_\Omega\left(\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right)\right\|_F^2, \tag{11a}$$

$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|P_\Omega\left(\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right)\right\|_F^2. \tag{11b}$$
The term $\|P_\Omega(\mathbf{X} - \mathbf{L} - \mathbf{S})\|_F^2$ is the sum of squared errors on the observed entries indexed by $\Omega$.

The subproblem (11a) can be solved by updating $\mathbf{L}$ from an arbitrary initialization using [14]:

$$\mathbf{L}_t \longleftarrow T_\lambda\left(P_\Omega\left(\mathbf{X} - \mathbf{S}_{t-1}\right) + P_{\Omega^\perp}(\mathbf{L})\right). \tag{12}$$

The solution of subproblem (11b) is

$$\mathbf{S}_t = P_\Theta\left(P_\Omega\left(\mathbf{X} - \mathbf{L}_t\right)\right), \tag{13}$$

where $\Theta$ is the nonzero subset of the first $k$ largest entries of $|P_\Omega(\mathbf{X} - \mathbf{L}_t)|$.
Now we have the following algorithm.

Algorithm 1 (LRSDec).
(i) Input: $\mathbf{X} \in \mathbb{R}^{m \times n}$. Initialize $\mathbf{S} \leftarrow 0$.
(ii) Iterate until convergence:
    (a) L-step: iteratively update $\mathbf{L}$ using (12).
    (b) S-step: solve for $\mathbf{S}$ using (13).
(iii) Output: $\mathbf{L}$, $\mathbf{S}$.

The convergence analysis of our algorithms is provided in the Supplementary Material.
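The following NumPy sketch shows one way Algorithm 1 could be realized. It is an illustrative reading of updates (12) and (13), not the authors' code: the function names, the fixed number of inner L-step updates (`n_inner`), the stopping rule, and the default tolerances are all assumptions.

```python
import numpy as np

def soft_threshold_svd(W, lam):
    """T_lambda(W) of (7): shrink each singular value by lam, flooring at zero."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    d_shrunk = np.maximum(d - lam, 0.0)
    return (U * d_shrunk) @ Vt

def lrsdec_sketch(X, observed, lam, k, n_outer=50, n_inner=5, tol=1e-7):
    """Alternate the L-step (12) and the S-step (13) on a matrix with missing entries.

    X        : 2-D array; values at unobserved positions are ignored
    observed : boolean mask, True where X_ij is observed (the set Omega)
    n_inner, n_outer, tol : assumed iteration controls, not taken from the paper
    """
    X = np.where(observed, X, 0.0)
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_outer):
        L_prev = L
        # L-step: observed entries come from X - S, unobserved ones from the current L.
        for _ in range(n_inner):
            filled = np.where(observed, X - S, L)
            L = soft_threshold_svd(filled, lam)
        # S-step: keep the k largest-magnitude residuals on the observed entries.
        R = np.where(observed, X - L, 0.0)
        S = np.zeros_like(X)
        if k > 0:
            top = np.argsort(np.abs(R), axis=None)[-k:]
            S.flat[top] = R.flat[top]
        if np.linalg.norm(L - L_prev) <= tol * max(1.0, np.linalg.norm(L_prev)):
            break
    return L, S
```

After convergence, the unobserved entries of L + S serve as imputed values, which is the sense in which the decomposition also completes the matrix.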
4 Parameter Tuning
We have two parameters that need to be tuned in our models: $\lambda$ and $k$. Here we propose a 10-fold cross validation strategy to select them. The idea is as follows. Let $\Omega$ be the index set of observed entries of $\mathbf{X}$. We randomly partition $\Omega$ into 10 equal-size subsets and choose training entries $\Omega_1$ and testing entries $\Omega_2$, with $\Omega_1 \cup \Omega_2 = \Omega$, $\Omega_1 \cap \Omega_2 = \emptyset$, $|\Omega_1| = 0.9\,|\Omega|$, and $|\Omega_2| = 0.1\,|\Omega|$. We solve problem (14) on a grid of $(\lambda, k)$ values on the training data:

$$\mathbf{L}_t = \arg\min_{\|\mathbf{L}\|_*} \left\|P_{\Omega_1}\left(\mathbf{X} - \mathbf{L} - \mathbf{S}_{t-1}\right)\right\|_F^2, \tag{14a}$$

$$\mathbf{S}_t = \arg\min_{\operatorname{card}(\mathbf{S}) \le k} \left\|P_{\Omega_1}\left(\mathbf{X} - \mathbf{L}_t - \mathbf{S}\right)\right\|_F^2. \tag{14b}$$

Then we evaluate the prediction error (15) on the testing data:

$$\operatorname{Err}(\lambda, k) = \frac{1}{2}\left\|P_{\Omega_2}\left(\mathbf{X} - \mathbf{L}(\lambda, k) - \mathbf{S}(\lambda, k)\right)\right\|_F^2. \tag{15}$$

The cross validation process is repeated 10 times. Then we can find the optimal parameter pair $(\lambda^*, k^*)$, which minimizes the mean of the prediction error.
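The tuning procedure can be sketched as below. This is an illustration under assumptions: each of the 10 folds is used once as the test set, `fit` is a hypothetical placeholder for any solver that returns (L, S) from the training entries, and the function names are not from the paper.

```python
import random

def ten_fold_splits(observed_idx, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    idx = list(observed_idx)
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [ij for g, fold in enumerate(folds) if g != f for ij in fold]
        yield train, test

def prediction_error(X, L, S, test_idx):
    """Err of (15): half the squared error on the held-out entries."""
    return 0.5 * sum((X[i][j] - L[i][j] - S[i][j]) ** 2 for i, j in test_idx)

def select_parameters(X, observed_idx, grid, fit):
    """Return the (lam, k) pair in `grid` with the smallest mean held-out error.

    `fit(X, train_idx, lam, k)` is a hypothetical solver interface returning (L, S).
    """
    best, best_err = None, float("inf")
    for lam, k in grid:
        errs = []
        for train, test in ten_fold_splits(observed_idx):
            L, S = fit(X, train, lam, k)
            errs.append(prediction_error(X, L, S, test))
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best, best_err = (lam, k), mean_err
    return best, best_err
```

Because the grid search refits the model for every (λ, k) pair and every fold, the cost grows with the product of grid size and fold count, so a coarse grid refined around the minimum is a common practical choice.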
[Figure 1 surface plot omitted. Axes: rank of matrix L (0 to 350), card of matrix S (0 to 600), and relative error of decomposition. The minimum of the relative error is marked with the values 25 and 250.]

Figure 1: Results of parameter tuning on synthetic data. x-axis: different cardinality, corresponding to parameter $k$; y-axis: different rank, corresponding to parameter $\lambda$; z-axis: the relative error $\|\mathbf{X} - \hat{\mathbf{X}}\|_F^2 / \|\mathbf{X}\|_F^2$.
5 Results
5.1 Synthetic Data. We simulated a synthetic dataset and then applied the LRSDec algorithm and the GoDec algorithm to it. Specifically, the low-rank part, the sparse part, and the noise are generated as follows.

(i) Low-rank part: the covariance matrix $\Sigma$ is generated as $\mathbf{H}\mathbf{H}^T$, where $\mathbf{H} \in \mathbb{R}^{m \times K}$ and $H_{ij} \sim N(0, 1)$. Here $K$ is the number of hidden modules. The random columns $\mathbf{L}_j$ are drawn from $N(0, \tau\Sigma)$. Let $\mathbf{L} = [\mathbf{L}_1, \ldots, \mathbf{L}_n]$.

(ii) Sparse part: the nonzero entries of the sparse matrix are generated from the tail of the Gaussian distribution $N(1, 2)$ beyond its upper quantile $\alpha = 0.01$. We randomly selected 70% of them and assigned them the opposite sign. This is consistent with EMAP datasets, in which negative genetic interactions are much more prevalent than positive ones.

(iii) Noise: $\mathbf{e} = 10^{-2} \cdot \mathbf{F}$, where $\mathbf{F}$ is a standard Gaussian matrix.

A low-rank matrix $\mathbf{L}$ with rank 25 and a sparse matrix $\mathbf{S}$ with cardinality 250 were generated. Now we have

$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}. \tag{16}$$
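The generation scheme above can be sketched as follows, using only the standard library. Several details are assumptions made for illustration: N(1, 2) is read as mean 1 and variance 2, tail values are drawn by inverse-CDF sampling above the 1 − α quantile, and the function name, default τ = 1, and small matrix sizes are hypothetical.

```python
import random
from statistics import NormalDist

def make_synthetic(m, n, K, card, tau=1.0, alpha=0.01, neg_frac=0.7, seed=0):
    """Return (X, L, S) with X = L + S + e as in (16)."""
    rng = random.Random(seed)
    # Low-rank part: column L_j ~ N(0, tau * H H^T), drawn as sqrt(tau) * H z_j
    # with z_j ~ N(0, I_K), so that rank(L) is at most K.
    H = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(m)]
    Z = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(K)]
    scale = tau ** 0.5
    L = [[scale * sum(H[i][c] * Z[c][j] for c in range(K)) for j in range(n)]
         for i in range(m)]
    # Sparse part: `card` entries from the upper tail of N(1, 2) (variance 2 assumed).
    tail = NormalDist(mu=1.0, sigma=2.0 ** 0.5)
    S = [[0.0] * n for _ in range(m)]
    cells = rng.sample([(i, j) for i in range(m) for j in range(n)], card)
    n_neg = round(neg_frac * card)          # entries that receive the opposite sign
    for pos, (i, j) in enumerate(cells):
        u = min(rng.uniform(1.0 - alpha, 1.0), 1.0 - 1e-12)  # keep inv_cdf in range
        value = tail.inv_cdf(u)
        S[i][j] = -value if pos < n_neg else value
    # Noise: 1e-2 times a standard Gaussian matrix.
    E = [[1e-2 * rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    X = [[L[i][j] + S[i][j] + E[i][j] for j in range(n)] for i in range(m)]
    return X, L, S
```

Calling `make_synthetic(m, n, 25, 250)` with matrix dimensions of one's choosing would mimic the rank and cardinality reported in the text; the matrix dimensions used in the paper are not stated in this section.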
The first step is parameter tuning, and the result is shown in Figure 1. Minimal prediction error was achieved at rank 25 (the axis associated with $\lambda$ in Figure 1) and cardinality $k = 250$, which coincide with the rank and cardinality of the synthetic data. This demonstrates the effectiveness of the cross validation procedure.
Next, we compared the performance of the LRSDec algorithm and the GoDec algorithm by comparing their prediction errors. The relative error is defined as

$$\frac{\left\|\mathbf{W} - \hat{\mathbf{W}}\right\|_F^2}{\left\|\mathbf{W}\right\|_F^2}, \tag{17}$$

where $\mathbf{W}$ is the original matrix and $\hat{\mathbf{W}}$ is an estimate or approximation of it. As both algorithms are strongly influenced by their parameters, we compared the relative errors of the two algorithms under different parameter values. To keep the comparison simple, we changed only one parameter at a time, with the other parameter fixed (Figure 2). One can see that both algorithms reach the smallest relative error when the two correct parameters are adopted and that LRSDec outperforms GoDec. In the Supplementary Material, we also compare the performance of both algorithms under different noise settings, and the trends are the same.
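For concreteness, the relative error of (17) can be computed as in this short sketch (nested lists are used for simplicity; the function name is hypothetical).

```python
def relative_error(W, W_hat):
    """Squared Frobenius norm of W - W_hat divided by that of W, as in (17)."""
    num = sum((w - v) ** 2 for rw, rv in zip(W, W_hat) for w, v in zip(rw, rv))
    den = sum(w ** 2 for rw in W for w in rw)
    return num / den
```

A perfect estimate gives 0, and the all-zero estimate gives exactly 1, which is a useful baseline when reading the curves in Figure 2.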
5.2 Application to EMAP Data from Yeast. We also applied our method to EMAP data interrogating RNA processing in S. cerevisiae, which consists of 552 genes involved in RNA-related processes [9]. This genetic map contains about 152,000 pairwise genetic interaction measurements, with about 29% missing entries in the data matrix. We applied our method to this EMAP data, denoted as $\mathbf{X}$, to obtain two matrices: a low-rank matrix $\mathbf{L}$ and a sparse matrix $\mathbf{S}$. $\hat{\mathbf{X}} = \mathbf{L} + \mathbf{S}$ is the new complete data matrix with imputed missing entries. To exploit the quantitative information from EMAP data, we first subjected the entire low-rank matrix $\mathbf{L}$ to hierarchical clustering, an approach that groups genes with similar patterns of genetic interactions. It should be noted that using the low-rank matrix $\mathbf{L}$ for clustering improved the performance of hierarchical clustering in detecting genetic interaction modules [16]. According to the clustering result, we reordered the rows and columns of matrix $\mathbf{X}$ so that the protein complexes and biological processes shown in [9] could be found (Figure S1).
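Average-linkage clustering with the 1 − |cor(A, B)| distance used in this paper can be sketched with a naive agglomeration loop. This is for illustration only: the paper does not state which software was used, the function names are hypothetical, and the cubic-time loop would be too slow for hundreds of genes, where a library routine is preferable. Stopping once the closest pair of clusters is farther apart than `height` is equivalent to cutting the average-linkage dendrogram at that height.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage_clusters(profiles, height):
    """Cluster rows of `profiles` by average linkage on the distance 1 - |cor|."""
    n = len(profiles)
    dist = [[1.0 - abs(pearson(profiles[a], profiles[b])) for b in range(n)]
            for a in range(n)]
    clusters = [[i] for i in range(n)]

    def avg(c1, c2):
        # Average-linkage distance: mean of all pairwise distances between clusters.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, p, q = min((avg(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        if d > height:          # closest pair is above the cut: stop merging
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]
```

Because the distance uses the absolute correlation, strongly anticorrelated profiles are grouped together with strongly correlated ones.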
Figure 2: Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters. (a)-(b) Fixed cardinality, varying rank: relative error of matrix L (a) and of matrix S (b) against the number of ranks. (c)-(d) Fixed rank, varying cardinality: relative error of matrix L (c) and of matrix S (d) against the number of cards.

Figure 3: Flowchart of our strategy 1 for detecting modules and the cross talks between them in the genetic interaction network using the low-rank matrix L. Flowchart boxes: HC of row genes of L; GO-enrichment filtering; row modules; based on each row module, HC of column genes of L; Fisher's exact test; column modules; mapping of all column modules to GO items; connections between this row module and all column modules; repetition of these three steps for all significant row modules; connections among modules of the whole network.

Table 1: Fisher's exact test table. (Gene set A)^⊥ denotes the complementary set of gene set A; AB denotes the number of connections between gene set A and gene set B; AB^⊥ denotes the number of connections between gene set A and the complementary set of gene set B.

                    Gene set B    (Gene set B)^⊥
Gene set A          AB            AB^⊥
(Gene set A)^⊥      A^⊥B          A^⊥B^⊥

Table 2: Clustering results.

(a) GO slim as benchmark gene set

            Low-rank matrix         Original matrix
Clusters    JC-index    Enriched    JC-index    Enriched
200         0.063       190         0.022       46
150         0.070       138         0.032       44
100         0.084       90          0.050       50
50          0.088       30          0.067       24

(b) GO BP FAT as benchmark gene set

            Low-rank matrix         Original matrix
Clusters    JC-index    Enriched    JC-index    Enriched
200         0.137       183         0.044       47
150         0.147       142         0.058       50
100         0.155       96          0.091       52
50          0.131       34          0.078       26

A hypergeometric test was applied to test the enrichment of gene sets; significance level: FDR ≤ 0.05. Clusters: the number of clusters at which the hierarchical clustering tree is cut. Enriched: the number of modules predicted by hierarchical clustering that are enriched in GO terms.

To help identify more modules of cellular functions and processes and to reveal the relationships between them, we further analyzed the matrices L and S. Figure 3 is a flowchart of our strategy 1, which detects modules and the cross talks between them through the low-rank matrix L. In this paper, we define a module as a cluster from hierarchical clustering (HC) that passes GO-enrichment filtering (Supplementary Section 3). The details are as follows. Firstly, we clustered the row genes of matrix L using HC. We adopted the average-linkage hierarchical clustering algorithm, in which the distance between gene A and gene B is defined as $1 - |\mathrm{cor}(A, B)|$, where $|\mathrm{cor}(A, B)|$ is the absolute value of the correlation between the genetic interaction profiles of gene A and gene B. A cutoff must be chosen to cut the hierarchical clustering tree. We used the Jaccard index (Supplementary Section 4) to determine how well the predicted gene sets correspond to benchmark (theoretical) gene sets [17]; here, GO functional gene sets are used as the benchmark. The cutoff at which HC achieved the highest Jaccard index is used to cut the tree. We calculated the Jaccard index for every “height” cutoff from 0.2 to 0.95 in steps of 0.05. The best Jaccard index was obtained at height = 0.7, which gives 84 clusters. This yields clusters of row genes whose members act in a consistent manner across all column genes. We then filtered the clustering results with a hypergeometric test that calculates the significance of enrichment of GO items, with the p value threshold set to 0.01. The clusters enriched in GO functional categories are defined as row modules. Secondly, for each row module, we identified modules of column genes by clustering the column genes of the submatrix of X restricted to this row module. Thirdly, we screened for column clusters whose interactions with the row module are significantly enriched according to Fisher's exact test (Table 1; p value = 0.05). Here, we defined positive genetic interactions as gene pairs with genetic interaction scores $S \ge 2.0$ and negative ones as those with $S \le -2.5$ [9]. The reduced sets of column genes were defined as column modules (corresponding to the row module). Next, we identified the enriched GO functional categories of these column modules by mapping them to GO items (hypergeometric test). Finally, repeating these steps for all row modules, we identified the modules and the intermodule genetic cross talk of the whole genetic interaction network (Figures 4–6), where red and green represent a statistically significant enrichment of positive and negative interactions, respectively. The cross talks shown in these figures are the S scores among genes in the original matrix. In the following, we discuss many of the interesting connections that have been reported previously.
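The screening step with Fisher's exact test on the 2×2 table of Table 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes a one-sided (enrichment) test, which the text does not specify, and the names `classify_interaction` and `fisher_exact_greater` are hypothetical. The arguments map to Table 1 as a = AB, b = AB^⊥, c = A^⊥B, and d = A^⊥B^⊥.

```python
from math import comb


def classify_interaction(s_score, pos_cutoff=2.0, neg_cutoff=-2.5):
    """Label an S score with the cutoffs quoted in the text (S >= 2.0, S <= -2.5)."""
    if s_score >= pos_cutoff:
        return "positive"
    if s_score <= neg_cutoff:
        return "negative"
    return "neutral"


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X follows the hypergeometric distribution
    obtained by fixing the row and column totals of the table.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numer = sum(comb(row1, x) * comb(total - row1, col1 - x)
                for x in range(a, upper + 1))
    return numer / comb(total, col1)


# Toy table: 3 of the 4 connections of gene set A fall into gene set B.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(p_value)  # equals 17/70 for this table
```

The same tail sum underlies the hypergeometric enrichment test used for the GO filtering, since a one-sided Fisher's exact test is a hypergeometric tail probability.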
The low-rank matrix yielded more functional modules than the original matrix (Table 2). For Table 2, we cut the dendrogram at different heights and compared the clusters obtained from the low-rank matrix with those obtained from the original matrix. The Jaccard index [17] is used to determine how well the predicted clusters recapture the benchmark gene sets ((a) GO slim and (b) GO BP FAT); its definition can be found in the Supplementary Material. In the ideal situation, where the predicted clusters perfectly match the benchmark gene sets, the Jaccard index is 1; the larger the Jaccard index, the better the predictions. The clusters obtained from the low-rank matrix are more enriched in GO functional categories at varying cutoffs (Table 2).
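To make the Jaccard comparison concrete, the sketch below uses one common pairwise formulation: the set of gene pairs placed together by the prediction is compared with the set of gene pairs that share a benchmark gene set. The exact definition used in the paper is given in its Supplementary Material and may differ, so this formulation and the names `co_member_pairs` and `pairwise_jaccard` are assumptions.

```python
from itertools import combinations


def co_member_pairs(gene_sets):
    """All unordered gene pairs that share at least one gene set."""
    pairs = set()
    for genes in gene_sets:
        for g, h in combinations(sorted(set(genes)), 2):
            pairs.add((g, h))
    return pairs


def pairwise_jaccard(predicted, benchmark):
    """|P ∩ B| / |P ∪ B| over co-membership pairs (assumed formulation)."""
    p = co_member_pairs(predicted)
    b = co_member_pairs(benchmark)
    union = p | b
    if not union:
        return 0.0  # no pairs at all; the index is undefined, report 0
    return len(p & b) / len(union)


# Toy example with made-up gene labels.
predicted = [{"a", "b", "c"}, {"d", "e"}]
benchmark = [{"a", "b"}, {"c", "d", "e"}]
print(pairwise_jaccard(predicted, benchmark))  # 2 shared pairs out of 6
```

With this formulation the index is 1 exactly when the prediction and the benchmark induce the same co-membership pairs, in line with the ideal case described in the text.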
Figure 4 gives an overview of the relationships among biological processes when GO slim (downloaded from SGD [18]; a broad overview of all of the top GO categories) is used as the set of GO items. We found that several sets of genes known to function in the same biochemical processes contain predominantly positive or predominantly negative interactions, as was also observed in [9]. For example, genes classified as involved in RNA splicing and transcription are significantly enriched in negative genetic interactions (Figure 4, green nodes). In contrast, the module involved in protein folding has strong positive interactions (red node). In addition, Figure 4 suggests that not all modules have a consistent pattern of interactions (yellow node), which is reasonable for biological processes. Finally, several connections have been discussed previously. For example, there is good evidence for functional interactions between splicing and transcription [19] and between splicing and translation [20]. Furthermore, [21] reported a cooperative relationship between protein folding and chromosome organization.
If we classify the GO functional categories into finer items (GO BP FAT, downloaded from DAVID, http://david.abcc.ncifcrf.gov), we obtain a more comprehensive network (Figure 5). Many of the interactions in Figure 5 have been reported before. For instance, genes involved in the tubulin complex assembly process have negative genetic interactions with genes involved in the tRNA wobble uridine modification process, as supported by SGD, and negative genetic interactions between the RNA splicing process and the tRNA metabolic process can also be found. The genetic interaction between RNA splicing and chromatin modification has been studied in [22]. The balance of the interactions between the processing of ribonucleoprotein assembly of intronic noncoding RNAs and the splicing process, which regulates the levels of ncRNA and host mRNA, is described in [23]. A functional relation between tubulin and the roles of the Elongator complex in tRNA wobble uridine modification is supported by [24]. Moreover, epistasis and chromatin immunoprecipitation experiments indicating that the loss of Rrp6 (regulation of transcription) function is paralleled by the recruitment of Hda1 (histone deacetylase) have been reported in [25]. Finally, cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11 was reported in [26].
In an effort to gain insight into the functional organization of RNA-related complexes, we used GO CC FAT (downloaded from DAVID) as the GO functional categories to create a map that highlights strong genetic trends both within and between these complexes. This result can be found in the Supplementary Material.
Figure 4: Global view of the genetic cross talks between different RNA-related processes (GO slim items). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes. Process labels in the figure: RNA splicing, transcription, DNA metabolic process, cellular protein modification process, translation, cellular protein degradation, transport, mitochondrion organization, ribosome, chromosome organization, response to stress, cell-division cycle, and protein folding.

Now let us turn to the analysis of the sparse matrix S. The sparse matrix offers two distinct ways to exploit genetic information. First, extreme S scores indicate cofunctional membership more efficiently [27]. Second, some S scores indicate significant genetic interactions between genes in different gene sets. Following this clue, by analyzing the matrix S we found much evidence of genes involved in the same functional modules and many cross talks between functional modules (Figure 6). Many of these support the network in Figures 4 and 5. The gene pairs with extreme S scores and the modules involved are listed in the Supplementary Material (Supplementary Data 2).
Figure 5: Global view of the genetic cross talks between different RNA-related processes (GO BP FAT). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.

Strategy 2, for the analysis of the sparse matrix, is similar to strategy 1 (Figure 3). First, for every row module (defined as in Figure 3), cluster the column genes based on their genetic interaction spectra across the genes in this row module. Then, select the column gene sets that contain genes with nonzero entries in the sparse matrix and define them as column modules. Finally, map these column modules to GO items to identify their enriched functional categories (hypergeometric test). Repeating these steps for all row modules yields the connections between different functional modules (Figure 6).
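The selection step of strategy 2 can be sketched as a simple filter over column clusters. This sketch rests on an interpretation that the text leaves open: a column cluster is kept if at least one of its genes has a nonzero entry in S with some gene of the current row module. The sparse matrix is represented as a dictionary of nonzero entries, and the name `select_column_modules` and the gene labels are hypothetical.

```python
def select_column_modules(sparse_entries, row_module, column_clusters):
    """Keep column clusters that touch a nonzero entry of S for this row module.

    sparse_entries: dict mapping (row_gene, column_gene) -> nonzero S value
    row_module: iterable of row gene names
    column_clusters: list of lists of column gene names
    """
    row_genes = set(row_module)
    selected = []
    for cluster in column_clusters:
        has_nonzero = any(
            sparse_entries.get((r, c), 0.0) != 0.0
            for r in row_genes
            for c in cluster
        )
        if has_nonzero:
            selected.append(cluster)
    return selected


# Toy data with made-up gene labels and values.
S_entries = {("g1", "c1"): 3.1, ("g2", "c4"): -4.0}
clusters = [["c1", "c2"], ["c3", "c4"]]
print(select_column_modules(S_entries, ["g1"], clusters))
print(select_column_modules(S_entries, ["g1", "g2"], clusters))
```

Storing only the nonzero entries keeps the filter cheap, since the sparse part has few of them by construction.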
Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152 [28], as well as physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects the cross talk between them. Another striking finding is the clearly negative interactions between protein folding and the mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4 (reported in [9, 29, 30]), with GIM5 (reported in [29]), and with YKE2 (reported in [29, 30]). NAB2 was clustered together with PAN3 but showed no obvious genetic interactions with protein folding genes in the original dataset; however, it has physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although the genes involved in regulation of transcription have low S scores with each other, they are enriched in the same GO functional item, regulation of transcription (p value = 0.017813). More results can be found in the Supplementary Material.
Figure 6: Functional associations between functional modules identified from the sparse information. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions. The emphasized (thicker) lines are the significant S scores in the sparse matrix. Panels: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process.
6. Conclusion

In this paper, we have introduced a method named “LRSDec” to identify gene modules and the cross talks between them in a genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part L and the sparse part S of the original matrix X. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, “GoDec,” which has been shown to outperform other existing decomposition algorithms [13]. We then applied our algorithm to a genetic interaction dataset to identify modules and the cross talks between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which other decomposition algorithms cannot do. LRSDec is not limited to yeast genetic interaction data: as long as a dataset has an internal low-rank structure and some sparse information, the LRSDec algorithm can be used to decompose the data matrix into the sum of two matrices, which can then be analyzed separately. The algorithm could be used widely, for example, in genetic interaction data analysis and image processing. We also applied it to genetic interaction data from C. elegans in the Supplementary Material.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, “XV. The correlation between relatives on the supposition of Mendelian inheritance,” Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., “Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map,” Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St. Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, “Defining genetic interaction,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., “Systematic genetic analysis with ordered arrays of yeast deletion mutants,” Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., “A robust toolkit for functional profiling of the yeast genome,” Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, “A strategy for extracting and analyzing large-scale quantitative epistatic interaction data,” Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., “Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae,” Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., “Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast,” Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., “A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing,” Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, “A gradient descent algorithm on the Grassman manifold for matrix completion,” http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, “GoDec: randomized low-rank & sparse matrix decomposition in noisy case,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, “Spectral regularization algorithms for learning large incomplete matrices,” The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., “Global mapping of the yeast genetic interaction network,” Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, “Imputing missing values for genetic interaction data,” Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, “How and when should interactome-derived clusters be used to predict functional modules and protein function?” Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., “Genetic and physical maps of Saccharomyces cerevisiae,” Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, “A splicing-dependent transcriptional checkpoint associated with prespliceosome formation,” Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., “Yeast and human mitochondrial helicases,” Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, “Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes,” ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, “The yeast SR-like protein Npl3 links chromatin modification to mRNA processing,” PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, “Intronic noncoding RNAs and splicing,” Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, “Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11,” Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, “Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae,” Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, “Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11,” Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, “Integrating multiple types of data to predict novel cell cycle-related genes,” BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, “Lumenal interactions in nuclear pore complex assembly and stability,” Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., “The genetic landscape of a cell,” Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., “Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, “Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure,” The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
Figure 1: Results of parameter tuning on synthetic data. x-axis: different cardinality (card of matrix S), corresponding to parameter $k$; y-axis: different rank (rank of matrix L), corresponding to parameter $\lambda$; z-axis: the relative error of decomposition, $\|\mathbf{X} - \hat{\mathbf{X}}\|_F^2 / \|\mathbf{X}\|_F^2$. The minimum of the relative error is annotated with the values 25 and 250.

5. Results

5.1. Synthetic Data. We simulated a synthetic dataset and then applied the LRSDec and GoDec algorithms to it. Specifically, the low-rank part, the sparse part, and the noise are generated as follows:

(i) Low-rank part: the covariance matrix $\Sigma$ is generated as $\mathbf{H}\mathbf{H}^T$, where $\mathbf{H} \in \mathbb{R}^{m \times K}$ and $H_{ij} \sim N(0, 1)$. Here $K$ is the number of hidden modules. The random columns $\mathbf{L}_j$ are drawn from $N(0, \tau\Sigma)$. Let $\mathbf{L} = [\mathbf{L}_1, \ldots, \mathbf{L}_n]$.

(ii) Sparse part: the nonzero entries of the sparse matrix are generated from the tail of the Gaussian distribution $N(1, 2)$ whose upper quantile is $\alpha = 0.01$. We randomly selected 70% of them and assigned them the opposite sign. This is consistent with EMAP datasets, in which negative genetic interactions are much more prevalent than positive ones.

(iii) Noise: $\mathbf{e} = 10^{-2} \ast \mathbf{F}$, where $\mathbf{F}$ is a standard Gaussian matrix.

A low-rank matrix L with rank 25 and a sparse matrix with cardinality 250 were generated, respectively. Now we have

$$\mathbf{X} = \mathbf{L} + \mathbf{S} + \mathbf{e}. \tag{16}$$
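The three generation steps and equation (16) can be sketched with the Python standard library alone. Several choices here are assumptions rather than details from the text: the 2 in $N(1, 2)$ is read as a variance, the matrix sizes are small illustrative defaults (the text gives only rank 25 and cardinality 250), and the name `make_synthetic` and its parameters are hypothetical. A column drawn from $N(0, \tau\Sigma)$ with $\Sigma = \mathbf{H}\mathbf{H}^T$ is sampled as $\sqrt{\tau}\,\mathbf{H}\mathbf{z}$ with $\mathbf{z} \sim N(0, I_K)$.

```python
import math
import random
from statistics import NormalDist


def make_synthetic(m=30, n=20, n_modules=3, tau=1.0, card=10,
                   alpha=0.01, neg_fraction=0.7, noise_scale=1e-2, seed=0):
    """Return (X, L, S) with X = L + S + e, following steps (i)-(iii)."""
    rng = random.Random(seed)

    # (i) Low-rank part: H is m x K with N(0, 1) entries; each column of L
    # is sqrt(tau) * H z, which has covariance tau * H H^T.
    H = [[rng.gauss(0.0, 1.0) for _ in range(n_modules)] for _ in range(m)]
    L = [[0.0] * n for _ in range(m)]
    for j in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_modules)]
        for i in range(m):
            L[i][j] = math.sqrt(tau) * sum(h * zk for h, zk in zip(H[i], z))

    # (ii) Sparse part: values from the upper-alpha tail of N(1, 2)
    # (assumption: 2 is the variance). The lower tail is sampled by
    # inverse CDF and mirrored about the mean to obtain the upper tail.
    tail = NormalDist(mu=1.0, sigma=math.sqrt(2.0))
    S = [[0.0] * n for _ in range(m)]
    positions = rng.sample(range(m * n), card)
    n_negative = round(neg_fraction * card)
    for idx, pos in enumerate(positions):
        p = alpha * (1.0 - rng.random())          # p lies in (0, alpha]
        value = 2.0 * tail.mean - tail.inv_cdf(p)
        if idx < n_negative:
            value = -value                        # flip 70% to negative
        S[pos // n][pos % n] = value

    # (iii) Noise: e = 1e-2 * F with F a standard Gaussian matrix.
    X = [[L[i][j] + S[i][j] + noise_scale * rng.gauss(0.0, 1.0)
          for j in range(n)] for i in range(m)]
    return X, L, S


X, L, S = make_synthetic()
nonzero = [v for row in S for v in row if v != 0.0]
print(len(nonzero), sum(v < 0 for v in nonzero))
```

By construction, L has rank at most `n_modules` and S has exactly `card` nonzero entries, so the two parameters play the roles of the rank and cardinality that the parameter search is expected to recover.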
The first step is parameter training and the result isshowed in Figure 1 Minimal prediction error was achievedwhen 120582 = 250 and 119896 = 25 which coincides with the rankand cardinality of the synthetic data This demonstrated theeffectiveness of cross validation procedure
Next we compared the performance of LRSDec algo-rithm and GoDec algorithm by comparing their predictionerror The relative error is defined as
10038171003817100381710038171003817W minus W10038171003817100381710038171003817
2
119865
W2
119865
(17)
where W is the original matrix and W is an estimateap-proximation As both algorithms are influenced seriouslyby the parameters we compared the relative error of thetwo algorithms by given different parameters To make thecomparison simple we only changed one parameter with
another parameter fixed (Figure 2) One can see that bothalgorithms reach the smallest relative if adopting the correcttwo parameters and LRSDec outperforms GoDec In theSupplementaryMaterial we also compare the performance ofboth algorithms under different noise setting and the trendsare the same
52 Application to EMAP Data from Yeast We also appliedour method to EMAP data interrogating RNA processing inS cerevisiae which consists of 552 genes involved in RNA-related processes [9]This geneticmap contains about 152000pairwise genetic interaction measurements with about 29missing entries in data matrixWe applied ourmethod to thisEMAP data denoted asX to obtain twomatrices a low-rankmatrix L and a sparse matrix S X = L+S is the new completedata matrix with imputed missing entries To exploit thequantitative information from EMAP data we first subjectedthe entire low-rank matrix L to hierarchical clustering anapproach that groups genes with similar patterns of geneticinteractions It should be noted that using low-rank matrix Lin cluster improved the performance of hierarchical cluster-ing in detecting genetic interaction modules [16] Accordingto the clustering result we reordered rows and columnsof matrix X so that the protein complexes and biologicalprocesses showed in [9] could be found (Figure S1)
To help identify more modules of cellular functions andprocesses and reveal the relationships between them we fur-ther analyzed the matrices L and S Figure 3 is a flowchart ofour strategy 1 to detectmodules and cross talks between themthrough low-rank matrix L In this paper we define moduleas a cluster from hierarchical clustering (HC) that passesthrough GO-enriched filtering (Supplementary Section 3)The details are as follows Firstly we clustered the row genesof matrix L using hierarchical clustering (HC) Here weadopted the average-linkage hierarchical clustering algorithmin which the distance of gene 119860 and gene 119861 is defined as1 minus |cor(119860 119861)| where |cor(119860 119861)| is the absolute value of thecorrelation of genetic interaction profile of gene 119860 and gene119861 A cutoff needs to be applied for HC to cut the hierarchicalclustering tree We used the Jaccard index (SupplementarySection 4) to determine how well the predicted gene setscorrespond to benchmark (theoretical) gene sets [17] HereGO functional gene sets are used as benchmark (theoretical)gene setsThe cutoff atwhichHCachieved the highest Jaccardindex is used to cut the hierarchical tree We calculatedthe Jaccard index of every ldquoheightrdquo cutoff in hierarchicalclustering from 02 to 095 by 005 interval This step resultedin the best Jaccard index with height = 07 and 84 clustersNow we got the clusters of row genes in which genes act ina consistent manner across the entire column genes Thenwe filtered the clustering results by a hypergeometric testthat calculates the significance of enrichment of GO itemsand the 119901 value was set to 001 The clusters enriched in GOfunctional categories are defined as row modules Secondlyfor each row module we exploited modules of column genesbased on this row gene module in matrix X by clustering thecolumn genes of this submatrix of X Thirdly we screenedcolumn clusters whose interactions with the rowmodules are
Figure 2: Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters. (a)-(b) Fixed parameter card, different parameter rank: relative error of matrix L (a) and of matrix S (b) plotted against the number of ranks. (c)-(d) Fixed parameter rank, different parameter card: relative error of matrix L (c) and of matrix S (d) plotted against the number of cards.
Figure 3: Flowchart of our strategy 1 for detecting modules and cross talks between them in the genetic interaction network by the low-rank matrix L. Flowchart steps: HC of the row genes of L; GO-enrichment filtering to obtain row modules; based on each row module, HC of the column genes of L; Fisher's exact test to obtain column modules; mapping of all column modules to GO items; connections between this row module and all column modules. These three steps are repeated for all significant row modules to obtain the connections among modules of the whole network.
Table 1: Fisher's exact test table. (Gene set A)⊥ denotes the complementary set of gene set A; AB denotes the number of connections between gene set A and gene set B; AB⊥ denotes the number of connections between gene set A and the complementary set of gene set B.

                    Gene set B    (Gene set B)⊥
Gene set A          AB            AB⊥
(Gene set A)⊥       A⊥B           A⊥B⊥
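As a minimal sketch of how the 2 x 2 table in Table 1 could be tested, the snippet below passes the four connection counts to SciPy's fisher_exact. The one-sided "greater" alternative is an assumption (the text does not state which alternative was used), and the function name module_pair_enrichment and the counts in the usage note are hypothetical.

```python
from scipy.stats import fisher_exact

def module_pair_enrichment(ab, ab_c, a_c_b, a_c_b_c):
    """Fisher's exact test on the Table 1 layout.

    ab      : connections between gene set A and gene set B
    ab_c    : connections between A and the complement of B
    a_c_b   : connections between the complement of A and B
    a_c_b_c : connections between the two complements
    """
    table = [[ab, ab_c], [a_c_b, a_c_b_c]]
    # Assumption: one-sided test for over-representation of A-B connections.
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    return odds_ratio, p_value
```

For example, module_pair_enrichment(12, 3, 4, 30) tests whether 12 connections between A and B are more than expected given the margins; the column cluster would be kept if the returned p value falls below the chosen threshold.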
Table 2: Clustering results.

(a) GO slim as benchmark gene set

            Low-rank matrix         Original matrix
Clusters    JC-index    Enriched    JC-index    Enriched
200         0.063       190         0.022       46
150         0.070       138         0.032       44
100         0.084       90          0.050       50
50          0.088       30          0.067       24

(b) GO BP FAT as benchmark gene set

            Low-rank matrix         Original matrix
Clusters    JC-index    Enriched    JC-index    Enriched
200         0.137       183         0.044       47
150         0.147       142         0.058       50
100         0.155       96          0.091       52
50          0.131       34          0.078       26

A hypergeometric test was applied to test the enrichment of gene sets. Significance level: FDR <= 0.05. Clusters: the number of clusters at which the hierarchical clustering tree is cut. Enriched: the number of modules predicted by hierarchical clustering that are enriched in GO terms.
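The hypergeometric enrichment test mentioned in the table note can be sketched as follows. This is an illustration under stated assumptions: the background is taken to be all genes on the map, the p value is the upper tail probability of the observed overlap, and the function name go_enrichment_pvalue is hypothetical. Multiple-testing correction (the FDR step) is not shown.

```python
from scipy.stats import hypergeom

def go_enrichment_pvalue(n_genes_total, n_genes_in_term, cluster_size, n_overlap):
    """P(overlap >= n_overlap) when `cluster_size` genes are drawn without
    replacement from `n_genes_total` genes, of which `n_genes_in_term`
    carry the GO annotation."""
    # sf(k) is P(X > k), so k = n_overlap - 1 gives P(X >= n_overlap).
    return hypergeom.sf(n_overlap - 1, n_genes_total, n_genes_in_term, cluster_size)
```

With the 552 genes of this map as background, a cluster of 10 genes sharing 8 members with a hypothetical 40-gene GO term would be scored with go_enrichment_pvalue(552, 40, 10, 8).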
significantly enriched by Fisher's exact test (Table 1, p value = 0.05). Here we defined positive genetic interactions as gene pairs with genetic interaction scores S ≥ 2.0 and negative genetic interactions as those with S ≤ −2.5 [9]. The reduced gene sets of column genes were defined as column modules (corresponding to the row module). Next, we identified the enriched GO functional categories of these column modules by mapping them to GO items (hypergeometric test). Finally, repeating these steps for all row modules, we identified the modules and the intermodule genetic cross talk of the whole genetic interaction network (Figures 4–6), where red and green represent a statistically significant enrichment of positive and negative interactions, respectively. The cross talks constructed in these figures are the S scores among genes in the original matrix. In the following, we discuss many of the interesting connections that have been reported previously.
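The row-clustering step of strategy 1 (average linkage, distance 1 − |cor(A, B)|, tree cut at a fixed height) can be sketched with SciPy as below. Pearson correlation is an assumption, since the text only says "correlation", and the function name cluster_rows is hypothetical. Rows with zero variance would produce undefined correlations and are assumed to have been removed beforehand.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_rows(L, height=0.7):
    """Average-linkage HC of the rows of L with distance 1 - |cor|,
    cut at the given tree height. Returns one cluster label per row."""
    corr = np.corrcoef(np.asarray(L, dtype=float))  # row-by-row Pearson correlation
    dist = 1.0 - np.abs(corr)
    # Clean up floating-point noise so the matrix is a valid distance matrix.
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=height, criterion="distance")
```

Because the distance uses the absolute correlation, strongly anticorrelated profiles end up in the same cluster as strongly correlated ones. Scanning the height from 0.2 to 0.95 and scoring each cut against benchmark gene sets would reproduce the cutoff selection described above.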
The low-rank matrix found more functional modules than the original matrix (Table 2). In Table 2, we cut the dendrogram at different heights and compared the clusters obtained from the low-rank matrix with those obtained from the original matrix. The Jaccard index [17] is used to determine how well the predicted clusters recapture the benchmark gene sets ((a) GO slim and (b) GO BP FAT). The definition of the Jaccard index can be found in the Supplementary Material. In the ideal situation, where the predicted clusters perfectly match the benchmark gene sets, the Jaccard index is 1. The larger the Jaccard index, the better the predictions. The clusters obtained from clustering the low-rank matrix are more enriched with GO functional categories at varying cutoffs (Table 2).
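Since the exact definition used in the paper is given only in the Supplementary Material, the following sketch assumes the standard set-based Jaccard index and, as a further assumption, scores a clustering by averaging each cluster's best match over the benchmark gene sets. Both function names are hypothetical.

```python
def jaccard(a, b):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_best_jaccard(clusters, benchmarks):
    """Assumed summary score: for each predicted cluster take its best
    Jaccard match among the benchmark gene sets, then average."""
    if not clusters or not benchmarks:
        return 0.0
    best = [max(jaccard(c, g) for g in benchmarks) for c in clusters]
    return sum(best) / len(best)
```

Under this definition a clustering that reproduces the benchmark gene sets exactly scores 1, which matches the ideal case described above.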
Figure 4 gives an overview of the relationships among biological processes when GO slim (downloaded from SGD [18]; a broad overview of all of the top GO categories) is used as the set of GO items. We found that several sets of genes known to function in the same biochemical processes contain predominantly positive or predominantly negative interactions, which was also observed in [9]. For example, genes classed as involved in RNA splicing and transcription are significantly enriched in negative genetic interactions (Figure 4, green nodes). In contrast, the module involved in protein folding has strong positive interactions (red node). In addition, Figure 4 suggests that not all modules have a consistent pattern of interactions (yellow node), which is reasonable for biological processes. Finally, several of the connections have been discussed previously. For example, there is good evidence for functional interactions between splicing and transcription in [19] and for functional interactions between splicing and translation in [20]. Furthermore, [21] reported a cooperative relationship between protein folding and chromosome organization.
If we classify the GO functional categories into finer items (GO BP FAT, downloaded from DAVID, http://david.abcc.ncifcrf.gov), we obtain a more comprehensive network (Figure 5). Many of the interactions in Figure 5 have been reported before. For instance, genes involved in the tubulin complex assembly process have negative genetic interactions with genes involved in the tRNA wobble uridine modification process, as supported by SGD, and negative genetic interactions between the RNA splicing process and the tRNA metabolic process could also be found. The genetic interaction between RNA splicing and chromatin modification has been studied in [22]. The balance of the interactions between the processing of ribonucleoprotein assembly of intronic noncoding RNAs and the splicing process, which regulates the levels of ncRNA and host mRNA, can be found in [23]. The functional relationship of tubulin to the roles of the Elongator complex in tRNA wobble uridine modification is supported by [24]. Moreover, epistasis and chromatin immunoprecipitation experiments indicating that the loss of Rrp6 (regulation of transcription) function is paralleled by the recruitment of Hda1 (histone deacetylase) have been reported in [25]. Finally, cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11 was reported in [26].
In an effort to gain insight into the functional organization of RNA-related complexes, we used GO CC FAT (downloaded from DAVID) as the GO functional categories to create a map that highlights strong genetic trends both within and between these complexes. This result can be found in the Supplementary Material.
Now let us turn to the analysis of the sparse matrix S. The sparse matrix gives two distinct measures for exploiting genetic information. First, extreme S scores indicate cofunctional membership more efficiently [27]. Second, some S scores
Figure 4: Global view of the genetic cross talks between different RNA-related processes (GO slim items). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
indicate significant genetic interactions between genes in different gene sets. Following this clue, by analyzing the matrix S we found much evidence of genes involved in the same functional modules and many cross talks between functional modules (Figure 6). Many of them support the networks in Figures 4 and 5. The gene pairs with extreme S scores and the modules they involve can be found in the Supplementary Material (Supplementary Data 2).
Strategy 2, for the analysis of the sparse matrix, is similar to strategy 1 (Figure 3). First, for every row module (defined as in Figure 3), cluster the column genes based on their genetic interaction profiles across the genes in this row module. Then select the column gene sets that contain genes corresponding to nonzero entries of the sparse matrix and define them as column modules. Finally, map these column modules to GO items to identify their enriched functional categories (hypergeometric test). Similarly, for all row modules, repeat
Figure 5: Global view of the genetic cross talks between different RNA-related processes (GO BP FAT). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
the above steps. This gives the connections between different functional modules (Figure 6).
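The selection step of strategy 2 can be sketched as follows. The reading adopted here is an assumption: a column cluster is kept as a column module if at least one of its genes has a nonzero entry of S in a row belonging to the current row module. The function name select_column_modules and its arguments are hypothetical, and the column cluster labels are assumed to come from a separate clustering of the column genes.

```python
import numpy as np

def select_column_modules(S, row_module, column_labels):
    """Keep the column clusters that contain at least one column with a
    nonzero entry of S in the rows of `row_module`.

    S             : sparse part of the decomposition (2D array)
    row_module    : row indices of the genes in one row module
    column_labels : cluster label of every column gene
    Returns {cluster label: list of column indices}.
    """
    S = np.asarray(S)
    column_labels = np.asarray(column_labels)
    # Columns that carry any nonzero sparse entry within the row module.
    hit_columns = np.any(S[list(row_module), :] != 0, axis=0)
    modules = {}
    for label in np.unique(column_labels):
        members = np.flatnonzero(column_labels == label)
        if hit_columns[members].any():
            modules[int(label)] = members.tolist()
    return modules
```

The retained clusters would then be mapped to GO items with the hypergeometric test, as in strategy 1.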
Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152 [28] and also physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects the cross talk between them. Another striking finding is the clearly negative interactions between protein folding and the mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4, as stated in [9, 29, 30]; with GIM5, as stated in [29]; and with YKE2, as stated in [29, 30]. NAB2 was clustered together with PAN3 and showed no obvious genetic interactions with protein folding genes in the original dataset, but it has physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although the genes involved in regulation of transcription have low S scores among themselves, they are enriched in the same GO functional item, regulation of transcription (p value = 0.017813). More results can be found in the Supplementary Material.
Figure 6: Functional associations between functional modules identified by sparse information. Panels: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions. The emphasized (thicker) lines are the significant S scores in the sparse matrix.
6. Conclusion
In this paper, we have introduced a method named "LRSDec" to identify gene modules and the cross talks between them in a genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part L and the sparse part S of the original matrix X. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, "GoDec", which has been shown to outperform other existing decomposition algorithms [13]. We then applied our algorithm to a genetic interaction dataset to identify modules and the cross talks between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which cannot be accomplished by other decomposition algorithms. LRSDec is not limited to yeast genetic interaction data. As long as a dataset has an internal low-rank structure and some sparse information, the LRSDec algorithm can be used to decompose the data matrix into the sum of two matrices, which can then be analyzed separately. This algorithm could be used widely in genetic interaction data analysis, image processing, and other fields. We also applied it to the genetic interaction data of C. elegans; the results are given in the Supplementary Material.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, "XV. The correlation between relatives on the supposition of Mendelian inheritance," Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., "Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map," Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, "Defining genetic interaction," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., "Systematic genetic analysis with ordered arrays of yeast deletion mutants," Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., "A robust toolkit for functional profiling of the yeast genome," Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, "A strategy for extracting and analyzing large-scale quantitative epistatic interaction data," Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., "Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae," Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., "Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast," Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., "A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing," Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, "A gradient descent algorithm on the Grassman manifold for matrix completion," http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, "GoDec: randomized low-rank & sparse matrix decomposition in noisy case," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., "Global mapping of the yeast genetic interaction network," Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, "Imputing missing values for genetic interaction data," Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, "How and when should interactome-derived clusters be used to predict functional modules and protein function?" Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., "Genetic and physical maps of Saccharomyces cerevisiae," Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, "A splicing-dependent transcriptional checkpoint associated with prespliceosome formation," Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., "Yeast and human mitochondrial helicases," Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, "Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes," ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, "The yeast SR-like protein Npl3 links chromatin modification to mRNA processing," PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, "Intronic noncoding RNAs and splicing," Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, "Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11," Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, "Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae," Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, "Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11," Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, "Integrating multiple types of data to predict novel cell cycle-related genes," BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, "Lumenal interactions in nuclear pore complex assembly and stability," Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., "The genetic landscape of a cell," Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., "Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, "Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure," The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 5
0 50 100 150 200 250 300 3500
02
04
06
08
1
Number of ranks
The r
elat
ive e
rror
Relative error of matrix L
(a)
0 50 100 150 200 250 300 3500
02
04
06
08
1
12
14
Number of ranks
The r
elat
ive e
rror
Relative error of matrix S
(b)
0 100 200 300 400 500 6000
005
01
015
02
025
03
035
Number of cards
The r
elat
ive e
rror
Relative error of matrix L
GoDecLRSDec
(c)
0 100 200 300 400 500 6000
02
04
06
08
1
12
14
Number of cards
The r
elat
ive e
rror
Relative error of matrix S
GoDecLRSDec
(d)
Figure 2 Performances of LRSDec and GoDec in low-rank and sparse decomposition tasks on synthetic data under different parameters((a)-(b)) Fixed parameter card different parameter rank ((c)-(d)) fixed parameter rank different parameter card
HC row genes of L
Row modules
GO-enrichment filtering
Based oneach row module
HC column genes of L
Fisherrsquos exact testColumn modules
Connections between this row module and all column modules
Column modules that have functional
All column modules map to GO items
Connections among modules of the whole network
Repeat these three steps for
all significant
row modules
Figure 3 Flowchart of our strategy 1 for detecting modules and cross talks between them in genetic interaction network by low-rank matrixL
6 BioMed Research International
Table 1 Fisherrsquos exact test table (Gene set 119860)perp denotes thecomplementary set of gene set 119860 119860119861 denotes the number ofconnections between gene set 119860 and gene set 119861 119860119861
perp
denotes thenumber of connections between gene set 119860 and the complementaryset of gene set 119861
Gene set 119861 (Gene set 119861)perp
Gene set 119860 119860119861 119860119861perp
(Gene set 119860)perp 119860perp
119861 119860perp
119861perp
Table 2 Clustering results
(a) GO slim as benchmark gene set
Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched
200 0063 190 0022 46150 0070 138 0032 44100 0084 90 0050 5050 0088 30 0067 24
(b) GO BP FAT as benchmark gene set
Clusters Low-rank matrix Original matrixJC-index Enriched JC-index Enriched
200 0137 183 0044 47150 0147 142 0058 50100 0155 96 0091 5250 0131 34 0078 26 hypergeometric test applied to test the enrichment of gene sets Signif-icance level FDR lt= 005 Cluster the number of clusters to cut off thehierarchical clustering tree Enriched the number of modules predictedby hierarchical clustering enriched in the GO iterms
significantly enriched by Fisherrsquos exact test (Table 1 119901 value= 005) Here we defined the positive genetic interactions asthose gene pairs with genetic interaction scores 119878 ge 20 andnegative as 119878 le minus25 [9] The reduced gene sets of columngenes were defined as columnmodules (corresponding to therow module) Next we identified the enriched GO functionalcategories of these column modules by mapping them to GOitems (hypergeometric test) Finally repeating these steps forall row modules we identified the modules and intermodulegenetic cross talk of the whole genetic interaction network(Figures 4ndash6) where red and green represent a statisticallysignificant enrichment of positive and negative interactionsThe cross talks constructed in these figures are the 119878 scoresamong genes in the original matrix In the following we willdiscuss many of the interesting connections that have beenreported previously
The low-rank matrix found more functional modulesthan the original matrix (Table 2) In Table 2 we cut thedendrogram at different heights and compared the clustersobtained from the low-rank matrix and that of the originalmatrix Jaccard index [17] is used to determine how well thepredicted clusters recaptured the benchmark gene sets ((a)GO slim and (b) GO BP FAT) The definition of Jaccardindex can be found in the Supplementary Material Inthe ideal situation where predicted clusters perfectly match
the benchmark gene sets the Jaccard index is 1 The largerthe Jaccard index the better the predictions The clustersobtained from clustering of the low-rank matrix are moreenriched with GO functional categories at varying cutoffs(Table 2)
Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization
Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]
In an effort to gain insight into the functional organization of RNA-related complexes, we used GO CC FAT (downloaded from DAVID) as the GO functional categories to create a map that highlights strong genetic trends both within and between these complexes. This result can be found in the Supplementary Material.
Figure 4: Global view of the genetic cross talks between different RNA-related processes (GO slim items). Green and red represent a statistically significant enrichment of negative (genetic interaction score S ≤ −2.5) and positive (genetic interaction score S > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes. Processes shown: RNA splicing, transcription, DNA metabolic process, cellular protein modification process, translation, cellular protein degradation, transport, mitochondrion organization, ribosome, chromosome organization, response to stress, cell-division cycle, and protein folding.
Now let us turn to the analysis of the sparse matrix S. The sparse matrix gives two distinct measures for exploiting genetic information. First, extreme S scores indicate cofunctional membership more efficiently [27]. Second, some S scores indicate significant genetic interactions between genes in different gene sets. Following this clue, by analyzing the matrix S we found much evidence of genes involved in the same functional modules and many cross talks between functional modules (Figure 6). Many of them support the network in Figures 4 and 5. The gene pairs with extreme S scores and the modules involved can be found in the Supplementary Material (Supplementary Data 2).
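The two uses of the sparse matrix described above can be illustrated with a short Python sketch that separates the nonzero entries of S into within-module and between-module gene pairs. This is only an illustration under stated assumptions: the function name, the dictionary representation of S, and the numeric scores in the usage example are hypothetical and are not taken from the paper's data.

```python
def split_sparse_entries(sparse, module_of):
    """Split nonzero entries of a sparse score matrix by module membership.

    sparse     dict mapping (gene_i, gene_j) -> S score (hypothetical layout;
               zero entries may be stored or simply omitted)
    module_of  dict mapping gene -> module label
    Returns two lists of (gene_i, gene_j, score): pairs inside one module
    and pairs that connect two different modules.
    """
    within, between = [], []
    for (g1, g2), score in sparse.items():
        # Skip zero entries and genes without a module assignment.
        if score == 0 or g1 not in module_of or g2 not in module_of:
            continue
        target = within if module_of[g1] == module_of[g2] else between
        target.append((g1, g2, score))
    return within, between


# Usage with made-up scores (gene names appear in the text, values do not).
example_sparse = {("GIM3", "GIM5"): 3.1, ("PAN3", "GIM4"): -4.2, ("PAN3", "NAB2"): 0.0}
example_modules = {
    "GIM3": "protein folding", "GIM4": "protein folding", "GIM5": "protein folding",
    "PAN3": "mRNA 3' end process", "NAB2": "mRNA 3' end process",
}
within_pairs, between_pairs = split_sparse_entries(example_sparse, example_modules)
```

Within-module pairs correspond to the first use (cofunctional membership), and between-module pairs to the second (cross talk between gene sets).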
Figure 5: Global view of the genetic cross talks between different RNA-related processes (GO BP FAT). Green and red represent a statistically significant enrichment of negative (genetic interaction score S ≤ −2.5) and positive (genetic interaction score S > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangular heat maps represent scores of interactions between two processes.
Strategy 2 of the sparse matrix analysis is similar to strategy 1 (Figure 3). First, for every row module (defined as in Figure 3), cluster the column genes based on their genetic spectra across the genes in this row module. Then, select the column gene sets that contain genes belonging to nonzero entries of the sparse matrix and define them as column modules. Finally, map these column modules to GO items to identify their enriched functional categories (hypergeometric test). Repeating these steps for all row modules yields the connections between different functional modules (Figure 6).
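The last two steps of strategy 2 (selecting column modules that touch nonzero sparse entries, then testing them for GO enrichment) can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names, the 0.05 cutoff default, and the absence of any multiple-testing correction are assumptions made here for brevity.

```python
from math import comb


def select_column_modules(column_clusters, sparse_genes):
    """Keep clusters containing at least one gene with a nonzero sparse entry."""
    sparse_genes = set(sparse_genes)
    return [cluster for cluster in column_clusters if sparse_genes & set(cluster)]


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    numer = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
                for i in range(k, upper + 1))
    return numer / comb(population, drawn)


def enriched_terms(module, go_terms, background, alpha=0.05):
    """Return {term: p value} for GO terms over-represented in `module`.

    go_terms    dict mapping GO term -> iterable of annotated genes
    background  iterable of all genes on the array
    alpha       assumed significance cutoff (no FDR correction applied here)
    """
    background = set(background)
    module = set(module) & background
    results = {}
    for term, genes in go_terms.items():
        annotated = set(genes) & background
        overlap = len(module & annotated)
        if overlap == 0:
            continue
        p = hypergeom_upper_tail(overlap, len(background), len(annotated), len(module))
        if p <= alpha:
            results[term] = p
    return results
```

In practice the p values from many GO terms would normally be adjusted for multiple testing (Table 2 refers to an FDR threshold); that step is omitted from this sketch.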
Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152 [28], as well as physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects the cross talk between them. Another striking finding is the clearly negative interactions between protein folding and the mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4, as stated in [9, 29, 30], with GIM5, as stated in [29], and with YKE2, as stated in [29, 30]. NAB2 was clustered together with PAN3 but showed no obvious genetic interactions with protein folding genes in the original dataset; in fact, however, it has physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although the genes involved in regulation of transcription show low S scores with each other, they are enriched in the same GO functional item, regulation of transcription (p value = 0.017813). More results can be found in the Supplementary Material.
Figure 6: Functional associations between functional modules identified by sparse information. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score S ≤ −2.5) and positive (genetic interaction score S > 2.0) interactions. The emphasized (thicker) lines are the significant S scores in the sparse matrix. Panels: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process.
6. Conclusion
In this paper, we have introduced a method named "LRSDec" to identify gene modules and the cross talks between them in a genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part L and the sparse part S of the original matrix X. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, "GoDec," which has been shown to outperform other existing decomposition algorithms [13]. We then applied our algorithm to a genetic interaction dataset to identify modules and the cross talks between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which could not be accomplished by other decomposition algorithms. LRSDec is not limited to yeast genetic interaction data. As long as a dataset has an internal low-rank structure and some sparse information, the LRSDec algorithm can be used to decompose the data matrix into the sum of two matrices, which can then be analyzed separately. This algorithm could be used widely, for example, in genetic interaction data analysis and image processing. We also applied it to the genetic interaction data of C. elegans in the Supplementary Material.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, "XV. The correlation between relatives on the supposition of Mendelian inheritance," Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., "Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map," Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St. Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, "Defining genetic interaction," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., "Systematic genetic analysis with ordered arrays of yeast deletion mutants," Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., "A robust toolkit for functional profiling of the yeast genome," Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, "A strategy for extracting and analyzing large-scale quantitative epistatic interaction data," Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., "Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae," Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., "Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast," Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., "A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing," Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, "A gradient descent algorithm on the Grassman manifold for matrix completion," http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, "GoDec: randomized low-rank & sparse matrix decomposition in noisy case," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., "Global mapping of the yeast genetic interaction network," Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, "Imputing missing values for genetic interaction data," Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, "How and when should interactome-derived clusters be used to predict functional modules and protein function?" Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., "Genetic and physical maps of Saccharomyces cerevisiae," Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, "A splicing-dependent transcriptional checkpoint associated with prespliceosome formation," Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., "Yeast and human mitochondrial helicases," Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, "Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes," ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, "The yeast SR-like protein Npl3 links chromatin modification to mRNA processing," PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, "Intronic noncoding RNAs and splicing," Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, "Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11," Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, "Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae," Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, "Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11," Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, "Integrating multiple types of data to predict novel cell cycle-related genes," BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, "Lumenal interactions in nuclear pore complex assembly and stability," Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., "The genetic landscape of a cell," Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., "Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, "Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure," The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
Table 1: Fisher's exact test table. (Gene set A)⊥ denotes the complementary set of gene set A; AB denotes the number of connections between gene set A and gene set B; AB⊥ denotes the number of connections between gene set A and the complementary set of gene set B.

                  Gene set B    (Gene set B)⊥
Gene set A        AB            AB⊥
(Gene set A)⊥     A⊥B           A⊥B⊥
Table 2: Clustering results.

(a) GO slim as benchmark gene set

Clusters    Low-rank matrix          Original matrix
            JC-index    Enriched     JC-index    Enriched
200         0.063       190          0.022       46
150         0.070       138          0.032       44
100         0.084       90           0.050       50
50          0.088       30           0.067       24

(b) GO BP FAT as benchmark gene set

Clusters    Low-rank matrix          Original matrix
            JC-index    Enriched     JC-index    Enriched
200         0.137       183          0.044       47
150         0.147       142          0.058       50
100         0.155       96           0.091       52
50          0.131       34           0.078       26

A hypergeometric test was applied to test the enrichment of gene sets. Significance level: FDR <= 0.05. Clusters: the number of clusters at which the hierarchical clustering tree is cut. Enriched: the number of modules predicted by hierarchical clustering that are enriched in the GO terms.
significantly enriched by Fisher's exact test (Table 1, p value = 0.05). Here we defined positive genetic interactions as those gene pairs with genetic interaction scores S ≥ 2.0 and negative genetic interactions as those with S ≤ −2.5 [9]. The reduced gene sets of column genes were defined as column modules (corresponding to the row module). Next, we identified the enriched GO functional categories of these column modules by mapping them to GO items (hypergeometric test). Finally, repeating these steps for all row modules, we identified the modules and the intermodule genetic cross talk of the whole genetic interaction network (Figures 4–6), where red and green represent a statistically significant enrichment of positive and negative interactions, respectively. The cross talks shown in these figures are the S scores among genes in the original matrix. In the following, we discuss many of the interesting connections that have been reported previously.
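As a rough illustration of the test summarized in Table 1, the following Python sketch computes a one-sided Fisher's exact p value from the four cell counts of the table. It is not the authors' code: the function name, the choice of a one-sided (enrichment) alternative, and the example counts are assumptions made for illustration only.

```python
from math import comb


def fisher_exact_enrichment(ab, ab_c, a_cb, a_cb_c):
    """One-sided Fisher's exact test for the 2x2 layout of Table 1.

    ab      connections between gene set A and gene set B
    ab_c    connections between A and the complement of B
    a_cb    connections between the complement of A and B
    a_cb_c  connections between the two complements
    Returns P(X >= ab) with all row and column totals held fixed.
    """
    row_a = ab + ab_c                      # row total for gene set A
    col_b = ab + a_cb                      # column total for gene set B
    total = ab + ab_c + a_cb + a_cb_c
    upper = min(row_a, col_b)              # largest feasible value of the AB cell
    numer = sum(comb(row_a, k) * comb(total - row_a, col_b - k)
                for k in range(ab, upper + 1))
    return numer / comb(total, col_b)


# Hypothetical counts, chosen only to show the call.
p_value = fisher_exact_enrichment(8, 2, 1, 5)   # about 0.0245
```

With a cutoff of 0.05, the made-up table above would count as enriched; real use would fill the cells with counts of interactions that pass the S score thresholds given in the text.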
The low-rank matrix recovered more functional modules than the original matrix (Table 2). In Table 2, we cut the dendrogram at different heights and compared the clusters obtained from the low-rank matrix with those obtained from the original matrix. The Jaccard index [17] is used to determine how well the predicted clusters recapture the benchmark gene sets ((a) GO slim and (b) GO BP FAT). The definition of the Jaccard index can be found in the Supplementary Material. In the ideal situation, where the predicted clusters perfectly match the benchmark gene sets, the Jaccard index is 1. The larger the Jaccard index, the better the predictions. The clusters obtained from the low-rank matrix are more enriched with GO functional categories at the varying cutoffs (Table 2).
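Because the exact definition of the Jaccard index is given only in the Supplementary Material, the Python sketch below uses one common pair-based form as an assumption: the number of gene pairs grouped together in both the predicted clusters and the benchmark sets, divided by the number of pairs grouped together in either. The function names and the toy gene sets are hypothetical.

```python
from itertools import combinations


def co_membership_pairs(gene_sets):
    """Return the set of unordered gene pairs that share at least one set."""
    pairs = set()
    for genes in gene_sets:
        # Sorting gives each unordered pair a single canonical tuple.
        for g1, g2 in combinations(sorted(set(genes)), 2):
            pairs.add((g1, g2))
    return pairs


def jaccard_index(predicted_clusters, benchmark_sets):
    """Pair-based Jaccard index between a clustering and benchmark gene sets
    (assumed definition; 1.0 means the co-membership pairs match exactly)."""
    predicted = co_membership_pairs(predicted_clusters)
    benchmark = co_membership_pairs(benchmark_sets)
    union = predicted | benchmark
    if not union:
        return 0.0
    return len(predicted & benchmark) / len(union)


# Toy example with made-up groupings of gene names.
predicted = [["GIM3", "GIM4", "GIM5"], ["PRP4", "PRP19"]]
benchmark = [["GIM3", "GIM4"], ["PRP4", "PRP19", "GIM5"]]
score = jaccard_index(predicted, benchmark)   # 2 shared pairs out of 6
```

Under this assumed definition, identical groupings give an index of 1, which agrees with the ideal case described in the text.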
Figure 4 gives an overview of the relationships amongbiological processes when GO slim (downloaded from SGD[18] a broad overview of all of the top GO categories) is usedas GO items We found that several sets of genes that havebeen known to function in the same biochemical processescontain predominantly positive or negative interactionswhich was also observed in [9] For example genes classed asinvolved in RNA splicing and transcription are significantlyenriched in negative genetic interactions (Figure 4 greennodes) In contrast the module involved in protein foldinghas strong positive interactions (red node) In additionFigure 4 also suggested that not all modules have consistentpattern of interactions (yellow node) which is reasonable inbiological processes Finally several connections have beenpreviously discussed For example there is good evidencefor functional interactions between splicing and transcriptionin [19] and functional interactions between splicing andtranslation in [20] Furthermore [21] reported the cooper-ative relationship between protein folding and chromosomeorganization
Actually if we classify the GO functional categories tomore fine items (GO BP FAT downloaded from DAVIDhttpdavidabccncifcrfgov) we can get a more compre-hensive network (Figure 5) Many of the interaction resultsin Figure 5 have been reported before For instance genesinvolved in tubulin complex assembly process have negativegenetic interactions with genes involved in tRNA wobbleuridine modification process supported by SGD while thenegative genetic interactions between RNA splicing processand tRNA metabolic process could also be found Actuallythe genetic interaction between RNA splicing and chromatinmodification has been studied in [22] And the balance ofthe interactions between the processing of ribonucleoproteinassembly of intronic noncoding RNAs and the splicingprocess regulating the levels of ncRNA and host mRNA canbe found in [23] Tubulin functionally relating to roles ofthe elongator complex in tRNA wobble uridine modificationis supported by [24] Moreover epistasis and chromatinimmunoprecipitation experiments indicating that the lossof Rrp6 (regulation of transcription) function is paralleledby the recruitment of Hda1 (histone deacetylase) have beenreported by [25] Finally cotranscriptional recruitment of themRNA export factor Yra1 realized by direct interaction withthe 31015840 end processing factor Pcf11 was in [26]
In an effort to gain insight into the functional organi-zation of RNA-related complexes we used the GO CC FAT(downloaded from DAVID) as the GO functional categoriesto create a map that highlights strong genetic trends bothwithin and between these complexes This result could befound in the Supplementary Material
Now let us turn to the analysis of the sparse matrix S Thesparse matrix gives two distinct measures to exploit geneticinformation First extreme 119878 scores indicate cofunctionalmembership more efficiently [27] Second some 119878 scores
BioMed Research International 7
Negativeinteractions
Positiveinteractions
LOC1LIN1
BUD13
IST3LEA1
STO1
PRP11SNU66
CBC2BRR1MUD2
TGS1YNR004WCWC15ECM2
YPR152CNTC20ISY1CWC21SYF2CWC27PRP19CSN12
PML1MUD1
NAM8
HMT1SNU56
PRP42TIF35DBR1YHC1
LOC1
LIN1
BUD13
IST3
LEA1
STO1
PRP1
1
SNU66
CBC2
BRR1
MU
D2
TGS1
YNR0
04
WCW
C15
ECM2
YPR1
52
CN
TC20
ISY1
CWC2
1
SYF2
CWC2
7
PRP1
9
CSN12
PML1
MU
D1
NA
M8
HM
T1SN
U56
PRP4
2
TIF3
5
DBR
1
YHC1
NPL3PUF4THP2RTT103CTK2SIF2HOS2SET3CDC73LEO1
RTT109
RTT109
CTK1HTZ1SOH1
SOH1
DST1CSE2
CSE2
MED1
SDC1EAF7THP1SYC1RXT2
RXT2
SWR1VPS71
PHO23
SAP30
PHO23SAP30SIN3
SIN3
HIR2
PUF4
SIF2HOS2SET3CDC73LEO1
SOH1CSE2
SDC1EAF7
RXT2
PHO2
SAP3
0SI
N3
HIR2
HIR2
RTT109G
BP2
SWR1VPS71
GLC
7
HIR2
NPL
3
PUF4
THP2
RTT1
03
CTK2
SIF2
HO
S2SE
T3CD
C73
LEO1
RTT1
09
CTK1
HTZ
1
SOH1
DST
1
CSE2
MED
1
SDC1
EAF7
THP1
SYC1
RXT2
PHO23
SAP3
0
SIN3
HIR2
GBP
2YR
A1
THP2
RPN4
SOH1
SAC3
THP1
DO
A1
RAD23
RTT1
09
SIN3
SWD1
SWD3
SWD3
RPN
4CA
F40
BRE1
SWD1
BRE1
LGE1
CDC7
3EA
F6EA
F5EA
F7SR
C1PA
P2EA
F3
GBP2
RAD23
CSE2
PUF4
SWD1
SWD3
BRE1
LGE1
RAD52
SIF2
HO
S2
SOH1
SET3
CDC7
LEO1
SRC1
SDC1
EAF3
RTT1
0
SWR1
VPS
71
RAD5
RAD5
RAD2
EAF6
EAF7
BRE1SIF2HOS2SET3SEM1
SAC3THP1BUD13
RPN4
RAD52
GLC7SRC1TOR1RTT101RAD51
SOH1
CSE2SIN3
MLP2
BRE1
KEM
1
SIF2
HO
S2SE
T3SE
M1
SAC3
THP1
BUD13
RPN4
RAD52
GLC
7
SRC1
TOR1
RTT1
01
RAD51
SOH1
CSE2
SIN3
MLP
2
YBL104CYGR283CRIC1YPT6GIM4
YKE2GIM3
GIM5
YGR2
83
CRI
C1YP
T6G
IM4
YKE2
GIM
3
GIM
5
RNA splicing
Transcription
DNA metabolicprocess
Cellular proteinmodification
processTranslation
Cellular proteindegradation
Transport
Mitochondrionorganization
Ribosome
Chromosomeorganization
Response tostress
Cell-divisioncycle
Protein folding
LRS4
HEX
3SL
X8LR
P1RR
P6FP
R1TG
S1
Figure 4 Global view of the genetic cross talks between different RNA-related processes (GO slim items) Green and red represent astatistically significant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20)interactions respectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative geneticinteractions Nodes (balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heatmaps represent scores of interactions within one process and the rectangle heatmaps represent scores of interactions betweentwo processes
indicate the significant genetic interactions between genesin different gene sets Following this clue by analyzing thematrix S we found much evidence of genes involved in thesame functionalmodules andmany cross talks between func-tional modules (Figure 6) Actually many of them supportthe network in Figures 4-5 The information of gene pairswith extreme 119878 scores and the involved modules could befound in the Supplementary Material (Supplementary Data2)
Strategy 2 of sparse matrix analysis is similar to strategy 1(Figure 3) First for every row module (the same definitionas that in Figure 3) cluster the column genes based ontheir genetic spectrums across genes in this row moduleThen select the column gene sets in which there are genesbelonging to nonzero entries of sparse matrix to be definedas column modules Finally map these column modules toGO items identifying their enriched functional categories(hypergeometric test) Similarly for all row modules repeat
8 BioMed Research International
PRP4
YRA1
CDC73
LGE1SDC1
RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1
SWD1RTT109
SWD3BRE1EAF5EAF7KAP122
SSD1TAD3EAF6
SWR1VPS71HTZ1EAF3CUL3
SEM1THP2SAC3THP1
PUF4GLC7SRC1
TOP1NUP60EFB1DBP5HIR2
SEM
1TH
P2SA
C3TH
P1
PUF4
GLC
7SR
C1
TOP1
NU
P60
EFB1
DBP
5H
IR2
RTT103
GBP2
CTK1
CTK2
SIF2HOS2SET3LEO1NPL3
UBP6DOA1RPN4
RAD23
PRP4
YRA1
CDC73
LGE1
SDC1
RXT2PHO23
SAP30
RXT2
PHO23
SAP3
0
SIN3
TAD3
DST1
CSE2
MED1
SWD1
RTT109
SWD3BRE1
EAF5EAF7
KAP122
SSD1
SWR1VPS71
CUL3
SOP1
EAF6
SET3
HTZ1
EAF3
RTT103
GBP2
CTK1
CTK2
SIF2
HOS2
SIN
3
LEO1
NPL3
DOA1UBP6
RPN4
RAD23
CBC2
CSN12
LEA1
CWC21
YHC1
PRP19
ELP6
PRP11
CWC27NTC20
IKI1
ELP2
ELP3IKI3
ELP4
ECM2
MUD2TGS1
SNU56
BRR1
BUD13
MLP2SYF2
YNR004WLIN1
SNU66PML1
YPR152C
URM1
ELP6
IKI1
ELP2
ELP3
IKI3
ELP4
URM
1
MUD1
HMT1
GIM5
YKE2GIM3
GIM4
GIM5YKE2GIM3GIM4
ELP3
ELP6
IKI1
GIM
5
YKE2
GIM
3
GIM
4
HTZ
1
SWR1
VPS
71
SGF1
1
TGS1
SKI7
HCR
1
RPS8
NPL
3
DRS
1
IST3
TIF35
ISY1
NAM8
CBC2
CSN12
LEA1
CWC21
YHC1
PRP11
CWC27
NTC20
ECM2
MUD1
TGS1
SNU56BRR1
BUD13
MLP2
SYF2
YNR004W
LIN1
SNU66
PML1
YPR152C
MUD2
HMT1
IST3
TIF35
ISY1
NAM8
PRP4
YRA
1
CDC7
3LG
E1
SSD
1
RXT2
PHO
23SA
P30
SIN
3D
ST1
CSE2
MED
1SO
H1
SWD
1
RTT1
09
SWD
3BR
E1
EAF5
EAF7
EAF6
KAP1
22
SDC1
TAD
3
SWR1
VPS
71H
TZ1
EAF3
CUL3
RTT1
03
GBP
2
CTK1
CTK2
SIF2
HO
S2SE
T3LE
O1
NPL
3
UBP
6D
OA
1
RPN
4
RAD
23
RNA splicing
Transcription
Transcription
transcription
RNA transport RNA localization
Intracellulartransport
Ribonucleoproteincomplex
biogenesis
Ribonucleoproteincomplex
biogenesis
Deadenylationdependentdecapping of
nucleartranscribed
mRNA
biogenesis
Regulation ofcellular protein
metabolic ncRNA
mRNA catabolic
metabolic
Ribosomeprocess
process
process
processprocess
process
Posttranscriptionalregulation of
gene expression
dependentmacromolecule
catabolic
tRNAmodification
Modification
process
rRNA metabolic
mRNA metabolic
process
process
process
process
Macromoleculecatabolicprocess
aciddeacetylation
deacetylation
Protein amino
RNA catabolic
Macromoleculecatabolic
tRNA metabolic
Chromatinmodification
tRNA wobbleuridine
modification
rRNA metabolicMacromolecularcomplex subunit
organizationCellular protein
complexassembly
ncRNAmetabolic
Cellular proteinlocalization
Regulation oftranslation
Regulation of
from nucleusmRNA export Histone
Maturation ofSSU rRNA
tRNA
Tubulin complexassembly
RNA capping
processing
RNAprocessing
PRP19
aminoacylation
mRNA 3998400end
3998400end
Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes
the above steps. This yields the connections between different functional modules (Figure 6).
Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152, as well as physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects the cross talk between them. Another striking finding is the clearly negative interactions between protein folding and the mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4, as reported in [9, 29, 30]; with GIM5, as reported in [29]; and with YKE2, as reported in [29, 30]. NAB2 was clustered together with PAN3 but showed no obvious genetic interactions with protein folding genes in the original dataset; however, it does have physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although genes involved in regulation of transcription present low S scores between each other, they are enriched in the same GO functional item, regulation of transcription (p value = 0.017813). More results can be found in the Supplementary Material.
[Figure 6 panels: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process.]
Figure 6: Functional associations between functional modules identified by sparse information. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions. The emphasized (thicker) lines are the significant S scores in the sparse matrix.
6 Conclusion
In this paper, we have introduced a method named “LRSDec” to identify gene modules and the cross talks between them in a genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part L and the sparse part S of the original matrix X. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, “GoDec,” which has been shown to outperform other existing decomposition algorithms [13]. We then applied our algorithm to a genetic interaction dataset to identify modules and the cross talks between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which other decomposition algorithms cannot accomplish. LRSDec is not limited to yeast genetic interaction data. As long as a dataset has an internal low-rank structure and some sparse information, the LRSDec algorithm can decompose the data matrix into the sum of two matrices, which can then be analyzed separately. The algorithm could be used widely, for example, in genetic interaction data analysis and image processing. We also applied it to the genetic interaction data of C. elegans; see the Supplementary Material.
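To make the idea of splitting a matrix with missing entries into a low-rank part and a sparse part concrete, the following is a minimal sketch only. It is not the authors' LRSDec implementation: it alternates a singular-value shrinkage step for L with an entrywise soft-threshold step for S, and fills missing entries with the current low-rank estimate. The function names and the parameters `lam_L`, `lam_S`, and `n_iter` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise shrinkage: move every entry toward zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svd_shrink(A, tau):
    """Shrink the singular values of A by tau (low-rank approximation)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def lowrank_sparse_sketch(X, lam_L=1.0, lam_S=0.5, n_iter=200):
    """Alternately fit X ~ L + S on observed entries; NaN marks missing.

    Hypothetical illustration, not the LRSDec algorithm itself.
    """
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)
    L = np.zeros_like(X0)
    S = np.zeros_like(X0)
    for _ in range(n_iter):
        # Impute missing entries with the current low-rank estimate,
        # and remove the current sparse part from the observed entries.
        filled = np.where(observed, X0 - S, L)
        L = svd_shrink(filled, lam_L)
        # The sparse part is only estimated on observed entries.
        S = np.where(observed, soft_threshold(X0 - L, lam_S), 0.0)
    return L, S
```

After the loop, `L` has a value at every position, including the missing ones, so it doubles as an imputation of the data, while `S` keeps only the entries that the low-rank structure does not explain.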
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, “XV. The correlation between relatives on the supposition of Mendelian inheritance,” Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., “Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map,” Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St. Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, “Defining genetic interaction,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., “Systematic genetic analysis with ordered arrays of yeast deletion mutants,” Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., “A robust toolkit for functional profiling of the yeast genome,” Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, “A strategy for extracting and analyzing large-scale quantitative epistatic interaction data,” Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., “Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae,” Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., “Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast,” Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., “A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing,” Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, “A gradient descent algorithm on the Grassman manifold for matrix completion,” http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, “GoDec: randomized low-rank & sparse matrix decomposition in noisy case,” in Proceedings of the 28th International Conference on Machine Learning (ICML ’11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, “Spectral regularization algorithms for learning large incomplete matrices,” The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., “Global mapping of the yeast genetic interaction network,” Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, “Imputing missing values for genetic interaction data,” Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, “How and when should interactome-derived clusters be used to predict functional modules and protein function?” Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., “Genetic and physical maps of Saccharomyces cerevisiae,” Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, “A splicing-dependent transcriptional checkpoint associated with prespliceosome formation,” Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., “Yeast and human mitochondrial helicases,” Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, “Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes,” ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, “The yeast SR-like protein Npl3 links chromatin modification to mRNA processing,” PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, “Intronic noncoding RNAs and splicing,” Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, “Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11,” Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, “Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae,” Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, “Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11,” Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, “Integrating multiple types of data to predict novel cell cycle-related genes,” BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, “Lumenal interactions in nuclear pore complex assembly and stability,” Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., “The genetic landscape of a cell,” Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., “Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, “Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure,” The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
Figure 4: Global view of the genetic cross talks between different RNA-related processes (GO slim items). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangle heat maps represent scores of interactions between two processes.
indicate the significant genetic interactions between genes in different gene sets. Following this clue, by analyzing the matrix S, we found much evidence of genes involved in the same functional modules and many cross talks between functional modules (Figure 6). In fact, many of them support the network in Figures 4 and 5. The gene pairs with extreme S scores and the modules involved can be found in the Supplementary Material (Supplementary Data 2).
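As a small illustration of how gene pairs with extreme scores might be listed from a sparse matrix, the sketch below scans the entries and labels them using the cutoffs quoted in the figure captions (S ≤ −2.5 for negative, S > 2.0 for positive). Applying these cutoffs directly to the entries of the sparse part is an assumption made here for illustration, and the function name, gene labels, and values are hypothetical.

```python
def extreme_pairs(S, row_genes, col_genes, neg_cut=-2.5, pos_cut=2.0):
    """List (row gene, column gene, score, sign) for entries beyond the cutoffs.

    S is a list of rows (or any nested sequence of numbers). The cutoffs
    follow the figure captions; using them on the sparse part is an assumption.
    """
    pairs = []
    for i, row in enumerate(S):
        for j, score in enumerate(row):
            if score <= neg_cut:
                pairs.append((row_genes[i], col_genes[j], score, "negative"))
            elif score > pos_cut:
                pairs.append((row_genes[i], col_genes[j], score, "positive"))
    # Strongest interactions first, regardless of sign.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
```

Sorting by absolute score puts the most extreme pairs at the top, which is convenient when only the strongest candidates are inspected by hand.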
Strategy 2 of the sparse matrix analysis is similar to strategy 1 (Figure 3). First, for every row module (defined as in Figure 3), cluster the column genes based on their genetic spectra across the genes in this row module. Then, select the column gene sets that contain genes with nonzero entries in the sparse matrix and define them as column modules. Finally, map these column modules to GO items to identify their enriched functional categories (hypergeometric test). Similarly, repeat the above steps for all row modules.
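The selection and enrichment steps of strategy 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the column clusters are taken as given (the clustering itself is not shown), the enrichment test is the standard one-sided hypergeometric tail, and all function and variable names are hypothetical rather than the authors' code.

```python
from math import comb

def select_column_modules(clusters, sparse_cols):
    """Keep clusters containing at least one column gene with a nonzero sparse entry."""
    return [c for c in clusters if any(g in sparse_cols for g in c)]

def hypergeom_pvalue(k, M, n, N):
    """P(X >= k) when N genes are drawn from M genes, n of which are annotated."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def enrichment_pvalue(module, annotated, background):
    """One-sided enrichment p value of a GO item within a module.

    module, annotated, and background are sets of gene names; genes outside
    the background are ignored.
    """
    module = module & background
    annotated = annotated & background
    k = len(module & annotated)
    return hypergeom_pvalue(k, len(background), len(annotated), len(module))
```

In practice each selected column module would be tested against every GO item, and the resulting p values would usually be corrected for multiple testing; the paper does not state which correction, if any, was used.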
8 BioMed Research International
PRP4
YRA1
CDC73
LGE1SDC1
RXT2PHO23SAP30SIN3DST1CSE2MED1SOH1
SWD1RTT109
SWD3BRE1EAF5EAF7KAP122
SSD1TAD3EAF6
SWR1VPS71HTZ1EAF3CUL3
SEM1THP2SAC3THP1
PUF4GLC7SRC1
TOP1NUP60EFB1DBP5HIR2
SEM
1TH
P2SA
C3TH
P1
PUF4
GLC
7SR
C1
TOP1
NU
P60
EFB1
DBP
5H
IR2
RTT103
GBP2
CTK1
CTK2
SIF2HOS2SET3LEO1NPL3
UBP6DOA1RPN4
RAD23
PRP4
YRA1
CDC73
LGE1
SDC1
RXT2PHO23
SAP30
RXT2
PHO23
SAP3
0
SIN3
TAD3
DST1
CSE2
MED1
SWD1
RTT109
SWD3BRE1
EAF5EAF7
KAP122
SSD1
SWR1VPS71
CUL3
SOP1
EAF6
SET3
HTZ1
EAF3
RTT103
GBP2
CTK1
CTK2
SIF2
HOS2
SIN
3
LEO1
NPL3
DOA1UBP6
RPN4
RAD23
CBC2
CSN12
LEA1
CWC21
YHC1
PRP19
ELP6
PRP11
CWC27NTC20
IKI1
ELP2
ELP3IKI3
ELP4
ECM2
MUD2TGS1
SNU56
BRR1
BUD13
MLP2SYF2
YNR004WLIN1
SNU66PML1
YPR152C
URM1
ELP6
IKI1
ELP2
ELP3
IKI3
ELP4
URM
1
MUD1
HMT1
GIM5
YKE2GIM3
GIM4
GIM5YKE2GIM3GIM4
ELP3
ELP6
IKI1
GIM
5
YKE2
GIM
3
GIM
4
HTZ
1
SWR1
VPS
71
SGF1
1
TGS1
SKI7
HCR
1
RPS8
NPL
3
DRS
1
IST3
TIF35
ISY1
NAM8
CBC2
CSN12
LEA1
CWC21
YHC1
PRP11
CWC27
NTC20
ECM2
MUD1
TGS1
SNU56BRR1
BUD13
MLP2
SYF2
YNR004W
LIN1
SNU66
PML1
YPR152C
MUD2
HMT1
IST3
TIF35
ISY1
NAM8
PRP4
YRA
1
CDC7
3LG
E1
SSD
1
RXT2
PHO
23SA
P30
SIN
3D
ST1
CSE2
MED
1SO
H1
SWD
1
RTT1
09
SWD
3BR
E1
EAF5
EAF7
EAF6
KAP1
22
SDC1
TAD
3
SWR1
VPS
71H
TZ1
EAF3
CUL3
RTT1
03
GBP
2
CTK1
CTK2
SIF2
HO
S2SE
T3LE
O1
NPL
3
UBP
6D
OA
1
RPN
4
RAD
23
RNA splicing
Transcription
Transcription
transcription
RNA transport RNA localization
Intracellulartransport
Ribonucleoproteincomplex
biogenesis
Ribonucleoproteincomplex
biogenesis
Deadenylationdependentdecapping of
nucleartranscribed
mRNA
biogenesis
Regulation ofcellular protein
metabolic ncRNA
mRNA catabolic
metabolic
Ribosomeprocess
process
process
processprocess
process
Posttranscriptionalregulation of
gene expression
dependentmacromolecule
catabolic
tRNAmodification
Modification
process
rRNA metabolic
mRNA metabolic
process
process
process
process
Macromoleculecatabolicprocess
aciddeacetylation
deacetylation
Protein amino
RNA catabolic
Macromoleculecatabolic
tRNA metabolic
Chromatinmodification
tRNA wobbleuridine
modification
rRNA metabolicMacromolecularcomplex subunit
organizationCellular protein
complexassembly
ncRNAmetabolic
Cellular proteinlocalization
Regulation oftranslation
Regulation of
from nucleusmRNA export Histone
Maturation ofSSU rRNA
tRNA
Tubulin complexassembly
RNA capping
processing
RNAprocessing
PRP19
aminoacylation
mRNA 3998400end
3998400end
Figure 5 Global view of the genetic cross talks between different RNA-related processes (GO BP FAT) Green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactionsrespectively whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions Nodes(balls) correspond to distinct functional processes edges (lines) represent how the processes are genetically connectedThe square heat mapsrepresent scores of interactions within one process and the rectangle heat maps represent scores of interactions between two processes
the above steps Now we got the information of connectionsbetween different functional modules (Figure 6)
Several interesting connections become evident when thedata is analyzed in this way For example there are negativegenetic interactions between SRC1 and POM152 [28] and alsophysical interactions between them [28] We have found thepredominantly negative interactions between RNA transportand maturation of SSU-rRNA (Figure 6(a)) Also we foundnegative interactions between RNA transport and RNA local-ization (Figure 6(b)) where the negative interaction betweenEFB1 and DBP5 revealed in the sparse matrix probablyreflects the cross talk between them Another striking find-ing is the obviously negative interactions between protein
folding and mRNA 31015840 end process (Figure 6(d)) PAN3 hasnegative interactions with GIM4 which has been stated in[9 29 30] with GIM5 which has been stated in [29] andwith YKE2 which has been stated in [29 30] NAB2 wasclustered together with PAN3 but showed no obvious geneticinteractions with protein folding genes in the original datasetbut in fact it has physical interactions with GIM3 GIM4and GIM5 [31] Furthermore we found cross talk betweenprotein folding and regulation of transcription Althoughgenes involved in regulation of transcription present low 119878
score between each other they are enriched in the sameGO functional item regulation of transcription (119901 value =0017813) More results could be found in SupplementaryMaterial
BioMed Research International 9
RNA transportMaturation of
SSU-rRNA
RPS6A
POM152
UAF30
DBP5 NUP157
RPS11B
RPS6B
THP1
SRC1
TOP1
SEM1
NUP60
ENP1
SAC3
PUF4 HIR2
GLC7EFB1
THP2
(a)
RNA transport RNA localization
RRP6
YRA2
DBP5
NUP53
LRP1
THP1SRC1
TOP1
SEM1
NUP60
SAC3
PUF4HIR2
GLC7EFB1
THP2
(b)
Regulation ofgene expression
mRNA metabolicprocess
POP1
PRP42
CDC36SNU71
PAB1
SUP35
PRP3
PRP5
PRP6
PRP21
CDC73
EAF3
RTT103
RTT109
SWD1
SWD3
SDC1SAP30
RXT2
CTK1SET3
RPN4
PRP4
PHO23
NPL3
MED1
LGE1
LEO1
KAP122 DST1CTK2
EAF5HOS2
TAD3CSE2 BRE1
VPS71EAF7EAF6
SOH1
SIF2DOA1
UBP6 SWR1
SSD1
SIN3
HTZ1
(c)
Tubulin complex assembly
(protein folding)mRNA
process3998400 end
NAB2
PAN3
CTK1
GIM3
YKE2
GIM5
GIM4
(d)
Tubulin complex assembly
TOP1
MLP2 HIR2
CDC36CKA1
UTP4
GIM3
YKE2
GIM5
GIM4
(protein folding)Regulation oftranscription
(e)
mRNA splicingncRNA metabolic
process
YPR152C
YNR004WCSN12
MUD1
YHC1
MUD2
NAM8
MSM1
MSD1
MSK1
MRM1
MSF1
PRP19
SNU66
SNU56
PRP11
PML1ECM2
BUD13
HMT1
CWC21CWC27 TGS1
IST3CBC2
ISY1
LIN1
LEA1
MLP2
BRR1
NTC20SYF2
TIF35
(f)
Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix
10 BioMed Research International
6 Conclusion
In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)
References
[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919
[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007
[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008
[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001
[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004
[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative
epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006
[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007
[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008
[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008
[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260
[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006
[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011
[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011
[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010
[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004
[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014
[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009
[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997
[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014
[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013
[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012
[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012
[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008
BioMed Research International 11
[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008
[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007
[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009
[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011
[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011
[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010
[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008
[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
8 BioMed Research International
Figure 5: Global view of the genetic cross talks between different RNA-related processes (GO BP FAT). Green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively, whereas yellow corresponds to cases where there are roughly equal numbers of positive and negative genetic interactions. Nodes (balls) correspond to distinct functional processes; edges (lines) represent how the processes are genetically connected. The square heat maps represent scores of interactions within one process, and the rectangle heat maps represent scores of interactions between two processes.
the above steps. We thus obtained information on the connections between different functional modules (Figure 6).
Several interesting connections become evident when the data are analyzed in this way. For example, there are negative genetic interactions between SRC1 and POM152 [28], as well as physical interactions between them [28]. We found predominantly negative interactions between RNA transport and maturation of SSU-rRNA (Figure 6(a)). We also found negative interactions between RNA transport and RNA localization (Figure 6(b)), where the negative interaction between EFB1 and DBP5 revealed in the sparse matrix probably reflects cross talk between them. Another striking finding is the clearly negative interactions between protein folding and mRNA 3′ end process (Figure 6(d)). PAN3 has negative interactions with GIM4, as reported in [9, 29, 30]; with GIM5, as reported in [29]; and with YKE2, as reported in [29, 30]. NAB2 was clustered together with PAN3 and showed no obvious genetic interactions with protein folding genes in the original dataset, yet it has physical interactions with GIM3, GIM4, and GIM5 [31]. Furthermore, we found cross talk between protein folding and regulation of transcription. Although genes involved in regulation of transcription present low S scores between each other, they are enriched in the same GO functional item, regulation of transcription (p value = 0.017813). More results can be found in the Supplementary Material.
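The comparison between two modules can be illustrated with a short tally of strong negative and positive S scores. This is a minimal sketch, not the authors' pipeline: the function name `tally_cross_talk`, the toy matrix, and the index lists are hypothetical, and the statistical enrichment test used in the paper is not reproduced here. Only the thresholds (S ≤ −2.5 and S > 2.0) are taken from the figure captions.

```python
import numpy as np

NEG_THRESHOLD = -2.5  # S <= -2.5 is treated as a negative interaction (figure captions)
POS_THRESHOLD = 2.0   # S > 2.0 is treated as a positive interaction (figure captions)


def tally_cross_talk(scores, module_a, module_b):
    """Count strong negative and positive S scores between two gene modules.

    scores   : square array of S scores, NaN where a gene pair was not measured
    module_a : indices of the genes in the first module (rows)
    module_b : indices of the genes in the second module (columns)
    """
    block = scores[np.ix_(module_a, module_b)]
    values = block[~np.isnan(block)]  # keep only measured pairs
    return {
        "observed": int(values.size),
        "negative": int(np.sum(values <= NEG_THRESHOLD)),
        "positive": int(np.sum(values > POS_THRESHOLD)),
    }


# Hypothetical 4-gene example: genes 0-1 form one module, genes 2-3 another.
nan = np.nan
toy_scores = np.array([
    [nan,  0.1, -3.0,  2.5],
    [0.1,  nan,  nan, -2.5],
    [-3.0, nan,  nan,  0.3],
    [2.5, -2.5,  0.3,  nan],
])
print(tally_cross_talk(toy_scores, [0, 1], [2, 3]))
```

A pair of modules dominated by the "negative" count would correspond to a green edge in Figures 5 and 6, provided that the enrichment is also statistically significant, which this sketch does not assess.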
Figure 6: Functional associations between functional modules identified by sparse information. Panels: (a) RNA transport and maturation of SSU-rRNA; (b) RNA transport and RNA localization; (c) regulation of gene expression and mRNA metabolic process; (d) tubulin complex assembly (protein folding) and mRNA 3′ end process; (e) tubulin complex assembly (protein folding) and regulation of transcription; (f) mRNA splicing and ncRNA metabolic process. Blue and purplish red represent genes (nodes) involved in different modules. Edges (lines) represent how the processes are genetically connected, where green and red represent a statistically significant enrichment of negative (genetic interaction score [S] ≤ −2.5) and positive (genetic interaction score [S] > 2.0) interactions, respectively. The emphasized (thicker) lines are the significant S scores in the sparse matrix.
6. Conclusion
In this paper, we have introduced a method named “LRSDec” to identify gene modules and the cross talk between them in a genetic interaction network. LRSDec is based on low-rank approximation with regularization parameters and nearly optimal error bounds. We developed LRSDec to estimate the low-rank part L and the sparse part S of the original matrix X. On the synthetic data, LRSDec performed better than another matrix decomposition algorithm, “GoDec,” which has been shown to outperform other existing decomposition algorithms [13]. We then applied our algorithm to a genetic interaction dataset to identify modules and the cross talk between them. After the decomposition, subsequent analysis revealed many novel and biologically meaningful connections. Moreover, LRSDec can impute missing data while decomposing, which cannot be accomplished by the other decomposition algorithms. LRSDec is not limited to yeast genetic interaction data. As long as a dataset has internal low-rank structure and some sparse information, the LRSDec algorithm can be used to decompose the data matrix into the sum of two matrices, which can then be analyzed separately. The algorithm could be applied widely, for example, in genetic interaction data analysis and image processing. We also applied it to the genetic interaction data of C. elegans in the Supplementary Material.
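To make the idea of splitting X into a low-rank part L and a sparse part S while filling in missing entries more concrete, the following is a minimal sketch of a generic alternating soft-thresholding scheme. It is not the authors' LRSDec implementation: the function names, the penalty parameters `lam_low_rank` and `lam_sparse`, and the fixed iteration count are all assumptions made for illustration. Each iteration fills the missing entries with the current low-rank estimate, shrinks the singular values to update L, and then shrinks the observed residual entrywise to update S.

```python
import numpy as np


def soft_threshold(values, threshold):
    """Shrink every entry toward zero by `threshold` (entrywise soft-thresholding)."""
    return np.sign(values) * np.maximum(np.abs(values) - threshold, 0.0)


def low_rank_sparse_sketch(X, lam_low_rank=1.0, lam_sparse=1.0, n_iter=100):
    """Split X (NaN marks a missing entry) into a low-rank part L and a sparse part S.

    Illustrative only; parameter names and defaults are hypothetical.
    """
    observed = ~np.isnan(X)
    X_filled = np.where(observed, X, 0.0)
    L = np.zeros_like(X_filled)
    S = np.zeros_like(X_filled)
    for _ in range(n_iter):
        # Observed entries: data minus sparse part. Missing entries: current L.
        target = np.where(observed, X_filled - S, L)
        U, singular_values, Vt = np.linalg.svd(target, full_matrices=False)
        # Singular-value soft-thresholding yields the low-rank update.
        L = (U * soft_threshold(singular_values, lam_low_rank)) @ Vt
        # The sparse part is defined only on observed entries.
        S = np.where(observed, soft_threshold(X_filled - L, lam_sparse), 0.0)
    return L, S


# Hypothetical usage: a rank-1 matrix with one large outlier and two missing entries.
rng = np.random.default_rng(0)
X_demo = np.outer(rng.normal(size=8), rng.normal(size=8))
X_demo[2, 5] += 20.0
X_demo[0, 1] = np.nan
X_demo[6, 3] = np.nan
L_demo, S_demo = low_rank_sparse_sketch(X_demo, lam_low_rank=0.5, lam_sparse=1.0)
```

In this sketch the entries of L at the missing positions serve as the imputed values, and S is zero wherever X was not observed. The choice of the two penalty parameters controls how much structure is assigned to each part and would need to be tuned for real data.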
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (nos. 31171262, 31471246, and 31428012) and the National Key Basic Research Project of China (no. 2015CB910303).
References
[1] R. A. Fisher, “XV. The correlation between relatives on the supposition of Mendelian inheritance,” Transactions of the Royal Society of Edinburgh, vol. 52, no. 2, pp. 399–433, 1919.
[2] S. R. Collins, K. M. Miller, N. L. Maas et al., “Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map,” Nature, vol. 446, no. 7137, pp. 806–810, 2007.
[3] R. Mani, R. P. St. Onge, J. L. Hartman IV, G. Giaever, and F. P. Roth, “Defining genetic interaction,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 9, pp. 3461–3466, 2008.
[4] A. H. Y. Tong, M. Evangelista, A. B. Parsons et al., “Systematic genetic analysis with ordered arrays of yeast deletion mutants,” Science, vol. 294, no. 5550, pp. 2364–2368, 2001.
[5] X. Pan, D. S. Yuan, D. Xiang et al., “A robust toolkit for functional profiling of the yeast genome,” Molecular Cell, vol. 16, no. 3, pp. 487–496, 2004.
[6] S. R. Collins, M. Schuldiner, N. J. Krogan, and J. S. Weissman, “A strategy for extracting and analyzing large-scale quantitative epistatic interaction data,” Genome Biology, vol. 7, no. 7, article R63, 2006.
[7] S. R. Collins, P. Kemmeren, X.-C. Zhao et al., “Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae,” Molecular & Cellular Proteomics, vol. 6, no. 3, pp. 439–450, 2007.
[8] A. Roguev, S. Bandyopadhyay, M. Zofall et al., “Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast,” Science, vol. 322, no. 5900, pp. 405–410, 2008.
[9] G. M. Wilmes, M. Bergkessel, S. Bandyopadhyay et al., “A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing,” Molecular Cell, vol. 32, no. 5, pp. 735–746, 2008.
[10] R. H. Keshavan and S. Oh, “A gradient descent algorithm on the Grassman manifold for matrix completion,” http://arxiv.org/abs/0910.5260.
[11] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[12] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, article 11, 2011.
[13] T. Zhou and D. Tao, “GoDec: randomized low-rank & sparse matrix decomposition in noisy case,” in Proceedings of the 28th International Conference on Machine Learning (ICML ’11), pp. 33–40, July 2011.
[14] R. Mazumder, T. Hastie, and R. Tibshirani, “Spectral regularization algorithms for learning large incomplete matrices,” The Journal of Machine Learning Research, vol. 11, pp. 2287–2322, 2010.
[15] A. H. Y. Tong, G. Lesage, G. D. Bader et al., “Global mapping of the yeast genetic interaction network,” Science, vol. 303, no. 5659, pp. 808–813, 2004.
[16] Y. Wang, L. Wang, D. Yang, and M. Deng, “Imputing missing values for genetic interaction data,” Methods, vol. 67, no. 3, pp. 269–277, 2014.
[17] J. Song and M. Singh, “How and when should interactome-derived clusters be used to predict functional modules and protein function?” Bioinformatics, vol. 25, no. 23, pp. 3143–3150, 2009.
[18] M. J. Cherry, C. Ball, S. Weng et al., “Genetic and physical maps of Saccharomyces cerevisiae,” Nature, vol. 387, no. 6632, supplement, pp. 67–73, 1997.
[19] K. T. Chathoth, J. D. Barrass, S. Webb, and J. D. Beggs, “A splicing-dependent transcriptional checkpoint associated with prespliceosome formation,” Molecular Cell, vol. 53, no. 5, pp. 779–790, 2014.
[20] R. J. Szczesny, M. A. Wojcik, L. S. Borowski et al., “Yeast and human mitochondrial helicases,” Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1829, no. 8, pp. 842–853, 2013.
[21] K. Khan, U. Karthikeyan, Y. Li, J. Yan, and K. Muniyappa, “Single-molecule DNA analysis reveals that yeast Hop1 protein promotes DNA folding and synapsis: implications for condensation of meiotic chromosomes,” ACS Nano, vol. 6, no. 12, pp. 10658–10666, 2012.
[22] E. A. Moehle, C. J. Ryan, N. J. Krogan, T. L. Kress, and C. Guthrie, “The yeast SR-like protein Npl3 links chromatin modification to mRNA processing,” PLoS Genetics, vol. 8, no. 11, Article ID e1003101, 2012.
[23] J. W. S. Brown, D. F. Marshall, and M. Echeverria, “Intronic noncoding RNAs and splicing,” Trends in Plant Science, vol. 13, no. 7, pp. 335–342, 2008.
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, “Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11,” Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, “Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae,” Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, “Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11,” Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, “Integrating multiple types of data to predict novel cell cycle-related genes,” BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, “Lumenal interactions in nuclear pore complex assembly and stability,” Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., “The genetic landscape of a cell,” Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., “Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, “Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure,” The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 9
RNA transportMaturation of
SSU-rRNA
RPS6A
POM152
UAF30
DBP5 NUP157
RPS11B
RPS6B
THP1
SRC1
TOP1
SEM1
NUP60
ENP1
SAC3
PUF4 HIR2
GLC7EFB1
THP2
(a)
RNA transport RNA localization
RRP6
YRA2
DBP5
NUP53
LRP1
THP1SRC1
TOP1
SEM1
NUP60
SAC3
PUF4HIR2
GLC7EFB1
THP2
(b)
Regulation ofgene expression
mRNA metabolicprocess
POP1
PRP42
CDC36SNU71
PAB1
SUP35
PRP3
PRP5
PRP6
PRP21
CDC73
EAF3
RTT103
RTT109
SWD1
SWD3
SDC1SAP30
RXT2
CTK1SET3
RPN4
PRP4
PHO23
NPL3
MED1
LGE1
LEO1
KAP122 DST1CTK2
EAF5HOS2
TAD3CSE2 BRE1
VPS71EAF7EAF6
SOH1
SIF2DOA1
UBP6 SWR1
SSD1
SIN3
HTZ1
(c)
Tubulin complex assembly
(protein folding)mRNA
process3998400 end
NAB2
PAN3
CTK1
GIM3
YKE2
GIM5
GIM4
(d)
Tubulin complex assembly
TOP1
MLP2 HIR2
CDC36CKA1
UTP4
GIM3
YKE2
GIM5
GIM4
(protein folding)Regulation oftranscription
(e)
mRNA splicingncRNA metabolic
process
YPR152C
YNR004WCSN12
MUD1
YHC1
MUD2
NAM8
MSM1
MSD1
MSK1
MRM1
MSF1
PRP19
SNU66
SNU56
PRP11
PML1ECM2
BUD13
HMT1
CWC21CWC27 TGS1
IST3CBC2
ISY1
LIN1
LEA1
MLP2
BRR1
NTC20SYF2
TIF35
(f)
Figure 6 Functional associations between functionalmodules identified by sparse information Blue andpurplish red represent genes (nodes)involved in differentmodules Edges (lines) represent how the processes are genetically connected where green and red represent a statisticallysignificant enrichment of negative (genetic interaction score [119878] le minus25) and positive (genetic interaction score [119878] gt 20) interactions Theemphasized (thicker) lines are the significant 119878 scores in sparse matrix
10 BioMed Research International
6 Conclusion
In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)
References
[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919
[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007
[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008
[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001
[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004
[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative
epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006
[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007
[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008
[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008
[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260
[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006
[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011
[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011
[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010
[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004
[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014
[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009
[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997
[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014
[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013
[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012
[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012
[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008
BioMed Research International 11
[24] R Zabel C Bar C Mehlgarten and R Schaffrath ldquoYeast 120572-tubulin suppressor Ats1Kti13 relates to the Elongator complexand interacts with Elongator partner protein Kti11rdquo MolecularMicrobiology vol 69 no 1 pp 175ndash187 2008
[25] J Camblong N Iglesias C Fickentscher G Dieppois andF Stutz ldquoAntisense RNA stabilization induces transcriptionalgene silencing via histone deacetylation in S cerevisiaerdquo Cellvol 131 no 4 pp 706ndash717 2007
[26] S A Johnson G Cubberley and D L Bentley ldquoCotranscrip-tional recruitment of the mrna export factor yra1 by directinteraction with the 31015840 end processing factor pcf11rdquo MolecularCell vol 33 no 2 pp 215ndash226 2009
[27] L Wang L Hou M Qian F Li and M Deng ldquoIntegratingmultiple types of data to predict novel cell cycle-related genesrdquoBMC Systems Biology vol 5 supplement 1 article S9 2011
[28] W T Yewdell P Colombi T Makhnevych and C P LuskldquoLumenal interactions in nuclear pore complex assembly andstabilityrdquo Molecular Biology of the Cell vol 22 no 8 pp 1375ndash1388 2011
[29] M Costanzo A Baryshnikova J Bellay et al ldquoThe geneticlandscape of a cellrdquo Science vol 327 no 5964 pp 425ndash431 2010
[30] S J Dixon Y Fedyshyn J L Y Koh et al ldquoSignificant conser-vation of synthetic lethal genetic interaction networks betweendistantly related eukaryotesrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no43 pp 16653ndash16658 2008
[31] J Batisse C Batisse A Budd B Bottcher and E HurtldquoPurification of nuclear poly(A)-binding protein Nab2 revealsassociation with the yeast transcriptome and a messengerribonucleoprotein core structurerdquo The Journal of BiologicalChemistry vol 284 no 50 pp 34911ndash34917 2009
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
10 BioMed Research International
6 Conclusion
In this paper we have introduced amethod named ldquoLRSDecrdquoto identify gene modules and cross talks between them in thegenetic interaction network LRSDec is based on low-rankapproximation with regularization parameters and nearlyoptimal error bounds We developed LRSDec to estimate thelow-rank part L and the sparse part S of the originalmatrixXIn the synthetic data LRSDec performed better than anothermatrix decomposition algorithm ldquoGoDecrdquo which has beenshown to be other existing decomposition algorithms [13]Thenwe applied our algorithm to a genetic interaction datasetto identify modules and cross talks between them Afterthe decomposition subsequent analysis revealed many noveland biologically meaningful connections Moreover LRSDeccould impute missing data while decomposing which couldnot be accomplished by other decomposition algorithmsActually LRSDec will not be limited by the yeast geneticinteraction data As long as the dataset has internal low-rank structure and some sparse information we can usethe LRSDec algorithm to decompose the data matrix intoaddition of two matrixes and then analyze them separatelyThis algorithm could be used widely in the field of geneticinteraction data analysis image processing and so on Wealso had a try on the genetic interaction data of C elegans inthe Supplementary Material
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work is supported by the National Natural ScienceFoundation of China (nos 31171262 31471246 and 31428012)and the National Key Basic Research Project of China (no2015CB910303)
References
[1] R A Fisher ldquoXVmdashthe correlation between relatives on thesupposition ofmendelian inheritancerdquoTransactions of the RoyalSociety of Edinburgh vol 52 no 2 pp 399ndash433 1919
[2] S R Collins K M Miller N L Maas et al ldquoFunctionaldissection of protein complexes involved in yeast chromosomebiology using a genetic interaction maprdquo Nature vol 446 no7137 pp 806ndash810 2007
[3] R Mani R P St Onge J L Hartman IV G Giaever and F PRoth ldquoDefining genetic interactionrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no9 pp 3461ndash3466 2008
[4] A H Y Tong M Evangelista A B Parsons et al ldquoSystematicgenetic analysis with ordered arrays of yeast deletion mutantsrdquoScience vol 294 no 5550 pp 2364ndash2368 2001
[5] X Pan D S Yuan D Xiang et al ldquoA robust toolkit forfunctional profiling of the yeast genomerdquo Molecular Cell vol16 no 3 pp 487ndash496 2004
[6] S R Collins M Schuldiner N J Krogan and J S WeissmanldquoA strategy for extracting and analyzing large-scale quantitative
epistatic interaction datardquo Genome Biology vol 7 no 7 articleR63 2006
[7] S R Collins P Kemmeren X-C Zhao et al ldquoToward a com-prehensive atlas of the physical interactome of Saccharomycescerevisiaerdquo Molecular amp Cellular Proteomics vol 6 no 3 pp439ndash450 2007
[8] A Roguev S Bandyopadhyay M Zofall et al ldquoConservationand rewiring of functionalmodules revealed by an epistasismapin fission yeastrdquo Science vol 322 no 5900 pp 405ndash410 2008
[9] G M Wilmes M Bergkessel S Bandyopadhyay et al ldquoAgenetic interaction map of rna-processing factors reveals linksbetween sem1dss1-containing complexes and mrna export andsplicingrdquoMolecular Cell vol 32 no 5 pp 735ndash746 2008
[10] R H Keshavan and S Oh ldquoA gradient descent algorithm onthe grassmanmanifold formatrix completionrdquo httparxivorgabs09105260
[11] D L Donoho ldquoCompressed sensingrdquo IEEE Transactions onInformation Theory vol 52 no 4 pp 1289ndash1306 2006
[12] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo JACMmdashJournal of the ACM vol 58 no3 article 11 2011
[13] T Zhou and D Tao ldquoGoDec randomized low-rank amp sparsematrix decomposition in noisy caserdquo in Proceedings of the 28thInternational Conference on Machine Learning (ICML rsquo11) pp33ndash40 July 2011
[14] R Mazumder T Hastie and R Tibshirani ldquoSpectral regular-ization algorithms for learning large incomplete matricesrdquo TheJournal of Machine Learning Research vol 11 pp 2287ndash23222010
[15] A H Y Tong G Lesage G D Bader et al ldquoGlobal mappingof the yeast genetic interaction networkrdquo Science vol 303 no5659 pp 808ndash813 2004
[16] Y Wang L Wang D Yang and M Deng ldquoImputing missingvalues for genetic interaction datardquo Methods vol 67 no 3 pp269ndash277 2014
[17] J Song and M Singh ldquoHow and when should interactome-derived clusters be used to predict functional modules andprotein functionrdquo Bioinformatics vol 25 no 23 pp 3143ndash31502009
[18] M J Cherry C Ball S Weng et al ldquoGenetic and physicalmaps of Saccharomyces cerevisiaerdquo Nature vol 387 no 6632supplement pp 67ndash73 1997
[19] K T Chathoth J D Barrass S Webb and J D Beggs ldquoAsplicing-dependent transcriptional checkpoint associated withprespliceosome formationrdquo Molecular Cell vol 53 no 5 pp779ndash790 2014
[20] R J Szczesny M A Wojcik L S Borowski et al ldquoYeastand human mitochondrial helicasesrdquo Biochimica et BiophysicaActamdashGene Regulatory Mechanisms vol 1829 no 8 pp 842ndash853 2013
[21] K Khan U Karthikeyan Y Li J Yan and K MuniyappaldquoSingle-molecule DNA analysis reveals that yeast Hop1 proteinpromotes DNA folding and synapsis Implications for conden-sation of meiotic chromosomesrdquo ACS Nano vol 6 no 12 pp10658ndash10666 2012
[22] E A Moehle C J Ryan N J Krogan T L Kress andC Guthrie ldquoThe yeast sr-like protein npl3 links chromatinmodification to mrna processingrdquo PLoS Genetics vol 8 no 11Article ID e1003101 2012
[23] J W S Brown D F Marshall and M Echeverria ldquoIntronicnoncoding RNAs and splicingrdquo Trends in Plant Science vol 13no 7 pp 335ndash342 2008
BioMed Research International 11
[24] R. Zabel, C. Bar, C. Mehlgarten, and R. Schaffrath, “Yeast α-tubulin suppressor Ats1/Kti13 relates to the Elongator complex and interacts with Elongator partner protein Kti11,” Molecular Microbiology, vol. 69, no. 1, pp. 175–187, 2008.
[25] J. Camblong, N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz, “Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae,” Cell, vol. 131, no. 4, pp. 706–717, 2007.
[26] S. A. Johnson, G. Cubberley, and D. L. Bentley, “Cotranscriptional recruitment of the mRNA export factor Yra1 by direct interaction with the 3′ end processing factor Pcf11,” Molecular Cell, vol. 33, no. 2, pp. 215–226, 2009.
[27] L. Wang, L. Hou, M. Qian, F. Li, and M. Deng, “Integrating multiple types of data to predict novel cell cycle-related genes,” BMC Systems Biology, vol. 5, supplement 1, article S9, 2011.
[28] W. T. Yewdell, P. Colombi, T. Makhnevych, and C. P. Lusk, “Lumenal interactions in nuclear pore complex assembly and stability,” Molecular Biology of the Cell, vol. 22, no. 8, pp. 1375–1388, 2011.
[29] M. Costanzo, A. Baryshnikova, J. Bellay et al., “The genetic landscape of a cell,” Science, vol. 327, no. 5964, pp. 425–431, 2010.
[30] S. J. Dixon, Y. Fedyshyn, J. L. Y. Koh et al., “Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 43, pp. 16653–16658, 2008.
[31] J. Batisse, C. Batisse, A. Budd, B. Bottcher, and E. Hurt, “Purification of nuclear poly(A)-binding protein Nab2 reveals association with the yeast transcriptome and a messenger ribonucleoprotein core structure,” The Journal of Biological Chemistry, vol. 284, no. 50, pp. 34911–34917, 2009.