7
Ribonucleoprotein particles: advances and challenges in computational methods Shlomi Dvir 1 , Amir Argoetti 1 and Yael Mandel-Gutfreund 1,2 RNA-binding proteins (RBPs) interact with RNA to form Ribonucleoprotein Particles (RNPs). The interaction between RBPs and their RNA partners are traditionally thought to be mediated by highly conserved RNA-binding domains (RBDs). Recently, high-throughput studies led to the discovery of hundreds of novel proteins and domains, of which many do not follow the classical definition of RNA-binding. Despite technological innovations, experimental screenings are currently limited to the detection of specific types of RNPs, underscoring the importance of computational methods for predicting novel RBPs and RNA interacting residues and interfaces. Here, we discuss major challenges in computational prediction of RBPs and RBDs and outline new strategies to circumvent current limitations of experimental techniques. Addresses 1 Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel 2 Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel Corresponding author: Mandel-Gutfreund, Yael ([email protected]) Current Opinion in Structural Biology 2018, 53:124–130 This review comes from a themed issue on Protein–nucleic acid interactions Edited by Eric Westhof and Dinshaw Patel https://doi.org/10.1016/j.sbi.2018.08.002 0959-440X/ã 2018 Elsevier Ltd. All rights reserved. Introduction Within cells, RNA and proteins assemble into dynamic nucleic-acid protein complexes named Ribonucleoprotein Particles (RNPs) [1]. Traditionally, RNPs are thought to be formed by interactions between RNA and RNA-binding proteins (RBPs) that harbor evolutionary conserved RNA- binding domains (RBDs). RBDs are the basic units respon- sible for RNA recognition and binding and act in a combi- natorial fashion to perform multiple functions [2]. The RNA Recognition Motif (RRM) [3] is the most abundant RBD in mammalian cells, binding a variety of RNA sequences and structures via diverse modes of recognition [4]. In addition to RRM, other well-conserved classical RBDs exist (for details, see [2]). In 2002, Anantharaman et al. provided the first comprehensive list of 100 evolu- tionary conserved domains involved in RNA metabolism [5]. More recently, Gerstberger et al. extended the list to 799 RBDs that were either experimentally confirmed to bind RNA or found exclusively in well-characterized RNPs, assembling the first census of 1,542 human RBPs [6]. A revolution in the field of RBPs came from high- throughput RNA interactome capture (RIC) studies [7 ]. The first experiments in human cell lines revealed hun- dreds of new RBPs that form complexes with polyadeny- lated [poly(A)] RNAs [8,9]. Variations of the initial meth- odology have been applied to screen for RBPs in other cells types and organisms, generating an unprecedented atlas of RBPs (for a recent comprehensive review, see Hentze et al. [7 ]). To extend our understanding of protein-RNA rec- ognition other complementary methods were developed, specifying the RNA-binding regions within proteins at peptide-level resolution, for example, RNP xl [10], RBDmap [11] and RBR-ID [12]. Whereas screening studies have significantly broadened the repertoire of RBPs, they mainly enabled the discovery of proteins that bind stable poly(A) RNA (e.g. eukaryotic mRNAs and long non-coding RNAs). RBPs that likely escaped discovery by RIC experiments include proteins that bind non-poly(A) RNAs, proteins that regulate small RNAs, bacterial RNAs, and most eukaryotic organelle RNAs. Recently, two studies established a new methodol- ogy to capture RBPs, independent of the polyadenylation state of RNA, taking advantage of click chemistry to capture metabolically labeled RNAs [13,14 ]. Notably, although the total number of RBPs did not change radically, the latter studies provided a valuable list of novel proteins, adding to the growing compendium of RBPs generated by manual curation efforts, homology-based computational analysis, and proteome-wide discovery. Despite the increasing number of novel RBPs and RNPs identified, limitations in existing technologies suggest that many RBPs are yet to be discovered. In this review, we discuss the challenges faced by researchers studying RBPs and protein-RNA interactions. We further present an overview of the available state-of-the-art computational approaches for discovering novel RBPs, identifying RNA-binding sites in proteins and predicting modes of binding. Challenges in predicting RNA-binding proteins RNA-binding proteins may lack known RNA-binding domains Consistent with early findings [6], a large fraction of the RBPs identified in screening experiments do not harbor a Available online at www.sciencedirect.com ScienceDirect Current Opinion in Structural Biology 2018, 53:124–130 www.sciencedirect.com

Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

Ribonucleoprotein particles: advances and challengesin computational methodsShlomi Dvir1, Amir Argoetti1 and Yael Mandel-Gutfreund1,2

Available online at www.sciencedirect.com

ScienceDirect

RNA-binding proteins (RBPs) interact with RNA to form

Ribonucleoprotein Particles (RNPs). The interaction between

RBPs and their RNA partners are traditionally thought to be

mediated by highly conserved RNA-binding domains (RBDs).

Recently, high-throughput studies led to the discovery of

hundreds of novel proteins and domains, of which many do not

follow the classical definition of RNA-binding. Despite

technological innovations, experimental screenings are

currently limited to the detection of specific types of RNPs,

underscoring the importance of computational methods for

predicting novel RBPs and RNA interacting residues and

interfaces. Here, we discuss major challenges in computational

prediction of RBPs and RBDs and outline new strategies to

circumvent current limitations of experimental techniques.

Addresses1 Faculty of Biology, Technion-Israel Institute of Technology, Haifa

32000, Israel2Department of Computer Science, Technion-Israel Institute of

Technology, Haifa 32000, Israel

Corresponding author: Mandel-Gutfreund, Yael ([email protected])

Current Opinion in Structural Biology 2018, 53:124–130

This review comes from a themed issue on Protein–nucleic acid

interactions

Edited by Eric Westhof and Dinshaw Patel

https://doi.org/10.1016/j.sbi.2018.08.002

0959-440X/ã 2018 Elsevier Ltd. All rights reserved.

IntroductionWithin cells, RNA and proteins assemble into dynamic

nucleic-acid protein complexes named Ribonucleoprotein

Particles (RNPs) [1]. Traditionally, RNPs are thought to be

formed by interactions between RNA and RNA-binding

proteins (RBPs) that harbor evolutionary conserved RNA-

binding domains (RBDs). RBDs are the basic units respon-

sible for RNA recognition and binding and act in a combi-

natorial fashion to perform multiple functions [2]. The

RNA Recognition Motif (RRM) [3] is the most abundant

RBD in mammalian cells, binding a variety of RNA

sequences and structures via diverse modes of recognition

[4]. In addition to RRM, other well-conserved classical

RBDs exist (for details, see [2]). In 2002, Anantharaman

Current Opinion in Structural Biology 2018, 53:124–130

et al. provided the first comprehensive list of �100 evolu-

tionary conserved domains involved in RNA metabolism

[5]. More recently, Gerstberger et al. extended the list to

799 RBDs that were either experimentally confirmed to

bind RNA or found exclusively in well-characterized

RNPs, assembling the first census of 1,542 human RBPs

[6]. A revolution in the field of RBPs came from high-

throughput RNA interactome capture (RIC) studies [7��].The first experiments in human cell lines revealed hun-

dreds of new RBPs that form complexes with polyadeny-

lated [poly(A)] RNAs [8,9]. Variations of the initial meth-

odology have been applied to screen for RBPs in other cells

types and organisms, generating an unprecedented atlas of

RBPs (for a recent comprehensive review, see Hentze et al.[7��]). To extend our understanding of protein-RNA rec-

ognition other complementary methods were developed,

specifying the RNA-binding regions within proteins at

peptide-level resolution, for example, RNPxl [10],

RBDmap [11] and RBR-ID [12].

Whereas screening studies have significantly broadened

the repertoire of RBPs, they mainly enabled the discovery

of proteins that bind stable poly(A) RNA (e.g. eukaryotic

mRNAs and long non-coding RNAs). RBPs that likely

escaped discovery by RIC experiments include proteins

that bind non-poly(A) RNAs, proteins that regulate small

RNAs, bacterial RNAs, and most eukaryotic organelle

RNAs. Recently, two studies established a new methodol-

ogy to capture RBPs, independent of the polyadenylation

state of RNA, taking advantage of click chemistry to

capture metabolically labeled RNAs [13,14�]. Notably,

althoughthetotalnumber ofRBPs didnotchangeradically,

the latter studies provided a valuable list of novel proteins,

adding to the growing compendium of RBPs generated by

manual curation efforts, homology-based computational

analysis, and proteome-wide discovery. Despite the

increasing number of novel RBPs and RNPs identified,

limitations in existing technologies suggest that many

RBPs are yet to be discovered. In this review, we discuss

the challenges faced by researchers studying RBPs and

protein-RNA interactions. We further present an overview

of the available state-of-the-art computational approaches

for discovering novel RBPs, identifying RNA-binding sites

in proteins and predicting modes of binding.

Challenges in predicting RNA-binding proteinsRNA-binding proteins may lack known RNA-binding

domains

Consistent with early findings [6], a large fraction of the

RBPs identified in screening experiments do not harbor a

www.sciencedirect.com

Page 2: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

Ribonucleoprotein particles: a theoretical perspective Dvir, Argoetti and Mandel-Gutfreund 125

known RBD [8,9]. This observation was recapitulated in

other studies from different cell types and organisms [7��].Figure 1 depicts a summary of the RBPs detected in

screening experiments for six major organisms. Based on

their domain content, RBPs wereclassified as RBD(blue) if

they contained at least one known RNA-binding domain,

RBD-unknown (green) if they did not harbor a known

RBD, and no-Pfam-domain (purple) when no domains

could be detected (for a detailed list, see Supplementary

data). Notably, between 42% and 64% of RBPs lack known

RBDs in human and roundworm, respectively (Figure 1;

green and purple fractions). The observed differences in

the fraction of proteins that are classified as RBD-unknown

can be attributed to experimental coverage, as well as to the

total number of characterized RBDs in each organism.

Despite these differences, the high proportion of proteins

that lack known RBDs reinforces the complexity and

challenges associated with the computational prediction

of RNA-binding function in organisms in which experi-

mental data on RBPs is limited.

As previously reported and shown in Figure 1, many

proteins detected in screens for RBPs do not possess a

structured Pfam domain [7��]. Moreover, experimentally

Figure 1

Human Mouse Yeast

Fruit fly Thale cress Roundworm

RBD RBD-unknown No pfam domain

58.38% 45.58% 37.64%

4.47%

2.96%49.95% 56.14%38.66%

49.29% 45.27% 36.09%

61.38%

6.82%

47.91%44.79%

5.92%

2.53%

6.22%

Current Opinion in Structural Biology

RBP domain distribution across major eukaryotic kingdoms. Fraction

of proteins harboring RNA-binding domains (RBDs) in high-throughput

screening studies for RBPs. Protein domains were obtained from Pfam

release 31.0. Domains were classified as RNA-binding (RBD) or

unknown to function in RNA recognition. See Supplemetary data for

the definition of RBDs (primarily based on Gerstberger et al. [6]). A

protein was defined as RBD (light blue) if it had at least one RNA-

binding domain or RBD-unknown if no RBD was found (light green).

Proteins for which no Pfam domains could be identified were

classified as no-Pfam-domain (purple). Analyzed organisms include

Human (Homo sapiens, Taxonomy ID: 9606), Mouse (Mus musculus,

Taxonomy ID: 10090), Yeast (Saccharomyces cerevisiae, Taxonomy

ID: 559292), Fruit fly (Drosophila melanogaster, Taxonomy ID: 7227),

Thale cress (Arabidopsis thaliana, Taxonomy ID: 3702) and

Roundworm (Caenorhabditis elegans, Taxonomy ID: 6239). The full list

of proteins and their categories is provided in Supplementary data.

www.sciencedirect.com

detected RBPs were shown to be highly enriched in Intrin-

sically Disordered Regions (IDRs) suggesting that, in many

cases, protein-RNA interactions within RNP complexes

are mediated by non-structured polypeptide chains

[15,16�]. In line with these findings, methods that are

dedicated to identifying RBDs have shown that approxi-

mately half of the protein regions that directly interact with

RNA are mapped to sequences with low amino acid com-

plexity and high disorder propensity [11,12]. While experi-

mental evidence strongly indicates that IDRs are charac-

teristic of RBPs, this feature is by no means unique to RBPs

[15]. For example, IDRs are commonly found in the tails of

DNA-Binding Proteins (DBPs) and were suggested to

facilitate the process of transcription factor-mediated

DNA search [17]. Thus, although disorder propensity is

a characteristic feature of RBPs, it cannot be used exclu-

sively for either predicting novel RBPs or identifying RNA

binding sites within known RBPs.

Classical RNA-binding domains do not necessarily bind

RNA

Extensive structural studies of protein-RNA complexes

revealed general principles that guide RNA recognition

via classical RBDs [18]. The accumulation of X-ray crys-

tallography- and NMR-based structural data for the most

abundant RBDs in mammalian cells (RRM, KH), pro-

vides valuable insights into the diverse and complex

modes of interaction adopted by these domain families

[3,19]. Notably, while RRM, KH and other classical

RBDs have evolved to bind RNA, members of these

families were shown to participate in other functions.

Some RRM domains (also known as atypical RRMs) have

been implicated in mediating protein-protein interac-

tions, rather than binding RNA, while others bind

RNA as well as proteins [20]. A well-characterized exam-

ple is the RRM domain of Y14. This domain directly

interacts with the Mago protein, as part of the Exon

Junction Complex (EJC) and has no direct contact with

RNA [21] (Figure 2a). Examples of several other RRM

domains that have been shown to mediate protein-protein

interactions are illustrated in Figure 2. Overall, the obser-

vations that classical RNA-binding domains do not nec-

essarily bind RNA raise a major question as to what extent

can we rely on functional annotation of RNA-binding that

is based solely on the presence of classical RBDs.

Many RBPs act as moonlighting proteins

An intriguing conclusion arising from previous RIC experi-

ments is that many of the detected proteins have other non-

RNA-related functions. In many cases, these proteins were

not previously shown to bind RNA [7��]. Among the

significantly enriched, non-RNA-binding functions, are

glycolytic enzymatic activity and intermediary metabo-

lism, specifically enriched in the yeast [22,23] and Arabi-dopsis mRNA interactome, respectively [24]. A well-known

dual-function protein in the intermediary metabolic path-

way is the Iron Regulatory Protein 1 (IRP1) [25]. Other

Current Opinion in Structural Biology 2018, 53:124–130

Page 3: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

126 Protein–nucleic acid interactions

Figure 2

Y14-MagoFruit fly

U2AF2Human

RBM17Human

NCBP2-NCBP1Human

U2AF1Human

SET1Yeast

(a) (c) (e)

(b) (d) (f)

Current Opinion in Structural Biology

RRM domains predicted by computational approaches as non-RNA-binding. The RRM domains (in green) were experimentally validated to be

involved in protein–protein interactions and were predicted by structural-based computational methods as non-RNA-binding. (a) RRM domain of

the Fruit fly Y14 protein in complex with the Mago and Pym proteins (colored in bronze) in the Exon Junction Complex (PDB 1rk8). (b) RRM

domain of the human NCBP2 interacting with NCBP1 protein (bronze) within the Cap Binding Complex (PDB 6d0y). (c) The third RRM of the

human U2AF2 protein that was solved in complex with an N-terminal SF1 peptide (not shown) (PDB 1opi). (d) RRM domain of U2AF1 from the

U2AF1/U2AF2 human complex (PDB 1jmt). (e) The human RBM17 (SFP45) RRM domain known to mediate protein-protein interaction in the

spliceosome (PDB 2pe8). (f) The N-terminal RRM domain of SET1 from Saccharomyces cerevisiae (PDB 2j8a). Molecular graphics and analyses

were performed with the USCF Chimera package [50].

interesting groups of non-classical RBPs include proteins

involved in chromatin regulation [12] and double-stranded

(ds) DNA-binding [26]. While the enrichment for DNA-

related functions in published interactomes may result

from contamination, growing evidence supports the exis-

tence of proteins that bind DNA as well as RNA, in a

specific manner, namely DNA and RNA binding proteins

(DRBPs) [27]. Due to the complexity of the problem, it is

not trivial to predict dual-binding proteins, and specifically

moonlighting proteins, which primarilybind RNA but carry

out other functional activities [28]. The dual-binding

nature of some RBPs urges the development of new

experimental and computational methods that can

uniquely identify the specific functional regions and their

interacting partners.

Computational advances in studyingribonucleoproteinsPredicting classical and non-classical RNA-binding

proteins and domains

Recent efforts to systematically detect RBPs and RBDs

have considerably expanded the repertoire of RBPs,

Current Opinion in Structural Biology 2018, 53:124–130

originally generated from low-throughput structural and

biochemical studies. Despite these advances, it is likely

that many more RBPs await discovery. Among the under-

represented RBPs are proteins that bind non-poly(A)

RNAs, lowly expressed RBPs and species-specific RBPs

from organisms that have not been thoroughly studied. In

addition to experimental characterization, significant

attempts have been made to improve on existing compu-

tational algorithms to reliably predict RBPs. Such RBP

prediction methods can be roughly divided into two

groups: sequence-based approaches (e.g. RBPPred [29])

that are trained on evolutionary information and physico-

chemical properties extracted from primary sequence,

and structure-based methods that rely on the availability

of the protein three-dimensional structure [30–32]. Other

algorithms (e.g. SPOT-Seq-RNA [33]) integrate tem-

plate-based structure prediction (fold recognition) to infer

RNA function from sequence. Notably, most existing

sequence-based and structure-based methods for RBP

prediction strongly depend on evolutionary conservation

and thus are not appropriate for predicting novel RBPs

that do not share homology with known RBPs. To

www.sciencedirect.com

Page 4: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

Ribonucleoprotein particles: a theoretical perspective Dvir, Argoetti and Mandel-Gutfreund 127

account for this, non-homology based methods have been

developed and shown to be successful in predicting novel

RBPs [31] and in classifying classical RBDs as nucleic-

acid or non-nucleic-acid binding [34�]. Figure 2 illustrates

six RRM domains that were experimentally confirmed to

be involved in protein-protein interactions (such as Y14),

all correctly classified as non-RNA-binding by structure-

based prediction algorithms (such as [34�]). However,

such methods that rely on features extracted from solved

structures or structural models would likely fail to predict

RBPs that are intrinsically disordered. To overcome this,

a new computational algorithm combined the prediction

of structured and disordered regions, attempting to pre-

dict all RBPs in the human proteome [35]. A different

computational approach that relies on experimental pro-

tein-protein interaction data, named SONAR (Support

vector machine Obtained from Neighborhood Associated

RBPs), takes advantage of the strong tendency of RBPs to

interact with one another [36�]. SONAR has been suc-

cessfully applied to predict a large number of RBPs in

several model organisms [36�].

Although considerable progress has been achieved by the

development of powerful machine learning algorithms,

current RBP predictors miss-annotate a significant frac-

tion of the experimentally detected proteins (presumably

false negatives, FNs). On the other hand, many proteins

predicted to bind RNA are not detected by existing

experimental methodologies (presumably false positives,

FPs). The relatively high number of miss-classified pro-

teins (FNs and FPs) is likely due to the incredible

complexity of the computational problem and the fact

that we still lack a complete understanding of the physi-

cochemical, molecular and structural properties that

endow proteins with RNA-binding capacity. Nonethe-

less, the discrepancy between the computational predic-

tions and the experimental results can also be explained

by shortcomings in existing technologies that are not

tailored to detect the various types of RBP populations.

Predicting RNA-binding sites in proteins and their

interactions with RNAs

In parallel to the development of computational tools for

RBP prediction, dedicated sequence-based and struc-

ture-based algorithms were designed to map the RNA-

interacting surface of proteins and to predict the RNA-

binding residues that mediate protein-RNA interactions

(reviewed in [37��]). In a comprehensive survey, Miao

and Westhof [38��] assessed a large number of computa-

tional programs for the prediction of nucleic-acid binding

sites (implemented in web servers and standalone pro-

grams), providing an extensive view of the existing tools

for predicting RNA-binding and DNA-binding residues.

The results of the survey reinforce the difficulty in

uniquely identifying the RNA-interacting regions in pro-

teins, as the high similarity in the electrostatic properties

renders RNA and DNA binding surfaces practically

www.sciencedirect.com

indistinguishable. Although differences in the shape

and geometry of DNA and RNA binding interfaces

(explicitly between interfaces that bind ssRNA and

dsDNA) have been previously described [39,40], RNA

and DNA binding residues have highly similar features,

leading to many false predictions [38��]. To address this,

DRNApred was recently introduced, employing a two-

layer architecture to reduce cross-predictions of nucleic-

acid binding residues [41].

As discussed above, many RBPs do not possess a classical

RBD or well-defined RNA-binding features. Given that

most methods that predict RNA-binding interfaces con-

sider evolutionary information, they are less suitable for

predicting the location of RNA-binding sites in non-

classical RBPs/RBDs. A well-studied example of a non-

classical RBP is the glycosidic enzyme glyceraldehyde 3-

phosphate dehydrogenase (GAPDH). Early studies have

demonstrated that GAPDH binds tRNAs [42]. GAPDH

was further suggested to bind AU-rich elements in the

30UTR of interferon-gamma (IFNg) mRNA, possibly

regulating its turnover and translation [43]. While most

computational methods fail to predict GAPDH as RBP,

algorithms that rely on physicochemical features for pre-

dicting RNA-binding interfaces, as OPRA (Optimal Pro-

tein-RNA Area) [44] and BindUP [34�], point to a similar

region on the surface of the homodimer as likely to bind

RNA (Figure 3a and b, respectively). Moreover, as

demonstarted in the figure, methods for predicting

RNA-binding residues, such as aaRNA [45], successfully

identify specific residues that were suggested by early

biochemical studies to mediate RNA-recognition [43]

(Figure 3c).

Docking RNAs to classical and non-classical RBPs

Predicting RBPs and RNA-binding sites is an immense

computational challenge. The hardest and most reward-

ing task would be to capture the specific interactions

between the protein and RNA within the RNP complex.

In contrast to the advances achieved in modeling protein–

protein interactions, the prediction of protein-RNA inter-

actions is still considered a daunting task [46]. Many

methods that were originally designed for docking pro-

tein-protein and protein-peptide complexes have also

been utilized for protein-RNA docking, for example

HDOCK [47]. Recently, the Bujnicki’s group has intro-

duced NPDock, a web server dedicated to protein-

nucleic acid docking that combines specific protein-

RNA scoring functions with established docking tools

[48] (see [49] and [37��] for reviews of methods for

protein-RNA docking). Existing tools for docking RNA

to proteins are limited by the availability of experimen-

tally determind protein and RNA three-dimensional

structures, the ability to correctly model the structure

of the individual parts of the complex and the outstanding

complexity in modeling the flexibility of the system.

Nevertheless, the growing number of RBP structures

Current Opinion in Structural Biology 2018, 53:124–130

Page 5: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

128 Protein–nucleic acid interactions

Figure 3

(d) (e)

(a) (b) (c)

RNA binding-propensity RN A binding-propensityPredicted as RN A binding patch

Most favorable

Least favorable

High Low

Current Opinion in Structural Biology

Computational approaches predict the RNA-binding interfaces and RNA-binding sites on the solved structure of the non-classical RBP GAPDH

(PDB 5c7i). (a) RNA-binding propensity calculated by OPRA [44], predicting the RNA-binding interface on the GAPDH homodimer. (b) The largest

electrostatic patch calculated by BindUP [34�] corresponds well to the predicted RNA-binding interface on GAPDH homodimer. (c) The predicted

RNA-binding residues on GAPDH (predicted by aaRNA [45]) points to the NAD+ binding site, previously suggested to be involved in RNA-binding

[43]. (d) The 30UTR of IFNg (ENST00000229135, positions 659–700) docked to GAPDH by HDOCK [47]. Intriguingly, the region predicted by the

docking algorithm completely overlaps with the unified RNA-binding interface as calculated by OPRA and BindUP (colored in purple). (d) A view of

the predicted interactions between the AUUUA region (red) on the IFNg 30UTR and the predicted RNA-binding residues on GAPDH. Molecular

graphics and analyses were performed with the USCF Chimera package [50].

and protein-RNA complexes in the Protein Data Bank

(PDB) contributes to the continuous improvement in

performance accuracy of the methods. An attempt in this

direction is exemplified in Figure 3d and e, presenting a

predictive model of the IFNg 30UTR docked to the non-

classical RBP GAPDH.

ConclusionsRecent experimental technologies have dramatically

expanded the compendium of RBPs, contributing to

our understanding of the basic principles that govern

protein-RNA recognition. Computational work carried

out in parallel to the experimental efforts has further

Current Opinion in Structural Biology 2018, 53:124–130

advanced the identification and characterization of RBPs,

RBDs and their interaction with RNA. To uncover the

entire atlas of RBPs, a number of key challenges remain

and must be addressed: (a) approximately 50% of RBPs

lack known RBDs and thus cannot be discovered by

inferring homology from known RBPs, (b) classical RBDs

have extensively diverged during evolution to the extent

that some do not necessarily bind RNA, leading to many

false positive predictions, (c) many RBPs may have dual-

binding function, acting as moonlighting proteins, hence

adding to the difficulty of RBP prediction. These chal-

lenges can be partially tackled by new computational

algorithms and machine learning techniques that rely

www.sciencedirect.com

Page 6: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

Ribonucleoprotein particles: a theoretical perspective Dvir, Argoetti and Mandel-Gutfreund 129

on the continuous flow of diverse types of RBPs identified

by existing and new experimental technologies. Future

advances are needed to fully capture the census of RBPs,

to understand their regulatory functions and modes of

interactions and to unravel the mechanistic details and

dynamics of RNPs.

FundingThis work was supported by The Israel Science Founda-

tion ISF [Grant number 1182/16].

Appendix A. Supplementary dataSupplementary material related to this article can be

found, in the online version, at doi:https://doi.org/10.

1016/j.sbi.2018.08.002.

References and recommended readingPapers of particular interest, published within the period of review,have been highlighted as:

� of special interest�� of outstanding interest

1. Singh G, Pratt G, Yeo GW, Moore MJ: The clothes make themRNA: past and present trends in mRNP fashion. Annu RevBiochem 2015, 84:325-354.

2. Lunde BM, Moore C, Varani G: RNA-binding proteins: modulardesign for efficient function. Nat Rev Mol Cell Biol 2007, 8:479-490.

3. Daubner GM, Clery A, Allain FH: RRM-RNA recognition: NMR orcrystallography . . . and new findings. Curr Opin Struct Biol2013, 23:100-108.

4. Clery A, Blatter M, Allain FH: RNA recognition motifs: boring?Not quite. Curr Opin Struct Biol 2008, 18:290-298.

5. Anantharaman V, Koonin EV, Aravind L: Comparative genomicsand evolution of proteins involved in RNA metabolism. NucleicAcids Res 2002, 30:1427-1464.

6. Gerstberger S, Hafner M, Tuschl T: A census of human RNA-binding proteins. Nat Rev Genet 2014, 15:829-845.

7.��

Hentze MW, Castello A, Schwarzl T, Preiss T: A brave new worldof RNA-binding proteins. Nat Rev Mol Cell Biol 2018, 19:327-341.

This is a recent comprehensive review discussing the state-of-the-arthigh-throughput experimental technologies for discovering RNA-bindingproteins and the main expected and unexpected discoveries arising fromthese experiments.

8. Baltz AG, Munschauer M, Schwanhausser B, Vasile A,Murakawa Y, Schueler M, Youngs N, Penfold-Brown D, Drew K,Milek M et al.: The mRNA-bound proteome and its globaloccupancy profile on protein-coding transcripts. Mol Cell2012, 46:674-690.

9. Castello A, Fischer B, Eichelbaum K, Horos R, Beckmann BM,Strein C, Davey NE, Humphreys DT, Preiss T, Steinmetz LM et al.:Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell 2012, 149:1393-1406.

10. Kramer K, Sachsenberg T: Photo-cross-linking and high-resolution mass spectrometry for assignment of RNA-bindingsites in RNA-binding proteins. Nat Methods 2014, 11:1064-1070.

11. Castello A, Frese CK, Fischer B: Identification of RNA-bindingdomains of RNA-binding proteins in cultured cells on asystem-wide scale with RBDmap. Nat Protoc 2017, 12:2447-2464.

12. He C, Sidoli S, Warneford-Thomson R, Tatomer DC, Wilusz JE,Garcia BA, Bonasio R: High-resolution mapping of RNA-bindingregions in the nuclear proteome of embryonic stem cells. MolCell 2016, 64:416-430.

www.sciencedirect.com

13. Bao X, Guo X, Yin M, Tariq M, Lai Y, Kanwal S, Zhou J, Li N, Lv Y,Pulido-Quetglas C et al.: Capturing the interactome of newlytranscribed RNA. Nat Methods 2018, 15:213-220.

14.�

Huang R, Han M, Meng L, Chen X: Transcriptome-widediscovery of coding and noncoding RNA-binding proteins.Proc Natl Acad Sci USA 2018, 115:e3879-e3887.

This paper presents a novel experimental methodology to capture RNA-binding proteins based on click chemistry. The method overcomes themajor limitation of current technologies that capture mainly RNPs withpolyadenylated RNAs.

15. He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK: Predictingintrinsic disorder in proteins: an overview. Cell Res 2009,19:929-949.

16.�

Calabretta S, Richard S: Emerging roles of disorderedsequences in RNA-binding proteins. Trends Biochem Sci 2015,40:662-672.

This review discusses the unique properties of RNA-binding proteinspossessing intrinsically disordered features and the role of disorderedsequences in the function of RNA-binding proteins and the formation ofRNPs.

17. Vuzman D, Levy Y: Intrinsically disordered regions as affinitytuners in protein-DNA interactions. Mol Biosyst 2012, 8:47-57.

18. Chen Y, Varani G: Protein families and RNA recognition. FEBS J2005, 272:2088-2097.

19. Nicastro G, Taylor IA, Ramos A: KH-RNA interactions: back inthe groove. Curr Opin Struct Biol 2015, 30:63-70.

20. Maris C, Dominguez C, Allain FH: The RNA recognition motif, aplastic RNA-binding platform to regulate post-transcriptionalgene expression. FEBS J 2005, 272:2118-2131.

21. Fribourg S, Gatfield D, Izaurralde E, Conti E: A novel mode ofRBD-protein recognition in the Y14-Mago complex. Nat StructBiol 2003, 10:433-439.

22. Beckmann BM, Horos R, Fischer B, Castello A, Eichelbaum K,Alleaume AM, Schwarzl T, Curk T, Foehr S, Huber W et al.: TheRNA-binding proteomes from yeast to man harbourconserved enigmRBPs. Nat Commun 2015, 6:10127.

23. Matia-Gonzalez AM, Laing EE, Gerber AP: Conserved mRNA-binding proteomes in eukaryotic organisms. Nat Struct Mol Biol2015, 22:1027-1033.

24. Koster T, Marondedze C, Meyer K, Staiger D: RNA-bindingproteins revisited – the emerging arabidopsis mRNAinteractome. Trends Plant Sci 2017, 22:512-526.

25. Hentze MW, Argos P: Homology between IRE-BP, a regulatoryRNA-binding protein, aconitase, and isopropylmalateisomerase. Nucleic Acids Res 1991, 19:1739-1740.

26. Conrad T, Albrecht AS, de Melo Costa VR, Sauer S, Meierhofer D,Orom UA: Serial interactome capture of the human cellnucleus. Nat Commun 2016, 7:11212.

27. Hudson WH, Ortlund EA: The structure, function and evolutionof proteins that bind DNA and RNA. Nat Rev Mol Cell Biol 2014,15:749-760.

28. Jeffery CJ: Moonlighting proteins – an update. Mol Biosyst 2009,5:345-350.

29. Zhang X, Liu S: RBPPred: predicting RNA-binding proteinsfrom sequence using SVM. Bioinformatics 2017, 33:854-862.

30. Ahmad S, Sarai A: Analysis of electric moments of RNA-bindingproteins: implications for mechanism and prediction. BMCStruct Biol 2011, 11:8.

31. Shazman S, Mandel-Gutfreund Y: Classifying RNA-bindingproteins based on electrostatic properties. PLoS Comput Biol2008, 4:e1000146.

32. Zhao H, Yang Y, Zhou Y: Structure-based prediction of RNA-binding domains and RNA-binding sites and application tostructural genomics targets. Nucleic Acids Res 2011, 39:3017-3025.

33. Yang YU, Zhao H, Wang J, Zhou Y: SPOT-Seq-RNA: predictingprotein-RNA complex structure and RNA-binding function by

Current Opinion in Structural Biology 2018, 53:124–130

Page 7: Ribonucleoprotein particles: advances and challenges in ...yaelab.technion.ac.il/papers/Dvir_2018.pdf · other functional activities [28]. The dual-binding ofsome RBPs urges the development

130 Protein–nucleic acid interactions

fold recognition and binding affinity prediction. Methods MolBiol 2014, 1137:119-130.

34.�

Paz I, Kligun E, Bengad B, Mandel-Gutfreund Y: BindUP: a webserver for non-homology-based prediction of DNA and RNAbinding proteins. Nucleic Acids Res 2016, 44 W568–574.

This paper presents a web server for classifying domains as nucleic-acidor non-nucleic-acid-binding, based solely on the structural and physi-cochemical properties of domains from solved structures or structuralmodels. In addition to the binding classification, the program displays thelargest electrostatic patch on the domains, representing the RNA or DNAbinding interface.

35. Chowdhury S, Zhang J, Kurgan L: In silico prediction andvalidation of novel RNA binding proteins and residues in thehuman proteome. Proteomics 2018:e1800064 http://dx.doi.org/10.1002/pmic.201800064.

36.�

Brannan KW, Jin W, Huelga SC, Banks CA, Gilmore JM, Florens L,Washburn MP, Van Nostrand EL, Pratt GA, Schwinn MK et al.:SONAR discovers RNA-binding proteins from analysis oflarge-scale protein–protein interactomes. Mol Cell 2016,64:282-293.

This paper presents a unique computational approach for predictingnovel RNA-binding proteins relying on information from protein-proteininteractions learned from large-scale mass spectrometry data. Themethod was successfully applied to predict RNA-binding proteins iden-tified in RNA interactome experiments, without relying on sequence orstructural data.

37.��

Jones S: Protein-RNA interactions: structural biology andcomputational modeling techniques. Biophys Rev 2016, 8:359-367.

This is the most updated comprehensive review of computational meth-ods for predicting RNA-binding proteins and RNA-interaction sites onproteins and of recent methods for docking RNAs to proteins. The reviewalso provides an overview on structural technologies for solving RNPsand the available structural data for RNA-binding proteins and Protein-RNA complexes in the Protein Data Bank.

38.��

Miao Z, Westhof E: A large-scale assessment of nucleic acidsbinding site prediction programs. PLoS Comput Biol 2015, 11:e1004639.

In this study, the authors benchmarked over 20 computational programsfor predicting nucleic-acid-binding sites in proteins. In their study, theytested the programs on 41 different datasets and analyzed their perfor-mance. The paper also discusses the advantages and pitfalls of differentcomputational methods.

Current Opinion in Structural Biology 2018, 53:124–130

39. Shazman S, Elber G, Mandel-Gutfreund Y: From face to interfacerecognition: a differential geometric approach to distinguishDNA from RNA binding surfaces. Nucleic Acids Res 2011,39:7390-7399.

40. Sonavane S, Chakrabarti P: Cavities in protein–DNA andprotein–RNA interfaces. Nucleic Acids Res 2009, 37:4613-4620.

41. Yan J, Kurgan L: DRNApred, fast sequence-based method thataccurately predicts and discriminates DNA- and RNA-bindingresidues. Nucleic Acids Res 2017, 45:e84.

42. Singh R, Green MR: Sequence-specific binding of transfer RNAby glyceraldehyde-3-phosphate dehydrogenase. Science1993, 259:365-368.

43. Nagy E, Rigby WF: Glyceraldehyde-3-phosphatedehydrogenase selectively binds AU-rich RNA in the NAD(+)-binding region (Rossmann fold). J Biol Chem 1995,270:2755-2763.

44. Perez-Cano L, Fernandez-Recio J: Optimal protein-RNA area,OPRA: a propensity-based method to identify RNA-bindingsites on proteins. Proteins 2010, 78:25-35.

45. Li S, Yamashita K, Amada KM, Standley DM: Quantifyingsequence and structural features of protein–RNA interactions.Nucleic Acids Res 2014, 42:10086-10098.

46. Madan B, Kasprzak JM, Tuszynska I, Magnus M, Szczepaniak K,Dawson WK, Bujnicki JM: Modeling of protein–RNA complexstructures using computational docking methods. MethodsMol Biol 2016, 1414:353-372.

47. Yan Y, Zhang D, Zhou P, Li B, Huang SY: HDOCK: a web serverfor protein–protein and protein–DNA/RNA docking based on ahybrid strategy. Nucleic Acids Res 2017, 45 W365-W373.

48. Tuszynska I, Magnus M, Jonak K, Dawson W, Bujnicki JM:NPDock: a web server for protein-nucleic acid docking.Nucleic Acids Res 2015, 43 W425-430.

49. Tuszynska I, Matelska D, Magnus M, Chojnowski G, Kasprzak JM,Kozlowski LP, Dunin-Horkawicz S, Bujnicki JM: Computationalmodeling of protein-RNA complex structures. Methods 2014,65:310-319.

50. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM,Meng EC, Ferrin TE: UCSF Chimera – a visualization system forexploratory research and analysis. J Comput Chem 2004,25:1605-1612.

www.sciencedirect.com