A condensed 3D graphical representation of RNA secondary structures

Wen Zhu a, Bo Liao a,b,*, Kequan Ding b

a School of Computer and Communication, Hunan University, Changsha, Hunan 410082, China
b Science 100 Lab, Graduate School of the Chinese Academy of Sciences, Beijing 100049, China

Received 19 December 2004; received in revised form 25 April 2005; accepted 25 April 2005

Available online 02 August 2005

Abstract

In this paper, we propose a condensed 3D graphical representation of RNA secondary structures. Based on this representation, we outline an approach that constructs a 3-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with an RNA secondary structure. An examination of the similarities/dissimilarities among the secondary structures at the 3′-terminus of different viruses illustrates the utility of the approach.

© 2005 Elsevier B.V. All rights reserved.

Keywords: RNA secondary structure; Similarity; Virus; 3D Graphical representation

1. Introduction

Ribonucleic acid (RNA) is an important molecule that performs a wide range of functions in biological systems. In particular, it is RNA (not DNA) that carries the genetic information of viruses such as HIV and therefore regulates the functions of such viruses. RNA has recently become the center of much attention because of its catalytic properties, leading to an increased interest in obtaining structural information [1,2]. Using methods similar to those presented by Randic et al. [3–9,11], one can also outline several 2D and 3D graphical representations of RNA primary sequences and use them to compute the similarity of RNA primary sequences. Here, we consider the similarity of RNA secondary structures [10,12].

Previously, almost all such comparisons have been based on alignments of RNA structures: a distance function or a score function is used to represent insertion, deletion, and substitution of letters in the compared structures. Using the distance function, one can compute the similarity between RNA structures. There are many algorithms for computing the similarity between RNA secondary structures [13–19]. These approaches, however, ignore the chemical structures and the chemical properties, and they impose a non-crossing restriction.

0166-1280/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.theochem.2005.04.042

* Corresponding author. Address: School of Computer and Communication, Hunan University, Changsha, Hunan 410082, China. Tel.: +86 10 88256148; fax: +86 10 88256147.
E-mail address: [email protected] (B. Liao).

In this paper, based on the classifications of bases and base pairs, we propose a condensed 3D graphical representation that avoids the limitation associated with non-crossing. Using this representation, we compare the secondary structures belonging to nine different viruses. Fig. 1 lists these secondary structures, which were reported by Reusken and Bol [20]. Based on the order of the free bases A, G, U, C and the base pairs A–U, C–G, we reduce an RNA secondary structure to three curves. We then construct a 3-component vector consisting of the normalized leading eigenvalues of the L/L matrices. The similarities are computed by calculating either the Euclidean distance between the end points of the vectors or the correlation angle between the two vectors.

Journal of Molecular Structure: THEOCHEM 757 (2005) 193–198
www.elsevier.com/locate/theochem

Fig. 1. Secondary structure at the 3′-terminus of RNA 3 of alfalfa mosaic virus (AlMV-3 [21]), citrus leaf rugose virus (CiLRV-3 [22]), tobacco streak virus (TSV-3 [23,24]), citrus variegation virus (CVV-3 [22]), apple mosaic virus (APMV-3 [25]), prune dwarf ilarvirus (PDV-3 [26]), lilac ring mottle virus (LRMV-3 [27]), elm mottle virus (EMV-3 [28]) and asparagus virus II (AVII [29]). Numbering of nucleotides is from the 3′ end of RNA 3.

2. 3D Graphical representation of RNA secondary structures

The secondary structure of an RNA is a set of free bases and base pairs formed by hydrogen bonds between A–U and G–C. Let A′, U′, G′, C′ denote A, U, G, C in the base pairs A–U and G–C, respectively. Then we can obtain a special sequence representation of the secondary structure, which we call the characteristic sequence of the secondary structure. For example, the characteristic sequence of the substructure of AlMV-3 (Fig. 2) is G′G′G′AAUC′C′C′ (from 3′ to 5′).

Fig. 2. Substructure of AlMV-3.

Let G = g1, g2, …, be an arbitrary characteristic sequence of an RNA secondary structure. We define a map f that maps G into a plot set. Explicitly, f(G) = f(g1), f(g2), …, where

\[
f(g_i) =
\begin{cases}
(1,\ 0,\ A_i + U_i) & \text{if } g_i \in \{A, U\} \\
(-1,\ 0,\ C_i + G_i) & \text{if } g_i \in \{C, G\} \\
(1,\ 1,\ A'_i + U'_i) & \text{if } g_i \in \{A', U'\} \\
(-1,\ -1,\ C'_i + G'_i) & \text{if } g_i \in \{C', G'\}
\end{cases}
\]

In other words, we reduce an RNA secondary structure to a series of nodes P0, P1, P2, …, PN, whose coordinates xi, yi, zi (i = 0, 1, 2, …, N, where N is the length of the RNA secondary structure being studied) satisfy

\[
\begin{cases}
x_i = 1,\ y_i = 0,\ z_i = A_i + U_i & \text{if } g_i \in \{A, U\} \\
x_i = -1,\ y_i = 0,\ z_i = C_i + G_i & \text{if } g_i \in \{C, G\} \\
x_i = 1,\ y_i = 1,\ z_i = A'_i + U'_i & \text{if } g_i \in \{A', U'\} \\
x_i = -1,\ y_i = -1,\ z_i = C'_i + G'_i & \text{if } g_i \in \{C', G'\}
\end{cases}
\]

where Ai, Ci, Gi, Ui, A′i, U′i, C′i, and G′i are the cumulative occurrence numbers of A, C, G, U, A′, U′, C′ and G′, respectively, in the subsequence from the first base to the i-th base of the sequence. For example, the plot set of the substructure of AlMV-3 (Fig. 2) is {(−1, −1, 1), (−1, −1, 2), (−1, −1, 3), (1, 0, 1), (1, 0, 2), (1, 0, 3), (−1, −1, 4), (−1, −1, 5), (−1, −1, 6)}. Table 1 lists the (x, y, z) coordinates of the substructure of AlMV-3 (Fig. 2).

Table 1
Cartesian 3D coordinates for the substructure of AlMV-3 (Fig. 2)

Base  Nucleotide  x    y    z
1     G′          −1   −1   1
2     G′          −1   −1   2
3     G′          −1   −1   3
4     A           1    0    1
5     A           1    0    2
6     U           1    0    3
7     C′          −1   −1   4
8     C′          −1   −1   5
9     C′          −1   −1   6
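The map f for pattern AU can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pattern_au` and the convention of writing a paired base with a trailing apostrophe (for example `"G'"`) are assumptions made here for the example. The z coordinate is kept as a running count per class, which is what the cumulative occurrence numbers amount to.

```python
from collections import Counter

def pattern_au(sequence):
    """Map a characteristic sequence to 3D points under pattern AU.

    `sequence` is a list of tokens; a trailing apostrophe ("G'") marks a
    paired base. This encoding is an assumption of the sketch.
    """
    # (x, y, class key) for each symbol; z is the running count of the class.
    rules = {
        "A": (1, 0, "AU"), "U": (1, 0, "AU"),
        "C": (-1, 0, "CG"), "G": (-1, 0, "CG"),
        "A'": (1, 1, "A'U'"), "U'": (1, 1, "A'U'"),
        "C'": (-1, -1, "C'G'"), "G'": (-1, -1, "C'G'"),
    }
    counts = Counter()
    points = []
    for g in sequence:
        x, y, key = rules[g]
        counts[key] += 1  # cumulative occurrences of this class so far
        points.append((x, y, counts[key]))
    return points

# Characteristic sequence of the AlMV-3 substructure (3' to 5').
almv3_sub = ["G'", "G'", "G'", "A", "A", "U", "C'", "C'", "C'"]
points = pattern_au(almv3_sub)
```

For the AlMV-3 substructure, `points` reproduces the nine rows of Table 1.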

We call the corresponding plot set the characteristic plot set. The curve connecting all points of the characteristic plot set in turn is called the characteristic curve. Fig. 3 shows the characteristic curve representing the first nine bases of the substructure of AlMV-3 (Fig. 2).

Fig. 3. The 3D graphical representation of the substructure of AlMV-3 based on pattern AU. The dots denote the bases making up the substructure.

The bases of RNA can be classified into groups: purine (A, G)/pyrimidine (C, U), amino (A, C)/keto (G, U), and weak H-bond (A, U)/strong H-bond (G, C). These three classifications yield exactly three representations, which we call patterns AG, AC and AU, respectively. The maps f, f′ and f″ correspond to patterns AU, AC and AG, respectively. Figs. 4 and 5 show the curves of the substructure of AlMV-3 based on patterns AC and AG, respectively.

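\[
f'(g_i) =
\begin{cases}
(1,\ 0,\ A_i + C_i) & \text{if } g_i \in \{A, C\} \\
(-1,\ 0,\ U_i + G_i) & \text{if } g_i \in \{U, G\} \\
(1,\ 1,\ A'_i + U'_i) & \text{if } g_i = A' \\
(-1,\ 1,\ A'_i + U'_i) & \text{if } g_i = U' \\
(-1,\ -1,\ G'_i + C'_i) & \text{if } g_i = G' \\
(1,\ -1,\ G'_i + C'_i) & \text{if } g_i = C'
\end{cases}
\]

\[
f''(g_i) =
\begin{cases}
(1,\ 0,\ A_i + G_i) & \text{if } g_i \in \{A, G\} \\
(-1,\ 0,\ U_i + C_i) & \text{if } g_i \in \{C, U\} \\
(1,\ 1,\ A'_i + U'_i) & \text{if } g_i = A' \\
(-1,\ 1,\ A'_i + U'_i) & \text{if } g_i = U' \\
(1,\ -1,\ G'_i + C'_i) & \text{if } g_i = G' \\
(-1,\ -1,\ G'_i + C'_i) & \text{if } g_i = C'
\end{cases}
\]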
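Because f′ and f″ differ only in which (x, y) pair and which cumulative count each symbol uses, they can be written as lookup tables driving one shared routine. The sketch below is illustrative only: the names `PATTERN_AC`, `PATTERN_AG` and `characteristic_points`, and the apostrophe encoding of paired bases, are assumptions of this example rather than anything defined in the paper.

```python
from collections import Counter

# Each symbol maps to (x, y, counter key); z is the running count of that key.
PATTERN_AC = {  # map f'
    "A": (1, 0, "AC"), "C": (1, 0, "AC"),
    "U": (-1, 0, "UG"), "G": (-1, 0, "UG"),
    "A'": (1, 1, "A'U'"), "U'": (-1, 1, "A'U'"),
    "G'": (-1, -1, "G'C'"), "C'": (1, -1, "G'C'"),
}
PATTERN_AG = {  # map f''
    "A": (1, 0, "AG"), "G": (1, 0, "AG"),
    "U": (-1, 0, "UC"), "C": (-1, 0, "UC"),
    "A'": (1, 1, "A'U'"), "U'": (-1, 1, "A'U'"),
    "G'": (1, -1, "G'C'"), "C'": (-1, -1, "G'C'"),
}

def characteristic_points(sequence, pattern):
    """Return the characteristic plot set of `sequence` under `pattern`."""
    counts = Counter()
    points = []
    for g in sequence:
        x, y, key = pattern[g]
        counts[key] += 1  # cumulative occurrences of the symbol's class
        points.append((x, y, counts[key]))
    return points

# AlMV-3 substructure, written with "'" for paired bases (an assumption).
seq = ["G'", "G'", "G'", "A", "A", "U", "C'", "C'", "C'"]
ac_points = characteristic_points(seq, PATTERN_AC)
ag_points = characteristic_points(seq, PATTERN_AG)
```

Unlike pattern AU, these two maps separate the two members of each base pair in x, so G′ and C′ land on different sides of the plot while still sharing one cumulative count.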

Fig. 4. The 3D graphical representation of the substructure of AlMV-3 based on pattern AC. The dots denote the bases making up the substructure.

Fig. 5. The 3D graphical representation of the substructure of AlMV-3 based on pattern AG. The dots denote the bases making up the substructure.

3. Similarities/dissimilarities among the RNA secondary structures of nine viruses

In order to find invariants sensitive to the RNA secondary structure, we transform the 3D representation of the RNA secondary structure into another mathematical object, a matrix. Once we have a matrix representing an RNA secondary structure, we can use some of its matrix invariants as descriptors of the structure. One such matrix is the L/L matrix, whose elements l_{i,j} are defined as the quotient of the Euclidean distance between a pair of vertices (dots) of the characteristic curve and the sum of the distances between the same pair of vertices measured along the characteristic curve. In other words,

\[
l_{i,j} = \frac{d_{i,j}}{\sum_{k=i}^{j-1} d_{k,k+1}}
\]

where d_{i,j} is the Euclidean distance between vertices i and j.
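A minimal Python sketch of the L/L matrix and of the leading eigenvalue that this section uses as a descriptor is given below. Several choices are assumptions of the sketch and not statements from the paper: the diagonal is set to 0, the matrix is filled symmetrically, and a shifted power iteration stands in for whatever eigenvalue routine the authors used. The function names `ll_matrix` and `leading_eigenvalue` are hypothetical.

```python
import math

def ll_matrix(points):
    """Build the L/L matrix for an ordered list of 3D points."""
    n = len(points)
    # Lengths of consecutive curve segments d(k, k+1).
    seg = [math.dist(points[k], points[k + 1]) for k in range(n - 1)]
    # Prefix sums, so the length along the curve from i to j is path[j] - path[i].
    path = [0.0]
    for s in seg:
        path.append(path[-1] + s)
    m = [[0.0] * n for _ in range(n)]  # diagonal left at 0 (assumption)
    for i in range(n):
        for j in range(i + 1, n):
            ratio = math.dist(points[i], points[j]) / (path[j] - path[i])
            m[i][j] = m[j][i] = ratio
    return m

def leading_eigenvalue(m, iters=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric matrix with entries in [0, 1]."""
    n = len(m)
    shift = float(n)  # m + shift*I has only positive eigenvalues
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient of the unshifted matrix for the current unit vector.
        new_lam = sum(w[i] * sum(m[i][j] * w[j] for j in range(n)) for i in range(n))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam, v = new_lam, w
    return lam

# Pattern-AU points of the AlMV-3 substructure, taken from Table 1.
table1 = [(-1, -1, 1), (-1, -1, 2), (-1, -1, 3), (1, 0, 1), (1, 0, 2),
          (1, 0, 3), (-1, -1, 4), (-1, -1, 5), (-1, -1, 6)]
m = ll_matrix(table1)
lam1 = leading_eigenvalue(m)
```

Adjacent vertices always give a ratio of 1, since the straight-line distance and the distance along the curve coincide; the ratio drops below 1 wherever the curve bends between the two vertices.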

Its eigenvalues, and in particular its leading eigenvalue, can be used as descriptors of an RNA secondary structure. Among all eigenvalues, the leading eigenvalue of a matrix, λ1, often plays a special role. In the case of the adjacency matrix of trees, Lovasz and Pelikan [30] suggested the leading eigenvalue λ1 as an index of molecular branching. More recently it was shown that the leading eigenvalue of a substituted path matrix, λλ1, gives an even better characterization of molecular branching [31–34].

We will characterize the coding sequences of the RNA secondary structures of the nine species by means of the leading eigenvalue of the L/L matrix. Table 2 gives the leading eigenvalues of the L/L matrices associated with the three essentially different patterns of the characteristic curves representing each of the coding sequences.

Next, we illustrate the use of the 3D quantitative characterization of RNA secondary structure with an examination of the similarities/dissimilarities among the nine coding sequences shown in Fig. 1. For each structure we construct a 3-component vector consisting of the normalized leading eigenvalues λ1/n, one per pattern, where λ1 is the leading eigenvalue of the L/L matrix and n is the length of the coding sequence. All the vectors are rooted at the origin (0, 0, 0). The underlying assumption is that if two vectors point in a similar direction in three-dimensional space, then the two RNA secondary structures they represent are similar.

The similarity between two such vectors can be computed in two ways: (1) by calculating the Euclidean distance between the end points of the vectors; (2) by calculating the correlation angle between the two vectors. The smaller the Euclidean distance between the end points of two vectors, the more similar the RNA secondary structures are. Likewise, the smaller the correlation angle between two vectors, the more similar the RNA secondary structures are.

The Euclidean distance between vector end points and the correlation angle between vectors are different measures of the similarity of RNA secondary structures. Comparing Tables 3 and 4, we find an overall qualitative agreement between the two sets of similarities. In general, the correlation angle gives the best tolerance for the similarities.

Table 2
The leading eigenvalues of the L/L matrices associated with three essentially different patterns of the characteristic curves for the coding sequences of Fig. 1

Pattern  AlMV-3   CiLRV-3  TSV-3    CVV-3    APMV-3   LRMV-3   PDV-3    EMV-3    AVII
AU       12.9727  17.5897  15.4061  15.3904  13.5551  12.4008  16.0753  11.1649  13.5015
AC       13.0909  15.0653  14.7057  15.8327  15.3412  11.8365  16.1528  10.4996  12.6272
AG       12.1478  17.1360  13.4306  16.3975  13.3476  13.8402  15.6737  12.1459  14.3504

Table 3
The similarity/dissimilarity matrix for the coding sequences of Fig. 1 based on the Euclidean distances between the end points of the 3-component vectors of the normalized leading eigenvalues of the L/L matrices

Species  AlMV-3   CiLRV-3  TSV-3    CVV-3    APMV-3   LRMV-3   PDV-3    EMV-3    AVII
AlMV-3   0        0.0487   0.0547   0.0483   0.0411   0.1503   0.1093   0.1725   0.1312
CiLRV-3           0        0.0692   0.0538   0.0807   0.1442   0.1187   0.1674   0.1251
TSV-3                      0        0.0454   0.0916   0.1053   0.1629   0.1247   0.0860
CVV-3                               0        0.0785   0.1078   0.1474   0.1316   0.898
APMV-3                                       0        0.1830   0.0860   0.2053   0.1650
LRMV-3                                                0        0.2544   0.0250   0.0200
PDV-3                                                          0        0.2781   0.2357
EMV-3                                                                   0        0.0427
AVII                                                                             0

Table 4
The similarity/dissimilarity matrix for the coding sequences of Fig. 1 based on the angle between the 3-component vectors of the normalized leading eigenvalues of the L/L matrices

Species  AlMV-3   CiLRV-3  TSV-3    CVV-3    APMV-3   LRMV-3   PDV-3    EMV-3    AVII
AlMV-3   0        0.0861   0.0308   0.056551 0.0481   0.0990   0.0198   0.0919   0.0833
CiLRV-3           0        0.0867   0.0733   0.1284   0.0599   0.0734   0.0495   0.0417
TSV-3                      0        0.0822   0.0731   0.1166   0.0453   0.1077   0.0978
CVV-3                               0        0.0724   0.0501   0.0375   0.0474   0.0435
APMV-3                                       0        0.1225   0.0551   0.1189   0.1130
LRMV-3                                                0        0.0794   0.0108   0.0207
PDV-3                                                          0        0.0727   0.0645
EMV-3                                                                   0        0.0103
AVII                                                                             0
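The two similarity measures can be sketched as follows. The function names are hypothetical, and the sketch assumes that the correlation angle is the ordinary angle between two vectors, in radians. Since dividing a vector by its own sequence length n does not change its direction, the angle can be evaluated directly on the leading eigenvalues of Table 2. The Euclidean distance does depend on n, so it is only defined here and not evaluated on the table data.

```python
import math

def euclidean_distance(u, v):
    """Distance between the end points of two vectors rooted at the origin."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def correlation_angle(u, v):
    """Angle between two vectors in radians (assumed meaning of 'correlation angle')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def normalized(eigs, n):
    """Divide each leading eigenvalue by the sequence length n (supplied by the caller)."""
    return tuple(e / n for e in eigs)

# Leading eigenvalues for patterns AU, AC, AG, copied from Table 2.
almv3 = (12.9727, 13.0909, 12.1478)
cilrv3 = (17.5897, 15.0653, 17.1360)

# Scale-invariant, so the unnormalized Table 2 values can be used directly.
angle = correlation_angle(almv3, cilrv3)
```

With these inputs, `angle` comes out close to the 0.0861 listed for AlMV-3 and CiLRV-3 in Table 4, which supports reading the table entries as angles in radians.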


From Tables 3 and 4, we find that the more similar species pairs are AVII~EMV-3, EMV-3~LRMV-3, AVII~LRMV-3 and PDV-3~AlMV-3.

4. Conclusion

We have presented a similarity measure between RNA secondary structures. A simple graphical representation substitutes for the complicated molecular structure. The advantage of our approach is that it allows visual inspection of data, which helps in recognizing major similarities among different RNA structures, and allows one to construct a numerical characterization. In our approach, the insertion, deletion, and substitution of points on the curves correspond to the insertion, deletion, and substitution of letters in the compared structures, respectively. One difference from alignments of RNA secondary structures is that our approach considers not only the sequence structure but also the chemical structure of RNA secondary structures. Another difference from existing algorithms for computing the similarity between RNA secondary structures is that our approach has no non-crossing restriction.

Acknowledgements

The authors would like to thank the anonymous referees for

many valuable suggestions that have improved this manuscript.

References

[1] Bo Liao, Tian-ming Wang, General combinatorics of RNA hairpins and

cloverleaves, J. Chem. Inf. Comput. Sci. 43 (4) (2003) 1138–1142.

[2] Bo Liao, Tianming Wang, General combinatorics of RNA secondary

structure, Math. Biosci. 191 (2004) 69–81.

[3] Milan Randic, Marjan Vracko, Nella Lers, Dejan Plavsic, Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation, Chem. Phys. Lett. 371 (2003) 202–207.

[4] Bo Liao, Tianming Wang, New 2D Graphical representation of DNA

sequences, J. Comput. Chem. 25 (11) (2004) 1364–1368.

[5] M. Randic, Alexandru T. Balaban, On a four-dimensional representation of DNA primary sequences, J. Chem. Inf. Comput. Sci. 40 (2000) 50–56.

[6] Chunxin Yuan, Bo Liao, Tianming Wang, New 3-D graphical

representation of DNA sequences and their numerical characterization,

Chem. Phys. Lett. 379 (2003) 412–417.

[7] Bo Liao, Tianming Wang, Analysis of similarity/dissimilarity of DNA

sequences based on 3-D graphical representation, Chem. Phys. Lett. 388

(2004) 195–200.

[8] Bo Liao, Tianming Wang, Analysis of similarity of DNA sequences based

on triplets, J. Chem. Inf. Comput. Sci. 44 (2004) 1666–1670.

[9] Bo Liao, Tianming Wang, 3-D graphical representation of DNA

sequences and their numerical characterization, J. Mol. Struct.

THEOCHEM 681 (2004) 209–212.

[10] Bo Liao, Kequan Ding, Tianming Wang, On a six-dimensional

representation of RNA secondary structures, J. Biomol. Struct. Dyn. 22

(2005) 455–464.

[11] Bo Liao, On a 2D graphical representation of DNA sequence, Chem.

Phys. Lett. 401 (2005) 196–199.

[12] Bo Liao, Tianming Wang, A 3D graphical representation of

RNA secondary structures, J. Biomol. Struct. Dyn. 21 (6) (2004) 827–832.

[13] V. Bafna, S. Muthukrishnan, R. Ravi, Comparing similarity between RNA strings, Comput. Sci. 937 (1995) 1–4.

[14] F. Corpet, B. Michot, RNAlign program: alignment of RNA sequences using both primary and secondary structures, Comput. Appl. Biosci. 10 (4) (1994) 389–399.

[15] S.Y. Le, R. Nussinov, J.V. Maizel, Tree graphs of RNA secondary structures and their comparison, Comput. Biomed. Res. 22 (1989) 461–473.

[16] S.Y. Le, J. Owens, R. Nussinov, J.H. Chen, B. Shapiro, J.V. Maizel, RNA secondary structures: comparison and determination of frequently recurring substructures by consensus, Comput. Biomed. 5 (1989) 205–210.

[17] B. Shapiro, An algorithm for comparing multiple RNA secondary structures, Comput. Appl. Biosci. 4 (3) (1988) 387–393.

[18] B. Shapiro, K. Zhang, Comparing multiple RNA secondary structures using

tree comparisons, Comput. Appl. Biosci. 6 (4) (1990) 309–318.

[19] K. Zhang, Computing similarity between RNA secondary structures, Proceedings of the IEEE International Joint Symposium on Intelligence and Systems, Rockville, MD, May 1998, 126–132.

[20] Chantal B.E.M. Reusken, John F. Bol, Structural elements of the 3′-terminal coat protein binding site in alfalfa mosaic virus RNAs, Nucleic Acids Res. 14 (1996) 2660–2665.

[21] E.C. Koper-Zwarthoff, F.Th. Brederode, P. Walstra, J.F. Bol, Nucleic

Acids Res. 7 (1979) 1887–1900.

[22] S.W. Scott, X. Ge, J. Gen. Virol. 76 (1995) 957–963.

[23] E.C. Koper-Zwarthoff, F.Th. Brederode, P. Walstra, J.F. Bol, Nucleic

Acids Res. 8 (1980) 3307–3318.

[24] B.J. Cornelissen, H. Janssen, D. Zuidema, J.F. Bol, Nucleic Acids Res. 12

(1984) 2427–2437.

[25] R.H. Alrefai, P. Shiel, L.L. Domier, C.J. D’Arcy, P.H. Berger, S.S. Korban, J. Gen. Virol. 75 (1994) 2847–2850.

[26] S.W. Scott, X. Ge, J. Gen. Virol. 76 (1995) 1801–1806.

[27] E.J. Bachman, S.W. Scott, G. Xin, V. Bowman Vance, Virology 201

(1994) 127–131.

[28] F. Houser-Scott, M.L. Baer, K.F. Liem, J.M. Cai, L. Gehrke, J. Virol. 68

(1994) 2194–2205.

[29] EMBL/GenBank/DDBJ databases. Accession no. X86352.


[30] L. Lovasz, J.I. Pelikan, On the eigenvalues of trees, Period. Math. Hung. 3

(1973) 175–182.

[31] M. Randic, On structural ordering and branching of acyclic saturated

hydrocarbons, J. Math. Chem. 24 (1998) 345–358.

[32] M. Randic, D. Plavsic, M. Razinger, Double invariants, MATCH 35 (1997) 243–259.

[33] M. Randic, X. Guo, S. Bobst, Use of matrices for characterization of molecular structures, in: P. Hansen, P. Fowler, M. Zheng (Eds.), Discrete Mathematical Chemistry, DIMACS Workshop on Discrete Mathematical Chemistry, American Mathematical Society, Providence, RI, 2000, pp. 305–322.

[34] M. Randic, On molecular branching, Acta Chim. Sloven. 44 (1997) 57–77.
