Music Translation of Tertiary Protein Structure - Auditory Patterns of the Protein Folding


C. Di Chio et al. (Eds.): EvoApplications 2011, Part II, LNCS 6625, pp. 214–222, 2011.

© Springer-Verlag Berlin Heidelberg 2011

Music Translation of Tertiary Protein Structure:

Auditory Patterns of the Protein Folding

Riccardo Castagna1,*, Alessandro Chiolerio1, and Valentina Margaria2

1 LATEMAR - Politecnico di Torino

Dipartimento di Scienza dei Materiali ed Ingegneria Chimica

Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Tel.: +39 011 0907381

[email protected]

2 Independent Researcher

Abstract. We have translated a genome-encoded protein sequence into musical notes and created a polyphonic harmony that takes into account its tertiary structure. We did not use a diatonic musical scale to obtain a pleasant sound, focusing instead on the spatial relationship between amino acids placed close together in the 3-dimensional protein fold. The result is a musical translation of the real morphology of the protein, which opens the challenge of bringing musical harmony rules into the proteomic research field.

Keywords: Bioart, Biomusic, Protein Folding, Bioinformatics.

1 Introduction

In recent years, several approaches have been investigated to introduce biology to a wider, younger and non-technical audience [2, 10]. Accordingly, bio-inspired art (Bioart) represents an interdisciplinary field devoted to reducing the boundaries between science, understood as a strictly rational pursuit, and emotion. By stimulating human senses such as sight, touch and hearing, scientists and artists together attempt not just to communicate science but also to create new perspectives and new inspirations for the analysis of scientific information, based on the rules of serendipity [13].

Bio-inspired music (Biomusic) is a well-developed branch of Bioart with both educational and purely scientific aims. Owing to the affinity between genome biology and musical language, much effort has been dedicated to converting the genetic code into musical notes in order to reveal new auditory patterns [4, 6, 7, 10-12, 14, 15].

The first work introducing Biomusic [10] attempted to translate DNA sequences into music by converting the four DNA bases directly into four notes. The initial goal was to create an acoustic method to minimize the distress of handling the increasing amount of base sequencing data. One advantage of this approach was that the DNA sequences were easily recognized and memorized, but from an aesthetic and artistic point of view the result was poor because it lacked musicality and rhythm. Other approaches to converting DNA sequences into music were based on the codon reading frame and on mathematical analysis of the physical properties of each nucleotide [6, 7].

* Corresponding author.

Unfortunately, because of the structure and nature of DNA, all these attempts gave rise to note sequences lacking musical depth. Since DNA is based on four nucleotides (Adenine, Cytosine, Guanine, Thymine), a long and unstructured repetition of just four musical notes is not enough to create a musical composition. As a natural consequence, scientists turned their attention from DNA to proteins, with the aim of obtaining a reasonable, pleasant and rhythmic sound that can faithfully represent genomic information.

Proteins are polymers of twenty different amino acids that fold into specific spatial

conformations, driven by non-covalent interactions, to perform their biological function.

They are characterized by four distinct levels of organization: primary, secondary,

tertiary and quaternary structure.

The primary structure refers to the linear sequence of the different amino acids, determined by the translation of the DNA sequence of the corresponding gene. The secondary structure, instead, refers to regular local sub-structures, named alpha helix (α-helix) and beta sheet (β-sheet). The way the α-helices and β-sheets fold into a compact globule describes the tertiary structure of the protein. The correct folding of a protein is strictly interconnected with the external environment and is essential to execute the molecular function (see Figure 1). Finally, the quaternary structure represents a larger assembly of several protein molecules or polypeptide chains [3].

A number of studies have dealt with the musical translation of pure protein sequences [4, 15]. For example, Dunn and Clark used algorithms and the secondary structure of proteins to translate amino acid sequences into musical themes [4]. Another example of protein conversion into music is the approach used by Takahashi and Miller [15]. They translated the primary protein structure into a sequence of notes, and then expressed each note as a chord of a diatonic scale.

Fig. 1. Protein structures: the linear sequence of amino acids (primary structure, left) folds into a specific spatial conformation (folded protein, tertiary structure, right), driven by non-covalent interactions.

Moreover, they introduced rhythm into the composition by analyzing the abundance of a specific codon in the corresponding organism and relating this information to note duration. However, the use of the diatonic scale and the device of building chords on a single note gave rise to results that partially satisfy the listener from a musical point of view but, unfortunately, are not a reliable representation of the complexity of the molecular organization of the protein.

Music is not a mere linear sequence of notes. Our minds perceive pieces of music on a level far higher than that. We chunk notes into phrases, phrases into melodies, melodies into movements, and movements into full pieces. Similarly, proteins only make sense when they act as chunked units. Although a primary structure carries all the information for the tertiary structure to be created, it still "feels" like less, for its potential is only realized when the tertiary structure is actually physically created [11]. Consequently, a successful approach to the musical interpretation of protein complexity must take into account at least its tertiary structure, and cannot be based only on its primary or secondary structure.

2 Method

Our pilot study focused on the amino acid sequence of chain A of the human Thymidylate Synthase A (ThyA), to allow a comparison with the most recent work published on this subject [15]. The translation of amino acids into musical notes was based on the Bio2Midi software [8], using a chromatic scale to avoid any kind of filter on the result.

The protein 3-dimensional (3D) crystallographic structure was obtained from the Protein Data Bank (PDB). Information on the spatial position in a 3D volume of each atom composing the amino acids was recovered from the PDB textual structure (http://www.rcsb.org/pdb/explore/explore.do?structureId=1HVY).

The above-mentioned file was loaded into a Matlab® environment, together with other useful indicators such as the nature of each atom and its sequential position in the chain, the corresponding amino acid type and its sequential position in the protein.

A Matlab® script was written to translate the structure into music, as described below. We adopted an approach based on the computation of the centre of mass of each amino acid, which was used as the basis for subsequent operations. The centre of mass was computed by considering each atom composing the amino acid, its position in 3D space and its mass: the mass-weighted mean position of the atoms represents the amino acid centre of mass.
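The centre-of-mass step can be illustrated with a short sketch. It is written in Python rather than Matlab® and is not the authors' script: the function name `residue_centres_of_mass`, the mass table and the decision to skip unknown elements are assumptions made for illustration, and alternate atom locations are not handled. The only thing it relies on from the PDB itself is the fixed-column layout of ATOM records.

```python
from collections import OrderedDict

# Standard atomic masses (u) of the elements usually found in ATOM records.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def residue_centres_of_mass(pdb_lines, chain_id="A"):
    """Return [(res_seq, res_name, (x, y, z)), ...] for one chain.

    Hypothetical helper: relies on the fixed-column layout of PDB ATOM records.
    """
    # (res_seq, res_name) -> [sum(m*x), sum(m*y), sum(m*z), sum(m)]
    sums = OrderedDict()
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 54 or line[21] != chain_id:
            continue
        # Element symbol: columns 77-78, else the first letter of the atom name.
        element = line[76:78].strip().upper() or line[12:16].strip()[:1]
        mass = ATOMIC_MASS.get(element)
        if mass is None:
            continue  # element not in the table: ignored in this sketch
        key = (int(line[22:26]), line[17:20].strip())
        coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        acc = sums.setdefault(key, [0.0, 0.0, 0.0, 0.0])
        for k in range(3):
            acc[k] += mass * coords[k]
        acc[3] += mass
    # Mass-weighted mean position of the atoms of each residue.
    return [
        (seq, name, (sx / m, sy / m, sz / m))
        for (seq, name), (sx, sy, sz, m) in sums.items()
    ]
```

Assuming the structure has been saved locally (for example under the hypothetical name `1hvy.pdb`), the open file object can be passed directly as `pdb_lines`, since the function only iterates over text lines. Coordinates in PDB files are given in ångström.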

3 Results

3.1 Distance Matrix: Musical Chords

The first important output obtained from the algorithm was the distance matrix, containing the lengths of the vectors connecting the amino acids pairwise. The matrix is symmetrical by definition and features an interesting block structure (see Figure 2). The symmetry follows simply from the fact that the distance between the i-th and the j-th amino acid is the same as that between the j-th and the i-th one. The block structure is probably due to amino acid clustering in portions of the primary chain.

Fig. 2. Distance matrix. X and Y axes: sequential number of amino acid; Z axis: distance in pm (both vertical scale and colour scale).

Fig. 3. Sketch of the Matlab® script operation.

Fig. 4. Distance distributions. Histograms depicting the distance distribution in the three orders (X axis: distance in pm; Y axis: number of amino acid couples). Selection rules avoid the nearest-neighbour amino acids. By increasing the cut-off to third order it is possible to sample the second peak of the bi-modal distribution (diagonal lines).
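A minimal Python sketch of the distance matrix described in this section is given here for illustration. The function name and the `angstrom_to_pm` parameter are assumptions: the sketch supposes that the centres of mass are expressed in ångström, as PDB coordinates are, and converts to picometres (1 Å = 100 pm) to match the units of Figure 2.

```python
import math


def distance_matrix(centres, angstrom_to_pm=100.0):
    """Pairwise distances between residue centres of mass.

    centres: list of (x, y, z) tuples, assumed to be in angstrom.
    Returns an n x n list of lists with distances in picometres.
    """
    n = len(centres)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centres[i], centres[j]) * angstrom_to_pm
            # d(i, j) == d(j, i): fill both halves, so the matrix is symmetric.
            dist[i][j] = d
            dist[j][i] = d
    return dist
```

Only the upper triangle is computed and then mirrored, which makes the symmetry of the matrix explicit and halves the number of distance evaluations.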

We sought spatial correlations between non-nearest-neighbour amino acids, hence ignoring those amino acids which are placed close to one another along the primary sequence. By spatial correlation, we mean the closest distance between non-obviously linked amino acids.

Running along the sequence, the Matlab® script looked for three spatial correlations (i.e. three minimal distances) involving three different amino acids (two by two), as sketched in Figure 3. The two couples for every position in the primary chain were then stored and the corresponding distance distribution was extracted and plotted (see Figure 4). These spatial correlations, or distances, are referred to as first, second and third order. A parallel between the sequence order and the discretization order, ubiquitous in every field of digital information theory, emerges from the spatial description of the protein: the more precise the observation of the amino acids in the proximity of a certain point, the higher the order necessary to include every spatial relation. The same concept applies to digital music: the higher either the bit-rate (as in MP3 encoding) or the sampling frequency (as in CD-DA), the higher the fidelity. The upper limit is an order equal to n, the number of amino acids composing the protein.
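The search for spatial correlations can be read in more than one way; the following Python sketch shows one possible reading, not the authors' Matlab® script. For every position it ranks the residues that are not neighbours along the chain by distance and keeps the closest `order` of them. The function name and the `min_separation` parameter (how many positions along the chain count as "nearest neighbours") are hypothetical, since the text does not state the exclusion window.

```python
def spatial_correlations(dist, order=3, min_separation=1):
    """For each residue i, return the `order` closest non-neighbour residues.

    dist: symmetric n x n distance matrix (list of lists).
    min_separation: residues with |i - j| <= min_separation are ignored
                    (assumed exclusion rule for sequence neighbours).
    Returns a list where entry i is [(j, distance), ...] sorted by distance.
    """
    n = len(dist)
    result = []
    for i in range(n):
        candidates = [
            (dist[i][j], j) for j in range(n) if abs(i - j) > min_separation
        ]
        candidates.sort()  # closest first; ties broken by lower index
        result.append([(j, d) for d, j in candidates[:order]])
    return result
```

With `order=1` only the single closest partner is kept for each position; raising `order` adds further partners, up to all n residues of the chain in the limiting case.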




3.2 Note Intensity: Musical Dynamics

The note intensity is given in a range between 0 and 99. We assumed that the closer the couple of amino acids, the higher the intensity of the musical note. In order to play each order with an intensity comparable to that of the first order, which is characterized by the closest couples to be found in the whole protein structure, we normalized the distance data within each order. In this way, the normalized distance data, multiplied by 99, give the correct intensity scale.
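The exact normalization is not spelled out in the text, so the following Python sketch is only one way to satisfy the two stated constraints (a 0 to 99 range, and closer couples played louder). It assumes a min-max normalization within one order, inverted so that the smallest distance maps to 99; the function name is hypothetical.

```python
def distances_to_intensity(distances, max_intensity=99):
    """Map the distances of one order to note intensities in 0..max_intensity.

    Assumption: min-max normalization within the order, inverted so that the
    closest couple gets the highest intensity.
    """
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:
        # All couples equally distant: play them all at full intensity.
        return [max_intensity] * len(distances)
    return [
        round(max_intensity * (d_max - d) / (d_max - d_min)) for d in distances
    ]
```

Because the normalization is applied separately to each order, the second and third orders span the same intensity range as the first even though their distances are larger.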

3.3 Angle Distribution: Musical Rhythm

The primary sequence was also analyzed to extrapolate the degree of folding, a measure of the local angle between segments ideally connecting the centres of mass of subsequent amino acids. Proteins composed of extended planar portions (β-sheets) tend to have an angular distribution centred around 180°. The angle distribution was extracted (see Figure 5) and parameterized as the note length: the more linear the chain, the shorter the note. This step gave rhythm to the generated music. In this way, the musical rhythm is intended as the velocity of an imaginary visitor running along the primary sequence. We would like to point out that this conversion features a third order cut-off, meaning that the fidelity of the spatial description is based on three spatial relations for each amino acid position; the higher the cut-off, the higher the sound quality.
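The angle-to-rhythm step can be sketched in Python as follows. The angle is taken at each residue between the segments joining it to the previous and the next centre of mass, so a straight stretch of chain gives 180°. The mapping from angle to note length is an assumption: the text only states that a more linear chain gives a shorter note, so a linear interpolation between two hypothetical durations (`shortest` and `longest`, in beats) is used here.

```python
import math


def folding_angles(centres):
    """Angles (degrees) at each inner residue between consecutive segments."""
    angles = []
    for a, b, c in zip(centres, centres[1:], centres[2:]):
        u = [a[k] - b[k] for k in range(3)]  # from residue b back to a
        v = [c[k] - b[k] for k in range(3)]  # from residue b forward to c
        cos_t = sum(x * y for x, y in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding errors
        angles.append(math.degrees(math.acos(cos_t)))
    return angles


def note_length(angle_deg, shortest=0.25, longest=1.0):
    """Assumed linear mapping: 180 degrees -> shortest, 0 degrees -> longest."""
    return longest - (longest - shortest) * angle_deg / 180.0
```

For three collinear centres of mass the angle is 180° and the note takes the shortest duration; a right-angle turn in the chain gives a note halfway between the two limits.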

Fig. 5. Angular distribution. Histogram showing the angular distribution of the vectors linking

the amino acids along the chain A of the human Thymidylate Synthase A (ThyA), used to

codify each note length.




3.4 Output: Musical Score/Notation

Finally, a conversion from each amino acid to the corresponding note was performed, generating an ASCII textual output that can be converted to a MIDI file with the GNMidi software [9]. Since the properties of amino acids influence the protein folding process, we adopted Dunn’s translation method [4], which is based on the water solubility of amino acids. The most insoluble residues were assigned pitches in the lowest octave, the most soluble, including the charged residues, were in the highest octave, and the moderately insoluble residues were given the middle range. Thus, pitches ranged over two octaves of a chromatic scale.
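The idea of ordering pitches by solubility can be illustrated with a small Python sketch. It does not reproduce the pitch table of [4], which is not given in the text. As a stand-in for water solubility it uses the Kyte-Doolittle hydropathy index, ranks the twenty amino acids from most hydrophobic to most hydrophilic, and spreads them over two chromatic octaves. The lowest MIDI note number (48) and all names in the code are hypothetical choices.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic). Used here as a
# stand-in for water solubility; this is NOT the pitch table used in [4].
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

LOWEST_NOTE = 48  # hypothetical choice of the lowest MIDI note number
SPAN = 23         # two chromatic octaves: semitone offsets 0..23

# Rank residues from most hydrophobic (lowest pitch) to most hydrophilic
# (highest pitch); ties are broken alphabetically.
_RANKED = sorted(HYDROPATHY, key=lambda aa: (-HYDROPATHY[aa], aa))
PITCH = {
    aa: LOWEST_NOTE + round(rank * SPAN / (len(_RANKED) - 1))
    for rank, aa in enumerate(_RANKED)
}


def sequence_to_pitches(sequence):
    """Translate a one-letter amino acid sequence into MIDI note numbers."""
    return [PITCH[aa] for aa in sequence if aa in PITCH]
```

With this assumed scale, isoleucine receives the lowest note and arginine the highest, and every amino acid gets a distinct pitch, so the primary sequence remains recoverable from the notes.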

After that, the MIDI files of the three orders were loaded into the Ableton Live software [1] and assigned to MIDI instruments. We chose to assign the first order sequence of musical notes to a lead instrument (Glockenspiel) and to use a string emulator to play the other two sequences (see Figure 6). In this way it is possible to discern the primary sequence from the related musical texture that represents the amino acids involved in the 3D structure (see Additional Data File).

Fig. 6. Score of the musical translation of the chain A of the human Thymidylate Synthase A

(ThyA). The three instruments represent respectively the primary sequence (Glockenspiel) and

the two different amino acids correlated in the tertiary structure (String Bass, Viola).

4 Discussion

We obtained polyphonic music by translating the amino acid sequence of a peptide (chain A of ThyA) into musical notes and arranging them in chords by analyzing their spatial relationship (see Figure 7). To our knowledge, this is the first time that a team of scientists and musicians has created polyphonic music that describes the entire 3D structure of a bio-molecule.




PPHGELQYLGQIQHILRCGVRKD
DRTGTGTLSVFGMQARYSLRDEF
PLLTTKRVFWKGVLEELLWFIKGS
TNAKELSSKGVKIWDANGSRDFL
DSLGFSTREEGDLGPVYGFQWRH
FGAEYRDMESDYSGQGVDQLQR
VIDTIKTNPDDRRIIMCAWNPRDL
PLMALPPCHALCQFYVVNSELSC
QLYQRSGDMGLGVPFNIASYALLT
YMIAHITGLKPGDFIHTLGDAHIYL
NHIEPLKIQLQRE PRPFPKLRILRK
VEKIDDFKAEDFQIEGYNPHPTIK
MEMAV

Residues labelled in panels a and b: E30, L74, P273.

Fig. 7. From tertiary protein structure to musical chord. The primary structure of ThyA (chain A), on the top right of the figure, folds into its tertiary structure (a, b). In yellow, an example of the amino acids composing a musical chord: E30, L74 and P273 are non-obviously linked amino acids according to our 3D spatial analysis criteria.

Previous works attempting to translate a protein into music focused on the primary or secondary protein structure and used different devices to obtain polyphonic music. Instead, the Matlab® script we developed analyzes the PDB file, which contains the spatial coordinates of each atom composing the amino acids of the protein. The computation of distances and other useful geometrical properties between non-adjacent amino acids generates a MIDI file that encodes the 3D structure of the protein in music.

In this way, the polyphonic music contains all the crucial information necessary to describe a protein, from its primary to its tertiary structure. Moreover, our analysis is fully reversible: by applying the same translation rules that are used to generate the music, one can store, position by position, the notes (i.e. the amino acids) and obtain their distances. A first order musical sequence does not give enough information to recover the true protein structure, because there is more than one possible way to draw the protein. In contrast, our approach, based on a third order musical sequence, has three times more data and describes one and only one solution to the problem of placing the amino acids in 3D space.

5 Conclusions

Our work represents an attempt to communicate the complexity of the 3D protein structure to a wider audience, based on a rhythmic musical rendering. Biomusic can be a useful educational tool to depict the mechanisms that give rise to intracellular vital signals and determine cell fate. The possibility to “hear” the relations between amino acids and protein folding could help students and a non-technical audience to understand the different facets and rules that regulate cellular processes.




Moreover, several examples of interdisciplinary projects have demonstrated that the use of a heuristic approach, sometimes perceived by the interacting audience as a game, can lead to interesting and useful scientific results [2, 5, 16]. We hope to bring musical harmony rules into the proteomic research field, encouraging a new generation of protein folding algorithms. Protein structure prediction, despite all the efforts and the development of several approaches, remains an extremely difficult and unresolved undertaking. We do not exclude that, in the future, musicality could be one of the driving indicators for protein folding investigation.

Acknowledgments. The authors would like to acknowledge Smart Collective (www.smart-collective.com) and Prof. Fabrizio Pirri (Politecnico di Torino) for their support.

References

1. Ableton Live, http://www.ableton.com
2. Cyranoski, D.: Japan plays trump card to get kids into science. Nature 435, 726 (2005)
3. Dobson, C.M.: Protein folding and misfolding. Nature 426(6968), 884–890 (2003)
4. Dunn, J., Clark, M.A.: Life music: the sonification of proteins. Leonardo 32, 25–32 (1999)
5. Foldit, http://fold.it/portal
6. Gena, P., Strom, C.: Musical synthesis of DNA sequences. XI Colloquio di Informatica Musicale, 203–204 (1995)
7. Gena, P., Strom, C.: A physiological approach to DNA music. In: CADE 2001, pp. 129–134 (2001)
8. Gene2Music, http://www.mimg.ucla.edu/faculty/miller_jh/gene2music/home.html
9. GNMidi, http://www.gnmidi.com
10. Hayashi, K., Munakata, N.: Basically musical. Nature 310(5973), 96 (1984)
11. Hofstadter, D.R.: Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, New York (1979) ISBN 0465026567
12. Jensen, E., Rusay, R.: Musical representations of the Fibonacci string and proteins using Mathematica. Mathematica J. 8, 55 (2001)
13. Mayor, S.: Serendipity and cell biology. Mol. Biol. Cell 21(22), 3808–3870 (2010)
14. Ohno, S., Ohno, M.: The all pervasive principle of repetitious recurrence governs not only coding sequence construction but also human endeavor in musical composition. Immunogenetics 24, 71–78 (1986)
15. Takahashi, R., Miller, J.: Conversion of amino acid sequence in proteins to classical music: search for auditory patterns. Genome Biology 8(5), 405 (2007)
16. The Space Game, http://www.thespacegame.org
