8/6/2019 Music Translation of Tertiary Protein Structure - Auditory Patterns of the Protein Folding
http://slidepdf.com/reader/full/music-translation-of-tertiary-protein-structure-auditory-patterns-of-the 1/9
C. Di Chio et al. (Eds.): EvoApplications 2011, Part II, LNCS 6625, pp. 214–222, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Music Translation of Tertiary Protein Structure:
Auditory Patterns of the Protein Folding
Riccardo Castagna1,*, Alessandro Chiolerio1, and Valentina Margaria2
1 LATEMAR - Politecnico di Torino
Dipartimento di Scienza dei Materiali ed Ingegneria Chimica
Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Tel.: +39 011 0907381
[email protected]
2 Independent Researcher
Abstract. We have translated a genome-encoded protein sequence into musical
notes and created a polyphonic harmony that takes into account its tertiary structure.
We did not use a diatonic musical scale to obtain a pleasant sound, focusing
instead on the spatial relationship between amino acids placed close together in the
3-dimensional protein fold. In this way, the result is a musical translation of
the real morphology of the protein, which opens the challenge of bringing musical
harmony rules into the proteomic research field.
Keywords: Bioart, Biomusic, Protein Folding, Bioinformatics.
1 Introduction
During recent years, several approaches have been investigated to introduce biology
to a wider, younger and non-technical audience [2, 10]. Accordingly, bio-inspired art
(Bioart) represents an interdisciplinary field devoted to reducing the boundaries
between science, understood as a strictly rational discipline, and emotional
experience. By stimulating human senses, such as sight, touch and hearing, scientists and
artists together attempt not just to communicate science but also to create new
perspectives and new inspirations for scientific information analysis based on the
rules of serendipity [13].
Bio-inspired music (Biomusic) is a well-developed branch of Bioart with both
educational and purely scientific aims. In fact, owing to the affinity
between genome biology and musical language, much effort has been dedicated to
the conversion of genetic information into musical notes to reveal new auditory
patterns [4, 6, 7, 10-12, 14, 15].
The first work introducing Biomusic [10] attempted to translate DNA
sequences into music by converting the four DNA bases directly into four notes. The
goal was initially to create an acoustic method to minimize the distress of handling
the increasing amount of base sequencing data. One advantage of this approach
was that the DNA sequences were easily recognized and memorized, but from an
aesthetic/artistic point of view it gave a poor result owing to the lack of
musicality and rhythm. Other approaches to converting DNA sequences into music were
based on the codon reading frame and on mathematical analysis of the physical properties
of each nucleotide [6, 7].
* Corresponding author.
Unfortunately, because of the structure and nature of DNA, all these attempts
gave rise to note sequences lacking musical depth. In fact, since DNA is based
on four nucleotides (Adenine, Cytosine, Guanine, Thymine), a long and unstructured
repetition of just four musical notes is not enough to create a musical composition.
As a natural consequence, scientists turned their attention from DNA to proteins,
with the aim of obtaining a reasonable, pleasant and rhythmic sound that can
faithfully represent genomic information.
Proteins are polymers of twenty different amino acids that fold into specific spatial
conformations, driven by non-covalent interactions, to perform their biological function.
They are characterized by four distinct levels of organization: primary, secondary,
tertiary and quaternary structure.
The primary structure refers to the linear sequence of the different amino acids,
determined by the translation of the DNA sequence of the corresponding gene. The
secondary structure, instead, refers to regular local sub-structures, named alpha helix
(α-helix) and beta sheet (β-sheet). The way the α-helices and β-sheets fold into a
compact globule describes the tertiary structure of the protein. The correct folding of
a protein is closely tied to the external environment and is essential for the
protein to carry out its molecular function (see Figure 1). Finally, the quaternary structure
represents a larger assembly of several protein molecules or polypeptide chains [3].
A number of studies have dealt with the musical translation of pure protein
sequences [4, 15]. For example, Dunn and Clark used algorithms and the secondary
structure of proteins to translate amino acid sequences into musical themes [4].
Another example of protein conversion into music is the approach used by
Takahashi and Miller [15]. They translated the primary protein structure into a sequence
of notes, and then expressed each note as a chord of a diatonic scale.
Moreover, they introduced rhythm into the composition by analyzing the abundance
of a specific codon in the corresponding organism and relating this information to
note duration. However, the use of the diatonic scale and the device of chords built on a
single note gave results that partially satisfy the listener from a
musical point of view but, unfortunately, are not a reliable representation of the
complexity of the molecular organization of the protein.

[Figure 1 panels: Amino Acid Sequence (Primary Structure), protein folding, Folded Protein (Tertiary Structure)]
Fig. 1. Protein structures: the linear sequence of amino acids, shown on the left, folds into
specific spatial conformations, driven by non-covalent interactions
Music is not a mere linear sequence of notes. Our minds perceive pieces of music
on a level far higher than that. We chunk notes into phrases, phrases into melodies,
melodies into movements, and movements into full pieces. Similarly, proteins only
make sense when they act as chunked units. Although a primary structure carries all
the information for the tertiary structure to be created, it still "feels" like less, for its
potential is only realized when the tertiary structure is actually physically created
[11]. Consequently, a successful approach to the musical interpretation of protein
complexity must take into account at least its tertiary structure and cannot be based
only on its primary or secondary structure.
2 Method
Our pilot study focused on the amino acid sequence of chain A of the Human
Thymidylate Synthase A (ThyA), to create a comparison with the most recent work
published on this subject [15]. The translation of amino acids into musical notes was
based on the use of Bio2Midi software [8], by means of a chromatic scale to avoid
any kind of filter on the result.
The protein 3-dimensional (3D) crystallographic structure was obtained from the
Protein Data Bank (PDB). Information on the spatial position in a 3D volume
of each atom composing the amino acids was recovered from the PDB textual
structure (http://www.rcsb.org/pdb/explore/explore.do?structureId=1HVY).
This file was loaded in a Matlab® environment, together with
other useful indicators such as the nature of each atom and its sequential position in the
chain, the corresponding amino acid type and its sequential position in the protein.
A Matlab® script was written to translate the structure into music,
as described below. We adopted an approach based on the computation of the centre
of mass of each amino acid, which was used as the basis for subsequent
operations. The computation considered each atom composing the amino acid, its
position in 3D space and its mass; the mass-weighted mean position of
the atoms gives the amino acid centre of mass.
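The centre-of-mass step can be illustrated with a short Python sketch. This is not the authors' Matlab® script: the function name, the small table of atomic masses and the fallback used when the element column is missing are assumptions made for illustration. The only thing taken from the PDB format itself is the fixed-column layout of ATOM records.

```python
from collections import defaultdict

# Approximate standard atomic masses (u) for elements common in proteins.
# Illustrative table only; extend it for other elements (e.g. selenium).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def residue_centres_of_mass(pdb_text, chain="A"):
    """Map (residue number, residue name) -> mass-weighted mean position."""
    # Accumulators per residue: sum(m*x), sum(m*y), sum(m*z), sum(m).
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for line in pdb_text.splitlines():
        # Fixed-column ATOM record; column 22 holds the chain identifier.
        if not line.startswith("ATOM") or len(line) < 54 or line[21] != chain:
            continue
        key = (int(line[22:26]), line[17:20].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        # Element symbol sits in columns 77-78; as a naive fallback (an
        # assumption) use the first letter of the atom name.
        element = line[76:78].strip() or line[12:16].strip()[:1]
        mass = ATOMIC_MASS[element.capitalize()]
        acc = sums[key]
        for axis in range(3):
            acc[axis] += mass * xyz[axis]
        acc[3] += mass
    return {key: tuple(s / acc[3] for s in acc[:3]) for key, acc in sums.items()}
```

Calling `residue_centres_of_mass(open("1HVY.pdb").read())` on a locally downloaded file (the file name is hypothetical) would give one 3D point per residue of chain A, which is the input assumed by the later steps.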
3 Results
3.1 Distance Matrix: Musical Chords
The first important output obtained from the algorithm was the distance matrix,
containing the lengths of the vectors connecting each pair of amino acids.
The matrix is symmetrical by definition and features an interesting block structure
(see Figure 2). The symmetry follows simply from the fact that the distance
between the i-th and j-th amino acid is the same as that between the j-th amino acid
and the i-th one. The block structure is probably due to amino acid clustering in
portions of the primary chain.

Fig. 2. Distance matrix. X and Y axis: sequential number of amino acid; Z axis: distance in pm
(both vertical scale and colour scale).
Fig. 3. Sketch of the Matlab® script operation
Fig. 4. Distance distributions. Histograms depicting the distance distribution in the three
orders (X axis: distance in pm; Y axis: number of amino acid couples). Selection rules avoid the
nearest-neighbour amino acids. Increasing the cut-off to third order makes it possible to sample the
second peak of the bi-modal distribution (diagonal lines).
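A distance matrix of this kind can be sketched in a few lines of Python. The function name and the list-of-tuples input are illustrative assumptions, and the original computation was done in Matlab®. Only the upper triangle is computed and then mirrored, which makes the symmetry explicit.

```python
import math


def distance_matrix(centres):
    """Pairwise Euclidean distances between residue centres of mass."""
    n = len(centres)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centres[i], centres[j])
            dist[i][j] = dist[j][i] = d  # symmetry: d(i, j) == d(j, i)
    return dist
```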
We sought spatial correlations between non-nearest-neighbour amino acids,
hence ignoring those amino acids which are placed close to one another along the
primary sequence. By spatial correlation, we mean the closest distance between
non-obviously linked amino acids.
Running along the sequence, the Matlab® script looked for three spatial correlations (i.e.
three minimal distances) involving three different amino acids (two by two), as sketched
in Figure 3. The two couples for every position in the primary chain were then stored
and the corresponding distance distribution was extracted and plotted (see Figure 4).
Those spatial correlations, or distances, are referred to as first, second and third order.
The parallelism between the sequence order and the discretization order ubiquitous in
every field of digital information theory emerges from the spatial description of the
protein: the more precise the observation of the amino acids in proximity of a certain
point, the higher the order necessary to include every spatial relation. The same concept
applies to digital music: the higher either the bit-rate (as in MP3 encoding) or the
sampling frequency (as in CD-DA), the higher the fidelity. The upper limit is an order
equal to n, the number of amino acids composing the protein.
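One possible reading of the selection rule is sketched below in Python: for every position, take the closest residues that are not neighbours along the chain, and call the k-th closest one the k-th order correlation. The text does not state how many sequence neighbours are excluded, so the `min_separation` parameter, like the function name, is a hypothetical choice.

```python
def spatial_correlations(dist, order=3, min_separation=2):
    """For each residue, list its `order` closest non-neighbour residues.

    Entry i of the result is [(j, distance), ...] sorted by distance;
    the k-th item is read here as the (k+1)-th order correlation.
    `min_separation` (assumed value) is the smallest |i - j| allowed.
    """
    n = len(dist)
    result = []
    for i in range(n):
        candidates = [
            (dist[i][j], j) for j in range(n) if abs(i - j) >= min_separation
        ]
        candidates.sort()  # closest first; ties broken by residue index
        result.append([(j, d) for d, j in candidates[:order]])
    return result
```

Raising `order` towards n includes every spatial relation of each residue, which corresponds to the upper limit mentioned above.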
3.2 Note Intensity: Musical Dynamics
The note intensity is given in a range between 0 and 99. We assumed that the closer
the couple of amino acids, the higher the intensity of the musical note. In order to
play each order with an intensity comparable to the first order, characterized by the
closest couples which may be found in the whole protein structure, we performed a
normalization of the distance data within each order. In this way, the normalized distance
data, multiplied by 99, give the correct intensity scale.
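The exact normalization is not spelled out in the text. As a minimal sketch, assuming that each distance is normalized against the smallest distance of its own order so that the closest couple receives the top value, the mapping could look as follows in Python (the function name and the ratio `d_min / d` are assumptions, and zero distances are not handled).

```python
def note_intensities(distances, max_intensity=99):
    """Scale the distances of one order to 0..max_intensity.

    Assumed rule: intensity = max_intensity * d_min / d, so the closest
    couple of the order is the loudest and farther couples are softer.
    """
    d_min = min(distances)
    return [round(max_intensity * d_min / d) for d in distances]
```

Because the minimum is taken per order, the second and third orders reach the same peak intensity as the first, which matches the stated aim of playing each order at a comparable level.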
3.3 Angle Distribution: Musical Rhythm
The primary sequence was also analyzed to extrapolate the degree of folding, a
measure of the local angle between segments ideally connecting the centres of mass
of subsequent amino acids.
Proteins composed of extended planar portions (β-sheets) tend to have an angular
distribution centred around 180°. The angle distribution was extracted (see Figure 5)
and parameterized as the note length: the more linear the chain, the shorter the
note. This step gave rhythm to the generated music. In this way, the musical rhythm is
intended as the velocity of an imaginary visitor running along the primary sequence. We
would like to point out that this conversion features a third order cut-off, meaning that
the spatial description fidelity is based on three spatial relations for each amino acid
position; the higher the cut-off, the higher the sound quality.
Fig. 5. Angular distribution. Histogram showing the angular distribution of the vectors linking
the amino acids along the chain A of the human Thymidylate Synthase A (ThyA), used to
codify each note length.
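The angle-to-rhythm step can be sketched as follows in Python. The angle is computed at each interior residue between the segments joining it to the previous and the next centre of mass. The linear mapping to a note length, its unit (beats) and its bounds are assumptions, since the text only states that a more linear chain gives a shorter note.

```python
import math


def fold_angles(centres):
    """Angle in degrees at each interior residue between the segments to
    its previous and next neighbour; 180 means a locally straight chain."""
    angles = []
    for prev, here, nxt in zip(centres, centres[1:], centres[2:]):
        u = [p - h for p, h in zip(prev, here)]
        v = [q - h for q, h in zip(nxt, here)]
        cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        # Clamp against rounding error before taking the arc cosine.
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles


def note_length(angle_deg, shortest=0.25, longest=2.0):
    """Assumed linear map: 180 degrees -> shortest note, 0 -> longest (beats)."""
    return longest - (longest - shortest) * angle_deg / 180.0
```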
3.4 Output: Musical Score/Notation
Finally, a conversion from each amino acid to the corresponding note was performed,
generating an ASCII textual output that can be converted to a MIDI file with the
GNMidi software [9].
Since amino acids’ properties influence the protein folding process, we adopted
Dunn’s translation method [4], which is based on amino acid water solubility. The most
insoluble residues were assigned pitches in the lowest octave, the most soluble,
including the charged residues, were in the highest octave, and the moderately
insoluble residues were given the middle range. Thus, pitches ranged over two
octaves of a chromatic scale.
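The idea of a solubility-ordered pitch assignment can be illustrated with the Python sketch below. It does not reproduce Dunn's actual table, which is not given here: as a stand-in it ranks the twenty amino acids by the Kyte-Doolittle hydropathy index and assigns one chromatic step per rank, with the least soluble residues lowest. The starting pitch (MIDI note 48, one octave below middle C) and all names are assumptions.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic, less soluble).
# Used as a stand-in for the solubility ordering; not Dunn's own table.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

LOWEST_PITCH = 48  # assumed MIDI note number for the least soluble residue

# Rank residues from least to most soluble; ties are broken alphabetically.
_RANKED = sorted(HYDROPATHY, key=lambda aa: (-HYDROPATHY[aa], aa))
PITCH = {aa: LOWEST_PITCH + rank for rank, aa in enumerate(_RANKED)}


def sequence_to_pitches(sequence):
    """Translate a one-letter amino acid sequence into MIDI note numbers."""
    return [PITCH[aa] for aa in sequence.upper()]
```

With twenty residues and one semitone per rank, the pitches span 19 semitones, so the result stays within two chromatic octaves.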
After that, the MIDI files of the three orders were loaded in Ableton Live software
[1] and assigned to MIDI instruments. We chose to assign the first order sequence of
musical notes to a lead instrument (Glockenspiel) and to use a string emulator to play
the other two sequences (see Figure 6). In this way it is possible to discern the
primary sequence from the related musical texture that represents the amino acids
involved in the 3D structure (see Additional Data File).
Fig. 6. Score of the musical translation of the chain A of the human Thymidylate Synthase A
(ThyA). The three instruments represent respectively the primary sequence (Glockenspiel) and
the two different amino acids correlated in the tertiary structure (String Bass, Viola).
4 Discussion
We obtained polyphonic music by translating the amino acid sequence of a peptide
(chain A of ThyA) into musical notes and arranging them in chords by
analyzing their spatial relationship (see Figure 7). To our knowledge, this is the first
time that a team of scientists and musicians has created polyphonic music that describes
the entire 3D structure of a bio-molecule.
[Figure 7, top right: primary structure of ThyA (chain A)]
PPHGELQYLGQIQHILRCGVRKD
DRTGTGTLSVFGMQARYSLRDEF
PLLTTKRVFWKGVLEELLWFIKGS
TNAKELSSKGVKIWDANGSRDFL
DSLGFSTREEGDLGPVYGFQWRH
FGAEYRDMESDYSGQGVDQLQR
VIDTIKTNPDDRRIIMCAWNPRDL
PLMALPPCHALCQFYVVNSELSC
QLYQRSGDMGLGVPFNIASYALLT
YMIAHITGLKPGDFIHTLGDAHIYL
NHIEPLKIQLQRE PRPFPKLRILRK
VEKIDDFKAEDFQIEGYNPHPTIK
MEMAV
[Figure 7, panels a and b: tertiary structure with residues E30, L74 and P273 labelled]
Fig. 7. From tertiary protein structure to musical chord. The primary structure of ThyA (chain
A), at the top right of the figure, folds into its tertiary structure (a, b). In yellow, an example of the
amino acids composing a musical chord: E30, L74 and P273 are non-obviously linked amino
acids according to our 3D spatial analysis criteria.
Previous works attempting to translate a protein into music focused on primary or
secondary protein structure and used different devices to obtain polyphonic music.
Instead, the Matlab® script we developed is able to analyze the PDB file that
contains the spatial coordinates of each atom composing the amino acids of the
protein. The computation of distances and other useful geometrical properties
between non-adjacent amino acids generates a MIDI file that encodes the 3D
structure of the protein into music.
In this way, the polyphonic music contains all the crucial information necessary to
describe a protein, from its primary to its tertiary structure. Moreover, our analysis
is fully reversible: by applying the same translation rules that are used to generate
music, one can store, position by position, the notes (i.e. the amino acids) and obtain
their distances. A first order musical sequence does not give enough information to recover
the true protein structure, because there is more than one possible way to draw
the protein. On the contrary, our approach, based on a third order musical sequence,
has 3 times more data and describes one and only one solution to the problem of
placing the amino acids in a 3D space.
5 Conclusions
Our work represents an attempt to communicate the complexity of the 3D protein
structure to a wider audience, based on a rhythmic musical rendering. Biomusic can be
a useful educational tool to depict the mechanisms that give rise to intracellular vital
signals and determine cell fate. The possibility to “hear” the relations between amino
acids and protein folding could definitely help students and a non-technical audience
to understand the different facets and rules that regulate cell processes.
Moreover, several examples of interdisciplinary projects have demonstrated that the use
of a heuristic approach, sometimes perceived by the interacting audience as a
game, can lead to interesting and useful scientific results [2, 5, 16]. We hope to bring
musical harmony rules into the proteomic research field, encouraging a new
generation of protein folding algorithms. Protein structure prediction, despite all the
efforts and the development of several approaches, remains an extremely difficult and
unresolved undertaking. We do not exclude that, in the future, musicality could be
one of the driving indicators for protein folding investigation.
Acknowledgments. The authors would like to acknowledge Smart Collective
(www.smart-collective.com) and Prof. Fabrizio Pirri (Politecnico di Torino) for their support.
References
1. Ableton Live, http://www.ableton.com
2. Cyranoski, D.: Japan plays trump card to get kids into science. Nature 435, 726 (2005)
3. Dobson, C.M.: Protein folding and misfolding. Nature 426(6968), 884–890 (2003)
4. Dunn, J., Clark, M.A.: Life music: the sonification of proteins. Leonardo 32, 25–32 (1999)
5. Foldit, http://fold.it/portal
6. Gena, P., Strom, C.: Musical synthesis of DNA sequences. XI Colloquio di Informatica
Musicale, 203–204 (1995)
7. Gena, P., Strom, C.: A physiological approach to DNA music. In: CADE 2001,
pp. 129–134 (2001)
8. Gene2Music, http://www.mimg.ucla.edu/faculty/miller_jh/gene2music/home.html
9. GNMidi, http://www.gnmidi.com
10. Hayashi, K., Munakata, N.: Basically musical. Nature 310(5973), 96 (1984)
11. Hofstadter, D.R.: Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, New York
(1979) ISBN 0465026567
12. Jensen, E., Rusay, R.: Musical representations of the Fibonacci string and proteins using
Mathematica. Mathematica J. 8, 55 (2001)
13. Mayor, S.: Serendipity and cell biology. Mol. Biol. Cell 21(22), 3808–3870 (2010)
14. Ohno, S., Ohno, M.: The all pervasive principle of repetitious recurrence governs not only
coding sequence construction but also human endeavor in musical composition.
Immunogenetics 24, 71–78 (1986)
15. Takahashi, R., Miller, J.: Conversion of amino acid sequence in proteins to classical music:
search for auditory patterns. Genome Biology 8(5), 405 (2007)
16. The Space Game, http://www.thespacegame.org