

Running Head: NEURAL NETWORKS AND MUSIC

A Modern View on the “Third Culture” Movement:

Neural Networks and Music

Abigail L. Kleinsmith

State University of New York at Oswego


Introduction

In 1959, novelist Charles P. Snow delivered an extremely influential lecture entitled “The

Two Cultures and the Scientific Revolution” in Cambridge, England. He had published a

paper about this same topic three years earlier, but his lecture is what propelled his ideas into the

realm of common knowledge. He lamented what appeared to be an insurmountable divide

between the “two cultures”: those of the sciences and the humanities (Graves, 1971). According

to Snow, artists believe that scientists are “shallowly optimistic” and “unaware of man’s

condition” (Graves, 1971), while scientists think that artists are “totally lacking in foresight” and

are “in a deep sense, anti-intellectual” (Graves, 1971). He proposed an innovative solution in the

form of a “third culture”, which would not only bridge the gap, but also offer an enhanced and

more comprehensive view of our world (Lehrer, 2007).

Unfortunately, this grand vision has yet to be actualized (Lehrer, 2007). There is still a

large divide between the humanities and the sciences, which is not surprising. As an individual

with deep interests in both of these ‘cultures’, I find it pertinent to attempt to find some common

ground in my own fashion. As I am currently engaged in musical performance and the study of

neural networks and cognitive psychology, I feel compelled to contribute a paper in the form of a

discussion that will attempt to illuminate the striking similarities between these fields of study.

This paper will present a preliminary investigation into the convergence of two fields of

human interest, namely computational modeling and music. I hope that this examination will

provide supporting evidence for a neural network’s ability to simulate and learn about aspects of

music in a practical and applicable way. In doing so, it will present a practical (if

small-scale) application of C. P. Snow’s “third culture” theory.


Music and Cognition

Music is something which is inextricably bound to the development and evolution of the

human race. Some scholars have openly questioned the importance of music;

Steven Pinker has famously referred to music as “auditory cheesecake”, meaning that it is ‘nice

to have’ but ultimately unnecessary (Levitin, 2007). Many other individuals disagree with

Pinker, such as Dan Levitin of McGill University in Canada. Author of “This is Your Brain on

Music”, Levitin firmly believes that music evolved alongside the

language of our species, and in fact contributed directly to our higher development (Levitin,

2007).

We have yet to discover a culture which lacks some form of music. Donald Brown’s

book entitled “Human Universals” presents an extensive list of attributes which he believes to be

widespread across cultures, one of which is music. There have been updates and additions to his

book since its publication in 1991, but there remains a rather large section dedicated to music. He

lists such universals as: children’s music, musical redundancy, music as seen in art, musical

variation, music as related to social functions, and music as a religious activity (Brown, 1991).

This example demonstrates the apparent importance of music from a cultural perspective.

In order to discuss the issue empirically, we must address music’s effect on the

human brain. The fact that musical activities tend to activate areas in nearly every part of the

brain lends support to the idea that music has developed with our species. If a creative construct

has the power to simultaneously activate various cortical areas, it could help to develop stronger

connections between the activated neurons. Recent topics documented by William F.

Thompson in his book “Music, Thought, and Feeling: Understanding the Psychology of Music”


include the emotional effects of music on the human limbic and cortical systems, and the ability

of pleasant music to activate parietal, frontal, and temporal lobes (Thompson, 2009).

In keeping with the emotional aspects of music cognition, I performed a study in 2009

that examined the potential of musical key and tempo to alter a person’s affect. I recorded one

piece of music played in four different ways (major key/fast tempo, major key/slow tempo,

minor key/fast tempo, minor key/slow tempo); participants listened to one of the four pieces

while they read an intentionally ambiguous story. Participants were then asked questions about

both the character’s mood and state of being. Interestingly, there was a strong relationship between

key and projected affect; participants overwhelmingly perceived the character as being happy

when they heard the song in a major key, and sad when they heard the song in a minor key

(ANOVA, p = 0.006) (Kleinsmith, 2009). Another notable fact is that only two of the individuals

surveyed had any kind of musical training, formal or otherwise (N = 24). This seems to imply

that there is something implicit about music’s ability to alter our affective state, and this concept

aligns itself with some of the documented ideas of Charles Darwin.

Darwin’s evolutionary theory helps to support the idea that music has an adaptive

component. The survival of a species can be contingent upon its members’ ability to engage in group

cohesion and cooperation (Thompson, 2009). Music can help to synchronously engage

individuals in a rhythmic activity such as marching, clapping, or drumming. If a group of people

forms an interconnected unit, they have greater chances for survival. A disconnected and chaotic

group will not achieve the same level of performance (Thompson, 2009). As such, one can

theorize that groups who engage in synchronous and rhythmic activities may increase their

chances of survival.


The study of music’s effect on the human brain is a developing field, due in part to the

emergence of cognitive science as a reliable field of study. Groups focusing their efforts on the

interaction of music and the brain, such as The Society for Music Perception and Cognition

(SMPC, founded in 1990), are relatively recent. The SMPC focuses its efforts on expanding our

knowledge about music by studying it empirically from various angles. This is clearly a

burgeoning field of study which has the potential to actualize C. P. Snow’s “third culture”

theory.

General concepts of neural networks

Computational modeling is a useful tool for discussing human cognition, and artificial

neural networks are one of the more common types of modeling discussed in the literature.

Barbara Tillmann provides a useful summary of the purpose of neural networks; she says that “the

goal of artificial networks is not to describe neural anatomy and physiology, but to be founded

on neural principles in order to simulate different levels of perceptual and cognitive processing”

(Peretz & Zatorre, 2003). In my opinion, the degree to which a neural network simulates varying

levels of processing defines its usefulness. Additionally, there needs to be a “biologically

realistic” aspect to the simulation because if a simulation’s output is not applicable to a real

situation, then its usefulness drastically decreases.

Connectionism is another topic which is critically linked to both artificial neural

networks and the issues presented in this paper. Munakata and O’Reilly discuss

connectionism in their textbook about computational modeling, and present backpropagation as

one of its central learning algorithms (O’Reilly & Munakata, 2000). Backpropagation is a

mechanism which measures the error in a network’s output and attempts to correct it by adjusting

the weights that contributed to it. The importance of interconnectivity cannot be ignored as it is a basic


tenet of neural networks. The interaction of different nodes in a network is what ultimately

produces an output far greater than what the nodes would have initially been able to produce

individually.
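
To make the idea of error correction concrete, the following is a minimal sketch of backpropagation on a toy network with two inputs, two hidden units, and one output. The starting weights, the single training example, and the learning rate are all assumptions chosen for illustration; none of them come from O’Reilly and Munakata or from the networks discussed later in this paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_out):
    """Return the hidden activations and the single output activation."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def squared_error(inputs, target, w_hidden, w_out):
    _, output = forward(inputs, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


def backprop_step(inputs, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent update: the output error is passed back to the hidden layer."""
    hidden, output = forward(inputs, w_hidden, w_out)
    # Error signal at the output unit (derivative of squared error through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Each hidden unit receives a share of the error in proportion to its outgoing weight.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * d * x for w, x in zip(row, inputs)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out


# Hypothetical training example and starting weights (illustrative values only).
inputs, target = [1.0, 0.0], 1.0
w_hidden = [[0.5, -0.3], [0.2, 0.4]]
w_out = [0.1, -0.2]

initial_error = squared_error(inputs, target, w_hidden, w_out)
for _ in range(200):
    w_hidden, w_out = backprop_step(inputs, target, w_hidden, w_out)
final_error = squared_error(inputs, target, w_hidden, w_out)

print("error before training:", initial_error)
print("error after training: ", final_error)
```

Each pass nudges every weight in the direction that reduces the output error, which is the sense in which the network "corrects" itself.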

The two neural networks which will be presented in this paper deal with self-

organizing maps (SOMs), or the Kohonen algorithm. The Kohonen algorithm is a

competitive type of neural network learning, and is associated with unsupervised learning

(O’Reilly & Munakata, 2000). Consequently, these networks are also associated with Hebbian

learning. The SOM is a computational tool for analyzing a network’s output. It is also a tool that

acts in a reductionist manner and condenses high-dimensional data into a manageable two-

dimensional form (Toiviainen, 1996). A SOM represents multiple relations between data as well;

the proximity of one node to another can indicate that the two are similar. The Kohonen

algorithm provides a concise and effective way of discussing algorithmic musical networks.
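
The way a SOM condenses high-dimensional data onto a two-dimensional grid can be sketched in a few lines. The example below is only an illustration under my own assumptions: the 3 x 3 grid size, the learning-rate and neighborhood schedules, and the four chords encoded as 12-dimensional pitch-class vectors are hypothetical and are not taken from Toiviainen (1996).

```python
import math
import random


def chord_vector(pitch_classes):
    """Encode a chord as a 12-dimensional vector (1.0 where a pitch class sounds)."""
    return [1.0 if i in pitch_classes else 0.0 for i in range(12)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(grid, sample):
    """Return the (row, column) of the node whose weights are closest to the sample."""
    coords = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    return min(coords, key=lambda rc: distance(grid[rc[0]][rc[1]], sample))


def update(grid, sample, lr, radius):
    """Pull the winning node, and to a lesser degree its grid neighbors, toward the sample."""
    br, bc = best_matching_unit(grid, sample)
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            influence = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
            for i, x in enumerate(sample):
                weights[i] += lr * influence * (x - weights[i])


def train(grid, samples, epochs=40, lr0=0.5, radius0=1.5):
    """Shrink the learning rate and the neighborhood as training proceeds."""
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        for sample in samples:
            update(grid, sample, lr0 * decay, max(radius0 * decay, 0.1))


rng = random.Random(0)
som = [[[rng.random() for _ in range(12)] for _ in range(3)] for _ in range(3)]

# Hypothetical input data: four triads as pitch-class sets (C = 0, C# = 1, ...).
chords = {
    "C major": chord_vector({0, 4, 7}),
    "A minor": chord_vector({9, 0, 4}),
    "G major": chord_vector({7, 11, 2}),
    "F# major": chord_vector({6, 10, 1}),
}
train(som, list(chords.values()))

for name, vector in chords.items():
    print(name, "->", best_matching_unit(som, vector))
```

After training, each chord is summarized by a single grid position, and the distance between two positions can be read as a rough indication of how similar the corresponding inputs are.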

Artificial neural networks of music

The treatment of neural networks and music could benefit from a brief discussion of

temporality. Jeffrey Elman’s groundbreaking paper “Finding Structure in Time”, written in 1990,

discussed the importance of temporal aspects in relation to cognitive science, and there appears

to be no human cognition which relies more fundamentally upon temporality than music. For

Elman, “time is inextricably bound up with many behaviors which express themselves as

temporal sequences. Indeed, it is difficult to know how one might deal with such basic problems

as goal-directed behavior, planning, or causation without some way of representing time”

(Elman, 1990). It becomes apparent that studies of music cognition have an important place in

the realm of cognitive studies because time is intimately connected with music. For example, it


would be quite impossible to discuss the erratic melodic lines of Schoenberg’s string quartets

without discussing the concept of time.

Viewing music from this cognitive perspective allows us to create and solidify a

foundation for the purpose of discussing both fields on the same plane. It is important to have a

directed reason for discussing this unconventional application of neural networks. As this paper

will demonstrate, not only do artificial neural networks of music have an important place in the

study of the brain, but they can highlight and enhance scientists’ appreciation of the arts, and

consequently contribute to the development of a true “third culture”.

While artificial neural networks have long been used to demonstrate and describe a wide

variety of cognitive tasks and disorders, their use for the modeling of music is an area which is

only now emerging as a useful intellectual tool. While some individuals, such as Petri

Toiviainen (who is a professor of musicology at the University of Jyväskylä in Finland), have

focused their academic research careers almost solely upon the abilities of neural networks to

model music cognition, it is a very specific area of study and is therefore underrepresented in the

literature. An individual studying both of the seemingly disparate fields – and their interaction –

appears to require a great deal of knowledge in more than one area; this may be another

contributing factor to the lack of available empirical research about the topic. It now seems

pertinent to discuss some of Toiviainen’s research, as it seems to provide a compelling affirmation

of the idea that artificial neural networks can accurately and effectively model aspects of music.

In one of his studies, Toiviainen builds on Carol Krumhansl’s studies of tonal

hierarchy within the Western twelve-tone chromatic scale (see Figure 1 below, “C” is repeated).

He utilizes a neural network which has been designed to recognize and classify notes within

bebop-style jazz improvisation (Toiviainen, 1996). Improvisation is of particular interest to


cognitive musicologists because of its inherently random nature. When musicians engage in

improvisation, they are operating under varying types and degrees of constraints (e.g., key of the

piece, tempo, and musical meter). A computer simulation can model this by utilizing a ‘chaotic’

element that could randomly distribute weight values across the network (Toiviainen, 1996).

Figure 1: Twelve-tone Western chromatic scale, C major.

Toiviainen was particularly interested in the ability of a neural network to establish and

detect a sense of tonal hierarchy within the music, through the use of statistics. A tonal hierarchy

can be defined as a human’s intentional ordering of notes in order of their importance to the

structure of the scale (Toiviainen, 1996). A tone may be perceived as more “important” than

another if it is a critical place-holder in terms of the scale itself. For example, in the key of C

major, the C is the most critical to the overall structure and development of the scale. It both

begins and ends the scale, and is the tone with which the scale is associated (see Figure 2 below).

The notes E and G are perceived as being the next most critical tones because of their importance

to the key’s triad (see Figure 3 below). The triad is a chord which defines the key in an

identifiable way (Toiviainen, 1996).

Figure 2: C Major scale

Figure 3: C Major Triad


It is interesting to note that many individuals will “pick up on” or sense these critical

structural architectures, even without any general knowledge of music theory. As demonstrated

earlier with my 2009 study, individuals who have absolutely no formal musical training will

respond to changes in musical key and demonstrate a change in affect as a direct result of the

music and the manipulation of independent variables. I believe that the tonal hierarchy within a

C major triad would be immediately identified and conceptually understood upon hearing an

auditory example.

Keeping this in mind, we can move on to an exploration of Toiviainen’s findings with

this particular neural network. The results of Krumhansl’s study (performed with actual human

listeners and not a neural network) can be seen in Figure 4; she demonstrated that there are very

definite perceived differences between the tones in a chromatic Western scale. The same pattern

is demonstrated in an imitative study executed by Järvinen and colleagues in 1995 (see Figure 5);

this time, the pattern is displayed in relation to the frequency with which the tones were present

in a set of fifty-six improvised samples of bebop-jazz. It would appear that there is a correlation

between the perceived importance of a note within the structure of a scale, and the frequency

with which it is heard in the actual music (Toiviainen, 1996).
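
The kind of comparison described here, between perceived importance and frequency of occurrence, can be sketched as a simple correlation. In the example below, the profile values are the major-key probe-tone ratings commonly attributed to Krumhansl and Kessler, reproduced from memory and best checked against the published table before any serious use. The melody is entirely hypothetical and was invented for this illustration; it is not Järvinen’s bebop data.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Major-key tonal hierarchy ratings for C major, as commonly cited
# (assumption: quoted from memory, verify against the original source).
KK_MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

# Hypothetical melody in C major, invented purely for illustration.
MELODY = ["C", "E", "G", "E", "C", "D", "E", "F", "G", "A",
          "G", "E", "C", "D", "B", "C", "G", "F", "A", "C"]


def pitch_class_counts(melody):
    """Count how often each of the twelve pitch classes occurs in the melody."""
    return [melody.count(name) for name in PITCH_CLASSES]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


counts = pitch_class_counts(MELODY)
r = pearson(counts, KK_MAJOR_PROFILE)
print("pitch-class counts:", dict(zip(PITCH_CLASSES, counts)))
print("correlation with tonal hierarchy:", round(r, 3))
```

A strongly positive coefficient would indicate that the tones rated as most important are also the ones heard most often, which is the pattern reported in the studies above.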

Figure 4: Tonal hierarchy (Krumhansl, 1990)


Figure 5: Tonal frequency within improvisational jazz (Järvinen, 1995)

This pattern of implied importance is replicated consistently with Toiviainen’s artificial

neural network. Figure 6 demonstrates the architecture of the network model. The network itself

has three true sources of input, although the architecture appears to indicate that there are more.

They are as follows:

1. (C) represents the Context Input and indicates information about the Present Chords

(PC) and the Following Chords (FC) being presented to the network.

2. (F) represents Feedback Input and can be thought of as a primitive type of short-term

memory, joining together musical patterns.

3. (E) represents External Input and accounts for the randomness of improvisation by

adding variation to the output patterns (Toiviainen, 1996).


Figure 6: Architecture of Toiviainen’s neural network (1996).

The network was trained to recognize jazz melodies as they were presented to it; Figure 7

demonstrates the network’s representation of an example melody. Toiviainen makes explicit

reference to the occurrence of Hebbian learning in this particular neural network when he

discusses how the network was trained to learn melodies. They were learned “by strengthening

the connections between the active neurons of the auto-associator” (Toiviainen, 1996). This idea

is consistent with concepts relevant to more general types of neural networks, and this point

serves to highlight the ability of neural networks to simulate musical concepts.
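
The principle of strengthening connections between co-active units can be illustrated with a generic Hebbian auto-associator. This is a minimal sketch under my own assumptions, not a reconstruction of Toiviainen’s network: the two eight-unit patterns are hypothetical stand-ins for melodic fragments, and units are coded as +1 (active) or -1 (inactive).

```python
def train_hebbian(patterns):
    """Hebbian rule: add to each connection the product of the two units' activities."""
    size = len(patterns[0])
    weights = [[0.0] * size for _ in range(size)]
    for pattern in patterns:
        for i in range(size):
            for j in range(size):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, cue):
    """One update of every unit from the cue: a unit turns on if its net input is non-negative."""
    return [1 if sum(w * x for w, x in zip(row, cue)) >= 0 else -1 for row in weights]


# Hypothetical stored patterns (illustrative only).
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]

weights = train_hebbian([PATTERN_A, PATTERN_B])

# Present a damaged version of pattern A (first unit flipped) and let the network complete it.
noisy_a = [-1] + PATTERN_A[1:]
restored = recall(weights, noisy_a)
print("cue:     ", noisy_a)
print("restored:", restored)
```

Because the learned weights encode which units were active together, a partial or corrupted cue is enough for the network to reproduce the full stored pattern.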

Figure 7: The network’s representation of a melody


The results of the training sequence are clearly correlated with Krumhansl’s findings.

Toiviainen describes his network as a connectionist model, which truly highlights the idea of

emergence in relation to the hierarchical structure of a chromatic music scale. Figure 8 indicates

the frequency of occurrence of certain notes within the network. The white nodes indicate the

input (learning) phase, and the black nodes indicate the output (production) phase.

Figure 8: Input and Output of Toiviainen’s neural network (1996)

The network was similarly trained and run in accordance with different constraints set

by Toiviainen and his colleagues. Using the same basic network architecture, he analyzed the

frequency of occurrence of the individual chords of the chromatic C major scale. Each note of

the scale can be the root note (first note) of a chord; consequently, there are twelve chords

represented in Figure 9, which demonstrates the difference between the input/learning phase

(white nodes) and the output/production phase (black nodes) in the form of a graph (Toiviainen,

1996).

Toiviainen indicates that there is a clear tendency of the network to emphasize (or de-

emphasize) notes in the scale depending on their tonal function (Toiviainen, 1996). Music that

utilizes a greater number of “unimportant” / infrequent tones (such as C# and G#, as seen in


Figure 9) is frequently perceived as being abstract, obscure, or unpleasant. This is due to the

general architecture of the auditory cortex, in addition to the listener’s expectations. Some

musical artists intentionally do this in order to elicit a desired emotion or reaction in their

listening audience (Lehrer, 2007).

Figure 9: Frequency of tonal occurrences in an output of the neural network.

Toiviainen’s goals for this study were to present and evaluate the output of a network

associated with tonal hierarchy, as well as to critically evaluate the output and suggest means of

improvement. He lists ways in which to improve the methodology and help to more accurately

model music in a scientific way, which are not critically relevant to the current investigation.

This neural network was one of his less complex demonstrations, making it feasible to discuss

within the context of this overview (Toiviainen, 1996).

In contrast to the first example of a musical neural network, I would now like to present a

recurrent connectionist network which is designed to perform music composition. The network is

called CONCERT (an acronym for CONnectionist Composer of ERudite Tunes), and it was

developed by Michael C. Mozer of the University of Colorado at Boulder (Griffith & Todd,

1999). This network composes music solely by imitation and prediction based upon the training


sequences to which it is exposed. It incorporates the three musical aspects of pitch, note duration,

and harmonic structure into its composition. CONCERT was trained using multiple sets of both

Johann Sebastian Bach pieces and traditional European folk melodies (Griffith & Todd, 1999). I

will now briefly explain the network’s basic architecture in addition to its methods of learning

and its process of composition.

CONCERT’s architecture is composed of several layers and is very clearly

recurrent. Its architecture resembles Toiviainen’s in one respect: both

networks carry information about earlier notes forward into later processing. A melody is presented to

the network in a note-by-note fashion, so the input node in this network could be represented by

the “Current Note” node. Mozer indicates that the “Context” node of the network acts as a layer

which “can represent relevant aspects of the input history, that is, the temporal context in which a

prediction is made” (Griffith & Todd, 1999). This layer acts as a form of short-term memory for

the network.
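
The role of a context layer as short-term memory can be sketched with a generic Elman-style recurrent step, in which the hidden state from the previous note is fed back alongside the current note. The three-note vocabulary, the layer sizes, and all weight values below are assumptions made for illustration; they are not CONCERT’s actual parameters or note representation.

```python
import math

NOTES = ["C", "E", "G"]

# Hypothetical fixed weights: two context units, three possible input notes.
W_IN = [[0.8, -0.4, 0.1],
        [-0.2, 0.6, 0.9]]
W_CTX = [[0.5, 0.0],
         [0.0, 0.5]]


def one_hot(note):
    return [1.0 if name == note else 0.0 for name in NOTES]


def step(note, context):
    """Combine the current note with the previous context to produce the new context."""
    x = one_hot(note)
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_IN[j], x))
            + sum(w * ci for w, ci in zip(W_CTX[j], context))
        )
        for j in range(len(W_IN))
    ]


def run(melody):
    """Present a melody one note at a time and return the final context state."""
    context = [0.0] * len(W_IN)
    for note in melody:
        context = step(note, context)
    return context


state_after_c_e = run(["C", "E"])
state_after_g_e = run(["G", "E"])
print("context after C, E:", state_after_c_e)
print("context after G, E:", state_after_g_e)
```

Both melodies end on the same note, yet the final context differs, because the layer retains a trace of what came before. A prediction made from that context therefore depends on the temporal history of the input and not only on the current note.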

The information flows from the “Context” node to the next two nodes, which are “Next

Note Distributed (NND)” and “Next Note Local (NNL)”. They both incorporate and represent

the three aspects of music listed previously in this review: pitch, duration, and harmonic

structure. Mozer indicates that these layers “contain CONCERT’s internal representation of the

note” (Griffith & Todd, 1999). Finally, the prediction of the next note is represented in the output

layer of the network, labeled as the “Note Selector” node (Griffith & Todd, 1999). A pictorial

representation of CONCERT’s architecture can be seen in Figure 10 below.


Figure 10: CONCERT’s architecture

With the network’s architecture in mind, it is important to discuss its method of

composition. Mozer refers to its compositional technique as “algorithmic music composition”

(Griffith & Todd, 1999). He defines this as the network’s ability to select notes in a sequential

and logical order according to a specific and pre-programmed table. This table gives a numerical

representation of the probability of one note to transition into another, and is cited multiple times

in Griffith and Todd’s discussion of Mozer’s network (Griffith & Todd, 1999). This transitional

probability seems similar to the work done by Krumhansl which was discussed earlier in this

paper, although she is not credited with the development of this table. According to Mozer, it is

possible to manipulate and develop individual transitional tables in order to exemplify specific

musical styles. The one which Mozer utilized for the development of CONCERT is apparently

based upon the transitional probabilities found within traditional European folk music (Griffith &

Todd, 1999).
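
Selecting notes from a table of transitional probabilities can be sketched as a first-order Markov chain. The four-note vocabulary and every probability in the table below are hypothetical values invented for illustration; they are not the table Mozer used, and the sketch covers only the table lookup, not CONCERT itself.

```python
import random

# Hypothetical transition table: TRANSITIONS[current][next] is the probability
# of moving from the current note to the next one. Each row sums to 1.
TRANSITIONS = {
    "C": {"C": 0.1, "D": 0.4, "E": 0.3, "G": 0.2},
    "D": {"C": 0.3, "D": 0.1, "E": 0.4, "G": 0.2},
    "E": {"C": 0.2, "D": 0.3, "E": 0.1, "G": 0.4},
    "G": {"C": 0.5, "D": 0.1, "E": 0.3, "G": 0.1},
}


def next_note(current, rng):
    """Draw the next note according to the probabilities in the current note's row."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def compose(start, length, rng):
    """Build a melody one note at a time, each choice depending on the previous note."""
    melody = [start]
    while len(melody) < length:
        melody.append(next_note(melody[-1], rng))
    return melody


melody = compose("C", 8, random.Random(0))
print(" ".join(melody))
```

Swapping in a different table changes the character of the output, which is one way to read the suggestion that individual tables can exemplify specific musical styles.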


Finally, I will discuss the way in which CONCERT is trained. During the learning

process, a melody is presented to the network one note at a time. The way

in which the network composes music is almost identical to the way in which it is trained, which

is not surprising. The network composes one note at a time, basing its next ‘decision’ upon the

previous note in the sequence. Mozer also indicates that CONCERT uses a form of

backpropagation, which is evident upon simply looking at the architecture (Griffith & Todd,

1999).

On a final note, it is interesting that Mozer appears to have experienced some degree of

success with CONCERT. He indicates that music is easier to simulate with a neural network than

natural language; music’s finite grammar in conjunction with its “psychoacoustic and stylistic

regularities” (Griffith & Todd, 1999) make it relatively easy to model.

Further Considerations

This paper only briefly touches upon the applications of neural networks to the modeling

of music cognition. As a result, there are many ideas that still need to be addressed with respect

to the combination of the sciences and the humanities. One issue with combining neural science

and music is a lack of global coherence, mentioned in the discussion of CONCERT. Composers

of music can be influenced by an infinite number of things, and individual experience and

cultural perceptions are perhaps the two most prominent. Neural networks have no true

‘individual experience’, except for the training sequences to which they are subjected. “The

difficulty is in deriving this knowledge in an explicit form: even human composers are unaware

of many of the constraints under which they operate” (Griffith & Todd, 1999).

Although it is possible to imitate a sense of randomness with a chaotic weight in a

network, it is strange to imagine a neural network improvising in a cool jazz style, such as that of


Miles Davis. As has been demonstrated with the help of Munakata and O’Reilly, it is important

to remember that neural networks are one tool of many that we can use to analyze our human

experience.

As an amateur musician myself, I find it slightly strange to imagine a neural network

composing pieces of music which could attain such popularity as Beethoven’s great symphonies.

However, this intellectual issue could potentially come to the forefront of the modern musical

community. As our culture progresses into the twenty-first century at an alarming speed, our

technologies advance at a similar rate. Perhaps new methods of computational processing will be

developed in which music can be composed creatively, as opposed to in a merely imitational

way.

I believe that in the coming years, C. P. Snow’s “third culture” will become more of a

reality than he had ever imagined. Even in my own personal experience, I am noticing the

importance of a multifaceted background, both academically and otherwise. In today’s

interconnected global community, it is critical that individuals be able to make connections that

span more than one field of interest or study. Theoretically, persons with multidimensional

backgrounds will move to the forefront of the intellectual community and help to create a

foundation of understanding between the scientists and the artists of the twenty-first century. It

is my hope that this paper illuminates one small way in which the gap between the sciences and

the humanities can be bridged, and potentially even closed sometime in the near future.


References

Brown, D. E. (1991). Human universals. Philadelphia: Temple University Press.

Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14, 179-211.

Graves, N. C. (1971). The two culture theory in C. P. Snow's novels. Hattiesburg: University and

College Press of Mississippi.

Griffith, N., & Todd, P. M. (1999). Musical networks: Parallel distributed perception and

performance. Cambridge, Mass: MIT Press.

Kleinsmith, A. (2009). Effects of musical properties upon emotion perception: A study in

psychoacoustics. Oswego, NY: Unpublished manuscript.

Lehrer, J. (2007). Proust was a neuroscientist. Boston: Houghton Mifflin Co.

Levitin, D. J. (2007). This is your brain on music: The science of a human obsession. New York:

Plume.

O'Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience:

Understanding the mind by simulating the brain. Cambridge, Mass: MIT Press.

Peretz, I., & Zatorre, R. J. (2003). The cognitive neuroscience of music. Oxford: Oxford

University Press.

Thompson, W. F. (2009). Music, thought, and feeling: Understanding the psychology of music.

Oxford: Oxford University Press.

Toiviainen, P. (1996). Modelling musical cognition with artificial neural networks. Jyväskylä:

University of Jyväskylä.
