AI and Music: From Composition to Expressive Performance

Ramon Lopez de Mantaras and Josep Lluis Arcos

AI Magazine, Volume 23, Number 3 (2002). Copyright © 2002, American Association for Artificial Intelligence. All rights reserved.

In this article, we first survey the three major types of computer music systems based on AI techniques: (1) compositional, (2) improvisational, and (3) performance systems. Representative examples of each type are briefly described. Then, we look in more detail at the problem of endowing the resulting performances with the expressiveness that characterizes human-generated music. This is one of the most challenging aspects of computer music, and it has been addressed only recently. The main problem in modeling expressiveness is to grasp the performer's "touch," that is, the knowledge applied when performing a score. Humans acquire it through a long process of observation and imitation. For this reason, previous approaches, based on following musical rules that try to capture interpretation knowledge, had serious limitations. An alternative approach, much closer to the observation-imitation process observed in humans, is to directly use the interpretation knowledge implicit in examples extracted from recordings of human performers instead of trying to make such knowledge explicit. In the last part of the article, we report on a performance system, SAXEX, based on this alternative approach, that is capable of generating high-quality expressive solo performances of jazz ballads from examples of human performers within a case-based reasoning (CBR) system.

AI has played a crucial role in the history of computer music almost since its beginning in the fifties. However, until recently, most efforts had focused on compositional and improvisational systems, and little effort had been devoted to performance systems. Furthermore, AI approaches to performance tried to capture the performer's touch, that is, the knowledge applied when performing a score, by means of rules. This performance knowledge concerns not only technical features but also the affective aspects implicit in music. Humans acquire it through a long process of observation and imitation (Dowling and Harwood 1986). For this reason, AI approaches based on following musical rules that try to capture interpretation knowledge have serious limitations when attempting to model humanlike music performance. The last sections of this article describe the SAXEX system (Arcos and Lopez de Mantaras 2001; Arcos, Lopez de Mantaras, and Serra 1998).1 SAXEX directly uses the performance knowledge implicit in examples extracted from recordings of human performers instead of trying to make that knowledge explicit by means of rules. With this approach, closer to the observation-imitation process observed in humans, SAXEX is capable of generating high-quality, humanlike monophonic (single melodic instrument) performances of jazz ballads based on examples of human performers.2

SAXEX is based on a type of analogical reasoning called CBR. In CBR (Aamodt and Plaza 1994; Kolodner 1993; Leake 1996), problems are solved by reusing (often through some sort of adaptation process) the solutions to similar, previously solved problems. It relies on the reasonable assumption that similar problems have similar solutions. CBR is appropriate for problems where (1) many examples of already solved similar problems are available and (2) a large part of the knowledge involved in the solution of problems is tacit, that is, difficult to verbalize and generalize (as is the case for the knowledge involved in playing expressively). An additional advantage of CBR is that each new solved problem can be revised by a human and memorized; therefore, the system can improve its problem-solving capabilities with experience; in other words, it learns by experience.

It is worth noticing here that although the use of CBR was motivated by the observation-imitation process observed in humans, we do not claim that SAXEX entirely reproduces what musicians do when performing music.

This article starts by briefly surveying a set of representative work on composition, improvisation, and performance. Then, through the description of SAXEX, we look in more detail at the problem of endowing computer-generated performances with the expressiveness that characterizes human performances. Indeed, as described in the AI and Music section, existing approaches typically deal with two expressive resources (such as dynamics and rubato, rubato and vibrato, or rubato and articulation). In contrast, with the approach used in SAXEX, we are capable of dealing with all the main expressive resources: dynamics (loudness), rubato (variations in note duration and attack time), vibrato (repeated small variations in a pitch's frequency and amplitude), articulation (variations in the transition time between notes), and the attack of the notes. Another distinctive aspect of SAXEX is that it works with non-MIDI sound files, which makes the problem much more difficult but also more interesting because the resulting performances are much more humanlike. We end the article by describing the experiments performed and the results obtained and by discussing the limitations and conclusions.

AI and Music

In this section, we survey a set of representative computer music systems that use several AI techniques. It is organized into three subsections: The first is devoted to compositional systems, the second describes improvisation systems, and the third is devoted to performance systems. These are the subfields of computer music in which AI techniques have mostly been applied. In the performance subsection, we raise the important problem of generating humanlike expressive performances. An example of a successful step toward humanlike performance is described in detail in the last sections of the article.

Compositional Systems

Hiller and Isaacson's (1958) work on the ILLIAC computer is the best-known pioneering work in computer music. Their chief result is the Illiac Suite, a string quartet composed following the generate-and-test problem-solving approach. The program generated notes pseudorandomly by means of Markov chains. The generated notes were then tested by means of heuristic compositional rules of classical harmony and counterpoint. Only the notes satisfying the rules were kept. If none of the generated notes satisfied the rules, a simple backtracking procedure was used to erase the entire composition to that point, and a new cycle was started again.
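The generate-and-test loop just described can be illustrated with a small Python sketch. It is not Hiller and Isaacson's program: the pitch alphabet, the uniform transition table, and the passes_rules test are invented placeholders standing in for their Markov tables and their harmony and counterpoint rules.

import random

# Toy pitch alphabet (degrees of C major) and a uniform Markov model;
# Hiller and Isaacson's real transition tables and rule set were far richer.
PITCHES = ["C", "D", "E", "F", "G", "A", "B"]
TRANSITION = {p: {q: 1 / len(PITCHES) for q in PITCHES} for p in PITCHES}

def next_note(prev):
    """Pseudorandomly generate a candidate note from the Markov model."""
    candidates, weights = zip(*TRANSITION[prev].items())
    return random.choices(candidates, weights=weights)[0]

def passes_rules(melody, candidate):
    """Stand-in for the heuristic harmony/counterpoint tests.
    Here: forbid immediate repetition and leaps larger than a fourth."""
    if not melody:
        return True
    prev = melody[-1]
    leap = abs(PITCHES.index(candidate) - PITCHES.index(prev))
    return candidate != prev and leap <= 3

def generate(length, max_attempts=20):
    """Generate-and-test with backtracking: if no candidate passes the rules,
    erase the composition so far and start a new cycle."""
    melody = []
    while len(melody) < length:
        prev = melody[-1] if melody else random.choice(PITCHES)
        for _ in range(max_attempts):
            candidate = next_note(prev)
            if passes_rules(melody, candidate):
                melody.append(candidate)
                break
        else:                      # no candidate survived the tests
            melody = []            # erase everything and restart the cycle
    return melody

print(generate(8))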

The goals of Hiller and Isaacson excluded anything related to expressiveness and emotional content. In an interview (Schwanauer and Levitt 1993), Hiller and Isaacson said that before addressing the expressiveness issue, simpler problems needed to be handled first. We believe that this was a very sensible observation in the fifties. After this seminal work, many other researchers based their computer compositions on Markov probability transitions, but with rather limited success, judging from the standpoint of melodic quality. Indeed, methods relying too heavily on Markovian processes are not complete enough to produce high-quality music consistently.

However, not all the early work on composition relies on probabilistic approaches. A good example is the work of Moorer (1972) on tonal melody generation. Moorer's program generated simple melodies, along with the underlying harmonic progressions, with simple internal repetition patterns of notes. This approach relies on simulating human composition processes using heuristic techniques rather than on Markovian probability chains. Levitt (1993) also avoided the use of probabilities in the composition process. He argued that randomness tends to obscure rather than reveal the musical constraints needed to represent simple musical structures. His work is based on constraint-based descriptions of musical styles. He developed a description language that allows expressing musically meaningful transformations of input, such as chord progressions and melodic lines, through a series of constraint relationships that he calls style templates. He applied this approach to describe a traditional walking jazz bass player simulation as well as a two-handed ragtime piano simulation.

The early systems by Hiller and Isaacson and by Moorer were both based on heuristic approaches. However, possibly the most genuine example of the early use of AI techniques is the work of Rader (1974). Rader used rule-based AI programming in his musical round (a circle canon such as "Frère Jacques") generator. The generation of the melody and the harmony was based on rules describing how notes or chords can be put together. The most interesting AI components of this system are the applicability rules, which determine the applicability of the melody and chord-generation rules, and the weighting rules, which give the likelihood of application of an applicable rule by means of a weight. We can already appreciate the use of metaknowledge in this early work.
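Rader's two levels of metaknowledge, applicability rules that filter the candidate generation rules and weighting rules that assign each applicable rule a likelihood, can be sketched as follows. The rule contents below are invented; only the two-stage selection mechanism reflects the description above.

import random

# Each generation rule carries a test for applicability and a weight that
# determines how likely it is to be chosen among the applicable rules.
RULES = [
    {"name": "step_up",   "applicable": lambda state: True,                 "weight": 3.0},
    {"name": "leap_down", "applicable": lambda state: state["beat"] == 0,   "weight": 1.0},
    {"name": "repeat",    "applicable": lambda state: not state["cadence"], "weight": 0.5},
]

def choose_rule(state):
    """Apply the applicability rules first, then pick among the survivors
    with probability proportional to the weighting rules."""
    applicable = [r for r in RULES if r["applicable"](state)]
    if not applicable:
        return None
    weights = [r["weight"] for r in applicable]
    return random.choices(applicable, weights=weights)[0]["name"]

print(choose_rule({"beat": 0, "cadence": False}))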


AI pioneers such as Herbert Simon and Marvin Minsky also published works relevant to computer music. Simon and Sumner (1968, p. 83) described a formal pattern language for music as well as a pattern induction method to discover patterns that are more or less implicit in musical works. They give one example of a pattern that can be discovered: "The opening section is in C Major, it is followed by a section in dominant and then a return to the original key." Although the program was never completed, it is worth noticing that it was one of the first to deal with the important issue of music modeling, a subject that has been, and still is, widely studied. For example, the use of models based on generative grammars has been, and continues to be, an important and useful approach in music modeling (Lerdahl and Jackendoff 1983).

Marvin Minsky (1981), in his well-known paper "Music, Mind, and Meaning," addresses the important question of how music impresses our minds. He applies his concept of agents and their role in a society of agents as a possible approach to shed light on this question. For example, he hints that one agent might do nothing more than notice that the music has a particular rhythm. Other agents might perceive small musical patterns, such as repetitions of a pitch, and differences, such as the same sequence of notes played a fifth higher. His approach also accounts for more complex relations within a musical piece by means of higher-order agents capable of recognizing large sections of music. It is important to clarify that in this paper, Minsky does not try to convince the reader of the validity of his approach; he just hints at its plausibility.

Among the compositional systems, a large number deal with the problem of automatic harmonization using several AI techniques. One of the earliest works is that of Rothgeb (1969). He wrote a SNOBOL program to solve the problem of harmonizing the unfigured bass (given a sequence of bass notes, infer the chords and voice leadings that accompany these bass notes) by means of a set of rules such as "if the bass of a triad descends a semitone, then the next bass note has a sixth." Rothgeb's main goal was not the automatic harmonization itself but testing the computational soundness of two eighteenth-century bass harmonization theories.

One of the most complete works on harmonization is that of Ebcioglu (1993). He developed an expert system, CHORAL, to harmonize chorales in the style of J. S. Bach. CHORAL is given a melody and produces the corresponding harmonization using heuristic rules and constraints. The system was implemented using a logic programming language designed by the author. An important aspect of this work is the use of sets of logical primitives to represent the different viewpoints of the music (chords view, time-slice view, melodic view, and so on). These sets of logical primitives allowed tackling the problem of representing large amounts of complex musical knowledge.

MUSACT (Bharucha 1993) uses neural networks to learn a model of musical harmony. It was designed to capture musical intuitions about harmonic qualities. For example, one of the qualities of a dominant chord is to create in the listener the expectancy that the tonic chord is about to be heard. The greater the expectancy, the greater the feeling of consonance of the tonic chord. Composers can choose to satisfy or violate these expectancies to varying degrees. MUSACT is capable of learning such qualities and of generating graded expectancies in a given harmonic context.

In HARMONET (Feulner 1993), the harmonization problem is approached using a combination of neural networks and constraint-satisfaction techniques. The neural network learns what are known as the harmonic functions of the chords (chords can play the function of tonic, dominant, subdominant, and so on), and constraints are used to fill in the inner voices of the chords. The work on HARMONET was extended in the MELONET system (Hörnel and Degenhardt 1997; Hörnel and Menzel 1998). MELONET uses a neural network to learn and reproduce higher-level structure in melodic sequences. Given a melody, the system invents a baroque-style harmonization and variation of any chorale voice. According to the authors, HARMONET and MELONET together form a powerful music-composition system that generates variations whose quality is similar to that of an experienced human organist.

Pachet and Roy (1998) also used constraint-satisfaction techniques for harmonization. These techniques exploit the fact that both the melody and the harmonization knowledge impose constraints on the possible chords. However, efficiency is a problem with pure constraint-satisfaction approaches.

In Sabater, Arcos, and López de Mántaras (1998), the problem of harmonization is approached using a combination of rules and CBR. This approach is based on the observation that purely rule-based harmonization usually fails because, in general, the rules don't make the music; it is the music that makes the rules.


Then, instead of relying only on a set of imperfect rules, why not make use of the source of the rules, that is, the compositions themselves? CBR allows using examples of already harmonized compositions as cases for new harmonizations. The system harmonizes a given melody by first looking for similar, already harmonized cases; when similar cases cannot be found, it looks for applicable general rules of harmony. If no rule is applicable, the system fails and backtracks to the previous decision point. The experiments have shown that the combination of rules and cases results in far fewer failures in finding an appropriate harmonization than using either technique alone. Another advantage of the case-based approach is that each new, correctly harmonized piece can be memorized and made available as a new example to harmonize other melodies. That is, a learning-by-experience process takes place. Indeed, the more examples the system has, the less often it needs to resort to the rules, and therefore, the less often it fails.
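The cases-first, rules-second control strategy with backtracking can be sketched as follows. The toy case memory and the chord rule are invented placeholders; only the ordering (reuse a similar case, fall back to general rules, backtrack on failure) reflects the approach described above.

# Toy case memory: (melody note, previous chord) pairs seen in already
# harmonized pieces, with the chord that was used there. Purely illustrative.
CASES = {
    ("C", None): "C", ("E", "C"): "Am", ("G", "Am"): "C",
    ("F", "C"): "F",  ("D", "F"): "G7", ("C", "G7"): "C",
}

def rule_candidates(note, prev_chord):
    """Stand-in for general harmony rules: allow any chord containing the note."""
    triads = {"C": "CEG", "F": "FAC", "G7": "GBDF", "Am": "ACE", "Dm": "DFA"}
    return [name for name, tones in triads.items() if note in tones]

def harmonize(melody, prev_chord=None):
    """Cases first, rules second, backtracking when neither leads anywhere."""
    if not melody:
        return []
    note, rest = melody[0], melody[1:]
    candidates = []
    if (note, prev_chord) in CASES:                  # 1. reuse a solved case
        candidates.append(CASES[(note, prev_chord)])
    candidates += rule_candidates(note, prev_chord)  # 2. fall back to rules
    for chord in candidates:
        result = harmonize(rest, chord)
        if result is not None:
            return [chord] + result
    return None                                      # 3. fail and backtrack

print(harmonize(["C", "E", "G", "F", "D", "C"]))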

MUSE (Schwanauer 1993) is also a learning system that extends an initially small set of voice-leading constraints by learning a set of rules of voice doubling and voice leading. It learns by reordering the rules agenda and by chunking the rules that satisfy the set of voice-leading constraints. MUSE successfully learned some of the standard rules of voice leading included in traditional books on tonal music.

Morales-Manzanares et al. (2001) developed a system called SICIB capable of music composition using body movements. This system uses data from sensors attached to a dancer and Prolog rules to couple the gestures with the music. The architecture of SICIB also allows real-time performance.

Certainly the best-known work on computer composition using AI is David Cope's (1990, 1987) EMI project. This work focuses on the emulation of the styles of various composers. It has successfully composed music in the styles of Cope, Mozart, Palestrina, Albinoni, Brahms, Debussy, Bach, Rachmaninoff, Chopin, Stravinsky, and Bartok. It works by searching for recurrent patterns in several (at least two) works of a given composer. The discovered patterns are called signatures. Because signatures are location dependent, EMI uses one of the composer's works as a guide to fix them to their appropriate locations when composing a new piece. To compose the musical motives between signatures, EMI uses a compositional rule analyzer to discover the constraints used by the composer in his or her works. This analyzer counts musical events such as voice-leading directions and the use of repeated notes and represents them as a statistical model of the analyzed works. The program follows this model to compose the motives to be inserted in the empty spaces between signatures. To properly insert them, EMI has to deal with problems such as linking the initial and concluding parts of the signatures to the surrounding motives while avoiding stylistic anomalies, maintaining voice motions, and keeping notes within range. Proper insertion is achieved by means of an augmented transition network (Woods 1970). The results, although not perfect, are quite consistent with the style of the composer.

Improvisation Systems

An early work on computer improvisation is the FLAVORS BAND system of Fry (1984). FLAVORS BAND is a procedural language, embedded in Lisp, for specifying jazz and popular music styles. Its procedural representation allows the generation of scores in a prespecified style by making changes to a score specification given as input. It allows combining random functions and musical constraints (chords, modes, and so on) to generate improvisational variations. The most remarkable result of FLAVORS BAND was an interesting arrangement of the bass line, and an improvised solo, of John Coltrane's composition "Giant Steps."

GENJAM (Biles 1994) builds a model of a jazz musician learning to improvise by means of a genetic algorithm. A human listener plays the role of fitness function by rating the offspring improvisations. Papadopoulos and Wiggins (1998) also used a genetic algorithm to improvise jazz melodies on a given chord progression. Contrary to GENJAM, their program includes a fitness function that automatically evaluates the quality of the offspring improvisations by rating eight different aspects of the improvised melody, such as the melodic contour, note durations, and intervallic distances between notes.

Franklin (2001) uses recurrent neural networks to learn how to improvise jazz solos from transcriptions of solo improvisations by saxophonist Sonny Rollins. A reinforcement learning algorithm is used to refine the behavior of the neural network. The reward function rates the system's solos in terms of jazz harmony criteria and Rollins's style.

The lack of interactivity with a human improviser in these previous approaches has been criticized (Thom 2001) on the grounds that they remove the musician from the physical and spontaneous creation of a melody. Although it is true that the most fundamental characteristic of improvisation is the spontaneous, real-time creation of a melody, it is also true that interactivity was not intended in these approaches, and they could nevertheless generate very interesting improvisations.


Thom (2001), with her BAND-OUT-OF-A-BOX (BOB) system, addresses the problem of real-time interactive improvisation between BOB and a human player. In other words, BOB is a "music companion" for real-time improvisation. Thom's approach follows Johnson-Laird's (1991) psychological theory of jazz improvisation. This theory opposes the view that improvising consists of rearranging and transforming prememorized "licks" under the constraints of a harmony. Instead, he proposes a stochastic model based on a greedy search over a constrained space of possible notes to play at a given point in time. The very important contribution of Thom is that her system learns these constraints, and therefore the stochastic model, from the human player by means of an unsupervised probabilistic clustering algorithm. The learned model is used to abstract solos into user-specific playing modes. The parameters of this learned model are then incorporated into a stochastic process that generates solos in response to four-bar solos of the human improviser. BOB has been very successfully evaluated by testing its real-time solo tradings in two different styles, that of saxophonist Charlie Parker and that of violinist Stephane Grappelli.

Another remarkable interactive improvisation system was developed by Dannenberg (1993). The difference from Thom's approach is that in Dannenberg's system, music generation is mainly driven by the composer's goals rather than the performer's goals. Wessel's (Wessel, Wright, and Kahn 1998) interactive improvisation system is closer to Thom's in that it also emphasizes the accompaniment and enhancement of live improvisations.

Performance Systems

The compositional and improvisation systems described to this point, even though they generated audible output in many cases (very early systems by electronic means and later systems through MIDI instruments), did not specifically address the problem of expressiveness. However, for performance systems, expressiveness is a mandatory concern because the brain is mainly interested in change. Auditory neurons, like the other neurons in the brain, fire constantly, even in silent environments. What really matters is not the firing rate but the changes in firing rate. There are auditory neurons whose firing rate changes only when the sound frequency or the intensity increases or decreases. Other neurons react similarly when a sound repeats. Moreover, most of the primary auditory neurons also exhibit what is known as habituation (Baars 1998). When neurons repeatedly receive the same stimulus, their firing rate, instead of remaining constant, decreases over time, which means that we become deaf to a sound unless it manifests some form of novelty or renewal in its characteristics. Therefore, it is not surprising that music becomes more interesting when it is not too repetitive, that is, when it contains alterations in dynamics, pitch, and rhythm. The absence of such alterations might partly explain why synthesized music is much less interesting than human-performed music: a real instrument sends the auditory cortex more stimuli to react to than synthesized music does. Indeed, the so-called expressive resources provide an extremely rich source of changes for our brains.

In the remaining sections of this article, we focus on the use of AI techniques for generating expressive performances. It is worth noticing that there has been much less work on AI techniques for performance than on AI for composition and improvisation and that all the existing work is recent.

One of the first attempts to provide high-level musical transformations using a rule-based system is that of Johnson (1992). She developed an expert system to determine the tempo and the articulation to be applied when playing Bach's fugues from "The Well-Tempered Clavier." The rules were obtained from two expert human performers. The output gives the base tempo value and a list of performance instructions on note duration and articulation that should be followed by a human player. The results coincide very closely with the instructions given in well-known commented editions of "The Well-Tempered Clavier." The main limitation of this system is its lack of generality: it only works well for fugues written in 4/4 meter. For different meters, the rules would have to be different. Another obvious consequence of this lack of generality is that the rules are only applicable to Bach fugues.

The work of the KTH group from Stockholm (Bresin 2001; Friberg 1995; Friberg, Sundberg, and Fryden 2000; Friberg et al. 1998) is one of the best-known long-term efforts on performance systems. Their current DIRECTOR MUSICES system incorporates rules for tempo, dynamics, and articulation transformations constrained to MIDI. These rules are inferred both from theoretical musical knowledge and experimentally from training, especially using the so-called analysis-by-synthesis approach. The rules are divided into three main classes: (1) differentiation rules, which enhance the differences between scale tones; (2) grouping rules, which show which tones belong together; and (3) ensemble rules, which synchronize the various voices in an ensemble.


Another long-term effort is the project led by G. Widmer (2001, 1996). It is another example of the use of rules for performing transformations of tempo and dynamics. The approach taken in this project was first to record and preprocess a large amount of high-quality musical performances (containing more than 400,000 notes). This large set of examples is used to induce performance rules by means of machine learning techniques. The project has already produced very promising results.

Canazza et al. (1997) developed a system to analyze how the musician's expressive intentions are reflected in the performance. The analysis reveals two different expressive dimensions: (1) one related to the energy (dynamics) and (2) the other related to the kinetics (rubato) of the piece. The authors also developed a program for generating expressive performances according to these two dimensions.

The work of Dannenberg and Derenyi (1998) is also a good example of articulation transformations using manually constructed rules. They developed a trumpet synthesizer that combines a physical model with a performance model. The goal of the performance model is to generate control information for the physical model by means of a collection of rules manually extracted from the analysis of a collection of controlled recordings of human performance.

Another approach taken to performing tempo and dynamics transformations is the use of neural network techniques. In Bresin (1998), a system that combines symbolic decision rules with neural networks is implemented for simulating the style of real piano performers. The outputs of the neural networks express time and loudness deviations. These neural networks extend the standard feed-forward network trained with the back-propagation algorithm with feedback connections from the output neurons to the input neurons.

We can see that, except for the work of the KTH group, which considers three expressive resources, the other systems are limited to two resources, such as rubato and dynamics or rubato and articulation. This limitation has to do with the use of rules. Indeed, the main problem with rule-based approaches is that it is difficult to find rules general enough to capture the variety present in different performances of the same piece by the same musician and even the variety within a single performance (Kendall and Carterette 1990). Furthermore, the different expressive resources interact with each other. That is, the rules for dynamics alone change when rubato is also taken into account. Obviously, because of this interdependency, the more expressive resources one tries to model, the more difficult it is to find the appropriate rules.

As mentioned in the introduction, in the SAXEX system, we directly use the interpretation knowledge implicit in examples from monophonic recordings of human performers instead of trying to make this knowledge explicit by means of rules. This approach allows us to deal with the five most important expressive resources: (1) dynamics, (2) rubato, (3) vibrato, (4) articulation, and (5) the attack of the notes. Furthermore, we do so in the context of an expressively richer (non-MIDI) instrument, the tenor saxophone. Besides, ours was the first attempt to apply CBR to music as well as the first attempt to cover the full cycle from a non-MIDI, inexpressive input sound file to an expressive output sound file. Working directly with sound files is much more complex than working with MIDI because sound files require automatically extracting the higher-level parameters that are relevant for expressiveness. It also complicates the expressive transformation process and the synthesis. However, with our approach, we achieve expressive performances that are much closer to human performances. The remaining sections describe our system.

SAXEX: A Case-Based Reasoning Approach to Expressive Performance

The task of SAXEX is to infer a set of expressive transformations to apply to every note of an inexpressive monophonic input so that the output sounds expressive. In other words, what SAXEX does is add expressiveness to a given flat performance. To do so, SAXEX uses a case memory (figure 1) containing examples of expressive human performances, analyzed by means of spectral modeling techniques (Serra et al. 1997), and musical knowledge. SAXEX also knows the score of the performance. The heart of the method is to analyze each input note to determine its "role" in the musical phrase it belongs to, identify and retrieve notes with similar roles in the expressive cases, and transform the input note so that its expressive properties (dynamics, vibrato, rubato, articulation, and attack) match the properties of the retrieved similar notes.
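A minimal Python sketch of this analyze-retrieve-transform loop follows. The data layout, the role_of and retrieve helpers, and the toy case memory are illustrative placeholders, not SAXEX's actual NOOS representation; preference criteria and value aggregation are treated in more detail in later sections.

# A toy end-to-end pass: each case pairs a note's role in its phrase with the
# qualitative expressive values a human performer used for it.
CASES = [
    {"role": ("P", "last", "strong"),
     "values": {"dynamics": "high", "rubato": "low", "vibrato": "very high",
                "articulation": "medium", "attack": "very high"}},
    {"role": ("P", "inner", "weak"),
     "values": {"dynamics": "medium", "rubato": "medium", "vibrato": "low",
                "articulation": "high", "attack": "low"}},
]

def role_of(note):
    """Stand-in for the GTTM/Narmour analysis: here the role is just
    (Narmour structure, position in structure, metrical strength)."""
    return (note["structure"], note["position"], note["beat_strength"])

def retrieve(role):
    """Return the stored cases whose roles match the input note's role."""
    return [c for c in CASES if c["role"] == role]

def add_expressiveness(flat_phrase):
    performance = []
    for note in flat_phrase:
        matches = retrieve(role_of(note))
        # Reuse: copy the expressive values of the retrieved note (first match
        # here; SAXEX uses preference criteria and aggregation, see below).
        values = matches[0]["values"] if matches else {}
        performance.append({**note, **values})
    return performance

flat = [{"pitch": "G4", "structure": "P", "position": "last", "beat_strength": "strong"}]
print(add_expressiveness(flat))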

Two types of analysis are performed on the expressive human performances. First, using spectral modeling techniques (Serra et al. 1997), the SMS subsystem computes, for each of the five expressive resources, a qualitative value representing how each note in the expressive examples has been played by the human performer.


This value is stored with each note in the case memory and is available to SAXEX. The possible qualitative values are very low, low, medium, high, and very high. To compute them, SMS computes the average quantitative value of the whole performed piece for each one of the expressive resources and then compares how each individual note has been played in relation to these averages. For example, if the dynamics (in decibels) of a note is much higher than the average dynamics of the piece, the symbolic value "very high dynamics" is associated with this note. The same occurs for the other values and the other expressive resources.
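A sketch of this labeling step, assuming the per-note and whole-piece average values have already been extracted by SMS. The decibel boundaries used here are invented for illustration; the article does not specify the actual thresholds.

def qualitative(value_db, piece_average_db):
    """Map a note's measured dynamics (in decibels) to one of the five labels
    by comparing it with the average over the whole piece. The +/-3 and +/-9 dB
    boundaries are illustrative, not SAXEX's actual ones; the same scheme is
    applied to the other expressive resources."""
    diff = value_db - piece_average_db
    if diff < -9:
        return "very low"
    if diff < -3:
        return "low"
    if diff <= 3:
        return "medium"
    if diff <= 9:
        return "high"
    return "very high"

# A note played much louder than the piece's average dynamics:
print(qualitative(value_db=64.0, piece_average_db=52.0))   # -> "very high"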

The second analysis applies the generative theory of tonal music (Lerdahl and Jackendoff 1983) and Narmour's (1990) music cognition model to determine the role of each note in the musical phrase. For example, a note might be the last of an ascending melodic progression, have a long duration, be metrically strong, be harmonically stable, and be the most prominent note in the bar.

As a result of these two analyses, each note in the stored expressive performance is annotated with its role in the musical phrase it belongs to and its qualitative expressive values. These annotations are represented in the cases of the system. It is worth clarifying here that the generative theory of tonal music (GTTM) and Narmour's analyses introduce important contextual information about the notes (because the role of each note in the phrase depends on its relation to the remaining phrase notes as well as on the chord); therefore, SAXEX cases do not contain information only on single notes but include contextual knowledge at the phrase level.

To generate the expressive performance, SAXEX applies the second analysis to the inexpressive input performance to identify the role of each note. It then searches for similar notes (that is, notes having similar roles within the musical phrase) in the stored expressive performances. When a similar note is selected, SAXEX applies the set of expressive values that will make the inexpressive note sound like the retrieved expressive one. The rationale behind this way of proceeding is that similar expressive nuances (similar solutions in CBR terms) should be applied to similar musical phrases (similar problems in CBR terms).


Figure 1. SAXEX Schema. (The inexpressive input phrase, given as a .snd sound file and a .mid score with affective labels, is analyzed by SMS; the Noos CBR component performs the retrieve, reuse, revise, and retain steps over musical models and cases, with user interaction at the revise step; SMS synthesis produces the expressive output phrase as a .snd file.)


Sidebar 1. Narmour's Implication-Realization Model

An intuition shared by many people is that appreciating music has to do with expectation. That is, what we have already heard builds expectations about what is to come. These expectations can be fulfilled or not by what follows. If fulfilled, the listener feels satisfied; if not, the listener is surprised or even disappointed. Based on this observation, Narmour proposed a theory of cognition of melodies based on a set of basic grouping structures (figure A). These structures characterize patterns of melodic implications (or expectations) that constitute the basic units of the listener's perception. Other resources, such as duration and rhythmic patterns, emphasize or inhibit the perception of these melodic implications. The use of the implication-realization model provides a musical analysis of the melodic surface of the piece. The basic grouping structures (figure A) are the P structure (a pattern composed of a sequence of at least three notes with similar intervallic distances and the same registral direction), the ID structure (a sequence of three notes with the same intervallic distance and a different registral direction), the D structure (a repetition of at least three notes), the IP structure (a sequence of three notes with similar intervallic distances and a different registral direction), the VP structure (a sequence of three notes with the same registral direction in which the first intervallic distance is a step and the second is a leap), the R structure (a sequence of three notes with a different registral direction in which the first intervallic distance is a leap and the second is a step), the IR structure (a sequence of three notes with the same registral direction in which the first intervallic distance is a leap and the second is a step), and the VR structure (a sequence of three notes with a different registral direction in which both intervals are leaps). In figure B, the first three notes form a P structure, the next three notes an ID, and the last three notes another P. The two P structures in the figure have a descending registral direction, and in both cases there is durational cumulation (the last note is significantly longer).

Looking at melodic groupings in this way, we can see how each pitch interval implies the next. Thus, an interval can be continued with a similar one (as in P, ID, IP, or VR) or reversed with a dissimilar one. That is, a step (small interval between notes) followed by a leap (large interval between notes) in the same direction would be a reversal of the implied interval (another step was expected, but instead a leap is heard) but not a reversal of direction. Pitch motion can also be continued by moving in the same direction (up or down) or reversed by moving in the opposite direction. The strongest kind of reversal involves both a reversal of interval and a reversal of direction. When several small intervals (steps) move consistently in the same direction, they strongly imply continuation in the same direction with similar small intervals. If a leap occurs instead of a step, it creates a continuity gap, which triggers the expectation that the gap should be filled in. To fill it, the next step intervals should move in the opposite direction from the leap, which also tends to limit pitch range and keeps melodies moving back toward a center. Basically, continuity (satisfying the expectations) is nonclosural and progressive, whereas reversal of implication (not satisfying the expectation) is closural and segmentive. A long note duration after a reversal of implication usually confirms phrase closure.

Any given melody can be described by a sequence of Narmour structures. SAXEX performs an automatic parsing of the melody to extract the Narmour structures (figure B). Then, SAXEX uses these structures to search for and retrieve notes in the case memory that are similar to the input inexpressive notes. To do this retrieval, it takes into account the kind of structure a note belongs to and its position within the structure (first note, inner note, or last note). To select the most preferred notes among the retrieved ones, it uses the registral direction (notes having the same registral direction as the input note are preferred) and the durational cumulation (if the input note has a long duration and is the last note of the structure, retrieved notes having these properties are the preferred ones). The rationale is that two notes that share these Narmour properties should be played similarly. This rationale agrees with the main case-based reasoning assumption that similar problems have similar solutions.
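A sketch of how a three-note group might be mapped to one of these structures from its two intervals. The step/leap and "similar interval" thresholds (in semitones) are simplifications of our own; SAXEX's actual parser is more elaborate and also handles chains longer than three notes.

def classify(n1, n2, n3, step=2, similar=2):
    """Classify a three-note group (MIDI pitches) into a Narmour structure from
    its two intervals. 'step' (<= 2 semitones) and 'similar' (sizes within 2
    semitones of each other) are illustrative thresholds only."""
    i1, i2 = n2 - n1, n3 - n2
    if i1 == 0 and i2 == 0:
        return "D"                       # repetition of the same note
    if i1 == 0 or i2 == 0:
        return "other"                   # lateral motion: not handled here
    same_dir = (i1 > 0) == (i2 > 0)
    a1, a2 = abs(i1), abs(i2)
    small1, small2 = a1 <= step, a2 <= step
    if same_dir:
        if abs(a1 - a2) <= similar:
            return "P"                   # similar intervals, same direction
        if small1 and not small2:
            return "VP"                  # step then leap, same direction
        if not small1 and small2:
            return "IR"                  # leap then step, same direction
    else:
        if a1 == a2:
            return "ID"                  # identical intervals, direction change
        if abs(a1 - a2) <= similar:
            return "IP"                  # similar intervals, direction change
        if not small1 and small2:
            return "R"                   # leap then step, direction change
        if not small1 and not small2:
            return "VR"                  # two leaps, direction change
    return "other"

print(classify(67, 65, 64))   # two small descending intervals -> "P"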


Figure A (top). Narmour's Basic Structures (P, D, ID, IP, VP, R, IR, VR).

Figure B (bottom). Narmour's Parsing of the Beginning of "All of Me" (a P structure, an ID structure, and another P structure).


For example, if the dynamics of the expressive note is high, the rubato is low, the vibrato is very high, the articulation is medium, and the attack is very noisy, SAXEX will apply these same values to the inexpressive input note. These expressive values are then given to the SMS synthesizer, which generates the output sound file (see footnote 2) by transforming the inexpressive sound file given as input.

The SMS Subsystem

Sound analysis and synthesis techniques based on spectral models are useful for extracting high-level parameters from sound files, transforming them, and synthesizing a modified version of such sound files. Our system uses the SMS subsystem (figure 1) to extract key information related to the five expressive resources we deal with. Furthermore, we use the SMS synthesis component to generate the expressive interpretations inferred by the CBR subsystem. SMS is based on decomposing the original sound into sinusoids plus a spectral residual (noise). From the sinusoidal representation, SMS extracts high-level attributes such as attack and release times of the notes, formant structure, vibrato, and average pitch and amplitude when the sound is monophonic. These attributes can be modified according to the expressive values inferred by the CBR subsystem (see the next section), and the result can be added back to the spectral representation without loss of sound quality. In summary, this sound analysis and synthesis software is ideal as a preprocessor that provides the CBR subsystem with a high-level musical description of the sound files and as a postprocessor that adds the expressive transformations inferred by the CBR subsystem to the inexpressive input sound file.
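As a rough illustration of the kind of per-note attributes involved, the following sketch computes an average amplitude and a crude pitch estimate for a monophonic note with plain numpy. It is emphatically not SMS: the real system uses a full sinusoidal-plus-residual decomposition (Serra et al. 1997) and extracts many more attributes (attack and release times, formant structure, vibrato).

import numpy as np

def note_attributes(samples, sr=44100, frame=2048, hop=512):
    """Very rough stand-in for per-note analysis of a monophonic note:
    frame-wise RMS amplitude and an autocorrelation pitch estimate."""
    rms, f0 = [], []
    for start in range(0, len(samples) - frame, hop):
        x = samples[start:start + frame]
        rms.append(np.sqrt(np.mean(x ** 2)))
        ac = np.correlate(x, x, mode="full")[frame - 1:]   # autocorrelation
        lag_min, lag_max = sr // 1000, sr // 50            # roughly 50 Hz .. 1 kHz
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        f0.append(sr / lag)
    return {"avg_amplitude": float(np.mean(rms)), "avg_pitch": float(np.median(f0))}

# Example: one second of a synthetic 220 Hz tone with constant amplitude.
sr = 44100
t = np.arange(sr) / sr
print(note_attributes(np.sin(2 * np.pi * 220 * t), sr))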

The CBR Subsystem

Solving a problem in the SAXEX system involves three parts: (1) the analysis, (2) the reasoning, and (3) the synthesis. The analysis and the synthesis are done using the SMS subsystem as described earlier. The reasoning part is achieved by means of the CBR subsystem (figure 1) and consists of inferring a set of expressive transformations to apply to every note of an inexpressive input so that the output sounds expressive. To do this inference, the CBR subsystem has to be able to retrieve, from a memory containing expressive interpretations, those notes that are, musically speaking, similar to the input inexpressive notes. The retrieved notes are candidates to be imitated. The next four subsections describe the necessary CBR steps performed by SAXEX.

The Retrieval Step

From the overall description above, we can see that the key problem for SAXEX is finding the appropriate note to imitate or, in other words, figuring out when two notes belonging to two different musical pieces play a similar role in their respective musical phrases. When musicians compare two music scores, they use their musical knowledge to solve this problem. For example, they take into account the following five elements:

First is the metrical strength of the notes (the metrical strength of notes played on strong beats is higher than the metrical strength of notes played on weak beats).

Second is the harmonic stability of the notes according to jazz harmony (the position of the note within its underlying chord and the role of the note in the chord progression).

Third is the duration of the notes.

Fourth is the hierarchical relations of the note with respect to the rest of the notes in the phrase according to the GTTM (Lerdahl and Jackendoff 1983). For example, some notes elaborate other notes (some notes are essential, and others are ornamental), and some notes create tensing and relaxing relations with respect to others (some notes give a feeling of finality or settledness; others feel unstable, and when they are played, the listener feels a tension that is resolved when the piece returns to a more stable note). (See sidebar 2 for further details.)

Fifth is the position of the notes in Narmour's (1990) basic musical grouping structures, such as melodic progressions, repetitions, and changes in registral direction. Narmour, in his implication-realization model of cognition of melodies, identified a small number of basic structures that constitute the basic units of the listener's perception. (See sidebar 1 for further details.)

SAXEX represents, in its case memory, all this knowledge about metrical strength, harmonic stability, note duration, hierarchical relations, Narmour structures, and so on, by means of the NOOS object-oriented language (Arcos 1997) and uses this knowledge to retrieve the appropriate notes to imitate from the case memory of expressive notes. That is, SAXEX retrieves notes that are musically similar to the inexpressive input notes. For example, assume that the inexpressive input note is the last of a melodic progression, is metrically strong, and is harmonically stable. Then, SAXEX searches for notes with similar musical properties in the case memory because notes with these properties would clearly be good candidates to imitate. Assume for the moment that only one similar note is found. In this simple situation, this note would immediately be retrieved for imitation.


In other words, SAXEX would decide to apply, to the inexpressive input note, the set of expressive qualitative values that will make that note sound like the retrieved expressive one. For example, if the dynamics of the retrieved expressive note is high, the rubato is low, the vibrato is very high, the attack is very noisy, and the articulation is medium, SAXEX will apply these values to the inexpressive input note.

When several similar notes are found, an additional step is carried out to select the most preferred one using additional musical criteria such as the registral direction (ascending or descending) and the duration of the note (see the sidebars for further details).
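A sketch of this retrieval-and-preference logic follows. The feature names (narmour, metrically_strong, and so on) and the binary preference scoring are a simplification of our own; SAXEX represents this knowledge in NOOS and uses richer criteria.

def retrieve(input_note, case_memory):
    """Retrieve expressive notes whose musical role matches the input note,
    then rank them by secondary preference criteria (registral direction and
    duration). The feature names are illustrative placeholders."""
    def same_role(case_note):
        return (case_note["narmour"] == input_note["narmour"]
                and case_note["metrically_strong"] == input_note["metrically_strong"]
                and case_note["harmonically_stable"] == input_note["harmonically_stable"])

    def preference(case_note):
        score = 0
        if case_note["direction"] == input_note["direction"]:
            score += 1                      # same registral direction preferred
        if case_note["long"] == input_note["long"]:
            score += 1                      # similar duration preferred
        return score

    candidates = [n for n in case_memory if same_role(n)]
    return sorted(candidates, key=preference, reverse=True)

memory = [
    {"narmour": ("P", "last"), "metrically_strong": True, "harmonically_stable": True,
     "direction": "down", "long": True,
     "values": {"dynamics": "high", "rubato": "low", "vibrato": "very high",
                "articulation": "medium", "attack": "very high"}},
]
query = {"narmour": ("P", "last"), "metrically_strong": True,
         "harmonically_stable": True, "direction": "down", "long": True}
best = retrieve(query, memory)
print(best[0]["values"] if best else "no similar note found")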

The Reuse Step

In this step, the selected expressive note is imitated; that is, SAXEX decides to apply expressive transformations to make the inexpressive input note sound like the selected expressive one. When the preference criteria cannot single out a note, and therefore several are equally appropriate according to all the available musical criteria, a fuzzy aggregation operator similar to a weighted average (Arcos and Lopez de Mantaras 2002) is applied to compute a set of expressive values that combines all the expressive values of the selected notes. The user can also preselect one of the following alternatives to replace the aggregation operator:

First, the majority rule chooses, for each expressive resource, the value that was applied in the majority of the selected expressive notes.

Second, the strict majority rule chooses the value that was applied in at least half of the selected expressive notes.

Third, the minority rule is the dual of the first criterion.

Fourth, the strict minority rule chooses the value that was applied in, at most, one of the selected expressive notes.

Fifth, continuity selects the values of the note that comes from the same musical subphrase as the previously selected note.

Sixth, discontinuity is the dual of the fifth criterion.

If, after applying any of these criteria, more than one note remains, a random choice takes place.

From this description, it should be noticed that the default strategy of the system is to apply the fuzzy aggregation operator. Nevertheless, because the generation of expressive performances is a creative process influenced by the user's personal preferences, the alternative reuse criteria can be used to tailor the system to such preferences. For example, the strict minority and the discontinuity criteria would force the system to produce expressive performances containing less usual combinations of expressive effects.
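A sketch of the reuse step when several notes have been retrieved, covering the default aggregation and a subset of the alternative criteria. Averaging label positions is only a crude stand-in for the fuzzy aggregation operator of Arcos and Lopez de Mantaras (2002); the continuity and discontinuity criteria, which need subphrase information, are omitted.

from collections import Counter

LABELS = ["very low", "low", "medium", "high", "very high"]
RESOURCES = ["dynamics", "rubato", "vibrato", "articulation", "attack"]

def aggregate(retrieved, criterion="fuzzy"):
    """Combine the expressive values of the retrieved notes, resource by
    resource. 'fuzzy' crudely averages label positions (a stand-in for the
    real fuzzy aggregation operator); the other criteria follow the list above."""
    result = {}
    for res in RESOURCES:
        values = [note[res] for note in retrieved]
        counts = Counter(values)
        if criterion == "fuzzy":
            mean_pos = sum(LABELS.index(v) for v in values) / len(values)
            result[res] = LABELS[round(mean_pos)]
        elif criterion == "majority":        # most frequent value
            result[res] = counts.most_common(1)[0][0]
        elif criterion == "strict majority": # value used by at least half
            value, n = counts.most_common(1)[0]
            result[res] = value if n >= len(values) / 2 else None
        elif criterion == "strict minority": # value used by at most one note
            rare = [v for v, n in counts.items() if n <= 1]
            result[res] = rare[0] if rare else None
    return result

retrieved = [
    {"dynamics": "high", "rubato": "low", "vibrato": "very high",
     "articulation": "medium", "attack": "high"},
    {"dynamics": "very high", "rubato": "low", "vibrato": "high",
     "articulation": "medium", "attack": "medium"},
]
print(aggregate(retrieved, "majority"))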

The Revise Step

The user can listen to the obtained results and, through an interactive revision window (figure 2), has the option of selecting any note and manually modifying its expressive values (decreasing vibrato, increasing dynamics, and so on) if he or she thinks that some of the solutions decided by the system should be corrected. Additionally, if several expressive versions of the same piece have been generated, the user can take the best partial solutions of each of them to obtain a combined best performance, for example, by taking the first phrase of the second version followed by the second phrase of the first version, and so on.


Figure 2. Interactive Revision Window.


Sidebar 2. Lerdahl and Jackendoff's Generative Theory of Tonal Music

According to the generative theory of tonal music (GTTM), music is built from an inventory of notes and a set of rules. The rules assemble notes into a sequence and organize them into hierarchical structures of music cognition. To understand a piece means to assemble these mental structures as we listen to the piece. The theory proposes four types of structures associated with a piece: (1) grouping, (2) metric, (3) time-span reduction, and (4) prolongational reduction. The grouping structure describes the hierarchically related segmentation units that listeners can establish when hearing a melody: motifs, phrases, and sections. The listener feels that notes group into motifs, which in turn are grouped into phrases, which are grouped into sections, which are finally grouped into full pieces. The metric structure describes the rhythmical hierarchy of the piece. It assigns a weight to each note depending on the beat on which it is played, in such a way that the metric strength of notes played on strong (down) beats is higher than the metric strength (accent) of notes played on weak (up) beats. In figure A, the metric structure is illustrated by the dots under the notes: the more dots, the stronger the accent on that note. The time-span reduction is a hierarchical structure describing the relative structural importance of notes within the audible rhythmic units of a phrase (figure A). It differentiates the essential parts of the melody from the ornaments. The essential parts are further dissected into even more essential parts and ornaments on them. The reduction continues until the melody is reduced to a skeleton of the few most prominent notes. There are two types of nodes in this hierarchical structure: (1) left elaboration and (2) right elaboration. For example, in figure A, the first quarter note of the second bar (C) belongs to a left leaf in a right-elaboration node because the following two notes (D and C) elaborate the first note. In turn, these two notes belong to a left-elaboration node because the second note (D) elaborates the third (C). In figure A, we can then see that the C of the first bar is the most prominent note; the next most prominent notes are the first C of the second bar and the B of the third bar, and so on. The prolongational reduction is a hierarchical structure describing tension-relaxation relations among notes. There are two basic types of nodes in this structure: (1) tensing and (2) relaxing. Besides, there are three possible modes of branch chaining: (1) strong prolongation, in which events repeat maintaining sonority (for example, notes of the same chord, such as the notes of the first and third measures in figure A); (2) weak prolongation, in which events repeat in an altered form (for example, from an I chord to a I6 chord); and (3) jumps, in which two completely different events are connected (for example, from an I chord to a dominant chord, such as the jump from the Cmaj7 chord of the second measure to the E7 chord of the third measure in figure A). This structure captures the sense of musical flow across phrases, that is, the buildup and release of tension within longer and longer passages of the piece, until a feeling of maximum repose at the end of the piece. Tension builds up as the melody departs from more stable notes to less stable ones and is discharged when the melody returns to stable notes. For example, in figure A, the D of the second bar creates a tension (a D is not a note of the underlying Cmaj7 chord) that is immediately resolved in the next note (C). Tension and release are also felt as a result of moving from dissonant to consonant chords, from nonaccented to accented notes, from higher to lower notes, and from prolonged to nonprolonged notes.

SAXEX uses this model to search for and retrieve notes from the case memory whose metrical strength is similar to that of the input inexpressive note. The roles (prominent note, ornamental note, tensing note, relaxing note) of the retrieved notes in the time-span and prolongational reduction trees, in contrast, are used by SAXEX as preference criteria to prefer some notes and discard others. Those notes whose roles are similar to the roles of the input note are the preferred ones. Again, the rationale is that similar notes (in the sense of the GTTM) should be played similarly.
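The metric structure (the dots in figure A) can be illustrated with a small sketch that assigns a metrical weight to each eighth-note position of a 4/4 bar. The weights are the standard 4/4 metrical grid and are offered only as illustration; the time-span and prolongational reductions are not reproduced here.

def metrical_strength(position, positions_per_bar=8):
    """Metrical weight of an eighth-note position in a 4/4 bar: the downbeat is
    strongest, beat 3 next, beats 2 and 4 weaker, and offbeats weakest
    (equivalent to counting the 'dots' in figure A)."""
    strength = 1
    level = 2
    while level <= positions_per_bar and position % level == 0:
        strength += 1
        level *= 2
    return strength

# Weights for one 4/4 bar at eighth-note resolution: [4, 1, 2, 1, 3, 1, 2, 1]
print([metrical_strength(p) for p in range(8)])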


Figure A. Metrical Strength and Time-Span Tree of the Beginning of "All of Me" (chords Cmaj7, Cmaj7, E7).


The Retain Step

The performance resulting from the revision step is automatically added to the case memory. This added performance is therefore available for retrieval when solving future problems. It is worth noticing that only positive feedback is given to the system; that is, only expressive performances that have been judged as good ones or that have been improved at the revision step are retained. This step provides the (knowledge-intensive) learning dimension of the system.

The Use of Affective Labels

Music clearly evokes emotions. This phenomenon has been studied quite extensively both from the psychological point of view (Davis and Thaut 1989; Dowling and Harwood 1986; Goldstein 1980; Krumhansl 1997; Robazza, Macaluso, and D'Urso 1994; Sloboda 1991) and, more recently, from the neurological point of view (Blood et al. 1999). SAXEX also gives the user the possibility of biasing the results along several affective dimensions. Thus, the expressive performances in the case memory also include information related to their affective content. This information is represented by a sequence of affective regions. Affective regions group sequences of notes with common affective expressiveness along three dimensions: (1) tender-aggressive, (2) sad-joyful, and (3) calm-restless. These dimensions are graded by means of five ordered qualitative values expressed as linguistic labels. The middle label represents no predominance (for example, neither tender nor aggressive in the tender-aggressive dimension); lower and upper labels represent, respectively, predominance in one direction or the other (for example, very calm is the lowest label in the calm-restless dimension). These affective regions were manually labeled. The user can bias the performances synthesized by SAXEX by indicating an affective label at the retrieval step. For example, if the user declares that he or she wants a calm and very tender performance, SAXEX will first look for notes in the case memory that belong to calm and very tender affective regions (most preferred). If none is found, it will look next for notes belonging to calm and tender or very calm and very tender regions (both as second choices because these are the next-closest options). Obviously, the selected notes will also have to be similar to the input nonexpressive note according to the criteria based on the musical knowledge explained previously. In the sound examples (see footnote 2), we have included two performances with different affective values.
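A sketch of this affective retrieval bias. The label strings and the use of index distance between the five ordered labels are an encoding of our own; SAXEX's actual label sets and preference ordering are richer.

DIMENSIONS = {
    "tender-aggressive": ["very tender", "tender", "neutral", "aggressive", "very aggressive"],
    "sad-joyful":        ["very sad", "sad", "neutral", "joyful", "very joyful"],
    "calm-restless":     ["very calm", "calm", "neutral", "restless", "very restless"],
}

def affective_distance(requested, region_labels):
    """Distance between the labels the user asked for and an affective region's
    labels: 0 means an exact match; neighboring labels count 1, and so on.
    Only the dimensions the user constrained are considered."""
    return sum(abs(DIMENSIONS[dim].index(region_labels[dim]) - DIMENSIONS[dim].index(want))
               for dim, want in requested.items())

def rank_regions(requested, regions):
    """Order affective regions by closeness to the request; exact matches come
    first and next-closest labels second, as in the 'calm and very tender' example."""
    return sorted(regions, key=lambda r: affective_distance(requested, r))

regions = [
    {"tender-aggressive": "tender", "sad-joyful": "sad", "calm-restless": "calm"},
    {"tender-aggressive": "very tender", "sad-joyful": "neutral", "calm-restless": "calm"},
]
request = {"tender-aggressive": "very tender", "calm-restless": "calm"}
print(rank_regions(request, regions)[0])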

Third, is the minority rule (the dual of the firstcriterion).

Fourth, the strict minority rule chooses thevalue that was applied in, at most, one of theselected expressive notes.

Fifth, continuity selects the values of the notethat comes from the same musical subphraseas the previously selected note.

Sixth is discontinuity (the dual of the fifth cri-terion).

If, after applying any of these criteria, morethan one note remains, a random choice takesplace.

From this description, it should be noticedthat the default strategy of the system is toapply the fuzzy aggregation operator. Never-theless, because the generation of expressiveperformances is a creative process influencedby the user’s personal preferences, the alterna-tive reuse criterion can be used to tailor the sys-tem according to such personal preferences.For example, the strict minority and the dis-continuity criteria would force the system toproduce expressive performances containingless usual combinations of expressive effects.

The Revise StepThe user can listen to the obtained results and,through an interactive revision window (figure2), has the option to select any note and man-ually modify its expressive values (decreasingvibrato, increasing dynamics, and so on) ifhe/she thinks that some of the solutions decid-ed by the system should be corrected. Addi-tionally, if several expressive versions of thesame piece have been generated, the user cantake the best partial solutions of each one ofthem to obtain a combined best performance,for example, by taking the first phrase of thesecond version followed by the second phraseof the first version, and so on.

The Retain StepThe performance resulting from the revisionstep is automatically added to the case memo-ry. This added performance is therefore avail-able for retrieval when solving future prob-lems. It is worth noticing that only positivefeedback is given to the system, that is, onlyexpressive performances that have been judgedas good ones or that have been improved at therevision step, are retained. This step providesthe (knowledge-intensive) learning dimensionto the system.

The Use of Affective Labels

Music clearly evokes emotions. This phenomenon has been studied quite extensively both from the psychological point of view (Davis and Thaut 1989; Dowling and Harwood 1986; Goldstein 1980; Krumhansl 1997; Robazza, Macaluso, and D'Urso 1994; Sloboda 1991) and, more recently, from the neurological point of view (Blood et al. 1999). SAXEX also gives the user the possibility to bias the results along several affective dimensions. Thus, the expressive performances in the case memory also include information related to their affective content. This information is represented by a sequence of affective regions. Affective regions group sequences of notes with common affective expressiveness along three dimensions: (1) tender-aggressive, (2) sad-joyful, and (3) calm-restless. These dimensions are graded by means of five ordered qualitative values expressed by means of linguistic labels. The middle label represents no predominance (for example, neither tender nor aggressive in the tender-aggressive dimension); lower and upper labels represent, respectively, predominance in one direction or the other (for example, very calm is the lowest label in the calm-restless dimension). These affective regions were manually labeled. The user can bias the performances synthesized by SAXEX by indicating an affective label at the retrieval step. For example, if the user declares that he/she wants a calm and very tender performance, SAXEX will first look for notes in the case memory that belong to calm and very tender affective regions (most preferred). If none is found, it will look next for notes belonging to calm and tender or very calm and very tender regions (both as second choices because these are the next-closest options). Obviously, the selected notes will also have to be similar to the input nonexpressive note, according to the criteria based on the musical knowledge explained previously. In the sound examples (see footnote 2), we have included two performances with different affective values.
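The preference ordering over affective regions can be understood as a distance over ordered labels, as the following sketch illustrates. It is not SAXEX code: the particular label names, the two-dimension example, and the distance function are assumptions made here to show why an exact match is most preferred and why regions one step away on a single dimension are the next-closest options.

    SCALES = {
        "tender-aggressive": ["very tender", "tender", "neutral", "aggressive", "very aggressive"],
        "calm-restless": ["very calm", "calm", "neutral", "restless", "very restless"],
    }

    def label_distance(dimension, a, b):
        scale = SCALES[dimension]
        return abs(scale.index(a) - scale.index(b))

    def region_distance(requested, region):
        # Sum of per-dimension distances; 0 is an exact match (most preferred).
        return sum(label_distance(dim, requested[dim], region[dim]) for dim in requested)

    requested = {"calm-restless": "calm", "tender-aggressive": "very tender"}
    regions = [
        {"calm-restless": "calm", "tender-aggressive": "tender"},            # distance 1
        {"calm-restless": "very calm", "tender-aggressive": "very tender"},  # distance 1
        {"calm-restless": "calm", "tender-aggressive": "very tender"},       # distance 0
    ]
    for region in sorted(regions, key=lambda r: region_distance(requested, r)):
        print(region_distance(requested, region), region)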


Experimentation

We limited our experiments to monophonic tenor saxophone interpretations of standard jazz ballads (such as "All of Me," "Autumn Leaves," and "Misty"). The expressive examples, as well as the flat performances for each ballad, were recorded in the audio recording studio of the Audiovisual Institute in Barcelona by a tenor saxophone player.

We performed two sets of experiments combining the different jazz ballads that were recorded. The first set of experiments consisted of using the expressive examples from the first phrases of a given ballad for generating the expressive version of the remaining phrases. This group of experiments revealed that SAXEX clearly identified the relevant precedent cases to be imitated, and appropriate expressive transformations were applied.

The second set of experiments consisted of using examples of expressive performances of some pieces to generate expressive performances of other pieces. This second group of experiments revealed that SAXEX also identified relevant precedent cases to imitate, and appropriate expressive transformations were applied. One can notice, for example, very appropriate crescendos in increasing melodic progressions, low-frequency vibratos in long notes in sad performances and higher-frequency vibratos in joyful ones, accentuated legato in sad performances and staccato in joyful ones, "swingy" rhythmic variations, and attacks from a lower pitch in the sad interpretations versus noisy attacks in the joyful ones. These expressive resources are often applied in the same way by human performers. We also received additional feedback from dozens of individuals, both musicians and nonmusicians, who accessed the sound examples posted on the web and judged the results as humanlike. Indeed, the "playing style" of SAXEX very much resembles the style of the human saxophonist who provided the expressive examples. Readers are encouraged to judge the quality of SAXEX performances by accessing our demos (see footnote 2) and giving us additional feedback.

Concluding Remarks

In the first part of this article, we presented a survey of representative computer music systems that, to a greater or lesser extent, use AI techniques.3 The survey covered the three major types of systems: (1) compositional, (2) improvisational, and (3) performance. In the second part, we emphasized the importance of dealing with the problem of generating expressive performances and described SAXEX, a computer system based on CBR techniques that is capable of performing expressive, monophonic music that resembles human performance. CBR has proven to be a very powerful technique for directly using the interpretation knowledge implicit in examples extracted from recordings of human performers rather than trying to make this knowledge explicit. Although at present our system is limited to the style of jazz ballads, this is not a strong limitation because, with little additional effort, our system could generate expressive performances in other styles, provided that we introduce the pertinent cases in the case base. As a matter of fact, jazz ballads, because of their slow tempo and the importance of the melody, are more sensitive than other jazz styles to the inadequate use of expressive resources. We believe that performing expressively in styles such as bebop or hard bop would actually be easier. It would possibly also be easier to deal with popular or folk music because it is generally structurally simpler than jazz. The biggest problems would arise with classical music because it would not be enough to have the appropriate example cases; indeed, we would also need to change the background musical harmonization knowledge of the system.

Although at present SAXEX works with tenor sax sound, it would not be difficult to change to other instruments. We work with a sax because it is a very rich instrument in terms of expressiveness. Many other instruments would actually be easier to deal with. Finally, the problem of dealing with polyphonic music is that of source separation, that is, isolating each instrument (and, particularly, the soloist) from the rest. This problem has not yet been solved. Therefore, unless each instrument is recorded on a separate track, we cannot yet deal with polyphonic music.

Our agenda for the near future includes, among other minor improvements, developing a real-time version as well as adding improvisation capabilities, because the current version is restricted to performing the notes that are written in the score; that is, it cannot change, add, or remove notes. We believe that truly humanlike performance has to cope, among other things, with at least some basic improvisation resources, such as not playing some notes and adding ornamental notes. We intend to build on existing work on improvisation, in particular on the approach being developed within our group by Grachten (2001), which is based on a cognitive model for jazz improvisation (Pressing 1988).

Acknowledgments

We thank David Leake and the anonymous reviewers for their helpful and stimulating suggestions to improve the article. Our research on AI and music is partially supported by the Spanish Ministry of Science and Technology under project TIC 2000-1094-C02 "TABASCO: Content-Based Audio Transformation Using CBR."

Notes

1. We received the best paper award for an early version of SAXEX at the 1997 International Computer Music Conference.

2. Listen to some examples at www.iiia.csic.es/~arcos/noos/Demos/Aff-Example.html and www.iiia.csic.es/~arcos/noos/Demos/Example.html. Please make sure your browser has a plug-in to process wave sound files.

3. Readers can also find information on AI and music at the American Association for Artificial Intelligence AI Topics web site, www.aaai.org/AITopics/html/music.html.




References

Aamodt, A., and Plaza, E. 1994. Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications 7(1): 39–59.

Arcos, J. L. 1997. The NOOS Representation Language. Ph.D. dissertation, Computer Science Department, Universitat Politècnica de Catalunya.

Arcos, J. L., and Lopez de Mantaras, R. 2002. Combining Fuzzy and Case-Based Reasoning to Generate Humanlike Music Performances. In Proceedings of the Information Processing and Management of Uncertainty Congress (IPMU-2000), Studies in Fuzziness and Soft Computing, Volume 89, 21–31. New York: Springer-Verlag. Forthcoming.

Arcos, J. L., and Lopez de Mantaras, R. 2001. An Interactive Case-Based Reasoning Approach for Generating Expressive Music. Applied Intelligence 14(1): 115–129.

Arcos, J. L.; Lopez de Mantaras, R.; and Serra, X. 1998. SAXEX: A Case-Based Reasoning System for Generating Expressive Musical Performances. Journal of New Music Research 27(3): 194–210.

Baars, B. 1998. A Cognitive Theory of Consciousness. New York: Cambridge University Press.

Bharucha, J. 1993. MUSACT: A Connectionist Model of Musical Harmony. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 497–509. Cambridge, Mass.: The MIT Press.

Biles, J. A. 1994. GENJAM: A Genetic Algorithm for Generating Jazz Solos. In Proceedings of the 1994 International Computer Music Conference, 131–137. San Francisco, Calif.: International Computer Music Association.

Blood, A. J.; Zatorre, R. J.; Bermudez, P.; and Evans, A. C. 1999. Emotional Responses to Pleasant and Unpleasant Music Correlate with Activity in Paralimbic Brain Regions. Nature Neuroscience 2(4): 382–387.

Bresin, R. 2002. Articulation Rules for Automatic Music Performance. In Proceedings of the 2001 International Computer Music Conference. San Francisco, Calif.: International Computer Music Association. Forthcoming.

Bresin, R. 1998. Artificial Neural Networks–Based Models for Automatic Performance of Musical Scores. Journal of New Music Research 27(3): 239–270.

Canazza, S.; De Poli, G.; Roda, A.; and Vidolin, A. 1997. Analysis and Synthesis of Expressive Intention in a Clarinet Performance. In Proceedings of the 1997 International Computer Music Conference, 113–120. San Francisco, Calif.: International Computer Music Association.

Cope, D. 1990. Pattern Matching as an Engine for the Computer Simulation of Musical Style. In Proceedings of the 1990 International Computer Music Conference, 288–291. San Francisco, Calif.: International Computer Music Association.

Cope, D. 1987. Experiments in Music Intelligence. In Proceedings of the 1987 International Computer Music Conference, 174–181. San Francisco, Calif.: International Computer Music Association.

Dannenberg, R. B. 1993. Software Design for Interactive Multimedia Performance. Interface 22(3): 213–218.

Dannenberg, R. B., and Derenyi, I. 1998. Combining Instrument and Performance Models for High-Quality Music Synthesis. Journal of New Music Research 27(3): 211–238.

Davis, W. B., and Thaut, M. H. 1989. The Influence of Preferred Relaxing Music on Measures of State Anxiety, Relaxation, and Physiological Responses. Journal of Music Therapy 26(4): 168–187.

Dowling, W. J., and Harwood, D. L. 1986. Music Cognition. San Diego, Calif.: Academic.

Ebcioglu, K. 1993. An Expert System for Harmonizing Four-Part Chorales. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 385–401. Cambridge, Mass.: MIT Press.

Feulner, J. 1993. Neural Networks That Learn and Reproduce Various Styles of Harmonization. In Proceedings of the 1993 International Computer Music Conference, 236–239. San Francisco, Calif.: International Computer Music Association.

Franklin, J. A. 2001. Multi-Phase Learning for Jazz Improvisation and Interaction. Paper presented at the Eighth Biennial Symposium on Art and Technology, 1–3 March, New London, Connecticut.

Friberg, A. 1995. A Quantitative Rule System for Musical Performance. Ph.D. dissertation, Department of Speech, Music, and Hearing, Royal Institute of Technology.

Friberg, A.; Bresin, R.; Fryden, L.; and Sundberg, J. 1998. Musical Punctuation on the Microlevel: Automatic Identification and Performance of Small Melodic Units. Journal of New Music Research 27(3): 271–292.

Friberg, A.; Sundberg, J.; and Fryden, L. 2000. Music from Motion: Sound Level Envelopes of Tones Expressing Human Locomotion. Journal of New Music Research 29(3): 199–210.

Fry, C. 1993. FLAVORS BAND: A Language for Specifying Musical Style. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 427–451. Cambridge, Mass.: The MIT Press.

Goldstein, A. 1980. Thrills in Response to Music and Other Stimuli. Physiological Psychology 8(1): 126–129.

Grachten, M. 2001. JIG: Jazz Improvisation Generator. In Proceedings of the MOSART Workshop on Current Research Directions in Computer Music, 1–6. Barcelona, Spain: Pompeu Fabra University Publishers.

Hiller, L., and Isaacson, L. 1993. Musical Composition with a High-Speed Digital Computer. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 9–21. Cambridge, Mass.: MIT Press.

Hörnel, D., and Degenhardt, P. 1997. A Neural Organist Improvising Baroque-Style Melodic Variations. In Proceedings of the 1997 International Computer Music Conference, 430–433. San Francisco, Calif.: International Computer Music Association.



Hörnel, D., and Menzel, W. 1998. Learning Musical Structure and Style with Neural Networks. Computer Music Journal 22(4): 44–62.

Johnson, M. L. 1992. An Expert System for the Articulation of Bach Fugue Melodies. In Readings in Computer-Generated Music, ed. D. L. Baggi, 41–51. Washington, D.C.: IEEE Computer Society.

Johnson-Laird, P. N. 1991. Jazz Improvisation: A Theory at the Computational Level. In Representing Musical Structure, eds. P. Howell, R. West, and I. Cross. San Diego, Calif.: Academic.

Kendall, R. A., and Carterette, E. C. 1990. The Communication of Musical Expression. Music Perception 8(2): 129.

Kolodner, J. 1993. Case-Based Reasoning. San Francisco, Calif.: Morgan Kaufmann.

Krumhansl, C. L. 1997. An Exploratory Study of Musical Emotions and Psychophysiology. Canadian Journal of Experimental Psychology 51(4): 336–352.

Leake, D. 1996. Case-Based Reasoning: Experiences, Lessons, and Future Directions. Menlo Park, Calif.: AAAI Press.

Lerdahl, F., and Jackendoff, R. 1983. An Overview of Hierarchical Structure in Music. Music Perception 1(2): 229–252.

Levitt, D. A. 1993. A Representation for Musical Dialects. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 455–469. Cambridge, Mass.: The MIT Press.

Minsky, M. 1993. Music, Mind, and Meaning. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 327–354. Cambridge, Mass.: MIT Press.

Moorer, J. A. 1993. Music and Computer Composition. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 167–186. Cambridge, Mass.: The MIT Press.

Morales-Manzanares, R.; Morales, E. F.; Dannenberg, R.; and Berger, J. 2001. SICIB: An Interactive Music Composition System Using Body Movements. Computer Music Journal 25(2): 25–36.

Narmour, E. 1990. The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.

Pachet, F., and Roy, P. 1998. Formulating Constraint-Satisfaction Problems on Part-Whole Relations: The Case of Automatic Harmonization. Paper presented at the ECAI'98 Workshop on Constraint Techniques for Artistic Application, 23–28 August, Brighton, UK.

Papadopoulos, G., and Wiggins, G. 1998. A Genetic Algorithm for the Generation of Jazz Melodies. Paper presented at the Finnish Conference on Artificial Intelligence (SteP'98), 7–9 September, Jyvaskyla, Finland.

Pressing, J. 1988. Improvisation: Methods and Models. In Generative Processes in Music: The Psychology of Performance, Improvisation, and Composition, ed. J. Sloboda, 129–178. Oxford, U.K.: Oxford University Press.

Rader, G. M. 1993. A Method for Composing Simple Traditional Music by Computer. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 243–260. Cambridge, Mass.: The MIT Press.

Robazza, C.; Macaluso, C.; and D'Urso, V. 1994. Emotional Reactions to Music by Gender, Age, and Expertise. Perceptual and Motor Skills 79: 939–944.

Rothgeb, J. 1993. Simulating Musical Skills by Digital Computer. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 157–164. Cambridge, Mass.: The MIT Press.

Sabater, J.; Arcos, J. L.; and López de Mántaras, R. 1998. Using Rules to Support Case-Based Reasoning for Harmonizing Melodies. Paper presented at the AAAI Spring Symposium on Multimodal Reasoning, 27 July, Stanford, California.

Serra, X.; Bonada, J.; Herrera, P.; and Loureiro, R. 1997. Integrating Complementary Spectral Methods in the Design of a Musical Synthesizer. In Proceedings of the 1997 International Computer Music Conference, 152–159. San Francisco, Calif.: International Computer Music Association.

Schwanauer, S. M. 1993. A Learning Machine for Tonal Composition. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 511–532. Cambridge, Mass.: MIT Press.

Schwanauer, S. M., and Levitt, D. A., eds. 1993. Machine Models of Music. Cambridge, Mass.: The MIT Press.

Simon, H. A., and Sumner, R. K. 1993. Patterns in Music. In Machine Models of Music, eds. S. M. Schwanauer and D. A. Levitt, 83–110. Cambridge, Mass.: MIT Press.

Sloboda, J. A. 1991. Music Structure and Emotional Response: Some Empirical Findings. Psychology of Music 19(2): 110–120.

Thom, B. 2001. BOB: An Improvisational Music Companion. Ph.D. dissertation, School of Computer Science, Carnegie Mellon University.

Wessel, D.; Wright, M.; and Kahn, S. A. 1998. Preparation for Improvised Performance in Collaboration with a Khyal Singer. In Proceedings of the 1998 International Computer Music Conference, 497–503. San Francisco, Calif.: International Computer Music Association.

Widmer, G. 2001. The Musical Expression Project: A Challenge for Machine Learning and Knowledge Discovery. In Proceedings of the Twelfth European Conference on Machine Learning, 603–614. Lecture Notes in Artificial Intelligence 2167. New York: Springer-Verlag.

Widmer, G. 1996. Learning Expressive Performance: The Structure-Level Approach. Journal of New Music Research 25(2): 179–205.

Woods, W. 1970. Transition Network Grammars for Natural Language Analysis. Communications of the ACM 13(10): 591–606.

Ramon Lopez de Mantaras is a research full professor and deputy director of the Artificial Intelligence Institute of the Spanish Scientific Research Council. He received a Ph.D. in applied physics in 1977 from the University of Toulouse (France), an M.Sc. in engineering from the University of California at Berkeley, and a Ph.D. in computer science in 1981 from the Technical University of Catalonia. He was corecipient of the Swets and Zeitlinger Distinguished Paper Award of the International Computer Music Association in 1997. He is a fellow of the European Federation of National Artificial Intelligence Associations. He is presently working on case-based reasoning, multiagent approaches to autonomous robots' qualitative navigation, and AI applications to music. His e-mail address is mantaras@iiia.csic.es.

Josep Lluís Arcos is a tenured scientist at the Artificial Intelligence Institute of the Spanish Scientific Research Council (IIIA-CSIC). He received an M.S. in musical creation and sound technology from the Pompeu Fabra Institute in 1996 and a Ph.D. in computer science from the Universitat Politècnica de Catalunya in 1997. He was corecipient of the Swets and Zeitlinger Distinguished Paper Award of the International Computer Music Association in 1997. He is currently working on representation languages for integrating problem solving, case-based reasoning, and learning; the integration of software agents with learning capabilities; and AI applications to music. His e-mail address is arcos@iiia.csic.es.
