VirtuaLatin - Agent Based
Percussive Accompaniment
David Murray-Rust
Master of Science
School of Informatics
University of Edinburgh
2003
Abstract
This project details the construction and analysis of a percussive agent, able to
add timbales accompaniment to pre-recorded salsa music. We propose, implement
and test a novel representational structure specific to latin music, inspired
by Lerdahl and Jackendoff’s Generative Theory of Tonal Music, and incorporating
specific domain knowledge. This is found to capture the relevant information but
lack some flexibility.

We develop a music listening system designed to build up these high level
representations using harmonic and rhythmic aspects along with parallelism, but
find that it lacks the information necessary to create full representations. We
develop a generative system which uses expert knowledge and high level
representations to combine and alter templates in a musically sensitive manner.
We implement and test an agent based platform for the composition of music,
which is found to convey the necessary information and perform fast enough that
real time operation should be possible. Overall, we find that the agent is
capable of creating accompaniment which is indistinguishable from human playing
to the general public, and difficult for domain experts to identify.
Acknowledgements
Thanks to everyone who has helped and supported me through this project; in
particular Alan Smaill and Manuel Contreras, my supervisor and co-supervisor,
and everyone who took the Salsa Challenge.
Declaration
I declare that this thesis was composed by myself, that the work contained herein
is my own except where explicitly stated otherwise in the text, and that this work
has not been submitted for any other degree or professional qualification except
as specified.
(David Murray-Rust)
Table of Contents

1 Introduction
1.1 The use of agent systems for musical activities
1.2 Customised representations for latin music
1.3 Output Generation
1.4 Musical analysis of latin music
1.5 Aims

2 Background
2.1 Music Representations
2.1.1 Audio
2.1.2 Common Practice Notation
2.1.3 MIDI - Overview
2.2 Literature Review
2.2.1 Music Representations and Analyses
2.2.2 Mechanical Analysis of Music
2.2.3 Computer Generated Music
2.2.4 Agents and Music
2.2.5 Interactive Systems
2.2.6 Distributed Architectures
2.3 Conclusions

3 Design
3.1 Overview
3.2 Higher Level Representations
3.2.1 The GTTM and its Application to Latin Music
3.2.2 Desired Results
3.2.3 Design Philosophy
3.2.4 Well-Formedness Rules
3.2.5 Preference Rules
3.3 Agent System
3.4 Generative Methods
3.4.1 Basic Rhythm Selection
3.4.2 Phrasing
3.4.3 Fills
3.4.4 Chatter
3.5 Design Summary

4 System Architecture
4.1 Agent Architecture
4.1.1 Overview
4.1.2 Class Hierarchy and Roles
4.1.3 Information Flow
4.2 High Level Representations
4.2.1 Representation Classes
4.2.2 Human Readability
4.2.3 Identities
4.2.4 Representations By Hand
4.3 Low Level Music Representation
4.4 Architecture Summary

5 Music Listening
5.1 The Annotation Class
5.2 Feature Extraction
5.2.1 Harmonic Analysis
5.2.2 Pattern Analysis
5.3 Rhythmic Analysis
5.4 Dissection
5.5 Music Listening Summary

6 Generative Methods
6.1 Basic Rhythm Selection
6.2 Ornamentation
6.2.1 Phrasing
6.2.2 Fills
6.2.3 Chatter
6.2.4 Transformations
6.3 Modularity and Division of Labour
6.3.1 Memory
6.4 Generative Methods Summary

7 Results and Discussion
7.1 Music Listening
7.1.1 Chordal Analysis
7.1.2 Chord Pattern Analysis
7.1.3 Phrasing Extraction
7.1.4 Final Dissection
7.2 Listening Tests
7.3 Representations
7.3.1 Structural Assumptions
7.4 Infrastructure

8 Future Work
8.1 Analysis
8.1.1 Chord Recognition
8.1.2 Pattern Analysis
8.2 Generation
8.2.1 Ornament Selection
8.2.2 Groove and Feel
8.2.3 Soloing
8.3 Representations
8.4 Agent Environment
8.5 Long Term Improvements

9 Conclusions

A Musical Background
A.1 History and Use of the Timbales
A.2 The Structure of Salsa Music
A.3 The Role of the Timbalero
A.4 Knowledge Elicitation

B MIDI Details
B.1 MIDI Streams
B.2 MIDI Files

C jMusic
C.1 Overview
C.2 Alterations
C.3 jMusic Issues

D Listening Assessment Test

E Example Output

Bibliography
List of Figures

3.1 Representation Structure
3.2 Example section: the montuno from Mi Tierra (Gloria Estefan), leading up to the timbales solo
3.3 Possible Network Structures
3.4 Possible Distributed Network Structure
3.5 Music Messages Timeline
3.6 Final Agent Architecture
4.1 Overview of System Structure
4.2 Class Hierarchy
4.3 Message Flow
4.4 Example jMusic XML File
4.5 SequentialRequester and CyclicResponseCollector flow diagrams
4.6 Different sets of notes which would be classified as C major
4.7 Ambiguous Chords
4.8 Example fragment of Section textual output
5.1 Analysis Operations
6.1 Generative Structure
6.2 Rhythm Selection Logic
7.1 Guitar part for bars 22-25 of Mi Tierra, exhibiting split bar chords
7.2 Phrasing under the solos (bars 153-176)
8.1 Chunk Latency for the Agent System
A.1 Example Timbales setup (overhead view)
A.2 Scoring Timbale Sounds
A.3 Standard Son Claves
A.4 Basic Cascara pattern, with a backbeat on the hembra
C.1 jMusic Part, Phrase and Note Structure
Chapter 1
Introduction
This report details the construction of VirtuaLatin, a software agent which is
capable of taking the place of a human timbalero (drummer) in a salsa band.
There are several “real world” reasons to do this, as well as research interest:
• As a practice tool for musicians, so that band rehearsals are possible when
the drummer is ill
• As a learning tool, to give illustrations of how and why timbales should be
played in the absence of a human teacher
• As a first step on the road toward allowing hybrid ensembles of human and
mechanical performers
This is a large and complex task, so we identify four main areas of interest.
1.1 The use of agent systems for musical activities
The use of autonomous software agents is becoming increasingly widespread, and
as with many other technological advances, it is highly applicable to music. The
agent paradigm provides an opportunity to analyse the interaction between mu-
sicians, as well as each individual’s mental processes; we feel that this is a key
aspect of understanding how music is created. Ultimately, it is a step towards
a distributable heterogeneous environment in which musicians can play together
regardless of physical location or mental substrate. We describe an implementa-
tion of an agent infrastructure for musical activities, and analyse its use for both
the project at hand and future work.
1.2 Customised representations for latin music
Music exists in many forms: from the abstract forms in a composer’s or listener’s
mind, through increasingly concrete formal representations such as musical scores
and MIDI data to physical measurements of the sound waves produced when the
music is played[8]. Each level of representation has its own characteristic virtues
and failings, and correct choice or design of representation is crucial to the success
of musical projects. We explore two very different levels of musical representation here
- low level representations which allow the basic musical “facts” to be commu-
nicated between agents, and high level representations which seek to understand
the music being played.
When human musicians compose, play or listen to music, high level represen-
tations of the music are created, which enable a deeper understanding of musical
structure[18]. We therefore develop a novel high level symbolic representation of
latin music which captures all the important features of a piece in such a way as
to enable our agent to play in a highly musical manner.
1.3 Output Generation
The ultimate aspiration of the work presented here is to create high quality music;
as such, we need a subsystem which can work over the given representations to
perform in a musical manner. We use a combination of a rule based expert system
which can select and combine templates, and alter them to fit specific situations,
with domain knowledge and high level representations to provide playing which
supports and enhances the musical structure of the piece.
1.4 Musical analysis of latin music
In order to provide musically sensitive accompaniment to previously unheard
pieces, our agent needs to be capable of extracting the salient features from mu-
sic it is listening to, and using these to build up the higher level representations
it is going to work with. We combine modified versions of existing methods with
domain knowledge and bespoke algorithms to create a comprehensive analysis of
music heard, inspired by the structure of the GTTM [18]. We give a domain spe-
cific treatment of harmonic, rhythmic and structural features, including a search
for musical parallelism, and investigate whether this is capable of creating the
representations we need. We do not, however, integrate this with the generative
system.
1.5 Aims
The overall aim of the project is:
To create a process which is capable of providing a timbales accompaniment to prerecorded salsa music, in an agent based environment, which is of sufficient quality to be indistinguishable from human playing.
This can be divided into four main aims:
1. construction of an agent environment suitable for the production of music
2. creation of representations which are suitably rich to inform the agent’s
playing
3. implementation of a generative system which can produce high quality out-
put
4. implementation of a music listening subsystem which can build the neces-
sary representations
The dissertation is structured as follows:
• some background on the general area, and a look at related work
• an explanation of the design concepts behind the system
• a look at the overall system architecture, including the agent platform and
the music representations used
• description of the music listening sections of the project
• detail of the generative methods used
• analysis of results and discussion
• ideas for further work
• some conclusions and final thoughts
Chapter 2
Background
This chapter gives some background to the project as a whole. A detailed dis-
cussion of latin music and the role of the timbalero in a latin ensemble is given
in Appendix A.
2.1 Music Representations
There are many different ways to represent music, with varying levels of com-
plexity and expression. An overview is given in [8], but here we briefly detail the
three standard representations which are most relevant to this project.
2.1.1 Audio
Audio data is the most basic representation of music, and consists of a direct
recording of the sound produced when it is played. In the digital domain this
consists of a series of samples which represent the waveform of a sound. It can be
used to represent any sound, but is very low level - it does not delineate pitches,
notes, beats or bars.
2.1.2 Common Practice Notation
Common Practice Notation (CPN) is the name given to standard “Western”
scores. It contains information on what notes are to be played at particular times
by each instrument. This information is then subject to interpretation - the exact
rendition is up to the players; parameters such as timing, dynamics and timbre
are to some extent encoded in the score, but will generally be played
differently by different players, and are not trivially reproducible mechanically
(work relating to this is discussed below).
2.1.3 MIDI - Overview
MIDI stands somewhere in between Audio and CPN in terms of representational
levels. A MIDI file encodes:
• The start and end times, pitches and velocities of all notes
• Information regarding other parameters of each part (such as volume and
possible timbre changes)
• Information regarding what sounds should be used for each part
To some extent, this captures all of the information about a particular per-
formance - a MIDI recording of a pianist playing a certain piece will generally
be recognisable as the same performance. A MIDI file will be played back by a
sequencer, which in turn triggers a synthesiser to play sounds. It is in this stage
that interpretation is possible; the MIDI sequencer has no idea what sounds it
is triggering - it has simply asked for a sound by number (for example, sound
01 corresponds to a grand piano in the standard mapping). It is possible that
the synthesiser in question does not support all of the parameters encoded in
the MIDI file, or that the sounds are set up unexpectedly. Finally, different
synthesisers will produce sounds of varying quality and realism.
However, due in large part to conventions such as the General MIDI standard,
one can be fairly sure that playing a MIDI file on compatible equipment will sound
close to the author’s intention. Thus we have a representational standard with
close to the realism of Audio, with many of the high level features present in
CPN. There exist many packages which can (with varying degrees of success)
turn MIDI data into CPN scores.
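To make the level of this representation concrete, the following sketch uses the standard javax.sound.midi API to print the note-on events in a MIDI file ("song.mid" is a placeholder filename); each event is just a tick, channel, pitch and velocity, and any higher level structure (beats, bars, chords) must be inferred from these.

    import java.io.File;
    import javax.sound.midi.*;

    // Print the note-on events in a Standard MIDI File.
    public class MidiDump {
        public static void main(String[] args) throws Exception {
            Sequence seq = MidiSystem.getSequence(new File("song.mid"));
            for (Track track : seq.getTracks()) {
                for (int i = 0; i < track.size(); i++) {
                    MidiEvent event = track.get(i);
                    MidiMessage msg = event.getMessage();
                    if (msg instanceof ShortMessage) {
                        ShortMessage sm = (ShortMessage) msg;
                        // a NOTE_ON with velocity 0 is conventionally a note-off
                        if (sm.getCommand() == ShortMessage.NOTE_ON
                                && sm.getData2() > 0) {
                            System.out.printf("tick=%d ch=%d pitch=%d vel=%d%n",
                                    event.getTick(), sm.getChannel(),
                                    sm.getData1(), sm.getData2());
                        }
                    }
                }
            }
        }
    }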
2.2 Literature Review
2.2.1 Music Representations and Analyses
A broad overview of the issues surrounding music representation is given by Dan-
nenberg [8]. He explores the problems in musical representation in several areas,
the most relevant of which are hierarchy and structure, timing, timbre and
notation.
One of the most cited works in reference to musical representation is the
Generative Theory of Tonal Music, by Lerdahl and Jackendoff [18]. This outlines
a manner in which to hierarchically segment music into structurally significant
groups, which it is argued is an essential step in developing an understanding of
the music. As presented, it has two main obstructions to implementation; firstly it
is incomplete^1, and secondly it is not a full formal specification. Many of the rules
given are intentionally ambiguous - they indicate preferences, and often two rules
will indicate opposing decisions with no decision procedure being defined. Despite
these acknowledged issues, it provides a comprehensive framework on which music
listening applications can be built, and there are many partial implementations
which exhibit some degree of success.
A different aspect of musical representation is covered by the MusES system[24],
developed by Francois Pachet. A novel aspect of this system is the full treatment
of enharmonic spelling - that is, considering C# and Db to be different pitch
classes, despite the fact that they sound the same^2. This is a distinction which
may often be necessary for analysis. The design of the system leans towards sup-
port for analysis, but is intended to be able to support any development - it relies
on the idea that there is “some common sense layer of musical knowledge which
may be made explicit”[25].
MusES was originally developed in Smalltalk, but subsequently ported to
Java. Through conversations with F. Pachet, I was able to obtain a partial copy
of the MusES library, and it would have made an ideal development platform.
Unfortunately, due to portions of the code being copyrighted, I was unable to
obtain a complete system.

^1 There are features such as parallelism which are relied on, but no method for determining them is given.
^2 In some tuning systems, when played on some instruments, they may in fact be different. On a piano keyboard, however, C# and Db are the same key.
[13] describes a highly detailed formal representation of music, capable of
representing a wide range of musical styles. An example is given of representing
a minimalist piece which does not have explicitly heard notes; rather, a continuous
set of sine waves is played, the amplitudes of which tend towards the idealised
spectrum of the implied note at any given time, with the frequencies of the tones
close to harmonics tending towards the ideal harmonics. The representation
allows for many different levels of hierarchy and grouping, and is specifically
designed for automated analysis tasks.
2.2.2 Mechanical Analysis of Music
There is a key distinction which lies at the heart of much musical analysis, and
in many ways is more deeply entrenched than in other disciplines: the divide
between symbolic and numeric analysis. This dichotomy is explored in [23], and
synthetic approaches are suggested. Harmonic reasoning based in the MusES system
is compared with numeric harmonic analysis by NUSO, which performs statistical
analysis on tonal music. It is suggested that symbolic analysis performs well if
there are recognisable structures specific to a domain, and that numeric analysis
is likely to perform better on “arbitrary sequences of notes”.
2.2.3 Computer Generated Music
In order to create generative musical systems in a scientific manner, it is necessary
to have a specific goal in mind; this often includes tasks such as recreating a par-
ticular style of playing (imitative)^3, creating music which has a specific function
(intentional), or testing a particular technique with respect to the generation of
music (technical).
^3 Definitions are my own, intended to aid discussion, not to create a rigorous framework.

Intentional music is particularly interesting due to its broad usage. Every day
we hear many pieces of music designed to have specific effects on us, rather than to be
pleasurable to listen to. Film soundtracks, and the music in computer games are
two common examples. The creators of GhostWriter [27] (a virtual environment
used to aid children in creative writing in the horror genre) use music as a tool to
build and relieve tension — to support the surprise and suspense which are the
basic tools of the horror narrative. The tool proposed is a generative system which
takes as input a desired level of “scariness” (tension). This is then converted into
a set of parameters which control a high level form generator, a rhythmic section
and a harmonic section. The harmonic section is based on the musical work of
Herrmann (who wrote scores for many of Hitchcock’s films, most notably Psycho)
and the theoretical work of Schoenberg. Although the system is not tested in
[27], tests to be performed are outlined.
Zimmermann [30] uses complex models of musical structure to create music
designed to enhance presentations — the music is used to guide the audience’s
attention and motivation. One contention of this paper is that there is a missing
middle level in the theories of musical structure as applied to this domain -
while they are good at modelling high level structure (e.g. sonata form) and
low level forms (such as cadences and beats) a layer in between is needed, which
is called the music-rhetorical level. A structure of the presentation is created,
which defines a series of important points, such as the announcement of an event,
or the introduction of an object, associated with a mood, function and a time.
This structure is then used to guide music-rhetorical operations. The system as
described is a partial implementation, and no analysis is given.
This leads us on to PACTs - Possible ACTions, introduced by Pachet as
strategies, and expanded in [26]. PACTs provide variable levels of description
for musical actions, from low level operations (play “C E G”, play loud, play
a certain rhythm) to high level concepts (play bluesy, play in a major scale).
These are clearly useful tools for intention based composition; they also allow
a different formulation of the problem of producing musical output - rather
than starting with an empty bar and the problem being how to fill it, we can
start with a general impression of what to play, and the problem is to turn this
into a single concrete performance.
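As a rough illustration of the idea (the interface and class names here are invented, not the MusES/PACT API), a PACT-like abstraction might map each possible action to a refinement of candidate material, with producing output amounting to composing refinements from the most general to the most specific:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    // Invented sketch of a PACT-like abstraction; not the MusES API.
    interface Pact {
        List<Integer> refine(List<Integer> candidatePitches);
    }

    // A low level PACT: play exactly these MIDI pitches (e.g. C E G = 60 64 67).
    class PlayNotes implements Pact {
        private final List<Integer> pitches;
        PlayNotes(Integer... p) { pitches = Arrays.asList(p); }
        public List<Integer> refine(List<Integer> ignored) { return pitches; }
    }

    // A high level PACT: keep only pitches from a given scale, leaving the
    // concrete choice of notes to later, more specific refinements.
    class PlayInScale implements Pact {
        private final List<Integer> pitchClasses; // 0 = C, 1 = C#/Db, ...
        PlayInScale(Integer... pcs) { pitchClasses = Arrays.asList(pcs); }
        public List<Integer> refine(List<Integer> candidates) {
            return candidates.stream()
                    .filter(p -> pitchClasses.contains(p % 12))
                    .collect(Collectors.toList());
        }
    }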
Even if the exact notes and rhythms are known (to the level of a musical
score), this is not generally sufficient to produce quality output. Hence there
are ongoing efforts to both understand how human players interpret scores, and
use this information to enhance the realism of mechanical performance. The
SaxEx system [7] has been designed to take as input a sound file of a phrase played
inexpressively, some MIDI data describing the notes and an indication of the
desired output. Case Based Reasoning is then applied, and a new sound file is
created. It was found that this generated pleasing, natural output. The system
has also been extended [2] to include affect driven labels on three axes (tender-
aggressive, sad-joyful, calm-restless) for more control over output.
2.2.4 Agents and Music
There are several ways in which agents could be used for music. A natural
breakdown is to model each player in an ensemble as an agent. This is the
approach taken in the current project. A alternative would be to model a single
musician as a collection of agents, as in Minsky’s Society of Agents model of
cognition.
A middle path between these ideas is taken by Pachet in his investigations into
evolving rhythms [15]. Here, each percussive sound (e.g. kick drum, snare drum)
is assigned to an agent. The agents then work together to evolve a rhythm. They
are given a starting point, and a set of rules (expressed in the MusES system) and
play a loop continuously, with every agent listening to the output of all the others.
Typical rules are: emphasise strong/weak beats, move notes towards/away from
other notes, and add syncopation or double beats. From the interaction of
simple rules, it was found that some standard rhythms could be evolved, and
interesting versions of existing rhythms could be produced.
The use of multiple agents for beat tracking is described in [11]. This system
creates several agents with different hypotheses about where the beat is, and
assigns greater weight to the agents which correctly predict many new beats. The
system is shown to be both computationally inexpensive and robust with respect
to different styles of music; in all test cases it correctly divined the tempo, the
only error being the phase (it sometimes tracked off-beats rather than on-beats).
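The core idea can be sketched as a set of competing tempo/phase hypotheses which earn credit for correct predictions (a toy version; the names and scoring scheme are invented, and the system in [11] is considerably more sophisticated):

    import java.util.List;

    // Toy multiple-hypothesis beat tracker in the spirit of [11]; the names
    // and scoring scheme are invented for illustration.
    class BeatHypothesis {
        double period;  // predicted time between beats, in seconds
        double phase;   // predicted time of the first beat
        double score;   // credit accumulated for successful predictions

        BeatHypothesis(double period, double phase) {
            this.period = period;
            this.phase = phase;
        }

        // Does this hypothesis predict a beat near the given onset time?
        boolean predicts(double onset, double tolerance) {
            double d = Math.abs(onset - phase) % period;
            return Math.min(d, period - d) < tolerance;
        }
    }

    class Tracker {
        // Reward every hypothesis whose beat grid matches an observed onset;
        // the highest scoring hypothesis gives the current tempo and phase.
        static void observe(List<BeatHypothesis> hypotheses, double onset) {
            for (BeatHypothesis h : hypotheses) {
                if (h.predicts(onset, 0.05)) h.score += 1.0;
            }
        }
    }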
2.2.5 Interactive Systems
Antoni Camurri has carried out a lot of work into interactive systems, and is
director of the Laboratorio di Informatica Musicale. 4. In [1] and [6], he looks
at analysis of human gestures and movement. In [4], he develops an architecture
for environmental agents, which alter an environment according to the actions of
people within it. He breaks these agents down in to input and output sections,
then a rational, emotional and reactive component. He finds the architecture
to be flexible, and has used it in performances. The architecture is extended in
[5] to give a fuller treatment of emotion, developing concepts such as happiness,
depression, vanity, apathy and anger.
Rowe [28] has developed the Cypher system, which can be used as an inter-
active compositional or performance tool. It does not use any stored scores, but
will play along with a human performer with “a distinctive style and a voice quite
recognizably different from the music presented at its input”. It offers a general
architecture on which the user can build many different types of system.
Another area of interest is auto accompaniment - creating mechanical sys-
tems which can “play along” with human performers. Raphael [9] creates a system
where the computer plays a prerecorded accompaniment in time to a soloist. It
uses a Hidden Markov model to model the soloist’s note onset times, a phase
vocoder to allow for variable speed playback, and a Bayesian network to link
the two. Training sessions (analogous to rehearsals) are used to train the belief
network.
^4 http://musart.dist.unige.it/sito inglese/laboratorio/description.html

2.2.6 Distributed Architectures

Since one of the great benefits of agent based approaches is that agents may
be distributed and of unknown origin (as long as they conform to a common
specification), a logical direction is the distributed composition or performance
of music. [16] describes some of the issues in distributed music applications. Two
of the key barriers are defined - latency (the average delay in information being
received after it has been transmitted) and jitter (the variability of this delay).
It is stated that one can generally compensate for jitter by increasing latency,
and that there is a problem with the current infrastructure in that there is no
provision made for Quality of Service specification or allocation. The issues of
representations and data transfer rate are discussed: audio represents a complete
description of the music played, while MIDI only specifies pitches and onsets.
This means that audio will be a more faithful reproduction, but that MIDI has
far lower data transfer rates (typically 0.1-5 kbps against 256kbps for high quality
MP3 audio). It is concluded that it is currently impossible to perform music in
a fully distributed fashion, but that all of the problems have technical solutions
on the horizon - except the latencies due to the speed of light.
There are many constraints associated with real time programming; in re-
sponse to this, there have been attempts to set out agent systems designed to
handle real time operation. [12] discusses the difference between reactive and
cognitive agents, and gives a possible hybrid architecture which couples an outer
layer of behaviours (which may be reactive or cognitive) with a central supervisor
(based on an Augmented Transition Network). This ensures that hard goals are
met by reactive processes, but more complex cognitive functions can be performed
when the constraints are relaxed. [10] presents an agent language which allows
the specification of real time constraints, and a CORBA layer which enforces
this. Finally, [14] presents a real-time agent architecture which can take account
of temporal, structural and resource constraints, goal resolution and unexpected
results. This architecture is designed to be implemented by individual agents to
allow them to function in a time and resource limited environment.
2.3 Conclusions
Several pieces of work have been particularly inspiring for this project; the the-
oretical work of Lerdahl and Jackendoff suggests a very useful model for musical
analysis, and also helps support claims about musical structure. Pachet’s work on
the MusES system has been useful, as it has given a complete (working) frame-
work to examine, as well as the concept of PACTs. It is encouraging to see that
not much work has been done on interacting musical agents, so we are covering
new territory. Finally, the work of Rowe has demonstrated the possibilities of
interactive music, and given many concrete examples of how certain subsystems
may be implemented.
Chapter 3
Design
3.1 Overview
From the overall problem domain, we have selected several areas of interest:
• High level representations specific to latin music which are sufficient to
adequately inform the playing of a timbalero.
• Generative methods working over high level representations which are ca-
pable of creating realistic timbale playing.
• Music listening algorithms which are capable of generating the necessary
high level representations from raw musical data.
• Construction of an Agent based environment for musical processes.
The desired end result is a system which can combine these components to
generate high quality timbales parts to prerecorded salsa music.
3.2 Higher Level Representations
The musical representations discussed so far are designed to encode enough data
about a piece of music to enable its reproduction in some manner. A musician
either hearing or playing the music encoded in this form would need to have some
higher level understanding of the music in order to either play or hear the piece
correctly. It is these representations which we now consider.
In our specific case, we are attempting to create a representation which will:
• be internal to a particular agent
• aid the agent in generating its output
The goal is not a full formal analysis - this is both difficult and unnecessary.
The agent needs, at this stage:
• An idea of where it is in the piece
• An idea of what to play at this point in time
• Some idea as to what will happen next
3.2.1 The GTTM and its Application to Latin Music
There can be no doubt that the GTTM has played a massive role in the current
state of computational analysis of music - it appears in the bibliography of almost
every paper on the subject. It is the theoretical framework around which the
higher level representations used in this project have been built.
To recap, the GTTM consists of four levels:
Grouping Structure segments the piece into a tree of units, with no overlap^1.

Metrical Structure divides the piece up by placing strong and weak beats at a number of levels.

Time-span Reduction calculates an importance for the pitches in a piece based on grouping and metre.

Prolongational Reduction calculates the harmonic and melodic importance of pitches.
^1 Except for the case of elisions, where the last note of one group may also be the first note of the next.
At each of these levels there are a set of well formedness rules, and a set of
preference rules. The idea behind this is that there will often be many valid
interpretations of a section, so we should try and calculate which one is most
likely or preferred.
The GTTM is a very general theory, and in this case we are focusing on a
specific style of music; what extra information does this give us?
Latin music always has a repetitive rhythm going on. Although this may change
for different sections, there will always be a basic ‘groove’ happening. In
almost all cases, this will be based on a clave, a repeating two bar pattern
(see discussion elsewhere).
There are clearly defined hyper-measure structures - mambos, verses, montunos
and more - which provide the large structural elements from which a piece
is built. The actions of a player can generally be described using a single
sentence for each section (“the horns play in the second mambo, and then
all the percussion stops except the clave in the bridge”).
3.2.2 Desired Results
In general, the smallest structural unit in latin music is the bar; phrases may be
played which cross bars, or which take up less than a single bar, but the structure
is defined in terms of bars. Further, the clave will continue throughout, and will
be implied even when not played. It follows that the necessary tasks are:
quantization of the incoming data, according to an isochronous pulse^2
metricization of the quantized data into beats and bars
segmentation of the resulting bars into sections
^2 Quantization in this sense is different to standard usage in sequencers. In this case we mean
“determining the most appropriate isochronous pulse and notating the incoming events relative
to this”, rather than shifting incoming notes to be exact multiples of some chosen rhythmic
value.
Here we assume that we are dealing with music which is described in terms
of beats and bars (i.e. metricised and quantized), so we are only left with the task
of segmenting these bars and extracting relevant features from them - a process
described in Section 5.
3.2.3 Design Philosophy
The structures under consideration do not represent the music itself, but only its
higher level structure and features.
There are also some assumptions which are used to simplify matters:
Structural Assumption 1 There are high level sections of music with distinct
structural roles
Structural Assumption 2 The smallest structural unit in latin music is the
bar; phrases may be played which cross bars, or which take up less than a single
bar, but the structure is defined in terms of bars.
Structural Assumption 3 A bar contains one and only one chord
Structural Assumption 4 A segment contains one and only one groove
Grouping in the GTTM is completely hierarchical: each group contains other
groups down to the note level and is contained within a larger group up to the
group containing the entire piece; the number of grouping levels is unspecified.
A fully recursive structure is highly expressive, but may cause difficulty with
implementation and makes dealing with the resulting representation more com-
plex. It is clear that more than two levels of grouping would provide a richer
representation: a tune may have a repeated section which is composed of eight
bars of a steady rumba groove, followed by six bars of phrasing. It would make
sense to have this represented as one large group which contained two smaller
groups (see Figure 3.1). This representation is more complex to manage than
one which considers only sections which are made up of sets of bars, but is ulti-
mately richer, and allows for specification of groove at the section level, which is
more appropriate than the bar level.
[Figure 3.1: Representation Structure. A Complete Tune is divided into Sections (e.g. Section “Bridge”), each of which contains Segments with an associated groove (e.g. groove=SonMontuno, groove=phrasing).]
Section:   Montuno
Segment:   2-3 Son Montuno (5x) | 2-3 Son Montuno | Phrasing Only
Bar:       Cm Fm G7 G7 | Cm Fm G7 G7
Phrasing:  1, 1+, 2+

Figure 3.2: Example section: the montuno from Mi Tierra (Gloria Estefan), leading up to the timbales solo
Since an arbitrary hierarchical tree of groups is likely to be difficult to deal
with, a more constrained representation is proposed. A Song is composed of
Sections, each Section is composed of Segments and each Segment is composed of
Bars. This can be seen as a specialisation of the grouping section of the GTTM,
to pick out areas of particular interest. There is to be some information associated
with each level of grouping, which is as follows:
Bar a bar is exactly four beats long, contains one chord, and may contain
phrasing information.
Segment A segment is an integer number of bars, and has a constant groove
and instrumentation.
Section A section has a single, defined role in the piece.
Figure 3.2 shows an example representation of a section of Mi Tierra (Gloria
Estefan).
In a similar manner to the GTTM, we specify a set of Well Formedness and
Preference rules with which to perform the analyses. Some of these rules are
derived from the structure of the representations, and some are heuristics based
on musical knowledge.
3.2.4 Well-Formedness Rules
The well formedness rules here come from the design of the representation - there
is no psychological basis a la the GTTM, rather it is proposed as a beneficial
representation for the style of music in question.
The rules for a valid segment are:
SGWF 1 A Segment must contain an integer number of bars (Structural As-
sumption 2)
SGWF 2 A Segment must have one and only one groove associated with it
SGWF 3 A Segment must have only one set of instrumentation levels associated
with it
And for valid sections:
SCWF 1 A Section must have an integer number of Segments within it
SCWF 2 A Section must have a single role associated with it (Structural As-
sumption 1)
There are some implicit boundary conditions on these:
• The start of a piece is the start of a Section and Segment
• The end of a piece is the end of a Section and Segment
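A minimal sketch of this hierarchy as Java classes might look as follows; the field types here are illustrative simplifications rather than the project's actual Representation classes, with the well-formedness rules becoming invariants of the containment structure:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch of the Song/Section/Segment/Bar hierarchy; the
    // field types are simplifications, not the project's actual classes.
    class Bar {
        String chord;                              // Structural Assumption 3
        List<Double> phrasing = new ArrayList<>(); // accent points, in beats
    }

    class Segment {
        String groove;                             // SGWF 2: exactly one groove
        String instrumentation;                    // SGWF 3: one instrumentation set
        List<Bar> bars = new ArrayList<>();        // SGWF 1: whole number of bars
    }

    class Section {
        String role;                               // SCWF 2: one structural role
        List<Segment> segments = new ArrayList<>(); // SCWF 1
    }

    class Song {
        List<Section> sections = new ArrayList<>();
    }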
3.2.5 Preference Rules
The rules for preferences are more difficult. It is quite possible that different
musicians would group certain pieces differently, and there may be no “best”
analysis. The general goals are:
Preference Criterion 1 Maximise reusability - the more often a Section or Seg-
ment can be reused with minimal alterations, the better descriptor of the music it
is.
This supports the “chunking” often done by musicians (and visible on written
parts) which allows for easy specification of structure, such as “two verses then
a chorus”.
Preference Criterion 2 Avoid overly short units, which will complicate the
analysis and not reflect the perceived structure of the music
Preference Criterion 3 Capture as accurately as possible those structural ele-
ments which inform the playing of a timbalero.
There are some rules which are common to both Sections and Segments, and
come partially from personal experience, and partially from the goals given above:
UP 1 Prefer units which are similar or identical to other units, and hence reusable
(Preference Criterion 1)
UP 2 Prefer units with constant features
i.e. if given a choice of two places to make a break, choose the one which max-
imises the constancy of attributes on each side of the break.
UP 3 Prefer larger units (Preference Criterion 2)
UP 4 Prefer units whose size is a multiple of 4 bars, with extra preference given
for multiples of 8 and 16
This is a parallel to the specification in the GTTM of alternating strong beats at
each level.
Some rules are specific to this particular style of music, and also to Section or
Segments:
UP 5 Prefer units which either start or end with phrasing or fills.
Since phrasing and fills are used in part to support the structure of a piece, it
makes sense to use them to help with the dissection.
SGP 1 Prefer segments which have distinct instrumentation to surrounding seg-
ments
SCP 1 Prefer Sections which centre around a key and describe a tonal arc.
There is presently little to describe the method of creating Sections; a proper
treatment of this subject would require the analysis of a large amount of music,
which is outside the scope of this project.
In short, this representation builds on the hierarchical model set out in the
GTTM, but chooses to make certain levels of grouping special; these levels have
extra information attached to them, and are the only levels of grouping allowed.
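One plausible way to operationalise these preference rules is as a weighted score over candidate units, choosing the dissection which maximises the total; the weights below are invented for illustration:

    // Invented scoring sketch for the unit preference rules; the weights
    // are illustrative, not tuned values from the project.
    class UnitScorer {
        static double score(int lengthInBars, boolean similarToOtherUnits,
                            boolean constantFeatures, boolean phrasingAtEdge) {
            double s = 0.0;
            if (similarToOtherUnits) s += 2.0;      // UP 1: reusable units
            if (constantFeatures)    s += 1.0;      // UP 2: constant features
            s += Math.log(lengthInBars);            // UP 3: prefer larger units
            if (lengthInBars % 4 == 0)  s += 1.0;   // UP 4: multiples of 4...
            if (lengthInBars % 8 == 0)  s += 0.5;   // ...extra for 8...
            if (lengthInBars % 16 == 0) s += 0.5;   // ...and 16
            if (phrasingAtEdge)      s += 1.0;      // UP 5: phrasing/fills at edges
            return s;
        }
    }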
3.3 Agent System
The Agent System used is designed to emulate an entire set of interactive agents
(be they human or mechanical) cooperating to create music together. Since only
one agent is being created here, this cannot be fully realised. To provide the
agent context, we use “Dumb Regurgitators” - agents which merely repeat the
music they have been given. Although this removes much of the interest relating
to agent systems, it is a necessary step on the way.
A group of musicians playing together could well be modelled as a set of
agents, each of which is communicating with all of the others^3.

^3 There are other possibilities: [?] describes a real-time blackboard system where each agent reads and writes data to a central blackboard.

[Figure 3.3: Possible Network Structures. A fully interconnected network vs. a central conductor.]

A conductor
could be added to provide synchronisation, and potentially an audience could be
added. This has O(n^2) complexity with the number of musicians, so we propose a
simpler design (see Figure 3.3) where each musician communicates only with the
central conductor. This would also allow better support for highly distributed
heterogeneous ensembles (see Figure 3.4), as each platform could have a single
Conductor which handled synchronisation issues and gave any human players the
necessary framework to play within.
Ideally, the system would support these distributed heterogeneous ensembles,
but in reality this is likely to be a very complex problem. Almost any kind
of network based system will have sufficient latency to render fully real time
interaction with human players difficult at best[16], not to mention the time
taken for the virtual musicians to respond to events. Even having a synchronised
click would cause problems, as a musician on one platform would hear the other
musicians performances as being offset from the click. With a fully agent based
system, it is a slightly different story. Even if the system is to run as a real-
time system, it is not necessary that every part runs in real-time, and there is
the possibility for parts to run delayed with respect to others^4.

^4 Something that not many human musicians are capable of doing intentionally!
[Figure 3.4: Possible Distributed Network Structure. Multiple agent platforms, each with its own Conductor and human musicians, are linked to other agent platforms.]
[Figure 3.5: Music Messages Timeline. A segment starts playing and is sent out to all agents (network latency); each agent receives it and sends its next segment (the time allowed for chunk construction); the conductor receives the next segments (network latency) and the next segment starts playing.]

The system as
proposed addresses this issue by working with chunks of output. The central
conductor obtains one chunk of output from all of the musicians involved, starts
outputting this, and then delivers copies of the collated chunks to all the musicians
(see Figure 3.5). This means that:
• every agent only has knowledge up to the end of the previously distributed
chunk. This appears to be reasonable, as it will always take a human some
time to process sounds which are heard, and decisions have to be made as to
what to play before it is played (and hence before you know what everyone
else will play)
• the only constraint necessary for real time operation is that all of the agents
produce their next output chunks before the current chunk has finished
playing.
For this project, bars are going to be used as the chunk size, as this feels
natural. The final proposed architecture is shown in Figure 3.6.
[Figure 3.6: Final Agent Architecture. A central Conductor connects three Dumb Regurgitators and the Timbalero, takes in the music to play, and produces the musical output.]
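In outline, this chunked exchange can be sketched as follows (the interfaces and method names are invented for illustration; in the real system each exchange is a FIPA message over JADE, and a chunk is one bar):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Invented sketch of the chunked conduction loop; not the project's classes.
    interface Musician {
        String name();
        String nextBar();                          // produce one chunk of output
        void hear(Map<String, String> collated);   // receive everyone's chunk
    }

    class Conductor {
        void conduct(List<Musician> band, int totalBars) {
            for (int bar = 0; bar < totalBars; bar++) {
                Map<String, String> chunk = new HashMap<>();
                for (Musician m : band) {
                    chunk.put(m.name(), m.nextBar());  // gather one bar from each
                }
                play(chunk);        // start the collated bar playing...
                for (Musician m : band) {
                    m.hear(chunk);  // ...then distribute it, so every agent only
                }                   // knows the music up to the previous bar
            }
        }
        void play(Map<String, String> chunk) { /* send to MIDI output */ }
    }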
3.4 Generative Methods
The virtual musician must use a representation of the piece being played along
with some internal logic to create its output. The output is largely sequenced
- that is, it is constructed by compositing existing parts, with alteration where
necessary.
In the case of timbales playing, there is a relatively small corpus of possible
rhythms, and a relatively well defined method for choosing which rhythms to play
at any given time. However, the choice of when and how to ornament this basic
playing is less well defined, and could be implemented differently by different
players, or even the same player at different times. The rhythmic selection is
hence split into two parts:
Basic rhythm selection is to be performed deterministically, using a set of
rules, to decide what kind of rhythm to play for a particular segment.
Ornament selection is performed on a per-bar basis to determine whether to
add ornamentation, and if so, what.
Ornament selection is further divided into three distinct categories:
Phrasing involves the entire band, picking out a set of notes in a certain bar.
The timbalero will typically use cymbals and/or the loud open notes of the
timbales to accent these notes. Depending on the surrounding structure
and the spacing of the accented notes, the timbalero has three options:
• Continue playing as much of the basic rhythm as possible, while adding
emphasis to the specified notes
• Play only the specified notes
• Play the specified notes with small fills or ornaments in between.
Fills are the best known ornament. When a player plays a fill, the basic
rhythm is stopped for the duration of the fill. Fills are generally technically
impressive, dynamically exciting and can provide a more complex rhythmic
counterpoint than the standard rhythm. Fills also often accent a particular
beat - normally the end of the fill, and often the downbeat of the bar after
the one in which the fill starts (although the last beat of the fill bar and
the second beat of the post fill bar are also common in latin music).
Chatter is a term derived from jazz music, to describe non-repeating patterns
played on the snare drum with the left hand (which is typically not used in
the basic rhythm, or may provide a simple backbeat) while the basic rhythm
continues in full. This is used to create tension, add rhythmic complexity
and generally avoid monotony.
3.4.1 Basic Rhythm Selection
Timbales playing is interesting in the degree of orthogonality between the patterns
in each hand. Apart from some patterns where left and right hands are used
together, it is generally possible to fit many left handed variations to a single
right hand part. The factors which affect these choices are:
• The style of the piece
• The current instrumentation
• The structural role of the current section
• The current dynamic

Right Hand    Left Hand
Cascara       Clave (on block); Doble Pailas; Hembra Backbeat; tacet
Mambo Bell    Hembra Backbeat; Clave (on block); Campana Pattern

Table 3.1: Instrumentation by Hand
Table 3.1 gives common combinations of sounds played by each hand (see
Appendix A for details). For each combination, different specific rhythms may
be used - there are a variety of cascara patterns in common use, the clave will
change depending on the style of the piece etc.
The system should be designed to analyse the current surroundings and select
the appropriate basic rhythm. From the analysis of Salsa music earlier, we have
the following information to use:
• A salsa tune will consist of a beginning section in traditional Son style and a
second section in a more upbeat Montuno style. The start of the Montuno
is the high point of the piece, and after this the intensity does not drop
much until the end, although there may be a small coda at the end which
is a re-statement of the introduction.
• The Mambo bell is used from the Montuno onwards. While it is being
played, if there is no bongo player playing the Campana part, the timbalero
will do this; otherwise, the left hand plays a back beat on the Hembra.
• In the Son sections, the right hand is always playing cascara. The left hand
can fill in the gaps to play Doble Pailas in the louder sections, add in the
clave if no-one else is playing it in the quiet sections, or do nothing.
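These observations translate fairly directly into selection logic; the following is an illustrative sketch (the enum values and parameters are invented names for the patterns and context described above):

    // Invented sketch of the basic rhythm selection rules above; the enum
    // values and parameters are illustrative names, not the project's code.
    enum Pattern {
        CASCARA, MAMBO_BELL, DOBLE_PAILAS, CLAVE_ON_BLOCK,
        CAMPANA, HEMBRA_BACKBEAT, TACET
    }

    class RhythmSelector {
        // Returns { rightHand, leftHand } for the current segment.
        static Pattern[] select(boolean inMontuno, boolean bongoPlaysCampana,
                                boolean claveCoveredElsewhere, boolean loud) {
            if (inMontuno) {
                // Mambo bell from the Montuno onwards; cover the campana
                // part if no bongo player is doing so, otherwise play a
                // backbeat on the hembra.
                Pattern left = bongoPlaysCampana ? Pattern.HEMBRA_BACKBEAT
                                                 : Pattern.CAMPANA;
                return new Pattern[] { Pattern.MAMBO_BELL, left };
            }
            // Son sections: the right hand always plays cascara.
            Pattern left = loud ? Pattern.DOBLE_PAILAS
                                : (claveCoveredElsewhere ? Pattern.TACET
                                                         : Pattern.CLAVE_ON_BLOCK);
            return new Pattern[] { Pattern.CASCARA, left };
        }
    }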
3.4.2 Phrasing
Phrasing is a key way to make a performance sound more dynamic and cohesive.
At present, Phrasing information is present only as a set of points within the bar
where the accenting should occur; this is in keeping with the musical practice of
identifying phrasing by accent marks, but does not encode all the information
a musician would use (for example, if the notes played by the rest of the band
have a downwards trend, a timbalero might add phrasing that moved from higher
towards lower pitched sounds).
There are two common modes of phrasing. Sometimes the bar is played as
normal, but the whole band will pick out certain notes to accent. Alternatively,
there may be some bars where everything stops except the phrasing.
3.4.3 Fills
As well as relieving monotony, fills are also used to highlight structural features,
such as changing from one section to another. Also, fills are more likely to occur
in metrically strong bars.
The Timbalero uses a set of weighted rules to determine when to play a fill.
The rules are:
Fill Placement 1 Prefer fills on the last bar of an eight bar group (starting from
the start of a Section)
Fill Placement 2 Prefer fills on the last bar of a Section
Fill Placement 3 Prefer fills on the last bar of a Segment
Fill Placement 4 If Rule 3 is in force, prefer fills when the next Segment has
a higher intensity than the current Segment
3.4.4 Chatter
Chatter is less structurally significant than fills are, and can be more widely
applied. A similar set of rules is used to determine when to add chatter:
Chatter Placement 1 Prefer chatter in loud/intense sections
Chatter Placement 2 Prefer chatter in Mambo sections
Chatter Placement 3 Prefer chatter towards the end of a section
Chatter Placement 4 Prefer chatter in the fourth bar of a four bar block (from
the beginning of the Section)
Chatter Placement 5 Prefer chatter in the fourth bar of an eight bar block
(from the beginning of the Section)
Chatter Placement 6 Avoid chatter on the first bar of a section
Chatter Placement 7 Prefer chatter if we played chatter in the previous bar
and it has a followOn
Chatter Placement 8 Avoid chatter if we have played a lot recently
Chatter Placement 9 Avoid chatter straight after a fill
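One way such weighted rules can be combined is sketched below; the weights, threshold and BarContext interface are invented for illustration:

    // Invented weighting sketch for the chatter placement rules above; the
    // weights, threshold and BarContext interface are illustrative only.
    interface BarContext {
        boolean loudSection();
        boolean mamboSection();
        boolean nearSectionEnd();
        int barWithinSection();                // 0-based, from section start
        boolean chatterInPreviousBarWithFollowOn();
        boolean fillInPreviousBar();
        double recentChatterDensity();         // how much we played lately
    }

    class ChatterDecider {
        static boolean playChatter(BarContext c, double threshold) {
            double w = 0.0;
            if (c.loudSection())                 w += 1.0; // rule 1
            if (c.mamboSection())                w += 1.0; // rule 2
            if (c.nearSectionEnd())              w += 0.5; // rule 3
            if (c.barWithinSection() % 4 == 3)   w += 0.5; // rule 4
            if (c.barWithinSection() % 8 == 3)   w += 0.5; // rule 5
            if (c.barWithinSection() == 0)       w -= 2.0; // rule 6
            if (c.chatterInPreviousBarWithFollowOn()) w += 1.5; // rule 7
            w -= c.recentChatterDensity();                 // rule 8
            if (c.fillInPreviousBar())           w -= 2.0; // rule 9
            return w > threshold;
        }
    }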
3.5 Design Summary
Several assumptions have been made, based on expert knowledge, about the
structure of latin music. A high level representation system has been proposed,
following the general structure of the GTTM, but adapted to latin music by
means of the assumptions described. We have broken down timbales playing into
the selection of a basic rhythm and the addition of ornaments, and outlined the
principles used to select the basic rhythm. We have divided ornamentation into
three categories - phrasing, fills and chatter - and set out preference rules for
deciding when to add ornamentation of each type.
Chapter 4
System Architecture
In this section we discuss the infrastructure implementation, covering first the
agent platform, it’s protocols and interactions, then the high level representations
derived from the structural discussion in the previous section and finally the low
level musical representations which form the foundations of the system.
The following platform decisions have been made:
• The project is implemented in Java, due to personal familiarity, portability
and the availability of necessary libraries.
• The system will be able to read and write standard MIDI files, to allow
access to music stored in that format and enable usage of the wealth of
tools for turning MIDI into listenable music
• Agent functionality will be provided by the JADE libraries, which are Free,
stable and FIPA compliant
• Low level musical functionality will be provided by the jMusic library^1,
which is also Free software.
The project aims to meet all of the objectives set out at the start of the Design
section.
1http://jmusic.ci.qut.edu.au/
[Diagram omitted: a SongInitiator reads the MIDI file input and creates DumbRegurgitators for the other musicians (trumpet, piano, etc.); the Conductor links these with the Timbalero (comprising manual analysis, generative and musical analysis subsystems) and the MIDI file output.]
Figure 4.1: Overview of System Structure
4.1 Agent Architecture
4.1.1 Overview
Figure 4.1 shows an overview of the entire system. In brief, a Timbalero and
a SongInitiator are created. The SongInitiator reads in a MIDI file, and
creates an agent to play each part. It then creates a Conductor who conducts
the piece and then writes the output to another MIDI file.
4.1.2 Class Hierarchy and Roles
Figure 4.2 shows the class inheritance for the Agent classes created.
jade.core.Agent is the JADE class from which all Agents must be derived.
jade.core.Agent
    com.moseph.music.MusicalAgent
        Musician
            DumbRegurgitator
            ListeningMusician
                Timbalero
        SongInitiator
        Conductor

Figure 4.2: Class Hierarchy

MusicalAgent A musical agent has some understanding of music. This entails being able to transmit and receive music in messages, and find out about other musicians.
Musician A musician produces music. It has an identity, and provides the service MusicalAgent.musicProducer. It can respond to requests to play a bar of music, send its identity and restart the current song. The basic Musician class will respond with null to every request for a bar, as it has no idea what to play.
SongInitiator The SongInitiator starts a song. It takes a filename as an
argument, and opens the specified MIDI file. It reads the file into Parts for
each musician, then creates a DumbRegurgitator to play each part. Finally,
it creates a Conductor to oversee playing the whole song, and finishes.
Conductor The conductor sits in the middle of the whole ensemble, and per-
forms several important tasks:
• Gathering information about the surrounding musicians.
• Requesting output from all the musicians, collating it and relaying the
combined information to all the ListeningMusicians.
• Recording all the playing so far and writing it to a MIDI file at the
end of the piece.
DumbRegurgitator To simulate other agents, the DumbRegurgitator takes
one Part of a tune, and returns the bars of it sequentially every time it is
asked for the next bar of music.
ListeningMusician A ListeningMusician adds the ability to receive music to
the basic Musician class. It provides the service MusicalAgent.musicListener,
and the Conductor sends collated output to all Agents providing this ser-
vice.
Timbalero The Timbalero adds the mechanics necessary to play the timbales to
a ListeningMusician - a Representation, Generator and an Annotation.
[Diagram omitted: the Conductor first requests identities (a REQUEST with content IDENTITY_REQUEST; each Player replies with an INFORM containing its identity string, which the Conductor stores and broadcasts as a serialised Java map of identity strings). In the main song loop the Conductor requests bars (a REQUEST with content BAR_REQUEST; each Player generates and sends its next bar as an INFORM containing an XML-serialised jMusic Part), collates the bars, and sends them to all listeners as an INFORM containing a serialised Java map of XML-serialised jMusic Parts.]
Figure 4.3: Message Flow
ChordTranscriber A simple test harness for the chord recognition/key induc-
tion algorithm.
4.1.3 Information Flow
Figure 4.3 shows the messages passed between the agents in the system.
A precise timeline of the whole process is as follows:
1. An Ant buildfile (called with ant run) creates the agent platform with two
agents, a Timbalero and a SongInitiator, and passes them the name of
the song to be played.
2. The SongInitiator reads in the appropriate MIDI file and sets up the rest
of the agents:
• a DumbRegurgitator is created for each part, and is passed an Identity
and a jMusic Part.
• a Conductor is created, and passed the name of the song, the length
to play for and the tempo of the song.
The SongInitiator then deletes itself.
3. The Conductor requests the Identity of all the players present. Once they
reply, it sends all of the identities to each listener.
4. The Conductor now starts the main song loop. Each bar, the conductor:
(a) Requests a bar from each music producer
(b) Waits until it has received a bar from everyone
(c) Collates the bars, stores them, and sends a copy to all music listeners
5. Once the required number of bars have been played, the Conductor writes
the collected output to a file.
6. If the representation building abilities of the Timbalero are being tested,
the conductor sends out a request to restart the current song. The main loop
is then repeated to give the Timbalero a chance to use the representation
it has built up on the first iteration of the song.
4.1.3.1 Messages
While attempts have been made to use re-usable protocols for communication, in some cases platform-specific messages have been used; the system is amenable to being made portable, but more work needs to be done.
Messages are sent as ACLMessages, as provided by the JADE framework. FIPA performatives are used to distinguish different types of message, along with user-defined parameters to further specify the communication. Where possible, messages contain simple strings in the content: field, although in some cases serialised Java objects are sent. In general, the conversation specifications are honoured, so messages will contain the correct :reply-to values, and Behaviours expecting replies will only consider the correct messages.
Single parts of music are sent as XML fragments, using the jMusic serialisation methods. This allows other XML-fluent applications access to the data, and is a relatively simple language while encoding most of the necessary information (see Figure 4.4). When parts are collated, they are sent as serialised Java hashes, containing the XML strings indexed by agent ID.

<?xml version="1.0" encoding="utf-8"?>
<Score tempo="180.0">
  <Part title="Strings">
    <Phrase tempo="-1.0">
      <Note pitch="36" dynamic="80" rhythmValue="2.0" duration="1.91" />
      <Note pitch="43" dynamic="80" rhythmValue="2.0" duration="1.91" />
    </Phrase>
  </Part>
</Score>
Figure 4.4: Example jMusic XML File
Identities are sent as the stringified form of the Identity class. This is simply a comma-separated list of all the attributes, so it should be readily parsed by other applications.
4.1.3.2 Behaviours
Although JADE defines several Behaviour classes, these were not really sufficient
for the task at hand so some new behaviours were defined. At least some of these
rely on the SendingState class, which allows messages to be sent to several agents
and keeps track of who has replied and who has not, and can be shared between
multiple behaviours.
These behaviours operate as SequentialRequester and
CyclicResponseCollector pairs. The requester will request a response
from a certain class of agents, and the collector will catch all the responses and
notify the requester when they are all in. The requester will then request the next message (see Figure 4.5). These are used by the Conductor for identity collection at the start of the run, the main song loop, and requesting that the song be restarted.

[Flow diagrams omitted: the SequentialRequester gets its receivers, sends a message, adds it to the shared state and blocks; when all messages are received it either requests the next message in the sequence or, once the maximum sequence is reached, removes itself. The CyclicResponseCollector asserts that each incoming message matches its template and is a reply to a sent message, adds it to the state, and calls responsesCollected() on the requester once everyone has replied.]
Figure 4.5: SequentialRequester and CyclicResponseCollector flow diagrams
The musicians use a far simpler model. They wait for a request, whether it is
for a bar, their identity or to restart, and respond appropriately when they get
one.
4.2 High Level Representations
4.2.1 Representation Classes
The higher level classes are designed to directly model the high level representations discussed in the previous chapter. We have a Java class for each of the main structures, plus some supporting classes.
main structures, plus some supporting classes.
4.2.1.1 Bar
A Bar represents a single bar of music. As per Assumption 3 above, each bar may
only have a single chord associated with it. A Bar hence has only two parameters:
Chord The chord played in the bar.
Accents Any notes within the bar which are especially accented by other musicians. This covers both accents within bars of normal groove and the special accents within areas of phrasing (see Section 6.3).
A bar knows nothing about who is playing in it or what groove is being played; bars are designed to be as simple as possible.
4.2.1.2 Segment
A Segment represents a group of several bars where a certain set of features are
(reasonably) constant. The parameters of a Segment are:
Style This is a string specifying a particular style. The style may be a rhythmic style (e.g. Bomba or Rumba) or one of the special styles (e.g. PhrasingOnly or TimbalesSolo). This is the name of the Java class which will provide the output for the bars in this Segment (see Section 6.3 for a full description of how Styles are used).
Clave The clave will always be “3-2” or “2-3”, except in certain Styles (such as PhrasingOnly) where it may be omitted. If a Segment has an odd number of bars, the following Segment will generally have the opposite clave.
Bars are contained in a Vector.
Instrumentation A hash of player names along with a floating point value
representing their contribution.
Intensity is a single floating point number which provides a general measure-
ment of how intense the playing is at that point (number of instruments,
how loudly they are playing etc.).
Intensity Change takes a value depending on whether the intensity is increas-
ing, decreasing or remaining constant over the course of the Segment.
4.2.1.3 Section
Sections are the highest level structural elements. They contain Segments, and
the only restriction is that a Section has only one structural role.
Name is the working name of the Section, and is used solely to aid human comprehension, as it is nice to see “VerseA”, “Instrumental Chorus” etc. rather than anonymous blocks of Segments.
Segments are stored in a Vector
Role defines the structural role of the section. This is one of the roles laid out
in the discussions on the structure of Salsa music (see Section ?)
Figure 4.6: Different sets of notes which would be classified as C major (musical notation omitted)
Figure 4.7: Ambiguous chords: conventionally, the first chord is written as Dm7, while the second would be F6 (musical notation omitted)
4.2.1.4 Chord
The representation of chords is a relatively complex problem. Chords are typically presented as a root note followed by a string denoting the “extension” of the chord (e.g. ‘Abm’, ‘E7’). There are often different sets of notes which would be given the
same name (Figure 4.6), and there are different ways of writing the same chord -
a chord containing the notes D, F, A and C could be written as Dmin7, or as F6
(see [29], pages 50,51). In some cases the voicing helps, as shown in Figure 4.7.
To avoid problems, the Chord class has been made as flexible as possible. Chords
are represented as a root note (an integer between 0 and 11) and an extension,
which is an array of 7 integer values. These values represent the presence of notes
in a sequence of thirds starting from the root, taking the values given in Table
4.1. Table 4.2 gives some examples.
This allows multiple representations of the same data: in a C chord, the note
Eb would be written as “x -1 x x x x x”, but D# would be “x x x x 2 x x”, yet
they are the same note. In practice this is useful, as a chord speller/recogniser
will make its own choice about which representation is correct.
The Chord class contains many types of chord. Each type of chord can have several extension vectors (for example, [1,1,1,0,0,0,0], [1,1,0,0,0,0,0] and [1,0,1,0,0,0,0] are all considered to be major chords), and several ways of being written (for example, ’C’, ’C maj’ and ’C major’ all represent a C major chord).

 0  Note not present
 1  Note present in normal form
-1  Note present in diminished/minor form
 2  Note present in augmented form
Table 4.1: Meanings of extension values

Chord    Root  Extension         Notes Present
C major  C     [1, 1,1,0,0,0,0]  C,E,G
C minor  C     [1,-1,1,0,0,0,0]  C,Eb,G
D minor  D     [1,-1,1,0,0,0,0]  D,F,A
unknown  C     [1, 1,1,1,1,1,1]  C,E,G,B,D,F,A
Table 4.2: Example Chords
4.2.2 Human Readability
In order to increase the usability of these classes, a lot of work has been put
into making sure that these objects are easy to create and visualise - this allows
for easy debugging as well as clear and maintainable coding. In all cases the
toString() method has been overridden to output something human readable,
and methods are available to create objects from easily understood strings.
Some examples are:
Chords can be created from a chord string detailing the root note of the chord and an extension, for example ’Eb’, ’C minor’, ’Ddom7’.2
2The exact specification is the regex ^([A-G][b#]?)\s?(.*)$
Bars can be created from a chord string, as this is the only necessary information. They output accents according to the informal counting convention used by drummers3. A Bar in D major with accents on the second beat and halfway through the third beat would become "D ‘2 ‘3+".
Segments are defined by a Style, a clave and a list of Bars. The list of bars
can be specified by a set of Chord specifiers surrounded by ’|’ symbols,
for example new Segment( "SonMontuno", "3-2", "|C|C|D|G|" ).
Sections wrap the output from their component Segments and add role and
name information. Example section output is given in Figure 4.8.
Rhythms can be input using a simple specification language based on a grid
on which notes may or may not be played. The characters X x o . rep-
resent decreasingly loud strikes, with everything else representing a rest.
The following code will return a basic cascara/clave pattern (as shown in Appendix A):
String[] basicRhythm = new String[] {
"CASCARA_RIGHT |X X xX xX xX x x|",
"BLOCK | x x x x x |" };
return UtilityMethods.stringToRhythm( basicRhythm, 16, QUAVER );
Although this might seem like a trivial detail, the ability to quickly and
intuitively add rhythmic fragments (not to mention their subsequent main-
tainability) is a large factor in the quality of the final output.
4.2.3 Identities
Identities are used for two reasons:
• Real musicians have identities, and it may be useful to remember people
you have played with previously.
3A bar full of semiquavers is counted as “one ee and uh, two ee and uh, three . . . ” (here represented as "1 ee + uh 2 ee + uh ..."), and a note which starts one quaver after the beginning of the bar is said to be “on the and of one”.
--------
Section: Instrumental Chorus (Mambo)
8 bars of 2-3 SonMontuno: |Cm|Cm|G|G|Fm|G|G7|Cm ‘1+ ‘2+ ‘3+ ‘4|
Instrumentation:- : Trumpet->0.5
Intensity: 0.8, 0
8 bars of 2-3 SonMontuno: |Cm|Cm|G|G|Fm|G|G7|Cm ‘1+ ‘2+ ‘3+ ‘4|
Instrumentation:- :
Intensity: 0.8, 0
--------
--------
Section: Bridge (Son)
4 bars of PhrasingOnly: |Cm|Cm ‘2+ ‘3 ‘4|Cm|Cm ‘2+ ‘3 ‘4|
Instrumentation:- :
Intensity: 0.6, 0
--------
Figure 4.8: Example fragment of Section textual output
• We need a way to keep track of the MIDI instrument settings so that the
file can be reconstructed at the end.
MIDI files will specify an instrument number for each part, along with a
textual description of the part. This means that there might be a “Piano Solo”
part as well as a “Piano” part. There are also many different instrument numbers
for a single instrument, for example, here are the first few entries in the GM spec:
PC# Instrument
001 Acoustic Grand Piano
002 Bright Acoustic Piano
003 Electric Grand Piano
004 Honky-tonk Piano
A piano part could have any one of these instrument numbers (and more). The agents in the system do not want to deal with either of these directly - they simply want to know what instrument someone is playing. Hence, when the SongInitiator starts the song, it turns this information into a canonical instrument: a single string which is the internal representation of the instrument played by that agent, with examples being “piano”, “bass” and “vocals”. Since
this process is not foolproof, some MIDI files must be hand edited to set up the
correct names.
The Identity class encodes
• A name for each musician (these currently serve no purpose other than to
make debugging more interesting and illustrate how they could be used,
but could be used to learn the styles of certain musicians)
• A canonical instrument string
• A GM instrument number
• A MIDI channel specification
At present, there are two ways in which Identitys are used:
• the Conductor uses the MIDI channel and instrument number to write the
finished MIDI file
• some of the musical analysis modules will only look at certain instruments (for example, the chordal analysis sections only consider bass, piano and guitar).
4.2.4 Representations By Hand
It was necessary to create high level representations by hand in order to test
the generative subsystem without the availability of a finished analysis system.
Annotations for each piece were made using the relevant MIDI file and a recording
of the piece (as played by human musicians). Chordal analysis was carried out
by examining the notes on the score, while structural features were often more
readily derivable from the recorded version. At this stage it was also verified that
the MIDI files were true and accurate representations of the piece in question.
4.3 Low Level Music Representation
Three methods of music representation have been discussed so far: Audio, MIDI
and CPN. Unfortunately, none of these are quite correct for the task at hand;
MIDI is slightly too low level, while CPN does not contain enough information.
Some of the deficiencies of MIDI are:
• The timestamps in ticks are not friendly to work with
• The length of a note is not known until it finishes
• The concept of beats and bars is derivable, but not given
• It is not always clear what instrument is playing each track.
Originally, a new low level music representation was designed and imple-
mented for this project, complete with MIDI file I/O and XML serialisation.
Development was subsequently switched to the jMusic libraries, as they offered
a more complete implementation (most notably in the presence of constants for
many musical values and display of music as scores and as piano roll notation).
A general description of the jMusic framework is given in Appendix C.
4.4 Architecture Summary
In this section we have given specific implementations of the high level represen-
tational elements discussed in Section 3.2. An agent system has been defined in
terms of messages and behaviours: a Conductor agent requests each bar sequen-
tially from all available musicians, which are then collated and disseminated to
all interested musicians. A low level representation of music has been specified.
This gives a complete agent based infrastructure for musical activities.
Chapter 5
Music Listening
In this section, we explore the musical analyses which we hope to use to build up
high level representations by looking at ways to extract the features necessary for
the grouping and well-formedness rules discussed in Chapter 3.
The Timbalero attempts to analyse the structure of the music it plays along
with, in the hopes of building up a representation which is accurate enough to
produce high quality output. The structural rules given in Sections 3.2.4 and
3.2.5 are used to produce a set of break points which mark the transitions from
segment to segment and section to section. The features necessary to populate
this representation are then extracted from the relevant sections of the music
heard (see Figure 5.1).
5.1 The Annotation Class
The Annotation class is derived from the Memory class (described in Section
6.3.1) and is used by the agent to keep track of all the features extracted from a
piece of music. It consists of a set of strands, each of which stores a particular
class of object, and is referenced by name. Features and Rules both use the
Annotation to store their findings so that they are available to all. At the end
of the piece, the Timbalero can instruct the Annotation to segment itself, and
that way get a Representation of the music heard.
[Diagram omitted: feature extraction and complex feature extraction operate on the music heard; well-formedness rules produce necessary break points and preference rules produce preferred break points; dissection and attribute extraction then yield the Representation.]
Figure 5.1: Analysis Operations
5.2 Feature Extraction
There are two kinds of feature extraction considered - simple and complex. Simple feature extraction works directly on the music itself, while complex feature extraction uses the results of previous feature extraction. In principle there is little to distinguish the two methods other than the assertion that all simple features must be extracted before the complex features are begun. Key features
include the current instrumentation, the number of players, the bar to bar change
of both of these, harmonic information and phrasing information. Each Feature
creates and uses a strand in the Timbalero’s Annotation (see Section 5.1).
5.2.1 Harmonic Analysis
The basic level of harmonic analysis is performed using a modified version of
the Parncutt chord recogniser given in [29] in combination with a key finding
algorithm due to Longuet-Higgins[19], again as described in [29]. Several small
modifications of this were tried to give a simple adaptation to the polyphonic environment. The basic Parncutt algorithm looks at the presence of notes only; often,
due to the many voices involved in latin music, this would result in a severely
overloaded harmonic structure. Some attempts were hence made to look at the
weighting of pitch classes1, so notes held for more time would have more effect on
the final decision. The algorithm used is as follows (given more formally as Algorithm 1):
• Construct a single jMusic Part containing all the notes to be analysed.
Presently, the system only looks at piano, bass and guitar parts.
• A strength is calculated for each of the 12 pitch classes1. The algorithm
iterates over the notes given, and adds the duration of each note to the
relevant pitch class.
• The set of n significant notes is calculated. From the notes over a certain
threshold (currently 1.0), the strongest are chosen, up to a maximum of n.
1pitch classes represent the 12 notes of the scale, numbered from zero to 11
• The presence array is calculated which contains the value 1.0 for each sig-
nificant pitch class, and 0 for the rest.
• For each possible chord root, a score is calculated, by multiplying the pres-
ence array by the Parncutt root support vector rotated2 by the integer value
of the potential root.
• The scores are normalised so that the average score is 10.
• A vector of biases is now added. This can be one or both of:
– The Krummhansl and Kessler stability profiles
– An extra weighting for the lowest note in the given note set
• The pitch class with the highest score is chosen as the best root.
Once the best root has been found, the extension of the chord is calculated.
The getSignificant() function is used to extract the (up to) four most salient notes played in the bar. These are then translated into entries in the sequence of thirds above the root by the Chord class.
The root finding algorithm uses the Krummhansl and Kessler stability profiles [17], which are a set of weights that indicate the compatibility of the possible root with the current key centre. In order to do this we need to know what the current key centre is, and to know that, we have to know what chords are being
used. To avoid this deadlock, the most likely current chord is first calculated
without the contextual information. This is then used to update the weights for
the different key centres, so that the current key centre can be computed. The
best root can now be computed again, using the new contextual information.
To calculate the current key centre, a vector of key centre weights is maintained. This has 24 entries - one for each pitch class in major and minor flavours. Each time a new chord is encountered, a fixed compatibility vector is rotated according to the new chord's root and flavour and added to the current vector of weights (subject to a maximum threshold)3. The pitch class and flavour with the highest score is then taken to be the current key centre.
2we use rotated here to mean moving each element in the vector right by n places, reinserting any that “fall off the end” on the left

Algorithm 1 Calculation of best root from note data
N is the set of notes
S is the strength vector (length 12, initially 0.0)
W is the root support weighting vector: [10, 0, 1, 0, 3, 0, 0, 5, 0, 0, 2, 0]
R is the support for each candidate root
p is the presence vector
B is the bias vector
for all n ∈ N do
    pc ⇐ pitchClass(n)
    Spc ⇐ Spc + duration(n)
end for
p ⇐ getSignificant(S, numSignificant)
for i = 0 to 11 do
    Ri ⇐ p · rotate(W, i)
end for
normalise(R)
for i = 0 to 11 do
    Ri ⇐ Ri + Bi
end for
return argmaxi(Ri)
5.2.2 Pattern Analysis
The reason for performing the harmonic analysis detailed above is that it gives a lot of information about the structure of the piece. To avoid a complex musical analysis, and to support the rules which prefer reusable fragments, we look at patterns. The guiding principle is that patterns which occur frequently are likely
to be structural units at some level.
Pattern search is performed using a PatternTree. Once the chord sequence
of the piece has been approximated, a tree is built of all the patterns contained
within this sequence as follows:
1. For each bar, a sequence of chords up to the maximum length to be considered, starting at that bar, is extracted.
2. The pattern tree runs down the sequence in order, starting from a blank
root node, and for every element
• if the current element of the sequence is a child of the current node of
the tree4, then the child’s visit count is increased and it becomes the
new current node
• otherwise, a new child is added, with a visit count of 1, which then becomes the current node.
Once this has been created, it is easy to see how many times a particular
sequence occurs by walking the tree and reading the visit count in the final node.
The support for the sequence is then the number of occurrences of the sequence
divided by the number of sequences of that length.
3The compatibility vector sums to less than zero; each weight is limited to being between 0 and 60. This means that only the compatible keys have weight at any given time, and incompatible chords will quickly change the perceived key.
4Chords are compared by their stringified form, to make the system more accommodating to insignificant changes in extension.
5.3 Rhythmic Analysis
Rhythmic analysis is performed by two separate but similar algorithms. The
general idea is to measure the “agreement” of playing within the bar - how many of the notes are played by everyone, and how many are played by some people and not others. The idea is to pick out bars which should be classified as phrased, either as only phrasing, or as a normal bar with some accents.
Both algorithms divide the bar into small segments and then quantize each note onset to the nearest segment boundary.
The first algorithm (Algorithm 2) then goes through each subdivision and counts the number of beats on which almost everyone plays, on which some people play, and on which (almost) no-one plays. The result is the ratio of beats where everyone played to the number of beats where anyone played.
The second algorithm (Algorithm 3) calculates the disagreement on each subdivision. Each musician can either play or not play; if everyone or no-one plays,
that is maximum agreement, while if half the musicians play, that is maximum
disagreement. The disagreement is calculated for each subdivision, normalised by
the number of parts, and then the average disagreement for the bar is calculated.
This is then converted to an agreement, on the scale 0 to 1.
Algorithm 2 First Phrasing algorithm
for all subdivisions do
    if proportion playing > phrase threshold then
        phrasedBeats++
    else if proportion playing > rest threshold then
        unphrasedBeats++
    else
        // it's a rest
    end if
end for
result = phrasedBeats / (phrasedBeats + unphrasedBeats)
Algorithm 3 Second Phrasing algorithm
for all subdivisions do
    disagreement ⇐ numPlayers − abs(numPlaying − numNotPlaying)
    totalDisagreement ⇐ totalDisagreement + disagreement / numPlayers
end for
averageDisagreement = totalDisagreement / numSubdivisions
result = (0.5 − averageDisagreement) × 2

The results of these algorithms are stored in a PhrasingAnalysis object, which also decides5 whether to classify the bar as being normal, normal with accents, phrasing only or tacet.
5by comparing each value to a threshold
5.4 Dissection
The Preference and Well Formedness rules are run after all of the features have
been run. On AnnotationStrand is created for each of these: Well Formedness
rules store boolean values, while Preference rules store floating point numbers.
Each Well Formedness rule has the chance to specify that a particular bar must
be a break point, must not be a break point, or it may leave it in it’s current
state (the default state being a don’t care). Each Preference rule will add (or
subtract) an amount from the score for a particular bar. Breaks are then made
• where the Well Formedness rules force a break
• where the combined score of the Preference rules exceeds a certain threshold.
The current PreferenceRules are:
InstrumentationChange The instrumentation change rule works on the data produced by the InstrumentationChange feature, which in turn relies on the Instrumentation feature, which calculates an activity level for each player. The InstrumentationChange feature then sums the absolute differences between each instrument’s activity levels to get a value for the overall instrumentation change. This is then used as the basis for a score.
PatternAnalysis Works over the scores created by the ChordPattern feature,
and looks for high values and local peaks or jumps.
PlayerChange Similar to the InstrumentationChange rule, except that it looks
at the changes in who is playing, not what they are playing.
TotalInstrumentation Looks at the change in total activity, and adds scores
for changes relative to surrounding bars.
Most of these rules only work over the previous bar (or the next bar), so they
have a very tight window. The PatternAnalysis rule looks at the next but one
bar as well, but this is still not a large amount of data to work on.
The current set of Well Formedness rules is quite limited:
Groove There are no real groove identification tools in place, so the only grooves
which are considered are the transitions between PhrasingOnly and every-
thing else (partial implementation of Segment WF Rule 2).
Once the dissection has been calculated, a new Section or Segment is created
at each of the relevant break points. A set of Attribute rules are then run to fill
in the necessary attributes of each object - for example, a Segment needs to have
the style, intensity, the clave and any phrasing added. Attribute rules are given
a range of bars corresponding to their object, and calculate a value for that set
of bars. For example, the IntensityAttribute calculates the average intensity
over the Segment, and the IntensityChangeAttribute performs a regression fit
between the intensity of each bar and time (relative to the start of the section) to
determine whether the intensity is increasing or not. This results in the finished
Representation.
5.5 Music Listening Summary
In this chapter we have implemented the extraction of several basic musical features and some more complex features. We have implemented rules which correspond to some of the grouping rules we are attempting to realise, and show
how a complete dissection may be created, but we are currently unable to realise
this. We have used novel algorithms for detection of phrasing, and have adapted
existing algorithms for key and pitch tracking to our specific purpose.
Chapter 6
Generative Methods
In this chapter we look at everything pertaining to the final output of the timbalero - the manner in which we use our high level representations to create a
musically sensitive accompaniment.
In keeping with the overall design, we split the generative subsection into
two main parts: basic rhythm selection and ornamentation (see Figure 6.1 for details).
6.1 Basic Rhythm Selection
Using the information given in the Design section, a set of rules has been produced
from a knowledge elicitation interview (Appendix A.4); Figure 6.2 gives pseudo-
code.
The Timbalero goes through three stages in generating the basic rhythm for
the current bar:
1. Select the patterns according to the rules given above. This results in a pair of two bar Phrases, one each for the left and right hands.
2. Adjust the Phrases for the current clave. All of the rhythms are stored
in 2-3 clave form. In 3-2 sections of the piece, the bars must be swapped
around to fit with the clave.
[Diagram omitted: the Generator draws on the Representation, a Memory and the current Style; basic rhythm selection is followed by ornamentation (phrasing, fill placement and selection, chatter placement and selection) and transformations, producing the final output.]
Figure 6.1: Generative Structure
if ( not moved_to_bell ||
( current_section is "Coda" &&
current_style is "SonMontuno" ) )
{
right hand plays cascara;
if( very loud ) left hand plays Doble Pailas;
else if( needs a clave ) left hand plays clave;
else left hand plays nothing;
}
else
{
moved_to_bell = true;
right hand plays mambo bell
if( needs_campana ) left hand plays campana;
else left hand plays Hembra;
}
Figure 6.2: Rhythm Selection Logic
3. Select the correct bar. On even bar numbers (zero indexed) the first bar is
selected, and the second on odd bars.
It would be possible to combine the second and third operations, but this is
felt to be more transparent and analogous to how a musician would think.
6.2 Ornamentation
These ornaments are considered in the specified order:
if( has phrasing ) add phrasing;
else if ( should fill ) do fill;
else if ( should chatter ) add chatter;
end
6.2.1 Phrasing
Phrasing is generally performed with a loud open note on one of the timbales,
often augmented with a cymbal crash. The Timbalero in this case always uses
either the rim of the macho, or the rim of the hembra combined with the cymbal. This decision is made based on the proximity of other accents - if there is
another accent within a threshold distance (default 1.25 beats), then no cymbal
will be used. Without this, long runs of phrasing were very hard on the ear, and
unrealistic.
The area around the phrasing is also cleared of notes. Both the right and left hand parts are cleared, which both simulates moving the hands to play the accents and leaves some space around the accents so they stand out more.
For phrasing where play continues as normal, nothing need be done. For Segments where only phrased notes are played, the PhrasingOnly Style must be used (see Section 6.3 below).
6.2.2 Fills
The Fill subsection of the Timbalero performs two tasks:
• Fill placement
• Fill selection
The rules detailed in Section 3.4.3 for fill placement have been implemented.
Each rule produces a score for the current bar; the scores of all the rules are
summed, and then a small amount of noise is added. This value is then compared
to a threshold value set in the Generator (Section 6.3). If it is higher than the
threshold, a fill is played.
Fills are selected randomly from a pool, provided by the Style. The exception
to this is the Tacet Style, which is most likely to play an abanico on the lead in
to the next section.
Fills are stored as a single jMusic Phrase. The basic rhythm up until the start
of the Phrase is left in, and the rest is cleared. The fill is then added to the basic
Part. Many fills require an accent to be played on the downbeat of the bar after
the fill. However, due to this representation, each fill must stop before the bar
line. Fills hence have a requiresDownBeatAccent() method; if this returns true,
then the ornamentation system will add an accent to the beginning of the next
bar.
6.2.3 Chatter
Chatter is added according to the rules set out in Section 3.4.4. Similar to Fills,
Chatter is represented as a single jMusic Phrase, and this has a similar limitation:
in some cases, chatter should span several bars, particularly for chatter based on
displacements1. Since the current representations only allow for single bar chatter,
the Chatter class has been extended to allow a followOn to be added. If Chatter
is to be played and the previous bar contained a Chatter with a followOn, then
the previous Chatter’s followOn will always be used. Rule 7 also increases the
chance of chatter if the previous bar contained a Chatter with a followOn.
6.2.4 Transformations
The transformation section is designed to further enhance realism. Provision is
made to apply transformations which will
• Alter dynamic levels; this covers functions such as playing quietly in quiet
sections and increasing or decreasing dynamics throughout sections.
• Alter the feel or groove of the playing. This would cover both applying
preset grooves to the output and emulating the groove of the other players.
At present, no transformations are implemented - the dynamic changes are
generally implemented by the voicing/rhythms used, and much of the input is
quantized, and hence non-groovy.
1A displacement is a rhythm made up of repeating units whose length is not a power of two
multiple of the bar length, so the positioning of accents in the chatter rotates with respect to
the bar
6.3 Modularity and Division of Labour
The Timbalero uses a Generator and a Representation to produce the output. The Representation holds the high level representation of a song and keeps track of the current position within it. The Generator holds a Memory (see Section 6.3.1) and loads a Style as appropriate for each Segment.
The great majority of the work is done by the Style class; this allows for easy
expansion to other styles. While a range of Styles are possible, there are a few
key styles, whose operation may be illuminating:
Style The basic Style class provides SonMontuno playing, and rhythm and ornament selection for Salsa music.
SonMontuno implements nothing at all, and is provided so that Segments can
have a ”SonMontuno” style.
Tacet always returns empty basic parts for both hands, and is used for sections
where the Timbalero does not play. It is very likely to use an abanico as a
fill if a fill is performed, and will only perform a fill if it is the last bar of
the Segment and the next Segment is not Tacet.
PhrasingOnly refuses to play any fills or chatter, and returns empty parts for the basic bar, so that only phrasing is played.
TimbalesSolo Returns empty parts for the basic bars, plays a Fill in almost every bar and adds Chatter when not playing a fill.
Rumba and Bomba are examples of adding new Styles. The Rumba style only overrides the getClave() method, to return a rumba clave, while the Bomba style overrides the default cascara pattern.
All of the domain specific knowledge is encoded in the Style class2, from logic
which chooses rhythms down to the actual fragments used. This allows for the easy addition of new styles; with some work, it would allow for expansion to other genres of music. As noted before, each Segment has a set Style. Each Section, however, has a structural role. This means that it is possible to have the Son section of the piece contain a mixture of different rhythmic styles.
2all the generative knowledge, that is - there is additional knowledge used to build up the representations used which is stored elsewhere
6.3.1 Memory
When a real player plays, current actions are often based on previous actions - if some chatter is started two bars before the end of a section, it will probably be continued, and maybe intensified, until the end of the section. In general, we need
a memory of what decisions have been made previously. A general Memory class
is used for this, which holds a set of MemoryStrands, indexed by name. Each
MemoryStrand contains a list of a certain type of object, and has a set length.
Each time a new value is added to a full strand, the oldest value drops off the
end. This can then be queried to support rules such as ”if I played a really big
fill at the previous section change, I won’t play one here”.
In the current implementation, each bar, the Fill and Chatter played (or null if none was used) are added to strands called “Fill” and “Chatter” respectively. These are used to support Chatter Placement Rules 7, 8 and 9, the use of Chatter followOns, and the downbeat accents after certain fills.
6.4 Generative Methods Summary
We have broken down the creation of output into the selection of a basic rhythm
and subsequent ornamentation. We have described how basic rhythm selection is
performed in the context of a salsa tune, and given procedures which implement
the design rules for structural use of ornamentation. We have described the
need for a memory of what has been played and its implementation. Finally,
we have described how a modular architecture is used to support the creation
of appropriate output, and concentrate all of the essential logic in a single place
amenable to extension.
Chapter 7
Results and Discussion
There were two major outputs from the finished system: an analysis of the music
heard, and an accompaniment based on a hand crafted representation. Analyses
of the representations used and the agent system are also given.
Much of the testing has been performed using a version of Mi Tierra, by Gloria
Estefan. The original MIDI file is of unknown origin, but has been compared to
the original recording and found to be a faithful representation of the piece. The
file was quantized, for two reasons:
• it makes life easier for the analysis systems (although the rhythmic analysis
should be relatively robust, there are representational issues1)
• the timbalero does not follow the feel of the other players, so it would be
likely to sound wrong and slightly out of time in some sections.
By hand   Context free   Current key   With context
Cm        Cm             C min         Cm
Cm        G1020010       C min         Cm
G         Gdom7          C min         Gdom7
G         Gdom7          C min         Gdom7
Fm        G110-1010      C min         C1011010
G         Gdom7          C min         Gdom7
G         G1020010       C min         Cm
Cm        Cm             C min         Cm
Table 7.1: Output of chord recognition against hand crafted representation for bars 41-48 of Mi Tierra (see text for details)
7.1 Music Listening
7.1.1 Chordal Analysis
Table 7.1 shows the output of the chord recognition subsystem for a fragment
of music, with hand analysis for comparison. The “context free” output is the
result of running the simple chord recogniser. As noted before, this is fed into
the key induction algorithm to generate the current key (3rd column) and this
context is used to give a “contextual chord” (4th column).
In this section of music, most of the chord changes happen on the fourth beat
of the previous bar (see Section 7.1.1.1), which causes problems for the chord
recogniser. The second bar is originally recognised as a rather strange extension
for G - decoding this we come up with the notes G,Eb,C - quite clearly a C
minor chord. Somehow, possibly due to the extra weighting given to the lowest
note played, this is being misclassified. Looking at the fifth bar, we see another
strange extension to a G chord. This decodes to G,B,F,C, which would indeed be
1Due to the very simple segmentation techniques used, if a note occurs slightly before the
first beat of the bar, it will be considered as part of the previous bar, and the bar in question
will be missing a downbeat
Figure 7.1: Guitar part for bars 22-25 of Mi Tierra, exhibiting split bar chords (musical notation omitted)
a strange chord, if it were not a superposition of G and F (minor) - a classic case
of a split bar chord. In this section, context seems to help - the chords in bars
2 and 7 are correctly classified with context. However, if we look at bar 55, it
has caused a serious error, where the notes G B F (classified correctly as Gdom7
originally) are classified as a strange rootless C chord - C0011010.
7.1.1.1 Split Bar Chords
Figure 7.1 shows an excerpt of the guitar part in Mi Tierra exhibiting split
bar chords. Using additional musical context2, it is possible to see that the
most appropriate chord sequence at this point in time is |Cm|Cm|Cm|G|, but it is
transcribed as |Cm|Cm|Gsus4|G|.
7.1.2 Chord Pattern Analysis
The original version of the chord pattern finding algorithm performed quite
poorly. This was due to the chord recognition being quite sensitive, and classifying broadly similar chords with different extensions. Observing the solo section (bars 153-176), which consists of repeating the chords |Cm7|F|Ab|G|, we see several different versions of the same sequence (compare bars 153-156 to 169-172).
The version given in the appendix (and discussed here) uses a more forgiving
version of the chord pattern rule, where only the root of the chord is considered.
We can see this giving very nice peaks with a period of four bars (the length of
the repetitive sequence), which is what is hoped for. If we look at the Montuno
2Mostly from observing other similar sections and listening to where it feels like the chord
changes
Figure 7.2: Phrasing under the solos (bars 153-176) (musical notation omitted)
(particularly 125-140) we can see that although the sequences are being found,
the peaks are out of phase with the desired boundaries. This can probably be
attributed to the quite ambiguous two chords at the beginning; this means that
the first repetitive sequence starts on bar 123, rather than bar 121.
7.1.3 Phrasing Extraction
There are many cases where the phrasing extraction algorithms perform as desired. Both algorithms are used in the analysis, although it was found through
experimentation that the second phrasing rule (Algorithm 3) provided a better
discriminator - it was very hard to find a threshold for the first algorithm which
would label an appropriate proportion of bars. There are two aspects to examine
here: the classification of bars, and the identification of correct accents within
bars.
The algorithm has to classify a bar as being normal, phrasing only, normal
with accents or tacet. Almost all of the bars classified as being phrasing only
actually are. Some sections represent particular problems; the timbales solo and
subsequent playing in bars 153 to 176 has a constant set of accents in every bar
(see Figure 7.2), which should be considered as phrasing as everyone is hitting
the same notes except the people playing the solos. However, the fact of people
playing solos over the top confuses the analysis; the first bar, which contains many
accents is identified as phrasing for the first four cycles, after which it is obscured
by the solos. The second bar, which contains the last note of the first group of
accents and nothing else is harder to spot, and is not correctly identified. The
final two bars which contain a few notes of phrasing each are sometimes identified,
but mostly missed. This exposes a limitation in the algorithm; if it had a sense
of parallelism, then the repeated rhythmic motifs would become clear.
It is also apparent that the distinction between the different types of bars is
not as clear as one might hope; it is difficult to detect tacet bars - at present a
bar is considered to be tacet if only one person is playing in it, which is clearly an
overly strong assumption (but it allows recognition of the conversation between
vocals and the band in the bridges (bars 17-20, 73-76)).
Where phrasing is correctly identified as being present, no examples have yet
been found where the wrong accents are identified. No results are shown here,
but the phrasing in the bridges is notated correctly, similarly the end of the verse
(bars 39-40) and the bar before the piano break (112).
As it stands, this algorithm has a disproportionate amount of work to do.
Classification of bars and extraction of accents should be split into two sections,
so that the classification section can use more information - for example, if all
the accent patterns were calculated before classification, there would be an opportunity to look for repeated motifs.
7.1.4 Final Dissection
As it stands, the final dissection is not really in a usable form. Many Segment
breaks are in the correct places: bars 41, 49, 57 and 65 are ideal examples of rules
combining to clearly specify break points. Indeed, most points where a break
is desired have a break point within one bar of them. The main issue seems to
be with extra breaks being added, which fall into two main categories: plausible
breaks, and implausible breaks. There are several breaks which are shown four
or eight bars from the beginning of a section (bars 69, 85, 91 and 109 are good
examples). A great many of these can be attributed to the chord pattern rule,
but there is often some support from the other rules. These breaks represent an
alternative, but fully plausible decomposition; although they are not structurally
significant points, it would not be ludicrous to (for example) divide the chorus into
smaller Segments. The musical knowledge necessary to avoid these could well be
hard to formulate, although it may be possible to tweak some of the rules slightly
and clean this up. The implausible breaks are generally due to the well-formedness
rule stating that there must be a section break when the groove changes, and an
extra rule stating that no breaks were allowed in contiguous sections of phrasing
only playing3. If we look at bars 72 to 76, we can see that the last bar of the
chorus has been grouped with the bridge, which seems structurally wrong. We
can conclude that the rules we have put in place are not quite correct, or should
at least be relaxed.
There are also several sections missing entirely from the music listening subsystem, as they would require a large scale investigation. No rules at all concern
themselves with inducing Section breaks. There is a reason for this - none are
apparent. Although it may well be clear to a listener what is a verse and what
is a chorus, it is not easy to formalise. Similarly, no work has been done on
classifying the role of Sections; there is information regarding what differentiates
one section from another (for instance, the montuno starts when the lead vocalist
starts improvising) but not enough to make a sufficient ruleset. The style of a
section is also only classified according to whether it is phrasing, normal playing
or tacet - some kind of stylistic analysis would need to be implemented.
7.2 Listening Tests
The listening tests were designed to test the generative subsystem of the timbalero; unfortunately, the analysis mechanisms could not be integrated in time, so the complete system could not be analysed. Appendix D is a copy of the questionnaire. Two groups of listeners were tested: the general public, and a set of
domain experts (comprising Cambiando, the salsa band in which I used to play
timbales, and my co-supervisor Manuel Contreras who plays congas in another
salsa band).
Two versions of Mi Tierra were recorded, one using the virtual timbalero and
a hand crafted representation, the other played by myself, using a MIDI drum kit
3This was added because these sections typically have high values for many rules, and would
otherwise be highly fragmented
                  Tested   Correct   Preferred Computer
Expert Listeners     6      83%       33%
General Public      10      60%       50%
Table 7.2: Results of Listening Tests
rearranged to approximate timbales. The use of a MIDI kit allowed completely
identical sounds to be used - a recording of timbales would be relatively easy
to distinguish from one composed of triggered sounds. The human playing was
quantized, and obvious mistakes were edited. The final recordings should hence
be a fair comparison of the two musicians, and should not give any obvious cues
as to which is which other than the actual music produced.
The timbales sounds were produced by recording a set of LP Tito Puente
Timbales, along with an appropriate selection of cowbells. Each possible timbale
sound was produced, then the recording was cleaned up and segmented into
individual files for each sound. The finished files were loaded into a Yamaha
A3000 sampler and mapped to notes corresponding to the MIDI notes sent out
by the electric drum kit. The rest of the sounds in the piece were produced using
the GM synthesiser module built into the drum kit. The level of the timbales was
set artificially high in the mix, to make it obvious exactly what they were doing
- the sound balance was designed to approximate that heard by the timbalero
while playing with a group. The final recordings were normalised, compressed
slightly to remove any possible dynamic inconsistencies and burned to CDs for
the tests. Each participant was given a copy of the questionnaire, a copy of the
CD and some means of listening to the CD (generally headphones) and asked
to read the instructions before listening to the music. Table 7.2 summarises the
results of the tests.
A χ2 test is as follows:
H1 The general public can identify the virtual timbalero more accurately than
random guessing.
H0 The general public can not do better than random.
We calculate a χ2 value of 0.4 (with 9 degrees of freedom), which is not significant at any level, so we retain H0 and conclude that the general public are unable to differentiate between mechanical and human playing. Unfortunately, there are explanations
other than the quality of the generated output for this: from speaking to the
subjects, it was clear that many of them were not quite sure what to look for,
and the general unfamiliarity with the genre made analysis difficult for them.
The domain experts were a different story; almost every expert tested was
able to discriminate between the recordings. We get a χ2 score of 2.67 (5 DOF),
which gives us support at the 95% level for the hypothesis that experts can tell
which is the human and which is the machine. Cited features that gave away the
virtual player include similarity of fills, sounding too polished and following the
marked phrasing too closely.
All subjects indicated a degree of uncertainty. Interestingly, the experts generally expressed more uncertainty than the general public, and the only person
who was certain which was which was also wrong.
From this we conclude that the generative system is of high quality. Many of
the criticisms could be solved by
• adding more ornamentation fragments to the library
• allowing more freedom over when to perform ornamentation
Some points, such as incoherent soloing, would require more work. It should
also be noted that exact timing information (groove or feel) was removed from
the human performance; this is one area where it is expected that the mechanical
player would have difficulties. However, no comments were made about a lack of
feel in either performance.
7.3 Representations
7.3.1 Structural Assumptions
There were several structural assumptions given in Section 3.2.3; we are interested
in how well they have held up:
Structural Assumption 1 There are high level sections of music with distinct
structural roles
This was derived from the structural description of salsa music. It has proved to
be useful in the creation of realistic playing, and informs much of the generative
system. Unfortunately, nothing has so far been implemented which can detect
and label the top level structures (Sections), so empirical support is somewhat
limited. It is by no means inconceivable that these structures could be reliably
extracted; the work presented here would provide a solid foundation, but more
work is needed.
Structural Assumption 2 The smallest structural unit in latin music is the
bar; phrases may be played which cross bars, or which take up less than a single
bar, but the structure is defined in terms of bars.
This was found to be generally true but, as with the problem of split-bar
chords, it is common for changes to happen just outside the first or last bar of
a particular Segment - a common example being a lead-in. The representation
needs some way to specify or allow for the blurring of boundaries here.
Structural Assumption 3 A bar contains one and only one chord
It was seen from the harmonic analysis that although there is generally one chord
per bar, the chord and the bar do not always share the same boundary - a common
example being when a new chord starts on the last beat of the current bar, rather
than the first beat of the next (see Figure 7.1, and Section 7.1.1.1 for a discussion).
Some possible alternatives are:
• Allow bars to contain several chords - possibly one per beat.
• Allow chords to occur on a continuous timeline, and not be structurally
contained within bars
• Allow the scope of a bar to become somewhat fuzzy, so that chordal changes
near the beginning or end of the bar are absorbed into the appropriate bar
Of these it is felt that the last possibility is most analogous to human
perception; looking again at Figure 7.1, it would more commonly be transcribed
as |Cm|Cm|Cm|G| than |Cm|Cm|Cm/G|G| or similar. It might even be said that
the chord does not change until the next bar, but the notes used anticipate the
change.
We conclude that the assumption is roughly correct, but that account needs
to be taken of this anticipation of changes.
Structural Assumption 5 A segment contains one and only one groove
This assumption is valid, but has been slightly overstretched. There is a need for
a distinction between the current groove and playing instructions. To illustrate,
many segments have phrasing in their final bar; for this to be played as phrasing
only, it needs to be classified as a new segment, even though it is still quite clearly
part of the previous segment. One possibility would be to allow specification of
playing directives at the bar level. This would allow segments where the general
groove was normal playing, but the last bar was to be played phrasing only. It would
also support segments which were entirely comprised of phrasing, and even allow
for a slightly different treatment of these.
7.4 Infrastructure
No problems were found with the JADE environment. Average runtimes were
calculated over three runs, with and without the Timbalero being created.4 The
averages are shown in Table 7.3.

              With Timbalero (s)   Without Timbalero (s)
              34.46                26.51
              35.09                26.65
              35.06                26.67
Mean          34.87                26.61
Std Dev       0.3554               0.0872

Table 7.3: Run times for Mi Tierra with and without the Timbalero (tune length is 273 seconds)
It can be seen that the infrastructure runs approximately ten times faster
than real-time, or alternatively, the infrastructure consumes 9.7% of the available
computing power in a real time situation. A run with the Timbalero playing takes
12.8% of the available CPU time. This indicates that real-time performance is
definitely feasible - especially considering that none of the code has been optimised
at all. The standard deviation of run times is small compared to the actual time;
although this does not directly imply that jitter will not be a problem, it gives
an indication that performance should be relatively dependable.
4The Timbalero was running the full output generation system and performing basic feature
extraction. The machine was a 1.6GHz Intel with 256MB of RAM.
Chapter 8
Future Work
8.1 Analysis
8.1.1 Chord Recognition
The chord recognition section is quite a key feature of the system, and has several
limitations. An improvement to the current algorithm woule be to use more musi-
cal knowledge in chord generation. For example, if two roots have similar scores,
but one root would give a rootless chord, then the other root would generally be
preferable (see the discussion of rootless chords in the previous section). Simi-
larly, if there is one root which would provide a known extension and one which
wouldn’t, then the first is preferable (e.g. prefer to classify a chord as Cm than
G1020010. While this should generally be inherent in the Parncutt algorithm, it
does not appear to be.
Another possibility is to use an alternative algorithm. [29] gives another chord
classifier based on simple lookup, which would probably be easier to code and
faster, and might prove to be more robust.
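To make the flavour of the lookup approach concrete, the following is a minimal sketch in which a bar's pitch-class weights are scored against a small table of chord templates. The templates, weights and naming scheme are illustrative assumptions, not the algorithm of [29]:

    import java.util.Map;

    // Sketch of a lookup-based chord classifier. A bar is summarised as
    // twelve pitch-class weights; each (root, template) pair is scored by
    // the average weight of the pitch classes it predicts.
    public class LookupChordClassifier {
        private static final String[] NAMES = {
            "C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"};
        private static final Map<String, int[]> TEMPLATES = Map.of(
            "",     new int[]{0, 4, 7},      // major triad
            "m",    new int[]{0, 3, 7},      // minor triad
            "dom7", new int[]{0, 4, 7, 10},  // dominant seventh
            "m7",   new int[]{0, 3, 7, 10}); // minor seventh

        public static String classify(double[] pcWeights) {
            String best = "none";
            double bestScore = 0.0;
            for (int root = 0; root < 12; root++) {
                for (Map.Entry<String, int[]> t : TEMPLATES.entrySet()) {
                    double score = 0.0;
                    for (int interval : t.getValue()) {
                        score += pcWeights[(root + interval) % 12];
                    }
                    score /= t.getValue().length; // don't favour larger templates
                    if (score > bestScore) {
                        bestScore = score;
                        best = NAMES[root] + t.getKey();
                    }
                }
            }
            return best;
        }
    }

The musical knowledge discussed above could then be added as tie-breaking rules over the top few (root, template) candidates, rather than simply taking the maximum.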
A problem with both of the algorithms as presented is that they do not appear
to be designed for continuous data; each measures only the presence or absence
of certain notes, which is not really appropriate for this kind of music. It is
not uncommon for melodies to have many passing notes which have little to do
with the underlying chord of a bar, and which should not necessarily be included
in analysis. It might also be appropriate to use alternative weighting vectors
designed for this style of music. It would be useful to have an algorithm which
ran continuously (rather than analysing discrete blocks of data) so that it could
specify chord boundaries, to aid with the problem of split chords.
Finally, at present the algorithm only looks at a defined set of instruments
(piano, bass and guitar). This has been chosen to fit the current set of examples.
It would be more useful if it had some way to both:
• decide which instruments in a tune were likely to be useful, possibly with
some order of preference
• select a subset of these based on who is playing at the moment and what
they are playing. This would allow the piano to be ignored while it is
soloing, and only the most significant instruments to be listened to when
many people are playing.
8.1.2 Pattern Analysis
There are two obvious methods for development here. Firstly, since the perfor-
mance of this algorithm is directly dependent on the output of the chord recog-
nition, improving the accuracy or robustness of the chord classifier will enhance
pattern finding ability. The second possibility is to enhance the chord pattern
algorithm, in a variety of ways:
• The use of a mismatch kernel, to allow for the odd misclassified chord.
• The addition of some domain knowledge specifying how similar different
chords are, so that a sequence could get a partial score from several similar
sequences. An advantage of this is that it would add robustness with respect
to the data provided by the chord recogniser: a misclassified chord is likely
to be similar to its true classification, so it would retain a high score for a
match (a sketch follows this list).
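A minimal sketch of the second idea follows; the similarity values and method names are illustrative assumptions:

    // Sketch of mismatch-tolerant chord-pattern matching: instead of an
    // exact string match, each bar contributes a graded similarity, so one
    // misclassified chord only dents the score for the pattern.
    public class ChordPatternScorer {
        // Crude illustrative similarity: identical labels score 1.0, chords
        // sharing a root (e.g. G misread for Gdom7) score 0.5, else 0.
        static double similarity(String a, String b) {
            if (a.equals(b)) return 1.0;
            return root(a).equals(root(b)) ? 0.5 : 0.0;
        }

        static String root(String chord) {
            boolean accidental = chord.length() > 1
                && (chord.charAt(1) == '#' || chord.charAt(1) == 'b');
            return chord.substring(0, accidental ? 2 : 1);
        }

        // Mean per-bar similarity of a heard sequence against a pattern.
        static double score(String[] pattern, String[] heard) {
            int n = Math.min(pattern.length, heard.length);
            double total = 0.0;
            for (int i = 0; i < n; i++) total += similarity(pattern[i], heard[i]);
            return total / n;
        }

        public static void main(String[] args) {
            String[] pattern = {"Cm", "Fm", "Gdom7", "Cm"};
            String[] heard = {"Cm", "Fm", "G", "Cm"}; // one near-miss
            System.out.println(score(pattern, heard)); // 0.875 rather than 0.75
        }
    }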
8.2 Generation
8.2.1 Ornament Selection
The ornament selection performed by the generative subsection is particularly
weak; Although it works reasonably well, much of it’s power comes from having
hand tuned snippets to work with, and the fact that drummers have a lot of
license over which fills are played when. The current random selection model is
clearly lacking in any kind of musical knowledge, but it would require a significant
amount of work to produce a good selection method. To treat ornament selection
properly using a similar approach to the rest of the project, we would need to:
• have some idea of which fills were appropriate for a particular piece
• make more strategic use of ornaments - using especially large fills for big
changes etc
• tailor existing ornamentation fragments to fit the particular usage
• be able to maintain some kind of thematic continuity between ornaments,
while not using the same ones all the time
Also, the ornament fragments need to be expertly tuned to produce the correct
sound; an ornamentation system with some musical knowledge should be able to
take care of at least some of this.
These are clearly large goals, and could easily take up a project on their own.
A quick boost to realism could be given by:
• At present, ornamentation is created from strings, and in the creation pro-
cess a small amount of noise is added to the velocities of all the notes, to
simulate human imperfections. Unfortunately, this is done only when the
ornament is created, so every subsequent use is completely identical.
Adding a little variation at each use would probably help (see the sketch
after this list).
• Using the indications of ornament strength to guide the choice of ornaments.
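The per-use variation mentioned in the first point might look something like the following; this is a sketch only, and the class and method names are hypothetical:

    import java.util.Random;

    // Sketch: re-jitter an ornament's velocities at every use, instead of
    // once at creation, so repeated uses of a fragment are never identical.
    public class OrnamentHumaniser {
        private static final Random RNG = new Random();

        static int[] freshVelocities(int[] template, int spread) {
            int[] out = new int[template.length];
            for (int i = 0; i < template.length; i++) {
                int v = template[i] + RNG.nextInt(2 * spread + 1) - spread;
                out[i] = Math.max(1, Math.min(127, v)); // clamp to MIDI range
            }
            return out;
        }
    }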
8.2.2 Groove and Feel
Current output is completely quantized; each note is exactly on its chosen di-
vision. Latin music is famous for its feel more than anything else, so quantized
output is likely to be strongly sub-optimal. Two possible techniques to improve
the feel are:
• Creating a set of groove templates which describe offsets to be applied to
the placement of notes on each division in the bar. This could be done by
speaking to expert timbaleros, and analysing their playing.
• Analysing the playing of other musicians, to see where their notes are rela-
tive to the nominal pulse, and using the average displacements as a groove
template.
It would be possible to combine the two, so that the output could be smoothly
varied between using the predefined templates and the dynamic templates. Both
techniques are useful, because while dynamic templates allow the timbalero to
respond to the playing of the other musicians, a timbalero may well not always
place their notes the same as the other musicians.
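A sketch of how such templates might be represented and blended follows; all names and values are illustrative assumptions:

    // Sketch of a groove template: offsets[d] is the displacement (in beats)
    // applied to notes falling on division d of the bar.
    public class GrooveTemplate {
        private final double[] offsets;     // one entry per division of the bar
        private final int divisionsPerBar;

        GrooveTemplate(double[] offsets) {
            this.offsets = offsets;
            this.divisionsPerBar = offsets.length;
        }

        // Move a quantized onset (in beats from the start of the bar) off
        // the grid by the offset defined for its division.
        double apply(double quantizedOnset, double beatsPerBar) {
            double divisionLength = beatsPerBar / divisionsPerBar;
            int division = (int) Math.round(quantizedOnset / divisionLength)
                % divisionsPerBar;
            return quantizedOnset + offsets[division];
        }

        // Blend a hand-built template with one measured from the other
        // musicians, as suggested above (w = 0 is predefined, w = 1 dynamic).
        static GrooveTemplate blend(GrooveTemplate fixed, GrooveTemplate dynamic,
                                    double w) {
            double[] mixed = new double[fixed.offsets.length];
            for (int i = 0; i < mixed.length; i++) {
                mixed[i] = (1 - w) * fixed.offsets[i] + w * dynamic.offsets[i];
            }
            return new GrooveTemplate(mixed);
        }
    }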
8.2.3 Soloing
The soloing algorithm as implemented uses no musical knowledge at all. It will
play a random fill for about three out of four bars, and for every other bar it will
play the basic rhythm with some chatter. A good drum solo (like most good
improvised instrumental solos) should set out some kind of theme and explore it,
or at the very least be in some way musically coherent. As with the ornamentation
above, this could easily become a complete project, and similar future possibilities
apply. This is one area where the use of a probabilistic grammar would seem
highly appropriate [3].
8.3 Representations
Representations need to become more flexible. Some milestones in order of in-
creasing freedom from specification are:
• Allowing for the conductor to call for repeats of certain sections
• Having sections which can repeat indefinitely, such as solos1
• Being able to play a tune where the order of sections needs to be learnt, or
is completely fluid.
Although all of these depend heavily on other parts of the system than the rep-
resentational sections, the representations used would need to be able to support
certain operations and structures before they become possible.
The need for chord boundaries to become detached from bar boundaries has
been discussed previously. There is a case for extending this relaxation to other
features; consider the case of a one-and-a-half beat lead-in to a new section.
It would make sense to consider the lead-in as being part of the section to which
it leads. At present, there are only two choices: either the lead-in is part of
the previous section (which is the route taken by the hand analyses), or the last
bar of the previous section is absorbed by the new section (which does not seem
appropriate). Allowing segment boundaries to be placed at any point within
the bar would go some way towards solving this, but it is still not a perfect
representation of the structure. A more accurate breakdown would be to have
the boundary on the bar line, but allow the lead-in to be considered part of
the section it leads into. This could be done by allowing different boundaries for
each instrument: no instrument would be allowed to be playing more than one
section at once, but for small periods instruments could be playing different sections.
This would have the disadvantage of making representations more complex, but
analysing each individual instrument for boundaries would allow a treatment
closer to GTTM [18].
1where the backing repeats until some event takes place; it could be a nod from the soloist,
it could be a special phrase that they play, or it could simply be a general consensus among
the rest of the band
8.4 Agent Environment
There are a few issues with the agent environment which were not a problem with
this project, but would need to be addressed for the project to be scaled up:
• The messages passed round currently include serialised Java objects. This
is clearly poor from an interoperability standpoint, and should not be nec-
essary. It should be trivially possible to alter the collated messages to be
sent contained in a jMusic Score, rather than a hash of Parts. Similarly,
identities should be sent in some open format.
• With a few tweaks, the system could be made to work in some kind of real
time, although it is not quite clear what this would be. For this to work well,
there would have to be a mechanism in place to change the "chunk size" of
messages passed round: since a musician's chunk only becomes available at
the end of a chunk period, there is effectively a two-chunk latency before the
musicians can react (see Figure 8.1). For example, with four-bar chunks at
180 bpm, a chunk lasts just over five seconds, so the reaction latency would
be over ten seconds.
The communications protocol as a whole is slightly inelegant; it should be
possible for musicians to come and go as they please, and provision should be
made for musicians which do not produce output. In general, a lot more flexibility
would need to be built in if the system is to perform in real time; it might be
necessary for each musician to construct parts in several passes, so that they
always have something available in time. Similarly, they would have to become
adept at working with incomplete information - for example, if a link on the
network is dropped or becomes congested and the other musicians' parts are not
available in time.
8.5 Long Term Improvements
The main direction of improvement should be towards more flexibility, generality
and robustness. At present there are many hard coded parameters, which should
be dynamically determinable for a given piece. Many of the rules are vague
Figure 8.1: Chunk Latency for the Agent System [timeline: a musician starts a chunk; the first chunk becomes available at the end of that period; the agents then output their next chunk, which is the earliest output able to react to the first chunk - so the time to react to human playing is two chunk periods]
heuristics, which should be thoroughly researched and optimised. There is a
definite need for more reaction to what the other musicians are playing - at
present, only quite high level features are analysed. It would be quite possible to
have two very different tunes which had almost identical representations, which
indicates that there is more about a tune which could be captured.
There is also the possibility of expansion to both other percussion instruments
and other styles. Due to the modular design of the generative section, all of the
rules which govern rhythm and ornament selection are in the Style class. This
means that a lot of variation is possible simply by adding new styles. To take the
part of another percussionist in the same style would only require changing the
rhythmic fragments, and possibly altering the rhythm selection rules - an entirely
feasible task.
To play percussion in a different style would require that representations be
built up for that particular style. On the generative side, this would only require
that a Style was created with rules relevant to the particular style. It is easy to
imagine extension to latin jazz, and even funk or rock, so long as
• adequate representations can be built up
• playing can be broken down into selection of a basic rhythm followed by
addition of ornamentation
• templates are available for the various rhythms and ornaments
For this it would be useful if all of the necessary domain knowledge could be
encoded in a single place; at present, knowledge used to generate output is stored
in the Style class, but the knowledge used to build representations is elsewhere.
It would be useful if the Style class could absorb this knowledge too - for
example, it could define structural sections of music by specifying a set of
rules (selected from a common pool) which indicate that a particular part of the
piece plays that role.
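One possible shape for such a consolidated style class is sketched below; all names are hypothetical, and the actual Style class is not reproduced here:

    import java.util.List;

    // Sketch of gathering all style-specific knowledge behind one interface,
    // so that supporting a new style means writing a single class.
    public interface ConsolidatedStyle {
        // Generative knowledge, as currently held by the Style class.
        String selectBasicRhythm(String sectionRole);
        List<String> availableOrnaments(String sectionRole);

        // Analytic knowledge, currently held elsewhere: rules drawn from a
        // common pool which recognise the structural role of a passage.
        List<StructuralRule> structuralRules();

        interface StructuralRule {
            // True if bars [start, end) of the piece appear to play the given
            // role (e.g. "Montuno"); BarFeatures stands in for the feature
            // vectors computed by the listening system.
            boolean matches(List<BarFeatures> bars, int start, int end, String role);
        }

        interface BarFeatures { } // placeholder for per-bar analysis results
    }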
It would also be interesting to add automatic style selection, so that the agent
could hear a piece and determine what style to play in. This would be a large
step towards a generalised drummer. Ultimately, it would be desirable to add
learning capabilities, so that it could be "taught" new styles, and develop its
own rules for determining the structures of novel pieces.
Chapter 9
Conclusions
We will now summarise our findings about how well we have met the design aims,
and offer some final thoughts for the future.
• High level representations sufficient to adequately inform the playing of a
timbalero
The representations were found to be generally usable, and can be extended
without much difficulty. There were several cases where the representation
is essentially accurate, but needs to be slightly more relaxed1
• Generative methods capable of creating realistic timbale playing
The final output produced was able to fool the general public, but domain
experts were able to distinguish it from human playing (albeit without much
confidence). It is also expected that with a little more work, it could provide
much higher quality output, as most of the features which listeners have
used to discriminate between human and machine playing can be relatively
easily compensated for. The work provides a solid foundation on which to
build more involved systems which can deal with aspects such as groove (or
feel) and musical continuity.
1e.g. the assertion that there is one chord per bar, which is generally true, but the chord
boundaries and the bar boundaries do not always line up perfectly
• Algorithms which are capable of generating the necessary high level repre-
sentations
The basic features of music are well extracted. More complex features (such
as chords and key) are extracted, but could benefit from more work. Musical
parallelism needs to be more fully investigated, and there are many cases
where the reasoning needs to be more forgiving and musically sensitive.
Some level of structure is discerned, which is close to the desired result in
many places, but it is not a complete and usable technique.
• Construction of an Agent based environment for musical processes
The agent environment seems to be robust, fast enough and can deal with
an acceptable number of musicians, although it has not been put to exten-
sive stress testing. The information encodings used convey all necessary
information adequately, but should be made less platform specific. It ap-
pears that the system is also fast enough to work in real-time, although
some work would need to be done to ensure responsiveness.
Overall, the project’s aim - to create an agent which can produce high quality
timbale accompaniment to salsa music - has been well met. We believe that it is
an extensible platform, and could be adapted to other instruments, other styles
and real-time operation.
Appendix A
Musical Background
A.1 History and Use of the Timbales
Much of the historical information in this section is paraphrased from {TODO:
add ref to Changuito}, and presented in a highly condensed form.
Timbales are commonly thought to be descendants of European timpani
(which are sometimes called timbal in Spanish). Timbales are also sometimes
called pailas Cubanas; the paila is a hemispherical metal shell used in sugar cane
factories, and formed the body of the first Cuban timpani. In the early part of
the twentieth century, large timbales became unfeasible (for economic reasons)
and smaller versions were developed, which eventually came to be wooden, and
mounted on a single tripod between the player's legs. It is not quite clear how
the modern form developed from here, but it is suspected that this has something
to do with the influence of jazz music and the more standard drum kit setup.
Timbales as we see them today consist of two cylindrical shells of metal, each
with a single skin (single skinned pitched membranophones in the Hornbostel-
Sachs scheme). They are mounted on a stand, with an assortment of bells and
woodblocks on a post in the middle, and there is the possibility of adding cymbals
and a kick drum (see Figure A.1). Each instrument has its own characteristic
sounds and role:

Figure A.1: Example Timbales setup (overhead view) [overhead diagram: hembra and macho drums, crash and ride cymbals, mambo bell, block and chachacha bell]

Cascara Cascara (which means shell in Spanish) is produced by striking the
metal shells of the drum with the stick. This forms the basis of many of the
rhythms. Often the right hand will play cascara while the left hand plays
another rhythm, but sometimes both hands play cascara, in which case it
is termed doble pailas. Here the right hand will play its standard pattern,
and the left hand will play in all the gaps left by the right.
Macho The macho is the smaller of the two drums, and is considered to represent
the male role in playing1. It is played with the sticks, and can be played
open or as a rim shot, where the rim of the drum and the skin are struck
simultaneously to give a very piercing tone.
Hembra The hembra is the larger drum, and is often tuned either a fourth or
a fifth lower than the macho. As well as being played with the sticks for
fills and accents, it is often played with the left hand as part of the basic
rhythm. There are two sounds made with the hand - an open tone where
the fingers stroke the skin and bounce off, and a closed tone where the
fingers remain on the skin and mute the tone.
Block Traditionally, blocks were made out of wood, but nowadays they can be
made out of acoustic plastic for a louder sound. They produce a single sound
when struck, and are often used to play the clave pattern.
Mambo Bell The larger of the two cowbells, the mambo bell is used in the
loud sections of pieces to create a powerful, driving rhythm. There are two
sounds, one produced by striking the body of the bell, and one by striking
across the mouth of the bell.
Chachacha Bell The chachacha bell is used for a lighter sound, and can make
two sounds in a similar manner to the mambo bell. The two bells can be
used together to play highly intricate rhythms.
Crash Cymbal Crash cymbals are used to add powerful accents to music. The
stick strikes the edge of the cymbal to produce a loud crash sound.
1many drums of African origin are sexualised. In this case, the bright forceful tones of the
macho make it seem more male, compared to the deep mellow tones of the hembra
Figure A.2: Scoring Timbale Sounds [notation key covering cascara, block, macho (open, rim), hembra (open, sobado, muted), mambo bell (body, mouth) and chachacha bell (body, mouth)]
Figure A.3: Standard Son Claves [the 3-2 and 2-3 clave patterns, notated in 4/4]
Ride Cymbal More often used in latin jazz than salsa, ride cymbals are struck
on the bell, or with the tip of the stick on the surface. They create a
sound with a short, dynamic attack and a long sustain, that is often used
to provide a rhythmic framework, similar to the cascara. Some cymbals are
produced which can be used both as a crash and a ride.
Figure A.2 is a musical score showing the various sounds that the timbalero
can play. Figure A.3 shows the standard Son clave in 2-3 and 3-2 versions, and
Figure A.4 shows the basic Cascara pattern in 2-3 time.
Figure A.4: Basic Cascara pattern, with a backbeat on the hembra [notated in 4/4 against the clave]
A.2 The Structure of Salsa Music
The knowledge in this section comes from a knowledge elicitation interview, de-
tailed in Section A.4.
There are many types of latin music. The first broad distinction is between
salsa and latin jazz, where latin jazz is a more modern style which disregards
the traditional instrumentation of Cuban music. We are going to concentrate
on Salsa music, although it can have many different stylistic variations (rumba,
danzon, bomba, mambo etc).
A typical piece of salsa music will be in a son montuno style. This is a
combination of the traditional son style with the more modern montuno sections.
Montuno The montuno2 is the high point of almost all salsa tunes. It is a
repeated section with an improvised lead vocal doing “call and response”
against a repeated phrase sung by the coros. Playing is generally upbeat,
but with a solid groove. The coros may start their repeated phrases before
the start of the montuno proper - the montuno is considered to start when
the lead vocals begins improvising. Once a piece hits the montuno, it will
generally stay at that level, with the possible exception of a short coda at
the end. More modern pieces tend to reach the montuno level earlier, and
2montuno can also be used to refer to repeated figures, generally played by the piano. This
would be referred to as e.g. a ”2-bar piano montuno” to keep it distinct from the usage as a
section of the piece
stay there longer for a more upbeat dynamic overall.
Son The section of the tune before the montuno will be in the more traditional
son style - hence the name son montuno. There are a variety of structural
forms used here; this is where the verses of the song appear, and there is
often an alternation of sections, but it is common for this to have quite an
intricate or unclear structure.
Mambo Mambos are similar to the montuno section, but replace the improvised
vocals with coordinated brass phrases. The feel is still upbeat, but there is
a lot more freedom for improvisation, ornamentation and “looseness”.
Intro Many songs have an instrumental introduction before any vocals come in.
Coda Some songs also have a coda at the end when the montuno has finished.
This commonly either contains a lot of phrasing for a punchy, upbeat end-
ing, or is a re-iteration of the introduction.
Solos The piano is by far the commonest soloing instrument in salsa. These
solos are backed by a lower level of playing from the rest of the band than
the montuno.
So a typical salsa tune might go: Intro, Son-A, Son-B, Son-A, Son-B, Montuno
(variable length), Mambos, Montuno, Coda.
A.3 The Role of the Timbalero
There are fairly standard combinations of parts which the timbalero would play
in most of these sections. As with most latin music (and most music in general)
none of the rules are hard and fast, but they do represent a general trend.
Montuno The right hand plays on the mambo bell. The left hand can play the
hembra on 2 and 4, the clave or some part of the campana pattern.
Verses Cascara in the right hand. The left hand can play clave on the block
if no-one else is playing the clave, or can fill in the gaps in the right for a
doble pailas pattern.
Mambo As montuno, but with more fills.
Intro The intro is typically instrumental, and of a low intensity. The timbale
player will often tacet, but may play clave, or a gentle cascara depending
on the piece.
Coda If the coda is a repeat of the intro, then the coda is often played as cascara.
If the piece is ending on a high note, then the coda will be played as the
preceding section only more so.
A.4 Knowledge Elicitation
An interview was conducted with Hamish Orr, who is a latin percussion teacher
living in London. He was selected as a domain expert due to his experience
both as a teacher and a performer of the style in question. The interview was
conducted telephonically, and based on the “laddered grid” methodology. The
aim of the interview was to establish whether there were high level structures
common to salsa music, what they were, and how a timbalero would behave
while playing them. Initial questions were asked to determine whether there
were such structures, and the expert was quickly able to specify several. Follow
up questions were used to determine the similarities and differences with a view
to automatic recognition. Finally, questions were asked to try and determine a
methodology for basic rhythm selection.
Appendix B
MIDI Details
The MIDI standard is an important part of this project, so we present a more
thorough discussion. Further information can be found at [20], [22] and [21].
Each MIDI device is referred to as a port, and each port has sixteen channels.
A channel refers to a particular instrument on the target synthesiser. The device
sending the MIDI information does not, however, know what kind of sound the
synthesiser will produce - it only knows that it has asked for, say, instrument
31 - though there are some standard mappings of instrument numbers to instruments.
There are two substrates on which MIDI exists: streams and files.
B.1 MIDI Streams
MIDI streams were originally transmitted over 31.25 kbaud serial links - this was
the original reason for MIDI: to allow keyboard players control over several
synthesisers. (It replaced control voltage (CV), where a single analogue voltage
controlled the frequency of the oscillator in a synthesiser, and was limited to a
single note of polyphony.) MIDI gave the ability to control polyphonic synthesisers
(ones capable of playing more than one note at once), as well as giving more
control over different aspects of the sound (the force with which the keys on the
keyboard were struck, and various parameters of the sound). MIDI
messages are sequences of bytes, with the first byte being a ”status byte” which
determines the type of the message. There are two types of MIDI messages seen
in streams:
Short Messages Short messages are the bread and butter of MIDI, the two
most important being Note On and Note Off messages. All short messages
are three bytes long - one status byte and two data bytes. In the case of Note
On and Off messages, the status byte indicates the type of message and the
channel to which it refers, the first data byte gives the note number, and the
second byte gives the "velocity" (how hard the key has been struck). Once
a synthesiser receives a Note On message, it will immediately start sounding
the note in question, and continue until it receives a Note Off for the same
note. The other common short messages are controller messages, which can
be used to tell the synthesiser to select a different instrument to play, or to
alter parameters of the currently active instrument.
Sysex System Exclusive messages are used to add vendor specific extensions
to the protocol. These generally give very fine control of all aspects of a
synthesiser’s functionality - most of what a user can accomplish by using
the front panel of the instrument can be done via Sysex. Sysex messages
can be any length - the length is specified by the message itself.
B.2 MIDI Files
MIDI files are used by sequencers to store MIDI data. These contain a sequence
of MIDI messages, with appropriate timestamps (these are referred to as MIDI
Events, indicating a message and a time). There is a further type of MIDI mes-
sage found in MIDI files - meta messages give information about the sequence
contained in the file, such as tempo, time and key signatures, and textual infor-
mation describing individual tracks or the file as a whole. These meta messages
consist of a status byte of FF followed by a byte indicating the type of message,
and a variable number of bytes relevant to the message itself.
The timestamps of MIDI events are given in "ticks". A sequencer has an
internal resolution, measured in Pulses Per Quarter Note1 (PPQN or PPQ), and
each tick represents one of these pulses.
1There are alternative timing specifications involving SMPTE, but they are not discussed here.
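Converting ticks to wall-clock time at a given tempo is then simple arithmetic; a minimal sketch:

    public class TickTiming {
        // A quarter note lasts 60/bpm seconds and spans ppqn ticks,
        // so one tick lasts 60 / (bpm * ppqn) seconds.
        static double ticksToSeconds(long ticks, int ppqn, double bpm) {
            return ticks * 60.0 / (bpm * ppqn);
        }

        public static void main(String[] args) {
            // 480 ticks at 96 PPQN and 120 bpm = 5 quarter notes = 2.5s
            System.out.println(ticksToSeconds(480, 96, 120.0));
        }
    }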
There are two types of standard MIDI Files (SMF): Type 0 and Type 1 (SMF-
0 and SMF-1). SMF-0 files simply contain a list of MIDI Events, each of which
is responsible for defining its output channel. SMF-1 files divide the data into
"tracks". These are separate streams of events, which would typically be sent to
different instruments, each of which has a name and a channel (as well as other
parameters). There is no hard mapping from tracks to instruments, though, as
many tracks can be set to use the same channel. All MIDI files considered in this
project are SMF-1, as this is both the most prevalent and the most useful standard.
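As a concrete illustration, an SMF can be opened and its tracks inspected with the standard javax.sound.midi API (a minimal sketch, separate from the jMusic handling used in this project):

    import javax.sound.midi.*;
    import java.io.File;

    // Open a standard MIDI file and report its resolution and per-track
    // event counts (an SMF-0 file yields a single track).
    public class SmfInfo {
        public static void main(String[] args) throws Exception {
            Sequence seq = MidiSystem.getSequence(new File(args[0]));
            System.out.println("Resolution (PPQN): " + seq.getResolution());
            Track[] tracks = seq.getTracks();
            for (int i = 0; i < tracks.length; i++) {
                System.out.println("Track " + i + ": " + tracks[i].size() + " events");
            }
        }
    }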
Appendix C
jMusic
C.1 Overview
After construction of the original low level classes, an Open Source library,
jMusic, was found1 which could provide the required functionality, so development
was switched to these libraries. Among the benefits were:
• XML Serialisation support
• Display in Common Practice Notation.
• Many constants, for note durations, note names, GM specs etc.
• Support for reading and writing MIDI files
jMusic uses Scores, Parts, Phrases and Notes to represent music (see Figure
C.1 for a graphical explanation).

Figure C.1: jMusic Part, Phrase and Note Structure [diagram showing the start time, duration, rhythmic value and pitch of Notes within staccato and legato Phrases inside a Part]
Note The smallest unit is a Note, which has
• pitch
• velocity
• rhythm value
• duration
Rests are indicated by Notes with a particular pitch value (jmusic.JMC.REST).
1http://jmusic.ci.qut.edu.au/
Phrase A Phrase is a list of Notes, with each Note added occurring after the
previous note has finished - that is, a note’s onset time is determined by
the sum of the rhythmic values of the previous notes in the phrase. The
duration of Notes allows construction of legato or staccato passages (a series
of notes with smaller durations than rhythmic values will be staccato - see
Figure C.1). Phrases have optional start times (relative to the start of the
Part which contains them). A Phrase containing a musical phrase which
started on the second beat of the piece could be represented either by a
Phrase with a start time of 1.0 (start times start from 0.0) or by a Phrase
where the first note is a one beat rest (and sometimes this distinction is
important).
CPhrases A CPhrase collects Phrases together, and allows them to overlap.
They are not used in this project.
Part A Part represents everything that a musician would play in a piece. It is a
collection of Phrases, along with useful information such as
• a title
• MIDI channel and instrument number
• time signature
• key signature
Score A Score contains the playing of all musicians for an entire piece. It also
has time and key signatures, a tempo and a title.
The limitations here are that Notes depend on the previous notes for their
timing - to move a note backwards in time, one has to shorten previous notes
and lengthen the note in question. A possible workaround for this is to construct
a CPhrase made of several single note Phrases, the beginning points of which
can then be set individually. Fortunately, this was not necessary for this project.
Similarly, to play a chord, one has to either create a second Phrase, or add Notes
with a rhythmic value of zero (but the correct duration) so that they all start
at the same time. jMusic Phrases have basic support for this, but it has been
tweaked for flexibility.
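A minimal sketch of these structures in use, assuming the jMusic API of the time (pitch 60 is middle C; the chord idiom is the zero-rhythm-value trick described above):

    import jm.music.data.Note;
    import jm.music.data.Part;
    import jm.music.data.Phrase;
    import jm.music.data.Score;
    import jm.util.Write;

    public class JMusicSketch {
        public static void main(String[] args) {
            Phrase phrase = new Phrase(0.0);       // start time 0.0
            phrase.addNote(new Note(60, 1.0));     // C, one beat
            phrase.addNote(new Note(63, 1.0));     // Eb, one beat
            // A chord: leading notes get rhythm value 0 so that all three
            // start together; the last note carries the real rhythm value.
            phrase.addNote(new Note(55, 0.0));     // G
            phrase.addNote(new Note(60, 0.0));     // C
            phrase.addNote(new Note(63, 2.0));     // Eb, two beats
            Part part = new Part("Piano", 0, 0);   // title, instrument, channel
            part.addPhrase(phrase);
            Score score = new Score(part);
            Write.midi(score, "sketch.mid");       // write as a standard MIDI file
        }
    }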
C.2 Alterations
A few changes were needed to the stock jMusic distribution in order to make full
use of it. These are documented here.
XML Serialisation While jMusic has support for XML serialisation, this is
currently only applicable to complete scores. A wrapper class was added to
expose the necessary methods. The jMusic XML parser is very brittle, and
will only read its own XML.2
2There are several undocumented constraints, including: no whitespace at all is allowed between
elements, and attributes must be in the correct order. So while it produces valid XML, it cannot
read all valid XML.
Track Names jMusic as it stands cannot extract the track names from MIDI
files. This functionality has been added.
Better Chord Support Adding chords to Phrases in jMusic is an inelegant
operation - all except one of the notes are added with a rhythm value of
0, and they all have a default dynamic. A dynamic for each note is now
supported, and the algorithm no longer adds unnecessary rests.
C.3 jMusic Issues
Towards the end of development, a bug was discovered in jMusic which meant
that MIDI files containing certain messages3 were dealt with incorrectly; the
timing of parts was drastically altered, rendering the music unrecognisable. The
MidiFileAnalyser class was repurposed to strip all offending content from MIDI
files, to provide jMusic-safe versions.
3specifically pitch bend
Appendix D
Listening Assessment Test
As part of my MSc, I have been creating a musical agent which can play along with pre-recorded MIDI files in the way that a real player would. Specifically, it is being a timbalero in a salsa band.

The purpose of this assessment is to determine whether the virtual timbalero is distinguishable from a human player, and also to determine which is preferable.

The enclosed CD contains three tracks:

1 A brief demo of the sounds and some of the rhythms which the timbalero will play

2,3 Two versions of "Mi Tierra" by Gloria Estefan. One of these is played by the virtual timbalero, and one by myself.

The recordings are made from a timbalero's point of view - that is, the timbales parts are louder than normal, to make it easier to hear what is happening. The live version was played on an electric drum kit (so that the same sounds are used); it has been quantized and obvious mistakes have been fixed.

You will find the questionnaire on the back of this sheet - make sure you read it before listening so you know what to listen for. Please fill in the form before discussing this with anyone. Any thoughts that come out of a group discussion should be added in the comments section.

Finally, don't spare my feelings! If you think that the virtual player sounds better than me, then that makes me just as happy as hearing that my playing is OK!

Thanks very much for your time,
Dave
Your Name:

Which version is played by a computer?    first / second

How sure are you?    certain / very sure / quite sure / not very sure / totally unsure

What makes you think this?

Which playing do you prefer?

Comments:
Appendix E
Example Output
Table E.1 shows the annotation of Mi Tierra produced by the timbalero.
The column headings are as follows:
Bar The bar number. (Note: bar numbers in actual output are zero indexed. Here
they have been converted to start at 1 to match musical convention. All references
in the text are to 1 indexed bar numbers)
Chord Context Free classification of the current chord
Key The current key
RKey The current key, calculated in reverse bar order
ContChord A contextual classification of the current chord
δi/i Normalised change in instrumentation, calculated by adding up the absolute
differences in activity of all instruments, and dividing by the average of the
activity levels for this bar and the previous bar.
i Instrumentation level - the sum of the durations of all the notes played by everyone
in this bar.
p Number of people playing (who have an activity level of more than 0.5)
δP The change in players (+1 for each instrument which enters and each instrument
which drops out)
Phrasing The results of the phrasing analysis; the scores from the two methods
are given. If preceded by a '+', the bar is considered PhrasingOnly; a '.'
means that the bar is considered to have phrasing.
ChPat Results of the chord pattern feature
Pref The final score for segment preference
WF Any indications of well formedness; ⊕ means there must be a break, 5 means
there must not be a break.
Break ⊗ means that there was a break.
For comparison to the hand analysis, Segment breaks are shown by single horizontal
rules, and Sections have the section name and role in a box at the top.
Some points to note:
• The output is from a run where the Timbalero did not play; the timbalero’s
playing can confuse some of the features (notably the phrasing analysis)
Bar Chord Key RKey ContChord δi/i i p δP Phrasing ChPat Pref WF Break

Intro (Son)
1 none C nu none 1.0 0.52 1 1 +0.0 0.0 5.49 1.0 ⊕
2 none C C none 0.13 0.69 1 0 +0.0 0.0 5.42 0.66 5
3 none C C none 0.10 0.86 1 0 +0.0 0.0 5.22 0.62 5
4 none C C none 0.03 0.92 1 0 +0.0 0.0 4.92 0.53 5
5 none C C none 0.10 0.75 1 0 +0.0 0.0 4.53 0.59 5
6 none C C none 0.09 0.91 1 0 +0.0 0.0 4.08 0.60 5
7 none C Cm none 0.13 0.69 1 0 +0.0 0.0 3.58 0.62 5
8 none C Cm none 0.61 2.94 2 1 0.37 0.37 3.04 2.12 ⊕ ⊗
9 Cm C Cm Cm 0.64 11.7 4 2 0.57 0.43 2.47 1.99 ⊗
10 Cm Fm Cm Cm 0.13 9.64 4 0 0.37 0.56 1.83 0.58
11 Cm Fm Cm Cm 0.10 11.9 4 0 0.42 0.31 1.19 0.62
12 Cm Cm Cm Cm 0.02 12.4 4 0 0.25 0.31 0.56 0.51
13 Cm Cm Cm Cm 0.21 17.9 4 0 0.37 0.31 0.0 0.22
14 Cm Cm F Cm 0.26 11.1 4 0 0.37 0.5 0.0 0.18
15 Cm Cm F Cm 0.08 12.4 4 0 0.42 0.37 0.0 0.05
16 Ab7 Cm F C1-120000 0.31 10.6 5 1 +1.0 0.8 0.0 0.07 ⊕ ⊗
Bridge (Son)
17 none Fm F none 1.0 2.34 1 6 +0.0 0.0 2.82 1.38 5
18 Cdom7 Fm F Cdom7 0.93 8.66 3 4 +1.0 1.0 2.15 1.84 5
19 none Fm F none 0.93 2.22 1 4 +0.0 0.0 1.48 0.87 5
20 Cdom7 Fm Cm Cdom7 0.91 9.07 3 4 +1.0 1.0 0.82 2.03 5
Verse (Son)
21 Cm Fm Cm Cm 0.51 9.89 4 3 0.28 0.43 0.28 0.34 ⊕ ⊗
22 Cm Fm Cm Cm 0.07 10.7 3 1 0.25 0.25 0.38 1.04 ⊗
23 Cm Fm Cm Cm 0.08 12.1 4 1 0.25 0.25 0.26 0.56
24 Gsus4 Fm Cm F0010001 0.06 10.7 3 1 0.28 0.56 0.21 0.55
25 G Fm Cm G 0.27 11.1 5 2 0.28 0.5 0.13 0.51
26 Gdom7 Cm Cm C0020010 0.20 10.9 5 0 0.28 0.43 0.04 0.01
27 Fdom7 Cm Cm C1-110010 0.20 12.3 4 1 0.57 0.43 0.09 0.76
28 Cm Cm Cm Cm 0.22 14.3 5 1 0.25 0.29 0.05 0.28
29 Gsus4 Cm Cm C7 0.28 11.7 4 1 0.42 0.37 0.03 0.28
30 Gdom7 Cm Cm Gdom7 0.23 13.5 5 1 0.28 0.35 0.0 0.07
31 Fdom7 Cm Cm C1-110010 0.27 12.2 4 1 0.57 0.56 0.0 0.04
32 Fm Cm Cm C1020010 0.30 14.5 5 1 0.66 0.35 0.09 0.39
33 Fdom7 Cm Cm C1-110010 0.32 13.6 4 1 0.57 0.56 0.12 1.02 ⊗
34 Cm Cm Cm Cm 0.31 13.2 5 1 0.5 0.45 0.07 0.01
35 G101-1010 Cm Cm C1010010 0.32 11.0 4 1 0.28 0.43 0.08 0.78
36 Gdom7 Cm Cm Gdom7 0.38 13.0 5 1 0.25 0.5 0.0 0.09
37 F6 Cm Cm C1010010 0.27 11.6 4 1 0.22 0.25 0.0 0.05
38 Cm Cm Cm Cm 0.36 19.9 5 1 0.28 0.5 0.0 0.35
39 Eb0110101 Cm Cm C101-1010 0.68 9.16 5 4 .0.8 0.7 0.0 0.27
40 Bb100-100 Cm Cm C002-1000 0.19 12.7 5 2 +1.0 0.75 0.09 0.49 ⊕ ⊗
Bar Chord Key RKey ContChord δi/i i p δP Phrasing ChPat Pref WF Break

Chorus (Son)
41 Cm Cm Cm Cm 0.35 19.2 6 3 0.62 0.29 0.40 1.25 ⊕ ⊗
42 G1020010 Cm C Cm 0.11 15.3 6 0 0.42 0.37 0.38 0.60
43 Gdom7 Cm Cm Gdom7 0.13 19.6 6 0 0.62 0.37 0.34 0.63
44 Gdom7 Cm Cm Gdom7 0.11 15.8 6 0 0.57 0.5 0.22 0.59
45 G110-1010 Cm Cm C1011010 0.12 20.4 6 0 0.71 0.45 0.05 0.14
46 Gdom7 Cm Cm Gdom7 0.18 14.1 6 0 0.42 0.33 0.03 0.15
47 G1020010 Cm Cm Cm 0.18 20.4 6 0 0.5 0.25 0.13 0.82
48 Cm Cm Cm Cm 0.31 17.9 7 1 0.42 0.5 0.30 0.66
49 Cm Cm Cm Cm 0.19 23.7 7 0 0.62 0.32 0.40 1.16 ⊗
50 G110-1010 Cm Cm C1011010 0.15 19.8 7 0 0.57 0.42 0.38 0.58
51 Gdom7 Cm Cm Gdom7 0.17 22.0 7 0 0.62 0.39 0.34 0.55
52 G100-1010 Cm Cm C1020010 0.20 19.2 7 0 0.42 0.5 0.22 0.56
53 G110-1010 Cm Cm C1011010 0.13 21.8 7 0 0.62 0.32 0.05 0.06
54 Gdom7 Cm Cm Gdom7 0.13 19.4 7 0 0.57 0.50 0.05 0.05
55 Gdom7 Cm Cm C0011010 0.13 19.4 7 0 0.75 0.32 0.01 0.00
56 Cm Cm Cm Cm 0.36 11.4 4 3 0.5 0.43 0.13 0.80
Instrumental Chorus (Son)
57 Cm Cm Cm Cm 0.23 18.4 6 2 0.42 0.16 0.27 1.30 ⊗
58 Cm Cm Cm Cm 0.41 25.2 6 0 0.42 0.16 0.21 0.48
59 Gdom7 Cm Cm Gdom7 0.43 15.7 5 1 0.71 0.29 0.22 0.88
60 G100-1010 Cm Cm C1020010 0.35 23.4 6 1 0.42 0.25 0.08 0.44
61 Gdom7 Cm Cm C0011010 0.35 18.0 5 1 0.75 0.45 0.03 0.31
62 Gdom7 Cm Cm Gdom7 0.16 15.4 6 1 0.42 0.33 0.0 0.07
63 Fdom7 Cm Cm C1-110010 0.27 26.4 7 1 0.75 0.46 0.0 0.35
64 Fdom7 Cm Cm C1-100010 0.25 16.4 6 1 0.75 0.5 0.04 0.19
65 Cm Cm Cm Cm 0.63 14.7 5 3 0.5 0.4 0.10 1.04 ⊗
66 Cm Cm Cm Cm 0.39 15.7 7 2 0.57 0.39 0.05 0.03
67 Gsus4 Cm Cm C7 0.31 28.8 7 0 0.71 0.39 0.06 0.91
68 Gdom7 Cm Cm Gdom7 0.30 16.1 7 0 0.42 0.5 0.04 0.22
69 Fm Cm Cm C1021010 0.21 23.3 7 0 0.71 0.42 0.12 1.22 ⊗
70 Gdom7 Cm C Gdom7 0.14 18.6 7 0 0.42 0.35 0.05 0.09
71 G1020010 Cm C Cm 0.11 21.9 7 0 0.75 0.28 0.01 0.08
72 Cm Cm F Cm 0.50 12.1 6 3 +1.0 0.87 3.04 1.22 ⊕ ⊗
Bridge (Son)
73 none Cm C none 1.0 2.35 1 7 +0.0 0.0 2.82 0.90 5
74 Cdom7 Cm F Cdom7 0.93 7.75 3 4 +1.0 1.0 2.15 1.65 5
75 none C Cm none 0.93 2.22 1 4 +0.0 0.0 1.48 0.85 5
76 Cdom7 C Cm Cdom7 0.90 7.50 3 4 +1.0 1.0 0.82 1.68 5
Verse (Son)
77 Cm C Cm Cm 0.52 9.88 4 3 0.28 0.43 0.28 0.45 ⊕ ⊗
78 Cm Cm Cm Cm 0.06 10.5 3 1 0.25 0.31 0.38 1.03 ⊗
79 Cm Cm Cm Cm 0.05 11.2 4 1 0.25 0.31 0.26 0.53
80 Gdom7 Cm Cm C0011010 0.08 9.53 4 0 0.25 0.5 0.21 0.57
81 Gdom7 Cm Cm Gdom7 0.20 10.9 5 1 0.28 0.5 0.13 0.57
82 Gdom7 Cm Cm C0020010 0.19 8.74 4 1 0.28 0.43 0.04 0.10
83 Fdom7 Cm Cm C1-110010 0.17 12.4 4 0 0.57 0.43 0.05 0.71
84 Cm Cm Cm Cm 0.22 14.5 5 1 0.25 0.29 0.0 0.08
85 G110-1010 Cm Cm C1011010 0.32 10.9 4 1 0.42 0.37 0.10 1.12 ⊗
86 Gdom7 Cm Cm Gdom7 0.25 12.4 5 1 0.28 0.35 0.07 0.06
87 Cm Cm Cm Cm 0.28 11.7 4 1 0.57 0.56 0.0 0.02
88 Fm Cm Cm C1020010 0.31 14.1 5 1 0.66 0.35 0.09 0.40
89 Fdom7 Cm Cm C1-110010 0.33 13.2 4 1 0.57 0.56 0.12 0.83
90 Cm Cm Cm Cm 0.30 12.0 5 1 0.5 0.45 0.07 0.04
91 Gsus4 Cm Cm C7 0.28 11.0 4 1 0.28 0.43 0.13 1.04 ⊗
92 Gdom7 Cm Cm Gdom7 0.33 10.8 5 1 0.25 0.56 0.01 0.21
93 Fdom7 Cm Cm C1-110010 0.31 10.5 4 1 0.22 0.31 0.0 0.01
94 Cm Cm Cm Cm 0.50 16.4 5 1 0.28 0.5 0.0 0.27
95 C101-1010 Cm C C101-1010 0.69 9.18 5 4 .0.8 0.65 0.0 0.22
96 Bb100-100 F Cm Bb100-100 0.37 12.4 6 3 +1.0 0.8 0.09 0.48 ⊕ ⊗
Bar Chord Key RKey ContChord δi/i i p δP Phrasing ChPat Pref WF Break

Chorus (Son)
97 Cm F Cm Cm 0.37 18.0 6 2 0.62 0.29 0.40 1.22 ⊕ ⊗
98 G1120010 Cm Cm C1-111000 0.14 13.6 6 0 0.42 0.33 0.38 0.62
99 Gdom7 Cm Cm Gdom7 0.19 20.4 6 0 0.62 0.37 0.28 0.74
100 G100-1010 Cm Cm C1020010 0.17 14.3 6 0 0.57 0.5 0.18 0.64
101 Gdom7 Cm Cm C0011010 0.14 19.3 6 0 0.71 0.45 0.13 0.67
102 Gdom7 Cm Cm Gdom7 0.15 14.2 6 0 0.42 0.33 0.01 0.13
103 Fdom7 Cm Cm C1-110010 0.15 19.4 6 0 0.5 0.25 0.04 0.17
104 Cm Cm Cm Cm 0.31 18.9 7 1 0.37 0.5 0.27 1.01 ⊗
105 Cm Cm Cm Cm 0.16 21.3 7 0 0.62 0.32 0.16 0.56
106 Gsus4 Cm C C7 0.15 19.4 7 0 0.57 0.42 0.08 0.24
107 Gdom7 Cm C Gdom7 0.17 20.9 7 0 0.62 0.39 0.06 0.03
108 Gdom7 Cm C Gdom7 0.35 14.0 7 0 0.42 0.5 0.04 0.16
109 Fm Cm C C1021010 0.33 23.7 7 0 0.62 0.32 0.12 1.34 ⊗
110 Gdom7 Cm C Gdom7 0.15 20.0 7 0 0.57 0.50 0.0 0.07
111 Gdom7 Cm C C0011010 0.17 18.7 7 0 0.75 0.32 0.01 0.43
112 C Cm C C 0.27 16.6 6 3 +1.0 0.92 0.0 0.05 ⊕ ⊗
Piano Break (Mambo)
113 Eb6 Cm C C1-10-100 0.74 9.00 1 5 .0.57 0.62 0.0 0.23 ⊕ ⊗
114 D7 G C D7 0.29 4.89 1 0 +0.75 0.87 0.0 0.22 ⊕ ⊗
115 F6 Cm C C1001010 0.20 7.37 1 0 +0.8 0.87 0.0 0.25 5
116 Ddom7 Cm C Cm 0.22 4.67 1 0 +0.75 0.87 0.0 0.18 5
117 C101-1001 G C G1-100010 0.39 10.7 3 2 .0.57 0.75 0.0 0.65 ⊕ ⊗
118 D7 G Cm D7 0.71 13.9 2 3 +0.75 0.87 0.0 0.14 ⊕ ⊗
119 D100-1001 G Cm G0110010 0.65 6.33 1 1 .0.57 0.62 0.0 0.27 ⊕ ⊗
120 Dm G Cm Dm 0.49 5.56 3 2 +0.85 0.93 0.0 0.06 ⊕ ⊗
Montuno (Montuno)
121 F0110110 C Cm C101-1001 0.51 17.2 6 3 0.6 0.37 0.13 1.85 ⊕ ⊗
122 F1-110001 Cm Cm C1020010 0.69 58.6 6 0 0.5 0.37 0.03 1.20 ⊗
123 G011-1010 Cm Cm C1001010 0.62 18.3 6 0 0.28 0.29 2.39 1.34 ⊗
124 Cm Cm Cm Cm 0.17 16.6 6 0 0.57 0.33 2.27 0.54
125 Fsus4 Cm Cm C100-1011 0.15 17.4 6 0 0.28 0.25 2.12 0.32
126 Gsus4 Cm Cm C7 0.17 15.7 6 0 0.57 0.33 1.95 0.35
127 G011-1010 Cm Cm C1001010 0.14 19.9 6 0 0.5 0.29 2.39 1.13 ⊗
128 Cm Cm Cm Cm 0.18 16.8 7 1 0.71 0.37 2.27 0.57
129 Fsus4 Cm Cm C100-1011 0.30 23.4 7 0 0.57 0.37 2.12 0.49
130 G Cm Cm C0011000 0.31 12.1 5 2 0.33 0.37 1.95 0.54
131 G011-1010 Cm Cm C1001010 0.30 21.8 6 1 0.28 0.25 2.39 1.39 ⊗
132 Cm Cm Cm Cm 0.28 15.7 5 1 0.4 0.31 2.27 0.63
133 Fsus4 Cm Cm C100-1011 0.20 21.6 6 1 0.42 0.41 2.12 0.49
134 Gdom7 Cm Cm C0011010 0.21 14.1 5 1 0.57 0.29 1.95 0.47
135 G011-1010 Cm Cm C1001010 0.22 18.0 6 1 0.25 0.12 2.39 1.13 ⊗
136 Cm Cm Cm Cm 0.25 13.6 5 1 .0.83 0.34 2.27 0.62
137 Fsus4 Cm Cm C100-1011 0.29 23.7 6 1 0.5 0.25 1.97 0.86
138 Gsus4 Cm Cm C7 0.28 14.2 4 2 0.2 0.18 1.66 0.49
139 G011-1010 Cm Cm C1001010 0.28 20.7 6 2 0.14 0.16 1.76 1.22 ⊗
140 Cm Cm Cm Cm 0.27 14.8 5 1 0.3 0.18 1.46 0.64
141 Fsus4 Cm Cm C100-1011 0.19 21.4 6 1 0.37 0.12 1.22 0.72
142 Gsus4 Cm Cm C7 0.27 14.8 5 1 0.5 0.25 0.97 0.65
143 Gsus4 Cm Cm C7 0.23 20.5 6 1 0.25 0.16 0.91 0.69
144 Cm Cm Cm Cm 0.18 14.6 5 1 0.66 0.19 0.69 0.64
145 F0110010 Cm Cm C102-1001 0.28 23.9 6 1 0.62 0.33 0.52 0.81
146 G Cm Cm G 0.21 16.9 5 1 0.8 0.5 0.35 0.64
147 Gsus4 Cm Cm C7 0.22 20.0 6 1 0.33 0.29 0.24 0.59
148 Cm Cm Cm Cm 0.22 17.2 5 1 0.3 0.0 0.10 0.56
149 Fsus4 Cm C C100-1011 0.22 21.2 6 1 0.37 0.20 0.0 0.11
150 Gsus4 Cm G C7 0.37 13.9 4 2 0.3 0.25 0.03 0.57
151 Gsus4 Cm G C7 0.51 11.9 6 2 0.75 0.55 0.0 0.07
152 G Cm C G 0.30 7.78 4 2 +1.0 0.9 0.0 0.17 ⊕ ⊗
Bar Chord Key RKey ContChord δi/i i p δP Phrasing ChPat Pref WF Break

Solos (Mambo)
153 C100-1001 F C F0110010 0.53 8.58 4 4 +0.62 0.81 0.89 1.05 5
154 Am F A Am 0.47 3.07 3 1 0.12 0.12 0.74 0.82 ⊕ ⊗
155 Ab11-1000 C Ab Caug 0.51 9.43 4 1 .0.37 0.68 0.60 1.33 ⊗
156 Ebaug Ab Eb Ebaug 0.16 13.2 4 0 .0.25 0.62 0.47 0.50
157 C100-1001 C C C100-1001 0.18 9.03 4 0 +0.62 0.81 0.89 1.15 ⊕ ⊗
158 Am A A Am 0.52 2.84 3 1 0.12 0.12 0.74 0.84 ⊕ ⊗
159 Ab11-1000 Ab Ab Ab11-1000 0.49 8.29 4 1 0.37 0.5 0.60 1.25 ⊗
160 Ebaug Ab Eb Ebaug 0.31 15.7 5 1 0.33 0.19 0.47 0.75
161 C100-1001 C C C100-1001 0.29 17.7 5 0 +0.62 0.81 0.64 1.06 ⊕ ⊗
162 Am A A Am 0.73 2.72 3 2 0.12 0.12 0.50 0.92 ⊕ ⊗
163 Ab11-1000 Ab Ab Ab11-1000 0.62 11.5 5 2 0.5 0.49 0.39 2.11 ⊗
164 Ebaug Ab Eb Ebaug 0.15 15.6 5 0 0.5 0.6 0.28 0.47
165 C1001001 C C C1001001 0.13 11.9 5 0 +1.0 0.8 0.29 0.81 ⊕ ⊗
166 Am A A Am 0.54 10.6 5 2 .0.87 0.41 0.20 0.55 ⊕ ⊗
167 Ab11-1000 Ab Ab Ab11-1000 0.31 16.2 6 1 0.5 0.37 0.12 0.76
168 Ebaug Ab C Ebaug 0.15 21.6 6 0 0.37 1.11 0.05 0.36
169 C1001001 C C C1001001 0.28 24.5 6 0 0.62 0.39 0.0 0.06
170 Am A C Am 0.62 9.42 6 4 0.57 0.14 0.0 0.30
171 Ab11-1000 Ab C Ab11-1000 0.57 15.1 6 2 0.75 0.54 0.0 0.30
172 Gaug G C Gaug 0.12 18.1 6 0 0.5 0.5 0.0 0.10
173 C100-1001 C F C100-1001 0.13 13.7 6 0 .0.83 0.45 0.0 0.12
174 Am G F Am 0.71 5.19 5 5 0.37 0.43 0.0 0.31
175 Caug G F G1010010 0.48 14.9 6 1 0.5 0.45 0.0 0.94
176 Bb7 Bb F Bb7 0.46 6.65 3 5 0.62 0.33 0.0 0.27
Mambos (Mambo)
177 Fsus4 F C Fsus4 0.53 19.7 4 1 0.75 0.56 0.22 1.98 ⊗
178 F1020001 F F F1020001 0.47 7.01 3 1 0.75 0.41 0.11 0.62
179 G011-1010 F F F6 0.34 14.2 4 1 0.5 0.37 0.22 1.51 ⊗
180 Cm F F Cm 0.14 10.5 4 0 0.5 0.5 0.12 0.62
181 Fsus4 F C Fsus4 0.26 15.8 6 2 0.71 0.37 0.11 0.75
182 F1020001 F C F1020001 0.23 9.81 4 2 0.5 0.30 0.01 0.39
183 G011-1010 F C F6 0.44 11.8 4 4 0.5 0.37 0.0 0.10
184 C F C C 0.17 8.22 4 0 0.5 0.37 0.0 0.15
185 Eb6 F C F011-1010 0.37 18.1 4 0 0.62 0.5 0.0 0.60
186 Cdom7 F C Cdom7 0.27 10.3 4 0 0.37 0.37 0.0 0.21
187 Gsus4 F Cm F0010001 0.15 12.3 4 0 0.5 0.37 0.22 1.09 ⊗
188 Cm F Cm Cm 0.19 10.8 4 0 0.5 0.43 0.12 0.56
189 Fsus4 F C Fsus4 0.28 19.4 4 0 0.5 0.5 0.11 0.89
190 F1020001 F C F1020001 0.56 7.78 4 2 0.5 0.56 0.01 0.30
191 G1100-110 F C C1021000 0.51 24.4 6 2 0.37 0.5 0.01 1.26 ⊗
192 C F C C 0.31 32.5 6 0 0.57 0.5 0.0 0.16
193 Eb0110001 F Cm F0110110 0.81 14.6 4 4 0.27 0.06 0.0 0.27
194 D7 F Cm Cm 0.35 8.54 4 0 0.66 0.5 0.0 0.20
195 Gsus4 F Cm F0010001 0.24 13.3 4 0 0.18 0.12 0.13 1.28 ⊗
196 Cm F Cm Cm 0.25 7.91 4 0 0.22 0.25 0.05 0.40
197 Fsus4 F C Fsus4 0.18 11.5 4 0 0.44 0.18 0.0 0.23
198 F1020001 F C F1020001 0.19 7.73 4 0 0.33 0.25 0.0 0.16
199 Gsus4 F C C7 0.22 10.9 4 0 0.18 0.12 0.03 0.40
200 G F C G 0.38 6.69 4 0 0.37 0.31 0.0 0.19
201 Fsus4 F C Fsus4 0.44 17.3 5 1 0.75 0.09 0.22 1.79 ⊗
202 F1020001 F Cm F1020001 0.34 8.41 4 1 0.25 0.31 0.11 0.55
203 G011-1010 F Cm F6 0.30 14.8 5 1 0.57 0.30 0.19 1.38 ⊗
204 Cm F Cm Cm 0.36 10.3 4 1 0.4 0.25 0.09 0.35
205 Fsus4 F C Fsus4 0.15 12.0 4 0 0.44 0.18 0.08 0.08
206 F1020001 F C F1020001 0.18 8.66 4 0 0.08 0.0 0.0 0.14
207 Gsus4 F C F0010001 0.16 10.4 4 0 0.4 0.25 0.13 0.70
208 C F C C 0.20 7.92 4 0 0.5 0.5 0.30 0.71
Bar Chord Key RKey ContChord δi/i i p δP Phrasing ChPat Pref WF Break

Chorus (Son) (should be Montuno)
209 Cm Cm Cm Cm 0.32 15.4 6 2 0.25 0.12 0.40 1.47 ⊗
210 G1120010 Cm C C1-111000 0.27 14.9 6 0 0.42 0.20 0.38 0.51
211 Gdom7 Cm C C0011010 0.29 15.3 5 1 0.62 0.4 0.28 0.51
212 Gdom7 Cm C C0020010 0.33 16.2 6 1 0.71 0.29 0.18 0.52
213 Gdom7 C C C0011010 0.20 17.1 6 0 0.71 0.35 0.13 0.52
214 Gdom7 C Cm Gdom7 0.15 13.2 5 1 0.66 0.4 0.01 0.11
215 Fdom7 C Cm C1-110010 0.25 13.8 5 0 0.62 0.35 0.0 0.02
216 C101-1001 C Cm C101-1001 0.27 14.4 5 0 0.25 0.37 0.08 0.32
217 Cm Cm Cm Cm 0.19 14.0 5 0 .0.87 0.35 0.10 1.01 ⊗
218 Cm Cm Cm Cm 0.23 13.8 6 1 0.57 0.12 0.05 0.00
219 Gsus4 Cm C C7 0.22 16.3 6 0 0.37 0.29 0.06 0.59
220 Gdom7 Cm C Gdom7 0.30 9.24 6 0 0.57 0.35 0.04 0.21
221 F6 Cm C C1001010 0.35 14.8 5 1 .0.87 0.35 0.12 1.30 ⊗
222 Gdom7 Cm C Gdom7 0.18 10.3 5 0 0.66 0.35 0.03 0.35
223 Gsus4 C C C7 0.18 14.1 5 0 0.75 0.35 0.01 0.38
224 Cm Cm C Cm 0.64 12.4 5 6 +1.0 0.85 0.0 0.05 ⊕ ⊗
End (Montuno)
225 C100-1001 C C C100-1001 0.54 41.1 6 1 0.42 0.39 0.0 1.14 ⊕ ⊗
226 Gm C C Gm 0.06 44.0 6 0 0.33 0.53 0.0 0.03
227 none C C none 0.86 3.96 1 5 0.53 0.12 0.0 0.45
228 C C C C 0.86 8.07 6 5 +1.0 1.0 0.0 0.51 ⊕ ⊗
Bibliography
[1] Antonio Camurri. Interactive systems design: a Kansei-based approach.
[2] J. Arcos, D. Canamero, and R. Lopez. Affect-driven generation of expressive musical performances, 1998.
[3] Bernard Bel. http://www.lpl.univ-aix.fr/~belbernard/music/bp2intro.htm.
[4] A. Camurri. An architecture for multimodal environment agents, 1997.
[5] Antonio Camurri and Alessandro Coglio. An architecture for emotional agents. IEEE MultiMedia, 5(4):24–33, 1998.
[6] Antonio Camurri, Barbara Mazzarino, et al. Real-time analysis of expressive cues in human movement.
[7] Dolores Canamero, Josep Lluís Arcos, and Ramon Lopez de Mantaras. Imitating human performances to automatically generate expressive jazz ballads.
[8] Roger B. Dannenberg. A brief survey of music representation issues, techniques, and systems, 1994.
[9] Christopher Raphael. Orchestra in a box: A system for real-time musical accompaniment.
[10] Lisa Cingiser DiPippo, Ethan Hodys, and Bhavani Thuraisingham. Towards a real-time agent architecture - a whitepaper.
[11] Simon Dixon. A lightweight multi-agent musical beat tracking system. In Pacific Rim International Conference on Artificial Intelligence, pages 778–788, 2000.
[12] Zahia Guessoum and M. Dojat. A real-time agent model in an asynchronous-object environment. In Rudy van Hoe, editor, Seventh European Workshop on Modelling Autonomous Agents in a Multi-Agent World, Eindhoven, The Netherlands, 1996.
[13] M. Harris, A. Smaill, and G. Wiggins. Representing music symbolically, 1991.
[14] B. Horling, V. Lesser, R. Vincent, and T. Wagner. The soft real-time agent control architecture. Technical Report TR02-14, University of Massachusetts at Amherst, April 2002.
[15] ICMC. Rhythms as Emerging Structures, 2000.
[16] Fabio Kon and Fernando Iazzetta. Internet music: Dream or (virtual) reality?
[17] C. Krumhansl and E. Kessler. Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89:334–368, 1982.
[18] Fred Lerdahl and Ray Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983.
[19] H. C. Longuet-Higgins. Letter to a musical friend. The Music Review, 23:244–248, 271–280, 1962.
[20] MIDI specifications. http://www.midi.org/about-midi/specshome.shtml.
[21] The MIDI specification. http://www.borg.com/~jglatt/tech/midispec.htm.
[22] MIDI specification. http://www.sfu.ca/sca/Manuals/247/midi/MIDISpec.html.
[23] Remy Mouton and Francois Pachet. The symbolic vs. numeric controversy in automatic analysis of music.
[24] F. Pachet. The MusES system: An environment for experimenting with knowledge representation techniques in tonal harmony. In Proceedings of the 1st Brazilian Symposium on Computer Music, Caxambu, Minas Gerais, Brazil, pages 195–201, 1994.
[25] F. Pachet, G. Ramalho, and J. Carrive. Representing temporal musical objects and reasoning in the MusES system. Journal of New Music Research, 5(3):252–275, 1996.
[26] Geber Ramalho and Jean-Gabriel Ganascia. Simulating creativity in jazz performance. In National Conference on Artificial Intelligence, pages 108–113, 1994.
[27] Judy Robertson, Andrew de Quincey, Tom Stapleford, and Geraint Wiggins. Real-time music generation for a virtual environment.