
Learning to Generate Music with BachProp

Florian Colombo
School of Computer Science and School of Life Sciences
École Polytechnique Fédérale de Lausanne, Switzerland
[email protected]

Johanni Brea
School of Computer Science and School of Life Sciences
École Polytechnique Fédérale de Lausanne, Switzerland
[email protected]

Wulfram Gerstner
School of Computer Science and School of Life Sciences
École Polytechnique Fédérale de Lausanne, Switzerland
[email protected]

ABSTRACT

As deep learning advances, algorithms of music composition increase in performance. However, most of the successful models are designed for specific musical structures. Here, we present BachProp, an algorithmic composer that can generate music scores in many styles given sufficient training data. To adapt BachProp to a broad range of musical styles, we propose a novel representation of music and train a deep network to predict the note transition probabilities of a given music corpus. In this paper, new music scores generated by BachProp are compared with the original corpora as well as with different network architectures and other related models. A set of comparative measures is used to demonstrate that BachProp captures important features of the original datasets better than other models, and we invite the reader to a qualitative comparison on a large collection of generated songs.

1. INTRODUCTION

In search of the computational creativity frontier [1], machine learning algorithms are more and more present in creative domains such as painting [2, 3] and music [4–6]. Already in 1847, Ada Lovelace predicted the potential of analytical engines for algorithmic music composition [7]. Current models of music generation include rule-based approaches, genetic algorithms, Markov models and, more recently, artificial neural networks [8].

One of the first artificial neural networks applied to music composition was a recurrent neural network trained to generate monophonic melodies [9]. In 2002, networks of long short-term memory (LSTM) [10] were applied for the first time to music composition, to generate monophonic Blues melodies constrained on chord progressions [11]. Since then, music composition algorithms employing LSTM units have been used to generate monophonic [4, 5] and polyphonic music [6, 12–14] or to harmonize chorales

Copyright: © 2018 Florian Colombo et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published in the Proceedings of the 16th Sound and Music Computing Conference.

in the style of Bach [6, 14]. However, most of these algorithms make strong assumptions about the structure of the music they model.

Here, we present a neural composer algorithm named BachProp designed to generate new music scores in an arbitrary style implicitly defined by the corpus of training data. To this end, we do not assume any specific musical structure of the data except that it is composed of sequences of notes that are characterized by pitch, duration and time-shift relative to the previous note.

In the following, we start by contrasting our representation of music to previous propositions [6, 12, 14, 15] with a focus on training style-agnostic generative models of music. We then introduce our algorithm, compare BachProp with other models on a standard dataset of chorales written by Johann Sebastian Bach [16], and establish new benchmarks on the musically complex datasets of MIDI recordings by John Sankey [17] and string quartets by Haydn and Mozart [18]. Finally, as the evaluation and comparison of generative models is not trivial [19], we invite the reader, first, to a subjective comparison on a large collection of samples generated from the different models on the accompanying media webpage [20] and, second, we propose a new set of metrics to quantify differences between the models. Preliminary versions of our work have been made available on arXiv [21, 22].

2. RELATED WORK

Unlike approaches to image generation, where the standard data consists of rows and columns of pixel values for multiple color channels, approaches to music generation lack a standard representation of music data. This is reflected by the zoo of music notation file formats (ABC, LilyPond, MusicXML, NIFF, MIDI) and the fact that lossless conversion from one to the other is usually not possible. The MIDI file format captures most features of music, like polyphony, dynamics, micro tuning, expressive timing and tempo changes. But its representational richness and the possibility to represent the exact same song in multiple ways make it challenging to work directly with MIDI. Therefore, all approaches discussed in the following use a first preprocessing step to transform all songs into a simpler representation. The subsequent design choices of the generative model are heavily influenced by this first preprocessing step.

DeepBach [6] is designed exclusively for songs with a constant number of voices (e.g. four voices for a typical Bach chorale) and a discretization of the rhythm into multiples of a base unit, e.g. 16th notes. The model achieves good results not only in generating novel songs but also in reharmonizing given melodies while respecting user-provided meta-information like the temporal position of fermatas. The model works with a Gibbs-sampling-like procedure, where, for each voice and time step, one note is sampled from conditional distributions parameterized by deep neural networks. The conditioning is on the other voices in a time window surrounding the current time step. Additionally, a "temporal backbone" signals the position of the current 16th note relative to quarter notes and other meta-information. A special hold symbol can also be sampled instead of a note, to represent notes with a duration longer than one time step.

BachBot [14] and its Magenta implementation Polyphony-RNN [15] contain no assumption about the number of voices; they can be fit to any corpus of polyphonic music, if the rhythm can be discretized into multiples of a base unit, e.g. 16th notes. Songs are represented as sequences of NEW NOTE(PITCH), CONT NOTE(PITCH) and STEP END events, where the STEP END event indicates the end of the current time step. Between two STEP END events, typically several NEW NOTE(PITCH) and CONT NOTE(PITCH) events can be found, sorted by PITCH. A generative model parameterized by a recurrent neural network is fit to these sequences of events, in the same way as recurrent neural network models are used for language modeling on a character or word level [23–25].

Common to the models discussed above is a discretization of time into multiples of a base unit like the 16th note. This limits the representable rhythms considerably; e.g. triplets, grace notes or expressive variations in timing cannot be represented in this way. To overcome this limitation, [26] replace the repertoire of symbols employed by the Polyphony-RNN by NOTE ON, NOTE OFF, TIME SHIFT and SET VELOCITY events, where the TIME SHIFT events allow the model to move forward in time by multiples of 8 ms up to 1 second and the SET VELOCITY events allow to model the loudness of a note (which, on the piano, depends on the velocity with which a key is pressed).
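For illustration, a C major triad struck at moderate loudness and released 400 ms later could be encoded as the following event sequence (a hypothetical example in the spirit of [26]; the exact vocabulary and quantization are defined there):

SET VELOCITY(80)
NOTE ON(60)           (C4)
NOTE ON(64)           (E4)
NOTE ON(67)           (G4)
TIME SHIFT(400 ms)    (50 steps of 8 ms)
NOTE OFF(60)
NOTE OFF(64)
NOTE OFF(67)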

3. METHOD

In written music, the nth note note[n] of a piece of music song = (note[1], . . . , note[N]) can be characterized by its pitch P[n], duration T[n] and the time-shift dT[n] of its onset relative to the previous note, i.e. note[n] = (dT[n], T[n], P[n]). The time-shift dT[n] is zero for notes played at the same time as the previous note. In contrast to most other approaches that discretize time into multiples of a base unit (except e.g. [26]), we round all durations to a set of defined musical durations, which allows a more faithful representation of timing, limited only by the number of possible values considered for T[n] and dT[n]. For example, our representation allows to represent, easily and without any distortion, 32nd notes, triplets and double dotted notes in the same dataset, as well as any other more complex note durations that may be needed for specific corpora. To achieve this, each duration in the original corpus is mapped to the closest duration in a set of integer multiples of atomic durations. For this work, we considered 64th notes and 32nd triplets as atomic durations.

Our approach is to approximate the probability distribution over note sequences in music scores song1, . . . , songS with distributions parameterized by recurrent neural networks and to move the network weights θ towards the maximum likelihood estimate

θ∗ = argmax_θ Pr(song1, . . . , songS | θ) . (1)

Since each note in each song consists of the triplet (dT, T, P), we can parameterize the distributions in a similar way as the pixel-RNN [27] that was developed for the (red, green, blue) triplets of pixels in images. Importantly, our model takes into account that pitch and duration of a note are generally not independent. For example, in classical music the fundamental, e.g. the note C in a piece written in C major, tends to be longer than other notes.

In the following we describe in more detail our representation of music, the structure of the model and our approach to comparing different models that use different representations of music.

3.1 Conversion of MIDI files into our representation of music

Figure 1. From MIDI to our representation of music. An illustration of the steps involved in the proposed conversion of MIDI sequences. See text for details.

A MIDI file contains a header (meta parameters) and possibly multiple tracks that contain a sequence of MIDI messages. For BachProp, we merge all tracks and consider only the MIDI messages defining when a note starts (ON events) or ends (OFF events). For each ON event we look forward to the next OFF event with the same pitch P to convert the sequence of MIDI messages into a sequence of notes (Figure 1A). We then translate timings from the internal MIDI TICK representation to quarter note lengths (Figure 1B).
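As a concrete illustration, the following Python sketch implements this pairing on a generic event list; it assumes the MIDI messages of all tracks have already been merged and flattened into (tick, kind, pitch) tuples sorted by time, and all names are illustrative rather than taken from the released BachProp code.

def events_to_notes(events, ticks_per_quarter):
    """Pair ON/OFF events into (dT, T, P) notes.

    events: list of (tick, kind, pitch) with kind in {"on", "off"},
            merged from all tracks and sorted by tick.
    Returns a list of (dT, T, P) with timings in quarter notes.
    """
    notes = []
    prev_onset = None
    for i, (tick, kind, pitch) in enumerate(events):
        if kind != "on":
            continue
        # Look forward for the next OFF event with the same pitch.
        for tick_off, kind_off, pitch_off in events[i + 1:]:
            if kind_off == "off" and pitch_off == pitch:
                duration = (tick_off - tick) / ticks_per_quarter
                break
        else:
            continue  # unmatched ON event: drop the note
        # Time-shift relative to the previous onset (0 for chord notes).
        shift = 0.0 if prev_onset is None else (tick - prev_onset) / ticks_per_quarter
        notes.append((shift, duration, pitch))
        prev_onset = tick
    return notes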


Figure 2. BachProp neural architecture: three stacked recurrent layers of 128 GRUs each, with ReLU readout layers producing Pr(dT[n+1]), Pr(T[n+1]) and Pr(P[n+1]). See text for details.

Next, we round all durations T[n] such that they are in the set of possible note lengths (duration set in Figure 1C) expressed in units of a quarter note, similar to durations in standard music notation software. Similarly, we round the time-shifts dT[n] to 0 or one of the possible note lengths. Mapping to the closest value in the set removes temporal jitter around the standard note duration that may have been introduced accidentally at the moment of recording the MIDI file (Figure 1C). While this standardization may be desired when expressive timing is not taken into account, it is straightforward to extend the duration dictionary to include also values that allow to model expressive timing. However, we believe that a good generative model of music should be trained to model the music structure only. If the music representation is introducing additional temporal dependencies, the model will need to learn these interfering structures as well. In that sense, the processing we designed for BachProp and presented in this section allows it to focus on the essential structure of music.
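A minimal sketch of this snapping step, assuming timings in units of a quarter note, where a 64th note is 1/16 and a 32nd-note triplet is 1/12 of a quarter (the paper's exact duration dictionary may differ):

ATOMS = (1.0 / 16, 1.0 / 12)  # 64th note, 32nd-note triplet

def duration_grid(max_quarters=8, atoms=ATOMS):
    """All integer multiples of the atomic durations up to a maximum."""
    grid = set()
    for atom in atoms:
        k = 1
        while k * atom <= max_quarters:
            grid.add(round(k * atom, 6))
            k += 1
    return sorted(grid)

GRID = duration_grid()

def snap(value):
    """Map a duration or time-shift to the closest grid value; 0 stays 0."""
    if value <= 0:
        return 0.0
    return min(GRID, key=lambda g: abs(g - value))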

In order for BachProp to learn tonality, during training and before each new epoch, we randomly transpose every song within the available bounds of the pitch set. For each song we compute one of the possible shifts in semitones and apply it as an offset to all pitches within the song. Because a single MIDI sequence will be transposed with up to 20 offsets, this augmentation method allows BachProp to learn the temporal structure of music on more examples.
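A sketch of this augmentation, assuming songs are lists of (dT, T, P) tuples with MIDI pitch numbers and that the bounds of the pitch set are known (names are illustrative):

import random

def transpose_song(song, pitch_min, pitch_max):
    """Shift all pitches by a random number of semitones chosen so that
    every transposed note stays inside the available pitch set."""
    pitches = [p for (_, _, p) in song]
    low = pitch_min - min(pitches)    # most negative allowed offset
    high = pitch_max - max(pitches)   # most positive allowed offset
    offset = random.randint(low, high)
    return [(dt, t, p + offset) for (dt, t, p) in song]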

Finally, we add an artificial note at the beginning and end of each score. After training, the inaudible 'boundary note' is used by the model to seed and end the generation of songs.

3.2 The BachProp neural network

We employ a deep GRU [28] network with three consecutive layers as schematized in Figure 2. The network's task is to infer the probability distribution over the next possible notes from the representation of the current note and the network's internal state (the network representation of the history of notes).

The probability of a sequence of N notes note[1 : N] = (note[1], . . . , note[N]) is given by

Pr(note[1 : N]) = Pr(note[1]) ∏_{n=1}^{N−1} Pr(note[n+1] | note[1 : n]) . (2)

Each term on the right-hand side can be further split into

Pr(note[n+1] | note[1 : n]) = Pr(dT[n+1] | note[1 : n])
    × Pr(T[n+1] | note[1 : n], dT[n+1])
    × Pr(P[n+1] | note[1 : n], dT[n+1], T[n+1]) . (3)

The goal of training the BachProp network with parameters θ is to approximate the conditional probability distributions on the right-hand side of Equation (3).

In the BachProp network (Figure 2), the conditioning on the history note[1 : n] is implemented by the values of the shared hidden states. The hidden state is composed of 3 recurrent layers with 128 gated recurrent units (GRU) each. The state H1[n] of the first hidden layer is updated with input note[n] and previous state H1[n−1]. The states of the upper layers Hi[n] for i = 2, 3 are updated with inputs Hi−1[n] and Hi[n−1].

note[n] is represented by three one-hot vectors encoding separately dT[n], T[n] and P[n], i.e. every entry in these vectors is 0 except the one mapped to the value x[n] for x = dT, T, P. The length of each of these vectors is defined by the size of the respective dictionary Lx. Therefore, each song is encoded as a set of three 2-dimensional binary matrices of size Ns × Lx, where Ns stands for the number of notes in song s and Lx for the size of the set of unique time-shifts (x = dT), note durations (x = T) or pitches (x = P) present in the original corpus. These matrices are zero-padded to account for variable input lengths Ns.
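A sketch of this encoding with NumPy, under the assumption that the per-feature dictionaries have been collected from the corpus beforehand (zero-padding to a common length is omitted):

import numpy as np

def one_hot_encode(song, dictionaries):
    """song: list of (dT, T, P) tuples. dictionaries: maps each feature
    name to the sorted list of its unique values in the corpus.
    Returns one binary matrix of shape (Ns, Lx) per feature."""
    matrices = {}
    for j, feature in enumerate(("dT", "T", "P")):
        values = dictionaries[feature]
        index = {v: i for i, v in enumerate(values)}
        m = np.zeros((len(song), len(values)), dtype=np.int8)
        for n, note in enumerate(song):
            m[n, index[note[j]]] = 1  # one-hot row for this note
        matrices[feature] = m
    return matrices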

During training, songs are presented to the network in sequences of batches containing binary tensors of dimension B × N × L. B = 32 is the number of songs presented together, over which the gradient of the error signal is averaged. N = 128 is the number of consecutive notes over which the gradients are computed, i.e. we used truncated backpropagation through time. L = LdT + LT + LP is the size of the entire input vector encoding note[n]. Though the gradient is truncated, the recurrent units are stateful: they maintain their states across consecutive batches while the same set of B songs is being presented. When a new batch of B songs is presented to the network, the hidden states are reset.

To generate note[n+1], one third (H1[n] in Figure 2) of the full hidden state is fed into a feedforward network with one layer of rectified linear (ReLU) units and one output softmax layer that represents Pr(dT[n+1] | H1[n]) ≈ Pr(dT[n+1] | note[1 : n]). The chosen dT[n+1] together with H1[n] and H2[n] is fed into a second feedforward network with one layer of ReLU units and an output softmax layer that represents Pr(T[n+1] | H1[n], H2[n], dT[n+1]) ≈ Pr(T[n+1] | note[1 : n], dT[n+1]). In a similar way, the pitch is sampled from Pr(P[n+1] | H1[n], H2[n], H3[n], dT[n+1], T[n+1]) ≈ Pr(P[n+1] | note[1 : n], dT[n+1], T[n+1]). The ReLU and softmax readout layers have the same size Lx as the dictionary of the feature they model. These three small steps of sampling dT[n+1], T[n+1] and P[n+1] together form one big step from note n to note n+1.
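The following Keras sketch reconstructs this architecture for training with teacher forcing (the true dT[n+1] and T[n+1] are fed to the lower readouts); the dictionary sizes are placeholders, and the sketch is an illustration of the description above, not the released BachProp code:

from tensorflow import keras
from tensorflow.keras import layers

B, N = 32, 128                 # batch size and truncation length (see text)
L_DT, L_T, L_P = 16, 24, 60    # placeholder dictionary sizes LdT, LT, LP

notes = keras.Input(batch_shape=(B, N, L_DT + L_T + L_P))  # note[n]
dt_next = keras.Input(batch_shape=(B, N, L_DT))            # true dT[n+1]
t_next = keras.Input(batch_shape=(B, N, L_T))              # true T[n+1]

# Three stateful GRU layers: states carry over across consecutive
# truncated-BPTT batches of the same songs.
h1 = layers.GRU(128, return_sequences=True, stateful=True)(notes)
h2 = layers.GRU(128, return_sequences=True, stateful=True)(h1)
h3 = layers.GRU(128, return_sequences=True, stateful=True)(h2)

def readout(inputs, size):
    """One ReLU layer and one softmax layer, both of the dictionary size."""
    x = layers.Concatenate()(inputs) if len(inputs) > 1 else inputs[0]
    return layers.Dense(size, activation="softmax")(
        layers.Dense(size, activation="relu")(x))

p_dt = readout([h1], L_DT)                          # Pr(dT[n+1] | ...)
p_t = readout([h1, h2, dt_next], L_T)               # Pr(T[n+1] | ...)
p_p = readout([h1, h2, h3, dt_next, t_next], L_P)   # Pr(P[n+1] | ...)

model = keras.Model([notes, dt_next, t_next], [p_dt, p_t, p_p])
model.compile(optimizer="adam", loss="categorical_crossentropy")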


The resulting sequence of notes is a newly generated score sampled from BachProp. Note that the temperature of sampling can be adapted to the confidence we give to the model predictions [5, 29]. In particular, any model trained on a corpus that exhibits many repetitions of patterns will generate scores with more examples of these repetitions at lower sampling temperatures. Indeed, a lower temperature reduces the probability to select an undesired note that is not part of the pattern to be repeated. Finally, the generated sequence of notes in our representation can easily be translated back to a MIDI sequence by reversing the method schematized in Figure 1.
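A minimal sketch of such temperature-controlled sampling from one of the softmax outputs (a standard technique; variable names are illustrative):

import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Sample an index from a categorical distribution after sharpening
    (temperature < 1) or flattening (temperature > 1) it."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return np.random.choice(len(p), p=p)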

BachProp has been implemented in Python using the Keras API [30]. Code is available on GitHub (https://github.com/FlorianColombo/BachProp).

3.3 Comparison against plagiarism and other models

Even in well-established domains such as computer vision and image generation, it is not clear how to compare generative models [19]. But in order to turn generative models of music eventually into useful tools for composers, they should be able to generate (1) plagiarism-free music of (2) a predefined style or mood that is (3) pleasant to listen to.

A way of measuring plagiarism is to control overfitting by comparing the loss on training and validation data. While this is a simple method, it is rather coarse, since it works on songs as a whole. Instead, we propose novelty profiles that compare the co-occurrence of short note sequences across different data sets. A crucial parameter of novelty profiles is the length of the note sequence on which the comparison takes place. We adapted the novelty profile, a measure of similarity between any given score and a reference corpus, from [5]. For a pattern size of 6 notes, a novelty score of 1 indicates that none of the patterns of 6 consecutive notes is present in the reference corpus. On the other hand, a note sequence that contains only patterns found in the reference corpus exhibits a novelty score of 0. We define the binary novelty of a single pattern by checking if all three features (dT[n−m : n], T[n−m : n], P[n−m : n]) of the notes included in the pattern are found in the same order anywhere in the reference corpus. The novelty score of an entire song is the average binary novelty over all possible patterns.
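A sketch of this computation, assuming songs are lists of (dT, T, P) tuples (an illustrative implementation; the paper's code may differ):

def novelty_score(song, reference_corpus, m=6):
    """Fraction of the song's length-m note patterns that appear nowhere
    in the reference corpus, matching all three features in order."""
    reference = set()
    for ref_song in reference_corpus:
        for i in range(len(ref_song) - m + 1):
            reference.add(tuple(ref_song[i:i + m]))
    patterns = [tuple(song[i:i + m]) for i in range(len(song) - m + 1)]
    if not patterns:
        return 0.0
    novel = sum(1 for pattern in patterns if pattern not in reference)
    return novel / len(patterns)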

Models that are trained on the same representation of music can be compared by their likelihood, to assess how well they generate pieces of a predefined type. But if the models represent probability distributions over different spaces, which is quickly the case when different representations are used, they are unfortunately not comparable in terms of likelihood. For example, the event-based representation from [26] can in principle produce all possible note sequences. But it could also generate nonsensical sequences of multiple consecutive NOTE OFF events without corresponding previous NOTE ON events. To nevertheless compare models that build on different representations of music, we propose simple statistics, like interval distributions, that can be applied to the samples of each generative model of music.


Finally, to compare the pleasantness of the generated music, one can ask people to rate different pieces; an approach that was followed in previous works (e.g. [6]). We also invite the reader to listen to the large collections of non-cherry-picked generated examples [20].

4. RESULTS AND DISCUSSION

4.1 Datasets

We consider four MIDI corpora with different musical structures and styles (see Table 2). The Nottingham database [31] contains British and American folk tunes. The musical structure of all songs is very similar, with a melody on top of simple chords. The Chorales corpus [16] includes hundreds of four-part chorales harmonized by Bach. All chorales share some common structures, such as the number of voices and rhythmical patterns. For comparison we used the same filtering of songs as DeepBach [32] to exclude chorales with a number of voices other than four. We consider both the Nottingham and Chorales corpora as homogeneous data sets. The John Sankey data set [17] is a collection of MIDI sequences recorded by John Sankey on a digital keyboard. Even though all songs were composed by Bach, the pieces are rather different. In addition, this data set was recorded live from the digital keyboard, and thus we applied the temporal normalization described above. Lastly, the string quartets data set [18] includes string quartets by Haydn and Mozart. Here again, there is a large heterogeneity of pieces across the corpus.

Renderings of original and generated scores are available for listening on the webpage containing media for this paper (https://goo.gl/Z4AfPg) [20]. To train BachProp on the different corpora, we used the same network architecture, number of neurons, initialization and learning parameters, but each network was trained on a different corpus.

4.2 Alternative models

In addition to BachProp, we trained six other models: three BachProp variants to assess the impact of our design choices, one baseline model, and two previously published and available artificial composers. PolyDAC and IndepBP are direct BachProp variants. MidiBP is a version of BachProp that utilizes a different representation of MIDI note sequences, inspired by [26]. The two state-of-the-art artificial composers, DeepBach [6] and PolyRNN [15], allow us to compare scores generated by models of our design with other algorithms. The sixth model is a multi-layer perceptron (MLP) and serves as a baseline control.

PolyDAC is a polyphonic version of [5]. It models the same conditional distributions as BachProp, but instead of reading out the probabilities from shared hidden layer states, it models each note feature with three independent neural networks. The time-shift, duration and pitch networks are composed of three recurrent layers with 16, 128 and 256 GRUs, respectively. IndepBP assumes that all note features are independent from each other. As such, Pr(dT[n+1]), Pr(T[n+1]) and Pr(P[n+1]) are read out by three softmax output layers directly from the hidden states of three hidden layers composed of 128 GRUs that take as input the one-hot encoding of the nth note. The MidiBP neural architecture consists of three recurrent layers composed of 128 GRUs. Here, the MIDI note sequences are represented differently. While the normalization and preprocessing is done as described above (Figure 1), we then convert the normalized music score back to the MIDI-like format proposed in [26], where in each time step a single one-hot vector defines either a NOTE ON event and its corresponding pitch, a NOTE OFF event and its corresponding pitch, or a time-shift and its corresponding duration (defined by our duration representation). Therefore, a single softmax readout layer is used to sample the upcoming MIDI event. The MLP has no recurrent layers but 3 feedforward hidden layers of 124 ReLUs each, which get as input the 5 most recent notes note[n−4 : n] together with the current time-shift dT[n+1] and duration T[n+1] to sample the pitch P[n+1]. To sample the duration T[n+1] and the time-shift dT[n+1], the appropriate parts of the input are masked with zeros.

Models BachProp, PolyDAC, MidiBP and IndepBP were trained with truncated backpropagation through time and the Adam optimizer [33]. The MLP model was trained with standard backpropagation and the Adam optimizer. The mini-batch size is 32 scores, the validation set is a 0.1 fraction of the original corpus, and one training epoch consists of updating the network parameters with all training examples and evaluating the performance on the entire validation set. Training is stopped when the performance on the validation set saturates, and the model with the highest accuracy is used for generating new music scores. DeepBach was trained for 15 epochs with the standard settings of the current master branch [32]. PolyRNN was trained for 26000 steps with the standard settings of the current master branch [15].

MODEL       NLL     dT     T      P
BachProp    0.419   0.97   0.91   0.77
PolyDAC     0.647   0.97   0.94   0.69
IndepBP     0.647   0.97   0.75   0.63
MLP         0.796   0.95   0.76   0.49

Table 1. Comparison of architectures on our representation of music. NLL stands for negative log-likelihood on the validation set. Columns dT, T and P indicate the accuracy (fraction of correct predictions) for time-shifts, durations and pitches, respectively.

4.3 BachProp performs better than alternative models with the same representation

On the Bach Chorales we find that the BachProp architecture performs considerably better than the alternative architectures using the same representation of music (see Table 1). As expected, the standard feedforward MLP with ReLUs yields the worst performance. It lacks the ability to model long-range dependencies, which the other models can do through their recurrent connections. When we remove the conditioning on each of the probability terms on the right side of Equation (3), as done for the IndepBP model, we get poorer performances. We further observe that sharing a common hidden state allowed BachProp to outperform PolyDAC on the pitch predictions.

Figure 3. Local statistics. A Distribution of dT. B Distribution of T. C Distribution of intervals in chords (top) and between each note (bottom). For all figures, we show the mean and standard deviation (in black) obtained with bootstrapping (50% of the entire corpus resampled 10 times). All models were trained on the Bach Chorales corpus.

4.4 BachProp performs at least as well as alternatives with a different representation

To compare models that use a different representation of music, we look at a set of metrics that includes local statistics, song-length statistics and novelty profiles. To evaluate these metrics for each model, we generated from each model a set containing as many scores as the original Bach Chorales corpus. We include the baseline models from the last section for comparison.

4.4.1 Local statistics

A model that has captured the underlying structure of the sequences of notes present in a corpus should be able to generate new scores matching the local statistics of what it modeled. As such, we suggest computing the distributions of generated dT and T and comparing them to the original corpus distributions as a first metric to evaluate generative models of music. Note that for such direct local statistics, a simple n-gram model would match the original distributions perfectly. Figures 3A and B show that BachProp and PolyDAC match the original distributions best, followed by MidiBP, DeepBach and PolyRNN, while IndepBP and MLP match the least.
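Such a comparison can be sketched as follows, assuming songs are lists of (dT, T, P) tuples; the paper reports these distributions graphically, so the summary distance here is only one possible quantification:

from collections import Counter

def feature_distribution(corpus, index):
    """Empirical distribution of one note feature (0 = dT, 1 = T, 2 = P)."""
    counts = Counter(note[index] for song in corpus for note in song)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two distributions; 0 means identical."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)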

Next, we look at interval distributions. An interval is the number of half-tones separating two notes. Here, BachProp, PolyDAC, MidiBP and PolyRNN match the distribution quite well. DeepBach seems to generate minor thirds considerably more often than present in the training data (Figure 3C).

Figure 4. Song lengths and novelty profiles. A Distribution of the duration of scores in quarter note length. B Novelty profile of all corpora with respect to the auto-novelty of the original corpus (top). The auto-novelty profiles of all corpora (bottom). See text for details.

4.4.2 Distribution of song lengths

The distribution of song lengths can indicate whether a model captured really long-range dependencies in the training set. On this measure MidiBP matches the distribution slightly better than BachProp, PolyDAC, IndepBP and MLP (see Figure 4A). Since DeepBach and PolyRNN do not model score endings, we set their durations manually.

4.4.3 Novelty profiles

In Figure 4B (top), we compare the novelty profiles of all models with respect to the original Chorales corpus with which each model was trained. We compare the different profiles with the auto-novelty of the reference corpus. The auto-novelty is the novelty profile for each song in the reference corpus with respect to the same corpus without the song for which the novelty score is computed. It reflects how similar the music within the original corpus is, and is consequently the distribution an ideal generative model of music should match. Here, the only model that is clearly outside the target distribution is the MLP model. While the IndepBP and MidiBP models match the target distributions, their novelty distributions for bigger pattern sizes are lower than the original corpus auto-novelty. This is an indicator that these models are generating music examples that are too similar to the original data. In other words, these models adopted a strategy closer to reproducing or recombining observed patterns rather than inferring the actual temporal dependencies between music notes. DeepBach, BachProp and PolyDAC have their medians close to and above the original distributions. However, DeepBach and PolyRNN have a surprisingly low variance for each of the pattern sizes.

In Figure 4B (bottom), we compare the auto-novelty of all generated corpora with that of the original corpus. An auto-novelty profile exhibiting distributions with lower novelty scores than the original data set suggests that the model generates new music scores of little diversity. The auto-novelty profiles of BachProp and PolyDAC match the one of the original corpus best.

DATASET            NLL     dT     T      P      SIZE [SCORES — NOTES]
CHORALES           0.419   0.97   0.91   0.77   357 — 95'337
NOTTINGHAM         0.587   0.98   0.89   0.70   1037 — 313'975
JOHN SANKEY        1.002   0.89   0.77   0.45   135 — 358'211
STRING QUARTETS    0.936   0.88   0.83   0.49   215 — 738'739

Table 2. BachProp on other datasets. See Table 1 for a description of the labels.

4.5 BachProp generates pleasant examples on more complex datasets

As a reference for future comparisons, we report here the results of BachProp trained on more complex datasets. In Table 2, we observe that for homogeneous corpora with many examples of similar structure (Chorales, Nottingham), BachProp predicts notes with higher accuracy than for more heterogeneous data sets (John Sankey, String Quartets).

We encourage readers to listen to the examples provided on the accompanying webpage [20] to convince themselves of the ability of BachProp and its variants to generate unique and heterogeneous new music scores.

5. CONCLUSION

In this paper, we presented BachProp, an algorithm for general automated music composition. Our main contributions are (1) a note-sequence-based representation of music with minimal distortion of the rhythm for training neural network models, (2) a network architecture that learns to generate pleasant music in this representation, and (3) a set of metrics to compare generative models that operate on different representations of music.

BachProp can be used for both automated and interactive music composition. Indeed, adding a human composer to the composition process is straightforward, and BachProp can then be used as a computer-aided music composition algorithm. For example, the human composer could use BachProp to suggest possible continuations and select among them, as in the sketch below. With such algorithms, the authors foresee that music composition can be brought to a wider audience, allowing untrained humans to compose their own pieces of music.
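One way such an interactive loop could look is sketched below. The method model.sample_continuations is a hypothetical interface for drawing candidate next notes from a trained model; it is not part of BachProp's published code.

def compose_interactively(model, seed, n_steps, k=5):
    """Hypothetical human-in-the-loop composition.

    At each step, the model proposes k candidate next notes and a human
    picks one by index. `model.sample_continuations(song, k)` is an
    assumed interface, not BachProp's actual API.
    """
    song = list(seed)
    for _ in range(n_steps):
        candidates = model.sample_continuations(song, k)
        for i, note in enumerate(candidates):
            print(f"[{i}] {note}")
        choice = int(input("Pick a continuation: "))
        song.append(candidates[choice])
    return song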

Acknowledgments

This research was partially supported by the Swiss National Science Foundation (Grant 200020_165538) and the Laboratory of Computational Neuroscience (EPFL-LCN).

6. REFERENCES

[1] S. Colton, G. A. Wiggins et al., "Computational creativity: The final frontier?" in ECAI, vol. 12, 2012, pp. 21–26.


[2] A. Mordvintsev, C. Olah, and M. Tyka, "Inceptionism: Going deeper into neural networks," Google Research Blog. Retrieved June, vol. 20, no. 14, p. 5, 2015.

[3] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016, pp. 2414–2423.

[4] B. L. Sturm, J. F. Santos, O. Ben-Tal, and I. Korshunova, "Music transcription modelling and composition using deep learning," in 1st Conference on Computer Simulation of Musical Creativity, 2016.

[5] F. Colombo, A. Seeholzer, and W. Gerstner, "Deep artificial composer: A creative neural network model for automated melody generation," in International Conference on Evolutionary and Biologically Inspired Music and Art. Springer, 2017, pp. 81–96.

[6] G. Hadjeres, F. Pachet, and F. Nielsen, "DeepBach: a steerable model for Bach chorales generation," in 34th International Conference on Machine Learning, vol. 70. JMLR, 2017, pp. 1362–1371.

[7] A. Lovelace, "Notes on L. Menabrea's 'sketch of the analytical engine invented by Charles Babbage, esq.'," Taylor's Scientific Memoirs, 1843.

[8] J. D. Fernandez and F. Vico, "AI methods in algorithmic composition: A comprehensive survey," Journal of Artificial Intelligence Research, vol. 48, pp. 513–582, 2013.

[9] P. M. Todd, "A connectionist approach to algorithmic composition," Computer Music Journal, vol. 13, no. 4, pp. 27–43, 1989.

[10] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[11] D. Eck and J. Schmidhuber, "Finding temporal structure in music: Blues improvisation with LSTM recurrent networks," in 12th IEEE Workshop on Neural Networks for Signal Processing. IEEE, 2002, pp. 747–756.

[12] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription," in 29th International Conference on Machine Learning, 2012.

[13] S. Lattner, M. Grachten, and G. Widmer, "Imposing higher-level structure in polyphonic music generation using convolutional restricted Boltzmann machines and constraints," Journal of Creative Music Systems, vol. 2, no. 1, 2018.

[14] F. Liang, M. Gotham, M. Johnson, and J. Shotton, "Automatic stylistic composition of Bach chorales with deep LSTM," in 18th International Society for Music Information Retrieval Conference, 2017.

[15] Magenta Team Google Brain, "Polyphony RNN, revision ca73164," https://github.com/tensorflow/magenta/tree/master/magenta/models/polyphony_rnn, 2016.

[16] "J.S. Bach chorales," http://web.mit.edu/music21/, accessed: 2019-04-01.

[17] "Bach MIDI sequences by John Sankey," http://www.jsbach.net/midi/midi_johnsankey.html, accessed: 2019-04-01.

[18] "String quartets by Mozart and Haydn," http://www.stringquartets.org, accessed: 2019-04-01.

[19] L. Theis, A. van den Oord, and M. Bethge, "A note on the evaluation of generative models," arXiv preprint arXiv:1511.01844, 2015.

[20] "BachProp media webpage," https://sites.google.com/view/bachprop, accessed: 2019-04-01.

[21] F. Colombo and W. Gerstner, "BachProp: Learning to compose music in multiple styles," arXiv preprint arXiv:1802.05162, 2018.

[22] F. Colombo, J. Brea, and W. Gerstner, "Learning to generate music with BachProp," arXiv preprint arXiv:1812.06669, 2018.

[23] I. Sutskever, J. Martens, and G. Hinton, "Generating text with recurrent neural networks," in 28th International Conference on Machine Learning, 2011, pp. 1017–1024.

[24] A. Graves, "Generating sequences with recurrent neural networks," arXiv preprint arXiv:1308.0850, 2013.

[25] T. Mikolov, "Statistical language models based on neural networks," Ph.D. dissertation, 2012.

[26] S. Oore, I. Simon, S. Dieleman, and D. Eck, "Learning to create piano performances," NIPS 2017 Workshop on Machine Learning for Creativity and Design, 2017.

[27] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, "Pixel recurrent neural networks," arXiv preprint arXiv:1601.06759, 2016.

[28] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.

[29] A. Karpathy, "The unreasonable effectiveness of recurrent neural networks," http://karpathy.github.io/2015/05/21/rnn-effectiveness, 2016.

[30] F. Chollet et al., "Keras," https://github.com/fchollet/keras, 2015.

[31] "Nottingham data set of folk songs," http://www-etud.iro.umontreal.ca/~boulanni/icml2012, accessed: 2019-04-01.

[32] "DeepBach, revision f069695," https://github.com/Ghadjeres/DeepBach, accessed: 2019-04-01.


[33] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.