
NONOTO: A Model-agnostic Web Interface for Interactive Music Composition by Inpainting

Théis Bazin
Sony CSL, Paris, France
[email protected]

Gaëtan Hadjeres
Sony CSL, Paris, France
[email protected]

Abstract

Inpainting-based generative modeling allows for stimulating human-machine interactions by letting users perform stylistically coherent local edits to an object using a statistical model. We present NONOTO, a new interface for interactive music generation based on inpainting models. It is aimed both at researchers, by offering a simple and flexible API allowing them to connect their own models with the interface, and at musicians, by providing industry-standard features such as audio playback, real-time MIDI output and straightforward synchronization with DAWs using Ableton Link.

Keywords interfaces, generative models, inpainting, interactive music generation, web technologies, open-source software

Introduction

We present a web-based interface that allows users to compose symbolic music in an interactive way using generative models for music. We strongly believe that such models only reveal their potential when actually used by artists and creators. While generative models for music have been around for a while (Boulanger-Lewandowski et al., 2012; Hadjeres et al., 2017; Roberts et al., 2018), the conception of A.I.-based interactive interfaces designed for music creators is still burgeoning. We contribute to this emerging area by providing a general web interface for many music generation models, so that researchers in the domain can easily test and promote their work in actual music production and performance settings. This goal follows from the seminal work by Theis et al. (2015), in which the authors argue that unambiguous quantitative evaluation of generative models is hard and that "generative models need to be evaluated directly with respect to the application(s) they were intended for" (Theis et al., 2015). Lastly, we hope that the present work will contribute to making A.I.-assisted composition accessible to a wider audience, from non-musicians to professional musicians, helping bridge the gap between these communities.

Published as a conference paper at the 10th International Conference on Computational Creativity (ICCC 2019), UNC Charlotte, North Carolina.

Drawing inspiration from recent advances in interactive interfaces for image restoration and editing (Isola et al., 2016; Jo and Park, 2019; Yu et al., 2018), we focus on providing an interface for "inpainting" models for symbolic music, that is, models able to recompose a portion of a score given all the other portions. The reason is that such models are better suited to interactive use (compared to models generating a full score all at once) and let users play an active part in the compositional process. As an outcome, users can feel that the composition is the result of their work and not just something created by the machine. Furthermore, allowing quick exploration of musical ideas in a playful setting can enhance creativity and provide accessibility: the "technical part" of the composition is taken care of by the generative model, which allows musicians as well as non-experts in music to express themselves more freely.

Contributions: The key elements of novelty are: (a) easy-to-use and intuitive interface for users, (b) easy-to-plug interface for researchers allowing them to explore the potential of their music generation algorithms, (c) web-based and model-agnostic framework, (d) integration of existing music inpainting algorithms, (e) novel paradigms for A.I.-assisted music composition and live performance, (f) integration in professional music production environments.

The code for the interface is distributed under a GNU GPL license and available, along with ready-to-use packaged standalone applications and video demonstrations of the interface, on our GitHub¹.

¹ https://github.com/SonyCSLParis/NONOTO

Existing approaches

The proposed system is akin to the FlowComposer system (Papadopoulos et al., 2016), which generates sheets of music by performing local updates (using Markov models in their case). However, this interface does not exhibit the same level of interactivity as ours, since no real-time audio or MIDI playback is available, which limits the tool to studio usage only and makes for a less spontaneous and reactive user experience.

The recent tools proposed by the Google Magenta team as part of their Magenta Studio effort (Roberts et al., 2019) are more aligned with our aims in this project: they offer a selection of Ableton Live plugins (using Max for Live) that make use of various symbolic music generation models for rhythm as well as melody (Roberts et al., 2018; Huang et al., 2018). Similarly, the StyleMachine, developed by Metacreative Technologies², is a proprietary tool that allows users to generate MIDI tracks directly into Ableton Live in various dance music styles, using a statistical model trained on different stylistic corpora of electronic music. Yet these tools differ from ours in the generation paradigm used: they offer either continuation-based generation (the model generates the end of a sequence given the beginning) or complete generation (the model generates a full sequence, possibly given a template), thus breaking the flow of music on new generations. We believe that this limits their level of interactivity compared to local, inpainting-based models such as ours, as mentioned previously. In particular, it hinders their usage in live performance contexts.

Suggested mode of usage

The interface displays a musical score which loops forever, as shown in Figure 1. Users can then modify the score by regenerating any region simply by clicking or touching it. The displayed musical score is updated instantly without interrupting the music playback. Other means of control are available depending on the specificities of the training dataset: we implemented, for instance, the positioning of fermatas in the context of Bach chorale generation, or control of the chord progression when generating Jazz lead sheets or Folk songs. These metadata are sent, along with the sheet, to the generative models when performing re-generations. The scores can be seamlessly integrated into a DAW so that the user (or even other users) can shape the sounds, add effects, play the drums or create live mixes. This creates a new jam-like experience in which the user controlling the A.I. can be seen as just one of multiple instrument players. This interface thus has the potential to create a new environment for collaborative music production and performance.

Since our approach is flexible, our tool can be used in conjunction with other A.I.-assisted musical tools like Magenta Studio (Roberts et al., 2019) or the StyleMachine (from Metacreative Technologies).

Technology

Framework Our framework relies on two elements: an interactive web interface and a music inpainting server. This decoupling is strict, so that researchers can easily plug in new inpainting models with little overhead: it suffices to implement how the music inpainting model should function given a particular user input. We make heavy use of modern web browser technologies, making for a modular and hackable code-base for artists and researchers, allowing them, for example, to edit the interface to support particular means of interaction or to add support for new metadata specific to a given corpus.

² https://metacreativetech.com/products/stylemachine-lite/

Interface The interface is based on OpenSheetMusicDisplay³, a TypeScript library aimed at rendering MusicXML sheets with vector graphics. Using Tone.js (Mann, 2015), a JavaScript library for real-time audio synthesis, we augmented OSMD with real-time audio playback capabilities, allowing users to preview the generated music in real time from within the interface. Furthermore, the audio playback is uninterrupted by re-generations, enabling a truly interactive experience.
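
To make the combination of these two libraries concrete, the following TypeScript sketch (illustrative only, not the actual NONOTO code) renders a MusicXML string with OpenSheetMusicDisplay and loops a few hard-coded notes with Tone.js; in the real interface the playback events would be derived from the displayed sheet rather than hard-coded.

import { OpenSheetMusicDisplay } from "opensheetmusicdisplay";
import * as Tone from "tone";

// Render a MusicXML string into a container element (assumes a <div id="score"> exists).
async function renderScore(musicXml: string): Promise<OpenSheetMusicDisplay> {
  const osmd = new OpenSheetMusicDisplay(document.getElementById("score") as HTMLElement);
  await osmd.load(musicXml);
  osmd.render();
  return osmd;
}

// Loop a short, hard-coded sequence of notes with Tone.js.
async function startPlayback(): Promise<void> {
  await Tone.start(); // the audio context must be resumed from a user gesture
  const synth = new Tone.Synth().toDestination();
  const events = [
    { time: "0:0", note: "C4", dur: "4n" },
    { time: "0:1", note: "E4", dur: "4n" },
    { time: "0:2", note: "G4", dur: "2n" },
  ];
  const part = new Tone.Part((time, ev) => {
    synth.triggerAttackRelease(ev.note, ev.dur, time);
  }, events);
  part.loop = true;
  part.loopEnd = "1m"; // loop one measure forever
  part.start(0);
  Tone.Transport.start();
}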

Generation back-end and communication For better interoperability, we rely on the MusicXML standard to communicate scores between the interface and the server. The HTTP-based communication API then consists of just two commands that are required server-side (an illustrative back-end sketch follows the list):

• A /generate command, which expects the generation model to return a fresh sheet of music in the MusicXML format to initialize a session,

• A /timerange-change command, which takes as parameter the position of the interval to re-generate. The server is then expected to return an updated sheet with the chosen portion regenerated by the model using the current musical context.
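
As an illustration of this contract, the sketch below implements the two routes with Express in TypeScript. The stub model functions, the choice of HTTP verbs and the start/end query parameters are assumptions made for the example; the paper itself only fixes the two route names and the MusicXML exchange format.

import express from "express";

const app = express();
// Accept raw MusicXML bodies from the interface.
app.use(express.text({ type: ["application/xml", "text/xml"], limit: "5mb" }));

// Placeholder "model": a real back-end would call its inpainting model here.
const EMPTY_SHEET =
  '<?xml version="1.0" encoding="UTF-8"?><score-partwise version="3.1">' +
  '<part-list><score-part id="P1"><part-name>Music</part-name></score-part></part-list>' +
  '<part id="P1"><measure number="1"/></part></score-partwise>';

function generateFullSheet(): string {
  return EMPTY_SHEET; // stub: return a freshly generated sheet
}

function regenerateRegion(sheetXml: string, start: number, end: number): string {
  // stub: a real model would resample only the [start, end) region,
  // conditioned on the rest of sheetXml
  return sheetXml;
}

// /generate: return a fresh MusicXML sheet to initialize a session.
app.get("/generate", (_req, res) => {
  res.type("application/xml").send(generateFullSheet());
});

// /timerange-change: receive the current sheet plus the interval to regenerate
// (hypothetical `start`/`end` query parameters, e.g. in quarter notes) and
// return the updated sheet.
app.post("/timerange-change", (req, res) => {
  const start = Number(req.query.start);
  const end = Number(req.query.end);
  res.type("application/xml").send(regenerateRegion(req.body as string, start, end));
});

app.listen(5000, () => console.log("Inpainting back-end listening on port 5000"));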

DAW integration In order for NONOTO to be readily usable in traditional music production and performance contexts, we implemented the possibility of integrating the generated scores into any DAW in real time. To this end, we provide the user with the option of either rendering the generated sheet to audio in real time from within the web interface using Tone.js, or of routing it via MIDI to any virtual MIDI port on the host machine, using the JavaScript bindings to the Web MIDI API, WebMidi.js⁴. We also integrated support for Ableton Link⁵,⁶, an open-source technology developed by Ableton for easy synchronization of musical hosts on a local network, allowing the interface to be synchronized with, e.g., Ableton Live. Adding support for these technologies does not represent a novel advance on our side per se; yet, paired with the support of arbitrary generation back-ends, it allows new generation models to be quickly tested in a standard music production environment with minimal additional work, making for a beneficial tool for researchers, and the first of its kind to our knowledge.
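
As a rough illustration of the MIDI routing path (again, not the actual NONOTO code), the following sketch sends a few notes to a virtual MIDI port with WebMidi.js. The port name "NONOTO out" and the hard-coded notes are hypothetical; the port must match a virtual MIDI port created on the host and monitored by the DAW.

import { WebMidi, Output } from "webmidi";

// Open the Web MIDI API and look up a virtual output port by name.
async function getOutput(portName: string): Promise<Output> {
  await WebMidi.enable();
  const output = WebMidi.getOutputByName(portName);
  if (!output) {
    throw new Error(`MIDI output "${portName}" not found`);
  }
  return output;
}

// Send a short C major arpeggio to the DAW through the virtual port.
async function demo(): Promise<void> {
  const output = await getOutput("NONOTO out");
  const notes = ["C4", "E4", "G4"];
  notes.forEach((note, i) => {
    // playNote sends a note-on followed by a note-off after `duration` ms.
    setTimeout(() => output.playNote(note, { duration: 200 }), i * 250);
  });
}

demo().catch(console.error);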

³ https://github.com/opensheetmusicdisplay/opensheetmusicdisplay/
⁴ https://github.com/djipco/webmidi
⁵ https://github.com/Ableton/link/
⁶ https://github.com/2bbb/node-abletonlink

Figure 1: Our web interface used on different datasets: (a) melody and symbolic chords format, (b) four-part chorale music.

Conclusion

We have introduced NONOTO, an interactive, open-source and hackable interface for music generation using inpainting models. We invite researchers and artists alike to make it their own by developing new models or new means of interacting with them. This high level of hackability is to a large extent permitted by the wide range of technologies now offered in a very convenient fashion by modern web browsers, from which we draw heavily. Ultimately, we hope that providing tools such as ours with a strong focus on usability, accessibility, affordance and hackability will help shift the general perspective on machine learning for music creation, transitioning from the current and somewhat negative view of "robot music" replacing musicians towards a more realistic and humbler view of it as A.I.-assisted music.

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments.

References

[Boulanger-Lewandowski et al., 2012] Boulanger-Lewandowski, N., Bengio, Y., and Vincent, P. (2012). Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. arXiv e-prints, arXiv:1206.6392.

[Hadjeres et al., 2017] Hadjeres, G., Pachet, F., and Nielsen, F. (2017). DeepBach: a steerable model for Bach chorales generation. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 1362–1371. arXiv:1612.01010.

[Huang et al., 2018] Huang, C. A., Vaswani, A., Uszkoreit, J., Shazeer, N., Hawthorne, C., Dai, A. M., Hoffman, M. D., and Eck, D. (2018). An improved relative self-attention mechanism for transformer with application to music generation. CoRR, abs/1809.04281.

[Isola et al., 2016] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2016). Image-to-image translation with conditional adversarial networks. arXiv e-prints, arXiv:1611.07004.

[Jo and Park, 2019] Jo, Y. and Park, J. (2019). SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color. arXiv e-prints, arXiv:1902.06838.

[Mann, 2015] Mann, Y. (2015). Interactive music with Tone.js. In Proceedings of the 1st Web Audio Conference.

[Papadopoulos et al., 2016] Papadopoulos, A., Roy, P., and Pachet, F. (2016). Assisted lead sheet composition using FlowComposer. In Rueher, M., editor, Principles and Practice of Constraint Programming, pages 769–785, Cham. Springer International Publishing.

[Roberts et al., 2018] Roberts, A., Engel, J., Raffel, C., Hawthorne, C., and Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In Proceedings of the 35th International Conference on Machine Learning, pages 4364–4373. arXiv:1803.05428.

[Roberts et al., 2019] Roberts, A., Mann, Y., Engel, J., and Radebaugh, C. (2019). Magenta Studio. https://magenta.tensorflow.org/studio-announce.

[Theis et al., 2015] Theis, L., van den Oord, A., and Bethge, M. (2015). A note on the evaluation of generative models. arXiv e-prints, arXiv:1511.01844.

[Yu et al., 2018] Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., and Huang, T. S. (2018). Free-form image inpainting with gated convolution. CoRR, abs/1806.03589.