ChucK! @ Harvestworks
Part 3: Audio analysis & machine learning

Rebecca Fiebrink, Princeton University
Real-time audio analysis
• Goal: Analyze audio within the same sample-synchronous framework as synthesis & interaction.
The Unit Analyzer

[Diagram: an impulse generator drives a BiQuad filter (center freq, radius) into the DAC; the same signal also feeds an FFT, whose output goes to spectral feature extractors and an IFFT, while time-domain feature extractors tap the signal directly.]
New: UAna (Unit Analyzer)
Old: UGen (Unit Generator)
The Unit Analyzer
• Like a unit generator:
  – A black box for computation
  – Plugs into a directed graph/network/patch
• Unlike a unit generator:
  – Input is samples, data, and/or metadata
  – Output is samples, data, and/or metadata
  – Not tied to the sample rate; computed on demand
• => : the chuck operator (connects UGens)
• =^ : the upchuck operator (connects UAnae and triggers analysis)
See upchuck_operator.ck, upchuck_function.ck, continuous_feature_extraction.ck
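The patterns above can be sketched as a continuous feature-extraction loop (in the spirit of continuous_feature_extraction.ck; the FFT size and hop here are assumptions):

```chuck
// mic => FFT, upchucked into a spectral centroid extractor
adc => FFT fft =^ Centroid cent => blackhole;

// analysis window and FFT size (assumed values)
1024 => fft.size;
Windowing.hann( 1024 ) => fft.window;

while( true )
{
    // trigger analysis up the chain (FFT first, then Centroid)
    cent.upchuck() @=> UAnaBlob blob;
    // the centroid arrives as a single value in the blob
    <<< "centroid:", blob.fval( 0 ) >>>;
    // hop: advance time by half the FFT size
    512::samp => now;
}
```

Because the extractor is only computed when upchucked, the hop duration (not the sample rate) sets the analysis rate.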
The UAnaBlob
• Upchucked by a UAna
• Generic representation for metadata:
  – Real and complex arrays
  – Spectra, feature values, or user-defined data
  – Timestamped
• One associated with each UAna
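For example, the FFT's blob carries both the complex spectrum and the magnitude spectrum (a sketch; the FFT size is an assumption):

```chuck
adc => FFT fft => blackhole;
512 => fft.size;
Windowing.hann( 512 ) => fft.window;

// wait for enough input, then trigger the FFT and grab its blob
512::samp => now;
fft.upchuck() @=> UAnaBlob blob;

// complex spectrum bins, and magnitudes as real values
blob.cvals() @=> complex spec[];
blob.fvals() @=> float mags[];

// blobs are timestamped with the ChucK time of the analysis
<<< "bins:", mags.size(), "analyzed at:", blob.when() >>>;
```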
FFT/IFFT
• Takes care of:
  – Buffering input / overlap-adding output
  – Maintaining window and FFT sizes
  – Mediating the audio rate and the analysis "rate"
• FFT outputs the complex spectrum as well as the magnitude spectrum:
  – Low-level: access/modify contents manually
  – High-level: connect the FFT to spectral-processing UAnae
See ifft.ck, ifft_transformation.ck
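That division of labor shows up in an analysis-resynthesis pass-through (modeled on ifft.ck; window and hop sizes are assumptions):

```chuck
// FFT the input, then IFFT it straight back to audio
adc => FFT fft =^ IFFT ifft => dac;

512 => fft.size;
Windowing.hann( 512 ) => fft.window;
Windowing.hann( 512 ) => ifft.window;

while( true )
{
    // upchucking the IFFT pulls a fresh FFT frame up the chain;
    // the IFFT overlap-adds its output back into the audio stream
    ifft.upchuck();
    // hop by a quarter window for smooth overlap-add
    128::samp => now;
}
```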
Example: Cross-synthesis
• Apply the spectral envelope of one sound to another sound
  – Ex: xsynth_robot123.ck, xsynth_guitar123.ck
  – Voice spectrum taken from:
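A hedged sketch of one way to do this (loosely modeled on the xsynth examples; the sample source, sizes, and exact spectral math are assumptions): scale each bin of one spectrum by the magnitude of the corresponding bin of the other, then IFFT the product.

```chuck
// one chain analyzes the sound to be filtered, the other the "modulator"
SndBuf buf => FFT fftA => blackhole;
adc => FFT fftB => blackhole;
IFFT ifft => dac;

// built-in test sample as a stand-in source
"special:dope" => buf.read;
1 => buf.loop;

512 => fftA.size;
512 => fftB.size;
Windowing.hann( 512 ) => fftA.window;
Windowing.hann( 512 ) => fftB.window;
Windowing.hann( 512 ) => ifft.window;

// half as many bins as the FFT size
complex specA[256];
complex specB[256];

while( true )
{
    // take both FFTs and copy out their spectra
    fftA.upchuck();
    fftB.upchuck();
    fftA.spectrum( specA );
    fftB.spectrum( specB );

    // scale each bin of A by the magnitude of B's bin
    for( 0 => int i; i < specA.size(); i++ )
    {
        specA[i]$polar => polar p;
        (specB[i]$polar).mag * p.mag => p.mag;
        p$complex => specA[i];
    }

    // resynthesize the product spectrum
    ifft.transform( specA );
    128::samp => now;
}
```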
Machine learning for live performance
• Problem: How do we use audio and gestural features?
  – There is a semantic gap between the raw data that computers use and the musical, cultural, and aesthetic meanings that humans perceive and assign.
One solution: A lot of code
• What algorithm would you design to tell a computer whether a picture contains a human face?
The problem
• If your algorithm doesn’t work, how can you fix it?
• You can’t easily reuse it for a similar task (e.g., recognizing monkey faces instead of human faces)
• There’s no “theory” for how to write a good algorithm
• It’s a lot of work!
Another solution: Machine learning (Classification)
• Classification is a data-driven approach for applying labels to data. Once a classifier has been trained on a training set that includes the true labels, it will predict labels for new data it hasn’t seen before.
[Diagram: training, a data set (a feature vector and class for every data point) is used to train the classifier; running, the trained classifier assigns a class (e.g., "NO!") to new data.]
Candidates for classification
• Which gesture did the performer just make with the iCube?
• Which instruments are playing right now?
• Who is singing? What language are they singing?
• Is this chord major or minor?
• Is this dancer moving quickly or slowly?
• Is this music happy or sad?
• Is anyone standing near the camera?
An example algorithm: kNN
• The features of an example are treated as its coordinates in n-dimensional space
• To classify a new example, the algorithm finds its k (e.g., 10) nearest neighbors in that space and chooses the most common class among them.
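The whole algorithm fits in a short ChucK function; this is an illustrative two-class sketch over Euclidean distance, not a library implementation:

```chuck
// classify a query point by majority vote among its k nearest neighbors;
// train holds one feature vector per row, labels holds 0 or 1 per row
fun int knnClassify( float query[], float train[][], int labels[], int k )
{
    int used[ train.size() ];
    0 => int votesForOne;

    for( 0 => int n; n < k; n++ )
    {
        // linear scan for the closest not-yet-used training point
        -1 => int best;
        999999999.0 => float bestDist;
        for( 0 => int i; i < train.size(); i++ )
        {
            if( used[i] ) continue;
            0.0 => float d;
            for( 0 => int j; j < query.size(); j++ )
            {
                query[j] - train[i][j] => float diff;
                diff * diff +=> d;
            }
            if( d < bestDist ) { d => bestDist; i => best; }
        }
        1 => used[best];
        labels[best] +=> votesForOne;
    }

    // majority vote
    if( votesForOne * 2 > k ) return 1;
    return 0;
}

// toy data: [weight (lbs), height (m)]; 0 = basketball, 1 = sumo
[ [220.0, 2.0], [210.0, 1.95], [350.0, 1.8], [340.0, 1.75] ] @=> float train[][];
[ 0, 0, 1, 1 ] @=> int labels[];
<<< knnClassify( [330.0, 1.8], train, labels, 3 ) >>>;  // 1 (sumo)
```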
kNN space: Basketball or Sumo?

[Scatterplot slides: training examples plotted with Feature 1 (weight) on the x-axis and Feature 2 (height) on the y-axis; a new point "?" is classified by the majority class among its k=3 nearest neighbors.]
SMIRK (small music information retrieval toolkit)
• For real-time application of machine learning
  – Learning in ChucK
  – E.g., kNN gesture classification, musical audio genre/artist classification
Interaction & on-the-fly learning
• Can we make the process of training a classifier interactive? Performative?
Another technique: Neural networks
• A very early method
• Inspired by the brain
• Results in highly non-linear functions from input to output
Combining Techniques with Wekinator
[Diagram: ChucK passes features to Java over OSC, receives results back, and uses them to make sound; Java trains a neural network to map features to sounds.]
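In classic ChucK, the OSC plumbing on the ChucK side might look like this (the addresses and port numbers are assumptions; Wekinator defines its own):

```chuck
// send feature vectors to Java/Wekinator over OSC
OscSend xmit;
xmit.setHost( "localhost", 6448 );   // assumed Wekinator input port

// listen for results coming back
OscRecv recv;
12000 => recv.port;                  // assumed reply port
recv.listen();
recv.event( "/wekinator/output, f" ) @=> OscEvent result;

fun void sendFeatures( float f1, float f2 )
{
    xmit.startMsg( "/wekinator/features, f f" );
    f1 => xmit.addFloat;
    f2 => xmit.addFloat;
}

// use each incoming result to drive a synthesis parameter
SinOsc s => dac;
while( true )
{
    result => now;                   // wait for a reply
    while( result.nextMsg() )
    {
        result.getFloat() => float y;
        100.0 + 900.0 * y => s.freq; // map the model's output to pitch
    }
}
```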
Example: Wekinator
See performance video at http://wekinator.cs.princeton.edu/video/nets0.mov
Review
• Machine learning can be used to:
  – Apply meaningful labels (classification)
  – Learn (and re-learn) functions from inputs to outputs (e.g., neural networks)
• Appropriate for camera, audio, sensors, and many other types of data
• Live, interactive performance is a very interesting application area
• http://wekinator.cs.princeton.edu
Wrap-up
• Thanks for coming, and thanks to Harvestworks!
• See the resources on the handout and the workshop webpage with slides & code
• Please fill out evaluation forms!