ChucK! @ Harvestworks
Part 3: Audio analysis & machine learning

Rebecca Fiebrink, Princeton University
Real-time audio analysis
• Goal: Analyze audio within the same sample-synchronous framework as synthesis & interaction.
The Unit Analyzer

[Diagram: an impulse generator drives a BiQuad filter (center freq, radius) into the DAC; the same signal also feeds an FFT, whose output goes to spectral feature extractors and an IFFT, while time-domain feature extractors tap the signal directly.]
New: UAna (Unit Analyzer)
Old: UGen (Unit Generator)
The Unit Analyzer
• Like a unit generator:
  – A black box for computation
  – Plugs into a directed graph/network/patch
• Unlike a unit generator:
  – Input is samples, data, and/or metadata
  – Output is samples, data, and/or metadata
  – Not tied to the sample rate; computed on demand
• => : the chuck operator (connects UGens)
• =^ : the upchuck operator (connects UAnae and triggers analysis)
See upchuck_operator.ck, upchuck_function.ck, continuous_feature_extraction.ck
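The patterns above can be sketched as a continuous feature-extraction loop (in the spirit of continuous_feature_extraction.ck; the FFT size and hop here are assumptions):

```chuck
// mic => FFT, upchucked into a spectral centroid extractor
adc => FFT fft =^ Centroid cent => blackhole;

// analysis window and FFT size (assumed values)
1024 => fft.size;
Windowing.hann( 1024 ) => fft.window;

while( true )
{
    // trigger analysis up the chain (FFT first, then Centroid)
    cent.upchuck() @=> UAnaBlob blob;
    // the centroid arrives as a single value in the blob
    <<< "centroid:", blob.fval( 0 ) >>>;
    // hop: advance time by half the FFT size
    512::samp => now;
}
```

Because the extractor is only computed when upchucked, the hop duration (not the sample rate) sets the analysis rate.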
The UAnaBlob
• Upchucked by a UAna
• Generic representation for metadata:
  – Real and complex arrays
  – Spectra, feature values, or user-defined data
  – Timestamped
• One associated with each UAna
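For example, the FFT's blob carries both the complex spectrum and the magnitude spectrum (a sketch; the FFT size is an assumption):

```chuck
adc => FFT fft => blackhole;
512 => fft.size;
Windowing.hann( 512 ) => fft.window;

// wait for enough input, then trigger the FFT and grab its blob
512::samp => now;
fft.upchuck() @=> UAnaBlob blob;

// complex spectrum bins, and magnitudes as real values
blob.cvals() @=> complex spec[];
blob.fvals() @=> float mags[];

// blobs are timestamped with the ChucK time of the analysis
<<< "bins:", mags.size(), "analyzed at:", blob.when() >>>;
```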
FFT/IFFT
• Takes care of:
  – Buffering input / overlap-adding output
  – Maintaining window and FFT sizes
  – Mediating the audio rate and the analysis "rate"
• FFT outputs the complex spectrum as well as the magnitude spectrum:
  – Low-level: access/modify contents manually
  – High-level: connect the FFT to spectral-processing UAnae
See ifft.ck, ifft_transformation.ck
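That division of labor shows up in an analysis-resynthesis pass-through (modeled on ifft.ck; window and hop sizes are assumptions):

```chuck
// FFT the input, then IFFT it straight back to audio
adc => FFT fft =^ IFFT ifft => dac;

512 => fft.size;
Windowing.hann( 512 ) => fft.window;
Windowing.hann( 512 ) => ifft.window;

while( true )
{
    // upchucking the IFFT pulls a fresh FFT frame up the chain;
    // the IFFT overlap-adds its output back into the audio stream
    ifft.upchuck();
    // hop by a quarter window for smooth overlap-add
    128::samp => now;
}
```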
Example: Cross-synthesis
• Apply the spectral envelope of one sound to another sound
  – Ex: xsynth_robot123.ck, xsynth_guitar123.ck
  – Voice spectrum taken from:
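A hedged sketch of one way to do this (loosely modeled on the xsynth examples; the sample source, sizes, and exact spectral math are assumptions): scale each bin of one spectrum by the magnitude of the corresponding bin of the other, then IFFT the product.

```chuck
// one chain analyzes the sound to be filtered, the other the "modulator"
SndBuf buf => FFT fftA => blackhole;
adc => FFT fftB => blackhole;
IFFT ifft => dac;

// built-in test sample as a stand-in source
"special:dope" => buf.read;
1 => buf.loop;

512 => fftA.size;
512 => fftB.size;
Windowing.hann( 512 ) => fftA.window;
Windowing.hann( 512 ) => fftB.window;
Windowing.hann( 512 ) => ifft.window;

// half as many bins as the FFT size
complex specA[256];
complex specB[256];

while( true )
{
    // take both FFTs and copy out their spectra
    fftA.upchuck();
    fftB.upchuck();
    fftA.spectrum( specA );
    fftB.spectrum( specB );

    // scale each bin of A by the magnitude of B's bin
    for( 0 => int i; i < specA.size(); i++ )
    {
        specA[i]$polar => polar p;
        (specB[i]$polar).mag * p.mag => p.mag;
        p$complex => specA[i];
    }

    // resynthesize the product spectrum
    ifft.transform( specA );
    128::samp => now;
}
```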
Machine learning for live performance
• Problem: How do we use audio and gestural features?
  – There is a semantic gap between the raw data that computers use and the musical, cultural, and aesthetic meanings that humans perceive and assign.
One solution: A lot of code
• What algorithm would you design to tell a computer whether a picture contains a human face?
The problem
• If your algorithm doesn’t work, how can you fix it?
• You can’t easily reuse it for a similar task (e.g., recognizing monkey faces instead of human faces)
• There’s no “theory” for how to write a good algorithm
• It’s a lot of work!
Another solution: Machine learning (Classification)
• Classification is a data-driven approach for applying labels to data. Once a classifier has been trained on a training set that includes the true labels, it will predict labels for new data it hasn’t seen before.
[Diagram: training, a data set (a feature vector and class for every data point) is used to train the classifier; running, the trained classifier assigns a class (e.g., "NO!") to new data.]
Candidates for classification
• Which gesture did the performer just make with the iCube?
• Which instruments are playing right now?
• Who is singing? What language are they singing?
• Is this chord major or minor?
• Is this dancer moving quickly or slowly?
• Is this music happy or sad?
• Is anyone standing near the camera?
An example algorithm: kNN
• The features of an example are treated as its coordinates in n-dimensional space
• To classify a new example, the algorithm finds its k (e.g., 10) nearest neighbors in that space and chooses the most common class among them.
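The whole algorithm fits in a short ChucK function; this is an illustrative two-class sketch over Euclidean distance, not a library implementation:

```chuck
// classify a query point by majority vote among its k nearest neighbors;
// train holds one feature vector per row, labels holds 0 or 1 per row
fun int knnClassify( float query[], float train[][], int labels[], int k )
{
    int used[ train.size() ];
    0 => int votesForOne;

    for( 0 => int n; n < k; n++ )
    {
        // linear scan for the closest not-yet-used training point
        -1 => int best;
        999999999.0 => float bestDist;
        for( 0 => int i; i < train.size(); i++ )
        {
            if( used[i] ) continue;
            0.0 => float d;
            for( 0 => int j; j < query.size(); j++ )
            {
                query[j] - train[i][j] => float diff;
                diff * diff +=> d;
            }
            if( d < bestDist ) { d => bestDist; i => best; }
        }
        1 => used[best];
        labels[best] +=> votesForOne;
    }

    // majority vote
    if( votesForOne * 2 > k ) return 1;
    return 0;
}

// toy data: [weight (lbs), height (m)]; 0 = basketball, 1 = sumo
[ [220.0, 2.0], [210.0, 1.95], [350.0, 1.8], [340.0, 1.75] ] @=> float train[][];
[ 0, 0, 1, 1 ] @=> int labels[];
<<< knnClassify( [330.0, 1.8], train, labels, 3 ) >>>;  // 1 (sumo)
```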
kNN space: Basketball or Sumo?

[Scatterplot slides: training examples plotted with Feature 1 (weight) on the x-axis and Feature 2 (height) on the y-axis; a new point "?" is classified by the majority class among its k=3 nearest neighbors.]
SMIRK (small music information retrieval toolkit)
• For real-time application of machine learning
  – Learning in ChucK
  – E.g., kNN gesture classification, musical audio genre/artist classification
Interaction & on-the-fly learning
• Can we make the process of training a classifier interactive? Performative?
Another technique: Neural networks
• A very early method
• Inspired by the brain
• Results in highly non-linear functions from input to output
Combining Techniques with Wekinator
[Diagram: ChucK passes features to Java over OSC, receives results back, and uses them to make sound; Java trains a neural network to map features to sounds.]
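In classic ChucK, the OSC plumbing on the ChucK side might look like this (the addresses and port numbers are assumptions; Wekinator defines its own):

```chuck
// send feature vectors to Java/Wekinator over OSC
OscSend xmit;
xmit.setHost( "localhost", 6448 );   // assumed Wekinator input port

// listen for results coming back
OscRecv recv;
12000 => recv.port;                  // assumed reply port
recv.listen();
recv.event( "/wekinator/output, f" ) @=> OscEvent result;

fun void sendFeatures( float f1, float f2 )
{
    xmit.startMsg( "/wekinator/features, f f" );
    f1 => xmit.addFloat;
    f2 => xmit.addFloat;
}

// use each incoming result to drive a synthesis parameter
SinOsc s => dac;
while( true )
{
    result => now;                   // wait for a reply
    while( result.nextMsg() )
    {
        result.getFloat() => float y;
        100.0 + 900.0 * y => s.freq; // map the model's output to pitch
    }
}
```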
Example: Wekinator
See performance video at http://wekinator.cs.princeton.edu/video/nets0.mov
Review
• Machine learning can be used to:
  – Apply meaningful labels (classification)
  – Learn (and re-learn) functions from inputs to outputs (e.g., neural networks)
• Appropriate for camera, audio, sensors, and many other types of data
• Live, interactive performance is a very interesting application area
• http://wekinator.cs.princeton.edu
Wrap-up
• Thanks for coming, and thanks to Harvestworks!
• See the resources on the handout and the workshop webpage with slides & code
• Please fill out evaluation forms!