Nonnegative Tensor Factorization for Source Separation of Loops in Audio Jordan B. L. Smith National Institute of Advanced Industrial Science and Technology (AIST), Japan Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan

Nonnegative Tensor Factorization for Source Separation of ... · [Rafii, Liutkus, & Pardo 2014] NMF can handle many types of repetition: Method. Nonnegative tensor factorization


Page 1

Nonnegative Tensor Factorization for Source Separation of Loops in Audio

Jordan B. L. Smith National Institute of Advanced Industrial Science and Technology (AIST), Japan

Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan

Laboratoire de signaux et systèmes (L2S) & IRCAM, Paris

Page 2

Introduction

Page 3

Extracting loops from music

• In some musical styles, songs are built from loops. E.g.:
  1. Collection of loops (A, B, C, D)
  2. Loops arranged to make a song
  3. Song mixed down to audio

[Figure: steps 1 to 3 shown left to right (→ composition process →). The layout in step 2 spans 0:00 to 1:00, with one row of repetitions per loop A, B, C, D. Labels: Drum, Melody, Bass, FX.]

Audio examples (and test data) all borrowed from [López-Serrano et al. 2016]

Page 4

Extracting loops from music

• Goal: decompose the audio signal to recover:
  • the layout of the song
  • the source-separated loops

[Figure: the loop collection (A, B, C, D), the song layout, and the mixed audio, now read right to left (← decomposition procedure ←).]

Page 5

Extracting loops from music

• Two previous approaches that inspired us:
  • Fingerprint-based loop detection [López-Serrano et al. 2016]
    Inputs: original loops (A, B, C, D) + mixed audio → Output: map of loop activations
  • Iterative NMF [Seetharaman & Pardo 2016]
    Inputs: mixed audio + assumption that loops are introduced additively → Output: separated tracks, one per loop (A, B, C, D)

Page 6

Extracting loops from music

• Our proposed system:
  Input: mixed audio → Outputs: map of loop activations + separated tracks, one per loop (A, B, C, D)
• We attempt to solve both problems in one step, without assuming an additive layout
• We do so by extending nonnegative matrix factorization (NMF) to handle periodicity

Page 7

Source separation using NMF*

NMF can handle many types of repetition:

• Steady-state notes → NMF with harmonic templates
• Note sequences repeated in time → NMFD with time-evolving templates [Smaragdis 2004]
• Transposed notes → NMF2D with transposed harmonic templates [e.g., FitzGerald, Cranitch & Coyle 2008]
• Periodicity (especially at downbeats) → ...no nonnegative approach!
  NB: REPET, a median-filtering approach [Rafii, Liutkus, & Pardo 2014]

Page 8

Method

Page 9

Nonnegative tensor factorization
• Step 1: estimate downbeats [madmom, Böck et al. 2016]

Page 11

Nonnegative tensor factorization
• Step 1: estimate downbeats
• Step 2: stack the 2D spectrograms into a 3D volume (a “spectral cube”)
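
Step 2 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `spectral_cube`, the parameter `frames_per_bar`, and the linear resampling of bars to a common length are ours; the slides do not say how bars of unequal length are handled. The downbeats are assumed to be given already as spectrogram frame indices.

```python
import numpy as np

def spectral_cube(spec, downbeat_frames, frames_per_bar):
    """Stack bar-length slices of a (freq x time) spectrogram into a cube.

    Returns an array of shape (freq, frames_per_bar, n_bars):
    axis 0 = frequency, axis 1 = time in bar, axis 2 = bar number.
    """
    bars = []
    for start, end in zip(downbeat_frames[:-1], downbeat_frames[1:]):
        bar = spec[:, start:end]
        # Assumption (not from the slides): resample every bar linearly
        # to the same number of frames so the slices can be stacked.
        src = np.linspace(0.0, 1.0, bar.shape[1])
        dst = np.linspace(0.0, 1.0, frames_per_bar)
        bars.append(np.stack([np.interp(dst, src, row) for row in bar]))
    return np.stack(bars, axis=2)

# Toy example: 5 frequency bins, 16 frames, a downbeat every 4 frames.
spec = np.arange(5 * 16, dtype=float).reshape(5, 16)
cube = spectral_cube(spec, [0, 4, 8, 12, 16], frames_per_bar=4)
print(cube.shape)  # (5, 4, 4)
```

Each slice `cube[:, :, q]` is the spectrogram of bar `q`, so material that repeats every bar lines up along the last axis.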

Page 14

Detour: understanding the spectral cube

[Figure: the spectral cube, with axes Frequency, Bar number (time in piece), and Time in bar.]

Page 18

Visualizing a 3D volume: CT scan

[Figure: three sweeps through a CT scan volume: bottom to top, back to front, left to right.]

Page 19

Visualizing a 3D volume: CT scan

[Figure: the same three sweeps through the spectral cube:
  • Frequency: low frequency to high
  • Bar number (time in piece): beginning to end of piece
  • Time in bar: beginning to end of a bar]

Page 20

Nonnegative tensor factorization
• Step 1: estimate downbeats
• Step 2: stack the 2D spectrograms into a 3D volume (a “spectral cube”)
• Step 3: use nonnegative tensor factorization (NTF) to model the spectral cube

Page 21

Nonnegative matrix factorization
• NMF: X ≈ W ◦ H
  • X is M × N
  • W = note templates (M × r)
  • H = activation functions (r × N)
• Needs post-processing to separate sources:
  • which templates in W belong to the same source?
  • different sources could use the same harmonic components!
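
For readers who want to see the X ≈ W ◦ H model run, here is a minimal NMF sketch. It uses the standard multiplicative updates for the Euclidean cost (Lee & Seung), which is our choice for illustration; the slides do not state which cost function or update rule was used. The function name `nmf` and its parameters are hypothetical.

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative (M x N) matrix X into W (M x r) and H (r x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, r)) + eps  # note templates (columns)
    H = rng.random((r, N)) + eps  # activation functions (rows)
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example: a nonnegative matrix with an exact rank-2 structure.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(X, r=2)
print(W.shape, H.shape)  # (6, 2) (2, 8)
```

Nothing in W or H says which templates belong to the same source, which is the post-processing problem noted on the slide.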

Page 22

Nonnegative tensor factorization
• Tucker decomposition: X ≈ C ◦ (W ◦ H ◦ D)
  • W = note templates
  • H = activation functions (time-in-bar)
  • D = loop activation functions (time-in-piece)
  • C = core tensor = recipe for each loop type

[Figure: Tucker decomposition of the spectral cube 𝓧; cube edges labelled M, P, Q.]
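
The Tucker model can be written as a single `einsum`. This is a sketch under assumed shapes, not the authors' code: we take W as (M × I) with one note template per column, H as (J × P) with one time-in-bar activation per row, D as (K × Q) with one loop activation per row, and C as (I × J × K). The assignment of M, P, Q to the frequency, time-in-bar, and bar axes is our guess from the figure labels.

```python
import numpy as np

def tucker_reconstruct(C, W, H, D):
    """Rebuild the spectral cube from a Tucker model.

    X_hat[m, p, q] = sum over i, j, k of C[i, j, k] * W[m, i] * H[j, p] * D[k, q]
    """
    return np.einsum("ijk,mi,jp,kq->mpq", C, W, H, D)

# Toy factors with random nonnegative entries.
rng = np.random.default_rng(0)
M, P, Q = 6, 4, 5   # frequency bins, frames per bar, bars (assumed axis order)
I, J, K = 3, 2, 2   # note templates, bar-level activations, loop types
C = rng.random((I, J, K))
W = rng.random((M, I))
H = rng.random((J, P))
D = rng.random((K, Q))

X_hat = tucker_reconstruct(C, W, H, D)
print(X_hat.shape)  # (6, 4, 5)
```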

Page 23

Interpreting the NTF model

• W, H, and D are all musically intuitive:
  • Loop template activations directly estimate the layout of the song

[Figure: loop activations drawn as a layout map with one row per loop A, B, C, D.]

Page 24

Interpreting the NTF model

• Core tensor C = recipe for each loop type
• Pixel C(i, j, k) tells us to play note w_i with activation function h_j whenever loop d_k appears.

[Figure: loop recipes (C). Example recipe: (w_4, h_7) + (w_11, h_10) + (w_24, h_16)]
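
The recipe reading of the core tensor can be written as a sum of outer products. The symbol T_k (the one-bar spectrogram of loop type k) is our notation, not the slides', and the second line assumes that the three pairs in the example are the only nonzero pixels of slice k.

```latex
T_k = \sum_{i} \sum_{j} C(i, j, k)\, w_i h_j^{\top}

% Example recipe with three nonzero pixels in slice k:
T_k = C(4, 7, k)\, w_4 h_7^{\top}
    + C(11, 10, k)\, w_{11} h_{10}^{\top}
    + C(24, 16, k)\, w_{24} h_{16}^{\top}
```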

Page 25

Interpreting the NTF model

• Core tensor C = recipe for each loop type
• To recover entire spectrogram: C ◦ (W ◦ H ◦ D)
• To recover individual loop source: C[:,:,k] ◦ (W ◦ H ◦ D[k,:])
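
The two recovery formulas can be sketched as follows. The shapes are assumptions chosen to match the indexing C[:,:,k] and D[k,:] on the slide: C is (I × J × K), W is (M × I), H is (J × P), D is (K × Q). The helper names `separate_loops` and `cube_to_spectrogram` are hypothetical, and the step from a separated spectrogram back to audio is not covered here.

```python
import numpy as np

def separate_loops(C, W, H, D):
    """Return one cube per loop type, shape (K, M, P, Q).

    Entry k uses only the core slice C[:, :, k] and the activation row D[k, :].
    """
    return np.einsum("ijk,mi,jp,kq->kmpq", C, W, H, D)

def cube_to_spectrogram(cube):
    """Undo the bar stacking: (freq, time-in-bar, bar) -> (freq, time)."""
    M, P, Q = cube.shape
    return cube.transpose(0, 2, 1).reshape(M, Q * P)

rng = np.random.default_rng(0)
C = rng.random((3, 2, 2))
W = rng.random((6, 3))
H = rng.random((2, 4))
D = rng.random((2, 5))

loops = separate_loops(C, W, H, D)
full = np.einsum("ijk,mi,jp,kq->mpq", C, W, H, D)
print(loops.shape)                       # (2, 6, 4, 5)
print(np.allclose(loops.sum(axis=0), full))  # True
print(cube_to_spectrogram(loops[0]).shape)   # (6, 20)
```

Because the model is linear in the loop index k, the per-loop cubes add up exactly to the full reconstruction.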

Page 26

Evaluation

Page 27

Evaluation
• We used synthetic data [López-Serrano et al. 2016]
  • 7 sets of loops × 3 different layouts (arrangements)
• Algorithm output 1: separated signals
  • Evaluate quality with SDR, SIR, SAR (estimated source tracks vs. stem tracks)
• Algorithm output 2: loop layout
  • Evaluate accuracy with correlation (estimated map vs. ground truth map)
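
The layout evaluation can be illustrated with a small sketch. The slides only say "correlation"; the choice of Pearson correlation over the flattened maps and the search over row permutations (since the estimated loops come out in arbitrary order) are our assumptions, and `layout_correlation` is a hypothetical name.

```python
import itertools
import numpy as np

def layout_correlation(est, ref):
    """Best Pearson correlation between two (loops x bars) activation maps,
    maximised over all orderings of the estimated rows."""
    best = -1.0
    for perm in itertools.permutations(range(est.shape[0])):
        r = np.corrcoef(est[list(perm)].ravel(), ref.ravel())[0, 1]
        best = max(best, r)
    return best

# Ground truth map: 4 loops over 7 bars (1 = loop is playing).
ref = np.array([[1, 1, 1, 1, 1, 1, 1],
                [0, 1, 1, 1, 1, 1, 1],
                [0, 0, 0, 1, 1, 1, 1],
                [1, 1, 0, 0, 1, 0, 0]], dtype=float)

# An estimate with the right layout but shuffled rows and a different gain.
est = 0.9 * ref[[2, 0, 3, 1]]
score = layout_correlation(est, ref)
```

The exhaustive permutation search is only practical for a handful of loops; it is used here to keep the example short.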

Page 28

Good separation example

• When it works, it works

[Audio examples: collection of loops for genre “Acid” (Drum, Melody, Bass, FX) and the extracted loops (1, 2, 3, 4).]

Page 29

Flawed separation example

Original tracks for genre “Brezo”:
  A A A A A A A
  B B B B B B
  C C C C
  D D D

Source separated tracks:
  A A A
  B B B
  C C C C
  D D D

Page 30

Flawed separation example

[Figure: the original and source-separated layouts for genre “Brezo”, linked by the operations “swap rows” and “substitute C=CA”. Layouts shown (number of activations per loop):
  • A ×7, B ×6, C ×4, D ×3
  • A ×3, B ×6, C ×4, D ×3
  • A ×3, B ×3, C ×4, D ×3]
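
A toy example shows why a substitution of this kind cannot be detected from the mix alone. It is our reading of “substitute C=CA” (loop C replaced by the mixture of C and A), simplified to a matrix model with one spectral frame per bar; the templates `a` and `c` are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(5)  # hypothetical spectral template of loop A
c = rng.random(5)  # hypothetical spectral template of loop C

# Decomposition 1: A plays in all 7 bars, C joins for the last 4.
W1 = np.column_stack([a, c])
D1 = np.array([[1, 1, 1, 1, 1, 1, 1],
               [0, 0, 0, 1, 1, 1, 1]], dtype=float)

# Decomposition 2: A plays only in the first 3 bars,
# and the second template is the mixture C + A.
W2 = np.column_stack([a, a + c])
D2 = np.array([[1, 1, 1, 0, 0, 0, 0],
               [0, 0, 0, 1, 1, 1, 1]], dtype=float)

same_mix = np.allclose(W1 @ D1, W2 @ D2)
print(same_mix)  # True
```

Both decompositions are nonnegative and reproduce the mix exactly, yet the second one reports only three activations of A, as in the flawed example.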

Page 31

[Figure: SDR, SIR, and SAR (separation quality) and correlation (layout accuracy, 0.0 to 1.0) compared across methods. Legend entries include “(proposed)”, “(performance ceiling)”, and [Seetharaman & Pardo 2016].]

• SDR: Our reconstruction quality is average. :-|
• SAR: We have more noisy artifacts. :-(
• SIR: We have less crosstalk than others! :-D
• Correlation: We get very clean layouts! :-D

Page 32

Conclusion

Page 33

Conclusion
• Proposed method of decomposing audio into loops that:
  • Models periodicity using the spectral cube
  • Models source signals and song composition jointly
  • Tucker decomposition is musically intuitive
• Weaknesses include:
  • Very conservative reconstructions don’t model the whole signal
  • Like NMFD, we cannot distinguish between algebraically equivalent decompositions
• Future work: searching for repetitions at multiple hierarchical time scales

Page 35

Future work: hierarchical analysis

• Different loops in the song have different lengths and periods

• Spectral cubes with different periods highlight different consistent repetitions

[Figure: spectral cubes built with PERIOD: 2 beats, 1 downbeat, 2 downbeats, 4 downbeats.]

Page 36

Thank you!

PS. Jordan is now at:
