24
Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh, Bochen Li, Xinzhao Liu, Zhiyao Duan, Gaurav Sharma Department of Electrical and Computer Engineering, University of Rochester March 9, 2017 1 / 24

Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Visually Informed Multi-Pitch Analysis of StringEnsembles

Karthik Dinesh, Bochen Li, Xinzhao Liu, Zhiyao Duan, Gaurav Sharma

Department of Electrical and Computer Engineering, University of Rochester

March 9, 2017

1 / 24

Page 2: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Pitch in Music

Pitch - fundamental frequency of musical note from an instrument

Pitch changes with time as notes and vibrato change

2 / 24

Page 3: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Multi-pitch Analysis

Multiple music instrument ensemble has pitches corresponding tonotes from each instrument - multiple pitches

3 / 24

Page 4: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Introduction: Multi-pitch Estimation and Streaming

Multi-pitch Analysis

Multi-pitch Estimation (MPE): Estimate instantaneous pitches andpolyphony

Multi-pitch Streaming (MPS): Organize the estimated pitches intostreams corresponding to individual sound sources

4 / 24

Page 5: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Introduction: Applications of Multi-pitch Analysis

Multipitch analysisMIR: Music transcription, source separation, melody extractionSpeech recognition: Multi-talk recognition, prosody analysisMusicology: Scholarly analysisMusic education: Teach music to amateurs

Multi-pitch

Analysis

Multi-pitch

Estimation

Streaming

MIR

MusicEducation

Musicology

SpeechRecognition

Multi-pitch

5 / 24

Page 6: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Contribution: Augmenting MPE/MPS with Video

MPE/MPS based on audio alone challengingVideo modality provides valuable informationMultimedia research has gained prominenceLimited video informed work till date

Video

Audio

Feature

Extraction/

Processing

Analysis

MPE/MPS

6 / 24

Page 7: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Visually Informed Multi-pitch Analysis: Framework

Video module → play/non-play (P/NP) activity

P/NP activity → instantaneous polyphony (for MPE)helps organize pitches to active sources (for MPS)

Play/Non-play(P/NP)ActivityDetection

Multi-pitchEstimation

Multi-pitchStreaming

EstimatedPitch

Streams

InstantaneousPolyphony ActiveSources

Video

Audio

7 / 24

Page 8: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

P/NP detection: Framework

Input video Optical flow GMM based player detection Histogram threshold based

high motion region detection

Feature

Extraction

Support

vector

machine

P/NP

labels

High motion region

8 / 24

Page 9: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Multipitch Analysis: Prior Audio-Only Multi-pitchEstimation [2]

Audio&

mixture

θ → Set of fundamental frequency

O → Obs. from power spectrum

Θ → space of possible sets of θ

θ̂ = arg maxθ∈Θ

£(O | θ)

£(θ) = £peak(θ).£non−peak(θ)

Peak&

detection

MLE&based&

frequency&

estimation

Threshold MPE

9 / 24

Page 10: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Multipitch Analysis: Video Based Multi-pitch Estimation

P/NP labels inform instantaneous polyphonyInstantaneous polyphony used as threshold

Audio&

mixture

Peak&

detection

MLE&based&

frequency&

estimation

Threshold MPE

Video&based&P/NP&activity&labels

10 / 24

Page 11: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Multipitch Analysis: Prior Audio-Only Multi-pitchStreaming [3]

Pitch&

order

init.

All&pitches&

traversed?

Pick&rand.&

pitch/

stream

All&stream&

traversed?

Find&swap&set

Swap&and&get&

cluster

Is&

objective

reduced?

Best&

cluster

Is&

prev.&cluster

=best&cluster

Constraints&

satisfied&by&

best&cluster

Start

End

Audio&

Mixture

MPE&

estimate

#&sound&

sources

MPS

Objective&function:5 Maintain&timbre&consistency

Must5link&constraints:5 Pitches&close&in&time/frequency

Cannot5link&constraints:5 Pitches&in&same&time&frame

Constrained&Clustering&Algorithm&

Y

N

Y

N

N

N

Y

Y

f(Π) =M!

m=1

!

ti∈Sm

∥ti − cm∥2

ti → timbre feature vectorfor ith pitch

cm → centroid of timbre for stream Sm

M → number of streamsΠ → partition into M streams

11 / 24

Page 12: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Multipitch Analysis: Video Based Multi-pitch Streaming

P/NP&

based

init.

All&pitches&

traversed?

Pick&rand.&

pitch/

stream

All&stream&

traversed?Find&swap&

set

Is&P/NP

satisfied?

Is&

objective

reduced?

Swap/get

new&

cluster

Best&

cluster

Is&

prev.&cluster

=best&cluster

Constraints&

satisfied&by&

best&cluster

Start

End

P/NP&

labels

MPE&

estimate

#&sound&

sources

MPS

Objective&function:5 Maintain&timbre&consistency

Must5link&constraints:5 Pitches&close&in&time/frequency

Cannot5link&constraints:5 Pitches&in&same&time&frame

Constrained+Clustering+Algorithm+

Y

N

Y

N

N

N

N

Y

Y

Y

f(Π) =M!

m=1

!

ti∈Sm

∥ti − cm∥2

ti → timbre feature vectorfor ith pitch

cm → centroid of timbre for stream Sm

M → number of streamsΠ → partition into M streams

12 / 24

Page 13: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Experimental Results

Assessment on subset of URMP ground-truth dataset [4]Focus on string ensembles including violin, viola, cello, and bass11 videos featuring 3 duets, 2 trios, 4 quartets, and 2 quintets

P/NP SVM classifier used with radial basis function (RBF) kernel

P/NP evaluation: leave one out cross validation error

13 / 24

Page 14: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Experimental Results: Performance Metrics

P/NP detection accuracy:

P/NP detection acc =#corr predicted labels w .r .t ground truth

#labels

MPE accuracy:

MPE acc =#corr est pitch

#est pitch + #gt pitch −#corr est pitch

MPS accuracy:

MPS acc =#corr est & str pitch

#corr est & str pitch + #pitch in est not gt + #pitch in gt not est

corr → correct, est→estimated, str→streamed,gt→ground truth

14 / 24

Page 15: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Experimental Results: P/NP Detection and MPE Accuracy

Piece No.P/NP Detection Accuracy (%) MPE Accuracy (%)P1 P2 P3 P4 P5 Audio Video PNP GT PNP

# 1 97.4 91.5 - - - 70.2 83.6 85.1# 2 93.6 93.3 - - - 68.7 72.2 74.2# 3 81.1 71.3 - - - 58.5 62.7 70.0

# 4 92.5 91.4 78.4 - - 59.8 65.9 68.6# 5 93.9 92.9 89.4 - - 75.0 76.7 79.0

# 6 83.4 88.4 78.6 73.4 - 49.5 52.3 56.3# 7 69.3 73.6 75.1 70.1 - 52.1 52.0 59.0# 8 90.0 90.9 84.6 86.4 - 62.2 62.3 66.6# 9 93.1 95.5 82.4 91.5 - 62.2 63.3 65.7

# 10 91.9 92.3 88.5 94.1 91.2 47.4 52.3 53.3# 11 74.2 75.1 70.0 75.3 62.5 46.4 44.0 48.8

Table: Results of video-based Play/Non-play detection and MPE accuracy of the11 test pieces.

15 / 24

Page 16: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Experimental Results: Comparison of MPE AccuraciesAudio/Video/Ground Truth

Experiments on 53 duets, 38 trios and 14 quartets

Baseline(audio,based

Figure: Boxplot of MPE accuracy grouped by polyphony on all subsets derivedfrom the 11 pieces, comparing the baseline audio-based method (dark gray),proposed visually informed method (light gray), and the incorporation ofground-truth PNP labels (white). The number above each box shows the meanvalue of the box. 16 / 24

Page 17: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Experimental Results: Comparison of MPS AccuraciesAudio/Video/Ground Truth

Experiments on 53 duets, 38 trios and 14 quartets

Baseline(audio,based

Figure: Boxplot of MPS accuracy grouped by polyphony on all subsets derivedfrom the 11 pieces, comparing the baseline audio-based method (dark gray),proposed visually informed method (light gray), and the incorporation ofground-truth PNP labels (white). The number above each box shows the meanvalue of the box. 17 / 24

Page 18: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Conclusion

We demonstrated a novel technique of visually informed multi-pitchanalysis for string ensembles

Video based play/non-play detection technique was used

To obtain concurrent pitches in each time frame (MPE)To assign the estimated pitches to corresponding sound sources (MPS)

Experimental results show

Video based P/NP detection has accuracy of 85.3%Statistically significant improvement on both the MPE and MPSaccuracy at a significance level of 0.01 in most cases

With improvement in underlying MPE/MPS integration with P/NP,better results can be obtained

18 / 24

Page 19: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

References

[1] D. Sun, S. Roth, and M. J. Black, “Secrets of optical flow estimationand their principles,” in Proc. IEEE Conf. Computer Vision andPattern Recognition (CVPR), 2010, pp. 2432–2439.

[2] Z. Duan, B. Pardo, and C. Zhang, “Multiple fundamental frequencyestimation by modeling spectral peaks and non-peak regions,” IEEETrans. Audio, Speech, Language Process., vol. 18, no. 8, pp.2121–2133, 2010.

[3] Z. Duan, J. Han, and B. Pardo, “Multi-pitch streaming of harmonicsound mixtures,” IEEE/ACM Trans. Audio, Speech, LanguageProcess., vol. 22, no. 1, pp. 138–150, 2014.

[4] B. Li, X. Liu, K. Dinesh, Z. Duan, and G. Sharma, “Creating a musicalperformance dataset for multimodal music analysis: Challenges,insights, and applications,” IEEE Trans. Multimedia, submitted.Available: https://arxiv.org/abs/1612.08727.

19 / 24

Page 20: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Back Up Slides

20 / 24

Page 21: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Pieces Used in ExperimentsPiece number Piece Name Polyphony Performance Style Description

#1 01 Jupiter vn vc 2Motion is easy to capture.

All players are playing at most time

#2 02 Sonata vn vn 2Motion is easy to capture.

All players are playing at most time

#3 19 Pavane cl vn vc 3Some plucking motion for the

violin and cello

#4 12 Spring vn vn vc 3

Motion is easy to capture for player 1 and 2.For player 3, some soft articulation is

from slow motion,which may be difficult to capture

#5 13 Hark vn vn va 3Motion is easy to capture.

All players are playing at most time

#6 24 Pirates vn vn va vc 4Motion is easy to capture.

All players are playing at most time

#7 26 King vn vn va vc 4A lot of repeated notes,

where the bow motion is slight

#8 32 Fugue vn vn va vc 4Motion is easy to capture.

Different players play alternatively sometimes

#9 36 Rondeau vn vn va vc 4Motion is easy to capture.

All players are playing at most time.

#10 38 Jerusalem vn vn va vc db 5Motion is easy to capture.

All players are playing at most time.

#11 44 K515 vn vn va va vc 5Some fast notes are played by legato bowing,

where the bow motion is slow.

Table: Pieces used in the experiment with polyphony and performance style21 / 24

Page 22: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Problematic Pieces

Time in beats0 5 10 15 20 25 30

Pitc

h

A2#

C3#

E3

G3

A3#

C4#

E4

G4

A4#

ch1ch2

ch3

ch4

Time in beats0 10 20 30 40 50 60 70 80

Pitc

h

D2#

G2#

C3#

F3#

B3

E4

A4

D5

G5

C6

F6

ch1

ch2

ch3

ch4

ch5

Figure: MIDI plot for segments from pieces (#7) 26-In hall of mountain king(top) and (#11) 44-K515 (bottom) which have limited bow motion

22 / 24

Page 23: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Audio Based Multipitch Analysis

Multi-pitch Estimation

Likelihood method [2]

Model peak/non-peak region of spectrum

Interative greedy search → estimate pitch one by one

23 / 24

Page 24: Visually Informed Multi-Pitch Analysis of String Ensemblesbli23/publication/dinesh2017visually_slide… · Visually Informed Multi-Pitch Analysis of String Ensembles Karthik Dinesh,

Audio Based Multipitch Analysis

Multi-pitch Streaming

Constrained clustering method [3]

Constraints on timbre consistency

Constraints on time-frequency relationship

24 / 24