Cross-Modal (Visual-Auditory) Denoising


Cross-Modal (Visual-Auditory) Denoising

Dana Segev

Yoav Y. Schechner

Michael Elad

Technion – Israel Institute of Technology

Motivation

• Digits sequence
• Noisy digits sequence
• Noisy sequence denoised by the state-of-the-art algorithm of Cohen & Berdugo

Segev, Schechner, Elad, Cross-Modal Denoising

Use one modality to denoise another?

• Use video to denoise a soundtrack?

Noise:
• Very intense
• Non-stationary
• Unknown
• Unseen source

Single microphone

Input: video and very noisy audio.

Algorithm: cross-modal, example-based.

Output: denoised audio, for human and machine hearing.

Training example set

Input test set

Segment length: about one syllable (~0.25 sec)
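The segment length above can be made concrete with a short Python sketch. It assumes non-overlapping segments and discards any trailing partial segment; the slides state neither detail. At the 8000 Hz speech audio rate, one 0.25 sec segment holds 8000 × 0.25 = 2000 samples. `split_into_segments` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def split_into_segments(signal, rate_hz, seg_sec=0.25):
    """Cut a 1-D signal into non-overlapping segments of seg_sec seconds.

    Hypothetical helper (assumption): trailing samples that do not fill
    a whole segment are dropped.
    """
    seg_len = int(round(rate_hz * seg_sec))       # samples per segment
    n_segs = len(signal) // seg_len               # whole segments only
    trimmed = np.asarray(signal[: n_segs * seg_len])
    return trimmed.reshape(n_segs, seg_len)       # one row per segment

# Example: 1 sec of audio at 8000 Hz -> 4 segments of 2000 samples
audio = np.zeros(8000)
segments = split_into_segments(audio, rate_hz=8000)
print(segments.shape)  # (4, 2000)
```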

Xylophone sound

Examples

• Cross-modal representation.
• Generating multimodal features.
• Cross-modal pattern recognition.
• Rendering a denoised signal.
• Learning feature statistics.

Input video → video feature-space

Input audio → audio feature-space

Input audio-video → audio-video feature-space

Training audio-video → audio-video examples in feature-space

Feature-space

Nearest neighbor in feature-space
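A minimal sketch of the nearest-neighbor step. It assumes that each segment's audio-video feature is the concatenation of its audio and video feature vectors; the slides do not say how the two modalities are joined. It uses the ℓ2 distance listed in the experimental setup. Both function names are hypothetical.

```python
import numpy as np

def audio_video_feature(audio_feat, video_feat):
    """Hypothetical joint feature (assumption): concatenate the two modalities."""
    return np.concatenate([np.ravel(audio_feat), np.ravel(video_feat)])

def nearest_example(query, examples):
    """Return the index of the example closest to `query` in l2 norm.

    query:    (d,) audio-video feature vector of one input segment
    examples: (n, d) feature vectors of the training example segments
    """
    dists = np.linalg.norm(examples - query, axis=1)  # l2 distance to each example
    return int(np.argmin(dists))

# Toy example with 3 training examples in a 2-D feature space
examples = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(nearest_example(np.array([0.9, 1.2]), examples))  # 1
```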

Examples

Noisy audio → clean segment, clean segment, clean segment → denoised audio
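Rendering can be sketched as stitching together, in input order, the clean audio of the matched examples. This sketch simply butt-joins the segments; the slides do not say whether the original method smooths the joins. `render_denoised` is a hypothetical name.

```python
import numpy as np

def render_denoised(matched_indices, clean_segments):
    """Build the output audio by concatenating clean example segments.

    Hypothetical helper (assumption: no crossfade at segment boundaries).
    matched_indices: index of the example matched to each input segment, in order
    clean_segments:  list of 1-D arrays, the clean audio of each training example
    """
    return np.concatenate([clean_segments[k] for k in matched_indices])

clean = [np.full(3, 0.0), np.full(3, 1.0), np.full(3, 2.0)]
print(render_denoised([2, 0, 2], clean))  # [2. 2. 2. 0. 0. 0. 2. 2. 2.]
```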

[Figure: examples and input]

Bartender experiment

[Figure: examples and input]

• Cross-modal representation.
• Generating multimodal features.
• Cross-modal pattern recognition (NN).
• Rendering a denoised signal.
• Learning feature statistics.

Feature-space

For the k-th example segment:

Syllable clusters: bi, fif, ty, two, ar
Example sequence: bi - fif - ty - two

[Table: number of transitions from the current cluster (rows: bi, ty, fif, two, ar) to the next cluster (columns: bi, ty, fif, two, ar)]

Syllable consecutive probability: the probability of a transition between clusters is the transition count divided by the number of examples in the training set.
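A sketch of estimating the syllable consecutive probabilities from a time-ordered sequence of cluster labels. It follows the slide's normalization by the number of examples in the training set. The label encoding (0 = bi, 1 = ty, 2 = fif, 3 = two, 4 = ar) and the function name are hypothetical. Dividing each row by its own total would instead give the conditional probability of the next cluster given the current one, which is a common alternative.

```python
import numpy as np

def transition_probabilities(labels, n_clusters):
    """Estimate cluster-to-cluster transition probabilities (hypothetical helper).

    labels: cluster index of each training example segment, in time order.
    counts[i, j] = how often cluster j directly follows cluster i; the
    counts are divided by the number of examples in the training set.
    """
    counts = np.zeros((n_clusters, n_clusters))
    for cur, nxt in zip(labels[:-1], labels[1:]):
        counts[cur, nxt] += 1          # current cluster -> next cluster
    return counts / len(labels)         # normalize by number of examples

# Clusters 0..4 stand for the syllables bi, ty, fif, two, ar (assumed encoding)
seq = [0, 2, 1, 3]                      # bi - fif - ty - two
P = transition_probabilities(seq, 5)
print(P[0, 2], P[2, 1], P[1, 3])        # 0.25 0.25 0.25
```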

Hidden Markov Model

[Figure: hidden states bi, fif, fif, bi, linked by transition probabilities P and time delays; audio noise]

[Figure: examples and input]

Input video

A cost function, composed of a data term and a regularization term.

Optimal vector of indices

Graph over examples and input:
• Nodes
• Edges

Complexity: dynamic programming
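The cost function and the dynamic-programming step can be sketched as a Viterbi-style recursion. This assumes a chain structure: one stage per input segment, one node per candidate example, and edges only between consecutive stages. Under that assumption, the cost of an index vector k = (k_1, …, k_T) is Σ_t D[t, k_t] + Σ_t R[k_(t-1), k_t], where D is the data term and R the regularization term. The slides' actual definitions of D and R cannot be recovered from the extracted text. A feature distance for D and a negative log transition probability for R would be natural choices, but that is a guess. With K candidate examples and T segments, exhaustive search costs O(K^T), while the recursion costs O(T·K²). `optimal_indices` is a hypothetical name.

```python
import numpy as np

def optimal_indices(data_cost, trans_cost):
    """Minimize sum_t data_cost[t, k_t] + sum_t trans_cost[k_(t-1), k_t] by DP.

    Hypothetical helper assuming a chain-structured cost (see lead-in).
    data_cost:  (T, K) cost of assigning example k to input segment t
    trans_cost: (K, K) regularization cost of example i followed by example j
    Returns the optimal vector of example indices (length T) and its cost.
    Runs in O(T * K^2) instead of the O(K^T) of exhaustive search.
    """
    T, K = data_cost.shape
    acc = data_cost[0].copy()                 # best cost of paths ending in each k at t=0
    back = np.zeros((T, K), dtype=int)        # best predecessor, for backtracking
    for t in range(1, T):
        total = acc[:, None] + trans_cost     # (K_prev, K_cur) candidate costs
        back[t] = np.argmin(total, axis=0)    # best predecessor for each current k
        acc = total[back[t], np.arange(K)] + data_cost[t]
    # Backtrack from the best final state
    path = [int(np.argmin(acc))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(acc.min())

data = np.array([[0.0, 5.0], [4.0, 1.0], [4.0, 0.0]])
trans = np.array([[0.0, 2.0], [2.0, 0.0]])
print(optimal_indices(data, trans))  # ([0, 1, 1], 3.0)
```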


• Cross-modal representation.
• Generating multimodal features.
• Cross-modal pattern recognition.
• Rendering a denoised signal.
• Learning feature statistics.

Audio features
• Requirements: sensitivity to sound perception; dimension reduction.
• Speech features: MFCCs.
• Music features: spectrogram of each segment.

Visual features
• Requirements: focusing on the motion of interest; dimension reduction.
• Speech features: DCT coefficients.
• Music features: the spatial trajectory of a hitting rod.

MFCCs – Mel-frequency cepstral coefficients

Audio signal → signal spectrum → Mel-frequency filter bank → log(.) → DCT → MFCCs
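The MFCC pipeline above, sketched in NumPy. Several parameters are common textbook choices, not values taken from the slides: the Hamming window, the power spectrum, 26 triangular mel filters, the mel formula 2595·log10(1 + f/700), and 13 output coefficients. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))  # (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(frame, rate_hz, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: spectrum -> mel filter bank -> log -> DCT.

    Hypothetical helper; window, filter count and coefficient count are assumptions.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2  # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate_hz)
    # Triangular filters with centers evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(rate_hz / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    energies = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        weights = np.clip(np.minimum(rising, falling), 0.0, None)
        energies[i] = weights @ spectrum       # filter-bank energy
    log_energies = np.log(energies + 1e-10)    # log(.)
    return dct_ii(log_energies, n_coeffs)      # DCT -> MFCCs

rate = 8000
frame = np.sin(2 * np.pi * 440.0 * np.arange(256) / rate)
print(mfcc(frame, rate).shape)  # (13,)
```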

Spectrogram of each segment

Xylophone signal → spectrogram → accumulation
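One possible reading of the spectrogram feature is to compute a magnitude spectrogram of the segment and accumulate (sum) it over time into a single spectrum. Two things here are assumptions: treating "accumulation" as a plain sum, and the 256-sample Hann window with a 128-sample hop. At the 16000 Hz xylophone audio rate, a 0.25 sec segment has 4000 samples. `accumulated_spectrogram` is a hypothetical name.

```python
import numpy as np

def accumulated_spectrogram(segment, frame_len=256, hop=128):
    """Feature for one audio segment: its magnitude spectrogram summed over time.

    Hypothetical helper; 'accumulation' is taken to mean summing the
    spectrogram frames (an assumption). Requires len(segment) >= frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(segment) - frame_len) // hop
    frames = np.stack([segment[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])      # (n_frames, frame_len)
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, frame_len // 2 + 1)
    return spectrogram.sum(axis=0)                      # accumulate over time

rate = 16000
seg = np.sin(2 * np.pi * 1000.0 * np.arange(4000) / rate)  # 1000 Hz tone, 0.25 sec
feat = accumulated_spectrogram(seg)
print(feat.shape, int(np.argmax(feat)))  # (129,) 16
```

The peak lands in bin 16 because the bin spacing is 16000 / 256 = 62.5 Hz and 1000 / 62.5 = 16.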

Speech (visual features)
• The given movie.
• Locking on the object of interest.
• Extracting global motion by tracking.
• Extracting features: the DCT coefficients that most strongly represent the motion between frames.
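A sketch of the speech visual feature, resting on three assumptions. First, the region of interest has already been locked on and motion-compensated by tracking. Second, "motion between frames" is the difference of consecutive frames. Third, the coefficients that best represent motion are the 2-D DCT positions with the largest average energy over training data. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def dct2(image):
    """2-D orthonormal DCT-II of an (h, w) image."""
    h, w = image.shape
    return dct_matrix(h) @ image @ dct_matrix(w).T

def select_coefficients(diff_images, n_keep):
    """Hypothetical: pick the DCT positions with the largest mean energy
    over training frame differences."""
    energy = np.mean([dct2(d) ** 2 for d in diff_images], axis=0)
    order = np.argsort(energy, axis=None)[::-1][:n_keep]
    return np.unravel_index(order, energy.shape)

def motion_features(frames, positions):
    """DCT coefficients of consecutive-frame differences at the chosen positions.

    frames: (n, h, w) tracked, cropped region-of-interest frames (assumed input).
    """
    diffs = np.diff(np.asarray(frames, dtype=float), axis=0)  # motion between frames
    return np.stack([dct2(d)[positions] for d in diffs])      # (n - 1, n_keep)

rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 16, 16))  # toy data: 6 tracked 16x16 frames
pos = select_coefficients(np.diff(frames, axis=0), n_keep=10)
print(motion_features(frames, pos).shape)  # (5, 10)
```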

Xylophone (visual features)
• The given movie.
• Locking on the object of interest.
• Extracting global motion by tracking (X, Y, Z axes).
• Extracting features: the spatial coordinates (X, Y, Z) of the hitting rod.
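At 25 fps, a 0.25 sec segment spans 25 × 0.25 = 6.25 frames, so segments do not align with whole frames. One way to get a fixed-length trajectory feature is to resample the tracked rod coordinates at fixed times by linear interpolation. The resampling, the choice of 6 points and the function name are assumptions, not details from the slides.

```python
import numpy as np

def rod_trajectory_feature(times, xyz, t_start, seg_sec=0.25, n_points=6):
    """Hypothetical feature for one segment: the rod's (X, Y, Z) path resampled
    to n_points by linear interpolation (assumption).

    times: (n,) increasing frame timestamps in seconds (25 fps -> 0.04 sec apart)
    xyz:   (n, 3) tracked rod coordinates per frame
    """
    t = np.linspace(t_start, t_start + seg_sec, n_points)   # fixed sample times
    resampled = np.column_stack([np.interp(t, times, xyz[:, c]) for c in range(3)])
    return resampled.ravel()                                 # (3 * n_points,)

times = np.arange(25) / 25.0                                 # 1 sec of video at 25 fps
xyz = np.column_stack([times, 2 * times, np.zeros(25)])      # toy straight-line motion
print(rod_trajectory_feature(times, xyz, 0.0).shape)         # (18,)
```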

Speech
• A corpus of a limited number of words and syllables: digits and bar beverages.
• Video rate 25 fps, audio rate 8000 Hz.
• K-means clustering, 350 clusters.
• Distance measure: ℓ2 norm.

Xylophone
• A corpus of a limited number of sounds.
• Video rate 25 fps, audio rate 16000 Hz.
• Distance measure: ℓ2 norm.
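The clustering step can be sketched with plain Lloyd's k-means under the ℓ2 distance. Random initialization from the data, the iteration count and the function name are assumptions. With the slides' setting, `n_clusters` would be 350, which requires at least 350 training segments.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means with l2 distance (hypothetical helper).

    features: (n, d) float feature vectors of the training example segments.
    Initialization picks n_clusters distinct data points at random (assumption).
    Returns (centroids, labels).
    """
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (squared l2 distance)
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids, labels

# e.g. centroids, labels = kmeans(train_features, 350)  # train_features: hypothetical (n, d) array
```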

Xylophone
• Training duration: 103 sec
• Testing duration: 100 sec
• Music from a song by GNR: SNR = 0.9
• Xylophone melody: SNR = 1

Speech: Digits
• Training duration: 60 sec
• Testing duration: 240 sec

Noisy / denoised: SNR = 0.07
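The slides report SNR as plain numbers. Assuming these are linear power ratios (the definition is not given), they convert to decibels as 10·log10(SNR). For example, SNR = 0.07 is about -11.5 dB. Both helpers are hypothetical.

```python
import numpy as np

def snr_linear(clean, noise):
    """Hypothetical helper: ratio of signal power to noise power (not in dB)."""
    return np.mean(np.square(clean)) / np.mean(np.square(noise))

def linear_to_db(snr):
    """Convert a linear power ratio to decibels."""
    return 10.0 * np.log10(snr)

for value in (0.07, 0.3, 0.38, 0.59, 0.9, 1.0):
    print(f"SNR {value} -> {linear_to_db(value):.1f} dB")
```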

Speech: Bartender
• Training duration: 48 sec
• Testing duration: 350 sec
• Noise types: music from a song by Phil Collins, male speech, white Gaussian noise
• SNR values: 0.59, 0.3, 0.38

Input: video and very noisy audio.

Algorithm:
• Example-based
• Hidden Markov Model

Output: denoised audio, for human and machine hearing.
