
Audio Lifelogging - Columbia Universitydpwe/talks/IDSE-lifelog-2014-01.pdf · Audio Lifelogging Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical


Page 1

Audio Lifelogging - Dan Ellis 2014-01-29 1/6

1. Audio Lifelogging
2. Speech
3. Environmental Sound Analysis
4. Medical Applications

Audio Lifelogging Dan Ellis

Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA

[email protected] http://labrosa.ee.columbia.edu/

Page 2

Personal Audio Lifelogs

• Easy to record everything you hear
    ~250 GB / year @ 64 kbps

• Very hard to find anything
    how to scan? how to visualize? how to index?
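The storage figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming continuous recording 24 hours a day and decimal gigabytes (10^9 bytes); the slide states neither assumption, and the function name `yearly_storage_gb` is hypothetical.

```python
def yearly_storage_gb(bitrate_kbps, hours_per_day=24.0):
    """Storage for one year of audio at a constant bitrate, in decimal GB."""
    # Assumption: 1 kbps = 1000 bits/s and a 365-day year.
    bits_per_year = bitrate_kbps * 1000 * hours_per_day * 3600 * 365
    return bits_per_year / 8 / 1e9

# Round-the-clock recording at 64 kbps, as on the slide.
full_time_gb = yearly_storage_gb(64)
# Hypothetical variant: recording only during 16 waking hours per day.
waking_gb = yearly_storage_gb(64, hours_per_day=16)

print(f"24 h/day: {full_time_gb:.1f} GB/year")
print(f"16 h/day: {waking_gb:.1f} GB/year")
```

Under these assumptions the round-the-clock figure comes out just above 250 GB, consistent with the slide's estimate.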


Page 3

Speech Recognition

• Transcripts of discussions would be useful…
    distant-mic / noisy ASR still quite limited
    CHiME (Comp. Hearing in Multisource Env.s):


Barker et al. ’13 http://spandh.dcs.shef.ac.uk/chime_workshop/slides/CHiME13_overview.pdf

Datasets and tasks

The ‘CHiME’ noise backgrounds

Binaural noise backgrounds recorded in a family home (living room).

Plenty of sources, well-defined application domain with a learnable noise ‘vocabulary’ and ‘grammar’.

Total of 14 h of audio in 0.5 to 1.5 h sessions over several weeks.

(The 2nd ‘CHiME’ Challenge, 01/06/2013, 6 / 23)

Results

Track 2 results

[Figure: word error rate (%) versus SNR (dB), at −6, −3, 0, 3, 6 and 9 dB, for ASR Baseline (reverb), ASR Baseline (noisy), TU Tampere & KU Leuven, TU Munich, TUT, KUL & BMW, FBK-Irst & INESC-ID, and Mitsubishi Electric]

Best system: spatial enhancement, MLLT, SAT, LDA, f-bMMI, feature augmentation, bMMI noise-adaptive training, DLM and MBR decoding

(The 2nd ‘CHiME’ Challenge, 01/06/2013, 22 / 23)
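The results above are reported as word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch using word-level edit distance (substitutions, insertions and deletions divided by the reference length). The function name and the example sentences are illustrative assumptions, not taken from the CHiME evaluation tools.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)

# Hypothetical transcripts: one word of six is dropped by the recognizer.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {wer:.1f}%")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.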


Page 4

Environmental Sound Classification

• Classify soundtracks with “texture” features

• Mixed results…


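[Figure: block diagram of the texture feature pipeline. Sound → Automatic gain control → mel filterbank (18 chans), followed by three feature groups: Histogram → mean, var, skew, kurt (18 x 4); FFT → Octave bins 0.5, 1, 2, 4, 8, 16 Hz → Modulation energy (18 x 6); Envelope correlation → Cross-band correlations (318 samples)]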

Ellis, Zheng, McDermott ’11
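The three feature groups named in the diagram can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: it starts from precomputed subband envelopes (the gain control and mel filterbank stages are omitted), the band edges of half an octave around each center are my choice, and it keeps all 153 unique channel pairs rather than reproducing the slide's 318 correlation values. The function name `texture_features` and the synthetic test signal are hypothetical.

```python
import numpy as np

def texture_features(env, frame_rate):
    """env: (n_chans, n_frames) subband envelopes sampled at frame_rate Hz."""
    n_chans, n_frames = env.shape

    # Per-channel envelope moments: mean, variance, skewness, kurtosis.
    mean = env.mean(axis=1)
    centered = env - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    std = np.sqrt(var) + 1e-12
    skew = (centered ** 3).mean(axis=1) / std ** 3
    kurt = (centered ** 4).mean(axis=1) / std ** 4
    moments = np.stack([mean, var, skew, kurt], axis=1)

    # Modulation energy: envelope spectrum summed in octave-wide bands.
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    centers = np.array([0.5, 1, 2, 4, 8, 16])  # Hz, as on the slide
    mod = np.zeros((n_chans, len(centers)))
    for j, fc in enumerate(centers):
        band = (freqs >= fc / np.sqrt(2)) & (freqs < fc * np.sqrt(2))
        mod[:, j] = power[:, band].sum(axis=1)

    # Cross-band correlations: one value per unique pair of channels.
    corr = np.corrcoef(env)
    pairs = corr[np.triu_indices(n_chans, k=1)]
    return moments, mod, pairs

# Synthetic stand-in for 18 envelopes: 10 s at 100 frames/s, modulated at 4 Hz.
rng = np.random.default_rng(0)
frame_rate = 100.0
t = np.arange(1000) / frame_rate
env = 1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t) + 0.05 * rng.random((18, 1000))
moments, mod, pairs = texture_features(env, frame_rate)
print(moments.shape, mod.shape, pairs.shape)
```

With 18 channels this yields an 18 x 4 moment block and an 18 x 6 modulation block, matching the sizes on the slide, and the 4 Hz test modulation shows up as the dominant modulation band.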

Page 5

Segmentation & Clustering

• Features from 1 min windows

• Segmentation by local statistics

• Cluster across whole set

• Manual labels of each class


[Figure: segmented timelines from 09:00 to 18:00 for 2004-09-13 and 2004-09-14, with cluster labels and annotations including preschool, cafe, lecture, office, outdoor, group, lab, meeting2, L2, postlec, DSP03, compmtg, Ron, Manuel, Arroyo?, Lesser, Mike, Sambarta?]

Lee & Ellis ’04
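The steps listed on this slide can be sketched end to end on synthetic data. This is a simplified illustration and not the method of the cited work: boundaries are placed where the mean feature vector changes sharply between the 10 windows before and after a point, and segments are grouped with a small k-means. Both are stand-ins I chose for "local statistics" and "cluster across whole set". All names, thresholds and the synthetic features are assumptions.

```python
import numpy as np

def boundaries(feats, width=10, thresh=4.0):
    """Segment a (n_windows, n_dims) feature sequence at sharp mean changes."""
    n = len(feats)
    score = np.zeros(n)
    for t in range(width, n - width + 1):
        left, right = feats[t - width:t], feats[t:t + width]
        pooled = np.sqrt(0.5 * (left.var(axis=0) + right.var(axis=0))) + 1e-6
        score[t] = np.mean(np.abs(left.mean(axis=0) - right.mean(axis=0)) / pooled)
    # Keep local maxima of the change score that exceed the threshold.
    peaks = [t for t in range(1, n - 1)
             if score[t] > thresh and score[t] >= score[t - 1] and score[t] > score[t + 1]]
    return [0] + peaks + [n]

def kmeans(x, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

# Synthetic 3-hour recording: one feature vector per 1-minute window,
# visiting environment A, then B, then A again (60 minutes each).
rng = np.random.default_rng(0)
env_a, env_b = np.zeros(4), np.full(4, 5.0)
feats = np.concatenate([env_a + 0.3 * rng.standard_normal((60, 4)),
                        env_b + 0.3 * rng.standard_normal((60, 4)),
                        env_a + 0.3 * rng.standard_normal((60, 4))])

bounds = boundaries(feats)
seg_means = np.array([feats[a:b].mean(axis=0) for a, b in zip(bounds[:-1], bounds[1:])])
labels = kmeans(seg_means, k=2)
print(bounds, labels)
```

The cluster indices carry no meaning by themselves; as the last bullet notes, each cluster still needs a manual label such as "cafe" or "office".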

Page 6

Medical Applications

• Tracking behavior & symptoms
    coughing
    sleep disturbances

• Tracking hospital interactions
    correlate recordings


Page 7

Privacy

• Privacy-preserving features
    scramble audio over 200ms windows


[Figure: two spectrograms, "Original (dan+kean-ex.wav)" and "Scrambled (200ms wins over 1s)"; axes time / s (0 to 14) and freq / kHz (0 to 4), with a level / dB scale from -60 to 20]

Hearium

• Self-audio only
    augmented-reality earphones/mics
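The scrambling idea from the first bullet (200 ms windows shuffled within 1 s, as in the figure caption) can be sketched as below. This is a minimal illustration assuming non-overlapping rectangular windows and a uniformly random permutation inside each 1 s block. The slide does not specify windowing or overlap, so a real implementation would likely differ, for instance by cross-fading windows. The function name `scramble` and the test signal are hypothetical.

```python
import numpy as np

def scramble(audio, sr, win_s=0.2, block_s=1.0, seed=0):
    """Shuffle win_s-long windows within each block_s-long block of audio."""
    rng = np.random.default_rng(seed)
    win = int(round(win_s * sr))              # samples per window
    per_block = int(round(block_s / win_s))   # windows per block
    n_wins = len(audio) // win
    segs = audio[:n_wins * win].reshape(n_wins, win)

    # Build a window ordering that only permutes windows inside each block.
    order = np.arange(n_wins)
    for start in range(0, n_wins, per_block):
        stop = min(start + per_block, n_wins)
        order[start:stop] = start + rng.permutation(stop - start)

    out = audio.copy()                        # any trailing partial window is kept as is
    out[:n_wins * win] = segs[order].reshape(-1)
    return out

# Hypothetical 14 s test signal at 8 kHz; a ramp makes the reordering easy to verify.
sr = 8000
audio = np.arange(14 * sr, dtype=float)
scrambled = scramble(audio, sr)
print(len(scrambled), scrambled[:3])
```

Each 1 s block keeps exactly the same samples, so block-level statistics are unchanged while the ordering of the 200 ms windows within the block is lost.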

Page 8

Summary

• Personal Audio Lifelogs
    Too good to waste!

• Speech & Nonspeech Content
    Both useful in different ways

• Applications
    Personal information, behavior measurement
