A Brief Overview of Audio Information Retrieval

Unjung Nam, CCRMA, Stanford University

December 2000 by Unjung Nam 2

Outline

• What is AIR?
• Motivation
• Related Field of Research
• Elements of AIR
• Experiments and discussion
  – Music Classification System

What is AIR?

• Audio Information Retrieval (AIR):
  – Audio information: speech, music, natural sounds, etc.
  – To develop various methods in order to recognize the audio information

[Figure: diagram labeled Human, Audio, and Computer, with the questions: Speech? Music? Animal sound? Clock alarm? ...]

What is AIR? Applications

• Speech-related retrieval
  – Recognizing and transcribing the content of radio programs, telephone conversations, recorded meetings
• Music-related retrieval
  – Music similarity, music style classification, instrument recognition
• Other audio retrieval applications
  – Alarms, animal sounds, natural sounds, etc.

What is AIR? Muscle Fish Audio Retrieval

http://www.musclefish.com

Motivation

• The amount of multimedia information stored on computer systems is increasing because of the Internet.
• Multimedia databases are classified and retrieved through a manual process, which is often subjective and inaccurate when describing audio.
• Multimedia databases should be handled by methods of “automatic” analysis, segmentation, indexing, and retrieval.

Related Field of Research

• Automatic Speech Recognition
  – Overview of the process
• Audio/Video Classification
  – Video Mail Retrieval
• Computer Vision
  – Image Retrieval, Face Recognition
• Multimedia Database Management

Automatic Speech Recognition: Overview

Audio/Video Classification: Video Mail Retrieval

Computer Vision: Image Retrieval

Computer Vision: Face Recognition

Multimedia Database Management: Characteristics of Multimedia Information Retrieval

• Content-based retrieval
• Automatic indexing
• Similarity matching
  – Similar content may have different representations
  – Data filtering rather than exact matching (data selection)
• Browsing and relevance feedback
  – No ideal mathematical model for defining similarity; human feedback is required

Elements of AIR

Building Classification Database:
Audio Input → Feature Extraction → Feature Classification/Clustering → Classification Model Space

Retrieval of Best Matching Audio:
Audio Signal → Feature Extraction → Projection to Model Space → Find Matching Model Space → Retrieval of Best Match

Feature Extraction

• Time domain feature modules
  – Short-Time Energy and Average Magnitude
  – Short-Time Average Zero-Crossing Rate
  – Linear Prediction
  – Pulse metric, etc.
• Spectral domain feature modules
  – STFT
  – Spectral Centroid
  – Harmony Analysis
  – MFCC
  – Constant Q, etc.

Classification/Clustering Methods

• Deterministic Methods
  – Minimum Distance Classifier
  – k-nearest neighbor (k-NN)
  – Discriminant functions
  – Generalized Discriminators, etc.
• Statistical Methods
  – Class-related Probability Functions
  – Minimum Error Classification
  – Likelihood-based MAP Classification
  – Approximating a Bayes Classifier
  – Parameterization and Probability Estimation: Hidden Markov Model, etc.

Experiments: Music Classification System

Genre category   File No.   Filename         Duration (secs.)   MATLAB graphic notation
Jazz             1          dance.wav        11                 ‘go’
Jazz             2          rocky.wav        13                 ‘gx’
Jazz             3          band.wav         46                 ‘g+’
Jazz             4          queen.wav        3                  ‘gd’
Pop/Rock         5          susqnet.wav      16                 ‘*’
Pop/Rock         6          pop.wav          26                 ‘v’
Pop/Rock         7          latin.wav        26                 ‘*’
Pop/Rock         8          latin2.wav       16                 ‘+’
Pop/Rock         9          reggae.wav       26                 ‘d’
Pop/Rock         10         reggae2.wav      26                 ‘^’
Classic          11         phantom.wav      6                  ‘ro’
Classic          12         quartet4.wav     10                 ‘rv’
Classic          13         quartet3.wav     7                  ‘rx’
Classic          14         quartet2.wav     8                  ‘r*’
Classic          15         quartet1.wav     10                 ‘r^’
Classic          16         musicnight.wav   14                 ‘r<’
Classic          17         angel.wav        9                  ‘r>’
Classic          18         piacel.wav       8                  ‘r+’
Classic          19         piavio.wav       7                  ‘rx’
Classic          20         piavio1.wav      7                  ‘rs’
Test Signal                 b3.wav           7                  ‘k.’

Experiments: Feature Modules

• Spectral Centroid
  – Center of gravity of the spectrum: brightness of a sound

[Diagram: Sound File Input → Preprocessing & Windowing → frame → STFT → Spectral Centroid (N values)]

  – The individual centroid of a spectral frame is defined as the average frequency weighted by amplitudes, divided by the sum of the amplitudes, or:

$$\text{Spectral Centroid} = \frac{\sum_{k=1}^{N} k \, F[k]}{\sum_{k=1}^{N} F[k]}$$

  – Here, F[k] is the amplitude corresponding to bin k in the DFT spectrum.
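The centroid definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB code used in the experiments: the helper names `dft_magnitudes` and `spectral_centroid` are hypothetical, the DFT is a naive one over the one-sided spectrum, and windowing is left out.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes F[k] of a naive DFT for bins k = 0 .. len(frame) // 2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                  for m, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_centroid(mags):
    """Amplitude-weighted mean bin index: sum(k * F[k]) / sum(F[k]), k >= 1."""
    bins = range(1, len(mags))  # k = 0 (DC) is skipped, as the sum starts at 1
    total = sum(mags[k] for k in bins)
    if total == 0:
        return 0.0  # silent frame: centroid undefined, 0.0 is an arbitrary choice
    return sum(k * mags[k] for k in bins) / total


# A pure sinusoid that completes 4 cycles in a 32-sample frame.
frame = [math.sin(2 * math.pi * 4 * m / 32) for m in range(32)]
centroid = spectral_centroid(dft_magnitudes(frame))
```

For this test frame all the energy sits in bin 4, so the centroid comes out very close to 4. The result is in bin units; multiplying by the sampling rate divided by the frame length would convert it to Hz.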

Experiments: Spectral Centroid

[Figure: spectral centroid plots in three panels: Jazz, Pop/Rock, Classic]

Experiments: Spectral Centroid

[Figure: plot with y-axis “Spectral Centroid”]

The weighted average spectral centroid of each frame in the 20 sound files.

Green: jazz
Blue: pop/rock
Red: classic

• Pop/rock is higher than classical
• Classical fluctuates a lot

Experiments: Feature Modules

• Short-Time Energy Function
  – Amplitude variation over time
  – Rhythm and periodicity information

[Diagram: Sound File Input → Preprocessing & Windowing → frame → Energy Change (N values)]

$$E_n = \frac{1}{N} \sum_{m} \left[ x(m) \, w(n - m) \right]^2$$

  – Here x(m) is the discrete-time audio signal, n is the time index of the short-time energy, and w(m) is a rectangular window:

$$w(x) = \begin{cases} 1, & 0 \le x \le N - 1, \\ 0, & \text{otherwise.} \end{cases}$$
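As a rough sketch of the short-time energy definition on this slide, the Python below evaluates E_n directly from the sum with a rectangular window of length N. The function names and the `hop` parameter (how far n advances between evaluations) are assumptions made for illustration and do not come from the slides.

```python
def short_time_energy(x, n, N):
    """E_n = (1/N) * sum_m [x(m) * w(n - m)]^2 with a rectangular window.

    w(n - m) is 1 only when 0 <= n - m <= N - 1, so the sum runs over
    m = n - N + 1 .. n (clipped to the valid range of the signal).
    """
    start = max(0, n - N + 1)
    end = min(n, len(x) - 1)
    return sum(x[m] ** 2 for m in range(start, end + 1)) / N


def energy_contour(x, N, hop):
    """Short-time energy at n = N - 1, N - 1 + hop, ... (hop is an assumption)."""
    return [short_time_energy(x, n, N) for n in range(N - 1, len(x), hop)]


signal = [1.0, -1.0, 2.0, 0.0, 3.0, -3.0]
contour = energy_contour(signal, N=2, hop=2)
```

With N = 2 and a hop of 2, the contour for this toy signal is [1.0, 2.0, 9.0]. The rise in the last frame is the kind of amplitude variation the feature is meant to expose.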

Experiments: Short-Time Energy Function

Jazz: dance.wav

Experiments: Short-Time Energy Function

The pop/rock music samples show the most fluctuating energy, while the classical music samples show stable energy. The energy changes of the jazz samples seem to show medium fluctuation.

Experiments: Feature Modules

• Short-Time Average Zero-Crossing Rate
  – Zero-Crossing Rate (ZCR) is a measure of how often the signal crosses zero per unit time.
  – A zero-crossing occurs if successive samples have different signs.
  – The rate at which zero-crossings occur is a simple measure of the frequency content of a signal.

$$Z_n = \frac{1}{2} \sum_{m} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m - 1)] \right| \, w(n - m)$$

where

$$\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0, \\ -1, & x(n) < 0, \end{cases}$$

  – w(n) is a rectangular window of length N.
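The zero-crossing sum above can be written out directly. The sketch below is an illustration only: `sgn` and `short_time_zcr` are hypothetical names, and the result is the count of crossings inside the window exactly as the formula gives it, with no extra normalization by N.

```python
def sgn(value):
    """Sign convention from the slide: 1 for value >= 0, otherwise -1."""
    return 1 if value >= 0 else -1


def short_time_zcr(x, n, N):
    """Z_n = 1/2 * sum_m |sgn[x(m)] - sgn[x(m-1)]| * w(n - m).

    The rectangular window keeps m = n - N + 1 .. n; m starts at 1 at the
    earliest because x(m - 1) must exist.
    """
    start = max(1, n - N + 1)
    end = min(n, len(x) - 1)
    return 0.5 * sum(abs(sgn(x[m]) - sgn(x[m - 1]))
                     for m in range(start, end + 1))


alternating = [1.0, -1.0, 1.0, -1.0, 1.0]
steady = [1.0, 2.0, 3.0, 2.0, 1.0]
z_alt = short_time_zcr(alternating, n=4, N=4)
z_steady = short_time_zcr(steady, n=4, N=4)
```

Each sign change contributes |1 - (-1)| = 2 to the sum, so the factor 1/2 turns it into one crossing. The alternating signal yields 4.0 and the all-positive signal yields 0.0.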

Experiments: Short-Time Average ZCR

Classic: quartet4.wav

Experiments: Short-Time Average ZCR

The short-time average ZCR does not seem to give any indication for classifying the three different music genres.

Experiments: Classification/Clustering Methods

• K-means clustering algorithm
  – K-means cluster analysis programs begin by creating the K clusters according to some arbitrary procedure.
  – The program calculates the means or centroids of each of the clusters.
  – If one of the observations is closer to the centroid of another cluster, then the observation is made a member of that cluster.
• K-Nearest Neighbour Classifier (KNN)
  – Classifies a feature space with a given set of sample data by evaluating the k nearest sample points of each point in the feature space.
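Both methods on this slide can be sketched compactly in Python. The code below is a minimal illustration under assumptions, not the implementation used in the experiments: the function names are hypothetical, the "arbitrary procedure" for the initial clusters is taken to be a round-robin assignment, an empty cluster falls back to an arbitrary point, and KNN ties are broken by whichever label was seen first among the neighbours.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def k_means(points, k, max_iter=100):
    """Reassign points to the nearest centroid until no label changes."""
    # Arbitrary initial partition (assumption): point i goes to cluster i mod k.
    labels = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Empty cluster: reuse an arbitrary point as its centroid (assumption).
            centroids.append(mean(members) if members else list(points[c]))
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def knn_classify(query, samples, sample_labels, k=3):
    """Label a query point by majority vote among its k nearest samples."""
    order = sorted(range(len(samples)),
                   key=lambda i: euclidean(query, samples[i]))
    votes = Counter(sample_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
cluster_labels, cluster_centroids = k_means(points, k=2)

samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11]]
sample_labels = ["a", "a", "a", "b", "b"]
near_origin = knn_classify([0.5, 0.5], samples, sample_labels, k=3)
near_far = knn_classify([9, 9], samples, sample_labels, k=3)
```

The main difference shows up in the inputs: `k_means` needs no labels and discovers the two groups on its own, while `knn_classify` relies on labeled sample data, as in the genre table earlier in the deck.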

Experiments: Classification/Clustering Methods

[Figure: two panels: K-means clustering; KNN classifier]

Experiments: Classification/Clustering Methods

Feature vectors of the 20 input files are extracted in each frame and plotted.
Red: classical
Blue: pop/rock
Green: jazz

The first figure: spectral centroid on the x-axis and short-time energy on the y-axis. The second: short-time energy on the x-axis and short-time ZCR on the y-axis. The third figure: 3-dimensional space.

Experiments: Classification/Clustering Methods

The means of the feature vectors of the 20 music samples are plotted in 3-dimensional space.

Experiments: Classification/Clustering Methods

The feature vectors of the test signal b3.wav are plotted as black dots in 2-dimensional space.

Experiments: Classification/Clustering Methods

The nearest neighbours are determined using Euclidean distance. The mean feature vector of each of the 20 sound samples is given a predicted class label, an index from 1 to 20. Each feature vector of the test signal is assigned to the nearest of the 20 means. For the test signal in the figure above, the greatest number of feature vectors was assigned to class 13. The following shows the result in MATLAB.

Test 13. Quartet3.wav

>> classfier

ans =

class is 13

>> classPoint

classPoint =

16556

14002

12326

27 % 27 feature vectors assigned to the 13th class
5

1222007

>>
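The voting procedure described on this slide (assign each test feature vector to its nearest class mean, then pick the class with the most votes) might look roughly like the Python sketch below. It is not the MATLAB code behind the output above: `classify_by_votes` is a hypothetical name, the toy data are invented for illustration, and ties go to the lowest class index.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify_by_votes(test_vectors, class_means):
    """Return (winning class index, votes per class), classes numbered from 1."""
    counts = [0] * len(class_means)
    for vec in test_vectors:
        nearest = min(range(len(class_means)),
                      key=lambda i: euclidean(vec, class_means[i]))
        counts[nearest] += 1
    # max() keeps the first maximum, so ties go to the lowest index (assumption).
    best = max(range(len(counts)), key=lambda i: counts[i])
    return best + 1, counts


# Toy data: three class means and four feature vectors from one test signal.
class_means = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
test_vectors = [[4.0, 4.0], [5.0, 6.0], [9.0, 9.0], [0.0, 1.0]]
predicted, votes = classify_by_votes(test_vectors, class_means)
```

Here two of the four vectors fall nearest to the second mean, so the sketch returns class 2 with the vote counts [1, 2, 1]. This plays the same role as the class index and per-class counts in the MATLAB session.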

Experiments: Discussion

• Although the test signal and quartet3.wav do not fall into the same category, they sounded similar in terms of rhythm and tempo information. It seems that this system does not effectively classify the timbre information.

• The number of feature modules is limited in this system.

• The variance of the feature vectors is not considered.

• More samples are needed for further experiments.