A Brief Overview of Audio Information Retrieval

Unjung Nam
CCRMA, Stanford University

December 2000

Outline

• What is AIR?
• Motivation
• Related Fields of Research
• Elements of AIR
• Experiments and Discussion
  – Music Classification System

What is AIR?

• Audio Information Retrieval (AIR)
  – Audio information: speech, music, natural sounds, etc.
  – Goal: to develop methods for recognizing audio information

[Figure: an audio signal (speech? music? animal sound? clock alarm?) is interpreted by a human and by a computer]

What is AIR? Applications

• Speech-related retrieval
  – Recognizing and transcribing the content of radio programs, telephone conversations, and recorded meetings
• Music-related retrieval
  – Music similarity, music style classification, instrument recognition
• Other audio retrieval applications
  – Alarms, animal sounds, natural sounds, etc.

What is AIR? Muscle Fish Audio Retrieval

http://www.musclefish.com

Motivation

• The amount of multimedia information stored on computer systems is growing because of the Internet.

• Multimedia databases are classified and retrieved manually, a process that is often subjective and inaccurate when describing audio.

• Multimedia databases should instead be handled by "automatic" analysis, segmentation, indexing, and retrieval.

Related Fields of Research

• Automatic Speech Recognition
  – Overview of the process
• Audio/Video Classification
  – Video Mail Retrieval
• Computer Vision
  – Image Retrieval, Face Recognition
• Multimedia Database Management

Automatic Speech Recognition: Overview

Audio/Video Classification: Video Mail Retrieval

Computer Vision: Image Retrieval

Computer Vision: Face Recognition

Multimedia Database Management: Characteristics of Multimedia Information Retrieval

• Content-based retrieval
• Automatic indexing
• Similarity matching
  – Similar content may have different representations
  – Data filtering rather than exact matching (data selection)
• Browsing and relevance feedback
  – There is no ideal mathematical model for defining similarity, so human feedback is required

Elements of AIR

[Figure: block diagram of the two stages of an AIR system]

Building the classification database: Audio Signal → Feature Extraction → Feature Classification/Clustering → Classification Model Space

Retrieval of best-matching audio: Audio Input → Feature Extraction → Projection to Model Space → Find Matching Model Space → Retrieval of Best Match

Feature Extraction

• Time-domain feature modules
  – Short-Time Energy and Average Magnitude
  – Short-Time Average Zero-Crossing Rate
  – Linear Prediction
  – Pulse metric, etc.
• Spectral-domain feature modules
  – STFT
  – Spectral Centroid
  – Harmony Analysis
  – MFCC
  – Constant Q, etc.
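All of the short-time features listed above are computed frame by frame after preprocessing and windowing. A minimal Python sketch of that framing step follows. It assumes a rectangular window (samples copied unchanged) and a hop size that defaults to the frame length; the name `frame_signal` and the `hop` parameter are illustrative, not taken from the original system.

```python
def frame_signal(x, frame_len, hop=None):
    """Split a sequence of samples into frames of frame_len samples.

    Rectangular window: samples are copied unchanged.
    hop (assumed) defaults to frame_len, i.e. non-overlapping frames.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    hop = frame_len if hop is None else hop
    if hop <= 0:
        raise ValueError("hop must be positive")
    return [list(x[start:start + frame_len])
            for start in range(0, len(x) - frame_len + 1, hop)]


if __name__ == "__main__":
    samples = list(range(10))
    print(frame_signal(samples, 4))          # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(len(frame_signal(samples, 4, 2)))  # 4 overlapping frames
```

Overlapping frames (hop smaller than the frame length) give a smoother feature track at the cost of more computation.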

Classification/Clustering Methods

• Deterministic Methods
  – Minimum Distance Classifier
  – k-Nearest Neighbor (k-NN)
  – Discriminant Functions
  – Generalized Discriminators, etc.
• Statistical Methods
  – Class-Related Probability Functions
  – Minimum Error Classification
  – Likelihood-Based MAP Classification
  – Approximating a Bayes Classifier
  – Parameterization and Probability Estimation: Hidden Markov Model, …
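Of the deterministic methods, k-NN is the simplest to illustrate. A minimal Python sketch follows. It assumes feature vectors are tuples of numbers, uses Euclidean distance, and takes a majority vote among the k nearest labelled samples; `knn_classify` and its tie-breaking rule are illustrative choices, not part of the original system.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k=3):
    """Return the majority label among the k samples nearest to query.

    Distance is Euclidean. If several labels tie in the vote, the one
    whose member appears first in distance order wins (an assumed rule).
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("need a non-empty, equal-length samples/labels pair")
    ranked = sorted(
        ((math.dist(query, s), lab) for s, lab in zip(samples, labels)),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    votes = Counter(lab for _, lab in nearest)
    top = max(votes.values())
    for _, lab in nearest:  # closest label among the tied winners
        if votes[lab] == top:
            return lab


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors with made-up genre labels.
    pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labs = ["classic"] * 3 + ["pop/rock"] * 3
    print(knn_classify((0.5, 0.5), pts, labs))  # classic
    print(knn_classify((5.2, 5.2), pts, labs))  # pop/rock
```

With k = 1 this reduces to a nearest-neighbour rule; replacing the samples by one mean vector per class gives a minimum distance classifier.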

Experiments: Music Classification System

Genre category  File No.  Filename        Duration (secs.)  MATLAB graphic notation
Jazz            1         dance.wav       11                'go'
Jazz            2         rocky.wav       13                'gx'
Jazz            3         band.wav        46                'g+'
Jazz            4         queen.wav       3                 'gd'
Pop/Rock        5         susqnet.wav     16                '*'
Pop/Rock        6         pop.wav         26                'v'
Pop/Rock        7         latin.wav       26                '*'
Pop/Rock        8         latin2.wav      16                '+'
Pop/Rock        9         reggae.wav      26                'd'
Pop/Rock        10        reggae2.wav     26                '^'
Classic         11        phantom.wav     6                 'ro'
Classic         12        quartet4.wav    10                'rv'
Classic         13        quartet3.wav    7                 'rx'
Classic         14        quartet2.wav    8                 'r*'
Classic         15        quartet1.wav    10                'r^'
Classic         16        musicnight.wav  14                'r<'
Classic         17        angel.wav       9                 'r>'
Classic         18        piacel.wav      8                 'r+'
Classic         19        piavio.wav      7                 'rx'
Classic         20        piavio1.wav     7                 'rs'
Test signal     –         b3.wav          7                 'k.'

Experiments: Feature Modules

• Spectral Centroid
  – Center of gravity of the spectrum; relates to the brightness of a sound

[Figure: Sound File → frames → Preprocessing & Windowing → STFT → Spectral Centroid]

  – The centroid of an individual spectral frame is defined as the average frequency weighted by amplitudes, divided by the sum of the amplitudes:

$$\text{Spectral Centroid} = \frac{\sum_{k} k\,F[k]}{\sum_{k} F[k]}$$

  – Here, F[k] is the amplitude corresponding to bin k in the DFT spectrum.
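The centroid formula can be sketched per frame in Python. This is a minimal illustration, not the original MATLAB code. It assumes a single real-valued frame, uses a naive O(N²) DFT from the standard library instead of an FFT, keeps only bins 0 to N/2, and optionally converts the bin-index centroid to Hz via k·f_s/N. `dft_magnitudes` and `spectral_centroid` are hypothetical names.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Amplitudes F[k] = |DFT(frame)[k]| for bins k = 0 .. N//2 (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate=None):
    """Amplitude-weighted mean bin: sum(k * F[k]) / sum(F[k]).

    Returns a bin index, or Hz if sample_rate is given (bin k -> k * sr / N).
    A silent frame returns 0.0 to avoid dividing by zero (an assumption).
    """
    mags = dft_magnitudes(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    centroid_bin = sum(k * a for k, a in enumerate(mags)) / total
    if sample_rate is None:
        return centroid_bin
    return centroid_bin * sample_rate / len(frame)


if __name__ == "__main__":
    n = 32
    tone = [math.cos(2 * math.pi * 4 * m / n) for m in range(n)]  # all energy in bin 4
    print(round(spectral_centroid(tone), 6))        # 4.0
    print(round(spectral_centroid(tone, 8000), 3))  # 1000.0
```

For real audio an FFT routine would replace the naive DFT; the centroid is the same up to rounding.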

Experiments: Spectral Centroid

[Figure panels: Pop/Rock; Classic]

Experiments: Spectral Centroid

The weighted-average spectral centroid of each frame in the 20 sound files (green: jazz; blue: pop/rock; red: classical).

• Pop/rock samples have higher centroids than classical samples.

• The centroids of the classical samples fluctuate considerably.

• Short-Time Energy Function
  – Amplitude variation over time
  – Carries rhythm and periodicity information

[Figure: Sound File → frames → Preprocessing & Windowing → Energy Change]

$$E_n = \frac{1}{N}\sum_{m}\left[x(m)\,w(n-m)\right]^2, \qquad w(m) = \begin{cases} 1, & 0 \le m \le N-1, \\ 0, & \text{otherwise.} \end{cases}$$

  – Here x(m) is the discrete-time audio signal, n is the time index of the short-time energy, and w(m) is a rectangular window of length N.
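The short-time energy with a rectangular window is simply the mean of the squared samples in each window. A minimal Python sketch of that formula follows. It assumes E_n is evaluated once every `hop` samples (default: non-overlapping windows), which the slides do not specify; `short_time_energy` is a hypothetical name.

```python
def short_time_energy(x, frame_len, hop=None):
    """Short-time energy E_n = (1/N) * sum_m [x(m) w(n-m)]^2.

    With the rectangular window w(m) = 1 for 0 <= m <= N-1, E_n is the mean
    of x(m)^2 over the N samples m = n-N+1 .. n. E_n is evaluated every hop
    samples (assumed; default hop = N, non-overlapping windows).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    hop = frame_len if hop is None else hop
    if hop <= 0:
        raise ValueError("hop must be positive")
    energies = []
    for n in range(frame_len - 1, len(x), hop):
        window = x[n - frame_len + 1:n + 1]  # the N samples where w(n-m) = 1
        energies.append(sum(s * s for s in window) / frame_len)
    return energies


if __name__ == "__main__":
    print(short_time_energy([1, 1, 1, 1, 2, 2, 2, 2], 4))  # [1.0, 4.0]
```

Plotting the returned list over time gives the kind of "energy change" curve used in the experiments below.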

Experiments: Short-Time Energy Function

[Figure: short-time energy of Jazz: dance.wav]

Experiments: Short-Time Energy Function

The pop/rock samples show the most strongly fluctuating energy, while the classical samples show relatively stable energy. The energy of the jazz samples fluctuates moderately.

• Short-Time Average Zero-Crossing Rate
  – The zero-crossing rate (ZCR) measures how often the signal crosses zero per unit time.
  – A zero crossing occurs when successive samples have different signs.
  – The rate at which zero crossings occur is a simple measure of the frequency content of a signal.

$$Z_n = \frac{1}{2}\sum_{m}\left|\operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)]\right| w(n-m),$$

where

$$\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0, \\ -1, & x(n) < 0. \end{cases}$$

  – w(n) is a rectangular window of length N.
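With a rectangular window of ones, each sign change contributes |±2|/2 = 1 to Z_n, so Z_n counts the zero crossings in the window. A minimal Python sketch of the formula follows. It assumes evaluation starts at n = N (so that x(m−1) exists for every term) and advances by `hop` samples; `sgn` and `short_time_zcr` are hypothetical names.

```python
def sgn(v):
    """Sign as defined on the slide: +1 for v >= 0, -1 for v < 0."""
    return 1 if v >= 0 else -1


def short_time_zcr(x, frame_len, hop=None):
    """Z_n = (1/2) * sum_m |sgn[x(m)] - sgn[x(m-1)]| w(n-m), rectangular w.

    Returns the number of zero crossings among the N samples ending at n;
    divide by frame_len for a per-sample rate. Evaluation starts at n = N so
    that x(m-1) always exists, and n advances by hop (both assumptions;
    default hop = N).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    hop = frame_len if hop is None else hop
    if hop <= 0:
        raise ValueError("hop must be positive")
    rates = []
    for n in range(frame_len, len(x), hop):
        total = sum(abs(sgn(x[m]) - sgn(x[m - 1]))
                    for m in range(n - frame_len + 1, n + 1))
        rates.append(total // 2)  # each crossing adds exactly 2 to total
    return rates


if __name__ == "__main__":
    alternating = [1, -1] * 5              # sign flips at every sample
    print(short_time_zcr(alternating, 4))  # [4, 4]
```

A high count suggests strong high-frequency content in the window; a low count suggests mostly low frequencies.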

Experiments: Short-Time Average ZCR

[Figure: short-time average ZCR of Classic: quartet4.wav]

Experiments: Short-Time Average ZCR

The ZCR does not appear to distinguish among the three music genres.

Experiments: Classification/Clustering Methods

• K-means clustering algorithm
  – K-means cluster analysis programs begin by creating K clusters according to some arbitrary procedure.
  – The program then calculates the mean, or centroid, of each cluster.
  – If an observation is closer to the centroid of another cluster, it is made a member of that cluster.
  – The last two steps repeat until no observation changes cluster.

• K-Nearest Neighbour (KNN) classifier
  – Given a set of labelled sample data, each point in the feature space is classified by evaluating its k nearest sample points.
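The k-means steps above can be sketched in plain Python (Lloyd's algorithm). This is a minimal illustration under stated assumptions: the "arbitrary procedure" for the initial clusters is taken to be k points drawn at random with a fixed seed, distances are Euclidean, and an empty cluster keeps its previous centroid. `kmeans` is a hypothetical name, not the program used in the experiments.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, assignments), where assignments[i] is the cluster
    index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # arbitrary start
    assignments = None
    for _ in range(iterations):
        # Move every observation to the cluster with the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no observation changed cluster: converged
        assignments = new_assignments
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, k=2)
    print(labels)
```

Because the result can depend on the initial clusters, k-means is often run several times with different seeds and the tightest clustering kept.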

[Figure panels: K-means clustering; KNN classifier]

Feature vectors of the 20 input files are extracted frame by frame and plotted (red: classical; blue: pop/rock; green: jazz).

First figure: spectral centroid (x-axis) vs. short-time energy (y-axis). Second figure: short-time energy (x-axis) vs. short-time ZCR (y-axis). Third figure: all three features in 3-dimensional space.

The means of the feature vectors of the 20 music samples are plotted in 3-dimensional space.

The feature vectors of the test signal b3.wav are plotted as black dots in 2-dimensional space.

The nearest neighbours are determined using Euclidean distance. Each of the 20 sound-sample means is given a class label, an index from 1 to 20, and each feature vector of the test signal is assigned to the nearest of these 20 means. For the test signal in the figure above, the largest number of feature vectors was assigned to class 13. The MATLAB output below shows the result.

Test result: 13. quartet3.wav

>> classfier

class is 13

>> classPoint

classPoint =

27 % 27 feature vectors assigned to the 13th class
5

1222007
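The procedure above (each test frame assigned to the nearest of the 20 file means, then a count of frames per class) amounts to a minimum distance classifier with a frame-level majority vote. A minimal Python sketch of that idea follows. It assumes each frame has already been reduced to a tuple of features such as (spectral centroid, short-time energy, ZCR); `mean_vector` and `classify_by_frame_votes` are hypothetical names, not the original MATLAB `classfier` script, and the demo data are made up.

```python
import math
from collections import Counter


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors (one file's model)."""
    return tuple(sum(vals) / len(vectors) for vals in zip(*vectors))


def classify_by_frame_votes(test_frames, class_means):
    """Assign every test frame to the nearest class mean (Euclidean distance).

    Returns (winning class, votes): classes are numbered from 1 to match the
    file numbers, and votes counts the frames assigned to each class.
    """
    if not test_frames or not class_means:
        raise ValueError("need at least one test frame and one class mean")
    votes = Counter()
    for frame in test_frames:
        nearest = min(range(len(class_means)),
                      key=lambda i: math.dist(frame, class_means[i]))
        votes[nearest + 1] += 1
    winner = votes.most_common(1)[0][0]
    return winner, votes


if __name__ == "__main__":
    # Hypothetical (centroid, energy, ZCR) frames for two training files.
    file1 = [(1.0, 1.0, 1.0), (1.0, 1.0, 3.0)]
    file2 = [(5.0, 5.0, 5.0), (5.0, 5.0, 7.0)]
    means = [mean_vector(file1), mean_vector(file2)]
    test = [(1.2, 1.0, 2.0), (4.8, 5.0, 6.0), (1.0, 1.0, 1.5)]
    winner, votes = classify_by_frame_votes(test, means)
    print(winner, dict(votes))  # 1 {1: 2, 2: 1}
```

Because each file is represented only by its mean, the spread of its feature vectors is ignored, which matches the limitation noted in the discussion.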

Experiments: Discussion

• Although the test signal and quartet3.wav do not fall into the same category, they sounded similar in terms of rhythm and tempo. This suggests that the system does not capture timbre information effectively.

• The number of feature modules in this system is limited.

• The variance of the feature vectors is not taken into account.

• More samples are needed for further experiments.
