

Improvement of Audibility for Multi Speakers with the Head Related Transfer Function

Takanori Nishino†, Kazuhiro Uchida, Naoya Inoue, Kazuya Takeda and Fumitada Itakura
Center for Information Media Studies/CIAIR, Nagoya University

Introduction

When two speakers talk at the same time, a listener using headphones has difficulty distinguishing the speakers and following what each one says. In a real environment, however, a listener can pick out the target speech even when there are many speakers; this is called the cocktail party effect. Sound localization is considered to contribute to the cocktail party effect, so for headphone listening it is necessary to control the perceived directions of the speakers. Sound localization over headphones can be controlled with the head related transfer function (HRTF). The HRTF is the acoustical transfer function between the sound source and the entrance of the ear canal. Because the HRTF depends on the subject and on the source direction, HRTFs would have to be measured for every subject and every direction. Measuring HRTFs is costly and time consuming, so it is necessary to estimate the HRTFs for the listener instead. In this study, we use auditory masking to investigate the audible angle when there are two speakers, and we estimate suitable HRTFs for the listener.

Experimental Conditions (Experiment 2)

• HRTFs and anthropometric measures of the right ear

• Evaluated bandwidth: 0.0 kHz up to 4.0 kHz, 8.0 kHz, 12.0 kHz, 16.0 kHz, 20.0 kHz, 22.0 kHz, or 24.0 kHz

• The first through fifth PC weights are used

• The multiple regression model is built from the data of 70 subjects (Measured)

• The HRTFs of 8 subjects are estimated from their anthropometric measures and the multiple regression model (Unmeasured)

• Experiments are performed under 71 mutually different conditions

• Evaluation by an objective measure (Spectral Distortion): if a small SD is obtained, the estimated HRTF is similar to the measured HRTF

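Experiment 2: Generating the HRTFs

The Analysis of the Magnitude Response

The magnitude response of the HRTF is decomposed by principal component analysis:

$$ H_k = \sum_n w_k[n] \, v_n $$

where $H_k$ is the HRTF, $w_k[n]$ is the PC weight, and $v_n$ is the basis function.

The PC weights are related to the size of the head and the ear by multiple regression analysis:

$$ \hat{w}_k[n] = \beta_{0,n} + \sum_l \beta_{l,n} \, x_{l,k} $$

where $\hat{w}_k[n]$ is the PC weight, $\beta_{l,n}$ is the regression coefficient, and $x_{l,k}$ is the size of the head and the ear.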
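The two-stage analysis (principal component analysis of the magnitude responses, then multiple regression from the anthropometric measures to the PC weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names are hypothetical, the magnitude responses are assumed to be in dB, and the mean response is removed before the PCA. The sizes follow the poster: 70 measured subjects, 8 unmeasured subjects, 9 anthropometric measures, and the first five PC weights.

```python
import numpy as np

def fit_pca(X, n_components):
    """PCA via SVD. X has shape (subjects, frequency bins)."""
    mean = X.mean(axis=0)                      # assumption: remove the mean response
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]                  # basis functions v_n, one per row
    weights = (X - mean) @ basis.T             # PC weights w_k[n]
    return mean, basis, weights

def fit_regression(A, W):
    """Least-squares fit of W ~ [1, A] @ B (intercept plus one term per measure)."""
    A1 = np.hstack([np.ones((A.shape[0], 1)), A])
    B, *_ = np.linalg.lstsq(A1, W, rcond=None)
    return B                                   # shape (measures + 1, n_components)

def estimate_hrtf(A_new, B, mean, basis):
    """Predict PC weights from anthropometric measures, then rebuild the response."""
    A1 = np.hstack([np.ones((A_new.shape[0], 1)), A_new])
    return mean + (A1 @ B) @ basis

# Synthetic stand-in data (not measured HRTFs).
rng = np.random.default_rng(0)
n_train, n_test, n_measures, n_bins = 70, 8, 9, 257
A = rng.normal(size=(n_train + n_test, n_measures))
H_db = A @ rng.normal(size=(n_measures, n_bins))
H_db += 0.1 * rng.normal(size=H_db.shape)

mean, basis, W = fit_pca(H_db[:n_train], n_components=5)
B = fit_regression(A[:n_train], W)
H_hat = estimate_hrtf(A[n_train:], B, mean, basis)   # "unmeasured" subjects
print(H_hat.shape)
```

Fitting one regression per PC weight keeps the model small: with five weights and nine measures there are 50 coefficients in total, regardless of the number of frequency bins.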

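The Analysis of the Initial Delay

The initial delays of the HRTF for all azimuths form one vector, which is decomposed by principal component analysis:

$$ D_k = [D_k(0), D_k(5), \ldots, D_k(355)] $$

$$ D_k = \sum_n w_k[n] \, v_n $$

where $D_k$ is the delay of the HRTF, $w_k[n]$ is the PC weight, and $v_n$ is the basis function.

The PC weights are related to the size of the head and the ear by multiple regression analysis:

$$ \hat{w}_k[n] = \beta_{0,n} + \sum_l \beta_{l,n} \, x_{l,k} $$

where $\hat{w}_k[n]$ is the PC weight, $\beta_{l,n}$ is the regression coefficient, and $x_{l,k}$ is the size of the head and the ear.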

Purpose (Experiment 2)

The HRTF arises because a sound wave is reflected and diffracted by the head and the ears, so it is related to the physical size of the head and the ear.

The relation between the HRTFs and the physical sizes of the head and the ear is investigated by multiple regression analysis. Suitable HRTFs are then estimated with the multiple regression model.

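Spectral Distortion (SD)

$$ \mathrm{SD} = \sqrt{ \frac{1}{I} \sum_{i=1}^{I} \left( 20 \log_{10} \frac{|H[f_i]|}{|\hat{H}[f_i]|} \right)^2 } \quad [\mathrm{dB}] $$

where $H[f_i]$ is the measured HRTF and $\hat{H}[f_i]$ is the estimated HRTF.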
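The spectral distortion can be computed directly from a pair of impulse responses. The sketch below is one possible implementation, assuming the SD is the root mean square of the dB difference between the two magnitude spectra over the evaluated bandwidth. The function name, the `f_max` parameter, and the small `eps` guard against a logarithm of zero are assumptions added for illustration.

```python
import numpy as np

def spectral_distortion(h_measured, h_estimated, fs=48000.0, f_max=None, eps=1e-12):
    """RMS difference in dB between two magnitude spectra, from 0 Hz up to f_max."""
    n = len(h_measured)
    H = np.abs(np.fft.rfft(h_measured, n))
    H_hat = np.abs(np.fft.rfft(h_estimated, n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Keep only the bins inside the evaluated bandwidth (all bins if f_max is None).
    mask = np.ones(freqs.shape, dtype=bool) if f_max is None else freqs <= f_max
    diff_db = 20.0 * np.log10((H[mask] + eps) / (H_hat[mask] + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Toy check with a unit impulse (512 points, as in the measurement conditions).
h = np.zeros(512)
h[0] = 1.0
print(spectral_distortion(h, h))                      # identical responses
print(spectral_distortion(h, 0.5 * h, f_max=4000.0))  # a flat 2:1 level difference
```

Identical responses give an SD of zero, and a uniform level difference of a factor of two gives about 6 dB at every bandwidth, which is a convenient sanity check before applying the measure to real HRTFs.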

Results (Experiment 2): Magnitude Response

[Figure: Spectral Distortion [dB] (axis 0 to 7) versus Frequency [kHz] (4, 8, 12, 16, 20, 24), plotted for Measured Subjects and Unmeasured Subjects.]


Measurement

Measurement Conditions

Subject: 78 subjects (63 males, 15 females) and 1 Head and Torso Simulator

Directions of sound source: 72 azimuths

Sampling frequency: 48.0 kHz

Duration of the HRTF: 10.7 ms (512 points)

Anthropometric Measures (Burkhard and Sachs, 1975)

1 Ear Length

2 Ear Breadth

3 Concha Length

4 Concha Breadth

5 Protrusion

6 Bitragion Diameter

7 Radial distance between the bitragion and the pronasale

8 Radial distance between the bitragion and the opisthocranion

9 Radial distance between the bitragion and the vertex

Results (Experiment 2): Initial Delay

Average Error

• Measured Subjects: 0.027 ms (1.3 points)

• Unmeasured Subjects: 0.031 ms (1.5 points)

• Imperceptible (Toole and Sayers, 1965)

• No significant difference (Nishino et al., 1999)
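The delay errors are given both in milliseconds and in sample points, and the two are linked by the 48.0 kHz sampling frequency. A minimal sketch of the conversion (the helper name `points_to_ms` is hypothetical):

```python
FS_HZ = 48_000.0  # sampling frequency from the measurement conditions

def points_to_ms(points, fs_hz=FS_HZ):
    """Convert a length in sample points to milliseconds."""
    return points / fs_hz * 1000.0

for label, points in [("Measured Subjects", 1.3),
                      ("Unmeasured Subjects", 1.5),
                      ("HRTF duration", 512)]:
    print(f"{label}: {points} points = {points_to_ms(points):.4f} ms")
```

The same conversion reproduces the stated HRTF duration: 512 points at 48.0 kHz is about 10.7 ms.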

HRTF Database

http://www.itakura.nuee.nagoya-u.ac.jp/HRTF/

Contents

• HRTFs used in these experiments

• HRTFs measured for 72 azimuths and 28 elevations

Conclusions

In this study, we investigated how to improve audibility when there are multiple speakers. The results show that separating the speakers at intervals of 45 degrees is more effective for audibility, and that suitable HRTFs for a listener can be estimated. A communication system for multiple speakers can be designed with these results.

Experiment 1

Purpose

Speech is hard to hear when many speakers talk at the same time, and it is easier to hear when the speakers are in different directions. This experiment investigates how audibility changes when the direction of the speakers is changed.

Experimental Conditions

Maskee: Japanese speech (0°, front)

Masker: Pink noise (0°, 15°, …, 90°)

Subject: 6 males

Results

An interval of 45 degrees between the speakers is more effective for audibility.

[Figure: Masking Level [dB] (axis 0 to 14) versus Interval [deg] (0, 15, 30, 45, 60, 75, 90), plotted for HRTF and for ILD, ITD.]

Method

• The masking level is measured by the method of limits (2up-2down).

• The subjects listen to the stimuli over earphones and answer whether the speech is recognizable or not.

• The stimuli are generated by convolving the speech with the HRTFs, or by applying the interaural level difference (ILD) and interaural time difference (ITD).

• If a higher level is obtained, it is easier to hear the maskee at that angle.
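The two ways of generating the stimuli can be sketched as follows. This is an illustration under assumptions rather than the procedure used in the experiment: the function names are hypothetical, the impulse responses are toy placeholders instead of measured HRTFs, white noise stands in for both the speech and the pink-noise masker, and the ILD/ITD version simply attenuates and delays the far-ear signal.

```python
import numpy as np

def render_with_hrir(mono, hrir_left, hrir_right):
    """Binaural signal from a mono source and a left/right impulse response pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

def render_with_ild_itd(mono, ild_db, itd_samples, source_on_left=True):
    """Binaural signal using only a level difference and a time difference."""
    pad = np.zeros(itd_samples)
    near = np.concatenate([mono, pad])                       # undelayed, full level
    far = 10.0 ** (-ild_db / 20.0) * np.concatenate([pad, mono])
    return np.stack([near, far] if source_on_left else [far, near], axis=0)

rng = np.random.default_rng(0)
maskee = rng.normal(size=1000)   # stand-in for the speech
masker = rng.normal(size=1000)   # stand-in for the masker noise

# Toy impulse responses (placeholders, not measured HRTFs).
front = np.zeros(512)
front[0] = 1.0
side_left = front.copy()
side_right = np.zeros(512)
side_right[20] = 0.5

# Maskee from the front, masker from the side, mixed at each ear.
stimulus = (render_with_hrir(maskee, front, front)
            + render_with_hrir(masker, side_left, side_right))
simple = render_with_ild_itd(masker, ild_db=6.0, itd_samples=20)
print(stimulus.shape, simple.shape)
```

The ILD/ITD version keeps only the broadband level and arrival-time cues, whereas the convolution with a full impulse response also carries the direction-dependent spectral shaping.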
