15
Finding Earthquake Victims by Voice Detection Techniques Ruchi Jha* ,Walter Lang, Reiner Jedermann Institute For Microsensors, Actuators, And Systems (IMSAS) University Of Bremen, Bremen, Germany 8th International Electronic Conference on Sensors and Applications 115 Nov, 21

Finding Earthquake Victims by Voice Detection Techniques

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Finding Earthquake Victims by Voice Detection Techniques

Finding Earthquake Victims by Voice Detection Techniques

Ruchi Jha* ,Walter Lang, Reiner Jedermann

Institute For Microsensors, Actuators, And Systems (IMSAS)

University Of Bremen, Bremen, Germany

8th International Electronic Conference

on Sensors and Applications

1–15 Nov, 21

Page 2: Finding Earthquake Victims by Voice Detection Techniques

Background And Motivation

Introduction

1. Most earthquakes above 5.5 on Richter scale, can cause large-scale destruction.

2. Many of the casualties stuck in damaged structures can live only for a few hours after the

initial disaster.

3. Approximately 80-percent of victims can be successfully rescued alive if they are detected by

help teams within 48 hours of disaster occurrence.

4. Despite of technological excellence, rescue is challenging when the victim cannot be found

through a direct line of sight.

5. A method is needed to detect people through physical attributes like motion, odour, speech,

heat signature or gaseous emissions unique to human body for better rescue operations.

1

Page 3: Finding Earthquake Victims by Voice Detection Techniques

Introduction

Voice Activity Detection

(VAD)Methods Applications

Detecting

presence or absence

of

human speech.

1. Audio Conferencing

2. Echo Cancellation

3. Speech Recognition

4. Speech Encoding

5. Hands-free telephony

1. Energy Thresholds

2. Pitch and Harmonic detection

3. Zero-crossing rates

4. Waveform and

spectral analysis

Introduction

2

Page 4: Finding Earthquake Victims by Voice Detection Techniques

Speech Noise

Properties

Frequency Domain Properties

1. Spectral Flux

2. Spectral Centroid

3. Spectral Roll off

SIGNALS

Time Domain Properties

1. Energy

2. Zero Crossing Rate

Scientific Approach

IntroductionScientific

Approach

3

Page 5: Finding Earthquake Victims by Voice Detection Techniques

System Description

1. Collection of Noise and Voice Samples

Count Group Sources Name Examples

11 Noise Studio Audio CD [14] N1 to 11Traffic, touring cars, motorcars,

cleaning, airplane, buzzer, river, applause, industry, chattering.

9 Voice Samples

CD, TV,

Studio recording

VF1 to 4(female)VM1 to 5

(male)

Female and Male sound

recordings in English and

German.

7 Noise StreetOwn Outside recording

SN1 to 7Street noises with birds, cars,

tram, glasses, music, river and wind.

5 Voice MixOwn recording

OutsideMIX 1 to 5 Mix sounds of people speaking

with background noise.

6 Voice StudioStudio

recording

VF… (female)

VM…

(male)

Speech recorded in Spanish(S), German (D), Hindi (H), English (E),

and Latvian (L).

Total Samples = 38; Sampling frequency = 44100 Hz (mono) ; Length = 2 seconds

IntroductionScientific

Approach

System

Description

4

Page 6: Finding Earthquake Victims by Voice Detection Techniques

2. Feature Extraction

1. Trails with Average values of Spectral features.

Many Buffer range of values overlapping for speech and Noise. (Not opted)

2. Calculation of Average of Local Minima and Maxima (ALMM) of spectral features.

• MATLAB script to find peak values.

• Drawing a conclusion for separating speech and noise.

IntroductionScientific

Approach

System

Description

5

Page 7: Finding Earthquake Victims by Voice Detection Techniques

IntroductionScientific

Approach

System

Description

..continued.

Flux Peaks in a River sound sample Discriminating Speech (in blue) and Noise (in red) based on the ALMM of Flux values

Noise : N (1-11)

Voice : V (M/F)

6

Page 8: Finding Earthquake Victims by Voice Detection Techniques

System

Description

Scientific

ApproachIntroduction

Cross Validation : Estimates the performance of a model.

1. Manual method

• Linear boundary between the sample.

2. Machine learning

• Selection of a predictive model : LDA (Linear Discriminant Analysis)

• Training the LDA Model : To obtain a linear boundary.

• Testing the model : To get a predicted result.

3. Methodology

7

Page 9: Finding Earthquake Victims by Voice Detection Techniques

Results1. Cross validation Results for manual boundary selection

A distinction of noise and speech using Manual method with Flux values

System

Description

Scientific

ApproachIntroduction Results

Studio Noise : SN (1-7)

Voice : V (M/F)

Noise + Speech : MIX (1-5)

8

-5 -4.5 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0

PAV+ (Flux, log-scale)

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

PA

V-(F

l ux,

l og-s

cal e

)

SN-1

SN-2

SN-3

SN-4

SN-5

SN-6

SN-7

MIX-1

MIX-2

MIX-3

MIX-4

MIX-5

V-S

VM-DeV-HVF-De

V-En

V-la

Page 10: Finding Earthquake Victims by Voice Detection Techniques

2. Cross validation Results for Machine learning

System

Description

Scientific

ApproachIntroduction Results

A distinction of noise and speech using ML linear classification with Flux values

9

Page 11: Finding Earthquake Victims by Voice Detection Techniques

System

Description

Scientific

ApproachIntroduction Results

3. Overall success rate for both methods and combination of parameters.

Results of cross-validation : Machine learningResults of cross-validation : Manual methods

10

Page 12: Finding Earthquake Victims by Voice Detection Techniques

System

Description

Scientific

ApproachIntroduction Results

Additional Observation with mixed Sample type for Training and Testing

Prediction result obtained by varying speech content in the samples

• Models tested from low to high amplitude of varying speech.

11

Page 13: Finding Earthquake Victims by Voice Detection Techniques

1. Flux, Centroid, Roll-off And Their ALMM Values Can Be Used For VAD.

1. 2. Machine Learning Is More Effective Than Manual Technique.

2. 3. LDA Method Is Better Than QDA.

3. 4. Combination Of Parameters Should Be Preferred.

4. 5. Requires More Research To Detect Low Percentage Of Speech(< 30 %).

System

Description

Scientific

ApproachIntroduction Results Conclusion

Conclusion

12

Page 14: Finding Earthquake Victims by Voice Detection Techniques

Challenges

System

Description

Scientific

ApproachIntroduction Results Conclusion

Challenges and

Future Possibilities

▪ Insufficient literature on the methods of using spectral values.

▪ Recording and procuring correct samples.

▪ Difficult to detect low amplitude of speech in a sample.

Future Possibilities

▪ More data set, with diverse recordings.

▪ More signal features for speech identification.

▪ More machine learning models can be tested.

13

Page 15: Finding Earthquake Victims by Voice Detection Techniques

Thank you