12
IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN SMARTPHONES WITH MULTI-MIC VOICE ENHANCEMENT DSP TECHNOLOGY

IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

IMPROVEMENT OF

VOICE TRIGGER RECOGNITION

IN SMARTPHONES

WITH MULTI-MIC VOICE ENHANCEMENT

DSP TECHNOLOGY

Page 2: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Voice Enhancement Package (VEP) is a set of software technologies

utilizing a number of microphones to enhance the voice signal before

it is provided to the speech recognition engine

While, in general, the number of microphones is unlimited, in

smartphones the number of microphones cannot exceed 4 for cost,

design and power consumption reasons

ABOUT ALANGO VOICE

ENHANCEMENT PACKAGE

WWW.ALANGO.COM

Directional

MicrophoneArray

(DMAR)

Direction

Detection

AEC

AEC

AEC

AEC

TTS Engine

Voice Controlled

System

(ASR)

Voice Trigger

Audio ++Audio or Voice Prompts

Noise

Voice

Acoustic echo of voice

prompts or music

ALANGO VOICE ENHANCEMENT PACKAGE

2

Page 3: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Test:

Compare voice trigger recognition rate in a controllable and reproducible acoustic environment (office meeting room), using:

• 1 microphone raw audio (without any additional processing)

• 2 microphones processed audio (2mic multidirectional voice enhancement)

• 3 microphones processed audio (3mic multidirectional voice enhancement)

• 4 microphones processed audio (4mic multidirectional voice enhancement)

Test setup and components:

• Real meeting room

• A handset mockup integrating 4 microphones (DUT – Device Under Test)

• A mouth simulator generating the voice trigger “Hello blue genie” using different voices and voice levels:

• External speaker simulating ambient noise

• Signal processing chain: Alango n-mic DSP technology -> Sensory Voice Trigger “Truly Hands-free” technology

Determine objective improvement of Voice Triger recognition (hit rate) with Alango multi-directional beamforming in realistic

conditions for different users, noise levels and distances

OBJECTIVES OF THE VOICE

ENHANCEMENT TESTS

WWW.ALANGO.COM3

Page 4: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Test flow:

• Make recordings of user’s voice using mouth simulator reproducing the phrase “Hello blue genie”:

at different distances from DUT: 0.5m, 1.0m, …, 4.5m

and

at different SPL levels: 94, 88, 82, 76 dBSPL at Mouth Reference Point (3cm from mouth opening)

• Process recordings off-line using Alango n-mic voice enhancement DSP technology

• Pass the enhanced voice signals to Sensory Voice Trigger technology (“Truly Hands-free” SDK)

• Calculate recognition rate statistics (“hit-rate”), 448 triggers per test

TEST DESCRIPTION

& PREREQUISITES

Signal processing chart:

Alango

Multi-Microphone

Voice Enhancement

DSP Technology

Sensory

Voice Trigger

Recognition result

(trigger found /

trigger not found)

WWW.ALANGO.COM4

Page 5: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Test setup (real photo):1. Mouth simulator connected to the computer plays prerecorded voice triggers of different strength with different

voices and from different distances

2. A large loudspeaker directed towards the wall produces different types of diffused noise from different directions

3. 4-mic phone (DUT) lying on a table. Signals from 4 microphones are recorded and processed by the computer

Device under test (DUT):A phone mockup integrating 4 microphones

located on the top and bottom edges

TEST DESCRIPTION

& PREREQUISITES

WWW.ALANGO.COM

Mic 3 Mic 4

Mic 1 Mic 2

4-mic mockup

Diffused noise Mouth simulator

Triggers

5

Page 6: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

TEST A

WWW.ALANGO.COM

Test setup configuration

(top view)

Voice Trigger direction – phone back

Noise direction – phone side

Large meeting room

Recording and

collecting data on PC

Speaker (turned towards the wall)

Produces 75 dB SPL noise level at

DUT location

DUT

1.6 m

Distances from 0.5 to 4.0 meters away from DUT

0.5 m 4 m

Table

Mouth simulator

6

Page 7: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

10.93

0.57

0.29

0.140.07

0 0

0.93

0.14

0 0 0 0 0 0

1 1

0.86 0.860.79

0.570.64

0.57

1 10.93

1 1 1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4V

R h

it r

ate

Distance, m

1mic, no processing

TEST A RESULTS

WWW.ALANGO.COM

Voice Trigger direction – phone back

Noise direction – phone side

1 10.93

0.86

0.71

0.43

0.070

10.93

0.5

0 0 0 0 0

1 1 10.93

10.93

0.640.71

1 1 1 1 1 1

0.86

1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4

VR

hit

rate

Distance, m

2mic voice enhancement

Each chart is a result of statistics

collected over 448 voice trigger

stimulus tests

Voice Sound Pressure Level

at the Mouth Reference Point

(3cm from the mouth opening)

94 - very loud voice

88 - loud voice

82 - normal voice

76 - quiet voice

1 1

0.860.93

0.64

0.5

0.290.36

1

0.86

0.5

0.21

0.070 0 0

1 10.93

1 10.93

0.860.93

1 1 1 1 1 1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4

VR

hit

rate

Distance, m

3mic voice enhancement

1 10.93

10.93

0.64

0.43

0.29

1 1

0.57

0.43

0.21

0 0 0

1 1 1 1 10.93

0.86 0.86

1 1 1 1 1 1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4

VR

hit

rate

Distance, m

4mic voice enhancement

7

Page 8: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

TEST B

WWW.ALANGO.COM

Test setup configuration

(top view)

Voice Trigger direction – phone side

Noise direction – phone back

Large meeting room

Recording and

collecting data on PC

Speaker (turned towards the wall)

Produces 75 dB SPL noise level at

DUT location

DUT

1.6 m

Distances from 0.5 to 4.0 meters away from DUT

0.5 m 4 m

Table

Mouth simulator

8

Page 9: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

1 1 10.93

1

0.79

0.93

0.64

1 1

0.86

0.64

0.5

0.14 0.14

0

1 1 1 1 1 1 1 11 1 1 1 1 1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4

VR

hit

rate

Distance, m

4mic voice enhancement

1 10.93

1

0.710.79

0.57

0.43

1 1

0.570.5

0.36

0 0 0

1 10.93

1 1 1 10.93

1 1 1 1 1 1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4

VR

hit

rate

Distance, m

3mic voice enhancement

1 1

0.86

0.64 0.64 0.64

0.5

0.36

0.93 0.93

0.210.14

0 0 0 0

1 10.93

10.93

1 1 11 10.93

1 1 1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4

VR

hit

rate

Distance, m

2mic voice enhancement

1

0.86

0.360.29

0.07 0.070 0

0.86

0.070 0 0 0 0 0

1 10.93

0.710.64

0.86

0.710.64

1 1 1 1

0.86

1 1 1

0

0.2

0.4

0.6

0.8

1

1.2

0.5 1 1.5 2 2.5 3 3.5 4V

R h

it r

ate

Distance, m

1mic, no processing

TEST B RESULTS

WWW.ALANGO.COM

Voice Trigger direction – phone back

Noise direction – phone side

Each chart is a result of statistics

collected over 448 voice trigger

stimulus tests

Voice Sound Pressure Level

at the Mouth Reference Point

(3cm from the mouth opening)

94 - very loud voice

88 - loud voice

82 - normal voice

76 - quiet voice

9

Page 10: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Trigger direction – phone back

Noise direction – phone side

Trigger direction – phone side

Noise direction – phone back

1mic (no processing) 1mic (no processing)

2mic voice enhancement 2mic voice enhancement

3mic voice enhancement 3mic voice enhancement

4mic voice enhancement 4mic voice enhancement

AUDIO EXAMPLESDistance to DUT – 2m

User’s speech level – 82 dBSPL@MRP

WWW.ALANGO.COM10

Page 11: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Hello Google ?

Hey Siri?

Hey Cortana?

Hello Jarvis?

Open Sesame?Alexa?

Sorry Master, I cannot recognize the voice trigger.

Please, try again

THANK YOU FOR

YOUR INTEREST!

WWW.ALANGO.COM11

Page 12: IMPROVEMENT OF VOICE TRIGGER RECOGNITION IN … · Hey Siri? Hey Cortana? Hello Jarvis? Open Sesame? Alexa? Sorry Master, I cannot recognize the voice trigger. Please, try again THANK

Don’t hesitate to contact us if you need more information about

Alango voice enhancement or other technologies.

Let us know what you think!

We are looking forward to hearing from you!

Alango Technologies Ltd.

2 Etgar St. PO Box 62

Tirat Carmel 39100 Israel

Please, send your questions, comments, thoughts, proposals

to [email protected] or specifically

Mr. Robert Schrager (Sales enquiries):

Mr. Alex Radzishevsky (Technical enquiries):

Dr. Alexander Goldin (CEO):

[email protected]

[email protected]

[email protected]

CONTACT

INFORMATION

Phone numbers:

Main office: +972 (4) 8580 743

Fax: +972 (4) 8580 621

WWW.ALANGO.COM12alangomob