Track 1 session 3 - st dev con 2016 - smart home and building

October 4, 2016

Santa Clara Convention Center

Mission City Ballroom

Smart Home & Building: voice remote controls, source localization, beamforming ASR

Roberto Sannino

Voice Communication: a key driver of innovation since the 1800s

IoT evolution of Voice Automation: the IoT voice assistant

From professional PC applications

…to Home / Office Terminals

…to Smart Mobiles

…to “Anything Connectable”

How can I help you?

Voice Terminal

• Audio capture & render

• Signal processing

• Low power

• Constrained geometry

Voice & data

Gateway

Seamless connectivity

MEMS microphones and Audio Quality at system level

Cloud

• Natural Language Processing

• Dialogue Management

• Services

Play Music

Control Lighting, heating, …

News, sport, traffic, weather, …

Answer questions, create to-do lists, shopping lists, …

Place orders online, use other online services: taxi, pizza, …

Digital MEMS Microphones

Sensing

A/D and Digital i/f

ASIC

Sensor

Sound Inlet

PDM (Pulse Density Modulation) interface:

• 1 to 3 MHz

• 1-bit resolution

• Fully digital

• Capacitive membrane

• Omnidirectional

• Analog output

Digital MEMS microphones:

• ultra-compact, low-power, omnidirectional

• built with a capacitive sensing element and an IC interface

Bottom port

Top port

Top port metallic

Bottom port metallic

Microphone to STM32 Architecture

• Serial: SAI/I2S/SPI: 1 or 2 microphones share CLK and data line

• Parallel: GPIO: Up to 16 (or 32) microphones

• DFSDM (Digital Filter for Sigma Delta Modulator) dedicated interface [only on selected STM32 devices]

[Block diagram: PDM Audio IN → 2-stage decimation filter (FIR-LP Sinc3, dec=8; FIR-LP Sinc3, dec=8/10/16) → IIR signal conditioning (IIR-HP, IIR-LP) → PCM Gain Control → 16 bit PCM Digital Audio OUT]

PDM to PCM filter SW library for STM32Cube

Software

Hardware

Direct acquisition of digital MEMS Microphones
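To make the Sinc3 decimation stage more concrete, here is a minimal sketch of a third-order sinc (CIC) decimator for a 1-bit PDM stream. It is not the ST PDM to PCM library: the function name `sinc3_decimate`, the scaling, and the single-stage structure are assumptions made for illustration, and the IIR conditioning and gain stages are left out.

```python
def sinc3_decimate(pdm_bits, decimation):
    """Decimate a 1-bit PDM stream with a third-order sinc (CIC) filter.

    pdm_bits   -- iterable of 0/1 values sampled at the PDM clock rate
    decimation -- integer decimation factor R (e.g. 8)
    Returns a list of samples scaled so that a constant stream of 1s gives 1.0.
    """
    int1 = int2 = int3 = 0         # three integrators, running at the PDM rate
    prev1 = prev2 = prev3 = 0      # comb delays, running at the output rate
    dc_gain = decimation ** 3      # DC gain of a sinc^3 filter
    pcm = []
    for count, bit in enumerate(pdm_bits, start=1):
        sample = 1 if bit else -1  # map {0, 1} to {-1, +1}
        int1 += sample
        int2 += int1
        int3 += int2
        if count % decimation == 0:    # keep one sample out of every R
            comb1 = int3 - prev1
            prev1 = int3
            comb2 = comb1 - prev2
            prev2 = comb1
            comb3 = comb2 - prev3
            prev3 = comb2
            pcm.append(comb3 / dc_gain)
    return pcm
```

The first three output samples are filter start-up transient. A real chain would typically follow this stage with a second decimator and the high-pass and low-pass conditioning filters shown in the diagram.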

BlueCoin: the Robotic Ear

Augmented hearing and motion sensing

Sound

Localization

Embedded Processing

Motion, Activity

and Balance

Acoustic Beamforming

Bluetooth Low Energy

Full Embedded Sensing Software Development Kit

4 x MP23DB01MM, LSM6DS3, LPS22HB, LIS3MDL, BLUENRG-MS, BALF-NRG-01D3

Indoor Voice Capture: the Problem

Audio input (e.g. music, or far-end speaker)

Reference signal (same as Audio Input)

Audio output (e.g. speaker’s voice, clean)

reflections, diffusion, …

Voice

Acoustic Echo

background noise


Audio Front End: Example of Signal Processing Architecture

Beamforming

Audio

Analytics

Acoustic Echo

Cancellation

Statistical

Dereverberation

Auto Gain

Control

Trigger ASR

Noise Reduction

reference

Source

Localization

- Voice Activity Detection

- Statistical moments

- Noise estimation

- ...

Speech Recognition

embedded cloud

MEMS microphone

array


Software IP and ST Eco-system

Open Software Design Environment

Algorithms and system demonstrators for the Internet of Things.

Unleashing the power of embedded software

Bring your ideas to now!

BlueMicroSystem

STM32 ODE

STM32 Nucleo development boards

STM32 Nucleo expansion boards

STM32 Cube software

STM32 Cube expansion software

Software libraries

BlueVoiceLink

SmartAcoustics

Example Projects

http://www2.st.com/content/st_com/en/products/ecosystems/open-software-expansion/open-audio.html


Audio SW IP and Eco-system

[Block diagram: the Audio Front End architecture, with the MEMS microphone array feeding osxAcousticSL (Source Localization), osxAcousticBF (Beamforming) and osxAcousticEC (Acoustic Echo Cancellation), followed by a 3rd party ASR (embedded or cloud)]

Each osxAcoustic library may be easily replaced by 3rd party SW IP

All are released under free evaluation and production licensing

Spatial Audio Processing

Sound Localization: osxAcousticSL

• Estimates the Direction of Arrival of the main sound source

• Independent from beamforming

• May control the beam direction

Beamforming: osxAcousticBF

• Spatial filter

• Outputs the audio that comes from a given direction

• Adaptively cancels audio signals coming from other directions

Freely licensed FW Libraries for STM32

http://goo.gl/4nXh8W



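Audio Beamforming

[Diagram: a Sound Source and Environmental Noise reach the Microphone Array; in the Beamformer each microphone signal passes through its own filter (𝑓1 … 𝑓𝑁), the filter outputs are summed (Ʃ) and passed to Adaptive Filtering, giving Audio out (mono)]

MEMS microphones enable very small array geometries!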

First Order Beam Patterns

Figure of 8

Simple subtraction of 2 microphone outputs

Cardioid

Subtraction of 2 microphone outputs, after one digital delay ∆.

∆ = acoustic latency from [m1] to [m2]
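The two first-order patterns can be checked numerically. The sketch below computes the magnitude response of "front microphone minus delayed rear microphone" to a plane wave, assuming free-field propagation, ideal matched microphones, a speed of sound of 343 m/s, and a 4 mm spacing at 1 kHz. The function names are hypothetical and do not come from osxAcousticBF.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def pair_response(theta_deg, spacing_m, freq_hz, delay_s):
    """Magnitude response of 'front mic minus delayed rear mic' to a plane wave.

    theta_deg is measured from the microphone axis (0 degrees = end-fire, front).
    delay_s is the digital delay applied to the rear microphone signal.
    """
    omega = 2.0 * math.pi * freq_hz
    # Extra travel time from the front microphone to the rear microphone.
    acoustic_delay = spacing_m / SPEED_OF_SOUND * math.cos(math.radians(theta_deg))
    return abs(1.0 - cmath.exp(-1j * omega * (delay_s + acoustic_delay)))


def figure_of_eight(theta_deg, spacing_m=0.004, freq_hz=1000.0):
    """Plain subtraction: no digital delay, nulls at +/-90 degrees."""
    return pair_response(theta_deg, spacing_m, freq_hz, delay_s=0.0)


def cardioid(theta_deg, spacing_m=0.004, freq_hz=1000.0):
    """Delay equal to the acoustic latency between the microphones: rear null."""
    delay_s = spacing_m / SPEED_OF_SOUND
    return pair_response(theta_deg, spacing_m, freq_hz, delay_s)
```

Sweeping `theta_deg` from 0 to 350 in steps of 10 gives the polar curves: the figure of 8 rejects sound from the sides, while the cardioid rejects sound from the rear.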

ST Beamforming Solution: osxAcousticBF

End-fire cardioid beamforming based on two digital MEMS microphones

• Fine-tuned for ST Digital MEMS Microphones

Scalable performance vs. MIPS to fit application requirements

• 4 algorithm options

Strong BF

Endfire

± 35° around the microphone axis

≈ 84 MIPS of STM32F4

≈ 60°

Basic Cardioid

Endfire

± 85° around the microphone axis

≈ 11 MIPS of STM32F4

≈ 170°

osxAcousticBF – Algorithm Options

• Cardioid basic: 1st-order Differential Microphone Array (DMA)

[Diagram: two microphones at distance d; one signal is delayed by ∆ and subtracted from the other to give out; ∆ = d / c, c = speed of sound]

• Cardioid denoise: a denoise filter is added to the end-fire beamforming output

[Diagram: same as Cardioid basic, followed by a Denoise block; ∆ = d / c, c = speed of sound]

• Strong: back-to-back cardioids and adaptive noise removal filter

[Diagram: two back-to-back cardioids built with delays ∆ from microphones at distance d, feeding Enhance and Remove branches that are combined to give out]

• ASR ready: same as the Strong, without the denoise filter. Best performance for Automatic Speech Recognition applications.
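The delay ∆ is the time sound needs to travel the inter-microphone distance d at speed c. As a worked example, the sketch below evaluates it for a 4 mm spacing, assuming c = 343 m/s and a 16 kHz PCM rate. Both values and the function name are illustrative assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def endfire_delay(spacing_m, sample_rate_hz):
    """Return the end-fire delay in seconds and in samples at sample_rate_hz."""
    delay_s = spacing_m / SPEED_OF_SOUND
    return delay_s, delay_s * sample_rate_hz


delay_s, delay_samples = endfire_delay(0.004, 16000.0)
# delay_s is about 11.7 microseconds, delay_samples is about 0.19 samples
```

For mm-sized arrays the delay is a fraction of one PCM sample, so an implementation needs some form of sub-sample delay (or has to apply the delay at a higher sampling rate).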

Microphones Sensitivity Matching

• Key to optimal performance

    • Best directivity results

    • Best noise rejection

• Gain compensation API

    • Adjust the amplitude of one microphone to match the other’s

• Gain compensation options

    • Static gain offline computation

    • Dynamic gain compensation
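A static gain can be computed offline from a calibration recording in which both microphones capture the same sound. The sketch below matches RMS levels. It is a generic illustration with hypothetical function names, not the osxAcousticBF gain compensation API.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def static_gain(reference_mic, other_mic):
    """Gain to apply to other_mic so that its RMS level matches reference_mic."""
    return rms(reference_mic) / rms(other_mic)


def apply_gain(samples, gain):
    """Scale a block of samples by a fixed gain."""
    return [gain * s for s in samples]
```

A dynamic variant would update the gain slowly at run time, for example by smoothing the same RMS ratio over successive frames.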

Polar Pattern Tests

Test setup:

• Microphone Array mounted on a rotating support

• Inter-microphone distance: 4 mm

• Rotation in steps of 10 degrees

• Gaussian White Noise played by high quality loudspeaker

Resulting beampattern:

• Blue: omnidirectional microphone

• Red: «Basic cardioid» mode

• Green: «Strong» mode

BlueCoin eval platform

Integrated

MEMS micro-array

Beamforming: ASR Test

Test setup:

Inputs

• WORDS: male and female spoken words, at 0°

• NOISE: Gaussian White Noise, at 90°

Output

4 synchronous output channels:

• Omnidirectional microphone

• Basic Cardioid

• ASR Ready

• Strong Cardioid

Recorded words are sent to Google ASR and recognition data are collected

BlueCoin eval platform

Integrated

MEMS micro-array

osxAcousticBF: ASR Test Results

[Plot: ASR confidence vs. Signal to Noise ratio, with curves for omnidirectional, cardioid, ASR, strong]

Evaluation Systems

X-NUCLEO-CCA02M1 supports beamforming based on the 2 onboard MP34DT01-M microphones

Beam steering can be implemented in architectures with >2 microphones by choosing each time a different ordered couple of microphones

e.g. 4-microphone configurations enable implementation of 8 different cardioid beams

µ4 array: MEMS microphones side by side: the smallest array you can build

4 x MP23DB01MM
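One way to read the figure of 8 beams: if the four microphones sit on the corners of a square, the ordered couples point in eight distinct directions, 45° apart. The sketch below enumerates them. The square layout and the microphone names are assumptions, not taken from the board documentation.

```python
import math
from itertools import permutations

# Assumed layout: four microphones on the corners of a unit square.
MICS = {"M1": (0.0, 0.0), "M2": (1.0, 0.0), "M3": (1.0, 1.0), "M4": (0.0, 1.0)}


def beam_directions(mics):
    """Map each steering angle (degrees) to the ordered couples that produce it.

    For an ordered couple (rear, front) the end-fire beam points from the rear
    microphone towards the front microphone.
    """
    directions = {}
    for rear, front in permutations(mics, 2):
        (x0, y0), (x1, y1) = mics[rear], mics[front]
        angle = round(math.degrees(math.atan2(y1 - y0, x1 - x0))) % 360
        directions.setdefault(angle, []).append((rear, front))
    return directions
```

With this layout there are 12 ordered couples: the four sides give 0°, 90°, 180° and 270° (two parallel couples each) and the two diagonals give 45°, 135°, 225° and 315°.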

Sound Source Localization

Signals are acquired by one or two couples of microphones in order to estimate the sound Direction of Arrival (DoA)

Angle 𝛼 = Direction of Arrival

osxAcousticSL: Sound Source Localization Library

• Scalable library allows MIPS vs. resolution trade-off

    • Selectable angle resolution, up to 1 degree theoretical

    • Selectable algorithm

• Two algorithms implemented

    • XCORR: low-MIPS and low-resolution; supports cm-sized microphone arrays

    • GCC-PHAT: supports mm-sized Differential Arrays

• A simple Voice Activity Detector is included, based on energy threshold.

    • Avoids false recognitions in case of low signal energy
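As an illustration of cross-correlation based localization with an energy gate, here is a minimal sketch for one microphone couple. It is a generic time-domain estimator, not the osxAcousticSL implementation: the function names, the threshold value, the angle convention (0° means the source lies on the axis, on the side of the first microphone) and c = 343 m/s are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def frame_energy(frame):
    """Mean squared amplitude of a frame."""
    return sum(s * s for s in frame) / len(frame)


def xcorr_lag(mic_a, mic_b, max_lag):
    """Lag in samples by which mic_b is delayed with respect to mic_a."""
    best_lag, best_value = 0, float("-inf")
    n = len(mic_a)
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                value += mic_a[i] * mic_b[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def doa_degrees(mic_a, mic_b, spacing_m, sample_rate_hz, energy_threshold=1e-4):
    """Estimate the direction of arrival, or return None for low-energy frames."""
    if frame_energy(mic_a) < energy_threshold:
        return None  # energy gate: skip silent frames to avoid false detections
    max_lag = math.ceil(spacing_m / SPEED_OF_SOUND * sample_rate_hz)
    lag = xcorr_lag(mic_a, mic_b, max_lag)
    cos_alpha = lag * SPEED_OF_SOUND / (spacing_m * sample_rate_hz)
    cos_alpha = max(-1.0, min(1.0, cos_alpha))  # guard against rounding
    return math.degrees(math.acos(cos_alpha))
```

With an integer-lag search the angle resolution is limited by the spacing and the sample rate. A single couple also cannot tell left from right, so it only resolves angles between 0° and 180°.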

Source Localization Application considerations

Range

Due to spatial symmetry:

• 2 microphones cover a range of 180°

• 4 microphones cover a range of 360°

MIPS Performance

• In a typical Home application, source localization may run as a low priority task

• Depending on the use case, localization info may not require continuous updates (e.g. a few times per second)

Acoustic Echo Cancellation

Removes echo of playback audio in speech capture applications

[Diagram: a Known Audio Source (e.g. music / voice) is played in a Reverberant Room; the AEC estimates room reverberation]

Single Microphone application

STM32 is connected to both the microphone and the loudspeaker

osxAcousticEC

The Open.AUDIO AEC library is an optimized STM32 port based on the Open Source project Speex: http://www.speex.org/
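The principle of echo cancellation (estimate the path from the loudspeaker to the microphone, then subtract the predicted echo) can be sketched with a time-domain NLMS adaptive filter. This is a swapped-in textbook technique chosen for brevity, not the algorithm used by Speex or osxAcousticEC. The function name and the parameters `taps`, `mu` and `eps` are illustrative assumptions.

```python
def nlms_echo_cancel(mic, reference, taps=16, mu=0.5, eps=1e-8):
    """Subtract the estimated echo of `reference` from `mic`, sample by sample.

    mic       -- microphone samples (speech plus echo of the playback audio)
    reference -- the playback samples sent to the loudspeaker
    Returns the echo-reduced signal (the adaptive filter error).
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = mic_sample - echo_estimate
        # Normalised LMS update: step size scaled by the reference energy.
        energy = sum(h * h for h in history) + eps
        step = mu * error / energy
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

A production canceller also needs double-talk handling and a much longer filter to cover room reverberation, which is where block frequency-domain designs become attractive.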



Putting together SW libraries: SmartAcoustic1

• Example project in source code built on STM32Cube software technology

• Includes acoustic Beam Forming, Echo Cancellation, and Source Localization.

• Immediate test and performance evaluation

Source Localization

• User-selectable angle resolution

• User-selectable activation threshold

• Based on 4 MEMS microphones

• 360° localization range

Beamforming

• User-selectable beam direction

• User-selectable beamforming algorithm

• Based on 4 MEMS microphones

• GUI highlights the chosen microphone couple

Acoustic Echo Cancellation

• Based on a single MEMS microphone

• Reference audio is stored on STM32 FLASH

• Uses Audio OUT to play back audio while streaming cleaned speech on USB

SmartAcoustic1

Evaluation system

Software reference design

Multi-platform support

• Supports STM32 Nucleo expansion boards X-NUCLEO-CCA01M1 and X-NUCLEO-CCA02M1 connected to a NUCLEO-F446RE board

• Supports BlueCoin Integrated Audio and Sensors platform

Smart Home Use Case Discussion

The Internet Voice Assistant for Smart Home

Typical Features

• Audio capture and playback

• Automatic voice dialogue

    • Cloud based

    • Mixed Embedded/Cloud

• Internet connection

• Powered

    • Plugged to Mains

    • Battery Operated

The Problem: Indoor Voice, Audio, Noise


Direct Acoustic Echo


background noise


Voice

Beamforming vs. AEC

Beamforming:

• requires two (or more) microphones

• is independent from the loudspeaker

AEC:

• requires a single microphone

• must also connect to the audio OUT path

• AEC (tries to) cancel the Direct Acoustic Echo and its reflections

• Beamforming (tries to) cancel every signal that is not «on the beam»

Combining Beamforming and AEC

Alternative solution, based on ASR confidence ranking

[Diagram: all microphones feed Beamforming; one of the microphones feeds Acoustic Echo Cancellation, together with the reference audio; each output goes to its own ASR and the best ASR score is chosen]

Combined Beamforming and Localization in noisy environments

• Multiple beamforming in parallel

• Select based on ASR score ranking

• Source localization may be an implicit result of multiple beamforming & ASR ranking

[Diagram: parallel Beamforming + ASR chains; ASR may be embedded or cloud]

NOTE: osxAcousticSL Acoustic Source Localization library is not effective in presence of strong Noise, Reflections and Reverberations.
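Selection by ASR score ranking amounts to running the recognizer on every beam and keeping the highest-confidence result. The sketch below shows that step only. The `recognise` callable is a hypothetical stand-in for whichever embedded or cloud ASR is used, and is assumed to return a (text, confidence) pair.

```python
def select_best_beam(beams, recognise):
    """Pick the beam whose audio gets the highest ASR confidence.

    beams     -- dict mapping a beam name (e.g. its direction) to its audio
    recognise -- callable taking audio and returning (text, confidence)
    Returns (best_beam_name, (text, confidence)).
    """
    results = {name: recognise(audio) for name, audio in beams.items()}
    best = max(results, key=lambda name: results[name][1])
    return best, results[best]
```

The name of the winning beam doubles as a coarse direction estimate, which is the sense in which localization becomes an implicit result of the ranking.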

Example of System Implementation

Concurrent execution of multiple beamforming, AEC, and ASR

[Diagram: parallel Beamforming + ASR chains, plus Acoustic Echo Cancellation on one microphone, with the reference audio OUT, feeding its own ASR; the output is selected based on ASR score ranking; ASR may be embedded or cloud]

Hint: consider sensing the loudness level to switch off algorithms when they are not needed!
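A loudness gate of the kind suggested in the hint can be as simple as comparing the frame level against a threshold. The sketch below assumes samples normalised to the range -1.0 to 1.0. The -50 dBFS threshold and the function names are arbitrary illustrative choices.

```python
import math


def level_dbfs(frame):
    """RMS level of a frame in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    rms = math.sqrt(mean_square)
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def algorithms_enabled(frame, threshold_dbfs=-50.0):
    """Return True when the frame is loud enough to be worth processing."""
    return level_dbfs(frame) >= threshold_dbfs
```

In practice some hysteresis or a hold time would usually be added so that processing does not toggle on every frame near the threshold.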

MEMS Microphone Array to Cloud Architecture

Integrated Terminal

Audio Front End Signal Processing

Communication Interface

Gateway

3rd Party Cloud-based Services

Thank You
