October 4, 2016
Santa Clara Convention Center
Mission City Ballroom
Smart Home & Building: voice remote controls, source localization, beamforming, ASR
Roberto Sannino
Voice communication: a key driver of innovation since the 1800s
IoT evolution of voice automation: the IoT voice assistant
From professional PC applications
…to Home / Office terminals
…to Smart Mobiles
…to “Anything Connectable”
How can I help you?
Voice Terminal
• Audio capture & render
• Signal processing
• Low power
• Constrained geometry
Voice & data Gateway
• Seamless connectivity
MEMS microphones and Audio Quality at system level
Cloud
• Natural Language Processing
• Dialogue Management
• Services
  - Play music
  - Control lighting, heating, …
  - News, sport, traffic, weather, …
  - Answer questions, create to-do lists, shopping lists, …
  - Place orders online, use other online services: taxi, pizza, …
Digital MEMS Microphones
Structure: sound inlet, sensor (sensing), ASIC (A/D and digital i/f)
PDM (Pulse Density Modulation) interface:
• 1 to 3 MHz
• 1-bit resolution
• Fully digital
• Capacitive membrane
• Omnidirectional
• Analog output
Digital MEMS microphones:
• ultra-compact, low-power, omnidirectional
• built with a capacitive sensing element and an IC interface
Bottom port
Top port
Top port metallic
Bottom port metallic
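A PDM stream encodes the audio amplitude as the density of 1s in a 1-bit stream. The sketch below illustrates that idea with a first-order sigma-delta modulator; this is an assumption chosen for simplicity, since the modulator inside a real MEMS microphone ASIC may be of higher order and runs at the 1 to 3 MHz PDM clock. The function names are hypothetical.

```python
def pdm_modulate(samples):
    """First-order sigma-delta modulator: samples in [-1, 1] -> 1-bit stream.

    Illustrative only; not the modulator used inside any specific microphone.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        integrator += x - feedback           # accumulate the quantization error
        bit = 1 if integrator >= 0.0 else 0  # 1-bit quantizer
        feedback = 1.0 if bit else -1.0      # 1-bit DAC in the feedback loop
        bits.append(bit)
    return bits

def pulse_density(bits):
    """Fraction of 1s, mapped back to the [-1, 1] amplitude range."""
    return 2.0 * sum(bits) / len(bits) - 1.0

# A constant input of +0.5 yields a stream with about 75% ones.
bits = pdm_modulate([0.5] * 1000)
print(pulse_density(bits))
```

Averaging the bits recovers the amplitude; the decimation filters in the PDM to PCM chain perform that averaging in a frequency-selective way.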
Microphone to STM32 Architecture
• Serial: SAI/I2S/SPI: 1 or 2 microphones share CLK and data line
• Parallel: GPIO: Up to 16 (or 32) microphones
• DFSDM (Digital Filter for Sigma Delta Modulator) dedicated interface [only on selected STM32 devices]
PDM Audio IN → 2-stage decimation filter (Sinc3 and FIR-LP stages; decimation factors dec = 8 and dec = 8/10/16) → IIR signal conditioning (IIR-HP, IIR-LP) → PCM gain control → 16-bit PCM Digital Audio OUT
Software: PDM to PCM filter SW library for STM32Cube
Hardware: direct acquisition of digital MEMS microphones
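The Sinc3 stage of the decimation chain can be sketched as a generic third-order CIC filter: three integrators at the PDM rate, keep one sample out of R, then three comb stages at the output rate. This is a textbook CIC, not the internals of ST's PDM to PCM library, and the FIR-LP, IIR and gain stages are omitted; the function name is hypothetical.

```python
def sinc3_decimate(pdm_bits, R):
    """Third-order CIC (Sinc3) decimator.

    Bits 0/1 are mapped to -1/+1 and the output is normalized by the DC
    gain R**3, so a full-scale input maps to +/-1.0.
    """
    i1 = i2 = i3 = 0        # integrator states (run at the PDM rate)
    d1 = d2 = d3 = 0        # comb delay elements (run at the output rate)
    out = []
    for n, bit in enumerate(pdm_bits):
        x = 1 if bit else -1
        i1 += x
        i2 += i1
        i3 += i2
        if (n + 1) % R == 0:  # keep one sample out of R
            c1 = i3 - d1
            d1 = i3
            c2 = c1 - d2
            d2 = c1
            c3 = c2 - d3
            d3 = c2
            out.append(c3 / R**3)
    return out

# 75% ones encodes +0.5; decimate by 8, one of the dec values above.
pcm = sinc3_decimate([1, 1, 1, 0] * 64, 8)
print(pcm[:4])
```

The first two outputs are a start-up transient; after that the output settles at 0.5. In fixed-point C the integrators may wrap around, as long as the registers are wide enough for the R**3 gain.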
BlueCoin: the Robotic Ear. Augmented hearing and motion sensing
• Sound Localization
• Embedded Processing
• Motion, Activity and Balance
• Acoustic Beamforming
• Bluetooth Low Energy
Full Embedded Sensing Software Development Kit
4 x MP23DB01MM, LSM6DS3, LPS22HB, LIS3MDL, BLUENRG-MS, BALF-NRG-01D3
Indoor Voice Capture: the Problem
Audio input (e.g. music, or far-end speaker)
Reference signal (same as Audio Input)
Audio output (e.g. speaker’s voice, clean)
reflections, diffusion, …
Voice Acoustic Echo
background noise
Audio Front End: Example of Signal Processing Architecture
Processing blocks (diagram):
• MEMS microphone array
• Beamforming
• Source Localization
• Acoustic Echo Cancellation (with reference)
• Statistical Dereverberation
• Noise Reduction
• Auto Gain Control
• Audio Analytics
  - Voice Activity Detection
  - Statistical moments
  - Noise estimation
  - ...
• Trigger ASR
• Speech Recognition: embedded or cloud
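The Audio Analytics block combines voice activity detection with noise estimation to decide when to trigger ASR. A minimal sketch, assuming an energy-based VAD with a minimum-tracking noise floor; the thresholds and function names are hypothetical and not taken from ST's implementation.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (floored to avoid log(0))."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))

def vad_decisions(frames, margin_db=6.0, rise_db=0.5):
    """Energy VAD driven by an adaptive noise-floor estimate.

    The floor follows quieter frames immediately and rises by at most
    rise_db per frame, so brief speech bursts barely move it. A frame is
    flagged as voice (a possible ASR trigger) when its energy exceeds the
    floor by margin_db.
    """
    noise_db = None
    decisions = []
    for frame in frames:
        energy_db = frame_energy_db(frame)
        if noise_db is None:
            noise_db = energy_db
        else:
            noise_db = min(energy_db, noise_db + rise_db)
        decisions.append(energy_db > noise_db + margin_db)
    return decisions

quiet = [0.01, -0.01] * 80   # 160-sample frame, about -40 dB
loud = [0.5, -0.5] * 80      # 160-sample frame, about -6 dB
print(vad_decisions([quiet] * 10 + [loud] * 5 + [quiet] * 5))
```

Because the floor can rise during long speech segments, a real detector would also use other statistics (the slide mentions statistical moments) to avoid adapting to speech.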
Software IP and ST Eco-system: Open Software Design Environment
Algorithms and system demonstrators for the Internet of Things.
Unleashing the power of embedded software
Bring your ideas to now!
BlueMicroSystem
STM32 ODE
• STM32 Nucleo development boards
• STM32 Nucleo expansion boards
• STM32Cube software
• STM32Cube expansion software
Software libraries
BlueVoiceLink
SmartAcoustics
Example Projects
Audio SW IP and Eco-system
Diagram: the same front end, mapped to ST libraries
• MEMS microphone array
• osxAcousticSL: source localization
• osxAcousticBF: beamforming
• osxAcousticEC: acoustic echo cancellation (with reference)
• Statistical Dereverberation, Noise Reduction, Auto Gain Control
• Audio Analytics
  - Voice Activity Detection
  - Statistical moments
  - Noise estimation
  - ...
• Trigger ASR
• 3rd party ASR: embedded or cloud
Each osxAcoustic library may be easily replaced by 3rd party SW IP
All are released under free evaluation and production licensing
Spatial Audio Processing
(Diagram: MEMS microphone array feeding Beamforming and Source Localization)
Sound Localization: osxAcousticSL
• Estimates the Direction of Arrival of the main sound source
• Independent from beamforming
• May control the beam direction
Beamforming: osxAcousticBF
• Spatial filter
• Outputs the audio that comes from a given direction
• Adaptively cancels audio signals coming from other directions
Freely licensed FW Libraries for STM32
http://goo.gl/4nXh8W
Audio Beamforming
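Beamformer (diagram): each signal of the microphone array passes through its own filter (f1, f2, f3, …, fN; adaptive filtering) and the filtered signals are summed (Ʃ) into a mono audio output. The array picks up both the sound source and environmental noise.
MEMS microphones enable very small array geometries!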
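The simplest instance of the filter-and-sum structure above is delay-and-sum, where each filter f_i is a pure delay that time-aligns the source across microphones before summing. A minimal sketch with integer-sample delays (an assumption: millimetre-spaced MEMS arrays need fractional delays, and the adaptive part is not shown); the function names are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Fixed delay-and-sum beamformer.

    Each microphone's filter f_i is reduced to a pure delay of delays[i]
    samples; the delayed channels are summed and averaged (mono output).
    """
    n = len(channels[0])
    out = [0.0] * n
    for channel, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += channel[t - d]
    return [y / len(channels) for y in out]

def impulse(n, k):
    """Length-n signal with a single unit sample at index k."""
    x = [0.0] * n
    x[k] = 1.0
    return x

# The source reaches mic 2 first (t = 1), then mic 1 (t = 3), then mic 0 (t = 5).
mics = [impulse(10, 5), impulse(10, 3), impulse(10, 1)]
steered = delay_and_sum(mics, [0, 2, 4])     # align all arrivals to t = 5
unsteered = delay_and_sum(mics, [0, 0, 0])
print(steered)
print(unsteered)
```

When steered at the source, the three arrivals add coherently into one unit peak; without steering they stay spread out at one third of the amplitude each.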
First Order Beam Patterns
Figure of 8: simple subtraction of the 2 microphone outputs.
Cardioid: subtraction of the 2 microphone outputs after delaying one of them by a digital delay ∆, where ∆ = acoustic latency from [m1] to [m2].
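For the 4 mm spacing used in the polar pattern tests below and an assumed speed of sound of 343 m/s, ∆ = d / c ≈ 11.7 µs; at a 16 kHz output rate one sample lasts 62.5 µs, so ∆ is a small fraction of a sample. The sketch computes both first-order patterns for a single tone. It assumes a far-field plane wave, takes θ = 0 as sound reaching m1 first, and forms y(t) = m1(t) - m2(t - delay); these conventions and names are assumptions.

```python
import math

C = 343.0    # speed of sound in air, m/s (assumed, about 20 °C)
D = 0.004    # inter-microphone distance, m (4 mm, as in the polar pattern test)
TAU = D / C  # acoustic latency from m1 to m2 for on-axis sound, s

def first_order_response(theta_deg, freq_hz, delay_s):
    """|Y/S| for y(t) = m1(t) - m2(t - delay) with a plane wave from angle theta.

    theta = 0 means the sound reaches m1 first and m2 TAU seconds later.
    delay = 0 gives the figure of 8; delay = TAU gives the cardioid.
    """
    w = 2.0 * math.pi * freq_hz
    total = delay_s + TAU * math.cos(math.radians(theta_deg))
    return 2.0 * abs(math.sin(w * total / 2.0))

def normalized_pattern(theta_deg, freq_hz, delay_s):
    """Response normalized to the on-axis (theta = 0) value."""
    return (first_order_response(theta_deg, freq_hz, delay_s)
            / first_order_response(0.0, freq_hz, delay_s))

print(f"Delta = {TAU * 1e6:.2f} us")
for theta in (0, 90, 180):
    fig8 = normalized_pattern(theta, 1000.0, 0.0)
    card = normalized_pattern(theta, 1000.0, TAU)
    print(f"{theta:3d} deg  figure-8 {fig8:.2f}  cardioid {card:.2f}")
```

At low frequencies the patterns approach |cos θ| for the figure of 8 and (1 + cos θ) / 2 for the cardioid: the cardioid keeps half the gain at 90° and has its null at 180°.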
ST Beamforming Solution: osxAcousticBF
End-fire cardioid beamforming based on two digital MEMS microphones
• Fine-tuned for ST digital MEMS microphones
Scalable performance vs. MIPS to fit application requirements
• 4 algorithm options, for example:
  • Strong BF: end-fire, ± 35° around the microphone axis (beam-width figure: ≈ 60°), ≈ 84 MIPS of STM32F4
  • Basic Cardioid: end-fire, ± 85° around the microphone axis (beam-width figure: ≈ 170°), ≈ 11 MIPS of STM32F4
osxAcousticBF – Algorithm Options
• Strong: back-to-back cardioids and an adaptive noise removal filter (diagram: two delay-and-subtract branches with delay ∆, labelled «Enhance» and «Remove»)
• Cardioid basic: 1st-order Differential Microphone Array (DMA): one microphone output, delayed by ∆, is subtracted from the other
• Cardioid denoise: a denoise filter is added to the end-fire beamforming output
• ASR ready: same as the Strong, without the denoise filter. Best performance for Automatic Speech Recognition applications.
In all diagrams ∆ = d / c, where d is the inter-microphone distance and c is the speed of sound.
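One plausible reading of the Strong diagram is a pair of back-to-back cardioids: the «Enhance» branch points at the target, while the «Remove» branch nulls it and leaves a noise reference for the adaptive filter. The sketch builds both beams from one microphone pair. It assumes ∆ equals a whole number of samples and uses hypothetical names; the adaptive noise removal stage is not shown, and osxAcousticBF's actual implementation is not documented here.

```python
def delayed(x, k):
    """Signal x delayed by k samples (zero-filled at the start)."""
    return [0.0] * k + x[:len(x) - k]

def back_to_back_cardioids(m1, m2, k):
    """Front and back first-order cardioids from one microphone pair.

    k: inter-microphone delay d / c expressed in whole samples (assumes a
       sample rate at which d / c is an integer number of periods).
    front = m1 - delayed(m2): main lobe towards m1, null towards m2.
    back  = m2 - delayed(m1): main lobe towards m2, null towards m1.
    """
    front = [a - b for a, b in zip(m1, delayed(m2, k))]
    back = [a - b for a, b in zip(m2, delayed(m1, k))]
    return front, back

# Target on the m1 side: it reaches m1 first and m2 three samples later.
target = [0.0, 0.0, 1.0] + [0.0] * 9
m1 = target
m2 = delayed(target, 3)
front, back = back_to_back_cardioids(m1, m2, 3)
print(front)   # target kept: +1 at n = 2, -1 at n = 8
print(back)    # all zeros: target cancelled, usable as a noise reference
```

Noise arriving from other directions leaks into both beams, so an adaptive filter can estimate the noise in the front beam from the back beam and subtract it.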
Microphones Sensitivity Matching
• Key to optimal performance
  • Best directivity results
  • Best noise rejection
• Gain compensation API
  • Adjust the amplitude of one microphone to match the other’s
• Gain compensation options
  • Static gain offline computation
  • Dynamic gain compensation
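Both gain compensation options can be sketched as level matching between the two microphones. The function and class names below are hypothetical, not the osxAcousticBF gain compensation API, whose calls the slides do not show. The sketch assumes both microphones pick up the same far-field or diffuse signal, so any level difference is due to sensitivity mismatch.

```python
import math

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def static_gain(ref_mic, other_mic):
    """Offline gain that brings other_mic to the RMS level of ref_mic."""
    return rms(ref_mic) / rms(other_mic)

class DynamicGain:
    """Gain tracked at run time from exponentially smoothed frame powers."""

    def __init__(self, alpha=0.99):
        self.alpha = alpha          # smoothing factor; closer to 1 = slower
        self.p_ref = 1e-12          # tiny start values avoid division by zero
        self.p_other = 1e-12

    def update(self, ref_frame, other_frame):
        a = self.alpha
        self.p_ref = a * self.p_ref + (1 - a) * sum(v * v for v in ref_frame)
        self.p_other = a * self.p_other + (1 - a) * sum(v * v for v in other_frame)
        return math.sqrt(self.p_ref / self.p_other)

# The second microphone is 20% less sensitive than the first.
ref = [math.sin(2 * math.pi * k / 16) for k in range(160)]
other = [0.8 * v for v in ref]
g = static_gain(ref, other)
matched = [g * v for v in other]
print(round(g, 3), round(rms(matched) / rms(ref), 3))
```

The static gain is computed once, for example at production test. The dynamic variant keeps adapting, which can track drift but also reacts to near-field sources that genuinely reach one microphone louder than the other.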
Polar Pattern Tests
Test setup:
• Microphone Array mounted on a rotating support
• Inter-microphone distance: 4mm
• Rotation in steps of 10 degrees
• Gaussian white noise played by a high-quality loudspeaker
• Resulting beampattern
• Blue: omnidirectional microphone
• Red: «Basic cardioid» mode
• Green: «Strong» mode
BlueCoin eval platform: integrated MEMS micro-array
Beamforming: ASR Test
Test setup:
Inputs
• WORDS: male and female spoken words, at 0°
• NOISE: Gaussian white noise, at 90°
Output: 4 synchronous channels
• Omnidirectional microphone
• Basic Cardioid
• ASR Ready
• Strong Cardioid
Recorded words are sent to Google ASR and recognition data are collected
BlueCoin eval platform: integrated MEMS micro-array
osxAcousticBF: ASR Test Results
(Plot: ASR confidence vs. signal-to-noise ratio for the omnidirectional, cardioid, ASR and strong channels)
Evaluation Systems
X-NUCLEO-CCA02M1 supports beamforming based on the 2 onboard MP34DT01-M
Beam steering can be implemented in architectures with more than 2 microphones by choosing a different ordered pair of microphones each time.
For example, 4-microphone configurations enable 8 different cardioid beams.
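One way to see where 8 beams can come from: assume (hypothetically, since the exact geometry is not given here) that the four microphones sit on the corners of a square. Each ordered (front, rear) pair gives an end-fire cardioid pointing from the rear microphone towards the front one. The 12 ordered pairs then produce 8 distinct look directions, 45° apart. The layout and names are illustrative assumptions.

```python
import math
from itertools import permutations

# Hypothetical square layout, coordinates in mm; the real µ4 geometry may differ.
MICS = {"M1": (0.0, 0.0), "M2": (1.0, 0.0), "M3": (1.0, 1.0), "M4": (0.0, 1.0)}

def beam_direction(front, rear):
    """Look direction (deg) of an end-fire cardioid from an ordered mic pair.

    The beam points along the pair axis, from the rear mic towards the front one.
    """
    (xf, yf), (xr, yr) = MICS[front], MICS[rear]
    return round(math.degrees(math.atan2(yf - yr, xf - xr))) % 360

directions = {}
for front, rear in permutations(MICS, 2):
    directions.setdefault(beam_direction(front, rear), []).append((front, rear))

for angle in sorted(directions):
    print(angle, directions[angle])
```

Diagonal pairs are √2 times farther apart than edge pairs, so they need a larger delay ∆ = d / c than the edge pairs.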
µ4 array: MEMS microphone side by side:
the smallest array you can build
4 x MP23DB01MM
Sound Source Localization
Signals are acquired by one or two pairs of microphones in order to estimate the sound Direction of Arrival (DoA).
Angle α = Direction of Arrival
osxAcousticSL: Sound Source Localization Library
• Scalable library allows a MIPS vs. resolution trade-off
• Selectable angle resolution, up to 1 degree theoretical
• Selectable algorithm
• Two algorithms implemented
  • XCORR: supports cm-sized microphone arrays; low MIPS and low resolution
  • GCC-PHAT: supports mm-sized differential arrays
• A simple Voice Activity Detector is included, based on an energy threshold
  • Avoids false recognitions in case of low signal energy
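The XCORR approach can be sketched as follows. Find the lag that maximizes the cross-correlation between the two microphones, convert it to a delay τ, and take α = arccos(c·τ / d), with α = 0 along the pair axis on the first microphone's side. A simple energy gate skips quiet frames, as the included VAD does. The names, the 343 m/s speed of sound and the angle convention are assumptions. GCC-PHAT differs by whitening the cross-spectrum (phase transform) before the inverse transform, and it is not shown here.

```python
import math
import random

C = 343.0  # speed of sound, m/s (assumed)

def xcorr_lag(a, b, max_lag):
    """Lag (samples) maximizing sum_n a[n] * b[n + lag] over |lag| <= max_lag.

    A positive lag means the sound reaches microphone a before microphone b.
    """
    best_lag, best_val = 0, float("-inf")
    n = len(a)
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[i] * b[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def estimate_doa(a, b, fs, d, energy_threshold):
    """DoA in degrees (0 = end-fire on mic a's side), or None if the frame is too quiet."""
    energy = sum(x * x for x in a) / len(a)
    if energy < energy_threshold:            # simple energy-threshold VAD
        return None
    max_lag = math.ceil(d / C * fs)          # largest physically possible delay
    lag = xcorr_lag(a, b, max_lag)
    cos_alpha = max(-1.0, min(1.0, lag * C / (fs * d)))
    return math.degrees(math.acos(cos_alpha))

rng = random.Random(0)
fs, d = 48000, 6 * C / 48000        # about 4.3 cm: d / c is exactly 6 samples
a = [rng.uniform(-1.0, 1.0) for _ in range(512)]
b = [0.0] * 3 + a[:-3]              # mic b hears the source 3 samples after mic a
print(estimate_doa(a, b, fs, d, 1e-4))   # about 60 degrees
```

Angle resolution is limited by the lag grid: with only ±6 possible lags this pair resolves a handful of angles. That is why XCORR suits cm-sized arrays, while mm-sized arrays need sub-sample delay estimation.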
Source Localization Application considerations
Range (due to spatial symmetry):
• 2 microphones cover a range of 180°
• 4 microphones cover a range of 360°
MIPS Performance:
• In a typical home application, source localization may run as a low-priority task
• Depending on the use case, localization info may not require continuous updates (e.g. a few times per second)
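The symmetry argument can be checked numerically. For a far-field source, a single pair only measures the delay d·cos θ / c, which is identical for θ and its mirror image 360° - θ, so one pair resolves 180°. A second, orthogonal pair also measures d·sin θ / c, and the two delays together fix the full 360° direction. The sketch assumes two orthogonal pairs with equal 4 mm spacing; the names are hypothetical.

```python
import math

def pair_delays(theta_deg, d=0.004, c=343.0):
    """Delays (s) seen by two orthogonal pairs: one on the x axis, one on the y axis."""
    t = math.radians(theta_deg)
    return d * math.cos(t) / c, d * math.sin(t) / c

def angle_from_x_pair(tx, d=0.004, c=343.0):
    """One pair only: arccos maps both mirror directions to 0..180 degrees."""
    return math.degrees(math.acos(max(-1.0, min(1.0, tx * c / d))))

def angle_from_both_pairs(tx, ty):
    """Two orthogonal pairs: atan2 resolves the full 0..360 degree range."""
    return math.degrees(math.atan2(ty, tx)) % 360

for theta in (30, 330):
    tx, ty = pair_delays(theta)
    print(theta, round(angle_from_x_pair(tx), 1), round(angle_from_both_pairs(tx, ty), 1))
```
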
Acoustic Echo Cancellation: removes the echo of playback audio in speech capture applications
AEC (estimates room reverberation)
Reverberant room
Known audio source, e.g. music / voice
Single-microphone application
STM32 is connected to both
the microphone and the
loudspeaker
osxAcousticEC: the Open.AUDIO AEC library is an optimized STM32 port based on the open-source project Speex:
http://www.speex.org/
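The AEC principle fits in a few lines. An adaptive filter learns the echo path from the loudspeaker reference to the microphone and subtracts its echo estimate, leaving the near-end speech. Speex's canceller uses a frequency-domain multidelay (MDF) adaptive filter. The sketch below swaps in the simpler time-domain NLMS algorithm to show the idea, with a hypothetical room response and hypothetical function names.

```python
import random

def nlms_echo_cancel(mic, ref, taps=16, mu=0.5, eps=1e-6):
    """Time-domain NLMS acoustic echo canceller.

    mic: microphone signal = near-end speech + echo of ref through the room.
    ref: loudspeaker (far-end) reference signal.
    Returns (e, w): e[n] = mic[n] - w . x[n] is the cleaned signal and
    w is the estimated echo path.
    """
    w = [0.0] * taps                 # echo-path estimate (room impulse response)
    x = [0.0] * taps                 # most recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        x = [r] + x[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, x))
        e = m - echo_hat
        norm = eps + sum(xi * xi for xi in x)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out, w

rng = random.Random(1)
ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]    # playback signal
room = [0.6, 0.0, -0.3, 0.1]                            # hypothetical echo path
mic = [sum(room[k] * ref[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(ref))]                        # echo only, no near-end talk
cleaned, w = nlms_echo_cancel(mic, ref)
print([round(v, 3) for v in w[:4]])
```

A real canceller also needs double-talk handling, so that it does not adapt while the near-end speaker is talking, and far longer filters to cover room reverberation tails. Frequency-domain designs such as MDF keep that cost manageable.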
Putting together SW libraries: SmartAcoustic1
(Diagram: 4-MEMS microphones array feeding Beamforming and Source Localization; Acoustic Echo Cancellation with reference audio)
• Example project in source code built on STM32Cube software technology
• Includes acoustic Beamforming, Echo Cancellation, and Source Localization.
• Immediate test and performance evaluation
Source Localization
• User-selectable angle resolution
• User-selectable activation threshold
• Based on 4 MEMS microphones
• 360° localization range
Beamforming
• User-selectable beam direction
• User-selectable beamforming algorithm
• Based on 4 MEMS microphones
• GUI highlights the chosen microphone pair
Acoustic Echo Cancellation
• Based on a single MEMS microphone
• Reference audio is stored in STM32 FLASH
• Uses Audio OUT to play back audio while streaming cleaned speech over USB
SmartAcoustic1
Evaluation system
Software reference design
Multi-platform support:
• Supports STM32 Nucleo expansion boards X-NUCLEO-CCA01M1 and X-NUCLEO-CCA02M1 connected to a NUCLEO-F446RE board
• Supports BlueCoin, the integrated audio and sensors platform
Smart Home Use Case Discussion: The Internet Voice Assistant for Smart Home
Typical features:
• Audio capture and playback
• Automatic voice dialogue
  • Cloud based
  • Mixed embedded/cloud
• Internet connection
• Powered
  • Plugged to mains
  • Battery operated
The Problem: Indoor Voice, Audio, Noise
Audio input (e.g. music, or far-end speaker)
Direct Acoustic Echo
Audio output (e.g. speaker’s voice, clean)
background noise
reflections, diffusion, …
Voice
Beamforming vs. AEC
(Diagram: Beamforming, and Acoustic Echo Cancellation with reference audio)
Beamforming:
• requires two (or more) microphones
• is independent of the loudspeaker
AEC:
• requires a single microphone
• must also connect to the audio OUT path
• AEC (tries to) cancel the Direct Acoustic Echo and its reflections
• Beamforming (tries to) cancel every signal that is not «on the beam»
Combining Beamforming and AEC
(Diagram: all microphones feed Beamforming; one of the microphones feeds Acoustic Echo Cancellation with the reference audio. Each output goes to ASR, and the output with the best ASR score is chosen.)
Alternative solution, based on ASR confidence ranking
Combined Beamforming and Localization in noisy environments
Multiple beamformers run in parallel; the output is selected based on ASR score ranking.
Source localization may be an implicit result
of multiple beamforming & ASR ranking
(Diagram: several Beamforming → ASR chains; ASR embedded or in the cloud)
NOTE: the osxAcousticSL Acoustic Source Localization library is not effective in the presence of strong noise, reflections and reverberation.
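Selection by ASR ranking, and the implicit localization it provides, reduce to picking the candidate with the highest confidence. If that candidate is a beam with a known look direction, the direction is a coarse estimate of the source position. A minimal sketch: the ASR engine (embedded or cloud) is outside its scope, and the channel names, transcripts and confidence values are made-up inputs for illustration only.

```python
from typing import NamedTuple, Optional

class AsrResult(NamedTuple):
    channel: str                            # a beam output or the AEC output
    transcript: str
    confidence: float                       # score reported by the ASR engine
    look_direction: Optional[float] = None  # beam direction in degrees, if any

def select_best(results):
    """Return the result with the highest ASR confidence."""
    return max(results, key=lambda r: r.confidence)

def implicit_doa(results):
    """Coarse source direction: look direction of the best-scoring beam, if known."""
    return select_best(results).look_direction

# Hypothetical ASR outputs for four parallel beams and the AEC channel.
results = [
    AsrResult("beam_0", "turn on the light", 0.41, 0.0),
    AsrResult("beam_90", "turn on the lights", 0.87, 90.0),
    AsrResult("beam_180", "turn of the night", 0.22, 180.0),
    AsrResult("beam_270", "turn on the light", 0.55, 270.0),
    AsrResult("aec", "turn on the light", 0.63),
]
best = select_best(results)
print(best.channel, best.transcript, implicit_doa(results))
```
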
Example of System Implementation
Concurrent execution of multiple beamforming, AEC, and ASR (diagram):
• several Beamforming → ASR chains in parallel
• one microphone → Acoustic Echo Cancellation (reference: audio OUT) → ASR
• ASR embedded or in the cloud
• output selected based on ASR score ranking
Hint: consider sensing the loudness level to switch off algorithms when they are not needed!
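The hint can be implemented as a loudness gate that enables the heavy processing chain only while the input is loud enough. Two thresholds (hysteresis) and a hangover counter avoid rapid toggling. The class name and all default values are hypothetical.

```python
import math

class LoudnessGate:
    """Enables heavy processing only while the input is loud enough."""

    def __init__(self, on_db=-45.0, off_db=-50.0, hangover_frames=25):
        self.on_db = on_db                  # switch on at or above this level
        self.off_db = off_db                # count quiet frames below this level
        self.hangover_frames = hangover_frames
        self.active = False
        self.quiet_count = 0

    def update(self, frame):
        """Feed one frame of samples; returns True if processing should run."""
        mean_square = sum(x * x for x in frame) / len(frame)
        level_db = 10.0 * math.log10(max(mean_square, 1e-12))
        if level_db >= self.on_db:
            self.active = True
            self.quiet_count = 0
        elif level_db >= self.off_db:
            self.quiet_count = 0            # hysteresis band: keep current state
        else:
            self.quiet_count += 1
            if self.quiet_count >= self.hangover_frames:
                self.active = False
        return self.active

gate = LoudnessGate(hangover_frames=3)
loud = [0.1] * 160      # about -20 dB
silent = [0.0] * 160
print([gate.update(f) for f in [loud, silent, silent, silent, silent]])
```

In a real loop, the beamforming, AEC and ASR calls would run only when `gate.update(frame)` returns True; the gate itself costs one sum of squares per frame.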
MEMS Microphone Array to Cloud Architecture
Integrated Terminal: Audio Front End Signal Processing → Communication Interface → Gateway → 3rd Party Cloud-based Services
Thank You