Speech recognition: ready to take off?

By: Ma Jie (A0129447X), Niu Rui (A0040287J), Nguyen Gia Huy (A0045581E), Liu Lili (A0132407R), Tan Gee Kwang (A0147159X)

Overview

Performance of SR

• Siri

• Other applications

SR improvement

• Underlying technology

Emerging Application

• Avionics

• Field Automation

In 2013, an Intelligent Voice survey showed that only 15% of respondents had used Siri in iOS 7. Nearly half believed Apple had “oversold Siri’s voice recognition capabilities”.

At WWDC 2015, Apple’s software engineering vice president claimed that Siri gets 1 billion requests a week.

Performance of Siri

– Does basic math faster

– Finds facts two times faster

– Sets alarms four times faster than you

– Tweets more than two times faster than you

– Converts measurements

Siri Usage Rate Detail and Customer Satisfaction

Source: http://www.imore.com/siri-months-community-report-card

Chart: Do you use Siri on your iOS device?

Response options: Yes, and I like it; Yes, but it could be better; Yes, and I'm neutral; No: tried it and didn't like it; No: I didn't even try because I have no desire; Other

Chart values, in extracted order: 15%, 36%, 10%, 20%, 12%, 7%

Source: http://www.besttechie.com/2013/03/07/do-people-still-use-siri/

Performance of Siri

Apple claims that in iOS 9, Siri will be up to 40 percent faster and 40 percent more accurate.

What has held it back?

1. There is a learning curve.

2. It’s far from perfect.

3. The use cases are limited.

4. Lack of integration with third-party apps.

Speech Recognition Market

Source: Matt M., Joshua S., and David H. 2014. Dynamic Commercialization Strategies for Disruptive Technologies: Evidence from the Speech Recognition Industry

Over the past 50 years, technological breakthroughs have enabled SR to become a reality.

Coupled with advances in CPU power and enhanced software algorithms, SR achieved steep improvement and commercial feasibility after the 1990s.

Current Applications of SR

Applications in various industries

Call Centers

Medical Industries

Education

Automotive

Home Automation

Students with disabilities used an SR-powered Hosted Transcription System (HTS) to convert digitized audio and video into accessible multimedia transcripts.

In 2011, 52% of Canadian disability service providers interviewed reported using speech-to-text supports.

Strengthen by lowering WER

Problems:

– Scalability to meet temporal demands

– Fixed cost for infrastructure

SR in Education – Liberated Learning Project (LLR)

Quality

Cost

Source: http://www.transcribeyourclass.ca/financial.html

IHS Automotive: About 25% of U.S. motorists use speech recognition in their cars daily, and 53% use it at least once a week. By 2020, 68 million vehicles worldwide will have voice controls, an 84% increase from 37 million in 2014.

SR in Automotive

Most SR systems on today’s market support about 50 to 60 voice commands.

Commonly used features: making calls, playing music, temperature control, navigation.

More features available: reminders, sending emails, searching for nearby restaurants/shops/petrol stations, real-time traffic conditions, connecting to other SR control systems (e.g. home automation)…
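A fixed command set of this size can be handled by matching the recognized text against a known list. The sketch below is only an illustration under stated assumptions, not how any in-car product works: the command strings, the `match_command` name and the 0.6 similarity cutoff are all hypothetical, and the fuzzy matching comes from Python's standard `difflib` module.

```python
import difflib

# Hypothetical command list; the slide cites about 50 to 60 in real systems.
COMMANDS = ["make a call", "play music", "set temperature", "start navigation"]


def match_command(transcript, cutoff=0.6):
    """Return the closest known command for a recognized phrase, or None."""
    phrase = transcript.lower().strip()
    # get_close_matches returns at most n candidates scoring above the cutoff.
    matches = difflib.get_close_matches(phrase, COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("play some music"))
print(match_command("xyzzy"))
```

A phrase that falls below the cutoff returns `None`, which corresponds to the "doesn't recognize" case that drivers report.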

SR in Automotive

Nuance – Dragon Drive Platform

– Cloud-based voice and content solutions

– Integrated with in-vehicle cloud-based search capabilities from Telenav, leader of location-based services (Source: Telenav, Nov 3, 2015)

– Attractive features: reads out the daily update when the driver enters the car; connects the home to the car through LG HomeChat software

SR in Automotive

Video: https://www.youtube.com/watch?v=laxXWUxXcWs

http://www.youtube.com/watch?v=laxXWUxXcWs&t=1m54s

Problems encountered with ASR in cars -

– Doesn’t recognize/misinterprets verbal commands (63 percent)

– Doesn’t recognize/misinterprets names/words (44 percent)

– Doesn’t recognize/misinterprets numbers (31 percent)

– Wind noise

– Language accents

– Imperfect speech recognition software might prove to be a distraction

SR in Automotive

SR in Home Automation

Smart home

– Lighting control (Vocca)

– TV (Apple TV)

– Personal Assistant (Echo, Homey)

SR in Home Automation – Apple TV

The Apple TV uses Siri search as the glue that holds all those individual apps together. Voice commands (also found on Roku, Android TV and Amazon Fire TV) are easier than entering names on a virtual keyboard. And despite some rough edges, Siri is more helpful than the rest.

Siri’s advantage is more advanced queries.

Six degrees of Kevin Bacon

Filter TV episodes by actors

Rewind

Siri’s limitation:

Pronunciation of difficult names

TV show recognition by genres

Source: http://www.wsj.com/articles/apple-tv-review-a-giant-iphone-for-your-living-room-1446080460

The TV of the future needs to be as powerful and easy to use as an iPhone, and this Apple TV is the first box—and the first Apple TV—to achieve that.

Amazon Echo – limited launch on November 6, 2014; wide release on June 23, 2015

Can answer general questions, reorder the items you buy frequently from Amazon, and play music.

SR in Home Automation

Source: http://www.amazon.com/Amazon-SK705DI-Echo/dp/B00X4WHP5E/ref=sr_1_1?ie=UTF8&qid=1446173814&sr=8-1&keywords=amazon+echo

Source: http://www.cnet.com/products/amazon-echo-review/

Apple's HomeKit

– A framework for communicating with and controlling connected accessories in a user’s home, announced at Apple WWDC 2014.

SR in Home Automation

HomeKit-certified devices

– ecobee3: Uses sensors and a thermostat to keep tabs on your home’s temperature.

– Elgato: A variety of Elgato’s Eve sensors will give you all kinds of information about what’s going on inside your home. (Door & Window, Energy, Weather, Room)

– iHome: Plug ordinary devices into the smart plug, and you can start controlling them with your phone.

– Insteon: The company’s hub can control all its products, including lights and locks, even from outside your home.

– Lutron: Control your lights and shades with its bridges and kits.

– iDevices: Plug anything into the company’s indoor or outdoor switch to make the device smart, and control your climate with the thermostat.

– Schlage: You’ll be able to ask Siri to lock and unlock your door.

– August: The smart lock company announced a doorbell camera and keypad for its lineup, but it’s just the new lock that works with Siri for now.

Coming: Plugs, Thermostats (Honeywell Lyric), Lighting (Philips), Alarm System (Honeywell Lynx Security System)

Partnerships: Chamberlain MA Garage, Cree, Friday Smart Lock, GE (color-changing LEDs), Haier (smart air-conditioner), Incipio, Kwikset, Netatmo, Osram Sylvania, Philips Hue, SkyBell, Withings (baby monitors)

Source: http://www.digitaltrends.com/home/a-list-of-apple-homekit-compatible-devices/

Total price: US$2000

SR in Home Automation

Source: http://publications.lib.chalmers.se/records/fulltext/203117/203117.pdf

Most commonly used features

Other features that users would like

There is a user base for SR (doctors, drivers, smartphone users…)

But most customers have tried SR only a few times, or use basic commands only when they have to (driving, busy hands, etc.)

Why?

– SR doesn’t recognize complicated commands, which limits the available features

– SR reacts very slowly

– It takes time to train

– Interaction with SR is not natural; words must be clear and without emotion

– A bad first impression leaves users with no interest in trying again, even though SR is improving

Summary of Challenges in SR

Customers don’t think that using SR is necessary in their daily life!

Dimension: Speed

Requirements: Process the algorithms

Components: Processor

Underlying Technology of Speech Recognition

Source: http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/architecture-2008/lec07-cache.html

Dimension: Accuracy

Requirements and achievements:

– Quality of signal received: background noise elimination, channel effect elimination

– Acoustic scoring: deep learning, acoustic database

– Language matching: modelling, language database

Underlying Technology of Speech Recognition

Underlying Technology of Speech Recognition

Components: Microphone

Underlying Technology of Speech Recognition

Components: Memory

• Speech recognition needs support from a database, which can be local or in the cloud.

• Memory performance lags far behind processor performance, so the bottleneck of an SR system is memory speed (or network speed when the database is in the cloud).

Source: http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/architecture-2008/lec07-cache.html

Underlying Technology of Speech Recognition

Components: Algorithms

Noise Elimination Algorithm Performance

• Noise has two main effects on the speech representation: distortion in the representation space and a loss of information.

• Studies show that noise compensation methods help improve accuracy at different SNR (signal-to-noise ratio) levels and distances.

Source: Angel de la T. et al. Speech Recognition Under Noise Conditions: Compensation Methods

Source: Pedro J. Moreno, 1996, Speech Recognition in Noisy Environments
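For reference, SNR is usually expressed in decibels as 10·log10 of the ratio of signal power to noise power. The sketch below is a minimal illustration of that standard definition, not the method used in the cited studies; the `snr_db` name and the sample values are assumptions chosen only for the example.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from two sequences of samples."""
    # Mean power of each sequence (average of squared samples).
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical samples: speech amplitude 1.0, background noise amplitude 0.1.
speech = [1.0, -1.0] * 4
background = [0.1, -0.1] * 4
print(round(snr_db(speech, background), 1))
```

A tenfold amplitude gap gives a hundredfold power gap, so this toy case sits at about 20 dB; a noisier cabin would push the figure lower.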

Speakers may have different accents, dialects, or pronunciations, and speak in different styles, at different rates, and in different emotional states.

Deep learning, introduced in 2006, attempts to learn multiple levels of representation of increasing complexity/abstraction.

A new architecture, the deep belief network (DBN)-HMM, was developed in 2012.

Deep Learning

The idea dates from the 1970s, but progress was very slow because of computational and data limitations.

Deep learning - one step closer to artificial intelligence

Deep Learning

More data, faster hardware

Word error rate (WER) for SR technology in automotive has been reduced to below 1%

Accuracy of SR

Source: http://whatsnext.nuance.com/in-the-labs/deep-learning-in-connected-cars/

Overall WER improvement for SR

Accuracy of SR

Source: http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/

Accuracy of SR

According to Baidu, its error rates were 6.56% in a clean environment and 19.06% in noisy environments, using GPUs.

Apple claims that Siri in iOS 9 has only a 5% word error rate.

Siri in iOS 9 asks you to teach it your voice whenever you change to a new language.

Source: NVIDIA GTC: The Race To Perfect Voice Recognition Using GPUs

TARGET: < 0.1% or even 0%
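The WER figures above follow the usual definition: substitutions, deletions and insertions needed to turn the recognized text into the reference, divided by the number of reference words. The sketch below is a minimal illustration using word-level edit distance; the `word_error_rate` name and the example sentences are assumptions, and real evaluations normally also normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one wrong word out of five.
print(word_error_rate("set an alarm for seven", "set alarm for eleven"))
```

Two errors in a five-word command already give a WER of 40%, which shows how far a 5% or sub-1% system is from a casual mishearing.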

How will SR improve further?

Customers don’t think that using SR is necessary in their daily life!

BUT IF SR is faster and smarter at understanding commands, with more features available,

Customers might start thinking: Why not try SR?

For example: the ability to recognize multilingual content, direct links to third-party apps, allowing multiple users to interact at the same time…

So, when will SR like Siri be widely used by customers?

2020 to 2025

– Improvement of deep learning (Apple acquired VocalIQ in October 2015) for more intelligent algorithms

– Improvement of big data: multiple channels to enhance the database used in modelling, for higher accuracy

– Improvement of mobile networks: faster response for a better customer experience

– With the diffusion of smart devices and apps, new customers will have more chances to adopt SR before old habits form

– Potential new standard for human-machine interfaces

– Cost will fall further as core components improve

How will SR improve further?

Speech Recognition: Future Market Trend

Voice will be the most important area for growth in mobile user interfaces.

Tractica forecasts that the SR market will reach $5.1 billion by 2024, at a CAGR of 40%.

Strongest market: the consumer-facing market, in mobile device authentication and control of wearable devices.

The Global Automotive Voice Recognition Market 2014-2018 report forecasts that the automotive voice recognition sector will grow at a 10.59% CAGR to 2018.

Speech Recognition: Future Market Trend

SR market in Automotive

Market for Home Automation

– Annual growth rate can reach 67% over the next 5 years

– Revenue of $61 billion, growing at a 52% compound annual growth rate, is forecast to reach $490 billion in 2019
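The home automation figures can be sanity-checked with the standard compound growth formula, end = start × (1 + rate)^years. The sketch below is only a rough check: the function names are illustrative, and the 5-year horizon is an assumption, since the slide does not state the base year for the $61 billion figure.

```python
def project(start_value, annual_rate, years):
    """Value after compounding annual_rate for the given number of years."""
    return start_value * (1 + annual_rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumed 5-year horizon; values in billions of US dollars.
print(round(project(61, 0.52, 5), 1))   # $61B compounding at 52% a year
print(round(cagr(61, 490, 5), 3))       # rate implied by $61B -> $490B
```

Under that assumption the two quoted numbers are consistent with each other: 52% a year over five years lands close to the $490 billion forecast.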

Speech Recognition: Future Market Trend

SR in Avionics - Head-in and Head-out in cockpit

Multi-function displays with menu structures many tiers deep

The pilot needs one hand on the collective and the other on the joystick

SR in Avionics

Speech recognition reduces workload and frees pilots’ hands.

With increased head-up time, the pilot can focus on flying the aircraft and respond to the outside environment.

Noise elimination and integration with onboard system

http://www.speech.sri.com/press/airforce-print-news-oct15-2007.pdf

http://www.gizmag.com/go/7484/

Navigation Functions

• Entering waypoints and inputting FMS data

• Reduce confusion

Communication Functions

• Change channel frequencies by voice control

• Query system by “asking”

Checklist

• Task list

• Avionics monitor

Safety and security are roadblocks for SR adoption in avionics

Entry level functions with low safety concerns

SR in Avionics

SR Deployment in Avionics

Timeline, years in extracted order: 2000, 2007, 2008, 2014, 2015

Systems, in extracted order: Typhoon; Gazelle; F-35 & F-22; Sferion Assistance System; Pro Line Fusion flight deck

Milestone labels: Direct input voice system; Speaker-independent system; Start in civil avionics

It is not a technology problem, but more of an acceptance problem.

Air transport will accept SR after a product actually comes out and proves its value

SR Commercialization in Avionics

"We've hit our sweet spot finally and its gotten to the point where its getting very, very close to being product ready in terms of being mature enough to get out there."

- Geoff Shapiro from Rockwell Collins

Source: http://www.aviationtoday.com/av/topstories/Rockwell-Collins-Rapidly-Advancing-Cockpit-Voice-Recognition-Technology_83515.html#.Vjm710b0wTY

SR in Field Automation

Equipment inspection in the field using portable devices with an embedded speech recognition system

Enter data faster and reduce the cost

Source: https://www.earthworksaction.org/issues/detail/oil_and_gas_noise#.Vh_SNN-qpBc

Source: http://www.ehjournal.net/content/14/1/18

SR in Field Automation

Noise levels are very high, so noise elimination will be more challenging

https://www.earthworksaction.org/issues/detail/oil_and_gas_noise

http://www.ehjournal.net/content/14/1/18

Robots designed for dedicated functions can only receive pre-defined instructions

Low demands on noise elimination, processing and memory

SR in Personal Robot for Family

Artificial Intelligence – Key technology for future improvement of SR

We should “talk” rather than type

Artificial intelligence should be deployable in any complex environment, with the capacity to understand instructions

High demands on noise elimination, processing and memory

SR in the future – everywhere in your life

Driving in the car; shopping in the mall; eating in the canteen

Q&A
