Welcome to HLAA Webinars

The Rise of Automatic Speech Recognition (ASR): The History, Current Technologies and Practical Tips

Guest Speaker: Linda Kozma-Spytek, Senior Research Audiologist, Technology Access Program; Co-Director, Deaf/Hard of Hearing Technology RERC; Gallaudet University

The Rise of ASR

• What is ASR and how is it used

• History of ASR

• Examples of current technologies for captioning using ASR

• Practical considerations and tips

• Questions?

The mention of any product or service is strictly to serve as an example and does not constitute or imply an endorsement, nor should exclusion suggest disapproval.

What is automatic speech recognition?

• Computer hardware systems and software-based techniques that identify and process the human voice

• Known by a variety of terms: speech to text, automatic voice recognition, voice to text, speech recognition

How is automatic speech recognition used?

• Used to perform an action based on the instructions defined by a user

• Used for authenticating users via their voice (biometric authentication)

• Used to convert words a person has spoken into text

ASR Has a 70-Year History

• 1950s & 60s: Bell Laboratories and IBM – isolated numbers and words

• 1970s: US Department of Defense – about 1,000 words (roughly the vocabulary of a 3-year-old)

• 1980s: new models of speech recognition – thousands of words, continuous speech


• 1990s: Dragon Dictate (Nuance) – first commercial ASR system with advent of personal computers

• 2000s: Google Voice Search – 230 billion words from user searches

• 2010s…


o IBM Watson used by thousands of companies, from computer software to health care to financial services

o Apple Siri on 700 million+ iPhones

o Google Assistant on 400 million+ devices

o Microsoft Cortana on 400 million+ Windows 10 devices

o Amazon Alexa has thousands of hardware and software integrations for smart home use

“We’ve seen more progress in this technology in the last 30 months than we saw in the last 30 years…. Ultimately vocal computing is replacing the traditional graphical user interface.”

Shawn DuBravac, Chief Economist, Consumer Technology Association, CES 2017

Some common uses of ASR include:

• Medical dictation

• In-car systems control

• Journalism

• Solving crimes

• Telephone interactive voice response

• Pronunciation training and evaluation

• Virtual and Voice Assistants

• Home automation

• Video games and…

Captioning for…

– Face-to-Face Conversation

– Telecommunications

– Media

o live and pre-recorded

– Presentations

– Meetings

o in-person and remote

Face-to-Face Conversation (examples)

Live Transcribe (Google)

Microsoft Translator

Telecommunications (examples)

Hearing Life e-News from HLAA May 14, 2020

IPCTS

Skype (Microsoft)

Media: live and pre-recorded (examples)

Live Caption (Google)

Live Transcribe (alternative to Live Caption)

Set-up from: Technical Diagrams for Accessible Speech-to-Text Technologies. Found at: http://deafearscientists.org/other-links/

3.5 mm audio jack connector

3.5 mm and ¼ inch audio jacks

Presentations (examples)

Microsoft PowerPoint

This is an example of captions in PowerPoint.

Google Slides

This is an example of captions in Google Slides.

Meetings: in person and remote (examples)

Otter

Live Transcribe or Microsoft Translator using Screen Mirroring (alternatives to Otter)

USB-C connection on phone to HDMI connection on TV

USB Connectors HDMI

Videoconference Meetings

GoToMeeting + Otter

Videoconference Meetings

Google Meet

Performance, Features and Functions

• Accuracy

• Latency

• Speaker Identification

• Punctuation

• Non-Speech Information

• Caption Size, Font, and Location

• Number of Lines of Text and Scrolling

• User Controls and Interface

• Saving, Editing and Printing Transcripts

• Specialized Vocabulary
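Accuracy is often summarized as a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the caption text into what was actually said, divided by the number of words actually spoken. The sketch below is only an illustration and is not taken from the webinar; the function name `word_error_rate` is hypothetical, and it assumes plain whitespace-separated words with case ignored and punctuation left untouched.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption for this sketch: words are split on whitespace and compared
    case-insensitively; punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from captions
                curr[j - 1] + 1,     # insertion: extra word in captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "her mother gave her a violin"
    captioned = "her mother gave a violent"
    print(f"WER: {word_error_rate(spoken, captioned):.2f}")
```

A lower value means fewer caption errors. Because insertions count as errors, WER can exceed 1.0 when the captions contain many extra words.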

I'd like to tell you a story that shows why moments like this, moments of chaos and disruption, allow us to discover things about ourselves. The story I'd like to share is of a young woman named Maya Shankar. When Maya was a girl, her mother gave her a violin. She was captivated by the instrument, and she began playing constantly. Eventually, she was accepted to Juilliard, the renowned music school in New York City. She became a student of the great violinist, Itzhak Perlman. She was well on her way to becoming a professional musician. Everything was going exactly to plan. And then one day as she was practicing, she overstretched a finger and felt a pop. She'd injured a tendon. Months passed. Her hand never healed properly, and eventually doctors told her she had to give up the violin. The grief that Maya felt was enormous. Her plan for her life, her very identity, had been wrapped around music. She didn't know what to do next. And then one day as she was helping her parents clean out their basement, she stumbled on one of her sister’s course books. It was The Language Instinct by Steven Pinker. And as she read the book, Maya found herself captivated by the human mind. She realized she wanted to learn more about the brain and how it works. And so she decided to study cognitive science. Fast forward a couple of decades and Maya has since completed a Rhodes scholarship and a stint in the Obama White House. I tell you Maya's story not to suggest it should be your story, but because it shows that when the world is uncertain, and the ground under our feet feels unsteady, that's often the time we discover new things about ourselves.

that

it’s up early.……

should enjoy attended past or had….……. always

competed.. road…..

GIGO: Garbage In, Garbage Out

• Speaker Characteristics

o Speech clarity

o Rate of speech

o Accented speech

o Overlapping speech

• Speech Coding

• Microphone(s)

o Location

o Quality and sensitivity

• Environment

o Noise

o Signal-to-noise levels

Other Considerations

• Compatibility with devices/software you use

• Cost of using the software/applications/cloud services

• Other services and equipment needed

• Built-in tools wanted or needed

• Power usage and Internet connectivity

• Privacy and security concerns

Final Thoughts

• Read the product information and manuals

• Use online help and tutorials

• Provide feedback; report problems

• Have patience for the learning process

• Don’t be afraid to try, and try again

• Expect the inevitable learning curve

Resources Document

General Information

• Knowledge Base: Strategies for Deaf and Hard of Hearing Communication

Technology Examples and Website Links

• Face-to-Face Conversation: Ava, Google Live Transcribe, Microsoft Translator

• Telecommunications: Microsoft Skype, InnoCaption, Machine Genius, Google Live Relay

• Media: Google Live Caption, Google Live Transcribe, Otter

• Presentations: Microsoft PowerPoint, Google Slides

• Meetings: Google Meet, Otter, Web Captioner

Questions?

If you have any questions, please contact:

[email protected]

THE CONTENTS OF THIS WEBINAR WERE DEVELOPED UNDER A GRANT FROM THE NATIONAL INSTITUTE ON DISABILITY, INDEPENDENT LIVING, AND REHABILITATION RESEARCH (NIDILRR GRANT NUMBER 90REGE0013-01-00). NIDILRR IS A CENTER WITHIN THE ADMINISTRATION FOR COMMUNITY LIVING (ACL), DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS). THE CONTENTS OF THIS WEBINAR DO NOT NECESSARILY REPRESENT THE POLICY OF NIDILRR, ACL, HHS, AND YOU SHOULD NOT ASSUME ENDORSEMENT BY THE FEDERAL GOVERNMENT.

Thank you for joining HLAA Webinars

For more educational resources on hearing loss and recorded webinars, please visit hearingloss.org
