Welcome to HLAA Webinars
The Rise of Automatic Speech Recognition (ASR): The History, Current Technologies and Practical Tips
Guest Speaker: Linda Kozma-Spytek
Senior Research Audiologist, Technology Access Program
Co-Director, Deaf/Hard of Hearing Technology RERC
Gallaudet University
The Rise of ASR
• What is ASR and how is it used
• History of ASR
• Examples of current technologies for captioning using ASR
• Practical considerations and tips
• Questions?
The mention of any product or service is strictly to serve as an example and does not constitute or imply an endorsement, nor should exclusion suggest disapproval.
What is automatic speech recognition?
• Computer hardware systems and software-based techniques to identify and process human voice
• Known by a variety of terms: speech to text, automatic voice recognition, voice to text, speech recognition
How is automatic speech recognition used?
• Used to perform an action based on the instructions defined by a user
• Used for authenticating users via their voice (biometric authentication)
• Used to convert words a person has spoken into text
ASR has a 70 Year History
• 1950s & 60s: Bell Laboratories and IBM – isolated numbers and words
• 1970s: US Department of Defense – ~1000 words (the vocabulary of a 3-year-old)
• 1980s: new models of speech recognition – thousands of words, continuous speech
• 1990s: Dragon Dictate (Nuance) – first commercial ASR system with advent of personal computers
• 2000s: Google Voice Search – 230 billion words from user searches
• 2010s…
o IBM Watson used by thousands of companies from computer software to health care to financial services companies
o Apple Siri on 700 million+ iPhones
o Google Assistant on 400 million+ devices
o Microsoft Cortana on 400 million+ Windows 10 devices
o Amazon Alexa has thousands of hardware and software integrations for smart home use
“We’ve seen more progress in this technology in the last 30 months than we saw in the last 30 years…. Ultimately vocal computing is replacing the traditional graphical user interface.”
Shawn DuBravac
Chief Economist Consumer Technology Association
CES 2017
Some common uses of ASR include:
• Medical dictation
• In-car systems control
• Journalism
• Solving Crimes
• Telephone interactive voice response
• Pronunciation training and evaluation
• Virtual and Voice Assistants
• Home automation
• Video games and…
Captioning for…
– Face-to-Face Conversation
– Telecommunications
– Media
o live and pre-recorded
– Presentations
– Meetings
o in-person and remote
Face-to-Face Conversation (examples)
Live Transcribe (Google)
Microsoft Translator
Telecommunications (examples)
Hearing Life e-News from HLAA May 14, 2020
IPCTS
Skype (Microsoft)
Media: live and pre-recorded (examples)
Live Caption (Google)
Live Transcribe (alternative to Live Caption)
Set-up from: Technical Diagrams for Accessible Speech-to-Text Technologies Found at: http://deafearscientists.org/other-links/
3.5 mm audio jack connector
3.5 mm and ¼ inch audio jacks
Presentations (examples)
Microsoft PowerPoint
This is an example of captions in PowerPoint.
Google Slides
This is an example of captions in Google Slides.
Meetings: in person and remote (examples)
Otter
Live Transcribe or Microsoft Translator using Screen Mirroring (alternatives to Otter)
USB-C connection on phone to HDMI connection on TV
USB Connectors HDMI
Videoconference Meetings
GoToMeeting + Otter
Google Meet
Performance, Features and Functions
• Accuracy
• Latency
• Speaker Identification
• Punctuation
• Non-Speech Information
• Caption Size, Font, and Location
• Number of Lines of Text and Scrolling
• User Controls and Interface
• Saving, Editing and Printing Transcripts
• Specialized Vocabulary
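The slides do not say how accuracy is measured. One widely used measure is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch, assuming simple lowercase, whitespace-based word splitting; the function name `word_error_rate` is illustrative and does not come from any product mentioned in this webinar.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; punctuation is not stripped in this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, if the reference is "her mother gave her a violin" and the ASR output is "her mother gave a violent", one word is dropped and one is substituted, so the WER is 2 out of 6 words, or about 33%. A lower WER means higher accuracy.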
I'd like to tell you a story that shows why moments like this, moments of chaos and disruption, allow us to discover things about ourselves. The story I'd like to share is of a young woman named Maya Shankar. When Maya was a girl, her mother gave her a violin. She was captivated by the instrument, and she began playing constantly. Eventually, she was accepted to Juilliard, the renowned music school in New York City. She became a student of the great violinist, Itzhak Perlman. She was well on her way to becoming a professional musician. Everything was going exactly to plan. And then one day as she was practicing, she overstretched a finger and felt a pop. She'd injured a tendon. Months passed. Her hand never healed properly, and eventually doctors told her she had to give up the violin. The grief that Maya felt was enormous. Her plan for her life, her very identity, had been wrapped around music. She didn't know what to do next. And then one day as she was helping her parents clean out their basement, she stumbled on one of her sister’s course books. It was The Language Instinct by Steven Pinker. And as she read the book, Maya found herself captivated by the human mind. She realized she wanted to learn more about the brain and how it works. And so she decided to study cognitive science. Fast forward a couple of decades and Maya has since completed a Rhodes scholarship and a stint in the Obama White House. I tell you Maya's story not to suggest it should be your story, but because it shows that when the world is uncertain, and the ground under our feet feels unsteady, that's often the time we discover new things about ourselves.
GIGO: Garbage In, Garbage Out
• Speaker Characteristics
o Speech clarity
o Rate of speech
o Accented speech
o Overlapping speech
• Speech Coding
• Microphone(s)
o Location
o Quality and sensitivity
• Environment
o Noise
o Signal-to-noise levels
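Signal-to-noise level is usually expressed as a signal-to-noise ratio (SNR) in decibels, which compares the level of the speech reaching the microphone with the level of the background noise. The sketch below is an illustration only, assuming you already have audio samples of speech and of noise alone as lists of numbers; the helper names `rms` and `snr_db` are hypothetical and are not part of any app discussed here.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in decibels from two RMS amplitude levels.

    Uses 20 * log10 because the inputs are amplitudes, not powers.
    """
    if speech_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS levels must be positive")
    return 20.0 * math.log10(speech_rms / noise_rms)
```

A speech level ten times the noise level in amplitude corresponds to an SNR of 20 dB. Moving the microphone closer to the talker or reducing background noise raises the SNR, which generally gives the recognizer a cleaner signal to work with.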
Other Considerations
• Compatibility with devices/software you use
• Cost of using the software/applications/cloud services
• Other services and equipment needed
• Built-in tools wanted or needed
• Power usage and Internet connectivity
• Privacy and security concerns
Final Thoughts
• Read the product information and manuals
• Use online help and tutorials
• Provide feedback; report problems
• Have patience for the learning process
• Don’t be afraid to try, and try again
• Expect the inevitable learning curve
Resources Document
General Information
• Knowledge Base: Strategies for Deaf and Hard of Hearing Communication
Technology Examples and Website Links
• Face-to-Face Conversation: Ava, Google Live Transcribe, Microsoft Translator
• Telecommunications: Microsoft Skype, InnoCaption, Machine Genius, Google Live Relay
• Media: Google Live Caption, Google Live Transcribe, Otter
• Presentations: Microsoft PowerPoint, Google Slides
• Meetings: Google Meet, Otter, Web Captioner
Questions?
If you have any questions, please contact:
THE CONTENTS OF THIS WEBINAR WERE DEVELOPED UNDER A GRANT FROM THE NATIONAL INSTITUTE ON DISABILITY, INDEPENDENT LIVING, AND REHABILITATION RESEARCH (NIDILRR GRANT NUMBER 90REGE0013-01-00). NIDILRR IS A CENTER WITHIN THE ADMINISTRATION FOR COMMUNITY LIVING (ACL), DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS). THE CONTENTS OF THIS WEBINAR DO NOT NECESSARILY REPRESENT THE POLICY OF NIDILRR, ACL, HHS, AND YOU SHOULD NOT ASSUME ENDORSEMENT BY THE FEDERAL GOVERNMENT.
Thank you for joining HLAA Webinars
For more educational resources on hearing loss and recorded webinars,
please visit hearingloss.org