Need to find a specific segment in an hour-long web video, webcast, or podcast of a lecture? Want to read a transcript of that lecture? Want to bookmark, annotate, or discuss video or audio clips from an entire lecture? The SpokenMedia project at MIT is developing a web-based service to enable automatic lecture transcription. The project is also developing a suite of tools and services to improve interaction with webcasts and podcasts, enabling students and faculty to create rich media notebooks to support their learning and teaching. Presented by Brandon Muramatsu, Andrew McKinney and Peter Wilkins at NERCOMP 2010, Providence, Rhode Island, March 9, 2010.
SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks
Brandon Muramatsu [email protected]
Andrew McKinney [email protected]
Peter Wilkins [email protected]
MIT, Office of Educational Innovation and Technology
Citation: Muramatsu, B., McKinney, A., Wilkins, P. (2010). SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks. Presented at NERCOMP 2010: Providence, Rhode Island, March 9, 2010.
Unless otherwise specified, this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
BREAKING NEWS… YouTube announces captions on all videos… News at 11… …we now return you to your regularly scheduled presentation…
YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website:
http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html
SpokenMedia: What to do if your videos aren’t in YouTube
Why are we doing this?
• More & more videos on the Web
  – Universities recording course lectures
  – Students relying upon Web video for courses
MIT OCW 8.01: Professor Lewin puts his life on the line in Lecture 11 by demonstrating his faith in the Conservation of Mechanical Energy.
What video? Where?
iTunes U
What are the challenges?
• Search
  – Volume
  – Segmented by Web, Video
  – Text title and Description
Google Search for “angular momentum”, performed April 2009
What about Bing?
Bing Search for “angular momentum”, performed August 2009
What are the challenges?
• Description
  – Course and Lecture Title
  – Summary
  – Metadata?
YouTube, MIT OCW Physics 8.01 - Lecture 20, retrieved August 2009
webcast.berkeley, Physics 8A, 002, Spring 2009, retrieved August 2009
What are the challenges? (Use)
• Interaction & Use
  – Transcripts / captions
    • Do they exist?
    • Cost?
  – Full video vs. segments
Lewin, W. (1999). Lec 20 | 8.01 Physics I: Classical Mechanics, Fall 1999. Retrieved August 1, 2009 from YouTube Website:
http://www.youtube.com/watch?v=ibePFvo22x4
“GOD!!!51 MINUTES!!i think i'll pass.. “
– slourdas, YouTube
Search thru the Static
We’re living in a video world…but only have text to use for search…
flickr @futureatlas.com
Why do we need these tools?
• Improve search and retrieval
• Improve user experience
• Captioning for accessibility?
• Facilitate translation?
• Other uses?
YouTube Announcement
YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website:
http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html
Comparing SpokenMedia and YouTube Auto-Caption?
YouTube
• Scale ✔
• Research-basis ✔
• For all videos ✔ (soon)
• No transcript/caption export (?)
• YouTube hosted
• Accuracy based on general patterns (?)
• No transcript editing (?)

SpokenMedia
• Limited
• Research-basis ✔
• Service by request
• Transcript/caption export available ✔
• Hosted anywhere ✔
• Accuracy based on custom models ✔ (soon)
• Transcript editing ✔ (soon)
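The comparison lists transcript/caption export as a SpokenMedia feature but does not name an export format. As an illustration only, the sketch below assumes the transcript has already been split into timed caption segments (a hypothetical `Segment` type with start and end times in seconds) and renders them as SubRip (`.srt`) cues, a widely used caption format: a running index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line between cues.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One caption segment: start/end in seconds plus its text (hypothetical structure)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as SubRip cues: index, time range, text; cues separated by a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "angular momentum and torque"),
        Segment(2.5, 61.25, "it's extremely non intuitive"),
    ]
    print(to_srt(demo))
```

Because each cue carries its own time range, the same segments could be written to another caption format by swapping only the two formatting functions.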
Developing SpokenMedia…
• What do we have at MIT?
  – Existing videos & audio, new video
  – Lecture notes, slides, etc. (descriptive text)
  – Multiple videos/audio by same lecturer
  – Diverse topics/disciplines
• Research from Spoken Language Systems Group!!!
Enabling Research
• Spoken Lecture: research project
• Speech recognition & automated transcription of lectures
• Why lectures?
  – Conversational, spontaneous, starts/stops
  – Different from broadcast news, other types of speech recognition
  – Specialized vocabularies
James Glass [email protected]
Spoken Lecture Project
• Processor, browser, workflow
• Prototyped with lecture & seminar video
  – MIT OCW (~300 hours, lectures)
  – MIT World (~80 hours, seminar speakers)
Supported with iCampus MIT/Microsoft Alliance funding
James Glass [email protected]
Tech Transfer Timeline: Research -> Service
(Timeline figure: Spoken Language Systems Group research; axis 1990, 2000, 2010; dates marked 2006 and 2009)
Let’s see a demo!
Demo
How Does it Work?
Lecture Transcription Workflow
Recognizer Accuracy? ~85%
• Accuracy
  – Domain Model and Acoustic Model
  – Internal validity measure
  – Single 100% accurate transcript for a full course
Ongoing research by Jim Glass and his team
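The slide reports roughly 85% accuracy without defining the metric. Recognizer accuracy is commonly reported through word error rate (WER): the number of substitutions, deletions, and insertions needed to turn the recognizer output into a reference transcript, divided by the number of reference words, with word accuracy read as 1 - WER. Assuming that convention (the slides do not confirm it), 85% accuracy would mean about 15 word errors per 100 reference words. A minimal word-level edit-distance sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed as a case-insensitive, word-level Levenshtein distance with unit costs.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the reference words handled so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The reference wording is an assumption; the hypothesis is recognizer output quoted in this deck.
    ref = "introduce both torque and angular momentum"
    hyp = "introduce both fork an angular momentum"
    wer = word_error_rate(ref, hyp)
    print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.1%}")
```

On that example two of six words are substituted, so WER is 2/6. Because insertions also count as errors, WER can exceed 1 on very noisy output, which is why "accuracy" figures should state how they were computed.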
What works today?
Lecture Transcription Workflow
Transcript “Errors”
• “angular momentum and forks it’s extremely non intuitive”
  – “folks”?
  – “torques”?
• “introduce both fork an angular momentum”
  – “torque”!
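Errors like these are what a domain model is meant to reduce: “forks” and “torques” may sound alike, but the course's own lecture notes make “torques” far more likely in a physics lecture. As a toy illustration only (the slides do not describe SpokenMedia's recognizer at this level), the sketch below builds an add-one-smoothed unigram model from hypothetical course notes and adds it to hypothetical per-word acoustic log-scores to choose among confusable candidates. A real recognizer applies its language model during decoding over whole word sequences, not one isolated word at a time.

```python
import math
from collections import Counter
from typing import Dict, Set


def unigram_log_probs(domain_text: str, candidates: Set[str]) -> Dict[str, float]:
    """Add-one (Laplace) smoothed unigram log-probabilities for each candidate word,
    estimated from domain text such as lecture notes or slides."""
    counts = Counter(domain_text.lower().split())
    total = sum(counts.values())
    vocab_size = len(candidates | set(counts))  # unseen candidates still get some mass
    return {w: math.log((counts[w] + 1) / (total + vocab_size)) for w in candidates}


def pick_candidate(acoustic_log_scores: Dict[str, float], domain_text: str,
                   lm_weight: float = 1.0) -> str:
    """Return the candidate with the highest acoustic + weighted domain-model log score."""
    lm = unigram_log_probs(domain_text, set(acoustic_log_scores))
    return max(acoustic_log_scores,
               key=lambda w: acoustic_log_scores[w] + lm_weight * lm[w])


if __name__ == "__main__":
    # Hypothetical course notes and acoustic scores, for illustration only.
    notes = ("torques cause changes in angular momentum and net torques "
             "equal the rate of change of angular momentum")
    scores = {"forks": -1.0, "folks": -1.2, "torques": -1.5}
    print(pick_candidate(scores, notes, lm_weight=0.0))  # acoustics alone
    print(pick_candidate(scores, notes))                 # with the domain model
```

With `lm_weight=0.0` the acoustically best word, “forks”, wins; adding the domain model, in which “torques” appears twice in the notes, tips the choice to “torques”.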
That’s what we have today…
• Features
  – Video linked transcripts
  – “Bouncing Ball” follow along
  – Search within a video
  – Multiple transcript language support
• Challenges
  – Accuracy (partial toolset)
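Video-linked transcripts, the “bouncing ball,” and search within a video all reduce to simple lookups once each recognized word carries a timestamp. The sketch assumes a hypothetical list of `TimedWord` records sorted by start time; it is not SpokenMedia's player code. A binary search finds the word under the playhead, and a scan returns every time a search term is spoken so the player can seek there.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedWord:
    """A recognized word and the time (seconds) at which it starts; hypothetical structure."""
    start: float
    word: str


def current_word_index(words: List[TimedWord], playback_time: float) -> Optional[int]:
    """Index of the word being spoken at playback_time, for highlighting ("bouncing ball").

    Assumes `words` is sorted by start time; returns None before the first word.
    """
    starts = [w.start for w in words]  # a real player would precompute this list once
    i = bisect_right(starts, playback_time) - 1
    return i if i >= 0 else None


def search_in_video(words: List[TimedWord], query: str) -> List[float]:
    """Start times of every occurrence of a single-word query (case-insensitive)."""
    q = query.lower()
    return [w.start for w in words if w.word.lower() == q]


if __name__ == "__main__":
    transcript = [
        TimedWord(0.0, "angular"),
        TimedWord(0.4, "momentum"),
        TimedWord(1.1, "and"),
        TimedWord(1.3, "torque"),
        TimedWord(5.0, "torque"),
    ]
    print(current_word_index(transcript, 1.2))    # 2 ("and" is under the playhead)
    print(search_in_video(transcript, "Torque"))  # [1.3, 5.0]
```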
Where are we heading?
• Improved accuracy
• Automate and improve processing
• Search across multiple video transcripts
• Starting a lecture transcription service
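Searching across many transcripts is typically done with an inverted index that maps each word to where it occurs. A minimal sketch under that assumption, with hypothetical video IDs and transcripts given as (start time, word) pairs; it returns only videos that contain every query word, together with the times those words are spoken.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transcript = List[Tuple[float, str]]        # (start time in seconds, word)
Index = Dict[str, List[Tuple[str, float]]]  # word -> [(video_id, start time), ...]


def build_index(transcripts: Dict[str, Transcript]) -> Index:
    """Build an inverted index over all transcripts, keyed by lower-cased word."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for video_id, words in transcripts.items():
        for start, word in words:
            index[word.lower()].append((video_id, start))
    return dict(index)


def search(index: Index, query: str) -> Dict[str, List[float]]:
    """Videos containing every query word, mapped to the sorted times those words occur."""
    terms = list(dict.fromkeys(query.lower().split()))  # de-duplicate, keep order
    if not terms:
        return {}
    postings = [index.get(term, []) for term in terms]
    # AND semantics: keep only videos in which every term appears at least once.
    matching = set.intersection(*({vid for vid, _ in p} for p in postings))
    hits: Dict[str, List[float]] = {vid: [] for vid in matching}
    for p in postings:
        for vid, start in p:
            if vid in hits:
                hits[vid].append(start)
    return {vid: sorted(times) for vid, times in hits.items()}


if __name__ == "__main__":
    # Hypothetical video IDs and timings.
    idx = build_index({
        "lecture-20": [(0.0, "angular"), (0.4, "momentum"), (12.0, "torque")],
        "lecture-11": [(3.0, "energy"), (7.5, "momentum")],
    })
    print(search(idx, "angular momentum"))  # {'lecture-20': [0.0, 0.4]}
```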
Lecture Transcription Service
• Integrate with media production workflows
  – At MIT, University of Queensland
• Stand-alone service
  – Test with external content (video) producers
A Lecture Transcription Service? Caveats
• Lecture-style content (technology optimized)
• Up to 85% accuracy
  – (good for search, not sure about accessibility)
• English-language audio
  – (need much more research for other languages)
• Processing hosted at MIT (current thinking)
  – Submit jobs via MIT-run service
  – Contribute audio, models, transcript for further research
Test it for yourself!
http://spokenmedia.mit.edu/
http://sm.mit.edu/upload
Toward Rich Media Notebooks: Improving the User Experience
• Innovative player interfaces (soon)
  – Bookmarking and annotation
  – Clip creation and authoring
• Transcript editing (soon)
• Searching across collections of videos
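Bookmarking, annotation, and clip creation can reuse the same timed transcript: a clip is a video reference plus a time range, and its text is the words spoken inside that range. The hypothetical `Clip` record below illustrates the idea; it is an assumption for illustration, not the planned SpokenMedia interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Clip:
    """A user-defined clip of a lecture video plus the user's notes (hypothetical structure)."""
    video_id: str
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive
    notes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("clip end must come after clip start")

    def transcript_text(self, words: List[Tuple[float, str]]) -> str:
        """Words whose start time falls in [start, end), joined in time order."""
        return " ".join(w for t, w in sorted(words) if self.start <= t < self.end)


if __name__ == "__main__":
    timed = [(0.0, "angular"), (0.4, "momentum"), (1.1, "and"), (1.3, "torque"), (5.0, "next")]
    clip = Clip("lecture-20", start=0.4, end=1.5)
    clip.notes.append("Definition of torque; review before the problem set.")
    print(clip.transcript_text(timed))  # momentum and torque
```

Storing only the time range, rather than a copy of the text, means a clip picks up any later corrections made to the transcript it points at.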
Player with Annotation Mockup
Editing Interfaces
Soon (we’re designing the editing interfaces right now)
Thanks!
spokenmedia.mit.edu