Transcript
Page 1: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Brandon Muramatsu, A. McKinney

Peter Wilkins

MIT, Office of Educational Innovation and Technology

Citation: Muramatsu, B., McKinney, A., Wilkins, P. (2010). SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks. Presented at NERCOMP 2010: Providence, Rhode Island, March 9, 2010.

Unless otherwise specified, this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

Page 2

BREAKING NEWS… YouTube announces captions on all videos… News at 11… We now return you to your regularly scheduled presentation…

YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website:

http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html

SpokenMedia: What to do if your videos aren’t in YouTube

Page 3

Unless otherwise specified, this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License (creativecommons.org/licenses/by-nc-sa/3.0/us/)

Why are we doing this?

• More & more videos on the Web
  – Universities recording course lectures
  – Students relying upon Web video for courses


MIT OCW 8.01: Professor Lewin puts his life on the line in Lecture 11 by demonstrating his faith in the Conservation of Mechanical Energy.

Page 4

What video? Where?


iTunes U

Page 5

What are the challenges?

• Search
  – Volume
  – Segmented by Web, Video
  – Text title and Description


Google search for “angular momentum”, performed April 2009

Page 6

Bing search for “angular momentum”, performed August 2009

What about Bing?


Page 7

What are the challenges?

• Description
  – Course and Lecture Title
  – Summary
  – Metadata?


YouTube, MIT OCW Physics 8.01 - Lecture 20. Retrieved August 2009

webcast.berkeley, Physics 8A, 002, Spring 2009. Retrieved August 2009

Page 8

What are the challenges? Use

• Interaction & Use
  – Transcripts / captions
    • Do they exist?
    • Cost?
  – Full video vs. segments


Lewin, W. (1999). Lec 20 | 8.01 Physics I: Classical Mechanics, Fall 1999. Retrieved August 1, 2009 from YouTube Website:

http://www.youtube.com/watch?v=ibePFvo22x4

“GOD!!! 51 MINUTES!! i think i'll pass..”

– slourdas, YouTube

Page 9

Search thru the Static

We’re living in a video world…but only have text to use for search…


flickr @futureatlas.com

Page 10

Why do we need these tools?

• Improve search and retrieval
• Improve user experience
• Captioning for accessibility?
• Facilitate translation?
• Other uses?


Page 11

YouTube Announcement

YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website:

http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html

Page 12

Comparing SpokenMedia and YouTube Auto-Caption?

YouTube
• Scale ✔
• Research-basis ✔
• For all videos ✔ (soon)
• No transcript/caption export (?)
• YouTube hosted
• Accuracy based on general patterns (?)
• No transcript editing (?)

SpokenMedia
• Limited
• Research-basis ✔
• Service by request
• Transcript/caption export available ✔
• Hosted anywhere ✔
• Accuracy based on custom models ✔ (soon)
• Transcript editing ✔ (soon)
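The comparison lists transcript/caption export as a SpokenMedia feature without naming a format. As an illustration only, assuming captions are already grouped into timed cues (the `Cue` type and the function names are hypothetical, not SpokenMedia's), this sketch writes them in the widely used SubRip (.srt) format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line between cues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One caption cue (hypothetical type): a time range in seconds plus its text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SubRip text; each block ends with a newline and
    blocks are joined with an extra newline, leaving a blank line between."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    return "\n".join(blocks)
```

Because the cues keep their own start and end times, the same data could just as easily be written out in another caption format; SRT is used here only because its layout is simple.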


Page 13

Developing SpokenMedia…

• What do we have at MIT?
  – Existing videos & audio, new video
  – Lecture notes, slides, etc. (descriptive text)
  – Multiple videos/audio by same lecturer
  – Diverse topics/disciplines

• Research from Spoken Language Systems Group !!!


Page 14

Enabling Research

• Spoken Lecture: research project
• Speech recognition & automated transcription of lectures
• Why lectures?
  – Conversational, spontaneous, starts/stops
  – Different from broadcast news, other types of speech recognition
  – Specialized vocabularies


James Glass
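“Specialized vocabularies” matter because a conventional recognizer with a fixed vocabulary cannot output a word that is not on its list. One standard way to quantify this is the out-of-vocabulary (OOV) rate: the share of spoken words missing from the recognizer's vocabulary. A minimal sketch; the tiny vocabulary below is a made-up stand-in for a general-purpose word list, not anything from the Spoken Lecture project.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def oov_words(text: str, vocabulary: set[str]) -> list[str]:
    """Distinct words in the text that the vocabulary does not contain."""
    return sorted({token for token in tokenize(text) if token not in vocabulary})


def oov_rate(text: str, vocabulary: set[str]) -> float:
    """Fraction of word tokens (counting repeats) missing from the vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Toy "general" vocabulary, invented for illustration.
GENERAL_VOCAB = {"the", "and", "of", "is", "ball", "energy"}
LECTURE_LINE = "The torque and angular momentum of the ball"
```

Here three of the eight words ("torque", "angular", "momentum") fall outside the toy vocabulary, which is the kind of gap that lecture notes and slides (descriptive text) could help fill.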

Page 15

Spoken Lecture Project

• Processor, browser, workflow
• Prototyped with lecture & seminar video
  – MIT OCW (~300 hours, lectures)
  – MIT World (~80 hours, seminar speakers)

Supported with iCampus MIT/Microsoft Alliance funding


James Glass

Page 16

Tech Transfer Timeline: Research -> Service

[Timeline figure, 1990 to 2010: Spoken Language Systems Group research; labeled years 2006 and 2009]

Page 17

Let’s see a demo!


Page 18

Demo


Page 19

How Does it Work?


Lecture Transcription Workflow

Page 20

Recognizer Accuracy? ~85%

• Accuracy
  – Domain Model and Acoustic Model
  – Internal validity measure
    – Single 100% accurate transcript for a full course


Ongoing research by Jim Glass and his team
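The slide reports roughly 85% recognizer accuracy, checked against a single fully correct transcript, but does not define the metric. A common convention in speech recognition, assumed here, is word accuracy = 1 - WER, where the word error rate (WER) is the word-level edit distance between the correct (reference) transcript and the recognizer output (hypothesis), divided by the number of reference words. A minimal sketch:

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: the minimum number of substitutions,
    deletions, and insertions that turn the reference into the hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, where WER = word errors / number of reference words.
    Can go below zero if the hypothesis contains many insertions."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / n_ref
```

For example, if the intended phrase was "angular momentum and torques it's extremely non intuitive" (an assumption; the presenters only guess "torques"), the recognizer's "forks" is one substitution in eight words, for a word accuracy of 0.875.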

Page 21

What works today?


Lecture Transcription Workflow

Page 22

Transcript “Errors”

• “angular momentum and forks it’s extremely non intuitive”
  – “folks”?
  – “torques”?
• “introduce both fork an angular momentum”
  – “torque”!
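These examples suggest why transcript editing is on the roadmap: the same misrecognition (“fork” for “torque”) can recur throughout a lecture. A minimal sketch of a bulk correction over a time-aligned transcript; the `Word` record and the correction table are hypothetical, and the slides do not describe how SpokenMedia's editor will work. Only the text changes, so each corrected word keeps its timing in the video.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    """A recognized word with its timing in the video (hypothetical format)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def apply_corrections(
    words: list[Word], corrections: dict[str, str]
) -> tuple[list[Word], int]:
    """Return a corrected copy of the transcript and the number of words changed.

    Start and end times are kept, so a corrected word still lines up with
    the moment it is spoken. The input list is not modified.
    """
    fixed = []
    changed = 0
    for word in words:
        new_text = corrections.get(word.text, word.text)
        if new_text != word.text:
            changed += 1
        fixed.append(replace(word, text=new_text))
    return fixed, changed
```

A blanket replacement like this would also change any genuine use of "fork", so an interactive editor would probably ask the user to confirm each change rather than apply the table silently.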


Page 23

That’s what we have today…

• Features
  – Video linked transcripts
  – “Bouncing Ball” follow along
  – Search within a video
  – Multiple transcript language support
• Challenges
  – Accuracy (partial toolset)
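The “bouncing ball” and search-within-a-video features both rely on a transcript whose words carry timestamps. The slides don't describe SpokenMedia's data format, so this is a minimal sketch assuming a hypothetical `Word(start, end, text)` record sorted by start time: the player finds the word being spoken with a binary search over start times, and a search hit is just a time to seek to.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """A recognized word with its timing in the video (hypothetical format)."""
    start: float  # seconds from the start of the video
    end: float
    text: str


def word_at(words: list[Word], t: float) -> Optional[int]:
    """Index of the word being spoken at playback time t, or None during a pause.

    Assumes words are sorted by start time and do not overlap. A real player
    would precompute the list of start times once instead of on every call.
    """
    starts = [w.start for w in words]
    i = bisect.bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None


def find(words: list[Word], query: str) -> list[float]:
    """Start times of every occurrence of a single word, ignoring case."""
    q = query.lower()
    return [w.start for w in words if w.text.lower() == q]
```

Calling `word_at` on each playback tick gives the index to highlight; clicking a result from `find` would seek the video to that start time.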


Page 24

Where are we heading?

• Improved accuracy
• Automate and improve processing
• Search across multiple video transcripts
• Starting a lecture transcription service
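Searching across many video transcripts, rather than within one, is typically done with an inverted index that maps each word to every place it occurs. A minimal sketch, assuming each transcript is available as a list of (start time, word) pairs; the format and the video IDs are hypothetical, and a production search would add phrase queries, stemming, and ranking.

```python
from collections import defaultdict

# A transcript here is a list of (start_time_seconds, word) pairs (hypothetical format).
Transcript = list[tuple[float, str]]
Hit = tuple[str, float]  # (video_id, start_time_seconds)


def build_index(transcripts: dict[str, Transcript]) -> dict[str, list[Hit]]:
    """Inverted index: each lowercased word maps to every place it is spoken."""
    index: dict[str, list[Hit]] = defaultdict(list)
    for video_id, words in transcripts.items():
        for start, text in words:
            index[text.lower()].append((video_id, start))
    return dict(index)


def search_all(index: dict[str, list[Hit]], query: str) -> list[Hit]:
    """All (video_id, start_time) hits for a single-word query, ignoring case."""
    return index.get(query.lower(), [])
```

Each hit carries both the video and the time, so a result list can link straight to the moment in the right lecture.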


Page 25

Lecture Transcription Service

• Integrate with media production workflows
  – At MIT, University of Queensland
• Stand-alone service
  – Test with external content (video) producers


Page 26

A Lecture Transcription Service? Caveats

• Lecture-style content (technology optimized)
• Up to 85% accuracy
  – (good for search, not sure about accessibility)
• English-language audio
  – (need much more research for other languages)
• Processing hosted at MIT (current thinking)
  – Submit jobs via MIT-run service
  – Contribute audio, models, transcript for further research
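The caveat “good for search, not sure about accessibility” is easier to see with a rough count. Assuming a speaking rate of about 150 words per minute (an assumed figure, not one from the slides), 85% word accuracy still leaves on the order of twenty errors every minute:

$$
(1 - 0.85) \times 150\ \tfrac{\text{words}}{\text{min}} = 22.5\ \tfrac{\text{errors}}{\text{min}}
$$

For search, a few correctly recognized keywords are often enough to find the right place in a lecture; a caption track with that many mistakes is much harder to rely on.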


Page 27

Test it for yourself!

http://spokenmedia.mit.edu/

http://sm.mit.edu/upload


Page 28

Toward Rich Media Notebooks: Improving the User Experience

• Innovative player interfaces (soon)
  – Bookmarking and annotation
  – Clip creation and authoring
• Transcript editing (soon)
• Searching across collections of videos
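Bookmarks, annotations, and clips all reduce to a time range within a video, and a time-aligned transcript lets a clip carry its own text. The player interfaces are not described in detail here, so this is a minimal sketch with hypothetical `Word` and `Clip` records; words that straddle a clip boundary are left out, which is one design choice among several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word with its timing in the video (hypothetical format)."""
    start: float  # seconds
    end: float
    text: str


@dataclass(frozen=True)
class Clip:
    """A viewer-defined segment of a video, with an optional annotation."""
    start: float  # seconds
    end: float
    note: str = ""


def clip_text(words: list[Word], clip: Clip) -> str:
    """Transcript text for the words that fall entirely inside the clip."""
    if clip.end <= clip.start:
        raise ValueError("a clip must end after it starts")
    inside = [w.text for w in words if w.start >= clip.start and w.end <= clip.end]
    return " ".join(inside)
```

Including straddling words instead would make clips read more naturally at their edges, at the cost of text that runs slightly past the clip's audio.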


Page 29

Player with Annotation Mockup


Page 30

Editing Interfaces


Soon (we’re designing the editing interfaces right now)

Page 31

Thanks!

spokenmedia.mit.edu
