1
The BaBL project: Real-Time Closed-Captioning for WebRTC
Luis Villaseñor Muñoz
30th April 2014
2
BaBL, version 1.0: Project Goal
To develop a proof-of-concept WebRTC conference
application that uses the WebRTC data channel
to transmit real-time captions.
4
BaBL, version 2.0: Project Goal
To develop a WebRTC multiconference application with
extra features based on speech recognition, such as
real-time closed-captioning, instant translation, and
transcription storage.
5
BaBL, version 2.0: Milestones
• Multiconference WebRTC application
• Real-Time Closed-Captioning
• Instant translation
• Transcription storage
6
Multiconference WebRTC application
• WebRTC, what is it?
WebRTC is a free, open project that enables web browsers with
Real-Time Communications capabilities.
• Its goal:
To enable rich, high quality, RTC applications to be developed in the
browser via simple Javascript APIs and HTML5.
[1] As stated in WebRTC.org.
7
WebRTC APIs
• MediaStream:
For acquiring audio and video.
• RTCPeerConnection:
For transmitting audio and video.
• RTCDataChannel:
For transmitting data.
8
MediaStream
• navigator.getUserMedia(constraints,
successCallback, errorCallback);
[2] Figure by Justin Uberti and Sam Dutton.
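A minimal sketch of the call above, assuming a page with a video element. The function name and the `nav`, `videoElement`, and `onError` parameters are illustrative; in a browser, `nav` would be `navigator`. Newer browsers expose a Promise-based `navigator.mediaDevices.getUserMedia` instead of the callback form shown on the slide.

```javascript
// Sketch: request camera and microphone, then show the local stream.
// `nav` stands in for the browser's `navigator`;
// `videoElement` stands in for a <video> element.
function startLocalMedia(nav, videoElement, onError) {
  const constraints = { audio: true, video: true };
  nav.getUserMedia(
    constraints,
    function successCallback(stream) {
      // Attach the MediaStream so the user sees their own camera.
      videoElement.srcObject = stream;
    },
    function errorCallback(err) {
      // Typically the user denied permission or no device was found.
      onError(err);
    }
  );
}
```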
9
RTCPeerConnection
• Signaling:
Session description, ICE, STUN, TURN…
• Media engines:
Codecs, echo cancelation, noise reduction, jitter buffering…
• Security:
HTTPS, SRTP, DTLS…
11
Signaling server
• Node.js:
Web server and signaling server.
Fully implemented in JavaScript.
• Socket.io:
Node.js module that enables websockets between clients and server.
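The relay role of the signaling server can be sketched without any network code. This is not the BaBL server: `createSignalingHub`, the message shapes, and the `send` callbacks are assumptions for illustration. With Socket.io, each `send` would presumably wrap that client's socket.

```javascript
// Sketch of a room-based signaling relay, independent of the transport.
// Each client registers a `send` function used to deliver messages to it.
function createSignalingHub() {
  const rooms = new Map(); // room name -> Map(clientId -> send function)

  function join(room, clientId, send) {
    if (!rooms.has(room)) rooms.set(room, new Map());
    const members = rooms.get(room);
    // Tell the existing members that a new user joined.
    for (const notify of members.values()) {
      notify({ type: 'new-user', from: clientId });
    }
    members.set(clientId, send);
  }

  // Forward an offer, answer, or ICE candidate to one other room member.
  function relay(room, fromId, toId, message) {
    const members = rooms.get(room);
    const send = members && members.get(toId);
    if (!send) return false;
    send(Object.assign({}, message, { from: fromId }));
    return true;
  }

  function leave(room, clientId) {
    const members = rooms.get(room);
    if (!members) return;
    members.delete(clientId);
    if (members.size === 0) rooms.delete(room);
  }

  return { join, relay, leave };
}
```

The server never inspects the session descriptions; it only routes them, which is why any transport that can deliver small messages would do.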
12
Calling: The establishment
Participants: User A, User B, Server
1. Each user downloads the webpage (HTTP/HTTPS) and calls getUserMedia.
2. Create room and join room (websocket); the server announces the new user joined (websocket).
3. Each user creates a PeerConnection.
4. createOffer: the offer goes through the server (websocket).
5. createAnswer: the answer goes through the server (websocket).
6. ICE candidates are exchanged (websocket).
7. Media streams flow between the users (SRTP).
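The offer/answer part of this exchange can be sketched as follows. The `pc` argument is expected to behave like an `RTCPeerConnection` (Promise-based methods); `sendSignal` and the message shapes are hypothetical and stand for whatever the signaling server delivers.

```javascript
// Sketch of the offer/answer exchange over a signaling channel.
// `sendSignal(message)` is a hypothetical function that hands a message
// to the signaling server for delivery to the other peer.
async function startCall(pc, sendSignal) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendSignal({ type: 'offer', sdp: offer.sdp });
}

// Handle a message received from the other peer via the signaling server.
async function handleSignal(pc, message, sendSignal) {
  if (message.type === 'offer') {
    await pc.setRemoteDescription({ type: 'offer', sdp: message.sdp });
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    sendSignal({ type: 'answer', sdp: answer.sdp });
  } else if (message.type === 'answer') {
    await pc.setRemoteDescription({ type: 'answer', sdp: message.sdp });
  } else if (message.type === 'candidate') {
    // ICE candidates arrive separately and are added as they come in.
    await pc.addIceCandidate(message.candidate);
  }
}
```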
14
Calling: ICE/STUN/TURN
• Interactive Connectivity Establishment (ICE):
RFC 5245 [3]. Gathers candidate addresses for reaching each peer.
• Session Traversal Utilities for NAT (STUN):
A request/response protocol that tells a client its public address.
• Traversal Using Relays around NAT (TURN):
A STUN extension that relays the traffic. Useful but resource-intensive.
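In the browser, STUN and TURN servers are handed to the peer connection as a configuration object. A small sketch follows; the helper name, the `example.org` URLs, and the credentials are placeholders, not real services.

```javascript
// Sketch: build the object passed to `new RTCPeerConnection(config)`.
function buildIceConfig(stunUrl, turn) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    // TURN servers require credentials. Add one only when configured,
    // since relaying media is the resource-intensive fallback.
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Placeholder values for illustration only.
const config = buildIceConfig('stun:stun.example.org:3478', {
  url: 'turn:turn.example.org:3478',
  username: 'demo-user',
  credential: 'demo-pass',
});
```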
16
Real-time closed-captioning
• Web Speech API:
SpeechRecognition interface: converts speech into text.
• WebRTC data channel:
For sending the text to the other peers.
17
Web Speech API
• Another HTML5 API:
Specified by the W3C.
• Only implemented in Chrome:
The voice is sent to Google’s speech recognition web service.
A JSON object with a list of possible matches is returned.
Google uses it for voice search: https://www.google.com/
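A sketch of how recognition results might be turned into captions. The `recognition` argument is expected to behave like a `SpeechRecognition` instance (in Chrome, `new webkitSpeechRecognition()`); `attachCaptioning` and the `onCaption` callback are illustrative names, not part of the API.

```javascript
// Sketch: turn SpeechRecognition results into caption objects.
function attachCaptioning(recognition, lang, onCaption) {
  recognition.lang = lang;
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // deliver partial hypotheses as well

  recognition.onresult = function (event) {
    // Only results from `resultIndex` onward changed since the last event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      // result[0] is the most likely alternative for this result.
      onCaption({ text: result[0].transcript, isFinal: result.isFinal });
    }
  };
  recognition.start();
}
```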
18
RTCDataChannel
• Bidirectional, peer-to-peer:
Really low latency.
• Secure:
Datagram Transport Layer Security (DTLS).
• Unreliable or reliable:
A trade-off between latency and accuracy.
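The reliable/unreliable choice is made when the channel is created. A sketch, assuming `pc` behaves like an `RTCPeerConnection`; the helper names and the `'captions'` label are arbitrary.

```javascript
// Sketch: open a data channel for captions and send them as JSON.
function openCaptionChannel(pc, reliable) {
  const options = reliable
    ? { ordered: true }                      // reliable, in-order delivery
    : { ordered: false, maxRetransmits: 0 }; // unreliable: never retransmit
  return pc.createDataChannel('captions', options);
}

function sendCaption(channel, caption) {
  // Messages can only be sent while the channel is open.
  if (channel.readyState !== 'open') return false;
  channel.send(JSON.stringify(caption));
  return true;
}
```

For captions, the unreliable mode favors latency: a lost interim result is soon replaced by a newer one.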
19
Challenges
• Subtitles should be switched on/off by the remote user
We send the remote user’s requests through the signaling server.
• Continuous recognition
We keep a list of users requesting subtitles.
• Microphone permission
We serve the page over HTTPS, so Chrome remembers the microphone permission.
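One way the list of requesting users could drive recognition is sketched below. This is an assumption about the design, not the BaBL code: recognition runs only while somebody wants subtitles and is restarted if the engine ends on its own. `recognition` is expected to behave like a `SpeechRecognition` instance.

```javascript
// Sketch: run recognition only while at least one user wants subtitles.
function createSubtitleRequests(recognition) {
  const requesters = new Set();
  let running = false;

  // The engine can end by itself (for example after a silence), so restart
  // it while somebody is still subscribed.
  recognition.onend = function () {
    running = false;
    if (requesters.size > 0) {
      recognition.start();
      running = true;
    }
  };

  return {
    request(userId) {
      requesters.add(userId);
      if (!running) {
        recognition.start();
        running = true;
      }
    },
    cancel(userId) {
      requesters.delete(userId);
      // stop() leads to an `end` event, which clears `running`.
      if (requesters.size === 0 && running) recognition.stop();
    },
    list() {
      return Array.from(requesters);
    },
  };
}
```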
20
Architecture
1. Subtitles request
2. Subtitles request
3. Voice
4. Subtitles
5. Subtitles
Participants: User A, User B, Google server, Signaling server
22
Transcription storage
• Keeping a record of our conversations:
Text is much lighter than audio or video.
And easier to search!
• IndexedDB:
One more HTML5 API.
Local storage on the client side.
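Storing one line of a transcript in IndexedDB might look like the sketch below. It assumes `db` behaves like an already opened `IDBDatabase` with an object store named `'transcripts'`; the store name, the record fields, and the callbacks are hypothetical.

```javascript
// Sketch: append one caption to a local transcript store.
function saveTranscriptLine(db, line, onDone, onError) {
  const tx = db.transaction('transcripts', 'readwrite');
  // The write is durable once the transaction completes.
  tx.oncomplete = function () { onDone(); };
  tx.onerror = function () { onError(tx.error); };
  tx.objectStore('transcripts').add({
    speaker: line.speaker,
    text: line.text,
    timestamp: line.timestamp,
  });
}
```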
24
Instant translation
• Online translation services:
They are not free.
• Microsoft Translator API:
2 million characters per month for free.
25
Challenges
• Should go through the server
My private developer key can’t be on the client side.
• When to request the translation?
Only when the isFinal flag is set. Less real-time, but much cheaper!
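The isFinal rule can be sketched as a small gate in front of the translation request. `translate(text, from, to)` is a hypothetical function that asks the server, which holds the private key, to call the translation service; the character counter is only there to show why skipping interim results is cheaper.

```javascript
// Sketch: forward only final recognition results for translation.
function createTranslationGate(translate, from, to) {
  let requestedChars = 0;
  return {
    handleCaption(caption) {
      // Interim hypotheses change several times per sentence; translating
      // each one would multiply the number of billed characters.
      if (!caption.isFinal) return false;
      requestedChars += caption.text.length;
      translate(caption.text, from, to);
      return true;
    },
    requestedChars() {
      return requestedChars;
    },
  };
}
```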
26
Architecture
1. Subtitles request
2. Subtitles request
3. Voice
4. Subtitles
5. Subtitles
6. Subtitles
7. Translated subtitles
8. Translated subtitles
Participants: User A, User B, Google server, Signaling server, Microsoft Translator server
29
Spoken translated subtitles
• Speech Synthesis API:
The other interface included in the Web Speech API.
Chrome has some built-in speech engines.
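Speaking a subtitle takes only a few lines. In this sketch the synthesizer and the utterance constructor are passed in; in a browser they would be `window.speechSynthesis` and `SpeechSynthesisUtterance`. The function name is illustrative.

```javascript
// Sketch: speak a translated subtitle aloud.
function speakSubtitle(synth, Utterance, text, lang) {
  const utterance = new Utterance(text);
  utterance.lang = lang; // BCP 47 tag such as 'es-ES'
  synth.speak(utterance); // utterances are queued and spoken in order
  return utterance;
}
```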
31
Conclusion
• Not perfect:
Programmed by just one person.
Using free resources.
These technologies are still under development.
• A little more time, a few more resources:
And Sci-Fi won’t be Sci-Fi anymore!
32
References
• [1] Google Chrome team. WebRTC.org. http://www.webrtc.org/ [Online;
accessed 30-April-2014]
• [2] Justin Uberti and Sam Dutton. WebRTC. http://io13webrtc.appspot.com/
[Online; accessed 30-April-2014]
• [3] J. Rosenberg. Interactive Connectivity Establishment (ICE).
https://tools.ietf.org/html/rfc5245 [Online; accessed 30-April-2014]
34
Acknowledgements
• Don Monte and Nishant Agrawal
• Elias Yousef
• Javier Monte Condeoliva and Miguel Camacho Ruiz
• Tania Arenas de la Rubia
• Carol Davids