2014/2/221 MASTAR Project, Universal Communication Research Institute E-mail: [email protected] Chiori Hori Ph.D. Spoken Language Communication Laboratory


National Institute of Information and Communications Technology (NICT)

Geneva, 25 November 2011

Telecommunications Relay Services in Speech-to-Speech translation system

in accordance with Recommendations F.745 and H.625

ITU-T Workshop on "Telecommunications relay services for persons with disabilities"



Speech-to-Speech Translation

Communication across more languages can be realized with S2ST technology by connecting distributed S2ST servers (i.e., ASR, MT, and TTS) all over the world.

Speech-to-Speech Translation (S2ST) technologies are an effective means to break through language barriers between people who do not speak the same language.

[Figure: S2ST pipeline, Japanese to English]

Input (Japanese speech): 「私は学校に行く」 ("I go to school")

Automatic Speech Recognition (ASR): converts from phoneme to word (w a t a sh i w a g a xtu k o o n i ... -> 私は学校に行く). Trained on Japanese speech and text corpora.

Machine Translation (MT): converts from Japanese text to English text (私は学校に行く -> I go to school). Trained on Japanese-to-English parallel corpora.

Speech Synthesis (TTS): converts from text to waveform. Trained on English speech corpora.

Output (English speech): "I go to school"

Each component relies on a large amount of training data for machine learning.
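The three-stage cascade in the figure can be sketched in a few lines of Python. This is only an illustration of how the stages chain together: the lookup tables stand in for trained models, the byte strings stand in for audio, and all function and variable names are assumptions made for this sketch rather than part of any real S2ST system.

```python
"""Toy S2ST cascade: ASR -> MT -> TTS, with lookup tables in place of models."""
from typing import Dict

# Hypothetical stand-ins for trained models. Real systems learn these mappings
# from speech corpora, text corpora and parallel corpora.
ASR_TABLE: Dict[bytes, str] = {b"<japanese-audio>": "私は学校に行く"}
MT_TABLE: Dict[str, str] = {"私は学校に行く": "I go to school"}


def recognize(speech: bytes) -> str:
    """ASR: speech signal in the source language -> source-language text."""
    return ASR_TABLE[speech]


def translate(text: str) -> str:
    """MT: source-language text -> target-language text."""
    return MT_TABLE[text]


def synthesize(text: str) -> bytes:
    """TTS: target-language text -> waveform (here just tagged bytes)."""
    return b"<english-audio:" + text.encode("utf-8") + b">"


def speech_to_speech(speech: bytes) -> bytes:
    """Run the three stages in sequence."""
    return synthesize(translate(recognize(speech)))


if __name__ == "__main__":
    print(speech_to_speech(b"<japanese-audio>"))
```

Because each stage only depends on the output type of the previous one, the stages can run on separate machines, which is what the network-based architecture relies on.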

Network-based S2ST systems

Communication over the network between users who speak different languages

[Figure: network-based S2ST architecture]

From Speaker of Language A to Speaker of Language B:

MC client: digitalization of speech signals

ASR server (MC server): conversion from speech signal to text in Language A

MT server (MC server): conversion from text in A to text in B

TTS server (MC server): conversion from text in B to speech signal

From Speaker of Language B to Speaker of Language A:

MC client: digitalization of speech signals

ASR server (MC server): conversion from speech signal to text in Language B

MT server (MC server): conversion from text in B to text in A

TTS server (MC server): conversion from text in A to speech signal
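The two directions of the architecture use the same sequence of server types with the languages swapped. A minimal sketch, assuming a hypothetical `Hop` record and `route` helper (neither comes from F.745 or H.625), makes that symmetry explicit and checks that each server's output is what the next server expects.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    server: str    # "ASR", "MT" or "TTS"
    consumes: str  # what the server receives
    produces: str  # what the server returns


def route(source: str, target: str) -> List[Hop]:
    """List the server hops for speech going from `source` to `target`."""
    return [
        Hop("ASR", f"speech in {source}", f"text in {source}"),
        Hop("MT", f"text in {source}", f"text in {target}"),
        Hop("TTS", f"text in {target}", f"speech in {target}"),
    ]


def is_connected(hops: List[Hop]) -> bool:
    """Check that each hop's output is exactly what the next hop consumes."""
    return all(a.produces == b.consumes for a, b in zip(hops, hops[1:]))


if __name__ == "__main__":
    for src, dst in (("A", "B"), ("B", "A")):
        hops = route(src, dst)
        chain = " -> ".join(h.server for h in hops)
        print(f"{chain}: {hops[0].consumes} => {hops[-1].produces}")
```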


Network-based Speech Translation System in accordance with ITU-T Recommendations F.745 and H.625

Network-based S2ST application via multilateral translation on smartphone/tablet/PC/TV

On-site communication

Japanese speaker's device: 飲み水は13:00から市役所前で配給します。

English speaker's device: Water to drink will be provided in front of the city hall from 13:00.

Chinese speaker's device: 从下午一点开始，在市政府门前供应饮用水。

Remote communication

Papa, maman, comment vas-tu? (French: "Dad, Mom, how are you?")

お父さん，お母さんお元気ですか？ (Japanese: "Dad, Mom, how are you?")
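Multilateral translation means that one utterance is delivered to every participant in that participant's own language. A minimal sketch of this fan-out follows, assuming the speech has already been recognized as text; the translation table, the language codes, and the listener names are hypothetical stand-ins for calls to MT servers.

```python
from typing import Dict, Tuple

NOTICE_JA = "飲み水は13:00から市役所前で配給します。"

# (source language, target language, source text) -> translated text.
# A real system would query an MT server instead of a fixed table.
TRANSLATIONS: Dict[Tuple[str, str, str], str] = {
    ("ja", "en", NOTICE_JA): "Water to drink will be provided in front of the city hall from 13:00.",
    ("ja", "zh", NOTICE_JA): "从下午一点开始，在市政府门前供应饮用水。",
}


def translate(text: str, source: str, target: str) -> str:
    """Stand-in for an MT server call; same-language text passes through."""
    if source == target:
        return text
    return TRANSLATIONS[(source, target, text)]


def broadcast(text: str, source: str, listeners: Dict[str, str]) -> Dict[str, str]:
    """Translate one recognized utterance into each listener's language."""
    return {name: translate(text, source, lang) for name, lang in listeners.items()}


if __name__ == "__main__":
    listeners = {"alice": "en", "chen": "zh", "taro": "ja"}
    for name, message in broadcast(NOTICE_JA, "ja", listeners).items():
        print(f"{name}: {message}")
```

In this sketch recognition happens once on the speaker's side, and only translation (and, in a full system, synthesis) is repeated per target language.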


Modality Conversion Markup Language (MCML)

MCML is defined as an XML schema in the ITU-T namespace (http://www.itu.int/xml-namespace/itu-t/H.645/MCML.xsd). It carries the information needed for communication between multiple persons who use different modalities, e.g., speech, text, image, and video data input by users or output by MCML servers such as ASR, MT, TTS, and sign recognition systems.


http://www.itu.int/rec/T-REC-F.745-201010-I/en
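To give a feel for what an XML message between a client and a modality conversion server might carry, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Message`, `User`, `Request`, `Input`, `modality`, `language`) are invented for illustration and do not follow the actual MCML schema; the real structure is specified in Recommendation H.625.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def build_request(user_id: str, service: str, language: str, text: str) -> str:
    """Serialize one text-input request as XML (illustrative names only)."""
    root = ET.Element("Message")
    ET.SubElement(root, "User", {"id": user_id})
    request = ET.SubElement(root, "Request", {"service": service})
    data = ET.SubElement(request, "Input", {"modality": "text", "language": language})
    data.text = text
    return ET.tostring(root, encoding="unicode")


def read_input(xml_text: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Extract (modality, language, payload) from a serialized request."""
    node = ET.fromstring(xml_text).find("./Request/Input")
    return node.get("modality"), node.get("language"), node.text


if __name__ == "__main__":
    message = build_request("user-1", "MT", "ja", "私は学校に行く")
    print(message)
    print(read_input(message))
```

Tagging each payload with its modality and language is what would let the same envelope carry speech, text, image, or video data between heterogeneous servers.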



U-STAR Consortium

The Universal Speech Translation Advanced Research (U-STAR) Consortium has been established as an international research collaboration entity with the goal of developing a worldwide network-based speech-to-speech translation system. The consortium's objective is to create a basic infrastructure for spoken language communication to overcome the language barriers that exist around the world. Currently, there are participating members from 14 countries (15 institutes).

Plan for field experiment

Period: One year from April 2012, including the 2012 London Olympics

Application: Multiparty conversation via a network-based S2ST system on iPhones and Android phones (free)

MCML servers: ASR, MT, and TTS servers will be provided by U-STAR members

Potential languages: Chinese, Dzongkha, English, Filipino, Hindi, Indonesian, Japanese, Korean, Mongolian, Malay, Nepali, Sinhala, Thai, Urdu, Vietnamese, and some European languages


The U-STAR members

Institute   Country       Language
DITT        Bhutan        Dzongkha
UPD         Philippines   Filipino
CDAC        India         Hindi
BPPT        Indonesia     Indonesian
NICT        Japan         Japanese
ETRI        Korea         Korean
MUST        Mongolia      Mongolian
NUM         Mongolia      Mongolian
I2R         Singapore     Malay
LTK         Nepal         Nepali
UCSC        Sri Lanka     Sinhala
NECTEC      Thailand      Thai
KICS-UET    Pakistan      Urdu
IOIT        Vietnam       Vietnamese
CASIA       China         Chinese


Potential European languages

French, German, Italian, Portuguese, Spanish, Turkish, British English

