Deep Learning for Machine Translation
Satoshi Enoue, Jungi Kim, Jean Senellart, SYSTRAN
SYSTRAN Through Machine Translation History
Rule-Based Machine Translation
Example-Based Machine Translation
Phrase-Based Machine Translation
Syntax-Based Machine Translation
Neural Machine Translation
Hybrid Machine Translation
1968  SYSTRAN (SYStem TRANslation) founded by Dr. Toma in La Jolla, California (USA)
1969  First MT software provided to the US Air Force (Russian to English)
1975  Used by NASA for the Apollo-Soyuz American-Soviet project
1975  Translation systems for all European languages at the European Commission
1986  SYSTRAN acquired by France's Gachot SA, becoming a French company with a U.S. subsidiary
1990s Technology ported from mainframes to desktop PCs and client-server environments for personal and corporate use
1995  Pioneered development of the first Windows-based MT software
1996  SYSTRAN embedded in SEIKO's pocket translators
1997  First free web-based translation service: AltaVista Babelfish. SYSTRAN made the Internet community aware of the usefulness and capabilities of machine translation
2002  SYSTRAN used on most major Internet portals: Yahoo!, Google, AltaVista, Lycos
2005  Launched embedded translation software for mobile devices
2009  Developed the first hybrid translation software and solution: SES 7 Translation Server
2011  Launch of SES 7 Training Server, the first solution for self-learning of MT engines
2014  Following the acquisition by CSLI, SYSTRAN SA forms part of the SYSTRAN International Group
2015  SES8 Translation and Training Server – large models
2016  More than 140 language pairs. Launch of SYSTRAN.io, the Natural Language Processing API platform
The new game changer: Deep Neural Networks
• Technologies: image analysis, voice recognition, text generation, word embeddings, multitask NLP, neural machine translation, games, …
• Super-human abilities

A sequence of fascinating results and technologies over the last three years – all based on Deep Neural Networks (DNN) – covering a large variety of domains…
SYSTRAN - Copyright 2016
The new game changer: voice recognition
• Google (2015): RNN voice-search recognition outperforms the 2012 DNN models
• Baidu Deep Speech: a 16.5% improvement over the baseline, and higher performance than humans in noisy environments
The new game changer – examples: text generation (Char-RNN, Andrej Karpathy, 2015)

Character-level RNNs trained on different corpora generate text in the style of the training data – Shakespeare, Victor Hugo, or Korean MSDN documentation:

« Les yeux prenaient des redoutables, des troncs de feu. Toutes les prétexticheurs par ces quatre repentilleuses avec du sergent de Digne, débragiffés nymoeurs sur les derniers instants à hardis, boucher, sans dénongée en plus ennérence, ils se refecturent encore. Ils auraient déjà mangé ses très interses. » (invented French in the style of Victor Hugo)

공급자는 AspNetXSprchyLibrary 의 인스턴스를 만들어 다른 경고를 오버터 컴퓨터에 저장할 수 있습니다. (invented Korean in the style of MSDN documentation)
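The sampling loop behind such demos is simple: predict a distribution over next characters, pick one, and feed it back in. As a minimal sketch, the counting model below stands in for the trained network – a real char-RNN learns these transition statistics with a recurrent net rather than a table – and the corpus string is illustrative:

```python
import random
from collections import defaultdict

def train_char_model(text):
    """Count character-bigram continuations as a stand-in for a char-RNN."""
    counts = defaultdict(list)
    for prev, nxt in zip(text, text[1:]):
        counts[prev].append(nxt)
    return counts

def sample(model, seed_char, length, rng):
    """Generate text one character at a time, feeding each output back in."""
    out = [seed_char]
    for _ in range(length):
        choices = model.get(out[-1])
        if not choices:  # dead end: no continuation seen in training
            break
        out.append(rng.choice(choices))
    return "".join(out)

corpus = "les yeux prenaient des troncs de feu et les derniers instants "
model = train_char_model(corpus)
print(sample(model, "l", 40, random.Random(0)))
```

The output mimics the character statistics of the corpus without any notion of words, which is exactly why char-RNN samples look like a language without being one.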
The new game changer – examples: word embeddings (word2vec, Google, 2013)

word2vec represents each word as a dense vector such that semantically similar words are close in the vector space and regularities appear as vector offsets – the classic example being king − man + woman ≈ queen.
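That analogy can be checked with nothing more than vector arithmetic and cosine similarity. The tiny hand-picked 3-dimensional vectors below are purely illustrative; real word2vec embeddings have hundreds of dimensions and are learned from raw text:

```python
import math

# Toy 3-dimensional embeddings, hand-picked for illustration.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# king - man + woman should land closest to queen.
target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])
best = max((w for w in vecs if w != "king"),
           key=lambda w: cosine(vecs[w], target))
print(best)  # -> queen
```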
The new game changer – examples: multitask NLP
• A unified neural network architecture for several NLP tasks: POS tagging, chunking, NER, SRL
• Focus on avoiding task- and language-specific engineering
• Joint decisions on the different tasks
• Outperforms almost all of the state-of-the-art results for each individual task
(Natural Language Processing (Almost) from Scratch, Collobert et al., 2011)
The new game changer – examples: neural machine translation by sentence encoding-decoding
• An encoder RNN reads the source sentence into a fixed-size vector; a decoder RNN generates the target sentence from that vector
(Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, K. Cho et al., 2014)
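A minimal sketch of that encoder-decoder shape, with fixed scalar weights standing in for the learned matrices, a deterministic toy embedding, and a greedy decoder – all names, sizes, and weights are illustrative:

```python
import math

HIDDEN = 4  # size of the fixed sentence vector (~1000 in real systems)

def embed(token):
    # Deterministic toy embedding; real systems learn an embedding matrix.
    h = sum(ord(c) * (i + 1) for i, c in enumerate(token))
    return [((h >> i) % 7) / 7.0 for i in range(HIDDEN)]

def encode(tokens):
    """Encoder RNN: fold the source tokens into one fixed-size vector,
    preserving the recurrence h_t = tanh(W.h + U.x) with scalar weights."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        x = embed(tok)
        h = [math.tanh(0.5 * hi + 0.5 * xi) for hi, xi in zip(h, x)]
    return h

def decode(h, vocab, max_len):
    """Decoder RNN: greedily emit the target word whose embedding best
    matches the current state, then feed it back into the recurrence."""
    out = []
    for _ in range(max_len):
        word = max(vocab, key=lambda w: sum(a * b for a, b in zip(h, embed(w))))
        out.append(word)
        h = [math.tanh(0.5 * hi + 0.5 * xi) for hi, xi in zip(h, embed(word))]
    return out

ctx = encode(["la", "maison", "bleue"])
print(len(ctx))  # fixed size, whatever the source length
print(decode(ctx, ["the", "blue", "house"], 3))
```

The fixed-size bottleneck is visible here: a 50-word sentence and a 3-word one both compress to the same HIDDEN numbers, which is exactly the length problem mentioned later in this deck.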
The new game changer – examples: games
• DQN: Human-level Control through Deep Reinforcement Learning, Google DeepMind, 2015
• AlphaGo, Google DeepMind, 2016
The new game changer - examples
More and more evidence of “super-human abilities”
Could we also reach Super-human Machine Translation?
The new game changer – ingredients
• MLP (multilayer perceptron) – actually an "old concept"
• CNN (convolutional neural network)
• Word embeddings – representing words as vectors
• RNN (GRU, LSTM) – an MLP with memory
• Attention-based models – the ability to decide where to find information
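The last ingredient is the easiest to demystify in code. The sketch below implements plain dot-product attention over a handful of encoder states; the vectors are made up, and Bahdanau-style attention scores with a small learned network rather than a dot product:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Dot-product attention: score each source state against the decoder
    query, normalize with softmax, and return the weights plus the
    weighted context vector - the 'where to find information' decision."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(len(query))]
    return weights, context

# One encoder state per source word; the query is the decoder's current state.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
weights, context = attend(query, states)
print([round(w, 2) for w in weights])  # most weight on the first source word
```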
All of these components are the ingredients of Neural Machine Translation.
About Neural Machine Translation (NMT)
• The goal is to perform end-to-end translation, as in speech recognition: remove all hand-engineered features and have a single system
• For machine translation, the first NMT systems are encoder-decoders – but they are not that magic:
  • No systematic improvement over SMT baselines
  • Reliance on ensembles of systems
  • Issues with sentence length and vocabulary size
• The solutions bring back some interest in "linguistic" characteristics:
  • Attention-based models (alignment information)
  • Deep fusion with a language model (better modelling of the target language)
  • Combination with word-level information (~ morphology)
SYSTRAN approach to NMT
• Current real use-case requirements:
  • Adaptation to (small) domains
  • Help for post-editing
  • Preserved speed
  • Consistent results across multiple target languages
  • The possibility to let users control translation through annotations and terminology
  • …
• Toward a linguistically motivated NN architecture:
  • SYSTRAN MT is composed of linguistic modules – let us start with them
  • A lot of knowledge to leverage
SYSTRAN Deep Learning Story – Part I: Language Identification

SYSTRAN LDK
• Statistical classifier over character 3-grams
• Heavily feature-engineered over the years, e.g. a diacritics model for Latin-script languages
• Includes a lexicon of frequent terms
• Quite good accuracy on news-type data – needs ~20 characters

Basic RNN
• An "out-of-the-box" character-level RNN, with no language-specific engineering
• 80K words of training data per language

Google CLD
• Naïve Bayes classifier over character 4-grams
• Trained on "big data", carefully scraped from over 100M pages
• Specific tricks for closely related languages (Spanish/Portuguese)
• Geared toward web pages – needs 200+ characters

Learnings: with the same data, the RNN approach easily outperforms the baseline, with no specific engineering needed… big data is not competitive.
Accuracy (%)   News sentences   One-word requests   TED-talk sentences   Tweets
LDK            97.0             55.2                87.4                 78.3
RNN            98.2             61.5                91.4                 77.9
CLD            96.1             15.3                86.0                 78.1
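LDK and CLD above belong to the classic character n-gram family. As a minimal sketch of that family, here is a Laplace-smoothed Naïve Bayes identifier over character 3-grams; the two tiny training corpora are invented for illustration (real systems train on large monolingual corpora):

```python
import math
from collections import Counter

def ngrams(text, n=3):
    # Character 3-grams (LDK uses 3-grams, CLD 4-grams); padding marks edges.
    padded = f"  {text.lower()}  "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class NaiveBayesLangID:
    """Tiny Naive Bayes language identifier over character n-grams."""
    def __init__(self, corpora):
        self.models = {}
        for lang, text in corpora.items():
            counts = Counter(ngrams(text))
            # Store counts, total mass, and vocabulary size for smoothing.
            self.models[lang] = (counts, sum(counts.values()), len(counts) + 1)

    def classify(self, text):
        def score(lang):
            counts, total, vocab = self.models[lang]
            # Laplace-smoothed log-probability of each observed n-gram.
            return sum(math.log((counts[g] + 1) / (total + vocab))
                       for g in ngrams(text))
        return max(self.models, key=score)

langid = NaiveBayesLangID({
    "en": "the house is blue and the street is long",
    "fr": "la maison est bleue et la rue est longue",
})
print(langid.classify("the blue house"))  # -> en
print(langid.classify("la rue bleue"))    # -> fr
```

With only a sentence of training data per language the classifier already separates the two, which illustrates why short inputs (the one-word-request column above) are the hard case: too few n-grams to accumulate evidence.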
SYSTRAN Deep Learning Story – Part II: Part-of-Speech Tagging

Phase 1 (1968-2014) – Handcrafting
• Manual rule and lexicon coding of homography
• Closely related to the morphology description
• 27 languages covered

Phase 2 (2008-2015) – Annotating
• Train a classifier to "relearn" the rules (fnTBL)
• Transfer knowledge through the system's output
• Maintenance through annotation

Phase 3 (2015-) – Generalizing
• Relearn with an RNN
• Joint decisions (so far tokenization and part-of-speech tagging) – working on morphology
• Better generalization from additional knowledge (word embeddings)

Learnings: it is possible to leverage the handcrafting and gain quality, but the learner becomes too smart – it also learns the initial errors.
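The "relearn the rules" step of Phase 2 can be illustrated with one round of transformation-based learning in the spirit of fnTBL: start from a baseline tag per word and pick the single retagging rule that fixes the most errors. The words, tags, and the deliberately simplistic rule template are illustrative; fnTBL uses contextual rule templates:

```python
from collections import Counter

def learn_rule(tagged_sentences, baseline):
    """One TBL round: find the retagging 'word w -> tag t' that would
    correct the most errors of the baseline (most-frequent-tag) tagger."""
    gains = Counter()
    for sent in tagged_sentences:
        for word, gold in sent:
            guess = baseline.get(word, "NOUN")  # unknown words default to NOUN
            if guess != gold:
                gains[(word, gold)] += 1
    (word, tag), _ = gains.most_common(1)[0]
    return word, tag

baseline = {"the": "DET", "can": "NOUN", "run": "VERB"}
data = [
    [("the", "DET"), ("can", "NOUN")],
    [("they", "PRON"), ("can", "AUX"), ("run", "VERB")],
    [("we", "PRON"), ("can", "AUX"), ("run", "VERB")],
]
print(learn_rule(data, baseline))  # -> ("can", "AUX")
```

The loop also shows the failure mode noted above: if the annotations contain systematic errors, the highest-gain "rule" will faithfully reproduce them.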
SYSTRAN Deep Learning Story – Part III: Transliteration
• The transliteration of person names depends on the source language, the target language, but also the name's origin:
  • 카스파로프 = Kasparov
  • 필리프 = Philippe
• A good transliteration system needs detection of the origin and a transliteration mechanism
• Extremely complicated, since it requires phonetics modeling

Rule-Based
• Satisfactory, but origin detection and multiple domains remain difficult
• No generalization – an unseen sequence comes out wrong

PBMT

RNN
• Encoding-decoding approach
• The long-distance "view" guarantees consistency of the transliteration

Learnings: − losing the reliability/traceability of the process; + more global consistency and compactness of the solution
SYSTRAN Deep Learning Story – Part IV: Language Modeling
• RNN language models prove to surpass standard n-gram models:
  • No limitation on the span
  • Seem to also capture the language structure better
  • Better generalization thanks to word embeddings
• Can easily be introduced into a PBMT engine through rescoring
• Still challenge pure sequence-to-sequence NMT approaches

Learnings: − a very long training process, several weeks of training for one language; + a consistent quality gain, with easy introduction into the existing framework
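Introduction "through rescoring" means the PBMT decoder keeps producing its n-best list and the RNN-LM only reranks it. A sketch, with a toy stand-in for the LM and an illustrative interpolation weight (in practice the weight is tuned on held-out data):

```python
def rescore(nbest, lm_score, weight=0.3):
    """Rerank a PBMT n-best list with an external language-model score.

    `nbest` is a list of (translation, pbmt_score) pairs and `lm_score`
    any function returning a log-probability for a sentence; an RNN-LM
    plugs in here without touching the PBMT decoder itself.
    """
    rescored = [(hyp, score + weight * lm_score(hyp)) for hyp, score in nbest]
    return max(rescored, key=lambda pair: pair[1])[0]

# Stand-in LM: favors hypotheses containing a frequent target bigram.
def toy_lm(sentence):
    return 0.0 if "the house" in sentence else -5.0

nbest = [("house the blue", -1.0), ("the house blue", -1.2)]
print(rescore(nbest, toy_lm))  # -> "the house blue"
```

Note how the LM overturns the PBMT ranking: the second hypothesis had a worse decoder score but a much better target-language score.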
Learnings from Deep Learning
• Consistent quality improvement in all the experiments/modules we worked on:
  • Better leverage of the existing training material
  • Better generalization
• Incrementality: by design, it is immediate to feed in more training data – i.e. to adapt dynamically to usage
• Globally simpler than the alternative approaches, and cognitively interesting
• Fit to be combined into a global NN architecture
Linguistically Motivated NN Architecture
(Diagram: the source sentence passes through word-embedding, morphology, and syntactic-analysis modules into sentence encoding; sentence decoding, supported by an RNN-LM, produces the target sentence.)
What about Statistical Post-Editing: Learning to Correct?
• SPE was introduced as a smart alternative to SMT:
  • Corresponds to a real MT use case for localization
  • Very little data can produce adaptation
  • Reduces the human post-editor's work by iteratively learning their edits
• However, the implementation with PBMT is not satisfactory:
  • PBMT does not learn to correct, but to translate
  • Not incremental
• Learning to correct:
  • More control of the process – toward a "translation checker"
  • Change the paradigm: today a human post-editor corrects MT output; tomorrow, an automatic post-editor correcting human output?
(Diagram: MT output and its human post-edits (HPE) feed the SPE component.)
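The training data for "learning to correct" is simply pairs of MT output and its human post-edit. As a deliberately naive sketch, the snippet below counts position-wise word substitutions between equal-length pairs and applies the recurrent ones; a real SPE system trains a full translation model from MT output to post-edited text, and the sentences and the "driver → pilot" terminology fix are invented:

```python
from collections import Counter

def learn_edits(pairs):
    """Learn word-level corrections from (MT output, human post-edit) pairs,
    keeping only substitutions seen more than once to avoid one-off noise."""
    subs = Counter()
    for mt, pe in pairs:
        for m, p in zip(mt.split(), pe.split()):
            if m != p:
                subs[(m, p)] += 1
    return {m: p for (m, p), n in subs.items() if n > 1}

def post_edit(sentence, edits):
    """Apply the learned corrections word by word."""
    return " ".join(edits.get(w, w) for w in sentence.split())

pairs = [
    ("install the new driver", "install the new pilot"),
    ("the driver is updated", "the pilot is updated"),
]
edits = learn_edits(pairs)
print(post_edit("remove the driver", edits))  # -> "remove the pilot"
```

Even this toy version shows the appeal for localization: two post-edited sentences are enough to propagate a domain terminology choice to all future output.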
Deep Learning for Machine Translation
• No doubt – it is coming:
  • We will probably reach "super-human" machine translation in the coming years
  • And this could become a real translation assistant
• How is not yet completely clear:
  • From our perspective, we are working on a hybrid approach – a linguistically motivated NN architecture
  • More will also come from the research world
• Still some work ahead:
  • Training the models is still a technological challenge
  • To become really useful – e.g. for language learning – the models need to explain as much as they translate
  • Multi-level analysis: translating documents, not just sentences
  • Multi-modal ⇒ could lead to full self-directed language learning