MT serving the society
(Aaron) Lifeng Han / ADAPT @DCU
[email protected]
linkedin.com/in/aaronhan
PubhD, Dublin, 2017.03.01

The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

PubhD talk: MT serving the society



Presenter

Lifeng Han (or Aaron)
- 2016.12 to present: PhD student, ADAPT Centre @ DCU
- 2016.10 to 2016.11: Research assistant, ADAPT Centre
- 2016.03 to 2016.07: Guest researcher, University of Amsterdam
- 2014.09 to 2016.02: Employee, University of Amsterdam
- 2014.07: Master of Computer Science, Bachelor in Mathematics

News: https://aaronlifenghan.jimdo.com/news/
Poetry: https://poethan.wordpress.com/
Interests: sports, arts, music, photography, poetry, cooking, cycling, drawing

Content

What is MT

How MT began and developed

How MT works now

How MT serves the society

How you are connected with MT daily

What is MT

MT means Machine Translation.

Using a machine (computer) to translate human (natural) languages
- e.g. from English to German/French/Spanish/Irish/Chinese
- and in the opposite directions

To make MT work:
- teach the computer to understand human languages
- teach the computer to learn grammar and semantics
- teach the computer to learn algorithms

How MT began and developed - began

The original idea is from 'the Tower of Babel' (Genesis)
- 11:5 The LORD came down to see the city and the tower the people were building.
- 11:6 The LORD said, "If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them."

The second idea is from René Descartes (1629)
- a universal language,
- with equivalent ideas in different tongues sharing one symbol
- Philosophical statement: 'I think, therefore I am'

The third idea is from Warren Weaver: "machine translation"
- appeared in his "Memorandum on Translation" (1949)

How MT began and developed - developed

MT Models:

Rule-based MT (RBMT)

Statistical MT (SMT)

Example-based MT (EBMT)

Hybrid MT (HMT)

Neural MT (NMT)

How MT began and developed - RBMT

RBMT paradigm: used mostly in the creation of dictionaries and grammar programs.
- transfer-based machine translation
- interlingual machine translation
- dictionary-based machine translation

Approaches:
- linking the structure of the input sentence with the structure of the output sentence
- using a parser and an analyser for the source language, a generator for the target language, and a transfer lexicon for the actual translation
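
The analysis, transfer, and generation stages named above can be illustrated with a toy sketch. Everything in it is a hypothetical example rather than material from the talk: the four-word English-French lexicon, the part-of-speech tags, and the single adjective-noun reordering rule are assumptions chosen only to show the shape of a transfer-based system.

```python
# Toy transfer-based RBMT: analyse -> transfer -> generate.
# The lexicon and the single reordering rule are illustrative assumptions.

# Transfer lexicon: source word -> (target word, part of speech)
LEXICON = {
    "the": ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat": ("chat", "NOUN"),
    "sleeps": ("dort", "VERB"),
}


def analyse(sentence):
    """Source analysis: tokenise and tag each word via the lexicon."""
    tokens = sentence.lower().split()
    # An unknown word raises KeyError: every word must be listed explicitly.
    return [(tok, LEXICON[tok][1]) for tok in tokens]


def transfer(tagged):
    """Structural transfer: English ADJ NOUN becomes French NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def generate(tagged):
    """Target generation: replace each source word by its lexicon entry."""
    return " ".join(LEXICON[tok][0] for tok, _ in tagged)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the black cat sleeps"))  # prints: le chat noir dort
```

Any word missing from the lexicon makes the sketch fail, which is a small-scale version of the requirement that a rule-based system spell everything out by hand.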

How MT began and developed - RBMT

RBMT is linguistically motivated:
- uses more information about the linguistics of the source and target languages
- uses the morphological and syntactic rules and semantic analysis of both languages

Drawbacks:
- everything must be made explicit
- orthographical variation and erroneous input must be made part of the source language analyser in order to cope with them
- lexical selection rules must be written for all instances of ambiguity

How MT began and developed - SMT

Ideas of SMT introduced by Warren Weaver in 1949

- including the ideas of applying Claude Shannon's information theory.

SMT re-introduced in the late 1980s and early 1990s

- by researchers at IBM's Thomas J. Watson Research Center

- contributed to the significant resurgence in interest in MT

How MT began and developed - SMT

A document is translated according to the probability distribution p(e|f):
- p(e|f) is the probability that a string e in the target language (e.g. English) is the translation of a string f in the source language (e.g. French)
- by Bayes' theorem: p(e|f) ∝ p(f|e) p(e)
- translation model p(f|e): the probability that the source string is the translation of the target string
- language model p(e): the probability of seeing that target language string

This splits the problem into two sub-problems. The best translation ê is found by picking the candidate e that gives the highest probability: ê = argmax_e p(f|e) p(e)
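
The selection step described above can be sketched as a toy decoder. This is a minimal illustration, not a real SMT system: the candidate translations and all probability values are made-up numbers, and `decode` is a hypothetical helper name.

```python
# Toy noisy-channel decoder: choose the candidate e maximising p(f|e) * p(e).
# All probabilities are invented for illustration only.

# Translation model p(f|e) for one fixed French input f = "la maison bleue"
translation_model = {
    "the blue house": 0.30,
    "the house blue": 0.35,  # word-for-word order scores slightly higher here
    "blue the house": 0.05,
}

# Language model p(e): how plausible each candidate is as English
language_model = {
    "the blue house": 0.020,
    "the house blue": 0.001,
    "blue the house": 0.0005,
}


def decode(tm, lm):
    """Return the candidate with the highest product p(f|e) * p(e)."""
    return max(tm, key=lambda e: tm[e] * lm.get(e, 0.0))


best = decode(translation_model, language_model)
print(best)  # prints: the blue house
```

In this toy setup the translation model alone would prefer the word-for-word order, and the language model tips the choice towards fluent English, which is the point of splitting the problem in two.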

How MT began and developed - SMT

SMT variants:
- Word-based
- Phrase-based
- Hierarchical phrase-based
- Syntax-based (constituency structure vs dependency structure)
- Semantic integration

Problems of the syntax-based model:
- long-distance dependencies are still a problem
- no linguistic restrictions are imposed on the variables
- when the translated piece of text is longer than a threshold, models cannot use syntax-based rules and instead use so-called 'glue rules'

How MT began and developed - NMT

Neural MT:

A deep-learning-based approach to MT
- a radical departure from phrase-based statistical translation approaches, in which a translation system consists of subcomponents that are separately engineered
- all parts of the neural translation model are trained jointly (end-to-end) to maximize translation performance

Reference: https://en.wikipedia.org/wiki/Neural_machine_translation

How MT began and developed - NMT

Began with 'word-to-vector' representations learned by neural networks (NN)
Word embedding
Neural language model
Encoder-decoder model
New: attention mechanism, e.g. adding alignment information
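
As a rough illustration of the attention mechanism mentioned above, the sketch below computes soft alignment weights between one decoder state and a few encoder states. It assumes a plain dot-product scoring function, which is only one of several variants and is not specified in the talk; the vectors and the `attend` helper are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attend(query, encoder_states):
    """Dot-product attention: weights over source positions plus a context vector."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Three made-up encoder states (one per source word) and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)  # the second source position receives the largest weight
print(context)
```

The weights play the role of the alignment information: they indicate which source words the decoder is looking at while producing the current output word.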

How MT began and developed - NMT

Benefits of NMT:

Each output is predicted from
- the encoding of the full input sentence
- all previously produced output words (theoretically)

Word embeddings allow generalization
- 'cat' and 'cats' can have similar representations
- the same goes for 'home' and 'house'
- better fluency
- better handling of sentence-level context
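
The claim above that related words get similar representations can be made concrete with cosine similarity. The three-dimensional vectors below are invented for illustration; real embeddings are learned from data and have far more dimensions.

```python
import math

# Hypothetical embeddings, hand-picked so that related words point the same way.
EMBEDDINGS = {
    "cat": [0.90, 0.10, 0.00],
    "cats": [0.85, 0.15, 0.05],
    "house": [0.05, 0.90, 0.20],
    "home": [0.10, 0.85, 0.25],
}


def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


for left, right in [("cat", "cats"), ("home", "house"), ("cat", "house")]:
    similarity = cosine(EMBEDDINGS[left], EMBEDDINGS[right])
    print(f"{left} / {right}: {similarity:.3f}")
```

With these toy vectors the first two pairs score close to 1 and the last pair scores much lower, so whatever a model learns about 'cat' carries over to 'cats'.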

How MT began and developed - NMT

Disadvantages:
- limited vocabulary: only a fixed vocabulary size is supported
- no explicit modeling of coverage / bad with rare words
- development challenges / speed / hardware / process not transparent
- traditional SMT allows customization (own terminology, the customer's domain, rules for dates and units, markup tag handling, etc.), whereas NMT does not
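
The limited-vocabulary point above can be sketched as follows. This assumes the common setup in which only the most frequent words are kept and every other word is mapped to an unknown-word token; the talk does not say which scheme is meant, and the corpus, the `<unk>` symbol, and the helper names are hypothetical.

```python
from collections import Counter


def build_vocab(corpus, max_size):
    """Keep only the max_size most frequent tokens in the corpus."""
    counts = Counter(tok for sentence in corpus for tok in sentence.split())
    return {tok for tok, _ in counts.most_common(max_size)}


def apply_vocab(sentence, vocab, unk="<unk>"):
    """Replace every out-of-vocabulary token with the unknown-word symbol."""
    return " ".join(tok if tok in vocab else unk for tok in sentence.split())


corpus = ["the cat sat", "the dog sat", "the aardvark slept"]
vocab = build_vocab(corpus, max_size=2)  # only 'the' and 'sat' survive

print(apply_vocab("the aardvark sat", vocab))  # prints: the <unk> sat
```

The rare word is lost before translation even starts, which is one way to see why such systems do badly on rare words.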

How MT works now

For language pairs with large amounts of data available, e.g. French/German/Spanish/Chinese-English:
- Chinese-English: the output preserves the meaning in most cases, but word reordering and grammar are not good enough

For low-resource language pairs:
- both adequacy and fluency need to be improved considerably

As for cost, MT still needs big machines working behind the scenes

How MT serves the society

Scientific communication
- researchers understand each other's work, papers, and theories

Technological communication
- engineers help each other to fix projects

Commercial communication
- when we buy goods from other countries; patent translations

Cultural communication
- social networks, news, travel, costumes, arts

How you are connected with MT daily

The papers you read every day:
- even though you read English articles, the authors probably gained their ideas from articles in different languages

The food you buy every day:
- produced by international companies that always need translations

The furniture/clothes you buy:
- product information translated into multiple languages

The letters you receive monthly (Waternet, trash, etc., in Dutch in the Netherlands):
- with multimodal MT, just take a picture and the translation appears

The social networks/news you read online:
- written by reporters from different countries in their own languages

References

Qun Liu. Dependence-based SMT talk. ILLC, UvA. 2014.Nov.

Philipp Koehn. Neural MT web seminar. Omniscientech. 2017.Jan.25th.

(Aaron) Lifeng Han. "Neural Machine Translation: Are we building 'The Tower of Babel' again?" Talk. DCU, Dublin. 2017.01.25th.

https://en.wikipedia.org/wiki/Machine_translation

https://en.wikipedia.org/wiki/Rule-based_machine_translation

https://en.wikipedia.org/wiki/Statistical_machine_translation