Fundamentals of Machine Learning for Machine Translation
Dr. John D. Kelleher
ADAPT Centre for Digital Content Technology [1]
Dublin Institute of Technology, Ireland
Translating Europe Forum 2016
Brussels, Charlemagne Building, Rue de la Loi 170
27th October 2016
[1] The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Outline
Basic Building Blocks: Neurons
What is a Neural Network?
Word Embeddings
Recurrent Neural Networks
Encoders
Decoders (Language Models)
Neural Machine Translation
Conclusions
What is a function?
A function maps a set of inputs (numbers) to an output (number)
sum(2, 5, 4) → 11
What is a weightedSum function?
weightedSum([n1, n2, …, nm], [w1, w2, …, wm])
  = (n1 × w1) + (n2 × w2) + … + (nm × wm)

where the first list holds the input numbers and the second the weights. For example:

weightedSum([3, 9], [−3, 1])
  = (3 × −3) + (9 × 1)
  = −9 + 9
  = 0
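In code, weightedSum is a one-liner. A minimal Python sketch (the function and variable names are my own, for illustration):

    def weighted_sum(inputs, weights):
        # Multiply each input by its weight and add up the results.
        return sum(n * w for n, w in zip(inputs, weights))

    print(weighted_sum([3, 9], [-3, 1]))  # (3 * -3) + (9 * 1) = 0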
What is an activation function?
An activation function takes the output of our weightedSum function and applies another mapping to it.
[Figure: plot of the logistic function, logistic(x), for x from −10 to 10; the output rises smoothly from 0 to 1, passing through 0.5 at x = 0]
What is an activation function?
activation = logistic(weightedSum([n1, n2, …, nm], [w1, w2, …, wm]))

For example:

logistic(weightedSum([3, 9], [−3, 1]))
  = logistic((3 × −3) + (9 × 1))
  = logistic(−9 + 9)
  = logistic(0)
  = 0.5
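The same computation as a Python sketch, using the standard definition logistic(x) = 1 / (1 + e^(−x)) and the weighted_sum function from the earlier sketch:

    import math

    def weighted_sum(inputs, weights):
        return sum(n * w for n, w in zip(inputs, weights))

    def logistic(x):
        # Squash any number into the range (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    print(logistic(weighted_sum([3, 9], [-3, 1])))  # logistic(0) = 0.5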
What is a Neuron?
The simple list of operations that we have just described defines the fundamental building block of a neural network: the neuron.

Neuron = activation(weightedSum([n1, n2, …, nm], [w1, w2, …, wm]))
What is a Neural Network?
[Figure: a feed-forward network with an input layer (I1, I2), a hidden layer (H1, H2, H3), and an output layer (O1, O2); each input unit connects to each hidden unit, and each hidden unit to each output unit]
Where do the weights come from?
[Figure: the same network with a weight on each edge: w1–w6 on the connections from the input layer to the hidden layer, and w7–w12 on the connections from the hidden layer to the output layer]
Training a Neural Network
- We train a neural network by iteratively updating the weights.
- We start by randomly assigning weights to each edge.
- We then show the network examples of inputs and expected outputs, and update the weights using backpropagation so that the network outputs match the expected outputs (see the sketch below).
- We keep updating the weights until the network is working the way we want.
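As an illustration of the idea (not the full backpropagation algorithm, which propagates errors back through every layer), here is a toy sketch that trains a single neuron by gradient descent on one example; the learning rate and the starting weights are arbitrary choices:

    import math

    def neuron(inputs, weights):
        z = sum(n * w for n, w in zip(inputs, weights))
        return 1.0 / (1.0 + math.exp(-z))       # logistic activation

    inputs, expected = [3, 9], 1.0               # one training example
    weights = [0.1, -0.2]                        # "random" starting weights
    learning_rate = 0.5

    for step in range(100):
        out = neuron(inputs, weights)
        error = out - expected
        # Chain rule: d(error^2)/dw_i = 2 * error * out * (1 - out) * n_i
        weights = [w - learning_rate * 2 * error * out * (1 - out) * n
                   for w, n in zip(weights, inputs)]

    print(round(neuron(inputs, weights), 3))     # close to 1.0 after training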
Word Embeddings
Each word is represented by a vector of numbers that positions the word in a multi-dimensional space, e.g.:

king  = <55, −10, 176, 27>
man   = <10, 79, 150, 83>
woman = <15, 74, 159, 106>
queen = <60, −15, 185, 50>
Word Embeddings
vec(King) − vec(Man) + vec(Woman) ≈ vec(Queen) [2]

[2] Linguistic Regularities in Continuous Space Word Representations (Mikolov et al., 2013)
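With the toy vectors from the previous slide, the arithmetic can be checked directly; a sketch with numpy (these four vectors are illustrative values, not real learned embeddings):

    import numpy as np

    king  = np.array([55, -10, 176,  27])
    man   = np.array([10,  79, 150,  83])
    woman = np.array([15,  74, 159, 106])
    queen = np.array([60, -15, 185,  50])

    result = king - man + woman
    print(result)                          # [ 60 -15 185  50]
    print(np.array_equal(result, queen))   # True for these toy vectors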
Recurrent Neural Networks
A particular type of neural network that is useful for processing sequential data (such as language) is the Recurrent Neural Network (RNN).

Using an RNN we process our sequential data one input at a time.

In an RNN the outputs of some of the neurons for one input are fed back into the network as part of the next input.
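A single RNN step can be sketched as follows. The weight matrices W_x and W_h, the toy sizes, and the use of the logistic activation are all assumptions for the sketch (real systems typically use tanh or gated units such as LSTMs):

    import numpy as np

    def rnn_step(x_t, h_prev, W_x, W_h):
        # Combine the current input with the previous hidden state
        # (the "memory buffer") and squash the result.
        return 1.0 / (1.0 + np.exp(-(W_x @ x_t + W_h @ h_prev)))

    rng = np.random.default_rng(0)
    W_x = rng.normal(size=(3, 4))          # 4-dim inputs, 3-dim hidden state
    W_h = rng.normal(size=(3, 3))
    h = np.zeros(3)                        # the memory starts empty
    for x in rng.normal(size=(5, 4)):      # a toy 5-step input sequence
        h = rnn_step(x, h, W_x, W_h)       # h carries context forward
    print(h)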
Recurrent Neural Networks

[Figure, shown as four animation steps in the original slides: an RNN with an input layer (I1), a hidden layer (H1, H2, H3), a memory buffer (M1, M2, M3), and an output layer (O1); at each step the hidden-layer activations are copied into the memory buffer and fed back into the hidden layer along with the next input]
Recurrent Neural Networks
[Figure: a recurrent neural network shown in two equivalent views: a block diagram in which the Input and the Memory feed the Hidden layer, which feeds the Output and is copied back into the Memory; and the compact notation in which input xt and previous hidden state ht−1 produce hidden state ht and output yt]
Recurrent Neural Networks
Figure: RNN Unrolled Through Time (inputs x1, x2, x3, …, xt, xt+1 feed hidden states h1, h2, h3, …, ht, ht+1, which produce outputs y1, y2, y3, …, yt, yt+1)
Recurrent Neural Networks
Two important uses of RNNs:

1. RNN Encoders
2. RNN Language Models
Encoders
Figure: Using an RNN to Generate an Encoding of a Word Sequence (the hidden states h1, h2, …, hm read Word1, Word2, …, Wordm followed by <eos>; the final state C is the encoding of the whole sequence)
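A sketch of the encoder: run the RNN step over the whole sequence of word embeddings and keep the final hidden state as C (rnn_step as in the earlier sketch; the matrices and sizes are again illustrative):

    import numpy as np

    def rnn_step(x_t, h_prev, W_x, W_h):     # as in the earlier sketch
        return 1.0 / (1.0 + np.exp(-(W_x @ x_t + W_h @ h_prev)))

    def encode(embedded_words, W_x, W_h, hidden_size=3):
        # Read the sentence one embedding at a time; the final hidden
        # state is a fixed-size encoding of the whole sequence.
        h = np.zeros(hidden_size)
        for x in embedded_words:             # one vector per word, then <eos>
            h = rnn_step(x, h, W_x, W_h)
        return h                             # this is C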
Language Models
Figure: RNN Language Model Unrolled Through Time (at each step the network reads Word1, Word2, Word3, …, Wordt as input and predicts the next word: Word2, Word3, Word4, …, Wordt+1, via hidden states h1, h2, h3, …, ht)
Decoder
Figure: Using an RNN Language Model to Generate (Hallucinate) a Word Sequence (seeded with Word1, the network predicts Word2, Word3, Word4, …, Wordt+1, feeding each prediction back in as the next input)
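A sketch of that generation loop: start from a seed state and seed word, predict a word, and feed the prediction back in as the next input. The greedy argmax choice and all names here (the embeddings table, the output matrix W_y, the eos id) are illustrative assumptions:

    import numpy as np

    def decode(h, embeddings, W_x, W_h, W_y, eos=0, max_len=20):
        # Greedily generate word ids until <eos>, feeding each
        # predicted word back in as the next input.
        output, word = [], eos                    # seed with the <eos> token
        for _ in range(max_len):
            x = embeddings[word]                  # embed the last word
            h = 1.0 / (1.0 + np.exp(-(W_x @ x + W_h @ h)))  # RNN step
            word = int(np.argmax(W_y @ h))        # highest-scoring next word
            if word == eos:
                break
            output.append(word)
        return output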
Encoder-Decoder Architecture
Figure: Sequence to Sequence Translation using an Encoder-Decoder Architecture (the encoder states h1, h2, … read Source1, Source2, …, <eos> and produce the encoding C; the decoder states d1, …, dn then generate Target1, Target2, …, <eos>)
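Putting the two halves together, the whole translator becomes a few lines (a sketch using the encode and decode functions from the previous sketches; in a real system the embedding tables and weight matrices are learned jointly during training):

    def translate(source_ids, src_emb, tgt_emb,
                  enc_Wx, enc_Wh, dec_Wx, dec_Wh, dec_Wy):
        # Encode the source sentence into C, then decode C into the target.
        C = encode([src_emb[i] for i in source_ids], enc_Wx, enc_Wh)
        return decode(C, tgt_emb, dec_Wx, dec_Wh, dec_Wy)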
Neural Machine Translation
Figure: Example Translation using an Encoder-Decoder Architecture (the encoder reads the French source "La vie est belle", presented in reverse order as "belle est vie La" plus <eos>, through states h1–h4 into the encoding C; the decoder states d1–d3 then generate "Life is beautiful" followed by <eos>)
Conclusions
- An advantage of the encoder-decoder architecture is that the system processes the entire input before it starts translating.
- This means that the decoder can use both what it has already generated and the entire source sentence when generating the next word in the translation.
Conclusions
- There is ongoing research on the best way to present the source sentence to the encoder.
- There is also ongoing research on giving the decoder the ability to attend to different parts of the input during translation.
- There is also interesting work on improving how these systems handle idiomatic language.
Thank you for your attention
@johndkelleher
www.comp.dit.ie/jkelleher
www.machinelearningbook.com
Acknowledgements: The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.