NLP Programming Tutorial 10 – Neural Networks
Graham Neubig, Nara Institute of Science and Technology (NAIST)
Prediction Problems
Given x, predict y
Example we will use:
● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person
● This is binary classification (of course!)
Given: "Gonso was a Sanron sect priest (754–827) in the late Nara and early Heian periods."
Predict: Yes!
Given: "Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture."
Predict: No!
Linear Classifiers
y = sign(w·φ(x)) = sign(∑_{i=1}^I w_i·φ_i(x))
● x: the input
● φ(x): the vector of feature functions {φ_1(x), φ_2(x), …, φ_I(x)}
● w: the weight vector {w_1, w_2, …, w_I}
● y: the prediction, +1 if “yes”, -1 if “no”
● (sign(v) is +1 if v ≥ 0, -1 otherwise)
Example Feature Functions: Unigram Features
● Equal to “the number of times a particular word appears”

x = A site , located in Maizuru , Kyoto

φ_unigram “A”(x) = 1      φ_unigram “site”(x) = 1      φ_unigram “,”(x) = 2
φ_unigram “located”(x) = 1    φ_unigram “in”(x) = 1
φ_unigram “Maizuru”(x) = 1    φ_unigram “Kyoto”(x) = 1
φ_unigram “the”(x) = 0    φ_unigram “temple”(x) = 0    … the rest are all 0

● For convenience, we use feature names (φ_unigram “A”) instead of feature indexes (φ_1)
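As a concrete sketch, the create_features(x) used in the training pseudocode later might look like this in Python (the “UNI:” key prefix is one possible naming convention, not fixed by the slides):

from collections import defaultdict

def create_features(x):
    # map each unigram to the number of times it appears in x
    phi = defaultdict(int)
    for word in x.split(" "):
        phi["UNI:" + word] += 1
    return phi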
Calculating the Weighted Sum

x = A site , located in Maizuru , Kyoto

φ_unigram “A”(x) = 1        w_unigram “a” = 0        → 0
φ_unigram “site”(x) = 1     w_unigram “site” = -3    → -3
φ_unigram “,”(x) = 2        w_unigram “,” = 0        → 0
φ_unigram “located”(x) = 1  w_unigram “located” = 0  → 0
φ_unigram “in”(x) = 1       w_unigram “in” = 0       → 0
φ_unigram “Maizuru”(x) = 1  w_unigram “Maizuru” = 0  → 0
φ_unigram “Kyoto”(x) = 1    w_unigram “Kyoto” = 0    → 0
φ_unigram “priest”(x) = 0   w_unigram “priest” = 2   → 0
φ_unigram “black”(x) = 0    w_unigram “black” = 0    → 0
…

Adding up the products: 0 + (-3) + 0 + … + 0 = -3 → No!
The Perceptron
● Think of it as a “machine” to calculate a weighted sum and take its sign:

sign(∑_{i=1}^I w_i·φ_i(x))

[Figure: the example features (φ_“A” = 1, φ_“site” = 1, φ_“,” = 2, φ_“located” = 1, φ_“in” = 1, φ_“Maizuru” = 1, φ_“Kyoto” = 1, φ_“priest” = 0, φ_“black” = 0) feed a single node with weights {0, -3, 0, 0, 0, 0, 0, 2, 0}, which outputs -1]
Problem: Linear Constraint
● A perceptron cannot achieve high accuracy on non-linear functions

[Figure: four points in an XOR-like arrangement (X O on top, O X below) that no single line can separate]
Neural Networks
● Neural networks connect multiple perceptrons together
[Figure: the same example features feed several perceptrons in parallel, and their outputs feed a further perceptron, which outputs -1]

● Motivation: neural networks can express non-linear functions
Example:
● Build two classifiers:

[Figure: points φ(x_1) = {-1, 1} (X), φ(x_2) = {1, 1} (O), φ(x_3) = {-1, -1} (O), φ(x_4) = {1, -1} (X)]

Each classifier takes φ_1, φ_2, and a constant bias input 1:
y_1 = sign(w_1·{φ_1, φ_2, 1}), with w_1 = {1, 1, -1}
y_2 = sign(w_2·{φ_1, φ_2, 1}), with w_2 = {-1, -1, -1}
Example:
● These classifiers map the points to a new space

[Figure: applying y_1 (w_1 = {1, 1, -1}) and y_2 (w_2 = {-1, -1, -1}) to each point gives:]

y(x_1) = {-1, -1} → X
y(x_2) = {1, -1} → O
y(x_3) = {-1, 1} → O
y(x_4) = {-1, -1} → X

In the new (y_1, y_2) space, the two O points land at {1, -1} and {-1, 1}, while both X points land on {-1, -1}.
Example:
● In the new space, the examples are classifiable!

[Figure: in the (y_1, y_2) space, a single line separates the O points {1, -1} and {-1, 1} from the X point {-1, -1}]

y_3 = sign(w_3·{y_1, y_2, 1}), with w_3 = {1, 1, 1}
Example:
● Final neural network:

[Figure: φ_1, φ_2, and a bias 1 feed two hidden perceptrons with w_1 = {1, 1, -1} and w_2 = {-1, -1, -1}; their outputs y_1, y_2 and a bias 1 feed an output perceptron with w_3 = {1, 1, 1}]
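As a sanity check, here is a minimal Python sketch (using sign(v) = +1 for v ≥ 0, as defined earlier) that hard-codes these three perceptrons and confirms the network labels all four example points correctly:

def sign(v):
    return 1 if v >= 0 else -1

def net(phi1, phi2):
    # hidden layer: two perceptrons over {phi1, phi2, bias}
    y1 = sign(1 * phi1 + 1 * phi2 + -1 * 1)     # w1 = {1, 1, -1}
    y2 = sign(-1 * phi1 + -1 * phi2 + -1 * 1)   # w2 = {-1, -1, -1}
    # output layer over {y1, y2, bias}
    return sign(1 * y1 + 1 * y2 + 1 * 1)        # w3 = {1, 1, 1}

for (p1, p2), label in [((-1, 1), -1), ((1, 1), 1), ((-1, -1), 1), ((1, -1), -1)]:
    assert net(p1, p2) == label   # O = +1, X = -1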
Representing a Neural Network
● Assume the network is fully connected and organized in layers
● Each perceptron has:
● a layer ID
● a weight vector

network = [ (1, w0),
            (1, w1),
            (1, w2),
            (2, w3) ]

[Figure: the example features feed perceptrons 0, 1, and 2 in layer 1, whose outputs feed perceptron 3 in layer 2]
Neural Network Prediction Process
● Predict one perceptron at a time, using the previous layer's outputs as input

[Figure, shown over several slides: with the example features, the layer-1 perceptrons 0, 1, and 2 are evaluated in turn and output -1, 1, and 1; layer-2 perceptron 3 then reads these three values and outputs the final answer, -1]
Review: Pseudocode for Perceptron Prediction

def predict_one(w, phi):
    score = 0
    for name, value in phi.items():   # score = w·φ(x)
        if name in w:
            score += value * w[name]
    if score >= 0:
        return 1
    else:
        return -1
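For example, with the two nonzero weights from the weighted-sum slide (the “UNI:” key prefix follows the hypothetical create_features sketch above):

w = {"UNI:site": -3, "UNI:priest": 2}
phi = create_features("A site , located in Maizuru , Kyoto")
print(predict_one(w, phi))   # score = 1 * -3 = -3 → prints -1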
Pseudo-Code for NN Prediction

def predict_nn(network, phi):
    # y[0] holds the input features; y[l] holds the outputs of layer l's nodes
    num_layers = max(layer for layer, weight in network)
    y = [phi] + [{} for _ in range(num_layers)]
    for i, (layer, weight) in enumerate(network):
        # predict the answer with this perceptron, using the previous layer as input
        answer = predict_one(weight, y[layer - 1])
        # save the answer as a feature for the next layer
        y[layer][i] = answer
    return answer   # the answer of the last perceptron
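A hypothetical usage, building the four-node network from the representation slide (the weight values here are made up for illustration; layer-1 weights are keyed by feature name, layer-2 weights by node id):

w0 = {"UNI:site": -3}
w1 = {"UNI:priest": 2}
w2 = {"UNI:Kyoto": 1}
w3 = {0: 1, 1: 1, 2: 1}
network = [(1, w0), (1, w1), (1, w2), (2, w3)]
phi = create_features("A site , located in Maizuru , Kyoto")
print(predict_nn(network, phi))   # nodes 0-2 output -1, 1, 1; node 3 outputs sign(1) = 1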
Neural Network Activation Functions
● The previously described NN uses the step function:
y = sign(w·φ(x))
● The step function is not differentiable → use tanh:
y = tanh(w·φ(x))

[Figure: plots of sign(w·φ(x)) and tanh(w·φ(x)) over the range -4 to 4; tanh is a smooth curve saturating at -1 and 1]

Python:
from math import tanh
tanh(x)
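A minimal sketch of the tanh variant of predict_one, which the training code below assumes (the name predict_one_tanh is my own):

from math import tanh

def predict_one_tanh(w, phi):
    # same sparse weighted sum as predict_one, squashed with tanh
    score = 0
    for name, value in phi.items():
        if name in w:
            score += value * w[name]
    return tanh(score)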
Learning a Perceptron w/ tanh
● First, calculate the error:
δ = y' - y   (y' = the correct tag, y = the system output)
● Update each weight with:
w ← w + λ·δ·φ(x)
where λ is the learning rate
● (For the step-function perceptron, δ = -2 or +2 and λ = 1/2)
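Put together as code, a sketch of this update for a single tanh perceptron (the learning-rate value 0.1 is an assumption):

LAM = 0.1   # learning rate λ (value assumed for illustration)

def update_one(w, phi, y_prime):
    delta = y_prime - predict_one_tanh(w, phi)   # δ = y' - y
    for name, value in phi.items():              # w ← w + λ·δ·φ(x)
        w[name] = w.get(name, 0) + LAM * delta * value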
Problem: We Don't Know the Correct Answer!
● For NNs, we only know the correct tag for the last layer

[Figure: in the example network, output perceptron 3 has a known correct tag y' = 1 but output y = -1; the layer-1 perceptrons each have a known output y but an unknown correct tag (y' = ?)]
Answer: Back-Propagation
● Pass the error backwards along the network: node j collects ∑_i δ_i·w_{j,i} from the downstream nodes i it feeds into

[Figure: node j feeds three downstream nodes with errors δ = -0.9, 0.2, 0.4 through weights w = 0.1, 1, -0.3]

● Also consider the gradient of tanh:
d tanh(w·φ(x)) = 1 - tanh²(w·φ(x)) = 1 - y_j²
● Combine the two:
δ_j = (1 - y_j²) ∑_i δ_i·w_{j,i}
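For instance, assuming the figure's arrows pair up as (δ_i, w_{j,i}) = (-0.9, 0.1), (0.2, 1), (0.4, -0.3) (the pairing is my reading of the diagram), the back-propagated error would be:

∑_i δ_i·w_{j,i} = (-0.9)(0.1) + (0.2)(1) + (0.4)(-0.3) = -0.09 + 0.2 - 0.12 = -0.01

so δ_j = (1 - y_j²)·(-0.01).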
Back-Propagation Code
(a runnable Python rendering of the slide's pseudocode; the forward pass uses the tanh perceptron from the activation-function slide, since the gradient term needs tanh outputs)

def update_nn(network, phi, y_prime, lam=0.1):
    # forward pass with tanh (same loop as predict_nn), keeping all activations
    num_layers = max(layer for layer, w in network)
    y = [phi] + [{} for _ in range(num_layers)]
    for i, (layer, weight) in enumerate(network):
        y[layer][i] = predict_one_tanh(weight, y[layer - 1])
    delta = [0.0] * len(network)
    for j in reversed(range(len(network))):      # backward pass
        layer, w = network[j]
        if j == len(network) - 1:
            delta[j] = y_prime - y[layer][j]     # last node: δ = y' - y
        else:                                    # δ_j = (1 - y_j²) Σ_i δ_i w_{j,i}
            delta[j] = (1 - y[layer][j] ** 2) * sum(
                delta[i] * w_i.get(j, 0)
                for i, (layer_i, w_i) in enumerate(network) if layer_i == layer + 1)
    for j, (layer, w) in enumerate(network):     # w ← w + λ·δ_j·(node j's input)
        for name, val in y[layer - 1].items():
            w[name] = w.get(name, 0) + lam * delta[j] * val
Training Process
● For the perceptron in earlier tutorials, we initialized the weights to zero
● In a NN, we randomly initialize the weights (so that the perceptrons are not all identical)

create network
randomize network weights
for I iterations:
    for each labeled pair x, y' in the data:
        phi = create_features(x)
        update_nn(network, phi, y')
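A minimal sketch of the “create network / randomize weights” step for one hidden layer; the two-node width (matching the exercise) and the ±0.1 initialization range are assumptions:

import random

def init_network(feature_names, num_hidden=2):
    def rand_w(names):
        # small random weights so the hidden perceptrons start out different
        return {name: random.uniform(-0.1, 0.1) for name in names}
    hidden = [(1, rand_w(feature_names)) for _ in range(num_hidden)]
    output = (2, rand_w(range(num_hidden)))   # output weights keyed by node id
    return hidden + [output]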
Exercise
Exercise (1)
● Write two programs:
● train-nn: creates a neural network model
● test-nn: reads a neural network model
● Test train-nn:
● Input: test/03-train-input.txt
● Use one iteration, one hidden layer, two hidden nodes
● Calculate the updates by hand and make sure they are correct
Exercise (2)
● Train a model on data/titles-en-train.labeled
● Predict the labels of data/titles-en-test.word
● Grade your answers:
script/grade-prediction.py data-en/titles-en-test.labeled your_answer
● Compare:
● with single perceptron/SVM classifiers
● with different neural network structures
Thank You!