Nara Institute of Science and Technology
Augmented Human Communication Laboratory
PRESTO, Japan Science and Technology Agency
Advanced Cutting Edge
Research Seminar
Dialogue System with Deep Neural
Networks
Assistant Professor
Koichiro Yoshino
2018/1/25 ©Koichiro Yoshino, AHC-Lab. NAIST, PRESTO JST
Advanced Cutting Edge Research Seminar 2
1. Basis of spoken dialogue systems
– Types and modules of spoken dialogue systems
2. Deep learning for spoken dialogue systems
– Basis of deep learning (deep neural networks)
– Recent approaches to deep learning for spoken dialogue systems
3. Dialogue management using reinforcement learning
– Basis of reinforcement learning
– Statistical dialogue management using intention dependency graph
4. Dialogue management using deep reinforcement learning
– Implementation of deep Q-network in dialogue management
Course works
• Perceptron
– Simple binary classifier
• Multi-layer perceptron
– Combination of binary classifiers
• Deep neural networks
• Ways to apply deep neural networks
– What kind of problem can be solved with DNN?
Basis of deep neural networks
• Simplest unit of neural networks
– Takes several inputs and produces one output
– The perceptron makes a single decision given several inputs
y = sign( Σ_i w_i · φ_i(x) + b )
Simple perceptron
[Figure: simple perceptron. Input x is mapped by feature functions φ_i(x), weighted by w_i, plus a bias b; the binary output y is +1 or -1.]
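The decision rule above can be sketched in a few lines of Python. The bag-of-words features, the toy data, and the classic mistake-driven learning rule below are illustrative additions, not from the slides.

```python
# A minimal simple-perceptron sketch: y = sign(sum_i w_i * phi_i(x) + b).

def sign(v):
    return 1 if v >= 0 else -1

def perceptron(x, w, b):
    # phi_i(x) is the identity feature here: phi_i(x) = x[i]
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, n_features, epochs=10, lr=1.0):
    # Classic perceptron learning rule: on a mistake, move w toward the example.
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            if perceptron(x, w, b) != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Bag-of-words features over ["good", "bad", "cool"] for the slides' examples.
data = [
    ([1, 0, 0], +1),  # "This room is good"
    ([0, 0, 1], +1),  # "This building is cool"
    ([0, 1, 0], -1),  # "This room is bad"
]
w, b = train(data, 3)
assert all(perceptron(x, w, b) == y for x, y in data)
```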
• Problems the perceptron can solve
– Linearly separable binary classification problems
• Positive: “This room is good”
• Positive: “This building is cool”
• Negative: “This room is bad”
• Problems the perceptron cannot solve
– Non-linear problems
• Positive: “very good”
• Positive: “not bad”
• Negative: “very bad”
• Negative: “not good”
Properties of simple perceptron
[Figure: the linearly separable case puts "good" and "cool" (positive) on one side of a line and "bad" (negative) on the other; the non-linear case ("very good", "not bad" positive vs. "very bad", "not good" negative) cannot be split by a single line.]
• Using several classifiers
– Multi-layer perceptron (MLP)
• Feed-forward network
– Classifiers of the 1st layer take
the same input, but learn
different weights
– If we have two linear separating planes, we can classify the examples of the non-linear problem
Solutions for non-linear problems
[Figure: two separating lines over the examples x_1: very good, x_2: very bad, x_3: not bad, x_4: not good, combined into a single output y.]
• 1st layer
– Input: same as the single perceptron
– Output: features for the decision
of the 2nd layer
• Each perceptron may learn a mapping from the input feature space to a new feature space
• Kernel methods do a similar thing
• 2nd layer
– Input: features that come from
the 1st layer
– Output: classification result
Multi-layer perceptron (MLP)
[Figure: the first layer learns two separating planes φ_1(x) and φ_2(x) over the same examples x_1…x_4; combining them, φ_1(x) ∗ φ_2(x) is positive for x_1 (very good) and x_3 (not bad) and negative for x_2 (very bad) and x_4 (not good).]
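The two-plane solution can be wired by hand. The encoding x = (has_not, has_good) and all weights below are illustrative, chosen so that the XOR-like examples from the slides come out right.

```python
# A hand-wired multi-layer perceptron for the slides' non-linear example.
# Positive class ("very good", "not bad") is XOR(has_not, has_good).

def step(v):
    return 1 if v > 0 else 0

def mlp(x):
    # 1st layer: two linear separators over the same input.
    h1 = step(x[0] + x[1] - 0.5)    # OR(has_not, has_good)
    h2 = step(-x[0] - x[1] + 1.5)   # NAND(has_not, has_good)
    # 2nd layer: combine the two new features.
    return step(h1 + h2 - 1.5)      # AND(h1, h2) == XOR of the inputs

examples = {
    (0, 1): 1,  # "very good"  -> positive
    (1, 0): 1,  # "not bad"    -> positive
    (0, 0): 0,  # "very bad"   -> negative
    (1, 1): 0,  # "not good"   -> negative
}
assert all(mlp(x) == y for x, y in examples.items())
```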
• A deep neural network (DNN) is a deeply layered multi-layer perceptron
– It can learn the mapping between X and y (y = f(X)) even if it is complex
– The restricted Boltzmann machine (RBM) is a key technique to train the model
• Pre-trains the mapping of each layer (X and H^1, H^1 and H^2, …), then fine-tunes the entire network (backpropagation)
Deep neural networks
[Figure: a deep neural network with inputs x_1…x_4, hidden layers h^1, h^2, h^3, and output y.]
• A recurrent neural network is a neural network with a recursion
– h^t = tanh(W_xh x^t + W_hh h^{t-1} + b_h)
– y_p^t = softmax(W_hyp h^t + b_yp)
• tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}),  softmax(y)_i = e^{y_i} / Σ_k e^{y_k}
• This structure works well for sequential input (X^1, X^2, …, X^t)
– t is a time step
– The input h^{t-1} at time-step t acts as a memory of the previous inputs
Variation of neural networks:
Recurrent neural network (RNN)
[Figure: the RNN unrolled over time: inputs x^1…x^5, hidden states h^0…h^5, outputs y_p^1…y_p^5.]
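A minimal sketch of one recurrent step, following the equations above; the dimensions and weight values are illustrative.

```python
import math

# One RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),
#               y_t = softmax(W_hy h_t + b_y)

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def softmax(v):
    m = max(v)
    exps = [math.exp(vi - m) for vi in v]
    z = sum(exps)
    return [e / z for e in exps]

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    h_t = [math.tanh(v)
           for v in add(add(matvec(W_xh, x_t), matvec(W_hh, h_prev)), b_h)]
    y_t = softmax(add(matvec(W_hy, h_t), b_y))
    return h_t, y_t

# Run a length-3 sequence through a 2-unit RNN with 2 output classes.
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
W_hy = [[1.0, -1.0], [-1.0, 1.0]]
b_y = [0.0, 0.0]

h = [0.0, 0.0]  # h_0
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h, y = rnn_step(x, h, W_xh, W_hh, b_h, W_hy, b_y)
assert abs(sum(y) - 1.0) < 1e-9  # softmax output is a distribution
```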
• CNN is a state-of-the-art algorithm for classification
– c_{i,j} = Σ_{s=0}^{m−1} Σ_{t=0}^{n−1} w_{s,t} x_{i+s, j+t} + b_c
– a_{i,j} = tanh(c_{i,j}),  p_{i,j} = max(a_{i,j})
– o = tanh(W_po p + b_o),  y = softmax(W_oy o + b_y)
Variation of neural networks:
Convolutional neural network (CNN)
[Figure: the CNN pipeline: convolution → activation → max-pooling → full connection and softmax.]
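A minimal sketch of the convolution and pooling equations above in pure Python; the 4x4 input and 2x2 filter are illustrative.

```python
import math

# c_ij = sum_{s,t} w_st * x_{i+s, j+t} + b_c ; a_ij = tanh(c_ij) ; p = max(a_ij)

def conv2d(x, w, b):
    m, n = len(w), len(w[0])
    rows, cols = len(x) - m + 1, len(x[0]) - n + 1
    return [[sum(w[s][t] * x[i + s][j + t] for s in range(m) for t in range(n)) + b
             for j in range(cols)] for i in range(rows)]

x = [[1, 0, 2, 1],
     [0, 1, 1, 0],
     [1, 2, 0, 1],
     [0, 1, 1, 2]]
w = [[1, 0],
     [0, 1]]                                     # 2x2 filter
c = conv2d(x, w, b=0.0)                          # 3x3 feature map
a = [[math.tanh(v) for v in row] for row in c]   # activation
p = max(v for row in a for v in row)             # global max-pooling
assert len(c) == 3 and len(c[0]) == 3
```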
• Deep learning can learn the mapping between x and y if we have large-scale aligned data
– Speech sounds and their phonemes
– Transcribed utterances and dialogue states
– Belief and action-value function
– State and action
– Action and utterance
• Of course, it is not so simple, but successful works of deep learning have solved mapping problems that were hard to solve with existing frameworks…
Ways to apply deep learning
Tasks of spoken dialogue systems
[Figure: pipeline of a spoken dialogue system. SLU maps "I'd like to take Kintetsu-line from Ikoma stat." to the frame $FROM=Ikoma, $TO_GO=???, $LINE=Kintetsu; DM, using the model and knowledge base, ranks actions (1 ask $TO_GO, 2 inform $NEXT, …); LG realizes "Where will you go?".]
• Speech recognition
• Spoken language understanding
• Dialogue state tracking
• Action decision
• Language generation
• End-to-end dialogue
Speech recognition with DNN in early stage
• Conventional ASR architecture
argmax_W P(W|X) = argmax_W P(X|W) P(W)
where W is a word sequence and X is the speech signal
[Figure: conventional GMM-HMM vs. DNN-HMM acoustic modeling over input frames x_1, x_2, x_3 for the phoneme sequence a-r-a, combined with the language model.]
Speech recognition with DNN in early stage
• Just replaces the generative probability of the GMM for a phoneme with a discriminative probability that classifies the phoneme from speech
– The rest of the architecture (HMM-based phoneme sequence selection and n-gram language modeling) stayed the same, but it reduced speech recognition errors by 20-30%
• The language model calculates the likelihood of a word sequence W
– P(W) = P(w_1) P(w_2|w_1) … P(w_n|w_1, …, w_{n−1})
• Conventional language models are N-gram models, which approximate the conditioning history by the previous N−1 words
– P(W) ≈ Π_i P(w_i | w_{i−N+1:i−1})
• The problem can also be solved with an RNN
– h^t = tanh(W_wh w^{t−1} + W_hh h^{t−1} + b_h)
– w^t = softmax(W_hyp h^t + b_yp)
• The RNN (and its successors) became a state-of-the-art LM
Language model and recurrent neural network
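An N-gram model of the kind above can be estimated by counting; this bigram (N=2) sketch and its two-sentence corpus are illustrative.

```python
from collections import Counter

# Bigram LM: P(W) ≈ prod_i P(w_i | w_{i-1}), estimated from counts.

corpus = [
    "<s> i want to take kintetsu </s>",
    "<s> i want to go to namba </s>",
]
bigrams = Counter()
unigrams = Counter()
for line in corpus:
    words = line.split()
    unigrams.update(words[:-1])           # histories (everything but </s>)
    bigrams.update(zip(words[:-1], words[1:]))

def p_bigram(w, prev):
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_sentence(words):
    prob = 1.0
    for prev, w in zip(words[:-1], words[1:]):
        prob *= p_bigram(w, prev)
    return prob

assert p_bigram("i", "<s>") == 1.0        # "i" always follows <s> here
assert 0.0 < p_sentence("<s> i want to take kintetsu </s>".split()) <= 1.0
```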
End-to-end speech recognition system
• Early DNN-based speech recognition systems just replaced some modules with deep neural networks, but recent research tries to train the whole model of argmax_W P(W|X)
– Including the pre-processing of ASR
• Ochiai et al., "Multichannel end-to-end speech recognition." In Proc. ICML, 2017.
(Figure adapted from the paper.)
• SLU
– Convert the user utterance into machine-readable expressions
• DM
– Decide the next system action from the SLU result and dialogue history
Problem of Spoken language understanding (SLU)
and dialogue state tracking (DST)
[Figure: SLU tags "I want to take Kintetsu from Ikoma" (Kintetsu = Line, Ikoma = From-Stat) into the SLU result Train_info{$FROM=Ikoma, $LINE=Kintetsu}; dialogue state tracking combines it with the history into Train_info{$FROM=Ikoma, $TO_GO=Namba, $LINE=Kintetsu}; action decision ranks 1 inform $NEXT_TRAIN, 2 ask $TO_GO, ….]
Simple classification for SLU
[Figure: a multichannel CNN combining a Chinese word model, a Chinese character model, and a (translated) English word model.]
A Multichannel Convolutional Neural Network for Cross-language Dialog State Tracking. Shi et al., In Proc. IEEE-SLT, 2016.
CNN for classification
• A CNN requires a fixed-size matrix as input
• Two techniques are used:
– Embedding: converts each word into a fixed-length meaning vector
– 0-padding: sets the height of the matrix to the maximum sentence length in the training data and fills with 0 when a sentence is shorter than the maximum
[Figure: the words of "He doesn't have … himself" are mapped by word embedding to fixed-length vectors x_1, x_2, x_3, …; 0-padding appends all-zero rows when the sentence is shorter than the maximum sentence length.]
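The two tricks can be sketched as follows; the toy embedding table and its dimension are illustrative.

```python
# Word embedding (a toy lookup table here) plus zero-padding to the
# maximum sentence length in the training data.

EMB_DIM = 3
embedding = {
    "this": [0.1, 0.0, 0.2],
    "room": [0.0, 0.3, 0.1],
    "is": [0.2, 0.1, 0.0],
    "good": [0.5, 0.4, 0.3],
}

def to_matrix(sentence, max_len):
    rows = [embedding.get(w, [0.0] * EMB_DIM) for w in sentence.split()[:max_len]]
    # 0-padding: fill with zero vectors up to the maximum sentence length.
    rows += [[0.0] * EMB_DIM] * (max_len - len(rows))
    return rows

train = ["this room is good", "this is good"]
max_len = max(len(s.split()) for s in train)
matrices = [to_matrix(s, max_len) for s in train]
assert all(len(m) == max_len for m in matrices)   # fixed height
assert matrices[1][-1] == [0.0, 0.0, 0.0]         # padded row
```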
• SLU: find a dialogue frame F given the words of the utterance W
– argmax_F P(F|W)
• There are many works on this tagging problem
– Slot filling, domain/intent classification, dialogue act classification, …
• DST: find a dialogue state S given the sequence F_{1:t}
– argmax_S P(S|F_{1:t})
• It can be solved as a joint problem
– P(S|W_{1:t}) = P(S|F_{1:t}) P(F_{1:t}|W_{1:t})
• Can both be solved with the same (sequential) model?
– RNN, long short-term memory (LSTM) networks
Problem definition of SLU and DST
• Word-Based Dialog State Tracking with Recurrent Neural Networks. Henderson et al., In Proc. SIGDIAL, pp. 292-300, 2014.
RNN-based dialogue state tracking
LSTM-based dialogue state tracking
[Figure: the user utterance "Is there any activity in Singapore?" is fed word by word through word embeddings into an LSTM, together with other features; the output is the dialogue state Task: activity{Area: Singapore, Price range: -, …}.]
Dialogue State Tracking using Long Short Term Memory Neural Networks. Yoshino et al., In Proc. IWSDS, 2016.
• RNN: outputs the dialogue state given the sequence of words (utterances)
– h^t = tanh(W_Xh X^t + W_hh h^{t−1} + b_h)
• X^t is the sequence of words at time t
• The dialogue history is propagated as the hidden layer h^{t−1}
– y_p^t = softmax(W_hyp h^t + b_yp)
• Outputs the dialogue state y_p^t that has the highest probability
• Belief update
– b^t(s_j) ≈ P(o^t|s_j^t) Σ_{s_i} P(s_j^t | s_i^{t−1}) b^{t−1}(s_i)
Relation between belief update and RNN-
based dialogue state tracker
[Annotation: the corresponding terms in the RNN and in the belief update are the observation, the state transition, and the belief.]
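The belief update above can be computed directly; the two-state transition and observation distributions are illustrative.

```python
# b_t(s_j) ∝ P(o_t | s_j) * sum_i P(s_j | s_i) * b_{t-1}(s_i)

def belief_update(b_prev, transition, obs_likelihood):
    n = len(b_prev)
    b = [obs_likelihood[j] * sum(transition[i][j] * b_prev[i] for i in range(n))
         for j in range(n)]
    z = sum(b)                     # normalize so the belief stays a distribution
    return [bj / z for bj in b]

b_prev = [0.5, 0.5]
transition = [[0.9, 0.1],          # P(s_j | s_i), row i -> column j
              [0.2, 0.8]]
obs_likelihood = [0.1, 0.7]        # P(o_t | s_j) for the current observation
b = belief_update(b_prev, transition, obs_likelihood)
assert abs(sum(b) - 1.0) < 1e-9
assert b[1] > b[0]                 # the observation favours s_2
```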
• Decide action 𝒂𝒕 given belief 𝒃𝒕
• There are two ways:
– Find the best policy (policy gradient)
– Find the best Q-function (Q-network)
Problem of action decision
[Figure: the user says "I'd like to go to Namba with Kintetsu"; the belief b^t is a distribution over candidate dialogue states s_1…s_4 (Train_info frames over $FROM, $TO_GO, $LINE); the action decision a^t ranks 1 inform $NEXT_TRAIN, 2 ask $TO_GO, ….]
• Maximize the expected future reward (value function)
– V^{π*}(s^t) = max_π V^π(s^t)
  = max_a Σ_{s^{t+1}} P(s^{t+1}|s^t, a) [ R(s^t, a, s^{t+1}) + γ V^{π*}(s^{t+1}) ]
– Q^{π*}(s, a) = Σ_{s^{t+1}} P(s^{t+1}|s^t, a^t) [ R(s^t, a^t, s^{t+1}) + γ max_{a^{t+1}} Q^{π*}(s^{t+1}, a^{t+1}) ]
• Policy gradient directly calculates the score V^{π*}(s^t)
• Q-network calculates Q^{π*}(s, a) for each action, following the sampling manner of Q-learning
What are good actions?
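The Bellman optimality equation for V above can be exercised with value iteration on a tiny MDP; the two-state environment, rewards, and γ below are illustrative.

```python
# V*(s) = max_a sum_{s'} P(s'|s,a) [ R(s,a,s') + γ V*(s') ]

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
# P[s][a] = list of (s_next, prob); R[s][a][s_next] = reward
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: {0: 0.0}, 1: {1: 1.0}},
     1: {0: {0: 0.0}, 1: {1: 1.0}}}

V = {s: 0.0 for s in STATES}
for _ in range(200):  # iterate the Bellman backup to (near) convergence
    V = {s: max(sum(p * (R[s][a][s2] + GAMMA * V[s2]) for s2, p in P[s][a])
                for a in ACTIONS)
         for s in STATES}

# Moving to / staying in state 1 earns reward 1 each step: V ≈ 1/(1-γ) = 10.
assert abs(V[1] - 10.0) < 1e-3
```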
• In policy gradient, the policy is not deterministic: π(s, a) is the probability of selecting action a given state s
– J(θ) = V^{π_θ}(s)
– ∇_θ J(θ) = E_{π_θ}[ ∇_θ log π_θ(s, a) Q^{π_θ}(s, a) ]
– If we learn the parameters θ of the policy π_θ to maximize J(θ) on existing data, we get the policy that maximizes the reward for the existing data sequences
– We can use deep learning for this parameter learning
Policy gradient
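A sketch of the gradient ∇_θ J(θ) for a softmax policy over two actions. The features (one parameter per action), the fixed Q-values, and the use of an exact expectation instead of sampling are illustrative simplifications.

```python
import math

def policy(theta):
    z = sum(math.exp(t) for t in theta)
    return [math.exp(t) / z for t in theta]

def grad_log_pi(theta, a):
    # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi(k)
    pi = policy(theta)
    return [(1.0 if k == a else 0.0) - pi[k] for k in range(len(theta))]

theta = [0.0, 0.0]
Q = [0.0, 1.0]      # action 1 has the higher action value
lr = 0.5
for _ in range(50):
    pi = policy(theta)
    grad = [0.0, 0.0]
    for a in range(2):          # exact expectation over actions (no sampling)
        g = grad_log_pi(theta, a)
        for k in range(2):
            grad[k] += pi[a] * g[k] * Q[a]
    theta = [t + lr * gk for t, gk in zip(theta, grad)]

assert policy(theta)[1] > 0.9   # the policy learned to prefer action 1
```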
• Premise: if we can calculate Q(s, a) for every pair, we should decide the action a according to max_a Q(s, a)
• Problem: we do not know P(s^{t+1}|s^t, a^t), which is needed to calculate Q(s, a)
• Solution: approximate P(s^{t+1}|s^t, a^t) by sampling
– Q(s^t, a^t) ←update← (1 − α) Q(s^t, a^t) + α [ R(s^t, a^t, s^{t+1}) + γ max_{a^{t+1}} Q(s^{t+1}, a^{t+1}) ]
– Back-propagate the reward from the end of the sample
– We will try to build a dialogue manager using this algorithm in the next class
Q-learning
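The sampling-based update above, in tabular form; the three-state chain environment and all constants are illustrative.

```python
import random

# Tabular Q-learning on a chain: reach (and stay at) state 2 for reward 1.
random.seed(0)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
N_STATES = 3
ACTIONS = [0, 1]                          # 0 = stay, 1 = move right

def env_step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else s
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                      # episodes
    s = 0
    for _ in range(10):                   # steps per episode
        if random.random() < EPSILON:     # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s_next, r = env_step(s, a)
        # Q(s,a) <- (1-alpha) Q(s,a) + alpha [ R + gamma max_a' Q(s',a') ]
        Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s_next]))
        s = s_next

assert Q[0][1] > Q[0][0]   # moving toward the rewarding state is preferred
```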
• Idea: if we can regress the Q-value at each sampling step, learning will be efficient
– L(θ_i) = E_{s,a,r,s'}[ (y − Q(s, a; θ_i))² ]
– y = R(s^t, a^t, s^{t+1}) + γ max_{a^{t+1}} Q(s^{t+1}, a^{t+1})
• Regression? Training the mapping between y and Q(s, a)?
Deep learning! (Deep Q-network; DQN)
Q-network
[Figure: a Q-network takes the belief b (e.g. s=s_1: 0.0, s=s_2: 0.9, …, s=s_n: 0.0) through tanh layers and outputs Q(b, a).]
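The regression view can be sketched with a linear Q-function trained by gradient descent on (y − Q)²; the one-hot encoding and the single repeated transition are illustrative.

```python
# Regress Q(s,a;θ) toward the target y = R + γ max_a' Q(s',a').

GAMMA = 0.9
N_SA = 4                  # 2 states x 2 actions, one weight per (s,a) pair
theta = [0.0] * N_SA

def q(sa_index):
    return theta[sa_index]  # linear Q with one-hot (s,a) features

# One repeated transition: (s=0, a=1) -> s'=1, reward 1.0
for _ in range(100):
    y = 1.0 + GAMMA * max(q(2), q(3))   # target from s' = 1
    err = y - q(1)                      # Q(s=0, a=1)
    theta[1] += 0.1 * err               # SGD step on (y - Q)^2 (grad = -err)

assert abs(q(1) - 1.0) < 1e-3  # Q(0,1) converges to R + γ·0 = 1.0
```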
Joint learning of dialogue state tracking and
action decision with deep learning
• LSTM-based DST results are used as the input of the DQN
– LSTM: calculates b^t from the observations
– DQN: optimizes Q(b^t, a^{t+1}; θ) by regression
– Fine-tuning of the entire network
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning. Zhao et al., In Proc. SIGDIAL, 2016.
• Generate a sentence given a system action
• Conventional: statistical template-based approaches
– They are still weak for out-of-vocabulary (and out-of-template) inputs…
Language generation
[Figure: the language generation module maps the action Ask $TO_GO to the utterance "Where will you go?".]
Heat/Ac the pot/T with oil/F.
Add/Ac the celery/F, green onion/F and garlic/F.
Stir-fry/Ac for about one minute/D.
Generation of procedural texts from flow graphs. IPSJ Journal, 2015.
• An RNN can generate a sequence of words by using each generated word as its next input (decoder model)
– h^t = tanh(W_wh w^{t−1} + W_hh h^{t−1} + b_h)
– w^t = softmax(W_hyp h^t + b_yp)
RNN-language model for generation
[Figure: the RNN language model unrolled: inputs x_1=He, x_2=doesn't, x_3=have, …, x_7=in; outputs y_p^1=doesn't, y_p^2=have, …, y_p^8=himself. The dimension of the output layer is the vocabulary size.]
Decoder with condition:
semantically conditioned LSTM
[Figure: the semantically conditioned LSTM cell. The recurrent hidden layer and word embeddings model how to say it (the LM); a 1-hot dialogue act with slot values controls what to say (the contents).]
Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems. Wen et al., In Proc. EMNLP, 2015.
semantically conditioned LSTM
decoding results for dialogue systems
• Sequence-to-sequence
modeling of generation
– Change the response according
to the dialogue context
Context-aware NLG
Dusek et al., A context-aware natural language generator for dialogue systems. In Proc. SIGDIAL, 2016.
QA style
[Figure: the same pipeline with the dialogue management skipped (example-based): SLU parses "I'd like to take Kintetsu-line from Ikoma stat." into $FROM=Ikoma, $TO_GO=???, $LINE=Kintetsu, and LG directly answers "Where will you go?" using the model and knowledge base.]
Encoder-decoder
• We can combine the ideas of encoder and decoder to build a neural network that remembers the input sentence and outputs a sentence in response
– The RNN may remember not only the words but also their order
[Figure: the encoder reads "where is the rest room EOS"; the decoder generates "rest room is next entrance ." token by token, feeding each generated word back as the next input until EOS.]
Vinyals, Oriol, and Quoc Le. A neural conversational model. arXiv preprint arXiv:1506.05869, 2015.
Attention model
• Gives an attentional point to be decoded
[Figure: while decoding "next to the entrance . EOS" from the encoded input "where is the rest room EOS", a softmax over attention scores decides how strongly each input state passes through.]
• a_{t,j} = attention_score(h_e^j, h_d^t)
Attention model
[Figure: at each decoding step of "next to the entrance . EOS", attention_score and softmax weight the encoder states of "where is the rest room EOS".]
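One common concrete choice for attention_score is a dot product, followed by a softmax over encoder positions and a weighted sum (the context vector); all vectors below are illustrative.

```python
import math

# a_{t,j} = attention_score(h_e^j, h_d^t): dot-product scores, softmax,
# then a weighted sum of the encoder states (the context vector).

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(encoder_states, decoder_state):
    scores = [dot(h_e, decoder_state) for h_e in encoder_states]  # a_{t,j}
    weights = softmax(scores)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(len(decoder_state))]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [1.0, 0.0]   # most similar to the first encoder state
weights, context = attend(encoder_states, decoder_state)
assert abs(sum(weights) - 1.0) < 1e-9
assert weights[0] == max(weights)
```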
• One typical end-to-end modeling of dialogue
ChitChat
Serban et al., Building end-to-end dialogue systems using generative hierarchical neural network models. In Proc. AAAI, 2016.
• The task is goal-oriented (API calls), but the system is made to work with an end-to-end memory network
– The problem was solved perfectly in DSTC6 Track 1
Memory Network for dialogue systems
• https://sites.google.com/site/deeplearningdialogue/
– Deep Learning for Dialogue Systems tutorial
by Yun-Nung Chen, Asli Celikyilmaz, and Dilek Hakkani-Tur
If you are interested in more recent works…
• Deep learning has been applied to several tasks of spoken dialogue systems in recent years
– Speech recognition, understanding, state tracking, action decision,
generation, and end-to-end modeling
• How do we apply deep learning to dialogue (in development)?
– Clarify your problem to set up the input and output
– Find similar systems among existing works in recent (2-3 years) conferences (SIGDIAL, NAACL, ACL, EMNLP, COLING, AAAI, IJCAI, …)
– Can you prepare enough paired data of inputs and outputs?
• How do we apply deep learning to dialogue (in research)?
– Find a mapping problem that requires a high-dimensional or non-linear mapping
• Consider the properties of your input and output, even if they cover only part of your problem
– Can you prepare enough paired data of inputs and outputs?
Summary
• 1/30
• Dialogue management with Q-learning
– We will see the detailed algorithm and implementation of the
dialogue manager with Q-learning
– We will discuss user simulators
Next contents