• Article •
Emotional dialogue generation via multiple classifier based
on generative adversarial network
Wei CHEN, Xinmiao CHEN, Xiao SUN*
School of Computer and Information, Hefei University of Technology, Hefei, 230601 China
* Corresponding author, [email protected]
Abstract Background Human-machine dialogue generation is an essential research direction of natural
language processing. Generating high-quality, diverse, fluent, and emotional conversations is a challenging
task. With the continuous advancement of research in artificial intelligence and deep learning, the
end-to-end neural network model offers an extensible framework for conversation generation, which
makes it possible for machines to understand semantics and automatically generate responses. However,
the neural network model also brings new questions and challenges: the basic conversational model
framework tends to produce universal, meaningless, and relatively "safe" answers. Methods Based on
generative adversarial network (GAN), a new emotional dialogue generation framework EMC-GAN was
proposed to complete the task of emotional dialogue generation. The emotional dialogue generation model
includes a generative model and three discriminative models. The generator is based on the basic sequence
to sequence (Seq2Seq) dialogue generation model, and the discriminative model of the overall framework
consists of a basic discriminative model, an emotion discriminative model, and a fluency discriminative
model. The basic discriminative model distinguishes generated fake sentences and real sentences in the
training corpus. The emotion discriminative model evaluates whether the emotion of the generated dialogue
text is the same as the specified emotion, and directs the generative model to generate dialogue text of the
specified emotion category. The fluency discriminative model scores the fluency of the generated dialogue
and guides the generator to produce more fluent sentences. Results The experimental results show that our
model is superior to other similar models in emotion accuracy, fluency, and consistency. Conclusions
The proposed EMC-GAN model can generate consistent, smooth, and fluent dialogue text with a specified
emotion, and it outperforms the compared models in emotion accuracy, consistency, and fluency.
Keywords Emotional dialogue generation; Sequence to sequence model; Emotion classification;
Generative adversarial networks; Multiple classifier
1 Introduction
The technology related to human-machine dialogue has been used in many products, such as intelligent voice
assistants and online customer service, so people have higher requirements and expectations
for the level of human-machine dialogue. There are many related lines of research on dialogue systems,
such as dialogue systems with commonsense knowledge[1], dialogue systems with audio context[2],
latent-variable task-oriented dialogue systems[3], and dialogue systems combining text and image[4]. For more
related research, please refer to the survey of Ma et al.[5] At present, dialogue generation mainly includes three
methods: rule-based systems[6], information retrieval systems[7], and generation-based systems. This work is
based on the last approach. For the machine translation task, a lot of research has been done on the
Seq2Seq model, such as implementations using the Recurrent Neural Network (RNN)[8], Long Short-Term
Memory (LSTM)[9], and the attention mechanism[10]. Vinyals et al. first applied the Seq2Seq structure to
the dialogue generation task[11].
The basic Seq2Seq model has a primary drawback when used to generate texts: its performance is usually
evaluated at the sentence level, whereas the model is trained token by token. Many studies have tried to
solve this problem using generative adversarial networks (GAN)[12], which have had great success in computer
vision. Yu et al. proposed a more suitable framework for generating conversations based on GAN, called
SeqGAN[13]. By modeling the data generator as a stochastic policy in reinforcement learning[14-15], SeqGAN
bypasses the generator differentiation problem by directly performing policy gradient updates. Li et al. proposed
using adversarial training based on reinforcement learning for open-domain dialogue generation[16]. Cui et al.
proposed the Dual Adversarial Learning (DAL) framework, which improves both the diversity and the overall quality
of the generated responses[17].
People with emotional intelligence can know and show their emotions, identify others' emotions, control
emotions, and use feelings and emotions to spur adaptive behavior[18]. It is equally essential to give
machines emotion in human-machine dialogue. Ghosh et al. proposed an LSTM-based model for
generating text with emotion[19]. Rashkin et al. introduced a new dataset with emotional annotations that
can be used to provide retrieval candidates or to fine-tune the dialogue model, leading to more empathetic
responses[20]. The Emotional Chatting Machine, proposed by Zhou and Zhang, can generate dialogue texts that are
appropriate in both content and emotion[21]. Wang et al. proposed the SentiGAN framework, which enables
the model to generate diverse, high-quality texts with specific sentiment labels through penalty mechanisms
[22]. In previous work, we presented a model built on LSTM and changed the training corpus to handle the
emotion factors in dialogue texts: the input is adapted to the original sentence and the sentence with the
emotion label, and the sentence with the emotion label is used as the output[23].
In this work, we introduce a new emotional dialogue generation model based on a generative adversarial network
(EMC-GAN) to implement the emotional dialogue generation task. Since it is difficult for the basic dialogue
generation model to capture the emotional features of the dialogue text, we solve this problem by
decomposing the emotional dialogue task. Several different models are trained to generate dialogue texts
with different emotions, and each model focuses on creating one kind of emotional dialogue text. This
excludes the interference and influence of other emotions when generating the specified emotional dialogue
text and thereby improves the accuracy of generating dialogue texts with a specified emotion. The proposed framework
includes a generative model and multiple discriminative models. The generative model is constructed
on the basic Seq2Seq dialogue generation model[24]; the discriminative part of the framework is
composed of the basic discriminative model, the emotion discriminative model, and the fluency discriminative
model. They respectively distinguish the generated text from the original text, guide the generated
dialogue text toward a specific emotion, and guide it to be more fluent. In our results, the EMC-GAN model
produces coherent, smooth, and fluent dialogue texts with the specified emotion, and does better in
emotional accuracy, coherence, and fluency than the compared models.
Figure 1 EMC-GAN overall framework.
2 Methods
The proposed emotional dialogue generation framework EMC-GAN includes one generative model and
three discriminative models. The generative model $G_e(Y|X; \theta_g^e)$ is a dialogue generation model based on
the basic Seq2Seq model. It generates coherent and fluent target sentences with the specified emotion
category e for the input source sentences. The discriminative part of EMC-GAN includes a basic
discriminative model $D_e(X, Y; \theta_d^e)$, an emotion discriminative model $D_e^{emotion}(X, Y; \theta_d^e)$, and a fluency
discriminative model $D_e^{fluency}(X, Y; \theta_d^e)$. The basic discriminative model is the same as the discriminator
of a general dialogue generation model based on a generative adversarial network: it distinguishes generated fake
sentences from real sentences in the training corpus and guides the generator to generate dialogue texts
that are closer to human dialogue texts. The emotion discriminative model is a binary text-sequence classifier
that determines whether the emotion of the generated dialogue text is the same as the specified emotion e.
It outputs the confidence probability that the emotion category of the input dialogue text is the specified
emotion category. The fluency discriminative model scores the fluency of input dialogue texts and guides the
generator to create more fluent dialogue texts.
2.1 Generative model oriented emotional dialogue text
The goal of the generative model $G_e(Y|X; \theta_g^e)$ is to generate the target sequence with emotion e for the
input source sequence, where $\theta_g^e$ denotes the parameters of the generative model. At each time-step t, the
generative model produces a sentence sequence $S_t = Y_{1:t} = \{y^{<1>}, y^{<2>}, \ldots, y^{<t>}\}$, where $y^{<t>}$ is a word token
in the existing vocabulary. Eq1 and Eq2 show the penalty-based loss function[22]:
$$V_{G_e}(S_t, y^{<t+1>}) = \lambda_1 \cdot V_e(S_t, y^{<t+1>}) + \lambda_2 \cdot V_e^{emotion}(S_t, y^{<t+1>}) + \lambda_3 \cdot V_e^{fluency}(S_t, y^{<t+1>}) \tag{1}$$
where $V_{G_e}(S_t, y^{<t+1>})$ is the total penalty score for the sentence sequence calculated by the multiple
discriminative models, $V_e(S_t, y^{<t+1>})$ is the penalty score calculated by the basic discriminative model
$D_e(X, Y; \theta_d^e)$, $V_e^{emotion}(S_t, y^{<t+1>})$ is the penalty score for the emotion of the generated dialogue texts
calculated by the emotion discriminative model $D_e^{emotion}(X, Y; \theta_d^e)$, and $V_e^{fluency}(S_t, y^{<t+1>})$ is the penalty
score for the fluency of the sentences calculated by the fluency discriminative model $D_e^{fluency}(X, Y; \theta_d^e)$,
with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. In this paper, we set $\lambda_1 = 0.5$ and $\lambda_2 = \lambda_3 = 0.25$. The loss function $L(y^{<t+1>})$
based on the penalty score is:
$$L(y^{<t+1>}) = G_e(y^{<t+1>} | S_t; \theta_g^e) \cdot V_{G_e}(S_t, y^{<t+1>}) \tag{2}$$
where $G_e(y^{<t+1>} | S_t; \theta_g^e)$ is the probability of choosing the (t+1)-th word given the sequence $S_t$. The
generative model is trained to minimize the penalty-based loss:
$$J_{G_e}(\theta_g^e) = E_{Y \sim P_{G_e}}[L(y)] = \sum_{t=0}^{T_y - 1} L(y^{<t+1>}) \tag{3}$$
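As an illustration of Eq1 to Eq3, the following minimal Python sketch combines three per-step penalty scores into the total penalty and accumulates the loss over a sequence. The function names and the toy numbers in the usage note are assumptions made for illustration only; the sketch does not reproduce the authors' TensorFlow implementation.

```python
def total_penalty(v_basic, v_emotion, v_fluency, lambdas=(0.5, 0.25, 0.25)):
    """Eq. (1): weighted sum of the three discriminator penalty scores."""
    l1, l2, l3 = lambdas
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("the weights must sum to 1")
    return l1 * v_basic + l2 * v_emotion + l3 * v_fluency


def step_loss(token_prob, penalty):
    """Eq. (2): probability of the chosen token times its total penalty."""
    return token_prob * penalty


def sequence_loss(token_probs, penalties):
    """Eq. (3): sum of the per-step losses over the generated sequence."""
    if len(token_probs) != len(penalties):
        raise ValueError("one penalty is needed per generated token")
    return sum(step_loss(p, v) for p, v in zip(token_probs, penalties))
```

For example, `total_penalty(0.4, 0.8, 0.2)` weights the basic penalty twice as heavily as the emotion and fluency penalties, matching the 2:1:1 ratio of the three coefficients.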
The penalty is calculated as follows:
$$V_e(S_t, y^{<t+1>}) = \begin{cases} \frac{1}{N} \sum_{n=1}^{N} \left(1 - D_e(X, Y_{t+1}; \theta_d^e)\right), & t < T_y \\ 1 - D_e(X, Y_{t+1}; \theta_d^e), & t = T_y \end{cases} \tag{4}$$
where $T_y$ is the maximum length of the target sequence, and N is the number of Monte Carlo search samples. The
penalty score of a partial sequence is calculated as the average over multiple samples to reduce the error
caused by sampling.
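The Monte Carlo averaging in Eq4 can be sketched as follows. The `discriminator` and `rollout` callables are hypothetical stand-ins (a real rollout would sample the remaining tokens from the generator), and the default of 16 samples is an assumption, since the value of N is not stated here.

```python
def basic_penalty(source, partial, max_len, discriminator, rollout, n_samples=16):
    """Eq. (4): penalty 1 - D for a complete or partial response.

    discriminator(source, response) -> probability that the response is real.
    rollout(source, partial, max_len) -> one sampled completion of `partial`.
    Both callables are hypothetical placeholders for trained models.
    """
    if len(partial) >= max_len:
        # Complete sequence: score it directly.
        return 1.0 - discriminator(source, partial)
    # Partial sequence: complete it n_samples times and average the penalties.
    total = 0.0
    for _ in range(n_samples):
        completed = rollout(source, partial, max_len)
        total += 1.0 - discriminator(source, completed)
    return total / n_samples
```

Averaging over several completions smooths out the randomness of any single rollout, which is the purpose of the first branch of Eq4.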
2.1.1 The baseline model of dialogue generation
The basic Seq2Seq model is used as the benchmark model. This model uses an encoder-decoder network
with deep LSTM units as the underlying architecture for dialogue text generation. Adding an
attention mechanism to the model captures more of the correspondence between the source sentence
and the target sentence. The overall structure of this model is shown in Figure 2. The dialogue generation model
in this paper has the same network structure as the Seq2Seq baseline model.
Both the encoder and the decoder are implemented with LSTM. At each time-step, a token of the
source sequence is input to the encoder network; after the input is completed, the encoder generates a
semantic vector C from the inputs of all time-steps, and this semantic vector C represents the input source sequence. The
decoder's first state is provided by the generated semantic vector C. The decoder decodes the semantic
vector and outputs a token 𝑦<𝑡> at each time-step. We get the output sequence 𝑌(𝑦<1>, 𝑦<2>, . . . , 𝑦<𝑇𝑦>)
for the input sequence 𝑋(𝑥<1>, 𝑥<2>, . . . , 𝑥<𝑇𝑥>) by the dialogue generation framework based on the
encoder-decoder network.
Figure 2 Encoder-Decoder with attention mechanism.
2.1.2 Attention mechanism of generative model
The core work of the attention mechanism is to compute the context vector. The context vector $context^{<t>}$
guides the decoder during decoding: it tells the decoder which word tokens of the input sentence should
be focused on. Figure 3 shows the calculation details of the context vector in the attention
mechanism. To retain the hidden states of the source sequence, the Seq2Seq model with attention
uses a bidirectional LSTM network[25] to extract the hidden state of the source sequence at each time-step.
In the bidirectional LSTM network, as Eq5 shows:
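$$a^{<t>} = \left(\overrightarrow{a}^{<t>}, \overleftarrow{a}^{<t>}\right) \tag{5}$$
$a^{<t>}$ contains two parts, $\overrightarrow{a}^{<t>}$ and $\overleftarrow{a}^{<t>}$, representing the forward sequence feature and the backward
sequence feature, respectively. The two hidden state vectors are calculated as follows:
$$\overrightarrow{a}^{<t>} = BiLSTM_{Pre}\left(\overrightarrow{a}^{<t-1>}, x^{<t>}\right) \tag{6}$$
$$\overleftarrow{a}^{<t>} = BiLSTM_{Post}\left(\overleftarrow{a}^{<t+1>}, x^{<t>}\right) \tag{7}$$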
From the intermediate state vector $a^{<t>}$, we can obtain not only the historical information of the sequence
before time-step t but also the future information of the sequence. To distinguish the intermediate state
vector of the decoder from that of the encoder, we use $S^{<t>}$ to denote the decoder state at time-step t. The
intermediate state vector of the decoder at the previous time-step, $S^{<t-1>}$, is concatenated with the intermediate state
vector of the encoder at each time-step:
$$e^{<t,t'>} = \left(S^{<t-1>}, a^{<t'>}\right) \tag{8}$$
where the vector $e^{<t,t'>}$ represents the concatenation of the intermediate state vector of the decoder at t-1
and the intermediate state vector of the encoder at t'. As Eq9 shows, the attention weight $\alpha^{<t,t'>}$ represents the degree
to which the decoder at t focuses on the intermediate state vector $a^{<t'>}$ of the encoder at t'. As Eq10 shows, the
context vector $context^{<t>}$ is obtained by multiplying each attention weight $\alpha^{<t,t'>}$ by its corresponding
hidden state vector $a^{<t'>}$ and then summing the products from time-step 1 to $T_x$.
Figure 3 Context vector in attention mechanism.
$$\alpha^{<t,t'>} = \frac{\exp\left(e^{<t,t'>}\right)}{\sum_{\tau=1}^{T_x} \exp\left(e^{<t,\tau>}\right)} \tag{9}$$
$$context^{<t>} = \sum_{t'=1}^{T_x} \alpha^{<t,t'>} a^{<t'>} \tag{10}$$
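The softmax in Eq9 and the weighted sum in Eq10 can be illustrated with a small pure-Python sketch. It assumes that each concatenated vector from Eq8 has already been reduced to a scalar score, for instance by a small scoring layer; that layer is an assumption and is not described in the text. All names are hypothetical.

```python
import math


def attention_weights(scores):
    """Eq. (9): softmax over the scalar alignment scores for one decoder step."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(scores, encoder_states):
    """Eq. (10): attention-weighted sum of the encoder hidden states."""
    weights = attention_weights(scores)
    dim = len(encoder_states[0])
    context = [0.0] * dim
    for w, a in zip(weights, encoder_states):
        for k in range(dim):
            context[k] += w * a[k]
    return weights, context
```

With equal scores every encoder state receives the same weight, so the context vector is simply the mean of the encoder states; a larger score shifts the context toward the corresponding source position.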
2.2 Discriminative model
The deep discriminative models implemented by convolutional neural network (CNN)[26] and recursive
convolutional neural network (RCNN)[27] perform well in complex sequence classification tasks. We use
CNN as the core structure of the discriminative model in this work. In addition, the highway network[28] is
added to the discriminative model to improve the training speed. The emotion discriminative model and
fluency discriminative model are both pre-trained models and do not participate in the adversarial training
process of the model.
2.2.1 Basic discriminative model
The text classification model based on CNN, proposed by Zhang and LeCun[29], is used as the main structure
of the basic discriminative model $D_e(X, Y; \theta_d^e)$, which is employed to distinguish generated fake sentences
from real sentences in the dataset. The loss function of the basic discriminative model is:
$$J_{D_e}(\theta_d^e) = -\left[y \cdot \log(p) + (1 - y) \cdot \log(1 - p)\right] \tag{11}$$
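Eq11 is the standard binary cross-entropy. A minimal sketch for a single example follows, assuming y is the label (1 for a real sentence, 0 for a generated one) and p is the discriminator's predicted probability that the sentence is real; the clipping constant `eps` is an added assumption to avoid taking the logarithm of zero.

```python
import math


def discriminator_loss(y, p, eps=1e-12):
    """Eq. (11): binary cross-entropy for one example."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

The loss is small when the prediction agrees with the label and grows as the prediction moves toward the wrong class.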
The adversarial training process of the proposed emotional dialogue generation model EMC-GAN is in
Table 1.
Table 1 The adversarial training of EMC-GAN
Algorithm 1 Adversarial training of model
Input: $G_e(Y|X; \theta_g^e)$; $D_e(X, Y; \theta_d^e)$, $D_e^{emotion}(X, Y; \theta_d^e)$, $D_e^{fluency}(X, Y; \theta_d^e)$;
       Real dialogue text (the emotion category of target sentence Y is e): R{X,Y}
Output: Trained dialogue generator: $G_e(Y|X; \theta_g^e)$
1: Initialize $G_e(Y|X; \theta_g^e)$ and $D_e(X, Y; \theta_d^e)$ with random weights;
2: Pre-train $G_e(Y|X; \theta_g^e)$ using MLE on training data R{X,Y};
3: Generate fake dialogue texts F{X,Y} using $G_e(Y|X; \theta_g^e)$
4: Pre-train $D_e(X, Y; \theta_d^e)$ using {R,F}
5: while model has not converged do
6:     for each generative step do
7:         Generate fake dialogue text (F) using $G_e(Y|X; \theta_g^e)$
8:         Calculate penalty $V_{G_e}$ by Eq1
9:         Update $G_e(Y|X; \theta_g^e)$ by Eq3
10:    end
11:    for each discriminative step do
12:        Generate fake dialogue text (F) using $G_e(Y|X; \theta_g^e)$
13:        Update $D_e(X, Y; \theta_d^e)$ using {R,F} by Eq11
14:    end
15: end
16: return $G_e(Y|X; \theta_g^e)$
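The control flow of Algorithm 1 can be sketched as a short Python skeleton. The `generator` and `discriminator` objects and their methods (`pretrain`, `generate`, `update`, `train`) are hypothetical interfaces, not an existing library API. A fixed number of rounds stands in for the convergence test, and the default step counts of 5 and 10 assume that the training iterations reported in the experimental setup refer to these inner loops.

```python
def adversarial_training(generator, discriminator, real_pairs, total_penalty,
                         rounds=1, g_steps=5, d_steps=10):
    """Skeleton of Algorithm 1 with hypothetical model interfaces.

    total_penalty(x, y) is assumed to return the Eq. (1) penalty for one pair.
    """
    # Steps 2-4: pre-train the generator, then the basic discriminator.
    generator.pretrain(real_pairs)
    fake_pairs = generator.generate(real_pairs)
    discriminator.train(real_pairs, fake_pairs)

    for _ in range(rounds):  # stands in for "while model has not converged"
        for _ in range(g_steps):
            fake_pairs = generator.generate(real_pairs)
            penalties = [total_penalty(x, y) for x, y in fake_pairs]
            generator.update(fake_pairs, penalties)  # minimise Eq. (3)
        for _ in range(d_steps):
            fake_pairs = generator.generate(real_pairs)
            discriminator.train(real_pairs, fake_pairs)  # minimise Eq. (11)
    return generator
```

Only the basic discriminator is updated inside the loop; the emotion and fluency discriminators are pre-trained and enter only through `total_penalty`.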
2.2.2 Emotion discriminative model
To generate the dialogue response with a specified emotion, the emotion discriminative model
$D_e^{emotion}(X, Y; \theta_d^e)$ guides the generative model to generate the target sentence with the specified emotion for
the input source sentence. The emotion discriminative model determines whether the emotion
category of the input dialogue text is the specified emotion category. The training process of the emotion
discriminative model is shown in Table 2. For the emotion discriminative model $D_e^{emotion}(X, Y; \theta_d^e)$, the real
dialogue text R consists of dialogues with emotion category e (for example, $R = \{Dialogue_{e_3}\}$ when $e = e_3$), and
the fake dialogue text F consists of dialogues with the other emotion categories,
$F = \{Dialogue_{e_1}, Dialogue_{e_2}, \ldots, Dialogue_{e_6}\}$. The emotion discriminator is used to
discriminate between the real dialogue text R and the fake dialogue text F, and it gives the confidence probability that
the input dialogue text is real dialogue text. This model is similar to the basic discriminative model and is
trained in advance. Its emotion accuracy on the different categories is about 70% to 85%, and the experimental results
show that this is enough to guide the generative model to generate sentences with emotion.
Table 2 The training process of the emotion discriminative model
Algorithm 2 Emotion discriminative model training
Input: $D_e^{emotion}(X, Y; \theta_d^e)$;
       Real dialogue text (R) with emotion category e: R{$Dialogue_e$},
       Fake dialogue text (F) with other emotion categories: F{$Dialogue_{e_1}$, $Dialogue_{e_2}$, ...};
Output: Trained emotion discriminator: $D_e^{emotion}(X, Y; \theta_d^e)$;
1: Initialize $D_e^{emotion}(X, Y; \theta_d^e)$ with random weights;
2: while model has not converged do
3:     for each emotion discriminative step do
4:         Update $D_e^{emotion}(X, Y; \theta_d^e)$ using {R,F} by Eq11
5:     end
6: end
7: return $D_e^{emotion}(X, Y; \theta_d^e)$
2.2.3 Fluency discriminative model
The sentence fluency evaluation algorithm in this work is based on the sentence fluency evaluation method
proposed by Liu[30]. The process of sentence fluency evaluation is shown in Table 3. The algorithm
employs the N-gram statistical language model[31] to evaluate the fluency of sentences: it uses the
transition probabilities of all triple tuples (trigrams) to measure the fluency of the whole sentence. First, we count all
binary tuples (bigrams) in the dialogue text of the dataset and their numbers of occurrences, and save the results in the
dictionary n_gram2_count, with the binary tuple as the key and its number of occurrences as the value.
We count all the triple tuples and their numbers of occurrences in the same way
and save the results in n_gram3_count. The transition probabilities of all triple tuples are calculated
from the dictionaries n_gram2_count and n_gram3_count, as shown:
$$p(x_i, x_j, x_k) = p(x_k | x_i, x_j) = \frac{count(x_i, x_j, x_k)}{count(x_i, x_j)} \tag{12}$$
where $x_i$, $x_j$, and $x_k$ are adjacent words in the sentence. The calculated result is saved in
n_gram3_prob. Finally, the triple tuples are sorted in descending order of their transition probability,
and the sorted probabilities are saved to the list sorted_n_gram3_prob. In general, sentences with
higher n-gram transition probabilities are more fluent. Two transition probability thresholds, reward_prob and
penalty_prob, are used to decide whether the currently generated sentence is fluent. In sorted_n_gram3_prob, the
first 40% of the triple tuples are regarded as smoother and the last 20% as more awkward;
reward_prob is the minimum transition probability among the first 40% of the tuples, and penalty_prob is the
maximum transition probability among the last 20% of the tuples.
When evaluating the fluency of a sentence $X = \{x_1, x_2, \ldots, x_m\}$, the fluency score is initialized to
fluency(X) = 0, and then all the triple tuples of the sentence are traversed. If the length of the sentence is less than 3, the
algorithm directly sets the fluency score of this short sentence to 0, because we do not expect the model to
use such short sentences as responses to the input source sentence. If the transition probability of the
current triple tuple is higher than reward_prob, the triple tuple is relatively smooth, and the
ratio of its transition probability to reward_prob is added to the current fluency score fluency(X). If the transition
probability of the current triple tuple is less than penalty_prob, the triple tuple is relatively
awkward, and the ratio of its transition probability to penalty_prob is subtracted from the current fluency score
fluency(X). If the binary tuple corresponding to the triple tuple does not exist, the triple tuple is
assigned a tiny transition probability (0.02) when calculating fluency. Finally, the accumulated fluency score
fluency(X) is divided by the number of triple tuples in the sentence to give the final result.
Table 3 The fluency score of dialogue by fluency discriminative model
Algorithm 3 Calculate fluency score
Input: corpus, sentence to be evaluated;
Output: the fluency score of the input sentence;
1: Count the number of Bi-grams in the dialogue corpus.
2: Bi-gram count dict n_gram2_count = {"[x_i, x_j]": the number of [x_i, x_j] in corpus}
3: Count the number of Tri-grams in the dialogue corpus.
4: Tri-gram count dict n_gram3_count = {"[x_i, x_j, x_k]": the number of [x_i, x_j, x_k] in corpus}
5: Calculate the Tri-gram transition probability by Eq12
6: Tri-gram transition probability dict n_gram3_prob = {"[x_i, x_j, x_k]": p(x_i, x_j, x_k)}
7: Sort n_gram3_prob by transition probability in descending order, and save the sorted probabilities to the list
   sorted_n_gram3_prob
8: The size of the sorted probability list is: size = len(sorted_n_gram3_prob)
9: reward_prob = sorted_n_gram3_prob[int(size * 0.4)]
10: penalty_prob = sorted_n_gram3_prob[int(size * (1 − 0.2))]
11: p(x_i | Tri-gram) = p(x_i, x_{i+1}, x_{i+2})
12: fluency(X) = fluency({x_1, x_2, …, x_m}) = 0
13: if m < 3 then
14:     return fluency(X);
15: for i = 1 to m − 2 do
16:     if p(x_i | Tri-gram) ≥ reward_prob then
17:         fluency(X) += p(x_i | Tri-gram) / reward_prob
18:     else if p(x_i | Tri-gram) ≤ penalty_prob then
19:         fluency(X) −= p(x_i | Tri-gram) / penalty_prob
20: end
21: fluency(X) = fluency(X) / (m − 2)
22: return fluency(X);
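Algorithm 3 can be sketched in Python as follows, assuming sentences are already tokenised into lists of words. The function names, the dictionary layout, the clamping of the threshold indices for very small corpora, and the treatment of a tri-gram whose bi-gram exists but which itself was never seen (probability 0) are assumptions of this sketch and are not specified in the text.

```python
from collections import Counter


def build_fluency_model(corpus, reward_frac=0.4, penalty_frac=0.2, unseen_prob=0.02):
    """Count bi-/tri-grams in a tokenised corpus and derive the two thresholds."""
    bigrams, trigrams = Counter(), Counter()
    for sent in corpus:
        bigrams.update(zip(sent, sent[1:]))
        trigrams.update(zip(sent, sent[1:], sent[2:]))
    if not trigrams:
        raise ValueError("corpus contains no tri-grams")
    # Eq. (12): p(x_k | x_i, x_j) = count(x_i, x_j, x_k) / count(x_i, x_j)
    ranked = sorted((c / bigrams[t[:2]] for t, c in trigrams.items()), reverse=True)
    size = len(ranked)
    return {
        "bigrams": bigrams,
        "trigrams": trigrams,
        "unseen_prob": unseen_prob,
        # Clamp the indices so that tiny corpora cannot index past the end.
        "reward_prob": ranked[min(int(size * reward_frac), size - 1)],
        "penalty_prob": ranked[min(int(size * (1 - penalty_frac)), size - 1)],
    }


def trigram_prob(tri, model):
    """Transition probability of one tri-gram; 0.02 if its bi-gram is unseen."""
    if model["bigrams"][tri[:2]] == 0:  # Counter returns 0 for missing keys
        return model["unseen_prob"]
    return model["trigrams"][tri] / model["bigrams"][tri[:2]]


def fluency(sentence, model):
    """Average reward/penalty over all tri-grams of the sentence."""
    m = len(sentence)
    if m < 3:
        return 0.0
    score = 0.0
    for tri in zip(sentence, sentence[1:], sentence[2:]):
        p = trigram_prob(tri, model)
        if p >= model["reward_prob"]:
            score += p / model["reward_prob"]
        elif p <= model["penalty_prob"]:
            score -= p / model["penalty_prob"]
    return score / (m - 2)
```

A sentence made up entirely of unseen bi-grams receives the same small negative contribution from every tri-gram, so its score is a constant that depends only on penalty_prob.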
3 Experiments
In the process of emotional dialogue generation, the generative model generates target sentences
according to the input source sentences and a specified emotion category. The resulting sentence sequence
should be consistent, fluent, and have the specified emotion category.
3.2 Datasets
The dialogue dataset is a series of dialogue pairs {X, Y} with emotion category labels, where
$X = \{e_x, x^{<1>}, x^{<2>}, \ldots, x^{<T_x>}\}$ is the source sentence sequence, $Y = \{e_y, y^{<1>}, y^{<2>}, \ldots, y^{<T_y>}\}$ is the target
sentence sequence (dialogue response), and $e_x$ and $e_y$ are the emotion category labels of the source sentence and
the target sentence, respectively. The generative model is intended to produce a target sentence with a
specified emotion for input source sentences with any emotion. To construct the corresponding dataset for
each emotion-specific generative model, we divide the dataset into multiple sub-datasets according to the
emotion of the target sentence. In each sub-dataset, all the target sentence sequences have the same emotion
label $e_y$.
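The split into sub-datasets can be sketched as a simple grouping by the target emotion label. The tuple layout `(e_x, x, e_y, y)` and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def split_by_target_emotion(pairs):
    """Group dialogue pairs by the emotion label of the target sentence.

    pairs: iterable of (source_emotion, source_tokens, target_emotion, target_tokens).
    Returns a dict mapping each target emotion to its sub-dataset.
    """
    subsets = defaultdict(list)
    for e_x, x, e_y, y in pairs:
        subsets[e_y].append((e_x, x, e_y, y))
    return dict(subsets)
```

Each resulting sub-dataset keeps source sentences of any emotion, while all its target sentences share one emotion label.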
NLPCC Weibo (NLPW): it is built from conversations extracted from Weibo comments and has
1119200 dialogue pairs with six emotion category labels (anger, disgust, happiness, like, sadness, and
other), similar to our previous work[23].
Xiaohuangji (XHJ): this dataset has 454130 dialogue pairs. However, this dialogue corpus has no
corresponding emotion labels. We use the open-source natural language processing tool HanLP[32] to train an
emotion classification model, a naive Bayes classifier, with the NLPW dataset as training data. The
emotion classification model classifies each sentence into one of the six emotions of the NLPW
dataset.
The quantity distribution of two datasets NLPW and XHJ in different emotion categories is shown in
Figure 4. Table 4 shows the emotion distribution of different emotion sub-datasets.
Figure 4 Dataset emotion distribution.
Table 4 Sub-dataset emotion distribution
                                        Target Emotion
Dataset  Source Emotion  Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Other           0.419   0.271   0.290    0.301    0.291   0.276
         Like            0.187   0.381   0.182    0.180    0.159   0.276
         Sadness         0.095   0.083   0.212    0.098    0.111   0.099
         Disgust         0.149   0.110   0.145    0.259    0.199   0.120
         Anger           0.065   0.040   0.064    0.080    0.147   0.057
         Happiness       0.085   0.115   0.108    0.082    0.094   0.171
XHJ      Other           0.450   0.393   0.377    0.388    0.374   0.382
         Like            0.132   0.227   0.119    0.124    0.109   0.144
         Sadness         0.087   0.079   0.172    0.080    0.072   0.106
         Disgust         0.173   0.155   0.170    0.246    0.183   0.160
         Anger           0.102   0.091   0.109    0.110    0.212   0.101
         Happiness       0.056   0.053   0.053    0.052    0.049   0.108
3.4 Experimental setup
The basic Seq2Seq dialogue generation model is a benchmark model for experimental comparison.
Besides, the hybrid neural network-based emotional dialogue generation model EHMCG[23] and the
emotional dialogue generation model EM-SeqGAN[33] are compared with the proposed EMC-GAN model.
This work mainly analyzes and evaluates the generated dialogue text of different models from three
indicators of emotion accuracy, coherence, and fluency. Tensorflow[34] is used to build our model. For the
penalty score calculated by the discriminator coefficient 𝜆 (in Eq1), the ratio between 𝜆1, 𝜆2, and 𝜆3 set at
2:1:1, and they constrain the weights of three evaluation dimensions in guiding the training of the
generative model. The training iterations of the generative model and discriminative model are 5 and 10,
respectively.
3.5 Emotion accuracy
After generating dialogue text, we annotate the emotion category of the generated dialogue text with
the emotion classifier of HanLP. If the emotion category of the generated sentence is the same as that of the
target sentence, the generated sentence conforms to the expectation; emotion accuracy is the proportion of
generated sentences that do so.
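The emotion accuracy described above amounts to the fraction of generated responses whose predicted emotion matches the specified one. In the sketch below, `classify` is a hypothetical stand-in for the trained emotion classifier; it is not the HanLP API.

```python
def emotion_accuracy(responses, target_emotion, classify):
    """Fraction of generated responses classified as the target emotion.

    classify(response) -> emotion label; a placeholder for the trained classifier.
    """
    if not responses:
        raise ValueError("no responses to evaluate")
    hits = sum(1 for r in responses if classify(r) == target_emotion)
    return hits / len(responses)
```

Because the same classifier labels both the training data and the generated text, its own errors affect the reported accuracy.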
Table 5 shows the emotion accuracy of the dialogue text generated by the different dialogue generation models.
Compared with the other models, the proposed EMC-GAN model has the highest emotion accuracy for every
emotion in each dataset. For the NLPW dataset, the EMC-GAN model has high emotion accuracy for the
emotions "Other", "Like", "Sadness", and "Disgust", ranging from 0.588 to 0.740, while the emotion accuracy for the
emotions "Anger" and "Happiness" is only 0.392 and 0.236, respectively. For the XHJ dataset, the EMC-GAN model has
higher emotion accuracy for every emotion, ranging from 0.701 to 0.870. Compared with
the other emotion categories, the emotion "Other" has the highest emotion accuracy on both datasets. One
reason may be that the emotion "Other" has the largest amount of training data. In addition, because the emotion
"Other" represents any other emotions, the emotion classification model may tend to judge the emotion category of
an input sentence as "Other".
Table 5 The emotion accuracy of generated dialogue text
                                   Emotion Accuracy
Dataset  Model      Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Seq2Seq    0.286   0.121   0.089    0.128    0.212   0.191
         EHMCG      0.421   0.354   0.374    0.289    0.211   0.195
         EM-SeqGAN  0.572   0.487   0.548    0.376    0.295   0.201
         EMC-GAN    0.740   0.687   0.723    0.588    0.392   0.236
XHJ      Seq2Seq    0.293   0.176   0.064    0.227    0.177   0.094
         EHMCG      0.563   0.379   0.374    0.458    0.385   0.295
         EM-SeqGAN  0.647   0.563   0.487    0.567    0.426   0.375
         EMC-GAN    0.870   0.794   0.761    0.732    0.765   0.701
3.6 Coherence evaluation
The coherence of the dialogue text, that is, whether it answers the question or fits the context of the source sentence, is
one of the essential criteria for estimating the performance of a dialogue generation model. At present, we do not
have a good enough model to evaluate the coherence of the generated dialogue text automatically. Therefore,
we use manual evaluation to assess the coherence of the generated
dialogue text. The evaluation options and corresponding evaluation scores are shown in Table 6. The
evaluation scores range from 1 to 5, and a higher evaluation score means better coherence of the dialogue
text.
The coherence evaluation scores of the dialogue text generated by the different dialogue generation models are shown
in Table 7. The proposed EMC-GAN model has higher coherence evaluation scores than the other models on all emotion
categories of both datasets. The EMC-GAN model gets higher coherence evaluation
scores on the XHJ dataset than on the NLPW dataset in four of the six emotion categories; "Like" and "Anger" are the
exceptions. The generated dialogue texts of the "Other", "Like", and
"Sadness" emotion categories obtain the higher coherence evaluation scores in both datasets. Notably, on the XHJ dataset, the
coherence evaluation scores of the EMC-GAN model are 3.407 and 3.180 for "Other" and "Sadness",
respectively, which indicates that the generated dialogue sentences have fairly good coherence.
Table 6 The evaluation options and scores of human evaluation
Option  very good  good  normal  bad  very bad
Score   5          4     3       2    1
Table 7 The coherence evaluation score of generated dialogue
                               Coherence Evaluation Score
Dataset  Model      Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Seq2Seq    1.306   1.067   1.451    1.085    1.181   1.051
         EHMCG      1.403   1.192   1.387    1.236    1.335   1.096
         EM-SeqGAN  1.732   1.563   1.734    1.522    1.403   1.225
         EMC-GAN    2.277   2.875   2.115    1.881    1.972   1.245
XHJ      Seq2Seq    1.127   1.361   1.229    1.111    1.147   1.263
         EHMCG      1.256   1.452   1.248    1.324    1.223   1.371
         EM-SeqGAN  1.820   1.726   1.339    1.514    1.330   1.207
         EMC-GAN    3.407   2.542   3.180    1.931    1.561   2.255
3.7 Fluency evaluation
Similar to coherence, the fluency of the generated dialogue text is also an essential factor in evaluating the
performance of a dialogue generation model, and it largely reflects the model's text
production capability. The fluency discriminative model evaluates the fluency of the generated dialogue texts
in this process, using the fluency evaluation method shown in Algorithm 3. Besides, to improve the
reliability of the fluency evaluation, we also adopt manual evaluation.
The fluency score is obtained from the fluency discriminative model. As Table 8 shows, the proposed
EMC-GAN model has a higher fluency score than the other models in each dataset and emotion category. For
the NLPW dataset, the EMC-GAN model gets higher fluency scores on the emotions "Sadness" and
"Anger", and the fluency score on the emotion "Other" is lower. The generated dialogue texts of the XHJ
dataset get higher fluency scores than those of the NLPW dataset, and the fluency of the sentences improves
markedly. For the XHJ dataset, the fluency score on the emotion "Other" is also relatively low, and the other emotions
have higher fluency scores.
Table 9 shows the fluency scores of the different models as evaluated by humans. This fluency score is
similar to the coherence evaluation score. Compared with the other models, the proposed EMC-GAN model
gets a higher fluency score in each dataset and emotion category. The EMC-GAN model has a higher fluency
score in each emotion of the XHJ dataset than in the NLPW dataset. For the XHJ dataset, our EMC-GAN
model gets the highest fluency evaluation score of 4.480 on the emotion "Sadness", and the fluency
evaluation scores on the emotions "Disgust" and "Anger" are lower, at 2.835 and 2.960, respectively.
Table 8 The fluency score of generated dialogue evaluated by Algorithm 3
                                     Fluency Score
Dataset  Model      Other    Like     Sadness  Disgust  Anger    Happiness
NLPW     Seq2Seq    -0.189   -0.193   -0.192   -0.193   -0.192   -0.194
         EHMCG      0.405    0.727    1.133    0.526    1.238    0.943
         EM-SeqGAN  0.586    0.875    1.875    1.034    1.337    1.237
         EMC-GAN    0.854    1.710    2.617    1.512    2.498    1.706
XHJ      Seq2Seq    -0.124   -0.123   -0.124   -0.125   -0.124   -0.123
         EHMCG      3.356    5.528    7.526    6.776    5.882    4.581
         EM-SeqGAN  4.652    7.238    8.832    7.774    6.237    6.735
         EMC-GAN    6.300    9.239    11.33    10.33    10.26    11.26
Table 9 The fluency score of generated dialogue evaluated by humans
                                Fluency Evaluation Score
Dataset  Model      Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Seq2Seq    1.193   1.267   1.251    1.202    1.114   1.351
         EHMCG      1.253   1.196   1.325    1.269    1.156   1.183
         EM-SeqGAN  2.013   2.162   2.067    1.849    1.758   1.657
         EMC-GAN    2.424   2.875   2.365    2.476    2.272   1.984
XHJ      Seq2Seq    1.287   1.111   1.567    1.311    1.265   1.187
         EHMCG      1.257   1.284   1.732    1.455    1.325   1.173
         EM-SeqGAN  2.471   2.648   2.741    1.846    1.775   2.659
         EMC-GAN    3.717   3.760   4.480    2.835    2.960   3.326
3.8 Results analysis
The emotion accuracy error of this experiment mainly comes from the mistake of datasets and emotion
classification model. As the accuracy of the emotion classification model for the NLPW dataset is 64%,
there are some errors in the emotion category of the dialogue text. Since the dialogue text in the XHJ
dataset does not have a corresponding emotion category label, the HanLP tool uses to train an emotion
classifier on the NLPW dataset as a training corpus and add emotion labels to the dialogue text in the XHJ
dataset. Besides, to reduce the influence of emotion classification model on the emotion evaluation for the
dialogue texts, we use the same emotion classification model to classify the generated dialogue texts.
From the experimental results, our model performs better on the XHJ dataset than on the NLPW dataset.
Analyzing the dialogue text in the NLPW dataset, we found that the sentence quality is generally poor.
Compared with general dialogue text, the sentences in the NLPW dataset are shorter and more awkward,
which may be caused by the irregularity of comment sentences on Weibo. In the NLPW dataset, a large
number of binary and triple tuples occur only once, which is not conducive to the construction of the
language model and makes it difficult for the dialogue generation model to converge. Analyzing the
dialogue text generated by the baseline model shows that the majority of the generated sentences are
sequences of random words, and their fluency is particularly poor. As can be seen in Table 8, the dialogue
texts generated by the Seq2Seq model get lower fluency scores, and its fluency scores on different
emotions are roughly the same. This is because, in the process of fluency evaluation, we assign the
transition probability of a triple tuple a small value (0.02) when the binary tuple corresponding to the
triple tuple does not exist. Because the fluency of the generated dialogue texts is so poor, the transition
probabilities of most triple tuples are close to this minimum value.
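The effect of the minimum value can be illustrated with a short sketch. It is not Algorithm 3 itself; it only shows the transition probability of each triple tuple with the 0.02 floor described above. The names `count_tuples` and `transition_probs` and the toy corpus are assumptions made for illustration. The case where the binary tuple exists but the triple tuple does not is not specified in this section, so the sketch simply returns the raw ratio (zero) there.

```python
from collections import Counter
from typing import List, Sequence, Tuple

FLOOR = 0.02  # small value from the text, used when the binary tuple is unseen


def count_tuples(corpus: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count binary and triple tuples of adjacent tokens in a tokenized corpus."""
    pairs: Counter = Counter()
    triples: Counter = Counter()
    for sent in corpus:
        pairs.update(zip(sent, sent[1:]))
        triples.update(zip(sent, sent[1:], sent[2:]))
    return pairs, triples


def transition_probs(
    sentence: Sequence[str], pairs: Counter, triples: Counter
) -> List[float]:
    """Transition probability of every triple tuple in the sentence."""
    probs = []
    for w1, w2, w3 in zip(sentence, sentence[1:], sentence[2:]):
        pair_count = pairs[(w1, w2)]
        if pair_count == 0:
            probs.append(FLOOR)  # binary tuple never seen: use the floor
        else:
            probs.append(triples[(w1, w2, w3)] / pair_count)
    return probs


# Toy corpus (assumption, illustration only).
corpus = [["i", "love", "you"], ["i", "love", "cats"]]
pairs, triples = count_tuples(corpus)

fluent = transition_probs(["i", "love", "you"], pairs, triples)
random_words = transition_probs(["cats", "you", "i", "love"], pairs, triples)
print(fluent, random_words)
```

A sentence made of random words contains almost no binary tuple seen in the corpus, so every one of its transition probabilities collapses to the same floor value, which is consistent with the nearly constant Seq2Seq scores across emotions in Table 8.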
Some generated dialogue text samples are taken from the test set. Table 10 shows the dialogue text
generated by the Seq2Seq and EMC-GAN models; the emotion column in the table is the emotion category
that the generated dialogue text is required to express. It can be seen that the responses generated by
Seq2Seq are relatively weak in terms of emotional and content consistency, and they do not perform well
in multiple evaluation dimensions. Our model is better in emotion accuracy, coherence, and fluency.
Table 10 Cases of generated dialogue from Seq2Seq and EMC-GAN
Model | Post (translation) | Emotion | Response (translation)
Seq2Seq | 我命运多坎坷啊你都预见不到! (How rough my fate has been, you never saw it coming!) | Anger | 关我什么事? (What is it to me?)
Seq2Seq | 懒得理你。 (Talk to the hand.) | Disgust | 切~,我是傻。 (Cut ~, I am stupid.)
Seq2Seq | 你吹牛。 (You brag.) | Happiness | 就是啊、已经注意到了。 (Yeah, I've noticed.)
Seq2Seq | 我也要参加。 (I'll be there, too.) | Like | 谢谢您的关注!祝您周末愉快。 (Thank you for your attention! Have a nice weekend.)
Seq2Seq | 牙齿不痛了? (No pain in your tooth?) | Sadness | 我爬山好痛苦... (It was painful climbing the mountain...)
Seq2Seq | 来的话,如何找你? (How can I get to you if I come?) | Other | 星期六晚。 (Saturday night.)
EMC-GAN | 不是我吓你,你晚上真的会做噩梦,小狗会变成鬼找你。 (It’s not that I scare you, you really have nightmares at night, and the puppy will turn into a ghost to find you.) | Anger | 吓到我了! (Scared me!)
EMC-GAN | 青岛,一座来过就不想再来的城市! (Qingdao, a city that has been here but never wants to come again!) | Disgust | 你怎么这么说话呢?赤裸裸的羡慕嫉妒。 (Why are you talking like this? Naked envy and jealousy.)
EMC-GAN | 又是第一名。 (Number one again.) | Happiness | 感觉心情不错很好。 (I feel in a good mood.)
EMC-GAN | 我爱你。 (I love you.) | Like | 真的啊~,超级无敌帅。 (Really, super invincible handsome.)
EMC-GAN | 我不是故意的亲爱的。 (I didn't mean to, honey.) | Sadness | 太伤心了… (It's so sad...)
EMC-GAN | 你是机器人嘛?回复这么快。 (Are you a robot? Reply so quickly.) | Other | 我又不是,我怎么知道。 (I am not, how do I know.)
4 Conclusion
A new emotional dialogue generation framework, EMC-GAN, is proposed in this work. It uses multiple
classifiers to generate better dialogues in different feature dimensions. The generative model generates the
target sentences for the input source sentences. The basic discriminative model distinguishes generated fake
sentences from real sentences in the dataset. The emotion discriminative model evaluates whether the
emotion of the generated dialogue text is the same as the specified emotion. The fluency discriminative
model judges whether the input dialogue is fluent and gives a fluency score for the input dialogue text.
According to the experimental results, the EMC-GAN model can generate dialogue text with a specified
emotion. Compared with other models, the dialogue texts generated by EMC-GAN are more fluent.
However, the accuracy of the emotion classifier should be improved to obtain more realistic
dialogue text. In addition, other features of the sentence, such as novelty and variability, will be considered
to make the final dialogue text more fluent and natural.
References
1 Young T, Cambria E, Chaturvedi I, Zhou H, Biswas S, Huang M. Augmenting end-to-end dialogue systems with
commonsense knowledge. Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA,
2018: 4970-4977
2 Young T, Pandelea V, Poria S, Cambria E. Dialogue systems with audio context. Neurocomputing, 2020, 388: 102-109
DOI: 10.1016/j.neucom.2019.12.126
3 Xu H, Peng H, Xie H, Cambria E, Zhou L, Zheng W. End-to-End latent-variable task-oriented dialogue system with
exact log-likelihood optimization. World Wide Web, 2020, 23(3): 1989-2002
DOI: 10.1007/s11280-019-00688-8
4 Zhang Z, Liao L, Huang M, Zhu X, Chua T S. Neural multimodal belief tracker with adaptive attention for dialogue
systems. The World Wide Web Conference. San Francisco, CA, USA, 2019: 2401-2412
DOI: 10.1145/3308558.3313598
5 Ma Y, Nguyen K L, Xing F Z, Cambria E. A survey on empathetic dialogue systems. Information Fusion, 2020
6 Adrian P, Harold B. Rule Responder: Rule-Based Agents for the Semantic-Pragmatic Web. International Journal on
Artificial Intelligence Tools, 2011, 20: 1043-1081
DOI: 10.1142/S0218213011000528
7 Xu M, Li P, Yang H, Ren P, Ren Z, Chen Z, Ma J. A Neural Topical Expansion Framework for Unstructured Persona-
oriented Dialogue Generation. The 24th European Conference on Artificial Intelligence (ECAI). Santiago, Chile, 2020
8 Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase
representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference
on Empirical Methods in Natural Language Processing. Doha, Qatar, ACL, 2014: 1724-1734
DOI: 10.3115/v1/d14-1179
9 Sutskever I, Vinyals O, Le Q V. Sequence to Sequence Learning with Neural Networks. Advances in Neural Information
Processing Systems. Quebec, Canada, 2014: 3104-3112
10 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. 3rd International
Conference on Learning Representations. San Diego, CA, 2015
11 Vinyals O, Le Q. A neural conversational model. arXiv preprint arXiv:1506.05869, 2015
12 Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative
adversarial nets. Advances in Neural Information Processing Systems. Montreal, Quebec, 2014: 2672-2680
13 Yu L, Zhang W, Wang J, Yu Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In: Proceedings of
the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, California, AAAI, 2017: 2852-2858
14 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey. Journal of artificial intelligence research,
1996, 4: 237-285
15 Sutton R S, McAllester D A, Singh S P, Mansour Y. Policy gradient methods for reinforcement learning with function
approximation. Advances in Neural Information Processing Systems 12, NIPS Conference. Denver, Colorado, USA,
1999: 1057-1063
16 Li J, Monroe W, Shi T, Jean S, Ritter A, Jurafsky D. Adversarial learning for neural dialogue generation. In:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark,
ACL, 2017: 2157-2169
DOI: 10.18653/v1/d17-1230
17 Cui S, Lian R, Jiang D, Song Y, Bao S, Jiang Y. DAL: Dual Adversarial Learning for Dialogue Generation. arXiv
preprint arXiv:1906.09556, 2019
18 Salovey P, Mayer J D. Emotional intelligence. Imagination, Cognition and Personality, 1990, 9(3): 185-211
DOI: 10.2190/DUGG-P24E-52WK-6CDG
19 Ghosh S, Chollet M, Laksana E, Morency L, Scherer S. Affect-LM: A neural language model for customizable affective
text generation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics,
Vancouver, Canada. 2017: 634-642
DOI: 10.18653/v1/P17-1059
20 Rashkin H, Smith E M, Li M, Boureau Y L. Towards Empathetic Open-domain Conversation Models: A New
Benchmark and Dataset. In: Proceedings of the 57th Conference of the Association for Computational Linguistics.
Florence, Italy, 2019, 1: 5370-5381
DOI: 10.18653/v1/p19-1534
21 Zhou H, Huang M, Zhang T, Zhu X, Liu B. Emotional chatting machine: Emotional conversation generation with
internal and external memory. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New
Orleans, Louisiana, 2018: 730-739
22 Wang K, Wan X. SentiGAN: Generating Sentimental Texts via Mixture Adversarial Networks. IJCAI. Stockholm,
Sweden, 2018: 4446-4452
DOI: 10.24963/ijcai.2018/618
23 Sun X, Peng X, Ding S. Emotional human-machine conversation generation based on long short-term memory.
Cognitive Computation, 2018, 10(3): 389-397
DOI: 10.1007/s12559-017-9539-4
24 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint
arXiv:1409.0473, 2014
25 Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. 2013 IEEE international
conference on acoustics, speech and signal processing. IEEE, 2013: 6645-6649
DOI: 10.1109/ICASSP.2013.6638947
26 Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs.
Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems
2016, Barcelona, Spain. 2016: 2226-2234
27 Lai S, Xu L, Liu K, Zhao J. Recurrent convolutional neural networks for text classification. In: Proceedings of the
Twenty-Ninth AAAI Conference on Artificial Intelligence. Austin, Texas, AAAI, 2015: 2267-2273
28 Srivastava R K, Greff K, Schmidhuber J. Highway networks. arXiv preprint arXiv:1505.00387, 2015
29 Zhang X, LeCun Y. Text Understanding from Scratch. arXiv preprint arXiv:1502.01710, 2015
30 Liu D. Approaches to Chinese Word Analysis; Utterance Segmentation and Automatic Evaluation of Machine
Translation. Dissertation for the Master Degree. Beijing: Institute of Automation, Chinese Academy of Sciences. 2004
31 Doddington G. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In:
Proceedings of the second international conference on Human Language Technology Research. San Diego, California,
2002: 138-145
DOI: 10.5555/1289189.1289273
32 Zhang H P, Yu H K, Xiong D Y, Liu Q. HHMM-based Chinese lexical analyzer ICTCLAS. In: Proceedings of the
Second Workshop on Chinese Language Processing. Sapporo, Japan, ACL, 2003, 17: 184-187
DOI: 10.3115/1119250.1119280
33 Sun X, Chen X, Pei Z, Ren F. Emotional human machine conversation generation based on SeqGAN. First Asian
Conference on Affective Computing and Intelligent Interaction (ACII Asia). IEEE, Beijing, 2018: 1-6
DOI: 10.1109/ACIIAsia.2018.8470388
34 Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg
J, Monga R, Moore S, Murray D G, et al. Tensorflow: A system for large-scale machine learning. 12th USENIX Symposium on
Operating Systems Design and Implementation (OSDI 2016), Savannah, GA, USA, 2016: 265-283