Intelligent Chatbot on WeChat WeChat AI NLP Team 2017.02.08

Page 1: Intelligent Chatbot on WeChat

Intelligent Chatbot on WeChat

WeChat AI NLP Team

2017.02.08

Page 2: Intelligent Chatbot on WeChat

WeChat is the leading mobile social network in China. In 6 years, WeChat has gained…

• 846 million monthly active users

• 300 million WeChat Pay users

• 10 million Official Accounts

• 200 thousand developers

Data: Tencent Financial Reports

Page 3: Intelligent Chatbot on WeChat

WeChat is not just a mobile messaging app. It’s a new lifestyle, connecting people with people, services, devices and more.

WeChat Overview: The WeChat Lifestyle

Page 4: Intelligent Chatbot on WeChat

Chinese New Year 2017

• Red Packets (Jan 27 – Feb 01): 46 billion

• Emoji (Jan 27 – Feb 01): 16 billion

• Voice Call (Jan 27 – Jan 28): 2.1 billion minutes

Page 5: Intelligent Chatbot on WeChat

The new way for businesses to interact with their customers.

Powered by WeChat

Page 6: Intelligent Chatbot on WeChat

Messaging (Can be automated)

Account management

Service Accounts: China Merchants Bank case

China Merchants Bank

Over 10 million followers

Open an account

Pay bill/loan

Receive payment notifications

Receive CRM promotions

Powered by WeChat

Page 7: Intelligent Chatbot on WeChat

Messaging

Account management

Service Accounts: China Southern Airlines case

China Southern Airlines

Buy Tickets

Check-in

Choose seats

Flight status update

Frequent flyer services

Powered by WeChat

Page 8: Intelligent Chatbot on WeChat

Chatbot

Examples

• WeBank

• WeChat official account

• Tencent games

• Xian'er Mechanical Monk

Chatbot on WeChat

• A natural way to serve customers

• A powerful way for users to acquire services, information, knowledge, etc.

Page 9: Intelligent Chatbot on WeChat

Workflow of the WeChat Chatbot

Question → Question Parsing → Question Understanding Output → Rule Match / QnA / Chitter Chat Model → Answer Candidates → Answer Ranking → Answer

Supporting inputs: Context, Knowledge Base
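The workflow above can be sketched as a small pipeline in which several modules each propose an answer candidate and a ranking step picks one. This is a minimal illustration, not the team's implementation: the module bodies, the `Candidate` type, the toy knowledge base, and the fixed scores are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str     # proposed answer
    source: str   # module that proposed it
    score: float  # hypothetical confidence, assumed comparable across modules


def rule_match(question: str, context: List[str]) -> Optional[Candidate]:
    # Hypothetical hand-written rule for greetings.
    if question.strip().lower() in {"hi", "hello"}:
        return Candidate("Hello! How can I help?", "rule", 1.0)
    return None


# Toy stand-in for the knowledge base (assumed content).
KNOWLEDGE_BASE = {"what are your opening hours?": "We are open 9:00-18:00."}


def qna(question: str, context: List[str]) -> Optional[Candidate]:
    answer = KNOWLEDGE_BASE.get(question.strip().lower())
    return Candidate(answer, "qna", 0.9) if answer else None


def chat_model(question: str, context: List[str]) -> Optional[Candidate]:
    # Placeholder for a generative chat model: always offers a low-score fallback.
    return Candidate("Tell me more.", "chat", 0.1)


def respond(question: str, context: List[str]) -> str:
    # Collect candidates from every module, then rank by score.
    candidates = []
    for module in (rule_match, qna, chat_model):
        candidate = module(question, context)
        if candidate is not None:
            candidates.append(candidate)
    best = max(candidates, key=lambda c: c.score)
    return best.text
```

Because the chat model always returns a candidate, `respond` never comes back empty; a real ranker would presumably score candidates against the question and context rather than rely on fixed per-module scores.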

Page 10: Intelligent Chatbot on WeChat

Chatbot Architecture in Progress

Question → Question Parsing → Question Understanding Output → Rule Match / QnA / Document Content / Chitter Chat Model → Answer Candidates → Answer Ranking → Answer

Sentiment Analysis → Sentiment Analysis Output

Supporting inputs: Context, Personalization, Knowledge Graph

Under development

• Sentiment analysis

• Knowledge graph

• Doc-chat

• Personalization

• Expose the platform to public

Page 11: Intelligent Chatbot on WeChat

Example of Task Completion Chatbot

User inputs handled by the Dialog Manager (with the Domain Ontology): Intent = Book Flight, Date=tomorrow, To=Shanghai, From=Beijing

Slot Key | Value
Intent | Book Flight
Date | 09-20-2016
From | Beijing
To | Shanghai

After the input Intent = Weather:

Slot Key | Value
Intent | Weather
Date | 09-20-2016
Location | Shanghai

Key Technologies:

• Intent classification

• Slot filling

• Multi-initiative context management
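The slot tables in this example suggest a dialog state that is filled one slot at a time and carried over when the intent changes from Book Flight to Weather. The sketch below illustrates that idea only; the `ONTOLOGY` and `CARRY_OVER` tables and the `DialogState` class are hypothetical names, and the carry-over rule (flight destination becomes weather location) is an assumption read off the slide's example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical domain ontology: the slots each intent needs.
ONTOLOGY = {
    "Book Flight": ["Date", "From", "To"],
    "Weather": ["Date", "Location"],
}

# Assumed carry-over rules when the intent switches (source slot -> target slot).
CARRY_OVER = {
    ("Book Flight", "Weather"): {"Date": "Date", "To": "Location"},
}


@dataclass
class DialogState:
    intent: str = ""
    slots: Dict[str, str] = field(default_factory=dict)

    def update(self, intent: Optional[str] = None,
               slots: Optional[Dict[str, str]] = None) -> None:
        if intent and intent != self.intent:
            # Intent switch: keep only the slot values the new intent can reuse.
            mapping = CARRY_OVER.get((self.intent, intent), {})
            carried = {dst: self.slots[src]
                       for src, dst in mapping.items() if src in self.slots}
            self.intent, self.slots = intent, carried
        for key, value in (slots or {}).items():
            # Ignore slots the current intent does not define.
            if key in ONTOLOGY.get(self.intent, []):
                self.slots[key] = value

    def missing(self) -> List[str]:
        # Slots the dialog manager still has to ask for.
        return [s for s in ONTOLOGY.get(self.intent, []) if s not in self.slots]
```

A dialog manager built this way would ask for whatever `missing()` returns; after switching to Weather in the slide's scenario nothing is missing, so the bot can answer without asking again.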

Page 12: Intelligent Chatbot on WeChat

Conversational Chatbot

How can I be happy? Why am I so busy?

Page 13: Intelligent Chatbot on WeChat

Hard Problems for Conversational Chatbot

Question Understanding: • 干啥呐?(what are you doing?) • 干啥的?(what is your job?)

• 你哪里好?(why do you think you are good?) • 你在哪里? (where are you?)

• 你师父呢?(where is your master?) • 师父在忙 (master is busy) • 他在忙啥? (what is he doing?)

• 闻何法啊? (how do you practice Dharma?) • 破除我执 (being not obsessive) • 如何破除呢? (how?)

Knowledge Representation:

• Notarial certificates, executed in the mainland, and to be used in Hong Kong Special Administrative Region, shall be acknowledged by the Consular Department of the Ministry of Foreign Affairs of the People's Republic of China

• 转心 (transform the heart),就是心里要去拿起一个正确的东西,否则心在烦恼(affliction)中时是很难转动的。要不断培养自己的发心(bodhicitta-samutpada) ,让它越来越宽广,越来越清净,烦恼自然就越来越少。恨(hatred)也好,念(obsession)也好,都是妄想(delusion) ,消耗心力、迷障未来。

Answer Generation: avoid trivial and boring answers

• 忙呢 (busy now)

• 你忙 (take your time)

• 再见 (see you later)

• 狗狗很可爱 (dogs are cute) → 是很可爱 (yes, they are cute)

Page 14: Intelligent Chatbot on WeChat

Sentence Modeling by Recurrent Neural Network

[Figure: tokens x0 … xn pass through an Embedding Layer to vectors V0 … Vn, which feed hidden states h0 … hn; a second panel embeds tokens x0 … x8 into vectors V0 … V8]
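The figure's token → embedding → hidden-state chain can be illustrated with a tiny recurrent encoder. This is a minimal sketch assuming a plain Elman-style update, h_t = tanh(W v_t + U h_{t-1}); the slides do not say which cell type was used, and the `TinyRNN` class, its dimensions, and its random untrained weights are purely illustrative.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Small random weights; a trained model would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRNN:
    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        self.index = {token: i for i, token in enumerate(vocab)}
        self.embedding = make_matrix(len(vocab), embed_dim, rng)  # x_t -> V_t
        self.w_in = make_matrix(hidden_dim, embed_dim, rng)       # V_t -> h_t
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)     # h_{t-1} -> h_t
        self.hidden_dim = hidden_dim

    def encode(self, tokens):
        # The final hidden state serves as the sentence representation.
        h = [0.0] * self.hidden_dim
        for token in tokens:
            v = self.embedding[self.index[token]]
            h = [math.tanh(a + b)
                 for a, b in zip(matvec(self.w_in, v), matvec(self.w_rec, h))]
        return h
```

Unlike averaging word vectors, the recurrent update is order-sensitive, so 我爱你 and 你爱我 receive different encodings even though they contain the same characters.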

Page 15: Intelligent Chatbot on WeChat

Anaphora Resolution

Input: q: current query; C: context

Output: q': current query after anaphora resolution

H: replace pronouns in the current query with noun phrases in the context

About 5% of the total queries

Examples:

C1: 你是陈奕迅粉丝吗? (are you a fan of Eason Chan? )

C2: 更喜欢张学友 (I like Jacky Cheung more)

q : 为什么更喜欢他? (Why like him more?)

q': 为什么更喜欢张学友 (Why like Jacky Cheung more?)

q'= H(q,C)

C1 : 你住哪儿? (where do you live? )

C2 : 不二寺。 (Bu’er Temple )

q : 那在哪儿? (Where is it? )

q': 不二寺在哪儿? (Where is Bu’er Temple?)
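To make the mapping q' = H(q, C) concrete, here is a deliberately simple rule-based stand-in: it replaces the first pronoun in the query with the most recently mentioned entity in the context. The slides describe an RNN model for this step, so this heuristic is not their method; the `PRONOUNS` and `ENTITIES` lists are hypothetical lexicons chosen to cover the two examples above.

```python
# Hypothetical lexicons, just large enough for the two slide examples.
PRONOUNS = ("他", "她", "它", "那")
ENTITIES = ("陈奕迅", "张学友", "不二寺")


def resolve(query, context):
    """Rule-based stand-in for H(q, C): substitute the latest context entity."""
    antecedent = None
    # Scan context turns from most recent to oldest.
    for turn in reversed(context):
        mentions = [(turn.rfind(entity), entity)
                    for entity in ENTITIES if entity in turn]
        if mentions:
            # Within a turn, prefer the right-most (latest) mention.
            antecedent = max(mentions)[1]
            break
    if antecedent is None:
        return query
    # Replace the pronoun that appears earliest in the query, once.
    positions = [(query.find(p), p) for p in PRONOUNS if p in query]
    if not positions:
        return query
    pronoun = min(positions)[1]
    return query.replace(pronoun, antecedent, 1)
```

A heuristic like this depends entirely on the entity list, which is in line with the slides' observation that most errors come from entity tagging mistakes.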

Page 16: Intelligent Chatbot on WeChat

Model Building: Anaphora Resolution

Context: 陈奕迅 粉丝 更 喜欢 张学友

Query: 为什么 更 喜欢 他

max P(张学友 | 为什么更喜欢他)

“他” (him) → “张学友” (Jacky Cheung)

q' = 为什么更喜欢张学友

RNN for Anaphora Resolution

Example:

C1: 你是陈奕迅粉丝吗?

C2: 更喜欢张学友

q : 为什么更喜欢他?

q': 为什么更喜欢张学友

• 100K training data

• Accuracy: 90%

• The majority of errors are caused by entity tagging mistakes

A bad case:

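C1: 你认识贤三吗? (Do you know Xiansan?)

C2: 当然认识。 (Of course I do.)

q : 他是你什么人? (Who is he to you?)

q': 三是你什么人? (Who is San to you? The entity 贤三 was tagged only as 三.)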

Page 17: Intelligent Chatbot on WeChat

Query Complement

Input: q: current query; C: context

Output: q': current query after query complement

H: complete the current query with information in the context

About 15% of the total queries

Examples:

C1: 那你会发表情包吗? (can you send emojis? )

C2: 一般不发 (usually I don’t send emojis)

q :为什么? (Why?)

q': 为什么不发表情包 (Why not send emojis?)

q'= H(q,C)

C1 :讲个故事给我听 (tell me a story )

C2 :等我学会了给你讲哦 。 (I’ll tell you a story once I learn how to)

q :我等着 (I’m waiting)

q': 我等着听故事 (I’m waiting for the story)

Page 18: Intelligent Chatbot on WeChat

RNN for Query Complement

Training Sample:

C1:讲个故事给我听

C2:等我学会了给你讲哦 。

q :我等着

q':我等着听故事

• 100,000 training instances

• Accuracy: 70%

• Increased the engagement of Xian’er Mechanical Monk by 11%

[Figure: sequence-to-sequence RNN. Input x: 讲 个 故 事 给 我 听 _E_ 等 ... ; decoder input: 我 等 着 听 ... ; output y: 我 等 着 听 故 ...]
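The figure suggests that each training instance is a character-level sequence pair: the context turns joined with an `_E_` separator as input x, and the completed query as target y. The helper below sketches how such a pair might be assembled. Whether the current query q is also appended to x is not visible in the extracted figure, so including it here is an assumption, and `make_sample` is a hypothetical name.

```python
SEP = "_E_"  # turn separator token, as it appears in the figure


def make_sample(context, query, completed_query):
    """Build one character-level (x, y) pair for a sequence-to-sequence model."""
    x = []
    turns = list(context) + [query]  # assumption: the query follows the context
    for i, turn in enumerate(turns):
        if i > 0:
            x.append(SEP)
        x.extend(turn)  # iterating a string yields one token per character
    y = list(completed_query)
    return x, y
```

For the slide's training sample, x begins 讲 个 故 事 给 我 听 _E_ 等 and y is 我 等 着 听 故 事.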

Page 19: Intelligent Chatbot on WeChat

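Selected results

你去问问师父喜欢你吗 (Go ask the master whether he likes you)

不会的,问你师父去 (No way, go ask your master)

什么时候问必要 (When is it necessary to ask?)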

Query Complement Results in Real Dialogs

Page 20: Intelligent Chatbot on WeChat

Sentence Similarity Computation

The unsupervised word-embedding approach is not good enough

Sentence 0 | Sentence 1 | Similarity based on Word Embedding | Similar Enough?
你是谁 (who are you) | 我是谁 (who am I) | 0.93 | No
我爱你 (I love you) | 你爱我 (you love me) | 0.89 | No
吃饭了吗 (have you had lunch?) | 吃饭了 (just had lunch) | 0.84 | No
你干嘛的 (what is your job?) | 你干嘛呢 (what are you doing?) | 0.93 | No
有轮回吗? (is reincarnation true?) | 轮回有结束吗 (will the cycle of life end?) | 0.73 | No
会不会轮回 (will reincarnation happen?) | 会不会轮回结束 (will reincarnation end?) | 0.84 | No
随喜您 (you did it well) | 您做的很好 (you did it well) | 0.20 | Yes
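One way to see why pairs such as 你是谁 / 我是谁 score so high is to average word vectors and compare the averages with cosine similarity. The sketch below does that with made-up three-dimensional vectors; the `EMBEDDINGS` values are invented for illustration and are not the embeddings behind the numbers in the table, and the exact unsupervised method used in the slides is not specified.

```python
import math

# Invented toy embeddings; 你 and 我 are placed close together on purpose,
# as pronouns tend to be in real embedding spaces.
EMBEDDINGS = {
    "你": [1.0, 0.0, 0.0],
    "我": [0.8, 0.6, 0.0],
    "是": [0.0, 1.0, 0.0],
    "谁": [0.0, 0.0, 1.0],
    "爱": [0.0, 1.0, 1.0],
}


def sentence_vector(sentence):
    # Average the vectors of all characters; word order is discarded.
    vectors = [EMBEDDINGS[ch] for ch in sentence]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(s0, s1):
    return cosine(sentence_vector(s0), sentence_vector(s1))
```

With these toy vectors, 你是谁 and 我是谁 come out highly similar because they share two of three tokens, and 我爱你 and 你爱我 come out as effectively identical because averaging ignores order, even though both pairs differ in meaning.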

Page 21: Intelligent Chatbot on WeChat

Supervised Learning for Sentence Similarity

Feature Embedding Model

• Sentence features: unigrams, bi-grams

• Comparison features: word pairs (one word from each of the two sentences), edit operations

Example: 什么 含义 vs. 什么 意思 gives match-什么-什么 and replace-含义-意思

RNN for sentence similarity (inputs: Question 0, Question 1)
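The edit-operation comparison features can be approximated with Python's standard `difflib`, which aligns two token sequences and reports equal, replace, delete, and insert spans. This is a sketch of the idea rather than the team's feature extractor: the `match-` and `replace-` names follow the slide's example, while the `delete-` and `insert-` names and the `comparison_features` function are assumptions.

```python
from difflib import SequenceMatcher


def comparison_features(tokens_a, tokens_b):
    """Turn an alignment of two token lists into edit-operation feature strings."""
    matcher = SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    features = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        left = "".join(tokens_a[i1:i2])
        right = "".join(tokens_b[j1:j2])
        if op == "equal":
            # One match feature per aligned token.
            features.extend(f"match-{tok}-{tok}" for tok in tokens_a[i1:i2])
        elif op == "replace":
            features.append(f"replace-{left}-{right}")
        elif op == "delete":
            features.append(f"delete-{left}")   # assumed feature name
        else:  # "insert"
            features.append(f"insert-{right}")  # assumed feature name
    return features
```

Calling `comparison_features(["什么", "含义"], ["什么", "意思"])` returns `["match-什么-什么", "replace-含义-意思"]`, the two features shown on the slide.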

Page 22: Intelligent Chatbot on WeChat

Sentence Similarity Results

Models | Accuracy
Unsupervised word embedding | 0.63
RNN + cosine similarity | 0.65
RNN + MLP | 0.6878
CNN + MLP | 0.6968
RNN + Tensor | 0.728
Feature Embedding | 0.75

220,000 sentence pairs for training

20,000 for testing

Page 23: Intelligent Chatbot on WeChat

Response Generation

• A generative model is used when the knowledge base returns no match

• Neural network based methods are used for response generation

• Motivated by neural network based methods for translation

Translation: One sentence in Language A → One sentence in Language B

Response Generation: Input Sentence → Response Sentence

Page 24: Intelligent Chatbot on WeChat

Neural Network based Methods for Response Generation

• Motivated by neural network based methods for translation

Training data:

Objective:
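The formulas for these two items are not present in the text. As a point of reference only, the standard sequence-to-sequence setup from neural machine translation, which the slide says motivated the approach, is usually written as below. The notation (post x, response y, parameters θ, N training pairs) is an assumption and may differ from the original slide.

```latex
\mathcal{D} = \left\{ \left(x^{(i)}, y^{(i)}\right) \right\}_{i=1}^{N},
\qquad
\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \sum_{t=1}^{T_i}
  \log p\!\left( y^{(i)}_{t} \,\middle|\, y^{(i)}_{<t},\, x^{(i)};\, \theta \right)
```

Here each training pair is a post and one observed response, and the objective maximizes the log-likelihood of every response token given the post and the preceding response tokens.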

Page 25: Intelligent Chatbot on WeChat

NN based Methods for Response Generation

Page 26: Intelligent Chatbot on WeChat

Dialogue vs. Translation

•A dialogue corpus is different from a translation corpus

•The response diversity problem exists in dialogue corpora

Page 27: Intelligent Chatbot on WeChat

Diversity

For the question "What's up?":

• A typical speaker: I am OK. / I am fine.

• Mr. Sheldon: Bazinga!

• Mr. Trump: You are fired!

Page 28: Intelligent Chatbot on WeChat

•In our experimental corpus, more than 60 different responses exist to the post “You are so silly”, for example:

•No!

•You are!

•Why?

•Don’t say that

•Many different responses usually correspond to the same post

Diversity

Page 29: Intelligent Chatbot on WeChat

Issues on Response Diversity

• Only the safe, generic answers are returned, i.e. the ones with the highest probability

• Good but low-probability answers cannot be recognized

Responses with high probabilities

Good responses that occur infrequently

Bad responses

Page 30: Intelligent Chatbot on WeChat

Response-Style Modeling

Page 31: Intelligent Chatbot on WeChat

A diverter is developed to generate the mechanism distribution of an input post

Encoder-Diverter-Decoder
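One way to read "mechanism distribution" is as a latent-variable mixture: the diverter assigns the input post a distribution over a fixed set of response mechanisms (styles), and the decoder generates a response conditioned on the chosen mechanism. The formula below is a sketch of that reading; the symbols m and M and the exact factorization are assumptions, since the slide's own equations are not present in the text.

```latex
p(y \mid x) = \sum_{m=1}^{M} p(m \mid x)\, p(y \mid m, x)
```

Under this reading, p(m | x) is the diverter's output for post x, and decoding from several high-probability mechanisms rather than from a single distribution is what would let good but infrequent responses surface.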

Page 32: Intelligent Chatbot on WeChat

•Training

•815,852 pairs of post and response

•775,852 are for training

•40,000 are for model validation

•Testing

•We randomly select 300 posts from about 15 million posts

•Every baseline model generates 5 responses

•Human judgment is used to evaluate the model performance

Experiment

Page 33: Intelligent Chatbot on WeChat

The diversity of the responses is increased by 1.7 times, and the accuracy is increased by 9.8%

Experiment Results

Page 34: Intelligent Chatbot on WeChat

Example Output

Page 35: Intelligent Chatbot on WeChat

Future Work

• Making use of more knowledge sources: knowledge graph, article content

• Unsupervised machine learning

• Open the service to the public: knowledge management, model tuning, chatbot customization

Page 36: Intelligent Chatbot on WeChat

Voice & Audio

Natural Language Processing

Machine Learning

Image & Video

Machine Translation

WeChat AI is hiring now!

[email protected] [email protected]

Beijing, Guangzhou, Shenzhen, Palo Alto

[Chart: speed on 4 GPUs (samples/sec) by batch_size for amber, mxnet, and tf]

Page 37: Intelligent Chatbot on WeChat

Thanks

WeChat A.I. NLP