Chunyi Peng, Zaoyang Gong, Guobin Shen, Microsoft Research Asia, HotWeb 2006: MEASUREMENT AND MODELING OF A WEB-BASED QUESTION ANSWERING SYSTEM


Page 2:

When you have a question…

Solve it yourself! Ooh, out of our scope!
Usually, search it! A common and good way in many cases, but:
  A search engine typically returns pages of links, not direct answers.
  Sometimes it is very difficult for people to describe their questions precisely.
  Not all information is readily available on the web.
So, ask! A natural and effective way.
  Question answering (QA) utilizes grassroots intelligence and collaboration,
  especially for acquiring specific information.

Page 3:

So, our goals…

Measurement and modeling of a real large-scale QA system
  How does a real QA system work?
  What are the typical user behaviors and their impacts?
Seek a better QA system
  How to design a QA system?
  How to make performance tradeoffs?

Page 4:

iAsk (http://iask.sina.com.cn)

A topic-based web QA system
Question lifecycle: questioning -> waiting for replies -> confirmation (closed)
Provides optimal reply selection and reply rewarding
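The question lifecycle on this slide can be sketched as a small state machine. This is only an illustration, not iAsk's implementation: the names `Stage`, `Question`, `post`, `add_reply`, and `confirm` are hypothetical, and "optimal reply selection" is assumed here to mean that the asker picks one reply when closing the question.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named on the slide (identifiers are hypothetical)."""
    QUESTIONING = 1
    WAITING_FOR_REPLY = 2
    CLOSED = 3


class Question:
    def __init__(self, text):
        self.text = text
        self.stage = Stage.QUESTIONING
        self.replies = []
        self.selected_reply = None

    def post(self):
        # questioning -> waiting for replies
        if self.stage is not Stage.QUESTIONING:
            raise ValueError("question was already posted")
        self.stage = Stage.WAITING_FOR_REPLY

    def add_reply(self, reply):
        # Replies are only accepted while the question is open.
        if self.stage is not Stage.WAITING_FOR_REPLY:
            raise ValueError("question is not open for replies")
        self.replies.append(reply)

    def confirm(self, index):
        # waiting for replies -> closed; assumption: the asker selects one reply.
        if self.stage is not Stage.WAITING_FOR_REPLY:
            raise ValueError("question is not open")
        self.selected_reply = self.replies[index]
        self.stage = Stage.CLOSED
```

Once `confirm` has run, further calls to `add_reply` raise `ValueError`, which mirrors the "confirmation (closed)" end state.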

Page 5:

Measurement Results

Data set
  2 months (Nov 22, 2005 to Jan 23, 2006)
  350K questions and 2M replies
  220K users, 1901 topics
Measurement on
  Question/reply patterns over time
  Question/reply patterns over topics
  Question/reply patterns across users
  Question/reply incentive mechanisms

Page 6:

Behavior Pattern over Time

On Hourly Scale: a consistent usage pattern

Page 7:

Behavior Pattern over Topics

Topic characteristics
  P: Popularity (#Q) (Zipf popularity), questioning and replying activities
  Q: Question Proneness (#Q/#U), the likelihood that a user will ask a question
  R: Reply Proneness (#R/#U), the likelihood that a user will reply to a question
Our measurement shows that topic characteristics vary widely and that users behave quite differently.
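As a rough illustration of the three topic characteristics, the sketch below computes P, Q, and R from raw records. The function name and the `(topic, user)` record layout are hypothetical, and #U is assumed to be the number of distinct users who asked or replied in a topic; the slides do not define #U precisely.

```python
from collections import defaultdict


def topic_characteristics(questions, replies):
    """Compute P (#Q), Q (#Q/#U) and R (#R/#U) per topic.

    questions, replies: iterables of (topic, user) pairs (assumed layout).
    """
    q_count = defaultdict(int)
    r_count = defaultdict(int)
    users = defaultdict(set)  # distinct users seen in each topic
    for topic, user in questions:
        q_count[topic] += 1
        users[topic].add(user)
    for topic, user in replies:
        r_count[topic] += 1
        users[topic].add(user)

    stats = {}
    for topic, members in users.items():
        n_users = len(members)
        stats[topic] = {
            "P": q_count[topic],            # popularity: number of questions
            "Q": q_count[topic] / n_users,  # question proneness
            "R": r_count[topic] / n_users,  # reply proneness
        }
    return stats
```

Sorting topics by `P` and plotting the counts against rank on log-log axes is one way to check the Zipf-like popularity mentioned above.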

Page 8:

Behavior Pattern across Users

Active and non-active users
  About 9% of users contribute 80% of replies, vs. about 22% of users for 80% of questions
Asymmetric questioning/replying pattern
  4.7% altruists vs. 17.7% free-riders
Narrow user interests
  #topics (Q): 1.8
  #topics (R): 3.3
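A concentration figure such as "about 9% of users contribute 80% of replies" can be computed from per-user counts along the following lines. This is a minimal sketch; the function name and the input (a list with one reply or question count per user) are assumptions.

```python
def share_of_users_for(counts, target=0.8):
    """Smallest fraction of users whose combined count reaches `target` of the total.

    counts: one non-negative count per user (assumed input format).
    """
    ordered = sorted(counts, reverse=True)  # most active users first
    total = sum(ordered)
    if total == 0:
        return 0.0
    running = 0
    for k, count in enumerate(ordered, start=1):
        running += count
        if running >= target * total:
            return k / len(ordered)
    return 1.0
```

Running it once on per-user reply counts and once on per-user question counts gives the two percentages that the slide compares.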

Page 9:

Performance Metrics

Reply-Rate: how likely a question is to be replied to
Reply-Number: the number of replies, indicating how likely a question is to get the expected answer
Reply-Latency: how quickly the asker can get an answer
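The three metrics could be computed from a question log roughly as follows. The exact definitions are not given on the slide, so this sketch assumes Reply-Rate is the fraction of questions with at least one reply, Reply-Number is the mean number of replies per question, and Reply-Latency is the mean delay until the first reply. The function name, the input layout, and the `window_hours` parameter are hypothetical.

```python
def performance(questions, window_hours=None):
    """Compute Reply-Rate, Reply-Number and Reply-Latency.

    questions: list with one entry per question, each a list of reply delays
               in hours since the question was asked (assumed layout).
    window_hours: if set, only replies arriving within this window count.
    """
    if not questions:
        raise ValueError("need at least one question")
    replied = 0
    reply_total = 0
    first_delays = []
    for delays in questions:
        kept = [d for d in delays if window_hours is None or d <= window_hours]
        reply_total += len(kept)
        if kept:
            replied += 1
            first_delays.append(min(kept))  # delay of the first reply
    n = len(questions)
    latency = sum(first_delays) / len(first_delays) if first_delays else None
    return {
        "reply_rate": replied / n,
        "reply_number": reply_total / n,
        "reply_latency": latency,
    }
```

Calling it with `window_hours=24` and with no window corresponds to a "within 24 hr" versus "long-term" comparison.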

Page 10:

iAsk Performance

Long-term performance
  Reply-Rate: 99.8%
  Reply-Number: about 5
  Reply-Latency: about 10 hr
Within 24 hr
  Reply-Rate: 85%
  Reply-Number: about 4
  Reply-Latency: about 6 hr
In summary, the performance is quite satisfactory, except that users sometimes need to tolerate a relatively long delay.

Page 11:

Measurement on Incentive Mechanism

Page 12:

Modeling

Question arrival: Poisson distribution
Reply behavior: an approximately exponentially decaying model
Performance formula
  Define dynamic performance
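The slides do not preserve the performance formula itself, so the sketch below is not the authors' model. It only illustrates what the two stated ingredients imply under explicit assumptions: questions arrive as a Poisson process with a constant rate, and replies to a single question arrive as a Poisson process whose rate decays as rate0 * exp(-t / tau) with question age t. All function and parameter names (`rate0`, `tau`, `horizon`) are hypothetical.

```python
import math
import random


def simulate_arrivals(rate, duration, seed=0):
    """Question arrival times of a Poisson process (exponential inter-arrival gaps)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            return times
        times.append(t)


def expected_replies(rate0, tau, horizon=math.inf):
    """Expected replies by age `horizon`: the integral of rate0 * exp(-t / tau)."""
    return rate0 * tau * (1.0 - math.exp(-horizon / tau))


def reply_probability(rate0, tau, horizon=math.inf):
    """Probability of at least one reply by `horizon` under the assumed Poisson replies."""
    return 1.0 - math.exp(-expected_replies(rate0, tau, horizon))
```

Under these assumptions the expected reply count saturates at rate0 * tau, so a longer waiting window adds fewer and fewer replies.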

Page 13:

Parameter Impact

Page 14:

Possible Improvement

Active or push-based question delivery
Better webpage layout, e.g. adding shortcuts
Better incentive mechanism
Utilize the power of social networks

Page 15:

Conclusions

Web QA that leverages grassroots intelligence and collaboration is hot and getting hotter…

Our measurement and model revealed that QA QoS depends heavily on three key factors: user scale, user reply probability, and system design artifacts such as webpage design.

The current simple web QA system achieves acceptable performance, but there is still room for improvement.

Page 16:

Backup

Page 17:

Behavior Pattern over Topics

Topic characteristics
  P: Popularity (#Q) (Zipf popularity)
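One common way to check a Zipf-like popularity distribution is a least-squares fit of log(count) against log(rank). The sketch below assumes a list of per-topic question counts as input; the slides do not say how the authors verified the Zipf property, so this is a generic technique rather than their method, and the function name is hypothetical.

```python
import math


def zipf_exponent(counts):
    """Estimate s in count ~ rank ** (-s) by least squares on log-log values.

    counts: per-topic question counts (assumed input); zero counts are skipped.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two topics with positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # negated slope of the fitted line
```

An exponent near 1 with a good linear fit on log-log axes would be consistent with classic Zipf behavior; a least-squares fit is a quick check, not a rigorous test.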

Page 18:

Behavior Pattern over Topics

Topic characteristics
  P: Popularity (#Q), Zipf popularity
  Q: Question Proneness (#Q/#U)
  R: Reply Proneness (#R/#U)

Page 19:

Narrow User Interest Scope

Page 20:

Reply distribution (measured)

Page 21:

Static Performance Formula

Reply-Rate

Reply-Number

Reply-Latency

Page 22:

Dynamic Performance Formula

Define dynamic performance

We have,