Community Question Answering Irwin King, Tom Chao Zhou, Baichuan Li Department of Computer Science & Engineering The Chinese University of Hong Kong 1/16/2012 I.King, T.C.Zhou, B.Li (CUHK) Community Question Answering 1/16/2012 1 / 62

Community Question Answering

Irwin King, Tom Chao Zhou, Baichuan Li

Department of Computer Science & Engineering
The Chinese University of Hong Kong

1/16/2012

Outline

1 Introduction

2 Question Retrieval
    Motivation
    Lexical-based Approach
    Syntactic-based Approach

3 Question Recommendation
    Motivation
    MDL-based Tree Cut Model
    Topic-enhanced Translation-based Language Model

4 Question Subjectivity Analysis
    Motivation
    Supervised Approach
    Semi-Supervised Approach
    Data-driven Approach

5 References

Introduction

Community Question Answering

Knowledge dissemination, information seeking

Natural language questions

Explicit, self-contained answers

Advantages of Community Question Answering

Can address information needs that are personal, heterogeneous, specific, open-ended, and cannot be expressed as a short query

No single Web page directly answers such complex and heterogeneous needs; CQA users can understand and answer them better than a machine

CQA sites have accumulated rich knowledge

    More than one billion posted answers in Yahoo! Answers (http://yanswersblog.com/index.php/archives/2010/05/03/1-billion-answers-served/)
    More than 190 million resolved questions in Baidu Zhidao
    In China, 25% of Google's top search-result pages contain at least one link to some Q&A site (Si et al., VLDB, 2010)

Community Question Answering (overview diagram)

Community Question Answering
    Question Retrieval
        Lexical Approach: LM, TM, TRLM
        Syntactic Approach: STM
    Question Recommendation
        MDL, TopicTRLM
    Question Subjectivity Analysis
        Supervised, Semi-Supervised, Data-Driven

Question Retrieval Motivation

Ask a Question

Problem and Opportunity

Problem

    Askers must wait for an answer (time lag)
    15% of the questions in Yahoo! Answers, one of the first CQA sites on the Web, never receive an answer

Opportunity

    25% of questions in certain categories are recurrent (Anna, Gideon and Yoelle, WWW, 2012)

    New questions can be answered by reusing past resolved questions

Question Retrieval: find semantically similar past questions for a new question

Question Retrieval Example

Benefit of Question Retrieval

Provide an alternative to automatic question answering

Help askers get an answer in a timely manner

Guide answerers toward unique questions, making better use of users' answering passion

Notations

Symbol    Description
Q         A new question
D         A candidate question
|·|       Length of the text
C         Background collection
w         A term in the new question
t         A term in a candidate question

Question Retrieval Lexical-based Approach

Lexical-based Approach: Language Model

In language modeling, the similarity between a query and a document is given by the probability of generating the query from the document language model

Unigram language model, i.i.d. sampling

$$P(Q \mid D) = \prod_{w \in Q} P(w \mid D)$$

In the question retrieval setting, the query is the new question and the document is a candidate question

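To avoid zero probabilities and estimate more accurate language models, documents are smoothed using a background collection

$$P(w \mid D) = (1 - \lambda)\, P_{ml}(w \mid D) + \lambda\, P_{ml}(w \mid C)$$

$\lambda$ is a smoothing parameter, $0 \le \lambda \le 1$

$$P_{ml}(w \mid D) = \frac{\mathrm{tf}(w, D)}{\sum_{w' \in D} \mathrm{tf}(w', D)}$$

$P_{ml}(\cdot)$ is computed with the maximum likelihood estimator; $\mathrm{tf}(w, D)$ is the term frequency of $w$ in $D$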
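
The smoothed query-likelihood score can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the toy archive, and the value lam=0.2 are assumptions made for the example and do not come from the slides. Scores are computed in log space to avoid numerical underflow on long questions.

```python
import math
from collections import Counter

def p_ml(word, counts, total):
    """Maximum likelihood estimate: term frequency divided by text length."""
    return counts[word] / total if total else 0.0

def lm_log_score(query, doc, collection, lam=0.2):
    """log P(Q|D) for a unigram model smoothed with the background collection C.

    lam is the smoothing parameter (0 <= lam <= 1); 0.2 is an arbitrary choice.
    """
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p = ((1 - lam) * p_ml(w, doc_counts, len(doc))
             + lam * p_ml(w, coll_counts, len(collection)))
        if p == 0.0:  # word unseen even in the background collection
            return float("-inf")
        score += math.log(p)
    return score

# Toy archive of candidate questions (hypothetical data).
archive = [
    "how can i link sounds in powerpoint".split(),
    "how to turn off computers in dos mode".split(),
]
collection = [w for q in archive for w in q]
new_question = "link sounds in powerpoint".split()

# Rank candidate questions by decreasing log-likelihood of the new question.
ranked = sorted(archive,
                key=lambda d: lm_log_score(new_question, d, collection),
                reverse=True)
```

With this toy data, the first archive entry ranks highest because it contains every term of the new question, while the second only receives probability mass through the background collection.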

Lexical-based Approach: Translation Model

Language model (LM)

    Advantage: simple
    Disadvantage: lexical gap

Lexical gap: two questions that have the same meaning use very different wording

    Is downloading movies illegal?
    Can I share a copy of a DVD online?

Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee, Finding Similar Questions in Large Question and Answer Archives, CIKM, 2005

Language Model

$$P(w \mid D) = (1 - \lambda)\, P_{ml}(w \mid D) + \lambda\, P_{ml}(w \mid C)$$

Translation Model

$$P(w \mid D) = (1 - \lambda) \sum_{t \in D} T(w \mid t)\, P_{ml}(t \mid D) + \lambda\, P_{ml}(w \mid C)$$

$T(w \mid t)$ is the probability that word $w$ is the translation of word $t$; it captures the semantic similarity between words
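
As a rough sketch of how the translation model differs from the plain language model, the snippet below computes the translation-model word probability for a single term. The translation table T, its value 0.3, the function name tm_prob, and lam=0.2 are all hypothetical choices for illustration; in practice T would be learned from a parallel corpus.

```python
from collections import Counter

def tm_prob(w, doc, collection, T, lam=0.2):
    """Translation-model estimate of P(w|D).

    T maps (w, t) -> T(w|t); pairs missing from the table count as 0.
    """
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    # Sum over candidate-question terms t of T(w|t) * P_ml(t|D).
    translated = sum(T.get((w, t), 0.0) * c / len(doc)
                     for t, c in doc_counts.items())
    background = coll_counts[w] / len(collection)
    return (1 - lam) * translated + lam * background

# Hypothetical translation table: "shut" is a likely translation of "turn".
T = {("shut", "turn"): 0.3}
doc = "how to turn off computers".split()
collection = doc + "how can i shut down my system".split()

p_shut = tm_prob("shut", doc, collection, T)
```

Here "shut" never occurs in the candidate question, so a plain language model would give it only the background mass; the translation term adds probability through the related word "turn".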

Table: Question pairs that share few common words but may have high semantic relatedness according to the translation model

    I'd like to insert music into PowerPoint.
    How can I link sounds in PowerPoint?

    How can I shut down my system in Dos-mode.
    How to turn off computers in Dos-mode.

    Photo transfer from cell phones to computers.
    How to move photos taken by cell phones.

    Which application can run bin files?
    I download a game. How can I execute bin files?

Figure: The first row shows the source words, and each column shows the top 10 words that are most semantically similar to the source word. A higher rank means a larger $T(w \mid t)$ value

How to learn $T(w \mid t)$?

    Prepare a monolingual parallel corpus of text pairs; the two texts in each pair should be semantically similar
    Apply the machine translation model IBM Model 1 to the parallel corpus to learn $T(w \mid t)$

How Jeon et al. (2005) prepare the monolingual parallel corpus

    Each pair contains two questions whose answers are very similar
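
The learning step can be illustrated with a compact EM loop in the style of IBM Model 1. This is a simplified sketch under stated assumptions, not the setup of the cited paper: there is no NULL word, the iteration count is arbitrary, and the two-pair corpus and the name train_ibm1 are invented for the example.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate T(w|t) by EM on (t_side_tokens, w_side_tokens) pairs.

    Simplified IBM Model 1: no NULL word, uniform initialisation.
    Returns a dict-like table keyed by (w, t).
    """
    w_vocab = {w for _, ws in pairs for w in ws}
    T = defaultdict(lambda: 1.0 / len(w_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (w, t) alignments
        total = defaultdict(float)   # expected count of t being aligned
        for ts, ws in pairs:
            for w in ws:
                # E-step: distribute w over the t's in the paired text.
                norm = sum(T[(w, t)] for t in ts)
                for t in ts:
                    frac = T[(w, t)] / norm
                    count[(w, t)] += frac
                    total[t] += frac
        # M-step: renormalise so that sum_w T(w|t) = 1 for every t.
        T = defaultdict(float,
                        {(w, t): c / total[t] for (w, t), c in count.items()})
    return T

# Tiny hypothetical parallel corpus of semantically similar text pairs.
pairs = [
    (["turn", "off"], ["shut", "down"]),
    (["turn"], ["shut"]),
]
T = train_ibm1(pairs)
```

On this toy corpus the second pair pins "shut" to "turn", so EM shifts the remaining mass of "off" toward "down"; real corpora need many more pairs for such associations to be reliable.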

Delphine Bernhard and Iryna Gurevych, Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding, ACL, 2009

Propose several methods to prepare parallel monolingual corpora

    Question-answer pairs: question ↔ answer
    Question reformulation pairs: question ↔ question reformulation by user

Lexical semantic resources: glosses and definitions for the same lexeme in different lexical semantic and encyclopedic resources can be considered near-paraphrases, since they define the same term and hence have the same meaning

Example: moon

    WordNet: the natural satellite of the Earth
    English Wiktionary: the Moon, the satellite of planet Earth
    English Wikipedia: the Moon (Latin: Luna) is Earth's only natural satellite and the fifth largest natural satellite in the Solar System

Lexical-based Approach: Translation-based Language Model

Translation model (TM)

    Advantage: tackles the lexical gap to some extent
    Disadvantage: setting $T(w \mid w) = 1$ for all $w$ while leaving the other word translation probabilities unchanged produces inconsistent probability estimates and makes the model unstable

Xiaobing Xue, Jiwoon Jeon and W. Bruce Croft, Retrieval Models for Question and Answer Archives, SIGIR, 2008

    Translation-based Language Model

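Translation Model

$$P(w \mid D) = (1 - \lambda) \sum_{t \in D} T(w \mid t)\, P_{ml}(t \mid D) + \lambda\, P_{ml}(w \mid C)$$

Translation-based Language Model

$$P(w \mid D) = \frac{|D|}{|D| + \lambda}\, P_{mx}(w \mid D) + \frac{\lambda}{|D| + \lambda}\, P_{ml}(w \mid C)$$

$$P_{mx}(w \mid D) = (1 - \beta)\, P_{ml}(w \mid D) + \beta \sum_{t \in D} T(w \mid t)\, P_{ml}(t \mid D)$$

Linear combination of the language model and the translation model

The answer part should provide additional evidence about relevance; incorporating the answer part $A$:

$$P_{mx}(w \mid (D, A)) = \alpha\, P_{ml}(w \mid D) + \beta \sum_{t \in D} T(w \mid t)\, P_{ml}(t \mid D) + \gamma\, P_{ml}(w \mid A)$$

$$\alpha + \beta + \gamma = 1$$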
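
A minimal sketch of the translation-based language model with the answer part is given below. The mixture weights (alpha=0.5, beta=0.3, gamma=0.2), the smoothing value lam=1.0, the translation table, and the toy question/answer pair are assumptions chosen only to make the arithmetic easy to follow.

```python
from collections import Counter

def p_ml(w, tokens):
    """Maximum likelihood estimate of w in a token list."""
    return Counter(tokens)[w] / len(tokens) if tokens else 0.0

def trlm_prob(w, question, answer, collection, T,
              lam=1.0, alpha=0.5, beta=0.3, gamma=0.2):
    """P(w | (D, A)): mix the question LM, the translation model and the answer LM,
    then smooth with the background collection using length-dependent weights."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    translated = sum(T.get((w, t), 0.0) * p_ml(t, question)
                     for t in set(question))
    p_mx = (alpha * p_ml(w, question)
            + beta * translated
            + gamma * p_ml(w, answer))
    d = len(question)
    return d / (d + lam) * p_mx + lam / (d + lam) * p_ml(w, collection)

# Hypothetical archived question with its answer.
question = ["turn", "off", "computer"]
answer = ["press", "shut", "down"]
collection = question + answer
T = {("shut", "turn"): 0.3}

p = trlm_prob("shut", question, answer, collection, T)
```

In this example "shut" is absent from the archived question, so its probability comes from three sources: the translation of "turn", the answer text, and the background collection.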

Question Retrieval Syntactic-based Approach

Syntactic-based Approach: Syntactic Tree Matching

Some similar questions neither share many common words nor follow an identical syntactic structure

    How can I lose weight in a few months?
    Are there any ways of losing pound in a short period?

Kai Wang, Zhaoyan Ming and Tat-Seng Chua, A Syntactic Tree Matching Approach to Finding Similar Questions in Community-based QA Services, SIGIR, 2009

    Syntactic tree matching

Figure: (a) The syntactic tree of the question "How to lose weight?". (b) Tree fragments of the sub-tree covering "lose weight".

Tree kernel: utilize structural or syntactic information to capture higher-order dependencies between grammar rules

$$k(T_1, T_2) = \sum_{n_1 \in N_1} \sum_{n_2 \in N_2} C(n_1, n_2)$$

$N_1$ and $N_2$ are the sets of nodes in the two syntactic trees $T_1$ and $T_2$, and $C(n_1, n_2)$ equals the number of common fragments rooted at $n_1$ and $n_2$
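
One common way to compute $C(n_1, n_2)$ is the recursive scheme of the convolution tree kernel (without a decay factor), sketched below. The nested-tuple tree encoding, the function names, and the two small parse trees are assumptions made for this illustration; they are not taken from the slides.

```python
def production(node):
    """Grammar production at a node: its label plus the labels/words of its children."""
    return (node[0], tuple(("word", c) if isinstance(c, str) else ("node", c[0])
                           for c in node[1:]))

def nodes(tree):
    """Yield every internal node of a tree encoded as (label, child, ...)."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def common_fragments(n1, n2):
    """C(n1, n2): number of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0
    result = 1  # matching pre-terminals contribute exactly one fragment
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            result *= 1 + common_fragments(c1, c2)
    return result

def tree_kernel(t1, t2):
    """k(T1, T2): sum of C(n1, n2) over all node pairs."""
    return sum(common_fragments(a, b) for a in nodes(t1) for b in nodes(t2))

# Hypothetical parse fragments for "lose weight" and "lose pound".
t1 = ("VP", ("VB", "lose"), ("NP", ("NN", "weight")))
t2 = ("VP", ("VB", "lose"), ("NP", ("NN", "pound")))

k_same = tree_kernel(t1, t1)
k_diff = tree_kernel(t1, t2)
```

Because the leaf words "weight" and "pound" differ, every fragment that includes that leaf fails to match, so the kernel value for the two different trees is lower than for identical trees.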

Limitations of the tree kernel

    The tree kernel function merely relies on the intuition of counting the number of common sub-trees, whereas this number might not be a good indicator of the similarity between two questions
    Two evaluated sub-trees have to be identical to allow further parent matching, so semantic representations do not fit in well

Syntactic tree matching

    A new weighting scheme for tree fragments that is robust against some grammatical errors
    Incorporates semantic features

Question Recommendation Motivation

Motivation

Question Recommendation

    Retrieve and rank other questions according to their likelihood of being good recommendations for the queried question
    A good recommendation provides alternative aspects around users' interest

Question Recommendation MDL-based Tree Cut Model

Question Recommendation: MDL-based Tree Cut Model

Yunbo Cao, Huizhong Duan, Chin-Yew Lin, Yong Yu and Hsiao-Wuen Hon, Recommending Questions Using the MDL-based Tree Cut Model, WWW, 2008

Step 1: Represent questions as graphs of topic terms

Step 2: Rank recommendations on the basis of the graphs

Formalize both steps as tree-cutting problems and employ MDL (Minimum Description Length) to select the best cuts

Question

    Any cool clubs in Berlin or Hamburg?

Question topic

    Major context/constraint of a question; characterizes users' interests
    Example: Berlin, Hamburg

Question focus

    Certain aspect of the question topic
    Example: cool club

Suggest alternative aspects of the queried question topic

Extraction of topic terms: base noun phrase, WH-ngram

Reduction of topic terms: MDL-based tree cut model

Topic profile

    Probability distribution of categories $\{p(c \mid t)\}_{c \in C}$

    $$p(c \mid t) = \frac{\mathrm{count}(c, t)}{\sum_{c' \in C} \mathrm{count}(c', t)}$$

    $\mathrm{count}(c, t)$ is the frequency of the topic term $t$ within the category $c$

Specificity

    Inverse of the entropy of the topic profile
    A topic term of high specificity usually specifies the question topic
    A topic term of low specificity is usually used to represent the question focus

Topic chain

    A sequence of topic terms ordered from high to low specificity

Question tree

    Prefix tree built over the topic chains of the question set $Q$
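
The four definitions above can be tied together in a short sketch. Everything concrete here is an assumption for illustration: the category names and counts are invented, the small constant eps (added so that a zero-entropy term does not cause a division by zero) is a choice made for this example, and the nested-dict prefix tree is just one convenient encoding.

```python
import math

def topic_profile(term, counts):
    """Return {category: p(c|t)} from counts keyed by (category, term)."""
    per_cat = {c: n for (c, t), n in counts.items() if t == term}
    total = sum(per_cat.values())
    return {c: n / total for c, n in per_cat.items()}

def specificity(term, counts, eps=1e-3):
    """Inverse entropy of the topic profile; eps is an assumed smoothing constant."""
    profile = topic_profile(term, counts)
    entropy = -sum(p * math.log(p) for p in profile.values() if p > 0)
    return 1.0 / (entropy + eps)

def topic_chain(terms, counts):
    """Order topic terms from high to low specificity."""
    return sorted(terms, key=lambda t: specificity(t, counts), reverse=True)

def question_tree(chains):
    """Prefix tree over topic chains, encoded as nested dicts."""
    root = {}
    for chain in chains:
        node = root
        for term in chain:
            node = node.setdefault(term, {})
    return root

# Hypothetical (category, term) frequencies.
counts = {
    ("Travel", "berlin"): 9,
    ("Travel", "club"): 3, ("Music", "club"): 3, ("Sports", "club"): 3,
    ("Travel", "hotel"): 4, ("Business", "hotel"): 4,
}
chains = [topic_chain(["club", "berlin"], counts),
          topic_chain(["hotel", "berlin"], counts)]
tree = question_tree(chains)
```

In this toy data "berlin" occurs in a single category, so its entropy is zero and it comes first in both chains; the two chains therefore share the prefix "berlin" in the question tree and branch at the less specific terms.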

Ranking recommendation candidates

Determine which topic terms (question focus) should be substituted

    Collect a set of topic chains $Q^c = \{q_i^c\}_{i=1}^{N}$ such that at least one topic term occurs in both $q^c$ and $q_i^c$

    Construct a question tree from the set of topic chains $Q^c \cup \{q^c\}$

    Employ MDL to separate topic chains into a head $H$ and a tail $T$

Score recommendation candidates rendered by various substitutions

    Specificity: the more similar $H(q^c)$ and $H(\hat{q}^c)$ are, the higher the score
    Generality: the more similar $T(q^c)$ and $T(\hat{q}^c)$ are, the lower the score

Question Recommendation Topic-enhanced Translation-based Language Model

Question Recommendation: TopicTRLM

Tom Chao Zhou, Chin-Yew Lin, Irwin King, Michael R. Lyu, Young-In Song and Yunbo Cao, Learning to Suggest Questions in Online Forums, AAAI, 2011

Suggest semantically related questions in online forums

    How is Orange Beach in Alabama?

        Is the water pretty clear this time of year on Orange Beach?
        Do they have chair and umbrella rentals on Orange Beach?

    Topic: travel in Orange Beach

Fuse both lexical and latent semantic information

Document representation

Bag-of-words

    Independent
    Fine-grained representation
    Lexically similar

Topic model

    Assign a set of latent topic distributions to each word
    Capture important relationships between words
    Coarse-grained representation
    Semantically related

Question Subjectivity Analysis Motivation

Question Subjectivity Analysis

Question analysis studies the characteristics of questions

Understand user intent

Provide rich information to question search, question recommendation, answer quality prediction, etc.

Question subjectivity analysis is an important aspect of question analysis

Question Subjectivity Analysis: Definition

Subjective question

    Private statements
    Personal opinion and experience
    Example: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?

Objective question

    Objective, verifiable information
    Often with support from reliable sources
    Example: What's the difference between chemotherapy and radiation treatments?

Question Subjectivity Analysis: Motivation

More accurately identify similar questions and improve question search

Better rank or filter answers based on whether an answer matches the question orientation

Crucial component of inferring user intent, a long-standing problem in Web search

Route subjective questions to users for answers; trigger an automatic factual question answering system for objective questions


Question Subjectivity Analysis: Challenge

Ill-formatted, e.g., word capitalization may be incorrect or missing, and consecutive words may be concatenated

Ungrammatical, with common online idioms, e.g., “u” for “you” and “2” for “to”


Question Subjectivity Analysis: Supervised Learning

Baoli Li, Yandong Liu, Ashwin Ram, Ernest V. Garcia and Eugene Agichtein, Exploring Question Subjectivity Prediction in Community QA, SIGIR, 2008

Support Vector Machine with linear kernel

Features

Character 3-gram
Word
Word + character 3-gram
Word n-gram
Word POS n-gram, mix of word and POS tri-grams

Term weighting schemes: binary, TF, TF*IDF
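As a rough illustration of the character 3-gram feature and the three term weighting schemes listed above, the following sketch builds one weighted feature dictionary per question. The function names (`char_ngrams`, `weigh_terms`) are hypothetical, and the plain `log(N / df)` form of IDF is an assumption; the slides do not say which IDF variant was used. The resulting vectors would then be passed to a linear-kernel SVM, which is not shown here.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def weigh_terms(docs, scheme="tf"):
    """Return one {term: weight} dict per document.

    scheme is one of "binary", "tf", or "tfidf".
    The IDF form log(N / df) is an assumption, not taken from the slides.
    """
    counts = [Counter(char_ngrams(d)) for d in docs]
    if scheme == "binary":
        return [{t: 1.0 for t in c} for c in counts]
    if scheme == "tf":
        return [{t: float(f) for t, f in c.items()} for c in counts]
    if scheme == "tfidf":
        n_docs = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for c in counts for t in c)
        return [{t: f * math.log(n_docs / df[t]) for t, f in c.items()}
                for c in counts]
    raise ValueError(f"unknown scheme: {scheme}")
```

Word and word n-gram features could be handled the same way by swapping `char_ngrams` for a tokenizer.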


Question Subjectivity Analysis: Semi-Supervised Learning

Figure: YahooAnswers Example.

Baoli Li, Yandong Liu and Eugene Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, EMNLP, 2008

Incorporate relationships between questions and corresponding answers

Co-training, two views of the data, question and answer


Figure: Algorithm CoCQA.

At steps 1 and 2, the top K_j most confident examples of each category are chosen as additional “labeled” data

Terminate when the increments of both classifiers are less than threshold X or the maximum number of iterations is exceeded
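The algorithm figure is not reproduced in this text, so the following is only a generic co-training sketch under stated assumptions, not the CoCQA algorithm itself. It assumes a single shared labeled set, uses a toy Naive Bayes classifier (`TinyNB`, hypothetical) in place of the real classifiers, and stops only on an iteration cap or an empty pool; the threshold-X stopping rule would need a validation set and is omitted. The names `co_train`, `k`, and `max_iter` are illustrative.

```python
import math
from collections import Counter


class TinyNB:
    """Toy multinomial Naive Bayes over whitespace tokens (stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab_size = len({w for c in self.classes for w in self.counts[c]})
        return self

    def predict_proba(self, text):
        """Return {class: probability} with add-one smoothing."""
        log_p = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + self.vocab_size + 1
            log_p[c] = math.log(self.prior[c]) + sum(
                math.log((self.counts[c][w] + 1) / total)
                for w in text.lower().split()
            )
        top = max(log_p.values())
        norm = sum(math.exp(v - top) for v in log_p.values())
        return {c: math.exp(v - top) / norm for c, v in log_p.items()}


def co_train(labeled, unlabeled, k=1, max_iter=5):
    """labeled: (question, answer, label) triples; unlabeled: (question, answer) pairs."""
    labeled, pool = list(labeled), list(unlabeled)

    def train(view):
        # view 0 trains on question text, view 1 on answer text.
        return TinyNB().fit([ex[view] for ex in labeled], [ex[2] for ex in labeled])

    for _ in range(max_iter):
        if not pool:
            break
        chosen = {}  # pool index -> label assigned by a classifier
        for view in (0, 1):
            clf = train(view)
            scored = [(i, clf.predict_proba(ex[view])) for i, ex in enumerate(pool)]
            for c in clf.classes:
                # Top-k most confident examples for class c under this view.
                ranked = sorted(scored, key=lambda s: s[1][c], reverse=True)
                for i, _ in ranked[:k]:
                    chosen.setdefault(i, c)
        labeled += [(pool[i][0], pool[i][1], c) for i, c in chosen.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in chosen]
    return train(0), train(1)
```

The function returns the question-view and answer-view classifiers trained on the enlarged labeled set.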


Obtain two classifiers after CoCQA terminates

Classify a new example with two classifiers based on both of thefeature sets

Multiply the confidence values, choose the class that has the highestproduct
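The combination rule above can be sketched in a few lines. The function name `combine_views` and the dictionary representation of confidence values are assumptions made for illustration.

```python
def combine_views(conf_question, conf_answer):
    """Multiply per-class confidences from the two classifiers and pick the best class.

    Both arguments map class label -> confidence value.
    Returns (chosen_label, products).
    """
    shared = conf_question.keys() & conf_answer.keys()
    products = {c: conf_question[c] * conf_answer[c] for c in shared}
    return max(products, key=products.get), products
```

For example, question-view confidences of 0.6 (subjective) and 0.4 (objective) combined with answer-view confidences of 0.3 and 0.7 give products of 0.18 and 0.28, so the objective class is chosen.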


Question Subjectivity Analysis: Data-driven Approach

Tom Chao Zhou, Xiance Si, Edward Y. Chang, Irwin King and Michael R. Lyu, A Data-Driven Approach to Question Subjectivity Identification in Community Question Answering, AAAI, 2012

Li et al. 2008 (supervised) and Li et al. 2008 (CoCQA, semi-supervised) are based on manually labeled data

Manually labeling data is quite expensive


Web-scale learning is to use available large-scale data rather than hoping for annotated data that isn’t available

- Halevy, Norvig and Pereira


Can we utilize social signals to collect training data for question subjectivity identification with NO manual labeling?


Like Signal: users like an answer if they find the answer useful

Intuition

Subjective: answers are opinions and reflect different tastes; the best answer receives a similar number of likes to the other answers
Objective: users like the answer that explains a universal truth in the most detail; the best answer receives more likes than the other answers
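One way to turn this intuition into an automatic labeling rule is sketched below. The decision rule and the `margin` parameter are hypothetical; the slides state only the intuition, not the thresholds actually used. Questions that match neither pattern get no label.

```python
def like_signal(best_likes, other_likes, margin=2.0):
    """Return "objective", "subjective", or None when the signal is inconclusive.

    best_likes: likes on the best answer.
    other_likes: list of like counts on the remaining answers.
    margin: hypothetical factor by which the best answer must lead.
    """
    if not other_likes or best_likes + sum(other_likes) == 0:
        return None  # nothing to compare, or no likes at all
    top_other = max(other_likes)
    if best_likes >= margin * max(top_other, 1):
        return "objective"   # best answer clearly dominates
    if best_likes <= top_other:
        return "subjective"  # best answer is not liked more than the others
    return None
```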


Vote Signal: users can vote for the best answer

Intuition

Subjective: users vote for different answers to support different opinions; the best answer receives a low percentage of the votes
Objective: it is easy to identify the answer that contains the most facts; the best answer receives a high percentage of the votes
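A minimal sketch of the vote signal as a labeling rule follows. The cut-offs `low` and `high` and the minimum vote count `min_votes` are hypothetical values chosen for illustration, not figures from the slides.

```python
def vote_signal(best_votes, total_votes, low=0.3, high=0.8, min_votes=5):
    """Label a question from the share of votes won by the best answer.

    Returns "objective", "subjective", or None when inconclusive.
    low, high, and min_votes are hypothetical thresholds.
    """
    if total_votes < min_votes:
        return None  # too few votes to trust the percentage
    share = best_votes / total_votes
    if share >= high:
        return "objective"
    if share <= low:
        return "subjective"
    return None
```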


Source Signal: reference to authoritative resources

Intuition

Only available for objective questions that have factual answers


Poll and Survey signal

User intent is to seek opinions

Very likely to be subjective

What is something you learned in school that you think is useful to you today?

If you could be a cartoon character, who would you want to be?


Answer Number signal: the number of posted answers to each question

Intuition

Subjective: users post opinions even when they notice there are other answers
Objective: users may not post answers to questions that have already received answers, since the expected answer is usually fixed
A large answer number indicates subjectivity
A small answer number may be due to many reasons, such as objectivity or few page views
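Because only a large answer number is informative, a labeling rule based on this signal is one-sided, as the sketch below shows. The `threshold` value is hypothetical; the slides do not give the cut-off.

```python
def answer_number_signal(num_answers, threshold=20):
    """Return "subjective" for questions with many answers, else None.

    A small answer count is ambiguous (objectivity, few page views, ...),
    so it yields no label. threshold is a hypothetical cut-off.
    """
    return "subjective" if num_answers > threshold else None
```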


Features

Word: term frequency
Word n-gram: term frequency
Question length: the information needs of subjective questions are complex and users add descriptions to explain them, so subjective questions tend to be longer
Request word: particular words that explicitly indicate a request for opinions; manual list of 9 words


Subjectivity clue: external lexicon of over 8000 clues, a manually compiled word list from news used to express opinions

Punctuation density: density of punctuation marks

Grammatical modifier: adjectives and adverbs, inspired by opinion mining research that uses grammatical modifiers to judge users’ opinions

Entity: an objective question expects a factual answer, leading to fewer relationships among entities; subjective questions contain more descriptions and may involve relatively complex relations
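Three of the simpler features (question length, request word, punctuation density) can be sketched as follows. The `REQUEST_WORDS` set is a hypothetical placeholder, since the 9-word manual list is not given in the slides, and defining density as punctuation marks per whitespace token is likewise an assumption. The lexicon, modifier, and entity features need external resources (a clue lexicon, a POS tagger, an entity recognizer) and are not shown.

```python
import string

# Hypothetical placeholder; the 9-word manual list is not reproduced in the slides.
REQUEST_WORDS = {"think", "opinion", "recommend"}


def question_features(question):
    """Compute length, request-word, and punctuation-density features."""
    tokens = question.split()
    words = [t.strip(string.punctuation).lower() for t in tokens]
    n_punct = sum(ch in string.punctuation for ch in question)
    return {
        "length": len(tokens),  # question length in whitespace tokens
        "has_request_word": any(w in REQUEST_WORDS for w in words),
        # Assumed definition: punctuation marks per token.
        "punct_density": n_punct / len(tokens) if tokens else 0.0,
    }
```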


References

Chengxiang Zhai and John Lafferty, A Study of Smoothing Methods for Language Models Applied to Information Retrieval, ACM Transactions on Information Systems, 2004

Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee, Finding Semantically Similar Questions Based on Their Answers, SIGIR, 2005

Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee, Finding Similar Questions in Large Question and Answer Archives, CIKM, 2005

Xiaobing Xue, Jiwoon Jeon and W. Bruce Croft, Retrieval Models for Question and Answer Archives, SIGIR, 2008

Hu Wu, Yongji Wang and Xiang Cheng, Incremental Probabilistic Latent Semantic Analysis for Automatic Question Recommendation, RecSys, 2008


Yunbo Cao, Huizhong Duan, Chin-Yew Lin, Yong Yu and Hsiao-Wuen Hon, Recommending Questions Using the MDL-based Tree Cut Model, WWW, 2008

Baoli Li, Yandong Liu and Eugene Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, EMNLP, 2008

Delphine Bernhard and Iryna Gurevych, Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding, ACL, 2009

Kai Wang, Zhaoyan Ming and Tat-Seng Chua, A Syntactic Tree Matching Approach to Finding Similar Questions in Community-based QA Services, SIGIR, 2009

Alon Halevy, Peter Norvig and Fernando Pereira, The Unreasonable Effectiveness of Data, IEEE Intelligent Systems, 2009


Kai Wang, Zhao-Yan Ming, Xia Hu and Tat-Seng Chua, Segmentation of Multi-Sentence Questions: Towards Effective Question Retrieval in cQA Services, SIGIR, 2010

Xin Cao, Gao Cong, Bin Cui and Christian S. Jensen, A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives, WWW, 2010

Tom Chao Zhou, Chin-Yew Lin, Irwin King, Michael R. Lyu, Young-In Song and Yunbo Cao, AAAI, 2011

Shuguang Li and Suresh Manandhar, Improving Question Recommendation by Exploiting Information Need, ACL, 2011

Tom Chao Zhou, Xiance Si, Edward Y. Chang, Irwin King and Michael R. Lyu, A Data-Driven Approach to Question Subjectivity Identification in Community Question Answering, AAAI, 2012


Onur Kucuktunc, B. Barla Cambazoglu, Ingmar Weber and Hakan Ferhatosmanoglu, A Large-Scale Sentiment Analysis for Yahoo! Answers, WSDM, 2012

Xingliang Ni, Yao Lu, Xiaojun Quan, Liu Wenyin and Bei Hua, User Interest Modeling and Its Application for Question Recommendation in User-Interactive Question Answering Systems, Information Processing and Management, 2012

Anna Shtok, Gideon Dror and Yoelle Maarek, Learning from the Past: Answering New Questions with Past Answers, WWW, 2012

Xiance Si, Edward Y. Chang, Zoltan Gyongyi and Maosong Sun, Confucius and Its Intelligent Disciples: Integrating Social with Search, VLDB, 2010


Q&A

Thanks for your attention!
