It’s all in the Content: State of the art Best Answer Prediction based on Discretisation of Shallow Linguistic Features

George Gkotsis, Karen Stepanyan, Carlos Pedrinaci, John Domingue, Maria Liakata*

Knowledge Media Institute, The Open University

*Department of Computer Science, University of Warwick

Outline

• Motivation

• Problem description

• Proposed solution

• Evaluation

• Discussion & Conclusion

23-26 June 2014 ACM Web Science Conference 2014 (WebSci14)

Motivation


Questions on social networking sites


Recommendations & opinions

Authoritative responses

Expert & empirical knowledge

Queries on CQA


Why best answer prediction?

• Information overload

• Increase awareness in the community

• Answer questions more efficiently

• One way to study social media reception

• Plus:

• Finding experts in communities

• Study of language use

• Trend analysis

• …

• Visit


Problem description


Best answer prediction in Social Q&A

• Binary classification problem

• Is it solved?

• Yes, partially

• Current solutions depend on:


Answer Ratings

• Score, #comments

Knowledge is Future & Unknown

User Ratings

• User Reputation

• UpVotes etc

• Preferential attachment

Knowledge is Past & Not always available

State of the art solutions

“…we observe significant assortativity in the reputations of co-answerers, relationships between reputation and answer speed, and that the probability of an answer being chosen as the best one strongly depends on temporal characteristics of answer arrivals.”

Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec. Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow. KDD 2012


State of the art solutions (cont.)

“When available, scoring (or rating) features improve prediction results significantly, which demonstrates the value of community feedback and reputation for identifying valuable answers.”

Grégoire Burel, Yulan He, Harith Alani. Automatic Identification of Best Answers in Online Enquiry Communities. ESWC 2012


State of the art solutions: Summary

[Bar chart: Average Precision (axis 0% to 100%) for three feature categories: Linguistic, User Ratings, Answer ratings. Annotation on the chart: "Our solution".]

StackExchange network

SE “is all about getting answers, it’s not a

discussion forum, there’s no chit-chat”

As of 20 June 2014:

• 123 Q&A sites

• 5,622,330 users

• 9.5 million questions

• 16.3 million answers

• 9.3 million visits per day

Training Dataset

September 2013 dump

StackOverflow & 20 of the most active SE websites

Questions with Accepted Answers

• 4,366,662 Non Accepted Answers

• 3,939,224 Accepted Answers


[Pie chart: Accepted Answers 47%, Non Accepted Answers …]

[Bar chart, SE websites: number of Non Accepted and Accepted answers per website, axis 0 to 200,000]


[Pie chart: StackOverflow 91%, The Rest 9%]

[Bar chart, stackoverflow: Non Accepted Answers and Accepted Answers, axis 0 to 8,000,000; data labels 3,375,817 and 3,795,276]

Shallow Linguistic features

• Long history, coming from studies on readability

1. Average number of characters per word

2. Average number of words per sentence

3. Number of words in the longest sentence

4. Answer length

5. Log Likelihood (Pitler and Nenkova, 2008)
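The five features listed above could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the tokeniser, the naive sentence splitter, measuring answer length in characters, and the add-one-smoothed unigram model over a background (community) corpus are all assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def shallow_features(answer):
    """Features 1-4 for a single answer body."""
    words = tokenize(answer)
    per_sentence = [len(tokenize(s)) for s in split_sentences(answer)]
    per_sentence = [n for n in per_sentence if n > 0]
    return {
        "CharactersPerWord": sum(map(len, words)) / len(words) if words else 0.0,
        "WordsPerSentence": (sum(per_sentence) / len(per_sentence)
                             if per_sentence else 0.0),
        "LongestSentence": max(per_sentence, default=0),
        "Length": len(answer),  # assumption: length is counted in characters
    }


def log_likelihood(answer, background_counts):
    """Feature 5 as a unigram log likelihood: sum of C(w) * log P(w).

    P(w) is estimated from background_counts (word -> count) with add-one
    smoothing, which is an assumption made here so that unseen words do
    not produce log(0).
    """
    total = sum(background_counts.values())
    vocab = len(background_counts)
    ll = 0.0
    for word, count in Counter(tokenize(answer)).items():
        p = (background_counts.get(word, 0) + 1) / (total + vocab + 1)
        ll += count * math.log(p)
    return ll
```

Under this reading, a lower (more negative) log likelihood means the answer's vocabulary is less typical of the background corpus.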

StackOverflow – Activity


StackOverflow – Length


StackOverflow – Log Likelihood


StackOverflow – Characters Per Word


StackOverflow – Longest Sentence


StackOverflow – Words Per Sentence


StackOverflow: Overview of shallow features’ evolution


Shallow features: Observations

• Accepted answers tend to be:

• Longer

• Differ more from the community vocabulary

• Contain shorter words

• Have longer longest sentences

• Have more words per sentence


But how good are shallow features?

• 58% macro precision (our baseline)

• Possible reasons

1. Evolution of language characteristics

• Language becomes more eloquent

2. Variance is huge

3. Universal classifier looks unreachable, e.g.:

• SuperUser average length is 577

• Skeptics average length is 2,154


Proposed solution


Objectives

• Build a classifier which is:

1. Based solely on linguistic features

2. Robust

• Performs as well as classifiers that use user ratings (past knowledge) or answer ratings (future knowledge)

3. Universal

• Same classifier applicable to as many SE websites as possible (domain agnostic)


Feature discretisation: Example for Length

1. Group by question

2. Sort by Length in descending order

3. Use the rank within the question as the discretised feature (LengthD)

Question Id  Answer Id  Length  LengthD

1  2  200  2

1  3  150  3

1  4  250  1

5  6  150  1

5  7  100  2
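The grouping and ranking steps of the Length example can be sketched as follows. This is an illustration only: the dictionary keys and the function name are hypothetical, and breaking ties by input order (stable sort) is an assumption, since the slides do not say how ties are handled.

```python
from collections import defaultdict


def discretise_by_rank(answers, feature, new_name):
    """Store under new_name the 1-based rank of `feature` among the
    answers to the same question (largest value gets rank 1)."""
    by_question = defaultdict(list)
    for answer in answers:
        by_question[answer["question_id"]].append(answer)
    for group in by_question.values():
        # sorted() is stable, so ties keep their input order (assumption)
        ranked = sorted(group, key=lambda a: a[feature], reverse=True)
        for rank, answer in enumerate(ranked, start=1):
            answer[new_name] = rank
    return answers


# The five answers from the Length example
answers = [
    {"question_id": 1, "answer_id": 2, "Length": 200},
    {"question_id": 1, "answer_id": 3, "Length": 150},
    {"question_id": 1, "answer_id": 4, "Length": 250},
    {"question_id": 5, "answer_id": 6, "Length": 150},
    {"question_id": 5, "answer_id": 7, "Length": 100},
]
discretise_by_rank(answers, "Length", "LengthD")
```

After the call, the LengthD values for answers 2, 3, 4, 6 and 7 are 2, 3, 1, 1 and 2. The rank is relative to the competing answers of the same question, so it is comparable across websites whose typical answer lengths differ.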

Information Gain from Discretisation


Feature discretisation

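Category  Name  Information Gain

Linguistic  Length  0.0226

Linguistic  LongestSentence  0.0121

Linguistic  LL  0.0053

Linguistic  WordsPerSentence  0.0048

Linguistic  CharactersPerWord  0.0052

Linguistic Discretisation  LengthD  0.2168

Linguistic Discretisation  LongestSentenceD  0.1750

Linguistic Discretisation  LLD  0.1180

Linguistic Discretisation  WordsPerSentenceD  0.1404

Linguistic Discretisation  CharactersPerWordD  0.1162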


20x increase
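For reference, information gain for a discrete feature is the drop in label entropy once the feature value is known: IG = H(Y) - H(Y | X). The sketch below is a generic textbook computation under that definition. It is not the tooling that produced the figures on the slide, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """H(labels) - H(labels | values) for a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: rank 1 is always the accepted answer (label 1)
ranks = [2, 3, 1, 1, 2]
accepted = [0, 0, 1, 1, 0]
gain = information_gain(ranks, accepted)
```

In the toy data the rank determines the label completely, so the gain equals the full label entropy. A continuous feature such as raw Length would first need to be binned before this function applies.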

User and answer rating features


Category  Name  Information Gain

Other  Age  0.0539

Other  CreationDateD  0.1575

Other  AnswerCount  0.3270

User Rating  UserReputation  0.0836

User Rating  UserUpVotes  0.0535

User Rating  UserDownVotes  0.0412

User Rating  UserViews  0.0528

User Rating  UserUpDownVotes  0.0508

Answer rating  Score  0.0792

Answer rating  CommentCount  0.0286

Answer rating  ScoreRatio  0.4539

Evaluation


What are we evaluating?

1. Prediction

2. How good is it compared with the SOTA?

3. Generality


1. Prediction – Features used


Feature groups:

• Linguistic

• Linguistic Discretisation

• Other

• User Rating (Past Knowledge)

• Answer Rating (Future Knowledge)

1. Prediction

• Classifier was Alternating Decision Trees (ADT)

• Binary, boosting, numerical data

• Weka

• 10-fold cross-validation


Feature groups used: Linguistic, Linguistic Discretisation, Other

1. Prediction

SE Website P R FM AUC

stackoverflow.com 0.82 0.66 0.73 0.85

apple.stackexchange.com 0.84 0.68 0.75 0.86

askubuntu.com 0.84 0.74 0.79 0.88

drupal.stackexchange.com 0.87 0.79 0.83 0.89

electronics.stackexchange.com 0.79 0.65 0.71 0.84

english.stackexchange.com 0.77 0.52 0.62 0.83

gamedev.stackexchange.com 0.82 0.71 0.76 0.87

gaming.stackexchange.com 0.87 0.79 0.83 0.91

gis.stackexchange.com 0.85 0.73 0.78 0.87

math.stackexchange.com 0.85 0.74 0.79 0.87

mathoverflow.net 0.83 0.7 0.76 0.87

meta.stackoverflow.com 0.87 0.69 0.77 0.87

physics.stackexchange.com 0.86 0.71 0.78 0.88

programmers.stackexchange.com 0.76 0.4 0.52 0.84

serverfault.com 0.83 0.66 0.74 0.85

skeptics.stackexchange.com 0.87 0.83 0.85 0.91

stats.stackexchange.com 0.85 0.79 0.82 0.89

superuser.com 0.84 0.65 0.73 0.85

tex.stackexchange.com 0.87 0.77 0.82 0.88

unix.stackexchange.com 0.81 0.68 0.74 0.85

wordpress.stackexchange.com 0.88 0.8 0.84 0.89

Average 0.84 0.7 0.76 0.87
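The FM column can be cross-checked against P and R, and the Average row against the per-site values. The sketch below assumes that FM is the balanced F-measure (harmonic mean of precision and recall) and that the average is an unweighted mean over the 21 websites; neither is stated on the slide, but both are consistent with the tabulated numbers checked here.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean: every website counts equally, whatever its size."""
    return sum(values) / len(values)


# (P, R) pairs copied from three rows of the table
rows = {
    "stackoverflow.com": (0.82, 0.66),
    "english.stackexchange.com": (0.77, 0.52),
    "programmers.stackexchange.com": (0.76, 0.40),
}
fm = {site: round(f_measure(p, r), 2) for site, (p, r) in rows.items()}

# P column of the table, in row order
precisions = [0.82, 0.84, 0.84, 0.87, 0.79, 0.77, 0.82, 0.87, 0.85, 0.85, 0.83,
              0.87, 0.86, 0.76, 0.83, 0.87, 0.85, 0.84, 0.87, 0.81, 0.88]
average_p = round(macro_average(precisions), 2)
```

Because the mean is unweighted, StackOverflow, which holds most of the data, has no more influence on the Average row than any of the smaller websites.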


SE Website P R FM AUC

stackoverflow.com 0.82 0.66 0.73 0.85

Macro Average 0.84 0.7 0.76 0.87

2. Comparison with other solutions


Case  Features Used

1  Linguistic

2  Linguistic & Discretisation

3  Linguistic & Discretisation & Other

4  Linguistic & Other & User Rating (no discretisation)

5  … Rating (with discretisation)

6  All features (Answer and User Rating with discretisation)

Comparison

Case  Features Used  P  R  FM  AUC

1  Linguistic  0.58  0.60  0.56  0.60

2  Linguistic & Discretisation  0.81  0.70  0.74  0.84

3  Linguistic & Discretisation & Other  0.84  0.7  0.76  0.87

4  Linguistic & Other & User Rating (no discretisation)  0.82  0.69  0.75  0.86

5  … Rating (with discretisation)  0.82  0.72  0.77  0.88

6  All features (Answer and User Rating with discretisation)  0.88  0.85  0.86  0.94


3. Generality

• Leave-one-out

• Trained a classifier for each SE website based on all other SE websites

(StackOverflow was evaluated but was excluded from training due to its size)


Training  P  R  FM  AUC

Macro average based on self-training (results from the first part of evaluation)  0.84  0.7  0.76  0.87

Leave-one-out  0.83  0.7  0.76  0.87
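The leave-one-out protocol described above can be sketched as a generator of train/test splits over websites. This is an illustration under stated assumptions: the function name and the parameter `exclude_from_training` are hypothetical, and the actual training and scoring steps (done with Weka in the talk) are left out.

```python
def leave_one_site_out(sites, exclude_from_training=("stackoverflow.com",)):
    """Yield (training_sites, test_site) pairs.

    Every site is used once as the test set. The training set is all
    other sites, minus those listed in exclude_from_training
    (StackOverflow is tested but never trained on, due to its size).
    """
    for test_site in sites:
        training_sites = [
            site for site in sites
            if site != test_site and site not in exclude_from_training
        ]
        yield training_sites, test_site


sites = ["stackoverflow.com", "superuser.com", "askubuntu.com"]
splits = list(leave_one_site_out(sites))
```

Each classifier is then scored only on a website it has never seen, so a result close to the self-training average suggests that the features transfer across domains.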

Discussion & Conclusion


Best Answer prediction

• Community feedback on the answers remains the best way to determine the best answer, but

• Discretisation reveals a lot more information

• Content features, even shallow ones, CAN be very informative

• Independent from past (not always available) knowledge

• Independent from future knowledge

• Web application/service is under development



Best Answer Prediction

User & answer rating

Linguistic features

?

Proposed solution

Thank you


http://xkcd.com/386/