
It’s all in the Content: State of the art Best Answer Prediction based on Discretisation of Shallow Linguistic Features


George Gkotsis, Karen Stepanyan, Carlos Pedrinaci, John Domingue, Maria Liakata*

Knowledge Media Institute, The Open University
*Department of Computer Science, University of Warwick

23-26 June 2014, ACM Web Science Conference 2014 (WebSci14)

Outline

Motivation

Problem description

Proposed solution

Evaluation

Discussion & Conclusion

Motivation

Questions on social networking sites

  • Recommendations & opinions
  • Authoritative responses
  • Expert & empirical knowledge

Queries on CQA

Why best answer prediction?
  • Information overload
  • Increase awareness in the community
  • Answer questions more efficiently
  • One way to study social media reception

Plus:
  • Finding experts in communities
  • Study of language use
  • Trend analysis

Problem description

Best answer prediction in Social Q&A
  • Binary classification problem

Is it solved?
  • Yes, partially

Current solutions depend on:
  • Answer Ratings
      • Score, #comments
      • Knowledge is Future & Unknown
  • User Ratings
      • User Reputation, UpVotes etc
      • Preferential attachment
      • Knowledge is Past & Not always available

State of the art solutions

“we observe significant assortativity in the reputations of co-answerers, relationships between reputation and answer speed, and that the probability of an answer being chosen as the best one strongly depends on temporal characteristics of answer arrivals.”

Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec. Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow. KDD 2012.

State of the art solutions (cont.)

“When available, scoring (or rating) features improve prediction results significantly, which demonstrates the value of community feedback and reputation for identifying valuable answers.”

Grégoire Burel, Yulan He, Harith Alani. Automatic Identification of Best Answers in Online Enquiry Communities. ESWC 2012.

State of the art solutions: Summary

Our solution

StackExchange network

SE is all about getting answers, it's not a discussion forum, there's no chit-chat

20 June 2014:
  • 123 Q&A sites
  • 5,622,330 users
  • 9.5 million questions
  • 16.3 million answers
  • 9.3 million visits per day

Training Dataset
  • September 2013 dump
  • StackOverflow & 20 of the most active SE websites
  • Questions with Accepted Answers

  • 4,366,662 Non Accepted Answers
  • 3,939,224 Accepted Answers

SE websites

Shallow Linguistic features

Long history, coming from studies on readability
  • Average number of characters per word
  • Average number of words per sentence
  • Number of words in the longest sentence
  • Answer length
  • Log Likelihood:

Pitler and Nenkova, 2008

StackOverflow plots (figures not reproduced):
  • Activity
  • Length
  • Log Likelihood
  • Characters Per Word
  • Longest Sentence
  • Words Per Sentence
  • Overview of shallow features evolution
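As a rough illustration of how the five shallow features listed above might be computed, here is a minimal Python sketch. The tokenizer, the sentence splitter, the choice to measure Length in characters, and the add-one smoothing are all assumptions made for this example. The slides do not spell out the log likelihood formula, so the sketch uses the general unigram form (the sum, over the words of an answer, of the word count times the log probability of that word in the community vocabulary).

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z0-9']+")
SENTENCE_RE = re.compile(r"[.!?]+")


def tokenize(text):
    """Lowercased word tokens; a deliberately simple tokenizer."""
    return [w.lower() for w in WORD_RE.findall(text)]


def split_sentences(text):
    """Naive split on '.', '!' and '?' (an assumption, not the authors' method)."""
    parts = (p.strip() for p in SENTENCE_RE.split(text))
    return [p for p in parts if WORD_RE.search(p)]


def shallow_features(answer, vocabulary_counts):
    """Compute the five shallow features for one answer.

    vocabulary_counts maps each word to its frequency in the community corpus.
    """
    words = tokenize(answer)
    sentences = [tokenize(s) for s in split_sentences(answer)]
    total = sum(vocabulary_counts.values())
    vocab_size = len(vocabulary_counts)
    # Unigram log likelihood; add-one smoothing is an assumption of this sketch.
    log_likelihood = sum(
        count * math.log((vocabulary_counts.get(w, 0) + 1) / (total + vocab_size + 1))
        for w, count in Counter(words).items()
    )
    return {
        "Length": len(answer),  # measured in characters (assumption)
        "CharactersPerWord": sum(map(len, words)) / len(words) if words else 0.0,
        "WordsPerSentence": len(words) / len(sentences) if sentences else 0.0,
        "LongestSentence": max((len(s) for s in sentences), default=0),
        "LL": log_likelihood,
    }


# Hypothetical community vocabulary and answer, for illustration only.
community = Counter(tokenize("use a list it is simple use a dict"))
print(shallow_features("Use a list. It is fast and simple!", community))
```

In practice the community vocabulary would be built from all posts of one SE website, so that LL reflects how far an answer departs from that community's usual wording.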

Shallow features: Observations

Accepted answers tend to:
  • Be longer
  • Differ more from the community vocabulary
  • Contain shorter words
  • Have a longer longest sentence
  • Have more words per sentence

But how good are shallow features?
  • 58% macro precision (our baseline)

Possible reasons
  • Evolution of language characteristics
      • Language becomes more eloquent
  • Variance is huge
      • Universal classifier looks unreachable, e.g.:
          SuperUser average length is 577
          Skeptics average length is 2,154

Proposed solution

Objectives

Build a classifier which is:

  • Based on linguistic features solely
  • Robust: performs as well as other classifiers that use user ratings (past knowledge) or answer ratings (future knowledge)
  • Universal: the same classifier is applicable to as many SE websites as possible (domain agnostic)

Feature discretisation

Example for Length

Group by question:

Question Id   Answer Id   Length
1             2           200
1             3           150
1             4           250
5             6           150
5             7           100

Sort by Length in descending order; the rank within each question becomes LengthD:

Question Id   Answer Id   Length   LengthD
1             4           250      1
1             2           200      2
1             3           150      3
5             6           150      1
5             7           100      2

Information Gain from Discretisation
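The discretisation step, and the way information gain can be measured on the resulting rank feature, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the field names (`question_id`, `accepted`), the tie-breaking by input order, and the toy `accepted` labels are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def discretise_by_rank(answers, feature):
    """Add '<feature>D': the rank of each answer within its question when
    answers are sorted by the feature in descending order (1 = largest).
    Ties keep their input order (an assumption of this sketch)."""
    by_question = defaultdict(list)
    for answer in answers:
        by_question[answer["question_id"]].append(answer)
    for group in by_question.values():
        ranked = sorted(group, key=lambda a: a[feature], reverse=True)
        for rank, answer in enumerate(ranked, start=1):
            answer[feature + "D"] = rank
    return answers


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(answers, feature, label="accepted"):
    """H(label) - H(label | feature) for a discrete feature."""
    labels = [a[label] for a in answers]
    groups = defaultdict(list)
    for a in answers:
        groups[a[feature]].append(a[label])
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data modelled on the Length example; the 'accepted' labels are invented.
answers = [
    {"question_id": 1, "answer_id": 2, "Length": 200, "accepted": False},
    {"question_id": 1, "answer_id": 3, "Length": 150, "accepted": False},
    {"question_id": 1, "answer_id": 4, "Length": 250, "accepted": True},
    {"question_id": 5, "answer_id": 6, "Length": 150, "accepted": True},
    {"question_id": 5, "answer_id": 7, "Length": 100, "accepted": False},
]
discretise_by_rank(answers, "Length")
print([a["LengthD"] for a in answers])
print(information_gain(answers, "LengthD"))
```

The point of ranking within each question is that a length of 150 can be the longest answer to one question and the shortest to another, so the rank carries information that the raw value hides.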

Feature discretisation

Category                    Name                 Information Gain
Linguistic                  Length               0.0226
Linguistic                  LongestSentence      0.0121
Linguistic                  LL                   0.0053
Linguistic                  WordsPerSentence     0.0048
Linguistic                  CharactersPerWord    0.0052
Linguistic Discretisation   LengthD              0.2168
Linguistic Discretisation   LongestSentenceD     0.1750
Linguistic Discretisation   LLD                  0.1180
Linguistic Discretisation   WordsPerSentenceD    0.1404
Linguistic Discretisation   CharactersPerWordD   0.1162

20x increase

User and answer rating features

Category        Name              Information Gain
Other           Age               0.0539
Other           CreationDateD     0.1575
Other           AnswerCount       0.3270
User Rating     UserReputation    0.0836
User Rating     UserUpVotes       0.0535
User Rating     UserDownVotes     0.0412
User Rating     UserViews         0.0528
User Rating     UserUpDownVotes   0.0508
Answer Rating   Score             0.0792
Answer Rating   CommentCount      0.0286
Answer Rating   ScoreRatio        0.4539

Evaluation

What are we evaluating?

Prediction

How good is it compared with the SOTA?

Generality

1. Prediction

Features used:
  • Linguistic
  • Linguistic Discretisation
  • Other
  • User Rating (past knowledge)
  • Answer Rating (future knowledge)

Classifier:
  • Alternating Decision Trees (ADT)
  • Binary, boosting, numerical data
  • Weka
  • 10-fold cross-validation
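The evaluation protocol (10-fold cross-validation, reporting precision, recall and F-measure) can be sketched as follows. This is not the Weka setup used in the talk: the classifier here is a hypothetical stand-in rule ("predict accepted when LengthD is 1") rather than an alternating decision tree, the metrics are computed for the accepted class over the pooled held-out predictions, and all names are chosen for illustration.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def precision_recall_f(y_true, y_pred, positive=True):
    """Precision, recall and F-measure for the positive (accepted) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def cross_validate(rows, labels, train, predict, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and score the pooled
    held-out predictions. train(rows, labels) returns a model;
    predict(model, rows) returns one predicted label per row."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        y_pred.extend(predict(model, [rows[i] for i in held_out]))
        y_true.extend(labels[i] for i in held_out)
    return precision_recall_f(y_true, y_pred)


# Stand-in "classifier": no training, accept the top-ranked answer by length.
def train_rule(rows, labels):
    return None


def predict_rule(model, rows):
    return [row["LengthD"] == 1 for row in rows]
```

A real run would replace `train_rule` and `predict_rule` with a learned model; the fold handling and the metric definitions stay the same.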

1. Prediction

Features used: Linguistic, Linguistic Discretisation, Other

SE Website                       P     R     FM    AUC
stackoverflow.com                0.82  0.66  0.73  0.85
apple.stackexchange.com          0.84  0.68  0.75  0.86
askubuntu.com                    0.84  0.74  0.79  0.88
drupal.stackexchange.com         0.87  0.79  0.83  0.89
electronics.stackexchange.com    0.79  0.65  0.71  0.84
english.stackexchange.com        0.77  0.52  0.62  0.83
gamedev.stackexchange.com        0.82  0.71  0.76  0.87
gaming.stackexchange.com         0.87  0.79  0.83  0.91
gis.stackexchange.com            0.85  0.73  0.78  0.87
math.stackexchange.com           0.85  0.74  0.79  0.87
mathoverflow.net                 0.83  0.70  0.76  0.87
meta.stackoverflow.com           0.87  0.69  0.77  0.87
physics.stackexchange.com        0.86  0.71  0.78  0.88
programmers.stackexchange.com    0.76  0.40  0.52  0.84
serverfault.com                  0.83  0.66  0.74  0.85
skeptics.stackexchange.com       0.87  0.83  0.85  0.91
stats.stackexchange.com          0.85  0.79  0.82  0.89
superuser.com                    0.84  0.65  0.73  0.85
tex.stackexchange.com            0.87  0.77  0.82  0.88
unix.stackexchange.com           0.81  0.68  0.74  0.85
wordpress.stackexchange.com      0.88  0.80  0.84  0.89
Macro average                    0.84  0.70  0.76  0.87

2. Comparison with other solutions

Feature categories: Linguistic, Linguistic Discretisation, Other, User Rating, Answer Rating

Case  Features Used                                               P     R     FM    AUC
1     Linguistic                                                  0.58  0.60  0.56  0.60
2     Linguistic & Discretisation                                 0.81  0.70  0.74  0.84
3     Linguistic & Discretisation & Other                         0.84  0.70  0.76  0.87
4     Linguistic & Other & User Rating (no discretisation)        0.82  0.69  0.75  0.86
5     Linguistic & Other & User Rating (with discretisation)      0.82  0.72  0.77  0.88
6     All features (Answer and User Rating with discretisation)   0.88  0.85  0.86  0.94

3. Generality

Leave-one-out
  • Trained a classifier for each SE website based on all other SE websites
  • (StackOverflow was evaluated but was excluded from training due to its size)

                                                                                    P     R     FM    AUC
Macro average based on self-training (results from the first part of evaluation)   0.84  0.70  0.76  0.87
Leave-one-out                                                                       0.83  0.70  0.76  0.87

Discussion & Conclusion

Best Answer prediction
  • Community feedback on the answers remains the best way for determining the best answer, but:
      • Discretisation reveals a lot more information
      • Content features, even shallow ones, CAN be very informative
          • Independent from past (not always available) knowledge
          • Independent from future knowledge
  • Web application/service is under development

Best Answer Predict
