Blog Post Opinion Retrieval
The aim is to develop an effective retrieval function that ranks posts by the likelihood that they express an opinion about a given topic.
A Common Approach to Opinion Retrieval
1. Rank posts by relevance and select the highest-ranking posts
2. Calculate an opinion score for each document
3. Combine the opinion and relevance scores
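The three-stage pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the top-k cutoff, and the linear interpolation weight `alpha` are all assumptions for the example.

```python
def opinion_rerank(docs, relevance, opinion_score, k=3, alpha=0.5):
    """Sketch of the common opinion-retrieval pipeline.

    docs: iterable of document ids
    relevance: dict mapping doc id -> relevance score
    opinion_score: callable mapping doc id -> opinion score
    k: how many top-relevance documents to keep (illustrative cutoff)
    alpha: interpolation weight between relevance and opinion
    """
    # Stage 1: rank by relevance, keep the highest-ranking posts.
    top = sorted(docs, key=lambda d: relevance[d], reverse=True)[:k]
    # Stage 2 + 3: compute an opinion score and combine it with relevance.
    combined = {d: alpha * relevance[d] + (1 - alpha) * opinion_score(d)
                for d in top}
    return sorted(top, key=lambda d: combined[d], reverse=True)
```

A highly relevant but non-opinionated post (like "d1" below) can be demoted by opinionated but slightly less relevant posts.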
Calculate opinion score for each document
• Lexicon-based, using resources such as:
  • General Inquirer (Stone et al., 1966)
  • OpinionFinder lexicon (Wiebe & Riloff, 2005)
  • SentiWordNet (Esuli & Sebastiani, 2006)
  • etc.
• Classification-based
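A minimal sketch of the lexicon-based approach: score a document by the fraction of its tokens that appear in an opinion lexicon. The tiny lexicon below is illustrative only; real systems draw on resources like General Inquirer, OpinionFinder, or SentiWordNet.

```python
# Illustrative opinion lexicon (a real one has thousands of entries).
OPINION_LEXICON = {"good", "bad", "nice", "poor", "fortunate", "wrong"}

def lexicon_opinion_score(text, lexicon=OPINION_LEXICON):
    """Fraction of tokens that are opinion words (naive whitespace tokenizer)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in lexicon)
    return hits / len(tokens)
```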
Opinion Lexicon
[Table of lexicon entries: opinion terms (fortunate, nice, bad, good, poor, wrong, spoiled, ...) with opinion probabilities p(o|t) between 0.88 and 1.0, estimated with an EM algorithm from SentiWordNet and the Amazon.com Review and Specification Corpus]
Lee et al., KLE at TREC 2008
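With such a probabilistic lexicon, each term t carries a probability p(o|t) of expressing an opinion, and a simple document score averages p(o|t) over the tokens. A sketch, with illustrative probability values that are not the actual lexicon entries:

```python
# Illustrative p(o|t) values; the real lexicon is estimated with EM
# from SentiWordNet and the Amazon.com Review and Specification Corpus.
P_OPINION = {"fortunate": 1.0, "nice": 0.96, "good": 0.95, "bad": 0.93}

def prob_opinion_score(tokens, p=P_OPINION):
    """Average opinion probability over a document's tokens.

    Terms missing from the lexicon contribute 0.
    """
    if not tokens:
        return 0.0
    return sum(p.get(t, 0.0) for t in tokens) / len(tokens)
```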
Different Kernels
• No statistically significant difference between kernels when each uses its best parameter
• The Laplace kernel is less sensitive to the parameter setting
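The kernels being compared weight an opinion term by its distance to a query term. Two common choices are sketched below (the parameter name `sigma` and its default are illustrative): the Laplace kernel decays exponentially in the absolute distance, the Gaussian in the squared distance, so the Laplace kernel keeps more weight at long range.

```python
import math

def laplace_kernel(dist, sigma=10.0):
    """Laplace proximity kernel: exp(-|dist| / sigma)."""
    return math.exp(-abs(dist) / sigma)

def gaussian_kernel(dist, sigma=10.0):
    """Gaussian proximity kernel: exp(-dist^2 / (2 * sigma^2))."""
    return math.exp(-dist * dist / (2 * sigma * sigma))
```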
Smoothed Proximity Model
• Captures proximity at different ranges
• Useful in documents where exact query terms may be rare
• Opinion expressions often refer to the query only indirectly, via anaphoric expressions
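A sketch of how proximity can discount opinion evidence: each occurrence of an opinion term contributes its lexicon weight, attenuated by a kernel of its distance to the nearest query-term occurrence. This is an assumed formulation for illustration, not the paper's exact model.

```python
import math

def proximity_opinion_score(tokens, query_terms, lexicon, sigma=5.0):
    """Opinion score discounted by distance to the nearest query term.

    tokens: document as a list of tokens
    query_terms: set of query terms
    lexicon: dict mapping opinion term -> weight (e.g. p(o|t))
    sigma: Laplace kernel width (illustrative default)
    """
    q_pos = [i for i, t in enumerate(tokens) if t in query_terms]
    if not q_pos:
        return 0.0  # no query term in the document
    score = 0.0
    for i, t in enumerate(tokens):
        w = lexicon.get(t)
        if w is None:
            continue
        d = min(abs(i - j) for j in q_pos)  # distance to nearest query term
        score += w * math.exp(-d / sigma)   # Laplace kernel discount
    return score
```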
TREC Baselines
The standard TREC relevance baselines supply the first stage (rank posts by relevance, select the highest-ranking posts); the opinion scoring and score combination steps are then applied on top of them.
Different Relevance-Opinion Scoring Methods
TREC baseline 4
Statistically significant over the TREC relevance baselines
Results over Five Standard TREC Baselines
Statistically significant over the TREC relevance baselines
Statistically significant over the non-proximity opinion baseline
Per-Topic Performance Analysis
Topics: Carmax, Yojimbo, TomTom, Picasa, Mark Warner for President, Iceland European Union, Sheep and Wool Festival
Results of the Best Runs on Standard Baseline 4
Statistically significant over the TREC relevance baselines
Conclusions
• A novel probabilistic model for blog opinion retrieval was proposed
• Proximity of opinion terms to query terms is a good indicator of their relatedness
• The Laplace kernel was proposed, and the effect of different kernels was studied
• Normalization can be important, and the best normalization scheme depends on the underlying relevance retrieval baseline
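Since relevance and opinion scores come from different scorers, they must be put on a comparable scale before combination. Two common schemes are sketched below (min-max and z-score normalization; the paper's exact choice per baseline is not shown here):

```python
def min_max_normalize(scores):
    """Rescale scores linearly into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def z_normalize(scores):
    """Center scores at 0 with unit (population) standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    if std == 0:
        return [0.0] * n
    return [(s - mean) / std for s in scores]
```

Min-max is sensitive to outliers at the extremes, while z-score normalization depends on the score distribution; which works best can vary with the underlying relevance baseline, consistent with the conclusion above.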