Proximity-Based Opinion Retrieval

Mark Carman, Fabio Crestani, Shima Gerani


What is Blog Post Opinion Retrieval?

Blog Post Opinion Retrieval

Aims at developing an effective retrieval function that ranks posts according to the likelihood that they express an opinion about a particular topic.


Relevant Opinion


A Common Approach to Opinion Retrieval

• Step 1: Rank posts by relevance and select the highest-ranking posts

• Step 2: Calculate an opinion score for each document

• Step 3: Combine the opinion and relevance scores
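The three steps above can be sketched as a re-ranking pipeline. This is a minimal sketch of the common two-stage approach, not the model proposed in this talk: it assumes relevance and opinion scores are already on comparable scales and combines them by linear interpolation. The names `two_stage_rank`, `top_k`, and the interpolation weight `lam` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def two_stage_rank(
    relevance: Dict[str, float],
    opinion_score: Callable[[str], float],
    top_k: int = 1000,
    lam: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank the top-k relevance results by a linear mix of relevance and opinion."""
    # Step 1: rank posts by relevance and keep the top_k highest-scoring ones.
    top = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Steps 2 and 3: score each kept post for opinion and interpolate (hypothetical lam).
    combined = [
        (doc, (1.0 - lam) * rel + lam * opinion_score(doc)) for doc, rel in top
    ]
    return sorted(combined, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rel = {"post-a": 0.9, "post-b": 0.8, "post-c": 0.1}
    opi = {"post-a": 0.0, "post-b": 1.0, "post-c": 1.0}
    print(two_stage_rank(rel, lambda d: opi[d], top_k=2))
```

In this sketch a post dropped in Step 1 can never be recovered by a high opinion score, so the quality of the relevance baseline bounds the final ranking.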


• Lexicon-based

  • General Inquirer (Stone et al., 1966)

  • OpinionFinder lexicon (Wiebe & Riloff, 2005)

  • SentiWordNet (Esuli & Sebastiani, 2006)

  • etc.

• Classification-based
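A lexicon-based opinion score can be as simple as averaging a per-term opinion weight over the post. The sketch below assumes a toy lexicon with made-up weights (not values from General Inquirer, OpinionFinder, or SentiWordNet) and whitespace tokenization; `lexicon_opinion_score` and `TOY_LEXICON` are hypothetical names. It ignores where the opinion words appear relative to the query, which is the problem the following slides raise.

```python
from typing import Dict, List

# Hypothetical toy lexicon: term -> opinion weight in [0, 1] (made-up values).
TOY_LEXICON: Dict[str, float] = {"great": 0.9, "awful": 0.8, "boring": 0.7}


def lexicon_opinion_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Average opinion weight over all tokens; unknown tokens count as 0."""
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)


if __name__ == "__main__":
    post = "munich was great but crash was boring".split()
    # Both "great" and "boring" raise the score, whichever film they describe.
    print(lexicon_opinion_score(post, TOY_LEXICON))
```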


A relevant blog post about “Munich”

So, what is the problem?

The same post is also relevant to “Brokeback Mountain” and “Crash”, so its opinion expressions are not necessarily about “Munich”.

Challenges

• Computing a query-specific opinion score

• Producing the final ranking


Topic Related Opinion Retrieval

O: document expresses an opinion about the query

Relevance and opinion components
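The split into a relevance and an opinion component can be written as follows. This is a sketch under assumed notation (d a post, q the query, o the event O that the post expresses an opinion about the query), obtained from Bayes' rule and the chain rule:

```latex
p(d \mid o, q) = \frac{p(o, q \mid d)\, p(d)}{p(o, q)}
  \;\propto\; p(d)\, \underbrace{p(q \mid d)}_{\text{relevance}}\; \underbrace{p(o \mid q, d)}_{\text{opinion}}
```

The denominator does not depend on d, so it can be dropped for ranking; p(d) acts as a document prior.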

Proximity-based estimate of the opinion component

Opinion Lexicon

Terms t (examples): fortunate, nice, bad, good, poor, wrong, spoiled, ...

Weights p(o|t) (examples): 1.0, 0.96, 0.95, 0.98, 0.89, 0.88, 0.93, ...

Built with an EM algorithm from SentiWordNet and the Amazon.com Review and Specification Corpus (Lee et al., KLE at TREC 2008)

Proximity-based Model: Differentiating a Document's Positions

Opinion Density of a Document's Position

• Is position i referring to the query?

• How much is position i opinionated?
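The two questions can be combined by summing over positions. As a sketch under assumed notation (positions i = 1, ..., |d|; p(i|q,d) for how much position i refers to the query; p(o|i,d) for the opinion density at i), and assuming the opinion at a fixed position does not depend further on q:

```latex
p(o \mid q, d) = \sum_{i=1}^{|d|} p(i \mid q, d)\, p(o \mid i, q, d)
  \;\approx\; \sum_{i=1}^{|d|} p(i \mid q, d)\, p(o \mid i, d)
```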

Opinion Density of a Document's Position

Components: opinion lexicon and proximity kernel
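One way to combine the lexicon and the kernel, sketched here with an assumed kernel k(i, j) that decays with the distance |i - j| and lexicon weights p(o|t_j) for the term t_j at position j, is a kernel-weighted average of the lexicon weights around position i:

```latex
p(o \mid i, d) = \sum_{j=1}^{|d|} p(o \mid t_j)\, \frac{k(i, j)}{\sum_{j'=1}^{|d|} k(i, j')}
```

Normalizing the kernel weights at each position keeps p(o|i,d) in [0, 1]; whether the original model normalizes in exactly this way is an assumption here.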

Opinion Density: p(o|i,d)

Figure: example terms “nice” and “heavy”

Propagated Opinion

Figure: opinion propagated from the example terms “nice” and “heavy”
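The propagation can be sketched in Python. This minimal sketch assumes a Gaussian kernel, per-position normalization of the kernel weights, and a toy lexicon in which only “nice” carries weight; the weights, `sigma`, and the function names are hypothetical, not the lexicon or settings used in the experiments.

```python
import math
from typing import Dict, List


def gaussian_kernel(i: int, j: int, sigma: float) -> float:
    """Proximity weight between positions i and j (Gaussian shape, assumed)."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def opinion_density(
    tokens: List[str], lexicon: Dict[str, float], sigma: float
) -> List[float]:
    """p(o|i,d) for every position i: kernel-weighted average of p(o|t_j)."""
    n = len(tokens)
    densities = []
    for i in range(n):
        weights = [gaussian_kernel(i, j, sigma) for j in range(n)]
        total = sum(weights)  # normalizes so the density stays in [0, 1]
        propagated = sum(w * lexicon.get(t, 0.0) for w, t in zip(weights, tokens))
        densities.append(propagated / total)
    return densities


if __name__ == "__main__":
    toy_lexicon = {"nice": 1.0}  # hypothetical weight; "heavy" gets 0 here
    tokens = ["nice", "car", "but", "very", "heavy"]
    for tok, dens in zip(tokens, opinion_density(tokens, toy_lexicon, sigma=1.0)):
        print(f"{tok:>6s} {dens:.3f}")
```

The density decays with distance from “nice”, so positions next to it receive more opinion than “heavy” at the far end of the sentence.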

Figure: occurrences of the terms “brokeback”, “mountain”, and “munich” in the example post

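Proximity-based Opinion Probability

• Avg: average the opinion density over the positions of the query terms

• Max: take the maximum opinion density over those positions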
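Given per-position densities, the Avg and Max variants reduce to simple aggregations over the positions where a query term occurs. A minimal sketch, assuming exact (case-insensitive) term matching and a score of 0 for posts with no query-term occurrence; both choices are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence, Set


def query_positions(tokens: Sequence[str], query_terms: Set[str]) -> List[int]:
    """Positions i at which some query term occurs (exact, case-insensitive match)."""
    terms = {t.lower() for t in query_terms}
    return [i for i, tok in enumerate(tokens) if tok.lower() in terms]


def avg_opinion(density: Sequence[float], positions: Sequence[int]) -> float:
    """Avg: mean opinion density over query-term positions."""
    if not positions:
        return 0.0
    return sum(density[i] for i in positions) / len(positions)


def max_opinion(density: Sequence[float], positions: Sequence[int]) -> float:
    """Max: highest opinion density at any query-term position."""
    return max((density[i] for i in positions), default=0.0)


if __name__ == "__main__":
    tokens = ["Munich", "was", "tense", "unlike", "Crash", "and", "Munich"]
    density = [0.6, 0.5, 0.4, 0.2, 0.1, 0.3, 0.7]  # made-up p(o|i,d) values
    pos = query_positions(tokens, {"munich"})
    print(pos, avg_opinion(density, pos), max_opinion(density, pos))
```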


Different Kernels


• No statistically significant difference between kernels using the best parameter for each.

• The Laplace kernel is less sensitive to its parameter
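The kernel shapes below are ones commonly used for proximity weighting (for example in positional language models), with the Laplace kernel added. Each maps a distance |i - j| and a width parameter `sigma` to a weight in [0, 1]; the exact parameterizations are assumptions for illustration and may differ from those used in the experiments.

```python
import math
from typing import Callable, Dict


def gaussian(dist: float, sigma: float) -> float:
    return math.exp(-(dist ** 2) / (2.0 * sigma ** 2))


def laplace(dist: float, sigma: float) -> float:
    return math.exp(-abs(dist) / sigma)


def triangle(dist: float, sigma: float) -> float:
    return max(0.0, 1.0 - abs(dist) / sigma)


def cosine(dist: float, sigma: float) -> float:
    if abs(dist) > sigma:
        return 0.0
    return 0.5 * (1.0 + math.cos(abs(dist) * math.pi / sigma))


def circle(dist: float, sigma: float) -> float:
    if abs(dist) > sigma:
        return 0.0
    return math.sqrt(1.0 - (dist / sigma) ** 2)


def passage(dist: float, sigma: float) -> float:
    # Rectangular window: every position within sigma counts equally.
    return 1.0 if abs(dist) <= sigma else 0.0


KERNELS: Dict[str, Callable[[float, float], float]] = {
    "gaussian": gaussian,
    "laplace": laplace,
    "triangle": triangle,
    "cosine": cosine,
    "circle": circle,
    "passage": passage,
}

if __name__ == "__main__":
    for name, k in KERNELS.items():
        print(name, [round(k(d, 5.0), 3) for d in (0, 2, 5, 10)])
```

Unlike the bounded kernels, the Gaussian and Laplace kernels never reach zero, and beyond a distance of 2·sigma (in this parameterization) the Laplace kernel decays more slowly than the Gaussian.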


Smoothed Proximity Model

• Captures proximity at different ranges

• Helps in documents where the exact query terms are rare

• Helps when opinion expressions refer to q indirectly, via anaphoric expressions
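One way to sketch the smoothed model: instead of looking only at exact query-term positions, spread each query-term occurrence over nearby positions with a kernel to obtain p(i|q,d), then average the opinion density under that distribution. This assumes a Gaussian kernel and normalization over all positions; the function names and `sigma` are hypothetical.

```python
import math
from typing import List, Sequence, Set


def smoothed_query_weights(
    tokens: Sequence[str], query_terms: Set[str], sigma: float
) -> List[float]:
    """p(i|q,d): kernel-smoothed query-term density, normalized to sum to 1."""
    hits = [j for j, tok in enumerate(tokens) if tok in query_terms]
    if not hits:
        return [0.0] * len(tokens)  # no query term: no evidence (assumption)
    raw = [
        sum(math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in hits)
        for i in range(len(tokens))
    ]
    total = sum(raw)
    return [r / total for r in raw]


def smoothed_opinion(density: Sequence[float], weights: Sequence[float]) -> float:
    """p(o|q,d) = sum_i p(i|q,d) * p(o|i,d)."""
    return sum(w * o for w, o in zip(weights, density))


if __name__ == "__main__":
    tokens = ["munich", "was", "gripping", "and", "tense"]
    density = [0.2, 0.4, 0.9, 0.5, 0.6]  # made-up p(o|i,d) values
    for sigma in (0.1, 1.0, 5.0):
        w = smoothed_query_weights(tokens, {"munich"}, sigma)
        print(sigma, round(smoothed_opinion(density, w), 3))
```

With a very narrow kernel the weights concentrate on the exact query-term positions and the score approaches the Avg variant; widening the kernel lets opinion near, but not at, the query term contribute.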

Relevance Retrieval Step

TREC Baselines




Relevance and opinion components

Estimate the relevance component

Relevance Component

Relevance Probability

Different Ways of Using the Relevance Score (TREC baseline 4)
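Because relevance baselines produce scores on different scales, the scores need to be mapped to something usable as p(q|d) before multiplying by the opinion probability. The two normalizations below are illustrative assumptions, not necessarily the ones compared in the experiments: one divides non-negative scores by their sum, the other treats scores as log-probabilities and applies a softmax. A uniform document prior is also assumed, and the function names are hypothetical.

```python
import math
from typing import Dict


def linear_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide non-negative scores by their total (assumes scores >= 0)."""
    total = sum(scores.values())
    if total <= 0.0:
        return {d: 1.0 / len(scores) for d in scores}
    return {d: s / total for d, s in scores.items()}


def softmax_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Treat scores as log-probabilities; subtract the max for numerical stability."""
    if not scores:
        return {}
    top = max(scores.values())
    exps = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


def relevant_opinion_scores(
    rel_prob: Dict[str, float], opinion_prob: Dict[str, float]
) -> Dict[str, float]:
    """p(q|d) * p(o|q,d) for each post (uniform document prior assumed)."""
    return {d: p * opinion_prob.get(d, 0.0) for d, p in rel_prob.items()}


if __name__ == "__main__":
    raw = {"post-a": -4.1, "post-b": -4.3, "post-c": -6.0}  # e.g. log-likelihood scores
    opinion = {"post-a": 0.1, "post-b": 0.6, "post-c": 0.9}
    ranked = sorted(
        relevant_opinion_scores(softmax_normalize(raw), opinion).items(),
        key=lambda kv: kv[1],
        reverse=True,
    )
    print(ranked)
```

Since the final score is a product, different normalizations of the same relevance run can order the posts differently.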

Different Relevant Opinion Scoring Methods (TREC baseline 4)

Statistically significant over the TREC relevance baselines

Results over five standard TREC baselines


Statistically significant over the non-proximity opinion baseline

Per-Topic Performance Analysis

Topics: Carmax, Yojimbo, TomTom, Picasa, Mark Warner for President, Iceland European Union, Sheep and Wool Festival

Results of the best runs on standard baseline 4



Conclusions

• A novel probabilistic model for blog opinion retrieval was proposed

• Proximity of opinion to query terms is a good indicator of their relatedness

• The Laplace kernel was proposed and the effect of different kernels was studied

• Normalization of the relevance score can be important, and the best normalization depends on the underlying relevance retrieval baseline

Thanks!

{shima.gerani, mark.carman, fabio.crestani}@usi.ch
