Proximity-Based Opinion Retrieval

Mark Carman, Fabio Crestani, Shima Gerani

2

What is Blog Post Opinion Retrieval?

3

Blog Post Opinion Retrieval

Blog post opinion retrieval aims to develop an effective retrieval function that ranks posts by the likelihood that they express an opinion about a particular topic.

4

Relevant Opinion

8

A Common Approach to Opinion Retrieval

Rank posts by relevance, select the highest ranking posts

Calculate opinion score for each document

Combine the opinion and relevance scores
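
The three steps on this slide can be sketched as a small re-ranking pipeline. This is a minimal illustration, not the authors' implementation: the function name, the `top_k` cutoff, and the plain product used to combine the two scores are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


def rerank_by_opinion(
    relevance: Dict[str, float],
    opinion_score: Callable[[str], float],
    top_k: int = 1000,
) -> List[Tuple[str, float]]:
    """Re-rank the top_k most relevant posts by a combined score.

    relevance maps a post id to its relevance score; opinion_score is any
    function returning an opinion score for a post id. Both are hypothetical
    inputs chosen for this sketch.
    """
    # Step 1: rank posts by relevance and keep the highest-ranking ones.
    candidates = sorted(relevance, key=relevance.get, reverse=True)[:top_k]
    # Step 2: calculate an opinion score for each remaining post.
    # Step 3: combine the two scores (a plain product, assumed here).
    combined = [(post, relevance[post] * opinion_score(post)) for post in candidates]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)
```

Any other combination, such as a weighted sum of normalized scores, would fit the same skeleton; only the expression in step 3 changes.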

9

Calculate opinion score for each document

  • Lexicon-based

    • General Inquirer (Stone et al., 1966)

    • OpinionFinder lexicon (Wiebe & Riloff, 2005)

    • SentiWordNet (Esuli & Sebastiani, 2006)

    • etc.

  • Classification-based
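
A lexicon-based document score can be as simple as averaging per-term opinion weights over the document. The sketch below assumes a toy lexicon with made-up weights and whitespace tokenization; it does not reproduce any of the lexicons listed on the slide.

```python
from typing import Dict


def lexicon_opinion_score(text: str, lexicon: Dict[str, float]) -> float:
    """Average opinion weight of the tokens in text (0.0 for unknown terms)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


# Hypothetical weights, for illustration only.
toy_lexicon = {"good": 0.9, "bad": 0.9, "nice": 0.8}
score = lexicon_opinion_score("a good film with a bad ending", toy_lexicon)
```

A score of this kind is computed once per document and does not depend on the query.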

11

A relevant blog post about “Munich”

12

So, what is the problem?

13

Also relevant to “Brokeback Mountain” and “Crash”

15

Challenges

Query-specific opinion score

Final Ranking

16

Topic Related Opinion Retrieval

O: document expresses an opinion about the query

17

Topic Related Opinion Retrieval

Relevance Opinion

18

Topic Related Opinion Retrieval

Proximity-based estimate
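
The formulas on these slides did not survive extraction. One standard way to write the decomposition that the labels describe, offered here as an assumption rather than as the authors' exact notation, is to rank a document d for a query q by the probability of d given the query and the event o that the document expresses an opinion about the query:

```latex
p(d \mid o, q) \;\propto\; \underbrace{p(d \mid q)}_{\text{relevance}} \;\cdot\; \underbrace{p(o \mid d, q)}_{\text{opinion}}
```

This follows from Bayes' rule, since the normalizer p(o | q) is the same for every document and does not affect the ranking. Under this reading, the opinion factor p(o | d, q) is the part that a proximity-based estimate would target.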

22

Opinion Lexicon

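Each lexicon term t carries an opinion probability p(o|t).

Example terms: fortunate, nice, bad, good, poor, wrong, spoiled, ...

Example probabilities: 1.0, 0.96, 0.95, 0.98, 0.89, 0.88, 0.93, ...

Diagram labels: EM algorithm, SentiWordNet, Amazon.com Review and Specification Corpus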

Lee et al., KLE at TREC 2008

23

Proximity-based Model

Differentiating document positions

24

Opinion Density of a Document's Position

• How opinionated a term is: taken from the lexicon

• How likely the term is referring to the position: given by the kernel

26

Opinion Density: P(o|i,d)

nice

heavy

28

Propagated Opinion

nice

heavy
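
A minimal sketch of how an opinion density P(o|i,d) could be computed: each term's lexicon weight is propagated to position i through a proximity kernel, and the kernel weights are normalized over all positions of the document. The exact formula is not in the transcript, so the kernel form, the `scale` value, and the toy lexicon weights below are assumptions for illustration.

```python
import math
from typing import Dict, List


def laplace_kernel(distance: float, scale: float) -> float:
    """Laplace-shaped proximity weight; decays exponentially with distance.

    The normalizing constant is omitted because opinion_density divides by
    the sum of the weights anyway.
    """
    return math.exp(-abs(distance) / scale)


def opinion_density(tokens: List[str], i: int,
                    lexicon: Dict[str, float], scale: float = 5.0) -> float:
    """Kernel-weighted average of lexicon weights around position i."""
    weights = [laplace_kernel(j - i, scale) for j in range(len(tokens))]
    total = sum(weights)
    return sum(w * lexicon.get(tok, 0.0) for w, tok in zip(weights, tokens)) / total


# Hypothetical document and weights.
doc = "munich was nice but the bag felt heavy".split()
toy_lexicon = {"nice": 0.9, "heavy": 0.6}
near_nice = opinion_density(doc, doc.index("nice"), toy_lexicon)
near_munich = opinion_density(doc, doc.index("munich"), toy_lexicon)
```

In this toy document the density at the position of "nice" is higher than at the position of "munich", because the opinion term contributes most at its own position and less as the distance grows.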

29

Opinion Density: P(o|i,d)

[Figure; labeled terms: brokeback mountain, munich]

30

Proximity-based Opinion Probability

Avg:

Max:
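
The formulas for the two variants are missing from the transcript. A plausible reading, stated here as an assumption, is that the per-position opinion densities at the positions where query terms occur are combined either by averaging or by taking the maximum. In the sketch below, `densities` is assumed to hold a precomputed P(o|i,d) for every position i, and all function names are hypothetical.

```python
from typing import List, Sequence


def query_positions(tokens: Sequence[str], query_terms: Sequence[str]) -> List[int]:
    """Positions in the document at which any query term occurs."""
    wanted = {t.lower() for t in query_terms}
    return [i for i, tok in enumerate(tokens) if tok.lower() in wanted]


def aggregate_avg(densities: Sequence[float], positions: Sequence[int]) -> float:
    """Avg variant: mean opinion density over the query-term positions."""
    if not positions:
        return 0.0
    return sum(densities[i] for i in positions) / len(positions)


def aggregate_max(densities: Sequence[float], positions: Sequence[int]) -> float:
    """Max variant: highest opinion density over the query-term positions."""
    return max((densities[i] for i in positions), default=0.0)
```

Averaging rewards documents that are opinionated around most mentions of the query, while the maximum rewards a single strongly opinionated mention.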

31

Different Kernels

• No statistically significant difference between kernels when each uses its best parameter.

• The Laplace kernel is less sensitive to the parameter.
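
For reference, two kernel shapes can be written as functions of the distance between a term position and the target position. Only the Laplace kernel is named in this transcript; the Gaussian kernel is included as a familiar point of comparison, and the shared standard deviation `sigma` is a parameterization assumed for this sketch.

```python
import math


def gaussian_kernel(distance: float, sigma: float) -> float:
    """Gaussian density with standard deviation sigma."""
    return math.exp(-distance ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def laplace_kernel(distance: float, sigma: float) -> float:
    """Laplace density with the same standard deviation (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2)
    return math.exp(-abs(distance) / b) / (2 * b)
```

With equal `sigma`, the Laplace kernel is more peaked at distance 0 and has heavier tails than the Gaussian, so it still assigns some weight to opinion terms far from the target position.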

34

Smoothed Proximity Model

• Captures proximity at different ranges

• Helps in documents where the exact query term may be rare

• Helps when opinion expressions refer to q indirectly via anaphoric expressions
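
One way to realize such smoothing, sketched here under the assumption that it is a linear interpolation, is to mix the proximity-based opinion probability with a document-level opinion score that ignores position. The mixing weight `lam` and both input scores are hypothetical names for this example; the transcript does not give the actual formula.

```python
def smoothed_opinion_prob(proximity_prob: float, document_prob: float,
                          lam: float = 0.5) -> float:
    """Interpolate a proximity-based estimate with a document-level one."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * proximity_prob + lam * document_prob
```

When the query term is rare or is referred to only through pronouns, the proximity estimate can be close to zero, and the document-level term keeps the combined score from collapsing.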

35

Relevance Retrieval Step

37

TREC Baselines

Rank posts by relevance, select the highest ranking posts

Calculate opinion score for each document

Combine the opinion and relevance scores

39

Topic Related Opinion Retrieval

Estimate the relevance component

40

Relevance Component

41

Relevance Probability

42

Different ways of using the relevance score

TREC baseline 4

Relevance

43

Different Relevant Opinion Scoring Methods

TREC baseline 4

Statistically significant improvement over the TREC relevance baselines

44

Results over five standard TREC baselines

Statistically significant improvement over the TREC relevance baselines

Statistically significant improvement over the non-proximity opinion baseline

45

Per-Topic Performance Analysis

Topics highlighted: Carmax, Yojimbo, TomTom, Picasa, Mark Warner for President, Iceland European Union, Sheep and Wool Festival

46

Results of the best runs on standard baseline 4

Statistically significant improvement over the TREC relevance baselines

47

Conclusions

• A novel probabilistic model for blog opinion retrieval was proposed

• The proximity of opinion expressions to query terms is a good indicator of their relatedness

• The Laplace kernel was proposed, and the effect of different kernels was studied

• Normalization can be important, and the best normalization depends on the underlying relevance retrieval baseline

48

Thanks!

{shima.gerani, mark.carman, fabio.crestani}@usi.ch