Presented by Jian-Shiun Tzeng, 11/24/2008
Opinion Extraction, Summarization and Tracking in News and Blog Corpora
Lun-Wei Ku, Yu-Ting Liang and Hsin-Hsi Chen
Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, AAAI Technical Report


Page 1:

Presented by Jian-Shiun Tzeng 11/24/2008

Opinion Extraction, Summarization and Tracking in News and Blog Corpora

Lun-Wei Ku, Yu-Ting Liang and Hsin-Hsi Chen

Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, AAAI

Technical Report

Page 2:

Outline

1. Introduction
2. Corpus Description
3. Opinion Extraction
4. Opinion Summarization
5. An Opinion Tracking System
6. Conclusion and Future Work

Page 3:

1. Introduction

• Monitoring specific information sources and summarizing newly discovered opinions helps governments improve their services and companies improve their products

• News and blog articles are two important sources of opinions

• Sentiment and topic detection

Page 4:

2. Corpus Description

• Three sources of information
  – TREC corpus (in English)
  – NTCIR corpus (in Chinese)
  – Articles from web blogs (in Chinese)

• Chinese materials are annotated for
  – Inter-annotator agreement analysis
  – Experiments on opinion extraction

• All of them are then used in opinion summarization (topic: animal cloning)

Page 5:

2. Corpus Description

2.1 Data Acquisition
2.2 Annotations
2.3 Inter-annotator Agreement

Page 6:

2.1 Data Acquisition

• TREC 2003
  – 50 document sets (25 documents in each set)
  – Documents in the same set are relevant
  – Set 2 (Clone Dolly Sheep)

Page 7:

2.1 Data Acquisition

• NTCIR
  – Test collection CIRB010 for Chinese IR in NTCIR2 (2001)
  – 50 topics (6 of them are opinionated topics)
  – A total of 192 documents relevant to the 6 topics are chosen as training data
  – The topic "animal cloning" of NTCIR3, selected from CIRB011 and CIRB020, is used for testing

Page 8:

2.1 Data Acquisition

• Blog
  – Retrieved from blog portals by the query "animal cloning"

Page 9:

2.1 Data Acquisition

• The numbers of documents relevant to “animal cloning” in three different information sources are listed in Table 1.

Page 10:

2.2 Annotations

• To build training and testing sets for Chinese opinion extraction, opinion tags at the word, sentence and document levels are annotated by 3 annotators.

• We adopt the tagging format specified in (Ku, Wu, Li and Chen, 2005). The opinion tags at all three levels take one of four possible values: positive, neutral, negative and non-sentiment.

• NTCIR news and web blog articles are annotated for this work.

Page 11:

2.3 Inter-annotator Agreement

Page 12:

2.3 Inter-annotator Agreement

• Blog articles may use simpler words and be easier for human annotators to understand than news articles

Page 13:

2.3 Inter-annotator Agreement

• The agreement drops fast as the number of annotators increases

• Consistent annotations are less likely when more annotators are involved

• We adopt voting to create the gold standard
  – The majority annotation is taken as the gold standard for evaluation
  – If the annotations of an instance are all different, that instance is dropped
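The voting scheme above can be sketched as follows (a minimal illustration; the function name and tag strings are mine, not from the paper):

```python
from collections import Counter

def gold_standard(annotations):
    """Majority-vote a gold-standard tag from per-annotator tags.

    With three annotators, a tag chosen by at least two becomes the
    gold standard; if all annotations differ, the instance is
    dropped (signaled here by returning None).
    """
    tag, count = Counter(annotations).most_common(1)[0]
    return tag if count > 1 else None
```

For example, `gold_standard(["positive", "positive", "negative"])` yields `"positive"`, while three mutually different tags yield `None`.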

Page 14:

2.3 Inter-annotator Agreement

• A total of 3 documents and 18 sentences, but no words, are dropped

• According to this criterion, Table 6 summarizes the statistics of the annotated testing data

Page 15:

2.3 Inter-annotator Agreement

• Annotation results of the three annotators compared to the gold standard

Page 16:

2.3 Inter-annotator Agreement

• Deciding opinion polarity depends heavily on human perspective. Therefore, the information entropy of the testing data should also be taken into consideration when comparing system performance.

Page 17:

3. Opinion Extraction

• The goal of opinion extraction is to detect where in documents opinions are embedded

• Opinions are hidden in words, sentences and documents

• An opinion sentence is the smallest complete semantic unit from which opinions can be extracted

• Extraction algorithm: words → sentences → documents

Page 18:

3. Opinion Extraction

• Opinion scores of words represent their sentiment degrees and polarities

• The degree of a supportive/non-supportive sentence is a function of an opinion holder together with sentiment words

• The opinion of a document is a function of all the supportive/non-supportive sentences

• A summary report is a function of all relevant opinionated documents

Page 19:

3. Opinion Extraction

3.1 Algorithm
  - Word Level
  - Sentence Level
  - Document Level

3.2 Performance of Opinion Extraction

Page 20:

3.1 Algorithm – Word Level

• To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable

• However, a small dictionary suffers from poor coverage

• We develop a method to learn sentiment words and their strengths from multiple resources

Page 21:

3.1 Algorithm – Word Level

• Two sets of sentiment words
  – General Inquirer (GI)
    • English, translated into Chinese
  – Chinese Network Sentiment Dictionary (CNSD)
    • Chinese, collected from the Internet

Page 22:

3.1 Algorithm – Word Level

• We enlarge the seed vocabulary by consulting two thesauri
  – tong2yi4ci2ci2lin2 (Cilin, 同義詞詞林) (Mei et al. 1982)
    • 12 large categories, 1428 small categories, and 3925 word clusters
  – Academia Sinica Bilingual Ontological Wordnet (BOW)
    • Similar structure to WordNet

• Words in the same cluster may not always have the same opinion tendency
  – e.g., 寬恕 "forgive" (positive) and 姑息 "appease" (negative) appear in the same synonym set (synset)

Page 23:

3.1 Algorithm – Word Level

• This equation not only tells us the opinion tendency of an unknown word, but also suggests its strength

• fp_ci and fn_ci denote the frequencies of character ci in the positive and negative words

• n and m denote the total numbers of unique characters in positive and negative words

• Formulas (1) and (2) use the percentage of a character's occurrences in positive/negative words to show its sentiment tendency

P_ci = fp_ci / (fp_ci + fn_ci)   (1)

N_ci = fn_ci / (fp_ci + fn_ci)   (2)
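Formulas (1) and (2) can be sketched in code as follows (a minimal illustration under the stated definitions; the function name is mine):

```python
def char_tendencies(positive_words, negative_words):
    """Formulas (1)/(2): for each character ci,
    P_ci = fp_ci / (fp_ci + fn_ci) and N_ci = fn_ci / (fp_ci + fn_ci),
    where fp_ci / fn_ci are ci's frequencies in the positive / negative
    seed words."""
    fp, fn = {}, {}
    for word in positive_words:
        for c in word:
            fp[c] = fp.get(c, 0) + 1
    for word in negative_words:
        for c in word:
            fn[c] = fn.get(c, 0) + 1
    tendencies = {}
    for c in set(fp) | set(fn):
        total = fp.get(c, 0) + fn.get(c, 0)
        tendencies[c] = (fp.get(c, 0) / total, fn.get(c, 0) / total)
    return tendencies
```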

Page 24:

3.1 Algorithm – Word Level

• However, there are more negative words than positive ones in the seed vocabulary

• Hence, the frequency of a character in positive words tends to be smaller than that in negative words

• That is unfair for learning, so the normalized versions, Formulas (3) and (4), are adopted instead

P_ci = fp_ci / (fp_ci + fn_ci)   (1)

N_ci = fn_ci / (fp_ci + fn_ci)   (2)

Page 25:

3.1 Algorithm – Word Level

• where P_ci and N_ci denote the weights of ci as positive and negative characters, respectively

P_ci = (fp_ci / Σ_{j=1..n} fp_cj) / (fp_ci / Σ_{j=1..n} fp_cj + fn_ci / Σ_{j=1..m} fn_cj)   (3)

N_ci = (fn_ci / Σ_{j=1..m} fn_cj) / (fp_ci / Σ_{j=1..n} fp_cj + fn_ci / Σ_{j=1..m} fn_cj)   (4)

Page 26:

3.1 Algorithm – Word Level

• The difference between P_ci and N_ci determines the sentiment tendency of character ci

• If the difference is positive, the character appears more often in positive Chinese words, and vice versa

• A value close to 0 means it is either not a sentiment character or a neutral sentiment character

S_ci = P_ci - N_ci   (5)

Page 27:

3.1 Algorithm – Word Level

• Formula (6) defines the sentiment degree of a Chinese word w as the average of the sentiment scores of its composing characters c1, c2, …, cp

• If the sentiment score of a word is positive, it is likely a positive sentiment word, and vice versa

• A word with a sentiment score close to 0 is probably neutral or non-sentiment

S_w = (1/p) Σ_{j=1..p} S_cj   (6)
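Formulas (3)–(6) can be sketched together as follows (a minimal illustration; function names are mine, and scoring unseen characters as 0 is a simplifying assumption):

```python
def char_scores(positive_words, negative_words):
    """Formulas (3)-(5): normalized character weights P_ci and N_ci,
    combined into S_ci = P_ci - N_ci."""
    fp, fn = {}, {}
    for word in positive_words:
        for c in word:
            fp[c] = fp.get(c, 0) + 1
    for word in negative_words:
        for c in word:
            fn[c] = fn.get(c, 0) + 1
    sum_fp, sum_fn = sum(fp.values()), sum(fn.values())
    scores = {}
    for c in set(fp) | set(fn):
        p = fp.get(c, 0) / sum_fp  # fp_ci normalized by all positive counts
        n = fn.get(c, 0) / sum_fn  # fn_ci normalized by all negative counts
        scores[c] = p / (p + n) - n / (p + n)  # S_ci = P_ci - N_ci
    return scores

def word_score(word, scores):
    """Formula (6): S_w is the average of the composing characters' scores."""
    return sum(scores.get(c, 0.0) for c in word) / len(word)
```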

Page 28:

3.1 Algorithm – Sentence Level

Page 29:

3.1 Algorithm – Document Level

Page 30:

3.2 Performance of Opinion Extraction

• The gold standard is used to evaluate the performance of opinion extraction at the word, sentence and document levels

• The performance is compared with two machine learning algorithms, SVM and the decision tree, at the word level

• The C5 system is employed to generate the decision tree

Page 31:

3.2 Performance of Opinion Extraction

• Proposed sentiment word mining algorithm

Page 32:

3.2 Performance of Opinion Extraction

• For the machine learning algorithms, qualified seeds are used for training (set A) and the gold standard is used for testing (set B)
  – Avg. 46.81%; the small training set makes the results worse

Page 33:

3.2 Performance of Opinion Extraction

• Our algorithm outperforms SVM and the decision tree in sentiment word mining

• This is because the semantic information within a single word is not enough for a machine learning classifier. In other words, machine learning methods are not suitable for word-level opinion extraction

• Previously, Pang et al. (2002) showed that machine learning methods are not good enough for opinion extraction at the document level

• From our experiments, we conclude that opinion extraction is more than a classification problem

Page 34:

3.2 Performance of Opinion Extraction

• The current algorithm considers only opinionated relations, not relevance relations

• Many sentences that are not relevant to the topic "animal cloning" are included in the opinion judgment

• The non-relevant rate is 50% for NTCIR news articles and 53% for web blog articles

Page 35:

3.2 Performance of Opinion Extraction

• Extracting opinions alone is not enough for opinion summarization

• The focus of the opinions should also be considered

• In the opinion summarization section below, a relevant-sentence selection algorithm is introduced and applied when extracting sentences for opinion summaries

Page 36:

4. Opinion Summarization

• Traditional summarization algorithms rely on the important facts of documents and remove redundant information

• Repeated opinions of the same polarity cannot be dropped, because they strengthen the sentiment degree

• Detecting opinions → generating opinion summaries (removing redundancy)

Page 37:

4. Opinion Summarization

• An algorithm that decides the relevance degree and the sentiment degree

• A text-based summary categorized by opinion polarities (different from the traditional summaries)

• A graph-based summary along time series

Page 38:

4. Opinion Summarization

4.1 Algorithm
4.2 Opinion Summaries of News and Blogs

Page 39:

4.1 Algorithm

• Choosing representative words that accurately present the main concepts of a relevant document set is the main task of relevant sentence retrieval

• A term is considered representative if
  – it appears frequently across documents, or
  – it appears frequently in each document (Fukumoto and Suzuki, 2000)

Page 40:

4.1 Algorithm

• W: weight; S: document level; P: paragraph level; TF: term frequency; N: word count?

• Event tracking based on domain dependency (Fukumoto and Suzuki, 2000)
  – N: the number of stories (documents, paragraphs)
  – N_si: the number of stories in which t occurs

[Formulas (9) and (10) appear as images on the slide.]

Page 41:

4.1 Algorithm

• (11) and (13): how frequently term t appears across documents and paragraphs

• (12) and (14): how frequently term t appears in each document and paragraph

• TH: a threshold controlling the number of representative terms in a relevant corpus; the larger TH is, the more terms are included

[Formulas (11)–(14) appear as images on the slide; presenter's note: "mistake in this paper?"]

Page 42:

4.1 Algorithm

• A term is considered representative if it satisfies either Formula (15) or (16)
  – Terms satisfying Formula (15) tend to appear in few paragraphs of many documents (t as a topic)
    • t frequently appears across documents rather than paragraphs (Disp)
    • t frequently appears in a particular paragraph Pj rather than throughout the document Si (Dev)
  – Terms satisfying Formula (16) appear in many paragraphs of few documents (t as an event)
    • t frequently appears across paragraphs rather than documents (Disp)
    • t frequently appears in the i-th document Si rather than a particular paragraph Pj (Dev)
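The two selection conditions can be sketched as follows. This is a hedged illustration only: Formulas (15)/(16) themselves (including the threshold TH) appear as images in the slides, so the dispersion/deviation values are assumed here to be precomputed per Fukumoto and Suzuki (2000), and the function name is mine:

```python
def representative_terms(stats):
    """Classify terms as topic-like (condition (15)) or event-like
    (condition (16)) from precomputed dispersion/deviation values.

    `stats` maps term -> (disp_doc, dev_para, disp_para, dev_doc):
    dispersion across documents, deviation within a paragraph,
    dispersion across paragraphs, deviation within a document.
    """
    topics, events = [], []
    for t, (disp_doc, dev_para, disp_para, dev_doc) in stats.items():
        if disp_doc > disp_para and dev_para > dev_doc:
            topics.append(t)   # spread across documents, concentrated in a paragraph
        elif disp_para > disp_doc and dev_doc > dev_para:
            events.append(t)   # spread across paragraphs of few documents
    return topics, events
```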

Page 43:

Topic Extraction

Page 44:

Event Extraction

Page 45:

4.1 Algorithm

• The score of a term, defined as the absolute value of Dev_Pjt minus Dev_Sit, measures how well it represents the main concepts of a relevant document set

Page 46:

4.1 Algorithm

Page 47:

4.1 Algorithm

• The NTCIR corpus, in TREC style, provides concept words for each topic

• These words are taken as the major topic for opinion extraction

• Sentences containing at least one concept word are considered relevant to the topic
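The relevance rule above can be sketched as (function and variable names are illustrative):

```python
def relevant_sentences(sentences, concept_words):
    """Keep only sentences containing at least one topic concept word."""
    return [s for s in sentences if any(w in s for w in concept_words)]
```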

Page 48:

4.1 Algorithm

• Considering relevance relations together with sentiments performs better

Page 49:

4.1 Algorithm

• Considering relevance relations together with sentiments performs better

Page 50:

4.1 Algorithm

• In total, 29.67% and 72.43% of non-relevant sentences are filtered out for news and web blog articles, respectively

• Filtering non-relevant sentences works better for blog articles than for news articles

Page 51:

4.1 Algorithm

• The result is also consistent with the higher annotation agreement rate for blog articles
  – Only 15 topical words are extracted automatically from blog articles, while 73 are extracted from news articles
  – This indicates that the content of news articles diverges more than that of blog articles

• However, judging the sentiment polarity of blog articles is not easier (precision 38.06% vs. 23.48%)

Page 52:

4.1 Algorithm

• The topical degree and the sentiment degree of each sentence are employed to generate opinion summaries

• Two types of opinion summaries
  – Brief opinion summary
    • Pick the document with the largest number of positive or negative sentences and use its headline as the overall summary
  – Detailed opinion summary
    • List positive-topical and negative-topical sentences with higher sentiment degrees
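The two summary types can be sketched as follows (a minimal illustration; the data layouts and function names are my assumptions, not from the paper):

```python
def brief_summary(documents):
    """Brief opinion summary: the headline of the document with the
    largest number of positive or negative sentences.
    `documents` is a list of (headline, n_positive, n_negative)."""
    return max(documents, key=lambda d: max(d[1], d[2]))[0]

def detailed_summary(sentences, top_k=3):
    """Detailed opinion summary: positive-topical and negative-topical
    sentences with the highest sentiment degrees, listed by polarity.
    `sentences` is a list of (text, sentiment_score)."""
    pos = sorted((s for s in sentences if s[1] > 0), key=lambda s: -s[1])
    neg = sorted((s for s in sentences if s[1] < 0), key=lambda s: s[1])
    return [s[0] for s in pos[:top_k]], [s[0] for s in neg[:top_k]]
```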

Page 53:

4.1 Algorithm

Page 54:

4.1 Algorithm

Page 55:

4.2 Opinion Summaries of News and Blogs

• Two main sources of opinions
  – News documents are more objective
  – Blog articles are usually more subjective

• Different social classes
  – Opinions extracted from news are mostly from famous people
  – Opinions expressed in blogs may come from unknown people

• The proposed opinion summarization algorithm is language independent

Page 56:

4.2 Opinion Summaries of News and Blogs

Page 57:

4.2 Opinion Summaries of News and Blogs

Page 58:

5. An Opinion Tracking System

• As with events, we are concerned with how opinions change over time

• Because the NTCIR corpus does not contain enough articles relevant to "animal cloning" to track opinions, we take the 2000 presidential election in Taiwan as an illustrative example

Page 59:

5. An Opinion Tracking System

Person A was the president-elect.

Page 60:

5. An Opinion Tracking System

• The tracking system can also track opinions according to different requests and different information sources, including news agencies and the web

• Opinion trends toward one specific focus from different expressers can also be compared

• This information is very useful for governments, institutes, companies, and the concerned public

Page 61:

6. Conclusion and Future Work

• Algorithms for opinion extraction, summarization, and tracking

• Machine learning methods are not suitable for sentiment word mining

• Utilizing the mined sentiment words together with topical words enhances performance

• Opinion holders
  – Different holders have different influence
    • How holders influence the sentiment degree
  – Relations between holders
    • Multi-perspective problems in opinions

Page 62:

Resources

• NTU Sentiment Dictionary© (NTUSD)
  – NTUSD_positive_unicode.txt (2812 words)
  – NTUSD_negative_unicode.txt (8276 words)