
SemEval-2010 task 18: disambiguating sentiment ambiguous adjectives


ORIGINAL PAPER


Yunfang Wu · Peng Jin

Y. Wu (corresponding author): Key Laboratory of Computational Linguistics (Peking University), Ministry of Education, Beijing, China

P. Jin: Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan, China

Published online: 1 December 2012

© Springer Science+Business Media Dordrecht 2012

Lang Resources & Evaluation (2013) 47:743-755

DOI 10.1007/s10579-012-9206-z

Abstract Sentiment ambiguous adjectives, which have been neglected by most previous research, pose a challenging task in sentiment analysis. We present an evaluation task at SemEval-2010, designed to provide a framework for comparing different approaches to this problem. The task focuses on 14 Chinese sentiment ambiguous adjectives and provides manually labeled test data. Eight teams submitted 16 systems to the task. In this paper, we define the task, describe the data creation, list the participating systems, and discuss the different approaches.

Keywords Sentiment ambiguous adjectives · Sentiment analysis · Word sense disambiguation · SemEval

1 Introduction

In recent years, sentiment analysis has attracted considerable attention in the field of natural language processing. It is the task of mining positive and negative opinions from real texts, and it can be applied to many natural language applications, such as document summarization and question answering. Previous work on this problem falls into three groups: opinion mining of documents, sentiment classification of sentences, and polarity prediction of words. Sentiment analysis at both the document and the sentence level relies heavily on word-level analysis. Another line of research is feature-based sentiment analysis, which extracts product features and the opinions towards them (e.g. Jin and Ho 2009; Li et al. 2010) and is also based on lexical semantic orientation.

The most frequently explored task at the word level is determining the semantic orientation (SO) of words, where most work centers on assigning a prior polarity to words or word senses in the lexicon, out of context. For some words, however, the polarity varies strongly with context. For instance, the word “low” has a positive orientation in “low cost” but a negative orientation in “low salary”. This makes it hard to attach each word to a specific sentiment category in the lexicon. Turney and Littman (2003) claim that sentiment ambiguous words cannot be avoided easily in a real-world application. Unfortunately, most research on sentiment analysis neglects sentiment ambiguous words (e.g., Hatzivassiloglou and McKeown 1997; Turney and Littman 2003; Kim and Hovy 2004).

Sentiment ambiguous words have not been deliberately tackled in research on word sense disambiguation either, where senses are defined as word meanings rather than semantic orientations. Disambiguating sentiment ambiguous words is in fact a task at the intersection of sentiment analysis and word sense disambiguation.

Our task at SemEval-2010 provides a benchmark data set to encourage studies on disambiguating sentiment ambiguous adjectives (SAAs) within context in real text. We limit our work to 14 frequently used adjectives in Chinese, such as “large, small, many, few, high, low”, all of which express measurement. Although the number of such ambiguous adjectives is not large, they are frequently used in real text, especially in texts expressing opinions and emotions. Wu and Wen (2010) showed that disambiguating the 14 SAAs noticeably improves the performance of sentiment classification of product reviews. Our task attracted eight participating teams from France, Spain, mainland China and Hong Kong.

The rest of this paper is organized as follows. Section 2 discusses related work;

Sect. 3 defines the task; Sect. 4 describes the data collection; Sect. 5 gives a brief

summary of 16 participating systems; Sect. 6 gives a discussion; finally Sect. 7

draws conclusions.

2 Related work

2.1 Word-level sentiment analysis

There has recently been extensive research on sentiment analysis, for which Pang and Lee (2008) give an in-depth survey of the literature. Closer to our study is the large body of work on automatic SO prediction of words (Hatzivassiloglou and McKeown 1997; Turney and Littman 2003; Kim and Hovy 2004; Andreevskaia and Bergler 2006), but these studies either discard SAAs or simply give a prior polarity to each SAA. More recent studies go a step further and attach SO to senses instead of word forms (Esuli and Sebastiani 2006; Wiebe and Mihalcea 2006; Su and Markert 2008), but their work is still limited to the lexicon, out of context.


The most relevant work is Ding et al. (2008), in which SAAs are called context-dependent opinions. They argue that there is no way to know the SO of SAAs without prior knowledge, and that asking a domain expert to provide such knowledge is not scalable. They therefore adopt a holistic lexicon-based approach to the problem, exploiting external information and evidence in other sentences and other reviews. Wu and Wen (2010) and Wen and Wu (2011) disambiguate dynamic SAAs by extracting the sentiment expectation of nouns using lexical-syntactic patterns.

2.2 Phrase-level sentiment analysis

The disambiguation of SAAs can also be considered a problem of phrase-level sentiment analysis. Wilson et al. (2005) present a two-step process for recognizing contextual polarity that employs machine learning and a variety of features. Takamura et al. (2006, 2007) propose latent variable models and a lexical network to determine the SO of phrases, focusing on “noun + adjective” pairs. Their experimental results suggest that classifying pairs that contain ambiguous adjectives is much harder than classifying pairs with unambiguous adjectives. In this task, we also deal with “noun + adjective” pairs but focus on the much harder task of disambiguating SAAs.

2.3 Disambiguating adjectives

Although a great deal of work has been devoted to word sense disambiguation, little of it deliberately tackles the disambiguation of adjectives, since most work focuses on the meanings of nouns and verbs.

Yarowsky (1993) utilizes collocations to disambiguate nouns, verbs and adjectives. Justeson and Katz (1995) argue for a linguistically principled approach to disambiguating adjective senses, and conclude that about three-quarters of all instances of the adjectives can be disambiguated by the nouns they modify or by syntactic constructions. McCarthy and Carroll (2003) explore selectional preferences for the disambiguation of verbs, nouns and adjectives.

3 Task set up

SAAs can be divided into two groups: static SAAs and dynamic SAAs. A static SAA has different semantic orientations corresponding to different senses, which can be defined in the lexicon. For instance, 骄傲|pride has two senses: “pride”, which is positive, and “conceited”, which is negative. Dynamic SAAs are neutral out of context, and their SOs are evoked only when they occur in specific contexts, which makes it impossible to assign a polarity tag to a dynamic SAA in the lexicon. For instance, it is quite difficult to assign a polarity tag to the word 高|high out of context.

In this task, we focus on 14 frequently used dynamic SAAs in Chinese, as shown in (1):

(1) Sentiment ambiguous adjectives (SAAs) = {大|large, 多|many, 高|high, 厚|thick, 深|deep, 重|heavy, 巨大|huge, 重大|great, 小|small, 少|few, 低|low, 薄|thin, 浅|shallow, 轻|light}

These adjectives are neutral out of context, but when they co-occur with certain target nouns, a positive or negative emotion is evoked. The task is to automatically determine the SO of these SAAs within context. For example, 高|high should be labeled positive in “工资高|salary is high” but negative in “价格高|price is high”.

The organizers provided no training data for this task, but participating systems were encouraged to use external resources, including training data and lexicons.

4 Data creation

4.1 Data

We collected data from two sources. The main part was extracted from the Xinhua News Agency portion of Chinese Gigaword (Second Edition) released by LDC. The texts were automatically word-segmented and POS-tagged using the open software ICTCLAS (see footnote 1), which is based on a hierarchical hidden Markov model. In order to concentrate on the disambiguation of SAAs and avoid the complications of syntactic parsing, we extracted sentences containing strings that match the pattern shown in (2), in which the target noun is modified by the adjective in most cases.

(2) noun + adverb + adjective (adjective ∈ SAAs)

e.g. 成本/n 较/d 低/a

The cost is relatively low.
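Pattern (2) can be sketched as a scan over POS-tagged tokens. The snippet below is only an illustration under stated assumptions, not the extraction script used for the task: the function name `find_saa_patterns` and the (token, tag) pair representation are hypothetical, and the tags `n`, `d` and `a` follow the example in (2).

```python
# The 14 target adjectives listed in (1).
SAAS = {"大", "多", "高", "厚", "深", "重", "巨大", "重大",
        "小", "少", "低", "薄", "浅", "轻"}


def find_saa_patterns(tagged):
    """Return (noun, adverb, adjective) triples matching pattern (2).

    `tagged` is a list of (token, pos) pairs. Exact tag matching on
    "n", "d" and "a" is an assumption made for this sketch.
    """
    hits = []
    for i in range(len(tagged) - 2):
        (noun, n_pos), (adv, d_pos), (adj, a_pos) = tagged[i:i + 3]
        if n_pos == "n" and d_pos == "d" and a_pos == "a" and adj in SAAS:
            hits.append((noun, adv, adj))
    return hits


if __name__ == "__main__":
    sentence = [("成本", "n"), ("较", "d"), ("低", "a")]
    print(find_saa_patterns(sentence))
```

Matching on surface tag sequences avoids syntactic parsing, at the cost of occasionally pairing an adjective with a noun it does not actually modify.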

A smaller part of the data was extracted from the Web. Using the search engine Google (see footnote 2), we issued queries of the form shown in (3):

(3) 很|very + adjective (adjective ∈ SAAs)

From the returned snippets, we manually picked out sentences containing strings that follow pattern (2). These sentences were also automatically segmented and POS-tagged using ICTCLAS.

The SAAs in the data were labeled as positive, negative or neutral independently by two annotators. Since the task focuses on the distinction between the positive and negative categories, the neutral instances were then removed. Inter-annotator agreement is high, with a kappa value of 0.91, indicating that disambiguating SAAs within context is not difficult for humans. After the two annotators resolved their disagreements through discussion, a gold-standard annotation was agreed upon.
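For readers who want to reproduce this kind of agreement figure, the following is a minimal sketch of Cohen's kappa for two annotators. The paper does not state which kappa variant was used, so Cohen's kappa is an assumption here, and the label lists in the usage example are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical annotations, not taken from the task data.
    ann1 = ["pos", "pos", "neg", "neg"]
    ann2 = ["pos", "pos", "neg", "pos"]
    print(cohen_kappa(ann1, ann2))
```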

1 http://www.ictclas.org/

2 http://www.google.com/


In total, 2,917 instances were provided as test data for the task. The number of instances per target adjective is listed in Table 3. The instances are given in XML format. Table 1 gives an example for the adjective 多|many, where the empty attribute senseid="" is to be filled with the correct answer, a polarity tag of positive or negative. The dataset can be downloaded freely from the SemEval-2010 website (see footnote 3).

3 http://semeval2.fbk.eu/semeval2.php?location=data

Table 1 An example of the test data

    <instance id="多.3">
      <answer instance="多.3" senseid=""/>
      <context>
        王义夫自言收获颇 <head>多</head>
      </context>
      <postagging>
        <word id="1" pos="nr">
          <token>王</token>
        </word>
        <word id="2" pos="nr">
          <token>义夫</token>
        </word>
        <word id="3" pos="p">
          <token>自</token>
        </word>
        <word id="4" pos="vg">
          <token>言</token>
        </word>
        <word id="5" pos="n">
          <token>收获</token>
        </word>
        <word id="6" pos="d">
          <token>颇</token>
        </word>
        <word id="7" pos="a">
          <token>多</token>
        </word>
      </postagging>
    </instance>

Evaluation was performed in terms of micro precision and macro precision:

$$P_{mir} = \frac{\sum_{i=1}^{N} m_i}{\sum_{i=1}^{N} n_i} \qquad (1)$$

$$P_{mar} = \frac{\sum_{i=1}^{N} P_i}{N}, \qquad P_i = \frac{m_i}{n_i} \qquad (2)$$

where $N$ is the number of target words, $n_i$ is the number of test instances for the $i$-th target word, and $m_i$ is the number of correctly labeled instances for that word.
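The two measures in Eqs. (1) and (2) can be computed from per-word counts as in the short sketch below. The function name and the input format (a mapping from each target word to the pair m_i, n_i) are assumptions for illustration, and the counts in the usage example are invented.

```python
def micro_macro_precision(per_word):
    """Compute (P_mir, P_mar) from {word: (m_i, n_i)}.

    m_i is the number of correctly labeled instances of a word and
    n_i is the number of its test instances, as in Eqs. (1) and (2).
    """
    if not per_word:
        raise ValueError("at least one target word is required")
    total_correct = sum(m for m, _ in per_word.values())
    total_instances = sum(n for _, n in per_word.values())
    micro = total_correct / total_instances                          # Eq. (1)
    macro = sum(m / n for m, n in per_word.values()) / len(per_word)  # Eq. (2)
    return micro, macro


if __name__ == "__main__":
    # Hypothetical counts: a frequent word and a rare word.
    counts = {"高": (90, 100), "浅": (4, 8)}
    micro, macro = micro_macro_precision(counts)
    print(f"micro={micro:.4f} macro={macro:.4f}")
```

Micro precision weights every instance equally, so frequent adjectives dominate it; macro precision weights every adjective equally, so a rare adjective such as 浅|shallow (8 instances in Table 3) counts as much as a frequent one.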

4.2 Baseline

We group the 14 SAAs into two categories: positive-like adjectives and negative-like adjectives. Positive-like adjectives connote a large measurement, whereas negative-like adjectives connote a small measurement.

(4) Positive-like adjectives (Pa) = {大|large, 多|many, 高|high, 厚|thick, 深|deep, 重|heavy, 巨大|huge, 重大|great}

(5) Negative-like adjectives (Na) = {小|small, 少|few, 低|low, 薄|thin, 浅|shallow, 轻|light}

We conducted a baseline experiment on the dataset. Ignoring context, the baseline labels all positive-like adjectives as positive and all negative-like adjectives as negative. The micro precision of this baseline is 61.20 %.
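The context-free baseline amounts to a lookup in the two sets (4) and (5). A minimal sketch, assuming a hypothetical function name `baseline_label`:

```python
# Sets (4) and (5) from the text.
POSITIVE_LIKE = {"大", "多", "高", "厚", "深", "重", "巨大", "重大"}
NEGATIVE_LIKE = {"小", "少", "低", "薄", "浅", "轻"}


def baseline_label(adjective):
    """Label an SAA by its group alone, ignoring the surrounding context."""
    if adjective in POSITIVE_LIKE:
        return "positive"
    if adjective in NEGATIVE_LIKE:
        return "negative"
    raise ValueError(f"not one of the 14 target adjectives: {adjective}")


if __name__ == "__main__":
    # The baseline calls 高 positive even in "价格高|price is high",
    # which the gold standard labels negative.
    print(baseline_label("高"), baseline_label("低"))
```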

5 Systems and results

We first released trial data and then the test data. In total, 11 different teams downloaded both the trial and the test data. In the end, 8 teams submitted experimental results, covering 16 systems.

5.1 Results

Table 2 lists the scores of all 16 systems, ranked from best to worst by micro precision. The best system achieves a micro precision of 94.20 %, outperforming our baseline by 33 percentage points. Five systems fail to beat our baseline. The lowest-ranked system performs only slightly better than a random baseline, which is 50 % when an SO tag is assigned at random to each instance in the test data. To our surprise, performance differs greatly across systems: the gap between the best and the lowest-ranked system is 43.12 percentage points in micro precision.

Table 3 lists the statistics for each target adjective, where “Ins#” denotes the number of instances in the test data, “Max %” and “Min %” denote the maximum and minimum micro precision among all systems, and “SD” denotes the standard deviation of precision.

Table 3 shows that performance also differs greatly across systems on each of the 14 target adjectives. For example, the precision on 大|large is 95.53 % for one system but only 46.51 % for another. No adjective is consistently hard for all systems, and none is consistently easy for all systems.

5.2 Systems

In this section, we give a brief description of the participating systems.

Table 2 The scores of 16 systems

System                  Micro pre. (%)   Macro pre. (%)
YSC-DSAA                94.20            92.93
HITSZ_CITYU_1           93.62            95.32
HITSZ_CITYU_2           93.32            95.79
Dsaa                    88.07            86.20
OpAL                    76.04            70.38
CityUHK4                72.47            69.80
CityUHK3                71.55            75.54
HITSZ_CITYU_3           66.58            62.94
QLK_DSAA_R              64.18            69.54
CityUHK2                62.63            60.85
CityUHK1                61.98            67.89
QLK_DSAA_NR             59.72            65.68
Twitter Sentiment       59.00            62.27
Twitter Sentiment_ext   56.77            61.09
Twitter Sentiment_zh    56.46            59.63
Biparty                 51.08            51.26

Table 3 The scores of 14 SAAs

Word          Ins#   Max %    Min %   SD
大|large      559    95.53    46.51   0.155
多|many       222    95.50    49.10   0.152
高|high       546    95.60    54.95   0.139
厚|thick      20     95.00    35.00   0.160
深|deep       45     100.00   51.11   0.176
重|heavy      259    96.91    34.75   0.184
巨大|huge     49     100.00   10.20   0.273
重大|great    28     100.00   7.14    0.243
小|small      290    93.10    49.66   0.167
少|few        310    95.81    41.29   0.184
低|low        521    93.67    48.37   0.147
薄|thin       33     100.00   18.18   0.248
浅|shallow    8      100.00   37.50   0.155
轻|light      26     100.00   34.62   0.197


YSC-DSAA. This system (Yang and Liu 2010) relies on a manually built word library, SAAOL (sentiment ambiguous adjectives oriented library). It consists of positive words, negative words, NSAA (negative sentiment ambiguous adjectives), PSAA (positive sentiment ambiguous adjectives), and inverse words. A word is assigned to NSAA if it collocates with positive-like adjectives, and to PSAA if it collocates with negative-like adjectives. For example, “任务|task” is assigned to NSAA because it collocates with 重|heavy in the phrase “任务很重|the task is very heavy”. The system divides sentences into clauses using heuristic rules, and disambiguates SAAs by analyzing the relationship between SAAs and their target nouns.

HITSZ_CITYU. This group (Xu et al. 2010) submitted three systems: one baseline system and two improved systems.

HITSZ_CITYU_3: The baseline system is based on collocations of opinion words and target words. For the given adjectives, collocations are extracted from the People’s Daily corpus. After human annotation, the system obtains 412 positive and 191 negative collocations, which serve as seed collocations. Using the context words of the seed collocations as features, the system trains a one-class SVM classifier.

HITSZ_CITYU_2 and HITSZ_CITYU_1: Using HowNet-based word similarity, the system expands the seed collocations on both the adjective side and the collocated target noun side. The system then exploits the sentiment of neighboring sentences to further improve performance. The strategy is as follows: if the neighboring sentences on both sides have the same polarity, the ambiguous adjective is assigned that polarity; if the neighboring sentences have conflicting polarities, the SO of the ambiguous adjective is determined by its context words and the transition probability of sentence polarity. The final system (HITSZ_CITYU_1/2) combines collocations, context words and neighboring sentence sentiment in a two-class SVM classifier to determine the polarity of ambiguous adjectives. HITSZ_CITYU_2 and HITSZ_CITYU_1 differ in their parameters and combining strategies.

OpAL. This system (Balahur and Montoyo 2010) combines supervised methods with unsupervised ones. Since the system works on English, the authors use Google Translate to automatically translate the task dataset from Chinese to English. The system combines three types of judgments. The first trains an SVM classifier on NTCIR data and EmotiBlog annotations. The second is based on local polarity, obtained from the number of hits returned by a search engine for queries of the form “noun + SAA + AND + non-ambiguous adjective”, where the non-ambiguous adjectives comprise a positive set (“positive, beautiful, good”) and a negative set (“negative, ugly, bad”). An example query is “price high and good”. The third judgment consists of a set of rules. The final result is obtained by a majority vote of the three components.
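The query-based judgment and the final vote can be illustrated with a small sketch. Everything here is hypothetical rather than OpAL's actual code: the function names, the way hit counts are summed, and the tie-breaking toward positive are assumptions, and the hit counts in the usage example are invented (no search API is called).

```python
from collections import Counter

# Anchor adjectives quoted in the system description.
POSITIVE_ANCHORS = ("positive", "beautiful", "good")
NEGATIVE_ANCHORS = ("negative", "ugly", "bad")


def build_queries(noun, saa):
    """Build 'noun SAA AND anchor' queries for both polarities."""
    return {
        "positive": [f"{noun} {saa} AND {w}" for w in POSITIVE_ANCHORS],
        "negative": [f"{noun} {saa} AND {w}" for w in NEGATIVE_ANCHORS],
    }


def local_polarity(hit_counts):
    """Compare summed hit counts; ties go to positive (an assumption)."""
    pos = sum(hit_counts["positive"])
    neg = sum(hit_counts["negative"])
    return "positive" if pos >= neg else "negative"


def majority_vote(votes):
    """Return the most frequent label among the component judgments."""
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    print(build_queries("price", "high")["positive"])
    # Invented hit counts for the six queries above.
    web_vote = local_polarity({"positive": [10, 5, 20], "negative": [30, 40, 2]})
    print(majority_vote(["positive", web_vote, "negative"]))
```

With three binary components a strict majority always exists, so the vote itself needs no tie-breaking rule.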

CityUHK. This group submitted four systems (Lu and Tsou 2010), which employ both a machine learning method and a lexicon-based method. In the machine learning method, a maximum entropy model is used to train a classifier on the Chinese data from the NTCIR opinion task. Clause-level and sentence-level classifiers are trained and compared. In the lexicon-based method, the authors classify SAAs into two clusters: intensifiers (our positive-like adjectives in (4)) and suppressors (our negative-like adjectives in (5)). The collocated nouns are likewise classified into two clusters: positive nouns (e.g., 素质|quality) and negative nouns (e.g., 风险|risk). The polarity of an SAA is then determined by its collocated noun.

CityUHK4: clause-level machine learning + lexicon.

CityUHK3: sentence-level machine learning + lexicon.

CityUHK2: clause-level machine learning.

CityUHK1: sentence-level machine learning.

QLK_DSAA. This group submitted two systems. The authors adopt their SELC (SElf-supervised, Lexicon-based and Corpus-based) model (Qiu et al. 2009), which was proposed to exploit the complementarity between lexicon-based and corpus-based methods and thereby improve overall performance. They determine the sentence polarity with the SELC model and simply take the sentence polarity as the polarity of the SAA in the sentence.

QLK_DSAA_NR: Starting from the output of the SELC model, they invert the SO of an SAA when it is modified by negative terms. Because our task consists only of positive and negative categories, they replace the neutral value produced by the SELC model with the predominant polarity of the SAA.

QLK_DSAA_R: Starting from the output of QLK_DSAA_NR, they add rules to cope with the two modifiers 偏|specially and 太|too, which always carry a negative meaning.
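The post-processing rules described for the two QLK_DSAA systems can be sketched as follows. This is a hypothetical reading of the description, not the authors' code: the function name, the argument names and, in particular, the order in which the rules are applied are assumptions.

```python
# Modifiers that the QLK_DSAA_R rules treat as always negative.
NEGATIVE_MODIFIERS = {"偏", "太"}


def qlk_style_polarity(sentence_polarity, predominant="positive",
                       negated=False, modifier=None):
    """Turn a sentence-level polarity into an SAA polarity.

    sentence_polarity: "positive", "negative" or "neutral" (model output).
    predominant: predominant polarity of the SAA, used for neutral output.
    negated: True if the SAA is modified by a negative term.
    modifier: the adverb directly modifying the SAA, if any.
    """
    # Rule from the _R system: 偏 and 太 always yield a negative reading.
    if modifier in NEGATIVE_MODIFIERS:
        return "negative"
    # Rule from the _NR system: neutral falls back to the predominant polarity.
    polarity = predominant if sentence_polarity == "neutral" else sentence_polarity
    # Rule from the _NR system: negation inverts the orientation.
    if negated:
        polarity = "negative" if polarity == "positive" else "positive"
    return polarity


if __name__ == "__main__":
    print(qlk_style_polarity("positive", modifier="太"))
    print(qlk_style_polarity("neutral", predominant="negative"))
```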

Twitter Sentiment. This group submitted three systems (Pak and Paroubek 2010). By exploiting Twitter, they automatically collect English and Chinese datasets consisting of negative and positive expressions. The sentiment classifier is a Naive Bayes model trained with word n-grams as features.

Twitter Sentiment: The task dataset is automatically translated from Chinese to English using Google Translate. A Bayes classifier is trained on English training data automatically extracted from Twitter.

Twitter Sentiment_ext: This system builds on Twitter Sentiment and uses extended data.

Twitter Sentiment_zh: A Bayes classifier is trained on Chinese training data automatically extracted from Twitter.
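To make the classifier type concrete, here is a minimal multinomial Naive Bayes over word unigrams and bigrams with add-one smoothing. It is a generic textbook sketch, not the participants' implementation: the class name, the choice of n up to 2 and the smoothing are assumptions, and the two training examples are invented.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.feature_counts[label].update(ngrams(tokens))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for feat in ngrams(tokens):
                if feat in self.vocab:  # unseen features are skipped
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented toy training data.
    model = NaiveBayes().fit([["good", "day"], ["bad", "day"]],
                             ["positive", "negative"])
    print(model.predict(["good"]))
```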

Biparty. This system (Meng and Wang 2010) recasts the task of disambiguating SAAs as predicting the polarity of target nouns. It presents a bootstrapping method that automatically builds the sentiment lexicon by constructing a noun-verb bipartite graph from a large corpus. The authors first select a few nouns as seed words, and then use a cross-inducing method to add more nouns and verbs to the lexicon. The strategy is based on the random walk model.

6 Discussion

To our delight, the 8 participating teams used quite different methods for disambiguating SAAs. The results of some systems are promising, and the micro precision of the best three systems is over 93 %. Although the results of some other systems are weaker, the methods they adopt are novel and interesting.

6.1 Human annotation

In the YSC-DSAA system, the SAAOL word library is built by humans. In the HITSZ_CITYU systems, the seed collocations are annotated by humans. These three systems (YSC-DSAA, HITSZ_CITYU_1, HITSZ_CITYU_2) rank in the top 3 among all systems, which indicates that human-annotated resources help improve the performance of disambiguating SAAs.

6.2 Training data

The OpAL system combines a supervised method with unsupervised ones, and the supervised method employs an SVM classifier trained on NTCIR data and EmotiBlog annotations. The CityUHK systems train a maximum entropy classifier on the Chinese data from NTCIR. The Twitter Sentiment systems use training data automatically collected from Twitter. The results of CityUHK2 and CityUHK1 show that the maximum entropy classifier does not work well, mainly because the Chinese training data is small, only 9 K. The Twitter Sentiment systems perform even worse than our baseline, mainly because of the poor quality of the training data automatically collected from Twitter. Moreover, training data designed for general sentiment analysis is not well suited to our task of disambiguating SAAs.

6.3 Cross-lingual resources

Our task is in Chinese. Some participating systems, including OpAL and Twitter Sentiment, apply English sentiment analysis by translating our Chinese data into English. The OpAL system achieves a fairly good result. Interestingly, Twitter Sentiment, which is based on automatically extracted English training data, obtains even better results than Twitter Sentiment_zh, which is based on Chinese training data. This suggests that the polarity of SAAs carries over across languages and that disambiguating SAAs is a task common to natural language processing in different languages.

6.4 Heuristic rules

Some participating systems, including OpAL and QLK_DSAA, employ heuristic rules. By adding rules to cope with 偏|specially and 太|too, the system QLK_DSAA_R outperforms QLK_DSAA_NR by 4.46 percentage points in micro precision. This shows the utility of heuristic rules in sentiment analysis.

6.5 Target nouns

Some participating systems, including YSC-DSAA, CityUHK and Biparty, use the polarity of target nouns to disambiguate SAAs. The YSC-DSAA system manually annotates the polarity of target nouns and achieves a good result. In the CityUHK systems, positive and negative nouns are classified and annotated. By using the polarity of target nouns, CityUHK4 outperforms CityUHK2 by 9.84 percentage points in micro precision. The Biparty system tries to automatically extract negative nouns from a large corpus using the random walk model, but the experimental results do not meet the authors’ expectations.

In our own work (Wu and Wen 2010; Wen and Wu 2011), the task of disambiguating SAAs is also reduced to sentiment classification of nouns. The SO of an SAA in a given phrase can be calculated by the following equation:

$$C(a) = \begin{cases} 1 & \text{if } a \text{ is positive-like} \\ -1 & \text{if } a \text{ is negative-like} \end{cases}$$

$$C(n) = \begin{cases} 1 & \text{if } n \text{ is positive expectation} \\ -1 & \text{if } n \text{ is negative expectation} \end{cases}$$

$$SO(a) = C(a) \times C(n) \qquad (3)$$

If the adverb is a negation (“not”), then $SO(a) = -SO(a)$.

where C(a) denotes the category of the SAA and C(n) denotes the sentiment expectation of the noun. The task is thus transformed into automatically determining the sentiment expectation of nouns, which is an important research issue in itself and has many applications in sentiment analysis. Wu and Wen (2010) mine the Web using lexico-syntactic patterns to infer the sentiment expectation of nouns, and then exploit a character-sentiment model to reduce the noise in the Web data. In Wen and Wu (2011), a bootstrapping framework is designed to retrieve from the Web patterns that might be used to express complaints, and the sentiment expectation of a noun is then predicted automatically with the output patterns.
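Equation (3) translates directly into a few lines of code. The sketch below assumes the noun's sentiment expectation is already known and passed in as +1 or -1; the function names are hypothetical, and the only examples used are the ones given earlier in the paper (工资高|salary is high is positive, 价格高|price is high is negative).

```python
# Adjective categories from sets (4) and (5).
POSITIVE_LIKE = {"大", "多", "高", "厚", "深", "重", "巨大", "重大"}
NEGATIVE_LIKE = {"小", "少", "低", "薄", "浅", "轻"}


def category(adjective):
    """C(a): +1 for positive-like SAAs, -1 for negative-like SAAs."""
    if adjective in POSITIVE_LIKE:
        return 1
    if adjective in NEGATIVE_LIKE:
        return -1
    raise ValueError(f"not one of the 14 target adjectives: {adjective}")


def semantic_orientation(adjective, noun_expectation, negated=False):
    """SO(a) = C(a) * C(n), with the sign flipped under negation."""
    if noun_expectation not in (1, -1):
        raise ValueError("noun_expectation must be +1 or -1")
    so = category(adjective) * noun_expectation
    return -so if negated else so


if __name__ == "__main__":
    print(semantic_orientation("高", 1))    # 工资高: positive expectation noun
    print(semantic_orientation("高", -1))   # 价格高: negative expectation noun
```

The product form captures the intuition that more of a desirable thing, or less of an undesirable thing, is positive.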

6.6 Context and world knowledge

The two systems HITSZ_CITYU_2 and HITSZ_CITYU_1 exploit the sentiment of neighboring sentences to disambiguate SAAs and achieve quite good results. In some cases, correctly disambiguating SAAs is quite hard because it requires world knowledge. For instance, the following sentence is very hard to cope with:

(6) 这位 跳水运动员 的 动作 难度 很 大.

This diver’s movement is very difficult.

“难度很大|very difficult” generally evokes a negative feeling. However, according to our world knowledge, the more difficult the movement is, the more highly the diver will be rewarded. So the polarity of 大|large in this sentence is positive.

7 Conclusion

Disambiguating sentiment ambiguous adjectives is a challenging task in sentiment analysis. The SemEval-2010 task on disambiguating sentiment ambiguous adjectives aims to encourage research on this problem. In this paper, we give a detailed description of the task, briefly introduce the participating systems, and discuss the different approaches. The experimental results of the participating systems are promising, and the approaches used are diverse and novel.

We are eager to see further research on this issue, and we encourage the integration of the disambiguation of sentiment ambiguous adjectives into applications of sentiment analysis.

Acknowledgments This work was supported by National High Technology Research and Development Program of China (863 Program) (No. 2012AA011101) and 2009 Chiang Ching-kuo Foundation for International Scholarly Exchange (No. RG013-D-09).

References

Andreevskaia, A., & Bergler, S. (2006). Sentiment tagging of adjectives at the meaning level. In The 19th Canadian conference on artificial intelligence.

Balahur, A., & Montoyo, A. (2010). The OpAL participation in the SemEval-2010 Task 18: Disambiguation of sentiment ambiguous adjectives. In Proceedings of 5th international workshop on semantic evaluation.

Ding, X., Liu, B., & Yu, P. (2008). A holistic lexicon-based approach to opinion mining. In Proceedings of WSDM-2008.

Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC-2006.

Hatzivassiloglou, V., & McKeown, K. (1997). Predicting the semantic orientation of adjectives. In Proceedings of ACL-1997.

Jin, W., & Ho, H. (2009). A novel lexicalized HMM-based learning framework for web opinion mining. In Proceedings of the 26th annual international conference on machine learning (ICML-09).

Justeson, J., & Katz, S. (1995). Principled disambiguation: Discriminating adjective senses with modified nouns. Computational Linguistics, 21(1), 1–27.

Kim, S., & Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of COLING-2004.

Li, F., Han, C., Huang, M., Zhu, X., Xia, Y., Zhang, S., & Yu, H. (2010). Structure-aware review mining and summarization. In Proceedings of COLING-2010.

Lu, B., & Tsou, B. (2010). CityU-DAC: Disambiguating sentiment-ambiguous adjectives within context. In Proceedings of 5th international workshop on semantic evaluation.

McCarthy, D., & Carroll, J. (2003). Disambiguating nouns, verbs and adjectives using automatically acquired selectional preferences. Computational Linguistics, 29(4), 639–654.

Meng, X., & Wang, H. (2010). Bootstrapping word dictionary based on random walking on biparty graph. In Proceedings of 5th international workshop on semantic evaluation.

Pak, A., & Paroubek, P. (2010). Using Twitter for disambiguating sentiment ambiguous adjectives. In Proceedings of 5th international workshop on semantic evaluation.

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval.

Qiu, L., Zhang, W., Hu, C., & Zhao, K. (2009). SELC: A self-supervised model for sentiment analysis. In Proceedings of CIKM-2009.

Su, F., & Markert, K. (2008). From words to senses: A case study of subjectivity recognition. In Proceedings of COLING-2008.

Takamura, H., Inui, T., & Okumura, M. (2006). Latent variable models for semantic orientations of phrases. In Proceedings of EACL-2006.

Takamura, H., Inui, T., & Okumura, M. (2007). Extracting semantic orientations of phrases from dictionary. In Proceedings of NAACL HLT-2007.

Turney, P., & Littman, M. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4), 315–346.

Wen, M., & Wu, Y. (2011). Predicting expectation of nouns using bootstrapping method. In Proceedings of IJCNLP-2011.

Wiebe, J., & Mihalcea, R. (2006). Word sense and subjectivity. In Proceedings of ACL-2006.

Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT/EMNLP-2005.

Wu, Y., & Wen, M. (2010). Disambiguating dynamic sentiment ambiguous adjectives. In Proceedings of COLING-2010.

Xu, R., Xu, J., & Kit, C. (2010). HITSZ_CITYU: Combine collocation, context words and neighboring sentence sentiment in sentiment adjectives disambiguation. In Proceedings of 5th international workshop on semantic evaluation.

Yang, S., & Liu, M. (2010). YSC-DSAA: An approach to disambiguate sentiment ambiguous adjectives based on SAAOL. In Proceedings of 5th international workshop on semantic evaluation.

Yarowsky, D. (1993). One sense per collocation. In Proceedings of ARPA human language technology workshop.
