ORIGINAL PAPER
SemEval-2010 task 18: disambiguating sentiment ambiguous adjectives
Yunfang Wu · Peng Jin
Published online: 1 December 2012
© Springer Science+Business Media Dordrecht 2012
Abstract Sentiment ambiguous adjectives, which have been neglected by most
previous research, pose a challenging task in sentiment analysis. We present an
evaluation task at SemEval-2010, designed to provide a framework for comparing
different approaches to this problem. The task focuses on 14 Chinese sentiment
ambiguous adjectives and provides manually labeled test data. Eight teams
submitted 16 systems to the task. In this paper, we define the task, describe the data
creation, list the participating systems, and discuss the different approaches.
Keywords Sentiment ambiguous adjectives · Sentiment analysis ·
Word sense disambiguation · SemEval
1 Introduction
In recent years, sentiment analysis has attracted considerable attention in the field of
natural language processing. It is the task of mining positive and negative opinions
from real texts, which can be applied to many natural language application systems,
such as document summarization and question answering. Previous work on this
problem falls into three groups: opinion mining of documents, sentiment
classification of sentences and polarity prediction of words. Sentiment analysis at
both document and sentence level relies heavily on word level. Another line of
research is feature-based sentiment analysis that extracts product features and the
opinion towards them (e.g. Jin and Ho 2009; Li et al. 2010), which is also based on
the lexical semantic orientation.
Y. Wu (✉)
Key Laboratory of Computational Linguistics (Peking University), Ministry of Education,
Beijing, China
e-mail: [email protected]
P. Jin
Laboratory of Intelligent Information Processing and Application,
Leshan Normal University, Leshan, China
e-mail: [email protected]
Lang Resources & Evaluation (2013) 47:743-755
DOI 10.1007/s10579-012-9206-z
The most frequently explored task at word level is to determine the semantic
orientation (SO) of words, in which most work centers on assigning a prior polarity
to words or word senses in the lexicon out of context. However, for some words, the
polarity varies strongly with context. For instance, the word “low” has a positive
orientation in “low cost” but a negative orientation in “low salary”. This makes it
hard to attach each word to a specific sentiment category in the lexicon. Turney and
Littman (2003) claim that sentiment ambiguous words cannot be avoided easily in a
real-world application. Unfortunately, sentiment ambiguous words are neglected
by most research on sentiment analysis (e.g., Hatzivassiloglou and
McKeown 1997; Turney and Littman 2003; Kim and Hovy 2004).
Nor have sentiment ambiguous words been deliberately tackled in
research on word sense disambiguation, where senses are defined as word
meanings rather than semantic orientations. In fact, disambiguating sentiment
ambiguous words is a task at the intersection of sentiment analysis and word sense
disambiguation.
Our task at SemEval-2010 provides a benchmark data set to encourage studies on
disambiguating sentiment ambiguous adjectives (SAAs) within context in real text.
We limit our work to 14 frequently used adjectives in Chinese, such as “large, small,
many, few, high, low”, which all have the meaning of measurement. Although the
number of such ambiguous adjectives is not large, they are frequently used in real
text, especially in texts expressing opinions and emotions. The work of Wu and
Wen (2010) has shown that disambiguating the 14 SAAs can noticeably improve
the performance of sentiment classification of product reviews. The task attracted
8 participating teams from France, Spain, mainland China
and Hong Kong.
The rest of this paper is organized as follows. Section 2 discusses related work;
Sect. 3 defines the task; Sect. 4 describes the data collection; Sect. 5 gives a brief
summary of 16 participating systems; Sect. 6 gives a discussion; finally Sect. 7
draws conclusions.
2 Related work
2.1 Word-level sentiment analysis
Recently there has been extensive research in sentiment analysis, for which Pang
and Lee (2008) give an in-depth survey of literature. Closer to our study is the large
body of work on automatic SO prediction of words (Hatzivassiloglou and McKeown
1997; Turney and Littman 2003; Kim and Hovy 2004; Andreevskaia and Bergler
2006), but unfortunately these studies either discard SAAs or simply assign a prior polarity to each SAA.
In recent years, some studies have gone a step further, attaching SO to
senses instead of word forms (Esuli and Sebastiani 2006; Wiebe and Mihalcea 2006;
Su and Markert 2008), but their work is still limited to the lexicon, out of context.
The most relevant work is Ding et al. (2008), in which SAAs are called
context dependent opinions. They argue that there is no way to know the SO of
SAAs without prior knowledge, and that asking a domain expert to provide such
knowledge is not scalable. So they adopt a holistic lexicon-based approach to solve this
problem, exploiting external information and evidence in other sentences and
other reviews. Wu and Wen (2010) and Wen and Wu (2011) disambiguate dynamic
SAAs by extracting the sentiment expectation of nouns using lexical-syntactic patterns.
2.2 Phrase-level sentiment analysis
The disambiguation of SAAs can also be considered as a problem of phrase-level
sentiment analysis. Wilson et al. (2005) present a two-step process to recognize
contextual polarity that employs machine learning and a variety of features.
Takamura et al. (2006, 2007) propose latent variable models and a lexical network to
determine the SO of phrases, focusing on “noun + adjective” pairs. Their
experimental results suggest that the classification of pairs containing ambiguous
adjectives is much harder than those with unambiguous adjectives. In this task, we
also deal with “noun + adjective” pairs but focus on the much harder task of
disambiguating SAAs.
2.3 Disambiguating adjectives
Although a great deal of work has been devoted to disambiguating word senses, little of it
deliberately tackles the problem of disambiguating adjectives, since most work
focuses on the meanings of nouns and verbs.
Yarowsky (1993) utilizes collocations to disambiguate nouns, verbs and
adjectives. Justeson and Katz (1995) argue for a linguistically principled approach
to disambiguating adjective senses, and conclude that about three-quarters of all
instances of the adjectives can be disambiguated by the nouns they modify or by
syntactic constructions. McCarthy and Carroll (2003) explore selectional preferences
for the disambiguation of verbs, nouns and adjectives.
3 Task set up
SAAs can be divided into two groups: static SAAs and dynamic SAAs. A static
SAA has different semantic orientations corresponding to different senses, which
can be defined in the lexicon. For instance, 骄傲|pride has two senses: one sense,
“pride”, is positive; the other sense, “conceited”, is negative. Dynamic
SAAs are neutral out of context, and their SOs are evoked only when they
occur in specific contexts, which makes it impossible to assign a polarity tag to a
dynamic SAA in the lexicon. For instance, it is quite difficult to assign a polarity tag
to the word 高|high out of context.
In this task, we focus on 14 frequently used dynamic SAAs in Chinese, as shown
below:
(1) Sentiment ambiguous adjectives (SAAs) = {大|large, 多|many, 高|high, 厚|thick,
深|deep, 重|heavy, 巨大|huge, 重大|great, 小|small, 少|few, 低|low,
薄|thin, 浅|shallow, 轻|light}
These adjectives are neutral out of context, but when they co-occur with some
target nouns, positive or negative emotion will be evoked. The task is designed to
automatically determine the SO of these SAAs within context. For example, 高|high
should be assigned as positive in “工资高|salary is high” but negative in “价格高|
price is high”.
In this task, the organizers provide no training data, but participating systems
are encouraged to use external resources, including training data and
lexicons.
4 Data creation
4.1 Data
We collected data from two sources. The main part was extracted from Xinhua News
Agency of Chinese Gigaword (Second Edition) released by LDC. The texts were
automatically word-segmented and POS-tagged using the open software ICTCLAS,1
which is based on a hierarchical hidden Markov model. In order to concentrate on the
disambiguation of SAAs and avoid the complications of syntactic parsing,
we extracted sentences containing strings that match the pattern shown in (2),
where the target nouns are modified by the adjectives in most cases.
(2) noun + adverb + adjective (adjective ∈ SAAs)
e.g. 成本/n 较/d 低/a
The cost is relatively low.
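Pattern (2) can be illustrated with a small matcher over `word/tag` strings. This is only a sketch: the whitespace-separated `word/tag` layout and the exact tags `n`, `d` and `a` are taken from the example above, while the function name `find_pattern` and the regular expression are assumptions, not the organizers' extraction script.

```python
import re

# The 14 target adjectives listed in (1).
SAAS = ["大", "多", "高", "厚", "深", "重", "巨大", "重大",
        "小", "少", "低", "薄", "浅", "轻"]

# Try longer adjectives first so that 巨大 is preferred over 大.
_ADJ = "|".join(sorted(SAAS, key=len, reverse=True))

# noun/n + adverb/d + SAA/a on whitespace-separated word/tag tokens.
PATTERN = re.compile(
    r"(?:^|\s)(\S+)/n\s+(\S+)/d\s+(" + _ADJ + r")/a(?=\s|$)"
)


def find_pattern(tagged_sentence):
    """Return (noun, adverb, adjective) triples found in a tagged sentence."""
    return [m.groups() for m in PATTERN.finditer(tagged_sentence)]


print(find_pattern("成本/n 较/d 低/a"))
```

A sentence without an adverb between the noun and the adjective, such as `价格/n 高/a`, yields no match under this pattern.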
A further, smaller part of the data was extracted from the Web. Using the search engine
Google,2 we issued queries of the form shown in (3):
(3) 很|very + adjective (adjective ∈ SAAs)
From the returned snippets, we manually picked out sentences that contain
strings matching pattern (2). These sentences were also automatically segmented
and POS-tagged using ICTCLAS.
SAAs in the data were labeled as positive, negative or neutral independently by
two annotators. Since the task focuses on the distinction between the positive and
negative categories, the neutral instances were then removed. The inter-annotator
agreement is high, with a kappa value of 0.91, indicating that
disambiguating SAAs within context is not difficult for humans. After cases
of disagreement were resolved through discussion between the two annotators, a gold standard
annotation was agreed upon.
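For readers unfamiliar with the agreement statistic, the following sketch computes a kappa value for two annotators. It assumes Cohen's kappa, which the text does not name explicitly, and the toy labels are invented for illustration; they are not the task annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical labels).
a = ["pos", "pos", "neg", "neg"]
b = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(a, b))
```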
1 http://www.ictclas.org/.
2 http://www.google.com/.
In total, 2,917 instances were provided as test data for the task. The number of
instances per target adjective is listed in Table 3. The instances are given in XML
format. Table 1 gives an example for the adjective 多|many, where the empty attribute senseid="" is
to be filled with the correct answer, a polarity tag of positive or negative. The
dataset can be downloaded freely from the SemEval-2010 website.3
Evaluation was performed in terms of micro precision and macro precision:
\[
P_{mir} = \sum_{i=1}^{N} m_i \Big/ \sum_{i=1}^{N} n_i \tag{1}
\]
Table 1 An example of the test data
<instance id="多.3">
<answer instance="多.3" senseid=""/>
<context>
王义夫自言收获颇 <head>多</head>
</context>
<postagging>
<word id="1" pos="nr">
<token>王</token>
</word>
<word id="2" pos="nr">
<token>义夫</token>
</word>
<word id="3" pos="p">
<token>自</token>
</word>
<word id="4" pos="vg">
<token>言</token>
</word>
<word id="5" pos="n">
<token>收获</token>
</word>
<word id="6" pos="d">
<token>颇</token>
</word>
<word id="7" pos="a">
<token>多</token>
</word>
</postagging>
</instance>
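An instance in the format of Table 1 could be read as in the sketch below. The element and attribute names are those shown in Table 1; the helper name `parse_instance`, the shape of the returned dictionary, and the assumption that the distributed file follows exactly this layout are illustrative only.

```python
import xml.etree.ElementTree as ET

# The instance from Table 1, written compactly.
SAMPLE = """<instance id="多.3">
  <answer instance="多.3" senseid=""/>
  <context>王义夫自言收获颇 <head>多</head></context>
  <postagging>
    <word id="1" pos="nr"><token>王</token></word>
    <word id="2" pos="nr"><token>义夫</token></word>
    <word id="3" pos="p"><token>自</token></word>
    <word id="4" pos="vg"><token>言</token></word>
    <word id="5" pos="n"><token>收获</token></word>
    <word id="6" pos="d"><token>颇</token></word>
    <word id="7" pos="a"><token>多</token></word>
  </postagging>
</instance>"""


def parse_instance(xml_text):
    """Extract the id, target adjective, answer slot and tagged words."""
    root = ET.fromstring(xml_text)
    words = [(w.findtext("token"), w.get("pos")) for w in root.iter("word")]
    return {
        "id": root.get("id"),
        "target": root.find("context/head").text,
        "senseid": root.find("answer").get("senseid"),
        "words": words,
    }


parsed = parse_instance(SAMPLE)
print(parsed["id"], parsed["target"], parsed["words"][-3:])
```

A participating system would fill the empty `senseid` value with its predicted polarity tag.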
3 http://semeval2.fbk.eu/semeval2.php?location=data.
\[
P_{mar} = \sum_{i=1}^{N} P_i \Big/ N, \qquad P_i = m_i / n_i \tag{2}
\]
where $N$ is the number of all target words, $n_i$ is the number of all test instances for a
specific word, and $m_i$ is the number of correctly labeled instances.
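The two measures can be computed as in the following sketch. The input layout (a mapping from each target word to its pair of correct and total counts) and the function name are assumptions made for illustration, and the counts in the example are invented.

```python
def micro_macro_precision(counts):
    """counts maps each target word to (m_i, n_i):
    m_i correctly labeled instances out of n_i test instances."""
    total_correct = sum(m for m, _ in counts.values())
    total_instances = sum(n for _, n in counts.values())
    # Micro precision pools all instances before dividing.
    micro = total_correct / total_instances
    # Macro precision averages the per-word precisions P_i = m_i / n_i.
    macro = sum(m / n for m, n in counts.values()) / len(counts)
    return micro, macro


# Hypothetical counts for two target words.
micro, macro = micro_macro_precision({"高": (8, 10), "低": (1, 2)})
print(micro, macro)
```

Micro precision weights frequent adjectives more heavily, whereas macro precision gives every adjective the same weight, which is why the two rankings in Table 2 can differ.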
4.2 Baseline
We group 14 SAAs into two categories: positive-like adjectives and negative-like
adjectives. Positive-like adjectives have the connotation towards large measure-
ment, whereas negative-like adjectives have the connotation towards small
measurement.
(4) Positive-like adjectives (Pa) = {大|large, 多|many, 高|high, 厚|thick, 深|deep,
重|heavy, 巨大|huge, 重大|great}
(5) Negative-like adjectives (Na) = {小|small, 少|few, 低|low, 薄|thin, 浅|shallow,
轻|light}
We conducted baseline experiments on the dataset. Ignoring the context, we
assign all positive-like adjectives as positive and all negative-like adjectives as
negative. The micro precision of the baseline is 61.20 %.
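The baseline amounts to a context-free lookup, which can be sketched as follows. The two sets are copied from (4) and (5); the function name and the string labels are assumptions.

```python
POSITIVE_LIKE = {"大", "多", "高", "厚", "深", "重", "巨大", "重大"}
NEGATIVE_LIKE = {"小", "少", "低", "薄", "浅", "轻"}


def baseline_polarity(adjective):
    """Context-free baseline: the polarity depends on the adjective alone."""
    if adjective in POSITIVE_LIKE:
        return "positive"
    if adjective in NEGATIVE_LIKE:
        return "negative"
    raise ValueError(f"not one of the 14 target adjectives: {adjective}")


print(baseline_polarity("高"))  # same answer for every context of 高
```

Because the context is ignored, the baseline labels 高|high as positive both in “工资高|salary is high” and in “价格高|price is high”, and is therefore wrong on the second phrase.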
5 Systems and results
We first released trial data and then test data. In total, 11 different teams
downloaded both the trial and test data. In the end, 8 teams submitted
experimental results, covering 16 systems.
5.1 Results
Table 2 lists the scores of all 16 systems, ranked from best to worst by
micro precision. The best system achieves a micro precision of 94.20 %, which
outperforms our baseline by 33 percentage points. Five systems fall below our
baseline. The lowest-ranked system performs only slightly better than the
random baseline, which is 50 % when an SO tag is assigned randomly to each instance
in the test data. To our surprise, performance differs greatly across systems:
the gap between the best and lowest-ranked systems is 43.12 percentage points in micro
precision.
Table 3 lists per-adjective statistics, where “Ins#” denotes the
number of instances in the test data; “Max %” and “Min %” denote the maximum and minimum
micro precision among all systems, respectively; and “SD” denotes the standard
deviation of precision.
Table 3 shows that performance also differs greatly across systems on
each of the 14 target adjectives. For example, the precision on 大|large is 95.53 % for
one system but only 46.51 % for another. No single adjective
is hard for all systems, and none is easy for all
systems.
5.2 Systems
In this section, we briefly describe the participating systems.
Table 2 The scores of 16 systems
System                  Micro pre. (%)   Macro pre. (%)
YSC-DSAA                94.20            92.93
HITSZ_CITYU_1           93.62            95.32
HITSZ_CITYU_2           93.32            95.79
Dsaa                    88.07            86.20
OpAL                    76.04            70.38
CityUHK4                72.47            69.80
CityUHK3                71.55            75.54
HITSZ_CITYU_3           66.58            62.94
QLK_DSAA_R              64.18            69.54
CityUHK2                62.63            60.85
CityUHK1                61.98            67.89
QLK_DSAA_NR             59.72            65.68
Twitter Sentiment       59.00            62.27
Twitter Sentiment_ext   56.77            61.09
Twitter Sentiment_zh    56.46            59.63
Biparty                 51.08            51.26
Table 3 The scores of 14 SAAs
Word         Ins#   Max %    Min %   SD
大|large     559    95.53    46.51   0.155
多|many      222    95.50    49.10   0.152
高|high      546    95.60    54.95   0.139
厚|thick     20     95.00    35.00   0.160
深|deep      45     100.00   51.11   0.176
重|heavy     259    96.91    34.75   0.184
巨大|huge    49     100.00   10.20   0.273
重大|great   28     100.00   7.14    0.243
小|small     290    93.10    49.66   0.167
少|few       310    95.81    41.29   0.184
低|low       521    93.67    48.37   0.147
薄|thin      33     100.00   18.18   0.248
浅|shallow   8      100.00   37.50   0.155
轻|light     26     100.00   34.62   0.197
YSC-DSAA This system (Yang and Liu 2010) manually built a word library,
SAAOL (sentiment ambiguous adjectives oriented library). It consists of positive
words, negative words, NSAA (negative sentiment ambiguous adjectives), PSAA
(positive sentiment ambiguous adjectives), and inverse words. A word is
assigned as NSAA if it collocates with positive-like adjectives, and
as PSAA if it collocates with negative-like adjectives. For example,
“任务|task” is assigned as NSAA because it collocates with 重|heavy in the phrase
“任务很重|the task is very heavy”. The system divides sentences into clauses using
heuristic rules, and disambiguates SAAs by analyzing the relationship between
SAAs and the target nouns.
HITSZ_CITYU This group (Xu et al. 2010) submitted three systems, including
one baseline system and two improved systems.
HITSZ_CITYU_3: The baseline system is based on the collocations of opinion
words and target words. For the given adjectives, their collocations are extracted
from the People’s Daily corpus. With human annotation, the system obtains 412
positive and 191 negative collocations, which serve as seed collocations. Using the
context words of seed collocations as features, the system trains a one-class SVM
classifier.
HITSZ_CITYU_2 and HITSZ_CITYU_1: Using HowNet-based word similarity,
the system expands the seed collocations on both the adjective side and the collocated
target noun side. The system then exploits opinion analysis of the neighboring sentences to further
improve performance. The strategy is that if the neighboring sentences on both sides
have the same polarity, the ambiguous adjective is assigned that
polarity; if the neighboring sentences have conflicting polarities, the SO of the
ambiguous adjective is determined by its context words and the transition
probability of sentence polarity. The final system (HITSZ_CITYU_1/2) combines
collocations, context words and neighboring sentence sentiment in a two-class SVM
classifier to determine the polarity of ambiguous adjectives. HITSZ_CITYU_2 and
HITSZ_CITYU_1 use different parameters and combining strategies.
OpAL This system (Balahur and Montoyo 2010) combines supervised methods
with unsupervised ones. The authors use Google Translate to automatically
translate the task dataset from Chinese to English, since their system works on
English. The system explores three types of judgments. The first trains an SVM
classifier based on NTCIR data and EmotiBlog annotations. The second is based
on the local polarity, obtained from the number of hits returned by a search engine for
queries of the form “noun + SAA + AND + non-ambiguous adjective”, where the non-ambiguous
adjectives include a positive set (“positive, beautiful, good”) and a
negative set (“negative, ugly, bad”). An example query is “price high and good”.
The third judgment consists of a set of rules. The final result is obtained by a
majority vote of the three components.
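The query construction and the final vote can be sketched as follows. The two adjective sets and the example query come from the description above; how OpAL actually turns hit counts into a local polarity is not specified here, so `local_polarity` simply compares summed hit counts, which is an assumption, as are all function names. No search engine is called: hit counts are passed in by the caller.

```python
from collections import Counter

POSITIVE_SET = ["positive", "beautiful", "good"]
NEGATIVE_SET = ["negative", "ugly", "bad"]


def build_queries(noun, adjective):
    """Build 'noun SAA and word' queries for both adjective sets."""
    def make(word):
        return f"{noun} {adjective} and {word}"
    return [make(w) for w in POSITIVE_SET], [make(w) for w in NEGATIVE_SET]


def local_polarity(positive_hits, negative_hits):
    """Assumed scoring: the set with more total hits wins (ties go negative)."""
    return "positive" if sum(positive_hits) > sum(negative_hits) else "negative"


def majority_vote(votes):
    """Return the label chosen by most components."""
    return Counter(votes).most_common(1)[0][0]


pos_queries, neg_queries = build_queries("price", "high")
print(pos_queries[2])  # the example query from the text
# Hypothetical hit counts and component outputs.
votes = ["negative", local_polarity([10, 5, 3], [40, 2, 1]), "positive"]
print(majority_vote(votes))
```

With three components and two labels, the vote can never tie.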
CityUHK This group submitted four systems (Lu and Tsou 2010). Both a machine
learning method and a lexicon-based method are employed in their systems. In the
machine learning method, a maximum entropy model is used to train a classifier
on the Chinese data from the NTCIR opinion task. Clause-level and sentence-level
classifiers are trained and compared. In the lexicon-based method, the authors
classify SAAs into two clusters: intensifiers (our positive-like adjectives in (4)) and
suppressors (our negative-like adjectives in (5)). Moreover, the collocation nouns
are also classified into two clusters: positive nouns (e.g., 素质|quality) and negative
nouns (e.g., 风险|risk). The polarity of an SAA is then determined by its
collocation noun.
CityUHK4: clause-level machine learning + lexicon.
CityUHK3: sentence-level machine learning + lexicon.
CityUHK2: clause-level machine learning.
CityUHK1: sentence-level machine learning.
QLK_DSAA This group submitted two systems. The authors adopt their SELC
(SElf-supervised, Lexicon-based and Corpus-based) model (Qiu et al. 2009), which
was proposed to exploit the complementarity between lexicon-based and corpus-based
methods to improve overall performance. They determine the sentence
polarity with the SELC model, and simply take the sentence polarity as the polarity of
the SAA in the sentence.
QLK_DSAA_NR: Based on the result of the SELC model, they invert the SO of
an SAA when it is modified by negative terms. Our task consists of only positive and
negative categories, so they replace the neutral value obtained by the SELC model with
the predominant polarity of the SAA.
QLK_DSAA_R: Based on the result of QLK_DSAA_NR, they add rules to cope
with the two modifiers 偏|specially and 太|too, which always carry a negative
meaning.
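The post-processing described for the two QLK_DSAA systems can be sketched as a few rules applied to the SELC output. The SELC model itself is not reproduced; its label is taken as an input. The function name, the argument layout and the order in which the rules are applied are assumptions, since the description above does not fix them.

```python
NEGATIVE_MODIFIERS = {"偏", "太"}  # the two modifiers handled by QLK_DSAA_R


def flip(label):
    """Swap positive and negative."""
    return "negative" if label == "positive" else "positive"


def qlk_polarity(selc_label, predominant, negated=False, modifier=None):
    """Apply the described rules to a sentence-level SELC label.

    selc_label: 'positive', 'negative' or 'neutral' (SELC output, given).
    predominant: predominant polarity of the SAA, used for neutral output.
    negated: whether the SAA is modified by a negative term.
    modifier: the adverb modifying the SAA, if any.
    """
    # QLK_DSAA_R rule: 偏 and 太 are treated as always negative.
    if modifier in NEGATIVE_MODIFIERS:
        return "negative"
    # QLK_DSAA_NR rules: replace neutral, then invert under negation.
    label = predominant if selc_label == "neutral" else selc_label
    return flip(label) if negated else label


print(qlk_polarity("neutral", "positive"))
print(qlk_polarity("positive", "positive", modifier="太"))
```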
Twitter Sentiment This group submitted three systems (Pak and Paroubek 2010).
By exploiting Twitter, they automatically collect English and Chinese datasets
consisting of negative and positive expressions. The sentiment classifier is trained
using a Naive Bayes model with word n-grams as features.
Twitter Sentiment: The task dataset is automatically translated from Chinese to
English using Google Translate. They train a Naive Bayes classifier on the English
training data automatically extracted from Twitter.
Twitter Sentiment_ext: With Twitter Sentiment as a base, they use extended
data.
Twitter Sentiment_zh: They train a Naive Bayes classifier on the Chinese training
data automatically extracted from Twitter.
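A Naive Bayes classifier over word n-grams can be sketched with the standard library alone. This is a generic multinomial model with add-one smoothing over unigrams and bigrams; the smoothing, the n-gram order and the toy training sentences are assumptions and do not reproduce the configuration or data of Pak and Paroubek (2010).

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=2):
    """All word n-grams of order 1..n_max, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.feature_counts[label].update(ngrams(tokens))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feature in ngrams(tokens):
                if feature in self.vocab:  # ignore unseen n-grams
                    count = self.feature_counts[label][feature]
                    score += math.log((count + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical, not the Twitter corpus).
docs = [["good", "price"], ["great", "phone"],
        ["bad", "battery"], ["poor", "screen"]]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["good", "phone"]))
```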
Biparty This system (Meng and Wang 2010) recasts the task of disambiguating
SAAs as predicting the polarity of target nouns. The system presents a
bootstrapping method to automatically build the sentiment lexicon, by building a
noun-verb bi-party graph from a large corpus. They first select a few nouns as
seed words, and then use a cross-inducing method to add more nouns and
more verbs to the lexicon. The strategy is based on the random walk model.
6 Discussion
To our delight, the 8 participating teams exploit quite different methods for
disambiguating SAAs. The experimental results of some systems are promising, and
the micro precision of the best three systems is over 93 %. Although the
experimental results of some systems are less good, the methods they adopt are
novel and interesting.
6.1 Human annotation
In the YSC-DSAA system, the word library SAAOL is built by humans. In the
HITSZ_CITYU systems, the seed collocations are annotated by humans. These three
systems (YSC-DSAA, HITSZ_CITYU_1, HITSZ_CITYU_2) rank in the top three among all
systems. Human-annotated resources clearly help improve the performance
of disambiguating SAAs.
6.2 Training data
The OpAL system combines a supervised method with unsupervised ones, and the
supervised method employs an SVM classifier based on NTCIR data and EmotiBlog
annotations. The CityUHK systems train a maximum entropy classifier on the
Chinese data from NTCIR. The Twitter Sentiment systems use training data
automatically collected from Twitter. The experimental results of CityUHK2 and
CityUHK1 show that the maximum entropy classifier does not work well, mainly because
the Chinese training data is small, at only 9 K. The performance of the Twitter
Sentiment systems is even worse than our baseline, mainly due to the poor quality of the
training data automatically collected from Twitter. Moreover, training data
designed for general sentiment analysis is not well suited to our task of disambiguating SAAs.
6.3 Cross-lingual resources
Our task is in Chinese. Some participating systems, including OpAL and Twitter
Sentiment, exploit English sentiment analysis by translating our Chinese data into
English. The OpAL system achieves a fairly good result. Interestingly, the
Twitter Sentiment system, which is based on automatically extracted English training data,
obtains even better results than Twitter Sentiment_zh, which is based on Chinese training
data. This suggests that the polarity of SAAs carries over across languages and
that disambiguating SAAs is a common task in natural language processing.
6.4 Heuristic rules
Some participating systems, including OpAL and QLK_DSAA, employ heuristic
rules. By adding rules to cope with 偏|specially and 太|too, the system QLK_DSAA_R
outperforms QLK_DSAA_NR by 4.46 percentage points in micro precision. This shows the utility of
heuristic rules in sentiment analysis.
6.5 Target nouns
Some participating systems, including YSC-DSAA, CityUHK and Biparty, employ
the polarity of target nouns to disambiguate SAAs. The YSC-DSAA system
manually annotates the polarity of target nouns, achieving a good result. In the
CityUHK systems, positive and negative nouns are classified and annotated. By
using the polarity of target nouns, the system CityUHK4 outperforms CityUHK2 by
9.84 percentage points in micro precision. The Biparty system tries to automatically extract the
negative nouns from a large corpus by using the random walk model, but the
experimental results do not meet the authors’ expectations.
In our work of Wu and Wen (2010) as well as Wen and Wu (2011), the task of
disambiguating SAAs is also reduced to sentiment classification of nouns. The SO
of SAAs in a given phrase can be calculated by the following equation:
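\[
C(a) = \begin{cases} 1 & \text{if } a \text{ is positive-like} \\ -1 & \text{if } a \text{ is negative-like} \end{cases}
\]
\[
C(n) = \begin{cases} 1 & \text{if } n \text{ is positive expectation} \\ -1 & \text{if } n \text{ is negative expectation} \end{cases}
\]
\[
SO(a) = C(a) \cdot C(n); \qquad \text{if the adverb is the negation “not”, } SO(a) = -SO(a) \tag{3}
\]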
where C(a) denotes the category of SAAs and C(n) denotes the sentiment expectation of
nouns. The task is thus transformed into automatically determining the sentiment
expectation of nouns, which is an important research issue in itself and has many
applications in sentiment analysis. Wu and Wen (2010) mine the Web using
lexico-syntactic patterns to infer the sentiment expectation of nouns, and then exploit a
character-sentiment model to reduce the noise caused by the Web data. In the work of
Wen and Wu (2011), a bootstrapping framework is designed to retrieve patterns that
might be used to express complaints from the Web, and the sentiment expectation
of a noun can then be predicted automatically from the output patterns.
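Equation (3) translates directly into code once the sentiment expectation of the noun is known. In the sketch below the expectation is passed in as +1 or -1 rather than predicted; the function names and the `negated` flag are assumptions made for illustration.

```python
POSITIVE_LIKE = {"大", "多", "高", "厚", "深", "重", "巨大", "重大"}
NEGATIVE_LIKE = {"小", "少", "低", "薄", "浅", "轻"}


def adjective_category(adjective):
    """C(a): +1 for positive-like, -1 for negative-like adjectives."""
    if adjective in POSITIVE_LIKE:
        return 1
    if adjective in NEGATIVE_LIKE:
        return -1
    raise ValueError(f"not one of the 14 target adjectives: {adjective}")


def semantic_orientation(adjective, noun_expectation, negated=False):
    """SO(a) = C(a) * C(n), with the sign flipped under negation.

    noun_expectation is C(n): +1 for positive, -1 for negative expectation.
    """
    so = adjective_category(adjective) * noun_expectation
    return -so if negated else so


# 工资|salary has positive expectation, 价格|price negative expectation.
print(semantic_orientation("高", 1), semantic_orientation("高", -1))
```

Under these definitions, 高|high with a positive-expectation noun such as 工资|salary comes out positive, while the same adjective with 价格|price comes out negative, in line with the examples in Sect. 3.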
6.6 Context and world knowledge
The two systems HITSZ_CITYU_2 and HITSZ_CITYU_1 exploit opinion analysis of
neighboring sentences to disambiguate SAAs, achieving quite good results. In some
cases, correctly disambiguating SAAs is quite hard because it requires world
knowledge. For instance, the following sentence is very hard to cope with:
(6) 这位 跳水运动员 的 动作 难度 很 大.
This diver’s movement is very difficult.
“难度很大|very difficult” generally evokes a negative feeling. However,
according to our world knowledge, the more difficult the movement is, the more
the diver will be rewarded. So the polarity of 大|large in this sentence is positive.
7 Conclusion
Disambiguating sentiment ambiguous adjectives poses a challenging task in
sentiment analysis. The SemEval-2010 task on disambiguating sentiment ambiguous adjectives
aims to encourage research on this problem. In this paper,
we give a detailed description of the task, briefly introduce the
participating systems, and discuss the different approaches. The experimental results of
the participating systems are promising, and the approaches used are diverse and
novel.
We are eager to see further research on this issue, and we encourage an
integration of the disambiguation of sentiment ambiguous adjectives into applica-
tions of sentiment analysis.
Acknowledgments This work was supported by National High Technology Research and Development
Program of China (863 Program) (No. 2012AA011101) and 2009 Chiang Ching-kuo Foundation for
International Scholarly Exchange (No. RG013-D-09).
References
Andreevskaia, A., & Bergler, S. (2006). Sentiment tagging of adjectives at the meaning level. In The 19th Canadian conference on artificial intelligence.
Balahur, A., & Montoyo, A. (2010). The OpAL participation in the SemEval-2010 Task 18: Disambiguation of sentiment ambiguous adjectives. In Proceedings of 5th international workshop on semantic evaluation.
Ding, X., Liu, B., & Yu, P. (2008). A holistic lexicon-based approach to opinion mining. In Proceedings of WSDM-2008.
Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC-2006.
Hatzivassiloglou, V., & McKeown, K. (1997). Predicting the semantic orientation of adjectives. In Proceedings of ACL-1997.
Jin, W., & Ho, H. (2009). A novel lexicalized HMM-based learning framework for web opinion mining. In Proceedings of the 26th annual international conference on machine learning (ICML-09).
Justeson, J., & Katz, S. (1995). Principled disambiguation: Discriminating adjective senses with modified nouns. Computational Linguistics, 21(1), 1–27.
Kim, S., & Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of COLING-2004.
Li, F., Han, C., Huang, M., Zhu, X., Xia, Y., Zhang, S., & Yu, H. (2010). Structure-aware review mining and summarization. In Proceedings of COLING-2010.
Lu, B., & Tsou, B. (2010). CityU-DAC: Disambiguating sentiment-ambiguous adjectives within context. In Proceedings of 5th international workshop on semantic evaluation.
McCarthy, D., & Carroll, J. (2003). Disambiguating nouns, verbs and adjectives using automatically acquired selectional preferences. Computational Linguistics, 29(4), 639–654.
Meng, X., & Wang, H. (2010). Bootstrapping word dictionary based on random walking on biparty graph. In Proceedings of 5th international workshop on semantic evaluation.
Pak, A., & Paroubek, P. (2010). Using Twitter for disambiguating sentiment ambiguous adjectives. In Proceedings of 5th international workshop on semantic evaluation.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval.
Qiu, L., Zhang, W., Hu, C., & Zhao, K. (2009). SELC: A self-supervised model for sentiment analysis. In Proceedings of CIKM-2009.
Su, F., & Markert, K. (2008). From words to senses: A case study of subjectivity recognition. In Proceedings of COLING-2008.
Takamura, H., Inui, T., & Okumura, M. (2006). Latent variable models for semantic orientations of phrases. In Proceedings of EACL-2006.
Takamura, H., Inui, T., & Okumura, M. (2007). Extracting semantic orientations of phrases from dictionary. In Proceedings of NAACL HLT-2007.
Turney, P., & Littman, M. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21(4), 315–346.
Wen, M., & Wu, Y. (2011). Predicting expectation of nouns using bootstrapping method. In Proceedings of IJCNLP-2011.
Wiebe, J., & Mihalcea, R. (2006). Word sense and subjectivity. In Proceedings of ACL-2006.
Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT/EMNLP-2005.
Wu, Y., & Wen, M. (2010). Disambiguating dynamic sentiment ambiguous adjectives. In Proceedings of COLING-2010.
Xu, R., Xu, J., & Kit, C. (2010). HITSZ_CITYU: Combine collocation, context words and neighboring sentence sentiment in sentiment adjectives disambiguation. In Proceedings of 5th international workshop on semantic evaluation.
Yang, S., & Liu, M. (2010). YSC-DSAA: An approach to disambiguate sentiment ambiguous adjectives based on SAAOL. In Proceedings of 5th international workshop on semantic evaluation.
Yarowsky, D. (1993). One sense per collocation. In Proceedings of ARPA human language technology workshop.