
Multilingual Subjectivity Analysis Using Machine Translation

Carmen Banea and Rada Mihalcea
University of North Texas
[email protected], [email protected]

Janyce Wiebe
University of Pittsburgh
[email protected]

Samer Hassan
University of North Texas
[email protected]

Abstract

Although research in other languages is increasing, much of the work in subjectivity analysis has been applied to English data, mainly due to the large body of electronic resources and tools that are available for this language. In this paper, we propose and evaluate methods that can be employed to transfer a repository of subjectivity resources across languages. Specifically, we attempt to leverage the resources available for English and, by employing machine translation, generate resources for subjectivity analysis in other languages. Through comparative evaluations on two different languages (Romanian and Spanish), we show that automatic translation is a viable alternative for the construction of resources and tools for subjectivity analysis in a new target language.

1 Introduction

We have seen a surge in interest towards the application of automatic tools and techniques for the extraction of opinions, emotions, and sentiments in text (subjectivity). A large number of text processing applications have already employed techniques for automatic subjectivity analysis, including automatic expressive text-to-speech synthesis (Alm et al., 2005), text semantic analysis (Wiebe and Mihalcea, 2006; Esuli and Sebastiani, 2006), tracking sentiment timelines in on-line forums and news (Lloyd et al., 2005; Balog et al., 2006), mining opinions from product reviews (Hu and Liu, 2004), and question answering (Yu and Hatzivassiloglou, 2003).

A significant fraction of the research work to date in subjectivity analysis has been applied to English, which has led to several resources and tools being available for this language. In this paper, we explore multiple paths that employ machine translation while leveraging the resources and tools available for English, to automatically generate resources for subjectivity analysis in a new target language. Through experiments carried out with automatic translation and cross-lingual projections of subjectivity annotations, we try to answer the following questions.

First, assuming an English corpus manually annotated for subjectivity, can we use machine translation to generate a subjectivity-annotated corpus in the target language? Second, assuming the availability of a tool for automatic subjectivity analysis in English, can we generate a corpus annotated for subjectivity in the target language by using automatic subjectivity annotations of English text and machine translation? Third, can these automatically generated resources be used to effectively train tools for subjectivity analysis in the target language?

Since our methods are particularly useful for languages with only a few electronic tools and resources, we chose to conduct our initial experiments on Romanian, a language with limited text processing resources developed to date. Furthermore, to validate our results, we carried out a second set of experiments on Spanish. Note, however, that our methods do not make use of any target-language-specific knowledge, and thus they are applicable to any other language as long as a machine translation engine exists between the selected language and English.


2 Related Work

Research in sentiment and subjectivity analysis has received growing interest from the natural language processing community, particularly motivated by the widespread need for opinion-based applications, including product and movie reviews, entity tracking and analysis, opinion summarization, and others.

Much of the work in subjectivity analysis has been applied to English data, though work on other languages is growing: e.g., Japanese data are used in (Kobayashi et al., 2004; Suzuki et al., 2006; Takamura et al., 2006; Kanayama and Nasukawa, 2006), Chinese data are used in (Hu et al., 2005), and German data are used in (Kim and Hovy, 2006). In addition, several participants in the Chinese and Japanese Opinion Extraction tasks of NTCIR-6 (Kando and Evans, 2007) performed subjectivity and sentiment analysis in languages other than English.

In general, efforts on building subjectivity analysis tools for other languages have been hampered by the high cost involved in creating corpora and lexical resources for a new language. To address this gap, we focus on leveraging resources already developed for one language to derive subjectivity analysis tools for a new language. This motivates the direction of our research, in which we use machine translation coupled with cross-lingual annotation projections to generate the resources and tools required to perform subjectivity classification in the target language.

The work closest to ours is the one reported in (Mihalcea et al., 2007), where a bilingual lexicon and a manually translated parallel text are used to generate the resources required to build a subjectivity classifier in a new language. In that work, we found that the projection of annotations across parallel texts can be successfully used to build a corpus annotated for subjectivity in the target language. However, parallel texts are not always available for a given language pair. Therefore, in this paper we explore a different approach where, instead of relying on manually translated parallel corpora, we use machine translation to produce a corpus in the new language.

3 Machine Translation for Subjectivity Analysis

We explore the possibility of using machine translation to generate the resources required to build subjectivity annotation tools in a given target language. We focus on two main scenarios. First, assuming a corpus manually annotated for subjectivity exists in the source language, we can use machine translation to create a corpus annotated for subjectivity in the target language. Second, assuming a tool for automatic subjectivity analysis exists in the source language, we can use this tool together with machine translation to create a corpus annotated for subjectivity in the target language.

In order to perform a comprehensive investigation, we propose three experiments, as described below. The first scenario, based on a corpus manually annotated for subjectivity, is exemplified by the first experiment. The second scenario, based on a corpus automatically annotated with a tool for subjectivity analysis, is subsequently divided into two experiments depending on the direction of the translation and on the dataset that is translated.

In all three experiments, we use English as a source language, given that it has both a corpus manually annotated for subjectivity (MPQA (Wiebe et al., 2005)) and a tool for subjectivity analysis (OpinionFinder (Wiebe and Riloff, 2005)).

3.1 Experiment One: Machine Translation of Manually Annotated Corpora

In this experiment, we use a corpus in the source language manually annotated for subjectivity. The corpus is automatically translated into the target language, followed by a projection of the subjectivity labels from the source to the target language. The experiment is illustrated in Figure 1.

We use the MPQA corpus (Wiebe et al., 2005), which is a collection of 535 English-language news articles from a variety of news sources, manually annotated for subjectivity. Although the corpus was originally annotated at the clause and phrase level, we use the sentence-level annotations associated with the dataset (Wiebe and Riloff, 2005). Of the total of 9,700 sentences in this corpus, 55% of the sentences are labeled as subjective, while the rest are objective.


Figure 1: Experiment one: machine translation of manually annotated training data from source language into target language

After the automatic translation of the corpus and the projection of the annotations, we obtain a large corpus of 9,700 subjectivity-annotated sentences in the target language, which can be used to train a subjectivity classifier.
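Because the translation is performed sentence by sentence, the projection step amounts to copying each source sentence's label onto its translation. The following sketch illustrates this; `translate` is a stand-in for any sentence-level MT engine (the paper uses Language Weaver), and the example sentences are invented.

def project_labels(source_sentences, labels, translate):
    """Translate each sentence and carry its subjectivity label over."""
    return [(translate(sentence), label)
            for sentence, label in zip(source_sentences, labels)]

# Toy usage with an identity "translator" standing in for a real MT engine:
mpqa_sample = [("The agreement is a disaster.", "subjective"),
               ("The meeting took place on Monday.", "objective")]
sentences, labels = zip(*mpqa_sample)
target_corpus = project_labels(sentences, labels, translate=lambda s: s)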

3.2 Experiment Two: Machine Translation of Source Language Training Data

In the second experiment, we assume that the only resources available are a tool for subjectivity annotation in the source language and a collection of raw texts, also in the source language. The source language text is automatically annotated for subjectivity and then translated into the target language. In this way, we produce a subjectivity-annotated corpus that we can use to train a subjectivity annotation tool for the target language. Figure 2 illustrates this experiment.

In order to generate automatic subjectivity annotations, we use the OpinionFinder tool developed by Wiebe and Riloff (2005). OpinionFinder includes two classifiers. The first is a rule-based high-precision classifier that labels sentences based on the presence of subjective clues obtained from a large lexicon. The second is a high-coverage classifier that starts with an initial corpus annotated using the high-precision classifier, followed by several bootstrapping steps that increase the size of the lexicon and the coverage of the classifier. For most of our experiments we use the high-coverage classifier.
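As a rough illustration of the high-precision idea (not OpinionFinder's actual rules; the clue lists and thresholds below are invented for the example), a sentence is labeled subjective only when it contains several strong lexicon clues, objective only when it contains almost none, and is left unlabeled otherwise:

STRONG_CLUES = {"terrible", "wonderful", "outrageous", "hate", "love"}  # illustrative only
WEAK_CLUES = {"might", "should", "apparently", "reportedly"}            # illustrative only

def high_precision_label(sentence):
    tokens = sentence.lower().split()
    strong = sum(t in STRONG_CLUES for t in tokens)
    weak = sum(t in WEAK_CLUES for t in tokens)
    if strong >= 2:
        return "subjective"
    if strong == 0 and weak <= 1:
        return "objective"
    return None  # abstain; bootstrapping later extends coverage to these sentences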

Figure 2: Experiment two: machine translation of raw training data from source language into target language

Table 1 shows the performance of the two OpinionFinder classifiers as measured on the MPQA corpus (Wiebe and Riloff, 2005).

                 P     R     F
high-precision   86.7  32.6  47.4
high-coverage    79.4  70.6  74.7

Table 1: Precision (P), Recall (R) and F-measure (F) for the two OpinionFinder classifiers, as measured on the MPQA corpus
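For reference, the F-measure in Table 1 is the standard harmonic mean of precision and recall; for instance, the high-precision row follows from

F = \frac{2PR}{P + R} = \frac{2 \cdot 86.7 \cdot 32.6}{86.7 + 32.6} \approx 47.4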

As a raw corpus, we use a subset of the SemCor corpus (Miller et al., 1993), consisting of 107 documents with roughly 11,000 sentences. This is a balanced corpus covering a number of topics in sports, politics, fashion, education, and others. The reason for working with this collection is that we also have a manual translation of the SemCor documents from English into one of the target languages used in the experiments (Romanian), which enables comparative evaluations of different scenarios (see Section 4).

Note that in this experiment the annotation of subjectivity is carried out on the original source language text, and is thus expected to be more accurate than if it were applied to automatically translated text. However, the training data in the target language is produced by automatic translation, and is thus likely to contain errors.


3.3 Experiment Three: Machine Translation of Target Language Training Data

The third experiment is similar to the second one, except that we reverse the direction of the translation. We translate raw text that is available in the target language into the source language, and then use a subjectivity annotation tool to label the automatically translated source language text. After the annotation, the labels are projected back into the target language, and the resulting annotated corpus is used to train a subjectivity classifier. Figure 3 illustrates this experiment.

Figure 3: Experiment three: machine translation of raw training data from target language into source language

As before, we use the high-coverage classifier available in OpinionFinder, and the SemCor corpus. We use a manual translation of this corpus available in the target language.
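Schematically, experiment three reverses the data flow of experiment two: raw target-language sentences are machine-translated into English, labeled there, and the labels are projected back onto the original, untranslated sentences. In the sketch below, `translate_to_english` and `annotate_subjectivity` are stand-ins for the MT engine and OpinionFinder, respectively.

def backward_projection(target_sentences, translate_to_english, annotate_subjectivity):
    """Label target-language sentences via their English translations."""
    training_data = []
    for sentence in target_sentences:
        label = annotate_subjectivity(translate_to_english(sentence))
        # the original sentence, not its translation, enters the training set,
        # so the training text itself is free of MT errors
        training_data.append((sentence, label))
    return training_data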

In this experiment, the subjectivity annotations are carried out on automatically generated source text, and are thus expected to be less accurate. However, since the training data was originally written in the target language, it is free of translation errors, and thus training carried out on this data should be more robust.

3.4 Upper bound: Machine Translation of Target Language Test Data

For comparison purposes, we also propose an experiment which plays the role of an upper bound on the methods described so far. This experiment involves the automatic translation of the test data from the target language into the source language. The source language text is then annotated for subjectivity using OpinionFinder, followed by a projection of the resulting labels back into the target language.

Unlike the previous three experiments, in this experiment we only generate subjectivity-annotated resources, and we do not build and evaluate a standalone subjectivity analysis tool for the target language. Further training of a machine learning algorithm, as in experiments two and three, is required in order to build a subjectivity analysis tool. Thus, this fourth experiment is an evaluation of the resources generated in the target language, which represents an upper bound on the performance of any machine learning algorithm that would be trained on these resources. Figure 4 illustrates this experiment.

Figure 4: Upper bound: machine translation of test data from target language into source language

4 Evaluation and Results

Our initial evaluations are carried out on Romanian. The performance of each of the three methods is evaluated using a dataset manually annotated for subjectivity. To evaluate our methods, we generate a Romanian training corpus annotated for subjectivity, train a subjectivity classifier on it, and then use the classifier to label the test data.

We evaluate the results against a gold-standard corpus consisting of 504 Romanian sentences manually annotated for subjectivity. These sentences represent the manual translation into Romanian of a small subset of the SemCor corpus, which was removed from the training corpora used in experiments two and three. This is the same evaluation dataset as used in (Mihalcea et al., 2007). Two Romanian native speakers annotated the sentences individually, and the differences were adjudicated through discussions.


The agreement of the two annotators is 0.83 (κ = 0.67); when the uncertain annotations are removed, the agreement rises to 0.89 (κ = 0.77). The two annotators reached consensus on all sentences on which they disagreed, resulting in a gold standard dataset with 272 (54%) subjective sentences and 232 (46%) objective sentences. More details about this dataset are available in (Mihalcea et al., 2007).
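As a worked check (assuming the standard definition of Cohen's kappa), the reported pair of values implies a chance agreement of roughly 0.48, consistent with the nearly balanced class distribution:

\kappa = \frac{p_o - p_e}{1 - p_e} \quad\Rightarrow\quad p_e = \frac{p_o - \kappa}{1 - \kappa} = \frac{0.83 - 0.67}{1 - 0.67} \approx 0.48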

In order to learn from our annotated data, we experiment with two different classifiers, Naïve Bayes and support vector machines (SVM), selected for their performance and the diversity of their learning methodologies. For Naïve Bayes, we use the multinomial model (McCallum and Nigam, 1998) with a threshold of 0.3. For SVM (Joachims, 1998), we use the LibSVM implementation (Fan et al., 2005) with a linear kernel.

The automatic translation of the MPQA and SemCor corpora was performed using Language Weaver,¹ a commercial statistical machine translation system. The resulting text was post-processed by removing diacritics, stopwords, and numbers. For training, we experimented with a series of weighting schemes, yet we only report the results obtained with binary weighting, as it had the most consistent behavior.
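A minimal sketch of this training setup, using scikit-learn for brevity (the paper itself uses the McCallum and Nigam multinomial model and the LibSVM implementation): binary presence/absence features, a multinomial Naïve Bayes classifier whose 0.3 threshold is applied here as a posterior cutoff for the subjective class (one plausible reading of the paper's setting), and a linear-kernel SVM.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def train_classifiers(sentences, labels):
    vectorizer = CountVectorizer(binary=True)   # binary weighting scheme
    X = vectorizer.fit_transform(sentences)     # assumes diacritics, stopwords,
                                                # and numbers were already stripped
    nb = MultinomialNB().fit(X, labels)
    svm = LinearSVC().fit(X, labels)            # linear kernel
    return vectorizer, nb, svm

def predict_nb(vectorizer, nb, sentences, threshold=0.3):
    """Label a sentence subjective when its posterior exceeds the threshold."""
    probs = nb.predict_proba(vectorizer.transform(sentences))
    subj = list(nb.classes_).index("subjective")
    return ["subjective" if p[subj] >= threshold else "objective" for p in probs]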

The results obtained by running the three experiments on Romanian are shown in Table 2. The baseline on this data set is 54.16%, represented by the percentage of sentences in the corpus that are subjective, and the upper bound (UB) is 71.83%, which is the accuracy obtained under the scenario where the test data is translated into the source language and then annotated using the high-coverage OpinionFinder tool.

Perhaps not surprisingly, the SVM classifier outperforms Naïve Bayes by 2% to 6%, implying that SVM may be better suited to coping with the noise embedded in the dataset and providing more accurate classifications.

The first experiment, involving the automatic translation of the MPQA corpus enhanced with manual annotations for subjectivity at the sentence level, does not seem to perform well when compared to the experiments in which automatic subjectivity classification is used.

¹ http://www.languageweaver.com/

                    Romanian
Exp  Classifier     P      R      F
E1   Naïve Bayes    60.91  60.91  60.91
     SVM            66.07  66.07  66.07
E2   Naïve Bayes    63.69  63.69  63.69
     SVM            69.44  69.44  69.44
E3   Naïve Bayes    65.87  65.87  65.87
     SVM            67.86  67.86  67.86
UB   OpinionFinder  71.83  71.83  71.83

Table 2: Precision (P), Recall (R) and F-measure (F) for the Romanian experiments

This could imply that a classifier cannot easily be trained on the cues that humans use to express subjectivity, especially when they are not overtly expressed in the sentence and thus can be lost in translation. Instead, the automatic annotations produced with a rule-based tool (OpinionFinder), which relies on overt mentions of words in a subjectivity lexicon, seem to be more robust to translation, resulting in better classification results. To exemplify, consider the following subjective sentence from the MPQA corpus, which does not include overt clues of subjectivity, but was annotated as subjective by the human judges because of the structure of the sentence: "It is the Palestinians that are calling for the implementation of the agreements, understandings, and recommendations pertaining to the Palestinian-Israeli conflict."

We compare our results with those obtained by a previously proposed method based on the manual translation of the SemCor subjectivity-annotated corpus. In (Mihalcea et al., 2007), we used the manual translation of the SemCor corpus into Romanian to form an English-Romanian parallel data set. The English side was annotated using the OpinionFinder tool, and the subjectivity labels were projected onto the Romanian text. A Naïve Bayes classifier was then trained on the subjectivity-annotated Romanian corpus and tested on the same gold standard as used in our experiments. Table 3 shows the results obtained in those experiments by using the high-coverage OpinionFinder classifier.

Among our experiments, experiments two and three are closest to those proposed in (Mihalcea et al., 2007).


OpinionFinder classifier  P      R      F
high-coverage             67.85  67.85  67.85

Table 3: Precision (P), Recall (R) and F-measure (F) for subjectivity analysis in Romanian obtained by using an English-Romanian parallel corpus

By using machine translation, from English into Romanian (experiment two) or from Romanian into English (experiment three), and annotating this dataset with the high-coverage OpinionFinder classifier, we obtain an F-measure of 63.69% and 65.87%, respectively, using Naïve Bayes (the same machine learning classifier as used in (Mihalcea et al., 2007)). This implies that at most 4% in F-measure can be gained by using a parallel corpus as compared to an automatically translated corpus, further suggesting that machine translation is a viable alternative for building subjectivity classifiers in a target language by leveraging the tools that exist in a source language.

As English is a language with fewer inflections compared to Romanian, which marks gender and case with suffixes on the base form of a word, the automatic translation into English is closer to a human translation (experiment three). Therefore, labeling this data using the OpinionFinder tool and projecting the labels onto fully inflected, human-generated Romanian text provides more accurate classification results than a setup where the training is carried out on machine-translated Romanian text (experiment two).

Figure 5: Experiment two: machine learning F-measure over an incrementally larger training set (Naïve Bayes and SVM; F-measure vs. percentage of corpus)

Figure 6: Experiment three: machine learning F-measure over an incrementally larger training set (Naïve Bayes and SVM; F-measure vs. percentage of corpus)

We also wanted to explore the impact that the corpus size may have on the accuracy of the classifiers. We re-ran experiments two and three with 20% corpus size increments at a time (Figures 5 and 6). It is interesting to note that a corpus of approximately 6,000 sentences is able to achieve a high enough F-measure (around 66% for both experiments) to be considered viable for training a subjectivity classifier. Also, at a corpus size over 10,000 sentences, the Naïve Bayes classifier performs worse than SVM, whose F-measure grows steadily with the number of sentences in the data set. This trend could be explained by the fact that the SVM classifier is more robust with regard to noisy data than Naïve Bayes.
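The learning-curve experiment can be reproduced with a loop over growing slices of the training corpus, reusing the hypothetical train_classifiers sketch above; f1_score from scikit-learn is one way to compute the reported measure.

from sklearn.metrics import f1_score

def learning_curve(train_sents, train_labels, test_sents, test_labels):
    """Retrain on 20% increments of the corpus and score on the fixed test set."""
    scores = []
    for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
        k = int(len(train_sents) * frac)
        vec, nb, svm = train_classifiers(train_sents[:k], train_labels[:k])
        pred = svm.predict(vec.transform(test_sents))
        scores.append((frac, f1_score(test_labels, pred, pos_label="subjective")))
    return scores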

5 Portability to Other Languages

To test the validity of the results on other languages, we ran a portability experiment on Spanish.

To build a test dataset, a native speaker of Spanish translated the gold standard of 504 sentences into Spanish. We maintain the same subjectivity annotations as for the Romanian dataset. To create the training data required by the first two experiments, we translate both the MPQA corpus and the SemCor corpus into Spanish using the Google Translation service,² a publicly available machine translation engine also based on statistical machine translation. We were therefore able to implement all the experiments but the third, which would have required a manually translated version of the SemCor corpus.

² http://www.google.com/translate


Although we could have used a different Spanish text to carry out a similar experiment, the results would not have been directly comparable, since the dataset would have been different.

The results of the two experiments exploring portability to Spanish are shown in Table 4. Interestingly, all the figures are higher than those obtained for Romanian. We assume this occurs because Spanish is one of the six official United Nations languages, and the Google translation engine is presumably trained on the United Nations parallel corpus, implying that a better quality translation is achieved compared to the one available for Romanian. We can therefore conclude that the more accurate the translation engine, the more accurately the subjective content is translated, and therefore the better the results. As was the case for Romanian, the SVM classifier produces the best results, with absolute improvements over the Naïve Bayes classifier ranging from 0.2% to 3.5%.

Since the Spanish automatic translation seems to be closer to a human-quality translation, we are not surprised that this time the first experiment is able to generate a more accurate training corpus than the second experiment. The MPQA corpus, being manually annotated and of better quality, has a higher chance of generating a more reliable data set in the target language. As in the experiments on Romanian, when performing automatic translation of the test data we obtain the best results, with an F-measure of 73.41%, which is also the upper bound on our proposed experiments.

                    Spanish
Exp  Classifier     P      R      F
E1   Naïve Bayes    65.28  65.28  65.28
     SVM            68.85  68.85  68.85
E2   Naïve Bayes    62.50  62.50  62.50
     SVM            62.70  62.70  62.70
UB   OpinionFinder  73.41  73.41  73.41

Table 4: Precision (P), Recall (R) and F-measure (F) for the Spanish experiments

6 Discussion

Based on our experiments, we can conclude that machine translation offers a viable approach to generating resources for subjectivity annotation in a given target language. The results suggest that either a manually annotated dataset or an automatically annotated one can provide sufficient leverage towards building a tool for subjectivity analysis.

Since the use of parallel corpora (Mihalcea et al., 2007) requires a large amount of manual labor, one of the reasons behind our experiments was to assess the ability of machine translation to transfer subjective content into a target language with minimal effort. As demonstrated by our experiments, machine translation offers a viable alternative for the construction of resources and tools for subjectivity classification in a new target language, with only a small decrease in performance compared to the case when a parallel corpus is available and used.

To gain further insights, two additional experiments were performed. First, we tried to isolate the role played by the quality of the source-language subjectivity annotations in the cross-lingual projections of subjectivity. To this end, we used the high-precision OpinionFinder classifier to annotate the English datasets. As shown in Table 1, this classifier has higher precision but lower recall compared to the high-coverage classifier used in our previous experiments. We re-ran the second experiment, this time training on the 3,700 sentences that were classified by the OpinionFinder high-precision classifier as either subjective or objective. For Romanian, we obtained an F-measure of 69.05%, while for Spanish we obtained an F-measure of 66.47%.

Second, we tried to isolate the role played by language-specific clues of subjectivity. To this end, we set up an experiment which, by comparison, can suggest the degree to which the languages are able to accommodate specific markers for subjectivity. First, we trained an English classifier using the SemCor training data automatically annotated for subjectivity with the OpinionFinder high-coverage tool. The classifier was then applied to the English version of the manually labeled test data set (the gold standard described in Section 4). Next, we ran a similar experiment on Romanian.


Here, the classifier was trained on the Romanian version of the same SemCor training data set, annotated with subjectivity labels projected from English, and was tested on the same gold standard data set. Thus, the two classifiers used the same training data, the same test data, and the same subjectivity annotations, the only difference being the language used (English or Romanian).

The results for these experiments are compiled in Table 5. Interestingly, the experiment conducted on Romanian shows an improvement of 3.5% to 9.5% over the results obtained on English, which indicates that subjective content may be easier to learn in Romanian than in English. The fact that Romanian verbs are inflected for mood (such as indicative, conditional, subjunctive, presumptive) enables an automatic classifier to identify additional subjective markers in text. Some moods, such as the conditional and presumptive, entail human judgment, and therefore allow for clear subjectivity annotation. Moreover, Romanian is a highly inflected language, with word forms varying by number, gender, and case, and offering an explicit lexicalization of formality and politeness. All these features may have a cumulative effect in allowing for better classification. At the same time, English entails minimal inflection compared to other Indo-European languages, as it lacks both gender and adjective agreement (with very few notable exceptions, such as beautiful girl and handsome boy). Verb moods are composed with the aid of modals, while tenses and expressions are built with the aid of auxiliary verbs. For this reason, a machine learning algorithm may not be able to identify the same amount of information on subjective content in an English text as in a Romanian one. It is also interesting to note that the labeling of the training set was performed using a subjectivity classifier developed for English, which takes into account a large, human-annotated subjectivity lexicon also developed for English. One would have presumed that any classifier trained on this annotated text would therefore provide the best results in English. Yet, as explained earlier, this was not the case.

7 Conclusion

In this paper, we explored the use of machine translation for creating resources and tools for subjectivity analysis in other languages, leveraging the resources available in English.

Exp  Classifier   P      R      F
En   Naïve Bayes  60.32  60.32  60.32
     SVM          60.32  60.32  60.32
Ro   Naïve Bayes  67.85  67.85  67.85
     SVM          69.84  69.84  69.84

Table 5: Precision (P), Recall (R) and F-measure (F) for identifying language-specific information

We introduced and evaluated three different approaches to generating subjectivity-annotated corpora in a given target language, and exemplified the techniques on Romanian and Spanish.

The experiments show promising results, as they are comparable to those obtained using manually translated corpora. While the quality of the translation is a factor, machine translation offers an efficient and effective alternative for capturing the subjective semantics of a text, coming within 4% F-measure of the results obtained using human-translated corpora.

In the future, we plan to explore additional language-specific clues and integrate them into the subjectivity classifiers. As shown by some of our experiments, Romanian seems to entail more subjectivity markers than English, and this motivates us to further pursue the use of language-specific clues of subjectivity.

Our experiments have generated corpora of about 20,000 sentences annotated for subjectivity in Romanian and Spanish, which are available for download at http://lit.csci.unt.edu/index.php/Downloads, along with the manually annotated data sets.

Acknowledgments

The authors are grateful to Daniel Marcu and LanguageWeaver for kindly providing access to their Romanian-English and English-Romanian machine translation engines. This work was partially supported by National Science Foundation grant IIS-0840608.


References

C. Ovesdotter Alm, D. Roth, and R. Sproat. 2005. Emotions from text: Machine learning for text-based emotion prediction. In Proceedings of the Human Language Technologies Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-2005), pages 347–354, Vancouver, Canada.

K. Balog, G. Mishne, and M. de Rijke. 2006. Why are they excited? Identifying and explaining spikes in blog mood levels. In Proceedings of the 11th Meeting of the European Chapter of the Association for Computational Linguistics (EACL-2006).

A. Esuli and F. Sebastiani. 2006. Determining term subjectivity and term orientation for opinion mining. In Proceedings of the 11th Meeting of the European Chapter of the Association for Computational Linguistics (EACL-2006), pages 193–200, Trento, Italy.

R. Fan, P. Chen, and C. Lin. 2005. Working set selection using the second order information for training SVM. Journal of Machine Learning Research, 6:1889–1918.

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2004), pages 168–177, Seattle, Washington.

Y. Hu, J. Duan, X. Chen, B. Pei, and R. Lu. 2005. A new method for sentiment classification in text retrieval. In IJCNLP, pages 1–9.

T. Joachims. 1998. Text categorization with Support Vector Machines: learning with many relevant features. In Proceedings of the European Conference on Machine Learning, pages 137–142.

H. Kanayama and T. Nasukawa. 2006. Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2006), pages 355–363, Sydney, Australia.

N. Kando and D. Kirk Evans, editors. 2007. Proceedings of the Sixth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, Tokyo, Japan, May. National Institute of Informatics.

S.-M. Kim and E. Hovy. 2006. Identifying and analyzing judgment opinions. In Proceedings of the Human Language Technology Conference of the NAACL, pages 200–207, New York, New York.

N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, and T. Fukushima. 2004. Collecting evaluative expressions for opinion extraction. In Proceedings of the 1st International Joint Conference on Natural Language Processing (IJCNLP-04).

L. Lloyd, D. Kechagias, and S. Skiena. 2005. Lydia: A system for large-scale news analysis. In String Processing and Information Retrieval (SPIRE 2005).

A. McCallum and K. Nigam. 1998. A comparison of event models for Naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.

R. Mihalcea, C. Banea, and J. Wiebe. 2007. Learning multilingual subjective language via cross-lingual projections. In Proceedings of the Association for Computational Linguistics, Prague, Czech Republic.

G. Miller, C. Leacock, T. Randee, and R. Bunker. 1993. A semantic concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology, Plainsboro, New Jersey.

Y. Suzuki, H. Takamura, and M. Okumura. 2006. Application of semi-supervised learning to evaluative expression classification. In Proceedings of the 7th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2006), pages 502–513, Mexico City, Mexico.

H. Takamura, T. Inui, and M. Okumura. 2006. Latent variable models for semantic orientations of phrases. In Proceedings of the 11th Meeting of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy.

J. Wiebe and R. Mihalcea. 2006. Word sense and subjectivity. In Proceedings of COLING-ACL 2006.

J. Wiebe and E. Riloff. 2005. Creating subjective and objective sentence classifiers from unannotated texts. In Proceedings of the 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2005) (invited paper), Mexico City, Mexico.

J. Wiebe, T. Wilson, and C. Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.

H. Yu and V. Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003), pages 129–136, Sapporo, Japan.