Multilingual Opinion Holder Identification Using Author and Authority Viewpoints
Yohei Seki, Noriko Kando, Masaki Aono
Toyohashi University of Technology and National Institute of Informatics, Japan
Journal of Information Processing and Management, 2009
Reporter: Chia-Ying Lee
Advisor: Prof. Hsin-Hsi Chen
2009/04/27 Cicilia Chia-ying Lee 2
Outline
1. Problem Definition
2. Corpus: NTCIR-6 pilot
3. Approach in NTCIR-6
4. Revised Approach after NTCIR-6
5. Comparison and Discussion
6. Conclusion
Problem Definition (1/2)
Identify the opinion holder in an opinion sentence
This is important because news articles contain many opinions from different opinion holders
Opinion holder:
1. The explicit noun phrases in the sentences
2. The implicit noun phrases (e.g., anaphors)
3. The exophoric elements (e.g., the author)
Problem Definition (2/2)
Author: the writer of the document
Authority: third parties
Focus on their different writing styles: differences in syntactic constructs or term usage.
Corpus
NTCIR-6 Opinion Analysis Pilot Task
Evaluation method
Approach in NTCIR-6
Evaluation results in NTCIR-6
Author and Authority Opinion Extraction (1/4)
Three opinion types (Wiebe et al., 2005):
1. Explicit mentions of private states by a person, nation, or organization
2. Speech events expressing private states by an agent
3. Expressive subjective elements (author view)
Author and Authority Opinion Extraction (2/4)
Japanese
Training set: NTCIR-6, 4 training topics
Features:
Syntactic pairs of grammatical subjects and predicates such as pronouns
Subjects: named entities, semantic primitives, and key terms
Predicates: semantic primitives from a thesaurus
Parser: Cabocha
Author and Authority Opinion Extraction (3/4)
English
Training set: MPQA Corpus
Author view: the "nested source" attribute was "w" (writer) and not nested
Features: syntactic pairs from syntactic patterns, such as nouns and adjectives/verbs
Parser: Minipar
Author and Authority Opinion Extraction (4/4)
Rule-based Holder Identification
(1) Bracketed elements of PER,ORG,LOC in the sentence.
(2) Grammatical subject elements of PER, ORG, LOC in the sentence.
(3) Grammatical subject elements of PER, ORG, LOC in the previous sentences.
(4) PER, ORG, LOC in the sentences other than those classified by (1) or (2).
Named entity extractor: NExT
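The four rules above can be read as a priority cascade. The following is a minimal sketch under that assumption: the slides do not state that the rules are tried in order with the first match winning, and the `Entity` class, its flags, and `identify_holder` are hypothetical names introduced only for illustration. Named entities are assumed to be already tagged (the slides use NExT for this).

```python
from dataclasses import dataclass

HOLDER_TYPES = {"PER", "ORG", "LOC"}


@dataclass
class Entity:
    """A tagged named entity (hypothetical structure, not from the slides)."""
    text: str
    ne_type: str               # "PER", "ORG", "LOC", or another tag
    is_bracketed: bool = False  # entity appears as a bracketed element
    is_subject: bool = False    # entity is a grammatical subject


def identify_holder(current, previous=()):
    """Return the text of the first holder found by rules (1)-(4), else None.

    current:  entities in the opinionated sentence
    previous: entities in the preceding sentences
    Assumption: rules are tried in order and the first match wins.
    """
    cur = [e for e in current if e.ne_type in HOLDER_TYPES]
    prev = [e for e in previous if e.ne_type in HOLDER_TYPES]
    rules = [
        [e for e in cur if e.is_bracketed],                        # rule (1)
        [e for e in cur if e.is_subject],                          # rule (2)
        [e for e in prev if e.is_subject],                         # rule (3)
        [e for e in cur if not (e.is_bracketed or e.is_subject)],  # rule (4)
    ]
    for matches in rules:
        if matches:
            return matches[0].text
    return None
```

When no rule fires, the function returns `None`; a caller could treat that case as a cue to fall back to the author as holder, which is again an assumption rather than something the slide states.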
Evaluation results in NTCIR-6 (1/3)
Evaluation results in NTCIR-6 (2/3)
Opinion holder extraction
(1) Extraction using term sequences (Cornell, GATE)
(2) Lexicon-based heuristics (IIT)
(3) Named entity extraction approach (TUT and others)
Identify the author
(1) Utilize author-related clues such as verbs (ICU-IR)
(2) Detect author opinion holders when there are no holder candidates surrounding the opinionated sentences (EHBN, Cornell)
Evaluation results in NTCIR-6 (3/3)
English: author-opinionated sentences appeared more often
Outline
1. Problem Definition
2. Corpus: NTCIR-6 pilot
3. Approach in NTCIR-6
4. Revised Approach after NTCIR-6
   1. More features
   2. Direct-subjective Classifier
5. Comparison and Discussion
6. Conclusion
More Features (1/3)
Extended with the ICU-IR approach:
Phrase governed by “say”, “by”
NP followed by “according to”, “by”
Subjects governed by opinion verbs
Grammatical syntactic patterns:
Grammatical subject & verbs
Auxiliary verb & verb
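To make the lexical clues concrete, here is a rough surface-level sketch. It is not the method on the slides, which derive these relations from a parser (Minipar); the regular expressions, the capitalized-word heuristic for noun phrases, and the name `holder_candidates` are all assumptions made for illustration. It also assumes the holder NP is the phrase directly after "according to" and directly before a form of "say".

```python
import re

# A run of capitalized words stands in for a noun phrase (crude assumption).
NP = r"(?:[A-Z][\w.]*)(?: [A-Z][\w.]*)*"

# Hypothetical surface patterns approximating two of the clues.
ACCORDING_TO = re.compile(r"\b[Aa]ccording to (" + NP + r")")
SAID = re.compile(r"\b(" + NP + r") (?:said|says|say)\b")


def holder_candidates(sentence):
    """Collect candidate holder phrases matched by the surface patterns."""
    found = []
    for pattern in (ACCORDING_TO, SAID):
        for match in pattern.finditer(sentence):
            found.append(match.group(1))
    return found
```

A parser-based version would replace the regular expressions with dependency relations (for example, the subject governed by an opinion verb), which avoids the obvious failure cases of matching on capitalization alone.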
More Features (2/3)
More Features (3/3)
Features selected based on χ-square tests on the MPQA corpus
Three count features from the subjectivity lexicon (Wilson et al.): cntopnoun, cntopadj, and cntopadv
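As a reminder of how χ-square feature selection works in general, the sketch below scores each candidate feature with the standard 2x2 contingency statistic and keeps those above a cutoff. The slides do not give the authors' exact procedure or threshold: the function names, the input format, and the cutoff of 3.84 (the 5% critical value for one degree of freedom) are assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: opinionated sentences containing the feature
    b: non-opinionated sentences containing the feature
    c: opinionated sentences without the feature
    d: non-opinionated sentences without the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (a * d - b * c) ** 2 / denom


def select_features(counts, threshold=3.84):
    """Keep features whose statistic reaches the threshold.

    counts maps a feature name to its (a, b, c, d) tuple.
    The default threshold is an assumed cutoff, not the authors' value.
    """
    return sorted(
        name for name, (a, b, c, d) in counts.items()
        if chi_square(a, b, c, d) >= threshold
    )
```

A feature that occurs equally often in both classes scores 0 and is dropped, while one concentrated in opinionated sentences scores high and is kept.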
Direct-subjective Classifier (1/2)
Goal: Filtering the author-opinionated sentences
Method: Combine opinion types 1 and 2
Training set: MPQA
Classifier: SVM-light
Direct-subjective Classifier (2/2)
[Results figure; annotations: ↗0.1, ↗0.08]
Comparison and Discussion
Baseline: the algorithm from authority opinion
Features selected based on χ-square tests on the MPQA corpus for the opinionated sentence extraction
The 7 topics that contained more than 30% author-opinionated sentences attained higher F-values
Conclusion
Proposed an opinion holder identification system for both Japanese and English
Features selected based on χ-square tests and the direct-subjective classifier improve the results in English
Future work:
Public opinion
Multilingual blogs