Sentiment Analysis of Peer Review Texts for Scholarly Papers
Ke Wang & Xiaojun Wan
{wangke17,wanxiaojun}@pku.edu.cn
July 9, 2018
Institute of Computer Science and Technology, Peking University, Beijing, China
Outline
1. Introduction
2. Related Work
3. Framework
4. Experiments
5. Conclusion and Future Work
Introduction
• The boom of scholarly papers
• Motivations
  • Help the review submission system detect the consistency of review texts and scores.
  • Help the chair write a comprehensive meta-review.
  • Help authors further improve their papers.
Figure 1: An example of peer review text and the analysis results.
Introduction
• Challenges
  • Long length.
  • Mixture of non-opinionated and opinionated texts.
  • Mixture of pros and cons.
• Contributions
  • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
  • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
  • Evaluation results demonstrate the efficacy of our proposed model and show the great helpfulness of using the abstract as memory.
Related Work
• Sentiment Classification
  Sentiment analysis has been widely explored in many text domains, but few studies have attempted it in the domain of peer reviews for scholarly papers.
• Multiple Instance Learning
  MIL can extract instance labels (sentence-level polarities) from bags (reviews in our case), but none of the previous work has been applied to this challenging task.
• Memory Network
  Memory networks utilize external information for greater capacity and efficiency.
• Study on Peer Reviews
  These tasks are related to, but different from, the sentiment analysis task addressed in this study.
Framework
• Architecture
  1. Input Representation Layer
  2. Sentence Classification Layer
  3. Review Classification Layer
  (an Abstract-based Memory Mechanism connects the paper abstract to the review sentences)
Figure 2: The architecture of MILAM
Framework
1. Input Representation Layer:
  I. A sentence S of length L (padded where necessary) is represented as:
     S = w_1 ⊕ w_2 ⊕ ⋯ ⊕ w_L,  S ∈ ℝ^{L×d}    (1)
  II. The convolutional layer:
     f_k = tanh(W_c · W_{k−l+1:k} + b_c)    (2)
     f^{(q)} = [f^{(q)}_1, f^{(q)}_2, ⋯, f^{(q)}_{L−l+1}]    (3)
  III. A max-pooling layer:
     u_q = max{f^{(q)}}    (4)
  Finally, the representations of the review text {S^r_i}_{i=1}^{n} and the abstract text {S^a_j}_{j=1}^{m} are denoted [I_i]_{i=1}^{n} and [M_j]_{j=1}^{m} respectively, where I_i, M_j ∈ ℝ^z.
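As a minimal sketch of Eqs. (1)-(4), the following plain-Python helper slides each convolution filter over windows of word embeddings, applies tanh, and max-pools the resulting feature map. The toy dimensions and function name are illustrative assumptions, not the paper's implementation:

```python
import math

def cnn_sentence_encoder(embeddings, filters, biases):
    """Sketch of Eqs. (1)-(4): convolve filters over windows of word
    embeddings with a tanh activation, then max-pool each feature map.

    embeddings: list of L word vectors (each a list of d floats)
    filters:    list of z filters, each a flat list of l*d weights
    biases:     list of z scalar biases
    Returns the z-dimensional sentence representation u.
    """
    L = len(embeddings)
    d = len(embeddings[0])
    u = []
    for w, b in zip(filters, biases):
        l = len(w) // d                          # filter spans l consecutive words
        feats = []
        for k in range(L - l + 1):               # one window per position (Eq. 2)
            window = [x for vec in embeddings[k:k + l] for x in vec]
            feats.append(math.tanh(sum(wi * xi for wi, xi in zip(w, window)) + b))
        u.append(max(feats))                     # max pooling over f^(q) (Eq. 4)
    return u
```

Each filter thus contributes one coordinate of the fixed-size sentence vector, regardless of the sentence length L.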
Framework
2. Sentence Classification Layer:
  I. Obtain a matched attention vector E^{(i)} = [e^{(i)}_t]_{t=1}^{m}, which indicates the weight of the memories.
  II. Calculate the response content R^{(i)} ∈ ℝ^z using this matched attention vector.
  III. Use an MLP to obtain the final representation vector of each sentence in the review text:
     V_i = f_mlp(I_i ‖ R^{(i)}; θ_mlp)    (5)
  IV. Use the softmax classifier to get the sentence-level distribution over sentiment labels:
     P_i = softmax(W_p · V_i + b_p)    (6)
  Finally, we obtain new high-level representations of sentences in the review text by leveraging relevant abstract information.
Framework
3. Review Classification Layer:
  I. Use separate LSTM modules to produce forward and backward hidden vectors:
     →h_i = →LSTM(V_i),  ←h_i = ←LSTM(V_i),  h_i = →h_i ‖ ←h_i    (7)
  II. The importance a_i of each sentence is measured as follows:
     h′_i = tanh(W_a · h_i + b_a),  a_i = exp(h′_i) / Σ_j exp(h′_j)    (8)
  III. Finally, we obtain a document-level distribution over sentiment labels as the weighted sum of sentence-level distributions:
     P^{(c)}_review = Σ_i a_i P^{(c)}_i,  c ∈ [1, C]    (9)
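The attention aggregation of Eqs. (8)-(9) can be sketched as follows: each sentence's hidden vector is scored with a tanh projection, the scores are softmaxed into weights a_i, and the review-level distribution is the weighted sum of the sentence-level distributions. The argument names are illustrative assumptions:

```python
import math

def review_distribution(hidden, sent_dists, Wa, ba):
    """Sketch of Eqs. (8)-(9): attention-weighted sum of
    sentence-level sentiment distributions.

    hidden:     list of n sentence hidden vectors h_i
    sent_dists: list of n sentiment distributions P_i (each sums to 1)
    Wa, ba:     projection weight vector and scalar bias
    """
    scores = [math.tanh(sum(w * x for w, x in zip(Wa, h)) + ba) for h in hidden]
    z = sum(math.exp(s) for s in scores)
    attn = [math.exp(s) / z for s in scores]           # a_i (Eq. 8)
    C = len(sent_dists[0])
    # Weighted sum of distributions stays a valid distribution (Eq. 9).
    return [sum(a * p[c] for a, p in zip(attn, sent_dists)) for c in range(C)]
```

Because the a_i sum to 1 and each P_i sums to 1, the output is itself a valid distribution over the C sentiment classes.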
Framework
• Abstract-based Memory Mechanism
  1. Get the matched attention vector E^{(i)} of memories:
     e′_t = LSTM(h_{t−1}, M_t),  (h_0 = I_i, t = 1, …, m)    (10)
     e^{(i)}_t = exp(e′_t) / Σ_j exp(e′_j)    (11)
     E^{(i)} = [e^{(i)}_t]_{t=1}^{m}    (12)
  2. Calculate the response content R^{(i)}:
     R^{(i)} = Σ_{t=1}^{m} e^{(i)}_t M_t    (13)
  3. Use R^{(i)} and I_i to compute the new sentence representation vector V_i:
     V_i = f_mlp(I_i ‖ R^{(i)}; θ_mlp)    (14)
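Steps (11)-(13) can be sketched as a softmax over the raw match scores followed by a weighted sum of the abstract memories. The raw scores e′_t would come from the LSTM matcher of Eq. (10), which is omitted here; the helper below is an illustrative assumption, not the paper's code:

```python
import math

def memory_response(match_scores, memories):
    """Sketch of Eqs. (11)-(13): softmax the raw match scores e'_t
    into the matched attention vector E^(i), then return the response
    content R^(i) as the attention-weighted sum of memories M_t.

    match_scores: list of m raw scores e'_t
    memories:     list of m memory vectors M_t, each of dimension z
    """
    z_norm = sum(math.exp(e) for e in match_scores)
    attn = [math.exp(e) / z_norm for e in match_scores]   # e^(i)_t (Eq. 11)
    dim = len(memories[0])
    # Weighted sum of memory vectors gives the response content (Eq. 13).
    response = [sum(a * m[k] for a, m in zip(attn, memories)) for k in range(dim)]
    return attn, response
```

The response R^{(i)} is then concatenated with the sentence representation I_i and passed through the MLP of Eq. (14).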
Framework
• Objective Function
  • Our model only needs the review's sentiment label; each sentence's sentiment label is unobserved.
  • The categorical cross-entropy loss:
     L(θ) = Σ_{T_review} Σ_{c=1}^{C} −P̂^{(c)}_review log(P^{(c)}_review)    (15)
    where P̂^{(c)}_review denotes the gold (observed) review-level label distribution.
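A minimal sketch of the per-review cross-entropy term in Eq. (15), assuming the gold label is a one-hot distribution (the epsilon guard against log(0) is an implementation assumption):

```python
import math

def review_loss(gold, pred, eps=1e-12):
    """Sketch of one term of Eq. (15): categorical cross-entropy
    between the gold review-level distribution and the predicted
    P_review. Only the review label is needed; sentence labels
    stay unobserved (the MIL setting)."""
    return -sum(g * math.log(p + eps) for g, p in zip(gold, pred))
```

The total loss sums this term over all training reviews T_review.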
Experiments
• Evaluation Datasets
  • Statistics for the ICLR-2017 and ICLR-2018 datasets:

    Data Set  | #Papers | #Reviews | #Sentences | #Words
    ICLR-2017 | 490     | 1517     | 24497      | 9868
    ICLR-2018 | 954     | 2875     | 58329      | 13503

  • The score distributions:
Experiments
• Comparison of review sentiment classification accuracy on the 2-class task {accept (score ∈ [6, 10]), reject (score ∈ [1, 5])}
Experiments
• Comparison of review sentiment classification accuracy on the 3-class task {accept (score ∈ [7, 10]), borderline (score ∈ [5, 6]), reject (score ∈ [1, 4])}
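As a sketch, assuming ICLR's 1-10 rating scale where higher scores indicate stronger acceptance, the 3-class bucketing can be written as a small helper (the function name is an illustrative assumption):

```python
def three_class_label(score):
    """Map an ICLR review score (1-10, higher = more positive) to the
    3-class scheme: accept [7, 10], borderline [5, 6], reject [1, 4]."""
    if score >= 7:
        return "accept"
    if score >= 5:
        return "borderline"
    return "reject"
```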
Experiments
• Sentence-Level Classification Results.
  We randomly selected 20 reviews, a total of 213 sentences, and manually labeled the sentiment polarity of each sentence.
Figure 3: Example opinionated sentences with predicted polarity scores extracted from a review text.
Experiments
• Influence of Abstract Text.
Figure 4: Example sentences in a review text and its most relevant sentence in the paper abstract text. The sentence with the largest weight in the matched attention vector E^{(i)} is considered most relevant. The red texts indicate similarities between the review text and the abstract text.
Experiments
• Influence of Abstract Text.
  • A simple method of using abstract texts as a contrast experiment:
    Remove the sentences that are similar to the paper abstract's sentences from the review text and use the remaining text for classification. (The similarity threshold is set to 0.7.)
Figure 5: The comparison of using and not using the paper abstract via a simple method.
Experiments
• Influence of Borderline Reviews.
Figure 6: Experimental results on different datasets with, without, and with only borderline reviews.
Experiments
• Cross-Year Experiments.
Figure 7: Results of cross-year experiments. Model@ICLR-∗ means the model is trained on the ICLR-∗ dataset.
Experiments
• Cross-Domain Experiments.
  We further collected 87 peer reviews for submissions to NLP conferences (CoNLL, ACL, EMNLP, etc.), including 57 positive reviews (accept) and 30 negative reviews (reject).
Figure 8: Results of cross-domain experiments. ∗ means the performance improvement over the first three methods is statistically significant with p-value < 0.05 for the sign test. Model@ICLR-∗ means the model is trained on the ICLR-∗ dataset.
Experiments
• Final Decision Prediction for Scholarly Papers.
  • Methods to predict the final decision of a paper based on several review scores.
  • Voting:
     Decision = Accept if #accept > #reject; Reject otherwise    (16)
  • Simple Average:
    Simply average the scores of all reviews. If the average score is larger than or equal to 0.6, the paper is predicted as final accept, and otherwise final reject.
  • Confidence-based Average:
     overall_score = (1/|S|) Σ_{i=1}^{|S|} S_i · 1/(6 − ReviewerConfidence_i)    (17)
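The voting rule of Eq. (16) and the confidence-based average of Eq. (17) can be sketched as below. The accept threshold of 6 and the string labels are assumptions matching the 2-class split; the actual decision pipeline may differ:

```python
def vote_decision(scores, accept_threshold=6):
    """Sketch of Eq. (16): each review votes accept iff its score
    reaches the accept range (threshold 6 is an assumption); the
    paper is accepted on a strict majority of accept votes."""
    accepts = sum(1 for s in scores if s >= accept_threshold)
    return "Accept" if accepts > len(scores) - accepts else "Reject"

def confidence_weighted_score(scores, confidences):
    """Sketch of Eq. (17): average the review scores, weighting each
    by 1 / (6 - confidence_i) so that confident reviewers
    (confidence close to 5) contribute more."""
    return sum(s / (6 - c) for s, c in zip(scores, confidences)) / len(scores)
```

For example, a paper with scores [8, 4] and reviewer confidences [5, 1] gets an overall score of (8/1 + 4/5)/2 = 4.4, so the confident positive review dominates.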
Experiments
• Final Decision Prediction for Scholarly Papers.
  • Results of final decision prediction for scholarly papers.
Figure 9: Results of final decision prediction for scholarly papers.
Conclusion and Future Work
• Contributions
  • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
  • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
  • Evaluation results demonstrate the efficacy of our proposed model and show the great helpfulness of using the abstract as memory.
• Future Work
  • Collect more peer reviews.
  • Try more sophisticated deep learning techniques.
  • Several other sentiment analysis tasks: prediction of the fine-granularity scores of reviews, automatic writing of meta-reviews, prediction of the best papers, ...
Acknowledgments
• National Natural Science Foundation of China.
• Anonymous reviewers for their helpful comments.
• SIGIR Student Travel Grant.