
AN EFFECTIVE STATISTICAL APPROACH TO BLOG POST OPINION RETRIEVAL

Ben He

Craig Macdonald

Iadh Ounis

University of Glasgow

Jiyin He

University of Amsterdam

CIKM 2008

1

Introduction

Finding opinionated blog posts is still an open problem.

A popular solution is to utilize external resources and manual effort to identify subjective features.

The authors propose a dictionary-based statistical approach that automatically derives evidence for subjectivity from the blog collection itself, without requiring any manual effort.

2

TREC Opinion Finding Task (1/2)

Text REtrieval Conference. Goal: to identify sentiment at the document level. The dataset is composed of:

Feed documents: XML format, usually a short summary of the blog post.

Permalink documents: HTML format, the complete blog post and its comments.

Homepage documents: HTML format, the main entry point to the blog.

3

TREC Opinion Finding Task (2/2)

Sample query format:

<top>
<num> 863
<title> netflix
<desc> Identify documents that show customer opinions of Netflix.
<narr> A relevant document will indicate subscriber satisfaction with Netflix. Opinions about the Netflix DVD allocation system, promptness or delay in mailings are relevant. Indications of having been or intent to become a Netflix subscriber that do not state an opinion are not relevant.
</top>

4

Statistical Dictionary-based Approach

5

Dictionary Generation

The Skewed Query Model: rank all terms in the collection by term frequency in descending order. The terms whose ranks fall in the range (S·#terms, U·#terms) are selected into the dictionary.

#terms: the number of unique terms in the collection.

S, U: model parameters. S = 0.00007 and U = 0.001 in this paper.

6

Dictionary Generation

Ex: #terms = 200,000. #terms × 0.00007 = 14; #terms × 0.001 = 200. Only the terms ranked 14 to 200 will be preserved.

The dictionary is not necessarily opinionated.
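The selection rule above is simple enough to sketch in a few lines. A minimal Python illustration, assuming a hypothetical `collection_term_freqs` mapping from term to collection frequency (not the authors' code):

```python
# Minimal sketch of the skewed query model: keep the terms whose
# frequency rank falls in the range (S * #terms, U * #terms).
S = 0.00007
U = 0.001

def build_dictionary(collection_term_freqs):
    """collection_term_freqs: dict mapping term -> collection frequency."""
    # Rank all unique terms by frequency, descending.
    ranked = sorted(collection_term_freqs,
                    key=collection_term_freqs.get, reverse=True)
    n_terms = len(ranked)
    lo = int(S * n_terms)   # e.g. 200,000 terms -> 14
    hi = int(U * n_terms)   # e.g. 200,000 terms -> 200
    # Keep only the mid-frequency band, per the slide's worked example.
    return ranked[lo:hi]
```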

7

Term Weighting (1/2)

KL divergence method:

w(t) = (tf_x / c(D(opRel))) · log2[ (tf_x / c(D(opRel))) / (tf_rel / c(D(Rel))) ]

D(Rel): the set of relevant documents.

D(opRel): the set of opinionated and relevant documents.

c(D(opRel)): #tokens in the opinionated documents.

c(D(Rel)): #tokens in the relevant documents.

tf_x: the frequency of the term t in the opinionated documents.

tf_rel: the frequency of the term t in the relevant documents.
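A small sketch of this KL weighting, with inputs named after the slide's definitions; an illustration under those assumptions, not the paper's implementation:

```python
import math

def kl_weight(tf_x, tf_rel, c_oprel, c_rel):
    """KL divergence weight of a term: how much more probable the term is
    in the opinionated-and-relevant set D(opRel) than in the relevant
    set D(Rel)."""
    p_x = tf_x / c_oprel      # P(t | D(opRel))
    p_rel = tf_rel / c_rel    # P(t | D(Rel))
    if p_x == 0 or p_rel == 0:
        return 0.0            # term absent from one set carries no weight
    return p_x * math.log2(p_x / p_rel)
```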

8

Term Weighting (2/2)

Bose-Einstein statistics method: measures how informative a term is in the set D(opRel) against D(Rel).

w(t) = tf_x · log2((1 + λ) / λ) + log2(1 + λ), with λ = F / N

F: the frequency of the term t in D(Rel).

N: the number of documents in D(Rel).

tf_x: the frequency of the term t in D(opRel).
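A matching sketch of this weight; the formula above is the standard Bose-Einstein (Bo1) model from DFR query expansion, which is what the slide's variables suggest:

```python
import math

def bo1_weight(tf_x, F, N):
    """Bose-Einstein (Bo1) weight: informativeness of a term in D(opRel)
    against D(Rel). F: frequency of the term in D(Rel); N: #docs in
    D(Rel); tf_x: frequency of the term in D(opRel)."""
    if F == 0:
        return 0.0
    lam = F / N  # expected term frequency under the geometric model
    return tf_x * math.log2((1 + lam) / lam) + math.log2(1 + lam)
```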

9

Generating the Opinion Score

Take the top X weighted terms from the opinion dictionary. X will be tuned in the training step.

Submit them to the retrieval system as a query Qopn.

Score(d,Qopn): the opinion score of document d.

Score(d,Q): the initial ranking score.
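A hypothetical sketch of this step, where `weights` is the weighted dictionary and `retrieval_score(d, q)` stands in for the underlying retrieval system; both names are assumptions for illustration:

```python
def opinion_query(weights, X):
    """Take the top X weighted terms from the opinion dictionary as Qopn."""
    return sorted(weights, key=weights.get, reverse=True)[:X]

def opinion_score(d, weights, X, retrieval_score):
    """Score(d, Qopn): score document d against the opinion query,
    using the same retrieval function as the initial ranking."""
    q_opn = opinion_query(weights, X)
    return retrieval_score(d, q_opn)
```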

10

Score Combination

Linear combination:

Log combination:

a, k will be tuned in the training step.
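The combination formulas were figures in the original slides and are not recoverable verbatim. The sketch below shows two standard forms that are consistent with the parameter ranges tuned later (a in [0, 1], k in (0, 1000]): a convex linear mix and a log-dampened product. Treat the exact formulas as assumptions:

```python
import math

def linear_combination(score_rel, score_opn, a):
    """Convex mix of relevance and opinion scores; a in [0, 1].
    Assumed form, not verbatim from the paper."""
    return (1 - a) * score_rel + a * score_opn

def log_combination(score_rel, score_opn, k):
    """Log-dampened combination; k in (0, 1000].
    Assumed form, not verbatim from the paper."""
    return score_rel * math.log(k + score_opn)
```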

11

Experiment Settings (1/3)

TREC06: 50 topics for training. TREC07: 50 topics for testing.

Only the “title” field is used (1.74 words/topic on average).

Baseline 1: apply the InLB model, a variation of the BM25 ranking function. Goal: retrieve as many relevant documents as possible.

12

Experiment Settings (2/3)

Baseline 2: favor documents where the query terms appear in close proximity.

Q2: the set of all query term pairs in query Q.

N: #docs in the collection.

T: #tokens in the collection.

pfn: the normalized frequency of the tuple p.
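The proximity formula itself was a figure in the deck; the sketch below only illustrates the counting step such a model relies on, namely how often each query-term pair in Q2 co-occurs within a fixed window. The window size and function names are assumptions:

```python
from itertools import combinations

def pair_frequencies(doc_tokens, query_terms, window=5):
    """For every query-term pair in Q2, count how often the two terms
    co-occur within `window` tokens of each other in the document."""
    positions = {t: [i for i, tok in enumerate(doc_tokens) if tok == t]
                 for t in query_terms}
    pf = {}
    for t1, t2 in combinations(query_terms, 2):
        pf[(t1, t2)] = sum(1 for i in positions[t1] for j in positions[t2]
                           if abs(i - j) <= window)
    return pf
```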

13

Experiment Settings (3/3)

An external dictionary was manually collected from OpinionFinder and several other resources.

It contains approximately 12,000 English words, mostly adjectives, adverbs and nouns.

14

Experiment: Term Weighting (1/2)

Hypothesis: the most opinionated terms for one query set are also good indicators of opinion for other queries.

Sampling: draw 10 sample sets (Set1, Set2, …, Set10) from the training set (50 topics), each with 25 topics and a maximum overlap of 65% between any two samples.

For each sample set, calculate the weight of each term.

15

Experiment: Term Weighting (2/2)

Compute the cosine similarity between the weight vectors of the top 100 weighted terms from each pair of samples, as sketched below.
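A minimal sketch of that similarity computation, assuming each sample's output is a dict from term to weight (hypothetical representation):

```python
import math

def cosine_similarity(weights_a, weights_b, top_n=100):
    """Cosine similarity between the weight vectors of the top-N weighted
    terms from two samples, over the union of both top-N vocabularies."""
    top_a = dict(sorted(weights_a.items(),
                        key=lambda kv: kv[1], reverse=True)[:top_n])
    top_b = dict(sorted(weights_b.items(),
                        key=lambda kv: kv[1], reverse=True)[:top_n])
    vocab = set(top_a) | set(top_b)
    dot = sum(top_a.get(t, 0.0) * top_b.get(t, 0.0) for t in vocab)
    norm_a = math.sqrt(sum(v * v for v in top_a.values()))
    norm_b = math.sqrt(sum(v * v for v in top_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```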

16

Experiment: Validation (1/3)

Tune the parameters X, a and k mentioned before.

Choose X by maximizing the mean MAP of the 10 samples.

17

Experiment: Validation (2/3)

From the training set (50 topics), Set1 is used for assigning term weights and Set1’ for validation.

18

Experiment: Validation (3/3)

Fix X = 100, then tune a and k: a within [0, 1], step = 0.05; k within (0, 1000], step = 50. A grid-search sketch follows.
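A sketch of the exhaustive sweep this implies; `evaluate_map(a, k)` is a hypothetical hook that returns MAP on the validation topics:

```python
def grid_search(evaluate_map):
    """Exhaustive sweep over a and k as described on the slide.
    evaluate_map(a, k) -> MAP on the validation topics (assumed hook)."""
    best_params, best_map = None, -1.0
    for i in range(21):                  # a in [0, 1], step 0.05
        a = i * 0.05
        for k in range(50, 1001, 50):    # k in (0, 1000], step 50
            m = evaluate_map(a, k)
            if m > best_map:
                best_params, best_map = (a, k), m
    return best_params, best_map
```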

19

Experiment: Evaluation (1/3)

20

Experiment: Evaluation (2/3)

21

Experiment: Evaluation (3/3)

Comparison with OpinionFinder: all else being equal, replace the opinion score Score(d,Qopn) with an opinion score derived from the external dictionary.

22

Conclusion

An effective and practical approach to retrieving opinionated blog posts without manual effort.

Opinion scores are computed during indexing, so the computational cost at retrieval time is negligible.

The automatically generated internal dictionary performs as well as the external dictionary.

Different random samples from the collection reach a high consensus on the opinionated terms if the Bose-Einstein statistics given by the geometric distribution are applied.

23

Thank you for listening!

24
