Online feedback correlation using clustering


My presentation for an Internet search class. I theorized that you could automatically determine how good a product is from the different types of negative reviews it receives.


Online Feedback Correlation using Clustering

Research Work Done for CS 651: Internet Algorithms

Dedicated to Tibor Horvath

Whose endless pursuit of a PhD (imagine that) kept him from researching this topic.

Problem Statement

- Millions of reviews available
- Consumers read only a small number of reviews
- Reviewer content is not always trustworthy

Problem Statement (continued)

- What information from reviews is important?
- What can we extract efficiently from the overall set of reviews to provide more utility to consumers than is already provided?

Motivation

People are increasingly relying on online feedback mechanisms in making choices [Guernsey 2000]

- Online feedback mechanisms draw consumers
- Competitive edge
- Quality currently bad

Current Solutions

- “Good” review placement
- Show a small number of reviews
- … more trustworthy?

Amazon Example

Observations

- Consumers look at a product based on its overall rating
- Consumers read the “editorial review” for content
- Reviews can indicate common issues

… Can we correlate these reviews in some meaningful way?

Observations Lead to Hypotheses!

Hypothesis: Products with numerous similar negative reviews will often not be purchased, regardless of their positive reviews. Furthermore, the number of negative reviews is a strong indicator of the likelihood of certain flaws in a product.

Definitions

Semantic Orientation: polar classification of whether something is positive or negative

Natural Language Processing: deciphering parts of speech from free text

Feature: quality of a product that customers care about

Feature Vector: vector representing a review in a d-dimensional space where each dimension represents a feature.

Overview of Project

- Obtain a large repository of customer reviews
- Extract features from customer reviews and orient them
- Create feature vectors, e.g. [1, 0, -1, 1, 1, -1, …], from reviews and features (a sketch follows below)
- Cluster feature vectors to find large negative clusters
- Analyze clusters and compare to the hypothesis
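As a concrete illustration of the feature-vector step, here is a minimal Java sketch. The fixed feature list, the example sentences, and the per-sentence orientation lookup are all hypothetical stand-ins; the actual modules derived orientation from NLProcessor output and a WordNet-expanded adjective seed set.

```java
import java.util.*;

// Minimal sketch of the feature-vector step (hypothetical names throughout).
// Each dimension holds +1 / -1 / 0: the orientation of the sentence(s)
// mentioning that feature, or 0 if the feature is never mentioned.
public class FeatureVectorSketch {

    static final String[] FEATURES = {"battery", "screen", "sound", "size", "software", "price"};

    // sentenceOrientation: +1 for a positive sentence, -1 for a negative one.
    static int[] toFeatureVector(List<String> sentences, Map<String, Integer> sentenceOrientation) {
        int[] vector = new int[FEATURES.length];
        for (String sentence : sentences) {
            Integer orientation = sentenceOrientation.get(sentence);
            if (orientation == null) continue;   // sentence had no orientable adjective
            for (int d = 0; d < FEATURES.length; d++) {
                if (sentence.toLowerCase().contains(FEATURES[d])) {
                    vector[d] = orientation;     // features in the same sentence share its orientation
                }
            }
        }
        return vector;                           // e.g. [1, 0, -1, 1, 1, -1]
    }

    public static void main(String[] args) {
        List<String> review = List.of(
                "The battery life is great.",
                "The screen scratches terribly.");
        Map<String, Integer> orientation = Map.of(
                "The battery life is great.", 1,
                "The screen scratches terribly.", -1);
        System.out.println(Arrays.toString(toFeatureVector(review, orientation)));
        // -> [1, -1, 0, 0, 0, 0]
    }
}
```

Each review thus becomes one point in a d-dimensional space (d = 6 here), ready for clustering.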

Related Work

Related work falls into one of three disparate camps:

1. Classification: classifying reviews as negative or positive
2. Domain Specificity: the overall effect of reviews within a domain
3. Summarization: feature extraction to summarize reviews

Limitations of Related Work

- Classification: overly summarizing
- Domain Specificity: hard to generalize given domain information
- Summarization: no overall knowledge of the collection

Close to Summarization?

Most closely related to the summarization work of Hu and Liu:

- Summarization with dynamic feature extraction and orientation per review

Data for Project

Data from Amazon.com customer reviews:

- Available through the Amazon E-Commerce Service (ECS)
- Four thousand products related to MP3 players
- Over twenty thousand customer reviews

Technologies Used

- Java to program the modules
- Amazon ECS
- NLProcessor (trial version) from Infogistics
- Princeton’s WordNet as a thesaurus
- KMLocal from David Mount’s group at the University of Maryland for clustering (a generic sketch of the clustering step follows below)
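KMLocal is distributed as C++ code, so rather than guess at its interface, the sketch below is a generic Lloyd's-iteration k-means in Java over the ±1/0 feature vectors. It illustrates what the clustering step computes, not KMLocal's actual API.

```java
import java.util.*;

// Generic Lloyd's k-means over review feature vectors; a stand-in for the
// KMLocal clustering step, not its real interface.
public class KMeansSketch {

    static double dist2(int[] p, double[] c) {
        double s = 0;
        for (int i = 0; i < p.length; i++) { double d = p[i] - c[i]; s += d * d; }
        return s;
    }

    static int[] cluster(int[][] points, int k, int iterations) {
        int d = points[0].length;
        double[][] centers = new double[k][d];
        Random rng = new Random(0);
        for (int j = 0; j < k; j++) centers[j] = toDouble(points[rng.nextInt(points.length)]);

        int[] assign = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each review vector joins its nearest center.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (dist2(points[i], centers[j]) < dist2(points[i], centers[best])) best = j;
                assign[i] = best;
            }
            // Update step: recompute each center as the mean of its members.
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assign[i]]++;
                for (int x = 0; x < d; x++) sums[assign[i]][x] += points[i][x];
            }
            for (int j = 0; j < k; j++)
                if (counts[j] > 0)
                    for (int x = 0; x < d; x++) centers[j][x] = sums[j][x] / counts[j];
        }
        return assign;
    }

    static double[] toDouble(int[] p) {
        double[] r = new double[p.length];
        for (int i = 0; i < p.length; i++) r[i] = p[i];
        return r;
    }

    public static void main(String[] args) {
        int[][] vectors = { {1, -1, 0}, {1, -1, -1}, {-1, 1, 0}, {-1, 1, 1} };
        System.out.println(Arrays.toString(cluster(vectors, 2, 10))); // cluster index per review
    }
}
```

A cluster whose members average below the -0.1 threshold (see the Analysis slide) would then count as a negative cluster.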

Project Structure

Simplifications Made

- Limited data set
- Feature list created a priori
- Features from the same sentence given the same orientation
- Sentences without features neglected
- Number of clusters chosen only to see correlations in the biggest cluster
- Small adjective seed set (a sketch of seed-set orientation follows below)
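To make the small adjective seed set concrete, here is a minimal Java sketch of seed-based orientation, assuming one-hop synonym expansion. The synonym map is a hardcoded stand-in for the WordNet thesaurus lookups, and all names here are hypothetical.

```java
import java.util.*;

// Sketch of orienting adjectives from a small seed set (hypothetical names).
// The synonym map stands in for WordNet, which the real modules queried
// as a thesaurus to expand the seeds.
public class OrientationSketch {

    public static void main(String[] args) {
        Map<String, Integer> orientation = new HashMap<>();
        // Small hand-labeled seed set.
        orientation.put("good", 1);
        orientation.put("great", 1);
        orientation.put("bad", -1);
        orientation.put("terrible", -1);

        // Stand-in for WordNet synonym lookups.
        Map<String, List<String>> synonyms = Map.of(
                "good", List.of("fine", "solid"),
                "bad", List.of("poor", "awful"));

        // Propagate each seed's polarity to its synonyms (one hop).
        for (Map.Entry<String, List<String>> e : synonyms.entrySet()) {
            Integer polarity = orientation.get(e.getKey());
            if (polarity == null) continue;
            for (String syn : e.getValue()) orientation.putIfAbsent(syn, polarity);
        }
        System.out.println(orientation); // each adjective now carries +1 / -1
    }
}
```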

Analysis

- Associated clusters with products
- Found negative clusters using a threshold (-0.1)
- Eliminated non-negative clusters
- Sorted the product list twice:
  - Products by sales rank (given by Amazon)
  - Products sorted by the hypothesis, with a tweak
- Tweak: relative size * distortion
- Computed Spearman’s distance (see the sketch below)
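The sketch below reads "Spearman's distance" as the Spearman footrule, i.e. the sum of absolute rank differences between the two product orderings; the slides do not say which variant was used, so that reading is an assumption.

```java
import java.util.*;

// Spearman footrule distance between two rankings of the same products:
// the sum of absolute rank differences. Assumes both lists rank the
// identical set of products.
public class SpearmanSketch {

    static int footrule(List<String> rankingA, List<String> rankingB) {
        Map<String, Integer> posB = new HashMap<>();
        for (int i = 0; i < rankingB.size(); i++) posB.put(rankingB.get(i), i);
        int distance = 0;
        for (int i = 0; i < rankingA.size(); i++)
            distance += Math.abs(i - posB.get(rankingA.get(i)));
        return distance; // 0 = identical orderings
    }

    public static void main(String[] args) {
        List<String> bySalesRank = List.of("playerA", "playerB", "playerC");
        List<String> byHypothesis = List.of("playerB", "playerA", "playerC");
        System.out.println(footrule(bySalesRank, byHypothesis)); // -> 2
    }
}
```

A distance of 0 would mean the hypothesis ordering exactly matches Amazon's sales rank.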

Results

- The hypothesis holds with 82% accuracy!
- But most of the four thousand products were pruned due to poor orientation

Conclusion

- Consumers are affected by negative reviews that correlate to show similar flaws
- Affected regardless of the positive reviews

Future Work

- Larger seed set for adjectives
- Use more complicated NLP techniques
- Experiment with the size of clusters
- Dynamically determine features using summarization techniques
- Use different data sets
- Use a different distance measure in clustering

Questions
