Chinese Blog Clustering by Hidden Sentiment Factors
ADMA 2009
Shi Feng, Daling Wang, Ge Yu, Chao Yang, and Nan Yang
College of Information Science and Engineering, Northeastern University
Hidden Sentiment Factors(HSF)
• Probabilistic latent semantic analysis (PLSA)
– Blog set B = {b1, b2, …, bN}
– Sentiment word set W = {w1, w2, …, wM}
• NTUSD
– 2,812 positive words and 8,276 negative words
• HowNet Sentiment Dictionary
– 4,566 positive words and 4,370 negative words
– A: N × M matrix, A(i, j) = Freq(bi, wj)
– HSF Z = {z1, z2, …, zk}
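The matrix A defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each blog is assumed to be already segmented into a list of tokens (Chinese text needs a word segmenter first), and the four-word lexicon is a hypothetical stand-in for the NTUSD and HowNet word lists. The function name `build_frequency_matrix` is not from the paper.

```python
from collections import Counter

def build_frequency_matrix(blogs, sentiment_words):
    """Return A as N rows by M columns, where A[i][j] = Freq(b_i, w_j)."""
    matrix = []
    for tokens in blogs:
        counts = Counter(tokens)  # token -> number of occurrences in this blog
        # Counter returns 0 for words that never occur, so no KeyError here.
        matrix.append([counts[w] for w in sentiment_words])
    return matrix

# Hypothetical toy data: two pre-segmented blogs and a tiny sentiment lexicon.
blogs = [
    ["good", "funny", "good", "movie"],
    ["boring", "bad", "movie"],
]
sentiment_words = ["good", "funny", "bad", "boring"]

A = build_frequency_matrix(blogs, sentiment_words)
for row in A:
    print(row)
```

Words outside the lexicon (here "movie") are ignored, so each row describes a blog only through its sentiment vocabulary.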
Hidden Sentiment Factors(HSF)
P(w|b) -> P(z|b)
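One way to obtain P(z|b) from the counts is the standard PLSA EM procedure. The sketch below assumes the asymmetric formulation P(w|b) = Σ_z P(w|z) P(z|b); the slides do not say which variant, initialization, or stopping rule the authors used, so the iteration count, the random seed, and the names `plsa` and `normalize` are all assumptions made for illustration.

```python
import random

def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def plsa(A, k, iterations=50, seed=0):
    """Fit P(w|z) and P(z|b) to the count matrix A with plain EM."""
    rng = random.Random(seed)
    n_blogs, n_words = len(A), len(A[0])
    # Random starting distributions (an assumption; not specified in the slides).
    p_w_given_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(k)]
    p_z_given_b = [normalize([rng.random() for _ in range(k)]) for _ in range(n_blogs)]

    for _ in range(iterations):
        new_wz = [[0.0] * n_words for _ in range(k)]
        new_zb = [[0.0] * k for _ in range(n_blogs)]
        for i in range(n_blogs):
            for j in range(n_words):
                if A[i][j] == 0:
                    continue  # zero counts contribute nothing to the updates
                # E-step: posterior P(z | b_i, w_j) for every factor z.
                posterior = normalize(
                    [p_w_given_z[z][j] * p_z_given_b[i][z] for z in range(k)]
                )
                # Accumulate expected counts for the M-step.
                for z in range(k):
                    new_wz[z][j] += A[i][j] * posterior[z]
                    new_zb[i][z] += A[i][j] * posterior[z]
        # M-step: renormalize the expected counts into distributions.
        p_w_given_z = [normalize(row) for row in new_wz]
        p_z_given_b = [normalize(row) for row in new_zb]
    return p_w_given_z, p_z_given_b

# Hypothetical 4-blog, 4-word count matrix with k = 2 hidden sentiment factors.
A = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
p_w_given_z, p_z_given_b = plsa(A, k=2)
for row in p_z_given_b:
    print([round(p, 3) for p in row])
```

Each row of `p_z_given_b` is a k-dimensional description of one blog, replacing the M-dimensional word-frequency row.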
Clustering by HSF
• K-Means Algorithm
– k’: number of clusters. In this paper, k’ = k.
– Fig. 1: Similarity = 0
– Fig. 2: Similarity = ?
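The clustering step can be sketched as plain K-Means over the P(z|b) vectors. This is a minimal illustration, not the authors' implementation: squared Euclidean distance, seeding from randomly sampled points, and the toy vectors are all assumptions, and the names `kmeans` and `squared_distance` are hypothetical.

```python
import random

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iterations=100, seed=0):
    """Return (assignments, centroids) after Lloyd-style K-Means."""
    rng = random.Random(seed)
    # Seed the centroids with k distinct input points (an assumed strategy).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical P(z|b) vectors for four blogs with k = 2 factors; k' = k = 2.
hsf_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
assignments, centroids = kmeans(hsf_vectors, k=2)
print(assignments)
```

Cluster ids are arbitrary, so only the grouping matters: the first two blogs end up together and the last two end up together.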
Label Words Extraction
Experiment
– 1. Collect blogs about reviews on Stephen Chow’s movie “CJ7” (Long River 7)
– 2. Collect blog entries about Liu Xiang since 2008/8/18.
• Tag 1: “Positive”, “Negative”, or “Neutral”
• Tag 2: “Irrelevant” or not
• Ex: A blog may be tagged {“Positive”, “Irrelevant”}, {“Neutral”}, or {“Negative”, “Irrelevant”}
• Evaluate the clustering purity.
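Clustering purity can be computed as sketched below, assuming the standard definition: each cluster is credited with its most frequent gold tag, and purity is the fraction of blogs that carry the majority tag of their cluster. The slides do not spell out the formula, and for simplicity this sketch uses one tag per blog rather than the tag sets shown above. The data and the name `clustering_purity` are hypothetical.

```python
from collections import Counter

def clustering_purity(cluster_ids, gold_labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, gold_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    # Sum the size of the largest label group inside each cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(gold_labels)

# Hypothetical result: six blogs in two clusters with manually assigned tags.
cluster_ids = [0, 0, 0, 1, 1, 1]
gold_labels = ["Positive", "Positive", "Negative", "Negative", "Negative", "Neutral"]

purity = clustering_purity(cluster_ids, gold_labels)  # (2 + 2) / 6
print(round(purity, 3))
```

A purity of 1.0 means every cluster contains blogs with a single tag; values near 1/(number of tags) suggest the clusters are no better than random with respect to the tags.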