Unsupervised Streaming Feature Selection in Social Media
Jundong Li¹, Xia Hu², Jiliang Tang³ and Huan Liu¹
¹Arizona State University
²Texas A&M University
³Yahoo! Labs
Arizona State University Data Mining and Machine Learning Lab, CIKM 2015
Outline
• Background and Motivation
• Problem Statement
• Proposed USFS Framework
• Experimental Results
• Conclusions and Future Work
Social Media
• The rapid growth of social media provides a platform for people to perform online social activities
• Massive amounts of high-dimensional data are user-generated and quickly disseminated
• It is desirable to reduce the dimensionality of social media data because of the curse of dimensionality
Feature Selection
• Feature selection is effective for preparing high-dimensional data: it selects a subset of relevant features for a compact and accurate representation
Feature Selection in Social Media
• Traditional feature selection assumes that all features are static and known in advance
• Features in social media are usually generated dynamically in a streaming fashion
– Twitter produces more than 500 million tweets every day, and a large number of slang words (features) are continuously generated by users
– In disaster relief, topics (features) like "Chile Earthquake" quickly become hot
Streaming Feature Selection
• It is more appealing to perform streaming feature selection, so that relevant features are captured in a timely manner
Challenges, Opportunities and Target
• Challenges
– Label information is costly
– Data are not i.i.d.
• Opportunities
– Link information is abundant and may be helpful
• Target
– Propose an unsupervised streaming feature selection algorithm for social media data
No existing unsupervised streaming feature selection algorithms!
Problem Statement
• Given n linked instances, let the adjacency matrix M denote their link information. Assume that features arrive dynamically, one at a time; at time step t, each instance is associated with a set of streaming features X(t) = {f_1, f_2, …, f_t}
• We want to select a subset of relevant features at each time step, effectively and efficiently, by using the link information M and the content information X(t)
Illustration
[Figure: features arrive at time steps t, t+1, …, t+i. At each step the method maintains a Selected Feature Set and asks two questions: Accept the new feature? Reject existing feature?]
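The accept/reject cycle in the illustration can be sketched as a loop over arriving features. This is a control-flow sketch only: `accept_new` and `prune_existing` are hypothetical callables standing in for the two questions, and the toy rules in the usage part are placeholders, not the USFS criteria.

```python
from typing import Callable, Iterable, List, Tuple

import numpy as np

# Hypothetical signatures (not from the slides):
#   accept_new(selected_columns, new_column) -> bool
#   prune_existing(selected_columns) -> positions in the selected list to keep
AcceptFn = Callable[[List[np.ndarray], np.ndarray], bool]
PruneFn = Callable[[List[np.ndarray]], List[int]]


def stream_select(
    feature_stream: Iterable[Tuple[int, np.ndarray]],
    accept_new: AcceptFn,
    prune_existing: PruneFn,
) -> List[int]:
    """Process features one at a time and return the ids of the selected set."""
    selected_ids: List[int] = []
    selected_cols: List[np.ndarray] = []
    for feature_id, column in feature_stream:
        # Question 1: accept the newly arrived feature?
        if not accept_new(selected_cols, column):
            continue
        selected_ids.append(feature_id)
        selected_cols.append(column)
        # Question 2: after an acceptance, reject any existing feature?
        keep = prune_existing(selected_cols)
        selected_ids = [selected_ids[i] for i in keep]
        selected_cols = [selected_cols[i] for i in keep]
    return selected_ids


# Toy placeholder rules: accept columns with non-trivial variance and
# keep at most the two highest-variance columns.
def toy_accept(selected, column):
    return float(np.var(column)) > 0.1


def toy_prune(selected):
    order = np.argsort([-np.var(c) for c in selected], kind="stable")[:2]
    return sorted(int(i) for i in order)


X = np.array([[0.0, 1.0, 5.0, 2.0],
              [0.0, 2.0, 5.0, 8.0],
              [0.0, 3.0, 5.0, 2.0],
              [0.0, 4.0, 5.0, 8.0]])
stream = ((j, X[:, j]) for j in range(X.shape[1]))
selected = stream_select(stream, toy_accept, toy_prune)
print(selected)
```

Passing the two tests in as callables keeps the loop independent of how each test is actually implemented.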
Modeling Link Information
• Social media users connect for a variety of reasons, e.g., as movie fans, sports enthusiasts, or colleagues
• Users with similar hidden factors are similar
• Hidden factors are helpful to steer unsupervised streaming feature selection
• We use the mixed membership stochastic blockmodel (MMSB) [Blei+NIPS2009] to extract hidden social factors from link information
Modeling Link Information (cont'd)
• At time step t:
• Hidden social factors serve as regression targets
• The L1-norm can be used for feature selection
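The two ideas on this slide (hidden factors as regression targets, L1-norm for selection) can be illustrated with an L1-regularized least-squares fit. The sketch below assumes an objective of the form 0.5·||XW − Π||_F² + α·||W||_1, where `Pi` holds the hidden social factors; the names `X`, `Pi`, `alpha` and the proximal-gradient (ISTA) solver are choices made for this illustration, not taken from the slides.

```python
import numpy as np


def soft_threshold(Z: np.ndarray, tau: float) -> np.ndarray:
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def l1_regression(X, Pi, alpha, n_iter=500):
    """Minimize 0.5 * ||X W - Pi||_F^2 + alpha * ||W||_1 by proximal gradient."""
    W = np.zeros((X.shape[1], Pi.shape[1]))
    # Step size = 1 / Lipschitz constant of the gradient of the smooth term.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Pi)
        W = soft_threshold(W - step * grad, step * alpha)
    return W


# Toy data: 3 orthonormal features, 1 hidden factor driven mostly by feature 0.
X = np.eye(4)[:, :3]
Pi = (2.0 * X[:, 0] + 0.05 * X[:, 1]).reshape(-1, 1)
W = l1_regression(X, Pi, alpha=0.1)

# A feature is kept only if at least one of its weights is nonzero.
selected = np.flatnonzero(np.abs(W).sum(axis=1) > 0)
print(W.ravel(), selected)
```

In this toy case the weakly related feature 1 and the unrelated feature 2 receive weights of exactly zero, which is the sense in which the L1 penalty performs selection.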
Modeling Content Information
• If two users are similar in the original feature space, they should also be similar in the selected feature space.
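A common way to encode this kind of similarity preservation is a graph-Laplacian penalty, tr(Yᵀ L Y) with Y = XW, which equals half the affinity-weighted sum of squared distances between projected instances. The sketch below checks that identity numerically. The RBF affinity, its `sigma` parameter, and all function names are assumptions for illustration; the slide does not say how similarity is measured.

```python
import numpy as np


def rbf_affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Dense RBF affinity between rows of X (an assumed similarity measure)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq_dist / (2.0 * sigma**2))
    np.fill_diagonal(A, 0.0)
    return A


def laplacian(A: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A


def smoothness_penalty(Y: np.ndarray, L: np.ndarray) -> float:
    """tr(Y^T L Y): small when similar instances have similar rows in Y."""
    return float(np.trace(Y.T @ L @ Y))


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))   # 6 instances, 4 features
W = rng.normal(size=(4, 2))   # projection onto 2 hidden factors
A = rbf_affinity(X)
L = laplacian(A)
Y = X @ W

trace_form = smoothness_penalty(Y, L)
pair_form = 0.5 * sum(
    A[i, j] * np.sum((Y[i] - Y[j]) ** 2)
    for i in range(len(Y))
    for j in range(len(Y))
)
print(trace_form, pair_form)
```

The pairwise form makes the intent visible: pairs with high affinity in the original space are penalized for ending up far apart after projection.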
Optimization Formulation at Time t
• By combining network information and content information
• Decompose into a set of sub-problems
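The decomposition can be checked numerically under an assumed form of the combined objective. The four terms below (squared fit to the hidden factors, L1 penalty, ridge penalty, Laplacian smoothness) and their order are an assumption, chosen so that the 2nd term is the L1 penalty, in line with the later slide on testing a new feature. The symbols `alpha`, `beta`, `gamma` and the function names are illustrative.

```python
import numpy as np


def objective(W, X, Pi, L, alpha, beta, gamma):
    """Assumed four-term objective over the full coefficient matrix W."""
    Y = X @ W
    fit = 0.5 * np.sum((Y - Pi) ** 2)              # 1st term: fit hidden factors
    sparsity = alpha * np.sum(np.abs(W))           # 2nd term: L1 penalty
    ridge = 0.5 * beta * np.sum(W ** 2)            # 3rd term: ridge penalty
    smooth = 0.5 * gamma * np.trace(Y.T @ L @ Y)   # 4th term: content similarity
    return float(fit + sparsity + ridge + smooth)


def sub_objective(w, X, pi, L, alpha, beta, gamma):
    """The same objective restricted to one column of W and one hidden factor."""
    return objective(w[:, None], X, pi[:, None], L, alpha, beta, gamma)


rng = np.random.default_rng(0)
n, d, k = 5, 3, 2
X = rng.normal(size=(n, d))
Pi = rng.normal(size=(n, k))
W = rng.normal(size=(d, k))

# Laplacian of a path graph on n nodes, used as a toy similarity structure.
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

total = objective(W, X, Pi, L, alpha=0.1, beta=0.2, gamma=0.3)
parts = sum(
    sub_objective(W[:, i], X, Pi[:, i], L, alpha=0.1, beta=0.2, gamma=0.3)
    for i in range(k)
)
print(total, parts)
```

Because every term is a sum over columns of W, each column can be optimized independently, which is what makes a per-factor set of sub-problems possible.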
Testing New Feature
• At time step t+1, when the new feature arrives:
• The objective function is reduced if the reduction in the 1st, 3rd, and 4th terms outweighs the increase in the 2nd term
• Therefore, the condition to accept the new feature is
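One standard way to turn this trade-off into a test is a gradient check: with the new weight initialized at zero, giving it a nonzero value can lower an L1-penalized objective only if the gradient of the smooth terms with respect to that weight exceeds the L1 coefficient in magnitude. The sketch below assumes a per-factor objective 0.5·||Xw − π||² + α·||w||_1 + (β/2)·||w||² + (γ/2)·(Xw)ᵀL(Xw). That form, the symbols, and the function names are assumptions; the exact condition shown on the slide is not reproduced in the text.

```python
import numpy as np


def smooth_gradient_at_zero(f_new, X, w, pi, L, gamma):
    """Derivative of the smooth terms w.r.t. the new weight, evaluated at 0.

    The ridge term contributes beta * 0 = 0, so beta does not appear.
    """
    y = X @ w
    return float(f_new @ (y - pi) + gamma * (f_new @ (L @ y)))


def accept_new_feature(f_new, X, w, pi, L, alpha, gamma):
    """Accept when a nonzero weight on f_new can decrease the objective."""
    return abs(smooth_gradient_at_zero(f_new, X, w, pi, L, gamma)) > alpha


# Toy setup: one existing feature explains the first entry of pi only.
X = np.array([[1.0], [0.0], [0.0], [0.0]])
w = np.array([1.0])
pi = np.array([1.0, 2.0, 0.0, 0.0])

# Laplacian of a path graph on 4 nodes (toy similarity structure).
L = np.array([[1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]])

f_useful = np.array([0.0, 1.0, 0.0, 0.0])   # aligned with the residual
f_useless = np.array([0.0, 0.0, 1.0, 0.0])  # orthogonal to the residual

accept_useful = accept_new_feature(f_useful, X, w, pi, L, alpha=0.5, gamma=0.1)
accept_useless = accept_new_feature(f_useless, X, w, pi, L, alpha=0.5, gamma=0.1)
print(accept_useful, accept_useless)
```

The check costs a few matrix-vector products per arriving feature, so a rejected feature never triggers a full re-optimization.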
Testing Existing Features
• Test existing features when a new feature is added
• When a new feature is accepted, we optimize the following w.r.t. the current variables, which forces some feature coefficients to be zero
• This is a convex optimization problem; we solve it with Broyden-Fletcher-Goldfarb-Shanno (BFGS)
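A re-optimization step of this kind can be sketched with SciPy. Two assumptions are made here. First, the per-factor objective is taken to be 0.5·||Xw − π||² + α·||w||_1 + (β/2)·||w||² + (γ/2)·(Xw)ᵀL(Xw). Second, instead of plain BFGS the sketch uses SciPy's bound-constrained L-BFGS-B on the split w = p − q with p, q ≥ 0, which turns the L1 term into a linear one and lets coefficients sit exactly at zero. The function name `reoptimize` is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def reoptimize(X, pi, L, alpha, beta, gamma, tol=1e-8):
    """Re-fit the weights of the current features; zeros mark rejected features."""
    d = X.shape[1]

    def fun(z):
        p, q = z[:d], z[d:]
        w = p - q
        y = X @ w
        r = y - pi
        value = (0.5 * (r @ r) + alpha * z.sum()
                 + 0.5 * beta * (w @ w) + 0.5 * gamma * (y @ (L @ y)))
        grad_w = X.T @ r + beta * w + gamma * (X.T @ (L @ y))
        # Chain rule for w = p - q, plus the linear L1 surrogate alpha * (p + q).
        return value, np.concatenate([grad_w + alpha, -grad_w + alpha])

    res = minimize(fun, np.zeros(2 * d), jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * (2 * d))
    w = res.x[:d] - res.x[d:]
    w[np.abs(w) < tol] = 0.0
    return w


# Toy check with orthonormal features and beta = gamma = 0, where the
# solution is known in closed form: soft-thresholding of X^T pi by alpha.
X = np.eye(4)[:, :3]
pi = np.array([2.0, 0.05, 0.0, 0.0])
L = np.zeros((4, 4))
w = reoptimize(X, pi, L, alpha=0.1, beta=0.0, gamma=0.0)
kept = np.flatnonzero(w)
print(w, kept)
```

Features whose re-fitted weight is zero are the ones that would be dropped from the selected set at this step.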
Feature Selection by USFS
• If the new feature is accepted, we obtain a sparse coefficient matrix by solving all sub-problems
• For each feature j, if any of its k corresponding feature weights is nonzero, the feature is included in the final model; the feature score is defined as
• Features are ranked in descending order of their feature scores
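The inclusion rule and ranking can be sketched as follows. The score formula is not reproduced in the text of this slide, so the sketch assumes the score of feature j is its largest absolute weight across the k sub-problems; that choice and the name `rank_features` are assumptions.

```python
import numpy as np


def rank_features(W: np.ndarray, tol: float = 0.0):
    """Return (ranking, scores) for a d x k coefficient matrix W.

    A feature is included if any of its k weights is nonzero. Its score is
    assumed here to be the maximum absolute weight in its row.
    """
    abs_W = np.abs(W)
    included = np.flatnonzero((abs_W > tol).any(axis=1))
    scores = abs_W.max(axis=1)
    # Sort included features by descending score; stable sort keeps ties in order.
    ranking = included[np.argsort(-scores[included], kind="stable")]
    return ranking, scores


# Rows are features, columns are the k = 2 sub-problems.
W = np.array([[0.0, 0.0],
              [0.5, -2.0],
              [1.0, 0.0],
              [0.0, 3.0]])
ranking, scores = rank_features(W)
print(ranking, scores)
```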
Questions to Investigate
• Q1: How good are the features selected by the USFS framework?
• Q2: How efficient is the proposed USFS framework?
Datasets
• BlogCatalog (social blog directory)
• Flickr (image sharing website)
• Assume features arrive in a random order; take {20%, 30%, …, 90%, 100%} of all features
Experimental Settings
• Evaluation
– Clustering: K-means
– Metrics: Accuracy and NMI
• Baseline batch-mode methods
– Laplacian Score [He et al. NIPS 2005]
– SPEC [Zhao and Liu. ICML 2007]
– NDFS [Li et al. AAAI 2012]
– LUFS [Tang and Liu, KDD 2012]
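The two clustering metrics can be sketched as below. Clustering accuracy is computed under the best one-to-one matching between cluster ids and class labels, and NMI is normalized by the geometric mean of the two entropies. Both conventions are common but are assumptions here, since the slides do not state which variants were used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(y_true, y_pred) -> np.ndarray:
    """Count matrix C[i, j] = number of items in class i and cluster j."""
    _, t = np.unique(np.asarray(y_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred).ravel(), return_inverse=True)
    C = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(C, (t, p), 1)
    return C


def clustering_accuracy(y_true, y_pred) -> float:
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(-C)  # negate to maximize matched counts
    return float(C[rows, cols].sum() / C.sum())


def nmi(y_true, y_pred) -> float:
    """Mutual information divided by sqrt(H(true) * H(pred))."""
    P = contingency(y_true, y_pred)
    P = P / P.sum()
    pu, pv = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pu, pv)[nz]))
    hu = -np.sum(pu * np.log(pu))
    hv = -np.sum(pv * np.log(pv))
    if hu <= 0 or hv <= 0:
        return 0.0
    return float(mi / np.sqrt(hu * hv))


labels = [0, 0, 1, 1]
acc_perfect = clustering_accuracy(labels, [1, 1, 0, 0])  # same partition, ids swapped
acc_partial = clustering_accuracy(labels, [1, 1, 0, 1])
nmi_perfect = nmi(labels, [1, 1, 0, 0])
nmi_independent = nmi(labels, [0, 1, 0, 1])
print(acc_perfect, acc_partial, nmi_perfect, nmi_independent)
```

Both metrics ignore the arbitrary numbering of clusters, which is why a relabeled but identical partition still scores as perfect.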
Performance on Flickr
Performance on BlogCatalog
Cumulative Running Time
• In BlogCatalog, USFS is 7x, 20x, 29x, and 76x faster than the four batch-mode baselines
• In Flickr, USFS is 5x, 11x, 20x, and 75x faster than the four batch-mode baselines
Conclusion
• Goals:
– Perform unsupervised streaming feature selection for social media data
• Solutions:
– Leverage link information as constraints
– Stagewise algorithm for streaming features
• Results:
– Achieve better feature selection performance in terms of clustering
– Reduce running time compared with batch-mode methods
Future Work
• In this work, we assume that link information is relatively stable compared with dynamic content information; we will investigate streaming feature selection in dynamic networks
• Streaming features come from different sources; we will investigate how to fuse heterogeneous feature sources for streaming feature selection
Acknowledgement: This material is, in part, supported by the National Science Foundation (NSF) under grant number IIS-1217466. Comments and suggestions from DMML members and reviewers are greatly appreciated.
Questions