2010 © University of Michigan 1
Text Retrieval and Data Mining in SI - An Introduction
Qiaozhu Mei
School of Information
Computer Science and Engineering
University of Michigan
Challenge of Data Mining
Published content: 3-4G/day
User-generated data: 8-10G/day
Private text data: 3T/day
- Ramakrishnan and Tomkins 2007
What do We Do in this Battle?
[Diagram labels]
Crowd, Context, Content
Social networks, Online communities, Academic networks, Information networks
time, location, authorship, sentiments, impact, event, Topics
User
Query logs, Social bookmarks, Scientific Literature, News articles, blogs, tweets, Web pages, Social media, EHR
Contextual Text Mining, Social Data Mining, Information Retrieval, Social Network Mining, Health Informatics, Bioinformatics, Statistical Topic Modeling, Web Search
Personalization vs. Diversification
MSR
PageRank: Mountain Safety Research + MSR Tents + MSR Wheels + Microsoft Research … ?
Personalized Rank: Microsoft Research + Microsoft Research Redmond + Microsoft Research Asia … ?
Diverse Rank: Mountain Safety Research + Microsoft Research + Metropolis Street Racer … ?
- Joint work with Jian Guo, Qian Zhen
Hot Topics in SIGMOD
Topic Evolution and Trends
What’s hot in the literature and on Twitter?
One Week Later
Modeling Spatiotemporal Topic Diffusion
How does discussion spread?
Topic = “government response in Hurricane Katrina”
Tom Hanks, who is my favorite movie star act the leading role.
protesting... will lose your faith by watching the movie.
a good book to past time.
... so sick of people making such a big deal about a fiction book
The Da Vinci Code
Summarizing and Tracking Opinions
What is good and what is bad?
Blogs; customer reviews
Information retrieval community
Machine learning community
Data mining community
Social/Academic Network
Topical Community Detection
Who works together on what?
Text Content