2010 © University of Michigan 1

Text Retrieval and Data Mining in SI - An Introduction

Qiaozhu Mei
School of Information
Computer Science and Engineering
University of Michigan



Challenge of Data Mining

Published content: 3-4 GB/day
User-generated data: 8-10 GB/day
Private text data: 3 TB/day

- Ramakrishnan and Tomkins 2007
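To put these rates in perspective, the sketch below converts them to yearly volumes and compares private text with all public content. It assumes "G" and "T" on the slide mean gigabytes and terabytes in decimal units (1 TB = 1000 GB) and a 365-day year; the helper names are illustrative only.

```python
# Daily rates from the slide (Ramakrishnan and Tomkins 2007), stored as
# (low, high) in GB/day. Assumes "G" = GB and "T" = TB in decimal units,
# so 3 TB/day is written as 3000 GB/day.
RATES_GB_PER_DAY = {
    "published content": (3, 4),
    "user-generated data": (8, 10),
    "private text data": (3000, 3000),
}


def per_year_tb(rate, days=365):
    """Convert a (low, high) GB/day range into a (low, high) TB/year range."""
    low, high = rate
    return (low * days / 1000, high * days / 1000)


def private_to_public_ratio(rates):
    """Return the (low, high) ratio of private text to all public content."""
    pub_low = rates["published content"][0] + rates["user-generated data"][0]
    pub_high = rates["published content"][1] + rates["user-generated data"][1]
    private = rates["private text data"][0]
    # A larger public volume yields a smaller ratio, so the bounds swap.
    return (private / pub_high, private / pub_low)


if __name__ == "__main__":
    for source, rate in RATES_GB_PER_DAY.items():
        low, high = per_year_tb(rate)
        print(f"{source}: {low:.1f}-{high:.1f} TB/year")
    low, high = private_to_public_ratio(RATES_GB_PER_DAY)
    print(f"private text is about {low:.0f}-{high:.0f}x all public content")
```

Under these assumptions, published content comes to roughly 1.1 to 1.5 TB per year and user-generated data to about 2.9 to 3.7 TB per year, while private text alone reaches about 1,095 TB per year, some 214 to 273 times the public volume combined.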


What Do We Do in This Battle?

[Diagram: Crowd, Context, Content]
Networks: social networks, online communities, academic networks, information networks
Context: time, location, authorship, sentiments, impact, event
Also shown: topics, user
Data sources: query logs, social bookmarks, scientific literature, news articles, blogs, tweets, web pages, social media, EHR (electronic health records)
Research areas: contextual text mining, social data mining, information retrieval, social network mining, health informatics, bioinformatics, statistical topic modeling, web search


Personalization vs. Diversification

Query: "MSR"

PageRank: Mountain Safety Research, MSR Tents, MSR Wheels, Microsoft Research, …?

Personalized Rank: Microsoft Research, Microsoft Research Redmond, Microsoft Research Asia, …?

Diverse Rank: Mountain Safety Research, Microsoft Research, Metropolis Street Racer, …?

- Joint work with Jian Guo, Qian Zhen
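The three rankings on this slide can be contrasted with a small sketch. It assumes a toy, made-up link graph and sense labels for "MSR" (`GRAPH` and `SENSE` are hypothetical). `pagerank` is standard power-iteration PageRank; personalization is modeled by biasing the teleport vector toward pages a user favors; diversification uses greedy maximal marginal relevance (MMR) re-ranking as a swapped-in stand-in. None of this is presented as the method behind the slide's Personalized Rank or Diverse Rank.

```python
# Toy, made-up link graph for the ambiguous query "MSR" (hypothetical data).
# Every node must appear as a key; an empty list marks a dangling page.
GRAPH = {
    "mountain_safety_research": ["msr_tents", "msr_wheels"],
    "msr_tents": ["mountain_safety_research"],
    "msr_wheels": ["mountain_safety_research", "msr_tents"],
    "microsoft_research": ["msr_redmond", "msr_asia"],
    "msr_redmond": ["microsoft_research"],
    "msr_asia": ["microsoft_research"],
    "street_racer": [],
}

# Hypothetical sense labels, used only by the diversification step.
SENSE = {
    "mountain_safety_research": "outdoor gear",
    "msr_tents": "outdoor gear",
    "msr_wheels": "outdoor gear",
    "microsoft_research": "research lab",
    "msr_redmond": "research lab",
    "msr_asia": "research lab",
    "street_racer": "video game",
}


def pagerank(graph, teleport=None, damping=0.85, iters=200, tol=1e-10):
    """Power-iteration PageRank; a non-uniform teleport vector personalizes it."""
    nodes = sorted(graph)
    if teleport is None:
        teleport = {v: 1.0 / len(nodes) for v in nodes}
    else:
        total = sum(teleport.values())
        if total <= 0:
            raise ValueError("teleport weights must sum to a positive value")
        teleport = {v: teleport.get(v, 0.0) / total for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {v: (1 - damping) * teleport[v] for v in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling page: spread its mass along the teleport vector.
                for v in nodes:
                    new[v] += damping * rank[u] * teleport[v]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def _mmr_score(v, scores, sense, selected, lam):
    # Redundancy is 1 if v shares a sense with any already-selected page.
    redundancy = max((1.0 if sense[v] == sense[s] else 0.0 for s in selected),
                     default=0.0)
    return lam * scores[v] - (1 - lam) * redundancy


def mmr_rerank(scores, sense, k, lam=0.5):
    """Greedy MMR re-ranking: trade relevance off against redundancy."""
    selected, candidates = [], sorted(scores)
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda v: _mmr_score(v, scores, sense, selected, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


def top(scores, k):
    """Top-k pages by score, ties broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


if __name__ == "__main__":
    uniform = pagerank(GRAPH)
    print("PageRank:    ", top(uniform, 3))
    personal = pagerank(GRAPH, teleport={"microsoft_research": 1.0})
    print("Personalized:", top(personal, 3))
    print("Diversified: ", mmr_rerank(uniform, SENSE, k=3))
```

On this toy graph, uniform PageRank puts two outdoor-gear pages in its top three, biasing the teleport vector toward `microsoft_research` keeps all probability mass inside the Microsoft Research cluster, and MMR with `lam=0.5` returns one page per sense. The redundancy penalty is binary here for simplicity; a graded page-to-page similarity would be the more usual choice.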


Topic Evolution and Trends

What's hot in the literature and on Twitter?

[Figure: Hot topics in SIGMOD]
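One simple way to quantify "what's hot" is to average each topic's weight over the documents in each time window and compare windows. This is a minimal sketch, assuming documents already carry topic proportions (for example, from a topic model); the topic names, years, and weights in `DOCS` are made up, and the averaging is an illustrative choice rather than the method behind the SIGMOD figure.

```python
from collections import defaultdict

# Hypothetical input: (year, topic proportions) per document. Values are made up.
DOCS = [
    (2008, {"topic_a": 0.6, "topic_b": 0.4}),
    (2008, {"topic_a": 0.5, "topic_b": 0.5}),
    (2009, {"topic_a": 0.3, "topic_b": 0.7}),
    (2010, {"topic_a": 0.1, "topic_b": 0.9}),
    (2010, {"topic_a": 0.2, "topic_b": 0.8}),
]


def topic_strength(docs):
    """Average each topic's proportion over the documents in each time window."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for period, mixture in docs:
        counts[period] += 1
        for topic, weight in mixture.items():
            totals[period][topic] += weight
    return {
        period: {t: w / counts[period] for t, w in totals[period].items()}
        for period in sorted(totals)
    }


def rising_topics(strength, start, end):
    """Topics whose strength grew between two windows, largest gain first."""
    topics = set(strength[start]) | set(strength[end])
    gains = {t: strength[end].get(t, 0.0) - strength[start].get(t, 0.0)
             for t in topics}
    return sorted((t for t, g in gains.items() if g > 0), key=lambda t: -gains[t])


if __name__ == "__main__":
    strength = topic_strength(DOCS)
    for period, topics in strength.items():
        print(period, {t: round(w, 2) for t, w in topics.items()})
    print("rising:", rising_topics(strength, 2008, 2010))
```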


Modeling Spatiotemporal Topic Diffusion

How does discussion spread? Topic: “government response in Hurricane Katrina”

[Figure panel: One week later]
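A crude way to ask how a discussion spreads is to measure, for each location and week, the share of posts that touch the topic, then record the first week in which each location crosses a threshold. This sketch uses keyword matching as a stand-in for a learned topic; `TOPIC_TERMS`, the posts, the locations, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword set standing in for a learned topic.
TOPIC_TERMS = {"fema", "government", "response"}

# Made-up posts: (location, week index, text).
POSTS = [
    ("LA", 0, "government response to the storm is too slow"),
    ("LA", 0, "stay safe everyone"),
    ("TX", 0, "heading to the game tonight"),
    ("TX", 1, "fema response criticized again"),
    ("NY", 1, "new phone arrived"),
    ("NY", 2, "government response questioned in congress"),
]


def mentions_topic(text, terms=TOPIC_TERMS):
    """True if any whitespace-separated word of the post is a topic term."""
    return any(word in terms for word in text.lower().split())


def coverage(posts):
    """Share of posts in each (location, week) cell that mention the topic."""
    hits, totals = defaultdict(int), defaultdict(int)
    for loc, week, text in posts:
        totals[(loc, week)] += 1
        hits[(loc, week)] += mentions_topic(text)
    return {cell: hits[cell] / totals[cell] for cell in totals}


def onset_weeks(cov, threshold=0.5):
    """First week in which each location's coverage reaches the threshold."""
    onset = {}
    for (loc, week), share in sorted(cov.items(), key=lambda kv: kv[0][1]):
        if share >= threshold and loc not in onset:
            onset[loc] = week
    return onset


if __name__ == "__main__":
    print(onset_weeks(coverage(POSTS)))
```

Reading the onset weeks in order gives a rough picture of the topic moving from one location to the next; a topic model would replace the keyword test with a per-post topic probability.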


Summarizing and Tracking Opinions

What is good and what is bad? (Sources: blogs, customer reviews)

Example opinions on The Da Vinci Code:
"Tom Hanks, who is my favorite movie star act the leading role."
"protesting... will lose your faith by watching the movie."
"a good book to past time."
"... so sick of people making such a big deal about a fiction book"
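The "what is good and what is bad" question can be illustrated with a lexicon-based polarity classifier applied to the example sentences. This is a swapped-in baseline, not the opinion model behind the slide: the `POSITIVE` and `NEGATIVE` word lists are small hypothetical lexicons chosen for illustration.

```python
import re

# Tiny hypothetical sentiment lexicons; a real system would learn or curate them.
POSITIVE = {"favorite", "good", "great", "love"}
NEGATIVE = {"lose", "sick", "protesting", "bad"}

SENTENCES = [
    "Tom Hanks, who is my favorite movie star act the leading role.",
    "protesting... will lose your faith by watching the movie.",
    "a good book to past time.",
    "... so sick of people making such a big deal about a fiction book",
]


def polarity(sentence):
    """Return 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(sentences):
    """Group sentences into what is good, what is bad, and the rest."""
    summary = {"positive": [], "negative": [], "neutral": []}
    for s in sentences:
        summary[polarity(s)].append(s)
    return summary


if __name__ == "__main__":
    for label, group in summarize(SENTENCES).items():
        print(label, len(group))
```

With these lexicons, the sentences praising the actor and the book fall under positive and the other two under negative. Plain word counting misses negation and sarcasm, which is one reason richer opinion models are used in practice.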


Topical Community Detection

Who works together on what? (Inputs: social/academic network, text content)

Example communities: information retrieval, machine learning, data mining
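Combining text and network evidence can be sketched by smoothing each author's text-based topic vector toward the average of their co-authors' vectors, then assigning each author to the topic with the largest weight. This is a simplified graph-regularization stand-in, not the author's model; the authors, topic proportions, co-authorship edges, and `alpha=0.5` are hypothetical.

```python
# Hypothetical authors with topic proportions estimated from their papers.
TOPICS = ["IR", "ML", "DM"]
TEXT = {
    "a": [0.8, 0.1, 0.1],
    "b": [0.7, 0.2, 0.1],
    "c": [0.35, 0.45, 0.2],  # leans ML on text alone
    "d": [0.1, 0.8, 0.1],
    "e": [0.1, 0.1, 0.8],
}
# Made-up undirected co-authorship edges.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]


def smooth(text, edges, alpha=0.5, iters=100):
    """Blend each author's text vector with the mean of co-authors' vectors."""
    neighbors = {v: [] for v in text}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    x = {v: list(vec) for v, vec in text.items()}
    for _ in range(iters):
        new = {}
        for v, vec in text.items():
            if not neighbors[v]:
                new[v] = list(vec)  # isolated author: keep text evidence only
                continue
            k = len(vec)
            avg = [sum(x[u][i] for u in neighbors[v]) / len(neighbors[v])
                   for i in range(k)]
            new[v] = [alpha * vec[i] + (1 - alpha) * avg[i] for i in range(k)]
        x = new
    return x


def communities(x, topics=TOPICS):
    """Assign each author to the topical community with the largest weight."""
    groups = {t: [] for t in topics}
    for v in sorted(x):
        best = max(range(len(topics)), key=lambda i: x[v][i])
        groups[topics[best]].append(v)
    return groups


if __name__ == "__main__":
    print("text only:", communities(TEXT))
    print("smoothed: ", communities(smooth(TEXT, EDGES)))
```

Here author `c` leans toward ML on text alone, but after smoothing with co-authors `a` and `b` it joins the IR community, while `d` and `e` keep distinct ML and DM assignments even though they co-author.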


Thanks!


- Joint work with Cheng Zhai, Ken Church, Bruce Schatz, Ravi Kumar, Andrew Tomkins, Denny Zhou, Jian Guo, Qian Zhen, Xu Ling, Duo Zhang, Deng Cai, Dong Xin, Chao Liu ...