2010 © University of Michigan. Text Retrieval and Data Mining in SI - An Introduction. Qiaozhu Mei, School of Information, Computer Science and Engineering, University of Michigan

Page 1

Text Retrieval and Data Mining in SI

- An Introduction

Qiaozhu Mei

School of Information

Computer Science and Engineering

University of Michigan

Page 2

Challenge of Data Mining

Published content: 3-4G/day

User-generated data: 8-10G/day

Private text data: 3T/day

- Ramakrishnan and Tomkins 2007

Page 3

What do We Do in this Battle?

Crowd

Context

Content

Social networks

Online communities

Academic networks

Information networks

time

location

authorship

sentiments

impact

event

Topics

User

Query logs

Social bookmarks

Scientific Literature

News articles

blogs

tweets

Web pages

Social media

EHR

Contextual Text Mining

Social Data Mining

Information Retrieval

Social Network Mining

Health Informatics

Bioinformatics

Statistical Topic Modeling

Web Search

Page 4

Personalization vs. Diversification

Query: MSR

PageRank: Mountain Safety Research, MSR Tents, MSR Wheels, Microsoft Research …

Personalized Rank: Microsoft Research, Microsoft Research Redmond, Microsoft Research Asia …

Diverse Rank: Mountain Safety Research, Microsoft Research, Metropolis Street Racer …

- Joint work with Jian Guo, Qian Zhen
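
The slide does not say how the personalized ranking is computed. As a rough illustration only, the sketch below assumes personalization is done by biasing the restart (teleport) distribution of PageRank toward pages a user cares about, which is one common approach and not necessarily the authors' method. The link graph, the page names, and the `pagerank` helper are all hypothetical.

```python
def pagerank(links, teleport=None, damping=0.85, iters=100):
    """Power-iteration PageRank over a small link graph.

    links:    dict mapping each page to the list of pages it links to.
    teleport: dict mapping pages to restart weights; uniform if None.
              A non-uniform vector gives a personalized ranking.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    if teleport is None:
        teleport = {n: 1.0 for n in nodes}
    total = sum(teleport.get(n, 0.0) for n in nodes)
    restart = {n: teleport.get(n, 0.0) / total for n in nodes}

    rank = dict(restart)
    for _ in range(iters):
        # Every page first receives its share of the restart probability.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            outs = links.get(n, [])
            if outs:
                share = damping * rank[n] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # Dangling page: hand its mass back through the restart vector.
                for v in nodes:
                    nxt[v] += damping * rank[n] * restart[v]
        rank = nxt
    return rank


# Hypothetical toy web for the ambiguous query "MSR" (not from the slides).
links = {
    "outdoor_blog": ["mountain_safety_research", "msr_tents"],
    "mountain_safety_research": ["msr_tents"],
    "msr_tents": ["mountain_safety_research"],
    "tech_blog": ["microsoft_research"],
    "microsoft_research": ["msr_asia"],
    "msr_asia": ["microsoft_research"],
}

global_rank = pagerank(links)
# Assumed user profile: a reader whose browsing starts from a tech blog.
personal_rank = pagerank(links, teleport={"tech_blog": 1.0})

for name, scores in (("global", global_rank), ("personalized", personal_rank)):
    ordered = sorted(scores, key=scores.get, reverse=True)
    print(name, ordered[:3])
```

With a uniform restart vector both senses of "MSR" receive rank, whereas the biased restart vector concentrates all rank on the pages reachable from the user's starting point. Diversification would require a different objective, such as penalizing results that are close to ones already selected, and is not covered by this sketch.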

Page 5

Hot Topics in SIGMOD

Topic Evolution and Trends

What’s hot in the literature and on Twitter?

Page 6

One Week Later

Modeling Spatiotemporal Topic Diffusion

How does discussion spread?

Topic = “government response to Hurricane Katrina”

Page 7

Tom Hanks, who is my favorite movie star act the leading role.

protesting... will lose your faith by watching the movie.

a good book to past time.

... so sick of people making such a big deal about a fiction book

The Da Vinci Code

Summarizing and Tracking Opinions

What is good and what is bad?

Blogs; customer reviews

Page 8

Information retrieval community

Machine learning community

Data mining community

Social/Academic Network

Topical Community Detection

Who works together on what?

Text Content

Page 9

Thanks!

- Joint work with Cheng Zhai, Ken Church, Bruce Schatz, Ravi Kumar, Andrew Tomkins, Denny Zhou, Jian Guo, Qian Zhen, Xu Ling, Duo Zhang, Deng Cai, Dong Xin, Chao Liu ...