Kevin Teh Insight presentation

Disambiguating Twitter Search
Kevin Teh
kkwteh@gmail.com
Insight Data Science Fellows Program
March 2013

Tuesday, February 26, 13

That’s not the python that I meant...

The solution? cluster-pluck.

cluster-pluck disambiguates Twitter search in real time

It works in Spanish too!

Tools

300,000 Tweets

User

Filter

Word Filter Web Application

Algorithm

read query and download corpus of 1500 tweets

select potentially meaningful words
    filter out common words
    rank remaining words by rate of capitalization and select top 10
    rank remaining words by number of occurrences and select top 10

count words

cluster candidates into groups
    link two candidates if their relative proportion of co-occurrence is greater than 0.25
    rank connected components by total occurrences and take top 3

assign tweets to clusters
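
The steps on the Algorithm slide can be sketched in Python roughly as follows. This is a minimal illustration, not the cluster-pluck source. The slide does not spell out several details, so the following are assumptions: the stop list `COMMON_WORDS`, dropping the query word itself from the candidates, taking the union of the two top-10 rankings, reading "relative proportion of co-occurrence" as the number of tweets containing both words divided by the smaller of the two words' tweet counts, and assigning each tweet to the cluster it shares the most words with. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop list; the slides do not say which common words were filtered.
COMMON_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
                "it", "my", "i", "for", "on", "with", "at"}

def tokenize(tweet):
    return re.findall(r"[A-Za-z']+", tweet)

def select_candidates(tweets, query, top_n=10):
    """Count words, drop common ones, keep the top words by capitalization rate and by count."""
    counts, capitalized = Counter(), Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            word = token.lower()
            # Assumption: the query word is excluded, since it appears in every tweet.
            if word in COMMON_WORDS or word == query.lower():
                continue
            counts[word] += 1
            capitalized[word] += token[0].isupper()
    by_caps = sorted(counts, key=lambda w: capitalized[w] / counts[w], reverse=True)[:top_n]
    by_count = [w for w, _ in counts.most_common(top_n)]
    # Assumption: the candidate set is the union of the two rankings.
    return set(by_caps) | set(by_count), counts

def cluster_candidates(tweets, candidates, counts, threshold=0.25, top_k=3):
    """Link co-occurring candidates, then return the top connected components."""
    word_sets = [{t.lower() for t in tokenize(tweet)} for tweet in tweets]
    containing = {w: sum(w in s for s in word_sets) for w in candidates}
    parent = {w: w for w in candidates}

    def find(w):  # union-find root lookup with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(sorted(candidates), 2):
        both = sum(a in s and b in s for s in word_sets)
        # Assumed definition of "relative proportion of co-occurrence".
        if both / min(containing[a], containing[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in candidates:
        groups.setdefault(find(w), set()).add(w)
    ranked = sorted(groups.values(), key=lambda g: sum(counts[w] for w in g), reverse=True)
    return ranked[:top_k]

def assign_tweets(tweets, clusters):
    """Map each tweet to the index of the cluster it overlaps most; skip tweets with no overlap."""
    assignment = {}
    for tweet in tweets:
        words = {t.lower() for t in tokenize(tweet)}
        overlaps = [len(words & c) for c in clusters]
        if overlaps and max(overlaps) > 0:
            assignment[tweet] = overlaps.index(max(overlaps))
    return assignment
```

With a real corpus, `tweets` would be the 1500 downloaded tweets for the query; calling `select_candidates`, then `cluster_candidates`, then `assign_tweets` in that order mirrors the flow on the slide.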

Kevin Teh
kkwteh@gmail.com

Math PhD -- May ’13
Topic: Noncommutative Geometry (Whatever that is)

B.A.Sc. -- April ’07
Engineering Science (Whatever that is)
