Named Entity Recognition in Tweets: TwitterNLP

Ludymila Lobo

Resources

Reading material

Ritter, Alan; Clark, Sam; Mausam; Etzioni, Oren. Named Entity Recognition in Tweets: An Experimental Study. Available from the Association for Computational Linguistics website at https://aclweb.org/anthology/D/D11/D11-1141.pdf

http://www.academia.edu/1128304/Shallow_parsing_as_part-of-speech_tagging

Twitter NLP Tool

https://github.com/aritter/twitter_nlp

Application using Twitter NLP

statuscalendar.com

Collecting Tweets

https://dev.twitter.com

http://www.webdevdoor.com/jquery/twitter-feed-authentication-search

https://github.com/abraham/twitteroauth

http://sourceforge.net/projects/xampp/

Why Twitter?

Large volume of data, even more than the Library of Congress (Washington, D.C.), which holds 151 million items*

Real-time information, sometimes more up to date than articles.

*Hachman (2011)

http://pt.wikipedia.org/wiki/Library_of_Congress

Challenges

Noisy and informal nature

Wide variety of entity types (companies, products, bands, teams, movies, etc.), most of which are relatively infrequent, so even a large sample of tweets contains only a few examples of each

Lack of context

http://twitter.com

Tool

• https://github.com/aritter/twitter_nlp
• Unzip the file and, in a Linux terminal, type:
– sh build.sh

Tool

• statuscalendar.com

How it works

POS (Part of Speech) -> NLP, clustering

Chunking (shallow parsing)

Chunking example (token, chunk tag):

@paulwalk  o
It         b-np
's         b-vp
the        b-np
view       i-np
from       b-pp
where      b-advp
I          b-np
'm         b-vp
living     i-vp
for        b-pp
two        b-np
weeks      i-np

Possible POS tags per word:

best    ADJ ADV NP V
better  ADJ ADV V DET
close   ADV ADJ V N
cut     V N VN VD
even    ADV DET ADJ V
grant   NP N V
hit     V VD VN N DET
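The chunk tags in the example above follow a begin/inside/outside scheme: b-np opens a noun phrase, i-np continues it, and o marks a token outside any chunk. As a minimal sketch (not the tool's own code), the tags can be grouped back into chunks like this. The function name group_chunks and the (token, tag) input layout are assumptions made for illustration.

```python
def group_chunks(tagged):
    """Group (token, tag) pairs into (chunk_type, [tokens]) spans."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in tagged:
        tag = tag.lower()
        if tag.startswith("b-"):
            # A b- tag always starts a new chunk; close the previous one first.
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("i-") and current_tokens and tag[2:] == current_type:
            # An i- tag of the same type extends the open chunk.
            current_tokens.append(token)
        else:
            # "o" (or an i- tag with no matching open chunk) closes the chunk;
            # the token itself is left outside any chunk.
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_tokens:
        chunks.append((current_type, current_tokens))
    return chunks


# The tagged tweet from the slide.
tagged = [
    ("@paulwalk", "o"), ("It", "b-np"), ("'s", "b-vp"), ("the", "b-np"),
    ("view", "i-np"), ("from", "b-pp"), ("where", "b-advp"), ("I", "b-np"),
    ("'m", "b-vp"), ("living", "i-vp"), ("for", "b-pp"), ("two", "b-np"),
    ("weeks", "i-np"),
]
chunks = group_chunks(tagged)
for chunk_type, tokens in chunks:
    print(chunk_type, " ".join(tokens))
```

For this tweet the sketch yields nine chunks, for example the noun phrase "the view" and the verb phrase "'m living".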

How it works

Capitalization classifier: predicts whether or not a tweet is informatively capitalized (using SVM learning)

NER (Named Entity Recognition)

POS (Part of Speech) ->NLP, clustering

Chunking (shallow parsing)

Tom Hanks was awesome in Forrest Gump

Tom Hanks: actor; Forrest Gump: movie

Tool

@cityofcalgary: Free swimming and golf tomorrow for @cbc Sports Day in Canada #yyc #sportsday http://ow.ly/2G4sf

@cityofcalgary/O :/O Free/O swimming/O and/O golf/O tomorrow/O for/O @cbc/O Sports/B-other Day/I-other in/O Canada/B-geo-loc #yyc/O #sportsday/O http://ow.ly/2G4sf/O

Adam Beyer: Swedish Techno Pioneer: When it comes to his own DJing and sound, he's slightly more diverse and likes...

Adam/B-person Beyer/I-person :/O Swedish/O Techno/O Pioneer/O :/O When/O it/O comes/O to/O his/O own/O DJing/O and/O sound/O ,/O he/O 's/O slightly/O more/O diverse/O and/O likes/O
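The tagged output above attaches a label to every token after a final slash: B-type opens an entity, I-type continues it, and O marks a token outside any entity. A minimal sketch of how such a line could be turned into a list of entities follows. The helper names parse_tagged and extract_entities are hypothetical and not part of the tool. Splitting on the last slash matters because tokens such as URLs contain slashes themselves.

```python
def parse_tagged(line):
    """Split 'token/TAG' items; the tag is whatever follows the LAST slash."""
    pairs = []
    for item in line.split():
        token, _, tag = item.rpartition("/")
        pairs.append((token, tag))
    return pairs


def extract_entities(line):
    """Return (entity_text, entity_type) pairs from one tagged tweet."""
    entities = []
    tokens, etype = [], None
    for token, tag in parse_tagged(line):
        if tag.startswith("B-"):
            # Start of a new entity; flush the one in progress, if any.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == etype:
            tokens.append(token)
        else:
            # Outside any entity: flush the one in progress, if any.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [], None
    if tokens:
        entities.append((" ".join(tokens), etype))
    return entities


# First output line from the slide.
line = ("@cityofcalgary/O :/O Free/O swimming/O and/O golf/O tomorrow/O for/O "
        "@cbc/O Sports/B-other Day/I-other in/O Canada/B-geo-loc #yyc/O "
        "#sportsday/O http://ow.ly/2G4sf/O")
entities = extract_entities(line)
print(entities)
```

Applied to the first example, this gives "Sports Day" with type other and "Canada" with type geo-loc.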

How to retrieve data from Twitter?

https://dev.twitter.com

<?php
session_start();
require_once("twitteroauth/twitteroauth/twitteroauth.php"); // Path to twitteroauth library

$search = "wpi OR #WPI"; // Search query
$notweets = 50;          // Number of tweets to request

// Placeholders: replace with the credentials of your own Twitter application
$consumerkey = "123456";
$consumersecret = "123456";
$accesstoken = "123456";
$accesstokensecret = "123456";

// Create an authenticated connection to the Twitter API
function getConnectionWithAccessToken($cons_key, $cons_secret, $oauth_token, $oauth_token_secret) {
    $connection = new TwitterOAuth($cons_key, $cons_secret, $oauth_token, $oauth_token_secret);
    return $connection;
}

$connection = getConnectionWithAccessToken($consumerkey, $consumersecret, $accesstoken, $accesstokensecret);

// URL-encode the query so that "#" becomes %23 and spaces are escaped
$search = urlencode($search);
$tweets = $connection->get("https://api.twitter.com/1.1/search/tweets.json?q=".$search."&count=".$notweets);

// Print the search results as JSON
echo json_encode($tweets);
?>

http://www.webdevdoor.com/jquery/twitter-feed-authentication-search/

• Authentication library: https://github.com/abraham/twitteroauth

Download it and place it in the same folder as the code

http://sourceforge.net/projects/xampp/

Copy the project folder to C:\xampp\htdocs

Open http://localhost/TwitterStreams/tweet.php in a browser
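The page prints the search results as JSON. As a rough sketch of the next step, the tweet texts can be pulled out of that JSON and written one per line, ready to be passed to the tagger. Two assumptions apply here: the JSON has the layout of the Twitter search API v1.1 response (a "statuses" list whose items carry a "text" field), and the tagger reads one tweet per line. The function name tweets_to_lines and the sample data are made up for illustration.

```python
import json


def tweets_to_lines(search_json):
    """Return the text of each tweet in a search response, one string per tweet."""
    data = json.loads(search_json)
    lines = []
    for status in data.get("statuses", []):
        text = status.get("text", "")
        # Collapse line breaks and repeated spaces so each tweet stays on one line.
        lines.append(" ".join(text.split()))
    return lines


# Hypothetical sample standing in for the JSON printed by tweet.php.
sample = json.dumps({
    "statuses": [
        {"text": "Free swimming\nand golf tomorrow"},
        {"text": "Go #WPI"},
    ]
})
lines = tweets_to_lines(sample)
print("\n".join(lines))
```

In practice the JSON would be saved from the browser or fetched from the local page rather than built by hand as in this sample.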