CLASSIFICATION OF TWEETS

MUKUL KUMAR JHA (201205567), KONDAPALLI SIRISHA (201150873), AVANTI GUPTA (201305553), SUKHJASHAN SINGH (201101092)

Mentor: ROMIL BANSAL

Tweets Classifier


DESCRIPTION

Video link: http://youtu.be/D9PBX8FmtpQ

A tweet classifier that categorises tweets into six categories: Business, Politics, Music, Health, Sports and Technology.



INTRODUCTION

The tweet classification model assigns each input tweet to one of six genres: politics, sports, music, technology, health or business.

The model was trained on a predefined set of tweets.

Using the trained model, the classifier decides which class a test input belongs to.


APPROACHES

• The first challenge was to collect a suitable set of tweets for training the model.

• The next step was to identify a set of keywords for each category; tweets were then fetched using these keywords.
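The keyword-based collection step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the tweets have already been fetched (the Twitter API call is left out), and the `CATEGORY_KEYWORDS` lists and the `label_by_keywords` helper are hypothetical, showing only three of the six categories for brevity. Each tweet inherits the category whose keywords it contains; tweets matching no category, or more than one, are dropped to keep the training labels clean.

```python
# Hypothetical keyword lists; the keywords actually used in the project are not given.
CATEGORY_KEYWORDS = {
    "Sports": {"cricket", "football", "goal"},
    "Music": {"album", "concert", "singer"},
    "Technology": {"software", "android", "microsoft"},
}


def label_by_keywords(tweets, category_keywords):
    """Return (tweet, category) pairs for tweets matching exactly one category."""
    labelled = []
    for tweet in tweets:
        words = set(tweet.lower().split())
        matches = [cat for cat, kws in category_keywords.items() if words & kws]
        if len(matches) == 1:  # skip unmatched or ambiguous tweets
            labelled.append((tweet, matches[0]))
    return labelled


if __name__ == "__main__":
    fetched = [
        "What a goal in the football final",
        "New album from my favourite singer",
        "Microsoft ships a software update",
        "Lunch was great today",
    ]
    for tweet, category in label_by_keywords(fetched, CATEGORY_KEYWORDS):
        print(f"{category}: {tweet}")
```

Dropping ambiguous tweets trades some training volume for cleaner labels, which matters because keyword matches are only a proxy for the true topic.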

Two approaches were used: 1) Naive Bayes 2) SVM (Support Vector Machine)

A relative comparison of the performance of both algorithms.


NAÏVE BAYES MODEL

• A high-dimensional, sparse feature vector is constructed for each tweet.

• The vector has one dimension for each unique word in the training tweets.

• Each word is treated as a feature.

• The features are assumed to be independent of each other given the class, and each contributes separately to the classification of a tweet.
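The Naive Bayes steps above can be sketched with one common variant, multinomial Naive Bayes over unigram counts. The project itself used WEKA for this step; the Python below is an illustrative sketch only, assuming a lowercase whitespace tokenizer and add-one (Laplace) smoothing, and the class and function names are hypothetical. Scores are summed in log space, which is equivalent to multiplying the per-word probabilities but avoids numerical underflow.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace (unigrams).
    return text.lower().split()


class NaiveBayesClassifier:
    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)      # tweets per class, for the priors
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for tweet, label in zip(tweets, labels):
            words = tokenize(tweet)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_score(self, tweet, label):
        # log P(class) + sum of log P(word | class), with add-one smoothing.
        n_tweets = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_tweets)
        for word in tokenize(tweet):
            if word not in self.vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (self.total_words[label] + len(self.vocab)))
        return score

    def predict(self, tweet):
        return max(self.class_counts, key=lambda c: self.log_score(tweet, c))


if __name__ == "__main__":
    train = [
        ("microsoft releases new phone software", "Technology"),
        ("apple unveils new laptop and software update", "Technology"),
        ("band releases new album and tour dates", "Music"),
        ("singer tops the album charts", "Music"),
    ]
    model = NaiveBayesClassifier().fit([t for t, _ in train], [c for _, c in train])
    print(model.predict("new album from the band"))  # Music
```

In the usage example, "new" appears in both classes, but "album", "the" and "band" tip the score towards Music; "from" was never seen in training and is ignored.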


SUPPORT VECTOR MACHINE

• A high-dimensional, sparse feature vector is constructed for each input tweet.

• Because an SVM is inherently a binary classifier, a multiclass variant of the SVM model was built to classify tweets into the six categories.
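The multiclass idea can be illustrated with a small linear SVM trained one-vs-rest. This is a sketch under stated assumptions, not the project's LIBSVM setup: LIBSVM handles multiple classes with a one-vs-one scheme, whereas this code trains one binary SVM per class and picks the class with the highest score. Each binary model minimises the L2-regularised hinge loss with Pegasos-style subgradient steps; the unigram-count features, the absence of a bias term, the `lam`/`epochs` values and all function names are assumptions made for illustration.

```python
from collections import Counter


def featurize(text):
    # Assumed features: unigram counts as a sparse dict {word: count}.
    return Counter(text.lower().split())


def dot(weights, x):
    return sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.
    ys are +1/-1; no bias term, for brevity."""
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)           # decreasing step size
            violated = y * dot(w, x) < 1    # margin check uses the current weights
            for f in w:                     # regularisation: shrink all weights
                w[f] *= 1.0 - eta * lam
            if violated:                    # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(tweets, labels, **kwargs):
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    xs = [featurize(t) for t in tweets]
    return {
        c: train_binary_svm(xs, [1 if label == c else -1 for label in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, tweet):
    # Pick the class whose binary SVM gives the largest decision value.
    x = featurize(tweet)
    return max(models, key=lambda c: dot(models[c], x))


if __name__ == "__main__":
    train = [
        ("microsoft releases new phone software", "Technology"),
        ("apple unveils new laptop and software update", "Technology"),
        ("band releases new album and tour dates", "Music"),
        ("singer tops the album charts", "Music"),
        ("team wins the football final", "Sports"),
        ("cricket captain scores a century", "Sports"),
    ]
    models = train_one_vs_rest([t for t, _ in train], [c for _, c in train])
    print(predict(models, "singer album"))
```

One-vs-rest needs only one model per class (six here), while one-vs-one needs one per pair of classes (fifteen for six classes); either yields a multiclass decision from binary SVMs.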

Feature Selection

Each word in the tweet is taken as an independent feature that contributes to the decision of which class the tweet belongs to. This is a unigram approach: words are considered individually, without their neighbouring words.
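Unigram features like these can be written in the sparse text format that LIBSVM's tools read, where each line is a numeric class label followed by `index:value` pairs in ascending index order. A minimal sketch, assuming whitespace tokenization, raw word counts as feature values, 1-based feature indices assigned in sorted word order, and a hypothetical mapping from class names to integer labels; the helper names are illustrative.

```python
from collections import Counter


def build_vocabulary(tweets):
    """Map each unique training word to a 1-based feature index."""
    words = sorted({w for t in tweets for w in t.lower().split()})
    return {w: i + 1 for i, w in enumerate(words)}


def to_libsvm_line(label_id, tweet, vocab):
    """Encode one tweet as '<label> <index>:<value> ...' using unigram counts.
    Words not in the training vocabulary are dropped."""
    counts = Counter(w for w in tweet.lower().split() if w in vocab)
    feats = sorted((vocab[w], c) for w, c in counts.items())  # ascending indices
    return " ".join([str(label_id)] + [f"{i}:{c}" for i, c in feats])


if __name__ == "__main__":
    # Hypothetical label ids for two of the six classes.
    label_ids = {"Music": 1, "Technology": 2}
    train = [("new album out now", "Music"), ("new phone software", "Technology")]
    vocab = build_vocabulary([t for t, _ in train])
    for tweet, category in train:
        print(to_libsvm_line(label_ids[category], tweet, vocab))
```

The vocabulary is built from the training tweets only, so test tweets are encoded against the same feature indices.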

Tools/libraries used

LIBSVM: used to scale the training and test files.

WEKA: used to implement Naive Bayes classification.
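Scaling must use ranges learned from the training file, and the same ranges must be reused on the test file so that both are on one scale; LIBSVM's `svm-scale` supports this by saving ranges with `-s` and restoring them with `-r`. The Python below only illustrates that idea on sparse dict vectors and is not `svm-scale` itself. It assumes a target range of [0, 1], which keeps absent (zero) count features at zero, and the function names are hypothetical. Test values outside the training range are not clipped.

```python
def fit_ranges(vectors):
    """Record the min and max of each feature over the training vectors.
    A feature missing from a sparse vector counts as 0."""
    features = {f for v in vectors for f in v}
    ranges = {}
    for f in features:
        values = [v.get(f, 0.0) for v in vectors]
        ranges[f] = (min(values), max(values))
    return ranges


def scale(vector, ranges, lower=0.0, upper=1.0):
    """Linearly map each feature into [lower, upper] using the training ranges."""
    scaled = {}
    for f, value in vector.items():
        if f not in ranges:
            continue  # feature never seen in training
        lo, hi = ranges[f]
        if hi == lo:
            continue  # constant in training, so it carries no information
        scaled[f] = lower + (upper - lower) * (value - lo) / (hi - lo)
    return scaled


if __name__ == "__main__":
    train = [{1: 2.0}, {1: 4.0, 2: 1.0}]
    ranges = fit_ranges(train)                     # learned once, from training data
    print(scale({1: 3.0, 2: 1.0, 5: 7.0}, ranges))  # reused for a test vector
```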


Overfitting issues

This classification model is likely to be strongly biased towards its training data. As a result, a tweet will be classified correctly when its words appeared in the training set, but a tweet with a similar meaning expressed in different words may not be assigned to the same class.
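One simple way to spot this problem is to measure how many of a test tweet's words were ever seen in training: a tweet whose words are mostly unseen gives a unigram model almost no evidence, whatever its meaning. A minimal sketch, assuming a whitespace tokenizer and a hypothetical `vocabulary_coverage` helper:

```python
def vocabulary_coverage(tweet, training_vocab):
    """Fraction of a tweet's tokens that appear in the training vocabulary."""
    tokens = tweet.lower().split()
    if not tokens:
        return 0.0
    seen = sum(1 for t in tokens if t in training_vocab)
    return seen / len(tokens)


if __name__ == "__main__":
    training_tweets = ["Lady Gaga released a new album", "band announces world tour"]
    vocab = {w for t in training_tweets for w in t.lower().split()}
    # Two tweets with similar meaning but very different wording.
    for tweet in ["gaga released a new album", "pop star drops fresh record"]:
        print(f"{vocabulary_coverage(tweet, vocab):.2f}  {tweet}")
```

The second tweet is about the same kind of event, yet none of its words were seen in training, so a model relying only on those words has nothing to go on.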


BLOCK DIAGRAM


EXPERIMENTS AND RESULTS

• The model was tested on a portion of the data held out from the training set, and its accuracy was measured.

• The final output is a graph/chart showing how the user's tweets are distributed across the genres.
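Both steps can be sketched in a few lines: hold out part of the labelled tweets, measure accuracy on them, and tally the predicted genres for a user's tweets. This is an illustrative sketch, not the project's evaluation code; the 80/20 split, the fixed shuffle seed, the plain-text chart and all function names are assumptions, and `predict` stands for any function that maps a tweet to a class.

```python
import random
from collections import Counter


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle (tweet, label) pairs reproducibly and hold out a test portion."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_examples):
    """Share of held-out tweets whose predicted class matches the true label."""
    correct = sum(1 for tweet, label in test_examples if predict(tweet) == label)
    return correct / len(test_examples)


def genre_chart(predicted_labels, width=20):
    """Tally predicted genres and render them as a plain-text bar chart."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    lines = []
    for genre, count in counts.most_common():  # largest genre first
        lines.append(f"{genre:<12}{'#' * round(width * count / total)} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    user_predictions = ["Music", "Sports", "Music", "Technology", "Music", "Sports"]
    print(genre_chart(user_predictions, width=6))
```

A plotting library could draw the same counts graphically; the text chart keeps the sketch dependency-free.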


Tweet: microsoft 's cortana assistant personalization comes to bing on the web

Result: Technology class (Naïve Bayes model)


Tweet: Lady Gaga released a new album

Result: Music class (SVM model)


CONCLUSION

Using the approaches described above (SVM and Naïve Bayes), tweets are classified into their respective categories with a low error rate.
