CLASSIFICATION OF TWEETS

MUKUL KUMAR JHA (201205567), KONDAPALLI SIRISHA (201150873), AVANTI GUPTA (201305553), SUKHJASHAN SINGH (201101092)

Mentor: ROMIL BANSAL

DESCRIPTION

The main aim was to divide tweets into several categories using SVM and Naive Bayes classifiers.

INTRODUCTION

The tweet classification model assigns each input tweet to one of several genres, such as politics, sports, music, technology, health and business.

The model was trained on a predefined set of tweets.

Based on this training, the classifier decides which class a test input belongs to.

APPROACHES

• The first challenge was to collect a suitable set of tweets for training the model.

• The next step was to identify a set of keywords for each category, which were then used to fetch tweets.
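The keyword-driven collection step can be sketched roughly as follows. This is only an illustration: the keyword lists, the sample tweets and the function name label_by_keywords are hypothetical, since the slides do not list the actual keywords or say how the tweets were fetched.

```python
# Hypothetical keyword lists; the slides do not give the real ones.
CATEGORY_KEYWORDS = {
    "politics": {"election", "senate", "parliament"},
    "sports": {"cricket", "football", "tennis"},
    "music": {"album", "concert", "singer"},
    "technology": {"software", "smartphone", "microsoft"},
    "health": {"diabetes", "vaccine", "fitness"},
    "business": {"stocks", "merger", "startup"},
}


def label_by_keywords(tweet):
    """Return the category whose keywords match the most words, or None."""
    words = set(tweet.lower().split())
    best_label, best_hits = None, 0
    for label, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


# Made-up tweets standing in for fetched data.
raw_tweets = [
    "New album drops before the concert tour",
    "Senate vote delayed until after the election",
    "Just had lunch",
]

# Keep only tweets that matched at least one category keyword.
training_set = [(t, label_by_keywords(t)) for t in raw_tweets]
training_set = [(t, label) for t, label in training_set if label is not None]
```

Tweets that match no keyword are dropped, so the training set only contains tweets with a category label.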

Two approaches were used: 1) Naive Bayes 2) SVM (Support Vector Machine)

The relative performance of the two algorithms was compared.

NAÏVE BAYES MODEL

• A high-dimensional dense vector is constructed for each tweet.

• The vector has one entry for each unique word in the training tweets.

• Each word is treated as a separate feature.

• The features are assumed to be independent of one another, and each contributes equally to the classification of a tweet.
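The independence assumption can be illustrated with a small word-count Naive Bayes classifier. This is a minimal sketch and not the project's implementation (which used WEKA): the add-one smoothing, the function names and the toy training tweets are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_tweets):
    """Collect class frequencies and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_tweets:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Pick the class with the highest log-posterior, treating words as independent."""
    class_counts, word_counts, vocabulary = model
    total_tweets = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_tweets)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing: an assumption of this sketch.
            p = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training tweets for illustration only.
toy_training = [
    ("new album released by the singer", "music"),
    ("concert tickets for the album tour", "music"),
    ("new phone software update released", "technology"),
    ("search assistant comes to the web", "technology"),
]
model = train_naive_bayes(toy_training)
predicted = classify("singer announces concert tour", model)
```

Because each word's probability is multiplied in separately (added, in log space), every known word influences the score on its own, regardless of which other words appear in the tweet.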

SUPPORT VECTOR MACHINE

• A high-dimensional dense vector is constructed for each input tweet.

• A multiclass variant of the SVM model was created so that tweets can be assigned to more than two classes.
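One common way to build a multiclass classifier from binary SVMs is one-vs-rest: train one linear SVM per genre and pick the genre with the largest decision value. The sketch below assumes that scheme and a plain subgradient-descent trainer. The slides do not say which scheme the project used (LIBSVM itself uses a one-against-one scheme), and the three-feature toy vectors, hyperparameters and function names are hypothetical.

```python
def decision_value(weights, bias, x):
    """Signed score of a linear classifier: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(vectors, targets, epochs=50, lam=0.01, lr=0.1):
    """Minimise L2-regularised hinge loss by subgradient descent (targets are +1 or -1)."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            margin = y * decision_value(weights, bias, x)
            # The regularisation term shrinks the weights on every step.
            weights = [w * (1 - lr * lam) for w in weights]
            if margin < 1:  # inside the margin or misclassified
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def train_one_vs_rest(vectors, labels):
    """Train one binary SVM per genre: that genre (+1) against all others (-1)."""
    models = {}
    for genre in sorted(set(labels)):
        targets = [1 if label == genre else -1 for label in labels]
        models[genre] = train_linear_svm(vectors, targets)
    return models


def predict(models, x):
    """Choose the genre whose binary SVM gives the largest decision value."""
    return max(models, key=lambda genre: decision_value(*models[genre], x))


# Toy word-count vectors over a hypothetical vocabulary ["album", "match", "software"].
vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = ["music", "sports", "technology"]
models = train_one_vs_rest(vectors, labels)
predicted = predict(models, [1, 0, 0])
```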

Feature Selection

Here, each word in the tweet is taken as an independent feature that contributes to the decision of which class the tweet belongs to. We use the unigram approach in this technique.
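A unigram feature vector of this kind can be sketched as follows. Whitespace tokenisation, lowercasing and the function names are assumptions of the example; the slides do not describe the preprocessing.

```python
def build_vocabulary(tweets):
    """Map each unique lowercase word in the training tweets to a column index."""
    vocab = {}
    for tweet in tweets:
        for word in tweet.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_unigram_vector(tweet, vocab):
    """Count how often each vocabulary word occurs in the tweet."""
    vector = [0] * len(vocab)
    for word in tweet.lower().split():
        index = vocab.get(word)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] += 1
    return vector


training_tweets = ["Lady Gaga released a new album", "new phone released"]
vocab = build_vocabulary(training_tweets)
vector = to_unigram_vector("a new new album", vocab)
```

The vector length equals the number of unique training words, which is why the vectors become high-dimensional as the training set grows.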

Tools/libraries used

LIBSVM: used to scale the training and test files.

WEKA: used to implement Naive Bayes classification.
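The scaling step can be pictured with the Python sketch below. It is a stand-in for illustration, not LIBSVM's own code: it rescales each feature linearly to [-1, 1] using ranges taken from the training data only, applies the same ranges to test data, and writes rows in LIBSVM's sparse "label index:value" text format. The function names, the toy rows and the handling of constant features are assumptions.

```python
def fit_ranges(rows):
    """Record the per-feature minimum and maximum seen in the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Linearly map each feature into [lower, upper] using the training ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            scaled.append(0.0)  # constant feature; mapping to 0 is a choice of this sketch
        else:
            scaled.append(lower + (upper - lower) * (value - lo) / (hi - lo))
    return scaled


def to_libsvm_line(label, row):
    """Format one example as 'label index:value ...' with 1-based indices, omitting zeros."""
    pairs = [f"{i}:{v:g}" for i, v in enumerate(row, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


train_rows = [[0, 2, 1], [4, 0, 1]]
ranges = fit_ranges(train_rows)
scaled_train = [scale_row(row, ranges) for row in train_rows]
scaled_test = scale_row([2, 1, 1], ranges)  # test data reuses the training ranges
line = to_libsvm_line(1, scaled_train[0])
```

Reusing the training ranges for the test file keeps both files on the same scale, so a feature value means the same thing at training and prediction time.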

Overfitting issues

There is a high probability that this classification model will be strongly biased towards its training data. As a result, a tweet is classified correctly when the words it uses were present in the training set, but a tweet with a similar meaning that uses a different set of words might not be assigned to the same class.
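The vocabulary dependence described here can be made concrete by measuring how many of a tweet's words the model has seen before. The sketch below is illustrative only; the vocabulary, the paraphrased tweet and the function name are made up for the example.

```python
def known_word_fraction(tweet, vocabulary):
    """Fraction of the tweet's words that appear in the training vocabulary."""
    words = tweet.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in vocabulary) / len(words)


# Hypothetical training vocabulary built from a single tweet.
training_vocabulary = {"lady", "gaga", "released", "a", "new", "album"}

seen = known_word_fraction("Lady Gaga released a new album", training_vocabulary)
paraphrase = known_word_fraction("The singer dropped her latest record", training_vocabulary)
```

The paraphrase shares no words with the training vocabulary, so a unigram model has no evidence for placing it in the same class even though the meaning is similar.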

BLOCK DIAGRAM

EXPERIMENTS AND RESULTS

• The model was evaluated on test data held out from the training data, and its accuracy was measured on that data.

• The final result is a graph/chart that categorizes the user's tweets by genre.
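The evaluation steps above can be sketched as a held-out split, an accuracy measure, and per-genre counts that could feed a chart. The labels below are made up for illustration and are not the project's results; the deterministic split and the function names are assumptions.

```python
from collections import Counter


def split_train_test(examples, test_every=5):
    """Hold out every `test_every`-th example; a deterministic stand-in for a random split."""
    train = [ex for i, ex in enumerate(examples) if (i + 1) % test_every != 0]
    test = [ex for i, ex in enumerate(examples) if (i + 1) % test_every == 0]
    return train, test


def accuracy(true_labels, predicted_labels):
    """Share of test tweets whose predicted genre matches the true genre."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def genre_distribution(predicted_labels):
    """Count predicted tweets per genre, e.g. as input for a bar or pie chart."""
    return Counter(predicted_labels)


# Illustrative labels only.
true_labels = ["music", "technology", "sports", "music"]
predicted_labels = ["music", "technology", "music", "music"]

train, test = split_train_test(list(range(10)))
score = accuracy(true_labels, predicted_labels)
distribution = genre_distribution(predicted_labels)
```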

Tweet: microsoft 's cortana assistant personalization comes to bing on the web

Result: Technology class (Naïve Bayes model)

Tweet: Lady Gaga released a new album

Result: Music class (SVM model)

CONCLUSION

Using the approaches described above (SVM and Naïve Bayes), tweets are classified into their respective categories with a low error rate.

REFERENCES

• A Machine Learning Approach to Twitter User Classification, by Marco Pennacchiotti and Ana-Maria Popescu. http://coitweb.uncc.edu/~anraja/courses/SMS/SMSBib/2886-14198-1-PB.pdf

• Short Text Classification in Twitter to Improve Information Filtering, by Bharath Sriram, David Fuhry, Engin Demir, Hakan Ferhatosmanoglu. http://www.cs.bilkent.edu.tr/~hakan/publication/TweetClassification.pdf

• Twitter Trending Topic Classification, by Kathy Lee, Diana Palsetia, Ramanathan Narayanan, Md. Mostofa Ali Patwary, Ankit Agrawal, and Alok Choudhary. http://cucis.ece.northwestern.edu/publications/pdf/LeePal11.pdf

• Analysis and Classification of Twitter messages, by Christopher Horn. http://know-center.tugraz.at/wp-content/uploads/2010/12/Master-Thesis-Christopher-Horn.pdf