
Predictive Analysis of the U.S. Presidential Election using Machine Learning



PREDICTIVE ANALYSIS OF UNITED STATES PRESIDENTIAL ELECTIONS USING MACHINE LEARNING

A Project by Harindu Kodituwakku

Submitted as a final-year project towards completion of the BEng (Honours) in Computing Science.


Problem Analysis


Project Objectives
• Objective
  A desktop application that analyzes and visualizes the predicted results of the 2016 U.S. presidential election using Twitter data.
• Assumptions and Constraints
  Only tweets related to the Democratic and Republican presidential candidates will be analyzed.
  Only English-language tweets will be considered.
  The predictive analysis will be calculated using tweets collected from 2016/10/11 to 2016/11/07.
• Scope – Machine Learning, Sentiment Analysis, Big Data


Resource Analysis

Similar Approach Analysis → Extract Concepts → Solution Concept

Research Approach
• Sentiment140.com
• The Predictive Power of Social Media: On the Predictability of U.S. Presidential Elections using Twitter (2012)
• Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment (2008)


Literature Review
• Twitter vs Facebook
• Levels of Sentiment Analysis
• Feature Extraction Methodologies
• POS Tagging
• Negation Handling
• Supervised vs Unsupervised Machine Learning Techniques


Solution Concept
• Data is collected through Twitter.
• Supervised machine learning is selected as the classification technique.
• The Naïve Bayes algorithm is selected as the machine learning algorithm.
• A MongoDB NoSQL database is used to store tweets regarding the candidates.


Design
• 1st Iteration – Primitive and informal design.
• 2nd Iteration – Integration of the NoSQL database. Proper object-oriented UML design. The behavioral and structural relationships between the Python classes were showcased.
• 3rd Iteration – Integration with Matplotlib graphs for data representation.


Implementation
Core Components
• Implementation of NoSQL Database
• Live Tweets Streamer
• Pre-processing of Tweets
• Classification Algorithm
• Implementation of Word Cloud
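The slides do not show how tweets were pre-processed. As a minimal sketch of what such a step might look like, the function below lowercases a tweet, drops URLs and @mentions, keeps the word part of hashtags, shortens stretched letters, and returns word tokens. The name `clean_tweet` and the specific rules are assumptions for illustration, not the project's actual code.

```python
import re

# All patterns below are illustrative assumptions, not the project's rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")      # three or more of the same character
NON_ALPHA_RE = re.compile(r"[^a-z\s]")    # anything that is not a-z or whitespace


def clean_tweet(text):
    """Return a list of lowercase word tokens from a raw tweet (hypothetical helper)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove links
    text = MENTION_RE.sub(" ", text)      # remove @user mentions
    text = text.replace("#", "")          # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)   # "heyyyy" -> "heyy"
    text = NON_ALPHA_RE.sub(" ", text)    # drop digits and punctuation
    return text.split()
```

For example, `clean_tweet("@user I LOVE this!!! http://t.co/abc #Election2016")` yields the tokens `i`, `love`, `this`, `election`.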


Naïve Bayes Algorithm

$\hat{c} = \arg\max_{c} P(c \mid d)$

$\hat{c} = \arg\max_{c} P(c \mid d) = \arg\max_{c} P(d \mid c)\, P(c)$
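The second form follows from Bayes' rule, P(c | d) = P(d | c) P(c) / P(d), because P(d) is the same for every class and can be dropped from the argmax. A minimal sketch of that decision rule is given below. It assumes a multinomial word-count model with add-one smoothing and works in log space to avoid underflow; the class name `NaiveBayes` and these modelling choices are illustrative assumptions and may differ from the classifier the project used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Illustrative multinomial Naive Bayes with additive smoothing (assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def train(self, labelled_docs):
        """labelled_docs: iterable of (tokens, label) pairs."""
        for tokens, label in labelled_docs:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_scores(self, tokens):
        """Return log P(c) + sum of log P(w | c) for every class c."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)              # log prior P(c)
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for word in tokens:
                if word in self.vocab:                         # skip unseen words
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, tokens):
        """argmax over classes of P(d | c) P(c), computed in log space."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Usage: call `train` with a list of `(tokens, label)` pairs, then `classify(tokens)` on a new tokenized tweet.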


Word Cloud


Testing
• 1st Iteration – Unit Testing
  The cleaning and pre-processing algorithms were unit tested. PyUnit was used for unit testing.
• 2nd Iteration – Integration Testing
  The main components of the system were integrated during this iteration. Many test cases failed during this iteration.
• 3rd Iteration – Accuracy Testing
  This iteration was divided into 3 phases. In each phase the classifier was trained using a different amount of training data, which improved the accuracy of the system drastically.
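A unit test for a cleaning step could look roughly like the sketch below, written with Python's `unittest` module (PyUnit). The function `strip_urls_and_mentions` and the test cases are hypothetical stand-ins, since the slides do not show the project's cleaning code or its tests.

```python
import re
import unittest


def strip_urls_and_mentions(text):
    """Hypothetical cleaning step: drop URLs and @mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return " ".join(text.split())


class CleaningTests(unittest.TestCase):
    def test_removes_url(self):
        self.assertEqual(strip_urls_and_mentions("vote now http://t.co/x"), "vote now")

    def test_removes_mention(self):
        self.assertEqual(strip_urls_and_mentions("@user great debate"), "great debate")

    def test_plain_text_unchanged(self):
        self.assertEqual(strip_urls_and_mentions("election day"), "election day")


def run_cleaning_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleaningTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_cleaning_tests().wasSuccessful()` reports whether every case passed; the same class can also be discovered by `python -m unittest`.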


Testing


Testing
Most Informative Features
contains(bummed) = True          0 : 1 = 39.7 : 1.0
contains(lonely) = True          0 : 1 = 25.9 : 1.0
contains(followfriday) = True    1 : 0 = 23.7 : 1.0
contains(tummy) = True           0 : 1 = 20.6 : 1.0
contains(infection) = True       0 : 1 = 17.0 : 1.0
contains(ankle) = True           0 : 1 = 16.3 : 1.0
contains(cancelled) = True       0 : 1 = 15.0 : 1.0
contains(heyy) = True            1 : 0 = 15.0 : 1.0
contains(boom) = True            1 : 0 = 15.0 : 1.0
contains(hurts) = True           0 : 1 = 14.9 : 1.0
contains(sad) = True             0 : 1 = 14.8 : 1.0
contains(depressed) = True       0 : 1 = 14.6 : 1.0
contains(hating) = True          0 : 1 = 14.3 : 1.0
contains(worst) = True           0 : 1 = 13.9 : 1.0
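Each row reads as a likelihood ratio: "0 : 1 = 39.7 : 1.0" means the feature is about 39.7 times more likely under label 0 than under label 1. The words suggest that 0 is the negative class and 1 the positive class, although the slides do not say so. The sketch below shows one common way such a ratio can be computed from document counts; the function name, the smoothing value of 0.5, and the example counts are assumptions for illustration and are not taken from the project.

```python
def likelihood_ratio(feature_counts, label_totals, smoothing=0.5):
    """Compare how likely a binary feature is under each label.

    feature_counts: label -> number of training documents containing the feature
    label_totals:   label -> total number of training documents with that label
    Returns (more_likely_label, less_likely_label, ratio).
    """
    probs = {
        label: (feature_counts.get(label, 0) + smoothing) / (total + 2 * smoothing)
        for label, total in label_totals.items()
    }
    high = max(probs, key=probs.get)
    low = min(probs, key=probs.get)
    return high, low, probs[high] / probs[low]
```

With made-up counts of 79 documents out of 1000 for label 0 and 1 out of 1000 for label 1, the function returns label 0 as the more likely one with a ratio of about 53.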


Performance Evaluation

Figure: Evaluating the Accuracy of the Classifier (x-axis: Number of Tweets in the Training Dataset, 0 to 60,000; y-axis: Accuracy (%), 0 to 90)
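A chart of accuracy against training-set size can be produced by retraining on growing slices of the training data and scoring each model on a fixed test set. The sketch below assumes that procedure. `train_majority` is a toy stand-in trainer that only predicts the most common label; all names here are hypothetical, and any function that returns a `classify(tokens)` callable could be passed instead.

```python
from collections import Counter


def accuracy(classify, test_set):
    """Percentage of (tokens, label) pairs that classify() labels correctly."""
    correct = sum(1 for tokens, label in test_set if classify(tokens) == label)
    return 100.0 * correct / len(test_set)


def train_majority(train_set):
    """Toy stand-in trainer: always predicts the most common training label."""
    majority = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda tokens: majority


def learning_curve(train_fn, train_set, test_set, sizes):
    """Train on the first n examples for each n and record test accuracy."""
    return [(n, accuracy(train_fn(train_set[:n]), test_set)) for n in sizes]
```

The returned list of `(training_size, accuracy_percent)` pairs can be passed directly to a plotting library.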


Performance Evaluation

At 9.00 am – 2016/11/09 At 11.00 am – 2016/11/09



Contribution
• Implementation of NoSQL Database
• Use of Python Object Serialization
• Removal of the Neutral Dataset from Training
• Implementation of the Word Cloud
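Python object serialization is typically done with the standard `pickle` module, which lets a trained classifier be saved once and reloaded without retraining. The sketch below assumes that usage; the function names are hypothetical, and the slides do not show how the project serialized its objects.

```python
import pickle


def save_classifier(model, path):
    """Write a trained model object to disk (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_classifier(path):
    """Read a previously saved model back into memory.

    Only unpickle files from a trusted source: loading a pickle can execute code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Any picklable object works here, for example a dictionary of learned probabilities or a classifier instance.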


Predictive Analysis


Source: https://shift.newco.co/what-i-discovered-about-trump-and-clinton-from-analyzing-4-million-facebook-posts-922a4381fd2f#.44nju339g



System Evaluation
• The application is unable to reliably detect the true sentiment of sarcastic tweets.
• When both candidates are named in a single tweet, the application cannot determine a separate polarity for each candidate, e.g. "Trump is a racist but Hillary is a humanitarian."
• In real-time sentiment analysis, only 100 tweets can be classified at a time.
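One simple way to work around the two-candidate limitation is to flag tweets that name both candidates so they can be set aside or handled separately. The sketch below does only that detection and does not resolve per-candidate polarity. The keyword sets and function names are assumptions for illustration.

```python
# Hypothetical keyword sets; a real system would likely need a longer list.
CANDIDATE_KEYWORDS = {
    "Clinton": {"hillary", "clinton"},
    "Trump": {"donald", "trump"},
}


def mentioned_candidates(tokens):
    """Return the sorted names of candidates whose keywords appear in the tokens."""
    words = {token.lower() for token in tokens}
    return sorted(name for name, keys in CANDIDATE_KEYWORDS.items() if words & keys)


def is_ambiguous(tokens):
    """True when a tweet names more than one candidate."""
    return len(mentioned_candidates(tokens)) > 1
```

Applied to the tokens of "Trump is a racist but Hillary is a humanitarian", `is_ambiguous` returns `True`, so the tweet would be excluded from a single-label sentiment count.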


Further Improvements
• The accuracy can be further improved by using a larger training data set.
• The classification engine can be hosted as a web service in order to obtain real-time classification without delay.
• By gathering a larger number of test tweets, a more accurate classification can be obtained.
• More sophisticated natural language processing techniques should be implemented to handle sarcasm and slang in tweets.


Q&A