Sentiment Analysis
Steps
• Acquire
• Pre-Process
• Explore
• Model
• Test
Acquire
Sources: database, API, scraping...
Storage: memory, database, flat file...
Format: language-specific classes that help with processing (corpus)
Pre-Process
Letter Case: "Hello World" -> "hello world"
Stop Words: "The iPhone is fantastic" -> "iPhone fantastic"
Stemming: "run, runs, running, ran" -> ("run", 4)
Explore
Model
• Pointwise Mutual Information (PMI)
• Discrete PMI
• Naive Bayes
• Maximum Entropy
Unsupervised
• Pointwise Mutual Information (PMI)
• Discrete PMI
Pointwise Mutual Information
Original: I am loving my new iPhone!!
Pre-Processed: love new iphone
TERM      VALUE
love      +1
new       +1
iphone     0
Sentiment Score: +2
Discrete PMI
Original: I am loving my new iPhone!!
Pre-Processed: love new iphone
TERM      VALUE
love      +4
new       +1
iphone     0
Sentiment Score: +5
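Both unsupervised examples score a document the same way: look up a value for each pre-processed term and add the values up. The sketch below reproduces only that final summing step. The two dictionaries simply copy the term values shown on the slides; how those values are derived from a corpus is not covered here, and the names `PMI_VALUES`, `DISCRETE_PMI_VALUES`, and `sentiment_score` are illustrative.

```python
# Term values copied from the two slide examples (not computed here).
PMI_VALUES = {"love": 1, "new": 1, "iphone": 0}
DISCRETE_PMI_VALUES = {"love": 4, "new": 1, "iphone": 0}


def sentiment_score(terms, term_values):
    """Sum the per-term values; terms missing from the table count as 0."""
    return sum(term_values.get(term, 0) for term in terms)


terms = "love new iphone".split()
print(sentiment_score(terms, PMI_VALUES))           # 2
print(sentiment_score(terms, DISCRETE_PMI_VALUES))  # 5
```

Treating unknown terms as neutral (0) is an assumption of this sketch; a real system might skip them or flag them instead.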
Supervised
• Naive Bayes
• Maximum Entropy
Naive Bayes
Pre-Processed: love new iphone
POSITIVE
Who doesn't love the new iPhone?
I love tacos, especially from Taco Mayo.
New shoes are the best!!!!
NEGATIVE
My friend Chuck is an idiot.
The new iPhone sucks, DO NOT BUY!
I got dumped today, feeling super sad.
TERM      POSITIVE  NEGATIVE
love      1.0       0.0
new       0.67      0.33
iphone    0.5       0.5
Overall   0.72      0.28
Evidence points to a 72% likelihood that this document is positive.
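The numbers in this example can be reproduced from the six training sentences. The sketch below counts, for each term, how many positive and negative training documents contain it, then averages the per-term splits; that average comes out to the 0.72 / 0.28 shown on the slide. Note the assumption: this is a simplified reading of the slide's arithmetic. A textbook Naive Bayes classifier would instead multiply class-conditional term likelihoods by a class prior, usually with smoothing. The function names are illustrative.

```python
import re

POSITIVE = [
    "Who doesn't love the new iPhone?",
    "I love tacos, especially from Taco Mayo.",
    "New shoes are the best!!!!",
]
NEGATIVE = [
    "My friend Chuck is an idiot.",
    "The new iPhone sucks, DO NOT BUY!",
    "I got dumped today, feeling super sad.",
]


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_split(term):
    """Share of training documents containing `term` that are positive/negative."""
    pos = sum(term in tokens(doc) for doc in POSITIVE)
    neg = sum(term in tokens(doc) for doc in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 0.5, 0.5  # assumption: an unseen term carries no evidence
    return pos / total, neg / total


def document_split(terms):
    """Average the per-term splits, as the slide's overall figure does."""
    positive = sum(term_split(t)[0] for t in terms) / len(terms)
    return positive, 1 - positive


for term in ["love", "new", "iphone"]:
    print(term, [round(x, 2) for x in term_split(term)])
print([round(x, 2) for x in document_split(["love", "new", "iphone"])])
```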
Maximum Entropy
Really good with large document sets vs. black box of magic.
Test
Human Validation: Randomly select n documents to be manually tagged, then compare the model's labels against the manual tags to measure accuracy.
Cross Validation: Hold out n% of the already classified samples to see how well the model generalizes to new data.
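Both checks come down to comparing predicted labels with trusted labels on documents the model did not train on. A minimal sketch, assuming labeled samples are held in a plain Python sequence; `holdout_split` and `accuracy` are illustrative helper names, and a single hold-out split is shown rather than full k-fold cross validation.

```python
import random


def holdout_split(samples, holdout_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predicted, actual):
    """Fraction of predictions that match the trusted labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train, test = holdout_split(range(10), holdout_fraction=0.2)
print(len(train), len(test))
print(accuracy(["pos", "neg", "pos"], ["pos", "neg", "neg"]))
```

For human validation, `actual` would be the manual tags for the n randomly selected documents; for the hold-out check, it would be the existing labels of the held-out samples.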
Challenges
Spelling: the new iPohne is grate!
Ambiguous: The iPhone is fantastic at sucking
Figurative: this new iPhone is sick.
Irony: screen broke, thanks Apple
Context: Angry Birds looks awesome on my iPhone