Intro to Sentiment Analysis

Sentiment Analysis

Steps: Acquire • Pre-Process • Explore • Model • Test

Acquire

Sources: database, API, scraping...

Storage: memory, database, flat file...

Format: language-specific classes that help with processing (corpus)

Pre-Process

Letter Case

"Hello World" -> "hello world"

Stop Words

"The iPhone is fantastic" -> "iPhone fantastic"

Stemming

"run, runs, running, ran" -> ("run", 4)
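The three pre-processing steps above can be sketched in Python. This is a minimal illustration, not the presenter's implementation: the stop word list, the `IRREGULAR` table, and the `toy_stem` helper are assumptions invented for this example. A purely suffix-stripping stemmer would not map "ran" to "run", which is why the sketch uses a small lookup for irregular forms.

```python
from collections import Counter

# Assumption: a tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "an", "i", "am", "my"}

# Assumption: hypothetical lookup for irregular forms that suffix stripping misses.
IRREGULAR = {"ran": "run"}


def lower_case(text):
    """Letter case: fold everything to lower case."""
    return text.lower()


def remove_stop_words(words):
    """Stop words: drop common words that carry little sentiment."""
    return [w for w in words if w.lower() not in STOP_WORDS]


def toy_stem(word):
    """Stemming: a toy suffix stripper, far simpler than a real stemmer."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo a doubled final consonant, e.g. "runn" -> "run".
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
        return word
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


print(lower_case("Hello World"))
print(" ".join(remove_stop_words("The iPhone is fantastic".split())))
print(Counter(toy_stem(w) for w in ["run", "runs", "running", "ran"]))
```

In practice a library stemmer or lemmatizer would replace `toy_stem`; the sketch only shows where each step sits in the pipeline.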

Explore

Model

Pointwise Mutual Information (PMI) • Discrete PMI • Naive Bayes • Maximum Entropy

Unsupervised

Pointwise Mutual Information (PMI) • Discrete PMI

Pointwise Mutual Information

Original: I am loving my new iPhone!!

Pre-Processed: love new iphone

TERM      VALUE
love      +1
new       +1
iphone     0

Sentiment Score: +2

Discrete PMI

Original: I am loving my new iPhone!!

Pre-Processed: love new iphone

TERM      VALUE
love      +4
new       +1
iphone     0

Sentiment Score: +5
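Both scoring slides above come down to looking up a value for each pre-processed term and summing the results. A minimal sketch of that arithmetic follows. The term values are copied from the slides; how they were derived is not shown in the deck, and treating unknown terms as 0 is an assumption of this example.

```python
# Term values copied from the two slides; their derivation is not shown in the deck.
PMI_VALUES = {"love": 1, "new": 1, "iphone": 0}
DISCRETE_PMI_VALUES = {"love": 4, "new": 1, "iphone": 0}


def sentiment_score(terms, values):
    """Sum the per-term values; unknown terms count as neutral (assumption)."""
    return sum(values.get(term, 0) for term in terms)


terms = "love new iphone".split()
print(sentiment_score(terms, PMI_VALUES))           # 2
print(sentiment_score(terms, DISCRETE_PMI_VALUES))  # 5
```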

Supervised

Naive Bayes • Maximum Entropy

Naive Bayes

Pre-Processed: love new iphone

POSITIVE
Who doesn't love the new iPhone?
I love tacos, especially from Taco Mayo.
New shoes are the best!!!!

NEGATIVE
My friend Chuck is an idiot.
The new iPhone sucks, DO NOT BUY!
I got dumped today, feeling super sad.

TERM      POSITIVE  NEGATIVE
love      1.0       0.0
new       0.67      0.33
iphone    0.5       0.5
Overall   0.72      0.28

Evidence points to a 72% likelihood that this document is positive.
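The numbers on this slide can be reproduced with a short sketch. The per-term values are consistent with the share of training documents containing each term that are positive versus negative, and the overall 0.72 / 0.28 is consistent with a plain average of those shares. That reading is an inference from the figures, not something the deck states, and it is a simplification: a textbook Naive Bayes classifier multiplies class-conditional likelihoods with a class prior, usually with smoothing, and would produce different numbers. The function names and the 0.5 / 0.5 fallback for unseen terms are assumptions of this example.

```python
import re

POSITIVE = [
    "Who doesn't love the new iPhone?",
    "I love tacos, especially from Taco Mayo.",
    "New shoes are the best!!!!",
]
NEGATIVE = [
    "My friend Chuck is an idiot.",
    "The new iPhone sucks, DO NOT BUY!",
    "I got dumped today, feeling super sad.",
]


def tokens(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_share(term):
    """Share of documents containing `term` that are positive vs. negative."""
    pos = sum(term in tokens(doc) for doc in POSITIVE)
    neg = sum(term in tokens(doc) for doc in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 0.5, 0.5  # assumption: unseen terms are treated as neutral
    return pos / total, neg / total


def document_score(terms):
    """Average the per-term shares, mirroring the slide's arithmetic."""
    shares = [term_share(term) for term in terms]
    pos = sum(p for p, _ in shares) / len(shares)
    neg = sum(n for _, n in shares) / len(shares)
    return pos, neg


for term in ["love", "new", "iphone"]:
    print(term, [round(x, 2) for x in term_share(term)])
print([round(x, 2) for x in document_score(["love", "new", "iphone"])])  # [0.72, 0.28]
```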

Maximum Entropy

Really good at large document sets.

VS.

Black box of magic.

Test

Human Validation: Randomly select n documents to be manually tagged, then compare the model's labels against the manual tags to measure accuracy.

Cross Validation: Hold out n% of the already classified samples to see how well our model generalizes to new data.
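Both testing approaches above compare predicted labels against trusted labels. The sketch below shows a simple hold-out split and an accuracy measure. The function names, the fixed random seed, and the example labels are assumptions of this example, and the classifier itself is left out.

```python
import random


def holdout_split(samples, holdout_fraction, seed=0):
    """Shuffle the labeled samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predicted, actual):
    """Fraction of predictions that match the trusted labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Stand-in sample IDs; in practice these would be (document, label) pairs.
train, test = holdout_split(range(10), 0.2)
print(len(train), len(test))  # 8 2

# Hypothetical predictions compared against manual tags.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))  # 0.75
```

The same `accuracy` function serves human validation: the manually tagged documents play the role of `actual`.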

Challenges

Spelling: the new iPohne is grate!

Ambiguous: The iPhone is fantastic at sucking

Figurative: this new iPhone is sick.

Irony: screen broke, thanks Apple

Context: Angry Birds looks awesome on my iPhone
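Of the challenges above, spelling is the one most open to a mechanical fix. One possible approach, sketched here under stated assumptions, is fuzzy matching of unknown tokens against a known vocabulary using Python's standard `difflib` module. The vocabulary and the `correct` helper are hypothetical. This does not solve the "grate" case in general, since "grate" is itself a valid word that a full dictionary would accept.

```python
import difflib

# Assumption: a tiny illustrative vocabulary; a real one would be far larger.
VOCABULARY = ["the", "new", "iphone", "is", "great", "love"]


def correct(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary word, or the lower-cased word if none is close."""
    word = word.lower()
    if word in vocabulary:
        return word
    matches = difflib.get_close_matches(word, vocabulary, n=1)
    return matches[0] if matches else word


print(correct("iPohne"))  # iphone
```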
