Sentiment Analysis

Intro to Sentiment Analysis

Page 1: Intro to Sentiment Analysis

Sentiment Analysis

Page 2: Intro to Sentiment Analysis

Steps

• Acquire
• Pre-Process
• Explore
• Model
• Test

Page 3: Intro to Sentiment Analysis

Acquire

Sources: database, API, scraping...

Storage: memory, database, flat file...

Format: language-specific classes that help with processing (for example, a corpus object)

Page 7: Intro to Sentiment Analysis

Pre-Process

Letter Case

"Hello World" -> "hello world"

Stop Words

"The iPhone is fantastic" -> "iPhone fantastic"

Stemming

"run, runs, running, ran" -> ("run", 4)
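The three pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `STOP_WORDS`, `SUFFIXES`, and `IRREGULAR` are tiny hypothetical stand-ins for what a real library would provide. An irregular form such as "ran" cannot be reached by suffix stripping alone, so the sketch handles it with a lookup.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "is"}
IRREGULAR = {"ran": "run"}        # irregular forms need a lookup table
SUFFIXES = ("ning", "ing", "s")   # checked in this order

def stem(word):
    """Very naive stemmer: lookup first, then strip one known suffix."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Hello World"))                       # ['hello', 'world']
print(preprocess("The iPhone is fantastic"))           # ['iphone', 'fantastic']
print(Counter(preprocess("run, runs, running, ran")))  # Counter({'run': 4})
```

Because the steps are chained here, the stop-word example comes out lowercased as well.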

Page 8: Intro to Sentiment Analysis

Explore

Page 9: Intro to Sentiment Analysis

Model

• Pointwise Mutual Information (PMI)
• Discrete PMI
• Naive Bayes
• Maximum Entropy

Page 10: Intro to Sentiment Analysis

Unsupervised

• Pointwise Mutual Information (PMI)
• Discrete PMI

Page 11: Intro to Sentiment Analysis

Pointwise Mutual Information

Original: I am loving my new iPhone!!
Pre-Processed: love new iphone

TERM      VALUE
love      +1
new       +1
iphone     0

Sentiment Score: +2
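The slide shows only per-term values, not how they are obtained. As background, pointwise mutual information compares how often two events co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). A minimal sketch, assuming hypothetical document counts (the function name `pmi` and every count below are illustrative and do not come from the deck):

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI in bits, computed from raw document counts.

    count_xy: documents containing both x and y
    count_x:  documents containing x
    count_y:  documents containing y
    total:    total number of documents
    """
    if min(count_xy, count_x, count_y) <= 0:
        raise ValueError("PMI is undefined for zero counts")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical: a term appears in 4 of 8 documents, all 4 of them positive,
# and 4 of the 8 documents are positive overall.
print(pmi(count_xy=4, count_x=4, count_y=4, total=8))  # 1.0

# Hypothetical: the term and the class co-occur exactly as often as chance predicts.
print(pmi(count_xy=2, count_x=4, count_y=4, total=8))  # 0.0
```

A positive value means the term co-occurs with the class more often than chance would predict. One common way to get a signed term value is to subtract the term's PMI with the negative class from its PMI with the positive class.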

Page 12: Intro to Sentiment Analysis

Discrete PMI

Original: I am loving my new iPhone!!
Pre-Processed: love new iphone

TERM      VALUE
love      +4
new       +1
iphone     0

Sentiment Score: +5
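Both scoring slides come down to looking up each pre-processed term in a table of values and summing the results. A minimal sketch, assuming the two tables from the slides are stored as dictionaries and that a term missing from a table counts as 0 (the names `PMI_VALUES`, `DISCRETE_PMI_VALUES`, and `sentiment_score` are illustrative):

```python
# Term values copied from the two slides.
PMI_VALUES = {"love": 1, "new": 1, "iphone": 0}
DISCRETE_PMI_VALUES = {"love": 4, "new": 1, "iphone": 0}

def sentiment_score(terms, values):
    """Sum the value of every term; unknown terms contribute 0 (an assumption)."""
    return sum(values.get(term, 0) for term in terms)

terms = "love new iphone".split()
print(sentiment_score(terms, PMI_VALUES))           # 2
print(sentiment_score(terms, DISCRETE_PMI_VALUES))  # 5
```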

Page 13: Intro to Sentiment Analysis

Supervised

• Naive Bayes
• Maximum Entropy

Page 18: Intro to Sentiment Analysis

Naive Bayes

Pre-Processed: love new iphone

POSITIVE
Who doesn't love the new iPhone?
I love tacos, especially from Taco Mayo.
New shoes are the best!!!!

NEGATIVE
My friend Chuck is an idiot.
The new iPhone sucks, DO NOT BUY!
I got dumped today, feeling super sad.

TERM      POSITIVE   NEGATIVE
love      1.0        0.0
new       0.67       0.33
iphone    0.5        0.5
Overall   0.72       0.28

Evidence points to a 72% likelihood that this document is positive.
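The figures on this slide can be reproduced with a short script. For each term it computes the share of training documents containing that term that are positive versus negative, then averages those shares across the three terms, which gives 0.72 and 0.28. This is a sketch of the slide's arithmetic only: a textbook Naive Bayes classifier would instead multiply per-class term likelihoods by a class prior, usually with smoothing. The function names and the tokenizer are assumptions.

```python
import re

POSITIVE = [
    "Who doesn't love the new iPhone?",
    "I love tacos, especially from Taco Mayo.",
    "New shoes are the best!!!!",
]
NEGATIVE = [
    "My friend Chuck is an idiot.",
    "The new iPhone sucks, DO NOT BUY!",
    "I got dumped today, feeling super sad.",
]

def tokens(text):
    """Lowercased set of words in a document (simple assumed tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def term_shares(term):
    """Share of documents containing the term that are positive / negative."""
    pos = sum(term in tokens(doc) for doc in POSITIVE)
    neg = sum(term in tokens(doc) for doc in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 0.5, 0.5  # unseen term: treated as uninformative (an assumption)
    return pos / total, neg / total

def score(terms):
    """Average the per-term shares, as the slide's totals suggest."""
    shares = [term_shares(t) for t in terms]
    pos = sum(p for p, _ in shares) / len(shares)
    neg = sum(n for _, n in shares) / len(shares)
    return pos, neg

for term in ["love", "new", "iphone"]:
    print(term, [round(x, 2) for x in term_shares(term)])
pos, neg = score(["love", "new", "iphone"])
print(round(pos, 2), round(neg, 2))  # 0.72 0.28
```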

Page 21: Intro to Sentiment Analysis

Maximum Entropy

Really good with large document sets.

VS.

Black box of magic.

Page 22: Intro to Sentiment Analysis

Test

Human Validation: Randomly select n documents, tag them manually, and compare the model's output against the human tags to measure accuracy.
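Human validation can be sketched as two small helpers: one draws the random sample to hand to a reviewer, the other measures agreement between model and reviewer. The names `sample_for_review` and `accuracy`, and the example labels, are hypothetical.

```python
import random

def sample_for_review(documents, n, seed=0):
    """Pick n documents at random (without replacement) for manual tagging."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(documents, n)

def accuracy(predicted, human):
    """Fraction of documents where the model label matches the human label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)

# Hypothetical labels for four sampled documents.
model_labels = ["pos", "neg", "pos", "pos"]
human_labels = ["pos", "neg", "neg", "pos"]
print(accuracy(model_labels, human_labels))  # 0.75
```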

Page 23: Intro to Sentiment Analysis

Test

Cross Validation: Hold out n% of the already-classified samples to see how well our model generalizes to new data.
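Holding out a percentage of labeled samples can be sketched as a shuffle followed by a slice. This is a minimal illustration under assumed names (`holdout_split`, `holdout_percent`) with made-up sample data; it performs a single split.

```python
import random

def holdout_split(samples, holdout_percent, seed=0):
    """Shuffle labeled samples and hold out a percentage for testing.

    Returns (train, test). At least one sample is always held out.
    """
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 0 and 100")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * holdout_percent / 100))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled samples: (text, label) pairs.
labeled = [("text %d" % i, "pos" if i % 2 else "neg") for i in range(10)]
train, test = holdout_split(labeled, holdout_percent=20)
print(len(train), len(test))  # 8 2
```

Repeating the split so that each portion of the data is held out once, then averaging the scores, turns this into k-fold cross validation.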

Page 24: Intro to Sentiment Analysis

Challenges

Spelling: the new iPohne is grate!

Ambiguous: The iPhone is fantastic at sucking

Figurative: this new iPhone is sick.

Irony: screen broke, thanks Apple

Context: Angry Birds looks awesome on my iPhone

Page 25: Intro to Sentiment Analysis

Sentiment Analysis