Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie

EACL2012: In Search of a Gold Standard in Studies of Deception

DESCRIPTION

Presentation by Jeff Hancock and me on April 23, 2012, in Avignon, France, at the Deception Detection Workshop of the 2012 Conference of the European Chapter of the Association for Computational Linguistics (EACL).

Page 1: EACL2012: In Search of a Gold Standard in Studies of Deception

Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie

Page 3: EACL2012: In Search of a Gold Standard in Studies of Deception

Newman-Pennebaker Model (2003)

Page 7: EACL2012: In Search of a Gold Standard in Studies of Deception

The NP model is not consistent across contexts

On reflection, why would we expect it to be?

Psychological and persuasion dynamics of deception are highly constrained by context

Page 8: EACL2012: In Search of a Gold Standard in Studies of Deception

Context: Deception in Online Reviews

Page 9: EACL2012: In Search of a Gold Standard in Studies of Deception

1. Sanctioned Lies

Creating Deception for Research

• Researcher asks participant to lie

• Topics include beliefs, attitudes, feelings, actions

Ex: mock crime

Page 10: EACL2012: In Search of a Gold Standard in Studies of Deception

1. Sanctioned Lies

Creating Deception for Research

• Researcher asks participant to lie

• Topics include beliefs, attitudes, feelings, actions

Ex: mock crime

Adv: researcher can control when and where lie occurs

Limitations: permission to lie, requires high stakes

Page 11: EACL2012: In Search of a Gold Standard in Studies of Deception

1. Sanctioned Lies

2. Unsanctioned Lies

Creating Deception for Research

i. Diary Studies

ii. Retrospective Identification

iii. Cheating paradigms

Page 12: EACL2012: In Search of a Gold Standard in Studies of Deception

1. Sanctioned Lies

2. Unsanctioned Lies

Creating Deception for Research

Psychology & Communication

Page 13: EACL2012: In Search of a Gold Standard in Studies of Deception

1. Sanctioned Lies

2. Unsanctioned Lies

3. Non-gold Standard Approaches

Creating Deception for Research

i. Manual Annotation

ii. Heuristically labeled

iii. Unlabeled (distributional analysis)

Psychology & Communication

Computer Science

Page 14: EACL2012: In Search of a Gold Standard in Studies of Deception

1. Sanctioned Lies

2. Unsanctioned Lies

3. Non-gold Standard Approaches

A Novel Method: The Crowdsourcing Approach…

Creating Deception for Research

Page 15: EACL2012: In Search of a Gold Standard in Studies of Deception

The Crowdsourcing Approach

Crowdsourcing divides large projects into small manageable tasks and matches these tasks with humans that will perform them

- harness distributed resources

- maximize speed

- minimize cost

- more powerful than local tech & small research groups

- data collection, access, annotation, and analysis

Page 16: EACL2012: In Search of a Gold Standard in Studies of Deception

Amazon's Mechanical Turk

Requesters create a Human Intelligence Task (HIT) to be completed by Workers

HITs are similar to HTML forms and may include:

- the solicitation

- information needed for the Workers to complete the task

- collection of survey information

Page 17: EACL2012: In Search of a Gold Standard in Studies of Deception

4 Assumptions of our Crowdsourcing Approach

1. Balanced data set

- Equal # of truthful and deceptive reviews

- Uniform valence: wholly positive or wholly negative data set

2. Both truthful and deceptive reviews cover the same set of entities

- Minimize distinguishing features that may be context-based rather than language of deception

3. Data set of reasonable size

- 800 total reviews (400 crowdsourced)

Page 18: EACL2012: In Search of a Gold Standard in Studies of Deception

4 Assumptions of our Crowdsourcing Approach

4. Deceptive reviews should be generated under the same basic guidelines that govern the generation of truthful reviews

- Length

- Quality

- Time
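The first two assumptions above (balanced classes, uniform valence, shared entities) lend themselves to an automated check before any modeling. The following is a minimal sketch, not the authors' tooling. It assumes each review is stored as a dict with the hypothetical keys `label` ("truthful" or "deceptive"), `entity`, and `valence`; none of these names come from the talk.

```python
from collections import Counter


def check_assumptions(reviews):
    """Return a list of human-readable problems; an empty list means the checks pass.

    Each review is assumed to be a dict with hypothetical keys
    'label', 'entity', and 'valence' (illustrative names only).
    """
    problems = []

    # Assumption 1a: equal number of truthful and deceptive reviews.
    labels = Counter(r["label"] for r in reviews)
    if labels["truthful"] != labels["deceptive"]:
        problems.append(f"unbalanced labels: {dict(labels)}")

    # Assumption 1b: the whole data set has a single valence.
    valences = {r["valence"] for r in reviews}
    if len(valences) > 1:
        problems.append(f"mixed valence: {sorted(valences)}")

    # Assumption 2: both classes cover the same set of entities.
    truthful = {r["entity"] for r in reviews if r["label"] == "truthful"}
    deceptive = {r["entity"] for r in reviews if r["label"] == "deceptive"}
    if truthful != deceptive:
        problems.append(
            f"entities not shared by both classes: {sorted(truthful ^ deceptive)}"
        )

    return problems
```

Assumptions 3 and 4 (data set size, and comparable length, quality, and time) are left out here because the thresholds they would need are not stated on the slides.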

Page 19: EACL2012: In Search of a Gold Standard in Studies of Deception

STEP 1: Identify entities to be covered in the reviews

Truthful corpus

– Find all entities (specific hotels) from the real-world database (TripAdvisor)

– Extract all statements (reviews) from those entities

– Identify the subcategories to which these entities belong (Chicago hotels)

Page 20: EACL2012: In Search of a Gold Standard in Studies of Deception

STEP 1: Identify entities to be covered in the reviews

Page 21: EACL2012: In Search of a Gold Standard in Studies of Deception

STEP 1: Identify entities to be covered in the reviews

Truthful corpus

– Find all entities (specific hotels) from the real-world database (TripAdvisor)

– Extract all statements (reviews) from those entities

– Identify the subcategories to which these entities belong (Chicago hotels)

Deceptive corpus

– Use entities from truthful corpus to create the prompt for the Turkers
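One way to read this step is that the deceptive corpus mirrors the truthful one entity by entity. The sketch below illustrates that reading under stated assumptions: truthful reviews are available as (entity, text) pairs, and the function name and template wording are hypothetical placeholders, not the solicitation used in the study.

```python
from collections import Counter


def plan_deceptive_prompts(truthful_reviews, template):
    """Build one prompt per deceptive review needed.

    The number of prompts per entity mirrors the number of truthful
    reviews for that entity, so both corpora cover the same entities
    in the same proportions.
    """
    per_entity = Counter(entity for entity, _text in truthful_reviews)
    prompts = []
    for entity, count in sorted(per_entity.items()):
        prompts.extend(template.format(entity=entity) for _ in range(count))
    return prompts


# Hypothetical placeholder wording, for illustration only.
TEMPLATE = "Write a review of {entity}."
```

Calling `plan_deceptive_prompts(truthful_reviews, TEMPLATE)` returns a flat list of prompt strings, one per HIT assignment to be posted.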

Page 22: EACL2012: In Search of a Gold Standard in Studies of Deception

STEP 2: Develop the Mechanical Turk prompt

Survey real solicitations for deception (hotel reviews, doctor reviews, etc.)

Page 23: EACL2012: In Search of a Gold Standard in Studies of Deception

A Real Solicitation

Page 24: EACL2012: In Search of a Gold Standard in Studies of Deception

STEP 2: Develop the Mechanical Turk prompt

Survey real solicitations for deception (hotel reviews, doctor reviews, etc.)

Mimic the workflow, vocabulary and tone of the Turkers

Page 25: EACL2012: In Search of a Gold Standard in Studies of Deception

Step 3: Attach appropriate warnings to the solicitation

- Workers may not complete this task more than once

- Their work will not be rewarded if it is incoherent or off topic

- This review is for academic purposes

Be aware of priming effects and placement of this warning

Page 26: EACL2012: In Search of a Gold Standard in Studies of Deception

Step 4: Gather demographic data and comments

Survey mechanism for demographics

– Age, Education, etc.

Qualitative, open-ended comment

– Provides technical information

Incentivize comments

Page 27: EACL2012: In Search of a Gold Standard in Studies of Deception

Step 5: Pilot

Pilot the resulting HIT in small batches (10)

Remove all plagiarized results through automated processes (Yahoo! BOSS API)

– Workers do not receive payment for any plagiarized material

Manually evaluate the remaining set

– Coherence, topicality, length of review

Iterate until:

– No technical complaints

– Experiment quality

Full run of solicitation (400 reviews) by unique workers
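The automated part of this screening can be sketched locally. The example below is an assumption-laden stand-in, not the pipeline from the talk: instead of querying a web search API, it compares each submission against a list of known texts using word 5-gram overlap, and it also enforces the one-submission-per-worker rule and a minimum length. The function names and all thresholds are hypothetical; coherence and topicality still need manual review.

```python
def word_ngrams(text, n=5):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def screen_submissions(submissions, known_texts,
                       min_words=20, max_overlap=0.5, n=5):
    """Split (worker_id, text) submissions into kept and rejected lists.

    Rejected entries are (worker_id, reason) pairs. The thresholds are
    illustrative defaults, not values reported by the authors.
    """
    known = set()
    for text in known_texts:
        known |= word_ngrams(text, n)

    seen_workers = set()
    kept, rejected = [], []
    for worker_id, text in submissions:
        grams = word_ngrams(text, n)
        if worker_id in seen_workers:
            rejected.append((worker_id, "repeat worker"))
        elif len(text.split()) < min_words:
            rejected.append((worker_id, "too short"))
        elif grams and len(grams & known) / len(grams) > max_overlap:
            rejected.append((worker_id, "possible plagiarism"))
        else:
            kept.append((worker_id, text))
        seen_workers.add(worker_id)
    return kept, rejected
```

Keeping a reason with each rejection makes it easier to see, between pilot batches, whether problems come from the prompt wording or from individual workers.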

Page 28: EACL2012: In Search of a Gold Standard in Studies of Deception

Let's see it!

Page 29: EACL2012: In Search of a Gold Standard in Studies of Deception

Finding the Gold Standard

The resulting set of 400 reviews is then used to train the algorithm on deceptive positive reviews

The algorithm is trained separately on the set of 400 truthful* reviews for comparison
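The slides do not name the algorithm. As an illustration only, the sketch below swaps in a unigram multinomial Naive Bayes classifier with add-one smoothing, which fits the description in that it builds a separate word model for each class and compares them. The class name and the whitespace tokenization are assumptions made for this example.

```python
import math
from collections import Counter


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over lowercase whitespace tokens.

    A swapped-in illustration; the talk does not specify the classifier.
    """

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per class
        self.word_counts = {}               # class -> word frequencies
        for text, label in zip(texts, labels):
            counts = self.word_counts.setdefault(label, Counter())
            counts.update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Add-one smoothing: every vocabulary word gets one extra count.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)
```

With the two review sets loaded as lists of strings, `UnigramNaiveBayes().fit(texts, labels).predict(new_review)` returns whichever label scores higher. Any accuracy estimate would additionally require held-out data or cross-validation, which this sketch omits.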

Page 33: EACL2012: In Search of a Gold Standard in Studies of Deception

Discussion & Conclusion

Advantages

• model the deception as closely to real-world as possible

• known deceptive

Limitations

• sanctioned?

• limited knowledge of Turkers

• constrained to certain contexts

• construction of the ‘truthful’ set non-trivial

Page 34: EACL2012: In Search of a Gold Standard in Studies of Deception

Discussion & Conclusion

Key Potential:

to create datasets more easily and efficiently in an effort to model deception customized to specific contexts for a Context Constrained Approach to Deception

Page 35: EACL2012: In Search of a Gold Standard in Studies of Deception

In Search of a Gold Standard in Studies of Deception

Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie