Presentation by Jeff Hancock and me on April 23, 2012, in Avignon, France, at the Deception Detection Workshop of the 2012 conference of the European Chapter of the Association for Computational Linguistics (EACL).
In Search of a Gold Standard in Studies of Deception
Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
Newman-Pennebaker Model (2003)
The NP model is not consistent across contexts
On reflection, why would we expect it to be?
Psychological and persuasion dynamics of deception are highly constrained by context
Context: Deception in Online Reviews
1. Sanctioned Lies
Creating Deception for Research
• Researcher asks participant to lie
• Topics include beliefs, attitudes, feelings, actions
Ex: mock crime
Adv: researcher can control when and where the lie occurs
Limitations: permission to lie; requires high stakes
1. Sanctioned Lies
2. Unsanctioned Lies
Creating Deception for Research
i. Diary Studies
ii. Retrospective Identification
iii. Cheating paradigms
Psychology & Communication
1. Sanctioned Lies
2. Unsanctioned Lies
3. Non-gold Standard Approaches
Creating Deception for Research
i. Manual Annotation
ii. Heuristically labeled
iii. Unlabeled (distributional analysis)
Psychology & Communication
Computer Science
1. Sanctioned Lies
2. Unsanctioned Lies
3. Non-gold Standard Approaches
A Novel Method: The Crowdsourcing Approach…
Creating Deception for Research
The Crowdsourcing Approach
Crowdsourcing divides large projects into small, manageable tasks and matches these tasks with the people who will perform them
- harness distributed resources
- maximize speed
- minimize cost
- more powerful than local tech & small research groups
- data collection, access, annotation, and analysis
Amazon's Mechanical Turk
Requesters create a Human Intelligence Task (HIT) to be completed by Workers
HITs are similar to HTML forms and may include:
- the solicitation
- information needed for the Workers to complete the task
- collection of survey information
4 Assumptions of our Crowdsourcing Approach
1. Balanced data set
- Equal # of truthful and deceptive reviews
- Uniform valence: a wholly positive or wholly negative data set
2. Both truthful and deceptive reviews cover same set of entities
Minimize distinguishing features that may be context-based rather than language of deception
3. Data set of reasonable size
- 800 total reviews (400 crowdsourced)
4 Assumptions of our Crowdsourcing Approach
4. Deceptive reviews should be generated under the same basic guidelines as govern the generation of truthful reviews
- Length
- Quality
- Time
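Taken together, these assumptions amount to a corpus-construction step: draw equal numbers of truthful and deceptive reviews over the same set of entities. A minimal Python sketch — the function name and the dict-based data layout are illustrative, not from the talk:

```python
import random

def build_balanced_corpus(truthful, deceptive, per_class=400, seed=0):
    """Pair truthful and deceptive reviews over the same entities.

    `truthful` and `deceptive` map entity name -> list of review texts.
    Only entities present in both pools are kept, so every entity is
    represented on both sides of the data set (assumption 2), and the
    per-entity counts are matched so the classes stay balanced
    (assumption 1).
    """
    rng = random.Random(seed)
    shared = sorted(set(truthful) & set(deceptive))
    corpus = []
    for entity in shared:
        n = min(len(truthful[entity]), len(deceptive[entity]))
        for text in rng.sample(truthful[entity], n):
            corpus.append((entity, text, "truthful"))
        for text in rng.sample(deceptive[entity], n):
            corpus.append((entity, text, "deceptive"))
    # Trim to a fixed number per class (assumption 3: reasonable size).
    t = [r for r in corpus if r[2] == "truthful"][:per_class]
    d = [r for r in corpus if r[2] == "deceptive"][:per_class]
    return t + d
```

With `per_class=400` this yields the 800-review scale described above.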
STEP 1: Identify entities to be covered in the reviews
Truthful corpus
- Find all entities (specific hotels) from the real-world database (TripAdvisor)
- Extract all statements (reviews) from those entities
- Identify the subcategories to which these entities belong (Chicago hotels)
Deceptive corpus
- Use entities from the truthful corpus to create the prompt for the Turkers
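Step 1 can be sketched as a small indexing pass over scraped review records, followed by prompt generation for the deceptive side. The record field names (`hotel`, `city`, `text`) and the prompt wording are placeholders, not the actual TripAdvisor schema or the prompt used in the study:

```python
from collections import defaultdict

def index_reviews(records):
    """Group raw review records by entity and by subcategory.

    Each record is a dict like {"hotel": ..., "city": ..., "text": ...}.
    Returns (entity -> review texts, subcategory -> set of entities).
    """
    by_entity = defaultdict(list)
    by_subcategory = defaultdict(set)
    for rec in records:
        by_entity[rec["hotel"]].append(rec["text"])
        by_subcategory[rec["city"] + " hotels"].add(rec["hotel"])
    return by_entity, by_subcategory

def make_prompts(by_entity):
    """Turn each truthful-corpus entity into a Turker prompt
    (illustrative wording only)."""
    return {hotel: f"Write a review of your recent stay at {hotel}."
            for hotel in by_entity}
```

Building deceptive prompts from the truthful corpus's own entity list is what keeps both corpora covering the same hotels.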
STEP 2: Develop the Mechanical Turk prompt
Survey real solicitations for deception (hotel reviews, doctor reviews, etc.)
A Real Solicitation
Mimic the workflow, vocabulary and tone of the Turkers
Step 3: Attach appropriate warnings to the solicitation
- Workers may not complete this task more than once
- Work will not be rewarded if it is incoherent or off topic
- This review is for academic purposes
Be aware of priming effects and placement of this warning
Step 4: Gather demographic data and comments
Survey mechanism for demographics
- Age, education, etc.
Qualitative, open-ended comment
- Provides technical information
Incentivize comments
Step 5: Pilot
Pilot the resulting HIT in small batches (10)
Remove all plagiarized results through automated processes (Yahoo! Boss API)
– Workers do not receive payment for any plagiarized material
Manually evaluate the remaining set
- Coherence, topicality, length of review
Iterate until:
- no technical complaints
- experiment quality is satisfactory
Full run of solicitation (400 reviews) by unique workers
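The pilot's plagiarism filter used the Yahoo! BOSS search API. As a rough local stand-in for the same idea, an n-gram overlap check flags submissions copied from known review text; the n-gram size and threshold below are illustrative choices, not values from the study:

```python
def ngrams(text, n=5):
    """Return the set of word n-grams in a text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def looks_plagiarized(candidate, known_texts, n=5, threshold=0.3):
    """Flag a submission whose 5-gram overlap with any known text
    exceeds `threshold`. A local stand-in for a web-scale search API,
    which would instead query each long phrase against the open web."""
    cand = ngrams(candidate, n)
    if not cand:
        return False
    for known in known_texts:
        overlap = len(cand & ngrams(known, n)) / len(cand)
        if overlap > threshold:
            return True
    return False
```

As in the pilot, flagged submissions would be rejected without payment before the manual coherence pass.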
Let's see it!
Finding the Gold Standard
The resulting set of 400 reviews is then used to train the algorithm for deceptive positive reviews
The algorithm trains separately on the set of 400 truthful* reviews for comparison
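The slides do not specify the classifier, so purely as a hedged illustration of the training step, here is a minimal multinomial Naive Bayes over unigrams — a common baseline for text classification, not necessarily the model the authors used:

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts: a minimal
    stand-in for the classifier trained on 400 deceptive and
    400 truthful reviews."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        best, best_lp = None, float("-inf")
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.priors[c])
            for tok in text.lower().split():
                # Laplace smoothing over the shared vocabulary.
                lp += math.log((self.counts[c][tok] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

Training the truthful and deceptive sides on the same entities (per the assumptions above) is what forces the model to pick up cues of deception rather than cues of topic.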
Discussion & Conclusion
Advantages
• models the deception as closely to the real world as possible
• reviews are known to be deceptive
Limitations
• sanctioned?
• limited knowledge of the Turkers
• constrained to certain contexts
• construction of the ‘truthful’ set is non-trivial
Key Potential:
to create datasets more easily and efficiently, in an effort to model deception customized to specific contexts, for a Context-Constrained Approach to Deception
In Search of a Gold Standard in Studies of Deception
Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie