Deception Detection in Transcribed Speech and Written Text

Rebecca Pottenger

Background

•  Detecting deception is a difficult problem

•  Human detection accuracy is low
   •  For text: on average, correctly classify 47% of lies and 61% of truths [1]
   •  For transcribed speech: on average, correctly classify 58.2% [2]

•  Not many automated methods; most existing automated methods have low accuracy
   •  Logistic regression; Linguistic Inquiry and Word Count (LIWC) features; 5-fold CV – 59% of lies and 62% of truths [3]
   •  Ripper rule induction; acoustic, LIWC and speaker-dependent features; 10-fold CV – 66.4% [4]
   •  Naïve Bayes and SVM; bag-of-words; 10-fold CV – 70% [5]

•  Best method in existing literature: SVM; LIWC and bigram features; 5-fold CV – 89.8% [6]

Questions To Explore

•  Is there an underlying distribution to deceptive language? What is it?

•  Is this distribution different depending on whether the person was speaking (i.e. transcribed text) or writing?

•  Can we improve the accuracy of automated deception detection with better features? What should those features be?

Dataset

•  400 truthful (TripAdvisor) and 400 gold-standard deceptive (Amazon Mechanical Turk) hotel reviews [6]

•  Michigan State University cheating game [7]
   •  7 lied about cheating, 9 confessed to cheating, 44 did not cheat

•  Possibly: Testimony from convicted perjurers and other cases

Experimental Methods

1) Re-create existing best method on dataset #2
   •  Use a variety of supervised algorithms in addition to SVM (Naïve Bayes, Artificial Neural Networks, etc.)

2) Distribution building
   •  Identify the space of possible features to work with (entire word set, bigram, LIWC, etc.)
   •  Build probability distributions from deceptive and truthful data for both datasets

3) Use new sets of features to learn the model on dataset #1 and #2
   •  Use a variety of features as well as a variety of supervised algorithms
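The distribution-building step can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the proposal's actual pipeline: it uses whitespace tokenization, bigram features, and add-alpha smoothing (plain relative frequencies would be the maximum likelihood estimate for a multinomial, but they assign zero probability to unseen bigrams). The function names `bigrams`, `fit_distribution`, and `log_odds` are hypothetical, and the four sentences are made-up toy data, not drawn from either dataset.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def fit_distribution(texts, vocab, alpha=1.0):
    """Estimate P(bigram | class) with add-alpha smoothing over a fixed vocabulary."""
    counts = Counter(bg for t in texts for bg in bigrams(t) if bg in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {bg: (counts[bg] + alpha) / total for bg in vocab}


def log_odds(text, p_deceptive, p_truthful):
    """Sum of log-probability differences; a positive score leans deceptive."""
    return sum(
        math.log(p_deceptive[bg]) - math.log(p_truthful[bg])
        for bg in bigrams(text)
        if bg in p_deceptive  # bigrams outside the vocabulary are ignored
    )


# Toy, made-up sentences purely for illustration (not from either dataset).
deceptive = ["my husband and i loved this hotel", "i loved this amazing hotel"]
truthful = ["the room was small but clean", "the room was near the elevator"]

vocab = {bg for t in deceptive + truthful for bg in bigrams(t)}
p_dec = fit_distribution(deceptive, vocab)
p_tru = fit_distribution(truthful, vocab)

score = log_odds("i loved this hotel", p_dec, p_tru)
print("leans deceptive" if score > 0 else "leans truthful")
```

Comparing the two smoothed distributions bigram by bigram is also one simple way to look for features that separate deceptive from truthful text before handing them to a supervised learner.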

Methods of Analysis

•  Maximum likelihood estimation to find the best-fitting distribution

•  10-fold cross validation

•  Accuracy, Precision, Recall, F-score

•  Feature weights
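As a rough illustration of the evaluation items above, the sketch below splits a dataset into 10 folds and computes accuracy, precision, recall, and F-score for a binary task. It assumes labels are encoded as 1 (deceptive) and 0 (truthful) and that the F-score is the balanced F1; both are assumptions of this example rather than choices stated in the slides. The helper names `k_fold_indices` and `binary_metrics` are hypothetical, and the label vectors are invented.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal slices
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Invented labels for illustration: 1 = deceptive, 0 = truthful.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# 800 items, matching the size of the hotel review set (400 + 400).
folds = list(k_fold_indices(n=800, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

On a skewed sample such as the cheating-game counts (7, 9, 44), accuracy alone can look high for a classifier that rarely predicts the minority class, which is one reason to report precision and recall alongside it.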

Sources

1)  C.F. Bond and B.M. DePaulo. 2006. Accuracy of deception judgments. Personality and Social Psychology Review, 10(3):214

2)  F. Enos, S. Benus, R.L. Cautin, M. Graciarena, J. Hirschberg, and E. Shriberg. 2006. Personality factors in human deception detection: Comparing human to machine performance. In Proceedings of INTERSPEECH-2006, Pittsburgh, Pennsylvania, USA.

3)  M.L. Newman, J.W. Pennebaker, D.S. Berry, and J.M. Richards. 2003. Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5):665.

4)  J. Hirschberg, S. Benus, J. Brenier, F. Enos, S. Friedman, S. Gilman, C. Girand, M. Graciarena, A. Kathol, L. Michaelis, B. Pellom, E. Shriberg, and A. Stolcke. 2005. Distinguishing deceptive from non-deceptive speech. In Proceedings of INTERSPEECH-2005, Lisbon, Portugal.

5)  R. Mihalcea and C. Strapparava. 2009. The lie detector: Explorations in the automatic recognition of deceptive language. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 309–312. Association for Computational Linguistics.

6)  M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 309–319. Association for Computational Linguistics.

7)  M. Ali and T. Levine. 2008. The language of truthful and deceptive denials and confessions. Communication Reports, 21(2):82–91.
