Automated Personality Classification
A. KARTELJ and V. FILIPOVIC, School of Mathematics, University of Belgrade, Serbia
and
V. MILUTINOVIC, School of Electrical Engineering, University of Belgrade, Serbia
Agenda
• Problem overview
• Classification of the existing solutions
• Presentation of the existing solutions
• Comparison of the solutions
• Work in progress: Bayesian Structure Learning for the APC
• Future work: Video-Based APC
• Conclusions
MULTI 2012, 3.10.2012
Problem Overview
The Big 5 Model
The Steps in Our Research
1. Survey paper (under review at ACM CSUR)
2. Research paper: a new APC model based on Bayesian structure learning (in progress)
3. Practical application of the APC model from step 2
4. Go to step 3
Elements of APC
• Corpus: essays, weblogs, emails, newsgroups, Twitter counts...
• Personality measurement: questionnaires (online and written). We are searching for an alternative!
• Model: stylistic analysis, linguistic features, machine learning techniques
Applications
Mining People’s Characteristics
Classification of Solutions
• C1 criterion separates solutions by type of conversation (1 = self-reflexive, N = continuous)
• C2 criterion separates solutions by approach (TD = top-down, DD = data-driven, or HY = hybrid)
Linguistic Styles: Language Use as an Individual Difference, Pennebaker and King [1999]
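Pennebaker and King related the frequency of LIWC word categories to personality measures; the comparison table lists their method simply as "correlations". A minimal sketch of that kind of analysis is a Pearson correlation between one category's per-text percentage and a trait score. The numbers below are invented toy data, not results from the study.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 items)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


# Hypothetical toy data (not from the study): percentage of social-process
# words in each author's essay, and that author's extraversion score.
social_pct = [9.1, 7.4, 11.2, 6.0, 10.5]
extraversion = [3.8, 3.1, 4.2, 2.5, 4.0]
r = pearson_r(social_pct, extraversion)
```

In practice one such coefficient would be computed for every category/trait pair.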
LIWC and MRC Features
Feature | Type | Example
Anger words | LIWC | Hate, kill
Metaphysical issues | LIWC | God, heaven, coffin
Physical state / function | LIWC | Ache, breast, sleep
Inclusive words | LIWC | With, and, include
Social processes | LIWC | Talk, us, friend
Family members | LIWC | Mom, brother, cousin
Past tense verbs | LIWC | Walked, were, had
References to friends | LIWC | Pal, buddy, coworker
Imagery of words | MRC | Low: future, peace – High: table, car
Syllables per word | MRC | Low: a – High: uncompromisingly
Concreteness | MRC | Low: patience, candor – High: ship
Frequency of use | MRC | Low: duly, nudity – High: he, the
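LIWC-type features are essentially dictionary lookups: for each category, the share of a text's words that belong to that category. The sketch below assumes a tiny hypothetical lexicon built from a few example words in the table above. The real LIWC dictionary is far larger and also matches word stems, and MRC features are per-word ratings rather than category memberships, so neither is reproduced here.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (NOT the real LIWC dictionary), using example
# words from the feature table. Matching is on whole lowercase words only.
TOY_LEXICON = {
    "anger": {"hate", "kill"},
    "family": {"mom", "brother", "cousin"},
    "friends": {"pal", "buddy", "coworker"},
    "inclusive": {"with", "and", "include"},
}


def category_percentages(text: str, lexicon=TOY_LEXICON) -> dict[str, float]:
    """Percentage of the text's tokens that fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in lexicon}
    counts = Counter(tokens)
    return {
        cat: 100.0 * sum(counts[w] for w in words) / len(tokens)
        for cat, words in lexicon.items()
    }


features = category_percentages("My brother and my buddy hate Mondays.")
```

The resulting per-text percentages are the kind of numeric feature vector that the correlation and machine learning methods in the comparison table consume.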
What Are They Blogging About? Personality, Topic and Motivation in Blogs, Gill et al. [2009]
Taking Care of the Linguistic Features of Extraversion, Gill and Oberlander [2002]
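The comparison table lists Gill and Oberlander's features as bigrams, analysed by "bigram analysis". One simple way to illustrate the idea, assuming the texts are already split into a high- and a low-extraversion group, is to compute each bigram's relative frequency per group and rank bigrams by the difference. This is a sketch only; the paper's actual statistics are not reproduced, and the sample e-mails are invented.

```python
import re
from collections import Counter


def bigrams(text: str) -> list[tuple[str, str]]:
    """Adjacent word pairs of a lowercased, tokenized text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def bigram_rates(texts: list[str]) -> dict[tuple[str, str], float]:
    """Relative frequency of each bigram across a group of texts."""
    counts = Counter(bg for t in texts for bg in bigrams(t))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()} if total else {}


def rate_differences(high: list[str], low: list[str]) -> list[tuple[tuple[str, str], float]]:
    """Bigrams sorted by how much more often the 'high' group uses them."""
    hi, lo = bigram_rates(high), bigram_rates(low)
    diffs = {bg: hi.get(bg, 0.0) - lo.get(bg, 0.0) for bg in hi.keys() | lo.keys()}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical e-mails from high- and low-extraversion writers.
high_group = ["we should all go out tonight", "let us all go"]
low_group = ["i think i will stay in", "i think so"]
ranked = rate_differences(high_group, low_group)
```

The head of `ranked` holds bigrams typical of the high group and the tail those typical of the low group.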
Personality Based Latent Friendship Mining, Wang et al. [2009]
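The comparison table lists lexical frequencies and TF-IDF as Wang et al.'s features, fed to logistic regression. The exact TF-IDF variant is not given on the slides, so the sketch below assumes the textbook form tf × log(N / df), with tf normalized by document length. The posts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Per-document TF-IDF weights: (count / doc length) * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        vectors.append({
            term: (c / length) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


# Hypothetical blog posts.
posts = ["i love hiking and camping", "i love jazz", "camping trip photos"]
vecs = tfidf(posts)
```

Terms that occur in every document get weight zero, while rare, topic-specific words dominate each vector.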
A Comparative Evaluation of Personality Estimation Algorithms for the TWIN Recommender System, Roshchina et al. [2011]
Predicting Personality with Social Media, Golbeck et al. [2011]
Our Twitter Profiles, Our Selves: Predicting Personality with Twitter, Quercia et al. [2011]
Paper | Input | Corpus | Features | Algorithm | Software | Cit. | I | S | A | R
[Pennebaker and King 1999] | text | essays | LIWC | correlations | n/a | 455 | H | H | H | M
[Mairesse et al. 2007] | text, speech | essays | LIWC, MRC | C4.5, NB, SMO, M5’ | Weka | 99 | M | M | H | M
[Gill et al. 2009] | text | weblogs (14.8words) | LIWC | linear regression | n/a | 26 | H | H | M | M
[Yarkoni 2010] | text | weblogs (100K words) | LIWC | correlations | n/a | 21 | H | M | M | M
[Gill and Oberlander 2002] | text | emails (105 students) | bigrams | bigram analysis | n/a | 49 | L | M | M | L
[Nowson et al. 2005] | text | weblogs (410K words) | word list | correlations | n/a | 48 | L | H | H | L
[Oberlander 2006] | text | weblogs (410K words) | N-grams | NB, SMO | Weka | 53 | H | M | H | M
[Wang et al. 2009] | text | weblogs (200 pairs) | lexical freq., TFIDF | logistic regression | Minitab | 1 | H | M | M | M
[Iacobelli et al. 2011] | text | weblogs (3000) | LIWC, bigrams | SVM, SMO, NB, ... | Weka | 1 | H | H | M | H
[Argamon et al. 2005] | text | essays | word list, conj. | SMO | Weka | 38 | H | M | M | M
[Argamon et al. 2007] | text | essays | word list, conj. | SMO | Weka, ATMan | 45 | H | M | M | M
[Mairesse and Walker 2006] | text | conv. extracts, 96 persons (≈100K words) | LIWC, MRC, utterance… | RankBoost | n/a | 22 | M | M | H | M
[Rigby and Hassan 2007] | text | mailing lists (140K emails) | LIWC | C4.5 | Weka, SPSS | 30 | M | H | M | L
[Roshchina et al. 2011] | text | TripAdvisor reviews | LIWC, MRC | Linear, M5, SVM | Weka | 2 | H | M | L | M
[Quercia et al. 2011] | meta | 335 Twitter users | Twitter counts | M5’ rules | Weka | 5 | M | H | M | M
[Golbeck et al. 2011] | text, meta | 279 FB users | 5 classes (161 in total) | M5’ rules, Gaussian processes | Weka | 12 | H | M | M | M
[Celli 2012] | text | 1065 posts | 22 ling. features | majority-based classification | n/a | 1 | M | M | M | M
Naive Bayes Classifier
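Naive Bayes appears among the algorithms in the comparison table (for example, Mairesse et al. 2007 and Oberlander 2006, both run in Weka). A minimal categorical Naive Bayes with add-one smoothing is sketched below. It assumes features already discretized into labels such as "high"/"low". The feature meanings and the extravert/introvert class labels are hypothetical, and this is not the exact configuration used in any of the surveyed papers.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-one) smoothing."""

    def fit(self, X: list[list[str]], y: list[str]) -> "CategoricalNaiveBayes":
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # value_counts[j][c][v]: how often feature j took value v in class c
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.value_counts[j][label][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, row: list[str], c: str) -> float:
        """Unnormalized log P(c) + sum_j log P(x_j | c)."""
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            num = self.value_counts[j][c][v] + 1
            den = self.class_counts[c] + len(self.values[j])
            score += math.log(num / den)
        return score

    def predict(self, row: list[str]) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(row, c))


# Hypothetical discretized features: [social-word rate, anger-word rate].
X = [["high", "low"], ["high", "low"], ["high", "high"],
     ["low", "high"], ["low", "high"], ["low", "low"]]
y = ["extravert"] * 3 + ["introvert"] * 3
model = CategoricalNaiveBayes().fit(X, y)
```

Working in log space avoids numerical underflow when there are many features.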
Naive Bayes and Bayesian Network
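The contrast between the two models comes down to the factorization each one assumes. Naive Bayes is the special case of a Bayesian network in which the class C has no parents and is the only parent of every feature X_i. A general network lets features depend on one another, and learning which dependencies exist is the structure learning problem. The notation below (C, X_i, Pa(·) for the parent set in the network's DAG) is introduced for this sketch.

```latex
% Naive Bayes: Pa(C) = \emptyset and Pa(X_i) = \{C\} for every feature
P(C, X_1, \dots, X_n) = P(C) \prod_{i=1}^{n} P(X_i \mid C)

% General Bayesian network over V = \{C, X_1, \dots, X_n\}
P(V) = \prod_{V_j \in V} P\bigl(V_j \mid \mathrm{Pa}(V_j)\bigr)

% Classification under either model: pick the most probable class
\hat{c} = \arg\max_{c} \, P(C = c, X_1 = x_1, \dots, X_n = x_n)
```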
Bayesian Network for the APC
Bayesian Network Structure Learning
1. Obtain a corpus (training set T)
2. Fit T to an appropriate network structure by:
a) an ILP formulation + solver (CPLEX, Gurobi…) on smaller instances
b) applying a metaheuristic on larger instances
3. Validate the quality of the metaheuristic approach
4. Compare the obtained APC accuracy with other approaches
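The slides do not specify which metaheuristic (or which ILP model) is used in step 2. As a point of reference, the sketch below shows plain greedy hill-climbing over DAGs scored by BIC, a common score-based baseline. It is not the authors' method. It assumes fully observed discrete data given as tuples of category values, omits edge reversals, and uses an illustrative `max_parents` cap (a hypothetical parameter).

```python
import math
from collections import Counter
from itertools import product


def local_bic(data, child, parents):
    """BIC contribution of one node given a tuple of parent indices."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    r_child = len({row[child] for row in data})
    q_parents = math.prod(len({row[p] for row in data}) for p in parents)
    return loglik - 0.5 * math.log(n) * (r_child - 1) * q_parents


def reaches(parents, start, target):
    """True if the DAG (given as child -> parent set) has a path start -> target."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(c for c, ps in parents.items() if node in ps)
    return False


def hill_climb(data, n_vars, max_parents=2):
    """Greedy search: apply the best single edge addition/deletion until none helps."""
    parents = {v: set() for v in range(n_vars)}
    scores = {v: local_bic(data, v, ()) for v in range(n_vars)}
    while True:
        best_gain, best_move = 1e-9, None
        for u, v in product(range(n_vars), repeat=2):
            if u == v:
                continue
            if u in parents[v]:
                new_ps = parents[v] - {u}  # try deleting edge u -> v
            elif len(parents[v]) < max_parents and not reaches(parents, v, u):
                new_ps = parents[v] | {u}  # try adding u -> v (keeps graph acyclic)
            else:
                continue
            new_score = local_bic(data, v, tuple(sorted(new_ps)))
            if new_score - scores[v] > best_gain:
                best_gain, best_move = new_score - scores[v], (v, new_ps, new_score)
        if best_move is None:
            return parents
        v, new_ps, new_score = best_move
        parents[v], scores[v] = new_ps, new_score


# Toy data: variable 1 copies variable 0; variable 2 is independent of both.
rows = [(i % 2, i % 2, (i // 2) % 2) for i in range(200)]
learned = hill_climb(rows, n_vars=3)
```

Because BIC decomposes over nodes, each candidate move only rescores the affected child. The same decomposability is what makes exact ILP formulations practical on smaller instances.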
Other Ideas
Games with a purpose (GWAP)
Clustering personality characteristics
Putting everything together: Video-Based APC
Conclusions
• Classification of the existing solutions (survey paper)
• Filling the gaps in the classification tree
• Introducing Bayesian structure learning for the APC
• Utilizing metaheuristics to deal with high dimensionality
• APC potential: social networks, recommender systems, and expert systems
THANK YOU!
Aleksandar Kartelj
V. Filipovic
V. Milutinovic