The Five Factor Model of personality and evaluation of drug consumption risk
E. Fehrman1, A. K. Muhammad2, E. M. Mirkes2, V. Egan3, A. N. Gorban2
1Men's Personality Disorder & National Women's Directorate, Rampton Hospital, Retford, Nottinghamshire, DN22 0PD, UK
2Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK
3Department of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, NG8 1BB, UK
Abstract The problem of evaluating an individual’s risk of drug consumption and misuse is highly important for health planning. An online survey methodology was employed to collect data including personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation-seeking (ImpSS), and demographic information which influence drug use. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation analysis using a relative information gain model demonstrates the existence of a group of drugs (amphetamines, cannabis, cocaine, ecstasy, legal highs, LSD, and magic mushrooms) with strongly correlated consumption patterns. An exhaustive search was performed to select the most effective subset of input features and data mining methods to classify users and non-users for each drug type. A number of classification methods were employed (decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression and naïve Bayes) and the most effective method selected for each drug. The quality of classification was surprisingly high. The best results with sensitivity and specificity (evaluated by leave-one-out cross-validation) being greater than 75% were obtained for VSA (volatile substance abuse) and methadone use. Good results with sensitivity and specificity (greater than 70%) were achieved for amphetamines, cannabis, cocaine, crack, ecstasy, heroin, ketamine, legal highs, and nicotine. The poorest result was obtained for prediction of alcohol consumption.
Key Words: Drug consumption, data mining, personality factors, NEO-FFI-R, risk evaluation, correlation analysis.
1 Introduction
Drug use is a risk behaviour that does not happen in isolation; it constitutes an important factor for increased risk of poor health, along with earlier mortality and morbidity, and has significant consequences for society (McGinnis & Foege, 1993; Sutina, Evans, & Zonderman, 2013). Drug consumption and addiction constitute a serious problem globally. They involve numerous risk factors, which are defined as any attribute, characteristic, or event in the life of an individual that increases the probability of drug consumption. A number of factors are correlated with initial drug use including psychological, social, individual, environmental, and economic factors (Cleveland, Feinberg, Bontempo, & Greenberg, 2008; Ventura, de Souza, Hayashida, & Ferreira, 2014; World Health Organization, 2004). These factors are likewise associated with a number of personality traits (Dubey, Arora, Gupta, & Kumar, 2010; DordiNejad & Shiran, 2011; Bogg & Roberts, 2004). While legal drugs such as sugar, alcohol and tobacco are probably responsible for far more premature death than illegal recreational drugs (Beaglehole, et al., 2011), the social and personal consequences of recreational drug use can be highly problematic (Bickel, Johnson, Koffarnus, MacKillop, & Murphy, 2014).
Psychologists have largely agreed that the personality traits of the Five Factor Model (FFM) are the most comprehensive and adaptable system for understanding human individual differences (Costa & McCrae, 1992). The FFM comprises Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness.
Previous studies demonstrate that high Neuroticism and Openness, and low Agreeableness and Conscientiousness are associated with higher risk of drug use (including cocaine, cannabis, alcohol and marijuana) (Sutina, Evans, & Zonderman, 2013; Flory, Lynam, Milich, Leukefeld, & Clayton, 2002). Roncero et al. (2014) highlight the importance of the relationship between high Neuroticism scores and the presence of psychotic symptoms following cocaine-induced drug consumption. Vollrath & Torgersen (2002) observe that the personality traits of Neuroticism, Extraversion, and Conscientiousness are highly correlated with hazardous health behaviours. A low score of Conscientiousness, and high score of Extraversion or high score of Neuroticism correlate strongly with multiple risky health behaviours.
The statistical characteristics of groups of drug users and non-users have been studied by many authors (see, for example, Terracciano et al. (2008)). They found that the personality profiles of users of tobacco, marijuana, cocaine, and heroin differ from those of non-users and are associated with a higher score on Neuroticism and a very low score on Conscientiousness. The problem of risk evaluation for individuals is much more complex. This was explored very recently by Yasnitskiy et al. (2015), Valeroa et al. (2014), and Bulut & Bucak (2014).
Valeroa et al. (2014) evaluated the individual risk of drug consumption for alcohol, cocaine, opiates, cannabis, ecstasy and amphetamines. Input data were collected using the Spanish version of the Zuckerman–Kuhlman Personality Questionnaire (ZKPQ). Two samples were used in this study. The first consisted of 336 drug-dependent psychiatric patients from one hospital. The second sample included 486 control individuals. The main purpose of this research was to test whether predicting drug consumption was possible and to identify the most informative attributes using data mining methods. The authors applied a decision tree to explore the differential role of personality profiles in drug consumers and control individuals. A sensitivity of 40% and a specificity of 94% were achieved for the training set. They found that the two personality factors of Neuroticism and anxiety and the ZKPQ’s Impulsivity were most relevant for drug consumption prediction. Such low sensitivity precludes applying this decision tree to real-life problems. The current study seeks to test these associations with personality for different types of drugs separately using the Revised NEO Five-Factor Inventory (NEO-FFI-R) (McCrae & Costa, 2004), the Barratt Impulsiveness Scale Version 11 (BIS-11) (Stanford, Mathias, Dougherty, Lake, Anderson, & Patton, 2009), and the Impulsivity Sensation-Seeking scale (ImpSS) (Zuckerman, 1994) to assess impulsivity and sensation-seeking respectively.
Bulut & Bucak (2014) estimated the percentage of teenagers who are at high risk, without focusing on specific addictions. The attributes were collected by an original questionnaire, which included 25 questions. The form was filled in by 671 students. The first 20 questions asked about the teenagers’ financial situation, temperament type, family and social relations, and cultural preferences. The last five questions were completed by their teachers and concerned the grade point average of the student for the previous semester according to a 5-point grading system, whether the student had been given any disciplinary punishment so far, if the student had alcohol problems, if the student smoked cigarettes or used tobacco products, and whether the student misused substances.
In Bulut & Bucak’s study there are five risk classes as outputs. The authors assessed teenagers’ risk of becoming drug abusers using seven types of classification algorithms: k-nearest neighbor, ID3 and C4.5 decision tree based algorithms, naïve Bayes classifier, naïve Bayes/decision trees hybrid approach, one-attribute-rule (OneR), and projective adaptive resonance theory (PART). The classification accuracy of the best classifier was reported as 98%.
Yasnitskiy et al. (2015) attempted to evaluate the individual’s risk of drug consumption and to recommend the most efficient changes in the individual’s social environment to reduce this risk. The input and output features were collected by an original questionnaire. The attributes consisted of: level of education, having friends who use drugs, temperament type, number of children in the family, financial situation, alcohol drinking and smoking, family relations (cases of physical, emotional and psychological abuse, level of trust and happiness in the family). There were 72 participants. A neural network model was used to evaluate the importance of attributes for diagnosis of the tendency to drug addiction. For several test patients (drug users) a series of virtual experiments was performed to evaluate how it is possible to control the propensity for drug addiction. For each patient the most effective change of social environment features was predicted. The recommended changes depended on the personal profile, and significantly varied for different patients. This approach produced individual bespoke advice for decreasing drug dependence.
In the current study we evaluated the individual drug consumption risk separately, for each drug. We also analysed interrelations between the individual drug consumption risks for different drugs. We applied several data mining approaches: decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression and naïve Bayes.
An online survey methodology was employed by Elaine Fehrman from March 2011 to March 2012. The data collected from the 2051 respondents included personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation-seeking (ImpSS), and demographic information including country of location, ethnicity, level of education, gender, and age ranges. The data set contained information on the consumption of 18 central nervous system psychoactive drugs including alcohol, amphetamines, amyl nitrite, benzodiazepines, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, magic mushrooms, nicotine, and Volatile Substance Abuse (VSA).
Participants were asked about substances, which were classified as central nervous system (CNS) depressants, stimulants, or hallucinogens. The depressant drugs comprised alcohol, amyl nitrite, benzodiazepines and tranquillizers, GHB, solvents and inhalants, and opiates such as heroin and methadone/prescribed opiates. The stimulants consisted of amphetamines, nicotine, cocaine powder, crack cocaine, caffeine, and chocolate. Although chocolate contains caffeine, data for chocolate was measured discretely, given that it may induce parallel psychopharmacological and behavioural effects in individuals congruent to other addictive substances (Bruinsma & Taren, 1999). The hallucinogens included 2CB, cannabis, ecstasy, ketamine, LSD, and magic mushrooms. Legal highs such as methedrone, salvia, and various legal smoking mixtures were also measured.
The objective of the study was to assess the potential effect of personality traits, impulsivity, sensation-seeking, and demographic data on drug consumption. The study had two purposes: firstly, to identify the association of personality profiles (i.e. NEO-FFI-R) with drug consumption; secondly, to predict the risk of drug consumption for each individual according to their personality profiles (i.e. NEO-FFI-R, BIS-11, ImpSS, education level, gender, and age). The findings contribute to an improvement of knowledge concerning the pathways leading to drug consumption.
The study sample was created by an anonymous online survey. It was found to be significantly biased when compared with the general population, which is indicated from comparison to the data published by Egan, et al. (2000) and Costa Jr & McCrae (2004). Such a bias is usual for clinical cohorts (Gurrera, Nestor, & O'Donnel, 2000; Terracciano, Löckenhoff, Crum, Bienvenu, & Costa, 2008).
Correlation analysis demonstrates that the consumption of legal drugs (i.e. alcohol, chocolate and caffeine) is not correlated with that of other drugs. The consumption of seven illicit drugs (i.e. amphetamines, cannabis, cocaine, ecstasy, legal highs, LSD, and mushrooms) is symmetrically correlated (when the correlations are measured by relative information gain, which is not symmetric a priori). There are a number of strongly asymmetric correlations (see Figure 7b). For example, knowledge of amphetamines, cocaine, ecstasy, legal highs, LSD, and mushroom consumption is useful for the evaluation of ketamine consumption, but knowledge of ketamine consumption is significantly less useful for the evaluation of usage of the drugs listed above.
In order to predict the risk of drug consumption for individuals, a number of data mining approaches were applied (decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression, and naïve Bayes). For each drug, the most effective subset of input attributes was selected to provide the highest level of accuracy. Unexpectedly good classifiers were found for all drugs (see Table 11 and Table 12). The creation of classifiers provided the capability to evaluate the risk of drug consumption in relation to individuals. The risk maps provide a tool for the generation of hypotheses for further study (see Section 3.6 and Figure 8 there).
2 Materials and Methods
2.1 Database
The database was collected by Elaine Fehrman between 2011 and 2012. An online survey tool from Survey Gizmo was employed to gather data with maximum anonymity, this being particularly relevant to canvassing respondents’ views, given the sensitive nature of drug use. All participants were required to declare themselves at least 18 years of age prior to informed consent being given.
The study recruited 2051 participants over an 18-month recruitment period. Of these persons, 166 did not respond correctly to a response-check built into the middle of the scale, so were presumed to be inattentive to the questions being asked. Nine of these persons were found to also have endorsed using a fictitious recreational drug, which was included precisely to identify respondents who over-claim, as have other studies of this kind (Hoare & Moon, 2010). This led to a useable sample of 1885 participants (male/female = 943/942).
The snowball sampling methodology recruited a primarily (93.5%) native English-speaking sample, with participants from the UK (1044; 55.4%), the USA (557; 29.5%), Canada (87; 4.6%), Australia (54; 2.9%), New Zealand (5; 0.3%) and Ireland (20; 1.1%). A total of 118 (6.3%) either came from a diversity of other countries, none of which individually accounted for 1% of the sample, or did not declare their country of location. Further optimizing anonymity, persons reported their age band, rather than their exact age: 18-24 years (643; 34.1%), 25-34 years (481; 25.5%), 35-44 years (356; 18.9%), 45-54 years (294; 15.6%), 55-64 (93; 4.9%), and over 65 (18; 1%). This indicates that although the largest age band was 18 to 24, some 40% of the cohort was 35 or above, an age range often missed in studies of this kind.
The sample recruited was highly educated, with 59.5% educated to, at a minimum, degree or professional certificate level: 14.4% (271) reported holding a professional certificate or diploma, 25.5% (n = 481) an undergraduate degree, 15% (n = 284) a master’s degree, and 4.7% (n = 89) a doctorate. Approximately 26.8% (n = 506) of the sample had received some college or university tuition although they did not hold any certificates; lastly, 257 (13.6%) had left school at the age of 18 or younger.
Participants were asked to indicate which racial category was broadly representative of their cultural background. An overwhelming majority (91.2%; 1720) reported being White, 1.8% (33) stated they were Black, and 1.4% (26) Asian. The remainder of the sample (5.6%; 106) described themselves as ‘Other’ or ‘Mixed’ categories. This small number of persons belonging to specific non-white ethnicities precludes any analyses involving racial categories.
2.1.1 Personality measurements
In order to assess personality traits of the sample, the NEO-FFI-R questionnaire was employed (Costa & McCrae, 1992). The NEO-FFI-R is a highly reliable measure of basic personality domains; internal consistencies are 0.84 (N); 0.78 (E); 0.78 (O); 0.77 (A), and 0.75 (C) (Egan, 2011). The scale is a 60-item inventory comprised of five personality domains or factors. The NEO-FFI-R is a shortened version of the Revised NEO-Personality Inventory (NEO-PI-R) (Costa & McCrae, 1992). The five factors are: N (Neuroticism), E (Extraversion), O (Openness), A (Agreeableness), and C (Conscientiousness) with 12 items per domain. These traits can be summarized as:
1. Neuroticism – a long-term tendency to experience negative emotions such as nervousness, tension, anxiety and depression;
2. Extraversion – manifested in outgoing, warm, active, assertive, talkative, cheerful, and in search of stimulation characteristics;
3. Openness – a general appreciation for art, unusual ideas, and imaginative, creative, unconventional, and wide interests;
4. Agreeableness – a dimension of interpersonal relations, characterized by altruism, trust, modesty, kindness, compassion and cooperativeness;
5. Conscientiousness – a tendency to be organized and dependable, strong-willed, persistent, reliable, and efficient.
All of these domains are hierarchically defined by specific facets (McCrae & Costa, 1991). Egan et al. (2000) observe that the Openness and Extraversion domain scores of the NEO-FFI instrument are less reliable than those for Neuroticism, Agreeableness, and Conscientiousness.
Participants in our study were asked to read the 60 NEO-FFI-R statements and indicate on a five-point Likert scale how much a given item applied to them (i.e. 0 = ‘Strongly Disagree’, 1 = ‘Disagree’, 2 = ‘Neutral’, 3 = ‘Agree’, to 4 = ‘Strongly Agree’).
We expected drug usage to be associated with high N, and low A and C. The darker dimension of personality can be described in terms of low A, whereas much of the anti-social behaviour in non-clinical persons appears underpinned by high N and low C (Jakobwitz & Egan, 2006). The so-called ‘negative urgency’ is the tendency to act rashly when distressed, and is characterized by high N, low C, and low A (Settles, Fischer, Cyders, Combs, Gunn, & Smith, 2012). The association with negative urgency is partially supported below for users of most of the illegal drugs. In addition, our findings suggest that O is higher for drug users.
The second measure used was the Barratt Impulsiveness Scale (BIS-11) (Stanford, Mathias, Dougherty, Lake, Anderson, & Patton, 2009). The BIS-11 is a 30-item self-report questionnaire, which measures the behavioural construct of impulsiveness, and comprises three subscales: motor impulsiveness, attentional impulsiveness, and non-planning. The ‘motor’ aspect reflects acting without thinking, the ‘attentional’ component reflects poor concentration and thought intrusions, and the ‘non-planning’ aspect reflects a lack of consideration for consequences (Snowden & Gray, 2011). The scale’s items are scored on a four-point Likert scale. This study modified the response range to make it compatible with previous related studies (García-Montes, Zaldívar-Basurto, López-Ríos, & Molina-Moreno, 2009). A score of 5 usually connotes the most impulsive response, although some items are reverse-scored to prevent response bias. Items are aggregated, and the higher the BIS-11 score, the higher the impulsivity level (Fossati, Ergis, & Allilaire, 2001). The BIS-11 is regarded as a reliable psychometric instrument with good test-retest reliability (Spearman’s rho is equal to 0.83) and internal consistency (Cronbach’s alpha is equal to 0.83) (Stanford, Mathias, Dougherty, Lake, Anderson, & Patton, 2009; Snowden & Gray, 2011).
The third measurement tool employed was the Impulsiveness Sensation-Seeking scale (ImpSS). Although the ImpSS combines the traits of impulsivity and sensation-seeking, it is regarded as a measure of a general sensation-seeking trait (Zuckerman, 1994). The scale consists of 19 statements in true-false format, comprising eight items measuring impulsivity (Imp), and 11 items gauging sensation-seeking (SS). The ImpSS is considered a valid and reliable measure of high risk behavioural correlates such as substance misuse (McDaniel & Mahan, 2008).
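Aggregating Likert items when some of them are reverse-scored can be sketched as follows. This is a minimal illustration under stated assumptions, not the BIS-11 scoring key: the function name `total_score`, the response range passed as `low` and `high`, and the item numbers treated as reverse-scored are all hypothetical and chosen only for the example.

```python
def total_score(responses, reverse_items, low=1, high=4):
    """Sum Likert responses, flipping the reverse-scored items.

    responses: dict mapping item number -> response in [low, high]
    reverse_items: set of item numbers that are reverse-scored
                   (hypothetical here, not the published key)
    """
    total = 0
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"item {item}: response {value} outside {low}..{high}")
        # A reverse-scored item maps low -> high, ..., high -> low.
        total += (low + high - value) if item in reverse_items else value
    return total


# Hypothetical three-item questionnaire in which item 2 is reverse-scored.
example = {1: 4, 2: 1, 3: 2}
print(total_score(example, reverse_items={2}))
```

Passing the response range explicitly keeps the sketch usable for either the original or a modified response format.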
2.1.2 Drug use
Participants were questioned concerning their use of 18 legal and illegal drugs (alcohol, amphetamines, amyl nitrite, benzodiazepine, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, mushrooms, nicotine and volatile substance abuse (VSA)) and one fictitious drug (Semeron) which was introduced to identify over-claimers.
It was recognised at the outset of this study that drug use research regularly (and spuriously) dichotomises individuals as users or non-users, without due regard to their frequency or duration/desistance of drug use (Ragan & Beaver, 2010). In this study, finer distinctions concerning the measurement of drug use have been deployed, due to the potential for the existence of qualitative differences amongst individuals with varying usage levels. In relation to each drug, respondents were asked to indicate whether they had never used the drug, had used it over a decade ago, or in
the last decade, year, month, week, or day. This format captured the breadth of a drug-using career, and the specific recency of use. The seven categories of drug users are depicted in Figure 1.
Figure 1. Categories of drug users (Never Used; Used over a decade ago; Used in last decade; Used in last year; Used in last month; Used in last week; Used in last day)
It can be seen that participants who had used a drug the previous day belong to the category ‘Used in last day’ and also to the categories ‘Used in last week’, ‘Used in last month’, ‘Used in last year’ and ‘Used in last decade’. There are two special categories (see Figure 1): ‘Never used’ and ‘Used over a decade ago’. These two categories were placed into the class of ‘Non-user’, and all other categories into the class ‘User’, as the simplest version of binary classification. Further in this study we analysed this binary classification.
The proportions of drug users differed for different drugs. The database sample comprised 1885 individuals without any missing data. Consumption of alcohol, caffeine and chocolate was relatively common (over 96%). Consumption of cannabis and nicotine was also high (over 67%). Consumption of benzodiazepines, ecstasy and legal highs was less, at 41%. Consumption of amphetamines, mushrooms and cocaine was approximately 36%. Consumption of ketamine and amyl nitrite is approximately 19%. Consumption of methadone is above 22%, and LSD is less than 30%. Finally, crack, heroin, and VSA use is approximately 10%, 11%, 12%, respectively. These numbers characterise the group of respondents. It is worth mentioning here that the sample is biased to the higher proportion of drug users and for the population consumption of the illegal drugs is expected to be significantly lower (Home Office, UK, 2014).
2.2 Data analysis
The raw score for each factor of the NEO-FFI-R was converted into a T-Score based on normative data, by usage of the equation below (McCrae & Costa, 2004).
$$T\text{-score} = 10\,\frac{\text{raw score} - \text{normative mean}}{\text{normative standard deviation}} + 50.$$
Table 1 presents T-score statistics, sample means, sample standard deviation (SD), and evaluation of the significance of the difference between the sample mean and the population mean. All differences between population and sample means are statistically significant (p<0.001).
Table 1. Descriptive statistics (Mean, SD, 95% CI). P-value is calculated by t-test
Factors            Sample mean   95% CI for sample mean   SD      Population mean   p-value
Neuroticism        59.64         59.08, 60.20             12.41   50                <0.001
Extraversion       47.35         46.88, 47.83             10.48   50                <0.001
Openness           54.04         53.55, 54.52             10.75   50                <0.001
Agreeableness      47.15         46.62, 47.69             11.88   50                <0.001
Conscientiousness  43.93         43.25, 44.61             11.06   50                <0.001
The means of the NEO-FFI-R T-scores based on normative data are depicted in Figure 2. Table 1 and Figure 2 illustrate that the sample is biased with respect to the population. This type of bias is usual for clinical samples. For example, for schizophrenia (Gurrera, Nestor, & O'Donnel, 2000; Fridberg, Vollmer, O'Donnell, & Skosnik, 2011) and drug use (Terracciano, Löckenhoff, Crum, Bienvenu, & Costa, 2008; Flory, Lynam, Milich, Leukefeld, & Clayton, 2002). It is highlighted below that the group in this study deviates from the population norm in the same direction as drug users (see Table 5 below). However, in this sample, groups of drug users and non-users differ for the mean values of this deviation. Therefore, it is convenient to study deviations of users and non-users from the sample mean. For this purpose we introduce the T-score in the sample.
Figure 2. Mean T-score NEO-FFI-R (red line)
T-score_sample is introduced for improved visibility and simplicity of comparison, calculated by the formula below. The resulting T-score_sample contains a mean of 50 and a standard deviation of 10.
$$T\text{-score}_{\text{sample}} = 10\,\frac{\text{raw score} - \text{sample mean score}}{\text{sample standard deviation}} + 50.$$
The T-score is categorized into five categories to summarise an individual’s personality score concerning each factor. The interval 20-35 indicates very low scores. The interval 35-45 indicates low scores. The interval 45-55 indicates average scores. The interval 55-65 indicates high scores. The interval 65-80 indicates very high scores. This study considers the mean T-score_sample for two groups (Users and Non-users) instead of each individual’s score. All values related to the groups’ mean scores belong to the average score interval. A subdivision of the T-score_sample near to that of the average is introduced as follows: the interval 44-49 indicates moderately low (−), the interval 49-51 indicates neutral (0), and the interval 51-56 indicates moderately high (+).
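The sample-based T-score and the moderately low / neutral / moderately high subdivision can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `t_scores` and `band` are hypothetical, the population form of the standard deviation (`pstdev`) is an assumption, and so is the handling of the shared endpoints 49 and 51, which the text does not specify.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Rescale raw scores so that the sample has mean 50 and SD 10."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population form of the SD
    return [10 * (x - m) / sd + 50 for x in raw_scores]


def band(t):
    """Label a group mean T-score with the subdivision near the average.

    Assumption: 49 and 51 are assigned to the neutral band.
    Returns None outside the 44-56 range covered by the subdivision.
    """
    if 44 <= t < 49:
        return "-"   # moderately low
    if 49 <= t <= 51:
        return "0"   # neutral
    if 51 < t <= 56:
        return "+"   # moderately high
    return None


# Hypothetical raw scores for one factor.
scores = t_scores([10, 20, 30, 40])
print([round(s, 2) for s in scores])
print(band(47.0), band(50.0), band(53.2))
```

In use, `band` would be applied to the mean T-score of a group (Users or Non-users) rather than to an individual score, in line with the group-level comparison described above.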
The mean of the T-score_sample for all factors enables comparisons of groups (both Users and Non-users) for each drug. Any difference between the mean T-score_sample for users and non-users is usually used as a measure of dissimilarity in scores. The relationship between NEO-FFI-R scores and groups of Users and Non-users for each drug is examined by comparing mean T-score_sample. A t-test is employed to estimate the significance of the differences between mean T-score_sample for groups of users and non-users for each of the NEO-FFI-R scores. Differences with an associated probability value greater than 0.1 are considered as non-significant. The analysis is performed by SAS 9.4.
2.3 Input feature transformation
There are many data mining methods to work with continuous data. It is necessary to quantify all categorical features to use these methods.
2.3.1 Ordinal features quantification
One of the widely used techniques to analyse categorical data is the calculation of polychoric correlation (Lee, Poon, & Bentler, 1995; Martinson & Hamdan, 1971). The matrix of polychoric coefficients is further used to calculate principal components, etc. The technique of polychoric correlation is based on the suggestion that values of an ordinal feature are the result of discretization of continuous random values with fixed thresholds. Furthermore, this latent continuous random value follows the normal distribution. Unfortunately, polychoric correlation techniques have two drawbacks: they define the thresholds of discretization but not the values for each category, and the thresholds defined are different for different pairs of attributes.
Let us have the ordinal feature $o$ with categories $o_1, \ldots, o_k$, and with number of cases $n_i$ of category $o_i$. The empirical estimation of probability of category $o_i$ is $p_i = n_i/N$, where $N = \sum_i n_i$. The sample estimations of thresholds are evaluated as:
$$t_i = \Phi^{-1}\left(\sum_{j=1}^{i} p_j\right), \quad i = 1, \ldots, k-1, \qquad (1)$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.
The simplest method of ordinal feature quantification is to use thresholds (1) and select the ‘average’ value in each interval. There are several variants of ‘average’ value. For this study we use the value with average probability: if thresholds $t_{i-1}$ and $t_i$ define the interval of category $o_i$ (with $t_0 = -\infty$ and $t_k = +\infty$), then the value with average probability is
$$q_i = \Phi^{-1}\left(\frac{\Phi(t_{i-1}) + \Phi(t_i)}{2}\right). \qquad (2)$$
The polychoric coefficients, calculated on the basis of quantification (2), have less likelihood than polychoric coefficients calculated by using the maximum likelihood approach. The merit of this approach is the usage of the same thresholds for all pairs of attributes and an explicit formula for calculating the categories’ values.
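The threshold-based quantification of an ordinal feature can be sketched as follows. This is a minimal sketch of one reading of the procedure, not the authors' code: thresholds are taken as the inverse standard normal CDF of the cumulative category proportions, and each category value as the inverse CDF at the midpoint of that category's probability interval (the "value with average probability"). The function name `quantify_ordinal` is hypothetical, and the sketch assumes every category has at least one case.

```python
from statistics import NormalDist


def quantify_ordinal(counts):
    """Quantify an ordinal feature from its category counts.

    counts: number of cases per category, listed in category order;
            every count is assumed to be positive.
    Returns (thresholds, values): k-1 thresholds and k category values.
    """
    total = sum(counts)
    phi_inv = NormalDist().inv_cdf  # inverse CDF of the standard normal
    cumulative = 0                  # cases in all preceding categories
    thresholds, values = [], []
    for i, n_i in enumerate(counts):
        # Midpoint, in probability, of the interval of category i.
        values.append(phi_inv((cumulative + n_i / 2) / total))
        cumulative += n_i
        if i < len(counts) - 1:
            # Upper threshold of category i.
            thresholds.append(phi_inv(cumulative / total))
    return thresholds, values


# Hypothetical three-category feature with 25, 50 and 25 cases.
thresholds, values = quantify_ordinal([25, 50, 25])
print([round(t, 3) for t in thresholds])
print([round(v, 3) for v in values])
```

Because the thresholds depend only on the marginal counts of one feature, the same values are reused for every pair of attributes, which is the property the passage above highlights.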
2.3.2 Nominal feature quantification
We cannot use techniques described above to quantify nominal features such as gender, country of location and ethnicity because categories of these features are unordered. To quantify nominal features we implemented the technique of nonlinear CatPCA (Linting & van der Kooij, 2012). This procedure includes four steps:
1. Exclude nominal features from the set of input features and calculate the informative principal components (Pearson, 1901; Gorban & Zinovyev, 2008; Gorban & Zinovyev, 2010) in the space of retained input features. To select informative components we use Kaiser’s rule (Guttman, 1954; Kaiser, 1960).
2. Calculate the centroid of each category in projection on the selected principal components.
3. Calculate the first principal component of the centroids.
4. The numerical value for each category is the projection of its centroid on this component.
The process of nominal feature quantification for the feature ‘Country’ is depicted in Figure 3. Figure 3 shows that points corresponding to the UK category are located very far from any other points.
Figure 3. CatPCA quantification of ‘Country’ on the plane of the first two principal components
As an alternative variant of nominal feature quantification we use dummy coding (Gujarati, 2003) of nominal variables: ‘country’ is transformed into seven binary features with values 1 (if ‘true’) or 0 (if ‘false’): UK, Canada, USA, Other (country), Australia, Republic of Ireland and New Zealand; Ethnicity is transformed into seven binary features: Mixed-White/Asian, White, Other (ethnicity), Mixed-White/Black, Asian, Black and Mixed-Black/Asian.
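Dummy coding of a nominal feature into binary indicator features can be sketched as follows. This is a minimal illustration: the helper name `dummy_code` and the four-respondent sample are hypothetical, while the seven country categories are those listed in the text.

```python
def dummy_code(values, categories):
    """Return one 0/1 indicator column per category.

    values: observed category label for each respondent
    categories: the full list of category labels
    """
    return {c: [1 if v == c else 0 for v in values] for c in categories}


COUNTRIES = [
    "UK", "Canada", "USA", "Other (country)",
    "Australia", "Republic of Ireland", "New Zealand",
]

# Hypothetical respondents, used only to show the shape of the output.
sample = ["UK", "USA", "UK", "New Zealand"]
coded = dummy_code(sample, COUNTRIES)
print(coded["UK"])
print(coded["New Zealand"])
```

Each respondent has exactly one indicator equal to 1, so the seven columns together carry the same information as the original label without imposing an order on the categories.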
2.3.3 Input feature ranking

In this study, we used three different techniques of input feature ranking. The first technique was principal variables (McCabe, 1984). The main idea of this approach is to select first the input feature which explains the maximal fraction of the data variance, then select the next feature which together with the previously selected features explains the maximal fraction of data variance, and so on.

The second approach was double Kaiser's selection. Calculate principal components and select informative principal components by Kaiser's rule (Guttman, 1954; Kaiser, 1960). The importance of an attribute is defined as the maximum of the absolute values of the corresponding coordinates in the important principal components represented by normalized vectors. For attribute selection we defined the threshold of importance as $1/\sqrt{n}$, where $n$ is the number of attributes. If the attribute importance is greater than the threshold of importance then this attribute is informative. Otherwise the attribute is trivial. If there are trivial attributes then the worst attribute is the attribute with the minimal value of importance. We removed the worst attribute and repeated the procedure. This procedure stops if there are no trivial attributes. This algorithm ranks attributes from the worst to the best one.

The third approach used was sparse PCA (Naikal, Yang, & Sastry, 2011). In this study, we used the simplest thresholding for sparse PCA. The search for each sparse principal component contains several steps:

1. Define the number of features $n$, the variance of data $\sigma^2$, the Kaiser threshold for components $\sigma^2/n$ and the Kaiser threshold for coefficients $1/\sqrt{n}$.
2. Search the usual principal component and calculate the variance explained by this component. The iterative algorithm gives the principal components in descending order of explained variance.
3. If the variance explained by this component is below the Kaiser threshold for components then all the informative components are found. Remove the last component and go to step 8.
4. Search the attribute with non-zero coefficient with the least absolute value.
5. If this least absolute value is not below the Kaiser threshold for coefficients then there are no trivial attributes. Go to step 7.
6. Set the value of the found coefficient to zero. Block changes of this attribute coefficient and search the principal component under this condition. Go to step 4.
7. Subtract the projection on the found component from the data and go to step 2.
8. Search the attributes with zero coefficients in each found component. These attributes are trivial. If there are trivial attributes then remove them from the set of attributes and go to step 1.

2.4 Risk evaluation methods

For this study, we applied several classification methods which provide risk evaluation as well.

We used two general sets of input features. The first was the set of input features after quantification described in subsections 'Ordinal features quantification' and 'Nominal feature quantification'. We called this set of features 'original'. It includes age, gender, education, Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness, Impulsivity, and Sensation-Seeking. The second set was the set of projections of input features onto the first four principal components. This set of input features we called 'projected'. Also, we used the subsets of original and projected sets.
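The double Kaiser's selection of Section 2.3.3 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `double_kaiser_ranking` is hypothetical, the attributes are standardised before PCA, and Kaiser's rule is read as "eigenvalue above the mean eigenvalue"; none of these implementation details is given in the text.

```python
import numpy as np

def double_kaiser_ranking(X):
    """Return (removed, kept): trivial attributes in removal order (worst first)
    and the attributes that survive. Column indices refer to the input matrix."""
    X = np.asarray(X, dtype=float)
    kept = list(range(X.shape[1]))
    removed = []
    while len(kept) > 1:
        Z = X[:, kept]
        Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)       # standardise (assumption)
        eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
        informative = eigval > eigval.mean()            # Kaiser's rule for components
        if not informative.any():
            break
        # Importance: largest absolute coordinate over informative unit-length components.
        importance = np.abs(eigvec[:, informative]).max(axis=1)
        threshold = 1.0 / np.sqrt(len(kept))            # threshold of importance
        if (importance > threshold).all():              # no trivial attributes left
            break
        removed.append(kept.pop(int(np.argmin(importance))))
    return removed, kept
```

The attributes in `removed` are ordered from the worst one onwards, which matches the "worst to best" ranking described above; the attributes left in `kept` are the informative ones.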
2.4.1 K nearest neighbors (KNN)
The basic concept of KNN is that the class of an object is the class of the majority of its k nearest neighbors (Clarkson, 2005). This algorithm is very sensitive to the definition of distance. There are several commonly used variants of distance for KNN: Euclidean distance, Minkowski distance, and distances calculated after some transformation of the input space.
In this study, we used three distances: the Euclidean distance, the Fisher’s transformed distance (Fisher, 1936) and the adaptive distance (Hastie & Tibshirani, 1996). Moreover, we used a weighted voting procedure with weighting of neighbors by one of the standard kernel functions (Li & Racine, 2007).
The KNN algorithm is well known (Clarkson, 2005). The adaptive distance transformation algorithm is described in (Hastie & Tibshirani, 1996). KNN with Fisher's transformed distance is less well known. For it, the following options are defined: k is the number of nearest neighbors, K is the kernel function, and kf is the number of neighbors which are used for the distance transformation. To define the risk of drug consumption we perform the following steps:
1. Find the kf nearest neighbors of the test point.
2. Calculate the covariance matrix of the kf neighbors and Fisher's discriminant direction.
3. Find the k nearest neighbors of the test point using the distance along Fisher's discriminant direction among the kf neighbors found earlier.
4. Define the maximal distance from the test point to the k neighbors.
5. For each class calculate the membership of this class as a sum of the points' weights. The weight of a point is the value of the kernel function at the ratio of the distance from this point to the test point to the maximal distance defined at step 4.
6. Drug consumption risk is defined as the ratio of the positive class membership to the sum of memberships of all classes.
The adaptive distance version implements the same algorithm but uses a different transformation at step 2 and a different distance at step 3. The Euclidean distance version simply defines kf = k and omits steps 2 and 3 of the algorithm.
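As an illustration of steps 4 to 6 for the Euclidean distance version, the following minimal sketch computes the risk for one test point. The function names and the Gaussian-shaped kernel are assumptions made for the example; the text does not say which of the standard kernels were used.

```python
import math

def gaussian_kernel(u):
    """Assumed kernel; any standard kernel function could be substituted."""
    return math.exp(-0.5 * u * u)

def knn_risk(train_x, train_y, x, k, kernel=gaussian_kernel):
    """Risk of the positive class (label 1) at point x, Euclidean version."""
    # k nearest neighbours of the test point, as (distance, label) pairs.
    nearest = sorted((math.dist(p, x), y) for p, y in zip(train_x, train_y))[:k]
    d_max = nearest[-1][0] or 1.0          # step 4 (guard against zero distance)
    membership = {0: 0.0, 1: 0.0}
    for d, y in nearest:                   # step 5: kernel-weighted votes
        membership[y] += kernel(d / d_max)
    total = membership[0] + membership[1]
    return membership[1] / total           # step 6

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_risk(train_x, train_y, (5.2, 5.1), k=3))
```

With a kernel that vanishes at 1, the farthest of the k neighbours would receive zero weight; the Gaussian-shaped kernel here avoids that edge case for k = 1.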
We tested the KNN versions, which differ by:
- The number of nearest neighbors, which is varied between 1 and 20;
- The set of input features;
- One of the three distances: Euclidean distance, adaptive distance and Fisher's distance;
- The kernel function for adaptive distance transformation;
- The kernel functions for voting.
2.4.2 Decision tree
The decision tree approach is a method that constructs a tree-like structure, which can be used to choose between several courses of action. Binary decision trees are used in this study. The decision tree is comprised of nodes and leaves. Every node can have a child node. If a node has no child node, it is called a leaf or a terminal node. Any decision tree contains one root node, which has no parent node. Each non-terminal node calculates its own Boolean expression (with the value 'true' or 'false'). According to the result of this calculation, the decision for a given sample would be delegated to the left child node ('true') or to the right child node ('false'). Each leaf (terminal node) has a label which shows how many samples of the training set belong to each class. The probability of each class is estimated as a ratio of the number of samples in this class to the total number of samples in the leaf.
There are many methods for developing a decision tree (Rokach & Maimon, 2010; Quinlan, 1987; Breiman, Friedman, Stone, & Olshen, 1984; Gelfand, Ravishankar, & Delp, 1991; Dietterich, Kearns, & Mansour, 1996; Kearns & Mansour, 1999; Sofeikov, Tyukin, Gorban, Mirkes, Prokhorov, & Romanenko, 2014). We use the methods based on information gain, Gini gain, and DKM gain. Let us consider one node and one binary input attribute which can take values 0 or 1. Let us use the notation: $N$ is the number of cases in the node, $C$ is the number of categories of the target feature, $n_{ij}$ is the number of $i$th category cases with input attribute value $j$ in the node, the number of $i$th category cases in the node is $n_{i\cdot} = \sum_j n_{ij}$, the number of cases with the input attribute value $j$ in the node is $n_{\cdot j} = \sum_i n_{ij}$, $n_j = (n_{1j}, \dots, n_{Cj})$ is the vector of frequencies with input attribute value $j$, and $n = (n_{1\cdot}, \dots, n_{C\cdot})$ is the vector of frequencies with any input attribute value. To form a tree we select the base function for the information criterion among

$$\mathrm{Entropy}(n, N) = -\sum_{i=1}^{C} \frac{n_i}{N} \log \frac{n_i}{N},$$

$$\mathrm{Gini}(n, N) = 1 - \sum_{i=1}^{C} \left(\frac{n_i}{N}\right)^2,$$

and

$$\mathrm{DKM}(n, N) = 2\sqrt{\frac{n_1}{N} \cdot \frac{n_2}{N}},$$

where $n$ is the vector of frequencies and $N$ is the sum of the elements of vector $n$. The DKM can be applied to a binary target feature only. The value of the criterion is the gain of the base function:

$$\mathrm{Gain} = \mathrm{Base}(n, N) - \frac{n_{\cdot 0}}{N}\,\mathrm{Base}(n_0, n_{\cdot 0}) - \frac{n_{\cdot 1}}{N}\,\mathrm{Base}(n_1, n_{\cdot 1}).$$
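The three base functions and the gain criterion can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and the base-2 logarithm in the entropy is an assumption (the base only rescales the information gain and does not change which split is best).

```python
import math

def entropy(n):
    """Entropy base function for a vector of class frequencies n."""
    total = sum(n)
    return -sum(c / total * math.log2(c / total) for c in n if c > 0)

def gini(n):
    """Gini base function."""
    total = sum(n)
    return 1.0 - sum((c / total) ** 2 for c in n)

def dkm(n):
    """DKM base function; defined for a binary target only."""
    total = sum(n)
    return 2.0 * math.sqrt(n[0] / total * n[1] / total)

def gain(base, n0, n1):
    """Gain of a binary split; n0 and n1 are the class-frequency vectors
    for input attribute values 0 and 1."""
    size0, size1 = sum(n0), sum(n1)
    size = size0 + size1
    result = base([a + b for a, b in zip(n0, n1)])
    if size0:
        result -= size0 / size * base(n0)
    if size1:
        result -= size1 / size * base(n1)
    return result

# A split that separates the two classes perfectly attains the maximal gain.
print(gain(entropy, [10, 0], [0, 10]), gain(gini, [10, 0], [0, 10]), gain(dkm, [10, 0], [0, 10]))
```

A split that leaves the class proportions unchanged in both branches has zero gain under all three base functions.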
There are several approaches to using real-valued inputs in decision trees. A commonly used approach is to bin each real-valued attribute before forming the tree. In this study we implemented 'on the fly' binning: the best threshold is searched in each node for each real-valued attribute, and this threshold is then used to bin the feature in this node. The best threshold depends on the split criterion used (information gain, Gini gain, or DKM gain).
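A minimal sketch of the 'on the fly' threshold search for one real-valued attribute in one node follows. It uses Gini gain for a binary target; the function names and the choice of candidate thresholds (midpoints between consecutive distinct values) are assumptions, since the text does not specify them.

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts) if total else 0.0

def best_threshold(values, labels):
    """Return (threshold, gain) maximising Gini gain for the split value <= threshold.
    Labels are 0 or 1. Returns (None, 0.0) if no split improves on the parent node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    right = [0, 0]
    for _, y in pairs:
        right[y] += 1
    parent = gini(right)
    left = [0, 0]
    best = (None, 0.0)
    for i in range(n - 1):
        v, y = pairs[i]
        left[y] += 1                      # move one case from the right to the left bin
        right[y] -= 1
        nxt = pairs[i + 1][0]
        if nxt == v:                      # a cut is only possible between distinct values
            continue
        g = parent - (i + 1) / n * gini(left) - (n - i - 1) / n * gini(right)
        if g > best[1]:
            best = ((v + nxt) / 2.0, g)
    return best

print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))
```

Swapping `gini` for another base function changes which threshold is selected, which is why the best threshold depends on the split criterion.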
Another possibility we employ is usage of Fisher’s discriminant to define the best linear combinations of the real valued features (Fisher, 1936) in each node. The pruning techniques are applied to improve the tree.
The specified minimal number of instances in the tree’s leaf is used as a criterion to stop node splitting. Each leaf of the tree cannot contain fewer instances than a specified number.
For the case study we tested the decision trees, which differ by:
- The three split criteria (information gain, Gini gain or DKM gain);
- The use of the real-valued features in the splitting criteria separately or in linear combination by Fisher's discriminant;
- The set of the input features;
- The minimal number of instances in the leaf, which varied between 3 and 30.
2.4.3 Linear discriminant analysis
We used Fisher's linear discriminant for the binary version of the problem (Fisher, 1936). We calculate the mean of points of the $i$th class $\mu_i$ and the covariance matrix of the $i$th class $S_i$ for both classes. Then we calculate the discriminating direction as

$$\omega = (S_1 + S_2)^{-1}(\mu_1 - \mu_2).$$

Each point $x$ is projected onto the discriminating direction by calculation of the dot product $(\omega, x)$. The threshold to separate the two classes is calculated by finding the maximum of relative information gain, Gini gain, or DKM gain. This method cannot be used for problems of risk evaluation.

For the case study we tested LDA versions which differ by the criterion (information gain, Gini gain or DKM gain) used to define the threshold and by the set of input features.
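The discriminating direction and the projection can be sketched with NumPy as follows. The function names are hypothetical, and the example assumes at least two input features and non-singular class covariance matrices.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminating direction for two classes given as row-wise samples."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)   # S_1 + S_2
    return np.linalg.solve(S, mu1 - mu2)                       # (S_1 + S_2)^{-1} (mu_1 - mu_2)

def project(X, w):
    """Dot product of each point with the discriminating direction."""
    return np.asarray(X, float) @ w

X1 = [[0, 0], [1, 1], [0, 1], [1, 0]]
X2 = [[5, 0], [6, 1], [5, 1], [6, 0]]
w = fisher_direction(X1, X2)
print(project(X1, w), project(X2, w))
```

The one-dimensional projections can then be passed to a threshold search using one of the three gain criteria.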
2.4.4 Gaussian mixture
Gaussian mixture is a method to estimate the probability under the assumption that each category of the target feature has a multivariate normal distribution (Dinov, 2008). For each category we should estimate the covariance matrix and invert it. The primary probability of belonging to the $i$th category is:

$$P_i(x) = p_i\,(2\pi)^{-d/2}\,|\Sigma_i|^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu_i)^{T}\,\Sigma_i^{-1}\,(x - \mu_i)\right),$$

where $p_i$ is the prior probability of the $i$th category, $d$ is the dimension of the input space, $\mu_i$ is the mean point of the $i$th category, $x$ is the tested point, $\Sigma_i$ is the covariance matrix of the $i$th category and $|\Sigma_i|$ is its determinant. The final probability of belonging to category $i$ is calculated as

$$P_i(x) \Big/ \sum_j P_j(x).$$

The prior probabilities are estimated as fractions of the $i$th category cases among all cases. For the binary problem, we also used a varied multiplier to correct the priors.
For the case study, we tested the Gaussian mixtures, which differ by the set of input features and corrections of prior probabilities.
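A short sketch of the probability calculation for one test point is given below. The function name is hypothetical, the class means, covariance matrices and priors are assumed to be already estimated, and the prior correction multiplier is not included.

```python
import numpy as np

def gaussian_class_probabilities(x, means, covs, priors):
    """Final probabilities of each category at point x under class-wise normal densities."""
    x = np.asarray(x, float)
    d = x.size
    primary = []
    for mu, cov, prior in zip(means, covs, priors):
        cov = np.asarray(cov, float)
        diff = x - np.asarray(mu, float)
        maha = diff @ np.linalg.solve(cov, diff)          # (x - mu)^T cov^{-1} (x - mu)
        norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(cov) ** -0.5
        primary.append(prior * norm * np.exp(-0.5 * maha))
    primary = np.array(primary)
    return primary / primary.sum()                         # normalise over categories

means = [[0.0, 0.0], [4.0, 0.0]]
covs = [np.eye(2), np.eye(2)]
print(gaussian_class_probabilities([2.0, 0.0], means, covs, [0.5, 0.5]))
```

At a point equidistant from two classes with equal covariance and equal priors the two probabilities coincide, which gives a quick sanity check.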
2.4.5 Probability density function estimation
We implemented the radial-basis functions method (Buhmann, 2003) for probability density function estimation (Scott, 1992).

The number of probability densities to estimate is equal to the number of categories of the target feature. Each probability density function is estimated separately by using nonparametric techniques. The prior probabilities are estimated from the database: $p_i = n_i/N$, where $n_i$ is the number of cases with the $i$th category of the target feature and $N$ is the total number of cases in the database.

For each data point, the k nearest neighbors from the database are defined. These k points are used to estimate the radius of the neighborhood as the maximum of the distances from the data point to each of the k neighbors. The centre of one kernel function is placed in the data point (Li & Racine, 2007). The integral of any kernel function over the whole space is equal to 1. The integral of the sum of the kernel functions of the $i$th category is equal to $n_i$, but the total probability of each category has to be equal to the prior probability $p_i$. It means that the sum of kernel functions has to be divided by $N$. The probability of each category at an arbitrary point is estimated as the sum of the values of the kernel functions placed in the data points that correspond to records of this category, divided by $N$.

The following steps are used to evaluate the risk of each category: (i) the probability functions for all categories are estimated and (ii) the risk of each category is defined as the ratio of the probability of this category to the sum of all probabilities.
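The procedure can be illustrated for a single real-valued input. This is a simplified sketch, not the implementation used in the study: the function name is hypothetical, the Epanechnikov kernel is an assumed choice, the input is one-dimensional, and data points are assumed distinct enough that every neighbourhood radius is positive.

```python
def pdfe_risk(xs, ys, x, k=3):
    """Risk of class 1 at x from adaptive-radius kernels placed at each data point.
    Returns None where no kernel covers x (both estimated densities are zero)."""
    n = len(xs)
    density = {0: 0.0, 1: 0.0}
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        # Radius: distance from xj to its k-th nearest other point in the database.
        radius = sorted(abs(xj - xi) for i, xi in enumerate(xs) if i != j)[k - 1]
        u = (x - xj) / radius
        if abs(u) < 1.0:
            # Epanechnikov kernel scaled to integrate to 1, then divided by N.
            density[yj] += 0.75 * (1.0 - u * u) / radius / n
    total = density[0] + density[1]
    return density[1] / total if total > 0 else None

xs = [0, 1, 2, 3, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(pdfe_risk(xs, ys, 1.5, k=2), pdfe_risk(xs, ys, 11.5, k=2))
```

Because each class density is the class's kernel sum divided by N, the class priors are already built into the densities and the risk is a plain ratio.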
We tested the PDFE versions which differ by:
- The number of the nearest neighbors (it is varied between 5 and 30);
- The set of the input features;
- The kernel function which was placed in each data point.
2.4.6 Logistic regression
We implemented the weighted version of logistic regression (Hosmer & Lemeshow, 2004). This method can be used for binary problems only. The log likelihood estimation of the regression coefficients is used. The weights of categories are defined as the fractions of $i$th category cases among all cases. Logistic regression gives only one result because there is no option to customize the method except the set of input features.
2.4.7 Naïve Bayes
We implemented the standard version of naïve Bayes (Russell & Norvig, 1995). All attributes which contained at most 20 different values were interpreted as categorical and the standard contingency tables were calculated for such attributes. The calculated contingency tables are used to estimate the conditional probabilities. Attributes which contained more than 20 different values were interpreted as continuous. For continuous attributes the mean and the variance were calculated instead of the contingency tables. For each value of the output attribute we calculated a separate mean and variance. The conditional probability of a specified outcome $o$ and a specified value $x$ of the attribute was calculated as the value of the probability density function of the normal distribution at point $x$ with the mean and variance calculated for outcome $o$. This method has no customizable options and is tested on different sets of input features.
2.4.8 Random forest

Random forests are proposed by Breiman (2000) for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data (Biau, 2012). "Random forests is a classifier consisting of a collection of tree structured classifiers $\{h(x, \Theta_k), k = 1, \dots\}$ where the $\{\Theta_k\}$ are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input $x$" (Breiman, 2001).

In random forest, each tree is constructed using a different bootstrap sample from the original data (Hastie, Tibshirani, & Friedman, 2009). In standard trees, each node is split using the best split among all variables. In a random forest, each node is split using the best among a subset of predictors randomly chosen at that node (Liaw & Wiener, 2002).

Random forests try to improve on bagging by 'de-correlating' the trees. Each tree has the same expectation (Hastie, Tibshirani, & Friedman, 2009).

The forest error rate depends on two things (Breiman, 2001). The first is the correlation between any two trees in the forest. Increasing the correlation increases the forest error rate. The second is the strength of each individual tree in the forest. A tree with a low error rate is a strong classifier. Increasing the strength of the individual trees decreases the forest error rate. Random Forest algorithm builds hundreds of decision trees and combines them into a single model (Williams, 2011).

2.4.9 Criterion of the best method selection

A number of different criteria exist for selection of the best classifier. In this study for binary classification, the sensitivity and specificity were employed as the primary criteria, and the sum of the sensitivity and specificity as the integral criterion. This criterion selects the classifier furthest from the 'completely random guess' classifier. Classifiers with sensitivity or specificity less than 50% were not considered.

There are several approaches to test the quality of a classifier: usage of an isolated test set, n-fold cross validation and Leave-One-Out Cross Validation (LOOCV) (Arlot & Celisse, 2010). LOOCV is used for all tests in this study. There are some problems with classifier quality estimation for techniques like decision tree and random forest. These problems are considered in detail by Hastie et al. (2009).

3 Results

3.1 Descriptive statistics

The descriptive statistics for the five factors are presented in Table 2: means, standard deviations, normative data (means and standard deviations in the population following McCrae & Costa (2004)), 95% confidence intervals, kurtoses (kurtosis is a measure of flatness/'peakedness' of the distribution shape compared to the normal distribution) and 'skewnesses' (skewness is a measure of asymmetry of the distribution) for NEO-FFI-R for the full sample.
Table 2. Descriptive statistics (Means, sample Standard Deviations (SD), Confidence Intervals (CI), Normatives, Kurtoses, Skewnesses) for NEO-FFI-R for raw data

| Factors | Mean | SD | Normative Mean | Normative SD | 95% CI for Mean | Kurtosis | Skewness |
|---|---|---|---|---|---|---|---|
| Neuroticism | 23.92 | 9.14 | 16.83 | 7.36 | 23.51, 24.33 | -0.55 | 0.11 |
| Extraversion | 27.58 | 6.77 | 29.29 | 6.46 | 27.27, 27.88 | 0.06 | -0.27 |
| Openness | 33.76 | 6.58 | 31.29 | 6.12 | 33.47, 34.05 | -0.27 | -0.30 |
| Agreeableness | 30.87 | 6.44 | 32.41 | 5.42 | 30.58, 31.16 | 0.13 | -0.26 |
| Conscientiousness | 29.44 | 6.97 | 33.26 | 6.3 | 29.12, 29.75 | -0.17 | -0.38 |

Pearson's correlation coefficient (PCC) r is employed as a measure of the strength of a linear association between two factors. PCC for all pairs of factors are presented in Table 3. Two pairs of factors have no significant correlation: (1) Neuroticism and Openness (r=0.017 p=0.471); (2) Agreeableness and Openness (r=0.033 p=0.155). However, all other pairs of personality factors are significantly correlated in the sample.

Table 3. PCC for NEO-FFI-R for raw data

| Factors | Neuroticism | Extraversion | Openness | Agreeableness | Conscientiousness |
|---|---|---|---|---|---|
| Neuroticism | | -0.432** | 0.017 | -0.215** | -0.398** |
| Extraversion | -0.432** | | 0.236** | 0.159** | 0.318** |
| Openness | 0.017 | 0.236** | | 0.033 | -0.060* |
| Agreeableness | -0.215** | 0.159** | 0.033 | | 0.249** |
| Conscientiousness | -0.398** | 0.318** | -0.060* | 0.249** | |

Note: p-value is the probability to observe by chance the same or greater correlation coefficient if data uncorrelated: * p<0.01; ** p<0.001.

3.2 Comparison of mean personality traits for drug users and non-users

Table 4 demonstrates the mean T-score_sample NEO-FFI-R factors for users and non-users for each drug. Significant differences of personality factor scores exist between these groups. The universal relationship between personality profile and risk of drug consumption can generally be described as follows: an increase in scores of Neuroticism and Openness entails an increase in risk of use, whereas an increase in the scores of Agreeableness and Conscientiousness entails a decrease in risk of use. Thus for each drug, drug users scored higher on Neuroticism and Openness, and lower on Agreeableness and Conscientiousness when compared to drug non-users. The influence of the score of Extraversion is drug specific (non-universal).
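As a worked check, the 95% confidence interval for the mean Neuroticism score in Table 2 can be reproduced from the reported mean, SD and the sample size of 1885. The normal approximation with z = 1.96 is an assumption here; the text does not state how the intervals were computed, and other rows agree only up to rounding of the reported mean and SD.

```python
import math

def mean_ci95(mean, sd, n, z=1.96):
    """Normal-approximation 95% confidence interval for a sample mean."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = mean_ci95(23.92, 9.14, 1885)   # Neuroticism row of Table 2
print(round(low, 2), round(high, 2))        # prints 23.51 24.33
```

The same formula explains why the intervals in Table 4 are much wider for small groups, such as the 68 alcohol non-users, than for large ones.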
Table 4. Mean T-score_sample and 95% CI for T-score_sample mean for groups of Users and Non-users. P-value is the probability to observe by chance the same or greater differences between means if the means are the same. The green background corresponds to p-value less than 0.001, the yellow background corresponds to p-value between 0.001 and 0.01, the blue background corresponds to p-value between 0.01 and 0.05, the red background corresponds to p-value between 0.05 and 0.1 and the white background corresponds to p-value greater than 0.1 which is non-significant.

| Drug (# users / # non-users) | Factor | User mean T-score | User 95% CI for mean | Non-user mean T-score | Non-user 95% CI for mean | p-value |
|---|---|---|---|---|---|---|
| Alcohol (1817 / 68) | N | 50.13 | 49.67, 50.59 | 48.19 | 45.77, 50.61 | 0.116 |
| | E | 50.06 | 49.60, 50.52 | 50.04 | 47.61, 52.42 | 0.988 |
| | O | 50.04 | 49.58, 50.51 | 48.81 | 46.45, 51.17 | 0.318 |
| | A | 49.93 | 49.47, 50.39 | 52.51 | 50.26, 54.77 | 0.036 |
| | C | 49.94 | 49.48, 50.40 | 53.31 | 51.05, 55.56 | 0.006 |
| Amphetamines (679 / 1206) | N | 51.71 | 50.95, 52.46 | 49.14 | 48.58, 49.69 | <0.001 |
| | E | 49.71 | 48.89, 50.53 | 50.26 | 49.72, 50.80 | 0.251 |
| | O | 53.05 | 52.34, 53.77 | 48.28 | 47.72, 48.84 | <0.001 |
| | A | 48.39 | 47.60, 49.18 | 50.94 | 50.39, 51.48 | <0.001 |
| | C | 47.04 | 46.29, 47.80 | 51.76 | 51.22, 52.30 | <0.001 |
| Amyl nitrite (370 / 1515) | N | 50.78 | 49.78, 51.79 | 49.89 | 49.37, 50.39 | 0.122 |
| | E | 50.97 | 49.95, 51.99 | 49.84 | 49.33, 50.35 | 0.052 |
| | O | 51.45 | 50.47, 52.43 | 49.65 | 49.14, 50.15 | 0.002 |
| | A | 48.69 | 47.65, 49.72 | 50.35 | 49.84, 50.85 | 0.004 |
| | C | 48.08 | 46.10, 49.07 | 50.54 | 50.04, 51.05 | <0.001 |
| Benzodiazepines (679 / 1206) | N | 52.83 | 52.12, 53.54 | 48.15 | 47.59, 48.72 | <0.001 |
| | E | 49.07 | 48.31, 49.83 | 50.74 | 50.19, 51.30 | <0.001 |
| | O | 52.66 | 51.97, 53.34 | 48.17 | 47.59, 48.75 | <0.001 |
| | A | 48.28 | 47.53, 49.02 | 51.22 | 50.67, 51.78 | <0.001 |
| | C | 47.70 | 46.98, 48.41 | 51.69 | 51.12, 52.25 | <0.001 |
| Cannabis (1265 / 620) | N | 51.08 | 50.52, 51.65 | 47.98 | 47.25, 48.71 | <0.001 |
| | E | 49.75 | 49.17, 50.33 | 50.70 | 49.98, 51.41 | 0.053 |
| | O | 52.48 | 51.96, 52.99 | 44.95 | 44.20, 45.70 | <0.001 |
| | A | 48.84 | 48.28, 49.40 | 52.42 | 51.69, 53.15 | <0.001 |
| | C | 48.15 | 47.60, 48.70 | 53.92 | 53.27, 54.65 | <0.001 |
| Chocolate (1850 / 35) | N | 50.06 | 49.60, 50.51 | 50.29 | 46.36, 54.21 | 0.894 |
| | E | 50.05 | 49.59, 50.51 | 50.80 | 47.71, 53.89 | 0.660 |
| | O | 50.05 | 49.59, 50.51 | 47.37 | 44.42, 50.32 | 0.117 |
| | A | 50.05 | 49.59, 50.50 | 48.66 | 44.61, 52.70 | 0.416 |
| | C | 50.03 | 49.58, 50.49 | 51.57 | 47.87, 55.27 | 0.366 |
| Cocaine (687 / 1198) | N | 51.85 | 51.09, 52.60 | 49.04 | 48.48, 49.59 | <0.001 |
| | E | 50.33 | 49.55, 51.12 | 49.91 | 49.35, 50.46 | 0.374 |
| | O | 52.60 | 51.89, 53.30 | 48.51 | 47.94, 49.08 | <0.001 |
| | A | 47.71 | 46.93, 48.50 | 51.34 | 50.81, 51.88 | <0.001 |
| | C | 47.49 | 46.76, 48.22 | 51.53 | 50.98, 52.09 | <0.001 |
| Caffeine (1848 / 37) | N | 50.06 | 49.61, 50.52 | 50.08 | 46.59, 53.57 | 0.991 |
| | E | 50.13 | 49.67, 50.59 | 46.76 | 43.82, 49.69 | 0.0429 |
| | O | 50.11 | 49.65, 50.57 | 44.59 | 41.67, 47.52 | <0.001 |
| | A | 49.99 | 49.54, 50.45 | 51.11 | 47.71, 54.51 | 0.504 |
| | C | 49.99 | 49.53, 50.44 | 53.59 | 50.79, 56.40 | 0.029 |
| Crack (191 / 1694) | N | 53.08 | 51.64, 54.52 | 49.72 | 49.25, 50.20 | <0.001 |
| | E | 48.80 | 47.33, 50.27 | 50.20 | 49.73, 50.68 | 0.066 |
| | O | 52.91 | 51.60, 54.21 | 49.67 | 49.19, 50.15 | <0.001 |
| | A | 47.02 | 45.46, 48.58 | 50.36 | 49.89, 50.83 | <0.001 |
| | C | 46.19 | 44.70, 47.68 | 50.50 | 50.03, 50.96 | <0.001 |
| Ecstasy (751 / 1134) | N | 51.30 | 50.58, 52.02 | 49.24 | 48.67, 49.82 | <0.001 |
| | E | 50.56 | 49.80, 51.31 | 49.73 | 49.17, 50.30 | 0.081 |
| | O | 53.62 | 52.96, 54.28 | 47.60 | 47.03, 48.17 | <0.001 |
| | A | 48.49 | 47.75, 49.23 | 51.03 | 50.47, 51.60 | <0.001 |
| | C | 47.30 | 46.59, 48.00 | 51.89 | 51.33, 52.45 | <0.001 |
| Heroin (212 / 1673) | N | 54.60 | 53.29, 55.92 | 49.49 | 49.01, 49.96 | <0.001 |
| | E | 48.42 | 46.94, 49.90 | 50.27 | 49.80, 50.74 | 0.011 |
| | O | 54.25 | 53.04, 55.47 | 49.46 | 48.98, 49.94 | <0.001 |
| | A | 45.53 | 44.00, 47.06 | 50.59 | 50.12, 51.05 | <0.001 |
| | C | 45.91 | 44.55, 47.26 | 50.59 | 50.11, 51.06 | <0.001 |
| Ketamine (350 / 1535) | N | 51.42 | 50.40, 52.43 | 49.75 | 49.25, 50.26 | 0.005 |
| | E | 50.36 | 49.23, 51.48 | 49.99 | 49.50, 50.49 | 0.542 |
| | O | 53.87 | 52.90, 54.84 | 49.12 | 48.62, 49.62 | <0.001 |
| | A | 47.80 | 46.67, 48.94 | 50.53 | 50.04, 51.01 | <0.001 |
| | C | 46.86 | 45.81, 47.92 | 50.79 | 50.30, 51.28 | <0.001 |
| Legal highs (762 / 1123) | N | 51.49 | 50.77, 52.22 | 49.09 | 48.52, 49.66 | <0.001 |
| | E | 49.71 | 48.94, 50.47 | 50.30 | 49.75, 50.86 | 0.206 |
| | O | 54.30 | 53.67, 54.92 | 47.08 | 46.51, 47.65 | <0.001 |
| | A | 48.61 | 47.86, 49.36 | 50.98 | 50.42, 51.53 | <0.001 |
| | C | 46.98 | 46.27, 47.72 | 52.14 | 51.60, 52.68 | <0.001 |
| LSD (557 / 1328) | N | 50.87 | 50.05, 51.69 | 49.72 | 49.18, 50.26 | 0.023 |
| | E | 50.07 | 49.16, 50.98 | 50.06 | 49.54, 50.58 | 0.986 |
| | O | 55.25 | 54.54, 55.96 | 47.80 | 47.27, 48.33 | <0.001 |
| | A | 48.44 | 47.57, 49.31 | 50.68 | 50.16, 51.21 | <0.001 |
| | C | 47.54 | 46.71, 48.36 | 51.12 | 50.59, 51.65 | <0.001 |
| Methadone (417 / 1468) | N | 53.41 | 52.47, 54.35 | 49.11 | 48.61, 49.62 | <0.001 |
| | E | 47.97 | 46.88, 49.05 | 50.66 | 50.17, 51.15 | <0.001 |
| | O | 53.77 | 52.86, 54.69 | 48.93 | 48.42, 49.44 | <0.001 |
| | A | 47.07 | 46.03, 48.10 | 50.86 | 50.37, 51.35 | <0.001 |
| | C | 46.21 | 45.22, 47.20 | 51.15 | 50.66, 51.64 | <0.001 |
| Magic Mushrooms (694 / 1191) | N | 50.73 | 49.98, 51.47 | 49.67 | 49.10, 50.24 | 0.027 |
| | E | 50.15 | 49.35, 50.96 | 50.01 | 49.47, 50.55 | 0.765 |
| | O | 54.36 | 53.71, 55.01 | 47.46 | 46.90, 48.02 | <0.001 |
| | A | 48.55 | 47.78, 49.32 | 50.88 | 50.33, 51.43 | <0.001 |
| | C | 47.54 | 46.81, 48.27 | 51.53 | 50.97, 52.08 | <0.001 |
| Nicotine (1264 / 621) | N | 50.97 | 50.41, 51.52 | 48.22 | 47.45, 48.99 | <0.001 |
| | E | 49.98 | 49.41, 50.54 | 50.24 | 49.48, 50.99 | 0.599 |
| | O | 51.47 | 50.92, 52.01 | 47.01 | 46.26, 47.77 | <0.001 |
| | A | 49.20 | 48.65, 49.75 | 51.69 | 50.92, 52.46 | <0.001 |
| | C | 48.64 | 48.08, 49.20 | 52.95 | 52.24, 53.67 | <0.001 |
| VSA (230 / 1655) | N | 52.88 | 51.57, 54.20 | 49.67 | 49.19, 50.15 | <0.001 |
| | E | 48.96 | 47.45, 50.47 | 50.22 | 49.74, 50.69 | 0.075 |
| | O | 54.20 | 53.00, 55.41 | 49.42 | 48.93, 49.90 | <0.001 |
| | A | 47.30 | 45.92, 48.68 | 50.40 | 49.92, 50.87 | <0.001 |
| | C | 45.22 | 43.88, 46.56 | 50.73 | 50.26, 51.20 | <0.001 |
The introduction of moderate subcategories of T-score_sample enables the separation of drugs into five groups, as presented in Table 5. Each drug can be described by five moderate subcategories, which form its moderate profile (N, E, O, A, C). Firstly, the group with the profile (0, 0, 0, 0, 0) includes the users of the licit drugs alcohol, chocolate and caffeine. Thus, T-score_sample for all factors for licit drug consumers does not significantly differ from the sample mean. Secondly, the group of drugs with the profile (0, 0, +, −, −) includes the users of amyl nitrite, LSD, and magic mushrooms. Thirdly, nicotine users form their own group with the profile (0, 0, +, 0, −). Fourthly, the largest group of drug users with the profile (+, 0, +, −, −) includes the users of amphetamines, benzodiazepines, cannabis, cocaine, ecstasy, ketamine and legal highs. Finally, the group with the profile (+, −, +, −, −) includes the users of crack, heroin, VSA and methadone.
Table 5. Moderate subcategories of T-score_sample with respect to the sample mean for groups of users. The white background corresponds to a neutral score (0), the green background corresponds to moderately high score (+) and a pink background corresponds to moderately low score (−).

| Drugs | Neuroticism | Extraversion | Openness | Agreeableness | Conscientiousness |
|---|---|---|---|---|---|
| Alcohol, Chocolate, and Caffeine | 0 | 0 | 0 | 0 | 0 |
| Amyl nitrite, LSD, and Magic Mushrooms | 0 | 0 | + | − | − |
| Nicotine | 0 | 0 | + | 0 | − |
| Amphetamines, Benzodiazepines, Cannabis, Cocaine, Ecstasy, Ketamine, and Legal highs | + | 0 | + | − | − |
| Crack, Heroin, VSA, and Methadone | + | − | + | − | − |
Table 5 demonstrates that for all drugs the mean values of Neuroticism and Openness for groups of users are neutral (0) or moderately high (+). Groups of users of illicit drugs can be seen as moderately low (−) on Agreeableness and Conscientiousness. Groups of licit drug users (alcohol, chocolate, caffeine, and nicotine) have neutral (0) scores of Agreeableness and Conscientiousness, apart from nicotine users, who have moderately low (−) scores of Conscientiousness. For the groups of users of crack, heroin, VSA and methadone, the score of Extraversion is moderately low (−). However, for groups of users of other drugs, the score of Extraversion is neutral (0).
Table 5 is based on the size of the differences between the mean T-score_sample for users and the sample mean. Table 6 represents the groups of drugs with the same set of T-score_sample which significantly differ from the sample mean. The difference is considered as significant if the p-value is less than 0.01. Three out of six groups correspond to licit drugs. Chocolate has no significant differences for all factors. Alcohol has a significant difference only for Conscientiousness, and caffeine has a significant difference only for Openness. Amyl nitrite, LSD, and magic mushrooms form a group containing significant differences in relation to three factors: Openness, Agreeableness, and Conscientiousness. The next group contains amphetamines, cannabis, cocaine, crack, ecstasy, heroin, ketamine, legal highs, nicotine, and VSA, and has significant differences in relation to the following four factors: Neuroticism, Openness, Agreeableness, and Conscientiousness. The last group is formed by users of two drugs: benzodiazepines and methadone. The groups of benzodiazepines and methadone users differ significantly from the sample mean in all factors.
Table 6. Profile of significantly different means for groups of Users and Non-users. Green background and symbol 'X' correspond to significant deviations with p-value less than 0.01

| Drugs | Neuroticism | Extraversion | Openness | Agreeableness | Conscientiousness |
|---|---|---|---|---|---|
| Chocolate | | | | | |
| Alcohol | | | | | X |
| Caffeine | | | X | | |
| Amyl nitrite, LSD and magic Mushrooms | | | X | X | X |
| Amphetamines, Cannabis, Cocaine, Crack, Ecstasy, Heroin, Ketamine, Legal highs, Nicotine and VSA | X | | X | X | X |
| Benzodiazepines and Methadone | X | X | X | X | X |

Graphs of mean values of factors scores for groups of drugs defined in Table 5 are presented in Figure 5. A single drug was chosen to be plotted for each group, due to the fact that the shapes of all drugs in the same group are very similar. Nicotine is not plotted, since the scale scores for the group 'Nicotine User' are similar to the second group consisting of amyl nitrite, LSD and magic mushrooms. Figure 5 represents T-score graphs of personality factors for groups of users and non-users with respect to the population norm mean (the left column) and with respect to the sample mean (i.e. the right column) for alcohol, LSD, cannabis, and heroin.

Figure 4. Average personality profiles for groups of users and non-users in T-scores with respect to the population norm mean (left column) and T-score_sample with respect to the sample means (right column) for: a) & b) Alcohol. (continued on the next page)

Figure 5 (continuation). Average personality profiles for groups of users and non-users in T-scores with respect to the population norm mean (left column) and T-score_sample with respect to the sample means (right column) for: c) & d) LSD, e) & f) Cannabis, and g) & h) Heroin.
The usage
Table 13 correlationfrom a tochance themploy a significantechniqueused to ccorrelationBH step-ucoefficien
However,example, p-value isconsidere| | 0.4.can be covery stron
Figure 6 dhighs, LScorrelationketamine
Crack, bethree otheor weakly
3.3 Correlation between different drug usage

The usage (use or non-use) of each drug is considered as a single Boolean feature.

Table 13 shows PCCs, which are computed between the drug usage pairs. The majority of the correlations are significant, due to the fact that the sample size is 1885. 121 pairs of drug usages of a total of 153 pairs have p-values less than 0.01 (the p-value is the probability of observing by chance the same or a greater correlation coefficient for uncorrelated variables). In reality, a multi-testing approach has to be used when testing 153 pairs of drug usages in order to estimate the significance of the correlation (Benjamini & Hochberg, 1995). We apply the most conservative technique, the Bonferroni correction, and the BH step-up procedure (Benjamini & Hochberg, 1995), which controls the False Discovery Rate (FDR), to estimate the genuine significance of these correlations. 115 correlation coefficients are significant with a Bonferroni corrected p-value of 0.001. The BH step-up procedure with an FDR threshold of 0.01 defines 127 significant correlation coefficients.

However, a significant correlation does not necessarily imply a strong association or causality. For example, the correlation coefficient for alcohol usage and amyl nitrite usage is significant (i.e. the p-value is equal to 0.0005), but the value of this coefficient is equal to 0.074, and thus it cannot be considered as an important association. We consider correlations with absolute values of PCC $|r| > 0.4$. Figure 6 sets out all significant identified correlations greater than 0.4. The correlation is considered weak if $|r| \le 0.4$; medium if $0.4 < |r| \le 0.45$; strong if $0.45 < |r| \le 0.5$; and very strong if $|r| > 0.5$.

Figure 6 demonstrates that the usage of amphetamines, cannabis, cocaine, ecstasy, ketamine, legal highs, LSD, and magic mushrooms correlates with all other drugs in the same group, excluding the correlations between cannabis and ketamine usage ($r = 0.302$) and between legal highs and ketamine usage ($r = 0.393$).

Crack, benzodiazepines, heroin, methadone, and nicotine usages are correlated with one, two, or three other drug usages. Alcohol, amyl nitrite, chocolate, caffeine and VSA usage is uncorrelated or weakly correlated with all other drug use.

Relative Information Gain (RIG) is widely used in data mining to measure dependence between attributes (Mitchell, 1997). The greater the value of RIG, the stronger is the indicated correlation. RIG is zero for independent attributes. RIG is not symmetric. It is a measure of mutual information, i.e. the value of RIG for drug 1 usage from usage of drug 2 is equal to the fraction of uncertainty (entropy) in drug 1 usage which can be removed if the value of drug 2 usage is known. It is possible to estimate the significance of RIG, but it is likely that (as for the PCC) the majority of RIGs are significant but have small values. Figure 7 presents all pairs with RIG greater than 0.15. Figure 7a demonstrates ‘symmetric’ RIGs; we call RIG(X|Y) symmetric if $|RIG(X|Y) - RIG(Y|X)| / \min\{RIG(X|Y), RIG(Y|X)\} < 0.2$. Figure 7b demonstrates asymmetric RIG. It can be seen in Figure 7a that the usages in a group consisting of amphetamines, cannabis, cocaine, ecstasy, legal highs, LSD and magic mushrooms are each correlated with each other. This group is the same as that in Figure 6 except for ketamine usage. Asymmetric RIGs illustrate a pattern significantly different from Figure 6.
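The RIG definition and the symmetry criterion described above can be sketched in a few lines of Python. This is a minimal illustration, assuming Boolean use/non-use vectors and the usual definition RIG(X|Y) = (H(X) − H(X|Y)) / H(X). The function names and the toy vectors are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X|Y): entropy of x inside each group of equal y, weighted by group size."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / len(x) * entropy(g) for g in groups.values())


def rig(x, y):
    """RIG(X|Y): fraction of the entropy of x removed by knowing y."""
    hx = entropy(x)
    return 0.0 if hx == 0 else (hx - conditional_entropy(x, y)) / hx


def is_symmetric(x, y, tol=0.2):
    """Symmetry criterion from the text: relative difference of the two RIGs below tol."""
    a, b = rig(x, y), rig(y, x)
    smallest = min(a, b)
    return smallest > 0 and abs(a - b) / smallest < tol


# Hypothetical use (1) / non-use (0) vectors for three drugs.
drug_a = [1, 1, 1, 1, 0, 0, 0, 0]
drug_b = [1, 1, 1, 0, 0, 0, 0, 0]
rare_drug = [1, 0, 0, 0, 0, 0, 0, 0]

print(rig(drug_a, drug_b), rig(drug_b, drug_a), is_symmetric(drug_a, drug_b))
print(rig(rare_drug, drug_a), rig(drug_a, rare_drug), is_symmetric(rare_drug, drug_a))
```

The mutual information H(X) − H(X|Y) is itself symmetric. The asymmetry of RIG comes only from dividing by the entropy of the target attribute, so pairs in which one drug is much rarer than the other tend to be asymmetric.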
Figure 6. Strong drug usage correlations.

Figure 7. Pairs of drug usages with high relative information gain: A) more or less symmetric RIG and B) significantly asymmetric RIG. In panel B, an arrow from LSD usage to heroin usage, for example, means that knowledge of LSD usage can decrease uncertainty in heroin usage.

3.4 Input feature ranking

The results of the principal variables calculation are presented in Table 7 for CatPCA quantification, and in Table 8 for dummy quantification of nominal features. Table 7 and Table 8 contain the list of attributes in order from best to worst. For comparison, the list of attributes ranked by the principal variables approach is shown in the same table.

The results of implementing sparse PCA are presented in Table 9 and Table 10. As a result of feature selection we can exclude ethnicity from further consideration. There is a more intriguing effect regarding country of location. Only two countries are informative (in our sample): UK and USA. Furthermore, including country alongside the personality measures does not add much to the prediction of drug use. To understand the reasons for the importance of these two countries for the prediction of drug consumption, we compare the statistics for the subsamples: UK – non-UK and USA – non-USA. We calculated the p-value for coinciding distributions of the personality measurements in each subsample. For both divisions into subsamples we obtained the same result: all input features have significantly different distributions, with a 99.9% confidence level, for the UK and non-UK subsamples and likewise for the USA and non-USA subsamples. This means that the UK and non-UK samples are biased, and the same holds for the USA and non-USA samples. We excluded the country variable from further study.
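The text does not state which test produced these p-values. As one possible illustration, the sketch below compares a single personality score between two subsamples with a permutation test on the difference of means. The choice of test, the function name `permutation_p_value`, and the toy scores are assumptions made for the example only.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference of the group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Reassign subsample labels at random and recompute the statistic.
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical scores: the second subsample is shifted upwards.
uk_scores = list(range(0, 50))
non_uk_scores = list(range(30, 80))

p_shifted = permutation_p_value(uk_scores, non_uk_scores)
p_same = permutation_p_value(uk_scores, uk_scores)
print(p_shifted, p_same)
```

A p-value below 0.001 corresponds to the 99.9% confidence level quoted in the text. With 2000 permutations the smallest attainable p-value is 1/2001, which is just under that level.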
Table 7. Results of feature ranking. Data table includes Country of location and Ethnicity quantified by CatPCA. FVE is the fraction of explained variance. CFVE is the cumulative FVE. The least informative features are lower located.

Principal variable ranking                  Double Kaiser’s ranking
Attribute            FVE     CFVE
Sensation-seeking    0.192   0.192          Extraversion
Neuroticism          0.153   0.345          Conscientiousness
Agreeableness        0.106   0.451          Sensation-seeking
Education            0.104   0.555          Neuroticism
Openness             0.092   0.647          Impulsivity
Conscientiousness    0.088   0.735          Openness
Extraversion         0.076   0.811          Agreeableness
Age                  0.073   0.884          Age
Impulsivity          0.055   0.939          Education
Country              0.037   0.976          Country
Gender               0.021   0.997          Gender
Ethnicity            0.003   1.000          Ethnicity
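As a small worked check of the definitions in the caption of Table 7, CFVE is the running sum of FVE down the principal variable ranking. The sketch below recomputes it from the printed FVE column; the helper `variables_needed` is a hypothetical convenience function and is not part of the study.

```python
from itertools import accumulate

# FVE and CFVE columns of Table 7, in ranking order.
fve = [0.192, 0.153, 0.106, 0.104, 0.092, 0.088,
       0.076, 0.073, 0.055, 0.037, 0.021, 0.003]
cfve_table = [0.192, 0.345, 0.451, 0.555, 0.647, 0.735,
              0.811, 0.884, 0.939, 0.976, 0.997, 1.000]

# Running sum of FVE, rounded to the three decimals printed in the table.
cfve = [round(total, 3) for total in accumulate(fve)]


def variables_needed(cumulative, target):
    """Number of top-ranked attributes needed to reach the target CFVE."""
    for count, value in enumerate(cumulative, start=1):
        if value >= target:
            return count
    return len(cumulative)


print(cfve == cfve_table)
print(variables_needed(cfve, 0.9))
```

Read this way, the first nine attributes of the principal variable ranking already account for more than 90% of the variance, and Country, Gender and Ethnicity together add about 6%.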
Table 8. Results of feature ranking. Data table includes dummy coded Country of location and Ethnicity. FVE is the fraction of explained variance. CFVE is the cumulative FVE. The least informative features are lower located.

Principal variable ranking                    Double Kaiser’s ranking
Attribute              FVE     CFVE
Sensation-seeking      0.186   0.186          Extraversion
Neuroticism            0.149   0.335          Conscientiousness
Agreeableness          0.103   0.438          Sensation-seeking
Education              0.101   0.539          Neuroticism
Openness               0.089   0.627          Impulsivity
Conscientiousness      0.086   0.714          Openness
Extraversion           0.074   0.787          Agreeableness
Age                    0.071   0.858          Age
Impulsivity            0.053   0.911          Education
UK                     0.027   0.938          UK
Gender                 0.020   0.959          USA
USA                    0.013   0.972          Gender
White                  0.010   0.982          Other (country)
Other (country)        0.005   0.988          White
Canada                 0.004   0.991          Other (ethnicity)
Other (ethnicity)      0.003   0.994          Canada
Black                  0.002   0.995          Asian
Australia              0.002   0.997          Mixed-White/Black
Asian                  0.001   0.998          Australia
Mixed-White/Black      0.001   0.999          Black
Republic of Ireland    0.000   1.000          Mixed-White/Asian
Mixed-White/Asian      0.000   1.000          Republic of Ireland
New Zealand            0.000   1.000          New Zealand
Mixed-Black/Asian      0.000   1.000          Mixed-Black/Asian
Table 9. The result of sparse PCA feature ranking. Data table includes Country of location and Ethnicity quantified by CatPCA.

Step   # of components   Removed attributes
1      5                 Gender and Ethnicity
2      4                 No removed attributes. The retained set of attributes: Age, Education, Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness, Impulsivity and Sensation-seeking, Country

Table 10. The result of sparse PCA feature ranking. Data table includes dummy coded Country and Ethnicity.

Step   # of components   Removed attributes
1      8                 Canada, Other (country), Australia, Republic of Ireland, New Zealand, Mixed-White/Asian, White, Other (ethnicity), Mixed-White/Black, Asian, Black and Mixed-Black/Asian
2      5                 Gender, UK and USA
3      4                 No removed attributes. The retained set of attributes: Age, Education, Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness, Impulsivity and Sensation-seeking

3.5 The solution to the problem of classification

The first step for the risk evaluation is the construction of classifiers. We tested the eight methods described in section ‘Risk evaluation methods’ and selected the best one. Results of the method selection are presented in Table 11 for the original input space and in Table 12 for the space of the first four principal components.

Table 11 shows that for all drugs except alcohol, cocaine and magic mushrooms, the sensitivity and specificity are greater than 70%. This is an unexpectedly high accuracy. In the space of the first four principal components, accuracy is greater than in the original input space for six drugs: benzodiazepines, cocaine, ketamine, methadone, nicotine, and VSA.

There is no single most effective classifier employing all input features. The maximum number of used features is 6 out of 10 and the least number is 2. The decision tree for crack uses Extraversion and Conscientiousness only and provides sensitivity of 80.63% and specificity of 78.57%.

If a feature is considered to be informative when it is used for classifier forming, it is possible to rank features ‘by fact’. Age proves not to be the most informative measure in accordance with the results presented in Table 7 and Table 9, but it is used in the best classifiers for 14 drugs. The second most used input feature is gender, which is considered as non-informative by Sparse PCA and as one of the least informative by other methods (see Table 7).
Table 11. The best results of the drug users classifiers in the original input space. Symbol ‘X’ means used input feature. Results are calculated by LOOCV.

Input feature columns: Age, Education, Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness, Impulsivity, Sensation-seeking, Gender.

Target feature     Method   Used input features (X per feature)   Sensitivity   Specificity   Sum
Alcohol            LDA      X X X X X                             75.34%        63.24%        138.58%
Amphetamines       DT       X X X X X X                           81.30%        71.48%        152.77%
Amyl nitrite       DT       X X X X                               73.51%        87.86%        161.37%
Benzodiazepines    DT       X X X X X X                           70.87%        71.51%        142.38%
Cannabis           DT       X X X X X X                           79.29%        80.00%        159.29%
Chocolate          KNN      X X X X                               72.43%        71.43%        143.86%
Cocaine            DT       X X X X X                             68.27%        83.06%        151.32%
Caffeine           KNN      X X X X X                             70.51%        72.97%        143.48%
Crack              DT       X X                                   80.63%        78.57%        159.20%
Ecstasy            DT       X X X                                 76.17%        77.16%        153.33%
Heroin             DT       X X X                                 82.55%        72.98%        155.53%
Ketamine           DT       X X X X X                             72.29%        80.98%        153.26%
Legal highs        DT       X X X X X X                           79.53%        82.37%        161.90%
LSD                DT       X X X X X X                           85.46%        77.56%        163.02%
Methadone          DT       X X X X X                             79.14%        72.48%        151.62%
Magic Mushrooms    DT       X X                                   65.56%        94.79%        160.36%
Nicotine           DT       X X X X                               71.28%        79.07%        150.35%
VSA                DT       X X X X X X                           83.48%        77.64%        161.12%
Table 12. The best results of the drug users classifiers in the space of the first four principal components. Symbol ‘X’ means used input feature. Results are calculated by LOOCV.

Input feature columns: PC 1, PC 2, PC 3, PC 4.

Target feature     Method   Used components (X per component)   Sensitivity   Specificity   Sum
Alcohol            GM       X X                                 54.71%        70.59%        125.29%
Amphetamines       DT       X                                   74.37%        77.78%        152.15%
Amyl nitrite       DT       X X                                 61.35%        79.47%        140.82%
Benzodiazepines    DT       X                                   64.63%        92.20%        156.83%
Cannabis           DT       X X                                 78.02%        77.26%        155.28%
Chocolate          LDA      X X X                               57.35%        62.86%        120.21%
Cocaine            DT       X                                   72.20%        85.23%        157.42%
Caffeine           LDA      X X X                               62.55%        78.38%        140.93%
Crack              DT       X X                                 78.01%        77.10%        155.11%
Ecstasy            DT       X X X                               73.10%        73.46%        146.56%
Heroin             DT       X                                   76.89%        74.72%        151.60%
Ketamine           DT       X                                   73.43%        93.55%        166.98%
Legal highs        PDFE     X X X X                             75.98%        76.14%        152.12%
LSD                DT       X                                   99.46%        61.30%        160.76%
Methadone          DT       X X                                 79.38%        84.47%        163.85%
Magic Mushrooms    LDA      X X X X                             76.37%        69.10%        145.47%
Nicotine           DT       X X X                               79.67%        74.56%        154.23%
VSA                DT       X X                                 83.48%        84.23%        167.71%
Results presented in Table 11 and Table 12 were calculated by LOOCV. It should be stressed that different methods of testing give different sensitivity and specificity. The widely used methods include calculation of test set errors (the holdout method), k-fold cross-validation, testing on the entire sample (if it is sufficiently large; the so-called ‘naïve’ method), random subsampling, and many others.

For example, a decision tree formed for the entire sample can have accuracy, sensitivity and specificity different from LOOCV (Hastie, Tibshirani, & Friedman, 2009). For illustration we can use the decision tree for ecstasy depicted in Figure 8. It has sensitivity 78.56% and specificity 71.16%, calculated using the whole sample. Results of LOOCV for a tree with the same options, presented in Table 11, show sensitivity 76.17% and specificity 77.16%.

Figure 8. Decision tree for ecstasy. Input features are: age, Sensation-seeking (SS) and gender. Non-terminal nodes are depicted with dashed border. Values of age, SS and gender are calculated by quantification procedures described in section “Input feature transformation”. Weight of each case of user class is 1.91 and of non-user class it is 1.66. Columns ‘Weighted’ present normalized weights: weight of each class is divided by the sum of weights.
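The gap between whole-sample (‘naïve’) estimates and LOOCV estimates can be reproduced on toy data. The sketch below is an illustration only. It swaps the decision tree of the paper for a one-nearest-neighbour rule, because that classifier makes the effect easy to see in a few lines, and the one-dimensional data set is hypothetical.

```python
def nearest_label(train, x):
    """Label of the training point closest to x (one-nearest-neighbour rule)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def sens_spec(pairs):
    """Sensitivity and specificity from (true, predicted) pairs; 1 = user, 0 = non-user."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def naive_estimate(data):
    """Train and test on the entire sample."""
    return sens_spec([(y, nearest_label(data, x)) for x, y in data])


def loocv_estimate(data):
    """Leave each case out in turn, train on the rest, test on the held-out case."""
    pairs = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        pairs.append((y, nearest_label(train, x)))
    return sens_spec(pairs)


# Hypothetical (score, label) cases with overlapping classes.
data = [(1.0, 0), (2.1, 0), (3.3, 0), (5.2, 0),
        (4.0, 1), (6.1, 1), (7.3, 1), (8.6, 1)]

naive = naive_estimate(data)
loocv = loocv_estimate(data)
print(naive, loocv)
```

On this sample the whole-sample estimate is perfect because every case is its own nearest neighbour, while LOOCV reports the errors made near the class boundary. The decision tree figures quoted in the text differ by a few percentage points for the same reason, although less dramatically.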
It is interesting that the risk of ecstasy consumption can be evaluated with high accuracy on the basis of age, gender and Sensation-seeking (see Table 11, Figure 8, and Figure 9), and has no apparent psychopathology associated with its use.

3.6 Risk evaluation

Successful construction of a classifier provides us with an instrument for the evaluation of the risk of drug consumption for each individual, along with the creation of a map of risk (Mirkes, Alexandrakis, Slater, Tuli, & Gorban, 2014a; Mirkes, Alexandrakis, Slater, Tuli, & Gorban, 2014b). The risk map of ecstasy consumption on the basis of three input features (i.e. age, Sensation-seeking and gender) is depicted in Figure 9. On the PDFE based risk maps (Figure 9A and Figure 9B) it can be observed that the considerable area of high risk (indicated in blue) for men aged between 25 and 34 years is significantly smaller for females, but young males with the highest Sensation-seeking have significantly less risk. Decision tree based risk maps (Figure 9C and Figure 9D) illustrate qualitatively the same shape. The risk maps provide a tool for the generation of hypotheses for further study.

Figure 9. Risk map of ecstasy consumption for: A) & C) female and B) & D) male; A) & B) PDFE based map and C) & D) decision tree based map; E) Legend of colours.
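This section does not spell out how the PDFE based risk value is computed. One common construction, shown here purely as an assumption, estimates the density of each class with Gaussian kernels and reports the share of the user class density at a given point. The function names, the bandwidth, the optional class weights and the toy Sensation-seeking scores are all hypothetical.

```python
from math import exp, pi, sqrt


def kde(sample, x, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    total = sum(exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return total / (len(sample) * h * sqrt(2 * pi))


def risk(x, users, non_users, h=1.0, w_user=1.0, w_non=1.0):
    """Share of the weighted user density at x; a value in [0, 1]."""
    pu = w_user * kde(users, x, h)
    pn = w_non * kde(non_users, x, h)
    return pu / (pu + pn)


# Hypothetical one-dimensional scores for the two classes.
users = [6, 7, 8, 8, 9]
non_users = [1, 2, 2, 3, 4, 5]

# A one-dimensional "risk map": the risk evaluated on a grid of scores.
risk_map = [(score, risk(score, users, non_users)) for score in range(0, 11)]
for score, value in risk_map:
    print(score, round(value, 3))
```

A two-dimensional map such as Figure 9 would evaluate the same ratio on a grid over age and Sensation-seeking, separately for each gender.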
4 Discussion
This study, along with other studies, demonstrates a strong correlation between personality profiles and drug use (Sutina, Evans, & Zonderman, 2013; Haider et al., 2002; Vollrath & Torgersen, 2002; Terracciano, Löckenhoff, Crum, Bienvenu, & Costa, 2008; Roncero et al., 2014). The studies indicate that individuals involved in drug usage are more likely to have higher scores for Neuroticism, and low scores for Agreeableness and Conscientiousness.
Sutina et al. (2013) demonstrated that the relationship between low Conscientiousness and drug consumption is moderated by poverty; low Conscientiousness is a stronger risk factor for illicit drug usage among those with relatively higher socio-economic status. Roncero et al. (2014) showed that personality factors (Neuroticism-Anxiety and Aggression-Hostility) have an important impact on the risk of experiencing psychotic symptoms. Vollrath & Torgersen (2002) observe that low Conscientiousness and high Neuroticism (or high Extraversion) are particularly associated with engagement in multiple risky health behaviours.
An individual’s personality profile plays a role in becoming a drug user. Terracciano et al. (2008) show that Five-Factor Model personality profiles are associated with the use and non-use of nicotine, cannabis, cocaine, and heroin in samples from different communities. They also highlight the links between the consumption of these drugs and low Conscientiousness. Turiano et al. (2012) found a positive correlation of both Neuroticism and Openness with drug use. On the other hand, increasing scores for Conscientiousness and Agreeableness decrease the risk of drug use. Previous studies demonstrate that participants who use drugs including alcohol and nicotine have a strong positive correlation between Agreeableness and Conscientiousness and a strong negative correlation for each of these factors with Neuroticism (Haider et al., 2002; Stewart & Devine, 2000). This pattern is common, as is the N/A/C complex for externalising.
Our study reveals that all five personality factors of Neuroticism, Openness, Agreeableness, Extraversion, and Conscientiousness are relevant traits to be taken into account when assessing risk of drug consumption. This study has established that drug users for all 18 drugs score moderately high (+) or neutral (0) on the personality profile concerning Neuroticism and Openness, and moderately low (−) on both Agreeableness and Conscientiousness. However, when it comes to the group of legal drugs (i.e. alcohol, chocolate, caffeine, and nicotine), users score (0) on Agreeableness and Conscientiousness, apart from nicotine users, whose Conscientiousness score is moderately low (−).
The impact of the Extraversion score is drug specific and separates two groups: (1) users of the problematic drugs crack, heroin, VSA, and methadone, who score (−) on Extraversion, that is, towards Introversion; (2) users of the other drugs (alcohol, amphetamines, amyl nitrite, benzodiazepines, cannabis, chocolate, cocaine, caffeine, ecstasy, ketamine, legal highs, LSD, magic mushrooms, nicotine, and VSA), who score (0) on Extraversion (see Table 5).
In addition, higher scores for Neuroticism and Openness lead to increased drug use, whereas lower scores for Conscientiousness and Agreeableness cause an increase of drug consumption. Openness (O) is marked by curiosity and open-mindedness (and is correlated with intelligence), and it is therefore understandable why higher O may sometimes be associated with drug use (Wilmoth, 2012). As a result, it can be seen how personality factors affect drug use. These findings have been confirmed by our study. Our results improve the knowledge concerning the pathways leading to drug consumption.
There are, however, limitations to this study. This sample is biased with respect to the general population, but it can still be used for risk evaluation. A further limitation concerns the fact that a number of the findings may be culturally specific.
Unexpectedly successful classifiers have been created for all drugs, thus providing the possibility of evaluating individuals for the risk of drug consumption. The creation of risk maps forms a tool for the generation of hypotheses for further study.
Table 13. PCCs between drug consumptions

Columns, in the same order as the rows: Alcohol, Amphetamines, Amyl nitrite, Benzodiazepine, Cannabis, Chocolate, Cocaine, Caffeine, Crack, Ecstasy, Heroin, Ketamine, Legal highs, LSD, Methadone, Mushrooms, Nicotine, VSA.

Drug use  Alc.    Amp.    Amy.    Ben.    Can.    Cho.    Coc.    Caf.    Cra.    Ecs.    Her.    Ket.    Leg.    LSD     Met.    Mus.    Nic.    VSA
Alc.      1.000   0.074¹  0.074¹  0.051²  0.119¹  0.099¹  0.111¹  0.157¹  0.027   0.105¹  0.033   0.078²  0.061²  0.069²  -0.007  0.071¹  0.113¹  0.046³
Amp.      0.074¹  1.000   0.372¹  0.463¹  0.469¹  0.013   0.580¹  0.106¹  0.323¹  0.597¹  0.359¹  0.412¹  0.481¹  0.490¹  0.415¹  0.481¹  0.343¹  0.304¹
Amy.      0.074¹  0.372¹  1.000   0.226¹  0.292¹  0.028   0.381¹  0.060²  0.144¹  0.392¹  0.137¹  0.345¹  0.268¹  0.213¹  0.084¹  0.271¹  0.196¹  0.130¹
Ben.      0.051²  0.463¹  0.226¹  1.000   0.354¹  0.006   0.428¹  0.055²  0.326¹  0.383¹  0.395¹  0.303¹  0.348¹  0.352¹  0.468¹  0.366¹  0.260¹  0.294¹
Can.      0.119¹  0.469¹  0.292¹  0.354¹  1.000   0.046³  0.453¹  0.113¹  0.216¹  0.521¹  0.217¹  0.302¹  0.526¹  0.421¹  0.299¹  0.497¹  0.533¹  0.237¹
Cho.      0.099¹  0.013   0.028   0.006   0.046³  1.000   0.006   0.122¹  0.032   0.040   -0.026  0.035   0.017   0.029   0.007   0.024   0.037   -0.021
Coc.      0.111¹  0.580¹  0.381¹  0.428¹  0.453¹  0.006   1.000   0.099¹  0.396¹  0.633¹  0.414¹  0.454¹  0.445¹  0.442¹  0.354¹  0.480¹  0.362¹  0.277¹
Caf.      0.157¹  0.106¹  0.060²  0.055²  0.113¹  0.122¹  0.099¹  1.000   0.035   0.107¹  0.026   0.058³  0.085¹  0.075¹  0.039   0.100¹  0.145³  0.053³
Cra.      0.027   0.323¹  0.144¹  0.326¹  0.216¹  0.032   0.396¹  0.035   1.000   0.280¹  0.509¹  0.255¹  0.203¹  0.268¹  0.367¹  0.276¹  0.191¹  0.278¹
Ecs.      0.105¹  0.597¹  0.392¹  0.383¹  0.521¹  0.040   0.633¹  0.107¹  0.280¹  1.000   0.301¹  0.511¹  0.586¹  0.599¹  0.315¹  0.599¹  0.370¹  0.289¹
Her.      0.033   0.359¹  0.137¹  0.395¹  0.217¹  -0.026  0.414¹  0.026   0.509¹  0.301¹  1.000   0.274¹  0.237¹  0.347¹  0.494¹  0.306¹  0.185¹  0.293¹
Ket.      0.078²  0.412¹  0.345¹  0.303¹  0.302¹  0.035   0.454¹  0.058³  0.255¹  0.511¹  0.274¹  1.000   0.393¹  0.462¹  0.246¹  0.436¹  0.243¹  0.192¹
Leg.      0.061²  0.481¹  0.268¹  0.348¹  0.526¹  0.017   0.445¹  0.085¹  0.203¹  0.586¹  0.237¹  0.393¹  1.000   0.519¹  0.334¹  0.575¹  0.364¹  0.314¹
LSD       0.069²  0.490¹  0.213¹  0.352¹  0.421¹  0.029   0.442¹  0.075¹  0.268¹  0.599¹  0.347¹  0.462¹  0.519¹  1.000   0.343¹  0.680¹  0.289¹  0.299¹
Met.      -0.007  0.415¹  0.084¹  0.468¹  0.299¹  0.007   0.354¹  0.039   0.367¹  0.315¹  0.494¹  0.246¹  0.334¹  0.343¹  1.000   0.343¹  0.234¹  0.277¹
Mus.      0.071¹  0.481¹  0.271¹  0.366¹  0.497¹  0.024   0.480¹  0.100¹  0.276¹  0.599¹  0.306¹  0.436¹  0.575¹  0.680¹  0.343¹  1.000   0.324¹  0.253¹
Nic.      0.113¹  0.343¹  0.196¹  0.260¹  0.533¹  0.037   0.362¹  0.145³  0.191¹  0.370¹  0.185¹  0.243¹  0.364¹  0.289¹  0.234¹  0.324¹  1.000   0.221¹
VSA       0.046³  0.304¹  0.130¹  0.294¹  0.237¹  -0.021  0.277¹  0.053³  0.278¹  0.289¹  0.293¹  0.192¹  0.314¹  0.299¹  0.277¹  0.253¹  0.221¹  1.000

¹ p-value < 0.001, ² p-value < 0.01, ³ p-value < 0.05
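The significance marks in Table 13 refer to p-values of individual pairs. With 18 drugs there are 18 × 17 / 2 = 153 pairs, so Section 3.3 adjusts for multiple testing with the Bonferroni correction and the BH step-up procedure. The sketch below shows both procedures in their textbook form on a short list of hypothetical p-values; it is not the code used in the study.

```python
def bonferroni(p_values, alpha):
    """Indices of hypotheses rejected when each p-value is compared with alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is below rank * q / m.
        if p_values[i] <= rank * q / m:
            largest_rank = rank
    return sorted(order[:largest_rank])


# Hypothetical p-values for eight tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

rejected_bonferroni = bonferroni(p_values, alpha=0.05)
rejected_bh = benjamini_hochberg(p_values, q=0.05)
print(rejected_bonferroni, rejected_bh)
```

The BH procedure rejects at least as many hypotheses as the Bonferroni correction at the same level, which is the general direction of the counts reported in Section 3.3 (115 and 127 significant coefficients), although the paper applies different thresholds to the two procedures.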
5 Bibliography
Arlot, S., & Celisse, A. (2010). A survey of cross‐validation procedures for model selection. Statistics
surveys, 4, 40‐79.
Beaglehole, R., Bonita, R., Horton, R., Adams, C., Alleyne, G., Asaria, P., et al. (2011). Priority actions for
the non‐communicable disease crisis. The Lancet, 377 (9775), 1438‐1447.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful
approach to multiple testing. Journal of the Royal Statistical Society, 57(1), 289‐300.
Biau, G. (2012). Analysis of a random forests model. The Journal of Machine Learning Research, 13(1),
1063‐1095.
Bickel, W. K., Johnson, M. W., Koffarnus, M. N., MacKillop, J., & Murphy, J. G. (2014). The behavioral
economics of substance use disorders: reinforcement pathologies and their repair. Annual
review of clinical psychology, 10, 641‐677.
Bogg, T., & Roberts, B. (2004). Conscientiousness and health‐related behaviors: A meta‐analysis of the
leading behavioral contributors to mortality. Psychological Bulletin, 130(6), 887‐919.
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5‐32.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees.
Belmont, Calif: Wadsworth International Group.
Buhmann, M. D. (2003). Radial Basis Functions: Theory and Implementations. Cambridge: University
Press, Vol. 12.
Bulut, F., & Bukak, I. O. (2014). An urgent precaution system to detect students at risk of substance
abuse through classification algorithms. Turkish Journal of Electrical Engineering & Computer
Sciences, 22(3), 690‐ 707.
Clarkson, K. L. (2005). Nearest‐neighbor searching and metric space dimensions. Nearest‐neighbor
methods for learning and vision: theory and practice, 15–59.
Cleveland, M. J., Feinberg, M. E., Bontempo, D. E., & Greenberg, M. T. (2008). The role of risk and
protective factors in substance use across adolescence. Journal of Adolescent Health, 43(2),
157–164.
Costa, P. T., & McCrae, R. R. (1992). Revised NEO‐Personality Inventory (NEO‐PI‐R) and the NEO‐Five
Factor Inventory (NEO‐FFI): Personality manual. Odessa,FL: Psychological assessment Resources.
Dietterich, T., Kearns, M., & Mansour, Y. (1996). Applying the weak learning framework to understand
and improve C4.5 In ICML. (pp. 96–104). Proc. of the 13th Int. Conf. on Machine Learning, San
Francisco: Morgan Kaufmann.
Dinov, I. D. (2008). Expectation maximization and mixture modeling tutorial. Statistics Online
Computational Resource.
DordiNejad, F. G., & Shiran, M. A. (2011). Personality Traits and Drug Usage among Addicts. Literacy
Information and Computer Education Journal (LICEJ), 2(2), 402‐405.
Dubey, C., Arora, M., Gupta, S., & Kumar, B. (2010). Five Factor Correlates: A Comparison of Substance
Abusers and Non‐Substance Abusers. Journal of the Indian Academy of Applied Psychology,
36(1), 107‐114.
Egan, V. (2011). Individual differences and antisocial behaviour. In Handbook of Individual Differences
(ed.) A. Furnham, S. Stumm, and K. Petredies(Oxford: Blackwell‐Wiley), 512‐537.
Egan, V., Deary, I., & Austin, E. (2000). The NEO‐FFI: Emerging British norms and an item‐level analysis
suggest N, A and C are more reliable than O and E. Personality and Individual Differences, 29(5),
907 ‐ 920.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2),
179–188.
Flory, K., Lynam, D., Milich, R., Leukefeld, C., & Clayton, R. (2002). The relations among personality,
symptoms of alcohol and marijuana abuse, and symptoms of comorbid psychopathology:
Results from a community sample. Experimental and Clinical Psychopharmacology, 10(4), 425–
434.
Fossati, P., Ergis, A. M., & Allilaire, J. F. (2001). Problem‐solving abilities in unipolar depressed patients:
comparison of performance on the modified version of the Wisconsin and the California sorting
tests. Psychiatry Research, 104(2), 145–156.
Fridberg, D. J., Vollmer, J. M., O'Donnell, B. F., & Skosnik, P. D. (2011). Cannabis users differ from non‐
users on measures of personality and schizotypy. Psychiatry research, 186(1), 46–52.
García‐Montes, J. M., Zaldívar‐Basurto, F., López‐Ríos, F., & Molina‐Moreno, A. (2009). The role of
personality variables in drug abuse in a Spanish university population. International journal of
mental health and addiction, 7(3), 475–487.
Gelfand, S. B., Ravishankar, C. S., & Delp, E. J. (1991). An iterative growing and pruning algorithm for
classification tree design. IEEE Transaction on Pattern Analysis and Machine Intelligence , 13(2),
163–174.
Gorban, A. N., & Zinovyev, A. (2010). Principal manifolds and graphs in practice: from molecular biology
to dynamical systems. International journal of neural systems, 20(3), 219–232.
Gorban, A. N., & Zinovyev, A. Y. (2008). Principal graphs and manifolds. In A. Gorban, B. Kégl, D.
Wunsch, & A. Zinovyev (Eds.), Principal Manifolds for Data Visualisation and Dimension
Reduction (pp. 28‐59). Berlin‐Heidelberg‐NewYork: Springer.
Gujarati, D. N. (2003). Basic econometrics (4 ed.). McGraw‐Hill: Inc.,US.
Gurrera, R. J., Nestor, P. G., & O'Donnel, B. F. (2000). Personality Traits in Schizophrenia: Comparison
with a Community Sample. The Journal of nervous and mental disease, 188(1), 31‐35.
Guttman, L. (1954). Some necessary conditions for common‐factor analysis. Psychometrika, 19(2), 149‐
161.
Haider, A. H., Edwin, D. H., MacKenzie, E. J., Bosse, M. J., Castillo, R. C., Travison, T. G., et al. (2002). The
use of the NEO‐five factor inventory to assess personality in trauma patients: a two‐year
prospective study. Journal of Orthopaedic trauma, 16(9), 660‐667.
Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. Pattern Analysis
and Machine Intelligence, IEEE Transactions on, 18(6), 607–616.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2 ed.). New York:
Springer.
Hoare, J., & Moon, D. (2010). Drug misuse declared: Findings from the 2009/10 British Crime Survey.
London: Home Office. Home Office Statistical Bulletin 13/10.
Home Office, UK. (2014). Drug misuse: findings from the 2013 to 2014 CSEW second edition.
https://www.gov.uk/government/statistics/drug‐misuse‐findings‐from‐the‐2013‐to‐2014‐csew.
Hosmer, D. W., & Lemeshow, S. (2004). Applied Logistic Regression (2 ed.). John Wiley & Sons.
Jakobwitz, S., & Egan, V. (2006). The dark triad and normal personality traits. Personality and Individual
Differences, 40 (2), 331‐339.
Kaiser, H. (1960). The application of electronic computers to factor analysis. Educational and
Psychological Measurement, 20, 141‐151.
Kearns, M., & Mansour, Y. (1999). On the boosting ability of top‐down decision tree learning algorithms.
Journal of Computer and System Sciences, 58(1), 109–128.
Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1995). A two‐stage estimation of structural equation models
with continuous and polytomous variables. British Journal of Mathematical and Statistical
Psychology, 48(2), 339–358.
Li, Q., & Racine, J. S. (2007). Nonparametric Econometrics: theory and practice. Princeton : University
Press.
Liaw, A., & Wiener, M. R. (2002). Classification and Regression by Random Forest. R news, 2(3), 18‐22.
Linting, M., & van der Kooij, A. (2012). Nonlinear Principal Components Analysis With CATPCA: A
Tutorial. Journal of Personality Assessment, 94(1), 12‐25.
Martinson, E. O., & Hamdan, M. A. (1971). Maximum likelihood and some other asymptotically efficient
estimators of correlation in two way contingency tables. Journal of Statistical Computation and
Simulation, 1(1), 45‐54.
McCabe, G. P. (1984). Principal Variables. Technometrics, 26(2), 137–144.
McCrae, R. R., & Costa, P. T. (1991). The NEO Personality Inventory: Using the Five‐Factor Model in
Counseling. Journal of Counseling & Development, 69(4), 367‐372.
McCrae, R. R., & Costa, P. T. (2004). A contemplated revision of the NEO Five‐Factor Inventory.
Personality and Individual Differences, 36(3), 587‐596.
McDaniel, S., & Mahan, J. (2008). An examination of the ImpSS scale as a valid and reliable alternative to
the SSS‐V in optimum stimulation level research. Personality and Individual Differences, 44(7),
1528–1538.
McGinnis, J. M., & Foege, W. H. (1993). Actual causes of death in the United States. Journal of the
American Medical Association, 270(18), 2207‐2212.
Mirkes, E. M., Alexandrakis, I., Slater, K., Tuli, R., & Gorban, A. N. (2014a). Computational diagnosis and
risk evaluation for canine lymphoma. Computers in biology and medicine, 53, 279‐290.
Mirkes, E., Alexandrakis, I., Slater, K., Tuli, R., & Gorban, A. (2014b). Computational diagnosis of canine
lymphoma. J.Phys.:Conf.Ser., 490, 012135.
Mitchell, T. M. (1997). Machine learning. Burr Ridge, IL: McGraw Hill.
Naikal, N., Yang, A., & Sastry, S. (2011). Informative feature selection for object recognition via sparse
PCA. In Proceedings of the 13th International Conference on Computer Vision (ICCV), 818‐825.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. philosophical
magazine, 2(6), 559‐572.
Quinlan, J. R. (1987). Simplifying decision trees. International journal of man‐machine studies, 27(3),
221–234.
Ragan, D. T., & Beaver, K. M. (2010). Chronic offenders: A life‐course analysis of marijuana users. Youth
and Society, 42, 174‐198.
Rokach, L., & Maimon, O. (2010). Decision trees. In O. Maimon, & L. Rokach (Eds.), Data Mining and
Knowledge Discovery Handbook (pp. 165–192). Berlin: Springer.
Roncero, C., Daigre, C., Barral, C., Ros‐Cucurull, E., Grau‐López, L., Rodríguez‐Cintas, L., et al. (2014).
Neuroticism Associated with Cocaine‐Induced Psychosis in Cocaine‐Dependent Patients: A
Cross‐Sectional Observational Study. PloS one, 9(9), e106111.
Russell, S., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Englewood Cliffs: Prentice‐Hall,
25.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization (1 ed.). New
York: Wiley.
Settles, R. E., Fischer, S., Cyders, M. A., Combs, J. L., Gunn, R. L., & Smith, G. T. (2012). Negative urgency:
A personality predictor of externalizing behavior characterized by neuroticism, low
conscientiousness, and disagreeableness. Journal of Abnormal Psychology 121 (1), 160‐172.
Snowden, R., & Gray, N. S. (2011). Impulsivity and psychopathy: Associations between the Barrett
Impulsivity Scale and the Psychopathy Checklist revised. Psychiatry Research, 187(3), 414‐417.
Sofeikov, K. I., Tyukin, I. Y., Gorban, A. N., Mirkes, E. M., Prokhorov, D. V., & Romanenko, I. V. (2014).
Learning optimization for decision tree classification of non‐categorical data with information
gain impurity criterion. In Neural Networks (IJCNN), 2014 International Joint Conference on,
3548‐3555.
Stanford, M. S., Mathias, C. W., Dougherty, D. M., Lake, S. L., Anderson, N. E., & Patton, J. H. (2009). Fifty
years of the Barratt Impulsiveness Scale: An update and review. Personality and Individual
Differences, 47(5), 385‐395.
Stewart, S. H., & Devine, H. (2000). Relations between personality and drinking motives in young adults.
Personality and Individual Differences, 29(3), 495‐511.
Sutina, A. R., Evans, M. K., & Zonderman, A. B. (2013). Personality traits and illicit substances: the
moderation role of poverty. Drug and Alcohol Dependence, 131, 247‐251.
Terracciano, A., Löckenhoff, C. E., Crum, R. M., Bienvenu, O. J., & Costa, P. T. (2008). Five Factor Model
personality profiles of drug users. BMC Psychiatry, 8(1), 22.
Turiano, N. A., Whiteman, S. D., Hampson, S. E., Roberts, B. W., & Mroczek, D. K. (2012). Personality and
substance use in midlife: Conscientiousness as a moderator and the effect of trait change.
Journal of Research in Personality, 46(3), 295‐305.
Valeroa, S., Daigre, C., Rodrígu, L., Gomà‐i‐Freixanet, M., Ferrer, M., Casasa, M., et al. (2014).
Neuroticism and impulsivity: Their hierarchical organization in the personality characterization
of drug‐dependent patients from a decision tree learning perspective. Comprehensive
Psychiatry, 55(5), 1227–1233.
Ventura, C. A., de Souza, J., Hayashida, M., & Ferreira, P. (2014). Risk factors for involvement with illegal
drugs: opinion of family members or significant others. Journal of Substance Use(0), 1–7.
Vollrath, M., & Torgersen, S. (2002). Who takes health risks? A probe into eight personality types.
Personality and Individual Differences, 32(7), 1185‐1197.
Williams, G. (2011). Data mining with Rattle and R: the art of excavating data for knowledge discovery.
Springer New York Dordrecht Heidelberg:London: Springer Science & Business Media.
Wilmoth, D. R. (2012). Intelligence and past use of recreational drugs. Intelligence, 40 (1), 15‐22.
World Health Organization. (2004). Prevention of mental disorders: Effective interventions and policy
options. Geneva Universities of Nijmegen and Maastricht: Summary report.
Yasnitskiy, L., Gratsilev, V., Kulyashova, J., & Cherepanov, F. (2015). Possibilities of artificial intellect in
detection of predisposition to drug addiction. Perm University Bulletin. Series «Philosophy.
Psychology. Sociology», 1 (21), 61‐73.
Zuckerman, M. (1994). Behavioral expressions and biosocial bases of sensation seeking. New York:
Cambridge University Press.