Exploring mental well-being from
prisoner case notes using text mining
Jo Lee – Data Scientist at the Ministry of Justice
[email protected], @jo_noms on Slack
Mental well-being in prisoners
https://www.theguardian.com/society/2017/may/02/ministers-should-have-legal-duty-to-combat-rise-in-prison-suicides
https://www.theguardian.com/society/2017/mar/11/prison-psychiatrists-warm-mental-health-care-breaking-point
https://www.theguardian.com/healthcare-network/2017/may/10/prison-mental-health-crisis
Unpublished, exploratory analysis & not for wider
distribution
Mental well-being in prisoners
The most recent Adult Psychiatric Morbidity Survey of prisoners (Singleton et al 1998)
found that over 90% of prisoners had one or more of the five psychiatric disorders studied
(psychosis, neurosis, personality disorder, hazardous drinking and drug dependence).
A recent NICE publication indicated:
“There is low quality evidence for a range of systems for the delivery and
coordination of care in the criminal justice system (for example, drug or mental
health courts, and case management).”
“There is clear evidence of poor engagement, uptake and retention in treatment
for people with mental health problems in contact with the criminal justice system.”
https://www.nice.org.uk/guidance/ng66/resources/mental-health-of-adults-in-contact-with-the-criminal-justice-system-pdf-1837577120965
Surveys are expensive, so I am investigating the reliability of using
administrative data to gain understanding of prisoner well-being.
Aims for project
Use text mining techniques to determine mental well-being:
• Can the case notes be used to correctly identify prisoners with mental health issues?
• How does mental well-being correlate with external factors (i.e. drug/alcohol issues, vulnerability/bullying, debt or smoking)?
• Can the analysis feed into self-harm/violence predictive models?
• Are the case notes an effective way to determine/investigate/monitor mental health issues?
The case notes
The case notes are a rich source of information, detailing the journey of a prisoner
through the prison system and providing key information about their well-being.
• There is no obligation to make case notes detailed.
• The data set is not a complete representation of events in prisons.
• The notes give insight into how prison officers document events and implement procedures.
Mental health in the case notes
• Recording is idiosyncratic to each prison officer, but notes tend to describe the events that lead prison officers to worry about well-being.
“X stated he wanted to be dead and became very emotional and started
to cry when asked about his visit with his mum earlier in the week he stated
he missed his children and wanted to see them but was told this would not
be possible at the present time.”
Overview of methodology
Use text mining techniques to determine mental well-being:
1. Pre-processing the free text case notes.
2. Curating a mental well-being dictionary.
3. Exploratory analysis of the dictionary.
4. Using classification techniques to determine the polarity of the case notes.
Flow diagram of the methodology:
• Free text case notes
• Convert to UTF-8
• Replace misspellings and abbreviations
• Remove punctuation, except the separator ‘;’ (change ‘;’ to ‘,’ in the text so that ‘;’ can be used as the separator)
• Remove all numbers from case notes (case notes have a date and time stamp)
• Place cleaned notes in a PostgreSQL database
• Develop a Mental Well-Being Dictionary
• Search for dictionary terms
• Categorise the case notes
• Analyse term/category occurrence
Pre-processing the free text case notes
The protocol was implemented at the command line as a bash script.
• This was because the 19 million case notes were too large to process in RAM-greedy languages (R or Python).
for file in *.csv; do # change file directory for case note dump
    iconv -f ISO-8859-1 -t UTF-8 "$file" > "${file%.csv}.utf8.converted"
    echo "Converting $file"
done
for file in *.utf8.converted; do # create one file to process
    cat "$file" >> utf8.tmp
    rm "$file" # deletes the old, temporary files after UTF-8 conversion
    echo "$file added to utf8.tmp"
done
The first step was to ensure the file encoding was correct, as it was provided in
CSV format from the pNOMIS database.
Pre-processing the free text case notes
Remove duplicate lines, runs of repeated punctuation characters and any unprintable characters
awk '!seen[$0]++' utf8.tmp | tr -s '[:punct:]' | sed 's/[^[:print:]\r\t]/ /g' > tidy_casenote.tmp
echo "tidied tmp"
Tidying the case notes into a format for text mining
Pre-processing the free text case notes
AWK was invaluable for coding line-by-line pre-processing
hadn’t -> had not
OASYSS -> OASYS
recieved -> received
BEGIN { FS = ","
while (getline
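The line-by-line replacement idea can be illustrated with a small sketch. It is written in Python for readability and is not the AWK script used in the project: the function names `fix_line` and `fix_stream` are hypothetical, and the lookup table contains only the three example replacements listed on this slide. Lines are processed one at a time, so the full data set never has to sit in memory.

```python
import re

# Illustrative lookup built from the three example replacements on this slide.
REPLACEMENTS = {
    "hadn’t": "had not",
    "OASYSS": "OASYS",
    "recieved": "received",
}

# One pattern that matches any key as a whole token (not inside a longer word).
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in REPLACEMENTS) + r")(?!\w)"
)


def fix_line(line: str) -> str:
    """Replace known misspellings and abbreviations in one case-note line."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], line)


def fix_stream(lines):
    """Lazily clean an iterable of lines, e.g. an open file handle."""
    for line in lines:
        yield fix_line(line)


print(fix_line("X hadn’t recieved his OASYSS review"))
```

A real lookup table would be much longer and would probably be read from a file rather than hard-coded.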
Pre-processing the free text case notes
Remove punctuation from the case notes
Split off the first two columns (ID and date) and replace their CSV delimiter ‘,’ with ‘;’. In the case note text, replace any ‘;’ with ‘,’ so that ‘;’ can be used as the field separator.
# take first 2 columns, and change delimiter
cut -d',' -f1-2 misspellings.tmp | sed 's/,/;/g' > iddate.tmp
# take the case note and remove occurrences of the delimiter from there
cut -d',' -f3- misspellings.tmp | sed 's/;/,/g' > casenote.tmp
# merge the two files line by line, separated by ';'
pr -mts';' iddate.tmp casenote.tmp > casenote_forDB
echo "made final file for DB"
Curating a mental well-being dictionary
The dictionary was curated with input from
policy colleagues and DH colleagues.
Clinical History
• Mental illness diagnosis (e.g. depression,
bipolar disorder, schizophrenia)
• Personality disorder diagnosis (e.g. borderline
personality disorder)
Psychological and Psychosocial Factors
• Desperate
• Angry
• Sad
• Ashamed
• Hopeless
• Worthless
• Lonely
• Disconnected
• Powerless
Current ‘context’
• Recent suicide/self-harm thoughts/actions
• Violence, intimidation or fear of these
• Parole refusal or other knock-back
• Longer sentence than expected
• Alcohol/drug misuse
• Irrational behaviour, out of touch with reality
• Recklessness
• Hostile rejection of help
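As a rough sketch of how a curated dictionary like this can be searched for in a case note, the example below groups a handful of terms from the lists above under category names. The category keys, the choice of terms and the function name `match_terms` are assumptions made for illustration; the project's actual dictionary and search code are not shown in these slides.

```python
import re

# Hypothetical slice of the dictionary: category -> terms taken from the slide.
DICTIONARY = {
    "clinical_history": ["depression", "bipolar disorder", "schizophrenia"],
    "psychological": ["desperate", "angry", "sad", "hopeless", "lonely"],
    "context": ["self-harm", "parole refusal"],
}


def match_terms(note: str) -> dict:
    """Return {category: [terms found]} for one case note (case-insensitive)."""
    text = note.lower()
    found = {}
    for category, terms in DICTIONARY.items():
        # Whole-term match only, so "sad" does not fire on "sadly" or "said".
        hits = [
            t for t in terms
            if re.search(r"(?<!\w)" + re.escape(t) + r"(?!\w)", text)
        ]
        if hits:
            found[category] = hits
    return found


note = "X said he felt lonely and hopeless after his parole refusal."
print(match_terms(note))
```

If punctuation has already been stripped during pre-processing, hyphenated terms such as "self-harm" would need to be stored in their cleaned form.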
ACCT: Assessment, Care in Custody and Teamwork
• Any prisoner believed to be at risk of self-harm is placed under an ACCT review.
• The individual is given an assessment, and the mental health in-reach team is informed.
• The individual is given the opportunity to talk to a Listener and/or a Samaritan.
• The day/night manager is informed of the risk.
Word cloud: the most commonly occurring mental health dictionary terms in the case notes. The size of each word represents its frequency of occurrence.
Exploratory analysis of the case notes
The case notes can be categorised by topic, into the categories shown on the plot below.
• Since ACCT reviews can be open for long periods of time, it is not expected that every case note will relate to mental well-being.
• The most common categories are relations, IEP (Incentives and Earned Privileges), chaplaincy, and ACCT (if the individual is on ACCT).
Figure: the categories of case notes for individuals on ACCT review and not on ACCT review. Axis: percentage of case notes. Legend: not on ACCT, on ACCT.
Categories: ACCT, Bullying, Canteen, Chaplaincy, Constant Supervision, Debt, Drug/Alcohol, Education, FNC, Gym, IEP, Medical, Relation, SID, Work.
FNC = first night in custody, which is when prisoners are perceived to be at their most vulnerable
SID = individuals who went on to have a self-inflicted death
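The percentages in a plot of this kind can be computed along the following lines. This is a minimal sketch that assumes each note has already been assigned a single category and an ACCT flag; the toy records and the function name `category_percentages` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (on_acct, category) for each already-categorised note.
notes = [
    (True, "ACCT"), (True, "ACCT"), (True, "Relation"), (True, "IEP"),
    (False, "IEP"), (False, "Relation"), (False, "Relation"), (False, "Work"),
]


def category_percentages(records):
    """Percentage of case notes per category, split by ACCT status."""
    counts = defaultdict(Counter)
    for on_acct, category in records:
        counts[on_acct][category] += 1
    result = {}
    for on_acct, counter in counts.items():
        total = sum(counter.values())
        result[on_acct] = {c: 100.0 * n / total for c, n in counter.items()}
    return result


pct = category_percentages(notes)
print(pct[True])   # share of each category among notes for people on ACCT
print(pct[False])  # the same for people not on ACCT
```

Normalising within each group, rather than over all notes, keeps the two groups comparable even when one has far more notes than the other.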
Exploratory analysis of the dictionary
Heat map showing the probability of co-occurrence of two terms in the case notes for Sept - Nov 2016
The probability of both terms occurring is shown in each square of the heat map. The most commonly co-occurring terms are ‘ACCT’ with ‘suicidal’, ‘stress’ and ‘anxiety’ with ‘depression’.
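One plausible way to obtain the value in each square is to count the notes in which both terms appear and divide by the total number of notes. The sketch below assumes that definition, which the slides do not spell out, and uses a hypothetical toy input where the dictionary terms have already been extracted per note.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(notes_terms):
    """P(both terms appear in the same note) for every pair of terms.

    notes_terms: list of sets, one set of dictionary terms per case note.
    """
    n_notes = len(notes_terms)
    pair_counts = Counter()
    for terms in notes_terms:
        # Sort so each unordered pair is always stored under the same key.
        for pair in combinations(sorted(terms), 2):
            pair_counts[pair] += 1
    return {pair: count / n_notes for pair, count in pair_counts.items()}


# Hypothetical toy input: terms already found in four notes.
toy = [
    {"ACCT", "suicidal"},
    {"anxiety", "depression", "stress"},
    {"ACCT"},
    {"ACCT", "suicidal", "stress"},
]
probs = cooccurrence_probabilities(toy)
print(probs[("ACCT", "suicidal")])
```

Pairs that never occur together are simply absent from the result, which corresponds to an empty square in the heat map.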
Use classification techniques to determine
the polarity of the case notes
The starting point for this iterative process is the R package ETEA.
This contains a list of words that assign polarity (positive or negative sense) to the case notes.
It contains its own categorical tags (specific for clinical notes), which will be refined to fit the case
notes and mental health.
https://github.com/chriskirkhub/etea
How the ETEA algorithm works
1. Cleans the case note, pre-processing
before running the algorithm.
2. Classification of the case note by binning it
by the presence of the categorical tags.
3. Assesses the polarity of the case note
(positive or negative).
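The polarity step can be pictured with a simple lexicon-based scorer. This is only a sketch of the general technique, not ETEA's implementation: the word weights, the function names and the zero threshold are all assumptions, and the categorical tagging in step 2 is left out.

```python
import re

# Hypothetical polarity lexicon; ETEA ships its own word list, not shown here.
LEXICON = {"dead": -2, "cry": -1, "missed": -1, "pleased": 2, "well": 1}


def polarity_score(note: str) -> int:
    """Sum the lexicon weights of the words in a case note."""
    words = re.findall(r"[a-z']+", note.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def polarity_label(score: int) -> str:
    """Map a numeric score onto negative / neutral / positive."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


note = "X stated he wanted to be dead and became very emotional and started to cry"
score = polarity_score(note)
print(score, polarity_label(score))
```

A word list alone cannot handle negation ("not pleased") or context, which is one reason the tags and lexicon need refining for prison case notes.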
“X stated he wanted to be dead and
became very emotional and started to
cry when asked about his visit with his
mum earlier in the week he stated he
missed his children and wanted to see
them but was told this would not be
possible at the present time.”
Evaluation of output
Number of ETEA assignments that match manual assignments
The confusion table shows how well the algorithm works compared with a manual assignment: 65% agreement (73 of the 113 notes lie on the diagonal).
Rows: manual assignment. Columns: algorithm assignment.
           Negative  Neutral  Positive
Negative         14       11         7
Neutral           4       33         5
Positive          2       11        26
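The agreement figure can be reproduced directly from the counts in the confusion table. The sketch below also breaks agreement down by manual label; the function names are hypothetical, and the per-label breakdown is an addition for illustration rather than something reported on the slide.

```python
LABELS = ["Negative", "Neutral", "Positive"]

# Rows: manual assignment; columns: algorithm assignment (counts from the table).
CONFUSION = [
    [14, 11, 7],
    [4, 33, 5],
    [2, 11, 26],
]


def agreement(matrix):
    """Fraction of notes where algorithm and manual labels coincide."""
    matched = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return matched / total


def per_class_recall(matrix, labels):
    """For each manual label, the share the algorithm labelled the same way."""
    return {labels[i]: row[i] / sum(row) for i, row in enumerate(matrix)}


print(f"agreement: {agreement(CONFUSION):.0%}")
for label, recall in per_class_recall(CONFUSION, LABELS).items():
    print(f"{label}: {recall:.0%}")
```

The breakdown shows that manually negative notes are the ones the algorithm matches least often (14 of 32), which matters most for a well-being application.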
Longitudinal study
Figure: ETEA algorithm assignment of polarity against the manual assignment of polarity (indicated by the colour). x-axis: case notes over time; y-axis: score given to case note by ETEA. Colour legend (manual assignment): strong negative, weak negative, neutral, weak positive, strong positive. The points labelled 1 and 2 are the case notes quoted below.
Case note examples from the prisoner’s timeline in the plot
1. “Personal officer Spoke with X who states he is very pleased to have been awarded his enhanced status. He states he is doing well with starting computers and art classes. He has finished his induction and is aware he will not be assessed for at least 9 months. He gets on with most prisoners on the unit and will help out with additional cleaning if so asked by staff. No issues raised.”
2. “X has told me that he would like to help in the barbershop as he has qulaifications in hair cutting. He has missed several music classes as a result. I would like X to decide which activity he would like to concentrate on and to adhere to his timetable accordingly and not pick and choose when he wants to come to education. When he does comes he is demanding of time and attention and always wants what he wants straight away and gets frustrated when things don’t happen as quickly as he would like. I am also wondering if part of his behaviour may be located somewhere on the autistic spectrum. I wonder if he has been tested for this. If not perhaps this might be a good idea to help him deal with his impatience and demanding nature.”
Self-Inflicted Death (SID) study
Does polarity change over time in the weeks running up to a SID?
Figure: the distribution of polarity scores for 4-week periods, counting back in time from the self-inflicted death event. x-axis: polarity score; y-axis: density.
Figure: the positive (green) and negative (red) contributions to an individual’s polarity over time. x-axis: date (March to May); y-axis: polarity.
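Grouping scored notes into 4-week periods counted back from an event date can be sketched as follows. The dates and scores are invented for illustration, the function names are hypothetical, and the per-period mean printed at the end is a simplification of the full distributions described in this section.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def period_index(note_date: date, event_date: date, weeks: int = 4) -> int:
    """0 for the final block of `weeks` weeks before the event, 1 for the block before that, and so on."""
    days_before = (event_date - note_date).days
    if days_before < 0:
        raise ValueError("note is dated after the event")
    return days_before // (7 * weeks)


def scores_by_period(scored_notes, event_date):
    """Group (date, polarity score) pairs into periods counted back from the event."""
    buckets = defaultdict(list)
    for note_date, score in scored_notes:
        buckets[period_index(note_date, event_date)].append(score)
    return dict(sorted(buckets.items()))


# Hypothetical dates and scores, for illustration only.
event = date(2016, 11, 30)
scored = [
    (date(2016, 11, 25), -3), (date(2016, 11, 10), -1),
    (date(2016, 10, 20), 1), (date(2016, 9, 15), 2),
]
grouped = scores_by_period(scored, event)
for period, scores in grouped.items():
    print(period, scores, mean(scores))
```

Each list in `grouped` is the raw material for one density curve; comparing the curves across periods is what reveals any drift in polarity.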
Violence Predictor (VIPER)
• The violence in prisons estimator (Viper) is a measure to be used to assess an individual’s likelihood of being a perpetrator of violence in prison.
• Viper is a framework rather than a single score, which can be applied in several situations to provide the most appropriate score/measure in that specific scenario.
• The framework is made up of several random forest models. Each model represents the probability of an individual being the perpetrator of violence in a month.
• The case note categorisation and the polarity have been used as variables in the machine learning.
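To show how categorisation and polarity might become model variables, the sketch below aggregates notes into one row per prisoner per month. The feature names, the monthly grouping key and the function `monthly_features` are assumptions; the actual Viper variables are not described in these slides, and the random forest models themselves are not shown.

```python
from collections import Counter, defaultdict
from statistics import mean


def monthly_features(notes, categories):
    """Build one feature row per (prisoner, month).

    notes: iterable of (prisoner_id, 'YYYY-MM', category, polarity score).
    Returns {(prisoner_id, month): {feature name: value}}.
    """
    grouped = defaultdict(list)
    for prisoner_id, month, category, score in notes:
        grouped[(prisoner_id, month)].append((category, score))

    rows = {}
    for key, items in grouped.items():
        counts = Counter(category for category, _ in items)
        # One count column per category, plus the month's mean polarity.
        row = {f"n_{c}": counts.get(c, 0) for c in categories}
        row["mean_polarity"] = mean(score for _, score in items)
        rows[key] = row
    return rows


# Hypothetical toy notes for one prisoner.
toy_notes = [
    ("A1", "2016-09", "IEP", -1),
    ("A1", "2016-09", "Bullying", -3),
    ("A1", "2016-10", "Work", 2),
]
features = monthly_features(toy_notes, ["Bullying", "IEP", "Work"])
print(features[("A1", "2016-09")])
```

Rows of this shape could then be joined to other administrative variables before being passed to a classifier.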
Conclusions and further work
Using the case notes in predictive modelling
• The use of categorisation and polarity in the case notes is being tested in the violence predictive model.
• There are further applications for the case notes in self-harm predictive models.
Operational use of the case notes
• The case note analysis can be implemented in a Shiny application, which can be made available to prison staff. This will permit prison staff to search the case notes by prison or prisoner, by time, or by content of the case note.
Further analytical work
• A more detailed exploration of the case notes:
• Comparison of polarity between individuals on ACCT review and those not on ACCT review. Can this be used to explain where individuals with mental health issues are being missed?
• Can the magnitude of polarity be used as a metric, especially in a longitudinal study of individuals?
• Expansion of these methods to other free text fields. For example, using natural language processing / machine learning to triage the ~500,000 intelligence reports received per year, to ensure that analysts’ time is spent only on the most important ones, and applications to sentence plans and pre-sentence reports.
Thank you for your attention
[email protected], @jo_noms on Slack
Acknowledgements
Chris Kirk – data scientist who developed ETEA
Offender Insight Team, DaSH, MoJ
Maria Angulo, Offender Health Analysis Team, MoJ