Detecting Dementia from Written and Spoken Language · 2020-03-07 · Dementia from Written and Spoken Language MSc Presentation by: Vaden Masrani Supervisor: Giuseppe Carenini December

Detecting Dementia from Written and Spoken Language
MSc Presentation by: Vaden Masrani
Supervisor: Giuseppe Carenini

December 20th, 2017


• Recent interest in intersection of computational linguistics and clinical psychology

• Can we detect:
  • Depression from Facebook updates
  • PTSD from tweets

• Is it possible to detect dementia from interview transcripts?

• Why do we care?

Background


• What is dementia?
  • Broad category of brain diseases which cause a decrease in mental ability
  • Alzheimer’s disease
  • Vascular dementia
  • Dementia with Lewy bodies, and others

• Alzheimer’s is one of the most costly diseases in developing countries
  • 47 million people worldwide w/ dementia
  • One in nine people aged 65 and older has AD
  • $109 - $215 billion annually in the US

Background



“Researchers believe that early detection of Alzheimer’s will be key to preventing, slowing and stopping the disease.”

Alzheimer’s Association. 2016 Alzheimer’s Disease Facts and Figures. https://www.alz.org/documents_custom/2016-facts-and-figures.pdf, 2016

Background



• Currently, diagnosis is:
  • Expensive
  • Time consuming
  • Invasive
  • Often goes undiagnosed, especially in developing countries

• Symptoms:
  • Wide variety of symptoms severe enough to affect daily functioning
  • Aphasia: speech and language difficulty

Background



Background
• State of the art in detecting AD was achieved by Fraser 2015*
  • Introduced a large (370) feature set
  • Achieved state-of-the-art results on DementiaBank dataset
• Our work builds upon Fraser 2015
  • Address small data issue
  • Introduce a new feature set
  • Extend to written samples

“Cookie Theft” photo from the Boston Diagnostic Aphasia Examination

*Fraser, Kathleen C., Jed A. Meltzer, and Frank Rudzicz. "Linguistic Features Identify Alzheimer’s Disease in Narrative Speech." Journal of Alzheimer's Disease 49.2 (2015): 407-422.

A clock drawn by a person with hemispatial neglect

Study 1: New feature sets
• Fraser missed two feature sets
  • One from the medical literature
  • One from the NLP literature
• We replicate their findings and show:
  • Measuring hemispatial neglect achieves new state-of-the-art
  • Discourse analysis has no effect on DementiaBank dataset

Thalia Shoshana Field, Vaden Masrani, Gabriel Murray, Giuseppe Carenini. "Improving Diagnostic Accuracy Of Alzheimer's Disease From Speech Analysis Using Markers Of Hemispatial Neglect." Alzheimer's & Dementia: The Journal of the Alzheimer's Association 13.7 (2017): P157-P158.


Study 2: Written Language
• All previous work has been on spoken samples
• Written samples are harder to process, but there will be lots of data in the future
• We create a publicly available dataset of blog posts
  • Posts written by people with / without dementia
  • Allows for:
    • Analysis of changes in writing style as disease progresses
    • Comparison between different pathologies
• We show it is possible to detect dementia from writing samples

Vaden Masrani, Gabriel Murray, Thalia Shoshana Field, Giuseppe Carenini. "Detecting Dementia through Retrospective Analysis of Routine Blog Posts by Bloggers with Dementia." BioNLP (2017).


Study 3: Detecting Mild Cognitive Impairment

Vaden Masrani, Gabriel Murray, Thalia Shoshana Field, Giuseppe Carenini "Domain Adaptation for Detecting Mild Cognitive Impairment." Canadian Conference on Artificial Intelligence (2017).

• MCI is difficult to detect
  • Very little data (n=37)
  • Less symptomatic than AD
  • Less studied than AD
• We show how to overcome the lack of data
  • Idea: Use AD data to detect MCI
  • Technique: domain adaptation
  • We compare two algorithms across a range of models


Study 1: Hemispatial Neglect

Hemispatial Neglect

A clock drawn by a person with hemispatial neglect

• Reduced awareness on one side of the visual field
• Often occurs as a result of brain damage
• Not blindness
• Patients may not know they have hemispatial neglect

Hemispatial Neglect

Eye movements of a patient with left-side hemispatial neglect. Patient was asked to search for letter T among Ls



Hemispatial Neglect

Halves
  Left: boy, girl, cookie, jar, stool, cupboard, steal, fall, kitchen
  Right: woman, exterior, sink, plate, dishcloth, water, window, dishes, curtains, wash, overflow, cupboard, kitchen

Strips
  Far-left: girl, cookie, jar, stool, cupboard, steal, kitchen, cupboard
  Center-left: boy, cookie, stool, steal, fall, kitchen, cupboard
  Center-right: woman, exterior, sink, plate, dishcloth, water, window, dishes, curtains, wash, overflow, kitchen, cupboard
  Far-right: exterior, window, dishes, curtains, kitchen, cupboard

Quarters
  NE: woman, exterior, plate, dishcloth, wash, window, curtains, kitchen
  SE: woman, sink, water, dishes, overflow, cupboard, kitchen
  NW: girl, cookie, jar, cupboard, steal, boy, cookie, kitchen
  SW: girl, stool, fall, cupboard, kitchen


Hemispatial Neglect

\[
\mathrm{attention}_i = \#\ \text{mentions of info-units in division } i
\]

\[
\mathrm{concentration}_i = \frac{\#\ \text{mentions of info-units in division } i}{\text{total words uttered}}
\]

\[
\mathrm{repetition}_i = \frac{\#\ \text{unique info-units mentioned in division } i}{\#\ \text{mentions of info-units in division } i}
\]

\[
\mathrm{perception}_i = \frac{\#\ \text{unique info-units mentioned in division } i}{\#\ \text{info-units in division } i}
\]
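The four per-division features defined above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: it assumes a transcript is already a list of lowercased tokens and that a "mention" is an exact token match against an info-unit name, which is a simplification. The function name `neglect_features` and the example sentence are made up; the info-unit lists are the halves partition from the earlier slide.

```python
from collections import Counter

# Info-units for the "halves" partition, copied from the slide.
HALVES = {
    "left": {"boy", "girl", "cookie", "jar", "stool", "cupboard",
             "steal", "fall", "kitchen"},
    "right": {"woman", "exterior", "sink", "plate", "dishcloth", "water",
              "window", "dishes", "curtains", "wash", "overflow",
              "cupboard", "kitchen"},
}


def neglect_features(tokens, divisions):
    """Return attention, concentration, repetition and perception per division.

    Assumption: a mention is an exact match between a token and an
    info-unit name (no lemmatisation or synonym handling).
    """
    counts = Counter(tokens)
    total_words = len(tokens)
    features = {}
    for name, units in divisions.items():
        mentions = sum(counts[u] for u in units)          # all mentions
        unique = sum(1 for u in units if counts[u] > 0)   # distinct units seen
        features[name] = {
            "attention": mentions,
            "concentration": mentions / total_words if total_words else 0.0,
            "repetition": unique / mentions if mentions else 0.0,
            "perception": unique / len(units),
        }
    return features


# Made-up example utterance.
tokens = ("the boy is on the stool and the boy takes a cookie "
          "while water is in the sink").split()
feats = neglect_features(tokens, HALVES)
print(feats["left"])
print(feats["right"])
```

Swapping `HALVES` for a strips or quarters dictionary gives the other two partitions without changing the function.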

Discourse Features
• In a coherent passage one sentence has a clear relation to the next
• Discourse parsing is the task of segmenting a piece of text into EDUs* and then forming a discourse tree

* EDU = Elementary Discourse Unit

Discourse tree for the two sentences: "But he added: 'Some people use the purchasers' index as a leading indicator, and some use it as a coincident indicator. But the thing it's supposed to measure - manufacturing strength - it missed altogether last month.'"

Discourse Features
• # of each discourse relation
• Depth of the discourse tree
• Mean number of EDUs per utterance
• Discourse relation type-to-token ratio
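As a rough illustration of how these four features could be read off parsed output, here is a minimal Python sketch. The tree encoding is an assumption made for the example, not the format of any particular discourse parser: a leaf is a string (one EDU) and an internal node is a tuple `(relation, child, child, ...)`. Treating a leaf as depth 0 and reporting the maximum depth over utterances are also assumptions, and the relation labels and sentences are made up.

```python
from collections import Counter


def relations(tree):
    """Yield the relation label of every internal node in the tree."""
    if isinstance(tree, str):      # a leaf is a single EDU
        return
    label, *children = tree
    yield label
    for child in children:
        yield from relations(child)


def depth(tree):
    """Depth of the tree, counting a lone EDU as depth 0 (assumption)."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def num_edus(tree):
    """Number of leaves (EDUs) under this node."""
    if isinstance(tree, str):
        return 1
    return sum(num_edus(child) for child in tree[1:])


def discourse_features(trees):
    """Compute the four features from one discourse tree per utterance."""
    rel_counts = Counter(r for t in trees for r in relations(t))
    total = sum(rel_counts.values())
    return {
        "relation_counts": dict(rel_counts),
        "max_depth": max(depth(t) for t in trees),
        "mean_edus_per_utterance": sum(num_edus(t) for t in trees) / len(trees),
        # distinct relation types divided by total relation tokens
        "relation_ttr": len(rel_counts) / total if total else 0.0,
    }


# Made-up trees for three utterances.
trees = [
    ("Elaboration", "the boy is on the stool",
     ("Cause", "the stool tips", "so he falls")),
    ("Elaboration", "the woman washes dishes", "the sink overflows"),
    "there is a window",
]
features = discourse_features(trees)
print(features)
```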


Methodology
• Baseline: Fraser’s feature set
  • Psycholinguistic (5)
  • Vocabulary richness (4)
  • Acoustic (172)
  • Info-units (40)
  • Parts-of-speech (15)
  • Context-free-grammar rules (44)
  • Syntactic complexity (27)
  • Repetitiveness (5)
• 10-fold CV
• Feature selection preprocessing step

A parse tree for “a child is stealing cookies”

Results
• Halves improves best model by 2.2%
• Strips gives the second largest improvement
• Quadrant and discourse features have negligible effect

Legend:
halves -> hemispatial neglect features w/ halves partition
strips -> hemispatial neglect features w/ strips partition
quadrants -> hemispatial neglect features w/ quadrants partition


Results
• Plot shows change in performance (from baseline)
• Quadratic terms don’t help all models


Results
• Features are scored based on correlation w/ diagnosis
• Score of 1.0 -> most correlated feature across all folds
• Perception: Rightside is the most important feature (more than age!)


Results
• Participants w/ dementia:
  • are less perceptive on the right side of the image
  • use more personal pronouns
  • use shorter words
  • are older

Conclusion
• Hemispatial neglect features improve our ability to detect AD
  • On DementiaBank dataset
  • More correlated w/ diagnosis than age
  • Try with/without quadratic cross terms
• Main negative result:
  • Discourse features have no effect on DementiaBank dataset
  • May be due to non-narrative structure of response

Study 2: Written Samples

Written Samples: Background
• Most work to date has been with spoken language samples
• Detecting impairment from writing is more difficult
  • No test-specific features
  • Not constrained to single topic
  • Author can make revisions
• There will be lots of data in the future
  • As the “internet generation” grows older
  • Today only 34% of seniors use social media
• Can we detect dementia from written language?

Data set
• We developed a novel data set of blog posts
  • Three written by people with dementia
  • Three written by caretakers of people with dementia
• Will allow for:
  • Longitudinal analysis
  • Comparison between pathology, demographics
  • Comparison between subtypes of dementia
• We perform preliminary analysis
  • Is automatic detection possible?

Data set


Blog Name               Posts  Words per post     Start Date  Diagnosis  Gender/Age
living-with-alzhiemers  344    263.03 (s=140.28)  Sept 2006   AD         M, 72 (approx)
creatingmemories        618    242.22 (s=169.42)  Dec 2003    AD         F, 61
parkblog-silverfox      692    393.21 (s=181.54)  May 2009    Lewy Body  M, 65
journeywithdementia     201    803.91 (s=548.34)  Mar 2012    Control    F, unknown
earlyonset              452    615.11 (s=206.72)  Jan 2008    Control    F, unknown
helpparentsagewell      498    227.12 (s=209.17)  Sept 2009   Control    F, unknown

Methodology
• Features
  • Part-of-speech (15), psycholinguistic (5), syntactic complexity (27), repetitiveness (5), vocabulary richness (4), context-free-grammar (44)
  • No info-unit features or acoustic features
  • Include feature selection
• 9-fold CV
  • Test fold -> posts from 1 dementia blog, 1 control blog
  • Train fold -> posts from remaining 4 blogs
• Task: predict class of unseen blog post
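One way to read the 9-fold scheme: with three dementia blogs and three control blogs there are 3 x 3 = 9 ways to hold out one blog of each class. The sketch below builds the folds under that assumption (the slide does not spell out how the nine folds were formed). The blog names are taken from the data set table; the function name `blog_folds` is made up.

```python
from itertools import product

DEMENTIA_BLOGS = ["living-with-alzhiemers", "creatingmemories", "parkblog-silverfox"]
CONTROL_BLOGS = ["journeywithdementia", "earlyonset", "helpparentsagewell"]


def blog_folds(dementia_blogs, control_blogs):
    """Yield (train_blogs, test_blogs) pairs.

    Assumption: each test fold holds out one dementia blog and one control
    blog, and the remaining four blogs form the training set.
    """
    all_blogs = dementia_blogs + control_blogs
    for d_blog, c_blog in product(dementia_blogs, control_blogs):
        test = [d_blog, c_blog]
        train = [b for b in all_blogs if b not in test]
        yield train, test


folds = list(blog_folds(DEMENTIA_BLOGS, CONTROL_BLOGS))
for train, test in folds:
    print("test:", test, "| train:", train)
```

Splitting by blog rather than by post means no author appears in both the training and test sets of a fold, so a model cannot score well simply by recognising an individual writer.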

Results
• All models beat majority class baseline
• KNN/LogReg have best AUC


Results
• Ablation analysis
  • Remove feature group, measure change in performance
• All feature groups are important
  • Psycholinguistic features most of all
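The ablation loop itself is simple and can be sketched independently of any particular model. In the sketch below, `evaluate` is a hypothetical callable standing in for "train and cross-validate with these features, return a score". The toy scorer, the feature names and the weights are invented purely so the example runs; they are not results from the study.

```python
def ablation(feature_groups, evaluate):
    """Return {group: drop in score when that group is removed}.

    feature_groups: dict mapping a group name to a list of feature names.
    evaluate: hypothetical callable taking a list of feature names and
              returning a score such as AUC.
    """
    all_features = [f for feats in feature_groups.values() for f in feats]
    full_score = evaluate(all_features)
    drops = {}
    for group, feats in feature_groups.items():
        removed = set(feats)
        kept = [f for f in all_features if f not in removed]
        drops[group] = full_score - evaluate(kept)
    return drops


# Toy stand-in for a real evaluation: every feature adds a fixed,
# invented amount to a base score of 0.5.
TOY_WEIGHTS = {"psy_1": 0.20, "psy_2": 0.10, "pos_1": 0.05, "rep_1": 0.02}


def toy_evaluate(features):
    return 0.5 + sum(TOY_WEIGHTS[f] for f in features)


groups = {
    "psycholinguistic": ["psy_1", "psy_2"],
    "part_of_speech": ["pos_1"],
    "repetitiveness": ["rep_1"],
}
drops = ablation(groups, toy_evaluate)
print(drops)
```

A larger drop means the model leaned more heavily on that group.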


Results
• SUBTLWord score is the most important feature
  • Measure of how frequently a word is used in daily life
  • ↑ SUBTLWord score == ↓ Vocabulary
• Pronoun use / word length also score highly
  • Same findings as in spoken dataset
• Red == dementia

Conclusion
• Despite difficulties associated with written language, it is possible to detect dementia from blog posts
  • All feature groups significant
  • All models achieved AUC above baseline (.50)
  • Bloggers w/ dementia use simpler language
    • SUBTL word score
• Future work
  • Control for education level
  • Topic clustering preprocessing step
  • Longitudinal analysis


Study 3: Domain Adaptation

Domain Adaptation: Background
• Mild Cognitive Impairment
  • Precursor stage which may lead to eventual dementia diagnosis
  • Potentially treatable
  • Less data available than for AD
• Domain adaptation:
  • Useful when you have lots of data from one domain and you want to use it to help you in a second domain

Methodology
• Idea: Use AD data to improve MCI classification accuracy
• We compare two domain adaptation algorithms against baselines:
  • Alg1: AUGMENT
  • Alg2: CORAL
• DementiaBank data:
  • Source: 257 AD, 201 control
  • Target: 43 MCI samples, 41 control
• 10-fold cross validation
  • Only target data in test fold
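The "only target data in test fold" constraint can be sketched as a fold generator in which source samples are always available for training and only target samples are ever held out. This is an illustration under simple assumptions (no shuffling or stratification, folds formed by taking every k-th target index); the function name `target_only_folds` is made up, and the sample counts are the ones on the slide.

```python
def target_only_folds(n_source, n_target, k=10):
    """Yield (train_source, train_target, test_target) index lists.

    Every source sample can be used for training in every fold; only
    target samples are held out for testing.
    """
    source_idx = list(range(n_source))
    target_idx = list(range(n_target))
    for fold in range(k):
        test = target_idx[fold::k]            # every k-th target sample
        held_out = set(test)
        train_target = [i for i in target_idx if i not in held_out]
        yield source_idx, train_target, test


# Source: 257 AD + 201 control. Target: 43 MCI + 41 control.
folds = list(target_only_folds(n_source=257 + 201, n_target=43 + 41))
for src, tgt_train, tgt_test in folds:
    print(len(src), len(tgt_train), len(tgt_test))
```

A target_only baseline would ignore the source indices, and a source_only baseline would ignore the target training indices, while still being scored on the same held-out target samples.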

Alg1: AUGMENT
• From widely cited 2009 paper “Frustratingly Easy Domain Adaptation” by Hal Daumé III
• Augment feature space by creating a shared, target only, and source only copy of each feature:

source data = Alzheimer’s + control, target data = Mild Cognitive Impairment + control

https://arxiv.org/find/cs/1/au:+Daume_H/0/1/0/all/0/1

Alg1: AUGMENT

• Imagine d=2 (features: # of words, mean word length)

Original data:

                # of words   mean word length
source (AD)     71           4.1
source (AD)     69           6.1
source (AD)     64           2.0
source (AD)     73           3.1
target (MCI)    60           3.3
target (MCI)    65           7.1

Augmented data (shared, target only, and source only copy of each feature):

                shared       target only   source only
source (AD)     71   4.1     0    0        71   4.1
source (AD)     69   6.1     0    0        69   6.1
source (AD)     64   2.0     0    0        64   2.0
source (AD)     73   3.1     0    0        73   3.1
target (MCI)    60   3.3     60   3.3      0    0
target (MCI)    65   7.1     65   7.1      0    0
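The d=2 example can be reproduced with a few lines of Python. This is a minimal sketch of the feature augmentation step only, using the column order from the slide (shared, target only, source only); the function name `augment` is made up, and the classifier that is then trained on the 3d-dimensional rows is left out.

```python
def augment(row, domain):
    """Map a d-dimensional row to 3d dimensions.

    Column order follows the slide: [shared, target only, source only].
    """
    zeros = [0.0] * len(row)
    if domain == "source":
        return list(row) + zeros + list(row)
    if domain == "target":
        return list(row) + list(row) + zeros
    raise ValueError(f"unknown domain: {domain!r}")


# Rows from the slide: (# of words, mean word length).
source_rows = [[71, 4.1], [69, 6.1], [64, 2.0], [73, 3.1]]   # AD + control
target_rows = [[60, 3.3], [65, 7.1]]                          # MCI + control

augmented = ([augment(r, "source") for r in source_rows]
             + [augment(r, "target") for r in target_rows])
for row in augmented:
    print(row)
```

A linear model trained on these rows gets one weight per feature in the shared block plus a domain-specific weight in each of the other blocks, which is why the technique only helps models that learn a weight vector.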

Alg1: AUGMENT
• Important caveat!
  • Model must learn a weight vector for this to work
  • e.g., KNN, Naive Bayes not expected to improve

Alg2: CORAL (CORrelation ALignment)
• “CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions*”
• How? Make covariance matrix of source data == covariance matrix of target data

*B. Sun, J. Feng, and K. Saenko. Return of frustratingly easy domain adaptation. arXiv preprint arXiv:1511.05547, 2015
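A minimal NumPy sketch of the covariance alignment idea: whiten the source features with the inverse square root of the source covariance, then recolour them with the square root of the target covariance. The function names and the toy data are made up. Adding `eps` times the identity to each covariance is a regularisation choice that follows my reading of the cited paper and should be treated as an assumption rather than the exact setup used in this study.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T


def coral(source, target, eps=1.0):
    """Transform source rows so their covariance matches the target's.

    source, target: 2-D arrays of shape (n_samples, n_features).
    eps: weight of the identity added to each covariance (assumption).
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    whitened = source @ _sym_matrix_power(cov_s, -0.5)   # whitening
    return whitened @ _sym_matrix_power(cov_t, 0.5)      # recolouring


# Toy data with deliberately different covariance structure.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 3.0])
target = rng.normal(size=(150, 3)) * np.array([0.5, 0.5, 4.0])

aligned = coral(source, target, eps=1e-9)
print(np.round(np.cov(aligned, rowvar=False), 3))
print(np.round(np.cov(target, rowvar=False), 3))
```

With a very small `eps` the two printed matrices agree closely. A classifier would then be trained on `aligned` with the source labels and applied to the untouched target data.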


Results
• Alg1: AUGMENT improves upon all baselines
  • 5% increase from target_only
• Alg2: CORAL results in worse performance than target_only

source data = Alzheimer’s + control, target data = Mild Cognitive Impairment + control


Results
• Alg1: AUGMENT fails to improve models that do not learn a weight vector
  • KNN
  • Random Forests
  • Naive Bayes

AD = Alzheimer’s Disease, MCI = Mild Cognitive Impairment

Conclusion
• Mild Cognitive Impairment is hard to detect:
  • Lack of data
  • Less symptomatic
• We can improve MCI detection with AD data
  • Using domain adaptation
• Alg1: AUGMENT algorithm increases F-measure by 5%
  • Model must learn a weight vector
• Main negative result:
  • Alg2: CORAL doesn’t work on DementiaBank dataset
  • May be due to binary features

Motivation
• Detecting dementia early is very important
  • Key to finding a cure
  • For patients to receive support
• NLP + ML can be used to detect dementia from language samples
• Language samples are easy to collect
  • Application can be distributed easily to developing countries

“The Progression of Alzheimer's Through My Mom's Crocheting” - wuillermania
https://www.reddit.com/user/wuillermania

Contributions
• We showed:
  • Measuring hemispatial neglect improves accuracy
  • It’s possible to automatically detect dementia from unstructured written text
  • AD data can improve our ability to detect MCI using domain adaptation

Contribution
• We released a new data set
  • Blog posts
  • Will permit longitudinal analysis
• Main negative results:
  • Discourse features have no effect w/ CookieTheft data
  • Domain adaptation Alg2: CORAL performs poorly w/ CookieTheft data

Future work
• Multiple source domains
  • AD, vascular dementia, dementia with Lewy Bodies
  • CookieTheft, Narrative Retelling Task, Blogs
• Extend written data set
  • Emails
  • Forum conversations
  • Presidential tweets
• Longitudinal analysis
  • Track changes in writing style as disease progresses

Acknowledgements
• Dr. Giuseppe Carenini, University of British Columbia
• Dr. Gabriel Murray, University of the Fraser Valley
• Dr. Thalia Field, UBC Faculty of Medicine

Thank you!

Methodology: Feature Selection

Contribution 1 Results

F-Measure     Baseline             Strips               Halves               Quarters             Discourse
LogReg        0.824 (0.798-0.850)  0.833 (0.801-0.866)  0.846 (0.813-0.878)  0.821 (0.784-0.859)  0.824 (0.798-0.850)
SVM           0.737 (0.688-0.786)  0.725 (0.683-0.766)  0.721 (0.678-0.764)  0.730 (0.693-0.767)  0.737 (0.688-0.786)
KNN           0.692 (0.654-0.729)  0.716 (0.674-0.759)  0.728 (0.669-0.788)  0.707 (0.675-0.739)  0.692 (0.654-0.729)
RandomForest  0.796 (0.761-0.832)  0.802 (0.759-0.845)  0.754 (0.717-0.792)  0.799 (0.765-0.834)  0.800 (0.763-0.836)
NaiveBayes    0.780 (0.746-0.814)  0.777 (0.747-0.806)  0.760 (0.732-0.789)  0.780 (0.750-0.810)  0.780 (0.746-0.814)


Change in F-Measure  Strips                     Halves                     Quarters                   Discourse
LogReg               0.010 (-0.008 to 0.027)    0.022 (0.003 to 0.041)     -0.002 (-0.033 to 0.028)   0.000
SVM                  -0.012 (-0.058 to 0.034)   -0.016 (-0.084 to 0.053)   -0.007 (-0.034 to 0.020)   0.000
KNN                  0.025 (-0.013 to 0.062)    0.037 (-0.025 to 0.099)    0.016 (-0.013 to 0.045)    0.000
RandomForest         0.006 (-0.026 to 0.037)    -0.042 (-0.072 to -0.012)  0.003 (-0.011 to 0.017)    0.003 (-0.012 to 0.019)
NaiveBayes           -0.004 (-0.017 to 0.010)   -0.020 (-0.046 to 0.006)   -0.000 (-0.008 to 0.007)   0.000


Change in F-Measure  Strips                     Halves                     Quarters                   Discourse
LogReg               0.010 (-0.008 to 0.027)    0.004 (-0.024 to 0.033)    -0.002 (-0.033 to 0.028)   0.000
SVM                  -0.012 (-0.058 to 0.034)   -0.019 (-0.052 to 0.014)   -0.007 (-0.034 to 0.020)   0.000
KNN                  0.025 (-0.013 to 0.062)    0.022 (-0.010 to 0.054)    0.016 (-0.013 to 0.045)    0.000
RandomForest         0.006 (-0.026 to 0.037)    0.004 (-0.028 to 0.037)    0.003 (-0.011 to 0.017)    0.003 (-0.012 to 0.019)
NaiveBayes           -0.004 (-0.017 to 0.010)   0.001 (-0.011 to 0.013)    -0.000 (-0.008 to 0.007)   0.000


Performance

Model           Accuracy             AUC                  F-Measure
Majority Class  0.515 (0.471-0.560)  0.500                0.677 (0.638-0.716)
LogReg          0.822 (0.795-0.848)  0.894 (0.867-0.921)  0.824 (0.798-0.850)
SVM             0.723 (0.674-0.772)  0.769 (0.741-0.796)  0.737 (0.688-0.786)
KNN             0.701 (0.654-0.749)  0.750 (0.712-0.787)  0.692 (0.654-0.729)
Random Forests  0.796 (0.759-0.833)  0.871 (0.838-0.903)  0.796 (0.761-0.832)
Naive Bayes     0.782 (0.746-0.818)  0.846 (0.810-0.882)  0.780 (0.746-0.814)


F-Measure       target only          source only          relabelled           AUGMENT              CORAL
Majority Class  0.283 (0.122-0.445)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)
LogReg          0.667 (0.505-0.829)  0.614 (0.445-0.783)  0.690 (0.577-0.803)  0.717 (0.562-0.871)  0.637 (0.487-0.786)
SVM             0.595 (0.472-0.719)  0.637 (0.487-0.786)  0.659 (0.520-0.798)  0.664 (0.533-0.796)  0.637 (0.487-0.786)

source data = Alzheimer’s + control, target data = MCI + control


F-Measure       target only          source only          relabelled           AUGMENT              CORAL
Majority Class  0.283 (0.122-0.445)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)  0.637 (0.487-0.786)
RandomForest    0.602 (0.503-0.701)  0.670 (0.534-0.806)  0.695 (0.558-0.832)  0.581 (0.478-0.684)  0.642 (0.490-0.794)
KNN             0.597 (0.429-0.764)  0.522 (0.347-0.696)  0.550 (0.372-0.729)  0.558 (0.357-0.760)  0.637 (0.487-0.786)
NaiveBayes      0.536 (0.362-0.710)  0.520 (0.396-0.643)  0.545 (0.422-0.668)  0.512 (0.425-0.599)  0.647 (0.502-0.791)

Contribution 2 Domain Adaptation: CORAL

Whitening

Recolouring


Performance

Model           Accuracy             AUC                  F-Measure
Majority Class  0.629 (0.583-0.674)  0.500                0.770 (0.736-0.804)
LogReg          0.724 (0.677-0.770)  0.759 (0.689-0.829)  0.785 (0.743-0.827)
SVM             0.638 (0.595-0.681)  0.666 (0.613-0.719)  0.773 (0.739-0.806)
KNN             0.728 (0.687-0.769)  0.761 (0.714-0.807)  0.785 (0.746-0.823)
Random Forests  0.681 (0.617-0.745)  0.696 (0.599-0.792)  0.766 (0.719-0.813)
Naive Bayes     0.658 (0.595-0.721)  0.674 (0.574-0.775)  0.732 (0.680-0.783)
