This paper was selected at the ICDM Workshop on Data Mining in Biomedical Informatics and Healthcare (DMBIH), 2013.
Barnan Das
School of Electrical Engineering and Computer Science
Washington State University
Self-portraits by William Utermohlen, an American artist living in London, after he was diagnosed with Alzheimer’s disease in 1995. Utermohlen died from the consequences of Alzheimer’s disease in March 2007.
Handling Class Overlap and Imbalance to Detect Prompt Situations in Smart Homes
Barnan Das, Narayanan C. Krishnan, Diane J. Cook
Worldwide Dementia Population
• Worldwide dementia population: 36 million
• Payment for care in 2012: $200 billion
• Unpaid caregivers: 15 million
• Actual and expected number of Americans aged 65 and older with Alzheimer’s: 5.1 million (2010), 7.7 million (2030), 13.2 million (2050)
Source: World Health Organization and Alzheimer’s Association.
Automated Prompting
Help with Activities of Daily Living (ADLs)
Existing Work
• Rule-based (temporal or contextual)
• Activity initiation
• RFID and video-input based prompts for activity steps
Our Contribution
• Learning-based
• Sub-activity level prompts
• No audio/video input
Architectural Overview
Data
Raw Data
• 8 daily activities: Sweeping, Cooking, Medication, Watering Plants, etc.
• 300 elderly participants
• Prompts issued when errors were committed
Clean Data
• 1 activity step = 1 data point
• 17 engineered features: length of activity step, location in apartment, number of sensors involved, distribution of sensor events, etc.
• Binary class {no-prompt, prompt}, encoded as 0/1
Class Distribution
• Prompt: 149 data points
• No-prompt: 3831 data points
• Total number of data points: 3980
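The degree of imbalance implied by these counts can be worked out directly. The short sketch below assumes that the smaller count (149) belongs to the prompt class and the larger (3831) to the no-prompt class; the variable names are illustrative only.

```python
# Counts taken from the class-distribution slide. Which count belongs to
# which class is an assumption: prompt is taken to be the minority class.
n_prompt = 149
n_no_prompt = 3831

total = n_prompt + n_no_prompt
prompt_fraction = n_prompt / total          # share of minority-class points
imbalance_ratio = n_no_prompt / n_prompt    # majority points per minority point

print(f"total data points = {total}")
print(f"prompt share      = {prompt_fraction:.2%}")
print(f"no-prompt : prompt = {imbalance_ratio:.1f} : 1")
```

Under this assumption fewer than 4% of the data points are prompts, which is why plain accuracy is a poor guide here and why the later slides report TP Rate, G-mean, and AUC instead.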
Overlapping Classes
Overlapping Classes in Prompting Data
3D PCA Plot of prompting data
Existing Approaches
• Discard data of the overlapping region
• Treat overlapping region as a separate class
• Polynomial combination of existing features
• Using kernel methods
Tomek Links
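A Tomek link is commonly defined as a pair of points from opposite classes that are each other's nearest neighbor; such pairs tend to sit on the class boundary or in the overlapping region. The following is a minimal brute-force sketch of that definition, not the implementation used in this work. The function name and the toy data are hypothetical.

```python
import math

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two points are mutual nearest
    neighbors (Euclidean distance) and carry different class labels.
    Assumes at least two points; ties are broken by lowest index.
    """
    n = len(points)

    def nearest(i):
        # Brute-force nearest neighbor of point i, excluding itself.
        return min((j for j in range(n) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(n)]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

# Toy data: points 0 and 1 are close together but belong to different classes.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.2, 5.0)]
labels = [0, 1, 0, 0, 1, 1]
links = tomek_links(points, labels)
print(links)
```

A typical use is to drop the majority-class member of each link, which cleans the boundary but only removes isolated pairs rather than whole overlapping regions.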
Cluster-Based Under-Sampling (ClusBUS)
• Form clusters
• Under-sampling candidate clusters
Two Critical Components
Choice of Clustering Algorithm: DBSCAN
• Density-based
• Non-spherical clusters
• No need to predetermine number of clusters
Determining Candidate Clusters: Empirically Determined
• Based on minority class dominance (r) in clusters
• Threshold determined by q-quantile values of r
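The under-sampling step can be sketched as follows, assuming cluster assignments have already been produced by a clustering step such as DBSCAN. Several details here are assumptions rather than the authors' exact procedure: r is taken as the fraction of minority points in a cluster, the threshold is a nearest-rank q-quantile of r over clusters that contain at least one minority point, and clusters at or above the threshold lose all their majority points. The function and parameter names are hypothetical.

```python
from collections import defaultdict

def clusbus_undersample(cluster_ids, labels, q=0.5, minority=1):
    """Return the indices of data points kept after under-sampling.

    cluster_ids[i] is the cluster of point i (any hashable id; a noise
    label is treated like an ordinary cluster id in this sketch).
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Minority class dominance r: share of minority points in each cluster.
    r = {cid: sum(labels[i] == minority for i in idxs) / len(idxs)
         for cid, idxs in members.items()}

    # Assumed threshold rule: nearest-rank q-quantile of the nonzero r values.
    ranked = sorted(v for v in r.values() if v > 0)
    if not ranked:
        return list(range(len(labels)))  # no minority points, nothing to do
    tau = ranked[min(len(ranked) - 1, int(q * len(ranked)))]

    # Drop majority points from candidate clusters (r >= tau); keep the rest.
    return [idx for idx, cid in enumerate(cluster_ids)
            if not (r[cid] >= tau and labels[idx] != minority)]

# Toy example: three clusters of four points each.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
labels      = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
keep = clusbus_undersample(cluster_ids, labels, q=0.5)
print(keep)
```

In the toy example the clusters have r = 0, 0.75, and 0.25, so with q = 0.5 only the middle cluster is a candidate and its single majority point is removed. Raising or lowering q changes how many clusters are treated as overlapping.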
Empirically Determined Threshold
Experimental Setup
• Alternative sampling method: SMOTE
• Classifiers: C4.5 Decision Tree, Naïve Bayes, k-Nearest Neighbor, SVM
• Performance metrics: TP Rate, G-mean, AUC
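TP Rate and G-mean can both be read off a binary confusion matrix. The sketch below uses the standard definitions: TP Rate is TP / (TP + FN), and G-mean is the square root of TP Rate times TN Rate. It assumes prompt is the positive class (label 1) and that both classes appear in the true labels; the function name and example labels are made up for illustration.

```python
import math

def tp_rate_and_gmean(y_true, y_pred, positive=1):
    """Return (TP rate, G-mean) for binary labels.

    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)

    tp_rate = tp / (tp + fn)   # sensitivity on the prompt class
    tn_rate = tn / (tn + fp)   # specificity on the no-prompt class
    return tp_rate, math.sqrt(tp_rate * tn_rate)

# Illustrative labels only: 4 prompts, 6 no-prompts.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tpr, gmean = tp_rate_and_gmean(y_true, y_pred)
print(f"TP rate = {tpr:.3f}, G-mean = {gmean:.3f}")
```

G-mean drops to zero if either class is entirely misclassified, so a classifier that labels everything no-prompt scores 0 despite high accuracy.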
Results (1)
[Figure: bar charts of TP Rate and G-mean (y-axis from 0 to 1) for the C4.5, Naïve Bayes, IBk, and SMO classifiers, comparing Original, SMOTE, and ClusBUS.]
Results (2)
[Figure: bar chart of AUC, the area under the ROC curve (y-axis from 0 to 1), for the C4.5, Naïve Bayes, IBk, and SMO classifiers, comparing Original, SMOTE, and ClusBUS.]
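AUC can be interpreted as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison, which is fine for small data. It is an illustration of the metric, not the evaluation code behind the chart, and the scores are invented.

```python
def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumes at least one positive and one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    # Each (positive, negative) pair scores 1 for a win, 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores: higher means "more likely to need a prompt".
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc_value = auc(y_true, scores)
print(f"AUC = {auc_value:.3f}")
```

Because it depends only on the ranking of scores, AUC is unaffected by the class ratio, which makes it a useful complement to TP Rate and G-mean on imbalanced data.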
Conclusion
• Automated prompting as a classification problem
• Proposed ClusBUS: under-sampling-based preprocessing
• Solution to class overlap helps address imbalanced classes
Backup Slides
Activities
Sweeping
Cooking
Taking Medication
Watering Plants
Watching DVD
Selecting Outfit
Taking Phone Call
Writing Birthday Card
Feature Generation
Feature # Feature Name Description
1 stepLength Length of the step in time (seconds)
2 numSensors Number of unique sensors involved with the step
3 numEvents Number of sensor events associated with the step
4 prevStep Previous step
5 nextStep Next step
6 timeActBegin Time (seconds) elapsed since the beginning of the activity
7 timePrevStep Time (seconds) difference between the last event of the previous step and the first event of the current step
8 stepsActBegin Number of steps visited since the start of the activity
9 activityID Activity ID
10 stepID Step ID
11 location Set of features representing sensor frequencies in kitchen, dining room, living room, etc. when the activity was performed
12 Class Binary class: 1 = "Prompt", 0 = "No-Prompt"
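A few of the features in the table can be derived directly from the sensor events that make up one activity step. The sketch below assumes a hypothetical event format of (timestamp in seconds, sensor ID) tuples; the sensor IDs and the function name are invented for illustration, and only stepLength, numSensors, numEvents, and timePrevStep are covered.

```python
def step_features(events, prev_step_last_ts=None):
    """Compute a subset of the step-level features from raw events.

    events: non-empty list of (timestamp_seconds, sensor_id) tuples
            belonging to a single activity step (assumed format).
    prev_step_last_ts: timestamp of the last event of the previous step,
            or None if this is the first step of the activity.
    """
    timestamps = [ts for ts, _ in events]
    features = {
        "stepLength": max(timestamps) - min(timestamps),    # seconds
        "numSensors": len({sensor for _, sensor in events}),  # unique sensors
        "numEvents": len(events),
    }
    if prev_step_last_ts is not None:
        # Gap between the previous step's last event and this step's first.
        features["timePrevStep"] = min(timestamps) - prev_step_last_ts
    return features

# Hypothetical events for one step.
events = [(100.0, "M017"), (102.5, "M017"), (104.0, "D002"), (109.0, "M018")]
features = step_features(events, prev_step_last_ts=97.0)
print(features)
```

Each activity step then becomes one row: these values plus the remaining features in the table, with the Class column as the label.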