LIMITATIONS WITH ACTIVITY RECOGNITION METHODOLOGY AND DATA SETS
Gary Weiss, Fordham University
Jeffrey Lockhart, Cambridge University
Work supported by National Science Foundation Grant No. 1116124.
HASCA 2014 2
Our WISDM (Wireless Sensor Data Mining) Lab has been working on activity recognition for several years
    Have focused on building and deploying a real-world system called Actitracker
Recent work has focused on implementing, analyzing, and using different types of models
When comparing our AR work with other work we identified several key issues in methodology, which also impact the resulting data sets
GENESIS OF THIS WORK
9/13/2014
Identify some methodological issues and resulting impact on data sets
Make people aware of these issues
Propose mechanisms for addressing these issues
Largest focus on model type but many other factors are considered
Ultimate goal is to generate more diverse data sets and precisely label underlying assumptions
OVERVIEW
Model Type: Personal, Impersonal, Hybrid
Collection Method: Fully Natural, Semi-Natural, Laboratory
Data
    Number of Subjects
    Population (college, elderly, etc.)
    Traits (height, weight, income, education, …)
    Activities (running, jogging, standing, …)
    Duration (1 hour of data …)
FACTORS IMPACTING ACTIVITY RECOGNITION
Sensors
    Type: accelerometer, gyroscope, barometer
    Sampling rate: 20 Hz, 50 Hz, …
    Number of sensors
    Location of sensors (pocket, belt, wrist, …)
    Orientation (facing up, down, in, out)
Features
    Raw features
    Transformed, Window Size
Results
    Accuracy
    Consistency
FACTORS IMPACTING ACTIVITY RECOGNITION
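One way to make these factors explicit is to record them in a small structured description that travels with each data set. The sketch below is only an illustration: the class name `ARStudyDescription`, its field names, and the helper `unreported_factors` are hypothetical and are not part of any WISDM release.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ARStudyDescription:
    """Hypothetical record of the factors listed on the two slides above."""
    model_type: Optional[str] = None          # "personal", "impersonal", or "hybrid"
    collection_method: Optional[str] = None   # "fully natural", "semi-natural", "laboratory"
    num_subjects: Optional[int] = None
    population: Optional[str] = None          # e.g. "college", "elderly"
    activities: Optional[List[str]] = None
    sensor_types: Optional[List[str]] = None
    sampling_rate_hz: Optional[float] = None
    sensor_location: Optional[str] = None     # e.g. "pocket", "belt", "wrist"
    sensor_orientation: Optional[str] = None
    window_size_s: Optional[float] = None


def unreported_factors(study: ARStudyDescription) -> List[str]:
    """Return the names of the factors the study leaves unspecified."""
    return [f.name for f in fields(study) if getattr(study, f.name) is None]


# Made-up example of a partially documented study.
example = ARStudyDescription(
    model_type="hybrid",
    num_subjects=10,
    sensor_types=["accelerometer"],
    sampling_rate_hz=20.0,
)
print(unreported_factors(example))
```

A record like this makes gaps visible at a glance: any field still set to `None` is a factor the study did not report.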
We examined 34 published AR papers
    Many were smartphone-based
Several papers cover multiple data sets and thus 38 data sets were analyzed
Several papers utilized multiple model types and hence 47 distinct models were analyzed
Detailed analysis published in Lockhart’s MS thesis: Benefits of Personalized Data Mining Approaches to Human Activity Recognition with Smartphone Sensor Data
    A table describes each of the factors listed on prior 2 slides for each dataset
    Summary information described in this presentation
ANALYSIS OF AR RESEARCH
Personal Models
    Model based on labeled data from intended user
    Requires new users to provide training data
    Our AR results show high accuracy (~98%)
Impersonal/Universal Models
    Model based on a panel of representative users
    No training phase required: works “out of the box”
    Our AR results show modest performance (~76%)
Hybrid Models
    Model based on panel of users that includes intended user
    Requires a training phase for user
    Our AR results much closer to personal models (~95%) even though panel includes dozens of users
BACKGROUND ON MODEL TYPE
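The difference between the three model types comes down to which users' data ends up in the training and test partitions. The following is a minimal sketch of the three partitioning schemes, not the procedure used in the thesis. The function names, the record layout (a dict with a `"user"` key), and the interleaved fold assignment are all assumptions made for illustration.

```python
from collections import defaultdict


def personal_splits(records, test_every=5):
    """One split per user: train and test on that user's own data only."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append(rec)
    for user, recs in by_user.items():
        test = recs[::test_every]
        train = [r for i, r in enumerate(recs) if i % test_every != 0]
        yield user, train, test


def impersonal_splits(records):
    """One split per user: train on all other users, test on the held-out user."""
    users = sorted({rec["user"] for rec in records})
    for user in users:
        train = [r for r in records if r["user"] != user]
        test = [r for r in records if r["user"] == user]
        yield user, train, test


def hybrid_splits(records, k=5):
    """Pooled k-fold split that ignores user identity, so every test
    user also contributes data to the training partition."""
    for fold in range(k):
        test = [r for i, r in enumerate(records) if i % k == fold]
        train = [r for i, r in enumerate(records) if i % k != fold]
        yield fold, train, test


# Tiny synthetic example: 3 users with 10 records each (labels are made up).
records = [
    {"user": u, "x": i, "label": "walk" if i % 2 else "sit"}
    for u in ("a", "b", "c")
    for i in range(10)
]

for user, train, test in impersonal_splits(records):
    print(user, len(train), len(test))
```

Only the impersonal scheme guarantees that the test user is unseen during training, which is the property needed to claim that results generalize to new users.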
Our results show that personal models perform really well with only small amounts of data per activity
    Little practical need for hybrid models given need for training
Why are hybrid models often used in research papers?
    Simple experimental setup: use cross validation on single data set. No need to carefully partition the data.
    With n users, personal and impersonal models require n separate partitions
    Often assumed that hybrid models approximate impersonal models and are treated as such
        In actuality they are much closer to personal models
ISSUES WITH HYBRID MODELS
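Whether an evaluation is effectively personal, impersonal, or hybrid can often be read off from the overlap between the users in the training and test partitions. The helper below is a hypothetical sketch of that check. The name `infer_model_type` and the rule that any overlap beyond the single-user case counts as hybrid are assumptions made for illustration.

```python
def infer_model_type(train_users, test_users):
    """Classify an evaluation setup from the users in each partition.

    Assumed rule: disjoint user sets are impersonal, a single shared user
    is personal, and any other overlap is treated as hybrid.
    """
    train, test = set(train_users), set(test_users)
    if not train or not test:
        raise ValueError("both user sets must be non-empty")
    if train.isdisjoint(test):
        return "impersonal"
    if train == test and len(train) == 1:
        return "personal"
    return "hybrid"


print(infer_model_type(["a", "b"], ["c"]))       # impersonal
print(infer_model_type(["a"], ["a"]))            # personal
print(infer_model_type(["a", "b", "c"], ["b"]))  # hybrid
```

Running such a check on pooled cross-validation folds would report "hybrid", which is the case the slide warns is often mistaken for an impersonal evaluation.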
Hybrid model is the most popular, and authors often claim results are generalizable to new users (not true)
    In 10 of the 19 hybrid cases there were 10 or fewer users, so these are even closer to personal models (we had 59 users)
Couldn’t determine model type in 6 cases; serious methodological issue
In 53% of the cases we claim methodological issues (40% hybrid + 13% unknown)
Model Type    Count    Percentage
Personal      12       26%
Impersonal    10       21%
Hybrid        19       40%
Unknown        6       13%
ISSUE 1: MODEL TYPE
Analysis of 47 models from 38 data sets
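The percentages in the table, and the 53% figure for models with methodological issues, follow directly from the counts. The short check below reproduces that arithmetic. The variable names are illustrative only.

```python
# Counts taken from the table above.
counts = {"Personal": 12, "Impersonal": 10, "Hybrid": 19, "Unknown": 6}

total = sum(counts.values())  # number of models analyzed
percentages = {name: round(100 * n / total) for name, n in counts.items()}

# Hybrid and unknown model types are the ones flagged as methodological issues.
flagged = counts["Hybrid"] + counts["Unknown"]
flagged_pct = round(100 * flagged / total)

print(total, percentages, flagged_pct)
```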
Number of subjects often small
    11 studies had fewer than 5; 12 had fewer than 10
    HASC 2010 & 2012 more users but little data per user
Impacts ability to evaluate performance
    Our results show impersonal models are very inconsistent across users
    4 studies evaluated universal models with fewer than 8 users; only 2 had at least 30
Populations should also be diverse but many studies focus on college students; personal info should also be provided (height, weight, etc.)
ISSUE 2: # SUBJECTS & DIVERSITY
Distribution of impersonal model performance across 59 users
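Consistency across users is easiest to judge when accuracy is reported per user and not only as a single pooled number. The sketch below shows one way to compute such a per-user summary. The function names and the toy predictions are made up for illustration and do not reproduce the 59-user results.

```python
import statistics


def per_user_accuracy(predictions):
    """predictions: iterable of (user, true_label, predicted_label) tuples."""
    correct, total = {}, {}
    for user, truth, pred in predictions:
        total[user] = total.get(user, 0) + 1
        correct[user] = correct.get(user, 0) + (truth == pred)
    return {user: correct[user] / total[user] for user in total}


def summarize(acc_by_user):
    """Summary statistics over the per-user accuracies."""
    values = sorted(acc_by_user.values())
    return {
        "mean": statistics.mean(values),
        "min": values[0],
        "median": statistics.median(values),
        "max": values[-1],
    }


# Made-up predictions for three users.
preds = [
    ("a", "walk", "walk"), ("a", "sit", "sit"),
    ("b", "walk", "jog"), ("b", "sit", "sit"),
    ("c", "walk", "jog"), ("c", "sit", "stand"),
]
acc = per_user_accuracy(preds)
summary = summarize(acc)
print(acc, summary)
```

With few subjects, the minimum and maximum can differ widely from the mean, which is why a small panel gives little evidence about how an impersonal model behaves for new users.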
Many possible distinctions but 3 main categories:
    Fully natural: normal daily activities
    Semi-natural: operate in normal environment but may be directed (e.g., asked to walk for 5 minutes)
    Laboratory: structured tasks in a controlled environment
Type of collection environment should be documented since this impacts results and ability to replicate
    We have released an AR data set that is semi-natural and our Actitracker data set that is fully natural (except for self-training phase)
ISSUE 3: COLLECTION METHODOLOGY
Type of sensor and number of sensors
    Usually provided: not an issue
Location
    Precise location and orientation is often not specified
    Our results indicate these factors are important
    For smartphone, which pants pocket? How oriented?
        Mine almost always down and in (i.e., screen facing thigh).
ISSUE 4: SENSORS
Usually little choice in how to represent raw features except for sampling rate
Raw sensor data transformed into multivariate records using sliding window and summary features
    Half of studies don’t report window size
Vast majority of smartphone AR research only uses basic statistics
    Yield good results which appear to be competitive with more complex features (e.g., based on FFT info)
ISSUE 5: FEATURES & FEATURE GENERATION
Distribution of window sizes for 52% of studies that report this info
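The transformation from raw sensor readings to multivariate records can be sketched as follows. This is not the WISDM transformation script: the function name `window_features`, the choice of non-overlapping windows, and the particular statistics (mean, standard deviation, minimum, maximum per axis) are assumptions chosen to illustrate the "basic statistics" approach.

```python
import statistics


def window_features(samples, sampling_rate_hz, window_s):
    """Turn time-ordered (x, y, z) accelerometer samples into one
    feature record per non-overlapping window of window_s seconds."""
    size = int(sampling_rate_hz * window_s)
    if size < 2:
        raise ValueError("window must contain at least two samples")
    rows = []
    # A trailing partial window is dropped.
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        row = {}
        for axis, name in enumerate("xyz"):
            values = [s[axis] for s in window]
            row[f"{name}_mean"] = statistics.mean(values)
            row[f"{name}_std"] = statistics.pstdev(values)
            row[f"{name}_min"] = min(values)
            row[f"{name}_max"] = max(values)
        rows.append(row)
    return rows


# Synthetic data: 100 samples at 20 Hz, summarized in 2-second windows.
samples = [(float(i), 0.0, 1.0) for i in range(100)]
rows = window_features(samples, sampling_rate_hz=20, window_s=2)
print(len(rows), rows[0]["x_mean"])
```

Because the window size directly determines how many samples each record summarizes, results from studies that omit it are hard to reproduce.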
Important that all AR data sets release:
    Raw data
    Transformed data or script to generate transformed data
        Descriptions of higher level features often not sufficiently well specified
Our datasets include raw and transformed data sets and recently we also released the transformation scripts
    Interestingly, researchers found inconsistencies between our raw and transformed data and helped us identify several bugs
FEATURES & FEATURE GENERATION
Two main data sets
Activity Prediction
    36 users with semi-natural data collection
    All data is labeled with activity
Actitracker Data
    Data from our publicly available Actitracker app
    Data set will be updated periodically
    Fully natural data collection with semi-natural data collection for self-training data
    Self-training data is labeled; remaining data is not labeled
Available from: http://www.cis.fordham.edu/wisdm/dataset.php
WISDM ACTIVITY RECOGNITION DATA SETS
All activity recognition research should clearly describe relevant factors and describe experimental methodology
    Propose a list of factors/issues to include
    Many existing studies do not provide important information
Highlight role of model type
    Show that many studies do not specify model type or use hybrid models
    Hybrid models are inappropriate in most cases and many studies assume they approximate impersonal/universal models, which is contradicted by our research
CONCLUSIONS
Material based on Jeff Lockhart’s MS Thesis
Activity Recognition research was supported by all WISDM Lab members
Funding provided by NSF Grant 1116124
ACKNOWLEDGEMENTS
Information available from wisdmproject.com
    Papers available under “About: Publications” tab
    Includes Jeff’s MS Thesis
    Jeff Lockhart, Gary Weiss (2014). The Benefits of Personalized Smartphone-Based Activity Recognition Models, In Proc. SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, Philadelphia, PA, 614-622.
Info about our app available from actitracker.com
    App available for download from Google Play
Feel free to download our data sets and ask us about our data
MORE INFORMATION