On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Gunarto Sindoro Njoo, Yu-Hsiang Peng, Kuo-Wei Hsu, Wen-Chih Peng
Introduction
• Smartphones are getting smarter and smarter
Introduction
• Computer: RAM 2~8 GB on average; storage >500 GB; power: hundreds of watts
• Smartphone: RAM 512 MB~3 GB; storage >4 GB; power: a few watts
• Sensor hub: RAM 16~64 KB; storage 64 KB~256 KB; power: milliwatts
Activity Inference Process
• Raw data
• Discretization: MDLP, LGD
• Classifier construction: Decision Tree, Naïve Bayes, k-Nearest Neighbor, SVM
Activity Inference Process
• Raw data
• Discretization: MDLP, LGD
• Feature-value selection: ONEFVAS, GIFVAS, CBFVAS
• Classifier construction: Decision Tree, Naïve Bayes, k-Nearest Neighbor, SVM
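The discretization step turns raw sensor readings into feature-value pairs. The sketch below is a deliberately simple stand-in: it uses fixed cut points rather than MDLP or LGD, which learn the cuts from labeled data, and the cut values and labels are invented for illustration.

```python
def discretize(value, cuts, labels):
    """Map a raw sensor reading to a feature-value label using cut points.
    MDLP or LGD would choose the cuts; here they are hard-coded for illustration."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[-1]

# Hypothetical cut points for accelerometer magnitude (m/s^2)
CUTS = [2.0, 8.0]          # below 2 -> low, 2..8 -> medium, above 8 -> high
LABELS = ["accel=low", "accel=medium", "accel=high"]

readings = [0.5, 4.3, 11.7]
print([discretize(r, CUTS, LABELS) for r in readings])
# → ['accel=low', 'accel=medium', 'accel=high']
```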
Feature-Value Selection
• What is a feature-value? A range of a sensor reading
• e.g. accelerometer magnitude high, GPS at home, light bright
• Why use feature-values? Each tells us whether a sensor reading is relevant to an activity or not
• e.g. accelerometer magnitude readings
[Figure: accelerometer trace segmented into low, low, and high magnitude]
FEATURE-VALUE METHODS
• One-cut
• Iteration-based
• Correlation-based
One-Cut (ONEFVAS)
• Entropy-based selection using a single threshold
• e.g. keep only feature-value pairs with entropy <= 0.5
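A minimal sketch of the one-cut rule, assuming each feature-value pair is scored by the Shannon entropy of the activity labels observed with it. The helper names and toy data below are illustrative, and the 0.5 threshold mirrors the slide's example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the activity labels seen with one feature-value pair."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def onefvas(fv_to_labels, threshold=0.5):
    """Keep feature-value pairs whose label entropy is at or below the threshold."""
    return {fv for fv, labels in fv_to_labels.items() if entropy(labels) <= threshold}

# Toy data: "accel=high" co-occurs almost only with running -> low entropy -> kept;
# "light=bright" is seen with four different activities -> high entropy -> dropped.
data = {
    "accel=high": ["running"] * 9 + ["walking"],
    "light=bright": ["walking", "sitting", "running", "driving"],
}
print(onefvas(data))
# → {'accel=high'}
```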
Iteration-based (GIFVAS)
• Loop over decreasing entropy thresholds, selecting feature-values iteratively
• Evaluate accuracy at each iteration
• If the accuracy drop is large, cancel the selection for that iteration and tag the affected feature-values as special
• Special feature-values are retained until the last iteration
• A special feature-value is either frequent but confusing, or pure but infrequent
[Figure: accuracy and number of remaining feature-value pairs vs. entropy threshold, from 1.000 down to 0.080]
Correlation-based (CBFVAS)
• Uses Pearson correlation at the feature level
• Uses entropy at the feature-value level
• For each feature-value pair:
• Collect the correlated feature-values
• Sort the correlated feature-values by entropy
• Keep only the best-N feature-values; discard the rest
[Figure: accuracy (80–100%) and model size (350–1150 KB) vs. best-N feature-values remained (1–17), correlation-based vs. original]
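A rough sketch of the correlation-based selection, assuming Pearson correlation over raw reading series at the feature level and precomputed entropies at the feature-value level. All names, thresholds, and data below are illustrative, not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length reading series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cbfvas(features, fv_of, fv_entropy, best_n=2, corr_threshold=0.7):
    """For each feature, group the features correlated above the threshold
    (Pearson, feature level), pool the group's feature-value pairs, sort them
    by entropy (feature-value level), and keep only the best-N purest pairs."""
    names = list(features)
    kept = set()
    for a in names:
        group = [b for b in names
                 if abs(pearson(features[a], features[b])) >= corr_threshold]
        pool = sorted((fv for f in group for fv in fv_of[f]), key=fv_entropy.get)
        kept.update(pool[:best_n])
    return kept

# Hypothetical readings: gyro is perfectly correlated with accel, light is not.
features = {"accel": [1, 2, 3, 4], "gyro": [2, 4, 6, 8], "light": [5, 1, 4, 2]}
fv_of = {"accel": ["accel=low", "accel=high"],
         "gyro": ["gyro=low", "gyro=high"],
         "light": ["light=dim", "light=bright"]}
fv_entropy = {"accel=low": 0.9, "accel=high": 0.2, "gyro=low": 0.8,
              "gyro=high": 0.3, "light=dim": 0.5, "light=bright": 0.6}
print(sorted(cbfvas(features, fv_of, fv_entropy)))
# → ['accel=high', 'gyro=high', 'light=bright', 'light=dim']
```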
Experiments
• Environment: Intel Quad Core 2.66 GHz, 8 GB RAM, Java 7, Weka 3.6.11 (all default parameters)
• Datasets: collected from 11 participants; at least 2 and up to 6 different activities each; 3 weeks on average, up to 2 months
• Classifier algorithms: Naïve Bayes, Decision Tree (J48), SVM (SMO), k-Nearest Neighbor (kNN)
Experiments (Model Size)
• Feature-value selection is not effective on Naïve Bayes
• In general, feature-value selection works best with decision trees
[Figure: model size relative to original (0–100%) for Original, ONEFVAS, GIFVAS, CBFVAS under LGD and MDLP; series: Naïve Bayes, Decision Tree, kNN, SVM]
Experiments (LGD)
[Figure: model size (0–100%) and accuracy (40–100%) for Original, ONEFVAS, GIFVAS, CBFVAS under LGD; series: Decision Tree, kNN, SVM]
• ONEFVAS gives the biggest saving in model size, but its accuracy is low
• CBFVAS has the most stable accuracy while still reducing model size well
Experiments (MDLP)
[Figure: model size (0–100%) and accuracy (40–100%) for Original, ONEFVAS, GIFVAS, CBFVAS under MDLP; series: Decision Tree, kNN]
• Accuracy is more stable, and the model size reductions remain good
• In most cases, the decision tree benefits the most
Conclusions
• Proposed feature-value selection methods for reducing model size:
• ONEFVAS – uses an entropy threshold
• GIFVAS – iterates over entropy thresholds
• CBFVAS – combines correlation and entropy
• The proposed methods reduce model size while maintaining accuracy
• Performance varies with the discretization and classification algorithms
• The decision tree benefits the most
Thank you
On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Presented by: Gunarto Sindoro Njoo ([email protected])