Upload
sydnee-case
View
31
Download
2
Embed Size (px)
DESCRIPTION
ENTROPY-BASED CONCEPT SHIFT DETECTION PETER VORBURGER, ABRAHAM BERNSTEIN IEEE ICDM 2006. 1. Speaker: Li HueiJyun Advisor: Koh JiaLing Date:2007/11/6. OUTLINE. Introduction Entropy and Concept Shift Adaption Calculating Entropy on Data Streams - PowerPoint PPT Presentation
Citation preview
1
ENTROPY-BASED CONCEPT SHIFT DETECTIONPETER VORBURGER, ABRAHAM BERNSTEINIEEE ICDM 2006
Speaker: Li HueiJyun
Advisor: Koh JiaLing
Date:2007/11/6
1
2
OUTLINE
Introduction Entropy and Concept Shift Adaption
Calculating Entropy on Data Streams Algorithm Control Strategy using Entropy
Measure Experimental setup Experimental Results Discussion of the Experiments Application to a Real-World Problem: Context
Switches in Sensor Data Limitations, Future Work, and Conclusion 2
3
INTRODUCTION
Problem: In many applications data is gathered over time,
which raises the problem that the concepts to be learned may drift (i.e., change) over time.
The increasing amount of data (e.g., multimedia content, data warehouse ) and limitation of computing power due to miniaturization (e.g., wearable computing) call for faster and more resource friendly algorithms.
Motivation: the analysis of sensor data on wearable devices.
3
4
INTRODUCTION
Context-awareness: A Scenario-based Approach for Direct Interruptablity Prediction on Wearable Devices Classifiers predict peoples’ anticipated behavior
based on sensory input Contexts (or contextual situations) switch rather
than gradually change Contextual information could be reused, even for
new, not yet encountered situations An ongoing monitoring of the sensor stream
is needed4
5
INTRODUCTION
Problem: online pattern matching mechanism comparing
the sensor stream to the entire library of already known contexts is computational complex and not yet suitable for today’s wearable devices.
Solution: indicate possible candidates (or hot spots) for
context changes limiting the computationally intensive context (re-)determination on those candidates.
5
6
ENTROPY AND CONCEPT SHIFT ADAPTION
Assumptions: As long as the distribution of older instances
(features and target values) is similar to the distribution of new instances no concept drift occurred
A distribution difference between older and more recent instances indicates a change in the target concept
Measure the distribution inequality: If two distributions are equal, the entropy
measure results in a value of 1 If they are absolutely different the measure will
result in a value of 0 6
7
CALCULATING ENTROPY ON DATA STREAMS
Sliding window technique: compares two windows, one presenting older and the other representing more recent instances in the stream Compare the two windows by counting and
comparing all instances with respect to their class and stream membership
Discretize the range of instance values to a fixed number of bins to take the approximate value distribution into account
7
8
CALCULATING ENTROPY ON DATA STREAMS
A data stream: a sequence consisting of sequentially ordered tuples in time ti
i (1, 2, 3, …) := ( , li)
where is the vector of all feature stream instances sni
at time ti
The domain of the label stream l is discrete and contains all class values c C
8
is
==============
id==============
id==============
is==============
9
CALCULATING ENTROPY ON DATA STREAMS
Hi: the resulting entropy at time ti and is defined as the mean of all data stream entropies His at time ti
where
S: the number of feature-streams His is calculated from the entropies Hiscb
1
1 S
i i s
s
H HS
1 1
C B
i s i scb
c b
H H
10
CALCULATING ENTROPY ON DATA STREAMS
Hiscb: represent the entropy of each class (c C) and bin (b B) given the stream s at time ti
Bins: discrete aggregation of the values of each feature stream s
: the probability that an instance occurs in the old window at time ti, belong to class c, with feature domain of stream s in bin b
wiscb: depend on i, s, c, b
EXPERIMENTAL SETUP
Real concept drifts: changes in the actual target concepts
Virtual concept drifts: changes in the distribution
Generate synthetic data set: H. Wang, W. Fan, P. S. Yu, and J. Han. Mining
concept-drifting data streams using ensemble classifiers. In KDD ‘03: proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining Real drift data set Virtual drift data set Mixed data set
12
EXPERIMENTAL SETUP
Performance measure: Accuracy The area under the ROC-curve
A representative set of benchmarks: Perfect benchmark: assumes an oracle-given
ideal window size ξ for any point in time A selection of ensemble classifiers: the literature
so far showed to have the highest accuracy and robustness against noise
P. Vorburger and A. Bernstein. Entropy-based detection of real and virtual concept shifts. Working Paper – University of Zurich, Department of Informatics, 2006
13
EXPERIMENTAL RESULTS
Computational complexity Compare ensemble classifiers and the entropy
measure based algorithm Measure the elapsed time:
three committee classifiers: 2031.6±15s Entropy based algorithm: 148.6s Entropy calculation without Naïve Bayes model
building: 1.1±0.1s
16
DISCUSSION OF THE EXPERIMENT
Entropy measure outperforms the ensemble benchmark algorithm on real concept shifts
Exhibit a greater predictive power while requiring less computational resources
The entropy measure based algorithm showed the nearly the same robustness towards noise as the perfect benchmark and the committee classifiers
17
APPLICATION TO A REAL-WORLD PROBLEM: CONTEXT SWITCHES IN SENSOR DATA
Data set: Audio: decomposed into 10 features accelerometer data recorded over a time of
15381s: merged in one single feature The wearable data acquisition set up: a
microphone and three three-dimentional accelerometers attached on the subject’s shoulder, wrist, and leg
18
APPLICATION TO A REAL-WORLD PROBLEM: CONTEXT SWITCHES IN SENSOR DATA
(A)walking (B)streetcar (C)office work (D)lecture (E)cafeteria (F)meeting
19
LIMITATIONS, FUTURE WORK, AND CONCLUSION
Gradual concept drifts Find boundary conditions Recognize recurring concepts and exploit this
information Generalizability The choice of the suitable parameters could
be optimized
20
LIMITATIONS, FUTURE WORK, AND CONCLUSION
Find a measure for detecting and measuring concept shifts as an analogue for context switches
Formulation of entropy on data streams is capable to detect and measure concept shifts
Algorithm with an entropy based instance selection strategy outperformed ensemble based algorithms on real concept shift data sets
Given algorithm robustness towards noise, its sensitivity towards concept shifts, its computational efficiency, and predictive power on real concept shift data sets
21