An adaptive modular approach to the mining of sensor network data
G. Bontempi, Y. Le Borgne (1)
{gbonte,yleborgn}@ulb.ac.be
Machine Learning Group
Université Libre de Bruxelles – Belgium
(1) Supported by the COMP2SYS project, sponsored by the HRM program of the European Community (MEST-CT-2004-505079)
Y. Le Borgne 2
Outline
Wireless sensor networks: Overview
Machine learning in WSN
An adaptive two-layer architecture
Simulation and results
Conclusion and perspective
Sensor networks: Overview
Goal: allow a sensing task over an environment
Desiderata for the nodes:
Autonomous power
Wireless communication
Computing capabilities
Smart dust project
Smart dust: get mote size down to 1 mm³
Berkeley – Deputy dust (2001)
6mm³
Solar powered
Acceleration and light sensors
Optical communication
Low cost in large quantities
Currently available sensors
Crossbow: Mica / Mica dot
uProc: 4 MHz, 8-bit Atmel RISC
Radio: 40 kbit 900/450/300 MHz, or 250 kbit 2.5 GHz (MicaZ, 802.15.4)
Memory: 4 K RAM / 128 K program flash / 512 K data flash
Power: 2 x AA or coin cell
Intel: iMote
uProc: 12 MHz, 16-bit ARM
Radio: Bluetooth
Memory: 64 K SRAM / 512 K data flash
Power: 2 x AA
MoteIV: Telos
uProc: 8 MHz, 16-bit TI RISC
Radio: 250 kbit 2.5 GHz (802.15.4)
Memory: 2 K RAM / 60 K program flash / 512 K data flash
Power: 2 x AA
Applications
Wildfire monitoring
Ecosystem monitoring
Earthquake monitoring
Precision agriculture
Object tracking
Intrusion detection
…
Challenges for…
Electronics
Networking
Systems
Data bases
Statistics
Signal processing
…
Machine learning and WSN
Local scale
Spatio-temporal correlations
Local predictive model identification
Can be used to:
Reduce sensor communication activity
Predict values for malfunctioning sensors
Machine learning and WSN
Global scale
The network as a whole can achieve high-level tasks
Sensor network <-> Image
Supervised learning and WSN
Classification (Traffic type classification)
Prediction (Pollution forecast)
Regression (Wave intensity, population density)
A supervised learning scenario
S: network of S sensors
x(t) = {s1(t), s2(t), …, sS(t)}: the snapshot at time t
y(t) = f(x(t)) + ε(t): the value associated with S at time t (ε stands for noise)
Let DN be a set of N observations (x(t), y(t))
Goal: find a model that predicts y for any new x
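The scenario above can be sketched with synthetic data; the sensor count, snapshot count, and the linear choice of f below are illustrative only, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

S, N = 20, 500          # S sensors, N snapshots (both values are made up)

# x(t) = {s1(t), ..., sS(t)}: one snapshot of all sensor readings at time t.
# The readings are synthetic here; a real network would deliver them over radio.
X = rng.normal(size=(N, S))

# y(t) = f(x(t)) + eps(t): a scalar target per snapshot, with additive noise.
# f is unknown in practice; a simple linear f stands in for the sketch.
w_true = rng.normal(size=S)
y = X @ w_true + rng.normal(scale=0.1, size=N)

# D_N: the observation set the learner works from.
D_N = list(zip(X, y))
print(len(D_N), D_N[0][0].shape)   # 500 snapshots, each of dimension S = 20
```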
Centralized approach
High transmission overhead
Two-layer approach
Use of compression:
Reduces transmission overhead
Spatial correlation keeps the compression loss low
Reduces the dimensionality of the learning problem
Two-layer adaptive approach
PAST : Online compression
Lazy learning : Online learning
Compression : PCA
PCA: transform the set of n input variables x = (x1, …, xn) into a set of m variables z = (z1, …, zm), with m < n
Linear transformation: z = Wᵀx, where W is an n × m matrix
Maximization of the preserved variance
Solution: the m first eigenvectors of the correlation matrix of x, or equivalently
Minimization of the reconstruction error E[‖x − WWᵀx‖²]
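As a concrete sketch (the sensor count, component count, and noise level are made up), the batch PCA step can be written directly from the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic snapshots: n = 10 correlated readings, most variance in 2 directions.
n, m, N = 10, 2, 1000
basis = rng.normal(size=(n, m))
X = rng.normal(size=(N, m)) @ basis.T + 0.05 * rng.normal(size=(N, n))

# PCA solution: the m first eigenvectors of the sample correlation matrix.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / N
eigval, eigvec = np.linalg.eigh(C)          # eigenvalues in ascending order
W = eigvec[:, ::-1][:, :m]                  # keep the m largest-variance directions

# Linear transformation z = W^T x: n variables compressed to m.
Z = Xc @ W
# Reconstruction x_hat = W z; the discarded variance bounds the loss.
X_hat = Z @ W.T
loss = np.sum((Xc - X_hat) ** 2) / np.sum(Xc ** 2)
print(Z.shape, round(loss, 4))              # (1000, 2) and a small relative loss
```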
PAST – Recursive PCA
Projection approximation subspace tracking [YAN95]
Online formulation
Low memory requirement and computational complexity: O(n·m) + O(m²)
PAST algorithm
Recursive formulation: [HYV01]
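A minimal sketch of the PAST recursion of [YAN95], tracking an m-dimensional principal subspace with one rank-one update per sample; the dimensions, forgetting factor, and synthetic stream are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m, beta = 8, 2, 0.995           # n inputs, m tracked components, forgetting factor

# Synthetic stream with a dominant 2-D subspace plus small isotropic noise.
U = np.linalg.qr(rng.normal(size=(n, m)))[0]
stream = (rng.normal(size=(5000, m)) * [3.0, 2.0]) @ U.T \
         + 0.1 * rng.normal(size=(5000, n))

# PAST recursion: O(n*m) + O(m^2) per sample, no eigendecomposition.
W = np.eye(n, m)                   # current subspace estimate
P = np.eye(m)                      # inverse correlation of the projections
for x in stream:
    y = W.T @ x                    # project the new sample
    h = P @ y
    g = h / (beta + y @ h)         # RLS gain vector
    P = (P - np.outer(g, h)) / beta
    W = W + np.outer(x - W @ y, g) # rank-one subspace update

# Principal angles between tracked and true subspaces (cosines near 1 = match).
Q, _ = np.linalg.qr(W)
cosines = np.linalg.svd(Q.T @ U, compute_uv=False)
print(np.round(cosines, 3))
```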
Learning algorithm
Lazy learning: k-NN approach
Storage of the observation set DN
When a query q is asked, take the k nearest neighbours of q
Build a local linear model on these neighbours (least-squares fit)
Compute the output at q by applying the local model
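A minimal sketch of this lazy prediction step, reusing the next slide's y = sin(x) + noise example; the sample size and the choice k = 10 are illustrative:

```python
import numpy as np

def lazy_predict(X, y, q, k):
    """k-NN local linear model: least-squares fit on the k nearest
    neighbours of the query q, evaluated at q."""
    d = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(d)[:k]                      # k nearest neighbours of q
    Xk = np.hstack([X[idx], np.ones((k, 1))])    # local design matrix with intercept
    a, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
    return np.append(q, 1.0) @ a                 # apply the local model at q

# Toy check: y = sin(x) + Gaussian noise (sigma = 0.1), queried at x = 1.5.
rng = np.random.default_rng(3)
x = rng.uniform(0, np.pi, size=(200, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=200)
print(round(lazy_predict(x, y, np.array([1.5]), k=10), 2))
```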
How many neighbours?
y = sin(x) + e
e: Gaussian noise with σ = 0.1
What is the y value at x = 1.5?
How many neighbours?
K = 2: overfitting
K = 3: overfitting
K = 4: overfitting
K = 5: good
K = 6: underfitting
Automatic model selection ([BIR99], [BON99], [BON00])
Starting with a low k, local models are identified
Their quality is assessed by a leave-one-out procedure
The best model(s) are kept for computing the prediction
Low computational cost:
PRESS statistic [ALL74]
Recursive least squares [GOO84]
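The PRESS statistic is what keeps the leave-one-out assessment cheap: for a linear fit, the i-th leave-one-out error is the plain residual divided by (1 − h_ii), with h_ii the leverages, so no model is ever refitted. A sketch with a brute-force cross-check (the toy data are made up):

```python
import numpy as np

rng = np.random.default_rng(4)

# A small linear regression problem y = Xb + e.
N, p = 30, 3
X = np.hstack([rng.normal(size=(N, p)), np.ones((N, 1))])
y = X @ rng.normal(size=p + 1) + 0.2 * rng.normal(size=N)

# Least-squares fit and the hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T
resid = y - H @ y

# PRESS: leave-one-out errors in closed form, no refitting needed.
e_loo = resid / (1.0 - np.diag(H))

# Brute-force check: refit N times, leaving one observation out each time.
e_brute = np.empty(N)
for i in range(N):
    mask = np.arange(N) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    e_brute[i] = y[i] - X[i] @ b

print(np.allclose(e_loo, e_brute))   # prints True: the two computations agree
```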
Advantages of PAST and lazy
No assumptions about the process underlying the data
On-line learning capability
Adaptive to non-stationarity
Low computational and memory costs
Simulation
Modeling a wave propagation phenomenon
Helmholtz equation: ∇²u + k²u = 0
k is the wave number
2372 sensors
30 k values between 1 and 146; 50 time instants
1500 observations
Output k is noisy
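The exact simulation setup is not in the slides; as a rough stand-in, 1-D plane waves sin(kx + φ) solve the Helmholtz equation u'' + k²u = 0 and can generate a dataset with the stated sizes. The sensor positions, count, and noise level below are assumptions, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical stand-in for the wave simulation (illustrative only).
n_sensors, n_k, n_t = 50, 30, 50
pos = rng.uniform(0, 1, size=n_sensors)        # fixed sensor positions
ks = np.linspace(1, 146, n_k)                  # 30 wave numbers in [1, 146]

X, y = [], []
for k in ks:
    for t in range(n_t):                       # 50 time instants per wave number
        phase = rng.uniform(0, 2 * np.pi)      # a new snapshot each instant
        X.append(np.sin(k * pos + phase))      # one reading per sensor
        y.append(k + rng.normal(scale=0.5))    # noisy output k
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)                        # 30 * 50 = 1500 observations
```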
Test procedure
Prediction error measurement: Normalized Mean Square Error (NMSE)
10-fold cross-validation (1350 training / 150 test observations)
Example of learning curve:
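Assuming the common variance-normalized definition of NMSE (MSE divided by the target variance, so 1.0 matches predicting the mean), the measure is:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized Mean Square Error: MSE divided by the variance of the
    target. 0.0 is a perfect fit; 1.0 is as good as predicting the mean."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(nmse(y, y))                      # perfect prediction -> 0.0
print(nmse(y, np.full(4, y.mean())))   # mean predictor -> 1.0
```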
Experiment 1
Centralized configuration
Comparison of PCA and PAST for the 1 to 16 first principal components
Results
m          1     2     3     4     5     6     8     12    16
NMSE PCA   0.621 0.266 0.181 0.144 0.138 0.134 0.133 0.124 0.116
NMSE PAST  0.782 0.363 0.257 0.223 0.183 0.196 0.132 0.124 0.115
Prediction accuracy is similar once the number of principal components is sufficient
Clustering
The number of clusters involves a trade-off between
The routing costs between the clusters and the gateway
The final prediction accuracy
The robustness of the architecture
Experiment 2
Partitioning into geographical clusters
P varies from P(2) to P(7)
2 principal components for each cluster
Ten-fold cross-validation – 1500 observations
Example of P(2) partitioning
Results
P(2) P(3) P(4) P(5) P(6) P(7)
NMSE 0.140 0.118 0.118 0.118 0.116 0.114
Comparison of P(2) (top) and P(5) (bottom) error curves
As the number of clusters increases:
Better accuracy
Faster convergence
Experiment 3
Simulation: at each time instant,
10% probability of a sensor failure
1% probability of a supernode failure
Recursive PCA and lazy learning deal efficiently with variations of the input space dimension
Robust to random sensor malfunctioning
Results
P(2) P(3) P(4) P(5) P(6) P(7)
NMSE 0.501 0.132 0.119 0.116 0.116 0.117
Comparison of P(2) (top) and P(5) (bottom) error curves
The number of clusters increases the robustness
Experiment 4
Time-varying changes in the sensor measurements
2700 time instants
Sensor response decreases linearly from a factor 1 to a factor 0.4
A temporal window: only the last 1500 measures are kept
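The temporal window can be sketched with a fixed-length buffer; the 5-sensor snapshot and the drift model below are illustrative stand-ins for the experiment's data:

```python
from collections import deque

import numpy as np

WINDOW = 1500                     # keep only the last 1500 measures

window = deque(maxlen=WINDOW)     # old observations fall out automatically

rng = np.random.default_rng(6)
for t in range(2700):             # 2700 time instants, as in the experiment
    gain = 1.0 - 0.6 * t / 2699   # response decays linearly from 1 to 0.4
    x = gain * rng.normal(size=5) # a drifting snapshot (size is illustrative)
    window.append((t, x))

# A lazy learner querying `window` only ever sees recent, post-drift data.
print(len(window), window[0][0])  # 1500 observations, oldest is t = 1200
```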
Results
Due to the concept drift, the fixed model (in black) becomes outdated
The lazy character of the proposed architecture deals with this drift easily
Conclusion
Architecture:
Yields good results compared to its batch equivalent
Computationally efficient
Adaptive to appearing and disappearing units
Handles non-stationarity easily
Future work
Extension of the tests to real-world data
Improvement of the clustering strategy:
Taking costs (routing/accuracy) into consideration
Making use of the ad-hoc nature of the network
Test of other compression procedures:
Robust PCA
ICA
References
Smart Dust project: http://www-bsac.eecs.berkeley.edu/archive /users/warneke-brett/SmartDust/
Crossbow: http://www.xbow.com/
[BON99] G.Bontempi. Local Techniques for Modeling, Prediction and Control. PhD Thesis, IRIDIA- Université Libre de Bruxelles, 1999.
[YAN95] B. Yang. Projection Approximation Subspace Tracking, IEEE Transactions on Signal Processing, 43(1):95-107,1995.
[ALL74] D.M. Allen. The relationship between variable selection and data augmentation and a method of prediction. Technometrics, 16:125-127, 1974.
[GOO84] G.C. Goodwin and K.S. Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, 1984.
[HYV01] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. 2001.
References on lazy learning
[BIR99] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pages 375-381, Cambridge, MA, 1999. MIT Press.
[BON99] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the 16th International Conference, pages 32-38, San Francisco, CA, 1999. Morgan Kaufmann Publishers.
[BON00] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 121(1), 2000.
Thanks for your attention!