An adaptive modular approach to the mining of sensor network data
G. Bontempi, Y. Le Borgne (1)
{gbonte,yleborgn}@ulb.ac.be
Machine Learning Group
Université Libre de Bruxelles – Belgium
(1) Supported by the COMP2SYS project, sponsored by the HRM program of the European Community (MEST-CT-2004-505079)
Y. Le Borgne 2
Outline
Wireless sensor networks: Overview
Machine learning in WSN
An adaptive two-layer architecture
Simulation and results
Conclusion and perspective
Sensor networks: Overview
Goal: allow a sensing task over an environment
Desiderata for the nodes:
Autonomous power
Wireless communication
Computing capabilities
Smart dust project
Smart dust: get mote size down to 1 mm³
Berkeley – Deputy dust (2001)
6mm³
Solar powered
Acceleration and light sensors
Optical communication
Low cost in large quantities
Currently available sensors
Crossbow: Mica / Mica dot
uProc: 4 MHz, 8-bit Atmel RISC
Radio: 40 kbit 900/450/300 MHz, or 250 kbit 2.5 GHz (MicaZ, 802.15.4)
Memory: 4 K RAM / 128 K program flash / 512 K data flash
Power: 2 x AA or coin cell
Intel: iMote
uProc: 12 MHz, 16-bit ARM
Radio: Bluetooth
Memory: 64 K SRAM / 512 K data flash
Power: 2 x AA
MoteIV: Telos
uProc: 8 MHz, 16-bit TI RISC
Radio: 250 kbit 2.5 GHz (802.15.4)
Memory: 2 K RAM / 60 K program flash / 512 K data flash
Power: 2 x AA
Applications
Wildfire monitoring
Ecosystem monitoring
Earthquake monitoring
Precision agriculture
Object tracking
Intrusion detection
…
Challenges for…
Electronics
Networking
Systems
Data bases
Statistics
Signal processing
…
Machine learning and WSN
Local scale
Spatio-temporal correlations
Local predictive model identification
Can be used to:
Reduce sensor communication activity
Predict values for malfunctioning sensors
Machine learning and WSN
Global scale
The network as a whole can achieve high-level tasks
Sensor network <-> Image
Supervised learning and WSN
Classification (Traffic type classification)
Prediction (Pollution forecast)
Regression (Wave intensity, population density)
A supervised learning scenario
S: network of S sensors
x(t) = {s1(t), s2(t), …, sS(t)}: the snapshot at time t
y(t) = f(x(t)) + ε(t): the value associated with S at time t (ε stands for noise)
Let DN be a set of N observations (x(t), y(t))
Goal: find a model that predicts y for any new x
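The scenario above can be sketched with synthetic data; the sensor count, snapshot count, and the linear choice of f below are illustrative only, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

S, N = 20, 500          # S sensors, N snapshots (both values are made up)

# x(t) = {s1(t), ..., sS(t)}: one snapshot of all sensor readings at time t.
# The readings are synthetic here; a real network would deliver them over radio.
X = rng.normal(size=(N, S))

# y(t) = f(x(t)) + eps(t): a scalar target per snapshot, with additive noise.
# f is unknown in practice; a simple linear f stands in for the sketch.
w_true = rng.normal(size=S)
y = X @ w_true + rng.normal(scale=0.1, size=N)

# D_N: the observation set the learner works from.
D_N = list(zip(X, y))
print(len(D_N), D_N[0][0].shape)   # 500 snapshots, each of dimension S = 20
```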
Centralized approach
High transmission overhead
Two-layer approach
Use of compression:
Reduces transmission overhead
Spatial correlation keeps the compression loss low
Reduces the dimensionality of the learning problem
Two-layer adaptive approach
PAST : Online compression
Lazy learning : Online learning
Compression : PCA
PCA: transform the set of n input variables x = (x1, …, xn) into a set of m variables z = (z1, …, zm), with m < n
Linear transformation: z = Wᵀx, where W is an n × m matrix
Maximization of the preserved variance
Solution: the m first eigenvectors of the correlation matrix of x, or equivalently
Minimization of the reconstruction error E[‖x − WWᵀx‖²]
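As a concrete sketch (the sensor count, component count, and noise level are made up), the batch PCA step can be written directly from the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic snapshots: n = 10 correlated readings, most variance in 2 directions.
n, m, N = 10, 2, 1000
basis = rng.normal(size=(n, m))
X = rng.normal(size=(N, m)) @ basis.T + 0.05 * rng.normal(size=(N, n))

# PCA solution: the m first eigenvectors of the sample correlation matrix.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / N
eigval, eigvec = np.linalg.eigh(C)          # eigenvalues in ascending order
W = eigvec[:, ::-1][:, :m]                  # keep the m largest-variance directions

# Linear transformation z = W^T x: n variables compressed to m.
Z = Xc @ W
# Reconstruction x_hat = W z; the discarded variance bounds the loss.
X_hat = Z @ W.T
loss = np.sum((Xc - X_hat) ** 2) / np.sum(Xc ** 2)
print(Z.shape, round(loss, 4))              # (1000, 2) and a small relative loss
```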
PAST – Recursive PCA
Projection approximation subspace tracking [YAN95]
Online formulation
Low memory requirement and computational complexity: O(n·m) + O(m²)
PAST algorithm
Recursive formulation: [HYV01]
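A minimal sketch of the PAST recursion of [YAN95], tracking an m-dimensional principal subspace with one rank-one update per sample; the dimensions, forgetting factor, and synthetic stream are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m, beta = 8, 2, 0.995           # n inputs, m tracked components, forgetting factor

# Synthetic stream with a dominant 2-D subspace plus small isotropic noise.
U = np.linalg.qr(rng.normal(size=(n, m)))[0]
stream = (rng.normal(size=(5000, m)) * [3.0, 2.0]) @ U.T \
         + 0.1 * rng.normal(size=(5000, n))

# PAST recursion: O(n*m) + O(m^2) per sample, no eigendecomposition.
W = np.eye(n, m)                   # current subspace estimate
P = np.eye(m)                      # inverse correlation of the projections
for x in stream:
    y = W.T @ x                    # project the new sample
    h = P @ y
    g = h / (beta + y @ h)         # RLS gain vector
    P = (P - np.outer(g, h)) / beta
    W = W + np.outer(x - W @ y, g) # rank-one subspace update

# Principal angles between tracked and true subspaces (cosines near 1 = match).
Q, _ = np.linalg.qr(W)
cosines = np.linalg.svd(Q.T @ U, compute_uv=False)
print(np.round(cosines, 3))
```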
Learning algorithm
Lazy learning: k-NN approach
Storage of the observation set DN
When a query q is asked, take the k nearest neighbours of q
Build a local linear model on these neighbours (least-squares fit)
Compute the output at q by applying the local model
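A minimal sketch of this lazy prediction step, reusing the next slide's y = sin(x) + noise example; the sample size and the choice k = 10 are illustrative:

```python
import numpy as np

def lazy_predict(X, y, q, k):
    """k-NN local linear model: least-squares fit on the k nearest
    neighbours of the query q, evaluated at q."""
    d = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(d)[:k]                      # k nearest neighbours of q
    Xk = np.hstack([X[idx], np.ones((k, 1))])    # local design matrix with intercept
    a, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
    return np.append(q, 1.0) @ a                 # apply the local model at q

# Toy check: y = sin(x) + Gaussian noise (sigma = 0.1), queried at x = 1.5.
rng = np.random.default_rng(3)
x = rng.uniform(0, np.pi, size=(200, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=200)
print(round(lazy_predict(x, y, np.array([1.5]), k=10), 2))
```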
How many neighbours?
y = sin(x) + e
e: Gaussian noise with σ = 0.1
What is the y value at x = 1.5?
How many neighbours?
K = 2: overfitting
K = 3: overfitting
K = 4: overfitting
K = 5: good
K = 6: underfitting
Automatic model selection ([BIR99], [BON99], [BON00])
Starting with a low k, local models are identified
Their quality is assessed by a leave-one-out procedure
The best model(s) are kept for computing the prediction
Low computational cost:
PRESS statistic [ALL74]
Recursive least squares [GOO84]
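The PRESS statistic is what keeps the leave-one-out assessment cheap: for a linear fit, the i-th leave-one-out error is the plain residual divided by (1 − h_ii), with h_ii the leverages, so no model is ever refitted. A sketch with a brute-force cross-check (the toy data are made up):

```python
import numpy as np

rng = np.random.default_rng(4)

# A small linear regression problem y = Xb + e.
N, p = 30, 3
X = np.hstack([rng.normal(size=(N, p)), np.ones((N, 1))])
y = X @ rng.normal(size=p + 1) + 0.2 * rng.normal(size=N)

# Least-squares fit and the hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T
resid = y - H @ y

# PRESS: leave-one-out errors in closed form, no refitting needed.
e_loo = resid / (1.0 - np.diag(H))

# Brute-force check: refit N times, leaving one observation out each time.
e_brute = np.empty(N)
for i in range(N):
    mask = np.arange(N) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    e_brute[i] = y[i] - X[i] @ b

print(np.allclose(e_loo, e_brute))   # prints True: the two computations agree
```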
Advantages of PAST and lazy
No assumptions about the process underlying the data
On-line learning capability
Adaptive to non-stationarity
Low computational and memory costs
Simulation
Modeling a wave propagation phenomenon
Helmholtz equation: ∇²u + k²u = 0
k is the wave number
2372 sensors
30 k values between 1 and 146; 50 time instants
1500 observations
Output k is noisy
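The exact simulation setup is not in the slides; as a rough stand-in, 1-D plane waves sin(kx + φ) solve the Helmholtz equation u'' + k²u = 0 and can generate a dataset with the stated sizes. The sensor positions, count, and noise level below are assumptions, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical stand-in for the wave simulation (illustrative only).
n_sensors, n_k, n_t = 50, 30, 50
pos = rng.uniform(0, 1, size=n_sensors)        # fixed sensor positions
ks = np.linspace(1, 146, n_k)                  # 30 wave numbers in [1, 146]

X, y = [], []
for k in ks:
    for t in range(n_t):                       # 50 time instants per wave number
        phase = rng.uniform(0, 2 * np.pi)      # a new snapshot each instant
        X.append(np.sin(k * pos + phase))      # one reading per sensor
        y.append(k + rng.normal(scale=0.5))    # noisy output k
X, y = np.array(X), np.array(y)
print(X.shape, y.shape)                        # 30 * 50 = 1500 observations
```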
Test procedure
Prediction error measurement: Normalized Mean Square Error (NMSE)
10-fold cross-validation (1350 training / 150 test observations)
Example of learning curve:
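Assuming the common variance-normalized definition of NMSE (MSE divided by the target variance, so 1.0 matches predicting the mean), the measure is:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized Mean Square Error: MSE divided by the variance of the
    target. 0.0 is a perfect fit; 1.0 is as good as predicting the mean."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(nmse(y, y))                      # perfect prediction -> 0.0
print(nmse(y, np.full(4, y.mean())))   # mean predictor -> 1.0
```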
Experiment 1
Centralized configuration
Comparison of PCA and PAST for the 1 to 16 first principal components
Results
m          1     2     3     4     5     6     8     12    16
NMSE PCA   0.621 0.266 0.181 0.144 0.138 0.134 0.133 0.124 0.116
NMSE PAST  0.782 0.363 0.257 0.223 0.183 0.196 0.132 0.124 0.115
Prediction accuracy is similar once the number of principal components is sufficient
Clustering
The number of clusters involves a trade-off between
The routing costs between the clusters and the gateway
The final prediction accuracy
The robustness of the architecture
Experiment 2
Partitioning into geographical clusters
P varies from P(2) to P(7)
2 principal components for each cluster
Ten-fold cross-validation – 1500 observations
Example of P(2) partitioning
Results
P(2) P(3) P(4) P(5) P(6) P(7)
NMSE 0.140 0.118 0.118 0.118 0.116 0.114
Comparison of P(2) (top) and P(5) (bottom) error curves
As the number of clusters increases:
Better accuracy
Faster convergence
Experiment 3
Simulation: at each time instant,
10% probability of a sensor failure
1% probability of a supernode failure
Recursive PCA and lazy learning deal efficiently with variations of the input space dimension
Robust to random sensor malfunctioning
Results
P(2) P(3) P(4) P(5) P(6) P(7)
NMSE 0.501 0.132 0.119 0.116 0.116 0.117
Comparison of P(2) (top) and P(5) (bottom) error curves
The number of clusters increases the robustness
Experiment 4
Time-varying changes in the sensor measurements
2700 time instants
Sensor response decreases linearly from a factor 1 to a factor 0.4
A temporal window: only the last 1500 measures are kept
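The temporal window can be sketched with a fixed-length buffer; the 5-sensor snapshot and the drift model below are illustrative stand-ins for the experiment's data:

```python
from collections import deque

import numpy as np

WINDOW = 1500                     # keep only the last 1500 measures

window = deque(maxlen=WINDOW)     # old observations fall out automatically

rng = np.random.default_rng(6)
for t in range(2700):             # 2700 time instants, as in the experiment
    gain = 1.0 - 0.6 * t / 2699   # response decays linearly from 1 to 0.4
    x = gain * rng.normal(size=5) # a drifting snapshot (size is illustrative)
    window.append((t, x))

# A lazy learner querying `window` only ever sees recent, post-drift data.
print(len(window), window[0][0])  # 1500 observations, oldest is t = 1200
```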
Results
Due to the concept drift, the fixed model (in black) becomes outdated
The lazy character of the proposed architecture deals with this drift easily
Conclusion
Architecture:
Yields good results compared to its batch equivalent
Computationally efficient
Adaptive to appearing and disappearing units
Handles non-stationarity easily
Future work
Extension of the tests to real-world data
Improvement of the clustering strategy:
Taking costs (routing/accuracy) into consideration
Making use of the ad-hoc nature of the network
Test of other compression procedures:
Robust PCA
ICA
References
Smart Dust project: http://www-bsac.eecs.berkeley.edu/archive /users/warneke-brett/SmartDust/
Crossbow: http://www.xbow.com/
[BON99] G.Bontempi. Local Techniques for Modeling, Prediction and Control. PhD Thesis, IRIDIA- Université Libre de Bruxelles, 1999.
[YAN95] B. Yang. Projection Approximation Subspace Tracking, IEEE Transactions on Signal Processing, 43(1):95-107,1995.
[ALL74] D.M. Allen. The relationship between variable selection and data augmentation and a method of prediction. Technometrics, 16:125-127, 1974.
[GOO84] G.C. Goodwin and K.S. Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, 1984.
[HYV01] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. 2001.
References on lazy learning
[BIR99] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pages 375-381, Cambridge, MA, 1999. MIT Press.
[BON99] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the 16th International Conference, pages 32-38, San Francisco, CA, 1999. Morgan Kaufmann Publishers.
[BON00] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 121(1), 2000.
Thanks for your attention!