Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Yann-Aël Le Borgne, Gianluca Bontempi
Machine Learning Group, Department of Computer Science
Université Libre de Bruxelles
May 15th, 2009
Wireless Sensor Networks
Wireless sensors: latest trend of Moore’s law (1965)
"The number of transistors that can be placed inexpensively on an integrated circuit doubles every two years."
[Timeline: 1950–2010]
Computing devices get
Smaller
Cheaper
→ Enable new kinds of interactions with our world
Wireless Sensor Networks: Wireless sensors
Sensor nodes can collect, process and communicate data [Warneke et al., 2001; Akyildiz et al., 2002]
TMote Sky
Deputy dust
Sensors: light, temperature, humidity, pressure, acceleration, sound, . . .
Radio: ∼ 10s kbps, ∼ 10s meters
Microprocessor: A few MHz
Memory: ∼ 10s KB
Wireless Sensor Networks: Environmental monitoring and periodic data collection
[Diagram: a sensor network of wireless nodes reports to a base station, which connects to the Internet]
[Slide excerpt from a redwood-tree microclimate deployment paper: one-month deployment sampling every 5 minutes, nodes from 15 m to 70 m above ground, Mica2Dot motes measuring temperature, humidity, and incident/reflected PAR]
One-to-one applications Many-to-one applications
Data is sent by nodes periodically to a base station.
Applications: Medical, interactive arts, ecology, industry,disaster prevention.
Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Challenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
Operation mode            Telos node
Standby                   5.1 µA
MCU active                1.8 mA
MCU + radio RX            21.8 mA
MCU + radio TX (0 dBm)    19.5 mA
The radio is the most energy-consuming module: 95% of energy consumption in typical data collection tasks [Madden, 2003].
If the radio runs continuously, the lifetime is about 5 days.
Supervised learning
Goal:
Using examples, find relationships in data by means of prediction models hθ (parametric functions).
[Figure: sensor measurements over time; the training examples are fitted by the model ŝi[t] = θt]
Machine learning: Modeling data with parametric functions
Let S = {1, 2, . . . ,S} be the set of S sensor nodes.
Let t ∈ N denote time instants, or epochs.
Let si [t] be the measurement of sensor i ∈ S at epoch t.
A model is a parametric function
hθ : R^n → R, x ↦ ŝi[t] = hθ(x)
hθ: model with parameter θ ∈ R^p
x ∈ R^n: input.
ŝi[t] ∈ R: approximation to si[t].
Temporal models, e.g., ŝi[t] = θ1 si[t−1] + θ2 si[t−2]
x = (si[t−1], si[t−2]): past measurements of sensor i are used to model si[t].
Spatial models, e.g., ŝi[t] = θ1 sj[t] + θ2 sk[t]
x = (sj[t], sk[t]), j, k ∈ S: measurements of sensors j and k are used to model measurements of sensor i.
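The two input constructions can be illustrated with a short sketch. The measurement values and coefficients below are hypothetical, and the function names are illustrative only:

```python
# Hypothetical measurement history: rows = sensors, columns = epochs.
s = [
    [20.0, 20.5, 21.0, 21.4],   # sensor 0
    [19.8, 20.2, 20.8, 21.1],   # sensor 1
    [20.1, 20.6, 21.2, 21.5],   # sensor 2
]

def predict_temporal(s, i, t, theta):
    """Temporal model: s_i[t] is predicted from the sensor's own past,
    s_hat_i[t] = theta_1 * s_i[t-1] + theta_2 * s_i[t-2]."""
    return theta[0] * s[i][t - 1] + theta[1] * s[i][t - 2]

def predict_spatial(s, i, j, k, t, theta):
    """Spatial model: s_i[t] is predicted from neighbors j and k,
    s_hat_i[t] = theta_1 * s_j[t] + theta_2 * s_k[t]."""
    return theta[0] * s[j][t] + theta[1] * s[k][t]
```

With θ = (1, 0), the temporal model degenerates into the constant model used later in the talk.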
Learning: Learning procedure
The model parameters are obtained by a learning procedure, based on a training set of N examples. The parameters are chosen to minimize a loss function L(y, ŷ) measuring the model error.
[Diagram: the learning procedure compares the target function's output y with the prediction model's output ŷ on the training set, and adjusts the model to reduce the error L(y, ŷ)]
Loss function: quadratic, zero/one, . . .
Learning procedure: linear/nonlinear regression, K nearest neighbors, SVM, . . .
Adaptive Model Selection (AMS)
State of the art: Temporal replicated models (overview)
[Diagram: the wireless node runs a model hθ; the base station holds a copy of hθ]
Models are used to predict sensors' measurements over time.
A user-defined threshold ε determines when a sensor node updates the model.
Constant model [Olston et al., 2001]
ŝi[t] = si[t − 1]
Simplest model: no parameters to compute.
Temporal replicated models (overview)
Temperature measurements, Solbosch greenhouse.
[Figure: greenhouse temperature over time; the base station's constant-model reconstruction stays within 2°C of the sensor node's measurements, with 5 update points marked]
5 updates instead of 58: more than 90% communication savings.
Temporal replicated models: From simple to complex models
Constant model
ŝi[t] = si[t − 1]
Simplest model: no parameters to compute, low complexity.
Autoregressive model AR(p)
ŝi[t] = θ1 si[t−1] + . . . + θp si[t−p]
Regression: θ = (Xᵀ X)⁻¹ Xᵀ Y, using N past observations.
Least Mean Squares: computes θ recursively, with step size µ.
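Both fitting procedures for an AR(p) model can be written down directly. This is a NumPy-based sketch on a synthetic series; the helper names and the coefficient values are illustrative, not from the talk:

```python
import numpy as np

def fit_ar_ls(s, p):
    """Batch least squares: theta = (X^T X)^(-1) X^T Y on past observations."""
    # Row t of X holds (s[t-1], ..., s[t-p]); Y holds s[t].
    X = np.column_stack([s[p - j:len(s) - j] for j in range(1, p + 1)])
    Y = s[p:]
    theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return theta

def fit_ar_lms(s, p, mu=0.01):
    """Least Mean Squares: recursive update theta <- theta + mu * error * x."""
    theta = np.zeros(p)
    for t in range(p, len(s)):
        x = s[t - p:t][::-1]              # (s[t-1], ..., s[t-p])
        err = s[t] - theta @ x            # prediction error at epoch t
        theta = theta + mu * err * x      # gradient step with step size mu
    return theta

# Synthetic AR(2) series with known coefficients (0.6, 0.3).
rng = np.random.default_rng(0)
s = np.zeros(3000)
s[0] = s[1] = 1.0
for t in range(2, len(s)):
    s[t] = 0.6 * s[t - 1] + 0.3 * s[t - 2] + 0.01 * rng.standard_normal()
```

On such data the batch estimate recovers the generating coefficients; LMS trades accuracy for a constant-memory, per-epoch update that fits a sensor node's resources.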
[Figures: temperature over 24 hours with ε = 2°C; update points for the constant model (left) and an AR(2) model (right), the AR(2) requiring fewer updates]
State of the art: Temporal replicated models (pros and cons)
Pros:
Guarantee the observer an ε accuracy.
Simple or complex models can be used (from the constant model [Olston, 2001] to autoregressive models [Santini et al.; Tulone et al., 2006]).
Cons:
In most cases, no a priori information is available on the measurements. Which model should be chosen a priori?
The metric (update rate) does not consider the number of parameters of the models.
Adaptive Model Selection: Motivation
Tradeoff: more complex models better predict measurements, but have a higher number of parameters.
[Figure: metric vs. model complexity; communication costs grow and model error shrinks as complexity increases. AR(p): ŝi[t] = Σ_{j=1}^{p} θj si[t−j]]
Adaptive Model Selection: Collection of models
With Adaptive Model Selection (AMS), a collection of K models {hk}, 1 ≤ k ≤ K, of increasing complexity is run by the node. Wk: a metric estimating the communication savings.
[Diagram: models h1, h2, h3, h4 of increasing complexity, with metrics W1, W2, W3, W4]
Adaptive Model Selection: Metric to assess communication savings
Metric suggested: the weighted update rate
Wk[t] = Ck Uk[t]
Update rate Uk[t]: percentage of updates for model k at epoch t [Olston, 2001; Jain et al., 2004; Santini et al.; Tulone et al., 2006].
Model cost Ck: takes into account the number of parameters of the k-th model:
Ck = P / (P − D + 1)
P: size of the packet. D: size of the data load.
→ P − D is the packet overhead.
[Packet layout: SYNC byte | packet type | address type | message type | group ID | data length | data (D bytes) | CRC | SYNC byte, for a total of P bytes]
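A minimal sketch of the metric, assuming (as in the talk's experiments) a 24-byte packet overhead and 2 bytes per model parameter; the function names are illustrative:

```python
def model_cost(overhead, data_size):
    """C_k = P / (P - D + 1), where the packet size is P = overhead + D."""
    P = overhead + data_size
    return P / (P - data_size + 1)

def weighted_update_rate(cost, n_updates, n_epochs):
    """W_k[t] = C_k * U_k[t], with U_k[t] the update rate in percent."""
    return cost * (100.0 * n_updates / n_epochs)

# Constant model sends a single value; an AR(p) model sends 2p bytes of parameters.
print(model_cost(24, 1))       # cost of the constant model
print(model_cost(24, 2 * 2))   # cost of an AR(2): (24 + 4) / 25
```

The cost penalizes models whose parameters inflate the packet beyond a single-value update, so a complex model must lower the update rate enough to pay for its larger packets.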
Adaptive Model Selection: Model selection
When an update is required, the model hk with k = argmin_k Wk[t] is sent to the base station.
Assuming stationarity in the data, the confidence in each estimated Wk[t] increases with t.
Running poorly performing models is detrimental to energyconsumption.
Racing [Maron, 1997]: a model selection technique based on the Hoeffding bound, which makes it possible to discard poorly performing models.
Adaptive Model Selection: Hoeffding bound
Let x be a random variable with range R. Let µ be its mean.
Let µ̂[t] be an estimate of µ using t samples of x.
Given a confidence 1 − δ, the Hoeffding bound states that
P(|µ − µ̂[t]| < ∆) > 1 − δ
with ∆ = R √(ln(1/δ) / (2t)) [Hoeffding, 1963].
In AMS, the random variable considered is the model performance Wk; Wk[t] is its estimate after t epochs.
We have Wk[t] = Ck Uk[t]. The range of Wk[t] is R = 100 Ck, since Uk[t] ≤ 100.
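The bound is easy to compute and to check empirically. A sketch with a uniform random variable (the sample sizes and seeds are arbitrary):

```python
import math
import random

def hoeffding_delta(value_range, delta, t):
    """Half-width of the Hoeffding confidence interval after t samples:
    Delta = R * sqrt(ln(1/delta) / (2t))."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * t))

# Empirical check: the true mean should fall within Delta of the estimate
# in at least a 1 - delta fraction of repeated experiments.
random.seed(0)
delta, t, R = 0.05, 500, 1.0
trials, hits = 2000, 0
for _ in range(trials):
    samples = [random.random() for _ in range(t)]   # uniform on [0, 1], mean 0.5
    est = sum(samples) / t
    if abs(est - 0.5) < hoeffding_delta(R, delta, t):
        hits += 1
print(hits / trials)   # well above 0.95: the bound is distribution-free, hence conservative
```

The bound's conservatism matters for racing: models are discarded only when the evidence is overwhelming, so the best model is rarely eliminated by chance.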
Adaptive Model Selection: Racing
Best model: the model hk with k = argmin_k Wk[t]. Its upper bound is Wk[t] + 100 Ck √(ln(1/δ) / (2t)).
If a model hk′ satisfies
Wk′[t] − 100 Ck′ √(ln(1/δ) / (2t)) > Wk[t] + 100 Ck √(ln(1/δ) / (2t))
then it can be discarded.
[Figure: estimated weighted update rates and upper bounds for models h1 . . . h6; h4's lower bound exceeds the upper bound of the best model h3]
The test used is Wk′[t] − Wk[t] > 100 (Ck + Ck′) √(ln(1/δ) / (2t)).
Using racing, models h1 and h6 are discarded.
Adaptive Model Selection: Experimental evaluation
14 time series, various types of measured physical quantities.
Data set     Sensed quantity         Sampling period   Duration   Number of samples
S Heater     temperature             3 seconds         6h15       3000
I Light      light                   5 minutes         8 days     1584
M Hum        humidity                10 minutes        30 days    4320
M Temp       temperature             10 minutes        30 days    4320
NDBC WD      wind direction          1 hour            1 year     7564
NDBC WSPD    wind speed              1 hour            1 year     7564
NDBC DPD     dominant wave period    1 hour            1 year     7562
NDBC AVP     average wave period     1 hour            1 year     8639
NDBC BAR     air pressure            1 hour            1 year     8639
NDBC ATMP    air temperature         1 hour            1 year     8639
NDBC WTMP    water temperature       1 hour            1 year     8734
NDBC DEWP    dewpoint temperature    1 hour            1 year     8734
NDBC GST     gust speed              1 hour            1 year     8710
NDBC WVHT    wave height             1 hour            1 year     8723
Error threshold ε is set to 0.01r, where r is the range of the measurements.
AMS is run with K = 6 models: the constant model (CM) and autoregressive models AR(p) with p ranging from 1 to 5.
Adaptive Model Selection: Experimental evaluation
Assuming a packet overhead of 24 bytes, the model cost is set to CCM = 1 for the CM, and to CAR(p) = (24 + 2p)/(24 + 1) for an AR(p).
             CM   AR1   AR2   AR3   AR4   AR5   AMS
S Heater     74    78    68    70    76    81   AR2
I Light      38    42    44    48    51    53   CM
M Hum        53    55    55    60    62    66   CM
M Temp       48    50    50    54    56    60   CM
NDBC DPD     65    89    89    95   102   109   CM
NDBC AWP     72    75    81    88    93    99   CM
NDBC BAR     51    52    44    47    49    50   AR2
NDBC ATMP    39    41    40    43    46    49   CM
NDBC WTMP    27    28    23    25    27    28   AR2
NDBC DEWP    57    54    58    62    67    71   AR1
NDBC WSPD    74    87    92    99   106   113   CM
NDBC WD      85    84    91    98   104   111   AR1
NDBC GST     80    84    90    96   103   110   CM
NDBC WVHT    58    58    63    67    71    76   CM
Bold numbers report significantly better update rates (Hoeffding bound, δ = 0.05).
For all time series, the AMS selects the best model.
Adaptive Model Selection: Experimental evaluation
Number of models remaining over time:
[Figure: number of models remaining over time (epochs 0–1000) for each of the 14 time series; the sets of surviving models shrink at different speeds]
The speed of convergence of the racing algorithm depends on the time series, and is in most cases reasonably fast.
Adaptive Model Selection: Conclusions
In summary, Adaptive Model Selection
Takes into account the cost of sending model parameters,
Allows sensor nodes to autonomously determine the model which best fits their measurements,
Provides a statistically sound selection mechanism to discard poorly performing models,
Achieved about 45% communication savings on average in the experiments,
Was implemented in TinyOS, the reference operating system for wireless sensor networks.
Additional contributions
Several deployments at the ULB. Data sets (and code) available at www.ulb.ac.be/di/labo.
Microclimate monitoring - Solbosch greenhouses.
18 sensors in three greenhouses. Several experiments.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
Experimental setups monitoring - Unit of Social Ecology.
18 sensors in three experimental labs, running for 5 days.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
PIMAN project (Region Bruxelles Capitale - 2007/2008):
Goal: localize an operator in an industrial environment.
Techniques: Triangulation, multidimensional scaling, Kalman filters.
Several deployments (up to 48 sensors).
Thank you for your attention
Questions?
Wireless Sensor Networks: Data-centric networking
A sensor network can be seen as a distributed database.
SQL can be used as the language to interact with the network:
SELECT temperature FROM sensors
WHERE location = [0, 0, 15, 35]
DURATION = 00:00:00, 10:00:00
EPOCH DURATION 30s
[Diagram: nodes 1–7 receive the query and organize into a routing tree rooted at the base station]
The query is broadcast from the base station to the network. Sensors involved in the query establish a routing structure towards the base station.
Machine learning (overview)
Goal: Uncover structure and relationships in a set of observations, by means of models (mathematical functions).
[Figure: a learning procedure turns a set of observations (x, y) into a model y = h(x)]
A learning procedure is used to find the model.
Machine learning: Learning methodology
[Diagram: an unknown relationship maps input x to output y; the learned model outputs ŷ, and the error L(y, ŷ) guides the learning procedure]
Input x can have several dimensions (e.g., image classification).
Different models exist (linear models, neural networks, decision trees, . . .) with specific learning procedures.
Machine learning: Modeling sensor data
Temporal model:
[Figure: sensor measurements over time; the training examples are fitted by the model ŝi[t] = θt]
Input: Time.
Output: the measurement si[t] of a sensor i at time t.
Model: ŝi[t] = θt.
The model approximates the set of measurements with just one parameter θ.
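For this one-parameter model the least-squares estimate has a closed form. A small sketch with made-up training examples (t, s[t]):

```python
# Hypothetical training examples: epochs and measurements.
ts = [10, 30, 50, 80, 110]
ss = [11.0, 29.0, 52.0, 79.0, 112.0]

# Minimizing sum_t (s[t] - theta * t)^2 over theta gives the closed form
# theta = sum(t * s[t]) / sum(t^2).
theta = sum(t * s for t, s in zip(ts, ss)) / sum(t * t for t in ts)
print(round(theta, 3))   # → 1.01
```

A single parameter summarizes all five examples, which is exactly what makes transmitting the model cheaper than transmitting the measurements.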
Learning with wireless sensor data
Thesis statement
Machine learning techniques can be used to reduce communication by approximating sensor data with models.
→ Instead of sending all the measurements, only the parameters of the models are transmitted.
This approach is effective because sensor data are:
temporally and spatially related (correlations),
noisy: exact measurements are rarely needed.
Contribution I:
Adaptive Model Selection (AMS)
Temporal modeling
Replicated models (overview)
Recall: in environmental monitoring, a sensor sends its measurements periodically.
Measurements s[t] are sent at every time t.
[Diagram: the wireless node transmits every measurement s[t] to the base station]
Replicated models:
Models h are sent instead of the measurements.
[Diagram: the wireless node transmits a model h; the base station uses h to reconstruct ŝ[t]]
Replicated models (overview)
Models are computed by the sensor node → the node can compare the model prediction with the true measurements:
A new model is sent if |s[t] − ŝ[t]| > ε.
ε is user-defined, and application dependent.
A simple learning procedure must be used. The simplest model:
Constant model [Olston et al., 2001]
ŝi[t] = si[t − 1]
Simply: the next measurement is predicted to be the same as the previous one.
No parameters to compute.
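The update scheme can be sketched in a few lines. The temperature trace below is hypothetical; the node transmits only when the base station's copy drifts by more than ε:

```python
def replicated_constant_model(measurements, eps):
    """Simulate the epsilon-threshold update scheme for the constant model:
    the base station keeps the last transmitted value, and the node sends
    a new one only when the true measurement drifts more than eps away."""
    updates = 0
    base_copy = None
    for s_t in measurements:
        if base_copy is None or abs(s_t - base_copy) > eps:
            base_copy = s_t          # send an update to the base station
            updates += 1
    return updates

# Hypothetical slowly varying temperatures, eps = 2 degrees.
temps = [30.0, 30.5, 31.0, 31.5, 32.5, 33.0, 33.5, 35.0, 35.5, 36.0]
print(replicated_constant_model(temps, 2.0))   # → 3 updates for 10 measurements
```

Smooth signals need few updates, which is where the 90% savings reported on the next slide come from.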
Replicated models: Constant model
Temperature measurements, Solbosch greenhouse. ε = 2°C.
[Figure: greenhouse temperature over time; the base station's constant-model reconstruction stays within 2°C of the sensor node's measurements, with 5 update points marked]
5 updates instead of 58: more than 90% communication savings.
Replicated models: Autoregressive models
More complex models can be used: autoregressive models AR(p) [Santini et al.; Tulone et al., 2006].
ŝ[t] = θ1 s[t−1] + . . . + θp s[t−p]
[Figure: temperature over 24 hours (ε = 2°C); update points for an AR(2) model]
An AR(2) reduces the number of updates by 6 percent in comparison to the constant model.
Replicated models: Pros and cons
Pros:
Guarantee the observer an ε accuracy.
Simple or complex models can be used.
Cons:
In most cases, no a priori information is available on the measurements. Which model should be chosen a priori?
The metric (update rate) does not consider the number of parameters of the models.
Adaptive Model Selection: Motivation
Tradeoff: more complex models better predict measurements, but have a higher number of parameters.
[Figure: metric vs. model complexity; communication costs grow and model error shrinks as complexity increases. AR(p): ŝi[t] = Σ_{j=1}^{p} θj si[t−j]]
Adaptive Model Selection: Collection of models
A collection of K models {hk}, 1 ≤ k ≤ K, of increasing complexity is run by the node.
[Diagram: the node runs the collection {h1, h2, . . . , hK}; the selected model (here h2) is sent to the base station, which reconstructs ŝ[t]]
Wk : new metric estimating the communication costs.
When an update is needed, the model with the lowest Wk issent.
Adaptive Model Selection: Metric to assess communication savings
Wk: the weighted update rate
Wk[t] = Ck Uk[t]
Update rate Uk[t]: percentage of updates for model k at epoch t [Olston, 2001; Jain et al., 2004; Santini et al.; Tulone et al., 2006].
Model cost Ck: takes into account the number of parameters of the k-th model:
Ck = P / (P − D + 1)
P: size of the packet. D: size of the data load.
→ P − D is the packet overhead.
[Packet layout: SYNC byte | packet type | address type | message type | group ID | data length | data (D bytes) | CRC | SYNC byte, for a total of P bytes]
Adaptive Model Selection: Model selection
Model performances Wk[t] are estimated over time.
When data collection starts, it is not known which model is best.
Running poorly performing models is detrimental to energyconsumption.
Racing [Maron, 1997]: a model selection technique based on the Hoeffding bound, which makes it possible to select the best performing model.
Adaptive Model Selection: Racing
[Figure: weighted update rates W1[t] . . . W6[t] for models h1 . . . h6, each with upper and lower confidence bounds]
At first, all models are in competition.
Adaptive Model Selection: Racing
[Figure: as the bounds tighten, h1's interval separates from h6's]
As time passes, model h1 statistically outperforms h6.
Adaptive Model Selection: Racing
[Figure: h3 now has the lowest estimated rate; h5's interval lies entirely above it]
h3 then statistically outperforms h5.
Adaptive Model Selection: Racing
[Figure: only models with overlapping intervals remain; h3 has the lowest bound]
h3 is finally selected as the best one.
Adaptive Model Selection: Conclusions
In summary, Adaptive Model Selection
Takes into account the cost of sending model parameters,
Allows sensor nodes to autonomously determine the model which best fits their measurements,
Provides a statistically sound selection mechanism to discard poorly performing models,
Achieved about 45% communication savings on average in the experiments,
Energy for computation is not a problem (negligible),
Was implemented in TinyOS, the reference operating system for wireless sensor networks.