
Analyzing wireless sensor network data under suppression and failure in transmission

Alan E. Gelfand, Institute of Statistics and Decision Sciences, Duke University
(with G. Puggioni, J. Yang, A. Silberstein, K. Munagala)

Before we begin…

• From the perspective of a stochastic modeler

• In fact, a hierarchical modeler working in the Bayesian inference paradigm

• Neophyte with regard to sensor networks

• However, not much attention in the statistics community to issues involved in studying these networks

• No optimization!

Outline

• Our niche in the sensor networks world

• Global, local, modeling, computation, and analysis

• Local data collection

• Suppression or transmission; failure; redundancy

• Stochastic implications suggest a focus on probabilistic modeling rather than on algorithms

• Fully model-based approach implies full and exact inference with uncertainty

• Computational challenges (model fitting)

• Measuring information loss

• An example, some “experiments”

• Future work; getting closer to what we really want

Data Collection

• At the node; multiple sensors per node; local calibration using field data collection

• Collection at high temporal resolution (scales?)

• Cost of collection; periods of no collection

• Collection is cheap, transmission is expensive in terms of battery life

• Multivariate data collection

Network Communication

• Here a very simple version – “spokes on a wheel”; single-hop; nodes to gateway (and back); no node-to-node communication

Model Building Plan

• Single node, suppression only, failure only, both

• Two nodes, suppression only, failure only, both

• Network of nodes, spatial modeling, suppression only, failure only, both

Suppression

• Temporal suppression only here

• The basic idea: at high temporal resolution, at a given node, data expected to change little from time point to time point

• Transmission is expensive relative to collection so only transmit given a “consequential” change

• Suppression schemes? based upon comparison with previous observation? with previously transmitted observation? with a “predicted” value at that time and location?

Suppression cont.

• For location s at time t, Y(s,t) is the collected value

• For continuous data and a specified ε, consider, say, |Y(s,t) – Y(s,t_prevtrans)| > ε or |Y(s,t) – Y_est(s,t)| > ε, NOT |Y(s,t) – Y(s,t-1)| > ε (see the sketch below)

• Choice of ε? Anticipate a high rate of suppression. Much more “missing data” than in usual statistical analysis settings

• Again, no cross-node communication so here suppression can not be based on neighboring values (spatial suppression)
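A minimal sketch of this suppression rule, assuming a scalar reading and a user-chosen threshold; the names (should_transmit, last_transmitted, eps) are illustrative, not from the talk:

```python
# Sketch of the rule above: transmit only when the current reading differs from
# the last *transmitted* value (not the last collected value) by more than eps.
def should_transmit(y_current, last_transmitted, eps):
    """Return True if the reading should be transmitted rather than suppressed."""
    if last_transmitted is None:          # nothing sent yet: always transmit
        return True
    return abs(y_current - last_transmitted) > eps

# Example: filter a short series with eps = 0.5
readings = [10.0, 10.1, 10.2, 11.0, 11.1, 12.5]
last = None
for t, y in enumerate(readings):
    if should_transmit(y, last, eps=0.5):
        print(f"t={t}: transmit {y}")
        last = y                          # gateway's record of this node
    else:
        print(f"t={t}: suppress (gateway infers |y - {last}| <= 0.5)")
```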

Transmission Failure

• Practical issue, what is a failure - bit errors, corrupted transmission

• Rate varies spatially, varies seasonally

• Will not be known - so models for failure

• Disentangling failure from suppression?

• Redundancy or error-correcting schemes - when transmitting, transmit both a value and the time, or do this for several previous transmission times (how many?) (see the sketch below)

• Another idea is to include acknowledgement from gateway; no acknowledgement implies retransmission.

• Suppression or observation after a failure results from comparison with Y(s, t_failure).
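A hedged sketch of the redundancy idea, assuming each transmitted packet piggybacks the last k (time, value) pairs previously sent; k and the packet fields are illustrative choices:

```python
# When a node does transmit, it includes the last k (time, value) pairs it sent,
# so the gateway can recover earlier transmissions that failed and distinguish
# failures from suppressions.
from collections import deque

class RedundantSender:
    def __init__(self, k=2):
        self.history = deque(maxlen=k)    # last k prior transmissions

    def build_packet(self, t, y):
        packet = {"time": t, "value": y, "previous": list(self.history)}
        self.history.append((t, y))
        return packet

sender = RedundantSender(k=2)
for t, y in [(1, 10.0), (4, 11.0), (7, 12.5)]:
    print(sender.build_packet(t, y))
# If the packet at t=4 is lost, the t=7 packet still carries (4, 11.0).
```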

Modelling

• Envision an overall process model which is a spatially dependent time series.

• Observed data is a noisy version of this

• In fact, we envision the familiar specification [data | process, parameters] x [process | parameters] x [parameters], with dynamics at the second stage (a small simulation sketch follows below)

• Dynamics can be driven by local autoregressive models (with drift), by local discretized continuous time models, by local differential equations

• They are connected up in space by spatially colored noise at the second stage and, more generally, by spatially associated model parameters
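A small simulation sketch of this two-stage specification: a latent AR(1) process at each node with spatially colored innovations (here an exponential covariance over node locations), and observed data as a noisy version of the latent process. The covariance choice and all parameter values are illustrative assumptions:

```python
# Latent process: Z_t = phi * Z_{t-1} + eta_t, eta_t spatially colored;
# data stage:     Y(s,t) = Z(s,t) + measurement noise.
import numpy as np

rng = np.random.default_rng(0)
locs = rng.uniform(0, 10, size=(5, 2))            # 5 node locations
d = np.linalg.norm(locs[:, None] - locs[None, :], axis=-1)
Sigma_eta = 1.0 * np.exp(-d / 3.0)                # spatially colored innovations
phi, tau = 0.8, 0.3                               # AR coefficient, obs. noise sd
T = 50

Z = np.zeros((T, 5))
for t in range(1, T):
    Z[t] = phi * Z[t - 1] + rng.multivariate_normal(np.zeros(5), Sigma_eta)
Y = Z + rng.normal(0, tau, size=Z.shape)          # noisy observed data
print(Y.shape)                                    # (50, 5): time by node
```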

Inference

• Global and local parameters

• Which model parameters vary spatially?

• Temporally evolving parameters reflecting seasonality

• Interest in reconstruction of the local time series (but not interested in piecewise interpolation schemes – want full model and full inference under the model)

• Again, full inference in terms of posterior distributions

• Global model fitting – offline activity at the server; what temporal scale?

• With regard to local computation, communication of parameter estimates to nodes for local suppression?

Model fitting

• Offline computation

• Bayesian hierarchical spatio-temporal model

• Fitted using Gibbs sampling (a sketch follows below)

• Currently, no local modeling; just comparison with previous transmission (failure or not)
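An illustrative single-node sketch (not the authors' code) of Gibbs sampling for an AR(1) series with missing values: alternate between imputing interior missing values from their normal full conditionals and drawing the AR coefficient (flat prior) and the process variance (inverse-gamma prior):

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_ar1(y, missing, n_iter=2000, a0=2.0, b0=1.0):
    """Gibbs sampler for y_t = phi*y_{t-1} + e_t, e_t ~ N(0, sigma2)."""
    y = y.copy()
    phi, sigma2 = 0.5, 1.0
    draws = []
    for _ in range(n_iter):
        # 1. impute interior missing values given both neighbours
        for t in missing:
            prec = (1 + phi**2) / sigma2
            mean = phi * (y[t - 1] + y[t + 1]) / (1 + phi**2)
            y[t] = rng.normal(mean, np.sqrt(1 / prec))
        # 2. draw phi | y, sigma2   (flat prior)
        sxx = np.sum(y[:-1] ** 2)
        sxy = np.sum(y[:-1] * y[1:])
        phi = rng.normal(sxy / sxx, np.sqrt(sigma2 / sxx))
        # 3. draw sigma2 | y, phi   (inverse-gamma prior IG(a0, b0))
        resid = y[1:] - phi * y[:-1]
        sigma2 = 1 / rng.gamma(a0 + 0.5 * len(resid),
                               1 / (b0 + 0.5 * np.sum(resid**2)))
        draws.append((phi, sigma2))
    return draws

# toy series with two interior values treated as missing
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.8 * y[t - 1] + rng.normal(0, 0.5)
draws = gibbs_ar1(y, missing=[40, 70])
print(np.mean([d[0] for d in draws[500:]]))   # posterior mean of phi
```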

Some details


Dynamic model version

An Example

• An AR(1) model

• Known drift (as in, say, precipitation input for a soil moisture model)

• Drift measured at the gateway but assumed applicable to all nodes

• Only parameters are the autoregressive coefficient and the process variance

• Suppression threshold ε known, failure rate not modeled

• Experiments - using, not using (i) suppression-failure information; (ii) redundancy (an imputation sketch follows below)
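A hedged sketch of how known-suppression information enters the imputation: if a gap is known to be a suppression, the missing value must lie within ε of the last transmitted value, so its normal full conditional is truncated to that interval (here via simple rejection sampling); all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def impute_suppressed(mean, sd, y_last_trans, eps, rng, max_tries=1000):
    """Draw from N(mean, sd^2) truncated to [y_last_trans - eps, y_last_trans + eps]."""
    for _ in range(max_tries):
        draw = rng.normal(mean, sd)
        if abs(draw - y_last_trans) <= eps:
            return draw
    # fall back to the nearest feasible point if the interval has tiny probability
    return np.clip(mean, y_last_trans - eps, y_last_trans + eps)

# compare: unconstrained vs. suppression-constrained imputation
mean, sd = 11.3, 0.6
print(rng.normal(mean, sd))                                              # no information
print(impute_suppressed(mean, sd, y_last_trans=10.0, eps=0.5, rng=rng))  # known suppression
```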

Single missing value, known endpoints, no other information

Single missing value, known endpoints, missing value is a known failure

Single missing value, known endpoints, missing value is a known suppression

String of five missing values, known endpoints, no other information on missing values

String of five missing values, known endpoints, all missing values known to be suppressions

Joint density, adjacent missing values, no other information

Joint density, adjacent missing values, missing values known to be suppressions

Comments

• Anticipate high rate of suppression

• Failure should not “dominate” suppression or else we should not suppress

• Failure rate model – reflecting space and time

• We have not viewed lowering failure rate as an option

Information loss

• For process parameters:
  - Kullback-Leibler distance between the full-data posterior and the “partial”-data posterior (sketched below)
  - Kullback-Leibler distance comparing different ε’s
  - Length of a fixed-coverage credible interval
  - Coverage probability of a symmetric (about the point estimate) fixed-length interval

• For sequence reconstruction: A predictive mean square error criterion
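An illustrative way to compute these summaries from posterior samples, using a normal approximation for the Kullback-Leibler term (an assumption of this sketch, not necessarily how the authors computed it):

```python
import numpy as np

def kl_normal(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) )."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def summaries(full_draws, partial_draws, y_true, y_reconstructed):
    kl = kl_normal(np.mean(full_draws), np.var(full_draws),
                   np.mean(partial_draws), np.var(partial_draws))
    ci = np.percentile(partial_draws, [2.5, 97.5])
    length_95 = ci[1] - ci[0]                         # fixed-coverage interval length
    pmse = np.mean((np.asarray(y_true) - np.asarray(y_reconstructed)) ** 2)
    return {"KL": kl, "CI length": length_95, "PMSE": pmse}

rng = np.random.default_rng(3)
full = rng.normal(0.80, 0.02, 5000)        # e.g. posterior draws of phi, full data
partial = rng.normal(0.78, 0.05, 5000)     # posterior draws under heavy suppression
print(summaries(full, partial, y_true=[10.2, 10.4], y_reconstructed=[10.1, 10.6]))
```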

Cont.

• Priority on process parameter inference or on sequence reconstruction

• Cost vs. information loss trade-off

• Utility function with cost linear in transmission (sketched below)

• No “off-line” cost associated with computation, e.g., using or ignoring suppression/failure information
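A tiny sketch of such a utility, with the information-loss measure and the per-transmission cost c as placeholders:

```python
# Utility trades off information loss against a cost linear in transmissions.
def utility(information_loss, n_transmissions, c=0.01):
    return -(information_loss + c * n_transmissions)

print(utility(information_loss=0.35, n_transmissions=120))
print(utility(information_loss=0.10, n_transmissions=480))  # more transmissions, less loss
```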

Future Work

• Parameters changing over time

• Node-to-node communication

• Multi-hop transmission

• Multivariate local data collection

• Local, non-network data collection for calibration, fusion

• Good approximations for handling high suppression rate and high failure rate settings

• All moving toward modeling for an environmental observation network
