47
Sensor Data Management In Sensor Networks

Sensor Data Management In Sensor Networks. I. Introduction –This paper defines a model for sensor databases –Stored data are represented as relations

Embed Size (px)

Citation preview

Sensor Data Management In Sensor Networks

I. IntroductionI. Introduction

– This paper defines a model for sensor databasesThis paper defines a model for sensor databases

– Stored data are represented as relations while sensor data are Stored data are represented as relations while sensor data are

represented as time series and each long-running query formulated over represented as time series and each long-running query formulated over

a sensor database defines a persistent view, which is maintained during a a sensor database defines a persistent view, which is maintained during a

given time intervalgiven time interval

– The design and implementation of the COUGAR sensor database system The design and implementation of the COUGAR sensor database system

is also describedis also described

– Applications monitor the world by querying and analyzing sensor dataApplications monitor the world by querying and analyzing sensor data

Towards Sensor Database Systems

[Bonnet+ 2001]

I. IntroductionI. Introduction

– Examples of monitoring applications include Examples of monitoring applications include

o supervising items in a factory warehouse, supervising items in a factory warehouse,

o gathering information in a disaster area, gathering information in a disaster area,

o organizing vehicle traffic in a large cityorganizing vehicle traffic in a large city

– These applications involve a combination of stored data (a list of sensors These applications involve a combination of stored data (a list of sensors

and their related attributes, such as their location) and sensor dataand their related attributes, such as their location) and sensor data

– These will be called These will be called sensor databasessensor databases

– This paper focuses on sensor query processing – the design, algorithms, This paper focuses on sensor query processing – the design, algorithms,

and implementations used to run queries over sensor databasesand implementations used to run queries over sensor databases

– A A sensor querysensor query is defined as a query expressed over a sensor database is defined as a query expressed over a sensor database

Towards Sensor Database Systems

[Bonnet+ 2001]

I. IntroductionI. Introduction

Factory Warehouse ExampleFactory Warehouse Example

– A A sensor querysensor query is defined as a query expressed over a sensor database is defined as a query expressed over a sensor database

– Each item of a factory warehouse has a stick-on temperature sensor Each item of a factory warehouse has a stick-on temperature sensor

attached to it as well as other attached sensors are to walls and attached to it as well as other attached sensors are to walls and

embedded in floors and ceilingsembedded in floors and ceilings

– Each sensor provides two signal-processing functions:Each sensor provides two signal-processing functions:

1.1. getTemperature()getTemperature() returns the measured temperature at regular intervals, and returns the measured temperature at regular intervals, and

2.2. detectAlarmTemperature(threshold)detectAlarmTemperature(threshold) returns the temperature whenever it returns the temperature whenever it

crosses a certain thresholdcrosses a certain threshold

– Each sensor is able to communicate this data and/or to store it locallyEach sensor is able to communicate this data and/or to store it locally

Towards Sensor Database Systems

[Bonnet+ 2001]

Factory Warehouse ExampleFactory Warehouse Example

– The sensor database stores the identifier of all sensors in the warehouse The sensor database stores the identifier of all sensors in the warehouse

along with their location and is connected to the sensor networkalong with their location and is connected to the sensor network

– The sensor database is used to make sure that items do not overheatThe sensor database is used to make sure that items do not overheat

– Typical queries that are run continuously may include:Typical queries that are run continuously may include:

o Query 1: “Return repeatedly the abnormal temperatures measured”Query 1: “Return repeatedly the abnormal temperatures measured”

o Query 2: “Every minute, return the temperature measured on the third floor”Query 2: “Every minute, return the temperature measured on the third floor”

o Query 3: “Generate a notification whenever two sensors within 5 yards of each Query 3: “Generate a notification whenever two sensors within 5 yards of each

other simultaneously measure an abnormal temperature”other simultaneously measure an abnormal temperature”

o Query 4: “Every five minutes retrieve the maximum temperature measured Query 4: “Every five minutes retrieve the maximum temperature measured

over the last five minutes”over the last five minutes”

o Query 5: “Return the average temperature measured on each floor over the Query 5: “Return the average temperature measured on each floor over the

last 10 minutes”last 10 minutes”

Towards Sensor Database Systems

[Bonnet+ 2001]

Factory Warehouse ExampleFactory Warehouse Example

– These examples queries has the following characteristics:These examples queries has the following characteristics:

o Monitoring queries are long runningMonitoring queries are long running

o The desired result of a query is typically a series of notifications of system The desired result of a query is typically a series of notifications of system

activity (periodic or triggered by special situations)activity (periodic or triggered by special situations)

o Queries need to correlate data produced simultaneously by different sensorsQueries need to correlate data produced simultaneously by different sensors

o Queries need to aggregate sensor data over time windowsQueries need to aggregate sensor data over time windows

o Most queries contain some condition restricting the set of sensors that are Most queries contain some condition restricting the set of sensors that are

involved (usually geographical conditions)involved (usually geographical conditions)

– Queries are formulated regardless of the physical structure or the Queries are formulated regardless of the physical structure or the

organization of the sensor network since the actual structure and organization of the sensor network since the actual structure and

population of a sensor network may vary over the lifespan of a querypopulation of a sensor network may vary over the lifespan of a query

– There are similarities with relational database query processing – most There are similarities with relational database query processing – most

applications combine sensor data with stored dataapplications combine sensor data with stored data

Towards Sensor Database Systems

[Bonnet+ 2001]

– Sensor data differs from traditional relational data since it is not stored in a Sensor data differs from traditional relational data since it is not stored in a

database server and it varies over timedatabase server and it varies over time

– There are two approaches for processing sensor queries:

o Warehousing approach:

represents the current state-of-the-art

processing of sensor queries and access to the sensor network are

separated – the sensor network is used by a data collection

mechanism

suited for answering predefined queries over historical data

proceeds in two steps: (i) data is extracted from sensor network in a

predefined manner and is stored in a database located on a unique

front-end server; (ii) query processing takes place on the centralized

database

Towards Sensor Database Systems

[Bonnet+ 2001]

– Distributed approach:

this approach is the focus of this paper

the query workload determines the data to be extracted from sensors

provides flexibility – different queries extract different data from the

sensor network – and efficient – only relevant data are extracted

from the sensor network

it allows the sensor database system to leverage the computing

resources on the sensor nodes: a sensor query can be evaluated at

the front-end server, in the sensor network, at the sensors, or at

some combination of the three

– Sensor database system should deal with sensor and communication Sensor database system should deal with sensor and communication

failures; it should consider sensor data as measurements with an failures; it should consider sensor data as measurements with an

associated uncertainty not as fact; it should establish and run a distributed associated uncertainty not as fact; it should establish and run a distributed

query execution plan without assuming global knowledgequery execution plan without assuming global knowledge

Towards Sensor Database Systems

[Bonnet+ 2001]

– The paper has the following contributions: The paper has the following contributions:

o Built on the results of [Seshadri+ 1995] to define a data model and long-Built on the results of [Seshadri+ 1995] to define a data model and long-

running queries semantics for sensor databases. A sensor database mixes running queries semantics for sensor databases. A sensor database mixes

stored data and sensor data. stored data and sensor data. Stored dataStored data are represented as relations while are represented as relations while

sensor datasensor data are represented as time series. Each long-running query defines are represented as time series. Each long-running query defines

a persistent view that is maintained during a given time interval.a persistent view that is maintained during a given time interval.

o Described the design and implementation of the Cornell COUGAR sensor Described the design and implementation of the Cornell COUGAR sensor

database system where COUGAR extends the Cornell PREDATOR object-database system where COUGAR extends the Cornell PREDATOR object-

relational database system. In COUGAR, each type of sensor is modeled as relational database system. In COUGAR, each type of sensor is modeled as

a new Abstract Data Type (ADT). Signal-processing functions are modeled as a new Abstract Data Type (ADT). Signal-processing functions are modeled as

ADT functions that return sensor data. Long-running queries are formulated in ADT functions that return sensor data. Long-running queries are formulated in

SQL. To support the evaluation of long-running queries, the query execution SQL. To support the evaluation of long-running queries, the query execution

engine is extended with a new mechanism for the execution of sensor ADT engine is extended with a new mechanism for the execution of sensor ADT

functionsfunctions

Towards Sensor Database Systems

[Bonnet+ 2001]

II. A Model for Sensor Database SystemsII. A Model for Sensor Database Systems

– Build on existing work by [Seshadri+ 1995] to define a data model for Build on existing work by [Seshadri+ 1995] to define a data model for

sensor data and an algebra of operators to formulate sensor queriessensor data and an algebra of operators to formulate sensor queries

II.A. Sensor DataII.A. Sensor Data

– A sensor database involves stored data and sensor dataA sensor database involves stored data and sensor data

– Stored data include the set of sensors participating in the sensor Stored data include the set of sensors participating in the sensor

database along with characteristics of the sensors (e.g., their location) or database along with characteristics of the sensors (e.g., their location) or

characteristics of the physical environmentcharacteristics of the physical environment

– These stored data are represented as relationsThese stored data are represented as relations

– The question: The question: how to represent sensor data?how to represent sensor data?

Towards Sensor Database Systems

[Bonnet+ 2001]

II.A. Sensor DataII.A. Sensor Data

– Sensor data are generated by signal processing functions and the Sensor data are generated by signal processing functions and the

representation chosen for sensor data should formulate sensor queries representation chosen for sensor data should formulate sensor queries

(data collection, correlation in time, and aggregates over time windows)(data collection, correlation in time, and aggregates over time windows)

– Time is essential -- signal processing functions may return output Time is essential -- signal processing functions may return output

repeatedly over time, and each output has a time-stamprepeatedly over time, and each output has a time-stamp

– In addition, monitoring queries introduce constraints on the sensor data In addition, monitoring queries introduce constraints on the sensor data

time-stamps, e.g., Query 3 in Example 1 assumes that the abnormal time-stamps, e.g., Query 3 in Example 1 assumes that the abnormal

temperatures are detected either simultaneously or within a certain time temperatures are detected either simultaneously or within a certain time

interval. Queries 4 and 5; on the other hand, aggregates over time interval. Queries 4 and 5; on the other hand, aggregates over time

windows and reference time explicitlywindows and reference time explicitly

Towards Sensor Database Systems

[Bonnet+ 2001]

II.A. Sensor DataII.A. Sensor Data

– Sensor data is represented as time seriesSensor data is represented as time series

– Representation of sensor time series are based on the sequence model Representation of sensor time series are based on the sequence model

introduced by [Seshadri+ 1995]introduced by [Seshadri+ 1995]

– A sequence is defined as a 3-tuple comprised of A sequence is defined as a 3-tuple comprised of

o a set of records Ra set of records R

o a countable totally ordered domain O (ordering domain – the elements of a countable totally ordered domain O (ordering domain – the elements of

the ordering domain are referred to as positions)the ordering domain are referred to as positions)

o an ordering of R by Oan ordering of R by O ( (defined as a relation between O and R, such

that every record in R is associated with some position in O

– Sequence operators are n-ary mappings on sequences; they operate Sequence operators are n-ary mappings on sequences; they operate

on a given number of input sequences producing a unique output on a given number of input sequences producing a unique output

sequencesequence

Towards Sensor Database Systems

[Bonnet+ 2001]

II.A. Sensor DataII.A. Sensor Data

– All sequence operators can be composedAll sequence operators can be composed

– Sequence operators include: select, project, compose (natural join

on the position), and aggregates over a set of positions

– Sensor data as a time series is represented with the following

properties:

1. The set of records corresponds to the outputs of a signal processing function

over time

2. The ordering domain is a discrete time scale, i.e. a set of time quantum where

each time quantum corresponds a position. Natural numbers are used as the

time-series ordering domain. Each natural number represents the number of

time units elapsed between a given origin and any (discrete) point in time. It is

assumed that clocks are synchronized and thus all sensors share the same

time scale

Towards Sensor Database Systems

[Bonnet+ 2001]

II.A. Sensor DataII.A. Sensor Data

3. All outputs of the signal processing function generated during a time quantum

are associated to the same position p. In case a sensor does not generate

events during the time quantum associated to a position, the Null record is

associated to that position

4. Whenever a signal processing function produces an output, the base

sequence is updated at the position corresponding to the production time.

Updates to sensor time series occur in increasing position order

II.B. Sensor QueriesII.B. Sensor Queries

– A sensor database involves stored data and sensor data, i.e., relations A sensor database involves stored data and sensor data, i.e., relations

and sequencesand sequences

– Sensor querySensor query is defined as an acyclic graph of relational and sequence is defined as an acyclic graph of relational and sequence

operatorsoperators

Towards Sensor Database Systems

[Bonnet+ 2001]

II.B. Sensor QueriesII.B. Sensor Queries

– The inputs of a relational operator are either base relations or the output The inputs of a relational operator are either base relations or the output

of another relational operator; the inputs of a sequence operator are of another relational operator; the inputs of a sequence operator are

either base sequences or the output of another sequence operator, i.e. either base sequences or the output of another sequence operator, i.e.

relations are manipulated using relational operators and sequences are relations are manipulated using relational operators and sequences are

manipulated using sequence operatorsmanipulated using sequence operators

– There are three exceptions to this rule – three operators allow combining There are three exceptions to this rule – three operators allow combining

relations and sequences: relations and sequences:

o the relational projection operator can take a sequence as input and project the relational projection operator can take a sequence as input and project

out the position attribute to obtain a relationout the position attribute to obtain a relation

o a cross product operator can take as input a relation and a sequence to a cross product operator can take as input a relation and a sequence to

produce a sequenceproduce a sequence

o an aggregate operator can take a sequence as input and a grouping list that an aggregate operator can take a sequence as input and a grouping list that

does not include the position attributedoes not include the position attribute

Towards Sensor Database Systems

[Bonnet+ 2001]

II.B. Sensor QueriesII.B. Sensor Queries

– Sensor queries are long runningSensor queries are long running

– Each sensor query is associated a time interval of the form [O, O + T] Each sensor query is associated a time interval of the form [O, O + T]

where where O O is the time at which it is submitted and is the time at which it is submitted and T T is the number of time is the number of time

quantums during which it is runningquantums during which it is running

– During the life of long-running query, relations and sensor sequences During the life of long-running query, relations and sensor sequences

may be updated may be updated

– An update to a relation R can be an insert, a delete, or modifications of a An update to a relation R can be an insert, a delete, or modifications of a

record in R, whereas, an update to a sensor sequence S is the insertion record in R, whereas, an update to a sensor sequence S is the insertion

of a new record associated to a position greater than or equal to all the of a new record associated to a position greater than or equal to all the

undefined positions in S undefined positions in S

Towards Sensor Database Systems

[Bonnet+ 2001]

II.B. Sensor QueriesII.B. Sensor Queries

– A sensor query defines a view that is persistent during its associated time A sensor query defines a view that is persistent during its associated time

interval where this persistent view is maintained to reflect the updates interval where this persistent view is maintained to reflect the updates

that are repeatedly performed on sensor time seriesthat are repeatedly performed on sensor time series

– [Jagadish+ 1995] presented that persistent views over relations and [Jagadish+ 1995] presented that persistent views over relations and

sequences could be maintained incrementally without accessing the sequences could be maintained incrementally without accessing the

complete sequencescomplete sequences

– Informally, persistent views can be maintained incrementally if updates Informally, persistent views can be maintained incrementally if updates

occur in increasing position order and if the algebra used to compose occur in increasing position order and if the algebra used to compose

queries does not allow sequences to be combined using any relational queries does not allow sequences to be combined using any relational

operatorsoperators

– Both conditions hold in the definition of a sensor database used in this Both conditions hold in the definition of a sensor database used in this

paperpaper

Towards Sensor Database Systems

[Bonnet+ 2001]

III. The COUGAR Sensor Database SystemIII. The COUGAR Sensor Database System

– The initial version of COUGAR system has been evaluated in the The initial version of COUGAR system has been evaluated in the

following aspects:following aspects:

1.1. User representationUser representation: :

o How are sensors and signal processing functions modeled in the How are sensors and signal processing functions modeled in the

database schema? database schema?

o How are queries formulated?How are queries formulated?

2.2. Internal representationInternal representation: :

o How is sensor data represented within the database components that How is sensor data represented within the database components that

perform query processing?perform query processing?

o How are sensor queries evaluated to provide the semantics of long-How are sensor queries evaluated to provide the semantics of long-

running queries?running queries?

Towards Sensor Database Systems

[Bonnet+ 2001]

III. A User RepresentationIII. A User Representation

– In COUGAR, signal-processing functions are represented as Abstract In COUGAR, signal-processing functions are represented as Abstract

Data Type (ADT) and a Sensor ADT is considered for all sensors of a Data Type (ADT) and a Sensor ADT is considered for all sensors of a

same type (e.g., temperature sensors, seismic sensors)same type (e.g., temperature sensors, seismic sensors)

– The public interface of a Sensor ADT corresponds to the specific signal-The public interface of a Sensor ADT corresponds to the specific signal-

processing functions supported by a type of sensor whereas an ADT processing functions supported by a type of sensor whereas an ADT

object in the database corresponds to a physical sensor in the real worldobject in the database corresponds to a physical sensor in the real world

– Sensor queries are formulated in SQL with small modifications to the Sensor queries are formulated in SQL with small modifications to the

languagelanguage

– The ‘FROM’ clause of a sensor query includes a relation whose schema The ‘FROM’ clause of a sensor query includes a relation whose schema

contains a sensor ADT attribute while the expressions over sensor ADTs contains a sensor ADT attribute while the expressions over sensor ADTs

are included in either the ‘SELECT’ or the ‘WHERE’ clause of a sensor are included in either the ‘SELECT’ or the ‘WHERE’ clause of a sensor

queryquery

Towards Sensor Database Systems

[Bonnet+ 2001]

III. A User RepresentationIII. A User Representation

– The queries introduced earlier are formulated in COUGAR as follows: The queries introduced earlier are formulated in COUGAR as follows:

o The simplified schema of the sensor database contains one relation The simplified schema of the sensor database contains one relation R(loc R(loc

point, floor int, s sensorNode),point, floor int, s sensorNode), where where locloc is a point ADT that stores the is a point ADT that stores the

coordinates of the sensor, coordinates of the sensor, floorfloor is the floor where the sensor is located in the is the floor where the sensor is located in the

data warehouse and data warehouse and sensorNodesensorNode is a Sensor ADT that supports the methods is a Sensor ADT that supports the methods

getTemp()getTemp() and and detectAlarmTemp(threshold)detectAlarmTemp(threshold), where threshold is the threshold , where threshold is the threshold

temperature above which abnormal temperatures are returnedtemperature above which abnormal temperatures are returned

o Both ADT functions return temperature represented as floatBoth ADT functions return temperature represented as float

Towards Sensor Database Systems

[Bonnet+ 2001]

III. A User RepresentationIII. A User Representation

– Query 1: “Return repeatedly the abnormal temperatures measured by all sensors”Query 1: “Return repeatedly the abnormal temperatures measured by all sensors”

SELECT R.s.detectAlarmTemp(100)

FROM R

WHERE $every();

The expression $every() is introduced as a syntactical construct to indicate that the

query

is long-running

– Query 2: “Every minute, return the temperature measured by all sensors on the Query 2: “Every minute, return the temperature measured by all sensors on the

third floor”third floor”

SELECT R.s.getTemp()

FROM R

WHERE R.floor = 3 AND $every(60);

The expression $every() takes as argument the time in seconds between successive

outputs of the sensor ADT functions in the query

Towards Sensor Database Systems

[Bonnet+ 2001]

III. A User RepresentationIII. A User Representation

– Query 3: “Generate a notification whenever two sensors within 5 yards of each Query 3: “Generate a notification whenever two sensors within 5 yards of each

other measure simultaneously an abnormal temperature”other measure simultaneously an abnormal temperature”

SELECT R1s.detectAlarmTemp(100), R2.s. detectAlarmTemp (100)

FROM R R1, R R2

WHERE $SQRT($SQR(R1.loc.x – R2.loc.x) + $SQR( R1.loc.y – R2.loc.y)) < 5

AND R1.s > R2.s AND $every();

This formulation assumes that the system incorporates an equality condition on the

time at which the temperatures are obtained from both sensors.

– Queries 4 and 5 cannot be expressed in the initial COUGAR since aggregates Queries 4 and 5 cannot be expressed in the initial COUGAR since aggregates

over time windows are not supportedover time windows are not supported

– Time interval associated with long-running queries in COUGAR is the interval Time interval associated with long-running queries in COUGAR is the interval

between the instant the query is submitted and the instant the query is explicitly between the instant the query is submitted and the instant the query is explicitly

stoppedstopped

Towards Sensor Database Systems

[Bonnet+ 2001]

III. B Internal RepresentationIII. B Internal Representation

– Query processing takes place on a database front-end while signal-Query processing takes place on a database front-end while signal-

processing functions are executed on the sensor nodes involved in the processing functions are executed on the sensor nodes involved in the

queryquery

– The query execution engine on the database front-end includes a The query execution engine on the database front-end includes a

mechanism for interacting with remote sensors where a query execution mechanism for interacting with remote sensors where a query execution

engine in each sensor executes signal processing functions and sends engine in each sensor executes signal processing functions and sends

data back to the front-enddata back to the front-end

– In COUGAR, it is assumed that there are no modifications to the stored In COUGAR, it is assumed that there are no modifications to the stored

data during the execution of a long-running query -- strict two-phase data during the execution of a long-running query -- strict two-phase

locking on the database front-end ensures verification of this assumptionlocking on the database front-end ensures verification of this assumption

Towards Sensor Database Systems

[Bonnet+ 2001]

III. B Internal RepresentationIII. B Internal Representation

– The initial version of COUGAR does not consider a long-running query as The initial version of COUGAR does not consider a long-running query as

a persistent view; the system computes the incremental results that could a persistent view; the system computes the incremental results that could

be used to maintain a view where these incremental results are obtained be used to maintain a view where these incremental results are obtained

by evaluating sensor ADT functions repeatedly and by combining the by evaluating sensor ADT functions repeatedly and by combining the

outputs they produce over time with stored dataoutputs they produce over time with stored data

– The execution of Sensor ADT functions is essential for sensor queries The execution of Sensor ADT functions is essential for sensor queries

executionexecution

Towards Sensor Database Systems

[Bonnet+ 2001]

Advantages:Advantages:

– Distributed approach makes it efficient since only the relevant data are Distributed approach makes it efficient since only the relevant data are

extracted from the WSN under consideration, hence reducing the extracted from the WSN under consideration, hence reducing the

communication and processing overheadcommunication and processing overhead

– The representation of the processing function as ADT provides The representation of the processing function as ADT provides

controlled access to encapsulated data through a well-defined set of controlled access to encapsulated data through a well-defined set of

functionsfunctions

– The authors chose not to reinvent the wheel as the sensor queries are The authors chose not to reinvent the wheel as the sensor queries are

formulated in SQL with little modification to the languageformulated in SQL with little modification to the language

– The use of Virtual Relations introduces more flexibilityThe use of Virtual Relations introduces more flexibility

Towards Sensor Database Systems

[Bonnet+ 2001]

Disadvantages:Disadvantages:

– Since the authors are proposing a sensor DB system, how does their Since the authors are proposing a sensor DB system, how does their

system adhere to the ACID properties of a DB needs to be mentionedsystem adhere to the ACID properties of a DB needs to be mentioned

– The protocol assumes that the sensed data is all stored in the sensor The protocol assumes that the sensed data is all stored in the sensor

node – this may introduce a space constraint in the sensor nodenode – this may introduce a space constraint in the sensor node

– The sensor data is time variant and after a certain time the data would The sensor data is time variant and after a certain time the data would

be outdatedbe outdated

– The authors do not suggest any rule for what time duration the data The authors do not suggest any rule for what time duration the data

should be held in the nodeshould be held in the node

Towards Sensor Database Systems

[Bonnet+ 2001]

Suggestions/Improvements/Future Work:Suggestions/Improvements/Future Work:

– Handle multiple copies of same kind of data available from nearby Handle multiple copies of same kind of data available from nearby

resourcesresources

– Provide adaptive query processing mechanismProvide adaptive query processing mechanism

– Handle mobile nodes and sinksHandle mobile nodes and sinks

Towards Sensor Database Systems

[Bonnet+ 2001]

IntroductionIntroduction

– The paper discusses the challenges associated with implementing the five The paper discusses the challenges associated with implementing the five

basic database aggregates (COUNT, MIN, MAX, SUM, and AVERAGE) basic database aggregates (COUNT, MIN, MAX, SUM, and AVERAGE)

– The network aggregation approach discussed in this paper is driven by a The network aggregation approach discussed in this paper is driven by a

general purpose, SQL-style interface that can execute queries over any kind general purpose, SQL-style interface that can execute queries over any kind

of sensor data irregardless of the applicationof sensor data irregardless of the application

– There are two benefits of this approach over the traditional network solution There are two benefits of this approach over the traditional network solution

which is generally application dependent:which is generally application dependent:

o Computation can be optimized by defining the language that users use Computation can be optimized by defining the language that users use

to express aggregatesto express aggregates

o Since the same aggregation language can be applied to all data types, Since the same aggregation language can be applied to all data types,

the burden on programmers is substantially less: they can issue the burden on programmers is substantially less: they can issue

declarative, SQL style queries rather than implementing custom declarative, SQL style queries rather than implementing custom

networking protocols to extract the needed data from the networknetworking protocols to extract the needed data from the network

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

IntroductionIntroduction

– The paper presents a variety of techniques to improve the reliability and The paper presents a variety of techniques to improve the reliability and

performance of the proposed solutionperformance of the proposed solution

– In addition, it is shown how grouped aggregates can be efficiently computed In addition, it is shown how grouped aggregates can be efficiently computed

and offered a comparison to related systems and database projectsand offered a comparison to related systems and database projects

– Two properties of radio communication need to be pointed out:Two properties of radio communication need to be pointed out:

o Radio is a broadcast medium such that any sensor within hearing Radio is a broadcast medium such that any sensor within hearing

distance can hear any message irrespective of whether or not it is the distance can hear any message irrespective of whether or not it is the

intended recipientintended recipient

o Radio links are typically symmetric: if a sensor a can hear sensor b, it is Radio links are typically symmetric: if a sensor a can hear sensor b, it is

assumed that sensor b can also hear sensor a; however, this may not assumed that sensor b can also hear sensor a; however, this may not

hold true in some caseshold true in some cases

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

BackgroundBackground

– Messages in the current generation of TinyOS are a fixed size Messages in the current generation of TinyOS are a fixed size

preprogrammed into sensorspreprogrammed into sensors

– Each message type has a Each message type has a message idmessage id that distinguishes it from other types that distinguishes it from other types

of messages and each sensor has a unique of messages and each sensor has a unique sensor idsensor id that distinguishes it that distinguishes it

from other sensorsfrom other sensors

– All messages specify their recipient (or broadcast), allowing sensors to ignore All messages specify their recipient (or broadcast), allowing sensors to ignore

messages not intended for them, although non-broadcast messages must messages not intended for them, although non-broadcast messages must

still be received by all sensors within range – unintended recipients drop still be received by all sensors within range – unintended recipients drop

messages not addressed to themmessages not addressed to them

– The technique adopted is to build a routing tree to route sensor dataThe technique adopted is to build a routing tree to route sensor data

– One sensor, typically interfaces the querying user to the rest of the network, One sensor, typically interfaces the querying user to the rest of the network,

is chosen to be the root from which the tree will be built and where the is chosen to be the root from which the tree will be built and where the

aggregated data will convergeaggregated data will converge

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

BackgroundBackground

– The root broadcasts a message for sensors to organize into a routing tree The root broadcasts a message for sensors to organize into a routing tree

and it includes its and it includes its own idown id and and levellevel (or distance from the root) (or distance from the root)

– Any sensor hearing this message assigns its own level to be the level in the Any sensor hearing this message assigns its own level to be the level in the

message plus one only if its current level is not already less than or equal to message plus one only if its current level is not already less than or equal to

the level in the messagethe level in the message

– It chooses the sender of the message as its It chooses the sender of the message as its parentparent

– Each of these sensors then broadcasts the routing message, along with their Each of these sensors then broadcasts the routing message, along with their

own ids and levelsown ids and levels

– The routing message floods down the tree with each node rebroadcasting the The routing message floods down the tree with each node rebroadcasting the

message until all nodes have been assigned a level and a parentmessage until all nodes have been assigned a level and a parent

– Node that hear multiple parents chose one randomlyNode that hear multiple parents chose one randomly

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

BackgroundBackground

– These routing messages are periodically broadcast from the root such that These routing messages are periodically broadcast from the root such that

the process of topology discovery goes on continuouslythe process of topology discovery goes on continuously

– This topology maintenance makes adaptation to network changes easier This topology maintenance makes adaptation to network changes easier

since each sensor looks at the history of received routing messages, since each sensor looks at the history of received routing messages,

chooses the best parent and ensures no routing cycles are createdchooses the best parent and ensures no routing cycles are created

– This method allows efficient route data towards the rootThis method allows efficient route data towards the root

– In order a message to reach the root, a sensor nodes sends a message to its In order a message to reach the root, a sensor nodes sends a message to its

parent which in turn forwards the message on to its parents and so onparent which in turn forwards the message on to its parents and so on

– Even though this approach does not address point-to-point routing, flooding Even though this approach does not address point-to-point routing, flooding

aggregation requests and routing replies up the tree to the route is sufficientaggregation requests and routing replies up the tree to the route is sufficient

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Aggregation in Database SystemsAggregation in Database Systems

– Aggregation in SQL-based database systems is defined by an aggregate Aggregation in SQL-based database systems is defined by an aggregate

function and a grouping predicatefunction and a grouping predicate

– The aggregate function specifies how a set of values should be combined to The aggregate function specifies how a set of values should be combined to

compute an aggregate; the standard set of SQL aggregate functions is compute an aggregate; the standard set of SQL aggregate functions is

COUNT, MIN, MAX, AVERAGE, and SUMCOUNT, MIN, MAX, AVERAGE, and SUM

– These compute functions such as the following SQL statement: These compute functions such as the following SQL statement:

SELECTSELECT AVERAGE(temp) AVERAGE(temp) FROMFROM sensors sensors

computes the average temperature from some table sensors, which computes the average temperature from some table sensors, which

represents a set of sensor readings that have been read into the systemrepresents a set of sensor readings that have been read into the system

– Similarly, the COUNT function counts the number of items in a set, the MIN Similarly, the COUNT function counts the number of items in a set, the MIN

and MAX functions compute minimal and maximal values, and SUM and MAX functions compute minimal and maximal values, and SUM

calculates the total of all values calculates the total of all values

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Aggregation in Database SystemsAggregation in Database Systems

– In addition, most database systems allowIn addition, most database systems allow user-defined functionsuser-defined functions (UDFs) that (UDFs) that

specify more complex aggregates than the five listed abovespecify more complex aggregates than the five listed above

– Rather than merely computing a single aggregate value over the entire set of Rather than merely computing a single aggregate value over the entire set of

data values, a grouping predicate partitions the values into groups based on data values, a grouping predicate partitions the values into groups based on

some attributesome attribute

– For example, the query:For example, the query:

SELECTSELECT TRUNC(temp/10), AVERAGE(light) TRUNC(temp/10), AVERAGE(light)FROMFROM sensors sensorsGROUP BYGROUP BY TRUNC(temp/10) TRUNC(temp/10)HAVINGHAVING AVERAGE(light) AVERAGE(light)

partitions sensor readings into groups according to their temperature readingpartitions sensor readings into groups according to their temperature readingand computes the average light reading within each group -- HAVING clauseand computes the average light reading within each group -- HAVING clauseexcludes groups whose average light readings are less than or equal to 50excludes groups whose average light readings are less than or equal to 50

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Generic Aggregation Techniques Generic Aggregation Techniques

– There are two approaches:There are two approaches:

1.1. Server-basedServer-based – centralized approach where all sensor readings are sent – centralized approach where all sensor readings are sent

to host PC that computes the aggregatesto host PC that computes the aggregates

2.2. In-networkIn-network – distributed approach where aggregates are partially or fully – distributed approach where aggregates are partially or fully

computed by the sensors themselves as readings are routed through the computed by the sensors themselves as readings are routed through the

network towards the host PCnetwork towards the host PC

– This paper focuses on distributed in-network aggregation approachThis paper focuses on distributed in-network aggregation approach

– Figure 1 illustrates the benefits of in-network approach where dotted lines Figure 1 illustrates the benefits of in-network approach where dotted lines

represent connections between sensors and solid lines represent the routing represent connections between sensors and solid lines represent the routing

tree imposed on top of this graph to allow sensors to propagate data to the tree imposed on top of this graph to allow sensors to propagate data to the

root along a single pathroot along a single path

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Generic Aggregation Techniques Generic Aggregation Techniques

– In the centralized approach, each sensor value must be routed to the root of In the centralized approach, each sensor value must be routed to the root of

the network; for a node at depth the network; for a node at depth nn, this requires , this requires n-1n-1 messages to be messages to be

transmitted per sensortransmitted per sensor

– The sensors in Figure 1(a) have been labeled with their distance from the The sensors in Figure 1(a) have been labeled with their distance from the

root; summing these numbers gives a total of sixteen messages required to root; summing these numbers gives a total of sixteen messages required to

route all aggregation information to the rootroute all aggregation information to the root

– The sensors in Figure 1(b): sensors with no children transmit their readings The sensors in Figure 1(b): sensors with no children transmit their readings

to their parentsto their parents

– Intermediate nodes (with children) combine their own readings with the Intermediate nodes (with children) combine their own readings with the

readings of their children via the aggregation function readings of their children via the aggregation function ff and propagate the and propagate the

partial aggregate, along with any extra data required to update the partial aggregate, along with any extra data required to update the

aggregate, up the treeaggregate, up the tree

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Generic Aggregation Techniques Generic Aggregation Techniques

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Figure 1: Figure 1: Server-based (a) vs. In-network (b) aggregation. In (a), each node isServer-based (a) vs. In-network (b) aggregation. In (a), each node islabelled with the number of messages required to get data to the host PC: a labelled with the number of messages required to get data to the host PC: a

total of 16 messages are required. In (b), only one message is sent along total of 16 messages are required. In (b), only one message is sent along each edge as aggregation is performed by the sensors themselveseach edge as aggregation is performed by the sensors themselves

Generic Aggregation Techniques Generic Aggregation Techniques

– Amount of data transmitted depends on the aggregateAmount of data transmitted depends on the aggregate

– For example, AVERAGE function will be calculated by the sum and the count For example, AVERAGE function will be calculated by the sum and the count

of all children’s sensor readings while other standard SQL aggregates such of all children’s sensor readings while other standard SQL aggregates such

as COUNT, MIN, MAX, and SUM can be computed by a parent node given as COUNT, MIN, MAX, and SUM can be computed by a parent node given

sensor or partial aggregate values at all of the child nodessensor or partial aggregate values at all of the child nodes

– This work focuses on a class of aggregation predicates that can be This work focuses on a class of aggregation predicates that can be

expressed as an aggregate function f over the sets a and b such that:expressed as an aggregate function f over the sets a and b such that:

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

bfafgbaf ,

Injecting a Query Injecting a Query

– Computing an aggregate consists of two phases: a Computing an aggregate consists of two phases: a propagationpropagation phase, in phase, in

which aggregate queries are pushed down into sensor networks, and an which aggregate queries are pushed down into sensor networks, and an

aggregationaggregation phase, where the aggregate values are propagated up from phase, where the aggregate values are propagated up from

children to parentschildren to parents

– Leaf nodes must find out that they are leaves and propagate singular Leaf nodes must find out that they are leaves and propagate singular

aggregates up to their parentsaggregates up to their parents

– When When aa sensor p receives an aggregate a, it transmits a and begins listening sensor p receives an aggregate a, it transmits a and begins listening

– If If p p has any children, it will hear those children re-transmit a to their children, has any children, it will hear those children re-transmit a to their children,

and figure out that it is not a leafand figure out that it is not a leaf

– After time interval After time interval tt, if , if pp has not heard any children, it means that it is a leaf has not heard any children, it means that it is a leaf

and transmits its current sensor value up the routing treeand transmits its current sensor value up the routing tree

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Injecting a Query Injecting a Query

– If If pp has children, they are supposed to report within time has children, they are supposed to report within time tt, and after time , and after time tt, it , it

computes the value of computes the value of aa applied to its own value and the value of its children applied to its own value and the value of its children

and forwards this partial aggregate to its parentand forwards this partial aggregate to its parent

– It is essential to note that short duration for It is essential to note that short duration for tt can lead to missed reports from can lead to missed reports from

children, and also the proper value of children, and also the proper value of tt varies depending on the depth of the varies depending on the depth of the

routing treerouting tree

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Streaming Aggregates Streaming Aggregates

– Since sensor networks are inherently unreliable, it is difficult to guarantee Since sensor networks are inherently unreliable, it is difficult to guarantee

that portion of a sensor network was not detached during an aggregate that portion of a sensor network was not detached during an aggregate

computationcomputation

– The paper proposes The paper proposes pipelined aggregatepipelined aggregate

– The aggregates are propagated into the network and time is divided into The aggregates are propagated into the network and time is divided into

intervals of duration intervals of duration ii

– During each interval, each sensor that has heard the request to aggregate During each interval, each sensor that has heard the request to aggregate

transmits a partial aggregate by applyingtransmits a partial aggregate by applying a a to its local reading and the values to its local reading and the values

of its children reported during previous intervalof its children reported during previous interval

– After the first interval, root hears from one-hop away sensorsAfter the first interval, root hears from one-hop away sensors

– After the second, it hears aggregates of sensors one and two hops away and After the second, it hears aggregates of sensors one and two hops away and

so onso on

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Streaming Aggregates Streaming Aggregates

– The pipelined solution has the following properties:The pipelined solution has the following properties:

o After aggregates have propagated up from leaves, a new aggregate After aggregates have propagated up from leaves, a new aggregate

arrives every arrives every ii seconds seconds

o Total time for an aggregation request to propagate down to the leaves Total time for an aggregation request to propagate down to the leaves

and back to the root is about t, but the user starts seeing approximations and back to the root is about t, but the user starts seeing approximations

of the aggregate after the first interval has elapsedof the aggregate after the first interval has elapsed

– These properties give users a stream of aggregate values that changes as These properties give users a stream of aggregate values that changes as

sensor readings and the underlying network changesensor readings and the underlying network change

– Continuous results are more useful than a single aggregate since they Continuous results are more useful than a single aggregate since they

provide users with understanding of how the network is behaving over timeprovide users with understanding of how the network is behaving over time

– Figure 2 illustrates a simple aggregate running in a pipelined approach Figure 2 illustrates a simple aggregate running in a pipelined approach

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Streaming Aggregates Streaming Aggregates

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

–The important drawback of this approach is the number of additional The important drawback of this approach is the number of additional

messages transmitted to extract the first aggregate over all sensorsmessages transmitted to extract the first aggregate over all sensors

–The non-pipelined aggregate requires one down and one back along with each The non-pipelined aggregate requires one down and one back along with each

edgeedge

Figure 2: Pipelined computation of aggregatesFigure 2: Pipelined computation of aggregates

Advantages:Advantages:

– Routing messages for tree building are broadcast periodically, thus any Routing messages for tree building are broadcast periodically, thus any

changes in the topology are updatedchanges in the topology are updated

– The method of pipelined aggregate overcomes the need to have multiple The method of pipelined aggregate overcomes the need to have multiple

transmissions of aggregate queries. Any nodes that miss the query in transmissions of aggregate queries. Any nodes that miss the query in

an interval have the time to catch up in the successive onesan interval have the time to catch up in the successive ones

– Using pipelined aggregates, the user sees approximates of the Using pipelined aggregates, the user sees approximates of the

aggregates. This continuous results is more useful then one single, aggregates. This continuous results is more useful then one single,

isolated resultisolated result

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Disadvantages:Disadvantages:

– No suggestions made on any protocol for choosing a node as the rootNo suggestions made on any protocol for choosing a node as the root

– The node designated as the root will have to cope with a lot of data The node designated as the root will have to cope with a lot of data

constraining on the battery power of the root nodeconstraining on the battery power of the root node

– The radio of the nodes would have to be kept ON most of the times to The radio of the nodes would have to be kept ON most of the times to

facilitate snoopingfacilitate snooping

– Using pipelined aggregation increases the number of messages that are Using pipelined aggregation increases the number of messages that are

transmitted to the node, which contradicts the goal of reducing the transmitted to the node, which contradicts the goal of reducing the

number of messagesnumber of messages

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

Suggestions/Improvements/Future Work:Suggestions/Improvements/Future Work:

– Introduce hybrid pipeline scheme in order to balance tradeoff between Introduce hybrid pipeline scheme in order to balance tradeoff between

robustness, incorporating nodes that lose initial aggregation requests robustness, incorporating nodes that lose initial aggregation requests

and number of messages passedand number of messages passed

– There is a need to work towards adaptive aggregation algorithms that There is a need to work towards adaptive aggregation algorithms that

are going to be self-configuringare going to be self-configuring

– Introduce a query layer for WSN that will accept queries in declarative Introduce a query layer for WSN that will accept queries in declarative

language, which can be optimized to generate efficient query execution language, which can be optimized to generate efficient query execution

plan with in-network processingplan with in-network processing

Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks [Madden+ 2002]

[Bonnet+ 2001] Philippe Bonnet, J.E. Gehrke, and Praveen Seshadri, Towards Sensor Database Systems, In Proceedings of the Second International Conference on Mobile Data Management. Hong Kong, January 2001.

[Jagadish+ 1995] H.V. Jagadish, I.S. Mumick, and A. Silberschatz, View Maintenance Issues for the Chronicle Data Model, PODS 1995, pp. 113-124.

[Madden+ 2002] S. Madden, R. Szewczyk, M. J. Franklin, and D. Culler, Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks, In proceedings of the 4th IEEE Workshop on Mobile Computing and Systems Applications, 2002.

[Seshadri+ 1995] P. Seshadri, M. Livny, and R. Ramakrishnan, SEQ: A Model for Sequence Databases, ICDE 1995, pp. 232-239.

[Yao+ 2003] Y. Yao and J.E. Gehrke, Query Processing in Sensor Networks, In Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, California, January 2003.

References