Upload
demetris-trihinas
View
413
Download
0
Embed Size (px)
Citation preview
AdaM: an Adaptive Monitoring Framework for Sampling and Filtering
on IoT Devices
IEEE International Conference on Big Data 2015 (BigData 2015)
Oct 29 – Nov 01, 2015 @ Santa Clara, CA, USA
Demetris Trihinas, George Pallis, Marios D. DikaiakosUniversity of Cyprus
{trihinas, gpallis, mdd}@cs.ucy.ac.cy
Demetris Trihinas 2
IoT was initially devices sensing and exchanging data streams with humans or other network-enabled devices
IEEE BIGDATA CONFERENCE 2015
Edge-Mining
Demetris Trihinas 3
A term coined to reflect data processing and decision-making on
“smart” devices that sit at the edge of IoT networks
Buy more detergent
…our devices just got a little bit more “smarter”…
IEEE BIGDATA CONFERENCE 2015
The “Big Data” in IoT
Demetris Trihinas 4
• Taming data volume and data velocity with
limited processing and network capabilities
• IoT devices are usually battery-powered which means intense
processing leads to increased energy consumption (less battery-life)
Zhang et al., Usenix HotCloud, 2015
Challenges
[IDC, Big Data in IoT, 2014]
IEEE BIGDATA CONFERENCE 2015
Adaptive Sampling and Filtering
Demetris Trihinas 5IEEE BIGDATA CONFERENCE 2015
Metric Stream
Demetris Trihinas 6
• A metric stream 𝑀 = {𝑠𝑖}𝑖=0𝑛 is a large sequence of collected samples,
denoted as 𝑠𝑖 where 𝑖 = 0, 1, … , 𝑛 and 𝑛 → ∞
• Each sample 𝑠𝑖 is a tuple 𝑡𝑖 , 𝑣𝑖 described by a timestamp 𝑡𝑖 and a
value 𝑣𝑖
𝑠𝑖
𝑠𝑖+1
Metric Stream M
IEEE BIGDATA CONFERENCE 2015
Periodic Sampling
Demetris Trihinas 7
• The process of triggering the collection mechanism of a monitored source every
𝑇 time units such that the 𝑖𝑡ℎ sample is collected at time 𝑡𝑖 = 𝑖 ∙ 𝑇
𝑠𝑖
𝑠𝑖+1
Metric Stream M sampled every T = 1s
Compute resources and energy are wasted while generating large data volumes at a high velocity
Metric Stream M sampled every T = 10s
Sudden events and significant insights are missed
IEEE BIGDATA CONFERENCE 2015
Adaptive Sampling
Demetris Trihinas 8
• Dynamically adjust the sampling period 𝑇𝑖 based on some function,
containing information of the metric stream evolution
𝑠𝑖𝑠𝑖+1
𝑇𝑖+1
Metric Stream 𝑀′ dynamically sampledMetric Stream 𝑀 with 𝑇 = 1𝑠
IEEE BIGDATA CONFERENCE 2015
• Adapting the sampling rate reduces unnecessary processing accompanied
by computing a new sample and consequently preserves energy!
Metric Filtering
Demetris Trihinas 9
• “Cleansing” the metric stream by not transmitting over the network
consecutive collected samples of “little” difference
• Fixed Range Metric Filters
Metric Stream M with fixed filter Range 𝑅 = 1%
if (curValue ∈ [prevValue – R, prevValue + R ])
filter(curValue)
Although stable phases exist, only 2 metric values are filtered
Can we know this R beforehand?
Should it keep the same value forever?
Is it the same for each and every metric?
IEEE BIGDATA CONFERENCE 2015
𝑅𝑠𝑖+1
𝐹𝑖𝑙𝑡𝑒𝑟𝑒𝑑!𝑠𝑖
x x
Adaptive Filtering
Demetris Trihinas 10
• Dynamically adjusting the filter range 𝑅 based on some function containing
information of the current variability of the metric stream
Metric Stream 𝑀′ with adaptive RangeMetric Stream 𝑀
𝑠𝑖
𝑠𝑖+1
𝑅𝑖+1
𝑅𝑖
IEEE BIGDATA CONFERENCE 2015
• Adapting the filter range to the variability of the metric streams allows
for data volume and network traffic to be reduced
The Adaptive Monitoring FrameworkAdaM
Demetris Trihinas 11
SensingUnit
ProcessingUnit
Communication
Unit
AdaptiveSampling
AdaptiveFiltering
Knowledge Base
AdaM
IoT Device
𝑠𝑖
𝑇𝑖+1
𝑅𝑖+1𝑠𝑖
• Software library with no external dependencies embeddable on IoT devices
• Reduces processing, network traffic and energy consumption by adapting the
monitoring intensity based on the metric stream evolution and variability
• Achieves a balance between efficiency and accuracyIt’s open-source! Give it a try!
https://github.com/dtrihinas/AdaM
Raspberry Pi
Arduino
Beacons
IEEE BIGDATA CONFERENCE 2015
Adaptive Sampling Algorithm• Step 1: Compute the distance 𝛿𝑖 between the current two consecutive
sample values
• Step 2: Compute metric stream evolution based on a Probabilistic
Exponential Weighted Moving Average (PEWMA) to estimate 𝛿𝑖+1
and the standard deviation 𝜎𝑖+1:
𝛿𝑖+1 = 𝜇𝑖 = 𝑎 ∙ 𝜇𝑖−1 + 1 − 𝑎 𝛿𝑖
Demetris Trihinas 12
Looks like an exponential moving average, right?
But weighting is probabilistically applied!
𝑎 = 𝛼 1 − 𝛽P𝑖
IEEE BIGDATA CONFERENCE 2015
Why use a Probabilistic EWMA?
Simple EWMA is volatile to abrupt transient changes
• Slow to acknowledge spike after “stable” periods
• If “stable” phases follow sudden spikes, then subsequent
values are overestimated
Demetris Trihinas 13
Sounds a lot like a job for
a Gaussian distribution!
IEEE BIGDATA CONFERENCE 2015
Adaptive Sampling Algorithm• Step 3: Compute actual standard deviation 𝜎𝑖
• Step 4: Compute the current confidence (𝑐𝑖 ≤ 1) of our approach
based on the actual and estimated standard deviation
𝑐𝑖 = 1 −| 𝜎𝑖 − 𝜎𝑖|
𝜎𝑖
• Step 5: Compute the sampling period 𝑇𝑖+1 based on the current
confidence and the user-defined imprecision γ
Demetris Trihinas 14
… the more “confident” the algorithm is, the larger the
outputted sampling period 𝑇𝑖+1 can be…
O(1) complexity as all steps only use their previous values!
IEEE BIGDATA CONFERENCE 2015
Adaptive Filtering Algorithm
• Step 1: Compute the dispersion of the stream based on average 𝜇𝑖
and standard deviation 𝜎𝑖 already computed from adaptive sampling
𝐹𝑖 =𝜎𝑖
2
𝜇𝑖
• Step 2: Compute the new filter range 𝑅𝑖+1 based on 𝐹𝑖 and the user-
defined imprecision 𝛾
Demetris Trihinas 15
…a low 𝐹𝑖 indicates a non dispersed data stream which
means a low variability in the metric stream…
O(1) complexity as all steps only use their previous values!
IEEE BIGDATA CONFERENCE 2015
𝑠𝑖
𝑠𝑖+1𝑅𝑖+1
𝑠𝑖+2
𝑠𝑖+3
𝐹𝑖𝑙𝑡𝑒𝑟𝑒𝑑!𝑅𝑖+2𝑅𝑖
𝐹 < 𝛾
Evaluation
Demetris Trihinas 16IEEE BIGDATA CONFERENCE 2015
AdaM vs State-of-the-Art Adaptive Techniques
• i-EWMA: an EWMA adapting sampling period by 1 time unit when the
estimated error (ε) is under/over user-defined imprecision
• L-SIP[1]: a linear algorithm using a double EWMA to produce estimates of
current data distribution based on the rate sample values change
=> Slow to react to highly transient and abrupt fluctuations in the metric stream
• FAST[2]: an (aggressive) framework using a PID controller to compute (large)
sampling periods accompanied by a Kalman Filter to predict values at non
sampling points
Demetris Trihinas 17
[1] Gaura et al., IEEE Sensors Journal, 2013 [2] Fan et al., ACM CIKM, 2012
IEEE BIGDATA CONFERENCE 2015
Datasets
Demetris Trihinas 18
Memory TraceJava Sorting Benchmark
CPU TraceCarnegie Mellon RainMon Project
Disk I/O TraceCarnegie Mellon RainMon Project
TCP Port Monitoring TraceCyber Defence SANS Tech Institute
Fitbit TraceCoursera Data Science Course
IEEE BIGDATA CONFERENCE 2015
Our Small “Big Data” Testbeds
Demetris Trihinas 19
Raspberry Pi 1st Gen. Model B512MB RAM, Single Core 700MHz
Daemon on OS emulates traces while feeding samples to each algorithm
Android Emulator + SensorSimulator512MB RAM, Single Core
SensorSimulator script emulates traces by feeding samples to Android emulator for each
algorithm
Daemon
IEEE BIGDATA CONFERENCE 2015
Evaluation Metrics• Estimation Accuracy – Mean Absolute Percentage Error (MAPE)
• CPU Cycles
• Outgoing Network Traffic
• Edge Device Energy Consumption*
Demetris Trihinas 20
*Xiao et al., CPSCom, 2010
perf
nethogs
powerstat
Other than error evaluation no other system goes through an overhead study!
Error Comparison
Demetris Trihinas 21
Memory Trace CPU Trace Disk I/O Trace
TCP Port Monitoring Trace Fitbit Trace
For all traces AdaM has the smallest inaccuracy (<10%)
Even in a more aggressive configuration (λ=2) AdaM is still comparable to L-SIP
AdaM AdaM AdaM
AdaM AdaM
User-defined accepted imprecision set to γ = 0.1
IEEE BIGDATA CONFERENCE 2015
So How Well Does AdaM Perform?
Demetris Trihinas 22
Memory TraceJava Sorting Benchmark
CPU TraceCarnegie Mellon RainMon Project
Disk I/O TraceCarnegie Mellon RainMon Project
TCP Port Monitoring TraceCyber Defence SANS Tech Institute
Fitbit TraceCoursera Data Science Course
IEEE BIGDATA CONFERENCE 2015
Overhead Comparison (1)
Demetris Trihinas 23
Network Traffic
Energy Consumption
CPU Cycles
• Data volume reduced by 74%
• Energy consumption by 71%
• Accuracy is always above 89% and 83%
with filtering enabled
IEEE BIGDATA CONFERENCE 2015
Overhead Comparison (2)
Demetris Trihinas 24
Network Traffic Energy Consumption
Error (MAPE)
FAST’s aggressiveness results in slightly lower
energy consumption and network traffic
But, this does not come for free…
FAST features a large inaccuracy
AdaM achieves a balance between
efficiency and accuracy
IEEE BIGDATA CONFERENCE 2015
Scalability Evaluation• Integrated AdaM to a data streaming
system
• JCatascopia Cloud Monitoring*
• JCatascopia Agents (data sources) use AdaM
to adapt monitoring intensity
• Archiving time is measured at JCatascopia
Server to evaluate data velocity
• Compare AdaM over Periodic
Sampling
Demetris Trihinas 25
* [Trihinas et al, CCGrid 2014] https://github.com/dtrihinas/JCatascopia
DB
AdaMAgent
Agent
Agent
.
.
.
JCatascopia
AdaM
AdaM
metrics
Data velocity is reduced to a linear growth
IEEE BIGDATA CONFERENCE 2015
Conclusion and Future Work
Demetris Trihinas 26
• Data volume and velocity of IoT-generated data
keeps increasing
• AdaM reduces processing, network traffic and
energy consumption of IoT device by adapting
monitoring intensity
• AdaM achieves a balance between efficiency and
accuracy
• AdaM v2.0 support data streaming engines such
as Apache Spark and Flink
IEEE BIGDATA CONFERENCE 2015
It’s open-source! Give it a try! https://github.com/dtrihinas/AdaM
Demetris Trihinas 27
Laboratory for Internet ComputingDepartment of Computer Science
University of Cyprus
http://linc.ucy.ac.cy
IEEE BIGDATA CONFERENCE 2015