boris-adryan
The logarithmic history of things
Boris the Academic “Give me £50M and I build you the best IoT ontology money can buy.”
“I wonder if anyone is making money with IoT”
Talking about inflated expectations
“There may be money in IoT”
“I’m going to get rich with IoT”
“I’m making a decent salary with IoT”
Boris the Freelancer “If you want to pay £5M for machine learning - make sure it isn’t rude or annoying.”
Boris at Zühlke “Don’t pay anyone £0.5M - I show you how we can do it for half.”
Do I get more peanuts at Thing Monk or at Monki Gras?
[Figure: “on average” estimates for thingmonk and monkigras on an axis from 0 to 100, shown for 3 samples, 4 samples and n samples]
statistical power through large numbers of samples
deviation
Statisticians and data scientists LOVE larger sample sizes!
…but if sampling costs time and resources, we need a compromise.
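The trade-off between sample size and precision can be sketched with the standard error of the mean, which shrinks roughly with the square root of the number of samples. The peanut counts below are simulated from a normal distribution; the mean of 50 and the standard deviation of 15 are assumptions chosen for illustration, not values from the talk.

```python
import random
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / len(samples) ** 0.5

def simulate(n, rng, mean=50.0, sd=15.0):
    """Draw n hypothetical peanut counts (assumed normal distribution)."""
    return [rng.gauss(mean, sd) for _ in range(n)]

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (3, 4, 30, 300, 3000):
    samples = simulate(n, rng)
    print(n, round(statistics.mean(samples), 1), round(standard_error(samples), 2))
```

Each tenfold increase in samples buys only about a threefold gain in precision, which is why a compromise is needed when sampling is expensive.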
Sampling strategy
precision and accuracy that can be achieved theoretically
precision and accuracy that is needed to get a job done
accurate and precise
not accurate, but precise
accurate, not precise
not what you want
39% of survey participants are worried about the upfront investment for an industrial IoT solution.
“Why aren’t you doing IoT?”
Sweetening IoT for your customer
A few recommendations from the trenches:
• how to cut down on hardware costs
• how to cut down on software costs
insights from a project with OpenSensors
Westminster Parking Trial
https://www.westminster.gov.uk/new-trial-improve-conditions-disabled-drivers
IoT solution
Service company
access to ~750 independent parking lots with a total of >3,500 individual spaces
Can we learn an optimal deployment and sampling pattern?
• sampling rate of 5-10 min
• data over 2 weeks in May 2015
• overall 2.6 million data points
Can we make Ethos’ budget go further by
• distributing a given number of sensors over a wider geographic area?
• lowering the sampling rate for better battery life?
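One way to judge a lower sampling rate is to thin out an existing series and count how often the last known reading disagrees with the full-rate data. This is a minimal sketch under assumptions: the occupancy series is made up (1 = occupied, 0 = free, one reading per 5 minutes), and `downsample_hold` and `mismatch_rate` are hypothetical helper names.

```python
def downsample_hold(series, step):
    """Keep every step-th reading and hold it until the next kept reading."""
    return [series[(i // step) * step] for i in range(len(series))]

def mismatch_rate(series, step):
    """Fraction of time slots where the held value differs from the truth."""
    held = downsample_hold(series, step)
    return sum(a != b for a, b in zip(series, held)) / len(series)

# Made-up occupancy of one parking space at 5-minute resolution
occupancy = [0, 0, 1, 1, 1, 0, 0, 0]
for step in (1, 2, 4):
    print(f"every {5 * step} min: mismatch {mismatch_rate(occupancy, step):.3f}")
```

If the mismatch stays low at a coarser step, the sensor can sleep longer between readings.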
labour: expensive
sensor: cheap
Correlation and clustering
[Figure: three example panels, horizontal axis 0 to 12 and vertical axis 0 to 20, labelled “correlated”, “anti-correlated” and “independent”]
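The three cases (correlated, anti-correlated, independent) can be quantified with the Pearson correlation coefficient, which is close to +1, -1 and 0 respectively. The sketch below assumes that each parking lot is described by an occupancy vector of equal length; the vectors are made up, and the talk does not state which correlation measure was used.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Made-up occupancy counts for three lots at five points in time
lot_a = [2, 5, 9, 14, 18]
lot_b = [1, 4, 8, 12, 17]   # rises together with lot_a
lot_c = [19, 15, 10, 6, 2]  # empties while lot_a fills up

print(round(pearson(lot_a, lot_b), 3))
print(round(pearson(lot_a, lot_c), 3))
```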
lorry
coach
car
bike
skateboard
hierarchical clustering on the basis of a feature matrix
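Hierarchical clustering on a feature matrix can be sketched in a few lines. The feature values below are invented and hand-scaled to the range 0 to 1, and single linkage with Euclidean distance is an assumption; the talk does not say which linkage or metric was used.

```python
from itertools import combinations
from math import dist

# Hypothetical feature matrix: (size, weight, motorised), scaled to [0, 1]
features = {
    "lorry":      (1.00, 1.00, 1.0),
    "coach":      (0.90, 0.80, 1.0),
    "car":        (0.40, 0.15, 1.0),
    "bike":       (0.15, 0.01, 0.0),
    "skateboard": (0.05, 0.00, 0.0),
}

def single_linkage(cluster_a, cluster_b):
    """Distance between clusters = distance of their two closest members."""
    return min(dist(features[a], features[b]) for a in cluster_a for b in cluster_b)

def agglomerate(names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: single_linkage(*pair))
        merges.append((a, b, single_linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

for left, right, distance in agglomerate(features):
    print(left, "+", right, f"at distance {distance:.2f}")
```

For parking lots, the feature vector would be the temporal occupancy pattern rather than vehicle properties.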
Good news: temporal occupancy pattern roughly predicts neighbours
lots in Southampton
lots around the corner of each other
750 parking lots
A caveat: Is a high degree of correlation a function of parking lot size?
finding two lots of 20 spaces that correlate
finding two lots of 3 spaces that correlate
[Figure: two panels with a time-of-day axis from 0:00 to 23:59]
“more likely”
“less likely”
Bootstrapping in DBSCAN clusters
Simulation: Swap the occupancy vectors between parking lots of similar size and test per grid cell if lots still correlate
Verdict: In some grid cells the level of the occupancy of one parking lot predicts the occupancy of most parking spaces.
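The swap simulation can be sketched as a permutation-style test: measure how strongly the lots in one grid cell correlate, then repeatedly replace their occupancy vectors with vectors from randomly chosen lots of similar size and check how often the swapped data correlate at least as well. This is a sketch under assumptions, not the project's code: the data are synthetic, the DBSCAN clustering step is left out, and `swap_test`, `size_tolerance` and the 500 rounds are hypothetical choices.

```python
import random
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def mean_pairwise_correlation(vectors):
    return mean(pearson(a, b) for a, b in combinations(vectors, 2))

def swap_test(cell_lots, all_lots, rng, rounds=500, size_tolerance=2):
    """Lots are (size, occupancy_vector) pairs. Returns (observed, p_value)."""
    observed = mean_pairwise_correlation([vec for _, vec in cell_lots])
    hits = 0
    for _ in range(rounds):
        used, swapped = set(), []
        for size, _ in cell_lots:
            # Donor lots of similar size, each used at most once per round
            donors = [i for i, (s, _) in enumerate(all_lots)
                      if abs(s - size) <= size_tolerance and i not in used]
            pick = rng.choice(donors)
            used.add(pick)
            swapped.append(all_lots[pick][1])
        if mean_pairwise_correlation(swapped) >= observed:
            hits += 1
    return observed, hits / rounds

# Synthetic data: three lots in one cell share a daily pattern, 30 others do not
rng = random.Random(1)
pattern = [rng.random() for _ in range(24)]
cell = [(10, [p + rng.gauss(0, 0.05) for p in pattern]) for _ in range(3)]
others = [(10, [rng.random() for _ in range(24)]) for _ in range(30)]
observed, p_value = swap_test(cell, cell + others, rng)
print(round(observed, 2), p_value)
```

A small p-value suggests that the correlation in the cell is not simply what lots of that size would show anyway.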
Better for navigation
We suggested that about ONE THIRD of the sensors may be sufficient.
Better predictive power
Suggested technology for trials
A temporary survey would have allowed us to make the same recommendation, including the insight that the provided 5-minute resolution is probably not required.
Monte Carlo simulations are great tools to assess the business value of IoT
“A tour of my assets every Friday.”
[Figure: graph of a base and its assets] ‘cost function’: sum of all edges
“A demand-driven tour of my assets.”
[Figure: graph of a base and its assets, each asset labelled with a probability p1(need today) to p6(need today)] ‘cost function’: sum of edges needed in 7 days
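The comparison between a fixed weekly tour and a demand-driven tour can be sketched as a small Monte Carlo simulation. Everything numeric here is an assumption: the edge costs, the daily probabilities and the simplification that every asset hangs off the base on its own edge, so that a visit costs exactly that edge. A realistic model would also need routing between assets.

```python
import random

# Hypothetical edge costs (e.g. km of driving) and daily need probabilities
edge_cost = [4.0, 7.0, 3.0, 9.0, 5.0, 6.0]
p_need_today = [0.05, 0.10, 0.02, 0.08, 0.05, 0.03]

def weekly_fixed_cost():
    """'A tour of my assets every Friday': sum of all edges, once a week."""
    return sum(edge_cost)

def weekly_demand_cost(rng, days=7):
    """'A demand-driven tour': pay only for edges that are needed on a day."""
    cost = 0.0
    for _ in range(days):
        for c, p in zip(edge_cost, p_need_today):
            if rng.random() < p:
                cost += c
    return cost

def monte_carlo(runs, seed=0):
    """Average demand-driven weekly cost over many simulated weeks."""
    rng = random.Random(seed)
    return sum(weekly_demand_cost(rng) for _ in range(runs)) / runs

print("fixed tour:", weekly_fixed_cost())
print("demand-driven tour (simulated mean):", round(monte_carlo(20000), 2))
```

With these made-up numbers the demand-driven tour is cheaper on average; with higher probabilities the fixed tour wins, which is the kind of break-even point such a simulation is meant to expose.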
Hardware is often perceived as an investment that customers understand, so they anticipate its cost.
This talk is about unfounded IoT fears.
There’s an air of magic around data and analytics.
Customers fear costs because they’re thinking about:
“My data problem must be special!”
✓ unstructured data
✓ distributed ingestion and storage
Or they believe from hearsay that IoT automatically requires:
✓ real-time analytics
✓ sophisticated machine learning
My company went to an IoT conference & all I got was this t-shirt and a bunch of buzzwords.
“I need to do real-time analytics!”
microseconds to seconds
seconds to minutes
minutes to hours
hours to weeks
on device
on stream
in batch
am I falling? counteract
battery level: should I land?
how many times did I stall?
what’s the best weather for flying?
in process
in database
operational insight
performance insight
strategic insight
e.g. Kalman filter
e.g. with machine learning
e.g. rules engine
e.g. summary stats
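As an illustration of the on-device end of this spectrum, here is a minimal one-dimensional Kalman filter that smooths a noisy scalar reading such as altitude. It is a sketch under assumptions: the state is modelled as roughly constant, and `process_var`, `measurement_var` and the readings are illustrative values, not parameters from the talk.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a constant-state model.

    x is the state estimate, p its variance. Returns one estimate per reading.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var            # predict: uncertainty grows a little
        k = p / (p + measurement_var)  # Kalman gain: trust in the new reading
        x = x + k * (z - x)            # update the estimate towards the reading
        p = (1 - k) * p                # updated uncertainty shrinks
        estimates.append(x)
    return estimates

# Made-up noisy altitude readings around 5 m
readings = [5.3, 4.8, 5.1, 4.7, 5.4, 5.0, 4.9, 5.2]
print([round(e, 2) for e in kalman_1d(readings, 1e-3, 0.1, x0=readings[0])])
```

The loop needs only a handful of arithmetic operations per reading, which is why this kind of filter suits the microseconds-to-seconds range on constrained hardware.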
Edge, fog and cloud computing
Edge
Pro:
- immediate compression from raw data to actionable information
- cuts down traffic
- fast response
Con:
- loses potentially valuable raw data
- developing analytics on embedded systems requires specialists
- compute costs valuable battery life
Cloud
Pro:
- compute power
- scalability
- familiarity for developers
- integration centre across all data sources
- cheapest ‘real-time’ option
Con:
- traffic
Fog
Pro:
- same as Edge
- closer to ‘normal’ development work
- gateways often mains-powered
Con:
- loses potentially valuable raw data
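The edge trade-off between less traffic and lost raw data can be sketched by summarising readings in fixed windows before they are sent. The window size, the summary fields and the function names are assumptions made for illustration.

```python
from statistics import mean

def summarise_window(readings):
    """Compress one window of raw readings into a small summary message."""
    return {
        "n": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }

def edge_messages(readings, window):
    """One summary per window instead of one message per raw reading."""
    return [summarise_window(readings[i:i + window])
            for i in range(0, len(readings), window)]

raw = list(range(12))  # made-up raw sensor readings
messages = edge_messages(raw, window=4)
print(len(raw), "raw readings become", len(messages), "messages")
```

The individual readings cannot be recovered from the summaries, which is the raw data loss listed as a con.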
Options for real-time in cloud
some features can cost a bit, especially when you don’t really know what you’re doing and want to ‘try it out’.
a badly configured SMACK stack on your own commodity hardware can be slow and unreliable
your pre-trained classifier
My current pet hate: Deep Learning
Deep learning has delivered impressive results mimicking human reasoning, strategic thinking and creativity.
At the same time, big players have released libraries such that even ‘script kiddies’ can apply deep learning.
It’s already leading to uncritical use of deep learning where other methods would be more appropriate.
Summary
‣ Preliminary surveys, data analysis and simulation can help to minimise the number of sensors and develop an optimal deployment strategy and sampling schedule.
‣ Faster analytics on bigger and better hardware are not automatically the most useful solution.
‣ A good understanding of the type of insight that is required by the business model is essential.
Zühlke can advise on options around IoT and data analytics, and provide complete solutions where needed.
Dr. Boris Adryan @BorisAdryan