Data Proximity: Simple solutions to complex data science problems
Jose A. Rodriguez-Serrano, @bbvadata
Ph.D. in Computer Science
Lead Data Scientist at BBVA Data & Analytics
Data science: Solving problems with data (and computers)
Problem 1: Undoing a traffic jam
CC https://www.flickr.com/photos/proust/
Problem 2: Where was each of these pictures taken?
(GPS coordinates if possible)
Problem 3: Forecast the next value of anything
How would you solve these 3 problems?
If you had to solve all 3 problems at the same time, would you think differently?
They can all be addressed with the same solution!
Dilemma:
Best solution for each problem
vs.
1 acceptable solution for all the problems
Sensors measure current traffic “state”
Timestamp        State                          Action that solved
23/09/13 18:00   [81 54 53 9 17 98 1 20 …]      OPEN BUS GATE
25/09/13 08:54   [154 53 91 17 98 1 20 …]       DISPLAY ALT ROUTE
25/08/13 17:56   [23 87 65 87 24 89 89 …]       ALTER TRAFFIC LIGHT
28/08/13 20:00   [81 34 53 9 27 98 1 20 …]      DISPLAY EVENT INFO
(Large) database of traffic problems, states, and solutions
Next time: Find most similar traffic state, and apply registered action.
E.g. Mounce et al., A metric for pattern-matching applications to traffic management, Transportation Research C, 2010
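The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sensor readings are made-up values, the names `incident_log` and `recommend_action` are hypothetical, and plain Euclidean distance stands in for whatever traffic-specific metric a real system (such as the one in the cited paper) would use.

```python
import math

# Hypothetical log of past incidents: (sensor readings, action that solved it).
# The readings are invented for illustration; they are not real traffic data.
incident_log = [
    ([80.0, 55.0, 50.0, 10.0], "OPEN BUS GATE"),
    ([150.0, 50.0, 90.0, 15.0], "DISPLAY ALT ROUTE"),
    ([25.0, 85.0, 65.0, 85.0], "ALTER TRAFFIC LIGHT"),
]


def recommend_action(current_state, log):
    """Return the action registered for the most similar past traffic state.

    Similarity is plain Euclidean distance (an assumption, not the
    metric proposed in the cited paper).
    """
    _, action = min(log, key=lambda entry: math.dist(entry[0], current_state))
    return action


if __name__ == "__main__":
    # The new state is closest to the second logged state.
    print(recommend_action([148.0, 52.0, 88.0, 20.0], incident_log))
```

With a large enough log, the quality of the recommendation depends mostly on how well the distance reflects "same kind of traffic problem", which is why the choice of metric matters more than the lookup code itself.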
Geolocalizing images just with data
Geotagged image database (e.g. Flickr)
e.g. Hays, Efros, IM2GPS: estimating geographic information from a single image, CVPR 2008
Most similar geotagged images
Find mode of locations
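A minimal sketch of these two steps, assuming each image has already been reduced to a numeric feature vector. The feature vectors, the function names, and the grid-cell approach to finding the mode are all assumptions made for illustration; the cited work uses its own image descriptors and a different mode-finding procedure.

```python
import math
from collections import Counter

# Hypothetical database: (image feature vector, (latitude, longitude)).
# Feature values are invented; coordinates are approximate city locations.
geotagged = [
    ([0.90, 0.10], (48.85, 2.35)),
    ([0.80, 0.20], (48.86, 2.29)),
    ([0.85, 0.15], (51.50, -0.12)),
    ([0.10, 0.90], (40.41, -3.70)),
]


def estimate_location(query, database, k=3, cell_deg=1.0):
    """Estimate (lat, lon) for a query image from its k nearest neighbors.

    Step 1: find the k most similar geotagged images (Euclidean distance).
    Step 2: approximate the mode of their locations by binning them into
            grid cells of cell_deg degrees and averaging the busiest cell.
    """
    neighbors = sorted(database, key=lambda e: math.dist(e[0], query))[:k]

    counts = Counter()
    members = {}
    for _, (lat, lon) in neighbors:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
        members.setdefault(cell, []).append((lat, lon))

    best_cell, _ = counts.most_common(1)[0]
    points = members[best_cell]
    mean_lat = sum(p[0] for p in points) / len(points)
    mean_lon = sum(p[1] for p in points) / len(points)
    return mean_lat, mean_lon


if __name__ == "__main__":
    print(estimate_location([0.88, 0.12], geotagged))
```

Taking the mode rather than the mean of all neighbors keeps one visually similar but geographically distant image from dragging the estimate into the middle of nowhere.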
Forecast
Reasoning from “neighbor transfer”
A design pattern to quickly build data science applications
1/ Find a similar situation in your data (neighbors)
2/ Take the solution/action/output that was registered
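The two steps can be written once and reused across problems. The sketch below is a hedged illustration, not a reference implementation: `neighbor_transfer` and `forecast_next` are hypothetical names, Euclidean distance is an assumed similarity measure, and the forecasting setup (sliding windows as "situations", the following value as the "output") is one simple way to cast Problem 3 into this pattern.

```python
import math


def neighbor_transfer(query, examples, k=1):
    """Generic pattern: return the outputs registered for the k examples
    whose situation is most similar to the query.

    examples is a list of (situation, output) pairs, where each situation
    is a numeric vector of the same length as query.
    """
    ranked = sorted(examples, key=lambda e: math.dist(e[0], query))
    return [output for _, output in ranked[:k]]


def forecast_next(series, window=3, k=2):
    """Forecast the next value of a series by neighbor transfer.

    Each past window of length `window` is a situation; the value that
    followed it is the registered output. The forecast is the mean of
    the outputs of the k windows most similar to the latest one.
    """
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    examples = [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]
    outputs = neighbor_transfer(series[-window:], examples, k)
    return sum(outputs) / len(outputs)


if __name__ == "__main__":
    history = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
    print(forecast_next(history))
```

Only the definition of "situation", "output", and the distance changes from one problem to the next; the lookup itself stays the same.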
Neighbor transfer is not new
Crucial enablers:
1/ Lots of data
2/ Good similarity measures
3/ Efficient search (HW & SW)
Make things as simple as possible but not simpler
A. Einstein
Vehicle pose recognition
Rodriguez, Larlus, Dai, Data-driven detection of prominent objects, IEEE Trans. PAMI, 2015
Neighbor transfer… + deep learning = doable
Rippel et al., Metric learning with adaptive density discrimination, ICLR 2016
Why should I adopt that?
When there’s a lot of data, sometimes simple solutions work well.
With big data, sometimes it’s even difficult to beat the simple methods
Technical Debt Matters
This method is generic, and easy to maintain.
Any programmer can implement it.
We often think about scaling to lots of data.
Should we start thinking about scaling to lots of problems?