Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Kumamoto U CMU CS
Smart Analytics for Big Time-series Data
Yasushi Sakurai (Kumamoto University)Yasuko Matsubara (Kumamoto University)Christos Faloutsos (Carnegie Mellon University)
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 1
Kumamoto U CMU CS
Roadmap
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 2
Part1
Part2
Part3
- Motivation- Similarity search,
pattern discoveryand summarization
- Non-linear modeling and forecasting
- Extension of time-series data: tensor analysis
Kumamoto U CMU CS
Big Data
Big time-series data
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 3
climate
EconomyWeb
Social/natural phenomena
IoT devices
Epidemic
Kumamoto U CMU CS
Big Data
Big time-series data
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 4
climate
EconomyWeb
Social/natural phenomena
IoT devices
Online user activities- Web clicks, - Online reviewsApplication- Market analysis
Epidemic
Kumamoto U CMU CS
Big Data
Big time-series data
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 5
climate
EconomyWeb
Social/natural phenomena
IoT devices
IoT data streams- Vibration sensors, acceleration,
temperature, etc.Application- Self-driving car- Structural health monitoring- Manufacturing
Epidemic
Kumamoto U CMU CS
Motivation• Given: Big time-series data• Goal:
Find important patterns Forecast future social activities
• At-work: –Online marketing–Sensor monitoring, anomaly detection–Forecasting future events
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 6
Kumamoto U CMU CS
Motivation• Time-series analysis for big data–Web and social networks– IoT data streams–Medical and healthcare records
• Digital universe growth– 4.4 zettabytes
(4.4 trillion gigabytes)– 44 zettabytes in 2020
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 7
4.4ZB
44ZB
2013 2020TheDIGITALUNIVERSEofOPPORTUNITIES(IDC2014)
Kumamoto U CMU CS
Big Time-series analysis• Volume and Velocity– High-speed processing for large-scale data– Low memory consumption– Online processing for real-time data management
• Variety of data types– Multi-dimensional time-series data (e.g., IoT device data)– Complex time-stamped events (e.g., web-click logs)– Time-evolving graph (e.g., social networks)
• Advanced techniques for big data– Model estimation, summarization– Anomaly detection, forecasting
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 8
Time
Kumamoto U CMU CS
Big Time-series analysis• Time-series data mining
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 9
Indexing,similaritysearch
Featureextraction
Streammining
Linearmodeling
PCA ICAModel(M)Data(X)
ED,DTWCorrelation
AR,ARIMA,LDS
StatStreametc…
DFT,DWT,SVD,ICA
Y
X
DTW
Kumamoto U CMU CS
New research directionsR1. Automatic mining (no magic numbers!)R2. Non-linear (gray-box) modelingR3. Tensor analysis
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 10
X
≈ + ...NOmagicnumbers!
Kumamoto U CMU CS
(R1) Automatic miningNo magic numbers! ... because,
Big data mining: -> we cannot afford human intervention!!
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 11
Manual- sensitive to the parameter tuning- long tuning steps (hours, days, …)
Automatic (no magic numbers)- no expert tuning required
Kumamoto U CMU CS(R2) Non-linear (gray-box) modeling
• Gray-box mining– If we know the equations
• Non-linear (differential) equations–Epidemic–Biology–Physics, Economics, etc.,
• Modeling non-linear phenomena– Non-linear analysis for big time-series data
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 12
ImagecourtesyofTinaPhillipsandamenic181atFreeDigitalPhotos.net.
Gray-box
Kumamoto U CMU CS(R3) Large-scale tensor analysis
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 13
Time URL User08-01-12:00 CNN.com Smith08-02-15:00 YouTube.com Brown08-02-19:00 CNET.com Smith08-03-11:00 CNN.com Johnson
… … …
RepresentasMth ordertensor(M=3)
URL
user
Time
x
u
v
n
Elementx:#ofeventse.g.,‘Smith’,‘CNN.com’,‘Aug1,10pm’;21times
• Time-stamped events– e.g., web clicks
X
Kumamoto U CMU CS
• Time-series data analysis– Indexing and fast searching–Sequence matching–Clustering– etc.
• New research directionsR1. Automatic miningR2. Non-linear modelingR3. Large-scale tensor analysis
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 14
X ≈ + ...
Time
New research directions
NOmagic#
Kumamoto U CMU CS
• Time-series data analysis– Indexing and fast searching–Sequence matching–Clustering– etc.
• New research directionsR1. Automatic miningR2. Non-linear modelingR3. Large-scale tensor analysis
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 15
X ≈ + ...
Time
New research directions
NOmagic#
Part2Part3
Part1