15
Smart Analytics for Big Time-series Data Yasushi Sakurai (Kumamoto University) Yasuko Matsubara (Kumamoto University) Christos Faloutsos (Carnegie Mellon University) http://www.cs.kumamoto- u.ac.jp/~yasuko/TALKS/17-KDD-tut/ © 2017 Sakurai, Matsubara & Faloutsos 1

Smart Analytics for Big Time-series Data - Sakurai

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Smart Analytics for Big Time-series Data

Yasushi Sakurai (Kumamoto University)Yasuko Matsubara (Kumamoto University)Christos Faloutsos (Carnegie Mellon University)

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 1

Page 2: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Roadmap

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 2

Part1

Part2

Part3

- Motivation- Similarity search,

pattern discoveryand summarization

- Non-linear modeling and forecasting

- Extension of time-series data: tensor analysis

Page 3: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Big Data

Big time-series data

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 3

climate

EconomyWeb

Social/natural phenomena

IoT devices

Epidemic

Page 4: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Big Data

Big time-series data

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 4

climate

EconomyWeb

Social/natural phenomena

IoT devices

Online user activities- Web clicks, - Online reviewsApplication- Market analysis

Epidemic

Page 5: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Big Data

Big time-series data

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 5

climate

EconomyWeb

Social/natural phenomena

IoT devices

IoT data streams- Vibration sensors, acceleration,

temperature, etc.Application- Self-driving car- Structural health monitoring- Manufacturing

Epidemic

Page 6: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Motivation• Given: Big time-series data• Goal:

Find important patterns Forecast future social activities

• At-work: –Online marketing–Sensor monitoring, anomaly detection–Forecasting future events

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 6

Page 7: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Motivation• Time-series analysis for big data–Web and social networks– IoT data streams–Medical and healthcare records

• Digital universe growth– 4.4 zettabytes

(4.4 trillion gigabytes)– 44 zettabytes in 2020

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 7

4.4ZB

44ZB

2013 2020TheDIGITALUNIVERSEofOPPORTUNITIES(IDC2014)

Page 8: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Big Time-series analysis• Volume and Velocity– High-speed processing for large-scale data– Low memory consumption– Online processing for real-time data management

• Variety of data types– Multi-dimensional time-series data (e.g., IoT device data)– Complex time-stamped events (e.g., web-click logs)– Time-evolving graph (e.g., social networks)

• Advanced techniques for big data– Model estimation, summarization– Anomaly detection, forecasting

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 8

Time

Page 9: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

Big Time-series analysis• Time-series data mining

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 9

Indexing,similaritysearch

Featureextraction

Streammining

Linearmodeling

PCA ICAModel(M)Data(X)

ED,DTWCorrelation

AR,ARIMA,LDS

StatStreametc…

DFT,DWT,SVD,ICA

Y

X

DTW

Page 10: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

New research directionsR1. Automatic mining (no magic numbers!)R2. Non-linear (gray-box) modelingR3. Tensor analysis

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 10

X

≈ + ...NOmagicnumbers!

Page 11: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

(R1) Automatic miningNo magic numbers! ... because,

Big data mining: -> we cannot afford human intervention!!

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 11

Manual- sensitive to the parameter tuning- long tuning steps (hours, days, …)

Automatic (no magic numbers)- no expert tuning required

Page 12: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS(R2) Non-linear (gray-box) modeling

• Gray-box mining– If we know the equations

• Non-linear (differential) equations–Epidemic–Biology–Physics, Economics, etc.,

• Modeling non-linear phenomena– Non-linear analysis for big time-series data

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 12

ImagecourtesyofTinaPhillipsandamenic181atFreeDigitalPhotos.net.

Gray-box

Page 13: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS(R3) Large-scale tensor analysis

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 13

Time URL User08-01-12:00 CNN.com Smith08-02-15:00 YouTube.com Brown08-02-19:00 CNET.com Smith08-03-11:00 CNN.com Johnson

… … …

RepresentasMth ordertensor(M=3)

URL

user

Time

x

u

v

n

Elementx:#ofeventse.g.,‘Smith’,‘CNN.com’,‘Aug1,10pm’;21times

• Time-stamped events– e.g., web clicks

X

Page 14: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

• Time-series data analysis– Indexing and fast searching–Sequence matching–Clustering– etc.

• New research directionsR1. Automatic miningR2. Non-linear modelingR3. Large-scale tensor analysis

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 14

X ≈ + ...

Time

New research directions

NOmagic#

Page 15: Smart Analytics for Big Time-series Data - Sakurai

Kumamoto U CMU CS

• Time-series data analysis– Indexing and fast searching–Sequence matching–Clustering– etc.

• New research directionsR1. Automatic miningR2. Non-linear modelingR3. Large-scale tensor analysis

http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/17-KDD-tut/ ©2017Sakurai,Matsubara&Faloutsos 15

X ≈ + ...

Time

New research directions

NOmagic#

Part2Part3

Part1