22
IBM Research Multi-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide (“Ide-san”), Dzung T. Phan, J. Kalagnanam PhD, Senior Technical Staff Member IBM Thomas J. Watson Research Center IEEE International Conference on Data Mining (ICDM 2017) This slides are available at ide-research.net.

Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

IBM Research

Multi-task Multi-modal Models for Collective Anomaly Detection

Tsuyoshi Ide (“Ide-san”), Dzung T. Phan, J. Kalagnanam

PhD, Senior Technical Staff Member

IBM Thomas J. Watson Research Center

IEEE International Conference on Data Mining (ICDM 2017)

This slides are available at ide-research.net.

Page 2: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

2

IBM Research

Outline

▪ Problem setting

▪Modeling strategy

▪Model inference approach

▪ Experimental results

Page 3: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

3

IBM Research

Wish to build a collective monitoring solution

▪ You have many similar but not identical industrial

assets

▪ You want to build an anomaly detection model for

each of the assets

▪ Straightforward solutions have serious limitationso 1. Treat the systems separately. Create each model

individually

✓ Suffers from lack of fault examples

o 2. Build one universal model by disregarding individuality

✓ Model fit is not good

…System 1

(in New

Orleans)

System s

System S

(in New York)

Page 4: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

4

IBM Research

Practical requirements: Need to capture both commonality

and individuality

▪ Capture both individuality and commonality

▪ Automatically capture multiple operational

states o Real-world is not single-peaked (single-modal)

▪ Be robust to noise

▪ Be highly interpretable for diagnosis purposes

…System 1

(in New

Orleans)

System s

System S

(in New York)

Page 5: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

5

IBM Research

Formalizing the problem as multi-task density estimation for

anomaly detection

Data Prob. density Anomaly score

all data

• overall

• variable-wise

mu

lti-

task le

arn

ing

(M

TL

)

…System 1

(in New

Orleans)

System s

System S

(in New York)

Page 6: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

6

IBM Research

Outline

▪ Problem setting

▪Modeling strategy

▪Model inference approach

▪ Experimental results

Page 7: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

7

IBM Research

Use Gaussian graphical model (GGM)-based anomaly

detection approach as the basic building block

Sparse graphical model

training data

Multi-variate data Anomaly score

Overall score

Variable-wise score

[Ide+ SDM09] [Ide+ ICDM16]

sample covariance

Page 8: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

8

IBM Research

Basic modeling strategy: Combine common pattern

dictionary with individual weights

Monitoring model

for System 1

Monitoring model

for System 2

Monitoring model

for System S

……

sparse

GGM 1

sparse

GGM 2

sparse

GGM K

Common dictionary

of sparse graphs

GGM=Gaussian Graphical Model

prob.

prob.

prob.

Individual sparse weights

…System 1

(in New

Orleans)

System s

System S

(in New York)

Page 9: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

9

IBM Research

Basic modeling strategy: Resulting model will be a sparse

mixture of sparse GGM

Monitoring model for System s

sparse

GGM 1

sparse

GGM 2

sparse

GGM K

GGM=Gaussian Graphical Model

prob.

System s

Gaussian mixture

Sparse mixture

weights

(= automatic

determination of the

number of patterns)

Sparse

Gaussian

graphical

model

Page 10: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

10

IBM Research

Outline

▪ Problem setting

▪Modeling strategy

▪Model inference approach

▪ Experimental results

Page 11: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

11

IBM Research

Employing a Bayesian model for multi-modal MTL

▪Observation model (for the s-th task)o Gaussian mixture with task-dependent weight

▪ Sparsity enforcing priors (non-conjugate)o Laplace prior for the precision matrix

o Bernoulli prior for the mixture weights

▪ Conjugate prior on and

Page 12: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

12

IBM Research

Maximizing log likelihood using variational Bayes combined

with point-estimation

▪ Log likelihood

▪ Use VB for

▪ Use point-estimate for o Results in two convex optimization problems

Likelihood by the obs. model Prior distributions

Page 13: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

13

IBM Research

Maximizing log likelihood using variational Bayes combined

with point-estimation

▪ Update sample weights

▪ Update cluster weights

▪ Update precision matrices

▪ Update other parametersSolved by graphical lasso [Friedman 08]

Use new semi-closed form solution

total # of samples assigned to the k-th cluster

The ratio of samples

assigned to the k-th cluster

Page 14: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

14

IBM Research

Solving the L0-regularized optimization problem for mixture

weights

▪What is the problem of the conventional VB approach?o Simply differentiate w.r.t. o Claims to get a sparse solution [Corduneanu+ 01]o But mathematically cannot be zero due to logarithm

▪We re-formalized the problem as a convex mixed-integer programmingo A semi-closed form solution can be derived ( see paper)

Page 15: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

15

IBM Research

Comparison with possible alternatives

Interpretability Noise reduction Fleet-readiness Multi-modality

Our work [Ide et al. ICDM 17] Yes Yes Yes Yes

(single) sparse GGM [Ide et al. SDM 2009, Ide et al.

ICDM 2016] Yes Yes No No

Gaussian mixtures [Yamanishi

et al., 2000; Zhang and Fung,

2013; Gao et al., 2016]

Limited Limited No Yes

Multi-task sparse

GGM

[Varoquaux et al., 2010;

Honorio and Samaras, 2010;

Chiquet et al., 2011; Danaher

et al., 2014; Gao et al., 2016;

Peterson et al., 2015].

Yes Yes Yes No

Multi-task learning

anomaly detection

[Bahadori et al., 2011; He et

al., 2014; Xiao et al., 2015] No (depends) Yes No

Page 16: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

16

IBM Research

Outline

▪ Problem setting

▪Modeling strategy

▪Model inference approach

▪ Experimental results

Page 17: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

17

IBM Research

Experiment (1): Learning sparse mixture weights

▪ Conventional ARD approach

sometimes get stuck with local

minimao ARD = automatic relevance

determination

o Often less sparser than the proposed

convex L0 approach

▪ Typical result of log likelihood vs

VB iteration count

Conventional ARD

approach gets stuck

with a local minimum

Proposed convex L0 approach

gives better likelihood

Page 18: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

18

IBM Research

Experiment (2): Learning GGMs and detecting

anomalies

▪ “Anuran Calls” (frog voice) data in UCI Archiveo Multi-modal (multi-peaked)o Voice signal + attributes (species, etc.)

▪ Created 10-variate, 3-task dataseto Use the species of “Rhinellagranulosa”

as the anomaly

▪ Resultso Two non-empty GGMs are automatically

detected starting from K=9o Clearly outperformed single-modal MTL

alternative in anomaly detection

✓ Group graphical lasso, fused graphical lasso

Automatically learned GGMs

Example of variable-wise distribution

Page 19: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

19

IBM Research

Conclusion

▪ Developed multi-task density estimation framework that can handle

multi-modalityo Featuring double sparsity: mixture weights, variable dependency

▪ Demonstrated the utility in the context of condition-based asset

management

Page 20: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

20

IBM Research

Thank you!

Page 21: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

21

IBM Research

Integrated monitoring tool allows sharing rare anomaly data

across different assets

▪ In condition-based monitoring, big data may

not be really bigo Anomalous samples account for less than 0.2% in a

metal smelting process

▪ Coverage of anomalies and thus accuracy

can be limited due to lack of dataNormal: 99.8%

Anomalous:

0.2%

Page 22: Multi-task Multi-modal Models for Collective …ide-research.net/papers/2017_ICDM_Ide.pptx.pdfIBM ResearchMulti-task Multi-modal Models for Collective Anomaly Detection Tsuyoshi Ide

22

IBM Research

System (task) 1

(in New Orleans)

System (task) 2

(in New York)

Existing methods cannot handle multi-modality

Comparing the proposed multi-task multi-modal (MTL-MM) model with standard

Gaussian mixture (GMM) and multi-task learning (MTL) models

Can handle multi-modality

but two systems must have

the same model

Can treat different systems

differently but cannot handle

multi-modality

sensor variable

(e.g. temperature)

sensor variable

(e.g. temperature)

probability

density