
Automatic recognition of anomalous patterns in discharges by applying Deep Learning

Gonzalo Farias1, Ernesto Fabregas2, Sebastián Dormido-Canto2, Jesús Vega3, Sebastián Vergara1

3rd IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis
May 31, 2019, Vienna, Austria

Outline

❑ Introduction
❑ Background
❑ Anomaly detection
❑ Autoencoder (AE)
❑ Proposed solution
❑ Preliminary results
❑ Conclusions

Introduction

❑ Experimental nuclear fusion devices have enormous databases. A simple shot of a few seconds can generate a huge quantity of data:
  • The TJ-II device has more than 1000 measurement channels.
  • A discharge in JET can take a couple of seconds (10 GB/shot, around 100 TB/year).
  • ITER could generate 1 TB/shot, around 1 PB/year.
❑ It is estimated that only 10% of this data is analyzed!
❑ The idea is to use Artificial Intelligence to deal with this huge quantity of data.
❑ Create systems that allow specialists to analyze and interpret data more quickly and efficiently than manually.
❑ One important issue is anomaly detection.

Background – Anomalies

❑ Anomaly: something that deviates from what is standard, normal, or expected.
❑ One type of anomaly is known as an 'outlier', which is a value located outside of the normal/regular class.
❑ In our case, anomaly detection refers to the problem of finding patterns in data that do not conform to expected behaviour.

[Figures: clustering and pattern recognition examples]

Background – Anomalies

❑ We try to find anomalies in signals (known and unknown).
❑ Unknown (most anomalies).
❑ Known: disruptions, ELMs, L-H transitions, and stray light in the TS diagnostic.
❑ The importance of anomaly detection is due to the fact that anomalies in data frequently involve significant and critical information.

1. Vega, J., et al. (2013). Results of the JET real-time disruption predictor in the ITER-like wall campaigns. Fusion Engineering and Design, Volume 88, Issues 6-8, Pages 1228–1231.
2. Farias, G., Vega, J., Gonzalez, S., Pereira, A., Lee, X., Schissel, D., Gohil, P. (2012). Automatic determination of L/H transition times in DIII-D through a collaborative distributed environment. Fusion Engineering and Design, Volume 87, Issue 12, Pages 2081–2083.
3. Farias, G., Dormido-Canto, S., Vega, J., Santos, M., Pastor, I., Fingerhuth, S., Ascencio, J. (2014). Iterative Noise Removal from Temperature and Density Profiles in the TJ-II Thomson Scattering. Fusion Engineering and Design, Volume 89, Number 5, Pages 761–765.

Background – Anomaly detection using ML

❑ Prediction (trained model).

[Figure: observed vs. predicted signal V over t0, t1, t2, ..., tn, tn+1, tn+2; the error E with a ±K·std band around the prediction marks the anomaly]

Anomaly Detection – Threshold (th = k·std)

❑ How is the anomaly detected?
❑ We fix a threshold proportional to the standard deviation of the error.
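As a minimal sketch of this rule in Python/NumPy (the function and parameter names are illustrative assumptions, not from the talk):

```python
import numpy as np

def threshold_outliers(observed, predicted, k=1.0):
    """Flag samples where the prediction error leaves the +/- k*std band."""
    error = observed - predicted
    th = k * np.std(error)  # threshold proportional to the error's std
    return np.abs(error) >= th
```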

Anomaly Detection – Using ML/AE

❑ Anomaly detection using an Autoencoder (AE).
❑ The AE learns the regular waveforms of the data through the coding and decoding of the input data.
❑ Important differences between predicted and observed signals are considered anomalies.
❑ The occurrence of an anomaly at a similar time in several signals during a discharge could reveal that something significant has happened.

Anomaly Detection – Autoencoder

❑ Autoencoder (AE).

[Figure: input layer → hidden layer → output layer; the encoder maps the input data to the hidden layer and the decoder maps it back to the output data]

❑ The AE adjusts the biases and weights to learn a function in an unsupervised way.
❑ The AE tries to learn an approximation to the identity function, so it outputs something similar to its input.
❑ By placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.
❑ The network is forced to learn a "compressed" representation of the input.

Farias, G., et al. (2016). Automatic feature extraction in large fusion databases by using deep learning approach. Fusion Engineering and Design, Volume 112, Pages 979–983.
Farias, G., et al. (2018). Applying deep learning for improving image classification in Nuclear Fusion Devices. IEEE Access, Volume 6, Pages 72345–72356.
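A one-hidden-layer AE of this kind could be written as follows (a sketch in PyTorch; the framework and layer sizes are assumptions, not the talk's implementation):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """One-hidden-layer AE: the encoder compresses the waveform,
    the decoder reconstructs it (approximating the identity function)."""
    def __init__(self, n_inputs, n_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_inputs)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

Limiting `n_hidden` below `n_inputs` is the constraint that forces the "compressed" representation.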

Anomaly Detection – Autoencoder

❑ Autoencoder (AE) cost function:

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( x_{kn} - \hat{x}_{kn} \right)^2$$

$$\Omega_{\mathrm{weights}} = \frac{1}{2} \sum_{l}^{L} \sum_{j}^{n} \sum_{i}^{k} \left( w_{ji}^{(l)} \right)^2$$

$$\Omega_{\mathrm{sparsity}} = \sum_{i=1}^{s_2} \left[ \rho \log\frac{\rho}{\hat{\rho}_i} + (1-\rho) \log\frac{1-\rho}{1-\hat{\rho}_i} \right]$$

$$J = \mathrm{MSE} + \lambda \, \Omega_{\mathrm{weights}} + \beta \, \Omega_{\mathrm{sparsity}}$$

where the average activation of hidden unit $i$ is

$$\hat{\rho}_i = \frac{1}{n} \sum_{j=1}^{n} h\!\left( w_i^{(l)T} x_j + b_i^{(l)} \right)$$

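A sketch of evaluating this cost in NumPy (illustrative; the variable names and default hyperparameter values are assumptions):

```python
import numpy as np

def sparse_ae_cost(x, x_hat, weights, rho_hat, rho=0.05, lam=1e-4, beta=3.0):
    """J = MSE + lambda * Omega_weights + beta * Omega_sparsity."""
    mse = np.mean((x - x_hat) ** 2)                       # reconstruction error
    omega_w = 0.5 * sum(np.sum(W ** 2) for W in weights)  # L2 weight decay
    kl = rho * np.log(rho / rho_hat) \
         + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))  # KL(rho || rho_hat)
    return mse + lam * omega_w + beta * np.sum(kl)
```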

Anomaly Detection – Learning

❑ The AE adjusts the biases and weights to learn a function in an unsupervised way:

$$h_{W,b}(x) \approx x$$

[Figure: network mapping inputs $x_1, x_2, x_3, \dots, x_n$ to outputs $x_1', x_2', x_3', \dots, x_n'$]

❑ The above is similar to learning the waveform.
❑ Illustrative example: train on a database of sinusoidal signals with noise and different frequencies, phases and amplitudes; then test with a square-pulse signal.
❑ Forward pass:

$$[x_1, x_2, x_3, \dots, x_n] \approx [x_1', x_2', x_3', \dots, x_n']$$
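A sketch of this illustrative experiment (PyTorch; the signal length, network size and training settings are assumptions, not the talk's values):

```python
import numpy as np
import torch
import torch.nn as nn

t = np.linspace(0, 1, 128)

def noisy_sinusoid():
    a = np.random.uniform(0.5, 2.0)        # amplitude
    f = np.random.uniform(1.0, 5.0)        # frequency
    phi = np.random.uniform(0, 2 * np.pi)  # phase
    return a * np.sin(2 * np.pi * f * t + phi) + 0.05 * np.random.randn(t.size)

X = torch.tensor(np.stack([noisy_sinusoid() for _ in range(1000)]), dtype=torch.float32)

model = nn.Sequential(nn.Linear(128, 32), nn.Sigmoid(), nn.Linear(32, 128))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):  # train to reconstruct the regular (sinusoidal) waveforms
    opt.zero_grad()
    nn.functional.mse_loss(model(X), X).backward()
    opt.step()

# A square pulse does not match the learned regular waveforms, so its
# reconstruction error should be noticeably larger than for sinusoids.
square = torch.tensor(np.sign(np.sin(2 * np.pi * 3 * t)), dtype=torch.float32)
with torch.no_grad():
    err = (model(square) - square).abs()
```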

Anomaly Detection – Predicting

[Figures: AE used to detect anomalies; moving average to detect anomalies]

Anomaly Detection – Algorithm

1. Set the parameters $(d_i, \alpha, \Delta t)$.
2. For each signal $X_i$, predict $X_i'$ with the trained AE: $X_i' = \mathrm{Predict}(X_i)$.
3. Calculate the difference between predicted and observed signals: $\mathrm{Diff}_{it} = X_{it} - X'_{it}$.
4. Mark outliers per signal and time instant:

$$\mathrm{Outlier}_{it} = \begin{cases} 1 & \text{if } \mathrm{Diff}_{it} \ge d_i \\ 0 & \text{otherwise} \end{cases}$$

5. An anomaly is detected at time instant $t$ if at least $\alpha$ signals show outliers at the same time,

$$\sum_{i=1}^{n} \mathrm{Outlier}_{it} \ge \alpha,$$

or, for anomalies during a time window, if at least $\alpha$ signals show an outlier within the last $\Delta t$ samples,

$$\sum_{i=1}^{n} \left[ \sum_{k=t+1-\Delta t}^{t} \mathrm{Outlier}_{ik} \ge 1 \right] \ge \alpha.$$

Parameters:
• $d_i$: threshold of the difference band for signal $i$
• $\alpha$: number of signals involved in an anomaly
• $\Delta t$: time window
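A sketch of this algorithm in NumPy (the array shapes and the windowed signal-count interpretation are assumptions):

```python
import numpy as np

def detect_anomalies(observed, predicted, d, alpha, delta_t=1):
    """observed, predicted: (n_signals, n_samples) arrays;
    d: per-signal difference thresholds, shape (n_signals,).
    Returns a boolean mask of anomalous time instants."""
    outlier = (observed - predicted) >= d[:, None]   # Outlier_it
    n_signals, n_samples = outlier.shape
    anomaly = np.zeros(n_samples, dtype=bool)
    for t_idx in range(n_samples):
        window = outlier[:, max(0, t_idx + 1 - delta_t): t_idx + 1]
        # Count signals with at least one outlier inside the window.
        anomaly[t_idx] = window.any(axis=1).sum() >= alpha
    return anomaly
```

With delta_t=1 this reduces to the same-time-instant check; larger delta_t counts outliers across the window.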


Anomaly Detection – In action (Δt = 1)

❑ We look for simultaneous anomalies (same time instant) in different signals within the same shot.

Anomaly Detection – In action (Δt > 1)

❑ We look for simultaneous anomalies within time windows (Δt).

Preliminary Results

❑ Database from the TJ-II fusion device.
❑ 430 shots with 9 signals each (330 for training, 100 for testing, randomly selected).
❑ Training time: 1 minute for each type of signal (GPU).
❑ Testing time (predicting): 10 ms for each type of signal (GPU).

Preliminary Results – AE structure

❑ How many hidden units should we use?
❑ We train networks with different numbers of hidden units and measure the RMSE for each type of signal:

RMSE per signal
Hidden units | Signal 1 | Signal 2 | Signal 3 | Signal 4 | Signal 5 | Signal 6 | Signal 7 | Signal 8 | Signal 9
1            | 0.42912  | 0.21228  | 0.19873  | 0.08722  | 0.04503  | 0.24234  | 0.13889  | 0.25375  | 0.18487
10           | 0.16928  | 0.09222  | 0.07663  | 0.04399  | 0.03544  | 0.08213  | 0.07262  | 0.10284  | 0.07730
100          | 0.082032 | 0.043349 | 0.031299 | 0.024351 | 0.013658 | 0.043946 | 0.043788 | 0.070419 | 0.04304
1000         | 0.089530 | 0.052544 | 0.025972 | 0.025922 | 0.017132 | 0.040409 | 0.037944 | 0.043563 | 0.04426

❑ Too few hidden units underfit (fewer anomalies detected); too many overfit (more anomalies detected).

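A sketch of such a sweep (PyTorch; the data loading and training settings are assumptions):

```python
import torch
import torch.nn as nn

def rmse_for_hidden_units(X_train, X_test, n_hidden, epochs=200):
    """Train a one-hidden-layer AE and return its test RMSE."""
    n = X_train.shape[1]
    model = nn.Sequential(nn.Linear(n, n_hidden), nn.Sigmoid(), nn.Linear(n_hidden, n))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(X_train), X_train).backward()
        opt.step()
    with torch.no_grad():
        return nn.functional.mse_loss(model(X_test), X_test).sqrt().item()

# e.g. {h: rmse_for_hidden_units(X_train, X_test, h) for h in (1, 10, 100, 1000)}
```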

Preliminary Results – Anomalies (Δt = 1)

❑ Simultaneous anomaly detection (*100 shots randomly selected).
❑ The wider the band (d_i = k·STD), the fewer anomalies are detected; the more simultaneity is required (larger α), the fewer anomalies are detected.
❑ Example from the table: 4 anomalies with simultaneous outliers in 5 signals for a threshold with k = 1 at a given time instant t.

Number of anomalies detected vs. α (rows) and k (columns), with d_i = k·STD:

α \ k | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |  1  | 1.1 | 1.2 | 1.3 | 1.4
1     | 387 | 381 | 343 | 299 | 274 | 264 | 258 | 249 | 248 | 238 | 233 | 225 | 210 | 194
2     | 387 | 354 | 284 | 252 | 238 | 217 | 203 | 184 | 163 | 143 | 113 | 104 |  79 |  65
3     | 377 | 293 | 240 | 213 | 172 | 143 | 112 |  87 |  61 |  42 |  36 |  28 |  23 |  14
4     | 350 | 253 | 210 | 159 | 108 |  74 |  56 |  28 |  18 |  10 |   8 |   6 |   5 |   5
5     | 302 | 214 | 146 |  96 |  54 |  32 |  17 |  10 |   6 |   4 |   3 |   3 |   3 |   3
6     | 240 | 155 |  86 |  44 |  16 |  11 |   5 |   2 |   1 |   1 |   0 |   0 |   0 |   0
7     | 164 |  70 |  26 |  10 |   5 |   1 |   1 |   0 |   0 |   0 |   0 |   0 |   0 |   0
8     |  65 |  20 |   7 |   2 |   1 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
9     |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0

Preliminary Results – Anomalies (Δt > 1)

❑ Simultaneous anomaly detection in a time window (Δt = 5) (*100 shots randomly selected).
❑ The wider the band, the fewer anomalies are detected; the more simultaneity is required, the fewer anomalies are detected. The wider the time window, the more anomalies are detected.
❑ Example from the table: 3 anomalies with simultaneous outliers in 6 signals for k = 0.8 with Δt = 5.

Number of anomalies detected vs. α (rows) and k (columns), with d_i = k·STD:

α \ k | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |  1  | 1.1 | 1.2 | 1.3 | 1.4
1     | 383 | 383 | 348 | 313 | 276 | 266 | 263 | 253 | 245 | 236 | 229 | 213 | 180 | 168
2     | 383 | 376 | 288 | 262 | 218 | 201 | 172 | 148 | 128 | 110 |  97 |  75 |  56 |  45
3     | 383 | 318 | 234 | 205 | 184 | 137 | 109 |  90 |  68 |  55 |  47 |  38 |  32 |  26
4     | 362 | 244 | 209 | 167 | 113 |  74 |  53 |  31 |  28 |  22 |  19 |  12 |   9 |   5
5     | 321 | 212 | 150 |  95 |  56 |  28 |  20 |  12 |   7 |   5 |   5 |   4 |   4 |   2
6     | 273 | 184 |  90 |  44 |  27 |  12 |   6 |   3 |   3 |   2 |   2 |   2 |   2 |   0
7     | 237 | 114 |  37 |  23 |  11 |   5 |   3 |   0 |   0 |   0 |   0 |   0 |   0 |   0
8     | 210 |  59 |  20 |   8 |   3 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
9     | 152 |  26 |   8 |   3 |   1 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0

Preliminary Results – Anomalies (Δt = 1, deep AE)

❑ Simultaneous anomaly detection with a deep Autoencoder (DAE) (*100 shots randomly selected).
❑ The wider the band, the fewer anomalies are detected; the more simultaneity is required, the fewer anomalies are detected.
❑ Example from the table: 2 anomalies with simultaneous outliers in 6 signals for k = 0.6 at a given time instant t.

Number of anomalies detected vs. α (rows) and k (columns), with d_i = k·STD:

α \ k | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |  1  | 1.1 | 1.2 | 1.3 | 1.4
1     | 387 | 365 | 303 | 259 | 230 | 213 | 196 | 184 | 171 | 162 | 152 | 122 | 107 |  96
2     | 373 | 277 | 218 | 183 | 153 | 103 |  75 |  58 |  50 |  36 |  30 |  24 |  18 |  14
3     | 335 | 217 | 165 | 104 |  64 |  44 |  31 |  21 |  17 |  16 |  14 |  10 |   7 |   5
4     | 258 | 157 |  92 |  50 |  28 |  14 |   7 |   3 |   3 |   2 |   2 |   0 |   0 |   0
5     | 215 |  99 |  40 |  18 |  12 |   5 |   4 |   1 |   1 |   0 |   0 |   0 |   0 |   0
6     | 171 |  51 |  18 |  14 |   6 |   2 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
7     | 121 |  23 |  12 |   6 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
8     |  60 |  14 |   2 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0
9     |  17 |   6 |   1 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0

Preliminary Results – Discussion

❑ How should the parameters of the approach (d_i, α, Δt) be defined?
❑ This mainly depends on the anomaly to be detected, i.e., the expert has to tune these values according to what he/she is looking for.
❑ We could determine a range for these parameters automatically if the expert defines the fraction of discharges that generates anomalies (a possible calibration is sketched after this list).
❑ Instead of AEs, we could use other ML algorithms to learn the regular waveforms (LSTM/GRU); see: Automatic recognition of anomalous patterns in discharges by recurrent neural networks. In the 12th IAEA Technical Meeting on Control, Data Acquisition and Remote Participation for Fusion Research (CODAC 2019).
❑ Once an unknown anomaly is regularly detected (i.e., it becomes a known anomaly), we could use supervised learning to detect that anomaly.
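One possible calibration, sketched here as an assumption (not the authors' procedure): sweep k and keep the smallest value whose fraction of flagged discharges drops to the expert's target.

```python
import numpy as np

def calibrate_k(shots, flags_anomaly, target_fraction, k_grid=np.arange(0.1, 1.5, 0.1)):
    """Return the smallest k whose flagged-discharge fraction is <= target.
    flags_anomaly(shot, k) -> True if the discharge shows any anomaly."""
    for k in k_grid:
        fraction = np.mean([flags_anomaly(shot, k) for shot in shots])
        if fraction <= target_fraction:
            return float(k)
    return float(k_grid[-1])
```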

Conclusions

❑ AE networks can learn the shape of a waveform (one model for each signal).
❑ AE networks can be used for anomaly detection in signals.
❑ The specialists have to define the parameters that distinguish noise from real anomalies.
❑ It is possible to design supervised systems that allow the detection of previously detected/studied anomalies.
