From sensor readings to prediction: on the process of developing practical soft sensors

Marcin Budka¹, Mark Eastwood², Bogdan Gabrys¹, Petr Kadlec³, Manuel Martin Salvador¹, Stephanie Schwan³, Athanasios Tsakonas¹, Indre Zliobaite⁴

¹ Bournemouth University, UK
² Coventry University, UK
³ Evonik Industries, Germany
⁴ Aalto University and HIIT, Finland

IDA 2014. Leuven, Belgium

Outline

1. INFER project
2. Sensors, sensors, sensors
3. Easy vs difficult
4. Soft Sensors
   4.1. Soft Sensors: models
   4.2. Soft Sensors in the Process Industry
   4.3. An unsuccessful soft sensor
   4.4. A successful soft sensor
   4.5. How to build a successful data-driven soft sensor?
      4.5.1. Performance goal and evaluation criteria
      4.5.2. Data Analysis
      4.5.3. Data Preparation and Pre-processing
      4.5.4. Training and validation
5. Our case study
   5.1. Versions of the data
   5.2. Evaluation
6. Conclusion

Sensors, sensors, sensors

SENSORS

SENSORS EVERYWHERE

Image copyright by Disney Pixar. Qualifies as fair use.

Easy vs difficult

Easy-to-measure variables:
● Temperature
● Pressure
● Humidity
● Flow

Difficult-to-measure variables:
● Concentration
● Fermentation progress
● Polymerisation progress

Soft Sensors

Soft sensors are computational models that aggregate readings of physical sensors.

Soft sensors operate online on streams of sensor readings, so they need to be robust to noise and adaptive to changes over time.

Soft Sensors: models

First principle models:
● Based on physical and chemical process knowledge
● Usually focus on ideal states of the process
● Example: y = temp + press/2 - flow^2

Data-driven models:
● Process knowledge is not available
● Such knowledge can be extracted from the data (Machine Learning algorithms)
● Examples: Linear Regression, PLS regression, Support Vector Machines

π
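As an illustration of the data-driven route, the following is a minimal sketch of fitting a linear regression to sensor readings by ordinary least squares. The readings, the coefficients used to generate the synthetic target, and the function names `fit_linear` and `predict` are all hypothetical and are not taken from the talk.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for k in range(n):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][n] for i in range(n)]  # [intercept, w_1, ..., w_d]


def predict(w, x):
    """Apply the fitted weights to one vector of sensor readings."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical readings: (temperature, pressure) -> concentration
X = [(20.0, 1.0), (22.0, 1.5), (25.0, 1.2), (30.0, 2.0), (28.0, 1.8)]
y = [3.0 + 0.5 * t - 2.0 * p for t, p in X]  # synthetic, noise-free target
w = fit_linear(X, y)
```

On noise-free synthetic data the fit recovers the generating coefficients. Real sensor streams are noisy and drift over time, which is why the talk stresses robustness and adaptivity.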

Soft Sensors in the Process Industry

Main areas of application

1. Online prediction of a difficult-to-measure variable

2. Inferential control in the process control loop

3. Multivariate process monitoring for determining the process state

4. Hardware sensor backup

An unsuccessful soft sensor

Image copyright by Disney Pixar. Qualifies as fair use.

A successful soft sensor

Requirements:

● Implemented into the process online environment
● Accepted by the process operators

• Reasonable performance
• Stable
• Predictable
• Transparency
• Automation
• Robustness
• Adaptivity

Image copyright by Disney Pixar. Qualifies as fair use.

How to build a successful data-driven soft sensor?

Proposed framework:

1) Setting up the performance goals and evaluation criteria
2) Data analysis (exploratory)
3) Data preparation and pre-processing
4) Training and validating the predictive model

Keep a domain expert in the loop from the beginning.

1. Performance goals and evaluation criteria

Performance goal examples:

● Classification accuracy > 85%
● Processing time per sample < 1 s

Evaluation criteria:

● Qualitative evaluation:
   ● Transparency
   ● Model complexity
● Quantitative evaluation:
   ● RMSE
   ● MAE
   ● Jitter
   ● Confidence
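The quantitative criteria can be computed directly from a stream of predictions. The sketch below uses the standard definitions of MAE and RMSE. The slides do not define jitter, so the version here (mean absolute change between consecutive predictions) is an assumption, and the sample values are made up.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def jitter(y_pred):
    """Assumed definition: mean absolute change between consecutive predictions."""
    return sum(abs(b - a) for a, b in zip(y_pred, y_pred[1:])) / (len(y_pred) - 1)


# Hypothetical target values and predictions
y_true = [10.0, 12.0, 11.0, 13.0]
y_pred = [11.0, 11.0, 12.0, 11.0]

print(mae(y_true, y_pred), rmse(y_true, y_pred), jitter(y_pred))
```

A low-jitter predictor is easier for operators to trust, which ties this quantitative criterion back to the "Stable" and "Predictable" requirements.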

2. Data Analysis

Exploratory data analysis

Time series analysis

3. Data Preparation and Pre-processing

✔ Queries from databases
✔ Sampling rate
✔ Synchronization
✔ Remove data from shutdown periods

1. Physical constraints
2. Univariate statistical tests for individual sensors
3. Multivariate statistical tests for all variables together
4. Missing values

✔ If outliers are noise, treat them as missing values and replace them using imputation techniques

✔ Discretization
✔ Derive new variables
✔ Data scaling
✔ Data rotation
✔ Feature selection
✔ Subsampling
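A few of these steps can be sketched for a single sensor as follows. The choice of forward fill as the imputation technique, the z-score as the scaling method, the physical limits, and the readings are all assumptions made for illustration; the slides do not name specific techniques.

```python
import statistics


def mask_out_of_range(values, low, high):
    """Physical constraints: readings outside [low, high] become missing (None)."""
    return [v if (v is not None and low <= v <= high) else None for v in values]


def forward_fill(values):
    """Impute missing values with the last valid reading.

    A leading missing value stays None because there is nothing to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def standardise(values):
    """Data scaling: zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical temperature readings with a sensor fault code and a gap
temperature = [71.0, 72.0, -999.0, 73.0, None, 74.0]
clean = forward_fill(mask_out_of_range(temperature, 0.0, 200.0))
scaled = standardise(clean)
```

In an online setting the scaling parameters would need to be estimated from past data only and possibly updated over time.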


4. Training and Validation

Training set for tuning pre-processing methods and building the model

Testing set for evaluating the model
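One way to respect this separation on time-ordered data is sketched below: the split is chronological, and pre-processing parameters are estimated on the training part only and then reused on the testing part. The chronological split, the 80/20 fraction, and the helper names are assumptions for illustration, not details given in the slides.

```python
def chronological_split(records, train_fraction=0.9):
    """Split time-ordered records without shuffling: past for training, future for testing."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def fit_scaler(train):
    """Estimate scaling parameters on the training set only."""
    mean = sum(train) / len(train)
    var = sum((v - mean) ** 2 for v in train) / len(train)
    return mean, var ** 0.5


def apply_scaler(values, mean, std):
    """Apply previously fitted parameters; never re-fit on the testing set."""
    return [(v - mean) / std for v in values]


# Hypothetical sensor series, already ordered by time
series = [float(i) for i in range(10)]
train, test = chronological_split(series, 0.8)
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)
```

Fitting the scaler on the full series would leak information from the testing period into the model and make the evaluation look better than it would be online.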

Our case study

Real industrial dataset from a debutanizer column

● 3 years of operation
● 189,193 records (every 5 min)
● 85 sensors
● Target: concentration of the product

Background picture is Creative Commons by Paul Joyce

Versions of the data

Code    Description
RAW     No pre-processing (188752 training / 21859 testing)
SUB     Subsampling (every 1 h: 15611 training / 1822 testing)
SYN     Features are synchronised
FET-E   20 features selected using the first 1000 training samples
FET-L   20 features selected using the latest 1000 training samples
FRA     Additional features derived by computing the fractal dimension
DIF     Original values are replaced with the first derivative with respect to time
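Three of these variants (SUB, DIF and FET-L) can be sketched as below. The slides do not state how features were ranked, so ranking by absolute Pearson correlation with the target is an assumption, as are the step of 12 records (one hour at a regular 5-minute rate), the finite-difference approximation of the derivative, and all names and data.

```python
import math


def subsample(records, step=12):
    """SUB: keep every `step`-th record (12 x 5 min = 1 h, assuming regular sampling)."""
    return records[::step]


def first_difference(values):
    """DIF: approximate the time derivative by differences of consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_latest(features, target, k, window):
    """FET-L (assumed criterion): top-k features by |correlation| over the latest samples."""
    recent_target = target[-window:]
    scores = {
        name: abs(pearson(vals[-window:], recent_target))
        for name, vals in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: feature "a" only becomes informative in the latest samples
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "a": [6.0, 5.0, 4.0, 1.0, 2.0, 3.0],
    "b": [1.0, 2.0, 3.0, 3.0, 1.0, 2.0],
}
chosen = select_latest(features, target, k=1, window=3)
```

Selecting on the latest window rather than the earliest one lets the chosen features follow changes in the process.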

Evaluation

Partial Least Squares regression → transparency

MAE = Mean Absolute Error

Data #1         MAE #1   Data #2         MAE #2   % improvement
RAW             225      RAW-SYN         222      1%
SUB             227      SUB-SYN         221      3%
RAW-FET-E       228      RAW-FET-L       198      13%
RAW-SYN-FET-E   245      RAW-SYN-FET-L   201      18%
SUB-FET-E       236      SUB-FET-L       193      18%
SUB-SYN-FET-E   215      SUB-SYN-FET-L   185      14%
SUB-DIF         41.8     SUB-DIF-SYN     35.3     16%
SUB-DIF         41.8     SUB-DIF-FRA     32.4     22%
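The improvement column is consistent with the relative reduction in MAE from Data #1 to Data #2, rounded to the nearest percent. A short check under that assumption (the function name `improvement` is ours):

```python
def improvement(mae_before, mae_after):
    """Relative MAE reduction in percent."""
    return 100.0 * (mae_before - mae_after) / mae_before


# (Data #1, MAE #1, Data #2, MAE #2) rows copied from the table
rows = [
    ("RAW", 225, "RAW-SYN", 222),
    ("RAW-FET-E", 228, "RAW-FET-L", 198),
    ("SUB-DIF", 41.8, "SUB-DIF-FRA", 32.4),
]
for before, mae_1, after, mae_2 in rows:
    print(f"{before} -> {after}: {improvement(mae_1, mae_2):.1f}%")
```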

Evaluation (cont.)

● Feature synchronization can have a positive or negative effect on prediction

● Adaptive feature selection using the latest samples is beneficial → Feature importance changes over time

● Taking temporal differences into account is very beneficial → Product concentration does not change suddenly

Conclusion

✔ Framework for building a successful soft sensor

✔ Case study with real data from an industrial production process

✔ Adaptive pre-processing could be very beneficial (and sometimes a must)

Future directions:
● Extend feature space with autoregressive features
● Filter out the effects of data compression

Ongoing work:
● Automation and adaptation of data stream pre-processing

Thanks!

Slides available at http://slideshare.net/draxus

