DESCRIPTION
Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. This data forms an input to computational models (soft sensors) routinely used for monitoring and control of industrial processes, traffic patterns, environment and natural hazards, and many more. The majority of these models assume that the data comes in a cleaned and pre-processed form, ready to be fed directly into a predictive model. In practice, to ensure appropriate data quality, most of the modelling effort goes into preparing raw sensor readings for use as model inputs. This study analyzes the process of data preparation for predictive models with streaming sensor data. We present data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling these issues. The discussion focuses on approaches that are less commonly used but that, in our experience, can contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study in the chemical production industry.
From sensor readings to prediction: on the process of developing practical soft sensors
Marcin Budka (1), Mark Eastwood (2), Bogdan Gabrys (1), Petr Kadlec (3), Manuel Martin Salvador (1), Stephanie Schwan (3), Athanasios Tsakonas (1), Indre Zliobaite (4)
(1) Bournemouth University, UK
(2) Coventry University, UK
(3) Evonik Industries, Germany
(4) Aalto University and HIIT, Finland
IDA 2014. Leuven, Belgium
Outline
1. INFER project
2. Sensors, sensors, sensors
3. Easy vs difficult
4. Soft Sensors
4.1. Soft Sensors: models
4.2. Soft Sensors in the Process Industry
4.3. An unsuccessful soft sensor
4.4. A successful soft sensor
4.5. How to build a successful data-driven soft sensor?
4.5.1. Performance goal and evaluation criteria
4.5.2. Data Analysis
4.5.3. Data Preparation and Pre-processing
4.5.4. Training and validation
5. Our case study
5.1. Versions of the data
5.2. Evaluation
6. Conclusion
Sensors, sensors, sensors
SENSORS
SENSORS EVERYWHERE
Image copyright by Disney Pixar. Qualifies as fair use.
Easy vs difficult
Easy-to-measure variables
● Temperature
● Pressure
● Humidity
● Flow
Difficult-to-measure variables
● Concentration
● Fermentation progress
● Polymerisation progress
Soft Sensors
Soft sensors are computational models that aggregate readings of physical sensors.
Soft sensors operate online using streams of sensor readings, so they need to be robust to noise and adaptive to changes over time.
Soft Sensors: models
First principle models
● Based on physical and chemical process knowledge
● Usually focus on ideal states of the process
● Example: y = temp + press/2π - flow²
Data-driven models
● Process knowledge is not available
● Such knowledge can be extracted from the data (Machine Learning algorithms)
● Examples: Linear Regression, PLS regression, Support Vector Machines
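To make the data-driven side concrete, here is a minimal sketch of the simplest model family listed above, linear regression with a single input, fitted by ordinary least squares. The readings and the names `fit_line`, `temperature` and `concentration` are hypothetical and only illustrate the idea; a practical soft sensor would use many inputs and a method such as PLS.

```python
def fit_line(x, y):
    """Ordinary least squares for y ≈ intercept + slope * x (single input)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: an easy-to-measure input and a lab-measured target.
temperature = [20.0, 22.0, 24.0, 26.0]
concentration = [1.0, 2.0, 3.0, 4.0]

intercept, slope = fit_line(temperature, concentration)
estimate = intercept + slope * 28.0  # soft-sensor estimate for a new reading
```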
Soft Sensors in the Process Industry
Main areas of application
1. Online prediction of a difficult-to-measure variable
2. Inferential control in the process control loop
3. Multivariate process monitoring for determining the process state
4. Hardware sensor backup
An unsuccessful soft sensor
Image copyright by Disney Pixar. Qualifies as fair use.
A successful soft sensor
Requirements:
Implemented into the process online environment
Accepted by the process operators
• Reasonable performance
• Stable
• Predictable
• Transparency
• Automation
• Robustness
• Adaptivity
Image copyright by Disney Pixar. Qualifies as fair use.
How to build a successful data-driven soft sensor?
Proposed framework:
1) Setting up the performance goals and evaluation criteria
2) Data analysis (exploratory)
3) Data preparation and preprocessing
4) Training and validating the predictive model
Keep a domain expert in the loop from the beginning.
1. Performance goals and evaluation criteria
Performance goal examples:
● Classification accuracy > 85%
● Processing time per sample < 1 s
Evaluation criteria:
● Qualitative evaluation:
  ● Transparency
  ● Model complexity
● Quantitative evaluation:
  ● RMSE
  ● MAE
  ● Jitter
  ● Confidence
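The quantitative criteria can be sketched in a few lines. MAE and RMSE follow their standard definitions. The slides do not define jitter, so the version below (mean absolute change between consecutive predictions) is an assumption, one plausible way to measure how smooth the output of a soft sensor is.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def jitter(y_pred):
    """Assumed definition: mean absolute change between consecutive predictions."""
    steps = [abs(b - a) for a, b in zip(y_pred, y_pred[1:])]
    return sum(steps) / len(steps)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 3.0, 5.0]
scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "Jitter": jitter(y_pred)}
```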
2. Data Analysis
Exploratory data analysis
Time series analysis
3. Data Preparation and Pre-processing
✔Queries from databases
✔Sampling rate
✔Synchronization
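Sensors often report at different rates, so a common time grid is needed before readings can be combined. The sketch below holds the last observed value at each grid time. It is one possible approach, not necessarily the one used in the study; the function name `resample_locf` is hypothetical, and both `times` and `grid` are assumed to be sorted in ascending order.

```python
def resample_locf(times, values, grid):
    """Hold the last observed value at each grid time (None before the first reading)."""
    out = []
    i = -1  # index of the latest reading at or before the current grid time
    for t in grid:
        while i + 1 < len(times) and times[i + 1] <= t:
            i += 1
        out.append(values[i] if i >= 0 else None)
    return out


# A sensor sampled every 10 minutes, aligned to a 5-minute grid.
aligned = resample_locf([0, 10, 20], [1.0, 2.0, 3.0], [0, 5, 10, 15, 20, 25])
```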
✔Remove data from shutdown periods
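A simple way to drop shutdown periods is to filter on a variable that shows whether the plant is running. The sketch below assumes a hypothetical `feed_flow` reading and an illustrative threshold; in practice, the indicator and its threshold would come from the domain expert.

```python
def drop_shutdown(records, flow_key="feed_flow", min_flow=1.0):
    """Keep only records whose flow reading indicates the plant is running."""
    return [r for r in records if r[flow_key] >= min_flow]


records = [
    {"feed_flow": 12.4, "temp": 81.0},
    {"feed_flow": 0.0, "temp": 25.0},  # plant stopped
    {"feed_flow": 11.9, "temp": 80.5},
]
running = drop_shutdown(records)
```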
1. Physical constraints
2. Univariate statistical tests for individual sensors
3. Multivariate statistical tests for all variables together
4. Missing values
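The first two checks can be sketched for a single sensor as follows. Readings outside an assumed physical range are marked as missing, then a robust univariate test based on the median and the median absolute deviation flags the remaining extremes. The specific test, the threshold `k` and the range limits are assumptions chosen for illustration; the slides do not name a particular test.

```python
import statistics


def flag_outliers(values, lower, upper, k=3.0):
    """Return a copy with out-of-range or statistically extreme readings set to None."""
    in_range = [v for v in values if v is not None and lower <= v <= upper]
    med = statistics.median(in_range)
    mad = statistics.median([abs(v - med) for v in in_range])
    cleaned = []
    for v in values:
        if v is None or not (lower <= v <= upper):
            cleaned.append(None)  # violates the physical constraint
        elif mad > 0 and abs(v - med) > k * 1.4826 * mad:
            cleaned.append(None)  # fails the univariate test
        else:
            cleaned.append(v)
    return cleaned


readings = [20.0, 21.0, 19.0, 20.5, 500.0, 20.2, 35.0, None]
cleaned = flag_outliers(readings, lower=-50.0, upper=150.0)
```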
✔If outliers are noise, replace them using missing-value imputation techniques
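Once outliers are marked as missing, the gaps can be filled. The sketch below uses linear interpolation between the nearest known neighbours, which is only one of many imputation techniques. It needs the next valid reading, so it suits offline preparation; in an online setting, holding the last valid value would be a simpler substitute.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation; edges copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = list(values)
    for i in range(len(values)):
        if filled[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


filled = interpolate_missing([None, 1.0, None, None, 4.0, None])
```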
✔Discretization
✔Derive new variables
✔Data scaling
✔Data rotation
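As an example of data scaling, the sketch below standardises a sensor to zero mean and unit variance. The statistics are estimated on training data only and then reused for new readings. The function names are hypothetical, and standardisation is just one common scaling choice.

```python
import statistics


def fit_scaler(train):
    """Estimate mean and standard deviation on training data only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    if std == 0:
        std = 1.0  # constant sensor: avoid division by zero
    return mean, std


def scale(values, mean, std):
    """Apply the training statistics to any batch of readings."""
    return [(v - mean) / std for v in values]


mean, std = fit_scaler([1.0, 2.0, 3.0])
scaled_new = scale([2.0, 4.0], mean, std)
```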
✔Feature selection
✔Subsampling
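Subsampling can be as simple as keeping every n-th record. The sketch below assumes records arrive every 5 minutes and keeps one per hour (`step=12`). It is an illustration only and is not claimed to reproduce the exact record counts of the case study.

```python
def subsample(records, step=12):
    """Keep every `step`-th record, e.g. one per hour for 5-minute data."""
    return records[::step]


four_hours = list(range(48))  # 48 five-minute records
hourly = subsample(four_hours)
```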
4. Training and Validation
Training set for tuning pre-processing methods and building the model
Testing set for evaluating the model
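For streaming data the split is usually chronological, so that the model is always evaluated on records later than those it was trained on. The sketch below shows such a split. The `train_fraction` value is illustrative and is not the ratio used in the case study.

```python
def chronological_split(records, train_fraction=0.9):
    """Split time-ordered records into an earlier training part and a later testing part."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


records = list(range(10))  # stand-in for time-ordered samples
train, test = chronological_split(records, train_fraction=0.8)
# Pre-processing parameters (scaling, selected features, ...) are tuned on `train` only.
```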
Our case study
Real industrial dataset from a debutanizer column
3 years of operation
189,193 records (every 5 min)
85 sensors
Target: concentration of the product
Background picture is Creative Commons by Paul Joyce
Versions of the data
Code Description
RAW no pre-processing (188752 training / 21859 testing)
SUB subsampling (every 1h – 15611 training / 1822 testing)
SYN features are synchronised
FET-E 20 features selected using the first 1000 training samples
FET-L 20 features selected using the latest 1000 training samples
FRA additional features derived by computing the fractal dimension
DIF original values are replaced with the first derivative with respect to time
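The DIF variant can be approximated with a finite difference between consecutive readings. This is a sketch of the general idea, assuming a constant sampling interval `dt`; how the study computed the derivative is not stated in the slides.

```python
def first_difference(values, dt=1.0):
    """Approximate the time derivative by the change between consecutive readings."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


diffs = first_difference([1.0, 3.0, 6.0], dt=2.0)
```

The result is one element shorter than the input, so the other variables have to be trimmed accordingly.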
Evaluation
Partial Least Squares regression → transparency
MAE = Mean Absolute Error
Data #1 MAE #1 Data #2 MAE #2 % improvement
RAW 225 RAW-SYN 222 1%
SUB 227 SUB-SYN 221 3%
RAW-FET-E 228 RAW-FET-L 198 13%
RAW-SYN-FET-E 245 RAW-SYN-FET-L 201 18%
SUB-FET-E 236 SUB-FET-L 193 18%
SUB-SYN-FET-E 215 SUB-SYN-FET-L 185 14%
SUB-DIF 41.8 SUB-DIF-SYN 35.3 16%
SUB-DIF 41.8 SUB-DIF-FRA 32.4 22%
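The last column is consistent with the relative reduction in MAE, 100 × (MAE #1 − MAE #2) / MAE #1, rounded to the nearest integer. The short check below recomputes it from the values in the table; the function name is hypothetical.

```python
def improvement_pct(mae_before, mae_after):
    """Relative MAE reduction in percent."""
    return 100.0 * (mae_before - mae_after) / mae_before


# (MAE #1, MAE #2) pairs copied from the table, top to bottom.
pairs = [(225, 222), (227, 221), (228, 198), (245, 201),
         (236, 193), (215, 185), (41.8, 35.3), (41.8, 32.4)]
rounded = [round(improvement_pct(a, b)) for a, b in pairs]
```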
Evaluation (cont.)
● Feature synchronization can have a positive or negative effect on prediction
● Adaptive feature selection using the latest samples is beneficial → Feature importance changes over time
● Taking into account temporal differences is very beneficial → Product concentration does not change suddenly
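The benefit of selecting features on recent data can be illustrated with a small sketch. Features are ranked by absolute Pearson correlation with the target over the latest `window` samples. The ranking criterion is an assumption, since the slides do not say how the 20 features were chosen, and the data is synthetic.

```python
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_features(features, target, k, window):
    """Return the k feature names most correlated with the target over the latest window."""
    recent_target = target[-window:]
    scores = {name: abs_corr(col[-window:], recent_target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic example: feature "a" was informative early on, feature "b" is informative now.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    "b": [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0],
}
latest_choice = select_features(features, target, k=1, window=4)
```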
Conclusion
✔Framework for building a successful soft sensor
✔Case study with real data from industrial production process
✔Adaptive pre-processing could be very beneficial (and sometimes a must)
Future directions:
● Extend feature space with autoregressive features
● Filter out the effects of data compression
Ongoing work:
● Automation and adaptation of data stream pre-processing