
Multivariate Temporal Data Analysis Using Self-Organizing Maps. 1. TrainingMethodology for Effective Visualization of Multistate Operations

Yew Seng Ng† and Rajagopalan Srinivasan*,†,‡

Department of Chemical and Biomolecular Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 117576, and Process Sciences and Modeling, Institute of Chemical & Engineering Sciences, 1 Pesek Road, Jurong Island, Singapore 627833

Multistate operations are common in chemical plants and result in high-dimensional, multivariate, temporal data. In this two-part paper, we develop self-organizing map (SOM)-based approaches for visualizing and analyzing such data. In Part 1 of this paper, the SOM is used to reduce the dimensionality of the data and effectively visualize multistate operations in a three-dimensional map. A key characteristic of multistate processes is that the plant operates for long durations at steady states and undergoes brief transitions involving large changes in variable values. When classical SOM training algorithms are used on data from multistate processes, large portions of the SOM become dedicated to steady states, which exaggerates even minor noise in the data. Also, transitions are represented as discrete jumps on the SOM space, which makes it an ineffective tool for visualizing multistate operations. In this Part 1, we propose a new training strategy specifically targeted at multistate operations. In the proposed strategy, the training dataset is first resampled to yield equal representation of the different process states. The SOM is trained with this state-sampled dataset. Furthermore, clustering is applied to group neurons of high similarity into compact clusters. Through this strategy, modes and transitions of multistate operations are depicted differently, with process modes visualized as intuitive clusters and transitions as trajectories across the SOM. We illustrate the proposed strategy using two real case studies, namely, startup of a laboratory-scale distillation unit and operation of a refinery hydrocracker.

1. Introduction

Multistate operations are increasingly common, even in petrochemical plants that have traditionally been considered as operating in a “continuous” fashion. Generally, the process operation can be classified into modes and transitions. A mode corresponds to a region of continuous operation under fixed flowsheet conditions; i.e., no equipment is brought online or taken offline. During a mode, the process operates under steady state and its constituent variables vary within a narrow range.1

In contrast, transitions correspond to periods of large changes in plant operating conditions, due to throughput changes, product grade changes, etc. Transitions often result in suboptimal plant operation, because of the production of off-specification products. Understanding transitions and minimizing their duration can lead to major savings and increase periods of normal operation. Advancements in sensor and database technologies in chemical plants have resulted in the availability of a huge amount of process data. Hence, visual exploration methods, which allow humans to uncover knowledge, patterns, trends, and relationships from the data, are crucial for understanding process operations, especially when multistate operations and transitions between them are common.

In this first part of the two-part paper, we exploit the dimension-reduction ability of the self-organizing map and propose a new training strategy for visualizing multistate operations; in Part 2,35 we use the representation to automate fault detection and diagnosis. The organization of this paper is as follows: section 2 provides a literature review of data visualization methods, and section 3 describes the principles of the self-organizing map (SOM) and its application to visualization. In section 4, we examine the shortcomings of the classical SOM training algorithm when applied to data from multistate operations and propose a data resampling methodology for visualizing multistate operations effectively. The proposed method is illustrated using two case studies: a laboratory-scale distillation unit and an industrial hydrocracking unit, in sections 5 and 6, respectively.

2. Visualization of Multivariate Data

Visualization techniques use graphical representations to improve human understanding of the structure in data. These techniques convert data from a numeric form into a graphic that facilitates human understanding by means of the visual perception system. The simplest approach, by far, is the coordinate plot using mutually orthogonal axes. However, this is useful only for two- or three-dimensional data. Other techniques have been proposed to facilitate visualization of higher-dimensional data, including Andrews’ curves2 and Chernoff’s faces.3 The Andrews’ curve forces all variables onto a two-dimensional curve by transforming each N-dimensional observation xi = {xi1,..., xin,..., xiN} to

fi(t) = xi1/√2 + xi2 sin(t) + xi3 cos(t) + xi4 sin(2t) + xi5 cos(2t) + ...  (1)

where t ∈ [-π, π].
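The transform of eq 1 is easy to compute numerically. The sketch below (the function name and the choice of NumPy are ours, not from the paper) evaluates the Andrews curve of a single observation over a grid of t values:

```python
import numpy as np

def andrews_curve(x, t):
    """Andrews transform of one N-dimensional observation x (eq 1).

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    where t is an array of points in [-pi, pi].
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    f = np.full_like(t, x[0] / np.sqrt(2.0))     # constant x1/sqrt(2) term
    for k, xk in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2                  # 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if k % 2 == 1 else np.cos  # alternate sin/cos
        f += xk * trig(harmonic * t)
    return f
```

Plotting one such curve per observation over t ∈ [-π, π] gives the familiar two-dimensional Andrews display.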

Chernoff’s face uses different parts of the face, such as the ears, mouth, and eyes, to represent different values observed from the data. However, these approaches have limited applicability to high-dimensional, temporal data.

* To whom all correspondence should be addressed. Tel.: +65-65168041. Fax: +65-67791936. E-mail: [email protected].

† Department of Chemical and Biomolecular Engineering, National University of Singapore.

‡ Process Sciences and Modeling, Institute of Chemical & Engineering Sciences.

Ind. Eng. Chem. Res. 2008, 47, 7744-7757

10.1021/ie0710216 CCC: $40.75  2008 American Chemical Society. Published on Web 09/25/2008.


The scatter diagram or scatter plot has been a popular tool for visualizing the correlation among multivariate data using two-dimensional graphs by displaying all pairs of variables against each other.4 If the relationship between two variables is linear, the points on the corresponding subplot would fall on a straight line. An illustration is shown in Figure 1, based on eight variables from the distillation column case study described in section 4. The diagonal plots have been replaced by the histogram of each variable. From the figure, variables 1 and 2, 2 and 3, and 16 and 18 appear to be linearly correlated, whereas no direct correlation is apparent between variables 6, 9, and 11. Scatter diagrams have been used to visualize gene expression data by Zhang et al.5 and Craig et al.6 However, they have limited use, even for correlation analysis, because the number of subplots to be analyzed grows as NC2 = N(N - 1)/2. Consequently, more-efficient techniques are required for high-dimensional data.

Parallel coordinates was first proposed by Inselberg7 and is based on a dual representation of the Cartesian coordinates. Each dimension in parallel coordinates is represented by a vertical line, so all the coordinate axes are parallel, instead of being mutually perpendicular. Each observation is represented by locating its ith variable along the ith axis and connecting all N points for the observation by a line. A high-dimensional data set is then captured completely in a two-dimensional envelope of the polygonal lines representing all the observations.8 An example of parallel coordinates is shown in Figure 2 for the same distillation column dataset as that given previously. Inselberg9 applied parallel coordinates for mining operational data from a very large scale integration (VLSI) chip production plant. From the representation, the operating conditions that gave higher yield during manufacturing could be extracted visually. Albazzaz et al.10 used parallel coordinates to visualize multidimensional data from a wastewater treatment plant. However, they concluded that automating the parallel coordinates method remains a difficult task, so it was determined to not be suitable for the online visualization of large-scale data.

Principal components analysis (PCA) is a popular statistical technique for dimensionality reduction and information extraction. PCA finds a combination of latent variables that describes the major variations in the data.11,12 Generally, only a few principal components (PCs) are necessary to adequately represent the data. In such cases, where the dimensions of multivariate data can be reduced effectively through PCA, visualization can be achieved through a biplot of the first few scores, because they explain the most important trends in the data. Jokinen13 used PCA for the visualization of an industrial continuous stirred tank reactor (CSTR) and distillation column. Six fault classes could be distinguished from normal operation as clusters on a biplot. Sebzalli and Wang14 used a similar technique to discover the various operating zones of an FCC process. Other examples of PCA-based visualization can be found in Mandenius et al.,15 who visualized state transitions in biopharmaceutical processes; Martin and Morris16 and Fourie and de Vaal17 applied it for process visualization and monitoring. Srinivasan et al.1 showed that, for multistate operations, process modes form clusters on the scores plot, while transitions are depicted as trajectories. However, the use of PCA is not without limitations. First, the linear approximation of PCA might not be sufficient to capture nonlinear relationships in the multivariate data. Also, the first two or three PCs are often not adequate for capturing all the important variance in the data, so a depiction of observations in a two- or three-dimensional coordinate plot is not adequate. One solution for such cases is the use of parallel coordinates to visualize a larger number of scores, as proposed by Wang et al.18 Finally, when multistate operations are visualized, the scaling of each variable is dominated by the large variation during transitions; significant changes within a steady state would be obscured in the depiction. To overcome these shortcomings, a self-organizing map based methodology is developed in this work for visualizing high-dimensional, multistate operational data.
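For concreteness, score-based PCA visualization of the kind described above can be sketched as follows; this is an illustrative implementation via the SVD of the autoscaled data matrix, and the function name and parameters are ours, not from the cited works:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project autoscaled data onto its leading principal components.

    Returns the score matrix (I x n_components) used for a biplot-style
    visualization, plus the fraction of variance each PC explains.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale columns
    # SVD of the scaled data: rows of Vt are loadings, U*S are scores
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

A biplot of `scores[:, 0]` versus `scores[:, 1]` then shows modes as clusters and transitions as trajectories, subject to the limitations noted in the text.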

3. Self-Organizing Maps

The self-organizing map (SOM) is an unsupervised neural network that was first proposed by Kohonen.19 It is capable of projecting high-dimensional data onto a two-dimensional grid and therefore can serve as a visualization tool. Self-organization means that the net orients and adaptively assumes the form that can best describe the input vectors.20 The SOM applies a nonparametric regression of discrete, ordered reference vectors to the distribution of the input feature vectors. A finite number of reference vectors are adaptively placed in the input signal space to approximate the input signals.

Figure 1. Scatter plot of eight variables from the distillation column data.

Figure 2. Parallel coordinate visualization of the distillation unit startup.

Consider a dataset X that contains I samples, each N-dimensional. Therefore, dataset X is a two-dimensional matrix of size I × N, with the ith sample being xi = {xi1,..., xin,..., xiN}. A SOM is an ordered collection of neurons. Each neuron has an associated reference vector mj ∈ RN; thus, mj = {mj1,..., mjn,..., mjN}. Consider a SOM described as MSOM = {m1,..., mj,..., mJ}T with J neurons, which must be trained to represent and visualize X. This involves the calculation of the reference vector of every neuron. Initially, let each mj be assigned a random vector from the domain of X. When a sample xi ∈ X is presented to the SOM for training, the neuron whose reference vector has the smallest difference from xi is identified and defined as the winner or the best matching unit (BMU) for that input:

bi = arg minj ||xi - mj||, ∀ j ∈ [1, J]  (2)

The distance ||·|| between xi and mj is measured here using the Euclidean metric, but other metrics can also be used.
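A minimal sketch of the BMU search of eq 2 (Euclidean metric, as in the text; the function name is illustrative). The returned distance also serves as the quantization error defined later in eq 6:

```python
import numpy as np

def best_matching_unit(x, M):
    """Find the BMU index b for sample x given reference vectors M (J x N).

    Implements eq 2 with the Euclidean metric; the minimum distance is
    returned as well, which doubles as the quantization error of eq 6.
    """
    d = np.linalg.norm(M - x, axis=1)   # distance of x to every neuron
    b = int(np.argmin(d))
    return b, float(d[b])
```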

The neurons in MSOM are usually placed in a two-dimensional grid. Let the location of the jth neuron be rj, where rj ∈ R2. A distance metric can be defined on the two-dimensional grid, and all neurons up to a certain distance from the jth neuron can be considered to be its topological neighbors Nj in the grid. This concept of a neuron neighborhood is the key differentiating feature of the SOM and is responsible for its unique properties (described below). During training, when each sample xi ∈ X is presented, the reference vector of the BMU, mbi, as well as those of its topological neighbors in the grid, are updated by moving them toward the training sample xi. In its simplest form, the SOM learning rule at the tth iteration is given as follows:

mj(t + 1) = mj(t) + α(t) hbij(t) [xi(t) - mj(t)]  (3)

where α(t) is the learning rate factor and hbij(t) is a neighborhood function centered on the BMU bi but defined over all the neurons in MSOM. The Gaussian neighborhood function is commonly used and is given by

hbij(t) = exp(-||rbi - rj||² / (2σ²(t)))  (4)

where σ(t) is the neighborhood width. During training, the neighborhood width is varied from iteration to iteration by changing σ(t). Large values of σ and α are used initially and are usually decreased monotonically with t.21 To guarantee convergence, it is necessary that, as training proceeds and t → ∞, α(t) → 0 and σ(t) approaches a small value (typically 1). Other variants of the neighborhood function, as well as training algorithms, have been proposed in the literature.22
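The sequential training loop of eqs 2-4 might be sketched as follows. The grid size, iteration count, initial learning rate, and the linear decay schedules for the learning rate and neighborhood width are illustrative assumptions, not values from the paper:

```python
import numpy as np

def train_som(X, grid_shape=(10, 8), n_iter=2000, a0=0.5, seed=0):
    """Minimal sequential SOM training sketch (eqs 2-4)."""
    rng = np.random.default_rng(seed)
    J = grid_shape[0] * grid_shape[1]
    # grid coordinates r_j of each neuron on the 2-D map
    r = np.array([(p, q) for p in range(grid_shape[0])
                         for q in range(grid_shape[1])], dtype=float)
    # initialize reference vectors with random training samples
    M = X[rng.integers(0, len(X), size=J)].astype(float).copy()
    sigma0 = max(grid_shape) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        alpha = a0 * (1.0 - frac)             # learning rate decays toward 0
        sigma = sigma0 * (1.0 - frac) + 1.0   # neighborhood width decays to ~1
        x = X[rng.integers(0, len(X))]        # present one training sample
        b = np.argmin(np.linalg.norm(M - x, axis=1))               # BMU, eq 2
        h = np.exp(-np.sum((r - r[b])**2, axis=1) / (2.0 * sigma**2))  # eq 4
        M += alpha * h[:, None] * (x - M)     # update rule, eq 3
    return M
```

Because every neuron is updated in proportion to its grid distance from the BMU, neighboring neurons end up with similar reference vectors, which is the topology-preserving behavior discussed below.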

The updating of the reference vectors of the neighborhood neurons, along with that of the BMU, provides the topology-preserving feature of the SOM (i.e., the neighboring neurons are activated and learn from the training input xi) and thus acquire similar reference vectors. Therefore, neighboring units are, in a sense, more similar to each other, and the trained SOM maps similar input samples onto nearby neurons.

The training of the SOM requires the specification of two parameters: the number of neurons in the SOM (J) and the aspect ratio of the two-dimensional grid. As recommended by Vesanto,22 we have used the heuristic

J = 8√I  (5)

to specify the number of neurons, and the square root of the ratio between the two largest eigenvalues of the covariance matrix of X to specify the aspect ratio.
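This sizing heuristic (eq 5 plus the eigenvalue-ratio aspect ratio) can be sketched as below; the rounding choices and function name are ours:

```python
import numpy as np

def som_grid_size(X):
    """Heuristic map sizing: J = 8*sqrt(I) neurons (eq 5), with the grid
    aspect ratio set to the square root of the ratio of the two largest
    eigenvalues of the covariance matrix of X."""
    X = np.asarray(X, dtype=float)
    I = X.shape[0]
    J = int(round(8 * np.sqrt(I)))
    eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    ratio = np.sqrt(eig[0] / eig[1])          # rows : columns
    rows = int(round(np.sqrt(J * ratio)))     # rows/cols ~ ratio, rows*cols ~ J
    cols = int(round(J / rows))
    return rows, cols
```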

After the SOM has been trained, it can be directly used for the classification of new samples. In this case, the index of the BMU can be considered as the class index. For any testing data point xi, a class can be assigned by finding its BMU. The goodness of fit of a testing sample xi to its BMU (bi) can be measured based on the quantization error, Eiq:22

Eiq = ||xi - mbi||  (6)

The quantization error measures the goodness of projection. A large value of Eiq indicates a large difference between the sample xi and the identified BMU (bi); i.e., xi is not well-represented by the SOM and is not very similar to any of the training samples.

The SOM can be used for visualization in two different ways. First, the trained SOM can be thought of as a mapping from X ∈ RN to two dimensions. The neurons in a trained SOM are not equally distributed over the entire input space RN; rather, more neurons are designated for regions with more samples in X (high density) and fewer for lower-density regions in RN. Therefore, one way of visualizing clusters in X is by means of the distance between a neuron and each of its neighbors:23

Djj′ = ||mj - mj′||, j′ ∈ Nj  (7)

The unified distance matrix (U-matrix) visualizes the SOM by depicting the boundary between each pair of neurons with a color or gray shade that is proportional to their Djj′. Alternatively, the average distance between the neuron and its neighbors can be used to color the neuron:

Dj = (1/|Nj|) Σj′∈Nj ||mj - mj′||  (8)

where |Nj| is the size of the neighborhood.23 Cluster borders are then indicated as “mountains” of high distances separating “valleys” of similar neurons. An example of a U-matrix is shown in Figure 3. From the U-matrix, three distinct clusters can be seen, as distinct valleys separated by mountains.
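The neighbor distances of eqs 7 and 8 can be sketched as below; the rectangular grid with a 4-neighborhood is an illustrative assumption (hexagonal grids and larger neighborhoods are also common):

```python
import numpy as np

def u_matrix(M, grid_shape):
    """Average neighbor distance D_j for each neuron (eqs 7 and 8),
    arranged on the grid. High values mark cluster borders ("mountains"),
    low values mark "valleys" of similar neurons."""
    rows, cols = grid_shape
    D = np.zeros((rows, cols))
    G = M.reshape(rows, cols, -1)            # reference vectors on the grid
    for p in range(rows):
        for q in range(cols):
            dists = []
            for dp, dq in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                pp, qq = p + dp, q + dq
                if 0 <= pp < rows and 0 <= qq < cols:
                    # pairwise distance D_jj' of eq 7
                    dists.append(np.linalg.norm(G[p, q] - G[pp, qq]))
            D[p, q] = np.mean(dists)         # eq 8
    return D
```

Shading each grid cell by `D[p, q]` yields the U-matrix display described in the text.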

Second, clusters in the trained SOM can be labeled and directly used as a three-dimensional display to depict a new sample xi from the space of X. This is particularly useful for identifying the class (cluster) of a new sample. The BMU corresponding to this new xi is often referred to as its “hit”. If new samples are regularly available from an online process, the state of the process at a point in time can be identified from the hit; the evolution of the process state can also be visualized by the sequence of hits, as discussed in section 4.

The SOM has been used successfully in diverse fields. Deventer et al.24 demonstrated how disturbances in a froth flotation plant can be visualized using the SOM. Features were extracted from gray-level images of the froth. These high-dimensional features were then visualized using the SOM. A change in the hits from one region of the SOM to another indicated a change in the froth and, hence, a change in the underlying operating conditions. However, they could not establish a unique mapping between the SOM units and the operating conditions. Srinivasan and Gopal25 showed that the SOM can be used to extract features during the operation of a fluidized catalytic cracking unit (FCCU). They projected data from the FCCU onto the SOM and identified the different operating modes. Kolehmainen et al.26 used the SOM to visualize the various growth phases of yeast, based on data obtained from ion mobility spectrometry. They showed that hits from the same phase cluster together but are separated from those of other phases by “mountains” of high distance between the neurons. Xiao et al.27 used the SOM to find co-expressed genes in microarray data analysis. They used the SOM to visualize the transcriptional changes in tumor samples and recognize those tumors that potentially belong to the same subtype. Jamsa-Jounela et al.28 used the SOM to detect aggregations in a smelter plant. Abonyi et al.29 applied the SOM to estimate product quality in a polyethylene process.

Figure 3. Representation of process operational data in a SOM U-matrix.

A comprehensive review of the applications of the SOM has been reported by Kaski et al.30 In summary, previous work has largely focused on exploiting the clustering capability of the SOM for grouping multivariate data. In this paper, we extend the SOM to visualize, in real time, multivariate samples that originate from multistate processes.

4. Self-Organizing Map (SOM) for Representing Process Operations

In this section, we propose a SOM-based methodology to depict multistate process operations. In the proposed representation, data from different process states (steady-state and transient) demonstrate different characteristics in the SOM space. Steady-state operations form clusters of adjacent BMUs, whereas transient operation is reflected as a trajectory. Differences between two states can be observed easily based on the location and evolution of the BMUs.

A SOM must be suitably trained to represent the various process states effectively. For robust visualization, the training of the SOM is performed using all available historical operations data, including those from steady-state and transient operations, during both normal and abnormal operations.31 The use of all available process data for training enables the SOM to represent a wide range of operating conditions on the map. Data for process operation are usually available in the form of a time series (the typical frequency is a sample every second, 10 s, or minute). When a SOM is trained with such time-series data, the resulting visualization is not adequate, as illustrated next using a laboratory-scale distillation unit case study.

4.1. Laboratory-Scale Distillation Unit Case Study. The flowsheet of the unit is shown in Figure 4. The distillation column is 2 m high and has 10 trays, each 20 cm wide. The feed, a 30% v/v ethanol-water mixture, passes through a heat exchanger before being fed to the column at tray 4. The system is well integrated with a control console and data acquisition system. The 18 variables shown in Table 1 are measured at intervals of 10 s. Cold startup of the distillation column is performed following the standard operating procedure (SOP) shown in Table 2. The startup normally takes approximately two hours. The evolution of some of the key variables in one run is shown in Figure 5. More information about the startup transition is available from Ng and Srinivasan.32

Figure 4. Schematic of the distillation unit.

To obtain the training data, 11 runs (1 normal run and 10 runs with the disturbances listed in Table 3) were conducted, resulting in a total of 22 368 samples. The data were first normalized by autoscaling each of the 18 variables. A SOM was trained using the classical training method described in section 3. During training, the neurons on the SOM orient themselves and evolve into a process map representing all the operating conditions in the training data, while preserving the topology of the measurement space. The resulting SOM is shown in Figure 6. It is evident that, in this SOM, a large portion has been assigned to regions of steady state. This over-representation of a small region of the measurement space, arising from the large number of steady-state samples present in the training data set, obviates effective visualization. To illustrate this, data from a normal run were projected on this SOM. Of the 1837 samples in this run, 1356 (73.8%) correspond to the two steady states (modes) and the remainder (26.2%) corresponds to transient operation. The BMUs for the samples from the two steady states are shown in Figure 6, with consecutive hits connected by lines. In this SOM, it is apparent that even a small change in the measurements (for example, due to noise) is amplified, and hits from consecutive samples are spread over a wide region, as can be seen from the many long lines. Furthermore, even visualization of transient operation is hindered, because the large changes during the transition correspond to only a few nodes in the SOM (due to the limited number of transient samples in the training dataset); therefore, the transition is barely distinguishable from the modes, even though the underlying variables evolve over large magnitudes. To overcome these shortcomings, we propose a new training methodology specifically designed for the effective visualization of multistate operations data.

4.2. Effective Visualization of Multistate Processes. First, we identify the desired characteristics for effectively visualizing multistate operations. As described in Srinivasan et al.,1 process operational states can be classified into modes and transitions, as manifested through the values and behavior of the measured variables. A mode corresponds to the continuous operation of the plant under a fixed flowsheet configuration (i.e., no equipment is brought online or taken offline). During a mode, the process operates in a quasi-steady state and measurements generally vary within a narrow range, although some variables may oscillate due to noise, instrumentation faults, or improperly tuned controllers. A transition corresponds to discontinuities in the plant operation, such as a change of setpoint, the opening of a valve, a change of equipment configuration, the turning on or idling of equipment, etc., usually induced by operator actions. During a transition, at least one of the constituent variables shows a significant change. These characteristics of multistate operations should be exploited for effective visualization.

Process modes and transitions should display different characteristics in the SOM space. When a process is in a mode, all its variables have almost-constant values. Therefore, online measurements from such a state should be projected onto the same BMU. Noise and minor variations in process operation could result in the projection of online measurements onto different BMUs; however, these should be neighboring neurons, exploiting the topology-preserving feature of SOM training. In a suitably trained SOM, process modes can be identified when a high frequency of BMUs is found within a small neighborhood of the map. Process variables have significantly different values in different modes. Therefore, different modes should be distinguishable on the SOM based on the difference in their BMUs.

In contrast, process transitions are characterized by large changes in plant operating conditions. The evolution of the variable values during a transition should cause the BMUs to traverse a wide region of the SOM space. The transition can be visualized by connecting the successive BMUs and displaying the trajectory of process evolution. During transitions, continuous and discrete variables may have different effects on the trajectory. Continuous variables usually evolve from their original values to new target values over some period of time. For example, a heating operation will lead to an increase in temperature from some initial value to a new value over a period of time. Such an evolution of continuous variables should cause the BMUs to advance through adjacent neurons, resulting in a smooth trajectory in the SOM space. Discrete variables, however, correspond to abrupt changes in plant operations, such as the opening or closing of valves, the activation (or deactivation) of equipment, etc. Such changes can cause abrupt jumps to BMUs that are a significant distance apart and thus are exhibited as a discontinuous evolution of the trajectory. The transition trajectory on the SOM represents time in an implicit manner, as it is produced based only on the magnitudes of the variables. An increase or decrease in the rate of change between two instances of the same transition would not be evident in the sequence of BMUs in the trajectory. This makes the representation robust to run-length variations.

Different transitions would exhibit different trajectories in the SOM, because they would start and end at different operating

Table 1. Variables Used To Monitor the Laboratory-Scale Distillation Unit

variable   description                        range
1          tray 1 temperature                 20.5-89.5 °C
2          tray 2 temperature                 20.7-90.1 °C
3          tray 3 temperature                 20.8-90.2 °C
4          tray 4 temperature                 20.6-89.8 °C
5          tray 5 temperature                 20.4-91.3 °C
6          tray 6 temperature                 20.5-91.6 °C
7          tray 7 temperature                 20.7-91.1 °C
8          tray 8 temperature                 20.5-91.4 °C
9          reboiler temperature               21.4-90.9 °C
10         top column temperature             20.5-88.8 °C
11         cooling water inlet temperature    21.1-27.2 °C
12         cooling water outlet temperature   21.2-33.3 °C
13         condenser inlet temperature        21.1-77.9 °C
14         feed temperature                   23.6-28.4 °C
15         reboiler power                     0-2.0 kW
16         feed pump speed                    0-199.6 RPM
17         reflux cycle time                  0-4 s
18         reflux ratio                       0-4

Table 2. Standard Operating Procedures (SOPs) for Distillation-Unit Startup

step   description
1      set all controllers to manual
2      fill reboiler with bottom product
3      open reflux valve and operate the column in full reflux
4      establish cooling water flow to condenser
5      start the power of the reboiler heating coil
6      wait for all of the temperatures to stabilize
7      start feed pump
8      activate reflux control and set reflux ratio
9      open bottom valve to collect product
10     wait for all the temperatures to stabilize

7748 Ind. Eng. Chem. Res., Vol. 47, No. 20, 2008


points; also, the sequence of operations executed during different transitions would differ; hence, the values of the intervening conditions would be different. These differences should manifest themselves as differences in the trajectory on the SOM space. As a corollary, abnormal operations (including the wrong sequence of SOP execution, wrong procedures, wrong timing, and hardware failures) would also manifest themselves differently in the SOM space, leading to BMUs in abnormal neighborhoods. These can then be detected from the visualization. Next, we describe the proposed training strategy that brings forth such visualization.

4.3. Two-Step Training Methodology. As shown previously in this work, the classical training strategy assigns neurons to represent the measurement space in a fashion that reflects the density of the samples in the training dataset. Data obtained from multistate operations have a large proportion of samples from steady states (modes), but these correspond to a very small region of the measurement space; the short periods of transitions during plant operation correspond to large changes, yet are proportionally under-represented in the time-series data. The proposed training methodology overcomes this problem through a resampling of the training dataset, followed by a granular annotation of the SOM map, as described next.

4.3.1. Magnitude-Based Resampling. We propose a magnitude-based resampling strategy to improve the visualization of multistate operations. The proposed SOM training algorithm prevents self-similar data (based on magnitude) from being used for training the SOM. This is achieved by introducing a resampling strategy prior to SOM learning. Let H = {x1, ..., xi′, ..., xI′} be the historical dataset after normalization. The resampling method seeks to develop a new dataset, X, to be used for training the SOM. During resampling, samples from H are iteratively included in X if they are adequately distinct from the samples already present in X. Let ui′ be the membership of sample xi′ in X:

ui′ = { 1   if ||xi′ − xi|| ≥ τminN  ∀ xi ∈ X
      { 0   otherwise                                (9)

where τmin is a user-specified threshold and N is the number of measured variables. With this resampling, repetitive historical data from modes are eliminated from X. Hence, when X is subsequently used for training the SOM, it does not overemphasize steady-state operations over transitions. The

Figure 5. Process state variables observed during a normal startup of the distillation unit.

Table 3. Process Disturbances Introduced in the Distillation Unit Startup Case Study

case    disturbance                                        type                  time fault introduced (× 10 s)
DST01   reboiler power low                                 step                  1
DST02   reboiler power high                                step                  1
DST03   feed pump high                                     step                  359
DST04   feed pump low                                      step                  430
DST05   tray temperature sensor T6 fault                   random variation      425
DST06   reflux ratio high                                  step                  353
DST07   reflux ratio low                                   step                  345
DST08   bottom valve                                       sticking              420
DST09   cooling water                                      slow drift            1
DST10   low cooling water flow and feed pump malfunction   slow drift and step   299


proportion of neurons that represent transitions would also increase, and the regions of the SOM that correspond to state transitions would become more evident.
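The resampling loop can be sketched as follows (a minimal illustration assuming numpy; the greedy admission rule follows eq 9, and all names are our own):

```python
import numpy as np

def magnitude_based_resample(H, tau_min):
    """Greedy magnitude-based resampling (a sketch of eq 9).

    A historical sample is admitted to the training set X only if it
    differs by at least tau_min * N (N = number of variables) from
    every sample already admitted.
    """
    N = H.shape[1]
    X = [H[0]]                       # the first sample is always kept
    for x in H[1:]:
        dists = np.linalg.norm(np.asarray(X) - x, axis=1)
        if np.all(dists >= tau_min * N):
            X.append(x)
    return np.asarray(X)

# A long steady state followed by a short transition: the repeated
# steady-state samples collapse to a single representative.
steady = np.zeros((100, 2))
transition = np.array([[0.5, 0.5], [1.0, 1.0], [1.5, 1.5]])
H = np.vstack([steady, transition])
X = magnitude_based_resample(H, tau_min=0.1)
print(len(X))  # → 4 (one steady-state representative + the transition samples)
```

Note that the result depends on the order in which H is scanned; here the time-ordered scan mirrors how historical data would be processed.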

4.3.2. Granular Annotation. Next, consider the relationship between the process states and their depiction on the SOM map. Typically, the number of neurons is selected to be much larger than the number of states in which the process would operate. Therefore, the process states would not map to unique neurons. The jth neuron in the SOM corresponds to the process operating conditions whose mean is specified by mj. The same neuron would continue to be the BMU for all xi near this operating condition. A larger SOM with more neurons offers a finer resolution of operating conditions and is required to visualize transition conditions and their progression precisely. However, in large SOMs, even small changes in the operating conditions would lead to different neurons (although in the same neighborhood) becoming the BMU; i.e., the "noise" absorbed by each neuron is low. To meet the conflicting requirements of finer resolution and better noise absorbance, a second layer of abstraction is defined by grouping neurons into neuronal clusters.

A neuronal cluster is defined as a set of contiguous neurons in the SOM map with high similarity in mj. The neuronal cluster exploits the topology-preserving feature of SOM training to provide a coarser representation of operating conditions and process states. Through this abstraction, as shown in sections 5 and 6, different modes map to unique neuronal clusters. The state can then be identified in real time, based on the hit. A neuronal cluster is said to have a hit if any of its constituent neurons has a hit. This can be used to abstract the trajectory, as shown in Figure 7, where the hits are visualized on the cluster centroid instead of the neurons, thus depicting the transition at a lower level of resolution.
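The cluster-level abstraction of a trajectory amounts to mapping each BMU hit to its cluster and collapsing consecutive duplicates. A minimal sketch (names hypothetical, not from the paper):

```python
def cluster_trajectory(bmu_sequence, neuron_to_cluster):
    """Abstract a neuron-level trajectory to cluster-level hits.

    bmu_sequence      : list of BMU indices over time
    neuron_to_cluster : dict mapping neuron index -> cluster label (u_jk)
    Consecutive duplicate cluster hits are collapsed, giving the coarse
    trajectory drawn on the cluster centroids.
    """
    coarse = []
    for j in bmu_sequence:
        k = neuron_to_cluster[j]
        if not coarse or coarse[-1] != k:
            coarse.append(k)
    return coarse

# Neurons 0-2 belong to cluster "A", neurons 3-4 to cluster "B"
assignment = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
print(cluster_trajectory([0, 1, 1, 2, 3, 4, 4], assignment))  # → ['A', 'B']
```

Because minor noise moves the BMU only between neighboring neurons of the same cluster, the coarse trajectory is insensitive to it, while a genuine transition still produces a change of cluster.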

Neuronal clusters are defined by clustering the neurons in the trained SOM, based on their reference vectors. Let all the neurons in MSOM be grouped into K neuronal clusters {S1, S2, ..., Sk, ..., SK}. The assignment of neuron j to cluster k is specified by a membership function ujk:

ujk = { 1   if neuron j is assigned to cluster k
      { 0   otherwise                                (10)

Any clustering technique can be used to specify ujk. We have used the k-means clustering algorithm, which identifies the K clusters that minimize the total squared distance εp:33

εp = ∑_{k=1}^{K} ∑_{j=1}^{J} ||mj · ujk − ck||    (11)

where ck is the centroid of the kth neuronal cluster and is given by

ck = ∑_{j=1}^{J} mj · ujk    (12)

The assignment ujk is usually found through a two-step iterative procedure, starting from a random initialization. In the first step, the jth neuron is assigned to the cluster with the nearest centroid ck:

k′ = arg min_k ||mj − ck||    (13)

where ujk′ = 1 and ujk = 0 (for k ≠ k′).

In the second step, the positions of all K centroids are updated by eq 12, which may necessitate changes in the assignment. The two steps are repeated until there are no further changes to ujk and the centroids become stable. Because the aforementioned procedure could terminate at a local minimum, the procedure must

Figure 6. Visualization of the startup of the distillation unit using the classical SOM.


be repeated multiple (P) times, with different initial assignments. The subscript "p" in the term εp signifies the total distance in the pth replicate of the procedure. The assignment from the replicate with the minimum εp is selected. Also, the k-means algorithm requires the a priori specification of K. We have found that a K value of ∼0.1 times the number of neurons serves as a good initial value for annotating the SOM. The value of K can subsequently be increased or decreased to obtain the desired level of granularity.
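The two-step procedure with P replicates can be sketched as follows (a minimal illustration assuming numpy; we use the conventional centroid update, the mean of each cluster's assigned reference vectors, and all names are our own):

```python
import numpy as np

def kmeans_with_replicates(m, K, P, n_iter=100, seed=0):
    """Cluster SOM reference vectors m (J, N) into K neuronal clusters,
    keeping the best of P random restarts (a sketch of eqs 11-13)."""
    rng = np.random.default_rng(seed)
    best_labels, best_eps = None, np.inf
    J = m.shape[0]
    for _ in range(P):
        centroids = m[rng.choice(J, size=K, replace=False)]
        for _ in range(n_iter):
            # Step 1: assign each neuron to the nearest centroid (eq 13)
            d = np.linalg.norm(m[:, None, :] - centroids[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Step 2: move each centroid to the mean of its members
            new = np.array([m[labels == k].mean(axis=0)
                            if np.any(labels == k) else centroids[k]
                            for k in range(K)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Total assignment distance for this replicate (the eps_p of eq 11)
        eps = np.linalg.norm(m - centroids[labels], axis=1).sum()
        if eps < best_eps:
            best_eps, best_labels = eps, labels
    return best_labels, best_eps

# Two well-separated groups of "reference vectors" should split cleanly
m = np.vstack([np.random.default_rng(1).normal(0, 0.05, (10, 2)),
               np.random.default_rng(2).normal(5, 0.05, (10, 2))])
labels, _ = kmeans_with_replicates(m, K=2, P=5)
print(len(set(labels[:10])), len(set(labels[10:])))  # each group in one cluster
```

Keeping the replicate with the smallest total distance is exactly why P matters: a single run can stall at a local minimum, but the best of many restarts rarely does.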

In the following sections, these characteristics of the SOM are exploited for the representation and visualization of high-dimensional process operational data.

5. Case Study 1: Effective Visualization of Distillation Column Operations

First, we demonstrate the effectiveness of the proposed training methodology on the laboratory-scale distillation unit described in section 4.1. The historical dataset contains 22 368 samples, with 17 446 samples (78%) corresponding to modes and 4922 samples (22%) corresponding to transitions. The magnitude-based resampling strategy was applied with τmin = 0.0015. The resulting training data comprise 2574 of the original 22 368 samples, a reduction of 88.5%. The proportion of samples from modes was reduced to 3.2%, and samples from transitions formed 96.8% of the training data. Subsequently, a SOM was designed. From eq 5, J was selected to be 405. PCA was performed on the training data, and the ratio of the square roots of the first two eigenvalues was determined to be ∼4.0. This was used as the aspect ratio for the SOM, leading to a map configuration of 41 × 10 neurons. A SOM with this configuration was trained with all the training data, and the average quantization error was determined to be 0.303. Then, k-means clustering with K = 60 and P = 1000 replicates was performed to identify the neuronal clusters. The replicate with the smallest εp is shown in Figure 8, where all the nodes that belong to the same cluster have been similarly shaded and labeled with the cluster number.
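The map-sizing step can be sketched as follows (an illustration under our own assumptions: eq 5 for the neuron count J is not reproduced here, so J is passed in, and the function name is hypothetical):

```python
import numpy as np

def som_grid_shape(X, J):
    """Choose SOM grid dimensions: a total of about J neurons, with the
    side ratio set by the square roots of the two largest PCA eigenvalues."""
    C = np.cov(X, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]   # descending
    ratio = np.sqrt(eigvals[0]) / np.sqrt(eigvals[1])
    rows = int(round(np.sqrt(J * ratio)))            # rows/cols ~ ratio
    cols = max(1, int(round(J / rows)))              # rows*cols ~ J
    return rows, cols

# Data stretched 4x more along one axis -> sqrt-eigenvalue ratio ~4,
# so a 405-neuron map comes out elongated, roughly 40 x 10
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) * np.array([4.0, 1.0])
print(som_grid_shape(X, 405))
```

The elongated grid lets the map's long axis follow the dominant direction of variation in the data, which is why the case study arrives at a 41 × 10 configuration.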

The available training data were then projected on the trained SOM to annotate it. Neuronal cluster 3 was determined to correspond to the cold state, and cluster 60 corresponded to the final steady state. The centroid of cluster 3 has all of its tray temperatures near room temperature. Similarly, at cluster 60, the tray temperatures are ∼80-85 °C, the feed temperature is also higher (from preheating), and the reflux ratio is ∼1. To visualize the startup transition, the online measurements from the normal startup run were projected on the trained SOM and the evolution of the hits was tracked. There are three major phases during the startup (reboiler heating, evaporation, and column stabilization) before a steady state is established. The time evolution of the neuronal-cluster hits is labeled in Figure 9 and summarized in Figure 8. The startup operation begins at cluster 1 in this run, with the column at the cold start state. When the reboiler power is turned on, its temperature (variable 9) increases to the boiling point (∼90 °C). The lower tray temperatures also increase slowly, and the hits evolve through adjacent clusters from 3 → 5 → 14 → 15 → 18. When the reboiler contents start to boil (at ∼80 °C), the temperatures throughout the column increase significantly. This leads to a smooth evolution of hits through clusters 20 → 23 → 24 → 25 → 26 → 28 → 33 → 35 → 37 → 39 → 41 → 48 → 52 → 54 → 55. Finally, the column is stabilized by establishing the feed flow (variable 16), and the hits move to clusters 56 → 59. Starting at t = 3890 s, the hits remain at cluster 60, which indicates that the process has reached the final steady state. Data from the abnormal runs were also used to annotate the SOM. Figure 10 shows the various clusters that correspond to the various faults.

The startup transition demonstrates that continuous operations are exhibited as smooth trajectories on the SOM space, wherein successive hits are in close proximity. Discrete events, on the other hand, are represented as abrupt jumps on the SOM. For example, when the feed pump is started at t = 3410 s, the hit moves across three intervening neurons from a neuron in cluster 56 to another in cluster 59. Similarly, when the reflux ratio is changed from 0 to 1 at t = 3430 s, the hits jump over four intermediate neurons to cluster 60.

5.1. Visualization of Abnormal Runs. Next, we describe the use of the trained SOM for fault detection. The process signals for DST01 are shown in Figure 11, where the solid lines represent the signals for the abnormal operation, while the dotted lines represent a normal run. The fault was introduced at t = 10 s. The online samples are projected on the SOM and the trajectory is visualized, as shown in Figure

Figure 7. Representation of process transition as a trajectory using hits on (a) the neurons and (b) the cluster centroids.


10. The process can be observed to follow an abnormal trajectory from the beginning, with the hits evolving from cluster 3 to cluster 50, instead of cluster 5. When the feed pump is activated (at t = 5160 s), the startup operation becomes unsuccessful, because there is not enough heat supply to supplement the heat of evaporation. The temperature in most trays starts to fall, and the process trajectory is observed to move back toward the cold state (cluster 3).

During another run, a fault (human error) was induced at t = 3530 s, when the reflux ratio was set to 2.5 times its nominal value. This leads to a reduction in product throughput and increases the load on the reboiler. In this run, samples prior to t = 3540 s broadly correspond to normal operation, although there are minor run-to-run deviations (for example, from t = 2500 s to t = 3500 s). The trajectory on the SOM absorbs these minor differences, and the sequence of cluster hits remains the same. At t = 3540 s, when the fault is introduced, the SOM shows an abnormal hit on cluster 53, which corresponds to DST06. Although only two of the measured variables (variables 17 and 18) reflect this fault, the variation can be clearly observed on the SOM space. This illustrates two key benefits of the SOM visualization methodology: (i) dimensionality reduction (the operator is not required to monitor the different variables individually; rather, the reduced two-dimensional SOM map can reveal variations effectively) and (ii) diagnosis support (the abnormal hits serve as a signature of the fault, so the operator

Figure 8. Trajectory of normal startup of distillation unit as projected on the SOM.

Figure 9. Operating state identification based on the SOM.


Figure 10. Visualization of process operation during run DST01.

Figure 11. Comparison of process state variables during DST01 and normal startup.


can quickly perform fault diagnosis, based on the location of the hit on the annotated SOM). The latter can also be used for automated fault diagnosis, as described in Part 2 of this paper.35

One key issue when forming the neuronal clusters is determining the number of clusters. The number of clusters (K) affects the selectivity and sensitivity of the SOM when it is used to track process operations. A larger K results in more neuronal clusters on the SOM and, hence, improves the system's ability to represent different states (including disturbance classes). However, the SOM also becomes more sensitive with increasing K, which could result in false alarms. To evaluate the effect of K, the same SOM was clustered with various K values. For all K values in the range K ∈ [40, 70], all the faults could be differentiated from the SOM visualization. The next section describes the application of the proposed visualization method to an industrial case study.

Figure 12. Process flow diagram of the refinery hydrocracking unit (HCU).


6. Case Study 2: Transition Identification and Visualization in an Industrial Hydrocracking Unit

The process analyzed in this section is the boiler of a hydrocracking unit (HCU) in a major refinery in Singapore.34 Hydrocracking is a versatile process for converting heavy petroleum fractions into lighter, more-valuable products. The objective of the HCU is to convert heavy vacuum gas oil (HVGO) to kerosene and diesel with minimum naphtha production. The simplified process flow diagram of the HCU is shown in Figure 12. The operations of the HCU considered here are complex and involve catalytic hydrocracking reactions in a hydrogen-rich atmosphere at elevated temperatures and pressures. The HCU includes two sections: a reactor section and a fractionation section. A waste heat boiler (WHB) unit is integrated into both of these sections for heat recovery. This section illustrates the application of the SOM for visualizing different operating states in the WHB unit.

6.1. Analysis of Operating Data from the Waste Heat Boiler (WHB). In this study, one month of operating data, consisting of 21 measured variables from the WHB unit, sampled at intervals of 5 min, is considered. The original historical dataset contains 12 650 samples. Resampling with τmin = 0.0015 reduces the prevalence of the mode samples; the training dataset contains 3422 samples, a reduction of 73.0%. These were used to train a SOM with 468 neurons (dimensions of 39 × 12). The trained SOM was then clustered with K = 70. The clustered SOM was annotated with the typical regions of operation. As can be seen from Figure 13, the WHB unit operates in five different modes, shown as M1 to M5. These modes can be annotated suitably; for instance, mode M2 corresponds to the production of steam at 22 T/h, whereas mode M3 corresponds to a throughput of 14 T/h. Analysis of the SOM showed that the unit underwent seven different transitions during the period under consideration. Four instances of these transi-

Figure 13. Visualization of transition trajectories for the refinery WHB unit.

Figure 14. Classical SOM for the WHB unit trained using the historical dataset.


tions are also shown in Figure 13. In one instance, depicted as T34^A, the unit transitioned from mode M3 to mode M4 in 140 min. Another instance of the same transition, T34^B, required only 85 min. Therefore, the operating strategy for the latter instance can be used as the basis for all future transitions of this class. For comparison, Figure 14 shows the SOM trained with the historical dataset. It can be observed that some of the modes (for example, modes M2 and M3) occupy a large portion of this SOM and dominate it.

The trained SOM was also used to visualize the operation during another 15-day period. Data from this period were not used during the training, thereby demonstrating the generalization ability of the SOM. The mean quantization error for this period was 0.906, indicating that the SOM also provides a good representation for these data. During this period, the plant was observed to operate in mode M3 83% of the time, in mode M2 ∼3% of the time, and in mode M4 ∼4% of the time. The process underwent transitions for a total of 32.5 h (∼10%) during this period. All the transitions could be easily visualized with the previously trained SOM.
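The mean quantization error used above as a goodness-of-representation measure is simply the average distance from each sample to its BMU; a minimal sketch (assuming numpy, with names of our own choosing):

```python
import numpy as np

def mean_quantization_error(reference_vectors, samples):
    """Average distance from each sample to its BMU; small values mean
    the trained SOM represents the (possibly unseen) data well."""
    d = np.linalg.norm(samples[:, None, :] - reference_vectors[None, :, :],
                       axis=2)
    return d.min(axis=1).mean()

# Two neurons; each sample lies 0.1 away from its nearest neuron
m = np.array([[0.0, 0.0], [1.0, 1.0]])
x = np.array([[0.1, 0.0], [0.9, 1.0]])
print(mean_quantization_error(m, x))  # ≈ 0.1
```

Comparing this value on held-out data with the training value (0.906 versus 0.303 in the case studies) gives a quick check that the map generalizes rather than merely memorizing the training set.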

For the purpose of comparison, we also attempted to visualize the same data using PCA. The first three PCs captured 85.47% of the variance, as shown in Figure 15. Data from the five modes identified from the SOM are shown in the biplot. In contrast to the SOM, the different modes of operation are not delineated as clearly by PCA.

7. Conclusions

Methods that enable effective visual exploration are crucial for extracting knowledge from complex, high-dimensional, temporal, multistate data. In this work, we have shown that the self-organizing map (SOM) provides a method to reduce dimensionality and visually depict high-dimensional process data in an intuitive graphic. Because process modes form the majority of the training data, classical SOM training allocates more neurons to them; hence, modes and transitions do not have distinct representations: a large portion of the SOM is used to represent process modes, and transitions are under-represented. To overcome this, a magnitude-based resampling strategy has been proposed to facilitate the visualization of multistate operations. With the proposed algorithm, process modes and transitions are mapped differently onto the SOM. Process modes, in which process variables exhibit near-constant values, are projected to a small neighborhood on the SOM. In contrast, process transitions, which exhibit large changes, are visualized as a smooth trajectory when adjacent hits are connected by lines. This enables transitions to be visualized clearly. In cases where the underlying process has high noise levels, an additional layer of abstraction is desirable. In such cases, neighboring neurons are combined into a neuronal cluster, which maps to a broad range of process operation.

Application of the proposed approach to two case studies (startup of a distillation unit and operation of an industrial boiler within a hydrocracker in a major refinery) illustrates the efficacy of the proposed training methodology and the benefits of visualization in extracting process knowledge, even from complex, multistate operations. Visualization of multistate operations using the SOM results in a map of the process that has numerous uses. As demonstrated in the two case studies, valuable insights about process operations can be obtained by analyzing operations data. The different states in which the process operates can be segregated. If multiple instances of the same state are present in the data, they can be compared. The trained map can also be used for real-time state identification, by locating the latest BMU on the annotated SOM. In contrast to traditional dimensionality-reduction approaches such as PCA, which preserve global distances, the SOM dedicates neurons to an operating region only if it is present in the training data. Therefore, it offers a more compact and rich representation of the operation, which can also be exploited for process monitoring, as proposed in Part 2 of this paper.35

Figure 15. Visualization of the WHB unit operation using the first three scores.


Acknowledgment

The authors would like to thank Mr. Qin Zhen for assistance in conducting the experiments for the laboratory-scale distillation-unit case study. The authors also gratefully acknowledge Mr. Yu Weihao, Singapore Refining Company, for his assistance with the hydrocracker case study.

Nomenclature

Indices

i, i′ = sample
j, j′ = neuron in the self-organizing map (SOM) model
n = variable
k, k′ = cluster

Variables

bi = best matching unit (BMU) for sample xi
ck = centroid for cluster k
Djj′ = distance between neurons j and j′
Eiq = quantization error for sample xi
H = historical dataset
hbij = neighborhood function of bi
MSOM = set of neurons; MSOM = {m1, ..., mj, ..., mJ}
mj = reference vector of neuron j = {mj1, ..., mjn, ..., mjN}
Nj = set of neurons that are topological neighbors of neuron j
rj = location of neuron j in a two-dimensional grid
Sk = neuronal cluster k
X = training dataset; X = {x1, ..., xi, ..., xI}
xi = sample i in X; xi = {xi1, ..., xin, ..., xiN}
ujk = membership of neuron j in cluster k
α = learning rate factor
εp = total cluster assignment distance in replicate p
τmin = minimum distance between samples in the training dataset
σ = neighborhood width

Constants

I = total number of samples (rows) in X
I′ = total number of samples (rows) in H
J = total number of neurons in the SOM
K = total number of neuronal clusters
N = total number of variables (columns) in X
P = total number of replicates for clustering

Literature Cited

(1) Srinivasan, R.; Wang, C.; Ho, W. K.; Lim, K. W. Dynamic principal component analysis based methodology for clustering process states in agile chemical plants. Ind. Eng. Chem. Res. 2004, 43, 2123.

(2) Andrews, D. Plots of high dimensional data. Biometrics 1972, 28, 125.

(3) Chernoff, H. Using faces to represent points in k-dimensional space graphically. J. Am. Stat. Assoc. 1973, 68, 361.

(4) Rauwendaal, C. SPC: Statistical Process Control in Extrusion; Hanser Publishers: Munich and New York, 1993.

(5) Zhang, L.; Tang, C.; Song, Y.; Zhang, A. VizCluster and its application on classifying gene expression data. Distrib. Parallel Databases 2003, 13, 73.

(6) Craig, P.; Kennedy, J.; Cumming, A. Animated interval scatter-plot views for the exploratory analysis of large-scale microarray time-course data. Inf. Visualization 2005, 4, 149.

(7) Inselberg, A. Intelligent instrumentation and process control. In Proceedings of the 2nd Conference on Artificial Intelligence; 1985; p 302.

(8) Inselberg, A.; Chomut, T.; Reif, M. Convexity algorithms in parallel coordinates. J. Assoc. Comput. Mach. 1987, 34, 765.

(9) Inselberg, A. Visualization and data mining of high-dimensional data. Chemom. Intell. Lab. Syst. 2002, 60, 147.

(10) Albazzaz, H.; Wang, X. Z.; Marhoon, F. Multidimensional visualization for process historical data analysis: a comparative study with multivariate statistical process control. J. Process Control 2005, 15, 285.

(11) Jackson, J. E. A User's Guide to Principal Components; Wiley-Interscience: New York, 1991.

(12) Wise, B. M.; Ricker, N. L.; Veltkamp, D. J.; Kowalski, B. R. A theoretical basis for the use of principal components models for monitoring multivariate processes. Process Control Quality 1990, 1, 41.

(13) Jokinen, P. A. Visualization of multivariate processes using principal component analysis and nonlinear inverse modeling. Decision Support Syst. 1994, 11, 53.

(14) Sebzalli, Y. M.; Wang, X. Z. Knowledge discovery from process operational data using PCA and fuzzy clustering. Eng. Appl. Artif. Intell. 2001, 14, 607.

(15) Mandenius, C.-F.; Hagman, A.; Dunås, F.; Sundgren, H.; Lundstrom, I. A multisensor array for visualizing continuous state transitions in biopharmaceutical processes using principal component analysis. Biosens. Bioelectron. 1998, 13, 193.

(16) Martin, E. B.; Morris, A. J. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control 1996, 6, 349.

(17) Fourie, S. H.; de Vaal, P. Advanced process monitoring using an on-line non-linear multiscale principal component analysis methodology. Comput. Chem. Eng. 2000, 24, 755.

(18) Wang, X. Z.; Medasani, S.; Marhoon, F.; Albazzaz, H. Multidimensional visualization of principal component scores for process historical data analysis. Ind. Eng. Chem. Res. 2004, 43, 7036.

(19) Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 1982, 43, 59.

(20) Kohonen, T. Things you haven't heard about the self-organizing map. Proc. IEEE Int. Conf. Neural Networks 1993, 3, 1147.

(21) Kohonen, T. Self-Organizing Maps; Springer Series in Information Sciences; Springer: Berlin, Germany, 2000.

(22) Vesanto, J. Data exploration process based on the self-organizing map. Ph.D. Dissertation, Helsinki University of Technology, Department of Computer Science & Engineering, Helsinki, Finland, 2002.

(23) Ultsch, A.; Siemon, H. P. Kohonen's self organizing feature maps for exploratory data analysis. In Proceedings of the International Neural Network Conference; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1990; p 305.

(24) Deventer, J. S. J. V.; Moolman, D. W.; Aldrich, C. Visualization of plant disturbances using self-organizing maps. Comput. Chem. Eng. 1996, 20, 1095.

(25) Srinivasan, R.; Gopal, S. Extracting Information from High-Dimensional Operations Data Using Visualization Techniques. Presented at the 2002 AIChE Meeting, Indianapolis, IN, 2002; Paper 271c.

(26) Kolehmainen, M.; Ronkko, P.; Raatikainen, O. Monitoring of yeast fermentation by ion mobility spectrometry measurement and data visualization with self-organizing maps. Anal. Chim. Acta 2003, 484, 93.

(27) Xiao, L.; Wang, K.; Teng, Y.; Zhang, J. Component plane presentation integrated self-organizing map for microarray data analysis. FEBS Lett. 2003, 538, 117.

(28) Jamsa-Jounela, S. L.; Vermasvuori, M.; Enden, P.; Haavisto, S. A process monitoring system based on the Kohonen self-organizing maps. Control Eng. Pract. 2003, 11, 83.

(29) Abonyi, J.; Nemeth, S.; Vincze, C.; Arva, P. Process analysis and product quality estimation by self-organizing map with an application to polyethylene production. Comput. Ind. 2003, 52, 221.

(30) Kaski, S.; Kangas, J.; Kohonen, T. Bibliography of self-organizing map (SOM) papers: 1981-1997. Neural Comput. Surveys 1998, 1, 102.

(31) Ng, Y. S.; Srinivasan, R. Monitoring of distillation column operation through self-organizing maps. Presented at the 7th International Symposium on Dynamics and Control of Process Systems, Boston, MA, 2004.

(32) Ng, Y. S.; Srinivasan, R. Distillation unit case study homepage, http://www.iace.eng.nus.edu.sg/research/Distillationcolumn/, National University of Singapore, 2005.

(33) Seber, G. A. F. Multivariate Observations; Wiley-Interscience: Hoboken, NJ, 2004.

(34) Ng, Y. S.; Yu, W.; Srinivasan, R. Transition classification and performance analysis: A study on an industrial hydrocracker. Presented at the International Conference on Industrial Technology, December 15-17, 2006, Mumbai, India.

(35) Ng, Y. S.; Srinivasan, R. Multivariate Temporal Data Analysis Using Self-Organizing Maps. 2. Monitoring and Diagnosis of Multistate Operations. Ind. Eng. Chem. Res. 2008, 47, 7758.

Received for review July 26, 2007
Revised manuscript received August 6, 2008

Accepted August 7, 2008

IE0710216
