
Composition Prediction of a Debutanizer Column using Equation Based Artificial Neural Network Model

Nasser Mohamed Ramli a,b, M.A. Hussain b,c,*, Badrul Mohamed Jan b, Bawadi Abdullah a

a Chemical Engineering Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia
b Chemical Engineering Department, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia
c UMPDEC, University of Malaya, Malaysia

Article info

Article history:
Received 27 May 2013
Received in revised form 17 September 2013
Accepted 28 October 2013
Communicated by J. Zhang
Available online 7 January 2014

Keywords:
Statistical analysis
Neural network
Partial least square analysis
Regression analysis
Debutanizer column

Abstract

The debutanizer column is an important unit operation in the petroleum refining industry. Online composition prediction using a neural network helps improve product quality monitoring in an oil refinery by predicting the top and bottom compositions of n-butane for the column simultaneously and accurately. A single dynamic neural network model can be designed to overcome the delay introduced by laboratory sampling and is also suitable for monitoring purposes. The objective of this work is to investigate and implement an artificial neural network (ANN) for simultaneous composition prediction of the top and bottom products of a distillation column. The major contribution of the current work is the development of these n-butane composition predictions using equation based neural network (NN) models. The composition predictions from this method are compared with those from partial least square (PLS) and regression analysis (RA) methods to show its superiority over these conventional methods. Based on statistical analysis, the results indicate that the neural network equation, which is more robust in nature, predicts better than the PLS equation and RA equation based methods.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The distillation column is considered one of the most common unit operations in the chemical industry. However, its complex behaviour and highly unpredictable nature have made it a unit operation that is complicated and difficult for engineers to handle [1]. Hence it becomes more important to attain the desired purity of products by manipulating the top and bottom compositions of the distillation column accurately. In order to maintain and control the composition at its optimum value, it is necessary to predict it with high accuracy and precision, and with fast response. Chemical process industries also encounter many problems in monitoring the debutanizer column. Open loop instability, non-linearity, multivariable issues and the difficulty of measuring certain variables directly are the key factors complicating the composition prediction. The compositions at the top and bottom of the column are currently measured using normal laboratory sampling, which is tedious and time consuming. It has been found that composition prediction by neural network is fast and accurate compared with normal laboratory sampling, which in the industry normally takes one day to measure the composition. In this context, the need for a software-based online analyzer to provide speed and accuracy for this measurement has become imperative, and this research deals with the online prediction of the composition using equation based artificial neural network models, compared with partial least square and regression models.

In relation to the use of online sensors, an adaptive soft sensor for online monitoring of melt index (MI), an important variable determining the product quality in the industrial propylene polymerization (PP) process, has been proposed by Zhang and Liu [2]. The fuzzy neural network (FNN) served as the basic model for its nonlinear approximation ability using its learning method. To overcome the difficulty of structure determination of the FNN, an adaptive fuzzy neural network (A-FNN) is subsequently developed to determine the number of fuzzy rules, where a novel adaptive method dynamically changes the structure of the model by the predefined thresholds. In order to get better generalization ability of the soft sensor, support vector regression (SVR) is introduced for parameter tuning, where the output function is transformed into an SVR based optimization problem. The soft sensors including the SVR, FNN–SVR and A-FNN–SVR models are compared in detail and the proposed soft sensor achieves good performance in the industrial MI prediction process.

Three soft sensor models involving radial basis function (RBF), support vector machine (SVM), and independent component analysis–support vector machine (ICA–SVM) methods have been developed by Yan and Liu [3]. The purpose is to infer the Chemical Oxygen Demand (COD) of the quench water produced from the pesticide waste incinerator. An optimization model of COD is further proposed based on the aforementioned soft sensor models. The chaos genetic algorithm is introduced to solve the optimization model. A novel soft sensor model with principal component analysis (PCA), radial basis function neural network (RBF) and multi-scale analysis (MSA) has been proposed by Shi and Liu [4]. The purpose is to infer the melt index of manufactured products from real process variables, where PCA is carried out to select the most relevant process features and to eliminate the correlations of the input variables, MSA is used to introduce much more information and to reduce the uncertainty of the system, and RBF networks are used to characterize the nonlinearity of the process. A black-box modeling scheme to predict melt index (MI) in the industrial propylene polymerization process has also been developed by Liu and Zhao [5]. MI is one of the most important quality variables determining product specification and is influenced by a large number of process variables. In their work a faster statistical modeling method has been proposed to predict MI online, which involves a fuzzy neural network, the particle swarm optimization (PSO) algorithm, and an online correction strategy (OCS).

Neurocomputing 131 (2014) 59–76
journal homepage: www.elsevier.com/locate/neucom
0925-2312/$ - see front matter © 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.neucom.2013.10.039

* Corresponding author at: Chemical Engineering Department, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia.
E-mail address: [email protected] (M.A. Hussain).

Furthermore, an adaptive soft sensor based on systematic key process variables has also been proposed for inferential control using a derived adaptive model by Ma Ming et al. [6]. The key variables are selected by the statistical approach of stepwise linear regression. The online plant measurements are selected as key features to estimate tardily detected variables. The parameters of the linear inferential model are adapted as online and offline data become available. In order to improve the numerical characteristics of the algorithm, a square root filter is used because of the multi-collinearity problem involved. The soft sensor has been implemented on an o-xylene purification column. The inferential model accurately predicts the real plant data, which is useful for industrial application in the distillation column. The statistical stepwise regression technique was used to relate fast-measuring variables to some key variables so that the model is easy to maintain. By introducing the concept of adaptation, the model structure would reflect the current operation of the plant and the accuracy of the soft sensor could be improved.

In this respect, the Artificial Neural Network (ANN) offers an alternative, powerful and fast tool to model non-linear processes such as the debutanizer column, and can be utilized as an efficient soft sensor. An ANN has the ability to learn the relationship between the outputs and the inputs of a system. Developing a process model using an ANN requires a suitable network architecture and appropriate training data. The literature reports some work on debutanizer column modeling using neural networks. For example, a nonlinear state space model is used for representing the inputs and outputs, and singular value decomposition (SVD) is used to remove redundant nodes and for model reduction in the work of Prasad and Bequette [7].

The design of dynamic neural network soft sensors to improve product quality in a debutanizer column has also been reported by Fortuna and co-workers, using a three step predictive method to evaluate its top product concentration [8]. The approach uses lagged values of the input and composition in the neural network prediction. Real time estimates of plant variables such as the composition are used for monitoring purposes, and the number of neurons in the hidden layer of the neural network was determined by trial and error. An ANN estimator based on the Levenberg–Marquardt (LM) algorithm, tested for binary as well as multi-component mixtures, has been used by Singh and co-workers [9]. The LM algorithm suits both cases very well and gives more accurate and sensitive results compared with the Steepest Descent Back Propagation (SDBP) algorithm. For a complex chemical plant having hundreds of parameters, the LM approach works efficiently. By using these parameters, the quality of the product can be estimated and corrective actions taken simultaneously. ANN has also been utilized widely in the crude fractionation section of the oil refinery industry, where the neural network output is the naphtha temperature rather than a composition prediction, by Zilochian and Bawazir [10]. Neural networks have in fact been used for a number of chemical engineering applications involving sensor analysis, fault detection and nonlinear process control, both in simulation and online implementation, as reported in the literature by Hussain [11].

Partial least square regression (PLSR) together with an artificial neural network (ANN) with the back propagation (BP) algorithm has also been proposed by Xuefeng [12]. The neural networks were trained to extract the quantitative information from the training samples for a preflash tower. A Hybrid Artificial Neural Network (HANN) was employed to develop the naphtha dry point soft sensor, which is the most important intermediate product concentration soft sensor in the p-xylene (PX) oxidation reaction. An optimization framework to obtain optimal operation of dynamic processes under process-model mismatches has been developed by Mujtaba and Hussain [13]. In order to model these mismatches, neural networks have been utilized for a binary batch distillation process with only one specified product. In another work by Greaves and co-workers, a framework has been proposed to optimize the operation of a batch system and to utilize an artificial neural network (ANN) based process model in the optimization of a pilot-plant middle-vessel batch column [14]. The maximum-product problem is formulated and solved by optimizing the column operating parameters, such as the batch time, reflux and reboil ratios. The ANN based model was capable of reproducing the actual plant dynamics with good accuracy, and allows a large number of optimization studies to be carried out with little computational effort.

Partial Least Square (PLS), an extension of PCA, provides model parameters with diagnostic tools, and increasing the number of X variables could improve the precision of the PLS model [15]. In the literature there also exists some modeling work of a debutanizer column using PLS. For example, dynamic partial least square regression is used in the inferential model for composition prediction in a multicomponent distillation column by Kano et al. [16]. Measurements from past sampling times are used as input variables to capture the dynamics of the process. PLS was also used to predict the composition profile in a simulated batch distillation column by Zamprogna et al. [17]. The inputs are temperature measurements and the output is the composition in the distillate and bottom streams. The estimator performance is evaluated based on the pre-processing of the calibration and validation data sets. The measurements used as sensor inputs consist of lagged measurements. A simple augmentation of the conventional PLS regression approach is based on the development and sequential use of multiple regression models.

Nomenclature

A_t            actual value
x_measured     measured value
C_p            Pearson correlation coefficient
x_predicted    predicted value
D_i            product y_i × y_i
y_i            difference between actual and average actual value
E_a            actual value
E_p            predicted value
y_i            difference between predicted and average predicted value
Ē_a            average actual value
Ē_p            average predicted value
s^2            variance
F_t            predicted value
K              number of free model parameters
MSE            mean square error
N              number of observations
R^2            R squared
T              number of parameters

A soft sensor for a chemical process using PLS that could handle correlations among a number of process variables and nonlinearities based on the smoothness concept has also been proposed by Park and Han [18]. The proposed method builds a soft sensor for a distillation column based on multivariate smoothing by using locally weighted regression. Two different cases, nonlinear and linear behavior, were applied to the distillation column and used for online measurement to estimate important variables such as temperature and composition. Process monitoring using modified PLS through an independent component analysis (ICA) approach has also been developed by Zhang Yingwei and Zhang Yang [19]. The method applies a kernel to the ICA-PLS to handle the non-linearity in the data set, and the original algorithm is modified by giving the regression coefficient matrix and residual matrix to the ICA-PLS to reduce computation time. An application of PLS as a soft sensor has been developed to predict the melt flow index using measured process variables for an industrial autoclave reactor by Sharmin and co-workers [20]. Developing a detailed first principles model for free radical polymerization is not an easy task since a large number of reactions and kinetic parameters are involved. Multivariate regression models are used to solve this problem, and the melt index can be successfully predicted using these statistical tools.

A multivariate statistical soft sensor for online estimation of product quality in an industrial batch polymerization process has also been proposed by Facco et al. [21]. For each estimation, PLS sensors are designed, and their performance is evaluated against actual plant data. The estimations are evaluated by augmenting the process variables with lagged measurements. The projection method, using PLS regression, is used to design a soft sensor for the online estimation of the resin quality properties. Multivariate statistical (MVS) techniques have proven to be an excellent tool for analyzing and monitoring processes where the volume of process data is huge. An online soft sensor was proposed by Ge and Song using three different just-in-time learning (JITL) methods, which are based on PLS, support vector regression (SVR) and least squares support vector regression (LSSVR) [22]. The real time performance strategy is to enhance the online efficiency of the JITL based soft sensor for a distillation column. The JITL methods are suitable for real time performance. Modeling with SVR is not difficult because it only requires a quadratic programming optimization, and the efficiency could be improved by the LSSVR.

A least squares support vector machines (LS-SVM) soft-sensor model of the propylene polymerization process has been developed by Shi and Liu to infer the MI of polypropylene [23]. Considering that the use of a cost function without regularization might lead to less robust estimates, the weighted least squares support vector machines (weighted LS-SVM) approach for the propylene polymerization process is further proposed to obtain a robust estimation of the melt index. Reliable estimation of melt index (MI) for the production of polypropylene has also been proposed by Shi and coworkers [24]. The propylene polymerization process is highly nonlinear and characterized by a multi-scale nature, with a huge number of variables and information which are highly correlated and derived at different sample rates from different sensors. A novel soft-sensor architecture based on radial basis function networks (RBF) combining independent component analysis (ICA) as well as multi-scale analysis (MSA) is proposed to infer the MI of polypropylene from other process variables.

An RBF (radial basis function) neural network soft-sensor model for the polypropylene process has been developed by Li and Liu to infer the MI from a number of process variables [25]. Since the PP process is complicated for an RBF neural network with a general set of parameters, a new ant colony optimization (ACO) algorithm, N-ACO, and its adaptive version, A-N-ACO, are proposed to optimize the structure parameters of the RBF neural network. An optimal soft sensor, named the least squares support vector machines with Ant Colony-Immune Clone Particle Swarm Optimization (AC-ICPSO-LSSVM), has also been proposed by Jiang and coworkers, which combines the advantages of the high accuracy of LSSVM and the fast convergence of PSO [26]. Furthermore, the immune clone (IC) method is introduced into the PSO algorithm to make the particles of ICPSO diverse and to enhance global search capability, avoiding the premature convergence and local optimization of the conventional PSO algorithm.

Another novel chemical soft-sensor approach for the prediction of the melt index (MI) in the propylene polymerization industry has been developed by Jiang and co-workers, using an accurate optimal predictive model of the MI values with the relevance vector machine (RVM) method [27]. The RVM is employed to build the MI prediction model and a modified particle swarm optimization (MPSO) algorithm is introduced to optimize the parameters of the RVM, after which the MPSO-RVM approach is developed. An online correcting strategy (OCS) is further carried out to update the modeling data and to revise the model's parameters self-adaptively whenever model mismatch happens.

In this paper we demonstrate the use of a single ANN to predict the composition of n-butane for the top and bottom of a debutanizer column simultaneously and compare it with predictions using PLS and regression analysis. One of the significant and novel contributions of this work is the use of an equation based neural network model, whereas the other works mentioned previously utilize the neural network as a black box model only. The equation based neural network is more reliable and robust than the conventional method and at the same time gives better predictions than other methods such as PLS and regression analysis. This equation based approach is also a concrete, fast and practical way of utilizing neural network models as a soft sensor for this system. Furthermore, we utilize a combination of online data, both open loop and closed loop, as well as simulated data, and further validate these data using the closed loop system. This further enhances the reliability and online capability of the NN model when applying it online as a software sensor.

The paper is organized as follows. Section 2 describes the column and plant, Section 3 describes the theoretical background and Section 4 describes the methodology for the online composition prediction. Finally, Section 5 gives the overall analysis for the online composition prediction.

2. Description of Crude Oil Processing Plant and Debutanizer Column

The crude oil processing plant, as seen in Fig. 1, consists of a refinery process, condensate fractionation and a reforming aromatics section. The feedstock of the refinery process is mainly crude oil, while the products are petroleum products, liquefied petroleum gas, naphtha and low sulphur waxy residue. The refinery has two main process units, which are the Catalytic Reforming Unit (CRU) and the Crude Distillation Unit (CDU). The Crude Oil Terminal provides the feedstock and the crude oil is preheated using heat exchangers to within the range of 190 °C – 210 °C. It is then further heated in a furnace to 340 °C – 342 °C before being routed to the CDU. The crude oil is separated into a number of fractions, which are heavy Straight Run Naphtha as overhead vapour, untreated kerosene, straight run kerosene and straight run diesel. From the crude tower, there are three side-cut streams, which are drawn to a stripper column, and the stripper consists of a kerosene stripper, naphtha stripper and diesel stripper.

From the CDU, the pretreater feed, Heavy Straight Run Naphtha (HSRN), is mixed with hydrogen from the reformer, heated up to the reaction temperature using a heater and fed into the pretreater catalytic reactor. The reactions involved are denitrification and desulphurization, which protect the reformer catalyst from poisoning. The product from the reactor is transferred to the pretreater stripper, while the feed to the reforming unit is the bottom product of the stripper and the feed to the reformer reactors is the treated naphtha, which is heated to the reaction temperature. Effluent from the reactor is collected in a reformer separator where it is cooled. Some portion of the separated gas is recycled to the reactor feed stream while the other portion is transferred to an absorber. In the absorber, at the raw naphtha feed, hydrogen gas is purged and recycled to the pretreater heater. The feed into the LPG absorber is in the liquid phase, where it is drawn off and the liquid fraction is pumped into a stabiliser. Before being sent to storage, reformate is withdrawn from the stabiliser bottom for cooling. From the stabiliser reflux drum, overhead vapours from the stabiliser are cooled, condensed and recovered.

The debutanizer column is the main column for producing the main product, which is the liquefied petroleum gas. The debutanizer column is located at the CDU section depicted top right in Fig. 1. The unit is used to recover light gases and LPG from the overhead distillate before producing light naphtha. The light gases, mainly C2, are used to refine fuel gas and are mixed with LPG. The feed to the debutanizer column, which has 35 valve trays, is from the deethanizer bottom product. The debutanizer condenser condenses the overhead vapor, and the debutanizer overhead pressure control valves, with two split ranges, control the overhead system. The reflux from the top of the debutanizer consists of the collected condensed hydrocarbon, while the reboiler section is used to strip lighter hydrocarbons.

There are three manipulated variables for the column, which are the feed flow rate, reflux flow rate and reboiler flow rate. The feed flow rate controls the feed to the column, the debutanizer reboiler control valve controls the reboiler temperature, while the debutanizer bottom level controller controls the bottom product level. The debutanizer reflux control valve controls the ratio of the liquid and distillate flow rates at the top of the column. This column is a challenging process because it deals with non-linearity, is a highly multivariable process, involves a great deal of interaction between the variables and has lag in many of the control systems, all of which make it a difficult system to model by linear techniques. Hence non-linear methods such as the neural network equation based model are highly appropriate for this process. Table 1 outlines the column specification while Table 2 describes in detail all the variables surrounding the column. The measured variables are the Feed flow, Pressure 1 (Debutanizer receiver overhead pressure), Flow 2 (LPG flow to storage), Flow 1 (Light Naphtha flow to storage), Level 2 (Debutanizer condenser level), Level 1 (Debutanizer level) and Temp 5 (Reboiler outlet temperature to column). The top and bottom compositions of the column are currently measured using laboratory sampling by gas chromatography. Fig. 2 shows the column configuration of the debutanizer column under study in this work.

Table 1
Column specification.

Number of trays of the column      35
Feed tray - stage number           23
Type of tray used                  Valve
Column diameter                    1.3 m
Column height                      23.95 m
Condenser type                     Partial
Feed mass flowrate                 44106 kg hr^-1
Feed temperature                   113 °C
Feed pressure                      823.8 kPa
Overhead vapor mass flowrate       11286 kg hr^-1
Overhead liquid mass flowrate      5040 kg hr^-1
Condenser pressure                 823.8 kPa
Reboiler pressure                  853.2 kPa

Fig. 1. Block diagram for the oil refinery industry.

3. Theoretical background

The Artificial Neural Network (ANN) is a popular and reliable tool at present for problems involving the prediction of variables in engineering [14]. It comprises a great number of interconnected neurons arranged in a series of layers, each with a number of nodes. Every node receives signals from its network links, and the signals are added together before being applied to a specific transfer function to produce the output. The output signal is sent on to other nodes until it reaches the network output. Nodes, called neurons, are the basic processors of a neural network. Each connection between two nodes carries a real value called a weight, and the values of the weights are obtained by training on a set of input and output correlations. The weights are adapted by the learning rule and act as the long-term memory of the network.

The advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism that learns from observed data. However, using them is not so straightforward and a relatively good understanding of the underlying theory is essential. One of the main criteria is the choice of model, which depends on the representation of the data and the application. The second criterion is the learning algorithm, where there are numerous trade-offs between algorithms. Furthermore, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation to ensure the robustness of the selected model. If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust and give the correct implementation. It can be used naturally in online learning and large data set applications. However, the main argument against the widespread use of the neural network is that it is a black box model that can only be represented by the NN structure and is difficult to represent by algorithmic equations, which are cumbersome in nature. In this work, it is shown that by the appropriate use of the activation functions and with proper pruning of the weights, an equation based neural network model can be obtained and used in the prediction of the column compositions.

The general equation for the output from the neural network (for a 3 layer network) can be given as

$$y = f_i\big(LW_{3,i}\, f_i\big(LW_{2,i}\, f_i\big(IW_{1,i}\, p + b_1\big) + b_2\big) + b_3\big) \qquad (1)$$

where
$IW_{1,i}$ = input weight at layer 1 (input layer)
$b_1$ = bias values at layer 1
$LW_{2,i}$ = layer weight at layer 2 (hidden layer)
$b_2$ = bias values at layer 2
$LW_{3,i}$ = layer weight at layer 3 (output layer)
$b_3$ = bias values at layer 3
$p$ = vector inputs to the neural network
$y$ = vector outputs from the neural network
$f_i$ = activation function at layer i

Table 2
Description of the variables for the column.

Tag          Description                                        Units
Temp 1       Debutanizer top temperature                        °C
Temp 2       Debutanizer bottom temperature                     °C
Temp 3       Debutanizer receiver bottom temperature            °C
Temp 4       Light Naphtha temperature after condenser E 1      °C
Temp 5       Reboiler outlet temperature to column              °C
Temp 6       Debutanizer feed temperature                       °C
Level 1      Debutanizer level                                  %
Level 2      Debutanizer condenser level                        %
Level 3      Debutanizer level indicator                        %
Level 4      Condenser level indicator                          %
Flow 1       Light Naphtha flow to storage                      m3/hr
Flow 2       LPG flow to storage                                m3/hr
Pressure 1   Debutanizer receiver overhead pressure             kPa

Fig. 2. Debutanizer column configuration.

This equation based neural network model is more robust and stable than the black box model frequently used by researchers and practitioners, and it is the highlight of our research work in this paper.
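To make Eq. (1) concrete, the following is a minimal sketch of how a three layer network can be evaluated as a plain equation. The weight and bias values, the layer sizes, and the choice of tanh and linear activation functions are hypothetical and chosen only for illustration; they are not the trained values or the activation functions reported in this paper.

```python
import math

def mat_vec(matrix, vector):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def layer(weights, bias, inputs, activation):
    """One layer of Eq. (1): activation(weights * inputs + bias)."""
    z = mat_vec(weights, inputs)
    return [activation(v + b) for v, b in zip(z, bias)]

def network_output(p, IW1, b1, LW2, b2, LW3, b3,
                   f1=math.tanh, f2=math.tanh, f3=lambda v: v):
    """Nested evaluation y = f3(LW3 f2(LW2 f1(IW1 p + b1) + b2) + b3).

    The default activations (tanh, tanh, linear) are an assumption.
    """
    a1 = layer(IW1, b1, p, f1)
    a2 = layer(LW2, b2, a1, f2)
    return layer(LW3, b3, a2, f3)

# Hypothetical weights: 2 inputs, 2 nodes in layer 1, 1 node in layer 2,
# and 2 outputs (for example, top and bottom n-butane compositions).
IW1 = [[0.5, -0.2], [0.1, 0.3]]
b1 = [0.0, 0.1]
LW2 = [[1.0, -1.0]]
b2 = [0.05]
LW3 = [[2.0], [-1.0]]
b3 = [0.1, 0.2]

y = network_output([0.4, 0.6], IW1, b1, LW2, b2, LW3, b3)
print(y)
```

Once the weights and biases are fixed, the prediction is just this closed-form expression, so it can be evaluated in any environment without a neural network library.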

PLS regression is a method that generalizes and combines features from principal component analysis and multiple regression. This is very useful in data analysis for systems which are collinear and have incomplete variables. The precision of a PLS model is a function of the number of input variables. PLS is often useful in predicting a set of dependent variables (Y) from a large set of independent variables or predictors (X). PLS has been proven reliable in process monitoring and optimization prediction. A PLS interpretation could relate matrix-vector multiplication to a set of bivariate regressions. It provides the connection between two operations in matrix algebra and statistics. PLS has the ability to provide the foundation of a multivariable system. It could also demonstrate projection models as long as there is a similarity between the variables [15]. Based on PLS, the general regression equation is given as

$$Y = \bar{y} + X W^{*} C + F \qquad (2)$$

where $\bar{y}$ is the variable average for Y, $W^{*}C$ are the loading weights and F is the residual in Y.
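As an illustration of how Eq. (2) is used for prediction once a PLS model has been fitted, the sketch below evaluates the average of Y plus X W*C for a single observation, leaving out the residual F. The numerical values of the weights are hypothetical, and the sketch assumes that the X variables have already been pre-processed in the same way as the calibration data.

```python
def pls_predict(x, y_mean, W_star, C):
    """Predict Y for one observation x from the PLS regression equation.

    x       : list of m input variables (assumed already pre-processed)
    y_mean  : list of q response averages
    W_star  : m x k matrix of weights (one column per component)
    C       : k x q matrix of Y loadings
    The residual F is unknown at prediction time and is omitted.
    """
    n_components = len(W_star[0])
    # Scores t = x W*
    scores = [sum(x[j] * W_star[j][a] for j in range(len(x)))
              for a in range(n_components)]
    # Prediction = y_mean + t C
    return [y_mean[q] + sum(scores[a] * C[a][q] for a in range(n_components))
            for q in range(len(y_mean))]

# Hypothetical model with 2 inputs, 2 components and 2 responses.
W_star = [[0.5, 0.0],
          [0.0, 0.25]]
C = [[2.0, 1.0],
     [4.0, 0.0]]
y_mean = [0.1, 0.2]

prediction = pls_predict([1.0, 2.0], y_mean, W_star, C)
print(prediction)
```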

The disadvantage of PLS is that, with further increases in the size of the data sets, inadequacies start to appear in these multivariate methods, both in their efficiency and their interpretability. PLS coefficients are of interest because the model can be simplified when there are several components in it, but the disadvantage of the coefficients of the PLS equation is that information regarding the correlation structure among the responses is unknown.

Multivariate regression is the other conventional method used to obtain the relationship between the input variables, X, and the output variable, Y. Y can be predicted as a function of X by using an equation of the following form,

$$Y' = a + b_0 X_0 + b_1 X_1 + \dots + b_n X_n \qquad (3)$$

where $Y'$ is the predicted value of the Y variable, $a$ is the intercept and $b_0$ is the slope representing the predicted change in Y for a one unit increase in $X_0$ [28]. The performance of regression analysis methods in practice depends on the form of the data generating process and how it relates to the regression approach being used. Since the true form of the data generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes not testable if a large amount of data is to be utilised. Regression models for prediction are still useful even when the assumptions are moderately violated, although they may not perform optimally. However, the main disadvantage of these regression methods in many applications is that they could give misleading results when causality exists in the observational data.
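The regression equation, Eq. (3), can be illustrated with a short sketch. The `predict` function evaluates the equation for given coefficients, and `fit_single_predictor` shows the ordinary least squares solution for the simplest case of one input variable. The function names and the data are hypothetical; the paper does not state which software was used to fit its regression model.

```python
def predict(a, b, x):
    """Evaluate Y' = a + b0*X0 + b1*X1 + ... + bn*Xn."""
    return a + sum(bi * xi for bi, xi in zip(b, x))

def fit_single_predictor(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on the line Y = 1 + 2 X.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b0 = fit_single_predictor(xs, ys)
print(a, b0)
print(predict(a, [b0], [4.0]))
```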

4. Methodology

4.1. Model data generation

Although most of the online open loop responses from the plant surrounding the column are available, some of the open loop variables surrounding the column are not. In this work, dynamic simulation of a debutanizer column is performed using the plant process simulator HYSYS to obtain the data sets unavailable from the plant, where the variables that are not available are Temp 5, Pressure 1 and the composition at both ends of the column. The simulated closed loop response of the composition of n-butane at the top and bottom of the column was also established for comparison with the online closed loop data. The steady state for the column needs to be developed before the transition from the steady state to the dynamic state. Steady state simulations can be cast easily into dynamic simulations by specifying additional engineering details, including pressure/flow relationships and equipment dimensions. The necessary information such as feed conditions, feed compositions, reflux ratio, condenser pressure, reboiler pressure etc. has to be provided to the selected unit operation in the simulation. The simulation was performed using step tests similar to those in the plant to obtain the fluctuation of the process variables under open loop response, where the manipulated variables are the reboiler and reflux flow rates.

The data generated for the process are taken over 541 minutes with a 1 minute sampling interval, which amounts to a total of 5410 data points, as will be seen in later sections. The data available from the actual plant are large and therefore need to be screened by performing principal component analysis (PCA) and partial least square (PLS) analysis, through which the important variables for the column are obtained and used for monitoring the composition of n-butane. Table 2 outlines all variables surrounding the column. For each of the step tests, PCA is used to determine the important variables surrounding the column. Once the process variables have been determined, the important variables affecting the composition of n-butane are further analysed using PLS analysis. The raw process data generated are scaled to between 0.05 and 0.95 using the following equation:

$$\text{scaled value} = \left(\frac{\text{actual value} - \text{min value}}{\text{max value} - \text{min value}}\right) \times (0.95 - 0.05) + \text{min value} \qquad (4)$$

Hence the actual value is then given by,

$$\text{actual value} = \frac{(\text{scaled value} - \text{min value}) \times (\text{max value} - \text{min value})}{(0.95 - 0.05)} + \text{min value} \qquad (5)$$
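The scaling and its inverse can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `scale` and `unscale` and the sample readings are hypothetical, and the sketch assumes that the additive offset in the forward mapping is the lower bound 0.05 of the target range, so that the smallest raw value maps to 0.05 and the largest to 0.95.

```python
def scale(actual, min_value, max_value, low=0.05, high=0.95):
    """Map a raw value linearly from [min_value, max_value] to [low, high].

    Assumption: the offset added after scaling is `low` (0.05).
    """
    span = max_value - min_value
    if span == 0:
        raise ValueError("max_value and min_value must differ")
    return (actual - min_value) / span * (high - low) + low


def unscale(scaled, min_value, max_value, low=0.05, high=0.95):
    """Invert scale(): recover the raw value from a scaled one."""
    return (scaled - low) * (max_value - min_value) / (high - low) + min_value


# Hypothetical temperature readings in degrees C, for illustration only.
readings = [50.0, 56.0, 62.0]
raw_min, raw_max = min(readings), max(readings)
scaled = [scale(r, raw_min, raw_max) for r in readings]
restored = [unscale(s, raw_min, raw_max) for s in scaled]
```

Scaling each variable with its own minimum and maximum keeps all network inputs in a comparable range, and applying `unscale` to a prediction returns it to engineering units.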

4.2. Neural network, Partial least square (PLS) and Regression Analysis (RA) data sets

One of the objectives of this work is to develop online composition predictions using neural network, partial least square and regression analysis. The composition at the top and bottom of the column in the refinery is currently measured using normal laboratory sampling. Therefore neural network, PLS and RA are used as alternative online methods to predict the composition, as they are expected to produce more robust, stable and precise results in less time.

Open loop responses of the reboiler and reflux data sets, which include the composition of n-butane, are used to develop the dynamic neural network architecture. The selected input variables to the network, including the composition of n-butane, are time delayed since the models are dynamic in nature, and the outputs are the future predictions of the n-butane composition. Only 1 past value is considered for each input variable. This number is determined by trial and error, and it is found that one past value per variable gives the optimum performance and also reduces the complexity of the dynamic model. The type of dynamic network used for this case is the Nonlinear Autoregressive Network with Exogenous inputs (NARX), while the training algorithm used is the Levenberg-Marquardt method. In addition, the adaptation learning function with momentum is used and the performance function evaluated is the mean square error criterion.


The data sets are partitioned into 2 sets, classified as training and validation sets, with 65% of the data for training and 35% for validation. The network training and validation are carried out using the mean square error performance with a specified number of epochs (training cycles). The number of inputs to the network is 10, the number of outputs is 2, and the transfer function is linear for all layers.

The architecture consists of 3 layers, which are the input, hidden and output layers. The weight and bias values used in the neural network equation are obtained after training and validation of the neural network. The number of hidden nodes is selected by trial and error. The neural network is trained with an initial guess of 8 hidden nodes, and the number of hidden nodes is then increased in steps of 2 until it reaches 40. The Root Mean Square Error (RMSE) is monitored and the number of hidden nodes with the lowest RMSE is selected. Fig. 3 shows the profile of RMSE with the change in the number of hidden nodes in the hidden layer. Analysis of variance (ANOVA) for the NN is also done using the Statistics Toolbox in MATLAB with the F test statistic. In this work, the number of hidden neurons which gives optimum predictions of the outputs is found to be 10, as seen in Fig. 3.
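The trial and error search over the hidden layer size can be sketched as a simple loop. This is only an illustration under stated assumptions: `select_hidden_nodes` is a hypothetical helper, the candidate sizes 8, 10, ..., 40 follow the axis of Fig. 3, and `toy_rmse` is a made-up stand-in for "train the network with this many hidden nodes and return the validation RMSE", not data from the study.

```python
def select_hidden_nodes(validation_rmse, candidates=range(8, 41, 2)):
    """Return the candidate hidden-node count with the lowest RMSE.

    validation_rmse: callable that takes a hidden-node count and returns the
    RMSE obtained after training and validating a network of that size.
    """
    scores = {n: validation_rmse(n) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores


def toy_rmse(n_hidden):
    """Hypothetical RMSE curve with a minimum at 10 nodes (illustration only)."""
    return 0.0002 + 0.00001 * abs(n_hidden - 10)


best_nodes, rmse_by_size = select_hidden_nodes(toy_rmse)
```

In practice the callable would wrap the actual training routine, and the dictionary of scores corresponds to an RMSE profile like the one plotted against the number of hidden nodes.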

Table 3 shows the important variables involved for the neural network, where the open loop responses of the reboiler flow rate and reflux flow rate data sets are obtained from the plant and from simulation. The simulated data are the compositions of n-butane, and the rest of the variables are obtained from actual plant data. The inputs for the neural network are the variables from mv2(k) to p_bot(k-1), while the outputs are the variables p_top(k+1) and p_bot(k+1), as decided by data pretreatment using PCA and PLS as mentioned earlier.
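To make the input and output layout concrete, the following sketch assembles the ten lagged inputs and two one-step-ahead targets from five time series. The function name `build_narx_dataset` is hypothetical; the column order follows the variable list from mv2(k) to p_bot(k-1), and the sketch assumes all series are sampled at the same instants.

```python
def build_narx_dataset(mv2, mv3, f, p_top, p_bot):
    """Build rows [x(k), x(k-1)] for each series and targets at k+1.

    Each input row is ordered as
    mv2(k), mv2(k-1), mv3(k), mv3(k-1), f(k), f(k-1),
    p_top(k), p_top(k-1), p_bot(k), p_bot(k-1).
    Each target row is [p_top(k+1), p_bot(k+1)].
    """
    series = [mv2, mv3, f, p_top, p_bot]
    n = len(mv2)
    if any(len(s) != n for s in series):
        raise ValueError("all series must have the same length")
    inputs, targets = [], []
    # k needs one past sample (k-1) and one future sample (k+1).
    for k in range(1, n - 1):
        row = []
        for s in series:
            row.extend([s[k], s[k - 1]])
        inputs.append(row)
        targets.append([p_top[k + 1], p_bot[k + 1]])
    return inputs, targets


# Hypothetical five-sample series, for illustration only.
X, Y = build_narx_dataset(
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.01, 0.02, 0.03, 0.04, 0.05],
)
```

Because one sample is lost at each end of the record, a series of n samples yields n - 2 input and target pairs.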

Multivariate data are measured based on observations and variables from the step tests in the input variables, and the data generated for PLS are similar to the data generated for NN. The PLS analysis is performed using the multivariate software SIMCA-P. Two types of variables are classified, namely the primary variables and the observation variables. The primary variables consist of 10 variables surrounding the column, and the observation variables are the top and bottom n-butane compositions. Once the work set has been developed, the PLS model is fitted with the Partial Least Square equation, which involves the loading weight and residual in terms of the composition of n-butane and the average value of the composition of n-butane.

The data generated for Regression Analysis (RA) are also similar to the data generated for NN and PLS. The data for regression are analyzed using the data analysis tool in Excel. The important elements of the RA modeling are the ranges of inputs and outputs of the data analyzed, where the confidence level is set at 95%. Once all the required inputs and outputs are fed to the regression analysis, it calculates the predicted output, the equation for RA and the residual analysis. The regression is based on a multivariate linear equation, and the input variables are shown generally in Eq. 3 in terms of the X variable.

4.3. Model adequacy test for NN, PLS and RA models

The performance of the predictions by the different methods is compared using the Root Mean Square Error (RMSE), given by:

$$\mathrm{RMSE} = \sqrt{\frac{\sum \left(x_{\text{measured}} - x_{\text{predicted}}\right)^2}{N}} \qquad (6)$$

Correct Directional Change (CDC) measures the accuracy of a model in its prediction of the subsequent actual change of a predicted variable. The formula of CDC is given by:

$$\mathrm{CDC} = \frac{100}{N} \sum_{i}^{N} D_i \qquad (7)$$

where $D_i$ is defined as:

$$D_i = y_i \times \hat{y}_i$$

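The best known information criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are given by:

$$\mathrm{AIC} = \mathrm{MSE} + \sigma^2 \frac{2K}{T} \qquad (8)$$

$$\mathrm{BIC} = \mathrm{MSE} + \log(N)\, \sigma^2 \frac{2K}{T} \qquad (9)$$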

Fig. 3. Profile of the RMSE with respect to number of hidden nodes. (Plot "RMSE profile of n-butane": x-axis, number of hidden nodes, 8 to 40 in steps of 2; y-axis, RMSE, 0 to 0.0012; series: bottom, top.)

Table 3
Variables involved in the PLS analysis, regression analysis and neural network.

| Variable | Description |
|---|---|
| mv2 (k) | Manipulated reboiler flow rate |
| mv2 (k-1) | Lag mv2 |
| mv3 (k) | Manipulated reflux flow rate |
| mv3 (k-1) | Lag mv3 |
| f (k) | Debutanizer feed temperature |
| f (k-1) | Lag feed temperature |
| p_top (k) | Top composition n-butane |
| p_top (k-1) | Lag top composition |
| p_bot (k) | Bottom composition n-butane |
| p_bot (k-1) | Lag bottom composition |
| p_top (k+1) | Future predictions top composition n-butane |
| p_bot (k+1) | Future predictions bottom composition n-butane |


The coefficient of determination, which also measures the goodness of fit, is defined as:

$$R^2 = 1 - \frac{\sum_{t=L}^{T} \left(y_t - \hat{y}_t\right)^2}{\sum_{t=L}^{T} \left(y_t - \bar{y}_t\right)^2} \qquad (10)$$

Mean Absolute Percentage Error (MAPE) is a measure of the accuracy of fitted time series values, given by:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left|F_t - A_t\right|}{A_t} \times 100\% \qquad (11)$$

The Pearson Correlation Coefficient (Cp) measures the goodness of the regression fit; the closer the value is to one, the higher the accuracy. It is given by:

$$C_p = \frac{\sum_{j=1}^{NS} \left(E_{p,j} - \bar{E}_{p}\right)\left(E_{a,j} - \bar{E}_{a}\right)}{\sqrt{\sum_{j=1}^{NS} \left(E_{p,j} - \bar{E}_{p}\right)^2 \sum_{j=1}^{NS} \left(E_{a,j} - \bar{E}_{a}\right)^2}} \qquad (12)$$

Theil's Inequality Coefficient (TIC) evaluates the model through the difference between the model output and the actual output, which is treated as the error. It is given by:

$$\mathrm{TIC} = \frac{\sqrt{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}}{\sqrt{\sum_{i=1}^{N} y_i^2} + \sqrt{\sum_{i=1}^{N} \hat{y}_i^2}} \qquad (13)$$

Based on the statistical analysis described above, acceptable performance is judged by the deviation between the actual composition and the composition predicted by NN, PLS and RA, using the following criteria: low RMSE, CDC approaching 100, small AIC and BIC, R² approaching 1, low MAPE, Cp approaching 1 and low TIC. Eqs. 6–10 are obtained from [29], Eq. 11 from [30], Eq. 12 from [31] and Eq. 13 from [24].
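Several of these adequacy measures are straightforward to compute from paired lists of actual and predicted values. The sketch below covers RMSE, R², MAPE, the Pearson coefficient and TIC using only the Python standard library; the function names and the sample values are hypothetical, and CDC, AIC and BIC are left out because their exact form depends on details not reproduced here.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    n = len(actual)
    return 100.0 / n * sum(abs((p - a) / a) for a, p in zip(actual, predicted))


def pearson(actual, predicted):
    """Pearson correlation coefficient between actual and predicted."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((p - mean_p) * (a - mean_a) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def tic(actual, predicted):
    """Theil's inequality coefficient (0 means a perfect match)."""
    num = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    den = math.sqrt(sum(a * a for a in actual)) + math.sqrt(
        sum(p * p for p in predicted))
    return num / den


# Hypothetical mole fractions, for illustration only.
actual = [0.10, 0.12, 0.15, 0.18]
predicted = [0.11, 0.12, 0.15, 0.18]
summary = {
    "rmse": rmse(actual, predicted),
    "r2": r_squared(actual, predicted),
    "mape": mape(actual, predicted),
    "cp": pearson(actual, predicted),
    "tic": tic(actual, predicted),
}
```

A perfect prediction gives RMSE, MAPE and TIC of zero and R² and Cp of one, which is the direction of the acceptance criteria listed above.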

5. Results and Discussion

5.1. Step tests for reboiler flow rate

Figs. 4–7 show some of the step tests of the reboiler flow rate data sets. In order to generate the input-output data for the neural network training, various step changes are applied to the inputs to obtain the corresponding outputs, where the input for this system is the reboiler flow rate. The step tests of the reboiler flow rate, which is one of the manipulated variables, are generated by using multi amplitude rectangular pulses [32]. The step test is important for observing the effect on, and the fluctuations of, the process variables when the reboiler flow rate is changed. Temp 1, Flow 1 and Pressure 1 (see Figs. 4–6) increase and decrease as the reboiler flow rate changes. Level 1 (see Fig. 7) shows no response to the step changes in the reboiler flow rate, which indicates that level does not affect the composition of n-butane. The step test for the reflux flow rate, the other manipulated variable, is done in the same way, but only the step tests for the reboiler flow rate are shown in this paper.

5.2. Online closed loop composition validation and simulation

Figs. 8 and 9 show the differences between the online and simulated top and bottom compositions of n-butane in the column under normal operating conditions. The calculated Root Mean Square Error (RMSE) for the top and bottom compositions is 0.0251 and 0.0082 respectively, and the Mean Square Error for the top and bottom compositions is 0.00063 and $6.697 \times 10^{-5}$ respectively for n-butane. These results show that there is only a small deviation between the online and simulation data; the purpose of the closed loop response is to validate the simulation against the online data. Once the closed loop results have been verified against the simulation results, the open loop response for the variables that are not available from the plant, namely Temp 5, Pressure 1 and composition, can be obtained from simulation, based on the same step size of the manipulated variable as in the plant. The combined data, consisting of the plant and simulation data, are then used to develop the neural network model, represented by the equations shown in the next section. Similar data sets are also used to generate the PLS and regression models for comparison with the neural network predictions for the top and bottom n-butane compositions.

Fig. 4. Temp 1 Debutanizer top temperature. (Plot "Step test Temp 1": x-axis, time (min), 1 to 540; y-axes, reboiler flow rate (m3/hr), 140 to 145, and temperature (°C), 50 to 62; series: Reboiler.Flow, Temp 1.)

Fig. 5. Flow 1 Light Naphtha flow to storage. (Plot "Step test Flow 1": x-axis, time (min), 1 to 540; y-axes, reboiler flow rate (m3/hr), 140 to 145, and flow rate (m3/hr), 0 to 45; series: Reboiler.Flow, Flow 1.)
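As a quick consistency check on the values quoted for the online versus simulation comparison of Figs. 8 and 9, the mean square error should equal the square of the RMSE. Using the rounded RMSE values reported in this section:

```latex
\mathrm{MSE}_{\text{top}} \approx (0.0251)^2 \approx 6.30 \times 10^{-4},
\qquad
\mathrm{MSE}_{\text{bottom}} \approx (0.0082)^2 \approx 6.72 \times 10^{-5}
```

These agree with the reported 0.00063 and $6.697 \times 10^{-5}$ to within the rounding of the RMSE values.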

Fig. 6. Pressure 1 Debutanizer receiver overhead pressure. (Plot "Step test Pressure 1": x-axis, time (min), 1 to 540; y-axes, reboiler flow rate (m3/hr), 140 to 145, and pressure (kPa), 600 to 850; series: Reboiler.Flow, Pressure 1.)

Fig. 7. Level 1 Debutanizer level. (Plot "Step test Level 1": x-axis, time (min), 1 to 540; y-axes, reboiler flow rate (m3/hr), 140 to 145, and level (%), 50 to 80; series: Reboiler.Flow, Level 1.)

Fig. 8. Top composition n-butane closed loop. (Plot "Top composition n-butane": x-axis, time (min), 0 to 12000; y-axis, composition (mole fraction), 0 to 0.35; series: simulation, online.)

5.3. Neural network, PLS and RA modeling

5.3.1. Neural network equation-based model

As mentioned in Section 4.2, the final configuration of the neural network model obtained from the training and validation exercise is a 10-10-2 network. Applying the general Eq. (1) to this network with the linear activation function gives the following equation for the top and bottom composition prediction of n-butane, where y1 refers to the top composition and y2 refers to the bottom composition:

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = LW_{2,1}\, f_1\!\left[ IW_{1,1}\, p + b_1 \right] + b_2 \qquad (14)$$

$IW_{1,1}$ = input weight at layer 1 (input layer)
$b_1$ = biases value at layer 1
$LW_{2,1}$ = layer weight at layer 2 (hidden layer)
$b_2$ = biases value at layer 2

The values of the input weights $IW_{1,1}$, the layer weights $LW_{2,1}$, and the biases $b_1$ and $b_2$ obtained after validation are given in Appendix A.

Here p is the input to the neural network, which for this case study is given by the vector

$$p = \begin{bmatrix} mv2(k) & mv2(k-1) & mv3(k) & mv3(k-1) & f(k) & f(k-1) & p_{top}(k) & p_{top}(k-1) & p_{bot}(k) & p_{bot}(k-1) \end{bmatrix}^{T}$$

On applying the values of the respective weights and biases of the validated optimum neural network model to Eq. (14), and with further pruning of the small values, we obtain the following equation to represent the neural network model for the composition prediction:

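$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} -0.29 & 0.15 & 0.37 & 0.23 & 0.38 & 0.40 & -0.50 & 0.97 & 0.12 & -0.31 \\ -0.09 & 0.006 & 0.31 & -0.10 & 0.02 & -0.019 & -0.42 & -0.12 & 0.36 & -0.08 \end{bmatrix} p + \begin{bmatrix} -0.28 \\ -0.22 \end{bmatrix} \qquad (15)$$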

Eq. (15) is obtained by simplifying the general Eq. (1), considering only the hidden layer with input weights $IW_{1,1}$ and the output layer with layer weights $LW_{2,1}$.

Initially the input weight matrix $IW_{1,1}$ is multiplied by the input vector p and added to the bias vector $b_1$. Since the activation function $f_1$ is linear (unity), the resulting vector is then multiplied by the layer weight $LW_{2,1}$ and added to the bias at layer 2, $b_2$. By pruning out the small resulting values, the equation is simplified to the version in Eq. (15).
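The reason a network with linear activations reduces to a single matrix equation can be checked numerically: LW (IW p + b1) + b2 equals W p + b with W = LW·IW and b = LW·b1 + b2. The sketch below verifies this for a 10-10-2 network using random placeholder weights; these are not the Appendix A values, and all helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def collapse_linear_network(iw, b1, lw, b2):
    """Fold a two-layer linear network into one affine map y = W p + b."""
    w = matmul(lw, iw)
    b = [t + c for t, c in zip(matvec(lw, b1), b2)]
    return w, b


def forward_two_layer(iw, b1, lw, b2, p):
    """Evaluate LW (IW p + b1) + b2 layer by layer (f1 is the identity)."""
    hidden = [h + c for h, c in zip(matvec(iw, p), b1)]
    return [o + c for o, c in zip(matvec(lw, hidden), b2)]


# Random placeholder weights for a 10-10-2 network (NOT the Appendix A values).
rng = random.Random(0)
iw = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(10)]
b1 = [rng.uniform(-1, 1) for _ in range(10)]
lw = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(2)]
b2 = [rng.uniform(-1, 1) for _ in range(2)]
p = [rng.uniform(0.05, 0.95) for _ in range(10)]

w, b = collapse_linear_network(iw, b1, lw, b2)
y_layers = forward_two_layer(iw, b1, lw, b2, p)
y_collapsed = [v + c for v, c in zip(matvec(w, p), b)]
max_difference = max(abs(u - v) for u, v in zip(y_layers, y_collapsed))
```

The collapsed matrix has 2 rows and 10 columns, the same shape as the coefficient matrix of the equation-based model, and the two evaluations differ only by floating point round-off.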

Eq. (15) is a multi input multi output, equation based representation of the neural network model for composition prediction of the debutanizer column. This equation is robust in nature and can easily be used for online estimation of the composition in the column, without having to resort to the complex structure of the neural network, which is normally difficult to use in an online measurement system.
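Because the model is a single affine equation, evaluating it online needs only a few multiplications and additions. The sketch below applies the coefficient matrix and bias of Eq. (15) to an input vector; the function name `predict_composition` is hypothetical, and the sketch assumes the ten inputs are supplied in the order of the vector p and on the same scale that was used for training.

```python
# Coefficient matrix and bias vector as printed in Eq. (15).
W = [
    [-0.29, 0.15, 0.37, 0.23, 0.38, 0.40, -0.50, 0.97, 0.12, -0.31],
    [-0.09, 0.006, 0.31, -0.10, 0.02, -0.019, -0.42, -0.12, 0.36, -0.08],
]
B = [-0.28, -0.22]


def predict_composition(p):
    """Return [y1, y2] = W p + B for a 10-element input vector p.

    p is ordered as mv2(k), mv2(k-1), mv3(k), mv3(k-1), f(k), f(k-1),
    p_top(k), p_top(k-1), p_bot(k), p_bot(k-1). Assumption: inputs are
    already scaled the same way as the training data.
    """
    if len(p) != 10:
        raise ValueError("p must contain exactly 10 values")
    return [sum(w * x for w, x in zip(row, p)) + b for row, b in zip(W, B)]


# With an all-zero input the outputs reduce to the bias terms.
y_zero = predict_composition([0.0] * 10)
```

Here y1 is the predicted top composition and y2 the predicted bottom composition, both one step ahead.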

5.3.2. PLS model

After validation, the PLS equation for prediction of the n-butane top composition is given as

$$Y_{1,PLS} = 0.1335 + \begin{bmatrix} mv2(k) \\ mv2(k-1) \\ mv3(k) \\ mv3(k-1) \\ f(k) \\ f(k-1) \\ p\_top(k) \\ p\_top(k-1) \\ p\_bot(k) \\ p\_bot(k-1) \end{bmatrix}^{T} \begin{bmatrix} 0.07 \\ -0.07 \\ -0.06 \\ 0.06 \\ -0.06 \\ -0.11 \\ 0.06 \\ -0.01 \\ 0.68 \\ -0.83 \end{bmatrix} + \begin{bmatrix} -0.003 \\ 0.0007 \\ -0.0006 \\ -0.001 \\ -0.001 \\ -0.0007 \\ -0.0004 \\ -0.000076 \\ 0.0003 \\ 0.001 \\ 0.002 \\ 0.004 \\ 0.018 \\ 0.004 \\ \vdots \\ -0.0003 \end{bmatrix} \qquad (16)$$

and the PLS equation for prediction of the n-butane bottom composition is given as

$$Y_{2,PLS} = 0.05276 + \begin{bmatrix} mv2(k) \\ mv2(k-1) \\ mv3(k) \\ mv3(k-1) \\ f(k) \\ f(k-1) \\ p\_top(k) \\ p\_top(k-1) \\ p\_bot(k) \\ p\_bot(k-1) \end{bmatrix}^{T} \begin{bmatrix} 0.002 \\ -0.0007 \\ -0.0012 \\ 0.0004 \\ -0.004 \\ 0.002 \\ 0.06 \\ 0.17 \\ 1.64 \\ -0.073 \end{bmatrix} + \begin{bmatrix} -0.004 \\ -0.004 \\ -0.004 \\ -0.003 \\ -0.002 \\ -0.001 \\ -0.0011 \\ -0.0002 \\ 0.001 \\ 0.002 \\ 0.002 \\ 0.001 \\ -0.0002 \\ -0.0017 \\ \vdots \\ 0.0012 \end{bmatrix} \qquad (17)$$

The F residual for the PLS equation consists of 301 data points for each of the top and bottom compositions.

Fig. 9. Bottom composition n-butane closed loop. (Plot "Bottom composition n-butane": x-axis, time (min), 0 to 12000; y-axis, composition (mole fraction), 0 to 0.10; series: simulation, online.)


5.3.3. Regression model

For the regression model, the equations for the top and bottom n-butane predictions are given below:

$$Y_{1,RA} = 0.0008\, mv2(k) - 0.0007\, mv2(k-1) + 0.0004\, mv3(k) - 0.0006\, mv3(k-1) - 0.0011\, f(k) + 0.0019\, f(k-1) + 1.01\, p\_top(k) - 0.051\, p\_top(k-1) + 0.002\, p\_bot(k) - 0.01\, p\_bot(k-1) - 0.078 \qquad (18)$$

$$Y_{2,RA} = 0.0019\, mv2(k) - 0.0018\, mv2(k-1) - 0.002\, mv3(k) + 0.001\, mv3(k-1) + 0.004\, f(k) - 0.006\, f(k-1) + 0.30\, p\_top(k) - 0.23\, p\_top(k-1) + 0.81\, p\_bot(k) - 0.059\, p\_bot(k-1) + 0.27 \qquad (19)$$
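The two regression equations are ordinary linear combinations, so they can be evaluated with a dot product and an intercept. The sketch below encodes the coefficients of Eqs. (18) and (19); the names `RA_TOP`, `RA_BOTTOM` and `ra_predict` are hypothetical, and the sketch assumes the inputs are given in the same order and units as in the regression fit.

```python
# (coefficients, intercept) as printed in Eqs. (18) and (19).
# Coefficient order: mv2(k), mv2(k-1), mv3(k), mv3(k-1), f(k), f(k-1),
# p_top(k), p_top(k-1), p_bot(k), p_bot(k-1).
RA_TOP = ([0.0008, -0.0007, 0.0004, -0.0006, -0.0011, 0.0019,
           1.01, -0.051, 0.002, -0.01], -0.078)
RA_BOTTOM = ([0.0019, -0.0018, -0.002, 0.001, 0.004, -0.006,
              0.30, -0.23, 0.81, -0.059], 0.27)


def ra_predict(x):
    """Return (top, bottom) n-butane predictions for a 10-element input x."""
    if len(x) != 10:
        raise ValueError("x must contain exactly 10 values")
    results = []
    for coefficients, intercept in (RA_TOP, RA_BOTTOM):
        results.append(sum(c * v for c, v in zip(coefficients, x)) + intercept)
    return tuple(results)


# With an all-zero input each prediction reduces to its intercept.
top_zero, bottom_zero = ra_predict([0.0] * 10)
```

Keeping the coefficients in plain lists makes it easy to compare the regression model term by term with the PLS and neural network equations.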

5.3.4. Analysis of variance (ANOVA) results for neural network model

5.3.4.1. Top composition. From Table 4, the adjusted R² is smaller than the R² value since the number of cases is relatively small and the number of predictor variables is relatively large. There is a total of 301 sample observations. The regression sum of squares is calculated to be 0.0906 and the total sum of squares is calculated to be 0.0917. The multiple R is calculated as the square root of the ratio between these 2 values. The multiple R is proportional to the total variance in the actual and predicted values. The standard error is the ratio of the standard deviation to the square root of the number of observations. The degree of freedom (df) is the difference between the sample size and the number of groups, with a confidence level of 95%.

The sum of squares (SS) consists of regression, residual and total components. It is explained by the difference between each group mean and the overall mean. The mean squares (MS) are obtained from the ratio of the sum of squares (SS) to the degrees of freedom (df). The F value is obtained from the ratio of the MS of regression to the MS of residual. From the ANOVA outlined in Table 4, the F value obtained is 2562. This indicates that the between-groups estimate is about 2562 times the within-group estimate. The standard deviation (σ) may also be determined from the MS of residual, and the σ value is $1.88 \times 10^{-3}$.
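The relations between the ANOVA quantities can be sketched directly: MS = SS / df, F = MS of regression / MS of residual, and σ is the square root of the MS of residual. The helper name `anova_summary` is hypothetical, and the inputs below are the rounded top composition values from Table 4, so the results only approximate the tabulated F value and standard deviation.

```python
import math


def anova_summary(ss_regression, df_regression, ss_residual, df_residual):
    """Compute mean squares, F value and residual standard deviation."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f_value": ms_regression / ms_residual,
        "sigma": math.sqrt(ms_residual),
    }


# Rounded sums of squares and degrees of freedom for the top composition.
top = anova_summary(ss_regression=0.0906, df_regression=10,
                    ss_residual=0.0010, df_residual=290)
```

With these rounded inputs the F value comes out near 2600 and σ near 1.9 × 10⁻³, close to the tabulated figures; the small differences come from the rounding of the sums of squares.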

5.3.4.2. Bottom composition. Table 5 also shows that the R² value is greater than the adjusted R², because the number of cases is small and the number of predictor variables is large. The sample observations consist of 301 data points. From the ANOVA in Table 5, the F value is 127. This indicates that the between-groups estimate is about 127 times the within-group estimate. The significance F value is very small, which indicates that the population means differ. The F value is larger than 1.83, which indicates that all the variables involved in the composition prediction are important and related to each other. The standard deviation σ can also be determined from the MS of the residual and has the value of $6.05 \times 10^{-3}$.

The ANOVA for the top and bottom compositions is used to test the hypothesis relating the actual and predicted values of the n-butane composition. The F test in ANOVA provides a single test of the hypothesis that all the population means are equal. Here the F test is used to assess the differences between two groups, namely the regression and the residual.

5.4. Comparison of NN, PLS and RA

Fig. 10 shows the observed versus predicted values of the top composition of n-butane as predicted by the neural network equation. It is apparent that all the points fall close to the 45 degree line. The calculated RMSE for the NN equation is $6.6 \times 10^{-7}$, and the R² of one indicates an excellent fit of the data. Fig. 11 shows the composition line plot of the actual values and the neural network equation for the n-butane top composition. Fig. 12 shows the observed versus predicted values of the bottom composition of n-butane from the NN equation. It is apparent that all the points fall close to the 45 degree line. The calculated RMSE for the NN equation is $3.88 \times 10^{-7}$. Fig. 13 shows the composition line plot of the actual values and the neural network equation for the n-butane bottom composition. The CDC value is calculated to be 26.33 for the top composition and 100 for the bottom composition, where a high CDC value indicates better prediction. The regression value of R for the top and bottom compositions is 1, and thus the predicted values closely match the actual values. The Cp values for the bottom and top compositions are calculated to be 1, and the MAPE values for the top and bottom compositions are calculated to be 0.0005 and 0.00132 respectively. The TIC values for the bottom and top compositions are calculated to be $3.56 \times 10^{-6}$ and $2.45 \times 10^{-6}$ respectively.

Fig. 14 shows the observed versus predicted values of the top composition of n-butane using the PLS equation. Most of the points fall close to the 45 degree line. The calculated RMSE for the PLS equation is 0.002 with an R² of 0.9851, but the scatter of the data points around the regression line is an indication of poor prediction. Fig. 15 shows the composition plot of the actual values and the PLS equation for the n-butane top composition. Fig. 16 shows the observed versus predicted values of the bottom composition of n-butane from the PLS equation. The calculated RMSE for the PLS equation is 0.0059 and the value of R² is 0.8117. Again, the scatter of the data points around the regression line is an indication of poor prediction by the PLS equation. Fig. 17 shows the composition plot of the actual values and the PLS equation for the n-butane bottom composition. The CDC value is calculated to be 17.66 for the top composition and 56.66 for the bottom composition. The regression value of R for the top and bottom compositions is 0.99 and 0.9 respectively. The Cp values for the bottom and top compositions are calculated to be 0.9 and 0.99 respectively, and the MAPE values for the top and bottom compositions are calculated to be 0.034 and 0.97. The TIC values for the bottom and top compositions are calculated to be $5.51 \times 10^{-2}$ and $7.9 \times 10^{-3}$ respectively.

Table 4
ANOVA of the n-butane top composition for NN model.

| Regression statistics | Value |
|---|---|
| Multiple R | 0.9943 |
| R Square | 1.00 |
| Adjusted R Square | 0.9884 |
| Standard Error | 0.0018 |
| Observations | 301 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 10 | 0.0906 | 0.0090 | 2562.012 | 2.3449E-276 |
| Residual | 290 | 0.0010 | 3.5399E-06 | | |
| Total | 300 | 0.0917 | | | |

Table 5
ANOVA of n-butane bottom composition for NN model.

| Regression statistics | Value |
|---|---|
| Multiple R | 0.9526 |
| R Square | 1.00 |
| Adjusted R Square | 0.9383 |
| Standard Error | 0.0060 |
| Observations | 301 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 10 | 0.0467 | 0.0046 | 127.565 | 5.7343E-100 |
| Residual | 290 | 0.0106 | 3.6617E-05 | | |
| Total | 300 | 0.0573 | | | |

Fig. 18 shows the observed versus predicted values of the n-butane top composition using the regression analysis equation. Most of the data points fall close to the 45 degree line, but with more scatter than in the neural network case. The calculated RMSE for the regression equation is 0.0021 and the value of R² is 0.9888. Fig. 19 shows the composition plot of the actual values and the RA equation for the n-butane top composition. Fig. 20 shows the observed versus predicted values of the n-butane bottom composition using the regression analysis equation. The points are scattered, as shown in the figure, which indicates poor prediction by the RA equation. The calculated RMSE for the regression equation is 0.0064 and the value of R² is 0.8148. Fig. 21 shows the composition plot of the actual values and the RA equation for the n-butane bottom composition. The CDC value is calculated to be 17.33 for the top composition and 56.66 for the bottom composition. The Cp values for the bottom and top compositions are calculated to be 0.89 and 0.99 respectively. The MAPE values for the top and bottom compositions are calculated to be 0.058 and 2.67. The TIC values for the RA prediction of the bottom and top compositions are calculated to be $5.46 \times 10^{-2}$ and $6.86 \times 10^{-3}$ respectively.

Fig. 10. Prediction versus actual value neural network equation top composition n-butane. (Plot "NN equation top composition n-butane": x-axis, actual composition (mole fraction), 0.1 to 0.2; y-axis, predicted composition (mole fraction), 0.1 to 0.2; R² = 1.)

Fig. 11. Prediction and actual value for top composition n-butane. (Plot "Neural network prediction top composition n-butane": x-axis, time (min), 0 to 300; y-axis, composition (mole fraction), 0 to 0.25; series: Actual, NN eq.)

Fig. 12. Prediction and actual value equation based neural network bottom composition n-butane. (Plot "NN equation bottom composition n-butane": x-axis, actual composition (mole fraction), 0 to 0.09; y-axis, predicted composition (mole fraction), 0 to 0.09; R² = 1.)

The Akaike information criterion (AIC) relates the squared residual to the number of free model parameters. Its purpose is to weigh the error of the model against the number of parameters. The BIC is similar to the AIC except that it is motivated by Bayesian model selection principles. The AIC values depend on the mean square error, the variance, the number of free model parameters and the number of parameters. The BIC values depend on the mean square error, the variance, the number of observations, the number of free model parameters and the number of parameters. The AIC and BIC of the NN prediction are calculated to be 2572 and 2555 respectively for the top composition, and 1957 and 1942 respectively for the bottom composition. The AIC and BIC of the PLS prediction are calculated to be 2573 and 2558 respectively for the top composition, and 2073 and 2059 respectively for the bottom composition. The AIC and BIC of the RA prediction are calculated to be 2580 and 2560 respectively for the top composition, and 2074 and 2058 respectively for the bottom composition. These values can be seen in Table 6, which shows that the neural network equation, with smaller AIC and BIC values, still gives the optimum prediction even with slightly more parameters in its formulation.

Fig. 13. Prediction and actual value for bottom composition n-butane. (Plot "Neural network prediction bottom composition n-butane": x-axis, time (min), 0 to 300; y-axis, composition (mole fraction), 0 to 0.09; series: Actual, NN eq.)

Fig. 14. Prediction versus actual value equation based PLS top composition n-butane. (Plot "PLS equation top composition n-butane": both axes 0.1 to 0.2; y-axis, predicted composition (mole fraction); R² = 0.9851.)

(Plot "PLS prediction top composition n-butane": x-axis, time (min), 0 to 300; y-axis, composition (mole fraction), 0 to 0.25; series: Actual, PLS eq.)

From the statistical analysis outlined in Table 6, the NN equation gives better prediction of the n-butane composition than the PLS equation and the RA equation, as the calculated RMSE is small, CDC is high, R² is close to 1, MAPE is close to 0, Cp is close to 1 and TIC is close to zero. The CDC values for NN are larger than those for PLS and RA, and a high CDC indicates that the direction of the subsequent actual change of the predicted variable is predicted accurately. The R and Cp values for NN show the optimum performance, as the neural network prediction matches the actual data. The MAPE values indicate that the NN prediction is the optimum, as its values are closest to 0, whereas the MAPE values for PLS and RA are larger. For a perfect fit, MAPE is zero. The percentage error calculated for MAPE is used to compare the error of the fitted time series. The MAPE is determined from the difference between the actual value and the predicted value divided by the actual value. The absolute value of this ratio is summed over every fitted point in time and divided by the number of fitted points. The TIC values indicate that the prediction by NN is the best, as its TIC values are small compared to those of PLS and RA. These statistical analyses show that the prediction by the proposed NN model gives optimum performance, better than the other conventional methods.

Fig. 16. Prediction versus actual value equation based PLS bottom composition n-butane. (Plot "PLS equation bottom composition n-butane": x-axis, actual composition (mole fraction), 0 to 0.09; y-axis, predicted composition (mole fraction), 0 to 0.08; R² = 0.8117.)

(Plot "PLS prediction bottom composition n-butane": x-axis, time (min), 0 to 300; y-axis, composition (mole fraction), 0 to 0.09; series: Actual, PLS eq.)

Fig. 18. Prediction versus actual value equation based RA top composition n-butane. (Plot "RA equation top composition n-butane": both axes 0.1 to 0.2; y-axis, predicted composition (mole fraction); R² = 0.9888.)

(Plot "Regression prediction top composition n-butane": x-axis, time (min), 0 to 300; y-axis, composition (mole fraction), 0 to 0.25; series: Actual, RA eq.)

Fig. 20. Prediction versus actual value equation based RA bottom composition n-butane. (Plot "RA equation bottom composition n-butane": x-axis, 0 to 0.09; y-axis, predicted composition (mole fraction), 0 to 0.08; R² = 0.8148.)

(Plot "Regression prediction bottom composition n-butane": x-axis, time (min), 0 to 300; y-axis, composition (mole fraction), 0 to 0.09; series: Actual, RA eq.)



The differences in computing time between these approaches are shown in Table 7, where the NN model takes less than 5 seconds to compute, which is faster than the PLS method (45 seconds) and the RA method (1 minute). Hence it is suitable for online measurement, since the industrial method takes more than 1 day to analyse and compute.

5.5. Residual analysis

Figs. 22 and 23 show the residuals of the neural network equation, PLS equation and regression analysis equation for the top and bottom compositions of n-butane respectively. From the plots, the residual of the neural network equation is smaller than those of the PLS equation and the RA equation. This shows that the neural network is able to predict the top and bottom compositions of n-butane with high accuracy and small error compared to PLS and RA.

Residual analysis is very important for evaluating the deviation between the actual and predicted values for all three models.

6. Conclusion

This paper presents the prediction of the composition of n-butane at the top and bottom of a debutanizer column using theequation based neural network model which is then compared toother methods such as PLS and regression analysis. All of theresults gives optimum results in predicting the n-butane composi-tions but it can be concluded that NN equation gives the best n-butane prediction compared to other models based on thestatistical analyses. This proposed equation based NN model isuseful for online composition prediction since it is robust, versatilewith fast computing time and hence can be easily applied as a softsensor for the distillation column. It could also easily be further

Table 6Statistical analysis of NN equation, PLS equation and RA equation for top and bottom n-butane predictions.

NN eq PLS eq RA eq

rmse_bottom 3.88E-07 0.0059 0.0064rmse_top 6.6E-07 0.0020 0.0021CDC_bottom 100 56.66 56.66CDC_top 26.33 17.66 17.33R_bottom 1 0.90 0.89R_top 1 0.99 0.99AIC_bottom 1957.26 2073.63 2074.26AIC_top 2572.72 2573.78 2580.29BIC_bottom 1942.43 2059.8 2058.44BIC_top 2555.89 2558.96 2560.46MAPE_bottom 0.00132 0.97 2.67MAPE_top 0.0005 0.034 0.058Cp_bottom 1 0.90 0.89Cp_top 1 0.99 0.99TIC_bottom 3.56E-06 5.51E-02 5.46E-02TIC_top 2.45E-06 7.90E-03 6.86E-03

Table 7
Computing time.

                  NN eq        PLS eq        RA eq
Computing time    5 seconds    45 seconds    1 minute

Fig. 22. Residual analysis for neural network equation, PLS equation and regression analysis equation top composition n-butane. (Plot title: Residual analysis top composition equation NN, PLS and RA. Horizontal axis: time (min), 0 to 300. Left axis: residual composition (mole fraction). Right axis: NN residual composition (mole fraction). Legend: PLS, RA, NN.)


applied as an inverse controller in equation form, especially for nonlinear systems where linear controllers are not able to perform successfully. The proposed model based NN method is also easier to visualize and apply in various applications than the black box neural network structure, which is cumbersome and non-portable in nature. Furthermore, it is a MIMO based model that can predict both the top and bottom compositions through a single vector equation.

Acknowledgment

The authors would like to acknowledge PETRONAS for providing the required data and information for the research and University Malaya for providing the research grant (PS107/2010B).

Appendix A

See Table A1.

References

[1] J.F. Canete, S. Gonzalez-Perez, P. Saz-Orosco, Artificial Neural Network Identi-fication and Control of a Lab-Scale Distillation Column using LABVIEW,International Journal of Intelligent Systems and Technologies 3 (2008)111–116.

[2] M.M. Zhang, X.G. Liu, A soft sensor based on adaptive fuzzy neural networkand support vector regression for industrial melt index prediction, Chemo-metrics and Intelligent Laboratory Systems 126 (2013) 83–90.

[3] Z.B. Yan, X.G. Liu, Soft sensing and optimization of pesticide waste incinerator,Asia Pacific Journal of Chemical Engineering 7 (2012) 635–641.

[4] J. Shi, X.G. Liu, Melt prediction by neural soft-sensor based on multi scaleanalysis and principal component analysis, Chinese Journal of ChemicalEngineering 13 (2005) 849–852.

[5] X.G. Liu, C.Y. Zhao, Melt index prediction based on fuzzy neural network andPSO algorithm with online correction strategy, American Institute ChemicalEngineering Journal 58 (2012) 1194–1202.

[6] K.o. Ma Ming-Da, Wang Jing-Wei, San-Jang, Wu Ming-Feng, Jang Shi Shang,Shieh Shyan-Shu,Wong David Shan-Hill, Development of adaptive soft sensorbased on statistical identification of key variables, Control EngineeringPractice 17 (2009) 1026–1034.

[7] V. Prasad, B. Wayne Bequette, Nonlinear system identification and modelreduction using artificial neural networks, Computer and Chemical Engineer-ing 27 (2003) 1741–1754.

[8] L. Fortuna, S. Graziania, M.G. Xibilia, Soft sensors for product quality monitor-ing debutanizer distillation columns, Control Engineering Practice 13 (2005)499–508.

[9] V. Singh, I. Gupta, H.O. Gupta, ANN-based estimator for distillation usingLevenberg Marquardt approach, Engineering Applications of Artificial Intelli-gence 20 (2007) 249–259.

[10] A. Zilochian, K. Bawazir, Application of Neural Network in Oil Refineries, CRCPress, 2001. (Chapter 7).

[11] M.A. Hussain, Review of the application of neural networks in chemicalprocess control – simulation and online implementation, Artificial Intelligencein Engineering 13 (1999) 55–68.

[12] Y. Xuefeng, Hybrid artificial neural network based on BP-PLSR and itsapplication in development of soft sensors, Chemometrics and IntelligentLaboratory Systems 103 (2010) 152–159.

[13] I.M. Mujtaba, M.A. Hussain, Optimal Operation of Dynamic Processes UnderProcess-Model Mismatches: Application to Batch Distillation, ComputersChemical Engineering 22 (1998) 621–624.

[14] M.A.I.M. Greaves, I.M. Mujtaba, M. Barolo, A. Trotta, M.A. Hussain, NeuralNetwork approach to dynamic optimization of batch distillation Application toa Middle-vessel Column, Trans IChemE 81 (2003) 393–401.

[15] L. Eriksson, E. Johansson, N. Kettaneh-Wold, J. Trygg, C. Wilstrom, S. Wold,Multi and Megavariate Data Analysis Part I Basic Principles and Applications,2nd edition, Umetrics Academy, 2006.

Fig. 23. Residual analysis for neural network equation, PLS equation and regression analysis equation bottom composition n-butane. (Plot title: Residual analysis bottom composition equation NN, PLS and RA. Horizontal axis: time (min), 0 to 300. Left axis: residual composition (mole fraction). Right axis: NN residual composition (mole fraction). Legend: PLS, RA, NN.)

Table A1
Input weight and biases value for n-butane with partition.

Input weight 1,1 for the first layer (columns 1 to 10); b1 = biases at layer 1 (last column)

-0.86  -0.82   0.98  -0.09   0.34   0.97   0.96   0.04   0.65  -0.16  -0.11
-0.55   0.51   0.36  -0.08   0.97  -0.62   0.23   0.15  -0.36  -0.13   0.53
 0.23  -0.40   0.11  -0.16   0.21  -0.19  -0.45   0.34   0.63   0.71   0.43
 0.70  -0.59   0.30  -0.17   0.81  -0.13   0.95   0.39  -0.56  -0.40  -0.10
 0.74  -0.22  -0.72   0.34   0.63  -0.77  -0.62  -0.77  -0.30   0.50  -0.20
 0.95   0.05  -0.95   0.31   0.91   0.77   0.08   0.68   0.81  -0.10   0.37
-0.81   0.59   0.44   0.45  -0.43  -0.50   0.35  -0.80   0.36   0.86  -0.32
-0.25   0.17   0.25   0.16  -0.24   0.72  -0.89   0.84   0.13  -0.93   0.86
-0.65   0.92  -0.47   1.01  -0.07  -0.71  -0.37   0.64  -1.08   0.14  -0.84
 0.13   0.11   0.96  -0.07   0.85  -0.63  -0.73  -0.82   0.77   0.66  -0.57

Layer weight 2,1 for the second layer (columns 1 to 10); b2 = biases at layer 2 (last column)

 0.28  -0.17   0.07  -0.11  -0.50   0.16  -0.63   0.04   0.67   0.57   0.29
-0.09   0.24   0.18   0.02   0.24  -0.07   0.35   0.55  -0.37  -0.08  -1.08
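
The weights and biases of Table A1 can be turned into an explicit two-layer vector equation, sketched below. Several points are assumptions rather than facts taken from the table: the stray quotation marks in the extracted table are read as minus signs, the last value in each row is read as the bias, the hidden layer is assumed to use a hyperbolic tangent activation with a linear output layer, the ten inputs are assumed to be already scaled as they were during training, and it is not stated which of the two outputs is the top composition and which is the bottom. The names `LAYER1`, `LAYER2`, `affine` and `predict` are hypothetical.

```python
import math

# Each row: ten first-layer weights followed by the bias b1 (assumed layout).
LAYER1 = [
    [-0.86, -0.82, 0.98, -0.09, 0.34, 0.97, 0.96, 0.04, 0.65, -0.16, -0.11],
    [-0.55, 0.51, 0.36, -0.08, 0.97, -0.62, 0.23, 0.15, -0.36, -0.13, 0.53],
    [0.23, -0.40, 0.11, -0.16, 0.21, -0.19, -0.45, 0.34, 0.63, 0.71, 0.43],
    [0.70, -0.59, 0.30, -0.17, 0.81, -0.13, 0.95, 0.39, -0.56, -0.40, -0.10],
    [0.74, -0.22, -0.72, 0.34, 0.63, -0.77, -0.62, -0.77, -0.30, 0.50, -0.20],
    [0.95, 0.05, -0.95, 0.31, 0.91, 0.77, 0.08, 0.68, 0.81, -0.10, 0.37],
    [-0.81, 0.59, 0.44, 0.45, -0.43, -0.50, 0.35, -0.80, 0.36, 0.86, -0.32],
    [-0.25, 0.17, 0.25, 0.16, -0.24, 0.72, -0.89, 0.84, 0.13, -0.93, 0.86],
    [-0.65, 0.92, -0.47, 1.01, -0.07, -0.71, -0.37, 0.64, -1.08, 0.14, -0.84],
    [0.13, 0.11, 0.96, -0.07, 0.85, -0.63, -0.73, -0.82, 0.77, 0.66, -0.57],
]

# Each row: ten second-layer weights followed by the bias b2 (assumed layout).
LAYER2 = [
    [0.28, -0.17, 0.07, -0.11, -0.50, 0.16, -0.63, 0.04, 0.67, 0.57, 0.29],
    [-0.09, 0.24, 0.18, 0.02, 0.24, -0.07, 0.35, 0.55, -0.37, -0.08, -1.08],
]


def affine(rows, x):
    """Compute W x + b, where each row holds the weights followed by the bias."""
    return [sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1] for row in rows]


def predict(x):
    """Return the two network outputs for ten scaled inputs.

    Assumes tanh hidden units and a linear output layer.
    """
    if len(x) != len(LAYER1[0]) - 1:
        raise ValueError("expected 10 scaled inputs")
    hidden = [math.tanh(v) for v in affine(LAYER1, x)]
    return affine(LAYER2, hidden)


# Example call with an all-zero (scaled) input vector.
outputs = predict([0.0] * 10)
```

Written this way, both compositions come from one evaluation of y = W2 tanh(W1 x + b1) + b2, which is the sense in which a single vector equation gives a MIMO model.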


[16] M. Kano, K. Miyazaki, S. Hasebi, I. Hashimoto, Inferential control system ofdistillation compositions using dynamic partial least squares regression,Journal of Process Control 10 (2000) 157–166.

[17] E. Zamprogna, M. Barolo, D.E. Seborg, Estimating product composition profile inbatch distillation via partial least square, Chemical Engineering Practice 12 (2004)917–929.

[18] S. Park, C. Han, A nonlinear soft sensor based on multivariate smoothingprocedure for quality estimation in distillation columns, Computer andChemical Engineering 24 (2000) 871–877.

[19] Zhang Zhang Yingwei, Yang, Complex monitoring using modified partial leastsquare method of independent component regression, Chemometrics andIntelligent Laboratory Systems 98 (2009) 143–148.

[20] R. Sharmin, U. Sundararaj, S. Shah, L.V. Griend, Y.J. Sun, Inferential sensors forestimation of polymer quality parameters: Industrial application of a PLS basedsoft sensor for a LDPE plant, Chemical Engineering Science 61 (2006) 6372–6384.

[21] P. Facco, F. Doplicher, F. Bezzo, M. Barolo, Moving average PLS soft sensor foronline product quality estimation in an industrial batch polymerizationprocess, Journal of Process Control 19 (2009) 520–529.

[22] Song Zhihuan Ge Zhiqiang, A comparative study of just in time learning basedmethods for online soft sensor modeling, Chemometrics and IntelligentLaboratory Systems 104 (2010) 306–317.

[23] J. Shi, X.G. Liu, Melt index prediction by weighted least square support vectormachines, Journal of Applied Polymer Science 101 (2006) 285–289.

[24] J. Shi, X.G. Liu, Y.X. Sun, Melt index prediction by neural network based onindependent component analysis and multi scale analysis, Neurocomputing70 (2006) 280–287.

[25] J.B. Li, X.G. Liu, Melt index prediction by RBF neural network optimized withan adaptive new ant colony optimization algorithm, Journal of AppliedPolymer Science 119 (2011) 3093–3100.

[26] H.Q. Jiang, Z.B. Yan, X.G. Liu, Melt index prediction using optimized leastsquare support vector machines based on hybrid particle swarm optimizationalgorithm, Neurocomputing 119 (2013) 469–477.

[27] H.Q. Jiang, Y.D. Xiao, J.B. Li, X.G. Liu, Prediction of melt index based onrelevance vector machine with modified particle swarm optimization, Che-mical Engineering and Technology 35 (2012) 819–826.

[28] R.M. Warner, Applied Statistics, Sage Publication, 2008.[29] Ramli Siti Aizura, Study of Neural Network for Heat exchanger with develop-

ment of graphical user interface, Thesis Universiti, Teknologi PETRONAS, 2006.[30] J. Wan, M. Huang, Y. Ma, W. Guo, Y. Wang, H. Zhang, W. Li, X. Sun, Prediction of

effluent quality of a paper mill wastewater treatment using an adaptive network-based fuzzy inference system, Applied Soft Computing 11 (2011) 3238–3246.

[31] R. Sharma, K. Singh, D. Singhal, R. Ghosh, Neural network applications fordetecting process faults in packed towers, Chemical Engineering and Proces-sing 43 (2004) 841–847.

[32] J.S. Lim, M.A. Hussain, M.K. Aroua, Control of a hydrolyzer in an oleochemicalplant using network based controllers, Neurocomputing 73 (2010) 3242–3255.

Nasser Mohamed Ramli is a PhD student in the Chemical Engineering Department, Faculty of Engineering, University of Malaya. He obtained his bachelor’s degree in chemical engineering from Loughborough University, United Kingdom and his master’s degree from University of Queensland, Australia. His area of research is in artificial intelligence, process modeling and control.

Dr Mohd Azlan Hussain joined the Department of Chemical Engineering, University of Malaya in 1987 as a lecturer and obtained his Ph.D in Chemical Engineering from Imperial College, London in 1996. He is a member of the American Institute of Chemical Engineers and British Institute of Chemical Engineers. At present he holds the post of Professor in the Department of Chemical Engineering. His main research interests are in modelling, process control, nonlinear control systems analysis and applications of artificial intelligence techniques in engineering systems. He has published more than 250 papers in book chapters, journals and conferences within these areas. He has also published and edited a book on “Application of Neural Networks and other learning Technologies in Process Engineering”, published by Imperial College Press in 2001.

Dr Badrul Mohamed Jan, SPE is a researcher and academic lecturer attached to the Department of Chemical Engineering, University of Malaya, Malaysia. He holds BS, MS and PhD degrees in petroleum engineering from New Mexico Institute of Mining and Technology. Jan’s research areas and interests include the development of super lightweight completion fluid for underbalance perforation, ultra low interfacial tension microemulsion for enhanced oil recovery, and conversion of palm oil mill effluent into super clean fuel for diesel replacement. He has worked closely with industry on oil and gas projects, with companies such as 3M Asia Pacific and BCI Chemical Corporation. He has also published numerous technical conference and journal papers. Jan is the deputy director of University Malaya Center of Innovation & Commercialization. His responsibilities include providing an environment at the University of Malaya conducive to researchers bringing their research outputs to a commercialization-ready level.

Dr Bawadi Abdullah is a Senior Lecturer in the Chemical Engineering Department, Faculty of Engineering, Universiti Teknologi PETRONAS. He is also a Professional and a Chartered Engineer. He obtained his bachelor’s degree in chemical engineering from University of Wales, Swansea, United Kingdom and his master’s degree from Dalhousie University, Canada. He obtained his PhD degree from University of New South Wales, Australia. He teaches undergraduate courses such as Transport Phenomena, Chemical Engineering Thermodynamics and Chemical Analysis. His area of research is reaction engineering.

