

    Soft-sensor Development for Hydrocracker Product

    Quality Prediction

    Paula Sofia Lourenço Barbosa

    Thesis to obtain the Master of Science Degree in

    Chemical Engineering

    Supervisors: Professor Dr. Carla Isabel Costa Pinheiro (I.S.T. Portugal)

    Eng. Dora Luísa Rodrigues Moura Nogueira (GalpEnergia S.A.)

    Examination Committee

Chairperson: Professor Dr. Sebastião Manuel Tavares da Silva Alves

    Supervisor: Professor Dr. Carla Isabel Costa Pinheiro

    Member of the Committee: Professor Dr. José Monteiro Cardoso de Menezes

    June 2014


    Man replies:

    You created night, I the lamp

    You created clay, and I the cup

    You-desert, mountain peak and valley

    I-flower bed, park and orchard

    It is I who grind a mirror out of stone

    And brew elixir from poison

    Excerpt from ‘Dialog Between Man and God’ by Muhammad Iqbal.

Resumo

With the aim of maximizing its production capacity, in addition to maximizing the yield of each barrel of oil, the Sines Refinery invested in the installation of a hydrocracking unit. Since all the fuels produced are subject to strict regulation, tight control must be exercised over their quality. Therefore, with the objective of implementing advanced control on the unit, a first approach to predicting the quality of the Y produced was carried out using a soft-sensor.

To develop the soft-sensor for predicting the quality of Y, the unit was studied, variables of interest were selected, and their historical data were collected and analysed. Step tests were also performed on the real industrial plant to gain a better knowledge of the dynamics and behaviour of the fractionator. A multivariate analysis was then carried out using Principal Components Analysis, followed by Partial Least Squares regression, to obtain a linear model able to predict the quality of Y as well as possible.

Four models (A, B, C and D) were built using different datasets. These models were good detectors of process faults, because they included in their equations the variables whose values differed greatly from their historical data. All of these models followed the process dynamics and gave good predictions of the quality variable Y, with Model C giving the best predictions and being the best choice to be implemented in the DCS system as an inferential sensor providing real-time predictions of the quality variable Y.

Keywords: PCA, PLS, Multivariate Analysis, Hydrocracking, Quality prediction, Soft-Sensor.


    Abstract

In order to maximize its production capacity and the revenue from each oil barrel, the GalpEnergia Sines Refinery has invested in a hydrocracking unit. Given that all fuels are subject to strict regulation, tight control over their quality is necessary. Therefore, as a first step towards implementing advanced control on the unit, this work develops a soft-sensor to predict a quality variable of the diesel produced.

To develop the soft-sensor for quality prediction, variables of interest and their historical data were collected and analyzed. Step tests were performed in the real industrial plant in order to better understand the dynamic behaviour of the fractionator.

Four soft-sensors were developed using Principal Components Analysis followed by Partial Least Squares regression, yielding linear models capable of quality prediction. The soft-sensors developed were good detectors of process faults because they included the faulty variables in their predictions.

All soft-sensors followed the process dynamics and gave good predictions of the quality variable Y. Model C presents the best predictions and is the best choice to be implemented in the DCS system as an inferential sensor, providing real-time information on the Y prediction to the operators and also serving control purposes.

    Keywords: PCA, PLS, Multivariate Analysis, Hydrocracking, Quality prediction, Soft-Sensor.


Acknowledgements

Many people guided and helped me along this journey. To all of them I owe my thanks, an increased respect and the certainty that what I learned during this period will be very useful throughout my professional career.

I would first like to thank my supervisors, Professor Carla Pinheiro and Engineer Dora Nogueira, for all their immense teaching and tireless motivation throughout this work. The support they gave me in the most stressful periods, and the calm and patience with which they guided me during the development of this thesis, deserve all my gratitude and respect, making them the example to follow in my future career.

Next, I would like to thank Engineer José Roque for the opportunity to do an internship at Galp and for all the support he made available, and also Engineer Cristina Ângelo for her availability to clarify questions regarding some of the software used.

I owe special thanks to all the operational and technical teams of Fábrica III at the Sines Refinery. I would like to thank Engineer Hugo Carabineiro for the great opportunity to carry out tests on the Hydrocracking unit and for all the time he spent answering questions, for all the feedback, the opinions and the data he provided and, above all, for the welcome and the way he kindly made me feel comfortable, from the very first minute, in an environment that was unknown to me. I would also like to thank Mr. Manuel dos Santos, not only for the warm welcome at the Refinery, but also for providing information that helped me understand the hydrocracking process and the operation of the refinery. I also have to thank him for his endless motivation during this thesis and for the good humour with which he gave up his time to help me with this work. I would also like to mention the support of Engineers Eurico Correia and António Pinto, who gave their time and knowledge during the planning and execution of the tests on the Fractionator. I cannot forget the commitment, dedication and help of the control-room shift leaders, Mr. Paulo Azevedo and Mr. Joaquim Santiago, who closely followed all the tests performed. I would also like to thank the console operators, Mr. Mário Oliveira and Mr. Jorge Elias, for their intense and tireless dedication to the tests, for what they taught me and also for the excellent and warm welcome at their workplace.

Besides these professionals directly connected to the area of study of this work, I also have to thank the colleagues who shared the day-to-day with me on the 10th floor of Galp's Torre C. Their enormous friendliness and good humour made the work environment light and pleasant.

All these excellent professionals showed me that Galp Energia is worth much more for its Human Capital than for its annual profits.

I cannot forget some excellent colleagues from Técnico, among them João Pedro Ferraz, Mafalda Lancinha, Inês Lino, Juliana Mota, Marisa Pardal, Sara Bernardo and Ana Paias, among others, whose companionship and friendship helped me through the most difficult moments at IST and who built moments of pure joy with me. I deeply thank them all for every moment we shared.

Last but never least, I would like to thank my parents for all their words of motivation and also for all their dedication and love; without their constant dedication I would not have been able to finish the degree. To my brother in particular, I am grateful for his example of hard work and commitment, as well as for the twisted sense of humour he shares with me.

To everyone named here, without exception, I want to express once again my most heartfelt

Thank You.


Contents

1. Introduction
1.1 Energy Supply and Demand
1.2 Demand by Sector
1.3 Market Trends
1.4 Industry Profile
1.5 Thesis Outline
2. Hydrocracking Process Overview
3. State of the Art
3.1 Soft-Sensors for Industrial Processes
3.2 Soft-Sensor Methodology
3.3 Data Driven Methods for Soft-sensing
3.3.1 Principal Components Analysis (PCA)
3.3.2 Partial Least Squares (PLS)
4. Implementation of Step Tests
4.1 Step Tests Planning
4.1.1 Historical Data Analysis and Variable Selection
4.1.2 Sensitivity Analysis
4.2 Step Tests Results
5. Soft-sensor Development
5.1 Model A
5.1.1 Principal Components Analysis
5.1.2 Partial Least Squares
5.1.3 Model Calibration
5.1.4 Model Validation
5.2 Model B
5.2.1 Principal Components Analysis
5.2.2 Partial Least Squares
5.2.3 Model Calibration
5.2.4 Model Validation
5.3 Model C
5.3.1 Principal Components Analysis
5.3.2 Partial Least Squares
5.3.3 Model Calibration
5.3.4 Model Validation
5.4 Model D
5.4.1 Principal Components Analysis
5.4.2 Partial Least Squares
5.4.3 Model Calibration
5.4.4 Model Validation
5.5 Model Results Summary
5.6 Soft-Sensor Fault Detection
6. Conclusion
7. Future Work
8. Bibliography/References


List of Figures

Figure 1.1 - World supply of primary energy. [1]
Figure 1.2 - Percentage shares of oil demand by sector in 2010 and 2035. [1]
Figure 1.3 - Global product demand, 2012 and 2035. [1]
Figure 1.4 - Crude prices in US dollars; these include Saharan Blend, Girassol, Oriente, Iran Heavy, Basra Light, Kuwait Export, Es Sider, Bonny Light, Qatar Marine, Arab Light, Murban and Merey. [2]
Figure 1.5 - Global capacity requirements by process. [1]
Figure 1.6 - Crude products; source: Skrebowski, Energy Institute Oil Depletion Conference, 2008.
Figure 4.1 - Graphical User Interface for Petro-SIM™ after the fractionator was built and modelled.
Figure 4.3 - Laboratory results for Y during step tests.
Figure 5.1 - Eigenvalues and cross-validation RMSECV curves for Model A data.
Figure 5.2 - Scores plots: Q residuals vs Hotelling T2 (top) and confidence ellipse on the PC1 vs PC2 plot (bottom).
Figure 5.3 - Correlation map for Model A.
Figure 5.4 - Loadings plot PC1 vs PC2 for Model A.
Figure 5.5 - RMSECV Y (Y) vs number of LVs plot for Model A.
Figure 5.6 - PLS scores plots for Model A: Q residuals vs Hotelling's T2 (left) and scores on LV1 vs scores on LV2 (right).
Figure 5.7 - Model A calibration results.
Figure 5.8 - Parity plot of the calibration step.
Figure 5.9 - Model A validation results.
Figure 5.10 - Parity plot of the validation step of Model A.
Figure 5.11 - Eigenvalues and cross-validation RMSECV curves for Model B data.
Figure 5.12 - Scores plots: Q residuals vs Hotelling T2 (top) and confidence ellipse on the PC1 vs PC2 plot (bottom).
Figure 5.13 - Eigenvalues and cross-validation RMSECV curves for Model B data (without outliers).
Figure 5.14 - Scores plots: Q residuals vs Hotelling T2 (top) and confidence ellipse on the PC1 vs PC2 plot (bottom), for Model B, without outliers.
Figure 5.15 - Correlation map for Model B (without outliers).
Figure 5.16 - Loadings plot PC1 vs PC2 for Model B.
Figure 5.17 - RMSECV Y (Y) vs number of LVs plot for Model B.
Figure 5.18 - PLS scores plots for Model B: Q residuals vs Hotelling's T2 (left) and scores on LV1 vs scores on LV2 (right).
Figure 5.19 - Calibration results for Model B.
Figure 5.20 - Parity plot of the calibration step of Model B.
Figure 5.21 - Validation results for Model B.
Figure 5.22 - Parity plot of the validation step of Model B.
Figure 5.23 - Eigenvalues and cross-validation RMSECV curves for Model C data.
Figure 5.24 - Scores plots for Model C.
Figure 5.25 - Eigenvalues and cross-validation RMSECV curves for Model C data (without outliers).
Figure 5.26 - Scores plots for Model C (without outliers).
Figure 5.27 - Correlation map for Model C (without outliers).
Figure 5.28 - Loadings map for Model C.
Figure 5.29 - RMSECV Y (Y) vs number of LVs plot for Model C.
Figure 5.30 - PLS scores plots for Model C: Q residuals vs Hotelling's T2 (left) and scores on LV1 vs scores on LV2 (right).
Figure 5.31 - Calibration results for Model C.
Figure 5.32 - Parity plot of the calibration step of Model C.
Figure 5.33 - Validation results for Model C.
Figure 5.34 - Parity plot of the validation step of Model C.
Figure 5.35 - Eigenvalues and cross-validation RMSECV curves for Model D data.
Figure 5.36 - Scores plots for Model D.
Figure 5.37 - Eigenvalues and cross-validation RMSECV curves for Model D data (without outliers).
Figure 5.38 - Scores plots for Model D (without outliers).
Figure 5.39 - Correlation map for Model D (without outliers).
Figure 5.40 - Loadings map for Model D.
Figure 5.41 - RMSECV Y vs number of LVs plot for Model D.
Figure 5.42 - PLS scores plots for Model D: Q residuals vs Hotelling's T2 (left) and scores on LV1 vs scores on LV2 (right).
Figure 5.43 - Calibration results for Model D.
Figure 5.44 - Calibration parity plot for Model D.
Figure 5.45 - Validation results for Model D.
Figure 5.46 - Validation parity plot for Model D.
Figure 5.47 - Model A validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.48 - Model B validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.49 - Model C validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.50 - 'Corrected' Model A validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.51 - 'Corrected' Model B validation for the dataset 1st of November 2013 to January 13th of 2014.
Figure 5.52 - 'Corrected' Model C validation for the dataset 1st of November 2013 to January 13th of 2014.


List of Tables

Table 4.1 - Simulation results for the negative step tests.
Table 4.2 - Simulation results for the positive step tests.
Table 4.3 - Simulation results for both positive and negative tests in variable X22.
Table 4.4 - Fractionator step tests scheduling.
Table 4.5 - Y magnitude of variation for each step test on variables X3, X10, X12 and X9.
Table 4.6 - Y magnitude of variation for each step test on variables X13 and X8.
Table 4.7 - Y magnitude of variation for each step test on variable X22.
Table 5.2 - PCA results obtained for Model A data.
Table 5.3 - PLS results of Model A data.
Table 5.4 - PCA results for Model B.
Table 5.5 - PCA results for Model B (without outliers).
Table 5.6 - PLS results for Model B data.
Table 5.7 - PCA results for Model C.
Table 5.8 - PCA results for Model C (without outliers).
Table 5.9 - PLS results for Model C.
Table 5.10 - PCA results for Model D.
Table 5.11 - PCA results for Model D (without outliers).
Table 5.12 - PLS results for Model D.
Table 5.13 - Model results summary.
Table 5.14 - Performance criterion VAF for the modelling results.


    Abbreviations

    ANN Artificial Neural Networks

    b/d barrels of oil per day

DCS Distributed Control System

    LV Latent Variable

mb/d million barrels of oil per day

mboe/d million barrels of oil equivalent per day

    MSE Mean Square Error

    MSEP Mean Square Error of Prediction

    NFS Neuro-Fuzzy Systems

    OECD Organization for Economic Co-operation and Development

    OPEC Organization of the Petroleum Exporting Countries

    PC Principal Component

    PCA Principal Components Analysis

    PLS Partial Least Squares

    RMSEC Root-Mean-Square Error of Calibration

    RMSECV Root-Mean-Square Error of Cross-Validation

    RTDB Real Time Database

    SVM Support Vector Machines

    VAF Variance Accounted For

    VGO Vacuum Gas Oil


    Nomenclature

    Symbol Description

    b Inner linear regression coefficient

    E Principal Components Analysis residual matrix

    F Partial Least Squares residual matrix

    m Number of matrix columns

    n Number of matrix rows

    N Number of Samples

    p Input loading vector

    pT Transpose input loading vector

    P Loading matrix from Matrix X decomposition

    PT Transpose Matrix of P

    q Output loading vector

    qT Transpose output loading vector

    Q Loading matrix from matrix Y decomposition

    QT Transpose Matrix of Q

    R2 Coefficient of Determination

    s Number of splits

    t Input score vector

    T Score matrix from matrix X decomposition

    u Output score vector

uT Transpose output score vector

    U Score matrix from matrix Y decomposition

x Laboratory analysis

x̂ Model prediction

x̄ Mean value of the laboratory analysis

X Input matrix

Y Output Matrix


    1. Introduction

Living without energy seems, nowadays, unthinkable. The need for energy comes from the need for comfort, which includes heating, technology and the ability to move and travel. Most of the technological world we live in today would not exist in the absence of fuels, and most particularly in the absence of fossil fuels such as petroleum, coal and natural gas.

1.1 Energy Supply and Demand

Fossil fuels accounted for 82% of energy supply in 2010 and will still represent 80% of the global total in 2035. In 2010, oil demand was 81.2 mboe/d, accounting for 32.2% of fuel shares (figure 1.1); coal demand was 69.8 mboe/d, with 27.7% of the shares; and gas supply was 54.8 mboe/d, with 21.7% of fuel shares (the remaining 18.4% is distributed between nuclear, hydro, biomass and other renewables). The prediction for 2035 is that oil demand will be 100.2 mboe/d, accounting for 26.3% of fuel shares, coal demand will be 104.0 mboe/d (27.2% of fuel shares) and gas demand will be 99.8 mboe/d, accounting for 26.0% of fuel shares. By 2035, global oil use per head will average just 3.2 barrels, up from 2.4 in 2010 [1].

Figure 1.1 - World supply of primary energy. [1]


1.2 Demand by Sector

Of all the sectors of oil consumption, the transportation of people and goods (road, aviation, rail and marine transport) is the main use of oil; the other sectors are the petrochemical industry, the agricultural/commercial/residential sector and electricity generation. In 2010, transportation accounted for 52% of all oil use, and the prediction for 2035 is 60% (figure 1.2). Furthermore, transportation is the main driver of the overall increase in oil consumption, and this increase is often stimulated by demographic changes, higher wealth levels, increasing urbanization and other factors, all of which lead to greater passenger car ownership. Although car ownership will grow in OECD (Organization for Economic Co-operation and Development) countries, the major pull in car demand will come from developing Asian countries and China, with the latter having the biggest oil demand growth. Between 2010 and 2035 the number of cars in OECD countries will rise by 125 million, but in China alone the rise is far more dramatic, at about 380 million cars. The overall car park in 2035 will be 1.9 billion cars, more than double the 2010 figure. As for global oil demand for transportation, demand in 2010 was 34.6 mboe/d and the prediction for 2035 is 44.6 mboe/d. [1]

Figure 1.2 - Percentage shares of oil demand by sector in 2010 and 2035. [1]

Although air traffic is expected to rise, it has been somewhat crippled by the financial crisis in the OECD, since OECD countries account for 72% of aviation oil demand, and the global recession made aviation oil consumption in 2010 smaller than in 2000. These facts show how closely aviation oil demand is linked to economic activity. Furthermore, it is also heavily influenced by jet fuel prices. Demand for aviation oil was 5 mboe/d in 2010 and predictions point to 7.2 mboe/d in 2035, with developing countries leading the growth in demand.

Oil supply in non-OPEC (Organization of the Petroleum Exporting Countries) countries was 46.4 mb/d in 2012 and is estimated at 45.9 mb/d by 2035, passing through a peak of 50.3 mb/d in 2020. As for OPEC countries, supply in 2012 was 31 mb/d and it is estimated that supply in 2035 will be 37 mb/d.

Refined product demand, in the particular case of the middle distillates, was 32.3 mb/d in 2012 (accounting for 36.3% of all distillate products) and is estimated to reach 44.1 mb/d in 2035 (40.7% of all distillate products), the biggest growth rate of all the distillates, with diesel oil showing the largest growth (figure 1.3). [1]

Figure 1.3 - Global product demand, 2012 and 2035. [1]

1.3 Market Trends

Crude prices have increased dramatically since the mid-2000s, especially since 2006. The reason for this increase may be the rapid growth of Asian economies, which is sustained by large quantities of oil consumption. In 2008, the US faced the longest recession since the Great Depression and its oil demand fell sharply, causing crude prices to decline. Knowing this, OPEC decided to also decrease production by the end of that year. This decrease in production, together with the continuing demand in China, had a positive effect on the price, and prices rose steadily from the middle of 2009. After 2011, prices surpassed those of the previous peak in 2008 because of the civil war and loss of production in Libya, and they continued to increase due to unrest in Middle Eastern and North African countries.

Figure 1.4 - Crude prices in US dollars; these include Saharan Blend, Girassol, Oriente, Iran Heavy, Basra Light, Kuwait Export, Es Sider, Bonny Light, Qatar Marine, Arab Light, Murban and Merey. [2]

1.4 Industry Profile

Refining capacity has usually been measured by distillation capacity, but nowadays capacity for conversion and product quality improvement plays the vital role in processing raw crude fractions into more valued products, especially now that the trend is towards higher demand for lighter products with more restrictive quality specifications. All new refinery projects have high levels of desulphurization and secondary processing, giving them the ability to produce high yields of light, clean products that comply with the most advanced specifications. Moreover, these new projects are designed to be able to refine heavy, low-quality crudes as well as better-quality grades of crude. The prediction for 2035 is that conversion capacity will increase more than distillation capacity (figure 1.5). Within the conversion projects, hydrocracking will have the highest growth, because hydrocracking is the primary means of producing incremental distillate once the straight-run fractions from crude have been maximized. [1]


    Figure 1.5 - Global Capacity requirements by process. [1]

With the rise in crude prices comes the need to make the most of the oil barrel, especially now that a barrel is traded at about US$105. Figure 1.6 shows the percentage yields of the products obtained from a crude barrel. Since the oil price is rising and demand for middle distillates is growing, there is a need to convert the heavier fractions from crude distillation into lighter distillates, preferably diesel.

Figure 1.6 - Crude products; source: Skrebowski, Energy Institute Oil Depletion Conference, 2008.


Sulphur removal from diesel proves to be the greatest challenge for the refining industry, as it requires additional processing units and entails higher costs. Diesel quality specifications vary between geographic regions. In the EU, Japan, Hong Kong, New Zealand, Australia, South Korea, Taiwan, Argentina, Armenia and Singapore, the sulphur limit for on-road diesel is 10 ppm. In the US, Canada and Chile the limit is 15 ppm. In some countries there is even variation between cities. In China the limit is 350 ppm, with the exception of Beijing, which has a limit of 10 ppm, and selected cities in the country, which have a limit of 50 ppm. For India, the nationwide limit is 350 ppm, but for selected cities it is 50 ppm. Belarus and Thailand have a limit of 50 ppm. The region with the lowest diesel quality is Africa, with sulphur limits between 2,000 and 3,000 ppm, with the notable exception of South Africa, which plans to reduce the limit to 10 ppm by 2017. [1]

In 2009 there were about 195 hydrocracker units operating worldwide, processing about 4,000,000 b/d of feedstock [3]. At start of run, the vast majority of hydrocrackers can reach near-zero sulphur content. Hydrocracker designs include single-stage (either once-through or with recycle) and multiple-stage (usually two-stage) configurations, and a unit can run in vacuum gas oil (VGO) cracking mode or in light-cycle oil cracking mode. In the single-stage, once-through hydrocracker there is only one reactor and the bottom of the fractionator (the unconverted oil) is not recycled for further cracking; it is usually necessary to hydrotreat the feedstock to remove ammonia and sulphur (or the reactor is equipped with a catalyst that performs this pre-treatment task). Single-stage hydrocrackers with recycle are the most used configuration, because the uncracked residue from the bottoms of the fractionator returns to the reactor for further cracking, increasing the overall reaction yield. Two-stage hydrocrackers use two reactors, with the unconverted oil from the bottom of the fractionator recycled to the second reactor for further cracking. Since the first-stage reactor performs both hydrotreating and hydrocracking, the second-stage reactor feed is almost entirely free of ammonia and sulphur. [3]

Hydrocracker technology has become a key process for converting low-value, high-sulphur, heavy-oil fractions into valuable products. This is of particular importance in an environment where rising crude prices (figure 1.4) have shrunk profit margins and have forced refineries to consider upgrading to poorer-quality crudes and difficult hydrocracker feedstocks, and where tight fuel regulation and emissions legislation are major operational constraints. [4]


1.5 Thesis Outline

The goal of the work presented in this thesis is to develop a soft-sensor that enables the Galp Refinery at Sines to predict the quality of the Y produced by its hydrocracker, and to use it in the advanced control design of the unit. Having such a sensor enables the refinery to find ways to increase the production of excellent-quality Y to meet both market demand and the existing regulatory limitations. This study is necessary and important since it is the first of its kind: there was no previous work on soft-sensor design for quality prediction of hydrocracking products in an online unit.

Starting with the description of the hydrocracking process at the refinery in Chapter 2, we proceed by describing the chemometric tools used to develop the soft-sensor (PCA and PLS) in Chapter 3. Chapter 4 describes the steps taken to plan and perform the step tests in the hydrocracker unit, as well as the results of those tests. Chapter 5 describes the development of the soft-sensors and the results of their calibration and validation. Chapter 6 presents the conclusions of this study, and in the last chapter, Chapter 7, future work to improve the soft-sensor is suggested.


2. Hydrocracking Process Overview

Hydrocracking is a catalytic process used for cracking complex high-boiling, high-molecular-weight hydrocarbon mixtures [5] into more valuable low-boiling products [6] such as kerosene, diesel and naphtha. Hydrocracking is a very important and flexible refinery process because it can process a large variety of gas oils, manufacturing products with low sulphur content and jet fuel with a high smoke point, in order to meet the demand for cleaner and more environmentally friendly fuels [5].

In this process, the cracking of carbon-carbon single bonds and the hydrogenation of double bonds are complementary phenomena [7], because the cracking reactions provide olefins for hydrogenation and hydrogenation liberates the heat needed for cracking [6]. The hydrogenation reactions are highly exothermic and the cracking reactions are slightly endothermic, making the overall process highly exothermic. Hydrogenation extends not only to olefins but also to aromatic, sulphur, nitrogen and oxygen compounds [6], making the separation of these pollutants easier and making it less costly to meet current fuel specifications.

This chapter describes the hydrocracking process at the Sines Galp Refinery, which is important for understanding the subject at hand. The description begins with the hydrogen make-up compression section, followed by the reaction section (the filter system, reactor feed section, reaction system and effluent cooling), the effluent separation, the fractionator section and storage.

The aim of the make-up compression section is to compress hydrogen so as to ensure a continuous supply of hydrogen to the reaction section and preserve system pressure, since hydrogen is consumed in the reaction and is also lost by dissolution in the hydrocarbon liquid and eventually through leaks. This make-up compression section is composed of three parallel compression trains, each with three stages of compression. In normal operation only two of the three trains are working, the third being a spare. The feed gas is divided between the two operating trains, compressed to the desired reaction section pressure and then combined and fed to the reaction section.

Vacuum Gas Oil (VGO) is the fresh feed to the first-stage reactor (A-01) in the reaction section. It is pre-heated in the kerosene/fresh feed exchanger (B-01), further heated in the diesel/fresh feed exchanger (B-02) and afterwards sent to the filter system (C-01 A/B/C). The filters are designed to remove particulate material from the fresh feed that could plug the catalyst bed in the first-stage reactor, causing not only catalyst deactivation but also pressure-drop problems.

After filtration, the feed is sent to the filtered feed surge drum (D-01). From this drum, the feed is pumped to the reactor system pressure. The D-01 is designed to prevent fluctuations and losses of feed to the pumps and reaction section.

The oil feed to the second-stage reactor (A-02) is the unconverted oil from the first- and second-stage reactors and comes from the fractionator (D-02) bottoms. This stream is cooled by heat exchange with the feed to the fractionator furnace in the fractionator feed and bottoms exchanger (B-03) and in the fractionator bottoms steam generator (B-04). The second-stage reactor feed stream is then pumped to the reaction section.

After leaving their feed pumps, the feeds to the first- and second-stage reactors are mixed and pre-heated with hydrogen, either make-up hydrogen from the make-up compression section or recycled hydrogen. The combined feed mixture to the first-stage reactor is heated in the first-stage feed/effluent exchanger (B-05 A/B) and afterwards in the first-stage furnace (E-01). The second-stage feed mixture is heated in the second-stage reactor feed/effluent exchanger (B-06) and afterwards in the second-stage furnace (E-02).

The heated gas/oil mixtures are fed to their respective reactors: the first-stage reactor (which has two types of catalyst, one for hydrotreating and one for hydrocracking, distributed over six catalyst beds) and the second-stage reactor (which has only hydrocracking catalyst, in four catalyst beds). As soon as the feed contacts the catalyst the reaction begins and, because the reactions are highly exothermic, the temperature of the mixture increases, and so does that of the catalyst beds. To prevent excessive heating and to control the reaction temperature, a quench gas (hydrogen) is introduced between the catalyst beds of each reactor.

After reaction, the reactor effluents consist of product oil, excess hydrogen not consumed in the reaction and light gases formed during hydrocracking. The stream leaving the first-stage reactor is cooled by heat exchange with the reactor feed in B-05 A/B and then mixed with the second-stage effluent. The second-stage effluent is cooled by exchanging heat with the fractionator feed in B-07 and is afterwards combined with the first-stage reactor effluent. This mixture is further cooled in B-08 and then sent to a steam generator (B-09) to complete the cooling before feeding the hot high-pressure separator (HHPS), D-03. The D-03 is designed to separate the excess hydrogen from the reaction liquids, enabling the hydrogen gas to be recycled to the reaction section in order to reduce the cost of producing hydrogen. The remaining liquid products are then let down in pressure by the power recovery engine (F-01) and flashed in the hot low-pressure separator (D-04). This high-temperature, low-pressure flash enables the separation of hydrogen gas dissolved in the liquid, and this gas is recycled to the reaction section.

The D-04 bottoms is fed to the product stripper (D-05) to separate H2S, LPG and some naphtha from the liquid reaction product. This stripper has three packed beds. The stripper bottoms is heated in B-08 by heat exchange with the reactors' effluent and also by exchanging heat with the second-stage reactor effluent in B-07. This stream is then further heated by heat exchange with the fractionator bottoms stream in the fractionator bottoms/feed exchanger (B-10), before being sent to the fractionator feed furnace (E-03).

The fractionator feed is heated in the fractionator feed furnace with the aim of producing vapour rates high enough for overflash to be produced in the column (in this case, overflash is defined as the ratio of the volumetric liquid flow going to the stripping section to the total volumetric rate of the distillate products).
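
Written as a formula, and using purely illustrative symbols that are not part of the nomenclature of this thesis, that definition reads:

$$ \text{overflash} = \frac{\dot{V}_{\text{liquid to stripping section}}}{\dot{V}_{\text{total distillate products}}} $$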


In the product fractionator (D-02), in normal operation, light naphtha is sent overhead; heavy naphtha, kerosene and diesel are drawn through sidecuts, with the diesel split into product and pumparound streams. The unconverted oil is drawn as the bottoms, and the feed enters the column in the flash zone.

Superheated low-pressure steam is used in the fractionator's stripping section to recover any products from the bottoms before it is pumped from the column. The bottoms stream is cooled by heat exchange with the feed in the fractionator bottoms/feed exchanger (B-10).

Before being sent to the fractionator reflux drum, the overhead vapour is totally condensed in the fractionator overhead air cooler (B-11) and in the overhead trim cooler (B-12). The reflux drum is a horizontal vessel designed to separate oil from water, which is collected at the boot of the vessel and sent to the injection water drum (D-06). Part of the oil (light naphtha) is pumped to the light-ends section and the remainder is pumped back to the fractionator as reflux.

Heavy naphtha is drawn off and flows to the heavy naphtha stripper (D-07). This stripper has valve trays and a thermosiphon reboiler that exchanges heat with the diesel pumparound. The heavy naphtha vapour is returned to the fractionator, and the bottoms is pumped to the light-ends section.

Kerosene is drawn from the fractionator column and sent to the kerosene stripper (D-08), which is similar to D-07, that is, it has valve trays and a thermosiphon reboiler that exchanges heat with the diesel pumparound. The stripper's vapour is returned to the fractionator, and the bottoms is pumped, cooled and sent to storage.

Diesel is drawn from a chimney tray of the fractionator and the flow is split between a pumparound stream and a stream fed to the diesel side stripper (D-09). This stripper uses superheated low-pressure steam to remove light components from the product. Its overhead vapour is returned to the fractionator, and the diesel stripper bottoms is cooled by exchanging heat with the first-stage fresh feed in exchanger B-13, being further cooled by reboiling B-14 (the deethanizer bottoms reboiler). Because the diesel stripper is a steam stripper, water must be removed to meet product specifications; the stream is therefore sent to the diesel vacuum drier air cooler (B-15) and afterwards to the diesel vacuum drier (D-10). The bottoms of D-10 is then cooled in the diesel air cooler (B-16) and later in the diesel trim cooler (B-17). Part of the resulting stream is sent to the cold low-pressure separator (D-11, designed to release hydrogen-rich vapour which, after amine treating, is recycled to the reactors) in the reaction section, to be used as sponge oil. The remaining diesel is sent to storage.

The diesel pumparound stream removed from the fractionator reduces column traffic above the diesel side-cut tray and removes valuable high-temperature heat, which provides heat for four column reboilers and also produces medium-pressure steam before the stream returns to the fractionator. This stream is pumped to the kerosene stripper reboiler (B-18), the heavy naphtha reboiler (B-19), the naphtha splitter reboiler (B-20) and the naphtha stabilizer reboiler (B-21), and finally to the medium-pressure steam generator (B-22), to ensure continuous pumparound heat removal before it re-enters the fractionator [8].


3. State of the Art

This chapter presents the state of the art of soft-sensors and their scope and useful application in the process industries. Moreover, soft-sensor development and its difficulties are discussed, and data-driven methods for soft-sensor model development, particularly Principal Components Analysis (PCA) and Partial Least Squares (PLS), are characterized and discussed.

3.1 Soft-Sensors for Industrial Processes

Chemical plants are usually highly instrumented and have a large number of sensors that collect measured data for process control and monitoring. About two decades ago, researchers began using this large amount of data to build predictive models, which in the process industry are called soft-sensors. The term soft-sensor is a combination of the words 'software' (mainly because the models are developed in computer programs) and 'sensor', because these models provide information similar to that of hardware sensors. Soft-sensors are often divided into two categories: model-driven and data-driven [9,10]. Model-driven sensors (also called white-box models) are most commonly based on first-principles models that describe the physical and chemical properties of the process [9,10]. They are developed primarily for the planning of the plants and usually describe only ideal process steady states rather than the real process dynamics, focusing on the description of the optimal process steady state (and therefore not being useful or suitable for the description of any dynamic state), describing a simplified theoretical background rather than real-life process conditions [9], and being somewhat computationally intensive for real-time applications [10,11].

Data-driven models do not have these disadvantages because they are based on data measured within the processing plant, thus describing the true conditions of the process in a better way [9,10] and providing the real-time information necessary for effective quality control [11]. Data-driven models are also known as black-box techniques because the model itself has no knowledge of the process and is based on empirical observations of it. These models are built from real-life measurements recorded, stored and provided as historical data. [9]

The span of tasks performed by soft-sensors is quite broad, but the most common use is the prediction of process variables that can only be known either at low sampling rates or through off-line analysis [9,12]. These variables are usually very important for process control because they are generally related to the quality of the process output, so it is naturally important and necessary to deliver additional information about them at a higher sampling rate or at a lower financial burden [9,13]; hence the use of soft-sensors. Another field of application of soft-sensors is process monitoring and process fault detection, by determining the state of the process and identifying the source of a deviation. As previously said, real industrial plants have many sensors, and there is a certain probability of a sensor failing. Detecting this failure is also the soft-sensor's job; in addition, it can act as a backup sensor while the hardware sensor is replaced or, if the soft-sensor proves to be good, it can act as a replacement for the hardware measuring device. [9]
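
As a rough illustration of this monitoring idea (a minimal sketch, not the procedure used in this thesis), a PCA model fitted on normal-operation data can flag abnormal samples through its reconstruction error, the Q (squared prediction error) statistic. The data, the number of components and the alarm limit below are placeholder assumptions:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X_normal stands in for historical data from normal operation
# (rows = samples, columns = online sensor readings).
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 8))

scaler = StandardScaler().fit(X_normal)                  # zero mean, unit variance
pca = PCA(n_components=3).fit(scaler.transform(X_normal))

def q_statistic(X):
    """Q (squared prediction error) of samples under the PCA model."""
    Z = scaler.transform(X)
    Z_hat = pca.inverse_transform(pca.transform(Z))      # reconstruction from 3 PCs
    return np.sum((Z - Z_hat) ** 2, axis=1)

# Illustrative alarm limit: 99th percentile of Q on the normal-operation data.
q_limit = np.percentile(q_statistic(X_normal), 99)

x_new = X_normal[:1].copy()
x_new[0, 4] += 10.0                                      # simulate a failed sensor reading
print(q_statistic(x_new) > q_limit)                      # [ True ] -> flagged as abnormal

A sample whose Q value exceeds the limit derived from normal operation no longer fits the correlation structure learned by the model, which is the typical signature of a sensor failure or a process upset.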

Measuring the variables that define product quality is a major problem in the process industries. These variables, called primary or quality variables, quantify the productivity or the specifications upon which the product is sold, such as purity or physical or chemical properties, and they are the most difficult to measure online. The online variables that are easy to access and measure, such as temperatures, pressures and flow rates, are often called secondary variables and can be used to infer the primary variables. Because of the nature of chemical and process engineering systems, the dynamics and state of the secondary variables reflect the dynamics and state of the primary variables, meaning that changes in the secondary variables are indicative of changes in product quality. The technique of using secondary variables to generate estimates of product quality is usually called 'soft-sensing', and these inferential estimators are typically used in place of direct online measurement of controlled variables when direct measurements are expensive, unreliable or introduce large lags [13].
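
A minimal sketch of such an inferential estimator, assuming scikit-learn and simulated data in place of real secondary variables and laboratory analyses (the number of latent variables is likewise only illustrative, not the choice made for the models developed in this thesis):

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# X stands in for secondary variables (temperatures, pressures, flow rates);
# y stands in for infrequent laboratory analyses of the quality variable.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X[:, :3] @ np.array([0.5, -0.2, 0.8]) + 0.1 * rng.normal(size=300)

X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

pls = PLSRegression(n_components=2, scale=True)   # 2 latent variables, autoscaled data
pls.fit(X_cal, y_cal)

y_hat = pls.predict(X_val).ravel()                # inferential estimate of the quality variable
rmsep = np.sqrt(np.mean((y_val - y_hat) ** 2))
print(f"RMSEP on the validation set: {rmsep:.3f}")

In practice, the calibration and validation sets would come from historical plant data paired with the corresponding laboratory analyses rather than from simulated values.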

Soft-sensors have been used for the estimation of product composition in distillation columns and of particle size distributions in a grinding circuit, for monitoring emissions of NOx, SO2 and CO2 in industrial boilers and furnaces, and to ensure high and consistent product quality and process reliability in the pharmaceutical industry [11]. They have also been used as a feed oil classifier to determine feed oil type by estimation of the kerosene dry point [14], for modelling an activated sludge plant to detect shifts of various kinds in the process [15], for modelling product quality in a crude desalting and dehydration process [13], for oil sludge depository classification for waste treatment [16], to study the influence of minerals on the taste of bottled tap water [17], for modelling ground-level ozone and the factors affecting its concentration [18], and for the prediction of product quality in the catalytic hydrocracking of vacuum gas oil [10], to name just a few.

3.2 Soft-Sensor Methodology

There are several problems affecting the development of soft-sensors, usually related to measurement noise, missing values, co-linear features and varying sampling rates. In addition, process plants are usually dynamic environments in which abrupt changes can occur, for example a change in the quality of the process input, which results in a deterioration of prediction accuracy [9].

A challenging issue in soft-sensor development is data co-linearity: measured data in the process industry are typically strongly co-linear, as a result of partial redundancy in the sensor arrangement (for example, two neighbouring temperature sensors will collect strongly correlated measurements). Since the measurements are usually collected for process-control purposes, a great amount of information accumulates that is data-rich but information-poor. For soft-sensor modelling the requirements are of another kind: only informative variables are required, and any other information just adds to model complexity, having a negative effect on model training and performance. To deal with this problem, two methods are widely accepted, PCA and PLS, which transform the input variables into a new, reduced space with less co-linearity [9].

The presence of missing data creates difficulties in model development. Since it is necessary to use the maximum number of samples to develop a model, missing data or the removal of incomplete data decreases the accuracy of the model estimates. Also, when the soft-sensor is applied and used to estimate the quality variable as part of a control system, it must be able to deal with the failure of some online measurements and still provide reliable estimates[19]. Since the probability of having representative data is larger in large datasets than in small ones, missing data should cause fewer problems in large datasets, because in a large dataset any direction is still fairly well represented, at least as long as one works in projection subspaces, as in PCA or PLS[20].
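As a concrete illustration of the requirement that the soft-sensor must keep producing estimates when an online measurement fails, the sketch below shows one very simple strategy (not necessarily the one adopted in this work): a failed reading, flagged as NaN, is replaced by the corresponding training-set mean before the sample is fed to the model. The array names are purely illustrative.

```python
import numpy as np

def impute_failed_sensors(x_online: np.ndarray, training_mean: np.ndarray) -> np.ndarray:
    """Replace failed online measurements (flagged as NaN) with the corresponding
    training-set mean so that the soft-sensor can still produce an estimate."""
    x = np.asarray(x_online, dtype=float).copy()
    failed = np.isnan(x)
    x[failed] = np.asarray(training_mean, dtype=float)[failed]
    return x

# Illustrative use: the third sensor of a four-sensor arrangement has failed.
# x_online = np.array([231.4, 12.1, np.nan, 5.6])
# x_ready = impute_failed_sensors(x_online, X_train.mean(axis=0))
```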

There are no widely accepted guidelines for soft-sensor construction, but there are steps that are frequently taken in its development. The procedure presented here is rather general but summarizes the most common steps in model development.

The usual first step is a first inspection of the data, in which the data structure is reviewed and any obvious problems are identified, such as locked variables with a constant value. The next step is to assess model complexity, that is, to decide whether a simple regression model is sufficient or whether a more powerful tool, such as PCA or PLS, is needed to develop the soft-sensor. Assessing the target variable is also very important, because there has to be enough variation in the output variable to determine whether it can be modelled at all.

Then one proceeds to the selection of historical data and the identification of steady states. In this stage one dataset is selected for training and another for validation. The stationary parts of the data are identified, selected and used in model development. Next, the data must be pre-processed. A typical pre-processing step is to normalise the data to zero mean and unit variance (as required for PCA), but other types of pre-processing are also employed, such as handling missing data, outlier detection and replacement, selection of relevant variables, and handling of drifting data. The data processing is usually done iteratively until the developer considers the data and the model ready for validation. Data pre-processing is considered the most time-consuming step, demanding manual work and expert knowledge of the underlying process.
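A minimal sketch of the scaling step described above, assuming the historical data have been loaded into a pandas DataFrame; the 3-sigma clipping rule, the column handling and the file name are illustrative choices, not the exact pre-processing used in this work.

```python
import pandas as pd

def clip_outliers(df: pd.DataFrame, n_sigma: float = 3.0) -> pd.DataFrame:
    """Crude outlier treatment: clip each variable to mean +/- n_sigma standard deviations."""
    mean, std = df.mean(), df.std()
    return df.clip(lower=mean - n_sigma * std, upper=mean + n_sigma * std, axis=1)

def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every column to zero mean and unit variance, as required before PCA/PLS."""
    return (df - df.mean()) / df.std()

# Illustrative pipeline on hypothetical historical data:
# raw = pd.read_csv("historical_data.csv")          # secondary variables + quality variable
# pre = standardise(clip_outliers(raw.dropna()))    # drop incomplete rows, clip, then scale
```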

Following pre-processing, the next phase is model selection, training and validation. The selection of the model type is critical for soft-sensor performance. There is no unified theoretical approach for this step; usually the model type and its parameters are selected in an ad hoc manner, and their selection is often subject to the developer's past experience, expertise and personal preference. However, some techniques can be adopted, such as starting with a simple model type and then increasing model complexity as long as a significant improvement can be observed (by assessing model performance with independent data). After finding the optimal model structure and training the model, the soft-sensor has to be validated with independent data. The evaluation of its performance can be done numerically, through the Mean Square Error (MSE), which measures the average squared distance between the predicted and the real values, and by visual inspection of the predictions. One disadvantage of the latter is that the final decision on whether the model performs adequately is rather subjective, depending on the developer's experience.
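For the numerical part of the validation, the MSE mentioned above can be computed directly from the validation samples; the sketch below is a straightforward implementation with illustrative array names.

```python
import numpy as np

def mean_square_error(y_measured, y_predicted) -> float:
    """Average squared distance between soft-sensor predictions and measured values."""
    y_measured = np.asarray(y_measured, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return float(np.mean((y_predicted - y_measured) ** 2))

# Illustrative: laboratory results vs. soft-sensor predictions on the validation set
# mse = mean_square_error(y_lab_validation, y_soft_sensor_validation)
```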

Finally, after its development, the soft-sensor has to be maintained and tuned on a regular basis; this is necessary because drifts and other changes in the data deteriorate the performance of the soft-sensor and have to be compensated by adapting or re-developing the model[9].

    3.3 Data-driven methods for soft-sensing

Using soft-sensors in crude oil distillation with varying feedstock is still a difficult problem to solve, because the relationship between the easily measured process variables and the difficult-to-measure quality variables varies with the type of crude processed. Moreover, most refineries use mixed sources of crude oil with varying blending ratios, and the relationship between process variables and quality variables varies with the different crude oils or blends[14].

    Hydrocracked products are separated into different fractions that constitute the blending

    stocks for the final products. The product quality is significantly influenced by operating conditions

    and the cracking yield is reduced with time by catalyst deactivation. Therefore, the continuous

monitoring of product quality is very important, especially to avoid off-spec petroleum fractions, which usually cause problems downstream at the blending stage[10].

It is usually difficult to obtain precise and reliable product composition measurements without time delay, because most composition analysers have significant time lags and their reliability is usually quite low. Tray temperatures could be used as an indication of composition, but the presence of off-key components in multicomponent mixtures, column pressure and feed rate jumps can affect tray temperatures, preventing them from being an exact indicator of composition. Due to the strong correlation between tray temperature measurements, Principal Component Analysis (PCA) or Partial Least Squares (PLS) methods should be applied[21].

The modelling techniques most commonly applied to data-driven soft-sensors are Principal Component Analysis (PCA) in combination with a regression model, Partial Least Squares (PLS), Artificial Neural Networks (ANN), Neuro-Fuzzy Systems (NFS) and Support Vector Machines (SVMs)[9].


Several reasons motivate a multivariate approach to the problem. Process deviations are not always detected by looking at one variable at a time; these deviations often occur simultaneously in many variables, and even a very small variation can have a significant influence on product quality. If a drift of the process towards an out-of-control state can be detected at an early stage, corrective measures can be taken sooner to avoid such states. Also, if many variables have been measured, the effect of noise can be drastically diminished by modelling the correlation structure among the different variables[15] and by reducing the data dimension using, for example, PCA.

In this thesis the PCA and PLS methods are used because they are widely accepted and are usually the first approach to soft-sensor development for process control.

    3.3.1 Principal Components Analysis (PCA)

Noise can be found in almost all variables of the majority of datasets. Latent variable models like PCA and PLS estimate the relevant part and the noise of each variable, and are therefore used in the present work[20]. Principal Component Analysis (PCA) was used to analyse the data so that only the secondary variables important to the determination of product quality were selected[13]. Using PCA, the data can be described with far fewer variables than the original ones with no significant loss of information; moreover, PCA often produces linear combinations of variables that are useful predictors of particular processes[12]. Mathematically, PCA relies on an eigenvector decomposition of the covariance or correlation matrix of the process variables. Here X represents an (n x m) matrix whose rows correspond to the samples and whose columns correspond to the variables. PCA decomposes the data matrix X into a sum of outer products of vectors $t_i$ and $p_i$ (i = 1, 2, ..., k) plus a residual matrix E, as shown in equations 2.1 and 2.2 (matrix form):

$X = t_1 p_1^{T} + t_2 p_2^{T} + \cdots + t_k p_k^{T} + E$    (2.1)

or, equivalently,

$X = T P^{T} + E$    (2.2)

where $P^{T}$ has the $p_i^{T}$ as rows, $T$ has the $t_i$ as columns, and k in equation 2.1 must be less than or equal to the smaller dimension of X, i.e. $k \le \min(n, m)$. The vectors $t_i$ (n x 1) and $p_i$ (m x 1) are the i-th score vector and loading vector, respectively. Score vectors are mutually orthogonal, and loading vectors are orthonormal (orthogonal and of unit length). Loading vector $p_1$ defines the direction of greatest variability, and score vector $t_1$ (also known as the first principal component) represents the projection of the samples of X onto $p_1$, being the linear combination of the columns of X explaining the greatest amount of variability ($t_1 = X p_1$). The second principal component is the linear combination of the columns of X explaining the next greatest amount of variability ($t_2 = X p_2$), subject to the condition that it is orthogonal to the first principal component. Principal components are ordered in decreasing variability. Since the columns of X are highly correlated, the first few principal components can explain the majority of the data variability[21].

In equation 2.1, k represents the number of principal components to retain, and E (n x m) is the residual matrix of unfitted variation (or noise)[21]. The matrix product of T and $P^{T}$ reproduces the most important variation in X. This product is a projection of the X-data onto a new low-dimensional space, where the data can be effectively analysed. This reduction in dimensionality is possible because of the correlations between the variables in matrix X, and this is the main reason why the method is particularly advantageous for the analysis of data with a large number of mutually correlated variables[16].

The $p_i$ vectors are the eigenvectors of the covariance matrix, that is, equation 2.3:

$\mathrm{cov}(X)\, p_i = \lambda_i\, p_i$    (2.3)

where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$. In PCA the $p_i$ are the loadings and contain information on how the variables relate to each other. The $t_i$ form an orthogonal set, while the $p_i$ are orthonormal. Note also, in equation 2.4, that

$t_i = X p_i \quad \text{or} \quad T = X P$    (2.4)

The pairs $(t_i, p_i)$ are arranged in descending order of $\lambda_i$, with the first pair capturing the largest amount of information of any pair in the decomposition, and each subsequent pair capturing the greatest possible amount of the remaining variance[12].
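The decomposition in equations 2.1 to 2.4 can be sketched in a few lines of numpy. This is only an illustration of the algebra (eigendecomposition of the covariance matrix of the scaled data), not the implementation actually used in this thesis; the function and variable names are hypothetical.

```python
import numpy as np

def pca_eig(X: np.ndarray, k: int):
    """PCA of a data matrix X (samples x variables) by eigendecomposition of its
    covariance matrix, following equations 2.1-2.4."""
    Xc = X - X.mean(axis=0)                      # mean-centre (unit-variance scaling assumed done)
    cov = np.cov(Xc, rowvar=False)               # covariance matrix of the variables
    eigval, eigvec = np.linalg.eigh(cov)         # eigh returns eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]             # re-order by decreasing variance
    eigval, P = eigval[order][:k], eigvec[:, order[:k]]
    T = Xc @ P                                   # scores: T = X P (equation 2.4)
    E = Xc - T @ P.T                             # residual matrix of unfitted variation
    explained = eigval / np.trace(cov)           # fraction of total variance per component
    return T, P, eigval, E, explained
```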

The higher the loading of a variable, the more it contributes to explaining the variation along a particular principal component; only variables with loadings higher than 50% should be selected for the interpretation of a principal component, and any principal component with an eigenvalue equal to or greater than one is usually considered statistically relevant[13]. The matrices T and P provide valuable information on the internal data structure. These matrices are interpreted based on the fact that the correlation between two variables (or the similarity between two samples) is a function of distance in the PC-space[16]. Pairwise score plots are often referred to as ‘sample maps’, revealing sample groupings and outliers. Similarly, the loading plots (variable maps) show the correlations between variables. The distance from the origin to a sample in the score plot, or to a variable in the loading plot, along a certain PC reflects its importance with regard to that PC[16].

The number of principal components to retain in the model is usually determined by cross-validation, and the dataset used to build the model is divided into a training and a testing (validation) dataset[21]. In this study, the training and testing data come from the process data records, collected from the DCS system, and from the corresponding laboratory analyses.

One of the limitations of pure PCA is that it can only effectively handle linear relationships in the data and cannot deal with data non-linearity. Another disadvantage is the selection of the optimal number of principal components (which can be addressed by using cross-validation techniques). A further problem is that the principal components describe the input space very well but do not explain the relations between the input and output data, which is usually what has to be modelled[13].

    3.3.2 Partial Least Squares Regression (PLS)

The regression problem, that is, the modelling of the response variables (primary variables) Y by means of a set of predictor variables (secondary variables) X, is one of the most common problems in data analysis in science and technology; one example of such a problem is relating the quality and quantity of manufactured products (Y) to the conditions of the manufacturing process[22].

The PLS algorithm focuses on the covariance structure that links the input and output data spaces. The method decomposes the input and output simultaneously while keeping the orthogonality constraint, so that the model is focused on the relation between the input and output variables[9]. PLS can be seen as an extension of PCA. The method is concerned with two data blocks, X and Y, and the objective is to model X in such a way that Y can be predicted as well as possible, maximizing the covariance between the matrices X and Y. Matrix X is decomposed into a score matrix T and a loading matrix P[9,21], as shown in equations 2.5 and 2.6 (matrix form):

$X = \sum_{j=1}^{a} t_j\, p_j^{T} + E$    (2.5)

$X = T P^{T} + E$    (2.6)

In a similar way, Y can be decomposed into a score matrix U and a loading matrix Q, equations 2.7 and 2.8 (matrix form):

$Y = \sum_{j=1}^{a} u_j\, q_j^{T} + F$    (2.7)

$Y = U Q^{T} + F$    (2.8)

Most of the variance of matrix Y is explained by the first latent variable extracted from the matrices X and Y. In a similar way, the second latent variable is extracted from the residual matrices containing what has not been described by the first latent variable, and so on. Once the optimal number of latent variables has been extracted, the remaining variance is considered noise[9].

The objective of the method is to fit a linear relationship between the independent X variables and the dependent Y variables by performing a least squares regression between each pair of corresponding latent vectors t and u, equation 2.9:

$\hat{u}_j = b_j\, t_j, \qquad j = 1, 2, \dots, a$    (2.9)

where $b_j$ is the coefficient of the inner linear regression between the j-th latent variables $t_j$ and $u_j$, that is, equation 2.10:

$b_j = \dfrac{u_j^{T} t_j}{t_j^{T} t_j}$    (2.10)

Linear PLS leads to the decomposition of the X and Y matrices into a number of rank-one matrices. This decomposition can be defined as the product between each pair of input score vectors t and predicted output score vectors $\hat{u}$, and a set of corresponding input and output loading vectors p and q[21].

The prediction performance of the PLS method was characterized by the Root Mean-Square Error of Cross-Validation (RMSECV), equation 2.11[24]:

$\mathrm{RMSECV} = \sqrt{\dfrac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}$    (2.11)
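As an illustration of how the RMSECV of equation 2.11 can be computed in practice, the sketch below uses the PLS implementation in scikit-learn with 10-fold cross-validation; the range of latent variables and the variable names are illustrative, and this is not necessarily the software used in this work.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def rmsecv(X: np.ndarray, y: np.ndarray, n_latent: int, folds: int = 10) -> float:
    """Root Mean-Square Error of Cross-Validation (equation 2.11) of a PLS model
    with n_latent latent variables."""
    pls = PLSRegression(n_components=n_latent, scale=True)
    y_cv = cross_val_predict(pls, X, y, cv=folds).ravel()
    return float(np.sqrt(np.mean((y_cv - np.ravel(y)) ** 2)))

# Illustrative choice of the number of latent variables by minimising RMSECV:
# errors = {a: rmsecv(X_train, y_train, a) for a in range(1, 11)}
# best_a = min(errors, key=errors.get)
```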

PLS is a simple and powerful approach to data analysis for complex problems because of its flexibility and its ability to deal with incomplete and noisy data with multiple variables and observations (measurements). In this study PLS will model only one response variable, but the method is able to model several response variables[22]. The disadvantage of PLS is that, like PCA, it can only model linear relations in the data[9].


    4. Implementation of Step Tests

Performing step tests in the unit was of great importance, and this was proposed early in this work. The unit is new and had never been subjected to step tests; therefore these tests were planned and performed in order to better understand its response and behaviour. By better understanding the process performance, and by submitting the unit to step tests, we hoped to develop a soft-sensor able to explain and predict the behaviour of Y (the primary/quality variable) even when the unit operates outside the specified operating temperature, pressure and flow values.

    This chapter presents the planning and the development of the tests carried out in the

    Refinery.

    4.1 Step Tests Planning

    4.1.1 Historical Data Analysis and Variable Selection

Since step tests had never been performed in this unit, the first approach was to select the variables that would have an influence on Y. This first step included the study of the fractionator, together with the insight and experience of the Refinery Team and the Thesis Supervisors, and after some exchange of ideas and suggestions it was agreed that the variables X3, X8, X9, X10, X12, X13 and X22 were to be tested.

The next step was to build a preliminary model from the available historical data, using PCA followed by PLS (obtained in a similar fashion to that described in chapter 5). This model was used only to assess whether a given step test would indeed influence the quality variable Y and, if it did, how long it took the quality variable to stabilize. We then looked into the historical data and checked whether there had been disturbances in the previously selected secondary variables that could be considered a ‘step test’ (such as a sudden decrease or increase in flow rate or temperature). Using those ‘step-test’ values, we calculated the Y results and estimated the settling time of the quality variable for each step test. After careful analysis of the data, it was found that for variables X8, X9 and X10 the settling time between tests had to be at least 2 hours; for variables X3, X12 and X13 it had to be at least 3 hours; and for variable X22 it had to be at least 5 hours. The sequence of the variable tests was agreed with the Refinery Team in order to reduce the overall impact on the operating conditions.


    4.1.2 Sensitivity Analysis

To evaluate the impact that the tests could have on the quality variable Y, the fractionator was modelled using the simulator Petro-SIM™ version 4.1 (a modelling platform for refining, petrochemical and gas processing plants, from KBC Oil and Gas Consulting). To model the fractionator in this software, the fluid package chosen was Peng-Robinson-LK, and all the flow rates, temperatures and pressures used were the ‘Base Case’ values from Chevron's Manual, Chevron being the licensor of the hydrocracking unit of the Sines Refinery[8].

To start modelling the unit, one must first define the mixture of the feed stream, that is, define the viscosity, standard density and ASTM D86 distillation for each oil compound of the feed stream. The compounds are then blended and the feed composition is described. After this procedure the stream is included in the unit and its flow rate, temperature and pressure are defined. Following this, the side strippers were installed, as well as the stripping vapour streams, the pumparound and the bottom stream.

The next step is to assign the pressure and the tray efficiencies of the column. A screenshot of the fractionator after modelling is shown in figure 4.1.

    Figure 4.1 – Graphical User Interface for Petro-SIM™ after the fractionator was built and modelled.


After modelling the unit in Petro-SIM™, different step-test amplitudes were simulated to assess their influence on Y. Based on previous tests in other units of the Refinery, it was decided to test whether steps of -1%, -3%, -5%, -7%, -10%, -13% and -15%, and of +1%, +3%, +5%, +7%, +10%, +13% and +15%, on each of the chosen secondary variables (except X22) had any influence on the quality variable. The results obtained for each of these simulation tests are shown in tables 4.1, 4.2 and 4.3. Tables 4.1 and 4.2 show the simulation results for X3, X8, X9, X10, X12 and X13; each row expresses the percentage deviation of the quality variable caused by the test on the corresponding secondary variable.

Table 4.1 – Simulation results for the negative step tests (values are the deviation of Y, in %).

| Input | -1% | -3% | -5% | -7% | -10% | -13% | -15% |
|-------|-----|-----|-----|-----|------|------|------|
| X3 | 0,41 | 1,46 | 2,54 | 3,71 | 8,20 | – | – |
| X8 | -0,05 | -0,17 | -0,29 | -0,42 | -0,61 | -0,88 | -0,95 |
| X9 | -0,09 | -0,29 | -0,49 | -0,71 | -1,05 | -1,40 | -1,64 |
| X10 | -0,61 | -0,61 | -1,27 | -1,60 | -2,97 | -4,61 | -5,72 |
| X12 | -6×10⁻⁵ | 1×10⁻⁴ | 4,5×10⁻⁵ | 1×10⁻⁴ | 6×10⁻⁵ | 8×10⁻⁵ | 1×10⁻⁴ |
| X13 | -0,13 | -0,43 | -0,72 | -1,00 | -1,39 | -1,39 | -1,61 |

Table 4.2 – Simulation results for the positive step tests (values are the deviation of Y, in %).

| Input | +1% | +3% | +5% | +7% | +10% | +13% | +15% |
|-------|-----|-----|-----|-----|------|------|------|
| X3 | -0,39 | -1,09 | -1,64 | -1,60 | -1,99 | -2,26 | -2,41 |
| X8 | 0,00 | 0,16 | 0,26 | 0,36 | 0,55 | 0,72 | 0,83 |
| X9 | 0,09 | 0,25 | 0,44 | 0,63 | 0,90 | 1,14 | 1,29 |
| X10 | 0,52 | 1,05 | 1,52 | 1,54 | 2,44 | 2,36 | 2,79 |
| X12 | -3×10⁻⁶ | 9×10⁻⁷ | -5×10⁻⁵ | -4×10⁻⁵ | -7×10⁻⁵ | -6×10⁻⁵ | -3×10⁻⁵ |
| X13 | 0,13 | 0,39 | 0,70 | 0,98 | 1,35 | 1,68 | 1,88 |

Table 4.3 shows the results of the simulated step tests for variable X22. The amplitudes of the disturbances were different because of the nature of this variable: the step amplitudes used for the previous variables would not have any noticeable effect on Y, so larger amplitudes were used in this case.

Table 4.3 – Simulation results for both positive and negative step tests on variable X22 (values are the deviation of Y, in %).

| Input | -20% | -10% | +10% | +20% | +35% | +40% |
|-------|------|------|------|------|------|------|
| X22 | -0,141 | -0,112 | -0,056 | -0,028 | 0,014 | 0,028 |

Analysing the previous tables one might be tempted to conclude that step tests of these magnitudes on these variables have little influence on Y; however, as can be seen in chapter 5, most of these variables are present in the models developed, demonstrating their importance.


Moreover, the amplitude of the step tests had to be such that the production of the unit would not be significantly affected, hence the small magnitude of the tests.

    4.2 Step Tests Results

The scheduling and the sequence of the variable tests were organized to accommodate the Refinery's convenience, in order to minimize the impact on the production profile and quality. The step tests were performed, as much as possible, without disrupting the Refinery's routines. For each test a sample was taken and its time stamp was recorded. Samples were only taken after the calculations using the preliminary model showed that the quality variable Y had stabilized after the given step test. The scheduling of the step tests is shown in table 4.4.

Table 4.4 – Fractionator step tests scheduling.

| Day | Predicted time | Actual time | Variable | Test magnitude | Sample time |
|-----|----------------|-------------|----------|----------------|-------------|
| Day 1 | 14:00 | 14:17 | X3 | -1% | 16:45 |
| | 17:00 | 16:50 | X3 | +1% | 20:10 |
| | 20:00 | 20:16 | X3 | -3% | 23:16 |
| | 23:00 | 23:16 | X3 | +3% | 01:57 |
| Day 2 | 2:00 | 2:00 | X10 | -3% | 03:53 |
| | 4:00 | 4:00 | X10 | +5% | 05:53 |
| | 6:00 | 6:00 | X10 | -5% | 08:07 |
| | 8:00 | 8:10 | X10 | -7% | 09:57 |
| | 10:00 | 10:00 | X10 | +5% | 11:57 |
| | 12:00 | 12:04 | X10 | +5% | 13:05 |
| | 14:00 | 13:40 | X12 | +7% | 16:21 |
| | 17:00 | 16:27 | X12 | -1% | 18:52 |
| | 20:00 | 19:00 | X12 | +2% | 20:55 |
| | 23:00 | 21:20 | X12 | +5% | 22:57 |
| Day 3 | 2:00 | 23:03 | X12 | -5% | 01:01 |
| | 2:00 | 01:58 | X9 | +7% | 03:12 |
| | 4:00 | 03:15 | X9 | -5% | 05:11 |
| | 6:00 | 05:16 | X9 | -7% | 06:40 |
| | 8:00 | 07:04 | X9 | +5% | 08:45 |
| | 10:00 | 09:00 | X9 | -5% | 10:19 |
| | 14:00 | 13:53 | X13 | +7% | 17:00 |
| | 17:00 | 17:07 | X13 | -5% | 18:45 |
| Day 4 | 24:00 | 03:23 | X13 | -7% | 05:17 |
| | 3:00 | 05:18 | X13 | +5% | 07:25 |
| | 6:00 | 07:27 | X13 | +5% | 08:45 |
| | 9:00 | 09:21 | X8 | +13% | 10:40 |
| | 11:00 | 10:48 | X8 | +1% | 11:54 |
| | 13:00 | 11:59 | X8 | -5% | 13:47 |
| | 15:00 | 13:53 | X8 | +11% | 15:06 |
| | 16:00 | 15:10 | X8 | +11% | 16:30 |
| | 18:00 | 16:41 | X22 | -15% | 18:47 |
| | 22:00 | 19:46 | X22 | +15% | 22:10 |
| Day 5 | 3:00 | 22:17 | X22 | +15% | 00:53 |


(Figure 4.2: Y, dimensionless, plotted against sample number; series shown: pre-test samples and the samples taken during the X3, X10, X12, X9, X13, X8 and X22 step tests.)

Table 4.4 shows the predicted starting time of each test and the actual time the test was started, as well as the variable tested, the magnitude of the test, the time the sample was taken and the result of that sample. Most of the time the step tests started earlier than planned because the quality variable Y stabilized earlier than predicted, so the next test could be made sooner.

As expected, the planned step tests had a clear effect on Y. All Y results (real or predicted values) presented here and throughout this thesis are shown in a dimensionless form, as defined in equation 4.1:

$Y = \dfrac{Y_{\text{real or predicted}}}{Y_{\text{set point}}}$    (4.1)

The target range of [0.97, 1.03] for the dimensionless Y was covered. Having Y cover a wide range of values allows the dynamic data to capture the influence of a wider range of process conditions on the quality variable. The laboratory analysis error for Y is 0.30% of the set point value of the quality variable, and most step test results were larger than 0.30%.

As noted in subchapter 4.1.2, most of these variables end up appearing in the models

    developed in the next chapter. Variables X3, X10, X13, X8 and X22 appear in Model B, variables X3

    and X13 appear in Model C, and variable X13 appears in Model A.

    Figure 4.2 – Laboratory results for Y during step tests.

Tables 4.5 to 4.7 show the effects that consecutive step tests had on the quality variable. From the analysis of figure 4.2 and the tables mentioned above, we can infer which of the selected step-test variables influenced the response of the quality variable Y the most: X3, X10, X13 and X22.


Table 4.5 – Y magnitude of variation for each step test on variables X3, X10, X12 and X9.

| Variable | Test magnitude | Y magnitude of variation |
|----------|----------------|--------------------------|
| X3 | -1% | -0,47% |
| X3 | +1% | 0,71% |
| X3 | -3% | -0,68% |
| X3 | +3% | -1,07% |
| X10 | -3% | 1,89% |
| X10 | +5% | -1,55% |
| X10 | -5% | -0,19% |
| X10 | -7% | 2,30% |
| X10 | +5% | -0,92% |
| X10 | +5% | -0,03% |
| X12 | +7% | 0,30% |
| X12 | -1% | -0,05% |
| X12 | +2% | -0,30% |
| X12 | +5% | 0,05% |
| X12 | -5% | 0,49% |
| X9 | +7% | -0,49% |
| X9 | -5% | -0,60% |
| X9 | -7% | 0,50% |
| X9 | +5% | 0,27% |
| X9 | -5% | 1,53% |

Table 4.6 – Y magnitude of variation for each step test on variables X13 and X8.

| Variable | Test magnitude | Y magnitude of variation |
|----------|----------------|--------------------------|
| X13 | +7% | -3,56% |
| X13 | -5% | 0,08% |
| X13 | -7% | 0,92% |
| X13 | +5% | 0,69% |
| X13 | +5% | -0,22% |
| X8 | +13% | 0,14% |
| X8 | +1% | -0,05% |
| X8 | -5% | 0,14% |
| X8 | +11% | 0,08% |
| X8 | +11% | 0,11% |

Table 4.7 – Y magnitude of variation for each step test on variable X22.

| Variable | Test magnitude | Y magnitude of variation |
|----------|----------------|--------------------------|
| X22 | -15% | -1,10% |
| X22 | +15% | 0,80% |
| X22 | +15% | -2,17% |


    5. Soft-sensor Development

The first approach to soft-sensor modelling is usually the use of the most widely accepted linear tools, such as PCA and PLS regression. The main advantage of methods like PCA and PLS is that they can cope with highly correlated variables. This characteristic is suitable for analysing data from hydrocracking process units, because hydrocracking processes are multivariable systems and many of these variables are mutually correlated. To perform this type of analysis and model development, historical plant data for the selected variables were collected and step tests were performed for carefully chosen variables and process conditions.

In this section the quality variable Y is predicted using 25 online variables available in the database. These variables include flowmeters and temperature and pressure sensors, and all are online measured variables. The selection of which variables should be included in the soft-sensor is a complex task, and the strategy consists in finding a good variable subset capable of making accurate predictions. In this study two methods are used to obtain the soft-sensors that predict the quality variable: Partial Least Squares (PLS) as the linear modelling tool, and Principal Component Analysis (PCA) as a tool to select a good set of model variables and to strip the models of outliers and noise.
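A compact sketch of the kind of workflow just described, using scikit-learn: PCA is used to screen the scaled online variables (here, keeping those with a high absolute loading on at least one retained component, echoing the 50% loading guideline of section 3.3.1), and PLS is then fitted on the retained subset. All settings, thresholds and names are hypothetical placeholders, not the configuration of Models A to D.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

def build_soft_sensor(X_train, y_train, n_pcs=3, loading_threshold=0.5, n_latent=2):
    """Illustrative PCA screening of the online variables followed by a PLS model for Y."""
    scaler = StandardScaler().fit(X_train)                    # zero mean, unit variance
    Xs = scaler.transform(X_train)
    pca = PCA(n_components=n_pcs).fit(Xs)
    # keep variables with a large absolute loading on at least one retained PC
    keep = np.where(np.abs(pca.components_).max(axis=0) > loading_threshold)[0]
    pls = PLSRegression(n_components=n_latent).fit(Xs[:, keep], y_train)
    return scaler, keep, pls

# scaler, keep, pls = build_soft_sensor(X_train, y_train)
# y_pred = pls.predict(scaler.transform(X_valid)[:, keep])    # real-time style prediction
```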

Datasets were collected directly from the Digital Control System (DCS) and the Real Time Database (RTDB) of the Refinery and were used to build four models. The soft-sensors obtained from these data were labelled Model A, B, C and D, and the datasets are:

Model A: The soft-sensor is obtained from training data collected during the week of the step tests, from October 27th to October 31st, 2013, using the same data to calibrate the model.

Model B: The soft-sensor is obtained from training data collected in 2013 between August 1st and October 31st, using the same data to calibrate the model.

    Model C: The soft-sensor is obtained using training data collected in