Advanced Production Accounting

Advanced Production Accounting Industrial Modeling Framework (APA-IMF)

i n d u s t r IAL g o r i t h m s LLC. (IAL)

www.industrialgorithms.com July 2013

Introduction to Advanced Production Accounting, UOPSS and QLQP Presented in this short document is a description of what we call "Advanced" Production Accounting (APA). APA is the term given to the technique of vetting, screening or cleaning the past production data using statistical data reconciliation and regression (DRR) when continuous-processes are assumed to be at steady-state (Kelly and Hedengren, 2013) i.e., there is no significant material accumulation. Essentially, the model and data define a simultaneous mass and volume with density DRR problem. Figure 1 depicts a relatively small production accounting flowsheet problem configured in our unit-operation-port-state superstructure (UOPSS) (Kelly, 2004a, 2005, and Zyngier and Kelly, 2012).

Figure 1. Oil-Refinery Production Accounting Flowsheet (Kelly and Mann, 2005). The diamond shapes or objects are the sources and sinks known as perimeters, the triangle shapes are the pools or tanks and the rectangle shapes with the cross-hairs are continuous-

process units and as mentioned these units should have a steady-state detection algorithm (SSD) installed to determine if the units are steady or stationary. The circle shapes with no cross-hairs are in-ports which can accept one or more inlet flows and are considered to be simple or uncontrolled mixers. The cross-haired circles are out-ports which can allow one or more outlet flows and are considered to be simple or uncontrolled splitters. The lines, arcs or edges in between the various shapes are known as internal and external streams and represent in this context the flows of materials from one shape to another. This example and its data are taken directly from Kelly and Mann (2005) but is mapped to our UOPSS modeling framework which includes only one time-period typically defined for one business or calendar day. A related technique using multiple time-periods can be found in Kelly et. al. (2005) to trace or track production qualities throughput any process network and is useful for real-time or on-line monitoring applications as it involves dynamic DRR. In this example, we have a crude-oil distillation unit (CRD), vacuum distillation unit (VAC), fluidized catalytic cracking unit (FCC) and a catalytic reformer (REF) as well as twenty-four (24) tanks for crude-oil, intermediate and final product storage. The continuous-process units only conserve mass whereas all of the tanks conserve both mass and volume using density as the conversion from volume to mass. There are five (5) perimeter units which represent pipeline deliveries and liftings as well as a fuel gas burner export to a cogeneration utilities plant. For this data set, there is no finished product blending that occurred over this production accounting time-period and hence there is no blending header units shown. The key difference between the modeling found in Kelly and Mann (2005) and our formulation is that we use the concept of "ports" which allows for a more unambiguous and parsimonious representation of the quantity, logic and quality phenomenological (QLQP) data. For instance, on the CRD at out-port JVAC there are two flows out (quantity) simultaneously which requires only one density (quality) measurement given that JVAC is an implied splitter. However, in the Kelly and Mann (2005) formulation which does not employ the concept of ports, they require two density measurements i.e., one for each stream out which requires more pre-processing of the data to manage the density of each individual stream. The efficiency of UOPSS and QLQP is that only one density measurement at JVAC needs to be configured and through the topology of the superstructure, the necessary propagation of the out-port qualities is properly and automatically handled. Industrial Modeling Framework (IMF), IMPRESS and SIIMPLE To implement the mathematical formulation of this and other systems, IAL offers a unique approach and is incorporated into our Industrial Modeling and Pre-Solving System we call IMPRESS. IMPRESS has its own modeling language called IML (short for Industrial Modeling Language) which is a flat or text-file interface as well as a set of API's which can be called from any computer programming language such as C, C++, Fortran, Java (SWIG), C# or Python (CTYPES) called IPL (short for Industrial Programming Language) to both build the model and to view the solution. Models can be a mix of linear, mixed-integer and nonlinear variables and constraints and are solved using a combination of LP, QP, MILP and NLP solvers such as COINMP, GLPK, LPSOLVE, SCIP, CPLEX, GUROBI, LINDO, XPRESS, CONOPT, IPOPT and KNITRO as well as our own implementation of SLP called SLPQPE (Successive Linear & Quadratic Programming Engine) which is a very competitive alternative to the other nonlinear solvers and embeds all available LP and QP solvers. In addition and specific to DRR problems, we also have a special solver called SECQPE standing for Sequential Equality-Constrained QP Engine which computes the least-squares

solution and a post-solver called SORVE standing for Supplemental Observability, Redundancy and Variability Estimator to estimate the usual DRR statistics found in Kelly (1998 and 2004b) and Kelly and Zyngier (2008a). SECQPE also includes a Levenberg-Marquardt regularization method for nonlinear data regression problems and can be presolved using SLPQPE i.e., SLPQPE warm-starts SECQPE. SORVE is run after the SECQPE solver and also computes the well-known "maximum-power" gross-error statistics to help locate outliers, defects and/or faults i.e., mal-functions in the measurement system and mis-specifications in the logging system. The underlying system architecture of IMPRESS is called SIIMPLE (we hope literally) which is short for Server, Interacter (IPL), Interfacer (IML), Modeler, Presolver Libraries and Executable. The Server, Presolver and Executable are primarily model or problem-independent whereas the Interacter, Interfacer and Modeler are typically domain-specific i.e., model or problem-dependent. Fortunately, for most industrial planning, scheduling, optimization, control and monitoring problems found in the process industries, IMPRESS's standard Interacter, Interfacer and Modeler are well-suited and comprehensive to model the most difficult of production and process complexities allowing for the formulations of straightforward coefficient equations, ubiquitous conservation laws, rigorous constitutive relations, empirical correlative expressions and other necessary side constraints. User, custom, adhoc or external constraints can be augmented or appended to IMPRESS when necessary in several ways. For MILP or logistics problems we offer user-defined constraints configurable from the IML file or the IPL code where the variables and constraints are referenced using unit-operation-port-state names and the quantity-logic variable types. It is also possible to import a foreign LP file (row-based MPS file) which can be generated by any algebraic modeling language or matrix generator. This file is read just prior to generating the matrix and before exporting to the LP, QP or MILP solver. For NLP or quality problems we offer user-defined formula configuration in the IML file and single-value and multi-value function blocks writable in C, C++ or Fortran. The nonlinear formulas may include intrinsic functions such as EXP, LN, LOG, SIN, COS, TAN, MIN, MAX, IF, NOT, EQ, NE, LE, LT, GE, GT and KIP, LIP, SIP (constant, linear and monotonic spline interpolation) as well as user-written extrinsic functions. Industrial modeling frameworks or IMF's are intended to provide a jump-start to an industrial project implementation i.e., a pre-project if you will, whereby pre-configured IML files and/or IPL code are available specific to your problem at hand. The IML files and/or IPL code can be easily enhanced, extended, customized, modified, etc. to meet the diverse needs of your project and as it evolves over time and use. IMF's also provide graphical user interface prototypes for drawing the flowsheet as in Figure 1 and typical Gantt charts and trend plots to view the solution of quantity, logic and quality time-profiles. Current developments use Python 2.3 and 2.7 integrated with open-source Dia and Matplotlib modules respectively but other prototypes embedded within Microsoft Excel/VBA for example can be created in a straightforward manner. However, the primary purpose of the IMF's is to provide a timely, cost-effective, manageable and maintainable deployment of IMPRESS to formulate and optimize complex industrial manufacturing systems in either off-line or on-line environments. Using IMPRESS alone would be somewhat similar (but not as bad) to learning the syntax and semantics of an AML as well as having to code all of the necessary mathematical representations of the problem including the details of digitizing your data into time-points and periods, demarcating past, present and future time-horizons, defining sets, index-sets, compound-sets to traverse the network or topology, calculating independent and dependent parameters to be used as coefficients and bounds and

finally creating all of the necessary variables and constraints to model the complex details of logistics and quality industrial optimization problems. Instead, IMF's and IMPRESS provide, in our opinion, a more elegant and structured approach to industrial modeling and solving so that you can capture the benefits of advanced decision-making faster, better and cheaper. "Advanced" Production Accounting Synopsis At this point we explore further the purpose of "advanced" production accounting in terms of its diagnostic capability of aiding in the detection, identification and elimination of "bad" production data where "bad" really implies inconsistent data. The major advantage of DRR is its ability to use redundant data which is sometimes referred to as over-determined or over-specified problems. The redundancy primarily occurs because of the inclusion of a model i.e., equations or equality constraints relating flow, holdup and density variables together as in laws of conservation of matter, energy and momentum. Some of these variables are measured or reconciled, some are unmeasured or regressed while others are fixed or rigid. Measured variables include a raw and known (finite) variance, unmeasured variables have a large and unknown (infinite) variance and fixed variables have no or zero variance. The DRR objective function is to minimize the weighted sum of squares of the raw measurements minus its reconciled estimate where the weights are simply determined as the inverse of its raw variance (Kelly, 1998). At a converged DRR solution using SECQPE we have estimates of the reconciled and unmeasured or regressed variables and after running SORVE we have new variance estimates for the reconciled and unmeasured or regressed variables as well as redundancy and observability estimates for each measured and unmeasured variable respectively. Furthermore, using these variances we can compute individual gross-error detection statistics for the measured variables and equality constraints as well as confidence intervals for each unmeasured variable using the Student-t tables to determine statistical threshold or critical values. In addition, we can also compute a global or overall Hotelling statistic on the objective function value to detect if at least one gross-error exists. If we apply these techniques to the data set found in Kelly and Mann (2005) where the flowsheet has been slightly modified to transform it into UOPSS, and there are no injected gross-errors into the system, we arrive at an objective function of 34.87 with a Hotelling critical value of 43.2 indicating that there are no detectable gross-errors. However, if we add a significant bias, drift or offset to the density of pool T300 storing LPG of 0.05 i.e., the density changes from 0.600 to 0.650, the objective function inflates to 334.64 where the Hotelling statistic does not change. This indicates that at least one of the measurements is in gross-error and/or there is a leak or unexpected flow in or out of one of the nodes. Using the individual maximum-power measurement statistics we have three significant ones for the densities on the "FCC,lpg" and "REF,lpg" out-ports as well as on "T300,LPG" of 17.315, 17.353 and 17.313 respectively which are very similar to those found in Table 3 of Kelly and Mann (2005). Although it does not pinpoint "T300,LPG" exactly as the location of the gross-error it is able to isolate the area, section or region of the flowsheet accurately to where the possible outlier may exist which is very useful for large flowsheets. An interesting property or artifact of the maximum-power measurement statistics is that if the measurement is deleted or removed i.e., is made unmeasured, then the reduction in the weighted least-squares objective function will equal the square of the maximum-power statistic. For example, 17.313^2 = 299.74 and when we subtract this amount from 334.64 we get 334.64 - 299.74 = 34.90 which is very close to our original objective function with no detectable gross-error of 34.87. Note that the reason it is called the maximum-power statistic is due to the fact that if there is only one gross-error in the system then this statistic will have the maximum-power or "maximum-probability" to detect that it is a true outlier.

More generally, there are essentially two types of what-if scenarios used in APA to ultimately "close" a production accounting period data set to within statistical control limits i.e., declaring the production accounting period to be in statistical production control. The first is the one mentioned above whereby a measured/reconciled variable is determined to be in gross-error by switching it to an unmeasured/regressed variable and checking to see if the objective function and other measurement and constraint statistics are below their statistical critical values. The second is making a fixed or rigid variable into an unmeasured or regressed variable. In most industrial plants found in the process industries (especially in pipe-less plants) there is flexibility in how materials or resources can be routed, connected or streamed from one piece of equipment to another (Kelly, 2000). The logging or recording of these movements can also be erroneous even to the point where they are not logged at all. If the system knows of all of the possible routes, lineups or external streams (out-port to in-port) then it is prudent to change a suspect route from being fixed or rigid i.e., not open, active or setup with a tolerance or variance of zero (0), to being unmeasured or regressed with an unknown value and a variance of infinity. If a scenario with one of the routes changed from fixed to unmeasured results in a significant reduction in the objective function, then this is potentially a mis-logged or mis-specified connection and should be investigated further (Kelly, 1999). In conclusion, the primary benefit of APA is to statistically scrutinize the production accounting data on a regular and timely basis to quickly and accurately highlight anomalies in the flowsheet where possible defects exist. When gross-errors are detected and identified it is then prudent to eliminate these faults by re-calibrating instruments, improving the logging or recording of manually entered transactional data such as temporary stream flows, updating or refreshing auxiliary data sources more frequently, etc. (Kelly, 2000). If for example advanced planning and scheduling (APS) decisions are made using bad or poor quality production data then of course these decisions are unfortunately suspect and can significantly and negatively impact the performance and profitability of your production-chain (Kelly and Zyngier, 2008b). Finally, Appendix A and B show the APA-IMF.UPS and APA-IMF.IML files used to configure both the model and the data of the APA problem. The UPS file contains the UOPSS constructs or shapes and the IML file contains all of the static and dynamic QLQP capacity data referenced by the UOPSS constructs. The UPS file can be automatically created using the open-source drawing software called GNOME Dia and using the Python 2.3 programming language to access Dia's object model to retrieve the UOPSS sheet shapes. The IML file is a simple text file with several categories or classifications of both the model (master, static) data and the cycle (transactional, dynamic) data. An interesting feature of the IML file are the use of "Calc"'s (values assigned to symbols) which can be used to manage dynamic data from the field such as flow meter readings and laboratory analysis results. This means that interfacing or binding the various data sources to the IML file is achieved by changing the value of a Calc and then using this Calc in the rest of the data categories of the IML file. Another interesting feature is the use of a "missing-value" or "missing-data" number we call a "non-naturally occurring number" (NNON) typically set to -99999. This is useful to switch a measurement from being measured to unmeasured i.e., if the value is NNON then it is to be regressed in the DRR, when performing the gross-error detection and identification analysis similar to running multiple scenarios, cases or situations to determine if the problem contains bad data before the production accounting data is disseminated to other decision-making applications. References Kelly, J.D., "A regularization approach to the reconciliation of constrained data sets", Computers & Chemical Engineering, 1771, (1998).

Kelly, J.D., "Practical issues in the mass reconciliation of large plant-wide flowsheets", AIChE Spring Meeting, Houston, March, (1999). Kelly, J.D., “The necessity of data reconciliation”, NPRA Computer Conference, Chicago, November, (2000). Kelly, J.D., "Production modeling for multimodal operations", Chemical Engineering Progress, February, 44, (2004a). Kelly, J.D., "Techniques for solving industrial nonlinear data reconciliation problems", Computers & Chemical Engineering, 2837, (2004b). Kelly, J.D., Mann, J.L., "Improve yield accounting by including density measurements explicitly", Hydrocarbon Processing, January, (2005). Kelly, J.D., Mann, J.L., Schulz, F.G., "Improve accuracy of tracing production qualities using successive reconciliation", Hydrocarbon Processing, April, (2005). Kelly, J.D., "The unit-operation-stock superstructure (UOSS) and the quantity-logic-quality paradigm (QLQP) for production scheduling in the process industries", In: MISTA 2005 Conference Proceedings, 327, (2005). Kelly, J.D., Zyngier, D., "A new and improved MILP formulation to optimize observability, redundancy and precision for sensor network problems", American Institute of Chemical Engineering Journal, 54, 1282, (2008a). Kelly, J.D., Zyngier, D., "Continuously improve planning and scheduling models with parameter feedback", FOCAPO 2008, July, (2008b). Zyngier, D., Kelly, J.D., "UOPSS: a new paradigm for modeling production planning and scheduling systems", ESCAPE 22, June, (2012). Kelly, J.D., Hedengren, J.D., "A steady-state detection (SDD) algorithm to detect non-stationary drifts in processes", Journal of Process Control, 23, 326, (2013). Appendix A - APA-IMF.UPS (UOPSS) File

Appendix B - APA-IMF.IML File

Technology

Advanced Production Accounting