Control Eng. Practice, Vol. 3, No. 8, pp. 1147-1153, 1995. Copyright 1995 Elsevier Science Ltd
Printed in Great Britain. All rights reserved. 0967-0661/95 $9.50 + 0.00
A SIMULATOR FOR PERFORMANCE ESTIMATION OF OPEN DISTRIBUTED COMPUTER CONTROL SYSTEMS
S. Horiike, Y. Okazaki and H. Soeda
Industrial Systems Laboratory, Mitsubishi Electric Corporation, Amagasaki, Hyogo 661, Japan
(Received October 1994; in final form May 1995)
Abstract. This paper describes a simulator developed for performance estimation of open distributed computer control systems. For efficient performance estimation, discrete-event simulation with graphical and detailed models is used. The system model consists of operating-system models, communication models, task models and environmental models. Various configurations can be easily simulated by combining these models.
Keywords: Estimation, simulation, control system design, distributed control, modeling.
A computer control system for monitoring and controlling a plant is required to possess various special characteristics. For example, it should respond quickly to computing demands and be fault-tolerant. A specialized computer architecture has been needed to satisfy these requirements. For example, the traditional architecture of an EMS (Energy Management System) is a centralized system using a dedicated computer. Recently, many EMS designers have become interested in open distributed computer systems as a new EMS architecture (Sasson, 1992). An open distributed system consists of multiple workstations or servers linked by a LAN (Local Area Network). It adopts standard products such as UNIX, Ethernet and TCP/IP. Open distributed systems have many attractive features such as expandability, maintainability, and fault tolerance. On the other hand, they also have problems which need to be resolved. Performance estimation is one of the most important of these.
When developing traditional centralized systems, experienced designers could estimate the performance using their intuition. However, it is difficult to estimate the performance of open distributed systems, for two reasons. First, the number of elements making up an open distributed system is very large and they cooperate in a complicated way. The elements include various parts of an open distributed system, ranging from hardware to software, e.g. a network interface device, a CPU scheduler, etc. Secondly, the behavior of each element itself is complicated. A computer-aided performance-estimation tool is therefore indispensable for estimating the performance of open distributed computer control systems.
The MELSPEC (MELco System Performance Certifier) performance estimation tool has been developed for efficient design of open distributed computer control systems. The tool includes detailed and graphical simulation models of open distributed systems. The performance is estimated by executing discrete-event simulation. This paper describes the basic idea of the tool, the modeling method and a simulation example.
2. PERFORMANCE ESTIMATION OF A COMPUTER CONTROL SYSTEM
Fig. 1 shows an ideal design cycle utilizing performance estimation. First, a designer proposes an initial design. Then various performance criteria are estimated according to the design. The estimated values are compared to a specification. If the estimated values satisfy the specification, the design is complete.
If the estimated values do not satisfy the specification, the designer revises the design and repeats the cycle. The cycle continues until the specification is satisfied. Performance estimation is a key stage in the cycle.
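The cycle of Fig. 1 can be sketched as a simple loop. This is only an illustration: the function names, the candidate designs and the toy estimator (a stand-in for a simulation run) are assumptions and are not part of MELSPEC.

```python
def first_acceptable_design(candidates, estimate, spec):
    """Estimate each proposed design in turn and stop at the first one
    whose estimated value satisfies the specification."""
    for design in candidates:
        if estimate(design) <= spec:
            return design
    return None  # no candidate met the specification


def toy_estimate(design):
    # Hypothetical stand-in for a simulation run: a response time in ms
    # that shrinks as workstations are added.
    return 900.0 / design["workstations"]


candidates = [{"workstations": n} for n in (2, 3, 4, 6)]
chosen = first_acceptable_design(candidates, toy_estimate, spec=250.0)
```

In practice the estimator is the expensive step, which is why the requirements that follow stress moderate estimation time and easy reconfiguration.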
The following are the requirements for efficient performance estimation of open distributed computer control systems.
1) The performance in the transient state is important.
Generally, the computing load of a computer control system is low in the normal state, and the system should operate properly. Once an event such as a fault occurs in the plant, the load on the computer system increases sharply. It is important to estimate how the system would operate during the peak load period.
2) Various system configurations should be easy to test.
There are many possible configurations of open distributed systems. There is not necessarily a single local area network; it may be duplicated or triplicated for communication load sharing or for fault tolerance. The number of computers and the allocation of functions depend on the system designers. Various configurations should be tested and the best one should be chosen through the design procedure. The performance estimation tool must facilitate the testing of various configurations.
3) The performance should be estimated within a moderate time.
Performance estimation is incorporated in the design process, so a proper estimate must be obtainable within a moderate time.
Fig. 1. Design cycle
4) The performance estimation tool should be easy to use.
Those who are responsible for the system performance are system engineers. So, a tool which requires special expertise is not appropriate. A tool which a system engineer could use easily is desirable.
5) The performance should be estimated whenever the system is partly replaced or expanded.
An open distributed system could be partly replaced or expanded. However, the performance would not be guaranteed after such changes. To enhance the expandability, the tool should support performance estimation when the system is partly replaced or expanded.
There are three possible methods of estimating the performance: measurement of an actual system, the analytical method and simulation. The value measured from an actual system is usually the most accurate, but the evaluation cost is too high to test various configurations. Another problem is the difficulty of setting up the conditions for performance measurement. For example, it is practically impossible to make all workstations access the network simultaneously in order to find the worst-case performance value. The benefit of the analytical method is its cost effectiveness. However, it is difficult to obtain an estimate in the transient state. Furthermore, the method is not sufficiently flexible to take all the elements into consideration.
The approach used in MELSPEC is discrete-event simulation with detailed and graphical models. This approach has the potential to provide the features required above: performance in transient states can be estimated, various configurations can be tested easily, and so on.
There are several problems inherent in the approach. The first problem is that the computation cost of simulation would be high. However, this cost has been dramatically decreased by the development of high-performance workstations. A simulation example confirmed that a computer system of realistic size can be simulated within a moderate time, even if detailed models are used.
The second problem is that the development cost of the simulation model would be high. However, many applications configure open systems from the same standard elements, so the developed model is applicable to the various areas where open systems are adopted. Consequently, the development cost can be reduced by sharing it among the applications.
The third problem is the guarantee of real-time constraints. There are two types of real-time constraints for computer control systems: "hard" real-time constraints and "soft" real-time ones. In hard real-time systems, missing a deadline causes fatal damage to the system, so this must not happen. On the other hand, soft real-time deadlines can occasionally be missed. Simulation is adequate for verifying soft real-time constraints. If the worst case which may take place in the system is known, and the system is deterministic, simulation can guarantee a hard real-time constraint. If not, one cannot guarantee the real-time constraints, but can increase confidence that the deadlines will be met by repeating the simulation.
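The idea of repeating the simulation can be sketched as follows. The response-time generator is a hypothetical stand-in for one randomized simulation run (its distribution and parameters are assumptions, not MELSPEC results); the point is only that many runs yield an estimated fraction of missed deadlines rather than a guarantee.

```python
import random


def simulate_response_ms(rng):
    # Hypothetical stand-in for one simulation run: a 40 ms base time
    # plus random queueing delay with a mean of 20 ms.
    return 40.0 + rng.expovariate(1.0 / 20.0)


def miss_fraction(deadline_ms, runs, seed=0):
    """Estimate the fraction of runs whose response time exceeds the deadline."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    misses = sum(simulate_response_ms(rng) > deadline_ms for _ in range(runs))
    return misses / runs
```

A small estimated fraction supports a soft real-time requirement; it does not prove a hard one, because an unobserved worst case may still exist.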
The development of the MELSPEC performance estimation tool has been based on the above approach. A software package (SES, 1991) for modeling and discrete-event simulation has been used. The graphical and detailed models of open distributed computer control systems have been developed with this software. The software enables the system to be modeled hierarchically and graphically. Hierarchical modeling means that the overall model is made up from submodules which contain graphs. Graphical modeling means that each model is expressed as a graph with nodes and arcs, using a graphical user interface. The notion of a graph is extended to include other submodules as nodes.
In the software, a discrete-event simulation is executed by event-entities traversing the graph. Each node has a function that is either predefined by the simulation software or written in C. When an event-entity reaches a node, the function defined in the node is executed. After the execution of the function, the event-entity follows the arc leaving that node to the next node. If the event-entity reaches a node expressing a submodule, it enters the submodule and traverses the extended graph contained in the submodule.
As an example, Fig. 2 shows a clock tick generation model in the UNIX operating system. The node start_timer generates an event-entity when the simulation is initiated. The event-entity traverses the loop (timer_delay -> setpri -> interrupt_to_system) indefinitely. In each pass through the loop, it stays at timer_delay for a period specified in the node.
Fig. 3 indicates the hierarchical structure of the developed models. The open distributed computer control system model is divided into a platform model and an application model.
The platform model consists of operating system models and communication models. The operating system model is a set of submodules which define the functions inside the UNIX workstations. The submodules are a scheduler model, a sleep model, a disk access model, a fork model, an exit model and a clock model. The processing elements in an operating system model which mainly relate to the performance are the CPU model and the disk model. A queuing discipline for an operating system is called a scheduler. The scheduler for UNIX is known as a round robin with multilevel feedback (Bach, 1986). This is one of the most important models used to estimate the performance of the open system. Only a simple disk access model is available in the current MELSPEC. Disk accesses are processed on a first-come-first-served basis. Other disk-related models, such as a swapper process, have not yet been developed.
In the OS model, almost all the event-entities are processes in the UNIX sense. In the application model described below, information about the activities in the OS is attached to the event-entities. This information includes the amount of CPU time, the disk access size, the priority of the process, etc. Event-entities carry this information as they traverse the OS model.
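The traversal mechanism can be illustrated with the clock tick loop of Fig. 2. The sketch below is not the API of the simulation package used by the authors; the event list, the function name and the assumption that setpri and interrupt_to_system take zero simulated time are illustrative choices.

```python
import heapq

# Arcs of the Fig. 2 loop: each node maps to its single successor.
NEXT_NODE = {
    "timer_delay": "setpri",
    "setpri": "interrupt_to_system",
    "interrupt_to_system": "timer_delay",
}


def run_clock(tick_ms, until_ms):
    """Return the simulated times (ms) at which a clock interrupt is raised."""
    if tick_ms <= 0:
        raise ValueError("tick_ms must be positive")
    interrupts = []
    seq = 0  # tie-breaker so simultaneous events keep insertion order
    # start_timer: generate one event-entity when the simulation starts.
    events = [(0.0, seq, "timer_delay")]
    while events:
        now, _, node = heapq.heappop(events)
        # Node function: only timer_delay holds the entity for a period.
        leave = now + (tick_ms if node == "timer_delay" else 0.0)
        if leave > until_ms:
            break
        if node == "interrupt_to_system":
            interrupts.append(leave)
        seq += 1
        # Follow the arc leaving this node.
        heapq.heappush(events, (leave, seq, NEXT_NODE[node]))
    return interrupts
```

With a 10 ms tick and a 35 ms horizon, the single event-entity raises interrupts at 10, 20 and 30 ms. A priority queue is more than one entity needs, but it is the structure that lets many event-entities (processes, packets) share one simulated clock.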
Fig. 2. Model of clock tick generation
Fig. 3. Model structure
Fig. 4. Connections of replicated models
Fig. 5. Typical life cycle of process
Communication is a critical part of the performance of distributed computer systems. The communication model consists of a protocol model and a network model.
As TCP/IP and Ethernet are the current de facto standards, the first version of MELSPEC includes their models. The TCP/IP model consists of a TCP model, a UDP model and an IP model (Comer et al., 1991). TCP has many functions for connection-oriented communication. The current TCP model has a function to send acknowledgements automatically when it receives application data. However, other functions, such as the window mechanism and piggybacking, have not yet been developed.
UDP is simply a buffer for sending data from IP to an application and from an application to IP. The IP model can divide a packet if the packet size exceeds the predetermined maximum size which the network device accepts. The IP model includes a function to route a packet in a predetermined way, as described below. The CSMA/CD model represents the media access protocol of Ethernet (MacDougall, 1987). This is also one of the most important models for estimating the performance of an open system. The CSMA/CD model has functions such as back-off delay calculation and detection of jams. This is the most complicated submodule in the current MELSPEC. Ethernet is modeled as propagation delays calculated from the distances between senders and receivers.
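As an illustration of a back-off delay calculation, the sketch below implements the truncated binary exponential backoff commonly used in 10 Mbit/s Ethernet. The paper does not give the details of the MELSPEC implementation, so the constants and the function name are assumptions based on the standard algorithm.

```python
import random

SLOT_TIME_US = 51.2  # slot time of 10 Mbit/s Ethernet, in microseconds
MAX_ATTEMPTS = 16    # the frame is discarded after this many collisions
BACKOFF_LIMIT = 10   # the exponent stops growing after this many collisions


def backoff_delay_us(collisions, rng):
    """Delay before the next retransmission attempt, or None if the frame
    is given up. `collisions` counts the collisions of this frame so far."""
    if collisions < 1:
        raise ValueError("backoff applies only after a collision")
    if collisions >= MAX_ATTEMPTS:
        return None  # too many collisions: report an error, drop the frame
    k = min(collisions, BACKOFF_LIMIT)
    slots = rng.randrange(2 ** k)  # uniform integer in 0 .. 2^k - 1
    return slots * SLOT_TIME_US
```

Because the delay is random, a simulator calling such a function with a seeded generator (for example `random.Random(1)`) stays repeatable while still reproducing the load-dependent growth of retransmission delays.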
An event-entity in the communication models is a packet. Each packet carries information such as node identifications, data size, etc. Models of other protocols and network devices, such as OSI, FDDI and ATM, are planned for future versions of MELSPEC.
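A packet event-entity and the packet division performed by the IP model might be sketched as follows. The field names are hypothetical, and protocol headers and alignment rules of real IP fragmentation are deliberately ignored; only the size-based splitting described in the text is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: int   # identification of the sending node
    dst: int   # identification of the receiving node
    size: int  # data size in bytes


def fragment(packet, max_size):
    """Split a packet into pieces no larger than max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    if packet.size <= max_size:
        return [packet]  # small enough: pass through unchanged
    pieces = []
    remaining = packet.size
    while remaining > 0:
        chunk = min(remaining, max_size)
        pieces.append(Packet(packet.src, packet.dst, chunk))
        remaining -= chunk
    return pieces
```

For example, a 3500-byte packet with a 1500-byte limit becomes three packets of 1500, 1500 and 500 bytes, each of which then contends for the network separately.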
The OS model and the communication model are basic in the sense that they can be generally used for modeling an open computer system. A platform model is made by replicating the above basic models and connecting them. Each replicated element has an identification number to distinguish it from the others. Different elements in the same workstation have the same identification number. That is, the ith operating system model has the ith clock model (clock[i]) and the ith scheduler model (scheduler[i]). The CSMA/CD model and the Ethernet model have different numbering schemes. In open distributed computer control systems, multiple LANs may be used to achieve network fault tolerance. Therefore, a workstation may have multiple LAN interface cards. MELSPEC uses one table indicating the connections between workstations and CSMA/CD models, and another table indicating the connections between the identification numbers of the CSMA/CD models and the Ethernet models. A different system configuration can be easily modeled by changing the replication number and the contents of the tables. Fig. 4 shows an example illustrating the modeling of a distributed computer system by replication and connection.
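The two connection tables can be pictured as simple mappings. The sketch below is an assumed representation, not the MELSPEC data format: two workstations, each with two LAN interfaces, attached to a duplicated Ethernet.

```python
# Table 1 (assumed layout): workstation number -> CSMA/CD model numbers.
WS_TO_CSMACD = {1: [1, 2], 2: [3, 4]}
# Table 2 (assumed layout): CSMA/CD model number -> Ethernet model number.
CSMACD_TO_ETHERNET = {1: 1, 2: 2, 3: 1, 4: 2}


def os_submodels(i):
    """Replicated OS submodels of workstation i share its number."""
    return [f"clock[{i}]", f"scheduler[{i}]"]


def lans_of(ws, ws_to_csmacd, csmacd_to_ethernet):
    """Ethernet models reachable from a workstation via its interfaces."""
    return sorted({csmacd_to_ethernet[c] for c in ws_to_csmacd[ws]})


def shared_lans(a, b, ws_to_csmacd, csmacd_to_ethernet):
    """Ethernet models over which workstations a and b can communicate."""
    lans_a = set(lans_of(a, ws_to_csmacd, csmacd_to_ethernet))
    lans_b = set(lans_of(b, ws_to_csmacd, csmacd_to_ethernet))
    return sorted(lans_a & lans_b)
```

Changing the configuration, for instance removing one interface card, then amounts to editing a table entry rather than redrawing the model.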
An application model consists of task models and environmental models. The task model is expressed by the life cycle of a process. The typical life cycle of a process can be classified into several patterns. Fig. 5 shows one of the typical life cycles in a distributed computer control system. A daemon process is forked when the system starts up. Then, it falls into an infinite loop. First, it waits f...