A simulator for performance estimation of open distributed computer control systems

Pergamon

0967-0661 (95)00110-7

Control Eng. Practice, Vol. 3, No. 8, pp. 1147-1153, 1995 Copyright © 1995 Elsevier Science Ltd

Printed in Great Britain. All rights reserved 0967-0661/95 $9.50 + 0.00

A SIMULATOR FOR PERFORMANCE ESTIMATION OF OPEN DISTRIBUTED COMPUTER CONTROL SYSTEMS

S. Horiike, Y. Okazaki and H. Soeda

Industrial Systems Laboratory, Mitsubishi Electric Corporation, Amagasaki, Hyogo 661, Japan

(Received October 1994; in final form May 1995)

Abstract. This paper describes a simulator developed for performance estimation of open distributed computer control systems. For efficient performance estimation, discrete-event simulation with graphical and detailed models is used. The system model consists of operating-system models, communication models, task models and environmental models. Various configurations can be easily simulated by combining these models.

Keywords: Estimation, simulation, control system design, distributed control, modeling.

1. INTRODUCTION

A computer control system for monitoring and controlling a plant is required to possess various special characteristics. For example, it should respond quickly to computing demands and be fault-tolerant. A specialized computer architecture has been needed to satisfy these requirements. For example, the traditional architecture of an EMS (Energy Management System) is a centralized system using a dedicated computer. Recently, many EMS designers have become interested in open distributed computer systems as a new EMS architecture (Sasson, 1992). An open distributed system consists of multiple workstations or servers linked by a LAN (Local Area Network). The system adopts standard products such as UNIX, Ethernet and TCP/IP. Open distributed systems have many attractive features such as expandability, maintainability, and fault tolerance. On the other hand, they also have problems which need to be resolved. Performance estimation is one of the most important of these.

When developing traditional centralized systems, experienced designers could estimate the performance using their intuition. However, it is difficult to estimate the performance of open distributed systems, for two reasons. First, the number of elements making up an open distributed system is very large and they cooperate in a complicated way. The elements range from hardware to software, e.g. a network interface device, a CPU scheduler, etc. Secondly, the behavior of each element itself is complicated. So, a computer-aided performance-estimation tool is indispensable for estimating the performance of open distributed computer control systems.

The MELSPEC (MELco System Performance Certifier) performance estimation tool has been developed for efficient design of open distributed computer control systems. The tool includes detailed and graphical simulation models of open distributed systems. The performance is estimated by executing discrete-event simulation. This paper describes the basic idea of the tool, the modeling method and a simulation example.

2. PERFORMANCE ESTIMATION OF A COMPUTER CONTROL SYSTEM

Fig. 1 shows an ideal design cycle utilizing performance estimation. First, a designer proposes an initial design. Then various performance criteria are estimated according to the design. The estimated values are compared to a specification. If the estimated values satisfy the specification, the design is complete.


If the estimated values do not satisfy the specification, the designer should revise the design and repeat the cycle. The cycle continues until the specification is satisfied. The performance estimation is a key stage in the cycle.
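The design cycle can be sketched as a simple loop. This is a minimal illustration only: the names design_cycle and toy_estimate are hypothetical, and toy_estimate is a stand-in for a simulation run, not a MELSPEC function.

```python
from typing import Callable, Iterable, Optional


def design_cycle(candidates: Iterable[dict],
                 estimate: Callable[[dict], float],
                 spec_limit: float) -> Optional[dict]:
    """Return the first candidate design whose estimate meets the specification."""
    for design in candidates:
        if estimate(design) <= spec_limit:   # compare the estimate to the specification
            return design                    # the design is complete
    return None                              # no candidate satisfied the specification


def toy_estimate(design: dict) -> float:
    """Hypothetical estimator: response time (s) falls as CPUs are added."""
    return 1.0 / design["cpus"]


chosen = design_cycle(({"cpus": n} for n in range(1, 9)), toy_estimate, spec_limit=0.4)
print(chosen)
```

In practice the estimate step would be a full simulation run, which is why its cost dominates the cycle.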

The following are the requirements for the efficient performance estimation of open distributed computer control systems.

1) The performance in the transient state is important.

Generally, the computing load in the normal state of a computer control system is low, and the system operates properly. Once something such as a fault happens in the plant, the load on the computer system increases sharply. It is important to estimate how the system would operate during the peak load period.

2) Various system configurations should be easy to test.

There are many variations in the configuration of open distributed systems. There need not be only one local area network; it may be duplicated or triplicated for load sharing or for fault tolerance. The number of computers and the allocation of functions depend on the system designers. Various configurations should be tested and the best one should be chosen through the design procedure. The performance estimation tool must facilitate the testing of various configurations.

3) The performance should be estimated within a moderate time.

The performance estimation is incorporated in the design process, so it should not take a long time to obtain a proper estimate.

Fig. 1. Design cycle (blocks: DESIGN, ESTIMATION, SPECIFICATION)

4) The performance estimation tool should be easy to use.

Those who are responsible for the system performance are system engineers. So, a tool which requires special expertise is not appropriate. A tool which a system engineer could use easily is desirable.

5) The performance should be estimated whenever the system is partly replaced or expanded.

An open distributed system can be partly replaced or expanded. However, the performance is not guaranteed after such changes. To enhance the expandability, the tool should support performance estimation when the system is partly replaced or expanded.

There are three possible methods of estimating the performance: measurement of an actual system, the analytical method and simulation. The value measured from an actual system is usually the most accurate, but the evaluation cost is too high to test various configurations. Another problem is the difficulty of setting up the situation to be measured. For example, to find the worst-case performance, all workstations would have to access the network simultaneously, which is practically impossible to arrange on an actual system. The benefit of the analytical method is its cost effectiveness. However, it is difficult to obtain an estimate in the transient state. Furthermore, the method is not sufficiently flexible to take all the elements into consideration.

The approach used in MELSPEC is discrete-event simulation with detailed and graphical models. This approach has the features desired for this purpose: performance can be estimated in transient states, various configurations can be easily tested, etc.

There are several problems inherent in the approach. The first problem is that the computation cost of simulation would be high. However, this has been dramatically decreased by the development of high-performance workstations. A simulation example confirmed that a realistically sized computer system can be simulated within a moderate time, even if detailed models are used.

The second problem is that the development cost of the simulation model would be high. However, many applications configure open systems from the same standard elements. So, the developed model is applicable to the various areas where open systems are adopted. Consequently, the development cost can be reduced by sharing it among the applications.

The third problem is the guarantee of real-time constraints. There are two types of real-time constraints for computer control systems: "hard" real-time constraints and "soft" real-time ones. In hard real-time systems, missing a deadline causes fatal damage to the system, so this must not happen. On the other hand, soft real-time deadlines can occasionally be missed. Simulation is adequate for verifying soft real-time constraints. If the worst case which may take place in the system is known, and the system is deterministic, simulation can guarantee a hard real-time constraint. If not, one cannot guarantee the real-time constraints, but can raise confidence that the deadlines will be met by repeating the simulation.
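Repeating the simulation with different random seeds gives an empirical estimate of how often a deadline is missed. The following is a minimal sketch under stated assumptions: simulate_worst_response is a hypothetical stand-in for one complete simulation run (here it simply draws 40 exponentially distributed response times with a 0.1 s mean), not a MELSPEC call.

```python
import random


def simulate_worst_response(seed: int) -> float:
    """Hypothetical stand-in for one simulation run; returns the worst response time (s)."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean); the 0.1 s mean is an arbitrary assumption
    return max(rng.expovariate(1 / 0.1) for _ in range(40))


def miss_fraction(deadline: float, runs: int) -> float:
    """Fraction of independent runs in which the worst response exceeds the deadline."""
    misses = sum(simulate_worst_response(seed) > deadline for seed in range(runs))
    return misses / runs


print(miss_fraction(deadline=0.4, runs=200))
```

A fraction of zero over many runs supports a soft real-time claim, but it is not a proof for a hard constraint unless the worst case is known to be covered.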

3. MELSPEC

The development of the MELSPEC performance estimation tool has been based on the above approach. A software package (SES, 1991) for modeling and discrete-event simulation has been used. The graphical and detailed models of open distributed computer control systems are developed on the software. The software enables the system to be modeled hierarchically and graphically. Hierarchical modeling means that the overall model is made up from submodules which contain graphs. Graphical modeling means that each model is expressed as a graph with nodes and arcs, using a graphical user interface. The notion of a graph is extended to include other submodules as nodes.

The execution of event-driven simulation in the software means that event-entities traverse the graph. Each node has a function that is either predefined by the simulation software or written in the C language. When an event-entity reaches a node, the function defined in the node is executed. After the execution of the function, the event-entity follows an arc leaving the node to the next node. If the event-entity reaches a node expressing a submodule, it enters the submodule and traverses the graph contained in the submodule.

As an example, Fig. 2 shows a clock tick generation model in the UNIX operating system. The node start_timer generates an event-entity when the simulation is initiated. The event-entity traverses the loop (timer_delay -> setpri -> interrupt_to_system) indefinitely. During the loop, it stays at timer_delay for a period specified in the node.

Fig. 3 indicates the hierarchical structure of the developed models. The open distributed computer control system model is divided into a platform model and an application model.

The platform model consists of operating system models and communication models. The operating system model is a set of submodules which define the functions inside the UNIX workstations. The submodules are a scheduler model, a sleep model, a disk access model, a fork model, an exit model and a clock model. The processing elements in an operating system model which mainly relate to the performance are the CPU model and the disk model. A queuing discipline for an operating system is called a scheduler. The scheduler for UNIX is known as a round robin with multilevel feedback (Bach, 1986). This is one of the most important models used to estimate the performance of the open system. Only a simple disk access model is available in the current MELSPEC. Disk accesses are processed on a first-come-first-served basis. Other disk-related models, such as a swapper process, have not yet been developed.
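To illustrate what a round robin with multilevel feedback does, the following is a deliberately simplified sketch. It assumes fixed time slices and a fixed number of priority levels, and it does not reproduce the priority recalculation of the UNIX scheduler or the MELSPEC scheduler model; the function name and parameters are hypothetical.

```python
from collections import deque


def mlfq_finish_order(jobs, quantum_ms=100, levels=3):
    """Return job names in completion order under a simplified multilevel feedback queue.

    jobs maps a job name to its required CPU time in milliseconds.
    """
    queues = [deque() for _ in range(levels)]
    for name, cpu_ms in jobs.items():
        queues[0].append((name, cpu_ms))          # new jobs start at the highest priority
    finished = []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)  # highest non-empty level
        name, remaining = queues[level].popleft()           # round robin within a level
        remaining -= quantum_ms                             # run for one time slice
        if remaining <= 0:
            finished.append(name)
        else:
            # a job that used its whole slice is fed back to a lower-priority queue
            queues[min(level + 1, levels - 1)].append((name, remaining))
    return finished


print(mlfq_finish_order({"a": 250, "b": 50, "c": 100}))
```

Short jobs finish ahead of long ones because CPU-hungry jobs sink to lower-priority queues, which is the behavior that makes the scheduler model matter for response-time estimates.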

In the OS model, almost all the event-entities are processes in the UNIX sense. In the application model described below, information about the activities in the OS is attached to the event-entities: the amount of CPU time, the disk access size, the priority of the process, etc. The event-entities carry this information as they traverse the OS model.

Fig. 2. Model of clock tick generation (nodes: start_timer, timer_delay, setpri, interrupt_to_system)
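The traversal of the clock tick graph by a single event-entity can be sketched with an ordinary event list. This is an illustration only: the node names follow Fig. 2, but the node functions are left empty and the event-list mechanics are an assumption, not the behavior of the simulation package.

```python
import heapq


def run_clock_model(tick: float, until: float) -> list:
    """Trace one event-entity through start_timer and the tick-generation loop."""
    successor = {                        # arcs of the graph
        "start_timer": "timer_delay",
        "timer_delay": "setpri",
        "setpri": "interrupt_to_system",
        "interrupt_to_system": "timer_delay",
    }
    delay = {"timer_delay": tick}        # only timer_delay holds the entity
    events = [(0.0, 0, "start_timer")]   # (time, sequence number, node)
    seq, trace = 1, []
    while events:
        now, _, node = heapq.heappop(events)
        if now > until:
            break
        trace.append((now, node))        # the node function would be executed here
        arrival = now + delay.get(node, 0.0)
        heapq.heappush(events, (arrival, seq, successor[node]))
        seq += 1
    return trace


ticks = [t for t, node in run_clock_model(tick=1.0, until=3.5)
         if node == "interrupt_to_system"]
print(ticks)
```

The sequence number breaks ties between events scheduled for the same time, so nodes visited without delay are still processed in arrival order.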

Fig. 3. Model structure (recoverable labels: PLATFORM MODEL, REPLICATION, CONNECTION, TASK MODEL, OS MODEL)


Fig. 4. Connections of replicated models (nodes CSMA/CD[0] to CSMA/CD[4])

Fig. 5. Typical life cycle of a process (blocks: Fork, Receive Event, Processing, Send Event)

Communication is a critical part of the performance of distributed computer systems. The communication model consists of a protocol model and a network model.

As TCP/IP and Ethernet are current de facto standard products, the first version of MELSPEC includes their models. The TCP/IP model consists of a TCP model, a UDP model and an IP model (Comer et al., 1991). TCP has many functions for connection-oriented communication. The current TCP model has a function to send acknowledgements automatically when it receives application data. However, other functions, such as the window mechanism and piggybacking, have not yet been developed.

UDP is simply a buffer for sending data from IP to an application and from an application to IP. The IP model can divide a packet if the packet size exceeds the predetermined size which the network device accepts. The IP model includes a function to route a packet in a predetermined way as described below. The CSMA/CD model represents the media access protocol of Ethernet (MacDougall, 1987). This is also one of the most important models for estimating the performance of an open system. The CSMA/CD model has functions for back-off delay calculation, detection of jams, etc. This is the most complicated submodule in the current MELSPEC. Ethernet is modeled as propagation delays calculated from the distances between senders and receivers.
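The back-off delay calculation in CSMA/CD is conventionally a truncated binary exponential back-off. The sketch below assumes the standard 10 Mbit/s Ethernet constants (slot time of 512 bit times, back-off limit 10, attempt limit 16); the function name is hypothetical and the code is not taken from the MELSPEC model.

```python
import random

SLOT_TIME = 51.2e-6     # seconds: 512 bit times at 10 Mbit/s
BACKOFF_LIMIT = 10      # the exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16      # the frame is discarded after 16 failed attempts


def backoff_delay(collisions: int, rng: random.Random) -> float:
    """Delay (s) before the next attempt, given collisions so far for this frame."""
    if not 1 <= collisions < ATTEMPT_LIMIT:
        raise ValueError("back-off is defined for 1 to 15 collisions")
    k = min(collisions, BACKOFF_LIMIT)
    slots = rng.randrange(2 ** k)        # uniform integer in [0, 2**k - 1]
    return slots * SLOT_TIME


rng = random.Random(1)
print([round(backoff_delay(n, rng) / SLOT_TIME) for n in (1, 2, 3, 12)])
```

Because the delay range doubles with each collision, heavily loaded segments spread their retries over longer intervals, which is one reason LAN utilization and response time interact non-linearly.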

An event-entity in the communication models is a packet. Each packet carries information such as node identifications, data size, etc. Models of other protocols and network devices, such as OSI, FDDI and ATM, are planned for future versions of MELSPEC.
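A packet event-entity and the division of an oversized packet by the IP model can be sketched as follows. The Packet fields and the fragment function are hypothetical names chosen for illustration; header overhead and fragment offsets, which a real IP implementation carries, are omitted.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: int     # identification number of the sending node
    dst: int     # identification number of the receiving node
    size: int    # data size in bytes


def fragment(packet: Packet, max_size: int) -> list:
    """Split a packet into pieces no larger than max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    sizes = [min(max_size, packet.size - offset)
             for offset in range(0, packet.size, max_size)]
    return [Packet(packet.src, packet.dst, s) for s in sizes]


pieces = fragment(Packet(src=0, dst=1, size=4000), max_size=1500)
print([p.size for p in pieces])
```

Each piece then traverses the network model as a separate event-entity, so a large message loads the LAN with several transmissions rather than one.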

The OS model and the communication model are basic in the sense that they can be generally used for modeling an open computer system. A platform model is made by replicating the above basic models and connecting them. Each replicated element has an identification number to distinguish it from the others. Different elements in the same workstation have the same identification number; that is, the ith operating system model has the ith clock model (clock[i]) and the ith scheduler model (scheduler[i]). The CSMA/CD model and the Ethernet model have different numbering schemes. In open distributed computer control systems, multiple LANs may be used to achieve network fault tolerance, so a workstation may have multiple LAN interface cards. MELSPEC uses one table indicating the connections between workstations and CSMA/CD models, and another table indicating the connections between the identification numbers of the CSMA/CD models and the Ethernet models. A different system configuration can be easily modeled by changing the replication number and the contents of the tables. Fig. 4 shows an example of modeling a distributed computer system by replication and connection.
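The two connection tables can be pictured as simple mappings between identification numbers. The layout and contents below are assumptions made for illustration (three workstations, five CSMA/CD models, two Ethernet segments); the source does not give the actual table format.

```python
# workstation id -> ids of the CSMA/CD (LAN interface) models it owns
WS_TO_CSMACD = {0: [0, 1], 1: [2, 3], 2: [4]}

# CSMA/CD id -> id of the Ethernet model it is attached to
CSMACD_TO_ETHERNET = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}


def lans_of(workstation: int) -> list:
    """Ethernet ids reachable from a workstation through its interfaces."""
    return sorted({CSMACD_TO_ETHERNET[c] for c in WS_TO_CSMACD[workstation]})


def shared_lans(a: int, b: int) -> list:
    """Ethernet ids over which workstations a and b can communicate directly."""
    return sorted(set(lans_of(a)) & set(lans_of(b)))


print(shared_lans(0, 1), shared_lans(0, 2))
```

Changing the configuration, for example duplicating a LAN, then amounts to editing table entries rather than redrawing the model.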

An application model consists of task models and environmental models. The task model is expressed by the life cycle of a process. The typical life cycle of a process can be classified into several patterns. Fig. 5 shows one of the typical life cycles in a distributed computer control system. A daemon process is forked when the system starts up. Then, it enters an infinite loop. First, it waits for an event. When it receives an event, it consumes some CPU time. After it sends an event to another process, it waits for the next event. The task model requires the parameters of each process, e.g. the CPU time or the amount of communication data. Each task calls the CPU models with an identification number which indicates the workstation to which the task is allocated. Different task allocations can be easily tested by changing the numbers.
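The life cycle of Fig. 5 maps naturally onto a generator that yields its requests to the platform model. This is a sketch only: the request tuples and the parameter names are hypothetical and do not reflect how MELSPEC encodes a task.

```python
def daemon(name, cpu_time, out_bytes, workstation):
    """Yield the requests of a daemon task: wait, consume CPU, send, then repeat."""
    while True:
        event = yield ("wait_event", name)               # block until an event arrives
        yield ("use_cpu", workstation, cpu_time)         # CPU model of the given workstation
        yield ("send_event", name, out_bytes, event)     # pass the result on


task = daemon("alarm_handler", cpu_time=0.02, out_bytes=512, workstation=1)
print(next(task))            # the task starts by waiting
print(task.send("fault"))    # an event arrives; the task asks for CPU time
print(next(task))            # then it sends its own event
```

Re-allocating the task to another workstation only requires changing the workstation argument, which mirrors the identification-number mechanism described above.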

An environmental model covers the various inputs to a computer control system. It generates the initial events and triggers the subsequent events. The operator activities and the real-time data from a plant are typical inputs to the system. They are modeled as input times and/or data sizes. These values are usually generated by random-number generators.
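An environmental model that generates input times can be sketched as a random arrival process. The exponential distribution of the intervals is an assumption made here for illustration; the text states only that random-number generators are used.

```python
import random


def arrival_times(mean_interval: float, horizon: float, rng: random.Random) -> list:
    """Input times (s) within [0, horizon], with exponentially distributed intervals."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean_interval)   # expovariate takes the rate
        if t > horizon:
            return times
        times.append(t)


requests = arrival_times(mean_interval=1.5, horizon=60.0, rng=random.Random(0))
print(len(requests))
```

Fixing the seed makes a run repeatable, while varying it across runs exposes how sensitive the estimated performance is to the input pattern.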

4. VARIOUS PHASES OF THE USE OF MELSPEC

There are various phases at which the tool could be used, besides the initial design process as in Fig. 1.

1) Architecture Comparison

In the early stage of open system development, estimation of parameters is not easy because of the lack of data. For example, when changing from a closed architecture to an open architecture, CPU time consumed for an application cannot be easily estimated. In this case, architecture comparisons are effective. Several candidate architectures are simulated to compare their relative performance values.

2) Re-Engineering

A system may need to be reconstructed for performance improvement. In this case, MELSPEC is used to find the performance bottleneck and to analyze the system, as well as to estimate the performance of the improved system. Model parameters can be measured easily in this case.

3) Replacement or Expansion

The main characteristic of open distributed systems is that they are partly replaceable and extendible. However, if a part of the system is replaced, or some elements are added, this may affect the performance of the whole system. Therefore, a system should not be partly replaced or extended without a performance study.

4) Educational tool

MELSPEC can also be utilized as an educational tool for open distributed systems, because it holds detailed models of the operating system and communication, and it can simulate a computer system in animation mode. Furthermore, as the model is classified into several aspects and selected parts of the animated simulation can be displayed, one can trace the relevant parts of the models. For example, a communication programmer could trace the communication part and learn the functions of TCP/IP, CSMA/CD and Ethernet.

5. EXAMPLE

In this section, an example is presented to describe the way in which MELSPEC is used. Suppose the system structure in Fig. 6 is simulated. The system has 4 types of workstations. The communication server gathers real-time data from the plant and sends the data to the data server. When the data server gets data from the communication server, it stores the data on its own disk after processing. The data server also serves the clients' requests. Upon a request, it prepares data by accessing its CPU and disk, then replies with the data. The computing server undertakes intensive computing upon a client's request. The system is supposed to have 6 clients. Each client randomly accesses the data server. The mean time between successive requests is 1.5 seconds. Each client also accesses the computing server with a mean time of 3 seconds between requests. The communication between the communication server and the data server and between the clients and the data server is through LAN1. The communication between the clients and the computing server is through LAN2.
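For the example system of Section 5 (6 clients, mean request intervals of 1.5 s to the data server and 3 s to the computing server), the mean offered load follows from simple arithmetic. The variable names below are illustrative.

```python
CLIENTS = 6
MEAN_INTERVAL_DATA = 1.5       # s between requests from one client to the data server
MEAN_INTERVAL_COMPUTE = 3.0    # s between requests from one client to the computing server
SIMULATED_TIME = 60.0          # s of system activity simulated

data_rate = CLIENTS / MEAN_INTERVAL_DATA           # requests per second at the data server
compute_rate = CLIENTS / MEAN_INTERVAL_COMPUTE     # requests per second at the computing server
per_client_data_requests = SIMULATED_TIME / MEAN_INTERVAL_DATA

print(data_rate, compute_rate, per_client_data_requests)   # 4.0 2.0 40.0
```

The expected 40 data-server requests per client in one simulated minute is consistent with the "about 40 requests" plotted in Fig. 7.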

Fig. 6. System for simulation example

Fig. 7. Response time of data server (plotted against time)

Fig. 8. Response time of computing server (plotted against time)

Fig. 9. Queue length of data server CPU (plotted against time)

Fig. 10. Utilization of LAN1 (plotted against time)

The simulation model of the computer system is configured as described above. The system activity for 1 minute is simulated. The computation took 4 minutes on a workstation. This is considered to be short enough for performance estimation in the design phase.

Fig. 7 shows the response time for about 40 requests from a client to the data server. Fig. 8 shows the response time for requests from a client to the computing server. Suppose the response time is specified to be less than 0.4 s. Fig. 7 shows that some responses miss the deadline. Figs 9 and 10 show data for analyzing the response time of the data server. Fig. 9 shows the number of processes waiting for processing. The number excludes the process executing on the CPU. Fig. 10 shows the LAN utilization. From these data, it is clear that the CPU is the key to improving the performance.

6. APPLICATION OF MELSPEC TO THE DESIGN OF EMS

There would be many areas in which MELSPEC could be utilized for system design. As the first application area, the tool is being used in the design of open distributed EMSs. An EMS is a large-scale computer system for monitoring and controlling a power system.

The amount of data sent from a power system is constant in normal states. However, the amount increases sharply in fault states of the power system. A fault state does not last very long; typically, it lasts a few seconds. The activity during this period is the major concern of the system designers. The system consists of a number of servers and clients. The typical clients are consoles at which operators make various decisions and operations. The servers include communication servers, data servers and computing servers. The servers may be duplexed for fault tolerance, and connected by multiple LANs.

MELSPEC is suitable for the performance estimation of the system, because it is possible to model various configurations and to get a more precise performance estimation in a transient state than by using traditional approaches.

The following are examples of major items which should be studied by simulation during the design phase.

1) Processing of real-time data

The real-time data sent from a power system requires various processes in the EMS. Every item of data should be processed within a predetermined time. Even if it misses the time limit, the data should be stored in buffers. Such a case may take place in fault states. The size of the buffers should be designed to avoid an overflow.

2) Allocation of a real-time database

Real-time databases, which contain data concerning the real-time status of the power system, play a very important role in the EMS. They are updated often, especially in fault states, and referred to by almost all the functions in the EMS, so they should be updated and referred to quickly and easily. Simulation can help to find a suitable allocation of real-time databases (Horiike and Okazaki, 1995).

3) Allocation of tasks

Advanced analysis and planning software packages, such as those for restoration of the system from fault states, are essential in a modern EMS. These require extensive computation. To take full advantage of computing resources, an appropriate allocation should be studied, using simulation.

7. CONCLUSION

MELSPEC has been developed for efficient performance estimation of open distributed computer control systems. The tool includes detailed and graphical models of each element of the system. Various system configurations can be easily simulated by changing the identification numbers.

The validity of each model was verified by comparison with measured values or analytical methods. The validity of the whole system model is being verified using an actual open distributed system. Other open system models such as FDDI, ATM, and DCE are planned to be developed.

8. REFERENCES

Bach, M.J. (1986). The Design of the UNIX Operating System. Prentice-Hall, Englewood Cliffs, New Jersey.

Comer, D.E. and D.L. Stevens (1991). Internetworking with TCP/IP. Prentice-Hall, Englewood Cliffs, New Jersey.

Horiike, S. and Y. Okazaki (1995). Modeling and Simulation for Performance Estimation of Open Distributed Energy Management Systems. IEEE Power Industrial Computer Applications Conference.

MacDougall, M.H. (1987). Simulating Computer Systems. The MIT Press, Cambridge.

Sasson, A.M. (1992). Open Systems Procurement: A Migration Strategy. IEEE Trans. on Power Systems, Vol. 8, No. 2, pp. 515-526.

SES (1991). SES/Workbench User's Manual.
