Architectural Solutions in PAM - Research in Innovation, Design

Examensarbete, Datalogi, D-nivå, 10p. Institutionen för Datateknik, Mälardalens Högskola

Master of Science thesis, Computer Science, 10 credits. Department of Computer Engineering, Mälardalen University

Master thesis work performed at:

Westinghouse Atom AB, Västerås, Sweden

2001-05-04

750205-0318 Rikard Land

Architectural Solutions in PAM

Photo © Howard Davis


TABLE OF CONTENTS

1 INTRODUCTION
1.1 SUMMARIES
1.1.1 SUMMARY (IN ENGLISH)
1.1.2 SAMMANFATTNING (PÅ SVENSKA)
1.2 THE PAM SYSTEM
1.2.1 OVERALL DESCRIPTION
1.2.2 PREVIOUS PROBLEMS WITH PAM
1.3 DESCRIPTION OF THE WORK
1.3.1 BACKGROUND
1.3.2 THEORY
1.3.3 REQUIREMENT MEETINGS
1.3.4 THE DEVELOPMENT AND ANALYSIS
1.3.5 TECHNOLOGY
1.3.6 THIS DOCUMENT
1.4 ACKNOWLEDGEMENTS

2 AN OVERVIEW OVER SOFTWARE ARCHITECTURE
2.1 THE RESEARCH AREA OF SOFTWARE ARCHITECTURE
2.2 SOFTWARE ARCHITECTURE IN REAL SYSTEMS
2.3 SOCIAL EFFECTS
2.4 THE SOFTWARE DEVELOPMENT PROCESS
2.4.1 THE COST OF ERRORS
2.4.2 THE COST DISTRIBUTION DURING A PROGRAM’S LIFE CYCLE
2.4.3 THE IMPORTANCE OF REQUIREMENTS SPECIFICATION
2.4.4 DEFINITIONS OF SOFTWARE ARCHITECTURE
2.4.5 CHOOSING A DESIGN METHODOLOGY
2.5 SOFTWARE PROPERTIES
2.5.1 DIFFERENT TYPES OF PROPERTIES
2.5.2 THE CONTEXT OF ARCHITECTURAL ANALYSIS
2.5.3 TRADE-OFF
2.5.4 NON-FUNCTIONAL PROPERTIES
2.5.5 FUNCTIONAL PROPERTIES
2.5.6 CONCEPTUAL INTEGRITY
2.5.7 DOCUMENTATION
2.6 VIEWS, ADL’S AND STYLES
2.6.1 ARCHITECTURAL VIEWS
2.6.2 ARCHITECTURAL DESCRIPTION LANGUAGES
2.6.3 ARCHITECTURAL STYLES
2.6.4 HETEROGENEOUS ARCHITECTURAL STYLES
2.7 SUMMARY
2.7.1 SOFTWARE PROPERTIES
2.7.2 LANGUAGES AND VIEWS
2.7.3 ARCHITECTURAL STYLES
2.7.4 ARCHITECTURAL ANALYSIS
2.7.5 LESSON LEARNED

3 THE ARCHITECTURAL PROPOSALS
3.1 THE ORGANIZATION OF THE DOCUMENT
3.2 FUNCTIONAL REQUIREMENTS
3.3 NON-FUNCTIONAL REQUIREMENTS
3.3.1 COST
3.3.2 PERFORMANCE
3.3.3 MAINTAINABILITY
3.3.4 TESTABILITY
3.3.5 REUSABILITY
3.3.6 PORTABILITY
3.3.7 DOCUMENTATION
3.4 THE CANDIDATES
3.4.1 OVERVIEW
3.4.2 GRAPHICAL VIEWS OF THE ARCHITECTURES
3.4.3 SIMILARITIES
3.4.4 DIFFERENCES
3.4.5 PROCESSES OR THREADS
3.5 ANALYSIS
3.5.1 COST ANALYSIS
3.5.2 PERFORMANCE AND SYSTEM LOAD ANALYSIS
3.5.3 MAINTAINABILITY ANALYSIS (SAAM ANALYSIS)
3.5.4 TESTABILITY ANALYSIS
3.5.5 REUSABILITY ANALYSIS
3.5.6 PORTABILITY ANALYSIS
3.5.7 ANALYSIS SUMMARY

4 NEW TECHNOLOGY
4.1 TCL ISSUES
4.1.1 THE USE OF THE TCL LANGUAGE
4.1.2 ORATCL
4.1.3 THREADS
4.2 THE SERVER BROKER
4.2.1 FUNCTIONALITY
4.2.2 BEHAVIORAL DESCRIPTION
4.2.3 CONCLUSION
4.3 FILE HANDLING
4.4 SOCKETS

5 REFERENCES

APPENDIX A – COST ANALYSIS

APPENDIX B – PERFORMANCE AND SYSTEM LOAD ANALYSIS

APPENDIX C – MAINTAINABILITY ANALYSIS

APPENDIX D – PSEUDOCODE AND CODE ANALYSIS

APPENDIX E – REQUIREMENTS SPECIFICATION

APPENDIX F – TCL REFERENCES

APPENDIX G – PROJECT GROUP ENDING DISCUSSION



1 INTRODUCTION

This report is the Master of Science thesis in Computer Science, performed at Westinghouse Atom, Västerås, by Rikard Land, 2000-2001.

1.1 SUMMARIES

1.1.1 Summary (in English)

The Master thesis work aimed at developing a new architecture for parts of the PAM system, a complex software system at Westinghouse Atom that was experiencing problems.

During the work, the requirements on the relevant system parts were stated more precisely than before, in cooperation with future users. After that, two somewhat established development and analysis methods were used to develop and analyze new architectural solutions.

Four variants of the same architecture were developed and analyzed. The existing solution was also analyzed to some extent, and the reasons why it is insufficient for implementing the requirements were found. The proposals, together with the results from the analyses, were presented to the project, and one proposal was chosen.

In addition, new technology needed to implement the architectures was investigated.

1.1.2 Sammanfattning (på svenska)

Syftet med examensarbetet var att utveckla en ny arkitektur för delar av systemet PAM, ett komplext mjukvarusystem vid Westinghouse Atom med problem.

Under examensarbetets gång formulerades mer precisa krav på relevanta delar av systemet än vad som gjorts tidigare, i samarbete med framtida användare. Sedan användes två någorlunda etablerade utvecklings- och analysmetoder för att utveckla och analysera nya arkitektoniska lösningar.

Fyra varianter på samma arkitektur utvecklades och analyserades. Dagens lösning analyserades också, och anledningarna till att den inte räcker till för att implementera kraven hittades. Förslagen, tillsammans med resultaten från analyserna, presenterades för projektet som sedan valde ett av förslagen.

Dessutom undersöktes ny teknik, nödvändig för att implementera arkitekturerna.

1.2 THE PAM SYSTEM

1.2.1 Overall description

PAM is a system for managing nuclear plant data, simulating events in these plants, and analyzing the results of the simulations.

Data is stored in a central database, and client applications on local PCs access this data and present it to the end user. There are a number of calculation programs simulating different aspects of the plant, and PAM gives these programs a homogeneous interface. Today, these calculation programs execute in a Unix environment, but in the future at least Windows NT servers are considered possible environments for these programs.

Since PAM has had problems with the calculation environment, the project decided to review the architecture of these parts. The thesis work aimed at developing an architecture for the parts managing the calculations, having correct functionality as well as acceptable non-functional properties (the terms functional and non-functional properties are discussed in section 2.5).

1.2.2 Previous problems with PAM

There were a number of reasons to review the calculation environment. The fundamental reason was that the requirements on the system were stated too weakly; part of the Master thesis was thus to specify the requirements.

Below is a list of specific issues that have been a problem.

• The system is unstable. Often a message arriving from the server causes the client application to crash, because the client is in a state that does not expect the message.

• The system is difficult to maintain. It is difficult to debug, partly because of the inherent complexity of using several processes; there is also a lack of documentation, not least of the communication protocol.

• The system is tightly coupled to one specific calculation program in several ways.

• Much functionality is lacking; e.g. it is not possible to stop an execution in any way.

• The different components of today’s system are not independent enough. Most notably, a project execution depends on the client existing throughout the whole execution.
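The first issue in the list, a client that crashes on a message it does not expect in its current state, can be illustrated with a small sketch. The states, message names, and transition table below are hypothetical and are not taken from PAM's communication protocol. The sketch only shows that a client which looks up each (state, message) pair explicitly can reject an unexpected message instead of failing.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    DONE = auto()


# Hypothetical protocol: which message is legal in which client state,
# and which state it leads to. This is not PAM's actual protocol.
TRANSITIONS = {
    (State.IDLE, "started"): State.RUNNING,
    (State.RUNNING, "progress"): State.RUNNING,
    (State.RUNNING, "finished"): State.DONE,
}


class Client:
    def __init__(self):
        self.state = State.IDLE
        self.rejected = []  # unexpected messages, kept for diagnostics

    def receive(self, message):
        """Handle one server message; return True if it was expected."""
        next_state = TRANSITIONS.get((self.state, message))
        if next_state is None:
            # Unexpected in the current state: record it and carry on.
            self.rejected.append((self.state, message))
            return False
        self.state = next_state
        return True


client = Client()
for msg in ["finished", "started", "progress", "finished"]:
    client.receive(msg)
print(client.state.name, client.rejected)
```

The premature "finished" message is recorded and ignored, and the client still reaches its final state once the messages arrive in a legal order.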

1.3 DESCRIPTION OF THE WORK

1.3.1 Background

The author has been employed for more than two years at Westinghouse Atom (previously ABB Atom), which gave the advantage of insight into the problem, the environment and the business, gained over a longer period of time than is available to someone who writes a thesis at a previously unknown company.

It was therefore possible to start sketching solutions early; the PAM project had informally discussed ideas on how PAM could be improved even before the Master thesis work started. This could be both an advantage and a disadvantage. Hopefully, the ideas have had time to ripen, but there is a risk of sticking to certain solution ideas when the changed requirements could be more readily implemented with a more radical redesign.

1.3.2 Theory

At the beginning, the work focused on the more theoretical part, in which a number of articles and books were acquired and studied, as suggested by my supervisors. Some of the literature referenced there was also acquired, and searches on the Internet turned up related information and articles. Notes from the literature studies are found in section 2.

1.3.3 Requirement meetings

During the reading, many ideas turned up that made it possible to start sketching an architecture for PAM; at the same time it was found that the requirements were formulated too weakly. Therefore, two meetings were set up where the requirements were discussed. The participants were, apart from the author, the steering group for PAM, including future users and PAM’s project leader, and another developer (who has developed large parts of PAM). The group was somewhat prepared before the meeting¹.

Finding “limiting” requirements on PAM was prioritized (i.e. requirements that may affect the architecture and are “as demanding as possible”; it is e.g. quite easy to change colors and fonts later, but rather difficult to add stability, as is discussed in section 2.5). After the second meeting, it was assumed that there was no need for another meeting; far fewer new requirements turned up than in the first (8, compared to 23).

Due to lack of resources, no more formal or thorough requirements analysis was made. Instead, it is assumed that the most demanding and “limiting” requirements were found, and that erroneous requirements and details can be changed when users evaluate a semi-functional product. Furthermore, the requirements are assumed to have been discussed thoroughly enough to ensure that everybody involved has the same understanding of them and that no more formal notation is needed to make them unambiguous.

It was generally understood that not all required functionality would be available in the first implementation of the system, so the requirements were marked with “priority one” (required in the first release) or “priority two” (may be implemented later). This classification was used to estimate the proposals’ maintainability, as described in section 3.5.3.
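The two-level classification can be pictured as a simple partition of the requirements list. The sketch below is only an illustration: the `Requirement` type, the identifiers, and the example requirements with their priorities are hypothetical and are not copied from the requirements specification in appendix E.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    ident: str
    text: str
    priority: int  # 1 = required in the first release, 2 = may come later


def split_by_priority(requirements):
    """Return (first_release, later) lists, preserving the input order."""
    first_release = [r for r in requirements if r.priority == 1]
    later = [r for r in requirements if r.priority == 2]
    return first_release, later


# Hypothetical entries, loosely inspired by the problems in section 1.2.2.
requirements = [
    Requirement("R1", "An execution can be stopped by the user", 1),
    Requirement("R2", "An execution survives a client shutdown", 1),
    Requirement("R3", "Calculation programs can run on Windows NT servers", 2),
]

first_release, later = split_by_priority(requirements)
print([r.ident for r in first_release])  # → ['R1', 'R2']
print([r.ident for r in later])          # → ['R3']
```

The second list is the interesting one for maintainability: a requirement that is postponed is a change the architecture will probably have to absorb later.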

1.3.4 The development and analysis

The development of the architecture proposals did not follow the classical “waterfall model” very well (this issue is discussed in section 2.4.1), not least because there was already an existing system and because the formulation of requirements took a little longer than expected.

However, the development methodology described in [3] and outlined in section 2.4.5.1 was used to some extent. The methodology suggests that design be performed in iterations, where more and more of the requirements are included in each step. The methodology is of course more detailed, and what makes it innovative and useful is the way functional and non-functional requirements (see section 2.5) are separated.

Even before the first requirements meeting, a solution could be sketched, using the requirement “better than the existing system”. After each meeting the architecture was redesigned to include the new requirements. This makes three iterations. At the end, the architectures needed some polishing to include some minor requirements that were overlooked in the first rounds.

¹ According to an experiment reported in [2], these types of meetings succeed best with a prepared group (compared both to an unprepared group and individuals). The experiment was actually carried out with scenario-generation meetings, not requirement meetings; however, there are many similarities, not least in how humans interact.



In [1] and [9], the Software Architecture Analysis Method (SAAM) is described (see section 2.4.5.2), which was used to evaluate how well the architecture withstands changes of different sorts (new environments, new functionality etc.). Requirements with low priority, and desires expressed by the users that were not specified as requirements, were used as change scenarios for this analysis (see section 3.5.3 and appendix C), which made a total of 19 scenarios.
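The core bookkeeping of such a scenario-based evaluation can be sketched as follows. This is a simplified illustration and not the full method described in [1]. The candidate names, the component names, the scenario wording, and the two measures are assumptions made for the example, and none of the figures come from appendix C.

```python
from collections import Counter

# Hypothetical candidate architectures: for each change scenario, the set of
# components that would have to be modified. All names are made up.
CANDIDATES = {
    "A": {
        "new calculation program": {"broker"},
        "run on Windows NT server": {"broker", "executor"},
        "stop a running execution": {"client", "broker", "executor"},
    },
    "B": {
        "new calculation program": {"executor"},
        "run on Windows NT server": {"executor"},
        "stop a running execution": {"client", "executor"},
    },
}


def total_changes(scenarios):
    """Sum of components touched over all scenarios (lower is better)."""
    return sum(len(components) for components in scenarios.values())


def scenario_interaction(scenarios):
    """Components touched by more than one scenario, with their counts."""
    counts = Counter(c for comps in scenarios.values() for c in comps)
    return {c: n for c, n in counts.items() if n > 1}


for name, scenarios in CANDIDATES.items():
    print(name, total_changes(scenarios), scenario_interaction(scenarios))
```

A low total suggests that changes stay local. A component that many unrelated scenarios touch may be a sign that it has too many responsibilities.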

It was also found that the resulting architectures needed to be analyzed with respect to resource usage and performance. The number of processes and threads on each participating computer and the number of network accesses during some common scenarios were therefore estimated (see section 3.5.2 and appendix B).

The properties testability, reusability, and portability were also analyzed; see sections 3.3.4, 3.3.5, and 3.3.6.

Finally, four architectures were presented to the PAM project, and with this the Master thesis work was considered complete. Appendix G presents the project group’s discussion, pointing out minor issues that should be further investigated.

1.3.5 Technology

As the architectural solutions started to crystallize, some technology that they might require was also investigated.

• The use of the Tcl language, which was called into question.

• OraTcl, a database access extension for Tcl, which could be a good candidate if the architecture implies that data flows between the calculation environment of PAM and the database.

• The use of threads in Tcl, which also requires a Tcl extension not previously used at Westinghouse Atom.

• A component developed at Atom for another application, which may be very similar to one of the components in the PAM system.

• The ability to read files over the network from the client directly using NT file access methods (instead of using a server process), without exposing the files directly to the user.

1.3.6 This document

This document consists of the contents of documents released at Westinghouse Atom. After this introduction, the document is organized as follows.

• Section 2 contains a presentation of current software architecture research and the state of the art.

• Section 3 and appendixes A, B, C, D present the architectural proposals for the calculation environment in PAM, and the analyses of these.

• Appendix E contains the requirements specification for the calculation environment in PAM.

• Section 4 and appendix F present technical aspects of the architectural proposals.

• Appendix G contains a final discussion with the project group, where minor issues that should be investigated further were identified.

In addition to these documents, one memo and four protocols were written during the work.

1.4 ACKNOWLEDGEMENTS

Supervisors at Mälardalen University were Kristian Sandström and Anders Wall, who suggested literature and were available for discussions about certain topics. Hans Boström, Kristina Engström, and colleague developer Niklas Lenasson participated in fruitful discussions. Kjell Bergman, Lars Paulsson, Håkan Svensson, Ernst Thulin, and Michael Timm gave their opinions on the functionality as future users.

A positive side effect of this Master thesis is that some colleagues at Westinghouse Atom have given the field of software architecture more attention than before, at least partly because of informal discussions.

2 AN OVERVIEW OVER SOFTWARE ARCHITECTURE

2.1 THE RESEARCH AREA OF SOFTWARE ARCHITECTURE

Computer science and engineering are immature areas compared to other engineering disciplines. There are some thoroughly explored areas, founded on mathematical theories, but in many other areas, program developers are left without standard solutions. The field of software architecture is thus immature compared to many others, but of growing interest and importance as computer systems grow in size and complexity. Shaw and Garlan observe that in software documentation “there is no clean separation of architectural issues from coding issues. […] It is not unusual to find this, because explicit attention to the architecture as a separate level of software design is relatively recent” ([10], italics added).

Today’s state of software architecture research can be summarized in a number of statements.

• Each piece of software can be described architecturally, independently of whether the designer explicitly designed an architecture or not.

• A system can be built in different ways (using different architectures), each having the same functionality but with different non-functional² attributes. “The trade-offs between performance and security, between maintainability and reliability, and between the cost of the current development effort and the cost of future developments are all manifested in the architecture” [1].

• Moreover, these non-functional properties are to a great extent inherent in the architecture. “Thus, once the architecture’s module structure has been agreed on, it becomes almost impossible for managerial and business reasons to modify it. This is one argument (among many) for carrying out extensive analysis before freezing the software architecture for a large system” [1].

• When a system is being designed, a number of properties can be analyzed to some extent at the architectural level. Only an architectural description of the system is needed.

• Graphical descriptions are suited for documenting software architectures. However, “it is […] common practice to draw box-and-line diagrams that depict the architecture of a system, but no uniform meaning is yet associated with these diagrams” [10]. Therefore, research has been carried out to formalize such graphical descriptions into architectural description languages.

• Software architecture is not only of technical importance. The sheer size of current software projects requires consideration of architectural issues, to make them manageable. “Unless a program is small enough to be produced by a single programmer, one must give thought to how the work will be divided into work assignments, which are called modules, and how those modules will interact with each other.” [1]

² See section 2.5 for a discussion of the terms functional and non-functional.

The architecture of a system therefore has a major role in the success of a system, and it can even determine whether the system is ever delivered. The “design of a software architecture is more than a simple activity within a limited scope. It comprises the technical, methodological and process aspects of software engineering” [5].

2.2 SOFTWARE ARCHITECTURE IN REAL SYSTEMS

The notion of software architecture does not originate from academic discussions without connection to reality. Quite the contrary: it has emerged from a need, and has been subject to research, specifically to solve problems in industry.

In [10], [1], and [3], there are a number of case studies of real applications that have proved successful (both regarding functionality and non-functional requirements³, such as cost). There are studies of such different systems as aircraft navigation computers, the World Wide Web (WWW), the design of CORBA, air traffic control, flight simulation, product line development, mobile robotics, digital oscilloscopes, product lines⁴, fire alarm, measurement systems, and a dialysis system⁵. These studies are included to emphasize the importance of software architecture, and how a consciously designed architecture contributes to the success of a system.

The very goal of [5] and [8] is to catalogue successful patterns found in real systems, created and refined by experts with great experience. [5] points out that the pattern community explicitly “looks for pattern descriptions of proven solutions to problems, rather than on presenting the latest scientific results” (italics added).

³ See section 2.5 for a discussion of the terms functional and non-functional.

⁴ A product line is a set of closely related products that basically are variations on one theme, e.g. a set of car models from one manufacturer. As an example from the software community, several Microsoft products come in packages named “Standard”, “Professional”, “Enterprise” etc., such as Visual Studio, Visio, and Office.

⁵ The first six are found in [1], the next two in [10], and the last four in [3].

Moreover, the architecture development and assessment methodologies described in [1] and [3] (see section 2.4.5) have been developed with, and been applied to, real software projects with success.

2.3 SOCIAL EFFECTS

“[A]rchitecture serves as an important communication, reasoning, analysis, and growth tool for systems” [1].

An understanding of the importance of software architecture affects the relations between people at different positions within a software development project. A developer might be inclined to add one particular piece of functionality but reluctant to include another, which a project manager might think is equally simple. If this situation is due to the architecture, but neither of the people involved can express this, it can be a source of conflicts.

The results of the architecture evaluation method described in [1] “are both technical and social”. The analysis “acts as a catalyzing activity on an organization”, in the sense that “participants end up with a better understanding of the architecture”, and it generates “deeper insights into the trade-offs that are implicit in the architecture”, simply because the issue is brought to attention.

2.4 THE SOFTWARE DEVELOPMENT PROCESS

To understand the importance and context of architectural design, we will take a look at the whole software development process.

2.4.1 The cost of errors

2.4.1.1 The waterfall model

One commonly used software development model is the “waterfall model” (see Figure 2.1), which states that “errors found during one phase may make it necessary to return to the previous phase in order to make corrections there before continuing. A new phase may only begin after review and approval of the former one.” [9]

Figure 2.1: A simple view of the waterfall software development model. (Diagram labels: requirements specification, design, implementation; axis: time.)

A variation is the “stairs diagram” (see Figure 2.2; the phases according to [7]), with time on the x-axis and the “abstraction level” on the y-axis. The left part of the chart describes the order in which different levels of abstraction in the software are designed; the requirements specification forms the basis for the architectural design etc. The right part of the diagram indicates the software testing. If an error is found at a certain level during the test process, the corresponding design level has to be modified, the development process has to go through each step below it again, and each step has to be tested again. In practice, the development can be assumed to “converge”, meaning that each new error found at the same level probably has less impact and requires redesign (and re-implementation) of smaller parts of the program, thus making each iteration faster.

The earlier an error is introduced, the more likely it is to be discovered late, and the greater its impact on the system; thus, early errors are costly to repair. The compiler and the programmer perform the tests interactively during the implementation step, and these errors are many but most often easy to find and cheap to repair.

Figure 2.2: A common view of software development: the “stairs diagram”. (Diagram labels: design, comprising requirements specification, product goals, external design, internal design, and module interface specification; implementation; test, comprising module test, integration test, function test, system test, and acceptance test. Axes: time and increased detail knowledge.)
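The rule that an error found at a certain test level sends the project back to the corresponding design level, and through every step below it, can be sketched as a lookup. Note the assumptions: the text only states that the function test corresponds to the external design. The ordering of the design levels and the remaining pairings below are guesses based on the symmetric shape of the diagram, and the function name is hypothetical.

```python
# Design levels from most abstract to most detailed.
# ASSUMPTION: this ordering is inferred from the labels of Figure 2.2.
DESIGN_LEVELS = [
    "requirements specification",
    "product goals",
    "external design",
    "internal design",
    "module interface specification",
    "implementation",
]

# Which design level each test level checks.
# ASSUMPTION: only "function test" -> "external design" is stated in the
# text; the other pairings are inferred from the symmetry of the diagram.
TEST_CHECKS = {
    "module test": "module interface specification",
    "integration test": "internal design",
    "function test": "external design",
    "system test": "product goals",
    "acceptance test": "requirements specification",
}


def rework_after_failure(test_level):
    """Design steps to redo when an error is found at the given test level."""
    start = DESIGN_LEVELS.index(TEST_CHECKS[test_level])
    return DESIGN_LEVELS[start:]


print(rework_after_failure("module test"))
print(len(rework_after_failure("function test")))  # → 4
```

Under these assumptions, an architectural error found by the function test forces four steps to be repeated, while an error found by the module test forces only two. This is the sense in which early errors are the costly ones.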

2.4.1.2 Architectural design in the waterfall/stairs model

The choice of architecture corresponds approximately to, or is included in, what [7] calls “external design” (see Figure 2.2). Thus, according to the waterfall model, an insufficient architecture is not discovered until the “function test” (in [7] and Figure 2.2).

In the words of [1],

[s]oftware architecture represents a system’s earliest set of design decisions. These early decisions are the most difficult to get correct and the hardest to change, and they have the most far-reaching effects.

An interesting feature of the waterfall model is that it supposes that errors made in one step can be discovered at the next step. This means that design errors, in particular architectural errors, can be discovered before the system is built. This is interesting since the waterfall model, to our knowledge, is much older than the notion of software architecture, and in particular architectural analysis.

2.4.1.3 Criticism of the waterfall/stairs model

However, this view of software development has been criticized for being overly simplistic. In [1], it is argued that there is more to a software system than implementing certain functionality. There is a description of the “old way” of looking at software system development:

“Conceptually, the requirements document gets tossed over the wall into the designer’s cubicle, and the designer must come forth with a satisfactory design”. The discussion continues with the notion that in practice, software development is a little more sophisticated, that there are feedback loops from designer to analyst, but “they still make the implicit assumption that design is a product of the system’s technical requirements, period. […] If you know the requirements for a system, you can build the architecture for it. This is short-sighted and fails to tell the whole story.”

The term “technical requirements” is probably a synonym for the term used in this thesis, “functional requirements” (see section 2.5).

Brooks writes that the “basic fallacy of the waterfall model is that it assumes a project goes through the process once […] the waterfall model assumes the mistakes will all be in the realization, and thus that their repair can be smoothly interspersed with component and system testing” [4]. He continues by noting that the

waterfall model, which was the way most people thought about software projects in 1975, unfortunately got enshrined into […] the Department of Defense specification for all military software. This ensured its survival well past the time when most thoughtful practitioners had recognized its inadequacy and abandoned it.

However, it is not the goal of this document to either support or criticize the waterfall/stairs model. The reason it is described is that it is well-known, and it illustrates the fact that any software development model must include analysis and design before implementation, and that design must be conducted on higher levels (e.g. the architectural level) before lower ones. The conclusion is that architectural design is important for the success of a software system, and that software projects can benefit if this design can be analyzed before it is implemented.

2.4.2 The cost distribution during a program’s life cycle

Another common and important view is the relative amount of resources spent during different stages of a software system’s life cycle; this is described in Figure 2.3 (the relative costs according to [7]).

Figure 2.3: The relative costs during different phases in a piece of software. (Chart values: coding 10%, design 15%, testing 25%, maintenance 50%.)

Unlike what an inexperienced programmer might think, only ten percent or less is spent on coding. More is spent on designing, even more on testing, but the largest part is spent during the rest of the program’s life cycle. Some studies imply that as much as 80% of the resources are spent during the maintenance phase⁶.

⁶ According to [12], [7], [3], and [1].
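As a worked example of what the distribution in Figure 2.3 means in practice, the sketch below splits a total budget over the four phases and, conversely, extrapolates the life-cycle cost from the coding cost alone. The percentages are those of the figure; the function names and the budget of 2000 units are hypothetical.

```python
# Relative life-cycle costs from Figure 2.3 (according to [7]), in percent.
COST_SHARE = {"coding": 10, "design": 15, "testing": 25, "maintenance": 50}


def split_budget(total, shares=COST_SHARE):
    """Distribute a total life-cycle budget over the phases."""
    assert sum(shares.values()) == 100
    return {phase: total * pct / 100 for phase, pct in shares.items()}


def lifecycle_total_from_coding(coding_cost, shares=COST_SHARE):
    """Extrapolate the whole life-cycle cost from the coding cost alone."""
    return coding_cost * 100 / shares["coding"]


budget = split_budget(2000)  # hypothetical budget, e.g. in person-hours
print(budget["maintenance"])             # → 1000.0
print(lifecycle_total_from_coding(200))  # → 2000.0
```

In other words, with this distribution an estimate based on coding effort alone understates the life-cycle cost by a factor of ten.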



There is a real-life case study in [1], where one of the main requirements was that the system should be “easy to understand and change”. The division into components was made on the basis of estimates of which parts were most likely to be changed or extended. The developers were aware of the cost distribution, and the resulting design was an “attempt to lower the expected cost of software by anticipating likely changes”.

Hence we can see that it is important that the architecture makes it convenient to maintain the software. Furthermore, maintainability cannot be added later (without transforming the architecture); rather, the architecture (together with other factors) decides the degree of maintainability. The “trade-offs between performance and security, between maintainability and reliability, and between the cost of the current development effort and the cost of future developments are all manifested in the architecture” ([1], italics added).

2.4.3 The importance of requirements specification

The development of a new product should start with a carefully prepared requirements specification, so that an appropriate architecture can be chosen. It might be disastrous for the project if the requirements change late in the project (or were not specified well enough at the beginning of the project), thus making the chosen architecture insufficient.

If there are no requirements stated, how can it be determined whether the program works or not? What is meant by “work” if it is not specified what the system should do? If the only requirement is “make a program that does something”, with the argument that “anything is better than nothing”, this reasoning brings to mind the episode when Alice asks the Cheshire Cat for the way [6]:

“‘Would you tell me, please, where I ought to go from here?’
‘That depends a good deal on where you want to get to,’ said the Cat.
‘I don’t much care where –’ said Alice.
‘Then it doesn’t matter which way you go,’ said the Cat.
‘– so long as I get somewhere,’ Alice added as an explanation.
‘Oh, you’re sure to do that,’ said the Cat, ‘if you only walk long enough.’”

In this context, it is important to understand that there might be more requirements than mere functionality. This issue is discussed in section 2.5.

2.4.4 Definitions of software architecture

When it comes to defining software architecture, “[t]here are almost as many definitions for software architecture in the literature as there are software architects and designers” [12].

In [1], several good descriptions of software architecture are provided. The formal definition of [1], which [3] explicitly quotes and uses as a definition, states that

[t]he software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them.

A more informal description is also given: “Software architecture concerns the structures of large software systems. The architectural view of a system is an abstract view that distills away details of implementation, algorithm, and data representation and concentrates on the behavior and interaction of ‘black-box’ components”.

This document will not elaborate further on the definitions, but assumes that we have enough informal understanding of the term software architecture to continue.

2.4.5 Choosing a design methodologyThe first step when transforming requirements into a software system7 is to create ahigh-level design, an architectural design, and as was pointed out in section 2.4.1,choosing an insufficient architecture is expensive. Since we therefore we want a gooddesign and not one with random quality, it is obvious that a good design methodologyshould be chosen at this stage.

Much “of the power of a design methodology arises from how well it focuses attentionon significant decisions at appropriate times. Methodologies generally do this bydecomposing the problem in such a way that development of the software structureproceeds hand in hand with the analysis for significant decisions” [10]. Therefore, gooduse of a good design methodology more or less automatically leads to a good design.

However, since software engineering is immature, there are few mechanical rules tofollow which creates a good design given a set of requirements. This topic is discussedin [1]:

Some of the activities […] are intuitive activities, such as the transformation that turns astatement of requirements, enterprise goals, and available development environment into anarchitecture. By intuitive, we mean that there are few analytic tools to aid with this feat ofalchemy that we call design, which is primarily done by intuition, pattern matching,experience, rules of thumb, and ad hoc reasoning. 8

The discussion ends with the statement that “a software architect must have considerableorganizational talents and negotiating skills in addition to comprehensive technical anddomain knowledge.”

In the referenced literature, two methodologies are described. The first is a framework for developing and refining an architecture, described in [3], while the second is a methodology for analyzing certain architectural properties, described in [1]. It is worth noting that both discuss “stakeholders”, and emphasize the importance of letting everybody involved influence the choices made. Different people with different roles in a software project put different requirements on the project and the product, each equally important; e.g. functional requirements⁹, economic requirements, time-to-market, and technical non-functional requirements.

7 The creation of a software system can be viewed as a transformation process. Today, it is not particularly automated or deterministic (that is why there is still need for human software developers). However, in some areas (such as artificial intelligence and logic programming), the programming is more or less carried out by formulating the problem, after which the system solves it. One goal for current software architecture research is to make the architectural design more deterministic.
8 See note 7.
9 For definitions of the terms functional and non-functional requirements, see section 2.5.



2.4.5.1 The iterative methodology of Bosch

In [3], Bosch suggests that the architecture is refined in two major iterative steps of work.

A subset of the functional requirements is selected, and an architecture is created. The non-functional properties of this architecture are evaluated and the architecture is transformed until they are acceptable. Then, a new set of functional requirements is added, and the architecture is modified to include these properties.

Hopefully, in each iteration the architecture will be less modified than in the previous one. But if the developer is not careful, it is possible that functional properties are selected such that each iteration requires great modifications of the architecture. Therefore, [1] gives the advice, “first things first”: “Start with the architectural structure that provides the most leverage on the qualities (including functionality) that you expect to be the most troublesome.”

2.4.5.2 SAAM

The Software Architecture Analysis Method, SAAM, is a general method for evaluating non-functional properties (see section 2.5.1). It can be used for estimating modifiability, security, performance etc., but is particularly suited for maintainability analyses. SAAM is further described in section 2.5.4.5.

2.5 SOFTWARE PROPERTIES

2.5.1 Different types of properties

Software has two main types of properties from the developers’ point of view: functional and non-functional. The functional properties are the ones a program system shows while it runs, such as available functions, performance, security, and reliability, while the non-functional properties describe different aspects of the program code, e.g. how easy it is to maintain, extend, test, or reuse parts of.

Furthermore, [1] also divides properties into direct (that can be assessed directly) and indirect (that depend on other properties).

Different implementations of a requirements specification may have the same functional properties, while having very different non-functional properties (they might e.g. show the same performance, be equally reliable etc., but different cost and maintainability affect whether the system will be successful). It is therefore important to take the non-functional requirements into consideration when designing the system (on all levels, but not least architecturally).

2.5.1.1 Non-functional requirements

Once a software project is aware of the existence of non-functional properties, testable requirements on these should be stated as far as possible.

For a requirement to make sense, it must be measurable; it should be straightforward to construct a test case that will show whether the requirement is fulfilled or not (in fact, a good test case, as any scientific experiment, should be constructed to show that the software does not work). It is meaningless to state that the system should “be very secure”, “be very reliable”, or “have good performance”, because it is not possible to test such subjective “requirements”.



Thus, instead of such desires, exact requirements on how the system is supposed to behave in identified situations should be stated. Requirements can be stated as how the system should handle the situations when e.g. one computer is lost, the network fails, or a message is lost. Thus, the desired properties are transformed into measurable requirements (that could now possibly be called functional, since we have requirements on how the system should function).
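As an illustration, a desire such as “be very reliable” could be replaced by the measurable requirement “a message is delivered even if it is lost in transit up to two times, using at most three send attempts”. The following minimal Python sketch shows how such a requirement becomes a test case. The names `LossyChannel`, `send_with_retry` and `requirement_fulfilled`, as well as the limit of three attempts, are assumptions made for this example and are not taken from the referenced literature.

```python
class LossyChannel:
    """Hypothetical channel that drops the first `losses` messages sent."""

    def __init__(self, losses):
        self.remaining_losses = losses
        self.delivered = []

    def send(self, message):
        """Return True if the message was delivered, False if it was lost."""
        if self.remaining_losses > 0:
            self.remaining_losses -= 1
            return False
        self.delivered.append(message)
        return True


def send_with_retry(channel, message, max_attempts=3):
    """Try to send until delivery succeeds or the attempts are used up."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt  # number of attempts that were needed
    return None  # the message was never delivered


def requirement_fulfilled(losses, max_attempts=3):
    """Test case: is a message delivered although `losses` sends are lost?"""
    channel = LossyChannel(losses)
    attempts = send_with_retry(channel, "reading", max_attempts)
    return attempts is not None and channel.delivered == ["reading"]
```

A call such as `requirement_fulfilled(2)` gives a clear pass or fail answer, which a statement like “be very reliable” cannot do.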

Sometimes, the term non-functional is confused with the term quality. The terms are close relatives, but there are certain aspects of functionality that can be called quality (such as reliability). [12], [10], [1], and [3] all seem to have different opinions of how the terms functional, non-functional, and quality properties are related. In [1], the distinction is made between “system quality attributes discernable at runtime” and “system quality attributes not discernable at runtime”. In this document, the terms functional and non-functional are used, with meanings as will be described in sections 2.5.4 and 2.5.5.

2.5.1.2 Using patterns

In the pattern catalog [5], it is stressed that recognized patterns with known properties should be used to achieve a good design. An “objective of patterns is to build software systems with predictable non-functional properties. Patterns therefore also build on the principles for developing software for and with reuse, design for change and so on.” (Italics added.)

2.5.1.3 Analyzing properties

One main standpoint of current research is that certain properties of software can be analyzed at an architectural level. This section describes different properties of software as well as how each property can be analyzed, given a description of the architecture.

2.5.2 The context of architectural analysis

“One of the messages of this book is that architectures can in fact be evaluated – one of the great benefits of paying attention to them – but only in the context of specific goals.” “There is no such thing as an inherently good or bad architecture. Architectures are either more or less fit for some stated purpose.” ([1], italics added.)

Many non-functional properties cannot be quantified, but should be investigated thoroughly enough to ensure that alternative design solutions can be compared. “Most of these [quality] software attributes are qualitative rather than quantitative, thus being most practicable for comparison between different architectures” ([12], italics added).

When an architecture exists that fulfills all functional requirements, the non-functional properties should be assessed (remember that if the desired non-functional properties can be quantified, they should be stated as requirements)¹⁰. The methodology suggested in [3] describes an iterative process where the architectural design is further refined until an acceptable design is achieved (see section 2.4.5.1).

It is important to understand that architectural analyses can be made invalid at later stages. “Recall that an architecture cannot guarantee the functionality or quality required of a system. Poor downstream design, implementation, testing, or management decisions can always undermine an acceptable architecture.” [1]

10 A proper distinction in this context would therefore be “properties that can be quantified” and “properties that cannot be quantified”. This leads back to the discussion whether certain properties are functional or non-functional, and this issue is not elaborated further.

2.5.3 Trade-off

Many properties, both functional and non-functional, are more or less in conflict with each other (improving one deteriorates another – the performance may e.g. be decreased if security or portability is increased). Therefore a trade-off between them is required when choosing an architecture for a system.

It is obvious that one cannot maximize all quality attributes. This is the case in any engineering discipline. […] The strongest bridge is not the lightest, quickest to erect, or cheapest. The fastest, best-handling car doesn’t carry large amounts of cargo and is not fuel efficient. The best-tasting dessert is never the lowest in calories. [1]

To be able to do a trade-off, there must be several architectures (maybe variants on one and the same) with known estimated properties that can be compared.

2.5.4 Non-functional properties

The non-functional properties are discussed before the functional properties, and described in more detail, to emphasize their importance to a software project.

As we have seen, it is theoretically possible to build a system fulfilling its functional requirements while having disastrous non-functional properties, which probably will make the program short-lived. “In fact, if the achievement of functionality were the only requirement, the system could exist as a single monolithic component with no internal structure at all” [1]. The system might e.g. be too costly or virtually impossible to maintain.

Therefore the discernible non-functional properties are described below, together with methods of how they can be analyzed.

2.5.4.1 Cost

The architecture should be “cheap enough”. Cost, i.e. the expense of both time and money, is in this context a non-functional property, which may be in conflict with other requirements (functional or non-functional). In general, a system becomes more expensive the more functionality is included, and it is probably more expensive to develop a maintainable program than one that is difficult to maintain. On the other hand, the maintenance part of the program’s life cycle is probably less expensive if effort was spent on making it maintainable, and as we saw in section 2.4.1, this is the large cost for a program system. This effect may be significant even before the first version of the program is released, if the design or implementation phases are very iterative.

It is important to see that the choice of architecture affects the cost of building the system significantly; one architecture may include twice the number of components, and include more complicated interactions, than another, while both fulfill the requirements specification.

Cost is an indirect property, which depends highly on other quality properties, such as testability and maintainability. The question of how the cost can be minimized is complex, not least because the answer can depend on the unit used to measure cost (e.g. time, time-to-market, future reuse).



There are more or less detailed models that estimate the cost depending on a number of factors, such as the developers’ experience, familiarity with the language and tools, and of course the complexity and size of the problem¹¹. The cost estimations produced by such models depend, of course, on the parameters in the model being estimated reasonably well.

The most important factor in such cost analyses is the estimated amount of work required to create the system. Given an architectural description, it is possible to do a more accurate analysis than before the architecture was designed. The most obvious way of estimating the cost from an architectural design is to count the number of code modules. Provided that these have about the same size, their number is a measure of code size, which can be used for an estimation of the cost to write it. The number of already available components (commercially available, already existing in-house, existing modules that could easily be modified to meet the needs) should be considered.

Of course, some measure of the complexity of the code modules must also be included in the analysis. It could be measured in the number of connections between the modules.

[1] discusses what is called “buildability”, which “refers to the ease of constructing a desired system”, measured “in terms of cost and time”. An important “aspect of buildability is the state of knowledge about the problem to be solved.” The discussion ends with the statement that “a design that casts a solution in terms of well-understood concepts is more buildable than a design that introduces new concepts”.
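The module-counting estimate discussed in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method prescribed by the references: the function names and the average module size are hypothetical inputs, and the organic-mode Basic COCOMO equation (effort in person-months = 2.4 · KLOC^1.05) is used only as one example of a cost model.

```python
def modules_to_write(total_modules, available_modules):
    """Modules that must be written: those not already available
    (bought, existing in-house, or easily adapted)."""
    return max(total_modules - available_modules, 0)


def estimated_kloc(total_modules, available_modules, avg_kloc_per_module):
    """Code size estimate, assuming all modules have about the same size."""
    return modules_to_write(total_modules, available_modules) * avg_kloc_per_module


def basic_cocomo_effort(kloc, a=2.4, b=1.05):
    """Basic COCOMO, organic mode: estimated effort in person-months."""
    return a * kloc ** b


def connections_per_module(connections, total_modules):
    """Crude complexity indicator: average number of connections per module."""
    return connections / total_modules
```

Two candidate architectures could then be compared by computing `basic_cocomo_effort(estimated_kloc(...))` and `connections_per_module(...)` for each. The result is only as good as the assumed module size and the assumption that the modules are of similar size.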

2.5.4.2 Testability

Testability is an indirect property, dependent on:

• Controllability – Maximal controllability is achieved if the path of execution only depends on the tester’s input data. If e.g. states in other modules or global variables affect the execution, the controllability is decreased.

• Observability – The tester must be able to view enough output data to determine whether the program passed the test. For high observability, the program must be equipped with interfaces large enough to allow the tester to view the output data needed. To be able to effectively locate the error sources, the programmer needs good debugging tools. The observability is thus not only a property dependent on the software itself, but also on the choice of language and development tools.

• Reproducibility – Reproducibility must be ensured to allow a programmer to perform the same test with exactly the same result, to be able to locate the error source. Seemingly random errors are much worse than reproducible ones, since they are much harder to locate, and thus to correct. With real-time systems and systems where time influences the system’s behavior, there must be possibilities to restart the system in a known state; this may be achieved e.g. through simulation of the environment.

11 [7] lists a number of informal methods, such as expert estimation, estimation by analogy, and the “Parkinson method” (the price is the amount of money the customer has, and the functionality is the amount we have time to do before the money is out). The more formal COCOMO method (COnstructive COst MOdel) is also presented.



On the architectural design level, it is possible to assess the degree of testability by e.g. tracing how input data propagates through the system, how the behavior of one module depends on the state of others, and if there is an explicit observing component in the system.
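At code level, controllability and reproducibility can be illustrated with the following minimal Python sketch. The function `simulate_sensor` is a hypothetical stand-in for a simulated environment; the point is only that every source of variation (here a random seed) is supplied by the tester rather than read from hidden global state.

```python
import random


def simulate_sensor(seed, samples=5):
    """Hypothetical simulated environment: pseudo-random sensor readings.

    The generator is created from the tester's seed instead of using the
    shared module-level generator, so no hidden global state affects the
    result (controllability) and a run can be repeated exactly
    (reproducibility).
    """
    rng = random.Random(seed)
    return [round(rng.uniform(0.0, 100.0), 3) for _ in range(samples)]


def average(readings):
    """Unit under test: its result depends only on its input data."""
    return sum(readings) / len(readings)


# Two runs with the same seed start from the same known state.
first_run = average(simulate_sensor(seed=42))
second_run = average(simulate_sensor(seed=42))
```

Because both runs start from the same known state, a failing test can be repeated as often as needed while the error source is located.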

2.5.4.3 Reusability

If program code can be reused, the costs can be radically decreased. Reusable code parts could be developed in-house, or be bought from a third-party company. To decrease costs in the future, it is desirable to give the components a generic design, thus allowing them to be reused. This is a trade-off decision at management level. Maybe the program must be delivered in short time, thus giving no time to think of reusability in the future.

To analyze the level of reusability of the code to be developed, each component’s degree of isolation from other components must be analyzed. If one component in the architecture has a clear specification of some general functionality, it is possible that this component is commercially available. If, on the other hand, it has to be written, designing it for future reuse could be considered.

Reusability is also dependent on the quality and availability of the documentation (see section 2.5.7).

2.5.4.4 Portability

If the program is intended to be able to run on different platforms (hardware and/or software, such as operating systems and compilers/interpreters), portability is an important property.

Portability is also dependent on the quality and availability of the documentation (see section 2.5.7).

Later decisions, insufficient design, or bad implementation can make it practically impossible to port software. The choice of programming language is one example; there are e.g. no compilers available for all languages and all platforms (a program written in Visual Basic runs only on Windows systems; the whole environment, not only the compiler, is designed for Windows only). Despite all standardization, it is common that different compiler and interpreter vendors extend languages, or even violate the standards, thus creating different dialects (SQL is a typical example – different database vendors use different flavors and extensions of the standard). To achieve high portability, only the standardized parts of the environment, such as language or platform, should therefore be used.

The portability can be measured by counting the number of direct connections to the platform. If there are many connections, many parts of the program must be modified when porting it, which decreases the portability. To improve the portability, all platform dependencies can be collected in a proxy that the other components call and whose interface does not change (the design pattern “bridge” in [8]).
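Collecting the platform dependencies behind one stable interface might look like the following Python sketch. All class and function names, as well as the file paths, are hypothetical, and the example only loosely follows the pattern referred to above.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Stable interface: the only place where components meet the platform."""

    @abstractmethod
    def config_path(self, app_name):
        """Return the path of the application's configuration file."""


class WindowsPlatform(Platform):
    def config_path(self, app_name):
        return "C:\\ProgramData\\" + app_name + "\\config.ini"


class UnixPlatform(Platform):
    def config_path(self, app_name):
        return "/etc/" + app_name + "/config.ini"


def describe_configuration(platform, app_name):
    """Platform-independent component: it calls only the Platform interface,
    so porting means adding one Platform subclass, not editing this code."""
    return "Configuration is read from " + platform.config_path(app_name)
```

In this sketch the number of direct connections to the platform is one per `Platform` subclass, regardless of how many components call `describe_configuration` or similar functions.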

2.5.4.5 Maintainability

We saw in section 2.4.2 that when a system is released, as much as four times more resources will be spent on maintaining it than has been spent so far. It is obvious that a certain degree of maintainability must be ensured. However, it is not as obvious how to balance the efforts required now with the estimated efforts in the future, nor is it obvious how to measure maintainability.

The SAAM methodology focuses on analyzing the maintainability of a product, given an architectural description. It was developed and has been used by the authors of [9] and [1], which justifies “[t]he coarse analysis provided by SAAM” by its “low cost and the ability to apply it very early in the software-development life cycle.” [1]

As the basis for the analysis, “a description or specification of the architecture is required”. To the question what views, at what level of detail, are needed for the evaluation, the answer is “whatever is needed to answer the questions posed by the evaluation technique”. [1]

The analysis is performed by the use of “change scenarios”. A change scenario is the description of a possible change to the system (e.g. addition of functionality). Provided that the system is built with a certain architecture, each scenario is analyzed to see which components are affected.

In short, an architecture in which one scenario affects a large number of components is considered less apt to allow changes than one in which only a few components are affected. However, the level of detail of the architecture, and the architectural description itself, affect this analysis; if e.g. the architectural description only consists of one monolithic component, every scenario will affect at most one component. The total number of scenarios affecting each component is therefore also taken into account by SAAM; if the components are affected by about the same number of scenarios, it is an indication of a good division of components.
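The two counts described above can be sketched as follows in Python. This is a simplified illustration and not the full SAAM procedure from [1]; the component names and change scenarios are invented for the example.

```python
from collections import Counter


def components_per_scenario(scenarios):
    """For each change scenario, how many components it affects."""
    return {name: len(affected) for name, affected in scenarios.items()}


def scenarios_per_component(scenarios):
    """For each component, how many change scenarios affect it."""
    counts = Counter()
    for affected in scenarios.values():
        counts.update(affected)
    return dict(counts)


# Hypothetical evaluation of two architectures with the same scenarios.
layered = {
    "add a new report": {"ui", "reports"},
    "change database vendor": {"storage"},
    "add a calculation code": {"calculation"},
}
monolith = {
    "add a new report": {"system"},
    "change database vendor": {"system"},
    "add a calculation code": {"system"},
}
```

For the monolithic description every scenario affects only one component, which looks favorable in the first count, but the second count shows that the single component is hit by all three scenarios. In the layered description no component is affected by more than one scenario.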

It should be observed that the analysis gives an estimate of how complex a change would be, not the size of the code affected (measured e.g. in number of lines of source code). In theory, one proposal may affect a large number of components, but only a small number of lines of code in each, while another may affect only one component, but a huge amount of code. It can then be questioned which architecture is more amenable to change.

As with the other non-functional properties, however, SAAM does not tell whether the architecture is maintainable or not; rather, it can be used to compare two architectures, using a certain set of anticipated changes. “In no case are absolute numbers on some mythical scale of “architectural excellence” produced. SAAM is predicated on the principle that there is no such scale; an architecture’s suitability depends entirely on the context in which it is being developed” ([1]). To be able to be confident in the results, it is therefore important that the scenarios chosen are believed to reflect the actual future changes.

“[Q]uality attributes do not exist in isolation but rather have meaning only within a context. A system is modifiable (or not) with respect to certain classes of changes, secure (or not) with respect to specific threats, usable (or not) with respect to specific user classes, efficient (or not) with respect to its utilization of specific resources, and so forth.”



2.5.5 Functional properties

This section contains only short descriptions of functional properties, as described in [1].

2.5.5.1 Performance

“Performance refers to the responsiveness of the system – the time required to respond to stimuli (events) or the number of events processed in some interval of time.”

2.5.5.2 Security

“Security is a measure of the system’s ability to resist unauthorized attempts at usage and denial of service while still providing its services to legitimate users.”

2.5.5.3 Availability

“Availability measures the proportion of time the system is up and running.”
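Read as a formula, this definition amounts to availability = uptime / (uptime + downtime). A minimal Python sketch, with the function name and the time unit chosen only for illustration:

```python
def availability(uptime_hours, downtime_hours):
    """Proportion of time the system is up and running (0.0 to 1.0)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the observation period must be longer than zero")
    return uptime_hours / total
```

Stated this way, an availability requirement becomes measurable in the sense of section 2.5.1.1: a required proportion can be compared with the proportion observed over a given period.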

2.5.5.4 Functionality

“Functionality is the ability of the system to do the work for which it was intended.”

Functionality is the property most easily assessed at the architectural level, for two reasons.

• If the functional requirements are well defined, it should be easy to verify that the architecture supports each requirement. Given an adequate description of the architecture and a requirements specification, any software developer should be able to see how each piece of functionality is supported by the architecture.

• An architecture that does not achieve its functionality is of very limited interest¹².

2.5.5.5 Usability

[1] discusses the functional property usability, which includes several issues.

• “Learnability” – how easy is the system to learn?

• “Memorability” – how easy is it to remember how to use the system between uses of the system?

• “Satisfaction” – does the system make the user’s job easy?

2.5.6 Conceptual integrity

The term conceptual integrity is discussed in several of the referenced books, and is apparently an important concept.

Conceptual integrity is the underlying theme or vision that unifies the design of the system, at all levels. The architecture should do similar things in similar ways. [1]

12 One way to solve this problematic situation is to modify the functional requirements to make the architecture fulfill all requirements. However, there must still be a match between all requirements and the architecture’s abilities before the next phases in the system’s development can take place, which is the point of this discussion.



[I]t is of crucial importance to keep things as simple as possible and to search for conceptual integrity, a notion hard to quantify but understood by every software engineer [3].

Brooks described conceptual integrity as “the most important consideration in system design” when he introduced the term in his classic book [4]. He says that “the conceptual integrity of the product, as perceived by the user, is the most important factor in ease of use.”

It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

In the context of architecture, [3] quotes [4] and states that

[o]ne of the main goals for the architecture is to achieve conceptual integrity […] throughout the system by using the same fundamental concepts as a basis everywhere.

2.5.6.1 One architect

[1] and [4] argue that there should be at least one architect responsible for the overall architecture (and if there are more, they should work very closely together), to ensure conceptual integrity. “If no one is identified with that role, it is a sure sign of trouble.” [1]

2.5.6.2 Violation of the conceptual integrity

There are occasions, however, when violating a system’s conceptual integrity is justified. Different parts must be allowed to build on different concepts, if developed independently (which in fact often is the case, since standard program components and existing operating systems are often used; many systems must also interact with other systems, such as an Intranet or the Internet).

Optimizations must also be allowed. In one component, calculations must perhaps be performed using memory-manipulating methods to meet performance demands, although a main concept when developing the program is that memory should not be handled in such detail (e.g. to decrease the probability of the occurrence of certain errors). Such a violation should of course be done consciously, and be well documented. Also, it is in such a case of course very important to hide this behind an interface.

However, it is in such cases still enormously important that the interface conforms, both syntactically and semantically, to an “underlying theme or vision that unifies the design of the system” [1].

2.5.7 Documentation

An important remark is that many non-functional requirements are dependent on the available documentation. If the program is documented badly, it is difficult to e.g. maintain the system. Maybe documentation should just be viewed as a technique through which the other requirements are met, or perhaps be listed as a non-functional property by itself.

The quality of design documents and code comments affects the further maintenance, development, and reuse of the system, and can thus reduce the cost in the future.



The quality of the documentation available for the users, such as manuals, also affects the support cost, as well as future incomes. Naturally, if the system is well documented, the users will be more satisfied and self-reliant, their work will be of higher quality, and they will be more likely to buy future releases of the software.

It seems a little surprising that none of the references discuss the documentation issue¹³, although this is not a specific architectural topic.

2.6 VIEWS, ADL’S AND STYLES

In this section, three different dimensions of software architecture are explored.

First, the possibility of viewing the same architecture from different perspectives, via architectural views, is discussed.

Second, different ways to document architectural views, using architectural description languages, are described. It is shown how graphical symbols can be used to describe a system’s architecture.

Third, a number of well-known architectural styles are listed. Each architectural style describes common patterns in existing systems by identifying components and specifying how they interact.

2.6.1 Architectural views

“As soon as we attempt to diagram software structure, we find it to constitute not one, but several, general directed graphs, superimposed one upon another” [4]. In other words, you may discover different properties depending on from which “direction” you view an architecture (see Figure 2.4). Such a view “represents a partial aspect of a software architecture that shows specific properties of a software system” [5].

[Figure: “Architecture”, seen from “View 1” and “View 2”]

Figure 2.4: Architectures may be viewed from different positions, and thus bring different properties into light.

In [1], the term “structure”, which is what is called “view” in this document, is described similarly:

Which notion of architecture is the right one, the one whose components are modules or the one whose components are runtime entities such as processes? Obviously they both are. It is an axiom of this book that assuming that the two structures are the same is a fundamental design mistake, since they are optimized to meet completely different criteria.

There are a number of well-known views, each revealing certain aspects of the architecture being analyzed (at the expense of other characteristics). Consider e.g. an object-oriented system. Its object structure at run time, i.e. a number of interacting objects, differs in many ways from its class structure. The first view may contain a very large number of objects and communication channels between them, while the second might contain only a few classes and their inheritance and dependence structure. The former may be used to estimate the system’s performance and find bottlenecks, and the latter to e.g. estimate the maintainability from the number of interconnections between components.

13 However, there is a brief discussion in [1] in a section about reverse engineering (extracting an architectural description from an existing system).

The architecture should be described in several relevant architectural views. However, the effort put into describing an architecture in a certain view can only be justified by its use, either for analysis or for communicating the architecture within a software project.

For the purpose of this document, it is enough to state that it is possible to view a piece of software from different views, and that different views reveal different properties (and are thus suited for different analyses). Three different lists of views are presented below, without further comments¹⁴. Unfortunately, none of the sources presented examples of what the views look like graphically.

2.6.1.1 Four architectures

In the first list of [5], views are called “architectures”.

• “Conceptual architecture: components, connectors”.

• “Module architecture: subsystems, modules, exports, imports”.

• “Code architecture: files, directories, libraries, includes”.

• “Execution architecture: tasks, threads, processes”.

2.6.1.2 Four views

In the second list of [5], there are four views.

• “Logical view: the design’s object model, or a corresponding model such as an entity relationship diagram.”

• “Process view: concurrency and synchronization aspects.”

• “Physical view: the mapping of the software onto the hardware and its distributed aspects.”

• “Development view: the software’s static organization in its development environment.”

2.6.1.3 Nine structures

In [1], nine views are listed, called “structures”.

• “Module structure. The units are work assignments”.

14 In [5], two different lists of views are provided as the authors found them in other sources, with the comment that there is “obvious overlap between the two approaches”.



• “Conceptual, or logical, structure. The units are abstractions of the system’s functional requirements”.

• “Process, or coordination, structure. This view […] deals with the dynamic aspects of a running system. The units are processes or threads”.

• “Physical structure. This view shows the mapping of software onto hardware”.

• “Uses structure. The units are procedures or modules; they are linked by the assumes-the-correct-presence-of relation”.

• “Calls structure. The units are usually (sub)procedures; they are related by the calls or invokes relation”.

• “Data flow. Units are programs or modules; the relation is may-send-data-to.”

• “Control flow. Units are programs, modules, or system states; the relation is becomes-active-after.”

• “Class structure. Units are objects; the relation is inherits-from or is-an-instance-of.”

2.6.2 Architectural description languages

To be able to communicate an architecture, e.g. among developers, the architecture must be written down in some way or another. It could be done via natural language, although descriptions in natural languages tend to be ambiguous. A formal textual language could be specified (such languages exist), but formal languages are often hard to learn and master.

2.6.2.1 Current folklore

In software documentation, “the first section often includes a diagram and a few paragraphs of text labeled as the software architecture. The text refers informally to common software notions such as pipelines, client-server relations, interpreters, message-passing systems, and event handlers. The diagram usually consist of boxes and lines, but the semantics of the graphic elements varies substantially from one figure to another” ([10]). More or less formal graphical notations are thus widely used, and even “an inexperienced person can get a feeling for how a system is built by interpreting a graphical representation” ([12]), which of course is a prerequisite for a successful system development.

However, while an ad-hoc graphical description with boxes and lines is better than nothing, different members in a development team may interpret the symbols differently. It is “very important that software engineers can communicate with each other. If not, important information in- and from the design phase might accidentally get lost, leading to misunderstandings and misinterpretations. The system designers must be able to communicate their thoughts to the customer, other project members and management, and vice versa in an unambiguous way. An unambiguous description is also a condition for enabling architectural analysis. […] a description language that is commonly understood in the project is needed.” [12]



2.6.2.2 Formalized architecture description languages

The most common choice of notation in practice, as we saw, is some sort of graphical notation, which is a very intuitive way of presenting architectures. If a set of symbols with well-specified semantics is used, a picture of the architecture can both be useful as a description, and furthermore be formally analyzed.

Such architectural description languages (ADLs) are often specific for an architectural style (see section 2.6.3), and thus the same symbols may have different meanings in different languages.

There are a number of properties that are desired for an ADL. [12] lists six classes of desired properties.

• Composition – the language should provide a number of elementary building blocks that can be combined in certain, well-specified, ways.

• Abstraction – the language should emphasize some properties (at the expense of others).

• Reusability – patterns found in an architectural description should be possible to reuse.

• Configuration – the description of how the components are interconnected should be separated from the structure within components.

• Heterogeneity – combinations of different architectural descriptions should be supported.

• Analysis – if the semantics of the ADL is unambiguous, different types of analysis can be performed on an architecture. Even tools for automating this analysis can be developed (prototypes for this have been constructed, and are described in [10]).

2.6.3 Architectural styles

There are a number of well-known architectural styles, each applicable in some domains. A few are explained briefly below (see footnote 15), with focus on styles interesting for the calculation environment in PAM. In [1], architectural styles are described as follows:

When we have models of quality attributes that we believe in, we can annotate architectural styles with their prototypical behavior with respect to quality attributes. We can then talk about performance styles (such as priority-based preemptive scheduling) or modifiability styles (such as layering) or reliability styles (such as analytic redundancy) and then discuss the ways in which these styles can be composed.

Please note that some elements in the figures describing the architectures below look similar but have different semantics; arrows may e.g. denote data flow, function calls or some other type of connection.

15 For a more complete list, see [12], [10], [1] or [5].



2.6.3.1 Pipe-and-filter

In a pipe-and-filter system (see Figure 2.5), the focus is set on the data flow in the system. There are a number of components, where output from one component forms the input to the next. A typical example is the use of Unix pipes.

In its purest form, the different components are completely separated (no shared data or state), and may start processing as soon as input starts arriving. A close relative of this architecture is the batch sequencing architecture, where each step finishes before the next starts.

[Figure: two diagrams, each running from “System input” to “System output”.]

Figure 2.5: Two pipe-and-filter systems, one very simple and the second a little more complicated. Each box is a processing unit, and an arrow represents data flow.

This style fits a program analyzing and formatting text or data, but is not so useful for an interactive system. Because data is copied (at least in its pure form) from outputs to inputs, performance is generally decreased.
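The pipe-and-filter idea can be illustrated with a minimal Python sketch. It is not taken from PAM; the filter names (`read_lines`, `drop_blank`, `to_upper`) are hypothetical and chosen only for illustration. Each filter is a generator that shares no state with the others and starts producing output as soon as input arrives, which corresponds to the pure form of the style described above.

```python
from typing import Iterable, Iterator, List


def read_lines(text: str) -> Iterator[str]:
    """Source: emit the system input one line at a time."""
    yield from text.splitlines()


def drop_blank(lines: Iterable[str]) -> Iterator[str]:
    """Filter: pass on only lines containing non-whitespace characters."""
    for line in lines:
        if line.strip():
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Filter: convert every line to upper case."""
    for line in lines:
        yield line.upper()


def run_pipeline(text: str) -> List[str]:
    """Connect the filters; the output of one is the input of the next."""
    return list(to_upper(drop_blank(read_lines(text))))


if __name__ == "__main__":
    for line in run_pipeline("first line\n\nsecond line\n"):
        print(line)
```

Replacing the generators with functions that build a complete list before returning would turn the sketch into the batch sequencing variant, where each step finishes before the next starts.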

2.6.3.2 Object-oriented architecture

With an object-oriented architecture, the focus is on the different items in the system, modeled as objects. See Figure 2.6.

Figure 2.6: A view of a simple object-oriented system. A circle represents an object, and an arrow represents a function call.

Object orientation is one of the most widely spread architectural styles in education, industrial practice, and science alike, and has proven to be very successful during the last 10-15 years.

2.6.3.3 Layered architecture

With a layered architecture, or onion architecture, focus is laid on the different abstraction levels in a system. This is a popular way of viewing the whole of a computer system such as a PC (see Figure 2.7). Other systems are the OSI model for data communication and high level programming languages (built-in library functions can be viewed as the lowest layer, with at least one layer more on top of it).

[Figure: the same layers drawn in two ways. Layer labels in the figure: Applications; Resource allocation and security; File system; I/O system; Memory system; Process management, semaphores, interrupts; Hardware.]

Figure 2.7: Two different descriptions of the layered architecture of a personal computer. Sometimes concentric circles are used in notation. (The layers according to [7].)

In its pure form, communication between the different layers may only occur through the interfaces between two adjacent layers. The style’s major drawbacks are that it is not always easy to identify the appropriate abstraction levels, and that the system may have to communicate in a more complex way than is implied by the layers.

2.6.3.4 Blackboard architecture

A blackboard (or repository) architecture draws the attention to the data in the system, with agents working with it (see Figure 2.8). The agents may be implicitly invoked when data changes, or explicitly by some sort of external action (such as a user command).

[Figure: a central data store surrounded by agents, connected by arrows labeled “Read” and “Write”.]

Figure 2.8: A repository architecture. The rectangle represents the central data store, while the ellipses are agents acting on the data (the arrows).

A database can be described by the blackboard architectural style. Examples of agents are client applications, database triggers (small pieces of program code that are executed automatically when data changes), and administration tools.
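The implicit invocation of agents can be sketched as follows. This is a minimal illustration in Python, not a description of any real database or of PAM; the class `Blackboard`, the agent `derive_fahrenheit`, and the key names are assumptions made for the example. Every write to the central data store notifies all registered agents, much as a database trigger runs when data changes.

```python
from typing import Any, Callable, Dict, List

# An agent is called with the blackboard, the changed key, and the new value.
Agent = Callable[["Blackboard", str, Any], None]


class Blackboard:
    """Central data store that implicitly invokes agents on every write."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._agents: List[Agent] = []

    def subscribe(self, agent: Agent) -> None:
        """Register an agent to be invoked whenever data changes."""
        self._agents.append(agent)

    def read(self, key: str) -> Any:
        """Explicit read access; returns None for unknown keys."""
        return self._data.get(key)

    def write(self, key: str, value: Any) -> None:
        """Store the value, then notify every registered agent."""
        self._data[key] = value
        for agent in self._agents:
            agent(self, key, value)


def derive_fahrenheit(board: Blackboard, key: str, value: Any) -> None:
    """Hypothetical trigger-like agent: keeps a derived value up to date."""
    if key == "temperature_c":
        board.write("temperature_f", value * 9 / 5 + 32)


if __name__ == "__main__":
    board = Blackboard()
    board.subscribe(derive_fahrenheit)
    board.write("temperature_c", 100.0)
    print(board.read("temperature_f"))
```

An agent that writes to the key it reacts to would invoke itself recursively; the sketch avoids this by having the agent write only to a different key.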



2.6.3.5 Client-server architecture

A client-server architecture focuses on services different clients want to perform. This architecture is especially suitable when the hardware is organized as a number of local computers (e.g. personal workstations) and one central resource such as a file tree, database, or a cluster of powerful central calculation computers. See Figure 2.9.

In a software client-server system, there may be several clients in one computer, and even the server can be running on the same computer.

[Figure: one box labeled “Server” connected to a number of boxes labeled “Client”.]

Figure 2.9: A view of a (hardware) client-server system.

This is a way of viewing a multi-user database, on a different abstraction level than the blackboard architecture.

2.6.3.6 Process control

Real-world systems often control a physical reality, such as control systems in a power plant. There are a number of software paradigms for process control. The significant properties are that the software takes its input from sensors (such as a flow sensor), and performs control actions (such as closing a valve). The control loop may be of feedback or feed-forward type.

2.6.3.7 State machine

When designing a state machine architecture, the states the program can be in are identified, together with the legal transitions between them.

State machines are well known to mathematicians, and can be thoroughly investigated and validated regarding loops, illegal states etc., which makes this style common in safety-critical systems. State machines are particularly well suited for graphical description (see Figure 2.10).



[Figure: state diagram. Labels in the figure: Running, Standby, Off, On, Start, Stop.]

Figure 2.10: A state machine with three states and five legal transitions.
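A state machine of this kind can be written down as a transition table, which makes it easy to check which transitions are legal. The Python sketch below uses the labels that appear in Figure 2.10, but the figure's arrows are not reproduced here, so the source and target state of each transition is an assumption made for illustration: three states (Off, Standby, Running) and five transitions (On, Start, Stop, and Off from either of the two active states).

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state.
# The wiring of the transitions is assumed, not taken from the figure.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Off", "On"): "Standby",
    ("Standby", "Start"): "Running",
    ("Running", "Stop"): "Standby",
    ("Standby", "Off"): "Off",
    ("Running", "Off"): "Off",
}


def step(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None


def run(state: str, events: Iterable[str]) -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    print(run("Off", ["On", "Start", "Stop", "Off"]))
```

Because every legal transition is listed explicitly, illegal states and transitions can be found by inspecting the table, which is the kind of validation the text refers to.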

2.6.4 Heterogeneous architectural styles

In this case as in so many others, reality is more complicated than theories. Many systems are in fact hybrids. In [11], this property is made explicit:

We describe the styles in their pure forms, although they seldom occur that way. Real systems hybridize and amalgamate the pure styles, with the architect choosing useful aspects from several in order to accomplish the task at hand. Our classification does not impede this heterogeneity, but rather enhances the selection and blending process by making stylistic properties explicit.

Some systems are just modifications of an architectural style; others combine them, maybe on different abstraction levels. Consider the system outlined in Figure 2.11, which implements a client-server architecture, where the server is organized in a layered architectural style, each layer implemented in an object-oriented language, and some of the services are executed as pipe-and-filter sub-processes. Or, a system could be built as a pipe-and-filter system, where each filter abstracts the operating system via a portability layer, to facilitate portability. The lowest abstraction layers could be regarded as technique rather than architecture (object orientation can e.g. be thought of as an implementation technique).

[Figure: a number of boxes labeled “Client” connected to one box labeled “Server”.]

Figure 2.11: Informal description of an architecture combining several architectural styles.



This issue is also discussed in [1], where it is stated that systems “are seldom built from a single style, and we say that such systems are heterogeneous.” Three kinds of heterogeneity are identified:

• “Locationally heterogeneous” – different runtime parts use different styles.

• “Hierarchically heterogeneous” – different components in a system of one style may be structured according to another style.

• “Simultaneously heterogeneous” – several styles serve as a description of the same system (as we saw, a multi-user database can be viewed as both a blackboard and a client-server architecture). This heterogeneity “recognizes that styles do not partition software architectures into nonoverlapping, clean categories” [1].

2.7 SUMMARY

2.7.1 Software properties

A software system has both functional properties and non-functional properties (sometimes called quality properties). The functional properties are the ones a program system shows while it runs, such as available functions, performance, security, and reliability, while the non-functional properties describe different aspects of the program code, e.g. how easy it is to maintain, extend, test, or reuse parts of. Cost is regarded as a non-functional property and can be analyzed, given an architectural design description.

2.7.2 Languages and views

To be able to document, communicate, and analyze architectures, they must be described in a well-understood language. Graphical languages are particularly well suited. It is common that arbitrary “boxes-and-lines” pictures are used, but there must be a clear semantic definition of the graphical symbols to avoid confusion and misinterpretations. Formal architectural description languages (ADLs) have been developed. One common feature is that an ADL consists of components (describing the entities of the architecture) and connections (defining relations between the components).

Different views are used to describe architectures, depending on which properties are to be revealed and analyzed. The elements of the ADL depend on the view. For example, in a process view, computers, processes, and threads might be the basic building blocks, and as connectors e.g. communication channels or “master-and-slave” relationships might be used.

Architectures are thus always described in a language, using a specific view. The purpose of architectural descriptions is to reveal certain properties about the architecture, either for communication within the project, or for a more thorough analysis.

2.7.3 Architectural styles

There are a number of well-known architectural styles, i.e. “standard architectures”, with known properties. Different architectural styles are suitable for different domains of software (such as real-time systems or database systems). In practice, a system often combines several architectural styles, thus forming a heterogeneous architectural style.



2.7.4 Architectural analysis

An architectural analysis does not yield absolute numbers on how “good” an architecture is. It is a question of comparing particular properties of alternative architectures, always in a certain context.

A feasible architecture can be found by designing, or at least sketching, several architectures, which are then analyzed and compared. Different architectures have different properties, and a trade-off analysis must be performed to decide how well the different requirements should be met. The outcome of such a comparison might be that an architectural style must be further modified to fit the current problem; designing an architecture is thus a highly iterative process.

2.7.5 Lesson learned

The quality of an architectural design is crucial to the success of both a program system and a development project: to a program system because many features, good or bad, can be traced back to the architectural design; to a development project because a good architectural description allows the project members to communicate.

Bad decisions early in a project (when architectural design is conducted) are generally the most expensive.

3 THE ARCHITECTURAL PROPOSALS

3.1 THE ORGANIZATION OF THE DOCUMENT

This document discusses the functional and non-functional requirements (see appendix E), followed in section 3.4 by a presentation of the proposed architectures, both in text and with a number of graphical views. Section 3.5 contains the results of the analyses performed, and concludes with an analysis summary. An investigation of required technology that is new in PAM is documented in section 4.

3.2 FUNCTIONAL REQUIREMENTS

The architectural proposals meet the requirements on the system. The functional requirements are stated in appendix E.

3.3 NON-FUNCTIONAL REQUIREMENTS

There are several other goals for the architecture, which can be described with non-functional requirements. These cannot be assessed directly from merely an architectural description. Rather, some measure is estimated, which can be used to compare the architectural proposals. The project can also decide that no proposal is good enough.

The non-functional requirements that have been analyzed are cost, performance, maintainability, testability, reusability and portability.

3.3.1 Cost

The architecture should be “cheap enough”. The actual cost in required man-time can be estimated with several measures, which should only be used to compare the architectural proposals.

• The size of the code:

- The amount of totally new code.

- The amount of existing code that must be modified.

• The complexity of the code:

- The number of components.

- The number of messages sent between them, and the complexity of the protocol used (e.g. the number of messages, and need of complex structures such as message queues and time out mechanisms).

- Time-dependent interactions (in our case, it is difficult (and thus expensive) to test a system with several interacting processes running on different computers).

This document contains cost analyses for the different architectural proposals in section 3.5.1.

3.3.2 Performance

The system should have “good enough” performance, and use resources “as economically as possible”. This document contains performance and system load analyses for the different architectural proposals in section 3.5.2.

3.3.3 Maintainability

The system should be “maintainable”. Maintainability has been estimated through scenario evaluation, as the SAAM method describes (see section 2.4.5.2). The scenarios used are of two types.

• Given a system with the functionality of all requirements in appendix E with priority 1, the impact on the architecture when introducing any of the requirements with priority 2 is analyzed.

• Functionality that the users have expressed interest in, but have not been able to formulate more precisely, is also used in the same manner; given a system with the functionality of all requirements with priority 1, the impact on the architecture when introducing this functionality is analyzed.

This document contains maintainability analyses for the different architectural proposals in section 3.5.3.

3.3.4 Testability

This document contains a testability analysis for the different architectural proposals in section 3.5.4.

3.3.5 Reusability

This document contains a reusability analysis for the different architectural proposals in section 3.5.5.



3.3.6 Portability

This document contains a portability analysis for the different architectural proposals in section 3.5.6.

3.3.7 Documentation

The system must in addition be well documented. This document serves as part of the documentation of the architecture.

3.4 THE CANDIDATES

There are four candidates, which are variants of the same architecture. They have different characteristics, however, and are evaluated separately. Thus, the project can choose the one with the estimated best characteristics.

The already existing system’s architecture is also discussed.

3.4.1 Overview

The new architectures are named A1, A2, B1 and B2.

3.4.1.1 Architectures A1, A2, B1, and B2

The following main features are common to A1, A2, B1, and B2.

• There is always one SB (Service Broker, or Server Broker) running per server computer, and it lives independently of whether other components are started or terminated.

• When starting a project execution, a CS (Calculation Server) component is started which takes care of the project throughout its execution. Once started, it is totally independent of whether there are any clients or not.

• Each executing task (started by the CS) is controlled by a TCS (Task Calculation Server) component.

• Any number of clients may be running, on any number of computers. The client applications are launched and exited locally by the individual users.

The main difference between the architectures is that B1 and B2 include one more component than A1 and A2, the SDS (Service Dispatch Server).

Figure 3.1 presents a runtime view of the architecture. The differences between the architectures are listed in Table 3.1.



[Figure: runtime view with client computers running PAM clients, and server computers each running an SB, together with SDS, CS, and TCS components.]

Figure 3.1: A simplified runtime view of architectures A1, A2, B1, and B2. This view is explained further, and presented individually for each architecture, in section 3.4.2.1. (The SDS component is grayed in the figure to illustrate that only architectures B1 and B2 include it.)

Architecture   Files handled in component   Project file storage
A1             Client                       Distributed
A2             Client                       Central
B1             SDS                          Distributed
B2             SDS                          Central

Table 3.1: The differences between the architectures. These are explained further in section 3.4.4.

3.4.1.2 The architecture of today

Most functional requirements (see appendix E) are not fulfilled by the current system, but many of these can be implemented without modifying the architecture. Some requirements are however not supported by the architecture, and to introduce some of them would require redesign of the architecture (see footnote 16), which has been done and resulted in architectures B1 and B2. The most demanding requirements are listed below.

16 This statement would actually require validation through a more thorough analysis (SAAM, see section 3.3.3), which has not been performed. Rather, this architecture was used as a starting point for the development of the new architectural proposals.

• Requirements (AC 1-1) and (AC 1-2) define project and task states; implementing these requirements requires major re-coding, maybe even a higher level redesign.

• Requirements (AC 2-1) to (AC 2-3) specify how clients and project executions shall be related; implementing these requirements requires major re-coding, maybe even a higher level redesign.

• Requirement (AC 3-3.5) implies that tasks are to be copied, created, or deleted during script execution. Today there is no connection between the SDS (which evaluates the project script) and the database (attempts have been made to perform these actions via the client, but this violates other requirements; see the previous bullet).

• The footnote for requirement (AC 5-1) should be particularly noted: “A project may contain tasks that are executed on different computers, and even platforms.” This is the major reason why this architecture is insufficient; the SDS can only execute programs on the same computer (or on computers covered by a load sharing system, which would still restrict execution to the same platform; this is not implemented).

• “(AC 7-2.1) Binary files shall be possible to interpret without modifying PAM’s source code, given an interpreter transforming data into a format that can be handled by PAM.” Today, the interpretation of binary files is not general.

This architecture is not further discussed in this document, and is not included in any analysis, since non-functional properties are considered to be of no importance when the system doesn’t have the required functionality.

3.4.2 Graphical views of the architectures

In this section, three different views of the architectures are presented.

3.4.2.1 Process view

In Figure 3.3 through Figure 3.5, there is a process view of an example PAM system. The system consists of two server computers. In the snapshot presented by the figures, client applications are launched on two client computers, and one project is executing, executing three tasks in parallel.

Figure 3.2 describes the semantics of the graphical objects used in this view.

[Figure: legend with symbols for: Computer; Process; Process or thread (in the SB on the same computer); “Starts and owns” connection.]

Figure 3.2: Key to the process views.



[Figure: two client computers running PAM clients, and two server computers each running an SB; one CS and three TCS components are running.]

Figure 3.3: The process view of architecture A1 and A2. The only service the SB presents to the client is project execution, which starts a CS.

[Figure: two client computers running PAM clients, and two server computers each running an SB and several SDS components; one CS and three TCS components are running.]

Figure 3.4: The process view of architecture B1. The only service the SB presents to the client is to start an SDS. Each client is then connected to an SDS on each server computer acting as file server, which presents a number of services to the client.



[Figure: two client computers running PAM clients, and two server computers each running an SB; SDS components, one CS, and three TCS components are running.]

Figure 3.5: The process view of architecture B2, which is similar to B1. Each client is then connected to an SDS on an arbitrary server computer.

Two major differences can be noticed.

• First, there are no SDS processes in architectures A1 and A2. Instead, the client includes most of the functionality of the SDS in architectures B1 and B2.

• Second, there are more SDS processes in architecture B1 than in architecture B2. This is because in B1, the files are distributed on each server computer, and the client accesses them through an SDS (in both architectures); therefore, since the client potentially may request a file on any computer, there must be an SDS on every computer.
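The second difference can be made concrete with a small calculation. The Python sketch below is an illustration only: the function name `sds_process_count` is hypothetical, and it assumes that B1 needs one SDS per client on every server computer, that B2 needs one SDS per client in total, and that A1 and A2 need none.

```python
def sds_process_count(architecture: str, clients: int, server_computers: int) -> int:
    """Estimate the number of SDS processes in a running system.

    Assumption: every client is connected at the same time and no SDS
    is shared between clients.
    """
    if architecture in ("A1", "A2"):
        return 0  # no SDS component at all
    if architecture == "B1":
        return clients * server_computers  # one SDS per client on every server
    if architecture == "B2":
        return clients  # one SDS per client, on some server computer
    raise ValueError(f"unknown architecture: {architecture!r}")


if __name__ == "__main__":
    for arch in ("A1", "A2", "B1", "B2"):
        print(arch, sds_process_count(arch, clients=3, server_computers=2))
```

With three clients and two server computers, the sketch gives six SDS processes for B1 and three for B2, so the gap between the two grows with the number of server computers.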

3.4.2.2 Module view

A simple module view of the architecture is presented in Appendix A, together with an analysis of the number of code modules.

3.4.2.3 Message sequence chart

The messages in the system are listed in Appendix A, together with an analysis of the number of messages and complexity of the interactions.

3.4.3 Similarities

The client and the database are mostly untouched, but there are parts that are affected. In the server environment, there are more novelties.

3.4.3.1 The SB component

One component is simply named “SB” after the “Service Broker” in PAM 2.0. There is also a component used for the CM2 report generator called “Server Broker”, which has much of the desired functionality. In the analysis in section 3.5, it is shown to what extent the requirements can be fulfilled if the CM2 component is used.



The SB is a central resource, running on each server computer in the PAM system. Its tasks are very simple, but due to its central role, it has to be very reliable. Its tasks are the following:

1. It starts an SDS on request from a client (in architecture B1 and B2 only).

2. It starts a TCS on request from a CS.

3. It starts a CS on request from a client (in architecture A1 and A2 only).

4. During startup after a machine crash, it finds failed tasks and restarts them.

The SB is restarted automatically during a computer restart, to fulfill the requirements about availability and reliability (requirements (AC 6-1) to (AC 6-3) in appendix E). This might be implemented in different ways on different computers, e.g. via startup scripts, or as a “service” in Windows NT.

3.4.3.2 The CS and TCS components

To make project executions as independent as possible from the clients, the client simply starts a calculation server process, CS, which then works totally on its own. The CS evaluates the project script, and for each task to be executed, it launches a task calculation server, TCS.

The project execution is controlled, however, via the file CScommand. The CS responds to file events on this file, so whenever it is changed, the CS obeys the command written to it. Legal commands are “Pause”, “Resume”, and “Stop”.

The reason for this somewhat complex approach is that a project may contain tasks that should be run on different computers. The tasks can therefore not run as part of the CS execution, but the CS has to request a TCS execution from the central resource (the SB) on the correct computer.
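The control of a project execution through the CScommand file can be sketched in Python as follows. This is an illustration under stated assumptions, not the PAM implementation: instead of real file events, the sketch polls the file's modification time; the state names `running`, `paused`, and `stopped`, and the functions `read_command`, `apply_command`, and `poll_once`, are hypothetical. Only the three legal commands come from the text above.

```python
from pathlib import Path
from typing import Optional, Tuple

LEGAL_COMMANDS = {"Pause", "Resume", "Stop"}


def read_command(path: Path) -> Optional[str]:
    """Return the legal command written to the file, or None otherwise."""
    text = path.read_text(encoding="utf-8").strip()
    return text if text in LEGAL_COMMANDS else None


def apply_command(state: str, command: str) -> str:
    """Return the execution state after obeying a command.

    The states 'running', 'paused', and 'stopped' are assumed names.
    """
    if state == "stopped":
        return state  # a stopped execution cannot be revived
    if command == "Stop":
        return "stopped"
    if command == "Pause" and state == "running":
        return "paused"
    if command == "Resume" and state == "paused":
        return "running"
    return state  # e.g. "Pause" while already paused changes nothing


def poll_once(path: Path, last_mtime_ns: Optional[int], state: str) -> Tuple[int, str]:
    """Check the command file once; act only if it changed since last time."""
    mtime_ns = path.stat().st_mtime_ns
    if mtime_ns == last_mtime_ns:
        return mtime_ns, state
    command = read_command(path)
    if command is not None:
        state = apply_command(state, command)
    return mtime_ns, state
```

A CS built along these lines would call `poll_once` between script steps, or replace the polling with whatever file-event mechanism the platform offers.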

3.4.4 Differences

The architectures differ in whether they include an SDS or not (the major difference), and whether the calculation files are stored in a central file tree or locally on each server computer.

3.4.4.1 With/without SDS

Architectures A1 and A2 do not include the SDS, while B1 and B2 do.

The advantages of not using an SDS are obvious. There will be less code, in particular less Tcl code. Some inherent complexity in architectures B1 and B2 will not be an issue in A1 and A2; e.g. there is no need to implement an RPC layer (the only request left can be made asynchronously since no reply is required), less testing of process interactions is needed, and there is less Tcl code to debug. For the complete analysis, see section 3.5.1 and appendix A.

However, there are disadvantages. The binary file interpreter must execute in the client, which also means that the whole binary file must be moved over the network to be processed (with an SDS, the interpreter would instead filter out the interesting data before sending it across the network, thus decreasing the network load significantly). For text files, however, the absence of an SDS does not necessarily mean decreased performance when reading large files. The current development version of PAM opens large text files using an external application, such as MS Word or Notepad, that starts displaying a large file very fast, while reading the rest in the background.

However, the same functionality is included in both variants of the architecture. The functionality that is implemented differently in the architectures is listed briefly in Table 3.2. The performance of the architectures is analyzed in section 3.5.2 and appendix B.

Execution of project
  A1/A2: The client requests a project execution from the SB, which starts the CS process.
  B1/B2: The client requests a project execution from the SDS, which starts the CS process.

Pausing, resuming, and stopping of projects
  A1/A2: The client writes the desired command to the CScommand file.
  B1/B2: The client sends the command to the SDS, which writes the command to the CScommand file.

Viewing of text files
  A1/A2: The client reads the file directly.
  B1/B2: The client requests the contents of the file from the SDS.

Interpretation of binary files
  A1/A2: The client reads the file and interprets it.
  B1/B2: The client requests a subset of the data in the file, which the SDS interprets and returns.

Removing files on the user’s command
  A1/A2: The client deletes the file.
  B1/B2: The client sends the “delete” command to the SDS, which deletes the specified files.

Comparing files
  A1/A2: The client compares the files.
  B1/B2: The client sends the “compare” command to the SDS, which compares the specified files and returns an answer.

Backup project
  A1/A2: The client performs the backup.
  B1/B2: The client sends the “backup” command to the SDS, which performs the backup.

Saving output data
  A1/A2: The client reads output data and saves a subset of it to the database.
  B1/B2: The SDS reads the output data and saves a subset of it to the database.

Table 3.2: The location of certain functionality in the different architectures.

3.4.4.2 File storing

Files could be stored centrally on a file server, or locally, on each server computer in the PAM system.

Architectures A2 and B2 use one central file tree, with the main advantages that the file tree is easy for a system manager to traverse outside PAM, and that in architecture B2 only one SDS is needed per client (namely on the server computer acting as PAM’s file server). Architecture B1, by contrast, requires one SDS on every server computer per client. The disadvantage is that files (which may be numerous and large) must more often be transferred over the network. This technique requires that each server computer is able to mount PAM’s file tree.

Another approach is presented in architectures A1 and B1, where the file tree is distributed. This means that each task folder, with input and output data files, resides on the machine where it is executed. The advantage is that the network becomes less loaded, while it is more difficult for a system manager to overview the PAM system, and in architecture B1, more processes are used.

An estimation of the network transfers needed in each case is presented in section 3.5.2 and appendix B.

3.4.5 Processes or threads

The SB is a process, while each TCS component could be a process or a thread (in the SB). Even the CS components and the SDSs could be implemented as threads in the SB.

Processes are more expensive in terms of system resources, while threads could jeopardize the stability of the SB.

The architecture does not imply which strategy should be chosen. In fact, it will probably be easy to modify an existing system to use the opposite strategy.

3.5 ANALYSIS

3.5.1 Cost analysis

The full details of the cost estimation are found in appendix A.

3.5.1.1 Number of code modules

Table 3.3 and Figure 3.6 summarize the analysis of the number of code modules in the architectures. The full details are found in appendix A.

                                                     A1/A2   B1/B2
Total number of code modules                         35      47
Number of already existing code modules              3       3
Number of code modules that could partly be reused   8       12
Number of new code modules                           24      32

Table 3.3: The number of code modules in each architecture.



[Figure: bar chart of the number of code modules (vertical axis, 0 to 50) for A1/A2 and B1/B2, divided into new, partly reused, and already existing modules.]

Figure 3.6: The number of code modules in each architecture.

3.5.1.2 Complexity

The interaction between the CS and the TCSs during project execution is somewhat complicated. There are however no differences between the architectures in this respect, and as the pseudocode and analysis in Appendix D show, the code can be made quite simple, yet reliable.

There are great differences between the architectures in other respects, though. Since there is no SDS at all in architectures A1 and A2, there is no communication with it, while architectures B1 and B2 implement a protocol for communication between the client and the SDS.

In all architectures, there are six messages through which the CS and the TCSs communicate. In architectures A1 and A2, there is only one additional, very simple message (e.g. no answer is required) through which the client communicates with the SB, while B1 and B2 implement a more complex protocol. The full details are found in appendix A.

                                     A1/A2   B1/B2
Total number of components           5       6
Number and complexity of messages    7       20

Table 3.4: Some estimates on the complexity of each architecture. The same six “more complex” messages are included in all architectures; in addition there is one message in A1/A2, and fourteen in B1/B2.

3.5.2 Performance and system load analysis

In the performance analysis, the number of processes in the running system and the number of large data transfers across the network are measured or estimated. The complete analysis can be found in appendix B, and it is summarized in Figure 3.7 and Table 3.5.



[Figure: chart of the number of processes (vertical axis, 0 to 700) against system size (Small, Medium, Large) for architectures A1, A2, B1, and B2.]

Figure 3.7: The number of processes in running systems of different sizes.

The number of processes is thus lowest for architectures A1 and A2, while the number of processes may be almost doubled in architecture B1.

                                          A1   A2   B1   B2
A text file is viewed                     1    1    3    4
Binary data is interpreted and viewed     2    2    1    4
Two files are compared                    2    2    1    2
A project is executed                     1    3    1    3
A project is exported                     1    1    1    1
Total                                     7    9    7    14

Table 3.5: Relative ranking of the architectures, comparing the number of large data transfers in different scenarios.
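The totals in Table 3.5 are plain column sums, which means every scenario counts equally. The sketch below reproduces the sums and also shows how per-scenario weights could be applied if the expected relative frequencies were known. The weighting function and its parameter are assumptions for illustration only; no such weights were established in this work.

```python
ARCHITECTURES = ("A1", "A2", "B1", "B2")

# Relative rankings copied from Table 3.5 (one tuple per scenario).
RANKS = {
    "A text file is viewed": (1, 1, 3, 4),
    "Binary data is interpreted and viewed": (2, 2, 1, 4),
    "Two files are compared": (2, 2, 1, 2),
    "A project is executed": (1, 3, 1, 3),
    "A project is exported": (1, 1, 1, 1),
}


def totals(ranks):
    """Unweighted column sums, as in the last row of Table 3.5."""
    return weighted_totals(ranks, {scenario: 1 for scenario in ranks})


def weighted_totals(ranks, weights):
    """Column sums where each scenario is multiplied by a (hypothetical) weight."""
    result = {arch: 0 for arch in ARCHITECTURES}
    for scenario, row in ranks.items():
        for arch, rank in zip(ARCHITECTURES, row):
            result[arch] += weights[scenario] * rank
    return result


if __name__ == "__main__":
    print(totals(RANKS))
```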



3.5.3 Maintainability analysis (SAAM analysis)
The maintainability analysis is fully documented in appendix C. There, a number of change scenarios are applied to the architectures. In this section, this analysis is summarized.

3.5.3.1 Affected components
Table 3.6 summarizes this evaluation by documenting some statistics on how many modules and components the scenarios affected. Architectures A1 and A2 in general score better than B1 and B2. However, two scenarios affect the SB in architectures A1 and A2, which is undesirable because of its central position.

                                                      A1/A2                              B1/B2
Total number of modules and components affected       27 modules in 23 components        34 modules in 28 components
Average number of modules and components affected     1.7 modules in 1.4 components      2.1 modules in 1.8 components
Worst number of modules and components affected       4 modules in 3 components (once)   4 modules in 3 components (twice)
Number of scenarios affecting at least 2 components   6                                  10
Number of scenarios affecting the SB                  2                                  0

Table 3.6: Statistics from the scenario executions.
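The averages in Table 3.6 follow from the totals divided by the number of scenarios, which is 16 in this analysis. A minimal sketch of that arithmetic is given below; the function name and data layout are assumptions for illustration.

```python
SCENARIOS = 16  # number of change scenarios applied in the SAAM analysis

# Totals copied from Table 3.6: (modules affected, components affected).
TOTALS = {
    "A1/A2": (27, 23),
    "B1/B2": (34, 28),
}


def averages(totals, scenarios=SCENARIOS):
    """Average number of modules and components affected per scenario."""
    return {
        arch: (modules / scenarios, components / scenarios)
        for arch, (modules, components) in totals.items()
    }


if __name__ == "__main__":
    for arch, (mods, comps) in averages(TOTALS).items():
        print(f"{arch}: {mods:.2f} modules in {comps:.2f} components")
```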

3.5.3.2 Scenario interaction
Table 3.7 summarizes the number of scenarios affecting each component, the so-called scenario interaction. The more interactions, the more complex it is to maintain the system. However, as always, there are no absolute numbers for how many interactions are too many; rather, these numbers should be used for comparing architectures (which is done here), or to focus attention on particular components with many scenario interactions.

With only five or six components and 16 scenarios, this analysis gives a rather decent distribution, apart from the client. Moreover, when investigating which code modules are affected in each component, very few are affected by more than one scenario (including in the client).

            A1/A2                             B1/B2
Database    Affected by 4 scenarios           Affected by 4 scenarios
SB          Affected by 2 scenarios           Affected by 0 scenarios
CS          Affected by 3 scenarios           Affected by 3 scenarios
TCS         Affected by 3 or 4 scenarios      Affected by 3 or 4 scenarios
Client      Affected by 10 or 11 scenarios    Affected by 9 to 11 scenarios
SDS         Irrelevant                        Affected by 7 or 8 scenarios

Table 3.7: Scenario interaction on each component. (The variations depend on how certain scenarios are implemented.)

3.5.3.3 Conclusion
In proposals B1 and B2, several scenarios require that the client implements certain parts of the functionality, and the SDS implements other parts. The total number of code lines to be changed is about equal in all architectures, but in architectures B1 and B2, it might be more complex to perform changes since two components are affected. However, if an RPC layer is implemented as suggested in these architectures, it might be very straightforward to implement new functionality in a function call manner, the only difference being that the called subroutine is executed on the server and is written in another language.

One drawback of architectures A1 and A2 is that two scenarios affect the SB, which is undesirable because of its central position. The scenarios are “Requirement (AC 7-6)”, about handling input and output files with external programs, e.g. for converting binary files, and “Requirement (AC 8-1.2)”, about killing processes at the user’s command.

In conclusion, there is a noticeable trend that architectural proposals A1 and A2 are slightly more amenable to change than B1 and B2, but they are by no means superior, and in one aspect they are inferior to architectures B1 and B2.

3.5.4 Testability analysis
Testability can be divided into three properties, which are analyzed below. Most of the analysis applies to all four architectural proposals.

3.5.4.1 Controllability
The relative simplicity of the architectures, and in particular A1 and A2, gives a good degree of controllability. The components can be controlled by the normal commands available during normal operation. The architecture does not, however, support any other control of the execution, such as special test commands.

3.5.4.2 Observability
The SB, CS, TCS, and SDS all have very low observability in themselves. However, some logging functionality could be used to make it possible to trace when messages arrive and what execution paths are followed.

There is, however, no support in the architecture for determining the order of events in different concurrently executing components.

3.5.4.3 Reproducibility
In architectures A1 and A2, scenarios involving only the client’s thread of execution are totally reproducible (given that the environment, e.g. files that are read, does not change). In B1 and B2, the interaction between the SDS and the client makes the same scenarios more difficult to reproduce, because two threads of execution must be synchronized. If, however, a synchronous RPC layer is implemented and tested thoroughly, even the SDS calls in architectures B1 and B2 can be viewed as part of the single thread of execution.

Other scenarios, involving e.g. the SB, CS and TCS, are equally hard to reproduce in all architectures. If these components are implemented according to the pseudocode of appendix D, most scenarios during normal operation (e.g. given that the network does not fail) should be reproducible.

3.5.5 Reusability analysis
In the cost analysis (see section 3.5.1 and appendix A), existing code that could be reused is considered. No other reusability analysis has been made.

3.5.6 Portability analysis
The architectural descriptions are not detailed enough to reveal any dependencies on the environment (other software, operating system, or hardware). Therefore, no portability analysis can be performed at the architectural level. However, since details about the languages used are known, a brief analysis at a lower level can be carried out.

3.5.6.1 The client
The client is tied to Windows, because it is written in Visual Basic.

3.5.6.2 The servers
If the SB, SDS, CS, and TCS components are implemented in tcl, there are no direct connections to the underlying operating system or hardware. However, some tcl commands have options and uses that are not available on all platforms, and these should of course be avoided to make the system portable.

3.5.7 Analysis summary
Architectures A1 and A2 are significantly cheaper to implement than B1 and B2.

Architectures A1 and A2 will load the system slightly less than B2, while B1 will load the system with a significantly larger number of processes or threads. A1 and B1 will on average put the same load on the network, A2 slightly more, and B2 considerably more. However, the load during different scenarios has not been weighted with the expected relative frequency of the scenarios.

Architectural proposals A1 and A2 are slightly more amenable to change than B1 and B2, but are by no means superior.

Architectures A1 and A2 are slightly more testable than B1 and B2, but neither can be considered very testable. The chosen architecture could be further elaborated to make it more testable.

No architecture can be said to be more reusable than the others. With architectures B1 and B2, more code can be reused from the current version of PAM, but on the whole more code is required.

All architectures are equally portable. The architecture says nothing about portability, but the client cannot be ported because it is written in Visual Basic, while the other components can be written in tcl in such a way that the code is portable.

4 NEW TECHNOLOGY
The architectures require some new technology to be incorporated into PAM. In the following sections, technology that has been investigated as part of creating the architectural proposals is documented.

4.1 TCL ISSUES
4.1.1 The use of the tcl language
The use of the tcl language has been questioned. Some of the drawbacks experienced are that it is an interpreted language (there may be undiscovered syntax errors), that it is difficult to debug, and that it is vulnerable to ill-formed lists passed via sockets.

One of the major benefits is that it is easily extended. PAM has used this extensibility in several places, e.g. for input data formatting and project script evaluation. Some other appealing features include the ease with which sockets are handled, and the possibility to use file events. There is also the argument that some existing code written in tcl can be reused.

The architectural solutions do not put any restrictions on the choice of language, but the advantages of continuing to use tcl are considered greater than the disadvantages.

4.1.2 OraTcl
OraTcl, a database accessing extension for tcl, has been investigated. Only a study of the available documentation (see [14]) has been made; no practical experience has been gathered (due to shortage of time). However, the documentation gives the impression that OraTcl is a mature and stable extension, with full support for e.g. LOBs and stored procedures, and is extensively used by other applications.

4.1.3 Threads
The use of threads in tcl has been investigated. As with OraTcl, time has only allowed a theoretical investigation of the documentation (see [13]).

Threads seem to work, but certain commands should be avoided to guarantee proper execution, and inter-thread communication is not completely reliable. One question remaining is how well using OraTcl and threads at the same time will work.

4.2 THE SERVER BROKER
The “Server Broker”, a component developed at Atom for the CM2 report generator, has been investigated.

4.2.1 Functionality
It has about the same functionality as the SB component, but works somewhat differently. The Server Broker is, however, capable of performing three of the four desired tasks, namely to start the processes SDS, CS, and TCS. It could be extended with the remaining functionality, namely to restart all failed project executions during a restart after a computer crash.

However, the way it works may be too different to make it convenient to use.

4.2.2 Behavioral description
The Server Broker waits for a connect message from a client. It forks itself, and the child process waits for a login message from the connected client. The login uses any Unix login user. When the client logs in, the process starts executing a program that was given on the Server Broker’s command line.

It thus requires the client to implement a more complicated protocol than the architectural proposals imply. Moreover, it is not obvious how to distinguish the three seemingly simple requests (to start execution of the SDS, CS or TCS).

4.2.3 Conclusion
It should nevertheless be possible to use the Server Broker in PAM, perhaps after some modification. Thus, the project is advised to investigate the Server Broker further.

4.3 FILE HANDLING
The possibility to handle files over the network from the client directly was tested (required for architectures A1 and A2).

In particular, the mechanism should not require any preparation by the user, such as mapping a drive; rather, the files should not be exposed directly to the user. It was found that it is possible to do this without difficult programming, and hopefully with acceptable performance. One remaining question is how files can be protected without being copied (for performance reasons), to allow for safe displaying in e.g. Word.

During the final meeting with the project group, however, it was found that the tests performed by the author assume that external software is correctly installed, both on the client and on the file server. Such dependencies are unpleasant, but during the meeting, a slight modification of architectures A1 and A2 was discussed which would probably solve this problem, while probably still leaving most of the analyses valid. See appendix G.

4.4 SOCKETS
All architectural solutions require that the application is notified when a socket is closed for some reason (if e.g. the communicating process terminates, or there is a network failure). The socket mechanism was discussed, and is regarded to be “very safe” in this respect. No particular measure of how often the environment fails to report a socket error was found.
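How an application can notice that its peer has closed a socket may be illustrated as follows. This is a minimal sketch in Python rather than tcl, and the function name is an assumption for illustration. It covers only the case where the communicating process closes the socket or terminates; a silent network failure may not be reported this promptly.

```python
import socket


def receive_or_detect_close(sock, max_bytes=4096):
    """Return the received bytes, or None when the peer has closed the socket.

    An orderly close is reported by recv() returning an empty byte string;
    an abrupt one may surface as a ConnectionError instead.
    """
    try:
        data = sock.recv(max_bytes)
    except ConnectionError:
        return None
    return data if data else None
```

A caller would typically treat a `None` result as the signal to clean up and, where relevant, report the peer as terminated.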

5 REFERENCES
[1] Software architecture in practice
L Bass, P Clements, R Kazman
Addison-Wesley
Reading, Massachusetts 1998
ISBN 0201199300

[2] An Experiment on Creating Scenario Profiles for Software Change
P Bengtsson, J Bosch
Department of Software Engineering and Computer Science, University of Karlskrona/Ronneby 1999
ISSN 1103-1581

[3] Design & Use of Software Architectures
J Bosch
Addison-Wesley
Edinburgh 2000
ISBN 0-201-67494-7

[4] The Mythical Man-Month – Essays On Software Engineering, 20th Anniversary Edition
F P Brooks, Jr.
Addison-Wesley
Reading, Massachusetts 1995
ISBN 0201835959

[5] Pattern-oriented Software Architecture – A System of Patterns
F Buschmann, R Meunier, H Rohnert, P Sommerlad, M Stal
John Wiley & Sons
Chichester, West Sussex 1996
ISBN 0 471 95869 7

[6] Alice’s adventures in Wonderland
Lewis Carroll
Penguin Books
London 1994
ISBN 0-14-062086-9

[7] Programkonstruktion och projekthantering
S Eklund
Studentlitteratur
Lund 1993
ISBN 91-44-38491-2

[8] Design Patterns – Elements of Reusable Object-Oriented Software
E Gamma, R Helm, R Johnson, J Vlissides
Addison-Wesley
Upper Saddle River, New Jersey 1994
ISBN 0-201-63361-2

[9] SAAM: A Method for Analyzing the Properties of Software Architectures
R Kazman, L Bass, G Abowd, M Webb
Proceedings of the 16th International Conference on Software Engineering, 1994

[10] Software architecture – Perspectives on an emerging discipline
M Shaw, D Garlan
Prentice Hall
Upper Saddle River, New Jersey 1996
ISBN 0-13-182957-2

[11] A Field Guide to Boxology: Preliminary Classification of Architectural Styles for Software Systems
M Shaw, P Clements
Proceedings of the 21st Computer Software and Applications Conference, 1994

[12] Software architectures – An overview
A Wall
Department of Computer Engineering, Mälardalen University 1998

[13] Web page
Tcl Commands – thread manual page
URL: http://dev.scriptics.com/ftp/thread//thread21.html
Last visit 2001-04-03 (Appendix F)

[14] Web page
“Oratcl” 2.6 (TCL) manual page
URL: http://dev.scriptics.com/man/oratcl2.6/oratcl.html
2001-04-03
Last visit 2001-04-03 (Appendix F)



APPENDIX A – COST ANALYSIS
This appendix contains cost estimates for the architectures. The code is divided into code modules as described in the figures below, and these are counted to give some estimated measure of the cost associated with implementing each architecture. There is also an analysis of the involved messages, which gives an indication of the complexity of the system (and thus the cost).

The size of the code is estimated with the number of identified code modules, as sketched in the figures below. The code is described with modules that are small enough to be easy to overview and straightforward to implement; however, it is not guaranteed that they will have the same size. Some may be considerably smaller than others, and therefore this measure is just an estimate. The measure is divided between the amount of totally new code, the amount of existing code that must be modified, and existing code that practically does not need to be modified.

The complexity of the code is estimated through the number of components, the number of connections between them, and the number of time-dependent interactions.

It should be noted that the figures are not architectural descriptions, since there are no connections in the figures describing the modules’ relationships. They rather describe the functional separation between modules.

CODE SIZE
All architectures

In the database, there are six services that must be provided (Figure A.1). Two of these exist today, and the other four are straightforward to implement.

Database
• Update project state
• Update task state
• Read task input data
• Check authority
• Insert new task
• Update task

Figure A.1: The six code modules needed in the database. The first two already exist and may probably be used “as is”, while the other four must be written.

The SB is designed to provide only a few services, and consists of four code modules, two of which may be rewritten from existing code, and two of which are new (Figure A.2).

SB
• Commands from Client: Start CS process (only in case of architecture A1/A2), Start SDS process (only in case of architecture B1/B2)
• Commands from CS: Start TCS process
• Database connection: Get list of failed projects
• Restart projects: Start CS process

Figure A.2: The four code modules in the SB. The commands’ functionality could be achieved by modifying existing code, while the two lowermost must be written from scratch.

The CS component consists of eight code modules, three of which can partly be reused, and five of which are new (Figure A.3).

CS
• Script commands: RUN/BGRUN, HITMIN
• Communication with SB: Command "Start TCS"
• Database connection: Add/modify tasks from HITMIN, Save project state
• Receive commands: Fileevent for file CScommand
• Communication with TCS: Send commands to TCS, Receive task status

Figure A.3: The eight code modules in the CS. For the communication modules, existing code could be used to start coding from.

The TCS consists of seven code modules, one of which does not need to be modified, two of which can be implemented reusing some code, and four of which are new (Figure A.4).

TCS
• Input data formatting: Input data formatting
• Communication with CS: Receive commands, Report status
• Database connection: Read task input data, Control if task is up-to-date, Save task state
• CP control: Start, Suspend, Resume, Kill CP, with or without LSF

Figure A.4: The seven code modules in the TCS. The communication modules could reuse parts, and the input data formatting already exists and should require very little re-coding.

The number of code modules in the code parts that are common to all architectural proposals is thus 25, according to figures A.1 to A.4. Of these, 3 exist and could be used “as is”, 5 exist but need to be modified, and 17 must be written from scratch.

Architectures A1/A2
In architectures A1 and A2, the code in the client is described by figure A.5. There are ten code modules, three of which can partly be reused.



Client
• File handling: Read/view text files, Remove files (on user’s command), Compare files (diff), Export project, View graphical data
• Communication with SB: Command "Start CS"
• Project execution: Write CScommand file
• Binary files interpreter: Binary files interpreter
• Database connection: Save output data, Control user authority

Figure A.5: The ten code modules of the client in architectures A1 and A2. The code for communicating with the SB could in part be reused. Code for reading and viewing text files and graphical data also partly exists.

Architectures B1/B2
In architectures B1 and B2, there are two more components: the client and the SDS. The code is described by figures A.6 and A.7. There are 22 code modules in total, 7 of which can partly be reused.

Client
• Communication with SDS: RPC abstraction layer
• File handling: View text files, Remove files (on user’s command), Compare files (diff), Export project, View graphical data
• Communication with SB: Command "Start SDS"
• Project execution: Start (resume) execution, Suspend execution, Stop execution
• Database connection: Control user authority

Figure A.6: The eleven code modules of the client in architectures B1 and B2. The code for communicating with the SB and SDS (“RPC abstraction layer”) could in part be reused. Code for viewing text files and graphical data also partly exists.

SDS
• File handling: Read text files, Remove files (on user’s command), Compare files (diff), Export project, Archive project
• Communication: Communication with client
• Project execution: Write CScommand file, Start CS process
• Database connection: Save output data, Control user authority?
• Binary files interpreter: Binary files interpreter

Figure A.7: The eleven code modules of the SDS in architectures B1 and B2. The code for communicating with the client could in part be reused, as well as the code for reading text files and the binary files interpreter.



COMPLEXITY
As a measure of complexity, the number and types of messages can be used. Also, the number of interacting components (processes) in an architecture is a measure of complexity.

In the message sequence charts of this section, the first message arrow in each ordered sequence starts with a dot. Different sequences of messages, each starting with a dot, can be executed in any order. See Figure A.8.

[Figure: message sequence chart between Process 1 and Process 2, with messages numbered 1 to 5.]

Figure A.8: A schematic message sequence chart. The dots indicate the start of a sequence of messages: message 1 always occurs before messages 2 and 3, while the sequence involving messages 4 and 5 can be executed before or after 1, 2, and 3.
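The ordering rule of these charts can be expressed as a small check on an observed trace of messages. The sketch below is only an illustration: it assumes that the messages within one sequence are totally ordered (1 before 2 before 3, and 4 before 5), and that nothing is required of messages belonging to different sequences. The function name is hypothetical.

```python
def respects_sequences(trace, sequences):
    """Return True if every sequence appears in the trace in its own order.

    trace:     list of observed messages, each occurring at most once
    sequences: list of ordered message sequences (each starts at a "dot")
    """
    position = {message: index for index, message in enumerate(trace)}
    for sequence in sequences:
        if any(message not in position for message in sequence):
            return False  # a message of the sequence never occurred
        indices = [position[message] for message in sequence]
        if indices != sorted(indices):
            return False  # messages of one sequence arrived out of order
    return True


# The two sequences of Figure A.8, under the assumptions stated above.
FIGURE_A8 = [[1, 2, 3], [4, 5]]
```

For example, `respects_sequences([4, 5, 1, 2, 3], FIGURE_A8)` holds, while a trace that starts with message 2 does not satisfy the rule.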

All architectures
The messages involved during project execution are displayed in Figure A.9 as a message sequence chart. These messages are common to all four architectures.

[Figure: message sequence chart between the CS, SB, and TCS, with the messages Start TCS, Starting TCS, "Executing", "Pause", "Stop", "Resume", and "Executed".]

Figure A.9: All possible messages involved during project execution, affecting the CS, SB and TCS components. The dotted line indicates that the SB starts the TCS process. The TCS exits immediately after the “Executed” message is sent.

These messages seem to imply rather complex code. However, as the pseudocode listed in Appendix D shows, the architecture has inherent properties that make the code quite simple, yet reliable.



Architectures A1/A2
In architectures A1 and A2, there is only one additional message, namely when the client requests a project execution from the SB (see Figure A.10). It can be noted that this message does not require an answer, so a very simple call mechanism could be used.

[Figure: message sequence chart with the single message Start CS between the Client and the SB.]

Figure A.10: The only additional message needed in architectures A1 and A2.

Architectures B1/B2
In architectures B1 and B2, there are a number of additional messages involved when the client and the SDS communicate (see Figure A.11).

[Figure: message sequence chart between the Client, SB, and SDS, with the messages Start SDS, Starting SDS, SDS started, "Execute", "Stop", "Pause", three "ACK" messages, "GetFileList" and FileList, "ReadTextFile" and TextFile, and "ReadBinaryFile" and InterpretedFile.]

Figure A.11: The additional messages needed in architectures B1 and B2.

In architectures B1 and B2, there is a need to implement a more sophisticated call layer, to ensure that each message’s acknowledgement is handled correctly. A layer implementing a synchronous remote procedure call abstraction is probably an appealing solution.
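What such a synchronous call layer might look like is sketched below. This is an illustration only, written in Python rather than in the languages considered for PAM: the line-based wire format, the function names, and the file names in the reply are all assumptions, and only the message names "Execute", "ACK", and "GetFileList" are taken from Figure A.11. The point is that `call` does not return until the reply has arrived, so the caller can treat the exchange as an ordinary function call.

```python
import socket
import threading


def send_line(sock, text):
    sock.sendall(text.encode("utf-8") + b"\n")


def recv_line(sock):
    """Read one newline-terminated line; raise if the peer closes first."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf[:-1].decode("utf-8")


def call(sock, request):
    """Synchronous call: send the request, then block until the reply arrives."""
    send_line(sock, request)
    return recv_line(sock)


def serve(sock, handlers):
    """Answer each request with exactly one reply until the peer disconnects."""
    try:
        while True:
            request = recv_line(sock)
            handler = handlers.get(request)
            send_line(sock, handler() if handler else "ERROR")
    except ConnectionError:
        pass
    finally:
        sock.close()


def demo():
    client, server = socket.socketpair()
    handlers = {
        "Execute": lambda: "ACK",
        "GetFileList": lambda: "FileList out1.txt out2.txt",  # hypothetical files
    }
    worker = threading.Thread(target=serve, args=(server, handlers), daemon=True)
    worker.start()
    replies = [call(client, "Execute"), call(client, "GetFileList")]
    client.close()
    worker.join(timeout=5)
    return replies
```

Because every request is answered before the next one is sent, replies cannot be mismatched with requests, which is the property the acknowledgement handling needs.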



APPENDIX B – PERFORMANCE AND SYSTEM LOAD ANALYSIS
This appendix documents the performance and system load analysis.

The number of processes and the number of network accesses are presented for each architecture. These measures give some estimate of the performance and load of the system.

NUMBER OF PROCESSES
In this section, it is assumed that all components are implemented as processes, although threads could be used. However, exchanging processes for threads would increase performance and decrease system load by only a constant factor, and the measures below are therefore interesting in either case.

To make a realistic estimate of the number of processes, the number of simultaneous processes in three estimated “normal” PAM systems is simply counted¹⁷. The following parameters are introduced:

• S, the number of server computers in the system.

• C, the number of clients at a given moment.

• P, the number of currently executing projects in the system.

• T, the average number of tasks in a project.

All architectures
The number of SBs is constant throughout the system’s life. There is one SB per server computer in the system; thus the number of SBs is always S.

The number of processes started due to project executions is one CS per project, plus one TCS per executing task. Thus the number of started processes due to project executions is P·T, possibly simultaneously.

Architecture A1/A2
Each client is a process, but no more processes are started in any scenario. Thus, the number of additional processes is C.

Architecture B1
Each client is a process, and for each client and server computer there is an SDS. Thus, the number of additional processes is C + C·S.

Architecture B2
Each client is a process, and for each client there is an SDS. Thus, the number of additional processes is 2·C.

Summary
The total number of processes in the system is listed in Table B.1 and Figure B.1.

17 This calculation model was invented by the author.



                                    A1             A2             B1                   B2
Formula for number of processes     S + P·T + C    S + P·T + C    S + P·T + C + C·S    S + P·T + 2·C
S = 1, C = 4, P = 2, T = 5          15             15             19                   19
S = 3, C = 20, P = 5, T = 10        73             73             133                  93
S = 8, C = 30, P = 15, T = 20       338            338            578                  368

Table B.1: The number of processes in the system, first as formulae and then with values for the parameters, representative of a small, a medium, and a large PAM system.
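The formulae of Table B.1 are simple enough to evaluate directly. The following minimal sketch follows the formulae exactly as listed in the table; the function name and the dictionary of system sizes are assumptions for illustration.

```python
def process_counts(S, C, P, T):
    """Number of processes per architecture, following the formulae of Table B.1.

    S: server computers, C: clients, P: executing projects,
    T: average number of tasks in a project.
    """
    common = S + P * T  # SBs plus processes started by project executions
    return {
        "A1": common + C,
        "A2": common + C,
        "B1": common + C + C * S,  # one SDS per client and server computer
        "B2": common + 2 * C,      # one SDS per client
    }


# Parameter values used in Table B.1 for the three system sizes.
SYSTEMS = {
    "Small": dict(S=1, C=4, P=2, T=5),
    "Medium": dict(S=3, C=20, P=5, T=10),
    "Large": dict(S=8, C=30, P=15, T=20),
}

if __name__ == "__main__":
    for size, params in SYSTEMS.items():
        print(size, process_counts(**params))
```

The C·S term is what makes B1 grow fastest: it is the only term in which two of the parameters that describe system size are multiplied with each other, apart from P·T, which is common to all architectures.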

[Figure: chart of the number of processes (vertical axis, 0 to 700) against system size (Small, Medium, Large) for architectures A1, A2, B1, and B2.]

Figure B.1: The number of processes for a small, a medium, and a large PAM system.

NUMBER OF NETWORK ACCESSES
This section contains an analysis of the number of network accesses, and the size of messages, during certain scenarios. The scenarios are:

• A text file is viewed.

• Data in a binary file is interpreted and viewed.

• Two output files are compared.

• A project is executed.

• A project is exported.

Architecture A1/A2
To view a text file, the file is read across the network. In the case of small files, the whole file is read, while larger files (the limit is user-defined) are opened using an external program such as Word, which more intelligently reads only the parts that the user views. Therefore, one network access is needed, not necessarily for the whole file if it is big.

To view binary data, the binary file interpreter is invoked, which may read the whole file (depending on how it is implemented). Thus, there is one network access, potentially very large.

To compare two files, both files must be transferred across the network. The mechanism could, however, read parts and terminate the comparison when a difference is found. There are thus two potentially large file transfers.
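The idea of reading parts and terminating at the first difference can be sketched as follows. This is an illustration only, not the mechanism used in PAM: the function name and chunk size are assumptions, and the sketch relies on `read` returning full chunks until end of file, as it does for regular files and in-memory buffers.

```python
import io


def files_differ(a, b, chunk_size=64 * 1024):
    """Compare two binary file objects chunk by chunk.

    Returns (differ, bytes_read), where bytes_read is the number of bytes
    read from each file before the comparison could be decided.
    """
    bytes_read = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        bytes_read += len(chunk_a)
        if chunk_a != chunk_b:
            return True, bytes_read   # stop at the first differing chunk
        if not chunk_a:
            return False, bytes_read  # both files ended without a difference


if __name__ == "__main__":
    first = io.BytesIO(b"x" * 10 + b"A" + b"y" * 100)
    second = io.BytesIO(b"x" * 10 + b"B" + b"y" * 100)
    print(files_differ(first, second, chunk_size=8))
```

When the files differ early, only a small part of each has to cross the network; when they are identical, both are still read in full.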

To execute a project, each TCS reads its input data from the database and writes it to file. In architecture A1, we know that the TCS stores the file locally, while in architecture A2, the TCS might run on a different server than the file server. There is thus one network access per task to read all task data from the database, and in the case of A2, possibly one more.

When exporting a project, the files that are to be exported must be moved to the new location, possibly across the network.

Architecture B1
To view a text file, the SDS on the server computer where the file resides sends the whole file across the network. Thus, there is one network access, potentially very large.

To view binary data, the SDS on the server computer where the file resides interprets the binary file, and sends a subset of the data in it. Thus, there is one network access, probably very small compared to the file size.

To compare two files, the SDS running on the server where the files reside performs the actual comparison. One file must potentially be transferred across the network. There is thus at most one potentially large file transfer.

To execute a project, each TCS reads its input data from the database and writes it to file. We know that the TCS stores the file locally; there is thus one network access per task to read all task data from the database.




Architecture B2
To view a text file, the SDS first reads the whole file (possibly over the network, if it does not reside on the same server computer), and then sends the whole file across the network. There are thus potentially two network accesses, both potentially very large.

To view binary data, the SDS interprets the binary file, and sends a subset of the data in it. There are thus up to two network accesses, the first potentially very large, and the second probably very small compared to the file size.

To compare two files, the SDS might run on another server than the file server. Therefore, potentially both files must be transferred across the network.

To execute a project, each TCS reads its input data from the database and writes it to file. However, the TCS might run on a different server than the file server, and thus there is first one network access per task to read all task data from the database, and then possibly one more.


Summary
Table B.2 summarizes the number of large data transfers across the network.

A text file is viewed
• A1: One, not necessarily the whole file
• A2: One, not necessarily the whole file
• B1: One, transferring the whole file
• B2: Up to two transfers of the whole file

Binary data is interpreted and viewed
• A1: One, potentially large (depending on the implementation of the actual interpreter)
• A2: One, potentially large (depending on the implementation of the actual interpreter)
• B1: One, probably very small
• B2: Up to two transfers, the first potentially large, the second probably very small

Two files are compared
• A1: Two file transfers, potentially of the whole file
• A2: Two file transfers, potentially of the whole file
• B1: At most one file transfer, potentially the whole file
• B2: Potentially two transfers of the whole files

A project is executed
• A1: One per task (data from the database)
• A2: Potentially two large data transfers per task
• B1: One per task (data from the database)
• B2: Potentially two large data transfers per task

A project is exported
• All architectures: Possibly one network transfer per file

Table B.2: The number of large data transfers across the network in different scenarios.



APPENDIX C – MAINTAINABILITY ANALYSIS
This appendix contains a maintainability analysis for the PAM calculation environment.

The purpose of this analysis is to estimate how amenable to future changes the architectural proposals are. The analysis is made with the SAAM methodology, which uses change scenarios for estimation. Each scenario is a proposed extension of the functionality, and the affected components are listed. The analysis then gives an estimate of how easily the system can be modified in general. For a more detailed description of SAAM, please refer to [9] and [1].

The requirements in the requirements specification (see appendix E) are marked with “priority one” (required) and “priority two” (optional). The analysis uses the optional requirements as change scenarios. In addition, some other desired functionality that was never formulated as formal requirements is used.

The result of every scenario execution is documented below, first in text, then in Table C.1. The code modules referred to are the modules as described in appendix A. The optional requirements in the requirements specification are listed by their identity, followed by the other desired functionality.

REQUIREMENTS
Each requirement in the requirements specification with priority 2 is listed and analyzed in this section. All quotations are from appendix E, and all references to requirements are references to appendix E.

Requirement (AC 3-3.5)
“(AC 3-3.5) There shall be a possibility to add commands providing more complex functionality in a single command. Such commands can possibly contain harmful commands, but the possibility to edit such commands shall be restricted to certain users.”

The CS component, the database, and the client are affected. This functionality is included in the architectures, but need not be implemented in the first release. Four code modules need to be added when the functionality is to be added: “Script commands – HITMIN” and “Database connection – Add/modify tasks from HITMIN” in the CS (Figure A.3 in appendix A, page 2), a new module “Handle script commands” in the database, and “Handle script commands” in the client.

This scenario thus affects four modules (in three components) in each architectural proposal.

Requirement (AC 3-5)
“(AC 3-5) If a task has been successfully executed earlier, it shall be re-executed if and only if any of the following conditions hold. (Prio 2)

(AC 3-5.1) The task has been modified since the previous execution.

(AC 3-5.2) The task’s parent’s execution end time is later than the task’s last execution start time.

(AC 3-5.3) The task’s parent does not have state “Failed”. (See requirement AC 3-4)

(AC 3-5.4) The user explicitly wants it to be re-executed.”

This requirement could preferably be implemented by a check in the TCS for the conditions (AC 3-5.1) through (AC 3-5.3) when a task is about to be executed. Probably only the “Database connection” module is affected (see Figure A.4 in appendix A, page 3).

This scenario thus affects one module in each architectural proposal.

Requirement (AC 4-5)
“(AC 4-5) It shall be possible to execute a project folder. This means that the command “Execute” is issued to several (user selected) projects in a folder (recursively).”

This functionality resides basically in the user interface, which is not included in the architectural proposals. However, it should be simple to implement this functionality, since nothing hinders several projects from being executed at the same time.

Thus, this scenario is irrelevant for analyzing the maintainability of the architectural proposals.

Requirement (AC 4-6)
“(AC 4-6) It shall be possible to specify a maximum amount of CPU time that may be used by a task. The system shall automatically stop a calculation exceeding this time. (Prio 2)”

The database and other parts not included in the architectural proposals are affected, to the same extent in all architectures.

The TCS must then be augmented with functionality to monitor the CPU time spent by the calculation program process, in module “CP control” (see Figure A.4 in appendix A, page 3).
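The monitoring described above can be sketched as a polling loop. This is only an illustration under stated assumptions: the names `watch_cpu_time`, `read_cpu_seconds` and `kill` are hypothetical, and how the CPU time of the calculation program is actually obtained (an operating system call, LSF, or something else) is deliberately left to the caller.

```python
import time
from typing import Callable, Optional


def watch_cpu_time(
    read_cpu_seconds: Callable[[], Optional[float]],
    kill: Callable[[], None],
    max_cpu_seconds: float,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the CPU time of a running calculation and stop it when over the limit.

    read_cpu_seconds (hypothetical hook) returns the CPU seconds used so far,
    or None once the calculation has finished on its own.
    Returns True if the calculation had to be killed, False otherwise.
    """
    while True:
        used = read_cpu_seconds()
        if used is None:
            # The calculation finished before reaching the limit.
            return False
        if used > max_cpu_seconds:
            # Limit exceeded: stop the calculation, as (AC 4-6) asks for.
            kill()
            return True
        sleep(poll_interval)
```

In a real TCS this check would more likely be one more event in the existing event loop than a separate blocking loop; the blocking form is used here only to keep the sketch short.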


Requirement (AC 4-7)
“(AC 4-7) Output data views shall be continuously updated during execution. (This requirement requires that output files are readable during execution.)”

Only a timer is needed, triggering the “view graphical data” module in the client (for architectures A1 and A2, see Figure A.5 in appendix A, page 4; for architectures B1 and B2, see Figure A.6 in appendix A, page 5).


Requirement (AC 5-1.2)
“(AC 5-1.2) [For each program registered in PAM, it shall be possible to state where it can be run.] It shall be possible to select classes of computers. (Such as “any NT computer”, or “any computer in the TekBer cluster”.)”

The registration itself affects modules in the client outside the scope of the architectural proposals, to the same extent in all architectures.

The CS’s module executing the script commands “RUN/BGRUN” (see Figure A.3 in appendix A, page 2) needs to be extended. It would then be possible to start the TCS via an available load sharing system (see requirement (AC 8-2)) 18, or, if none is available, by randomly choosing the computer in the specified class on which the task should be executed.
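The selection step could look roughly like the sketch below. The function `choose_host`, the `classes` mapping and the `load_sharing_pick` hook are hypothetical names introduced for illustration; the source does not specify how computer classes are stored or how a load sharing system would be queried.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence


def choose_host(
    computer_class: str,
    classes: Dict[str, Sequence[str]],
    load_sharing_pick: Optional[Callable[[List[str]], str]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the computer on which a task of the given class should run."""
    hosts = list(classes[computer_class])
    if not hosts:
        raise ValueError(f"no computers registered in class {computer_class!r}")
    if load_sharing_pick is not None:
        # A load sharing system is available: let it decide among the allowed hosts.
        return load_sharing_pick(hosts)
    # No load sharing system: fall back to a random choice within the class.
    rng = rng or random.Random()
    return rng.choice(hosts)
```

Passing the allowed hosts to the load sharing hook mirrors the assumption that such a system accepts a list of permitted computers when the start command is issued.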


Requirement (AC 5-2.2)
“(AC 5-2.2) [A whole PAM system shall be able to run on a single computer.] A user shall be able to work temporarily without connection to the server computers or the database, and synchronize his work with the central database when reconnected to it.”

This requirement could be implemented in different ways. The calculation programs could e.g. be installed on the local computer, thus allowing project executions on the local computer. Alternatively, the user can only edit projects, and has to wait until he reconnects to the main PAM system before projects can be executed.

The requirement only affects components outside the scope of the architectural proposals (parts of the database and the client). It is rather a question of e.g. setting up local databases, synchronizing them with the central database, and handling inconsistencies. The architectural proposals do not restrict the inclusion of this functionality.

Thus, this scenario is irrelevant for analyzing the maintainability of the architectural proposals.

Requirement (AC 7-1.2)
“(AC 7-1.2) It shall be possible to compare ASCII text files.”

18 This assumes that the load sharing system supports a specification of a number of allowed computers when the start command is issued.

In architectures A1 and A2, this functionality needs to be implemented in the client. This functionality is included in the proposals, but need not be implemented in the first release. One code module thus needs to be added when the functionality is to be added, “file handling – compare files” in Figure A.5 (appendix A, page 4).

In architectures B1 and B2, this functionality needs to be implemented in both the client and the SDS. This functionality is included in the proposals, but need not be implemented in the first release. Two code modules need to be added when the functionality is to be added, “file handling – compare files” in the client (Figure A.6 in appendix A, page 5) and “file handling – compare files” in the SDS (Figure A.7 in appendix A, page 5).

This scenario thus affects one module in architectures A1 and A2, and two modules (in two components) in architectures B1 and B2.

Requirement (AC 7-3)
“(AC 7-3) Specified subsets of output data shall, at the user’s command, be stored in the database.”

In architectural proposals A1 and A2, only the client is affected. This functionality is included in the proposals, but need not be implemented in the first release. Two code modules are thus affected to include the new functionality, namely the module “Database connection – Save output data” in Figure A.5 (appendix A, page 4) and a new code module extracting the relevant data. Some more code, outside the scope of the architectural proposals, will probably also be affected.

In architectures B1 and B2, this functionality could be implemented in two ways, either by letting the client insert data into the database, read through the SDS, or by letting the SDS insert the data, on command from the client. The latter version is included in the proposals, but need not be implemented in the first release. In the first case, only the client needs to be extended, in module “Database connection” (Figure A.6 in appendix A, page 5), together with the addition of a code module extracting the relevant data. In the latter case, the module “Database connection – Save output data” (Figure A.7 in appendix A, page 5) and a code module extracting the relevant data need to be added.

This scenario thus affects two modules (in one component) in architectures A1 and A2, and two modules (in either the client or the SDS) in architectures B1 and B2.

Requirement (AC 7-4)
“(AC 7-4) The user shall be able to remove any (non-system) file.”

In architectures A1 and A2, the client module “File handling” (see Figure A.5 in appendix A, page 4) needs to be extended with the possibility to delete files.

In architectures B1 and B2, the module “File handling” in the client (Figure A.6 in appendix A, page 5) and the module “File handling” in the SDS (Figure A.7 in appendix A, page 5) need to be extended with the possibility to delete files.




Requirement (AC 7-5)
“(AC 7-5) There shall be a possibility to export all files associated with a project (or a subset). Access rights must be correct, ensuring that the user performing the export has rights to it.”

In architectures A1 and A2, this functionality needs to be implemented in the client. This functionality is included in the proposals (but need not be implemented in the first release). One code module thus needs to be added when the functionality is to be added, “file handling – export project” in Figure A.5 (appendix A, page 4).

In architectures B1 and B2, this functionality needs to be implemented in both the client and the SDS. This functionality is included in the proposals (but need not be implemented in the first release). Two code modules need to be added when the functionality is to be added, “File handling – Export project” in the client (Figure A.6 in appendix A, page 5) and “File handling – Export project” in the SDS (Figure A.7 in appendix A, page 5).


Requirement (AC 7-6)
“(AC 7-6) To process a task’s input and output data files, it shall be possible to run additional programs on the server computer where the task is run (to e.g. convert a binary file to ASCII format).

(AC 7-6.1) After a project execution, at a user’s command.

(AC 7-6.2) Before and after the execution of the calculation program, as a part of the task specification.”

Available programs must be registered in PAM. The registration itself affects modules in the client outside the scope of the architectural proposals, to the same extent in all architectures.

In all architectures, the client must be extended to let the users issue the command described by requirement (AC 7-6.1) (outside the scope of the architectural proposals). To execute the program, it should be sufficient to issue the registered command line.

Executing the program at a user’s command (requirement (AC 7-6.1)) would require the addition of one code module: in the SB for architectures A1 and A2, and in the SDS for architectures B1 and B2.

To fulfill requirement (AC 7-6.2), the user interface and database parts handling tasks must be extended with functionality to let the users specify the programs that should be executed before and after the execution of the calculation program, but these issues are outside the scope of the architectural proposals. The TCS should then perform these actions, so it needs to be extended with some code in maybe one module (the same for all architectures).

This scenario thus affects two modules (in two components, one of them the SB) in architectures A1 and A2, and two modules (in two components) in architectures B1 and B2.

Requirements (AC 7-8) and (AC 7-8.2)
“(AC 7-8) It shall be possible to archive project data (e.g. through the HSM system, on a compact disc, or in the database) at the user’s command 19.

…

(AC 7-8.2) It shall be possible to restore an archived project.”

It is not obvious how a general archiving mechanism should be implemented. One alternative is to copy all relevant files for a project to a specific directory, maybe as a compressed file, and let the user e.g. run the CD writing program. In the case of HSM, it is enough to store the project in a specific directory to archive it on magnetic tape. The user must then enter the text describing where the project is archived, and approve the deletion of the files (one of the reasons to include archiving functionality is to save disk space). To restore an archived project, only copying of files from the archive to PAM’s file tree is required.

The export mechanism could be used for archiving, as described in the analysis of requirement (AC 7-5) above, together with the already existing file handling module “Remove files (on user’s command)”. To restore archived data, a module “Copy files” needs to be implemented (in architectures A1 and A2 only in the client, in B1 and B2 in both the client and the SDS). To control all this, a new module “Archive project” is needed. (The export mechanism affects one module in architectures A1 and A2, and two modules (in two components) in architectures B1 and B2. The already existing file handling affects one code module in architectures A1 and A2, and two modules (in two components) in architectures B1 and B2. See section “Requirement (AC 7-5)” above.)

This scenario thus affects 2 modules (in 1 component) in architectures A1 and A2, and 4 modules (in 2 components) in architectures B1 and B2.

Requirement (AC 7-8.1)
“(AC 7-8.1) [It shall be possible to archive project data at the user’s command.] PAM shall notice a group of people 20 when a project is not touched after a user-specified time. The default time shall be three months.”

19 It is regarded as too insecure to archive automatically, since it must be ensured that the files are actually stored on non-volatile media, not only copied to another directory, before the original files are deleted. Otherwise the files may be lost in case of e.g. disk failure.
20 It is not specified which users, but it could be e.g. the user responsible for the project or system administrators.

Notification of users can be implemented by introducing a message system where the client searches for info in the database during startup 21. This would probably require one new module in the client and one in the database.

This scenario thus affects 2 modules (in 2 components) in each architecture.

Requirement (AC 8-1.1)
“(AC 8-1.1) It shall be able to find all server computer processes started by PAM within PAM.”

PAM should store enough information in the database every time a process is started (not including the SB) to be able to handle suspicious processes later. The information needed might be process identity and computer identity. It is then possible to find all leftover processes by searching with the correct criteria.

In all architectures, the CS process could store information about itself during startup, associated with the started project (affecting one code module, “Database connection” (Figure A.4 in appendix A, page 3), and requiring the addition of one database module “Insert project/task process info”). In architectures B1 and B2, information about every SDS must also be stored. This could similarly be done during startup of the SDS, thus affecting one code module of the SDS, “Database connection” (Figure A.7 in appendix A, page 5), and also requiring one new module of the database, “Insert SDS process info”.

This scenario thus affects two modules (in two components) in architectures A1 and A2, and four modules (in three components) in architectures B1 and B2.
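As an illustration of the bookkeeping this scenario calls for, the sketch below records process identity and computer identity in a database table and searches it afterwards. It uses SQLite only to stay self-contained; the table `process_info`, its columns and all function names are assumptions made for the example and are not part of the PAM database design.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a database holding one row per process started by PAM."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS process_info ("
        " host TEXT NOT NULL,"
        " pid INTEGER NOT NULL,"
        " kind TEXT NOT NULL,"          # e.g. 'CS' or 'SDS'
        " project_key INTEGER,"         # None for processes not tied to a project
        " PRIMARY KEY (host, pid))"
    )
    return db


def register_process(db, host, pid, kind, project_key=None):
    """Called by a process during its own startup."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO process_info VALUES (?, ?, ?, ?)",
            (host, pid, kind, project_key),
        )


def unregister_process(db, host, pid):
    """Called when a process terminates normally."""
    with db:
        db.execute("DELETE FROM process_info WHERE host = ? AND pid = ?", (host, pid))


def find_processes(db, host=None):
    """Return all registered processes, optionally limited to one computer."""
    if host is None:
        cur = db.execute(
            "SELECT host, pid, kind, project_key FROM process_info ORDER BY host, pid"
        )
    else:
        cur = db.execute(
            "SELECT host, pid, kind, project_key FROM process_info"
            " WHERE host = ? ORDER BY pid",
            (host,),
        )
    return cur.fetchall()
```

Rows that remain after a project has finished are then the candidates for leftover processes.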

Requirement (AC 8-1.2)
“(AC 8-1.2) PAM shall be able to kill any such process at a user’s command 22.”

Given enough info to be able to kill a particular process on a particular computer, this requires a new code module in the client, “Kill process” (for all architectures). In architectures A1 and A2, the SB would have to execute the kill command, implying that a new code module (and a new message) is introduced. In architectures B1 and B2, the same would apply to the SDS.

This scenario thus affects two modules (in two components, one of them the SB) in architectures A1 and A2, and two modules (in two components) in architectures B1 and B2.

OTHER DESIRED FUNCTIONALITY
The following sections contain functionality desired in future releases of PAM. These are not detailed enough to be formal requirements.

Sequence of events
It shall be possible to apply a specified sequence of events to output data.

21 This mechanism could be used for other purposes, e.g. to inform users about planned system disturbances.
22 Clarification: PAM shall never try to automatically find and kill suspicious processes.

This functionality could be implemented in either of at least two ways. The sequence of events could be calculated once, as soon as the task has been calculated, or be re-calculated each time it is viewed.

With the first approach, the TCS component would be modified with an additional code module, which is executed after the calculation is finished (the same with all architectures). With the second approach, a new code module would be implemented in the client, applying a sequence of events to the actual output data (read via the existing binary files’ interpreter). With both approaches, the database would be affected, in approximately one module.

This scenario thus affects two modules (in two components) in each architectural proposal.

Revision of project
When revising a project, tasks shall not be copied until they are changed; instead links to the original tasks shall be used. This both saves storage space and may make it easier to get an overview of the differences between two project revisions.

The client and the database would be equally affected in all architectures. This functionality is however out of the scope of the architectural proposals, so this scenario is irrelevant for analyzing the maintainability of the architectural proposals.

Search for tasks
It shall be possible to find tasks recursively by stating some search criteria (e.g. all tasks executed during 2000 using the model “X” in the current project folder).

The client would be equally affected in all architectures. This functionality is however out of the scope of the architectural proposals, so this scenario is irrelevant for analyzing the maintainability of the architectural proposals.

Restart files
PAM shall support the use of restart files for calculation programs supporting them. This would make certain calculations run much faster. PAM must implement rules that restrict their use, to eliminate the possibility of e.g. using a restart file inconsistent with the actual input data used in a task.

The client and the database would be equally affected in all architectures, however out of the scope of the architectural proposals. To actually use restart files during a calculation would require extending the TCS's “Database connection” and “Calculation program control” modules (Figure A.4 in appendix A, page 3).

This scenario thus affects two modules (in one component) in each architectural proposal.

Interactive programs
PAM shall support interactive programs, by stepping, pausing, and giving commands such as “insert control rods” interactively. BISON has such possibilities through its BISCOM interface.

This would require extensive changes to the client and the TCS. It would also require some means of communication between them, which could be implemented either by socket communication (increasing the complexity enormously) or by sending the commands through a file.

This scenario is thus hard to quantify, but affects at least two components in each architectural proposal.

ANALYSIS SUMMARY
Table C.1 summarizes the maintainability analysis. Table C.2 and Table C.3 summarize statistics about how the architectures are affected by the scenarios.

Scenario                               A1/A2                        B1/B2
Requirement (AC 3-3.5)                 4 modules in 3 components    4 modules in 3 components
Requirement (AC 3-5)                   1 module                     1 module
Requirement (AC 4-5)                   Irrelevant requirement
Requirement (AC 4-6)                   1 module                     1 module
Requirement (AC 4-7)                   1 module                     1 module
Requirement (AC 5-1.2)                 1 module                     1 module
Requirement (AC 5-2.2)                 Irrelevant requirement
Requirement (AC 7-1.2)                 1 module                     2 modules in 2 components
Requirement (AC 7-3)                   2 modules in 1 component     2 modules in 1 component
Requirement (AC 7-4)                   1 module                     2 modules in 2 components
Requirement (AC 7-5)                   1 module                     2 modules in 2 components
Requirement (AC 7-6)                   2 modules in 2 components*   2 modules in 2 components
Requirements (AC 7-8) and (AC 7-8.2)   2 modules in 1 component     4 modules in 2 components
Requirement (AC 7-8.1)                 2 modules in 2 components    2 modules in 2 components
Requirement (AC 8-1.1)                 2 modules in 2 components    4 modules in 3 components
Requirement (AC 8-1.2)                 2 modules in 2 components*   2 modules in 2 components
Sequence of events                     2 modules in 2 components    2 modules in 2 components
Revision of project                    Irrelevant requirement
Search for tasks                       Irrelevant requirement
Restart files                          2 modules in 1 component     2 modules in 1 component
Interactive programs                   At least 2 components        At least 2 components

Table C.1: The number of modules and components affected in each architecture by each scenario execution. A “*” means that the SB is affected, which is undesirable because of its central position. Scenario execution results in italics are not included in the statistics summary.

Statistic                                            A1/A2                              B1/B2
Total number of modules and components affected      27 modules in 23 components        34 modules in 28 components
Average number of modules and components affected    1.7 modules in 1.4 components      2.1 modules in 1.8 components
Worst number of modules and components affected      4 modules in 3 components (once)   4 modules in 3 components (twice)
Number of scenarios affecting at least 2 components  6                                  10
Number of scenarios affecting the SB                 2                                  0

Table C.2: Statistics from the scenario executions. The average measure is calculated as the number of modules or components affected divided by the number of requirements. Scenario execution results in italics are excluded from the statistics.
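The figures in Table C.2 can be recomputed from the per-scenario results. The sketch below is an illustration, not part of the original analysis: the per-scenario counts are taken from the scenario summaries in this appendix, the counts for (AC 4-6) and (AC 4-7) are an interpretation of the text (one TCS module and one client module respectively), and “Interactive programs” is left out because it is not quantified. The names `SCENARIOS` and `summarize` are made up for the example.

```python
# (modules, components) affected per scenario, as (A1/A2, B1/B2).
# Irrelevant scenarios and "Interactive programs" are excluded.
# The rows for AC 4-6 and AC 4-7 are an interpretation of the scenario text.
SCENARIOS = {
    "AC 3-3.5": ((4, 3), (4, 3)),
    "AC 3-5": ((1, 1), (1, 1)),
    "AC 4-6": ((1, 1), (1, 1)),
    "AC 4-7": ((1, 1), (1, 1)),
    "AC 5-1.2": ((1, 1), (1, 1)),
    "AC 7-1.2": ((1, 1), (2, 2)),
    "AC 7-3": ((2, 1), (2, 1)),
    "AC 7-4": ((1, 1), (2, 2)),
    "AC 7-5": ((1, 1), (2, 2)),
    "AC 7-6": ((2, 2), (2, 2)),
    "AC 7-8 and AC 7-8.2": ((2, 1), (4, 2)),
    "AC 7-8.1": ((2, 2), (2, 2)),
    "AC 8-1.1": ((2, 2), (4, 3)),
    "AC 8-1.2": ((2, 2), (2, 2)),
    "Sequence of events": ((2, 2), (2, 2)),
    "Restart files": ((2, 1), (2, 1)),
}


def summarize(index):
    """Summarize one architecture family: index 0 is A1/A2, index 1 is B1/B2."""
    rows = [counts[index] for counts in SCENARIOS.values()]
    modules = sum(m for m, _ in rows)
    components = sum(c for _, c in rows)
    n = len(rows)
    return {
        "modules": modules,
        "components": components,
        "avg_modules": round(modules / n, 1),
        "avg_components": round(components / n, 1),
        "multi_component": sum(1 for _, c in rows if c >= 2),
    }
```

With these 16 scenarios, `summarize(0)` gives 27 modules in 23 components and `summarize(1)` gives 34 modules in 28 components, which agrees with the totals and averages in Table C.2.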

Component  A1/A2                                   B1/B2
Database   Affected by 4 scenarios (same in all architectures):
           (AC 3-3.5), (AC 7-8.1), (AC 8-1.1), SOE
SB         Affected by 2 scenarios:                Affected by 0 scenarios
           (AC 7-6), (AC 8-1.2)
CS         Affected by 3 scenarios (same in all architectures):
           (AC 3-3.5), (AC 5-1.2), (AC 8-1.1)
TCS        Affected by 3 or 4 scenarios (same in all architectures):
           (AC 3-5), (AC 4-6), SOE, Restart files
Client     Affected by 10 or 11 scenarios:         Affected by 9 to 11 scenarios:
           (AC 3-3.5), (AC 4-7), (AC 7-1.2),       (AC 3-3.5), (AC 4-7), (AC 7-1.2),
           (AC 7-3), (AC 7-4), (AC 7-5),           (AC 7-3), (AC 7-4), (AC 7-5),
           (AC 7-6), (AC 7-8) and (AC 7-8.2),      (AC 7-6), (AC 7-8) and (AC 7-8.2),
           (AC 7-8.1), (AC 8-1.2), SOE             (AC 7-8.1), (AC 8-1.2), SOE
SDS        Irrelevant                              Affected by 7 or 8 scenarios:
                                                   (AC 7-1.2), (AC 7-3), (AC 7-4),
                                                   (AC 7-5), (AC 7-6),
                                                   (AC 7-8) and (AC 7-8.2), (AC 7-8.1),
                                                   (AC 8-1.2)

Table C.3: The number of scenarios affecting each component. The variation depends on how scenarios “Requirement (AC 7-3)” (page 4 in this appendix) and “SOE” (“Sequence of events”, page 8 in this appendix) are implemented.



APPENDIX D – PSEUDOCODE AND CODE ANALYSIS
This section contains pseudocode skeletons for the TCS, CS, SB, and SDS components.

The purpose of this pseudocode is to show how these components could be implemented to achieve the desired functionality, their interaction in particular. Each skeleton is followed by a brief analysis, to show that certain properties hold and to emphasize what the implementation requires from the environment.

The internal architecture of the components has not been specified in this document. This pseudocode sketches one possible internal architecture, where many of the components use a sort of event loop to trigger actions. The code modules specified in appendix A are called from this event loop.

It should be noted that the pseudocode is not complete. Some lines are e.g. marked with question marks, and not every requirement is fulfilled.

TCS
Pseudocode

TCS(CSPort, CSHost, DbConnectData, TaskKey)
1.  Socket.Connect(CSHost, CSPort)
2.  Socket.Send(’Executing’)
3.  Db ← DbConnect(dbConnectData)
4.  If UpToDate(Db, TaskKey)  ➢ No need to re-execute
5.      Exit
6.  If DbUpdate(Db, TaskKey, ’Executing’) fails  ➢ Task is executing
7.      Exit
8.  If Not DirectoryExist(task.dirname)
9.      CreateDirectory(task.dirname)
10. FileRemove(task.dirname & ’\*’)
11. Env ← GetTaskEnv(Db, TaskKey)
12. WriteToFiles(Env)
13. (Get parent files and store locally if needed)
14. PID ← Execute(Task.Command, Background)
15. WaitForEvent:
16.     Socket.Closed  ➢ Error in CS communication
17.         Kill PID
18.         DbUpdate(Db, TaskKey, ’Failed’)
19.         Exit
20.     Socket.Receive(Message)
21.         Switch(Message)
22.             Case Pause  ➢ Pause execution
23.                 DbUpdate(Db, TaskKey, ’Pause’)
24.                 Suspend PID
25.             Case Stop  ➢ Stop execution
26.                 Kill PID
27.                 DbUpdate(Db, TaskKey, ’Stopped’)
28.                 Exit
29.             Case Resume  ➢ Resume execution
30.                 DbUpdate(Db, TaskKey, ’Executing’)
31.                 Resume PID
32.     ExecFinished(ExitValue)
33.         DbUpdate(Db, TaskKey, ’Executed(ExitValue)’)
34.         Socket.Send(’Executed’)
35.         Exit



UpToDate(Db, TaskKey)
1.  TaskStartTime ← GetTaskStartTime(Db, TaskKey)
2.  ParentEndTime ← GetParentEndTime(Db, TaskKey)
3.  ProjectStartTime ← GetProjectStartTime(Db, TaskKey)
4.  ProjectModificationTime ← GetProjectModificationTime(Db, TaskKey)
5.  TaskInputTime ← GetTaskInputLastModified(Db, TaskKey)
6.  If TaskStartTime < ProjectStartTime
7.      ➢ Current project execution started after last task execution
8.      Return False
9.  If TaskStartTime < TaskInputTime
10.     ➢ Task input has been modified since last task execution
11.     Return False
12. If TaskStartTime < ParentEndTime
13.     ➢ Task’s parent executed since last task execution
14.     Return False
15. If TaskStartTime < ProjectModificationTime
16.     ➢ Project modified (e.g. new tasks) since last task execution
17.     Return False
18. Return True
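For readers who prefer runnable code, the UpToDate check can be sketched as follows. The `TaskTimes` record is a hypothetical stand-in for the five timestamps the pseudocode fetches from the database; the database access itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskTimes:
    """The five timestamps read from the database in the pseudocode (names assumed)."""
    task_start: datetime            # start of the task's last execution
    parent_end: datetime            # end of the parent's last execution
    project_start: datetime         # start of the current project execution
    project_modified: datetime      # last modification of the project
    task_input_modified: datetime   # last modification of the task input


def up_to_date(t: TaskTimes) -> bool:
    """Return True when the last execution of the task is still valid.

    The task must be re-executed as soon as any of the listed events
    happened after the task's last execution started.
    """
    later_events = (
        t.project_start,
        t.task_input_modified,
        t.parent_end,
        t.project_modified,
    )
    return all(t.task_start >= event for event in later_events)
```

The four comparisons of the pseudocode collapse into a single `all(...)` because each of them returns False on the same kind of condition.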

DbUpdate(Db, TaskKey, State)
1.  Switch(State)
2.      Case ’Executing’, ’Stopped’, ’Paused’
3.          WriteFile(’TaskState’, State)
4.          If (UPDATE I_TblTask SET TaskState = State WHERE pk = TaskKey) fails
5.              Kill PID
6.              Exit
7.          Else
8.              DeleteFile(’TaskState’)
9.      Case ’Executed(ExitValue)’
10.         WriteFile(’TaskState’, State)
11.         If (UPDATE I_TblTask SET TaskState = State, ErrorValue = ExitValue
                WHERE pk = TaskKey) succeeds
12.             DeleteFile(’TaskState’)

Important variables and communication
1. The TCS communicates with its CS via sockets.

2. If the file TaskState exists, it contains the correct task state (it exists because the database was unavailable). The client should therefore check for the existence of this file, update the database immediately if it exists, and delete the file.
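The TaskState file acts as a small write-ahead note: the state is written locally first, and the file is removed only once the database holds the same value. A minimal sketch of both sides of this protocol follows, using SQLite so that it is self-contained. The table and column names are taken from the pseudocode above; the function names `record_state` and `recover_state` are assumptions made for the example.

```python
import os
import sqlite3


def record_state(db, task_key, state, state_file):
    """TCS side: write the state locally, then try to store it in the database.

    Returns True if the database was updated. On failure the file is left
    behind so that the correct state can be recovered later.
    """
    with open(state_file, "w", encoding="utf-8") as f:
        f.write(state)
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "UPDATE I_TblTask SET TaskState = ? WHERE pk = ?",
                (state, task_key),
            )
    except sqlite3.Error:
        return False
    os.remove(state_file)
    return True


def recover_state(db, task_key, state_file):
    """Client side: if a state file exists, push it to the database and delete it."""
    if not os.path.exists(state_file):
        return False
    with open(state_file, encoding="utf-8") as f:
        state = f.read()
    with db:
        db.execute(
            "UPDATE I_TblTask SET TaskState = ? WHERE pk = ?",
            (state, task_key),
        )
    os.remove(state_file)
    return True
```

The file is deleted only after the update has been committed, so a crash between the two steps leaves a redundant file rather than a lost state.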

Analysis
The following functionality is required from the operating system:

1. The socket is closed if the CS disappears (in case of e.g. CS termination, network failure etc.).

2. CP terminates if TCS terminates (behavior with LSF remains to be verified).

3. The TCS is guaranteed to be notified when the CP has terminated (ExecFinished, line 32).

The calculation program (CP) is never left running when the TCS is gone, since:

1. The calculation terminates after a certain amount of time in normal cases (line 32).



2. If TCS terminates, the calculation terminates according to prerequisite 2.

3. TCS terminates the calculation explicitly (with Kill) in all other cases:

a CS disappears (socket is closed, line 16)

b CS issues the Stop-command (line 25)

A TCS is never left running when its CS has disappeared (in spite of the infinite waiting loop), since:

1. If the CP terminates, the TCS reports to the CS and terminates, according to prerequisite 3 (lines 32, 34).

2. If the CS unexpectedly disappears, the TCS terminates the CP and itself (lines 16-19).

There may be erroneous information about a task in the database if the TCS unexpectedly terminates between two commands (that should be atomic):

1. Between Kill and DbUpdate (lines 17-18 and 26-27). In the database the task is marked “Executing” where it should say “Failed” or “Stopped”. (Please note the importance of the order between the commands; we would rather have a dead CP that is believed to be alive than the opposite, which would load the system with processes no one has control of.)

2. Between the event ExecFinished and DbUpdate (lines 32-33). In the database, the task is marked “Executing” when it should be “Executed”.

3. Between DbUpdate and Resume (lines 30-31). In the database, the task is marked “Suspended” when it should be “Executing”.

4. Between DbUpdate and Suspend (lines 23-24) the CP is also terminated, according to prerequisite 2. In the database, the task is marked “Suspended” when it should be “Executed” or “Failed”.

In addition, there may be erroneous information about a task in the database if the database is lost. The CP may e.g. have executed successfully, but the TCS is not able to update the task’s state from Executing to Executed.

Comment
The commands Execute, Suspend, Resume, and Kill run a script that can be tailor-made for a specific installation of PAM and use e.g. LSF.



CS
Pseudocode

CS(DbConnectData, ProjectKey)
1.  Db ← DbConnect(dbConnectData)
2.  ➢ Test if project is executing
3.  If DbUpdate(Db, ProjectKey, ’Executing’) fails
4.      ➢ Write command to existing CS
5.      WriteFile(project.DirName & ’\CScommand’, ’Execute’)
6.      Exit
7.  If Not DirectoryExist(Project.DirName)
8.      CreateDirectory(Project.DirName)
9.  Script ← DbGetProjectScript(Db, ProjectKey)
10. ScriptState ← ’Executing’
11. Evaluate(Script & ’ScriptState ← ’Executed’’)
12. WaitForEvent:
13.     Socket.Closed(TCSid)  ➢ TCS gone
14.         If TCS-List(TCSid) exists  ➢ Unexpected
15.             StopExecution()
16.     Socket.Receive(TCSid, Message)
17.         Switch(Message)
18.             Case ’Executed’
19.                 TCS ← TCS-List(TCSid)
20.                 TCS-List.Remove(TCSid)
21.                 TCS.Semaphore.Signal()
22.                 If ScriptState = ’Executed’ And TCS-List Is Empty
23.                     Exit
24.             Case ’Executing’
25.                 ➢ TCS responds, turn off time out
26.                 TCS-List(TCSid).Timer.Timer.State = Off
27.     FileEvent(’CScommand’, Contents)
28.         Switch(Contents)
29.             Case ’Pause’  ➢ Pause execution
30.                 For each TCS in TCS-List
31.                     SendMessage(TCS, ’Suspend’)
32.                 DbUpdate(Db, ProjectKey, ’Suspended’)
33.             Case ’Stop’  ➢ Stop execution
34.                 ScriptState ← ’Stop’
35.                 For each TCS in TCS-List
36.                     TCS.Semaphore.Signal()
37.                     SendMessage(TCS, ’Stop’)
38.                 DbUpdate(Db, ProjectKey, ’Stopped’)
39.                 Exit
40.             Case ’Execute’  ➢ Resume execution
41.                 DbUpdate(Db, ProjectKey, ’Executing’)
42.                 For each TCS in TCS-List
43.                     SendMessage(TCS, ’Resume’)
44.     Timer.Alarm
45.         ➢ TCS have not responded in time; stop execution
46.         StopExecution()

StopExecution()
1.  ScriptState ← ’Stop’
2.  DbUpdate(Db, ProjectKey, ’Failed’)
3.  For each TCS in TCS-list
4.      SendMessage(TCS.Socket, ’Stop’)
5.      TCS.Semaphore.Signal()
6.  Exit



Script Commands Pseudocode

RUN(TaskName)
1.  TaskKey ← DbGetTaskKey(Db, TaskName)
2.  TCS(Socket, DbConnectData, TaskKey)
3.  ➢ Associate a new semaphore with this TCS
4.  Semaphore ← CreateSemaphore(TCS)
5.  Timer.AlarmTime = TimeOutSeconds
6.  TCS-List.Add(TCS, Socket, Semaphore, Timer)
7.  Semaphore.Wait  ➢ Wait for task to be released
8.  If ScriptState = ’Stop’
9.      Exit

BGRUN(TaskName)
1.  TaskKey ← DbGetTaskKey(Db, TaskName)
2.  TCS(Socket, DbConnectData, TaskKey)
3.  Timer.AlarmTime = TimeOutSeconds
4.  TCS-List.Add(TCS, Socket, None, Timer)  ➢ No semaphore needed

Important variables and communication
1. The CS communicates with its TCSs via sockets.

2. The variable ScriptState is used as a (one-way) communication variable, telling the script when it is to terminate as soon as possible (after the Stop command or in case of an error). It is set on lines 10, 11 and 34 in the CS procedure, and is tested on line 8 in the RUN procedure.

3. The variable TCS-List contains all relevant information about all currently executing TCSs (their socket, and a semaphore). Elements are added when the commands RUN and BGRUN are evaluated, and are removed when a TCS reports that it has finished execution (lines 18-20).

4. CS receives its commands through the file CScommand, which the CS monitors for file events.

5. The Evaluate command on line 11 should use a “safe interpreter”, and thus only have access to harmless commands such as those for control flow and variable manipulation, and the commands RUN and BGRUN.
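The idea behind item 5 is that a project script may only call commands from an explicit whitelist. The sketch below illustrates the principle with a deliberately tiny, line-oriented script format; it is not the interpreter PAM would use, it omits control flow and variables, and the function name `evaluate_script` is an assumption.

```python
import shlex
from typing import Callable, Dict


def evaluate_script(script: str, commands: Dict[str, Callable[..., None]]) -> None:
    """Run a line-oriented script, allowing only the whitelisted commands.

    Each non-empty line is "<COMMAND> <arg> ...". Anything after '#' is a comment.
    A command that is not in the whitelist aborts the evaluation.
    """
    for line_no, line in enumerate(script.splitlines(), start=1):
        words = shlex.split(line, comments=True)
        if not words:
            continue  # blank line or comment only
        name, args = words[0], words[1:]
        if name not in commands:
            raise PermissionError(f"line {line_no}: command {name!r} is not allowed")
        commands[name](*args)
```

A script can then reach RUN and BGRUN and nothing else, because the interpreter never looks a command up anywhere but in the dictionary it was given.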

Overview of the code
After initialization, an interpreter is started that interprets the project script (as a thread or a process). From this interpreter the functions RUN and BGRUN are called, and a TCS is started. The TCS connects to the CS via sockets to allow for determining when it finishes, and the next command in the script is evaluated (synchronization is achieved with a semaphore). The CS is notified of file events for the file CScommand, and is thus given the commands from the user, which the CS sends to all its TCSs.

Analysis
Prerequisites from the operating system:

1. The socket is closed if the TCS disappears (TCS terminates unexpectedly, network failure etc.).

2. The CS does not depend on the process/thread that created it still running (the SDS, in case of architecture B1 or B2).

3. There are semaphores available.

4. There are timers available.

5. The CS receives events when the file CScommand is changed.

Neither the TCS nor its CP is left running when the CS has terminated (in spite of the infinite waiting loop), according to the analysis for the TCS above.

There may be erroneous information about a project in the database if the CS unexpectedly terminates between two commands (that should be atomic):

1. Between For each TCS and DbUpdate (lines 30-32 and 35-38). (Please note the importance of the order between the commands; we would rather have a dead CP that is believed to be alive than the opposite, which would load the system with processes no one has control of.)

2. Between DbUpdate and Resume (lines 41-43). As for the TCS, the ordering of the commands is important; we would rather have an executing TCS and CP that are believed to be suspended than the opposite (the CP and TCS are judged to terminate spontaneously in all normal cases).

Comments
The semaphores stored per TCS in TCS-List cannot yield deadlock. This is not obvious, but a closer analysis shows it. To begin with, at least two semaphores must be involved to produce deadlock. There is only one per TCS, and the two places where a semaphore is used (line 5 in CS and line 7 in RUN) each concern one well-defined TCS.
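The synchronization pattern can be illustrated with threads: the script thread blocks in RUN on a semaphore that belongs to exactly one task, and only the event loop ever signals it. The sketch below is a simplified model under that assumption; the names `TaskHandle`, `run` and `on_executed` are made up for the example, and sockets, timers and the database are left out.

```python
import threading


class TaskHandle:
    """One entry of the task list: a name and the semaphore the script waits on."""

    def __init__(self, name):
        self.name = name
        self.semaphore = threading.Semaphore(0)  # starts at zero: waiting blocks


def run(task_list, name, log):
    """Script thread (RUN): register the task and wait until it is released."""
    handle = TaskHandle(name)
    task_list[name] = handle
    handle.semaphore.acquire()  # blocks until the event loop signals
    log.append(f"{name} released")


def on_executed(task_list, name):
    """Event loop: the task reported 'Executed'; remove it and release the script."""
    handle = task_list.pop(name)
    handle.semaphore.release()
```

Since a thread never holds one semaphore while waiting for another, no circular wait can arise, which is the informal argument given above.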

There is a timer that stops the project execution if a TCS does not start correctly (lines44-46).

It should be taken into consideration whether only these processes should update theproject and task states. A careful analysis should be made to ensure that no harmfulsequences of events could occur.

SB
Pseudocode

SB(Port, DbConnectData, SDSTimeOutSeconds)
    Db ← DbConnect(DbConnectData)
    ▷ Restart each project that was executing on this computer
    RestartProjects(Db, DbConnectData)
    Socket ← OpenPort(Port)
    WaitForEvent:
        Socket.Receive(Message)
            Switch(Message)
                Case ClientLogin(ClientInfo)
                    SDSPort ← GetFreePortNo()
                    SDS(DbConnectData, SDSPort, ClientInfo, SDSTimeOutSeconds)
                Case SystemAction(RestartProjects)
                    RestartProjects(Db, DbConnectData)
                Case SystemAction(???)
                    Other System update, e.g. new DbConnectData, SDSTimeOutSeconds
                Case Else
                    Log('Illegal message:' & Message)

RestartProjects(Db, DbConnectData)
    ProjList ← DbGetRestartProjects(Db, GetHostName())
    For Each ProjectKey in ProjList
        CS(DbConnectData, ProjectKey)

DbGetRestartProjects(Db, HostName)
    SELECT pk FROM I_TblProject WHERE HostName = ' & HostName & ' AND State IN ('Executing', 'Failed');
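The restart query above splices HostName into the SQL text. As a hedged illustration, the same lookup can be written with a bind parameter, which avoids problems with quote characters in the host name. The sketch below uses Python's `sqlite3` with an in-memory database purely for demonstration (PAM itself uses an Oracle database); the table and column names are taken from the pseudocode, while the helper `make_demo_db`, the sample rows and the `ORDER BY` clause are assumptions.

```python
import sqlite3


def db_get_restart_projects(db, host_name):
    """Return the keys of projects on host_name in state Executing or Failed."""
    rows = db.execute(
        "SELECT pk FROM I_TblProject "
        "WHERE HostName = ? AND State IN ('Executing', 'Failed') "
        "ORDER BY pk",  # ordering added only to make the demo deterministic
        (host_name,),
    )
    return [pk for (pk,) in rows]


def make_demo_db():
    """Build a small in-memory table with hypothetical sample projects."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE I_TblProject "
        "(pk INTEGER PRIMARY KEY, HostName TEXT, State TEXT)"
    )
    db.executemany(
        "INSERT INTO I_TblProject VALUES (?, ?, ?)",
        [
            (1, "hostA", "Executing"),
            (2, "hostA", "Paused"),
            (3, "hostB", "Failed"),
            (4, "hostA", "Failed"),
        ],
    )
    return db
```

With the sample data, only projects 1 and 4 would be handed to CS when the SB on hostA restarts.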


Analysis
Prerequisites from the operating system:

1. The SB is always restarted when the host computer is restarted.

2. The SB is not dependent on any child processes or threads it starts.

The SB always executes, since:

1. It is independent of when child processes terminate, according to prerequisite 2.

2. There is no Exit command in the SB code.

3. It always restarts in case of a host computer crash, according to prerequisite 1.

4. A user with sufficient rights can, however, terminate it with operating system commands.

5. If written in tcl, the SB is probably vulnerable to messages being ill-formed tcl lists.

Comment
The client is supposed to pass on a request only after having checked the user’s rights to execute the system commands.



SDS
Pseudocode

SDS(DbConnectData???, Port, ClientInfo, TimeOutSeconds)
1.  Socket ← OpenPort(Port)
2.  Timer.AlarmTime = TimeOutSeconds
3.  WaitForEvent:
4.      Socket.Closed    ▷ Client exits, or error in client communication
5.          Exit
6.      Socket.Receive(Message)
7.          ▷ Test so it is the right client to serve
8.          If GetClientInfo(Message) = ClientInfo
9.              Timer.State = Off    ▷ Client responds, turn off time out
10.             Switch(Message)
11.                 Case ProjCmdExecute(ProjectKey)
12.                     ▷ Start or resume execution (CS takes care)
13.                     CS(DbConnectData, ProjectKey)
14.                     Socket.Send(ACK)
15.                 Case ProjCmdPause(ProjectPath)    ▷ Pause execution
16.                     WriteFile(ProjectPath & '\CScommand', 'Pause')
17.                     Socket.Send(ACK)
18.                 Case ProjCmdStop(ProjectPath)    ▷ Stop execution
19.                     WriteFile(ProjectPath & '\CScommand', 'Stop')
20.                     Socket.Send(ACK)
21.                 Case ProjCmdReset(ProjectPath)
22.                     ▷ Reset execution
23.                 Case GetFileList(Path)    ▷ List all files in a directory
24.                     Socket.Send(ListFiles(Path))
25.                 Case ReadTextFile(FileName)    ▷ Read a text file
26.                     Socket.Send(ReadFile(FileName))
27.                 Case ReadBinaryFile(FileName)    ▷ Read e.g. supe files
28.                     Socket.Send(FileInterpreter(FileName))
29.         Else
30.             Log('Illegal client:' & ClientInfo)
31.     Timer.Alarm    ▷ Client has not responded in time
32.         Exit


Analysis
Prerequisites from the operating system:

1. The socket is closed if the client disappears (the user closes the client, the client terminates unexpectedly, network failure etc.).

2. There are timers available.

An SDS can never be left over in the system, since:

1. If the client does not make any first call, the SDS terminates after TimeOutSeconds seconds (lines 2 and 31-32).

2. If the client has made a first call, the SDS terminates itself when the client terminates, according to prerequisite 1 (lines 4-5).
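The two termination rules can be checked with a small model of the SDS event loop. The following Python sketch is only an illustration under simplifying assumptions: the class `SdsLifetime` and its method names are hypothetical, events are delivered by direct method calls instead of a socket and a timer, and the actual command handling is left out.

```python
class SdsLifetime:
    """Hypothetical model of when an SDS terminates (no real sockets or timers)."""

    def __init__(self, client_info):
        self.client_info = client_info
        self.timer_on = True   # the time-out timer is armed at start
        self.running = True

    def on_message(self, sender_info):
        """A message arrived; only the right client turns the timer off."""
        if self.running and sender_info == self.client_info:
            self.timer_on = False
        # Messages from any other client would only be logged.

    def on_timer_alarm(self):
        """The time-out expired; exit unless the client has already responded."""
        if self.running and self.timer_on:
            self.running = False

    def on_socket_closed(self):
        """The client exited or the connection failed; exit unconditionally."""
        self.running = False
```

In this model an SDS whose client never calls ends at the alarm, and one whose client has called ends when the socket closes, so every path leads to termination provided the two prerequisites hold.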



Comments
The client must identify itself in every message; the ID must correspond to the identifying string that was given on the command line. There is thus no actual login procedure.

GetFileList(Path) may be used to list any files in a task (both input and output data for a calculation, as well as any PAM system file).

Since a similar connection procedure is used in two places (SDS/client and CS/TCS), a shared library could be written to allow the connection procedure to be reused.

The parameter DbConnectData could perhaps be sent from the client at the start of execution?



APPENDIX E – REQUIREMENTS SPECIFICATION
This appendix contains the requirements specification for the calculation server environment in PAM.

ABOUT THE REQUIREMENTS
The requirements in this document were formulated mainly during two meetings. These are hopefully “limiting” requirements on PAM, meaning requirements which may affect the architecture, and are “as demanding as possible”. It is e.g. quite easy to change colours and fonts later, but rather difficult to “add” stability. This approach was chosen to make sure any further requirements will be supported by the architecture, and thus be possible to implement with little effort.

Because of lack of resources, no more formal or thorough requirements analysis has been made; instead we count on this approach having found the most demanding and “limiting” requirements, so that erroneous requirements and details can be changed when users evaluate a semi-functional product. We also count on the requirements having been discussed so thoroughly that we share the same understanding of them, and that no more formal notation is needed to make them unambiguous.

It was generally understood that not all functionality will be available in the first implementation of the system, so the requirements are marked with “priority one” (required) and “priority two” (optional).

Unless otherwise stated, the database server is not included in the term “server computers”.

PREREQUISITES
The basic requirement is that “it shall work as before but better”. Therefore it is assumed that the reader has some understanding of basic PAM concepts such as “projects” and “tasks”, since the requirements build on these concepts and reflect the subjective meaning of “better”.

As a prerequisite, we have local PCs, on which client applications execute, a number of server computers on which calculation programs are run (as the major part of executing a “task”), and a central database where the PAM data is stored. There will be files describing input to a calculation (created by PAM) and files containing output from a calculation (created by the calculation program), in addition to possible system files (created by PAM, and only meant to be read by PAM and, in case of errors, a system administrator).

Refer to [3] for a description of these concepts.



THE FORM OF THE REQUIREMENTS
A requirement has the form specified in Figure E.1, where ID is the unique identity of the requirement (AC stands for the “A” part of PAM, calculation environment), and x is the priority of the requirement.

(AC ID) Requirement specification (Prio x)

Figure E.1: The general form of a requirement specification.

The identity has the form x-y or x-y.z, where x, y, and z are integers. The latter form means that, for all z, each requirement x-y.z must be fulfilled to fulfill the requirement x-y.
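The identity scheme lends itself to a mechanical check. The Python sketch below is an illustration only: the function names and the dictionary of test results are hypothetical, and it shows one way to parse an identity and to derive whether a requirement x-y is fulfilled from its sub-requirements x-y.z.

```python
import re

ID_PATTERN = re.compile(r"^(\d+)-(\d+)(?:\.(\d+))?$")


def parse_requirement_id(text):
    """Split 'x-y' or 'x-y.z' into (x, y, z); z is None for the short form."""
    match = ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a requirement identity: {text!r}")
    x, y, z = match.groups()
    return int(x), int(y), None if z is None else int(z)


def parent_fulfilled(parent_id, fulfilled):
    """Return True if requirement x-y holds, given results per identity.

    `fulfilled` maps identity strings to booleans (hypothetical test results).
    If x-y has sub-requirements, every x-y.z must hold; otherwise the entry
    for x-y itself decides.
    """
    x, y, _ = parse_requirement_id(parent_id)
    subs = []
    for rid, ok in fulfilled.items():
        rx, ry, rz = parse_requirement_id(rid)
        if (rx, ry) == (x, y) and rz is not None:
            subs.append(ok)
    if not subs:
        return fulfilled.get(parent_id, False)
    return all(subs)
```

For example, with results for 4-1.1, 4-1.2 and 4-1.3, requirement 4-1 counts as fulfilled only when all three are.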

“Prio 1” means that the requirement is to be implemented in the first release of the system, and “Prio 2” means that the functionality is optional, and may be implemented in the first release of the system, or added later. Some requirements x-y have a priority specified, which applies to all sub requirements x-y.z (it might be meaningless to implement some sub requirements and not others), while for other requirements, different sub requirements have different priorities, which means that each can be implemented according to its priority. In this case, each requirement x-y.z should be tested, but the requirement x-y need not be tested.

PROJECT AND TASK STATES
Project states

(AC 1-1) The following states, with the specified semantics, shall apply to projects. (Prio 1)

(AC 1-1.1) “Not Executed”: Command “Execute” has never been issued for this project, or the command “Reset” was the most recently executed command²³. Every task has state “Not Executed”.

(AC 1-1.2) “Executing”: One or more tasks are executing. At least one task has thus state “Executing”, and other tasks may have one of the following states since previous executions: “Not Executed”, “Stopped”, “Failed”, or “Executed”. No task can have state “Paused”.

23 See section “Project execution control” for a specification on the available commands.



(AC 1-1.3) “Paused”²⁴: No task is executing. At least one task has thus state “Paused”, and other tasks may have one of the following states since previous executions: “Not Executed”, “Stopped”, “Failed”, or “Executed”.

(AC 1-1.4) “Stopped”²⁵: The project has been stopped on request from some user. At least one task has thus state “Stopped”, and other tasks may have one of the following states since previous executions: “Not Executed”, “Stopped”, “Failed”, or “Executed”. No task can have state “Paused”.

(AC 1-1.5) “Failed”²⁶: The project has not been successfully executed. The failure may be caused by errors in the project script, or some other technical reason. Tasks may have entered any of the following states during the execution, or in previous executions: “Not Executed”, “Stopped”, “Failed”, or “Executed”. No task can have state “Paused”.

(AC 1-1.6) “Executed”²⁷: The project script has been successfully executed. The tasks that were executed in the script have state “Executed” or “Failed”. Other tasks may have any of the following states since previous executions: “Not Executed”, “Stopped”, “Failed”, or “Executed”. No task can have state “Paused”.
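The relation between a project state and the states of its tasks can be expressed as a consistency check. The Python sketch below is one possible encoding of AC 1-1.1 to AC 1-1.6, not part of the specification: the names `RESTING`, `RULES` and `consistent` are hypothetical, and the additional condition in AC 1-1.6 on tasks executed by the script is not modelled.

```python
# Task states a task may keep from previous executions.
RESTING = {"Not Executed", "Stopped", "Failed", "Executed"}

# For each project state: the task state that must occur at least once
# (or None if there is no such state), and the set of allowed task states.
RULES = {
    "Not Executed": (None, {"Not Executed"}),
    "Executing": ("Executing", RESTING | {"Executing"}),
    "Paused": ("Paused", RESTING | {"Paused"}),
    "Stopped": ("Stopped", RESTING),
    "Failed": (None, RESTING),
    "Executed": (None, RESTING),
}


def consistent(project_state, task_states):
    """Return True if the task states are allowed for the given project state."""
    required, allowed = RULES[project_state]
    if any(state not in allowed for state in task_states):
        return False
    return required is None or required in task_states
```

Such a check could be run after an unexpected termination to detect database contents that no longer match any legal combination.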

Task states
(AC 1-2) The following states, with the specified semantics, shall apply to tasks. (Prio 1)

(AC 1-2.1) “Not Executed”: The task has never been executed. It does not contain any output data.

(AC 1-2.2) “Executing”: The task is currently executing. It may contain legal output data so far.

24 A “Paused” task can be resumed, from exactly the point of execution where it was paused.
25 A “Stopped” task can not be resumed, but will be re-executed if restarted.
26 Observe that the project state “Failed” does not imply that any task is in state “Failed”. See also section “Task states”.
27 The project state “Executed” thus only tells that the project execution reached the end of the script, not whether individual tasks were executed, nor whether tasks were executed successfully, with errors, or failed.



(AC 1-2.3) “Paused”: The task is paused, but can be resumed later. It may contain legal output data so far.

(AC 1-2.4) “Stopped”: The task has been stopped on request from a user. It may contain viewable output data.

(AC 1-2.5) “Failed”²⁸: The task has failed, because of some technical reason (such as network failure). It may contain legal output data so far.

(AC 1-2.6) “Executed”: The task has been executed. There may be an error message associated with this state, originating from the calculation program or the operating system. If there is none, PAM regards the task as having been successfully executed.

PROJECT EXECUTION
Client and project execution independence

(AC 2-1) It shall be possible to start execution of several projects in the same client. (Prio 1)

(AC 2-2) Project executions shall be totally independent of whether there exists any client or not. (Prio 1)

(AC 2-3) The user’s possible operations from any given client to supervise or control a project execution shall only be restricted by the PAM authorization mechanism, not by what has been done earlier in the same client²⁹. (Prio 1)

(AC 2-4) The client shall be able to perform all its tasks independently of whether there are any server computers available or not, except when it is explicitly dependent on the servers (as when starting a project execution, or when requesting a file residing on the server for viewing in the client). (Prio 1)

(AC 2-5) Input data in tasks is to be interpreted, except for file-based parts (free text that will be used as input). (Prio 1)

28 Observe that the task state “Failed” does not imply that the project is in state “Failed”. See section “Project states”.
29 E.g. the possibility to view files during execution in a specific client, or to issue project commands from it, should be independent of whether the execution was started from the client or not.



Script interpretation
(AC 3-1) The project execution shall be controlled by a project script. (Prio 1)

(AC 3-2) The script language tcl shall be used to interpret project scripts. (Prio 1)

(AC 3-3) The following language entities shall be available in the project script.

(AC 3-3.1) The command RUN <task>, which sends the command “Execute” to the task <task> and waits for its release before evaluating the next command in the script. (Prio 1)

(AC 3-3.2) The command BGRUN <task>, which sends the command “Execute” to the task <task> and immediately evaluates the next command in the script (the task is run in the background). (Prio 1)

(AC 3-3.3) All non-harmful tcl commands³⁰. (Prio 1)

(AC 3-3.4) The user shall not be able to perform any potentially harmful commands, such as direct file manipulation or direct database access³¹. (Prio 1)

(AC 3-3.5) There shall be a possibility to add commands providing more complex functionality in a single command. Such commands can possibly contain harmful commands, but the possibility to edit such commands shall be restricted to certain users³². (Prio 2)

(AC 3-4) When a task is about to be executed and it is dependent on a parent task that has state “Failed”, its state shall immediately be set to “Not Executed”. (Prio 1)

30 E.g. variable manipulation, tcl control structures (e.g. If and While), string and list manipulating commands. Not further specified.
31 Not further specified.
32 One such command could e.g. create a task, modify its input and iterate on it until it meets some specified criteria (or a maximum number of iterations has been carried out). E.g. the users have requested the functionality of the current HITMIN script.



(AC 3-5) If a task has been successfully executed earlier, it shall be re-executed if and only if any of the following conditions hold. (Prio 2)

(AC 3-5.1) The task has been modified since the previous execution.

(AC 3-5.2) The task’s parent’s execution end time is later than the task’s last execution start time.

(AC 3-5.3) The task’s parent does not have state “Failed”. (See requirement AC 3-4)

(AC 3-5.4) The user explicitly wants it to be re-executed.
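The re-execution rule can be sketched as a decision function. The Python example below encodes one possible reading of AC 3-4 and AC 3-5, and that reading is an assumption: AC 3-5.3 is treated as a guard (a task below a failed parent is never run, not even on the user's request), while the other three conditions are alternatives. The class `TaskInfo`, its fields and the comparison of a modification time with the last start time are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInfo:
    modified_at: float                   # time of the last modification of the task
    last_start: float                    # start time of the previous execution
    parent_state: Optional[str] = None   # None if the task has no parent
    parent_end: Optional[float] = None   # end time of the parent's last execution


def should_reexecute(task, user_forces=False):
    """Decide whether a previously executed task is run again."""
    # Guard (AC 3-4, AC 3-5.3): never execute below a failed parent.
    if task.parent_state == "Failed":
        return False
    if user_forces:                          # AC 3-5.4
        return True
    if task.modified_at > task.last_start:   # AC 3-5.1 (approximation)
        return True
    if task.parent_end is not None and task.parent_end > task.last_start:
        return True                          # AC 3-5.2
    return False
```

Under this reading an unchanged task with an unchanged, successful parent is skipped, which is presumably the saving the requirement aims at.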

Project execution control
The following commands shall be interactively available to the user³³, for project script execution control. The commands result in the state transitions pictured in Figure E.2.

[Figure E.2: state diagram. States: Not executed, Executing, Paused, Stopped, Executed. Transition labels: Command "Execute", Command "Pause", Command "Stop", Execution finished.]

Figure E.2: State transitions for a project. Command “Reset” is not included. State “Failed” can be reached from any other state and is not included.

(AC 4-1) Execute. (Prio 1)

33 I.e. not as commands in the script, but rather in the form of e.g. command buttons.



(AC 4-1.1) If the project is in state “Not executed”, “Executed” or “Stopped”, the project state is changed to “Executing”, and the project script is evaluated from the start.

(AC 4-1.2) If the project is in state “Paused”, all suspended tasks (i.e. tasks in state “Paused”) are sent the command “Resume”.

(AC 4-1.3) If the project is in the state “Executing”, the command has no effect.

(AC 4-2) Stop. (Prio 1)

(AC 4-2.1) If the project is in state “Executing”, script execution is terminated and the project state is changed to “Stopped”. All currently executing tasks (state “Executing”) are terminated and their state is changed to “Stopped”.

(AC 4-2.2) If the project is in any other state, the command has no effect.

(AC 4-3) Pause. (Prio 1)

(AC 4-3.1) If the project is in state “Executing”, the project state is changed to “Paused”. All currently executing tasks are suspended and their state is changed to “Paused”.

(AC 4-3.2) If the project is in any other state, the command has no effect.

(AC 4-4) Reset³⁴. (Prio 1)

(AC 4-4.1) If the project is in state “Executing” or “Paused”, it is first stopped (given the command “Stop”).

34 This command is available to allow the user to force a re-execution in case PAM falsely believes a project is running.



(AC 4-4.2) The project’s state and all tasks’ states are changed to “Not executed”, independently of the project’s state.

(AC 4-4.3) All (non-system) files for every task in the project are removed.
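The effect of the four commands on the project state can be summarized in a transition table. The Python sketch below follows the text of AC 4-1 to AC 4-4 and is only an illustration: the names `TRANSITIONS` and `apply_command` are hypothetical, the state names are capitalized as in AC 1-1, the transition from “Paused” to “Executing” on “Execute” is an assumption (the text only says that the paused tasks are resumed), and combinations that the requirements do not mention, such as “Execute” in state “Failed”, leave the state unchanged.

```python
# (current project state, command) -> new project state
TRANSITIONS = {
    ("Not Executed", "Execute"): "Executing",  # AC 4-1.1
    ("Executed", "Execute"): "Executing",      # AC 4-1.1
    ("Stopped", "Execute"): "Executing",       # AC 4-1.1
    ("Paused", "Execute"): "Executing",        # AC 4-1.2, assumed state change
    ("Executing", "Stop"): "Stopped",          # AC 4-2.1
    ("Executing", "Pause"): "Paused",          # AC 4-3.1
}


def apply_command(state, command):
    """Return the project state after an interactive command."""
    if command == "Reset":
        # AC 4-4.2: the result is the same from every state.
        return "Not Executed"
    # Any combination not listed has no effect (AC 4-1.3, 4-2.2, 4-3.2).
    return TRANSITIONS.get((state, command), state)
```

Keeping the transitions in one table makes it easy to compare the implementation against Figure E.2 when the requirements change.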

(AC 4-5) It shall be possible to execute a project folder. This means that the command “Execute” is issued to several (user-selected) projects in a folder (recursively). (Prio 2)

(AC 4-6) It shall be possible to specify a maximum amount of CPU time that may be used by a task. The system shall automatically stop a calculation exceeding this time. (Prio 2)

(AC 4-7) Output data views shall be continuously updated during execution³⁵. (Prio 2)

LOCATION OF EXECUTION
(AC 5-1) For each program registered in PAM, it shall be possible to state where it can be run³⁶.

(AC 5-1.1) It shall be possible to list individual server computers on which it may run. (Prio 1)

(AC 5-1.2) It shall be possible to select classes of computers (such as “any NT computer”, or “any computer in the TekBer cluster”). (Prio 2)

(AC 5-2) A whole PAM system shall be able to run on a single computer.

(AC 5-2.1) For demo purposes only. (Prio 1)

(AC 5-2.2) A user shall be able to work temporarily without connection to the server computers or the database, and synchronize his work with the central database when reconnected to it. (Prio 2)

35 This requirement requires that output files are readable during execution.
36 A project may contain tasks that are executed on different computers, and even platforms.



(AC 5-3) Platform data shall be stored with a project, such as operating system versions for all components. (Prio 1)

(AC 5-4) PAM shall be able to handle any calculation program, given that it can be started via a command to the operating system on a server computer included in the PAM system. (Prio 1)

AVAILABILITY AND RELIABILITY
(AC 6-1) After a network failure, the PAM system shall work without any loss of system functionality (without the need of any administrator actions). (Prio 1)

(AC 6-2) After any number of server computer failures (possibly at the same time), the PAM system shall work without any loss of system functionality (without the need of any administrator actions). (Prio 1)

(AC 6-3) When a server computer restarts, PAM shall rerun all projects with tasks that were executing on that computer when it failed. (Prio 1)

FILE HANDLING
(AC 7-1) The following requirements apply to ASCII text files:

(AC 7-1.1) ASCII text files shall be viewable. (Prio 1)

(AC 7-1.2) It shall be possible to compare ASCII text files. (Prio 2)

(AC 7-1.3) It must take less than 10% more time to open an ASCII text file in PAM compared to opening it with “xedit” using eXcursion. (Prio 1)

(AC 7-2) The following requirements apply to binary files:

(AC 7-2.1) Binary files shall be possible to interpret without modifying PAM’s source code, given an interpreter transforming data into a format that can be handled by PAM³⁷. (Prio 1)

(AC 7-2.2) It shall be possible to compare binary files. (Prio 2)

(AC 7-3) Specified subsets of output data shall, at the user’s command, be stored in the database. (Prio 2)

(AC 7-4) The user shall be able to remove any (non-system) file. (Prio 2)

(AC 7-5) There shall be a possibility to export all files associated with a project (or a subset). Access rights must be correct, ensuring that the user performing the export has rights to it. (Prio 2)

(AC 7-6) To process a task’s input and output data files, it shall be possible to run additional programs on the server computer where the task is run (to e.g. convert a binary file to ASCII format). (Prio 2)

(AC 7-6.1) After a project execution, at a user’s command.

(AC 7-6.2) Before and after the execution of the calculation program, as a part of the task specification.

(AC 7-7) Only users working in the PAM system, and system administrators, shall be able to delete, move or modify any file that is part of the PAM system. (Prio 1)

(AC 7-8) It shall be possible to archive project data (e.g. through the HSM system, on a compact disc, or in the database) at the user’s command³⁸. (Prio 2)

(AC 7-8.1) PAM shall notify a group of people³⁹ when a project has not been touched for a user-specified time. The default time shall be three months.

37 And viewed e.g. as plots or tables.
38 It is regarded too insecure to archive automatically, since it must be ensured that the files are actually stored on non-volatile media, not only copied to another directory, before the original files are deleted. Otherwise the files may be lost in case of e.g. disk failure.



(AC 7-8.2) It shall be possible to restore an archived project.

PROCESS HANDLING
(AC 8-1) PAM shall be able to handle processes in the following ways. (Prio 2)

(AC 8-1.1) It shall be possible to find all server computer processes started by PAM within PAM.

(AC 8-1.2) PAM shall be able to kill any such process at a user’s command⁴⁰.

(AC 8-2) It shall be possible to run PAM in an environment with a load sharing facility (LSF) and use it to manage calculations, or without. (Prio 1)

39 It is not specified which users, but could be e.g. the user responsible for the project or system administrators.
40 Clarification: PAM shall never try to automatically find and kill suspicious processes.



APPENDIX F – TCL REFERENCES
This appendix contains reference pages for the tcl threading model and OraTcl, the tcl extension for communicating with an Oracle database.

TCL THREADING MODEL
The rest of the text in this section was found at http://dev.scriptics.com/ftp/thread//thread21.html on 2001-04-03.

NAME
thread - Create and manipulate threads with Tcl interpreters in them.

SYNOPSIS
thread::create ?-joinable? ?script?
thread::id
thread::exists id
thread::errorproc procname
thread::exit
thread::names
thread::send id ?-async? script
thread::wait
thread::join id
thread::transfer id channel

DESCRIPTION
The thread extension creates threads that contain Tcl interpreters, and it lets you send scripts to those threads.

thread::create creates a thread that contains a Tcl interpreter. The Tcl interpreter either evaluates the script, if specified, or it waits in the event loop for scripts that arrive via the thread::send command. The result of thread::create is the ID of the thread. The result, if any, of script is ignored. Using flag -joinable it is possible to create a joinable thread, i.e. one upon whose exit can be waited upon (by using thread::join).

thread::id returns the ID of the current thread.

thread::errorproc sets a handler for errors that occur in other threads. Or, if no procedure is specified, the current handler is returned. By default, an uncaught error in a thread terminates that thread and causes an error message to be sent to the standard error channel. You can change the default reporting scheme by registering a procedure that is called to report the error. The proc is called in the interpreter that invoked the thread::errorproc command. The original thread that has the uncaught error is terminated in any case. The proc is called like this:

myerrorproc thread_id errorInfo

thread::exit terminates the current thread. There is no way to force another thread to exit - you can only ask it to terminate by sending it a command.

thread::names returns a list of thread IDs. These are only for threads that have been created via thread::create. If your application creates other threads at the C level, they are not reported by thread::names.

thread::exists returns true (1) if thread given by the ID parameter exists, false (0) otherwise. This applies only for threads that have been created via thread::create.

thread::send passes a script to another thread and, optionally, waits for the result. If the -async flag is specified then the caller does not wait for the result. The target thread must enter its event loop in order to receive script messages. This is done by default for threads created without a startup script. Threads can enter the event loop explicitly by calling thread::wait or vwait.

thread::wait enters the event loop so a thread can receive messages from thread::send. This is equivalent to vwait unusedvariable.

thread::join waits for the thread with id id to exit and then returns its exit code. Errors will be returned for threads which are not joinable or already waited upon by another thread.

thread::transfer moves the specified channel from the current thread and interpreter to the main interpreter of the thread with the given id. After the move the current interpreter has no access to the channel anymore, but the main interpreter of the target thread will be able to use it from now on.

DISCUSSION
The fundamental threading model in Tcl is that there can be one or more Tcl interpreters per thread, but each Tcl interpreter should only be used by a single thread. A "shared memory" abstraction is awkward to provide in Tcl because Tcl makes assumptions about variable and data ownership. Therefore this extension supports a simple form of threading where the main thread can manage several background, or "worker" threads. For example, an event-driven server can pass requests to worker threads, and then await responses from worker threads or new client requests. Everything goes through the common Tcl event loop, so message passing between threads works naturally with event-driven I/O, vwait on variables, and so forth. For the transfer of bulk information it is possible to move channels between the threads.

SEE ALSO
A Guide to the Tcl Threading Model.

ORATCL
The rest of the text in this section was found at http://dev.scriptics.com/man/oratcl2.6/oratcl.html on 2001-04-03⁴¹.

"Oratcl" 2.6 (TCL) manual page

NAME
Oratcl - Oracle Database Server access commands for Tcl

INTRODUCTION
Oratcl is a collection of Tcl commands and a Tcl global array that provides access to an Oracle Database Server. Each Oratcl command generally invokes several Oracle Call Interface (OCI) library functions. Programmers using Oratcl should be familar with basic concepts of OCI programming.

ORATCL COMMANDS
oralogon connect-str

Connect to an Oracle server using connect-str. The connect string should be a valid Oracle connect string, in the form:

name
name/password
name@n:dbname
name/password@n:dbname

A logon handle is returned and should be used for all other Oratcl commands using this connection that require a logon handle. Multiple connections to the same or different servers are allowed, up to a maximum of 25 total connections. Oralogon raises a Tcl error if the connection is not made for any reason (login or password incorrect, network unavailable, etc.). If the connect string does not include a database specification, the value of the environment variable ORACLE_SID is used as the server.

41 The text has been formatted by changing fonts, removing blank lines, hyperlinks, and a table of contents, only to make it more readable.

oralogoff logon-handle

Logoff from the Oracle server connection associated with logon-handle. Logon-handle must be a valid handle previously opened with oralogon. Oralogoff returns a null string. Oralogoff raises a Tcl error if the logon handle specified is not open.

oraopen logon-handle

Open an SQL cursor to the server. Oraopen returns a cursor to be used on subsequent Oratcl commands that require a cursor handle. Logon-handle must be a valid handle previously opened with oralogon. Multiple cursors can be opened through the same or different logon handles, up to a maximum of 25 total cursors. Oraopen raises a Tcl error if the logon handle specified is not open.

oraclose cursor-handle

Closes the cursor associated with cursor-handle. Oraclose raises a Tcl error if the cursor handle specified is not open.

orasql cursor-handle sql-statement ?-parseonly? ?-async?

Send the Oracle SQL statement sql-statement to the server. Cursor-handle must be a valid handle previously opened with oraopen. Orasql will return the numeric return code "0" on successful execution of the sql statement. The oramsg array index rc is set with the return code; the rows index is set to the number of rows affected by the SQL statement in the case of insert, update, or delete.

The oramsg array index rowid is set with the Oracle ROWID of the last row processed in an insert, update, or delete statement.

Only a single SQL statement may be specified in sql-statement. Orafetch allows retrieval of return rows generated.

The optional -parseonly argument parses but does not execute the SQL statement. The SQL statement may contain bind variables that begin with a colon (':'). The statement may then be executed with the orabindexec command, allowing bind variables to be substituted with values. Bind variables should only be used for SQL statements select, insert, update, or delete.

The optional -async argument specifies the that SQL should be executed asynchronously. Orasql will return the numeric code "3123" when using -async. Note that -async is only available for orasql when Oratcl is compiled with Oracle 7.2 or higher libraries, and when connected to an Oracle server of version 7.2 or higher. See orapoll below.

Only -parseonly or -async may be specified.

Orasql performs an implicit oracancel if any results are still pending from the last execution of orasql. Orasql raises a Tcl error if the cursor handle specified is not open, or if the SQL statement is syntactically incorrect.

Table inserts made with orasql should follow conversion rules in the Oracle SQL Reference manual.

orabindexec cursor-handle ?-async? ?:varname value ...?



Execute a previously parsed SQL statement, optionally binding values to SQL variables. Cursor-handle must be a valid handle previously opened with oraopen. An SQL statement must have previously been parsed by executing orasql with the -parseonly option. Orabindexec may be repeatedly executed after a statement is parsed with bind variables substituted on each execution. Orabindexec does not re-parse SQL statements before execution.

The optional -async argument specifies the that SQL should be executed asynchronously. Orasql will return the numeric code 3123 when using -async. If -async is specified, it must preceed any remaining :varname value pairs for the command. Note that -async is only available for orasql when Oratcl is compiled with Oracle 7.2 or higher libraries, and when connected to an Oracle server of version 7.2 or higher. See orapoll below.

Optional :varname value pairs allow substitutions on SQL bind variables before execution. As many :varname value pairs should be specified as there are defined in the previously parsed SQL statement. Varnames must be prefixed by a colon ":".

Orabindexec will return "0" when the SQL is executed successfully; "3123" when executed with the -async argument; "1003" if a previous SQL has not been parsed with orasql; "1008" if not all SQL bind variables have been specified. Refer to Oracle error numbers and messages for other possible values.

The oramsg array index rowid is set with the Oracle ROWID of the last row processed in an insert, update, or delete statement.

orafetch cursor-handle ?commands? ?substitution_character? ?tclvarname colnum ...?

Return the next row from the last SQL statement executed with orasql as a Tcl list. Cursor-handle must be a valid handle previously opened with oraopen. Orafetch raises a Tcl error if the cursor handle specified is not open. All returned columns are converted to character strings. A null string is returned if there are no more rows in the current set of results. The Tcl list that is returned by orafetch contains the values of the selected columns in the order specified by select.

The optional commands argument allows orafetch to repeatedly fetch rows and execute commands foreach row. Substitutions are made on commands before passing it to Tcl_Eval() for each row. Anoptional fourth argument consisting of a single character can be specified for a column numbersubstitution character. If none is specified the character '@' will be used to denote the substitutioncharacter. If the substitution character is a null string, no column substitutions will be performed onthe commands string. Orafetch interprets the substitution character followed by a number (@n ) incommands as a result column specification. For example, @1, @2, @3 refer to the first, second, andthird columns in the result. @0 refers to the entire result row, as a Tcl list. Substitution columns mayappear in any order, or more than once in the same command. Substituted columns are inserted intothe commands string as proper list elements, i.e., one space will be added before and after thesubstitution and column values with embedded spaces are enclosed by {} if needed.

Tcl variables may also be set for commands on each row that is processed. Tcl variables are specified after the substitution_character, consisting of matching pairs of Tcl variable names and column numbers. Column number may be "0", in which case the Tcl variable is set to the entire result row as a Tcl list. Column numbers must be less than or equal to the number of columns in the SQL result set.

A Tcl error is raised if a column substitution number is greater than the number of columns in the results. If the commands execute break, orafetch execution is interrupted and returns with TCL_OK. Remaining rows may be fetched with a subsequent orafetch command. If the commands execute return or continue, the remaining commands are skipped and orafetch execution continues with the next row. Orafetch will raise a Tcl error if the commands return an error. Commands should be enclosed in "" or {}.

Oratcl performs conversions for all data types. Raw data is returned as a hexadecimal string, without a leading "0x". Use the SQL functions to force a specific conversion.

If an SQL statement has been executed in asynchronous mode (-async argument on orasql or orabindexec), executing orafetch will block until SQL execution is complete. See orapoll below.

The oramsg array index rc is set with the return code of the fetch. 0 indicates the row was fetched successfully; 1403 indicates the end of data was reached.

The index rows is set to the cumulative number of rows fetched so far. The oramsg array index rowid is set with the Oracle ROWID of the last row fetched when retrieved with a select for update statement.

The oramsg array index maxlong limits the amount of long or long raw data returned for each column returned. The default is 32768 bytes.

The oramsg array index nullvalue can be set to specify the value returned when a column is null. The default is "0" for numeric data, and "" for other datatypes.

orapoll cursor-handle ?-all?

Return a list of cursor handles that have results waiting to be fetched. Cursor-handle must be a valid handle previously opened with oraopen. If the previous SQL execution has not finished, orapoll will return a null string. If execution has finished for the cursor handle and/or results are available, the cursor handle is returned.

The optional argument -all may be specified to return a list of all cursor handles that have results waiting.

Note that orapoll is only available when Oratcl is compiled with Oracle 7.2 or higher libraries, and when connected to an Oracle server of version 7.2 or higher.

Asynchronous processing (-async argument on orasql or orabindexec) is actually done on a logon-connection basis. Executing orasql or orabindexec with -async causes all open cursors through the same logon connection handle to be changed to asynchronous mode. Likewise, if orafetch is executed on a cursor, all open cursors through the same logon connection handle will be changed to blocking mode. It is recommended that asynchronous processing be limited to one cursor per logon connection to avoid unexpected changes of blocking to/from asynchronous mode for other cursors.

orabreak cursor-handle

Cause a currently executing SQL statement to be interrupted. Cursor-handle must be a valid handle previously opened with oraopen. Use the oracancel command to cancel SQL results, see below.

Note that orabreak is only available when Oratcl is compiled with Oracle 7.2 or higher libraries, and when connected to an Oracle server of version 7.2 or higher.

oraplexec cursor-handle pl-block ?:varname value ...?

Execute an anonymous PL/SQL block, optionally binding values to PL/SQL variables. Cursor-handle must be a valid handle previously opened with oraopen. Pl-block may either be a complete PL/SQL procedure or a call to a stored procedure coded as an anonymous PL/SQL block. Optional :varname value pairs may follow the pl-block. Varnames must be prefixed by a colon ":", and match the substitution bind names used in the procedure. Any :varname that is not matched with a value is ignored. If a :varname is used for output, the value should be coded as a null string, "".

Cursor variables may be returned from a PL/SQL block by specifying an open cursor as the bind value for a :varname bind variable. The cursor must have previously been opened by oraopen using the same logon handle as the cursor used to execute the oraplexec command. The cursor is closed and re-opened with the results of a PL/SQL "OPEN :cursor FOR select" statement. After oraplexec completes, the cursor returned may then fetch result rows by using orafetch; column information is available by using oracols.

Note that cursor variables are only available for oraplexec when Oratcl is compiled with Oracle 7.2 or higher libraries, and when connected to an Oracle server of version 7.2 or higher.

Oraplexec returns the contents of each :varname as a Tcl list upon the termination of the PL/SQL block. Oraplexec raises a Tcl error if the cursor handle specified is not open, or if the PL/SQL block is in error. The oramsg array index rc contains the return code from the stored procedure.

oracols cursor-handle

Return the names of the columns from the last orasql, orafetch, or oraplexec command as a Tcl list. Oracols may be used after oraplexec, in which case the bound variable names are returned.

The oramsg array index collengths is set to a Tcl list corresponding to the lengths of the columns; index coltypes is set to a Tcl list corresponding to the types of the columns; index colprecs is set to a Tcl list corresponding to the precision of the numeric columns, other corresponding non-numeric columns are a null string (Version 7 only); index colscales is set to a Tcl list corresponding to the scale of the numeric columns, other corresponding non-numeric columns are a null string (Version 7 only). Oracols raises a Tcl error if the cursor handle specified is not open.

oracancel cursor-handle

Cancels any pending results from a prior orasql command that use a cursor opened through the connection specified by cursor-handle. Cursor-handle must be a valid handle previously opened with oraopen. Oracancel raises a Tcl error if the cursor handle specified is not open.

oracommit logon-handle

Commit any pending transactions from prior orasql commands that use a cursor opened through the connection specified by logon-handle. Logon-handle must be a valid handle previously opened with oralogon. Oracommit raises a Tcl error if the logon handle specified is not open.

oraroll logon-handle

Roll back any pending transactions from prior orasql commands that use a cursor opened through the connection specified by logon-handle. Logon-handle must be a valid handle previously opened with oralogon. Oraroll raises a Tcl error if the logon handle specified is not open.

oraautocom logon-handle {on | off}

Enables or disables automatic commit of SQL data manipulation statements using a cursor opened through the connection specified by logon-handle. Logon-handle must be a valid handle previously opened with oralogon. One of the literal values "on" or "off" must be specified. The automatic commit feature defaults to "off". Oraautocom raises a Tcl error if the logon handle specified is not open.

orawritelong cursor-handle rowid table column filename



Write the contents of a file to a LONG or LONG RAW column. Cursor-handle must be a valid handle previously opened with oraopen. Rowid is the Oracle rowid of an existing row. The rowid must be in the format of an Oracle rowid datatype. Table is the table name that contains the row and column. Column is the column name that is the LONG or LONG RAW column. Filename is the name of the file that contains the LONG or LONG RAW data to write into the column. Orawritelong composes and executes an SQL update statement based on the table, column, and rowid.

Upon successful completion, orawritelong returns the number of bytes written to the LONG column as a decimal number. A properly formatted rowid may be obtained through a prior execution of orasql, "select rowid from table where ...".

Orawritelong raises a Tcl error if the cursor handle specified is not open, or if rowid, table, or column are invalid, or if the row does not exist.

orareadlong cursor-handle rowid table column filename

Read the contents of a LONG or LONG RAW column and write results into a file. Cursor-handle must be a valid handle previously opened with oraopen. Rowid is the Oracle rowid of an existing row. The rowid must be in the format of an Oracle rowid datatype. Table is the table name that contains the row and column. Column is the column name that is the LONG or LONG RAW column. Filename is the name of a file in which to write the LONG or LONG RAW data. Orareadlong composes and executes an SQL select statement based on the table, column, and rowid. A properly formatted rowid may be obtained through a prior execution of orasql, "select rowid from table where ...".

Upon successful completion, orareadlong returns the number of bytes read from the LONG column as a decimal number.

Orareadlong raises a Tcl error if the cursor handle specified is not open, or if rowid, table, or column are invalid, or if the row does not exist.

SERVER MESSAGE AND ERROR INFORMATION

Oratcl creates and maintains a Tcl global array to provide feedback of Oracle server messages, named oramsg. Oramsg is also used to communicate with the Oratcl interface routines to specify null return values and LONG limits. In all cases except for nullvalue and maxlong, each element is reset to null upon invocation of any Oratcl command, and any element affected by the command is set. The oramsg array is shared among all open Oratcl handles. Oramsg should be defined with the global statement in any Tcl procedure needing access to oramsg.

Oramsg elements:

version

indicates the version of Oratcl.

nullvalue

can be set by the programmer to indicate the string value returned for any null result. Setting oramsg(nullvalue) to "default" will return "0" for numeric null data types (integer, float, and money) and a null string for all other data types. Nullvalue is initially set to "default".

maxlong

can be set by the programmer to limit the amount of LONG or LONG RAW data returned by orafetch. The default is 32768 bytes. The maximum is 65536 (Version 6) or 2147483647 (Version 7) bytes. Any value less than or equal to zero is ignored. Any change to maxlong becomes effective on the next call to orasql. See notes on maxlong usage with orafetch.



handle

indicates the handle of the last Oratcl command. Handle is set on every Oratcl command (except where an invalid handle is used).

rc

indicates the results of the last SQL command and subsequent orafetch processing. Rc is set by orasql, orafetch, oraplexec, and is the numeric return code from the last OCI library function called by an Oratcl command. Refer to the Oracle Error Messages and Codes manual for detailed information. Typical values are:

0

Function completed normally, without error.

900-999

invalid SQL statement: missing keywords, invalid column names, etc.

1000-1099

program interface error, e.g., no SQL statement parsed, SQL bind variables not bound, logon denied, insufficient privileges, etc.

1400-1499

execution errors or feedback.

1403

end of data was reached on an orafetch command.

1406

a column fetched by orafetch was truncated. Can occur when fetching a LONG or LONG RAW, and the maxlong value is smaller than the actual data size.

3123

asynchronous execution is pending completion.
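As an illustration of how a caller might interpret these values, the following Python sketch maps a return code to the categories listed above. The function name describe_rc and the wording of the returned strings are hypothetical; the numeric values 0, 1403 and 3123 are taken from this manual page, and 1406 is Oracle's standard code for a truncated fetched column.

```python
def describe_rc(rc):
    """Map an oramsg(rc) value to the category given in the manual page."""
    # Specific codes take precedence over the broader ranges they fall into.
    specific = {
        0: "completed normally",
        1403: "end of data",
        1406: "fetched column truncated",
        3123: "asynchronous execution pending",
    }
    if rc in specific:
        return specific[rc]

    ranges = [
        (900, 999, "invalid SQL statement"),
        (1000, 1099, "program interface error"),
        (1400, 1499, "execution error or feedback"),
    ]
    for low, high, text in ranges:
        if low <= rc <= high:
            return text

    # Anything else has to be looked up in the Oracle error manual.
    return "other Oracle error"
```

A fetch loop would typically stop on "end of data" and treat "fetched column truncated" as a hint to raise oramsg(maxlong).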

errortxt

the message text associated with rc. Since the oraplexec command may invoke several SQL statements, there is a possibility that several messages may be received from the server.

collengths

is a Tcl list of the lengths of the columns returned by oracols. Collengths is only set by oracols.

coltypes

is a Tcl list of the types of the columns returned by oracols. Coltypes is only set by oracols. Possible types returned are: char, varchar2 (Version 7), number, long, rowid, date, raw, long_raw, mlslabel (Version 7), raw_mlslabel (Version 7), unknown.

colprecs

is a Tcl list of the precision of the numeric columns returned by oracols. Colprecs is only set by oracols. For non-numeric columns, the list entry is a null string. The colprecs element only returns meaningful information when Oratcl is compiled for Version 7. Due to an OCI limitation in Version 6, zeros are returned as precision.

colscales



is a Tcl list of the scale of the numeric columns returned by oracols. Colscales is only set by oracols. For non-numeric columns, the list entry is a null string. The colscales element only returns meaningful information when Oratcl is compiled for Version 7. Due to an OCI limitation in Version 6, zeros are returned as scale.

sqlfunc

the numeric OCI code of the last sql function performed. See the OCI manual for descriptions.

peo

parse error offset, an index into an sql string that failed due to error.

ocifunc

the numeric OCI code of the last OCI function called by Oratcl. See the OCI manual for descriptions.

rows

the number of rows affected by an insert, update, or delete in an orasql command, or the cumulative number of rows fetched by orafetch.

rowid

the Oracle ROWID of the row affected by a select, insert, update, or delete in an orasql, orabindexec, or orafetch command. The rowid element is formatted as a hexadecimal character string that is suitable for use in subsequent SQL.

ociinfo

a list of features present in the Oracle library when Oratcl was compiled. Possible values are:

either "version_6" or "version_7" - reflects the Oracle version

"non_blocking" and "cursor_variables" - Oracle version 7.2+ non-blocking SQL execution and PL/SQL cursor variables.

NOTES

Tcl errors can also be raised by any Oratcl command if a command's internal calls to OCI library routines fail.

Oracle is very particular about using date literals in SQL. The proper date format is 'dd-mmm-yy', where dd is a numeric day, mmm is a three-character month name, and yy is a two-digit year. Some versions of Oracle give very strange results or failures if date values are not in this format.
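A date literal in this format can be produced as in the following Python sketch. The function name oracle_date_literal is hypothetical, and the month abbreviations are assumed to be the English ones; a fixed table is used so that the result does not depend on the locale of the calling program.

```python
import datetime

# English three-character month names, assumed to match the database's date language.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def oracle_date_literal(day):
    """Format a date as a quoted 'dd-mmm-yy' literal for use in an SQL statement."""
    return "'%02d-%s-%02d'" % (day.day, MONTHS[day.month - 1], day.year % 100)
```

For example, oracle_date_literal(datetime.date(2001, 5, 4)) gives '04-MAY-01', including the single quotes.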

The limit of the number of simultaneous connections and cursors is artificial, based on a fixed table in oratcl.c. Change the source #define ORATCLLDAS or #define ORATCLCURS if more are needed.

Oratcl does not mix well with Tcl slave interpreters. If Oratcl functions are needed in a slave, set up alias commands in the slave to execute Oratcl commands in the master interpreter. See the 'interp' man page.

The maximum amount of LONG or LONG RAW data returned by orafetch is ultimately dependent on Oratcl's ability to malloc() maxlong bytes of memory for each LONG/LONG RAW column retrieved. Setting oramsg(maxlong) to too high a value may cause core dumps or memory shortages.



Orareadlong compiled for Version 7 will utilize the oflng() OCI function; otherwise, a single data allocation will be made to hold the entire data. If memory cannot be allocated, the command will fail.

Unfortunately, OCI does not provide a way to write a LONG/LONG RAW column in chunks. The entire amount of data required to perform orawritelong is allocated in a single request. Again, if memory cannot be allocated, the command will fail.

Orafetch normally caches 10 rows at a time from the Oracle server. When a query contains a LONG or LONG RAW column, single rows are retrieved from the server in order to prevent memory shortages.

Cursor variables returned by oraplexec must be specified as a currently open cursor from the same logon connection:

# Open two cursors on the same logon connection.
set lda [oralogon scott/tiger]
set exec_cur [oraopen $lda]
set fetch_cur [oraopen $lda]
set plsql {
    begin
        open :fetchcur for select empno, ename
            from emp where job = :job ;
    end;
}
# Bind the open cursor $fetch_cur to the cursor variable :fetchcur.
oraplexec $exec_cur $plsql :job ANALYST :fetchcur $fetch_cur
orafetch $fetch_cur {puts "$empno $ename"} "" empno 1 ename 2

Using SQL bind variables is more efficient than letting Oracle reparse SQL statements. Use a combination of orasql ... -parseonly / orabindexec:

# Parse once, then execute once per value.
set sql "insert into name_tab(first_name) values(:firstname)"
orasql $cur $sql -parseonly
foreach name [list Ted Alice John Sue] {
    orabindexec $cur :firstname $name
}

rather than:

# Each iteration makes Oracle parse a new statement.
foreach name [list Ted Alice John Sue] {
    set sql "insert into name_tab(first_name) values('$name')"
    orasql $cur $sql
}

ENVIRONMENT VARIABLES

ORACLE_SID - The default Oracle server system ID.

FILES

/etc/oratab
/etc/sqlnet
$HOME/.sqlnet - definitions for Oracle servers.

BUGS

Orawritelong compiled for Version 7 may be problematic until a specific OCI function is available from Oracle.

Oracle OCI (Oracle Call Interface) libraries for Oracle release 7.1.4 may cause problems due to changes that Oracle introduced in that version. Use OCI libraries 7.0.x or 7.1.6+ if you experience symptoms of excessively long connect times (2+ minutes) or "fetch out of sequence" errors.



AUTHOR

Tom Poindexter, Denver Colorado. Version 2.5 released May 1997. Concepts borrowed from my earlier work with Sybtcl, a TCL interface to the Sybase RDBMS product.




APPENDIX G – PROJECT GROUP ENDING DISCUSSION

This appendix contains the protocol from the Master thesis ending presentation and discussion meeting.

The purpose of the meeting was to discuss the results from the Master thesis, and if possible decide how to continue.

DECISION

It was agreed that the author had made a thorough investigation and analysis, and that once the resulting issues are satisfactorily solved, the resulting architecture will be good enough to be implemented.

The discussions are summarized below, and in some cases solutions were sketched. The architectural change sketched in section “Dependencies” requires that the analyses be carried out again, at least partly. However, the participants felt sure that it is a straightforward change. A new component is built, enabling the remote file handling functionality believed to be inherent in Windows NT. No architectural modifications in other parts should be required. Furthermore, such a component could later be used by other systems.

DISCUSSIONS

Redundant input data

Potential problem

An alternative solution could allow for a more intelligent way of reading data from the database, thus increasing performance and decreasing system load.

A common situation is when a project contains a large number of tasks, where most input data is common (such as a common “master file” with different local changes). According to KB, there may be 12 000 lines of input data, where only 10 lines differ between the tasks. The architectural solutions presented clearly waste network bandwidth, since a more intelligent way would be to share common input data.
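The potential saving can be sketched in Python as follows. This is only an illustration of the idea of a shared master file with per-task changes, not part of any of the proposals; the function names and the representation of changes as a mapping from line index to replacement text are assumptions.

```python
def build_task_input(master, changes):
    """Rebuild one task's input from the shared master lines and its local changes."""
    lines = list(master)
    for index, text in changes.items():
        lines[index] = text  # replace only the lines that differ for this task
    return lines


def lines_transferred(master_length, changes_per_task, shared):
    """Count the input lines sent over the network for all tasks in a project."""
    if shared:
        # The master is sent once, followed by each task's differing lines.
        return master_length + sum(len(changes) for changes in changes_per_task)
    # Without sharing, every task fetches a complete copy of its input.
    return master_length * len(changes_per_task)
```

Assuming, for illustration, a project with 5 tasks that each change 10 of the 12 000 lines, sharing means 12 050 lines are transferred instead of 60 000.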

However, another common situation is when the project contains a large number of tasks, but the project script contains commands that execute only one or a few tasks. In this situation, it is unnecessary to fetch input data for all tasks.

The issue is further complicated by the fact that tasks in the same project may be executed in different environments.

Further investigation

It was agreed that the performance and system load in these situations should be investigated. It is important that practical problems are solved; the presented solutions might be “fast enough” and “light enough” in “average” situations.



Location of files

Potential problem

The location of the files associated with a task was discussed. HB intuitively felt uncomfortable with the solution distributing files on a number of computers. When PAM works, it is easy to use the client to find the distributed files, but when PAM does not work as expected, it might take considerable work to find all files. This is of course the drawback of proposals A1 and B1, while the benefit is better performance and decreased system load.

External customers have, according to KB, systems where different server computers have separate file systems. In these cases, it might be required that PAM is able to handle distributed files.

Suggested solution

One solution may be that each server computer in the PAM system has an associated computer where it stores its files. It might be the same computer, or another. This would cover both approaches; the local PAM administrator can then build a system with a centralized or distributed file storing approach. The administrator is thus responsible for building a system where each file server is technically available from all other server computers.

TCS

Potential problem

If the CS and TCS components are run as individual processes, they may significantly load the computers on which they are run, leaving less computing power to calculation programs. This might be a major problem if LSF (“Load Sharing Facility”) is used when starting calculation programs; if LSF puts tasks in queue because the computer is heavily loaded, there will still be a TCS process. It sounds like a vicious spiral; each TCS delays the finish time of its own calculation programs, thus prolonging the time the computer is loaded.

Further investigation

It was agreed that the performance and system load in this situation should be investigated.

Part of a solution might be to ensure that once a TCS has started a calculation program and waits for its termination, it should do very little. No “busy waiting” loops should be implemented; maybe a timer interrupt triggering every 5 seconds is enough.
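The waiting strategy suggested here could look roughly like the following Python sketch, which sleeps between checks instead of spinning. It is an illustration only: the function name wait_for_program is hypothetical, and a real TCS would be written in the technology chosen for PAM.

```python
import subprocess
import time


def wait_for_program(argv, poll_seconds=5.0):
    """Start a calculation program and wait for it, checking only once per interval."""
    process = subprocess.Popen(argv)
    checks = 0
    while process.poll() is None:
        # Sleeping leaves the processor to the calculation program; no busy waiting.
        time.sleep(poll_seconds)
        checks += 1
    return process.returncode, checks
```

The function returns the exit code of the program together with the number of timer checks that were made, so the cost of waiting grows with the number of intervals rather than with processor speed.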

Dependencies

Problem

Architectural proposals A1 and A2, which the author preferred until the presentation meeting, are dependent on external software. “NFS” and “Samba” are products used by Windows when reading files from our current Unix cluster. There might be other computers from which PAM cannot read files with this software, and external customers might not use these products.

Suggested solution

One solution may be that architectural proposal A1 or A2 is extended with a simple file handler on each file server, thus enabling file reading and writing over the network. Thus, it would look something like proposal B2. See figure G.1.

[Figure G.1 diagram; element labels: Client computer, PAM client, Server computer, SB, CS, TCS, FH.]

Figure G.1: A sketch of a refined A1 architecture. FH = File Handler.

Identified services are:

• “Read file” takes a file name and returns the contents.

• “Write file” takes a file name and the contents.

• “Delete file” takes a file name and deletes the file.

• “List files in directory” takes a directory name and returns all files and directories in the directory.

Reading and writing of files could be implemented simply by always returning or writing the whole file. In case of large files, the protocol might need to be refined for performance reasons, allowing parts of files to be read and written.

The PAM client must probably be extended with an RPC (“Remote Procedure Call”) layer, as described for architectures B1 and B2.
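The four services identified above can be sketched as a small Python class. It shows only the local file operations behind the services, always handling whole files as suggested; the network (RPC) layer is left out. The class name FileHandler, the method names, and the restriction of all paths to one root directory are assumptions made for the illustration.

```python
import os


class FileHandler:
    """Whole-file services of a file handler, limited to one root directory."""

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def _path(self, name):
        # Resolve the name below the root and refuse paths that lead outside it.
        path = os.path.abspath(os.path.join(self.root, name))
        if os.path.commonpath([self.root, path]) != self.root:
            raise ValueError("path is outside the file handler root")
        return path

    def read_file(self, name):
        """Read file: takes a file name and returns the whole contents."""
        with open(self._path(name), "rb") as handle:
            return handle.read()

    def write_file(self, name, contents):
        """Write file: takes a file name and the whole contents."""
        with open(self._path(name), "wb") as handle:
            handle.write(contents)

    def delete_file(self, name):
        """Delete file: takes a file name and deletes the file."""
        os.remove(self._path(name))

    def list_directory(self, name):
        """List files in directory: returns the names of all files and directories."""
        return sorted(os.listdir(self._path(name)))
```

A refinement for large files would add offset and length parameters to the read and write services, as the paragraph above suggests.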



Other risks

OraTcl

NL had noticed that in the OraTcl documentation referred to, there is no documented support for Oracle’s data type LOB (“Large OBject”), only the type LONG (a similar data type that will be removed in future Oracle releases). The author might accidentally have mixed the terms, or the author might have read other documentation where it is stated that LOBs are supported. This issue is important to the success of any of the proposals.

Tcl Threads

Tcl threads are, according to NL, not operating system threads, but an internal Tcl construction, not much cheaper than processes.
