Meta-Programming - a Software Production Method by Charles Simonyi

Meta-Programming: a Software Production Method by Charles Simonyi

CSL-76-7 D� 1976

This thesis describes an organizational schema, designed to yield very high programming productivity i n a simplif ied task environment which excludes scheduling, system design, documentation, and other engineering activities. The l everage provided by high productivity can, in turn, be used to simplify the engineering tasks.

Difficulty of communications within a production team, caused by the inherently rapid creation of problem specific local language, i s posited as the major obstacle to the improvement of productivity. The thesis proposes a combination of ideas for simplifying communications between programmers. Meta-programs are i nformal , written communi�ations, from the meta-programmer, who creates the local language, to technicians who learn it and actual ly write the programs.

The abstract notion of local language is resolved into the questions: what are the objects that should be named, and what should their names be? The answers involve the concept of painted types (related to types in programming languages), and naming conventions based on the idea of identifying objects by their types.

A method of state vector syntax checking for debugging the programs produced in the high productivity environment is described.

Descriptions of the relationships or contrasts between the meta-programming organization and the relevant software engineering concepts of high level languages, egoless programming, structured programming, Chief Programmer Teams, and automatic program verification are also given.

To verify the predictions of the meta-programming theory, a series of experiments were performed. In one of the projects, three programs were produced from the same specifications, by three different groups in a control led experiment. During the longest experiment 1 4,000 lines of code were written, at an average rate of 6. 1 2 l ines/man-hour. The control led experiments showed that comparable results can be obtained by different persons acting as meta-programmers. The difficult experimental comparisons of the meta-programming and conventional organizations, however, yielded interesting, but Inconclusive, results.

KEY WORDS AND PHRASES

Software engineering, management of software production, measurement of programming productivity, meta-programming, painted types, naming conventions, state vector syntax checking

CR CATEGORIES

1 .53, 2 .2 , 2 .42, 2.43, 3 .50, 4 .22

XEROX PALO ALTO RESEARCH CENTER 3333 Coyote Hill Road I Palo Alto I California 94304

© Copyright 1977

by

Charles Simonyi

-ii-

ACKNOWLEDGEMENTS

First, I would like· to thank my parents for thei r courageous support, which was tendered

often under difficult circumstances. I am also extremely grateful to Mr. Niels lvar Bech,

former President of A/S Regnecentralen, Copenhagen; and Professor Cornelius A. Tobias

of the University of California, Berkeley, for their timely and generous help.

The idea of including experimental verification into the thesis was due to Dr. Jerome I.

Elkind. Manager of the Computer Science Laboratory of the Xerox· Palo Alto Research .

Center. Substantial resources for the experiments, in manpower, computers, and other

facilities, were provided by Xerox Corporation. Dr. Elkind's continuing Sl;1pport was

essential for obtain ing these resources. Dr. Elkind also gave helpful advice about the

management aspects of the thesis and the experiments.

Professor Vinton Cerf, my Principal Adviser, helped me in forming my ideas into a thesis

with great patience. Discussions with Professor Cordell Green, who was also on the

reading committee, were also very helpful. The d ay-to-day interactions with Dr. Butler

W. Lampson, the third committee member, were extremely rewarding and pleasurable.

The expenditures of resources in the experiments were wisely monitored by a Board of

Directors, chaired by Dr. Elkind. Other members of the Board were: Dr. James Morris,

and Robert F. Sproull.

Dr. Ben Wegbreit contributed much valuable criticism. Advice on some combinatorial

problems was given by Dr. Leo Guibas.

I am deeply indebted to the seven i ndividuals who participated in the experi ments. Their

dil igent effort was absolutely essential to the success of the experiments. The valuable

contributions of Dr. Patrick Baudelaire and Thomas Malloy deserve special mention.

This thesis was typed by the author himself. The illustrations were drawn by Joe

Leitner. Vicki Parish and Gail Pilkington helped with the layout work.

-iii-

TABLE OF CONTENTS

CHAPTER 1: The Business of the Software Producer

1.1 I ntroduction

1.2 Software production as a process _technology

1.3 Design strategies when production i s i nexpensive

1.4 Process technology and software shari ng

1.5 Measures of software productivity

1.6 What determi nes productivity?

CHAPTER 2: Meta-Programming

2.1 In troduction

2.2 Optimizing software productivity

2.3 Task orders and meta-programs

2.4 Abstractions and operations

2.5 Naming of types and quantities

2.6 Debugging as an organized activ i ty

2.1 Other m eta-programming conventions

2.1.1 Divisions in meta-programs

2.1.2 Naming conventions for procedures

2.7.3 Name hyphenation

2.1.4 Parameter order i n procedures

2.7.5 Use of comments for explanation

2.7.6 Programming language syntax extensions

2.7.1 Standard operations

2.8 Meta-programming example

2.9 Comparisons and combinations with other programming methods

2.9.1 H igh level languages

2.9.2 Buddy system, Egoless Programming

2.9.3 Structured programming. goto-less programming

2.9.4 Chief programmer teams

2.9.5 Automatic program verification

- iv-

1

2

4

10

15

19

22

25

26

26

30

34

39

46

54

55

56

51

51

57

58

59

59

66

66

67

68

68

71

CHAPTER 3: Experimental Verification

3.1 Introduction

3.2 Experimental approach

3.3 Experimental environment

3.4 Experimental setup

3.5 Measurement methods

3.6 Task specifications

3.7 Productivity accounting

3.8 Potential sources of measurement errors

3.9 Experimental results

3.9.1 Early experiments group (Projects A and B)

3.9.2 Project C

3.9.3 Projects 01, 02 and D control

CHAPTER 4: Conclusion

4.1 Conclusions from the experimental results

4.2 Recommendations for future work

APPENDICF.S

A: Programming Test

8: Format of the Measurement File

C: Project C System Description

D: Task Order for Project D

E: Summary of the Measurements

REFERENCF.S

INDEX

-v-

73

74

74

15

77

79

80

81

82

83

84

86

88

97

98

100

103

104

107

111

121

126

134

139

LIST OF ILLUSTRATIONS

1. ·Bui lding a runway

2. Organiz ing continuous process software production

3. Design strategy when i mplementation is inexpens ive

4. The effect of h igh productivity on software sharing

5. Approximate conversion factors relat ing the most common units of production and time

6 . Structure of a software production team

7. Contours of the function P(T,Q,M )

8. Localization of programm i ng error by b inary search

9. Organization of the experimen ts

10. Productiv i ty plots for Projects A+B and C

11. Productivity plots for Projects 01, 02, and 0 con trol

12. Productiv i ty plots for the participants i n Project D control

13. Cumulative production plots for Projects 01, 02, and D control

-vi-

6

8

11

17

20

28

43

48

78

85

89

90

91

CHAPTER 1: THE BUSINESS OF THE SOFTWARE PRODUCER

2

1 . 1 Introduction

The explosive growth of the computer industry is l i kely to contin ue in the com ing years.

Between 1960 and 1971, the number of appl ication areas for computers has grown from

300 to 2000, and it is estimated to reach 7700 by 1985 [Kosy]. The major problem facing

the i ndustry is whether programming technology can be improved to keep up with the

expected growth. Improvements would be necessary even if the current demands could be

wel l satisfied; i n fact, recent performance has been characterized as a "crisis" [NAT01].

further underl i n ing the urgency and importance of positive action.

There are many ways of approaching the problem: developers of advanced software

production tools and techniques and educational i nstitutions teaching the use of the tools

and techn iques wi l l certainly make i mportant contributions. Key contributions must also

be made by management and management scientists. The best results, however, wi l l come

from concerted efforts to improve the tools, techniques, and management as a total

production system.

H istorical development of programming technology confirms this view. In the late

n i neteen-si xties and early seventies, management of software production, starti ng with

fi rst level management and higher, was mostly concerned with al location of resources in

response to changing circumstances. Management had practical ly no d irect i nfl uence i n

techn ical matters; most, and often a l l , tasks which required detai led understanding of

software concepts were. entrusted to programmers, hence programmers, by default, could

maintain absolute product control [ Brandon]. Programming management handbooks had

to be content with recommendi ng management controls based solely on visible indicators:

specifications, flowcharts, comments, and other forms of documentation [Metzger]

[Weinwurm].

At the same time, computer theoreticians were developing significant new ideas

interrelating high level languages [Hoare-Wi rth] [Dahi-Nygaard], proofs of correctness

[Naur1] [ Fioydl ], and structured programming [ Dijkstra]. These developments did not

have an early impact on management practices, however. The reports from the important

1968-69 NATO Software Engineering conferences [NAT01] [NAT02] did not yet show a

desire to attack the recogn ized techn ical and managerial problems s imultaneously.

In 1971 Weinberg's book, "The Psychology of Computer Programming" al ready discussed

techn ical and styl istic issues together with new forms of organ izations and

interrelationships based on "egoless programming". Meanwhi le the state of the art in

engineeri ng theory was further advanced by the clarification of data structuring [Hoare],

new languages [Wirth2], new modularization criteria [Parnas1], and firm sty l i stic

pri nci pies [Kernighan-PI auger].

META-PROGRAMMING: A SOFTWARE PRODUCTION METHOD

A first clear break in this pattern of separate development of technology and management

occurred after practical experience with the Chief Programmer Team (CPT) organization

had been publ ished [Mi l l s] [ Bakerl]. I n a CPT, the Chief Programmer, in a first level

managerial position, provides technical leadership by programming critical program

sections and assigning specific subtasks to other team members. The organization relies

on a number of supporting techniques (3.8.7), especial ly on the institutional use of

structured programming.

[ Horowitz].

More recent books echo similar sentiments [ Brooks]

Meta-programming, the main subject of the present dissertation, and its host organ ization.

the Software Production Team (SPT) form an integrated method of software production.

J ust as in a CPT, the SPT's first level manager, the meta-programmer, is directly involved

in programming activities. The techniques supporting SPT, on the other hand, are

different from those used in CPT. These differences wil l be discussed in deta i l in Section

3.8. The most important feature of the SPT organization is the emphasis on optimizing

productivity in the simpler phases of programming, which are detailed design

(meta-programming), coding, and debugging.

The dissertation is organized as fol lows. Fol lowing the present introduction, alternative

task environments are proposed for the software producer. The purpose of the argument

is to motivate the specific optimization criteria for the SPT organization. Productivity is

shown to be the key parameter. The leverage provided by a highly productive producer

can be used to simplify schedul ing, design, and other forms of decision making.

Difficulty of communications within a production unit, caused by the rapid creation of

problem specific, local, l anguage, is posited as the major obstacle to the improvement of

productivity.

Technical detail is fi rst presented in Chapter 2. Within the SPT organization, language

creation is careful ly control led by the meta-programmer, who issues written

meta-programs defining local language and specifying the program logic as wel l .

Technicians write and debug the actual code on the basis of the meta-programs. A

n umber of conventions for increasing the effectiveness of meta-programs, and for

organ izing the debugging activ i ty are a lso presented. A detai led example is given to

i l l ustrate these ideas. The chapter concludes with a series of comparisons and possible

combinations of SPT and other software engineering concepts, CPT in particular.

Chapter 3 describes a series of experiments which were performed to measure the actual

performance of the method. The experiments were designed to serve as demonstrations of

practical capability. The problems the experimental teams worked on, required between 3

and 12 man-months of effort. One of the experimen ts incl uded a Software Production

Team and a traditionally organized control group working on the same programming

3

4 CHAPTER 1: THE BUSINESS OF THE SOFTWARE PRODUCER

problem. Chapter 4 summarizes the results of the experiments and offers some

conclud ing remarks.

1.2 Software Production as a Process Technology

In th is section we shal l d iscuss possible answers to the basic questions faci ng the software

producer: What is the product? Who are the customers and who should they be? In

[Drucker] Peter Drucker convincingly shows that the way the business is defined,

together with the choice of customers, can determine the viabil i ty of a business enterprise

or a whole industry. The contemporary software industry has defined its m ission as

follows:

The software product is the complete (tested and documented) set of software

components which satisfies some data processing need. The customer, or user, i s

the entity with the need.

It is unfortunate that such a definition did not appear in the general literature, and had to be composed by the present writer. As such, it needs some clarifications. First, the products actually delivered may or may not satisfy the definition. A lso note that system programming is not excluded under the broad interpretation of the term data processing favored by [Naur2].

Such a product has been certainly useful and saleable. Nevertheless, serious problems have

arisen . The cost of the software product, both i n absolute terms and relative to hardware

costs, has been risi ng d ramatically [ Boehm] [Aron]. Often, producers have been unable

to l i ve up to their promises: software was del ivered late, incomplete or otherwise not

satisfyi ng the user's need. Managers of large systems fought heroical ly as schedules

slipped and hundreds of man-years were consumed [Brooks]. What has gone wrong?

Observations by [Metzger] showed that the problems have been often due to unstable

problem defin i tions, unrealistic deadl ines and poor plann ing. Other sources ([Boehm],

[Royce]) reported s imi lar l ists. The common factor i n the causes cited is uncertainty.

The stabi l i ty of the problem defin i tion is uncertain, the deadl i nes are uncertain and so on.

Uncertainty is general ly disl i ked; hence, the propensity of managers - and the advice of

many experts - is to strive to el iminate it by plann ing. Thus, general ly, the problems

were defined and deadl ines scheduled with the utmost care permi tted by the

circumstances. When the uncertainties i nherent in the problems and the schedu les

perturbed the plans, projects fai led or produced disappoin ting results. Instead of

ascribing such events to the lack of sufficient preparation and plann ing, they might be

more precisely diagnosed as fai l ures to deal effectively with uncertainty . Valuable i nsight

into the proper treatment of uncertainty can be gained from the experiences of other

industries.

META-PROGRAMMING: A SOFfWARE PRODUCfiON M ETHOD

Drilling for water or oil is a notoriously risky business. Yet the outfit which

performs the actual drilling operations is shielded from most of the risk by a

simple formula: they will drill for $1/foot in clay, $100/foot in rock until asked

to stop. The entrepreneur who commissioned the well has absorbed the

uncertainties of what lies underground: clay, rock, oil or more rock. The basis for

the absorption may be a scientific geological survey, intuition about probabilities

or a tax scheme. This is not to say that the contractor's operations are without

risks: the drilling contractor is responsible for tool changes, safety, the

productivity of personnel and dealings with the union.

Commissioned to build a new runway for an airport, the civil engin�ering firm

doesn't need to worry about the uncertainties of future needs for the runway. The

decision, right or wrong, has been made to proceed; partially on the basis of the

plans and cost figures submitted by the engineers. A Ready-Mix contractor will

perform the largest subtask in the project: the pouring of concrete. Since the

market price of poured concrete is fairly stable, the engineers' cost estimate is

really an estimate of the volume of concrete that will be required. The engineers

are thus responsible for ("absorb the uncertainties of") the validity of the plans,

while the contractor must produce, deliver and pour the concrete as the plans

require. These relationships are summarized in Figure l.

The above are "gedanken" or "thought" examples, idealized to suggest metaphors for alternative organizations of software production. After the inferences will have been drawn, the full impact of the painful realities of life will also have to be analyzed to reach a conclusion.

Uncertainty absorption is not the same as a reduction or elimination of uncertainty by

planning; it is merely a promise of action which enables others to operate free from the

"absorbed" uncertainty.

Partitioning responsibilities along the lines of the above examples results in a number of

remarkable developments. Since the participants individually have less to worry about,

specialization can take place. As new information becomes available, changes can be

implemented. If a project is unsuccessful, blame and financial burden can be correctly

assigned. Reputations can be established for reliability and capacity independent of the

merits and, to a large extent, the nature of the projects.

The engineers, for example, might have selected the contractor on the basis of his

good performance in an otherwise disastrous freeway project. The customer �

airport, in turn, trusts the engineers because of their previous successful execution

of a runway, although in a different part of the country. Uncertainty about the

local conditions will be borne mostly by the contractor.

The driller's formula permits his customer to stop the sinking of the well, which

he might want to do if recovered core samples look unpromising. A reasonable

5

6

PUBLIC

Needs air transportation

AIRPORT MANAGEMENT

Absorbs uncertainty about future transportation needs

Needs new runway

CIVIL ENGINEERING ORGANIZATION

Absorbs uncertainty about the abstraction of runway. Determines volume of concrete required

Needs concrete

READY-MIX CONTRACTOR

Absorbs uncertainty about delivery, price, quality

Pours concrete

(which turns out to be a runway)

(which satisfies future transportation needs)

Figure 1 Building a runway

META-PROGRAMMING: A SOFfWARE PRODUCfiON METHOD

minimum charge may be required to protect the driller from caprice. There is no

"loss" incurred by the driller when asked to stop, the stoppages merely tend to

reduce the average depth of the wells he drills. The extra business attracted by the

protection more than compensates for the inconvenience.

In these examples the flexible relationship between the contractor and his customer is

possible because the contractor's service is a continuous process, characterized by:

1. Small unit size. Units are determined by the boundaries where delivery can stop

without residue. Since small is interpreted relative to the requirements of a

customer, small unit size implies the expected delivery of large number of units.

2. Uniform production method for the units. Hence, the production process is the

continuous application of the production method to make the units. The

repetitive nature of the production process means that its properties can be

precisely measured for control and optimization. Scheduling of delivery becomes

a matter of reserving a portion of the productive capacity.

The units produced need not be uniform or interchangeable. Even the homogeneous concrete-mix ceases to be uniform when thought of as a product to be del ivered at a given place and a given time: two shipments could not be interchanged. The production process, however, is uniform for the delivered mix: prepare mix first, then load, del iver, and unload.

The key to understanding the difficulties of the software industry is the observation that

the software producer is expected to absorb too much uncertainty, in particular the

uncertainties about the customer's needs, the method of solution, planning, scheduling,

writing, testing, and documenting the implementation of the solution. To improve the

situation, the absorption of uncertainties should be partitioned and continuous process

production of software should be introduced. The characteristics of continuous process

production, as defined above, are manifestly incompatible only with the engineering

phases of software production: analysis of the user's needs, choice of algorithms, user

documentation, and acceptance testing. The production phases: detailed design, coding,

testing, and internal documentation, can be organized in a continuous process, as will be

shown in detail in the sequel. As a first step in implementing the partitioning suggested

by this distinction, we define the products of the production organization:

The units of production of the software production organization are lines of

proto-software which work toward the solution of well-defined problems. The

customers are software engineering organizations.

The relationshi ps between the user, the production organization and the engineering organization are summarized in figure 2. For brevity's sake, we wil l generally write code for proto-software.

The lines produced are interdependent in that they must fit together with other lines to

form procedures, modules or programs so that they can run on computers. Procedures or

7

8

USER

Absorbs uncertainty about desirability and form of solution

Needs software solution

ENGINEERING ORGANIZATION

Well-defines problem, absorbing uncertainty of the volume of proto-software required

Needs proto-software

PRODUCTION ORGANIZATION

Absorbs uncertainty about delivery, unit price, quality

Produces proto-software

(which is refined into software by the engineers)

(which satisfies the user's needs)

Figure 2 Organizing continuous process software production

META-PROGRAMMING: A SOFfWARE PRODUCTION METHOD

modules may be the most common units of delivery, but small pieces of replacement code

may also be offered; at any rate, units are small and rule 1 above is satisfied. Lines are

the units of charge. They represent tangible incremental value for the customer, because

they can be individually associated with some aspect of the customer's problem, and they

are ready to be used in an environment which already exists or is formed by the other

lines delivered.

The question "How many lines or units is a program?" is just like asking how

many feet of concrete is a runway? Well, 100 ft. is not, 5,000 ft. is, and so is

10,000 feet. Which is the better runway? That depends on the airport's needs. If

the needs change, existing runways may be lengthened or the building of a new

runway may be cut short - which is not the same as leaving the runway unfinished.

Most problems cannot be solved by any single line of code, so the product, in general, can

only contribute to ("work toward") the solution of a larger problem. This allows the

producer to concentrate on the rate and efficiency of production, or productivity, and

charges the engineers with the responsibility of estimating the volume of code that will be

required.

The technical term proto-software is used to distinguish the product from user software

which is refined from proto-software by the engineering organizations. Proto-software

comes in a single quality grade; it is, say, 99.7% correct. Refining improves the quality

further, as required. The other technical term, well-defined problem, implies that the

engineering organizations have absorbed substantial uncertainties in the process of

well-defining the users' not so well-defined problems. Indeed, well-defining is just the

engineering partition of the conventional design phase. The difference between

well-defining and the other partition, production design, is precisely that the former

absorbs uncertainty. Production testing and refining are similarly related. When

production testing reaches the 0.3% errors/line level, distinctions between actual

production mistakes and singularities due to definitional and user uncertainty become

blurred. At this point further testing is the best performed by the engineering

organization. When the engineers expose an error or decide on a change, they will ask the

producer to deliver the replacement lines of code.

The savings perceived by the end user will depend on the increases in the engineers' and

producer's productivity, weighted by the fractions of their respective participation in the

total effort. Our approach for getting the largest savings will be to obtain large

productivity gains in the production phase and at the same time ensuring that the value

contributed by the producer can dominate the engineers' share. Implicit in this strategy is

the belief that methods for the significant improvements in engineering productivity are

already available, for example in [Dijkstra] [Parnasl] [Wirthl] [Hoare]. . However, the

question of how engineering practices might be influenced by the access to highly

9


productive production organizations is a new issue which will be discussed in the next

section.

1.3 Design Strategies when Production is Inexpensive

Design is easily identified as the critical phase in software production. In typical projects

40% of the effort is spent designing [Boehm]; moreover, the quality of design greatly

affects the project schedule [Brooks]. In this section, we shall explore ways to reduce the

sensitivity of production costs to design.

The outputs of design are choices; to design is to make design decisions. There are two

activities supporting the decisions: first, alternatives must be proposed; and second, the

alternatives must be evaluated. The difficulty of creating alternatives ranges from

outright discovery to the simple recognition that a standard approach might work. The

evaluation of alternatives might take the form of intuitive or rigorous proofs of

correctness, and performance analyses.

It is often attractive to accept the overhead of conversions of related problems into the

domain of applicability of a highly productive technology. A manifold increase in the

productivity of software implementation would make design much more expensive

relative to implementation. The "distortion" of the price structure would tend to reverse

established preferences. In the paragraphs following, we shall elaborate on such

"reversed" operational decisions, which would be appropriate when the incremental design

costs exceed the cost of equivalent production. Figure 3 illustrates how the lowering of

implementation costs tends to push more decisions into the region of reversed preference

below the diagonal. As the coordinates indicate, the decisions involved must be capable

of converting, further decision making and implementation. It is interesting to note that

reversed decisions can be readily observed when implementation costs are of little or no

importance - as in certain phases of a space project or in emergencies.

I .3.1 Implement without exploring all alternatives

It is seldom possible to explore all alternatives for a decision, therefore the issue

here is a matter of degree. For a cost-effective approach, the cutoff point in

considering alternatives should be determined by the cost of further deliberations,

compared with the expected incremental value. If no further decisions are

pending on the choice, that is the current decision is an independent one, the best

result that can be expected is that the implementation will not have to be redone.

Thus the cost of implementation has direct bearing on the incremental value of

decision-making.

�IMPLEMENT

/ /

Figure Ja For typical decisions (area around the arrow) cost of additional design is less than cost of equivalent implementation.

�IMP LEMENT

/

/ /

/

Figure Jb If implementation costs are lower, implementation may be preferred for design (shaded area).

1 1

12 CHAPTER 1 : THE BUSINESS OF THE SOFTWARE PRODUCER

Discussion of non-independent, or basic, decisions is outside of the scope of this dissertation. However, it should be pointed out that there are methods available to convert basic decisions into independent ones by hiding the information about design decisions in modules [Parnasl] (1.3.4). This makes the effective handling of independent decisions even more importanl

Some guidelines for controlling independent decisions may be the following:

Some decisions are operationally unimportant. For example: a space/time

tradeoff opportunity in a situation where both space and time are plentiful.

Many decisions turn out to be operationally unimportant. For example: if

there is an important limit on space, space tradeoffs are consistently

made. If the limit is not reached, some of the tradeoffs become, in

retrospect, unimportant.

Sometimes seemingly important decisions are relatively unimportant. Such

situation may arise with the discovery of a serious problem which dwarfs

the existing ones. As a corollary, while there is some probability of a

serious unknown problem existing, the importance of all decisions is

diminished.

Sometimes only implementation can suggest the right decision, and then a

pre-implementation decision is meaningless. Such is the case for many

human engineering and user requirement problems.

Implementation often suggests ways for better decisions. This means that

decisions are simpler to make and are more reliable the second time.

([Brooks] Chapter 2).

These observations can be combined into a startling but viable strategy: make the

meta-decision to consider all independent decisions initially unimportant. For

decisions which belong to the first four of the above five categories, this

treatment will be, in fact, proper. In the remaining fifth case, when the decision

"bounces", our loss will not be total since we are guaranteed valuable clues for the

correct decision.

Unimportant decisions should be made by reference to standards or by conscious

arbitrariness. The important thing is that the decisions be made, and made swiftly.

In management science, this principle has long had many adherents. In [Morrisl] Robert McNamara is paraphrased as saying, " In the past hour I have made a number of decisions resolving controversies [regarding the standardization in single clothing items among the services] which have been going on since the Department of Defense was created. None of these decisions was important. The important thing is that I made a decision. [We should learn to] make unimportant decisions quickly because action is better than inaction".

META-PROGRAMMING: A SOFTWARE PRODUCfiON METHOD

In conventional design practice, this strategy is not applicable because the expense

of implementation or the schedule demands (or is perceived to demand) success

on the first attempt The low cost implementation is the crucial ingredient which

enables the conversion of design problems into a stream of unimportant and

independent decisions which can be processed efficiently.

It is worth noting that truly important decisions are not only expensive to make,

but they are also dangerous! By definition, the effects of errors in important

decisions can be disastrous. By contrast, unimportant decisions cannot, by

themselves, cause much harm. When dealing with unimportant decisions, the

designers' effectiveness can be measured, controlled and optimized continuously;

an improvement from 80% to 85% correct decisions, for example, may be

considered significant

1 .3.2 Implement alternatives beyond a satisfactory one

In the previous section, we discussed how a decision-maker may bet on the

adequacy of an alternative without detailed evaluation of others. The low penalty

for a losing bet, that is the low re-implementation costs, combined with the

savings in evaluation costs, make the bets attractive. After a satisfactory solution

has been demonstrated, another type of bet may be made on the possibility of a

re-implementation being even better. Again, the lower the implementation costs,

the more appropriate the bet.

1 .3.3 Implement an experimental system as improvements to a test bed

The requirement of software producers that the problems be well-defined does not

exclude their direct participation in research efforts. The researchers, presumably,

are trying to extend the limits of the technology in some area. To take advantage

of the leverage provided by the producer, they should first retreat and well-define

a system which is within the limits of technology, but not too far from the

eventual goal of the research. This system is called a test bed, and it can be

implemented by the software producer. Research can then proceed by piecemeal

extensions of the test bed into the experimental domain. Throughout the research

project, the researchers will benefit from a complete and working system, and

continuous feedback on the validity of their approach.

1 .3.4 Implement alternatives instead of making a critical choice

If the parallel implementation of several alternatives is initiated, the problem of a

priori evaluation can be replaced by the considerably simpler a posteriori

1 3


measurement. The price of conversion is high: all but one implementation will be

wasted. Still, severe scheduling constraints may tip the balance in favor of

accepting the price and postponing the decision. The option of aborting

alternatives prior their completion should be retained; the ability of the

continuous process producer to stop producing can be very helpful. The

implementation of modules should be ordered with special emphasis on the

earliest resolution of the major uncertainties.

1.3.5 Implement instead of analysing or simulating

Analytical tools and simulation are often used to predict the behavior of a

complex system without recourse to implementation. Nonetheless. implementation

is intellectually less demanding and measurements from even a partial

implementation may yield more precise or more credible results than simulation

or the analysis of a simplified model.

1.3.6 Rewrite instead of modifying, translating, or bootstrapping

Solving a problem by modifying an existing, related implementation has obvious

advantages: presumably, the cost of the new implementation will be reduced by the

value of the re-used portion of the existing one. However, the cost of

understanding the properties of the existing software, so that the proper

modifications may be determined, should also be considered. Although recent

developments in making software more readable [Dijkstra] tend to decrease the

cost of understanding, implementation costs may decrease even more and offset

the advantage of re-use in most cases. Implementation from scratch will also

involve "understanding", or production design; nevertheless, for ro·utine problems,

it may be less expensive than the engineering design which would have to absorb

the uncertainties about the modifications. Also, the more complex the problem,

the smaller the probability of the existence of a related implementation.

1.3.7 Implement general rather than special solution

If the straightforward generalization of a special problem can be implemented at a

small extra cost, it. is often reasonable to do so. The general solution is more

likely to tolerate the inevitable escalation of demands; if there is a performance

penalty, the solution can be easily particularized.


1.3.8 Implement special rather than general solution

If the design of a problem turns out to be especially d ifficult, the possib i l i ty of

i mplementing a scaled-down, special sol ution should be considered. The special

implementation can be helpful in a number of ways:

I t may show that the problems are more serious than thought

It may suggest an approach to the general problem (1.3.1).

I t can be used as a test bed ( 1.3.3).

I t will insure agai nst a complete fai l ure s ince at least a part of the original

problem wi l l be solved.

1.3.9 Implement backup algorithms

The choice between alternative i mplementations ( 1.3.4) can be delayed until

"run-time", when a dynamic decision can be made depending on system load,

normal- or restart operating mode, or even user preference. This option can be

taken instead of compromising between opposing requi rements, for example

efficiency versus robustness or beginner versus expert user interface.

I .3. 10 Implement non-essential features

A tightly coupled engineering - producer complex may experience transients of

unused productive capacity. During such periods there is an opportunity to

implement discretionary additions to the software product, such as improved

reactions to errors, improved output formats, defaults, and so on. Such features

are easy to well-define; they generate enthusiasm, and they often turn out to be

indispensable after all .

1.4 Process TecJmology and Software Sharing

If an engineering or production organization can utilize the same software program to

solve two seemingly different problems, their effective productivity, as perceived by an

outside observer, is doubled. The program is said to be shared between the applications

which use it. Effective productivity can be greatly increased by sharing more software,

each among more applications. Moreover, if the shared programs are to be used within

the same system, it is often possible to save memory space using standard virtual memory

techniques [Dennis-VanHorn]. The commonality in the solution can make

documentation, training and use of the product easier, too. For example, if the l ine

1 5

16 CHAPTER 1 : THE BUSINESS OF THE SOFTWARE PRODUCER

editor code is shared, the line editing conventions in a time sharing executive and an

interactive debugger will be the same.

Despite these considerable incentives, shared software is not prevalent, for reasons that

can be surmised from the conditions of successful sharing:

First, commonality between problems must be recognized. A· re-formulation of

one or both problems may be necessary to make the commonality apparent

Second, the uncertainties of the shared approach have to be absorbed over and

above the uncertainties of the individual problems. The shared solution will be

more complex and more expensive than any of the individual solutions to the

problems; within the limited context of any single problem, sharing is not

attractive.

Sharing is most common when the conditions are easily satisfied. For example, the need

for mathematical functions is easy to recognize and the small uncertainties of their

sharing (such as domains, overhead when not in use, error conditions etc.) were cheerfully

absorbed by the high-level language designers and implementors. More complex software,

however, will be shared only if some organization has the intricate knowledge of the

applications to recognize commonality and if they also have responsibility for the

implementations so that the substantial uncertainties of sharing can be balanced by the

local benefits. It is also apparent that the conditions are independent of programmers'

attitude toward writing sharable code. This suggests that any attempt to improve software

sharing by exhorting programmers to "reform" is futile.

The counterargument from reductio ad absurdum points out that programmers might simply refuse to write sharable code. However, by assumption, the uncertainties of sharing have been absorbed and hence the problems can be solved independently. lgn\')ring the ethical problems, the refusniks need not be told at a l l that they are writing programs that might be later shared.

The same method can be appl ied to many other controversies: documentation, comments, exhaustive testing, use of various tools or other confl icts between local and global values. A manager could absorb the uncertainty about documentation, for example, by rewarding a programmer exclusively for doing documentation as planned, regardless of sl ippages in project schedule or unappreciative co-workers. Once the uncertainties are removed, the controversy disappears. It is a separate question whether or not the enforced methodology is actually useful.

The engineering organization in Figure 2 is a natural niche for software sharing

responsibility. The engineers can, in principle, recognize commonalities in the flow of

problems from different users, and they are also experienced in uncertainty absorption.

The high productivity of the proposed organization will amplify this sharing potential, as

shown in the following paragraphs.

Consider a software producer operating indefinitely in a perfectly stable production

environment, without any changes in personnel, computer systems, languages, or methods.

EFFECTIVE

P RODUCTIVITY

P ROJECT

LIFETIME

I I I I I

TAKEOFF POINT

��""��------------� t

Figure 4a Small group is unable to accumulate critical software mass (shaded) within project lifetime.

Figure 4b If group size is increased, subgroups will form. and the critical

mass will increase.

Figure 4c By increasing productivity, the small group can reach the takeoff point.

17


Let us further postulate that within this environment al l sharing opportunities are

exploited. Such a producer would be able to accumulate an extensive l ibrary of the

stereotyped computer science problems: assembler, loader, compi ler, operating system,

information storage and retrieval, l inear programming, graphics and so on. At some

point, it would be d iscovered that the next problem - for example a support system for

large systems described by [ Brown] - can be developed by large scale sharing of l ibrary

i tems. By large scale sharing, we mean the sharing of substantial portions of complete

systems as contrasted with the, more common, small scale sharing of modules.

An excellent example of large scale sharing is the lnterLisp system described in [Teitelman] in which the services of Dwim, the Programmer's Assistant and other powerful applications are available to one another as well as to the interactive user or to the user's programs.

We shall cal l the the point in time where large scale sharing is l ikely to commence the

takeoff point. Operating in the post takeoff regime is exceptional ly rewarding: the

effective productivity wi l l soar and the product qual i ty wi l l benefit from the synergy of

sharing.

In real i ty, the properties postulated for the producer can only be approximated.

Ind ividual programmers working alone wi l l take advantage of almost a l l sharing

opportuni ties. Small , tightly kni t groups can come very close to optimum because of the

number of interactions necessary for recognizing commonal i ties in the problems is stil l

relatively low. The development of the effective productivity of such small producers is

depicted on Figure 4a. Note that the finite project l ifetime (symbol ized by dashed l ine

on the figure) prevents the accumulation of the critical software mass (shaded area) for

large scale sharing. The l ifetime may be determined by local values which d ictate

termination on the achievement of a l im i ted goal. Even if the producer is interested in

achieving as much as possible, the project l ifetime wil l be l imited by natural personnel

turnover, people losing interest, external schedul ing constraints, or computer systems,

languages and methods becoming obsolete.

A producer may try to reach the takeoff point within the project l ifetime l imi t by

assigning more people to the task. Unfortunately, as the number of interactions grows

steeply with the number of people, sharing opportuni ties w i l l be missed. Formal ly or

informally, subgroups of manageable size wi l l form, consciously excl uding the possib i l ity

of large scale sharing between the subgroups in order to control the cost of interactions

and allow work to proceed. The result is shown on Figure 4b: if the work force is

doubled, the critical software mass doubles, too, leaving the takeoff point beyond reach.

On the other hand, a smal l production group operating at sufficiently high rates of

production can produce the critical software mass within the project l ifeti me l imi t as

shown on Figure 4c. Thus we may conclude that h igh productivity can do more than just


reducing unit costs; it wi l l make large scale sharing possible, increasing effective

productivity and product quali ty.

To summarize the argument up to this point, we have proposed uncertainty absorption for

improving the software industry's abi l i ty to deal with the uncertainties inherent in large

software problems. Uncertainty absorption - the promise of action which enables others

to operate free of the uncertai nty - is particularly simple when production can be

organized as a continuous process which can be measured, control led, and hence,

optimized. We divided the software production task i nto an engineering phase, in which

the user's problems are made wel l defined; and production phase, in which proto-software

is produced by a continuous process. The proto-software is given back to the engineers

for refinement to create the final product

If proto-software is inexpensive, design methodologies should be changed to conform to

the new economies. To this end, we l i sted a number of methods to divert effort from

design to implementation. These methods offered new uses for the proto-software

product; for example, for exploring alternative approaches. Thi s also impl ied that some

fraction of the proto-software produced wil l never be refined, since its purpose wi l l have

been fulfi l led entirely within the engineering organization.

Finally, we noticed that add itional benefits can be reaped from enabl ing small production

groups to amass software l i braries which can be shared on a large scale.

1.5 · Measures of Software Productivity

Productivity is tradi tionally defined as the relationshi p between the output of goods and

services and the inputs used in their production. Applied to software production, the

output of program bulk should be expressed as a function of the inputs: the time of

programmers and other labor and possibly overhead costs. In general, this function

depends on the size and type of problem being programmed [ Pietrasanta] [Brooks].

Once the domain of d iscourse is held reasonably constant, two simpl ifying

approximations are justified: all inputs may be expressed in terms of programmer-hours

burdened with the overhead, and the productivity function itself may be taken to be

l inear. Even if the simpl ifications yield crude results, they may be useful in establ ishing

lower limits, the actual functions being always worse than l inear [ Brooks].

The way to obtain the simpl ified productivity measure is then to take a bulk measure of

the software produced, such as l ines of source, and divide i t by the number of man-hours

associated with i ts production. The results of measurements are often expressed using

different uni ts. The approximate conversion factors relating the most common units are

summarized in Figure 5 .

19

20

1 l ine (high level lang) � 1 statement

26 characters

1 l ine ( low level lang)

1 man-month

1 man-year

5 machine instructions

1 machine instruction

1 70 man-hours

2000 man-hours

Figure 5 A pproximate conversion factors relating the most common units of production and time.


Two objections are often made to productivity measures. Some argue that the variations

between ind ividual productivities is too l arge for the measure to be a useful predictor.

Experimental results showing differences as large as 1:26 are often quoted [Sackman].

It is hard to see how the employment of a programmer with, say, 5 times lower

than average performance would be economical ly justified (Note that 5 is the

approximate geometric mean of 1 and 26). Even disregarding salary and

overhead, if this person spends more than 20% [ Mayer-Stalnaker] of his time

communicating with other, 5 times more productive, programmers, making an

equal demand on their t ime, his total contribution will be negative!

Weinberg attributes such results to "ambiguous programming objectives" ([Weinberg]

page 128). In Weinberg's experiments, two groups were given the same problem

descri ption which also incl uded expl icit statements of objectives. The objectives set for

the groups were different, however. The variation of results was greater between the

groups than with in. We can expect, therefore, that uncertainty absorption wil l greatly

reduce the variation of individual productivity among programmers with comparable

training.

The other common objection is that management interest in l ines per man-hour wil l

merely i ncrease the bulk of programs by encouraging programmers to "write insipid code"

[ McCl ure] [Cw]. Indeed, many misguided attempts might have had this result. The

correct approach is not to ask the programmers to be "more productive" but rather to

organize for productivity and reward the programmers for making the organization work.

Peter Drucker's comments are remarkably appl icable ([Drucker] page 267): "It is

fol ly to ask workers to take responsibi l i ty for thei r job when the work has not

been studied, the process has not been synthesized, the standards and controls have

not been thought through, and the physical i nformation tools have not been

designed. I t is also managerial incompetence".

I t is significant that the defin i tion of productivity and the defin i tion of the product of

software production closely correspond - this is a di rect consequence of viewing

production as a continuous process. I t can be said then that the busi ness of the software

producer i s productivity. To improve productivity is to improve the business.

The software producer in· steady-state would program a stream of small units of

approximately equal complexity - all problems being wel l-understood ( 1 .2). The accuracy

of the simple l inear productivity measure wi l l be very good under such conditions. The

precise productivity figures wil l be important to the producer for fine-tuning the

production process, and also to the engineering organization (Figure 2) for quantification

of the uncertainties to be absorbed.

21

22 CHAPTER 1: THE BUSINESS O F THE SOFTWARE PRODUCER

How can productiv i ty be improved? One way is automation. In software production,

automation means the use of Artifical Intel l igence, very high level languages and

automatic proofs of program correctness. Whi le both the volume and the qual i ty of

research in these areas are h igh, practical results are not expected within the next 5 -10

years [ Balzer] [Deutsch]. There remain the short term solutions to improve productivity

by i mproving on the current manual techniques. Although such sol utions do not compare

wel l with the long term promises of automation, there are areas of current practice where

substantial and immediate improvement could be made. One such area is the uti l ization

of the programmers' time. A revealing set of measurements is quoted i n

([Mayer-Stalnaker] page 86). According to this reference, the observed programmers

spent 14% of thei r time read ing and 1 3% writing "with a l i st, card, or worksheet i n

evidence", that i s in "productive capaci ty". "Talking o r l istening (Business)" took 17%.

The time of inexperienced programmers and trainees is especially poorly uti l i zed. They

are often given either meaningless tasks [ Metzger] or inordinate responsibi l i ty, and thus

are al lowed to fai l or cause harm. Clearly, there is room for improvement

1 .6 What Determines Productivity?

Our problem, then, is to find organizational methods to increase programming

productiv i ty. To approach this problem we shal l f irst explore the space of possi ble

solutions by investigating the parameters on which productivity depends.

The productivity of a programmer working alone on a problem is determined by the ski l l

and motivation of the programmer, and by the tools used. There are two reasons why

most problems cannot be solved by a single ind ividual and hence must be solved by teams

or organizations: First, the problem may involve subtasks which require extraordinary

ski l l s possessed only by specialists in that area. The team approach then becomes

imperative if the ski l l s of the specialist are incomplete with respect to the whole

problem. The other reason is that the productivity of an individual is insufficient to

solve most problems within the requi red time.

The productivity of a group depends only partially on the productivity of i ts members, at

least two other factors also have to be considered: specialization and communications.

Specialization is the concentration of effort to a l imited field of activ i ty. If the

concentration is consistent over the long term we speak of area specialization, where the

area might be, for example, numerical analysis or channel programming. In the short

term, the field of concentration is simply the subtask being solved and we have subtask

specialization.

M ETA-PROGRAMMING: A SOFTWARE PRODUCTION METHOD

A rea special i zation is often the sign of special ist's outstanding abi l i ty and motivation.

Because of his long term concentration in the area, the special ist can also acqui re greater

ski l ls and hence, within his area, h is productivity wi l l be better than non-specialists'.

Outside of his field, the area special ist i s l i kely to perform worse than non-specialists,

because of lack of experience and motivation. The concl usion is that the attractiveness of

area specialization depends on the long term importance of the area for the organ ization

employing the special ist. As a corol lary, if solving a problem requires area special i zation

which is otherwise unattractive, the attractiveness of the problem is reduced.

Subtask special i zation has the same features as area special ization, but on a smal ler scale.

Subtask special ists certainly get better acquain ted with thei r own subtask than. with other

aspects of a problem bei ng solved, and thei r productiv i ty wil l r ise on a learning curve.

Dependi ng on the size of subtasks, this i ncrease in productivity may not be very large.

On the other hand, the requirement of long-term i nterest is all but removed, so in the

long run, the subtasks undertaken by an organization and assigned to a person may vary

considerably.

Coordination of special ists is necessary to make sure that the subtask partitioni ng remains

valid as the original concepts are developed by implementing them. Development here

simply means the continuous introduction of detai l or other effects of work being done.

We define a proto-solution as an incomplete solution which can be developed into the

solution of the problem. A partitioning is val id if i t can be i ntegrated into a

proto-solution.

Coordination in any form requires communication of information, which in turn requires

expenditure of effort. This means that specialization also has a negative impact on

productiv i ty, by siphon ing effort away from d irectly productive activities. The cost of

communications is then the other important factor in determin ing the productivity of a

team.

If unchecked, communication costs can grow very fast as team size, and hence subtask

special i zation, i ncreases; i n the l imi ting case the number of potential communication

channels is a quadratic function of the group size ( [ Brooks] page 18). It is somewhat

surprisi ng, however, that communications become more difficult as productivity increases,

even if the n umber of channels is held constant. Th is wil l be shown i n the following

paragraphs.

In typical productive activities involv ing communicating special ists, the activ ity specific

language used for communication is wel l known to the communicants. I t is easy to

remain proficient i n the languages of these activi ties because the languages tend to change

very slowly, the rate of their growth being related to the rate of introduction of new

concepts or abstractions into the process.

2 3


Consider, for instance, the office i n a l ife insurance company handl ing cla ims

([Drucker] page 220). Ski l led special ists working on cla ims of different

complexity can communicate i n the wel l defined language of the trade. Events

causing changes i n the language, such as i ntroduction of a new pol icy type, are

rare and, at any rate, independent of the productiv i ty of the claims office.

Software production differs considerably from other productive activities in this respect

The elements of computer science, computer languages, standards and operational

procedures, form a slowly changing, fundamental, language, the global language of a

software production environment Any production activ ity , however, wi l l give rise to a

special ized, local, language. The production process involves the creation of abstractions

even at, or very near to, the productive l evel; therefore the rate of introduction of

abstractions is necessari ly coupled to the rate of production or productivity. The greater

the productiv ity , the more rapid the change in the language which wi l l tend to impede

further progress.

The term hash table i s understood by any programmer and it properly belongs to

a global language. However, i f in the course of producing a large program a hash

table is needed, a new, more specific, abstraction, say HSHTBL, is created whose

properties are imperfectly covered by the generic term. The new abstraction wi l l

enlarge the local language and entai l comm unication costs.

To be able to d iscuss newly created abstractions without c ircumlocutions, a typical

communication is prefaced with a set of definitions which we shall cal l the dictionary.

The operation performed by the source of the communication wil l be called language

creation while the recipient's action wi l l be cal led learning the language. Creation of the

abstractions themselves is to be d istinguished from creation of language; the latter denotes

the addi tional effort necessary for molding abstractions into communicable form.

CHAPTER 2: META-PROGRArvtMING

26

2.1 I ntroduction

This chapter presents the major thesis: an organizational schema, the Software Production

Team, designed to fulfi l l the requirements of a software producer. The emphasis in this

organization is on the improvement of productivity by simplifying communications

between the programmers. Section 2.2 wi l l propose the use of the wheel network type of organization to m inimize the number of communication channels and to central ize the

important language creation (1.6) function. Language learning ( 1.6) wi l l be overlapped

with task performance to effect further savings.

Meta-programs, as described in Section 2.3, are informal, written comm unications, from

the meta-programmer, who creates the local language, to the technicians who learn i t and

actual ly write the programs. Feedback communications from the technicians to the

meta-programmer are very efficient, because no language creation or learning i s

involved. Meta-programs are characterized more by the ir purpose than by any specific

form or syntax.

In Sections 2.4 and 2.5, the abstract notion of local language is resolved into the questions:

what are the objects that should be named, and what should their names be? The answers

involve the concept of painted types (related to types in programming languages), and

naming conventions based on the idea of identifying objects by their types.

Section 2.6 addresses the problem of debugging in a h igh productivity environment The

method of error localization using state vector syntax checking is descri bed. This method

involves, first, the preparation of procedures to check the run-time consistency of data

structures, and second, a binary search strategy for swift error local ization.

Section 2.7 introduces additional useful meta-programming conventions. The role of

meta-programs in documentation i s also discussed. A complete meta-programm ing

example is presented and analysed in Section 2.8. Final ly, i n Section 2.9 we consider the

relationships or contrasts between the meta-programming organization and the relevant

software engineering concepts of h igh level languages, egoless programming, structured

programming, Chief Programmer Teams, . and automatic program verification.

2.2 Optimizing Software Productivity

We proceed to consider organizational schemes and their effects on the most important

parameters determining productivity. By maximizing the contributions of the parameters

we can find a local maximum which we shall select as the point of interest.

First, the parameters affecting ind ividual productivity - skil ls anq tools - can be

conveniently separated from the group factors, which are specialization and


communication. Considerations of possible i mprovements in program mers' ski l l s woul d

involve deep questions of computer science education. The problems of bui lding

improved or new tools, such as h igh level languages, edi tor- compiler- debugger

complexes, augmentation systems, are also very d ifficult; yet the possib i l i ties are already

well covered in the l i terature [Teitelman] [Engel hart] [Geschke-Mitchel l]. The present

work wi l l excl ude discussion of these questions. Instead, we wi l l assume some realistic constant qual i ty of the avai lable skil ls and tools, and concentrate on the question of optimal organization which will achieve our goal s. This approach retains the option of

uti l izing new ski l ls �nd tools as they become avai lable.

The group factors - special ization and comm unication - are i nterrelated in compl icated

ways. The merits of any given organizational choice must be evaluated by s imultaneous

consideration of i ts combined effects on all group factors.

For increased productivity, communication costs m ust be decreased, consistent with satisfying the essential communication requirements of the organization. The options

number three: the requirements themselves may be decreased by suitable partit ioning of

subtasks; waste of communication capacity can be m inimized by distr ibution on a strict

need-to-know basis, and finally, the most efficient med ium and language can be used i n

each instance.

Note that these and the following comments apply only for task-oriented and not socio-emotional or other supportive communications [Katz-Kahn].

The importance of communications to software production was very explicitly elucidated i n ([NATOI] page 89) Suggestions made there i ncluded proposals covering each of the above points: "effectively structuring the object to be constructed and ensuring that this structure is reflected i n the structure of the organi7.ation mak ing the product" (Dijkstra), need-to-know type controls, and using automation for communication efficiency ( remote consoles, text edi ti ng).

We shal l choose the fol lowing aggregate of organizational schemes to accomplish our

purpose:

wheel network (Figure 6) as the model for the communication channels and task

partitioning in a team of programmers.

new language wi l l be created only by the central node i n the wheel network.

task oriented language i n written form for most comm unications.

The wheel network is a two-level h ierarchical structure consisti ng of a central node and

other nodes which are connected to the hub by the spokes of the wheel. We shal l cal l the

central node the meta-programmer and the other nodes wil l be called technicians (these

designations wi l l be justified later). The com plete network wi l l be referred to as a

Software Production Team or s imply team.

27

28

TECHNICIAN

MET A-PROG RAMMER

LOCAL LANGUAGE

COMMUNICATION CHANNEL

Figure 6 Structure of a Software Production Team


The attraction of the wheel organization l ies in the simpl icity of i ts topology. This

intuition is reinforced by experimental results in psychology which generally confirm that

the efficiency of groups in task performance is greater in wheel networks than in other

networks admitting more channels (for references see [Katz- Kahn] page 237).

Relying on his central position, and having exclusive l icense for language creation, the

meta-programmer can control the d istr ibution of information on the basis of need-to-know. The sum total of new language d irected toward, and learned by, a given

technician is the technician's local language, which i s, in general, d isjoint from other local

languages as shown in Figure 6. The technicians wi l l be subtask specialists not only by

what they do, but also by the local language they understand. The lack of common

language wil l tend to minimize the informal and expensive i nformation flow between

technicians outside of the h ighly optimized channels (but see the note above on

supportive communications). The meta-programmer may be considered an area special ist,

specializing in language creation and meta-programming.

Return, or feedback, communications from technicians to the meta-programmer are

particularly efficient because the language used wil l be known to both communicants.

Thi s point is made in antici pation of tradeoff possibi l i ties between costs and error rate of

forward communications. With efficient feedback avai lable for error correction, the

uncorrected error rate may be allowed to rise and costs can be reduced.

A serious drawback of the wheel organization is that it cannot grow arbi trari ly. The

bottleneck is clearly in the central node, so the team size wi l l be l imited by the

meta-programmer's abi l i ty to perform as the number of technicians increases. The

precise figure for the maximal team size should be determined by experiment, but a

common rule of thumb for managers ([Metzger] page 85) suggests an upper l imit of four

technicians in a team. The question of growth beyond this l imit wi l l be treated in Section

2.9.5.

Except for certain responses to feedback, al l communications from the meta-programmer

to the technicians will be in writing, describing specific programming tasks the

technicians should perform. These communications are the meta-programs, so called

because they describe the steps to be taken when writing a particular computer program.

New language wil l be introduced by including definitions of new terms i n the

meta-programs; expl icit explanation using terms already establ ished wil l always

accompany initial usage. Since the meta-programs wil l be avai lable in written form, the

techn icians wi l l be able to consult the definitions at any time, and thus accompl ish the

tasks and learn the new terms in paral lel. Ideally, the learning process should be

completed at the same time as the task itself, in which case the full insvuctional potential

of the task is exploited and the enriched language can be profitably used as early as the

29

30 CHAPTER 2: M ETA-PROGRAM MING

next task. To start the implementation sequence, the f irst task wil l be described in some

global language (see Section 1 .6), and the fol lowing tasks w i l l use the progressively richer

local language.

The order of local language introduction readi ly fol lows from a design obtained by

stepwise refinement and expressed in terms of levels of abstractions [ D ij kstra] [Wirth!].

Since we want the language of the first task to be the s implest, and later tasks to use

language introduced earl ier, the levels of abstractions w i l l have to be visited from the

bottom up. Note that this does not i m ply that the d esign itself has to be prepared

bottom-up or in any other particular sequence; it appl i es only to the order of the

combined communication and implementation of a design.

The main advantage of the proposed scheme is that the time spent by a technician communicating is reduced to a negl igible fraction: most of the received i nformation wi l l

be processed while performing production tasks; minor clarifications w i l l be obtained by

referring to the written material, and verbal feedback wi l l be necessary only if the

meta-programs contain incomprehensible or inconsistent parts. The cost of writing the

meta-programs wi l l be more than offset by the savings in communications.

2.3 Task Orders and Meta-programs

The key communications within a Software Production Team, as wel l as between the user

and the engineers or the engineers and the producers, a im at getting some software task

performed. We shall use the term task order to denote such comm unications. The

essential characteristics of task orders are the following:

they carry authority to initiate expenditure of effort;

they are instruments of uncertainty absorption;

they must be interpreted in the context of som e global or local language;

a task order uniquely determines some fami l y of programs; members of this

family are equivalent in thei r abi l i ty to fulfi l l the intent of the task order.

Fi rm intent, resulting from uncertainty absorption, can be expressed in a task order by

the use of powerful local language, or by being as expl ic i t as necessary given the ava i lable

global language. Conversely, l icense to follow any prudent course of action, especia l ly in

areas of lesser importance, can be granted by omission of specific i nstructions.

The form of a task order may vary considerably depending on the language available to

those wishing to communicate. For example, al l of the following three communications

can qual ify as task orders under plausible c ircumstances:


1. Write an ALGOL-60 compiler for the Xvz computer. Implement the full language

except for integer labels, arrays cal led by val ue and dynamic own arrays. Use the

reference character set of the Revised Report, available on the ABC terminal.

Implement l/0 as in GIER-ALGOL 4.

2. Implement GcD(m,n) as follows:

El. [Find remainder.] Divide m by n and let r be the remainder.

E2. [ Is it zero?] If r=O, the algorithm termi nates; n is the answer.

E3. [ Interchange.] Set m+-n, n+-r, and go back to step El.

3. Type the following:

procedure TREESORT (M, n); value n; integer array M; integer n;

begin procedure siftup(i ,n); value i ,n; integer i ,n; begin integer copy, j;

copy := M[i]; loop: j := 2 • i;

if j < n then begin if j � n then

begin if M[j+1] > M[j] then j := j + 1 end; if M[j] > copy then

begin M[i] := M [j]; i := j; go to loop end end;

M[i] := copy end siftup; integer i; for i := n ; 2 step -1 until 2 do siftup(i ,n); for i := n step -1 unti l 2 do

begin siftup(1, i); exchange(M[1],M[ i]) end end TREESORT

These examples d iffer greatly in the richness of the operational language. In the first

example, which is a specification for a routine problem, a basic agreement is apparent

about the extremely complex meaning of the term "compi ler" si nce no further

performance, implementation or rel iabi l ity specifications are given. Mutual trust and powerful local language may have been developed dur ing long-term professional

association between the commun icants. Uncertainty absorption by the customer is evident

in the exclusion of certa in expensive language features and the explicit selection of

i nput/output style. All this reminds us of a typical shopper who selects the style and

color of a dress with great care, while rely ing on the shop's reputation for qual i ty.

The second example (an adaptation of Eucl id's algorithm as stated in [Knuth]) uses much

s impler language: a mix of Engl ish, algebra and basic computer science. Th is language is

31

32 CHAPTER 2: M ETA-PROGRAMMING

understood by most college sophomores. The precise meaning of the imperative verb

"implement" is , again , i mpl icit; it is plausibly establ ished by a short-term association

between the communicants. There is very l ittle uncertainty left about the in tent of the

task order, s ince it not only specifies the algorithm, but also suggests a specific

implementation by expl icit looping instead of, for example, recursion. Depending on the

local language, the meani ng of the terms "divide" or "terminate" may also be highly

specific. Th is task order in troduces new language by nami ng both the variables and the

steps of the algorithm. However scant, the new language may be useful, as i n the response

to feedback seeking help: "Print m and n before the interchange!".

Although the thi rd example looks l ike an ALGOL procedure [ Floyd2], it is rather a

request to a typist. The communicants presumably have an understanding about the

requi red fidel i ty and about the " implementation" of the special characters ; , .?_, and

boldface. For the recipient typist, the operational meanings of all characters in the

communication (whether they belong to del imiters, identifiers, constants, or comments)

are equ ivalent, to wit: cause a sim i lar mark to appear on a sheet of paper.

Task orders covering the full range of complexi ties i l l ustrated above may appear i n

different areas of software production. The style of the first example is typical of

programming product specifications passed from a user to a software engineer, or from an

engineer to the leader of a production team. Use of di rect quotation, as in the thi rd

example, is quite proper for modules accepted as black-boxes, where deta i led

understanding of the insides would be rather d ifficult and would serve no i mmediate

purpose. Most local operational procedures for job control, assembly or loading are i n

fact i n this category.

In the Software Production Team organization, meta-programs are the particular task

orders given by the meta-programmer to the technicians for elaboration, that is for the

purpose of creating the actual computer software fulfi l l ing the intent of the orders. Since

a meta-program is just one step removed from a computer program, i t must show

considerable detail , and may be closely related to programming languages. In this respect,

the second example may be representative. Differences between the informal description

of an algorithm (from which the second example was adapted) and a meta-program arise

because the meta-programs possess the properties of task orders. Whi le an algorithm is

an option (one may take i t or leave i t), a meta-program embodies the decision that the

algorithm it represents is, in fact, the proper one for the problem at hand.

Meta-programs can be implementation specific and they may rely on local language.

Publ ished algorithms, on the other hand, are always described in a global language.

The preparation of a detai led plan for a program before coding commences has been long

considered a good programming practice. The use of flowcharts, deGision tables, HIPO


charts, or other Program Design Languages are often recommended. (see, for instance

[ Metzger] [ Horowitz] [ Barry])

The advice in the excellent style manual by [ Kernighan- Piauger] reduces the i ssue

to its essence: "Write first i n an easy-to-understand pseudo-language: then

translate into whatever language you have to use."

A meta-program i s a flexible med ium whereby the detail ed design can be i n i tial ly stated

and iteratively improved. I t can be also used to document the program, as noted i n

[ Kernighan-Piauger]. Moreover, the completeness and correctness of meta-programs, and

therefore their documentation value, is enhanced by operational use d uring

implementation. It should be stressed, however, that the main purpose of meta-programs

is not to be a design or documentation aid, but to disseminate detailed design information

efficaciously. In particular, meta-programs generally omit the reasoning behind the

particular decisions. This is partly because using only the local language already

i ntroduced (2.2), the reasoning m ight be d ifficult to state. The reasons may also be i rrelevant, obvious, and/or un important (1.3).

The syntax and semantics of meta-programs are determined by conventions, which are

essentially admin istrative rules. Uncertainty about the value of the conventions is

absorbed when the team is organized; the meta-programmer and techn icians can proceed

forthwith, assuming that others wi l l comply with the rules. The stabi l i ty of this

organization wi l l depend whether the rules are simple and unambiguous, and whether it is

easier to comply than not. Non-compl iance should result in i mmed iate calamity which

ampl ifies the culprit's appreciation of the intrinsic, if temporarily mal igned, merits of the

broken rule.

Probably the most basic convention is that technicians should precisely follow the

decisions in a meta-program. I t is clearly easier to com ply with this rule than to

embroil oneself i n redundant decision mak ing. If, the convention

notwithstanding, the techn ician changes a seemingly inconsequential decision, such

as the name of an object, the meta-programmer can point out the difficulties

which could be caused by such uni lateral action. Feedback communications would

become less efficient, other techn icians might have already acted on the original

decision, and the meta-programs would have to be updated to reta in their

documentation value. This, however, does not mean that the techn icians cannot

influence the detai led design; they can always feed back thei r observations to the

meta-programmer, particularly if the meta-program is plainly in error.

It is sign ificant that conventions need not involve special software aids. Conventions can

be adapted to existing c ircumstances: the computing environment, avai lable uti l i ties,

implementation language and so on. They can be adjusted as d ictated ·by experience and

33

34 CHAPTER 2: META-PROGRAMMING

measurements to optim ize the continuous production process. Exceptions can be made

whenever appropriate.

Conventions are also expected to improve productiv i ty by s imp1 ifyi ng or altogether

e1 iminating acts of decision making. Thus sma11 excursions in the cost of implementing a

standard decision, relative to other options, are not necessari l y of primary interest

For selecting conventions, analogies with programming languages are very usefu1. I n the

remainder of this chapter we shaH explore how type declarations, type conversions, and

other programming language related extensions can simpl ify the writing of meta-programs.

2.4 Abstractions and Operations

The task of the meta-programmer is to prepare the detailed design of some software and

to put the design into an easi ly communicable meta-program. In thi s section we shall

describe how the we1 1-known concept of type can be used to s imp1 ify the preparation of

meta-programs.

From the early h igh-level language concepts of integer and real types, there emerged the

modern software engineering view that types are classes of values associated with which

there are a n umber of operations which apply to such values [ Dahi-Hoare] [Morris3].

The sign ificance of this tenet i s that i t is truly language i ndependent, i ndeed i t i s

appl icable to h igh level languages as wel l as machine languages or hardware

implementation. The term operation is to be interpreted broadly; i t covers arithmetic and

other operators, assignment, subscripting, procedure cal ls or even peripheral i nput/output

operations, however they might be represented. The type of any val ue can be uniquely

identified by l isting the operations the value takes part in. I t is obvious that even i n a

sma11 program the number of different 1 ists thus obtained wil l be greater than the number

of readi ly identifiable types such as i ntegers and reals, and therefore new constructions

are necessary for the expression of the "excess" types.

While a new piece of software is being created, such an inspection of uses is infeasible,

and if the identification of type is desired, clairvoyance is cal led for on the part of the

designer. What needs to be predicted is: can the variable under consideration share all

operations with some other existing variable? If so, the i r types are the same, otherwise we

have a new type. The prediction ptocess can be s impl ified by looking for differences i n

the fol lowing properties of the variables compared:

card inal i ty of the class of values;

physical d imension (length, time, mass etc.), if a physical quantity is being

represented;


unit of measurement (hours, seconds, words, bytes etc.);

origin of measurement (GMT, local time, starting at 0 or 1 etc.).

Any d isagreement wi l l exclude the possibi l i ty of sharing a l l operations. If they agree, further investigations are necessary, of course.

The process of determi ning types is i l l ustrated by the following examples:

Example 1:

Program for centering a card i mage ([Kernighan-Piauger] page 55). If the input

is:

ABC 1

the output shall be:

ABC 1

The method is to "read the input in to the m iddle of a large array of blanks and

write out the appropriate part with the right number of blanks on each side".

This method was suggested by the availab i l i ty, in FORTRAN, of certa in operations

and the lack of other ones. The i nformal plan for the program is:

1. create array A containing 120 blanks

2. read . card image (80 columns) in to the l ast 80 locations of the array

3. find position L and R i n the card of the leftmost and rightmost non-blank characters defin ing the text "body" to be centered

4. get N, the number of blanks to precede the body

5. output 80 columns starting in the array so that the right n umber of blanks precede the body

To find the types we examine the quantities appearing i n the program. First, we

have A, an array of characters. The associated operations are: read and write 80

characters starting at a given i ndex, and fetch and store a character C at i ndex I .

This immediately introduces two· new types: characters, which can be compared

for equal ity as wel l as stored in A; and i ndices to A, which can take part in loops

( incremented, decremented and compared) and, by defin i tion, i ndex any array

with the same type as A. Are L and R such i ndex types? The program could be

written that way. However, the plan impl ies a conceptually simpler interpretation:

L and R are the fami liar col umn n umbers 1 through 80 on the punched card.

They form a new type, the n umber of different possi ble values (80) bei ng

different then the cardinal i ty of the index type (120). Column numbers can be

enumerated i n loops, converted to indices by the operation " +40" and the

35


difference of two column n umbers may be taken to yield N- 1 . The quantity N

belongs to yet another type representing a count of columns. Al l of the in teger

operations are defined for the count type, moreover, it can be added or subtracted

from an i ndex or column, yielding another i ndex or column provided only that no

overflow occurs.

Considering the s impl icity of the problem, the number of different types may

seem rather large. However, extensions to the problem - to include left and right

flush formats - could be programmed using j ust the types introduced. Types

appear quickly but their number stays almost constant as a program is expanded

with more operations on the basic objects.

Example 2:

I n-core sort program TREESORT (Section 2.3). At least three types can be

associated with the quantities involved: i tems, which can be compared; the array of

i tems, M , which wi l l be sorted with respect to the comparison using the operations:

fetch and store i tem at some i tem index; and, i tem indices. The latter can be

enumerated in loops and, in TREESORT, m ultipl i ed and divided by 2 . The l ength n

of the array M, i s also of the i tem i ndex type. This can be easi ly seen: i , i n the

outer block, is clearly an i tem index, and both i and n appear as the second

parameter to the procedure siftup, therefore they are of the same type. One can

in terpret n as the index of the last i tem, since i ndexing starts with 1 in this case.

These examples show that the idea of types i s i ndependent of how the objects belonging

to the types are represented. All scalar quantities appearing above - column numbers,

indices and so forth - could be represented as in tegers, yet the set of operations defined

for them, and therefore their types, are different. We shal l denote the assignment of

objects to types, independent of thei r representations, by the term painting. When an

object is painted, it acqui res a disti nguish ing mark (or color) without changing i ts

underly ing representation. A painted type i s a class of values from an underlying type,

col lectively painted a un ique color. Operations on the underlying type are available for

use on painted types as the operations are actual ly performed on the underly ing

representation; however, some operations 'may not make sense with in the semantics of the

painted type or may not be needed. The purpose of painting a type is to symbol i ze the

association of the values belonging to the type with a certain set of operations and the

abstract objects represented by them.

The col umn numbers of Example 1 , for instance, are painted integers. Indeed, i t

is impossible to find any other properties of column numbers which might be

considered essential. The fact that column numbers belong to the subrange type

[ Hoare] of integers i n the closed interval [1:80] is certa in ly neither unique nor

invariant if other subrange types over the same interval or conversions to other


card formats with, say, 90 columns are considered. The operations of the column

number type (loops, +40 and difference) are simply inherited from the underly ing

i nteger type.

Any type can be painted, and painted types can take part in the construction of aggregate

types, such as arrays and records, providing an additional degree of type d iscrimination.

Arrays are the simplest representations of mappings from integers (often restricted to a

subrange) to array elements of some possibly different type ([Hoare] page 115). The mapping operation is called subscripting. I t yields a reference to an element given the

subscript, an integer value. Now, since painted types can inherit the operations of the

underlying types, values of any pai nted type based on integers or integer subranges could

also be used as subscripts. If the domain type i s distinguished by pai nting, the type of an

array should be properly characterized by the pair of domain and range types instead of

j ust the range type alone.

Records are aggregate types d iffering from arrays in the fol lowing respects: the elements

are called fields, the types of the fields need not be the same, and the elements are named

by a fixed set of field names. Records are used to col lect quantities of arbitrary types for some common purpose: a record may contain the properties of a complex object, the local

variables of a block or parameters of a procedure instance [ Lampson-Mitchell]. In the

latter two cases, the common terms for the field names are variable and formal parameter

names, respectively. References to fields are obtained using the field selection operation

which takes a record and a field name as arguments. For variables, parameters, and

sometimes for other fields [Wirth2], the record is specified i mpl icitly.

A n umber of advantages accrue from precise type specifications. Firstly, type checking

can be more thorough.

In Example 2, the complete description of the type of the array to be sorted, M, i s

{array with domain item index and range item}, instead of {integer array} or

even {item array}. Specifying the array type this way excludes incorrect

statements of the form:

M[copy] := M[j];

where both copy and j are represented as integers, but one is an i tem and the

other is an item index. The fol lowing statements also contai n type errors, not

otherwise d iscern ible:

M[j] : = j; j : = M[j] ;

The second advantage is related to the first: the set of possible (or legal) uses of some

quantity is small and it is impl ied just by the type of the quanti ty. This is leads us to the

37


idea of coercions [Wijngaarden], or implicit type conversions. We define any operation

which is uniquely determined (within some domain of discourse) by i ts operand and

result types, as a type conversion. It is then expected, that many operations can be

expressed impl icitly just by mentioning the types of the operands and the result

An early appl ication of coercion was the automatic conversion of integers to reals

and vice versa. The former operation (floating) is un ique, the real to integer

conversion, however, can be defined in truncated and rounded versions. By

convention, only one of these - usual ly rounding - is considered for coercion.

The un ique conversion operation from column numbers to indices of Example 1,

i s "+40". Using coercion, the i llegal expression A[L] could be transformed into

the correct A[L+40] where L i s a column number and A demands an index as

subscript. In Example 2, subscripting into the array M converts an i tem index i nto

an i tem. The i l legal expression j > copy could be coerced into M[j] > copy, since

the relations are defined only for l i ke types and there is no conversion from i tems

to i tem indices.

The conversions between painted types and their underlyi ng types may be

considered as the trivial operations painting and unpainting. Thus, in i := 1 , the

integer constant "1 " is coerced into an i ndex type by the { pain t index type}

operation. The inheritance, by pain ted types, of the operations of the underlying type, could also be explained as a conversion of the painted type, by unpainting,

followed by the original operation. For instance, terms of the relation M[j] >

copy may be first coerced into i ntegers, by unpain ting, and then the ")" operation

defined for i ntegers can be appl ied.

Note that a reference to a variable is also an operation, it is the selection of a field from

an implicit record, the local frame of a procedure or a block [ Lampson-Mitchell]. If the

type of the variable is unique within i ts scope, the reference can be made, in fact, by

coercion from that record. Since the record is i mpl icit, it is sufficient to demand the

type, and the variable is determi ned without any expl icit naming. One way the demand

can be made, is by omitting some arguments of an over-determined type conversion

operation which is un iquely identified by the types of the arguments provided. The

operation will then demand the remain ing arguments by their types. Alternatively, an

operation can be specified explici tly and then the omission of any argument will create a

demand for a value of some type.

META-PROGRAMMING: A SOFTWARE PRODUCTION M ETHOD

The use of coercions necessarily reduces the error checking potential of types because an

error may be inadvertently coerced in to a legal, if meaningless, expression. An explic it

signal when coercion is expected can prevent this k i nd of mistake. Another source of

error is i n troduced when a n umber of possible conversions exist and. by convention, one

is designated for coercions. The intent of what is written may be i ncongruous with this

choice.

The connection to meta-programming i s now evident coercions can make the descriptions

of operations and their operands concise. The expressive power of coercions is derived

from the resolution of types; more detailed type specifications mean more opportunities

for coercions.

In summary, we have shown how to i ncrease type resolution by painting. The color of a

painted type represents the association of the type with operations. Painted types can be

clustered in arrays and records; the element selection operations of subscri pting and field

selection can be thought of as type conversions. When the combination of operand and

result types is unique, a conversion operation can be i mplicit and it is called a coercion.

Moreover, references to simple quantities - such as variables - can be also obtained by

coercion if the quantity is considered to be a field in some i mpl ic it record. The purpose of using coercions is to make the part of meta-programs describing operations concise.

2.5 Naming of Types and Quantities

Deciding on the name of a quantity is the prototype of decisions which are unimportant

in themselves, but appear frequently enough to have an i m pact on productivity.

Considering the narrowly defined requirements of productivity, name creation should be

speedy, preferably automatic (automobi le l icense plates are such liames). Names should

be short to min imize writing or typing (or keypunching) time, to reduce the number of

mistyped names and, perhaps, to stay wi th in bounds of existing l im i tations. Names of

extreme brevity or extreme s imi larity should be avoided, however; otherwise s imple

mistakes may transform one val id name into another, rendering some checks, such as

declarations, i neffective. Lastly, names should assist in the association of the name and .

the named quantity; that is, they should be mnemonic.

The most common mnemonic device is to express by the name an important property of

the named quantity, The association is readi ly made i n both d irections: seeing the name,

one learns an important property of the quantity which, in turn, leads to other

properties. Conversely, given the quantity, i ts important properties are known, hence the

name is suggested.

39


In the business oriented language COBOL, there is a standard defin ition for the

quantity larger than all others i n the collating sequence. The name given for this

quantity is: HIGH-VALUE. This name is mnemonic because i t reflects an

important property of the quantity represented.

In example 2.3.3, the quantity named copy is indeed the copy of M[ i] . It requires

deep understanding of the algorithm to see why the property of being a copy of

something else is important i n this case.

A n umber of problems arise with this practice: a quantity may not have any significant

properties, or it may have so many that i t is d ifficult to remember which one was

chosen. Note that the latter problem mostly affects the association in the d irection from

the quantity to the name. In other cases, the important property may be difficult to

express concisely. Yet other quantities share their most important property, complicating

the association from the name to the quantity.

These problems can be exhibited by naming, respectively: the loop variables i n TREESORT, giving rise to the ubiquitous i; the main hash table of variable

identifiers in a compiler, which may be the MainTable, HashTable and so on; the

stack reference to the lexicographical ly enclosing block in an ALGOL runtime

system; or the special value used as a "high del imiter" in COBOL. The actual name

defined for the last quantity is UPPER- BOUND, easi ly confused with HIGH-VALUE.

These problems considerably complicate the naming decisions. The selection of the

property to be expressed by the name takes time, especial ly if shorter names are sought

Nevertheless, it would be a mistake to abandon mnemonic names, because the

development of local languages depend mostly on the ease of learning of new names.

We shall simpl ify the naming process by introducing a compound naming scheme: we

shall select a single property, appl icable to al l quantities, for the major qualifier part of

all names. This part wil l provide enough resolution to identify a single quantity in most

cases, or at least to reduce the number of quantities matching the description to a few. In

the latter cases, a second minor qualifier property wil l be chosen appropriately to provide

un ique identification of the quantity. ·The simpl ifications l ie in the el imination of

explicit decision-making in some cases and the substi tution of a simpler decision for a

more d ifficult one in others. The selection of the minor qual ifier is simple because the

number of quantities to be distinguished is smal l - practically any property would do. In

view of the concl usions of Section 3.3, the property for major qual ification wil l be the

quanti ty's type.

There are many examples of compound naming and using types as qual ifiers i n

programming languages and systems. The early algorithmic language FORTRAN,


for i nstance, encoded the types of variables in to the first letter of their names:

ICOUNT was manifestly an i nteger, RsUM a real, and so on. Actually, this

convention was meant to assist the compiler in assigni ng the proper representation

to the variables.

In ALGOL-W [ Hoare-Wirth] and SNOBOL 4 [ Farber-Griswold-Polonsky] as well

as i n other languages, the procedure creating a new i nstance of a record type i s

named the same as the record type i tself. S ince this procedure is the only object

named by the record type, no minor qualifiers are necessary.

Many time-sharing executives (for example SDs-940 or TENEX) include a type

identifying extension in to all fi le names as a m inor qual ifier. Thus the source

text for some program may be stored i n file PROG.TXT and the compiled binary

version of the same program might be called PROG.BIN. The extensions denote

true types, s ince they determine the operations which may be performed on the

fi les: a text fi le may be edi ted or compi led and a b inary file may be run.

For conciseness and ease of creation, primi ti ve types and some of the painted and

aggregate types wil l be described by two- or three letter tags, abbreviating the spoken,

i nformal, type name. For the other types, the description wi l l be constructed from the

descriptions of constituent types. The construction schema may be standard, or it may be

defined when needed. The schema for arrays, probably the most important one, can be

stated thusly: let X, Y be the descriptions of the domain and range of the array,

respectively; the description mpXY wil l be used for the array type. The reason for short

tags is now evident: longer tags would make unwieldy constructions.

Let us assign tags to the types of Example 2.4.1 as follows: use en for column numbers, c i for character i nd ices and ch for characters. The major qual ifier for

the array A of characters wi l l be mpcich .

Qual ifier construction schemes are not restricted to aggregate types. Consider, for

example, the difference type dX, generated by the ari thmetic d ifferences of any pai r of

objects of type X. A comprehensive l ist of useful schemes is given at the end of this

section. Note that there are no record ,construction schemes on the l ist: i t appears that

records types are independent of the number and types of their fields and are best

described by new tags. This is supported by the fol lowing argument: Fields of a record

represent properties of an abstract object. The reason for add ing a new field, representing

another property of the same object, is to extend the set of operations or to make existing

operations more efficient. This action will not change the type of the record.

Let X be a type, as determined by a set of operations. I f this set is changed, the new set determines type X'. In principle, X is not identical to X'. However, since after the change there remain no objects of type X, we may safely claim that the types are the same.

41


To ensure the sufficiency of the resolution, types should be first d istinguished by painting

as described in the previous section. If groups of identically typed objects remain ,

strongly related objects can be organized into arrays, and new scopes can be i ntroduced to

separate the more loosely related ones. New scopes are created by declaring records or

procedures, for example. Fields need to be identified only within a record and

parameters within a procedure instance. These steps are also good programming practice;

hence in a properly constructed program which uses painted types, type resolution is

probably as good as it can be. Conversely, unseemly type resolution may be an indication

of poor design. We shall return to this poin t later.

In spite of proper specification of types and scopes, in some cases multiple values i n the

same scope, belonging to the same type, need to be d i stinguished, ostensibly by minor

qual ifiers. Since the success of the compound naming scheme depends on the sparing use

of minor qualifiers, the probabi l i ty of such an event should be estimated by enumeration

of the reasons for distinguishing values. Whether a distinguished value is a constant or i s

given by reference to a variable or array element possessing i t, is largely irrelevant in this

case. In either case, a potential for conflict is presen t.

In case of the arrays, values of the i ndex type identifying the d isti nguished array elements must, in turn, be d istinguished. Aggregation of values i nto arrays can el iminate only unnecessary names. Actually. there is an i ndependent advantage to aggregation: operations which need to enumerate all values are s impl ified.

Constant values do not require names if written as constants, such as 3.14 or 'string'. It is good programming style� however, to treat constant values as potential variables, in which case the value has to be named.

Val ues within certain types must be individually distinguishable, in particular, a large

number of procedures, Boolean variables (flags) and values of an enu merated type

[ Hoare] may conceivably appear in some scope. Compound naming offers some help, i n

that the selection of the minor qualifier i s indeed simpler if distinctions need to be made

within the type only, rather than among all objects with in the scope of the type.

In many types, a certa in value i s d istinguished to represent the "empty" or nil object. If

the val ues of a type are ordered, the min and max values are often d istinguished. These

cases can be handled by standard minor· qual ifiers l isted below.

Lastly, identical ly typed variables, parameters or fields may appear in the same scope.

Assuming a stochastic model of random assignment of types to quantities, the expected

number of minor qual ifiers, M , is a function of the number of types, T, and the number

of quantities per scope, Q . Contours of this function are plotted in Figure 7. The plot

reveals that for T = Q, the probabi l i ty that three minor qualifiers will suffice, is better

than 80%. Measurements by [Geschke] ind icate that for 82% of scopes, Q � 8. With the

expedient trick of distinguishing between parameters and local variables by a prefix (see

1 2

1

2

3

4

5

6

7

Q 8

9

1 0

1 1

1 2

1 3

1 4

1 5

T

3 4 5 6 7 8

Q P(T,Q,M) = C(T,Q,M}/T , where

9 1 0 1 1 1 2 1 3 1 4 1 5

C(T,Q,M) = i f T= 1 then ( i f QS..M then 1 else 0} else

M

.E (� ) C(T- 1 ,0-i,M}

i =O

43

M = 2, p > 90%

M = 3, p > 90%

M = 3, p > 80%

Figure 7 Contours of the function P(T,Q,M): The probabil ity that a selection. with replacement, of size Q from T i tems contains less or equal than M repeti tions of any i tem.


below), Q may be halved. The trivial examples i n section 3.3 show that T is l i kely to be

at least 4 and probably m uch larger.

Experience suggests that the property first considered for m i nor qual ification should be

the quantity's position in a spatial or temporal order. Thus the values represented are

often the first or last i n some i n terval, or they are initial, old, new, previous, current or

next in temporal sequence.

The sign ifi cance of compound naming is enhanced by additional benefits. The presence

of the type in every name is extremely valuable for coercions, type checking and general

documentation. Some type checking can be performed even without detailed knowledge

of the tags or operations by a form of "type calculus", not un l ike the d imensional checks

of physical equations:

Let X and Y denote arbitrary tags. Clearly, the types in the expression: mpXY[X]

� Y are consistent. Simi larly for: mpXdY[X] � mpXY[X] - Y.

The type calculus is also useful for defin ing type construction schemes:

Given arbitrary tag X, define dX to be the type such, that X + dX i s also an X.

The abi l i ty to identify the types of objects may be a major reason for fol lowing the

conventions in situations where compound naming is otherwise awkward. Consider the

enumerated type: co = {coRed, coYellow, coGreen} . The choice of names could be

considered inferior to the straightforward: color = {red, yel low, green} were i t not for

the type indication. Besides, making the decision to make an exception is probably more

expensive than the val ue of the difference.

A d ifferent kind of check is made possi.ble by associating semantics with the standard

minor qual ifiers. For example, last may be defined to mean the upper l imi t in a closed

interval . Now, if X and Xlast are to be compared as part of testing whether X belongs to

an interval, there wil l be no doubt that the proper operation is X � Xlast as opposed to X

< X last. By rigidly adhering to the standard semantics for the minor qual ifiers, many of

these common "off-by-one" mistakes [ Kt;rn ighan-Piauger] can be avoided.

A summary of standard major and minor qualifiers is given in the fol lowing table. (X

and Y denote arbitrary tags, throughout. Note that whenever some operation is used in a

definition, the appl icabi l i ty of the operation to instances of the actual operand types is

assumed).

pX pointer to X. Let $ be the indirection operation. $pX is then an X.

aX address of X. paX is an X.


eX counts instances of X (not necessarily al l i nstances). For example, ceo could be a

counter counti ng colors which appear i n a graph (assuming the type definition co

above).

dX first d ifference of X. X + dX i s an X.

mpXY array (map) with domain X . and range Y. mpXY[X] is a Y.

rgX short for mpiXX, array with domain iX and range X.

iX domain of rgX.

IX l ength of an i nstance of X i n words (this construction is useful i n system

programming languages).

tX temporary X, the same type as X. A somewhat inelegant but efficacious device to

d istinguish between parameters and local (temporary) variables i n procedures,

thereby increasing major qual ifier reso lution.

Xmin min imum X val ue: for al l X, X 2. Xmin.

Xmax max imum X value, for all X used as a subscript, X < Xmax. We note that i f

Xmin=O, Xmax i s the cardinality of the domain of mpXY. Xmax=O means the

domain is empty.

Xmac current maximum X value: when X is the domain of some array which is used as a

stack, max may be used to denote the allocated size of the array whi le mac keeps

track of the portion actual ly used, act ing as the top of the stack pointer. For a l l X

used as a subscript, X < Xmac; Xmac � Xmax. Xmac=O means the stack is empty.

Xfirst first X val ue in some closed in terval . For al l X in the in terval, X 2. Xfirst.

Xlast last X value in some closed interval. For all X i n the in terval, X � Xlast. If the empty interval is allowed, it is represented by Xlast < Xfirst.

Xnil dist inguished X value to represent the empty i nstance. May be used for checking

equal ity or i nequal ity only.

45


2.6 Debugging as an Organized Activity

Sin ce the design and creation of program text i nclude only manual checks of correctness,

it seems unavoidable that this intermediate product wi l l contain errors. The process of

local izing and removal of the errors is termed debugging. Other related terms are testing

and integration. The former denotes especial ly the generation of a range of stimul i and

checking the corresponding responses i n an attempt to uncover errors. The i nclusion of

integration in this category reflects the recognition that many errors are introduced when

already debugged components are combined. In tegration, thus, is in the m idst of, and all

but i ndistinguishable from, the debugging activ ity.

Data publ ished in [ Boehm] show that 30% to 50% of the total software cost is l ikely to

be spent on debugging. There are some reasons to bel ieve that meta-programming wi l l

reduce the n umber of errors i n the in i tial program text and thereby simpl ify the

debugging problem. The logic of all software wil l be scrutinized and understood by at

least two persons: the meta-programmer and the technician. The naming conventions

described in Section 2.5 provide additional opportuni ties for checking operator and

operand compatibil ity. Nonetheless, without mechanical checks of semantic correctness -

considerations of which have been excluded (for bibl iography see [ Deutsch]) - debugging could remain a serious problem, especially in view of the expected increase in the

production rate. Consistent with the plan laid down in Section 2.2, we shal l concentrate

on the question of optimal organization while assuming the availabi l i ty of realistic tools

to assist · debugging.

H igh debugging productivity means that individual errors are made apparent, local ized

and removed quickly. Given the volume of activity, i t is reasonable to assume that these

steps will be performed by the technicians. The nature and extent of the

meta-programmer's contribution is a key problem.

The first evidence of a software error, the error indication, may be incorrect termination

including fail ure to terminate, excessive use of resources, incorrect output, or an error

message. The actual error, the cause of the error indication is typical ly removed from the

locus of ind ication both in space and time.

The plausibi l i ty of this effect can be seen as fol lows. An error indication is a

coincidence of a statement capable of making the indication (trap, loop or output)

with the occurrence of erroneous operands wh ich actual ly cause the indication.

Assuming un iform distributions, the probabi l i ty of this coincidence occurring in

the vicinity of the error is low. The situation is compl icated by statements which

depend on erroneous data, but, i nstead of giving an indication, propagate the error

by producing erroneous results. The avalanche resulting from error propagation


i ncreases the probabi l i ty of early indication, but i t also tends to destroy evidence

and generally frustrate analysis.

Frequent checks of the reasonableness of the data pass ing through checking

interfaces, also i mprove the chances of early error i nd ication. This method,

however, is l imi ted i n its appli cabi l i ty. If data about to be used i s checked, for

example i n the case of dynamic bounds checks of array subscripts, the interface

mainly serves to prevent error propagation and to give an earlier and more

controlled indication than the one which would have happened otherwise. Checks

of results from operations are rare, because they would be but restatements of

what has been done immediately preceding.

When, i n a large system, a reference count of a certain class of pointers gets fouled up, that is usually not the fault of the procedures responsible for creating or deleting pointers which unconditionally i ncrease or decrease the count. On the other hand, the procedure which does i nconsiderately smash a poi nter or the reference count itself, is not l ikely to include any checks against that particular form of unexpected behavior. The i ndication of the error could be given by an i nterface check before a pointer is deleted refusing to decrement the zero count. This i ndication would convey very l ittle information about the t ime and place of the actual error.

We can conclude that, whi le i nterface checks can be val uable, the problem of localizing a

large n um ber of errors on the basis of scant i nformation m ust be solved. Localization i s often approached as if i t were a puzzle of the form: W hat could cause the observed error

indication? The solution space - the set of possible answers for this problem - is

extremely large, considering the n umber of possi ble i mmediate causes first, then what

could cause those and so on. A further complicating factor is that the reasoning involved

must go beyond the domairi of the abstractions and operations of the program since the

events reasoned about do not necessari ly take place i n a correct environment. Even in a

wel l protected h igh-level language envi ronment, an error wil l cause a transition from the

domain of the program into a more complex domain where the behavior of objects i s

constrained only by the most complete defin i tion of the language.

Most languages do not have i ron-clad protection. In such cases, or if the error is i n the language processor, execution after some errors is constrai ned only by the defin ition of the v i rtual or real machine. If the error is caused by operating system or hardware malfunction, the constraints can be even more obscure.

These observations suggest that concentration on the post-error regime, includ ing the

error indication itself, may be a mistake; instead, the question to be answered should be:

At what time does the program state change from correct to incorrect? The sol ution

space is a trace, a l ist of the program statements as they were executed. The important

property of this space is not i ts size, but that i t is ordered, and therefore an efficient

binary search can be used to find the correct to i ncorrect state transition point.

A binary search is performed as follows: consider the points of the lasi good state

and the earliest bad state. I n itially, these are at the start of the run and at the error

47

48

STA RT ER ROR PROPAGATION INDICATION

�----------------------�·��--------------------------------------------� state is correct state is incorrect

2

2

2

3

3 6 5 4

• • •

1

1

1

1

Figure 8 Localization of programming error by binary search. Probes 2, 3 and 6 found the state correct; 1, 4 and 5 found it incorrect.


i ndication, respectively. Choose a new probe poin t i n between and decide whether the state is correct there. If so, we have a later good state, otherwise, an ear l ier bad state. Update the points accordingly and repeat The search term inates when the points straddle

an erroneous statement, or a small area wherein the error may be found by i nspection

(Figure 8).

Two operations are essential i n this scheme: exhibit ing the program state at the chosen poin t, and decid ing whether a state is correct or not. In contrast with the "puzzle"

approach, the defin i tions of the abstractions and operations of the program provide a

sufficient basis for determi ning the latter. A n um ber of possible i mplementations for

both operations wi l l be described below, ranging from manual procedures to others

requiring extensive preparations and programming. By the term debugging strategy, we

shall mean the choices among the possible i mplementations. The execution of the search

schema, i nc lud ing the choice of intermediate probe points, we shall call debugging

tactics. This distinction wil l be used for assigning roles to the meta-programmer and the techn icians, respectively.

The simplest way to determine the correctness of a state is by manual i nspection of some

representation of the state. The representation may be a uniform octal or hexadecimal

dump of the bits comprising the state, test output, or a stored b inary image interrogated

by interactive means.

It js important that the representation be adequate for determi ning the correctness

of the data structures comprising the state. Let R be a transfer function, as

defined by [Morris2], such that for some W and for al l x of some type: W(R(x)) =

x. R is then adequate for the given type. Octal dumps or equivalent interactive

tools are clearly adequate for al l types. However, i nspection is m uch simpl ified if

an R transfer function, the test print procedure, i s written for every type to

produce detailed textual images for values of that type, with fields clearly labeled

and formatted accordi ng to their underlying type. (see Section 2.7.7)

Note that it is s impler to show that the state is wrong than that i t is right; a

demonstration of a single inconsistency being sufficient i n the former case while the .

latter involves universal quantification: consistency must be shown for all assertions

characterizing a correct state. This suggests that the inspections start by looking for

inconsistencies. The known earl iest bad state can give a valuable h in t as to where and

what to look for. The problems arise if no i nconsistencies are found this way.

One possibi l i ty is to accept the state as provisional ly correct if it does not contain the

inconsistency of the earl iest bad state. The search then will converge either on the error

i tself, or on an instance of error propagation in which case a new h int is obtained and the

whole procedure may be iterated. This procedure systematical ly uncovers the l i nks in the

49


causal chain of error propagation. While each binary search wi l l converge quickly to the

next l i nk, the n umber of l i nks i n the cha in, and therefore the time for local izing the

error, may be large.

To restate symbol ical ly: let Ab be an i ncorrect data structure, let Ag be an earlier correct state of the same structure in a provisionally good state. The search wi ll

converge on some operation Ab +- Ag ? B, for some structure, or group of structures, B. If B is correct, we found an error, otherwise we have Bb, a new h int

Alternatively, at the cost of evaluating all assertions, a state can be certified correct or

i ncorrect and the search wil l find the error d irectly. The expected large number of

complex assertions exclude the possibi l i ty of manual evaluation. Instead, software check

procedures which determine the correctness of instances of a given type, wil l be

combined to form an easily executable state vector syntax checker ..

The assertions the check procedures evaluate are very s imilar to those used i n

proving programs correct. The s imi larity ends there, however, because check

procedures show the val idity of the assertions restricted to a single, actual instance

of a type, while program correctness proofs extend over all values i n all possi ble

executions.

For example, the fol lowing assertions about a chained hash table are typical of

those appearing in check procedures:

Al l list pointers poin t with in the boundaries of the table.

The number of entries i n each l ist is less or equal than the total number of

entries (no ci rcular l ists).

The hash codes of all keys on any given l ist are equal and point to the

head of that l ist (keys are probably intact, l i sts are disjoin t).

The sum of sizes of entries on all l ists plus the free entries accounts for all

storage in the table (no lost entries).

If any assertions are found not. to hold, a check procedure can i mmed iately

terminate with some indication, ignoring other errors that might also exist. The

indication should identify the assertion which fai led. To assist in identifying the

erroneous value, the verifier should keep some easi ly accessible variables updated

with the type and address of the current value being checked. Further

information about the nature of the error can be gleaned from the meta-program

or code for the assertion.

It is not strictly necessary that the assertions be complete in describing the correct

behavior of the program. If an inconsistency is missed, in the worst case, the


manual procedure described above may have to be followed for one search

iteration. When the error is found, the check procedure can be updated with the proper test

Since a few missing assertions do not cause undue harm, some assertions may be

expl icitly omitted if their cost/benefit ratio is low. In particular, assertions with

memory are often as difficult to implement as the operations themselves, while excluding only rather obvious errors which are best local ized manually.

The most important property of a hash table is that it remembers the keys

that were inserted. The assertions expressing thi s property would involve

an i ndependent implementation of an associative memory to serve as the

model for the behavior of the hash table. The expense of producing the

i ndependent implementation wou ld not be justified by the small number

of additional fai lure modes it would cover.

Consider the memoryless consistency checks of a chained hash table

described above. They can determine whether any l ists are destroyed or

malformed, or if keys are destroyed (unless the bad data happens to hash

into the correct code). The add itional property ensured by a perfect

checker would be that the keys to be looked up, provided as parameters to

the hash table operations, are reproduced and compared faithful ly.

The fai l ure modes covered by the assertions with memory are - related to the smal l

number of operations of a single abstraction. In comparison, the errors detected

by the memoryless checks may be the u ndesired side effects of any erroneous

operation whatsoever. Note also that the private storage of a checker would not

be immune to side effects, either.

It is apparent that the power of m emoryless assertions are derived from

redundancy i n data structures. The usual reasons for redundancy are breakage,

efficiency, and error checking of peripheral operations. By breakage, we mean the

storage of values from smal ler sets, carrying a few bits of information, in full

machine words capable of holding dozens of bits. Redundant secondary I

structures are often bui l t and mainta ined for efficient access to important

functions on the independent, primary, data. The consistency of the structures

can be tested by checking the membersh ip of values i n the sets to which they

should belong, or evaluating the functions of primary data and comparing with

corresponding results obtained from the secondary structure. If the above

conditions are not present, it may be reasonable to in troduce some redundancy

just for the purpose of error checking: such practice is quite common for

hardware peripheral operations where par ity, checksumming, . identifying labels,

write locks, or even error correcting codes help in copi ng with errors. Similar

51


measures may be appropriate for the protection of important data structures,

since, i n the presence of software errors, the address space where the structures

reside can be viewed as a noisy storage med ium.

I t is understood that check procedures can not be used at arbitrary points in the

execution of a program; the critical sections excluded are those modifying the

structure which is checked. A voidance of cri tical sections is an i mportant part of

debugging tactics. Errors local ized to with in a critical section can be certain ly

found by inspection.

What happens if a check procedure contains an error? Errors of omission are

simi lar to the missing assertions d i scussed earlier. Side effects wi l l be also

detected by the standard strategy. Other errors cause incorrect indications; these

are best found i n the operational envi ronment of the check procedure. The in i tial indications of a newly instal led check procedure should be verified by inspecting

the data structures claimed to be malformed. Since check procedures are

memoryless, the cause of an erroneous indication is always immediate and can be

found by inspection. If the indication is justified, the standard strategy should be

followed, of course.

The second essential operation for the binary search scheme is finding the state of

execution at some, for the purposes of the operation arbitrarily, selected point. If the

execution of the program can be repeated exactly, or almost exactly, any state can be

obtained by re-execution with a break or halt at the proper place.

Practical considerations may alter the strategy in a number of ways. First, the

selection of the probe points may be constrained to expl icitly programmed ones

by the lack of break faci l i ties. Second, the exact repetition of program executions

may be impractical, even if theoretical ly possible: the execution time or batch

turn-around time may be too long, or the program may depend on real-time

inputs such as typein or interrupts. Fortunately, all of these adverse conditions

are predictable from the nature of the computing environment and the problem.

Appropriate preparations may include the following:

Identify the set of regular points in the program such that control wil l

pass through one of them with med ium frequency and where al l data

structures are in consistent state. These points can be fitted with

conditional halts, state dumps for inspection, or conditional calls on the

state vector syntax verifier. The number of program executions in the

search process can be reduced by running the verifier at the h ighest

possible frequency consistent with the length of execution and the

available computing resources. Thus, after the first run. the error (or the

META-PROGRAMMING: A SOFfWARE PRODUCfiON M ETHOD

new hint, depending on the power of the verifier) i s local ized to within

one "wavelength" of the verifier. Further debuggi ng can proceed by

i nspection. or a new run may be prepared with h igher frequency

verification concentrated i n the smaller, local ized, area. N umerous

variations of these schemes are possibl e: the verifier may be turned on

during all executions whi le debugging or even i n an operational system;

check procedures i n the verifier may be ind ividual ly turned on or off so

that the overhead and interference of verification can be decreased whi le

the frequency and resolution can be i ncreased.

To find the most elusive bugs, a c ircular event buffer may be e mployed.

The buffer can hold the recent h istory of a small piece of the state and i t

can be updated without appreciable i nterference to the program. The

shortcomings of the buffer are short temporal and spatial reach. These are

somewhat alleviated when the use of an event buffer and a verifier are

combined: the verifier may local i ze the error to with i n a wavelength and

may also give a sharper h int as to what part of the state should be

buffered. This method i s analogous to hardware debugging with delay

l ines in oscil 1oscopes which enable the engineer to i nspect events occu rr ing

shortly before a trigger signal.

Provisions should be made for avoiding unnecessary real-time i nputs

d uring debugging. In particular, major input for test runs should be read

from a fi le, even if an on-l i ne terminal is avai lable. The program should

also i nclude some global i ni tial i zation to protect i tself from dependence

on un in i tial i zed values.

Program execution time may be reduced by the standard techn ique of

checkpoints and provisions for restart. At a checkpoint the program state,

resulting from a lengthy computation, is saved on a fi le. Points past the

checkpoint can be then reached repeatedly starting from a restored state.

The computing envi ronment may not offer checkpointi ng services, but i t is

relatively simple to implement them integral to the program . .

Removal of errors once they are local ized, is probably the s impl est of the debugging steps,

because it is closely . related to production. Since there are two independent

representations of the program logic: the meta-program and the elaborated program text,

two cases must be d isti nguished. If the local ized error occurs i n the program text only,

the techn ician can perform the correction. If the meta-program is manifestly i n error,

the techn ician may or may not propose a solution, but the meta-programmer should be

told in any case, so that the meta-programs and the meta-programmer's model of the

world can be kept up to date, and also that the meta-programmer cari comment on the

53


impl ications, or, if the error is serious, prepare the required changes. N ote that thi s

would be an instance of efficient feedback communications (Section 2.2) rely ing entirely

on language wel l known to both communicants.

2.7 Other Meta-Programming Conventions

In addition to object naming, conventions may be used to control other syntactic and

semantic aspects of meta-programs and the produced code. Conventions should be

selected on the basis of thei r contribution to productiv i ty and ease of communication. I t

should be re-emphasized that the meta-programs' main purpose (2 .3) is to commun icate

the detailed design to a techn ician so that he can produce code which fulfi l ls the intent of

the meta-program, and so that he can learn the new terms in the local language at the

same time. Uncertainty about the form and economies of conven tions involv ing special purpose addenda to meta-programs or code should be properly absorbed by engineering organi zations (1 .2).

It i s by no means certain, for example, that special documentation for the

purposes of future program maintenance is always desirable. Some code may be

short-l ived (1.3.1 1 .3.4) if evaluation by the engineering organization shows that

the engineering design is unsatisfactory. For the purpose of evaluating

alternatives, the least expensive code, undocumented except for the

meta-programs, is the best suited. Furthermore, the worst-case costs of future

program maintenance from the meta-programs can not be m uch greater than the

technician's con tribution to the original creation of the code, which is sizeable but

does not preclude repetition. However, the unavai labi l i ty of feedback from the

meta-programmer and incomplete meta-programs may make maintenance, from

the meta-programs. alone, d ifficult.

Software is said to be readable if the cost of a m in imal modification i s low, even

when the expert preparing the modification has had no prior fami l iarity with the

detai ls of the program. The combination of meta-programs and code is not

readable i n th i s sense, si nce the information contained therein i s geared for

writeability, for understand ing by an organized and large scale scan of the

contents. The important point is that the production of readable software

involves more engineering effort and it i s more expensive than the production of

wri teable code. If future modifications turn out to be s imple, readable software

may look better; but, in the larger picture, the ease of the small modifications

were bought at the d isproportionate cost of modification-proofi ng the whole

program. For larger future modifications the importance of the narrowly

construed concept of readabi l ity d imin ishes as the modifications begin to resemble

production tasks.


The fol lowing conventions have proved themselves i n operational use (see Chapter 4), and

are strongly recommended:

1.7. 1 Divisions in meta-programs

The definitions of new major and minor qual ifiers, comprising the major portion

of the new language i ntroduced by a meta-program, form a body of reference

material which the technician as well as the meta-programmer wil l peruse

frequently. To s impl ify these references, the defin i tions appear at the beginning

of a meta-program i n the Abstractions divi sion. The Operations division which

describes the actual code to be written as a set of procedures operating on

instances of abstractions already defined, follows thereafter.

Within the l ist of abstractions there may appear the fol lowing constructs: new tags

together with their informal, or spoken, names; l ists of fields if the abstraction is

a data structure, and l ists of d istinguished values to define the non-standard

minor qual ifiers. The essential properties of an abstraction may be summarized

by invariant relations which hold true for all instances; however, such detail i s

seldom necessary save for more intricate structures. I f i nvariances are given, they

may be used for the meta-programmer's own reference and general

documentation; or they may help in determin ing the correctness of state during

debugging (2.6). Moreover, the description of those portions of the operations

which are responsible for the maintenance of the invariances may be simpl ified.

Definitions of new type construction (2.5) may be written among the abstractions.

Very l i ttle, if any, code results from the elaboration of the abstractions. Depending on the programming language used, declarations for the data structures

and their fields have to be prepared; d istinguished values have to be declared and

in itial ized.

The d ivisions of a meta-program are somewhat analogous to the Data and Procedure divisions of the business-oriented language COBOL [McCracken]. The main difference is in the concentration of generic i nformation i n the div ision of Abstractions, as opposed to the more concrete declarations of the COBOL Data division.

The Operations division contains the descriptions of the expl icitly programmed

operations, wri tten in a convenient pseudo-language commentary which usually

resembles a higher - level programming language. Implicit operations, such as

painting or operations inherited from the underlying type (2.5), need not be

defined. Variables need not be declared.

The essential properties of operations may be expressed by state transformation

relations coupl i ng the program state before and after the operation. These

55


relations. if given. are used s imilarly to invariances. as described above.

The elaborated operations constitute the major portion of the produced code.

Some new language may be in troduced i n the Operations d ivision by refinements:

an action may be descri bed using a new term with an explanation following

i mmediately or i n a separate section. Parts of the refinement. i n turn. may need

further explanation unti l all actions are defined entirely in known terms.

For example. a meta-programmer may elect to i ntroduce a new concept as

follows:

i f buffer i s empty then

followed by the refinement in terms of the known type bi (for buffer i ndex):

buffer is empty iff: biRead=biWrite- 1 or

(biRead=O and biWrite=biMax)

This arrangement is related to the design techn ique of stepwise refinement

[Wirthl]. The relation. however. need not be a strong one: the design deta i l communicated by refinement could have been created using other design methods,

for example by bui lding action clusters [ Naur3].

2.7.2 Naming conventions for procedures

The naming conventions described in Section 2.5 are not d irectly appl icable for

naming procedures. Many procedures do not return any value and, therefore. are

not typed in the usual sense. The scopes of procedures are usual ly large, often as

large as the whole program. The combination of these two effects means that the

minor qualifier must disti nguish a procedure from all other procedures just as a

conventional procedure name would. When a procedure does return a value. the

major qualifier of the procedure name should be retained to i ndicate the type of

the value. If no value is return�d. the major qual ifier can be safely omitted

because potential ambiguities are rare and most h igh-level language processors can

check the correct uses of procedure names from context.

The minor qualifiers of most procedure names are composed of an imperative

verb (Create, Sum, Print and so on) and the tags for the fi rst one to three

arguments (see Section 2.8 for examples). Procedures implementing mappings are

qual ified by the tag for the range which is the procedure's result type, fol lowed

by the word From and the tag for the argument (as i n CiFromCh(ch) where the


domain is ch and the range c i ). These conventions offer a reasonable compromise

between the requ irements of speedy creation, mnemoni c value and type checking.

2.7.3 Name hyphenation

Some implementation languages allow the h ighl ighting the. boundaries between

constituent parts of names by hyphens (as i n PRINT-CH), by underl i nes (PRINT_CH)

or by the use of capital ized in i tials (PrintCh ). Since there may be a n umber of

different ways of separating a name, an unambiguous rule m ust be chosen: for

instance, hyphenation may be restricted to mark the boundary between the major

and m inor qual ifier only. Marking the components of type construction would

result i n too many separators, while sub-components of minor qual ifiers are

d ifficult to define unambiguously. Aga in , an exception can be made for

procedure names where the minor qual if ier is constructed i n a well -defined way

from a few words (2.7.2). These components, as well as the major qual ifier, may

be hyphenated.

2.7.4 Parameter order in procedures

Correspondence between actual and formal parameters i n procedu re cal ls has

traditionally been establ ished by the i r ordering: i n general the nth actual

parameter wil l correspond with the nth formal one. Thus, the ordinal number n

of a parameter acts as i ts external name. The choice of the parameter order is a

naming problem where con ventions are a ppropriate. Since important properties

of the parameters have al ready been expressed by the formal parameter names, we

can proceed by mapping the names into an order. Th i s can be accom pl ished by

establ ishing separate canonical orderi ngs for major and· minor qualifiers and

sorting parameter l i sts accordingly. The canonical ordering should be based on

the in tuitive size or importance of the abstractions represented. N ote that the

minor qual ifiers often come already partia l ly ordered (2.5).

An exception is warranted if some parameters are used to return values from a

procedure. Because of the dangers i nherent in their m isuse, these parameters

should be expl icitly identified by writing them first i n the parameter l ists. Th is

rule i s easily remembered because the ordering resembles the conventional order

in ass ignment statements [ Lampsonl].

2.7.5 Use of comments for explanation

Although comments have long been an i mportant part of programmi ng practice,

their value must be re-exam ined in l ight of the meta-programming conventions.

51


The meta-programs themselves, unencumbered by petty l imi tations of h igh-level

languages, can answer the same operational purposes as comments used to serve.

This poin t is expressed in the discussion of comments in [Kernighan

Piauger] thusly: "If you wrote your code by first programming in a made

up pseudo-language ... then you already have an excel lent 'readable

description of what each program is supposed to do'." (see also the quote

from the same reference in Section 2.3)

In particular, comments describing procedure parameters are superseded by the use

of pain ted types and naming conventions; structure descriptions are given i n the

Abstractions divisions of meta-programs; the intents of action clusters are stated

by refinements. Since exceptional needs for comments can be always satisfied by

the meta-programmer, technicians do not have to write explanatory comments at

all .

2.7.6 Programming language syntax extensions

Conventions about the use of the implementation language are often the easiest to state in terms of extensions of the language syntax. As noted earlier, these

extensions need not be backed by software i mplementation.

The extended syntax may regulate the use of new J ines, spacing, and indentation,

otherwise partial ly or whol ly ignored by the language processor. Typically, the

indentation would be used to show the nesting of scopes, conditional and iterative

statements.

Example 2.3.3 is shown with standardized indentation. Note that compound and conditional statements fitting entirely on a single l ine are

treated d ifferently from longer ones. Although a natural convention, such

a fine distinction would be d ifficult to express in syntax equations.

When the implementation language allows a number of equivalent options, a

single one may be selected for use, or redundant information may be encoded into

the choice. To d istinguish logical ly d ifferent uses of the same syntactic form; to

identify a group of statements as the implementation of a higher-level construct,

or to emphasize a particularly important statement, further redundancy can be

introduced in the form of standard comments. Insofar as the use of these

comments must follow prescribed syntax, the remarks of the previous section do

not apply.


2.7.7 Standard operations

Whenever the meta-programmer defines a new abstraction, he should also

consider the immediate implementation of a number of standard operations for

checkin!. printing and enumerating instances of the abstraction.

The purpose and deta i ls of the checking and prmting procedures were

d iscussed in Section 2.6. Examples are given in Section 2.9. I t is also

worth noting that by writing the checking and printing procedures, the

technicians' mental models of the abstractions are confirmed or updated;

thus these procedures are also very effective means of comm unication.

The enumerator procedure provides conven ient access to al l instances of the the

abstraction by arranging to call a formal procedure, representing the body of a loop, once for each instance. The d ifficulty of performi ng the enumeration may

range from simple counting to complex operations on sets. In e ither case, the

enumerator serves to hide i nformation [ Parnasl] about the nature of loops

involving the ab_straction. The appl icabi l i ty of en umerators is determined by

weighi ng the value of i nformation h idi ng against the execution overhead

introduced. If e i ther the enumeration algorithm or the body of some loop is

complex, relative overhead wi l l be low and i nformation h id ing wi l l be valuable.

The detai ls of enumerator procedure conventions are h ighly dependent on

the availabil i ty of various i mplementation language features. G iven that

procedures may be passed as formal parameters, the convention may look

as follows: for abstraction X, EnX(Proc) wil l call Proc(x) for all i nstances

x of the abstraction. For example, EnCi(PrintCi ) would implement the

i nformal meta-programming statement:

for al l ci , print c i

Other implementations, using macros or even by man ual copying of action

clusters, are also possible.

2.8 Meta-Programming Example

We now have sufficient theory to attempt i ts appl ication to a s imple example. The

subject problem for the example was chosen to be the one described by [Dijkstra] so that

the close relationsh ip between the structured design and the meta-programs can be better

i l l ustrated. Briefly restated, the problem is to prepare a plot of some in teger function

given in parametric form (fx( i ), fy( i )) on a l i ne-printer which 'is capable only of the

fol lowing operations:

59

60 CHAPTFR 2: META-PROGRAMMING

pri nt blank

print mark

return carriage and start a new l i ne

Dijkstra's solution - which we sha l l also fol low - is a program consisting of six "pearls",

or levels of refinement. These are from the top down: (Dijkstra's names are given i n

parenthesis)

1 . (COMPFIRST) says that we fi rst bui ld a n "image" then prin t i t.

2. (CLEARFIRST) explains bui ld ing as clearing the image then sett ing marks.

3. ( ISCANNER) defines setting marks: for al l i (parameter for fx and fy) add mark.

4. (COMPPOS) states the rule for adding marks: calculate the posi tion of the mark

(fx(i), fy(i)), then mark that position.

5. (L INER) contains the defin i tion of the image: it consists of a fixed number of

" l i nes". To clear the i mage (used by 2 ), it clears a l l l i nes. To print the image (1),

i t pri nts all l ines. To mark a position (4), i t selects the l i ne at y and marks that

l ine at the given x.

6. (SHORTREP) i n troduces a particular representation for l i nes: they are fixed length

arrays of characters with an associated counter which keeps track of the number

of characters to be printed. To print the l i ne, i t prints the requ ired number of

characters from the array. To clear a l i ne, the counter is reset to 0. If a posi tion

is to be marked, depend ing on the counter, the l ine first may have to be

"lengthened" and the added space fi l l ed with blanks, then the mark may be stored

in the array.

S ince both the problem and the sol ution are now presented, the question may arise: what

can we expect to add to this? For the answer, a comparison of goals is in order. Dijkstra

analyses the program development process, from the point the problem is clearly posed,

a l l the way to the completion of the language processor executable program text. We, on

the other hand, assume that such design work has al ready been completed by the

meta-programmer, except that this design m ight not be i n machine executable (or even

human readable) form, but rather i n the highly personal notation of the

meta-programmer, such as personal notes, mental i mages, references to l i terature or a task

order from the customer. In particular, a specification statement, such as the above

descri ption, would probably not exist at a l l . What remai ns to be accompl ished is to

transfer the knowledge of the design to the technician who wi l l prepare the machine


executable version and do the debugging. The transfer med ium wil l be a meta-program. Let us also assume an implementation language which includes data structures, such as

ALGOL W [ Hoare-Wirth] or BCPL [ Richards].

The first meta-program will describe the lowest level of refinement: ( l ine numbers are

given for reference only)

1 Abstractions:

2 3

xc x coordinate ch character

4 5 6

In l ine, structure with f ie lds:

7 Operations:

8 Println( ln ) :

xcMac mpxcch fixed size

9 for a l l xc i n I n 1 0 PrintCh (mpxcch[xc ] ) 1 1 Newline( ) 1 2 PrintCh and Newline must be declared EXTERNAL! 1 3 end of Println . . .

1 4 Clearln( ln) : 15 set xcMac �o 1 6 end o f C learln . . .

1 7 Markln( ln ,xc ) : 1 8 fi rst ensure xcMac >xc : 1 9 for al l txc i n [xcMac, xc- 1 ] 20 mpxcch[txc]�chSpace 2 1 xcMac�xc + 1

22 mpxcch[xc] +-chMark 23 end of Markln . . .

24 Pxln( ln ) 25 print on new l ine: (a l l #'s octal ) 26 "In: " In, xcMac, Println( ln ) 27 end of Pxln . . .

28 Ckln( ln) : 29 if xcMac< O or >xcMax then error

61

62

30 3 1 32 33

CHAPTER 2: META-PROGRAMMING

for all xc in In do

end of Ckln ...

if mpxcch[xc] is not space or mark then error

While this meta-program conta ins very l i ttle information about the nature of the larger

problem, i t i ntroduces the basic abstractions and operations rely ing only on global

language. I n l ine 2, we find the i ntroduction of the pain ted i n teger type xc which wil l

represent printer positions. The reason for not cal l i ng it prin ter position i s the

expectation that m agn ified and rotated printout formats may be added later. The

explanation of this fine poin t in the meta-program would serve no operational purpose,

however. The real defin i tion of the abstraction xc is g iven by the operations fol lowing: xc is a quantity which is used as shown.

The fields of structure In in l i ne 4 incl ude a fixed size array and the quantity xcMac,

ostensibly designati ng the defined portion of the array (see Section 2.5). The al located

size of the array wi l l be set to some val ue, say 10, and named xcMax by convention.

I n l i ne 8, the defin i tion of the fi rst operation starts. The n ame Println i s a typical

construction from an act ive verb and the parameter type. It may be pronounced partially

spelled out print- I-n or, i nformally, as print-l i ne. The statement on the next l i ne:

for all xc in In

elaborates i n to a loop from 0, which may be the default lower bound i n . the global

language, to xcMac . The latter quantity can only be obtained from the parameter In by

field selection; this is an example of a coercion. Type compati b i l i ty in the next statement:

PrintCh( mpxcch[xc ] )

can be easi ly checked: mpxcch may be i ndexed by xc and y ields the c h expected by

PrintCh.

The explanation about a subtle implementation language requirement i n l i ne 1 2 is a

useful precautionary measure.

A si mple refinement is apparent in l ine 18 where the purpose of an action cl uster is

stated, fol lowed by more detai l . The quantities chSpace and chMark, the character codes

for space and the mark, arc disti nguished i nstances of the type ch. Thei r defi n i tions can

be safely entrusted to the techn ician. The convenient notation for an in terval in l i ne 19

need not be a legal construction in the implementation language. We also note a trick in

l i ne 21 , setting xcMac (of I n , by coercion) to i ts desi red value d i rectly instead of

META-PROGRAMMING: A SOFfWAR E PRODUCfiON METHOD

Dijkstra's orig inal:

xcMac .-xcMac + 1

which is more d ifficult to prove correct. The practical value o f the i mprovement i s

infin i tesimal but then there was n o precious production time wasted b y explanation.

Starting at l ine 24 the test print and check procedures (given the standard names Pxln

and Ckln, respectively) are defined for l ines. The difference between the normal and the

test prin t procedures is evident; in fact the normal prin t procedu re, Println, is used as

part of the test printing. Test printout wi l l be in octal for easy comparisons with data

obtained by an interactive debugger. The code to be written when the errors are detected

i n the check procedure ( l ines 29 and 32) is defined by convention.

The next meta-program will define the next h igher level of abstraction, providing the

second d imension to form the i mage:

34 Abstractions:

35

36 37

38 Operat ions:

yc

im mpycln

y coord inate

image, structure with f ie lds: fixed s ize

39 Printlm( im) : for a l l yc Println.

40 Clearlm( im) : for al l yc Clearln.

4 1 Marklm( im,xc,yc) : Markln(mpycln[yc] ,xc)

42 Pxlm( im): 43 print on new l ine ( al l #'s octal ) "im", im 44 for al l yc, pr int on new l ine "yc" , yc, Pxln

45 end of Pxlm . . .

46 Cklm( im) : for al l yc C kln

The upper l im i ts of the loops on yc wil l be ycMax ( im pl icit from mpycln being fi xed

size) because there is no ycMac defined anywhere. We also note a compound coercion i n

l ine 44: Pxln needs a I n , but the only quanti ties ava i lable are im , the formal parameter,

6 3


and yc, the loop variable. The solution is s imple: Pxln ( (mpycln of im)[yc ] ).

Finally, the driver i s meta-programmed as fol 1ows:

.. .....................................

47 Abstractions:

48

49 Operations:

par parameter for the parametric functions XcPar, Yc Par.

50 XcPar(par) : return min(par, xcMax)

5 1 YcPar(par): return min(par, ycMax)

52 EnPar(Proc ) : for al l par in [0, 1 00) Proc(par)

53 Draw ( ): 54 CompPar(par): Marklm( im, XcPar(par), YcPar (par) )

55 reserve storage for l ocal structure im

56 Clearlm, EnPar(CompPar), Printlm

57 end of Draw .. .

An enumerator i s specified in l ine 52 to h ide information about the nature of loops on

pars i n anticipation of changes to more complex loops, in case pars are changed to

floating point representation, for example. The use of the enumerator is i l l ustrated i n

l ines 5 6 where i t i s cal led to cause execution of the loop body, defined in l ine 54; for a l 1

pars. In this i nstance, the notation is rather unfortunate as the body of the loop i s

removed from the place where i t i s active, but the technician's task of elaboration remains

s imple. Once the techn icians are fami l iar with the construction, a more compact notatio n

may be used, such as the ALGOL 68 style:

EnPar(CompPar(par): Marklm( im, XcPar(par) , YcPar(par ) ) ) .

A pair of simple parametri c functions are also defined in l ines 50 and 5 1 for

com pleteness. The implicit painti ng and unpainting operations in the functions wi l l

remain i m pl icit i n the code as well a s long a s all underly ing types are integers i n the

implementation. In a strictly typed environment, the expression

return min(par, xcMax)

M ETA-PROGRAMMING: A SOFTWAR E PRODUCfiON METHOD

would have to be written as

return Xc(min( lnt(par) , lnt(xcMax) ) ); or

return XcMin(Xc(par), xcMax)

where Xc i s a pai nting, lnt i s an unpainting operator and XcM i n i s the m i n i m um

operation defined for xes. Some of these com plexi ties are d ue to lack of foresight the

bounds checks for the coordi nates should have been implemented in the lower levels. The

omission can be easi ly remedied:

1 7. 1 1 7.2

4 1 . 1

ignore out o f bounds xc: return unless xc is in [O,xcMax)

but ignore out of bounds yc!

65


2.9 Comparisons and Combinations with Other Programming Methods

I n this section, the relationships between meta-programm ing and the most important

methods of software engineering, are d i scussed. Whenever the method d i scussed attacks

the same problems as meta-programming, we contrast the d ifferent approaches; otherwise

the poss ibi l i ty of combini ng the ideas wi l l be explored.

2.9. 1 High Level Languages

The development of high level languages was a n • .jtorically important step i n

improving programming productivity . A sign if icant factor i n their success has

been the users' taci t acceptance of s impl ifying conventions which go beyond the

syntax and semantics of the languages to include the use of standard run-time

environment, 1/0 packages, s impl ified register and instruction usage. The factors

more generally recognized as important have been readabi l i ty, conciseness,

availabi l ity of operators, control structures, compi le- and run-time checks.

When h igh level languages are used in conjunction with meta-programs, we saw

that readabi l i ty of code becomes less cri tical (2 .7), type checking may be the best

handled by naming conventions (2.5) and m echanical enforcement of other

conventions is unnecessary (2.3). What remains essential are capabilities, access

to the most efficient means of doing useful work on the computer.

Examples of capabi l i ties may include such mundane conven iences as

compile time constants, the abi l i ty to retrieve the remainder in a d ivision

operation and the high order part in a prod uct, or to access data through

pointers; or necessities such as reading or writing magnetic tapes.

Unfortunately, questions of capabi l i ties have become enmeshed with styl istic

considerations and access to capabi l i ties has been often den ied for fear of

aesthetic disun i ty, abuse, loss of protection or possibi l ity of m isunderstanding.

While these fears have been val id under conventional organization of production,

under meta-programming styl istic focus is on the meta-programs and the language

of implementation is simply a tool of interaction with the computer. The style of

ttie meta-programs is controlled expl icitly, by the meta-programmer, and

implicitly, by administrative conventions (2.3). Further controls of style by high

level language processors are redundant and may actual ly be harmful if

capabi l i ties are lost as a result.

META-PROGRAMMING: A SOFrWARE PRODUCfiON M ETHOD

2.9.2 Buddy System [Metzger], Ego/ess Programming [Weinberg]

Both of these essentially equivalent techniques emphasize careful reading and

checking of code before debugging may start It is also significant that the

checking would not be done by the author, who is more l i kely to overlook his own

mistakes, but by a peer, the buddy. The fol lowi ng advantages accrue from the

arrangement: debugging is simpl ified because the checki ng is l ikely to remove

some fraction of the m istakes; the checking also ensures that at least two persons

wi l l be fami l iar with the details of the code; finally, the peer review may serve as

an i ncentive for more careful work. The major cost factor is the the time spent

by programmers reading other program mer's code, learning the local language

defined therein and understandi ng the detai ls to the degree necessary for finding

mistakes. Note that there are no operationally unambiguous signals of the

reviewer's fai l ure to do a thorough job. In fact, the better the unchecked code, the

more difficult to evaluate the reviewer's work.

In a Software Production Team, a form of the buddy system is present: all design

details undergo intense scrutiny by the meta-programmer, while writing the

meta-programs, and by a technician while writing the code. Since both of these activities are d irectly productive, checking does not enta i l extraordi nary costs.

Assuming, conservatively, equal productivity, a Team of two wi l l complete

some module in half the time taken for the same task by a conventional

programmer. The man-hours used are the same in both cases, but the

Team's code is already checked. Checking of the programmer's code may

cost an estimated 30-60% more.

Strictly speaki ng, the Team's checking is less complete: the techn ician's

written contribution, the elaborated code, is not checked by review.

However, the conceptual difference between the code and the double

checked meta-programs i s small enough to suggest that errors introduced

by the elaboration process wi l l be s imple and few in n umber. These and

the other remain ing errors wi l l be caught during debugging.

The combination of d i rectly productive and checking activities also means that the

completion of the productive task impl ies the completion of a careful scan of the

contents and, therefore, a measure of checki ng.

The buddy system and the Team approach both requi re that the participants

practice ego/ess programming [Weinberg], that is be wi l l ing to release their work

for public scrutiny. The meta-programmer should have no problem in accepting

th is condi tion since the meta-programs are all but worthless un_less someone reads

67


them. However, the techn icians are put i n to a potentially less comfortable

situation: not only they cannot keep the ir programs private, but they must also

submit to decisions made by the meta-programmer. This suggests that

inexperienced programmers should be selected for techn icians. These people

would welcome the learn i ng opportun i ty and would be motivated primarily by

being part of an extremely productive organization.

An attempt to combine the sim pler social structure of the buddy system with

h igher efficiency of meta-programming i s cross meta-programming. In this

scheme, a pai r of programmers both play the d ual roles of meta-programmer and

technician working for one another. This way the checki ng time wil l be reduced

and scrupulousness of checking wi l l be operationall y ensured, as shown above.

The difference between cross meta-programming and the Software Production

Team organization is in special ization: the Team members are more special ized i n

their roles. Because of the Jack of special i zation, cross meta-programming is less

efficient. A programmer is either over-qual ified to be a techn ician or

under-qual ified for the meta-programmer's job. Nevertheless, under existing

conditions, cross meta-programming may be an attractive form of organization.

2.9.3 Structured Programming, Goro-less programming

Structured programming i s a design methodology, originally described i n

[Dijkstra], which can be used to great advantage by engineering organizations

( 1.2) for system analysis and also by the meta-programmer for detai led design.

The meta-programmi ng requ irement that i mplementation proceed bottom-up (2.2)

i s compatible with structured programming: the design may i tself be bottom-up

[ Dahi-Hoare] or the top-down design may precede the i mplementation.

The problem of personnel tra in ing for structured programming is greatly

simpl ified if the techn ique is used in a Software Production Team: only the

meta-programmer has to be trained in i tially. The techn icians fol lowing the wel l

structured meta-programs cannot but write structured code.

The remarks of Section 2.9.1 apply to comparisons of structured constructs and

unstructured GoTo statements in implementation languages.

2.9.4 Chief Programmer Teams

The Chief Programmer Team (CPT) organization is the pioneering application of

engineering and management principles to prod uction programm i ng. The method

is introduced i n [ Bakerl] thusly:


"Seeking to demonstrate i ncreased programmer productiv i ty, a functional

organization of specialists led by a chief programmer has combined and

appl i ed known techniques i n to a unified methodology. Com bined are a

program production l ibrary [also called development support l ibrary,

DsL], general-to-deta i l [top-down] implementation and structured

programming ... "

Additional techniques associated with the CPT organization are egoless

programming, top-down development, the employment of "more competent but

fewer people", among them the backup programmer who "can assume the

leadership role at any t ime, if required", and the programming secretary who

maintains the DsL; and finally, the "reintroduction of senior people i nto detailed

program codi ng" [Mi lls]. Comments made earl ier on structured programming and

egoless programming remain applicable when these techniques are used in a CPT.

I t is evident that these ideas cover a larger range of concerns than the present

work; i n particular, system archi tecture and system design are within the scope of

the team effort, and so are certain tools. We assigned the former tasks to an

engineering organization ( 1.2) and have not discussed the question of tools at all (1.6).

For example, the DsL and the associated special ist, the programming

secretary, can greatly simpl ify the use of batch processing systems. The

reported success of this tool within or without a CPT (Mi l ls] shows that

software implementation of al l clerical functions is not a prerequisite of

programming productivity. The DsL's significance i n promoting

commun ications wil l be discussed below.

Top down development of system architecture, as advocated i n (Mi l ls],

requires that the archi tect have a clear v ision of the lower levels of

abstraction. Often the design wi l l have to be developed i teratively,

"osci l lating between two levels of description ... This osci l lation, this form

of trial and error, is defin i tely not attractive, but with a sufficient lack of

clai rvoyance and being forced to take our decisions i n sequence, I see no

other way." comments [ Dijkstra]. Uncerta inty absorption and contin uous

process production, i ntroduced in Section 1.2, are explicit concepts for

clarifying organ izational roles while the design is developed. S imi lar ideas

are implicit i n Mi l ls' remarks: "software was del ivered ... i n spite of 1 200

formal changes i n the requirements [. The] rate at which computer time

was used remained nearly constant from the 9th to the 24th month, a

consequence of the con tinuous i n tegration ... " [Mi lls].

69


I n a CPT the chief programmer bears project responsibi l i ty, a ided by the

backup programmer who can insure the continuity of the project should

the chief leave. The locus of project responsibil ity may or may not reside

i n a SPT depending on the detai l of task orders (2.3). For shorter, routine.

or general ly parsimonious projects the meta-programmer can take the ful l

responsibil ity. Larger projects, which have to be able to survive changes in

key personnel, should be supported by an engineering organization

representi ng the overall project responsibi l i ty and main tain ing continu i ty.

The task orders from the engineering organization to the SPT would be

more detailed i n this case and the tasks themselves would be shorter i n

duration. Several variations for replacement of personnel are possible: the

meta-programmer can be replaced with the loss of at most one task plus

his knowledge of the project; the key archi tect i n the engineering

organizat1on could be probably replaced by the meta-programmer, or a backup architect could be employed by the engineer ing organization.

The basic CPT idea of letting sen ior talent participate in d irectly productive

activities has been fully adopted i n the SPT organization (2.2), substantially

determin ing the meta-programmer's role. Nonetheless, there are numerous

differences of detail. The meta-programmer does not write code at all, yet he can

maintain absolute product contro1 by meta-programming. Lacking this powerful

communication instrument, the chief programmer m ust code the critical portions

of the program to exercise control. Because of the h ighly leveraged position of

the meta-programmer, the other members of the team do not have to be "more

competent" to be able to emulate and absorb the meta-programmer's ski l l and

experience.

The critical communication problem (1.6) is addressed in a CPT by reliance on

structured programming and the visibi l i ty of programs afforded by the DsL.

These measures enable programmers to read and understand each other's code. I n

the S PT the wheel organization, the central ization of language creation, and the

object naming conventions aid communications to the degree that al l reading and

understanding can be overlapped with directly productive activities.

The opposite d i rections of implementation i n CPT and in SPT were determined in

both cases by independent considerations. The bottom-up order of SPT is

necessary so that communications can always use known, concrete, terms; defined

operationally by procedures already coded and understood. The argument

supporting top-down order of implementation in CPT (a question separable from

the order of design which has been d iscussed above) shows the efficiency and

thoroughness of testing when h igher level routines (the earl ier ones in the

M ETA-PROGRAMMING: A SOFTWAR E PRODUCfiON METHOD

top-down sequence) are available to create a real istic test environment for lower

levels [ Baker] [ Barry]. It is possible to combine these advantages: a set of

routines may be coded bottom-up unti l a level at the top or near the top i s

reached, then debugging can start from the top down, always using the h igher ones

to create the test environment for the others below. It should be noted that the

test data in the real istic environment is more complex than if data were generated

by special purpose drivers. State vector syntax checkers (2.6) are i ndispensable for

local iz ing errors under such c i rcumstances.

2.9.5 Automatic Program Verification

I n [ Deutsch] we find the following defin i tion of this method:

" Program verification refers to the idea that one can state the i n tended

effect of a program in a precise way that is not merely another program,

and then prove rigorously that the program conforms to this specification.

Automatic refers to the hope that ... we can bui ld systems that perform

some or all of th is verification task for us".

The promise of verification is then both qual i tative and quantitative. On the

qual i tative side, absolute, rather than approximate, correctness wil l be attai nable.

Quantitatively, the mechanization of the process may i mprove productivity by

el iminating the need for manual debugging. Thi s d istinction is important, because

the absoluteness of correctness has very t i ttle practical val ue. The property valued

by users is reliability, defined i n [Parnas2] as a "measure of the extent to which

the system can be expected to del iver usable services when those services are

demanded." Parnas goes on to argue that rel iabi l i ty and correctness are

complementary but not synonymous. A logically correct program may be, in fact,

unrel iable if i ts specifications fai l to account for the poss ibi l i ty of hardware

errors or i ncorrect input.

I n general, it is not sufficient that the system main ta in i ts temper in face of

adversi ty as operational experience may show that technical ly well defined

responses may be operational ly unacceptable. The difficulty of predicting the

sources of operational difficulties so that thei r hand l ing can become part of the

specifications is well i l l ustrated by the Ess experience [Vyssotsky] where most of

the (extremely rare) fai l ures were caused by external events, or combination of

events, which the system designers did not foresee at all . This means that if the

number of program errors can be kept substantially below the number of

specification problems, further el im ination of program errors wi l l not perceptibly

improve rel iab i l i ty.

7 1

72 CHAPTER 2 : META-PROGRAMMING

The projected output of verifiers would i nclude theorems and conditions under which the theorems do not hold. The conditions might be of the form of paths

through the program, symbol ic counterexamples and so on. Such output is

essentia l ly the equivalent of a run-time error indication (2.6). To be

quantitatively helpful , a verifier w i l l also have to local i ze the point of error.

The possibi l ity of i nteractive help to verifiers [Deutsch] also raises personnel

issues: what l evel of train ing w i l l be requ ired for the helpers?

CHAPTER 3: EXPERIMENTAL VERIFICATION

74

3.1 Introduction

To verify the predictions of the meta-programming theory, a series of experi ments were

performed, as described in this chapter. The general experimental approach was to do a

small n umber of full-scale programming projects, with some variation in key personnel

and in organization (Sections 3.2 and 3.4). In particular, in the last project (Project D,

3.9.3) three programs were produced from the same specifications, by three d ifferent

groups i n a controlled experi ment

All participants in the experi ments were full-ti me employees. Programming was done on

personal computers using a high-level system programming language (3.3). Uti l i ty

programs on the computers were i nstrumented to record measurements of their usage

automatically. Details of the measurement system are described in Section 3.5 and in Appendi x B . One of the projects (Project C ; 3.6, 3.9.2, Appendix C) produced a simple

Management Information System, which was later used to process the col lected

measurements.

Independent evaluation of the experimental results is made possi ble by the deta i led

descriptions of the experimental environment (3.3), the personnel selection criteria (3.4),

the task specifications (3.6, Appendices C and D), the defin i tions of the productivity

measures used (3 .7), and the processing used to el im inate various d istortions from the raw

measurement data (3.8).

Section 3.9 describes the results of the experiments. During the longest experiment,

Project C. almost 14,000 l ines of code were written, at an average rate of 6.12

l ines/man- hour. The control led experiments of Project D showed that comparable results

can be obtained by d ifferent persons acting as meta-programmers. The d ifficult

experimental comparisons of the meta-programming and conventional organizations,

however, y ielded only inconclusive results.

3.2 Experimental Approach

Organ ization of experiments for the measurement of software productivity demand a

fundamental choice of resource al location between a larger number of experimental

implementation efforts, each l imited in size and scope, or a smaller number of samples

which may be more representative of the important, larger-scale, problems. In the former

case the results can be statistically significant, but serious doubts would remain about

their scalabil ity or appl icabi l i ty to the larger-scale domain. The latter choice would y ield

results wh ich would be appl icable, but thei r statistical value would be correspondi ngly

reduced and the contributions of disti nct variables blurred.


The concern about the scalabi l i ty of results is caused mostly by the nonlinear growth of

communications, both within the organization producing the program and with in the

program i tself ([Brooks] Chapter 8). S ince the difficul ty of communications in a team

of producers caused by the contin uous enrichment of the local language has been posited

i n Section 1.6 as the basic structural obstacle to higher productivity, the decision was

made to perform only larger scale experiments whereby this effect could be observed or countered.

Real i stic resource l i mi tations would severely l imit the number of such experiments. They

would then, at best, serve as demonstrations of the feasib i l i ty of achieving certain results

under certain condi tions. The subjective sign ificance of the demonstration to an external

observer would depend on the deviation of the results from the norm; the presence of

val id predictions, si nce a predicted deviation is less l ikely to be a fluctuation; and finally,

the perceived abi l i ty to reproduce the c ircumstances of the experiment

The enthusiastic response to the Chief Programmer Team results in the celebrated

New York Times Information Bank project [ Mi l ls] [ Bakerl] exempl ifies the

potential impact of demonstrations. The results were far above norm; the a uthors

i n fact predicted the productiv i ty improvement, and the purely organizational

approach invited reproduction.

Since the environmental and personnel factors are generally the major obstacles to

i ndependent reproduction of resul ts, i t was also decided that, insofar as resources permi t,

the fraction of results attributable to these factors should be also demonstrated. The

meta-programming method itself makes no assumptions about tools (2.2) and special

programming skil ls are required only from the meta-programmer. The fraction of

productivity improvement not d ue to the envi ronment and personnel should then be the

method's own contribution, reproducible in a wide set of environments by different

participants.

The separation of contributions to the results was done by matched pai rs of

demonstrations, in which some critical variable was varied while the other variables were

matched as closely as possible. W henever matching required approximation, either

because of the d ifficulty of perfect match ing, or because the variation i n the critical

variable precluded certain matches, a conservative approach was taken, as described for

each case in the sequel , to obta in credi ble results.

3.3 Experimental Environm�nt

Although not a part of the method under discussion, a description of the programming

environment is i n order; fi rst, because it contains some unusual features, _ and second, to

15

76 CHAPTER 3: EXPERIMENTAL VERIFICATION

al low d irect comparisons of the uncontrolled experimental results w i th other experiments

or experiences.

The choice of environment was determined by considerations of avai lability, i nherent

efficiency so that personnel costs can be reduced, and support of measurements (3.5).

Throughout the experiments, an operating personal min i-computer· [LRG] [Lampson2]

was available to each participant at all times. A removable disk cartridge provided 2.5

mil l ion characters of file storage on each computer. Furthermore, the computers were

connected by a commun ication network [ Metcalfe-Boggs] to each other and to a central

time-sharing system which was used as a repository for common fi les and for archival

storage. Another means of backing-up files was the copying of whole disk cartridges. A

h igh speed printer was also avai lable via the network.

All programming was done i n the typeless system programming language BcPL

[Richards]. The sequence of operations i n the program creation cycle was to generate or

edit source program text using an interactive editor, compile the new source or the old

source modules affected by the changes, issue the load command, and run the loaded

program under the control of an interactive debugger. The editor used was QED

[Deutsch-Lampson] in the early experiments group (3.4) and the Project B editor (3.6)

during the main experiments (3.4). The debugger was a d irect descendant of DDT

[TENEX]. I t could be used to set breakpoints, inspect variables, and cal l procedures

during execution of a. loaded program. The symbol ic names of procedures, labels, and

global variables were known to the debugger, but the names of local variables and compile

time constants were not

The programs written could depend on the services, such as streams, fi les and file

d irectories, of an open operating system described in [ Lampson2].

Participants also enjoyed reasonably private accomodations. Jun ior participants (3.4),

hired for the duration of the experiments, had the experimental work as their full - time

assignment. Senior participants had only the usual load of plann ing, reviews, reports, and

conferences in addition to thei r major, full-time experimental responsibi l ity. All

participants were paid competitive ind ustrial wages commensurate with their experiences.

Benefits i ncluded paid holidays and legislated state benefits.

To min imize the effects of the measurements on the experimental ensemble, the

measurements were made unobtrusive and largely automatic (3.5). Absolutely no

evaluations of the measurements were made while the experi ments were in progress except

for periodic inspections to ensure that the collected data is safe and complete.


3.4 Experimental Setup

The sequence of experiments can be d iv ided into two major groups: first, the early

experiments group comprising two projects designated A and B respectively; and second,

the main experiments group which included projects C, 01, 02 and D control.

The purpose of the early experiments was the validation of the basic meta-programming ideas, the clarification of the supplementary ideas and conventions, and the tra in ing of a

second meta-programmer. The software produced for projects A and B, a cross-reference

program and a text edi tor (3.6), was used i n support of the main experi ments. The edi tor

B provided the instrumentation for the measurements (3.5).

Based on the experiences from the early experiments, the main group was designed to

implement the approach described i n Section 3.2.

Project C demonstrated the productivity of a Software Production Team and the

qual ity of the code produced.

Projects C, Dl and 02 showed the degree of independence of results from

personnel factors.

Project D control provided data on the performance of a conventional programming group for comparison.

The assignment of personnel to the various projects is i l l ustrated in Figure 9. The present

author is desig�ated M I. Programmer Pl, a researcher with a Ph.D. i n Computer Science

and M l, at that time a cand idate for the same degree, were the senior participants. The

techn icians Tl-T5 and programmer P2 were junior participants, hi red for the d uration of

the experiments only. Tl and M 2 denote the same person in different roles.

Sen ior participants were well acquainted with the experimental environment. Technicians

got their tra in ing strictly on the job. Program mer P2 and meta-programmer M2 were

given time to practice, as described below, before thei r participation i n the experiments.

The technicians' score on a programming test (Appendix A) was a major factor in their

selection. However, appl icants with professional programming background, who often

had excel lent scores, were considered overqual ified. Techn icians Tl, T3 and T4 had very

sim ilar backgrounds (4 years at prestigious un iversities, no professional programming

experience, approximately 5 computer science courses with a grade point average of 3.8

for those courses only) and sim ilar test scores (no errors; 75, 70 and 103 minutes for Tl,

T3 and T4 respectively). The qualifications of T2 and T5 were sim ilar except for

professional experience and test results. I t is evident from the topology of Figure 9 that

these differences could not affect the main experiments group.

77

78

I I I I I I I I I I

...L

, - , / '

\ T5 ,,...-ct---tf M2 \ " - \ I ....... ...,

I I I

Project A+B July-Sept 197 4

Project C July-Nov 197 5

Practice July-Nov 197 5

Project D Dec 1975

Practice Dec 1975

Project D control Jan-Feb 1 976

Figure 9 Organization of the experiments. Tl-T5 are techn icians. M l and M 2 are meta-programmers. PI and P2 are programmers.

META-PROGRAMMING: A SOFTWARE PRODUCfiON M ETHOD

The participation of T3 and T4 in Project C was designed to test that with the above

selection cri teria, the variation i n the technicians' individual productivities i s small (1.5).

M 2, the secon d meta-programmer, learned the use of the tools, the meta-programming

method and the conventions as a techn ician i n the early experiments. He was later given

the opportuni ty to practice the meta-programmer's role i n a team with T5 for about five

months. Thus the preparations of M l and M2 for Projects 01 and 02 differed

considerably. On the other hand, T4 and T3, the other participants of 01 and 02, were

closely matched i n tra in ing prior to join ing the experiment, as wel l as after: they took

part i n the production of the same program, Project C, under the d irection of the same

meta-programmer, Ml . The particular pairings of M l wi th T4 and M2 with T3 were

obtained by random selection. After the pai rings the two teams were given identical task

orders (3.6) which they implemented i ndependently. These teams were set up to

demonstrate the relative i nsensitivity of the method of the meta-programmers'

personal i ty, wh i le the other variables (environment, problem specification, techn ician

selection criteria, techn ician tra in ing) were held as comparable as possible.

To approximate the potential of the other two Project 0 teams, the Project 0 control

team was organ ized around a senior member, PI, and a jun ior programmer, P2. The latter

had a B.A. degree in Mathematics and three years of systems programming experience.

He was hired on the basis of references and an in terview. No written tests were given;

this is now considered a mistake. Accordi ng to standard industry practices, his starting

salary was 3 1% higher than the technicians'. He was allowed three weeks to get

acquainted with the i mplementation language and the tools.

3.5 Measurement Methods

The simple measurements obtained from the early experiments were weekly printouts of

the lengths, i n characters ( 3.7), of all meta-programs and source language programs. At

the same interval, the conten ts of these files were also stored on magnetic tape. Manual

record keeping of time spent in various activities was also attempted and abandoned as

i mpractical.

In the main experiments, collection of productivity data was aided by software

mod ifications to the edi tor to record data on a measurement file. Records i n this file are

in form of text l ines, each containing the date and time, the name of the person working,

and a code identifying the format of the remain ing variable portion of the record. The

contents of the latter part depend on the nature of the event being recorded:

Ed iting of files is performed on temporary copies for techn ical reasons. When

the ed i ts are complete, the user issues a save command to store the edi ted copy i n

79


a permanent fi le. For every save, a measurement record is made showing the

filename. the n umber of characters written, .the change in the size of the file, and

a breakdown of the characters written by source, which may be the keyboard, the

previous version of the same file, or different fi les identified by their names.

At the end of an editing session, usual ly right after the edi ted files are saved,

general information about the time spent editi ng, the n umber of commands typed,

the total number of characters entered from the keyboard is recorded on the

measurement file.

Also at the end of a session, the BCPL compiler or the loader may be designated by

the user as a successor program. The designation and any parameters to the

successor, such as the name of the fi le to be compiled, are also recorded. When

the compilation is complete, control is automatically returned to the edi tor and

the l ist of compilation errors is d isplayed. The user is prompted to make a

comment about the number of errors (see below). This way the use of the

compi ler and loader can be mon i tored by the edi tor's measurement mechanism,

provided the user abides by the conventions and always cal ls these programs from

the edi tor.

The user can also make miscel laneous comments which wi l l be recorded. For

example, a sty l ized comment may mark the beginn ing and end of a work period,

the reception and completion of a task order, or other important events.

The precise format of the measurement file is documented i n Appendix B.

I n preparation for processi ng the col lected data, the implementation of a simple

Management Information System was also undertaken as Project C (Append i x C).

3.6 Task Specifications

It should be emphasized that the object of the experiments was to measure productivity of

software production organizations working on well-defined ( 1.2) problems. Other

characteristics of the problems and the qual i ties of the abstract solutions were not of

primary i n terest.

For Projects A, B, and C there were no fi xed specifications prepared i n advance. The task

orders to the experimental group, comprising the production organization, were the

statements of problems; the organization was to produce code working toward the solution

of the problems. These were:

Project A: prepare a cross-referenced l isting of a set of BCPL files.

META-PROGRAMMING: A SOFTWARE PRODUCT'ION METHOD

Project B: al low editing of BcPL source text and other documents with commands such as insert, delete, search, read and write fi les, and transfer data between files.

Project C: implement a query language operating on measurement fi les (Appendi x

B), powerful enough to obtain productivity figures from a database that may

contai n errors.

The lack of pre-plann ing meant that the designs had to be d ivided into relatively

i ndependent parti tions so that one part could be implemented while another was

designed. The remain i ng parts were considered only i n general terms before full attention

could be focussed on them. This mode of operations was consistent with the principles of

continuous process production expounded i n Sections 1 .2 and 1.3. The success of the

partitioning, and indeed, the success of the production effort, was dependent on the

meta-programmer's understanding of the tasks. The above problem statements appeared

wel l-defined for the particular meta-programmer Ml because of his earl ier experiences

w i th s im i lar systems. The resulting design for Project C is described i n A ppendix C.

For Project D (01, 02 and D control) it was important that all groups work on comparable tasks. Accordi ngly, a detailed task order was drawn by an external

collaborator. The order is shown in Append ix D. It specifies a uti l i ty program which

can permute d isk storage while keeping the assorted d irectory and file structures intact.

The reason for permuting storage is usual ly to bring logical ly consecutive file pages

together in the physical address space in order to improve the speed of sequential access

i n the rotating memory. Uncertainties about the permu tation algorithm and the user

i nterface were absorbed by the order. Although the directory and file structures were not

described in the order, they were amply documented elsewhere (for example [ Lampson2])

and were also wel l known to M l, M2, Pl, and, to a lesser extent, to P2.

3.7 Productivity Accounting

The simpl ified productivi ty measure, i ntroduced in Section 1.5, is defined as the amount

of completed source code divided by the man-hours associated with i ts production. I n

this section, a more detai led breakdown of the components of the productivity calculation

is given.

The quantity of code is always measured in characters [ASCI I], although i t may be

expressed as " l i nes" of 26 characters. The count of characters is not only more

conven ient to obtain for measurements, but it is also more i nvariant of style. The

conversion factor 26 has been obtai ned by counti ng li nes in a represen tative sample of

BCPL source programs. Lines whol ly blank were not coun ted. End of l i nes counted as

si ngle "carriage return" (CR) characters. The sample programs were properly indented;

81


each i ndentation level on each l ine counted as one "horizontal tabulation" (HT)

character. Conversions of the productiv i ty figures to other l i ne length statistics can be

readi ly performed by converting to character units first.

Code produced by SPT's contained no explanatory comments (2.7.5), but standards

required a comment statement with the name of every procedure and approximately five

comments identifying various groups of declarations in every source module. A l l

comments appearing i n code produced by the Project D control team were i ncluded in the

length measurements.

The lengths of meta-programs, although reported separately, were not i ncluded i n

productivity figures.

Externally produced shared code was excluded from the productivity calculation i n all

projects. I nformation on sharing opportun ities was made avai lable to al l three Project D

teams equally.

The final production figure for every project refers to net l i nes, that is l ines debugged to

proto-software qual i ty (1 .2). Figures reporting on the in termed iate progress of projects,

however, do not d isti nguish between debugged and undebugged l ines because that would

be i mpractical. While not measuring true productivity, these intermediate figures are very useful i n investigations of the continuous production process (3.9).

Although the measurements show the precise number of hours worked by al l participants,

productiv i ty was calculated on the basis of standard eight-hour days, with only a few

exceptions. I nherently part-time activity, such as advance design activity by the

meta-programmer was included as measured. Overtime (3.9.3 .1) was also i ncluded as

measured. Days of physical absence by sen ior participants were not inc luded. There were

no sick leaves or personal leaves taken during the projects.

I t is important to note that the meta-programmer's time was charged against the SPTs'

productiv i ty. The on-the-job tra in ing time (3.4) of the technicians was simi larly

i nc luded. The time for special train i ng of M2 and P2 (3.4) was excluded.

3.8 Potential Sources of Measurement Errors

There were a number of fai l ure modes of the measurement setup (3.5) which caused the

in termi ttent record ing of erroneous information. Using the redundancy i n the

measurements, inconsistencies in the data were local ized and the errors were estimated or,

in most cases, corrected. The particulars of this process depended on the fail ure mode.

META-PROGRAMMING: A SOFfWARE PRODUCTION M ETHOD

For example, the min i-computer used i n the experiments (3.3) relied on a time

base, kept in unprotected core, for keeping time. The measurements, i n turn,

recorded the time as provided by the machines. I t was not uncommon for the

base to get lost whi le programs were debugged. Many of these events were noticed

and corrected by the users. Others were found by using the Project C system to

scan the database for records with time stamps out of order. Each instance of the

error was i nspected and the correct time was estimated to fit the correctly

recorded neighbouring records. Correction of the database was done by manual

editing.

The procedure for local izing and correcting other errors followed the same

pattern. First, the database was scanned by a special purpose Project C program to

find all questionable records. The selected records were then inspected and

corrected if necessary.

Another common error was the operator's omission to mark the beginning and the

end of a working period (Appendi x B). These were easily found after l isting a l l

i ntervals of apparent inactivity which were ·longer than 30 minutes.

While it was possi ble to omit records of compilations, call s on the loader, and

syntax errors (3.5), in fact, the records of these events are precise because the use

of the correct procedure was actually s impler than the alternative.

Records of the number of semantic errors (bugs) were generally unrel iable, partly

because of the subjective element in deciding what constitutes a bug, and partly

because of the complexity of the procedure: at the time the bug was found, the

user was usually working with the debugger but the record had to be made in the

editor. An i ndependent rough estimate of the n umber of bugs can be obtained

from the number of re-compi lations and toads.

During the experiments, source code fi les were frequently copied and renamed for

backup, recovery or other purposes. This created a dangerous situation in which

the same code might have appeared in the measurements under different names

and m ight have been counted more than once. Careful mon itoring of the

appearance of new fi lenames in the database helped to account for these events.

3.9 Experimental Resul ts

The summaries of the measurements are given i n Append ix E. Selected measurements are

also plotted in Figures 10 through 13. These measurements do not, in themselves,

comprise the experimental results. The fol lowing sections wi l l complete the basic

measurement data with particular interpretations and with the descriptions of other, not

8 3


readi ly quantifiable, results. The summaries by no means lessen the importance of the

highly-resolved detai ls of the measu rements: in some instances the m ethod of

in terpretation and the acceptabi l i ty of s impl ifications depend on the nature of the data.

Moreover, access to the detailed data offers the opportun i ty for alternative

interpretations. Finally, some of the measurements are also of general interest.

3.9.1 Early Experiments Group (Projects A and B)

The simpl ified programming productivity obtained dur ing this early effort can be

calculated from the data given i n Appendi x E.1 as fol l ows (see also Section 3.7):

5671 source l i nes I (13 weeks - 3 hol idays) • 3 employees - 3.81 1/m-h

In addition of the executable code, the projects y i elded more than 3800 l ines of

meta-programs. We shal l cal l the ratio of source length to the length of the

meta-programs, the meta-program expansion. In this experiment, the expansion was

149%. Reliabi l i ty, user acceptance, and modifiabi l ity of the products were excel lent;

numerous extensions to the Project B edi tor (such as the addition of measurements (3.5))

were later implemented by M1, T1, and also by other programmers whose in terests were

unrelated to the experiments.

The occurrence of specific d ifficulties d uring the projects suggested the the exhibited

productivity could be improved just by ref ining the method and the conventions. I n

particular, several days were wasted because of the i nsufficient understanding of the

modularization requi rements of the BCPL system. The module template final ly developed

has been in use through experiments C, 01, and 02. Not al l of the naming conventions

described in Section 3.4 were known during the early experiments; i nstead of using the

standard constructions aX, eX, dX, or iX (3 .4), different and often incons.istent tags were

introduced. Procedure names (2.7 .2) were not regular at al l . Check procedures and test

print procedures (2.6) were written only after some time had already been wasted by

conventional interactive debugging.

Inspection of the graph of the weekly changes in productiv ity (Figure 10, upper portion)

yields some interesti ng results. We note that there isn't much evidence of a learning

curve for the technicians. By the end of the third working day i n a completely new

programm ing environment, with the help of the meta-programmer the two technicians

were able to write about 300 l i nes of code (see E.l). However, th is figure is not di rectly

comparable to the long-term average performance because the in itial transient period was

not burdened with debugging tasks. Also, the in i tial meta-programs were especially

careful in specifying the kind of programming language constructs which were expected to

be used in the elaboration.

1 000

800

600 r - - - ,.:- - - - ., I

I

400

200

.. - - - .. I I I I 1 I I I I I I I I I I I I I

I -- - - -

r- - - - i - - .1 I r - .......---t I

A+B

0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3

1 000

800

600

400

200

r - - -, I I I I I I

I ._ __ _

r - - - ., I

I . - - - .J I I I I ._ _ _ _ , I r - - - ., ._ _ _ _ ., I

I

r - - - -. I I I I I I r---1

L--J I I ._ _ _ _ , I I I I

I

r - - -.. .. --- -' : I •

0 . · -:::1:·- - -:=1

1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9

c

20

Figure 10 Plot of weekly changes i n coding and meta-programming productivi ties (sol id and broken l i nes, respectively) in the early experiments group (above) and in Project C (below). The X axis is marked i n calendar weeks. Coding productivity is shown as l i nes of code per week per techn ician; adjusted for short weeks where i ndicated. Meta-programming productivity is shown as l i nes of meta-program written per week.

00 VI


Dumps of the project state show that the first load of the A system occurred during the

4th week (the modularization problem mentioned earl ier surfaced at the same time), and

· the system was released i n the 8th week. System B was fi rst loaded during the 7th week.

The first meta-program for a system B procedure was issued as early as the second week.

The overlap between the two projects explains why Project A did not have a "tai l", a final

transient period of reduced productivity caused by the preponderance of debugging tasks

relative to code creation tasks. Project B exhibi ts a tai l , starting at about the 9th week.

3.9.2 Project C

In this project, the fully developed meta-programming method, as described i n Chapters 2

and 3, was appl ied to a medium size problem (3.6). The s impl ified productiv i ty obtained

(E.1 , 3. 7) was:

13944 source l ines I (19 weeks + 1 day - 1 holiday) * 3 employees - 6.12 1/m-h

Separati ng the contributions of the two technicians, we have:

T3: 7423 source l ines - 6.51 1/m-h

T4: 6521 source l ines - 5.72 1/m-h

The n umber of compilations and program loads performed by the techn icians were also

very sim ilar (959 versus 846 and 573 versus 651 respectively (E.2)). The techn icians spent

most of their time working on disjoint portions of the system; T3 concentrated on the

compi ler and the user i nterface, while T4 worked mostly on the run-time environment

(Appendi x C). Any variation of the i ndividual productivi ties should be viewed in l ight

of the possible differences between the complexities of the subtasks worked on.

After the completion of the project, the f inal product worked rel iably when used to

process the more than 800,000 characters of measurement records collected during the

experiments. About 20 to 30 programs of an average length of 50 l ines were written i n

the C language. The Summary of the Measurements i n Appendix E was compiled from

the outputs of these programs.

Al though a small number ( -5) of programming errors were also uncovered, the

most serious operational problems were caused by the lack of certain capabi l i ties

(2.9.2). For example, it was discovered that for some complex reason , fi lenames

i n the database had been inconsistently l isted in either lower or upper case letters.

The implementation of a special-purpose function to convert strings to lower case

M ETA-PROGRAMMING: A SOFTWARE PRODUCfiON M ETHOD

was imperative to solve thi s problem. This experience supported the theory that

the last fraction of production errors would be domi nated by specification

problems (2.9.6). Lacking the production team, this i mplementation task was

successfully undertaken by the meta-programmer.

The meta-programming conventions and the debugging organization described i n Sections

2.5, 2.6, and 2.7 were used with good results. The check procedures were very effective i n

localizi ng the complex failures of the storage allocation and garbage collection algorithm

required by the C language.

A n in teresting application of checking procedures was called for i n the sol ution of

a rare "real-ti me" error. The i n i tial indication was a consistent machine halt but

at a random place in the code. I t was immediately concluded that the i ndication

was related to some side effect of the code bei ng debugged on the only unprotected real-time process in the computer: the 60 cycle timer i n terrupt. To

find the origin of the side effect, a check procedu re was defined as follows: the

program state is correct (for this purpose) if the 60 cycle i n terrupt can take place,

otherwise it is incorrect. To implement this defin i tion, the check procedure just

had to idle more than one-si xtieth of a second, to al low at least one interrupt, and

then signal that the state is correct. An observed machine halt served as the incorrect state signal. A binary search (2.6) located the error in a few i terations.

Note that the check procedure used only an externally known property of the

timer interrupt, namely, that it takes place 60 times a second.

The total length of the meta-programs was 4916 l ines (284% expansion). Compared to

the early experiments, the h igher expansion may indicate a more efficient style, or the

development of a richer local language i n the longer project. The plot of weekly changes

in productivity (Figure 10, lower portion) shows evidence of the growth of local language

where the vol ume of meta-programs decreases while code production remains

approximately level; for example d ur ing weeks 5 through 8, 9 through 12, and especially

during weeks 13 through 15. This effect is the most pronounced during the bottom-up

implementation of a new subtree in the structured hierarchy (2.2). The "sawtooth"

starting at the 13th week, for example. marks the implementation of the run-time

in terpreter and the various run-time standard procedures (C.6). I t should be noted that

the writing of the meta-programs were timed so that elaboration could usual ly commence

immediately after a meta-program had been issued. For this reason , variations of weekly

meta-programming and cod ing productivi ties should correspond without appreciable

queu ing delay.

It is apparent from the measurements ( E.2) that in Project C, the i n i tial tra in ing transient

has ended by the second week of operations. For techn ician T3, d ur ing the second week

all indicators ( l i nes wri tten, compi lations, loads) were above the long term averages.

87


During the same week, some of the indicators for T4 were lower, yet comparable to his

own averages over the first 9 weeks of the project

To simpl ify the evaluation of the measurements, Project C had been brought to a halt

before Projects 01 and 02 were started. The final transient of Project C, closely

resembl i ng the tai l of Project B. can be observed starting at about the 16th week.

The measurements also show that there was, on the average, one compilation for every 6

source l i nes. Given the average productivity of 6.12 1/m-h, we see that one man-hour

supported the average compilation ( 40 m i nutes, if the meta-programmer's t ime i s

excluded). One loading was performed ( implying approximately one bug) for every 11

source l ines. Obviously, compilation and load times (ranging from 30 seconds to 3

m inutes) had very l i ttle effect on productivity.

3.9.3 Projects Dl, D2, and D control

The purposes of the 0 experiments were (3.4) to measure production results i n groups

lead by d ifferent meta-programmers (Project 01 versus 02) and to compare the

performance of the meta-programming organizations with the performance of a group of

s imi lar size but using trad itional techniques (Projects 01 and 02 versus D control). The

optimal experimental ensemble would have Jet the three experimental groups work on the

same problem specifications, produce comparable products, and achieve the same

milestone before their termination. The actual execution of the experiments fel l short of

the ideal i n a number of ways. First, the scope of the problem was reduced midway

through Projects 01 and 02 (Appendix D); the 0 control team was given the simpl ified

specifications from the beginning. Second, Projects 01 and 02 had to be termi nated

before normal operations of the product could be demonstrated, al though test output

indicated the correct operation of large portions of the programs.

One problem with the large-scale experimental approach described i n Section 3.2 was th2t

the same resource l imitations preventing the repeti tion of the experiments for control,

prevented the exti rpation of anomal ies. Approximate results can be sti ll obtained by

careful consideration of the possible effects of the anomal ies. The fact that the size of

the program was in i tially misjudged ind icates an engineering, rather than production,

problem ( 1 .2). The causes and remedies of such mistakes were beyond the immediate

i nterests of the present research.

Al l three groups chose to rely on the services of the existi ng operating system

[ Lampson2] and on the same l ibrary sort routine. The sizes of these common routines

are excl uded from the program sizes l isted below and in Append ix E.

1 000

800

600

400

200

1 000

800

600

400

200

1 000

800

600

400

200

-200

-400

-

- - - -.. : _ _ _ _ j I I I 1

,- - - -, I I I I I I I I I I - - -J

.....

L- - - "

I � - - - -' I

.. - --

- - - -

I I I I '" - - --

0 1

02

0 CONTRO L

Figure I I Comparisons of productivities i n Projects 01, 02, and 0 control. Projects 01 and 02 are plotted accord ing to the conventions of Figure 10. The last plot shows the total codi ng productiv ity of the two participants of Project D control.

89

90

1 000

800

600 P2

400

200

0 1 • • •

800

600

s-P 1

400

200

0

7 [ • •

-200

-400

Figure 12 Comparisons of the ind iv idual productivi ties of the two participants i n Projects D control. The plots fol low the conventions of Figure 10. The sum of these two curves appears i n Figure 1 1.

3000

2000

-- - 0 1

1 000 - - - -- 02

3000

2000

1 000

-- 0 CONTR O L

8

Figure /3a Lines of code accumulated i n Projects 01, 02, and 0 control as a function of elapsed time. X axis is marked at every 5 working days elapsed.

- - - M 1 - - - - - M2

Figure /3b Lines of meta-programs accumulated i n Projects 01 and 02 (by meta-programmers M l and M2, respectively) as a function of elapsed time. Triangular symbol marks start of code production.

91


3.9.3.1 Results of Projects Dl and D1

It i s conservatively estimated that both projects 01 and 02 were terminated 4 man-days

before operational demonstrations. These estimates are supported by the following

observations: in both projects, all meta-programs have been completed and all code has

been written ; test output indicated that the most important sections of the programs were

working correctly; all participants have demonstrated previously thei r abi l ity to design or

elaborate code which was free of major surprises; and at 4 man-days, the simple

productivi ties of Project C and 01 would be approximately equal. A val iant, . but

unsuccessful , attempt to reach the mi lestone was i n fact made in 10 hours of overtime

(Appendix D), prior to the impending Christmas vacation period. The estimates are

equivalent to declaring the projects 92% complete (see below), a difference of 1 man-day

in the estimate would change the results by approxi mately 2%.

Mechanical application of the productivity accounting principles used earl ier yields the

following n umbers:

01: 2399 source l ines I 49 man-days - 6.12 llm-h

where the denominator is:

(5 weeks + 2 days) * 1 meta-programmer + (3 weeks + 3 days) * 1 technician + 4 man-days of debugging (estimate)

02: 2467 source l ines I 49 man-days (same as for 01) - 6.29 llm-h

The l ines of meta-programs written in the two projects differed considerably:

01: 1572(-187) l i nes, expansion: 173%

Note: 187 l ines of meta-programs were never elaborated because of the change i n problem scope.

02: 2304 l ines, expansion: 107%

The cumulative plot of meta-programming production is depicted on Figure 13b. The

start of meta-programming preceded the start of code production by more than one week

in both projects. Experience with Project C showed that supporting the immediate start

of coding put an unreal istic load on the meta-programmer. The lead times in Project 01

and 02 were to be used by the meta-programmers to build a comfortable backlog of

meta-programs. The difference in the lead times (also shown in Figure 11 ) is not thought

to be of significance.

The 173% expansion of the meta-programs in Project 01 was less than in C (184%)

although both projects involved the same subjects: M 1 and T4. The difference suggests


that d ue to the smal ler size of the project, the local language of 01 was less rich than that of C. Since the actual times spent meta-programming by M1 and M 2 were nearly equal

(98 and 96 hours, respectively) the lower efficiency of M2's meta-programs can be

attributed to a more verbose writi ng style. Also, M2 and T3 did not have the benefit of

prior col laboration so the meta-program expansion should be more comparable to that of

Project A (which was probably less than 149% (3.9.1)), than of 01. Some .of the verbosity

i n M 2's meta-programs found i ts way in to the elaborated code as wel l . The density of the

02 code was 3.61 binary words/source l ine, lower than the density of 01: 4.58.

Inspection of the code shows that M 2's selection of longer tags and extra-long identifiers

when the tags were combined (2.5) was the major cause of the lower density.

If the l ine counts were obtai ned by actually counting carriage-returns i nstead of the character counting method (3.7), the longer identifiers would have made only a small difference. Of course, the counts of carriage-returns would be sensitive to some other stylistic variations.

Compensating for the code densi ties changes the relative productivity figures. If 02 had

the same density as 01, the source length of 02 would be: 8898 words I 4.58 words/l ine =

1943 l ines, and the simple productivity measure would show:

02: 1943 01 density l in es I 49 man-days - 4.96 1/m-h

The considerable difference between the sizes of the programs in binary words (01 :

10988, 02 : 8898) was partial ly d ue to the differing amounts of test code built i n to the

programs. Inspection of the sources showed 423 l ines of test code in 01 (check

procedures, test prin t, and a functional simulator for the disk), versus 70 l ines in 02.

Removing all test code from both programs would have left approximately 9050 words in

01, 8650 words in 02. Other causes of the difference in size i ncl uded the unequal impact

of the changes in the problem specifications, and differences of programming style.

The weekly rates of code production are plotted in Figure 11. The cumulative plot of

code production is given in Figure 1 3a. These plots do not include compensation for the

differing code densities. It is apparent from the data in figures 10 and 11 (also in

Appendix E) that in both 01 and 02 , code was wri tten a t higher rates than duri ng any

week in Project C. Note that figures 10 and 11 were plotted in commeasurable un i ts. The

higher coding productivity of the techn icians can be partial ly attributed to the full

support of the meta-programmer, whereas in the earlier projects, the attention of the

meta-programmer was divided among two technicians. There were some indications that

the time of meta-programmers were underutil ized, especially toward the end of the

projects. In particular, both meta-programmers found some time to help debugging the

code. Measurements of their contributions are shown i n the Appendix ( E.5, E.6).

In summary, the short Projects 01 and 02 were at a relative d isadvantage compared to

the longer Project C, for three reasons. Fi rst, there was not enough . time for the

93


development of a powerfu l local language. Second, the meta-programming and codi ng

capacities of the min imal production team of two persons are unbalanced. Lastly , the

diseconomies of productivity transients at the project boundaries are relatively more

sign ificant in the smaller projects.

3.9.3.2 Results of Project D control

The s imple productivity of the control group was:

D control: 2893 source l i nes I 69 man-days - 5.24 llm-h

where the denominator is:

(6 weeks + 4 days) • senior programmer Pl + 7 weeks • jun ior programmer P2

However, this result i s not d i rectly comparable to the corresponding results of Dl and 02, because of substantial d ifferences in programming style, such as the i nclusion, by the

control programmers, of ample comments in the code. Note that elaborated

meta-programs do not contain comments (2.7.5), and while meta-programs substitute for

comments in a sense (2.7.5), they are not i ncluded in the source length measurements ( 3.7).

The plot of weekly rates of code production for the control group as a whole in given i n

Figure 11. Thi s plot shows the sum total of production by the two programmers, as

opposed to the 01, and 02 plots which show the productivity of a s ingle techn ician,

which, however, was supported by another person, the meta-programmer. All three plots

then show the effective productivity of 2 persons (1.5 persons in the sim i lar A+B and C

plots of Figure 10). Contributions from the two participants in the control experiments

are separated in Figure 12. The cumulative plot of code production is shown in Figure

1 3a.

The drop of the productivity curve below zero in Figures 11 and 12 was caused by the

senior participant, PI, edi ting and removing portions of the source code originally written

by the junior programmer P2. The reasons for the removal of source wi l l be discussed

below. Even after the trimming, the density of the code remained low: 2.97 words/source

l ine. Compensating for the densi ty, we get:

D control: 1876 Dl density l ines I 69 man-days - 3.40 1/m-h

The fin ished binary code was only 6364 binary words long, not incl ud ing the largest

fraction of test output routines which were prepared in separate program packages. The

code, however, implemented a simplified design, based entirely on the reduced problem

specifications (Appendix D).


As descri bed i n Section 3.4, the control team was organized of a senior participant. Pl. a

peer of Ml , and of an experienced jun ior programmer P2. The qualifications of P2 were

necessari ly different from the qual ifications of the technicians (T3 and T4); tradi tional

organization required experience for i ndependent performance in all phases of

programm i ng, i nclud i ng design, codi ng, and debugging. The greater experience of P2

would tend to make control comparisons tess favorable to Dl and 02, hence provide

conservative results. However, the only avai lable measures of P2's experience were

ind irect: n umber of years s ince BA degree, employ ment references, and salary h istory.

Before the start of the project, P2 had three weeks to work w i th another programmer on a

s imple u ti l i ty program so that he could get acquai nted with the programm i ng

environment. This train i ng t ime was not i ncluded i n the productivity measurements.

Unfortunately, Pl and P2 did not have an opportuni ty to meet before the project started.

During the first week of the project, the participants parti tioned the task along a

convenient l i ne: Pl was to work on the permuter (the second phase of the program, see

Appendi x D for the detai led specifications), whi le P2 was to write the planner (the f irst

phase). Pl assumed the leadership role by defin i ng a high l evel block-diagram of the

planner and by providing general guidance. The effectiveness of the guidance was

reduced by the difficulties of comm un ication between the programmers who were both

developing disjoin t local languages.

For example, Pl asked for ample test o utput to simpl i fy debugging. P2 compl ied,

except for a subtle detail; the test o utputs, at numerous places i n the planner,

contained the output values sampled before the output records were assembled

from the values. When there were any errors in the (non-tr ivial) assembly of the

records, the output sti l l appeared correct. It is, however, very d ifficult to describe

the correct way of implement ing test output, as well as al l other parts of a

program where subtle mistakes may be made, unless the communicants use the

same local language.

Measurements i n Appendix E.5 show that Pl d id very l i ttle, if any, debugging before the

4th week of the project. By the 7th week, the permuter was essential ly debugged and Pl

took over the debugging of the existing portions of the planner, while P2 was worki ng on

addi tional planner code. P2's employment con tract was termi nated after the 7th week and

Pl brought the project to i ts successful conclusion alone.

The shortcomings of P2's code came to l ight during the last two weeks.

Substantial amounts of source text removed by Pl i ncluded the mislead ing test

output statements (see above) and n umerous imprecise comments (cf.

[Kernighan-Piauger] page 119). I n some instances, instead of decipheri ng

erroneous logic, Pl replaced whole sections of the code ( ibid. page 50).

95


The individual contributions of PI and P2 in the total product can be estimated from the

data in Appendix E.5, by assuming that P1 created 100 lines of source during both weeks

7 and 8, since the n umber of lines typed on the keyboard were similar during weeks 6, 7,

and 8, and 110 l ines were created during the 6th week. Under this assumption, PI's share

was 1650 l ines (57% of total), versus P2's 1243 lines ( 43% of total).

c'HAPTER 4: CONCLUSION

98

4.1 Conclusions from the Experimental Results

The production experiments verified the qual i tative pred ictions of the theory. A

production organization was set u p which successful ly i m plemented a number of small

and med ium size systems at production rates above 6 J ines/man-hour (3.6, 3.9.2). This

organization was u nique in that i t could uti l i ze the experience of a single person. the

meta-programmer, for leverage i n a production team. G i ven an experienced

meta-programmer. equally good results were obtained by different technicians (Project C)

who satisfied certa in selection cri teria (3.4). These results are i nterpreted to mean that

the meta-programmer has absorbed most of the uncertainties ( 1.2) i nherent i n software

production which would normall y cause large differences i n individual productivities to

appear (1 .5).

Uncertainty absorption d id not mean that the task of the techn icians. the other members

of the production teams. was reduced to .routi ne. As the tasks were performed, the

technicians learned the problem specific local language (1.6) and progressively i ncreased

the ir relative contribution (3.9.2).

Techn icians were able to grow on the job; in particular, one former technician became the

meta-programmer in Project 02.

Further leverage was obtained by the separation of the engineering activities from the

production organization which i ntroduced another layer of uncertainty absorption. I n

Project D, the problem specification. prepared by an engineer. removed the major

uncertainties from the program implementation. Working from the specification. two

d ifferent teams, one lead by an experienced meta-programmer, the other by a less

experienced former technician , obtained comparable productivity results (6. 12 versus 4.96

l i ne equivalents I man-hour (3.9.3.1, Figure 13a)).

Although the time spent meta-programming was virtual ly identical for both

meta-programmers (3.9.3.1), the meta-programs written by the less experienced

meta-programmer, M 2, were substantially longer than those written by the more

experienced M l (Figure 13b). M 2's meta-programs were clearly not as efficient as

M l's, si nce the latter's group had higher net productivity, yet, considering the

c i rcumstances, the d ifference was surprisi ngly small .

Non-productive train ing time for techn icians was consisten tly negl igible (Projects A and

C) because what would be usual ly classified as tra in ing was recognized not to be

qual i tati vely different from the contin uous learn ing process which took place throughout

the projects. Meta-programs, written at different levels of detail , could serve as the main

M ETA-PROGRAMMING: A SOFTWARE PRODUCTION M ETHOD

i nstruments of communication from the meta-programmer to the techn icians (2.2) at al l

stages of train ing and program development

The results of the control experiment (3.4), for comparing the traditional programming

organ ization with meta-programm ing, were inconclusive, although 'at least one indicator,

the amount of binary code produced i n unit cal endar time, was sharply i n favor of the

meta-programming method (6.1.2 versus 3.40 l ine equivalents I man-hour (3.9.3)). Note

also that all the meta-programming groups also produced complete sets of meta-programs

which could be used as documentation (2.7), and that, in each of the projects, at least two

people were well acquainted with every detail of the logic of the programs. These

ancil lary benefits would be particularly important if the programs produced were parts of

a larger system. The control group on the other hand, could not create documentation as

a natural by-product, except for comments, which had l ess deta i l or uti l i ty than

meta-programs. Also, large portions of the program written by the control group were known only to a single programmer.

The simpl ified subject problem for the control experiment (Appendix D) was probably

too small to create the major communication problems the meta-programming

organ ization was designed to solve. Even with a smaller problem, the simultaneous

requirements of a controlled experiment, for resources and for motivated people with the

right qualifications, proved impossible to fulfil l entirely.

The productivity figures do not show the i ncreased rel iance of the control group on the sen ior participant, a cri tical resource. I n fact, the actual time spent by the

senior participant in Project D control was 30% higher than in 01 (note that this

number was not affected by the early shutdown of Project 01 (3.9.3.1), s ince

meta-programming was com plete before the shutdown ).

The key factor i n the lower productivity of the control group was the inefficient use of human resources: both the senior programmer P1, and the less experienced

P2, have spent most of their time worki ng on tasks of simi lar complexity and

value. Some of these tasks were in fact beyond the capabi l i ties of P2 and this led

to some wasted effort (3.9.3.2). The partition ing of the problem into largely

disjoint subproblems of approximately equal size and complexity implied the

reduction of communication needs of the group to exchanging information about

a narrow i nterface. This organizational simpl ification, however, delayed the

detection of P2's m istakes, and ultimately made it necessary for P1 to debug or to

rewrite unfami l iar sections of P2's code.

The 20% difference between Projects Dl and D control, i n the actual hours

worked per week by the junior participants, accounts for only about 0.3 l i ne

equivalents I man-hour in the productiv i ty difference, if the net con tribution of

99

100 CHAPTER 4: CONCLUSION

the junior participant i n Project D control is assumed to be 43% (3.9.3.2).

The experiments demonstrated the feasibi l i ty of contin uous production ( 1.2), as shown,

for example, i n Figure 1 3a, or i n the smooth transition between Projects A and B (Figure

10). The collection of productivity measurements was almost completely automated. The

measurements could have been used to mon itor and optimize the production process i n

real-time, except for our desire to simpl ify the experimental ensemble and delay the

evaluation of the measurements (3.3).

The use of design principles ' appropriate for high-productivity environments (1.3) was

essential for keeping the production teams occupied. User acceptance of the programs

(especially A and B), showed that the design of high quality programs may be obtained

from a continuous stream of largely i ndependent design decisions, each considered

unimportant in themselves. In Project C, time spent on system design and detailed design

was clearly less than 33% of the total, si nce only one out of three participants, namely the

meta-programmer was involved i n design, and since the meta-programmer had other

responsibi l i ties as well . Considering the actual time, rather than calendar time, spent by

the meta-program mer, we find that design took less than 20% of the total man- hours.

The meta-programming conventions and the debugging organization were also observed to

work well (3 .9.2). They ensured the surprise-free and contin uous execution of routine

tasks, such as the localization of fai l ures. The object naming conventions also contri buted to the actualization of the concept of local language, si nce the object names, i n fact,

comprise a large portion of local languages. Dependence on the existence of specific

programming language features, such as type checking, was reduced.

4.2 Recommendations for Future Work

We expect productivity to remain a key concern in the software industry. Accord ing to

the conclusions presented above, i t is unreal istic to assume that future experiments to

provide unequivocal comparative data about the meri ts and demeri ts of various

production methods could be successfully executed on larger scale and with better

control . It is also evident, however, that the automatic collection of productivity data is

relatively simple to implement. The most promis ing subject for future research,

therefore, might be the comparison of the measurements taken in the large scale software

efforts solving real problems. Innovative software producers should support such research

by col lecti ng and publ ishing productivity data.

Designers of the uti l i ty programs supporting software production, such as editors,

compilers, loaders, debuggers, or job control languages, should make provisions for

productivity measurement. Variations in programming languages and code density


could be accounted for by selecting a mix of representative programs, for example

from the set of standard algorithms publ ished i n the Communication of AcM, to

define the standard 200, 500, and 1000 l ines. These programs could be translated

into whatever programming language is used to yield the correction factor for the

measurements.

I n the designs of future programming languages, emphasis may be. shifted from the question of how can the programming language, by i tself, ensure the highest productiv i ty,

to the fundamentally d ifferent question of what can the programming language contribute

to the organization which has'· the h ighest productivity (2.9). Such shift may also occur i n

the research area of program correctness proofs.

For the business executive who may wish to try the meta-programming organization, we

have the fol lowing advice: Select a programmer with proven techn ical competence and who i s enthusiastic about the idea, as the meta-programmer. H i re entry level person nel

fresh out of col lege for techn icians. Insist that all appl icants be given a programm i ng

test, such as the one i n Appendix A. If the programming environment is properly set up

by the meta-programmer, the train ing time for the techn icians should be very short. The

reasons for the in i tial exclusion of other programmers with experience are, that train i ng time would not be saved, and that the programmer's experience may actually interfere

with the meta-programmer's efforts to control the creation of local language (2.2). Start

the team on a smaller problem (by absorbing the uncertainties about the boundaries, a

subproblem of a larger problem may be also used) and determine the team's productiv i ty

as the basis for future plann ing.

For the most spectacular results, the scope of the problems may be later expanded, so that

the team can go "critical" in the sense of Section 1 .4.

101

APPENDICES

104

Appendix A: Programming Test

The fol lowing programmi ng test was used to select techn icians for the exper imental teams

(3.4). The test was intended to be a s imulation of a meta-program for two reasons: to

help find those appl icants actual l y capable of elaborating meta-programs, and also to give

the appl icants some feel as to what is expected of them.

The f i rst portion of the test is a cover sheet explaining the ground rules, fol lowed by the

s imulated meta-program (The term specialist used on the cover i s a euphemism for

techn ician). The meta-programm ing conventions were not used to avoid the need of

explain i ng them.

The sharp contrast between the complexity of the abstract algorithm (for explanation of

the algorithm see [Knuth]) and the s implicity of the description of the steps i s

i n tentional. It was expected that most applicants would not be famil iar with the

algorithm and would have to complete the task without the benefit of deep

understanding. The appl i cants were given ample opportun i ty to ask questions so that

ambiguities i n the wordi ng of the test could have been resolved.

None of the selected techn icians knew of the algorithm prior to tak ing the test. Those

appl icants famil iar with tite algorithm happened to be also clearly overqual ified.

Common errors i ncluded exchanging elements of KEY, comparing elements of Q,

exchanging or comparing indices, and confusing the value of 0[1] with the name

Q[l] (at GETPO 5). There were also many errors in contorted WHILE statements

into which the appl icants were trying to force the algorithm.

The obvious specification error at PUTPO 5 (repeat from 3 instead of repeat from 4) was

i n troduced un intentionally. The reproduction of the test below has been slightly ed i ted to

conform to the format of the present work. The test given to the appl icants was prepared

on a typewri ter.

The attached sheet contains the descript ion of a programming task, typical of the k ind

of tasks special ists wi l l perform i n the Software Production Team.

Please write the three procedures described, using the language of your choice

(ALGOL or FORTRAN are preferred) . Try to make the code c lean and reasonably

effi c ient. You need not ensure or prove that the specif ied algorithms are correct.

You need not fo l low the spec i fi ed steps exactly. Write comments, but do not

"overcomment". Try to reflect the "state" of the variables in the comments.


The appearance of your completed manuscript shoul d be such that someone unfami liar

with the language should be able to copy it correctly. Ask any questions you wish

and work at an unhurried pace. Good l uck!

A Po (Priority Queue) is an i nteger array with the following properties:

1 . Po[O] contains LPO, the " length" of the Po which is always < MXLPO.

(Assume here that i ndexing with 0 i s al lowed.)

2. There is an i nteger array KEY (with indices ranging from 1 to MXLKEY- 1 ,

inclusive) and:

either LPO=O or

KEY[P0[ 1 ]] 2. KEY[PO[I]] for ali i such that LPO 2. 1 2. 1 .

This simpl y means that Po contains indices of KEY (pointers i nto KEY) and the first

index in Po points to a largest (maximal ) key, thus Po's are sorted in a very weak

sense.

Procedure to add an index to a Po:

PUTPO{Q, INDEX, KEY)

Q is a Po. Add index to a Q as fol lows:

1 . Increment the length (in 0[0] ). If too large, call ERROR("Po OVERFLOW") .

ERROR wi l l not return.

2. Store INDEX at Q[length] so it wi l l be the item at the end of the queue.

3. Set I = l ength.

4. If I = 1 , we are f in ished.

5. If KEY[Q[I]] ) KEY[Q[I DIV 2 ] ]

then exch.ange them and repeat from 3 with I = I DIV 2 . D1v is the integer

div is ion operator. Otherwise, we are finished.

Procedure to remove the index at the "top" of the queue (to obtain an index to a

maximal key) :

105

106 APPENDIX A: PROGRAMMING TEST

GETPO(O, KEY)

This function returns the index as its value. The algorithm is as follows:

1 . The result is 0( 1 ], of course. Save i t. If length=O, call ERROR("Po EMPTY").

2. Move the i tem at the end of the queue to O[ 1 ], the top.

3. Set I = 1 .

4. If 2 * 1 2. length, decrement the length and return with the result saved above.

5. Call 0[1] the "father" and 0[2 * 1] and 0[2* 1+ 1 ] i ts two "sons". Find the one

among the three with the greatest key comparing KEY[SON1 ] to

KEY[SON2] and so on, 3 comparisons altogether. If the father wins,

decrement the length and return as under 4. Otherwise, make I point to

the winning son, exchange same with the father and repeat from 4.

Procedure to check if an array is a Po:

CHECKP0(0, KEY)

Call ERROR("PO STATE INCORRECT" ) if 0 is not a PO

Otherwise return.

Appendix B: Format of the Measurement File

As described i n Section 3.5, measurements of production activity was recorded by the

Project 8 editor. The format of the measurement fi le i s given below. This format was

designed to accommodate extensions so that other tools. such as the compiler or the

debugger, may be also i nstrumented i n the future.

Throughout the description, field names wil l be shown i n lower case sans-serif letters (for

example: t ime) whi le upper case letters or other marks (P, *, or 541 ) denote the val ues of

fields.

All records on the measurement fi le consist of coded characters [ Ascn] and have the

fol lowing general form:

date t ime subsys type rest

where the fields are separated by blanks and the record is terminated by a carriage return

(CR) character. The fields contain the following information:

date

t ime

subsys

the year, month, and day as YYMMDD decimal digits.

the hour (24 hour system), minute, and second as HHMMSS.

identifier of the subsystem which made the record. The editor, the sole

source of measurements in the experiments, is identified as B.

type determines the format of rest relative to subsys. The editor uses two

d ifferent formats, identified as S and Q respectively. These formats are

described below.

rest other i nformation, as determined by subsys and type.

After every successful save command (4.4) the ed itor records the fol lowing information

(preceded by date time B):

S user f i lename nO balance keyboard ( fi lename1 n 1 f i lename2 n2 . . . )

where:

s

user

f i lename

is the type

i s the user (M1 . T1 , and so on)

is the name of the file i n which the edited text is saved. This fi le is

usually, but not necessarily, also the original source of the text. Fi lenames

107

108

nO

balanc e

keyboard

f i l ename 1

n 1

f i lename2

APPENDIX 8: FORMAT OF THE MEASUREMENT FILE

are written with extensions appended. By convention, the extensions

determi ne the type of the fi le: .MP for meta-programs, .SA and .OF

(defin i tions) for source code. Other extensions are also i n use for special

p urpose files.

is the n umber of characters written on fi le fi lename . .

i s the change i n the length of the file fi lename, that i s nO - (the length of

the file prior to the save command). N ote that the balance may be

negative. If the fi le is a new file, created by the save command, balance =

nO.

is the count of characters typed in to the saved text from the keyboard. For

example, if i n a program the word THEN is replaced by typing i n Do and

the result is saved, balance wil l be -2 and keyboard wil l be 2. The

characters in the edited text are flagged so that their origin can be

ascertai ned.

is the name of the first f ile, d ifferent from f i lename, also con tributing to

the saved text. This field i s empty unless copying of text from different

fi les took place.

is the n umber of characters contributed by f i lename1 .

i s the name of the second fi le... As many {fi lenamei , ni } pai rs appear as necessary.

At the end of an edi ting session, when the user executes a quit comman�. another record

is made (preceded by date time B):

Q user elapsed nk nc nd successor corns print • • remarks

where:

Q

user

elapsed

nk

i s the type

i s the user as above

is the elapsed time i n the session measured i n seconds

is the total n umber of characters typed on the keyboard. This n umber is i n

general grater than the sum of the keyboard fields of the type S records

for the session because of some of the characters typed may have been later

removed. Characters i m mediately backspaced over are 'not counted.

nc

nd

M ETA-PROGRAMMING: A SOfTWARE PRODUCTION M ETHOD

i s the total n um ber of characters copied with i n the same fi le or from other

fi les. See also the remark for nd.

is the total number of characters deleted. Whenever characters are moved

(that is copied while destroyi ng the original) both nc and nd wi l l be

incremented.

successor is a code for the successor program which the user may specify before

confirming the quit com mand (4.4). The code B i n this field denotes the

BCPL compiler, L denotes the loader. If no expl ic i t successor is specified,

this field will conta in an asterisk (• ).

corns

print

•

remarks

is the n um ber of prim i tive edi tor com mands executed.

is the number of pages l i sted on the l inepr inter.

unused fields for future expansion .

a possi bly · empty l ist of remarks made by the user. Each remark may

occupy a n umber of fields. The first field is always a remarktype which

determines what follows as described next.

The different remarks, with the remarktypes l isted f irst, and the c ircumstances of thei r

usage are as follows:

E n Revising n source code syntax or loader errors. I n particular, n = 0 means

that the edi ti ng activ i ty is necessary for the f ix ing of syntax or loader error

which has already been accounted for. After every compilation thi s remark

is automatically prompted and the user has to type in the number of

errors. If there were no errors, the DEL key should be used so that the

remark wi l l be omitted altogether.

8 n As above, except for semantic errors (bugs). The error may be in the

meta-program or i n the source as shown by the extension of the f i le being

edi ted.

C n As above, except for repeat efforts to fix semantic errors.

F f i lename This remark is made automatically when the compiler is specified as the

successor program. The f i lename designates the f i le bei ng compiled .

Z SUSPEND Marks the suspension of productive activ i ty: meta-programming,

elaboration, or debugging. Resumption may be marke<.f by e i ther:

109

1 10

Z RESUME

APPENDIX B: FORMAT OF THE MEASUREMENT FILE

or by any other log entry. This remark needs to be used only when the

starting activity does not leave i ts own record; for example when starting

the day using the debugger.

X anything To be used in exceptional situations.

The following two records are examples of the basic measurement formats:

750728 1 23850 8 S T4 PM.SR 7899 7 1 86 ( )

750728 1 23905 8 Q T4 951 95 1 9 1 1 93 8 44 0 • • E 1 F PM.SR

Using the above descri ption as a key, we find that the records were made on the 28th of

July, 1975, at around 12:39 noon. In an approximately 1 5 m inutes long edit ing session,

techn ician T4 fixed a s ingle syntax error in source file PM.SR. After 44 edi tor commands, the length of the file was increased by 71 characters. The new version of the file

contained 86 new characters, the balance, 7899 - 86 = 7813 characters, were transferred

from the old version; possibly somewhat rearranged (note the 191 characters copied and

deleted). At the end of the session, the compiler was called to compile the the same fi le

again.

Appendix C: Project C System Description

· C./ System Overview

An i nformal description of the Project C System ( Pes) is given i n the following sections.

The purpose of this documentation is to define the scope and complexity of the problem

the Project C experimental team worked on, and to i ntroduce the tool used to process the

measurement data.

Pes is a s imple Management Information System consisting of a user i nterface, a compiler

for the C language, and an i n terpreter. A lgor i thms for statistical processi ng of

measurement records can be expressed in the C language. The user i nterface al lows the

· user to pose a query by typing a program in the C language or by referring to a l i brary of

programs. A degree of parameterization of the l i brary programs i s made possible by the

macro faci l i ty. Once the specification of the query is completed, the macros are expanded

and the result is compiled by the compiler and executed by the i n terpreter. Programs for

typical queries scan the whole database and l ist the i r results both on the computer d i splay

and on a scratch fi le using the REPORT proced ure (C.5). Upon termination of the

program, the system awaits the next query.

C.2 Macros

Macros simpl ify the system by substituting for a procedure mechan ism and a run-time

user input mechanism. The latter is accompl ished by writing the programs as macros and

letting the user specify the val ues for the macro formal parameters. For each of these

parameters a prompting message, a default value, and an optional l ist of possible values

may be specified to simpl ify the user's task.

Macros are defined by the construction: {.-name\fparam1 \fparam2 . . . \body} , where the

body must be balanced with respect to braces ( {} ). There may be any number of fparami formal parameter names. A macro cal l is written as: {name\param1 \param2 . . . } . Here

the actual parameters are arbitrary strings of characters not con ta in ing the separator (\)

and balanced in braces. A macro call is equivalent to the expanded body of the macro

defined with the given name. The form {fparami} is a formal parameter call: expansion of the macro body means the replacement of the formal parameter calls wi th the

corresponding parami's. The actual parameters themselves are expanded before the macro was called.

lll

112 A PPENDIX C: PROJECT C SYSTEM DESCRIPTION

C.3 Record Declarations

Records are aggregate values such that the components of the aggregate may be selected by

symbolic field names. The record declaration serves to define the field names and also

to establ ish the correspondence between binary or Ascn external file formats and in ternal

data representation. The s implest form of the declaration is as follows:

RECORD recordtype(fieldname1 : type, fieldname2 : type ... )

The type (for example INT, TIME, ATOM , or STRING) i s used for the in terpretation of

external data only. Internally, fields are s imple variables, and they can hold values of any

type (C.4). Operations on records are d i scussed in Section C.4.6. The recordtype is a

user defined name for the record. The field names are also defined by the user. The

same field name may be used in different record dec1arations.

Record declarations are more complex if the external file format allows variabil i ty i n the

number or in the type of the fields. In particular, the declarations must accommodate the

measurement file format described in Appendix B. Thi s is done by the following devices:

C.3.1 In place of a type, a record or sequence (C.4.7) may be declared by writing

recordtype(f ie ldl ist) or recordtype[fieldl ist] respectively. I n either case, the

expl icit name for recordtype may be omitted. A fie ld l ist of a sequence defines

the succession of types in the sequence in a wrap-around order. Field names are

ignored in the sequence s ince fields are selected by ord inal n umber (C.4.7).

C.3.2 The fieldnamei and the fol lowi ng colon (:) may be omitted in sequences or if the field is just a placeholder.

C.3.3 A conditional expression may appear in a fie ld l ist:

< condit ion 1 f ieldl ist 1: condition 1 fieldl ist ... I f ieldlist>

The < sign may be read as if, the 1 following a condition as then, the 1 : as e/seif,

the final 1 as else, and the > as endif. The condition must be i n the form:

f ie ldname = constant, where the named field must precede the conditional

expression in the same record declaration.

For example, the measurement record format (Appendix B) may be declared, i n

part, a s follows:

RECORD LINE(TIME:TIME, ATOM, TYPE:ATOM, ATOM,

( TYPE = 'S' I FILENAME:ATOM, NO:INT, 8ALANCE:INT, KEYBOARD:INT,

OTHER:[ ( FILENAME:ATOM, N: INT)]

1 : TYPE = 'Q' I ELAPSED:INT, •••

> );


Note how types of some the fields depend on the value of the TYPE field. The l i st

of {fi lename;, n; } pai rs is declared as the value of the OTHER field a sequence of an unnamed record type. The field for fi lename i n these records may be named

the same as a field i n the LINE record.

C.4 Types in the language.

All values i n the C language are i nstances of some type. Most operations restrict the

types of thei r operands. All variables (in c1 ud ing elements of records or sequences) may

possess values of any type. The assignment operator (written as +- or := ) may be used to

assign any type. A complete l ist of types with their associated constants and operations is

given next:

C.4.1 N il : There is just one instance of this type: the n i l value. Al l variables are

in i tial ized to possess the n i l value. Most operations wi l J accept the n i l value and

wil l do somethi ng reasonable, as described i n the sequel. S ince there are no

boolean values, boolean operations (AND, OR , and NoT, also written as &, %, and

� ) interpret the n i l val ue as false and everythi ng else as true (boolean operations

will produce the integer 1 for true).

The n i l constant NIL is avai lable. The constant FALSE=NIL is useful in boolean

operations.

C.4.2 Integer: Sixteen bit i ntegers and the standard arithmetic and relational operations

(+, - , • , I, mod, ++- (+:=) , - +- (-:= ) , min, max, <, ), =) are available.

NIL wi l l be accepted in l ieu of the i n teger 0. Integer constants may be written i n

the decimal system as usual, for example 1 23. The form $x stands for the integer

character code of the character fol lowing the $. The constants MONDAY= l,

TUESDAY=2 ... JANUARY=!, FEBRUARY=2 ... are defined for use by the t ime

procedures (C.5). The constant TRUE=l i s useful i n boolean operations.

C.4.3 Time: Instances of this type may be i nterpreted ei ther as a time interval of

seconds (up to 232 seconds), or �s an absolute date by representing the interval

between the date and the 1st of January, 1900. 0:00. A number of procedures are

avai lable to create and modify time values (C.5). The operations +, - , < , ) , = are

also avai lable.

There are no time valued constants (but see C.4.2 and C.5).

C.4.4 Atoms: Atoms are alphanumeric stri ngs represented by their i ndex i n a symbol

table. The = operation may be used with atoms. Atoms are also used i n

conjunction with sets (C.4.8).

113

114 APPENDIX C: PROJECf C SYSTEM DESCRIPTION

The file conta in ing the symbol table has to be declared in the beginn ing of any

program which uses atoms read from files, by writi ng:

ATOMFILE "fi lename"

The extension .AT for the fi lename is automaticall y supplied.

Atom constants may be written enclosed in si ngle quotes: 'atom'.

C.4.5 Strings: for efficient representation of sequences of characters. The operations are: < , > , =. String concatenation may be written as + or + +- . Substring and find

procedures with various options are l isted in Section C.5.

String constants are written in double quotes: "string". CR i s a string constant

contain ing a single carriage return.

C.4.6 Records: Records type values may be created by the INIT statement (INIT

variable:recordtype) which assigns a variable a record value of the desired type.

All fields in the record are i n itial ized to n i l. Records are also created by reading

the record from a fi le using the NEXT statemen t (C.4.9).

The other operation on records is field selection, written as:

record . f ieldname

When used in an expression, the value of the selection is the value of the field

fieldname in the specific record i nstance. A selection may also appear on the left

side of the assignment operator, in which case the selected field wi l l be assigned a

new value. For example, one can write:

R.F +- R.F + 1

There are no record constants.

C.4.7 Sequences: simi lar to records, except values are selected by indexing. Note that

elements i n the sequence need not be of the same type. N i l is accepted as the

empty sequence.

Selection by indexing is written as: sequence[ integerexpr] . The selection may be

written on the left side of an assignment or in any expression (C.4.6). Index 0

selects the first element in a sequence. The largest i ndex used in an assignment,

pl us one, is called the length of a sequence. Un in i tial ized elements in a sequence

wi l l appear to contain n i l values.

NIL may be used as a sequence constant.

M ETA-PROGRAMMING: A SOFTWAR E PRODUCTION M ETHOD

C.4.8 Sets: A set is a sequence of atoms without repeti tion of any atom. N i l is accepted

as the empty set. Sets may be indexed just as sequences can . Other operations are:

AND, OR, MINUS, IN , and INTO. The first three are the set i ntersection, un ion , and

d ifference operations respectively, all return i ng sets. The binary operations IN and

INTO check the membership of atoms in sets as fol lows:

atom IN set: returns the i n teger i such that: set[ i]=atom, or returns NIL i f

there does not exist such i .

atom INTO setvariable: this operation first ensures that the atom is a

member of the set (by doing setvariable � setvariable OR SET(atom ) i f

necessary) and returns the integer i such that: setvariable[ i] =atom .

NIL may be used as a set constant.

C.4.9 Streams: for file transput. Every stream is associated with a data file and a binary

property determin ing whether the file is encoded as binary data or as Ascn

characters. Binary streams are easier to process. Measurements are originall y

recorded in fi les which are not binary, however. Operations on streams are:

creation (OPEN, C5), i nput, and output. The input statement:

NEXT variabfe:recordtype FROM stream

reads and converts the next record from the stream and assigns the variable the

record value (C.4.6). If the end of the data stream is reached, the n i l value i s

assigned to the variable. The data conversion i s d i rected by the record type

declaration (C.3). The output statement is s imi lar:

NEXT variable:recordtype To stream

For output, the variable m ust contain a record val ue. The record type and the

types of the fields in this record m ust correspond to the record declaration.

C.4.10 Statistics: I nstances of this type contain a set of double precision integer values to

accumulate sums and sums of squares. The ++- operation with statistics type left

operand will form the sums, sums of squares and counts of the in teger values

appearing on its right. Standard procedures are avai lable to obtain the mean and

the standard deviation from the collected values (C5). Other operations: +, - , * , I ,

<, = , > , and REPORT treat statistics type values as double precision integers (32 bits

precision). Calculation of mean and standard deviation are meaningless after

them.

DO is the constant 0 for in i tial ization of variables and to establ ish their types.

115

116 A PPENDIX C: PROJECf C SYSTEM DESCRIPTION

C.4.1 1 Formats: special values returned by certai n standard procedu res (C5). By

presenting these values to the REPORT procedure, the format of the report may be

controlled. The format values themselves wi l l not be prin ted.

C.5 Other Statements

Statements i n the C language are separated by semicolons (;). The assignment statement i s

written as:

leftpart � expression

where the leftpart may be a variable or a selection (C.3). Variables need not be declared

and they wi l l be in itial ized to n i t values. Parenthesis may be used i n expressions,

otherwise the customary rules of precedence apply [Wijngaarden]. The form:

procedure(parameter1 , parameter2 . . . )

is a call on one of the standard procedures (C.6). Procedures which return a val ue may be

called from expressions.

Comments may appear anywhere, starting with double hyphens (-- ) and term inated by

the hyphens or by the end of l ine.

The avai lable loop forms are as follows:

FORALL variable INDEXING sequence DO body

FORALL variable IN sequence DO body

FoR variable FROM integer To integer BY integer Do body

FROM integer To integer BY integer Do body

To integer BY integer Do body

WHILE boolean Do body

.

Expressions may be used where a type is indicated. Sets may be used i nstead of

sequences. Times may be used instead of integers. The BY clauses may be omitted i n

which case BY 1 witt be assumed. The loop bodies are l i sts of statements enclosed in

square brackets ([] ). The statement:

BREAK

wri tten in the body wil l exit from the loop, whi te the statement:

META-PROGRAMMING: A SOFTWAR E PRODUCTION M ETHOD

LOOP

wil l skip the rest of the body. The forms of the conditional statement are:

IF boolean THEN body

IF boolean THEN body ELSE body

IF boolean THEN body ELSEIF boolean THEN body

The special loop form:

FORALL variable:recordtype IN stream Do body

is a convenient short notation for:

WHILE TRUE DO

[ NEXT variable:recordtype FROM stream;

IF variable = NIL THEN [ BREAK ] ;

body

]

C.6 Standard procedures

REPORT( . . . ) prints the arguments one by one on the computer display and the standard

output fi le. The prin tout format depends on the argument types. In particular,

structured values are printed as if their elements were en umerated in order.

FCHARS( i nt ) returns a format value control l ing the n umber of characters to be occupied by

an item on the report. lnt=O means free format.

FlTEMS(int) returns a format value control l i ng the n umber of items per l ine in the report.

lnt=O means free format.

FJUST(I j ) returns a format val ue control l i ng whether the proper characters of the item

should be left (lj is true) or right justified.

OPEN(f i lename, f lag ) returns a stream val ue associated with the fi le f i lename (a string) i n

binary mode i f the f lag is true, otherwise, or i s the flag i s omi tted, i n ASCII mode.

WITHIN(a, b, c ) returns true if, and only if, a is in the closed in terval [b, c ]. The

parameters m ust be times or integers.

WITHOUT(a, b, c ) returns true if, and only if, a is not in the closed interval [b, c ]. The

parameters must be times or integers.

117

118 APPENDIX C: PROJECT C SYSTEM DESCRIPTION

MtN(a, b . . . ) returns the smallest among a, b and so on (times or i ntegers)

MAX(a, b . . . ) returns the largest among a, b and so on (times or i ntegers)

STRING(any) returns a str ing which would be prin ted by REPORT for an i nteger, string,

atom or time val ue.

ATOM(any) returns an atom such that STRING(ATOM{STRING{any ) ) ) = STRING{any )

SUBSTRING{ str ing, i 1 , i 2 ) returns the substri ng from character i 1 up to and i ncluding

character i 2 . Indexing of characters starts with 0 . The n u l l string is returned i f

i 2 ( i 1 o r if the ind ices are out of range.

REPLACE(string 1 , i 1 , i 2 , string2 ) returns a copy of strin g 1 i n which. the substring i 1

through i 2 i s replaced by string2 .

FtND{stri ng 1 , string2) returns the index of the fi rst character of str ing2 i n string 1 , or n i l

if string2 i s not contained i n string 1 .

SET(a, b . . . ) returns a set contain ing the atoms a, b . . .

SEOUENCE(a, b . . . ) returns a sequence contain ing the values a, b . . .

PERMUTE{a, b ) Parameter b must be an i n teger sequence, a is a set or sequence; returns a

permuted by b . (b[ i ] determines the new index of a[ i ] )

PERMSORT(sequence) returns a sequence of i ntegers which is a permutation vector which

if appl ied to the sequence (or set) will result in a sorted sequence. Sets are

sorted by comparing the printed representation (see STRING) of their constituent

atoms. Sequences must contain integers or times.

SORT(a) does PERMUTE{a, PERMSORT(a ) )

DATE{ year, month, day) returns the absolute date (C.4.3) assembled from the i n teger

operands. The integer constants JANUARY, FEBRUARY . . . may be used for mon th.

YEARS{ int ) returns the time interval of i nt years.

MONTHS( int) returns the time in terval of int months.

DAYS{ int) returns the time interval of i n t days.

HOURS{ i nt ) returns the time interval of int hours.

MINUTES{ i nt ) returns the time interval of int min utes.

SECONDS{ int) returns the time i n terval of i n t seconds.


Now(int) returns the current absol ute date.

IYEAR(t ime) returns the in teger year portion of the absolute date.

IMONTH(t ime) returns the i nteger month portion of the t ime value.

IDAY(t ime) returns the in teger day portion of the t ime value.

IHOUR(t ime) returns the in teger hour portion of the t ime value.

IMINUTE(t ime) returns the integer m i n ute portion of the t ime value.

ISECOND(t ime) returns the i nteger second portion of the t ime value.

IWEEKDAY(time) returns the integer weekday of the absolute date. The result can be

checked against the constants MoNDAY, TUESDAY .. .

MEAN(stat, mult) returns mult times the mean accumulated i n the stat ++- . . . operations as

an integer value. Mult may be omitted and then i t defaults to 1 .

SIGMA(stat, mult) returns mult times the standard deviation accumulated in the stat ++- . . .

operations as an integer value. Mult may be omitted since i t defaults to 1 .

DMEAN(stat, mult) returns a double precision mean i n a statistics type value. Mult may be

omitted and then i t defaults to 1 .

C.l Example

Let us assume that given a measurement file (Appendix B), a report of the fi les

mentioned i n i t and their final lengths is desired. The report should appear in two

columns, sorted alphabetically on fi lenames. The program can be wri tten as follows:

{ +-FILELENGTHS\FILE\

RECORD LINE(TIME:TIME, ATOM, TYPE:ATOM, ATOM,

( TYPE = 'S' I FILENAME:ATOM, NO:INT, ... - - see C.3 ) );

FORALL L:LINE IN 0PEN("{FILE } " ) DO

[

LENGTHS[l.FILENAME INTO FILENAMES] +- L.NO

] ;

FORALL I I N PERMSORT(FILENAMES) D O

}

[

REPORT(FITEMS(2) , FCHARS{20) , FILENAMES(l ] , LENGTHS(l ] )

] ;

1 19

120 APPENDIX C: PROJECT C SYSTEM DESCRIPTION

The name of the input file is specified as a macro parameter, FILE. The report is prepared

as two sequences: FILENAMES, a set, holds the names of the files whi le corresponding

elements in LENGTHS hold the integer l engths. Note the use of enumeration through a

temporary permutation vector for printing the report i n alphabetical order.

Appendix D: Task Order for Project D

[ Note: this task order was changed (3.9.3) on December 12, 1975 with the addition of the

fol lowing qual ification:

NOTE: In i ts i n itial version, the program should j ust perform the default permutation:

@OTHER FILES

@FREE SPACE

with or without any i nput.

]

D./ Introduction

Project D is to implement a system for permuting the pages of a disk without changing

the contents of any file or the mean ing of any d irectory. The system comes in two parts:

the planner, which constructs the desired permutation of the pages and writes i t on a fi le;

the permuter, which performs the permutation specified by a fi le, which m ight be

the output of the planner, or might be generated in some other way.

The system should be able to handle up to a m i llion pages and two hundred thousand

fi les. In other words, any per-page or per-file i nformation m ust be kept on scratch files,

not in memory. The details of the disk format and input-output operations should be

wel l parameterized.

In order to make the program work at a reasonable speed, i t is essential to

run the disk at full speed while moving data, as nearly as possible (The time to

transfer one page is typically about one-twentieth of the time to make a random

reference);

do the bookkeeping of page positions with batch-processing techn iques (sorts and

merges) rather than straight-forward table lookups, s ince looking something up

randomly in a table will always require a disk reference.

To construct this system, you will have to know about the structure of a d isk. This

i nformation can be found in the operating system manual.

121

122 APPENDIX D: TASK ORDER FOR PROJECT D

D.2 The Planner

The planner takes as i nput a l ist of pairs: (partition, l ist of entities). A partition i s an

expression which speCifies a set of disk pages. I t has the form

[DRIVES ld, SURFACES lu, TRACKS It, SECTORS 15]

where each I has the form

I : := X I I X x : := i nteger 1 integer - integer I ALL

Part itions are a way of segmenting the d isk. A file i s not al lowed to occupy more than

one parti tion. In other words, if any pages of a file are in a given partition, then all the

pages of the file must be i n that partition.

The l ist of enti ties is a sequence of enti ties separated by spaces or carriage returns. An

entity may be

a file name, which may i nclude #s and • s, which should be i nterpreted as

matching a single character or an arbitrary stri ng respectively.

@OTHER FILES

n FREE PAGES

@FREE SPACE

@FREE SPACE*f

The constructed permutation should leave the fi les in the order i ndicated by the l ist of

enti ties within each parti tion; i .e. successive files i n the l ist of entities occupy success ive

virtual disk addresses. The pages i n each file should occupy disk pages with consecutive

virtual addresses, and should be ordered according to page n umber i n the file.

The entity @OTHER FILES stands for all the files not mentioned expl ic i tly i n the entity

l ist. The entity n FREE PAGES means that n free pages should be inserted at that poin t.

The entity @FREE SPACE*f stands for a fraction f of the free space in the current

parti tion ( i .e. the n umber of pages in the partition m inus the number of pages in all the

fi les in the en ti ty l ist). Here f is expressed i n decimal, e.g. @FREE SPACE* .333 for one

third of the free space. @FREE SPACE stands for any space left over after all the other

enti ties have been taken care of.


Here is an example of i nput to the planner:

[DRIVES 0, SURFACES 0- 1 , TRACKS 0- 1 7 4 225-400, SECTORS ALL]

*.BR

@OTHER FILES

@FREE SPACE

[DRIVES 0, SURFACES 0- 1 , TRACKS 1 75-2 24, SECTORS ALL]

SYSDIR

20 FREE PAGES

BCPL.*

@FREE SPACE

This i n put specifies two partitions. The second one, which occupies the middle tracks of

the disk, wil l contain the system directory SYSDIR (followed by 20 free pages) and all the

BCPL fi les. The remainder of the disk wil l get all the other fi les, with the .BR files first.

The output of the planner is a file contain ing a sequence of d isk addresses. The ith item

in this sequence is the destination of the page which currently has d isk address i. (If

some other representation of the permutation proves to be more conven ient, that i s fine.)

D.2. 1 Planner Algorithm

Here is a possible way for the planner to operate.

1. Look up all the file names in the entity l ists and replace each by the identifier

(serial and version number) of the fi le. This should be done by sorting the

d i rectories and the entity l ists, and then passi ng one against the other.

2. Make a complete scan of the disk and construct a l ist D which describes the

contents of each non-empty disk page: [disk address, fi le identifier, page

number].

3. Sort D on file identifier and page number. Sort the entity l i sts the same way,

keeping track of the position of each entry.

4. Pass the sorted entity l ists against D (all at once) and add the partition and

position within the parti tion to each entry in D. At the same ti me, make a l ist F

with one entry per file which contains the identifier, length, parti tion and

position of the fi le.

5. Sort F by parti tion and position.

123

124 A PPENDIX D: TASK ORDER FOR PROJECT D

6. Now i t is easy to compute the destination of the first page m each file, since the

files in F are in the order i n which they are to appear on the final d isk. Add this

i nformation to each entry of F, and sort i t again by f i le identifier.

7. Pass F against D and add the final position information to each entry of D.

8. Finally, sort D by current disk position.

D.J The Permuter

There are three jobs to be done by the permuter:

1. Move the data;

2. Fix up the chains of forward and backward pointers which l ink the pages of each

fi le together;

3. Fix up all the d irectories so that each entry con tains the new d isk address of the

leader pages for its fi le.

D.3. 1 Permuter Algorithm

The algorithm to be used is a simple recursive one. At each stage it is working on an

active region of n consecutive disk pages, to which some permutation m ust be applied

( in i tial ly i t is working on the entire disk). If n is smal l enough that there is room i n core

for al l the data, simply read in all the pages, and rewrite them in the permuted order.

Otherwise, spl i t the active region i n to two sub-regions A and B, each contain ing n/2

pages, and switch pages between A and B unti l each page is in the proper sub-region.

To do this spl i t, start at the beginn ing of A and fi l l memory with pages from A which

belong in B, leaving room for one track worth of data. Then move to the beginn ing of B,

and track by track read in pages which belong in A, and then write onto the space thus

freed, and any free pages, the pages in memory which are i n transit from A to B. Then

go back to A and iterate this procedure until both regions have been exhausted. The time

requi red is twice the time to scan the entire active region, plus some seek time which wi l l

be fai rly small i n comparison. Now apply the algorithm recursively to regions A and B.

The total time to deal with a region of n pages is roughly


where T is the time to read one page. and M is the number of pages which wi l l fit i n

memory with room for one track more.

This algorithm needs as i nput a l ist which gives the destination of each page ordered by current location of the page. It should produce two n ew l ists which serve the same

purpose for each sub-region, so that the recursion can proceed. The construction of these

new l i sts can easi ly be done while the data is being moved. since all the necessary

information is avai lable i n the right order.

D.4 Remarks

Some care m ust be taken with the scratch fi les. s ince they are being moved along with

everyth ing else. It would probably be prudent to create all the scratch files needed before

doing anythi ng else.

125

126

Appendix E: Summary of the Measurements

The fol lowing reports were generated from the measurement database by small programs

written i n the C language (see Sections 3.5, 3.7, and 3.8). The outputs of the programs

were edited to conform to the format of the present work. The reports are ordered by

projects, and by employees with in a project. Two different types of reports appear: f irst,

a daily and weekly breakdown of the actual time spent by the employee i n productive

capaci ty, and second, the weekly breakdown of the number of l ines of meta-program or

code written and compilations i n i tiated. The precise mean i ngs of the labels used are as

follows:

week

l ines

kbd. l in

weeks are n umbered to correspond to the label i ng i n Figures 10 through

13. If a date is g iven, it refers to the Monday of the week.

net change in the length of meta-programs (for M 1 and M2) or i n the

length of source code expressed in l i nes (3.7).

n umber of J ines typed in from the keyboard. Some of these l ines would

be later deleted or duplicated by copyi ng.

days/week (man-)days in the week, excluding hol idays.

meta-programmers.

Not used for

cor. l in

tot. com

net.com

loads

the l ines col umn corrected for the standard 5 day week. This number i s

used i n Figures 10 through 1 3.

the total n umber of times the compiler was called. It was standard

practice to run the compiler on i ncomplete code to get a listi ng of symbols

which had to be defined.

n umber of compi lations without errors.

the total n umber of times the loader was cal led.

Important note: the n umbers do not add up because of truncation in the terms. Sums

given are the precise sums truncated. Denominators in the l istings of averages were

selected for convenience.

META-PROGRAMMING: A SOF-TWARE PRODUCTION METHOD

E. / Projects A+B

Note: during the early experiments group (Projects A and B) measurements were not as

extensive as i n the later ones (4.4).

Employee: M 1

week l ines

1 1 -Jul-75 340 2 8-Jul-75 530 3 1 5-Jul-75 568 4 22-Jul-75 37 5 29-Jul-75 701 6 5-Aug-75 262 7 1 2-Aug-75 4 1 3 8 1 9-Aug-75 453 9 26-Aug-7.5 80

1 0 2-Sep-75 1 56 1 1 9-Sep-75 1 73 1 2 1 6-Sep-75 1 1 2 1 3 23-Sep-75 8

total 3832 total/ 1 2 3 1 9

Emgloy�e: T1 + T2

week lines man-days/week cor. l i n

1 293 6 243 2 546 1 0 273 3 867 1 0 433 4 235 1 0 1 1 8 5 666 1 0 333 6 666 1 0 333 7 322 1 0 1 6 1 8 722 1 0 361 9 231 1 0 1 1 6

1 0 201 8 1 26 1 1 1 96 1 0 98 1 2 329 1 0 1 64 1 3 398 1 0 1 99

total 567 1 total/ 1 3 436

127

128 APPENDIX E: SUMMARY OF THE M EASUREM ENTS

E.2 Project C

Employee: Ml

M T w T F s s total 1 1 4-Jul-75 ? ? 3 : 1 1 2 :01 0:33 0:47 5: 1 5 1 1 :49 2 2 1 -Jul-75 0: 1 0 2 :24 1 :29 1 :52 4:45 1 :06 1 :07 1 2:56 3 28-Jul-75 1 :03 4:27 3 :34 7:42 2:24 1 :5 1 0:37 2 1 :41 4 4-Aug-75 2:37 4:26 7 :29 7:06 6:52 2 : 1 4 3:49 34:37 5 1 1 -Aug-75 7:25 5:08 4 :31 1 :48 0:29 0:00 2 :31 2 1 :54 6 1 8-Aug-75 4:49 4: 1 3 2 :49 2 :27 3:34 1 :23 3:08 22:26 7 25-Aug-75 3:52 1 :4 1 1 :59 5:07 3:05 0:00 0:3 1 1 6: 1 7 8 1 -Sep-75 5:03 1 :39 5:2 1 4:58 3: 1 6 0:00 1 : 1 5 2 1 :34 9 8-Sep-75 8:09 6:24 3:53 2:58 0:00 0:00 2:24 23:49

1 0 1 5-Sep-75 3:31 1 :58 6:29 2: 1 4 0:00 0:00 0:00 1 4: 1 4 1 1 22-Sep-75 1 : 1 3 0:00 0:00 0:00 2:04 4:20 3:57 1 1 :37 1 2 29-Sep-75 0:00 0:00 0:00 0:00 0:00 0:00 0:00 0:00 1 3 E;>-Oct-75 3:52 5:07 5:27 5:03 4:07 1 : 1 9 2 :30 2 7:28 1 4 1 3-0ct-75 5:54 0:44 0:00 3:09 2 : 1 3 0:00 1 :37 1 3:39 1 5 20-0ct-75 0:00 0:00 0:00 0:00 2 :25 0:53 1 : 1 9 4:38 1 6 27-0ct-75 2: 1 9 1 :50 2 :25 2: 1 6 4:52 0:00 0:00 1 3:44 1 7 3-Nov-75 2:45 1 :05 2 :06 0:34 0:43 0:00 0:00 7: 1 5 1 8 1 0-Nov-75 0:49 1 :06 1 :06 0:00 0:3 1 0:00 0:00 3:34 1 9 1 7-Nov-75 0: 1 6 0:20 0:00 0:00 0:00 0:00 0:00 0:36

total 283:48 total/ 1 8 1 5:46

week l ines kbd. l in days/wk cor. l i n

1 42 1 397 5 421 2 375 351 5 375 3 526 485 5 526 4 324 2 1 3 5 324 5 622 477 5 622 6 345 322 5 345 7 1 56 1 40 5 1 56 8 250 1 85 5 250 9 354 3 1 0 5 354

1 0 98 77 5 98 1 1 1 45 1 35 5 1 45 1 2 0 0 0 0 1 3 749 664 5 749 1 4 258 1 94 5 258 1 5 38 28 5 38 1 6 79 85 5 79 1 7 1 52 1 3 1 5 1 52 1 8 1 5 4 5 1 5 1 9 1 1 1 5 1

total 49 1 6 42 1 7 total/ 1 9 258 221

META-PROGRAMMING: A SOFTWARE PRODUCfiON M ETHOD 129

Project C continued • . .

EmRIQ�ee: T3

M T w T F s s total 1 1 4-Jul-75 ? ? 1 :34 8:46 4:29 0:00 0:00 1 4:50 2 2 1 -Jul-75 5:42 8:0 1 4: 1 8 5:50 6:03 0:00 0:00 29:55 3 28-Jul-75 5:08 7:24 4:28 6:56 7 :01 0:00 0:00 30:59 4 4-Aug-75 7 : 1 0 8:08 7:09 7:44 8:32 0:00 0:00 38:45 5 1 1 -Aug-75 6:5 1 8 :41 7 : 1 6 6 :41 7 :56 0 :00 0:00 37:28 6 1 8-Aug-75 7 :29 7 :53 5 :2 1 6: 1 1 6:43 0:08 2:37 36:24 7 25-Aug-75 3:54 7 :33 5 : 1 6 7:37 7 :36 0:00 0:00 3 1 :58 8 1 -Sep-75 0:00 4 :25 7 :48 7 :27 6:28 0:00 0:00 26: 1 0 9 8-Sep-75 6 :01 3:5 1 4:59 5:47 7 :20 0:00 0:00 28:01

1 0 1 5-Sep-75 7 :46 . 8:50 8:52 8: 1 3 8:52 0:00 0:00 42:35 1 1 2 2-Sep-75 8:42 5 :03 5:48 4:39 7:25 0:00 0:00 3 1 :40 1 2 29-Sep-75 6:58 7:37 7 : 1 6 7:05 3:57 2:39 0:00 35:35 1 3 6-0ct-75 4:22 8:05 5:34 5:44 5:55 0:00 0:00 29:42 1 4 1 3-0ct-75 6:59 5:26 ' 8:34 7:38 7 :42 0:00 0:00 36: 2 1 1 5 20-0ct-75 6:52 7 :34 8:50 8:06 7:03 2:02 0:00 40:31 1 6 2 7-0ct-75 4:44 8:20 7:30 7:44 5:23 0:00 0:00 33:43 1 7 3-Nov-75 7 :53 7 :27 6:45 6: 1 1 5:49 0:00 0:00 34:08 1 8 1 0-Nov-75 6:45 7:04 5:27 7 :28 6:37 0:00 0:00 33:24 1 9 1 7-Nov-75 6:28 6:3 1 6:39 7:29 7 : 1 4 0:00 0:00 34:23 20 24-Nov-75 7 :09 0:00 0 :00 0:00 0:00 0:00 0:00 7:09

total 633:4 1 total/ 1 9 33: 2 1

week l ines kbd. l in days/wk COLlin tot. com net. com loads

1 270 1 99 5 223 26 1 9 1 3 2 690 299 5 558 88 67 35 3 596 4 1 9 5 5 1 0 72 52 26 4 1 70 302 5 267 72 67 47 5 496 586 5 445 54 45 34 6 362 331 5 42 1 96 85 50 7 242 1 82 5 248 34 27 1 7 8 4 1 8 233 4 489 43 28 1 5 9 474 1 84 5 406 50 40 24

1 0 4 1 3 4 1 9 5 429 1 20 92 47 1 1 383 358 5 320 6 1 54 26 1 2 6 1 4 473 5 452 6 1 49 38 1 3 604 307 5 588 29 23 1 4 1 4 2 1 8 337 5 372 69 49 23 1 5 679 4 1 7 5 478 90 74 42 1 6 243 337 5 2 1 8 70 65 32 1 7 342 323 5 284 52 40 25 1 8 2 1 6 286 5 264 6 1 46 36 1 9 - 1 4 1 34 5 53 38 34 27 20 0 0 1 1 65 3 3 2

total 7423 6 1 33 1 1 89 959 573 total/ 1 9 390 322 62 50 30

1 30 A PPENDIX E: SUMMARY OF THE MEASUREMENTS

Project C continued • • •

Employee: I4

M I w I F s s total 1 1 4-Jul-75 ? ? 0:49 4:52 6:43 0:00 0 :00 1 2:25 2 2 1 -Jul-75 6:37 8:30 6:36 8: 1 4 4:50 0:00 0:00 34:48 3 28-Jul-75 8 : 1 1 6: 1 7 5:49 5:43 6:46 0:00 0:00 32:48 4 4-Aug-75 5:52 8:38 5:43 6:57 7 :38 0:00 0:00 34:50 5 1 1 -Aug-75 5: 1 0 8: 1 5 8:04 7:34 8:22 0:00 0:00 37:27 6 1 8-Aug-75 5: 1 7 7:35 6:58 6:47 9:06 0:00 0:00 35:45 7 25-Aug-75 7:38 6:2 1 . 7 : 1 0 8:2 1 7 :55 0:00 0:00 37:26 8 1 -Sep-75 0:00 7:52 7 : 1 6 7 :54 5:46 0 :00 0:00 28:49 9 8-Sep-75 8:23 7:38 7 :50 6:53 7 : 1 1 0:00 0:00 37:57

1 0 1 5-Sep-75 8:05 7:25 4:50 7: 1 2 7 : 1 6 0:00 4 :41 39:32 1 1 22-Sep-75 8:53 7:34 3:35 6:38 7 :56 2:28 0:00 37:07 1 2 29-Sep-75 5:59 6: 1 5 8 : 1 4 7 :59 8: 1 7 0:00 2 :22 39:08 1 3 6-0ct-75 8: 1 3 7 :04 7 :58 7 :31 5:40 0:00 0:00 36:28 1 4 1 3-0ct-75 7:38 7:39 8:54 6:43 6:54 0:00 0:00 37:50 1 5 20-0ct-75 6:25 7 :35 7 :35 7:55 5 :41 0:00 1 :38 36:51 1 6 27-0ct-75 7:52 7:33 7:32 5:45 8:55 0:00 0:00 37:40 1 7 3-Nov-75 6:49 7: 1 4 7 :4 1 8:46 8:54 0:00 0:00 39:26 1 8 1 0-Nov-75 8: 1 8 7:28 7:52 9:04 7 :34 0:00 0:00 40: 1 8 1 9 1 7-Nov-75 8: 1 0 9:54 6:30 8:04 7 :27 0:00 0:36 40:44 20 24-Nov-75 7 :21 0:00 0:00 0:00 0:00 0:00 0:00 7 :2 1

total 684:40 total/ 1 9 36:02

week l ines kbd.l i n days/wk cor. l in tot. com net. com loads

1 1 76 1 1 1 5 223 22 1 5 1 1 2 427 288 5 558 47 3 1 1 6 3 425 348 5 5 1 0 47 27 24 4 364 386 5 267 35 30. 1 9 5 394 424 5 445 50 36 27 6 481 4 1 0 5 4 2 1 54 37 28 7 254 230 5 248 58 42 32 8 364 268 4 489 40 32 25 9 338 320 5 406 54 41 29

1 0 445 2 1 9 5 429 82 67 60 1 1 258 3 1 1 5 320 27 20 1 2 1 2 291 554 5 452 1 23 93 53 1 3 573 544 5 588 9 1 78 4 1 1 4 526 549 5 372 7 1 39 1 6 1 5 277 294 5 478 66 6 1 67 1 6 1 94 287 5 2 1 8 79 67 5 1 1 7 226 97 5 284 5 1 46 5 1 1 8 31 2 205 5 264 48 40 38 1 9 1 2 1 1 87 5 53 51 40 46 20 67 9 1 1 65 4 4 5

-

total 652 1 6052 1 1 00 846 651 total/1 9 343 3 1 8 57 44 34

MET A -PROGRAMMING: A SOFTWARE PRODUCTION METHOD

E.J Project Dl

EmgiQ��e: M 1

M T w T .. 1 0-Nov-75 0:00 1 :35 2 :4 0:00

1 7-Nov-75 2:47 0:34 3:57 2:49 1 24-Nov-75 4:47 0:42 2 :25 0:00 2 1 -Dec-75 1 :00 2:34 6 : 1 8 1 :38 3 8-Dec-75 2:52 2:56 6:54 2:39 4 1 5-Dec-75 1 :25 6:27 7 :32 3:08

total total/5

week l ines kbd. l in

244 206 1 297 267 2 1 62 1 37 3 401 335 4 427 350

total 1 572 1 345 total/5 3 1 4 269

Employee: M 1 helping with the d ebugging

4 1 5-Dec-75

Emgloyee: T 4

1 24-Nov-75 2 1 -Dec-75 3 8-Dec-75 4 1 5-Dec-75

total 1 45:44 5 (est. )

toh:ll/4

week l ines

1 407 2 6 1 3 3 355 4 1 022

total 2399 5 (est.) 0

total/4 599

M 0:00

· M 0:00 5:54 5:36 6:27

kbd. l in

391 641 522 874

2429

607

binary code: 1 0988 words

T w T 0:00 0:00 0:00

T w T 7:50 1 0:01 0:00 7 : 1 2 7 :2 1 7 :31 7:40 7 :33 7:34 7:24 1 1 :24 9:38

days/wk cor. l in

2 1 0 1 8 5 6 1 3 5 355 6.5 786

F s s total 0:00 0:00 0:00 3:39 0:44 1 :32 3:44 1 6: 1 0 0:00 0:00 3 :20 1 1 : 1 6 2 :07 0:00 2 :54 1 6:33 5:26 1 :05 3 :09 25:04 7 :03 0:00 0:07 25:44

98:26 1 9:41

F s s total 1 :43 5 :30 0 :00 7: 1 4

F s s total 0:00 0:00 0 :00 1 7:5 1 7 : 1 7 0:00 0 :00 35: 1 7 5:23 4:09 0:00 37:57 1 1 :59 7:4·5 0:00 54:39

32:00 36:26

tot.com net. com loads

29 23 22 95 68 58 83 58 66 90 39 65

297 1 88 2 1 1 80 50 60 74 47 52

131

132 APPENDIX E: SUMMARY OF THE M EASUREMENTS

E.4 Project Dl

EmQio��: M2 M T w

1 0-Nov-75 . 0:00 5:02 8:07 1 7-Nov-75 5 :39 6:02 4:5 1

1 24-Nov-75 6:4 1 5:30 6:55 2 1 -Dec-75 5:03 0:00 2:33 3 8-Dec-75 0:00 0:49 4:35 4 1 5-Dec-75 0:00 0:28 0:00

total total/5

week l ines kbd. l in

342 327 874 782

1 231 42 1 2 296 263 3 401 4 1 5 4 1 57 1 48

total 2304 2360 total/5 460 472

Emplo�ee : M2 helping with the debugging

M T w 3 8-Dec-75 0:00 0:00 1 : 1 8 4 1 5-Dec-75 6:47 3: 1 1 6:59

total total/2

EmpiQ�e�: T3 M T w

1 24-Nov-75 0:00 7 : 1 1 6:02 2 1 -Dec-75 7: 1 6 7:04 7:04 3 8-Dec-75 7: 1 3 7:36 7 :2 1

T F s s total 5:03 6:32 0:00 0:00 24:46 5: 1 9 7:28 0:00 0:00 29:22 0:00 0:00 0:00 0:00 1 9:06 0:00 1 :04 0:00 0:00 8:42 4:03 3: 1 6 0:00 0:00 1 2:44 0:4 1 0:33 0:00 0:00 1 :42

96:22 1 9:26

T F s s total 2 :47 2:46 0:00 3:32 1 0:26 1 : 1 1 7:23 4:08 0:00 29:42

40:08 20:04

T F s s total 0:00 0:00 0:00 0:00 1 3 : 1 4 5:40 6:57 0:00 0:00 34:03 6:35 6:40 0:00 0:00 35:27

4 1 5-Dec-75 6: 1 1 7 : 1 1 7:50 1 0:34 1 0: 1 6 8:58 0:00 5 1 :03 total 1 33:47 total/4 33:26 5 ( est.) 32:00

week l ines kbd. l in days/wk cor.l in tot. com net. com loads

1 268 257 2 670 1 4 1 0 4 2 884 7 1 6 5 884 37 24 1 2 3 387 445(83) 5 387 52(24) 4 1 ( 1 1 ) 36( 1 0) 4 927 659( 1 5 1 ) 6.5 7 1 3 94(78) 68(70) 38(43)

total 2467 2079(234) 1 97( 1 02 ) 1 43(8 1 ) 90(53) 5 (est. ) 0 80 50 60

total/4 6 1 6 5 1 9 49 35 22

(f igures in parenthesis show the contribution of M2 whi le helping with the debugging ( 4.8.3. 1 ) ) binary code: 8898 words

META ·PROGRAMMING: A SOFTWARE PRODUCTION M ETHOD

E.5 Project D control

Employee: P1

1 1 2-Jan-76 2 1 9-Jan-76 3 26-Jan-76 4 2-Feb-76 5 9-Feb-76 6 1 6-Feb-76 7 23-Feb-76 8 1 -Mar-76

total 1 29:03 total/7

week l ines

1 0 2 246 3 472 4 596 5 25 6 1 1 0 7 -55 8 -409

total 986 total/? 1 40

Employee: P2

1 1 2-Jan-76 2 1 9-Jan-76 3 26-Jan-76 4 2-Feb-76 5 9-Feb-76 6 16-Feb-76 7 23-Feb-76

total 209:34 total/7 29:56

week l ines

1 454 2 297 3 2 1 6 4 1 06 5 337 6 337 7 1 56 8 0

total 1 907 total/7 272

M T w T ? ? 3:36 5:00 0:00 0:01 6:32 6:29 3:44 5: 1 5 6:23 0:00 0: 1 0 0:00 5: 1 7 5 :38 0:00 0:00 0:00 0 :00 8:34 6:49 6:36 0:00 0:28 0:00 1 :2 1 2 :26 4:4 1 5:52 6:08 1 :28

kbd.l in days/wk cor. l in

1 1 4 0 1 7 1 4 308 507 4 590 863 5 596 1 7 0 0 1 85 3 1 83 1 5 1 4 -67 1 78 4 - 5 1 1 2088 298

M T w T 5 : 1 7 6 : 1 2 6:22 7 :36 6: 1 6 6:38 7:20 6:53 6:48 6:30 4:52 6:02 6:42 7:00 6: 1 3 5:47 6:03 6 : 1 7 6:33 5 :51 5:53 6:02 6:36 5:37 6:06 6:45 5:51 6:17

kbd. l in days/wk cor . l in

26 1 5 454 529 5 297 3 1 0 5 2 1 6 263 5 1 06 385 5 337 439 5 337 323 5 1 56 0 0 0 25 1 3 359

binary code: 6364 words, representing 2 1 34 l ines ( balance of source code was used for testing)

F s s total 0 :00 0 :00 0:00 8:36 4:53 1 :42 0 :00 1 9:38 3:37 0:00 0:00 1 9:00 4:4 1 9:53 5 :01 30:43 0:00 0:00 0:52 0:53 0:00 0:00 0:00 2 1 :59 4: 1 6 1 :3 1 0:00 1 0:04 0:00 0:00 0:00 1 8: 1 0

1 8:26

tot. com net. com loads

4 4 ? 0 0 ? 8 6 ? 69 40 ? 1 1 ? 25 20 ? 2 1 1 3 ? 32 22 ? 1 60 1 06 ? 2 2 1 5 ? ·

F s s total 5 :06 0:00 0 :00 30:35 4:07 0:00 0 :00 3 1 : 1 6 5:32 0:00 0:00 29:46 6:08 0:00 0:00 3 1 :5 1 3:50 . 0 :00 0:00 28:36 2:40 0:00 0:00 26:49 5:39 0:00 0:00 30:41

tot. com net. com loads

0 0 0 73 25 1 4 78 45 26 65 39 32 65 42 24 63 30 22 76 44 33 0 0 0 42 1 226 1 5 1 60 32 2 1

133

1 34

[A ron]

(ASCII]

REFERENCF..S

Aron, J. D., 1970 See (NAT02] page 52.

Proposed Revised American Standard . Code for Information Interchange, Communications of the ACM December, 1965

[ Baker!] Baker, F. Terry, Chief Programmer Team Management of Production Programm i ng, I BM Systems Journal, Vol. 1 1, No. 1, 1972

[ Baker2] Baker, F. Terry, System Qual i ty Through Structured Programming, 1972 Fall Joint Computer Conference

[ Balzer] Balzer, R., Automatic Programming, lsi Technical Review, January, 1973

[ Barry] Barry, Barbara S., et al. Structured Programming Series, Volume X, Chief Programmer Team Operations Description, National Techn ical Information Service RADC-TR-74-300 1975

[ Boehm] Boehm, Barry. W., The H igh Cost of Software. 1975 See [Horowitz]

[ Brandon] Brandon, Dick. H., The Economics of Computer Programming. 1970 See [Weinwurm]

[Brooks] Brooks, Frederick P. Jr., The Mythical Man-Month, Addison,.Wesley, 1975

[ Brown]

[Cw]

Brown, 1970 See [NAT02] page 53.

Computerworld, 1974 Aug 21 Raw ·count of I nstructions I Day May Reward Poor, Not Good Code

[ Dahl-Hoare] Dahl, Ole-Johan & Hoare, C. A. R ., H ierarchical Program Structures. Structured Programming, Academic Press, 1972

[ Dahi-Nygaard] Dahl, Ole-Johan & Nygaard, K., Simula - an Algol -Based Simulation Language, Communications of the ACM 9,9. September, 1966


[ Dennis-VanHorn] Denn is, Jack B. & Van Horn, Earl C., Programming Semantics for Multiprogrammed Computations, Commun ications of the AeM 9,3. March, 1966

[ Deutsch] Deutsch, L. Peter, An In teractive Program Verifier, Ph. D. dissertation, Department of Computer Science, University of Cal ifornia, Berkeley, June 1973

[ Deutsch-Lampson] Deutsch, L. Peter & Lampson, Butler W., An On-line Editor, Communications of the AeM 10,12. Dece,mber, 1967

[Dijkstra] Dijkstra, Edsger W., Notes on Structured Programming. Structured Programming, Academic Press, 1972

[ Drucker] Drucker, Peter F., Management: Tasks, Responsibi l i ties, Practices, Harper & Row, 1973

[ Engel bart] Engelhart, Douglas C.; Watson. Richard W. & Norton, James C., The Augmented Knowledge Workshop, In AFIPS Proceedings, Vol. 42, Nee, pp. 9-21 , 1973

[ Farber-Griswold-Polonsky] Farber, D. J.; Griswold , R. E. & Polonsky, I. P., Snobol, a String Manipulations Language, Journal of the AeM 11, 1 1964

[Floyd I] Floyd, Robert W:, Assigning Mean ings to Programs, i n Proc. Symp. Appl ied Mathematics, vol. XIX, Mathematical Aspects of Computer Science, American Mathematical Society, 1967 -

[ Fioyd2] Floyd, Robert W., Algorithm 245 TREESORT 3 [M1], Communications of the ACM 7,12. December, 1964

[Geschke] Geschke, Charles M., 1975 Private commun ication.

[Oesch ke-Mitchel l] Geschke, Charles M. & Mitchell , J., On the Problem of Uniform References to Data Structures, IEEE Transactions, SE-1, 2. June, 1975

[Hoare] Hoare, C. A. R .• Notes on Data Structuring. Structured Programming, Academic Press, 1972

[Hoare-Wirth] Hoare, C. A. R. & Wi rth, Ni klaus, A Contribution to the Development of Algol, Communications of the AcM 9,6. J une, 1966

135

136 REFERENCES

[Horowitz] E1 1 is Horowitz, (Ed.) Practical Strategies for Developing Large Software Systems, Addison-Wesley, 1775

[ Katz-Kahn] Katz, Dan iel & Kahn, Robert L., The Social Psychology of Organizations, Wiley, 1 965

[Kern ighan-Piauger] Kernighan, Brian W. & Plauger, P. J., The Elements of Programming Style, M cGraw-H il l , 1974

[Knuth ]

[Kosy]

K nuth, Donald E., The Art of Computer Programming, Vol. 1, Addison-Wesley, 1968

Kosy, Donald W., Air Force Command and Control I nformation Processing in the 1 980s:· Trends in Software Technology, USAF Project Rand, National Technical I nformation Service Ao-A017-128 1974

[ Lampson1] Lampson, Butler W., 1974 Private comm un ication.

[ Lampson2] Lampson, Butler W., An Open Operating System for a Single-user Machi ne. Revue Francaise d'Automatique, Informatique et Recherche Operationnel l e, n° sept. 1975, B-3

[ Lampson-Mitchell]

[ LRG]

Lampson, B. W, Mitchell J. G. & Satterthwaite E. H., On the Transfer of Control Between Contexts. Proceedi ngs, Colloque sur Ia Programmation, Ed. by B. Robinet, Springer-Verlag, 1974

Learni ng Research Group, Personal Dynamic Med ia. Xerox Palo Alto Research Center, 1975

[Mayer-Stalnaker] Mayer, David B. & Stalnaker, Ashford W., Selection and Evaluation of Computer Personnel. 1970 See [Weinwurm]

[ McClure] McClure, R.M., 1969 See [NAT02] page 88.

[McCracken] McCracken, Dan iel D., A G uide to COBOL Programming, John Wiley & Sons, 1963

[Metcalfe-Boggs] Metcalfe, Robert M. & Boggs, David R., Ethernet: Distributed Packet Switching for Local Computer Networks, Communications of the AcM 19,7. July, 1976


[Metzger] Metzger, Phi l i p W., Managing a Programming Project, Prentice-Hall, 1973

[Mil ls] Mi l ls, Harlan D., Chief Programmer Teams, Datamation, December, 1973

[Morris!] Morris, Thomas D., Commentary on the Effective Executive. Peter Drucker: Contributions to Business Enterprise, Ed. by T.H. Bonaparte, N Y University Press, 1970

[Morris2] Morris, James H. J r., Towards More Flexible Type Systems . . 1974 See [Lampson-Mitcheii-Satterthwai te]

[Morris3] Morris, James H. Jr., Types Are Not Sets, StGPLAN - SIGACT Symposium on the Principles of Programmi ng Languages, Boston, October 1973

[NATOl] Software Engineering, Report of Nato Science Committee, Ed. Peter Naur and Brian Randel l 1969

( N AT02] Software Engineering Techn iques, Report of Nato Science Commi ttee, Ed. J.N. Buxton and B. Randel l 1970

[ Naur1] Naur, Peter, Proof of Algorithms by General Snapshots, BIT 6,4 1966

[Naur2] Naur, Peter, Program Translation Viewed as a General Data Processing Problem, Commun ications of the ACM 9,3. March, 1966

[Naur3] Naur, Peter, Concise S urvey of Computer Methods, 1974 Petrocel l i Books

[Parnas1] Parnas, D. L., On the Criteria to be Used in Decomposi ng Systems into Modules, Communications of the ACM December, 1972

[ Parnas2] Parnas, D. L., The Influence of Software Structure on Rel iabi l i ty, Proceedi ngs of the International Conference on Rel iable Software, Los Angeles, April 1975. IEEE Cat. No. 75CH0940-7CSR

[Pietrasanta] Pietrasanta, Alfred M., Resource Analysis of Computer Program System Development. 1970 See [Weinwurm]

1 37

138

[ Reynolds] _

R EFERENCES

Reynolds, Carl. H., What's Wrong with Computer Programming Management? 1970 See [Weinwurm]

[Richards] R ichards, M., BCPL: A Tool for Compi ler writing and System Programmi ng, Proc. AFIPS Conf., 35, 1969, SJCC

[Royce] Royce, Winston. W., Software Requirements Analysis. 1975 See [H.orowitz]

[Sackman] Sackman, H ., Erikson, W. H. & Grant, E. E., Exploratory Experimental Studies Comparing Onl i ne and Offl ine Programming Performance, Communications of the AcM 1 1,1. January, 1968

[Teitel man] Tei telman, Warren, Interl isp Reference Manual. Xerox Palo Alto Research Center, 1975

[Vyssotsky] Vyssotsky, V ictor, Large-scale Rel iable Software: Recent Experience an Bel l Labs. 1975 See [Parnas2]

[Weinberg] Weinberg, Gerald M., 1971 The Psychology of Computer Programming, Van Nostrand

[Weinwurm] Weinwurm, George F., (Ed.) On the Management of Computer Programming, 1970 Auerbach

[Wirth1] W i rth, Ni klaus., Program Development by Stepwise Refinement, Communications of the ACM 14,4. Apri l , 1971

[Wirth2] Wirth, N iklaus, The Programming Language PASCAL, Acta Informatica, Volume 1, pp. 35-63 1971

[Wijngaarden] W ijngaarden, A. van (Ed.); Mail loux B. J.; Peck, J. E. L. & Koster, C. H . A., Report on the Algori thmic Language ALGOL 68, N umerische Mathematik, 14, 79-218 1969

139

INDEX

Page

area special i zation 22

capabi l i ties of h igh level languages 66

check procedure 50

contin uous process 7

cross meta-programm i ng 68

debugging strategy 49

debugging tactics 49

dictionary 24

ear.ly experiments group 7 7

elaboration 32

engineering phases of software production 7

error 46

error indication 46

feedback communications 29

global language 24

language creation 24

language learning 24

local language 24

large scale sharing 1 8

main experiments group 77

major qual ifier 40

meta-program 29

meta-programmer 27

m i nor qual ifier 40

operation 34

painted type 36

production phases of software production 7

proto-software 9

readable software 54

refin ing proto-software 9

140 INDEX

sharing of software 15

state vector syntax checker 50

subtask special ization 22

task order 30

techn icians 27

test bed 13

test print procedure 49

uncertainty absorption 5

units of production 7

unpainting 38

underlying type 36

user software 9

wheel network 27

writeable software 54

Documents

Meta-Programming - a Software Production Method by Charles Simonyi