Software Engineering Using C++


Department of Computing Science

Faculty of Computing & Engineering

Software Engineering

using C++

Lecture Notes

Prepared by Terry Chapman

September 1999





Table of Contents

ARRAYS
1. Introduction
2. Defining and referencing arrays
3. Array initialisation
4. Multi-dimensional arrays
5. Arrays as function arguments
6. Pointers and arrays
7. Character strings and variable pointers
8. Character string input/output
9. Arrays of pointers and pointers to pointers
10. Command line arguments
11. Initialising pointer arrays
12. Review
13. Summary
14. An array application - Stack of char

PROGRAM FILES
1. Introduction
2. The steps to produce an executable
3. Types, storage class and scope
4. Local duration
5. Declaration versus definition
6. Static duration
7. Storage class static
8. Static local variables
9. Static global variables
10. The C++ pre-processor
11. Conditional compilation
12. Conditional file inclusion

DATA STRUCTURES
1. Data Types
2. Abstract Data Types
3. Classification
4. Categories of Collection
5. Stacks
6. Abstract Data Type?
7. Queues
8. Lists
9. Structs
10. Unions

DYNAMIC DATA STRUCTURES
1. Structures
2. Comparison between structs and arrays
3. Storage Management
4. Dynamic Data Structures - Linked Lists
5. Other dynamic structures

SORTING
1. Introduction
2. Components of Sorting
3. Sorting Files
4. Why sort?
5. Does it pay to sort?
6. What is the best sort?
7. Sorting efficiency
8. Simple Array Sort - Exchange (Bubble)
9. Insertion Sort
10. Simple Sort performance
11. Conclusions
12. Complex sorts
13. QuickSort
14. Efficiency of Quicksort
15. C++ code for function Quicksort (see Wirth)
16. Comparison of complex sorting algorithms
17. Further Reading

TESTING
1. The context for testing - Verification and Validation
2. The objectives of testing
3. Testing & Debugging
4. Two different testing strategies
5. Categories of Testing
6. Test Planning
7. How much testing?
8. Test Data v Test Cases
9. Black box v White box testing
10. Black box testing
11. White box testing - Introduction
12. White box testing
13. Automated Testing

DATA STRUCTURE METRICS
1. Representing Abstract Structure
2. Implementing Data Structures
3. Metrics
4. Mathematical Notations

TREES
1. Applications
2. Implementation
3. Variations
4. Example Declaration
5. Expression Trees
6. Tree Traversal
7. Parse Trees
8. Binary Search Trees
9. Importance of Balance
10. Other types of tree

HASH TABLES
1. Applications
2. Operations
3. Efficiency
4. Problem
5. Hashing
6. Collision Resolution
7. Hash Table example
8. Perfect Hashing Functions

LIBRARIES
1. The ctype library
2. The maths library
3. The standard library

BIBLIOGRAPHY



Basic C++


1. A First C++ Program

// first.cpp
// My first C++ program
// A. Student
// 27/09/99

#include <iostream>
using namespace std;   // lets cout and endl be written without the std:: prefix

int main( void )
{
    cout << "Hello World" << endl;
    return 0;
}

The lines starting with // are comments. These are for human consumption - the compiler ignores them. The symbol // causes all text to its right on the current line to be treated as a comment. An alternative form of comment is the pair:-

/* this is a comment */

These do not need repeating on every line, so a number of lines can be enclosed within one pair.

Since the program is going to display output, it is necessary to make available the input/output library iostream. This is done by issuing a compiler directive, #include, that the text of the header file iostream should be included in the compilation (older compilers name this file iostream.h). The compiler knows where to find this file. The word cout represents the standard output stream, and the symbol << causes what follows it to be placed on that stream. By default, the standard output stream is displayed on the terminal.

Every C++ program must have one, and only one, function main. This is where program execution always commences. This, like all other functions, has a return type (in this case int), an argument list (in this case empty, indicated by void) and a body that is delimited by open and close braces { }.

The first line of function main outputs the message "Hello World" to the terminal followed by a new line. The program then terminates, returning the value 0 to the operating system. By convention, a return value of 0 indicates success. This program deals with only two values - a constant string [1] literal containing the words "Hello World" and an integer constant 0. It does not require the use of any variables. Most programs require the use of variables, i.e. storage locations in memory that contain values during program execution. Variables may be of different types.

2. Data Types

There are a number of basic data types built in to all programming languages. A data type

consists of a name and a specification of :-

• the range of values that a variable of that type can hold - its domain. This range is often limited by the amount of storage that is used by such items.

• the operations that may be carried out on values of that type

In C++, the most common data type is int - whole numbers that may be positive or negative (integers). The amount of storage allocated to variables of type int is often 2 bytes and sometimes 4 bytes depending on the compiler. This allows a range of values from

• -32768 to 32767 in the case of 2 bytes and

• -2,147,483,648 to 2,147,483,647 where 4 bytes are employed.

These peculiar ranges arise from use of the binary system: n bits can represent 2^n distinct values, half of which are used for the negative numbers.

[1] A sequence of characters

The fundamental native [2] data types and their storage size in GNU C++ are:-

type            Range of values                                          Bytes
char            Character codes 0 to 127                                 1
unsigned char   Unsigned character codes 0 to 255                        1
short int       Signed integer -32768 to 32767                           2
int             Signed integer -2,147,483,648 to 2,147,483,647           4
unsigned int    Unsigned integer 0 to 4,294,967,295                      4
long int        Signed integer -2,147,483,648 to 2,147,483,647           4
float           Floating point, magnitude 1.17549e-38 to 3.40282e+38     4
double          Floating point, magnitude 2.22507e-308 to 1.79769e+308   8

Note that, unlike some compilers, GNU C++ uses 4 bytes for type int, thus providing the same range of values as type long int (or just long). Unsigned integers can hold positive values twice as large as signed integers of the same size because no bit is needed to store the sign.
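The link between storage size and range can be checked directly. The sketch below is illustrative only: the function names signedMax, signedMin, unsignedMax and checkAgainstCompiler are invented for this example, and it assumes the usual two's complement representation with no unused bits in an int.

```cpp
#include <cassert>
#include <climits>

// Largest value of a signed integer stored in the given number of bits.
// Half of the bit patterns are reserved for negative values.
// (Valid for 1 <= bits <= 63 in this sketch.)
long long signedMax(unsigned bits) { return (1LL << (bits - 1)) - 1; }

// Smallest (most negative) value of a signed integer of that size.
long long signedMin(unsigned bits) { return -(1LL << (bits - 1)); }

// Largest value of an unsigned integer of that size.
unsigned long long unsignedMax(unsigned bits) { return (1ULL << bits) - 1; }

// Compare the formulas with the limits reported by the compiler in use.
// Assumption: two's complement, and an int narrower than 64 bits.
void checkAgainstCompiler()
{
    const unsigned intBits = CHAR_BIT * sizeof(int);
    assert(signedMax(intBits) == INT_MAX);
    assert(signedMin(intBits) == INT_MIN);
    assert(unsignedMax(intBits) == UINT_MAX);
}
```

For 16 bits the functions give -32768, 32767 and 65535; for 32 bits they give -2,147,483,648, 2,147,483,647 and 4,294,967,295.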


3. String Constants

A string constant is a sequence of characters enclosed in double quotes, e.g. "MSc Information Technology". The sequence may be empty, e.g. "".

If the string is to include certain characters, e.g. double quotes and the backslash, then these must be escaped with the '\' backslash character, e.g.

"She said \"I have lost my file mydir\\myprog.cpp\""

When output, this would display:

She said "I have lost my file mydir\myprog.cpp"
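The effect of the escapes can be seen by storing the constant and measuring it. This is a small sketch rather than part of the original notes; the names message and showMessage are invented for the illustration.

```cpp
#include <cstring>
#include <iostream>

// The string constant from the text. \" stands for one double quote
// and \\ for one backslash.
const char message[] = "She said \"I have lost my file mydir\\myprog.cpp\"";

// Display the message, then its length in characters.
void showMessage()
{
    std::cout << message << std::endl;
    std::cout << std::strlen(message) << " characters" << std::endl;
}
```

Each escape sequence counts as a single character, so the length reported is 47 rather than the 50 characters typed between the outer quotes.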

Other special characters may be included, e.g.

\n  newline     \?  question mark
\t  tab         \'  single quote
\f  formfeed    \a  alarm bell

A string constant can extend over 2 or more lines by placing a backslash at the end of an uncompleted line.

Two adjacent strings are concatenated to form a single string, e.g.

"This string " "is concatenated with this one"

There is no native data type string in C++. Instead, strings are implemented as an array [3] of characters terminated by the special character '\0' (ASCII NUL), i.e. the unprintable character which has the ASCII code 0. We will cover arrays later - they are a very important compound data type holding a sequence of data items in a contiguous area of memory. The string "Hello", for example, occupies six array elements:

index       0    1    2    3    4    5
character   H    e    l    l    o    \0

Strings and characters are not the same. A string containing only a single character, e.g. "W", actually occupies 2 bytes of storage: one for the 'W' and one for the ASCII NUL. A character variable can hold only one single character, e.g. 'W', normally occupying only one byte.

To declare a string variable and give it a value immediately:-

char myname[] = "Terry Chapman";

If the string is not intended to be changed, it should be declared as a constant:-

const char myname[] = "Terry Chapman";

The empty brackets signify an array whose size is determined automatically by the compiler, which also reserves space for the terminating ASCII NUL. The variable or constant can be output in the usual way, i.e.

cout << myname;

[2] Types built into the language
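The storage figures quoted above can be confirmed with sizeof, which gives the number of bytes an item occupies, and the library function strlen, which counts the characters before the terminating NUL. The sketch below is only an illustration; the names asString, asChar, joined and checkSizes are invented for it.

```cpp
#include <cassert>
#include <cstring>

const char asString[] = "W";   // array of 2 chars: 'W' and '\0'
const char asChar = 'W';       // a single char

// Adjacent string constants are joined by the compiler into one.
const char joined[] = "This string " "is concatenated with this one";

const char myname[] = "Terry Chapman";

// Each assert states a fact about the declarations above and stops
// the program if that fact does not hold.
void checkSizes()
{
    assert(sizeof(asString) == 2);
    assert(sizeof(asChar) == 1);
    assert(std::strlen(myname) == 13);   // the visible characters
    assert(sizeof(myname) == 14);        // plus the terminating NUL
    assert(myname[13] == '\0');
    assert(std::strcmp(joined, "This string is concatenated with this one") == 0);
}
```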

4. Variables and Constants

Variables are names associated with a value. In programming, names are referred to as

identifiers. During program execution, the value associated with an identifier may be

changed many times. In C++ the compiler must know the type of the identifier because

this determines the amount of storage that must be allocated for its value. For this reason,

every variable declaration must have a type. In addition, the variable may be initialised

with a value:-

Examples:-

int sum;                          // variable named sum of type int
int size = 37;                    // initialised on declaration
int sum, total = 0;               // 2 integers, only total is initialised to 0
float average = 0.0;              // Initialisation must be of the appropriate type
char ch;                          // Uninitialised declaration
char ch = ' ';                    // literal space surrounded by single quotes
char progname[] = "myprog.cpp";   // strings by double quotes

(These are independent examples; within a single scope each name, such as sum or ch, may be declared only once.)

[3] A contiguous sequence of memory locations

Identifiers must start with a letter or an underscore. After this, they may contain any number of letters, digits or underscore characters. They must not include spaces.

int this_is_a_very_long_identifier_with_99 = 99; // valid
float The Average;                               // invalid - contains a space
char 2good;                                      // invalid, starts with digit

You must use meaningful identifiers. They are part of the program's documentation and should be expressive of the purpose for which the identifier is required. An exception to this is loop control variables that have no other purpose than to access elements of an array. These are commonly a single character, e.g. i, j.

Constants are named items that cannot change. These are used for values in the program

that will remain constant throughout the program’s execution. They must be initialised

with a value.

Examples:-

const double pi = 3.14159265359;

const int numitems = 350;

5. Arithmetic Operators

+ unary plus or addition

- unary minus or subtraction

* multiplication

/ division

% modulus

Note that there is no exponentiation operator that raises a number to a power. There are library routines that accomplish this (e.g. pow in the maths library).

The above operators apply to all numeric types (except %). Modulus produces the remainder after integer division and applies only to integral types:-

5 % 2 = 1, 11 % 3 = 2, 19 % 5 = 4.

You should find a table of operator precedence in your textbook. 2 + 3 * 4 means "add 2 to the product of 3 and 4". If you want it to mean "add 2 to 3 and then multiply by 4" you must change the precedence with parentheses: (2 + 3) * 4.

A combination of arithmetic operators and arithmetic constants or variables is known as an arithmetic expression. An expression has a value, thus 10 * 3 has the value 30.

A statement on the other hand is a command to carry out processing, e.g.

x = 10 * 3;

is a statement that means assign to the variable x the value of the expression 10 * 3.
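The precedence and modulus examples can be tried out as follows. This is a minimal sketch; the function names are invented for the illustration, and pow from the maths library is used as one example of a library routine for exponentiation.

```cpp
#include <cassert>
#include <cmath>

// Multiplication is carried out before addition ...
int withoutParentheses() { return 2 + 3 * 4; }

// ... unless parentheses change the order of evaluation.
int withParentheses() { return (2 + 3) * 4; }

// Exponentiation is a library call, not an operator.
double power(double base, double exponent) { return std::pow(base, exponent); }

void checkArithmetic()
{
    assert(withoutParentheses() == 14);
    assert(withParentheses() == 20);

    // Modulus: the remainder after integer division.
    assert(5 % 2 == 1);
    assert(11 % 3 == 2);
    assert(19 % 5 == 4);

    // Floating point results are compared within a small tolerance.
    assert(std::fabs(power(2.0, 10.0) - 1024.0) < 1e-9);
}
```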

You might have rationalised the difference between a statement and an expression by thinking to yourself that an expression has a value whereas a statement does not. You would be correct if you were talking about most conventional programming languages like Pascal, Modula-2 and BASIC. But you would be wrong if you were talking about C and C++ since, in these languages, an assignment is itself an expression and so also has a value - in the above example, the assignment x = 10 * 3 has the value 30. This value can be used for further operations, e.g. for assignment to another variable:-

y = x = 10 * 3; // both x and y now have the value 30
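The value of an assignment can be observed in a short test. The following is a sketch only; the function name checkAssignmentValue is invented, and the second case (an assignment used inside a sum) is an extra illustration that does not appear in the notes.

```cpp
#include <cassert>

void checkAssignmentValue()
{
    int x = 0, y = 0;

    y = x = 10 * 3;      // x receives 30; that value is then assigned to y
    assert(x == 30 && y == 30);

    y = (x = 5) + 1;     // the value of the assignment (5) is used in a sum
    assert(x == 5 && y == 6);
}
```

The second form is legal but harder to read, so it is usually better kept for simple cases such as the chained assignment.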




6. Type conversions

There are two aspects:-

• Automatic conversions carried out by the compiler

These are discussed below (section 7)

• Type conversion operators

These use the name of a type as a function in order to force an expression into a particular type, e.g. int(99.21) will yield 99.
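A common use of a type conversion operator is to force floating point division of two whole numbers. The sketch below illustrates this; the function name checkConversions is invented, and static_cast is an alternative notation from standard C++ that is not used elsewhere in these notes.

```cpp
#include <cassert>

void checkConversions()
{
    assert(int(99.21) == 99);                // the fractional part is discarded

    assert(7 / 2 == 3);                      // both operands int: integer division
    assert(double(7) / 2 == 3.5);            // one operand converted: result is 3.5

    assert(static_cast<int>(99.21) == 99);   // the same conversion, cast notation
}
```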

7. Assignment operator

C++ carries out automatic type conversion so that the result of an expression on the right hand side of the assignment symbol is automatically converted (if possible) into the type of the variable on the left hand side. This is convenient in many ways, but there are occasions when you need to know what the exact effect is. It is like letting a futuristic washing machine automatically decide what program to use according to the clothes you put in. What program does the machine decide to use when you wash a silk shirt and a very dirty towel? Do you get a grubby towel or a ruined silk shirt? Ultimately you will need to know what the conversion rules are, but do not worry about them at present. In any case it is desirable not to make a habit of mixing your washing since you may get a result you did not expect.

Briefly, fractional values (types float and double) are truncated when assigned to integral variables (int, unsigned int, long int). Large values that exceed the capacity of the integral variable to which they are assigned will cause overflow and the result will be meaningless. No overflow warning is issued and care should be taken when writing expressions with integral values to ensure that overflow does not occur.
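Truncation on assignment can be demonstrated as below. This is a sketch with invented names (toInt, checkTruncation); it deliberately uses only values that fit in an int, since the result for values that do not fit is not meaningful.

```cpp
#include <cassert>

// Assign a floating point value to an int, relying on the automatic
// conversion. The value is assumed to be within the range of int.
int toInt(double value)
{
    int whole = value;   // the fractional part is discarded, not rounded
    return whole;
}

void checkTruncation()
{
    assert(toInt(3.99) == 3);     // not rounded up to 4
    assert(toInt(-3.99) == -3);   // truncation is towards zero
    assert(toInt(42.0) == 42);
}
```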

8. The compound assignment operators

These are all very intuitive and make life easier by reducing the typing.

count += 2        increment the value of count by 2
stock -= 1        decrement the value of stock by 1
divisor /= 10.0   assign to the float divisor the result of dividing its current value by 10
power *= 9        assign to power the result of multiplying its previous value by 9
remainder %= 2    assign to remainder the result of taking its current value modulus 2

Note that the last may only be used with integers; all others may be used with any arithmetic type. Note also the effect of sum /= 3 + 7. The expression 3 + 7 is evaluated first, so sum is divided by 10.
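The table can be checked with some sample starting values. These values, and the function name checkCompoundAssignment, are chosen purely for illustration.

```cpp
#include <cassert>

void checkCompoundAssignment()
{
    int count = 5;
    count += 2;
    assert(count == 7);

    int stock = 10;
    stock -= 1;
    assert(stock == 9);

    float divisor = 25.0;
    divisor /= 10.0;
    assert(divisor == 2.5);

    int power = 3;
    power *= 9;
    assert(power == 27);

    int remainder = 7;
    remainder %= 2;
    assert(remainder == 1);

    double sum = 50.0;
    sum /= 3 + 7;          // 3 + 7 is evaluated first: sum = sum / 10
    assert(sum == 5.0);
}
```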

9. The increment & decrement operators

y = x++ assign the old value of x to y and then increment x (postincrement)

y = ++x increment x and assign the new value to y (preincrement)

similarly with --

We will return to these when we look at processing arrays using loops.
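The difference between the two forms shows up only in the value that is assigned. A minimal sketch, with an invented function name and starting value:

```cpp
#include <cassert>

void checkIncrementDecrement()
{
    int x = 5, y;

    y = x++;                    // postincrement: y gets the old value
    assert(y == 5 && x == 6);

    y = ++x;                    // preincrement: y gets the new value
    assert(y == 7 && x == 7);

    y = x--;                    // postdecrement
    assert(y == 7 && x == 6);

    y = --x;                    // predecrement
    assert(y == 5 && x == 5);
}
```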




10. Iostream library

Input and output in C++ is based on streams. A stream is an abstract concept that you do not need to worry about. Just think of the natural phenomenon. Whenever a C++ program executes, three streams are opened automatically - standard input, standard output and standard error. Normally, standard input is expected to come from the keyboard and standard output is sent to the display. However, standard input and standard output can be redirected from the DOS command line using the < and > characters when the program is executed. Standard error cannot be redirected in this way.

11. Command line redirection

If you wish to capture the output of a program in a file, simply redirect its output as

follows

myprog > myprog.out

Similarly you can substitute a file to be the input to a program instead of the keyboard.

myprog < myinput

To redirect both input and output use something like

myprog < myinput > myprog.out

12. Streams

Access to the istream (input stream) and ostream (output stream) operators is obtained by putting the preprocessor directive #include<iostream> at the top of each program file that needs to carry out standard input and/or standard output. This has the effect of including the header file iostream (a text file, named iostream.h on older compilers) in the compilation.

12.1 Unformatted input and output

cin and cout are the predefined standard input and output streams defined in the above header file (there is a third, cerr):-

cin >> x >> y >> z;
    obtains from the keyboard values for 3 variables. Spaces or tabs may separate the actual inputs.

cout << "A message : " << message << endl;
    where message is a string constant or variable.

<< and >>
    are known as the insertion and extraction operators. The unusual notation arises from the object-oriented aspects of the language. Just take it for granted at present.

endl
    causes subsequent output to be displayed on the next line of the display.

cin.get(ch)
    gets a single character from standard input and returns the state of the standard input stream.

cout.put(ch)
    puts a single character to standard output and returns the state of the standard output stream.

cout.good(), cin.good()
    return true if no error or end of input has been recorded by the output (input) operations so far.

cout.bad(), cin.bad()
    return true if the stream has suffered an unrecoverable error. This is not quite the opposite of good(), which also returns false at end of input or after an input that could not be converted.

cin.eof()
    returns true if end of input has been encountered, false otherwise. When entering from the keyboard, end of input is indicated with Ctrl-Z (Ctrl-D on Unix systems).

All of the fundamental types supported by C++ (including strings) may be input using cin >> and output using cout <<.
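As an illustration of get() and put(), the sketch below copies its input to its output one character at a time. The function names copyChars and checkCopy are invented for this example, and an in-memory string stream (from the header sstream, not covered in these notes) stands in for the keyboard and display so that the result can be checked.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Copy every character from in to out and return the number copied.
// get() returns the state of the stream, so the loop ends when a
// character can no longer be obtained, normally at end of input.
long copyChars(std::istream &in, std::ostream &out)
{
    long copied = 0;
    char ch;
    while (in.get(ch)) {
        out.put(ch);
        ++copied;
    }
    return copied;
}

// Exercise copyChars on a string instead of the keyboard.
void checkCopy()
{
    std::istringstream in("a b\nc");
    std::ostringstream out;
    assert(copyChars(in, out) == 5);
    assert(out.str() == "a b\nc");   // spaces and newlines are copied too
    assert(in.eof());                // the loop stopped at end of input
}
```

In a complete program the call would be copyChars(cin, cout), and redirecting input and output on the command line would then make a simple file copy.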




13. Output manipulators

As their name implies, these allow formatting of the output stream for such things as the

field width, justification, decimal precision etc. They are normally included within the

output statement - see examples below and Skansholm pp 365-369. Use of these

manipulators requires that the header file iomanip be included in the program:-

#include<iomanip>

setw(n)
    sets the field width to n characters for the next output, e.g.

    cout << "22 right adjusted in field width of 4 is [" << setw(4) << 22 << "]";

    produces

    22 right adjusted in field width of 4 is [  22]

    setw must be repeated for each subsequent output for which a field width is required. In the absence of setw() the field width is the actual width of the output.

setfill(char)
    specifies the character that is to be used for padding output that is narrower than the field width, e.g.

    cout << '[' << setw(4) << setfill('*') << 22 << ']';

    produces [**22]

setprecision(n)
    changes the precision for the display of types float and double (the default is 6 digits). Normally it determines the number of significant digits displayed, but if the fixed flag (see below) has been set, then it controls the number of decimal places displayed.

setiosflags(...) and setf()
    change flags that control such things as justification, number format etc.

    setiosflags( ios::showpoint ) forces the decimal point and any trailing zeros to be displayed, even for whole numbers.

    setiosflags( ios::fixed ) selects fixed point notation. After the fixed flag has been set, the effect of setprecision is to control the number of decimal places displayed.

    setiosflags( ios::left ) and setiosflags( ios::right ) determine the justification of the output, which will remain unchanged until the flag is modified by another call.

    setf() is a member function of the stream classes and does the same job as setiosflags except that it cannot be used within an output statement as setiosflags can. It would be called by e.g. cout.setf(ios::right);

The items starting with ios:: within the parentheses after setiosflags are constants that are

defined in the iostream library. Their names are self-explanatory and you do not need to

know their values. The meaning of ios:: will only be explained in a subsequent module unless you read up on it yourself.

A program basiccpp.cpp is provided in the lab that shows the effect of setw(n) and some

of the flags that can be set using setiosflags(), including display of integers in octal and

hexadecimal.
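The manipulators described in this section can be tried out with a short sketch such as the one below. The function name format_price is hypothetical, and the output is sent to a string stream instead of cout only so that the result can be inspected. It assumes a standard-conforming compiler, where ios::fixed makes setprecision count decimal places.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: formats value in a field of the given width,
// padded on the left with '*' and showing two decimal places.
string format_price(double value, int width)
{
    ostringstream out;
    out << setiosflags(ios::fixed)   // precision now counts decimal places
        << setprecision(2)
        << setfill('*')              // padding character
        << setw(width)               // applies to the next item only
        << value;
    return out.str();
}
```

A call such as format_price(3.5, 8) yields ****3.50. If the field width is narrower than the text, nothing is cut off: the item is simply output at its actual width.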




14. Relational operators and expressions

14.1 Relational operators

< less than

> greater than

<= less than or equal

>= greater than or equal

== equal (Note: 2 equal signs with no space between)

!= not equal

14.2 Relational expressions

These expressions compare two values and return true if the test succeeds and false

otherwise.

ch > 'A'

y * y <= 2 * y + 1

f < 0.0

y == x

ch != '\0'

Take care not to use = as the equality operator. This is a common programming

error.

Beware of testing two floating point variables for equality. Their binary internal

representation means that many fractional values cannot be expressed exactly.

Instead, test whether the absolute value of their difference is smaller than some small tolerance. The function

fabs(<float>) can be used to find the absolute value of a float or double. To use it you need to #include<math.h>

double f1 = 12.34574, f2 = 12.34578;

const double delta = 0.00005;

if ( fabs( f1 - f2 ) < delta )

… // consider them equal

else

… // consider them unequal

See also Skansholm p52.
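The test above can be wrapped in a small function so that the tolerance is stated at the point of the call. This is only a sketch: the name nearly_equal is hypothetical, and the header cmath is used here as the standard C++ spelling of math.h.

```cpp
#include <cmath>

// Returns true if a and b differ by less than delta.
// The caller chooses a delta suited to the size of the values compared.
bool nearly_equal(double a, double b, double delta)
{
    return std::fabs(a - b) < delta;
}
```

With this function, nearly_equal(12.34574, 12.34578, 0.00005) is true, whereas the direct test 12.34574 == 12.34578 is false.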

15. FALSE and TRUE

Recent C++ compilers support a bool data type that can take one of two possible values -

true or false. Earlier compilers do not support this type and, instead, false is represented by

an integer with the value 0 and true by any non-zero value. GNU C++ provides the bool

data type and we shall be using it on this course. If you are using a compiler at home that

does not support bool, there is a simple addition that you can make to your programs that

gets around this deficiency. Enter the following into a file called bool.h and #include this

in all programs if you are not using GNU C++:-

typedef int bool;

const bool false = 0;

const bool true = !false;

However, I strongly recommend that you do use the GNU compiler. Several students have

had problems when using the Borland 4.5 compiler.




16. Logical operators and expressions

16.1 Logical operators.

The draft C++ ANSI standard introduced the new operator keywords and, or and not (written in lower case).

These are not supported by the GNU C++ compiler, nor by Borland 4.5. Instead use

&&, || and !

&&    (keyword and)    logical AND
||    (keyword or)     logical OR
!     (keyword not)    unary negation

16.2 Logical expressions

!5                          false
!0                          true
ch = 'a';                   assigns to ch the letter 'a', i.e. NOT the test for equality
ch == '\0'                  false
!(ch == '\0')               true
(ch)                        true (the character 'a' is converted to an integer and is tested for non-zero)
(!ch)                       false
(ch && ch != '\n')          true
(ch == 'a' || ch == 'A')    true

17. Short-circuit evaluation

Note that, in the logical expression expression1 && expression2 both expressions must be

true for the whole expression to be true. If expression1 yields false, then the whole

expression cannot possibly be true. Therefore expression2 does not need to be and will not

be evaluated.

Similarly with expression1 || expression2, if the expression1 yields true, then the whole

expression must be true, whatever the value of expression2, so expression2 is not

evaluated.

This feature is important in cases where, if the first test fails, the second test must not be

evaluated because it would cause an error. We will meet this again when we come to look

at pointers.
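A case that does not need pointers is division, where the second test would fail with a zero divisor. The sketch below uses a hypothetical function average_exceeds to show the guard placed first.

```cpp
// Returns true if total / count is greater than limit.
// The test count != 0 comes first, so the division on the right of &&
// is never attempted when count is zero.
bool average_exceeds(int total, int count, int limit)
{
    return count != 0 && total / count > limit;
}
```

Reversing the two operands of && would attempt the division before the guard is checked, so the order matters.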




18. The while statement

This is one of several iteration constructs provided by C++, and is

the simplest.

while ( logical expression == true )

<statement>

The parentheses () are required. If there is more than one

statement to be executed within the loop then braces { } are

required:-

while (logical expression == true)

{

statement1;

statement2;

etc..

}

Example

// show.cpp
// copies its input to its output
#include <iostream>
using namespace std;

int main(void)
{
    char ch;
    cin.get(ch);            // get a character from the keyboard
    while ( cin.good() )    // becomes false at end of file or on any other input problem
    {
        cout.put(ch);       // output the character to the display
        cin.get(ch);        // get the next char in preparation for the next loop iteration
    }
    return(0);
}

The while statement is preceded by a statement cin.get(ch) that sets up the value to be

tested by while. This is important because the termination condition may already exist in

which case the loop should not be entered. If the loop is entered, then cin.get(ch) is

repeated at the bottom of the loop to set up the condition again. This is invariably the way

that files are processed since they may be empty. It is a common error to forget to initialise

the test condition before entering the while loop.

This program can be used to display the contents of a text file if issued at the DOS

command line using redirection:-

show < show.cpp displays the source program file show.cpp at the terminal

The output can also be redirected, giving a file copy

show < show.cpp > showcpy.txt

showcpy.txt is now an exact copy of show.cpp

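[Figure: flowchart of the while statement. Set up condition; test condition; if true, execute the statement(s), set up the condition again and repeat the test; if false, continue with the next statement.]

Here is a refinement of the above program:-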

// show2.cpp
// copies its input to its output
#include <iostream>
using namespace std;

int main(void)
{
    char ch;
    while ( cin.get(ch) )   // becomes false at end of file or on any other input problem
        cout.put(ch);       // output the character to the display
    return(0);
}

The get( ch ) function is called within the loop condition parentheses. The expression

cin.get(ch) does two things: a) it gets a character from standard input and passes it back

via its argument ch and b) it returns a reference to the standard input stream cin as its

function result. When it is tested as a condition, the stream yields false (0) once there is no further input, and this is the

condition being tested by while. This does away with the need for the get prior to entry of

the loop, and also with the get at the bottom of the loop.
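The same loop can be written against any input and output stream, which makes it easy to try out without redirection. The sketch below is an illustration only: the function name copy_stream is hypothetical, and it passes the streams as reference arguments, a feature that is covered later in the course.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Copies every character from in to out and returns the number copied.
// The loop ends when get() fails, e.g. at end of input, so an empty
// input stream means that the loop body is never entered.
int copy_stream(istream& in, ostream& out)
{
    int count = 0;
    char ch;
    while (in.get(ch))
    {
        out.put(ch);
        count++;
    }
    return count;
}
```

Calling copy_stream(cin, cout) from main reproduces the behaviour of show2.cpp.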

19. The if statement

This statement is classified as a branching construct. It allows the flow of control of the

program to be changed depending on the value tested, e.g. an input from the user or a

value held in a file.

if (condition)

statement;

Or

if (condition)
    statement1;
else
    statement2;

As with while, if there is more than one statement in either the if

part or the else part then they must be surrounded by braces {} as

in the body of function main. Notice that, unlike Pascal, there must

be a semi-colon after each statement.

Condition is any logical expression yielding a boolean value (true

or false). It may consist of expressions combined into a larger,

more complex expression by the logical operators && (logical

AND) and || (logical OR). Statements within each part (if or else) may be any statement, including another if statement.

if (condition1)
    if (condition2)
        statement2a;
    else
        statement2b;
else
    statement1;

The else clause is assumed to relate to the immediately preceding if unless braces are used

to change this association.
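As a small illustration of an if statement nested in an else part, the sketch below classifies an integer by its sign. The function name sign is hypothetical.

```cpp
// Returns 1 if n is positive, -1 if n is negative and 0 if n is zero.
int sign(int n)
{
    if (n > 0)
        return 1;
    else
        if (n < 0)          // this if statement is the else part of the first
            return -1;
        else                // relates to the immediately preceding if (n < 0)
            return 0;
}
```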

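[Figure: flowchart of the if statement. The condition is tested; if true (non-zero) one set of statement(s) is executed, if false (0) the other; control then passes to the next statement. The statement(s) may also be if statements.]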

20. Style for logical expressions

In natural language we can say “If late for lecture then hurry else have another coffee”. We

do not say “If late for lecture is true then … “.

Similarly, in programming, the test of a logical value e.g. in an if statement would be

written as

if( late_for_lecture )

hurry();

else

have_another_coffee();

and not

if( late_for_lecture == true )

hurry();

…

It is generally considered to be poor programming style to use this second approach and

you will lose marks if you use it.

21. The ctype library

This is a 'C' library of functions that operate on characters. They include functions to test

whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion.

See Libraries on page 115.
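A minimal sketch of the library in use follows. The function name upper_if_letter is hypothetical; isalpha and toupper are two of the library's functions, declared in the header cctype (ctype.h in 'C').

```cpp
#include <cctype>

// Returns ch in upper case if it is a letter, otherwise ch unchanged.
char upper_if_letter(char ch)
{
    unsigned char c = ch;       // the ctype functions expect a non-negative value
    if (std::isalpha(c))
        return static_cast<char>(std::toupper(c));
    return ch;
}
```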




Functions

1. Introduction

You have already seen and used a function - the function main which every C++ program must have. Until now it has been reasonable to write all of the code of your programs in this function. However, as programs become larger, it is necessary to break them down into collections of smaller and more manageable units. One such subdivision is the function. Functions give us the ability to store a computation in a named block of code and to carry out the computation simply by referring to its name i.e. by calling the function.

This facility for breaking programs down into simpler and more manageable units is a major weapon in the fight to reduce the complexity of large programs and involves the process of abstraction. Abstraction allows us to concentrate on the current task and to ignore details that are not relevant. So when we call a function e.g. sqrt to find the square root of a number, we are concerned only with how to make the call and not what steps the function takes to achieve the computation. We do need to know the data type of the number to be passed to sqrt, the data type of the value returned by it and what happens if we pass a negative value etc. - these aspects are relevant to our making the call, but the actual details of the computation are not relevant.

Of course, at different times we will have different levels and views of abstraction - if we had been concerned with writing function sqrt then we would have been concentrating our attention on expressing the algorithm to compute the square root of a number and would have ignored unnecessary detail elsewhere (e.g. the other functions which make up the maths library). A further advantage of storing code in functions is, of course, the ability to re-use them in other programs.

This type of abstraction is called procedural abstraction after the procedures - the name that most other languages use to refer to these named blocks of code. Technically a function differs from a procedure in that it returns a value, whereas a procedure does not. C++ does not have procedures, but it is possible to specify that a function does not return a value. Functions in C, C++ and most other languages (except the functional languages) do not conform closely to the mathematical concept of a function that accepts a single argument and returns a single value. As we shall see, it is possible to pass more than one value to a function and to get back more than one result.

The structure of a function is:-

type_specifier function_name(argument_list)

{

definition_and_statement_list

}

type_specifier The data type of the value that is returned by the function

function_name A programmer-defined identifier that conforms to the rules for

identifiers. This is the name that is used to call the function.

formal argument_list The names and types of the values that are passed to the

function on which it is to carry out some computation.

definition_and_statement_list Exactly what you have been writing in function main up until now, i.e. constant and variable definitions and statements including (normally) a return statement that provides the value returned back to the point of the call, e.g. the return(0) appearing at the bottom of main.
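To relate the parts named above to real code, here is a minimal sketch. The function cube is hypothetical and is chosen only because each part of the structure can be pointed out in it.

```cpp
// type_specifier:  int   (the type of the value returned)
// function_name:   cube
// argument_list:   one formal argument, n, of type int
int cube(int n)
{
    int result = n * n * n;   // a variable definition
    return result;            // the value passed back to the point of the call
}
```

A call such as cube(3) is an expression with the value 27.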






2. Input and output in functions

In general, it is considered good practice to isolate input and output statements in one

particular area of a program. This is because I/O tends to be hardware-specific and it is

easier to make changes for a different machine platform or display device if all the I/O

code is in one place. When writing small programs in a learning situation, it is not always

easy to follow this guide for best practice. But, wherever possible, try to confine I/O to one

or more suitable functions rather than spreading it across the program in a number of

functions whose primary purpose is not I/O.

In particular, it is not good practice to carry out I/O in low level functions. The reason for

this is that a function that may be re-used many times in many different programs cannot

know how the calling program wishes its output to be displayed, whereas the calling

program does know this. Different operating environments have different ways of

displaying output to the user of the program, so a low level routine that displays output for

a character console could not be used in a program that runs in a windowing environment.

3. Multi-function programs

There must always be a function called main in any C++ program. There may be any

number of other functions in the same source program file (or indeed in other source

program files). The question then arises - where do you put these other functions? C++

does not allow functions to be nested within other functions (unlike Pascal and Modula-2).

So additional functions may appear textually either before function main, or after it. When

the compiler scans the source text of a program, it will flag an error if it finds a call to a

function whose definition it has not yet encountered. So if a function is defined after main,

then a function declaration must appear before the point at which the call is made. This

declaration (also known as a function prototype) should normally be placed at the start of

main giving the compiler sufficient information to enable it to check that the function has

been called properly. This prototype will consist only of the return type, the function name, and the types of its arguments.

// fun01.cpp
// illustrates the placing of functions in relation to main
// tdc 28/09/95
#include <iostream>
using namespace std;

int add(int a, int b )          // this placing is deprecated
{
    return(a + b);
}

int main(void)
{
    // int mult(int a, int b);  // prototype commented out
    int x = 10, y = 3;
    cout << add( x, y ) << endl << mult(x,y) << endl;
    // Error: Function 'mult' should have a prototype in function main()
    return(0);
}

int mult(int a, int b)
{
    return (a * b);
}

Function add has been placed before main contrary to the recommendation for best

practice above.




The prototype for function mult has been commented out, causing the compiler error.

Removing the comments allows the program to compile successfully.

Different organisations may set their own 'house' styles, but we will show the full

definition of functions after main with prototypes normally appearing as the first

definitions within the body of main.

Note that the identifiers a and b in the prototype for mult are not essential. The prototype

could have been

int mult(int, int); // prototype with argument identifiers omitted

But the argument identifiers may be included if they aid the understanding of their

purpose. The compiler will also flag an error if the prototype does not match the formal

definition as regards either its name, or its number and type of arguments. But it will not

detect a difference between the return type as declared in the prototype and as defined in

the formal definition. If there is such a difference then a run-time error is likely to result.

4. Stepwise Refinement (or Top-down design)

When designing the solution to a problem it is normal to set out in logical order the steps

that need to be taken.

Stepwise refinement is a technique for tackling the problem of program complexity by breaking a task down into steps, each of which is implemented by a function. Each of these functions is then further refined by breaking it down into a series of steps implemented as functions, and so on. Initially, the design process can be approached by using a PDL (Program Description Language). We do not introduce a formal description of such a language; it is better left flexible so that you can use a structured type of natural language. Read Skansholm pp 20-26 for an example of Top-down design. When you find that it is impossible to specify any further steps in the process of functional decomposition without using commands of the programming language, then you have taken the functional decomposition process as far as it can go. You are then ready to translate your natural language description into C++

source code.

Initially, your programs will be short and simple and you will wonder what all the fuss

was about. But when you have to tackle a large problem you will, I hope, see the point.

Initially, you will be unfamiliar with the syntax of C++, so it will be extremely difficult for

you to express a solution to a difficult problem directly in the programming language. In

these circumstances, it is essential that you develop the habit of expressing a solution

in natural language before attempting to write the code.

Note that, in the schematic diagram above, those boxes (functions) which consist entirely

of Step x.x should not be assumed to consist entirely of function calls without any other

[Figure: schematic of stepwise refinement. Boxes labelled Function main(), Step 1 to Step 4, Step 1.1, Step 1.2, Step 1.2.1, Step 2.1 and Step 3.1 to Step 3.3, with the lowest-level boxes labelled Code.]

and type of the formal arguments with the exception of default arguments (see Default

arguments on page 32). The function may modify the values of its arguments, and this will

have no effect on the values of any actual argument variables used in the call.

Remember that the values of the actual arguments are copied into the formal argument

identifiers. This is the pass-by-value argument mechanism. The actual arguments may be

any expression of the correct type. This includes a literal constant, e.g. 9.0, a variable, e.g. f, or even a call to another function which returns a value of the correct type, e.g.

cout << sqrt( sqrt(81.0) ); // outputs 3.
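The effect of pass-by-value can be seen in the following sketch, in which a function changes its formal argument. The function name doubled is hypothetical.

```cpp
// Doubles its formal argument and returns the result.
// n is a copy of the actual argument, so only the copy is changed.
int doubled(int n)
{
    n = n * 2;
    return n;
}
```

After int x = 5; the call doubled(x) has the value 10 but x is still 5. The actual argument may equally be an expression, as in doubled(x + 1).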

8. Function argument agreement & conversion

Automatic type conversion is carried out when actual arguments are copied into formal

argument variables in just the same way as that carried out during assignment. Generally

speaking you should not rely on this. Instead always pass the correct type as actual

arguments.

9. Overloaded function names

First you should recognise that an operator (e.g. +) is just a function specified in a different

way i.e. normally in infix form. Thus the arithmetic expression a + b in infix form is just a

different (and more convenient) way of expressing the function add( a, b ) which is in

prefix form. Assuming that add has been declared as:-

int add ( int a, int b );

then both a + b and add( a, b ) are expressions which have the value of the sum of a and b.

In most programming languages, some operators are overloaded, e.g. the '-' operator can

mean

• unary negation e.g. -1

• binary subtraction of integers e.g. 3 - 2

• binary subtraction of floats e.g. 4.5 - 3.2

• binary subtraction of long int e.g. 123456789L - 123456788L

We are allowed to use the same operator for semantically similar operations because it is

convenient to do so even though the actual computation required is quite different - the

compiler determines which computation to perform based on the type of the operands.

But many languages will not allow the corresponding functions to have the same name,

e.g. subtract( int, int ) - a function accepting two arguments of type int would not be

permitted to exist in the same scope as subtract( float, float ) - a function accepting two

float arguments. This is illogical. Fortunately for us, C++ does permit overloading of

function names provided that they can be distinguished by their signatures i.e. the number

and type of their arguments. You have already seen this with the standard output stream

cout that has a function << that accepts an argument of any one of the fundamental types.

The language allows the function << to be declared in such a way that it can be used as an

operator.

Note that functions with the same name must be distinguishable by their number and type

of arguments. The function return type is not taken into account in determining whether

they are different.

void print( int, int );
void print( float, float );   // OK: different argument types
int print( int, int );        // error: erroneous redeclaration, the return type is
                              // not considered
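A pair of overloaded functions might be sketched as follows. The name subtract follows the discussion above, but the second version takes double instead of float; this is a choice made for this sketch so that a call with literals such as 4.5 (which are of type double) matches one version exactly.

```cpp
// Two functions with the same name but different signatures.
int subtract(int a, int b)
{
    return a - b;
}

double subtract(double a, double b)
{
    return a - b;
}
```

The compiler selects the version to call from the types of the actual arguments: subtract(3, 2) calls the int version and subtract(4.5, 3.0) calls the double version.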




10. Reference Arguments

This will be dealt with in a subsequent lecture.

11. Function comments

Each function should be provided with one or two lines of comment after the header

describing what it does and any special assumptions that it makes about any arguments

passed to it. The formal way to do this is to provide pre and post conditions which

specify:-

pre assumptions the function makes about the value of arguments passed to it and any

other relevant conditions. There is no need to include assumptions about the types

of the arguments since the compiler will check these.

post the state after it has accomplished its task. This may include any limitations on the

return value, how unusual situations are flagged etc.

These pre and post conditions then form a contract between the caller and the function.

The caller guarantees to meet the pre-conditions and the function guarantees to satisfy the

post-conditions. If the caller fails to meet his side of the contract (i.e. he does not meet the

pre-conditions), then all bets are off, and the function is relieved from meeting the post-

conditions.

Some language designers consider that this concept is so important that it should not be

dealt with merely by comments. They have therefore incorporated pre and post conditions

into the language so that they can be checked at run-time, raising an exception if the

contract is broken. Eiffel is an example.

Large programs have to be broken down into smaller and more manageable components in

order to deal with their complexity and to allow teams of programmers to work on them.

The separate components can be tested individually with a range of inputs to ensure that they behave as specified. But what happens when they are put back together again? Will

all these components work together? Or will there be discrepancies arising from a

misunderstanding on how the parts interrelate? The ability to check the interaction of

these components at run time can provide significant advantages in terms of quality and

reduction of debugging time.
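C++ as described in these notes has no built-in pre and post conditions, but a pre-condition written as a comment can also be checked at run time with the assert macro from the header cassert. The sketch below is one possible approach, not a prescribed one, and the function name mean_of is hypothetical.

```cpp
#include <cassert>

// pre:  count > 0
// post: returns total divided by count, truncated to a whole number
int mean_of(int total, int count)
{
    assert(count > 0);      // halts the program if the caller breaks the contract
    return total / count;
}
```

A caller that meets the pre-condition, e.g. mean_of(10, 4), receives 2. A caller that passes a count of 0 has broken the contract and the program stops at the assert.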

12. Summary

We have looked at functions which may have formal arguments or should have the word

void in the formal argument list to indicate that no arguments are required. Functions

normally return a value via the return statement, and the type returned must agree with the

return type provided in the definition.

Functions are called by name, passing actual arguments whose values are copied into the

formal arguments. Since a call to a function that returns a value is an expression (i.e. it has

a value), a function call may be used in any case where an expression is expected.

It is recommended that function definitions appear after the function main. This requires

that function prototypes appear as the first lines of function main. Functions whose

prototypes are supplied in main are private to main, i.e. the prototypes serve the

requirements of main and no other functions. If there are other functions, defined after

main, and before the functions they wish to call, then they will not be able to do so. There

are two solutions:-




• Ensure that the definitions of the functions to be called appear before the definitions of the functions that wish to call them.

• Provide prototype declarations for the called functions before main so that they have file scope and can therefore be called from anywhere in that file.
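The second solution can be sketched as follows. The function names square and sum_of_squares are hypothetical, and main is omitted to keep the sketch short; in a complete program it would appear after the prototype.

```cpp
int square(int n);                  // prototype with file scope

// Defined before square, but the call compiles because the
// prototype above is visible to every function in the file.
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

int square(int n)
{
    return n * n;
}
```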

Local variables of a function usually have the storage class auto and are not visible to code

outside the function. They cease to exist after the function terminates. The formal

arguments are also invisible from outside. Changes to formal arguments that are passed by

value and changes to local auto variables have no effect outside the function, and their

identifiers may duplicate identifiers appearing elsewhere in the program.

Functions are one of the weapons that C++ provides in the war against complexity and the

errors that this complexity may bring with it. They are an example of procedural

abstraction and allow a program to be designed as a hierarchy of functions that

progressively refine the problem by breaking it down into smaller problems. Large

programs must be designed on paper using this process of stepwise refinement before the

program is written. A suitable tool for this design process is a PDL (program description

language), one variant of this being known as Structured English. Libraries of frequently

used routines (functions) can be written and a very large number of libraries are provided with all compilers, each library containing a number of functions.

Pre and post conditions provided as comments at the head of the function are an important

way of specifying what they do and how they are to be used. This helps to ensure that,

when a large number of tested functions is finally brought together to form a program, the

various parts work together as specified.

Ideally, input and output should be isolated in a limited number of functions designed for

that purpose and not scattered about over many functions whose primary purpose is not

I/O. Generally speaking, functions should not modify global variables and should never

use global variables for such local uses as loop control.



Flow of Control

1. The type cast operator

The typecast operator provides the possibility of forcing an expression into another data type by using the name of the new type as though it were a function. For example, in a

program to calculate the statistics on a sequence of integers, the mean can be calculated

from the integer total of the numbers divided by the count of the number of items (where

mean is a float) by:-

mean = float(sum) / float(count);

The new C++ standard has introduced four new operators that carry out explicit

conversions from one type to another. Of these four, only static_cast is introduced. It is

intended to be used for conversion between similar types, e.g. between char and int,

between int and enum, and between float and int. Example:-

mean = static_cast<float>(sum) / static_cast<float>(count);

Explicit type conversions are error-prone and a large proportion of program errors is due

to them (Stroustrup). The virtue of the new operators is that they are easy to search and

find in large program source files, whereas the earlier example float(sum) could be very

difficult to find.
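The need for the conversion in the mean example can be seen by comparing the result with and without it. In the sketch below both function names are hypothetical, and both functions assume that count is not zero.

```cpp
// Converts both operands to float before dividing.
float mean_of_items(int sum, int count)
{
    return static_cast<float>(sum) / static_cast<float>(count);
}

// For comparison: the division is carried out on integers, so the
// fractional part is lost before the result is converted to float.
float truncated_mean(int sum, int count)
{
    return sum / count;
}
```

For a sum of 7 and a count of 2, mean_of_items gives 3.5 whereas truncated_mean gives 3.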

2. The comma operator

A sequence of expressions can be separated by commas. They are evaluated from left to right and the last expression provides the value of the sequence, e.g.

s = ( t = 2, t + 3 );

t is assigned the value 2, then the expression t + 3 is evaluated to 5, and this last expression provides the value assigned to s.

This device can be used to include, for instance, a number of statements in a while loop

condition. The value of the last expression is the value of the condition.

while( cin.get(ch), !cin.eof() )

An attempt is made to read a character from standard input and a test is made to see if a

character could not be read because the end of the file has been reached. The value of the

loop continuation condition is that of the test !cin.eof() and, if this has the value false, then

the loop terminates. If the input stream is empty then the loop is never entered.
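Both uses of the comma operator can be tried with the sketch below. The function names comma_value and count_chars are hypothetical, and count_chars takes its input stream as a reference argument (covered later) so that it can be used with any stream.

```cpp
#include <iostream>
#include <sstream>
using namespace std;

// The value of the parenthesised sequence is that of its last expression.
int comma_value()
{
    int t;
    int s = (t = 2, t + 3);
    return s;
}

// Counts the characters in a stream, with the read and the
// end-of-input test combined in the loop condition.
int count_chars(istream& in)
{
    int count = 0;
    char ch;
    while (in.get(ch), !in.eof())
        count++;
    return count;
}
```

comma_value() returns 5, and count_chars returns 0 for an empty stream because the loop body is never entered.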

3. The conditional operator

This consists of 3 expressions separated by a '?' and a ':'

expression1 ? expression2 : expression3

The first must be a logical expression, i.e. yielding either true or false. If expression1 is

true, then the value of the whole expression is the value of expression2, otherwise the

value is that of expression3. This could have been used in defining function max:-

Example 1

int max( int a, int b )
{
    return( a > b ? a : b );
}




Example 2

cout.setf( justify == 'L' ? ios::left : ios::right );

4. The for statement

In most programming languages, the for iteration construct is suitable mainly for loops

whose number of iterations can be determined in advance. In C++, the for loop is much

more general and can, in fact, be employed for any loop including while and do. The

syntax is

for ( expression1; expression2; expression3 )

statement_block;

where:-

expression1 may consist of one or more expressions (separated by commas) that initialise the loop, e.g. count = 1, max = 10

A new variable declaration may be made here whose scope will extend to the end of the for loop block, e.g.

int count = 1, max = 10;

expression2 is a logical expression which determines the continuation of the

loop (in the same way as in a while loop) e.g. count <= max.

This expression may consist of several logical expressions

connected by the boolean operators && (and) and || (or).

expression3 is a statement or statements which will be executed at the end

of each loop iteration. Normally this is used to modify the loop

control variable, e.g. count++

statement_block is either one statement, or more than one statement surrounded by

braces. The single statement may be empty.

If any of the 3 expressions is missing, the semi-colon separator must remain to show its

absence.

Examples

a)  for( ; ; )
        cout << "hello" << endl;

    runs for ever, printing "hello" on a new line each time

b)  for(bool forever = true; forever ; )
        cout << "hello" << endl;

    behaves as a) because forever is forever true

c)  for ( cin.get(ch); !cin.eof() ; cin.get(ch))
        cout.put(ch);

    this is the same as:-

    cin.get( ch );
    while( !cin.eof())
    {
        cout.put(ch);
        cin.get(ch);
    }

    or

    while( cin.get(ch), !cin.eof() )
        cout.put(ch);

Example 2 of the conditional operator (section 3 above) is equivalent to:-

if ( justify == 'L' )
    cout.setf( ios::left );
else
    cout.setf( ios::right );

Note that, in example b) above, bool forever in the first expression is the declaration

of a new boolean variable. It is convenient and makes programs easier to read if the

declaration of variables is as close as possible to the point where they are used. This

facility is one of the improvements over the ‘C’ language provided by C++.

Because of its versatility, there is a tendency for programmers to use the for loop exclusively and to ignore the while loop. However, the latter is designed to deal explicitly with cases where the loop should not be entered at all under certain conditions (e.g. when processing a file which may be empty). Although this condition can be handled by for as shown above, the primary purpose of for is for loops whose number of iterations can be determined before the loop is entered e.g. when processing arrays (to be covered soon). The very fact that a while loop is being used signals that it may never be entered whereas, in a for loop, this fact can only be determined by inspection of its expressions.
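A loop whose number of iterations is known before it is entered might be sketched as follows. The function name sum_to is hypothetical; the loop control variable count is declared in the first expression of the for statement.

```cpp
// Returns the sum of the integers from 1 to n (0 if n is less than 1).
int sum_to(int n)
{
    int total = 0;
    for (int count = 1; count <= n; count++)
        total += count;
    return total;
}
```

sum_to(10) returns 55. For n less than 1 the continuation test fails at once, so the loop body is never executed.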

5. The do statement

In a limited number of cases, processing requires that the loop condition is tested at the

bottom rather than at the top of the loop. In other words, the statement(s) in the loop body will always be executed at least once. The format of the do loop is:-

do

statement_block;

while ( expression );

where expression is a logical expression yielding either true or false. As with all loop

statements, if statement_block comprises more than one statement, it must be enclosed in

braces:-

do

{

statement_1;
statement_2;

...

} while ( expression ); // Note: the test normally appears on same line as the

// closing brace
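As an illustration of a test at the bottom of the loop, the following is a minimal sketch of a function that counts decimal digits. The function name countDigits is an assumption made for this example and does not appear elsewhere in these notes. A do loop suits the problem because the value 0 still has one digit, so the body must run at least once.

```cpp
// Counts the decimal digits of n (hypothetical helper for illustration).
// The body must run at least once, otherwise 0 would be reported as
// having no digits at all.
int countDigits( unsigned long n )
{
    int digits = 0;
    do
    {
        digits++;           // count the current digit
        n /= 10;            // discard it
    } while ( n != 0 );     // test at the bottom of the loop
    return digits;
}
```

Written with a while loop, the same function would need a special case for n equal to 0.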

6. Nested loops

Frequently, a loop is nested within another loop or loops. The reasons why this might be

necessary will become clearer when arrays are covered. Notice that the total number of iterations of the inner loop is the product of the number of its iterations and those of any

surrounding loops. This number can escalate to very large values and can result in
programs that run slowly.

for ( int i = 0; i < 10; i++ )
    for ( int j = 0; j < 10; j++ )
        for ( int k = 0; k < 10; k++ )
            process( i, j, k );

Function process is called 1,000 times.

Sometimes, it may not be obvious how many potential iterations of the inner statement

will occur because, for instance, the second and third lines above may consist of function

calls that, themselves, contain a loop. You should always be aware of the possibility of

introducing inefficiencies into a program in this way because it may result in unacceptable

performance.




7. The break statement

This statement is used to alter the flow of control in loop statements ( for , while and do)

and in the switch statement (see below). Its effect in loops is to cause immediate exit from

the loop in which it appears. This might be required if some abnormal condition occurs

that requires that no further iterations of the loop should be made. The abnormal condition

would normally be detected by an if statement. If the loop is nested within another loop

then control will return to the immediately surrounding loop.

8. The continue statement

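This complements the break statement. It abandons the current iteration, ignoring any
remaining statements that appear after it in the loop body, and passes control to the
test part of the loop in which it appears (in a for loop, the third expression is evaluated
before the test). Its use is generally discouraged as a matter of style, although it
remains a legal part of the language.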
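The effect of break and continue can be seen in the following sketch. The function cappedSum and its parameters are assumptions invented for this illustration. It adds the integers from 1 to limit that are not multiples of 3, and stops as soon as adding the next value would take the total above ceiling.

```cpp
// Hypothetical example: sums 1..limit, skipping multiples of 3, and
// gives up as soon as the running total would exceed ceiling.
int cappedSum( int limit, int ceiling )
{
    int total = 0;
    for ( int n = 1; n <= limit; n++ )
    {
        if ( n % 3 == 0 )
            continue;               // skip the rest of the body; n++ is still executed
        if ( total + n > ceiling )
            break;                  // leave the loop immediately
        total += n;
    }
    return total;
}
```

With a limit of 10 and a generous ceiling the loop runs to completion; with a ceiling of 20 the break is taken when n is 8.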

9. The switch statement

Where a choice needs to be made from a number of possible states, the if else statement can become cumbersome. The switch statement is a more compact and readable

alternative. The syntax is:-

switch ( expression )

{

case constant_expression1 : statement_block;



...

default : statement_block;

}

Where:-

expression is an expression yielding an integral value, i.e. int, short,

long, unsigned or char (but excluding floats and arrays),

e.g. a typical menu selection might include:-

cin.get(ch);

switch( toupper(ch) )

…

constant_expression1,2,3 .. are some possible values of expression, e.g. 'P', 'D', 'E'.

They must be constants, either literal or symbolic - see the example below.

statement_block is a statement or sequence of statements which will

normally end with the break statement. The effect of

break is to cause control to jump out of the switch

statement and not to execute any statements in the

following cases. If break is not present, then the

statements in subsequent cases are executed until either a

break statement is encountered, or the end of the switch

statement is met.

default if none of the cases is met, the statements in the default

section are executed. It is wise always to include this so as to deal with all other possible values of expression.
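The fall-through behaviour described above, where a case without break runs on into the next case, is sometimes used deliberately so that several values share one statement block. The sketch below assumes a hypothetical function daysInMonth, written only to illustrate the point.

```cpp
// Hypothetical example: returns the number of days in a month
// (1 = January), or 0 if month is not in the range 1 to 12.
int daysInMonth( int month, bool leapYear )
{
    int days;
    switch ( month )
    {
        case 4 :                    // no break here, so cases 4, 6 and 9
        case 6 :                    // fall through to the statements
        case 9 :                    // of case 11
        case 11 : days = 30;
                  break;
        case 2 :  days = leapYear ? 29 : 28;
                  break;
        case 1 : case 3 : case 5 : case 7 :
        case 8 : case 10 : case 12 :
                  days = 31;
                  break;
        default : days = 0;         // deals with all other values
                  break;
    }
    return days;
}
```

Omitting a break by accident produces the same fall-through, which is why each case normally ends with one.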




Example (this program is installed in the lab)

int main(void)
{
    void DoPrint( void );      // assumed functions DoPrint etc. are
    void DoDisplay( void );    // defined elsewhere
    void DoEdit( void );

    const char EDIT = 'E', DISPLAY = 'D', PRINT = 'P', QUIT = 'Q';

    for( char ch = '\0'; ch != QUIT; )   // ch initialised to null; loop until Q)uit is chosen
    {
        cout << "Enter choice P)rint D)isplay E)dit Q)uit : " << flush;
        ch = toupper( getch() );   // getch from conio.h - char input without echo to the
                                   // display; toupper from ctype.h, so that 'q' also quits
        switch ( ch )
        {
            case PRINT   : DoPrint();   break;
            case DISPLAY : DoDisplay(); break;
            case EDIT    : DoEdit();    break;
            case QUIT    : break;
            default      : cout << '\a' << endl;   // invalid response, sound the bell
                           ch = '\0';
        }
    }
    return(0);
}



Pointers, References and Functions


1. Introduction

A variable is a symbolic name for a location in memory that holds a value of the relevant

type. In the assignment statement:- x = y;

the meaning of the use of the variable names x and y is different. The use of variable y

means the value currently stored at the memory location known as y. The use of variable x

means the memory location known as x. Thus the whole statement can be understood as

meaning obtain the value stored at the memory location known as y and store it in the

memory location known as x. The current value stored in x is neither needed nor accessed,

and is overwritten by the assignment.

2. Reference Type

We introduce a new data type, the reference, whose value is not an integer, float, char etc.
but a reference to a variable which holds an integer, float, char etc. It is an alias for
another object. Alias means another name for.

Example:-

Assume the following declarations

int k = 5;

int & b = k;

and assume that variable k is stored in memory location number 46524. The value stored
at memory location 46524 is 5.

Variable b , a reference variable, is declared to be a reference to variable k. It will therefore

hold as its value the memory location of k, thus referencing the value of k , i.e. 5.

Any assignment of a new value to k will therefore affect the value referenced by b, and

any change to the value referenced by b will change the value of k .

Note that the special symbol & is placed after the type (int ) in the declaration of b and that

this must be followed by an initialisation using a previously declared variable (not a value)

of the correct type. Once this declaration and initialisation has been made, b behaves
exactly as though it were an ordinary variable. The compiler looks after the necessary
indirection so that e.g. the assignment:-

b = 12;

is interpreted as 'assign the value 12 to the memory location referenced by b' . After this

assignment, k also has the value 12, and after the further statement b++, k has the value

13. Note that b++ does not (in contrast to a pointer) increment the value stored in b, i.e. it

does not change 46524 to 46525.

Once a reference variable has been associated with another variable in this way, it cannot

be changed so that it refers to a different variable. Thus &b = m intended to mean 'change

b so that it is now an alias for m instead of k' is not allowed.
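The behaviour described above can be checked with a short sketch. The function aliasDemo is an assumption introduced for this illustration. It also shows what the legal statement b = m does: it copies the value of m into k and leaves b referring to k.

```cpp
// Hypothetical demonstration: returns the final value of k after it has
// been modified only through its alias b.
int aliasDemo()
{
    int k = 5;
    int& b = k;     // b is another name for k
    b = 12;         // k is now 12
    b++;            // k is now 13; b still refers to k
    int m = 100;
    b = m;          // does NOT rebind b: copies the value of m into k
    m = 0;          // k is unaffected because b never referred to m
    return k;       // 100
}
```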

Variable   Address   Contents
k          46524     5
b          75145     46524





3. Pointers v References

Pointers are carried over from C, and are, in part, superseded by the reference type.

However, many C libraries use pointers and the type has been retained for compatibility

purposes and for their importance in building dynamic data structures. Some books

describe references in abstract terms, and pointers in concrete terms. Pointers, they say, are

variables which hold the address of another variable. But, in fact, this is exactly what

references hold as their value. The differences are:-

• Abstraction
  The fact that references hold an address does not need to be known in order to use
  them, whereas you must take specific action in order to make a pointer point to
  some other object and to obtain the value of the object pointed to (see Syntax
  below).

• Syntax
  Pointers require special symbols to be used by the programmer -
  - to assign to a pointer the address of another object i.e. to make it point to it - use
    the address operator &
  - to yield the value of the object to which a pointer points, known as
    dereferencing - use the indirection operator *
  Reference variables, once declared, are treated as ordinary variables without the use
  of special symbols. The necessary indirection is looked after by the compiler.

References are at a higher level of abstraction than pointers. A further difference is that

pointers can be reassigned at will to point to another variable and can be incremented to

step through memory. They are a much lower level tool than references as befits their

origin. References cannot be reassigned to point to a different object.

Pointer variables                              | Reference variables
-----------------------------------------------+------------------------------------------
int k = 5;                                     | int k = 5;
int* ptr;                                      | int& ref;
  OK. Declaration of a pointer to int          |   illegal. Must be initialised on
  named ptr                                    |   declaration
int* ptr = &k;                                 | int& ref = k;
  declaration and initialisation combined      |   declaration and initialisation
  using address operator &                     |
*ptr = 12;                                     | ref = 12;
  using indirection operator * to assign 12    |   assignment to the variable referenced
  to the variable to which ptr points          |   by ref (no special syntax needed)
cout << k;                                     | cout << k;
  prints 12                                    |   prints 12
cout << *ptr;                                  | cout << ref;
  prints 12 using dereferencing                |   prints 12
cout << ptr;                                   |
  prints the address of k e.g. 46524           |
ptr++;                                         | ref++;
  increments the address held by ptr, which    |   increments the variable referenced by
  now references the memory location           |   ref; k now has the value 13
  immediately following that of k              |
  (k is unchanged)                             |





4. Enumeration Types

It is valuable as a documentation tool to use symbolic names for constant values in

programs. The classic case is pi which can be given a symbolic name by

const double pi = 3.14159265359;

If however you need to model a real world object that may take on any one of a set of
known values, then you can declare an enumeration type -

enum dow { SUN, MON, TUE, WED, THU, FRI, SAT };
dow day1, day2, day3;

This creates a new data type dow (day of week) and declares 3 variables (day1, day2 and
day3) of this type. Note that the enumerated values are not strings. They are simply
constant numeric values that commence with 0.

A further possible use for an enumeration type is to describe the different states that a

program may be in at any one time. In this type of program, the processing e.g. of input

will vary depending on the current state, and certain types of input will have the effect of changing the state. An example of this type of processing is reading a data file that

consists of several lines, each containing a description and a number. The description may

include numeric digits, so it is enclosed in quotes:-

"3D Drawing Program" 12
"Sprocket Type 4S" 31

The states might be described using an enumeration as follows

enum State { IN_NAME, IN_NUMBER, BETWEEN };

State state = BETWEEN;
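The following is one possible sketch of how such state-driven processing might look for a single line of the data file. The function parseLine, its parameters and the use of the standard string class are assumptions made for this illustration, and the reference arguments it uses are explained in section 6. Each character is handled according to the current state, and quotes and digits cause the state to change.

```cpp
#include <string>

enum State { IN_NAME, IN_NUMBER, BETWEEN };

// Hypothetical helper: splits one line of the form  "description" number
// into its two parts. Returns false if the line contains no number.
bool parseLine( const std::string& line, std::string& name, int& number )
{
    State state = BETWEEN;
    bool sawNumber = false;
    name = "";
    number = 0;

    for ( std::string::size_type i = 0; i < line.length(); i++ )
    {
        char ch = line[ i ];
        switch ( state )
        {
            case BETWEEN :
                if ( ch == '"' )
                    state = IN_NAME;            // opening quote
                else if ( ch >= '0' && ch <= '9' )
                {
                    state = IN_NUMBER;          // first digit of the number
                    number = ch - '0';
                    sawNumber = true;
                }
                break;
            case IN_NAME :
                if ( ch == '"' )
                    state = BETWEEN;            // closing quote
                else
                    name += ch;                 // digits here belong to the name
                break;
            case IN_NUMBER :
                if ( ch >= '0' && ch <= '9' )
                    number = number * 10 + ( ch - '0' );
                else
                    state = BETWEEN;            // number finished
                break;
        }
    }
    return sawNumber;
}
```

Because the digit 3 in "3D Drawing Program" is met while the state is IN_NAME, it is added to the description and is not mistaken for the start of the number.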

5. The typedef statement

This allows the definition of new data types based on the fundamental types of the language.

The new type is just an alias for the base type and cannot be given any attributes that are

different from the original.

typedef float real; // creates a new type real based on float.

real length = 10.56;

typedef int* intptr; // type intptr is a pointer to int

There may not seem to be a great deal of value in this mechanism until we meet compound

data types, e.g. array and struct .

6. Reference arguments to functions

The reference type is rarely used in the way described in para. 2. It is intended primarily to

be used in function formal arguments.

Earlier, we looked at simple functions and noted that functions in programming languages

are not 'pure' in the mathematical sense in that they can return more than one value. The

classic example of this is the function that swaps the value of its two arguments. This is

frequently used in sorting algorithms.

int a = 6, b = 199;
swap( a, b );
cout << setw(6) << a << setw(6) << b << endl;

Output:
   199     6





This function does not need to have a return value, but it must return the changed values of

its two arguments. This is accomplished by using reference arguments-

void swap( int& x, int& y )

{

int temp;

temp = x; x = y;

y = temp; // classic swap algorithm. Needs a temporary variable

}

So what is happening here?

The actual arguments in the call swap( a, b ) are the variables a and b. The formal

arguments are defined to be references to integer ( int& x, int& y ). When the function is

called, the compiler recognises that the function is expecting references to integers and not

integer values, so it copies into x and y, not the values 6 and 199, but references to the

variables a and b which hold these values. When the swapping is carried out in the body of

the function, the values that are swapped are those of the variables referenced by x and y,

namely a and b. This is because x and y are aliases for a and b, so anything done to x and y

is actually being done to a and b! For this reason, the function may only be called with

variables and not with literal constants, e.g. swap( 6, 199 ); would be an error.

This is the mechanism provided by C++ to allow a function to return values via its

arguments. Not all of the arguments need to be reference arguments. A function to convert

a time in seconds held as a long int (first argument) into hours, minutes and seconds (the

remaining 3 arguments) will have the first argument as a value parameter and the

remaining three as reference parameters.

void time2hms( long t, int& h, int& m, int& s )

Sometimes formal arguments are referred to as IN, OUT and INOUT arguments. In the case of function swap, both arguments were INOUT, whereas in time2hms, the first

argument is IN, and the next three are OUT. So both OUT and INOUT arguments need to

be defined in the function declaration as reference arguments.
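The notes give only the heading of time2hms. A possible body is sketched below, on the assumption that t is not negative. The arithmetic is an illustration and is not taken from the original program.

```cpp
// t is IN (passed by value); h, m and s are OUT (passed by reference).
// Sketch only: assumes t >= 0.
void time2hms( long t, int& h, int& m, int& s )
{
    h = static_cast<int>( t / 3600 );           // whole hours
    m = static_cast<int>( ( t % 3600 ) / 60 );  // whole minutes left over
    s = static_cast<int>( t % 60 );             // seconds left over
}
```

After the call time2hms( 3725, hours, mins, secs ) the three variables supplied by the caller hold 1, 2 and 5.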

Notice that, in the prototype of any function , the argument identifiers may be omitted as

shown below, but notice that the absence of the identifiers makes it more difficult to

understand what the function does without further comment being provided.

void time2hms( long, int&, int&, int& ); // prototype for time2hms

7. Pointer arguments to functions

Although there is generally no need to use pointer arguments to functions because reference arguments can do the same job, it is still common to find them - particularly if

they were originally written by 'C' programmers. In addition, many functions in the 'C'

libraries accept and return pointers. A further consideration is that pointers are frequently

used to access successive components of an array rather than by the conventional means

(array indexing - to be covered later).





Example 1

A typical example of C code:-

void makeupper( char* s )

// converts a string to upper case by using a pointer to access the components of the
// array
{

char *p = s; // local pointer variable p given value of s,

// i.e. p now points to the first character of the string s

while ( *p ) // while char pointed to by p != '\0' - the ASCII NUL string

// terminator

{

if (*p >= 'a' && *p <= 'z') // if the char pointed to by p is lower case

*p += ('A' - 'a'); // convert to upper case

p++; // increment pointer to look at the next char

}

// since p points to the same string as s and s is a

// pointer to the actual array argument, the actual

// argument has been converted to upper case

}

...

char name[] = "i am all in lower case";

makeupper(name);

cout << name << endl;

I AM ALL IN LOWER CASE

Note that, in 'C' and C++ an array passed as an argument to a function is always passed as a pointer to the first element.

Example 2

The 'C' string library cstring or string.h contains a number of functions operating on 'C'

style strings which accept pointer arguments and some of which return pointer results,

typical ones are:-

char *strcat(char *dest, const char *src); // concatenates 2 strings returning a

// pointer to the result. dest has been

// modified

char *strcpy(char *dest, const char *src); // copies src into dest returning a

// pointer to dest as result

An example of the use of these two functions is:-

char source[25] = "GNU";

const char *blank = " ", *cplus = "C++"; // const because string literals must not be modified

char destination[25];

char *p = destination; // p points to the string destination

p = strcat(source, blank); // concatenate a blank onto source. p points to source

strcat(source, cplus); // concatenate "C++" onto source

strcpy(destination, p); // copy the result back into destination. p still points to

// source which has been changed.

cout << "destination = " << destination << endl;

destination = GNU C++





8. Default arguments

Sometimes we need to provide an argument that enables the caller to change the default

behaviour of the function. Where the default behaviour is not to be overridden, then there

should be no need to provide this argument. C++ permits a default argument value to be specified in the function declaration and, if this argument is not supplied by the caller, then

the default value is used by the function. If the argument is supplied, then it overrides the

default. In the case of one default, it must be the last. In the case of two defaults, they must

be the last and last but one etc.

The default must be supplied only once - in the declaration (prototype), and should not be

repeated in the function definition.

Assume a function is to print to the stdout a number of lines of a file. The default is 4

lines, but this may be overridden by supplying an argument specifying a different number

of lines.

void printfile( const char filename[], int numlines = 4 ); // prototype

void printfile( const char filename[], int numlines ) // definition

{

...

...

}

printfile( "fred.cpp", 10); // overrides default with 10

printfile( "jim.cpp"); // default of 4 is used
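The same rules can be seen in a smaller sketch that is easy to run. The function repeat is an assumption invented for this illustration, and it uses the standard string class for brevity. The default appears once, in the prototype, and is not repeated in the definition.

```cpp
#include <string>

// Prototype: the default value is given here, once.
std::string repeat( const std::string& text, int times = 2 );

// Definition: the default is not repeated.
std::string repeat( const std::string& text, int times )
{
    std::string result;
    for ( int i = 0; i < times; i++ )
        result += text;
    return result;
}
```

The call repeat( "ab" ) uses the default of 2, while repeat( "ab", 3 ) overrides it.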

9. Inline functions

Calling a function has an overhead that costs time. The runtime system has to set up a

'stack frame' and allocate space for the arguments and local variables. On termination, the

stack frame has to be released and a jump made to the point immediately after the call.

Very small functions can be specified as 'inline' so that the compiler will substitute the

actual code of the function body for each occurrence of a call to the function. This will

improve speed at the expense of code size. In fact, the use of inline is a recommendation

only, and there is no guarantee that the compiler will honour it - this will depend on the

compiler and the size of the inline function.

inline int square( int a )   // defined before main so that the compiler can expand the call
{
    return ( a * a );
}

int main ( void )
{
    ...
    z = square( x ); // compiler should substitute z = x * x
    ...
}

A test of the above program was timed for 100 million calls to function square. The

elapsed time without inlining was approx 3.9 seconds and, with inlining, approx 3.05

seconds - an improvement of 20%. The code size was increased by a very minor amount

because the call to function square occurs only once.





Note that the GNU compiler does not care whether the keyword inline occurs in the

prototype, in the function definition or in both places. To achieve inlining the compiler

optimisation switch -O has to be set. In RHIDE change the option

Options.Compilers.Optimizations -O to 1

10. Mathematical functions

See Libraries on page 116




Arrays

1. Introduction

Arrays are an aggregate type capable of holding a number of values all of the same type,

contiguously in memory. The components may be any one of the fundamental data types - int, long, unsigned, float, char, enumerated, pointer or one of the aggregate types, i.e.

array, struct or class. The struct and class types have not yet been covered. The struct is

referred to in other languages as record and consists of one or more fields of (possibly)

different types (including arrays and records). The class data type will be covered in the

Object-Oriented Programming & Design module.

The advantage of the built-in array type is that a large number of data items can be held in

a single named array variable whose components can be accessed randomly as we shall

see later. The disadvantage is that its size is fixed at compile time and this cannot be varied

at run time to accommodate the fluctuating requirements of the application. Most of the

time, therefore, it is wasting space because it is not full and the type itself does not allow

resizing. The solution, as we shall see later, is dynamic memory allocation.

2. Defining and referencing arrays

The syntax for the definition of an array is

type_specifier name[number_of_elements]

where

• type_specifier is the data type of the components.

• name is an identifier conforming to the normal requirements for

identifiers.

• number_of_elements is the total number of components that the array is to be

capable of holding. This value appears in square brackets

and may be a literal e.g. 6, or a previously defined

constant e.g. numelements where numelements has been

defined as const int numelements = 6;

Example

int table[6]; an array called table capable of holding 6 integers

float temperatures[31]; an array called temperatures capable of holding 31

floats

char name[16]; an array called name capable of holding 16 characters

(but note that, allowing for the terminating NUL

character, only 15 readable characters can be held).

Arrays are indexed . That is, each element is uniquely numbered. The numbering always

starts at 0 and always increments by 1 for each successive element (regardless of the size

of the elements).

Value:    9   14    7    5    1    3
Index:    0    1    2    3    4    5

An array of integer with 6 elements




The value held by table element 0 is 9, the value held by table element 1 is 14 etc. Access

to the elements (or components) is by subscripting the table name with the desired element

number. Thus table[0] is an integer with the value 9, table[1] contains 14 etc. Notice that,

since the numbering starts at 0, the last element always has an index one less than the

number of elements. The subscripted array can be used anywhere that an expression of the

component type is required:-

const int size = 6;

int table[ size ];

table[ 5 ] = 22;

table[ 1 ] = table[ 5 ];

cout << table[1];

The subscript may be any expression with an integer value, thus:-

int i = 3;

table[ i ] = table[ size - 1 ];

Since the array subscript can be a variable, we can process an array's elements by means of

a loop using as subscript a variable that increments for each iteration of the loop:-

2.1 Inputting values to array table

int count = 0, size = 6, anint;
cout << "Enter an integer: ";
cin >> anint; cout << endl;
while( cin.good() && count < size )
{
    table[ count++ ] = anint;
    cout << "Enter an integer: ";
    cin >> anint; cout << endl;
}

Note the need to check two conditions:-

• The input is a valid integer                     cin.good()
• The end of the array has not been reached        count < size

For this reason, the input is read into an auxiliary variable anint before the start of

the loop and before it is assigned to an array element inside the loop. A further input

is then assigned to anint at the bottom of the loop.

2.2 Outputting values from array table

for( int i = 0; i < count; i++ )

cout << table[ i ] << endl;

Note that, in this example, the condition for the loop to continue is controlled by the

number of items entered (count ). This might be less than the total number of

elements in the array. Attempting to process elements of an array that have not been

given a value can lead to unpredictable results.

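Notes on the assignment examples earlier in this section:
table[ 1 ] = table[ 5 ];          changes the value of element 1 to that of element 5
cout << table[1];                 outputs the integer (22) contained in element 1
table[ i ] = table[ size - 1 ];   changes the value of element 3 to that of element 5 (the last)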




2.3 Shuffling array elements one position left (or down)

This requires care to avoid overwriting the changes.

const int size = 6;

int table[ size ] = { 0, 1, 2, 3, 4, 5 }; // initialised on declaration - see below

Original contents 0 1 2 3 4 5

for( int i = 1; i < size; i++ )

table[ i - 1 ] = table[ i ]; // shuffle the contents one element to

// the left

Shuffled left 1 2 3 4 5 5

2.4 Shuffling array elements one position right (or up)

for( int i = size - 1; i > 0; i-- ) // traverse the array backwards

table[ i ] = table[ i - 1 ]; // shuffle the contents one element to

// the right

Shuffled right 0 0 1 2 3 4

3. Array initialisation

Arrays may be initialised on declaration by enclosing a list of values within braces,

separated by commas. If all elements of the array are given values in this way, the number

of elements need not be supplied between the brackets after the array name:-

int table[] = { 9, 14, 7, 5, 1, 3 };

Multi-dimensional arrays may be initialised by placing braces around each row, and

separating the rows with commas (see the definition of type Plane in section 4):-

Plane aPlane ={

{ 'X', ' ', 'X', 'X' }, // Row 1

{ ' ', 'X', ' ', 'X' }, // Row 2

.... // etc.

{ 'X', 'X', ' ', 'X' } // Row 12, no comma

};

Where some initialisers are omitted, the remaining elements are set to 0, whether or not
the array is auto. An auto (local function) array declared with no initialiser list at all is
not initialised, and its elements hold indeterminate values.

The number of elements in an array can be found by using the built-in sizeof operator:-

cout << "sizeof(table) = " << sizeof(table) << endl
     << "sizeof(int) = " << sizeof(int) << endl
     << "num elements = " << sizeof(table) / sizeof(table[0]) << endl;

sizeof(table) = 24

sizeof(int) = 4

num elements = 6

But note that sizeof cannot be used in a function to find the size of an array formal

argument since this is a pointer.
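A common way round this restriction is to work out the number of elements where the array is declared and to pass it to the function as a separate argument. The sketch below assumes two hypothetical functions, sumTable and sumExample, written for this illustration only.

```cpp
#include <cstddef>

// Inside this function table is a pointer, so sizeof(table) would give the
// size of a pointer, not of the caller's array. The element count is
// therefore passed as a separate argument.
int sumTable( const int table[], std::size_t numElements )
{
    int total = 0;
    for ( std::size_t i = 0; i < numElements; i++ )
        total += table[ i ];
    return total;
}

// Sums a local array, computing the element count at the point where the
// array type is still known to the compiler.
int sumExample()
{
    int table[] = { 9, 14, 7, 5, 1, 3 };
    std::size_t numElements = sizeof( table ) / sizeof( table[ 0 ] ); // 6
    return sumTable( table, numElements );
}
```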




4. Multi-dimensional arrays

There is no theoretical limit to the number of dimensions an array may have, although the

number of elements increases rapidly with the number of dimensions as do the chances of

there being redundant elements. Two dimensional arrays are declared with 2 values, each

enclosed in brackets:-

// airplane reservation system

const int maxRows = 12,

seatsPerRow = 4;

typedef char Plane[maxRows][seatsPerRow]; // declares a new type based on a

// fundamental type

Plane aPlane; // aPlane is a variable of type Plane

void makeEmpty( Plane aPlane)

{

for( int row = 0; row < maxRows; row++ )

for( int seat = 0; seat < seatsPerRow; seat++ )

aPlane[ row ][ seat ] = ' '; // Space = empty
}

Functions that operate on the Plane data structure

bool seatFree( Plane aPlane, int row, int seat );

// return true if row,seat is a space, else false

void allocateSeat( Plane aPlane, int row, int seat );

// mark seat allocated with an 'X'

void showSeatingPlan( const Plane aPlane );

// show plan with spaces and Xs as opposite
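The notes give only the prototypes of these functions. The bodies below are a minimal sketch of how seatFree and allocateSeat might be written, on the assumption that row and seat are array subscripts starting at 0 and are already known to be in range. The declarations of Plane and makeEmpty are repeated so that the fragment stands on its own.

```cpp
const int maxRows = 12,
          seatsPerRow = 4;

typedef char Plane[maxRows][seatsPerRow];

// Mark every seat as free (a space).
void makeEmpty( Plane aPlane )
{
    for ( int row = 0; row < maxRows; row++ )
        for ( int seat = 0; seat < seatsPerRow; seat++ )
            aPlane[ row ][ seat ] = ' ';
}

// Return true if row,seat is a space, else false.
// Sketch only: no range check is made on row or seat.
bool seatFree( Plane aPlane, int row, int seat )
{
    return aPlane[ row ][ seat ] == ' ';
}

// Mark seat allocated with an 'X'.
void allocateSeat( Plane aPlane, int row, int seat )
{
    aPlane[ row ][ seat ] = 'X';
}
```

A production version would probably check that the subscripts are in range and that the seat is free before allocating it.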

5. Arrays as function arguments

An example of a 2 dimensional array aPlane of type Plane being passed to a function

appears in 4 above. In C++, an array formal argument to a function is always a pointer to

the first element of the array. This is automatic without any action on the part of the

programmer. Within the function, the array may be subscripted in the normal way. This

explains why, in the function makeEmpty above, it was not necessary to use a reference

argument to ensure that the changed value of the array was passed back to the point of the

call. Since a pointer is passed automatically, any change to the formal argument within the

function body is, in fact, being made to the actual argument. If it is not intended that the

function should modify its formal argument, then the argument should be const modified

to indicate the fact. The compiler will then flag an error if the function body contains
statements that might modify the formal argument.

void showSeatingPlan( const Plane aPlane )

The elements of aPlane are constant and may not appear on the LHS of an
assignment within the function.

[Figure: seating plan for aPlane, with rows 1 to 12 down the side, seats 1 to 4 across the top, and allocated seats marked X]




6. Pointers and arrays

This has already been introduced under pointers. Note that an array name unqualified is

treated by the compiler as an address, so

const int size = 6;
int table[size] = { 0, 1, 2, 3, 4, 5 };
int *ptr = table;        // assigns to ptr the address of the first element of table
cout << *ptr;            // outputs the object to which ptr points, namely the integer 0
*ptr = 10;               // changes the value of table[0] to 10
ptr++;                   // moves ptr to point to the next element of the array
cout << *ptr;            // outputs 1
cout << *(table + 3);    // outputs 3
cout << table[3];        // same as above, outputs 3

Unlike most other languages, C++ supports pointer arithmetic and, since table is treated
as a pointer to its first element, a variable can be used to indicate an offset from the
beginning:-

for ( int i = size - 1; i > 0; i-- )

*( table + i ) = *( table + i - 1 ); // shuffle contents one element to the right

or, using a supplementary pointer

for ( int* p = table + size - 1; p > table; p-- )

*p = *( p - 1 );

Notes on the pointer version of the loop:
• int* p = table + size - 1   address of table + size (6) - 1 elements = address of last element
• p > table                   while the address held by p > the address of table
• p--                         the compiler knows the size of an int, so p-- results in p being
                              adjusted by sizeof(int), i.e. by 2 or 4 bytes on a PC (depending on
                              the compiler), similarly with p - 1




7. Character strings and variable pointers

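Notice the difference between char word[] = "hello" and const char *greeting = "hello". word
is an array holding its own copy of the string, and its name acts as a constant address.
greeting is a pointer variable containing the address at which the string is stored. (The
pointer is declared const char * because the characters of a string literal must not be
modified.)

char word[] = "hello";
const char *greeting = "hello";
cout << "word[] = " << word << endl;           // OK. No problem
cout << "greeting = " << greeting << endl;     // OK. No problem
word = "fred";               // compiler error: an array cannot be assigned to
strcpy(word, "wilfred");     // compiles, but overflows the 6 chars allocated to word
greeting = "william";        // OK: greeting now points to a different string
cout << "word[] = " << word << endl;
cout << "greeting = " << greeting << endl;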

8. Character string input/output

As shown above, inserting into the output stream either the name of a character array e.g.

word or a pointer to a character string e.g. greeting has the same effect.

setw(<field_width>) causes a string to be output right-justified in field_width. It can be left
justified by calling the member function

cout.setf( ios::left, ios::adjustfield );

or by setiosflags(ios::left) as in

cout << setiosflags(ios::left)

<< setw(10) << word << endl;

cin can be used for string input, but terminates at the first whitespace character (space,

tab). To avoid possible overflow by the input exceeding the space allocated to the string,

setw can be used within cin to limit the number of characters entered. The excess

characters are held in the input buffer and are used to satisfy any subsequent use of cin.

const int MESSAGESIZE = 4;

char input[MESSAGESIZE+1];

cout << "Enter a message without spaces: ";
cin >> setw(MESSAGESIZE+1) >> input;

char overflow[80];

cin >> overflow;

cout << "your input: " << input << endl

<< "the overflow was: " << overflow << endl;

To input lines of text whose length is unknown at compile time, use

cin.getline( char *line, int limit, char delim = '\n' )

At most limit - 1 characters are stored (e.g. a limit of 81 for a typical 80-character line
of text), leaving room for the terminating NUL. Input is terminated by the supplied
delimiter, which defaults to newline and may be omitted to use the default. The terminator
is not stored in the array. The address at which the line is stored is held in the pointer
line.

Notes on the assignments in section 7 (Character strings and variable pointers):

word = "fred";
    compiler error: "incompatible types in assignment of 'char[5]' to 'char[6]'" because
    word is a constant pointer and can't be assigned

strcpy(word, "wilfred");
    do this instead, but note that, if the new string is longer, the extra chars are stored
    outside the array's allocated memory and may cause the program to crash

greeting = "william";
    OK because greeting is a variable pointer. The new string is stored elsewhere in memory
    and greeting is changed to point to the new location.




const int linelen = 80;
char line[linelen+1];
while( cin.getline( line, linelen + 1 ) )   // reads up to 80 chars per line; the loop ends at
{                                           // end of file, or if a longer line sets the fail state
    cout << line << endl;                   // output the line
}

9. Arrays of pointers and pointers to pointers

Arrays of pointers can point to different arrays whose declared lengths differ. Thus arrays

of pointers to char can accommodate jagged arrays i.e. arrays of string whose lengths are

different - not just different in the number of characters held, but also in the numbers of

elements allocated in memory.

const char *ptr[4] = { "one", "two", "three", "four" }; // array of 4 pointers to (const) char

Assume that the address held in ptr[0] is 36714

Using the Borland C++ Debug Inspect menu item (see footnote 4):-

-------- Inspecting ptr -------

8F50:0FF0

[0] 8F4C:001E "one" 36714

[1] 8F4C:0022 "two" 36718

[2] 8F4C:0026 "three" 36722

[3] 8F4C:002C "four" 36728

This makes for efficient use of memory when storing large numbers of strings.

The 4 arrays of char are allocated contiguously in memory and the above could be viewed
as follows:-

ptr[0]  36714  ->  o n e \0
ptr[1]  36718  ->  t w o \0
ptr[2]  36722  ->  t h r e e \0
ptr[3]  36728  ->  f o u r \0 \0

or, showing the contiguous storage starting at address 36714:-

o n e \0 t w o \0 t h r e e \0 f o u r \0 \0

Printing this array of pointers can be done by

for (int i = 0; i < 4; i++ )
  cout << ptr[i] << endl ;

4 The GNU C++ debugger built into RHIDE does not support inspect.
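The memory saving of a jagged array can be estimated by adding up the bytes that each string really occupies. The sketch below assumes a helper named totalBytes, which is not part of the notes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: bytes occupied by 'count' strings, including each
// string's terminating '\0'.
std::size_t totalBytes(const char* const strings[], int count)
{
    std::size_t total = 0;
    for (int i = 0; i < count; i++)
        total += std::strlen(strings[i]) + 1;
    return total;
}
```

For "one", "two", "three" and "four" this gives 4 + 4 + 6 + 5 = 19 bytes, whereas a rectangular array char[4][6] would need 24. The pointers themselves also occupy memory, so the saving only becomes significant when string lengths vary widely.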




10. Command line arguments

You have already encountered programs that accept command line arguments, e.g. dir /w.

Dir accepts an argument w that indicates a wide display of file names. The slash is just an

indicator that an argument follows.

MS DOS provides the facility for programs to pick up arguments supplied at the command

line when invoking a program. For example pretty.exe might be a C++ program to 'pretty

print' C++ source files, in the command line invocation pretty myprog.cpp the argument

myprog.cpp represents the name of the source file to be printed.

In C++, information about these command line arguments is provided by 2 arguments to

function main named by convention:-

! int argc the number of arguments (including the name of the executed

program)

! char *argv[] an array of pointers to char representing the strings appearing
on the command line.

In the above example, argc = 2, argv[0] is a pointer to the string "pretty", and argv[1] is a
pointer to the string "myprog.cpp". Whitespace on the command line separates the
arguments into the individual components of argv[].

Thus a command line containing myprog /x/y/t myfile would represent 3 arguments, with
"myprog" in argv[0], "/x/y/t" in argv[1] and "myfile" in argv[2], whereas myprog /x /y /t
myfile would produce argc with the value 5, with argv[0] holding the string "myprog" and the
four arguments /x, /y, /t and myfile held in elements argv[1], argv[2], argv[3] and argv[4]
respectively. What these arguments mean, of course, is up to the author of myprog. It is
good practice to check the number of arguments in main and, if the number falls outside
the range expected (often a variable number of arguments can be entered), to issue an error
message and terminate the program. If no arguments are supplied (other than the
program name, of course) and at least one is expected, then it is usual to print the program
name together with a list of valid arguments. This list should not be verbose and should
not exceed about 22 lines, otherwise some lines will disappear off the top of the screen.

There is a convention that MS-DOS programs expect arguments announced by the slash '/' .

In Unix the character normally used is minus '-' .

Assuming that you have written a program to pretty-print a C++ program; that the program

name is pretty and that 3 arguments are allowed:-

1. /ln print n lines per page, where n is an integer (optional - defaults to 60)

2. /fn print with font size n, where n is an integer (required)

3. filename to print (required)

argc will hold a maximum value of 4 (the name of the program plus 3 arguments) and a

minimum of 3. If argc < 3 or argc > 4 then there is an error and the program should display

an error message to the terminal and then terminate. The error message would be

something like:-

incorrect number of arguments

usage: pretty [/ln] /fn filename

/ln = print n lines per page

/fn = use font size n (8..12)




Note the square brackets to indicate an optional argument. The program can then be

terminated with either:-

! return 1; when the error is detected in main, or

! exit(1); in other cases. exit is in cstdlib (or stdlib.h).

By convention, a non-zero value returned from main or as an argument to exit indicates an
error. In both cases, other non-zero values can be used to indicate different error

conditions.
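The argument checking described for pretty might be sketched as follows. The struct Options and the function parseArgs are hypothetical names introduced for illustration. The sketch does not detect every mistake (for example two file names), and a real program would follow a false result by printing the usage message and returning 1 from main.

```cpp
#include <cstdlib>

// Hypothetical record of the settings taken from the command line.
struct Options
{
    int lines;             // /ln, optional, defaults to 60
    int font;              // /fn, required, 8..12
    const char* filename;  // required
};

// Returns true if the arguments are acceptable, filling in 'opt'.
bool parseArgs(int argc, const char* const argv[], Options& opt)
{
    if (argc < 3 || argc > 4)
        return false;                        // incorrect number of arguments
    opt.lines = 60;
    opt.font = 0;
    opt.filename = 0;
    for (int i = 1; i < argc; i++)
    {
        const char* arg = argv[i];
        if (arg[0] == '/' && arg[1] == 'l')
            opt.lines = std::atoi(arg + 2);  // digits follow "/l"
        else if (arg[0] == '/' && arg[1] == 'f')
            opt.font = std::atoi(arg + 2);   // digits follow "/f"
        else
            opt.filename = arg;
    }
    return opt.lines > 0 && opt.font >= 8 && opt.font <= 12 && opt.filename != 0;
}
```

In main the call would be parseArgs( argc, argv, options ), since a char *argv[] argument converts to the const form used here.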

11. Initialising pointer arrays

Here is an example of an array being used to provide a lookup table. In a program

involving the use of dates, it is likely that a facility to convert a day number into the name

of a day of the week may be required. The day numbers would be in the range 0 - 6, and

values within this range would be passed as an argument to a function that returns the

corresponding day of the week, i.e. "Sunday" - "Saturday". We introduce here the concept

of static function local variables. These are variables declared within a function whose

scope is limited to the function body but, unlike auto local variables their life is that of the

surrounding program - they are not destroyed when the function terminates. This topic is

covered more fully in the chapter on Program Files para. 8

const char* dayname( int daynum )
{
  static const char *name[] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                                "Thursday", "Friday", "Saturday" };
  return name[daynum];
}

….

int daynumber = 2;

cout << "day " << daynumber << " is " << dayname(daynumber) << endl;

The static local variable name is created and initialised only once. Thereafter, the

declaration is ignored and the return statement simply looks up the day name within the

array that corresponds to the incoming argument. Note that no check is carried out on the

argument, so if it falls outside the range of values 0 - 6 the function will either return an

incorrect value or cause a runtime error.
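A range check is easily added to the lookup. The version below is a sketch of one possibility, in which the choice of returning a null pointer for an invalid day number is an assumption rather than anything prescribed by the notes.

```cpp
// Returns the name of day 'daynum' (0 = Sunday), or a null pointer if
// 'daynum' is outside the range 0 - 6.
const char* dayname(int daynum)
{
    static const char* const name[] = { "Sunday", "Monday", "Tuesday", "Wednesday",
                                        "Thursday", "Friday", "Saturday" };
    const int count = sizeof(name) / sizeof(name[0]);   // 7 entries
    if (daynum < 0 || daynum >= count)
        return 0;                                       // caller must test for this
    return name[daynum];
}
```

The caller then has to test the result before printing it, which moves the error from an unpredictable crash to an explicit check.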

12. Review

You will, by now, have seen that arrays and pointers to arrays in C++ are somewhat

complex and error-prone. This is because these facilities were designed over 20 years ago

for 'C' (a language that was originally designed for writing operating systems) and have
had to be retained in C++ for backward compatibility. In fact, the object-oriented facilities

provided by C++ allow these deficiencies to be hidden from the application programmer

who can use libraries of classes e.g. class string which hides the underlying shortcomings

of the built-in array of char type. In particular, the disadvantage of the fixed size of built-in

arrays and the absence of array bounds checking can be overcome in container classes

which are provided with most C++ implementations and are now standardised as the

Standard Template Library. However, we shall be concerned with how container classes

are designed and written and we therefore need to understand the base facilities on which

they are built.

You will be provided with a simple String data type that can be used for assignments. You

should read Skansholm pp 91-93 on the standard string type that is now part of the
Standard Template Library. If you wish, you can use this standard type wherever strings

are required.




13. Summary

! The array type allows a collection of items of the same type to be stored under a

single name. The array declaration specifies the type of its components and the

number of elements.

! Individual components of an array can be accessed by subscripting the array name

with an integer expression, making them well suited to processing by loops. The

compiler provides no run time checking of array bounds so that care needs to be

taken to ensure that array bounds are not exceeded otherwise memory may be

corrupted.

! When an array is passed to a function, the address of the first element of the actual
argument is copied into the corresponding formal argument. An array formal
argument can be declared as either e.g. int table[] or int *table; both mean a
pointer to the first element of an array of int . Within the body of the function, the
components of the array may be accessed either using subscripts, normally in the form
of a variable whose values are controlled by a loop e.g. table[ i ], or by a pointer. In
the formal argument list of a function, a multi-dimensional array must specify the size of
all dimensions except the first. Arrays with 2 or more dimensions are likely to be
specific to a particular application and are best given a new type name using
typedef .

! Arrays can be initialised on declaration with values inside braces separated by
commas. If fewer values are given than there are elements, the remaining elements
are initialised to 0. An auto array declared with no initialiser at all has undefined
contents, whereas a static or global array is initialised to 0. This default
initialisation only has meaning for the primitive types.

! Strings are one-dimensional arrays of char terminated by the ASCII NUL ('\0')

character. Room must be allowed for this character otherwise output and other

routines will not behave correctly. In some programmer-defined functions that
process arrays of char, the terminator must be provided by the programmer.

! Arrays of pointers to char can be used to handle arrays of strings. This is how

command line arguments are provided as the second argument (argv) to function

main, the first argument (argc) being an integer representing the number of

arguments.

An array of pointers to char can be initialised with a list of strings.

The number of pointer elements, unless given within the brackets, is fixed by the

number of strings in the initialisation list. Output of this array of char pointers could

be by:-

int size = sizeof(course) / sizeof(course[0]);

for ( int i = 0; i < size; i++ )

cout << course[ i ] << " ";

cout << endl;

sizeof(course[0]) will yield the size of a pointer (2 or 4 on the systems used here, typically
8 on a 64-bit system), and sizeof(course) will yield 5 times that value (5 pointers). The
value of size in every case will be 5.




14. An array application - Stack of char

A stack is an abstract data type - a type that is not provided by the programming language

but which can be implemented by using the data structuring facilities of the language. A

stack works on the LIFO (last in, first out) principle - the last item put onto the stack is the

first to be removed from it. The last item put onto the stack is at the top of the stack and

the next item to be removed will be taken from the top. Access to the stack is at one end

only - the top. Compare it to a stack of plates - the next one to be used is the latest one to

be placed onto the stack. The standard operations on a stack of char are:-

! void push( char ) char is pushed onto the stack

! void pop( void ) the top of stack item is removed

! char top( void ) the top of stack char is returned, the stack is unchanged

! bool empty( void ) returns true if the stack is empty, otherwise false

! void makeempty( void ) empties the stack

One way of implementing a stack is to use an array:-

// charstck.cpp
// illustrates an array implementation of a stack of char
#include <iostream>
using namespace std;

const int MAXSTACK = 20; // 20 elements

char stack[MAXSTACK ]; // the stack

int thetop; // the index value of the current top of stack

// (initially empty)

int main ( void )

{

void push( char ch ); // 5 function prototypes

void pop ( void );

char top( void );

bool empty( void );

void makeempty( void );

char word[] = "abracadabra";
makeempty();
for ( int i = 0; word[ i ] != '\0'; i++ ) // push each letter of word
  push( word[i] );
cout << word << " reversed = ";
while ( !empty( ) )
{
  cout << top( ); pop( ); // output the top char and then pop
}
cout << endl;

return 0;

}

void push( char ch )

// post - ch has been placed at the top of the stack

{ ... }

void pop ( void )

// pre - the stack is not empty

// post - the top of stack item has been removed

{ ... }




char top( void )

// pre - the stack is not empty

// post - the top of stack item has been returned. The state of the stack is unchanged

{ ... }

bool empty( void )

// post - if the stack is empty, true is returned, else false is returned
{ ... }

void makeempty( void )
// post - the stack is empty
{ ... }

// output: abracadabra reversed = arbadacarba

Note that the code in function main never accesses the array stack directly. All operations

are carried out only via the provided routines makeempty, push, pop, top, empty. This is an

example of data abstraction - the stack data structure is protected from corruption by

requiring all accesses to be made through these functions. In the example, this discipline is

not enforced - it is possible for the stack to be accessed directly since stack is a global

variable that has file scope. We shall see later how direct access can be prevented, and
how the stack can be encapsulated in a single entity that holds both the array and the

variable that records the top of stack.
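The function bodies are left as { ... } in the listing. One possible way of filling them in is sketched below. It rests on assumptions that the notes do not state: thetop holds -1 when the stack is empty, push silently ignores a char when the stack is full, and the array is named stackdata (rather than stack) purely to keep the sketch clear of any library name. The helpers full and reversed are also hypothetical.

```cpp
#include <string>

const int MAXSTACK = 20;        // 20 elements
char stackdata[MAXSTACK];       // the stack (called 'stack' in the listing)
int thetop = -1;                // index of the current top; -1 means empty (assumption)

void makeempty() { thetop = -1; }
bool empty()     { return thetop < 0; }
bool full()      { return thetop == MAXSTACK - 1; }   // hypothetical extra helper

// post - ch has been placed at the top of the stack (ignored if the stack is full)
void push(char ch) { if (!full()) stackdata[++thetop] = ch; }

// pre - the stack is not empty
void pop() { if (!empty()) thetop--; }

// pre - the stack is not empty
char top() { return stackdata[thetop]; }

// Reverses a word using only the stack operations, as main does in the listing.
std::string reversed(const char word[])
{
    std::string result;
    makeempty();
    for (int i = 0; word[i] != '\0'; i++)
        push(word[i]);
    while (!empty())
    {
        result += top();
        pop();
    }
    return result;
}
```

As in the listing, reversed never touches the array directly, so the representation could be changed without altering it.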



Program Files

1. Introduction

The unit of compilation in C++ is the file. A program can be built from several files. These

will comprise:-

! The main program file that includes a function main

! Zero or more ‘modules’ providing support functions, data types etc. comprising

! A header file ( .h ) that contains prototype declarations for the functions

provided by the module and possibly type and data declarations.

! A source ( .cpp ) file containing the definition of the functions, types and

variables provided by the module. This file may or may not be present.

! The object file ( .obj ) created by compiling the .cpp file (see above) that

provides the definition of the functions whose declarations appear in the header.

The main program file contains compiler directives to #include the header file(s) for the
supporting modules. This ensures that functions and variables, constants and types defined
in the supporting source files can be accessed by the main program. In other words, the
header files provide the prototypes for functions and referencing declarations for variables
etc. that allow the compiler to generate code for the main program without the source of
the supporting .cpp files themselves being present at compile time.

At link time, the programmer must indicate which supporting object ( .obj ) files he wants
to be linked with the object code of the main program. Within the GNU C++ IDE this is
done by creating a project which defines all the required source files for a particular
project and ensures that the object code of each is up to date before the linker links them
all in to produce the executable. The project definition itself is saved as a .gpr file which
can be opened and changed as required. By default, the name of the executable file will be
the name of the project file. Thus assign1.gpr (the project file) will cause the executable
resulting from linking all object files to be named assign1.exe regardless of the name of
the main source program file. The default can be changed by the menu item Project.main
targetname.

Take iostream as an example. You must include the compiler directive
#include<iostream> to ensure that the actual text of this header file is included in the
compilation of your main program. Without this, the compiler would not be able to make
sense of a call to e.g. cin.get(). You do not need the source of iostream (iostream.cpp) and
it is not even present on the machine. At link time, the linker finds that the object code
generated from your main program refers to names declared in the header, and combines
that object code with the object code for iostream in order to produce the executable.
The integrated environment allows the location of the object code of iostream to be
specified and the linker fetches it from that directory for inclusion.

Thus we have the concept of separate program modules that consist of two parts:-

! an interface part - the header file iostream

! an implementation part - the object file iostream.obj. (In fact, you will not find

iostream.obj in the directory because the code is included in the library files in the

lib directory).

The interface part defines the services provided by the module in terms of the functions,
variables, constants and types that are provided (exported) by the module. The

implementation part provides the actual implementation in the form of object code that is

needed at link time.

This is another example of abstraction. We need to know how to call the iostream
functions, and it is convenient that objects like cin and cout are pre-declared. For this
reason, prototypes of the functions and the declaration of the standard I/O streams are
made available to us in the header file iostream, but the implementation is hidden in the
library files since we need not be concerned with how the functions are implemented nor
how stream objects are represented. Consequently we can access the resources provided by
iostream only via the routines and declarations provided in the header file (the interface).
We cannot access the representation of streams because it is hidden and is therefore
protected from the possible corruption that might have occurred had we been allowed
direct access to it.

Note that the ANSI C++ standard specifies that system header files such as iostream,
string, vector etc. should not be given with a .h file name extension. However, all other
modules (including those that you write) should, by convention, have the extension .h. The
GNU C++ compiler meets this requirement of the standard, but other, older, compilers may
not and, in those cases you will have to use the old name for such system headers, e.g.
iostream.h.

2. The steps to produce an executable

Assume that you have a program that consists of the files:-

main.cpp the main program file containing the function main

other.cpp a source file containing the definition of support functions, type and

variable definitions

other.h the header file containing the external referencing declarations for the

functions, types and variables that are defined in other.cpp

! Select Project - Open project - call it myprog.prj

! Add other.cpp and main.cpp to the project

! Compile other.cpp

! Compile main.cpp. (Header file other.h is brought in during compilation)

! Link main.obj with other.obj
  When you choose link with main.cpp:-
  ! You are not linking the source files but the object files created by the compiler.
  ! The linker doesn't know what to link with main.obj unless you have a project
  ! The linker links together the object code of other , main and of any library code
    required e.g. iostream
! You could use make instead of compile and link. This will compile all modules
  whose object file has a time earlier than the source file (.cpp) and then link.

The name of the executable is the same as that of the project i.e. myprog ( not main ).

Some students find this process of setting up a project intimidating for some reason. But it
is quite simple and has to be mastered in order to write real programs that consist of more

than one file.
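The division of labour between other.h, other.cpp and main.cpp can be pictured in a single listing. This is only a sketch: the names square, callcount and sumOfSquares are invented for illustration, and the three parts are shown in one file (with the #include lines as comments) so that the listing compiles on its own.

```cpp
// ---- other.h (interface): declarations only, no storage allocated ----
int square(int n);        // function prototype
extern int callcount;     // external referencing declaration

// ---- other.cpp (implementation): would begin with  #include "other.h" ----
int callcount = 0;        // definition: allocates storage
int square(int n)
{
    callcount++;
    return n * n;
}

// ---- main.cpp: would begin with  #include "other.h"  and contain main ----
int sumOfSquares(int a, int b)
{
    return square(a) + square(b);   // compiles using the header's prototype alone
}
```

When split into real files, main.cpp is compiled knowing only what other.h declares, and the linker later supplies the code for square from other.obj.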

3. Types, storage class and scope

Each identifier that names an object in a program refers to a memory location

where that object's representation is stored. Thus the declaration int count associates count

with a location in memory where the bit representation of the value of count is stored.

An object known by its identifier has 3 attributes in addition to its value: -




! type

This is important because it determines the amount of memory that is allocated for

the representation of the object and also its bit pattern. Thus both the number of

bytes and the pattern of the bits stored in those bytes will be completely different

between e.g. an int and a float even if they appear to hold the same value.

! storage class

This is important because it determines the lifetime of the object, i.e. how long it

remains in existence occupying storage. Storage class has defaults which are

determined by the position in the source code of the object's declaration. This may

be varied by providing an explicit storage class on declaration. There are 3

categories of lifetime -

! local (auto) lifetime is transient and exists only for the lifetime of the

enclosing block (usually a function, but see later).

! static lifetime exists for the duration of the program's execution

! dynamic allocated dynamically during a program's execution. lifetime is
for the duration of the program, or until de-allocation whichever
is sooner. This will be dealt with later.

! scope

This is the portion of the source code within which the object is visible. Thus a

variable declared within a function is visible (in scope) only within the block of

statements that constitute the function body regardless of its storage class. See also

Skansholm Chapter 4.3 Declaration, scope and visibility.

There can be different combinations of scope and storage class, e.g. a function local

variable can be declared static. The effect is that its visibility (scope) remains limited to

the enclosing block (i.e. the function body) but its lifetime continues for the duration of the

program's execution.

4. Local duration

Unlike some programming languages (e.g. Pascal and Modula-2), the body of a function

may not include the definition of another function. In other words, functions may not be

nested in C++ and the only valid definitions appearing within a function are those for data

items. Variables defined in a function have the default storage class auto and the formal

arguments to the function are also treated as auto.

The body of a function is a sequence of declarations and statements surrounded by braces

{}. This construct is known as a compound statement or block . Within a function body,

any statement may itself be a block . It is logical therefore that such a block, nested within

a function body, should be allowed to contain data declarations, and that the scope of those

declarations should be the surrounding block as with function local variables. Therefore

the sequence of statements that depend on the truth or otherwise of the logical expression

in an if statement may be a block that contains declarations whose scope is limited to that

block. A block may even consist of just the braces surrounding one or more statements :-




void swapifless ( int& a, int& b )
{                               // function body block
  if ( a < b )
  {                             // if block
    const int temp = a;
    a = b;
    b = temp ;
    {                           // inner block
      int inner = temp; cout << a;
    }
    cout << inner << endl; // error undefined symbol inner
  }
  cout << temp << endl; // error undefined symbol temp
}

Function swapifless above could have included a local variable definition int temp

(declared before the if statement). This outer temp would have been invisible within the if

block because the inner temp would have caused a 'hole' in its scope. This hole would

extend for the scope of the if block only.

A local variable can, of course, be initialised on definition. This initialisation can be by

any expression that is valid at that point, for instance by an expression that contains

reference to the formal arguments as above. In the absence of any initialisation, the value

of a local auto variable is undefined.
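The 'hole' in scope can be observed directly. In the sketch below the function name scopes and the way the two observed values are packed into one result are assumptions made so that the effect can be checked.

```cpp
// Returns 10 * (value of temp seen inside the if block) + (value seen after it).
int scopes()
{
    int temp = 1;               // outer temp: scope is the whole function body
    int seenInner = 0;
    if (temp > 0)
    {
        int temp = 2;           // inner temp: hides the outer one until the closing brace
        seenInner = temp;       // refers to the inner temp
    }
    return seenInner * 10 + temp;   // here temp is the outer one again
}
```

The outer temp is never changed by the inner declaration; it is merely invisible inside the if block.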

5. Declaration versus definition

A definition of a function is a block of source code that defines the function and its body:-

void swap ( int& a, int& b )

{

int temp = a; a = b; b = temp;

}

A declaration of a function is just the header followed by a semi-colon:-

void swap ( int&, int& ); // prototype

A definition of a variable is a statement that allocates storage with optional initialisation:-

int count = 0; // allocates storage

A declaration of a variable is a notification to the compiler that a variable has been defined

in another file, but is being referenced in the current file:-

extern int count; // external referencing declaration. Does not

// allocate storage

You will not normally need to make external referencing declarations because our

standard practice will be to #include a header file that serves the same purpose (see para 6

below).





6. Static duration

An external referencing declaration for a function is no different in form from the

function prototypes with which you are already familiar. It informs the compiler that a

function is to be called from a separate file from that in which it is defined. An external

referencing declaration for a function is made in the source program file in which the call

to the function is to be made, i.e. in the file in which it is not defined. The format is as

follows: -

extern void print( void ); // declares a function that is defined in another file

// extern may be omitted

External referencing declarations are usually made by placing in the main program file a

compiler directive to #include a header file that provides the necessary external

referencing declarations as explained in paragraph 1.

Variables declared outside of any function - e.g. before function main - have file scope and
are referred to as global variables. The C++ compiler guarantees to initialise any global
variables to zero, but it is considered good practice to initialise them explicitly. As with
any declaration that uses the same identifier as an object declared in a surrounding
block, a local variable causes a hole in the scope of a global variable with the same name
- see the example below:-

#include<iostream>
using namespace std;

int sum;

int main( void )
{
  void subroutine( void ); // prototype declaration
  sum = 15;
  subroutine();
  cout << "Global sum is " << sum << endl;
  return 0;
}

void subroutine( void )

{

float sum = 1.234;

cout << "Local sum is " << sum << endl;

}

The global variable sum is distinct from the local variable of the same name in function

subroutine. The latter causes a hole in the scope of the global from the point immediately

after the definition of float sum. The only variable of that name visible within subroutine is
the local one with the value 1.234. As a corollary float sum is not visible within main

because its scope is confined to the function in which it is defined. The program's output

is:-

Local sum is 1.234

Global sum is 15

It is possible to gain access to a global variable even when it is masked by a local variable

of the same name. In function subroutine for instance the global variable sum can be

referenced by preceding it with the double colon scope resolution operator which you

have already met in e.g. setiosflags( ios::left ):-

cout << "Local sum is " << sum << endl;

cout << "Global sum is " << ::sum << endl;
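The two names can be used side by side in one expression. The function both in this sketch is a hypothetical name, and the values are chosen only so that the result is easy to check.

```cpp
int sum = 15;                 // global sum

double both()
{
    double sum = 1.25;        // local sum hides the global from here on
    return sum + ::sum;       // local 1.25 plus global 15
}
```

Here sum on its own always means the local variable, while ::sum reaches past it to the global.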




7. Storage class static

Variables that are explicitly given the storage class static may be either local or global.

The meaning of static differs depending on whether its declaration appears within a

function or outside.

8. Static local variables

The default storage class of variables declared within a function is auto. This means that

their scope is confined to the block in which they are declared, and also that their lifetime

is the same as that of the block. If a variable declared within a function is initialised with

e.g.

int local = 1;

Then the initial value of local will be 1 for every activation of that function. If it is not

initialised, then its initial value is undefined.

A local variable given the storage class static still has local scope, but retains its value

between successive activations of the block in which it is declared.

void fun1()

{

static int staticlocal = 1;

...

staticlocal++;

}

On the very first occasion that fun1 is called, the value of staticlocal will be 1. But for

subsequent calls staticlocal will have the value that it was last given in the body of fun1

e.g. 2 on entry at the second call, 3, 4 etc. in the above example. In other words,

staticlocal retains its value across activations of fun1 and occupies storage for the whole of

the program's execution.
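The behaviour is easy to confirm if fun1 is made to return the value that staticlocal held on entry. That change of return type is an assumption made for this sketch only.

```cpp
// Returns the value staticlocal had when the function was entered.
int fun1()
{
    static int staticlocal = 1;   // initialised once, on the first call only
    return staticlocal++;         // value on entry is returned, then incremented
}
```

Three successive calls return 1, 2 and 3. Had staticlocal been an auto variable, every call would have returned 1.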

9. Static global variables

The effect of giving a global variable or function the storage class static is to make it

inaccessible to any program unit (i.e. file) other than the one in which it is defined. In

other words, it can be accessed by any function in the file in which it is declared, but may

not be accessed from any other file, even if an external referencing declaration is given in

the other file.

The effect of static definitions at the global level in source files that have no function main

is to give the programmer of these implementation modules the ability to control the

export of both variables and functions. This is a standard requirement of a programming
language that supports the separate compilation of modules. A function of storage class

static would typically be a support function called by other functions in the same module

but required not to be accessible from another module. A global variable would be given

the storage class static to prevent access to it from any module other than the one in which

it is declared. This is known as data hiding. Items which are explicitly made visible (by

declaring them in a header file) are said to be exported from the module. Note that this

mechanism could be used to prevent access to the stack and its top-of-stack indicator if the

char stack in the previous chapter were to be implemented in a separate file.
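A module that hides its data in this way might look like the sketch below. All the names (total, nonNegative, addToTotal, currentTotal) are hypothetical. Because the listing is a single file it cannot demonstrate the failed access from another file, but an extern int total; declaration elsewhere would be rejected at link time.

```cpp
// ---- counter.cpp (hypothetical implementation module) ----

static int total = 0;                 // hidden: visible only within this file

static int nonNegative(int n)         // hidden support function
{
    return n < 0 ? 0 : n;
}

// The two functions below have no 'static', so they can be exported by
// declaring them in counter.h.
void addToTotal(int n) { total += nonNegative(n); }
int currentTotal()     { return total; }
```

Other files can change total only through addToTotal, so the rule that negative amounts are ignored cannot be bypassed.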




10. The C++ pre-processor

This is a simple macro processor that, in the case of GNU C++, constitutes a separate pass
by the compiler. It makes a pass over the source file substituting all occurrences of defined
identifiers with the token string that represents the macro definition. Thus, if you liked
Pascal and also like typing, you could make C++ look more like Pascal by replacing all
occurrences of { with BEGIN and all occurrences of } with END, and by providing macros
that carry out the conversion back to the C++ convention immediately prior to
compilation:-

#define BEGIN {
#define END }

int main( void )
BEGIN
  int a = 1, b = 2;
  if ( a > b )
  BEGIN
    int temp = a;
    a = b;
    b = temp;
  END
  return(0);
END

The macro processor was used extensively in C to produce the effect of inline functions

and constant declarations which are now part of the C++ language. Its use in C++ is

therefore mostly confined to controlling conditional compilation and the inclusion of

header files.
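The reason that inline functions and const declarations are preferred to macros can be seen in a small comparison. The sketch below uses invented names (SQUARE, square, LIMIT, viaMacro, viaInline) and shows one well-known pitfall of text substitution. It is an illustration rather than a complete account.

```cpp
#define SQUARE(x) x * x                      // C style: pure text substitution

inline int square(int x) { return x * x; }   // C++ style: a real function
const int LIMIT = 10;                        // C++ style, instead of  #define LIMIT 10

// SQUARE(1 + 2) becomes the text  1 + 2 * 1 + 2  which evaluates to 5.
int viaMacro()  { return SQUARE(1 + 2); }

// The argument 1 + 2 is evaluated first, so the result is 9.
int viaInline() { return square(1 + 2); }
```

The macro could be repaired with extra parentheses, but the inline function also gives type checking of its argument, which no macro can.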

11. Conditional compilation

When developing a complex program, it may be useful to include debugging statements

that output the value of certain variables or that indicate at which point in the source code
execution is currently being carried out. The output can be directed to a file by using
output redirection at the command line. When the program appears to be working
correctly, these debugging statements could be deleted from the source. But all too often,
it is found that bugs still remain and some or all of the debugging statements have to be
re-inserted. The inclusion in the compilation of the debugging statements can be controlled
by macro conditional statements of the form:-

#define DEBUG 1 // Macro definition setting DEBUG to true

...

#if DEBUG

statements1

#else

statements2

#endif

Statements1 and statements2 are actual C++ program statements. The sequence #if
DEBUG, #else, #endif can be scattered throughout the source code and will have the effect
of including statements1 into the compilation if DEBUG is true, and including statements2
if DEBUG is false.




In order to eliminate the debugging statements, it is only necessary to change the value of
DEBUG from true to false (0), and re-compile and link. The GNU C++ IDE allows macro
constant definitions to be changed via the menu item:-

Options.Compiler options

To define a macro named DEBUG, go to this menu item and enter -DDEBUG. To
undefine it, enter -UDEBUG.

A file macro.cpp is installed in the labs for you to try this out.

The conditional compilation facility may also be used to generate different versions of a

program for different platforms or conditions.
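A concrete instance of the #if DEBUG pattern is sketched below. The function run and the use of a string stream in place of cout are assumptions made so that the debugging output can be examined.

```cpp
#include <sstream>
#include <string>

#define DEBUG 1     // change to 0 to compile the tracing statements out

// Adds the numbers 1 to 3, recording a trace of each step when DEBUG is true.
std::string run()
{
    std::ostringstream trace;
    int total = 0;
    for (int i = 1; i <= 3; i++)
    {
        total += i;
#if DEBUG
        trace << "i=" << i << " total=" << total << ";";   // debugging statement
#endif
    }
    trace << "result=" << total;
    return trace.str();
}
```

With DEBUG set to 0 the statement between #if and #endif is never seen by the compiler, so it costs nothing in the finished program.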

12. Conditional file inclusion

There are two formats for specifying the name of the include file in a compiler include

directive. If the header file name is surrounded by angle brackets, a predefined list of

specified include directories is searched. If the header file name is surrounded by double

quotes, the current directory is searched followed by the specified include directories.

#include<iostream> look in the standard include directories

#include"myheader.h" look in the current directory first, then the standard include

directories

The standard include directories are stored in a directory indicated by operating system

path directives that are set up when the system starts or that are indicated by values that

can be configured from within the IDE.

When developing programs that consist of several modules (files) it is normal to supply a

header file for each module other than the main module. The main module then requires compiler directives to #include these header files, using the form #include "filename.h". If

necessary, the header file may also be included in the compilation of the .cpp file for

which it is the header. In cases where header files themselves contain include directives,

there is the likelihood that some declarations will be included twice. In those cases, header

file inclusion may be made conditional on the existence or otherwise of a macro definition, tested with #ifndef.
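The usual shape of such a conditional inclusion is sketched here. The header name date.h, the macro name DATE_H and the function yearOf are assumptions made for this illustration. The guarded text is written out twice in one file to mimic what the compiler sees when the same header is included twice.

```cpp
// First inclusion of the hypothetical header date.h
#ifndef DATE_H          // true: DATE_H has not been defined yet
#define DATE_H          // define it so that any later inclusion is skipped

struct Date
{
    int year;
    int month;
    int day;
};

#endif

// Second inclusion of the same header text
#ifndef DATE_H          // false this time, so everything up to #endif is skipped
#define DATE_H

struct Date             // without the test above this would be a redefinition
{
    int year;
    int month;
    int day;
};

#endif

// yearOf is a hypothetical client function that uses the declared type
int yearOf( const Date& d )
{
    return d.year;
}
```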

Initially, you will not be writing programs whose complexity requires the use of #ifndef

and #define so do not worry about them unduly. When the linker complains that you have

multiple definitions of a function or variable, you will know that you have hit the problem.

Then seek advice.




Data Structures

1. Data Types

Data types can be described in terms of the range of values they may hold and by the

operations provided for them, e.g. type int typically has a range of possible values from -2,147,483,648 to 2,147,483,647, and the provided operations include +, -, *, /, %, ++, +=,

>, <=, ==, !=.

We have not dealt in any detail with the way in which type int is represented in memory

because we do not need to know this in order to use the type.

We defined a type Clock to have a range of values representing the times from midnight to

23:59 at intervals of 1 minute. We also provided a small set of operations - gettime, tick

and show.

We try to follow the principle that the definition of such data types provides all the

information another programmer needs in order to use them in his program, but that the

representation should be hidden so that it cannot be corrupted. Another reason for hiding the implementation is that it should be possible to change it, e.g. to improve performance.

The client program will have to be re-linked with the object code of the new

implementation but, provided that the definition is unaltered, no change should be required

to the source code of the client program.

2. Abstract Data Types

These are data types that are defined entirely in terms of their set of operations without any

consideration of how their values are represented. The domain of values may also feature

in their definition, but often it is so large as to make this not useful. There may, in fact, be

several different ways of implementing them, each with their own set of advantages and

disadvantages. They are often models of objects from the real world or from mathematics,

e.g. Sets, Queues and Lists.

The implementation should allow a programmer to define new instances of the type, but

should prevent access to the representation.

3. Classification

There are two main groups - single entities of which there may be many instances

e.g. Clock, and collections (or containers) of many objects of the same type e.g. Set, List

etc. The components of these collections may be of any type, but, within one collection,

must all be of the same type. Frequently, part of the definition of a collection is the

relationship between the members.




4. Categories of Collection

The broad categories are:-

• Collections in which there is no relationship between the members except that, in the domain of all possible values that may be members, each is either a member or is not, e.g. Set and Bag.

• Linear structures in which the members have a one-to-one relationship with each other.

• Hierarchical structures in which the members have a one-to-many relationship with each other.

• Graphs - where the members have a many-to-many relationship.

5. Stacks

Definition

This is the simplest of the linear collection types since the

number of operations is typically small. As with all containers,

the components may be of any type, but must be of the same

type within any one stack. Additions to, and removals from the

stack are made at one end only - the top. Access to components

is limited to the item currently at the top. The consequence of

this relationship between members is that the first item to be

added is the last to be removed. This is known as a LIFO

structure - last in, first out.

Stacks are very widely used in Computer Science. When a function is called, a stack frame is built containing the address

to which control must return when the function has finished

execution. In addition, space is reserved in the stack frame for

any auto local variables and for the values of any actual

arguments passed to the function. This structure

is pushed onto the system stack. When the

function terminates, the stack frame is popped

from the stack, causing the arguments and local

variables to perish. Another application is

recording the path taken through a structure so

that it can be retraced - the 'Hansel & Gretel'

effect.

[Figure: the four categories of collection: Set, Linear, Hierarchical (Tree) and Graph]

int funa ( int y )

{

return ( y * 2 );

}

int funb ( int z )

{

return ( funa ( z ) / 2 );

}

int func( int a )

{

return ( funb( a ) );

}

int main (void )

{

int x = 4, y;

y = func( x );

}

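[Figure: Stack frames for the above code. The system stack holds main; then main and func; then main, func and funb; then main, func, funb and funa. The frames are then popped in reverse order as each function returns, leaving only main.]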




The classic operations are:-

push push a new item onto the stack

top retrieve the top of stack item without removing it

pop remove the top of stack item

empty test if the stack is empty

Viewed as an abstract type, a stack cannot be full, but the actual implementation may have

to place a limit on the number of items that can be held on the stack. This gives rise to a

further operation

full test if the stack is full

Operations on abstract data types can typically be categorised into those that:-

• change the state of the data type, e.g. push, pop

• report on the state of the data type without changing it, e.g. top, empty, full

• create and/or initialise an instance of the type - no example here

Each operation is provided with a pre-condition and post-condition that state:-

i) pre - any requirement placed on the caller as to the state of the structure prior to the call,

or on the values passed as arguments; for instance, top and pop must not be called on

an empty stack.

ii) post - the state of the structure that is guaranteed to hold after the operation has been

carried out, provided that the pre-condition has been met; for instance, after a push, the

number pushed is at the top of the stack.

The definition of a stack of integers can be placed in a header file which is then available

for importing (using #include "intstack.h" ) by any client program requiring it:-

// intstack.h

// definition of a stack of integers

void push( int arg );

// pre - !full()

// post - stack contains the value of arg, top() = arg

void pop( void );

// pre - !empty()

// post - top() has been removed

int top ( void );

// pre - !empty()

// post - stack is unchanged, the item at the top of the stack has been returned

bool empty();

// pre - none

// post - returns TRUE if stack is empty, otherwise FALSE

bool full();

// pre - none

// post - returns TRUE if stack is full, otherwise FALSE




Representation

The obvious first choice for representing a stack is an array, although this has the

disadvantage that an upper limit for the number of items to be stored must be chosen

before compiling, and this cannot be varied at run-time. This representation should be

hidden from a user of the stack by specifying the storage class static:-

// intstack.cpp

// representation and implementation of a stack of integers

#include "intstack.h"

const int MAX_STACK = 10; // the maximum number of items that can be stored

static int data[MAX_STACK]; // the container for the stack members

static int Top; // the index of the top item.

// Top will need to be initialised on startup, incremented

// before pushing a new member, and decremented after

// popping a member.

// When Top = MAX_STACK - 1, the stack is full

Implementation of the operations

This is left as an exercise. The full definition of the functions would be placed after the

global data definitions in intstack.cpp. Note that intstack.cpp contains an include compiler

directive for the header file. intstack.cpp would contain only the data declarations shown

above and the function definitions. There must be no function main.
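One possible implementation is sketched below; it is an illustration, not the model answer to the exercise. It assumes that Top starts at -1 to mean "empty" (the notes leave the initial value open) and it uses assert from <cassert> to check the stated pre-conditions. The #include of intstack.h is left out so that the fragment stands alone.

```cpp
#include <cassert>

const int MAX_STACK = 10;       // the maximum number of items that can be stored
static int data[MAX_STACK];     // the container for the stack members
static int Top = -1;            // index of the top item; -1 is assumed to mean "empty"

bool empty()
{
    return Top == -1;
}

bool full()
{
    return Top == MAX_STACK - 1;
}

void push( int arg )
{
    assert( !full() );          // pre - !full()
    data[++Top] = arg;          // increment Top, then store the new member
}

void pop()
{
    assert( !empty() );         // pre - !empty()
    Top--;                      // the old top item is simply abandoned
}

int top()
{
    assert( !empty() );         // pre - !empty()
    return data[Top];           // the stack itself is unchanged
}
```

Checking the pre-conditions with assert stops the program at the faulty call during development; the checks can be compiled out by defining NDEBUG.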

Using the stack

A client program wishing to use the integer stack would import the definition (i.e. #include

"intstack.h" ) and then carry out operations on it as though it had been defined in the same

file. Because of the static qualifiers used for the array definition data and the integer

variable Top, the client program cannot access the representation directly even if extern

declarations are made for these two items in the client's source code. MAX_STACK also cannot be accessed from the client, because in C++ a const object defined at file scope has internal linkage by default.

#include <iostream>

#include "intstack.h"

using namespace std; // so that cout and endl can be used unqualified

int main( void )

{ // push some items

cout << endl << endl;

while( !full())

{

static int item = 0;

push( ++item );

cout << "pushing " << item << endl;

}

Now an attempt to access the stack variables directly - causes linker errors:-

Top = -1; // Linker error undefined symbol _Top - defined as static

cout << "MAX_STACK = " // Linker error undefined symbol

<< MAX_STACK << endl; // MAX_STACK is const in intstack.cpp

// pop them

while ( !empty() )

{

cout << "popping " << top() << endl;

pop();

}

return 0;

}




6. Abstract Data Type?

Can this implementation of a stack of integers be classed as an abstract data type? It has

been defined in terms of its set of operations. It is encapsulated by being placed in separate

files and its representation is hidden from its clients - its state can only be altered through

the supplied operations. But only one stack can exist at any one time in any one client

program. The client cannot declare instances of the type by e.g.

IntStack astack, bstack;

This is clear since there is no mechanism provided by the stack module for specifying on

which stack the operations are to be carried out - there is only one. This single instance of

an encapsulated type is sometimes referred to as an abstract state machine and is simple to

implement and useful when only one instance of the type is required at any one time.

Later, we will see how a true abstract data type can be defined of which as many instances

may be created as the client program requires.

7. Queues

A queue follows closely the real-world example. Operations are permitted at both 'ends'

with additions (enqueue or append) being made at the tail and removals (serve or remove)

being taken from the head. Effectively, the elements are ordered physically according to

the time of their arrival. It is known as a FIFO structure - first in, first out. Typical

operations are:-

• append or enqueue    add an element at the tail

• serve or remove    remove an element from the head

• size    return the length of the queue

• empty    query whether the queue is empty

• full    query whether the queue is full

Implementation

Again, an array implementation is considered. We need two integers to indicate the head

and tail of the queue and possibly a further integer to record the size (although this can be

computed from head and tail).

const int MAX_QUEUE = 10;

static char queueitems[ MAX_QUEUE ]; // A queue of characters

static int head = 0, tail = -1, count = 0;

Initially, the indicator (technically cursor ) tail is set to a special value to indicate the

empty state. The head of the queue can be viewed as being at the 'left hand' or 'bottom' of

the array, while the tail grows 'right' or 'up' the array as items are appended.




The problem with this method of handling the array is that as items are appended and

served, the queue moves up the array, and will eventually bump up against the end when,

in fact, there may be space available lower down caused by elements being removed from

the head e.g. ‘A’ in this case. One solution is to slide all items in the queue down the array

once the tail has reached the top, but data moves are relatively expensive - particularly if

the queue elements are large.

A satisfactory solution is to view the array as

circular so that the first element follows on immediately after the last. Spare space in the

array caused by removals will always be

available for use as long as the number of

elements remains below MAX_QUEUE .

Instead of simply incrementing head on each

removal, and tail on each append, these two

cursors must be taken modulo

MAX_QUEUE each time they are

incremented. Thus, if e.g. tail is presently 9,

and a further element is appended, tail

becomes ( 9 + 1 ) % 10 = 0, and the newly

arrived element is inserted at array element 0.

[Figure: the queue held in an array indexed 0 to 9, with head and tail marked, after each step of the process: 1. Empty; 2. append('A'); 3. append('B'); 4. ch = serve(); 5. append('C')]

[Figure: the array viewed as circular, indices 0 to 9, holding J, K, L, M, N, O between head and tail; count = 6]




void enqueue( char element )

{

tail = (tail + 1) % MAX_QUEUE;

queueitems[tail] = element;

count++;

}

The simplest way of implementing the test for full and empty is to maintain the size of the

queue in a variable (e.g. count ) within the queue module.
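The remaining operations might look like the following sketch. It repeats the declarations and enqueue so that it stands alone. The names serve, empty, full and size follow the list of typical operations given earlier, but their bodies are an assumption rather than the notes' own code.

```cpp
const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ];        // A queue of characters
static int head = 0, tail = -1, count = 0;

bool empty()
{
    return count == 0;
}

bool full()
{
    return count == MAX_QUEUE;
}

int size()
{
    return count;
}

void enqueue( char element )                // pre - !full()
{
    tail = ( tail + 1 ) % MAX_QUEUE;        // wraps from 9 back to 0
    queueitems[tail] = element;
    count++;
}

char serve()                                // pre - !empty()
{
    char element = queueitems[head];
    head = ( head + 1 ) % MAX_QUEUE;        // head wraps in the same way as tail
    count--;
    return element;
}
```

Because count is maintained separately, the full and empty states can be told apart even though head and tail stand in the same relative positions in both cases.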

As with all data structures based on an array, the storage space is fixed at compile time and

the number of items that can therefore be stored is bounded. This inflexibility means that

arrays can only be used in cases where the maximum number of components can be

determined in advance.

8. Lists

Basically a list is a sequence of elements, each element other than the first and the last

having a predecessor and a successor. Another way of expressing this is that a list is

• either empty or

• consists of an element followed by a list.

This is known as a recursive definition.

The elements may be ordered:-

• by their time of arrival, i.e. each successive addition is placed after the previous last, or

• inversely by their time of arrival - each element is inserted before the previous in a similar way to a stack, although access may be allowed to any element, or

• by some quality of the data, e.g. a list of names ordered alphabetically, or

• by requesting insertion at the 'current' position as indicated by some cursor.

Again, an array is considered as the method of representation. However, we find that there

is a high cost involved where insertion and deletion is permitted other than at the ends.

Each insertion within the list will require all elements following it to be moved ‘up’ the

array to make room, and, since there can be no ‘null’ elements, each deletion will require

all following elements to be moved down to close the gap. The time required to carry out

these moves makes this method of representation less than optimal. There are more

efficient and flexible ways of implementing lists in cases where insertions and deletions

are permitted within the list.
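The cost described above can be seen in a short sketch of insertion and deletion within an array-based list. The names MAX_LIST, listitems, length, insertAt and removeAt are hypothetical and are used only for this illustration.

```cpp
const int MAX_LIST = 10;                    // hypothetical capacity
static int listitems[ MAX_LIST ];
static int length = 0;                      // number of elements currently held

// pre - length < MAX_LIST and 0 <= pos <= length
void insertAt( int pos, int value )
{
    for ( int i = length; i > pos; i-- )    // move the following elements 'up'
        listitems[i] = listitems[i - 1];
    listitems[pos] = value;
    length++;
}

// pre - 0 <= pos < length
void removeAt( int pos )
{
    for ( int i = pos; i < length - 1; i++ )    // move the following elements 'down'
        listitems[i] = listitems[i + 1];
    length--;
}
```

Each call moves every element after pos, so an insertion or deletion near the front of a long list touches almost the whole array.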

9. Structs

Frequently there is a need to store information about an entity under a single name where

the information describing that entity involves different data types. The struct is an

aggregate type that provides this facility:-

struct student // student is a type, not a variable.

{

char name[30];

int age;

char coursecode[6];

}; // note the semi-colon

student courserep; // courserep is one student




Each separate data item within the structure is referred to as a data member . Once the new

type student has been declared, a collection with that component type can be defined.

student aclass[16]; // aclass is an array of 16 students

Access to the members of a struct is by dot notation:-

strcpy( courserep.name, "William Brown" ); // simple assignment not allowed

courserep.age = 21;

strcpy( courserep.coursecode, "mit96" );

cout << courserep.name << endl << courserep.age << endl

<< courserep.coursecode << endl;

A queue of students could be declared as:-

const int MAX_QUEUE = 16;

static student stuqueue[ MAX_QUEUE ]; // A queue of students

static int head = 0, tail = -1, count = 0;

10. Unions

This is similar to the struct in that it can hold one or more items of different types. It

differs from struct in that it can hold only one of its components at any one time. The

compiler allocates storage for the largest of the specified members and all members are

overlaid onto the same storage. In other programming languages this type is usually

known as a variant record . There are two main uses for unions.

• In cases where different instances of the same entity may have different characteristics, i.e. they are described by a different set of variables. This might arise in a collection of students where part-time students require a record of their employer whereas full-time students do not.

• In low level programming when a location in memory may be viewed as two different sets of data, e.g. either two separate integer values or a long integer.

Example:

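typedef short TwoInts[2];

union cheat

{

TwoInts twoints;

long along;

};

cheat x;

x.along = 0; // clear all of the storage first, in case long is wider than two shorts

x.twoints[0] = 255; x.twoints[1] = 1;

cout << x.along << endl; // displays 65791 (255 + 1 * 65536) on a little-endian machine with 16-bit short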
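The first use, the variant record, might be sketched as follows. The type StudentRecord, its members and the two functions are hypothetical; in particular, grantnumber is an invented full-time attribute, included only so that the union has a second member.

```cpp
#include <cstring>

struct StudentRecord                 // hypothetical type for illustration
{
    char name[30];
    bool parttime;                   // records which union member is in use
    union
    {
        char employer[30];           // meaningful only when parttime is true
        int  grantnumber;            // invented attribute of a full-time student
    } details;
};

// marks the student as part-time and stores the employer's name
void setEmployer( StudentRecord& s, const char* employer )
{
    s.parttime = true;
    std::strncpy( s.details.employer, employer, sizeof s.details.employer - 1 );
    s.details.employer[ sizeof s.details.employer - 1 ] = '\0';
}

// returns the employer's name, or an empty string for a full-time student
const char* employerOf( const StudentRecord& s )
{
    return s.parttime ? s.details.employer : "";
}
```

The union itself does not record which member was last stored, so a separate member such as parttime is needed to say how the shared storage should be read.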



Dynamic Data Structures


1. Structures

These were introduced in the previous chapter. The type name in C and C++ is struct, but in most other languages they are known as Records. They are particularly useful for

modelling real-world objects that are described by a set of attributes (data values). The

syntax is

struct type-name

{

list-of-members

};

This is a type definition and does not allocate storage. It introduces a new type that can be

used subsequently in definitions of variables whose type is type-name.

Examples:-

struct Date

{

int year;

int month;

int day;

};

...

Date today, his_birthday;

struct Person

{

char name[20];

Date birthdate;

char address[4][20];

};

....

Person Fred, Jane;

struct Student

{

Person personaldata;

char tutorGrp;

int modulemarks[9];

};

...

Student mscit[40];

These examples illustrate several things about the data type struct .

• The members (referred to as fields in other languages) may be of the same type, or of different types.

• There is no limit to the number of members, but large records can be built up from other struct types, for instance, type Person has a field birthdate which is itself a struct type (Date).

• The members may be of any type, including arrays (and other structs).

• The type name can be used in declarations of arrays whose elements are of struct type, e.g. mscit is an array of 40 elements, each of whose data type is Student. Each Student has a data member called personaldata of type Person; a tutorGrp of type char; and an array of 9 elements of type int called modulemarks.

• The type-name appearing after the reserved word struct is known as the structure tag. It is desirable that this name (e.g. Date, Person, Student) be unique within its own scope.

As you can see, structures can be used in combination with other structures and with

arrays to create arbitrarily complex types capable of modelling many real-world entities.





2. Comparison between structs and arrays

• Component data type

The elements of an array must all be of the same type whereas structs may contain

data members of different types.

• Assignment

An array may not be assigned to another array because an array name behaves as a constant

pointer to its first element, whereas the use of a structure variable name accesses the whole structure.

The consequences of this are important:-

Variables of structure type may be assigned to other variables of the same type. The

effect of assignment is to copy all of the fields from the source structure to the target

structure (including each element of any array members of the structure). Thus we

could write

Jane = Fred;

or mscit[1] = mscit[2];

• Function arguments and return

Structure arguments are, by default, passed to a function by value (not by pointer, as

in the case of arrays). However, a reference argument may be used to reduce the

cost of copying large structures and/or to enable any changes to the structure to be

reflected in the actual argument. If the objective is to eliminate the cost of copying

large structures when a function is called and it is not the intention to modify the

structure within the function, then the formal reference argument can be const

modified, e.g.

void printDate( const Date& aDate )

{

cout << aDate.day << '/' << aDate.month << '/'

<< aDate.year << endl;

}

There is no intention to change the value of the argument aDate since it is only

being output. However, to reduce the cost of copying the actual argument into the

formal argument, the formal argument is made a reference to the actual argument -

Date&. Copying a reference involves only a few bytes.

A function may return a structure or a reference to a structure as its result.

Example:-

Date changeDate( Date aDate )

{

aDate.year++;

return aDate;

}

• Access to components

Elements of an array can be accessed by subscripting the array name as in the

example above. The subscript can be a variable that is modified within a loop c.f.

the Plane example. This allows computed random access to any array component.

The members of a struct, on the other hand, are accessed using dot notation i.e. the

structure variable name followed by a dot followed by the member name. The dot is known as the structure member operator. If the member name is itself a structure

and access is required to its members, then further dots are required to tunnel down

through the member hierarchy, viz.





Fred.name;

Fred.birthdate.day;

mscit[10].personaldata.birthdate.year;

mscit[20].personaldata.address[1];

mscit[30].modulemarks[2]; // the mark held at index 2 (the third module)

// for the student at index 30

• Pointers to structures

If a structure is referenced by a pointer then the de-referencing operator applied to

the pointer provides the access:-

Date* dptr = &today; // dptr is a pointer to Date and points to the Date today

Date dt = *dptr; // dt is assigned the value of today by dereferencing the

// pointer dptr

However, the structure member operator (dot) has a higher precedence than the

dereferencing operator (*). So access to a member of today via the pointer dptr must

use parentheses to resolve the precedence:-

cout << (*dptr).year; // displays the year member of today via the

// pointer dptr

This type of access is frequently required and the syntax is rather clumsy. A new

operator is introduced for this purpose - the structure pointer operator ->. This does

two things - dereferences the pointer to access the whole structure, and then

accesses the member given after the operator ( year in this example).

cout << dptr->year;

• Initialisation

As with arrays, structures may be initialised at the time they are defined, e.g.

Date his_birthday = { 1995, 11, 15 };
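The difference in assignment behaviour noted in the comparison above can be sketched as follows. The type Result and the functions duplicate and copyMarks are hypothetical, introduced only for this illustration.

```cpp
struct Result                        // hypothetical cut-down student record
{
    char tutorGrp;
    int  modulemarks[3];
};

// Structure assignment copies every member, including each array element.
Result duplicate( const Result& source )
{
    Result target;
    target = source;
    return target;
}

// The equivalent for bare arrays has to be written element by element,
// because "target = source" is not allowed for arrays.
void copyMarks( int target[], const int source[], int n )
{
    for ( int i = 0; i < n; i++ )
        target[i] = source[i];
}
```

After the assignment the two structures are independent, so changing a mark in the copy leaves the original untouched.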

3. Storage Management

So far we have only been able to use data items that have been defined at compile-time.

Thus, an array defined in the source code of a program as:-

int table[100];

will hold 100 integers and, if the requirements of the program exceed this number of

elements, then the excess cannot be handled. Clearly this is unsatisfactory. The

programmer cannot predict the demands that will be made on his program when it is being

used by a client. What may have seemed a generous estimate when the program was

written might soon turn out in practice to be a ludicrous under-estimate. What is more, if

the estimate is indeed generous, then a large amount of storage space remains unused and therefore wasted because it cannot be used temporarily by other data items.

An example is a windowing system like MS Windows. The programmers of Windows

could not possibly have worked on the assumption that the number of open windows

should never exceed a certain fixed limit. Since that code was written, the memory

installed in the average PC has at least doubled, redoubled, and redoubled again. To have

fixed this limit 3 or 4 years ago would have put all users in a straitjacket which would

now appear intolerable.

So how can we create and delete data items dynamically at run-time in response to the

demands of the application program?

By using the memory allocation and de-allocation operators new and delete. The use of these operators is closely bound up with pointers, and equivalent facilities are to be found in most

of the conventional programming languages such as Ada, Pascal, Modula-2 and C.





3.1 new

The syntax is new type-name [number-of-elements], where [number-of-elements] is

optional and is used when a dynamically allocated array is required.

Examples int* intptr = new int;

char* chptr = new char[20];

The first statement allocates from the heap a chunk of memory sufficient to hold

one integer and sets the pointer to integer intptr to point to this memory location.

The heap, or free store is the name given to that part of available random access

memory that is not currently occupied by program code and ordinary program

variables.

The second statement allocates sufficient memory from the heap to accommodate

an array of 20 characters and sets chptr to point to the first.

3.2 de-referencing the pointers

Notice that the new data items are anonymous - they have no name. This is not surprising since the compiler is responsible for associating variable names with

memory locations and the compiler did not know whether or not we would execute

these two statements at run time - they may be encountered only if the user selects a

particular menu option. Access to the newly allocated data items is obtained only

via a pointer that points to them:-

*intptr = 99;

cout << "intptr points to " << *intptr << endl;

strcpy( chptr, "Hello, Hello!!!!!!!");

cout << " and chptr points to " << chptr << endl;

In the assignment and output statements, intptr needs de-referencing to produce the

value of the integer to which it points. chptr , on the other hand, does not require de-referencing since we want the whole array to be assigned or output rather than just

the single character to which chptr points. This treatment is analogous to that of an

array name.

3.3 delete

The delete operator has two forms, without brackets for single data items, and with

brackets for arrays. Note that, whereas the form of new required the brackets to be

placed after the type name:-

char* chptr = new char[20];

the syntax of delete requires the brackets to be placed after delete:-

delete intptr; // de-allocate memory occupied by int pointed to by intptr

delete[] chptr; // de-allocate memory occupied by string pointed to by chptr

The effect of delete is to return back to the heap the memory referenced by the

pointer (intptr and chptr in the above examples) and not to delete the pointer itself.

After this, it is an error to attempt to de-reference these pointers in order to access

the item they previously referenced.

3.4 Lifetime

The lifetime of objects allocated by new is from allocation to the earlier of de-

allocation (via delete) or termination of the program.

Notice that lifetime may be different from scope. If a pointer providing access to a

dynamically allocated item goes out of scope (perhaps because it is a local function





variable and the function terminates) then the dynamic data item continues to exist,

but is inaccessible. This is known as memory leakage. If it happens often enough,

the program could run out of memory even though not all is being used. Local

function variables can be used for allocating dynamic data items, but it is necessary

to ensure that, before the function terminates, some other pointer that will continue

in scope is set to point to it.

int* makenewtable( int size )

{

int* intptr = new int[size]; // allocate an array of size integers

return intptr;

}

Since the function returns a pointer to integer, the result of the function call will be

assigned to some other pointer to integer and access will not be lost by the demise

of intptr:-

int* newtable;

newtable = makenewtable( 20 );

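If there is insufficient memory available on the heap when new is called, new on older compilers

returns the special pointer value 0. In standard C++, plain new instead throws the exception std::bad_alloc, and it is the form new (std::nothrow) that returns 0. A 0 result means that the pointer does not point to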

anything and that, in this case, the allocation has failed. When building dynamic

data structures, 0 is frequently used as a pointer value to indicate that no link exists

between components of the structure.

int* intptr = new (std::nothrow) int[ size ]; // std::nothrow (declared in <new>) makes new return 0 on failure

if ( intptr == 0 )

{

cout << "Error, insufficient memory " << endl;

exit(1);

}

The size argument to new permits the size of a dynamically allocated array to be

determined at runtime. This can be used to get over the fixed size problem of arrays.

The array is allocated on start-up with, say, 10 elements. When it becomes full,

makenewtable is called with an argument of, say, double this (i.e. 20). The contents

of the original array are copied into the newly allocated one, and the old array then

deleted. Next time the array becomes full, makenewtable is called with an argument

of 40, and the copying done again. In this way, the effect of a dynamically

resizeable array can be obtained. However, during this doubling process, there is a

temporary requirement for additional memory that might cause memory exhaustion.

Also, the requirement that the old data be copied into the newly allocated table is

relatively costly in terms of time, and it is therefore advisable to minimise the

number of resizing operations wherever possible - this is the reason for doubling the

size on each resize.
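The doubling scheme described above might be sketched as follows. The type IntTable and the functions initTable, append and destroyTable are hypothetical names chosen for this illustration, and the sketch does not check for allocation failure.

```cpp
// A growable table of integers
struct IntTable
{
    int* items;      // the dynamically allocated array
    int  capacity;   // number of elements allocated
    int  count;      // number of elements in use
};

void initTable( IntTable& t )
{
    t.capacity = 10;                        // initial size, as in the text
    t.items = new int[t.capacity];
    t.count = 0;
}

void append( IntTable& t, int value )
{
    if ( t.count == t.capacity )            // the table is full, so double it
    {
        int* bigger = new int[t.capacity * 2];
        for ( int i = 0; i < t.count; i++ ) // copy the old contents across
            bigger[i] = t.items[i];
        delete[] t.items;                   // release the old array
        t.items = bigger;
        t.capacity *= 2;
    }
    t.items[t.count++] = value;
}

void destroyTable( IntTable& t )
{
    delete[] t.items;
    t.items = 0;                            // the pointer no longer refers to anything
    t.capacity = 0;
    t.count = 0;
}
```

Between the allocation of bigger and the delete of the old array, both arrays exist at once; this is the temporary extra memory requirement mentioned above.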





4. Dynamic Data Structures - Linked Lists

A list is a sequence of data items, each item other than the first and the last having a

predecessor and a successor. A more elegant definition using recursion (and one which

can be realised in most programming languages) is :-

• a list is either empty or

• consists of a head representing a single data item followed by a tail which is a list of data items.

Lists may be implemented using arrays but dynamic memory allocation is more flexible in

that the list may grow and shrink in response to the demands of the application.

The list should be viewed as a series of nodes, each node containing some data and a link to the next node. The link is a pointer to a node, and the node is most usefully implemented as a struct.

For simplicity, a list of

integers will be illustrated, but the data contained in a node (struct ) may be as large or as

complex as the application requires. The node is therefore defined as:-

struct Node

{

int data;

Node* link;

};

Each node therefore consists of a data field (in this case an integer) and a pointer to the next node. The list itself can be implemented as a structure containing links to the first and last nodes in the list, and a count of the number of nodes. These links are, again, of type pointer to node. If the list is empty, then the links to the first and last nodes are given the special value 0 referred to above. The same principle will be applied to the link member of the last node in the list since it will have no successor:-

struct LinkList
{
    int   count;
    Node* first, * last;
};

The operations for a list are much less closely prescribed than those for stacks and queues since it is a more general structure and access may be provided at any point. There are also several possibilities for the ordering of the nodes. For simplicity therefore, the example shown below will add new items to the end of the list, and remove items from the front. This is therefore, in effect, a queue.

[Figure: Linked List. A LinkList structure holding count, first and last; first and last point to the first and last of a chain of Nodes, each Node containing data and link.]





4.1 List Cursors

Sometimes a list is provided with an internal cursor that can be moved about by

making calls to appropriate functions. At any time, additions may be made at the

position indicated by this internal cursor, and also deletions provided the list is not

empty. The addition of a cursor (a pointer to Node) and the operations to move it are left as an exercise. It is not usual to provide a print function for a data structure

since it ties it to a particular I/O regime which may not be appropriate for all

applications or on other platforms. However, it is sometimes useful for debugging

purposes and one is included here to demonstrate a traversal of the list. These

example operations all have a reference to a list as one of their arguments. This

allows the client program to declare several lists, and to specify via the actual

argument on which list the operation is to be carried out.

4.2 Initialising the list

void init( LinkList& t)

{

t.count = 0; // count of elements = 0

t.first = 0; // pointer to first element does not point to anything

t.last = 0; // pointer to last element does not point to anything

}

4.3 Creating a new node

static Node* newnode()      // function that returns a pointer to a Node. This
                            // function is private to the list module (it does not
                            // appear in the header file and is declared with
                            // storage class static to prevent access by a client)
{
    Node* n = new Node;     // allocate memory from the heap sufficient to
                            // accommodate a Node and store a pointer to it in n.
    return n;               // return the pointer as the function's result
}

4.4 Checking for empty

bool empty( const LinkList& t) // the argument is const because the list is not

// changed by this function

// post: returns true if the list is empty, false otherwise

{

return (t.count == 0);

}





4.5 Adding a new item to the list

void add( LinkList& t, const int item )
// post: item is added at end of list
{
    Node* n = newnode();    // create a new node dynamically by calling
                            // function newnode
    n->data = item;         // put incoming data into the data member of the
                            // new node
    n->link = 0;            // and set its link member to point to nothing
    if ( empty( t ) )       // special action required if empty
        t.first = n;        // set first to point to the new (first) node
    else
        t.last->link = n;   // set link member of last node to point to the new
                            // node
    t.last = n;             // set 'last' member of the list to point to new (last)
                            // node
    t.count++;              // increment the count
}

[Figure: adding a node holding 3 to a LinkList containing nodes 1 and 2. The new node is created on the heap by Node* n = newnode(); n->data = item; n->link = 0; and then linked in by a) t.last->link = n; b) t.last = n; c) t.count++; so that count changes from 2 to 3.]





4.6 Removing an item from the list

int remove(LinkList& t)

// pre : the list is not empty()

// post: the first item in the list has been removed

{ int tempdata = t.first->data; // save the data in the node pointed to by

// first for return

Node* tempnode = t.first; // save the first node for deletion

t.first = t.first->link; // reset first to point to the next node after

// first

t.count--;

delete tempnode; // recover memory for the old node

return tempdata; // return the saved data

}
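In the remove function above, once the final item has been removed the last member still holds the address of the deleted node. This is harmless for the functions shown, because add does not read last while the list is empty, but a defensive variant might reset it. The following self-contained sketch shows one way of doing so; removeFront and clear are hypothetical names, and init and add are repeated in simplified form only so that the example can be compiled on its own.

```cpp
#include <cassert>

struct Node     { int data; Node* link; };
struct LinkList { int count; Node* first; Node* last; };

void init(LinkList& t) { t.count = 0; t.first = 0; t.last = 0; }

void add(LinkList& t, int item)         // add item at the end of the list
{
    Node* n = new Node;
    n->data = item;
    n->link = 0;
    if (t.count == 0) t.first = n;
    else              t.last->link = n;
    t.last = n;
    t.count++;
}

// Variant of remove that also clears 'last' when the list becomes empty.
int removeFront(LinkList& t)
{
    assert(t.count > 0);                // pre: the list is not empty
    Node* old  = t.first;               // save the first node for deletion
    int   item = old->data;             // save its data for return
    t.first = old->link;                // first now points to the next node
    if (t.first == 0)                   // that was the only node, so
        t.last = 0;                     // 'last' must not point to it either
    t.count--;
    delete old;                         // recover memory for the old node
    return item;
}

// Remove every remaining node, returning all memory to the heap.
void clear(LinkList& t)
{
    while (t.count > 0)
        removeFront(t);
}
```

Because items are added at the end and removed from the front, they come back out in the order in which they were added, which is the queue behaviour described earlier.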

[Figure: removing the first node from a LinkList containing nodes 1, 2 and 3 on the heap: a) int tempdata = t.first->data; b) Node* tempnode = t.first; c) t.first = t.first->link; d) t.count--; e) delete tempnode. The count changes from 3 to 2.]





4.7 Printing the list

void printlist( const LinkList& t )
{
    Node* temp = t.first;               // temp points to first node
    while ( temp != 0 )                 // while list not completely traversed
    {
        cout << temp->data << endl;     // output the data member of the node
                                        // pointed to by temp
        temp = temp->link;              // move pointer forward one node.
    }
}

4.8 Searching the list

bool found( const LinkList& t, const int target )

// post: returns true if target is in the list, else false

{

if ( empty( t ) ) return false;

Node* temp = t.first;

do

{

if( target == temp->data )

return true; // return true if found

temp = temp-> link; // else move to next node

} while( temp != 0 ); // while not at end of list

return false; // not found

}

5. Other dynamic structures

The ability to create storage space for data dynamically at run time in response to the

requirements of the application and to link these data items together by means of a pointer

or pointers allows us to represent a wide range of structures of arbitrary complexity. Thus

we can model stacks, queues, priority queues, lists, ordered lists, lists of lists, trees, graphs

etc. The object-oriented features of the language that we shall be studying in the second

Semester enable us to design data types as classes of object that represent these data

structures. There are a number of books available that provide examples of these data

structures and the algorithms to process them.



Sorting


1. Introduction

There are two main types of sorting - sorting arrays held in random access memory, and

sorting files. In the early period of computing, file sorting tended to be dominant because RAM was very expensive and mass storage was held on magnetic tape, access to which is

sequential. In contrast, magnetic disk storage provides the possibility of accessing file

records by reference to their position in the file.

2. Components of Sorting

Sorting involves rearranging the elements so that they are in order. This, in turn consists of

two operations:-

! Comparing elements - usually by reference to a key field

! Moving elements - usually by swapping pairs of elements

There are normally many more comparisons than moves and the number of comparisons

will be the most significant operation in terms of time, and therefore the prime indicator of

the efficiency of a sorting algorithm.

3. Sorting Files

Database systems are now universal, and file sorting has become less important. Instead, a

number of different indexes are held - either within the data file, or as separate files - that

allow the data file to be read (and output) in different orderings.

If the amount of RAM permits it, and indexes are not supported, then the fastest way of

sorting a file is to read it into an array, sort the array and write the data back out to file. If the file is too big, then it can be broken up into chunks, each of which is sorted in an array

and written out to a separate file. Then the several ordered files are merged back into a

single file.

The traditional file merge requires only 2 elements of the file to be in memory at any one

time and works as follows:-

! split the original file into two new files writing 1 item to each new file alternately. Then merge back into the original file in pairs, creating n/2 runs of 2 items per run

! split the original file into 2 writing 2 items to each file alternately. Then merge back into the original file in quadruples creating n/4 runs of 4 items per run

! split the original file into 2 writing 4 items to each file alternately. Then merge back into the original file in octuples creating n/8 runs of 8 items per run

! etc.

The sort has finished when the original file contains 1 run of n items. The following is a

simplified example based on a file of 8 items. The principle is exactly the same for any

number of items.




Pass  Description                                       Files

1     Original File                                     5 8 3 6 7 2 4 1

      Split into 2 files consisting of 1 item from      file 1   5 3 7 4
      the original file written alternately             file 2   8 6 2 1

      Merge the two files by comparing 1 item from      5 8, 3 6, 2 7, 1 4
      each file and writing the smaller then the
      larger into the original file giving 4 runs
      of 2 items

2     Split into 2 files consisting of 2 items from     file 1   5 8, 2 7
      the original alternately                          file 2   3 6, 1 4

      Merge the two files in groups of 2 items from
      each file, giving 2 runs of 4 items

                                                        file 1  file 2  original file
2.1   Run 1                                             5       3       3
                                                        5       6       3 5
                                                        8       6       3 5 6
      only 1 item remaining from this run, write it     8               3 5 6 8,

2.2   Run 2                                             2       1       3 5 6 8, 1
                                                        2       4       3 5 6 8, 1 2
                                                        7       4       3 5 6 8, 1 2 4
      only 1 item remaining from this run, write it     7               3 5 6 8, 1 2 4 7

3     Split into 2 files consisting of 4 items from     file 1   3 5 6 8
      the original alternately                          file 2   1 2 4 7

      Merge the 2 files in groups of 4 items giving     1 2 3 4 5 6 7 8
      1 run of 8 items. The file is now sorted

Note that:-

! There are only 2 elements from the file present in memory at any one time

! The process is dominated by I/O time

! The number of passes required to sort the original file is log2 n

n Passes

8 3

64 6

512 9

4,096 12

32,768 15

262,144 18

2,097,152 21
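The split and merge passes described above can be sketched in C++ as follows. This is an illustration only: vectors held in memory stand in for the three files, so the sketch shows the order in which items are written rather than the I/O behaviour, and the names mergePass and fileMergeSort are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One pass: split 'file' into two "files", writing runLen items to each
// alternately, then merge corresponding runs back into 'file'.
// Assumes 'file' already consists of sorted runs of runLen items.
void mergePass(std::vector<int>& file, std::size_t runLen)
{
    assert(runLen > 0);
    std::vector<int> f1, f2;
    for (std::size_t i = 0; i < file.size(); i++)
        ((i / runLen) % 2 == 0 ? f1 : f2).push_back(file[i]);

    file.clear();
    std::size_t i = 0, j = 0;
    while (i < f1.size() || j < f2.size())
    {
        std::size_t endI = std::min(i + runLen, f1.size()); // end of this run in f1
        std::size_t endJ = std::min(j + runLen, f2.size()); // end of this run in f2
        while (i < endI && j < endJ)                        // write the smaller item
            file.push_back(f1[i] <= f2[j] ? f1[i++] : f2[j++]);
        while (i < endI) file.push_back(f1[i++]);           // items remaining in run
        while (j < endJ) file.push_back(f2[j++]);
    }
}

// Repeats the pass with run lengths 1, 2, 4, ... and returns the number of passes.
int fileMergeSort(std::vector<int>& file)
{
    int passes = 0;
    for (std::size_t runLen = 1; runLen < file.size(); runLen *= 2)
    {
        mergePass(file, runLen);
        passes++;
    }
    return passes;
}
```

Applied to the 8-item file 5 8 3 6 7 2 4 1, the first pass produces 5 8, 3 6, 2 7, 1 4 and the file is sorted after 3 passes, in agreement with the example and with the table of passes.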




4. Why sort?

! Sorting is used to optimise searching for and retrieving data either by humans or by

the computer

! To produce a report which, because it is sorted, simplifies the manual retrieval of

information

! To make more efficient searches for items held in either main memory or external

storage

5. Does it pay to sort?

Sorting carries an overhead for

! time

! memory for the code

! memory for temporary data

For very small amounts of data, sequential searching may be sufficiently fast to avoid

the need for sorting

But a simple sorting technique can be employed for low data volumes, needing little

overhead.

6. What is the best sort?

Different sorting techniques have different strengths and weaknesses depending on:-

! The number of items to be sorted

! Whether the items are:

! already ordered, or nearly so

! in random order

! already inversely ordered, or nearly so

! The amount of additional storage required:-

! Temporary

⇒ local variables

⇒ an explicit stack

⇒ additional space on the system stack for stack frames if a recursive algorithm is used

! Permanent

⇒ for the code which implements the sort

! The number and size of data items required to be moved

7. Sorting efficiency

We are not usually concerned with the absolute amount of time required for a sort. But we

are concerned with how the time t taken for a sort varies with the number of items n

required to be sorted.

If there is a linear relationship, then t will vary directly with n, i.e. it will be O(n). But no general-purpose O(n) sort exists: any sort that works by comparing keys needs of the order of n.log n comparisons in the worst case.




If t varies as a function of n², then an increase in n by a factor of, say, 10 will increase t 100 times, and increasing n by 100 will increase t 10,000 times.

The simple sorting algorithms are all O(n²)

8. Simple Array Sort - Exchange (Bubble)

Work through the array comparing adjacent pairs of elements.

If the first element is heavier (larger) than the second, swap them

Continue making passes, but stop one element sooner on each pass, because the next

heaviest element has bubbled down to its correct place

k = n
While k > 1 Do
    For each element i from 1 to k - 1 Do
        If element i > element i + 1 then
            Swap element i with element i + 1
        Endif
    EndFor
    Decrement k
EndWhile

Pass   Initial    1    2    3    4    5    6    7
K                 8    7    6    5    4    3    2

          44     44   12   12   12   12    6    6
          55     12   42   42   18    6   12   12
          12     42   44   18    6   18   18   18
          42     55   18    6   42   42   42   42
          94     18    6   44   44   44   44   44
          18      6   55   55   55   55   55   55
           6     67   67   67   67   67   67   67
          67     94   94   94   94   94   94   94

Notice that after each pass, the heaviest element in the unsorted part of the array has

settled to the bottom, increasing the sorted portion by one and decreasing the unsorted

portion by one. The indicators of the efficiency of this algorithm are:-

Comparisons = (n-1) + (n-2) + ... + 1 = ½(n² - n) = 28 for n = 8

Max moves = 3/2 (n² - n) = 84 for n = 8

Ave moves = 3/4 (n² - n) = 42 for n = 8

This algorithm can be improved by employing a flag that is set when no exchanges take

place on a pass. In this case the array is sorted and no further passes are required. This is

an O(n²) algorithm. It is never used in real applications because it is the least efficient of all

sorting algorithms. It is introduced here because it is relatively easy to understand and so

that you will know never to use it!
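For illustration only, the pseudocode and the flag improvement described above might be written in C++ as follows. This is a sketch, not code from these notes: the name exchangeSort is an assumption, array indexes start at 0 rather than 1, the flag is used in the opposite sense (it records that an exchange did take place), and the function returns the number of comparisons so that the figures above can be checked.

```cpp
#include <cassert>

// Exchange (bubble) sort with an early exit.
// Returns the number of comparisons made.
int exchangeSort(int array[], int n)
{
    assert(n >= 0);
    int  comparisons = 0;
    bool swapped = true;                    // forces the first pass
    for (int k = n; k > 1 && swapped; k--)  // stop one element sooner each pass
    {
        swapped = false;
        for (int i = 0; i < k - 1; i++)
        {
            comparisons++;
            if (array[i] > array[i + 1])    // heavier element above lighter one
            {
                int temp = array[i];        // swap the adjacent pair
                array[i] = array[i + 1];
                array[i + 1] = temp;
                swapped = true;
            }
        }
    }
    return comparisons;
}
```

For the example array 44 55 12 42 94 18 6 67 an exchange takes place on each of the first six passes, so all 28 comparisons are still made. For an array that is already in order, the first pass makes n - 1 = 7 comparisons, finds nothing to exchange, and the sort stops.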




9. Insertion Sort

This works in a similar way to the sorting of a hand of cards

Pick up the last but one element and place it in the correct order in the last 2

Pick up the last but 2 and place in the correct order in the last 3 etc.

If the number of items to be sorted > 1 then
    For each element k from last item but one down to 0
        j = k + 1
        save = k'th element
        While j <= last item AND
              the key of save > the key of the j'th element
            r[ j-1 ] = r[ j ];
            increment j
        endwhile
        r[ j-1 ] = save
    endfor
endif

Pass       Initial    1    2    3    4    5    6    7
K                     7    6    5    4    3    2    1
k'th key              6   18   94   42   12   55   44

              44     44   44   44   44   44   44    6
              55     55   55   55   55   55    6   12
              12     12   12   12   12    6   12   18
              42     42   42   42    6   12   18   42
              94     94   94    6   18   18   42   44
              18     18    6   18   42   42   55   55
               6      6   18   67   67   67   67   67
              67     67   67   94   94   94   94   94

Ave No. Comparisons = ¼(n² + n - 2) = 17.5 for n = 8 (14 in the example)

Ave No. Moves = ¼(n² + 9n - 10) = 31.5, i.e. about 32, for n = 8 (29 in the example)

! On average, there are half as many comparisons as Exchange sort

! The algorithm is efficient if the data is already in order

! It is an O(n²) algorithm

! It is stable - equal keys are not moved. This can be important if 2 or more

consecutive sorts are required - each using a different key - the second being the tie

breaker when the first keys contain duplicates.
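The pseudocode above translates fairly directly into C++. The following is a sketch for an array of integers, in which the item itself serves as the key; the function name insertionSort is an assumption.

```cpp
#include <cassert>

// Insertion sort working from the right-hand end of the array, as in the
// pseudocode: the elements r[k+1] .. r[n-1] are already in order and
// r[k] is inserted among them.
void insertionSort(int r[], int n)
{
    assert(n >= 0);
    for (int k = n - 2; k >= 0; k--)        // from last item but one down to 0
    {
        int save = r[k];                    // the item to be inserted
        int j = k + 1;
        while (j <= n - 1 && save > r[j])   // strict '>' leaves equal keys in place
        {
            r[j - 1] = r[j];                // move the smaller item down one place
            j++;
        }
        r[j - 1] = save;                    // drop the saved item into the gap
    }
}
```

Because the comparison is a strict 'greater than', an item is never moved past another item with an equal key, which is what makes the sort stable.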




10. Simple Sort performance

11. Conclusions

11.1 Insertion sort is better for small data items and large keys. It also gives good

performance when the data is already ordered (or nearly so). For this reason it is often

used in conjunction with advanced sorting algorithms, e.g. Quicksort

11.2 Exchange sort is the slowest sorting algorithm and is only used in teaching or trivial

applications because it is the simplest to code

11.3 Selection sort (not shown) is better for large data items with small keys. It has

shown slightly better performance than Insertion on inversely ordered data

12. Complex sorts

! Shell sort - derived from insertion sort

! Quicksort - See later

! Heapsort

! These are in a different class to the simple sorts. The number of comparisons tends to vary in proportion to n.log2 n and they are therefore O(n.log n) sorts.

Simple sort performance (see section 10):

            Selection Sort           Insertion Sort           Exchange Sort
            (not covered in this note)
            Moves      Compares      Moves      Compares      Moves        Compares
Worst       3(n-1)     ½n(n-1)       ½n(n-1)    ½n(n-1)       1.5n(n-1)    ½n(n-1)
Average     3(n-1)     ½n(n-1)       ¼n(n-1)    ¼n(n-1)       ¾n(n-1)      ½n(n-1)
Best        3(n-1)     ½n(n-1)       2(n-1)     n-1           0            n-1

[Chart: Simple Sorting Algorithms (log scale, 1 to 10,000), comparing Insertion, Selection and Exchange sorts on Ordered, Random and Inverse data.]




13. QuickSort

This was invented by C.A.R. Hoare - a famous Oxford professor of computing and is an

advanced algorithm, based on the exchange sort, that normally employs recursion. It is the

most efficient of the advanced sorts although it becomes inefficient under certain very exceptional conditions. The more data items, the less likely these conditions are to arise.

Insertion sort is often used in conjunction with Quicksort to sort small partitions.

The technique is to split the array into two partitions and then to sort the first partition

followed by the second partition:-

void QuickSort( AnyType array[] )
{
    If sorting is needed then
        split array into partitions S1 and S2
        QuickSort(S1);
        QuickSort(S2);
    EndIf
}

All the keys in partition S1 must be less than (or possibly equal to) each of the keys in

partition S2. The recursive routine sorts successively smaller and smaller partitions until a

partition contains only one item and is therefore sorted

The partitions are portions of the array itself - described by starting and ending indexes,

and not some additional temporary data structure.

Here is a refinement of the first description using four array index variables

void QuickSort( AnyType array[], int first, int last )

{ if( first < last )

{

split the array into 2 partitions

QuickSort( array, first, last_of_first_partition );

QuickSort( array, first_of_last_partition, last );

}

}

The 'partition' portion of the algorithm is where all the work is done. The second and third

statements are simply recursive calls to the function itself.

The partitioning process ensures that all items in the first partition have values that are <=

all items in the second partition - although neither partition is necessarily sorted.

One of the keys in the partition currently under consideration is selected as the pivot (the

central element in this example)

The items in the current partition are scanned

! first from left to right looking for an element >= pivot

! then from right to left looking for an element <= pivot

! when each scan has stopped, and provided the scan indexes have not crossed over,

the two items are swapped.




Scanning continues until the 2 pointers cross over. The pivot is now in its correct position

in the array and is no longer involved in the partitioning. It may have been moved from its

original position.

Quicksort is called recursively to partition the lower and upper partitions, provided there

are at least 2 elements in them

14. Efficiency of Quicksort

14.1 Best Case

The pivot exactly divides the array into 2 equal partitions every time. There are then log2 n levels of partitioning, and each level involves about n comparisons because there are n items, so the total number of comparisons is about n.log2 n, i.e. O(n.log n)

14.2 Worst case

O(n²) - no better than Exchange sort. But this is extremely unlikely. The choice of

pivot is crucial - ideally, this should be the median key, but the true median can only

be found by sorting! Some variants choose the pivot by finding the median of 3

items randomly selected. The example below selects the central element as the

pivot.
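As an illustration of the median-of-3 idea, the following sketch picks the median of the first, central and last items of the partition. Note the assumptions: the text mentions items selected at random, whereas this sketch uses three fixed positions, which is a common variant, and the function name medianOfThree is hypothetical.

```cpp
#include <cassert>

// Returns the index of the median of array[first], array[mid] and array[last],
// where mid is the central element of the partition first..last.
int medianOfThree(const int array[], int first, int last)
{
    assert(first <= last);
    int mid = first + (last - first) / 2;
    int a = array[first], b = array[mid], c = array[last];

    if ((a <= b && b <= c) || (c <= b && b <= a))   // b lies between a and c
        return mid;
    if ((b <= a && a <= c) || (c <= a && a <= b))   // a lies between b and c
        return first;
    return last;                                    // otherwise c is the median
}
```

A Quicksort using this helper would take array[ medianOfThree( array, first, last ) ] as its pivot in place of the central element. For the array 44 55 12 42 94 6 18 67 the three candidates are 44, 42 and 67, so the pivot chosen would be 44.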

14.3 Average

Averaged over all possible orderings of the keys, the number of comparisons is about 1.39n.log2 n. Mathematicians can see the proof in Algorithms - see para 17. below.

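[Figure: partitioning the array 44 55 12 42 94 6 18 67 about the central pivot 42. The scans from each end stop first at 44 and 18, which are swapped, and then at 55 and 6, which are swapped, leaving the 1st partition and the 2nd partition on either side of the pivot.]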




15. C++ code for function Quicksort ( see Wirth )

void QuickSort( int array[], int first, int last )
{
    int lb = first, ub = last;                  // lower bound and upper bound
    int pivot = array[ (first + last) / 2 ];    // pivot = central element
    int temp;                                   // for the swap

    do
    {
        while ( array[ lb ] < pivot )           // search up for item >= pivot
            lb++;
        while ( pivot < array[ ub ] )           // search down for item <= pivot
            ub--;
        if ( lb <= ub )                         // if not crossed over, then swap
        {
            temp = array[ lb ];                 // swap the elements at indexes
            array[ lb ] = array[ ub ];          // lb and ub
            array[ ub ] = temp;
            lb++;                               // increment ready for next scan
            ub--;                               // decrement ready for next scan
        }
    } while ( lb <= ub );                       // until indexes cross over

    if ( first < ub )                           // if > 1 item in the partition
        QuickSort( array, first, ub );          // partition the lower partition
    if ( lb < last )                            // if > 1 item in the partition
        QuickSort( array, lb, last );           // partition the upper partition
}

16. Comparison of complex sorting algorithms

16.1 Shell sort - a refinement on insertion sort proposed by D L Shell in 1959. The

analysis of this algorithm poses some difficult mathematical problems.

16.2 Heapsort - a refinement of selection sort. It seems to like sequences which are

initially in inverse order. The second fastest of the advanced sorts. Shell sort is

faster only if the data is already ordered.

16.3 Quicksort - is significantly faster than either of the above whatever the initial

ordering of the data.

17. Further Reading

Algorithms + Data Structures = Programs, Wirth N, 1976, Prentice Hall

Classic Data Structures in C++, Budd Timothy A., 1994, Addison Wesley

[Chart: time taken (scale 100 to 500) by Shell Sort, Heap Sort and Quicksort on Ordered, Random and Inverse data.]



Testing


1. The context for testing - Verification and Validation

Verification and Validation is a generic term for all processes which ensure that the

software meets its requirements, and that the specification meets the needs of the client. In other words,

Verification means - Are we building the product right?

This involves checking that the software product conforms to its

specification

Validation means - Are we building the right product?

This involves checking to ensure that the software product meets

the expectations of the client

Techniques required

! Static - Analysis of the design and program listing.

Includes Walkthroughs, Inspections, Formal verification

! Dynamic - Exercising the program using test data similar to real data, i.e. testing

2. The objectives of testing

! To show that the software system meets its specification.

! To exercise the system in such a way that any latent defects are exposed.

Testing cannot prove the absence of defects, only their presence. A successful test is one

that discovers defects.

Testing can never be exhaustive

Apart from trivial programs, the number of different

! possible inputs

! pathways through the program

are effectively infinite. For large programs, testing all possible combinations of pathways through the code and all possible variations in categories of input would take until the end

of the universe even at the rate of one test per millisecond.




3. Testing & Debugging

! Testing is required to discover errors in software.

! Debugging is the process of correcting errors discovered by testing.

It is much more economical to discover errors at the design stage than after the program

has been coded because this avoids the correction process i.e. it avoids the need to debug

and re-test.

4. Two different testing strategies

! Bottom-up

! Top-down

4.1 Bottom-up testing

As each component (e.g. function or module) is developed it is tested 'stand-alone'

by using a specially written 'test harness' or 'test driver'. This is referred to as unit

testing. In C++ a module is a file pair - the interface (header file) and the

implementation (object code file). Usually this pair will implement either:-

! A set of useful functions, e.g. iostream, math

! An abstract type, e.g. a linked list or string abstraction

Re-usable components (e.g. a linked list module) should be distributed with test

drivers.

Individual components e.g. functions are tested to ensure that they operate correctly.

Each component is treated as a stand-alone entity that does not need other

components in order for it to be tested.

Functions are assembled into modules that are then tested. - module testing.

Several modules may be amalgamated to produce sub-systems which are then tested

- sub-system testing. One of the problems that module or sub-system testing might

reveal is a mismatch between the interfaces. This can occur when the module using

the facilities of another module has been designed on assumptions that differ from those made in the design of the module. This might result from a lack of

understanding of the interface specification on the part of either the author or the

user of the module. Or it might be caused by an error in implementation.

Finally, all modules are combined to produce the program - system testing.

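[Figures: the debugging cycle (Locate Error, Design Repair, Repair Error, Re-Test); and the stages of testing (Unit Testing, Module Testing, Sub-System Testing, System Testing, Acceptance Testing), grouped under Component Testing, Integration Testing and User Testing.]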




After this, the user carries out acceptance testing. For bespoke systems developed

for a single user, this is sometimes referred to as alpha testing. For marketable

software products beta testing may be used where a number of users agree to use

the system and to report on any problems. In exchange for this they may get the

software either free or at a preferential rate.

Advantages and Disadvantages of Bottom-up Testing

! Advantage

It is easier to create test conditions. The functionality is there - it just needs

code to test it.

! Disadvantages

" If combined with top-down development, all system components must

be available before testing can start because the last items to be

completed under this development strategy are the lowest level

components - the first to be tested.

" If top-down development is not employed, then special test drivers

have to be written for each component. Eventually these are replaced by the actual higher level components when they are implemented.

4.2 Top-Down Testing

This starts with a skeleton of the system: an 'executive module' at the top of the hierarchy. Some or all of the lower level modules may not have been implemented and exist only as stubs. Stubs are functions whose body has not yet been implemented.

They simply report e.g. the name of the function or the value of the arguments

and/or return a dummy value.
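A stub of the kind described above might look like the following sketch. Both functions are hypothetical and are not taken from any real system: calculateTax stands for a lower level function that has not yet been written, and netPay for a higher level function that can nevertheless be exercised.

```cpp
#include <cassert>
#include <iostream>

// STUB for a lower level function whose body has not yet been implemented.
// It reports its name and arguments and returns a dummy value.
double calculateTax(double grossPay, int taxCode)
{
    std::cout << "STUB calculateTax called: grossPay = " << grossPay
              << ", taxCode = " << taxCode << std::endl;
    return 0.0;                         // dummy value
}

// Higher level function under test. Its interface to calculateTax can be
// exercised even though the real tax calculation does not yet exist.
double netPay(double grossPay, int taxCode)
{
    assert(grossPay >= 0.0);
    return grossPay - calculateTax(grossPay, taxCode);
}
```

With the stub in place, a call such as netPay(1000.0, 5) returns the gross pay unchanged, and the message written by the stub confirms that the interface was exercised with the expected arguments. When the real calculateTax is implemented it simply replaces the stub.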

Initially, the tests are very limited - the purpose is only to exercise the interfaces

between major sub-systems. As more and more modules are implemented the tests

can become more comprehensive.

Advantages-

! The testing process matches the top-down design approach.

! Structural errors - perhaps faults in the design are found earlier. This may avoid

extensive re-design at a later stage.

! The availability of a limited working system is a morale booster and may be

available to demonstrate to client.

Disadvantages

! It may be difficult to provide stubs which simulate the behaviour of a complex component.

! In most systems, output is generated by lower level modules. There may

therefore be a need for an artificial environment to generate test results for

higher level modules.

4.3 Conclusion

The top-down approach is generally considered preferable for most systems today (Yourdon). But, in practice, it will always be necessary to include a certain amount of

bottom up testing of low level components.




5. Categories of Testing

5.1 Functional testing

The most common form. Its purpose is to ensure that the program performs its

normal functions correctly - see above.

5.2 Thread testing

This may be used in real-time systems which are usually made up of a number of

co-operating processes. An external event such as an input from a sensor may cause

control to be transferred from the current process to the process that handles that

event. Real time systems are difficult to test because of the time-dependent

interactions between the processes. An error may occur only when the processes are

each in a particular state. Thread testing follows the functional testing of the

processes and is designed to trace the effect of the different external events as they

thread through the various processes. The number of combinations of state of the various processes may be so great that it is impossible to test all of them, e.g. 10

processes, each with 10 possible states produces 10,000,000,000 different

combinations.

5.3 Recovery Testing

Purpose - to ensure that the system can recover from various types of failure.

This is important in on-line and real-time systems e.g. controlling manufacturing

processes.

It may be necessary to simulate in software such failures as hardware, power,

operating system etc.

5.4 Performance (Stress) Testing

Purpose - to ensure that the system can handle the specified volume of transactions

in terms of response time, storage requirements etc. This would be important in

large transaction processing applications such as airline reservation systems.

6. Test Planning

The planning of tests should be carried out during the Specification and Design phases of

the software project:-

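[Figure: test planning during development. Requirements Spec, System Spec, System Design and Detailed Design lead to Module & Unit code and test. The Acceptance test plan, System Integration test plan and Sub-system integration test plan drawn up during these phases are used in the Sub-System Integration test, System Integration test and Acceptance test, which are followed by Service.]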




6.1 Test Plan & Test Log

The Test plan includes

! A unique identifying number for the test.

! A description of the purpose of the test.

! A specification of the data to be used.

! A description of the expected result.

The Test log includes

! A reference to a test plan item.

! The date of the test.

! The result of test.

! An indication of whether or not expected result was obtained.

! A reference to any corrective action required if a fault is found.

! A possible reference to re-testing if this is needed.

7. How much testing?

In theory, a program should be tested in such a way that all sets of pathways through it and

all possible combinations of input data are covered. In practice this is impossible for all

except very trivial programs because the number of combinations of input and pathway is

effectively infinite. However not every possible input may need to be tested. There is

probably a very large number of different inputs that will have the same effect. Thus, if a

function expects to receive an integer argument in the range 1..100, then all argument

values in this range should cause the function to behave correctly, and any outside of this

range should cause an error. It should not be necessary to test for every single valid

argument value, nor for every single invalid value. Instead, the range of argument values can be partitioned into a number of equivalence classes (see para. 10).

8. Test Data v Test Cases

Test Data - The inputs devised to test the system

Test Cases - Input and Output specifications + a statement of the function under

test, the reason for the test and the expected result.

Test data can sometimes be generated automatically, but it is impossible to generate test

cases automatically.

9. Black box v White box testing

Black Box - Does not consider the code of a component. Test cases are derived

only from its specification and interface.

White box - Test cases are derived from a detailed study of the code of the

component to be tested.

These two methods are NOT alternatives. White box testing may be carried out early in

the testing process, while black box testing may be applied later. They are likely to

uncover different classes of error.




10. Black box testing

There are two techniques for deriving the test data -

! Equivalence Partitioning

! Boundary Value analysis

10.1 Equivalence Partitioning

This technique divides the input domain into a number of equivalence classes so

that a test on one representative value of each class is equivalent to a test using any

other value in that class.

Example

A function requires an argument Age which is an integer. The allowable range of

values for Age accepted by the function is 18..65.

From a study of the specification of the function or other program documentation the following 3 equivalence classes can be identified:-

! Valid class any value in range 18..65

! Invalid class any value in range MIN(int)..17

! Invalid class any value in range 66..MAX(int)

Test cases can then be designed for each valid equivalence class and for each

invalid equivalence class - a total of 3 tests in this simple case.

If there is more than one argument, the test cases should cover the invalid classes

for only one argument at a time because one erroneous argument may mask the

effect of another erroneous argument.
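The three tests for the Age example might be coded as in the following sketch. The function validAge is hypothetical, since the notes do not name the function that accepts Age, and the representative values 40, 5 and 90 are arbitrary choices from within each class.

```cpp
#include <cassert>

// Hypothetical function under test: accepts an Age in the range 18..65.
bool validAge(int age)
{
    return age >= 18 && age <= 65;
}

// One test per equivalence class, each using a single representative value.
void testValidAge()
{
    assert( validAge(40));      // valid class:   18..65
    assert(!validAge(5));       // invalid class: MIN(int)..17
    assert(!validAge(90));      // invalid class: 66..MAX(int)
}
```

Any other value from the same class, e.g. 30 instead of 40, would be expected to give the same result, which is why one representative per class is considered sufficient.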

Another Example - Binary Search function of an ordered array

bool binsearch( int array[], int numitems, int target, int& location )
/* Pre  - The array is ordered, numitems >= 1, numitems <= no. of array elements
   Post - If target is present in the array, then location records the element
          number at which target was found and true is returned, else location
          records the correct insertion point and false is returned */
{
    int low = 0, high = numitems - 1, mid;
    bool found = false;

    do
    {
        mid = (low + high) / 2;
        if( target > array[ mid ] )
            low = mid + 1;
        else
            high = mid - 1;
    } while( target != array[ mid ] && low <= high );

    found = ( target == array[ mid ] );
    if ( found )
        location = mid;
    else
        location = low;
    return found;
}




Valid Equivalence classes for input arguments:-

The choice of VECs may require experience, e.g. that the binary search of an

ordered array may, if not correctly coded, behave differently depending on whether

the number of items stored in the array is odd or even, or if there is only one item.

• Array
  ◦ has 1 item (numitems = 1)
  ◦ has even number of items (e.g. numitems = 6)
  ◦ has odd number of items (e.g. numitems = 7)
• Target
  ◦ is present in the array
  ◦ is not present in the array

This gives 3 x 2 = 6 combinations of valid equivalence classes.

Invalid Equivalence classes for input arguments:-

These are all cases where the pre-conditions are not met. The specification of the

binsearch function says nothing about how it will respond to such error conditions.

C++ provides the facility for an exception to be raised in such cases and for error

handlers implemented elsewhere in the code to catch the exception and take the

necessary action.

In a production program these invalid equivalence classes would be tested to ensure

that the exception and handling mechanisms dealt correctly with the various causes

of the error.
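As an illustration of the exception mechanism, the sketch below checks the pre-conditions of binsearch and reports a violation by raising an exception. The helper names and the choice of std::invalid_argument are assumptions made for this example; they are not part of the binsearch specification. The pre-condition "numitems <= no. of array elements" cannot be checked from a bare array argument, so it is omitted.

```cpp
#include <stdexcept>

// Hypothetical helper (an assumption, not part of the notes): throws if the
// pre-conditions of binsearch that can be checked are violated.
void checkSearchPreconditions( const int array[], int numitems )
{
    if ( numitems < 1 )
        throw std::invalid_argument( "numitems must be >= 1" );
    for ( int i = 1; i < numitems; ++i )
        if ( array[ i - 1 ] > array[ i ] )
            throw std::invalid_argument( "array is not ordered" );
}

// A test for one invalid equivalence class: returns true if the violation
// was reported by an exception, false if it went undetected.
bool violationDetected( const int array[], int numitems )
{
    try
    {
        checkSearchPreconditions( array, numitems );
    }
    catch ( const std::invalid_argument& )
    {
        return true;    // a real handler would take recovery action here
    }
    return false;
}
```

Each invalid equivalence class (empty array, unordered array) then becomes one call to violationDetected.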

Black box testing on classes of output

It is necessary to test the outputs from the function in the same way as for inputs.

The same principles are applied as for input by specifying valid and invalid equivalence classes for each output. Inputs are then devised that will produce these defined outputs:-

• location
  ◦ valid: 0..numitems
  ◦ invalid: < 0
  ◦ invalid: > numitems
• valid return values (there are no invalid return values)
  ◦ non-zero (true)
  ◦ zero (false)

= 2 combinations of valid, and two combinations of invalid equivalence classes.




10.2 Boundary Value Analysis

This complements equivalence partitioning and, in practice, is used at the same time

as equivalence partitioning to determine the test data required for testing a

component.

Boundary values are those

• directly on
• just below
• just above

the boundaries of the equivalence classes.

It is an observed fact that a greater number of errors occur at the boundaries of the

input domain than in the centre.

Examples

• Range of values, e.g. 18..65
  ◦ Test 17, 18, 65 and 66
• Discrete set of values, e.g. 2, 3, 5, 8, 13
  ◦ Test 1, 2, 13, 14
• Data structure (e.g. array) has 1..100 elements
  ◦ Test 0, 1, 100, 101
• Loop iterations: none, 1, 2, max, max + 1

It is also necessary to identify the boundaries of the output equivalence classes.
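The value of testing at the boundaries can be seen in a small sketch. It assumes a hypothetical function acceptsAge implementing the 18..65 range, plus a deliberately faulty variant with an off-by-one error; both functions are illustrations only and do not come from these notes.

```cpp
// Hypothetical function under test (an assumption for illustration).
bool acceptsAge( int age )       { return age >= 18 && age <= 65; }

// The same function with a typical off-by-one error at the lower boundary.
bool acceptsAgeFaulty( int age ) { return age >  18 && age <= 65; }

struct BoundaryTest { int age; bool expected; };

const BoundaryTest boundaryTests[] =
{
    { 17, false },   // just below the lower boundary
    { 18, true  },   // on the lower boundary
    { 65, true  },   // on the upper boundary
    { 66, false }    // just above the upper boundary
};

// Returns the number of boundary tests failed by the given implementation.
int countBoundaryFailures( bool (*accepts)( int ) )
{
    int failures = 0;
    for ( const BoundaryTest& t : boundaryTests )
        if ( accepts( t.age ) != t.expected )
            ++failures;
    return failures;
}
```

The faulty variant fails only the test for 18. A test using a value from the centre of the range, such as 40, would not have revealed the error.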

Boundary Analysis of Search Procedure

Previously identified valid equivalence classes:-

• Array
  ◦ has 1 item (numitems = 1)
  ◦ has even number of items (e.g. numitems = 6)
  ◦ has odd number of items (e.g. numitems = 7)
• Target
  ◦ is present
  ◦ is not present

This gives 3 x 2 = 6 combinations of valid equivalence classes.

Experience shows that programmers often make errors in an algorithm due to a

misunderstanding of its behaviour at the boundaries of its input domain. In the case

of the binary search algorithm, these errors might occur when the target (if present)

is located in the first element of the array, or in the last element. Obviously it is

necessary also to test the normal case when the target is in neither of these locations.




Thus the following further test cases are added to those above:-

• Target is in the first element of the array
• Target is in the last element of the array
• Target is in neither the first nor the last element

When the equivalence classes already developed are combined with these boundary values, the following 10 test cases arise:-

• numitems = 1, target is present
• numitems = 1, target is not present
• numitems is even, target is in the first element
• numitems is even, target is in the last element
• numitems is even, target is present and in neither the first nor the last element
• numitems is even, target is not present
• numitems is odd, target is in the first element
• numitems is odd, target is in the last element
• numitems is odd, target is present and in neither the first nor the last element
• numitems is odd, target is not present.
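The ten test cases can be written down as a table of inputs and expected outputs and run against binsearch. The following is a sketch only: the array contents and the names SearchTest, searchTests and countSearchFailures are assumptions chosen for this example.

```cpp
// binsearch, following the listing in the notes.
bool binsearch( int array[], int numitems, int target, int& location )
{
    int low = 0, high = numitems - 1, mid;
    do
    {
        mid = (low + high) / 2;
        if ( target > array[ mid ] )
            low = mid + 1;
        else
            high = mid - 1;
    } while ( target != array[ mid ] && low <= high );

    bool found = ( target == array[ mid ] );
    location = found ? mid : low;
    return found;
}

// Test arrays (the values are arbitrary choices for illustration).
int oneItem[]   = { 5 };
int evenItems[] = { 2, 4, 6, 8, 10, 12 };
int oddItems[]  = { 1, 3, 5, 7, 9, 11, 13 };

struct SearchTest
{
    int* array;   int numitems;   int target;   // inputs
    bool found;   int location;                 // expected outputs
};

const SearchTest searchTests[] =
{
    { oneItem,   1,  5, true,  0 },   // 1 item, present
    { oneItem,   1,  3, false, 0 },   // 1 item, not present
    { evenItems, 6,  2, true,  0 },   // even, first element
    { evenItems, 6, 12, true,  5 },   // even, last element
    { evenItems, 6,  6, true,  2 },   // even, neither first nor last
    { evenItems, 6,  7, false, 3 },   // even, not present
    { oddItems,  7,  1, true,  0 },   // odd, first element
    { oddItems,  7, 13, true,  6 },   // odd, last element
    { oddItems,  7,  7, true,  3 },   // odd, neither first nor last
    { oddItems,  7,  8, false, 4 }    // odd, not present
};

// Returns the number of test cases whose outcome differs from the expected one.
int countSearchFailures()
{
    int failures = 0;
    for ( const SearchTest& t : searchTests )
    {
        int location = -1;
        bool found = binsearch( t.array, t.numitems, t.target, location );
        if ( found != t.found || location != t.location )
            ++failures;
    }
    return failures;
}
```

For the "not present" cases the expected location is the insertion point required by the post-condition.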

11. White box testing - Introduction

Test data is derived from the actual source code of the component instead of from its
specification. Ideally tests should exercise all possible sets of paths through the code.

How many different sets of paths exist for this simple piece of code?

[Figure: flow chart of a loop executed twice; on each iteration a branching decision selects one of three statement blocks, A, B or C]

Set of paths       1  2  3  4  5  6  7  8  9
First iteration    A  A  A  B  B  B  C  C  C
Second iteration   A  B  C  B  A  C  C  A  B




The answer is 9, i.e. 3 paths raised to the power of the number of loop iterations (3^2).

And this?

[Figure: flow chart of a loop executed 20 times]

The answer is 95,367,431,640,625 = 5^20 different sets of paths. Evaluating every possible set of paths at 1 test/millisecond would take 3,022 years. So exhaustive testing is not possible.
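The arithmetic can be checked with a short sketch. It takes the second example to be a loop of 20 iterations with 5 alternative paths per iteration, which is what the quoted total implies; the function names and the year length of 365.25 days are assumptions made for this example.

```cpp
// Number of distinct sets of paths through a loop: the number of alternative
// paths per iteration raised to the power of the number of iterations.
unsigned long long pathSets( unsigned paths, unsigned iterations )
{
    unsigned long long total = 1;
    for ( unsigned i = 0; i < iterations; ++i )
        total *= paths;
    return total;
}

// Years needed to run every set of paths at the given number of tests per second.
double yearsToTest( unsigned long long sets, double testsPerSecond )
{
    const double secondsPerYear = 365.25 * 24 * 60 * 60;
    return static_cast<double>( sets ) / testsPerSecond / secondsPerYear;
}
```

pathSets( 3, 2 ) gives the 9 sets of the first example, and yearsToTest( pathSets( 5, 20 ), 1000.0 ) gives a little over 3,022 years.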

In practice, tests should guarantee that

• Each path (not necessarily all sets of paths) has been exercised.
• All logical branches have both values tested (true and false).
• All loops are exercised at their boundaries and within their bounds.
• All internal data structures have been exercised to ensure their validity.

But why do we need to go to all this trouble? Wouldn't we spend our time better simply

ensuring that the function/module/program requirements have been met? In other words

why don't we confine our tests to black box testing?

Because

• Logic errors and incorrect assumptions tend to occur in inverse proportion to the probability that a path will be executed. Normal processing tends to be well understood and scrutinised, but special cases tend to fall down the cracks.
• We often believe that a path is unlikely to be executed when, in fact, it may be executed regularly.
• Typing errors are usually picked up by the compiler. But those that are not detected are just as likely to occur on an obscure logical path as on a mainstream path.

12. White box testing

12.1 Techniques

• Statement coverage
• Condition coverage
  ◦ Branch testing
  ◦ Domain testing
• Loop coverage

12.2 Statement coverage

Every statement should be executed at least once. See Sommerville Ch 22.2.1 on path testing & cyclomatic complexity; also Pressman Ch 18.2..18.4.





12.3 Path testing

A technique for finding the number of unique paths through a program thus

providing the number of test cases.

Uses flow graphs derived from the program code or from the PDL (program description language) for the routine, together with metrics for calculating the cyclomatic complexity.

12.4 Path testing - Flow Graph Constructs

12.5 Cyclomatic Complexity

The cyclomatic complexity is a measure of the logical complexity of the code. A flow
graph is drawn from the flow chart of the component. The C.C. may be calculated in
any one of three ways (see flow graph on next page).

• Number of regions (including the one outside the graph)
• Number of edges - number of nodes + 2
• Number of predicate nodes + 1. (Predicates are simple 2 branch constructs. Each
  diamond in the flow chart opposite is a predicate.)

Each of these three methods produces the same cyclomatic complexity metric (i.e.
the number of independent paths through the code). In this example it is 5.

The number of independent paths also provides the number of different test cases
required to ensure that all statements are exercised.
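The agreement between the three methods can be checked with the counts quoted for the binary search flow graph (14 edges, 11 nodes, 5 regions, 4 predicates). The function names below are assumptions made for this sketch.

```cpp
// Cyclomatic complexity from the edge and node counts of a flow graph.
int complexityFromEdges( int edges, int nodes )
{
    return edges - nodes + 2;
}

// Cyclomatic complexity from the number of predicate (2 branch) nodes.
int complexityFromPredicates( int predicates )
{
    return predicates + 1;
}

// Counts quoted for the binary search flow graph.
const int bsEdges = 14, bsNodes = 11, bsRegions = 5, bsPredicates = 4;
```

Both functions give 5 for the binary search, which matches the number of regions.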

[Figure: flow graph constructs for Sequence, If, While, Repeat and Case]

[Figure: Flow chart for binary search]




12.6 Condition Testing

Conditions are made up of:-

• Arithmetic & character expressions involving arithmetic and character variables and constants.
• Relational expressions - logical expressions involving arithmetic and character expressions and relational operators. They have the value of either TRUE or FALSE.
• Boolean variables - values non-zero (TRUE), zero (FALSE).
• Boolean operators (&&, ||, !) joining one or more logical expressions.
• Parentheses surrounding simple or compound conditions.

Condition testing

Focuses on testing each condition in the component (including each of the simple conditions making up a compound condition).

Condition testing strategies

• Branch testing
• Domain testing

The advantages of condition testing are i) it is easy to generate test cases and ii) it is

likely to reveal other errors in the program.

12.7 Branch testing

Test data is constructed so that the TRUE and FALSE branches of compound
conditions and the TRUE and FALSE values of every simple condition within the
compound conditions are tested.

To find all possible combinations of the TRUE and FALSE branches of all
conditions, it is necessary to construct a truth table.

[Figure: Flow graph for binary search, with regions R1 to R5]

Number of edges = 14
Number of nodes = 11
Number of regions = 5
Number of predicates = 4




Example

if ( A > 1 && B == 0 )
    X /= A;

       A > 1                  B == 0            A > 1 && B == 0
TRUE / FALSE   Value    TRUE / FALSE   Value     TRUE / FALSE
     T           3           T           0            T
     T           3           F           1            F
     F           1           T           0            F
     F           1           F           1            F

For the above 2 conditions there are 4 test cases, i.e. 2^2. For 3 conditions, there are
2^3 = 8 possible combinations etc. This technique is therefore only practicable for small
numbers of conditions.
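The truth table translates directly into test data. In the sketch below the names condition, BranchTest and truthTableRows are assumptions made for illustration; the values of A and B are those of the table.

```cpp
// The compound condition from the example.
bool condition( int a, int b ) { return a > 1 && b == 0; }

struct BranchTest { int a; int b; bool expected; };

// One entry per row of the truth table.
const BranchTest branchTests[] =
{
    { 3, 0, true  },   // A > 1 TRUE,  B == 0 TRUE
    { 3, 1, false },   // A > 1 TRUE,  B == 0 FALSE
    { 1, 0, false },   // A > 1 FALSE, B == 0 TRUE
    { 1, 1, false }    // A > 1 FALSE, B == 0 FALSE
};

// Number of rows a truth table needs for the given number of simple conditions.
unsigned truthTableRows( unsigned conditions ) { return 1u << conditions; }

// Returns the number of rows for which the condition gives the wrong value.
int countBranchFailures()
{
    int failures = 0;
    for ( const BranchTest& t : branchTests )
        if ( condition( t.a, t.b ) != t.expected )
            ++failures;
    return failures;
}
```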

12.8 Domain testing

Domain testing of relational expressions requires that 3 values be considered for

each variable component of a relational expression : less than, equal to and greater

than. For the above example, this gives rise to the following test cases for each

variable A and B.

Variable   Equal to (=)         Greater than (>)     Less than (<)
           Condition   Value    Condition   Value    Condition   Value
A          A == 1      1        A > 1       2        A < 1       0
B          B == 0      0        B > 0       1        B < 0       -1

There are therefore 3 test cases for each of the two variables in the example
compound condition, leading to 3^2 = 9 test cases. Again, the number of test cases
rises rapidly as the number of variables involved in a relational expression
increases.
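The nine domain test cases can be generated mechanically by pairing every value chosen for A with every value chosen for B. This is a sketch only; the names DomainTest, domainTests and countTrueCases are assumptions made for the example.

```cpp
#include <vector>

struct DomainTest { int a; int b; };

// Builds every pairing of the three values chosen for A with the three for B.
std::vector<DomainTest> domainTests()
{
    const int aValues[] = { 0, 1, 2 };     // A < 1, A == 1, A > 1
    const int bValues[] = { -1, 0, 1 };    // B < 0, B == 0, B > 0
    std::vector<DomainTest> tests;
    for ( int a : aValues )
        for ( int b : bValues )
            tests.push_back( DomainTest{ a, b } );
    return tests;
}

// Counts the test cases for which A > 1 && B == 0 is true.
int countTrueCases( const std::vector<DomainTest>& tests )
{
    int count = 0;
    for ( const DomainTest& t : tests )
        if ( t.a > 1 && t.b == 0 )
            ++count;
    return count;
}
```

Only one of the nine cases (A = 2, B = 0) takes the TRUE branch; the other eight exercise the different ways in which the condition can be FALSE.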

12.9 Loop coverage

The vast majority of algorithms in software employ loops. Loop testing focuses entirely on the validity of loops, which are classified as follows:-

• Simple loops
• Nested loops
• Concatenated loops

[Figure: flow charts of simple loops, nested loops and concatenated loops]




Simple loops

The following tests should be applied to simple loops, where n is the maximum

number of allowable iterations of the loop:-

• Skip (loop is not entered)
• One pass
• 2 passes
• m passes (m < n)
• n - 1, n, n + 1 passes
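These pass counts can be tabulated for a particular loop. The sketch below assumes a hypothetical loop, runLoop, limited to n = 10 passes; the function, the limit and the choice m = 5 are illustrations and not part of these notes.

```cpp
const int maxPasses = 10;   // n: assumed maximum number of iterations

// Hypothetical loop under test: performs the requested number of passes but
// never more than maxPasses, and returns the number actually performed.
int runLoop( int requested )
{
    int passes = 0;
    while ( passes < requested && passes < maxPasses )
        ++passes;
    return passes;
}

struct LoopTest { int requested; int expected; };

const LoopTest loopTests[] =
{
    {  0,  0 },   // skip (loop is not entered)
    {  1,  1 },   // one pass
    {  2,  2 },   // 2 passes
    {  5,  5 },   // m passes (m < n)
    {  9,  9 },   // n - 1 passes
    { 10, 10 },   // n passes
    { 11, 10 }    // n + 1 passes requested: the loop must stop at n
};

// Returns the number of loop tests whose pass count differs from the expected one.
int countLoopFailures()
{
    int failures = 0;
    for ( const LoopTest& t : loopTests )
        if ( runLoop( t.requested ) != t.expected )
            ++failures;
    return failures;
}
```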

Nested loops

The number of times that statements within the inner loop are executed is the

product of the number of iterations of all nested loops within which it appears. Thus

a triply nested loop, where each loop iterates 10 times, will cause statements in the

inner loop to be executed 1,000 times. The number of test cases grows

geometrically and full testing may be impracticable. The suggested solution is:-

a) Start with the innermost loop, setting all outer loop control variables to their
minimum.

b) Test the inner loop as Simple above.

c) Work outwards to next innermost etc. keeping outer loop control variables at

their minimums, and the inner at typical values.

d) Continue until all nested loops have been tested.

Concatenated loops

Where the concatenated loops are independent of each other, treat each as a simple loop.

Where the second loop has the same control variable as the first and starts with its

value unchanged, treat the two loops as nested.

13. Automated Testing

Testing often accounts for as much as 40% of the total time spent on software

development. Automated testing tools are therefore an important ingredient in the software

developer's armoury. The following categories have been identified:-

• Static Analysers
  ◦ Carry out a static analysis of the program's structure and format.
• Code auditors
  ◦ Special purpose filters that check the quality of software to ensure it meets minimum coding standards.
• Assertion processors




  ◦ The programmer writes assertions about the state of the program. The assertion
    processor tests whether they are true or false. C incorporates a simple form of
    assertion testing:-

#include <assert.h>

int main ( void )
{
    int i = 0;
    for( ; i <= 10; i++ );   // empty loop body: the loop exits when i == 11
    assert( i == 10 );       // fails, because i is 11 at this point
    return 0;
}

/* Assertion failed: i == 10, file ASSERT.CPP, line 7
   Abnormal program termination */

C++ provides exception handling which gives greater flexibility and permits an

exception handler to attempt recovery from an error.

• Test file & test data generators
• Test verifiers - measure and report on internal test coverage
• Test harnesses - allow the program to be installed in a test environment, and fed input data. The behaviour of subordinate modules is simulated by stubs.
• Output comparators - compare output from the current version of the program with that from an earlier version to determine any differences

This is an area of growing importance and descendants of the first generation testing

tools are expected to cause radical changes in the way software is tested.



Data Structure Metrics

99


1. Representing Abstract Structure

Assume we wish to store a linear list of names in random access memory. There are

several ways this could be done.

Scheme 1

Names are stored in successive memory locations (each name is assumed

to occupy only 8 bytes).

Given the start address of the list (1000), we can find the ith name by

going to address Start + (i - 1) * 8.

We can find the address of the next name by adding 8 to the address of the current element.

Thus, Scheme 1 implements the logical structure of the data by locating its elements in

physically adjacent memory locations.

But if we wish to retrieve a name (in order to access some other data associated with it),

then we would have to scan the list from the start, looking for the name to be retrieved.

Scheme 2

Each name is positioned in memory according to the value of its first

letter. The address for a particular name is found by

1000 + 8 * (int(firstletter) - int('A'))

In this case there is no way of finding the logical successor of a record.

We are prevented from operating on the data using its logical structure.

But if we wished to retrieve a particular name, we could do so very

quickly by calculating the address directly from the name.

Scheme 3

Each element contains both a name and an address pointing to the

element's logical successor. Given the address of any element, we

can find its successor by simply going to the address contained in

that element.

Scheme 3 implements the logical order by linking the elements

together in the proper sequence which is not the same as the

physical sequence. Address 992 is used to hold the address of the

first name in the list. Milton has a blank successor address field

indicating that this is the last name in the list.

As with Scheme 1 we cannot find a given name other than by starting at the beginning of

the list and comparing each successive name with the target. These three schemes illustrate

the three fundamental methods of implementing abstract list data types - by an array, a

hash table and a linked list.

Address   Name
1000      Milton
1008      Dickens
1016      Eliot
1024      Arnold
1032      Conrad

Scheme 1

Address   Name
1000      Arnold
1008      -
1016      Conrad
1024      Dickens
1032      Eliot
..        ..
1096      Milton

Scheme 2

Address   Name      Successor Address
992                 1024
1000      Milton    0
1008      Dickens   1016
1016      Eliot     1000
1024      Arnold    1032
1032      Conrad    1008

Scheme 3
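The address calculations of the three schemes can be sketched as follows. Memory is simulated here by a std::map from address to element, which is an assumption made purely for illustration, as are all the function names.

```cpp
#include <map>
#include <string>
#include <vector>

const int elementSize = 8;   // each name is assumed to occupy 8 bytes

// Scheme 1: address of the ith name (i counted from 1) in adjacent locations.
int arrayAddress( int start, int i )
{
    return start + ( i - 1 ) * elementSize;
}

// Scheme 2: address calculated directly from the first letter of the name.
int hashAddress( char firstLetter )
{
    return 1000 + elementSize * ( int( firstLetter ) - int( 'A' ) );
}

// Scheme 3: each element holds a name and the address of its logical successor.
struct Element { std::string name; int successor; };

// Simulated memory holding the contents of the Scheme 3 table.
std::map<int, Element> schemeThreeMemory()
{
    std::map<int, Element> memory;
    memory[ 1000 ] = Element{ "Milton",  0    };
    memory[ 1008 ] = Element{ "Dickens", 1016 };
    memory[ 1016 ] = Element{ "Eliot",   1000 };
    memory[ 1024 ] = Element{ "Arnold",  1032 };
    memory[ 1032 ] = Element{ "Conrad",  1008 };
    return memory;
}

// Follows the successor addresses from the first element until address 0.
std::vector<std::string> listOrder( const std::map<int, Element>& memory, int first )
{
    std::vector<std::string> names;
    for ( int address = first; address != 0; address = memory.at( address ).successor )
        names.push_back( memory.at( address ).name );
    return names;
}
```

Starting from 1024 (the address held at location 992), listOrder visits the names in their logical order even though they are stored in a different physical order.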





2. Implementing Data Structures

The implementor of a data structure must design the black box so that memory space is not

wasted and the operations are performed efficiently.

If the user knows in advance how many data elements the structure is required to handle, then certain efficiencies can be gained. If not, then the structure must be made flexible in

order to accommodate an unknown number of items, or considerable space may be wasted.

If the length of the elements is fixed and known, again efficiencies can be obtained

compared with the case where elements are of unknown length.

The implementor can make certain operations efficient at the expense of others, and he

will need to know for which operations maximum efficiency is important to the user.

There is almost always a trade-off available between space and time. Greater speed can be

obtained at the expense of more memory space and, conversely, a saving in space will

usually incur a time penalty. Which is the more important to the user, space or time?

All of these considerations must be taken into account in the implementation of a data

structure.

3. Metrics

One way of implementing a list is to use an array. It is true that arrays are relatively

unsuitable for this purpose because of their inflexibility and because of the need to shuffle

array elements down to fill the hole left by a deletion, but they have the advantage of

requiring no overhead in terms of space. Linked lists, of course, carry an overhead in the

form of the links (pointers) that connect the nodes.

Envisage then a list implemented as an array as in Scheme 1 above and assume that we

wish to find the name Eliot in the list.

3.1 Number of Comparisons

We simply start at the first name in the list (Milton) and search through the list,

comparing each name encountered with Eliot. One measure of the time required to

find this name is the number of comparisons made of each name with the target.

Unless the list is very short, the time required to initialise and finalise the search

will be relatively unimportant when set against the number of comparisons. It is

generally true that the number of comparisons made when searching a data structure

will be one of the major factors in determining the speed of execution.

3.2 Number of Data Moves

This is the second most important operation determining the efficiency of operations on a data structure. Suppose we wished to remove the name Eliot from

the list and did not wish to leave a gap. We could move each name below Eliot up

one location, thus reducing the length of the list by one element. No comparisons

are required for this operation, but many data moves. The speed of deletion will

therefore be governed by the number of moves.
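Both metrics can be counted explicitly. The sketch below uses a std::vector of names in place of the array of Scheme 1; the function names findName and removeAt are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sequential search. Returns the index of target or -1, and records the
// number of name comparisons made.
int findName( const std::vector<std::string>& list,
              const std::string& target, int& comparisons )
{
    comparisons = 0;
    for ( std::size_t i = 0; i < list.size(); ++i )
    {
        ++comparisons;
        if ( list[ i ] == target )
            return static_cast<int>( i );
    }
    return -1;
}

// Removes the element at index by moving each later name up one location.
// Returns the number of data moves made.
int removeAt( std::vector<std::string>& list, int index )
{
    int moves = 0;
    for ( std::size_t i = static_cast<std::size_t>( index ); i + 1 < list.size(); ++i )
    {
        list[ i ] = list[ i + 1 ];
        ++moves;
    }
    list.pop_back();
    return moves;
}
```

For the list Milton, Dickens, Eliot, Arnold, Conrad, finding Eliot takes 3 comparisons and removing it takes 2 moves (Arnold and Conrad).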





3.3 Algorithm Complexity

The measurement of the complexity of an algorithm is important because of the

effort required to

• implement the algorithm
• understand it
• debug it
• modify it
• maintain it

4. Mathematical Notations

One way of ascertaining the efficiency of algorithms used in operations on data structures

is to write a program which tests the algorithm on a large number of different types and

sizes of data. This approach is useful in trying to understand an algorithm and the factors

which affect its efficiency, but the problem is that:-

a) The data would only be valid for the computer, operating system and language we

have employed and the nature of the data stored in the data structure.

b) We could not possibly examine exhaustively all possible combinations of data
(there are over 358,000 different arrangements of just four distinct letters, ignoring
case).

c) We would finish up with a mass of results which would be difficult to understand

and distil into a general indication of the efficiency of the algorithm under

consideration.

We require a crude indicator of the time complexity of an algorithm that relates the time

taken to the number of elements held in the data structure. We are not particularly

concerned with the absolute amount of time, which, for one algorithm, will depend on the factors mentioned in a) above.

Looking at the search example above, how many comparisons, on average will be required

to find a name in the list? Let n denote the number of names in the list:-

To find the average number of comparisons necessary to locate a name present in the list,

we first find the total required to find each of the names, and then divide by n. Thus, n

comparisons would be needed to find the last name, n - 1 to find the last but one ... through

to just one comparison to find the first. We can calculate the average number of

comparisons for n items without needing to know the value of n:-

Element Number   Number of Comparisons
1                1
2                2
3                3
..               ..
n                n





Total number of comparisons  =  n     + (n-1) + (n-2) + .. + 1
Reverse                         1     + 2     + 3     + .. + n
Add                             (n+1) + (n+1) + (n+1) + .. + (n+1)

Since there are n items in the sequence, the total of the third row is n(n + 1). To find the
average number of comparisons, we need to divide by n and also by 2 since we added the
2 sequences together.

Divide by 2n to find the average for any one name:   n(n+1) / 2n,  i.e. ½(n + 1)

Thus the average number of comparisons required to find a name in the list is about half n

whatever the value of n. Since we have seen that the number of comparisons is a major

determinant of the time required, we can say that the time taken for this search is

proportional to ½(n + 1). Since the constant ½ is not significant in relation to other

possible factors of n, we can say that the order of magnitude of the efficiency of the search

is n, and we write this as O(n). This is sometimes referred to as the Big O notation.
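The result ½(n + 1) can be confirmed by adding up the comparisons directly. The function name below is an assumption made for this sketch.

```cpp
// Average number of comparisons needed by a sequential search to find a name
// that is present in a list of n names.
double averageComparisons( int n )
{
    long long total = 0;
    for ( int i = 1; i <= n; ++i )
        total += i;               // finding the ith name takes i comparisons
    return static_cast<double>( total ) / n;
}
```

For n = 5 the total is 15 and the average is 3, which agrees with ½(5 + 1).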

Only the dominant term is chosen to represent a crude notion of the order of magnitude of the entire expression, e.g.

n(n+1)                           is O(n^2)
15n log n + 0.1n^2 + 5           is O(n^2)
(6 log n + 3n + 7) / (2n - 5)    is O(1)

Why is the second item above classified as O(n^2) when this appears to form only a small
part of the expression? Table 1 shows the value of this function for various values of n.
The last column shows the value of the expression divided by 0.1n^2. Note that from n =
512, the value in this last column starts to settle down to about 1.0 indicating the
overwhelming importance of the 0.1n^2 component.

n           15n          log2 n   0.1n^2            E = 15n log2 n + 0.1n^2 + 5   E / 0.1n^2
8           120          3        6                 371                           58.03
32          480          5        102               2,507                         24.49
128         1,920        7        1,638             15,083                        9.21
512         7,680        9        26,214            95,339                        3.64
4,096       61,440       12       1,677,722         2,415,007                     1.44
65,536      983,040      16       429,496,730       445,225,375                   1.04
1,048,576   15,728,640   20       109,951,162,778   110,265,735,583               1.00
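The last column of the table can be recomputed with a few lines of code. The function names are assumptions made for this sketch, and std::log2 is taken from the standard header cmath.

```cpp
#include <cmath>

// Value of the expression 15n log2 n + 0.1n^2 + 5.
double expressionValue( double n )
{
    return 15.0 * n * std::log2( n ) + 0.1 * n * n + 5.0;
}

// The expression divided by its dominant term 0.1n^2.
double ratioToDominantTerm( double n )
{
    return expressionValue( n ) / ( 0.1 * n * n );
}
```

As n grows, the ratio falls from about 58 at n = 8 towards 1, which is why only the n^2 term is kept in the order of magnitude.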

Table 2 gives some idea of the values of several different functions of n.

Some simple sorting methods (e.g. Exchange or Bubble sort) operate in a time which is
O(n^2) whereas other, more complex algorithms (e.g. Shell sort and Quicksort), operate in a
time which is O(n log2 n). If there were 1024 items to sort, the simple method would take
approx 1,000,000 units of time. Compare this with a time of only approx 10,000 for the
O(n log2 n) sort. However, it is not true to say that the complex sort is 100 times faster than
the simple sort.

Because of its complexity, the more powerful O(n log2 n) sort will carry an overhead which
results in constants which are present in the true value of the function but which are
ignored in arriving at the crude order of magnitude value. For this reason also, the
complex sort may not be as fast as the simple sort for small values of n.
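The two figures quoted for 1024 items follow from the crude orders of magnitude alone. The function names in this sketch are assumptions; no constants or overheads are modelled, so the results are units of time in the crude sense used above and not measured running times.

```cpp
#include <cmath>

// Crude time estimate for an O(n^2) sort.
double quadraticTime( double n )
{
    return n * n;
}

// Crude time estimate for an O(n log2 n) sort.
double nLogNTime( double n )
{
    return n * std::log2( n );
}
```

For n = 1024 these give 1,048,576 and 10,240 respectively, the approximate 1,000,000 and 10,000 quoted above.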

TABLE 1 Value of expression 15n log2 n + 0.1n^2 + 5 for various values of n






Trees

1. Applications

• Trees are hierarchical structures and can be used in any application that models a hierarchical structure, e.g. disk directory and file structure.
• In some forms they can provide rapid searching and lookup.
• They can maintain their data ordered (usually on a unique key that is associated with their data).

2. Implementation

Trees cannot normally be based on a fixed size structure such as an array. They are

normally implemented using dynamically allocated nodes linked by pointers.

3. Variations

• Binary Search trees
• Expression Trees
• Balanced Trees
• N'ary Trees
• B Trees

4. Example Declaration

struct DataItem
{
    int     key;           // key to search on
    anytype value;         // depends on the application
};

struct Node
{
    DataItem data;         // struct as above
    Node* left, *right;    // pointers to left and right child nodes
};

struct BinaryTree
{
    int   count;           // number of nodes
    Node* root;            // single entry point into the tree
};




5. Expression Trees

Assume the expression ( 3 + 4 ) * ( 6 - 4 ) is to be

evaluated. Parsing and evaluating an infix expression

of this sort in a single pass is very difficult because

the string has to be searched back and forth to

recognise and allow for the modifying effect that the

parentheses have on the meaning of the expression.

A tree of nodes representing operators ( +, -, *, / )

and values (or variables) can be built to represent the

semantics of the expression without the parentheses.

The tree can then be traversed to retrieve the

symbols and values in an appropriate order for

evaluation - see Traversal below.

6. Tree Traversal

There are several possible ways in which the tree can be traversed, the most common are

known as inorder, postorder and preorder :-

Inorder <left tree> Node <right tree> (3 + 4) * (6 - 4)

PostOrder <left tree> <right tree> Node 3 4 + 6 4 - *

PreOrder Node <left tree> <right tree> * + 3 4 - 6 4

The post order traversal would produce the nodes in an order suitable for evaluating the

resultant postfix expression using a stack.

The algorithm for binary tree traversal is one of the most elegant in computer science. It is recursive:-

void inorderTraverse( Node* p )

{

if ( p != 0 )

{

inorderTraverse( p->left );

Process( p-> data );

inorderTraverse( p->right );

}

}

Process( p-> data ) is the operation that is to be carried out on each node. Note that this

algorithm effectively maintains its own stack of nodes visited but not yet processed. This

is represented by the series of stack frames that is pushed onto the system stack for each

call to the function. A non-recursive version of this algorithm requires an explicit stack of

nodes to be maintained and is quite inelegant when compared to the above.
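The use of a post order traversal for evaluation can be sketched as follows. The node type ExprNode, the function names and the restriction to single-digit operands are assumptions made to keep the example short; they are not part of the declarations given earlier.

```cpp
#include <stack>
#include <string>

struct ExprNode { char symbol; ExprNode* left; ExprNode* right; };

// Appends the symbols of the tree to out in post order.
void postorder( const ExprNode* p, std::string& out )
{
    if ( p != nullptr )
    {
        postorder( p->left, out );
        postorder( p->right, out );
        out += p->symbol;
    }
}

// Evaluates a postfix string of single-digit operands and the operators + - * /.
int evaluatePostfix( const std::string& postfix )
{
    std::stack<int> operands;
    for ( char c : postfix )
    {
        if ( c >= '0' && c <= '9' )
        {
            operands.push( c - '0' );    // operand: push its value
            continue;
        }
        int right = operands.top(); operands.pop();
        int left  = operands.top(); operands.pop();
        switch ( c )                     // operator: apply it to the top two values
        {
            case '+': operands.push( left + right ); break;
            case '-': operands.push( left - right ); break;
            case '*': operands.push( left * right ); break;
            case '/': operands.push( left / right ); break;
        }
    }
    return operands.top();
}

// The expression tree for ( 3 + 4 ) * ( 6 - 4 ).
ExprNode leaf3  = { '3', nullptr, nullptr };
ExprNode leaf4a = { '4', nullptr, nullptr };
ExprNode leaf6  = { '6', nullptr, nullptr };
ExprNode leaf4b = { '4', nullptr, nullptr };
ExprNode addNode = { '+', &leaf3, &leaf4a };
ExprNode subNode = { '-', &leaf6, &leaf4b };
ExprNode mulNode = { '*', &addNode, &subNode };
```

Traversing from mulNode produces the symbols 3 4 + 6 4 - * without any parentheses, and evaluating that sequence with the stack gives 14.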

[Figure: expression tree for ( 3 + 4 ) * ( 6 - 4 )]

        *
      /   \
     +     -
    / \   / \
   3   4 6   4




7. Parse Trees

Sentence = Subject Verb Object

Subject = Noun | Noun Phrase

Object = Noun | Noun Phrase

Noun = Cat | Mat | Dog

Verb = sat | ate | chased

Parse trees such as the very simple example above may be used in natural language
recognition and language translation software.

8. Binary Search Trees

The (recursive) definition of a binary tree is:-

A binary tree is

• either empty
• or consists of a node with left and right binary trees

Binary search trees are ordered on a unique key field. The first data item to arrive causes a

new node to be allocated which becomes the root node. Access to the tree is always via the

root. For subsequent additions, the tree is traversed, looking for an empty left or right child

node starting at the root. If the key of the data to be added is less than that of the current

node, then the left child of the current node is visited. If the key of the data to be inserted is greater than that of the current node, then the right child is visited. If the two keys are
equal, then the data cannot be added since binary search trees rely on the keys being unique.

Eventually, an empty left link or right link is encountered. A new node is allocated and

linked in to the tree as the left, or right child of the node currently being visited. All

additions therefore take place at the lower levels of the tree - as leaf nodes.

• Searching for 6
  Left, Right, found
• Searching for 11
  Right, Left, Right, not found
• Inserting 13
  Right, Right, Left, not found so Insert as Left child of 14
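The search and insertion just described can be sketched with a simplified node holding only a key. TreeNode, insert, search and buildExampleTree are names assumed for this example, and the nodes are never freed, to keep it short.

```cpp
#include <string>

struct TreeNode { int key; TreeNode* left; TreeNode* right; };

// Inserts key as a new leaf. Returns false if the key is already present.
bool insert( TreeNode*& root, int key )
{
    TreeNode** link = &root;                 // the link that may receive the new node
    while ( *link != nullptr )
    {
        if ( key == (*link)->key )
            return false;                    // keys must be unique
        link = ( key < (*link)->key ) ? &(*link)->left : &(*link)->right;
    }
    *link = new TreeNode{ key, nullptr, nullptr };
    return true;
}

// Searches for key, recording the route taken from the root in path.
bool search( const TreeNode* p, int key, std::string& path )
{
    while ( p != nullptr )
    {
        if ( key == p->key )
        {
            path += "found";
            return true;
        }
        if ( key < p->key ) { path += "Left, ";  p = p->left;  }
        else                { path += "Right, "; p = p->right; }
    }
    path += "not found";
    return false;
}

// Builds the example tree with root 8, children 4 and 12, and leaves 2, 6, 10, 14.
TreeNode* buildExampleTree()
{
    TreeNode* root = nullptr;
    const int keys[] = { 8, 4, 12, 2, 6, 10, 14 };
    for ( int key : keys )
        insert( root, key );
    return root;
}
```

Searching this tree for 6 and for 11 records the routes listed above, and inserting 13 attaches it as the left child of 14.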

The total number of nodes in a perfectly balanced binary search tree is 2^Level - 1. Thus, for
20 levels, the total number of nodes would be 1,048,575.

The efficiency of a perfectly balanced tree is measured by the average number of
comparisons required to find a key that is present in the tree. Since it requires one
comparison to visit the root node, two comparisons to examine the root node and one of its
child nodes etc., the maximum number of comparisons is the number of levels and, since
the number of nodes doubles at each level, the average number of comparisons for a large
perfectly balanced tree is approximately the number of levels - 1. Thus for a perfectly
balanced tree of 1,048,575 nodes, the average number of comparisons is approximately
Number of Levels - 1 = 19.

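[Figure: parse tree in which Sentence branches to Subject, Verb and Object; Subject and Object are each a Noun OR a Noun Phrase]

[Figure: binary search tree]

          8
       /     \
      4       12
     / \     /  \
    2   6  10    14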




This makes binary search trees a suitable structure for fast retrieval of data by reference to
a key and, for this reason, implementations of the C++ Standard Template Library typically
use balanced binary search trees to implement searchable structures such as map and set.

9. Importance of Balance

This tree was generated by inserting the data in

numeric order - 2, 4, 6 .. 16. If, as in this case,

the tree is not balanced, search efficiency

degrades towards a simple sequential search,

i.e. from an average number of comparisons =

Level - 1 to

½(n + 1). There is little difference between the

two in this small example but, for large

numbers of items, the difference in searching

efficiency is extremely large.
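The effect of insertion order can be measured by building the tree and averaging the level of every node, since the level of a node is the number of comparisons needed to find it. All names in this sketch are assumptions, and the nodes are never freed, to keep it short.

```cpp
struct TreeNode { int key; TreeNode* left; TreeNode* right; };

// Inserts key as a new leaf (duplicate keys are ignored).
void insert( TreeNode*& root, int key )
{
    TreeNode** link = &root;
    while ( *link != nullptr )
    {
        if ( key == (*link)->key )
            return;
        link = ( key < (*link)->key ) ? &(*link)->left : &(*link)->right;
    }
    *link = new TreeNode{ key, nullptr, nullptr };
}

// Counts the nodes and adds up their levels (the root is at level 1).
void tally( const TreeNode* p, int level, int& nodes, int& comparisons )
{
    if ( p == nullptr )
        return;
    ++nodes;
    comparisons += level;
    tally( p->left,  level + 1, nodes, comparisons );
    tally( p->right, level + 1, nodes, comparisons );
}

// Average number of comparisons to find a key in the tree built by inserting
// the keys in the order given.
double averageComparisons( const int keys[], int count )
{
    TreeNode* root = nullptr;
    for ( int i = 0; i < count; ++i )
        insert( root, keys[ i ] );
    int nodes = 0, comparisons = 0;
    tally( root, 1, nodes, comparisons );
    return static_cast<double>( comparisons ) / nodes;
}
```

Inserting 2, 4, 6 .. 16 in numeric order gives an average of 4.5, i.e. ½(n + 1) for n = 8. Inserting the same keys in the order 8, 4, 12, 2, 6, 10, 14, 16 gives 2.625.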

AVL Trees (from Adelson-Velskii & Landis)
employ a balancing algorithm on every insertion and deletion which ensures that the tree
maintains an adequate (although not perfect) balance. Another balancing scheme is the
red/black tree, which is typically used in implementations of the Standard Template Library.

10. Other types of tree

An important tree structure is the B Tree (not to be confused with binary tree). They are

used extensively in database software for file indexing, i.e. the storage (possibly in a

different file) of pairs consisting of a key and a record number at which the data associated

with the key may be found in a data file. They differ from binary trees in that each node
contains not one key, but an array of ordered keys. They are always balanced, and new
nodes are created by splitting a full node in two; the tree grows in height only when the
root node itself is split.

Arrays can be searched efficiently by a binary search in which the searched-for key is

compared with the middle element of the array. If it is smaller, then the 'top' half of the

array can be discarded, and a binary search carried out on the lower half. The converse

applies, of course, where the key is larger than the middle element of the array. The

efficiency of a binary search is the same as that of a binary tree. Each comparison is

halving the number of items which remain to be searched.

[Figure: Degenerate Binary Search Tree - the keys 2, 4, 6, 8, 10, 12, 14, 16 form a single chain of right children]




Table 1 Metrics for Binary Trees

Level  Nodes in   2^level    Total nodes    Comparisons    Total        Ave comps
       level                 (2^level - 1)  for level (*)  comps        per node
1      1          2          1              1              1            1.000
2      2          4          3              4              5            1.667
3      4          8          7              12             17           2.429
4      8          16         15             32             49           3.267
5      16         32         31             80             129          4.161
6      32         64         63             192            321          5.095
7      64         128        127            448            769          6.055
8      128        256        255            1,024          1,793        7.031
9      256        512        511            2,304          4,097        8.018
10     512        1,024      1,023          5,120          9,217        9.010
11     1,024      2,048      2,047          11,264         20,481       10.005
12     2,048      4,096      4,095          24,576         45,057       11.003
13     4,096      8,192      8,191          53,248         98,305       12.002
14     8,192      16,384     16,383         114,688        212,993      13.001
15     16,384     32,768     32,767         245,760        458,753      14.000
16     32,768     65,536     65,535         524,288        983,041      15.000
17     65,536     131,072    131,071        1,114,112      2,097,153    16.000
18     131,072    262,144    262,143        2,359,296      4,456,449    17.000
19     262,144    524,288    524,287        4,980,736      9,437,185    18.000
20     524,288    1,048,576  1,048,575      10,485,760     19,922,945   19.000

(*) No. of comparisons for level = level * nodes in level




Hash Tables

1. Applications

! Compilers (see later under perfect hashing functions)

! Basis for other Abstract Data Types, e.g. Set, Dictionary

! Very efficient retrieval

2. Operations

• Insert

• Remove

• Find (Lookup)

3. Efficiency

The measure of efficiency of searching and sorting is given using the big O notation (see
Data Structure Metrics on page 99). This is a very crude measure of the relationship
between time and the number of items being dealt with. The important factor is the rate at
which time increases as the number of items increases. Hash tables are unusual among data
structures in that, provided the hash function spreads the keys evenly and the table is not
allowed to become too full, the time taken does not depend on the number of items stored.
Their efficiency is therefore given as O(1).

4. Problem

The penalty paid for this exceptional measure of efficiency is that hashing destroys the

lexical order of keys, so that they cannot subsequently be retrieved in their lexical order.

5. Hashing

Data is stored in a hash table that is based on the fundamental array structure provided by
the language. The size of the table should be a prime number. Insertion (and searching) is
performed by applying some function to the key which converts it into an integer in the
range 0 .. table_size - 1. The modulus operation is used to achieve wrap-around. In this
example the column headed ASC represents the sum of the ASCII codes of the first 3
characters of the name. This is then taken modulo 11 (the table size) to produce the table
index. The insertion of the first three items is shown in the hash table (the second of the
two tables). The fourth key, BYR, produces the same index as that of WORDSWORTH: a collision.
This is not surprising, since we are trying to map a very large domain of values onto a
table with only 11 locations.

Name         Key   ASC   Table Index
SHELLEY      SHE   224             4
WORDSWORTH   WOR   248             6
KEATS        KEA   209             0
BYRON        BYR   237             6
BLAKE        BLA   207             9
BETJEMAN     BET   219            10
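The calculation behind the ASC and Table Index columns can be sketched as below. The function names key_sum and table_index are invented for this illustration, and the sketch assumes the names are held in an ASCII character set.

```cpp
#include <cstddef>
#include <string>

// Sum of the character codes of the first three characters of name
// (or of the whole name if it is shorter). Corresponds to column ASC.
int key_sum(const std::string& name)
{
    int sum = 0;
    for (std::size_t i = 0; i < name.size() && i < 3; i++)
        sum += static_cast<unsigned char>(name[i]);
    return sum;
}

// Table index in the range 0 .. table_size - 1.
int table_index(const std::string& name, int table_size)
{
    return key_sum(name) % table_size;
}
```

With a table size of 11, table_index gives 6 for both "WORDSWORTH" (248 % 11) and "BYRON" (237 % 11), which is the collision noted in the text.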




6. Collision Resolution

There are two strategies for resolving collisions:

• Open Addressing

A second hashing function is used to give a new table location and a further attempt is
made to enter the key into the table. The simplest function to produce a new location
after a collision is to successively add 1 to the result of hashing the key. But this can
cause clustering, where the relative density of certain areas of the table is higher than
average. This can give rise to a higher than necessary number of collisions. An improved
second hashing function is:

hashvalue = (hashvalue + step) % table size

where step = hashvalue % (table size - 2) + 1

step is computed only once, from the original hash value, before the loop is entered.

Probing continues until an empty slot is found or, after a certain number of tries, the
table is deemed to be full.

• Chaining

The table entry contains a data entry and a pointer to the head of a list of data items
that collided with the first or, more simply, just a pointer to the head of a list.
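The simpler form of chaining, in which each table entry is just the head of a list, might be sketched as follows. This is an illustration only: the class name ChainedTable, its member names, and the use of std::list, std::vector and std::string are assumptions of the sketch rather than anything taken from the notes. The hash is the sum of the character codes of the key, as in the example table.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Entry;  // a key and its data

class ChainedTable
{
public:
    explicit ChainedTable(std::size_t size) : slots(size) {}

    // Add a key/data pair to the list held in the key's slot.
    void add(const std::string& key, const std::string& data)
    {
        slots[index(key)].push_back(Entry(key, data));
    }

    // Return a pointer to the data stored for key, or 0 if it is absent.
    const std::string* find(const std::string& key) const
    {
        const std::list<Entry>& chain = slots[index(key)];
        for (std::list<Entry>::const_iterator it = chain.begin();
             it != chain.end(); ++it)
        {
            if (it->first == key)
                return &it->second;
        }
        return 0;
    }

    // Number of entries that hashed to the given slot.
    std::size_t chain_length(std::size_t slot) const
    {
        return slots[slot].size();
    }

private:
    // Sum of the character codes of the key, modulo the table size.
    std::size_t index(const std::string& key) const
    {
        std::size_t sum = 0;
        for (std::size_t i = 0; i < key.size(); i++)
            sum += static_cast<unsigned char>(key[i]);
        return sum % slots.size();
    }

    std::vector<std::list<Entry> > slots;  // one list per table slot
};
```

In a table of 11 slots, adding the keys SHE, WOR, KEA and BYR leaves a list of two entries in slot 6, since WOR and BYR both hash there. A collision costs a short walk along a list and never a probe into another slot.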

7. Hash Table example

This is a simple skeleton for a hash table that holds a String pair - a key and its associated

string data. Several functions are not shown, e.g. resize, search. The search function

closely matches the add function except that resizing is not needed.

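#include "strng.h"
#include <assert.h>

struct Item                    // component type of the table
{
    String Key, Data;
    bool occupied;
};

const int TABLESIZE = 167;     // 167 is prime
Item tabl[TABLESIZE];          // Hash table is an array of Item
int theSize;                   // current size of the table
int itemcount;                 // number of items stored

void init( void )
{
    for ( int i = 0; i < TABLESIZE; i++ )
        tabl[i].occupied = false;
    theSize = TABLESIZE;
    itemcount = 0;
}

[Figure (section 5): hash table with slots 0 to 10 after the first three insertions. Slot 0 holds KEA / KEATS, slot 4 holds SHE / SHELLEY, slot 6 holds WOR / WORDSWORTH; the other slots are empty.]

[Figure (section 6, Chaining): slot 0 heads a list holding KEA / KEATS, slot 4 a list holding SHE / SHELLEY, and slot 6 a list holding WOR / WORDSWORTH followed by BYR / BYRON.]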

void resize( );                // not shown

void add( const String& key, const String& data )
{
    // for best efficiency, the number of occupied slots should be <=
    // 80% of table size
    if ( itemcount > theSize * 8 / 10 )
        { resize( ); }

    int hash = key.hashvalue();           // key must support a hashvalue function
                                          // (assumed to return a non-negative value)
    int step = hash % (theSize - 2) + 1;  // step size for collision resolution
    hash %= theSize;                      // hash mod table size
    int numprobes = 1;                    // to count the number of probes

    // look for an unoccupied slot
    bool foundslot = ( !tabl[hash].occupied );

    // loop not entered if unoccupied slot found first time
    while ( !foundslot && (numprobes < theSize) )  // second cond is belt & braces
    {
        hash = ( hash + step ) % theSize;
        foundslot = ( !tabl[hash].occupied );
        numprobes++;
    }

    assert( foundslot );           // should always be true
    tabl[hash].Key = key;          // store the key
    tabl[hash].Data = data;        // and the associated data
    tabl[hash].occupied = true;    // slot is now occupied
    itemcount++;                   // increment count of items
}
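Since the search function is not shown, the following self-contained sketch suggests how add and a matching find might share one probe routine. It is not the course implementation: it assumes std::string in place of the String class, a fixed table of 11 slots as in the poets example, no resizing and no removal, and a hash value equal to the sum of the character codes of the key. The names SmallTable, Slot, SLOTS and probe are invented here. Unlike the add function above, this add overwrites the data if the key is already present.

```cpp
#include <cstddef>
#include <string>

const int SLOTS = 11;  // prime table size taken from the poets example

struct Slot
{
    std::string key, data;
    bool occupied;
    Slot() : occupied(false) {}
};

struct SmallTable
{
    Slot slot[SLOTS];

    // Sum of the character codes of the key (the ASC column).
    static int hashvalue(const std::string& key)
    {
        int sum = 0;
        for (std::size_t i = 0; i < key.size(); i++)
            sum += static_cast<unsigned char>(key[i]);
        return sum;
    }

    // Follow the probe sequence for key. Returns the index of the slot
    // holding key, or of the first empty slot met, or -1 if every probe
    // landed on a slot occupied by a different key.
    int probe(const std::string& key) const
    {
        int hash = hashvalue(key);
        int step = hash % (SLOTS - 2) + 1;   // same step rule as add()
        hash %= SLOTS;
        for (int tries = 0; tries < SLOTS; tries++)
        {
            if (!slot[hash].occupied || slot[hash].key == key)
                return hash;
            hash = (hash + step) % SLOTS;
        }
        return -1;
    }

    // Store key and data; returns the slot used, or -1 if the table is full.
    int add(const std::string& key, const std::string& data)
    {
        int i = probe(key);
        if (i >= 0)
        {
            slot[i].key = key;
            slot[i].data = data;
            slot[i].occupied = true;
        }
        return i;
    }

    // Returns a pointer to the data for key, or 0 if the key is absent.
    const std::string* find(const std::string& key) const
    {
        int i = probe(key);
        if (i >= 0 && slot[i].occupied)
            return &slot[i].data;
        return 0;
    }
};
```

Adding the six keys of the example in order places BYR in slot 10 (it collides at 6 and its step is 237 % 9 + 1 = 4) and BET in slot 3 (it collides with BYR at 10 and its step is also 4). Stopping the search at the first empty slot is only valid because nothing is ever removed from this table.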

8. Perfect Hashing Functions

The special properties of hash tables have led to extensive research to exploit their
efficiency. One such area is the speed at which compilers can parse the source code of a
program. If a function can be found that is guaranteed to find a unique location in a fixed
size hash table for all the reserved words of a programming language, then a significant
speed improvement could be gained. It is not easy to find such a function other than
empirically. But it may be worth a considerable amount of effort to find it, bearing in mind
the commercial advantage to be obtained from a fast compiler.

A perfect hashing function for Pascal reserved words that does not result in any collisions
is:

H(key) = L + g(key[1]) + g(key[L])

where L = the length of the reserved word and g = a function associating a letter with an
integer. This gives the fastest retrieval possible. The resulting table locations are:




[2] do          [11] while   [20] record      [29] array
[3] end         [12] const   [21] packed      [30] if
[4] else        [13] div     [22] not         [31] nil
[5] case        [14] and     [23] then        [32] for
[6] downto      [15] set     [24] procedure   [33] begin
[7] goto        [16] or      [25] with        [34] until
[8] to          [17] of      [26] repeat      [35] label
[9] otherwise   [18] mod     [27] var         [36] function
[10] type       [19] file    [28] in          [37] program
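The form of the function H can be sketched in C++ as shown below. The notes do not give the letter values used by g, so this sketch takes them as a parameter and makes no attempt to reproduce the table above; the names perfect_hash and is_collision_free are invented for the illustration. Pascal's key[1] and key[L] become key[0] and key[L - 1] because C++ strings are indexed from zero. The sketch assumes non-empty keys made up of lower-case letters with contiguous codes, as in ASCII.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// g holds one integer for each lower-case letter 'a'..'z' (26 entries).
// H(key) = L + g(first letter) + g(last letter), with L the key length.
int perfect_hash(const std::string& key, const std::vector<int>& g)
{
    int L = static_cast<int>(key.size());
    return L + g[key[0] - 'a'] + g[key[L - 1] - 'a'];
}

// True if no two of the given words receive the same hash value,
// i.e. if g makes the function perfect for this set of words.
bool is_collision_free(const std::vector<std::string>& words,
                       const std::vector<int>& g)
{
    std::set<int> seen;
    for (std::size_t i = 0; i < words.size(); i++)
    {
        if (!seen.insert(perfect_hash(words[i], g)).second)
            return false;   // value already used by an earlier word
    }
    return true;
}
```

As a made-up example, take only the three words do, end and if. With every letter value set to 0 the function reduces to the word length, so do and if collide at 2. Setting the values for 'i' and 'f' to 1 moves if to 4 and removes the collision. Finding values that work for all the reserved words of a language is the empirical search referred to in the text.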




Libraries

1. The ctype library

This is a 'C' library of functions that operate on characters. They include functions to test
whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion.

The functions available from ctype.h are:-

int isalnum(int c);

int isalpha(int c);

int isascii(int c);

int toascii(int c);

int iscntrl(int c);

int isdigit(int c);

int isgraph(int c);

int islower(int c);

int isprint(int c);

int ispunct(int c);

int isspace(int c);

int isupper(int c);

int isxdigit(int c);

int tolower(int c);

int toupper(int c);

The use of int instead of char in the return and argument types is historical. For the is..
functions, the return type can be understood to be boolean. In all cases the argument type
can be read as type char, although strictly the value passed must be representable as an
unsigned char (or be EOF).

Help on each of these functions is provided from the RHIDE menu Help.libc reference.functional categories.ctype.
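A short sketch of the ctype functions in use is given below. The helper names to_upper_copy and count_digits are invented for this illustration. The cast to unsigned char is there because passing a negative char value to these functions is undefined.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns a copy of s with every letter converted to upper case.
std::string to_upper_copy(const std::string& s)
{
    std::string result = s;
    for (std::size_t i = 0; i < result.size(); i++)
    {
        // Convert to unsigned char before calling toupper.
        unsigned char c = static_cast<unsigned char>(result[i]);
        result[i] = static_cast<char>(std::toupper(c));
    }
    return result;
}

// Counts the decimal digits in s.
int count_digits(const std::string& s)
{
    int count = 0;
    for (std::size_t i = 0; i < s.size(); i++)
    {
        // isdigit returns non-zero (not necessarily 1) for a digit.
        if (std::isdigit(static_cast<unsigned char>(s[i])))
            count++;
    }
    return count;
}
```

For the string "Keats 1795", to_upper_copy returns "KEATS 1795" and count_digits returns 4.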




2. The maths library

These are to be found in math.h. To use them you need to

#include <cmath> or

#include <math.h>

The functions and constants to be found are:

double acos(double x);

double asin(double x);

double atan(double x);

double atan2(double y, double x);

double ceil(double x);

double cos(double x);

double cosh(double x);

double exp(double x);

double fabs(double x);

double floor(double x);

double fmod(double x, double y);

double frexp(double x, int *pexp);

double ldexp(double x, int _exp);

double log(double y);

double log10(double x);

double modf(double x, double *pint);

double pow(double x, double y);

double sin(double x);

double sinh(double x);

double sqrt(double x);

double tan(double x);

double tanh(double x);

double acosh(double a);

double asinh(double a);

double atanh(double a);

double hypot(double x, double y);

double log2(double x);

long double modfl(long double x, long double *pint);

double pow10(double x);

double pow2(double x);

#define M_E 2.7182818284590452354

#define M_LOG2E 1.4426950408889634074

#define M_LOG10E 0.43429448190325182765

#define M_LN2 0.69314718055994530942

#define M_LN10 2.30258509299404568402

#define M_PI 3.14159265358979323846

#define M_PI_2 1.57079632679489661923

#define M_PI_4 0.78539816339744830962

#define M_1_PI 0.31830988618379067154

#define M_2_PI 0.63661977236758134308

#define M_2_SQRTPI 1.12837916709551257390

#define M_SQRT2 1.41421356237309504880

#define M_SQRT1_2 0.70710678118654752440

#define PI M_PI

#define PI2 M_PI_2

The usage of any of these functions can be found by running the info program from the
DOS command line. Move the cursor to

* libc.a: (libc.inf). The Standard C Library Reference

press Enter and choose the menu options Functional Categories and math functions.
Press Q to exit the info program.
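A few of the functions listed above are shown in use in the sketch below. The helper names hypotenuse, angle_degrees and fractional_part are invented for this illustration. The value of pi is computed with acos so that the sketch does not rely on the M_PI constant, which is not part of the C++ standard.

```cpp
#include <cmath>

// Length of the hypotenuse of a right-angled triangle with sides a and b.
double hypotenuse(double a, double b)
{
    return std::sqrt(a * a + b * b);
}

// Angle in degrees between the positive x axis and the point (x, y),
// in the range -180 to 180. Note that atan2 takes y first, then x.
double angle_degrees(double x, double y)
{
    const double pi = std::acos(-1.0);
    return std::atan2(y, x) * 180.0 / pi;
}

// Fractional part of x, obtained with modf; the integer part is discarded.
// The result has the same sign as x.
double fractional_part(double x)
{
    double whole;
    return std::modf(x, &whole);
}
```

For example, hypotenuse(3.0, 4.0) is 5, angle_degrees(1.0, 1.0) is 45 to within rounding error, and fractional_part(-2.5) is -0.5.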




3. The standard library

This requires the inclusion of cstdlib or stdlib.h. It is a miscellaneous collection of

functions for such operations as converting strings to numeric types, sorting and searching,

exiting or aborting a program, and executing DOS commands.

void abort(void);

int abs(int _i);

int atexit(void (*_func)(void));

double atof(const char *_s);

int atoi(const char *_s);

long atol(const char *_s);

void * bsearch(const void *_key, const void *_base, size_t _nelem,

size_t _size, int (*_cmp)(const void *_ck, const void *_ce));

div_t div(int _numer, int _denom);

void exit(int _status) __attribute__((noreturn));

char * getenv(const char *_name);

long labs(long _i);

ldiv_t ldiv(long _numer, long _denom);

void qsort(void *_base, size_t _nelem, size_t _size,

int (*_cmp)(const void *_e1, const void *_e2));

int rand(void);

void srand(unsigned _seed);

double strtod(const char *_s, char **_endptr);

long strtol(const char *_s, char **_endptr, int _base);

unsigned long strtoul(const char *_s, char **_endptr, int _base);

int system(const char *_s);

Some functions in the standard library have been omitted from the above list, either because
they are 'C' functions that have a better counterpart in C++ or because they refer to the
wide char type that is not covered on this course.

Help on these functions can be obtained from within RHIDE by selecting Help.libc
reference.alphabetical list or by entering info at a DOS prompt, moving the cursor to

* libc.a: (libc). The Standard C Library Reference

and pressing Enter, then Alphabetical list.
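The sorting, searching and conversion functions might be used as in the sketch below. The helper names compare_ints, sort_ints, contains and to_long are invented for this illustration; only qsort, bsearch and strtol come from the library.

```cpp
#include <cstddef>
#include <cstdlib>

// Comparison function in the form required by qsort and bsearch: negative,
// zero or positive as *a is less than, equal to or greater than *b.
int compare_ints(const void* a, const void* b)
{
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);   // avoids the overflow risk of x - y
}

// Sorts the n integers in values into ascending order.
void sort_ints(int* values, std::size_t n)
{
    std::qsort(values, n, sizeof(int), compare_ints);
}

// Returns true if key occurs in values, which must already be sorted
// in the order defined by compare_ints.
bool contains(const int* values, std::size_t n, int key)
{
    return std::bsearch(&key, values, n, sizeof(int), compare_ints) != 0;
}

// Converts text to a long in base 10 using strtol. Sets ok to false
// if no characters could be converted.
long to_long(const char* text, bool& ok)
{
    char* end = 0;
    long value = std::strtol(text, &end, 10);
    ok = (end != text);   // strtol leaves end at text if nothing was read
    return value;
}
```

bsearch is a library form of the binary search described under Trees, so it relies on the array having been sorted first, here by qsort with the same comparison function.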



