91.102 - Computing II

Lists (more complex than before…)

List Representations

Generalized Lists (and Lists of Lists...)

Strings

C strings

Pascal strings

Dynamic Memory Allocation

Implementation

Allocation Strategies

Garbage Collection

List ADT: a list (L) of items of type T is a sequence of items of type T on which the following operations are defined:

1.) Initialize L to empty;

2.) Determine whether L is empty;

3.) Find the length (size) of L;

4.) Retrieve the ith item of L;

5.) Replace the ith item (X) of L with a new item (Y);

6.) Delete an item from a non-empty list L at an arbitrary position - return the item to the user ;

7.) Insert a new item into L at an arbitrary position.

8.) Add an element to the end.

9.) Determine whether L is full.

10.) Traverse the list while performing some operation on the members.

The last three, given for the Problem Set, may or may not be included in any minimal set of functions for the ADT.

Remember: we already discussed Linked Lists. At that moment we were less concerned about abstract data types than about implementation issues. From the ADT point of view, we looked at a set of functions and three distinct implementations with an interface that would hide (nearly) all the implementation details.

We also had a much simpler ADT in mind:

Create(List *L);

Empty(List *L);

Tail(Cons(&info, L)) = L

Head(Cons(&info, L)) = &info

And

Cons(Head(L), Tail(L)) = L

We started looking more seriously at ADTs in the context of Priority Queues, Stacks and Queues. It is now time to look at more complex ADTs, their uses and implementation issues. You will find that there is no exact agreement as to what the List ADT is: different authors might add more functions to the previous table or delete some from it.

What is important is not the exact set of functions, but the separation of “functional definition” from implementation:

WHAT is done against HOW it is done.

Sequential List Representation:

x1 x2 x3 x4

Advantages: Selection and Replacement, given the index of the item affected, can be performed in constant time - O(1).

Disadvantages: Insertion and Deletion may require movement of large blocks of items (about half the items, on average, thus giving us expensive O(n) operations). Overflow is possible - checking for it would require adding another function to the ADT definition, and would require pre-allocation of enough space to hold the largest possible list.
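
The trade-off above can be sketched in a few lines. This is only an illustration under stated assumptions: the names SeqList, SeqRetrieve, SeqInsert and MAX_SIZE are hypothetical, Items are plain ints, and positions are 0-based.

```c
#include <stddef.h>

#define MAX_SIZE 8            /* hypothetical capacity, chosen for the sketch */

typedef struct {
    int    items[MAX_SIZE];   /* assumption: Items are plain ints */
    size_t count;             /* number of Items currently stored */
} SeqList;

/* Retrieve the Item at 0-based position pos: one index computation, O(1). */
int SeqRetrieve(const SeqList *L, size_t pos, int *out)
{
    if (pos >= L->count) return 0;          /* no such position */
    *out = L->items[pos];
    return 1;
}

/* Insert x at 0-based position pos: shifts (count - pos) Items, O(n). */
int SeqInsert(SeqList *L, size_t pos, int x)
{
    size_t i;
    if (L->count == MAX_SIZE || pos > L->count)
        return 0;                           /* overflow or bad position */
    for (i = L->count; i > pos; i--)        /* open a gap at pos */
        L->items[i] = L->items[i - 1];
    L->items[pos] = x;
    L->count++;
    return 1;
}
```

Note that SeqInsert has to report overflow itself, which is exactly the extra "is L full?" question the sequential representation forces on the ADT.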

One-way Linked Lists:

L -> x1 -> x2 -> x3 -> x4 -> NULL

Advantages: uses only memory needed for Items + links; insertions and deletions do not require movement of Items.

Disadvantages: most operations are O(n) - but only involve Item comparisons, rather than moves. Simple comparisons are much cheaper than moves.

Memory disadvantage: if the Item is small (the size of a link or smaller), this scheme requires twice (or more) as much memory per Item stored as the sequential implementation. This would indicate that there ARE situations where the sequential implementation is preferable...
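
A minimal sketch of the linked side of the comparison, assuming int Items; IntNode, InsertFirst and Length are hypothetical names introduced only for this illustration.

```c
#include <stdlib.h>

typedef struct IntNodeTag {
    int                Item;   /* assumption: Items are plain ints */
    struct IntNodeTag *Link;
} IntNode;

/* Insert x in front of the list: no Items move, O(1). Returns 1 on success. */
int InsertFirst(IntNode **L, int x)
{
    IntNode *N = malloc(sizeof *N);
    if (N == NULL) return 0;   /* out of memory */
    N->Item = x;
    N->Link = *L;
    *L = N;
    return 1;
}

/* Length has to walk every link: O(n). */
size_t Length(const IntNode *L)
{
    size_t n = 0;
    for (; L != NULL; L = L->Link)
        n++;
    return n;
}
```

Each node carries one int plus one pointer, so on a typical machine the link is at least as large as the Item: the memory disadvantage described above.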

An Operation Cost Comparison Table: n Items in the list, i denotes an arbitrary position, Delete assumes you want the deleted Item. L denotes the list, X and Y denote items.

List Operation           Sequential Rep.   Linked Rep.
Length(&L)               O(1)              O(n)
Insert(X, &L, First)     O(n)              O(1)
Delete(&X, &L, Last)     O(1)              O(n)
Replace(Y, &L, i)        O(1)              O(n)
Delete(&X, &L, i)        O(n)              O(n)

Besides questions of time, one must also address questions of space:

Array representations allocate all the space at the beginning, and have the potential of wasting much space if the array is never close to being filled. If the array IS close to being filled, it also runs the risk of overflow…

The linked representations use only the space they need PLUS the amount of space used by the pointer that points to the next node in the list. If the item stored uses little space, the storage for the address may use most of the space allocated - this is wasteful too.

A Memory Cost Comparison Table: n Items in the List. What is the OCCUPANCY rate for the sequential representation beyond which the sequential representation requires LESS memory than the Linked one? Let’s take a look at the space required.

Item Size                  Sequential Rep.   Linked Rep.            L/S
Item = 1 Byte              n Bytes           >= (1 + 4)*n Bytes     5
Item = 4 Bytes (1 word)    4*n Bytes         (4 + 4)*n Bytes        2
Item = 16 Bytes            16*n Bytes        (16 + 4)*n Bytes       5/4
Item = 256 Bytes           256*n Bytes       (256 + 4)*n Bytes      260/256
Item = 1024 Bytes          1024*n Bytes      (1024 + 4)*n Bytes     1028/1024

When are the space requirements the same?

Let q = number of Bytes for an Item, and p = number of Bytes for a pointer (2 in a 16-bit machine, 4 in a 32-bit one, 8 in a 64-bit one, etc…). Let MaxSize be the size of the array used for the sequential representation, and let n = number of Items stored.

Space for Sequential = q*MaxSize.

Space for Linked = (q + p)*n.

Same space used when: (q + p)*n = q*MaxSize, or

n/MaxSize = q/(q + p)

n = q*MaxSize/(q + p)

Conclusion: n/MaxSize is the fraction of the array that must be occupied for the array implementation to be as memory-efficient as the linked list one: note that, for large q (large Items), this ratio is near 1.

Thus, for large q (large size items compared to the size p of a pointer), the array must be nearly full to provide any space advantage - but when operating near capacity, the probability of overflow becomes unacceptable. For small q (small size items compared to the size p of a pointer) the array doesn’t need to contain many items before it has better space utilization than the linked list…
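
The break-even ratio n/MaxSize = q/(q + p) is easy to tabulate. The helper below is a hypothetical one-liner (BreakEvenOccupancy is not part of any library), with q and p in bytes as defined above.

```c
#include <stddef.h>

/* Fraction n/MaxSize at which (q + p)*n equals q*MaxSize. */
double BreakEvenOccupancy(size_t q, size_t p)
{
    return (double)q / (double)(q + p);
}
```

With 4-byte pointers, a 1-byte Item gives 1/5, so the array wins once it is more than 20% full; a 4-byte Item gives 1/2; a 1024-byte Item gives 1024/1028, so the array must be more than about 99.6% full before it saves any space.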

Other List Representations.

Circular Linked Lists.

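L -> x1 -> x2 -> x3 -> x4 -> (back to x1)

One use: keep a history of something that goes back only n time steps. L always points to the oldest value (x1 in the picture above). When you update, you just overwrite that value with the new one and move L on to the next node:

x5 -> x2 -> x3 -> x4 -> (back to x5), with L now pointing to x2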
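
The update step can be sketched as follows, assuming the circle has already been built and holds doubles; HistNode and Record are hypothetical names used only here.

```c
typedef struct HistNodeTag {
    double              Value;   /* assumption: the history stores doubles */
    struct HistNodeTag *Link;    /* next node around the circle */
} HistNode;

/* L points at the oldest value. Overwrite it with the newest value and
   return the next node, which now holds the oldest surviving value. */
HistNode *Record(HistNode *L, double newest)
{
    L->Value = newest;
    return L->Link;
}
```

A caller would write L = Record(L, x5); no node is ever allocated or freed after the circle is set up.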

Search in a Circular List: a KEY is what you use to search with (and for).

typedef struct ItemTag {

KeyType ItemKey;

InfoType ItemInfo;

} ItemType;

typedef struct NodeTag {

ItemType Item;

struct NodeTag *Link;

} ListNode;

This will allow a search for an Item given a Key and an entry point into the list.

// Pre: a circular list and a key

// Post: a pointer to the node containing the key

// or null

ListNode *Search(ListNode *L, KeyType Key)

{ ListNode *N;

N = L; // point to “first” node

if (N != NULL) // we have a list

do { // rummage in it

// equality check could be more complex

if (N->Item.ItemKey == Key) // found it

return(N); // return the pointer

else N = N->Link; // not found - keep going

} while (N != L); // stop once we are back at the starting point

return(NULL); // found nothing, say so

}

// If you need to pass an equality function…

ListNode *Search(ListNode *L, KeyType Key,

bool (* Equal)(KeyType, KeyType))

{ ListNode *N = L; // point to “first” node

if (N != NULL) // we have a list

do { // rummage in it

// equality check could be more complex

if (Equal(N->Item.ItemKey, Key)) // found it

return(N); // return the pointer

else N = N->Link; // not found - keep going

} while (N != L); // stop once we are back at the starting point

return(NULL); // found nothing, say so

}

Two-way (or Doubly) Linked Lists.

LL x1 RL LL x2 RL LL x3 RL LL x4 RL

L

Advantages: Insertions and deletions at arbitrary locations are easier; one can maintain a “current pointer” and do searches both forwards and backwards along the list.

Disadvantages: extra complexity in the data structures; more space used, especially if the Items are small.

int Delete(ItemType *X, ListNode **L, int pos)

{ ListNode *N;

if (*L == NULL) return(0); // no list!!! Fail...

else if (pos == 1) { // deleting FIRST position

*X = (*L)->Item; // copy the item

N = (*L)->RightLink; // save the address of next

if (N != NULL) N->LeftLink = NULL; // new first node (if any) has no left neighbor

free(*L); // free the first node

*L = N; // update the list pointer

return(1); // return success

} else if (pos > 1) { // find the right place

N = *L; // get started

while ((N != NULL) && (pos > 1)) {

N = N->RightLink; pos--;

} // continued on next slide

if (pos != 1) // list ended too soon

return(0); // failed - say so

else if (N != NULL) { // Check there is a node

*X = N->Item; // copy item

N->LeftLink->RightLink = N->RightLink; // reset

if (N->RightLink != NULL) // not last node

N->RightLink->LeftLink = N->LeftLink;// links

free(N); // release space

return(1); // signal success

} else

return(0); // conditions???...

}

else return(0); // conditions???...

}
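
For comparison with Delete, here is a hedged sketch of insertion to the right of a known node in a two-way list. It is self-contained, so it declares its own node type: DNode and InsertAfter are hypothetical names, and ItemType is assumed to be int.

```c
#include <stdlib.h>

typedef int ItemType;                 /* assumption: Items are ints */

typedef struct DNodeTag {
    ItemType         Item;
    struct DNodeTag *LeftLink;
    struct DNodeTag *RightLink;
} DNode;

/* Insert X immediately to the right of node P (P must not be NULL).
   Returns the new node, or NULL if malloc fails. */
DNode *InsertAfter(DNode *P, ItemType X)
{
    DNode *N = malloc(sizeof *N);
    if (N == NULL) return NULL;
    N->Item = X;
    N->LeftLink = P;
    N->RightLink = P->RightLink;
    if (P->RightLink != NULL)         /* P was not the last node */
        P->RightLink->LeftLink = N;
    P->RightLink = N;
    return N;
}
```

Because every node knows both neighbors, no traversal is needed once P is in hand: four link updates at most.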

Linked Lists with Header Nodes.

Problem: What if two different data structures point to the same list and you change the first element of the list from one of them, forgetting the other??? DANGLING POINTERS!!!!

Solution: provide every Linked List with a header node, so when you change the List, the header node will still be there. Everybody who uses the list MUST access it through the header node.

L -> [Header Node] -> x1 -> x2 -> x3 -> x4
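
A small sketch of the idea, assuming int Items. Here the header is a separate struct rather than a full list node, which is one possible design among several; Header, HNode, PushFirst and DeleteFirst are hypothetical names.

```c
#include <stdlib.h>

typedef struct HNodeTag {
    int              Item;    /* assumption: Items are ints */
    struct HNodeTag *Link;
} HNode;

typedef struct {
    HNode *First;             /* the only pointer to the first real node */
} Header;

/* Add X in front of the list owned by header H. Returns 1 on success. */
int PushFirst(Header *H, int X)
{
    HNode *N = malloc(sizeof *N);
    if (N == NULL) return 0;
    N->Item = X;
    N->Link = H->First;
    H->First = N;
    return 1;
}

/* Remove the first Item. Everyone holding H sees the change. */
int DeleteFirst(Header *H, int *X)
{
    HNode *N = H->First;
    if (N == NULL) return 0;  /* empty list */
    *X = N->Item;
    H->First = N->Link;
    free(N);
    return 1;
}
```

Two data structures that both store a pointer to the same Header cannot be left dangling by DeleteFirst, because neither of them ever holds the address of the first real node.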

Generalized Lists. What if our lists need to contain other lists as Items?

This gets more complicated: how can the Item field stand for both an “Item” and a List? Aren’t the two incompatible?

Many languages provide a “linguistic escape clause”: the “union” or, at least, the “variant record”. What is it?

In C, we can use the syntax:

union SubNodeTag{

ItemType Item;

struct GenListTag *SubList;

} SubNode;

If G is a pointer to a List Node, we can use

G->SubNode.Item or

G->SubNode.SubList

And the system will know what to do…

Will we? Which one do we invoke when? We need to know if the current node contains a “real item” or just a pointer to another list structure, because our making the wrong decision could be rather lethal (“segmentation fault error”… or worse)

Since we know what we are INSERTING, we can always leave a “type marker” in the node at insertion time. We can use the marker to choose the right access syntax when we want to access the field.

Here is the full “solution”:

typedef struct GenListTag{

struct GenListTag *Link;

bool Atom;

union SubNodeTag {

ItemType Item;

struct GenListTag *SubList;

} SubNode;

} GenListNode;
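
To see the type marker at work, here is a hedged sketch that builds nodes and counts atoms. It repeats the declaration so that it compiles on its own; ItemType is assumed to be int, and MakeAtom, MakeSubList and CountAtoms are hypothetical helpers (malloc failure is not handled, to keep the sketch short).

```c
#include <stdbool.h>
#include <stdlib.h>

typedef int ItemType;                     /* assumption: Items are ints */

typedef struct GenListTag {
    struct GenListTag *Link;
    bool Atom;                            /* the "type marker" */
    union SubNodeTag {
        ItemType Item;
        struct GenListTag *SubList;
    } SubNode;
} GenListNode;

/* New atomic node holding X, followed by Next. */
GenListNode *MakeAtom(ItemType X, GenListNode *Next)
{
    GenListNode *G = malloc(sizeof *G);   /* sketch: no NULL check */
    G->Link = Next;
    G->Atom = true;                       /* marker set at insertion time */
    G->SubNode.Item = X;
    return G;
}

/* New node whose content is the list Sub, followed by Next. */
GenListNode *MakeSubList(GenListNode *Sub, GenListNode *Next)
{
    GenListNode *G = malloc(sizeof *G);   /* sketch: no NULL check */
    G->Link = Next;
    G->Atom = false;
    G->SubNode.SubList = Sub;
    return G;
}

/* Count atoms at every nesting level; the marker picks the union member. */
int CountAtoms(const GenListNode *L)
{
    int n = 0;
    for (; L != NULL; L = L->Link)
        n += L->Atom ? 1 : CountAtoms(L->SubNode.SubList);
    return n;
}
```

For example, MakeAtom(1, MakeSubList(MakeAtom(2, MakeAtom(3, NULL)), MakeAtom(4, NULL))) builds the list (1, (2, 3), 4), which contains four atoms.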

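[Figure: a generalized list L drawn as nodes with three fields: Atom (T or F), SubList or Item, and Link. Nodes marked T hold the atoms 1 through 7; nodes marked F point to sublists.]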

// Pre: a Generalized (non-circular) List
// Post: List is unchanged
// Side-effect: contents of list are printed.
void PrintList(GenListNode *L)
{
    GenListNode *G;

    printf("("); // open the first parenthesis
    G = L; // G points to successive nodes of the List
    while (G != NULL) { // atomic item or sublist in node
        if (G->Atom) { // pointed to by G is printed
            printf("%d", G->SubNode.Item);
        } else { // sublists are printed recursively
            PrintList(G->SubNode.SubList);
        }
        if (G->Link != NULL) // look ahead
            printf(" , "); // commas follow each item
        G = G->Link; // except the last item
    }
    printf(")"); // the closing parenthesis
}

Generalized Lists and Structure Sharing.

There are times when one would like to have just one copy of some item, regardless of how many lists this item belongs to. The “standard” way (the (Item, Link)-Node) does not lend itself to any easy solution, since the Item is an integral part of the Node.

Generalized Lists provide a way to solve the problem:

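[Figure: two lists, L and L1, each made of four F (sublist) nodes. The sublist nodes of both lists point to the same four shared atom nodes T a, T b, T c, T d.]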

Printing the first list will give:

((a), (b), (c), (d))

Printing the second will give:

((b), (a), (d), (c))

Why? Recall that each Item appears in a list with just one element…

If we update c to e, and reprint, BOTH printouts will reflect the change.
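
The sharing effect can be reproduced with a stripped-down node type. This is a sketch only: ShareNode and SharedItem are hypothetical names, and Items are single characters.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ShareNodeTag {
    struct ShareNodeTag *Link;
    bool Atom;
    union {
        char Item;                        /* assumption: Items are chars */
        struct ShareNodeTag *SubList;
    } SubNode;
} ShareNode;

/* Item reached through a top-level cell whose sublist holds one atom. */
char SharedItem(const ShareNode *Cell)
{
    return Cell->SubNode.SubList->SubNode.Item;
}
```

If one cell in L and one cell in L1 both have SubList pointing at the same atom node, then a single assignment to that atom's Item changes what SharedItem returns for both cells: there is only one copy to update.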

What would happen if you were to try to print out:

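[Figure: a list L of four F (sublist) nodes that refer to the atom nodes T a, T b, T c, T d]

Compared to:

[Figure: a list L made up directly of the atom nodes T a, T b, T c, T d]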

Some applications of Generalized Lists.

Symbolic Computation: Maple V anyone?

Representations of univariate and multivariate polynomials.

Artificial Intelligence: LISP anyone? Natural Language Understanding.

Next Step: Generic Lists - these would allow objects of all different types to be put and manipulated in lists. Lists of integers, strings, structs, floats, etc., all supported with the same syntax in the same program.

An Application of Lists: Strings.

These are sequences (possibly empty) of characters. They correspond fairly naturally to “words” in whatever language - human or computer - we use.

Many computer languages provide an implementation of the string ADT, or a built-in library module for it. The functions provided vary a little from implementation to implementation, but all have a common core.

We look at a simplified version of the C package <string.h>: only some of the functions will be described - more are available (for a total of 25 in string.h) and should be studied by anyone planning to manipulate strings.

Assume S and T are strings - ex.: char *S = "ten"

strlen(S) : returns the number of characters in S;

strstr(S, T) : returns a pointer to the first occurrence of string T in string S - or NULL if there is no such occurrence.

strcat(S, T) : concatenate (append) a copy of T to the end of S and return a pointer to the new S.

strcpy(S, T) : make a copy of T and store it in S - i.e., starting at the position pointed to by S.

strspn(S, T) : return the length of the prefix of S consisting of characters in the string T.

strpbrk(S, T) : returns a pointer to the first occurrence in S of any character in T. NULL if there aren’t any.
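
The two less familiar functions are easier to appreciate in use. The wrappers below are hypothetical (LeadingDigits and FirstVowel are not library functions); they call only the standard strspn and strpbrk.

```c
#include <string.h>

/* Length of the leading run of decimal digits in S. */
size_t LeadingDigits(const char *S)
{
    return strspn(S, "0123456789");
}

/* Pointer to the first vowel in S, or NULL if S has none. */
const char *FirstVowel(const char *S)
{
    return strpbrk(S, "aeiouAEIOU");
}
```

A simple tokenizer is the typical customer: strspn skips over a run of characters from a known set, and strpbrk finds where the next delimiter starts.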

As you might have noticed, some of the functions are “obvious” and some are rather obscure, and may not appear very useful if you were asked to write a simple text-processing program…

What about implementation?

We have to decide HOW to represent strings.

One part is easy: a string is just a sequence of characters, and since each character takes up one Byte (in ASCII), it makes sense - from the point of view of space utilization and speed of random access - to implement these sequences as arrays of characters. The “name” of the array is just a pointer to the first element (in C - not in every language, though).

The second part is a little harder, and is a consequence of the first part: usually, the array will NOT be full. How do we know which part of the array contains the string, and which part is unused space?

There are two "canonical" solutions:

A) Use the first byte to hold a number - the number of characters in the string. This limits the maximum length of a string to 255 characters, but we can find its length in O(1) time.

B) Use a special terminating value (the value 0, written '\0' in C - not the character '0'), something which CANNOT appear as a character of the string. This puts no limits on the size of the string but requires O(strlen(S)) time for finding its length.

Pascal uses the first, C the second.
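
The two conventions can be put side by side in C. This is a sketch under the assumption that a Pascal-style string is stored as an unsigned char array whose byte 0 is the length; PascalLength and CLength are hypothetical names.

```c
#include <stddef.h>

/* Pascal-style: byte 0 holds the length, so the answer is one read, O(1). */
size_t PascalLength(const unsigned char *P)
{
    return P[0];
}

/* C-style: scan until the terminating 0 value, O(length). */
size_t CLength(const char *S)
{
    size_t n = 0;
    while (S[n] != '\0')
        n++;
    return n;
}
```

The string "ten" takes four bytes either way: { 3, 't', 'e', 'n' } with a length byte, or { 't', 'e', 'n', 0 } with a terminator.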

C is more efficient in its use of space - important since C was designed as a systems programming language for very small memory machines. Pascal was designed as a teaching language where safety and simplicity were more important: wasting some space was not crucial - and some mechanisms were added to allocate shorter arrays for “short” strings if space became a problem.

Implementation Issues: Finding the First Occurrence of a given Substring in a given String.

strstr(T, S) : returns a pointer to the first occurrence of string S in string T - or NULL if there is no such occurrence.

T:  A p a r t m e n t

S:  m e n                 No Match: 'm' != 'A'
      m e n               No Match: 'm' != 'p'
        m e n             No Match: 'm' != 'a'
          m e n           No Match: 'm' != 'r'
            m e n         No Match: 'm' != 't'
              m e n       Match: 'm' = 'm', 'e' = 'e', 'n' = 'n'

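From Mr. Dale Moore, Associate Director of Computing Facilities, School of Computer Science, Carnegie Mellon University

#include <stddef.h> // for NULL

char *strstr(const char *T, const char *S)
{
    int j = 0; // characters of S matched so far at the current start in T
    for (;;) {
        if (S[j] == '\0') return (char *)T; // all of S matched
        if (T[j] == '\0') return NULL; // ran off the end of T: no match
        if (S[j] == T[j]) j++; // keep matching
        else { j = 0; T++; } // mismatch: restart S one position further along T
    }
}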

Type Qualifiers. ANSI C contains two type qualifiers, const and volatile. (Harbison & Steele, p. 72)

An l-value expression of a const-qualified type cannot be used to modify an object - or: such an l-value cannot be used as the left operand of an assignment expression or the operand of an increment or decrement operator.

const int ic = 37;

ic = 5; // illegal

ic++; // illegal

---------------------------------

int * const const_pointer; // the POINTER is constant

const int *pointer_to_const; // the DATUM is constant

What is the time cost of this algorithm? (This is not the quickest implementation of it, but it isn’t much worse than any other.)

Every time the string S is checked, one starts from its beginning and marches down both IT and the segment of T it is trying to match.

Consider:

S = aaaaaaab;

T = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa;

You must check ALL eight characters of S before you can decide that there is NO match. In the worst case, if m is the length of S and n is the length of T we have to carry out almost n*m comparisons: the algorithm is O(m*n)…

This may not seem bad, except if you are trying to find the location of a good size paragraph (m ≈ 500) in a thousand page novel….(n ≈ 2,000,000)
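
To make "almost n*m" concrete, the sketch below instruments the simple search loop with a counter. NaiveComparisons is a hypothetical helper: it follows the same restart-on-mismatch scheme, but returns the number of character comparisons instead of a pointer.

```c
#include <stddef.h>

/* Count the character comparisons the simple search makes
   while looking for S in T (whether or not S is found). */
unsigned long NaiveComparisons(const char *T, const char *S)
{
    unsigned long count = 0;
    size_t j = 0;
    for (;;) {
        if (S[j] == '\0') return count;   /* all of S matched */
        if (T[j] == '\0') return count;   /* ran off the end of T */
        count++;                          /* one comparison of S[j] with T[j] */
        if (S[j] == T[j]) j++;
        else { j = 0; T++; }              /* restart one position further on */
    }
}
```

For the example above (m = 8, n = 32) the count is 207: eight comparisons at each of the first 25 starting positions, plus seven more before T runs out. That is close to the bound n*m = 256.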

Can we do better?

The answer is YES, but (you should expect the BUT after every YES, by now…)...

The best we could conceivably expect is O(m + n) - after all, you should check all the characters before giving up. This is in fact obtained by the Knuth-Morris-Pratt Algorithm - much harder to code.

Another very good algorithm - with a not so good worst case - is the Boyer-Moore one. Also much harder to code.

Both of these algorithms have other overhead, so that if all you are trying to do is match a substring within a string the size of an 80-column input line, don’t bother...
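
For the curious, here is one compact way to write the Knuth-Morris-Pratt search. Treat it as a sketch rather than a tuned implementation: KmpSearch is a hypothetical name, the failure table is allocated with malloc on every call (part of the overhead mentioned above), and an allocation failure is simply reported as "not found".

```c
#include <stdlib.h>
#include <string.h>

/* Pointer to the first occurrence of S in T, or NULL if there is none. */
const char *KmpSearch(const char *T, const char *S)
{
    size_t m = strlen(S), i, k;
    size_t *fail;
    const char *result = NULL;

    if (m == 0) return T;                 /* empty pattern matches at once */
    fail = malloc(m * sizeof *fail);
    if (fail == NULL) return NULL;        /* sketch: treated as "not found" */

    /* fail[i] = length of the longest proper prefix of S[0..i]
       that is also a suffix of S[0..i] */
    fail[0] = 0;
    for (i = 1, k = 0; i < m; i++) {
        while (k > 0 && S[i] != S[k]) k = fail[k - 1];
        if (S[i] == S[k]) k++;
        fail[i] = k;
    }

    /* Scan T once; on a mismatch fall back in S, never back up in T. */
    for (i = 0, k = 0; T[i] != '\0'; i++) {
        while (k > 0 && T[i] != S[k]) k = fail[k - 1];
        if (T[i] == S[k]) k++;
        if (k == m) { result = T + i + 1 - m; break; }
    }
    free(fail);
    return result;
}
```

The index into T only ever moves forward, which is where the O(m + n) bound comes from: O(m) to build the table and O(n) for the scan.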

strcat(S, T) : the concatenator of two strings.

Concatenate T to the end of S. S must be large enough… If it isn’t, you are out of luck!!!
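
One defensive option, sketched here under the assumption that the caller knows how many bytes S can hold, is to check before copying. SafeConcat is a hypothetical helper, not part of <string.h>.

```c
#include <string.h>

/* Append T to S only if the result, terminator included, fits in
   capacity bytes. Returns 1 on success, 0 (S unchanged) otherwise. */
int SafeConcat(char *S, size_t capacity, const char *T)
{
    size_t ls = strlen(S), lt = strlen(T);
    if (ls + lt + 1 > capacity)
        return 0;                     /* would overflow: refuse */
    memcpy(S + ls, T, lt + 1);        /* copies T's terminator too */
    return 1;
}
```

The standard strcat performs no such check: it cannot, because a char pointer carries no information about the size of the array it points into.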

We will look at several implementations, more and more C "idiomatic". The first one will be - to all intents and purposes - "idiom independent".

This is the "cleanest".

void strcat(char *S, char *T)

{

int i, j;

i = j = 0;

while (S[i] != '\0') i = i + 1; // find end of S

while (T[j] != '\0') { // now copy T

S[i + j] = T[j];

j = j + 1;

}

S[i + j] = '\0'; // terminate it

}

This implementation is from Kernighan & Ritchie, The C Programming Language, 2nd edition. It uses a number of C “idioms”. In particular, it makes use of the fact that every operator (including the assignment one) behaves like a FUNCTION that returns the value which is the "result" of the operation. In this case the result is the value assigned. This code would not be "portable" to languages such as Pascal, Modula-3 or Ada.

void strcat(char *S, char *T)

{ int i, j;

i = j = 0;

while (S[i] != '\0') i++; // find end of S

while ((S[i++] = T[j++]) != '\0'); // now copy T

}

Another version: from the CodeWarrior Pro 4 "string.c" file:

char * strcat(char * dst, const char * src)
{
    const unsigned char * p = (unsigned char *) src - 1;
    unsigned char * q = (unsigned char *) dst - 1;

    while (*++q);

    q--;

    while (*++q = *++p);

    return(dst);
}

This is probably FASTER than the others and makes use of a programmer's understanding of what code a PARTICULAR compiler might generate. It IS legal C code, but little else.

We finish with a version from P. J. Plauger, The Standard C Library: this also makes use of the fact that C allows one to increment pointers, so it is even less portable than the Kernighan & Ritchie example.

char * strcat(char * s1, const char * s2)
{ /* copy char s2[] to the end of s1[] */
    char *s;

    /* find end of s1[] */
    for(s = s1; *s != '\0'; ++s);

    /* copy s2[] to end */
    for(;(*s = *s2) != '\0'; ++s, ++s2);

    return(s1);
}

What makes

while (S[i] != '\0') i++;

less desirable than

for(s = s1; *s != '\0'; ++s); ?

From what point of view?

The first one requires an address recomputation from the beginning of the string (start address + index_of_item*size_of_item: one addition and one multiplication) every time the corresponding character is accessed; the second one updates the address by adding the correct number of bytes to the previous one (just one addition). The added efficiency depends not just on idioms, but on a capability of the compiler that is not shared by other languages...

How about strstr : from the CodeWarrior Pro 4 "string.c" file:

char * strstr(const char * str, const char * pat)
{
    unsigned char * s1 = (unsigned char *) str-1;
    unsigned char * p1 = (unsigned char *) pat-1;
    unsigned long firstc, c1, c2;

    if ((pat == NULL) || (!(firstc = *++p1)))
        return((char *) str);

    while(c1 = *++s1)
        if (c1 == firstc)
        {
            const unsigned char * s2 = s1-1;
            const unsigned char * p2 = p1-1;

            while ((c1 = *++s2) == (c2 = *++p2) && c1);

            if (!c2)
                return((char *) s1);
        }
    return(NULL);
}

Probably faster: can you read it any better???

Text Processing:

A simple text processor could simply consist of a linked list of

typedef struct LineNodeTag {

char Line[80];

struct LineNodeTag *next;

} LineNode;

Searching for a string in this text consists of applying the function strstr to each of the successive lines, until success or failure.

Inserting a new line between two given ones is fairly easy; deleting a line is also easy.
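
Both operations fit in a few lines. The sketch below repeats the node declaration so that it stands alone; FindLine and InsertLineAfter are hypothetical names, and lines longer than 79 characters are silently truncated.

```c
#include <stdlib.h>
#include <string.h>

typedef struct LineNodeTag {
    char Line[80];
    struct LineNodeTag *next;
} LineNode;

/* First line node whose text contains Pat, or NULL if no line does. */
LineNode *FindLine(LineNode *Text, const char *Pat)
{
    for (; Text != NULL; Text = Text->next)
        if (strstr(Text->Line, Pat) != NULL)
            return Text;
    return NULL;
}

/* Insert a new line holding Content right after Prev (Prev not NULL).
   Returns the new node, or NULL if malloc fails. */
LineNode *InsertLineAfter(LineNode *Prev, const char *Content)
{
    LineNode *N = malloc(sizeof *N);
    if (N == NULL) return NULL;
    strncpy(N->Line, Content, sizeof N->Line - 1);
    N->Line[sizeof N->Line - 1] = '\0';   /* strncpy may not terminate */
    N->next = Prev->next;
    Prev->next = N;
    return N;
}
```

Note one limitation of searching line by line: a string that straddles two lines is never found.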

Microsoft Word (or any other large scale word processor) is not quite that easy…

Just about every character (or potential position on the page) can have multiple attributes: for example a character within a word needs to know at least

• font;

• size;

• variants: (boldface, italic, underlined, etc…);

The text is often organized so that automatic tables of contents, bibliographies, footnotes, etc., can be handled.

And the megabytes pile up… along with the missed release deadlines...

An Application of Lists: Dynamic Memory Allocation.

Static Allocation: all global named variables have their space allocated at program start and keep their space until program termination. There is nothing actively to manage.

Local variables, procedure parameters and return values use the “activation stack”, which is managed by the “system” - the compiler can make the appropriate decisions so there is nothing for the user to worry about, but the O.S. does need to worry.

Dynamically allocated structures, via explicit calls to malloc, have to be managed much more actively (by the user ?).

A further problem: in multiprogramming environments each process needs its own memory resources and it is the job of the Operating System to find those resources and manage them in such a way that different processes will not accidentally or maliciously interfere with one another.


Garbage Collection: or how do we deal with malloc’d stuff.

One of the difficulties we must deal with is that a particular item might be pointed to from more than one location: if I change the value of one of the pointers and free the space, the other pointer is left dangling; if I don’t free the space, on the assumption that somebody else is pointing to it, the space may become inaccessible and thus “uncollectable garbage” reducing my memory resources.


Solution 1: for each item malloc’d, keep a “reference count”, so that every time a pointer stops pointing to it the count goes down by 1, and every time a new pointer points to it the count goes up by 1.

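[Figure: a list reached from L whose nodes carry reference counts: a (count 2), b (count 1), c (count 1), d (count 1); part of the list is circular.]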

Problem: Make L point somewhere else, and we still have a count of 1, but the circular list is now inaccessible…

Do we now add a “circularity check”? To be run on a list every time its reference count at some node goes down to 1?



Every assignment of a pointer variable must:

a) check that it actually alters the value of the variable;

b) if it does, it must decrement the count on the “old” space (unless the variable has a value of NULL) and increment the count of the “new” space;

Every call to free must

a) decrement the count on the space pointed to;

b) if the count is now 1, perform a circularity check: follow the links starting from this space. If any other count is > 1 before you get back to the original space, you will not be able to free anything. If the structure is circular, with all counts = 1, free every node; if it is linear, free none.

c) if the count is 0, free the space. If it contains a pointer != NULL, decrement the count for that space and go to b).

d) the pointer variable in the free call must be returned NULLed.
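A minimal sketch of the counted pointer assignment, without the circularity check, may make the bookkeeping (and its blind spot) concrete. Everything here is an assumption made for illustration: the names RcNode, RcNew, RcAssign and RcRelease are hypothetical, and LiveNodes exists only so the effect can be observed.

```c
#include <stdlib.h>

typedef struct RcNodeTag {
    int Count;                   /* pointers currently referring to this node */
    char Item;
    struct RcNodeTag *Link;
} RcNode;

static int LiveNodes = 0;        /* bookkeeping for this example only */

/* Allocate a node with no references yet; NULL on failure. */
RcNode *RcNew(char item)
{
    RcNode *n = malloc(sizeof *n);
    if (n != NULL) {
        n->Count = 0;
        n->Item = item;
        n->Link = NULL;
        ++LiveNodes;
    }
    return n;
}

/* Drop one reference to n. When a count reaches 0 the node is freed,
   and the node it pointed to loses a reference in turn. */
void RcRelease(RcNode *n)
{
    while (n != NULL && --n->Count == 0) {
        RcNode *next = n->Link;
        free(n);
        --LiveNodes;
        n = next;
    }
}

/* Counted version of the assignment *var = target. */
void RcAssign(RcNode **var, RcNode *target)
{
    if (*var == target)
        return;                  /* a) the value does not actually change */
    if (target != NULL)
        ++target->Count;         /* b) count the "new" space ... */
    RcNode *old = *var;
    *var = target;
    RcRelease(old);              /*    ... and uncount the "old" one */
}
```

With a linear list L -> a -> b, the call RcAssign(&L, NULL) frees both nodes. With a circular list c -> d -> c that is also pointed to by a variable, clearing that variable leaves both counts at 1 and nothing is freed, which is exactly the problem described above.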


Solution 2: Leave it to the programmer. malloc and free are to be used by the programmer with NO assistance from the system.

Usually, since the pair of functions malloc and free are rather expensive to use - they have to ask the Operating System to manage the memory blocks in question - the programmer sets up lists of objects of appropriate types (possibly more than one).


When she wants a new object of a certain type, she first checks on the appropriate “free list”. If the list is not empty, a few pointer assignments are all it takes to get a new object; if the list is empty, a call to malloc will allocate more space.

When she wants to dispose of an object, rather than calling free, she can add the object to the appropriate “free list”. Again, just a couple of pointer assignments...

See: http://www.cs.uml.edu/~giam/CS2/PS/psSpecial.Source/Arithmetic.c

for some examples: calls to IndefIntFree…
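A per-type free list can be sketched as follows. This is not the code at the URL above: the Cell type and the names CellNew, CellDispose, FreeCells and MallocCalls are hypothetical, chosen only to show the handful of pointer assignments involved.

```c
#include <stdlib.h>

typedef struct CellTag {
    int Value;
    struct CellTag *Link;
} Cell;

static Cell *FreeCells = NULL;   /* the "free list" for Cell objects */
static int MallocCalls = 0;      /* counts real allocations, for illustration */

/* Get a Cell: reuse one from the free list if possible, else call malloc. */
Cell *CellNew(int value)
{
    Cell *c = FreeCells;
    if (c != NULL) {
        FreeCells = c->Link;     /* pop the head of the free list */
    } else {
        c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        ++MallocCalls;
    }
    c->Value = value;
    c->Link = NULL;
    return c;
}

/* Dispose of a Cell by pushing it on the free list instead of calling free. */
void CellDispose(Cell *c)
{
    if (c != NULL) {
        c->Link = FreeCells;
        FreeCells = c;
    }
}
```

The price of this scheme is that memory placed on a free list is never handed back: it can only be reused for objects of the same type.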


The two methods just mentioned work quite well if the application program does not need sophisticated list management facilities.

Otherwise, we need to introduce some much more powerful methods. Here is a very simplified version of:

Solution 3: Automatic Garbage Collection.

Since memory can be thought of as an array, allocate all of memory as list nodes. Although the user cannot access them as array elements, the system can.

Each list node has one bit (the mark bit), set to either free or reserved.

You need space and there is no more space available: what must be done?


Your call for more space triggers a sequence of events:

1) The system marks all of the list nodes as free. This can be done in one sweep through the array of list nodes.

2) Starting from each and every defined symbol (named variable, named function, named anything - a table of names - a symbol table - must be kept active) the system chases pointers to any list nodes that are accessible (directly or indirectly) from the named entities, and marks them reserved.


3) In a second sweep through the array of list nodes, the system collects all those still marked free into a “free list”. Nobody was pointing to them…

4) The program receives the head item of this free list and goes on.

5) If nothing is collected into the free list, your program dies...


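[Figure: the symbols x, y, w and z and the Free list pointing into the array of cells, shown at each of the following stages.]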

Mark all status bits “free”

Mark status bits “in use” chasing down from all symbols

Sweep up all remaining “free” cells into the FreeList

Resume computation


A number of algorithms have been developed to support this activity. The text gives an implementation of the “mark and sweep” (or “mark and gather”) one.

The implementation shown is indicative of WHAT needs to be done and HOW one can attack the problem. It leaves much to be desired: the usual context when Garbage Collection is activated is when there is no more memory. Invoking (non tail-)recursive functions in the absence of memory can be rather suicidal: you will need a new stack frame for every recursive activation, and you have NO space left…


The actual algorithms used are more complex and run in “bounded space” - they use fairly complicated loops - this IS a case where recursion must be actively avoided (unless you have some clever incremental algorithms and lots of Virtual Memory space).
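One classic way to mark without recursion is pointer reversal (the Schorr-Waite idea): instead of a stack, the path back to the root is stored in the nodes themselves by temporarily reversing the pointer just followed. The sketch below is a simplified illustration under stated assumptions, not the algorithm of any particular system: the PrNode type, its State field (two extra bits of bookkeeping per node) and the function name are hypothetical.

```c
#include <stddef.h>

typedef struct PrNodeTag {
    int Marked;                  /* 1 once the node has been reached */
    int State;                   /* 0: fresh, 1: Item handled, 2: Link handled */
    struct PrNodeTag *Item;
    struct PrNodeTag *Link;
} PrNode;

/* Mark everything reachable from root using a loop and constant extra space. */
void MarkByPointerReversal(PrNode *root)
{
    PrNode *prev = NULL;         /* node we came from (the reversed path) */
    PrNode *cur = root;
    PrNode *child;

    if (cur == NULL || cur->Marked)
        return;
    cur->Marked = 1;
    cur->State = 0;

    for (;;) {
        if (cur->State < 2) {
            /* Advance: try the next unexplored field of cur. */
            PrNode **field = (cur->State == 0) ? &cur->Item : &cur->Link;
            cur->State++;
            child = *field;
            if (child != NULL && !child->Marked) {
                *field = prev;   /* reverse the pointer: remember the way back */
                prev = cur;
                cur = child;
                cur->Marked = 1;
                cur->State = 0;
            }
        } else {
            /* Retreat: both fields done, restore the parent's pointer. */
            PrNode *parent = prev;
            if (parent == NULL)
                return;          /* back at the root: finished */
            if (parent->State == 1) {        /* we went down through Item */
                prev = parent->Item;
                parent->Item = cur;
            } else {                         /* we went down through Link */
                prev = parent->Link;
                parent->Link = cur;
            }
            cur = parent;
        }
    }
}
```

When the function returns, every pointer has been restored to its original value. While it runs, however, the structure is temporarily scrambled, so nothing else may touch the nodes during marking.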

Let's look at some details.


Data Structures for the Support of Garbage Collection:

#define FREE 0
#define RESERVED 1

// Each ListNode has a MarkBit: FREE or RESERVED
typedef struct NodeTag {
    short MarkBit;              // FREE or RESERVED
    struct NodeTag *Item;
    struct NodeTag *Link;
} ListNode;

// Assume further that all ListNodes are allocated
// inside a region of memory as an array of nodes
// called the ListNodeArray, as follows:
ListNode ListNodeArray[ListNodeArraySize];
ListNode *Avail;                // Avail will point to the available space list


void GarbageCollection(void)
{
    int i;    // i indexes the ListNodeArray

    // Phase 1 - Initialization Phase - mark all ListNodes FREE
    for (i = 0; i < ListNodeArraySize; ++i)
        ListNodeArray[i].MarkBit = FREE;

    // Phase 2 - Marking Phase - mark all ListNodes in use RESERVED
    // Use the function MarkListNodesInUse of Program
    // Strategy 8.24 to mark all list nodes in use

    // Phase 3 - Gathering Phase - link all FREE ListNodes together
    Avail = NULL;
    for (i = 0; i < ListNodeArraySize; ++i) {
        if (ListNodeArray[i].MarkBit == FREE) {
            ListNodeArray[i].Link = Avail;
            Avail = &ListNodeArray[i];
        }
    }
    // at the conclusion, Avail is the new available space list
}


void MarkListNodesInUse(ListNode *L)
{
    if ((L != NULL) && (L->MarkBit != RESERVED)) {
        L->MarkBit = RESERVED;
        if (L->Item is a pointer to a ListNode) {    // pseudocode test
            MarkListNodesInUse(L->Item);
        }
        MarkListNodesInUse(L->Link);
    }
}
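To see the three phases run end to end, here is a compilable sketch that puts the pieces together. Several things are assumptions added for the example and are not in the slides: the ItemIsList tag (one way of answering the pseudocode test on L->Item), the Roots array standing in for the symbol table, the array size of 8, and the returned count of gathered nodes.

```c
#include <stddef.h>

#define FREE 0
#define RESERVED 1
#define ListNodeArraySize 8      /* assumed size, for the example only */

typedef struct NodeTag {
    short MarkBit;               /* FREE or RESERVED */
    short ItemIsList;            /* hypothetical tag: nonzero if Item points to a ListNode */
    struct NodeTag *Item;
    struct NodeTag *Link;
} ListNode;

ListNode ListNodeArray[ListNodeArraySize];
ListNode *Avail;                 /* the available space list */

/* Recursive marking, as on the slide, with the tag replacing the pseudocode test. */
void MarkListNodesInUse(ListNode *L)
{
    if (L != NULL && L->MarkBit != RESERVED) {
        L->MarkBit = RESERVED;
        if (L->ItemIsList)
            MarkListNodesInUse(L->Item);
        MarkListNodesInUse(L->Link);
    }
}

/* Roots stands in for the symbol table. Returns the number of nodes gathered. */
int GarbageCollection(ListNode *Roots[], int RootCount)
{
    int i, gathered = 0;

    /* Phase 1: mark every node FREE. */
    for (i = 0; i < ListNodeArraySize; ++i)
        ListNodeArray[i].MarkBit = FREE;

    /* Phase 2: mark everything reachable from a root RESERVED. */
    for (i = 0; i < RootCount; ++i)
        MarkListNodesInUse(Roots[i]);

    /* Phase 3: link all nodes still FREE into the available space list. */
    Avail = NULL;
    for (i = 0; i < ListNodeArraySize; ++i) {
        if (ListNodeArray[i].MarkBit == FREE) {
            ListNodeArray[i].Link = Avail;
            Avail = &ListNodeArray[i];
            ++gathered;
        }
    }
    return gathered;
}
```

Unlike reference counting, this scheme reclaims an unreachable circular list without any special handling: nodes that point only at each other are simply never marked.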


Heaps and Dynamic Memory Allocation.

What does the O.S. do for you and how?

The region of memory from which malloc extracts memory blocks is called the heap - free returns them there, at the same exact place where malloc got them.

How does the O.S. know what to give you?

It keeps a “free list”. How? There are many strategies, but here is a simplified version of one:

At the beginning the “free list” contains just one element: the block of memory the system allocated for your heap.


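You now call malloc - then what?

[Figure: three snapshots of the heap.
Original heap: a single free block.
After the call to malloc: one block allocated to you, the rest free.
After more calls to malloc: more blocks allocated to you, the remainder free.]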


One usually interleaves calls to malloc with calls to free. After a while memory may look like this:

A mixture of free zones and in-use ones. How does the system manage?

At this point it would have a “FreeList”, in which each free item must know both its SIZE and where the next free item is.


The Free List is not necessarily kept in order of increasing address: after all, allocations and deallocations should be somewhat random.

The arrows are more likely to look like a bowl of spaghetti than the neat picture of the earlier slide.


You now call malloc again - what will happen?


A) one of the elements of the free list is large enough to accommodate your request. At this point, the decision to be made is which one to give you, and what to do with the leftover memory.

Aa) Which one: could be the first that fits (first fit); could be the smallest that fits (best fit); could be the largest (worst fit). One could find reasonable justifications for any one of the three.


Ab) What do you do with the leftover memory: you could just allocate it and not tell the user (internal fragmentation); or you could leave the leftover in the free list (external fragmentation). In the best-fit case, the leftover is likely to be so small as to be unusable for anything else; in the first-fit or worst-fit it might be large enough for some other request.
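The three selection policies differ only in which candidate they keep while walking the free list. The following sketch makes that explicit; the FreeBlock layout (a size plus a link to the next free block) follows the description of the free list given earlier, while the names FitPolicy and ChooseBlock are hypothetical.

```c
#include <stddef.h>

typedef struct FreeBlockTag {
    size_t Size;                 /* size of this free block */
    struct FreeBlockTag *Next;   /* next free block (not necessarily next in memory) */
} FreeBlock;

typedef enum { FIRST_FIT, BEST_FIT, WORST_FIT } FitPolicy;

/* Return the free block chosen for Request under Policy, or NULL if none fits. */
FreeBlock *ChooseBlock(FreeBlock *FreeList, size_t Request, FitPolicy Policy)
{
    FreeBlock *chosen = NULL;
    for (FreeBlock *b = FreeList; b != NULL; b = b->Next) {
        if (b->Size < Request)
            continue;                        /* too small: not a candidate */
        if (chosen == NULL) {
            chosen = b;
            if (Policy == FIRST_FIT)
                break;                       /* first fit stops at once */
        } else if (Policy == BEST_FIT && b->Size < chosen->Size) {
            chosen = b;                      /* smaller block that still fits */
        } else if (Policy == WORST_FIT && b->Size > chosen->Size) {
            chosen = b;                      /* larger block */
        }
    }
    return chosen;
}
```

For a free list with blocks of sizes 20, 100, 500 and 40 (in list order) and a request of 30, first fit picks the 100 block (leftover 70), best fit picks the 40 block (leftover 10) and worst fit picks the 500 block (leftover 470). Note that best fit and worst fit must always walk the whole list, while first fit can stop early.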



B) none of the elements of the free list is large enough to accommodate your request.

Ba) You could just die… this is the cheap way (from the O.S. point of view) of solving the problem.

Bb) You could try to coalesce memory - if two (or more) free blocks are one next to the other (contiguous) you could make a single block, which might be big enough.

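[Figure: the heap before coalescing (Original), the heap after coalescing (Coalesced), and the size of the pending request (Request).]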
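Coalescing is easy once the free blocks are looked at in address order, because contiguous blocks then sit next to each other. The sketch below works on a plain array of (start, size) pairs rather than on a real heap; the FreeRange type and the function names are hypothetical, and a real allocator would do the same merge directly on its free list.

```c
#include <stdlib.h>

typedef struct {
    size_t Start;                /* offset of the free block within the heap */
    size_t Size;                 /* its size */
} FreeRange;

/* qsort comparison: order free blocks by starting address. */
static int ByStart(const void *a, const void *b)
{
    const FreeRange *x = a, *y = b;
    return (x->Start > y->Start) - (x->Start < y->Start);
}

/* Merge contiguous free blocks in place; returns the new number of blocks. */
size_t Coalesce(FreeRange *r, size_t n)
{
    if (n == 0)
        return 0;
    qsort(r, n, sizeof r[0], ByStart);
    size_t out = 0;              /* index of the block currently being grown */
    for (size_t i = 1; i < n; ++i) {
        if (r[out].Start + r[out].Size == r[i].Start)
            r[out].Size += r[i].Size;    /* r[i] begins where r[out] ends: merge */
        else
            r[++out] = r[i];             /* gap (a block in use): start a new one */
    }
    return out + 1;
}
```

With free blocks at (0,100), (100,60), (200,40), (300,50) and (350,10), the largest block before coalescing is 100, so a request for 150 fails. Afterwards the blocks are (0,160), (200,40) and (300,60), and the same request can be satisfied.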


Bc) You could compact memory - move all the blocks in use so that they cover a contiguous block, move and coalesce all the free blocks so they cover a contiguous block, and try again.

[Figure: the heap before compaction (Original) and after compaction (Compacted).]

Problem: your program had all kinds of pointers to the blocks in use. The blocks' addresses have changed due to compaction. Who updates YOUR pointers?


Answer: YOU can't and the OS can't.

Solution: since the OS CAN keep track of addresses in the HEAP (it DOES manage the compaction), it allocates a FIXED region of the heap to keep track of the addresses of the blocks, and your program, rather than obtaining the addresses of the actual blocks from the OS, gets addresses of these address locations. These latter are not changed by the compaction: their contents (addresses of the blocks) ARE.

The kind of pointer YOU have is called a "handle"; the kind of pointer the OS manages is called a "master pointer". It is the master pointer that points to the memory block. You point to the master pointer - the master pointers DO NOT GET MOVED AROUND, but their values give the current addresses of the blocks.
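A toy version of the handle scheme fits in a page. It is only a sketch under simplifying assumptions: a 64-byte heap, at most 8 blocks, allocation by bumping a top-of-heap offset, and no alignment concerns. The names NewHandle, DisposeHandle, CompactHeap, Master and BlockSize are hypothetical and do not refer to any real operating system's API.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE 64
#define MAX_BLOCKS 8

static char Heap[HEAP_SIZE];
static char *Master[MAX_BLOCKS];      /* master pointers: these never move */
static size_t BlockSize[MAX_BLOCKS];  /* 0 means the slot is unused */
static size_t HeapTop = 0;            /* first never-used byte of the heap */

typedef char **Handle;                /* a handle points to a master pointer */

/* Allocate size bytes at the top of the heap; NULL if they do not fit. */
Handle NewHandle(size_t size)
{
    if (size == 0 || HeapTop + size > HEAP_SIZE)
        return NULL;
    for (int i = 0; i < MAX_BLOCKS; ++i) {
        if (BlockSize[i] == 0) {
            Master[i] = Heap + HeapTop;
            BlockSize[i] = size;
            HeapTop += size;
            return &Master[i];
        }
    }
    return NULL;                      /* no master pointer slot left */
}

/* Release the block; its space is recovered by the next compaction. */
void DisposeHandle(Handle h)
{
    BlockSize[h - Master] = 0;
    *h = NULL;
}

/* Slide the live blocks down, in address order, and update the master pointers. */
void CompactHeap(void)
{
    size_t top = 0;
    for (;;) {
        int lowest = -1;              /* lowest-addressed block not yet moved */
        for (int i = 0; i < MAX_BLOCKS; ++i) {
            if (BlockSize[i] != 0 && Master[i] >= Heap + top &&
                (lowest < 0 || Master[i] < Master[lowest]))
                lowest = i;
        }
        if (lowest < 0)
            break;
        memmove(Heap + top, Master[lowest], BlockSize[lowest]);
        Master[lowest] = Heap + top;  /* only the master pointer changes */
        top += BlockSize[lowest];
    }
    HeapTop = top;
}
```

The program always reaches its data as *h, through the handle. If it keeps a copy of *h in an ordinary pointer, that copy goes stale as soon as a compaction moves the block, which is the discipline this scheme imposes on the programmer.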


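[Figure: handles in your program point to master pointers, which in turn point to blocks in the heap, shown before and after compaction. The master pointers stay where they are; only the addresses they hold change.]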


All of these methods are complex, and all of them involve various time-space-complexity trade-offs. Unfortunately, the moment we moved away from the single-task personal computer (once in the 1950s - when mainframes became multi-user and multi-tasking, once in the late 1960s - when minicomputers did it, and once in the 1980s when the PC became multi-tasking) there was no way to avoid all these problems. They are and will be with us forever...