Array Data Structures & Algorithms

One-Dimensional and Multi-Dimensional Arrays, Searching & Sorting, Parameter Passing to Functions

Topics: Concepts of Data Collections; Arrays in C (Syntax, Usage); Array-based algorithms.

Concepts of Data Collections
Sets, Lists, Vectors, Matrices and beyond

Sets & Lists
The human-language concept of a collection of items is expressed mathematically using sets. Sets are not specifically structured, but structure may be imposed on them. Sets may be very expressive but are not always easily represented. Example: the set of all human emotions.
Lists are representations of sets that simply list all of the items in the set. Lists may be ordered or unordered. Some useful lists include both data values and the concept of position within the list. Example representations: { 0, 5, -2, 4 } and { TOM, DICK }, where TOM is in the first position and DICK in the second.

Vectors, Matrices and beyond
Many examples of lists arise in mathematics, and they provide a natural basis for developing programming syntax and grammar. Vectors are objects that express the notion of direction and size (magnitude). The 3-dimensional distance vector D has components { $D_x$, $D_y$, $D_z$ } along the respective x, y and z axes, with magnitude $|D| = \sqrt{D_x^2 + D_y^2 + D_z^2}$. We use descriptive terminology such as "the x-th component of D", or "D-sub-x".
Higher-dimensional objects are often needed to represent lists of lists. Matrices are one example; they can sometimes be represented in tabular form as 2-dimensional tables (row, column). Objects of 3 or more dimensions are hard to visualize, but are meaningful. Beyond this, mathematics works with objects called tensors and groups (among other entities), and expresses access to object members (data values) using properties and indices based on topologies (loosely put, shape structures).
In this course we focus on an introduction to the basic properties and algorithms associated with vectors and matrices, and with lists (later). These objects are mathematically defined as ordered, structured lists. The ordering derives from the property that each element of the list exists at a specific position, enumerated starting at 0 and incrementing by 1 up to a maximum value. The structure determines the mechanism and method of access to the list and its member elements.

Arrays in C
Syntax: how to declare and reference 1D arrays using subscript notation.
Memory structure: how RAM is allocated, and the meaning of direct access through subscripting.
Usage: some simple illustrative examples.
Functions: how to deal with arrays as function arguments.

Arrays in C - Syntax
Consider the array declarations:

int StudentID [ 1000 ] ;
float Mark [ 1000 ] ;
char Name [ 30 ] ;

Each of the declarations defines a storage container for a specific maximum number of elements:
- up to 1000 student (integer) ID values
- up to 1000 (real) marks
- a Name of up to 30 characters (if Name holds a C string, one of those is the terminating '\0', leaving room for 29 visible characters)

Arrays in C - Syntax
Each array is referred to by its declared name:

float A [ 100 ] ;

... where A refers to the entire collection of 100 storage locations.

On the other hand, each separate element of A is referenced using the subscript notation

A[0] = 12.34 ; /* assign 12.34 to the first element */ /* NOTE: subscripts always start from 0 */

A[K] = 0.0 ; /* assign 0 to the (K+1)th element */

Note: Although a bit clumsy in natural language, we can adjust our usage so that A[K] always refers to the Kth element (not the (K+1)th), always starting from 0 as the 0th element.

Arrays in C - Syntax
It is not necessary to initialize or reference all elements of an array in a program. Unlike uninitialized scalar variables of primitive data types, uninitialized or unreferenced array elements are typically not flagged with warnings by the compiler.
There are good reasons to declare arrays larger than might be required in a particular execution run of a program. At the outset, design the program to accommodate various sizes of data sets (usually acquired as input), up to a declared maximum size. This avoids the need to modify and recompile the program each time it is used.

Arrays in C - Syntax
This is a good time to introduce another compiler pre-processor directive, #define:

#define is used to define symbols for constant expressions in C programs. The value of such symbols is that they localize the places in a program that must be modified when a value changes.

Example:

#define MAX_SIZE 1000

int main ( )
{
    int SID [ MAX_SIZE ] ;
    float Mark [ MAX_SIZE ] ;
    .....
}

#define directives are normally located at the beginning of the program source code, after #include directives, and before function prototypes and the main function.

In the example, by using the defined symbol MAX_SIZE, the sizes of the SID and Mark arrays can be changed simply by changing the value assigned to MAX_SIZE and recompiling.

Arrays in C - Memory structure
Now consider the declaration int A [ 9 ] ;

The entire allocation unit is called A, the array name. There must be 9 integer-sized allocations in RAM, and each element is located contiguously (in sequence and touching), from A[0] through A[8].
[Figure: the array A in RAM, with elements A[0] through A[8] stored contiguously]

Arrays in C - Memory structure
Arrays are often called direct access storage containers. The reference to A[K] is translated by the compiler as follows:
First, calculate the relative address offset (RAO), K * sizeof(int).
Second, add the RAO to the base address of A, or simply &A[0].

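$\text{address of } A[K] = \text{address of } A[0] + K \cdot \texttt{sizeof(int)}$ (measured in bytes)
[Figure: RAM layout showing A[0] at the base address and A[K] at a relative address offset (RAO) of K * sizeof(int)]
This is byte arithmetic. In C source code, pointer arithmetic already scales by the element size, so the equivalent C expression is &A[K] == &A[0] + K (or simply A + K).

Direct access: the cost of the address computation is always the same (constant), and it yields the actual RAM address where the data is stored.

The sizeof operator is evaluated at compile time (it is not an executable instruction) and determines the RAM storage size allocated to a data type or data object. When sizeof is applied to a primitive data type, it gives the size of the allocated storage in bytes. Parentheses are required around a type name, as in sizeof(int). (C99 variable-length arrays are the exception: for them sizeof is evaluated at run time.) Try running a program with statements such as: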

printf( "The size of int is %zu bytes\n", sizeof(int) ) ; /* %zu is the printf format for size_t (C99) */

Arrays in C - Usage
Referencing arrays is straightforward using the subscript notation:

B = A[ 5 ] ; /* assign 6th element of A to B */

A [ J ] < A [ K ] /* relational expression */

B = 0.5*( A[J] - A[J-1] ); /* finite difference */

printf( "%d %d %d\n", A[0], A[mid], A[N-1] ) ;

scanf ( "%d%lf%lf", &N, &R[K], R ) ; /* Note: R alone is the address of R[0], so no & is needed */

Arrays in C - Average vs Median
Problem: Input N real numbers and find their average and median.
Assume the values are already sorted from lowest to highest.
Assume no more than 1000 values will be input.

Solution:
Declarations
float A [1000], Sum = 0.0, Ave, Median ;
int N, Mid, K ;

Input data
printf( "Enter number of values in list: " ) ;
scanf( "%d", &N ) ;

/* Enter all real values into array A */
for( K=0; K < N; K++ ) {
    scanf( "%f", &A[K] ) ; /* NOTE: & must be used */
    Sum += A[K] ;
}

Compute average and median

Ave = Sum / (float) N ; /* real division */

Mid = N / 2 ; /* (integer) midpoint of list */
Median = A [ Mid ] ; /* the middle element: the median when N is odd */

Report results

printf( "Average = %f\n", Ave ) ;
printf( "Median = %f\n", Median ) ;

Arrays in C - Related arrays
Problem: Obtain student marks from testing and store the marks along with ID numbers.
Assume marks are float and IDs are int data types.

Solution: related arrays
Define two arrays, one for IDs and one for marks:
int SID [ 100 ] ;
float Mark [ 100 ] ;

Coordinate input of data (maintain relationships):
for( K = 0 ; K < N ; K++ )
    scanf( "%d%f", &SID[K], &Mark[K] ) ;

Arrays in C - Functions
Passing arrays as parameters to functions requires some care and some understanding. We begin with an example.

Calculate the dot product of two 3-vectors U and V.

Components: U[0], U[1], U[2] and V[0], V[1], V[2]

Mathematics: The dot product is defined as DotProd( U, V ) ::= U[0]*V[0] + U[1]*V[1] + U[2]*V[2]

Since the dot product operation is required often, it would make a useful function.
[Figure: vectors U and V and their dot product U . V]

Arrays in C - Functions
Solution function:

double DotProd3 ( double U[3], double V[3] )
{
    return U[0] * V[0] + U[1] * V[1] + U[2] * V[2] ;
}

Note the arguments, which state that arrays of type double with three (3) elements are expected. The compiler does not check this size; it documents the intent of the function. Note also that the limitation to 3 elements is reflected in the design of the function name: DotProd3.

Arrays in C - Functions
Extend this to the dot product of N-dimensional vectors:

double DotProdN ( double U[ ], double V[ ], int N )
{
    double DPN = 0.0 ;
    int K ;
    for( K = 0 ; K < N ; K++ )
        DPN += U[K] * V[K] ;
    return DPN ;
}

Note that the array arguments do not specify a maximum array size. This gives flexibility of design, since the function can now handle any value of N. It is up to the programmer to ensure that the actual input arrays and N conform to these assumptions.

Arrays in C - Functions
An alternative way to write the same code is to use pointer references:

double DotProdN ( double * U, double * V, int N )
{
    double DPN = 0.0 ;
    int K ;
    for( K = 0 ; K < N ; K++ )
        DPN += U[K] * V[K] ;
    return DPN ;
}

Note that the array arguments are now expressed as pointers. This maintains the same flexibility as before.

Arrays in C - Functions
A final alternative is to use pointer references throughout:

double DotProdN ( double * U, double * V, int N )
{
    double DPN = 0.0 ;
    int K ;
    for( K = 0 ; K < N ; K++, U++, V++ )
        DPN += *U * *V ;
    return DPN ;
}

The U and V variables are address pointers to the array components. U++ and V++ update each pointer by an amount equal to the size of the array data type (for double, usually 8 bytes), so that it points to the next array component in sequence.

Pointers are not the same as ints!

If A is an int (say, 5), then A++ increments A to the next (successor) value in sequence (ie. 6).

On the other hand, if P is a pointer (say, int *, with value &A[K]), then P++ advances P to the next (successor) element of the array, &A[K+1], which lies sizeof(int) bytes further along in memory.

Arrays in C - Functions
The previous examples have illustrated the various ways used to pass array arguments to functions:
double DotProd3 ( double U[3], double V[3] ) ;
double DotProdN ( double U[ ], double V[ ], int N ) ;
double DotProdN ( double * U, double * V, int N ) ;
These forms look different but behave the same way. In standard C, a function parameter declared as an array (eg. double U[3] or double U[ ]) is adjusted by the compiler to a pointer (double * U). Only the address of the first element is passed and stored in the stack frame, never a copy of the whole array, and the size in double U[3] is not enforced. Arrays declared within the function body, by contrast, are allocated within the stack frame (unless they are declared static).

Arrays in C - Functions
C compilers may perform code and storage optimization, aiming to create the most efficient executable code and the most efficient use of RAM storage. If arrays were passed by copying them into each stack frame, a significant amount of space would be wasted through avoidable duplication of storage, and time would be wasted copying data from the arrays declared at the calling point into arrays declared at the called point. Passing a pointer (much smaller than the entire array) avoids both problems, at the cost of a small, acceptable amount of extra processing for indirect access. Optimization is a difficult problem and is still the subject of much research. (Theoretical)

Array Based Algorithms
Searching and Sorting. Very practical!

Array Based Algorithms
Searching: how to locate items in a list; simplicity versus speed, and list properties.

Sorting: putting list elements in order by relative value; promoting efficient search.

Search Algorithms
Searching is a fundamentally important part of working with arrays. Example: given a student ID number, what mark did that student obtain on the test? Do this for all students who enquire.

Constructing a good, efficient algorithm to perform the search depends on whether the IDs are in random order or sorted:
- random order: use a sequential search
- sorted order: use a divide-and-conquer approach

Search Algorithms - Random
If a list is stored in random order, one possible search technique is to look at the list elements in random order:

int srchID, K ;
printf( "Enter your SID " ) ;
scanf( "%d", &srchID ) ;
/* rand() is declared in <stdlib.h> */
for( K = rand() % N ; srchID != SID[ K ] ; K = rand() % N ) ;

printf( "SID = %d, Mark = %f\n", SID[K], Mark[K] ) ;

PROBLEM 1: There is no guarantee that rand() will ever produce the index being sought and so exit the for loop; if the item does not exist, the loop never ends.

PROBLEM 2: It is possible that an array element position will be accessed that has not had data stored. C does not reliably detect this at run time: the comparison uses an indeterminate value, so the program may silently give wrong results rather than stop with an error.

Search Algorithms - Linear
If a list is stored in random order, a better search technique is to look at the list elements in order, from the beginning of the list, until the element is found or the list elements are exhausted:

int srchID, K, N = 100 ; /* Assume 100 elements */
printf( "Enter your SID " ) ;
scanf( "%d", &srchID ) ;
/* Perform the search */
for( K=0; K
