91.102: Honors Computing II Data Structures Willie Boag Spring 2013 May 10, 2013

91.102 Honors Computing II Final Project


Honors Computing II with Jim Canning had a three-program final. We also had the option of including additional work that we were proud of. Each program spanned many files. In addition, I have included write-ups for the programs that cover an introduction, capabilities, benefits and drawbacks of implementation choices, and test case results. My Project: I. Kruskal's MST, II. Simple Line Editor, III. Topological Sort, IV. Bloom Filters, V. Fast Fourier Transform


91.102: Honors Computing II

Data Structures

Willie Boag

Spring 2013

May 10, 2013


Table of Contents

Kruskal’s MST
    Abstract, Description
    Algorithm, Challenges
    Testing
    C Implementation

Simple Line Editor
    Abstract, Description
    Drawbacks, Reflection
    C Implementation

Topological Sort
    Abstract, Description
    Algorithm, Testing
    Reflection
    C Implementation

Bloom Filters
    Abstract, Description
    Benefits, Drawbacks
    Implementation Choices, Motivation
    Results
    C Implementation

Fast Fourier Transform
    Abstract
    Fourier Transform
    Original Algorithm, Fast Fourier Transform
    A Taste of Recursion
    Why It Matters, Conclusion
    C Implementation

Appendices
    A: Kruskal’s MST Testing Code
    B: Topological Sort Testing Code


Kruskal’s MST

Abstract:

A minimum spanning tree (MST) of a weighted, connected graph is a subgraph that connects all of the vertices of the original graph, contains no cycles, and has the smallest possible total edge weight. Many algorithms have been developed that efficiently find an MST of a graph. Kruskal’s algorithm is one of them, and it is an example of a greedy algorithm.

Description:

A graph is a set of vertices and the edges that connect them. The edges of a graph can be either one-directional (directed) or two-directional (undirected). Often, every edge also has an associated weight. For instance, if you represent a road map as a graph, then the cities would be vertices, the roads would be edges, and the time it takes to drive from one city to another would be the weight of the edge that joins them. Finding the minimum spanning tree of such a graph is useful because it gives the cheapest set of edges that still connects every vertex. An MST has no cycles, which means that it is formed by removing some edges from the original graph.

One necessary condition for an MST to exist is that the graph must be connected. Since an MST of a graph can only be formed by removing edges, it would be impossible to connect every vertex if the graph starts out disconnected. This issue had to be addressed when I was generating test cases for my program.


Algorithm:

Kruskal’s algorithm is greedy. A greedy algorithm is one that decides what action to take based on the best immediate choice. This MST algorithm is greedy because it decides which edge to consider next by selecting the remaining edge of lowest weight.

At any point in the process, the graph that stores the answer represents a forest (a forest is just a collection of unconnected trees). At each iteration of the algorithm, the remaining edge of minimum weight is selected, and a decision is made as to whether to add that edge to the forest: if the edge combines two trees into one larger tree, then add it, but if the edge would create a cycle within a current tree, then discard it. Once the forest has been unified into one large tree, the MST has been found.
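The loop described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the project's implementation: it swaps the array of sets for a union-find (disjoint-set) array and sorts the edges with qsort() instead of using the heap-based priority queue. The names kedge, cmp_kedge, find_root, and kruskal_weight are hypothetical and do not appear in the project.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical edge record for this sketch (not the project's `edge`). */
typedef struct { int from, to, weight; } kedge;

/* qsort() comparison: order edges by ascending weight. */
int cmp_kedge(const void *a, const void *b) {
    const kedge *x = (const kedge *) a;
    const kedge *y = (const kedge *) b;
    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Follow parent links to the representative of v's tree. */
int find_root(int *parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];   /* path halving */
        v = parent[v];
    }
    return v;
}

/*
 * Return the total MST weight of a graph with n vertices and m edges,
 * or -1 if the graph is disconnected or memory runs out.
 * The edges array is sorted in place.
 */
long kruskal_weight(kedge *edges, int m, int n) {
    int *parent;
    int i, used = 0;
    long total = 0;

    assert(n > 0 && m >= 0);

    parent = (int *) malloc(n * sizeof(int));
    if (parent == NULL) return -1;
    for (i = 0; i < n; i++) parent[i] = i;   /* every vertex is its own tree */

    qsort(edges, (size_t) m, sizeof(kedge), cmp_kedge);

    /* Greedy step: take the lightest edge that joins two different trees. */
    for (i = 0; i < m && used < n - 1; i++) {
        int r1 = find_root(parent, edges[i].from);
        int r2 = find_root(parent, edges[i].to);
        if (r1 != r2) {
            parent[r2] = r1;                 /* merge the two trees */
            total += edges[i].weight;
            used++;
        }
    }

    free(parent);
    return (used == n - 1) ? total : -1;
}
```

An edge whose endpoints already share a root would close a cycle, so it is skipped; the loop stops as soon as n - 1 edges have been accepted, which is the point where the forest has become a single tree.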

Challenges:

While writing this program, the two most challenging functions to write were

KruskalMST() and set_union(). Because we were given partially completed code to start from,

some of the implementation choices were selected for us. As a result, I found that Collection (the

array of sets) was a little clumsy to work with. I think I would’ve preferred a set of sets, because

removing an element from the set was less intuitive when using the array. Replacing two sets

with one set union created holes in the array, which seemed more awkward than it needed to be.

The biggest problem that I had during this assignment was freeing all of my allocated space. I probably spent three times as much time trying to find all of my un-freed space as I did actually writing the program. I knew that I had un-freed space because I ran the program under the command

valgrind --leak-check=full --track-origins=yes -v

which monitored my allocated memory and reported a summary of how much allocated memory was still un-freed at the termination of my program. This command was shown to me last semester, although I do not know much at all about how it actually works. That being said, it was very helpful!

After a few days of searching for my memory leaks, I finally fixed them all. The most

subtle allocation bug that I saw came from set_union(), where any elements that were in both

sets S1 and S2 would have one copy of the data stored in the union while the other copy was

forgotten about. I could’ve fixed the problem by freeing the extra copy inside set_union(), but I

did not want to mutilate the arguments. Ultimately, I decided to copy all of the data into the new

set rather than just passing pointers. This made freeing the data much more straightforward.


Testing

To test my MST program, I needed lots of test graphs, so I wrote a program that generated random graphs. Unfortunately, my first attempt produced graphs that were not always connected. Since an MST requires a connected graph, my first approach failed.

The solution, however, was very simple. After generating a random graph, I added an edge from vertex 0 to every other vertex with a very large weight. This allowed the algorithm to function normally, but with back-up edges that always connected everything to 0 (if need be). This ensured that my graphs were always connected.
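The back-up edge trick can be sketched as follows. This is a minimal illustration, not the testing code from Appendix A: the flat n*n weight matrix, the NO_EDGE marker, the BACKUP_WEIGHT value, and the function names are all assumptions made for this sketch. It also assumes that a back-up edge is only added where vertex 0 has no edge yet, which the write-up does not specify.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_EDGE 0             /* hypothetical marker for "no edge" */
#define BACKUP_WEIGHT 30000   /* assumed value: large, but below UNUSED_WEIGHT (32767) */

/*
 * Fill the symmetric n*n matrix w with random weights in 1..max_weight.
 * Each pair of distinct vertices gets an edge with probability percent/100.
 */
void random_graph(int *w, int n, int percent, int max_weight) {
    int i, j;

    assert(n > 0 && max_weight > 0);

    for (i = 0; i < n; i++) {
        w[i * n + i] = NO_EDGE;
        for (j = i + 1; j < n; j++) {
            int weight = (rand() % 100 < percent) ? 1 + rand() % max_weight
                                                  : NO_EDGE;
            w[i * n + j] = w[j * n + i] = weight;
        }
    }
}

/* Add a heavy back-up edge from vertex 0 to every vertex not adjacent to it. */
void add_backup_edges(int *w, int n) {
    int v;

    for (v = 1; v < n; v++)
        if (w[v] == NO_EDGE)
            w[v] = w[v * n] = BACKUP_WEIGHT;
}

/* Count the vertices reachable from v by depth-first search. */
int reach(const int *w, int n, int v, int *seen) {
    int u, count = 1;

    seen[v] = 1;
    for (u = 0; u < n; u++)
        if (!seen[u] && w[v * n + u] != NO_EDGE)
            count += reach(w, n, u, seen);
    return count;
}
```

Because the back-up edges are heavier than any randomly generated edge, a greedy MST algorithm only falls back on them when no lighter edge connects the same components. A graph is connected exactly when reach() from vertex 0, with seen zeroed, returns n.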

C Implementation:

Makefile
    Makefile

Header Files
    globals.h
    graph.h
    heap.h
    queue.h
    queue_interface.h
    set.h
    setinterface.h

Source Files
    main.c
    globals.c
    graph.c
    heap.c
    queue_interface.c
    set.c
    setinterface.c

#
# Programmer: Willie Boag
#
# Makefile for Kruskal’s Minimum Spanning Tree
#

mst: main.o globals.o graph.o heap.o queue_interface.o set.o setinterface.o
	gcc -g -o mst main.o globals.o graph.o heap.o queue_interface.o set.o setinterface.o

main.o: main.c graph.h globals.h
	gcc -g -ansi -pedantic -Wall -c main.c

globals.o: globals.c globals.h
	gcc -g -ansi -pedantic -Wall -c globals.c

graph.o: graph.c graph.h queue.h set.h queue_interface.h setinterface.h globals.h
	gcc -g -ansi -pedantic -Wall -c graph.c

heap.o: heap.c heap.h globals.h
	gcc -g -ansi -pedantic -Wall -c heap.c

queue_interface.o: queue_interface.c queue_interface.h queue.h graph.h globals.h
	gcc -g -ansi -pedantic -Wall -c queue_interface.c

set.o: set.c set.h globals.h
	gcc -g -ansi -pedantic -Wall -c set.c

setinterface.o: setinterface.c setinterface.h set.h globals.h
	gcc -g -ansi -pedantic -Wall -c setinterface.c

clean:
	rm -f *.o

/********************************************************************/
/* Programmer: Willie Boag */
/* */
/* globals.h (Kruskal’s MST) */
/********************************************************************/

#ifndef _globals
#define _globals

#define DATA( L ) ( ( L ) -> datapointer )
#define NEXT( L ) ( ( L ) -> next )

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef void *generic_ptr ;

extern int compare_vertex( generic_ptr *a, generic_ptr *b ) ;

#endif

/********************************************************************/
/* Programmer: Willie Boag */
/* */
/* graph.h (Kruskal’s MST) */
/********************************************************************/

#ifndef _graph
#define _graph

#include "globals.h"

typedef int vertex ;
typedef struct { int weight ; int from ; vertex vertex_number ; } edge ;

#define UNUSED_WEIGHT (32767)
#define WEIGHT(p_e) ((p_e) -> weight )
#define VERTEX(p_e) ((p_e) -> vertex_number )

typedef enum { directed, undirected } graph_type ;

typedef struct {

  graph_type type ;
  int number_of_vertices ;
  edge **matrix ;

} graph_header, *graph ;

extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) ;
extern void destroy_graph( graph *p_G ) ;
extern status add_edge( graph G, vertex vertex1, vertex vertex2, int weight ) ;
extern void graph_size( graph G, int *p_vertex_cnt, int *p_edge_cnt ) ;
extern status KruskalMST( graph G, graph *T ) ;
extern status write_graph( graph G ) ;
extern int min_weight( graph G ) ;

#endif

/***************************************************************/
/* Programmer: Willie Boag */
/* */
/* heap.h (Kruskal’s MST) */
/***************************************************************/

#ifndef _heap
#define _heap

#define HEAPINCREMENT 128

#include "globals.h"

typedef struct { generic_ptr *base ; int nextelement ; int heapsize ; } heap ;

extern status init_heap( heap *p_H ) ;
extern bool empty_heap( heap *p_H ) ;
extern status heap_insert( heap *p_H , generic_ptr data , int (*p_cmp_f) () ) ;
extern status heap_delete( heap *p_H, int element, generic_ptr *p_data, int (*p_cmp_f)() ) ;

#endif

/*******************************************************************/
/* Programmer: Willie Boag */
/* */
/* queue.h (Kruskal’s MST) */
/*******************************************************************/

#ifndef _queue
#define _queue

#include "heap.h"

typedef heap queue ;

#define init_queue( p_Q ) init_heap( (heap *) p_Q)
#define empty_queue( p_Q) empty_heap( (heap *) p_Q)
#define qadd(p_Q, data, p_cmp_f) heap_insert( (heap *) p_Q, data, p_cmp_f )
#define qremove(p_Q, p_data, p_cmp_f) heap_delete( (heap *) p_Q, 0, p_data, p_cmp_f)

#endif

/************************************************************/
/* Programmer: Willie Boag */
/* */
/* queue_interface.h (Kruskal’s MST) */
/************************************************************/

#ifndef _queueinterface
#define _queueinterface

#include "queue.h"

extern status qadd_edge( queue *p_Q , int from, int to, int weight, int (*p_cmp_func)() ) ;

extern status qremove_edge( queue *p_Q, int *from, int *to, int *weight, int (*p_cmp_func)() ) ;

#endif

/************************************************************/
/* Programmer: Willie Boag */
/* */
/* set.h (Kruskal’s MST) */
/************************************************************/

#ifndef _set
#define _set

#include "globals.h"

typedef struct { generic_ptr *base ; generic_ptr *free ; int universe_size ; } set ;

extern status set_insert( set *p_S, generic_ptr element, int (*p_cmp_f)() ) ;
extern status init_set( set *p_S, int size ) ;
extern bool set_member( set *p_S, generic_ptr element, int (*p_cmp_f)() ) ;
extern status set_write( set *p_S, status (*p_write_f)() ) ;
extern status set_union( set *p_S1, set *p_S2, set *p_S3, int (*p_func_cmp)(), int sizeofelement ) ;

#endif

/*****************************************************************/
/* Programmer: Willie Boag */
/* */
/* setinterface.h (Kruskal’s MST) */
/*****************************************************************/

#ifndef _setinterface
#define _setinterface

#include "globals.h"
#include "set.h"

extern status vertex_set_insert( set *p_S , int v ) ;

#endif

/******************************************************************/
/* Programmer: Willie Boag */
/* */
/* main.c (Kruskal’s MST) */
/******************************************************************/

#include <stdio.h>
#include <stdlib.h>

#include "graph.h"
#include "globals.h"

int main( int argc, char *argv[] ) {

  FILE *fileptr ;
  int weight, from, to, numberofvertices ;
  graph G, T ;

  /* The graph file is the only command-line argument. */
  if ( argc < 2 ) {
    fprintf( stderr, "usage: %s graph-file\n", argv[0] ) ;
    return 1 ;
  }

  fileptr = fopen( argv[1], "r" ) ;
  if ( fileptr == NULL ) {
    perror( argv[1] ) ;
    return 1 ;
  }

  /* The first number in the file is the vertex count. */
  if ( fscanf( fileptr, "%d", &numberofvertices ) != 1 ) {
    fclose( fileptr ) ;
    return 1 ;
  }

  init_graph( &G, numberofvertices, undirected ) ;
  init_graph( &T, numberofvertices, undirected ) ;

  /* Every remaining line is "from to weight"; stop at the first bad line. */
  while ( fscanf( fileptr, "%d %d %d", &from, &to, &weight ) == 3 )
    add_edge( G, from, to, weight ) ;

  printf("\nThe edges of the original graph are: \n" ) ;
  write_graph( G ) ;
  KruskalMST( G , &T ) ;

  printf("\nThe edges of the MST are: \n" ) ;
  write_graph( T ) ;

  printf("\n The minimum total weight is %d.\n ", min_weight( T ) ) ;

  destroy_graph( &G ) ;
  destroy_graph( &T ) ;

  fclose( fileptr ) ;

  return 0 ;

}

/************************************************************/
/* Programmer: Willie Boag */
/* */
/* globals.c (Kruskal’s MST) */
/************************************************************/

#include "globals.h"

extern int compare_vertex( generic_ptr *a, generic_ptr *b ) {

  return *( int *)a - *( int *)b ;

}

/*******************************************************************/
/* Programmer: Willie Boag */
/* */
/* graph.c (Kruskal’s MST) */
/*******************************************************************/

#include "globals.h"
#include "queue.h"
#include "graph.h"
#include "set.h"
#include "queue_interface.h"
#include "setinterface.h"

#include <stdlib.h>
#include <stdio.h>

extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) {

  graph G ;
  int i, j ;

  G = (graph) malloc ( sizeof(graph_header)) ;
  if ( G == NULL )
    return ERROR ;

  G -> number_of_vertices = vertex_cnt ;
  G -> type = type ;
  G -> matrix = (edge **) malloc ( vertex_cnt * sizeof(edge *)) ;

  if ( G -> matrix == NULL ) {
    free( G ) ;
    return ERROR ;
  }

  /* One contiguous block holds the whole adjacency matrix. */
  G -> matrix[0] = (edge *) malloc( vertex_cnt * vertex_cnt * sizeof(edge) ) ;

  if ( G -> matrix[0] == NULL ) {
    free( G -> matrix ) ;
    free( G ) ;
    return ERROR ;
  }

  for ( i = 1 ; i < vertex_cnt ; i++ )
    G -> matrix[i] = G -> matrix[0] + vertex_cnt * i ;

  for ( i = 0 ; i < vertex_cnt ; i++ )
    for ( j = 0 ; j < vertex_cnt ; j++ ) {
      G -> matrix[i][j].weight = UNUSED_WEIGHT ;
      G -> matrix[i][j].vertex_number = j ;
      G -> matrix[i][j].from = i ;
    }

  *p_G = G ;
  return OK ;

}

extern void destroy_graph( graph *p_G ) {

  free( (*p_G) -> matrix[0] ) ;
  free( (*p_G) -> matrix ) ;
  free( *p_G ) ;

  /* Clear the caller's handle (the original assigned to the local p_G). */
  *p_G = NULL ;

}

extern status add_edge ( graph G, vertex vertex1, vertex vertex2, int weight ) {

  /* Valid vertices are 0 .. number_of_vertices - 1. */
  if ( vertex1 < 0 || vertex1 >= G -> number_of_vertices )
    return ERROR ;
  if ( vertex2 < 0 || vertex2 >= G -> number_of_vertices )
    return ERROR ;
  if ( weight <= 0 || weight >= UNUSED_WEIGHT )
    return ERROR ;

  G -> matrix[vertex1][vertex2].weight = weight ;

  if ( G -> type == undirected )
    G -> matrix[vertex2][vertex1].weight = weight ;

  return OK ;

}

extern void graph_size ( graph G, int *p_vertex_cnt , int *p_edge_cnt ) {

  int i, j, edges ;

  *p_vertex_cnt = G -> number_of_vertices ;

  edges = 0 ;

  for ( i = 0 ; i < G -> number_of_vertices ; i++ )
    for ( j = 0 ; j < G -> number_of_vertices ; j++ )
      if ( G -> matrix[i][j].weight != UNUSED_WEIGHT )
        edges++ ;

  /* An undirected edge is stored in both directions. */
  if ( G -> type == undirected )
    edges /= 2 ;

  *p_edge_cnt = edges ;

}

extern edge *edge_iterator( graph G, vertex vertex_number, edge *p_last_return ) {

  vertex other_vertex ;

  if (vertex_number < 0 || vertex_number >= G -> number_of_vertices)
    return NULL ;

  if (p_last_return == NULL)
    other_vertex = 0 ;
  else
    other_vertex = VERTEX(p_last_return) + 1 ;

  for ( ; other_vertex < G -> number_of_vertices ; other_vertex++) {

    if (G -> matrix[vertex_number][other_vertex].weight != UNUSED_WEIGHT)
      return &G -> matrix[vertex_number][other_vertex] ;

  }

  return NULL ;

}

static int What_Set_Am_I_In( set *Collection, int v, int n ) {

  int i ;

  for ( i = 0 ; i < n ; i++ ) {

    if ( Collection[i].base == NULL )
      continue ;

    if ( set_member( &Collection[i], (generic_ptr) &v, compare_vertex ) == TRUE )
      break ;

  }

  if ( i == n )
    return -1 ;

  return i ;

}

static int collection_size( set *Collection, int numberofvertices ) {

  int i, size = 0 ;

  for ( i = 0 ; i < numberofvertices ; i++ )
    if ( Collection[i].base != NULL )
      size++ ;

  return size ;

}

static int compare_weight( generic_ptr a, generic_ptr b ) {

  return WEIGHT((edge *) a) - WEIGHT((edge *) b) ;

}

extern status KruskalMST( graph G, graph *T ) {

  int i, S1, S2, numberofvertices, numberofedges ;
  int from, to, weight ;
  status result = OK ;
  queue Q ;
  set *Collection, S3 ;
  edge *p_edge ;
  generic_ptr *item ;

  graph_size( G, &numberofvertices, &numberofedges ) ;

  /* Special case: graph has only one vertex. */
  if (numberofvertices == 1)
    return OK ;

  /* Construct a priority queue Q containing all edges. */
  init_queue( &Q ) ;

  for (i = 0 ; i < numberofvertices ; i++) {
    p_edge = NULL ;

    while ( (p_edge = edge_iterator( G, i, p_edge)) != NULL)
      qadd_edge( &Q, p_edge->from, VERTEX(p_edge), WEIGHT(p_edge), compare_weight ) ;
  }

  /* Create an array of sets. */
  Collection = (set *) malloc( sizeof(set) * numberofvertices ) ;

  for (i = 0 ; i < numberofvertices ; i++) {
    init_set( &Collection[i], 1 ) ;
    vertex_set_insert( &Collection[i], i ) ;
  }

  /* While the tree is not fully formed. */
  while ( collection_size(Collection, numberofvertices) > 1 ) {

    /* An empty queue here means G is disconnected: no spanning tree exists. */
    if ( qremove_edge( &Q, &from, &to, &weight, compare_weight ) == ERROR ) {
      result = ERROR ;
      break ;
    }

    S1 = What_Set_Am_I_In( Collection, from, numberofvertices ) ;
    S2 = What_Set_Am_I_In( Collection, to, numberofvertices ) ;

    if ( S1 != S2 ) {

      init_set( &S3, numberofvertices ) ;

      set_union( &Collection[S1], &Collection[S2], &S3, compare_vertex, sizeof(vertex) ) ;

      /* Free data of S1 and S2. */
      for (item = Collection[S1].base ; item < Collection[S1].free ; item++)
        free( *item ) ;
      free( Collection[S1].base ) ;

      for (item = Collection[S2].base ; item < Collection[S2].free ; item++)
        free( *item ) ;
      free( Collection[S2].base ) ;

      Collection[S2].base = NULL ;
      Collection[S1] = S3 ;

      /* Record the edge by its endpoints, not by the set indices S1 and S2. */
      add_edge( *T, from, to, weight ) ;

    }

  }

  /* Free all reserved space. */
  while ( empty_queue( &Q ) == FALSE )
    qremove_edge( &Q, &from, &to, &weight, compare_weight ) ;
  free( Q.base ) ;

  /* Free every set that is still alive (exactly one if G was connected). */
  for ( i = 0 ; i < numberofvertices ; i++ ) {
    if ( Collection[i].base == NULL )
      continue ;
    for (item = Collection[i].base ; item < Collection[i].free ; item++)
      free( *item ) ;
    free( Collection[i].base ) ;
  }

  free( Collection ) ;

  return result ;

}

extern status write_graph( graph G ) {

  int i, j, numberofvertices, numberofedges ;

  graph_size( G, &numberofvertices, &numberofedges ) ;

  for ( i = 0 ; i < numberofvertices ; i++ ) {

    for ( j = 0 ; j < numberofvertices ; j++ ) {

      if ( G -> matrix[i][j].weight != UNUSED_WEIGHT ) {

        printf( " \n" ) ;
        printf( " %d ", G -> matrix[i][j].from ) ;
        printf( " %d ", G -> matrix[i][j].vertex_number ) ;
        printf( " %d ", G -> matrix[i][j].weight ) ;
        printf( " \n" ) ;

      }
    }
  }

  return OK ;

}

extern int min_weight( graph G ) {

  int sum = 0, i, j, numberofvertices, numberofedges ;

  graph_size( G, &numberofvertices, &numberofedges ) ;

  for ( i = 0 ; i < numberofvertices ; i++ ) {

    for ( j = 0 ; j < numberofvertices ; j++ ) {

      if ( G -> matrix[i][j].weight != UNUSED_WEIGHT )
        sum = sum + G -> matrix[i][j].weight ;
    }
  }

  /* Each undirected edge was counted twice. */
  return sum/2 ;

}

/***************************************************************/
/* Programmer: Willie Boag */
/* */
/* heap.c (Kruskal’s MST) */
/***************************************************************/

#include "globals.h"
#include "heap.h"
#include <stdlib.h>

static void siftdown( heap *p_H, int parent, int (*p_cmp_f)() ) ;
static void siftup( heap *p_H, int element, int (*p_cmp_f)() ) ;

extern status init_heap( heap *p_H ) {

  p_H -> base = (generic_ptr *) malloc( HEAPINCREMENT * sizeof(generic_ptr) ) ;

  if ( p_H -> base == NULL )
    return ERROR ;

  p_H -> nextelement = 0 ;
  p_H -> heapsize = HEAPINCREMENT ;
  return OK ;

}

extern bool empty_heap ( heap *p_H ) {

  return (p_H -> nextelement == 0) ? TRUE : FALSE ;

}

extern status heap_insert( heap *p_H, generic_ptr data, int (*p_cmp_f)() ) {

  generic_ptr *newbase ;

  /*
   * Insert data into p_H. p_cmp_f() is a comparison function that
   * returns a value less than 0 if its first argument is less than
   * its second, 0 if the arguments are equal. Otherwise, p_cmp_f()
   * returns a value greater than 0.
   *
   * The data is inserted in the heap by placing it at the end and
   * using siftup() to find its proper position.
   */

  if (p_H -> nextelement == p_H -> heapsize) {

    /*
     * Not enough space in the array, so more must be allocated.
     */

    newbase = (generic_ptr *) realloc( p_H -> base,
                (p_H -> heapsize + HEAPINCREMENT) * sizeof(generic_ptr) ) ;

    if (newbase == NULL)
      return ERROR ;
    p_H -> base = newbase ;
    p_H -> heapsize += HEAPINCREMENT ;

  }

  p_H -> base[p_H -> nextelement] = data ;

  siftup( p_H, p_H -> nextelement, p_cmp_f ) ;

  p_H -> nextelement++ ;

  return OK ;

}

/* Declared static above, so the definition is static as well. */
static void siftup( heap *p_H, int element, int (*p_cmp_f)() ) {

  int parent ;
  int cmp_result ;
  generic_ptr tmpvalue ;

  if ( element == 0 )
    return ;

  parent = (element - 1)/2 ;
  cmp_result = (*p_cmp_f)( p_H -> base[element], p_H -> base[parent] ) ;
  if (cmp_result >= 0 )
    return ;

  tmpvalue = p_H -> base[element] ;
  p_H -> base[element] = p_H -> base[parent] ;
  p_H -> base[parent] = tmpvalue ;
  siftup( p_H, parent, p_cmp_f ) ;

  return ;

}

extern status heap_delete( heap *p_H, int element, generic_ptr *p_data, int (*p_cmp_f)() ) {

  if ( element >= p_H -> nextelement )
    return ERROR ;

  *p_data = p_H -> base[element] ;
  p_H -> nextelement-- ;
  if ( element != p_H -> nextelement ) {

    p_H -> base[element] = p_H -> base[p_H -> nextelement] ;
    siftdown( p_H, element, p_cmp_f ) ;

  }
  return OK ;

}

static void siftdown ( heap *p_H, int parent, int (*p_cmp_f)() ) {

  /*
   * p_H is a heap except for parent. Find the correct place for parent
   * by swapping it with the smaller of its children. If a swap is
   * made, p_H is a heap except for the child’s position, so call
   * siftdown() recursively.
   */

  int leftchild, rightchild, swapelement ;
  int leftcmp, rightcmp, leftrightcmp ;
  generic_ptr tmpvalue ;

  leftchild = 2 * parent + 1 ;
  rightchild = leftchild + 1 ;

  if (leftchild >= p_H -> nextelement)
    /*
     * No children.
     */
    return ;

  leftcmp = (*p_cmp_f)( p_H->base[parent], p_H->base[leftchild] ) ;

  if (rightchild >= p_H -> nextelement) {
    /*
     * No right child.
     */

    if (leftcmp > 0) {

      tmpvalue = p_H->base[parent] ;
      p_H->base[parent] = p_H->base[leftchild] ;
      p_H->base[leftchild] = tmpvalue ;

    }

    return ;

  }

  rightcmp = (*p_cmp_f)( p_H->base[parent], p_H->base[rightchild] ) ;

  if (leftcmp > 0 || rightcmp > 0) {
    /*
     * Two children. Swap with smaller child.
     */

    leftrightcmp = (*p_cmp_f)( p_H->base[leftchild], p_H->base[rightchild] ) ;

    swapelement = (leftrightcmp < 0) ? leftchild : rightchild ;

    tmpvalue = p_H->base[parent] ;
    p_H->base[parent] = p_H->base[swapelement] ;
    p_H->base[swapelement] = tmpvalue ;

    siftdown( p_H, swapelement, p_cmp_f ) ;

  }

  return ;

}

/****************************************************************/
/* Programmer: Willie Boag */
/* */
/* queue_interface.c (Kruskal’s MST) */
/****************************************************************/

#include "queue_interface.h"
#include "globals.h"
#include "queue.h"
#include "graph.h"

#include <stdlib.h>

extern status qadd_edge( queue *p_Q , int from, int to, int weight, int (*p_cmp_func)( ) ) {

  edge *p_edge = ( edge * ) malloc( sizeof( edge ) ) ;

  if ( p_edge == NULL )
    return ERROR ;

  p_edge -> from = from ;
  p_edge -> vertex_number = to ;
  p_edge -> weight = weight ;

  if (qadd(p_Q, (generic_ptr) p_edge, p_cmp_func) == ERROR) {
    free(p_edge) ;
    return ERROR ;
  }

  return OK ;

}

extern status qremove_edge( queue *p_Q, int *from, int *to, int *weight, int (*p_cmp_func)() ) {

  edge *p_edge ;

  if ( qremove( p_Q, (generic_ptr *) &p_edge, p_cmp_func ) == ERROR )
    return ERROR ;

  *from = p_edge -> from ;
  *to = p_edge -> vertex_number ;
  *weight = p_edge -> weight ;

  free( p_edge ) ;

  return OK ;

}

/******************************************************************/
/* Programmer: Willie Boag */
/* */
/* set.c (Kruskal’s MST) */
/******************************************************************/

#include <stdio.h>
#include <stdlib.h>

#include "set.h"
#include "globals.h"

#define MAX(a , b) ( ( (a) > (b) ) ? (a) : (b) )
#define MINIMUM_INCREMENT 100
#define member_count(p_S) ( ( int) ((p_S) -> free - (p_S) -> base) )

typedef char byte ;

static status memcopy( byte *to, byte *from, int count ) {

  while ( count-- > 0 )
    *to++ = *from++ ;

  return OK ;

}

extern status init_set( set *p_S, int size ) {

  /*
   * Initialize a set of size elements. This set implementation
   * uses a dynamic array.
   */

  p_S -> universe_size = MAX(size, MINIMUM_INCREMENT) ;

  p_S -> base = (generic_ptr *) malloc( p_S->universe_size * sizeof(generic_ptr) ) ;

  if (p_S->base == NULL)
    return ERROR ;

  p_S->free = p_S->base ;

  return OK ;

}

extern status set_insert( set *p_S , generic_ptr element, int (*p_cmp_f)() ) {

  generic_ptr *newset ;

  /*
   * Insert element into the set. The dynamic array should
   * grow if needed.
   */

  if ( set_member( p_S, element, p_cmp_f ) == TRUE )
    return OK ;

  if ( p_S -> universe_size == member_count(p_S) ) {

    newset = (generic_ptr *) realloc( p_S -> base,
               (p_S->universe_size + MINIMUM_INCREMENT) * sizeof(generic_ptr) ) ;

    if (newset == NULL)
      return ERROR ;

    p_S -> base = newset ;
    p_S -> free = p_S -> base + p_S -> universe_size ;
    p_S -> universe_size += MINIMUM_INCREMENT ;

  }

  *p_S->free = element ;
  p_S->free++ ;

  return OK ;

}

bool set_member( set *p_S, generic_ptr element, int (*p_cmp_f)() ) {

  /*
   * Determine if element is in the set (using the passed comparison
   * function p_cmp_f()). Search the set sequentially.
   */
  generic_ptr *item ;

  for (item = p_S->base ; item < p_S->free ; item++)
    if ( (*p_cmp_f)(*item, element) == 0)
      return TRUE ;

  return FALSE ;

}

extern status set_union( set *p_S1, set *p_S2, set *p_S3, int (*p_cmp_f)(), int sizeofelement ) {

  /*
   * Store the union of sets *p_S1 and *p_S2 into the set *p_S3.
   * Every element is copied, so *p_S3 owns its own data and the
   * two arguments can be freed independently.
   */

  generic_ptr *item, tmp ;

  for (item = p_S1->base ; item < p_S1->free ; item++) {
    tmp = malloc( sizeofelement ) ;
    if (tmp == NULL)
      return ERROR ;
    memcopy( (byte *) tmp, (byte *) *item, sizeofelement ) ;
    set_insert( p_S3, tmp, p_cmp_f) ;
  }

  for (item = p_S2->base ; item < p_S2->free ; item++) {
    /* Skip elements that are already in the union. */
    if (set_member( p_S3, *item, p_cmp_f) == TRUE)
      continue ;
    tmp = malloc( sizeofelement ) ;
    if (tmp == NULL)
      return ERROR ;
    memcopy( (byte *) tmp, (byte *) *item, sizeofelement ) ;
    set_insert( p_S3, tmp, p_cmp_f) ;
  }

  return OK ;

}

extern status set_write( set *p_S, status (*p_write_f)( ) ) {

  generic_ptr *item ;

  for ( item = p_S -> base ; item < p_S -> free ; item++ )
    (*p_write_f)(*item) ;

  return OK ;

}

/*****************************************************************/
/* Programmer: Willie Boag */
/* */
/* setinterface.c (Kruskal’s MST) */
/*****************************************************************/

#include "globals.h"
#include "setinterface.h"
#include "set.h"
#include <stdlib.h>

extern status vertex_set_insert( set *p_S , int v ) {

  int *p_int = ( int * ) malloc( sizeof( int ) ) ;

  if ( p_int == NULL )
    return ERROR ;

  *p_int = v ;

  if ( set_insert( p_S, (generic_ptr) p_int, compare_vertex ) == ERROR ) {

    free( p_int ) ;
    return ERROR ;

  }

  return OK ;

}


Simple Line Editor

Abstract:

This is a program that can edit text files. Rather than being a full screen editor, it interacts

with the data one line at a time.

Description:

The Simple Line Editor is a case study in Data Structures: An Advanced Approach Using

C by Esakov and Weiss. It utilizes Doubly-Linked Lists. The operations that it can accomplish

include insert, delete, print, cut & paste, save, and quit.

Drawbacks:

This program cannot add lines to the end of a file. In addition, if you try to cut & paste

lines of text to the front of the file, the lines are lost/deleted.

Although it is not really a bug, the interface for the program is not user-friendly. The

driver cannot process commands that have spaces, which makes the interface messier to deal

with.


Reflection

Honestly, I did not like this application at all. I first started it in January (when I was going through all of Esakov & Weiss on my own). It was the fourth application that relied on some form of a linked list, and it had a third set of primitives to copy (ordinary, circular, double). Since I had been going through the whole book in about two weeks, I found the program to be very boring, especially since I had just finished working on the LISP interpreter the day before. As a result, I began to dislike the Simple Line Editor. I decided to stop working my way through Esakov & Weiss and start working on other schoolwork.

I revisited the code for this program in the first few days of May, and I still did not like it. I felt that it was neither a learning experience (unlike LISP, which was) nor an actually useful application. Finishing the code was not an exciting process.

Fortunately, I did manage to have some fun with the code. There were many primitive

functions not in the book, which I needed to write myself. They all dealt with traversing their

way through the list. I decided that I could make the program more fun by writing all of these

functions recursively. That was my favorite part of the assignment.
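To give a flavor of what recursive list primitives look like, here is a minimal sketch. It is not the code from dlists.c: the dnode type is an assumed stand-in with the same three fields as the project's double_node, and the function names length_rec, nth_rec, and position_rec are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed node layout: data pointer plus links in both directions. */
typedef struct dnode {
    void *datapointer;
    struct dnode *previous;
    struct dnode *next;
} dnode;

/* Number of nodes from L to the end of the list. */
int length_rec(const dnode *L) {
    return (L == NULL) ? 0 : 1 + length_rec(L->next);
}

/* The n-th node counting from L (n == 1 is L itself), or NULL if too short. */
dnode *nth_rec(dnode *L, int n) {
    if (L == NULL || n < 1)
        return NULL;
    return (n == 1) ? L : nth_rec(L->next, n - 1);
}

/* 1-based position of L, found by walking backward to the head. */
int position_rec(const dnode *L) {
    assert(L != NULL);
    return (L->previous == NULL) ? 1 : 1 + position_rec(L->previous);
}
```

Each function handles one node and hands the rest of the list to a recursive call, so no loop counters are needed; the trade-off is one stack frame per node, which is fine for files of modest length.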

C Implementation

Makefile
    Makefile

Header Files
    globals.h
    dlists.h
    interface.h
    user.h

Source Files
    main.c
    dlists.c
    interface.c
    user.c

#
# Programmer: Willie Boag
#
# Makefile for Simple Line Editor
#

sle: main.o dlists.o user.o interface.o
	gcc -ansi -pedantic -Wall -o sle main.o dlists.o user.o interface.o

main.o: main.c dlists.h user.h
	gcc -ansi -pedantic -Wall -c -g main.c

dlists.o: dlists.c dlists.h
	gcc -ansi -pedantic -Wall -c -g dlists.c

user.o: user.c user.h dlists.h interface.h globals.h
	gcc -ansi -pedantic -Wall -c -g user.c

interface.o: interface.c interface.h dlists.h
	gcc -ansi -pedantic -Wall -c -g interface.c

clean:
	rm -f *.o

/************************************************************/
/* Programmer: Willie Boag */
/* */
/* globals.h (Simple Line Editor) */
/************************************************************/

#ifndef _globals
#define _globals

#define DATA( L ) ( ( L ) -> datapointer )
#define NEXT( L ) ( ( L ) -> next )
#define PREV( L ) ( ( L ) -> previous )

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0, TRUE=1 } bool ;

typedef void *generic_ptr ;

#define E_IO 1
#define E_SPACE 2
#define E_LINES 3
#define E_BADCMD 4
#define E_DELETE 5
#define E_MOVE 6
#define MAXERROR 7

#define BUFSIZE 80

#endif

/*****************************************************************/
/* Programmer: Willie Boag */
/* */
/* dlists.h (Simple Line Editor) */
/*****************************************************************/

#ifndef _dlists
#define _dlists

#include "globals.h"

typedef struct double_node double_node, *double_list ;

struct double_node {
  generic_ptr datapointer ;
  double_list previous ;
  double_list next ;
} ;

extern status allocate_double_node( double_list *p_L, generic_ptr data ) ;
extern void free_double_node( double_list *p_L ) ;
extern status init_double_list( double_list *p_L ) ;
extern bool empty_double_list( double_list L ) ;
extern status double_insert( double_list *p_L, generic_ptr data ) ;
extern status double_append( double_list *p_L, generic_ptr data ) ;
extern status double_delete( double_list *p_L, generic_ptr *p_data ) ;
extern status double_delete_node( double_list *p_L, double_list node ) ;
extern void cut_list( double_list *p_L, double_list *p_start, double_list *p_end ) ;
extern void paste_list( double_list *p_target, double_list *p_source ) ;

extern int double_length( double_list L ) ;
extern double_list nth_double_node( double_list L, int n ) ;
extern status double_traverse( double_list L, status (*p_func_f)() ) ;
extern int double_node_number( double_list L ) ;
extern double_list nth_relative_double_node( double_list L, int n ) ;
extern void destroy_double_list( double_list *p_L, void (*p_func_f)() ) ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* interface.h (Simple Line Editor)                       */
/**********************************************************/

#ifndef _interface
#define _interface

#include "dlists.h"

extern status string_double_append( double_list *p_L, char *buffer ) ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* user.h (Simple Line Editor)                            */
/**********************************************************/

#ifndef _user
#define _user

#include "globals.h"
#include "dlists.h"

extern int readfile( char *filename, double_list *p_L );
extern int writefile( char *filename, double_list *p_L );

extern int insertlines( char *linespec, double_list *p_head, double_list *p_current ) ;
extern int deletelines( char *linespec, double_list *p_head, double_list *p_current ) ;
extern int movelines( char *linespec, double_list *p_head, double_list *p_current ) ;
extern int printlines( char *linespec, double_list *p_head, double_list *p_current ) ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* main.c (Simple Line Editor)                            */
/**********************************************************/

#include "dlists.h"
#include "user.h"

#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <stdio.h>

void printerror( int errnum ) ;

int main( int argc, char *argv[] ) {

    /*
     * A simple text editor.
     */

    char filename[BUFSIZ];
    char buffer[BUFSIZ];
    double_list linelist, currentline;
    bool file_edited, exit_flag;
    int rc;

    init_double_list(&linelist);

    printf("Enter the name of the file to edit: ");

    /* fgets() bounds the read (gets() cannot); drop the trailing newline. */
    if (fgets(filename, BUFSIZ, stdin) == NULL) exit(1);
    filename[strcspn(filename, "\n")] = '\0';

    if ((rc = readfile(filename, &linelist)) != 0) {
        printerror(rc);
        exit(1);
    }

    printf("%d lines read.\n", double_length(linelist));

    currentline = nth_double_node(linelist, -1);

    file_edited = FALSE;
    exit_flag = FALSE;

    while (exit_flag == FALSE) {
        printf("cmd: ");
        if (fgets(buffer, BUFSIZ, stdin) == NULL) break;   /* end of input */
        buffer[strcspn(buffer, "\n")] = '\0';
        /*
         * Implement the following commands:
         *   p - print
         *   d - delete
         *   i - insert
         *   m - move
         *   w - write
         *   q - quit
         */
        switch (toupper(buffer[0])) {
        case '\0':
            break;

        case 'P':
            rc = printlines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;

        case 'D':
            file_edited = TRUE;
            rc = deletelines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;

        case 'I':
            file_edited = TRUE;
            rc = insertlines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;

        case 'M':
            file_edited = TRUE;
            rc = movelines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;

        case 'W':
            if (buffer[1] != '\0') strcpy(filename, &buffer[1]);
            rc = writefile(filename, &linelist);
            if (rc != 0) printerror(rc);
            else printf("%d lines written\n", double_length(linelist));
            file_edited = FALSE;
            break;

        case 'Q':
            /*
             * If text has been modified, can't quit without writing
             * unless you enter q two times in a row.
             */
            if (file_edited == TRUE) {
                printf("File modified. Enter W to save, Q to discard.\n");
                file_edited = FALSE;
            } else exit_flag = TRUE;
            break;

        default:
            printerror(E_BADCMD);
            break;
        }
    }

    return 0;

}

void printerror( int errnum ) {

    /*
     * Print error message to standard output.
     */

    /* One message per error code E_IO .. E_MOVE (1 .. MAXERROR - 1). */
    static char *errmsg[] = {
        "io error",
        "out of memory space",
        "invalid line specification",
        "invalid command",
        "error deleting lines",
        "error moving lines"
    };

    /* Error codes start at 1, so 0 is out of range as well. */
    if (errnum < 1 || errnum >= MAXERROR) {
        printf("System Error. Invalid error number: %d\n", errnum);
        return;
    }

    printf("%s\n", errmsg[errnum - 1]);

    return;

}


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* dlists.c (Simple Line Editor)                          */
/**********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "dlists.h"
#include "globals.h"

extern status allocate_double_node( double_list *p_L, generic_ptr data ) {

    double_list L ;

    L = (double_list) malloc( sizeof(double_node));

    if (L == NULL) return ERROR;

    *p_L = L;
    DATA(L) = data;
    PREV(L) = NULL;
    NEXT(L) = NULL;

    return OK;

}

extern void free_double_node( double_list *p_L ) {

    /* Free the node itself (*p_L), not the variable that points to it. */
    free(*p_L);
    *p_L = NULL;

    return;

}

extern status init_double_list( double_list *p_L ) {

    /*
     * Initialize *p_L by setting the list pointer to NULL.
     * Always return OK (a different implementation
     * may allow errors to occur).
     */

    *p_L = NULL;

    return OK;

}

extern bool empty_double_list( double_list L ) {

    /* Return TRUE if L is an empty list, FALSE otherwise. */

    return (L == NULL) ? TRUE : FALSE;

}

extern status double_insert( double_list *p_L, generic_ptr data ) {

    /* Insert a new node containing data as the first item in *p_L. */

    double_list L;

    if (allocate_double_node(&L, data) == ERROR) return ERROR;

    if (empty_double_list(*p_L) == TRUE) {
        PREV(L) = NEXT(L) = NULL;
    } else {
        NEXT(L) = *p_L;
        PREV(L) = PREV(*p_L);
        PREV(*p_L) = L;
        if (PREV(L) != NULL) NEXT(PREV(L)) = L;
    }
    *p_L = L;

    return OK;

}

extern status double_append( double_list *p_L, generic_ptr data) {

    /* Append a node to the end of a double_list. */

    double_list L, temp;

    if (allocate_double_node(&L, data) == ERROR) return ERROR;

    if (*p_L == NULL) {
        *p_L = L;
    } else {
        for ( temp = *p_L ; NEXT(temp) != NULL ; )
            temp = NEXT(temp);
        NEXT(temp) = L;
        PREV(L) = temp;
    }

    return OK;

}

extern status double_delete( double_list *p_L, generic_ptr *p_data ) {

    /*
     * Delete the first node in *p_L and return the DATA in p_data.
     */

    if (empty_double_list(*p_L) == TRUE) return ERROR;

    *p_data = DATA(*p_L);
    return double_delete_node(p_L, *p_L);

}

extern status double_delete_node( double_list *p_L, double_list node ) {

    /*
     * Delete node from *p_L.
     */

    double_list prev, next;

    if (empty_double_list(*p_L) == TRUE) return ERROR;

    prev = PREV(node);
    next = NEXT(node);

    if (prev != NULL) NEXT(prev) = next;
    if (next != NULL) PREV(next) = prev;

    if (node == *p_L) {
        if (next != NULL) *p_L = next;
        else *p_L = prev;
    }

    /* Free the unlinked node, not whatever *p_L points to now. */
    free_double_node(&node);
    return OK;

}

extern void cut_list( double_list *p_L, double_list *p_start, double_list *p_end ) {

    /*
     * Extract the range of nodes *p_start -- *p_end from *p_L.
     */

    double_list start, end ;

    start = *p_start ;
    end = *p_end ;

    if (PREV(start)) NEXT(PREV(start)) = NEXT(end) ;

    if (NEXT(end)) PREV(NEXT(end)) = PREV(start) ;

    if (*p_L == start) *p_L = NEXT(end) ;

    PREV(start) = NEXT(end) = NULL ;

}

extern void paste_list( double_list *p_target, double_list *p_source ) {

    /*
     * Take *p_source and put it after *p_target. Assumes
     * *p_source is the first node in the list.
     */

    double_list target, source, lastnode ;

    if (empty_double_list(*p_source) == TRUE)
        /*
         * Nothing to do.
         */
        return ;

    if (empty_double_list(*p_target) == TRUE)
        *p_target = *p_source ;
    else {
        source = *p_source ;
        target = *p_target ;

        lastnode = nth_double_node(source, -1) ;

        NEXT(lastnode) = NEXT(target) ;

        if (NEXT(target) != NULL) PREV(NEXT(target)) = lastnode ;

        PREV(source) = target ;
        NEXT(target) = source ;
    }

    *p_source = NULL ;

}

extern int double_length( double_list L ) {

    if (L == NULL) return 0 ;

    return double_length( NEXT(L) ) + 1 ;

}

extern double_list nth_double_node( double_list L, int n ) {

    if (L == NULL) return NULL ;

    /* n == -1 asks for the last node. */
    if (n == -1) {
        for ( ; NEXT(L) != NULL ; L = NEXT(L) ) ;
        return L ;
    }

    if (n == 1) return L ;

    return nth_double_node( NEXT(L), n - 1 ) ;

}

extern status double_traverse( double_list L, status (*p_func_f)() ) {

    if (L == NULL) return OK ;

    if ((*p_func_f)(DATA(L)) == ERROR) return ERROR ;

    return double_traverse( NEXT(L), p_func_f ) ;

}

extern int double_node_number( double_list L ) {

    if (L == NULL) return 0 ;

    return double_node_number( PREV(L) ) + 1 ;

}

extern double_list nth_relative_double_node( double_list L, int n ) {

    /* An offset that walks off either end of the list yields NULL. */
    if (L == NULL) return NULL ;

    if (n == 0) return L ;
    if (n == -1) return PREV(L) ;

    if (n > 0) return nth_relative_double_node( NEXT(L), n - 1 ) ;

    return nth_relative_double_node( PREV(L), n + 1 ) ;

}

extern void destroy_double_list( double_list *p_L, void (*p_func_f)() ) {

    if (empty_double_list(*p_L) == TRUE) return ;

    destroy_double_list( &NEXT(*p_L), p_func_f ) ;

    (*p_func_f)( DATA(*p_L) ) ;

    free_double_node( p_L ) ;

    *p_L = NULL ;

}


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* interface.c (Simple Line Editor)                       */
/**********************************************************/

#include "dlists.h"
#include "globals.h"
#include <string.h>
#include <stdlib.h>

extern status string_double_append( double_list *p_L, char *buffer ) {

    char *str ;

    str = (char *) malloc( sizeof(char) * (strlen((char *)buffer) + 1) ) ;

    if (str == NULL) return ERROR ;

    strcpy(str, (char *) buffer) ;

    if (double_append(p_L, (generic_ptr) str) == ERROR) {
        free(str) ;
        return ERROR ;
    }

    return OK ;

}


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* user.c (Simple Line Editor)                            */
/**********************************************************/

#include "globals.h"
#include "dlists.h"
#include "user.h"
#include "interface.h"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

static FILE *outputfd;

static status writeline( char *s ) ;
static int parse_linespec( char *linespec, double_list head, double_list current,
                           double_list *p_start, double_list *p_end ) ;
static int parse_number( char *numberspec, double_list head, double_list current,
                         double_list *p_node ) ;

extern int readfile( char *filename, double_list *p_L ) {

    /*
     * Read data from filename and put in the linked list *p_L.
     */
    char buffer[BUFSIZ];
    FILE *fd;

    if ((fd = fopen(filename, "r")) == NULL) return 0;

    while (fgets(buffer, BUFSIZ, fd) != NULL) {
        if (string_double_append(p_L, buffer) == ERROR) {
            fclose(fd) ;
            return E_SPACE;
        }
    }

    fclose(fd);
    return 0;

}

extern int writefile( char *filename, double_list *p_L ) {

    /*
     * Output the data in *p_L to the output file, filename.
     * Use the static global variable outputfd to store the output
     * file descriptor so that it can be used by writeline().
     */

    status rc;

    if ((outputfd = fopen(filename, "w")) == NULL) return E_IO;

    rc = double_traverse(*p_L, writeline);
    fclose(outputfd);

    return (rc == ERROR) ? E_IO : 0;

}

static status writeline( char *s ) {

    /*
     * Write a single line of output to outputfd. Outputfd
     * must point to a file previously opened with fopen (as
     * is done in writefile()).
     */

    if (fputs(s, outputfd) == EOF) return ERROR;

    return OK;

}

extern int insertlines( char *linespec, double_list *p_head, double_list *p_current ) {

    /*
     * Insert new lines before the current line.
     */

    double_list newdata, startnode, endnode, lastnode;
    status rc;
    int cmp, parseerror;
    char buffer[BUFSIZ];

    /*
     * If the list is empty, no linespec is allowed.
     */
    if (empty_double_list(*p_head) == TRUE) {
        if (strlen(linespec) != 0) return E_LINES;
        startnode = endnode = NULL;
    } else {
        /*
         * If a linespec is given, it better be a single line number
         */
        parseerror = parse_linespec(linespec, *p_head, *p_current, &startnode, &endnode);

        if (parseerror) return parseerror;
        if (startnode != endnode) return E_LINES;
    }

    /*
     * Collect the new lines in newdata. Then "paste" the list before
     * startnode.
     */
    init_double_list(&newdata);
    do {
        printf("insert>");
        /* Treat end of input like the terminating "." line. */
        if (fgets(buffer, BUFSIZ, stdin) == NULL) strcpy(buffer, ".\n");
        cmp = strcmp(buffer, ".\n");
        if (cmp != 0) {
            rc = string_double_append(&newdata, buffer);
            if (rc == ERROR)
                return E_SPACE;
        }
    } while (cmp != 0);
    if (empty_double_list(newdata) == TRUE) return 0;

    if (startnode == NULL) {
        /*
         * Empty list
         */
        *p_head = newdata;
        *p_current = nth_double_node(newdata, -1);
    } else if (PREV(startnode) == NULL) {
        /*
         * Insert before the first line.
         */
        lastnode = nth_double_node(newdata, -1);
        paste_list(&lastnode, p_head);
        *p_head = newdata;
        *p_current = startnode;
    } else {
        /*
         * Insert in the middle of the list.
         */
        paste_list(&PREV(startnode), &newdata);
        *p_current = startnode;
    }

    return 0;

}

extern int deletelines( char *linespec, double_list *p_head, double_list *p_current ) {

    /*
     * Delete some lines (according to linespec) from p_head.
     * Update p_current to be after last line deleted.
     * If the last line is deleted, make p_current be before first line.
     */

    double_list startnode, endnode, tmplist;
    double_list new_current;
    int startnumber, endnumber;
    int rc;

    rc = parse_linespec(linespec, *p_head, *p_current, &startnode, &endnode);
    if (rc) return rc;

    startnumber = double_node_number(startnode);
    endnumber = double_node_number(endnode);
    if (startnumber > endnumber) {
        tmplist = startnode;
        startnode = endnode;
        endnode = tmplist;
    }
    new_current = nth_relative_double_node(endnode, 1);
    if (new_current == NULL)
        new_current = nth_relative_double_node(startnode, -1);

    cut_list(p_head, &startnode, &endnode);

    *p_current = new_current;
    destroy_double_list(&startnode, free);

    return 0;

}

extern int movelines( char *linespec, double_list *p_head, double_list *p_current ) {

    /*
     * Move lines to after p_current. Make sure the lines moved
     * do not include p_current.
     */

    double_list startnode, endnode;
    double_list tmpnode;
    int startnumber, endnumber;
    int rc, currentnumber;
    int tmp;

    rc = parse_linespec(linespec, *p_head, *p_current, &startnode, &endnode );
    if (rc) return rc;
    startnumber = double_node_number(startnode);
    endnumber = double_node_number(endnode);
    currentnumber = double_node_number(*p_current);
    /*
     * Make sure start < end.
     */
    if (startnumber > endnumber) {
        tmp = startnumber;
        startnumber = endnumber;
        endnumber = tmp;
        tmpnode = startnode;
        startnode = endnode;
        endnode = tmpnode;
    }
    /*
     * Do not include the current line in the ones being moved.
     */
    if (currentnumber >= startnumber && currentnumber <= endnumber)
        return E_LINES;

    cut_list(p_head, &startnode, &endnode);
    paste_list(&PREV(*p_current), &startnode);

    return 0;

}

extern int printlines( char *linespec, double_list *p_head, double_list *p_current ) {

    /*
     * Print out lines. Direction indicates whether going forward or
     * backward.
     */

    double_list startnode, endnode ;
    int startnumber, endnumber, count, direction ;
    int rc ;

    rc = parse_linespec( linespec, *p_head, *p_current, &startnode, &endnode ) ;
    if (rc) return rc ;

    startnumber = double_node_number( startnode ) ;
    endnumber = double_node_number( endnode ) ;

    direction = (startnumber < endnumber) ? 1 : -1 ;

    count = (endnumber - startnumber) * direction + 1 ;

    while ( count-- ) {
        printf("%d %s", startnumber, (char *) DATA(startnode) ) ;

        startnumber += direction ;
        startnode = nth_relative_double_node(startnode, direction) ;
    }

    *p_current = endnode ;

    return 0 ;

}

static int parse_linespec( char *linespec, double_list head, double_list current,
                           double_list *p_start, double_list *p_end ) {

    /*
     * Parse linespec (consisting of numberspec,numberspec).
     * Set p_start to the starting line and p_end to the ending line.
     */

    int rc ;
    char *nextnumber ;

    if (*linespec == '\0')
        *p_start = current ;
    else {
        rc = parse_number(linespec, head, current, p_start) ;
        if (rc) return rc ;
    }

    nextnumber = strchr(linespec, ',') ;

    if (nextnumber == NULL)
        *p_end = *p_start ;
    else {
        rc = parse_number(nextnumber + 1, head, current, p_end) ;
        if (rc) return rc ;
    }

    if (*p_start == NULL || *p_end == NULL) return E_LINES ;

    return 0 ;

}

static int parse_number( char *numberspec, double_list head, double_list current,
                         double_list *p_node ) {

    /*
     * Parse a single numberspec.
     */

    char numberbuffer[BUFSIZ], *p_num ;
    int nodenumber ;
    int direction ;

    if (*numberspec == '.') {
        /*
         * Start with the current line.
         */
        *p_node = current ;
        numberspec++ ;
    } else if (*numberspec == '$') {
        /*
         * Start with the last line.
         */
        *p_node = nth_double_node(head, -1) ;
        if (*p_node == NULL) {
            return E_LINES ;
        }
        numberspec++ ;
    } else if (isdigit(*numberspec)) {
        /*
         * Have a line number.
         */
        p_num = numberbuffer ;

        while (isdigit(*numberspec)) *p_num++ = *numberspec++ ;

        *p_num = '\0' ;
        nodenumber = atoi( numberbuffer ) ;

        *p_node = nth_double_node(head, nodenumber) ;

        if (*p_node == NULL) return E_LINES ;
    } else return E_LINES ;

    /*
     * Any plusses or minuses?
     */
    if (*numberspec == '+') {
        direction = 1 ;
        numberspec++ ;
    } else if (*numberspec == '-') {
        direction = -1 ;
        numberspec++ ;
    } else direction = 0 ;
    /*
     * If a digit and previously saw a plus or minus, figure
     * offset from p_node.
     */

    if (isdigit(*numberspec) && direction != 0) {
        p_num = numberbuffer ;

        while (isdigit(*numberspec)) *p_num++ = *numberspec++ ;
        *p_num = '\0' ;

        nodenumber = atoi( numberbuffer ) * direction ;

        *p_node = nth_relative_double_node(*p_node, nodenumber) ;

        /* Test the node that was found, not the address it is stored at. */
        if (*p_node == NULL) return E_LINES ;

        direction = 0 ;
    }

    /*
     * If direction is 0 (meaning no offset or offset was parsed ok)
     * and at end of this numberspec, then everything is ok.
     */
    if (direction == 0 && (*numberspec == '\0' || *numberspec == ','))
        return 0 ;
    else
        return E_LINES ;

}


Topological Sort

Abstract:

A topological sort of a partial ordering is an arrangement of elements of a set in such a

way that they satisfy the rules imposed by a given comparison function. Such an arrangement comes up in

many fields of study. One example of a partial ordering is a set of courses and their pre-

requisites. The pre-requisite rules impose restrictions on the order in which classes can be

chosen. A topological sort of that partial ordering would simply be a list of classes to take so that

you always take the pre-requisites of a class first.

Description:

In mathematics, the combination of a set and a comparison function forms a partial

ordering if the comparison function is transitive, reflexive, and anti-symmetric. A topological

sort of a partial ordering is a permutation of the elements of the set such that there are no

conflicts with the order established by the comparison function.

As mentioned above, two necessary conditions for a partial ordering to exist are that the

comparison function is anti-symmetric and transitive. In terms of what that means on a graph, the

graph cannot have cycles. As a result, only Directed Acyclic Graphs (DAGs) can be

topologically sorted. If a cycle were to exist, then it would be the equivalent of a never-ending

circle of pre-requisites.


Algorithm

The general idea behind my chosen topological sorting algorithm is as follows. Because there are

no cycles in the graph, I know that there is at least one minimum element (a vertex with no predecessors). I loop through the

vertices of the graph until I find that minimum element. I print that element, and remove it from

consideration. This brings me back to the situation where I can find a new minimum element.

This process repeats until I eventually visit every node in my graph exactly once. At that point, a

valid topological sort has been found.
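
The steps above can be sketched in a few lines of C. This is a minimal, self-contained illustration and not the graph.c listing that appears later: the fixed-size adjacency matrix, the MAX_V bound, and the name topo_order are assumptions made only for this example.

```c
#include <assert.h>

#define MAX_V 16   /* hypothetical upper bound on vertices for this sketch */

/*
 * Selection-style topological sort on an adjacency matrix.
 * adj[u][v] != 0 means there is an edge u -> v (u must come before v).
 * Writes the vertices to order[] and returns how many were written;
 * a return value smaller than n means a cycle blocked further progress.
 */
int topo_order(int n, int adj[MAX_V][MAX_V], int order[])
{
    int pred[MAX_V];
    int u, v, count;

    assert(n >= 0 && n <= MAX_V);

    /* Count the incoming edges of every vertex. */
    for (v = 0; v < n; v++) {
        pred[v] = 0;
        for (u = 0; u < n; u++)
            if (adj[u][v]) pred[v]++;
    }

    for (count = 0; count < n; count++) {
        /* Scan for a vertex with no remaining predecessors. */
        for (u = 0; u < n && pred[u] != 0; u++)
            ;
        if (u == n) break;      /* none left: the rest lies on a cycle */

        order[count] = u;
        pred[u] = -1;           /* remove u from consideration */
        for (v = 0; v < n; v++)
            if (adj[u][v]) pred[v]--;
    }
    return count;
}
```

For a four-vertex graph with edges 2 -> 0, 0 -> 1, 2 -> 3, and 3 -> 1, this sketch fills order with 2, 0, 3, 1. Each step rescans all vertices, so with a matrix representation the whole sort takes time proportional to the square of the number of vertices.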

Testing

In order to verify that my program produced a valid topological sort, I wrote an

automated program that generated a random DAG, topologically sorted it using my program, and

then compared the sort to the ordering imposed by the edges of the graph. This whole process

was run in a loop that executed however many times I chose. Once it passed 1000/1000 cases, I

accepted that it was (likely) a correct algorithm.
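
The comparison step can be sketched as a small checker. This is only an illustration under assumptions: the test_edge type, the MAX_V bound, and the function name are hypothetical, and the actual testing code is the one in Appendix B.

```c
#include <assert.h>

#define MAX_V 16   /* hypothetical upper bound on vertices for this sketch */

typedef struct { int from; int to; } test_edge;   /* hypothetical edge record */

/*
 * Returns 1 if order[0..n-1] is a permutation of 0..n-1 in which every
 * edge points from an earlier position to a later one, and 0 otherwise.
 */
int is_valid_topo_order(int n, const int order[],
                        const test_edge edges[], int edge_cnt)
{
    int pos[MAX_V];
    int i;

    assert(n >= 0 && n <= MAX_V);

    for (i = 0; i < n; i++) pos[i] = -1;

    /* Record where each vertex appears; reject repeats and bad values. */
    for (i = 0; i < n; i++) {
        int v = order[i];
        if (v < 0 || v >= n || pos[v] != -1) return 0;
        pos[v] = i;
    }

    /* Every edge must go forward in the ordering. */
    for (i = 0; i < edge_cnt; i++) {
        int a = edges[i].from, b = edges[i].to;
        if (a < 0 || a >= n || b < 0 || b >= n) return 0;
        if (pos[a] >= pos[b]) return 0;
    }

    return 1;
}
```

A checker like this accepts any valid ordering, which matters because a partial ordering usually has more than one topological sort.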

The hardest part of my testing program was generating the DAG. My naïve attempt

involved generating a graph of random numbers in such a way that any two vertices had a chance

of being connected. Unfortunately, this resulted in cyclic graphs (and topological sorts do not

exist for cyclic graphs). My next attempt involved trying to build the graph so that I would

never insert edges that caused cycles. This, too, proved to be very difficult to manage, and I was

forced to find another way.

Eventually, I got the idea to generate a completely random graph (as I first did) and then

run Kruskal’s Minimum Spanning Tree (MST) algorithm in order to eliminate cycles. This was

pretty much what I ended up using, except that a MST only exists for connected graphs. On my

randomly generated graphs, connectivity was not necessarily implied. As a result, I had to

modify the algorithm to find a minimum spanning forest rather than a tree.
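
The cycle-removal idea can be sketched with the union-find bookkeeping that Kruskal's algorithm relies on. This is a simplified illustration, not the Appendix B code: it assumes the candidate edges are taken in the order given instead of being sorted by weight (only acyclicity matters here), and the names keep_forest_edges, find_root, and MAX_V are hypothetical.

```c
#include <assert.h>

#define MAX_V 16   /* hypothetical upper bound on vertices for this sketch */

/* Follow parent links to the representative of v's component. */
static int find_root(int parent[], int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];   /* path halving keeps trees shallow */
        v = parent[v];
    }
    return v;
}

/*
 * Copy into kept[] only those candidate edges that join two different
 * components. The kept edges form a spanning forest, so they cannot
 * contain a cycle no matter how each edge is directed.
 * Returns the number of edges kept.
 */
int keep_forest_edges(int n, int cand[][2], int m, int kept[][2])
{
    int parent[MAX_V];
    int i, kept_cnt = 0;

    assert(n >= 0 && n <= MAX_V);

    for (i = 0; i < n; i++) parent[i] = i;

    for (i = 0; i < m; i++) {
        int a = find_root(parent, cand[i][0]);
        int b = find_root(parent, cand[i][1]);

        if (a == b) continue;            /* this edge would close a cycle */

        parent[a] = b;                   /* merge the two components */
        kept[kept_cnt][0] = cand[i][0];
        kept[kept_cnt][1] = cand[i][1];
        kept_cnt++;
    }
    return kept_cnt;
}
```

Because a disconnected input simply leaves several components unmerged, the same loop yields a forest without any special handling.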


Reflection

I really enjoyed this problem. I think that it is really interesting trying to sort a partial

ordering. When we first learn the sorting problem, we sort integers, which have a total ordering.

Although it seems like it would be simpler to sort a less-constrained partial ordering (and maybe

it is), it seems to me that it is harder. The current algorithm that I used for this topological sort (in

which I find the minimum element, remove it, and repeat) is the partial ordering analog of

Selection Sort.

I have tried thinking of how other sorting algorithms would be implemented on partial

orderings, but that is where I find issues. For instance, quicksort works by partitioning an array

into two subarrays: one array is full of elements less than the pivot and one array is full of

elements greater than the pivot. So how could we apply this method to a partial ordering? We

cannot partition the array into the two subarrays, because we do not have the luxury of every

two elements being comparable. Surprisingly (surprising to me, at least) our problem has become

harder because of our less-constraining comparison function.
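
A small example makes the partitioning problem concrete. The sketch below uses divisibility on positive integers as a stand-in partial ordering (a is "below" b when a divides b); that choice, along with the relation type and the function names, is an assumption made purely for illustration.

```c
#include <assert.h>

typedef enum { BELOW, ABOVE, INCOMPARABLE } relation;   /* hypothetical */

/* Compare a against pivot under the "divides" partial ordering. */
relation compare_divides(int a, int pivot)
{
    assert(a > 0 && pivot > 0);

    if (pivot % a == 0) return BELOW;   /* a divides pivot (or equals it) */
    if (a % pivot == 0) return ABOVE;   /* pivot divides a */
    return INCOMPARABLE;                /* neither divides the other */
}

/* Count the items that a quicksort-style partition could not place. */
int count_incomparable(const int items[], int n, int pivot)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        if (compare_divides(items[i], pivot) == INCOMPARABLE) count++;

    return count;
}
```

With pivot 6 and the items 2, 3, 4, 5, 12, the values 2 and 3 fall below the pivot and 12 falls above it, but 4 and 5 belong to neither side, which is exactly where the quicksort partition step breaks down.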


C Implementation:

Makefile
    Makefile

Header Files
    globals.h
    graph.h
    list.h
    queue.h

Source Files
    main.c
    graph.c
    list.c
    queue.c


#
# Programmer: Willie Boag
#
# Makefile for Topological Sort
#

tsort: main.o graph.o queue.o list.o
	gcc -o tsort main.o graph.o queue.o list.o

main.o: main.c globals.h graph.h
	gcc -ansi -pedantic -Wall -c -g main.c

graph.o: graph.c graph.h globals.h queue.h
	gcc -ansi -pedantic -Wall -c -g graph.c

queue.o: queue.c queue.h globals.h list.h
	gcc -ansi -pedantic -Wall -c -g queue.c

list.o: list.c list.h globals.h
	gcc -ansi -pedantic -Wall -c list.c

clean:
	rm -f *.o


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* globals.h (Topological Sort)                           */
/**********************************************************/

#ifndef _globals
#define _globals

#define DATA( L ) ( ( L ) -> datapointer )
#define NEXT( L ) ( ( L ) -> next )

#define RIGHT(T) ( (T) -> right )
#define LEFT(T) ( (T) -> left )

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef void *generic_ptr ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* graph.h (Topological Sort)                             */
/**********************************************************/

#ifndef _graph
#define _graph

#include "globals.h"

typedef int vertex ;
typedef struct { int weight; vertex vertex_number; } edge ;

#define UNUSED_WEIGHT (32767)
#define WEIGHT(p_e) ((p_e) -> weight)
#define VERTEX(p_e) ((p_e) -> vertex_number)

typedef enum { directed, undirected } graph_type ;
typedef enum { DEPTH_FIRST, BREADTH_FIRST, TOPOLOGICAL } searchorder ;

typedef struct {
    graph_type type ;
    int number_of_vertices ;
    edge **matrix ;
} graph_header, *graph ;

extern status traverse_graph( graph G, searchorder order, status (*p_func_f)() ) ;
extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) ;
extern void destroy_graph( graph *p_G ) ;
extern status add_edge( graph G, vertex vertex1, vertex vertex2, int weight ) ;
extern status delete_edge( graph G, vertex vertex1, vertex vertex2 ) ;
extern bool isadjacent( graph G, vertex vertex1, vertex vertex2 ) ;
extern void graph_size( graph G, int *p_vertex_cnt, int *p_edge_cnt ) ;
extern edge *edge_iterator( graph G, vertex vertex_number, edge *p_last_return ) ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* list.h (Topological Sort)                              */
/**********************************************************/

#ifndef _list
#define _list

#include "globals.h"

typedef struct node node, *list ;
struct node { generic_ptr datapointer; list next; } ;

extern status allocate_node( list *p_L, generic_ptr data ) ;
extern void free_node( list *p_L ) ;

extern status init_list( list *p_L ) ;
extern bool empty_list( list L ) ;
extern status insert( list *p_L, generic_ptr data ) ;
extern status append( list *p_L, generic_ptr data ) ;
extern status delete( list *p_L, generic_ptr *p_data ) ;
extern status delete_node( list *p_L, list node ) ;

extern status traverse( list L, status (*p_func_f)() ) ;
extern status find_key( list L, generic_ptr key, int (*p_cmp_f)(), list *p_keynode ) ;

extern list list_iterator( list L, list lastreturn ) ;
extern void destroy( list *p_L, void (*p_func_f)() ) ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* queue.h (Topological Sort)                             */
/**********************************************************/

#ifndef _queue
#define _queue

#include "globals.h"
#include "list.h"

typedef struct { node *front ; node *rear ; } queue ;

#define FRONT(Q) ((Q) -> front)
#define REAR(Q) ((Q) -> rear)

status init_queue( queue *p_Q ) ;
bool empty_queue( queue *p_Q ) ;
status qadd( queue *p_Q, generic_ptr data ) ;
status qremove( queue *p_Q, generic_ptr *p_data ) ;
void qprint( queue Q, status (*p_func_f)() ) ;

#endif


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* main.c (Topological Sort)                              */
/**********************************************************/

#include "graph.h"
#include "globals.h"
#include <stdio.h>

status write_vertex( int a ) {

    printf( " %d ", a ) ;

    return OK ;

}

int main( int argc, char *argv[] ){

    FILE *fileptr ;
    int weight ;
    int from ;
    int to ;
    int numberofvertices ;
    graph G ;

    /* The input file name is required as the first argument. */
    if (argc < 2) {
        fprintf( stderr, "usage: %s graphfile\n", argv[0] ) ;
        return 1 ;
    }

    fileptr = fopen( argv[1], "r" ) ;
    if (fileptr == NULL) {
        fprintf( stderr, "cannot open %s\n", argv[1] ) ;
        return 1 ;
    }

    if (fscanf( fileptr, "%d", &numberofvertices ) != 1 ||
        init_graph( &G, numberofvertices, directed ) == ERROR) {
        fclose( fileptr ) ;
        return 1 ;
    }

    /* Stop at end of file or at the first malformed edge line. */
    while (fscanf( fileptr, "%d %d %d", &from, &to, &weight) == 3)
        add_edge( G, from, to, weight ) ;

    printf("\n Topological Traversal: ") ;
    traverse_graph( G, TOPOLOGICAL, write_vertex ) ;

    printf("\n\n") ;

    destroy_graph( &G ) ;

    fclose(fileptr) ;

    return 0 ;

}


/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* graph.c (Topological Sort)                             */
/**********************************************************/

#include <stdlib.h>
#include "globals.h"
#include "graph.h"
#include "queue.h"

#include <stdio.h>

extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) {

    graph G ;
    int i, j ;

    G = (graph) malloc( sizeof(graph_header)) ;
    if (G == NULL) return ERROR ;

    G -> number_of_vertices = vertex_cnt ;
    G -> type = type ;
    G -> matrix = (edge **) malloc(vertex_cnt * sizeof(edge *)) ;

    if (G -> matrix == NULL) {
        free(G) ;
        return ERROR ;
    }

    /* One contiguous block holds every row of the adjacency matrix. */
    G -> matrix[0] = (edge *) malloc(vertex_cnt * vertex_cnt * sizeof(edge)) ;

    if (G -> matrix[0] == NULL) {
        free(G -> matrix) ;
        free(G) ;
        return ERROR ;
    }

    for (i = 1 ; i < vertex_cnt ; i++)
        G -> matrix[i] = G -> matrix[0] + vertex_cnt * i ;

    for (i = 0 ; i < vertex_cnt ; i++) {
        for (j = 0 ; j < vertex_cnt ; j++) {
            G -> matrix[i][j].weight = UNUSED_WEIGHT ;
            G -> matrix[i][j].vertex_number = j ;
        }
    }

    *p_G = G ;

    return OK ;

}

extern void destroy_graph( graph *p_G ) {

    free((*p_G) -> matrix[0] ) ;
    free((*p_G) -> matrix ) ;
    free(*p_G) ;

    /* Clear the caller's handle, not the local copy of its address. */
    *p_G = NULL ;

}

extern status add_edge( graph G, vertex vertex1, vertex vertex2, int weight ) {

    /* Valid vertices are 0 .. number_of_vertices - 1. */
    if (vertex1 < 0 || vertex1 >= G -> number_of_vertices) return ERROR ;
    if (vertex2 < 0 || vertex2 >= G -> number_of_vertices) return ERROR ;
    if (weight <= 0 || weight >= UNUSED_WEIGHT) return ERROR ;

    G -> matrix[vertex1][vertex2].weight = weight ;

    if (G -> type == undirected)
        G -> matrix[vertex2][vertex1].weight = weight ;

    return OK ;

}

extern status delete_edge( graph G, vertex vertex1, vertex vertex2 ) {

    if (vertex1 < 0 || vertex1 >= G -> number_of_vertices) return ERROR ;
    if (vertex2 < 0 || vertex2 >= G -> number_of_vertices) return ERROR ;

    G -> matrix[vertex1][vertex2].weight = UNUSED_WEIGHT ;

    if (G -> type == undirected)
        G -> matrix[vertex2][vertex1].weight = UNUSED_WEIGHT ;

    return OK ;

}

extern bool isadjacent( graph G, vertex vertex1, vertex vertex2 ) {

    if (vertex1 < 0 || vertex1 >= G -> number_of_vertices) return FALSE ;
    if (vertex2 < 0 || vertex2 >= G -> number_of_vertices) return FALSE ;

    return (G -> matrix[vertex1][vertex2].weight == UNUSED_WEIGHT) ? FALSE : TRUE ;

}

extern void graph_size( graph G, int *p_vertex_cnt, int *p_edge_cnt ) {

    int i, j, edges ;

    *p_vertex_cnt = G -> number_of_vertices ;

    edges = 0 ;

    /*
     * An undirected edge is stored twice, so only the upper triangle
     * is counted; a directed graph needs the whole matrix.
     */
    for (i = 0 ; i < G -> number_of_vertices ; i++)
        for (j = (G -> type == directed) ? 0 : i + 1 ; j < G -> number_of_vertices ; j++)
            if (G -> matrix[i][j].weight != UNUSED_WEIGHT) edges++ ;

    *p_edge_cnt = edges ;

    return ;

}

extern edge *edge_iterator( graph G, vertex vertex_number, edge *p_last_return ) {

    vertex other_vertex ;

    if (vertex_number < 0 || vertex_number >= G -> number_of_vertices) return NULL ;

    if (p_last_return == NULL)
        other_vertex = 0 ;
    else
        other_vertex = VERTEX(p_last_return) + 1 ;

    for ( ; other_vertex < G -> number_of_vertices ; other_vertex++) {

        if (G -> matrix[vertex_number][other_vertex].weight != UNUSED_WEIGHT)
            return &G -> matrix[vertex_number][other_vertex] ;

    }

    return NULL ;

}

static status breadth_first_search( graph G, vertex vertex_number, bool visited[], status (*p_func_f)() ) {

    edge *tmp, *p_edge ;
    queue Q ;

    visited[vertex_number] = TRUE ;
    if ((*p_func_f)(vertex_number) == ERROR) return ERROR ;

    init_queue(&Q) ;

    p_edge = NULL ;
    while ( (p_edge = edge_iterator(G, vertex_number, p_edge)) != NULL)
        qadd( &Q, (generic_ptr) p_edge ) ;

    while ( empty_queue(&Q) == FALSE ) {

        qremove( &Q, (generic_ptr *) &tmp) ;

        if (visited[VERTEX(tmp)] == FALSE) {

            visited[VERTEX(tmp)] = TRUE ;
            if ((*p_func_f)(VERTEX(tmp)) == ERROR) return ERROR ;
            p_edge = NULL ;
            while ( (p_edge = edge_iterator(G, VERTEX(tmp), p_edge)) != NULL)
                qadd( &Q, (generic_ptr) p_edge ) ;
        }
    }

    return OK ;

}

static status depth_first_search( graph G, vertex vertex_number, bool visited[], status (*p_func_f)() ) {

    edge *p_edge ;
    status rc ;

    visited[vertex_number] = TRUE ;

    if ((*p_func_f)(vertex_number) == ERROR) return ERROR ;

    p_edge = NULL ;

    while ( (p_edge = edge_iterator(G, vertex_number, p_edge)) != NULL)

        if (visited[VERTEX(p_edge)] == FALSE) {

            rc = depth_first_search(G, VERTEX(p_edge), visited, p_func_f) ;

            if (rc == ERROR) return ERROR ;

        }

    return OK ;

}

static int *count_predecessors( graph G ) {

    int vertex_cnt, edge_cnt, *pred ;
    int i ;
    edge *p_edge ;

    graph_size( G, &vertex_cnt, &edge_cnt) ;

    pred = (int *) malloc( sizeof(int) * vertex_cnt ) ;
    if (pred == NULL) return NULL ;

    for (i = 0 ; i < vertex_cnt ; i++) pred[i] = 0 ;

    for (i = 0 ; i < vertex_cnt ; i++) {

        p_edge = NULL ;

        while ( (p_edge = edge_iterator( G, i, p_edge)) != NULL )
            pred[VERTEX(p_edge)]++ ;

    }

    return pred ;

}

static int extract_min( int pred[], int n ) {

int i ;

/* Uses assumption that at least one element has a value of zero. */
for (i = 0 ; i < n ; i++)
    if (pred[i] == 0) {
        pred[i] = -1 ;
        return i ;
    }

/* Should never get here. */
return -1 ;

}

static status topological_sort( graph G, status (*p_func_f)() ) {

int vertex_cnt, edge_cnt, *pred ;
int count = 0, ind ;
edge *p_edge ;

graph_size( G, &vertex_cnt, &edge_cnt) ;

pred = count_predecessors( G ) ;
if (pred == NULL) return ERROR ;

while ( count < vertex_cnt ) {

    ind = extract_min( pred, vertex_cnt ) ;

    /* No vertex without predecessors remains, so the graph has a cycle. */
    if (ind < 0) { free(pred) ; return ERROR ; }

    if ((*p_func_f)( ind ) == ERROR) { free(pred) ; return ERROR ; }

    count++ ;

    p_edge = NULL ;
    while ( (p_edge = edge_iterator( G, ind, p_edge)) != NULL )
        pred[VERTEX(p_edge)]-- ;

}

free(pred) ;

return OK ;

}

extern status traverse_graph( graph G, searchorder order, status (*p_func_f)() ) {

status rc ;
bool *visited ;
int vertex_cnt, edge_cnt ;
int i ;

graph_size( G, &vertex_cnt, &edge_cnt) ;

visited = (bool *) malloc( sizeof(bool) * vertex_cnt);

if (visited == NULL) return ERROR ;

for (i = 0 ; i < vertex_cnt ; i++) visited[i] = FALSE ;

for ( rc = OK, i = 0 ; i < vertex_cnt && rc == OK ; i++) {

    if (visited[i] == FALSE) {

        switch (order) {

            case DEPTH_FIRST:
                rc = depth_first_search(G, i, visited, p_func_f) ;
                break ;

            case BREADTH_FIRST:
                rc = breadth_first_search(G, i, visited, p_func_f) ;
                break ;

            case TOPOLOGICAL:
                /* One call orders every vertex, so end the loop afterwards. */
                i = vertex_cnt ;
                rc = topological_sort( G, p_func_f ) ;
                break ;

        }
    }
}

free(visited) ;

/* Propagate any error reported by the traversal. */
return rc ;

}


/**********************************************************/
/* Programmer: Willie Boag */
/* */
/* list.c (Topological Sort) */
/**********************************************************/

#include <stdlib.h>
#include "list.h"
#include "globals.h"

status allocate_node( list *p_L, generic_ptr data ) {

    list L = (list) malloc(sizeof(node));

    if (L == NULL) return ERROR;

    *p_L = L;
    DATA(L) = data;
    NEXT(L) = NULL;
    return OK;
}

void free_node( list *p_L ) {

    free(*p_L);
    *p_L = NULL;
}

status init_list( list *p_L ) {

    *p_L = NULL;
    return OK;
}

bool empty_list( list L ) {

    return (L == NULL) ? TRUE : FALSE;
}

status insert( list *p_L, generic_ptr data ) {

    list L;

    if (allocate_node(&L, data) == ERROR) return ERROR;

    NEXT(L) = *p_L;
    *p_L = L ;
    return OK;
}

status append( list *p_L, generic_ptr data ) {

    list L, tmplist;

    if (allocate_node(&L, data) == ERROR) return ERROR;

    if (empty_list(*p_L) == TRUE)
        *p_L = L;
    else {
        for (tmplist = *p_L; NEXT(tmplist) != NULL; tmplist = NEXT(tmplist)) ;
        NEXT(tmplist) = L;
    }
    return OK;
}

status delete( list *p_L, generic_ptr *p_data ) {

    if ( empty_list(*p_L)) return ERROR;

    *p_data = DATA(*p_L);
    return delete_node(p_L, *p_L);
}

status delete_node( list *p_L, list node ) {

    list L;

    if (empty_list(*p_L) == TRUE) return ERROR;

    if (*p_L == node)
        *p_L = NEXT(*p_L);
    else {
        for (L = *p_L; L != NULL && NEXT(L) != node; L = NEXT(L)) ;

        if (L == NULL )
            return ERROR;
        else
            NEXT(L) = NEXT(node);
    }
    free_node(&node);
    return OK;
}

status traverse( list L, status (*p_func_f) () ) {

    if (empty_list(L)) return OK;

    if ((*p_func_f)(DATA(L)) == ERROR) return ERROR;

    return traverse(NEXT(L), p_func_f);
}

status find_key( list L, generic_ptr key, int (*p_cmp_f)(), list *p_keynode ) {

    list curr = NULL;

    while ( ( curr = list_iterator(L, curr)) != NULL ) {
        if ((*p_cmp_f)(key, DATA(curr)) == 0 ) {
            *p_keynode = curr;
            return OK;
        }
    }
    return ERROR;
}

list list_iterator( list L, list lastreturn ) {

    return (lastreturn == NULL) ? L : NEXT(lastreturn);
}

void destroy( list *p_L, void (*p_func_f) () ) {

    if (empty_list(*p_L) == FALSE) {
        destroy(&NEXT(*p_L), p_func_f);
        if (p_func_f != NULL) (*p_func_f)(DATA(*p_L));
        free_node(p_L);
    }
}


/****************************************************/
/* Programmer: Willie Boag */
/* */
/* queue.c (Topological Sort) */
/****************************************************/

#include <stdlib.h>

#include "globals.h"
#include "queue.h"
#include "list.h"

#include <stdio.h>

extern status init_queue( queue *p_Q ) {

/* *Initialize the queue to empty. */

FRONT( p_Q ) = NULL; REAR( p_Q ) = NULL ;

return OK ;

}

extern bool empty_queue( queue *p_Q ) {

/* * Return TRUE if queue is empty, FALSE otherwise. */

return (FRONT(p_Q) == NULL) ? TRUE : FALSE ;

}

extern status qadd( queue *p_Q, generic_ptr data ) {

/*
 * Add data to p_Q.
 */
list newnode ;

if (allocate_node(&newnode, data) == ERROR) return ERROR;

if (empty_queue(p_Q) == FALSE) {
    NEXT(REAR(p_Q)) = newnode ;
    REAR(p_Q) = newnode ;
} else {
    FRONT(p_Q) = REAR(p_Q) = newnode ;
}

return OK ;
}

extern status qremove( queue *p_Q, generic_ptr *p_data ) {

/* * Remove a value from p_Q and put in p_data. */

list nodeinfront ;

if (empty_queue(p_Q) == TRUE) return ERROR;

nodeinfront = FRONT(p_Q) ;
*p_data = DATA(nodeinfront) ;

/*
 * Note: the list node itself is not freed here. qprint depends on that,
 * because it walks a by-value copy of the queue using qremove.
 */
if (REAR(p_Q) == FRONT(p_Q))
    REAR(p_Q) = FRONT(p_Q) = NULL ;
else
    FRONT(p_Q) = NEXT(nodeinfront) ;

return OK ;

}

extern void qprint( queue Q, status (*p_func_f)() ) {

node *temp ;

if (empty_queue(&Q) == TRUE) return ;

qremove( &Q, (generic_ptr *) &temp ) ;

(*p_func_f)( temp ) ;

qprint( Q, p_func_f ) ;

return ;

}


Bloom Filters Abstract:

Often, data is stored in sets. Two of the most important operations on sets are insertion and

testing for membership. When time and space are the most important aspects of data look-up, the

Bloom Filter data structure can be used. The drawback of Bloom Filters is that they are

probabilistic by nature. Consequently, they could report a false positive for membership.

However, they never report false negatives.

Description:

There are many ways to represent a set in memory. If there exists a one-to-one

correspondence between your data and the natural numbers, then the obvious choice would be a

bit vector. However, such a correspondence is not always feasible, for instance when storing

strings.

Another implementation could be a hash table (which has a function that will map your

data to an index into your table). The good thing about a hash table is that it strives to locate data

very quickly. When data is stored in the table, its location is determined via the hashing function.

The goal is to quickly compute the index returned by the hashing function, which would only

take some constant amount of time. However, one of the most significant disadvantages of a

hash table results from “collisions” (when different pieces of data map to the same index). One

solution to this problem is “chaining” pieces of data together when they collide. In this solution,

although data would hash to the index in a constant amount of time, there would still be the


possibility of traversing a chain of data in order to find a particular element. As a result, constant

search time is not guaranteed.

Bloom Filters are similar to hash tables, except that they have guaranteed constant search

time (but at the price of certainty). They are implemented as bit vectors, but not with the usual

one-to-one correspondence.

When data needs to be stored, multiple hash functions are calculated using that data, and

each hash produces its own index into the bit vector. The bit at each hashed index is then set to

true. An important point here is that the inserted data is not actually stored. Instead, resultant data

is generated from it, and that resultant data is stored (in the bit vector). Consequently, when you

test whether previously inserted data is in the set, the hash functions will generate the same

indices that have already been set to true, and it will be determined that the data must have been

added to the set already.
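The insertion and membership steps described above can be sketched with a true bit vector. This is only an illustration under stated assumptions, not the implementation listed later in this write-up: the names `bit_bloom`, `bb_init`, `bb_insert`, and `bb_member` are hypothetical, the 64-bit filter size is arbitrary, and the two hash functions (the well-known djb2 and sdbm string hashes) are stand-ins chosen just for the example.

```c
#include <string.h>

#define BB_BITS 64                      /* assumed filter size in bits */

typedef struct {
    unsigned char bits[BB_BITS / 8];    /* one bit per index */
} bit_bloom;

/* djb2 string hash (a stand-in, not one of the project's hash functions). */
unsigned long hash_djb2(const char *s) {
    unsigned long h = 5381UL;
    while (*s != '\0') h = h * 33UL + (unsigned char) *s++;
    return h;
}

/* sdbm string hash (a second stand-in). */
unsigned long hash_sdbm(const char *s) {
    unsigned long h = 0UL;
    while (*s != '\0') h = (unsigned char) *s++ + (h << 6) + (h << 16) - h;
    return h;
}

/* Start with every bit cleared (the empty set). */
void bb_init(bit_bloom *b) {
    memset(b->bits, 0, sizeof b->bits);
}

void bb_set_bit(bit_bloom *b, unsigned long i) {
    b->bits[i / 8] |= (unsigned char) (1u << (i % 8));
}

int bb_get_bit(const bit_bloom *b, unsigned long i) {
    return (b->bits[i / 8] >> (i % 8)) & 1;
}

/* Insert: set the bit chosen by each hash. The string itself is not stored. */
void bb_insert(bit_bloom *b, const char *s) {
    bb_set_bit(b, hash_djb2(s) % BB_BITS);
    bb_set_bit(b, hash_sdbm(s) % BB_BITS);
}

/* Member: 1 means "maybe present", 0 means "definitely absent". */
int bb_member(const bit_bloom *b, const char *s) {
    return bb_get_bit(b, hash_djb2(s) % BB_BITS)
        && bb_get_bit(b, hash_sdbm(s) % BB_BITS);
}
```

After `bb_init`, every membership test returns 0. Once a string has been passed to `bb_insert`, `bb_member` returns 1 for that string no matter what else is inserted afterwards, which is the "no false negatives" property.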

Benefits:

As mentioned above, Bloom Filters are implemented as bit vectors, which means that

they take up very little space. This is a significant gain over hash tables, which not only store the

inserted data, but also have lots of unused space in order to probabilistically avoid collisions.

In addition, Bloom Filters have guaranteed constant look-up time. Unlike hash tables,

Bloom Filters never “chain” in the event of a collision. Consequently, the time required for the

“yes” or “no” to be generated by searching for a member is independent of the number of

collisions. Furthermore, there can never be false negatives, because once a member has been

added to the set, its hash-generated indices will always be set to true. Therefore, when it is tested

for membership at a later time, the test will never fail.

Drawbacks:

If a piece of data that was never added to the set just happens to hash to indices that

have all been set to true by other members of the set, then the Bloom Filter will mistakenly

believe that the current data was inserted, and report a false positive for membership.

Just as with hash tables, the higher the number of collisions, the less ideal the

performance of the data structure. Despite the low memory cost and guaranteed constant

search time, collisions result in the generation of false positives. As a result, Bloom Filters would

not be appropriate for situations that require high accuracy.
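How often a false positive occurs can be estimated with the standard approximation for Bloom filters. The symbols are notation introduced here, not taken from the code: m is the number of positions in the array, k is the number of hash functions, and n is the number of inserted elements. The estimate assumes the hash functions spread elements uniformly and independently, which very simple hash functions may not do.

```latex
p_{\text{false positive}} \approx \left(1 - e^{-kn/m}\right)^{k}
```

For example, with m = 32, k = 3, and n = 10, the estimate is roughly 0.23, so a small array fills up quickly and false positives become common.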


Implementation Choices:

For my implementation, I chose to use an array of integers (instead of bits) in order to

simplify the code. My code focuses more on the concept than the efficiency of space.

Furthermore, my hashing functions are very simple. More sophisticated functions (usually chosen

to suit the kind of data to be evaluated) would be used in practice in order to avoid collisions.

I chose to represent my data structure as a struct with two fields: the “bit” array, and the

size of that array. My functions are able to initialize the structure, insert, test for membership,

delete, and free the structure.

My Bloom Filter was implemented in a polymorphic manner. As a result, I created very

basic interface functions. In addition, I separated the hashing functions from the primitives,

allowing them to be modified without re-compilation of the primitive functions.

I was able to delete from my Bloom Filter, because my array (of integers) could hold

more values than just true or false. I chose to store how many inserted elements have mapped to

each index. As a result, I could safely delete from the array without the fear of zeroing out the

index of a member that also hashed to the same index as the element to be removed.

I used three hashing functions. I chose to reduce the calculated index modulo the array size inside

of my primitives, as opposed to inside of my hashing functions. This decision was made so that

the hashing functions do not need to know how large the Bloom Filter bit array is. For the

purposes of dealing with strings, the three chosen functions calculate:

1. The sum of the ASCII values of the characters of the string.

2. The product of the ASCII values of the characters of the string.

3. The length of the string.

Motivation:

I wanted to implement and discuss Bloom Filters, because I find the idea of

probabilistic data structures to be fascinating. Just like with the randomized pivot selection of

quicksort, there is a seemingly mystical quality of randomness that can allow our algorithms and

data structures to perform better (in expectation) when we don’t know their exact behavior

before runtime. This concept seems counterintuitive, yet powerful.


Results:

In this example, I inserted the words of poem.txt into my Bloom Filter. I then tested to

see if 20 very common English words were (probably) in my set. The words “to”, “of”, and

“you” (which actually are in the file) were correctly stated to have been found using the Bloom

Filter. In addition, the words “be”, “in”, and “have” were also predicted to be in my set, because

they hashed to the same indices that were set to true by the words of poem.txt. One false positive

example is explored in more detail below, showing which words caused the false positive “in”.

          in    to    belong    base

hash1:    23     3        23      27

hash2:    30    12        16      30

hash3:     2     2         6       4
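The indices in this table can be recomputed from the three hashing rules (sum of character codes, product of character codes, string length), each reduced modulo the array size of 32 used in main.c. The sketch below is a standalone check rather than the project's hash.c: the function names are hypothetical, and it uses unsigned arithmetic so that the product wraps around predictably for longer words such as "belong".

```c
#include <string.h>

#define TABLE_SIZE 32   /* the size passed to init_bloom in main.c */

/* Sum of the character codes of the string. */
unsigned long sum_hash(const char *s) {
    unsigned long sum = 0UL;
    while (*s != '\0') sum += (unsigned char) *s++;
    return sum;
}

/* Product of the character codes, in unsigned arithmetic (wraps on overflow). */
unsigned long product_hash(const char *s) {
    unsigned long prod = 1UL;
    while (*s != '\0') prod *= (unsigned char) *s++;
    return prod;
}

/* Length of the string. */
unsigned long length_hash(const char *s) {
    return (unsigned long) strlen(s);
}

/* Fill idx[0..2] with the three array indices probed for word. */
void bloom_indices(const char *word, unsigned long idx[3]) {
    idx[0] = sum_hash(word)     % TABLE_SIZE;
    idx[1] = product_hash(word) % TABLE_SIZE;
    idx[2] = length_hash(word)  % TABLE_SIZE;
}
```

Calling `bloom_indices("in", idx)` yields 23, 30, and 2. Each of those positions was already set by a real member: 23 by "belong" (hash1), 30 by "base" (hash2), and 2 by "to" (hash3), which is why "in" is reported as a false positive.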


C Implementation:

Makefile

    Makefile

Header Files

    globals.h

    bloom.h

    bloom_interface.h

    hash.h

Source Files

    main.c

    bloom.c

    bloom_interface.c

    hash.c


#
# Programmer: Willie Boag
#
# Makefile for Bloom Filters
#

bloom: bloom.o main.o hash.o bloom_interface.o
	gcc -o bloom bloom.o main.o hash.o bloom_interface.o

bloom.o: bloom.c bloom.h globals.h hash.h
	gcc -ansi -pedantic -Wall -c bloom.c

main.o: main.c bloom.h globals.h bloom_interface.h
	gcc -ansi -pedantic -Wall -c main.c

hash.o: hash.c hash.h globals.h
	gcc -ansi -pedantic -Wall -c hash.c

bloom_interface.o: bloom_interface.c bloom.h globals.h
	gcc -ansi -pedantic -Wall -c bloom_interface.c

clean:
	rm -f *.o


/*************************************************************/
/* Programmer: Willie Boag */
/* */
/* globals.h (Bloom Filters) */
/*************************************************************/

#ifndef _globals
#define _globals

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef void *generic_ptr ;

#endif


/*****************************************************/
/* Programmer: Willie Boag */
/* */
/* bloom.h (Bloom Filters) */
/*****************************************************/

#ifndef _bloom
#define _bloom

#include "globals.h"

typedef struct {
    int * base ;
    int size ;
} bloom ;

extern status init_bloom( bloom *p_B, int size ) ;
extern void bloom_insert( bloom *p_B, generic_ptr data ) ;
extern bool bloom_member( bloom *p_B, generic_ptr data ) ;
extern status bloom_delete( bloom *p_B, generic_ptr data ) ;
extern void destroy_bloom( bloom *p_B ) ;

#endif


/*****************************************************/
/* Programmer: Willie Boag */
/* */
/* bloom_interface.h (Bloom Filters) */
/*****************************************************/

#ifndef _bloom_interface
#define _bloom_interface

#include "globals.h"
#include "bloom.h"

extern void str_bloom_insert( bloom *p_B, char *data ) ;
extern bool str_bloom_member( bloom *p_B, char *data ) ;
extern status str_bloom_delete( bloom *p_B, char *data ) ;

#endif


/*********************************************/
/* Programmer: Willie Boag */
/* */
/* hash.h (Bloom Filters) */
/*********************************************/

#ifndef _hash
#define _hash

/* generic_ptr is defined in globals.h. */
#include "globals.h"

extern int hash1( generic_ptr str ) ;
extern int hash2( generic_ptr str ) ;
extern int hash3( generic_ptr str ) ;

#endif


/********************************************************/
/* Programmer: Willie Boag */
/* */
/* main.c (Bloom Filters) */
/********************************************************/

#include "globals.h"
#include "bloom.h"
#include "bloom_interface.h"

#include <stdio.h>
#include <stdlib.h>

#define CALL_USAGE "\n\tCall Usage: ./bloom input_file comparison_file\n\n"

int main( int argc, char *argv[] ) {

FILE *fid1, *fid2 ;
char word[20] ;
bloom B ;

if (argc != 3) {
    fprintf( stderr, CALL_USAGE ) ;
    exit(1) ;
}

fid1 = fopen( argv[1], "r" ) ;
fid2 = fopen( argv[2], "r" ) ;

/* Error-check file opens. */
if (fid1 == NULL) {
    fprintf( stderr, "\n\tERROR: Could not open file: %s\n\n", argv[1] ) ;
    exit(1) ;
}

if (fid2 == NULL) {
    fprintf( stderr, "\n\tERROR: Could not open file: %s\n\n", argv[2] ) ;
    exit(1) ;
}

if (init_bloom( &B, 32 ) == ERROR) {
    fprintf( stderr, "\n\tERROR: Could not allocate Bloom Filter\n\n" ) ;
    exit(1) ;
}

/* Read input data into Bloom Filter (%19s keeps each word inside word[20]). */
while (fscanf( fid1, "%19s ", word ) != EOF)
    str_bloom_insert( &B, word ) ;

/* Check comparison data for membership. */
while (fscanf( fid2, "%19s ", word ) != EOF)
    if ( str_bloom_member(&B, word) )
        printf("\n\"%s\": maybe", word ) ;
    else
        printf("\n\"%s\": NO", word ) ;

printf("\n\n\n" ) ;

destroy_bloom( &B ) ;

fclose( fid1 ) ;
fclose( fid2 ) ;

return 0 ;

}


/*****************************************************/
/* Programmer: Willie Boag */
/* */
/* bloom.c (Bloom Filters) */
/*****************************************************/

#include "globals.h"
#include "bloom.h"
#include "hash.h"
#include <stdlib.h>
#include <string.h>

#include <stdio.h>

extern status init_bloom( bloom *p_B, int size ) {

int i ;
bloom B ;

B.base = (int *) malloc( sizeof(int) * size ) ;
if (B.base == NULL) return ERROR ;

B.size = size ;

for (i = 0 ; i < size ; i++) B.base[i] = 0 ;

*p_B = B ;

return OK ;

}

extern void bloom_insert( bloom *p_B, generic_ptr data ) {

int h1, h2, h3 ;

h1 = hash1( data ) % p_B->size ;
h2 = hash2( data ) % p_B->size ;
h3 = hash3( data ) % p_B->size ;

p_B->base[h1]++ ;
p_B->base[h2]++ ;
p_B->base[h3]++ ;

}

extern bool bloom_member( bloom *p_B, generic_ptr data ) {

int h1, h2, h3 ;

h1 = hash1( data ) % p_B->size ;
h2 = hash2( data ) % p_B->size ;
h3 = hash3( data ) % p_B->size ;

if (p_B->base[h1] == 0) return FALSE ;
if (p_B->base[h2] == 0) return FALSE ;
if (p_B->base[h3] == 0) return FALSE ;

return TRUE ;

}

extern status bloom_delete( bloom *p_B, generic_ptr data ) {

int h1, h2, h3 ;

h1 = hash1( data ) % p_B->size ;
h2 = hash2( data ) % p_B->size ;
h3 = hash3( data ) % p_B->size ;

/* A counter that is already zero cannot belong to an inserted element. */
if (p_B->base[h1] <= 0) return ERROR ;
if (p_B->base[h2] <= 0) return ERROR ;
if (p_B->base[h3] <= 0) return ERROR ;

p_B->base[h1]-- ;
p_B->base[h2]-- ;
p_B->base[h3]-- ;

return OK ;

}

extern void destroy_bloom( bloom *p_B ) {

free( p_B -> base ) ;
p_B -> base = NULL ;

}


/*****************************************************/
/* Programmer: Willie Boag */
/* */
/* bloom_interface.c (Bloom Filters) */
/*****************************************************/

#include "globals.h"
#include "bloom.h"

extern void str_bloom_insert( bloom *p_B, char *data ) {

bloom_insert( p_B, (generic_ptr) data ) ;

}

extern bool str_bloom_member( bloom *p_B, char *data ) {

return bloom_member( p_B, (generic_ptr) data ) ;

}

extern status str_bloom_delete( bloom *p_B, char *data ) {

return bloom_delete( p_B, (generic_ptr) data ) ;

}


/*******************************************************/
/* Programmer: Willie Boag */
/* */
/* hash.c (Bloom Filters) */
/*******************************************************/

#include "globals.h"
#include <string.h>

extern int hash1( generic_ptr data ) {

int sum = 0 ;
char *str = (char *) data ;

while (*str != '\0') sum += (int) *str++ ;

return sum > 0 ? sum : -sum ;

}

extern int hash2( generic_ptr data ) {

int prod = 1 ;
char *str = data ;

while (*str != '\0') prod *= (int) *str++ ;

return prod > 0 ? prod : -prod ;

}

extern int hash3( generic_ptr data ) {

return strlen( ( char *) data ) ;

}


Fast Fourier Transform Abstract:

This is the algorithm that changed the world. The Fourier Transform is a tool that makes

analyzing waves and signals very easy – and waves are everywhere (light, sound, radar, etc.).

Unfortunately, although the output of the transform is easy to work with, it used to be very

expensive to actually go through the process of applying the transform – which defeated the

purpose of it altogether. But in 1965, James Cooley and John Tukey discovered an efficient way

to calculate the Fourier Transform of a signal. Their efficient algorithm is called the Fast Fourier

Transform (FFT), because it is much more efficient than the original method for computing the

Discrete Fourier Transform (DFT). As a result, massive amounts of data are able to be analyzed

and processed in near-real time, profoundly impacting a wide range of technology, including MRIs

and police radar guns.


Fourier Transform:

Using the Fourier Transform, any wave can be “broken down” into its fundamental building

blocks – sine waves. The idea is that for any given wave, you can represent it as the sum of sine

waves of different frequencies. Such a representation allows for easy calculations in noise

reduction and convolution of waves.

The concept of breaking a wave down into its natural building blocks is better understood

using a simpler example. Consider two 3-dimensional vectors, which we can call a and b. The goal

is to add these two vectors together. There are many different methods that can be used to add

them (such as physically moving them for the tail-to-tip method), but the most natural way to do it

is to break the two vectors down into their x, y, and z components and add the corresponding

components together. The important step here was breaking our vectors down into their x, y, and z

coordinates. This made the addition of a and b very easy. To relate this analogy back to the Fourier

Transform, it is the transform that actually breaks a signal into its corresponding sine waves. The

output of the transform would be the “coefficients” of each wave.


Original Algorithm:

The original method for calculating the Fourier Transform involves generating a matrix of

complex numbers. The data representing the signal is then stored in a vector and multiplied by the

complex matrix. This process is very inefficient, because the time and space required to build and

multiply the n by n matrix (where n is the length of the signal vector) grow in proportion to n^2,

which becomes prohibitive as n grows. The poor scalability of this algorithm is what makes it so impractical. Computers

needed to analyze large amounts of data signals very quickly, and that is just not possible with the

complex matrix method.
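Written out, the matrix method computes every output entry as a sum over the whole input. The notation below is introduced here for illustration (x_j for the input samples, X_k for the outputs), and the sign of the exponent follows the omega vector built in fft.c; other sources use the opposite sign.

```latex
X_k = \sum_{j=0}^{n-1} \omega^{jk}\, x_j,
\qquad \omega = e^{-2\pi i / n},
\qquad k = 0, 1, \dots, n-1
```

Each of the n outputs needs n complex multiplications, so the whole transform costs on the order of n^2 multiplications, and the matrix of powers of ω has n^2 entries to store.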

Fast Fourier Transform:

Cooley and Tukey realized that they could take advantage of the special structure of the

complex matrix that relied on the periodic nature of the complex entries. By separating all of the

odd rows from the even rows of the calculated output vector, they saw an incredible pattern! The

grouped rows could be arranged in such a way that each group was formed by the Fourier

Transform of a smaller vector.

I will say that once again, because of how critical that discovery is. They found a

recursive formula that calculates the FFT of a vector of length n by calculating the FFTs of two

vectors of length (n/2). The calculation of each (n/2)-sized vector would generate two (n/4)-sized

vectors. This process of solving simpler problems could continue until the

problem becomes very easy to solve. Without getting too technical in my analysis, I will just say

that repeatedly cutting the input size in half allowed for very efficient calculations. In addition,

there was no longer the need to calculate and store the very large complex matrix for this

algorithm. As a result, the FFT greatly reduced the cost of performing the Fourier Transform on a

signal vector.
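The effect of repeatedly halving the input can be made concrete with a recurrence. As a sketch, assume combining the two half-size results costs about c·n operations for some constant c, and let T(n) denote the total cost for length n (both symbols are introduced here, not taken from the code):

```latex
T(n) = 2\,T(n/2) + c\,n, \quad T(1) = c
\;\;\Longrightarrow\;\;
T(n) = c\,n \log_2 n + c\,n = O(n \log n)
```

There are log_2 n levels of halving and each level does about c·n work, compared with the roughly n^2 operations of the matrix method.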


A Taste of Recursion:

Without going into too much detail, this section will point out some of the repetitive

structures of the Fast Fourier Transform calculations. The goal of this section is not to understand

every little detail, but instead to begin to see the overall picture. Colors have been used as an aid

when referencing the equations. Even without a full understanding of the mechanics of the

algorithm, the important take-away is that we are able to divide the problem into smaller sub-

problems.

The written expression on the upper-left-hand side of the page describes the components

of a length-4 signal vector after applying the Fourier Transform matrix multiplication. In this

picture, the terms of the form ω (from the complex matrix) have already been evaluated, which is

where the complex numbers involving e came from.

On the upper-right-hand side of the page, I have grouped the odd rows and the even rows

and separated the two groups with a black line. In the even grouping, notice that the terms

(x0 + x2) and (x1 + x3) are the same across the top two rows. In addition, the odd grouping has

(x0 – x2) and (x1 – x3) in the same situation for the bottom two rows.

The important idea behind this observation is that the large system of equations for

calculating the DFT of a length-4 vector can be re-arranged into two smaller groups that exhibit

remarkable similarities in structure. Furthermore, the two terms (x0 + x2) and (x0 – x2) define the

DFT of the length-2 vector {x0, x2}. Likewise, (x1 + x3) and (x1 – x3) is the DFT of the vector

{x1, x3}. As promised, the length-4 DFT can be written in terms of two length-2 DFTs.

For the curious learner, it can be shown that the coefficients (that is, the terms involving e)

are identical within groups. The reason for this involves the periodic structure of complex

numbers. In other words, both groups have the same characteristic of f(x0,x2) + scale * f(x1,x3),

except the top row of each group has a plus sign and the bottom row has a minus sign. There is a

picture at the bottom of the page to clarify the above statements. We can obtain equal coefficients

for the even vector by rewriting the second row, changing the + (as it has above) to -. We can do

the same for the coefficients of the odd vector, making sure to again change the + to – in the

second row. This concludes the analysis for the length-4 FFT.
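The length-4 case described above can be written as a short sketch in C. It is a standalone illustration, not the fft.c listing: the `cplx` type and the function names are hypothetical, and the scale factors are hard-coded as 1 for the even rows and -i (that is, e^(-2πi/4)) for the odd rows, matching the sign convention of the omega vector in fft.c.

```c
/* Hypothetical stand-in for the project's complex type. */
typedef struct { double re, im; } cplx;

cplx c_add(cplx a, cplx b) { cplx r; r.re = a.re + b.re; r.im = a.im + b.im; return r; }
cplx c_sub(cplx a, cplx b) { cplx r; r.re = a.re - b.re; r.im = a.im - b.im; return r; }

/* Multiply by -i: (a + bi)(-i) = b - ai. */
cplx c_mul_minus_i(cplx a) { cplx r; r.re = a.im; r.im = -a.re; return r; }

/* Length-2 DFT of {a, b}: the sum and the difference. */
void dft2(cplx a, cplx b, cplx out[2]) {
    out[0] = c_add(a, b);
    out[1] = c_sub(a, b);
}

/* Length-4 DFT assembled from the DFTs of {x0, x2} and {x1, x3}. */
void dft4(const cplx x[4], cplx X[4]) {
    cplx e[2], o[2], scaled;

    dft2(x[0], x[2], e);        /* (x0 + x2), (x0 - x2) */
    dft2(x[1], x[3], o);        /* (x1 + x3), (x1 - x3) */

    /* Even rows: scale factor 1, plus for the first row, minus for the second. */
    X[0] = c_add(e[0], o[0]);
    X[2] = c_sub(e[0], o[0]);

    /* Odd rows: scale factor -i, again plus then minus. */
    scaled = c_mul_minus_i(o[1]);
    X[1] = c_add(e[1], scaled);
    X[3] = c_sub(e[1], scaled);
}
```

For the real input {1, 2, 3, 4}, this produces 10, -2 + 2i, -2, and -2 - 2i, which agrees with multiplying the input by the full 4 by 4 matrix.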


Why It Matters:

Although that seems like a lot of work for “simplifying” an algorithm, the gains in

efficiency of the FFT over the original DFT algorithm are incredible. The important thing to

consider is how fast the computation time grows as the length of the signal vector increases. I

have implemented both algorithms for the Fourier Transform, and compared the computation

difference for signal vectors of very large length.

When reading the output of the time command, the important number to consider is the

one on the middle line that says “user.” That is the time that was required for the actual algorithm

to run. By comparing these two times, we can see that for a vector of length 8192, the FFT took

0.148 seconds. Compare that to the 60.392 seconds required for the ordinary DFT algorithm. In

case you do not have your calculator with you, the FFT was about 408 times faster.
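As a rough sanity check on that measurement, the two growth rates can be compared directly. The sketch below assumes the matrix method costs about n^2 operations and the FFT about n·log2(n); the function names are hypothetical, and constant factors (such as the memory allocation done at every level of the recursive FFT) are ignored.

```c
/* Approximate cost of the matrix method: one multiplication per matrix entry. */
unsigned long dft_ops(unsigned long n) {
    return n * n;
}

/* Integer log base 2, valid when n is a power of two. */
unsigned long ilog2(unsigned long n) {
    unsigned long k = 0UL;
    while (n > 1UL) { n /= 2UL; k++; }
    return k;
}

/* Approximate cost of the FFT: about n operations on each of log2(n) levels. */
unsigned long fft_ops(unsigned long n) {
    return n * ilog2(n);
}
```

For n = 8192, `dft_ops` gives 67,108,864 and `fft_ops` gives 106,496, a predicted ratio of about 630. The measured ratio of about 408 is of the same order, which is what this kind of estimate can be expected to show.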

Conclusion:

The FFT is tremendously useful for breaking down signals and waves into their natural

building blocks. Once in their more natural form, computations such as combining two

overlapping signals together become very easy. As a result, signals can be processed at

incredibly high speeds. Imagine trying to meaningfully interpret MRI results if it took 400 times

longer to process the data.


C Implementation:

Makefile

    Makefile

Header Files

    complex.h

Source Files

    fft.c

    dft.c

    complex.c


#
# Programmer: Willie Boag
#
# Makefile for Fast Fourier Transform
#

all: dft fft

fft: fft.o complex.o
	gcc -o fft fft.o complex.o -lm

dft: dft.o complex.o
	gcc -o dft dft.o complex.o -lm

dft.o: dft.c complex.h
	gcc -ansi -pedantic -Wall -c dft.c -D_GNU_SOURCE

complex.o: complex.c complex.h
	gcc -ansi -pedantic -Wall -c complex.c

clean:
	rm -f *.o


/***************************************************************/
/* Programmer: Willie Boag */
/* */
/* complex.h (Fast Fourier Transform) */
/***************************************************************/

#ifndef _complex
#define _complex

typedef struct { double real ; double imaginary ; } complex ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef enum { ERROR, OK } status ;

complex load_complex( double real, double imaginary ) ;

complex add_complex( complex a, complex b ) ;
complex multiply_complex( complex a, complex b ) ;
complex subtract_complex( complex a, complex b ) ;

#endif


/***************************************************************/
/* Programmer: Willie Boag */
/* */
/* fft.c (Fast Fourier Transform) */
/***************************************************************/

#include "complex.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef enum { FORWARD, INVERSE } direction ;

/* Global variable. */
complex *omega ;

complex *create_vector( int n ) {

return (complex *) malloc( sizeof(complex) * n ) ;

}

void fill_omega_vector( int n ) {

int i ;
double real, imag ;

omega = create_vector( n ) ;

for ( i = 0 ; i < n ; i++ ) {
    real = cos( -(2 * M_PI/n) * i ) ;
    imag = sin( -(2 * M_PI/n) * i ) ;

    omega[i] = load_complex( real, imag ) ;
}

}

void print_vector( complex v[], int n ) {

int i ;

printf( " \n" ) ;

for ( i = 0 ; i < n ; i++ )

printf( " \t%f %f\n", v[i].real, v[i].imaginary ) ;

printf( " \n\n" ) ;

}

complex *FFT( complex *p, int k , int m, direction dir ) {

int n, i ;
complex *transform ;
complex *evens, *odds ;
complex *p_e, *p_o ;
complex scale, scaled_odd ;

n = pow(2, k ) ;

transform = create_vector( n ) ;

if (k == 0) {
    transform[0] = p[0] ;
    return transform ;
}

p_e = create_vector( n/2 ) ;
p_o = create_vector( n/2 ) ;

/* collect evens */
for ( i = 0 ; i < n/2 ; i++ ) p_e[i] = p[2*i] ;

/* collect odds */
for ( i = 0 ; i < n/2 ; i++ ) p_o[i] = p[2*i + 1] ;

/* Two n/2 FFTs */
evens = FFT( p_e, k-1, 2*m, dir ) ;
odds = FFT( p_o, k-1, 2*m, dir ) ;

for ( i = 0 ; i < n/2 ; i++ ) {

    /* Forward or Inverse transform? */
    scale.real = omega[m*i].real ;
    scale.imaginary = (((dir == FORWARD) ? 1 : -1) * omega[m*i].imaginary) ;

    /* scale * odd */
    scaled_odd = multiply_complex( scale, odds[i] ) ;

    /* even + (scale * odd) */
    transform[ i ] = add_complex( evens[i], scaled_odd ) ;

    /* even - (scale * odd) */
    transform[ n/2 + i ] = subtract_complex( evens[i], scaled_odd ) ;
}

/* Scale result by 1/n for inverse FFT. */
if (dir == INVERSE && m == 1)
    for (i = 0 ; i < n ; i++) {
        transform[i].real /= ( double ) n ;
        transform[i].imaginary /= ( double ) n ;
    }

free( p_e ) ;
free( p_o ) ;
free( evens ) ;
free( odds ) ;
return transform ;

}

complex *pointwise_complex_multiply( complex *u, complex*v, int n ) {

int i ; complex *w ;

w = (complex *) malloc( sizeof(complex) * n ) ;

for ( i = 0 ; i < n ; i++ )

w[i] = multiply_complex( u[i], v[i] ) ;

return w ;

}

int main( int argc, char *argv[] ) {

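int i, n , k ;
complex *u, *v, *w ;
complex *uf, *vf, *wf ;

if (argc != 2) {
    fprintf( stderr, "\n\tCall Usage: ./fft n\n\n" ) ;
    return 1 ;
}

n = atoi( argv[1] ) ;

/* The recursion halves the vector at every level, so n must be a power of two. */
if (n < 2 || (n & (n - 1)) != 0) {
    fprintf( stderr, "\n\tERROR: n must be a power of two (at least 2)\n\n" ) ;
    return 1 ;
}

/* k = log2(n), computed with integers to avoid rounding in ceil(log(n)/log(2)). */
for ( k = 0 ; (1 << k) < n ; k++ ) ;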

fill_omega_vector( n ) ;

u = create_vector( n ) ; v = create_vector( n ) ;

for ( i = 0 ; i < n/2 ; i++ ) {

u[i] = load_complex( i + 1.0, 0.0 ) ;
u[i+n/2] = load_complex( 0.0 , 0.0 ) ;

v[i] = load_complex( 1.0 , 0.0 ) ;
v[i+n/2] = load_complex( 0.0 , 0.0 ) ;

}

uf = FFT(u, k, 1, FORWARD ) ;
vf = FFT(v, k, 1, FORWARD ) ;

wf = pointwise_complex_multiply( uf, vf, n ) ;

w = FFT(wf, k, 1, INVERSE) ;

printf(" \n\nw:") ;
print_vector(w, n) ;

free( omega ) ;
free( u ) ;
free( uf ) ;
free( v ) ;
free( vf ) ;
free( w ) ;
free( wf ) ;

return 0 ;

}


/*****************************************************/
/* Programmer: Willie Boag */
/* */
/* dft (vs. Fast Fourier Transform) */
/*****************************************************/

#include "complex.h"
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

complex *complex_matrix_vector_multiply( complex *A, complex *x, int n ) ;
complex *pointwise_complex_multiply( complex *u, complex *v, int n ) ;

void print_vector( complex v[], int n ) {

    int i ;

    printf( "\n" ) ;

    for ( i = 0 ; i < n ; i++ )
        printf( "\t%f %f\n", v[i].real, v[i].imaginary ) ;

    printf( "\n\n" ) ;

}

int main( int argc, char *argv[] ) {

    int n, i, j ;
    double real, imag ;
    complex *F, *IF ;
    complex *u, *uf ;
    complex *v, *vf ;
    complex *w, *wf ;

    n = atoi( argv[1] ) ;

    F  = (complex *) malloc( sizeof(complex) * n * n ) ;
    IF = (complex *) malloc( sizeof(complex) * n * n ) ;

    u = (complex *) malloc( sizeof(complex) * n ) ;
    v = (complex *) malloc( sizeof(complex) * n ) ;

    for ( i = 0 ; i < n ; i++ ) {

        /* Fourier Transform matrix */
        for ( j = 0 ; j < n ; j++ ) {
            real = cos(((2 * M_PI)/n) * ((i * j) % n) ) ;
            imag = sin(((2 * M_PI)/n) * ((i * j) % n) ) ;

            F[i*n + j] = load_complex( real, imag ) ;
        }

        /* Inverse Fourier Transform matrix */
        for ( j = 0 ; j < n ; j++ ) {
            real = (1.0/n) * cos(((2 * M_PI)/n) * (-(i * j) % n) ) ;
            imag = (1.0/n) * sin(((2 * M_PI)/n) * (-(i * j) % n) ) ;

            IF[i*n + j] = load_complex( real, imag ) ;
        }
    }

    /* Fill vectors with data */
    for ( i = 0 ; i < n/2 ; i++ ) {
        u[i]     = load_complex( i + 1.0, 0.0 ) ;
        u[i+n/2] = load_complex( 0.0, 0.0 ) ;

        v[i]     = load_complex( 1.0, 0.0 ) ;
        v[i+n/2] = load_complex( 0.0, 0.0 ) ;
    }

    /* Perform Fourier Transform */
    uf = complex_matrix_vector_multiply( F, u, n ) ;
    vf = complex_matrix_vector_multiply( F, v, n ) ;

    wf = pointwise_complex_multiply( uf, vf, n ) ;

    w = complex_matrix_vector_multiply( IF, wf, n ) ;

    printf( "\n\nw:" ) ;
    print_vector( w, n ) ;

    free( u ) ;
    free( uf ) ;
    free( v ) ;
    free( vf ) ;
    free( w ) ;
    free( wf ) ;
    free( F ) ;
    free( IF ) ;

return 0 ;

}

complex *complex_matrix_vector_multiply( complex *A, complex *x, int n ) {

    int i, j ;
    complex *b ;
    complex sum, prod ;

    b = (complex *) malloc( sizeof(complex) * n ) ;

    for ( i = 0 ; i < n ; i++ ) {

        sum = load_complex( 0.0, 0.0 ) ;

        /* Dot product of row i of A with x */
        for ( j = 0 ; j < n ; j++ ) {
            prod = multiply_complex( A[i*n + j], x[j] ) ;
            sum  = add_complex( sum, prod ) ;
        }

        b[i] = sum ;
    }

    return b ;

}

complex *pointwise_complex_multiply( complex *u, complex*v, int n ) {

int i ; complex *w ;

w = (complex *) malloc( sizeof(complex) * n ) ;

for ( i = 0 ; i < n ; i++ )

w[i] = multiply_complex( u[i], v[i] ) ;

return w ;

}
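
Both the FFT program and the DFT program above transform u and v, multiply the transforms pointwise, and then apply the inverse transform. By the convolution theorem, the printed w should therefore equal the cyclic convolution of u and v (and, because the upper halves of both vectors are zero, their ordinary polynomial product). The following is a minimal sketch for cross-checking that output directly. It is not part of the project: the name cyclic_convolve is hypothetical, and it assumes real-valued inputs stored as plain doubles rather than the project's complex type.

```c
#include <assert.h>

/* Hypothetical helper, not part of the project code.
   Computes the cyclic convolution of two real sequences of length n:
       w[k] = sum over j of u[j] * v[(k - j) mod n]                    */
void cyclic_convolve( const double *u, const double *v, double *w, int n ) {

    int j, k ;

    assert( n > 0 ) ;

    for (k = 0 ; k < n ; k++) {

        w[k] = 0.0 ;

        /* Adding n keeps the index non-negative before taking % n. */
        for (j = 0 ; j < n ; j++)
            w[k] += u[j] * v[ (k - j + n) % n ] ;
    }
}
```

For n = 4 the programs load u = (1, 2, 0, 0) and v = (1, 1, 0, 0), and the sketch gives w = (1, 3, 2, 0), which matches the coefficients of (1 + 2x)(1 + x) = 1 + 3x + 2x^2.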


/***************************************************************/
/* Programmer: William George Boag                             */
/*                                                             */
/* complex.c (Fast Fourier Transform)                          */
/***************************************************************/

#include "complex.h"
#include <stdio.h>

/* Build a complex number from its real and imaginary parts. */
extern complex load_complex( double real, double imaginary ) {

    complex c ;

    c.real      = real ;
    c.imaginary = imaginary ;

    return c ;
}

/* a + b */
extern complex add_complex( complex a, complex b ) {

    complex sum ;

    sum.real      = a.real + b.real ;
    sum.imaginary = a.imaginary + b.imaginary ;

    return sum ;
}

/* a * b = (ar*br - ai*bi) + (ar*bi + ai*br)i */
extern complex multiply_complex( complex a, complex b ) {

    complex prod ;
    double ar, ai, br, bi ;

    ar = a.real ;
    ai = a.imaginary ;

    br = b.real ;
    bi = b.imaginary ;

    prod.real      = (ar * br) - (ai * bi) ;
    prod.imaginary = (ar * bi) + (ai * br) ;

    return prod ;
}

/* a - b */
extern complex subtract_complex( complex a, complex b ) {

    complex diff ;

    diff.real      = a.real - b.real ;
    diff.imaginary = a.imaginary - b.imaginary ;

    return diff ;
}


Appendices

Appendix A: Kruskal’s MST Testing Code

Appendix B: Topological Sort Testing Code


/****************************************************************/
/* Programmer: Willie Boag                                      */
/*                                                              */
/* create.c                                                     */
/*                                                              */
/* Task: Create a randomly generated graph that is guaranteed   */
/*       to be connected. The graph is stored in the file       */
/*       tmp.txt                                                */
/****************************************************************/

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main( int argc, char *argv[] ) {

    FILE *fid ;
    int vertices, edges ;
    int i ;
    int a, b, weight ;

    /* Set new seed. */
    srand(time(NULL)) ;

    fid = fopen( "tmp.txt", "w" ) ;

    if (fid == NULL) {
        fprintf( stderr, "\n\tERROR: Could not open file: tmp.txt\n\n" ) ;
        exit(1) ;
    }

    /* Randomly determine number of edges and vertices. */
    vertices = rand() % 20 + 1 ;
    edges    = rand() % 50 ;

    /* Create a graph file. */
    fprintf( fid, "%d\n", vertices ) ;

    for (i = 0 ; i < edges ; i++) {

        /* Random edge */
        a      = rand() % vertices ;
        b      = rand() % vertices ;
        weight = rand() % 30 ;

        /* No self-loops. */
        if (a == b)
            b = (b + 1) % vertices ;

        fprintf( fid, "%d %d %d\n", a, b, weight ) ;
    }

    /* Guarantee that graph is connected: join vertex 0 to every other  */
    /* vertex with a heavy edge. Start at 1 to avoid the self-loop 0-0. */
    for (i = 1 ; i < vertices ; i++)
        fprintf( fid, "0 %d 1000\n", i ) ;

    fclose( fid ) ;

    return 0 ;
}


/************************************************************************/
/* Programmer: Willie Boag                                              */
/*                                                                      */
/* Program: compare.c                                                   */
/*                                                                      */
/* Task: Find a topological sort of a set of randomly created graphs.   */
/*       Then, verify that the computed solution is correct.            */
/************************************************************************/

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <string.h>

#define CALL_USAGE "\n\tCall Usage: ./compare [n]\n\n"

void create_graph( void ) ;
int is_topo( void ) ;

int main( int argc, char *argv[] ) {

    int n = 20, failures = 0 ;
    int i ;
    char command[70] ;

    /* Call Usage error-check */
    if (argc != 1 && argc != 2) {
        fprintf( stderr, CALL_USAGE ) ;
        exit(1) ;
    }

    /* Change number of iterations, if desired. */
    if (argc == 2)
        n = atoi(argv[1]) ;

    for (i = 0 ; i < n ; i++) {

        /* Create random graph (possibly containing cycles). */
        create_graph() ;

        /* Convert the graph into a DAG using the Minimum Spanning Forest algorithm. */
        system( "~wboag/Public/msf .tmp.txt > .tmp2.txt" ) ;

        /* Find a topological sort of the DAG. */
        sprintf( command, "tsort .tmp2.txt > .tmp.txt" ) ;
        system( command ) ;

        /* Verify that the topological sort is correct. */
        if ( !is_topo() ) {
            failures++ ;
            fprintf( stderr, "\n\tFAILURE #%d", failures ) ;
            fprintf( stderr, "\n\tFiles: %d, %d\n\n", failures*2 + 1, failures*2 + 2 ) ;

            /* Keep both files of the failing case for inspection. */
            sprintf( command, "cp .tmp.txt fail%d.txt", 2*failures + 1 ) ;
            system( command ) ;
            sprintf( command, "cp .tmp2.txt fail%d.txt", 2*failures + 2 ) ;
            system( command ) ;
        }
    }

    /* Analysis of tests. */
    printf( "\n\nPassed on %d/%d random graphs.\n\n\n", n - failures, n ) ;

return 0 ;

}


void create_graph( void ) {

    FILE *fid ;
    int vertices, edges ;
    int i ;
    int a, b, weight ;

    /* Set new seed. */
    srand(time(NULL)) ;

    fid = fopen( ".tmp.txt", "w" ) ;

    if (fid == NULL) {
        fprintf( stderr, "\n\tERROR: Could not open file: .tmp.txt\n\n" ) ;
        exit(1) ;
    }

    /* Randomly determine number of edges and vertices. */
    vertices = rand() % 20 + 1 ;
    edges    = rand() % 50 ;

    /* Create a graph file. */
    fprintf( fid, "%d\n", vertices ) ;

    for (i = 0 ; i < edges ; i++) {

        /* Random edge */
        a      = rand() % vertices ;
        b      = rand() % vertices ;
        weight = rand() % 30 ;

        /* No self-loops. Parenthesized so that b stays in 0..vertices-1. */
        if (a == b)
            b = (b + 1) % vertices ;

        fprintf( fid, "%d %d %d\n", a, b, weight ) ;
    }

    fclose( fid ) ;
}

int is_topo( void ) {

    FILE *fid ;
    int vertices ;
    int i, j ;
    int *visited, **depends ;
    int a, b, weight ;

    /* Update dependency matrix for given DAG. */
    fid = fopen( ".tmp2.txt", "r" ) ;
    fscanf( fid, "%*[^:]:" ) ;
    fscanf( fid, "%d", &vertices ) ;

    visited = (int *) malloc( sizeof(int) * vertices ) ;

    depends = (int **) malloc( sizeof(int *) * vertices ) ;
    for (i = 0 ; i < vertices ; i++)
        depends[i] = (int *) malloc( sizeof(int) * vertices ) ;

    /* Initialize the visited "bit vector" */
    for (i = 0 ; i < vertices ; i++)
        visited[i] = 0 ;

    /* Initialize the dependency matrix. */
    for (i = 0 ; i < vertices ; i++)
        for (j = 0 ; j < vertices ; j++)
            depends[i][j] = 0 ;

    /* Edge "a b weight" means that b depends on a. */
    while (fscanf( fid, "%d %d %d", &a, &b, &weight ) != EOF)
        depends[b][a] = 1 ;

    fclose( fid ) ;

    /* Sweep through alleged topological sort. Check dependencies */
    fid = fopen( ".tmp.txt", "r" ) ;
    fscanf( fid, "%d", &a ) ;

    for (i = 0 ; i < vertices ; i++)
        for (j = 0 ; j < vertices ; j++)

            /* Case: dependency is present, but they are out of order. */
            if (depends[i][j] == 1 && visited[j] == 0)
                return 0 ;
            else
                visited[i] = 1 ;

    fclose( fid ) ;

    return 1 ;
}
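
The ordering test that is_topo is meant to carry out can also be stated in terms of vertex positions: an ordering is a valid topological sort exactly when, for every edge a -> b, vertex a appears before vertex b. The following is a minimal sketch of that check and is not the project's code. The function name is_valid_topo_order and its array-based parameters are assumptions made for illustration, and every edge endpoint is assumed to lie in the range 0..n-1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of compare.c.
   Returns 1 if order[0..n-1] is a valid topological order for the
   edges from[e] -> to[e] (e = 0..num_edges-1), and 0 otherwise.     */
int is_valid_topo_order( int n, const int *order,
                         int num_edges, const int *from, const int *to ) {

    int *position ;
    int i, e, ok = 1 ;

    assert( n > 0 ) ;

    position = (int *) malloc( sizeof(int) * n ) ;
    if (position == NULL)
        return 0 ;

    /* position[v] = index of v in order; -1 marks "not seen yet". */
    for (i = 0 ; i < n ; i++)
        position[i] = -1 ;

    for (i = 0 ; i < n && ok ; i++) {
        int v = order[i] ;

        if (v < 0 || v >= n || position[v] != -1)
            ok = 0 ;            /* out of range, or vertex listed twice */
        else
            position[v] = i ;
    }

    /* Every edge must run from an earlier vertex to a later one. */
    for (e = 0 ; e < num_edges && ok ; e++)
        if (position[ from[e] ] >= position[ to[e] ])
            ok = 0 ;

    free( position ) ;

    return ok ;
}
```

Recording each vertex's position first means every edge is checked in constant time, so the whole test costs time proportional to the number of vertices plus the number of edges and does not need a dependency matrix.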