THE WEB SERVER PROJECT
Final Design Report
ECE299 – Communication and Design II
Date of Submission:
Saturday, April 15, 2006
Submitted by:
Daniel Dieana (993124221) Rishi Kohli (994055017) Gaurav Jain (993796380)
Team # 092
EXECUTIVE SUMMARY
Our group has successfully designed a web server that accepts browser
requests and delivers Web pages and other files in response via the HTTP protocol.
This project was implemented as a course requisite for ECE299 by a group of three
students. The server integrates many features which allow for the smooth, continuous
and reliable operation of server-based applications. Features such as dynamic content
handling, multi-threading and error logging have been streamlined into the code to
provide ease of use and maintainability. This report provides a detailed analysis of some
of the design features, their functionality and structure. Also discussed are the important
design decisions taken to ensure the smooth functioning of the web server.
The primary design of the server is based upon the ideas of modularity, cohesion
and coupling. These design elements are explained further in the report; they
provide a means to structure the code in a manner that is coherent and well thought out.
Our server does have certain limitations. It does not handle load balancing and has the
potential to become overwhelmed. As well, there is a multi-threading issue in which the
first connection of a session does not allow new information to be refreshed; this is only a
problem if a user makes numerous requests simultaneously. These issues aside, our
server proves to handle many situations properly, as shown in the testing section.
A load generator was designed to test the server and to produce performance and
reliability statistics. It verified that our server operates within project
specifications: the server reached an average bandwidth of 459 Kbps with an average
response time of 1.58 s. These numbers are impressive when compared to a well-known
server such as the Apache™ HTTP server, which in the same tests averaged 472 Kbps
with a response time of 1.35 s.
TABLE OF CONTENTS
Executive Summary .................................................... II
1. Introduction ....................................................... 1
2. Design and Functionality Overview .................................. 2
   2.1. Server Management ............................................. 2
   2.2. Concurrent Connections ........................................ 5
   2.3. Load Generation ............................................... 6
   2.4. Handling Dynamic Content ...................................... 6
3. Design Basis: Stress on Modularization ............................. 8
4. Testing Methods and Results ....................................... 10
5. Performance ....................................................... 13
6. Server Evaluation ................................................. 13
7. Conclusion ........................................................ 14
APPENDIX A ........................................................... 15
APPENDIX B ........................................................... 16
APPENDIX C ........................................................... 17
APPENDIX D ........................................................... 18
1. INTRODUCTION
This report provides a full analysis on the design and functionality of the web
server implemented by team #92 as a course requisite for ECE299. According to the
guidelines presented in the project handout (http://ccnet.utoronto.ca/20061/ece299h1s/),
our group has implemented the web server in the C++ language, running on the Solaris™
platform. The main goal of this report is twofold:
• To describe the functionality embedded into the server
• To evaluate the server based on testing results
This report contains a design and functionality overview, testing methods and
harnesses, a performance section, and a final server evaluation. The overview provides
a brief outline for some of the features of the server (Server Management, Concurrent
Connections, Handling Dynamic Requests and Load Generation) along with the associated
design decisions. The design was based on three key elements: modularity, cohesion and
coupling. Modularity was essential in the design of the server as it promotes simplicity
and simplifies the process of integration. Cohesion is a measure of how strongly related
and focused the responsibilities of a single class are. This ensured that the likelihood of
code reuse is increased and complexity kept manageable. With low coupling, a change
in one module will not require a change in the implementation of another module. This
provides a well-structured system, though at the cost of some of the program's
efficiency. Testing methodology is provided along with specific examples and outcomes
of certain features. Performance is important for any server and so a summary follows
which provides numerical statistics to reinforce our claims. Finally, a server evaluation
concludes this report.
2. DESIGN AND FUNCTIONALITY OVERVIEW
The following section describes some of the key design features – their
functionality and implementation details. Refer to APPENDIX B for the main design
description flowchart.
2.1. SERVER MANAGEMENT
In order for a web server to run effectively and continuously, it is necessary to get
feedback about certain issues such as the activity and performance of the web server
along with any problems that may be occurring. The following features play a key role in
order to manage the server effectively.
Log Files - Functionality
The web server implemented by the design team includes extensive logging
capabilities. Each client request that is received by the server is parsed into different
details such as the IP Address of the client, the requested file and then appended into a
readable format along with other details such as the current time, date and the file size.
An example taken from the access log file (refer to APPENDIX D), in which our
web server stores the successfully served requests, is provided below:
128.100.175.14 Sat Apr 8 04:41:36 2006 "get public_www HTTP/1.1" 200 359
Storing log information in various log files is the first step in log management. The next
step is to be able to analyze this information to provide important and useful statistics.
Although various log-analysis applications are available on the World Wide Web,
log analysis itself was beyond the scope of the project and will not be discussed in this
document.
(Fields in the entry above: client IP address, date and time, relative URL, size of the file)
Log Files – Design and Structure
The logging capabilities of the web server were implemented using the
object-oriented features of the C++ language. Using inheritance, three separate classes
were derived from one parent class, Log.C. The inheritance chart provided
below further explains the relationship between the classes. The inheritance
mechanism provides the flexibility to add further log files in addition to the three
shown.
Log.C
├── AccessLog.C
├── ErrorLog.C
└── DebugLog.C

Fig 1.1 Log Files Inheritance Structure
Configuration File Format – Functionality
A configuration file is an important tool with which a web server configures itself. The
first instruction our server executes after being run is to read and parse the
configuration file (saved with a .txt extension in the root directory) and obtain
important details such as the port number, root directory and other relevant information.
Configuration File Format – Design and Structure
The feature that reads and parses the configuration file is based on a linked-list
structure, implemented in a separate class, Config.C. The uniqueness of the
design lies in the fact that each new piece of information is parsed into a separate node
of the linked list, which can grow without bound. This allows the flexibility of adding more
parameters to the configuration file at later stages. The Config.C class also has
various functions, such as find("Port"), which are used to retrieve required data
such as the port number.
[Port: 55555] → [DocRoot: public_www] → [AccessLog: logs/access.log]

Fig 1.2 Linked List format of the Config class
Graceful Server Shutdown – Functionality and Design
Our server implements a feature that allows it to be shut down through
the Unix signal mechanism. During a graceful shutdown no new
connections are accepted, but the server responds appropriately to requests it has
already received. Whenever the HUP signal is sent by the operating system (Unix in our
case), the signal handler, HUP_received(), is invoked asynchronously, setting the
global variable 'Terminate' to true. The main program loops continuously while
'Terminate' is false; as soon as it sees the variable set to true, it finishes its current
work and exits. The basic design and flow of the graceful server shutdown is shown below.
Fig. 1.3 Graceful Server Shutdown Flowchart (HUP signal → HUP_received() sets
Terminate to true → main loop checks Terminate: if true, server shutdown; otherwise
continue running the server)
2.2. CONCURRENT CONNECTIONS
The concept of multithreading is used to enable a web server to handle multiple
requests from different users simultaneously. Multithreading involves concurrently
executing different parts of the program, called threads.
Multithreading – Design and Structure
As the server project was implemented in two stages, the fundamental
architecture of the web server from the first stage was changed to incorporate
multithreading. The program now includes an element of parallelism: the
HTTPServer class derives from a base class, Runnable. In the main program, a new
instance of HTTPServer is created for each new connection, along with a new thread
associated with that particular instance, which is then executed. Pseudo-code
elaborating this is provided below.
While (server running) {
    Wait for a new connection;
    Create a new thread;
    Create a new instance of HTTPServer and link it with the thread;
    Run the thread;
}
Design Decisions/Uniqueness

The principal advantage of using the above structure is that it makes the
program very efficient, allowing it to utilize idle time that was wasted
before. Also, a new thread is created for each connection and dynamically destroyed
upon the closure of that connection, avoiding memory leaks.
Secondly, one key decision the design team had to make was to limit
the maximum number of simultaneous connections to the server to 30. This was done to
prevent the server from crashing and to keep load times fast. This is further discussed
in the evaluation section of the report.
2.3. LOAD GENERATION
The load generator is responsible for creating file content and accessing it
through the server. This process is clocked so that vital performance statistics can be
calculated. It was designed as a stand-alone program, separate from the server, which
can be run from the UNIX terminal. The code was designed for simplicity and efficiency.
The following is a code snippet from HTMLGeneration.C:
//Generic file name
char fileName[20] = "classa_b";
//takes care of class0 (below 1k; increment 0.1k)
for (int count = 0; count < 9; count++) {
    strcpy(dir, dirBackUp);          //replace dir with backup
    fileName[5] = '0';
    fileName[7] = (char)count + 49;  //ASCII offset: digits '1' to '9'
    ofstream test(dir);
    for (int sizeCount = 0; sizeCount < class0 + count*102; sizeCount++) {
        test << "A";
    }
}
The fileName indices and the loop bounds show how the files and file sizes are
generated. A generic file name such as "classa_b" is used, where the letters "a" and "b"
are replaced as the loop cycles. To create specific file sizes, the character "A" is written
into the files. One character represents one byte, so to create a file size of 102 Kb
(kilobits), 12,750 "A"s would be written to the file. The inner "for" loop adds characters
until the counter "sizeCount" reaches the target size, at which point the loop exits and
the program moves on to a file of a different size. The function is compact, which allows
it to be called many times; using this feature, files could be generated until the hard
drive runs out of space.
2.4 HANDLING DYNAMIC CONTENT (GET and POST Requests)
In present-day web serving, clients often enter specific data in the browser
window and expect a customized response in return. This was an essential feature of
our web server: it allows the user to enter arbitrary content in the URL and
view custom-made pages depending on the request entered. Generally, dynamic content
handling can be divided into two main categories: GET and POST requests. Initially, inside
our HTTPServer, the request is checked for the dynamic extension, specifically
".dyn". After that, the program checks whether the request is of type GET or POST and
takes further action depending on the request type, as described below (refer
to APPENDIX C for the dynamic request handling algorithm chart).
Dynamic GET requests: Our program parses and separates the dynamic
parameters from the URL, arranges them in a particular format and feeds that format to
the dynamic program via the system command. The dynamic program then returns
either a customized JPEG page or, for an incorrect request, an error message in the
browser window.
Dynamic POST requests: POST requests are generally used when the
data the user wants to transfer with the request is substantially large. Our
program creates a .txt file which stores all the information the client wants
transferred. That file is then passed to the dynamic program via a system command,
just as in the GET case, and the dynamic program in turn responds to the client.
Design Decisions/Uniqueness
1. For dynamic GET requests, the design team decided to
display a customized JPEG page instead of just displaying the dynamic
parameters in the browser window. This added a personal touch, making the program
unique; it works similarly to Google Image Search. For example, the request below
would display a JPEG page including pictures of an apple and a mango.
http://ugpsarc251.eecg.toronto.edu:55555/program.dyn?fruit=apple&fruit=mango
2. As seen from the above example, the server also allows the user to supply
multiple parameters (up to a maximum of three) instead of a single one, with the
parameters separated by the '&' character. This gives the client the flexibility to view
customized pages which include multiple pictures.
3. DESIGN BASIS: STRESS ON MODULARIZATION
3.1. MODULARITY
The first basic design strategy followed in implementing all the classes is
keeping the code modular. The base of the server is kept very light in terms of code,
and all other functions are implemented as modules that can be added to the base.
Most of the source code was written with the two drivers of modularity in mind:
high cohesion and low coupling. These modularization drivers are
discussed below with relevant examples.
3.2. COHESION
As explained earlier in the introduction, cohesion is a measure of how strongly
related and focused the responsibilities of a single class are. The server comprises
many classes which are robust and tightly knit with regard to the functions they
contain. Below are a couple of classes that demonstrate high cohesion.
i) The Config.C class, used to read and parse information from the configuration
file, has the following set of functions: insert(), find(), isEmpty(), parse() and some others.
The implementation of parse() involves calls to isEmpty() and insert() in sequential
order, which implies that the class is self-sufficient. Each function/subroutine does
one required task, which makes the class highly cohesive.
ii) The HTTPServer.C class is another example that fits aptly into the high-cohesion
structure. The class comprises the following set of functions: strconvert(), FileSize(),
FileDate(), typeChecker(), generateHTML(), setBuffer(), getBuffer(), dyncheck() and
some others. Each of these functions is written for a specific task (as its name
suggests), and together their implementations make the code comprehensive,
coherent and robust. As the prime function of this class, HTTPServer->Run(), is called
from within the main class, almost all of these functions are executed sequentially,
again supporting another facet of cohesion.
3.3. COUPLING
Coupling is defined as the degree to which different modules of a program are
inter-dependent. The implementation of the web server aims to minimize coupling,
which is done by providing a complete set of functions behind simple interfaces,
supporting code reuse. Although avoiding coupling entirely is very improbable, the
following module dependency diagram (Fig 1.4) illustrates a design structure that
promotes minimal class dependence.
As seen, the diagram closely follows a waterfall-type model of growth and
dependence. As explained in the earlier section, each class is self-sufficient with regard
to the functions it contains, which discourages unnecessary polymorphism. Also, the
abstract structure is based on a hierarchical model and is unidirectional in flow. This
avoids spaghetti code and provides a good platform for minimal coupling.
Fig 1.4 Module Dependency Diagram (modules shown: Server (Main), HTTPServer,
HTTPRequest, HTTPMessage, Config, Log, Mutex, ssbuf, Socket, Runnable, Threads, Types)
4. TESTING METHODS AND RESULTS
Testing plays a key role throughout the development of the server and in assuring
that the final product meets project standards. Testing is an exhaustive process, as
there is an infinite number of test cases. For this reason, only major components and
their corner cases were focused upon. Two methods were used, one using a standard
web browser and the other using Telnet. Telnet is used because it provides more control
over the testing procedure; requests and headers can be entered manually. The browser,
on the other hand, is an end-user tool which masks all the background information and
processes. Finally, each individual module is tested to ensure everything will work prior
to integration. To provide aggressive testing, we believe all three methods must be used.
i) Testing using Telnet
Telnet is a terminal-emulation program for networks such as the Internet. It is ideal
because it allows a user to manually enter commands as if directly connected to the
server. Numerous tests were performed this way, as it provided great control over the
server's processes. An example is given below:
Request: get /google.html http/1.1
         Host: www.google.com
Purpose: To request the file "google.html" from the server "google.com". If all goes
well, this page will be displayed on the terminal screen. The second line of the request is
known as a header (up to 10 headers can accompany a request). We could also have
provided 15 headers to see how the server would respond, or provided requests in an
incorrect format, including multiple blanks, parameters in the wrong order and
invalid methods (such as "put" as opposed to "get"). As you can see, there is an infinite
number of possibilities. For more test cases and results, please see Table 1 in
Appendix A.
ii) Testing using a standard web browser
Telnet is extremely useful for testing and debugging; however, the end-user will most
likely be using a web browser such as Internet Explorer or Firefox. This sort of testing is
known as "black box" testing: all you are concerned with is the input and the output.
Unlike with Telnet, in-between processes cannot be monitored or controlled through a
browser. An example of how a simple request is made is given below:
Request: http://ugsparc58.eecg.toronto.edu:54012/google.html
Outcome: The actual Google page is displayed in the browser. Only the
"google.html" parameter is entered; in this method, the browser controls the headers
and handles requests automatically.
iii) Method/Function testing
The above two methods prove essential to the progress of server
development. However, they are not useful when a problem occurs
within different modules of the code. For this situation, modules must be tested
separately before being integrated into the main server. An example of how this is done
is given below:
Module being tested: Config.C
Function being tested: Config -> find(“Port”);
Result: This function returns the number "55555". Its purpose is to open the
config file, search for the word "Port" and retrieve the information linked to it. Using
this method, many items can be looked up, such as "Logs", "Server instances"
and "Virtual Hosts". As with Telnet, many modules or functions can be tested;
Table 2 in Appendix A provides more examples.
It is crucial to ensure testing is as accurate and thorough as possible. This not only
makes integration easier but saves time during final server testing. The above three
methods were instrumental in the progress of our server. In certain cases, testing proved
that modules contained bugs. The load generator, for example, was found to create 36
files all with the same file size; method testing identified that the function was caught in
its loop and would not break and return to the program. Load balancing was an
issue, as the server became far too stressed when multiple clients were issuing requests.
Whether this is a serious concern depends on the use of the server: for a small number
of clients (20-30), our server is able to function properly.
5. PERFORMANCE
Testing provides a means of checking how robustly and accurately a server
operates; performance is a measure of a server's speed and response time. To gather
this information, a separate program was designed. The program generates a series
of test files available for download; it then starts a timer, sends a request,
receives a file, stops the timer and finally checks that the file sizes are correct. From
these two pieces of information, vital performance statistics can be calculated. Table 1
below summarizes the results in comparison with the Apache™ HTTP server.
Table 1: Performance Statistics

                                          ECE299 Server            Apache™ HTTP server
Number of files requested and received    36                       36
File size                                 All files downloaded     All files downloaded
                                          with correct file size   with correct file size
Simultaneous multi-threaded connections   15                       15
Response times (Tests 1, 2, 3)            1.60 s, 1.54 s, 1.59 s   1.30 s, 1.36 s, 1.40 s
Average response time                     1.58 s                   1.35 s
Bandwidth (Tests 1, 2, 3)                 460, 463, 455 Kbps       477, 469, 470 Kbps
Average bandwidth                         459 Kbps                 472 Kbps

6. DESIGN EVALUATION

Two main factors, design and functionality, will be considered in order to
provide a fair evaluation of the web server. As the project was executed in two stages,
by the end of the first stage, the design team had implemented a solid and robust base
server which included all of the server-management capabilities. Adding
multi-threading in the second stage allowed the server to serve multiple connections at a
time. However, testing brought a couple of limitations to the design team's notice.
The first was that, with multiple simultaneous connections, the very
first connection to the server did not allow the session information to be refreshed;
this was the case only if the user made numerous requests simultaneously.
Secondly, the maximum number of users the server could connect and
serve without crashing was noted to be 30. Although this number is significantly lower
than that of commercial web servers, the proposed server is robust and works
efficiently, with no noticeable delay, within its limitations.
Also, as mentioned in the performance section, by means of the load generator
it was noted that our server reached an average bandwidth of 459 Kbps with an average
response time of 1.58 s. These numbers are impressive when compared to a well-known
server such as the Apache™ HTTP server, which averaged 472 Kbps with a response
time of 1.35 s in the same tests.
7. CONCLUSION
In spite of certain limitations of the web server and the limited experience of the design
team in server programming, the designed server can be judged successful in
meeting all of its design requirements. The key engineering decisions that were taken
made the server robust, efficient and reliable with regard to performance, and stable
over long spans of operation.
APPENDIX A – TEST RESULTS
Request: get /google.html http/1.1
Purpose: To retrieve an html page which exists
Result:  Page retrieved / Success

Request: get /junk.html http/1.0
Purpose: To retrieve an html page which does not exist
Result:  Error 404 page retrieved / Success

Request: get  /google.html   http/1.1  (extra spaces)
Purpose: To check if a request is valid given multiple spaces within parameters
Result:  Page retrieved / Success

Request: get /ferrari.jpg http/1.1
Purpose: To check whether alternate file types can be downloaded
Result:  Picture retrieved / Success

Request: get /hotmail.html http/1.1, followed by "Host: www.hotmail.com" x15
Purpose: To ensure only the first 10 headers are entered
Result:  Telnet confirms that the first 10 are entered into the list / Success

Request: get /garbage.html http/1.1 sdfdsadfkls;jfsdalfjas;l
Purpose: To check if the server rejects requests with "garbage" parameters
Result:  Server crashes / Failed

Request: get /group/file.jpg http/1.2
Purpose: To test if the server can process directories
Result:  Page retrieved from directory / Success

Request: Post /file.dyn http/1.1, body "Text entered here"
Purpose: To see if POST will create a text file and output it to the screen
Result:  Text output to screen / Success

Request: post or get (alone)
Purpose: To test if incomplete requests are rejected
Result:  Server recognizes incomplete requests and discards them / Success
Table 1: Sample test cases and results using Telnet
Module: Config.C
Input:  Config->find("Port");
Output: 55555
Result: Retrieves port number / Success

Module: HTTPrequest.C
Input:  Get /test.html http/1.1
Output: Method: Get; URL: test.html; Version: http/1.1
Result: Parses request / Success

Module: TypeChecker( )
Input:  Typechecker("google.html");
Output: "2" ("2" corresponds to ".html")
Result: Success

Module: ReadAndParse( )
Input:  ReadAndParse(buffer) with buffer = " "
Output: Program crashes
Result: Supposed to handle this error / Failed
Table 2: Testing separate modules
APPENDIX B – OVERALL SERVER WORKING ALGORITHM CHART
Fig 1 Overall Server Working Algorithm Chart
(Flowchart summary: a Request enters the Server, which performs server management
through Config and Logs; multi-threading provides the concurrent connections, with one
HTTPServer instance per connection; each instance carries out protocol processing,
passes the request to the content type handler (static/dynamic, covering dynamic
content handling), processes the request and sends the information back to the client.)
APPENDIX C – DYNAMIC CONTENT HANDLING ALGORITHM CHART
(Flowchart summary:
1. Check if the request is dynamic, i.e. whether the URL contains ".dyn".
2. Determine the request type: GET or POST.
3. GET branch: feed the request to system(command), then display the requested
   JPEG page.
4. POST branch: create a .txt file containing the client request, then feed the
   request to system(command).)
Fig 2 Dynamic Content Handling Algorithm Chart
APPENDIX D – SAMPLE FROM ACCESS LOG FILE
128.100.175.14 Sat Apr 8 04:41:36 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:41:46 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:41:58 2006 "get public_www HTTP/1.1" 200 614
128.100.175.14 Sat Apr 8 04:42:07 2006 "get public_www HTTP/1.1" 200 614
128.100.175.14 Sat Apr 8 04:44:47 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:50:10 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:52:31 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:53:56 2006 "get public_www/cpp.html HTTP/1.1" 200 5627
128.100.175.14 Sat Apr 8 04:53:56 2006 "get public_www/cpp_files/cs.css HTTP/1.1" 200 561
128.100.13.215 Sat Apr 8 07:50:21 2006 "get public_www/google.html HTTP/1.1" 200 4500
128.100.13.215 Sat Apr 8 07:51:13 2006 "get public_www HTTP/1.1" 200 87
128.100.13.215 Sat Apr 8 07:52:05 2006 "get public_www/ign.html HTTP/1.1" 200 103223
128.100.175.24 Sun Apr 9 03:04:24 2006 "get public_www/hotmail.html HTTP/1.1" 200 9792
128.100.175.24 Sun Apr 9 03:04:28 2006 "get public_www/ign.html HTTP/1.1" 200 103223
128.100.175.24 Sun Apr 9 03:08:32 2006 "get public_www/hotmail.html HTTP/1.1" 200 9792
128.100.175.24 Sun Apr 9 03:34:22 2006 "get public_wwwgmail.html HTTP/1.1" 200 0
128.100.175.24 Sun Apr 9 03:40:39 2006 "get public_www/~gaurav HTTP/1.1" 200 0
128.100.175.24 Sun Apr 9 03:43:03 2006 "get public_www/~gaurav HTTP/1.1" 200 0
128.100.175.24 Sun Apr 9 03:49:46 2006 "get public_www HTTP/1.1" 200 614
128.100.175.24 Sun Apr 9 03:50:30 2006 "get public_www HTTP/1.1" 200 339
128.100.175.24 Sun Apr 9 03:52:41 2006 "get public_www/gmail.html HTTP/1.1" 200 16638
128.100.175.24 Sun Apr 9 03:52:41 2006 "get public_www/gmail_files/mail HTTP/1.1" 200 1853
128.100.175.24 Sun Apr 9 03:54:39 2006 "get public_www HTTP/1.1" 200 333
128.100.175.24 Sun Apr 9 03:54:49 2006 "get public_www/value.html