THE WEB SERVER PROJECT
Final Design Report
ECE299 – Communication and Design II
Date of Submission:
Saturday, April 15, 2006
Submitted by:
Daniel Dieana (993124221) Rishi Kohli (994055017) Gaurav Jain (993796380)
Team # 092
EXECUTIVE SUMMARY
Our group has successfully designed a web server that accepts browser
requests and delivers Web pages and other files in response via the HTTP protocol.
This project was implemented as a course requisite for ECE299 by a group of three
students. The server integrates many features which allow for the smooth, continuous
and reliable operation of server-based applications. Features such as dynamic content
handling, multi-threading and error logging have been streamlined into the code to
provide ease of use and maintainability. This report provides a detailed analysis of some
of the design features, their functionality and structure. Also discussed are the important
design decisions taken to ensure the smooth functioning of the web server.
The primary design of the server is based upon the ideas of modularity, cohesion
and coupling. These design elements are explained further in the report; they
provide a means to structure the code in a manner that is coherent and well thought out.
Our server does have certain limitations. It does not handle load balancing and has the
potential to become overwhelmed. As well, there is a multi-threading issue in which the
first connection of a session does not allow new information to be refreshed; this is only a
problem if a user makes numerous requests simultaneously. These issues aside, our
server proves to handle many situations properly, as shown in the testing section.
A load generator was designed to test the server and to produce performance and
reliability statistics. It verified that our server operates within project
specifications: the server reached an average bandwidth of 459 Kbps with an average
response time of 1.58 s. These numbers are impressive when compared to a well-known
server such as the Apache™ HTTP server, which in the same tests averaged 472 Kbps
with a response time of 1.35 s.
TABLE OF CONTENTS
Executive Summary .................................................... II
1. Introduction ....................................................... 1
2. Design and Functionality Overview .................................. 2
   2.1. Server Management ............................................. 2
   2.2. Concurrent Connections ........................................ 5
   2.3. Load Generation ............................................... 6
   2.4. Handling Dynamic Content ...................................... 6
3. Design Basis: Stress on Modularization ............................. 8
4. Testing Methods and Results ....................................... 10
5. Performance ....................................................... 13
6. Server Evaluation ................................................. 13
7. Conclusion ........................................................ 14
APPENDIX A ........................................................... 15
APPENDIX B ........................................................... 16
APPENDIX C ........................................................... 17
APPENDIX D ........................................................... 18
1. INTRODUCTION
This report provides a full analysis on the design and functionality of the web
server implemented by team #92 as a course requisite for ECE299. According to the
guidelines presented in the project handout (http://ccnet.utoronto.ca/20061/ece299h1s/),
our group has implemented the web server in the C++ language, running on the Solaris™
platform. The main goal of this report is twofold:
• To describe the functionality embedded into the server
• To evaluate the server based on testing results
This report contains a design and functionality overview, testing methods and
harnesses, a performance section, and a final server evaluation. The overview provides
a brief outline for some of the features of the server (Server Management, Concurrent
Connections, Handling Dynamic Requests and Load Generation) along with the associated
design decisions. The design was based on three key elements: modularity, cohesion and
coupling. Modularity was essential in the design of the server as it promotes simplicity
and simplifies the process of integration. Cohesion is a measure of how strongly related
and focused the responsibilities of a single class are. This ensured that the likelihood of
code reuse is increased and complexity kept manageable. With low coupling, a change
in one module will not require a change in the implementation of another module. This
provides a well-structured system, though at the cost of some of the program's
efficiency. Testing methodology is provided along with specific examples and outcomes
of certain features. Performance is important for any server and so a summary follows
which provides numerical statistics to reinforce our claims. Finally, a server evaluation
concludes this report.
2. DESIGN AND FUNCTIONALITY OVERVIEW
The following section describes some of the key design features – their
functionality and implementation details. Refer to APPENDIX B for the main design
description flowchart.
2.1. SERVER MANAGEMENT
In order for a web server to run effectively and continuously, it is necessary to get
feedback about certain issues such as the activity and performance of the web server
along with any problems that may be occurring. The following features play a key role in
order to manage the server effectively.
Log Files - Functionality
The web server implemented by the design team includes extensive logging
capabilities. Each client request that is received by the server is parsed into different
details such as the IP Address of the client, the requested file and then appended into a
readable format along with other details such as the current time, date and the file size.
An example taken from the access log file (refer to APPENDIX D), in which our
web server stores the successfully served requests, is provided below:
128.100.175.14 Sat Apr 8 04:41:36 2006 "get public_www HTTP/1.1" 200 359
Storing log information in various log files is the first step in log management. The next
step is to be able to analyze this information to provide important and useful statistics.
Although various log-analysis applications are available on the World Wide Web,
log analysis itself was beyond the scope of the project and will not be discussed in this
document.
(Fields in the entry above: client IP address, date and time, relative URL, size of the file)
Log Files – Design and Structure
The logging capabilities of the web server were implemented using the
object-oriented features of the C++ language. Using inheritance, three separate classes
were derived from one parent class, Log.C. The inheritance chart provided
below further explains the relationship between the classes. The inheritance
mechanism provides the flexibility to add further log files in addition to the three
shown.
Log.C
├── AccessLog.C
├── ErrorLog.C
└── DebugLog.C

Fig 1.1 Log Files Inheritance Structure
Configuration File Format – Functionality
A configuration file is an important tool with which a web server configures itself. The
first instruction our server executes after being run is to read and parse the
configuration file (saved with a .txt extension in the root directory) and obtain
important details such as the port number, root directory and other relevant information.
Configuration File Format – Design and Structure
The feature that reads and parses the configuration file is based on a linked-list
structure, implemented in a separate class, Config.C. The uniqueness of the
design lies in the fact that each new piece of information is parsed into a separate node
of the linked list, which can grow without bound. This allows the flexibility of adding more
parameters to the configuration file at later stages. The Config.C class also has
various functions, such as find("Port"), which are used to retrieve required data
such as the port number.
[Port: 55555] → [DocRoot: public_www] → [AccessLog: logs/access.log]

Fig 1.2 Linked List format of the Config class
Graceful Server Shutdown – Functionality and Design
Our server implements a feature that allows it to be shut down through
the Unix signal mechanism. During a graceful shutdown no new
connections are accepted, but the server responds appropriately to requests it has
already received. Whenever the HUP signal is sent by the operating system (Unix in our
case), the signal handler, HUP_received(), is invoked asynchronously, setting the
global variable 'Terminate' to true. The main program loops continuously while
'Terminate' is false; as soon as it sees the variable set to true, it finishes its current
work and exits. The basic design and flow of the graceful server shutdown is shown below.
Fig. 1.3 Graceful Server Shutdown Flowchart (HUP signal → HUP_received() sets
Terminate to true → main loop checks Terminate: if true, server shutdown; otherwise
continue running the server)
2.2. CONCURRENT CONNECTIONS
The concept of multithreading is used to enable a web server to handle multiple
requests from different users simultaneously. Multithreading involves concurrently
executing different parts of the program, called threads.
Multithreading – Design and Structure
As the server project was implemented in two stages, the fundamental
architecture of the web server from the first stage was changed to incorporate
multithreading. The program now includes an element of parallelism: the
HTTPServer class derives from a base class, Runnable. In the main program, a new
instance of HTTPServer is created for each new connection, along with a new thread
associated with that particular instance, which is then executed. Pseudo-code
elaborating this is provided below.
While (server running) {
    Wait for a new connection;
    Create a new thread;
    Create a new instance of HTTPServer and link it with the thread;
    Run the thread;
}
Design Decisions/Uniqueness

The principal advantage of using the above structure is that it makes the
program very efficient, allowing it to utilize idle time that was wasted
before. Also, a new thread is created for each connection and dynamically destroyed
upon the closure of that connection, avoiding memory leaks.
Secondly, one key decision the design team had to make was to limit
the maximum number of simultaneous connections to the server to 30. This was done to
prevent the server from crashing and to keep load times fast. This is further discussed
in the evaluation section of the report.
2.3. LOAD GENERATION
The load generator is responsible for creating file content and accessing it
through the server. This process is clocked so that vital performance statistics can be
calculated. It was designed as a stand-alone program, separate from the server, which
can be run from the UNIX terminal. The code was designed for simplicity and efficiency.
The following is a code snippet from HTMLGeneration.C:
//Generic file name
char fileName[20] = "classa_b";
//takes care of class0 (below 1k; increment 0.1k)
for (int count = 0; count < 9; count++) {
    strcpy(dir, dirBackUp);          //replace dir with backup
    fileName[5] = '0';
    fileName[7] = (char)count + 49;  //ASCII offset: digits '1' to '9'
    ofstream test(dir);
    for (int sizeCount = 0; sizeCount < class0 + count*102; sizeCount++) {
        test << "A";
    }
}
The fileName indices and the loop bounds show how the files and file sizes are
generated. A generic file name such as "classa_b" is used, where the letters "a" and "b"
are replaced as the loop cycles. To create specific file sizes, the character "A" is written
into the files. One character represents one byte, so to create a file size of 102 Kb
(kilobits), 12,750 "A"s would be written to the file. The inner "for" loop adds characters
until the counter "sizeCount" reaches the target size, at which point the loop exits and
the program moves on to a file of a different size. The function is compact, which allows
it to be called many times; using this feature, files could be generated until the hard
drive runs out of space.
2.4 HANDLING DYNAMIC CONTENT (GET and POST Requests)
In present-day web serving, clients often enter specific data in the browser
window and expect a customized response in return. This was an essential feature of
our web server: it allows the user to enter arbitrary content in the URL and
view custom-made pages depending on the request entered. Generally, dynamic content
handling can be divided into two main categories: GET and POST requests. Initially, inside
our HTTPServer, the request is checked for the dynamic extension, specifically
".dyn". After that, the program checks whether the request is of type GET or POST and
takes further action depending on the request type, as described below (refer
to APPENDIX C for the dynamic request handling algorithm chart).
Dynamic GET requests: Our program parses and separates the dynamic
parameters from the URL, arranges them in a particular format and feeds that format to
the dynamic program via the system command. The dynamic program then returns
either a customized JPEG page or, for an incorrect request, an error message in the
browser window.
Dynamic POST requests: POST requests are generally used when the
data the user wants to transfer with the request is substantially large. Our
program creates a .txt file which stores all the information the client wants
transferred. That file is then passed to the dynamic program via a system command,
just as in the GET case, and the dynamic program in turn responds to the client.
Design Decisions/Uniqueness
1. For dynamic GET requests, the design team decided to
display a customized JPEG page instead of just displaying the dynamic
parameters in the browser window. This added a personal touch, making the program
unique; it works similarly to Google Image Search. For example, the request below
would display a JPEG page including pictures of an apple and a mango.
http://ugpsarc251.eecg.toronto.edu:55555/program.dyn?fruit=apple&fruit=mango
2. As seen from the above example, the server also allows the user to supply
multiple parameters (up to a maximum of three) instead of a single one, with the
parameters separated by the '&' character. This gives the client the flexibility to view
customized pages which include multiple pictures.
3. DESIGN BASIS: STRESS ON MODULARIZATION
3.1. MODULARITY
The first basic design strategy followed in implementing all the classes is
keeping the code modular. The base of the server is kept very light in terms of code,
and all other functions are implemented as modules that can be added to the base.
Most of the source code was written with the two drivers of modularity in mind:
high cohesion and low coupling. These modularization drivers are
discussed below with relevant examples.
3.2. COHESION
As explained earlier in the introduction, cohesion is a measure of how strongly
related and focused the responsibilities of a single class are. The server comprises
many classes which are robust and tightly knit with regard to the functions they
contain. Below are a couple of classes that demonstrate high cohesion.
i) The Config.C class, used to read and parse information from the configuration
file, has the following set of functions: insert(), find(), isEmpty(), parse() and some others.
The implementation of parse() involves calls to isEmpty() and insert() in sequential
order, which implies that the class is self-sufficient. Each function/subroutine does
one required task, which makes the class highly cohesive.
ii) The HTTPServer.C class is another example that fits aptly into the high-cohesion
structure. The class comprises the following set of functions: strconvert(), FileSize(),
FileDate(), typeChecker(), generateHTML(), setBuffer(), getBuffer(), dyncheck() and
some others. Each of these functions is written for a specific task (as its name
suggests), and together their implementations make the code comprehensive,
coherent and robust. As the prime function of this class, HTTPServer->Run(), is called
from within the main class, almost all of these functions are executed sequentially,
again supporting another facet of cohesion.
3.3. COUPLING
Coupling is defined as the degree to which different modules of a program are
inter-dependent. The implementation of the web server aims to minimize coupling,
which is done by providing a complete set of functions behind simple interfaces,
supporting code reuse. Although avoiding coupling entirely is very improbable, the
following module dependency diagram (Fig 1.4) illustrates a design structure that
promotes minimal class dependence.
As seen, the diagram closely follows a waterfall-type model of growth and
dependence. As explained in the earlier section, each class is self-sufficient with regard
to the functions it contains, which discourages unnecessary polymorphism. Also, the
abstract structure is based on a hierarchical model and is unidirectional in flow. This
avoids spaghetti code and provides a good platform for minimal coupling.
Fig 1.4 Module Dependency Diagram (modules shown: Server (Main), HTTPServer,
HTTPRequest, HTTPMessage, Config, Log, Mutex, ssbuf, Socket, Runnable, Threads, Types)
4. TESTING METHODS AND RESULTS
Testing plays a key role throughout the development of the server and in assuring
that the final product meets project standards. Testing is an exhaustive process, as
there is an infinite number of test cases. For this reason, only major components and
their corner cases were focused upon. Two methods were used, one using a standard
web browser and the other using Telnet. Telnet is used because it provides more control
over the testing procedure; requests and headers can be entered manually. The browser,
on the other hand, is an end-user tool which masks all the background information and
processes. Finally, each individual module is tested to ensure everything will work prior
to integration. To provide aggressive testing, we believe all three methods must be used.
i) Testing using Telnet
Telnet is a terminal-emulation program for networks such as the Internet. It is ideal
because it allows a user to manually enter commands as if directly connected to the
server. Numerous tests were performed this way, as it provided great control over the
server's processes. An example is given below:
Request: get /google.html http/1.1
         Host: www.google.com
Purpose: To request the file "google.html" from the server "google.com". If all goes
well, this page will be displayed on the terminal screen. The second line of the request is
known as a header (up to 10 headers can accompany a request). We could also have
provided 15 headers to see how the server would respond, or provided requests in an
incorrect format, including multiple blanks, parameters in the wrong order and
invalid methods (such as "put" as opposed to "get"). As you can see, there is an infinite
number of possibilities. For more test cases and results, please see Table 1 in
Appendix A.
ii) Testing using a standard web browser
Telnet is extremely useful for testing and debugging; however, the end-user will most
likely be using a web browser such as Internet Explorer or Firefox. This sort of testing is
known as "black box" testing: all you are concerned with is the input and the output.
Unlike with Telnet, in-between processes cannot be monitored or controlled through a
browser. An example of how a simple request is made is given below:
Request: http://ugsparc58.eecg.toronto.edu:54012/google.html
Outcome: The actual Google page is displayed in the browser. Only the
"google.html" parameter is entered; in this method, the browser controls the headers
and handles requests automatically.
iii) Method/Function testing
The above two methods prove essential to the progress of server
development. However, they are not useful when a problem occurs
within different modules of the code. For this situation, modules must be tested
separately before being integrated into the main server. An example of how this is done
is given below:
Module being tested: Config.C
Function being tested: Config -> find(“Port”);
Result: This function returns the number "55555". Its purpose is to open the
config file, search for the word "Port" and retrieve the information linked to it. Using
this method, many items can be looked up, such as "Logs", "Server instances"
and "Virtual Hosts". As with Telnet, many modules or functions can be tested;
Table 2 in Appendix A provides more examples.
It is crucial to ensure testing is as accurate and thorough as possible. This not only
makes integration easier but saves time during final server testing. The above three
methods were instrumental in the progress of our server. In certain cases, testing proved
that modules contained bugs. The load generator, for example, was found to create 36
files all with the same file size; method testing identified that the function was caught in
its loop and would not break and return to the program. Load balancing was an
issue, as the server became far too stressed when multiple clients were issuing requests.
Whether this is a serious concern depends on the use of the server: for a small number
of clients (20-30), our server is able to function properly.
5. PERFORMANCE
Testing provides a means of checking how robustly and accurately a server
operates; performance is a measure of a server's speed and response time. To gather
this information, a separate program was designed. The program generates a series
of test files available for download; it then starts a timer, sends a request,
receives a file, stops the timer and finally checks that the file sizes are correct. From
these two pieces of information, vital performance statistics can be calculated. Table 1
below summarizes the results in comparison with the Apache™ HTTP server.
Table 1: Performance Statistics

                                          ECE299 Server            Apache™ HTTP server
Number of files requested and received    36                       36
File size                                 All files downloaded     All files downloaded
                                          with correct file size   with correct file size
Simultaneous multi-threaded connections   15                       15
Response times (Tests 1, 2, 3)            1.60 s, 1.54 s, 1.59 s   1.30 s, 1.36 s, 1.40 s
Average response time                     1.58 s                   1.35 s
Bandwidth (Tests 1, 2, 3)                 460, 463, 455 Kbps       477, 469, 470 Kbps
Average bandwidth                         459 Kbps                 472 Kbps

6. DESIGN EVALUATION

Two main factors, design and functionality, will be considered in order to
provide a fair evaluation of the web server. As the project was executed in two stages,
by the end of the first stage, the design team had implemented a solid and robust base
server which included all of the server-management capabilities. Adding
multi-threading in the second stage allowed the server to serve multiple connections at a
time. However, testing brought a couple of limitations to the design team's notice.
The first was that, with multiple simultaneous connections, the very
first connection to the server did not allow the session information to be refreshed;
this was the case only if the user made numerous requests simultaneously.
Secondly, the maximum number of users the server could connect and
serve without crashing was noted to be 30. Although this number is significantly lower
than that of commercial web servers, the proposed server is robust and works
efficiently, with no noticeable delay, within its limitations.
Also, as mentioned in the performance section, by means of the load generator
it was noted that our server reached an average bandwidth of 459 Kbps with an average
response time of 1.58 s. These numbers are impressive when compared to a well-known
server such as the Apache™ HTTP server, which averaged 472 Kbps with a response
time of 1.35 s in the same tests.
7. CONCLUSION
In spite of certain limitations of the web server and the limited experience of the design
team in server programming, the designed server can be judged successful in
meeting all of its design requirements. The key engineering decisions that were taken
made the server robust, efficient and reliable with regard to performance, and stable
over long spans of operation.
APPENDIX A – TEST RESULTS
Request: get /google.html http/1.1
Purpose: To retrieve an html page which exists
Result:  Page retrieved / Success

Request: get /junk.html http/1.0
Purpose: To retrieve an html page which does not exist
Result:  Error 404 page retrieved / Success

Request: get  /google.html   http/1.1  (extra spaces)
Purpose: To check if a request is valid given multiple spaces within parameters
Result:  Page retrieved / Success

Request: get /ferrari.jpg http/1.1
Purpose: To check whether alternate file types can be downloaded
Result:  Picture retrieved / Success

Request: get /hotmail.html http/1.1, followed by "Host: www.hotmail.com" x15
Purpose: To ensure only the first 10 headers are entered
Result:  Telnet confirms that the first 10 are entered into the list / Success

Request: get /garbage.html http/1.1 sdfdsadfkls;jfsdalfjas;l
Purpose: To check if the server rejects requests with "garbage" parameters
Result:  Server crashes / Failed

Request: get /group/file.jpg http/1.2
Purpose: To test if the server can process directories
Result:  Page retrieved from directory / Success

Request: Post /file.dyn http/1.1, body "Text entered here"
Purpose: To see if POST will create a text file and output it to the screen
Result:  Text output to screen / Success

Request: post or get (alone)
Purpose: To test if incomplete requests are rejected
Result:  Server recognizes incomplete requests and discards them / Success
Table 1: Sample test cases and results using Telnet
Module: Config.C
Input:  Config->find("Port");
Output: 55555
Result: Retrieves port number / Success

Module: HTTPrequest.C
Input:  Get /test.html http/1.1
Output: Method: Get; URL: test.html; Version: http/1.1
Result: Parses request / Success

Module: TypeChecker( )
Input:  Typechecker("google.html");
Output: "2" ("2" corresponds to ".html")
Result: Success

Module: ReadAndParse( )
Input:  ReadAndParse(buffer) with buffer = " "
Output: Program crashes
Result: Supposed to handle this error / Failed
Table 2: Testing separate modules
APPENDIX B – OVERALL SERVER WORKING ALGORITHM CHART
Fig 1 Overall Server Working Algorithm Chart
(Flowchart summary: a Request enters the Server, which performs server management
through Config and Logs; multi-threading provides the concurrent connections, with one
HTTPServer instance per connection; each instance carries out protocol processing,
passes the request to the content type handler (static/dynamic, covering dynamic
content handling), processes the request and sends the information back to the client.)
APPENDIX C – DYNAMIC CONTENT HANDLING ALGORITHM CHART
(Flowchart summary:
1. Check if the request is dynamic, i.e. whether the URL contains ".dyn".
2. Determine the request type: GET or POST.
3. GET branch: feed the request to system(command), then display the requested
   JPEG page.
4. POST branch: create a .txt file containing the client request, then feed the
   request to system(command).)
Fig 2 Dynamic Content Handling Algorithm Chart
APPENDIX D – SAMPLE FROM ACCESS LOG FILE
128.100.175.14 Sat Apr 8 04:41:36 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:41:46 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:41:58 2006 "get public_www HTTP/1.1" 200 614
128.100.175.14 Sat Apr 8 04:42:07 2006 "get public_www HTTP/1.1" 200 614
128.100.175.14 Sat Apr 8 04:44:47 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:50:10 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:52:31 2006 "get public_www HTTP/1.1" 200 359
128.100.175.14 Sat Apr 8 04:53:56 2006 "get public_www/cpp.html HTTP/1.1" 200 5627
128.100.175.14 Sat Apr 8 04:53:56 2006 "get public_www/cpp_files/cs.css HTTP/1.1" 200 561
128.100.13.215 Sat Apr 8 07:50:21 2006 "get public_www/google.html HTTP/1.1" 200 4500
128.100.13.215 Sat Apr 8 07:51:13 2006 "get public_www HTTP/1.1" 200 87
128.100.13.215 Sat Apr 8 07:52:05 2006 "get public_www/ign.html HTTP/1.1" 200 103223
128.100.175.24 Sun Apr 9 03:04:24 2006 "get public_www/hotmail.html HTTP/1.1" 200 9792
128.100.175.24 Sun Apr 9 03:04:28 2006 "get public_www/ign.html HTTP/1.1" 200 103223
128.100.175.24 Sun Apr 9 03:08:32 2006 "get public_www/hotmail.html HTTP/1.1" 200 9792
128.100.175.24 Sun Apr 9 03:34:22 2006 "get public_wwwgmail.html HTTP/1.1" 200 0
128.100.175.24 Sun Apr 9 03:40:39 2006 "get public_www/~gaurav HTTP/1.1" 200 0
128.100.175.24 Sun Apr 9 03:43:03 2006 "get public_www/~gaurav HTTP/1.1" 200 0
128.100.175.24 Sun Apr 9 03:49:46 2006 "get public_www HTTP/1.1" 200 614
128.100.175.24 Sun Apr 9 03:50:30 2006 "get public_www HTTP/1.1" 200 339
128.100.175.24 Sun Apr 9 03:52:41 2006 "get public_www/gmail.html HTTP/1.1" 200 16638
128.100.175.24 Sun Apr 9 03:52:41 2006 "get public_www/gmail_files/mail HTTP/1.1" 200 1853
128.100.175.24 Sun Apr 9 03:54:39 2006 "get public_www HTTP/1.1" 200 333
128.100.175.24 Sun Apr 9 03:54:49 2006 "get public_www/value.html