Capstone Project Cover Sheet
Capstone Project Name: Implementing an Online Database for Scientific Conference
Presentations
Student Name: Eric Forthman
Degree Program: BSIT
Mentor Name: Lori Hawker / Jason Jia – Capstone
Signature Block
Student’s Signature
Mentor’s Signature
Implementing an Online Database for Scientific Conference Presentations Page 2
Table of Contents
Capstone Report Summary
Review of Other Work
Rationale and Systems Analysis
Goals and Objectives
Project Deliverables
Project Plan and Timelines
Project Development
Additional Deliverables
Conclusions
References
Appendix 1: Competency Matrix
Appendix 2: Database Table Schema
Capstone Report Summary
The main thrust of this project was to provide organized storage, platform neutral access
and a clearly organized web interface for a collection of slide presentations created in Microsoft
PowerPoint™ and subsequently used in lectures given at XYZ Association’s international
scientific conference.
The conference was one week in duration and was organized around a daily schedule of
topic-specific sessions held simultaneously in multiple meeting rooms. Many of the sessions
involved more than one lecture, each with a corresponding slide presentation.
A primary purpose of this project was to make the collection of the slide presentations
accessible through the internet. A calendar of sessions and lectures already existed, along with
previously submitted abstracts including titles and other information corresponding to the
lectures. This provided the initial logical layout for the main user interface, which has been
graphically presented as an HTML image map over the image file of the conference calendar.
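As a rough illustration of how an image map can be laid over a calendar image, the following Python sketch generates the `<map>` and `<area>` markup from a list of session rectangles. The pixel coordinates, the `session.php` target page, and the function name are assumptions made for the example and are not taken from the project.

```python
from html import escape


def build_image_map(map_name, image_src, sessions):
    """Return HTML for a calendar image with one clickable rectangle per session."""
    areas = []
    for s in sessions:
        # For shape="rect" the coords are left, top, right, bottom in image pixels.
        coords = ",".join(str(n) for n in s["rect"])
        href = "session.php?session=%d" % s["session"]  # hypothetical target page
        areas.append(
            '  <area shape="rect" coords="%s" href="%s" alt="%s">'
            % (coords, escape(href), escape(s["title"]))
        )
    lines = [
        '<img src="%s" usemap="#%s" alt="Conference calendar">'
        % (escape(image_src), escape(map_name)),
        '<map name="%s">' % escape(map_name),
    ]
    return "\n".join(lines + areas + ["</map>"])
```

Calling `build_image_map("calendar", "calendar.png", [{"session": 12, "title": "Opening session", "rect": (10, 20, 110, 60)}])` yields an image tag followed by a map with a single clickable area.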
Universal access to this material was also of concern. The required file type for
presentation submission is .ppt or .pptx (Microsoft PowerPoint™). While this file type can be
retrieved from the internet using many types of computer systems, it can only be displayed in
Microsoft’s IE browser. The desired paradigm is to allow any user to view these documents with
any currently available browser. To this end, the .ppt or .pptx files were converted to .pdf
(Portable Document Format), which was developed by Adobe Systems and is now an open
standard format for document files, readable on all modern computer systems and browsers.
Implementing persistent storage of the .pdf files was a straightforward matter using a
single directory on the web server. The logical organization needed to find and retrieve these
files was accomplished by employing a MySQL RDBMS (Relational Database Management
System). Each file was named using a convention that included a session number, subject
category number, and a unique file number. These numbers along with other presentation
specific information collected from the abstracts make up the metadata used to populate the
database tables.
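A naming convention like this lets part of the metadata be recovered from the file name itself. The sketch below assumes a hypothetical layout of `session_category_filenumber.pdf` (for example `12_03_0457.pdf`); the report does not state the exact separator or field widths, so the pattern is illustrative only.

```python
import re
from typing import NamedTuple


class PresentationFile(NamedTuple):
    session: int
    category: int
    file_number: int


# Hypothetical layout: <session>_<category>_<file number>.pdf
FILENAME_PATTERN = re.compile(r"^(\d+)_(\d+)_(\d+)\.pdf$", re.IGNORECASE)


def parse_filename(name):
    """Split a file name into its three numeric parts, or raise ValueError."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % name)
    return PresentationFile(*(int(group) for group in match.groups()))
```

Rejecting names that do not fit the pattern, rather than guessing, keeps stray files from being written to the database with wrong metadata.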
There are many options for the web hosting of an application. For this project, web
hosting has been implemented via the web-hosting provider Lunarpages. The option of shared
hosting on Linux servers with the MySQL database engine and PHP already installed came at a
very reasonable fee. Lunarpages also has an excellent 99.868% uptime rating for a 9-year period
beginning in December 2005 ([email protected], 2014).
The project started on schedule on July 21, 2013. Work proceeded smoothly without
issue until July 31. At this point in Phase II the database schema was addressed. XYZ uses an
Excel spreadsheet to record the metadata of each presentation. Due to an informal representation
in the association’s presentations metadata spreadsheet, multiple authors’ names, for a given
presentation, were grouped in single fields in the spreadsheet’s Authors column. The most
efficient way to import the data to the actual database table would be column-to-column data
transfer. This would cause a violation of the First Normal Form of relational database design in
the primary table, which, in a large database, could lead to a serious data redundancy problem.
Because of the relatively small size of this database and the additional time that would be
consumed by creating a new script to parse out each author name and write a new table, a
decision was made to go ahead without conforming to First Normal Form. From a practical
perspective, SQL queries were created using a “like” condition that can search through all the
column fields for author’s names, producing accurate results. Otherwise, the project proceeded
according to plan and the projected timeline was maintained without any setbacks. The project
was completed and operational on September 16, 2013. Final customer sign-off occurred on
September 18, 2013.
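The “like” search over a multi-author column can be sketched as follows. This is a minimal illustration, not the project's code: it uses Python's built-in SQLite module as a stand-in for MySQL, and the table name, column names, and sample rows are invented for the example.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table whose authors column holds several names per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE presentations (id INTEGER PRIMARY KEY, title TEXT, authors TEXT)"
    )
    conn.executemany(
        "INSERT INTO presentations (title, authors) VALUES (?, ?)",
        [
            ("Talk A", "J. Smith, M. Lee"),
            ("Talk B", "A. Leeson"),
            ("Talk C", "R. Patel, J. Smith"),
        ],
    )
    return conn


def find_by_author(conn, name):
    """Return titles whose authors field contains the given text anywhere."""
    pattern = "%" + name + "%"  # % matches any run of characters
    rows = conn.execute(
        "SELECT title FROM presentations WHERE authors LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]
```

Searching for "Smith" returns Talk A and Talk C. Searching for "Lee" returns Talk A and also Talk B, because "Leeson" contains "Lee"; substring matches of this kind are the trade-off of keeping several names in one field.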
The sections of this document are:
1) Capstone Report Summary: The project overview as described in the current section.
2) Review of Other Work: A review of similar projects to lend credence to the
implemented solutions.
3) Rationale and System Analysis: An examination of the technologies used in this
project.
4) Goals and Objectives: A detailed look at the project goals and objectives.
5) Project Deliverables: An explanation of the delivered results of the project.
6) Project Plan and Timeline: A timetable containing start date, end date and duration
for each listed task as executed to complete the project.
Review of Other Work
The technologies employed for the implementation of this project are HTML, CSS, PHP,
and MySQL. These technologies are all readily available in web-hosting environments. HTML
and CSS constitute the standard for rendering a user interface within a web browser. The
combination of PHP and MySQL is a very widely accepted and proven solution for the creation
of data driven web applications. The widespread usage of PHP can be seen in the alexa.com top
rankings of websites pertaining to programming languages. Websites that are PHP content
related or represent technologies constructed from PHP currently hold the positions of 3, 5, 7, 10,
14, and 22 as shown at alexa.com (Alexa, 2014).
There are many examples of websites using this combination of technologies. A web
design company named Mindfire Solutions has a list of its own case studies totaling well over
100 projects citing the use of HTML, CSS, PHP, and MySQL.
Here is a case study from Mindfire Solutions using the aforementioned technologies to
create an online repository targeted to a specific group of users.
“Video sharing community for bird and sports enthusiasts
Client:
Software Development Company
Industry:
Internet Software & Services
Technologies:
LAMP, Ajax, Javascript, Flash, HTML, CSS, PHP, MySQL, Lighttpd, Video
Streaming
Designed and implemented a highly secure web site similar to YouTube, where
users can share videos and pictures with the world. The website allows members
to view online videos, create forum discussions, blog and create groups with
whom they can chat, share videos and pictures online. The sites also allow Super
Administrators to login and manage users, videos, pictures, groups and forum
discussions. Super administrators can also create different user types who can
administer the site with their assigned permissions. One of the key features of
these sites is that it is self-managed and requires minimal administration. Key
features include conversion of video files uploaded by users to desired format
using FFMPEG, MENCODER, LAME, and FLVtool2 for video streaming. We
also configured and set up lighttpd for streaming and provided the client with a
flash video player that uses external interface to support JS calls and supports
streaming.” (Mindfire Solutions, 2014)
While there are additional technologies and features included in the Mindfire example
that were not necessary for our application, it shows the technologies of HTML, CSS, PHP, and
MySQL being used as fundamental building blocks in this type of project. In conclusion, these
very commonly used and accepted technologies provided a viable and stable infrastructure for
the project.
Rationale and Systems Analysis
The server side software technologies used for the creation of the application are PHP and
MySQL. These technologies were chosen for a variety of reasons. A pertinent concern for XYZ,
being a non-profit organization, was, of course, cost. Both PHP and MySQL were created as open
source software, meaning that anyone may use these technologies without having to pay the kind
of hefty licensing fee that can make a smaller custom project, such as this one, unfeasible.
Another compelling argument is the very wide deployment of both PHP and MySQL. PHP is
the most commonly used dynamic web programming language in the world (BuiltWith, 2014). A
similar perspective can be shown for the use of MySQL. MySQL is the second most popular
database management system (DB-Engines, 2014), right behind the license-required Oracle
system. The pervasive deployment of these technologies has some advantages. User
documentation and working examples of PHP, MySQL, and the combination of the two are easy
to find and consume as necessary. Most web service providers have PHP and MySQL already
installed on their servers and included in the basic fee. The same ubiquity meant that in-house
programmers were already knowledgeable and capable with these technologies, negating the need
to hire outside contractors.
As for client side technologies, ordinary web browsers were employed as the user interface,
which further reduced costs and ensured availability, since browsers are included in every major
desktop operating system and can be programmed with standard HTML and CSS. In order for the
presentations to be viewed in any modern web browser, regardless of vendor, the files were
converted to the PDF format, as plugins are available for most browsers. The users on any
system merely need to navigate to the project’s web site and install the free plugin, if necessary,
to begin accessing the repository files.
Another concern, the system uptime, was addressed during web host provider selection.
Selection of a reliable web host provider was accomplished by searching web reviews, the
provider’s operational statistics and through previous experience. The user’s ability to access the
application at any time they deem necessary was of the utmost importance.
The final product was presented as a read-only system where all the presentations were
gathered beforehand. However, the application was designed to allow for the addition of
presentations after the fact and the structure of the system can be reused for future repositories of
the same type.
Goals and Objectives
The goal of this project has been to enable XYZ conference attendees to access slide
presentations that were used in conjunction with various lectures during the course of a weeklong
event. A web hosted database driven repository provided for an efficient and effective means to
store, access and search this body of work. The objectives used to attain this result are broken
down into 4 phases.
1) Phase I: Obtained web hosting
a. Determined software requirements: The web-hosting service needed to be able to
provide the server side software that was necessary to implement the dynamic aspects
of the project as well as a database engine. This is an important consideration but one
which was easy to fulfill as I chose to use a couple of the most common open source
technologies, PHP and MySQL. This meant being able to select from the less
expensive web server plans that are implemented on Linux machines using Apache
web servers. Most web-hosting services offer this configuration known collectively as
LAMP (Linux, Apache, MySQL, PHP).
b. Chose a web-hosting company: Web-hosting companies number in the hundreds,
perhaps thousands, so a complete comparison is not practical. There are also many
sites that offer reviews and rankings. Having used a Los Angeles-based company
called Lunarpages for many years with almost no technical issues, I tend to gravitate
to what I am sure will function reliably. For good measure, I have checked out some
current reviews on the site Web Hosting Review. The Web Hosting Review 2014
Best Reviews and Comparisons shows Lunarpages in position 15 (Web Hosting
Review, 2014). This is likely due to the price listing of $8.95 a month. Competition
for web hosting is intense and all of the top rated plans come in under $10.00 a
month. Lunarpages’ price of $8.95 a month can be reduced to $4.95 a month by
choosing a two-year contract on the basic hosting plan, which for any organization is
a nominal cost. The resulting very competitive rate included a free domain name,
unlimited bandwidth, unlimited storage and unlimited MySQL databases on a LAMP
platform, which covered the main hosting concerns for this application.
c. Confirmed web host provider choice with customer: The preceding arguments were
presented to the client and approval was received.
2) Phase II: Configured production and development environments
a. Installed software in development environment to emulate deployment in the hosted
server environment: The development workstation for the project was running a
Windows XP SP3 operating system. However, this was not an issue for application
programming of this kind, because the focus is on the server software and the
application code that runs on top of the operating system. The technologies I worked with
are cross platform, so the code runs on both Windows and Linux. Next, an Apache
2.0 web server was installed on the workstation to mirror the Apache 2.0 on the
hosted server. Then PHP 5.3 and MySQL 5.5 were installed on the workstation, as
they were the current versions for the PHP interpreter and the MySQL database
engine supplied by the provider. All three pieces of newly installed software needed
to be configured to work together in the development environment. This was
accomplished through configuration scripts that were included in each software
package. Additionally an open source FTP tool, WinSCP 5.1, was installed to allow
for a graphical view of the project directories on the hosted web server and for the
ease of transferring completed files from the development to the deployment
environment. The hosted web server’s IP address and secure port numbers were saved
to the WinSCP tool. The web server could then be logged into using the tool.
b. Created database and schema in both environments: This is the point where the
database schema was designed. The schema is the logical representation of the
database tables, the data fields (columns) and how they relate to each other, thereby
giving the ability to store and search metadata for each presentation. With the
database engine installed and the schema designed, the database tables could be
created.
c. Set up project directories in both environments: Directories were created for the
project as a whole, for public facing web pages, for .css files, and the collection of
.pdf files.
d. Conducted preliminary testing of server functionality and database operation: After
completion of the preceding steps the development and deployment environments
were tested using a small sample of data entries and SQL scripts to ensure that our
infrastructure was in working order.
3) Phase III: Additional preparation of development environment
a. Installed code editor: A good source code editor is a tool to increase programming
efficiency. Useful features include line numbering, syntax highlighting, and find and
replace. In keeping with the open source theme of this project, I chose Notepad++, a
free code editor that runs on Windows. Additionally, I used it in conjunction with
WinSCP to edit files on the deployment server.
b. Installed software for bulk file type conversion: To accomplish the goal of
converting each PowerPoint™ presentation into a PDF file, I researched a solution
that involved running OpenOffice software as a server process and an open source
Python program, PyODConverter (GitHub, 2012).
c. Wrote additional scripts to automate and manage conversion process: In order to add
a degree of automation to the file conversion process two Python scripts were needed.
One script simply took as input a directory of .ppt or .pptx files and executed the
conversion solution on every file one after another. This script initially served as a
test of the solution and as an after the fact process if metadata had already been
written to the database for a given presentation. The second script was similar to the
first with the addition of embedded SQL commands to populate the database with
metadata for each presentation as the conversion took place. Directories used as input
and output boxes were created on the workstation to further organize the process and
the automation script contained the code to make use of this feature.
4) Phase IV: Project Execution
a. Coded user interface and dynamic request web pages: The actual coding of the
deliverable product occurred in this phase. HTML and CSS were used to create the
visual interface. PHP coding was used for the dynamic interactions and to query the
database with embedded SQL.
b. Performed bulk file conversions and populated development database: The solution
outlined in steps 3b and 3c of Phase III was executed here.
c. Tested project in development environment: This involved testing of all links and
testing of all query types using the developed interface. All bugs and anomalies that
manifested in the development environment were corrected before moving the code
to the deployment environment.
d. Prepared deployment server: The completed project files were copied over to the
hosting web server and the project’s database was loaded with the completed
metadata entries.
e. Tested project in production environment: Again, this involved testing of all links and
testing of all query types using the developed interface. All remaining bugs and
anomalies were caught and corrected before the client signed off on delivery.
f. Created user documentation: Step-by-step instructions were written as .html files to
clarify the usage and operation of the application. These files were copied into the
application and given hyperlinks on the main web page.
g. Obtained final customer approval: The customer was given a formal demonstration of
the finished application. Questions and concerns were addressed and resolved. The
customer was satisfied and gave their final approval.
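The first conversion script described in Phase III, which walks an input directory and converts each file in turn, might look roughly like the sketch below. It is not the project's script: the directory names are placeholders, and the converter is assumed to be a Python script that accepts an input path and an output path on the command line and talks to an OpenOffice process that is already running.

```python
import subprocess
import sys
from pathlib import Path


def plan_conversions(inbox, outbox):
    """Pair every .ppt/.pptx file in inbox with a .pdf path in outbox."""
    jobs = []
    for src in sorted(Path(inbox).iterdir()):
        if src.suffix.lower() in (".ppt", ".pptx"):
            jobs.append((src, Path(outbox) / (src.stem + ".pdf")))
    return jobs


def convert_all(inbox, outbox, converter_script):
    """Run the converter once per file, one after another.

    converter_script is assumed to take <input> <output> as arguments.
    Returns the source files whose conversion exited with an error.
    """
    failed = []
    for src, dst in plan_conversions(inbox, outbox):
        result = subprocess.run([sys.executable, converter_script, str(src), str(dst)])
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Separating the planning step from the step that launches the converter makes it possible to review the list of input and output paths before any file is converted.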
Project Deliverables
The primary deliverable for the project was an online database repository of XYZ’s scientific
slide presentations from the 2013 conference. A secondary deliverable was the establishment of a
web infrastructure to support the application, by way of a web-hosting provider. An ongoing
deliverable during the development process was the formal and informal communication with
XYZ’s staff. This section outlines, in sequence, the deliverables that were presented to the
customer.
Phase I. Deliverables: High-level overview of the system and associated costs.
a. A listing of the software requirements and verification that the chosen web-
hosting provider met the technical requirements.
b. A comparison of costs from several other web-hosting providers to show
competitive pricing.
c. Independent documentation of web server uptime statistics illustrating the
reliability of the chosen web-hosting provider.
d. The high-level technical documentation detailing the functionality of the
application.
Phase II. Deliverables: Documentation that system resources were in place and passed
preliminary testing.
a. An accounting of the software resources that were in place on the deployment server
and of the matching software installed on the development workstation.
b. Entity-Relationship charts that showed the database schema.
c. A diagram of the system directory hierarchy for the project.
d. Documentation of system test results. The project could not have moved forward
from this point without a positive test of the infrastructure.
Phase III. Deliverables: Documentation of the software resources that were in place.
a. A high-level explanation of the project’s custom document conversion process
including a discussion on the use of the open source software pertinent to this
process as well as the in house scripts developed to facilitate the batch processing,
file naming conventions, and database population. This document also confirmed
that the software was in place and passed operational testing.
Phase IV. Deliverables: Project execution that resulted in the fully functioning online
database application along with its operational documentation.
a. Weekly status meetings were held to keep the customer apprised of progress.
b. As part of the weekly status meeting, reports of the application’s test results on
the development system were presented, up to and including confirmation of the
fully functioning application in the development environment.
c. Status reports were presented ad hoc for the transfer of the application to the
deployment environment.
d. A single final report was presented confirming the completed operational
application in the deployment environment.
e. Operational documentation was included by way of hyperlinks within the
application.
Project Plan and Timelines
Activity | Start Date | End Date | Duration

Phase I: Obtain web hosting
Determine software requirements. | 07/21/2013 | 07/23/2013 | 3 days
Select web host provider. | 07/24/2013 | 07/25/2013 | 2 days
Confirm web host provider choice with customer. | 07/28/2013 | 07/28/2013 | 1 day
Create Phase I deliverable: High-level overview of proposed system and associated costs. | 07/29/2013 | 07/29/2013 | 1 day

Phase II: Configure production and development environments
Install scripting software in development environment to mirror server environment. | 07/30/2013 | 07/30/2013 | .5 day
Install FTP tool in the development environment if necessary. Add host provider URL to FTP tool addresses in the development environment. | 07/30/2013 | 07/30/2013 | .5 day
Create database and schema in both environments. | 07/31/2013 | 08/06/2013 | 5 days
Setup project directories in both environments. | 08/07/2013 | 08/07/2013 | .25 day
Conduct preliminary test of server functionality and database operation. | 08/07/2013 | 08/07/2013 | .75 day
Create Phase II deliverable: Documentation of the physical resources that are in place. | 08/08/2013 | 08/08/2013 | 1 day

Phase III: Additional preparation of development environment
Install code editor. | 08/11/2013 | 08/11/2013 | .5 day
Install software for bulk file type conversion. | 08/11/2013 | 08/11/2013 | .5 day
Write additional scripts to automate and manage conversion process. | 08/12/2013 | 08/15/2013 | 3.75 days
Create working directories to be used during bulk file conversions. | 08/15/2013 | 08/15/2013 | .25 day
Create Phase III deliverable: Documentation of the software resources that are in place. | 08/18/2013 | 08/18/2013 | 1 day

Phase IV: Project Execution
Code user interface and dynamic request web pages. | 08/19/2013 | 08/25/2013 | 5 days
Perform bulk file conversions and populate development database. | 08/26/2013 | 09/05/2013 | 8 days
Test project in development environment. | 09/08/2013 | 09/09/2013 | 2 days
Copy converted presentation files over to hosting web server. | 09/10/2013 | 09/10/2013 | .5 day
Load web server’s database with populated tables. | 09/10/2013 | 09/10/2013 | .5 day
Test project in production environment. | 09/11/2013 | 09/12/2013 | 2 days
Create user documentation. | 09/15/2013 | 09/16/2013 | 2 days
Obtain final customer approval. | 09/17/2013 | 09/19/2013 | 3 days
Project Development
In Phase I XYZ’s expectations were determined and project requirements were gathered
accordingly. The concern of cost was paramount. One reason I obtained this contract is that
my rates and final project totals were significantly lower than those of an audio/video
contractor who had been considered to record each session live. Besides not carrying the
overhead of a larger company, I was able to deliver an internet solution at a reasonable rate due to my ability to use
open source software and platforms. After discussions with XYZ’s staff about the project
requirements, I was confident that an open source approach would be feasible and the projected
cost would be more than satisfactory for the client. To this end, I chose a reliable web-hosting
provider that would provide the required infrastructure at a competitive rate.
Phase II was concerned with the design of the solution. Open source and universal
accessibility were major factors. My design addressed these concerns by implementing a
document conversion process that assured all presentations displayed on the web conformed to
the PDF format. Further conformity and accessibility were established by the use of the open
source software technologies of HTML, CSS, PHP, and MySQL for the coding of the project.
An integral part of the solution was the inclusion of a database. This provided convenient lookup
and search capabilities for the users and contributed to an efficient design behind the scenes, as
there were more than 500 documents to be managed.
For Phase III the project design was implemented. During this phase both the
development and deployment environments were established and configured. The documents
were put through the conversion process, the database was created and written to, dynamic web
pages were coded, and the static web pages were written. The completed application was tested
in the development environment to confirm basic functionality.
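Writing metadata to the database as files come out of the conversion process could be sketched as follows. This is an assumption-laden illustration: SQLite stands in for MySQL, the `presentations` table and its columns are hypothetical, and the abstract metadata is represented as a plain dictionary keyed by file name.

```python
import sqlite3


def create_table(conn):
    """Create a hypothetical single-table layout for presentation metadata."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS presentations ("
        "file_name TEXT PRIMARY KEY, title TEXT NOT NULL, authors TEXT NOT NULL)"
    )


def populate(conn, converted_files, abstracts):
    """Insert one row per converted file that has matching abstract metadata.

    Returns the file names that had no metadata and were therefore skipped.
    """
    skipped = []
    with conn:  # commits on success, rolls back if an insert fails
        for file_name in converted_files:
            meta = abstracts.get(file_name)
            if meta is None:
                skipped.append(file_name)
                continue
            conn.execute(
                "INSERT INTO presentations (file_name, title, authors) VALUES (?, ?, ?)",
                (file_name, meta["title"], meta["authors"]),
            )
    return skipped
```

Passing the values as query parameters, rather than building the SQL string by hand, keeps titles containing quotes or other special characters from breaking the insert.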
In Phase IV, troubleshooting took place. The application was rigorously tested in the
development environment. Next, it was migrated to the deployment environment and was tested
there until it passed with satisfactory results. This testing involved working extensively with the
user interface, from an end user’s perspective, to ensure the expected results were obtained and
any technical issues were caught before operational status was declared.
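Hands-on testing through the interface can be complemented by a simple consistency check between the database and the file directory. The sketch below is not part of the project as described; it assumes the list of file names recorded in the database has already been fetched, and it compares that list with the .pdf files actually present in a directory.

```python
from pathlib import Path


def check_repository(listed_names, pdf_dir):
    """Compare file names recorded in the database with files on disk.

    Returns (missing, orphaned): names listed but not found on disk, and
    .pdf files on disk that no database row refers to.
    """
    on_disk = {p.name for p in Path(pdf_dir).glob("*.pdf")}
    listed = set(listed_names)
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

An empty result for both lists suggests that every link generated from the database should resolve to a file, and that no converted file was left without a database entry.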
Phase V was the maintenance phase. After Phase IV was completed, the application ran
as planned without the need for intervention from myself or other technical people. A minimal
number of presentations needed to be cleaned up after the conversion process and a few XYZ
members submitted presentations after the fact. Both of these issues were addressed and
resolved.
Additional Deliverables
In order to emphasize the importance of the database to this application, I am including as
Appendix 2 a diagram of the table schema for the project. This is a representation of the database
structure, something an end user doesn’t need to consider, but the use of a database is critical to
an application where data and/or documents need to be organized in a logical fashion.
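As a hedged illustration of the kind of structure a table schema describes, the sketch below shows a normalized variant in which authors are stored in their own table, which is the First Normal Form alternative that was considered but not adopted. It is not the schema in Appendix 2: every table and column name is hypothetical, SQLite stands in for MySQL (`INSERT OR IGNORE` is SQLite syntax), and author names are assumed to be separated by semicolons.

```python
import sqlite3

SCHEMA = """
CREATE TABLE presentation (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    file_name TEXT NOT NULL UNIQUE
);
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE presentation_author (
    presentation_id INTEGER NOT NULL REFERENCES presentation(id),
    author_id INTEGER NOT NULL REFERENCES author(id),
    PRIMARY KEY (presentation_id, author_id)
);
"""


def split_authors(cell):
    """Split a semicolon-separated authors cell into trimmed, non-empty names."""
    return [name.strip() for name in cell.split(";") if name.strip()]


def add_presentation(conn, title, file_name, authors_cell):
    """Insert a presentation and link it to one author row per name."""
    cur = conn.execute(
        "INSERT INTO presentation (title, file_name) VALUES (?, ?)", (title, file_name)
    )
    presentation_id = cur.lastrowid
    for name in split_authors(authors_cell):
        conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)", (name,))
        author_id = conn.execute(
            "SELECT id FROM author WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO presentation_author VALUES (?, ?)",
            (presentation_id, author_id),
        )
    return presentation_id


def titles_by_author(conn, name):
    """Return titles for an exact author name, with no substring matching."""
    rows = conn.execute(
        "SELECT p.title FROM presentation p "
        "JOIN presentation_author pa ON pa.presentation_id = p.id "
        "JOIN author a ON a.id = pa.author_id "
        "WHERE a.name = ? ORDER BY p.id",
        (name,),
    )
    return [row[0] for row in rows]
```

With this layout each author is stored once and searched by exact name, at the cost of the extra parsing step and join tables that the project chose to avoid.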
Conclusion
The project started on schedule on July 21, 2013. There were no major setbacks. In the
design of the database, there had been a question of whether or not to enforce the First Normal
Form and I made the decision to forgo the First Normal Form. This decision did not affect the
quality of the end result and the schedule was maintained. The planning and implementation of
the document conversion process was successful and efficient. Coding of the dynamic and static
web pages went smoothly, again due to good planning. The web-hosting provider proved to be
effective and trouble free. Communication with the customer before the start and during the
project was clear and concise allowing for an effective and pleasant experience for all concerned.
The project ended on time on September 18, 2013 with the formal customer signoff.
References
Alexa (2014). Top Sites in: All Categories > Computers > Programming. Retrieved on June 19,
2014 from http://www.alexa.com/topsites/category/Top/Computers/Programming
BuiltWith (2014). Programming Language Usage. Retrieved on June 21, 2014 from
http://trends.builtwith.com/framework/programming-language
DB-Engines (2014). DB-Engines Ranking. Retrieved on June 21, 2014 from
http://db-engines.com/en/ranking
GitHub (2012). mirkonasato/pyodconverter. Retrieved on June 24, 2014 from
https://github.com/mirkonasato/pyodconverter
Mindfire Solutions (2014). Projects. Retrieved on June 19, 2014 from
http://www.mindfiresolutions.com/php-mysql-lamp-development.htm
[email protected] (2014). Lunarpages Uptime Report. Retrieved on
June 20, 2014 from http://uptime.besthostratings.com/viewreport.php?host=lunarpages
Web Hosting Review (2014). 2014 Best Reviews and Comparisons. Retrieved on
June 23, 2014 from http://web-hosting-review.toptenreviews.com
Appendix 1: Competency Matrix
Domain | Competency | Explanation
Leadership and Professionalism | Upper-Division Communication and Interpersonal Skills | I engaged in discussions with client to fully understand the project requirements.
Organizational Behavior and Management Principles/Principles of Management | Strategic Planning | I created a formal plan for the project implementation.
Liberal Arts/Reasoning and Problem Solving | Systematic problem solving | I understood the need for a vendor neutral application and devised a system to this end.
 | Evidence | I used widely proven technologies to achieve a solution with a high level of operational confidence.
Liberal Arts/Quantitative Literacy | Constructing Arguments and Reasoning | I presented logical arguments in favor of the technologies used.
Liberal Arts/Language and Communication | Reading critically | I critically evaluated documentation of software solutions to determine viability in project design.
Upper Division Collegiate Level Reasoning and Problem Solving | Planning and Information Gathering | I gathered information on open source resources for bulk document conversion and integrated a solution into the project to achieve this goal.
Language and Communication | Working with Sources | I used pertinent sources with citations to reinforce arguments.
 | Writing Style, Citations, and Use of Sources | I employed APA Citations inline with quoted text and properly formatted and listed on references page.
 | Adaptation | I have allowed for future extensibility of the project.
Networks | Network Infrastructure and Associated Components | I used my knowledge of multiple computer operating systems in composing the solution.
Software Engineering and Development | Web Programming | I applied my programming skills to the development of the solution.