Recommendation For Automating Current Data Entry System Angus Lam, Steven Le, Tyler Zalischuk ICT Students Computer Systems Option School of Information Communication Technology SAIT Polytechnic


Written for
Andrew Campbell, Client/Instructor

Written by
Angus Lam, Steven Le, Tyler Zalischuk
ICT Students, Computer Systems Option
School of Information Communication Technology, SAIT Polytechnic

Requested by Andrew Campbell, Client/Instructor
SAIT Polytechnic

18 April 2019

Executive summary
Introduction
Business Case
    Assessment
        Background
        Anticipated Outcomes
        Identified stakeholders
    Analysis
        Root cause Analysis
        Gap Analysis
        Known Risks
    Critical Success factors
        Hardware
        Software
        Networking
        Server Services
        Security
    Conclusion and recommendations
        Proposed solutions
    Conclusion
Solution Approach
System Review
    Software component
    Hardware
    Networking
    Security
    Server Services
    Overall outcome
Budget
    Budget break-down anticipated
    Budget break-down outcome
Conclusions
    Budget
    Schedule
    Scope
    Functionality
Recommendations & Opportunities
Glossary
References

Executive summary

This project is built around a shipping company whose customers submit personal data on paper forms. A printer fitted with cameras takes pictures of the forms, and the data is stored on our server. The end user can also view their information through a website.

The first part of this report contains the business case, which explains why we started this project: we conducted three interviews and chose one as its basis. We decided to go with the UPS interview, and our solution was a printer acting as a scanner that takes pictures of paper documents and sends the data to a database. The major sections of the business case are the assessment, analysis, critical success factors, conclusion and recommendations, and the conclusion for this project.

The anticipated outcome is a working demonstration of the system: taking pictures of the forms, processing them and sending the results to a database.

The identified stakeholders for our project are:
● Andrew Campbell (Client/Instructor)
● SAIT

The analysis covers root cause analysis, gap analysis and known risks. The biggest problem we face in the project is incorrect data from the OCR module, which needs training before it can read handwriting. The improvement we would like to see is a camera taking a picture of the form fields and the contents being sent to a database. The main known risk is the OCR module, since it was only trained to read computer fonts and not handwritten text.

The critical success factors are hardware, software, server services, security and networking. The hardware component consists of the printer, which pulls the paper in and sends it out, and the cameras, which take a picture of the form. The software portion of the project contains the OCR module and the camera control. Networking covers the routing protocols and VLANs. The server provides AD, a database and a web server. Security relies on a VPN and VLANs.

The conclusion of this report details how companies are not processing documents in an efficient way. The best possible solution for this problem is to implement software like OCR that can take handwritten letters and convert them to computer text in order to reduce cost and man hours.

The following part covers our solution approach, which documents how we approached this problem and arrived at using a gutted printer to scan in forms.

Next are our system review and budget review. In these sections we discuss the differences between our anticipated budget and workflow and what we actually did. This includes an analysis of our individual components, the hours spent and the variance in budget.

Lastly, this document contains recommendations for any future projects that may come from this one, including better hardware, improved software and further OCR training.


Introduction

This report recommends an automated system to process forms in place of manual data entry, and it summarizes and justifies that recommendation. It is written for clients looking to use such a system to automate data entry and make their business more efficient. Currently, data entry is slow and done by employees, which leads to human error and wasted time. In our interviews, one of the respondents highlighted how much time is spent on data entry in their workplace and lamented that staffing issues and demand have overworked their staff. Our recommendation is a form scanner that takes in generic forms and converts the fields into text. The text is sent to a database for future use and can be accessed by users from a web interface.

This document is a formal report that contains a business case for this project and a review of our project variance in comparison with the scope and times we set out to fulfil. Lastly, this document contains our recommendations and conclusions from this project and for subsequent projects.


Business Case

Assessment

Background

This project was originally based on an interview assignment from 3rd semester. Out of the three interviews that we obtained, we chose one with a UPS employee as the basis for this project. The employee expressed how much time is taken up by data entry in their business, specifically the time it takes to transfer a customer's filled-in form to a database or data entry software. For a company that takes in a large number of customer-completed forms, this was a perpetual problem, as they have some staffing issues. From there our group devised a way to automate this process and reduce the time required to complete the task. At first we envisioned some kind of conveyor belt system that would pass the forms under a scanner, which would take a photo of each form and transfer it to a database.

Anticipated Outcomes

A successful project would include a functioning way to take in forms and output them, a functioning database and web interface, and a piece of software that coordinates the I/O of the paper, converts the paper image to text and sends that text to a database. These would all work together to automate the process of entering customer-written data into a database and cut down on the time required for data entry.

Identified stakeholders

The identified stakeholders for this project include SAIT and Andrew Campbell, who is both our client and instructor for Capstone.


Analysis

Root cause Analysis

Currently, users entering data manually creates risks and leads to many problems. Data can be entered incorrectly, making human error a major problem in the process. For the person who enters the information from a pile of documents, the process is both time consuming and tedious. These two factors make the current method of data entry from forms inefficient and costly to a business. A way to automate the transfer of document information to a database will save time and resources. Information can be captured more accurately and with less human error, assuming that everything works perfectly. As a result, a company would save both man hours and money by implementing a solution that addresses this repetitive task.

Gap Analysis

The current method used for data entry in many businesses is costly and inefficient. Employing people to do this task introduces human error and costs more. We believe there is a way to improve this process and maybe even fully automate it.

The improvement we would like to see is having the information in the fields of a form captured and instantly stored. In doing this, the risk of human error should also be reduced. The overall goal is automation that closes the gap between the process's current performance and its potential. Currently the method used to add data from forms to a database is slow and inefficient, and it is slowed down further by inaccuracies and user error. In simple terms, the gap is that the tedious task of data entry can be automated but hasn't been yet, and it is costing businesses money.

To close this gap we believe that Optical Character Recognition (OCR) is key to making this possible. This software will be used to read the information of a form and this information would be transferred to a database for future use. This in conjunction with a paper feeder system will be able to help automate the process.

Known Risks

The main known risk to this project is that the OCR software is very picky and was originally trained for printed fonts. For OCR software, handwritten text is more difficult to read than printed text, and applicants who fill out the document may not have the best handwriting. This can lead to many errors when the OCR software detects the text. Many factors in a user's handwriting come into play, such as character spacing, consistency of lines within each character and the way they write certain characters.

Other risks include the gutted printer that we will be using to move the paper: if it were to fail, the automation would fail and the document scanner would be compromised. Another risk is our lack of experience with Python libraries and with using them in the capacity that this project needs.


Critical Success factors

Hardware

The hardware in the project mostly pertains to the gutted printer that we will be using to take in papers and push them back out. In addition, the ability of the cameras to take clear and coherent pictures will be integral to the project functioning correctly.

● Gut a printer and take control of its paper rollers with GPIO
● Consistently able to pull papers in and spit papers out
● Able to take clear and consistent photos and send them to a Raspberry Pi

Software

The software in this project will be responsible for controlling the cameras and printer along with processing the resulting images. In addition, it will be key to turning those images into text and sending that text to a database.

● Software is able to manipulate GPIO to control paper rollers
● Software is able to take and open pictures from a web camera
● Software is able to filter and improve readability of incoming images
● Software is able to implement and accurately use OCR library
● Software is able to send text data to a database
● Database has a table created and compatible with software
● Database is properly set up to receive data from software
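The last three criteria above (sending text to a database, a compatible table, and a database ready to receive data) can be illustrated with a minimal sketch. It makes several assumptions that are not in the report: the table name `forms` and the fields `name`, `address` and `phone` are hypothetical, and Python's built-in `sqlite3` stands in for the project's MySQL database so that the example runs without a server.

```python
import sqlite3


def store_fields(conn, fields):
    """Insert one scanned form's recognised fields as a single row.

    The table layout and field names are hypothetical stand-ins.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forms ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT, address TEXT, phone TEXT)"
    )
    # Parameterised query: OCR output is never spliced into the SQL string.
    cur = conn.execute(
        "INSERT INTO forms (name, address, phone) VALUES (?, ?, ?)",
        (fields["name"], fields["address"], fields["phone"]),
    )
    conn.commit()
    return cur.lastrowid


# In-memory database so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
row_id = store_fields(
    conn, {"name": "Jane Doe", "address": "123 Main St", "phone": "555-0100"}
)
stored = conn.execute("SELECT name, address, phone FROM forms").fetchall()
```

With MySQL Connector the same pattern applies, but the placeholders are written `%s` instead of `?`.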

Networking

The networking function in this project serves to make sure our other devices are able to communicate and securely pass on information. The network will be divided into 2 VLANs and will support the Raspberry Pi interfacing with the database.

● All devices are able to communicate with each other
● Connection between software and database can be easily made
● Website is accessible by clients only on specific internet addresses
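The last criterion, restricting the website to specific addresses, comes down to an allow-list check. The sketch below shows that logic using Python's standard `ipaddress` module. The two subnets are hypothetical examples, not the project's actual addressing, and in the project itself the restriction was planned as an IIS setting rather than as code.

```python
import ipaddress

# Hypothetical allow-list: one subnet per VLAN. The real addressing
# plan is not given in the report.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("192.168.20.0/24"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


inside = is_allowed("192.168.10.15")   # address on an allowed subnet
outside = is_allowed("10.0.0.5")       # address outside the allow-list
```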

Server Services

The primary need for a server in this project is to host the database and web interface. These will be used as the portal for the user and as the point where information is delivered by the software.

● Server is able to host web server and database
● Server is able to host services and allow access to users and software


Security

The security part in this project is used to make sure that no one on the outside network can gain access to the internal network. The software that we used to make this happen is OpenVPN and

● OpenVPN creates an encrypted tunnel from the server to the desktop


Conclusion and recommendations

Proposed solutions

Proposed solution 1: Fully automatic system that reads in an entire form and converts it to text

Benefit:
1. Drastically cuts down on time spent on data entry
2. Reduces the chance of human error
3. A set-and-forget solution that papers can simply be thrown into
4. Easy for an end user to use

Feasibility: Moderate difficulty. The only part that is variable and could make or break the project is the OCR software, while the other components will be about as easy as in the other solutions.

Risk:
1. OCR software is not the best and its accuracy may not be sufficient
2. Messy handwriting will outright fail in OCR
3. Not an easy option; it will be difficult to implement to a high degree of accuracy

Proposed solution 2: Semi-automatic system that reads in specific parts of a form and converts them to text

Benefit:
1. Can modestly cut down on time spent on data entry
2. Should help reduce the need to do repetitive tasks

Feasibility: Moderate difficulty. Implementation would probably be about the same as the first option, but used in a more limited capacity.

Risk:
1. OCR software will still have trouble with messy writing
2. The time to recoup the cost of this project may be longer

Proposed solution 3: Simple automatic system that reads in binary options and converts them into text

Benefit:
1. Supplements the data entry task
2. Takes some time off of data entry
3. Cheap to implement

Feasibility: Very simple and easy to implement relative to the other suggestions.

Risk:
1. The benefit may not outweigh the cost to create this

Conclusion

The way companies currently process documents is not optimal, and there should be an automated way to process these documents. The current method introduces human error and can increase the amount of time spent processing these documents. These problems can be alleviated with some method of automation that either takes the process completely out of users' hands or assists the people doing data entry.

The best possible solution is the fully automatic one, which reads in everything for the user, but it also carries the most risk of the three proposals. This solution would drastically reduce the cost and man hours required to enter information from forms, and a company could expect a very quick return on investment. The other two carry much lower risk but also a lower relative benefit to the business. They are to some extent more feasible, as the OCR software is less likely to fail, although OCR software is finicky in the first place, so they have their own risks.

Solution Approach

Our project is aimed at small companies looking to automate the process of data entry. It was originally based on an interview with a UPS employee who highlighted how much time it takes to complete data entry. Our solution and capstone project was to create an automated way to process these documents. We had to take a photo of a form, have the form enter and leave, and get that information to a database that could be accessed by a user. There were 3 proposed solutions: the first was fully automated, the second was semi-automated and the last only assisted the user. In the end we chose to try the fully automated approach. This included a gutted printer to take in papers and send them back out, a web camera to take a photo and send it to the Raspberry Pi, and software that crops, filters, runs OCR and sends the data to a database. All of this sits on our server and Raspberry Pi, which are connected through 2 switches, a router and a SOHO router.

System Review

Software component

Anticipated

We will have to implement these libraries in our Python script: OpenCV (cv2), numpy, GPIO, MySQL Connector and pytesseract. These libraries, while unfamiliar, are key to our software component. The OCR software is going to be inaccurate, and we hope to reduce the error rate by applying filtering and cropping to the fields before sending them to the library.

Following our Gantt chart, we expect to have this software portion done by late February, hopefully with 80% accuracy on the OCR portion.

Outcome

The OpenCV library became the primary library for processing the photos, and numpy was not used. We implemented the originally planned photo filter and an improved photo cropper using OpenCV bounding boxes. In addition, the OCR library was able to read our text about 50% of the time, and it got worse with sloppy writing. We have trained it to the best extent we could, but we are not sure if it's enough, so we have had to modify the generic form that we are sending to the device.
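The report does not state how the 50% and 80% figures were measured. One common way to score OCR output, shown here purely as an assumption, is character-level accuracy: the edit distance between the expected text and the recognised text, divided by the length of the expected text. The function names and sample strings below are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(expected, recognised):
    """Fraction of the expected characters the OCR got right (0.0 to 1.0)."""
    if not expected:
        return 1.0 if not recognised else 0.0
    errors = edit_distance(expected, recognised)
    return max(0.0, 1.0 - errors / len(expected))


# One misread character ("L" read as "1") in a seven-character field.
score = char_accuracy("CALGARY", "CA1GARY")
```

Averaging this score over a set of test forms gives a single figure that can be tracked while the filtering, cropping and training are adjusted.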

This software portion took up essentially the whole 13 weeks due to delays with the hardware. Those delays pushed back a fair bit of the testing, and not being able to see how everything interacts in a full run pushed the finish date back further.

Hardware

Anticipated

We anticipated that disassembly and analysis of the parts would be difficult. We were unsure what had to be taken out or disassembled in order to see the important components, and we were uncertain whether the motor would be simple or complicated to control. Once the motor was located, we hoped that a simple connection from the GPIO of the Pi to the motor would work. The motor would be tested to ensure the Raspberry Pi could power it. For the camera positioning we planned to have it strapped on a mount and aimed down at the printer. For our network hardware, we believed that 2 servers, a Raspberry Pi, a desktop, 2 switches and a router were required.

We expect this portion to be done by about mid-February, with most issues resolved and everything working.

Outcome

At the start we decided to have everything on 1 server instead of two, putting Active Directory and SQL together. The disassembly of the printer turned out to be much simpler than we thought. Simply taking off the printer cover let us easily see the internals of the printer, such as the rollers, gears and the motor controlling the paper feeder. During the earlier weeks of the project we ran into an issue with the motor, which was not getting enough power from the Pi.

To solve this issue we asked Andrew for advice and he suggested using a relay. A relay allows us to add in more voltage from another source. In this case we simply used a 5 volt and 3 volt output from the GPIO on the Pi giving us 8 volts. The relay took a bit of time to figure out but after some testing, the paper feeder was able to take a single piece of paper through and shoot it back out properly.

A KVM switch was bought soon after and implemented to allow easy switching between devices. When implementing the camera mount, we had the idea of instead taking the scanner off the printer lid, giving us a window into the printer. This not only allowed us to see the internals of our printer but also let us place the camera on top of the glass, looking down on the document fields. For better resolution we got another 1080p/720p camera to easily capture both groups of fields. LEDs were soon added underneath the glass panel to help the cameras and OCR read the written information more easily.

When it came to testing with a stack of papers, we found that the feeder took the whole stack instead of just one page at a time. After further analysis we found that one part of a gear or roller underneath was not turning. This mechanism only turns when the other gears turn in a certain direction. To run the motor in the other direction, we swapped its hot and ground wires. The printer was soon able to take one page at a time from a stack of papers. Another problem was the paper coming out of the paper feeder after being processed: it would jam or get stuck to the point of wrapping around the rollers.

After further analysis of the moving components, it seemed that the rollers need to roll one way to take the page and the other way to push it back out. We attempted to use a second relay and build the same circuit with the polarity of the motor wires reversed, so the motor would turn in the other direction. After many attempts at fixing the circuit and more research, we figured out that we needed another component, called an H bridge or motor driver, which allows direction control of 2 different motors. We bought the motor driver ourselves, and after some fiddling we were able to control the direction of the motor. With further testing, the paper successfully went in and out with little error.
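The direction control that an H bridge provides can be sketched as a small lookup from direction to the logic levels on its two input pins. This is a sketch under assumptions, not the project's code: the report does not name the motor driver, the level pattern shown is the one commonly used by two-input driver boards, and the pin numbers 17 and 27 are hypothetical. The pin-writing function is passed in so the example runs without a Raspberry Pi. On the Pi, something like `RPi.GPIO.output` could be supplied instead.

```python
# Logic levels for the two direction inputs of one H-bridge channel
# (an assumed, commonly used pattern; check the driver's datasheet).
DIRECTION_LEVELS = {
    "forward": (1, 0),   # pull the page in
    "reverse": (0, 1),   # push the page back out
    "stop": (0, 0),      # both inputs low: motor not driven
}


def set_direction(direction, write_pin, in1=17, in2=27):
    """Drive the two input pins for the requested direction.

    write_pin(pin, level) is whatever actually sets a GPIO output;
    in1 and in2 are hypothetical pin numbers.
    """
    level1, level2 = DIRECTION_LEVELS[direction]
    write_pin(in1, level1)
    write_pin(in2, level2)
    return {in1: level1, in2: level2}


# Dry run: record the pin writes instead of touching real hardware.
calls = []
state = set_direction("reverse", lambda pin, level: calls.append((pin, level)))
```

Reversing the motor becomes a change of two logic levels instead of physically swapping the hot and ground wires.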

When it came to testing the stack of pages again we ran into issues. The feeder would take only the first page and no more, or sometimes take the whole stack. Paper positioning had to be very precise, so we fell back on sending only one page at a time. We also implemented a feature in which the motor code stops its loop when it sees the word “STOP”, which we labeled on the feeder. For proper cable management, we used a separate power supply to power the breadboard for the LEDs and motor driver.
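The “STOP” feature can be sketched as a loop that checks the recognised text before feeding each page. The function names, the order of the steps and the `max_pages` safety limit are assumptions made for illustration. The camera/OCR step and the motor step are passed in as plain functions so the loop can be tried without hardware.

```python
def run_feeder(read_text, feed_page, max_pages=50):
    """Process pages until the camera reads the STOP label.

    read_text() returns the OCR text currently visible to the camera;
    feed_page() moves the current page through the feeder.
    max_pages is a safety limit in case STOP is never recognised.
    """
    captured = []
    for _ in range(max_pages):
        text = read_text()
        if "STOP" in text.upper():
            break  # no page is covering the label: the feeder is empty
        captured.append(text)
        feed_page()
    return captured


# Dry run with stand-ins: two forms, then the bare feeder label.
frames = iter(["Name: A. Smith", "Name: B. Jones", "STOP"])
fed = []
pages = run_feeder(lambda: next(frames), lambda: fed.append("page"))
```

The safety limit matters because the report found OCR to be unreliable: if the label were misread, a loop without a limit would keep running the motor.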

In actuality this part took all 13 weeks to get done, as new problems kept cropping up from time to time, especially the last-second realization that we needed an H bridge for the motor to function properly.

Networking

Anticipated

Our topology had a Cisco router with one interface port connected to the first switch and the other interface port connected to the second switch. We still weren't sure where the SOHO router would be attached, so we simply connected it to the first switch. Another objective was to implement DNS configuration on the Cisco router so that the Windows server could be our DNS server. We also wanted to use three VLANs: VLAN 10 and VLAN 20 for the two switches, and VLAN 99 for trunking.

The networking part of the project was expected to take 4 weeks and be relatively easy, since all we had to do was configure 2 switches, a Cisco router and a SOHO Linksys router that would give our network access to the outside network so other people could use the Wi-Fi.

Outcome

The biggest problem we had was getting the VLANs to talk to each other after we changed the topology from two switches connected to a router to a router on a stick connected to a single switch that daisy chains to the second switch. To solve this, we gave a single VLAN interface an IP address on both switches, configured encapsulation on the first switch and VLAN tagging on the router, so that packets flow around the network with either a VLAN 10 or VLAN 20 tag. Another problem was getting the SOHO router to connect the outside network to our network. To solve this we tagged it with VLAN 20 (the first switch that the Cisco router touches) and added a static route to the outside network. Overall the networking went smoothly, and everything is connected and works.

The networking part of the project was expected to take 4 weeks of work that turned out to be 8 weeks of work because of problems that we had to deal with.


Security

Anticipated

We expected the implementation of AD authentication and IP restriction on IIS to be fairly easy. OpenVPN was also expected to be easy, as we have previous experience with it. Lastly, hardening the servers and devices was expected to be easy, as it involved removing services and reducing the attack surface.

The security function of the project was expected to take 3-4 weeks of work. While it looked easy, we were expecting to hit a hiccup or two during that time.

Outcome

Most of it went as we expected. IIS security was easy to set up, and the devices were easy enough as well, as that was just checking for extra programs and software and removing them. OpenVPN took a few weeks: while it was correctly installed, there was an error in the configuration, and because the issue was misdiagnosed during troubleshooting there were a few reinstalls of the whole implementation.

The security function of the project took about 2 weeks to implement and was easy enough though we feel we might have missed a few things.

Server Services

Anticipated

We planned to have DHCP as a service on the Windows server along with Active Directory, DNS and IIS. IIS hosts the website that we created for the SQL database, Active Directory stores usernames and passwords, DNS resolves the domain name, and DHCP assigns IP addresses dynamically to computers.

We expected this to be done quickly and without many errors over the first few weeks.

Outcome

We decided not to use the DHCP server service and instead have the router give DHCP addresses to computers and other devices. Active Directory, DNS and IIS were still installed on the single server.

This was done quickly and without many errors over the first few weeks.

Overall outcome

In review, based on the anticipated hours coming purely from assigned time in lab, we expected each of us to spend around 78 hours in total on this project. The actual hours vary a fair bit, as most of us went over that time. This is mostly because we underestimated the time our individual components would need. We expected that this project would be a bit easier and take less time overall. This applied especially to the hardware and software portions, which had an issue at every step.

| | Angus Lam | Steven Le | Tyler Zalischuk |
|---|---|---|---|
| Hours Anticipated | 78 | 78 | 78 |
| Hours Actual | 87 | 106 | 76 |
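The per-person variance implied by the hours table can be worked out directly. The short sketch below only restates the table's figures, and the variable names are illustrative.

```python
# Figures copied from the hours table.
ANTICIPATED = {"Angus Lam": 78, "Steven Le": 78, "Tyler Zalischuk": 78}
ACTUAL = {"Angus Lam": 87, "Steven Le": 106, "Tyler Zalischuk": 76}


def hours_variance(anticipated, actual):
    """Actual minus anticipated hours per person (positive = over)."""
    return {name: actual[name] - anticipated[name] for name in anticipated}


variance = hours_variance(ANTICIPATED, ACTUAL)
# variance == {"Angus Lam": 9, "Steven Le": 28, "Tyler Zalischuk": -2}
team_total = sum(variance.values())  # net hours over the anticipated total
```

Two of the three members went over the anticipated 78 hours, and the team as a whole spent 35 hours more than anticipated.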


Budget

Budget break-down anticipated

| Phase | Item | January | February | March | April | Total |
|---|---|---|---|---|---|---|
| Phase 1 Software | E & M | 0 | 0 | 0 | 0 | 0 |
| Phase 1 Software | Labour | $300 | $450 | $100 | $200 | $1050 |
| Phase 2 Hardware | E & M | $100 | $169.99 | $4289.99 | 0 | $4559.98 |
| Phase 2 Hardware | Labour | $180 | $120 | $150 | 0 | $450 |
| Phase 3 Network | E & M | 0 | $7669.78 | 0 | 0 | $7669.78 |
| Phase 3 Network | Labour | 0 | $100 | $125 | 0 | $225 |
| Phase 4 Server Services | E & M | 0 | 0 | $1944 | 0 | $1944 |
| Phase 4 Server Services | Labour | 0 | 0 | $225 | $200 | $425 |
| Phase 5 Security | E & M | 0 | 0 | 0 | 0 | 0 |
| Phase 5 Security | Labour | $150 | $125 | $100 | $175 | $550 |
| Total | | $730 | $8634.77 | $6933.99 | $575 | $16873.76 |

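Budget break-down outcome

| Phase | Item | January | February | March | April | Total |
|---|---|---|---|---|---|---|
| Phase 1 Software | E & M | 0 | 0 | 0 | 0 | 0 |
| Phase 1 Software | Labour | $193.125 | $386.25 | $321.87 | $643.75 | $1545 |
| Phase 2 Hardware | E & M | $2030.75 | $10.00 | $140.00 | $13.00 | $2180.75 |
| Phase 2 Hardware | Labour | $226.25 | $426.25 | $426.25 | $226.25 | $1305 |
| Phase 3 Network | E & M | $3822.42 | 0 | 0 | 0 | $3822.42 |
| Phase 3 Network | Labour | $570 | $285 | $285 | 0 | $1140 |
| Phase 4 Server Services | E & M | $1944 | 0 | 0 | 0 | $1944 |
| Phase 4 Server Services | Labour | $570 | $225 | $135 | 0 | $930 |
| Phase 5 Security | E & M | 0 | 0 | 0 | 0 | 0 |
| Phase 5 Security | Labour | 0 | 0 | 0 | $550 | $550 |
| Total | | $9356.55 | $1232.5 | $1308.12 | $1342.25 | $13417.17 |
| Total Difference | | -$8626.55 | $7402.27 | $5625.87 | -$767.25 | $3456.59 |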

Overall, our actual spending came to $3456.59 less than our anticipated requirements for this project. Looking at the difference between the anticipated budget and our actual budget, we can see that labour cost more than we expected in total, with labour for some components going over a thousand dollars. On the other hand, our cost for equipment was less than we expected, mostly because our team used one less server than planned. In addition, the timing of when the money would be needed was far different than we first estimated, as most equipment was obtained at the beginning. Labour, rather than lasting evenly for the duration of each component, was more spread out and stacked towards the end.
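The Total Difference figures are the anticipated Total row minus the actual Total row, taken column by column. The sketch below reproduces that subtraction from the values printed in the two budget tables, using `Decimal` so the cents are exact. The variable names are illustrative.

```python
from decimal import Decimal

COLUMNS = ["January", "February", "March", "April", "Total"]

# "Total" rows copied from the anticipated and outcome budget tables.
ANTICIPATED = ["730", "8634.77", "6933.99", "575", "16873.76"]
ACTUAL = ["9356.55", "1232.5", "1308.12", "1342.25", "13417.17"]


def budget_difference(anticipated, actual):
    """Anticipated minus actual per column (positive = under budget)."""
    return [Decimal(a) - Decimal(b) for a, b in zip(anticipated, actual)]


difference = dict(zip(COLUMNS, budget_difference(ANTICIPATED, ACTUAL)))
# difference["January"] is negative: most equipment was bought up front.
```

A negative value means more was spent in that month than anticipated, which matches the observation that most equipment was obtained at the beginning of the project.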

Conclusions

Budget

In review, our project cost less for the required equipment and materials than we had initially anticipated. This was due to a lack of understanding of how equipment would be obtained over the course of the project and of the exact specifications of the equipment itself. In the end we had planned for more than we actually used.

With regard to labour, we spent far more time on this project than anticipated. The amount of work required for our software and hardware sections was on average 20 more hours than the allotted work time we were given. Other components like networking also took up more time due to issues.

Schedule

As we progressed through our project, we ran into minimal issues at the start. Many of the tasks were finished ahead of time, which gave us extra time for future tasks that we might struggle with. Networking was finished on schedule even with the minor communication problems between our VLANs. We fell behind schedule with our hardware due to the persistent issues with the document being fed in and sent back out, which required us to spend time figuring out the connection of the H bridge. Software was finished on time, with some extra time put into improving its accuracy, although software testing was pushed back due to the hardware issues.

Scope

Our scope was met, we have a gutted printer to take in papers and send them out, software than controls, processes, and send that data to the database and lastly a secured network that supports all of these processes. All objectives in scope were met and achieved in the project except for one. The objective we missed was having the VPN hosted on the Raspberry Pi and connecting to our database server.

Functionality

Our project fully functions from start to finish. Only minor problem is the positioning of the document or stack of papers in order to prevent jamming or other anomalies that may occur. Otherwise, document goes through to which is captured by our cameras. Captured images are analyzed by our software and OCR storing the hand written information into our SQL database.

Recommendations & Opportunities

As a result of this project, we identified several areas that could be worked on to further improve the system's efficiency and security.


1. Put more training and work into the OCR to increase its accuracy.
2. Move the OCR engine from Tesseract to a paid solution such as ABBYY to improve accuracy.
3. Use dedicated hardware for both the cameras and the paper feeder, which would drastically improve performance.
4. Modify the software so that it works with other forms, both single-sided and two-sided, including forms from other industries.
5. Disassemble the printer further to improve how reliably paper is pulled in and sent out.
6. Use a modern Layer 3 switch instead of a Layer 2 switch, because it allows more than one VLAN to have an IP address on its own interface.
7. Implement EIGRP instead of static routing to provide better redundancy for packets travelling through the network.
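
To judge whether further OCR training or a different engine actually improves accuracy, the recognized text needs to be compared against a known-correct transcription. One common measure is the character error rate: the edit distance between the two strings divided by the length of the correct text. The sketch below is an illustration under that assumption, not a measurement we performed; the function names and sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(expected: str, recognized: str) -> float:
    """Edit distance divided by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return edit_distance(expected, recognized) / len(expected)


# Hypothetical sample: the OCR read the letter 'l' as the digit '1'.
cer = character_error_rate("Calgary", "Ca1gary")
```

Running the same set of test forms through each candidate OCR configuration and comparing the resulting rates would give a consistent basis for recommendations 1 and 2.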


Glossary

Active Directory: A server running Active Directory Domain Services (AD DS) is called a domain controller. It authenticates and authorizes all users and computers in a Windows domain-type network, assigning and enforcing security policies for all computers and installing or updating software.

Authentication: The process of identifying an individual, usually based on a username and password.

Authorization: Authorization is a security mechanism used to determine user/client privileges or access levels related to system resources, files, services, and data.

Camera: A camera is an optical instrument for capturing still images or recording moving images ("video"), which are stored on a physical medium such as a digital system or photographic film.

DHCP: Dynamic Host Configuration Protocol (DHCP) is a protocol used to provide quick, automatic, and central management for the distribution of IP addresses within a network.

DNS: The Domain Name System (DNS) translates domain names to IP addresses so browsers can load internet resources.

GPIO: GPIO stands for General Purpose Input/Output. It’s a standard interface used to connect microcontrollers to other electronic devices.

IIS: Internet Information Services (IIS) is an extensible web server created by Microsoft for use with the Windows NT family. IIS supports HTTP, HTTPS, FTP, SMTP, etc.

KVM: A KVM switch (with KVM being an abbreviation for "keyboard, video and mouse") is a hardware device that allows a user to control multiple computers from one or more sets of keyboards, video monitors, and mice.

OCR: Optical character recognition (OCR) is the use of technology to distinguish printed or handwritten text characters inside digital images of physical documents, such as a scanned paper document.

Raspberry Pi: The Raspberry Pi is a low-cost, credit-card-sized computer that plugs into a computer monitor or TV and uses a standard keyboard and mouse.

Scanner: A scanner is a device that optically scans images, printed text, handwriting, or an object and converts it to a digital image.


Server: A server is a computer program or a device that provides functionality for other programs or devices, called clients. Servers can provide various functionality, often called services, such as sharing data or resources among multiple clients.

SQL: Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system.

Switch: A network switch is a computer networking device that connects devices together on a computer network by using packet switching to receive, process, and forward data to the destination devices.

VLAN: A Virtual Local Area Network (VLAN) works by applying tags to network packets and handling these tags in networking systems, creating the appearance and functionality of network traffic that is physically on a single network but acts as if it is split between separate networks.

VPN: A Virtual Private Network (VPN) is a private network that extends across a public network and enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.

