8/3/2019 OCR by me
5/3/12
Introduction to Optical Character Recognition (OCR)
MADAN H R
Summary
Overview of OCR
System Requirements
Advantages and Disadvantages
Operation and Management
Questionnaire Design and Preparation
OCR (Optical Character Recognition)
Function & Features of OCR/ICR
ICR, OCR and OMR Compared
o Optical Mark Reader (OMR)
o OCR/ICR
OCR (Optical Character Recognition)
Also referred to as an Optical Character Reader:
a system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed, simply by scanning the form.
Intelligent Character Recognition (ICR) describes the process of interpreting image data, in particular alphanumeric text.
OCR is sometimes referred to as ICR.
Functions & Features of OCR
Forms are scanned through a scanner; the recognition engine of the OCR system then interprets the images and turns images of handwritten or printed characters into ASCII data (machine-readable characters).
The technology provides a complete form-processing and document-capture solution.
Allows an open, scalable workflow.
Includes forms definition, scanning, image
Functions & Features of OCR (cont.)
Delivers an easy training process for building the character library
OCR finds character pattern matches from a library of taught characters
Optical Character Verification (OCV) confirms the presence of desired characters in a specific location
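The library-matching idea on this slide can be sketched as follows. This is a minimal illustration, not the method of any particular OCR product: it assumes tiny 3x5 binary bitmaps and a simple pixel-agreement score, and the names `LIBRARY`, `score`, `recognize` and `verify` are hypothetical.

```python
# Hypothetical 3x5 glyph library ("taught characters"); '#' = ink, '.' = blank.
LIBRARY = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}

def score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total

def recognize(glyph, library=LIBRARY):
    """OCR: return the best-matching taught character and its score."""
    best = max(library, key=lambda ch: score(glyph, library[ch]))
    return best, score(glyph, library[best])

def verify(glyph, expected, library=LIBRARY, threshold=0.9):
    """OCV: confirm that the expected character is present at this location.

    The 0.9 threshold is an assumed value for illustration only.
    """
    return score(glyph, library[expected]) >= threshold

# A "7" with one noisy pixel in the fourth row.
noisy_seven = ["###", "..#", ".#.", "##.", ".#."]
print(recognize(noisy_seven)[0])   # 7
print(verify(noisy_seven, "7"))    # True
print(verify(noisy_seven, "1"))    # False
```

OCR asks "which taught character is this?", while OCV asks only "is the expected character here?", which is why `verify` compares against a single template.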
Functions & Features of OCR (cont.)
Date and lot inspection for pharmaceutical and medical packaging.
ICR, OCR and OMR Differences
ICR and OCR are recognition engines used with imaging;
OMR is a data collection technology that does not require a recognition engine.
OMR cannot recognize hand-printed or machine-printed characters.
Optical Mark Reader (OMR)
Forms
o OMR works with a specialized form that carries timing tracks along one edge. The timing tracks, which look like black boxes at the top or bottom of the form, tell the scanner where to read for marks.
o The cut of the form is very precise, and the bubbles must be in the same location on every form.
Storage
o With OMR, the image of a document is not scanned and stored.
Accuracy
o OMR is simpler than OCR.
o When the form is designed properly, OMR is more accurate than OCR.
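The role of timing tracks and fixed bubble positions can be illustrated with a small sketch. It assumes a toy form represented as a grid of 0/1 pixels, a timing track in column 0, and bubbles in fixed columns; the names `FORM`, `BUBBLE_COLUMNS` and `read_marks` are hypothetical.

```python
# Hypothetical scanned form: 1 = dark pixel, 0 = white.
# Column 0 holds the timing track; bubbles sit in fixed columns 2, 4 and 6.
FORM = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],   # timing mark: answer row, bubble B filled
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],   # timing mark: answer row, bubble A filled
    [0, 0, 0, 0, 1, 0, 0],   # stray mark without a timing mark: ignored
    [1, 0, 0, 0, 0, 0, 0],   # timing mark: answer row, nothing filled
]
BUBBLE_COLUMNS = {"A": 2, "B": 4, "C": 6}

def read_marks(form, timing_col=0, bubbles=BUBBLE_COLUMNS):
    """Read marks only on rows flagged by the timing track."""
    answers = []
    for row in form:
        if row[timing_col] != 1:
            continue  # the scanner does not look for marks on this row
        answers.append([label for label, col in bubbles.items() if row[col] == 1])
    return answers

print(read_marks(FORM))  # [['B'], ['A'], []]
```

Because the reader only samples known positions, no recognition engine is involved, which is consistent with OMR being simpler than OCR.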
OCR/ICR
Forms
o OCR/ICR is more flexible, since no timing tracks or block-like form IDs are required.
o The image can float on a page.
o ICR/OCR technology uses registration marks on the four corners of a document to recognize an image. Respondents place one character per box on this form.
o The use of drop-out color reduces the size of the scanner's output and enhances accuracy.
Storage/retrieval
o If the document needs to be electronically stored and maintained, then OCR/ICR is needed.
o With OCR/ICR technologies, images can be scanned,
OMR-OCR/ICR Compared
System Requirements
Minimum capacity PC Requirements:
o Processor: Pentium 200 MHz; RAM: 32 MB; Disk: 4 GB
o Form modules are designed to operate in batch-processing mode;
o Run on LAN and PC-based platforms and take full advantage of the graphical user interface and 32-bit processing power available with most Windows versions.
Software:
OCR with ICR capability software
Questionnaire Design Software
System Requirements (cont.)
Scanner
o OCR scanners with minimum capacity:
o Duplex scanning
o Speed: 60 sheets/min
o Automatic Document Feeder (ADF)
Scanning can take a significant amount of time, so the system lets the user scan forms without running OCR at the same time.
Advantages and Disadvantages
Advantages of Using Images Rather Than Paper
o Quicker processing; no moving or storage of questionnaires near operators
o Cost savings and efficiency gains from not handling paper questionnaires
o Scanning and recognition allow efficient management and planning for the rest of the processing workload
o Reduced long-term storage requirements; questionnaires can be destroyed after the initial scanning, recognition and repair
Advantages and Disadvantages (cont.)
Disadvantages of Using Images Rather Than Paper
o Accuracy
While OCR technology can be effective in converting handwritten or typed characters, it is less accurate than OMR, where users simply mark the form.
o Additional workload for data collectors
OCR has severe limitations when it comes to human handwriting.
Characters must be hand-printed with
Operation and Management
OCR Process Stages
o Document scanning process
Scanning speed is determined by the quality of the scanners, the amount of non-drop-out color, and the paper's quality, cleanliness and weight.
o Recognizing process
The recognizing process interprets the images. The right memory (dictionary) and the configured threshold determine how accurately the ICR interprets them.
o Verifying Process
To compare the value of the interpreted image with
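How a dictionary and a threshold might interact during recognition, and how doubtful results could be routed to verification, is sketched below. This is an assumption-laden illustration: the per-character confidence values, the 0.80 threshold, the `OCCUPATIONS` dictionary and the function `review_field` are all hypothetical and not taken from any specific ICR engine.

```python
def review_field(candidates, dictionary, threshold=0.80):
    """Return (value, needs_verification) for one recognized field.

    candidates: list of (character, confidence) pairs, one per character box.
    dictionary: set of values permitted for this field.
    """
    value = "".join(ch for ch, _ in candidates)
    low_confidence = any(conf < threshold for _, conf in candidates)
    unknown_value = value not in dictionary
    return value, low_confidence or unknown_value

OCCUPATIONS = {"FARMER", "TEACHER", "NURSE"}   # hypothetical field dictionary

clear = [("N", 0.97), ("U", 0.95), ("R", 0.91), ("S", 0.99), ("E", 0.96)]
doubtful = [("N", 0.97), ("U", 0.55), ("R", 0.91), ("S", 0.99), ("E", 0.96)]
misread = [("N", 0.97), ("V", 0.90), ("R", 0.91), ("S", 0.99), ("E", 0.96)]

print(review_field(clear, OCCUPATIONS))     # ('NURSE', False)
print(review_field(doubtful, OCCUPATIONS))  # ('NURSE', True)
print(review_field(misread, OCCUPATIONS))   # ('NVRSE', True)
```

Raising the threshold sends more fields to an operator for verification; lowering it accepts more values automatically at the cost of more undetected misreads.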
Operation and Management (cont.)
Image Manipulation
o Electronic questionnaires can be sent to specialist operators, then back to the original operator if necessary
o The same questionnaire can be worked on simultaneously by two or more persons
o Electronic questionnaires are readily available for post-census analysis (easier access to questionnaires)
o Parts of various questionnaires can be shown on screen at once for inter-record editing
o Able to view the relevant field book entry on screen in conjunction with questionnaires, which is helpful for coding and
Operation and Management (cont.)
Coding Assistance
o Problems are simpler for the operator to identify
o Images of questions that will not be captured (scanned but not recognized) can be used to help the coding process, for example light pencil marks.
o Operator can magnify images to read characters not discernible to the naked eye
o Appropriate software ensures that the data is validated as the forms are read.
o Checks ensure that selections on a form are filled in.
o It is possible to distinguish between intended marks and marks that have been erased.
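The last two checks can be sketched as a darkness classification followed by a per-question validation. The thresholds (0.20 and 0.60) and the names `classify` and `validate_question` are assumptions chosen for illustration; real software would calibrate such values against the scanner and the form.

```python
def classify(darkness, erased_min=0.20, marked_min=0.60):
    """Classify one bubble by its measured darkness in [0, 1]."""
    if darkness >= marked_min:
        return "marked"
    if darkness >= erased_min:
        return "erased"   # faint residue, assumed to be an erased mark
    return "blank"

def validate_question(bubbles):
    """bubbles maps option label -> measured darkness in [0, 1]."""
    marked = [opt for opt, d in bubbles.items() if classify(d) == "marked"]
    if len(marked) == 1:
        return "ok", marked[0]
    if not marked:
        return "missing", None
    return "multiple", None

print(validate_question({"A": 0.05, "B": 0.35, "C": 0.90}))  # ('ok', 'C')
print(validate_question({"A": 0.05, "B": 0.35, "C": 0.10}))  # ('missing', None)
print(validate_question({"A": 0.80, "B": 0.02, "C": 0.90}))  # ('multiple', None)
```

In the first case option B is treated as an erased mark, so the intended answer C is accepted rather than flagged as a double selection.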
Operation and Management (cont.)
OMR Scanner Speed Factors
o Skew: Each document is moved from an automatic feeder into a scanner, and an angle of skew is sometimes introduced.
o De-skew: Analyzes the image bitmap, then calculates and returns the angle of skew, up to +/-25. De-skew is often expressed as a percentage (%), which is the pixel shift. For example, 10% is a 20-pixel shift in a line of 200 pixels, or one tenth of an inch in an inch-long line.
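The percentage figure in the de-skew example can be checked with a short calculation. The conversion to an angle is an added assumption: it treats the pixel shift as perpendicular to the line, so the angle is the arctangent of shift over length. The function names are hypothetical.

```python
import math

def skew_percent(pixel_shift, line_length):
    """Skew as a percentage: pixel shift relative to the line length."""
    return 100.0 * pixel_shift / line_length

def skew_angle_degrees(pixel_shift, line_length):
    """Equivalent angle, assuming the shift is perpendicular to the line."""
    return math.degrees(math.atan2(pixel_shift, line_length))

print(skew_percent(20, 200))                   # 10.0
print(round(skew_angle_degrees(20, 200), 2))   # 5.71
```

Under that assumption, a 10% skew corresponds to an angle of roughly 5.7 degrees.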
Operation and Management (cont.)
Landscape Detection and Auto Rotation:
o Landscape detection automatically detects the appropriate images and rotates them 90 degrees.
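A minimal sketch of auto rotation follows. It assumes that "landscape" simply means an image wider than it is tall and that rotation is clockwise; the slide does not specify how detection works, and real products may analyze the content orientation instead. The function names are hypothetical.

```python
def is_landscape(image):
    """Assumed rule: an image is landscape when it is wider than it is tall."""
    return len(image[0]) > len(image)

def rotate_90_clockwise(image):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def auto_rotate(image):
    """Rotate only the images detected as landscape."""
    return rotate_90_clockwise(image) if is_landscape(image) else image

page = [
    [1, 2, 3],
    [4, 5, 6],
]
print(auto_rotate(page))  # [[4, 1], [5, 2], [6, 3]]
```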
White Page Detection:
o Normally, a double-sided scanner creates two images per scanned page.
o However, if the back or front page is blank, there is no need to store this image.
o White page detection allows the user to avoid storing blank pages.
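White page detection can be sketched as a count of dark pixels on each side of a duplex scan. The grayscale representation, the darkness level of 128, the 1% tolerance and the names `is_blank` and `pages_to_store` are assumptions for illustration only.

```python
def is_blank(image, dark_level=128, max_dark_fraction=0.01):
    """image: rows of 8-bit grayscale values (0 = black, 255 = white)."""
    pixels = [p for row in image for p in row]
    dark = sum(1 for p in pixels if p < dark_level)
    return dark / len(pixels) <= max_dark_fraction

def pages_to_store(front, back):
    """Keep only the sides of a duplex scan that are not blank."""
    sides = (("front", front), ("back", back))
    return [name for name, img in sides if not is_blank(img)]

front = [[255, 30, 255, 20], [255, 255, 10, 255]]      # some ink
back = [[255, 250, 255, 248], [251, 255, 255, 255]]    # near-white paper
print(pages_to_store(front, back))  # ['front']
```

A small tolerance for dark pixels lets specks of dust or scanner noise pass as blank, so that only genuinely used sides are stored.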
OCR Field Operation (cont.)
Causes of OCR Reading Errors
Poor condition of the form (dirty, folded, crumpled, etc.)
Forms fed into the OCR scanner are not straight (at an angle); forms are incompletely filled in
Reducing OCR Reading Errors
Check the questionnaires for completeness and consistency; prepare your own memory (dictionary); define permissible margins for OCR reading errors
Particular Care in Writing Numbers or Letters
One box contains only one character; characters should not extend outside the designated boxes;
Unnecessary additions to characters, such as points, decorative strokes, hooks, etc., are prohibited. Strokes should not end with flourishes or extensions.
All lines should be connected without breaks; all lines and dots should be written with the same pressure.
Value Checking Steps: Verify that the information captured by OMR is the same as on the questionnaire
(OCR) tool used in pharmaceutical, food and beverage, and other packaging inspection applications to read and verify printed text
Mark inspection for IC packages and discrete components.
THANK YOU!