OCR by me

  • 8/3/2019 OCR by me

    5/3/12

    Introduction to Optical Character Recognition (OCR)

    MADAN H R

    Summary

    Overview of OCR

    System Requirements

    Advantages and Disadvantages

    Operation and Management

    Questionnaire Design and Preparation

    OCR (Optical Character Recognition)

    Function & Features of OCR/ICR

    ICR, OCR and OMR Compared

    o Optical Mark Reader (OMR)

    o OCR/ICR

    OCR (Optical Character Recognition)

    OCR, also referred to as an Optical Character Reader, is a system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed simply by scanning the form.

    Intelligent Character Recognition (ICR) describes the process of interpreting image data, in particular alphanumeric text.

    The terms OCR and ICR are sometimes used interchangeably.

    Functions & Features of OCR

    Forms are scanned, and the recognition engine of the OCR system then interprets the images, turning handwritten or printed characters into ASCII data (machine-readable characters).

    The technology provides a complete form-processing and document-capture solution.

    Allows an open, scalable workflow.

    Includes forms definition, scanning, image

    Functions & Features of OCR

    Delivers an easy training process for building the character library.

    OCR finds character pattern matches from a library of taught characters.

    Optical Character Verification (OCV) confirms the presence of desired characters in a specific location.
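
    The library-matching idea above can be illustrated with a minimal sketch. It assumes tiny 3x3 binary bitmaps and a pixel-difference (Hamming) score. The LIBRARY contents and the function names recognize and verify are hypothetical and not taken from any particular OCR product.

```python
# Hypothetical 3x3 glyph bitmaps standing in for a taught character library.
LIBRARY = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

def hamming(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(glyph, library=LIBRARY):
    """OCR step: return the closest taught character and its pixel distance."""
    best = min(library, key=lambda ch: hamming(glyph, library[ch]))
    return best, hamming(glyph, library[best])

def verify(glyph, expected, max_distance=1, library=LIBRARY):
    """OCV step: confirm that the expected character is present at this location."""
    return hamming(glyph, library[expected]) <= max_distance

noisy_t = ("111", "010", "011")  # a "T" with one stray pixel
print(recognize(noisy_t))        # ('T', 1)
print(verify(noisy_t, "T"))      # True
```

    Recognition searches the whole library for the best match, while verification only checks one expected character against a tolerance, which is why OCV can be stricter and faster on fixed-position text.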

    Functions & Features of OCR

    Date Lot Inspection for pharmaceutical and medical packaging.

    ICR, OCR and OMR Differences

    ICR and OCR are recognition engines used with imaging.

    OMR is a data collection technology that does not require a recognition engine.

    OMR cannot recognize hand-printed or machine-printed characters.

    Optical Mark Reader (OMR)

    Forms

    o An OMR works with a specialized document that carries timing tracks along one edge of the form. The tracks, which look like black boxes at the top or bottom of the form, tell the scanner where to read for marks.

    o The cut of the form is very precise, and the bubbles must be in the same location on every form.

    Storage

    o With OMR, the image of a document is not scanned and stored.

    Accuracy

    o OMR is simpler than OCR.

    o When the form is designed properly, OMR is more accurate than OCR.

    OCR/ICR

    Forms

    o OCR/ICR is more flexible, since no timing tracks or block-like form IDs are required.

    o The image can float on a page.

    o OCR/ICR technology uses registration marks on the four corners of a document to recognize an image. Respondents place one character per box on this form.

    o The use of drop-out color reduces the size of the scanner's output and improves accuracy.

    Storage/retrieval

    o If the document needs to be electronically stored and maintained, then OCR/ICR is needed.

    o With OCR/ICR technologies, images can be scanned,

    OMR-OCR/ICR Compared

    System Requirements

    Minimum PC requirements:

    o Processor: Pentium 200 MHz; RAM: 32 MB; Disk: 4 GB

    o Form modules are designed to operate in batch-processing mode.

    o Runs on LAN and PC-based platforms and takes full advantage of the graphical user interface and 32-bit processing power available with most Windows versions.

    Software:

    OCR software with ICR capability

    Questionnaire design software

    System Requirements (cont.)

    Scanner

    o OCR scanners with minimum capacity:

    o Duplex scanning

    o Speed: 60 sheets/min

    o Automatic Document Feeder (ADF)

    Scanning can take a significant amount of time, so the system lets the user scan first without running the OCR.

    Advantages and Disadvantages

    Advantages of Using Images Rather Than Paper

    o Quicker processing: questionnaires do not have to be moved or stored near operators.

    o Cost savings and efficiencies from not handling paper questionnaires.

    o Scanning and recognition allow efficient management and planning for the rest of the processing workload.

    o Reduced long-term storage requirements: questionnaires can be destroyed after the initial scanning, recognition and repair.

    Advantages and Disadvantages

    Disadvantages of Using Images Rather Than Paper

    o Accuracy

    While OCR technology can be effective in converting handwritten or typed characters, it is not as accurate as OMR for reading data where users are actually marking forms.

    Additional workload for data collectors.

    OCR has severe limitations when it comes to human handwriting.

    Characters must be hand-printed with

    Operation and Management

    OCR Process Stages

    o Document scanning process

    Scanning speed is determined by the quality of the scanners, the amount of non-drop-out color on the form, and the quality, cleanliness and weight of the paper.

    o Recognizing process

    The recognizing process interprets the images. The right memory (dictionary) and the configured threshold determine the accuracy of the ICR interpretation.

    o Verifying process

    To compare the value of the interpreted image with
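
    One way to picture how a configured threshold feeds the verifying stage is the sketch below. It assumes the recognition engine returns a confidence score between 0 and 1 for each field; the field names, the 0.90 threshold and the function route_fields are hypothetical illustrations, not settings from any specific system.

```python
def route_fields(results, threshold=0.90):
    """Split recognized fields into accepted values and fields needing manual verification.

    results maps a field name to (recognized value, confidence in [0, 1]).
    The threshold is an assumed, configurable cut-off.
    """
    accepted, to_verify = {}, []
    for field, (value, confidence) in results.items():
        if confidence >= threshold:
            accepted[field] = value
        else:
            to_verify.append(field)
    return accepted, to_verify

# Hypothetical recognition output for one questionnaire.
results = {"age": ("34", 0.98), "sex": ("M", 0.95), "district": ("07", 0.62)}
accepted, to_verify = route_fields(results)
print(accepted)   # {'age': '34', 'sex': 'M'}
print(to_verify)  # ['district']
```

    Raising the threshold sends more fields to operators and lowers the risk of silent misreads; lowering it does the opposite, so the value is a workload-versus-accuracy trade-off.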

    Operation and Management (cont.)

    Image Manipulation

    o Electronic questionnaires can be sent to specialist operators and then back to the original operator if necessary.

    o The same questionnaire can be worked on simultaneously by two or more people.

    o Electronic questionnaires are readily available for post-census analysis (easier access to questionnaires).

    o Parts of several questionnaires can be shown on screen at once for inter-record editing.

    o The relevant field book entry can be viewed on screen alongside the questionnaires, which is helpful for coding and

    Operation and Management (cont.)

    Coding Assistance

    o Problems are simpler for the operator to identify.

    o Images of questions that will not be captured (scanned but not recognized), e.g. light pencil marks, can be used to help the coding process.

    o The operator can magnify images to read characters not discernible to the naked eye.

    o Appropriate software ensures that the data is validated as the forms are read.

    o Checks ensure that selections on a form are filled in.

    o It is possible to distinguish between intended marks and marks that have been erased.
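
    The last two checks can be sketched as follows, assuming each answer option is reduced to a darkness value between 0 (clean paper) and 1 (fully filled). The two cut-off levels and the names classify_mark and validate_question are hypothetical; real software would calibrate them per form and scanner.

```python
def classify_mark(darkness, erased_level=0.15, marked_level=0.50):
    """Classify one answer box by darkness (0 = clean paper, 1 = fully filled)."""
    if darkness >= marked_level:
        return "marked"
    if darkness >= erased_level:
        return "erased"   # faint residue, treated as not intended
    return "blank"

def validate_question(option_darkness):
    """Return (selected option or None, problem description or None)."""
    marked = [opt for opt, d in option_darkness.items()
              if classify_mark(d) == "marked"]
    if len(marked) == 1:
        return marked[0], None
    if not marked:
        return None, "no selection"
    return None, "multiple selections"

print(validate_question({"A": 0.02, "B": 0.30, "C": 0.85}))  # ('C', None)
print(validate_question({"A": 0.02, "B": 0.03, "C": 0.04}))  # (None, 'no selection')
```

    In the first call, option B is dark enough to be residue from an eraser but not dark enough to count as an answer, so only C is accepted.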

    Operation and Management (cont.)

    OMR Scanner Speed

    Factors

    o Skew: Each document is moved from an automatic feeder into a scanner, and an angle of skew is sometimes introduced.

    o De-skew: Analyzes the image bitmap, then calculates and returns the angle of skew, up to +/-25 degrees. De-skew is often expressed as a percentage, which is the pixel shift: for example, 10% is a 20-pixel shift in a line of 200 pixels, or one tenth of an inch in an inch-long line.
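
    The percentage in the example can be reproduced with a short calculation. The conversion to an angle assumes that the pixel shift is measured perpendicular to the line, so the angle is the arctangent of shift over length; that geometric reading and the function names are assumptions made for illustration.

```python
import math

def skew_percent(pixel_shift, line_length_px):
    """Skew expressed as pixel shift per 100 pixels of line length."""
    return 100.0 * pixel_shift / line_length_px

def skew_angle_degrees(pixel_shift, line_length_px):
    """Equivalent skew angle, assuming the shift is perpendicular to the line."""
    return math.degrees(math.atan2(pixel_shift, line_length_px))

print(skew_percent(20, 200))                  # 10.0
print(round(skew_angle_degrees(20, 200), 2))  # 5.71
```

    Under this assumption a 10% skew corresponds to an angle of roughly 5.7 degrees.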

    Operation and Management (cont.)

    Landscape Detection and Auto Rotation:

    o Landscape detection automatically detects the affected images and rotates them 90 degrees.
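
    A minimal sketch of auto rotation, under a simplifying assumption: an image counts as landscape when it is wider than it is tall. Real scanner software may instead look at text orientation, and the clockwise direction chosen here is also an assumption. The function names are hypothetical.

```python
def rotate_90_clockwise(image):
    """Rotate a row-major image (list of rows) 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def auto_rotate(image):
    """Rotate the image only when it is wider than it is tall."""
    height = len(image)
    width = len(image[0]) if image else 0
    return rotate_90_clockwise(image) if width > height else image

landscape = [[1, 2, 3],
             [4, 5, 6]]
print(auto_rotate(landscape))  # [[4, 1], [5, 2], [6, 3]]
```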

    White Page Detection:

    o Normally, a double-sided scanner creates two images per scanned page.

    o However, if the back or front page is blank, there is no need to store this image.

    o White page detection allows the user to avoid storing blank pages.
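
    White page detection can be approximated by counting dark pixels, as in the sketch below. It assumes grayscale pages given as rows of values from 0 (black) to 255 (white); the darkness level, the 0.5% tolerance and the function names are hypothetical choices, not values from the slides.

```python
def is_blank_page(pixels, dark_level=128, max_dark_fraction=0.005):
    """Treat a page as blank when almost none of its pixels are dark.

    pixels: rows of grayscale values, 0 (black) to 255 (white).
    """
    total = sum(len(row) for row in pixels)
    if total == 0:
        return True
    dark = sum(1 for row in pixels for value in row if value < dark_level)
    return dark / total <= max_dark_fraction

def pages_to_store(front, back):
    """Keep only the non-blank sides of a duplex-scanned sheet."""
    return [page for page in (front, back) if not is_blank_page(page)]

blank = [[255] * 10 for _ in range(10)]
written = [[255] * 10 for _ in range(9)] + [[0] * 10]
print(len(pages_to_store(written, blank)))  # 1
```

    The small tolerance keeps specks of dust or scanner noise from forcing an otherwise empty side into storage.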

    OCR Field Operation (cont.)

    Reasons for OCR Reading Errors

    Forms in bad condition (dirty, folded, crumpled, etc.); forms fed into the OCR scanner at an angle rather than straight; incompletely filled forms.

    Reducing OCR Reading Errors

    Check the questionnaires for completeness and consistency; prepare your own memory (dictionary); define permissible margins for OCR reading errors.

    Particular Care in Writing Numbers or Letters

    One box contains only one character; characters should not extend outside their designated boxes.

    Unnecessary additions to characters such as points, decorative strokes, hooks, etc. are prohibited. Strokes should not end with flourishes or extensions.

    All lines should be connected without breaks; all lines and dots should be written with the same pressure.

    Value Checking Steps: Verify that the information captured by OMR matches the questionnaire.

    Optical Character Recognition (OCR) tool used in pharmaceutical, food and beverage, and other packaging inspection applications to read and verify printed text.

    Mark Inspection for IC packages and discrete components.

    5/3/12 Workshop on

    THANK YOU!