CSA
Scanning Technology and
Its Application in Ethiopia
Yakob Mudesir, Deputy Director General
Central Statistical Agency of Ethiopia
E-mail: [email protected]
Web: www.csa.gov.et
October 2009, Kampala, Uganda
Outline:
• Background
• Scanning Technology
• Requirements for Scanning
• The Ethiopian Experience
Background
The Population and Housing Census is the largest data-capture exercise that a country can undertake.
It involves capturing millions of forms.
The Central Statistical Agency (CSA) started using older techniques, such as the punched card reader, as early as the 1960s.
Two Population and Housing Censuses have so far been conducted in Ethiopia using the traditional method of data capture.
The first Population and Housing Census was carried out in 1984.
During the 1984 Census, data capture was done by manual keyboard entry on a mainframe computer, using the FORMSPEC data entry system.
It took more than 2 years to capture the data for about 42 million people.
For the 1994 Census, data capture was again done by manual keyboard entry, this time on PCs, using the CENTRY data entry system (IMPS).
It took about 18 months to capture the data for the population of about 53 million.
The entry work was done in two shifts.
About 180 data entry clerks were involved.
Around 90 PCs were used.
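The 1994 figures above imply a rough keying throughput. The sketch below only recomputes averages from the numbers on these slides (about 53 million people, about 18 months, about 180 clerks); the `capture_rates` helper and the 30-day month are assumptions, and the results are approximate averages, not measured rates.

```python
def capture_rates(population: float, months: float, clerks: int) -> dict:
    """Average capture throughput implied by a census data-entry effort.

    Assumes a uniform workload and a 30-day month (illustrative only).
    """
    days = months * 30
    return {
        "per_month": population / months,
        "per_day": population / days,
        "per_clerk": population / clerks,
        "per_clerk_per_day": population / clerks / days,
    }

# 1994 Census: about 53 million people, about 18 months, about 180 clerks
rates_1994 = capture_rates(53_000_000, 18, 180)
for name, value in rates_1994.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions each clerk keyed the records of roughly 545 persons per calendar day, which gives a sense of why manual entry of a full census took well over a year.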
Some Limitations of the Keyboard Manual Entry Method
Time consuming
Does not make data available in a timely manner
By the time they are released, the data may no longer represent the current situation
Subject to additional non-sampling errors
Human error due to manual keying
Because of the volume of data, 100% verification, as done for sample surveys, is difficult.
Involves a great deal of human resource management.
Requires a large number of data entry operators and a large amount of equipment.
The Need for an Alternative Solution
The need to have census results on time, together with the limitations discussed above, forced statistical offices to look for alternatives, especially for large volumes of data.
- The emergence of scanning technology
The Scanning Technology
Scanning technology in general implements two basic techniques:
Mark recognition, such as the Optical Mark Reader (OMR)
Character recognition, such as Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR)
OMR is the recognition of shaded marks (blobs) on the forms.
The position of these blobs on a form determines the alphanumeric characters they represent.
Character recognition is the recognition of alphanumeric characters on forms. It is of two types:
OCR, which is the recognition of machine-printed characters, and
ICR, which refers to the capture of hand-printed characters from a form.
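The mark-recognition idea can be sketched as follows: each column of bubbles encodes one digit, and the position of the shaded bubble gives its value. This is a minimal illustration, not the DRS implementation; the darkness scores (0 to 1), the 0.5 threshold, the reject rule for blank or multiple marks, and all function names are assumptions.

```python
from typing import List, Optional

MARK_THRESHOLD = 0.5  # assumed darkness cut-off for a "shaded" bubble

def decode_column(darkness: List[float], threshold: float = MARK_THRESHOLD) -> Optional[int]:
    """Return the digit whose bubble is shaded, or None to reject.

    darkness[i] is the measured darkness of the bubble for digit i (0-9).
    A column with no mark or with more than one mark is rejected so that
    an operator can key-correct it.
    """
    marked = [i for i, d in enumerate(darkness) if d >= threshold]
    return marked[0] if len(marked) == 1 else None

def decode_field(columns: List[List[float]]) -> Optional[str]:
    """Decode a multi-digit field (e.g. an age box) column by column."""
    digits = [decode_column(col) for col in columns]
    if any(d is None for d in digits):
        return None  # reject the whole field if any column is blank or ambiguous
    return "".join(str(d) for d in digits)

def one_hot(digit: int, dark: float = 0.9, light: float = 0.05) -> List[float]:
    """Hypothetical darkness readings for a column with one shaded bubble."""
    return [dark if i == digit else light for i in range(10)]

age = decode_field([one_hot(3), one_hot(5)])  # a two-digit age field shaded as 3, 5
```

Rejecting ambiguous columns rather than guessing is what feeds the key-correction step: the scanner flags the field and a person reads it from the form image.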
Major Benefits of the Scanning Technology
Significant decrease in the time required to capture the data
This helps to get timely data
Users’ needs are satisfied (policy makers, planners, researchers, etc.)
Reduces non-sampling errors
No need to worry about storing millions of forms for possible future reference
Scanning captures the whole content of a questionnaire in an electronic image format
The Process Involved in Scanning
[Figure: Scanning → Recognition & Extraction → Export]
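The three stages of the scanning process can be pictured as a small pipeline in which each scanned form's recognized values are extracted into a record and the records are exported as text. Everything here is hypothetical: the `ScannedForm` type, the field names, the "?" placeholder for rejected values and the comma-separated export layout are assumptions for illustration, and the image recognition itself is taken as already done.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass
class ScannedForm:
    """One scanned questionnaire: its batch (EA) code, form number and recognized values."""
    ea_code: str
    form_no: int
    values: Dict[str, Optional[str]]  # None marks a field the recognizer rejected

FIELDS = ["sex", "age"]  # hypothetical field list

def extract(form: ScannedForm) -> Dict[str, str]:
    """Recognition & extraction step: flatten one form into an output record."""
    record = {"ea_code": form.ea_code, "form_no": str(form.form_no)}
    for field in FIELDS:
        value = form.values.get(field)
        record[field] = value if value is not None else "?"  # "?" flags key-correction
    return record

def export(forms: Iterable[ScannedForm]) -> str:
    """Export step: write the extracted records as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ea_code", "form_no", *FIELDS],
                            lineterminator="\n")
    writer.writeheader()
    for form in forms:
        writer.writerow(extract(form))
    return buffer.getvalue()

text = export([
    ScannedForm("0101", 1, {"sex": "1", "age": "35"}),
    ScannedForm("0101", 2, {"sex": "2", "age": None}),
])
```

Keeping the EA code on every exported record is what lets later editing and tabulation trace each value back to its box and form.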
Requirements for Effective Scanning
- Proper planning:
  - On the DESIGN of the questionnaire
  - On PRINTING the questionnaire
  - On RECORDING answers
  - On questionnaire HANDLING
  - On securing ADEQUATE SPACE for questionnaire movement
Why?
To minimize the rejection rate and increase the recognition rate
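Both targets in the "Why?" above can be tracked per batch with simple counts. The definitions below are assumptions for illustration (rejection rate per form fed, recognition rate per field read), and the `BatchStats` class and the example counts are hypothetical; a given scanning package may define these rates differently.

```python
from dataclasses import dataclass

@dataclass
class BatchStats:
    forms_fed: int          # forms fed through the scanner
    forms_rejected: int     # forms the scanner could not read (re-feed or key in)
    fields_read: int        # fields on the accepted forms
    fields_recognized: int  # fields decoded without operator correction

    def rejection_rate(self) -> float:
        """Share of fed forms that were rejected."""
        return self.forms_rejected / self.forms_fed if self.forms_fed else 0.0

    def recognition_rate(self) -> float:
        """Share of fields decoded automatically."""
        return self.fields_recognized / self.fields_read if self.fields_read else 0.0

# Hypothetical counts for one batch
stats = BatchStats(forms_fed=2000, forms_rejected=30,
                   fields_read=40000, fields_recognized=39200)
```

Good questionnaire design, printing and handling act on the first number (fewer rejected sheets); clear recording of answers acts on the second (fewer fields needing key-correction).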
Proper training
On both hardware and software
This helps to “own” the technology:
being able to use the technology after the departure of the trainers / technical advisors
A well-organized space for the flow of forms and data is required
- A Reliable Network System
- A WELL STRUCTURED SPACE FOR FILE FLOW
[Figure: floor plan of the Data Processing Center and Warehouse, with numbered steps 1 to 8 tracing the questionnaires through Receiving (receiving the questionnaires and registering the EAs received from the field), registering the EAs for scanning, the Waiting Room, Scanning Room, Validation Room, Processing Room and Store]
Proper File Management
File recorder: records the ID of the outgoing questionnaires.
Filing box handler: checks that the village code on the box is the same as on the questionnaires.
Machine room box handler: checks the number and orientation of the questionnaires, and any damage to them.
Guillotine machine operator: cuts the binding edge off all questionnaires in the village in one operation and places them on the scanner.
Scanner technician: ensures smooth running of the machine and assists with paper handling.
Scanner paper handler: responsible for checking paper throughput through the scanner.
Computer operators (x2): responsible for detecting errors in the scanning process.
Stitching machine operator: lifts the paper off the scanner and stitches it in the bottom left corner.
Machine room box handler: returns the box to the file recorder.
Filing box handler: returns the questionnaires to the shelves.
File recorder: records the ID numbers of the incoming questionnaires.
Supervision: Principal Supervisor, Senior Supervisor, Scanner Supervisor
Proper file management and care:
- Checking batch (EA) IDs and the orientation of forms
- Properly recording incoming and outgoing questionnaires
- Ensuring the EA code on each box is the same as the one on the questionnaires
- Paying close attention to detecting errors in the scanning process
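The file-management checks listed above (matching the EA code on each box against its questionnaires, and recording questionnaires going out and coming back) can be sketched as a simple register. The `mismatched_forms` function, the `BoxRegister` class and the ID formats are hypothetical; in the workflow described here the recording is done by the file recorder, not by software.

```python
from typing import Dict, List, Set

def mismatched_forms(box_ea_code: str, form_ea_codes: List[str]) -> List[int]:
    """Return positions of questionnaires whose EA code differs from the box label."""
    return [i for i, code in enumerate(form_ea_codes) if code != box_ea_code]

class BoxRegister:
    """Tracks questionnaire IDs leaving the store and coming back (hypothetical)."""

    def __init__(self) -> None:
        self.outgoing: Dict[str, Set[str]] = {}

    def check_out(self, ea_code: str, form_ids: List[str]) -> None:
        """Record the IDs of the questionnaires sent out in an EA box."""
        self.outgoing[ea_code] = set(form_ids)

    def check_in(self, ea_code: str, form_ids: List[str]) -> Dict[str, Set[str]]:
        """Compare returned IDs with the IDs recorded when the box went out."""
        sent = self.outgoing.pop(ea_code, set())
        returned = set(form_ids)
        return {"missing": sent - returned, "unexpected": returned - sent}

register = BoxRegister()
register.check_out("0101", ["001", "002", "003"])
result = register.check_in("0101", ["001", "003", "004"])
```

Any non-empty "missing" or "unexpected" set is a box to hold back and investigate before it returns to the shelves.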
The Ethiopian Experience
Experience Sharing
A study tour was made to two African countries.
Tanzania
To learn from their successes
Data capture for the 2002 Census of Tanzania was done in about 26 days.
General report tables were produced within 3 months of the start of scanning.
Ghana
To learn from their difficulties
Data capture for the 2000 Census took about 6 months (forms from 29,000 EAs).
3 scanners were used (Kodak, Fujitsu).
> The larger scanner was a Kodak 500D.
> Speed: about 500 forms/min
Power failure was one of the major problems.
> Some data were lost as a result.
> A large generator was installed to minimize the effect of the frequent power cuts.
Identification of the Technology
For scanning the 2007 Census, the Optical Mark Reader (OMR) technique was used.
Scanning technology to be used:
PhotoScribe Series PS900 scanners
A DRS Scanning Technology product
DRS PhotoScribe Series PS900
High-speed imaging mark reader
Windows XP Professional
Network connectivity
CD-R/RW drive
TFT monitor, keyboard, mouse
Speed: up to 8,500 forms/hour
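The rated speed of the PS900 supports a rough scanning-time estimate. The sketch takes the stated maximum of 8,500 forms/hour; the form count, number of scanners, hours per day (two shifts) and the efficiency factor are hypothetical inputs, since real throughput falls below the rated peak because of paper handling, rejects and batch changes. The `scanning_days` helper is illustrative only.

```python
import math

def scanning_days(total_forms: int, scanners: int, rated_per_hour: float = 8_500,
                  hours_per_day: float = 16, efficiency: float = 0.5) -> int:
    """Estimate whole working days needed to scan all forms.

    hours_per_day assumes two 8-hour shifts; efficiency scales the rated
    peak speed down to a realistic average (an assumed value, not a measured one).
    """
    effective_per_day = scanners * rated_per_hour * efficiency * hours_per_day
    return math.ceil(total_forms / effective_per_day)

# Hypothetical: one million forms on two scanners
days_needed = scanning_days(1_000_000, scanners=2)
```

With these assumptions two scanners handle about 136,000 forms a day, so one million forms need 8 working days; doubling the scanners or the efficiency roughly halves that.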
Design and Printing of Forms
Types of 2007 Census forms:
Short questionnaires
Long questionnaires
Household listing forms
Summary forms
Community-level forms
Batch header form: scanned to create the EA database
[Figure: Long Questionnaire]
[Figures: Control Database Form; Summary Form]
Pilot
Data from the Pilot Census were successfully scanned (OMR), key-corrected, exported to text format, tabulated and tested.
One scanner (PS900 PhotoScribe) was used to capture the pilot data.
Technical experts from DRS assisted in capturing, validating and exporting the pilot data.
Training
Hardware and software training:
16 professionals trained
The training took about 7 working days in total.
SOSKITW for Windows, a DRS software package for scanning, was used.
Components of the SOSKITW software:
SOSGen: used to generate scanning decodes for completed OMR forms (how marks on forms are interpreted and stored)
SOSInp: used to scan, validate and export scanned data
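The role of SOSGen, generating "decodes" that say how marks on a form are interpreted and stored, can be illustrated with a hypothetical layout that maps each field to a range of bubble columns and a set of allowed values. This is not the SOSGen file format; the `DECODE` table, field names, codes and `apply_decode` function are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical decode: field -> (first column, number of columns, allowed values or None)
DECODE: Dict[str, Tuple[int, int, Optional[Set[str]]]] = {
    "sex": (0, 1, {"1", "2"}),
    "age": (1, 2, None),  # any two-digit value
}

def apply_decode(marked_rows: List[Optional[int]]) -> Dict[str, Optional[str]]:
    """Turn per-column marked row positions into stored field values.

    marked_rows[c] is the row (0-9) shaded in column c, or None if the column
    was blank or ambiguous. Incomplete or invalid fields are stored as None.
    """
    record: Dict[str, Optional[str]] = {}
    for field, (start, width, allowed) in DECODE.items():
        rows = marked_rows[start:start + width]
        if len(rows) < width or any(r is None for r in rows):
            record[field] = None
            continue
        value = "".join(str(r) for r in rows)
        record[field] = value if allowed is None or value in allowed else None
    return record

record = apply_decode([2, 3, 5])  # sex marked in row 2, age marked 3 then 5
```

Separating the layout (which columns belong to which field) from the decoding loop mirrors the split described above: the decode is prepared once per form design, then applied to every scanned sheet.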
Preparatory Activities
Equipment purchased and installed:
10 additional PS900 iM2 DRS scanners
16 high-capacity PCs for key-correction
Census data processing work plan prepared, covering:
Recruitment of temporary staff
Staff training (scanning technology, CSPro)
Retrieval and organization of completed forms
Scanning and validation
Computer editing and tabulation
(For each activity, the duration and the responsible body are indicated.)
About 33 teams were organized for registering and organizing forms, with 3 persons assigned per team.
Census data processing teams organized:
Batch header database group
Scanning and validation team
Shift supervisors
Two senior programmers responsible for the overall scanning process
Other sub-professional staff assigned:
4 batch header scanning technicians
16 data validation workers
- The scanning room was organized
- An air conditioner was installed in the scanning room
- A high-capacity automatic generator was installed to ensure uninterrupted power supply
The Scanning Process
Organized forms are taken from the store to the waiting room.
Batch header information is printed and associated with its respective EA box.
The existence of each EA is verified.
Checked EAs are sent to the scanning room.
Scanned forms are finally sent back to the store.
Scanned data are validated / key-corrected.
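The last step of the scanning process, validation and key-correction, amounts to routing every rejected value to an operator and writing back what the operator keys from the form image. A minimal sketch, assuming rejected values are stored as None; `correction_queue`, `key_correct` and the operator callback are hypothetical names, with the callback standing in for a person reading the scanned image.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

def correction_queue(records: List[Record]) -> List[Tuple[int, str]]:
    """List (record index, field) pairs that need an operator's attention."""
    return [(i, field) for i, rec in enumerate(records)
            for field, value in rec.items() if value is None]

def key_correct(records: List[Record], operator: Callable[[int, str], str]) -> List[Record]:
    """Return corrected copies; the input records are left unchanged."""
    corrected = [dict(rec) for rec in records]
    for i, field in correction_queue(records):
        corrected[i][field] = operator(i, field)
    return corrected

records = [{"sex": "1", "age": "35"}, {"sex": None, "age": "07"}]
fixed = key_correct(records, lambda i, field: "2")  # operator keys "2" from the image
```

Only the flagged fields reach the operator, which is why good mark quality translates directly into fewer validation workers.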
The actual scanning started in mid-July and was completed in November 2007.
Data Cleaning / Computer Editing
Scanned, key-corrected and exported data:
A batch edit program, based on edit specifications provided by subject-matter specialists, was developed and run on the data.
The software to be used in editing the data will be the Census and Survey Processing System (CSPro).
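A batch edit program applies consistency rules from the edit specifications to every record and reports the violations. CSPro expresses such rules in its own logic language; the Python sketch below only illustrates the idea, and the rules shown (valid sex codes, an age range, a minimum age for the head of household, relationship code "01") are hypothetical examples, not the 2007 Census edit specifications.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

def _age_ok(r: Record, low: int = 0, high: int = 98) -> bool:
    """True if the age field is a number within [low, high]."""
    age = r.get("age", "")
    return age.isdigit() and low <= int(age) <= high

# Hypothetical edit rules: (message, predicate that returns True when the record passes)
EDIT_RULES: List[Tuple[str, Callable[[Record], bool]]] = [
    ("sex must be 1 or 2", lambda r: r.get("sex") in {"1", "2"}),
    ("age must be 0-98", lambda r: _age_ok(r)),
    ("head of household must be at least 12",
     lambda r: r.get("relationship") != "01" or _age_ok(r, low=12)),
]

def run_batch_edit(records: List[Record]) -> List[Tuple[int, str]]:
    """Return (record index, failed rule) for every rule violation in the batch."""
    return [(i, message) for i, rec in enumerate(records)
            for message, passes in EDIT_RULES if not passes(rec)]

batch = [
    {"sex": "1", "age": "35", "relationship": "01"},
    {"sex": "3", "age": "08", "relationship": "01"},
]
errors = run_batch_edit(batch)
```

Keeping the rules as data, separate from the loop that runs them, lets subject-matter specialists revise the edit specifications without touching the batch driver.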
Future Plan
Owning the technology:
Two professionals have been trained in England for two weeks on scanned document processing using DocXP, in close collaboration with DRS.
Printing the questionnaire locally
Questionnaire design
Our professionals are working hard to process the upcoming Welfare Monitoring Survey using scanning technology.
THANK YOU