AOCR: Arabic Optical Character Recognition
ABDEL RAHMAN GHAREEB KASEM
ADEL SALAH ABU SEREEA
MAHMOUD ABDEL MONEIM ABDEL MONEIM
MAHMOUD MOHAMMED ABDEL WAHAB
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Introduction
Why AOCR? What is OCR? What is the problem in AOCR? What is the solution?
Pre-segmentation vs. auto-segmentation.
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Preprocessing
1. Image rotation
2. Segmentation: line segmentation, word segmentation
3. Image enhancement
Preprocessing: 1. Image rotation
The problem: a tilted input image.
Preprocessing: 1. Process rotated image
[Slides: the same page after rotating by -1°, -2°, -3°, and -4°.]
Clear the zeros; apply a threshold of 0.2 × the mean value. [Slide: the threshold effect.]
Gray scale vs. black/white in the rotation process. [Slides: the original image, its gray-scale version, and its black/white version.]
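As an illustration, here is a minimal projection-profile deskew sketch in Python with NumPy/SciPy. The slides only show the 1-degree rotation steps, so scoring candidate angles by the variance of the horizontal projection is an assumption about the exact criterion; `img` is assumed to be a binary page with text pixels = 1.

    import numpy as np
    from scipy.ndimage import rotate

    def deskew(img, angles=np.arange(-4.0, 4.5, 0.5)):
        # Horizontal text lines give a horizontal projection with sharp
        # peaks (text rows) and valleys (blank rows), i.e. high variance.
        def score(angle):
            return np.var(rotate(img, angle, reshape=False, order=0).sum(axis=1))
        best = max(angles, key=score)
        return rotate(img, best, reshape=False, order=0), best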
Preprocessing
1. Process rotated image
2. Segmentation: line segmentation, word segmentation
3. Image enhancement
Preprocessing
2. Segmentation.
What is the Segmentation process?
Why do we need segmentation in Arabic OCR?
What algorithm is used for segmentation?
Line-level segmentation. [Slides: a page split into its lines.]
Word-level segmentation. [Slide: a line split into its words.]
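A minimal sketch of projection-based line segmentation (the deck does not spell out its algorithm, so blank-row splitting is an assumption); `img` is a deskewed binary page with text = 1. Word segmentation is the same idea applied to the vertical projection of each line image.

    import numpy as np

    def segment_lines(img):
        # Blank rows (no ink) separate consecutive text lines.
        ink = img.sum(axis=1) > 0
        lines, start = [], None
        for y, has_ink in enumerate(ink):
            if has_ink and start is None:
                start = y                      # a line begins
            elif not has_ink and start is not None:
                lines.append(img[start:y])     # the line ends
                start = None
        if start is not None:
            lines.append(img[start:])
        return lines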
Preprocessing
1. Process rotated image
2. Segmentation: line segmentation, word segmentation
3. Image enhancement
Preprocessing
3. Image enhancement
Noise reduction by morphology operations.
Very important note: apply the image-enhancement operations on the small (segmented) images, not on the large page image.
[Figure: the same sample Arabic text shown as one large page image (marked ✗) and as several small images, illustrating the note above.]
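A minimal sketch of the noise-reduction step. The deck names morphology operations without specifying which, so binary opening followed by closing is an assumption; per the note above, it is applied to the small segmented images.

    import numpy as np
    from scipy.ndimage import binary_opening, binary_closing

    def enhance(small_img):
        # Opening removes isolated noise specks; closing fills small
        # holes and gaps inside the character strokes.
        cleaned = binary_opening(small_img, structure=np.ones((2, 2)))
        return binary_closing(cleaned, structure=np.ones((2, 2)))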
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Feature Extraction
[Figure: a sample Arabic word image.]
• Feature Selection
We select features that are:
- suitable for the HMM technique (i.e., window-scanning-based features);
- suitable for word-level recognition (not character-level);
- able to retain as much information as possible;
- able to achieve high accuracy with small processing time.
Each feature is designed to work with the slice technique.
[Figure: the phrase محمد رسول الله divided into slices n1…n7, whose features form the feature vector.]
The features deal with whole words, not single characters, since the algorithm is based on a segmentation-free concept.
We avoid structural features: they are hard to implement and require large processing time.
To achieve high accuracy with the lowest processing time, we use simple features and apply an overlap between slices to smooth the extracted data.
[Figure: overlapping slices over the word الصلاة.]
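A minimal sketch of the slice (sliding-window) technique with overlap, assuming a binary word/line image with text = 1; `width` and `overlap` are in pixels, e.g. two-pixel slices with a one-pixel overlap as in most of the examples below.

    def slices(img, width=2, overlap=1):
        # Yield vertical slices; consecutive slices share `overlap`
        # columns, which smooths the resulting feature sequence.
        step = width - overlap
        for x in range(0, img.shape[1] - width + 1, step):
            yield img[:, x:x + width]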
(1) Background Count
Calculate vertical distances (in terms of pixels) of background regions, where each background region is bounded by two foreground regions.
[Example figure: the word النجاح with foreground and background regions marked; the vertical distances d1, d2, d3 form the feature vector of the selected slice. Slices: two pixels wide with one-pixel overlap.]
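A minimal per-column sketch of this feature (the slide does not specify how columns are aggregated within a slice, so treating each column independently is an assumption):

    import numpy as np

    def background_runs(col):
        # Lengths of the background gaps bounded above and below by
        # foreground pixels in one column (d1, d2, ... in the figure).
        ys = np.flatnonzero(col)   # rows containing ink
        gaps = np.diff(ys) - 1     # background pixels between them
        return gaps[gaps > 0]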
(2) Baseline Count
Calculate the number of black pixels above the baseline (as a +ve value) and the number of black pixels below the baseline (as a -ve value) in each slice.
[Example figure: after thinning, X1 = black pixels above the baseline, X2 = black pixels below it; feature vector = (X1, X2). Slices: two pixels wide with one-pixel overlap.]
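A minimal sketch, assuming the baseline row index of the line image is known and the slice has already been thinned:

    import numpy as np

    def baseline_count(slice_, baseline):
        # X1: black pixels above the baseline (positive value);
        # X2: black pixels below the baseline (negative value).
        x1 = int(slice_[:baseline].sum())
        x2 = -int(slice_[baseline:].sum())
        return x1, x2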
(3) Centroid
For each slice we compute its centroid (cx, cy), so the feature vector is a sequence of centroids.
[Example figure: feature vector = (Cx, Cy). Slices: two pixels wide with one-pixel overlap.]
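A minimal sketch (assumes the slice contains at least one foreground pixel):

    import numpy as np

    def centroid(slice_):
        # Mean foreground coordinates (cx, cy) of the slice.
        ys, xs = np.nonzero(slice_)
        return xs.mean(), ys.mean()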
(4) Cross Count
For each slice we count the number of crossings from background (white) to foreground (black).
[Example figure: the sample slice has 2 crossings, so the feature value is 2. Slices: two pixels wide with one-pixel overlap.]
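A minimal sketch, counting white-to-black transitions down every column of the slice:

    import numpy as np

    def cross_count(slice_):
        # A crossing is a background pixel directly above a
        # foreground pixel, scanning down each column.
        return int(((slice_[:-1] == 0) & (slice_[1:] == 1)).sum())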
(5) Euclidean Distance
We compute the average foreground pixel in the regions above and below the baseline; the Euclidean distance is then measured from the baseline to each average point, with a +ve value for the point above and a -ve value for the point below.
[Example figure: after thinning, D1 = Euclidean distance above the baseline, D2 = Euclidean distance below it; feature vector = (D1, D2). Slices: one pixel wide without overlap.]
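A minimal sketch. The slide does not say which point on the baseline the distance is measured from, so using the horizontal centre of the slice as the baseline reference point is an assumption:

    import numpy as np

    def euclid_feature(slice_, baseline):
        ys, xs = np.nonzero(slice_)
        mid = slice_.shape[1] / 2.0        # assumed baseline reference point
        def signed_dist(mask, sign):
            if not mask.any():
                return 0.0
            cy, cx = ys[mask].mean(), xs[mask].mean()
            return sign * float(np.hypot(cx - mid, cy - baseline))
        d1 = signed_dist(ys < baseline, +1.0)  # average point above: +ve
        d2 = signed_dist(ys > baseline, -1.0)  # average point below: -ve
        return d1, d2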
(6) Horizontal Histogram
For each slice we compute its horizontal histogram (the sum of each row in the slice).
[Example figure: histogram calculation. Slices: four pixels wide with one-pixel overlap.]
(7) Vertical Histogram
For each slice we compute its vertical histogram (the sum of each column in the slice).
[Example figure: feature vector = (X1, X2). Slices: two pixels wide with one-pixel overlap.]
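A minimal sketch; the horizontal histogram of feature (6) is the same computation along axis 1:

    import numpy as np

    def vertical_histogram(slice_):
        # One sum per column of the binary slice.
        return slice_.sum(axis=0)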
(8) Weighted Vertical Histogram
Exactly like the previous feature, except that each row of the image is multiplied by a weight; the weight vector applied to the whole image has a triangular shape.
[Example figure: triangular weight vector with values from +1 to -1; feature vector = (X1, X2). Slices: two pixels wide with one-pixel overlap.]
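A minimal sketch. The figure only shows a triangle labelled +1 and -1, so weights that peak at +1 on the middle row and fall linearly to -1 at the top and bottom rows are an assumption about its exact shape:

    import numpy as np

    def weighted_vertical_histogram(slice_):
        h = slice_.shape[0]
        # Assumed triangular weight vector: +1 at the centre row,
        # -1 at the first and last rows.
        w = 1.0 - 2.0 * np.abs(np.linspace(-1.0, 1.0, h))
        return (slice_ * w[:, None]).sum(axis=0)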
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Implementation of HMM-Based AOCR Using HTK
Data preparation
Creating Monophone HMMs
Recognizer Evaluation
Data preparation
The Task Grammar
The Dictionary
Recording the Data
Creating the Transcription Files
Coding the Data
The Task Grammar
Isolated AOCR grammar -----> mini project
Connected AOCR grammar -----> final project
Isolated AOCR Grammar
$name = a1 | a2 | a3 | a4 | a5 | … | a28 | a29;
( SENT-START <$name> SENT-END )
a1 ---> ا, a2 ---> ب, a3 ---> ت, a4 ---> ث, …, a29 ---> space
Connected AOCR Grammar
$name = a1 | a2 | a3 | a4 | a5 | … | a124 | a125;
( SENT-START <$name> SENT-END )
a1 ---> ا, a2 ---> ـا, a11 ---> ــبــ, a23 ---> ـجـ, a124 ---> لله, a125 ---> ـــــــ
Why grammar?
[Figure: word network — Start connects to the models a1, a2, a3, …, a124, a125, which connect to End.]
How is it created?
HParse creates it: Grammar --(HParse)--> word net (wdnet).
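The HTKBook-style command that compiles the grammar into the word net; the file names gram and wdnet follow HTKBook convention and are assumptions here:

    HParse gram wdnet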
The Dictionary
Our dictionary is limited???
Recording the Data
[Diagram: the feature-extraction transformer converts the 2-D signal (image) into a 1-D vector stored as a .wav file.]
Creating the Transcription Files
Word level MLF
Phone level MLF
Word level MLF
#!MLF!#
"*/1.lab"
فصل
.
"*/2.lab"
في الفرق بين الخالق والمخلوق
.
"*/3.lab"
وما ابراهيم وآل ابراهيم الحنفاء والنبياء فهم
.
"*/4.lab"
يعلمون انه لا بد من الفرق بين الخالق والمخلوق
.
Phone level MLF
#!MLF!#
"*/1.lab"
a74
a51
a88
.
"*/2.lab"
a74
a108
a123
a1
a86
a75
a38
a77
a123
…
.
Coding the Data
[Diagram: HCopy converts the waveform files (S0001.wav, S0002.wav, S0003.wav, …) into MFCC files (S0001.mfc, S0002.mfc, S0003.mfc, …), driven by a configuration file and a script file.]
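A typical HTKBook-style coding command (the config and script-file names are assumptions); each line of the script file names a source waveform and its target MFC file:

    HCopy -T 1 -C config -S codetr.scp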
Creating Monophone HMMs
Creating Flat Start Monophones
Re-estimation
Creating Monophone HMMs
The first step in HMM training is to define a prototype model. The parameters of this model are not important; its purpose is only to define the model topology.
The Prototype
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
0.0 0.0 0.0 . . .
<Variance> 39
1.0 1.0 1.0 . . .
<State> 3
<Mean> 39
0.0 0.0 0.0 . . .
<Variance> 39
1.0 1.0 1.0 . . .
<State> 4
<Mean> 39
0.0 0.0 0.0 . . .
<Variance> 39
1.0 1.0 1.0 . . .
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
Initialization Process
[Diagram: HCompV reads the prototype (proto) and the training data and writes the initialized proto and vFloors into the hmm0 directory.]

Initialized prototype
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
-5.029420e+000 1.948325e+000 -5.192460e+000 . . .
<Variance> 39
1.568812e+001 1.038746e+001 2.110239e+001 . . .
<State> 3
<Mean> 39
-5.029420e+000 1.948325e+000 -5.192460e+000 . . .
<Variance> 39
1.568812e+001 1.038746e+001 2.110239e+001 . . .
<State> 4
<Mean> 39
-5.029420e+000 1.948325e+000 -5.192460e+000 . . .
<Variance> 39
1.568812e+001 1.038746e+001 2.110239e+001 . . .
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
vFloors Contents
~v varFloor1
<Variance> 39
1.568812e-001 1.038746e-001 2.110239e-001 . . .
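The matching HTKBook-style command (file and directory names are assumptions). Note that -f 0.01 floors each variance at 1% of the global variance, which is exactly the relation between the initialized prototype's variances above and the vFloors values:

    HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto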
Creating initialized Models (hmmdefs)
[Diagram: the initialized prototype is copied once for every model a1, a2, …, a125 to build the hmmdefs file.]

Creating Macros File
[Diagram: the macros file holds the global options (~o <VecSize> 39 <MFCC_0_D_A>) followed by the contents of the vFloors file.]
Re-estimation Process
[Diagram: HERest takes the current hmmdefs and macros (initially produced by HCompV from the prototype), the training MFC files, the phone-level transcriptions, and the monophones list, and outputs re-estimated hmmdefs and macros.]
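A typical HTKBook-style re-estimation command (file and directory names are assumptions); -t sets the pruning thresholds, and each HERest pass reads the models from one directory (here hmm0) and writes the re-estimated models to the next (hmm1):

    HERest -C config -I phones.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones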
Recognition Process
[Diagram: HVite takes the trained models, the test files, the word network (wdnet), and the dictionary (dict), and outputs the recognized words.]
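A typical HTKBook-style recognition command (file names are assumptions; hmmN stands for the directory holding the final trained models):

    HVite -H hmmN/macros -H hmmN/hmmdefs -S test.scp -i recout.mlf -w wdnet dict monophones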
Recognizer Evaluation
[Diagram: HResults compares the reference transcription with the recognized transcription and reports the accuracy.]
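A typical HTKBook-style evaluation command (file names are assumptions):

    HResults -I testref.mlf monophones recout.mlf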
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Experimental Results
1 - Main Problem
1-1 Requirements:
Connected character recognition.
Multi-sizes.
Multi-fonts.
Handwritten.
1-2 Variables:
Tool.
Method used to train and test.
Model parameters.
Feature parameters.
Tool:
How can it operate with images?
Discrete — input images: failed.
Continuous — input a continuous waveform: succeeded.
[Slide: data input to HTK.]
2 - Isolated Character Recognition
2-1 Single size (16), single font (Simplified Arabic Fixed).
2-2 Multi-sizes character recognition.
2-3 Variable-lengths character recognition.

2-1 Single Size (16), Single Font (Simplified Arabic Fixed)
Best method.
Best number of states.
Best window size.
Best method:
A model for each character (35 models) vs. a model for each character in each position (116 models).
(Vertical histogram, 11 states, window = 2.5)

No. of Models | Accuracy
35  | 99.14%
116 | 100%
Best number of states:
(Vertical histogram, number of models = 35, window = 2 pixels)

No. of States | Accuracy
3  | 96.55%
11 | 99.14%
Best window size:
(2-D histogram, number of models = 124, 11 states)
[Chart: accuracy (y-axis, 97.00%-100.50%) vs. window size (1.2, 1.5, 1.7).]
2-2 Multi-Sizes Character Recognition
Sizes (12-14-16):
(2-D histogram, number of models = 124, 11 states)
[Chart: accuracy (y-axis, 85.00%-105.00%) vs. window size (1.6, 1.84, 2).]
2-3 Variable-Lengths Character Recognition
Train with different lengths:
The vertical histogram gives higher accuracy than the 2-D histogram.
(Vertical histogram, number of models = 35, window = 2 pixels)
[Chart: accuracy (y-axis, 0%-120%) vs. window size (1.66, 2.33, 2.8, 3.7, 4.6).]
Make a model for the dash. Training:
- Train with characters (without the dash) plus the dash model.
- Train with different lengths plus the dash model.
- Train with different lengths plus the dash model, and if a character has a dash at its end, define it as the character model followed by the dash model (the correct way).
Make a model for the dash. Testing:
• Vertical histogram: failed to recognize the dash model with all methods (it is recognized as a space).
• 2-D histogram: for window size = 2.6, accuracy = 100%.
3 - Connected Character Recognition
3-1 Single size (16), single font (Simplified Arabic Fixed).
3-2 Parameter optimization.
3-3 Multi-sizes character recognition.
3-4 Fusion by feature concatenation.

3-1 Single Size (16), Single Font (Simplified Arabic Fixed)
Best method (from a simple experiment with 10 words):
The correct way to do word recognition is to train the character models on whole words or lines.
Assumptions: training data: 25 pages (495 lines); Simplified Arabic Fixed (font size = 16); images: 300 dpi, black and white; testing data: 4 pages (74 lines); feature properties: window = 2 × frame.
Vertical histogram:
[Chart: accuracy (y-axis, 88.00%-95.00%) vs. window size (6, 6.5, 7, 7.5, 8).]
2-D histogram:
[Chart: accuracy (y-axis, 89.00%-97.00%) vs. window size (4.99, 5.33, 5.89).]
3-2 Parameter Optimization
Line level vs. word level.
Optimum number of mixtures.
Optimum number of states.
Optimum initial transition probabilities.
Optimum window vs. frame ratio.
• Line Level vs. Word Level
Assumptions: Simplified Arabic Fixed (font size = 16); testing data: same as training data; feature type: vertical histogram, window = 2 × frame; images: 300 dpi, black and white.

Level | Accuracy
Line level | 84.99%
Word level | 85.36%
Conclusion:
We will concentrate on line segmentation instead of word segmentation because:
- Word segmentation has disadvantages: the window size is limited by the small image size, and accuracy decreases as the number of mixtures increases.
- Line segmentation is simpler than word segmentation in preprocessing.
• Optimum number of mixtures
One-dimension features: training data: 495 lines; testing data: same as training data; feature type: vertical histogram, window = 2 × frame, window size = 6.5 pixels.
[Chart: accuracy (y-axis, 70.00%-100.00%) vs. number of mixtures (1, 3, 5, 10, 15).]
Two-dimension features: training data: 495 lines; testing data: same as training data; feature type: 2-D histogram, window = 2 × frame, window size = 5.33 pixels, N = 4.
[Chart: accuracy (y-axis, 86.00%-98.00%) vs. number of mixtures (1, 5, 7, 10).]
• Optimum number of states
One-dimension features:
[Chart: accuracy (y-axis, 80.00%-100.00%) vs. number of states (6, 8, 11, 13).]
Two-dimension features (assumptions as previous). Results:

Number of States | Accuracy
8  | 92.52%
11 | 95.02%
• Optimum initial transition probabilities
- Almost equally likely probabilities: failed.
- Random probabilities: very bad.
- Each state may stay in itself or go to the next state only, with the self-transition probability higher than the forward-transition probability: succeeded.

0 1.0 0   0   0
0 0.7 0.3 0   0
0 0   0.6 0.4 0
… and so on.
• Optimum window vs. frame ratio
Assumptions: as previous (2-D feature). Results:

Overlapping Ratio | Accuracy
0.4 | 91.70%
0.5 | 93.92%
0.6 | 92.52%
Maximum accuracy for all features:

Feature Type | Max. Accuracy
Vertical histogram | 96.97%
2-D histogram | 95.96%
Weighted histogram | 95.75%
Background count | 91.61%
Cross count | 91.51%
Baseline count | 89.70%
Euclidean distance | 87.16%
3-3 Multi-Sizes Character Recognition
Resizing the test data only:
Training data: Simplified Arabic Fixed, font size = 16.
Testing data: Simplified Arabic Fixed, font size = 12-16-18 (after resize), 60 lines.
Feature type: vertical histogram.

Font size | Accuracy
14 | 79.74%
16 | 96.97%
18 | 76.21%
Resizing the training and test data:
Training data: Simplified Arabic Fixed, font size = 14-16-18 (after resize), (324 × 3) lines.
Testing data: (324 × 3) lines, same as training.
Feature type: vertical histogram.
Accuracy = 92.15%
3-4 Feature Concatenation
Concatenating the vertical histogram with the 2-D histogram.

Scale (vertical histogram) | Window size | Accuracy
4 | 4.2 | 69.02%
4 | 5.57 | 77.17%
No scale | 5 | 84.09%
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Future works
Improving the printed-text system:
- Database: increase its size to support multi-sizes and multi-fonts.
- Preprocessing improvements:
  - Improve the image enhancement to solve the problem of noisy pages.
  - Develop a robust system for problems that depend on the nature of the input pages (deleting frames, borders, pictures, tables, etc.).
- Search for new features, and combine features to improve the accuracy.
Training and testing improvements:
- Tying the models.
- Using adaptation (supported by the HTK tool), which may improve the multi-size system (size independence).
- Using the tri-phones technique to solve the problems of overlapping.
- Improve the time response (implement all preprocessing programs in C++).
- Increase the accuracy by feature fusion.
- Build a multi-language (language-independent) system.
- Develop the handwritten system, especially because HMMs can attack this problem efficiently.
- Develop the on-line system.
Main contents
Introduction to AOCR
Preprocessing
Feature extraction
AOCR system implementation
Experimental results
Conclusion & future directions
Applications
Automatic Form Recognition
Bank check reading
[Figure: a sample Banque Misr (بنــك مصــر) check with fields for the check number, the payee's name, the amount in words, the amount in figures, and the signature.]
Digital libraries:
All books, magazines, newspapers, etc. can be stored as soft copies on PCs and CDs.
Transcription of historical archives & the "non-death" of paper:
All archived papers and documents can be stored as soft-copy files.