Multiple Frame Integration for OCR on Mobile Devices Master’s Thesis Georg Krispel Advisor: Horst Bischof December 12, 2016 Institute for Computer Graphics and Vision Anyline GmbH


Multiple Frame Integration for OCR on Mobile Devices

Master’s Thesis

Georg Krispel

Advisor: Horst Bischof

December 12, 2016

Institute for Computer Graphics and Vision

Anyline GmbH


Scene Text Recognition on Mobile Devices


Scene Text Recognition on Mobile Devices

Scene Text Recognition Use Cases


Scene Text Recognition on Mobile Devices Cont’d

• An almost orthogonal view is assumed

• A search window is introduced to improve the user experience and to avoid searching for the text

• Sophisticated preprocessing steps

• Text recognition

• Possible repetition for validation


Problems

• Low-resolution images from outdated mobile phones

• Reflections and glare

• Poor lighting conditions


Objectives

• Evaluate the possibilities of mitigating these effects to improve overall text recognition results

• Exploit the multiple frames available in the camera stream and their redundant information (Multiple Frame Integration)

• Implement the resulting pipeline on mobile hardware


Scene Text Processing Pipeline


Assumptions

• Text is written on a nearly planar surface

• The surface is well textured

• Sufficiently smooth motion of the camera


Overview

• Detect text in keyframes and track it (or rather the underlying plane) over time

• Keyframes are selected according to blurriness and the text detection result

• The underlying plane is rectified before text detection

• Multiple threads are used to offload expensive tasks

  • Asynchronous plane rectification and scene text detection

• The dominant plane is tracked in order to propagate text detection results to the remaining frames

• Reinitialization after a certain time or when tracking degenerates
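
The slides do not say how blurriness is measured. As a minimal sketch, assume a variance-of-Laplacian sharpness score, which is a common focus measure but not necessarily the one used in this work. The function names `laplacian_variance` and `is_keyframe_candidate` and the threshold value are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over the image interior.

    `img` is a grayscale image given as a list of rows of numbers.
    Low values suggest a blurry frame (assumed focus measure).
    """
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
        - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def is_keyframe_candidate(img, text_found, min_sharpness=100.0):
    """A frame qualifies if text was detected and it is sharp enough.

    `min_sharpness` is an illustrative threshold, not a value from the slides.
    """
    return text_found and laplacian_variance(img) >= min_sharpness


# A hard vertical edge versus a smooth intensity ramp.
sharp = [[0, 0, 255, 255] for _ in range(4)]
blurry = [[0, 85, 170, 255] for _ in range(4)]
```

The hard edge produces strong Laplacian responses and passes the check, while the linear ramp has a zero Laplacian everywhere and is rejected.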


Initialization Process

[Figure: timeline of frames #0, #1, #8, #9 and #10. MainThread: tracking on every frame, and tracking & MFI from frame #9 onward. TextDetectionThread: plane rectification and text detection on keyframe #0.]

Pipeline Initialization Process
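
The division of work between the two threads can be sketched as follows. This is an illustration under assumptions, not the thesis implementation: `detect_text` is a hypothetical stand-in for plane rectification plus text detection, and the frame at which the detection result arrives depends on thread timing.

```python
import queue
import threading


def detect_text(keyframe_id):
    """Hypothetical stand-in for plane rectification and text detection."""
    return {"keyframe": keyframe_id, "boxes": [(10, 20, 110, 50)]}


def detection_worker(requests, results):
    """Text detection thread: process keyframes until None arrives."""
    while True:
        keyframe_id = requests.get()
        if keyframe_id is None:
            break
        results.put(detect_text(keyframe_id))


def run_pipeline(num_frames):
    """Main thread: track every frame, start MFI once a detection exists."""
    requests, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=detection_worker, args=(requests, results))
    worker.start()
    requests.put(0)  # frame #0 is handed over as the first keyframe

    detection, log = None, []
    for frame_id in range(num_frames):
        if detection is None:
            try:
                detection = results.get_nowait()  # never block the tracker
            except queue.Empty:
                pass
        mode = "tracking & MFI" if detection is not None else "tracking"
        log.append((frame_id, mode))

    requests.put(None)  # shut the worker down
    worker.join()
    return log
```

The main loop never waits for the detector. Until a result is available it only tracks, which mirrors the "Tracking" frames preceding "Tracking & MFI" in the figure.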


Initialization Process Cont’d

Pipeline Processing Example


Modules

A modular design makes it possible to exchange the individual parts of the pipeline:

• Visual Tracking

• Rectification

• Text Detection

• Multiple Frame Integration


Visual Tracking

• Feature based

  • Good features to track and Kanade-Lucas-Tomasi (8, 13, 14)

  • AKAZE features (1, 2) and FLANN matching (9)

• Intensity-based refinement, for text patches only

  • Parametric image alignment using ECC (6)
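
Because the text is assumed to lie on a nearly planar surface, the motion of that plane between two frames can be described by a 3x3 homography. The slides do not state how the trackers represent the motion, so the sketch below is an assumption: it shows how a detected bounding box could be propagated to a later frame by chaining frame-to-frame homographies. All names and matrices are illustrative.

```python
def apply_homography(H, point):
    """Map a 2D point with a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def compose(H_bc, H_ab):
    """Chain two homographies: apply H_ab first, then H_bc."""
    return [
        [sum(H_bc[i][k] * H_ab[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def propagate_box(H, corners):
    """Move the four corners of a detected text box into another frame."""
    return [apply_homography(H, p) for p in corners]


box = [(10, 20), (110, 20), (110, 50), (10, 50)]  # detected in the keyframe
H_01 = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]         # frame 0 -> 1: shift by (5, -3)
H_12 = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]          # frame 1 -> 2: zoom by 2
box_in_frame_2 = propagate_box(compose(H_12, H_01), box)
```

Chaining accumulates estimation errors over time, which is one reason a periodic reinitialization from a fresh detection is useful.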


Rectification

• Rectangular region localization and extraction (LocEx) module by Andreas Hartl et al. (7)

• M-Estimator Sample Consensus (MSAC) based vanishing point detection by Nieto et al. (12)


Scene Text Detection

• TextSpotter (TS) by Neumann et al. (11)

  • Based on classification and grouping of Extremal Regions

• Stroke Width Transform (SWT) by Epshtein et al. (5)


Multiple Frame Integration

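[Figure: the two MFI approaches. Result fusion: text recognition is run on individual frames and the recognition results are fused. Image enhancement: frames are combined into an enhanced image, which is then passed to text recognition. Example readings shown: 04954, 00954, 84954, 04954, 02954, 04954, 04954.]

MFI approaches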


Multiple Frame Integration Cont’d

• Image Enhancement

  • Minimum operator

  • Integration method by Yi et al. (15)

• Result Fusion

  • Voting for the most frequent recognition
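
Two of the methods listed above are simple enough to sketch directly. The code below is an illustration with hypothetical function names, assuming the patches are already registered and equally sized. For bright text on a dark background, a pixel-wise minimum keeps the dark background and tends to suppress reflections that appear in only some of the frames. The readings are the digit strings from the "MFI approaches" figure; treating all seven as per-frame readings is an assumption.

```python
from collections import Counter


def minimum_operator(patches):
    """Pixel-wise minimum over registered, equally sized grayscale patches."""
    return [
        [min(pixels) for pixels in zip(*rows)]
        for rows in zip(*patches)
    ]


def vote(readings):
    """Return the most frequent recognition result and its share of votes."""
    counts = Counter(readings)
    winner, n = counts.most_common(1)[0]
    return winner, n / len(readings)


# Two tiny 2x2 patches; each contains one glare pixel (200 and 255).
patches = [
    [[10, 200], [30, 40]],
    [[12, 90], [255, 38]],
]
enhanced = minimum_operator(patches)

readings = ["04954", "00954", "84954", "04954", "02954", "04954", "04954"]
fused, share = vote(readings)
```

Here `enhanced` is `[[10, 90], [30, 38]]` and `fused` is `"04954"`, supported by four of the seven readings. The individual misreads differ from each other, so none of them can outvote the correct value.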


Impact of MFI Approaches on Overall Recognition Results


Datasets

• We assumed the use case of energy meter readings

• We tailored our pipeline to detect solely the respective numbers

  • Only bright text on dark background

  • Additional histogram-based verification step

  • Constrained bounding box dimensions


Datasets

Example frames of the evaluation datasets showing different types of energy meters and the ground truth annotation


Datasets Cont’d

Video ID   Light source   Max. resolution   No. of frames   Duration (mm:ss)
1          Tungsten       768x1366          228             00:07
2          Daylight       768x1366          209             00:07
3          Flash          768x1366          581             00:19
4          Daylight       768x1366          1280            00:42
5          Tungsten       768x1366          507             00:16
6          Tungsten       768x1366          234             00:07


Detection and Tracking Accuracy

• We used the CLEAR MOT evaluation framework (4)

  • Multiple Object Tracking Precision (MOTP)

  • Multiple Object Tracking Accuracy (MOTA)

• We compared our method with full tracking-by-detection approaches

  • There, bounding boxes in consecutive frames are associated by their overlap using Munkres’ algorithm (10).
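
The association step of the tracking-by-detection baseline can be sketched as follows. This is a simplified illustration, not the evaluated implementation: a brute-force search over permutations stands in for Munkres' algorithm (it finds an optimal assignment but is only viable for a handful of boxes), and the gating threshold `min_iou` is an assumed parameter.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def associate(prev_boxes, curr_boxes, min_iou=0.5):
    """One-to-one matching of boxes that maximizes the total overlap.

    Returns (index in prev_boxes, index in curr_boxes) pairs. Pairs whose
    overlap is below `min_iou` are left unmatched.
    """
    swapped = len(prev_boxes) > len(curr_boxes)
    small, large = (curr_boxes, prev_boxes) if swapped else (prev_boxes, curr_boxes)
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(large)), len(small)):
        pairs = [(i, j) for i, j in enumerate(perm)
                 if iou(small[i], large[j]) >= min_iou]
        score = sum(iou(small[i], large[j]) for i, j in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return [(j, i) if swapped else (i, j) for i, j in best_pairs]


prev = [(0, 0, 10, 10), (20, 0, 30, 10)]
curr = [(21, 0, 31, 10), (1, 0, 11, 10)]  # same boxes, shifted and reordered
matches = associate(prev, curr)
```

In this example `matches` is `[(0, 1), (1, 0)]`: each box is linked to its shifted counterpart even though the detector returned them in a different order.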


Detection and Tracking Accuracy Cont’d

Res.       Method         MOTP   Misses   FP rate   MM     MOTA
768x1366   NATIVE TS      0.74   0.27     0.98      0.08   -0.32
480x854    NATIVE TS      0.74   0.29     1.52      0.14   -0.95
           NATIVE SWT     0.60   0.99     0.87      0.00   -0.86
           TS             0.75   0.57     0.13      0.02   0.28
           MSAC&TS        0.73   0.59     0.10      0.02   0.29
           LOCEX&TS       0.75   0.57     0.13      0.02   0.28
           KLT&MSAC&TS    0.71   0.48     0.31      0.00   0.21
           AK&MSAC&TS     0.70   0.52     0.11      0.02   0.36
Hybrid     KLT&MSAC&TS    0.62   0.49     0.17      0.00   0.34

Multiple Object Tracking Precision and Accuracy
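
In the CLEAR MOT framework, MOTA is one minus the sum of misses, false positives and mismatches, each counted over the whole sequence and divided by the number of ground truth objects. Assuming the Misses, FP rate and MM columns are exactly these ratios, the MOTA column can be recomputed as a plausibility check. The short row labels below are illustrative.

```python
def mota(miss_rate, fp_rate, mismatch_rate):
    """MOTA from error counts normalized by the number of ground truth objects."""
    return 1.0 - (miss_rate + fp_rate + mismatch_rate)


# (misses, FP rate, MM, reported MOTA), copied from the table above.
rows = {
    "768x1366 NATIVE TS": (0.27, 0.98, 0.08, -0.32),
    "480x854 NATIVE TS": (0.29, 1.52, 0.14, -0.95),
    "480x854 NATIVE SWT": (0.99, 0.87, 0.00, -0.86),
    "480x854 TS": (0.57, 0.13, 0.02, 0.28),
    "480x854 MSAC&TS": (0.59, 0.10, 0.02, 0.29),
    "480x854 KLT&MSAC&TS": (0.48, 0.31, 0.00, 0.21),
    "480x854 AK&MSAC&TS": (0.52, 0.11, 0.02, 0.36),
    "Hybrid KLT&MSAC&TS": (0.49, 0.17, 0.00, 0.34),
}

# Absolute difference between recomputed and reported MOTA per row.
deviations = {
    name: abs(mota(miss, fp, mm) - reported)
    for name, (miss, fp, mm, reported) in rows.items()
}
```

Every row agrees with the reported value to within rounding of the two-decimal inputs. The negative MOTA of the NATIVE variants follows from false positive rates close to or above 1, that is, roughly as many or more false detections than ground truth objects.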


Runtime

Device          Resolution   Tracking method   Rectification, detection   Total
Laptop          480x854      AKAZE             318.2                      144.4
                480x854      KLT               263.1                      22.2
                Hybrid       KLT               73.1                       5.0
Shield Tablet   480x720      KLT               2788.3                     469.2
                Hybrid       KLT               519.2                      84.9

Average time performance measurements in milliseconds


Reading Accuracy

We extracted the text patches and used the Anyline Energy module (https://www.anyline.io/energy-anyline-io-de/) to read the meter readings from

• the current patch and

• its currently available integrated counterpart; for result fusion, we instead fused the preceding recognition results.

These recognition rates are compared.


Reading Accuracy Cont’d

Single extracted frames sampled during a sequence of 62 frames compared to the respective integration results.


Reading Accuracy Cont’d

Degenerated Multi-frame Integration over Time


Reading Accuracy Cont’d

[Figure: plot of the recognition rate (0.0 to 1.0) for the methods SF, MIN, YI and HIST, with the series ECC and Hybrid.]

The recognition rates using the single extracted frames and the different MFI methods


Reading Accuracy Cont’d

Resolution   Single frame   Minimum operator   Yi integration   Histogram voting
768x1366     0.45           0.44               0.55             0.63
480x854      0.38           0.50               0.50             0.62
320x568      0.36           0.48               0.43             0.61
Hybrid       0.33           0.29               0.27             0.61

Recognition rates
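
To make the comparison explicit, the gain of each MFI method over the single-frame baseline can be computed from the table. The snippet only rearranges the reported numbers; the short method keys are illustrative.

```python
# Recognition rates copied from the table above.
rates = {
    "768x1366": {"single": 0.45, "min": 0.44, "yi": 0.55, "hist": 0.63},
    "480x854": {"single": 0.38, "min": 0.50, "yi": 0.50, "hist": 0.62},
    "320x568": {"single": 0.36, "min": 0.48, "yi": 0.43, "hist": 0.61},
    "Hybrid": {"single": 0.33, "min": 0.29, "yi": 0.27, "hist": 0.61},
}


def gains_over_single_frame(table):
    """Absolute gain in recognition rate of each method over the single frame."""
    return {
        res: {m: round(r[m] - r["single"], 2) for m in r if m != "single"}
        for res, r in table.items()
    }


gains = gains_over_single_frame(rates)

# Methods that improve on the single-frame rate in every row.
always_better = [
    m for m in ("min", "yi", "hist")
    if all(g[m] > 0 for g in gains.values())
]
```

Histogram voting is the only method with a positive gain in every row (between 0.18 and 0.28), while the two image enhancement methods fall below the single-frame rate in the Hybrid row.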


Conclusion & Outlook


Conclusion & Outlook

• We showed that our MFI approach is capable of achieving real-time performance on mobile hardware with little optimization

• The multi-threaded detection and tracking approach can keep up with full detection approaches

• A distinct improvement of the recognition rates is possible

• In general, image enhancement integration methods require almost perfect image registration

• If text recognition is fast enough, result fusion methods should be preferred over the evaluated image enhancement approaches


Questions?

References

[1] P. F. Alcantarilla, A. Bartoli, and A. J. Davison. KAZE features. In European Conference on Computer Vision, 2012.

[2] P. F. Alcantarilla, J. Nuevo, and A. Bartoli. Fast explicit diffusion for accelerated features in nonlinear scale spaces. In British Machine Vision Conference, 2013.

[3] D. L. Baggio, S. Emami, D. M. Escriva, K. Ievgen, N. Mahmood, J. Saragih, and R. Shilkrot. Mastering OpenCV with Practical Computer Vision Projects. Packt Publishing, Limited, 2012.

[4] K. Bernardin and R. Stiefelhagen. Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1):1–10, 2008.


[5] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In Conference on Computer Vision and Pattern Recognition, pages 2963–2970. IEEE, 2010.

[6] G. D. Evangelidis and E. Z. Psarakis. Parametric image alignment using enhanced correlation coefficient maximization. Transactions on Pattern Analysis and Machine Intelligence, 30(10):1858–1865, 2008.

[7] A. Hartl and G. Reitmayr. Rectangular target extraction for mobile augmented reality applications. In International Conference on Pattern Recognition, pages 81–84. IEEE, 2012.

[8] B. D. Lucas, T. Kanade, et al. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, volume 81, pages 674–679, 1981.


[9] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Applications, pages 331–340, 2009.

[10] J. Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, March 1957.

[11] L. Neumann and J. Matas. Real-time scene text localization and recognition. In Conference on Computer Vision and Pattern Recognition, pages 3538–3545. IEEE, 2012.

[12] M. Nieto and L. Salgado. Real-time robust estimation of vanishing points through nonlinear optimization. In SPIE Photonics Europe, pages 772402–772402. International Society for Optics and Photonics, 2010.


[13] J. Shi and C. Tomasi. Good features to track. In Computer Society Conference on Computer Vision and Pattern Recognition, pages 593–600. IEEE, 1994.

[14] C. Tomasi and T. Kanade. Detection and tracking of point features. School of Computer Science, Carnegie Mellon Univ. Pittsburgh, 1991.

[15] J. Yi, Y. Peng, and J. Xiao. Using multiple frame integration for the text recognition of video. In International Conference on Document Analysis and Recognition, pages 71–75. IEEE, 2009.
