Text detection and recognition from natural scenes

SCENE TEXT DETECTION AND RECOGNITION

Presented by: J. Hemanth Kumar B. Kishore Kumar

ABSTRACT

Text characters in natural scenes provide valuable information about a place, and sometimes carry legal or otherwise important notices. Detecting and recognizing such text is therefore important. However, recognition is difficult because of the diverse backgrounds and fonts involved. In this paper, a method is proposed to extract text information from the surroundings. First, a character descriptor is designed from existing standard detectors and descriptors. Then, character structure is modelled for each character class by designing stroke configuration maps.

INTRODUCTION

In natural scenes, text is generally found on nearby signboards and other objects. Extracting such text is difficult because of noisy backgrounds and diverse fonts and text sizes. Nevertheless, many applications have proven efficient at extracting text from surroundings. For this, text extraction is divided into two processes:
1. Text detection
2. Text recognition

TEXT DETECTION: The process of localizing the regions of a scene that contain text.

It removes most of the non-text regions, which act as noise during extraction of the required text.

TEXT RECOGNITION: The process of converting pixel-based text (image text) into readable code.

Its main purpose is to distinguish between different text types and compose them properly.

The main focus of this paper is on text recognition. It involves 62 identity categories of text characters:
• 10 digits (0-9)
• 26 upper-case letters (A-Z)
• 26 lower-case letters (a-z)
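As a small illustration of the 62 categories, the label set can be built directly from the standard character ranges. The names `CHARACTER_CLASSES`, `CLASS_INDEX` and `class_of` are hypothetical and only serve this sketch.

```python
import string

# 10 digits + 26 upper-case letters + 26 lower-case letters = 62 classes
CHARACTER_CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Hypothetical lookup from a character to its class index
CLASS_INDEX = {ch: i for i, ch in enumerate(CHARACTER_CLASSES)}


def class_of(ch: str) -> int:
    """Return the class index of a single character (raises KeyError otherwise)."""
    return CLASS_INDEX[ch]
```

The ordering (digits first, then upper case, then lower case) is an arbitrary choice for the example.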

The text regions are generally distinguished with the aid of color uniformity and alignment of text in a line. Two different schemes are designed for achieving text recognition.

SCHEME-1:
• Training a character recognizer to predict the category of a character in an image patch.

SCHEME-2:
• Training a binary character classifier for each character class to predict the existence of this category in an image patch.
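The difference between the two schemes can be sketched with a toy example. This is not the classifier used in the paper: a nearest-prototype rule is swapped in purely for illustration, and the prototype vectors, the `threshold` value and the function names are all assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up prototype feature vectors, one per character class (illustration only)
PROTOTYPES = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}


def recognize(feature):
    """Scheme 1: multi-class prediction, returns the single closest class."""
    return min(PROTOTYPES, key=lambda c: distance(feature, PROTOTYPES[c]))


def contains_class(feature, char_class, threshold=0.5):
    """Scheme 2: binary decision on whether the patch shows the queried class."""
    return distance(feature, PROTOTYPES[char_class]) <= threshold
```

Scheme 1 always answers with one of the classes, while Scheme 2 answers yes or no for one queried class at a time.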

TEXT UNDERSTANDING: To acquire text information from a natural scene in order to understand the surrounding environment and objects.

TEXT RETRIEVAL: To verify whether a piece of text exists in the natural scene.

This understanding is used in mobile applications. Generally, a binary classifier is built by assigning a stroke configuration to each character with the aid of its boundary and skeleton.

• With the character recognizer, text understanding can provide useful surrounding text information for mobile applications.

• With the character classifier of each character class, text retrieval can help search for expected objects in the environment.

Fig. presents a flowchart of the scene text extraction method.

This method differs from other existing methods in that it adds stroke configuration to model text character structure.

LAYOUT-BASED SCENE TEXT DETECTION

• Text detection is performed using color decomposition and the horizontal alignment of the text.

A. LAYOUT ANALYSIS OF COLOR DECOMPOSITION

In the scene image, pixels of similar color are grouped into the same layer to separate the text from the background. A boundary clustering algorithm decomposes the scene image into different layers based on color. The boundary of a character acts as the border between the character and the surrounding surface.
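The idea of grouping similarly colored pixels into layers can be sketched as follows. This is not the boundary clustering algorithm itself: a simpler greedy grouping by color distance is swapped in, and the function name `color_layers` and the `tolerance` value are assumptions made for the example.

```python
def color_layers(image, tolerance=40):
    """Greedily group pixels of similar color into layers.

    image: list of rows, each row a list of (r, g, b) tuples.
    Returns a list of (reference_color, set of (row, col)) pairs.
    """
    layers = []
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            for reference, members in layers:
                # Largest per-channel difference to the layer's reference color
                if max(abs(p - q) for p, q in zip(pixel, reference)) <= tolerance:
                    members.add((r, c))
                    break
            else:
                # No existing layer is close enough: start a new one
                layers.append((pixel, {(r, c)}))
    return layers


# Tiny example: dark "text" pixels in the middle column of a light background
image = [
    [(255, 255, 255), (10, 10, 10), (250, 250, 250)],
    [(255, 250, 255), (0, 5, 0), (255, 255, 255)],
]
layers = color_layers(image)
```

In this toy image the dark pixels end up in their own layer, which is the effect the color decomposition step is after.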

This image describes the color decomposition of a scene image by the boundary clustering algorithm. The top row presents the original scene image and the edge image obtained from the Canny edge detector. The other rows present the color layers obtained from bigram color uniformity. It shows that the text information on the signage board is extracted from the complex background into a single color layer.

B. LAYOUT ANALYSIS OF HORIZONTAL ALIGNMENT

For each layer obtained, the boundaries are analyzed according to their geometry to estimate whether a character is present. In most cases, text on signboards or other regions appears in similar size and horizontal alignment, so an adjacent character grouping algorithm can identify such characters and group them together.

A colored bounding box is assigned to each detected text string. Similar adjacent bounding boxes are searched for and, if found, grouped into a single box. For non-horizontally oriented strings, characters are searched for only within a reasonable range. It is generally degrees compared to the horizontal line. To work well in practice, some details and parameters of the text detection are slightly adjusted.
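A minimal sketch of adjacent character grouping, assuming boxes are given as (x, y, width, height) tuples. The similarity thresholds (height ratio between 0.5 and 2, vertical offset of at most half a character height, horizontal gap of at most 1.5 character heights) are assumptions chosen for illustration, not values from the paper.

```python
def similar(a, b, max_gap_ratio=1.5):
    """True if box b could be the right-hand neighbor of box a in a text string."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    similar_height = 0.5 <= ah / bh <= 2.0
    # Vertical centers must be close for horizontal alignment
    aligned = abs((ay + ah / 2) - (by + bh / 2)) <= 0.5 * max(ah, bh)
    # Horizontal gap between the right edge of a and the left edge of b
    close = bx - (ax + aw) <= max_gap_ratio * max(ah, bh)
    return similar_height and aligned and close


def group_boxes(boxes):
    """Chain boxes from left to right into groups of similar neighbors."""
    groups = []
    for box in sorted(boxes):  # tuples sort by x first
        for group in groups:
            if similar(group[-1], box):
                group.append(box)
                break
        else:
            groups.append([box])
    return groups


def merge(group):
    """Single bounding box (x, y, width, height) covering a whole group."""
    x0 = min(x for x, y, w, h in group)
    y0 = min(y for x, y, w, h in group)
    x1 = max(x + w for x, y, w, h in group)
    y1 = max(y + h for x, y, w, h in group)
    return (x0, y0, x1 - x0, y1 - y0)


boxes = [(10, 10, 8, 12), (20, 11, 8, 12), (30, 10, 8, 12), (100, 80, 30, 40)]
groups = group_boxes(boxes)
```

Here the three small neighboring boxes form one group and can be merged into a single text-string box, while the large distant box stays on its own.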

These images describe the adjacent character grouping process. The red box denotes the bounding box of a boundary in a color layer. The green regions in the two bottom-left figures represent two adjacent groups of consecutive neighboring bounding boxes of similar size and horizontal alignment. The blue regions in the bottom-right figure represent the text string fragments obtained by merging the overlapping adjacent groups.

STRUCTURE-BASED SCENE TEXT RECOGNITION

After text regions are detected, text information is obtained by character recognition. We use 62 character classes in total and two recognition schemes. Text understanding is a multi-class classification problem in which the 62 character classes are distinguished. Text retrieval is a binary classification problem in which it must be estimated whether a patch contains a given character class.

Here, text recognition is done using:
A. Character Descriptor
B. Character Stroke Configuration

A. CHARACTER DESCRIPTOR

It uses four keypoint detectors: the Harris detector (HD), the MSER detector (MD), the Dense detector (DD), and the Random detector (RD).

• The Harris detector identifies keypoints from corners and junctions.
• The MSER detector identifies keypoints from the stroke components.
• The Dense detector is used to uniformly extract the keypoints.
• The Random detector is used to extract a preset number of keypoints randomly.
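The two simplest detectors can be sketched in a few lines. The grid `step`, the keypoint `count`, the fixed `seed` and the function names are assumptions for this example; the Harris and MSER detectors need an image processing library and are not shown.

```python
import random


def dense_keypoints(width, height, step=4):
    """Dense detector: keypoints on a regular grid covering the patch."""
    return [(x, y)
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]


def random_keypoints(width, height, count=20, seed=0):
    """Random detector: a preset number of keypoints at random positions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(count)]
```

For a 16 x 16 patch with a step of 4, the dense detector yields a 4 x 4 grid of keypoints.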

At each extracted keypoint, the HOG feature is computed and used as a feature vector x in the feature space.
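A heavily simplified, HOG-like descriptor at one keypoint can be sketched as below. It builds a single histogram of gradient orientations weighted by gradient magnitude. Full HOG additionally uses cells and block normalization, which are omitted here; the window `radius`, the number of `bins` and the function name are assumptions.

```python
import math


def orientation_histogram(image, cx, cy, radius=2, bins=8):
    """Histogram of unsigned gradient orientations around keypoint (cx, cy).

    image: list of rows of grayscale values. Border pixels are skipped so that
    central differences are always defined.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(max(1, cy - radius), min(height - 1, cy + radius + 1)):
        for x in range(max(1, cx - radius), min(width - 1, cx + radius + 1)):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[int(angle / math.pi * bins) % bins] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total else hist


# A vertical edge: all gradient energy falls into the first orientation bin
edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
x_vector = orientation_histogram(edge, 3, 3)
```

The returned list plays the role of the feature vector x for that keypoint.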

Feature descriptors other than HOG can also be used, but HOG is found to give better results in comparison.

For the quantization process, the Bag-of-Words (BOW) model and the Gaussian mixture model (GMM) are used to aggregate the extracted features. BOW is applied to keypoints from all four detectors, while GMM is applied only to those from DD and RD.

The character patch is mapped by both models into a characteristic histogram as its feature representation. By cascading all the feature representations, a character descriptor with good discriminative and recognition power is obtained.
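The BOW part of the quantization and the final cascading step can be sketched as follows. The codebook of visual words is assumed to be given (in practice it would be learned by clustering training descriptors), and the function names are hypothetical. The GMM-based binary comparison histogram is not shown.

```python
def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and count frequencies."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(codebook)


def cascade(*histograms):
    """Concatenate several feature representations into one descriptor."""
    return [value for hist in histograms for value in hist]


codebook = [[0.0, 0.0], [1.0, 1.0]]  # two made-up visual words
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.2], [0.0, 0.2]]
word_hist = bow_histogram(descriptors, codebook)
```

One such histogram would be computed per detector, and the results concatenated with `cascade` to form the character descriptor.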

[Figure: Character sample -> Harris + HOG, MSER + HOG, Dense + HOG, Random + HOG -> feature descriptors -> BOW and GMM -> histogram of visual word frequency and histogram of binary comparison -> character descriptor]

• Flowchart of the proposed character descriptor, which combines four keypoint detectors; HOG features are extracted at the keypoints. BOW and GMM are then employed to obtain the visual word histogram and the binary comparison histogram, respectively.

B. CHARACTER STROKE CONFIGURATION

Character structure consists of multiple oriented strokes, which serve as basic elements of a text character.

From the pixel-level perspective, a stroke of printed text is defined as a region bounded by two parallel boundary segments. Their orientation is regarded as stroke orientation and the distance between them is regarded as stroke width.
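For a roughly vertical stroke, the distance between its two parallel boundaries can be approximated by the length of horizontal runs of foreground pixels. The sketch below takes the most frequent run length in a binary patch as the stroke width estimate; this shortcut and the function names are assumptions for illustration, not the paper's procedure.

```python
from collections import Counter


def horizontal_runs(row):
    """Lengths of consecutive runs of foreground pixels (value 1) in one row."""
    runs, length = [], 0
    for value in row:
        if value:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return runs


def estimate_stroke_width(patch):
    """Most frequent horizontal run length over all rows of a binary patch."""
    counts = Counter(r for row in patch for r in horizontal_runs(row))
    return counts.most_common(1)[0][0] if counts else 0


# An "L"-like shape: a vertical bar two pixels wide plus a bottom bar
patch = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
]
width = estimate_stroke_width(patch)
```

Strokes in other orientations would need runs measured perpendicular to the stroke direction.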

To locate strokes accurately, a stroke is redefined in our algorithm as the skeleton points within character sections of consistent width and orientation.

A character can be represented as a set of connected strokes with a specific configuration, which includes the number, locations, lengths and orientations of the strokes. The structure map of strokes is defined as the stroke configuration.
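One way to hold such a configuration in code is a small record per stroke. The class names, the normalized coordinates and the example values for the letter "L" are hypothetical; this only illustrates the four listed properties (number, locations, lengths, orientations) and is not the stroke configuration map used in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stroke:
    center: Tuple[float, float]  # location, normalized to the patch size
    length: float                # length, normalized to the patch height
    orientation: float           # orientation in degrees (0 = horizontal)


@dataclass
class StrokeConfiguration:
    char_class: str
    strokes: List[Stroke]

    def summary(self):
        """Number of strokes and their sorted orientations."""
        return {
            "count": len(self.strokes),
            "orientations": sorted(s.orientation for s in self.strokes),
        }


# Made-up configuration for "L": one vertical and one horizontal stroke
letter_l = StrokeConfiguration(
    char_class="L",
    strokes=[
        Stroke(center=(0.2, 0.5), length=1.0, orientation=90.0),
        Stroke(center=(0.5, 0.9), length=0.6, orientation=0.0),
    ],
)
```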

Within a character class, although the character instances appear in different fonts, styles, and sizes, the stroke configuration is always consistent.

The stroke configuration is estimated from synthesized characters generated by computer software rather than from scene characters cropped from scene images, since synthesized characters provide the accurate boundary and skeleton that relate to character structure.

The Synthetic Font Training Dataset is used here to obtain the stroke configuration. This dataset contains about 67400 character patches of synthetic English letters and digits in various fonts and styles, of which 20000 patches are selected to generate character patches. It covers all 62 character classes.

CONCLUSION

A method for scene text recognition from detected text regions, intended for mobile applications, is proposed. It detects text regions in scenes or images and then recognizes the text information they contain.

The proposed character descriptor is effective at extracting representative and discriminative text features for both recognition schemes.

To model text character structure for the text retrieval scheme, a novel feature representation, the stroke configuration map, has been designed based on boundary and skeleton.

Thank you💐☺
