
Copyright © 2014 IJECCE, All rights reserved.

International Journal of Electronics Communication and Computer Engineering, Volume 5, Issue 1, ISSN (Online): 2249–071X, ISSN (Print): 2278–4209

A Survey on Detect and Localize Text in Natural Scene Image with Wavelet Transform

Palash M. Gourshettiwar
Department of Computer Science & Engineering
Assistant Professor, D.M.I.E.T.R, Sawangi (M), Wardha

R.T.M.N.U, Nagpur University

Sunny G. Gandhi
Department of Information Technology
Assistant Professor, D.M.I.E.T.R, Sawangi (M), Wardha

R.T.M.N.U, Nagpur University

Abstract — Text information in natural scene images serves as an important clue for many image-based applications such as scene understanding and content-based image analysis. However, locating text is difficult because of complex backgrounds, multiple text appearances, and variations in text font, size and line orientation. This paper surveys methods to detect and localize text in natural scene images with the wavelet transform. First, a wavelet transform is applied to the image, and the distribution of high-frequency wavelet coefficients is used to statistically characterize text and non-text areas. A text region detector is then designed to estimate text confidence and scale information in an image pyramid, which helps segment candidate text components by local binarization and efficiently filter out non-text components from the image.

Keywords – Wavelet Transform, Conditional Random Field (CRF), Text Detection, Text Localization.

I. INTRODUCTION

With the increasing use of digital image capturing devices, such as digital cameras, mobile phones and PDAs, content-based image analysis techniques have received intensive attention in recent years.

Among all the contents in images, text information has inspired great interest, since it can be easily understood by both humans and computers, and it finds wide applications such as license plate reading, sign detection and translation, mobile text recognition, content-based web image search, and so on; these tasks motivate an integrated image text information extraction system. Frequency-domain transforms mainly concentrate on the wavelet transform, while histogram equalization is a typical method of image enhancement in the spatial domain. A wavelet-based approach is used to improve the performance of face detection in non-uniform lighting environments with high dynamic range using a local-contrast image.

A. Image Enhancement

The flow chart of image enhancement is shown in Fig. 1. The input image is decomposed into four sub-bands by the Discrete Wavelet Transform (DWT). The low-frequency sub-band is smoothed, and the high-frequency sub-bands are sharpened using a nonlinear piecewise filter. The enhanced image is obtained by applying the inverse DWT to the smoothed low-frequency sub-band and the sharpened high-frequency sub-bands. Wavelets are used frequently in image processing, for feature extraction, denoising, compression, face recognition, image super-resolution, etc. The input image is used here to obtain the wavelet transform. The decomposition of images into different frequency ranges permits the isolation of frequency components introduced by intrinsic deformations or extrinsic factors into certain sub-bands.

Fig. 1. Flow chart of image enhancement with the wavelet transform

This process isolates small changes in the given image in the high-frequency sub-bands. Hence, the discrete wavelet transform (DWT) is a suitable tool for improving image quality.
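The enhancement pipeline above can be sketched as follows. This is a minimal illustration, assuming a single-level Haar DWT, and substituting a plain 3x3 mean for the smoothing step and a constant detail gain for the nonlinear piecewise filter (function names and parameters are illustrative, not from the paper):

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: averages/differences along rows, then columns."""
    def dwt1(x, axis):
        ev = [slice(None)] * x.ndim; ev[axis] = slice(0, None, 2)
        od = [slice(None)] * x.ndim; od[axis] = slice(1, None, 2)
        a, b = x[tuple(ev)], x[tuple(od)]          # even / odd samples
        return (a + b) / 2.0, (a - b) / 2.0        # low-pass, high-pass
    lo, hi = dwt1(img.astype(float), axis=1)       # 1-D DWT along rows
    ll, lh = dwt1(lo, axis=0)                      # then along columns
    hl, hh = dwt1(hi, axis=0)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    def idwt1(lo, hi, axis):
        out = np.empty([2 * s if k == axis else s
                        for k, s in enumerate(lo.shape)])
        ev = [slice(None)] * lo.ndim; ev[axis] = slice(0, None, 2)
        od = [slice(None)] * lo.ndim; od[axis] = slice(1, None, 2)
        out[tuple(ev)] = lo + hi                   # recover even samples
        out[tuple(od)] = lo - hi                   # recover odd samples
        return out
    lo = idwt1(ll, lh, axis=0)
    hi = idwt1(hl, hh, axis=0)
    return idwt1(lo, hi, axis=1)

def enhance(img, detail_gain=1.5):
    """Smooth the LL band with a 3x3 mean, amplify detail bands, invert."""
    ll, lh, hl, hh = haar_dwt2(img)
    p = np.pad(ll, 1, mode='edge')
    ll = sum(p[i:i + ll.shape[0], j:j + ll.shape[1]]
             for i in range(3) for j in range(3)) / 9.0
    return haar_idwt2(ll, detail_gain * lh, detail_gain * hl, detail_gain * hh)
```

With `detail_gain = 1` and no smoothing, the round trip reproduces the input exactly, which is a useful sanity check before tuning the enhancement.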

Fig.2. The result of 2-D DWT decomposition

The 2-D wavelet decomposition of an image is performed by applying the 1-D DWT along the rows of the image first; then the results are decomposed along the columns. This operation results in four decomposed sub-band images, referred to as low–low (LL), low–high (LH), high–low (HL), and high–high (HH). The frequency components of those sub-band images cover the frequency components of the original image, as shown in Fig. 2.

B. Text in Image

Among all the contents in images, text information has inspired great interest, since it can be easily understood by both humans and computers, and it finds wide applications such as license plate reading, sign detection and translation, mobile text recognition, content-based web image search, and so on. Jung et al. [15] define an integrated text information extraction (TIE) system (shown in Fig. 3) with four stages: text detection, text localization, text extraction and enhancement, and recognition. Among these stages, text detection and localization, bounded by the dashed line in Fig. 3, are critical to overall system performance. In the last decade, many methods have been proposed to address image and video text detection and localization problems, and some have achieved impressive results for specific applications. However, fast and accurate text detection and localization in natural scene images remains a challenge due to variations in text font, size, color and alignment orientation, and it is often affected by complex backgrounds, illumination changes, image distortion and degradation.

Fig.3. Architecture of a TIE system

II. TEXT DETECTION

Zhong et al. [7] performed text localization on compressed images, which resulted in faster performance. Therefore, their text localizers could also be used for text detection.

The color image is converted into a gray-level image, on which image pyramids are built with nearest-neighbor interpolation to capture text of various sizes. A text region detector is designed by integrating a Histograms of Oriented Gradients (HOG) feature extractor with a boosted cascade classifier. For each local region in one image of the pyramid, HOG features are extracted as input to a variation of the cascade boosting classifier, WaldBoost, which estimates whether the region contains text and measures the confidence of that decision. The image is then binarized using image segmentation, as shown in Fig. 4. This leads to a simpler training process and faster detection.

Fig.4. Example of Text Detection
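The HOG feature extraction step can be sketched as below. This is a simplified per-cell descriptor without the overlapping block normalization of full HOG, and the WaldBoost stage is omitted since it requires a trained model; `hog_descriptor` and its parameters are illustrative, not from the paper:

```python
import numpy as np

def hog_descriptor(patch, cell=8, bins=9):
    """Minimal HOG-style descriptor: per-cell orientation histograms,
    weighted by gradient magnitude and L2-normalized per cell."""
    gy, gx = np.gradient(patch.astype(float))        # image gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    h, w = patch.shape
    feats = []
    for i in range(0, h - cell + 1, cell):           # tile into cells
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))
    return np.concatenate(feats)
```

In a full detector, this descriptor would be computed for each window at every pyramid level and scored by the boosted cascade, which rejects most non-text windows after evaluating only a few weak classifiers.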

III. TEXT LOCALIZATION

A. Region-based Methods

Region-based methods are based on the observation that text regions have characteristics distinct from non-text regions, such as distinctive gradient distribution, texture and structure. These methods generally consist of two stages: text detection and text localization. For text detection, features of local regions are extracted to determine whether they contain text. Then specific grouping or clustering approaches are employed to localize text regions accurately.

B. CC-based Methods

CC-based methods directly segment candidate text components by edge detection or color clustering. The non-text components are then pruned with heuristic rules or classifiers. Since the number of segmented candidate components is relatively small, CC-based methods have lower computational cost, and the located text components can be directly used for recognition.

Lee and Kankanhalli [8] applied a CC-based method to the detection and recognition of text on cargo containers, which can have uneven lighting and characters of different sizes and shapes. Edge information is used for a coarse search prior to CC generation. After quantizing the input image, the difference between adjacent pixels is used to determine the boundaries of potential characters. Local threshold values are then selected for each text candidate based on the pixels on the boundaries. These potential characters are used to generate CCs with the same gray level. Thereafter, several heuristics are used to filter out non-text components based on aspect ratio, contrast histogram, and run-length measurements. Despite the authors' claim that the method could be used effectively in other domains, experimental results were presented only for cargo container images.
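The CC generation and heuristic pruning steps can be sketched as follows. This is a simplified illustration assuming a binarized input grid, 4-connectivity, and a single aspect-ratio heuristic; the systems surveyed here also use contrast histograms and run-length measurements:

```python
from collections import deque

def connected_components(binary):
    """4-connected component labeling on a binary grid (list of lists).
    Returns each component as a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    comps, next_label = [], 1
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                q = deque([(y, x)])           # BFS flood fill from seed
                labels[y][x] = next_label
                pixels = []
                while q:
                    cy, cx = q.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                comps.append(pixels)
                next_label += 1
    return comps

def looks_like_text(pixels, min_aspect=0.1, max_aspect=10.0):
    """Prune components whose bounding-box aspect ratio is implausible."""
    ys = [p[0] for p in pixels]; xs = [p[1] for p in pixels]
    hgt = max(ys) - min(ys) + 1
    wid = max(xs) - min(xs) + 1
    return min_aspect <= wid / hgt <= max_aspect
```

The aspect-ratio bounds here are illustrative placeholders; in practice each heuristic's thresholds are tuned on training data for the target domain.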

Page 3: A Survey on Detect and Localize Text in Natural Scene

Copyright © 2014 IJECCE, All right reserved61

International Journal of Electronics Communication and Computer EngineeringVolume 5, Issue 1, ISSN (Online): 2249–071X, ISSN (Print): 2278–4209

Zhong et al. [11] used a CC-based method based on color reduction. They quantize the color space using the peaks of a color histogram in the RGB color space, based on the assumption that text regions cluster together in this color space and occupy a significant portion of an image. Each text component then goes through a filtering stage using a number of heuristics, such as area, diameter, and spatial alignment. The performance of this system was evaluated using CD images and book cover images.
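Histogram-peak color reduction of this kind can be sketched as below. This is a coarse illustration that bins the RGB cube, keeps the most populated bins as a palette, and snaps each pixel to the nearest peak; `bins` and `k` are illustrative parameters, not values from the cited work:

```python
import numpy as np

def quantize_by_histogram_peaks(rgb, bins=4, k=3):
    """Color reduction via peaks of a coarse RGB histogram.
    rgb: uint8 array of shape (H, W, 3). Returns the quantized image."""
    flat = rgb.reshape(-1, 3).astype(float)
    # map each pixel into a coarse (bins x bins x bins) histogram cell
    idx = np.minimum((flat / 256.0 * bins).astype(int), bins - 1)
    codes = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    counts = np.bincount(codes, minlength=bins ** 3)
    peak_codes = np.argsort(counts)[::-1][:k]      # k most populated cells
    # the centers of the peak cells form the palette
    centers = np.stack([peak_codes // (bins * bins),
                        (peak_codes // bins) % bins,
                        peak_codes % bins], axis=1) * (256.0 / bins) + 128.0 / bins
    # assign every pixel to its nearest palette color
    d = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers[d.argmin(1)].reshape(rgb.shape)
```

After quantization, connected components are extracted within each color layer, which is where the clustering assumption about text regions pays off.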

Kim segments an image using color clustering on a color histogram in the RGB space. Non-text components, such as long horizontal lines and image boundaries, are eliminated. Then horizontal text lines and text segments are extracted based on an iterative projection profile analysis. Kim et al. used cluster-based templates to filter out non-character components of multi-segment characters, alleviating the difficulty of defining heuristics for filtering out non-text components.
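A single horizontal pass of projection profile analysis can be sketched as follows; a minimal illustration assuming a binarized page where foreground pixels are 1, with `min_density` as an illustrative threshold (the iterative version re-applies this within each detected band):

```python
def text_line_bands(binary, min_density=1):
    """Horizontal projection profile: count foreground pixels per row and
    group rows meeting the threshold into (start, end) text-line bands."""
    profile = [sum(row) for row in binary]      # foreground count per row
    bands, start = [], None
    for i, v in enumerate(profile):
        if v >= min_density and start is None:
            start = i                           # band begins
        elif v < min_density and start is not None:
            bands.append((start, i - 1))        # band ends
            start = None
    if start is not None:                       # band runs to the last row
        bands.append((start, len(profile) - 1))
    return bands
```

A vertical profile applied inside each band then splits lines into word or character segments in the same way.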

CC-based methods have four processing stages: (i) preprocessing, such as color clustering and noise reduction; (ii) CC generation; (iii) filtering out non-text components; and (iv) component grouping. A CC-based method may segment a character into multiple CCs, especially in the case of polychrome text strings and low-resolution, noisy video images.

C. Connected Component Analysis

In the connected component analysis (CCA) stage, a conditional random field (CRF) model is used to filter out non-text components: each candidate component is assigned to one of two classes ("text" or "non-text") by considering both unary component properties and binary contextual component relationships.

Fig.4. Example of the CCA.

A CRF is a probabilistic graphical model that has been widely used in many areas, such as natural language processing [16]. Considering that neighboring text components normally have similar width or height, a component neighborhood graph is built by defining a component linkage rule, and the CRF model exploits contextual component relationships as well as unary component properties. During testing, to reduce the computational overhead of graph inference, obviously non-text components are first removed using thresholds on unary component features; the thresholds are set to safely accept almost all text components in the training set. An example of component labeling is shown in Fig. 4.
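One plausible linkage rule can be sketched as follows: components are reduced to bounding boxes and linked when their heights are similar and their horizontal gap is small relative to their size. The thresholds `height_ratio` and `gap_factor` are illustrative assumptions, not values from the paper:

```python
def neighborhood_graph(boxes, height_ratio=2.0, gap_factor=2.0):
    """Build a component neighborhood graph for the CRF stage.

    boxes: list of (x, y, w, h) component bounding boxes.
    Returns undirected edges as (i, j) index pairs.
    """
    edges = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            xi, _, wi, hi = boxes[i]
            xj, _, wj, hj = boxes[j]
            if max(hi, hj) / min(hi, hj) > height_ratio:
                continue  # heights too dissimilar to share a text line
            # horizontal gap between the boxes (negative means overlap)
            gap = max(xi, xj) - min(xi + wi, xj + wj)
            if gap <= gap_factor * min(hi, hj):
                edges.append((i, j))
    return edges
```

The CRF's binary (pairwise) potentials are then defined over these edges and its unary potentials over the nodes, so inference only has to reason about plausibly related component pairs.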

IV. CONCLUSION

The given image is decomposed using the wavelet transform, which splits the original image into four frequency sub-bands to improve the contrast and resolution of the image. The noise in the frequency coefficients is then reduced using a smooth approximation with PWL filtering techniques. Finally, texts are detected and localized by integrating region information into a robust CC-based method.

ACKNOWLEDGEMENT

The making of this seminar needed the co-operation and guidance of a number of people. I therefore consider it my prime duty to thank all those who helped me in this venture. It is my immense pleasure to express my gratitude to my college, D.M.I.E.T.R, for providing a constructive and positive environment during the survey.

I express my sincere thanks to the co-author, Prof. Sunny G. Gandhi, and all other staff members of the CSE department for their kind co-operation.

I would like to thank Dr. S. P. Untawale, Principal of our institution, for providing the necessary facilities during the period of working on this report.

I am thankful to my friends and the library staff members, whose encouragement and suggestions helped me complete my survey.

I am also thankful to my parents, whose best wishes are always with me.

REFERENCES

[1] H. Demirel and G. Anbarjafari, "Image resolution enhancement by using discrete and stationary wavelet decomposition," IEEE Trans. Image Process., vol. 20, no. 4, pp. 1458–1460, May 2011.

[2] X. Wu and B. Su, "A wavelet-based image resolution enhancement technique," in Proc. Int. Conf. Electronics and Optoelectronics (ICEOE 2011), 2011, pp. 62–65.

[3] N. Unaldi, P. Sankaran, V. K. Asari, and Z. Rahaman, "Image enhancement for improving face detection under non-uniform lighting conditions," IEEE Trans. Image Process., vol. 25, no. 3, pp. 1332–1335, Jan. 2008.

[4] X. R. Chen and A. L. Yuille, "Detecting and reading text in natural scenes," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR'04), Washington, DC, 2004, pp. 366–373.


[5] R. Lienhart and A. Wernicke, "Localizing and segmenting text in images and videos," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 4, pp. 256–268, 2002.

[6] K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1631–1639, 2003.

[7] Y. Zhong, H. Zhang, and A. K. Jain, "Automatic caption localization in compressed video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 4, pp. 385–392, 2000.

[8] C. M. Lee and A. Kankanhalli, "Automatic extraction of characters in complex images," Int. J. Pattern Recognition and Artificial Intelligence, vol. 9, no. 1, pp. 67–82, 1995.

[9] S. M. Lucas, "ICDAR 2005 text locating competition results," in Proc. 8th Int. Conf. Document Analysis and Recognition (ICDAR'05), Seoul, South Korea, 2005, pp. 80–84.

[10] Y. X. Liu, S. Goto, and T. Ikenaga, "A contour-based robust algorithm for text detection in color images," IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1221–1230, 2006.

[11] Y. Zhong, K. Karu, and A. K. Jain, "Locating text in complex color images," Pattern Recognition, vol. 28, no. 10, pp. 1523–1535, 1995.

[12] H. P. Li, D. Doermann, and O. Kia, "Automatic text detection and tracking in digital video," IEEE Trans. Image Process., vol. 9, no. 1, pp. 147–156, Jan. 2000.

[13] Y.-F. Pan, X. W. Hou, and C.-L. Liu, "Text localization in natural scene images based on conditional random field," in Proc. 10th Int. Conf. Document Analysis and Recognition (ICDAR'09), Barcelona, Spain, 2009, pp. 6–10.

[14] S. Shetty, H. Srinivasan, M. Beal, and S. Srihari, "Segmentation and labeling of documents using conditional random fields," in Proc. Document Recognition and Retrieval XIV, Proc. SPIE, San Jose, CA, Jan. 2007, pp. 6500U-1–11.

[15] K. Jung, K. I. Kim, and A. K. Jain, "Text information extraction in images and video: A survey," Pattern Recognition, vol. 37, no. 5, pp. 977–997, 2004.

[16] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. 18th Int. Conf. Machine Learning (ICML'01), San Francisco, CA, 2001, pp. 282–289.

AUTHORS' PROFILES

Palash M. Gourshettiwar
Assistant Professor (M.E.)
Department of Computer Science & Engineering
D.M.I.E.T.R, Sawangi (M), Wardha
R.T.M.N.U, Nagpur University
Email: [email protected]

Sunny G. Gandhi
Assistant Professor (M.Tech. pursuing)
Department of Information Technology
D.M.I.E.T.R, Sawangi (M), Wardha
R.T.M.N.U, Nagpur University
Email: [email protected]