Research Article
Predicted Anchor Region Proposal with Balanced Feature Pyramid for License Plate Detection in Traffic Scene Images

Hoanh Nguyen

Faculty of Electrical Engineering Technology, Industrial University of Ho Chi Minh City, Ho Chi Minh City, Vietnam

Correspondence should be addressed to Hoanh Nguyen: nguyenhoanh@iuh.edu.vn

Received 26 December 2019; Revised 12 May 2020; Accepted 26 May 2020; Published 16 June 2020

Academic Editor: Hassan Zargarzadeh

Copyright © 2020 Hoanh Nguyen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

License plate detection is a key problem in intelligent transportation systems. Recently, many deep learning-based networks have been proposed and achieved remarkable success in general object detection, such as faster R-CNN, SSD, and R-FCN. However, directly applying these general object detection networks to license plate detection without modification may not achieve good enough performance. This paper proposes a novel deep learning-based framework for license plate detection in traffic scene images based on a predicted anchor region proposal and a balanced feature pyramid. In the proposed framework, the ResNet-34 architecture is first adopted to generate the base convolution feature maps. A balanced feature pyramid generation module is then used to generate a balanced feature pyramid, in which each feature level obtains equal information from the other feature levels. Furthermore, this paper designs a multiscale region proposal network with a novel predicted location anchor scheme to generate high-quality proposals. Finally, a detection network, which includes a region of interest pooling layer and fully connected layers, is adopted to further classify and regress the coordinates of detected license plates. Experimental results on public datasets show that the proposed approach achieves better detection performance compared with other state-of-the-art methods on license plate detection.

1. Introduction

License plate recognition plays an important role in intelligent transport systems, traffic control, vehicle parking, traffic management, and many other fields. A license plate recognition system includes two stages: license plate detection and license plate recognition. License plate detection locates license plates exactly in the image, while license plate recognition segments and identifies each character on the detected license plate. License plate detection plays a crucial role in the performance of the whole system because exactly locating the license plate increases the accuracy of the recognition stage. Thus, many approaches have been proposed for license plate detection. Previous approaches can be divided into two groups: traditional approaches and deep learning-based approaches. Traditional approaches use handcrafted features such as colour, edge, character, and texture to locate the license plate in the image. Traditional approaches work well under controlled conditions. However, in difficult conditions such as distortion, blurring, and complex backgrounds, the performance of these approaches is still limited.

Recently, deep learning-based object detectors such as faster R-CNN [1], SSD [2], and YOLOv3 [3] have achieved significant improvements on general object detection compared with traditional frameworks. However, these object detectors either rely on a single-scale feature map for detecting objects of different scales or use multiscale feature maps of the base network with little semantic information, thus limiting the detection performance on multiscale objects and on objects in difficult conditions. To further improve the detection performance, many frameworks which improve the semantic information at each feature map, such as FPN [4], RetinaNet [5], MS-CNN [6], and DSSD [7], have been proposed and achieved better detection results compared with the baseline frameworks. However, these deep networks have not yet been studied for license plate detection.


Motivated by the above research ideas, this paper proposes a novel deep learning-based framework based on faster R-CNN for license plate detection. In the first stage, a balanced feature pyramid generation module is used to generate a balanced feature pyramid, in which each feature level obtains equal information from the other feature levels, thus enhancing the semantic information of each feature map. Furthermore, a novel multiscale region proposal network with a predicted location anchor scheme is designed to generate good proposals. In the second stage, a detection network which includes a region of interest pooling layer and fully connected layers is used to further classify and regress the bounding boxes of detected license plates. Experimental results on public datasets show that the proposed framework achieves better detection accuracy than state-of-the-art methods on license plate detection. The main contributions of this paper can be summarized as follows:

(i) This paper proposes a balanced feature pyramid generation module to generate balanced feature maps, in which each integrated feature level possesses balanced information from each resolution, thus enhancing the semantic information of each feature map. In addition, this paper adopts ResNet-34 as the base network for generating the base feature maps, which improves the detection performance compared with the VGG architecture.

(ii) For generating proposals, this paper proposes a novel anchor generation scheme based on the guided anchoring scheme for generating high-quality proposals. This scheme is integrated into each branch of the multiscale region proposal network to obtain a set of high-quality proposals.

(iii) Using a two-stage strategy with balanced feature maps and a multiscale region proposal network, the proposed approach is evaluated on public datasets and obtains better detection accuracy than other state-of-the-art methods on license plate detection.

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 details the proposed framework. Section 4 provides the experimental results and a comparison between the proposed method and other methods on public datasets. Finally, the conclusions and future work are drawn in Section 5.

2. Related Work

2.1. Deep Learning-Based Object Detection. With the fast development of deep learning, many deep learning-based object detectors have been proposed and achieved significant improvements compared with traditional methods. Those deep learning-based object detectors can be divided into two groups: two-stage frameworks such as fast R-CNN [8], faster R-CNN [1], and R-FCN [9], and one-stage frameworks such as SSD [2], YOLO [10], and YOLOv3 [3]. Faster R-CNN introduced the Region Proposal Network (RPN), a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. R-FCN proposed a fully convolutional network with almost all computation shared on the entire image. Position-sensitive score maps are established to address the dilemma between translation invariance in image classification and translation variance in object detection. SSD proposed a one-stage network that predicts category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps at different scales. YOLOv3 proposed an improved framework which detects objects at different feature maps to increase the detection accuracy of small-scale objects. The abovementioned object detection frameworks achieve better accuracy compared with traditional frameworks. However, they use a single-scale feature map for detecting objects of different scales or use multiscale feature maps with little semantic information from the base convolution layers, thus limiting the detection performance on multiscale objects or objects in difficult conditions.

Recently, many enhanced frameworks have been proposed to improve the detection performance by using different enhanced feature maps at different scales, such as FPN [4], RetinaNet [5], MS-CNN [6], DSSD [7], and ION [11]. FPN proposed to augment a standard convolutional network with a top-down pathway and lateral connections so that the network efficiently constructs a rich multiscale feature pyramid from a single-resolution input image. Each level of the pyramid can be used for detecting objects at a different scale. RetinaNet proposed the novel Focal Loss function, which focuses training on a sparse set of hard examples to address the class imbalance issue. MS-CNN consists of a proposal subnetwork and a detection subnetwork for detecting objects at multiple output layers. DSSD introduced a deconvolution module to generate enhanced feature maps from input feature maps and improves the detection performance on small objects. ION presented an object detector that exploits information both inside and outside the region of interest.

2.2. License Plate Detection. Previous approaches on license plate detection can be divided into two groups: traditional approaches and deep learning-based approaches. Traditional approaches for license plate detection are usually based on handcrafted features of the license plate, such as colour, edge, texture, and character, to locate the license plate in the image. Raghunandan et al. [12] proposed a new Riesz fractional model to improve low-quality license plate images affected by multiple factors. A modified MSER algorithm is then used for character candidate detection. Yuan et al. [13] proposed a novel image downscaling method for license plate detection, which can substantially reduce the size of the image without sacrificing detection performance. In addition, a novel line density filter is designed for extracting license plate candidates. Gou et al. [14] used morphological operations, various filters, different contours, and validations for coarse license plate detection. Then, character-specific ERs are selected as character regions through a Real AdaBoost classifier with decision trees. Ashtari et al. [15] proposed a vehicle license plate recognition system based on a modified template-matching technique and the analysis of target colour pixels to detect the location of a license plate, along with a hybrid classifier that recognizes license plate characters.

With the development of deep learning, many methods for license plate detection based on deep learning have recently been proposed. Kim et al. [16] used the faster R-CNN framework to locate vehicle regions. Then, a hierarchical sampling method is used for generating license plate candidates from the vehicle regions. Bulan et al. [17] proposed a weak sparse network of winnows classifier trained with successive mean quantization transform features to extract candidate regions and a strong readable/unreadable CNN classifier to classify those candidate regions. Xie et al. [18] proposed a preprocessing algorithm to strengthen the contrast ratio of the original car image in the first stage. In the second stage, the integral projection method is used to verify the true plate. Finally, a new feature extraction model is designed to complete accurate recognition of the license plate characters. Zou et al. [19] proposed to use a shallow CNN to quickly remove most of the background regions to reduce the computation cost. A deep CNN is then used to detect the license plate in the remaining regions. Xie et al. [20] introduced a new MD-YOLO model for multidirectional car license plate detection. The proposed model could elegantly solve the problem of multidirectional car license plate detection and could also be deployed easily in real-time circumstances because of its reduced computational complexity compared with previous CNN-based methods. Han et al. [21] proposed novel and effective strategies to tightly enclose multioriented license plates with bounding parallelograms and to detect license plates at multiple scales. The proposed method outperformed existing approaches in terms of detecting license plates with different orientations and multiple scales.

3. Methodology

Figure 1 illustrates the overall architecture of the proposed framework. The proposed framework is based on faster R-CNN [1], a popular two-stage general object detector. As shown in Figure 1, a base network based on the ResNet-34 [22] architecture is first adopted to generate the base convolution feature maps. An enhanced feature pyramid is then generated from the base feature maps as in FPN [4]. To balance the semantic features of low-level and high-level information in each level of the enhanced feature pyramid, a balanced feature pyramid generation module is added to generate the balanced feature pyramid. In the multilevel region proposal network (RPN), a novel predicted location anchor scheme is designed to generate high-quality proposals. Finally, a detection network is used to further classify and regress the bounding boxes of detected license plates. Details of each module are explained in the next sections.

3.1. Balanced Feature Pyramid Generation Module. The original faster R-CNN uses VGG-16 [23] as the base network. Ren et al. [1] showed that most of the forward time is spent on the base network. Thus, using a faster base network can greatly improve the inference speed of the whole network. ResNet is an efficient architecture which presented a residual learning framework to ease the training of networks that are substantially deeper than previous networks. In [22], ResNet-34 achieved nearly the same performance as ResNet-50 and ResNet-101 while being faster and simpler. Thus, this paper adopts the ResNet-34 architecture as the base network to generate the initial convolution feature maps. Compared with VGG-16, ResNet-34 is both more accurate and faster [24]. The architecture of ResNet-34 for ImageNet [25] is shown in Table 1.

As in [26], higher-level features in deeper convolutional layers contain more semantic representation, while lower-level features in shallower layers better describe the characteristics of small-scale objects. However, shallow feature maps from the low layers of the feature pyramid inherently lack the fine semantic information needed for object recognition. Recently, many feature combination methods based on lateral connections, such as FPN [4] and RetinaNet [5], have improved the performance of object detection over faster R-CNN and SSD. However, Pang et al. [27] showed that the sequential manner of the above integration methods makes the integrated features focus more on adjacent resolutions and less on other resolutions. Balanced integrated features, which possess balanced information from each resolution, can significantly improve the detection performance. Thus, this paper proposes a balanced feature map generation module for generating balanced feature maps. The proposed module is based on FPN [4]. Figure 2 illustrates the architecture of the balanced feature map generation module.

First, let C2, C3, C4, and C5 represent the outputs of the last residual blocks of the conv2, conv3, conv4, and conv5 blocks of ResNet-34. The strides of these outputs are 4, 8, 16, and 32 pixels, respectively, with respect to the input image. Following [4], a 1×1 convolutional layer is added on each output feature map of the base network to reduce the channel depth. In the top-down path, coarser-resolution feature maps are upsampled by a factor of 2 using the nearest neighbor upsampling operation. These upsampled features are then merged with the corresponding output feature maps of the base ResNet-34 by elementwise addition. Finally, to reduce the aliasing effect of upsampling, a 3×3 convolution layer is added on each merged feature map (except for M5) to generate the multiscale feature pyramid, denoted as {P2, P3, P4, P5}, which can be used for detecting objects at different scales.
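To make this top-down construction concrete, the following is a minimal TensorFlow/Keras sketch of the FPN-style step described above, assuming C2-C5 tensors from a ResNet-34 backbone are already available and using 256 lateral channels (a common FPN choice; the channel depth is not stated in the text):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_fpn_levels(c2, c3, c4, c5, channels=256):
    """Build P2-P5 from backbone outputs C2-C5 (FPN-style top-down path)."""
    # 1x1 lateral convolutions reduce each backbone output to the same depth.
    m5 = layers.Conv2D(channels, 1, padding="same")(c5)
    m4 = layers.Conv2D(channels, 1, padding="same")(c4)
    m3 = layers.Conv2D(channels, 1, padding="same")(c3)
    m2 = layers.Conv2D(channels, 1, padding="same")(c2)

    # Top-down path: upsample by 2 (nearest neighbor) and merge by elementwise addition.
    m4 = layers.Add()([m4, layers.UpSampling2D(2, interpolation="nearest")(m5)])
    m3 = layers.Add()([m3, layers.UpSampling2D(2, interpolation="nearest")(m4)])
    m2 = layers.Add()([m2, layers.UpSampling2D(2, interpolation="nearest")(m3)])

    # 3x3 convolutions smooth each merged map; M5 is used directly as P5.
    p5 = m5
    p4 = layers.Conv2D(channels, 3, padding="same")(m4)
    p3 = layers.Conv2D(channels, 3, padding="same")(m3)
    p2 = layers.Conv2D(channels, 3, padding="same")(m2)
    return p2, p3, p4, p5
```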

Next, to integrate the multilevel features and preserve their semantic hierarchy, interpolation and max pooling operations are used on P5 and on {P2, P3}, respectively, to resize {P2, P3, P5} to the same size as P4. Because the features from different convolution layers have different scales of values, directly integrating them would lead to domination by the larger values. Thus, this paper adds an L2 normalization layer on each of the rescaled features to keep the feature values from different convolution layers on the same scale. The L2 normalization of a vector y = {y1, y2, ..., yc} is defined as follows:

\hat{y} = \frac{y}{\|y\|_2}, \qquad \|y\|_2 = \left( \sum_{i=1}^{c} |y_i|^2 \right)^{1/2}, \qquad (1)

where \hat{y} represents the normalized vector, \|y\|_2 represents the L2 norm of y, and c represents the number of channels.

Based on the rescaled feature maps, the balanced semantic feature map is generated as follows:

S = \frac{1}{n} \sum_{k=k_{\min}}^{k_{\max}} P'_k, \qquad (2)

where n represents the number of rescaled features (n = 4 in this paper), k_min and k_max represent the indexes of the lowest and the highest levels in the rescaled features, and P'_k represents the rescaled feature map at resolution level k.

Finally, the balanced feature maps {F2, F3, F4, F5} are obtained by rescaling the semantic feature map in a reverse procedure. More specifically, F5 is obtained by applying a max pooling operation to the balanced semantic feature map, and F2 and F3 are obtained by applying an interpolation operation to the balanced semantic feature map. With the proposed balanced feature pyramid generation module, each resolution in the final feature pyramid obtains equal information from the other resolutions, thus balancing the information flow and making the features more discriminative.

3.2. Multiscale Region Proposal Network with Predicted Location Anchor. In faster R-CNN, the RPN generates a set of anchor boxes at each location of the last convolution layer of the base network. The RPN then classifies these anchor boxes into object/background classes and regresses the coordinates of these anchor boxes. There are 9 anchor boxes in total at each location of the feature map in the original faster R-CNN framework. Each anchor box is associated with predefined scales and aspect ratios.

Wang et al. [28] showed that the uniform anchoring scheme in faster R-CNN can lead to significant computational cost because many anchor boxes are generated in regions where the objects of interest are unlikely to exist. In addition, a good anchor box setting is needed for different problems to improve performance. Thus, this paper proposes a novel anchor generation scheme based on the guided anchoring scheme [28] for generating anchor boxes.

Figure 3 illustrates the difference between the RPN in faster R-CNN and the proposed RPN with the novel anchor generation scheme. In the proposed RPN, this paper first applies a 1×1 convolution layer at each feature scale of the balanced feature maps to create an objectness score map. An elementwise sigmoid function is then adopted to convert the objectness score map into a probability map. Based on the probability map at each scale, the positive regions where a license plate candidate may possibly exist can be determined by selecting those locations whose corresponding probability values are above a predefined threshold. As in [28], this predefined threshold is set to 0.01 in this paper.

Figure 1: Overall pipeline of the proposed approach (input image → base network → enhanced feature maps → balanced feature pyramid generation module → RPN with predicted location anchor → RoI pooling → FC layers → classification and regression in the detection network).

Table 1: The architecture of ResNet-34 for ImageNet.

Layer name | Kernel size | Output size
Conv1 | 7×7, 64, stride 2 | 112×112
Conv2 | 3×3 max pool, stride 2; [3×3, 64; 3×3, 64] × 3 | 56×56
Conv3 | [3×3, 128; 3×3, 128] × 4 | 28×28
Conv4 | [3×3, 256; 3×3, 256] × 6 | 14×14
Conv5 | [3×3, 512; 3×3, 512] × 3 | 7×7

Downsampling is performed by conv3-1, conv4-1, and conv5-1 with a stride of 2.


Finally, a 3×3 convolution filter is applied across each sliding position on the input feature map. At each position on the input feature map corresponding to positive regions on the probability map, the local features are extracted and concatenated along the channel axis to form a 256-d feature vector, which is then fed into two separate fully convolutional layers for license plate/background classification and box regression. The probability map can eliminate almost all negative regions while still maintaining the same recall. Figure 4 shows example results of the original RPN and the proposed RPN. Because the proposed RPN slides over all positive locations in all balanced pyramid levels, it is not necessary to have multiscale anchors on a specific level. Instead, this paper assigns anchors of a single scale to each level of the balanced pyramid according to the size and aspect ratio of license plates in the dataset summary table (Table 2). More specifically, this paper defines the anchors to have heights of 5, 10, 15, and 20 pixels with an aspect ratio (width/height) of 5 on F2, F3, F4, and F5, respectively.
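The following is a minimal sketch of this predicted location anchor head for one pyramid level. The 0.01 threshold and the single anchor shape per level follow the text; the 256-channel shared convolution and all layer names are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def predicted_anchor_rpn_level(feature, anchor_h, aspect_ratio=5.0, thresh=0.01):
    """RPN head with anchor location prediction for one balanced pyramid level."""
    # Anchor location branch: 1x1 conv -> objectness score map -> sigmoid probability map.
    score_map = layers.Conv2D(1, 1, padding="same")(feature)
    prob_map = tf.sigmoid(score_map)

    # Locations whose probability exceeds the threshold are treated as positive regions;
    # only these positions are used to generate proposals.
    positive_coords = tf.where(prob_map[..., 0] > thresh)

    # Shared 3x3 conv over the feature map, then 1x1 heads for classification and regression.
    shared = layers.Conv2D(256, 3, padding="same", activation="relu")(feature)
    cls_logits = layers.Conv2D(1, 1, padding="same")(shared)   # one anchor per location
    box_deltas = layers.Conv2D(4, 1, padding="same")(shared)

    # A single anchor shape per level, e.g. height 5 px and width 25 px on F2.
    anchor_w = anchor_h * aspect_ratio
    return prob_map, positive_coords, cls_logits, box_deltas, (anchor_w, anchor_h)
```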

3.3. Detection Network. Although the proposed multiscale RPN could work as a detector by itself, it is not strong, since its sliding windows do not cover objects well. To increase detection accuracy, a license plate detection network is added. The license plate detection network is used to classify the proposals generated by the proposed RPN into license plate and background classes and to further refine the coordinates of the detected license plates. The license plate detection network has a region of interest (RoI) pooling layer and two fully connected (FC) layers, as shown in Figure 1.

Based on the proposals generated by the multiscale RPN, the RoI pooling layer is used to extract fixed-size feature patches from the balanced feature pyramid. As in [4], this paper selects the balanced feature map layer at the most appropriate scale to extract the feature patches based on the size of each proposal. More specifically, a proposal of width w and height h is assigned to the level Fk of the proposed feature pyramid, with k calculated by the following formula:

k = k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right), \qquad (3)

where 224 is the canonical ImageNet pretraining size and k_0 is the target level onto which a proposal with w × h = 224 × 224 should be mapped. This paper sets k_0 = 4, as in [4].
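For example, under this level-assignment formula (a minimal sketch; the rounding and clamping policy is an assumption, as the text does not state it):

```python
import math

def assign_pyramid_level(w, h, k0=4, k_min=2, k_max=5):
    """Map a proposal of size w x h to a balanced pyramid level Fk (equation (3))."""
    k = k0 + math.log2(math.sqrt(w * h) / 224.0)
    # Assumed: round to the nearest integer level and clamp to the available levels F2-F5.
    return max(k_min, min(k_max, round(k)))

# A 150 x 30 px license plate proposal maps to a finer level than a 224 x 224 one.
print(assign_pyramid_level(150, 30))   # -> 2 (small plates use the high-resolution F2)
print(assign_pyramid_level(224, 224))  # -> 4 (the canonical size maps to k0, i.e., F4)
```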

The fixed-size patches are then flattened into a vector and passed through two 1024-d FC layers followed by ReLU. The encoded features are then fed into two separate linear transformation layers: a license plate classification layer and a bounding box regression layer. The license plate classification layer has two outputs, which indicate the softmax probability of each proposal being license plate or background. The license plate regression layer produces the bounding box coordinate offsets for each proposal.
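A minimal Keras sketch of this detection head follows. Only the two 1024-d FC layers, the two-way softmax, and the 4-d box regression come from the text; the RoI feature size and layer names are assumptions:

```python
from tensorflow.keras import layers, Model

def build_detection_head(roi_size=7, channels=256):
    """Classify each RoI as license plate / background and regress its box offsets."""
    roi_features = layers.Input(shape=(roi_size, roi_size, channels))

    x = layers.Flatten()(roi_features)
    x = layers.Dense(1024, activation="relu")(x)   # first 1024-d FC layer
    x = layers.Dense(1024, activation="relu")(x)   # second 1024-d FC layer

    cls_scores = layers.Dense(2, activation="softmax", name="cls")(x)  # plate vs. background
    box_offsets = layers.Dense(4, name="reg")(x)                       # (dx, dy, dw, dh)
    return Model(roi_features, [cls_scores, box_offsets])

head = build_detection_head()
head.summary()
```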

3.4. Loss Function. The proposed framework is trained in an end-to-end fashion using a multitask loss function. Besides the conventional classification loss L_cls and regression loss L_reg, this paper adds an additional loss function L_loc for the anchor box location prediction.

Figure 2: The architecture of the balanced feature map generation module (C2-C5 from the base network pass through 1×1 convolutions and a top-down path with ×2 upsampling to form M2-M5, followed by 3×3 convolutions to form P2-P5; the rescaled and L2-normalized levels are averaged into the semantic feature map and redistributed as the balanced feature maps F2-F5).


Thus, the multitask loss function is defined as follows:

L = \sum L_{cls} + \sum L_{reg} + L_{loc}. \qquad (4)

In (4), the binary logistic loss is used for box classification, and the smooth L1 loss [1] is adopted for box regression. For training the anchor box location prediction branch in the proposed RPN, this paper follows the training scheme designed in [28]. More specifically, this paper denotes the ground-truth bounding box as (x_gt, y_gt, w_gt, h_gt), where (x_gt, y_gt) represents the center coordinates and (w_gt, h_gt) represents the size of the ground-truth bounding box. The ground-truth bounding box is mapped to the corresponding balanced feature map scale to obtain (x'_gt, y'_gt, w'_gt, h'_gt). Based on the obtained bounding box, the center box (CB), ignore box (IB), and outside box (OB) are defined as follows:

CB = (x'_{gt}, y'_{gt}, z_1 w'_{gt}, z_1 h'_{gt}), \qquad (5)

IB = (x'_{gt}, y'_{gt}, z_2 w'_{gt}, z_2 h'_{gt}) - CB, \qquad (6)

OB = (x'_{gt}, y'_{gt}, w'_{gt}, h'_{gt}) - CB - IB, \qquad (7)

Figure 3: The architecture of the original RPN (a) and the proposed RPN with predicted location anchor (b). The multilayer RPN is used in this paper. (In (b), a 1×1 convolution produces the objectness score map, an elementwise sigmoid converts it to the probability map, and a shared 3×3 convolution feeds the 1×1 classification and regression heads.)

Figure 4: Example results of the original RPN (a) and the proposed RPN (b).

Table 2: Dataset summary.

Dataset | Subset | Number of images | Number of license plates | Image resolution | License plate height (pixels)
PKU vehicle dataset | G1 | 810 | 810 | 1082×728 | 35–57
PKU vehicle dataset | G2 | 700 | 700 | 1082×728 | 30–62
PKU vehicle dataset | G3 | 743 | 743 | 1082×728 | 29–53
PKU vehicle dataset | G4 | 572 | 572 | 1600×1236 | 30–58
PKU vehicle dataset | G5 | 1152 | 1438 | 1600×1200 | 20–60
AOLP dataset | AC | 681 | 681 | 352×240 | 25–70
AOLP dataset | LE | 757 | 757 | 640×480 | 28–80
AOLP dataset | RP | 611 | 611 | 320×240 | 30–70


where z_2 > z_1. Pixels inside CB are assigned as positive locations, while pixels inside OB are assigned as negative locations; pixels inside IB are discarded from the training samples. In the end, for each image in the training set, a binary label map where 1 represents a positive location and 0 represents a negative location is generated for training the anchor box location prediction branch. Note that each level of the balanced feature map should only be assigned objects of a specific scale range, so CB is only assigned on the feature map that matches the scale range of the targeted object. The same regions of adjacent levels in the balanced feature pyramid are set as IB. Finally, the focal loss function [5] is adopted to train the anchor box location prediction branch to address the sample-level imbalance problem.
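To illustrate how such a location label map could be built for one feature level, here is a minimal NumPy sketch. The values z1 = 0.2 and z2 = 0.5 are assumptions for illustration (the paper does not give them), and every location outside CB and IB is treated as negative, a simplification of equation (7); the ground-truth box is assumed to be already mapped to feature-map coordinates:

```python
import numpy as np

def anchor_location_targets(h, w, gt_box, z1=0.2, z2=0.5):
    """Build the CB/IB/OB masks for one balanced feature level.

    gt_box = (xc, yc, bw, bh): ground-truth center and size in feature-map pixels.
    Returns labels (1 = positive inside CB, 0 = negative elsewhere) and an
    ignore mask (1 = inside IB, excluded from the location loss).
    """
    xc, yc, bw, bh = gt_box
    ys, xs = np.mgrid[0:h, 0:w]

    def inside(scale):
        # True for pixels inside the box centered at (xc, yc) scaled by `scale`.
        return (np.abs(xs - xc) <= scale * bw / 2) & (np.abs(ys - yc) <= scale * bh / 2)

    cb = inside(z1)                 # center box: positive locations
    ib = inside(z2) & ~cb           # ignore box: ring around CB, discarded in training
    labels = cb.astype(np.float32)  # everything else is treated as negative (label 0)
    ignore = ib.astype(np.float32)
    return labels, ignore

labels, ignore = anchor_location_targets(32, 32, gt_box=(16, 16, 10, 4))
print(labels.sum(), ignore.sum())   # number of positive and ignored locations
```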

4. Results and Discussion

In order to compare the effectiveness of the proposed approach with other state-of-the-art approaches on license plate detection, this paper conducts experiments on two public datasets: the PKU vehicle dataset [13] and the Application-Oriented License Plate (AOLP) dataset [29]. The proposed approach is implemented on a Windows machine with an Intel Core i7 8700 CPU, an NVIDIA GeForce GTX 1080 GPU, and 16 GB of RAM. TensorFlow is adopted for implementing the deep CNN frameworks.

4.1. Dataset and Evaluation Metric. Two public license plate datasets are adopted to evaluate the performance of the proposed method in this paper: the PKU vehicle dataset [13] and the AOLP dataset [29].

The PKU vehicle dataset includes 3828 vehicle images captured from various scenes under diverse environmental conditions. The images in this dataset are divided into five groups (G1-G5) corresponding to different configurations. More specifically, all images in the G1, G2, and G3 groups were taken on highways, while images in the G4 group were taken on city roads and images in the G5 group were taken at intersections with crosswalks. The images in the G4 group were captured during nighttime, while the images in the other groups were captured during daytime. There is one Chinese license plate in each image of the G1-G4 groups, while multiple Chinese license plates are captured in each image of the G5 group. For training the proposed network, this paper adopts the CarFlag-Large dataset [30], which contains 460000 images with Chinese license plates.

The AOLP dataset includes 2049 images of Taiwan license plates captured at various locations, times, traffic, and weather conditions. This dataset is categorized into three subsets: access control (AC) with 681 images, traffic law enforcement (LE) with 757 images, and road patrol (RP) with 611 images. AC refers to the cases in which a vehicle passes a fixed passage at a reduced speed or with a full stop. LE refers to the cases in which a vehicle violates traffic laws and is captured by a roadside camera. RP refers to the cases in which the camera is installed or handheld on a patrolling vehicle, which takes images of vehicles with arbitrary viewpoints and distances. Each image contains one license plate. Since there is no standard split for the AOLP dataset, this paper follows the same strategy as in [30] for training the proposed network. More specifically, this paper uses images from the different subsets for training and testing separately. In addition, data augmentation is conducted by rotation and affine transformation to increase the number of training images. In this paper, the PKU vehicle dataset and the AOLP dataset are adopted to evaluate the performance of the proposed approach and to compare the detection results with those of other state-of-the-art approaches. Table 2 shows detailed descriptions of each dataset used in this paper.
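As an illustration of the rotation and affine augmentation mentioned above, a sketch using OpenCV follows; the rotation range and shear magnitude are assumptions, since the paper does not specify them, and in a real pipeline the license plate box would be transformed with the same matrices:

```python
import cv2
import numpy as np

def augment(image, max_angle=10.0, max_shear=0.1):
    """Randomly rotate and shear a training image."""
    h, w = image.shape[:2]

    # Random rotation about the image center.
    angle = np.random.uniform(-max_angle, max_angle)
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, rot, (w, h), borderMode=cv2.BORDER_REPLICATE)

    # Random horizontal shear as a simple affine transformation.
    shear = np.random.uniform(-max_shear, max_shear)
    aff = np.float32([[1, shear, 0], [0, 1, 0]])
    return cv2.warpAffine(image, aff, (w, h), borderMode=cv2.BORDER_REPLICATE)

augmented = augment(np.zeros((240, 320, 3), dtype=np.uint8))  # placeholder image
```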

For the evaluation metric, this paper follows the criterion used in [13] to evaluate the performance of the proposed method and other methods on the PKU vehicle dataset and the AOLP dataset. More specifically, a detection is considered correct if the license plate is totally encompassed by the bounding box and the IoU between the detected license plate and the ground-truth license plate is at least 0.5.
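A small helper expressing this criterion (a sketch; boxes are given as (x1, y1, x2, y2) in pixels):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def is_correct_detection(det_box, gt_plate):
    """Detection counts as correct if it fully encloses the plate and IoU >= 0.5."""
    encloses = (det_box[0] <= gt_plate[0] and det_box[1] <= gt_plate[1] and
                det_box[2] >= gt_plate[2] and det_box[3] >= gt_plate[3])
    return encloses and iou(det_box, gt_plate) >= 0.5
```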

4.2. Experimental Results on the PKU Vehicle Dataset. In order to show the effectiveness of the proposed approach, this paper compares the performance of the proposed method with the results of state-of-the-art license plate detection methods on the PKU vehicle dataset, including the methods proposed by Zhou et al. [31], Li et al. [32], Yuan et al. [13], and Li et al. [30]. Zhou et al. [31] proposed to discover the principal visual word characterized with geometric context for each license plate character. Given a new license plate image, the license plates are extracted by matching local features with the principal visual words. Li et al. [32] used a maximally stable extremal region detector to extract candidate characters in images. The exact bounding boxes of license plates are estimated through belief propagation inference on a conditional random field constructed on the candidate characters in neighborhoods. Yuan et al. [13] proposed a novel line density filter approach to extract license plate candidate regions, and a cascaded license plate classifier based on linear support vector machines using colour saliency features is designed to identify the true license plate from among the candidate regions. Li et al. [30] proposed an approach to address both detection and recognition of license plates using a single deep neural network.

Table 3 shows the comparison of detection results on the PKU vehicle dataset. As shown in Table 3, the proposed approach achieves the best detection accuracy on the PKU vehicle dataset. More specifically, in terms of average detection performance, the proposed method improves by 9.53%, 8.23%, 2.06%, and 0.24% over the methods proposed by Zhou et al. [31], Li et al. [32], Yuan et al. [13], and Li et al. [30], respectively. It should be noted that the proposed method surpasses the best reference method, proposed by Li et al. [30], by a significant margin on the G5 group. Images in the G5 group contain multiple license plates in difficult conditions, such as a large variance of scales, reflective glare, and blur, and are affected by defects. This result shows the strong ability of the proposed framework to detect license plates in difficult conditions with a large variance of scales. Figure 5(a) shows some examples of the detection results of the proposed method on the PKU vehicle dataset. As shown in Figure 5(a), the proposed algorithm effectively detects license plates with different scales under different situations.

4.3. Experimental Results on the AOLP Dataset. To further evaluate the effectiveness of the proposed framework, the performance of the proposed approach is tested on the AOLP dataset. Table 4 shows the comparison of the detection results of the proposed method and the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30]. The experimental results in Table 4 show that the proposed method achieves the best detection ratio on all three subsets compared with the previous methods. More specifically, in terms of average detection, the proposed method improves by 4.42%, 2.23%, and 0.62% over the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30], respectively. The experimental results demonstrate that the proposed balanced feature pyramid and predicted location anchor can effectively enhance the feature representation power and boost the performance of license plate detection in difficult conditions. Figure 5(b) shows some examples of the detection results of the proposed method on the AOLP dataset. As can be observed, the proposed method can accurately locate small license plates as well as medium or large ones.

4.4. Ablation Experiments. To evaluate the effectiveness of each module in the proposed approach, this paper conducts several experiments on the Chinese City Parking Dataset (CCPD) [34] and compares the detection results with the results of the original faster R-CNN [1] and the FPN with faster R-CNN baseline [4] framework. The CCPD dataset is a large, publicly available, labeled license plate dataset. It contains 250k independent license plate images under diverse illuminations, environments, and backgrounds. Each image has a resolution of 720×1160 and contains one license plate. All images containing license plates are divided into 8 groups based on different conditions: CCPD-Base with 200k images, CCPD-FN with 20k images, CCPD-DB with 20k images, CCPD-Rotate with 10k images, CCPD-Tilt with 10k images, CCPD-Weather with 10k images, CCPD-Challenge with 10k images, and CCPD-Blur with 5k images. As in [34], this paper adopts 100k images in the CCPD-Base subset to train the proposed network and then evaluates the results on CCPD-Base, CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge.

In the first experiment, this paper replaces the VGG-16 network in the original faster R-CNN with the proposed balanced feature pyramid generation module. The RPN is kept unchanged in this experiment. To show the effectiveness of the L2 normalization, the L2 normalization layer in the balanced feature pyramid generation module is discarded in the second experiment. In the third experiment, this paper adds the proposed RPN with the predicted location anchor module to replace the original RPN. VGG-16 is kept unchanged as the base network in this experiment. In the fourth experiment, this paper adds both the proposed RPN with the predicted location anchor module and the proposed balanced feature pyramid generation module with the L2 normalization layer, replacing the original RPN and the VGG-16 architecture.

Table 5 shows the detection results of each experiment on the CCPD dataset. As shown in Table 5, compared with the original RPN in the faster R-CNN framework, the proposed predicted location anchor scheme improves the average detection by 0.2%. By generating good proposals, the features for the detection network become more discriminative, thus improving the detection results. Compared with the feature pyramid in the FPN with faster R-CNN baseline, the proposed balanced pyramid generation module improves the average detection by 1.5%. It should be noted that no parameters are added in the proposed module. With the proposed module, each level in the balanced feature pyramid obtains equal information from the other levels, thus improving the detection performance of the detection network. Compared with faster R-CNN and the FPN with faster R-CNN baseline, the proposed approach improves the average detection by 4.5% and 1.9%, respectively. These comparison results indicate that the proposed framework is superior to both single-scale and multiscale features for a region-based object detector. Furthermore, with the L2 normalization layer added on each of the rescaled features in the balanced feature pyramid generation module, the average detection improves by 0.6% compared with the balanced feature pyramid generation module without L2 normalization. This result shows the effectiveness of the L2 normalization layer, which keeps the feature values from different convolution layers on the same scale.

5. Conclusions and Future Work

This paper proposes a novel deep learning-based framework for license plate detection. In the proposed framework, a balanced feature pyramid generation module based on the ResNet-34 architecture is used to generate an enhanced balanced feature pyramid, in which each feature level obtains equal information from the other feature levels. In addition, a multiscale region proposal network with a predicted location anchor scheme is introduced to generate good proposals from each level of the balanced feature pyramid.

Table 3: Comparison of detection results on the PKU vehicle dataset.

Method | G1 | G2 | G3 | G4 | G5 | Average (detection ratio, %)
Zhou et al. [31] | 95.43 | 97.85 | 94.21 | 81.23 | 82.37 | 90.22
Li et al. [32] | 98.89 | 98.42 | 95.83 | 81.17 | 83.31 | 91.52
Yuan et al. [13] | 98.76 | 98.42 | 97.72 | 96.23 | 97.32 | 97.69
Li et al. [30] | 99.88 | 99.71 | 99.46 | 99.83 | 98.68 | 99.51
Proposed approach | 99.88 | 99.86 | 99.73 | 99.83 | 99.44 | 99.75


With good proposals generated from balanced feature maps, the proposed approach shows significant improvements compared with other approaches on license plate detection. The good performance of the proposed approach on license plate detection has a high reference value in the field of intelligent transport systems. For future work, this paper will explore and compare more feature combination and multiscale detection methods, such as DeepLabv3+ [35] and MOSI-LPD [21].

Figure 5: Examples of detection results of the proposed method on the PKU vehicle dataset (a) and the AOLP dataset (b).


In addition, this paper will adopt the nonlocal module [36] to further refine the balanced semantic features. This step may enhance the integrated features and further improve the detection results.

Data Availability

The codes used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[2] W. Liu, D. Anguelov, D. Erhan et al., "SSD: single shot multibox detector," 2016, https://arxiv.org/abs/1512.02325.

[3] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.

[4] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.

[5] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007, Venice, Italy, October 2017.

[6] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," 2016, https://arxiv.org/abs/1607.07155.

[7] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: deconvolutional single shot detector," 2017, https://arxiv.org/abs/1701.06659.

[8] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 2015.

[9] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems, pp. 379–387, MIT Press, Cambridge, MA, USA, 2016.

[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," 2016, https://arxiv.org/abs/1506.02640.

[11] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2874–2883, Las Vegas, NV, USA, June 2016.

[12] K. S. Raghunandan, P. Shivakumara, H. A. Jalab et al., "Riesz fractional based model for enhancing license plate detection and recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, pp. 2276–2288, 2018.

[13] Y. Yuan, W. Zou, Y. Zhao, X. Wang, X. Hu, and N. Komodakis, "A robust and efficient approach to license plate detection," IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1102–1114, 2017.

[14] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096–1107, 2016.

[15] A. H. Ashtari, M. J. Nordin, and M. Fathy, "An Iranian license plate recognition system based on color features," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 4, pp. 1690–1705, 2014.

[16] S. G. Kim, H. G. Jeon, and H. I. Koo, "Deep-learning-based license plate detection method using vehicle region extraction," Electronics Letters, vol. 53, no. 15, pp. 1034–1036, 2017.

[17] O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve, "Segmentation- and annotation-free license plate recognition with deep localization and failure identification," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 9, pp. 2351–2363, 2017.

[18] F. Xie, M. Zhang, J. Zhao, J. Yang, Y. Liu, and X. Yuan, "A robust license plate detection and character recognition algorithm based on a combined feature extraction model and BPNN," Journal of Advanced Transportation, vol. 2018, Article ID 6737314, 14 pages, 2018.

Table 4: Comparison of detection results on the AOLP dataset.

Method | AC | LE | RP | Average (detection ratio, %)
Hsu et al. [29] | 96.0 | 95.0 | 94.0 | 95.0
Li et al. [33] | 98.38 | 97.62 | 95.58 | 97.19
Li et al. [30] | 99.12 | 99.08 | 98.20 | 98.8
Proposed approach | 99.41 | 99.34 | 99.51 | 99.42

Table 5: Detection results of each network on the CCPD dataset.

Network | Base | DB | FN | Rotate | Tilt | Weather | Challenge | Average (detection performance, %)
Faster R-CNN | 98.1 | 92.1 | 83.7 | 91.8 | 89.4 | 81.1 | 83.9 | 88.6
FPN with faster R-CNN baseline | 99.2 | 95.4 | 87.5 | 93.0 | 91.3 | 85.4 | 86.3 | 91.2
Faster R-CNN + balanced feature pyramid | 99.5 | 96.2 | 89.1 | 93.2 | 91.6 | 88.9 | 90.1 | 92.7
Faster R-CNN + balanced pyramid without L2-norm | 99.3 | 96.2 | 88.9 | 93.1 | 91.0 | 87.6 | 88.3 | 92.1
Faster R-CNN + predicted anchor RPN | 98.5 | 92.6 | 84.0 | 91.5 | 89.4 | 81.4 | 84.4 | 88.8
Faster R-CNN + balanced pyramid with L2-norm + predicted anchor RPN | 99.5 | 96.4 | 90.1 | 93.2 | 91.8 | 89.7 | 91.2 | 93.1

[19] L. Zou, M. Zhao, Z. Gao, M. Cao, H. Jia, and M. Pei, "License plate detection with shallow and deep CNNs in complex environments," Complexity, vol. 2018, Article ID 7984653, 6 pages, 2018.

[20] L. Xie, T. Ahmad, L. Jin, Y. Liu, and S. Zhang, "A new CNN-based method for multi-directional car license plate detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 507–517, 2018.

[21] J. Han, J. Yao, J. Zhao, J. Tu, and Y. Liu, "Multi-oriented and scale-invariant license plate detection based on convolutional neural networks," Sensors, vol. 19, no. 5, p. 1175, 2019.

[22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.

[23] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.

[24] J. Huang, "Speed/accuracy trade-offs for modern convolutional object detectors," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3296-3297, Honolulu, HI, USA, July 2017.

[25] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," 2014, https://arxiv.org/abs/1409.0575.

[26] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, Springer, Berlin, Germany, 2014.

[27] J. Pang, "Libra R-CNN: towards balanced learning for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 2019.

[28] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2965–2974, Long Beach, CA, USA, June 2019.

[29] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Transactions on Vehicular Technology, vol. 62, no. 2, pp. 552–561, 2013.

[30] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126–1136, 2019.

[31] W. Zhou, H. Li, Y. Lu, and Q. Tian, "Principal visual word discovery for automatic license plate detection," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4269–4279, 2012.

[32] B. Li, B. Tian, Y. Li, and D. Wen, "Component-based license plate detection using conditional random field model," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1690–1699, 2013.

[33] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, https://arxiv.org/abs/1601.05610.

[34] Z. Xu, W. Yang, A. Meng et al., "Towards end-to-end license plate detection and recognition: a large dataset and baseline," in Computer Vision—ECCV 2018, pp. 261–277, Springer, Berlin, Germany, 2018.

[35] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, Munich, Germany, September 2018.

[36] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, Honolulu, HI, USA, July 2018.

Complexity 11

Page 2: PredictedAnchorRegionProposalwithBalancedFeature ...downloads.hindawi.com › journals › complexity › 2020 › 5137056.pdf · R-CNN for license plate detection. In the first

Motivated by the above research ideas this paper pro-poses a novel deep learning-based framework based on fasterR-CNN for license plate detection In the first stage abalanced feature pyramid generation module is used togenerate balanced feature pyramid of which each featurelevel obtains equal information from other feature levelsthus enhancing the semantic information of each featuremap Furthermore a novel multiscale region proposalnetwork with predicted location anchor scheme is designedto generate good proposals In the second stage a detectionnetwork which includes a region of interest pooling layerand fully connected layers is used to further classify andregress the bounding box of detected license plates Ex-perimental results on public datasets show that the proposedframework achieves better detection accuracy than state-of-the-art methods on license plate detection (e main con-tributions of this paper can be summarized as follows

(i) (is paper proposes a balanced feature pyramidgeneration module to generate balanced featuremaps of which each integrated feature level pos-sesses balanced information from each resolutionthus enhancing the semantic information of eachfeature map In addition this paper adopts ResNet-34 as the base network for generating the basefeature maps which can improve the detectionperformance compared with VGG architecture

(ii) For generating proposals this paper proposes anovel anchor generation scheme based on guidedanchoring scheme for generating high-qualityproposals (is scheme is integrated into eachbranch of the multiscale region proposal network toobtain a set of high-quality proposals

(iii) Using a two-stage strategy with balanced featuremaps and multiscale region proposal network theproposed approach is evaluated on public datasetsand obtains better detection accuracy than otherstate-of-the-art methods on license plate detection

(e remaining of this paper is organized as followsSection 2 reviews the related work Section 3 details theproposed framework Section 4 provides the experimentalresults and comparison between the proposed method andother methods on public datasets Finally the conclusionsand future works are drawn in Section 5

2 Related Work

21 Deep Learning-Based Object Detection With the fastdevelopment of deep learning many deep learning-basedobject detectors have been proposed and achieved significantimprovements compared with traditional methods (osedeep learning-based object detectors can be divided into twogroups two-stage framework such as fast R-CNN [8] fasterR-CNN [1] and R-FCN [9] and one-stage framework suchas SSD [2] YOLO [10] and YOLOv3 [3] Faster R-CNNintroduced Region Proposal Network (RPN) a fully con-volutional network that simultaneously predicts objectbounds and objectness scores at each position (e RPN

shares full-image convolutional features with the detectionnetwork thus enabling nearly cost-free region proposalsR-FCN proposed a fully convolutional networks with almostall computation shared on the entire image (e position-sensitive score maps are established to address a dilemmabetween translation invariance in image classification andtranslation variance in object detection SSD proposed aone-stage network that predicts category scores and boxoffsets for a fixed set of default bounding boxes using smallconvolutional filters applied to different feature maps atdifferent scales YOLOv3 proposed an improved frameworkwhich detects objects at different feature maps to increasethe detection accuracy of small-scale objects (e above-mentioned object detection frameworks achieved betteraccuracy compared with traditional frameworks Howeverthey used single scale feature map for detecting objects withdifferent scales or used multiscale feature map with lesssemantic information from the base convolution layers thuslimiting the detection performance of detecting multiscaleobjects or objects in difficult conditions

Recently many enhanced frameworks have been pro-posed to improve the detection performance by using dif-ferent enhanced feature maps at different scales such as FPN[4] RetinaNet [5] MS-CNN [6] DSSD [7] and ION [11]FPN proposed to augment a standard convolutional networkwith a top-down pathway and lateral connections so thenetwork efficiently constructs a rich multiscale featurepyramid from a single resolution input image Each level ofthe pyramid can be used for detecting objects at a differentscale RetinaNet proposed novel Focal Loss function whichfocuses training on a sparse set of hard examples to addressthe class imbalance issue MS-CNN consists of a proposalsubnetwork and a detection subnetwork for detecting ob-jects at multiple output layers DSSD introduced deconvo-lution module to generate enhanced feature maps frominput feature maps and improves the detection performanceof small objects ION presented an object detector thatexploits information both inside and outside the region ofinterest

2.2. License Plate Detection. Previous approaches to license plate detection can be divided into two groups: traditional approaches and deep learning-based approaches. Traditional approaches are usually based on handcrafted features of the license plate, such as colour, edge, texture, and character, to locate the license plate in the image. Raghunandan et al. [12] proposed a new Riesz fractional model to improve low-quality license plate images affected by multiple factors; a modified MSER algorithm is then used for character candidate detection. Yuan et al. [13] proposed a novel image downscaling method for license plate detection, which can substantially reduce the size of the image without sacrificing detection performance; in addition, a novel line density filter is designed for extracting license plate candidates. Gou et al. [14] used morphological operations, various filters, different contours, and validations to coarsely detect license plates; then, character-specific ERs are selected as character regions through a Real AdaBoost classifier with decision trees. Ashtari et al. [15] proposed a vehicle license plate recognition system based on a modified template-matching technique using the analysis of target colour pixels to detect the location of a license plate, along with a hybrid classifier that recognizes license plate characters.

With the recent development of deep learning, many methods for license plate detection based on deep learning have been proposed. Kim et al. [16] used the faster R-CNN framework to locate vehicle regions; a hierarchical sampling method is then used to generate license plate candidates from the vehicle regions. Bulan et al. [17] proposed a weak sparse network of winnows classifier trained with successive mean quantization transform features to extract candidate regions and a strong readable/unreadable CNN classifier to classify those candidate regions. Xie et al. [18] proposed a preprocessing algorithm to strengthen the contrast ratio of the original car image at the first stage; at the second stage, the integral projection method is used to verify the true plate; finally, a new feature extraction model is designed to complete accurate recognition of the license plate characters. Zou et al. [19] proposed to use a shallow CNN to quickly remove most of the background regions to reduce the computation cost; a deep CNN is then used to detect license plates in the remaining regions. Xie et al. [20] introduced a new MD-YOLO model for multidirectional car license plate detection; the proposed model elegantly solves the problem of multidirectional license plate detection and can also be deployed easily in real-time circumstances because of its reduced computational complexity compared with previous CNN-based methods. Han et al. [21] proposed novel and effective strategies to tightly enclose multioriented license plates with bounding parallelograms and to detect license plates at multiple scales; the proposed method outperformed existing approaches in terms of detecting license plates with different orientations and multiple scales.

3. Methodology

Figure 1 illustrates the overall architecture of the proposed framework. The proposed framework is based on faster R-CNN [1], a popular two-stage general object detector. As shown in Figure 1, a base network based on the ResNet-34 [22] architecture is first adopted to generate the base convolution feature maps. An enhanced feature pyramid is then generated from the base feature maps as in FPN [4]. To balance the semantic features of low-level and high-level information in each level of the enhanced feature pyramid, a balanced feature pyramid generation module is added to generate the balanced feature pyramid. In the multilevel region proposal network (RPN), a novel predicted location anchor scheme is designed to generate high-quality proposals. Finally, a detection network is used to further classify and regress the bounding boxes of detected license plates. Details of each module are explained in the following sections.

3.1. Balanced Feature Pyramid Generation Module. The original faster R-CNN uses VGG-16 [23] as the base network. Ren et al. [1] showed that most of the forward time is spent in the base network; thus, using a faster base network can greatly improve the inference speed of the whole network. ResNet is an efficient architecture which presented a residual learning framework to ease the training of networks that are substantially deeper than previous networks. In [22], ResNet-34 achieved nearly the same performance as ResNet-50 and ResNet-101 while being faster and simpler. Thus, this paper adopts the ResNet-34 architecture as the base network to generate the initial convolution feature maps. Compared with VGG-16, ResNet-34 is both more accurate and faster [24]. The architecture of ResNet-34 for ImageNet [25] is shown in Table 1.

As shown in [26], higher-level features in deeper convolutional layers contain more semantic representation, while lower-level features in shallower layers better describe the characteristics of small-scale objects. However, shallow feature maps from the low layers of the feature pyramid inherently lack the fine semantic information needed for object recognition. Recently, feature combination methods based on lateral connections, such as FPN [4] and RetinaNet [5], have improved the performance of object detection over faster R-CNN and SSD. However, Pang et al. [27] showed that the sequential manner of these integration methods makes the integrated features focus more on adjacent resolutions and less on other resolutions. Balanced integrated features, which possess balanced information from each resolution, can significantly improve detection performance. Thus, this paper proposes a balanced feature map generation module for generating balanced feature maps. The proposed module is based on FPN [4]. Figure 2 illustrates the architecture of the balanced feature map generation module.

First, let C2, C3, C4, and C5 represent the outputs of the last residual blocks of the conv2, conv3, conv4, and conv5 blocks of ResNet-34. The strides of these outputs are 4, 8, 16, and 32 pixels, respectively, with respect to the input image. Following [4], a 1×1 convolutional layer is added on each output feature map of the base network to reduce the channel depth. In the top-down path, coarser-resolution feature maps are upsampled by a factor of 2 using nearest-neighbor upsampling. These upsampled features are then merged with the corresponding output feature maps of the base ResNet-34 by elementwise addition. Finally, to reduce the aliasing effect of upsampling, a 3×3 convolution layer is added on each merged feature map (except for M5) to generate the multiscale feature pyramid, denoted as {P2, P3, P4, P5}, which can be used for detecting objects at different scales.

Next, to integrate the multilevel features and preserve their semantic hierarchy, interpolation and max pooling operations are used on P5 and on {P2, P3}, respectively, to resize P2, P3, and P5 to the same size as P4. Because the features from different convolution layers have different scales of values, directly integrating them would lead to the domination of the larger values. Thus, this paper adds an L2 normalization layer on each of the rescaled features to keep the feature values from different convolution layers on the same scale. The L2 normalization of a vector $y = \{y_1, y_2, \ldots, y_c\}$ is defined as follows:

$$\hat{y} = \frac{y}{\|y\|_2}, \qquad \|y\|_2 = \left( \sum_{i=1}^{c} |y_i|^2 \right)^{1/2}, \qquad (1)$$

where $\hat{y}$ represents the normalized vector, $\|y\|_2$ represents the L2 norm of $y$, and $c$ represents the number of channels.
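To make the normalization step concrete, the following minimal NumPy sketch (an illustration, not the paper's released code) applies this per-location L2 normalization along the channel axis of a feature map; the small epsilon added to the denominator is an assumption for numerical stability.

```python
import numpy as np

def l2_normalize(feature, eps=1e-12):
    """L2-normalize an (H, W, C) feature map along the channel axis, as in equation (1)."""
    norm = np.sqrt(np.sum(feature ** 2, axis=-1, keepdims=True))
    return feature / (norm + eps)

# After normalization, every spatial location has unit L2 norm across channels,
# so rescaled maps from different pyramid levels live on the same value scale.
p = 10.0 * np.random.randn(32, 32, 256).astype(np.float32)
p_hat = l2_normalize(p)
print(np.linalg.norm(p_hat[0, 0]))  # approximately 1.0
```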

Based on the rescaled feature maps, the balanced semantic feature map is generated as follows:

$$S = \frac{1}{n} \sum_{k=k_{\min}}^{k_{\max}} P'_k, \qquad (2)$$

where $n$ represents the number of rescaled features ($n = 4$ in this paper), $k_{\min}$ and $k_{\max}$ represent the indexes of the lowest and the highest level in the rescaled features, and $P'_k$ represents the rescaled feature map at resolution level $k$.

Finally, the balanced feature maps {F2, F3, F4, F5} are obtained by rescaling the balanced semantic feature map with the reverse procedure. More specifically, F5 is obtained by applying a max pooling operation to the balanced semantic feature map, and F2 and F3 are obtained by applying interpolation to it. With the proposed balanced feature pyramid generation module, each resolution in the final feature pyramid obtains equal information from the other resolutions, thus balancing the information flow and making the features more discriminative.
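The whole integrate-and-redistribute procedure can be sketched in a few lines of NumPy, assuming (H, W, C) feature maps whose sizes differ by powers of 2, nearest-neighbour interpolation for upsampling, and 2×2 max pooling for downsampling; this is a hedged illustration of equations (1) and (2), not the author's implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Per-location channel-wise L2 normalization (equation (1))."""
    return x / (np.sqrt((x ** 2).sum(axis=-1, keepdims=True)) + eps)

def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 for an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def maxpool2x(x):
    """2x2 max pooling with stride 2 for an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def resize_to(x, target_h):
    """Resize to the target spatial size by repeated 2x up- or downsampling."""
    while x.shape[0] < target_h:
        x = upsample2x(x)
    while x.shape[0] > target_h:
        x = maxpool2x(x)
    return x

def balanced_pyramid(pyramid, ref_index=2):
    """Turn {P2..P5} into {F2..F5}: rescale to the P4 size, L2-normalize,
    average (equation (2)), and redistribute the balanced map to every level."""
    ref_h = pyramid[ref_index].shape[0]
    rescaled = [l2_normalize(resize_to(p, ref_h)) for p in pyramid]
    semantic = sum(rescaled) / len(rescaled)          # balanced semantic feature map S
    return [resize_to(semantic, p.shape[0]) for p in pyramid]

# Example: strides 4/8/16/32 of a 256x256 input, 256 channels per level.
P = [np.random.randn(s, s, 256).astype(np.float32) for s in (64, 32, 16, 8)]
F = balanced_pyramid(P)
print([f.shape for f in F])   # [(64, 64, 256), (32, 32, 256), (16, 16, 256), (8, 8, 256)]
```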

3.2. Multiscale Region Proposal Network with Predicted Location Anchor. In faster R-CNN, the RPN generates a set of anchor boxes at each location of the last convolution layer of the base network. The RPN then classifies these anchor boxes into object/background classes and regresses their coordinates. There are 9 anchor boxes in total at each location of the feature map in the original faster R-CNN framework, and each anchor box is associated with predefined scales and aspect ratios.

Wang et al. [28] showed that the uniform anchoring scheme in faster R-CNN can lead to significant computational cost because many anchor boxes are generated in regions where the objects of interest are unlikely to exist. In addition, a good anchor box setting is needed for different problems to improve performance. Thus, this paper proposes a novel anchor generation scheme, based on the guided anchoring scheme [28], for generating anchor boxes.

Figure 3 illustrates the difference between the RPN in faster R-CNN and the proposed RPN with the novel anchor generation scheme. In the proposed RPN, a 1×1 convolution layer is first applied at each scale of the balanced feature maps to create an objectness score map. An elementwise sigmoid function is then adopted to convert the objectness score map into a probability map. Based on the probability map at each scale, the positive regions where a license plate candidate may exist are determined by selecting those locations whose probability values are above a predefined threshold; as in [28], this threshold is set to 0.01 in this paper. Finally, a 3×3 convolution filter is applied across each sliding position on the input feature map.

Figure 1: Overall pipeline of the proposed approach.

Table 1: The architecture of ResNet-34 for ImageNet.

Layer name | Kernel size | Output size
Conv1 | 7×7, 64, stride 2 | 112×112
Conv2 | 3×3 max pool, stride 2; [3×3, 64; 3×3, 64] × 3 | 56×56
Conv3 | [3×3, 128; 3×3, 128] × 4 | 28×28
Conv4 | [3×3, 256; 3×3, 256] × 6 | 14×14
Conv5 | [3×3, 512; 3×3, 512] × 3 | 7×7

Downsampling is performed by conv3-1, conv4-1, and conv5-1 with a stride of 2.

At each position corresponding to a positive region on the probability map, the local features are extracted and concatenated along the channel axis to form a 256-d feature vector, which is then fed into two separate fully convolutional layers for license plate/background classification and box regression. The probability map can eliminate almost all negative regions while still maintaining the same recall. Figure 4 shows example results of the original RPN and the proposed RPN. Because the proposed RPN slides over all positive locations in all balanced pyramid levels, it is not necessary to have multiscale anchors on a specific level. Instead, this paper assigns anchors of a single scale to each level of the balanced pyramid according to the size and aspect ratio of license plates in the dataset summary table (Table 2). More specifically, the anchors are defined to have heights of 5, 10, 15, and 20 pixels with an aspect ratio width:height = 5 on F2, F3, F4, and F5, respectively.
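As an illustration of the predicted-location anchoring on a single pyramid level, the following NumPy sketch thresholds a probability map at 0.01 and places single-scale anchors (heights 5/10/15/20 pixels, aspect ratio width:height = 5) only at the surviving positions; the probability map here is a hand-made stand-in for the sigmoid of the 1×1 objectness convolution, and all names are illustrative rather than taken from the released code.

```python
import numpy as np

# Single-scale anchors per balanced pyramid level, following the description above:
# heights 5, 10, 15, 20 pixels on F2, F3, F4, F5 (strides 4, 8, 16, 32), width:height = 5.
LEVEL_CONFIG = {"F2": (5, 4), "F3": (10, 8), "F4": (15, 16), "F5": (20, 32)}
ASPECT_RATIO = 5.0
PROB_THRESHOLD = 0.01

def anchors_for_level(prob_map, level):
    """Place (x1, y1, x2, y2) anchors only at locations whose predicted probability
    of containing a license plate exceeds the threshold."""
    height, stride = LEVEL_CONFIG[level]
    width = ASPECT_RATIO * height
    ys, xs = np.nonzero(prob_map > PROB_THRESHOLD)
    cx = (xs + 0.5) * stride          # anchor centers in input-image coordinates
    cy = (ys + 0.5) * stride
    return np.stack([cx - width / 2, cy - height / 2,
                     cx + width / 2, cy + height / 2], axis=1)

# Example: a 38x50 probability map for F3 with a small blob of likely plate locations.
prob = np.zeros((38, 50))
prob[10:13, 20:28] = 0.8
boxes = anchors_for_level(prob, "F3")
print(boxes.shape[0], "anchors kept out of", prob.size, "locations")   # 24 out of 1900
```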

3.3. Detection Network. Although the proposed multiscale RPN could work as a detector by itself, it is not strong, since its sliding windows do not cover objects well. To increase detection accuracy, a license plate detection network is added. The license plate detection network classifies the proposals generated by the proposed RPN into license plate and background classes and further refines the coordinates of the detected license plates. The license plate detection network consists of a region of interest (RoI) pooling layer and two fully connected (FC) layers, as shown in Figure 1.

Based on the proposals generated by the multiscale RPN, the RoI pooling layer extracts fixed-size feature patches from the balanced feature pyramid. As in [4], this paper selects the balanced feature map layer at the most appropriate scale to extract the feature patches, based on the size of each proposal. More specifically, a proposal of width w and height h is assigned to the level $F_k$ of the proposed feature pyramid, with k calculated by the following formula:

$$k = k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right), \qquad (3)$$

where 224 is the canonical ImageNet pretraining size and $k_0$ is the target level onto which a proposal with $w \times h = 224 \times 224$ should be mapped. This paper sets $k_0 = 4$ as in [4].
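A small Python helper corresponding to equation (3) is given below; the floor operation and the clipping to the available levels F2-F5 follow the usual FPN convention and are assumptions added here, since the formula alone can produce indexes outside the pyramid.

```python
import math

def assign_level(w, h, k0=4, k_min=2, k_max=5):
    """Map a proposal of size (w, h) to a balanced pyramid level index via equation (3),
    clipped to the available levels F2..F5."""
    k = k0 + math.log2(math.sqrt(w * h) / 224.0)
    return int(max(k_min, min(k_max, math.floor(k))))

# A 224x224 proposal maps to F4; a small 40x8 license plate proposal falls back to F2.
print(assign_level(224, 224), assign_level(40, 8))   # 4 2
```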

The fixed-size patches are then flattened into a vector and passed through two 1024-d FC layers, each followed by ReLU. The encoded features are then fed into two separate linear transformation layers: a license plate classification layer and a bounding box regression layer. The license plate classification layer has two outputs, which indicate the softmax probability of each proposal being a license plate or background. The license plate regression layer produces the bounding box coordinate offsets for each proposal.
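The following NumPy sketch makes the detection head concrete: an RoI is max-pooled into a fixed grid (the 7×7 output size is an assumption, as the paper only states that fixed-size patches are extracted) and then passed through two 1024-d FC layers with ReLU and the two output heads; the weights are random placeholders, so this is a shape-level illustration rather than the trained network.

```python
import numpy as np

def roi_pool(feature, box, stride, out_size=7):
    """Max-pool the region box = (x1, y1, x2, y2), given in image pixels, of an
    (H, W, C) feature map with the given stride into an out_size x out_size grid."""
    x1, y1, x2, y2 = (v / stride for v in box)
    h, w, _ = feature.shape
    pooled = np.zeros((out_size, out_size, feature.shape[2]), dtype=feature.dtype)
    for i in range(out_size):
        for j in range(out_size):
            r0 = int(np.floor(y1 + (y2 - y1) * i / out_size))
            r1 = max(r0 + 1, int(np.ceil(y1 + (y2 - y1) * (i + 1) / out_size)))
            c0 = int(np.floor(x1 + (x2 - x1) * j / out_size))
            c1 = max(c0 + 1, int(np.ceil(x1 + (x2 - x1) * (j + 1) / out_size)))
            pooled[i, j] = feature[r0:min(h, r1), c0:min(w, c1)].max(axis=(0, 1))
    return pooled

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32, 256), dtype=np.float32)   # balanced map F4 (stride 16)
vec = roi_pool(feat, box=(100, 120, 260, 152), stride=16).reshape(-1)

# Two 1024-d fully connected layers with ReLU, then the two output heads.
w1 = 0.01 * rng.standard_normal((vec.size, 1024))
w2 = 0.01 * rng.standard_normal((1024, 1024))
w_cls = 0.01 * rng.standard_normal((1024, 2))     # license plate / background
w_reg = 0.01 * rng.standard_normal((1024, 4))     # (dx, dy, dw, dh) offsets
hidden = np.maximum(np.maximum(vec @ w1, 0.0) @ w2, 0.0)
scores = np.exp(hidden @ w_cls)
scores /= scores.sum()                             # softmax over the two classes
offsets = hidden @ w_reg
print(scores.round(3), offsets.shape)
```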

3.4. Loss Function. The proposed framework is trained in an end-to-end fashion using a multitask loss function. Besides the conventional classification loss $L_{cls}$ and regression loss $L_{reg}$, this paper adds an additional loss function for the anchor box location prediction, $L_{loc}$.

Figure 2: The architecture of the balanced feature map generation module.

Thus, the multitask loss function is defined as follows:

$$L = \sum L_{cls} + \sum L_{reg} + L_{loc}. \qquad (4)$$

In (4), the binary logistic loss is used for box classification, and the smooth L1 loss [1] is adopted for box regression. For training the anchor box location prediction branch in the proposed RPN, this paper follows the training scheme designed in [28]. More specifically, the ground-truth bounding box is denoted as $(x_{gt}, y_{gt}, w_{gt}, h_{gt})$, where $(x_{gt}, y_{gt})$ represents the center coordinates and $(w_{gt}, h_{gt})$ represents the size of the ground-truth bounding box. The ground-truth bounding box is mapped to the corresponding balanced feature map scale to obtain $(x'_{gt}, y'_{gt}, w'_{gt}, h'_{gt})$. Based on the obtained bounding box, the center box (CB), ignore box (IB), and outside box (OB) are defined as follows:

$$CB = (x'_{gt}, y'_{gt}, z_1 w'_{gt}, z_1 h'_{gt}), \qquad (5)$$

$$IB = (x'_{gt}, y'_{gt}, z_2 w'_{gt}, z_2 h'_{gt}) - CB, \qquad (6)$$

$$OB = (x'_{gt}, y'_{gt}, w'_{gt}, h'_{gt}) - CB - IB, \qquad (7)$$

where $z_2 > z_1$.

Figure 3: The architecture of the original RPN (a) and the proposed RPN with predicted location anchor (b). A multilayer RPN is used in this paper.

Figure 4: Example results of the original RPN (a) and the proposed RPN (b).

Table 2: Dataset summary.

Dataset | Number of images | Number of license plates | Image resolution | License plate height (in pixels)
PKU vehicle dataset, G1 | 810 | 810 | 1082×728 | 35–57
PKU vehicle dataset, G2 | 700 | 700 | 1082×728 | 30–62
PKU vehicle dataset, G3 | 743 | 743 | 1082×728 | 29–53
PKU vehicle dataset, G4 | 572 | 572 | 1600×1236 | 30–58
PKU vehicle dataset, G5 | 1152 | 1438 | 1600×1200 | 20–60
AOLP dataset, AC | 681 | 681 | 352×240 | 25–70
AOLP dataset, LE | 757 | 757 | 640×480 | 28–80
AOLP dataset, RP | 611 | 611 | 320×240 | 30–70

Pixels inside CB are assigned as positive locations, pixels inside OB are assigned as negative locations, and pixels inside IB are discarded from the training samples. In the end, for each image in the training set, a binary label map, in which 1 represents a positive location and 0 represents a negative location, is generated for training the anchor box location prediction branch. Note that each level of the balanced feature pyramid should only be assigned objects of a specific scale range, so CB is only assigned on the feature map that matches the scale range of the targeted object; the same regions on adjacent levels of the balanced feature pyramid are set as IB. Finally, the focal loss function [5] is adopted to train the anchor box location prediction branch in order to address the sample-level imbalance problem.
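A hedged NumPy sketch of this label-map construction for one pyramid level is shown below; the values z1 = 0.2 and z2 = 0.5 are illustrative assumptions (the paper only requires z2 > z1), and the ignore label is encoded as -1.

```python
import numpy as np

def location_targets(gt_box, stride, map_hw, z1=0.2, z2=0.5):
    """Build the location-prediction targets for one pyramid level:
    1 = positive (inside CB), -1 = ignored (inside IB), 0 = negative elsewhere."""
    x, y, w, h = gt_box                              # ground-truth center and size, in pixels
    xc, yc, wf, hf = x / stride, y / stride, w / stride, h / stride
    ys, xs = np.mgrid[0:map_hw[0], 0:map_hw[1]]
    def inside(scale):
        return ((np.abs(xs + 0.5 - xc) <= scale * wf / 2.0) &
                (np.abs(ys + 0.5 - yc) <= scale * hf / 2.0))
    targets = np.zeros(map_hw, dtype=np.int8)        # negative by default
    targets[inside(z2)] = -1                          # ignore region (IB)
    targets[inside(z1)] = 1                           # positive center region (CB)
    return targets

# Example: a 160x32 plate centered at (400, 300) on the F3 map (stride 8) of a 640x512 image.
t = location_targets((400, 300, 160, 32), stride=8, map_hw=(64, 80))
print(int((t == 1).sum()), "positive and", int((t == -1).sum()), "ignored locations")
```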

4. Results and Discussion

In order to compare the effectiveness of the proposed approach with other state-of-the-art approaches on license plate detection, this paper conducts experiments on two public datasets: the PKU vehicle dataset [13] and the Application-Oriented License Plate (AOLP) dataset [29]. The proposed approach is implemented on a Windows machine with an Intel Core i7 8700 CPU, an NVIDIA GeForce GTX 1080 GPU, and 16 GB of RAM. TensorFlow is adopted for implementing the deep CNN frameworks.

4.1. Dataset and Evaluation Metric. Two public license plate datasets are adopted to evaluate the performance of the proposed method: the PKU vehicle dataset [13] and the AOLP dataset [29].

The PKU vehicle dataset includes 3828 vehicle images captured from various scenes under diverse environmental conditions. The images in this dataset are divided into five groups (G1-G5) corresponding to different configurations. More specifically, all images in the G1, G2, and G3 groups were taken on highways, while images in the G4 group were taken on city roads and images in the G5 group were taken at intersections with crosswalks. The images in the G4 group were captured during nighttime, while the images in the other groups were captured during daytime. There is one Chinese license plate in each image of the G1-G4 groups, while multiple Chinese license plates are captured in each image of the G5 group. For training the proposed network, this paper adopts the CarFlag-Large dataset [30], which contains 460,000 images with Chinese license plates.

The AOLP dataset includes 2049 images of Taiwan license plates captured at various locations and times and under various traffic and weather conditions. This dataset is categorized into three subsets: access control (AC) with 681 images, traffic law enforcement (LE) with 757 images, and road patrol (RP) with 611 images. AC refers to the cases in which a vehicle passes a fixed passage at a reduced speed or with a full stop; LE refers to the cases in which a vehicle violates traffic laws and is captured by a roadside camera; RP refers to the cases in which the camera is installed on or handheld from a patrolling vehicle, which takes images of vehicles with arbitrary viewpoints and distances. Each image contains one license plate. Since there is no standard split for the AOLP dataset, this paper follows the same strategy as in [30] for training the proposed network. More specifically, images from different subsets are used for training and testing separately. In addition, data augmentation is conducted by rotation and affine transformation to increase the number of training images. In this paper, the PKU vehicle dataset and the AOLP dataset are adopted to evaluate the performance of the proposed approach and to compare the detection results with those of other state-of-the-art approaches. Table 2 shows the detailed description of each dataset used in this paper.

For the evaluation metric, this paper follows the criterion used in [13] to evaluate the performance of the proposed method and other methods on the PKU vehicle dataset and the AOLP dataset. More specifically, a detection is considered to be correct if the license plate is totally encompassed by the bounding box and the IoU between the detected license plate and the ground-truth license plate is at least 0.5.
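This criterion can be expressed as a small Python check, assuming boxes in (x1, y1, x2, y2) format; it is a hedged restatement of the rule above rather than the official evaluation code.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def is_correct_detection(det, gt, iou_threshold=0.5):
    """True only if the detection fully encloses the ground-truth plate and IoU >= 0.5."""
    encloses = det[0] <= gt[0] and det[1] <= gt[1] and det[2] >= gt[2] and det[3] >= gt[3]
    return encloses and iou(det, gt) >= iou_threshold

print(is_correct_detection((100, 200, 280, 240), (110, 205, 270, 235)))   # True
print(is_correct_detection((100, 200, 400, 300), (110, 205, 270, 235)))   # False: IoU too low
```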

4.2. Experimental Results on PKU Vehicle Dataset. In order to show the effectiveness of the proposed approach, this paper compares the performance of the proposed method with state-of-the-art license plate detection methods on the PKU vehicle dataset, including the methods proposed by Zhou et al. [31], Li et al. [32], Yuan et al. [13], and Li et al. [30]. Zhou et al. [31] proposed to discover the principal visual word, characterized with geometric context, for each license plate character; given a new image, the license plates are extracted by matching local features with the principal visual words. Li et al. [32] used a maximally stable extremal region detector to extract candidate characters in images; the exact bounding boxes of license plates are estimated through belief propagation inference on a conditional random field constructed on the candidate characters in neighborhoods. Yuan et al. [13] proposed a novel line density filter approach to extract license plate candidate regions, and a cascaded license plate classifier based on linear support vector machines using colour saliency features is designed to identify the true license plate among the candidate regions. Li et al. [30] proposed an approach to address both detection and recognition of license plates using a single deep neural network.

Table 3 shows the comparison of detection results on the PKU vehicle dataset. As shown in Table 3, the proposed approach achieves the best detection accuracy on the PKU vehicle dataset. More specifically, in terms of average detection performance, the proposed method improves by 9.53%, 8.23%, 2.06%, and 0.24% over the methods proposed by Zhou et al. [31], Li et al. [32], Yuan et al. [13], and Li et al. [30], respectively. It should be noted that the performance of the proposed method surpasses the best of the reference methods, proposed by Li et al. [30], by a significant margin on the G5 group. Images in the G5 group contain multiple license plates in difficult conditions, such as a large variance of scales, reflective glare, and blur, and are affected by defects. This result shows the strong ability of the proposed framework to detect license plates in difficult conditions with a large variance of scales. Figure 5(a) shows some examples of detection results of the proposed method on the PKU vehicle dataset. As shown in Figure 5(a), the proposed algorithm effectively detects license plates with different scales under different situations.

Table 3: Comparison of detection results on the PKU vehicle dataset (detection ratio, %).

Method | G1 | G2 | G3 | G4 | G5 | Average
Zhou et al. [31] | 95.43 | 97.85 | 94.21 | 81.23 | 82.37 | 90.22
Li et al. [32] | 98.89 | 98.42 | 95.83 | 81.17 | 83.31 | 91.52
Yuan et al. [13] | 98.76 | 98.42 | 97.72 | 96.23 | 97.32 | 97.69
Li et al. [30] | 99.88 | 99.71 | 99.46 | 99.83 | 98.68 | 99.51
Proposed approach | 99.88 | 99.86 | 99.73 | 99.83 | 99.44 | 99.75

4.3. Experimental Results on AOLP Dataset. To further evaluate the effectiveness of the proposed framework, the performance of the proposed approach is tested on the AOLP dataset. Table 4 shows the comparison of detection results of the proposed method and the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30]. The experimental results in Table 4 show that the proposed method achieves the best detection ratio on all three subsets compared to the previous methods. More specifically, in terms of average detection, the proposed method improves by 4.42%, 2.23%, and 0.62% over the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30], respectively. The experimental results demonstrate that the proposed balanced feature pyramid and predicted location anchor can effectively enhance the feature representation power and boost the performance of license plate detection in difficult conditions. Figure 5(b) shows some examples of detection results of the proposed method on the AOLP dataset. As can be observed, the proposed method can accurately locate small license plates as well as medium and large ones.

Figure 5: Examples of detection results of the proposed method on the PKU vehicle dataset (a) and the AOLP dataset (b).

Table 4: Comparison of detection results on the AOLP dataset (detection ratio, %).

Method | AC | LE | RP | Average
Hsu et al. [29] | 96.0 | 95.0 | 94.0 | 95.0
Li et al. [33] | 98.38 | 97.62 | 95.58 | 97.19
Li et al. [30] | 99.12 | 99.08 | 98.20 | 98.8
Proposed approach | 99.41 | 99.34 | 99.51 | 99.42

4.4. Ablation Experiments. To evaluate the effectiveness of each module in the proposed approach, this paper conducts several experiments on the Chinese City Parking Dataset (CCPD) [34] and compares the detection results with those of the original faster R-CNN [1] and the FPN with faster R-CNN baseline [4] framework. The CCPD dataset is a large, publicly available, labeled license plate dataset. It contains over 250k independent license plate images under diverse illuminations, environments, and backgrounds. Each image has a resolution of 720×1160 and contains one license plate. All images containing license plates are divided into 8 groups based on different conditions: CCPD-Base with 200k images, CCPD-FN with 20k images, CCPD-DB with 20k images, CCPD-Rotate with 10k images, CCPD-Tilt with 10k images, CCPD-Weather with 10k images, CCPD-Challenge with 10k images, and CCPD-Blur with 5k images. As in [34], this paper adopts 100k images in the CCPD-Base subset to train the proposed network and then evaluates the results on CCPD-Base, CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge.

In the first experiment, this paper replaces the VGG-16 network in the original faster R-CNN with the proposed balanced feature pyramid generation module; the RPN is kept unchanged in this experiment. To show the effectiveness of the L2 normalization, the L2 normalization layer in the balanced feature pyramid generation module is discarded in the second experiment. In the third experiment, this paper adds the proposed RPN with the predicted location anchor module to replace the original RPN; VGG-16 is kept unchanged as the base network in this experiment. In the fourth experiment, this paper adds both the proposed RPN with the predicted location anchor module and the proposed balanced feature pyramid generation module with the L2 normalization layer, replacing the original RPN and the VGG-16 architecture.

Table 5 shows the detection results of each experiment on the CCPD dataset. As shown in Table 5, compared with the original RPN in the faster R-CNN framework, the proposed predicted location anchor scheme improves the average detection by 0.2%; by generating good proposals, the features for the detection network become more discriminative, thus improving the detection results. Compared with the feature pyramid in the FPN with faster R-CNN baseline, the proposed balanced pyramid generation module improves the average detection by 1.5%; it should be noted that the proposed module adds no parameters. With the proposed module, each level in the balanced feature pyramid obtains equal information from the other levels, thus improving the detection performance of the detection network. Compared with faster R-CNN and the FPN with faster R-CNN baseline, the proposed approach improves the average detection by 4.5% and 1.9%, respectively. The comparison results indicate that the proposed framework is superior to both single-scale and multiscale features for a region-based object detector. Furthermore, with the L2 normalization layer added on each of the rescaled features in the balanced feature pyramid generation module, the average detection is improved by 0.6% compared with the balanced feature pyramid generation module without L2 normalization. This result shows the effectiveness of the L2 normalization layer, which keeps the feature values from different convolution layers on the same scale.

Table 5: Detection results of each proposed network on the CCPD dataset (detection performance, %).

Network | Base | DB | FN | Rotate | Tilt | Weather | Challenge | Average
Faster R-CNN | 98.1 | 92.1 | 83.7 | 91.8 | 89.4 | 81.1 | 83.9 | 88.6
FPN with faster R-CNN baseline | 99.2 | 95.4 | 87.5 | 93.0 | 91.3 | 85.4 | 86.3 | 91.2
Faster R-CNN + balanced feature pyramid | 99.5 | 96.2 | 89.1 | 93.2 | 91.6 | 88.9 | 90.1 | 92.7
Faster R-CNN + balanced pyramid without L2-norm | 99.3 | 96.2 | 88.9 | 93.1 | 91.0 | 87.6 | 88.3 | 92.1
Faster R-CNN + predicted anchor RPN | 98.5 | 92.6 | 84.0 | 91.5 | 89.4 | 81.4 | 84.4 | 88.8
Faster R-CNN + balanced pyramid with L2-norm + predicted anchor RPN | 99.5 | 96.4 | 90.1 | 93.2 | 91.8 | 89.7 | 91.2 | 93.1

5. Conclusions and Future Work

This paper proposes a novel deep learning-based framework for license plate detection. In the proposed framework, a balanced feature pyramid generation module based on the ResNet-34 architecture is used to generate an enhanced balanced feature pyramid, in which each feature level obtains equal information from the other feature levels. In addition, a multiscale region proposal network with a predicted location anchor scheme is introduced to generate good proposals from each level of the balanced feature pyramid. With good proposals generated from balanced feature maps, the proposed approach shows significant improvements compared with other approaches on license plate detection. The good performance of the proposed approach on license plate detection has a high reference value in the field of intelligent transport systems. For future work, this paper will explore and compare more feature combination and multiscale detection methods, such as DeepLabv3+ [35] and MOSI-LPD [21]. In addition, this paper will adopt the nonlocal module [36] to further refine the balanced semantic features; this step may enhance the integrated features and further improve the detection results.

Data Availability

The codes used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.
[2] W. Liu, D. Anguelov, D. Erhan et al., "Single shot multibox detector," 2016, https://arxiv.org/abs/1512.02325.
[3] J. Redmon and F. Ali, "YOLOv3: an incremental improvement," 2018, http://arxiv.org/abs/1804.02767.
[4] T. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.
[5] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007, Venice, Italy, October 2017.
[6] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," 2016, http://arxiv.org/abs/1607.07155.
[7] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: deconvolutional single shot detector," 2017, http://arxiv.org/abs/1701.06659.
[8] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 2015.
[9] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems, pp. 379–387, MIT Press, Cambridge, MA, USA, 2016.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," 2016, http://arxiv.org/abs/1506.02640.
[11] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2874–2883, Las Vegas, NV, USA, June 2016.
[12] K. S. Raghunandan, P. Shivakumara, H. A. Jalab et al., "Riesz fractional based model for enhancing license plate detection and recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, pp. 2276–2288, 2018.
[13] Y. Yuan, W. Zou, Y. Zhao, X. Wang, X. Hu, and N. Komodakis, "A robust and efficient approach to license plate detection," IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1102–1114, 2017.
[14] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096–1107, 2016.
[15] A. H. Ashtari, M. J. Nordin, and M. Fathy, "An Iranian license plate recognition system based on color features," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 4, pp. 1690–1705, 2014.
[16] S. G. Kim, H. G. Jeon, and H. I. Koo, "Deep-learning-based license plate detection method using vehicle region extraction," Electronics Letters, vol. 53, no. 15, pp. 1034–1036, 2017.
[17] O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve, "Segmentation- and annotation-free license plate recognition with deep localization and failure identification," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 9, pp. 2351–2363, 2017.

[18] F. Xie, M. Zhang, J. Zhao, J. Yang, Y. Liu, and X. Yuan, "A robust license plate detection and character recognition algorithm based on a combined feature extraction model and BPNN," Journal of Advanced Transportation, vol. 2018, Article ID 6737314, 14 pages, 2018.

[19] L. Zou, M. Zhao, Z. Gao, M. Cao, H. Jia, and M. Pei, "License plate detection with shallow and deep CNNs in complex environments," Complexity, vol. 2018, Article ID 7984653, 6 pages, 2018.
[20] L. Xie, T. Ahmad, L. Jin, Y. Liu, and S. Zhang, "A new CNN-based method for multi-directional car license plate detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 507–517, 2018.
[21] J. Han, J. Yao, J. Zhao, J. Tu, and Y. Liu, "Multi-oriented and scale-invariant license plate detection based on convolutional neural networks," Sensors, vol. 19, no. 5, p. 1175, 2019.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, http://arxiv.org/abs/1409.1556.
[24] J. Huang, "Speed/accuracy trade-offs for modern convolutional object detectors," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3296-3297, Honolulu, HI, USA, July 2017.
[25] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," 2014, http://arxiv.org/abs/1409.0575.
[26] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, Springer, Berlin, Germany, 2014.
[27] J. Pang, "Libra R-CNN: towards balanced learning for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 2019.
[28] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2965–2974, Long Beach, CA, USA, June 2019.
[29] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Transactions on Vehicular Technology, vol. 62, no. 2, pp. 552–561, 2013.
[30] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126–1136, 2019.
[31] W. Zhou, H. Li, Y. Lu, and Q. Tian, "Principal visual word discovery for automatic license plate detection," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4269–4279, 2012.
[32] B. Li, B. Tian, Y. Li, and D. Wen, "Component-based license plate detection using conditional random field model," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1690–1699, 2013.
[33] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, https://arxiv.org/abs/1601.05610.
[34] Z. Xu, W. Yang, A. Meng et al., "Towards end-to-end license plate detection and recognition: a large dataset and baseline," Computer Vision – ECCV 2018, Springer, Berlin, Germany, pp. 261–277, 2018.
[35] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, Munich, Germany, September 2018.
[36] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, Honolulu, HI, USA, July 2018.


Page 3: PredictedAnchorRegionProposalwithBalancedFeature ...downloads.hindawi.com › journals › complexity › 2020 › 5137056.pdf · R-CNN for license plate detection. In the first

decision trees Ashtari et al [15] proposed a vehicle licenseplate recognition system based on a modified template-matching technique by the analysis of target colour pixels todetect the location of a license plate along with a hybridclassifier that recognizes license plate characters

With the development of deep learning recently manymethods for license plate detection based on deep learninghave been proposed Kim et al [16] used faster R-CNNframework to locate vehicle regions (en the hierarchicalsampling method is used for generating license plate can-didates from vehicle regions Bulan et al [17] proposed aweak sparse network of winnows classifier trained withsuccessive mean quantization transform features to extractcandidate regions and a strong readableunreadable CNNclassifier to classify those candidate regions Xie et al [18]proposed a preprocessing algorithm to strengthen thecontrast ratio of original car image at the first stage At thesecond stage the integral projection method is used to verifythe true plate Finally a new feature extraction model isdesigned to complete accurate recognition of the licenseplate characters Zou et al [19] proposed to use shallow CNNto quickly remove most of the background regions to reducethe computation cost Deep CNN is then used to detectlicense plate in the remaining regions Xie et al [20] in-troduced a new MD-YOLO model for multidirectional carlicense plate detection (e proposed model could elegantlysolve the problem of multidirectional car license plate de-tection and could also be deployed easily in real-time cir-cumstances because of its reduced computationalcomplexity compared with previous CNN-based methodsHan et al [21] proposed novel and effective strategies totightly enclose the multioriented license plates withbounding parallelograms and detect license plates withmultiple scales(e proposedmethod outperformed existingapproaches in terms of detecting license plates with differentorientations and multiple scales

3 Methodology

Figure 1 illustrates the overall architecture of the proposedframework (e proposed framework is based on fasterR-CNN [1] a popular two-stage general object detector Asshown in Figure 1 a base network based on ResNet-34 [22]architecture is first adopted to generate the base convolutionfeature maps Enhanced feature pyramid is then generatedfrom the base feature maps as in FPN [4] To balance se-mantic features of low-level and high-level information ineach level of enhanced feature pyramid a balanced featurepyramid generation module is added to generate balancedfeature pyramid In the multilevel region proposal network(RPN) a novel predicted location anchor scheme is designedto generate high-quality proposals Finally a detectionnetwork is used to further classify and regress the boundingbox of detected license plates Details of each module will beexplained in the next sections

31 Balanced Feature Pyramid Generation Module (eoriginal faster R-CNN uses VGG-16 [23] as the base

network Ren et al [1] showed that almost of the forwardtime is spent on the base network (us using a faster basenetwork can greatly improve the inference speed of thewhole network ResNet is an efficient architecture whichpresented a residual learning framework to ease the trainingof networks that are substantially deeper than previousnetworks In [22] ResNet-34 achieved nearly as perfor-mance as ResNet-50 and ResNet-101 while being faster andsimpler (us this paper adopts ResNet-34 architecture asthe base network to generate initial convolution featuremaps Compared with VGG-16 ResNet-34 is not only moreaccurate than VGG-16 but also faster than VGG-16 [24](earchitecture of ResNet-34 for ImageNet [25] is shown inTable 1

As in [26] higher lever features in deeper layers ofconvolutional layers contain more semantic representationwhile lower lever features in shallower layers could betterdescribe the characteristics of the small-scale objectsHowever shallow feature maps from the low layers offeature pyramid inherently lack fine semantic informationfor object recognition Recently many feature combinationmethods based on lateral connections such as FPN [4] andRetinaNet [5] have improved the performance of objectdetection over faster R-CNN and SSD However Pang et al[27] showed that the sequential manner in above integrationmethods will make integrated features focus more on ad-jacent resolution but less on other resolutions (e balancedintegrated features which possess balanced informationfrom each resolution will significantly improve the detectionperformance (us this paper proposes a balanced featuremap generation module for generating balanced featuremaps (e proposed module is based on FPN [4] Figure 2illustrates the architecture of the balanced feature mapgeneration module

First let C2 C3 C4 C5 represent the output of the lastresidual block for conv2 conv3 conv4 and conv5 block ofthe ResNet-34 (e strides of these outputs are 4 8 16 32pixels respectively with respect to the input image Fol-lowing [4] a 1times 1 convolutional layer is added on eachoutput feature map of the base network to reduce channeldepth In the top-down path coarser-resolution featuremaps are upsampled by a factor of 2 by using the nearestneighbor upsampling operation (ese upsampled featuresare then merged with the corresponding output featuremaps of the base ResNet-34 by elementwise addition Fi-nally to reduce the aliasing effect of upsampling a 3times3convolution layer is added on each merged feature map(except for M5) to generate the multiscale feature pyramiddenoted as P2 P3 P4 P5 which can be used for detectingobjects at a different scale

Next to integrate multilevel features and preserve theirsemantic hierarchy interpolation and max pooling oper-ation are used on P5 and P2 P3 respectively to resizeP2 P3 P5 to the same size of P4 Because the featuresfrom different convolution layers have different scale ofvalues directly integrating them will lead to the domina-tion of the larger values (us this paper adds L2 nor-malization layer on each of rescaled features to keep thefeature values from different convolution layers on the

Complexity 3

same scale L2 normalization of a vectory y1 y2 yc1113864 1113865 is defined as follows

1113954y y

y2

y

1113936ci1 yi

111386811138681113868111386811138681113868111386811138682

1113872 1113873(12)

(1)

where 1113954y represents the normalized vector y2 represents theL2 normalization of y and c represents the number ofchannels

Based on the rescaled feature maps balanced semanticfeature map is generated as follows

S 1n

1113944

kmax

kkmin

Pkprime (2)

where n represents the number of rescaled features (n 4 inthis paper) kmin and kmax represent the indexes of the lowestand the highest level in the rescaled features Pk

prime representsthe rescaled feature map at resolution level k

Finally the final balanced feature maps F2 F3 F4 F5are received by rescaling the semantic feature map in reverseprocedure More specifically F5 is obtained by using max

pooling operation on balanced semantic feature map andF2 F3 are obtained by using interpolation operation onbalanced semantic feature map With the proposed balancedfeature pyramid generation module each resolution in thefinal feature pyramid obtains equal information from otherresolutions thus balancing the information flow and leadingthe features more discriminative

32 Multiscale Region Proposal Network with Predicted Lo-cation Anchor In faster R-CNN the RPN generates a set ofanchor boxes at each location of the last convolution layer ofthe base network(e RPN then classifies these anchor boxesto objectbackground class and regresses the coordinates ofthese anchor boxes (ere are 9 anchor boxes in total at eachlocation of the feature map in original faster R-CNNframework Each anchor box is associated with predefinedscales and aspect ratios

Wang et al [28] showed that the uniform anchoringscheme in faster R-CNN can lead to significant computa-tional cost because many anchor boxes are generated inregions where the objects of interest are unlikely to exist Inaddition a good of anchor box setting is needed for differentproblems to improve performance (us this paper pro-poses a novel anchor generation scheme based on guidedanchoring scheme [28] for generating anchor boxes

Figure 3 illustrates the difference between the RPN infaster R-CNN and the proposed RPN with novel anchorgeneration scheme In the proposed RPN this paper firstapplies a 1times 1 convolution layer at each feature scale of thebalanced feature maps to create objectness score mapElementwise sigmoid function is then adopted to convert theobjectness score map to probability map Based on theprobability map on each scale the positive regions wherelicense plate candidate may possibly exist can be determinedby selecting those locations whose corresponding proba-bility values are above a predefined threshold As in [28] thispredefined threshold is set at 001 in this paper Finally a3times 3 convolution layer filter is applied across each sliding

Input image

e basenetwork

Enhancedfeature maps

Balancedfeature maps

Balanced featurepyramid generation

module

RPN with predictedlocation anchor

RoI pooling

Fixed sizefeatures

FC la

yer

FC la

yer FC

laye

rFC

laye

r

Classification

Regression

Detection network

Figure 1 Overall pipeline of the proposed approach

Table 1 (e architectures of ResNet-34 for ImageNet

Layer name Kernel size Output sizeConv1 7times 7times 64 stride 2 112times112

Conv23times 3 max pool stride 2

56times 563 times 3 times 643 times 3 times 641113890 1113891 times 3

Conv33 times 3 times 1283 times 3 times 1281113890 1113891 times 4 28times 28

Conv43 times 3 times 2563 times 3 times 2561113890 1113891 times 6 14times14

Conv53 times 3 times 5123 times 3 times 5121113890 1113891 times 3 7times 7

Downsampling is performed by conv3-1 conv4-1 and conv5-1 with a strideof 2

4 Complexity

position on the input feature map At each position on theinput feature map corresponding with positive regions onthe probability map the local features are extracted andconcatenated along the channel axis and form a 256-dfeature vector which is then fed into two separate fullyconvolutional layers for license platebackground classifi-cation and box regression (e probability map can elimi-nate almost negative regions while still maintaining the samerecall Figure 4 shows the example results of the originalRPN and the proposed RPN Because the proposed RPNslides over all positive locations in all balanced pyramidlevels it is not necessary to have multiscale anchors on aspecific level Instead this paper assigns anchors of a singlescale to each level of the balanced pyramid according to thesize and aspect ratio of license plates in the dataset summarytable (Table 2) More specifically this paper defines theanchors to have the height of 5 10 15 20 pixels with anaspect ratio widthheight 5 on F2 F3 F4 F5 respectively

33 Detection Network Although the proposed multiscaleRPN could work as a detector itself it is not strong since itssliding windows do not cover objects well To increasedetection accuracy a license plate detection network isadded License plate detection network is used to classifyproposals generated by the proposed RPN to license plateand background class and further refine the coordinates ofdetected license plate (e license plate detection networkhas a region of interest (RoI) pooling layer and two fullyconnected (FC) layers as shown in Figure 1

Based on proposals generated by the multiscale RPN RoIpooling layer is used to extract the fixed size feature patchesfrom the balanced feature pyramid As in [4] this paperselects the balanced feature map layer in the most proper scaleto extract the feature patches based on the size of eachproposal More specifically a proposal of width w and heighth is assigned to the level Fk of the proposed feature pyramidwith k being calculated as the following formula

k k0 + log2

wh

radic

2241113888 1113889 (3)

where 224 is the canonical ImageNet pretraining size and k0is the target level on which a proposal with w times h 224times 224should be mapped into (is paper sets k0 4 as in [4]

(e fixed size patches are then flattened into a vector andpassed through the two 1024-d FC layers followed by ReLU(e encoded features are then fed into two separate lineartransformation layers license plate classification layer andbounding box regression layer (e license plate classifica-tion layer has two outputs which indicate the softmaxprobability of each proposal as license platebackground(e license plate regression layer produces the bounding boxcoordinate offsets for each proposal

34 Loss Function (e proposed framework is trained in anend-to-end fashion using a multitask loss function Besidethe conventional classification loss Lcls and regression lossLreg this paper adds additional loss function for the anchor

Inpu

t im

age

Conv

1 (C

1)Co

nv2

(C2)

Conv

3 (C

3)Co

nv4

(C4)

Conv

5 (C

5) 1 times 1 conv

1 times 1 conv

1 times 1 conv

1 times 1 conv

M5

M4

M3

M2

Upsample times 2

Upsample times 2

Upsample times 2

3 times 3 conv

3 times 3 conv

3 times 3 conv

P5

P4

P3

P2

Resize

Resize

Resize

L2 norm

L2 norm

L2 norm

L2 norm

Resize

Resize

Resize

Resize

F5

F4

F3

F2

Semanticfeature map

Balanced feature maps

Figure 2 (e architecture of the balanced feature map generation module

Complexity 5

box location prediction Lloc (us the multitask lossfunction is defined as follows

L 1113944 Lcls + 1113944 Lreg + Lloc (4)

In (4) the binary logistic loss is used for box classifi-cation and smooth L1 loss [1] is adopted for box regressionFor training the anchor box location prediction branch inthe proposed RPN this paper follows the training schemedesigned in [28] More specifically this paper denotes theground-truth bounding box as (xgt ygt wgt hgt) where(xgt ygt) represents the center coordinates and (wgt hgt)

represents the size of the ground-truth bounding box (eground-truth bounding box is mapped to the correspondingbalanced feature map scale to obtain (xgtprime ygtprime wgtprime hgtprime ) Basedon the obtained bounding box the center box (CB) ignorebox (IB) and outside box (OB) are defined as follows

CB xgtprime ygtprime z1wgtprime z1hgtprime1113872 1113873 (5)

IB xgtprime ygtprime z2wgtprime z2hgtprime1113872 1113873 minus CB (6)

OB xgtprime ygtprime wgtprime hgtprime1113872 1113873 minus CB minus IB (7)

Input feature map

Conv layer(3 times 3 times C times N)

Conv layer(1 times 1 times N times 2k)

Conv layer(1 times 1 times N times 4k)

Classification

Regression

Proposals

W times H times C

(a)

W times H times C

Conv layer(1 times 1 times C times 1)

Objectnessscore map

Probabilitymap

W times H times 1

Elementwise sigmoid

Conv layer(3 times 3 times C times N)

Conv layer(1 times 1 times N times 2k)

Conv layer(1 times 1 times N times 4k)

Classification

Regression

Proposals

(b)

Figure 3(e architecture of the original RPN (a) and the proposed RPNwith predicted location anchor (b) Multilayer RPN is used in this paper

(a) (b)

Figure 4 Example results of the original RPN (a) and the proposed RPN (b)

Table 2 Dataset summary

Dataset Number of images Number of license plates Image resolution License plate height (in pixels)

PKU vehicle dataset

G1 810 810 1082times 728 35ndash57G2 700 700 1082times 728 30ndash62G3 743 743 1082times 728 29ndash53G4 572 572 1600times1236 30ndash58G5 1152 1438 1600times1200 20ndash60

AOLP datasetAC 681 681 352times 240 25ndash70LE 757 757 640times 480 28ndash80RP 611 611 320times 240 30ndash70

6 Complexity

where z2 gt z1 Pixels inside CB are assigned as positive lo-cations while pixels inside OB are assigned as negativelocations Otherwise pixels inside IB are discarded intraining samples In the end for each image in the trainingset a binary label map where 1 represents a positive locationand 0 represents a negative location is generated for trainingthe anchor box location prediction branch Note that eachlevel of the balanced feature map should only assign objectsof a specific scale range so CB is only assigned on a featuremap that matches the scale range of the targeted object (esame regions of adjacent levels in the balanced featurepyramid are set as IB Finally focal loss function [5] isadopted to train the anchor box location prediction branchfor solving sample level imbalance problem

4 Results and Discussion

In order to compare the effectiveness of the proposed ap-proach with other state-of-the-art approaches on licenseplate detection this paper conducts experiments on twopublic datasets PKU vehicle dataset [13] and ApplicationOriented License Plate (AOLP) dataset [29] (e proposedapproach is implemented on a Window system machinewith Intel Core i7 8700 CPU NVIDIA GeForce GTX 1080GPU and 16Gb of RAM TensorFlow is adopted forimplementing deep CNN frameworks

41 Dataset and EvaluationMetric Two public license platedatasets are adopted to evaluate the performance of theproposed method in this paper including PKU vehicledataset [13] and AOLP dataset [29]

PKU vehicle dataset includes 3828 vehicle images cap-tured from various scenes under diverse environmentconditions (e image in this dataset is divided into fivegroups (G1-G5) corresponding to different configurationsMore specifically all images in G1 G2 and G3 group weretaken on highways while images in G4 group were taken oncity roads and images in G5 group were taken at inter-sections with crosswalks (e image in G4 group is capturedduring nighttime while the image in other groups is cap-tured during daytime (ere is one Chinese license plate ineach image of G1-G4 group while multiple Chinese licenseplates are captured in each image of G5 group For trainingthe proposed network this paper adopts CarFlag-Largedataset [30] which contains 460000 images with Chineselicense plates

AOLP dataset includes 2049 images of Taiwan licenseplates in various locations time traffic and weather con-ditions (is dataset is categorized into three subsets accesscontrol (AC) with 681 images traffic law enforcement (LE)with 757 images and road patrol (RP) with 611 images ACrefers to the cases that a vehicle passes a fixed passage at areduced speed or with a full stop LE refers to the cases that avehicle violates traffic laws and is captured by a roadsidecamera RP refers to the cases that the camera is installed orhandheld on a patrolling vehicle which takes images ofvehicles with arbitrary viewpoints and distances Each imagecontains one license plate Since there is no standard split for

AOLP dataset this paper follows the same strategy as in [30]for training the proposed network More specifically thispaper uses images from different subsets for training and testseparately In addition data augmentation is conducted byrotation and affine transformation to increase the number oftraining images In this paper PKU vehicle dataset andAOLP dataset are adopted to evaluate the performance of theproposed approach and compare the detection results withthe results of other state-of-the-art approaches Table 2shows the detailed descriptions of each dataset used inthis paper

For the evaluationmetric this paper follows the criterionused in [13] to evaluate the performance of the proposedmethod and other methods on the PKU vehicle dataset andAOLP dataset More specifically a detection is considered tobe correct if the license plate is totally encompassed by thebounding box and the IoU between the detected license plateand the ground-truth license plate is at least 05

42ExperimentalResults onPKUVehicleDataset In order toshow the effectiveness of the proposed approach this papercompares the performance results of the proposed methodwith the results of state-of-the-art license plate detectionmethods on PKU vehicle dataset including the methodsproposed by Zhou et al [31] Li et al [32] Yuan et al [13]and Li et al [30] Zhou et al [31] proposed to discover theprincipal visual word characterized with geometric contextfor each license plate character With a new license plateimage the license plates are extracted by matching localfeatures with principal visual word Li et al [32] usedmaximally stable extremal region detector to extract can-didate characters in images (e exact bounding boxes oflicense plates are estimated through the belief propagationinference on conditional random field which are constructedon the candidate characters in neighborhoods Yuan et al[13] proposed a novel line density filter approach to extractlicense candidate regions and a cascaded license plateclassifier based on linear support vector machines usingcolour saliency features is designed to identify the truelicense plate from among the candidate regions Li et al [30]proposed an approach to address both detection and rec-ognition of license plate using a single deep neural network

Table 3 shows the comparison of detection results on the PKU vehicle dataset. As shown in Table 3, the proposed approach achieves the best detection accuracy on this dataset. More specifically, in terms of average detection performance, the proposed method improves on the methods proposed by Zhou et al. [31], Li et al. [32], Yuan et al. [13], and Li et al. [30] by 9.53%, 8.23%, 2.06%, and 0.24%, respectively. It should be noted that the proposed method surpasses the best reference method, proposed by Li et al. [30], by a significant margin on the G5 group. Images in the G5 group contain multiple license plates under difficult conditions, such as a large variance of scales, reflective glare, blur, and other defects. This result shows the strong ability of the proposed framework to detect license plates under difficult conditions with a large variance of scales. Figure 5(a) shows some examples of detection results of the proposed method on the PKU vehicle dataset. As shown in Figure 5(a), the proposed algorithm effectively detects license plates of different scales under different situations.

Table 3: Comparison of detection results on the PKU vehicle dataset (detection ratio, %).

Method | G1 | G2 | G3 | G4 | G5 | Average
Zhou et al. [31] | 95.43 | 97.85 | 94.21 | 81.23 | 82.37 | 90.22
Li et al. [32] | 98.89 | 98.42 | 95.83 | 81.17 | 83.31 | 91.52
Yuan et al. [13] | 98.76 | 98.42 | 97.72 | 96.23 | 97.32 | 97.69
Li et al. [30] | 99.88 | 99.71 | 99.46 | 99.83 | 98.68 | 99.51
Proposed approach | 99.88 | 99.86 | 99.73 | 99.83 | 99.44 | 99.75
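As a small worked check of the margins quoted above, the snippet below recomputes the average detection ratios from the per-group values in Table 3, assuming the Average column is the unweighted mean over the five groups, which is consistent with the reported numbers.

# Per-group detection ratios (%) from Table 3.
proposed = [99.88, 99.86, 99.73, 99.83, 99.44]
li_et_al_30 = [99.88, 99.71, 99.46, 99.83, 98.68]

avg_proposed = sum(proposed) / len(proposed)   # 99.748, reported as 99.75
avg_li = sum(li_et_al_30) / len(li_et_al_30)   # 99.512, reported as 99.51
print(round(avg_proposed - avg_li, 2))         # 0.24, the margin quoted in the text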

4.3. Experimental Results on AOLP Dataset. To further evaluate the effectiveness of the proposed framework, the performance of the proposed approach is tested on the AOLP dataset. Table 4 compares the detection results of the proposed method with those of the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30]. The results in Table 4 show that the proposed method achieves the best detection ratio on all three subsets. More specifically, in terms of average detection, the proposed method improves on the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30] by 4.42%, 2.23%, and 0.62%, respectively. These results demonstrate that the proposed balanced feature pyramid and predicted location anchor can effectively enhance feature representation power and boost license plate detection performance under difficult conditions. Figure 5(b) shows some examples of detection results of the proposed method on the AOLP dataset. As can be observed, the proposed method accurately locates small license plates as well as medium and large ones.

Table 4: Comparison of detection results on the AOLP dataset (detection ratio, %).

Method | AC | LE | RP | Average
Hsu et al. [29] | 96.00 | 95.00 | 94.00 | 95.00
Li et al. [33] | 98.38 | 97.62 | 95.58 | 97.19
Li et al. [30] | 99.12 | 99.08 | 98.20 | 98.80
Proposed approach | 99.41 | 99.34 | 99.51 | 99.42

Figure 5: Examples of detection results of the proposed method on the PKU vehicle dataset (a) and the AOLP dataset (b).

4.4. Ablation Experiments. To evaluate the effectiveness of each module in the proposed approach, this paper conducts several experiments on the Chinese City Parking Dataset (CCPD) [34] and compares the detection results with those of the original faster R-CNN [1] and of the FPN with faster R-CNN baseline [4]. CCPD is a large, publicly available, labeled license plate dataset containing over 250k independent license plate images under diverse illuminations, environments, and backgrounds. Each image has a resolution of 720 × 1160 and contains one license plate. The images are divided into eight groups based on different conditions: CCPD-Base with 200k images, CCPD-FN with 20k images, CCPD-DB with 20k images, CCPD-Rotate with 10k images, CCPD-Tilt with 10k images, CCPD-Weather with 10k images, CCPD-Challenge with 10k images, and CCPD-Blur with 5k images. As in [34], this paper adopts 100k images from the CCPD-Base subset to train the proposed network and then evaluates the results on CCPD-Base, CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge.

In the first experiment, the VGG-16 network in the original faster R-CNN is replaced by the proposed balanced feature pyramid generation module, while the RPN is kept unchanged. To show the effectiveness of L2 normalization, the L2 normalization layer in the balanced feature pyramid generation module is discarded in the second experiment. In the third experiment, the proposed RPN with the predicted location anchor module replaces the original RPN, while VGG-16 is kept unchanged as the base network. In the fourth experiment, both the proposed RPN with the predicted location anchor module and the proposed balanced feature pyramid generation module with the L2 normalization layer replace the original RPN and the VGG-16 architecture.

Table 5 shows the detection results of each configuration on the CCPD dataset. As shown in Table 5, compared with the original RPN in the faster R-CNN framework, the proposed predicted location anchor scheme improves the average detection by 0.2%; by generating good proposals, the features fed to the detection network become more discriminative, which improves the detection results. Compared with the feature pyramid in the FPN with faster R-CNN baseline, the proposed balanced pyramid generation module improves the average detection by 1.5%. It should be noted that the proposed module adds no parameters; with it, each level of the balanced feature pyramid obtains equal information from the other levels, which improves the performance of the detection network. Compared with faster R-CNN and with the FPN with faster R-CNN baseline, the full proposed approach improves the average detection by 4.5% and 1.9%, respectively. These comparisons indicate that the proposed framework is superior to both single-scale and multiscale features for a region-based object detector. Furthermore, with the L2 normalization layer added to each of the rescaled features in the balanced feature pyramid generation module, the average detection improves by 0.6% compared with the same module without L2 normalization. This result shows the effectiveness of the L2 normalization layer, which keeps the feature values from different convolution layers on the same scale.

Table 5: Detection results of each configuration on the CCPD dataset (detection performance, %).

Network | Base | DB | FN | Rotate | Tilt | Weather | Challenge | Average
Faster R-CNN | 98.1 | 92.1 | 83.7 | 91.8 | 89.4 | 81.1 | 83.9 | 88.6
FPN with faster R-CNN baseline | 99.2 | 95.4 | 87.5 | 93.0 | 91.3 | 85.4 | 86.3 | 91.2
Faster R-CNN + balanced feature pyramid | 99.5 | 96.2 | 89.1 | 93.2 | 91.6 | 88.9 | 90.1 | 92.7
Faster R-CNN + balanced pyramid without L2 norm | 99.3 | 96.2 | 88.9 | 93.1 | 91.0 | 87.6 | 88.3 | 92.1
Faster R-CNN + predicted anchor RPN | 98.5 | 92.6 | 84.0 | 91.5 | 89.4 | 81.4 | 84.4 | 88.8
Faster R-CNN + balanced pyramid with L2 norm + predicted anchor RPN | 99.5 | 96.4 | 90.1 | 93.2 | 91.8 | 89.7 | 91.2 | 93.1
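To make the role of the L2 normalization layer concrete, the following is a minimal TensorFlow sketch of the balancing step discussed above: each pyramid level is resized to a common resolution, L2-normalized along the channel axis so that features from different convolution stages share the same scale, and then averaged so that every level contributes equally to the balanced semantic feature. The function name, target resolution, and tensor layout are illustrative assumptions rather than the paper's exact implementation.

import tensorflow as tf

def balanced_semantic_feature(levels, target_hw=(64, 64), eps=1e-12):
    """`levels` is a list of feature maps shaped (batch, H_k, W_k, C)."""
    rescaled = []
    for feat in levels:
        feat = tf.image.resize(feat, target_hw)                   # bring every level to one resolution
        feat = tf.math.l2_normalize(feat, axis=-1, epsilon=eps)   # per-location channel-wise L2 norm
        rescaled.append(feat)
    # Average so each level contributes equally to the balanced semantic feature.
    return tf.add_n(rescaled) / float(len(rescaled))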

5. Conclusions and Future Work

This paper proposes a novel deep learning-based framework for license plate detection. In the proposed framework, a balanced feature pyramid generation module based on the ResNet-34 architecture is used to generate an enhanced, balanced feature pyramid in which each feature level obtains equal information from the other feature levels. In addition, a multiscale region proposal network with a predicted location anchor scheme is introduced to generate good proposals from each level of the balanced feature pyramid. With good proposals generated from balanced feature maps, the proposed approach shows significant improvements over other license plate detection approaches, and its good performance has high reference value in the field of intelligent transport systems. In future work, this paper will explore and compare further feature combination and multiscale detection methods, such as DeepLabv3+ [35] and MOSI-LPD [21]. In addition, the nonlocal module [36] will be adopted to further refine the balanced semantic features; this step may enhance the integrated features and further improve the detection results.

Data Availability

The codes used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.
[2] W. Liu, D. Anguelov, D. Erhan et al., "Single shot multibox detector," 2016, https://arxiv.org/abs/1512.02325.
[3] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.
[4] T. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.
[5] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007, Venice, Italy, October 2017.
[6] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," 2016, https://arxiv.org/abs/1607.07155.
[7] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: deconvolutional single shot detector," 2017, https://arxiv.org/abs/1701.06659.
[8] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 2015.
[9] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems, pp. 379–387, MIT Press, Cambridge, MA, USA, 2016.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," 2016, https://arxiv.org/abs/1506.02640.
[11] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2874–2883, Las Vegas, NV, USA, June 2016.
[12] K. S. Raghunandan, P. Shivakumara, H. A. Jalab et al., "Riesz fractional based model for enhancing license plate detection and recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, pp. 2276–2288, 2018.
[13] Y. Yuan, W. Zou, Y. Zhao, X. Wang, X. Hu, and N. Komodakis, "A robust and efficient approach to license plate detection," IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1102–1114, 2017.
[14] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096–1107, 2016.
[15] A. H. Ashtari, M. J. Nordin, and M. Fathy, "An Iranian license plate recognition system based on color features," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 4, pp. 1690–1705, 2014.
[16] S. G. Kim, H. G. Jeon, and H. I. Koo, "Deep-learning-based license plate detection method using vehicle region extraction," Electronics Letters, vol. 53, no. 15, pp. 1034–1036, 2017.
[17] O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve, "Segmentation- and annotation-free license plate recognition with deep localization and failure identification," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 9, pp. 2351–2363, 2017.
[18] F. Xie, M. Zhang, J. Zhao, J. Yang, Y. Liu, and X. Yuan, "A robust license plate detection and character recognition algorithm based on a combined feature extraction model and BPNN," Journal of Advanced Transportation, vol. 2018, Article ID 6737314, 14 pages, 2018.
[19] L. Zou, M. Zhao, Z. Gao, M. Cao, H. Jia, and M. Pei, "License plate detection with shallow and deep CNNs in complex environments," Complexity, vol. 2018, Article ID 7984653, 6 pages, 2018.
[20] L. Xie, T. Ahmad, L. Jin, Y. Liu, and S. Zhang, "A new CNN-based method for multi-directional car license plate detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 507–517, 2018.
[21] J. Han, J. Yao, J. Zhao, J. Tu, and Y. Liu, "Multi-oriented and scale-invariant license plate detection based on convolutional neural networks," Sensors, vol. 19, no. 5, p. 1175, 2019.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.
[24] J. Huang, "Speed/accuracy trade-offs for modern convolutional object detectors," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3296-3297, Honolulu, HI, USA, July 2017.
[25] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," 2014, https://arxiv.org/abs/1409.0575.
[26] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, Springer, Berlin, Germany, 2014.
[27] J. Pang, "Libra R-CNN: towards balanced learning for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 2019.
[28] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2965–2974, Long Beach, CA, USA, June 2019.
[29] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Transactions on Vehicular Technology, vol. 62, no. 2, pp. 552–561, 2013.
[30] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126–1136, 2019.
[31] W. Zhou, H. Li, Y. Lu, and Q. Tian, "Principal visual word discovery for automatic license plate detection," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4269–4279, 2012.
[32] B. Li, B. Tian, Y. Li, and D. Wen, "Component-based license plate detection using conditional random field model," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1690–1699, 2013.
[33] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, https://arxiv.org/abs/1601.05610.
[34] Z. Xu, W. Yang, A. Meng et al., "Towards end-to-end license plate detection and recognition: a large dataset and baseline," in Computer Vision – ECCV 2018, pp. 261–277, Springer, Berlin, Germany, 2018.
[35] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, Munich, Germany, September 2018.
[36] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, Honolulu, HI, USA, July 2018.




Page 5: PredictedAnchorRegionProposalwithBalancedFeature ...downloads.hindawi.com › journals › complexity › 2020 › 5137056.pdf · R-CNN for license plate detection. In the first

position on the input feature map At each position on theinput feature map corresponding with positive regions onthe probability map the local features are extracted andconcatenated along the channel axis and form a 256-dfeature vector which is then fed into two separate fullyconvolutional layers for license platebackground classifi-cation and box regression (e probability map can elimi-nate almost negative regions while still maintaining the samerecall Figure 4 shows the example results of the originalRPN and the proposed RPN Because the proposed RPNslides over all positive locations in all balanced pyramidlevels it is not necessary to have multiscale anchors on aspecific level Instead this paper assigns anchors of a singlescale to each level of the balanced pyramid according to thesize and aspect ratio of license plates in the dataset summarytable (Table 2) More specifically this paper defines theanchors to have the height of 5 10 15 20 pixels with anaspect ratio widthheight 5 on F2 F3 F4 F5 respectively

33 Detection Network Although the proposed multiscaleRPN could work as a detector itself it is not strong since itssliding windows do not cover objects well To increasedetection accuracy a license plate detection network isadded License plate detection network is used to classifyproposals generated by the proposed RPN to license plateand background class and further refine the coordinates ofdetected license plate (e license plate detection networkhas a region of interest (RoI) pooling layer and two fullyconnected (FC) layers as shown in Figure 1

Based on proposals generated by the multiscale RPN RoIpooling layer is used to extract the fixed size feature patchesfrom the balanced feature pyramid As in [4] this paperselects the balanced feature map layer in the most proper scaleto extract the feature patches based on the size of eachproposal More specifically a proposal of width w and heighth is assigned to the level Fk of the proposed feature pyramidwith k being calculated as the following formula

k k0 + log2

wh

radic

2241113888 1113889 (3)

where 224 is the canonical ImageNet pretraining size and k0is the target level on which a proposal with w times h 224times 224should be mapped into (is paper sets k0 4 as in [4]

(e fixed size patches are then flattened into a vector andpassed through the two 1024-d FC layers followed by ReLU(e encoded features are then fed into two separate lineartransformation layers license plate classification layer andbounding box regression layer (e license plate classifica-tion layer has two outputs which indicate the softmaxprobability of each proposal as license platebackground(e license plate regression layer produces the bounding boxcoordinate offsets for each proposal

34 Loss Function (e proposed framework is trained in anend-to-end fashion using a multitask loss function Besidethe conventional classification loss Lcls and regression lossLreg this paper adds additional loss function for the anchor

Inpu

t im

age

Conv

1 (C

1)Co

nv2

(C2)

Conv

3 (C

3)Co

nv4

(C4)

Conv

5 (C

5) 1 times 1 conv

1 times 1 conv

1 times 1 conv

1 times 1 conv

M5

M4

M3

M2

Upsample times 2

Upsample times 2

Upsample times 2

3 times 3 conv

3 times 3 conv

3 times 3 conv

P5

P4

P3

P2

Resize

Resize

Resize

L2 norm

L2 norm

L2 norm

L2 norm

Resize

Resize

Resize

Resize

F5

F4

F3

F2

Semanticfeature map

Balanced feature maps

Figure 2 (e architecture of the balanced feature map generation module

Complexity 5

box location prediction Lloc (us the multitask lossfunction is defined as follows

L 1113944 Lcls + 1113944 Lreg + Lloc (4)

In (4) the binary logistic loss is used for box classifi-cation and smooth L1 loss [1] is adopted for box regressionFor training the anchor box location prediction branch inthe proposed RPN this paper follows the training schemedesigned in [28] More specifically this paper denotes theground-truth bounding box as (xgt ygt wgt hgt) where(xgt ygt) represents the center coordinates and (wgt hgt)

represents the size of the ground-truth bounding box (eground-truth bounding box is mapped to the correspondingbalanced feature map scale to obtain (xgtprime ygtprime wgtprime hgtprime ) Basedon the obtained bounding box the center box (CB) ignorebox (IB) and outside box (OB) are defined as follows

CB xgtprime ygtprime z1wgtprime z1hgtprime1113872 1113873 (5)

IB xgtprime ygtprime z2wgtprime z2hgtprime1113872 1113873 minus CB (6)

OB xgtprime ygtprime wgtprime hgtprime1113872 1113873 minus CB minus IB (7)

Input feature map

Conv layer(3 times 3 times C times N)

Conv layer(1 times 1 times N times 2k)

Conv layer(1 times 1 times N times 4k)

Classification

Regression

Proposals

W times H times C

(a)

W times H times C

Conv layer(1 times 1 times C times 1)

Objectnessscore map

Probabilitymap

W times H times 1

Elementwise sigmoid

Conv layer(3 times 3 times C times N)

Conv layer(1 times 1 times N times 2k)

Conv layer(1 times 1 times N times 4k)

Classification

Regression

Proposals

(b)

Figure 3(e architecture of the original RPN (a) and the proposed RPNwith predicted location anchor (b) Multilayer RPN is used in this paper

(a) (b)

Figure 4 Example results of the original RPN (a) and the proposed RPN (b)

Table 2 Dataset summary

Dataset Number of images Number of license plates Image resolution License plate height (in pixels)

PKU vehicle dataset

G1 810 810 1082times 728 35ndash57G2 700 700 1082times 728 30ndash62G3 743 743 1082times 728 29ndash53G4 572 572 1600times1236 30ndash58G5 1152 1438 1600times1200 20ndash60

AOLP datasetAC 681 681 352times 240 25ndash70LE 757 757 640times 480 28ndash80RP 611 611 320times 240 30ndash70

6 Complexity

where z2 gt z1 Pixels inside CB are assigned as positive lo-cations while pixels inside OB are assigned as negativelocations Otherwise pixels inside IB are discarded intraining samples In the end for each image in the trainingset a binary label map where 1 represents a positive locationand 0 represents a negative location is generated for trainingthe anchor box location prediction branch Note that eachlevel of the balanced feature map should only assign objectsof a specific scale range so CB is only assigned on a featuremap that matches the scale range of the targeted object (esame regions of adjacent levels in the balanced featurepyramid are set as IB Finally focal loss function [5] isadopted to train the anchor box location prediction branchfor solving sample level imbalance problem

4 Results and Discussion

In order to compare the effectiveness of the proposed ap-proach with other state-of-the-art approaches on licenseplate detection this paper conducts experiments on twopublic datasets PKU vehicle dataset [13] and ApplicationOriented License Plate (AOLP) dataset [29] (e proposedapproach is implemented on a Window system machinewith Intel Core i7 8700 CPU NVIDIA GeForce GTX 1080GPU and 16Gb of RAM TensorFlow is adopted forimplementing deep CNN frameworks

41 Dataset and EvaluationMetric Two public license platedatasets are adopted to evaluate the performance of theproposed method in this paper including PKU vehicledataset [13] and AOLP dataset [29]

PKU vehicle dataset includes 3828 vehicle images cap-tured from various scenes under diverse environmentconditions (e image in this dataset is divided into fivegroups (G1-G5) corresponding to different configurationsMore specifically all images in G1 G2 and G3 group weretaken on highways while images in G4 group were taken oncity roads and images in G5 group were taken at inter-sections with crosswalks (e image in G4 group is capturedduring nighttime while the image in other groups is cap-tured during daytime (ere is one Chinese license plate ineach image of G1-G4 group while multiple Chinese licenseplates are captured in each image of G5 group For trainingthe proposed network this paper adopts CarFlag-Largedataset [30] which contains 460000 images with Chineselicense plates

AOLP dataset includes 2049 images of Taiwan licenseplates in various locations time traffic and weather con-ditions (is dataset is categorized into three subsets accesscontrol (AC) with 681 images traffic law enforcement (LE)with 757 images and road patrol (RP) with 611 images ACrefers to the cases that a vehicle passes a fixed passage at areduced speed or with a full stop LE refers to the cases that avehicle violates traffic laws and is captured by a roadsidecamera RP refers to the cases that the camera is installed orhandheld on a patrolling vehicle which takes images ofvehicles with arbitrary viewpoints and distances Each imagecontains one license plate Since there is no standard split for

AOLP dataset this paper follows the same strategy as in [30]for training the proposed network More specifically thispaper uses images from different subsets for training and testseparately In addition data augmentation is conducted byrotation and affine transformation to increase the number oftraining images In this paper PKU vehicle dataset andAOLP dataset are adopted to evaluate the performance of theproposed approach and compare the detection results withthe results of other state-of-the-art approaches Table 2shows the detailed descriptions of each dataset used inthis paper

For the evaluationmetric this paper follows the criterionused in [13] to evaluate the performance of the proposedmethod and other methods on the PKU vehicle dataset andAOLP dataset More specifically a detection is considered tobe correct if the license plate is totally encompassed by thebounding box and the IoU between the detected license plateand the ground-truth license plate is at least 05

42ExperimentalResults onPKUVehicleDataset In order toshow the effectiveness of the proposed approach this papercompares the performance results of the proposed methodwith the results of state-of-the-art license plate detectionmethods on PKU vehicle dataset including the methodsproposed by Zhou et al [31] Li et al [32] Yuan et al [13]and Li et al [30] Zhou et al [31] proposed to discover theprincipal visual word characterized with geometric contextfor each license plate character With a new license plateimage the license plates are extracted by matching localfeatures with principal visual word Li et al [32] usedmaximally stable extremal region detector to extract can-didate characters in images (e exact bounding boxes oflicense plates are estimated through the belief propagationinference on conditional random field which are constructedon the candidate characters in neighborhoods Yuan et al[13] proposed a novel line density filter approach to extractlicense candidate regions and a cascaded license plateclassifier based on linear support vector machines usingcolour saliency features is designed to identify the truelicense plate from among the candidate regions Li et al [30]proposed an approach to address both detection and rec-ognition of license plate using a single deep neural network

Table 3 shows the comparison of detection results on the PKU vehicle dataset. As shown in Table 3, the proposed approach achieves the best detection accuracy on the PKU vehicle dataset. More specifically, in terms of average detection performance, the proposed method improves by 9.53%, 8.23%, 2.06%, and 0.24% compared with the methods proposed by Zhou et al. [31], Li et al. [32], Yuan et al. [13], and Li et al. [30], respectively. It should be noted that the proposed method surpasses the best of the reference methods, proposed by Li et al. [30], by a significant margin on the G5 group. Images in the G5 group contain multiple license plates in difficult conditions, such as a large variance of scales, reflective glare, blur, and other defects. This result shows a strong ability of the proposed framework to detect license plates in difficult conditions with a large variance of scales. Figure 5(a) shows some examples of detection results of the proposed method on the PKU vehicle dataset. As shown in Figure 5(a), the proposed algorithm is effective in detecting license plates with different scales under different situations.

Table 3: Comparison of detection results on PKU vehicle dataset.

Method               Detection ratio (%)
                     G1      G2      G3      G4      G5      Average
Zhou et al. [31]     95.43   97.85   94.21   81.23   82.37   90.22
Li et al. [32]       98.89   98.42   95.83   81.17   83.31   91.52
Yuan et al. [13]     98.76   98.42   97.72   96.23   97.32   97.69
Li et al. [30]       99.88   99.71   99.46   99.83   98.68   99.51
Proposed approach    99.88   99.86   99.73   99.83   99.44   99.75

4.3. Experimental Results on AOLP Dataset. To further evaluate the effectiveness of the proposed framework, the performance of the proposed approach is tested on the AOLP dataset. Table 4 shows the comparison of detection results of the proposed method and the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30]. Experimental results in Table 4 show that the proposed method achieves the best detection ratio on all three subsets compared to the previous methods. More specifically, in terms of average detection, the proposed method improves by 4.42%, 2.23%, and 0.62% compared with the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30], respectively. The experimental results demonstrate that the proposed balanced feature pyramid and predicted location anchor can effectively enhance feature representation power and boost the performance of license plate detection in difficult conditions. Figure 5(b) shows some examples of detection results of the proposed method on the AOLP dataset. As can be observed, the proposed method can accurately locate small license plates as well as medium or large ones.

Table 4: Comparison of detection results on AOLP dataset.

Method               Detection ratio (%)
                     AC      LE      RP      Average
Hsu et al. [29]      96.0    95.0    94.0    95.0
Li et al. [33]       98.38   97.62   95.58   97.19
Li et al. [30]       99.12   99.08   98.20   98.80
Proposed approach    99.41   99.34   99.51   99.42

Figure 5: Examples of detection results of the proposed method on the PKU vehicle dataset (a) and the AOLP dataset (b).

4.4. Ablation Experiments. To evaluate the effectiveness of each module in the proposed approach, this paper conducts several experiments on the Chinese City Parking Dataset (CCPD) [34] and compares the detection results with the results of the original faster R-CNN [1] and the FPN with faster R-CNN baseline [4] framework. CCPD is a large, publicly available, labeled license plate dataset. It contains over 250k independent license plate images under diverse illuminations, environments, and backgrounds. Each image has a resolution of 720 × 1160 and contains one license plate. All images containing a license plate are divided into eight groups based on different conditions: CCPD-Base with 200k images, CCPD-FN with 20k images, CCPD-DB with 20k images, CCPD-Rotate with 10k images, CCPD-Tilt with 10k images, CCPD-Weather with 10k images, CCPD-Challenge with 10k images, and CCPD-Blur with 5k images. As in [34], this paper adopts 100k images in the CCPD-Base subset to train the proposed network and then evaluates the results on CCPD-Base, CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge.

In the first experiment, this paper replaces the VGG-16 network in the original faster R-CNN with the proposed balanced feature pyramid generation module; the RPN is kept unchanged in this experiment. To show the effectiveness of the L2 normalization, the L2 normalization layer in the balanced feature pyramid generation module is discarded in the second experiment. In the third experiment, this paper adds the proposed RPN with the predicted location anchor module to replace the original RPN; VGG-16 is kept unchanged as the base network in this experiment. In the fourth experiment, this paper adds both the proposed RPN with the predicted location anchor module and the proposed balanced feature pyramid generation module with the L2 normalization layer, replacing the original RPN and the VGG-16 architecture.
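For readability, the ablation settings described above can be summarized as configuration flags. The sketch below is only bookkeeping; the flag names are hypothetical and simply mirror the textual description.

```python
# Illustrative summary of the ablation settings plus the full model.
# Flag names are hypothetical; they only mirror the textual description above.
ablation_configs = {
    "faster_rcnn_baseline": {
        "backbone": "VGG-16", "balanced_pyramid": False,
        "l2_norm": False, "predicted_anchor_rpn": False,
    },
    "balanced_pyramid_with_l2": {
        "backbone": "ResNet-34", "balanced_pyramid": True,
        "l2_norm": True, "predicted_anchor_rpn": False,
    },
    "balanced_pyramid_without_l2": {
        "backbone": "ResNet-34", "balanced_pyramid": True,
        "l2_norm": False, "predicted_anchor_rpn": False,
    },
    "predicted_anchor_rpn_only": {
        "backbone": "VGG-16", "balanced_pyramid": False,
        "l2_norm": False, "predicted_anchor_rpn": True,
    },
    "full_model": {
        "backbone": "ResNet-34", "balanced_pyramid": True,
        "l2_norm": True, "predicted_anchor_rpn": True,
    },
}
```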

Table 5 shows the detection results of each experiment on the CCPD dataset. As shown in Table 5, compared with the original RPN in the faster R-CNN framework, the proposed predicted location anchor scheme improves the average detection by 0.2%. By generating good proposals, the features fed to the detection network are more discriminative, thus improving the detection results. Compared with the feature pyramid in the FPN with faster R-CNN baseline, the proposed balanced pyramid generation module improves the average detection by 1.5%. It should be noted that no parameters are added by the proposed module. With the proposed module, each level in the balanced feature pyramid obtains equal information from the other levels, thus improving the detection performance of the detection network. Compared with faster R-CNN and the FPN with faster R-CNN baseline, the proposed approach improves the average detection by 4.5% and 1.9%, respectively. The comparison results indicate that the proposed framework is superior to both single-scale and multiscale features for a region-based object detector. Furthermore, with the L2 normalization layer added on each of the rescaled features in the balanced feature pyramid generation module, the average detection is improved by 0.6% compared with the balanced feature pyramid generation module without L2 normalization. This result shows the effectiveness of the L2 normalization layer, which keeps the feature values from different convolution layers on the same scale.

Table 5: Detection results of each proposed network on CCPD dataset.

Network                                                                Detection performance (%)
                                                                       Base   DB     FN     Rotate  Tilt   Weather  Challenge  Average
Faster R-CNN                                                           98.1   92.1   83.7   91.8    89.4   81.1     83.9       88.6
FPN with faster R-CNN baseline                                         99.2   95.4   87.5   93.0    91.3   85.4     86.3       91.2
Faster R-CNN + balanced feature pyramid                                99.5   96.2   89.1   93.2    91.6   88.9     90.1       92.7
Faster R-CNN + balanced pyramid without L2-norm                        99.3   96.2   88.9   93.1    91.0   87.6     88.3       92.1
Faster R-CNN + predicted anchor RPN                                    98.5   92.6   84.0   91.5    89.4   81.4     84.4       88.8
Faster R-CNN + balanced pyramid with L2-norm + predicted anchor RPN    99.5   96.4   90.1   93.2    91.8   89.7     91.2       93.1
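To make the roles of rescaling, equal integration, and L2 normalization concrete, the following NumPy sketch shows one way such a balanced feature pyramid step could work: all levels are resized to a common resolution, L2-normalized so their magnitudes are comparable, averaged so every level contributes equally, and the integrated map is rescaled back and added to each original level. This is a schematic re-implementation under stated assumptions (equal channel counts across levels, nearest-neighbor resizing), not the exact module used in the paper.

```python
import numpy as np

def l2_normalize(feat, eps=1e-12):
    # L2-normalize each spatial position across channels so features from
    # different convolution layers share a comparable scale.
    norm = np.sqrt(np.sum(feat ** 2, axis=0, keepdims=True)) + eps
    return feat / norm

def resize_nearest(feat, out_h, out_w):
    # Nearest-neighbor resize of a (C, H, W) map (stand-in for interpolation/pooling).
    c, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows][:, :, cols]

def balanced_pyramid(levels, target_index=2):
    # levels: list of (C, H_i, W_i) maps, assumed to share the same channel count C
    # (e.g., after 1 x 1 lateral convolutions). Integrate at one chosen resolution.
    c, th, tw = levels[target_index].shape
    rescaled = [l2_normalize(resize_nearest(f, th, tw)) for f in levels]
    integrated = sum(rescaled) / len(rescaled)  # each level contributes equally
    # Redistribute: rescale the integrated map back and add it to every original level.
    return [f + resize_nearest(integrated, f.shape[1], f.shape[2]) for f in levels]
```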

5. Conclusions and Future Work

This paper proposes a novel deep learning-based framework for license plate detection. In the proposed framework, a balanced feature pyramid generation module based on the ResNet-34 architecture is used to generate an enhanced balanced feature pyramid, of which each feature level obtains equal information from the other feature levels. In addition, a multiscale region proposal network with a predicted location anchor scheme is introduced to generate good proposals from each level of the balanced feature pyramid. With good proposals generated from balanced feature maps, the proposed approach shows significant improvements compared with other approaches on license plate detection. The good performance of the proposed approach on license plate detection has a high reference value in the field of intelligent transport systems. For future work, this paper will explore and compare more feature combination and multiscale detection methods, such as DeepLabv3+ [35] and MOSI-LPD [21]. In addition, this paper will adopt the nonlocal module [36] to further refine the balanced semantic features; this step may enhance the integrated features and further improve the detection results.

Data Availability

The codes used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.

[2] W. Liu, D. Anguelov, D. Erhan et al., "Single shot multibox detector," 2016, https://arxiv.org/abs/1512.02325.

[3] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, http://arxiv.org/abs/1804.02767.

[4] T. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.

[5] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007, Venice, Italy, October 2017.

[6] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," 2016, http://arxiv.org/abs/1607.07155.

[7] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: deconvolutional single shot detector," 2017, http://arxiv.org/abs/1701.06659.

[8] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 2015.

[9] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems, pp. 379–387, MIT Press, Cambridge, MA, USA, 2016.

[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," 2016, http://arxiv.org/abs/1506.02640.

[11] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2874–2883, Las Vegas, NV, USA, June 2016.

[12] K. S. Raghunandan, P. Shivakumara, H. A. Jalab et al., "Riesz fractional based model for enhancing license plate detection and recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, pp. 2276–2288, 2018.

[13] Y. Yuan, W. Zou, Y. Zhao, X. Wang, X. Hu, and N. Komodakis, "A robust and efficient approach to license plate detection," IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1102–1114, 2017.

[14] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096–1107, 2016.

[15] A. H. Ashtari, M. J. Nordin, and M. Fathy, "An Iranian license plate recognition system based on color features," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 4, pp. 1690–1705, 2014.

[16] S. G. Kim, H. G. Jeon, and H. I. Koo, "Deep-learning-based license plate detection method using vehicle region extraction," Electronics Letters, vol. 53, no. 15, pp. 1034–1036, 2017.

[17] O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve, "Segmentation- and annotation-free license plate recognition with deep localization and failure identification," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 9, pp. 2351–2363, 2017.

[18] F. Xie, M. Zhang, J. Zhao, J. Yang, Y. Liu, and X. Yuan, "A robust license plate detection and character recognition algorithm based on a combined feature extraction model and BPNN," Journal of Advanced Transportation, vol. 2018, Article ID 6737314, 14 pages, 2018.

[19] L. Zou, M. Zhao, Z. Gao, M. Cao, H. Jia, and M. Pei, "License plate detection with shallow and deep CNNs in complex environments," Complexity, vol. 2018, Article ID 7984653, 6 pages, 2018.

[20] L. Xie, T. Ahmad, L. Jin, Y. Liu, and S. Zhang, "A new CNN-based method for multi-directional car license plate detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 507–517, 2018.

[21] J. Han, J. Yao, J. Zhao, J. Tu, and Y. Liu, "Multi-oriented and scale-invariant license plate detection based on convolutional neural networks," Sensors, vol. 19, no. 5, p. 1175, 2019.

[22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.

[23] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, http://arxiv.org/abs/1409.1556.

[24] J. Huang, "Speed/accuracy trade-offs for modern convolutional object detectors," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3296–3297, Honolulu, HI, USA, July 2017.

[25] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," 2014, http://arxiv.org/abs/1409.0575.

[26] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, Springer, Berlin, Germany, 2014.

[27] J. Pang, "Libra R-CNN: towards balanced learning for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 2019.

[28] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2965–2974, Long Beach, CA, USA, June 2019.

[29] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Transactions on Vehicular Technology, vol. 62, no. 2, pp. 552–561, 2013.

[30] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126–1136, 2019.

[31] W. Zhou, H. Li, Y. Lu, and Q. Tian, "Principal visual word discovery for automatic license plate detection," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4269–4279, 2012.

[32] B. Li, B. Tian, Y. Li, and D. Wen, "Component-based license plate detection using conditional random field model," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1690–1699, 2013.

[33] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, https://arxiv.org/abs/1601.05610.

[34] Z. Xu, W. Yang, A. Meng et al., "Towards end-to-end license plate detection and recognition: a large dataset and baseline," Computer Vision – ECCV 2018, Springer, Berlin, Germany, pp. 261–277, 2018.

[35] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, Munich, Germany, September 2018.

[36] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, Honolulu, HI, USA, July 2018.


AOLP dataset includes 2049 images of Taiwan licenseplates in various locations time traffic and weather con-ditions (is dataset is categorized into three subsets accesscontrol (AC) with 681 images traffic law enforcement (LE)with 757 images and road patrol (RP) with 611 images ACrefers to the cases that a vehicle passes a fixed passage at areduced speed or with a full stop LE refers to the cases that avehicle violates traffic laws and is captured by a roadsidecamera RP refers to the cases that the camera is installed orhandheld on a patrolling vehicle which takes images ofvehicles with arbitrary viewpoints and distances Each imagecontains one license plate Since there is no standard split for

AOLP dataset this paper follows the same strategy as in [30]for training the proposed network More specifically thispaper uses images from different subsets for training and testseparately In addition data augmentation is conducted byrotation and affine transformation to increase the number oftraining images In this paper PKU vehicle dataset andAOLP dataset are adopted to evaluate the performance of theproposed approach and compare the detection results withthe results of other state-of-the-art approaches Table 2shows the detailed descriptions of each dataset used inthis paper

For the evaluationmetric this paper follows the criterionused in [13] to evaluate the performance of the proposedmethod and other methods on the PKU vehicle dataset andAOLP dataset More specifically a detection is considered tobe correct if the license plate is totally encompassed by thebounding box and the IoU between the detected license plateand the ground-truth license plate is at least 05

42ExperimentalResults onPKUVehicleDataset In order toshow the effectiveness of the proposed approach this papercompares the performance results of the proposed methodwith the results of state-of-the-art license plate detectionmethods on PKU vehicle dataset including the methodsproposed by Zhou et al [31] Li et al [32] Yuan et al [13]and Li et al [30] Zhou et al [31] proposed to discover theprincipal visual word characterized with geometric contextfor each license plate character With a new license plateimage the license plates are extracted by matching localfeatures with principal visual word Li et al [32] usedmaximally stable extremal region detector to extract can-didate characters in images (e exact bounding boxes oflicense plates are estimated through the belief propagationinference on conditional random field which are constructedon the candidate characters in neighborhoods Yuan et al[13] proposed a novel line density filter approach to extractlicense candidate regions and a cascaded license plateclassifier based on linear support vector machines usingcolour saliency features is designed to identify the truelicense plate from among the candidate regions Li et al [30]proposed an approach to address both detection and rec-ognition of license plate using a single deep neural network

Table 3 shows the comparison of detection results onPKU vehicle dataset As shown in Table 3 the proposedapproach achieves the best detection accuracy on PKUvehicle dataset More specifically in terms of average de-tection performance the performance of the proposedmethod is improved by 953 823 206 and 024compared with methods proposed by Zhou et al [31] Liet al [32] Yuan et al [13] and Li et al [30] respectively Itshould be noted that the performance of the proposedmethod surpasses the best of the reference methods pro-posed by Li et al [30] by a significant margin on G5 groupImages in G5 group contain multiple license plates in dif-ficult conditions such as large variance of scales reflectiveglare and blurry and are affected by defects (is resultshows a strong ability of the proposed framework ondetecting license plate in difficult conditions with a large

Complexity 7

variance of scales Figure 5(a) shows some examples ofdetection results of the proposed method on PKU vehicledataset As shown in Figure 5(a) the proposed algorithm iseffective to detect license plates with different scales underdifferent situations

43 Experimental Results on AOLP Dataset To furtherevaluate the effectiveness of the proposed framework theperformance of proposed approach is tested on AOLPdataset Table 4 shows the comparison of detection resultsof the proposed method and methods proposed by Hsuet al [29] Li et al [33] and Li et al [30] Experimentalresults in Table 4 show that the proposed method achievesthe best detection ratio on all three subsets compared to theprevious methods More specifically in terms of averagedetection the performance of the proposed method isimproved by 442 223 and 062 compared withmethods proposed by Hsu et al [29] Li et al [33] and Liet al [30] respectively (e experimental results demon-strate that the proposed balanced feature pyramid andpredicted location anchor can effectively enhance featurerepresentation power and boost the performance of licenseplate detection in difficult conditions Figure 5(b) showssome examples of detection results of the proposed methodon AOLP dataset As can be observed the proposedmethod can accurately locate small license plates as well asmedium or large ones

44 Ablation Experiments To evaluate the effectiveness ofeach module in the proposed approach this paper conductsseveral experiments on the Chinese City Parking Dataset(CCPD) [34] and compares the detection results with theresults of original faster R-CNN [1] and FPN with fasterR-CNN baseline [4] framework CCPD dataset is a largepublicly available labeled license plate dataset It contains25k independent license plate images under diverse illu-minations environments and backgrounds Each image hasresolution of 720times1160 and contains one license plate Allimages containing license plate are divided into 8 groupsbased on different conditions CCPD-Base with 200k imagesCCPD-FN with 20k images CCPD-DB with 20k imagesCCPD-Rotate with 10k images CCPD-Tilt with 10k imagesCCPD-Weather with 10k images CCPD-Challenge with 10kimages CCPD-Blur with 5k images As in [34] this paperadopts 100k images in CCPD-Base subset to train theproposed network and then evaluates the results on CCPD-Base CCPD-DB CCPD-FN CCPD-Rotate CCPD-TiltCCPD-Weather and CCPD-Challenge

In the first experiment this paper replaces VGG-16network in original faster R-CNN by the proposed bal-anced feature pyramid generation module (e RPN net-work is kept unchanged in this experiment To show theeffectiveness of the L2 normalization L2 normalizationlayer in the balanced feature pyramid generation module isdiscarded in the second experiment In the third experi-ment this paper adds the proposed RPN network with thepredicted location anchor module to replace the originalRPN network (e VGG-16 is kept unchanged as the basenetwork in this experiment In the fourth experiment thispaper adds both the proposed RPN network with thepredicted location anchor module and the proposed bal-anced feature pyramid generation module with L2 nor-malization layer to replace the original RPN network andVGG-16 architecture

Table 5 shows the detection results of each proposedexperiment on the CCPD dataset As shown in Table 5comparing with the original RPN in faster R-CNNframework the proposed predicted location anchorscheme improves the average detection by 02 Bygenerating good proposals the features for the detectionnetwork are more discriminative thus improving thedetection results Comparing with the feature pyramid inFPN with faster R-CNN baseline the proposed balancedpyramid generation module improves the average de-tection by 15 It should be noted that there is no pa-rameter added in the proposed module With theproposed module each level in the balanced featurepyramid obtains equal information from other levelsthus improving the detection performance of the de-tection network Comparing with faster R-CNN and FPNwith faster R-CNN baseline the proposed approachimproves the average detection by 45 and 19 re-spectively (e comparison results indicate that theproposed framework is superior to both single scale andmultiscale features for a region-based object detectorFurthermore with L2 normalization layer added on eachof rescaled features in the balanced feature pyramidgeneration module the average detection is improved by06 compared with the balanced feature pyramid gen-eration module without L2 normalization (is resultshows the effectiveness of the L2 normalization layerwhich keeps the feature values from different convolutionlayers on the same scale

5 Conclusions and Future Work

(is paper proposes a novel deep learning-based frameworkfor license plate detection In the proposed framework abalanced feature pyramid generation module based onResNet-34 architecture is used to generate enhanced bal-anced feature pyramid of which each feature level obtainsequal information from other feature levels In addition amultiscale region proposal network with predicted locationanchor scheme is introduced to generate good proposalsfrom each level of the balanced feature pyramid With goodproposals generated from balanced feature maps the pro-posed approach shows significant improvements compared

Table 3 Comparison of detection results on PKU vehicle dataset

MethodDetection ratio ()

G1 G2 G3 G4 G5 AverageZhou et al [31] 9543 9785 9421 8123 8237 9022Li et al [32] 9889 9842 9583 8117 8331 9152Yuan et al [13] 9876 9842 9772 9623 9732 9769Li et al [30] 9988 9971 9946 9983 9868 9951Proposed approach 9988 9986 9973 9983 9944 9975

8 Complexity

with other approaches on license plate detection (e goodperformance of the proposed approach on license platedetection has a high reference value in the field of intelligent

transport systems For the future work this paper will ex-plore and compare more feature combination andmultiscaledetection methods such as DeepLabv3+ [35] and MOSI-

(a) (b)

Figure 5 Examples of detection results of the proposed method on PKU vehicle dataset (a) and AOLP dataset (b)

Complexity 9

LPD [21] In addition this paper will adopt the nonlocalmodule [36] to further refine the balanced semantic features(is step may enhance the integrated features and furtherimprove the detection results

Data Availability

(e codes used in this paper are available from the corre-sponding author upon request

Conflicts of Interest

(e author declares that there are no conflicts of interestregarding the publication of this paper

References

[1] S Ren K He R Girshick and J Sun ldquoFaster r-cnn towardsreal-time object detection with region proposal networksrdquoIEEE Transactions on Pattern Analysis and Machine Intelli-gence vol 39 no 6 pp 1137ndash1149 2015

[2] W Liu D Anguelov D Erhan et al ldquoSingle shot multiboxdetectorrdquo 2016 httpsarxivorgabs151202325

[3] J Redmon and F Ali ldquoYolov3 an incremental improvementrdquo2018 httparxivorgabs180402767

[4] T Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) pp 936ndash944 Hon-olulu HI USA July 2017

[5] T Lin P Goyal R Girshick K He and P Dollar ldquoFocal lossfor dense object detectionrdquo in Proceedings of the 2017 IEEEInternational Conference on Computer Vision (ICCV)pp 2999ndash3007 Venice Italy October 2017

[6] Z Cai Q Fan R S Feris and N Vasconcelos ldquoA unifiedmulti-scale deep convolutional neural network for fast objectdetectionrdquo 2016 httparxivorgabs160707155

[7] C-Y Fu W Liu A Ranga A Tyagi and A C Berg ldquoDssddeconvolutional single shot detectorrdquo 2017 httparxivorgabs170106659

[8] R Girshick ldquoFast R-CNNrdquo in Proceedings of the IEEE In-ternational Conference on Computer Vision (ICCV) SantiagoChile December 2015

[9] J Dai Yi Li K He and J Sun ldquoR-fcn object detection viaregion-based fully convolutional networksrdquo Advances inNeural information Processing Systems pp 379ndash387 MitPress Cambridge MA USA 2016

[10] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once unified real-time object detectionrdquo 2016 httparxivorgabs150602640

[11] S Bell C L Zitnick K Bala and R Girshick ldquoInside-outsidenet detecting objects in context with skip pooling and re-current neural networksrdquo in Proceedings of the IEEE Con-ference on Computer Vision and Pattern Recognitionpp 2874ndash2883 Las Vegas NV USA June 2016

[12] K S Raghunandan P Shivakumara H A Jalab et al ldquoRieszfractional based model for enhancing license plate detectionand recognitionrdquo IEEE Transactions on Circuits and Systemsfor Video Technology vol 28 no 9 pp 2276ndash2288 2018

[13] Y Yuan W Zou Y Zhao X Wang X Hu andN Komodakis ldquoA robust and efficient approach to licenseplate detectionrdquo IEEE Transactions on Image Processingvol 26 no 3 pp 1102ndash1114 2017

[14] C Gou K Wang Y Yao and Z Li ldquoVehicle license platerecognition based on extremal regions and restricted Boltz-mann machinesrdquo IEEE Transactions on Intelligent Trans-portation Systems vol 17 no 4 pp 1096ndash1107 2016

[15] A H Ashtari M J Nordin andM Fathy ldquoAn Iranian licenseplate recognition system based on color featuresrdquo IEEETransactions on Intelligent Transportation Systems vol 15no 4 pp 1690ndash1705 2014

[16] S G Kim H G Jeon and H I Koo ldquoDeep-learning-basedlicense plate detection method using vehicle region extrac-tionrdquo Electronics Letters vol 53 no 15 pp 1034ndash1036 2017

[17] O Bulan V Kozitsky P Ramesh and M Shreve ldquoSeg-mentation- and annotation-free license plate recognition withdeep localization and failure identificationrdquo IEEE Transac-tions on Intelligent Transportation Systems vol 18 no 9pp 2351ndash2363 2017

[18] F Xie M Zhang J Zhao J Yang Y Liu and X Yuan ldquoArobust license plate detection and character recognition

Table 4 Comparison of detection results on AOLP dataset

MethodDetection ratio ()

AC LE RP AverageHsu et al [29] 960 950 940 95Li et al [33] 9838 9762 9558 9719Li et al [30] 9912 9908 9820 988Proposed approach 9941 9934 9951 9942

Table 5 Detection results of each proposed network on CCPD dataset

NetworkDetection performance ()

Base DB FN Rotate Tilt Weather Challenge AverageFaster R-CNN 981 921 837 918 894 811 839 886FPN with faster R-CNN baseline 992 954 875 930 913 854 863 912Faster R-CNN+balanced feature pyramid 995 962 891 932 916 889 901 927Faster R-CNN+balanced pyramid without L2-norm 993 962 889 931 910 876 883 921Faster R-CNN+predicted anchor RPN 985 926 840 915 894 814 844 888Faster R-CNN+balanced pyramid with L2-norm+predicted anchor RPN 995 964 901 932 918 897 912 931

10 Complexity

algorithm based on a combined feature extraction model andBPNNrdquo Journal of Advanced Transportation vol 2018 ArticleID 6737314 14 pages 2018

[19] L Zou M Zhao Z Gao M Cao H Jia and M Pei ldquoLicenseplate detection with shallow and deep CNNs in complexenvironmentsrdquo Complexity vol 2018 Article ID 79846536 pages 2018

[20] L Xie T Ahmad L Jin Y Liu and S Zhang ldquoA new CNN-based method for multi-directional car license plate detec-tionrdquo IEEE Transactions on Intelligent Transportation Systemsvol 19 no 2 pp 507ndash517 2018

[21] J Han J Yao J Zhao J Tu and Y Liu ldquoMulti-oriented andscale-invariant license plate detection based on convolutionalneural networksrdquo Sensors vol 19 no 5 p 1175 2019

[22] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the 2016 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 770ndash778 Las Vegas NV USA June 2016

[23] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httparxivorgabs14091556

[24] J Huang ldquoSpeedaccuracy trade-offs for modern convolu-tional object detectorsrdquo in Proceedings of the 2017 IEEEConference on Computer Vision and Pattern Recognition(CVPR) pp 3296-3297 Honolulu HI USA July 2017

[25] O Russakovsky J Deng H Su et al ldquoImagenet large scalevisual recognition challengerdquo 2014 httparxivorgabs14090575

[26] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in European Conference on Com-puter Vision Springer Berlin Germany 2014

[27] J Pang ldquoLibra r-cnn towards balanced learning for objectdetectionrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition Long Beach CA USA June2019

[28] J Wang K Chen S Yang C C Loy and D Lin ldquoRegionproposal by guided anchoringrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognitionpp 2965ndash2974 Long Beach CA USA June 2019

[29] G-S Hsu J-C Chen and Y-Z Chung ldquoApplication-ori-ented license plate recognitionrdquo IEEE Transactions on Ve-hicular Technology vol 62 no 2 pp 552ndash561 2013

[30] H Li P Wang and C Shen ldquoToward end-to-end car licenseplate detection and recognition with deep neural networksrdquoIEEE Transactions on Intelligent Transportation Systemsvol 20 no 3 pp 1126ndash1136 2019

[31] W Zhou H Li Y Lu and Q Tian ldquoPrincipal visual worddiscovery for automatic license plate detectionrdquo IEEETransactions on Image Processing vol 21 no 9 pp 4269ndash4279 2012

[32] B Li B Tian Y Li and D Wen ldquoComponent-based licenseplate detection using conditional random field modelrdquo IEEETransactions on Intelligent Transportation Systems vol 14no 4 pp 1690ndash1699 2013

[33] H Li and C Shen ldquoReading car license plates using deepconvolutional neural networks and LSTMsrdquo 2016 httpsarxivorgabs160105610

[34] Z Xu W Yang A Meng et al ldquoTowards end-to-end licenseplate detection and recognition a large dataset and baselinerdquoComputer VisionmdashECCV 2018 Springer Berlin Germanypp 261ndash277 2018

[35] L-C Chen Y Zhu P George F Schroff and H AdamldquoEncoder-decoder with atrous separable convolution for se-mantic image segmentationrdquo in Proceedings of the European

Conference on Computer Vision (ECCV) pp 801ndash818Munich Germany September 2018

[36] X Wang R Girshick A Gupta and K He ldquoNon-local neuralnetworksrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7794ndash7803 Honolulu HIUSA July 2018

Complexity 11

Page 7: PredictedAnchorRegionProposalwithBalancedFeature ...downloads.hindawi.com › journals › complexity › 2020 › 5137056.pdf · R-CNN for license plate detection. In the first

where z2 gt z1 Pixels inside CB are assigned as positive lo-cations while pixels inside OB are assigned as negativelocations Otherwise pixels inside IB are discarded intraining samples In the end for each image in the trainingset a binary label map where 1 represents a positive locationand 0 represents a negative location is generated for trainingthe anchor box location prediction branch Note that eachlevel of the balanced feature map should only assign objectsof a specific scale range so CB is only assigned on a featuremap that matches the scale range of the targeted object (esame regions of adjacent levels in the balanced featurepyramid are set as IB Finally focal loss function [5] isadopted to train the anchor box location prediction branchfor solving sample level imbalance problem

4 Results and Discussion

In order to compare the effectiveness of the proposed ap-proach with other state-of-the-art approaches on licenseplate detection this paper conducts experiments on twopublic datasets PKU vehicle dataset [13] and ApplicationOriented License Plate (AOLP) dataset [29] (e proposedapproach is implemented on a Window system machinewith Intel Core i7 8700 CPU NVIDIA GeForce GTX 1080GPU and 16Gb of RAM TensorFlow is adopted forimplementing deep CNN frameworks

41 Dataset and EvaluationMetric Two public license platedatasets are adopted to evaluate the performance of theproposed method in this paper including PKU vehicledataset [13] and AOLP dataset [29]

PKU vehicle dataset includes 3828 vehicle images cap-tured from various scenes under diverse environmentconditions (e image in this dataset is divided into fivegroups (G1-G5) corresponding to different configurationsMore specifically all images in G1 G2 and G3 group weretaken on highways while images in G4 group were taken oncity roads and images in G5 group were taken at inter-sections with crosswalks (e image in G4 group is capturedduring nighttime while the image in other groups is cap-tured during daytime (ere is one Chinese license plate ineach image of G1-G4 group while multiple Chinese licenseplates are captured in each image of G5 group For trainingthe proposed network this paper adopts CarFlag-Largedataset [30] which contains 460000 images with Chineselicense plates

AOLP dataset includes 2049 images of Taiwan licenseplates in various locations time traffic and weather con-ditions (is dataset is categorized into three subsets accesscontrol (AC) with 681 images traffic law enforcement (LE)with 757 images and road patrol (RP) with 611 images ACrefers to the cases that a vehicle passes a fixed passage at areduced speed or with a full stop LE refers to the cases that avehicle violates traffic laws and is captured by a roadsidecamera RP refers to the cases that the camera is installed orhandheld on a patrolling vehicle which takes images ofvehicles with arbitrary viewpoints and distances Each imagecontains one license plate Since there is no standard split for

AOLP dataset this paper follows the same strategy as in [30]for training the proposed network More specifically thispaper uses images from different subsets for training and testseparately In addition data augmentation is conducted byrotation and affine transformation to increase the number oftraining images In this paper PKU vehicle dataset andAOLP dataset are adopted to evaluate the performance of theproposed approach and compare the detection results withthe results of other state-of-the-art approaches Table 2shows the detailed descriptions of each dataset used inthis paper

For the evaluationmetric this paper follows the criterionused in [13] to evaluate the performance of the proposedmethod and other methods on the PKU vehicle dataset andAOLP dataset More specifically a detection is considered tobe correct if the license plate is totally encompassed by thebounding box and the IoU between the detected license plateand the ground-truth license plate is at least 05

42ExperimentalResults onPKUVehicleDataset In order toshow the effectiveness of the proposed approach this papercompares the performance results of the proposed methodwith the results of state-of-the-art license plate detectionmethods on PKU vehicle dataset including the methodsproposed by Zhou et al [31] Li et al [32] Yuan et al [13]and Li et al [30] Zhou et al [31] proposed to discover theprincipal visual word characterized with geometric contextfor each license plate character With a new license plateimage the license plates are extracted by matching localfeatures with principal visual word Li et al [32] usedmaximally stable extremal region detector to extract can-didate characters in images (e exact bounding boxes oflicense plates are estimated through the belief propagationinference on conditional random field which are constructedon the candidate characters in neighborhoods Yuan et al[13] proposed a novel line density filter approach to extractlicense candidate regions and a cascaded license plateclassifier based on linear support vector machines usingcolour saliency features is designed to identify the truelicense plate from among the candidate regions Li et al [30]proposed an approach to address both detection and rec-ognition of license plate using a single deep neural network

Table 3 shows the comparison of detection results onPKU vehicle dataset As shown in Table 3 the proposedapproach achieves the best detection accuracy on PKUvehicle dataset More specifically in terms of average de-tection performance the performance of the proposedmethod is improved by 953 823 206 and 024compared with methods proposed by Zhou et al [31] Liet al [32] Yuan et al [13] and Li et al [30] respectively Itshould be noted that the performance of the proposedmethod surpasses the best of the reference methods pro-posed by Li et al [30] by a significant margin on G5 groupImages in G5 group contain multiple license plates in dif-ficult conditions such as large variance of scales reflectiveglare and blurry and are affected by defects (is resultshows a strong ability of the proposed framework ondetecting license plate in difficult conditions with a large

Complexity 7

variance of scales Figure 5(a) shows some examples ofdetection results of the proposed method on PKU vehicledataset As shown in Figure 5(a) the proposed algorithm iseffective to detect license plates with different scales underdifferent situations

43 Experimental Results on AOLP Dataset To furtherevaluate the effectiveness of the proposed framework theperformance of proposed approach is tested on AOLPdataset Table 4 shows the comparison of detection resultsof the proposed method and methods proposed by Hsuet al [29] Li et al [33] and Li et al [30] Experimentalresults in Table 4 show that the proposed method achievesthe best detection ratio on all three subsets compared to theprevious methods More specifically in terms of averagedetection the performance of the proposed method isimproved by 442 223 and 062 compared withmethods proposed by Hsu et al [29] Li et al [33] and Liet al [30] respectively (e experimental results demon-strate that the proposed balanced feature pyramid andpredicted location anchor can effectively enhance featurerepresentation power and boost the performance of licenseplate detection in difficult conditions Figure 5(b) showssome examples of detection results of the proposed methodon AOLP dataset As can be observed the proposedmethod can accurately locate small license plates as well asmedium or large ones

44 Ablation Experiments To evaluate the effectiveness ofeach module in the proposed approach this paper conductsseveral experiments on the Chinese City Parking Dataset(CCPD) [34] and compares the detection results with theresults of original faster R-CNN [1] and FPN with fasterR-CNN baseline [4] framework CCPD dataset is a largepublicly available labeled license plate dataset It contains25k independent license plate images under diverse illu-minations environments and backgrounds Each image hasresolution of 720times1160 and contains one license plate Allimages containing license plate are divided into 8 groupsbased on different conditions CCPD-Base with 200k imagesCCPD-FN with 20k images CCPD-DB with 20k imagesCCPD-Rotate with 10k images CCPD-Tilt with 10k imagesCCPD-Weather with 10k images CCPD-Challenge with 10kimages CCPD-Blur with 5k images As in [34] this paperadopts 100k images in CCPD-Base subset to train theproposed network and then evaluates the results on CCPD-Base CCPD-DB CCPD-FN CCPD-Rotate CCPD-TiltCCPD-Weather and CCPD-Challenge

In the first experiment this paper replaces VGG-16network in original faster R-CNN by the proposed bal-anced feature pyramid generation module (e RPN net-work is kept unchanged in this experiment To show theeffectiveness of the L2 normalization L2 normalizationlayer in the balanced feature pyramid generation module isdiscarded in the second experiment In the third experi-ment this paper adds the proposed RPN network with thepredicted location anchor module to replace the originalRPN network (e VGG-16 is kept unchanged as the basenetwork in this experiment In the fourth experiment thispaper adds both the proposed RPN network with thepredicted location anchor module and the proposed bal-anced feature pyramid generation module with L2 nor-malization layer to replace the original RPN network andVGG-16 architecture

Table 5 shows the detection results of each proposedexperiment on the CCPD dataset As shown in Table 5comparing with the original RPN in faster R-CNNframework the proposed predicted location anchorscheme improves the average detection by 02 Bygenerating good proposals the features for the detectionnetwork are more discriminative thus improving thedetection results Comparing with the feature pyramid inFPN with faster R-CNN baseline the proposed balancedpyramid generation module improves the average de-tection by 15 It should be noted that there is no pa-rameter added in the proposed module With theproposed module each level in the balanced featurepyramid obtains equal information from other levelsthus improving the detection performance of the de-tection network Comparing with faster R-CNN and FPNwith faster R-CNN baseline the proposed approachimproves the average detection by 45 and 19 re-spectively (e comparison results indicate that theproposed framework is superior to both single scale andmultiscale features for a region-based object detectorFurthermore with L2 normalization layer added on eachof rescaled features in the balanced feature pyramidgeneration module the average detection is improved by06 compared with the balanced feature pyramid gen-eration module without L2 normalization (is resultshows the effectiveness of the L2 normalization layerwhich keeps the feature values from different convolutionlayers on the same scale

5 Conclusions and Future Work

(is paper proposes a novel deep learning-based frameworkfor license plate detection In the proposed framework abalanced feature pyramid generation module based onResNet-34 architecture is used to generate enhanced bal-anced feature pyramid of which each feature level obtainsequal information from other feature levels In addition amultiscale region proposal network with predicted locationanchor scheme is introduced to generate good proposalsfrom each level of the balanced feature pyramid With goodproposals generated from balanced feature maps the pro-posed approach shows significant improvements compared

Table 3 Comparison of detection results on PKU vehicle dataset

MethodDetection ratio ()

G1 G2 G3 G4 G5 AverageZhou et al [31] 9543 9785 9421 8123 8237 9022Li et al [32] 9889 9842 9583 8117 8331 9152Yuan et al [13] 9876 9842 9772 9623 9732 9769Li et al [30] 9988 9971 9946 9983 9868 9951Proposed approach 9988 9986 9973 9983 9944 9975

8 Complexity

with other approaches on license plate detection (e goodperformance of the proposed approach on license platedetection has a high reference value in the field of intelligent

transport systems For the future work this paper will ex-plore and compare more feature combination andmultiscaledetection methods such as DeepLabv3+ [35] and MOSI-

(a) (b)

Figure 5 Examples of detection results of the proposed method on PKU vehicle dataset (a) and AOLP dataset (b)

Complexity 9

LPD [21] In addition this paper will adopt the nonlocalmodule [36] to further refine the balanced semantic features(is step may enhance the integrated features and furtherimprove the detection results

Data Availability

(e codes used in this paper are available from the corre-sponding author upon request

Conflicts of Interest

(e author declares that there are no conflicts of interestregarding the publication of this paper

References

[1] S Ren K He R Girshick and J Sun ldquoFaster r-cnn towardsreal-time object detection with region proposal networksrdquoIEEE Transactions on Pattern Analysis and Machine Intelli-gence vol 39 no 6 pp 1137ndash1149 2015

[2] W Liu D Anguelov D Erhan et al ldquoSingle shot multiboxdetectorrdquo 2016 httpsarxivorgabs151202325

[3] J Redmon and F Ali ldquoYolov3 an incremental improvementrdquo2018 httparxivorgabs180402767

[4] T Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the 2017 IEEE Conference on ComputerVision and Pattern Recognition (CVPR) pp 936ndash944 Hon-olulu HI USA July 2017

[5] T Lin P Goyal R Girshick K He and P Dollar ldquoFocal lossfor dense object detectionrdquo in Proceedings of the 2017 IEEEInternational Conference on Computer Vision (ICCV)pp 2999ndash3007 Venice Italy October 2017

[6] Z Cai Q Fan R S Feris and N Vasconcelos ldquoA unifiedmulti-scale deep convolutional neural network for fast objectdetectionrdquo 2016 httparxivorgabs160707155

[7] C-Y Fu W Liu A Ranga A Tyagi and A C Berg ldquoDssddeconvolutional single shot detectorrdquo 2017 httparxivorgabs170106659

[8] R Girshick ldquoFast R-CNNrdquo in Proceedings of the IEEE In-ternational Conference on Computer Vision (ICCV) SantiagoChile December 2015

[9] J Dai Yi Li K He and J Sun ldquoR-fcn object detection viaregion-based fully convolutional networksrdquo Advances inNeural information Processing Systems pp 379ndash387 MitPress Cambridge MA USA 2016

[10] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once unified real-time object detectionrdquo 2016 httparxivorgabs150602640

[11] S Bell C L Zitnick K Bala and R Girshick ldquoInside-outsidenet detecting objects in context with skip pooling and re-current neural networksrdquo in Proceedings of the IEEE Con-ference on Computer Vision and Pattern Recognitionpp 2874ndash2883 Las Vegas NV USA June 2016

[12] K S Raghunandan P Shivakumara H A Jalab et al ldquoRieszfractional based model for enhancing license plate detectionand recognitionrdquo IEEE Transactions on Circuits and Systemsfor Video Technology vol 28 no 9 pp 2276ndash2288 2018

[13] Y Yuan W Zou Y Zhao X Wang X Hu andN Komodakis ldquoA robust and efficient approach to licenseplate detectionrdquo IEEE Transactions on Image Processingvol 26 no 3 pp 1102ndash1114 2017

[14] C Gou K Wang Y Yao and Z Li ldquoVehicle license platerecognition based on extremal regions and restricted Boltz-mann machinesrdquo IEEE Transactions on Intelligent Trans-portation Systems vol 17 no 4 pp 1096ndash1107 2016

[15] A H Ashtari M J Nordin andM Fathy ldquoAn Iranian licenseplate recognition system based on color featuresrdquo IEEETransactions on Intelligent Transportation Systems vol 15no 4 pp 1690ndash1705 2014

[16] S G Kim H G Jeon and H I Koo ldquoDeep-learning-basedlicense plate detection method using vehicle region extrac-tionrdquo Electronics Letters vol 53 no 15 pp 1034ndash1036 2017


variance of scales. Figure 5(a) shows some examples of detection results of the proposed method on the PKU vehicle dataset. As shown in Figure 5(a), the proposed algorithm is effective in detecting license plates with different scales under different situations.

Table 3: Comparison of detection results on the PKU vehicle dataset (detection ratio, %).

Method              G1      G2      G3      G4      G5      Average
Zhou et al. [31]    95.43   97.85   94.21   81.23   82.37   90.22
Li et al. [32]      98.89   98.42   95.83   81.17   83.31   91.52
Yuan et al. [13]    98.76   98.42   97.72   96.23   97.32   97.69
Li et al. [30]      99.88   99.71   99.46   99.83   98.68   99.51
Proposed approach   99.88   99.86   99.73   99.83   99.44   99.75

Figure 5: Examples of detection results of the proposed method on the PKU vehicle dataset (a) and the AOLP dataset (b).

4.3. Experimental Results on AOLP Dataset. To further evaluate the effectiveness of the proposed framework, the performance of the proposed approach is tested on the AOLP dataset. Table 4 shows the comparison of detection results of the proposed method and the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30]. The experimental results in Table 4 show that the proposed method achieves the best detection ratio on all three subsets compared with the previous methods. More specifically, in terms of average detection, the performance of the proposed method is improved by 4.42%, 2.23%, and 0.62% compared with the methods proposed by Hsu et al. [29], Li et al. [33], and Li et al. [30], respectively. The experimental results demonstrate that the proposed balanced feature pyramid and predicted location anchor can effectively enhance feature representation power and boost the performance of license plate detection in difficult conditions. Figure 5(b) shows some examples of detection results of the proposed method on the AOLP dataset. As can be observed, the proposed method can accurately locate small license plates as well as medium or large ones.

Table 4: Comparison of detection results on the AOLP dataset (detection ratio, %).

Method              AC      LE      RP      Average
Hsu et al. [29]     96.0    95.0    94.0    95.0
Li et al. [33]      98.38   97.62   95.58   97.19
Li et al. [30]      99.12   99.08   98.20   98.80
Proposed approach   99.41   99.34   99.51   99.42
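The Average column of Table 4 and the margins quoted above follow directly from the per-subset values. The short Python sketch below simply recomputes them from the numbers transcribed out of the table; it is provided only as a sanity check and is not part of the paper's released code.

```python
# Recompute the Average column of Table 4 and the quoted margins (4.42, 2.23, 0.62).
aolp = {
    "Hsu et al. [29]":   (96.00, 95.00, 94.00),
    "Li et al. [33]":    (98.38, 97.62, 95.58),
    "Li et al. [30]":    (99.12, 99.08, 98.20),
    "Proposed approach": (99.41, 99.34, 99.51),
}

averages = {name: round(sum(v) / len(v), 2) for name, v in aolp.items()}
proposed = averages["Proposed approach"]                      # 99.42
for name, avg in averages.items():
    if name != "Proposed approach":
        print(f"{name}: average {avg:.2f}, margin {proposed - avg:+.2f}")
```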

4.4. Ablation Experiments. To evaluate the effectiveness of each module in the proposed approach, this paper conducts several experiments on the Chinese City Parking Dataset (CCPD) [34] and compares the detection results with the results of the original faster R-CNN [1] and the FPN with faster R-CNN baseline [4] framework. CCPD is a large publicly available labeled license plate dataset. It contains about 250k independent license plate images under diverse illuminations, environments, and backgrounds. Each image has a resolution of 720 × 1160 and contains one license plate. All images containing a license plate are divided into 8 groups based on different conditions: CCPD-Base with 200k images, CCPD-FN with 20k images, CCPD-DB with 20k images, CCPD-Rotate with 10k images, CCPD-Tilt with 10k images, CCPD-Weather with 10k images, CCPD-Challenge with 10k images, and CCPD-Blur with 5k images. As in [34], this paper adopts 100k images in the CCPD-Base subset to train the proposed network and then evaluates the results on CCPD-Base, CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge.
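To make the protocol above concrete, the small sketch below summarizes the subset sizes and shows how the Average column in Table 5 is presumably obtained (a plain mean over the seven evaluated subsets). The variable names and the helper function are illustrative assumptions, not identifiers from the CCPD toolkit.

```python
# CCPD subsets and sizes as listed above; CCPD-Blur is not used for evaluation here.
ccpd_subsets = {
    "Base": 200_000, "FN": 20_000, "DB": 20_000, "Rotate": 10_000,
    "Tilt": 10_000, "Weather": 10_000, "Challenge": 10_000, "Blur": 5_000,
}
train_size = 100_000          # half of CCPD-Base is used for training, as in [34]
eval_subsets = ["Base", "DB", "FN", "Rotate", "Tilt", "Weather", "Challenge"]

def average_detection(rates):
    """Mean detection rate over the evaluated subsets (the Average column of Table 5)."""
    return sum(rates[s] for s in eval_subsets) / len(eval_subsets)

# Example: the faster R-CNN row of Table 5 averages to 88.6.
faster_rcnn_row = {"Base": 98.1, "DB": 92.1, "FN": 83.7, "Rotate": 91.8,
                   "Tilt": 89.4, "Weather": 81.1, "Challenge": 83.9}
print(round(average_detection(faster_rcnn_row), 1))   # 88.6
```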

In the first experiment, this paper replaces the VGG-16 network in the original faster R-CNN with the proposed balanced feature pyramid generation module. The RPN network is kept unchanged in this experiment. To show the effectiveness of the L2 normalization, the L2 normalization layer in the balanced feature pyramid generation module is discarded in the second experiment. In the third experiment, this paper adds the proposed RPN network with the predicted location anchor module to replace the original RPN network. The VGG-16 is kept unchanged as the base network in this experiment. In the fourth experiment, this paper adds both the proposed RPN network with the predicted location anchor module and the proposed balanced feature pyramid generation module with the L2 normalization layer to replace the original RPN network and the VGG-16 architecture. The four settings are summarized below.
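For readability, the four ablation settings described above can be listed side by side; the component names here are shorthand for the modules of this paper, not identifiers from any code release.

```python
# Ablation settings evaluated on CCPD (feature side vs. proposal side).
ablation_settings = [
    {"features": "balanced feature pyramid (ResNet-34, with L2-norm)", "rpn": "original RPN"},
    {"features": "balanced feature pyramid (ResNet-34, no L2-norm)",   "rpn": "original RPN"},
    {"features": "VGG-16 base features (unchanged)",                   "rpn": "predicted location anchor RPN"},
    {"features": "balanced feature pyramid (ResNet-34, with L2-norm)", "rpn": "predicted location anchor RPN"},
]
```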

Table 5: Detection results of each proposed network on the CCPD dataset (detection performance, %).

Network                                                               Base   DB     FN     Rotate  Tilt   Weather  Challenge  Average
Faster R-CNN                                                          98.1   92.1   83.7   91.8    89.4   81.1     83.9       88.6
FPN with faster R-CNN baseline                                        99.2   95.4   87.5   93.0    91.3   85.4     86.3       91.2
Faster R-CNN + balanced feature pyramid                               99.5   96.2   89.1   93.2    91.6   88.9     90.1       92.7
Faster R-CNN + balanced pyramid without L2-norm                       99.3   96.2   88.9   93.1    91.0   87.6     88.3       92.1
Faster R-CNN + predicted anchor RPN                                   98.5   92.6   84.0   91.5    89.4   81.4     84.4       88.8
Faster R-CNN + balanced pyramid with L2-norm + predicted anchor RPN   99.5   96.4   90.1   93.2    91.8   89.7     91.2       93.1

Table 5 shows the detection results of each proposed experiment on the CCPD dataset. As shown in Table 5, compared with the original RPN in the faster R-CNN framework, the proposed predicted location anchor scheme improves the average detection by 0.2%. By generating good proposals, the features for the detection network are more discriminative, thus improving the detection results. Compared with the feature pyramid in FPN with faster R-CNN baseline, the proposed balanced pyramid generation module improves the average detection by 1.5%. It should be noted that no extra parameters are added by the proposed module. With the proposed module, each level in the balanced feature pyramid obtains equal information from the other levels, thus improving the detection performance of the detection network. Compared with faster R-CNN and FPN with faster R-CNN baseline, the proposed approach improves the average detection by 4.5% and 1.9%, respectively. The comparison results indicate that the proposed framework is superior to both single-scale and multiscale features for a region-based object detector. Furthermore, with the L2 normalization layer added on each of the rescaled features in the balanced feature pyramid generation module, the average detection is improved by 0.6% compared with the balanced feature pyramid generation module without L2 normalization. This result shows the effectiveness of the L2 normalization layer, which keeps the feature values from different convolution layers on the same scale.
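The balancing and L2 normalization discussed above can be illustrated with a minimal NumPy sketch. It assumes the Libra R-CNN-style integrate-and-redistribute scheme of [27], with a ParseNet-style channel-wise L2 normalization applied to each rescaled level; the constant scale factor and nearest-neighbour resizing stand in for the learnable scale and the interpolation/pooling actually used, so this is a sketch of the idea rather than the paper's implementation.

```python
import numpy as np

def l2_normalize(x, scale=10.0, eps=1e-12):
    """Channel-wise L2 normalization: at every spatial position the channel vector
    is scaled to unit length and multiplied by a (here constant) scale factor, so
    features taken from different convolution layers end up on the same scale."""
    norm = np.sqrt(np.sum(x * x, axis=0, keepdims=True)) + eps   # x has shape (C, H, W)
    return scale * x / norm

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize, standing in for the interpolation/max-pooling
    used to bring all pyramid levels to a common resolution."""
    c, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows][:, :, cols]

def balance_pyramid(levels, target_idx=1):
    """Rescale every level to one intermediate size, L2-normalize it, average the
    rescaled maps so each level contributes equally, then rescale the integrated
    map back and add it to every original level."""
    _, th, tw = levels[target_idx].shape
    rescaled = [l2_normalize(resize_nearest(f, th, tw)) for f in levels]
    integrated = np.mean(rescaled, axis=0)
    return [f + resize_nearest(integrated, f.shape[1], f.shape[2]) for f in levels]

# Toy 4-level pyramid with 256 channels and strides differing by a factor of 2.
pyramid = [np.random.randn(256, s, s).astype(np.float32) for s in (64, 32, 16, 8)]
balanced = balance_pyramid(pyramid)
print([f.shape for f in balanced])   # per-level shapes are unchanged
```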

5. Conclusions and Future Work

This paper proposes a novel deep learning-based framework for license plate detection. In the proposed framework, a balanced feature pyramid generation module based on the ResNet-34 architecture is used to generate an enhanced balanced feature pyramid, of which each feature level obtains equal information from the other feature levels. In addition, a multiscale region proposal network with a predicted location anchor scheme is introduced to generate good proposals from each level of the balanced feature pyramid. With good proposals generated from balanced feature maps, the proposed approach shows significant improvements compared with other approaches on license plate detection. The good performance of the proposed approach on license plate detection has a high reference value in the field of intelligent transport systems. For future work, this paper will explore and compare more feature combination and multiscale detection methods, such as DeepLabv3+ [35] and MOSI-LPD [21]. In addition, this paper will adopt the nonlocal module [36] to further refine the balanced semantic features. This step may enhance the integrated features and further improve the detection results.

Data Availability

The codes used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2015.
[2] W. Liu, D. Anguelov, D. Erhan et al., "Single shot multibox detector," 2016, https://arxiv.org/abs/1512.02325.
[3] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.
[4] T. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.
[5] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007, Venice, Italy, October 2017.
[6] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," 2016, https://arxiv.org/abs/1607.07155.
[7] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: deconvolutional single shot detector," 2017, https://arxiv.org/abs/1701.06659.
[8] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 2015.
[9] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: object detection via region-based fully convolutional networks," in Advances in Neural Information Processing Systems, pp. 379–387, MIT Press, Cambridge, MA, USA, 2016.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," 2016, https://arxiv.org/abs/1506.02640.
[11] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2874–2883, Las Vegas, NV, USA, June 2016.
[12] K. S. Raghunandan, P. Shivakumara, H. A. Jalab et al., "Riesz fractional based model for enhancing license plate detection and recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 9, pp. 2276–2288, 2018.
[13] Y. Yuan, W. Zou, Y. Zhao, X. Wang, X. Hu, and N. Komodakis, "A robust and efficient approach to license plate detection," IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1102–1114, 2017.
[14] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096–1107, 2016.
[15] A. H. Ashtari, M. J. Nordin, and M. Fathy, "An Iranian license plate recognition system based on color features," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 4, pp. 1690–1705, 2014.
[16] S. G. Kim, H. G. Jeon, and H. I. Koo, "Deep-learning-based license plate detection method using vehicle region extraction," Electronics Letters, vol. 53, no. 15, pp. 1034–1036, 2017.
[17] O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve, "Segmentation- and annotation-free license plate recognition with deep localization and failure identification," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 9, pp. 2351–2363, 2017.
[18] F. Xie, M. Zhang, J. Zhao, J. Yang, Y. Liu, and X. Yuan, "A robust license plate detection and character recognition algorithm based on a combined feature extraction model and BPNN," Journal of Advanced Transportation, vol. 2018, Article ID 6737314, 14 pages, 2018.
[19] L. Zou, M. Zhao, Z. Gao, M. Cao, H. Jia, and M. Pei, "License plate detection with shallow and deep CNNs in complex environments," Complexity, vol. 2018, Article ID 7984653, 6 pages, 2018.
[20] L. Xie, T. Ahmad, L. Jin, Y. Liu, and S. Zhang, "A new CNN-based method for multi-directional car license plate detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 507–517, 2018.
[21] J. Han, J. Yao, J. Zhao, J. Tu, and Y. Liu, "Multi-oriented and scale-invariant license plate detection based on convolutional neural networks," Sensors, vol. 19, no. 5, p. 1175, 2019.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
[23] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, https://arxiv.org/abs/1409.1556.
[24] J. Huang, "Speed/accuracy trade-offs for modern convolutional object detectors," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3296–3297, Honolulu, HI, USA, July 2017.
[25] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," 2014, https://arxiv.org/abs/1409.0575.
[26] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, Springer, Berlin, Germany, 2014.
[27] J. Pang, "Libra R-CNN: towards balanced learning for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, June 2019.
[28] J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2965–2974, Long Beach, CA, USA, June 2019.
[29] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Transactions on Vehicular Technology, vol. 62, no. 2, pp. 552–561, 2013.
[30] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126–1136, 2019.
[31] W. Zhou, H. Li, Y. Lu, and Q. Tian, "Principal visual word discovery for automatic license plate detection," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4269–4279, 2012.
[32] B. Li, B. Tian, Y. Li, and D. Wen, "Component-based license plate detection using conditional random field model," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1690–1699, 2013.
[33] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, https://arxiv.org/abs/1601.05610.
[34] Z. Xu, W. Yang, A. Meng et al., "Towards end-to-end license plate detection and recognition: a large dataset and baseline," in Computer Vision – ECCV 2018, pp. 261–277, Springer, Berlin, Germany, 2018.
[35] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, Munich, Germany, September 2018.
[36] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, Honolulu, HI, USA, July 2018.
