
Data Science and Pattern Recognition, ©2019 ISSN 2520-4165, Ubiquitous International, Volume 3, Number 2, May 2019

A Deep Learning and Image Recognition System for Image Recognition

Ko-Wei Huang
Department of Electrical Engineering
National Kaohsiung University of Science and Technology
Kaohsiung City, Taiwan
[email protected]

Chun-Cheng Lin
Department of Electrical Engineering
National Kaohsiung University of Science and Technology
Kaohsiung City, Taiwan
[email protected]

Yi-Ming Lee
Department of Electrical Engineering
National Kaohsiung University of Science and Technology
Kaohsiung City, Taiwan
[email protected]

Ze-Xue Wu
Department of Electrical Engineering
National Kaohsiung University of Science and Technology
Kaohsiung City, Taiwan
[email protected]

ABSTRACT. With the improvement of computer hardware and computing power, the powerful and automatic feature extraction capability of computers can effectively determine good features to improve the efficiency of model recognition systems. The purpose of this study was to apply a variety of binarization preprocessing algorithms and to implement an image recognition system using the GoogLeNet model, a convolutional-neural-network module based on deep learning. ImageNet, the largest image recognition database in the world, was used as the training dataset, and five types of gallery sets, namely cars, planes, horses, ships, and trucks, were chosen to carry out the experiments. The system applied a variety of algorithms to gray-scale and binarized images; it then trained three models based on GoogLeNet Inception V1, V2, and V3 and attempted to improve the identification accuracy of the original models. The system was implemented in two parts. The first part used six methods to preprocess the images. The second part trained and tested the image data and then used six matching methods to circle and label the image targets. The results of the experiment showed that the preprocessed images could effectively improve the accuracy of module identification.
Keywords: Deep learning, Image preprocessing, GoogLeNet, Matching module, Image labelling

1. Introduction. Because of a shortage of computer hardware equipment and insufficient data, people were initially indifferent to artificial intelligence. However, the continuous improvement in computing capability and computer technologies, the increase in hardware storage capability, and the combination with big data have made machine learning a popular technology.


Of course, besides machine learning, another term that has frequently been heard in recent years is “deep learning.” Its recent prominence originated from the AlphaGo artificial-intelligence system developed by Google DeepMind (https://deepmind.com/) in 2016 [1].

In fact, deep learning is only a subfield of machine learning. Its development in the field of image recognition far exceeds that of image classification [2, 3]. The presence of deep learning can be seen in big data [4], data mining [5–7], natural-language processing [8], recommendation systems [9, 10], semantic image segmentation [11, 12], and so on. So-called deep learning refers to learning based on deep neural networks (DNNs), and the process of DNN learning is the deep learning that we often hear about [13].

In the past, it was difficult to achieve high accuracy when processing a large number of image sets, but there was a breakthrough in a large-scale visual recognition competition not long ago. In 2012, a visual recognition competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), was held [14]. Krizhevsky et al. used a model based on convolutional neural networks (CNNs) to reduce the previous error-rate bottleneck of 26% to an error rate of only 16.4% [15]. Deep-learning technology applied to image recognition achieved overwhelming performance, completely overturning the traditional methods of image recognition and producing considerable repercussions.

The research objective of this study was to utilize a variety of preprocessing algorithms and the Google Inception Net model [16] to implement an image recognition system, and to make Inception V1 with preprocessing reach the accuracy of the V2 or V3 recognition results, thereby showing that the system's recognition accuracy can be effectively improved by image preprocessing. ImageNet [17], the largest image recognition database in the world, was used as the training dataset, and five types of gallery sets, namely cars, planes, horses, boats, and trucks, were selected for the experiments. The system applied a variety of algorithms to preprocess the images, such as graying and binarization, and then input the preprocessed images into the three models Inception V1, V2, and V3 of Google Inception Net for training, to possibly improve the accuracy of the identification results. The system was implemented in two parts. The first part was to obtain the abovementioned five gallery sets of images from ImageNet and manually filter out 1500 images that met the requirements in each gallery. After reordering and renaming the images through the program used and graying the images, we preprocessed the images using six methods provided by OpenCV [18, 19]: five different threshold-selection functions and the Otsu algorithm. The second part was to use the preprocessed image sets with the Google Inception V1, V2, and V3 models in sequence to analyze whether the accuracy of the models was effectively improved. At this stage, the parameters, including the choice of preprocessing algorithm and the number of training iterations, were adjusted, and the results were analyzed and compared. The results of the experiment showed that the preprocessed images could raise the identification accuracy rate of the Inception V1 module to that of the Inception V2 and V3 modules. Finally, the module with the highest accuracy rate was tested with the six methods of the matching module to determine whether it could successfully circle the test target and label the identified target; the target was successfully circled and labeled by several of the matching methods.

The remainder of this paper is organized as follows. The literature review is described in Section 2. The proposed system is introduced in Section 3. The performance evaluation of the proposed system is presented in Section 4. Finally, Section 5 concludes this paper and provides suggestions for future work.

2. Literature review.


2.1. Convolutional neural networks. CNNs have been used extensively in the field of computer vision in recent years. CNNs can directly use image data as input without using algorithms to find features, thus reducing the tedious data-preprocessing operations that were required in the past. They are regarded as an efficient identification method. The main idea of a CNN can be illustrated by the following example. When ordinary people look at an image, the same or similar features, such as dots, lines, or specific patterns, are often noticed first. Similarly, when one looks at a person's image, one judges who the person is by using features such as the eyes, nose, and mouth, and compares the results step by step through an analysis of the features of each part. CNNs are the realization of this concept. Through local connection, weight sharing, and down-sampling, the number of parameters and the complexity of training are considerably reduced, and the model achieves higher accuracy through convolutional layers and pooling layers.

The CNN is an important algorithm in deep learning. It is a feed-forward network [20]. The biggest difference between a CNN and a general back-propagation neural network [21] is that the neurons between adjacent layers in a CNN are partially connected, not completely connected; i.e., the sensing area of a neuron comes from a part of the upper neural units rather than all of them, and each neuron can extract basic features through local sensing.
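As a brief illustration of why local connection, weight sharing, and pooling save parameters, the following minimal sketch (written in Keras, which is an assumption for illustration rather than the framework used in this work) compares the parameter count of a small convolution-plus-pooling stage with that of a fully connected layer over the same input:

import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
# One shared bank of 32 3x3 kernels: 3*3*3*32 + 32 = 896 parameters,
# regardless of image size, thanks to weight sharing and local connections.
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
# Down-sampling with max pooling adds no trainable parameters at all.
x = tf.keras.layers.MaxPooling2D(2)(x)
model = tf.keras.Model(inputs, x)
model.summary()  # reports 896 trainable parameters in total

# A fully connected layer mapping the flattened 224*224*3 input to just 32
# units would instead need 224*224*3*32 + 32, i.e., about 4.8 million parameters.

The convolutional stage stays at a few hundred parameters no matter how large the image is, whereas the fully connected alternative grows with the number of input pixels; this is the parameter reduction referred to above.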

2.2. GoogLeNet. The network structure of GoogLeNet is shown in Figure 1 [16]. Its biggest feature is that the neural network not only deepens in the vertical direction but also extends in the horizontal direction; this structure that widens in the horizontal direction is the Inception structure. GoogLeNet is 22 layers deep, but it has small parameter and calculation counts. Inception V1, introduced in September 2014 [16], demonstrated excellent classification ability with an error rate of 6.67%. In 2015, Inception V2 [22] followed VGGNet in replacing 5 × 5 convolutions with two 3 × 3 convolutions to reduce the number of parameters, and a normalization method was added, which considerably improved the accuracy. At the end of 2015, Inception V3 [23] split larger two-dimensional convolutions into two one-dimensional convolutions to speed up the operation. Its module can handle more-abundant spatial features and is suitable for image classification.

FIGURE 1. GoogLeNet network architecture [16]
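To make the idea of horizontal widening concrete, the following is a simplified sketch of one Inception-style block in Keras (the framework and the filter counts are illustrative assumptions, not the exact GoogLeNet configuration):

import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1=64, f3=96, f5=32, fp=32):
    """Parallel 1x1, 3x3, 5x5, and pooled branches concatenated along channels."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 1, padding="same", activation="relu")(x)   # 1x1 reduction
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(f5, 1, padding="same", activation="relu")(x)   # 1x1 reduction
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fp, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_block(inputs)
model = tf.keras.Model(inputs, outputs)

The four branches look at the same input with different receptive-field sizes, and their outputs are concatenated along the channel dimension; this channel-wise concatenation of parallel branches is what widens the network horizontally.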

2.3. Binarized image. Image preprocessing is used to remove noise or enhance images to improve image quality. The preprocessing step can highlight the desired information, and the preprocessed image can be used for subsequent analysis. After a gray-scale operation is completed, the gray-scale image is binarized. There are many algorithms for image binarization, of which the Otsu algorithm is the most widely known [24]; it is also called the maximum inter-class variance method. Its core approach is to select a segmentation threshold value that divides the background and the target into two parts according to the gray-scale characteristics of the image, where the variance is a measure of the uniformity of the gray-scale distribution: the greater its value, the greater the difference between the target and the background.


The main characteristic of the Otsu algorithm is that it can automatically obtain the best threshold value, finding the threshold according to the criterion of minimum intragroup variance and maximum intergroup variance. In this study, we used five image-preprocessing methods: THRESH-BINARY, THRESH-BINARY-INV, THRESH-TOZERO, THRESH-TOZERO-INV, and THRESH-TRUNC [18, 19]. The original image and the trial binarized images are shown in Figure 2.
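Formally, Otsu's criterion can be written as follows (this is the standard textbook formulation rather than a quotation from [24]): for a candidate threshold t, let \omega_0(t) and \omega_1(t) be the fractions of pixels at or below and above t, and \mu_0(t) and \mu_1(t) their mean gray levels. The selected threshold maximizes the between-class (intergroup) variance,

\sigma_B^2(t) = \omega_0(t)\,\omega_1(t)\,\bigl[\mu_0(t) - \mu_1(t)\bigr]^2,
\qquad t^{\ast} = \arg\max_{0 \le t \le 255} \sigma_B^2(t),

which is equivalent to minimizing the within-class (intragroup) variance, since the two quantities sum to the total gray-level variance of the image.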

FIGURE 2. Original image and trial binarized images of a ship [17]
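As an illustrative sketch (not the authors' exact code), the five fixed-threshold variants and the Otsu method can be applied with OpenCV's Python interface roughly as follows; the file name and the threshold value 127 are assumptions for the example:

import cv2

# Load an image and convert it to gray scale (file name is illustrative).
gray = cv2.cvtColor(cv2.imread("ship.jpg"), cv2.COLOR_BGR2GRAY)

# The five fixed-threshold binarization variants used in this study.
flags = {
    "THRESH-BINARY":     cv2.THRESH_BINARY,
    "THRESH-BINARY-INV": cv2.THRESH_BINARY_INV,
    "THRESH-TRUNC":      cv2.THRESH_TRUNC,
    "THRESH-TOZERO":     cv2.THRESH_TOZERO,
    "THRESH-TOZERO-INV": cv2.THRESH_TOZERO_INV,
}
results = {name: cv2.threshold(gray, 127, 255, flag)[1] for name, flag in flags.items()}

# Otsu's method: the fixed threshold argument is ignored and chosen automatically.
otsu_t, otsu_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.threshold returns the threshold actually used together with the output image, so otsu_t reports the value that Otsu's criterion selected automatically.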

3. Proposed System.

3.1. Image Preprocessing Phase. The data preprocessing flow is shown in Figure 3. In this phase, the calculation formulas and principles of the five image-preprocessing algorithms selected in this study are applied, and the binarized images of the various gallery sets are produced.

FIGURE 3. Flow chart of the first part of the research method: data cleaning of the input data
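A minimal sketch of this preprocessing step, assuming each gallery has already been downloaded from ImageNet into a local folder (the folder names, the output naming scheme, and the choice of THRESH_TOZERO are illustrative assumptions):

import os
import cv2

def preprocess_gallery(src_dir, dst_dir, flag=cv2.THRESH_TOZERO):
    """Rename, gray-scale, and binarize every image in one gallery set."""
    os.makedirs(dst_dir, exist_ok=True)
    for i, name in enumerate(sorted(os.listdir(src_dir))):
        img = cv2.imread(os.path.join(src_dir, name))
        if img is None:          # skip files that are not readable images
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, binarized = cv2.threshold(gray, 127, 255, flag)
        cv2.imwrite(os.path.join(dst_dir, f"{i:04d}.jpg"), binarized)

preprocess_gallery("imagenet/ship", "preprocessed/ship")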


3.2. GoogLeNet Inception Recognition Phase. In this phase, the combination of the preprocessing part and the GoogLeNet module is presented. The flow chart of the steps is shown in Figure 4. In this paper, three different versions of GoogLeNet, namely Inception V1, Inception V2, and Inception V3, were used for training. The input dataset included 100, 500, 1000, and 1500 images of each type. The dataset was divided into an 80% training set, a 10% test set, and a 10% verification set, while the number of training iterations was set to 4000, 3000, 2000, or 1000 and the learning rate to 0.01, 0.005, or 0.001; each batch trained 100 pieces of data.

[Figure 4 flowchart nodes: Start; preprocessing image set; GoogLeNet Inception V1 / Inception V2 / Inception V3; convolution network training; training completed; import test picture; result accuracy prediction; end of the second part]

FIGURE 4. Flow chart of the GoogLeNet Inception recognition phase
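The paper does not state which training framework was used; as a hedged sketch only, the following shows how a comparable fine-tuning run could be set up in Keras with the bundled InceptionV3 ImageNet weights, using the pre-split folders, a batch size of 100, a learning rate of 0.01, and the five classes described above (the directory names and the epoch count are assumptions):

import tensorflow as tf

IMG_SIZE, BATCH, CLASSES = (299, 299), 100, 5   # cars, planes, horses, ships, trucks

# The 80%/10%/10% split is assumed to already exist as separate folders on disk.
train = tf.keras.utils.image_dataset_from_directory(
    "preprocessed/train", image_size=IMG_SIZE, batch_size=BATCH)
val = tf.keras.utils.image_dataset_from_directory(
    "preprocessed/val", image_size=IMG_SIZE, batch_size=BATCH)

# Pre-trained Inception V3 backbone; only the new classification head is trained.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                         pooling="avg")
base.trainable = False

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = tf.keras.applications.inception_v3.preprocess_input(inputs)
x = base(x)
outputs = tf.keras.layers.Dense(CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# The iteration counts in the paper (1000-4000) correspond to training steps;
# the epoch count here is an arbitrary placeholder.
model.fit(train, validation_data=val, epochs=10)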

3.3. Image Labelling. In this phase, we used six image-labelling methods: TM-SQDIFF, TM-SQDIFF-NORMED, TM-CCORR, TM-CCORR-NORMED, TM-CCOEFF, and TM-CCOEFF-NORMED [18, 19]. The target images are shown in Figure 5.
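A minimal OpenCV sketch of this matching-and-labelling step, using the normalized correlation coefficient method as the example (the file names, the template crop, and the class label are placeholders):

import cv2

image = cv2.imread("test_deer.jpg")          # full test image
template = cv2.imread("deer_template.jpg")   # cropped target template
h, w = template.shape[:2]

# Slide the template over the image; TM_CCOEFF_NORMED gives scores in [-1, 1],
# and the maximum marks the best match (for the TM_SQDIFF variants it is the minimum).
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

# Circle (rectangle) the matched target and label it with the predicted class.
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 2)
cv2.putText(image, "deer", (top_left[0], top_left[1] - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imwrite("labelled.jpg", image)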

4. Experimental Results. In this section, the accuracy of the identification results when the input datasets contained 100, 500, 1000, and 1500 sheets using GoogLeNet Inception V1 is described. Figure 6 shows the first test chart of a ship image. Table 1 shows the test results of the GoogLeNet Inception V1 module when the number of iterations was set to 4000, with a learning rate of 0.01 and 100 entries of data trained in each batch. Table 1 also shows the identification results after adding the algorithms to GoogLeNet Inception V1, where “Best” represents the proportion of results that the system judges to be the most similar to a type, “Second” represents the proportion of results that the system judges to be the second most similar, and bold characters represent the identification results of correct solutions. For example, when the Trunc algorithm was used with 100 input images in V1 and the test chart in Figure 6, the probability of obtaining a photograph of a ship was 85%. The results showed that the module with the Trunc algorithm improved the identification degree when the number of data entries was small. However, when the number of input images was more than 1,000, the accuracy decreased slightly. In this experiment, the module with Trunc pretreatment raised the identification degree to 80% and obtained a correct solution when the number of datasets was low. Table 2 presents another two-ship test chart.


FIGURE 5. Results obtained using the six labelling methods

The test results are the same as those in Table 1: when the module with the Trunc algorithm was used, the accuracy was improved for each data count, and a good identification degree was maintained. The main advantage of the Trunc binarization method is that it removes most of the noise.

FIGURE 6. Testing image: ship (I) [17]

TABLE 1. Accuracy rate of GoogLeNet Inception V1 with image preprocessing

Image Preprocessing Method   V1 100   V1 500   V1 1000   V1 1500
None                           41%      38%      71%       79%
Otsu [24]                      82%      64%      44%       57%
THRESH-BINARY                  93%      72%      70%       67%
THRESH-BINARY-INV              93%      72%      70%       67%
THRESH-TOZERO                  51%      50%      56%       63%
THRESH-TOZERO-INV              71%      61%      57%       54%
THRESH-TRUNC                   86%      86%      78%       80%

Table 2 shows the test results when GoogLeNet Inception V2 was used with 4000 iterations and a learning rate of 0.01. Table 2 shows that the accuracy of GoogLeNet Inception V2 when the number of input data entries was 100 or 500 was considerably higher than that of V1, and a correct identification degree was maintained for each number of input data entries. However, the model with the Trunc algorithm remained the best of the models with an added algorithm, improving the accuracy by 5% to 10% on average.


TABLE 2. Accuracy rate of GoogLeNet Inception V2 with image preprocessing

Image Preprocessing Method   V2 100   V2 500   V2 1000   V2 1500
None                           71%      83%      71%       78%
Otsu [24]                      82%      64%      43%       61%
THRESH-BINARY                  93%      75%      67%       66%
THRESH-BINARY-INV              81%      64%      42%       43%
THRESH-TOZERO                  50%      47%      55%       57%
THRESH-TOZERO-INV              73%      65%      54%       52%
THRESH-TRUNC                   86%      86%      73%       79%

Table 3 shows the test results for each input data size using GoogLeNet Inception V3 with 4000 iterations and a learning rate of 0.01. The table shows that GoogLeNet Inception V3 with the Trunc algorithm maintained a stable recognition rate and a correct solution for every considered number of input data entries.

The following describes the accuracy of the models trained with the original (non-binarized) gallery and with the binarized (THRESH-TOZERO) gallery, for the different Inception Net versions (abbreviated as V1, V2, and V3 below), learning rates, and numbers of training iterations.

TABLE 3. Accuracy rate of GoogLeNet Inception V3 with image preprocessing

Image Preprocessing Method   V3 100   V3 500   V3 1000   V3 1500
None                           90%      80%      78%       78%
Otsu [24]                      81%      66%      52%       61%
THRESH-BINARY                  92%      71%      63%       66%
THRESH-BINARY-INV              82%      67%      46%       43%
THRESH-TOZERO                  51%      46%      61%       58%
THRESH-TOZERO-INV              72%      62%      52%       54%
THRESH-TRUNC                   86%      86%      82%       81%

FIGURE 7. Testing image: deer (II) [17]

Figure 7 shows the deer image used to test the accuracy of the training modules. Table 4 presents the accuracy of the models trained on the original (non-binarized) gallery and tested with the image shown in Figure 7. The results showed that the Google Inception Net version Inception V3, with 4000 training iterations and a learning rate of 0.01, gave the best test results. Table 5 shows the accuracy of the models trained on the binarized gallery and tested with the image shown in Figure 7. The results again showed that Inception V3 with 4000 training iterations and a learning rate of 0.01 was the best.


TABLE 4. Accuracy rate of GoogLeNet Inception using different learning rates and without image preprocessing

Model   Training iterations   Learning rate = 0.01   Learning rate = 0.005   Learning rate = 0.001
V1             1000                   74%                    67%                     46%
V1             2000                   82%                    73%                     53%
V1             3000                   82%                    77%                     59%
V1             4000                   83%                    78%                     64%
V2             1000                   79%                    68%                     45%
V2             2000                   55%                    75%                     53%
V2             3000                   58%                    78%                     60%
V2             4000                   83%                    82%                     64%
V3             1000                   73%                    67%                     45%
V3             2000                   77%                    75%                     53%
V3             3000                   84%                    77%                     59%
V3             4000                   85%                    79%                     63%

TABLE 5. Accuracy rate of GoogLeNet Inception using different learning rates and with image preprocessing

Model   Training iterations   Learning rate = 0.01   Learning rate = 0.005   Learning rate = 0.001
V1             1000                   71%                    57%                     39%
V1             2000                   81%                    68%                     40%
V1             3000                   87%                    74%                     48%
V1             4000                   90%                    81%                     52%
V2             1000                   90%                    81%                     52%
V2             2000                   79%                    67%                     42%
V2             3000                   86%                    77%                     47%
V2             4000                   91%                    83%                     52%
V3             1000                   71%                    57%                     34%
V3             2000                   79%                    70%                     40%
V3             3000                   87%                    75%                     48%
V3             4000                   90%                    81%                     53%

Table 6 shows the results for the module with the highest accuracy rate tested in this study. The parameter settings of this module were as follows: the gallery was binarized, the learning rate was 0.01, and the number of training iterations was 4000. Twenty-five images of five animals, namely dogs, cats, deer, frogs, and birds, were used to test the accuracy of the matching module. Six methods were used to circle the target and label it; the TM-CCOEFF-NORMED method circled and labelled the target correctly with the highest accuracy.

For the 25 deer test images, the normalized correlation coefficient matching method successfully circled and labelled the identification target, with an accuracy rate of 88%. Because the white area in the matching-result map (left chart) is relatively concentrated, the target in the right chart can be circled with the highest accuracy.


TABLE 6. Accuracy rate of the image labelling methods

Image Labelling Method   Accuracy Rate
TM-SQDIFF                     76%
TM-SQDIFF-NORMED              88%
TM-CCORR                      36%
TM-CCORR-NORMED               76%
TM-CCOEFF                     72%
TM-CCOEFF-NORMED              72%

5. Conclusions. An image recognition system based on deep learning was described. The ImageNet image recognition database was used as the training data. Various image-preprocessing algorithms were used to gray-scale and binarize the training data in order to enhance them. A GoogLeNet model was then trained on the preprocessed images and its recognition results were tested. The results of the experiment showed that the images preprocessed with Trunc gave better recognition results, improving the accuracy rate of the original module by 5% to 10% on average; the highest accuracy was obtained with 4000 training iterations, a learning rate of 0.01, and the Inception V3 template. The module with the highest accuracy rate was then combined with the matching module provided by OpenCV to circle the target object of the image, so that the output image showed the circled target object and the label of the identification result. In the experiment, we found that the normalized correlation coefficient matching method was, among the six matching methods, the one with the highest accuracy rate and could successfully circle and label the target object. In the future, we hope that the GoogLeNet module, combined with a preprocessing algorithm, can be used to identify medical images and serve as an auxiliary diagnostic system. With this image identification system, the target area of a medical image could be automatically selected for interpretation, helping medical personnel to find abnormal areas of organs, improving the speed and accuracy of lesion detection, and, in combination with the professional knowledge of doctors, developing a complete and systematic medical service.

Acknowledgment. This work was supported in part by the Ministry of Science and Technology, Taiwan, R.O.C., under grant MOST 107-2218-E-992-303-.

REFERENCES.
[1] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.
[2] C. Affonso, A. L. D. Rossi, F. H. A. Vieira, and A. C. P. de Leon Ferreira de Carvalho, “Deep learning for biological image classification,” Expert Systems with Applications, vol. 85, pp. 114–122, 2017.
[3] D. Sullivan, C. F. Winsnes, L. Akesson, M. Hjelmare, M. Wiking, et al., “Deep learning is combined with massive-scale citizen science to improve large-scale image classification,” Nature Biotechnology, vol. 36, pp. 820–828, 2018.
[4] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, “A survey on deep learning for big data,” Information Fusion, vol. 42, pp. 146–157, 2018.
[5] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, “Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data,” Mechanical Systems and Signal Processing, vol. 72, pp. 303–315, 2016.
[6] S. Poria, E. Cambria, and A. Gelbukh, “Aspect extraction for opinion mining with a deep convolutional neural network,” Knowledge-Based Systems, vol. 108, pp. 42–49, 2016.
[7] K. Lan, D. T. Wang, S. Fong, L. S. Liu, K. K. Wong, and N. Dey, “A survey of data mining and deep learning in bioinformatics,” Journal of Medical Systems, vol. 42, Article 139, 2018.
[8] T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent trends in deep learning based natural language processing,” IEEE Computational Intelligence Magazine, vol. 13(3), pp. 55–75, 2018.
[9] S. Zhang, L. Yao, A. Sun, and Y. Tay, “Deep learning based recommender system: A survey and new perspectives,” ACM Computing Surveys, vol. 52(1), Article 5, 2019.
[10] M. Fu, H. Qu, Z. Yi, L. Lu, and Y. Liu, “A novel deep learning-based collaborative filtering model for recommendation system,” IEEE Transactions on Cybernetics, vol. 49(3), pp. 1084–1096, 2018.
[11] G. Papandreou, L. C. Chen, K. P. Murphy, and A. L. Yuille, “Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation,” IEEE International Conference on Computer Vision, pp. 1742–1750, 2015.
[12] L. C. Chen, G. Papandreou, I. Kokkinos, K. P. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40(4), pp. 834–848, 2017.
[13] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521(7553), pp. 436–444, 2015.
[14] O. Russakovsky, J. Deng, H. Su, J. Krause, et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115(3), pp. 211–252, 2015.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[16] C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[17] L. Fei-Fei, J. Deng, and K. Li, “ImageNet: Constructing a large-scale image database,” Journal of Vision, vol. 9(8), pp. 1037–1037, 2009.
[18] J. Howse, OpenCV Computer Vision with Python, Packt Publishing Ltd, 2013.
[19] J. Minichino and J. Howse, Learning OpenCV 3 Computer Vision with Python, Packt Publishing Ltd, 2015.
[20] R. Tauler, B. Walczak, and S. D. Brown, Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Elsevier, 2009.
[21] F. Sadoughi, M. Ghaderzadeh, M. Solimany, and R. Fein, “An intelligent system based on back propagation neural network and particle swarm optimization for detection of prostate cancer from benign hyperplasia of prostate,” Journal of Health and Medical Informatics, vol. 5(3), pp. 1–5, 2014.
[22] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[23] C. Szegedy, V. Vanhoucke, S. Ioffe, and J. Shlens, “Rethinking the inception architecture for computer vision,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826, 2016.
[24] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9(1), pp. 62–66, 1979.


Ko-Wei Huang received his Ph.D. degree from the Institute of Computer and Communication Engineering, Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, in 2015. He is currently an Assistant Professor in the Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Taiwan. His current research interests mainly include data mining, evolutionary computing, and image processing.

Chun-Cheng Lin received his M.S. degree from the Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Taiwan. His research interests include image processing and Raspberry Pi.

Yi-Ming Lee received his M.S. degree from the Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Taiwan. His research interests include image processing and data mining.

Ze-Xue Wu received his B.Sc. degree from the Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Taiwan. His research interests include machine learning, data mining, and artificial intelligence.