Deep Learning Based Real-time Object Recognition
System with Image Web Crawler
Myung-jae Lee1, Hyeok-june Jeong1, Young-guk Ha2
2 Corresponding author
1 Department of Computer Science & Engineering
Konkuk University
Neungdong-ro, Gwangjin-gu, Seoul 143-701, Korea
{dualespresso,amitajung}@naver.com
2 Department of Computer Science & Engineering
Konkuk University
Neungdong-ro, Gwangjin-gu, Seoul 143-701, Korea
Abstract. Recently, deep learning algorithms have become an effective solution in
various fields. The Convolutional Neural Network (CNN), a kind of neural network,
is known to be well suited to image processing. This paper proposes a real-time
object recognition system based on a CNN. Since deep learning needs many images,
the system includes an image web crawler that collects images automatically. The
implemented system achieves high accuracy in object recognition.
Keywords: deep learning, object recognition, CNN, crawler
1 Introduction
There is a great deal of data on the Internet. These data become useful after suitable
processing such as big data analysis, which is the process of collecting, organizing and
analyzing large amounts of data to discover patterns and useful information. There are
many types of data, and one of them is the image. Images carry a lot of information and
are used in various systems such as road speed cameras, license plate recognition
systems and Google image search.
Object recognition in images is an active research topic because it means that a
system can understand a scene much as a human does. In other words, object
recognition is related to the field of Artificial Intelligence. Furthermore, the growth
of deep learning algorithms has accelerated the development of object recognition
systems. Deep learning, which is a part of machine learning, is used in research and
industry to help solve big data problems. It has various architectures, such as the
Convolutional Neural Network (CNN). A CNN, which is inspired by the organization of
the animal visual cortex, is a feed-forward network. It can be used in computer vision
systems such as object recognition systems. A deep learning algorithm with a CNN can
help analyze image data. It is trained on many categorized images, which helps
recognition. In other words, deep learning needs a lot of images. These images can be
collected from the Internet.
Advanced Science and Technology Letters Vol.142 (SIT 2016), pp.103-110
http://dx.doi.org/10.14257/astl.2016.142.19
ISSN: 2287-1233 ASTL Copyright © 2016 SERSC
However, categorizing data is the most important task before using them, because
data that are usable and unusable for a given purpose are mixed together on the
Internet. For this reason, research on data collection is increasing. Web crawling is
one such collection method. A web crawler collects data according to established
categories and helps manage the data.
This paper proposes a real-time object recognition system with a web crawler. The
system collects images automatically and trains on the collected images. With the
trained model, the system recognizes objects in real time from a camera. The system
focuses on objects that appear on the road, such as cars, traffic signs and police.
2 Related Work
There has been significant research aimed at achieving object recognition in images.
Several papers have proposed ways of using the Scale Invariant Feature Transform
(SIFT) algorithm [1,2]. Paper [1] suggested an image recognition system with a
pyramidal descriptor adapted from the SIFT algorithm, and paper [2] proposed an image
recognition system for colorectal polyp histology with SIFT. Both are well-designed
systems; however, their processing is simple, so their results can be incorrect.
Research [3] suggested a persimmon growth monitoring system based on image analysis,
and paper [4] proposed an image recognition system with a three-dimensional Speeded
Up Robust Features (SURF) algorithm. However, both can produce erroneous results.
For this reason, many studies have aimed to achieve object recognition with deep
learning. Paper [5] proposed a deep learning based visual tracking system and
research [6] suggested a multiple instance analysis system with deep learning. Both
focused on the use of deep learning in object recognition rather than on performance.
To improve performance, various designs have been suggested. Related work [7]
suggested very deep CNNs for large-scale image recognition. Paper [8] showed
hierarchical feature extraction to improve image recognition performance. Research
[9] proposed a simple learning network for fast image recognition. These studies
showed great performance, but they have neither a management system for the training
images nor a real-time system.
This paper suggests a real-time image recognition system with an image crawler. The
system aims to recognize objects that are detected on the road, such as cars,
ambulances and pedestrians. This paper also proposes a way to collect and manage
images and to recognize objects in real time.
3 System Design
Fig. 1. Overall architecture of this system.
Figure 1 shows the overall architecture of the system. The Image Web Crawler designs
an ontology and collects images with the designed ontology. It crawls images
automatically and saves them in the Hadoop Distributed File System (HDFS). The Image
Trainer fetches images from HDFS and learns from them. After learning, the Image
Trainer produces a DNN model profile, which is the fundamental source for the Image
Recognizer. The Image Recognizer builds a DNN model from the downloaded DNN model
profile and detects objects in images captured from the camera in real time.
3.1 Image Web Crawler
Fig. 2. System Flow of Image Web Crawler
Figure 2 shows the automatic image web crawling system. The Ontology Manager
generates the ontology and instance files. An ontology models real-world experience as
concepts that a computer can handle. The ontology managed by the Ontology Manager
consists of various objects on the road. The Web Page Searcher searches for keywords
taken from the instances of the ontology and retrieves the web page source. This study
used Selenium with the Google Chrome driver for page searching. The Image Crawler
extracts the URLs of images from the parsed web source. The File Handler saves images
to HDFS. Before saving, it checks for duplicate URLs and downloads the image behind
each URL. This study constructed HDFS on a cluster server with 60 virtual nodes. HDFS
is a suitable system for storing big data.
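The Image Crawler and the duplicate check of the File Handler can be illustrated with
a minimal sketch. It is an assumption-laden illustration, not the implementation used
in this study: it parses a page source string with the Python standard library instead
of Selenium, the class and function names are hypothetical, and the example.com URLs
are placeholders. Downloading and writing to HDFS are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLParser(HTMLParser):
    """Collects the src attribute of every <img> tag in a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the URL of the searched page.
            self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(page_source, base_url):
    """Return all image URLs found in the page source, in document order."""
    parser = ImageURLParser(base_url)
    parser.feed(page_source)
    return parser.urls


def drop_duplicates(urls, seen):
    """Return URLs not seen before, preserving order; updates `seen` in place."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Placeholder page source; the first two tags point at the same image.
page = """
<html><body>
  <img src="/img/police_car_1.jpg">
  <img src="http://example.com/img/police_car_1.jpg">
  <img src="http://example.com/img/ambulance.jpg">
</body></html>
"""
seen_urls = set()
new_urls = drop_duplicates(
    extract_image_urls(page, "http://example.com/search"), seen_urls
)
print(new_urls)
```

Keeping the set of seen URLs outside the function lets the same set be reused across
many searched pages, so an image found under two keywords is stored only once.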
3.2 Image Trainer
Fig. 3. System Flow of Image Trainer
As shown in Figure 3 above, the Image Trainer has a three-layered process. The Image
Downloader fetches the big data images from HDFS. Images saved in HDFS are classified
by category, and the Image Downloader fetches them as they are. The Image Learner
trains on the images with deep learning. For the deep learning network, this study
used a Convolutional Neural Network (CNN) to recognize objects in images.
Fig. 4. Graph and Example of Overfitting
A CNN uses multiple filters, each of which focuses on a small area and produces one
number. By focusing on small areas repeatedly, the features of an image are found.
However, this useful network was not widely used until a few years ago. Since it
focuses on small areas repeatedly, its result becomes very detailed. As a result, the
trained system recognizes only the training images but not test images that are in the
same category but were not used in training. This is called overfitting. For example,
a system trained on Police Car images cannot recognize a Police Car that it was not
trained on, as shown in Figure 4 above. However, since the dropout concept was
proposed [10], the overfitting problem has been greatly reduced. Dropout mitigates
overfitting and increases accuracy.
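The two ideas in this paragraph, a filter that turns one small area into one number
and dropout, can be sketched in plain Python. This is a minimal illustration under
stated assumptions, not the network of this study: the 3x4 image, the 2x2 filter, the
dropout rate of 0.5 and all function names are chosen only for the example, and the
dropout shown is the commonly used "inverted" variant that rescales the kept units.

```python
import random


def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Each output value is the weighted sum of one small image patch. As in most
    CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Illustrative image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Illustrative filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
print(dropped)
```

Because a different random subset of units is silenced at every training step, the
network cannot rely on any single detailed feature, which is why dropout tends to
reduce overfitting. At test time the function returns the activations unchanged.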
Deep learning with a CNN can be implemented with various libraries. This paper used
the Caffe framework, which is considered fast and is modular, with C++, Python and
MATLAB interfaces. The DNN Model Profile Manager creates a DNN model profile from the
result of image training and sends it to the Image Recognizer.
3.3 Image Recognizer
Fig. 5. System Flow of Image Recognizer
As shown in Figure 5 above, the Image Recognizer has two inputs: a captured image and
the DNN Model Profile. The DNN Model Regenerator generates a DNN model from the
received DNN Model Profile. Because each Image Recognizer regenerates the DNN model
itself, multiple Image Recognizers can be used in the system. The Image Receiver
captures images from the camera. The Object Recognizer detects objects in the received
images with the DNN model regenerated by the DNN Model Regenerator. The Recognition
Result Logger saves the results of object recognition. This log can be used as
feedback for the system.
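The flow of the Image Recognizer can be sketched as follows. Everything in this sketch
is hypothetical and only mirrors the structure described above: `ModelProfile` stands
in for the DNN model profile, the "model" is a toy linear scorer rather than a CNN
loaded through Caffe, and the frames are short feature lists rather than camera
images.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ModelProfile:
    """Hypothetical stand-in for the DNN model profile: labels plus weights."""
    labels: List[str]
    weights: List[List[float]]  # one weight row per label


def regenerate_model(profile: ModelProfile) -> Callable[[Sequence[float]], str]:
    """DNN Model Regenerator: rebuild a classifier from the profile alone.

    Here the classifier is a toy linear scorer (assumption for illustration).
    """
    def classify(features: Sequence[float]) -> str:
        scores = [
            sum(w * x for w, x in zip(row, features)) for row in profile.weights
        ]
        return profile.labels[scores.index(max(scores))]

    return classify


def recognize_stream(frames, model, log):
    """Object Recognizer plus Recognition Result Logger for a sequence of frames."""
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        label = model(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        log.append({"frame": index, "label": label, "ms": elapsed_ms})
    return log


# Illustrative profile and frames (not data from this study).
profile = ModelProfile(
    labels=["car", "traffic sign"],
    weights=[[1.0, 0.0], [0.0, 1.0]],
)
model = regenerate_model(profile)
result_log = recognize_stream([[0.9, 0.1], [0.2, 0.7]], model, [])
print([entry["label"] for entry in result_log])  # ['car', 'traffic sign']
```

Since the recognizer depends only on the profile, any number of recognizers can be
started from the same profile without contacting the trainer again.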
4 Implementation
The proposed Image Trainer is implemented on Ubuntu 14.04. To increase learning
performance, we used four GPGPUs and a high-performance CPU. The Caffe library was
used for deep learning and CUDA was used to drive the GPGPUs. The Image Recognizer is
also implemented on Ubuntu 14.04 and uses a GPGPU for image recognition.
Table 1. Implementation Environment
Component   Image Trainer              Image Recognizer
CPU         Intel Xeon E5 2.40 GHz     Intel i7 3.60 GHz
RAM         128 GB                     16 GB
Storage     1 TB SSD                   256 GB SSD
GPGPU       GeForce GTX 1080 x 4       GeForce GTX 1080
OS          Ubuntu 14.04 LTS           Ubuntu 14.04 LTS
Libraries   CUDA, OpenCV, Caffe        CUDA, OpenCV, Caffe
This implementation trained on 65,000 images that were collected by the Image Web
Crawler and have common resolutions, ranging from 640x480 to 1920x1080. To increase
accuracy, the network was trained for 100,000 iterations with 25 network layers. The
recognizer experiment uses images captured from the camera in real time.
Fig. 6. A Part of the Designed Ontology
The ontology was designed as shown in Figure 6. It comprises various objects that are
detected on the road. There are various instances at the bottom of the ontology tree,
and sub-instances as children of each instance. For example, two sub-instances, Kia K5
and Hyundai Sonata, are children of Mid-size Car. These sub-instances are used as
keywords to search for images.
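How sub-instances become search keywords can be sketched with a small tree. Only the
Mid-size Car instance and its two sub-instances come from the text; the upper levels
of the tree, the empty Traffic Sign branch and the function name are assumptions made
for illustration, since the full ontology of Figure 6 is not reproduced here.

```python
# Nested dicts are classes and instances; lists hold the sub-instances.
# Only "Mid-size Car" and its two children are taken from the paper.
ontology = {
    "Object on the road": {
        "Car": {
            "Mid-size Car": ["Kia K5", "Hyundai Sonata"],
        },
        "Traffic Sign": {},  # hypothetical branch without sub-instances yet
    }
}


def keyword_labels(node, parent=None):
    """Yield (keyword, category) pairs, where the category is the instance
    that owns the sub-instance used as the search keyword."""
    if isinstance(node, list):
        for keyword in node:
            yield keyword, parent
        return
    for name, child in node.items():
        yield from keyword_labels(child, name)


pairs = list(keyword_labels(ontology))
print(pairs)  # [('Kia K5', 'Mid-size Car'), ('Hyundai Sonata', 'Mid-size Car')]
```

Keeping the category next to each keyword means that every image found for a keyword
can be stored already labeled, and adding a new object only requires editing the tree.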
Fig. 7. Designed Convolutional Neural Network
As shown in Figure 7 above, the convolutional neural network is designed with 24
layers. All of the focused small image regions pass through this network. To address
the vanishing gradient problem, a ReLU layer is used as the activation function in
every convolutional layer. If the system uses the sigmoid function as the activation
function, the gradient can shrink toward zero as it is propagated back through many
layers. Using the ReLU function avoids this problem with low computation time.
However, the resulting feature maps can be large, which can be a critical problem for
the learning algorithm because it increases computation time and memory use. The
pooling layer is the solution to this problem: with a pooling layer, the input size
can be reduced. Local Response Normalization (LRN) and Dropout layers help prevent
overfitting. The Fully Connected (FC) layers, which follow the Conv layers, classify
images. In the output layer, the Softmax layer transforms the result values into
probabilities.
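The roles of the ReLU, pooling and Softmax layers can be illustrated numerically. The
sketch below is a plain-Python illustration with made-up inputs, not the layers of the
network in Figure 7; the depth of 10 and the function names are assumptions chosen
only to show the effect.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; halves the height and the width."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Backpropagation through `depth` layers multiplies `depth` local gradients.
depth = 10  # illustrative depth
sigmoid_chain = sigmoid_grad(0.0) ** depth  # best case for sigmoid: 0.25 ** 10
relu_chain = relu_grad(2.0) ** depth        # stays 1.0 for active units
print(sigmoid_chain, relu_chain)

pooled = max_pool_2x2([[1, 3, 2, 0],
                       [4, 2, 1, 1],
                       [0, 0, 5, 6],
                       [1, 2, 7, 8]])
print(pooled)  # [[4, 2], [2, 8]]

probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```

Even in the best case the sigmoid gradient is 0.25 per layer, so the product over ten
layers is already below one millionth, while the ReLU gradient of an active unit stays
at 1. Pooling reduces a 4x4 map to 2x2, a fourfold cut in the values passed on.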
Fig. 8. Result of implementation
The experiment trained on 65,000 images for 100,000 iterations. Training took 48
hours. The recognition system classified images into 17 classes. The accuracy of
object recognition was 99%, and the computation time was below 50 ms on average.
5 Conclusion
This paper proposed a real-time object recognition system with an image web crawler.
The proposed crawling system was designed for flexibility, so that it can be modified
easily through the ontology. This paper designed a CNN to achieve high performance,
and the deep learning system achieved high accuracy. The recognizer was designed to
run on its own once a DNN model profile is provided. In other words, the recognizer
does not need to exchange image or recognition data with the deep learning system; it
only needs to download the DNN model profile once.
The proposed system will be extended in the near future. The recognition system will
be applied to the crawling process: a recognizer embedded in the crawling process will
check whether each image is appropriate. This will raise the quality of the training
images.
Acknowledgments. This work was supported by an Institute for Information &
communications Technology Promotion (IITP) grant funded by the Korea
government (MSIP) (R7118-16-1002, Development of Driving Computing System
Supporting Real-time Sensor Fusion Processing for Self-Driving Car).
References
1. Seidenari, L.: Local pyramidal descriptors for image recognition. IEEE transactions on
pattern analysis and machine intelligence 36.5 (2014): 1033-1040.
2. Kominami, Y.: Computer-aided diagnosis of colorectal polyp histology by using a real-
time image recognition system and narrow-band imaging magnifying
colonoscopy. Gastrointestinal endoscopy 83.3 (2016): 643-649.
3. Chang, K.-C.: Design of persimmon growing stage monitoring system using image
recognition technique. Consumer Electronics-Taiwan (ICCE-TW), 2016 IEEE
International Conference on. IEEE, 2016.
4. Redondo-Cabrera, C.: Surfing the point clouds: Selective 3d spatial pyramids for category-
level object recognition. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on. IEEE, 2012.
5. Wang, N., Yeung, D.-Y.: Learning a deep compact image representation for visual
tracking. Advances in neural information processing systems. 2013.
6. Xu, Y.: Deep learning of feature representation with multiple instance learning for medical
image analysis. 2014 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). IEEE, 2014.
7. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556 (2014).
8. Li, H.: Hierarchical feature extraction with local neural response for image
recognition. IEEE transactions on cybernetics 43.2 (2013): 412-424.
9. Chan, T.-H.: PCANet: A simple deep learning baseline for image classification? IEEE
Transactions on Image Processing 24.12 (2015): 5017-5032.
10. Srivastava, N.: Dropout: a simple way to prevent neural networks from overfitting. Journal
of Machine Learning Research 15.1 (2014): 1929-1958.