Research Article

Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

Jianfang Cao,1 Lichao Chen,2 Min Wang,2 and Yun Tian1

1Department of Computer Science and Technology, Xinzhou Teachers University, Xinzhou 034000, China
2College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Correspondence should be addressed to Lichao Chen; chen [email protected]

Received 15 November 2017; Revised 15 January 2018; Accepted 28 March 2018; Published 13 May 2018

Academic Editor: Pedro Antonio Gutierrez

Copyright © 2018 Jianfang Cao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator’s dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.

1. Introduction

Edges are a basic image feature and they are present between a target and a background and between two targets, two regions, or two primitives. Most of an image’s information is carried in the edges. Usually, an image edge is a set of pixels around which the gray-level values exhibit a step change. Thus, edge detection technology is based on the discontinuity or mutation of gray levels or textural characteristics between an object and its background. Edge detection is an important aspect of image processing and is the basis of many analytical methods in such fields as image segmentation, pattern recognition, machine vision, and regional shape extraction. In addition, edge detection is an important step in image analysis and 3D reconstruction and is therefore also an important feature in the field of digital image analysis. Image edge detection algorithms have been widely studied [1]. The basic idea is as follows. First, an edge enhancement operator is used to highlight local edges in an image. Then, the edge strength of the image is defined, and the edge points are extracted by setting a threshold [2]. The performance of the edge detection algorithm directly affects the precision of extracted object contours and the performance of the system. In 1959, Julesz [3] was the first to discuss edge detection; later, in 1965, Roberts [4] began to systematically study edge detection. After nearly 60 years of research, many different edge detection methods have been designed, and each has its own characteristics and limitations.

With the advent of the big data era, traditional edge detection technologies face the problems of poor edge detection and long runtimes; thus, this technology needs to be further analyzed and studied. The Canny operator is no exception. Tang and Long [5] proposed a fast implementation of the Canny operator based on a GPU+CPU combination in which the GPU was used to parallelize the Canny edge detection algorithm. Their results showed that the processing time for 8-bit images at a resolution of 1024 × 1024 was 122 ms, with a speedup ratio of approximately 5.39 times compared to the traditional Canny edge detection algorithm. Such architectural improvements can improve the performance of traditional image edge detection algorithms under single-node architectures when processing massive datasets. However, GPU-based parallel designs involve high hardware costs and require developers who clearly understand the target computer hardware systems; furthermore, such implementations have difficulties due to the high cost of communications among nodes [6].

Hindawi
Computational Intelligence and Neuroscience
Volume 2018, Article ID 3598284, 12 pages
https://doi.org/10.1155/2018/3598284

http://orcid.org/0000-0003-3687-3738

http://orcid.org/0000-0001-7111-2630

In recent years, the MapReduce framework on the Hadoop platform has attracted extensive attention from scholars and researchers. MapReduce is a parallel computing model that targets a distributed environment. It provides developers with a complete programming interface that makes a deep understanding of the underlying computer system unnecessary. Furthermore, it is both less expensive and more functional than traditional computing models. Due to these advantages, MapReduce has become a hotspot in the field of parallel design [7]. Therefore, this study aims to apply the MapReduce programming model on the Hadoop platform to parallelize the Canny operator so that it better meets the needs of big data processing.

2. Related Works

Classic image edge detection algorithms include both first-order differential operators (i.e., the Roberts, Prewitt, Sobel, and Canny operators) and second-order differential operators (i.e., the Laplacian and LoG operators) and can be applied to a wide range of applications. Wang and Liu applied the Roberts operator to detect vehicle image edges and recognize vehicle license plate locations by combining it with mathematical morphology [8]. Yang et al. used the Roberts operator and linear fitting to obtain two boundaries and proposed an Otsu thresholding image segmentation method [9]. To calculate image quality more consistently compared to subjective evaluations, Zhang et al. designed a new image quality assessment algorithm that used the Prewitt operator to extract vertical edges in the HSV color space [10]. Dwivedi et al. presented a handwritten Sanskrit word-recognition method using the Prewitt operator for character edge detection [11]. Nguyen et al. built an architecture for the real-time hardware cosimulation of edge detection using the Prewitt edge operator to evaluate the real-time performance of an edge detection algorithm [12]. Singh et al. applied the Sobel operator to detect edges in real-time surveillance videos and presented a new resource-efficient FPGA-based hardware architecture [13]. Jiang et al. performed color image segmentation using the Canny operator by combining it with a pulse-coupled neural network [14]. Based on the color vector angle and the Canny operator, Tang et al. proposed a robust image hashing mechanism to achieve a desirable tradeoff of classification performances between rotation robustness and discrimination [15]. Sun and Han enhanced images by extracting the edge and texture information from the original images using the Laplacian operator [16]. For glass fragment images, Yang et al. proposed an edge detection method that combined the LoG operator and mathematical morphology and established a rapid transmission system to realize rapid edge detection [17]. Amer and Abushaala performed a comparative study on typical classical edge detection operators [18].

By comparing experiments from existing studies, the following conclusions were drawn. Although the abovementioned detection algorithms have the advantages of being simple and easy to implement and provide good real-time performance, they also have obvious shortcomings. The image edge feature extracted by the Roberts operator is relatively rough and provides inaccurate edge locations. The edge features extracted by the Prewitt operator have wide margins and many discontinuities. Similarly, the Sobel operator does not provide accurate locations of image edges. The Laplacian operator is highly sensitive to noise, and the LoG operator cannot eliminate salt-and-pepper noise in an image. In contrast, the Canny operator applies an optimization principle to detect image edges and sets the high and low thresholds using the gradient histogram of the image, which gives edge detection a high signal-to-noise ratio and improves its reliability. The Canny operator can obtain satisfactory edge detection results when there is only one image background and when the gray-level changes between the background and target are not too large [19]. However, in reality, images are easily affected by factors such as the environment and illumination, which can cause substantial gray-level changes between the image background and a target, which degrades the edge detection performance of the traditional Canny operator because the parameters must be adjusted manually. Such weak contrast conditions reduce the adaptability of the Canny algorithm, causing it to easily fail to detect edges [20]. Various researchers have improved the traditional Canny operator and proposed some improved Canny edge detection algorithms based on adaptive thresholds. Ronggui et al. presented an automatic road extraction method for vague aerial images using an improved Canny edge detection operator that included automatic thresholding to segment the image into a binary edge image [21]. Guiming and Jidong realized remote sensing image edge detection based on an improved Canny operator [22]. However, with the arrival of the big data era, traditional algorithms have become insufficient when faced with massive images. The algorithms’ computational load increases dramatically, and their runtime performance declines sharply. Thus, extracting edge features from large-scale digital image collections faces new challenges.

Hadoop is an open-source software framework developed by the community at large and distributed under the Apache License; programmers can use it to develop distributed programs without knowing the underlying details of the framework [23]. The core design of the Hadoop framework includes the Hadoop distributed file system (HDFS) and MapReduce. HDFS provides distributed storage for large amounts of data, and MapReduce implements distributed parallel computing, which provides a new approach for improving the edge detection performance on massive images. O’Driscoll et al. discussed how to apply big data technologies, such as the Apache Hadoop project, to process and analyze petabyte- (PB-) scale datasets within the bioinformatics community [24]. Lee et al. conducted a survey on MapReduce to assist the database and open-source communities in understanding various technical aspects of the MapReduce parallel programming framework [25]. MapReduce is widely used in various fields. Alham et al. applied the MapReduce framework to the SVM algorithm and proposed a MapReduce-based distributed SVM algorithm (MRSMO) for automatic image annotation [26]. Cao et al. designed a parallel particle swarm optimization- (PSO-) optimized BP neural network algorithm using the MapReduce framework to realize massive scene image classification [27]. Li et al. used MapReduce to parallelize the fast fuzzy c-means algorithm to achieve large-scale underwater image segmentation [28]. Cao et al. parallelized the traditional k-means algorithm in a MapReduce environment to retrieve large-scale scene images [29]. The above studies all improved the runtime performance and system efficiency of various algorithms using the MapReduce parallel programming model, and the number of MapReduce-based applications is gradually increasing. However, there are few studies concerning the Otsu algorithm or that investigate the MapReduce distributed parallel processing of edge detection operators and their application to the field of digital image processing.

To solve the abovementioned problems, this study proposes a parallel image edge detection algorithm based on the Otsu operator by optimizing the thresholds of the Canny operator on the Hadoop platform. The proposed method improves the Canny edge detection algorithm in the OpenCV function library using the Otsu algorithm. Then, the MapReduce parallel programming model on the Hadoop platform is applied to realize the parallel Otsu-Canny edge detection algorithm in a cluster environment and achieve parallel processing for the feature extraction task for massive numbers of images. Compared with the single-node architecture, the proposed approach reduces the running time of the proposed algorithm by approximately 67.2% when using a Hadoop cluster architecture consisting of 5 nodes and an image scale of 60,000 images. The system achieves a speedup of approximately 3.4 times, which reflects its obvious superiority in processing large-scale datasets. Our approach significantly reduces the computational load and improves the runtime and edge detection performance for images, greatly increasing the speed of edge feature extraction through task decomposition.

3. Otsu-Canny Edge Detection Algorithm

The Canny edge detection operator is a multilevel edge detection algorithm developed by John F. Canny in 1986; this operator uses a method called calculus of variations to find a function that optimizes a particular functional. The goal is to find an optimal edge detection algorithm that retains the original image attributes [30].

3.1. Canny Algorithm Principle. The traditional Canny operator detects image edges by performing the following steps sequentially.

(1) Applying Gaussian Smoothing to Images Using Gaussian Convolution. To eliminate image noise, the Canny operator takes the two-dimensional Gaussian function (formula (1)) as a noise filter; then, it performs convolution processing to smooth the image:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{x^2 + y^2}{2\sigma^2}\right], \tag{1}$$

where $\sigma$ represents the standard deviation of the Gaussian filter function, which is manually set and controls the image smoothness.
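As an illustration of formula (1), the following is a minimal sketch of how a discrete smoothing kernel could be sampled from the Gaussian function. The class name `GaussianKernelSketch`, the `radius` parameter, and the normalization of the weights to sum to 1 are assumptions made for this example; they are not taken from the paper's implementation, which relies on OpenCV.

```java
/** Illustrative sketch only: samples formula (1) on a square grid. */
final class GaussianKernelSketch {
    /**
     * Builds a (2*radius+1) x (2*radius+1) kernel from G(x, y) and normalizes
     * it so that the weights sum to 1 (normalization is an assumption here).
     */
    static double[][] kernel(int radius, double sigma) {
        int n = 2 * radius + 1;
        double[][] k = new double[n][n];
        double sum = 0.0;
        for (int y = -radius; y <= radius; y++) {
            for (int x = -radius; x <= radius; x++) {
                // G(x, y) = 1 / (2*pi*sigma^2) * exp(-(x^2 + y^2) / (2*sigma^2))
                double g = Math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                        / (2.0 * Math.PI * sigma * sigma);
                k[y + radius][x + radius] = g;
                sum += g;
            }
        }
        for (double[] row : k) {
            for (int i = 0; i < n; i++) {
                row[i] /= sum;
            }
        }
        return k;
    }
}
```

A larger `sigma` spreads the weights more evenly across the kernel, which corresponds to stronger smoothing.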

(2) Filtering the Image Using the First Derivative of the Gaussian Operator to Obtain the Gradient Intensity and Direction of the Image. Suppose that $I(i, j)$ is the smoothed image. The first derivatives of the image in the $x$ and $y$ directions are

$$
\begin{aligned}
P_x[i, j] &= \frac{1}{2}\left(I[i, j+1] - I[i, j] + I[i+1, j+1] - I[i+1, j]\right), \\
P_y[i, j] &= \frac{1}{2}\left(I[i, j] - I[i+1, j] + I[i, j+1] - I[i+1, j+1]\right).
\end{aligned}
\tag{2}
$$

Thus, the gradient intensity $M[i, j]$ and gradient direction $\theta[i, j]$ are

$$M[i, j] = \sqrt{P_x[i, j]^2 + P_y[i, j]^2}, \tag{3}$$

$$\theta[i, j] = \arctan\left(\frac{P_y[i, j]}{P_x[i, j]}\right), \tag{4}$$

respectively.
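Formulas (2) to (4) can be sketched in a few lines of plain Java. The class and method names below are hypothetical, and the use of `Math.atan2` in place of the plain arctangent of the quotient is an assumption made so that the case $P_x = 0$ does not cause a division by zero.

```java
/** Illustrative sketch only: 2x2 finite-difference gradient of a smoothed image. */
final class GradientSketch {
    /**
     * Returns {Px, Py, M, theta} at pixel (i, j) following formulas (2)-(4).
     * The caller must ensure that (i + 1, j + 1) lies inside the image.
     */
    static double[] gradientAt(double[][] img, int i, int j) {
        double px = 0.5 * (img[i][j + 1] - img[i][j] + img[i + 1][j + 1] - img[i + 1][j]);
        double py = 0.5 * (img[i][j] - img[i + 1][j] + img[i][j + 1] - img[i + 1][j + 1]);
        double m = Math.hypot(px, py);     // sqrt(Px^2 + Py^2), formula (3)
        double theta = Math.atan2(py, px); // assumption: atan2 instead of arctan(Py / Px)
        return new double[] {px, py, m, theta};
    }
}
```

For a vertical step edge such as `{{0, 10}, {0, 10}}`, this gives $P_x = 10$ and $P_y = 0$, so the gradient points along the second index with magnitude 10.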

(3) Performing Nonmaximum Suppression Along the Gradient Direction. Divide the gradient directions into 8 directions: 0°–180°, 45°–225°, 90°–270°, and 135°–315°. Compare the gradient modulus of each pixel with the moduli of its adjacent pixels along the gradient (argument) direction, quantized to these 8 directions. Finally, take the pixel with the maximum partial derivative as an edge point.

(4) Detecting and Connecting the Edge Points Using the Dual-Threshold Method. The traditional Canny operator requires manually setting the high threshold $T_{\text{high}}$ and low threshold $T_{\text{low}}$, the relation of which is generally $T_{\text{low}} = 0.5\,T_{\text{high}}$. When the gradient intensity $M(i, j)$ of a pixel $(i, j)$ is greater than the high threshold value $T_{\text{high}}$, the point is marked as an edge point, and when the gradient intensity $M(i, j)$ of the pixel $(i, j)$ is less than the low threshold value $T_{\text{low}}$, the pixel cannot be an edge point. When the gradient intensity of the pixel is between $T_{\text{high}}$ and $T_{\text{low}}$, the point is marked as a candidate edge point; further judgments are later made by combining each candidate point with its surrounding pixels.
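The dual-threshold step can be sketched as follows. The text does not spell out how candidate points are judged against their surroundings, so this example assumes the usual hysteresis rule: a candidate becomes an edge point if it is 8-connected, directly or through other candidates, to a point above the high threshold. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sketch only: dual-threshold edge linking with T_low = 0.5 * T_high. */
final class HysteresisSketch {
    /** Returns a mask of edge pixels for the gradient magnitudes m. */
    static boolean[][] link(double[][] m, double tHigh) {
        double tLow = 0.5 * tHigh;
        int rows = m.length;
        int cols = m[0].length;
        boolean[][] edge = new boolean[rows][cols];
        Deque<int[]> stack = new ArrayDeque<>();
        // Pixels above the high threshold are edge points and seed the search.
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (m[i][j] > tHigh) {
                    edge[i][j] = true;
                    stack.push(new int[] {i, j});
                }
            }
        }
        // Assumed rule: candidates (>= T_low) adjacent to an edge point are promoted.
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            for (int di = -1; di <= 1; di++) {
                for (int dj = -1; dj <= 1; dj++) {
                    int r = p[0] + di;
                    int c = p[1] + dj;
                    if (r < 0 || c < 0 || r >= rows || c >= cols || edge[r][c]) {
                        continue;
                    }
                    if (m[r][c] >= tLow) {
                        edge[r][c] = true;
                        stack.push(new int[] {r, c});
                    }
                }
            }
        }
        return edge;
    }
}
```

Pixels below the low threshold are never visited as edge points, and isolated candidates that touch no strong point stay unmarked.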

3.2. Otsu-Optimized Threshold Values of the Canny Operator. The Otsu operator, also called the maximum between-class variance method, is a self-adapting threshold determination method that addresses a limitation of the Canny operator, namely, its inability to select the high and low thresholds adaptively according to the image characteristics [31]. The Otsu operator uses a threshold to divide the image into two parts: the background and the target. The best threshold occurs when the difference between the two parts is the largest (i.e., the maximum variance between classes is achieved). The method for determining the optimal threshold value for the Otsu operator is as follows.

Suppose that $t$ is the segmentation threshold between the background and target of image $I(i, j)$, the grayscale range $G$ of the image is $[0, L-1]$, and the probability of gray level $i$ is $p_i$. The threshold $t$ divides the image into two categories: $G_0 = [0, t]$ and $G_1 = [t+1, L-1]$. Then, the following are true.

The probabilities of the two classes $G_0$ and $G_1$ are $\alpha_0 = \sum_{i=0}^{t} p_i$ and $\alpha_1 = 1 - \alpha_0$, respectively.

The average gray values of the two classes $G_0$ and $G_1$ are $\mu_0 = \mu_t / \alpha_0$ and $\mu_1 = (\mu - \mu_t)/(1 - \alpha_0)$, respectively, where $\mu = \sum_{i=0}^{L-1} i \times p_i$ and $\mu_t = \sum_{i=0}^{t} i \times p_i$.

The between-cluster variance of the two classes $G_0$ and $G_1$ is

$$\eta^2(t) = \alpha_0 (\mu_0 - \mu)^2 + \alpha_1 (\mu_1 - \mu)^2 = \alpha_0 \alpha_1 (\mu_0 - \mu_1)^2. \tag{5}$$
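The threshold that maximizes formula (5) can be found by an exhaustive search over $t$. The following is a minimal sketch over a gray-level histogram; the class and method names are hypothetical, and ties are resolved in favor of the smallest $t$, which is an assumption of this example (a library routine such as OpenCV's Otsu thresholding may follow different conventions).

```java
/** Illustrative sketch only: exhaustive search for the Otsu threshold. */
final class OtsuSketch {
    /** Returns the t in [0, L-1] that maximizes eta^2(t) = a0 * a1 * (mu0 - mu1)^2. */
    static int otsuThreshold(long[] hist) {
        long total = 0;
        long sumAll = 0; // sum of i * hist[i], i.e. mu * total
        for (int i = 0; i < hist.length; i++) {
            total += hist[i];
            sumAll += i * hist[i];
        }
        long count0 = 0; // number of pixels in G0 = [0, t]
        long sum0 = 0;   // sum of i * hist[i] over G0, i.e. mu_t * total
        double best = -1.0;
        int bestT = 0;
        for (int t = 0; t < hist.length; t++) {
            count0 += hist[t];
            sum0 += t * hist[t];
            long count1 = total - count0;
            if (count0 == 0 || count1 == 0) {
                continue; // one class is empty, so the variance is undefined
            }
            double alpha0 = (double) count0 / total;
            double alpha1 = 1.0 - alpha0;
            double mu0 = (double) sum0 / count0;
            double mu1 = (double) (sumAll - sum0) / count1;
            double eta2 = alpha0 * alpha1 * (mu0 - mu1) * (mu0 - mu1);
            if (eta2 > best) {
                best = eta2;
                bestT = t;
            }
        }
        return bestT;
    }
}
```

Working with pixel counts rather than accumulated probabilities avoids rounding drift in the class weights.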

Then, the high threshold $T_{\text{high}}$ of the Canny operator can be obtained by solving $\max(\eta^2(t))$.

4. Parallel Implementation of the Otsu-Canny Edge Detection Algorithm

Recently, big data has attracted increasing attention. Big data often possess multiple sources, complex semantics, and large scales, and they are heterogeneous, dynamic, and changeable, which brings new challenges to traditional machine learning technologies. Traditional machine learning methods focus on data analysis with an appropriate statistical method from datasets with relatively small numbers of samples to find the function and value of the data. However, one of the core objectives of big data technologies is to extract the potential rules from huge amounts of data with extremely complex structures to maximize the value of the data. Therefore, both the runtime performance and the system efficiency of traditional algorithms decrease sharply when applied to big data.

Although the Otsu-Canny algorithm improves the performance of the traditional Canny edge detection algorithm, when it is applied to larger datasets, the time it requires for edge detection increases as well, eventually raising efficiency issues. The MapReduce parallel programming framework in the Hadoop platform provides a distributed parallel computing environment for big data processing. Faced with the rapid growth in data in the big data era, there is a need to improve the time efficiency and edge detection performance of the Canny algorithm. Therefore, this study intends to address the low time efficiency problem and the poor edge detection performance of the traditional Canny algorithm optimized by the Otsu algorithm as proposed in the literature [21]. To this end, this study implemented a parallel version of the Otsu-Canny edge detection algorithm using the MapReduce parallel programming framework.

4.1. Hadoop Platform and the MapReduce Programming Model. Hadoop is a software framework for the distributed processing of large-scale data, and it is a widely accepted big data platform. Hadoop implements a distributed file system called HDFS, which provides high fault tolerance, can be deployed on inexpensive hardware, provides high throughput for accessing application data, and is suitable for applications involving large datasets [32].

As one of the core subprojects of Hadoop, MapReduce is a parallel programming model that distributes computing tasks and data to Hadoop cluster nodes, allowing all nodes to perform tasks in parallel to obtain intermediate results, then subsequently summarizing the intermediate results and distributing additional computing tasks to each node to obtain the final results [33]. When performing these tasks, MapReduce divides calculations into two tasks, Map and Reduce, with the help of a functional programming method. The input and output of each task take the form of key-value pairs, and the mapper() and reducer() functions are designed to achieve the mapping from one key-value pair to another key-value pair. A flowchart of the MapReduce process is shown in Figure 1.
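The key-value dataflow described above can be illustrated without a cluster. The sketch below runs the map, group-by-key, and reduce steps sequentially in plain Java on a generic word-count task; it is not the paper's job and does not use the Hadoop API, and all names in it are hypothetical. It only shows how <key1, value1> records become <key2, list of value2> groups and then <key3, value3> results.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: sequential simulation of the MapReduce dataflow. */
final class MapReduceSketch {
    /** Map: one <key1, value1> record yields intermediate <key2, value2> pairs (here <word, 1>). */
    static List<Map.Entry<String, Integer>> map(String key1, String value1) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : value1.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** Reduce: <key2, list of value2> yields one <key3, value3> (here the sum of the values). */
    static int reduce(String key2, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Applies map to every record, groups the pairs by key (sorted), then reduces each group. */
    static Map<String, Integer> run(Map<String, String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> record : records.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }
}
```

On a real cluster, the map calls run on different nodes and the grouping step is carried out by the framework between the Map and Reduce tasks.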

4.2. Parallel Design and Implementation of the Otsu-Canny Algorithm

4.2.1. General Framework for Parallel Image Edge Detection. The general architecture of the parallel Otsu-Canny edge detection on the Hadoop platform is shown in Figure 2.

The general architecture is divided into three layers.

(1) Presentation layer: users access services through the Internet, through which they can submit images or receive edge detection results.

(2) Business logic layer: a web server executes the corresponding processing tasks according to the users’ requests.

(3) Data processing layer: this layer is the core of the entire system architecture and it is mainly responsible for storage, management, optimization of the threshold for the Canny operator, edge feature extraction, and outputting the results for massive numbers of images. Users submit their images to the Hadoop distributed system, which then performs threshold optimization and edge feature extraction and outputs the results.

The specific processes are as follows. First, the images in the massive image database are processed into the input format SequenceFile of a Hadoop job. Then, the Map task segments the image files into splits in accordance with the default slice size (128 MB) of the Hadoop system, each of which may contain multiple image files. Next, the MapReduce framework is used to optimize the threshold and extract the edge features of the image in a parallel manner using key-value pairs such as <image name, image file>. Finally, key-value pairs (such as <image name, image edge feature>) are generated and written to the HDFS distributed file system of the Hadoop platform.

Figure 1: The MapReduce programming model process (stages: split, map, copy, merge, reduce, and output).

Figure 2: Architecture for massive image edge extraction (presentation layer, business logic layer, and data processing layer).

4.2.2. Design and Realization of the Algorithm. Because OpenCV is an open-source and cross-platform computer vision library that has been used to realize many algorithms for image processing and computer vision and provides a large number of Java interfaces [34], this study improved the Canny operator using the OpenCV function library and implemented the parallel image edge detection algorithm based on the Otsu-Canny operator in Java on the Hadoop platform, using a total of approximately 600 lines of source code. The most important Map and Reduce tasks of the proposed algorithm are provided as complete source code implementations in the Supplementary Materials (available here). The workflow of the mapper() and reducer() functions is shown in Figure 3. MapReduce transforms splits into key-value pairs (<key1, value1>) using the recordReader method of SequenceFileInputFormat, in which key1 is the path name of the image and value1 is a pointer to the image data. The mapper() function returns another key-value pair (<key2, value2>). The MapReduce functionality merges all the values that have the same key to generate <key2, (value2 list)>, which serves as the input to the reducer() functions. After reducer() processing, the output key-value pair (<key3, value3>) is written to the HDFS file system by the RecordWriter functions in the custom ImageOutputFormat class.

Figure 3: The workflow of the mapper() and reducer() functions.

Definition of Image Data Type. Because the Hadoop framework does not define an image class that can serve as the data type of the key-value pair <key, value> and because the Hadoop framework specifies that a user-defined data type can only be used by implementing the Writable interface, this study defines the data type RawImage and rewrites the basic input and output methods defined by the Writable interface in Hadoop. Unlike other data types, the RawImage data type exposes some new functions to implement reading and storing images, such as converting images to the Mat type of a single channel or three channels and encoding the Mat type into an image file. These functions facilitate combining Hadoop with OpenCV.
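The source of RawImage is not reproduced in this section, so the following is only a hypothetical stand-in that shows the serialization contract such a type has to satisfy. The two methods mirror the signatures of Hadoop's Writable interface, write(DataOutput) and readFields(DataInput), but the class deliberately does not depend on Hadoop or OpenCV so that it can be compiled on its own; the length-prefixed byte layout is an assumption of this sketch, and the Mat conversions mentioned above are omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for RawImage; in Hadoop it would implement org.apache.hadoop.io.Writable. */
final class RawImageSketch {
    private byte[] rawData = new byte[0];

    RawImageSketch() {}

    RawImageSketch(byte[] data) {
        this.rawData = data.clone();
    }

    byte[] getRawData() {
        return rawData.clone();
    }

    /** Same signature as Writable.write: a length prefix followed by the encoded image bytes. */
    public void write(DataOutput out) throws IOException {
        out.writeInt(rawData.length);
        out.write(rawData);
    }

    /** Same signature as Writable.readFields: restores the bytes produced by write(). */
    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        rawData = new byte[length];
        in.readFully(rawData);
    }

    /** Serializes and deserializes an instance, as the framework would do between tasks. */
    static RawImageSketch roundTrip(RawImageSketch source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            RawImageSketch copy = new RawImageSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the whole encoded image in one value matches the design choice of never splitting an image across records.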

Design of Input/Output Format for Jobs. The image pixel information would be destroyed if the image were to be split to store and process it in a distributed manner on the Hadoop platform. Therefore, we adopt the whole image as the value of the key-value pair.

We define the input format class of the image file. Then, we use the input format SequenceFileInputFormat, which is a built-in Hadoop format that takes the SequenceFile file as input. The SequenceFileInputFormat format segments the SequenceFile file into splits and sends them to the Map task. Each split contains multiple records, and each record is an image (image name as key, image content as value), which is a good solution to the problem of starting too many Map tasks due to too many small files.

We also defined the output format class of the image file. The class FileOutputFormat describes the output format of the data. We designed the class ImageOutputFormat, which inherits from FileOutputFormat and is used to write the <key, value> provided by the users to a file with a specified format. The ImageRecordWriter class inherits from the class RecordWriter<Text, RawImage>, which regards the image name as the key and the instance of the RawImage type as the value to be stored in the HDFS file system.

Design and Realization of mapper() Function. The main mapper() tasks include reading images, processing images, and converting data. The main mapper() code is as follows.

    void mapper(Text fileName, RawImage value) throws IOException, InterruptedException {
        // Convert the value to an OpenCV Mat matrix
        Mat img = value.toMat();
        // Transform the image into a gray image
        Mat imgGray = new Mat();
        Imgproc.cvtColor(img, imgGray, Imgproc.COLOR_BGR2GRAY);
        // Use the Otsu threshold segmentation method to obtain the high threshold T_high
        Mat dst = new Mat();
        double tHigh = Imgproc.threshold(imgGray, dst, 0, 255,
                Imgproc.THRESH_BINARY + Imgproc.THRESH_OTSU);
        // Take T_high * 0.5 as the low threshold and extract the Canny edge feature
        Mat cannyOp = new Mat();
        Imgproc.Canny(imgGray, cannyOp, tHigh * 0.5, tHigh);
        // Write the file to the HDFS file system using multioutput mode
        mos.write(fileName, RawImage.toImage(cannyOp), fileName.toString());
    }

Design and Realization of reducer() Function. The main reducer() tasks include reduction and consolidation, sorting, and outputting the results. The main reducer code is as follows.

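    void reducer(Text key, Iterable<RawImage> values) throws IOException, InterruptedException {
        // Reduction and sorting: values with the same key arrive grouped,
        // and keys arrive sorted, after the shuffle phase
        for (RawImage value : values) {
            // Write the file to the HDFS file system using multioutput mode
            mos.write(key, value, key.toString());
        }
    }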

Result Output of Image Edge Detection. The default output filename of Hadoop takes the form name-r(m)-nnnnn, where name is set by the user, r represents the output of the Reduce task, m represents the output of the Map task, and nnnnn is an integer indicating the block number. However, to facilitate subsequent processing and displaying of the images, we overrode the getDefaultWorkFile() method in the FileOutputFormat class using the following code, which causes the output filename to be in the form “filename.jpg.”

    public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
        // Get a task submission
        FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
        return new Path(committer.getWorkPath(),
                getUniqueFile(context, getOutputName(context), extension));
    }

In this study, we use the Hadoop multifile output format MultipleOutputs and write the output to the file system in the form of records, in which the image file name is regarded as the key of the key-value pair, and the image file is regarded as the value of the key-value pair. The main code is as follows.

    class ImageRecordWriter extends RecordWriter<Text, RawImage> {
        public void write(Text fileName, RawImage img) throws IOException {
            // Write the image data to the output path, using the specified file name
            try (FSDataOutputStream out = fs.create(outputPath)) {
                out.write(img.getRawData());
            }
        }
    }

5. Experiment and Result Analysis

To validate the performance of the parallel Otsu-Canny edge detection algorithm proposed in this study, we tested it on an image edge feature extraction task for a large number of images on the Hadoop platform.

5.1. Experimental Environment and Data. The experimental environment was a Hadoop cluster composed of five computers (one master node and four slave nodes) configured in an intranet. All the nodes were equipped with Intel Core 4.2 GHz quad-core, 8-thread processors, 8 GB memory, 4 TB hard disks, JDK1.7.0_79, and the 64-bit Ubuntu 14.04 operating system, and Hadoop-2.5.1 (64-bit compiled version) was used.

The experimental data stemmed from Pascal VOC2012 [32], which consists of 17,125 images, involving people, vehicles, animals and plants, indoor and outdoor scenes, and other categories (20 categories total), and is freely available to researchers. To verify the performance of the proposed algorithm when addressing massive image datasets, we constructed a massive image database by copying images from the Pascal VOC2012 dataset.

5.2. Experimental Results and Analysis. To validate the performance of the proposed algorithm in this study, we conducted experimental comparisons by evaluating aspects such as the edge detection performance, running time, system speedup, scaleup, and sizeup.


Figure 4: Comparison of the edge detection performance of different algorithms (columns: original image, Canny, Otsu-Canny [21], parallel Canny, and parallel Otsu-Canny; rows: 127.jpg, 132.jpg, 142.jpg, 169.jpg, 178.jpg, 187.jpg, 1147.jpg, and 1152.jpg).

5.2.1. Image Edge Detection Performance. Using the Pas-cal VOC2012 image database, the traditional serial Cannyalgorithm, the Otsu-Canny algorithm in the literature [21],the parallel Canny algorithm, and the parallel Otsu-Cannyalgorithm proposed in this study were compared in terms of

their edge detection performances. The experimental resultsare shown in Figure 4.

As shown in Figure 4, using different edge detectionalgorithms, the edge detection performance of the algorithmproposed in this study is preferable to the traditional Canny


Table 1: Comparison of the running times of different algorithms under different data scales.

Image scale Running time (S)Canny algorithm Otsu-Canny algorithm [21] Parallel Canny algorithm The proposed approach (4 slave nodes)

1,000 30 30 27 283,000 58 60 51 508,000 92 95 76 7516,000 155 159 93 9330,000 487 488 199 20160,000 840 843 275 276

Figure 5: Comparison of the running times on the Hadoop cluster nodes. (Line plot of running time in seconds, 0 to 800, versus image number, 0 to 60,000, with one curve each for 1, 2, 3, and 4 nodes.)

algorithm, the Otsu-Canny algorithm, and the parallel Canny algorithm; it not only effectively retains the texture information of the original image but also produces few false edges and better-connected edges. Furthermore, because the Otsu algorithm in the Hadoop cluster architecture finds better thresholds, the algorithm proposed in this paper locates image edges more accurately and handles image details better, which indicates that the edge detection performance improvements do not occur by chance.
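As a rough illustration of how an Otsu threshold can drive the Canny dual threshold, the following sketch computes the Otsu threshold from a histogram (of gray levels or gradient magnitudes) in pure Python. The function names `otsu_threshold` and `canny_thresholds`, and the low-to-high ratio of 0.5, are assumptions made for illustration only; they are not taken from the implementation evaluated in this study.

```python
def otsu_threshold(hist):
    """Return the bin index t that maximizes the between-class variance.

    hist[i] is the number of pixels with value i; class 0 holds values <= t.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of pixel values in class 0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def canny_thresholds(hist, low_ratio=0.5):
    """Derive (low, high) Canny thresholds from the Otsu threshold.

    Assumption: the Otsu value is used as the high threshold and
    low = low_ratio * high. The ratio is a hypothetical choice.
    """
    high = otsu_threshold(hist)
    return low_ratio * high, high


if __name__ == "__main__":
    # Synthetic bimodal histogram used only to exercise the functions.
    hist = [0] * 256
    hist[50] = 100
    hist[200] = 100
    print(canny_thresholds(hist))
```

Deriving both thresholds from the image histogram removes the need to hand-tune them per image, which is the property the comparison in Figure 4 relies on.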

5.2.2. Running Time. To further verify the effectiveness of the proposed approach, we constructed datasets of different sizes by replication. Table 1 and Figure 5 compare the edge detection times of the different edge detection approaches and of different numbers of Hadoop cluster nodes as the number of images varies.

The data in Table 1 show the running times required for different numbers of images. The running times required by the Canny algorithm and the Otsu-Canny algorithm in the literature [21] become much longer than those of the parallel Canny algorithm and the parallel algorithm proposed in this study as the number of images increases. The reason is that the parallel algorithms adopt the distributed parallel processing technology of the MapReduce framework on the Hadoop platform, whereas the Canny algorithm and the Otsu-Canny algorithm in the literature [21] use a single-node architecture with limited processing capacity. Because the complexities of the Canny algorithm and the Otsu-Canny algorithm are similar when executed in parallel, the running times of the parallel Canny algorithm and the algorithm proposed in this study are nearly identical under the MapReduce programming model. However, as shown in Figure 4, the edge detection effect of the parallel Canny algorithm is similar to that of the traditional Canny algorithm. Therefore, taken together, the experimental results in Figures 4 and 5 show that the algorithm proposed in this study achieves better edge detection while simultaneously improving the running time.

Figure 5 compares the running times for different numbers of Hadoop cluster nodes. As shown in Figure 5, when the number of images is below 10,000, using multiple nodes does not noticeably reduce the running time of image edge detection. Instead, the running time of the multinode cluster architecture is slightly longer than that of the single-node architecture when processing small-scale image datasets because of the increased communication overhead among the nodes. However, as the image scale increases sharply, the advantage of the multinode architecture of the Hadoop cluster gradually becomes apparent. Although the running time increases with the number of images for every number of Hadoop cluster nodes, the time consumption of the single-node architecture grows approximately linearly, whereas that of the multinode architecture increases more slowly; in addition, using a larger number of nodes results in a gentler running time curve. These results demonstrate the superiority of the Hadoop cluster architecture when processing big data. The running time is reduced by approximately 67.2% using a Hadoop cluster architecture with 5 nodes when the number of images reaches 60,000. Table 1 and Figure 5 show that increasing the number of images affects the running time of the parallel algorithm based on the MapReduce framework far less than that of the serial algorithms, further illustrating the advantages of distributed parallel processing.
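The reduction in running time can be recomputed from Table 1 with a few lines of Python. The sketch below is illustrative only: the text does not state which serial baseline the reported percentage refers to, so the serial Otsu-Canny column is used here as an assumption, and the names `TABLE_1`, `time_reduction`, and `reductions_vs_otsu_canny` are hypothetical.

```python
# Running times in seconds, copied from Table 1:
# image count -> (Canny, Otsu-Canny [21], parallel Canny, proposed approach).
TABLE_1 = {
    1_000: (30, 30, 27, 28),
    3_000: (58, 60, 51, 50),
    8_000: (92, 95, 76, 75),
    16_000: (155, 159, 93, 93),
    30_000: (487, 488, 199, 201),
    60_000: (840, 843, 275, 276),
}


def time_reduction(t_serial, t_parallel):
    """Fraction of the serial running time saved by the parallel run."""
    if t_serial <= 0:
        raise ValueError("serial running time must be positive")
    return (t_serial - t_parallel) / t_serial


def reductions_vs_otsu_canny(table):
    """Map each image count to the reduction of the proposed approach
    relative to serial Otsu-Canny (the assumed baseline)."""
    return {
        n: time_reduction(otsu_canny, proposed)
        for n, (_, otsu_canny, _, proposed) in table.items()
    }


if __name__ == "__main__":
    for n, r in reductions_vs_otsu_canny(TABLE_1).items():
        print(f"{n:>6} images: {r:.1%} shorter running time")
```

Under this assumed baseline, the 60,000-image row gives a reduction of roughly 67%, in line with the figure quoted above, while the 1,000-image row shows only a small gain.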

5.2.3. Speedup, Sizeup, and Scaleup. For a parallel programming model based on MapReduce on the Hadoop platform, three indicators, speedup, sizeup, and scaleup [35], are typically used to evaluate computational performance, and we adopt them to evaluate the algorithm proposed in this study. For these experiments, we again randomly selected and replicated


Figure 6: Comparison of speedup. (Line plot of speedup, 1.0 to 3.5, versus node number, 1 to 4, with one curve each for 3,000, 16,000, and 50,000 images.)

images from the Pascal VOC2012 image database to construct different datasets.

Speedup [36] refers to the ratio of the time consumed to run a task under the single-node architecture to the time consumed to run the same task under the multinode architecture. Ideally, the speedup increases linearly with the number of nodes. In practice, however, it does not grow linearly because of communication costs and load balancing, among other factors. Figure 6 shows the experimental comparisons of the speedup of the approach proposed in this study using datasets of different scales.
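The speedup definition above can be written as a small Python sketch. The function names and the sample timings are hypothetical; `parallel_efficiency` is a standard companion measure that is not used in this study and is included only to make the notion of "linear" speedup concrete (an efficiency of 1.0 corresponds to linear speedup).

```python
def speedup(t_single, t_multi):
    """Speedup = single-node running time / multinode running time."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("running times must be positive")
    return t_single / t_multi


def parallel_efficiency(t_single, t_multi, nodes):
    """Speedup divided by the node count; 1.0 means linear speedup."""
    if nodes < 1:
        raise ValueError("node count must be at least 1")
    return speedup(t_single, t_multi) / nodes


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from this study.
    t1, t4 = 100.0, 40.0
    print(speedup(t1, t4), parallel_efficiency(t1, t4, nodes=4))
```

Communication and load-balancing overheads add to the multinode running time, which is why the measured efficiency normally stays below 1.0.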

Figure 6 shows how the speedup grows as the number of nodes in the Hadoop cluster increases; a larger data size leads to a larger speedup. For the same dataset, the computing speed of the system improves as the number of nodes in the Hadoop cluster increases; that is, the computing time decreases. Therefore, the speedup curve rises. Across the different datasets, a larger number of images results in better performance by the multinode architecture, and the computing speed becomes much higher than that of the single-node architecture. Furthermore, the system speedup is almost linear when the number of images reaches 50,000, which clearly demonstrates the superior performance of MapReduce parallel processing on the Hadoop platform.

Sizeup is defined as how much longer a task takes on a given system when the data scale is 𝑚 times larger than the original data scale. In other words, a higher sizeup means that the Hadoop cluster takes longer to complete the task as the data size increases. To evaluate the sizeup, we varied the number of slave nodes in the Hadoop cluster from 1 to 4 and varied the number of images from 3,000 to 60,000. The experimental results are shown in Figure 7.
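The sizeup definition above can be sketched in Python as follows. The function names and sample timings are hypothetical and serve only to illustrate the ratio; `grows_sublinearly` is an added helper, not a measure used in this study.

```python
def sizeup(t_base, t_scaled):
    """Sizeup = running time on the m-times larger dataset divided by
    the running time on the original dataset, on the same cluster."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("running times must be positive")
    return t_scaled / t_base


def grows_sublinearly(t_base, t_scaled, m):
    """True if the running time grew by less than the data factor m."""
    return sizeup(t_base, t_scaled) < m


if __name__ == "__main__":
    # Hypothetical timings in seconds for a dataset made 4 times larger.
    t_base, t_scaled, m = 50.0, 125.0, 4
    print(sizeup(t_base, t_scaled), grows_sublinearly(t_base, t_scaled, m))
```

A sizeup below the data factor 𝑚 indicates that the running time grows more slowly than the dataset, which is the desirable case.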

As Figure 7 shows, when the image scale increases from 3,000 to 60,000, the sizeup increases by 2.97 for the 1-node cluster, whereas it increases by only 2.34 for the 4-node

Figure 7: Comparison of sizeup. (Line plot of sizeup, 1 to 6, versus image number, 10,000 to 60,000, with one curve each for 1, 2, 3, and 4 nodes.)

system. The sizeup for the 4-node cluster grows less due to increased communication time among the nodes. Although the communication time of the 4-node cluster system is longer than that of the 1-node system, it does not increase significantly under the proposed approach as the number of images increases. Therefore, the approach proposed in this study obtains a good sizeup performance.

Scaleup is often used to measure the performance of an algorithm when the system and data sizes increase together, and it refers to the capability of an 𝑚-times larger system to complete an 𝑚-times larger job within the same running time as the original system. Therefore, the scaleup value illustrates how much better an algorithm addresses big data when more slave nodes are available in the Hadoop cluster. Specifically, a higher scaleup value denotes better algorithm performance. To measure scaleup, we obtained different scaleup values by increasing the number of slave nodes and the image scale simultaneously using the following combinations: (1-node cluster, 3,000 images), (2-node cluster, 5,000 images), (3-node cluster, 30,000 images), and (4-node cluster, 60,000 images). Figure 8 shows the experimental results.

The scaleup values in Figure 8 are all above 0.90, which indicates that the algorithm proposed in this study scales well.
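The scaleup definition used above can be sketched in Python as follows. The function name and sample timings are hypothetical and are not measurements from this study.

```python
def scaleup(t_base, t_scaled_system):
    """Scaleup = running time of the original job on the original system
    divided by the running time of the m-times larger job on the
    m-times larger system. A value of 1.0 is ideal."""
    if t_base <= 0 or t_scaled_system <= 0:
        raise ValueError("running times must be positive")
    return t_base / t_scaled_system


if __name__ == "__main__":
    # Hypothetical timings in seconds: 1 node on the base dataset versus
    # 4 nodes on a dataset 4 times larger.
    print(round(scaleup(100.0, 108.0), 3))
```

A value close to 1.0 means that adding nodes in proportion to the data keeps the running time nearly constant; values below 1.0 reflect the extra coordination cost of the larger cluster.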

6. Conclusions

Edge detection is a basic problem in the image processing field. The accuracy of edge detection strongly influences subsequent operations such as feature extraction, object recognition, 3D reconstruction, image matching, and quantitative analysis. The rapid development of network technology and multimedia technology has resulted in a sharp increase in the number of available images, far exceeding the abilities of traditional image processing algorithms. The algorithms for extracting edge features from images are no exception.


Figure 8: Comparison of scaleup. (Plot of scaleup, 0.90 to 1.00, versus the number of slave nodes, 1 to 4.)

Currently, the open-source Hadoop platform is widely used because it is convenient and inexpensive for building clustered systems. Moreover, it provides an easy-to-use parallel distributed storage and programming model. Researchers in both academia and industry are developing new ways to apply algorithms developed for the traditional single-node architecture environment in Hadoop cluster environments.

This study presented a parallel Otsu-Canny edge detection algorithm based on the MapReduce framework of the Hadoop platform to realize edge feature extraction from massive numbers of images. Moreover, it fostered a deeper exploration and discussion of the parallel design and realization of the Otsu-Canny algorithm. This study investigated three topics: combining the MapReduce parallel programming model with a traditional edge detection algorithm, the parallel design and implementation of the Otsu-Canny algorithm, and fast and effective automated edge feature extraction for massive numbers of images on the Hadoop platform. The completed algorithm was tested with images from the Pascal VOC2012 image database. The experimental results indicate that the proposed algorithm can not only address massive datasets but also achieve good system performance in terms of the speedup, sizeup, and scaleup evaluation metrics. The experimental results also demonstrate that the proposed algorithm can take full advantage of the resources of the distributed cluster system to improve the algorithm’s edge detection performance and that it parallelizes well. In addition, the distributed parallel cluster system based on the MapReduce framework on the Hadoop platform improved the runtime performance of the algorithm significantly compared to traditional algorithms with a single-node architecture, which demonstrates the strong computing ability of distributed parallel processing.

The analysis and processing of big data, especially image data, have become a hot research topic with the arrival of the big data era. In the future, we plan to conduct a study that considers the following aspects: optimizing the design of the Map and Reduce tasks used in the MapReduce framework and real-time large-scale image edge detection processing.

Data Availability

This work involved data from the Pascal VOC2012 image database, which is publicly accessible at http://cvlab.postech.ac.kr/~mooyeol/pascal_voc_2012/ [32].

Conflicts of Interest

The authors have no conflicts of interest.

Authors’ Contributions

Jianfang Cao devised the study plan and led the writing of the article. Jianfang Cao and Min Wang conducted the experiments and collected the data. Jianfang Cao and Yun Tian conducted the analysis, and Lichao Chen supervised the entire process and provided constructive advice. Jianfang Cao and Min Wang contributed equally to this work.

Acknowledgments

This study was supported by the Natural Science Foundation of Shanxi Province (201701D21059), the Education Science Planning Projects in the 13th Five-Year Plan of the Key Discipline Project of Shanxi Province (GH-17059), the Art and Science Planning Project of Shanxi Province (2017F06), and the Science and Technology Cooperation Project of Shanxi Province and the Chinese Academy of Sciences (20141101001).

Supplementary Materials

The supplementary materials consist of the core source code of the Map and Reduce tasks, respectively. (Supplementary Materials, available at http://downloads.hindawi.com/journals/cin/2018/3598284.f1.pdf)

References

[1] C. Akinlar and C. Topal, “ColorED: Color edge and segment detection by Edge Drawing (ED),” Journal of Visual Communication and Image Representation, vol. 44, pp. 82–94, 2017.

[2] E. C. Hildreth, Edge Detection, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge, 1985.

[3] B. Julesz, “A Method of Coding Television Signals Based on Edge Detection,” Bell System Technical Journal, vol. 38, no. 4, pp. 1001–1020, 1959.

[4] L. D. Roberts, Machine perception of three-dimension solids in optical and electro-optimal information processing, Massachusetts Institute of Technology Press, Cambridge, UK, 1966.

[5] B. Tang and W. Long, “Fast Canny algorithm based on GPU + CPU,” Chinese Journal of Liquid Crystals and Displays, vol. 31, no. 7, pp. 714–720, 2016.

[6] H. Y. Cui, J. F. Cao, and H. Shi, “Parallel PSO-BP neural network algorithm based on MapReduce,” Bulletin of Science and Technology, vol. 33, no. 4, pp. 110–115, 2017.

[7] W. S. Zhu and P. Wang, “Large-scale image retrieval solution based on Hadoop cloud computing platform,” Journal of Computer Applications, vol. 34, no. 3, pp. 695–699, 2014.

[8] A. L. Wang and X. S. Liu, “Vehicle license plate location based on improved Roberts operator and mathematical morphology,” in Second International Conference on Instrumentation & Measurement, Computer, Communication and Control (IMCCC 2012), Harbin, China, 2012.

[9] T. Yang, H. W. Tian, X. M. Liu, X. T. Ke, and P. J. Xing, “Otsu thresholding segmentation method based on two boundaries and its fast algorithm,” Application Research of Computers, vol. 33, no. 12, pp. 3872–3875, 2016.

[10] H. Zhang, Q. Zhu, C. Fan, and D. Deng, “Image quality assessment based on Prewitt magnitude,” AEU - International Journal of Electronics and Communications, vol. 67, no. 9, pp. 799–803, 2013.

[11] N. Dwivedi, K. Srivastava, and N. Arya, “Sanskrit word recognition using Prewitt’s operator and support vector classification,” in Proceedings of the IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN’13), Tirunelveli, India, March 25-26, 2013.

[12] P.-M. Nguyen, J.-H. Cho, and S. B. Cho, “An architecture for real-time hardware co-simulation of edge detection in image processing using Prewitt edge operator,” in Proceedings of the 13th International Conference on Electronics, Information, and Communication, ICEIC 2014, Malaysia, January 2014.

[13] S. Singh, A. K. Saini, R. Saini, A. S. Mandal, C. Shekhar, and A. Vohra, “A novel real-time resource efficient implementation of Sobel operator-based edge detection on FPGA,” International Journal of Electronics, vol. 101, no. 12, pp. 1705–1715, 2014.

[14] W. Jiang, H. Zhou, Y. Shen, B. Liu, and Z. Fu, “Image segmentation with pulse-coupled neural network and Canny operators,” Computers and Electrical Engineering, vol. 46, pp. 528–538, 2015.

[15] Z. Tang, L. Huang, X. Zhang, and H. Lao, “Robust image hashing based on color vector angle and Canny operator,” AEU - International Journal of Electronics and Communications, vol. 70, no. 6, pp. 833–841, 2016.

[16] Z. G. Sun and C. Z. Han, “Image enhancement based on Laplacian operator,” Application Research of Computers, vol. 24, no. 1, pp. 222-223, 2007.

[17] J. W. Yang, T. B. Cheng, Z. Y. Zhong, Y. J. Cao, and W. Q. Li, “Edge detection technique combined with mathematic morphology and LoG operator,” Computer Engineering and Applications, vol. 47, no. 36, pp. 177–179, 2011.

[18] G. M. H. Amer and A. M. Abushaala, “Edge detection methods,” in Proceedings of the 2015 2nd World Symposium on Web Applications and Networking, WSWAN 2015, Tunisia, March 2015.

[19] K. A. Panetta, E. J. Wharton, and S. S. Agaian, “Logarithmic edge detection with applications,” Journal of Computers, vol. 3, no. 9, pp. 11–19, 2008.

[20] Z. P. Sun, X. H. Shao, Z. Wang, and Y. X. Zhang, “The improved adaptive Canny edge detection algorithm,” Electrical Measurement & Instrumentation.

[21] M. Ronggui, W. Weixing, and L. Sheng, “Extracting roads based on retinex and improved canny operator with shape criteria in vague and unevenly illuminated aerial images,” Journal of Applied Remote Sensing, vol. 6, no. 1, Article ID 12248, 2012.

[22] S. Guiming and S. Jidong, “Remote sensing image edge-detection based on improved Canny operator,” in Proceedings of the 8th IEEE International Conference on Communication Software and Networks, ICCSN 2016, pp. 652–656, China, June 2016.

[23] G. Mark, M. Ted, S. Jonathan, and S. Gwen, Hadoop Application Architectures, Addison, Beijing, China, 2017.

[24] A. O’Driscoll, J. Daugelaite, and R. D. Sleator, “‘Big data’, Hadoop and cloud computing in genomics,” Journal of Biomedical Informatics, vol. 46, no. 5, pp. 774–781, 2013.

[25] K. H. Lee, Y. J. Lee, H. Choi, Y. D. Chung, and B. Moon, “Parallel data processing with MapReduce: a survey,” ACM SIGMOD Record, vol. 40, no. 4, pp. 11–20, 2011.

[26] N. K. Alham, M. Li, Y. Liu, and S. Hammoud, “A MapReduce-based distributed SVM algorithm for automatic image annotation,” Computers & Mathematics with Applications, vol. 62, no. 7, pp. 2801–2811, 2011.

[27] J. Cao, H. Cui, H. Shi, and L. Jiao, “Big data: A parallel particle swarm optimization-back-propagation neural network algorithm based on MapReduce,” PLoS ONE, vol. 11, no. 6, Article ID e0157551, 2016.

[28] X. Li, J. Song, F. Zhang, X. Ouyang, and S. U. Khan, “MapReduce-based fast fuzzy c-means algorithm for large-scale underwater image segmentation,” Future Generation Computer Systems, vol. 65, pp. 90–101, 2016.

[29] J. Cao, M. Wang, H. Shi, G. Hu, and Y. Tian, “A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallel k-Means Algorithm in MapReduce Environment,” Mathematical Problems in Engineering, vol. 2016, Article ID 3593975, 2016.

[30] Y. Zheng, Y. Zhou, H. Zhou, and X. Gong, “Ultrasound image edge detection based on a novel multiplicative gradient and canny operator,” Ultrasonic Imaging, vol. 37, no. 3, pp. 238–250, 2015.

[31] T. Xue and L. X. Li, “Adaptive Canny edge detection algorithm,” Application Research of Computers, vol. 27, no. 9, pp. 3589-3590, 2010.

[32] T. White, Hadoop: The Definitive Guide, O’Reilly Media, Sebastopol, 3rd edition, 2012.

[33] C. Doulkeridis and K. Norvag, “A survey of large-scale analytical query processing in MapReduce,” The VLDB Journal, vol. 23, no. 3, pp. 355–380, 2014.

[34] H. Xu, Z. B. Li, Y. Y. Jiang, and J. B. Huang, “Pavement crack detection based on OpenCV and improved Canny operator,” Computer Engineering and Design, vol. 35, no. 12, pp. 4254–4258, 2014.

[35] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, “Evaluating MapReduce for multi-core and multiprocessor systems,” in Proceedings of the 13th IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 13–24, Scottsdale, Ariz, USA, February 2007.

[36] W. S. Zhu and P. Wang, “Large-scale image retrieval solution based on Hadoop cloud computing platform,” Journal of Computer Applications, vol. 34, pp. 695–699, 2014.
