
VOL. 3, NO. 4, April 2012    ISSN 2079-8407
Journal of Emerging Trends in Computing and Information Sciences
©2009-2012 CIS Journal. All rights reserved.
http://www.cisjournal.org

Cloud Hadoop Map Reduce For Remote Sensing Image Analysis

Mohamed H. Almeer

Computer Science & Engineering Department, Qatar University, Doha
[email protected], [email protected]

ABSTRACT
Image processing algorithms related to remote sensing have been tested and utilized on the Hadoop MapReduce parallel platform by using an experimental 112-core high-performance cloud computing system that is situated in the Environmental Studies Center at the University of Qatar. Although there has been considerable research utilizing the Hadoop platform for image processing rather than for its original purpose of text processing, it had never been proved that Hadoop can be successfully utilized for high-volume image files. Hence, the successful utilization of Hadoop for image processing has been researched using eight different practical image processing algorithms. We extend the file approach in Hadoop to regard the whole TIFF image file as a unit by expanding the file format that Hadoop uses. Finally, we apply this to other image formats such as the JPEG, BMP, and GIF formats. Experiments have shown that the method is scalable and efficient in processing multiple large images used mostly for remote sensing applications, and the difference between the single PC runtime and the Hadoop runtime is clearly noticeable.

    Keywords: Cloud, Hadoop, HPC, Image processing, Map reduce.

    1. INTRODUCTION

The remote sensing community has recognized the challenge of processing large and complex satellite datasets to derive customized products, and several efforts have been made in the past few years towards the incorporation of high-performance computing models. This study analyzes the recent advancements in distributed computing technologies as embodied in the MapReduce programming model and extends that model for the image processing of collected remote sensing images of Qatar.

We make two contributions, which are the alleviation of the scarcity of parallel algorithms for processing large numbers of images using the parallel Hadoop MapReduce framework and the implementation for Qatari remote sensing images. Our research has been conducted to find an efficient programming method for customized processing within the Hadoop MapReduce framework and to determine how this can be implemented. Performance tests for processing large archives of Landsat images were performed with Hadoop. The following eight image analysis algorithms were used: Sobel filtering, image resizing, image format conversion, auto contrasting, image sharpening, picture embedding, text embedding, and image quality inspection. The findings demonstrate that MapReduce has a potential for processing large-scale remotely sensed images and solving more complex geospatial problems.

Current approaches to processing images depend on processing a small number of images having a sequential processing nature. These processing loads can almost fit on a single computer equipped with a relatively small memory. Still, we can observe that more disk space is needed to store the large-scale image repository that usually results from satellite-collected data.

The large-scale TIFF images of the Qatar Peninsula taken from satellites are used in this image processing approach. Other image formats such as BMP, GIF, and JPEG can also be handled by the Hadoop Java programming model developed. That is, the model can function on those formats as well as the original TIFF. The developed approach can also convert other formats, besides TIFF, to JPEG.

    2. RELATED WORKS

Kocakulak and Temizel [3] used Hadoop and MapReduce to perform ballistic image analysis, which requires that a large database of images be compared with an unknown image. They used a correlation method to compare the library of images with the unknown image. The process imposed a high computational demand, but its processing time was reduced dramatically when 14 computational nodes were utilized. Li et al. [4] used a parallel clustering algorithm with Hadoop and MapReduce. This approach was intended to reduce the time taken by clustering algorithms when applied to a large number of satellite images. The process starts by clustering each pixel with its nearest cluster and then calculates all the new cluster centers on the basis of every pixel in one cluster set. Another clustering of remote sensing images has been reported by Lv et al. [2], but a parallel K-means approach was used. Objects with similar spectral values were clustered together without any former knowledge. The parallel environment was assisted by the Hadoop MapReduce approach taken, since the execution of this algorithm is both time- and memory-consuming. Golpayegani and Halem [5] used a Hadoop MapReduce framework to operate on a large number of Aqua satellite images collected by the AIRS instruments. A gridding of satellite data is performed using this parallel approach.

In all those cases, with the increase of image data the parallel algorithm conducted by MapReduce exhibited superiority over a single-machine implementation. Moreover, by using higher performance hardware, the superiority of the MapReduce algorithm was better reflected.

Research on image processing on top of the Hadoop environment, a relatively new field for working on satellite images, has so far produced only a few successful approaches. That is why we decided to use the same eight techniques found in the ordinary discipline of image processing and apply the algorithms to remote sensing data on Qatar by means of a Java program implemented in a Hadoop MapReduce environment within a cluster of computing machines.

    3. HADOOP - HDFS MAPREDUCE

Hadoop is an Apache software project that includes challenging subprojects such as the MapReduce implementation and the Hadoop distributed file system (HDFS), which is similar to the main Google file system implementation.

In our study, as in others, the MapReduce programming model will be actively working with this distributed file system. HDFS is characterized as a highly fault-tolerant distributed file system that can store a large number of very large files on cluster nodes. MapReduce is built on top of the HDFS file system but is independent of it.

HDFS is an open source implementation of the Google file system (GFS). Although it appears as an ordinary file system, its storage is actually distributed among different data nodes in different clusters. HDFS is built on the principle of the master-slave architecture. The master node (name node) provides data service and access permission to the slave nodes, while the slaves (data nodes) serve as storage for the HDFS. Large files are distributed and further divided among multiple data nodes. The map processing jobs located on all nodes operate on their local copies of the data. It can be observed that the name node stores only the metadata and the log information, while the data transfer to and from the HDFS is done through the Hadoop API.
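To make the role of the Hadoop API concrete, the following minimal Java sketch (not taken from this paper; the class name and paths are placeholders) copies an image into and out of HDFS through the FileSystem class. The client contacts the name node only for metadata, while the file bytes stream to and from the data nodes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml, hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // client handle; name node serves metadata only
            // Upload a local image into HDFS; the bytes go directly to the data nodes.
            fs.copyFromLocalFile(new Path("/tmp/scene.tif"), new Path("/images/in/scene.tif"));
            // Download a processed result back to the local disk.
            fs.copyToLocalFile(new Path("/images/out/scene.jpg"), new Path("/tmp/scene.jpg"));
            fs.close();
        }
    }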

MapReduce is a parallel programming model for processing large amounts of data on cluster computers with unreliable and weak communication links. MapReduce is based on the scale-out principle, which involves clustering a large number of desktop computers. The main point of using MapReduce is to move computations to data nodes, rather than bring data to computation nodes, and thus fully utilize the advantage of data locality. The code that divides work, exerts control, and merges output in MapReduce is entirely hidden from the application user inside the framework. In fact, most parallel applications can be implemented in MapReduce as long as synchronized and shared global states are not required.

MapReduce allows the computation to be done in two stages: the map stage and then the reduce stage. The data are split into sets of key-value pairs, and their instances are processed in parallel by the map stage, with a degree of parallelism that matches the number of nodes dedicated as slaves. This process generates intermediate key-value pairs that are temporary and can later be directed to reduce stages. Within map stages or reduce stages, the processing is conducted in parallel. The map and reduce stages occur in a sequential manner, whereby the reduce stage starts when the map stage finishes.

    3.1 Non-Hadoop Approach

The current processing of images goes through ordinary sequential ways to accomplish this job. The program loads image after image, processing each image alone before writing the newly processed image to a storage device. Generally, we use very ordinary tools that can be found in Photoshop, for example. Besides, many ordinary C and Java programs can be downloaded from the Internet or easily developed to perform such image processing tasks. Most of these tools run on a single computer with a Windows operating system. Although batch processing can be found in these single-processor programs, there will be problems with the processing due to limited capabilities. Therefore, we are in need of a new parallel approach to work effectively on massed image data.

    3.2 Hadoop Approach to Image Processing

In order to process a large number of images effectively, we use the Hadoop HDFS to store a large amount of remote sensing image data, and we use MapReduce to process these in parallel. The advantages of this approach are three abilities: 1) to store and access images in parallel on a very large scale, 2) to perform image filtering and other processing effectively, and 3) to customize MapReduce to support image formats like TIFF. The file is visited pixel by pixel but accessed whole as one record.

The main attraction of this project will be the distributed processing of large satellite images by using a MapReduce model to solve the problem in parallel.

In past decades, the remote sensing data acquired by satellites have greatly assisted a wide range of applications to study the surface of the Earth. Moreover, these data are available in the form of 2D images, which has assisted us in manipulating them easily using computers.

Remote sensing image processing consists in the use of various algorithms for extracting information from images and analyzing complex scenes. In this study, we will be focusing on spatial transformations such as contrast enhancement, brightness correction, and edge detection, which are based on the image data space. A common approach in such transformations is that new pixel values are generated on the basis of mathematical operations performed on the current pixel in terms of the surrounding ones. The spatial transformations that will be used are characterized as local transforms, operating within a small neighborhood of a pixel.

    4. SOFTWARE IMPLEMENTATION

In this project, it is not our intention to use both mappers and reducers simultaneously. We simply use each map job to read an image file, process it, and finally write the processed image file back into an HDFS storage area. The name of the image file is considered the Key and its byte content is considered the Value. However, the output pair that is directed to the reducer job will not be used, because reducers will not be used at all, since what we need can be accomplished in the map phase only. This scheme is applied to seven of the eight image processing algorithms used in this research, the exception being the algorithm to detect image quality. Here, the Key and the Value for the input parameters of the Map job will remain the same, but the output Key value will be used exclusively. This Key value will be assigned to the name of the corrupted image file. Reducers in this exception will collect all the file names that have undergone defect detection, and these will finally be grouped in a single output file stored in the HDFS output storage area.

In order to customize the MapReduce framework, which is essentially designed for text processing, some changes in Java classes are inevitable for image processing needs. The first thing to consider when using Hadoop for image processing algorithms is how to customize its data types, classes, and functions to read, write, and process binary image files as can be seen in sequential Java programs running on a PC. We can do that by changing the API of Hadoop to carry images in the BytesWritable class instead of the Text class and by stopping the splitting of input files into chunks of data.

    4.1 Whole File Input Class and Splitting

In Hadoop, we use a WholeFileInputFormat class that allows us to deal with an image as a whole file without splitting it. This is a critical modification, since the contents of a file must be processed solely by one Map instance. That is why maps should have access to the full contents of a file. This can be carried out by changing the RecordReader class that delivers the file contents as the value of the record. This particular change can be seen in the WholeFileInputFormat class, which defines a format in which the keys are not used, represented by NullWritable, and the values are the file contents, represented by BytesWritable instances.

The alterations define two methods. In the first, isSplitable() is overridden to return false and thus ensure that input files are never split. In the second, getRecordReader() returns a custom implementation of the usual RecordReader class, as sketched below.
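A minimal sketch of such an input format, written against the old Hadoop 0.20 "mapred" API used in this work, is given below. It is not the paper's exact code: the class and method names follow the description above, and WholeFileRecordReader is the custom reader sketched later in Section 4.3.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    public class WholeFileInputFormat
            extends FileInputFormat<NullWritable, BytesWritable> {

        // Method 1: never split an image file, so one file becomes one record.
        @Override
        protected boolean isSplitable(FileSystem fs, Path filename) {
            return false;
        }

        // Method 2: hand each whole-file split to a custom reader that returns
        // the file contents as a single BytesWritable value with a null key.
        @Override
        public RecordReader<NullWritable, BytesWritable> getRecordReader(
                InputSplit split, JobConf job, Reporter reporter) throws IOException {
            return new WholeFileRecordReader((FileSplit) split, job);
        }
    }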

While the InputFormat is responsible for creating the input splits and dividing them into records, the InputSplit is a Java interface responsible for representing the data to be processed by one Mapper. The RecordReader will read the input splits and then divide the file into records. The record size is equal to the size of the split specified in the InputSplit interface, which specifies the key and value for every record as well.

Splitting of image files is not acceptable, since it would break an image into arbitrary chunks of pixels, one per split. Such splitting of the binary files belonging to a whole image would definitely cause a loss of image features, leading to image noise or a poor-quality representation. Therefore, we enforce that the files be processed in parallel, each as a single record.

Control of splitting is important so that a single Mapper can process a single image file. The method for controlling the process is implemented by overriding the isSplitable() method to return false, thus ensuring that each file will be mapped by one Mapper only and will be represented by one record only.

Now each mapper has been guaranteed to read a single image file, process it, and then write the newly processed image back into the HDFS. A minimal driver configuration that wires these pieces together is sketched below.
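The following driver fragment is a minimal sketch under the same old-API assumptions; it is not the paper's exact code. The mapper class named here (SharpenMapper) is the illustrative mapper sketched after Table 1, and the input and output paths are taken from the command line as placeholders.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NullOutputFormat;

    public class ImageJobDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ImageJobDriver.class);
            conf.setJobName("hadoop-image-processing");

            FileInputFormat.setInputPaths(conf, new Path(args[0]));   // HDFS folder of input images
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // HDFS output folder

            conf.setInputFormat(WholeFileInputFormat.class);          // one image file = one record
            conf.setOutputFormat(NullOutputFormat.class);             // mappers write the images themselves
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputValueClass(NullWritable.class);
            conf.setMapperClass(SharpenMapper.class);                 // e.g., the sharpening Mapper of Sec. 4.2
            conf.setNumReduceTasks(0);                                // map-only job

            JobClient.runJob(conf);
        }
    }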

    4.2 Mappers

The mapper that contains our image processing algorithms is applied to every input image, represented by a key-value pair, to generate an intermediate image and store it back into the HDFS. The key of the Mapper is NullWritable (null) and the value is of the BytesWritable type, which ensures that images are read one record at a time into every mapper. All the processing algorithms will be carried out in the Mapper, and no reducer will be required. A typical Map function for one of the image filtering algorithms is listed in Table 1.


Table 1: Map Function

Input:  public void map(NullWritable key, BytesWritable value, OutputCollector output, Reporter reporter)
Output: (Null, Null)

1. Set the input path and the output path; set the input format to the WholeFileInputFormat class and the output format to the NullOutputFormat class.
2. The image is read from the value that comes to the Mapper; the Mapper reads the image bytes into a ByteArrayInputStream.
3. Read the source BufferedImage using ImageIO.read.
4. Create an FSDataOutputStream instance variable to write images directly to the file system.
5. Create the destination BufferedImage with height and width parameters equal to the source height and width.
6. Create a Kernel instance, specify the matrix size to be 3×3, and set an array of floats, which is the kernel value.
7. Create a BufferedImageOp instance and pass the kernel to the ConvolveOp constructor.
8. Call the filter method of BufferedImageOp to apply the filter to the image.
9. Encode the destination image in JPEG format using the JPEGImageEncoder.
10. Store the image in the HDFS.
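The following Java sketch illustrates the steps of Table 1; it is not the paper's exact code. It assumes the WholeFileInputFormat of Section 4.1 and uses ImageIO together with ConvolveOp in place of the JPEGImageEncoder named in the table; the kernel values and the output path are placeholders.

    import java.awt.image.BufferedImage;
    import java.awt.image.ConvolveOp;
    import java.awt.image.Kernel;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import javax.imageio.ImageIO;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SharpenMapper extends MapReduceBase
            implements Mapper<NullWritable, BytesWritable, NullWritable, NullWritable> {

        // Illustrative 3x3 sharpening kernel; the paper does not list its values.
        private static final float[] SHARPEN = {
             0f, -1f,  0f,
            -1f,  5f, -1f,
             0f, -1f,  0f
        };

        private JobConf conf;

        @Override
        public void configure(JobConf job) {
            this.conf = job;
        }

        @Override
        public void map(NullWritable key, BytesWritable value,
                        OutputCollector<NullWritable, NullWritable> output,
                        Reporter reporter) throws IOException {
            // 1. Decode the whole image file delivered as the record value.
            BufferedImage src = ImageIO.read(
                    new ByteArrayInputStream(value.getBytes(), 0, value.getLength()));

            // 2. Convolve with the 3x3 kernel; filter() allocates a compatible destination.
            ConvolveOp op = new ConvolveOp(new Kernel(3, 3, SHARPEN), ConvolveOp.EDGE_NO_OP, null);
            BufferedImage dst = op.filter(src, null);

            // 3. Encode as JPEG and write directly back to HDFS ("/images/out/result.jpg" is a
            //    placeholder; a real job would derive a per-image file name).
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/images/out/result.jpg"));
            try {
                ImageIO.write(dst, "jpg", out);
            } finally {
                out.close();
            }
            // No key-value pair is emitted: reducers are not used for this algorithm.
        }
    }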

    4.3 Reducers

The algorithm to test image quality is one newly developed in our experimentation and has a different MapReduce implementation. The Mapper task in this implementation is to obtain the names of processed images and pass them through a key-value pair to the Reducers. The key-value pair generated by each Mapper will be the Null key and, as its value, the file name where a degree of content corruption is detected and evaluated. The reducer will collect the names of the files and join them in a single output file.
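A minimal sketch of such a reducer, under the same old-API assumptions and with a hypothetical class name, is shown below. Because every mapper emits the null key, a single reduce call receives all of the flagged file names and writes them to one output file (the job would be configured with one reduce task and a text output format).

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class CorruptFileListReducer extends MapReduceBase
            implements Reducer<NullWritable, Text, Text, NullWritable> {

        // All mappers share the same null key, so this single call sees the name of
        // every image flagged as corrupted and writes each name as one output line.
        @Override
        public void reduce(NullWritable key, Iterator<Text> values,
                           OutputCollector<Text, NullWritable> output,
                           Reporter reporter) throws IOException {
            while (values.hasNext()) {
                output.collect(new Text(values.next()), NullWritable.get());
            }
        }
    }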

The following is a list of the other classes that were altered:

RecordReader: The RecordReader is another class that has been customized for image file access. This is the one responsible for loading the data from its source and converting it into (key, value) pairs suitable for Mapper processing. By altering this to read a whole input image as one record passed to the Mappers, we ensure no splitting of the image content as is usually done for text files. A sketch of such a reader is given below.

BufferedImage: The BufferedImage is introduced here as a way to contain an image file and ensure that its pixels can be accessed. This is a subclass that describes an Image with an accessible buffer of image data. A BufferedImage is comprised of a ColorModel and a Raster of image data. In order to process an image, a BufferedImage must be used to read the image.

JPEGImageEncoder: The JPEGImageEncoder is used extensively to encode buffers of image data as JPEG data streams, the preferred output format used to store images. Users of this interface are required to provide image data in a Raster or a BufferedImage, set the necessary parameters in the JPEGEncodeParam object, and successfully open the OutputStream that is the destination of the encoded JPEG stream. The JPEGImageEncoder interface can encode image data as abbreviated JPEG data streams that are written to the OutputStream provided to the encoder.
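A minimal sketch of the customized record reader described above (old Hadoop 0.20 "mapred" API; not the paper's exact code) is the following. It delivers the entire file as a single record, with a NullWritable key and the raw file bytes as a BytesWritable value.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.RecordReader;

    public class WholeFileRecordReader
            implements RecordReader<NullWritable, BytesWritable> {

        private final FileSplit fileSplit;
        private final Configuration conf;
        private boolean processed = false;

        public WholeFileRecordReader(FileSplit fileSplit, Configuration conf) {
            this.fileSplit = fileSplit;
            this.conf = conf;
        }

        public NullWritable createKey()    { return NullWritable.get(); }
        public BytesWritable createValue() { return new BytesWritable(); }
        public long getPos() throws IOException { return processed ? fileSplit.getLength() : 0; }
        public float getProgress() throws IOException { return processed ? 1.0f : 0.0f; }
        public void close() throws IOException { }

        // Deliver the entire file as one record: the key stays null, the value holds the bytes.
        public boolean next(NullWritable key, BytesWritable value) throws IOException {
            if (processed) {
                return false;                        // only one record per file
            }
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }
    }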

5. IMAGE PROCESSING ALGORITHMS USED IN REMOTE SENSING

    5.1 Color Sharpening

Sharpening is one of the most frequently used transformations that can be applied to an image for enhancement purposes, and we will use the ordinary methods of sharpening here to bring out image details that were not apparent before. Image sharpening is used to enhance both the intensity and the edges of the image in order to obtain the perceived image.

The sharpening code is implemented using convolution, which is an operation that computes the destination pixel by multiplying the source pixel and its neighbors by a convolution kernel. The kernel is a linear operator that describes how a specified pixel and its surrounding pixels affect the value computed in the destination image of a filtering operation. Specifically, the kernel used here is represented by a 3×3 matrix of floating-point numbers. When the convolution operation is performed, this 3×3 matrix is used as a sliding mask to operate on the pixels of the source image. To compute the result of the convolution for a pixel located at coordinates (x, y) in the source image, the center of the kernel is positioned at these coordinates. To compute the value of the destination pixel at (x, y), a multiplication is performed on the kernel values with their corresponding color values in the source image.

    5.2 Sobel Filter

The Sobel edge filter is used to detect edges and is based on applying horizontal and vertical filters in sequence. Both filters are applied to the image and summed to form the final result. Edges characterize boundaries; therefore, edges are fundamental in image processing. Edges in images are areas with strong intensity contrasts. Detecting the edges in an image significantly reduces the amount of data and filters out useless information while preserving the important structural properties in an image.



The Sobel operator applies a 2-D spatial gradient measurement to an image and is represented by an equation. It is used to find the approximate gradient magnitude at each point in an input grayscale image. The discrete form of the Sobel operator can be represented by a pair of 3×3 convolution kernels, one estimating the gradient in the x-direction (columns) and the other estimating the gradient in the y-direction (rows). The Gx kernel highlights the edges in the horizontal direction, while the Gy kernel highlights the edges in the vertical direction. The overall magnitude of the outputs, |G|, detects edges in both directions and is the brightness value of the output image.

    Gx = | -1  0  +1 |        Gy = | -1  -2  -1 |
         | -2  0  +2 |             |  0   0   0 |
         | -1  0  +1 |             | +1  +2  +1 |                      (1)

    |G| = sqrt(Gx^2 + Gy^2)                                            (2)

In detail, the kernel mask is slid over an area of the input image, changes the value of that pixel, and is then shifted one pixel to the right. It continues toward the right until it reaches the end of the row, and then it starts at the beginning of the next row. The center of the mask is placed over the pixel that will be manipulated in the image, and the file pointer is moved using the (i, j) values so that the multiplications can be performed. After taking the magnitude of both kernels, the resulting output detects edges in both directions.
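The following standalone Java sketch (not the paper's code) shows this computation for an 8-bit grayscale BufferedImage: both kernels are slid over the interior pixels and the clamped magnitude |G| is written to the output image.

    import java.awt.image.BufferedImage;
    import java.awt.image.Raster;
    import java.awt.image.WritableRaster;

    public class Sobel {
        private static final int[][] GX = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
        private static final int[][] GY = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };

        public static BufferedImage apply(BufferedImage gray) {
            int w = gray.getWidth(), h = gray.getHeight();
            BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
            Raster in = gray.getRaster();
            WritableRaster dst = out.getRaster();
            // Slide the 3x3 masks over the interior pixels (borders are left at 0).
            for (int y = 1; y < h - 1; y++) {
                for (int x = 1; x < w - 1; x++) {
                    int gx = 0, gy = 0;
                    for (int j = -1; j <= 1; j++) {
                        for (int i = -1; i <= 1; i++) {
                            int p = in.getSample(x + i, y + j, 0);
                            gx += GX[j + 1][i + 1] * p;
                            gy += GY[j + 1][i + 1] * p;
                        }
                    }
                    // |G| = sqrt(Gx^2 + Gy^2), clamped to the 8-bit output range.
                    int mag = (int) Math.min(255, Math.round(Math.sqrt(gx * gx + gy * gy)));
                    dst.setSample(x, y, 0, mag);
                }
            }
            return out;
        }
    }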

    5.3 Auto-Contrast

Automatic contrast adjustment (auto-contrast) is a point operation whose task is to modify the pixel intensities with respect to the range of values in an image. This is done by mapping the current darkest and brightest pixels to the lowest and highest available intensity values, respectively, and linearly distributing the intermediate values.

Let us assume that a_low and a_high are the lowest and highest pixel values found in the current image, whose full intensity range is [a_min, a_max]. We first map the smallest pixel value a_low to zero, subsequently increase the contrast by the factor (a_max - a_min)/(a_high - a_low), and finally shift to the target range by adding a_min. According to this mathematical formula, the auto-contrast mapping is calculated as

    f(a) = a_min + (a - a_low) * (a_max - a_min) / (a_high - a_low)    (3)

provided that a_high ≠ a_low; for an 8-bit image, a_min = 0 and a_max = 255.

The aim of this image process is to brighten the darkest places in satellite images by a certain factor. For example, in the images captured from the satellite the sea is represented by a very dark color and does not look obvious; therefore, using this process greatly helps to reveal the image details by increasing their brightness.
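A minimal Java sketch of this point operation for an 8-bit grayscale image (so a_min = 0 and a_max = 255; not the paper's code) is given below: a first pass finds a_low and a_high, and a second pass applies the linear mapping of Eq. (3) in place.

    import java.awt.image.BufferedImage;
    import java.awt.image.WritableRaster;

    public class AutoContrast {
        public static void apply(BufferedImage gray) {
            WritableRaster r = gray.getRaster();
            int w = gray.getWidth(), h = gray.getHeight();
            int low = 255, high = 0;
            // Pass 1: find the darkest (a_low) and brightest (a_high) pixel values.
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    int a = r.getSample(x, y, 0);
                    if (a < low)  low = a;
                    if (a > high) high = a;
                }
            if (high == low) return;              // flat image: nothing to stretch
            // Pass 2: map a_low -> 0 and a_high -> 255, linearly in between.
            double scale = 255.0 / (high - low);
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++) {
                    int a = r.getSample(x, y, 0);
                    r.setSample(x, y, 0, (int) Math.round((a - low) * scale));
                }
        }
    }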

    6. EXPERIMENTATION

    6.1 Environment Setup

Qatar University has recently received a dedicated cloud infrastructure called the QU Cloud. This runs a 14-blade IBM BladeCenter (cluster) with 112 physical cores and 7 TB of storage. The cloud has advanced management software (IBM Cloud 1.4) with a Tivoli provisioning manager that allows cloud resources to be provisioned through software.

Our experiments were performed on this cluster equipped with Hadoop. This project has been provisioned with one NameNode and eight DataNodes. The NameNode was configured to use two 2.5-GHz CPUs, 2 GB of RAM, and 130 GB of storage space.

Each DataNode was configured to use two 2.5-GHz CPUs, 2 GB of RAM, and 130 GB of disk storage. Besides this, all the computing nodes were connected by a gigabit switch. Red Hat Linux 5.3, Hadoop 0.20.1, and Java 1.6.0_6 were installed on both the NameNode and the DataNodes. Table 2 shows the master-slave hardware configuration, while Tables 4 and 5 show the cluster hardware and software configurations, respectively.

Table 2: Master/Slave Configuration

Name     Number   Details
Master   1        2 × 2.5 GHz CPU node, 2 GB RAM, 130 GB disk space
Slaves   8        2 × 2.5 GHz CPU node, 2 GB RAM, 130 GB disk space

Table 3: Single Machine Configuration

PC                  Details
Personal Computer   Intel Core 2 Duo CPU, 2.27 GHz, 4 GB RAM, 500 GB total storage; Operating System: Windows Vista


Table 4: Cluster Hardware Configuration

Name         Number   Details
Name Node    1        2 × 2.5 GHz CPU node, 2 GB RAM
Data Nodes   8        2 × 2.5 GHz CPU node, 2 GB RAM
Network      1        Gbit switch connecting all nodes
Storage      9        130 GB each

Table 5: Software Configuration for Cluster Computers

Name                 Version   Details
Hadoop               0.20.1    Installed on each node of the cluster
Xen Red Hat Linux    5.2       Pre-configured with Java and the Hadoop SDKs
IBM Java SDK         6         Used in programming the image processing algorithms

    6.2 Data Source

The data source was a real color satellite image of Qatar supplied by the Environmental Studies Center at Qatar University. Subsequent processing was done on those images to change their sizes so that the Java compiler, with its limited heap size, could open them successfully. The collection of remote sensing images taken from Landsat satellites consists of 200 image files, each of which has a resolution of 16669 × 8335 pixels in TIF format. The spatial resolution was 30.0 meters per pixel, and in our experiments only a small part of the images was used.

The developed Java program reads the JPEG, TIFF, GIF, and BMP standard image formats. Although the JPEG and GIF image formats are the most commonly used, the TIF format is the standard universal format for high-quality images and is used for archiving important images. Most satellite images are in TIF format because TIFF images are stored with their full quality; hence, TIFF images are very large to upload, resize, and process.

When the processing finishes, all the images will have been encoded in the JPEG format by calling the JPEGEncoder method to perform the required JPEG compression for each image. Then, each image will be saved in a JPEG file. The JPEG format is used when an image of high quality is not necessary (e.g., to prove the processing here). JPEG compression reduces images to a size that is suitable for processing and non-archival purposes as needed in this study.

7. EXPERIMENTAL RESULTS & ANALYSIS

In this section, we test the performance of some parallel image filtering algorithms applied to remote sensing images available in the Environmental Studies Center. We mainly consider the trend in time consumption with the increase in the volume of data, and we try to show the difference in run time between a single PC implementation and the parallel Hadoop implementation.

Tables 3-5 list the different configurations of the cluster and single PC computing platforms used to obtain the experimental results. We repeated each experiment three times, and the average reading was finally recorded. We implemented the image sharpening, Sobel filtering, and auto-contrasting by using Java. The programs ran well when the size of the input image was not large.

Figures 1-3 show run time against file size for the two configurations, namely the single machine and the cloud cluster. Three different image processing algorithms were used for experimentation, and different timings were recorded because each algorithm was uniquely different in its numerical processing overhead. These useful image processing algorithms were chosen for the Hadoop project because of their frequent use in remote sensing and their diverse computational loads. We paid more attention to the processing time than to the number of images processed. Moreover, we considered the elapsed time as the sole performance measure. We compared the times taken by the single PC and the cluster to observe the speedup factors and finally analyzed the results.


    Figure 1: Auto-contrast chart.

    Figure 2: Sobel filtering chart.

    Figure 3: Sharpening chart.

    Figure 4: Sample image output for auto-contrast,Sobel filtering, and sharpening (Khor Al Udaid Beach).

    8. CONCLUSION AND FUTURE WORK

In this paper, we presented a case study of implementing parallel processing of remote sensing images in TIF format by using the Hadoop MapReduce framework. The experimental results have shown that typical image processing algorithms can be effectively parallelized with acceptable run times when applied to remote sensing images. A large number of images cannot be processed efficiently in the customary sequential manner. Although originally designed for text processing, Hadoop MapReduce installed on a parallel cluster proved suitable for processing TIF format images in large quantities.

For companies that own massive images and need to process those in various ways, our implementation proved efficient and successful. In particular, the best results show that the auto-contrast algorithm reached a sixfold speedup when implemented on the cluster rather than a single PC. Moreover, the sharpening algorithm reached an eightfold reduction in run time, while the Sobel filtering algorithm reached a sevenfold reduction.


We have observed that this parallel Hadoop implementation is better suited for large data sizes than for computationally intensive applications. In the future, we might focus on using different image sources with different algorithms that have a computationally intensive nature.

    REFERENCES

[1] G. Fox, X. Qiu, S. Beason, J. Choi, J. Ekanayake, T. Gunarathne, M. Rho, H. Tang, N. Devadasan, and G. Liu, "Biomedical case studies in data intensive computing," in Cloud Computing (LNCS vol. 5931), M. G. Jaatun, G. Zhao, and C. Rong, Eds. Berlin: Springer-Verlag, 2009, pp. 2-18.

[2] Z. Lv, Y. Hu, H. Zhong, J. Wu, B. Li, and H. Zhao, "Parallel K-means clustering of remote sensing images based on MapReduce," in Proc. 2010 Int. Conf. Web Information Systems and Mining (WISM '10), pp. 162-170.

[3] H. Kocakulak and T. T. Temizel, "A Hadoop solution for ballistic image analysis and recognition," in 2011 Int. Conf. High Performance Computing and Simulation (HPCS), Istanbul, pp. 836-842.

[4] B. Li, H. Zhao, and Z. H. Lv, "Parallel ISODATA clustering of remote sensing images based on MapReduce," in 2010 Int. Conf. Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Huangshan, pp. 380-383.

[5] N. Golpayegani and M. Halem, "Cloud computing for satellite data processing on high end compute clusters," in Proc. 2009 IEEE Int. Conf. Cloud Computing (CLOUD '09), Bangalore, pp. 88-92.

[6] T. Dalman, T. Dörnemann, E. Juhnke, M. Weitzel, M. Smith, W. Wiechert, K. Nöh, and B. Freisleben, "Metabolic flux analysis in the cloud," in IEEE 6th Int. Conf. e-Science, Brisbane, 2010, pp. 57-64.

[7] Hadoop. http://hadoop.apache.org/.

[8] MapReduce. http://hadoop.apache.org/mapreduce/.

[9] HDFS. http://hadoop.apache.org/hdfs/.

[10] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107-113, 2008.
