

Hindawi Publishing Corporation, Computational Biology Journal, Volume 2013, Article ID 707540, 12 pages. http://dx.doi.org/10.1155/2013/707540

Research Article

SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

Darío Guerrero-Fernández,1 Juan Falgueras,2 and M. Gonzalo Claros1,3

1 Supercomputación y Bioinformática-Plataforma Andaluza de Bioinformática (SCBI-PAB), Universidad de Málaga, 29071 Málaga, Spain
2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, 29071 Málaga, Spain
3 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain

Correspondence should be addressed to M. Gonzalo Claros; claros@uma.es

Received 21 June 2013; Revised 18 September 2013; Accepted 19 September 2013

Academic Editor: Ivan Merelli

Copyright © 2013 Darío Guerrero-Fernández et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Current genomic analyses often require the managing and comparison of big data using desktop bioinformatic software that was not developed regarding multicore distribution. The task-farm SCBI_MapReduce is intended to simplify the trivial parallelisation and distribution of new and legacy software and scripts for biologists who are interested in using computers but are not skilled programmers. In the case of legacy applications, there is no need to modify or rewrite the source code. It can be used from multicore workstations to heterogeneous grids. Tests have demonstrated that speed-up scales almost linearly and that distribution in small chunks increases it. It is also shown that SCBI_MapReduce takes advantage of shared storage when necessary, is fault-tolerant, allows for resuming aborted jobs, does not need special hardware or virtual machine support, and provides the same results as the parallelised legacy software. The same is true for interrupted and relaunched jobs. As proof-of-concept, distribution of a compiled version of Blast+ in the SCBI_Distributed_Blast gem is given, indicating that other Blast binaries can be used while maintaining the same SCBI_Distributed_Blast code. Therefore, SCBI_MapReduce suits most parallelisation and distribution needs in, for example, gene and genome studies.

1. Introduction

The study of genomes is undergoing a revolution: the production of an ever-growing amount of sequences increases year by year at a rate that outpaces computing performance [1]. This huge amount of sequences needs to be processed with well-proven algorithms that will not run faster on new computer chips, since around 2003 chipmakers discovered that they were no longer able to sustain faster sequential execution and turned to multicore chips instead [2, 3]. Therefore, the only current way to obtain results in a timely manner is to develop software dealing with multicore CPUs or clusters of multiprocessors. In such a context, "cloud computing" is becoming a cost-effective and powerful resource of multicore clusters for task distribution in bioinformatics [1, 2].

Sequence alignment and comparison are the most important topics in bioinformatic studies of genes and genomes. This is a complex process that tries to optimise sequence homology by means of sequence similarity, using the Needleman-Wunsch algorithm for global alignments or the Smith-Waterman algorithm for local alignments. Blast and Fasta [4] are the most widespread tools that have implemented them. Paired sequence comparison is inherently a parallel process in which many sequence pairs can be analysed at the same time by means of functions or algorithms that are iteratively performed over sequences. This is impelling the parallelisation of sequence comparison algorithms [5–9] as well as other bioinformatic algorithms [10, 11].

In most cases, the parallelised versions need to be rewritten from scratch, including explicit parallel programming


related to communication and synchronisation [12]. This makes programming software for distributed systems a very challenging task [13], and important long-running data-processing scripts for bioinformatics remain unparallelised. Hence, it would be desirable to have a flexible, general-purpose framework for distribution that could (i) take advantage of existing scripts and/or binaries without requiring any source code modification, (ii) be used for distributing new bioinformatic algorithms, (iii) transfer data in the most secure form when secure connections cannot be established, and (iv) exploit the total computational power of any multicore computing system, allowing for parallelisation among cores and distribution between computers.

2. Related Work

Native threads are a satisfactory approach in compiled computer languages (for example, Jrpm [14], a Java runtime machine for automatically parallelising loops in sequential Java programs), but they may not be fully implemented in scripting languages. There are, however, efficient dedicated computer languages, such as Erlang and Scala [15, 16], which offer programmable solutions for specific concurrent models. Although quite efficient, their main disadvantage is that their use requires whole code rewriting, making embarrassingly parallel task regions or orchestrating communication as well as synchronisation, which is reserved only for skilled programmers. Moreover, the resulting parallel/distributed code remains bonded to the software version that is adapted.

A de facto standard model used in scientific high-performance computing analysis is the Message-Passing Interface (MPI) [17], among whose most widely used implementations is pyMPI (http://pympi.sourceforge.net), which requires explicit parallel coding. OpenMP [18] is a set of compiler directives and callable runtime library routines that enables shared-memory parallelism. OpenMP includes a set of synchronisation features, since programmers are responsible for checking dependencies, deadlocks, race conditions, and so forth. There is also the R library Rmpi, which is a wrapper for porting MPI to R, with the same pros and cons as MPI.

True parallelisation/distribution frameworks can also be achieved by means of MapReduce [19] and its most widespread implementation, Hadoop [20]. A promising new resource is YARN [21], which introduces a generic scheduling abstraction that allows multiple parallelisation/distribution frameworks (for example, Hadoop and MPI) to coexist on the same physical cluster. Researchers can also find Condor [22], a specialised, full-featured workload management system for compute-intensive jobs; it is easy to use but provides suboptimal solutions. On the BOINC platform [23], a distributed sequence alignment application offers the aggregation of the available memory of all participating nodes, but it suffers from communication overhead.

Parallelisation libraries for the R language, besides Rmpi, are the SPRINT [24] and pR [25] packages, whose main advantage is that they require very little modification to existing sequential R scripts and no expertise in parallel computing; however, the master worker suffers from communication overhead, and the authors recognise that their approach may not yield the optimal schedule [25]. Other parallelisation libraries are snow and nws, which provide coordination and parallel execution facilities.

More general-purpose tools, such as bag-of-tasks engines for multicore architectures and small clusters, have also been developed in Python, such as PAR [26]; its main disadvantage is that it is hard to put in practice and is only available for small clusters. There is also FastFlow [27], a C++ pattern-based programming framework for parallel and distributed systems; although it simplifies the development of distributed software, it must be compiled on every machine and seems more appropriate for skilled C++ programmers. It has been argued that abstractions are an effective way of enabling nonexpert users to harness clusters, multicore computers, and clusters of multicore computers [13]. Although abstractions can enable the creation of efficient, robust, scalable, and fault-tolerant implementations and are easy for nonexpert programmers, they are specialised to a restricted class of workloads, and their customisation to produce general-purpose tools is not trivial.

The comparison of nucleotide or protein sequences from the same or different organisms is a very powerful tool in the study of genes and genomes for finding similarities between sequences, to infer the biological function and structure of newly sequenced genes, predict new members of gene families, decipher genome organisation, and explore evolutionary relationships. Blast is the algorithm of choice for such analyses, and its performance has been continuously improved, particularly since the arrival of high-throughput sequencing. That is why it is becoming a critical component of genome homology searches and annotation in many bioinformatics workflows. Improvements in its execution speed will result in significant impact on the practice of genome studies. Therefore, important efforts have been invested in accelerating it for different computer systems (to cite a few, mpiBLAST [6, 12], CloudBLAST [28], AzureBlast [29], GPU-Blast [30], and scalaBLAST 2.0 [31]). These Blast parallelisations require computer expertise to produce and adapt a particular Blast code and are tightly bonded to the software version included in the parallelised/distributed code [31]. It has been reported [9] that most MPI- and GPU-based Blast implementations are only adequately optimal and generalizable for the batch processing of small amounts of sequence data on small clusters. Therefore, there is room for a distributed, flexible, easily-upgradeable version of Blast.

It can be inferred that distributed algorithms are becoming a real need in present bioinformatics research in order to adapt legacy and new software to multicore computing facilities. This paper describes SCBI_MapReduce, a new task-farm skeleton for the Ruby scripting language [32] that gathers the requirements presented in the Introduction and simplifies the creation of parallel and distributed software for researchers without skills in distributed programming. Even if customisation could appear more complicated than using existing libraries for parallelisation (such as OpenMP [18], BOINC [23], or R libraries such as Rmpi or SPRINT [24]), it is as simple as the parallelisation of R code using pR [25]. In contrast to these libraries, SCBI_MapReduce is not constrained only to parallelisation (not allowing distribution nor grid


computing) and is able to extract the complete distribution capabilities of any computer system. While other systems like MPI [17] cannot deal with node failure, SCBI_MapReduce, like Hadoop [20] and FastFlow [27], includes implementation of error handling and job checkpointing methods. Moreover, it gathers additional features for task-farm skeletons, such as encryption, compression on-the-fly, and distribution in chunks. As a proof-of-concept of use with legacy software, the SCBI_Distributed_Blast gem was developed to distribute the widely used Blast+ [4] application. This gem is not bonded to the Blast+ version included in it, since any Blast+ binary on the scientist's computer can be used.

3. Methods

3.1. Hardware and Software. Scripting code was based on Ruby 1.9 for OSX and SLES Linux. The computing facilities used were (i) a "x86" cluster consisting of 80 x86_64 E5450 cores at 3.0 GHz, with 16 GB of RAM per 8-core blade, connected by an InfiniBand network and a PBS queue system; (ii) a "x86 upgraded" cluster consisting of 768 x86_64 E5-2670 cores at 2.6 GHz, with 64 GB of RAM per 16-core blade, connected by an InfiniBand FDR network and a Slurm queue system; (iii) a homogeneous symmetric multiprocessing machine consisting of a "SuperDome" of 128 Itanium-2 cores at 1.6 GHz with 400 GB of RAM; and (iv) two x86 computers with 4 cores at 2.7 GHz using OSX and four x86 computers with 8 cores at 2.8 GHz using Linux, connected by gigabit Ethernet (GbE).

3.2. The Task-Farm Skeleton Design. Based on the well-known approach of MapReduce [21], which restricted it to trivial problems in which no communication between workers is needed (Figure 1(a)), SCBI_MapReduce takes a further step following a task-farm skeleton design (Figure 1(b)). This design does not need synchronous handling and entails asymmetry-tolerance [33]. It works by launching one "manager" process that dispatches "tasks" to "workers" on demand. When a new worker connects, it automatically receives the input parameters and a data chunk (e.g., a group of sequences) from the manager. Since the skeleton contains the capability to take advantage of shared storage (Lustre, NFS, IBRIX, StorNext, and even Samba), common data for all workers (e.g., the subject database for Blast+) are not copied on every node, saving disk space and correspondingly diminishing the data transfer and starting delay for every worker. After task completion, the worker sends the results back to the manager. The results are then written to disk, and a new assignment is sent to the now idle worker. This cycle is repeated until the manager does not have any other data to process. As a result, it behaves as a black box where the only requirement is to code some predefined methods or calls. Any training in parallel programming or communication libraries is unnecessary, the resulting code apparently remaining sequential at the user level (see the appendix).
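The following self-contained Ruby sketch illustrates the on-demand dispatch cycle just described; it uses in-process threads and a queue for brevity, whereas SCBI_MapReduce actually launches separate worker processes that communicate over TCP/IP:

require 'thread'

# ten chunks of 100 mock sequences each, held in memory by the "manager"
chunks  = (1..10).map { |i| Array.new(100) { |j| "seq_#{i}_#{j}" } }
tasks   = Queue.new
results = Queue.new
chunks.each { |c| tasks << c }

workers = 4.times.map do
  Thread.new do
    loop do
      chunk = begin
        tasks.pop(true)    # non-blocking pop raises ThreadError when the queue is empty
      rescue ThreadError
        break              # no data left to process: this worker stops
      end
      results << chunk.map(&:upcase)  # stands for the real per-chunk computation
    end
  end
end
workers.each(&:join)
puts "processed #{results.size} chunks"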

The number of workers is indicated by the user, although additional workers can be launched/stopped at any time. If workers do not use the full capacity of the cores, more workers than cores can be defined. Therefore, the task-farm design of Figure 1(b), using a stand-alone manager that is only dedicated to file opening, chunk construction, data saving, and worker coordination, avoids threading control, diminishes the idle times, and provides asymmetry tolerance.

3.3. Implementation of Other Relevant Features. Since no particular compiler technology is required, the SCBI_MapReduce skeleton is prepared for workers to be executed simultaneously over a mixture of architectures (x86_64, PPC, ia64, i686) running on UNIX-like standalone machines, clusters, symmetric multiprocessing machines, and grids. Having a connection protocol based on TCP/IP, the skeleton can handle several interconnection networks at the same time (Ethernet, Gigabit, InfiniBand, Myrinet, optic-fiber with IP, etc.). Additionally, when network transfers are required, data encryption as well as compression can be enabled to guarantee data privacy. Any encryption and compression program installed on the user computer can be invoked.
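As an illustration of what compressing and encrypting a chunk before transfer involves, the sketch below uses Ruby's standard Zlib and OpenSSL libraries with AES-256 (the methods named in Table 3); it shows the idea only, under assumed key handling, and is not the actual SCBI_MapReduce wire format:

require 'zlib'
require 'openssl'

def pack_chunk(data, key, iv)
  compressed = Zlib::Deflate.deflate(data)     # compression on-the-fly
  cipher = OpenSSL::Cipher.new('AES-256-CBC')  # encryption for untrusted networks
  cipher.encrypt
  cipher.key = key
  cipher.iv  = iv
  cipher.update(compressed) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)
iv  = OpenSSL::Random.random_bytes(16)
payload = pack_chunk(">seq1\nACGTACGT\n" * 100, key, iv)
puts "chunk packed into #{payload.bytesize} bytes for network transfer"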

Implemented input/output (I/O) operations have been optimised to diminish reading/writing overload. Optimisation consisted of (i) the use of the EventMachine Ruby library for asynchronous I/O events in networked operations; (ii) the manager reads data from disk only once, at the beginning of the job; (iii) the data, in the form of objects, are maintained in memory during the entire job to avoid further disk access; (iv) the data are split in memory by the manager into chunks of objects of customisable size; and (v) the results are written on disk only at the end of a task (see example in the code of the appendix). As a result, once the whole required software is installed on every worker, only the manager needs to have access to data on disk. However, the use of shared storage is optional. These features provide portability to SCBI_MapReduce implementations.
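Point (iv) above amounts to little more than slicing an in-memory array; a minimal sketch (the mock sequence objects are placeholders for real reads) could look like this:

# sequences are read once and kept in memory as objects
sequences = Array.new(1000) { |i| [">seq#{i}", "ACGTACGTACGT"] }

# the manager groups them into chunks of customisable size (100X by default)
chunks = sequences.each_slice(100).to_a
puts "#{sequences.size} sequences -> #{chunks.size} chunks of #{chunks.first.size}"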

Fault-tolerance policy and basic error handling capability were included. These capabilities enable the safe execution of SCBI_MapReduce over long periods: (i) when an execution exception occurs, it is reported to the manager, which can try to restart the faulty task instead of stopping the job; (ii) when a worker fails, the manager redistributes the data to a new worker and launches the same task again; (iii) upon unexpected job interruption, since completed jobs are checkpointed to disk, when the user restarts the interrupted job, the manager is able to resume execution precisely at the object being processed when the interruption occurred; (iv) an exhaustive log file can be tracked to find execution problems for debugging purposes; and finally, (v) for a too buggy job, when a high error rate in workers is detected, the manager stops the job and informs the user that data are faulty and that he/she should review them before launching a new job.
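The retry logic behind points (i), (ii), and (v) can be summarised with the following illustrative sketch; the retry limit, the 50% error threshold, and the failing process stand-in are assumptions for the example, not SCBI_MapReduce's actual parameters:

MAX_RETRIES = 3

def process(chunk)                  # stand-in for a worker task that may fail
  raise 'transient worker failure' if rand < 0.05
  chunk.size
end

def dispatch(chunks)
  errors = 0
  pending = chunks.map { |chunk| [chunk, 0] }
  until pending.empty?
    chunk, tries = pending.shift
    begin
      process(chunk)
    rescue StandardError
      errors += 1
      # too buggy job: stop and ask the user to review the input data
      raise 'too many errors: review input data' if errors > chunks.size / 2
      # otherwise relaunch the same task, possibly on another worker
      pending << [chunk, tries + 1] if tries < MAX_RETRIES
    end
  end
end

dispatch(Array.new(20) { |i| ["seq#{i}"] })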

3.4. Usage and Customisation of SCBI_MapReduce. Although the code can be downloaded from http://www.scbi.uma.es/downloads, its easiest installation is as any other Ruby gem [32], using the single command sudo gem install scbi_mapreduce. An on-line help can be obtained using the scbi_mapreduce -h command. Skeleton customisation only requires modifying parts of the initial configuration parameters, data chunk sizes, worker processes, and work dispatcher to achieve a fully distributed system (see details in the


Figure 1: Comparison of parallelisation flowgrams for a single "job". (a) The classic MapReduce view, in which the input data of a job are split into smaller data chunks by a "mapper" process and executed by separated parallel "tasks"; once all tasks have been finished, the results are combined by a "reduce" process. (b) SCBI_MapReduce task-farm flowgram, in which a job is handled by a single "manager", which controls the start and end of each distributed "task" executed by every "worker".

appendix). This requires from the user some knowledge of calling external code and I/O operations.

Three different templates (one for string capitalisation, a second for the simulated calculations on integers shown in Table 1, and a third for the calculations shown in Table 1 using the datasets of artificial and real-world sequences, which finds and removes barcodes from sequences) are provided as a customisable startup point for any project, using the command scbi_mapreduce my_project template_name.
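Putting the commands mentioned in this section together (template_name stands for one of the three provided templates, and my_project for any project name):

sudo gem install scbi_mapreduce          # install the gem and its executable
scbi_mapreduce -h                        # on-line help
scbi_mapreduce my_project template_name  # create a startup project from a template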

3.5. SCBI_Distributed_Blast Gem. Basic Local Alignment Search Tool (Blast) [4] is the tool most frequently used for calculating sequence similarity, usually being the computationally intensive part of most genomic analyses. Its popularity is based on its heuristics, which enable it to perform the search faster. This drove us to choose it as a proof-of-concept for distribution of legacy software. The binary release v. 2.2.24 of Blast+ [4] was chosen to demonstrate that SCBI_MapReduce can be used as a distribution wrapper for legacy algorithms. Since a main drawback of Blast parallelisations is that the entire sequence database must be copied on each node, SCBI_Distributed_Blast takes advantage of shared storage to avoid the waste of disk space and the potential scalability impairment for large databases.

To use SCBI_Distributed_Blast, one only needs to wrap any generic Blast or Blast+ command as follows:

scbi_distributed_blast -w 8 'any_blast_command'

which distributes Blast between 8 cores/workers (-w option) with chunks of 100 sequences (default value of the -g option). An example can be as follows:

scbi_distributed_blast -w 8 'blastn -task blastn-short -db myDB.fna -query inputfile.fna -out outputfile.fna'

where blastn is executed using 8 cores, with inputfile.fna as a Fasta query file and myDB.fna as a customised database, outputfile.fna being the name of the output file.

4. Results

4.1. Scalability Studies. SCBI_MapReduce performance was tested in the first instance using a dataset of 1000 objects (integers), as shown in Table 1, column "Integer dataset". Jobs were


Table 1: SCBI_MapReduce performance tests using three different datasets on the "x86 upgraded" cluster. Execution times are expressed in seconds. The number immediately before X indicates the number of reads grouped in a chunk for every parallel task.

Cores   Integer dataset(a)   Real-world sequences(b)           Artificial sequences(c)
                             1X      100X    250X    2000X     1X      100X    250X    2000X
1(d)    1264                 13608   13849   13424   19328     23264   22124   24185   33393
2       635                  8824    7903    7584    10251     11462   11554   11776   15302
4       322                  4363    4507    4167    5890      6776    6507    5881    7503
8       164                  2182    2194    2231    3132      3403    3337    3371    4874
16      81                   1097    1098    1121    1633      1901    1797    1817    2602
32      41                   568     549     569     899       921     888     915     1339
64      21                   293     282     295     532       506     449     466     755
128     12                   173     153     179     352       268     233     245     464

(a) Integers were subjected to futile intensive calculations that took at least 1 s on every object.
(b) The dataset of real-world sequences consisted of 261,304 sequence reads (mean 276 nt, mode 263 nt, coefficient of variation 11) obtained from a 454/FLX sequencer, downloaded from the SRA database (AC SRR069473).
(c) The dataset of artificial sequences consisted of 425,438 sequences obtained using the software ART, with a 2X coverage, simulating a 454/FLX sequencing from the Danio rerio chromosome 1 (AC NC_007112.5).
(d) Using one core is equivalent to a linear job without any parallelisation or distribution; it acts as control reference.

Table 2: Percent of time spent by the manager on every sequence-based job, similarly as detailed in Table 1.

Cores   Real-world sequences              Artificial sequences
        1X     100X   250X   2000X       1X     100X   250X   2000X
2       0.92   0.48   0.49   0.41        0.84   0.44   0.43   0.36
4       0.71   0.41   0.45   0.35        0.68   0.37   0.39   0.32
8       0.58   0.40   0.41   0.34        0.55   0.35   0.34   0.29
16      0.52   0.39   0.39   0.34        0.58   0.41   0.37   0.28
32      0.52   0.50   0.51   0.37        0.47   0.45   0.44   0.33
64      0.47   0.56   0.55   0.40        0.37   0.45   0.48   0.34
128     0.61   0.57   0.62   0.37        0.54   0.48   0.49   0.40

launched using 1 (as control reference) to 128 cores on the "x86 upgraded" cluster. Since the speed-up achieved is close to the maximal theoretical one (Figure 2(a), compare dotted line and solid lines), it can be suggested that SCBI_MapReduce scales well with simple objects such as integers. In fact, it is able to manage up to 18,000 tasks of 1 kB each per second with a single-core manager on the "x86" cluster (results not shown).

SCBI_MapReduce can find a use beyond integers, as demonstrated by the testing of two sequence datasets in which each object is a sequence in Fasta format. Real-world sequences correspond to a true 454/FLX sequencing, and the artificial sequences correspond to a simulation of 454/FLX sequencing using ART (http://www.niehs.nih.gov/research/resources/software/biostatistics/art). Barcodes were localised on both sequence datasets, varying sequence chunk sizes. Table 1 shows that the a priori most affordable parallelisation of sequence-by-sequence (1X chunks) did not provide the best speed-up (Figure 2(a), dark and open triangles). This could be explained in part by the fact that the manager spends a little more time building a lot of small data chunks (Table 2). Bigger chunks (100X and 250X) provided shorter, similar execution times (Figure 2(a), dark and open squares and circles). Regarding execution time (Table 1) and speed-up (Figure 2(a), dark and open diamonds), the hugest chunk (2000X) is not a good choice. Since the manager is not taking more time during this job (Table 2), the reason for this speed-up impairment is that, using bigger chunks, the computational resources are not optimally used during the last calculation cycle, where most workers are idle and the manager is waiting for a few workers to finish a long task. This issue is also observed with other chunk sizes, but it becomes apparent only with chunks of 2000X sequences, since workers spend more time on every data chunk. In conclusion, SCBI_MapReduce scaled almost linearly, and the optimal number of reads in a chunk depends on the number of workers and chunks used for the parallelisation. When the number of sequences (objects) to process is unknown, chunks of 2000X and 1X provided the lowest speed-up, while small chunks ranging from 100X to 250X sequences are preferable.
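For reference, the speed-up values discussed here and plotted in Figure 2 follow the usual definition, dividing the single-core time by the n-core time; for example, with the integer dataset of Table 1:

$$S(n) = \frac{T_1}{T_n}, \qquad S(32) = \frac{1264\ \text{s}}{41\ \text{s}} \approx 31 \quad (\text{theoretical maximum } 32).$$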

4.2. Compression and Encryption Overhead Is Acceptable. One of the main features of SCBI_MapReduce is the capability of data compression and encryption on-the-fly. Since these capabilities rely on the manager, data security and privacy are obtained at the cost of an increase in execution time. This overhead was tested using the real-world jobs of Table 1 launched on the "x86 upgraded" cluster with the encryption and compression capabilities enabled (Table 3). Distribution of 1X chunks was dramatically affected (Figure 2(b), open


Table 3: Analysis of compression and encryption of the same real-world sequence jobs as in Table 1. Both the execution times (in seconds) and the percent of this time used by the manager are provided.

Cores   Execution time(a) (s)             Manager time (%)
        1X     100X   250X   2000X        1X     100X   250X   2000X
2(b)    9482   8178   7270   10279        4.73   0.56   0.58   0.44
4       4619   4307   3814   5234         2.76   0.47   0.49   0.39
8       2359   2156   2165   3145         1.76   0.43   0.43   0.35
16      1274   1085   1142   1692         1.54   0.42   0.40   0.33
32      913    553    571    905          2.40   0.51   0.52   0.37
64      821    282    294    540          3.54   0.57   0.55   0.43
128     709    163    173    346          3.87   0.62   0.61   0.39

(a) Compression was performed with ZLib and encryption with AES-256; any other method installed on the computers can be used.
(b) There is no need of compression or encryption when using one single core.

Figure 2: Speed-up achieved by SCBI_MapReduce implementations. (a) Speed-up based on Table 1 data, calculated dividing the time taken with 1 core by the time taken by each number of cores; series are shown for the integer dataset and for the real-world and artificial sequences in 1X, 100X, 250X, and 2000X chunks. (b) Performance using compressed and encrypted real-world sequences, based on the execution times in Table 3; the speed-up was calculated dividing the time that real-world sequences took with 1 core in Table 1 by their corresponding times in Table 3. In both plots, theoretical values correspond to a speed-up that equals the number of cores used.

triangles), and this may be due to the important increase of the time spent by the manager on every job (from 0.91–0.47% in Table 2 to 4.75–1.54% in Table 3). But the overhead became bearable when using any other chunk size, since (i) the execution time in Table 3 for 100X–2000X chunks is close to that presented in Table 1, (ii) the speed-up recovers the previously observed values (Figure 2(b)), and (iii) the manager spends nearly the same percent of time. These results suggest that the overhead introduced by encryption and compression can be alleviated using chunks, still providing a significant speed-up, and that distribution of sequence-by-sequence (1X chunks) was the worst available approach. In conclusion, compression and encryption capabilities can be regularly used for distributed calculations when encrypted connections are not available but desirable, without a dramatic increase of execution time.

4.3. Fault-Tolerance Testing. The fault-tolerance of SCBI_MapReduce was tested using the complete dataset of real-world sequences as input, and the same analysis was


Figure 3: Fault-tolerance testing using the real-world sequences in 100X chunks on the "x86 upgraded" cluster using 128 cores. Object counts (×10^4) are plotted against time (s) for the three executions. The "Normal" execution occurred without errors. The "Job shutdown" execution included a complete job shutdown (indicated by a solid arrow) and then a manual restart. The "Worker failure" execution included three shutdowns (indicated by dashed arrows) of 16 workers each during the job.

performed as in Table 1 on the "x86 upgraded" cluster using 128 cores. Sequences were used in 100X chunks, and the job was executed three times. The first execution occurred without errors and took 184 s to finish the analysis of 261,304 reads (objects). Figure 3 shows that this "Normal" execution presents the expected constant slope. The second execution was to test an unexpected "Job shutdown", which was simulated with a manual interruption of the whole job, the job being then manually relaunched. It can be seen in Figure 3 that the interruption adds a lag time to the job, increasing to 239 s the time required to finish the analysis of all sequences. The sequence counts were the same as in "Normal", indicating that no sequence and no chunk were reanalysed twice. Finally, the test of recovery after a "Worker failure" was performed by stopping 16 workers at three different time points of the job execution (a total of 48 different workers were affected). In this case, the manager automatically handles the reanalysis of the unfinished chunks, and the job took 300 s. Again, no sequence was saved twice in the output file. As expected, the output result of the "Normal" execution and the interrupted executions was exactly the same (results not shown). In conclusion, the fault-tolerance implementation of SCBI_MapReduce is able to handle execution exceptions and broken workers, and stopped jobs can be restarted without the reanalysis of finished tasks.

4.4. Distributed Blast+ Using Chunks of Sequences. The generic blastn command blastn -task blastn-short -db myDB.fna -query inputfile.fna -out outputfile.fna was launched with real-world sequences as input and a customised database of 240 MB containing complete bacterial genomes. The speed-up achieved by SCBI_Distributed_Blast compared to the nondistributed execution (using 1 single core) is presented in Figure 4(a). Outputs of all executions were exactly the same

in all cases, and they were also identical to the output of the original binary Blast+ (results not shown). Moreover, SCBI_Distributed_Blast was able to cope, without modification, with Blast+ versions 2.2.23 to 2.2.27.

Blast+ is described as having threading capabilities [4]. Therefore, 10,000 reads of AC SRR069473 were launched with native blastn and with SCBI_Distributed_Blast, both configured to use the 8 cores of a single blade of the "x86" cluster. Figure 4(b) shows that Blast+ did not appear to parallelise efficiently, since it started using only 5 cores and rapidly decreased to one single core, taking 224 min to finish the task. Similar behaviour was confirmed in other computers, indicating that it seems to be an inherent feature of Blast+ releases. In contrast, SCBI_Distributed_Blast used 8 cores all the time and finished in 24 min (that is the reason why 0 CPUs are "used" from then on). Therefore, the speed-up introduced by SCBI_Distributed_Blast is 9.4, demonstrating that it performs much better in exploiting multicore capabilities than the threaded implementation included in Blast+.
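For reference, the two invocations compared in Figure 4(b) were of the following form (file names are placeholders):

blastn -task blastn-short -db myDB.fna -query inputfile.fna -out outputfile.fna -num_threads 8

scbi_distributed_blast -w 8 'blastn -task blastn-short -db myDB.fna -query inputfile.fna -out outputfile.fna'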

5. Discussion

5.1. SCBI_MapReduce Is an Efficient Task-Farm Skeleton. SCBI_MapReduce customisation is simpler than the customisation of other frameworks, such as FastFlow [27], and it does not need compilation of the final code, making it portable to most computers "as is". Its flexibility allows including SCBI_MapReduce as a coding part of any new algorithm, as well as using it as a wrapper for already existing functions, scripts, or compiled software. Promising and efficient results were obtained when it was used within several of our algorithms (e.g., SeqTrimNext [http://www.scbi.uma.es/seqtrimnext] and Full-LengtherNext [http://www.scbi.uma.es/fulllengthernext], both specifically designed to manage sequences obtained from next-generation sequencing), as well as a wrapper for Blast+ in the SCBI_Distributed_Blast gem (Figure 4). Therefore, SCBI_MapReduce seems to be sufficiently powerful for most parallelisation and distribution needs concerning new algorithms, legacy software, and existing scripts in most bioinformatics contexts.

It has been described that GPU-based and some MPI-based parallelisations lack good scalability when dealing with rapidly growing sequence data, while MapReduce seems to perform better in those settings [9]. That could explain why the SCBI_MapReduce skeleton shows a speed-up of 31-fold for 32 cores and 59-fold for 64 cores, even with sequence data (Figure 2(a)). This performance is better than the one displayed by the R package pR, where 32 cores provide speed-ups of 20–27-fold, depending on the process [25]. Several design reasons can also be invoked to explain such an efficiency [34]: (i) disk I/O operations are reduced to a minimum (data are read only at the beginning and results are saved only at the end); (ii) absence of asymmetry impact (Figure 1(b)); (iii) the manager overhead is limited when using more than 2 cores and chunks of sequences (Tables 2 and 3); and (iv) longer tasks increase the efficiency, because the manager is on standby most of the time while waiting


Figure 4: Behaviour of SCBI_Distributed_Blast. (a) Blast+ speed-up in chunks of 100X in two different clusters ("x86" and "x86 upgraded") using different network protocols and queue systems. The theoretical speed-up corresponds to the one that equals the number of cores used. Speed-up was calculated dividing the time spent using 1 core by the time of the corresponding number of cores. The following execution times were used: for 50,000 reads from AC SRR069473 in the "x86" cluster, 25.8 h (92,880 s, 1 core), 27,600 s (2 cores), 13,980 s (4 cores), 6960 s (8 cores), 3540 s (16 cores), and 1740 s (32 cores); for the 261,304 reads of AC SRR069473 in the "x86 upgraded" cluster, 88.6 h (318,960 s, 1 core), 115,161 s (2 cores), 56,385 s (4 cores), 28,180 s (8 cores), 14,123 s (16 cores), and 7068 s (32 cores). (b) Threaded Blast+ and SCBI_Distributed_Blast use the 8 cores available in the same computer differently: Blast+ was executed with the -num_threads 8 option, and SCBI_Distributed_Blast was executed with the -w 8 option, using chunks of 100X by default, in the "x86" cluster.

for the workers to finish, avoiding relaunching of internal or external programs for brief executions.

SCBI_MapReduce includes implementation of error handling and job checkpointing methods. It has been demonstrated (Figure 3) that data chunks from a broken worker, or even a job shutdown, can be relaunched in a running worker. This provides robustness and fault-tolerance, guarantees safe long-lasting executions, and preserves computational resources, since it avoids processing objects that have already been processed. Such properties will serve to save time and make the job execution traceable at any time. Therefore, SCBI_MapReduce represents another step in the direction of programming environments with the task-farm skeleton concept.

5.2. Distribution in Chunks Is More Efficient. SCBI_MapReduce was intended to deal with problems that involve processing a huge number of small sequences (as in high-throughput sequencing or RNA-Seq experiments). Results showed that splitting datasets into small chunks yields a better speed-up than sending sequences one by one or in big chunks (Figure 2(a)). An analogous idea has already been reported in mpiBLAST [35], but for database segmentation instead of sequence grouping. Therefore, grouping reads in chunks appears to be another way to provide speed-up, always taking into account that big chunks could be detrimental when the number of chunks produced is not divisible by the number of workers used (see 2000X in Figure 2).

Since chunk sizes of 100X and 250X perform similarly (Figure 2), a chunk size of 100X can suit well as default value, even if the optimal chunk size has not been assessed taking into account the number of cores and the number of objects to split in chunks. It can then be hypothesised that the use of chunks reduces the manager overload (Tables 2 and 3). Figure 4(a) shows that the speed-up can achieve superscalar behaviour using chunks combined with distribution, although this is dependent on the task performed by the worker (Blast+ in this instance) and not on the capabilities of SCBI_MapReduce. In conclusion, the use of chunks provides an improved overall performance.

5.3. The Added Value of Compression and Encryption Capability. In distributed grids or the cloud, encrypted connections cannot always be established for data privacy, and data compression can accelerate any transfer, particularly on low-bandwidth connections. The overhead introduced by encryption and compression is particularly evident when data are processed one-by-one (Figure 2(b), open triangles), since the


class MyWorkerManager < WorkManager

  def self.init_work_manager
    # open input fastq file and results as output
    @@fastq_file = FastqFile.new(fastq_file_path)
    @@results = FastqFile.new('results.fastq', 'w+')
  end

  def self.end_work_manager
    # close files on finish
    @@fastq_file.close
    @@results.close
  end

  # this method is called every time a worker needs a new work
  def next_work
    # get next sequence or nil from file
    name, fasta, qual, comments = @@fastq_file.next_seq
    if !name.nil?
      return name, fasta, qual, comments
    else
      return nil
    end
  end

  def work_received(results)
    # write results to disk
    results.each do |name, fasta, qual, comments|
      @@results.write_seq(name, fasta, qual, comments)
    end
  end

end

Algorithm 1

use of more and more cores did not significantly speed up the process. But compression and encryption overhead becomes acceptable when the dataset is split into chunks (compare slopes in Figure 2 and execution times in Tables 1 and 3). Encryption capability per chunks should be enabled only when untrusted networks are involved in distributed jobs. Compression per chunks could be envisaged when using low-bandwidth networks (e.g., in some grids [2]), provided that compressed data transfer is faster than the time spent in compressing data. As a result, SCBI_MapReduce can be used on grids with confidential data when encrypted connections cannot be established.

5.4. SCBI_MapReduce Is Ready for Grid Computing. It has been shown (Figure 4(a)) that SCBI_MapReduce, and therefore SCBI_Distributed_Blast, can work with homogeneous clusters (the "x86" and "x86 upgraded" clusters) consisting of different types of CPUs. It has also been tested that SCBI_MapReduce is able to deal with a heterogeneous grid consisting of one x86 computer using OSX, one x86 computer using Linux, 24 cores of the "x86" cluster, and 32 cores of the "Superdome" (results not shown). Hence, SCBI_MapReduce can cope with different queue systems (PBS, Slurm) and networks and can distribute in symmetric multiprocessing machines ("Superdome"), clusters (Figure 4(a)), and heterogeneous Unix-based grids (above).

Other features that enable SCBI_MapReduce, at least theoretically [2, 36, 37], to be used in nearly any type of computer grid are (i) the above-described encryption and compression capabilities; (ii) no need for administrator privileges; (iii) running on-demand only; and (iv) minimal hard disk requirements, since it takes advantage of shared storage only when necessary, making it highly portable to other computer systems. Testing SCBI_MapReduce in "cloud computing" services remains a pending task; however, it is expected that it should work and provide benefits related to cost-effectiveness.

5.5. SCBI_Distributed_Blast Is a Boosted Version of Blast+. Previous improvements of Blast were performed by skilled programmers and provide parallelised versions tightly bonded to one released version. The development of SCBI_Distributed_Blast, based on the SCBI_MapReduce task-farm skeleton, removes version bonding and coding challenges, since it can boost in a core-dependent way (Figure 4(a)) any Blast+ release installed on the scientist's computer, not only the version tested in this study, enabling the update of the Blast+ release while maintaining the same SCBI_Distributed_Blast code.

In contrast to other Blast parallelisations, including mpiBLAST and MapReduce Blast [9], SCBI_Distributed_Blast distributed tasks are seeded with


sequence chunks while maintaining the database intact. This is because it does not need to copy the sequence database on each worker, since it takes advantage of shared storage. This is also the reason why it provides exactly the same results as the original Blast+ in less time (Figure 4). Another MapReduce approach for Blast, CloudBlast [9], has very poor scalability, since it is optimised for short-read mapping and needs to copy the database on each node, while the speed-up observed with SCBI_Distributed_Blast was linear and appeared to be superscalar in the tested clusters (Figure 4(a)). However, superscalar behaviour was exclusively observed for 2 cores (speed-ups of 3.3 in the "x86" cluster and 2.8 in the "x86 upgraded" cluster), since, taking two cores as reference, the speed-up slope was close to the theoretical speed-up (4.0, 8.1, 16.3, 32.6, 65.0, and 127.6).

Comparing the speed-up of our "boosted" Blast+ and a threaded execution of Blast+ (Figure 4(b)), it can be seen that SCBI_Distributed_Blast can take advantage of all computing capabilities and scales linearly, in contrast to native Blast+ (Figure 4(b)) and the older NCBI-Blast [30]. In conclusion, SCBI_Distributed_Blast illustrates the ease-of-use and performance of SCBI_MapReduce, opening the way for code modifications that can easily produce scalable, balanced, fault-tolerant, and distributed versions of other Blast-related programs, like PSI-Blast, WU-Blast/AB-Blast, NCBI-Blast, and the like. Furthermore, Blast-based genome annotation processes can take advantage of SCBI_Distributed_Blast with minor changes in the code.

6. Conclusions

This work does not aim at advancing parallelisation technology; rather, it aims to apply the advantages of distribution to bioinformatic tools that are useful, for example, for genomics, giving attractive speed-ups. In a context of continuous development of parallel software, SCBI_MapReduce provides a task-farm skeleton for parallelisation/distribution with features such as fault-tolerance, encryption and compression on-the-fly, data distribution in chunks, grid-readiness, and flexibility for the integration of new and existing code without being a skilled programmer. In fact, SCBI_MapReduce was designed for researchers with a biological background who consider MPI, Hadoop, or Erlang solutions for parallelisation/distribution too complicated. That is why Ruby was selected, since it has a shallow learning curve, even for biologists, and easily manages the programming necessities. In the context of genomic studies, one significant advantage is that SCBI_MapReduce enables reusing existing sequential code in a commodity parallel/distributed computing environment with little or no code changes. SCBI_Distributed_Blast illustrates this.

Results indicate that SCBI_MapReduce scales well, is fault-tolerant, can be used on multicore workstations, clusters, and heterogeneous grids, even where secured connections cannot be established, can use several interconnection networks, and does not need special hardware or virtual machine support. It is also highly portable and shall diminish the disk space costs in "cloud computing". In conclusion, SCBI_MapReduce, and hence SCBI_Distributed_Blast, are

class MyWorker < Worker

  # process each obj in received objs
  def process_object(objs)
    # find barcodes
    find_mids(objs)
    return objs
  end

end

Algorithm 2

# get custom worker file path
custom_worker_file = 'my_worker.rb'

# init worker manager
MyWorkerManager.init_work_manager

# use any available ip and first empty port
ip = '0.0.0.0'
port = 0
workers = 4

# launch Manager and start it
manager = Manager.new(ip, port, workers, MyWorkerManager, custom_worker_file)
manager.start_server

Algorithm 3

ready, among other uses, for intensive genome analyses and annotations.

Appendix

Customisation of the Three Files That Govern SCBI_MapReduce

SCBI_MapReduce consists of a number of files but, in order to customise it for particular needs, users only need to modify the I/O data management methods in the manager file (my_worker_manager.rb), the computation to be distributed in the worker file (my_worker.rb), and the main file (main.rb).

The methods to redefine in my_worker_manager.rb are (i) next_work, which provides new data for workers, or nil if there is no more data available (in the following code, it simply reads one sequence at a time from a fastq file on disk); (ii) self.init_work_manager, which opens I/O data files; (iii) self.end_work_manager, which closes files when finished; and (iv) work_received, which writes results to disk as they are generated. The relevant code for my_worker_manager.rb is (see Algorithm 1).

Customisation of the worker file (my_worker.rb) includes redefinition of the process_object method that contains the function call to find_mids. The function find_mids can be defined by the user in his/her own source code, a compiled algorithm, or an existing code. The relevant code for my_worker.rb is (see Algorithm 2).

The main program file (main.rb) has to be invoked to launch the distributed job. It can be used as is from the


command line as a common Ruby script (ruby main.rb) or as a part of more complex code. Skilled users can also modify its code and/or name to enable special features or even receive user parameters, which is the case when using SCBI_MapReduce for distribution of an internal part of an algorithm. The number of workers is defined here: at least one for the manager and one for one worker. The relevant code for main.rb is (see Algorithm 3).

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa and Rocío Bautista for the helpful discussions and the computer resources of the Plataforma Andaluza de Bioinformática of the University of Málaga, Spain. This study was supported by grants from the Spanish MICINN (BIO2009-07490) and Junta de Andalucía (P10-CVI-6075), as well as institutional funding to the research group BIO-114.

References

[1] C. Huttenhower and O. Hofmann, "A quick guide to large-scale genomic data mining," PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.
[2] M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.
[3] D. Patterson, "The trouble with multi-core," IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.
[4] C. Camacho, G. Coulouris, V. Avagyan et al., "BLAST+: architecture and applications," BMC Bioinformatics, vol. 10, article 421, 2009.
[5] S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, "Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment," Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.
[6] H. Lin, X. Ma, W. Feng, and N. F. Samatova, "Coordinating computation and I/O in massively parallel sequence search," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.
[7] T. Nguyen, W. Shi, and D. Ruden, "CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping," BMC Research Notes, vol. 4, article 171, 2011.
[8] T. Rognes, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, article 221, 2011.
[9] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for long sequence alignment," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.
[10] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, vol. 10, no. 11, article R134, 2009.
[11] M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, "Hierarchical parallelization of gene differential association analysis," BMC Bioinformatics, vol. 12, article 374, 2011.
[12] M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, "Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications," in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.
[13] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.
[14] M. K. Chen and K. Olukotun, "The Jrpm system for dynamically parallelizing Java programs," in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.
[15] P. Haller and M. Odersky, "Scala Actors: unifying thread-based and event-based programming," Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.
[16] J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.
[17] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.
[18] L. Dagum and R. Menon, "OpenMP: an industry-standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.
[19] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, "Survey of MapReduce frame operation in bioinformatics," Briefings in Bioinformatics. In press.
[20] R. C. Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics," BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.
[21] J. Lin, "MapReduce is good enough," Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency Computation: Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.
[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.
[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.
[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.
[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.
[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics. In press.
[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.
[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.
[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.
[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.
[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.
[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.
[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.
[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.
[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

2 Computational Biology Journal

related to communication and synchronisation [12]. This makes programming software for distributed systems a very challenging task [13], and important long-running data-processing scripts for bioinformatics remain unparallelised. Hence, it would be desirable to have a flexible, general-purpose framework for distribution that could (i) take advantage of existing scripts and/or binaries without requiring any source code modification; (ii) be used for distributing new bioinformatic algorithms; (iii) transfer data in the most secure form when secure connections cannot be established; and (iv) exploit the total computational power of any multicore computing system, allowing for parallelisation among cores and distribution between computers.

2. Related Work

Native threads are a satisfactory approach in compiled computer languages (for example, Jrpm [14], a Java runtime machine for automatically parallelising loops in sequential Java programs), but they may not be fully implemented in scripting languages. There are, however, efficient dedicated computer languages such as Erlang and Scala [15, 16], which offer programmable solutions for specific concurrency models. Although quite efficient, their main disadvantage is that their use requires whole-code rewriting, marking embarrassingly parallel task regions or orchestrating communication as well as synchronisation, which is reserved only for skilled programmers. Moreover, the resulting parallel/distributed code remains bonded to the software version that is adapted.

A de facto standard model used in scientific high-performance computing analysis is the Message-Passing Interface (MPI) [17], whose most widely used implementations are pyMPI (http://pympi.sourceforge.net), which requires explicit parallel coding, and OpenMP [18], a set of compiler directives and callable runtime library routines that enables shared-memory parallelism. OpenMP includes a set of synchronisation features, since programmers are responsible for checking dependencies, deadlocks, race conditions, and so on. There is also the R library Rmpi, which is a wrapper for porting MPI to R, with the same pros and cons as MPI.

True parallelisation/distribution frameworks can also be achieved by means of MapReduce [19] and its most widely distributed implementation, Hadoop [20]. A promising new resource is YARN [21], which introduces a generic scheduling abstraction that allows multiple parallelisation/distribution frameworks (for example, Hadoop and MPI) to coexist on the same physical cluster. Researchers can also find Condor [22], a specialised, full-featured workload management system for compute-intensive jobs; it is easy to use but provides suboptimal solutions. The BOINC platform [23] is a distributed sequence alignment application that offers the aggregation of the available memory of all participating nodes, but it suffers from communication overhead.

Parallelisation libraries for the R language, besides Rmpi, are the SPRINT [24] and pR [25] packages, whose main advantage is that they require very little modification of existing sequential R scripts and no expertise in parallel computing; however, the master worker suffers from communication overhead, and the authors recognise that their approach may not yield the optimal schedule [25]. Other parallelisation libraries are snow and nws, which provide coordination and parallel execution facilities.

More general-purpose tools, such as bag-of-tasks engines for multicore architectures and small clusters, have also been developed in Python, such as PAR [26]; its main disadvantage is that it is hard to put into practice and is only available for small clusters. There is also FastFlow [27], a C++ pattern-based programming framework for parallel and distributed systems; although it simplifies the task of writing distributed software, it must be compiled on every machine and seems more appropriate for skilled C++ programmers. It has been argued that abstractions are an effective way of enabling nonexpert users to harness clusters, multicore computers, and clusters of multicore computers [13]. Although abstractions can enable the creation of efficient, robust, scalable, and fault-tolerant implementations and are easy for nonexpert programmers, they are specialised to a restricted class of workloads, and their customisation to produce general-purpose tools is not trivial.

The comparison of nucleotide or protein sequences from the same or different organisms is a very powerful tool in the study of genes and genomes: it allows finding similarities between sequences in order to infer the biological function and structure of newly sequenced genes, predict new members of gene families, decipher genome organisation, and explore evolutionary relationships. Blast is the algorithm of choice for such analyses, and its performance has been continuously improved, particularly since the arrival of high-throughput sequencing. That is why it is becoming a critical component of genome homology searches and annotation in many bioinformatics workflows. Improvements in its execution speed will have a significant impact on the practice of genome studies. Therefore, important efforts have been invested in accelerating it for different computer systems (to cite a few, mpiBLAST [6, 12], CloudBLAST [28], AzureBlast [29], GPU-Blast [30], and scalaBLAST 2.0 [31]). These Blast parallelisations require computer expertise to produce and adapt a particular Blast code and are tightly bonded to the software version included in the parallelised/distributed code [31]. It has been reported [9] that most MPI- and GPU-based Blast versions are only adequately optimal and generalisable for the batch processing of small amounts of sequence data on small clusters. Therefore, there is room for a distributed, flexible, easily upgradeable version of Blast.

It can be inferred that distributed algorithms are becoming a real need in present bioinformatics research in order to adapt legacy and new software to multicore computing facilities. This paper describes SCBI_MapReduce, a new task-farm skeleton for the Ruby scripting language [32] that gathers the requirements presented in the Introduction and simplifies the creation of parallel and distributed software for researchers without skills in distributed programming. Even if its customisation could appear more complicated than using existing libraries for parallelisation, such as OpenMP [18], BOINC [23], or R libraries such as Rmpi or SPRINT [24], it is as simple as the parallelisation of R code using pR [25]. In contrast to these libraries, which are constrained to parallelisation only (allowing neither distribution nor grid computing), SCBI_MapReduce is able to extract the complete distribution capabilities of any computer system. While other systems like MPI [17] cannot deal with node failure, SCBI_MapReduce, like Hadoop [20] and FastFlow [27], includes an implementation of error handling and job checkpointing methods. Moreover, it gathers additional features for task-farm skeletons, such as encryption, compression on-the-fly, and distribution in chunks. As a proof-of-concept of use with legacy software, the SCBI_Distributed_Blast gem was developed to distribute the widely used Blast+ [4] application. This gem is not bonded to the Blast+ version included in it, since any Blast+ binary on the scientist's computer can be used.

3. Methods

3.1. Hardware and Software. Scripting code was based on Ruby 1.9 for OSX and SLES Linux. The computing facilities used were (i) an "x86" cluster consisting of 80 x86_64 E5450 cores at 3.0 GHz with 16 GB of RAM per 8-core blade, connected by an InfiniBand network and a PBS queue system; (ii) an "x86 upgraded" cluster consisting of 768 x86_64 E5-2670 cores at 2.6 GHz with 64 GB of RAM per 16-core blade, connected by an InfiniBand FDR network and a Slurm queue system; (iii) a homogeneous symmetric multiprocessing machine consisting of a "SuperDome" of 128 Itanium-2 cores at 1.6 GHz with 400 GB of RAM; and (iv) two x86 computers with 4 cores at 2.7 GHz using OSX and four x86 computers with 8 cores at 2.8 GHz using Linux, connected by gigabit Ethernet (GbE).

3.2. The Task-Farm Skeleton Design. Based on the well-known approach of MapReduce [21], which restricted it to trivial problems in which no communication between workers is needed (Figure 1(a)), SCBI_MapReduce takes a further step following a task-farm skeleton design (Figure 1(b)). This design does not need synchronous handling and entails asymmetry-tolerance [33]. It works by launching one "manager" process to dispatch "tasks" to "workers" on demand. When a new worker connects, it automatically receives the input parameters and a data chunk (e.g., a group of sequences) from the manager. Since the skeleton contains the capability to take advantage of shared storage (Lustre, NFS, Ibrix, StorNext, and even Samba), common data for all workers (e.g., the subject database for Blast+) are not copied on every node, saving disk space and correspondingly diminishing the data transfer and starting delay for every worker. After task completion, the worker sends the results back to the manager. The results are then written to disk, and a new assignment is sent to the now idle worker. This cycle is repeated until the manager does not have any other data to process. As a result, it behaves as a black box where the only requirement is to code some predefined methods or calls. Any training in parallel programming or communication libraries is unnecessary, the resulting code apparently remaining sequential at the user level (see the appendix).
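This on-demand dispatch cycle can be pictured with a minimal, self-contained Ruby sketch of the task-farm pattern (plain threads and an in-memory queue; a conceptual toy under our own naming, not the gem's actual TCP/IP implementation):

require 'thread'

# Toy task farm: a queue of chunks is consumed by workers on demand,
# so faster workers simply come back for more (asymmetry tolerance).
chunks  = (1..10).map { |i| "chunk-#{i}" }
tasks   = Queue.new
results = Queue.new
chunks.each { |c| tasks << c }

workers = Array.new(4) do |id|
  Thread.new do
    loop do
      chunk = begin
        tasks.pop(true)   # non-blocking pop
      rescue ThreadError
        break             # queue drained: worker retires
      end
      results << "worker #{id} finished #{chunk}"
    end
  end
end
workers.each(&:join)
puts results.pop until results.empty?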

The number of workers is indicated by the user, although additional workers can be launched/stopped at any time. If workers do not use the full capacity of the cores, more workers than cores can be defined. Therefore, the task-farm design of Figure 1(b), using a stand-alone manager (that is, one dedicated only to file opening, chunk construction, data saving, and worker coordination), avoids threading control, diminishes idle times, and provides asymmetry tolerance.

3.3. Implementation of Other Relevant Features. Since no particular compiler technology is required, the SCBI_MapReduce skeleton is prepared for workers to be executed simultaneously over a mixture of architectures (x86_64, PPC, ia64, i686) running on UNIX-like standalone machines, clusters, symmetric multiprocessing machines, and grids. Having a connection protocol based on TCP/IP, the skeleton can handle several interconnection networks at the same time (Ethernet, Gigabit, InfiniBand, Myrinet, optic fiber with IP, etc.). Additionally, when network transfers are required, data encryption as well as compression can be enabled to guarantee data privacy. Any encryption and compression program installed on the user's computer can be invoked.

Implemented input/output (I/O) operations have been optimised to diminish reading/writing overload. Optimisation consisted of (i) the use of the EventMachine Ruby library for asynchronous I/O events in networked operations; (ii) the manager reading data from disk only once, at the beginning of the job; (iii) the data, in the form of objects, being maintained in memory during the entire job to avoid further disk access; (iv) the data being split in memory by the manager into chunks of objects of customisable size; and (v) the results being written to disk only at the end of a task (see the example code in the appendix). As a result, once the whole required software is installed on every worker, only the manager needs to have access to data on disk. However, the use of shared storage is optional. These features provide portability to SCBI_MapReduce implementations.
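Step (iv), the in-memory splitting, maps naturally onto Ruby's Enumerable; a minimal sketch with placeholder sequence objects (the 100-object chunk size mirrors the default mentioned in Section 3.5):

# Split an in-memory array of sequence objects into chunks of 100.
sequences = Array.new(1000) { |i| "seq-#{i}" }   # placeholder objects
chunks = sequences.each_slice(100).to_a
# Each chunk is now an independent unit of work for one worker.
puts "#{chunks.size} chunks of #{chunks.first.size} sequences each"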

Fault-tolerance policy and basic error handling capabilities were included. These capabilities enable the safe execution of SCBI_MapReduce over long periods: (i) when an execution exception occurs, it is reported to the manager, which can try to restart the faulty task instead of stopping the job; (ii) when a worker fails, the manager redistributes the data to a new worker and launches the same task again; (iii) upon unexpected job interruption, since completed jobs are checkpointed to disk, when the user restarts the interrupted job the manager is able to resume execution precisely at the object being processed when the interruption occurred; (iv) an exhaustive log file can be tracked to find execution problems for debugging purposes; and finally, (v) when a job is too buggy, that is, when a high error rate in workers is detected, the manager stops the job and informs the user that the data are faulty and should be reviewed before launching a new job.
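Behaviours (i) and (v) amount to retry-with-limits logic; the sketch below is a hedged illustration (the retry limit, the failure simulation, and the method names are ours, not the gem's):

MAX_RETRIES = 3   # illustrative limit; the gem's actual policy is not detailed here

# Hypothetical task that fails transiently.
def process(chunk)
  raise 'transient failure' if rand < 0.3
  "#{chunk} done"
end

# Behaviour (i): retry a faulty task instead of stopping the whole job;
# after repeated failures the chunk is reported as faulty (behaviour (v)).
def run_task_with_retries(chunk)
  attempts = 0
  begin
    process(chunk)
  rescue StandardError
    attempts += 1
    retry if attempts < MAX_RETRIES
    raise
  end
end

begin
  puts run_task_with_retries('chunk-1')
rescue StandardError
  puts 'chunk-1 marked as faulty; the user should review the data'
end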

3.4. Usage and Customisation of SCBI_MapReduce. Although the code can be downloaded from http://www.scbi.uma.es/downloads, its easiest installation is as any other Ruby gem [32], using the single command sudo gem install scbi_mapreduce. On-line help can be obtained with the scbi_mapreduce -h command. Skeleton customisation only requires modifying parts of the initial configuration parameters, data chunk sizes, worker processes, and work dispatcher to achieve a fully distributed system (see details in the appendix). This requires from the user some knowledge of calling external code and of I/O operations.

Figure 1: Comparison of parallelisation flowgrams for a single "job". (a) The classic MapReduce view, in which the input data of a job are split into smaller data chunks by a "mapper" process and executed by separate parallel "tasks"; once all tasks have finished, the results are combined by a "reduce" process. (b) The SCBI_MapReduce task-farm flowgram, in which a job is handled by a single "manager" that controls the start and end of each distributed "task" executed by every "worker".


Three different templates (one for string capitalisation, a second for the simulated calculations on integers shown in Table 1, and a third for the calculations shown in Table 1 using the datasets of artificial and real-world sequences, which finds and removes barcodes from sequences) are provided as a customisable starting point for any project, using the command scbi_mapreduce my_project template_name.
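As an illustration, creating a project from one of these templates produces the three files discussed in the appendix (the console session below is illustrative, and the template name is a placeholder):

scbi_mapreduce my_project template_name
ls my_project
main.rb  my_worker.rb  my_worker_manager.rb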

3.5. SCBI_Distributed_Blast Gem. Basic Local Alignment Search Tool (Blast) [4] is the tool most frequently used for calculating sequence similarity, usually being the computationally intensive part of most genomic analyses. Its popularity is based on its heuristics, which enable it to perform the search faster. This drove us to choose it as a proof-of-concept for distribution of legacy software. The binary release v. 2.2.24 of Blast+ [4] was elected to demonstrate that SCBI_MapReduce can be used as a distribution wrapper for legacy algorithms. Since a main drawback of Blast parallelisations is that the entire sequence database must be copied on each node, SCBI_Distributed_Blast takes advantage of shared storage to avoid the waste of disk space and the potential scalability impairment for large databases.

To use SCBI_Distributed_Blast, it is only necessary to wrap any generic Blast or Blast+ command as follows:

scbi_distributed_blast -w 8 'any blast command'

which distributes Blast between 8 cores/workers (-w option) with chunks of 100 sequences (the default value of the -g option). An example could be as follows:

scbi_distributed_blast -w 8 'blastn -task blastn-short -db myDB.fna -query inputfile.fna -out outputfile.fna'

where blastn is executed using 8 cores, with inputfile.fna as a Fasta query file and myDB.fna as a customised database, outputfile.fna being the name of the output file.

4. Results

4.1. Scalability Studies. SCBI_MapReduce performance was tested in the first instance using a dataset of 1000 objects (integers), as shown in Table 1, column "Integer dataset".

Table 1: SCBI_MapReduce performance tests using three different datasets on the "x86 upgraded" cluster. Execution times are expressed in seconds. The number immediately before X indicates the number of reads grouped in a chunk for every parallel task.

Cores  Integer      Real-world sequences(b)        Artificial sequences(c)
       dataset(a)   1X     100X   250X   2000X     1X     100X   250X   2000X
1(d)   1264         13608  13849  13424  19328     23264  22124  24185  33393
2       635          8824   7903   7584  10251     11462  11554  11776  15302
4       322          4363   4507   4167   5890      6776   6507   5881   7503
8       164          2182   2194   2231   3132      3403   3337   3371   4874
16       81          1097   1098   1121   1633      1901   1797   1817   2602
32       41           568    549    569    899       921    888    915   1339
64       21           293    282    295    532       506    449    466    755
128      12           173    153    179    352       268    233    245    464

(a) Integers were subjected to futile intensive calculations that took at least 1 s per object.
(b) The dataset of real-world sequences consisted of 261 304 sequence reads (mean 276 nt, mode 263 nt, coefficient of variation 11) obtained from a 454/FLX sequencer and downloaded from the SRA database (AC SRR069473).
(c) The dataset of artificial sequences consisted of 425 438 sequences obtained using the software ART with a 2X coverage, simulating a 454/FLX sequencing of the Danio rerio chromosome 1 (AC NC_007112.5).
(d) Using one core is equivalent to a linear job without any parallelisation or distribution; it acts as control reference.

Table 2: Percentage of time spent by the manager on every sequence-based job, similarly as detailed in Table 1.

Cores   Real-world sequences            Artificial sequences
        1X    100X  250X  2000X         1X    100X  250X  2000X
2       0.92  0.48  0.49  0.41          0.84  0.44  0.43  0.36
4       0.71  0.41  0.45  0.35          0.68  0.37  0.39  0.32
8       0.58  0.40  0.41  0.34          0.55  0.35  0.34  0.29
16      0.52  0.39  0.39  0.34          0.58  0.41  0.37  0.28
32      0.52  0.50  0.51  0.37          0.47  0.45  0.44  0.33
64      0.47  0.56  0.55  0.40          0.37  0.45  0.48  0.34
128     0.61  0.57  0.62  0.37          0.54  0.48  0.49  0.40

Jobs were launched using 1 (as control reference) to 128 cores on the "x86 upgraded" cluster. Since the speed-up achieved is close to the maximal theoretical one (Figure 2(a), compare dotted line and solid lines), it can be suggested that SCBI_MapReduce scales well with simple objects such as integers. In fact, it is able to manage up to 18 000 tasks of 1 kB each per second with a single-core manager on the "x86" cluster (results not shown).

SCBI_MapReduce can find a use beyond integers, as demonstrated by the testing of two sequence datasets in which each object is a sequence in Fasta format. The real-world sequences correspond to a true 454/FLX sequencing run, and the artificial sequences correspond to a simulation of 454/FLX sequencing using ART (http://www.niehs.nih.gov/research/resources/software/biostatistics/art). Barcodes were localised on both sequence datasets, varying the sequence chunk sizes. Table 1 shows that the a priori most affordable parallelisation, sequence-by-sequence (1X chunks), did not provide the best speed-up (Figure 2(a), dark and open triangles). This could be explained in part by the fact that the manager spends a little more time building a lot of small data chunks (Table 2). Bigger chunks (100X and 250X) provided shorter, similar execution times (Figure 2(a), dark and open squares and circles). Regarding execution time (Table 1) and speed-up (Figure 2(a), dark and open diamonds), the hugest chunk (2000X) is not a good election. Since the manager is not taking more time during this job (Table 2), the reason for this speed-up impairment is that, using bigger chunks, the computational resources are not optimally used during the last calculation cycle, where most workers are idle and the manager is waiting for a few workers to finish a long task. This issue is also observed with other chunk sizes, but it becomes apparent only with chunks of 2000X sequences, since workers spend more time on every data chunk. In conclusion, SCBI_MapReduce was scaling almost linearly, and the optimal number of reads in a chunk depends on the number of workers and chunks used for the parallelisation. When the number of sequences (objects) to process is unknown, chunks of 2000X and 1X provided the lowest speed-up, while small chunks ranging from 100X to 250X sequences are preferable.

4.2. Compression and Encryption Overhead Is Acceptable. One of the main features of SCBI_MapReduce is its capability for data compression and encryption on-the-fly. Since these capabilities rely on the manager, data security and privacy are obtained at the cost of an increase in execution time. This overhead was tested by launching the real-world jobs of Table 1 on the "x86 upgraded" cluster with the encryption and compression capabilities enabled (Table 3).

Table 3: Analysis of compression and encryption of the same real-world sequence jobs as in Table 1. Both the execution times (in seconds) and the percentage of this time used by the manager are provided.

Cores   Execution time(a) (s)             Manager time (%)
        1X    100X  250X  2000X           1X    100X  250X  2000X
2(b)    9482  8178  7270  10279           4.73  0.56  0.58  0.44
4       4619  4307  3814   5234           2.76  0.47  0.49  0.39
8       2359  2156  2165   3145           1.76  0.43  0.43  0.35
16      1274  1085  1142   1692           1.54  0.42  0.40  0.33
32       913   553   571    905           2.40  0.51  0.52  0.37
64       821   282   294    540           3.54  0.57  0.55  0.43
128      709   163   173    346           3.87  0.62  0.61  0.39

(a) Compression was performed with ZLib and encryption with AES-256; any other method installed on the computers can be used.
(b) There is no need for compression or encryption when using one single core.


Figure 2: Speed-up achieved by SCBI_MapReduce implementations. (a) Speed-up based on Table 1 data, calculated dividing the time taken with 1 core by the time taken with each number of cores. (b) Performance using compressed and encrypted real-world sequences, based on the execution times in Table 3. The speed-up was calculated dividing the time that real-world sequences took with 1 core in Table 1 by their corresponding times in Table 3. In both plots, theoretical values correspond to a speed-up that equals the number of cores used.

Distribution in 1X chunks was dramatically affected (Figure 2(b), open triangles), and this may be due to the important increase in the time spent by the manager on every job (from 0.91–0.47 in Table 2 to 4.75–1.54 in Table 3). But the overhead became bearable when using any other chunk size, since (i) the execution times in Table 3 for 100X–2000X chunks are close to those presented in Table 1; (ii) the speed-up recovers the previously observed values (Figure 2(b)); and (iii) the manager spends nearly the same percentage of time. These results suggest that the overhead introduced by encryption and compression can be alleviated using chunks, still providing a significant speed-up, and that distribution sequence-by-sequence (1X chunks) was the worst available approach. In conclusion, compression and encryption capabilities can be regularly used for distributed calculations when encrypted connections are not available but desirable, without a dramatic increase of execution time.
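Per-chunk compression and encryption of this kind can be sketched with Ruby's standard library, in line with the ZLib and AES-256 setup mentioned in Table 3 (a minimal sketch; the key and IV handling is deliberately simplified, and the gem's actual wiring of these calls is not shown here):

require 'zlib'
require 'openssl'

# Compress and then encrypt one data chunk before sending it over an untrusted network.
def pack_chunk(chunk, key, iv)
  compressed = Zlib::Deflate.deflate(chunk)     # ZLib compression, as in Table 3
  cipher = OpenSSL::Cipher.new('aes-256-cbc')   # AES-256, as in Table 3
  cipher.encrypt
  cipher.key = key
  cipher.iv  = iv
  cipher.update(compressed) + cipher.final
end

def unpack_chunk(data, key, iv)
  decipher = OpenSSL::Cipher.new('aes-256-cbc')
  decipher.decrypt
  decipher.key = key
  decipher.iv  = iv
  Zlib::Inflate.inflate(decipher.update(data) + decipher.final)
end

key = OpenSSL::Random.random_bytes(32)  # 256-bit key; a real deployment needs key exchange
iv  = OpenSSL::Random.random_bytes(16)
chunk = ">seq1\nACGTACGT\n" * 100
packed = pack_chunk(chunk, key, iv)
raise 'round-trip failed' unless unpack_chunk(packed, key, iv) == chunk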

4.3. Fault-Tolerance Testing. The fault tolerance of SCBI_MapReduce was tested using the complete dataset of real-world sequences as input; the same analysis as in Table 1 was performed on the "x86 upgraded" cluster using 128 cores.


Figure 3: Fault-tolerance testing using the real-world sequences in 100X chunks on the "x86 upgraded" cluster using 128 cores. The "Normal" execution occurred without errors. The "Job shutdown" execution included a complete job shutdown (indicated by a solid arrow) and then a manual restart. The "Worker failure" execution included three shutdowns (indicated by dashed arrows) of 16 workers each during the job.

Sequences were used in 100X chunks, and the job was executed three times. The first execution occurred without errors and took 184 s to finish the analysis of 261 304 reads (objects). Figure 3 shows that this "Normal" execution presents the expected constant slope. The second execution tested an unexpected "Job shutdown", which was simulated by a manual interruption of the whole job, the job then being manually relaunched. It can be seen in Figure 3 that the interruption adds a lag time to the job, increasing to 239 s the time required to finish the analysis of all sequences. The sequence counts were the same as in "Normal", indicating that no sequence and no chunk were reanalysed twice. Finally, the test of recovery after a "Worker failure" was performed by stopping 16 workers at three different time points of the job execution (a total of 48 different workers were affected). In this case, the manager automatically handles the reanalysis of the unfinished chunks, and the job took 300 s. Again, no sequence was saved twice in the output file. As expected, the output of the "Normal" execution and of the interrupted executions was exactly the same (results not shown). In conclusion, the fault-tolerance implementation of SCBI_MapReduce is able to handle execution exceptions and broken workers, and stopped jobs can be restarted without the reanalysis of finished tasks.

4.4. Distributed Blast+ Using Chunks of Sequences. The generic blastn command blastn -task blastn-short -db myDB.fna -query inputfile.fna -out outputfile.fna was launched with real-world sequences as input and a customised database of 240 MB containing complete bacterial genomes. The speed-up achieved by SCBI_Distributed_Blast compared with the nondistributed execution (using 1 single core) is presented in Figure 4(a). The outputs of all executions were exactly the same in all cases, and they were also identical to the output of the original Blast+ binary (results not shown). Moreover, SCBI_Distributed_Blast was able to cope, without modification, with Blast+ versions 2.2.23 to 2.2.27.

Blast+ is described as having threading capabilities [4]. Therefore, 10 000 reads of AC SRR069473 were launched with native blastn and with SCBI_Distributed_Blast, both configured to use the 8 cores of a single blade of the "x86" cluster. Figure 4(b) shows that Blast+ did not appear to parallelise efficiently, since it started using only 5 cores and rapidly decreased to only one single core, taking 224 min to finish the task. Similar behaviour was confirmed on other computers, indicating that it seems to be an inherent feature of Blast+ releases. In contrast, SCBI_Distributed_Blast used 8 cores all the time and finished in 24 min (which is why 0 CPUs are "used" from then on). Therefore, the speed-up introduced by SCBI_Distributed_Blast is 9.4, demonstrating that it performs much better in exploiting multicore capabilities than the threaded implementation included in Blast+.

5. Discussion

5.1. SCBI_MapReduce Is an Efficient Task-Farm Skeleton. SCBI_MapReduce customisation is simpler than the customisation of other frameworks such as FastFlow [27], and it does not need compilation of the final code, making it portable to most computers "as is". Its flexibility allows SCBI_MapReduce to be included as a coding part of any new algorithm, as well as a wrapper for already existing functions, scripts, or compiled software. Promising and efficient results were obtained when it was used within several of our algorithms (e.g., SeqTrimNext [http://www.scbi.uma.es/seqtrimnext] and Full-LengtherNext [http://www.scbi.uma.es/fulllengthernext], both specifically designed to manage sequences obtained from next-generation sequencing), as well as a wrapper for Blast+ in the SCBI_Distributed_Blast gem (Figure 4). Therefore, SCBI_MapReduce seems to be sufficiently powerful for most parallelisation and distribution needs concerning new algorithms, legacy software, and existing scripts in most bioinformatics contexts.

It has been described that GPU-based and some MPI-based parallelisations lack good scalability when dealing with rapidly growing sequence data, while MapReduce seems to perform better in those settings [9]. That could explain why the SCBI_MapReduce skeleton shows a speed-up of 31-fold for 32 cores and 59-fold for 64 cores, even with sequence data (Figure 2(a)). This performance is better than the one displayed by the R package pR, where 32 cores provide speed-ups of 20–27-fold, depending on the process [25]. Several design reasons can also be invoked to explain such efficiency [34]: (i) disk I/O operations are reduced to a minimum (data are read only at the beginning and results are saved only at the end); (ii) there is no asymmetry impact (Figure 1(b)); (iii) the manager overhead is limited when using more than 2 cores and chunks of sequences (Tables 2 and 3); and (iv) longer tasks increase the efficiency, because the manager is on standby most of the time while waiting for the workers to finish, avoiding the relaunching of internal or external programs for brief executions.


Figure 4: Behaviour of SCBI_Distributed_Blast. (a) Blast+ speed-up with chunks of 100X in two different clusters using different network protocols and queue systems. The theoretical speed-up corresponds to the one that equals the number of cores used. Speed-up was calculated dividing the time spent using 1 core by the time of the corresponding number of cores. The following execution times were used: for 50 000 reads from AC SRR069473 in the "x86" cluster, 25.8 h (92 880 s, 1 core), 27 600 s (2 cores), 13 980 s (4 cores), 6960 s (8 cores), 3540 s (16 cores), and 1740 s (32 cores); for the 261 304 reads of AC SRR069473 in the "x86 upgraded" cluster, 88.6 h (318 960 s, 1 core), 115 161 s (2 cores), 56 385 s (4 cores), 28 180 s (8 cores), 14 123 s (16 cores), and 7068 s (32 cores). (b) Threaded Blast+ and SCBI_Distributed_Blast use the 8 cores available in the same computer differently. Blast+ was executed with the -num_threads 8 option, and SCBI_Distributed_Blast was executed with the -w 8 option, using chunks of 100X by default, in the "x86" cluster.


SCBI_MapReduce includes an implementation of error handling and job checkpointing methods. It has been demonstrated (Figure 3) that data chunks from a broken worker, or even from a job shutdown, can be relaunched on a running worker. This provides robustness and fault tolerance, guarantees safe long-lasting executions, and preserves computational resources, since it avoids processing objects that have already been processed. Such properties serve to save time and make the job execution traceable at any time. Therefore, SCBI_MapReduce represents another step in the direction of programming environments based on the task-farm skeleton concept.
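A job-checkpointing scheme of this kind can be sketched as follows (a conceptual illustration under the assumption that chunks have stable identifiers; the gem's real checkpoint format is not described here):

require 'set'

CHECKPOINT = 'job.checkpoint'   # illustrative file name

# Load identifiers of chunks completed in a previous, interrupted run.
done = File.exist?(CHECKPOINT) ? Set.new(File.readlines(CHECKPOINT).map(&:chomp)) : Set.new

chunks = (1..10).map { |i| "chunk-#{i}" }
File.open(CHECKPOINT, 'a') do |ckpt|
  chunks.each do |chunk|
    next if done.include?(chunk)   # resume: skip work finished before the interruption
    # ... process the chunk and write its results to disk here ...
    ckpt.puts(chunk)               # record completion only once results are safe
    ckpt.fsync
  end
end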

5.2. Distribution in Chunks Is More Efficient. SCBI_MapReduce was intended to deal with problems that involve processing a huge number of small sequences (as in high-throughput sequencing or RNA-Seq experiments). Results showed that splitting datasets into small chunks yields a better speed-up than sending sequences one by one or in big chunks (Figure 2(a)). An analogous idea has already been reported for mpiBLAST [35], but for database segmentation instead of sequence grouping. Therefore, grouping reads in chunks appears to be another way to provide speed-up, always taking into account that big chunks can be detrimental when the number of chunks produced is not divisible by the number of workers used (see 2000X in Figure 2).

Since chunk sizes of 100X and 250X perform similarly (Figure 2), a chunk size of 100X can suit well as a default value, even if the optimal chunk size has not been assessed taking into account the number of cores and the number of objects to split into chunks. It can then be hypothesised that the use of chunks may reduce the manager overhead (Tables 2 and 3). Figure 4(a) shows that the speed-up can achieve superscalar behaviour using chunks combined with distribution, although this is dependent on the task performed by the worker (Blast+ in this instance) and not on the capabilities of SCBI_MapReduce. In conclusion, the use of chunks provides improved overall performance.

5.3. The Added Value of Compression and Encryption Capability. In distributed grids or the cloud, encrypted connections cannot always be established for data privacy, and data compression can accelerate any transfer, particularly over low-bandwidth connections. The overhead introduced by encryption and compression is particularly evident when data are processed one by one (Figure 2(b), open triangles), since the use of more and more cores did not significantly speed up the process.


class MyWorkerManager < WorkManager

  def self.init_work_manager
    # open input fastq file and results file as output
    @@fastq_file = FastqFile.new(fastq_file_path)
    @@results = FastqFile.new('results.fastq', 'w+')
  end

  def self.end_work_manager
    # close files on finish
    @@fastq_file.close
    @@results.close
  end

  # this method is called every time a worker needs new work
  def next_work
    # get next sequence or nil from file
    name, fasta, qual, comments = @@fastq_file.next_seq
    if !name.nil?
      return name, fasta, qual, comments
    else
      return nil
    end
  end

  def work_received(results)
    # write results to disk
    results.each do |name, fasta, qual, comments|
      @@results.write_seq(name, fasta, qual, comments)
    end
  end

end

Algorithm 1

But the compression and encryption overhead becomes acceptable when the dataset is split into chunks (compare slopes in Figure 2 and execution times in Tables 1 and 3). Encryption per chunk should be enabled only when untrusted networks are involved in distributed jobs. Compression per chunk could be envisaged when using low-bandwidth networks (e.g., in some grids [2]), provided that compressed data transfer is faster than the time spent compressing the data. As a result, SCBI_MapReduce can be used on grids with confidential data when encrypted connections cannot be established.

5.4. SCBI_MapReduce Is Ready for Grid Computing. It has been shown (Figure 4(a)) that SCBI_MapReduce, and therefore SCBI_Distributed_Blast, can work with homogeneous clusters (the "x86" and "x86 upgraded" clusters) consisting of different types of CPUs. It has also been tested that SCBI_MapReduce is able to deal with a heterogeneous grid consisting of one x86 computer using OSX, one x86 computer using Linux, 24 cores of the "x86" cluster, and 32 cores of the "Superdome" (results not shown). Hence, SCBI_MapReduce can cope with different queue systems (PBS, Slurm) and networks, and can distribute over symmetric multiprocessing machines ("Superdome"), clusters (Figure 4(a)), and heterogeneous Unix-based grids (above).

Other features that enable SCBI_MapReduce, at least theoretically [2, 36, 37], to be used in nearly any type of computer grid are (i) the above-described encryption and compression capabilities; (ii) the lack of need for administrator privileges; (iii) the fact that it runs on demand only; and (iv) a minimal hard disk requirement, since it takes advantage of shared storage only when necessary, making it highly portable to other computer systems. Testing SCBI_MapReduce on "cloud computing" services remains a pending task; however, it is expected that it will work there and provide benefits related to cost-effectiveness.

5.5. SCBI_Distributed_Blast Is a Boosted Version of Blast+. Previous improvements of Blast were performed by skilled programmers and provide parallelised versions tightly bonded to one released version. The development of SCBI_Distributed_Blast, based on the SCBI_MapReduce task-farm skeleton, removes version bonding and coding challenges, since it can boost in a core-dependent way (Figure 4(a)) any Blast+ release installed on the scientist's computer, not only the version tested in this study, enabling updates of the Blast+ release while maintaining the same SCBI_Distributed_Blast code.

In contrast to other Blast parallelisations, including mpiBLAST and MapReduce Blast [9], SCBI_Distributed_Blast distributed tasks are seeded with sequence chunks while maintaining the database intact. This is because it does not need to copy the sequence database on each worker, since it takes advantage of shared storage. This is also the reason why it provides exactly the same results as the original Blast+ in less time (Figure 4). Another MapReduce approach for Blast, CloudBlast [9], has very poor scalability, since it is optimised for short-read mapping and needs to copy the database on each node, while the speed-up observed with SCBI_Distributed_Blast was linear and appeared to be superscalar in the tested clusters (Figure 4(a)). However, superscalar behaviour was exclusively observed for 2 cores (speed-ups of 3.3 in the "x86" cluster and 2.8 in the "x86 upgraded" cluster), since, taking two cores as reference, the speed-up slope was close to the theoretical speed-up (4.0, 8.1, 16.3, 32.6, 65.0, and 127.6).

Comparing the speed-up of our "boosted" Blast+ and a threaded execution of Blast+ (Figure 4(b)), it can be seen that SCBI_Distributed_Blast can take advantage of all computing capabilities and scales linearly, in contrast to native Blast+ (Figure 4(b)) and the older NCBI-Blast [30]. In conclusion, SCBI_Distributed_Blast illustrates the ease of use and performance of SCBI_MapReduce, opening the way for code modifications that can easily produce scalable, balanced, fault-tolerant, and distributed versions of other Blast-related programs like PSI-Blast, WU-Blast/AB-Blast, NCBI-Blast, and the like. Furthermore, Blast-based genome annotation processes can take advantage of SCBI_Distributed_Blast with minor changes in the code.

6. Conclusions

This work does not aim at advancing parallelisation technology; it aims to apply the advantages of distribution to bioinformatic tools that are useful, for example, for genomics, giving attractive speed-ups. In a context of continuous development of parallel software, SCBI_MapReduce provides a task-farm skeleton for parallelisation/distribution with features such as fault-tolerance, encryption and compression on-the-fly, data distribution in chunks, grid-readiness, and flexibility for the integration of new and existing code without the need to be a skilled programmer. In fact, SCBI_MapReduce was designed for researchers with a biological background who consider MPI, Hadoop, or Erlang solutions for parallelisation/distribution too complicated. That is why Ruby was selected, since it has a shallow learning curve, even for biologists, and easily manages the programming necessities. In the context of genomic studies, one significant advantage is that SCBI_MapReduce enables the reuse, in a commodity parallel/distributed computing environment, of existing sequential code with little or no code changes. SCBI_Distributed_Blast illustrates this.

Results indicate that SCBI_MapReduce scales well, is fault-tolerant, can be used on multicore workstations, clusters, and heterogeneous grids, even where secured connections cannot be established, can use several interconnection networks, and does not need special hardware or virtual machine support. It is also highly portable and should diminish the disk space costs in "cloud computing". In conclusion, SCBI_MapReduce, and hence SCBI_Distributed_Blast, are ready, among other uses, for intensive genome analyses and annotations.

class MyWorker < Worker

  # process each obj in received objs
  def process_object(objs)
    # find barcodes
    find_mids(objs)
    return objs
  end

end

Algorithm 2

# get custom worker file path
custom_worker_file = 'my_worker.rb'

# init worker manager
MyWorkerManager.init_work_manager

# use any available ip and first empty port
ip = '0.0.0.0'
port = 0
workers = 4

# launch Manager and start it
manager = Manager.new(ip, port, workers, MyWorkerManager, custom_worker_file)
manager.start_server

Algorithm 3


Appendix

Customisation of the Three Files That Govern SCBI_MapReduce

SCBI_MapReduce consists of a number of files, but in order to customise it for particular needs, users only need to modify the I/O data management methods in the manager file (my_worker_manager.rb), the computation to be distributed in the worker file (my_worker.rb), and the main file (main.rb).

The methods to redefine in my_worker_manager.rb are (i) next_work, which provides new data for workers, or nil if there is no more data available (in the following code, it simply reads one sequence at a time from a fastq file on disk); (ii) self.init_work_manager, which opens I/O data files; (iii) self.end_work_manager, which closes files when finished; and (iv) work_received, which writes results to disk as they are generated. The relevant code for my_worker_manager.rb is (see Algorithm 1):

Customisation of the worker file (my_worker.rb) includes redefinition of the process_object method, which contains the function call to find_mids. The function find_mids can be defined by the user in his/her own source code, a compiled algorithm, or an existing code. The relevant code for my_worker.rb is (see Algorithm 2):
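Where process_object must drive a legacy binary instead of Ruby code, it can simply shell out; the following hedged variant follows the gem's Worker class (the blastn call mirrors the example of Section 3.5, and the temporary-file handling is an illustrative choice, not the gem's mechanism):

require 'tempfile'

class MyBlastWorker < Worker   # hypothetical worker wrapping a legacy binary

  def process_object(objs)
    # write the received chunk of [name, fasta, qual, comments] tuples to a temporary Fasta file
    query = Tempfile.new(['chunk', '.fna'])
    objs.each { |name, fasta| query.puts(">#{name}"); query.puts(fasta) }
    query.flush
    # run the unmodified legacy binary on this chunk and return its raw output (stdout)
    `blastn -task blastn-short -db myDB.fna -query #{query.path}`
  ensure
    query.close! if query
  end

end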

The main program file (main.rb) has to be invoked to launch the distributed job. It can be used as it is from the command line, as a common Ruby script (ruby main.rb), or as part of a more complex code. Skilled users can also modify its code and/or name to enable special features or even receive user parameters, which is the case when using SCBI_MapReduce for distribution of an internal part of an algorithm. The number of workers is defined here: at least one for the manager and one for one worker. The relevant code for main.rb is (see Algorithm 3):

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa and Rocío Bautista for helpful discussions, and the computer resources of the Plataforma Andaluza de Bioinformática of the University of Málaga, Spain. This study was supported by grants from the Spanish MICINN (BIO2009-07490) and Junta de Andalucía (P10-CVI-6075), as well as by institutional funding to the research group BIO-114.

References

[1] C. Huttenhower and O. Hofmann, "A quick guide to large-scale genomic data mining," PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.

[2] M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.

[3] D. Patterson, "The trouble with multi-core," IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.

[4] C. Camacho, G. Coulouris, V. Avagyan et al., "BLAST+: architecture and applications," BMC Bioinformatics, vol. 10, article 421, 2009.

[5] S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, "Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment," Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.

[6] H. Lin, X. Ma, W. Feng, and N. F. Samatova, "Coordinating computation and I/O in massively parallel sequence search," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.

[7] T. Nguyen, W. Shi, and D. Ruden, "CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping," BMC Research Notes, vol. 4, article 171, 2011.

[8] T. Rognes, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, article 221, 2011.

[9] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for long sequence alignment," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.

[10] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, vol. 10, no. 11, article R134, 2009.

[11] M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, "Hierarchical parallelization of gene differential association analysis," BMC Bioinformatics, vol. 12, article 374, 2011.

[12] M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, "Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications," in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.

[13] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.

[14] M. K. Chen and K. Olukotun, "The Jrpm system for dynamically parallelizing Java programs," in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.

[15] P. Haller and M. Odersky, "Scala Actors: unifying thread-based and event-based programming," Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.

[16] J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.

[17] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.

[18] L. Dagum and R. Menon, "OpenMP: an industry-standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.

[19] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, "Survey of MapReduce frame operation in bioinformatics," Briefings in Bioinformatics, in press.

[20] R. C. Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics," BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.

[21] J. Lin, "MapReduce is good enough?" Big Data, vol. 1, no. 1, pp. 28–37, 2013.

[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency Computation Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.

[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.

[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.

[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.

[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.

[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics, in press.

[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.

[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.

[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.

[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.

[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.

[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.

[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.

[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.

[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.

[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

Computational Biology Journal 3

computing) and is able to extract the complete distributioncapabilities of any computer systemWhile other systems likeMPI [17] cannot deal with node failure SCBI MapReducelikeHadoop [20] and FastFlow [27] includes implementationof error handling and job checkpointing methods More-over it gathers additional features for task-skeletons suchas encryption compression on-the-fly and distribution inchunks As a proof-of-concept of usewith legacy software theSCBI Distributed Blast gem was developed to distributethe widely used Blast+ [4] application This gem is notbonded to the Blast+ version included in it since anyBlast+ binary of the scientist computer can be used

3 Methods

31 Hardware and Software Scripting code was based onRuby 19 for OSX and SLES Linux The computing facilitiesused were (i) a ldquox86rdquo cluster consisting of 80 x86 64 E5450cores at 30GHz with 16GB of RAM every 8-core bladeconnected by an InfiniBand network and a PBS queue system(ii) a ldquox86 upgradedrdquo cluster consisting of 768 x86 64 E5-2670 cores at 26GHz with 64GB of RAM every 16-coreblade connected by an InfiniBand FDR network and a Slurmqueue system (iii) a homogeneous symmetric multiprocess-ing machine consisting of a ldquoSuperDomerdquo of 128 Itanium-2 cores at 16 GHz with 400Gb of RAM and (iv) two x86computers with 4 cores at 27 GHz using OSX and four x86computers with 8 cores at 28GHz using Linux and connectedby gigabit Ethernet (GbE)

32 The Task-Farm Skeleton Design Based on the well-known approach of MapReduce [21] which restricted itto trivial problems in which no communication betweenworkers is needed (Figure 1(a)) SCBI MapReduce takes afurther step following a task-farm skeleton design (Figure1(b)) This design does not need synchronous handlingand entails asymmetry-tolerance [33] It works launchingone ldquomanagerrdquo process to dispatch ldquotasksrdquo to ldquoworkersrdquo ondemand When a new worker is connecting it automaticallyreceives the input parameters and a data chunk (eg a groupof sequences) from the manager Since the skeleton containsthe capability to take advantage of shared storage (lustre nfsibrix stornext and even samba) commondata for all workers(eg the subject database for Blast+) are not copied on everynode saving disk space and correspondingly diminishing thedata transfer and starting delay for every worker After taskcompletion the worker sends the results back to themanagerThe results are then written to disk and a new assignment issent to the now idle worker This cycle is repeated until themanager does not have any other data to process As a resultit behaves as black-box where the only requirement is to codesome predefined methods or calls Any training in parallelprogramming or communication libraries is unnecessary theresulting code apparently remaining sequential at the userlevel (see the appendix)

The number of workers is indicated by the user althoughadditional workers can be launchedstopped at any timeIf workers do not use the full capacity of the cores moreworkers than cores can be defined Therefore the task-farm

design of Figure 1(b) using a stand-along managermdashthat isonly dedicated to file opening chunk construction datasaving and worker coordinationmdashavoids threading controldiminishes the idle times and provides asymmetry tolerance

33 Implementation of Other Relevant Features Since no par-ticular compiler technology is required SCBI MapReduceskeleton is prepared for workers to be executed simultane-ously over a mixture of architectures (x86 64 PPC ia64i686) running on UNIX-like standalone machines clusterssymmetric multiprocessing machines and grids Having aconnection protocol based on TCPIP the skeleton canhandle at the same time several interconnection networks(Ethernet Gigabit InfiniBand Myrinet optic-fiber with IPetc) Additionally when network transfers are requireddata encryption as well as compression can be enabled toguarantee data privacy Any encryption and compressionprogram installed on the user computer can be invoked

Implemented inputoutput (IO) operations have beenoptimised to diminish readingwriting overload Optimisa-tion consisted of (i) the use of EventMachine Ruby library forasynchronous IO events for networked operations (ii) themanager reads data from disk only once at the beginning ofthe job (iii) the data in the form of objects are maintainedin memory during the entire job to avoid further disk access(iv) the data are split in memory by the manager into chunksof objects of customisable size and (v) the results are writtenon disk only at the end of a task (see example in the code ofthe appendix) As a result once the whole required softwareis installed on every worker only the manager needs tohave access to data on disk However the use of sharedstorage is optional These features provide portability toSCBI MapReduce implementations

Fault-tolerance policy and basic error-handling capabilities were included. These capabilities enable the safe execution of SCBI_MapReduce over long periods: (i) when an execution exception occurs, it is reported to the manager, which can try to restart the faulty task instead of stopping the job; (ii) when a worker fails, the manager redistributes the data to a new worker and launches the same task again; (iii) on unexpected job interruption, since completed tasks are checkpointed to disk, when the user restarts the interrupted job the manager is able to resume execution precisely at the object being processed when the interruption occurred; (iv) an exhaustive log file can be tracked to find execution problems for debugging purposes; and finally (v) when a high error rate in workers is detected (a "too buggy" job), the manager stops the job and informs the user that the data are faulty and should be reviewed before launching a new job.
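A minimal sketch of this policy is shown below, assuming a chunks array and a process_chunk call (both hypothetical) and checkpointing finished chunk ids to a plain log file:

MAX_RETRIES = 3
log = 'checkpoint.log'
done = File.exist?(log) ? File.readlines(log).map(&:to_i) : []

chunks.each_with_index do |chunk, id|
  next if done.include?(id)  # resume: skip chunks finished before an interruption
  tries = 0
  begin
    process_chunk(chunk)                   # hypothetical worker call
    File.open(log, 'a') { |f| f.puts(id) } # checkpoint the completed task
  rescue StandardError
    tries += 1
    retry if tries < MAX_RETRIES           # restart the faulty task
    raise 'too many failures: input data should be reviewed' # "too buggy" job
  end
end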

3.4. Usage and Customisation of SCBI_MapReduce. Although the code can be downloaded from http://www.scbi.uma.es/downloads, the easiest installation is, as for any other Ruby gem [32], with the single command sudo gem install scbi_mapreduce. On-line help can be obtained using the scbi_mapreduce -h command. Skeleton customisation only requires modifying parts of the initial configuration parameters, data chunk sizes, worker processes, and work dispatcher to achieve a fully distributed system (see details in the appendix). This customisation requires from the user only some knowledge of how to call external code and perform I/O operations.



Figure 1: Comparison of parallelisation flowgrams for a single "job". (a) The classic MapReduce view, in which the input data of a job are split into smaller data chunks by a "mapper" process and executed by separate parallel "tasks"; once all tasks have finished, the results are combined by a "reduce" process. (b) SCBI_MapReduce task-farm flowgram, in which a job is handled by a single "manager" that controls the start and end of each distributed "task" executed by every "worker".


Three different templates are provided as a customisable starting point for any project, using the command scbi_mapreduce my_project template_name: one for string capitalisation, a second for the simulated calculations on integers shown in Table 1, and a third for the calculations shown in Table 1 with the datasets of artificial and real-world sequences, which finds and removes barcodes from sequences.
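For instance, a worker in the spirit of the string-capitalisation template could be as simple as the following sketch (illustrative; the template actually shipped with the gem may differ):

class CapitalizeWorker < Worker
  # receives a chunk of strings and returns the processed chunk
  def process_object(objs)
    objs.map(&:capitalize)
  end
end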

3.5. SCBI_Distributed_Blast Gem. The Basic Local Alignment Search Tool (Blast) [4] is the tool most frequently used for calculating sequence similarity, usually being the computationally intensive part of most genomic analyses. Its popularity is based on heuristics that enable it to perform the search faster. This drove us to choose it as a proof-of-concept for the distribution of legacy software. The binary release v. 2.2.24 of Blast+ [4] was selected to demonstrate that SCBI_MapReduce can be used as a distribution wrapper for legacy algorithms. Since a main drawback of Blast parallelisations is that the entire sequence database must be copied on each node, SCBI_Distributed_Blast takes advantage of shared storage to avoid the waste of disk space and the potential scalability impairment for large databases.

To use SCBI_Distributed_Blast, it is only necessary to wrap any generic Blast or Blast+ command as follows:

    scbi_distributed_blast -w 8 'any_blast_command'

which distributes Blast between 8 cores/workers (-w option) with chunks of 100 sequences (default value of the -g option). An example can be as follows:

    scbi_distributed_blast -w 8 'blastn -task blastn-short -db myDB.fna -query input_file.fna -out output_file.fna'

where blastn is executed using 8 cores, with input_file.fna as a Fasta query file and myDB.fna as a customised database, output_file.fna being the name of the output file.

4. Results

4.1. Scalability Studies. SCBI_MapReduce performance was tested in the first instance using a dataset of 1000 objects (integers), as shown in Table 1, column "Integer dataset".


Table 1: SCBI_MapReduce performance tests using three different datasets on the "x86 upgraded" cluster. Execution times are expressed in seconds. The number immediately before X indicates the number of reads grouped in a chunk for every parallel task.

Cores   Integer      Real-world sequences (b)           Artificial sequences (c)
        dataset (a)   1X      100X    250X    2000X      1X      100X    250X    2000X
1 (d)      1264      13608   13849   13424   19328      23264   22124   24185   33393
2           635       8824    7903    7584   10251      11462   11554   11776   15302
4           322       4363    4507    4167    5890       6776    6507    5881    7503
8           164       2182    2194    2231    3132       3403    3337    3371    4874
16           81       1097    1098    1121    1633       1901    1797    1817    2602
32           41        568     549     569     899        921     888     915    1339
64           21        293     282     295     532        506     449     466     755
128          12        173     153     179     352        268     233     245     464

(a) Integers were subjected to futile intensive calculations that took at least 1 s on every object.
(b) The dataset of real-world sequences consisted of 261,304 sequence reads (mean 276 nt, mode 263 nt, coefficient of variation 11%) obtained from a 454/FLX sequencer and downloaded from the SRA database (AC SRR069473).
(c) The dataset of artificial sequences consisted of 425,438 sequences obtained using the software ART with a 2X coverage, simulating a 454/FLX sequencing of the Danio rerio chromosome 1 (AC NC_007112.5).
(d) Using one core is equivalent to a linear job without any parallelisation or distribution; it acts as the control reference.

Table 2: Percent of time spent by the manager on every sequence-based job, similarly as detailed in Table 1.

Cores   Real-world sequences                Artificial sequences
         1X      100X    250X    2000X       1X      100X    250X    2000X
2       0.92    0.48    0.49    0.41        0.84    0.44    0.43    0.36
4       0.71    0.41    0.45    0.35        0.68    0.37    0.39    0.32
8       0.58    0.40    0.41    0.34        0.55    0.35    0.34    0.29
16      0.52    0.39    0.39    0.34        0.58    0.41    0.37    0.28
32      0.52    0.50    0.51    0.37        0.47    0.45    0.44    0.33
64      0.47    0.56    0.55    0.40        0.37    0.45    0.48    0.34
128     0.61    0.57    0.62    0.37        0.54    0.48    0.49    0.40

Jobs were launched using 1 (as control reference) to 128 cores on the "x86 upgraded" cluster. Since the speed-up achieved is close to the maximal theoretical one (Figure 2(a), compare dotted line and solid lines), it can be suggested that SCBI_MapReduce scales well with simple objects such as integers. In fact, it is able to manage up to 18,000 tasks of 1 kB each per second with a single-core manager on the "x86" cluster (results not shown).
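For reference, the speed-up reported throughout this section (and plotted in Figure 2) is the standard ratio of sequential to parallel execution time, stated here explicitly in our own notation:

\[ S(n) = \frac{T_1}{T_n}, \qquad \text{e.g., } S(32) = \frac{1264\ \mathrm{s}}{41\ \mathrm{s}} \approx 30.8 \]

for the integer dataset of Table 1, close to the theoretical maximum of 32.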

SCBI_MapReduce can find use beyond integers, as demonstrated by testing two sequence datasets in which each object is a sequence in Fasta format. The real-world sequences correspond to a true 454/FLX sequencing run, and the artificial sequences correspond to a simulation of 454/FLX sequencing performed with ART (http://www.niehs.nih.gov/research/resources/software/biostatistics/art/). Barcodes were localised on both sequence datasets while varying the sequence chunk size. Table 1 shows that the a priori most affordable parallelisation, sequence by sequence (1X chunks), did not provide the best speed-up (Figure 2(a), dark and open triangles). This can be explained in part by the fact that the manager spends a little more time building a lot of small data chunks (Table 2). Larger chunks (100X and 250X) provided shorter, similar execution times (Figure 2(a), dark and open squares and circles). Regarding execution time (Table 1) and speed-up (Figure 2(a), dark and open diamonds), the largest chunk (2000X) is not a good choice. Since the manager is not taking more time during this job (Table 2), the reason for this speed-up impairment is that, with larger chunks, the computational resources are not optimally used during the last calculation cycle, where most workers are idle while the manager waits for a few workers to finish a long task; for instance, 261,304 reads in 2000X chunks yield only 131 tasks, so on 128 cores most workers stand idle during the cycle in which a few process their second chunk. This issue is also observed with other chunk sizes, but it becomes apparent only with chunks of 2000X sequences, since workers spend more time on every data chunk. In conclusion, SCBI_MapReduce scales almost linearly, and the optimal number of reads in a chunk depends on the number of workers and chunks used for the parallelisation. When the number of sequences (objects) to process is unknown, chunks of 2000X and 1X provide the lowest speed-up, while small chunks ranging from 100X to 250X sequences are preferable.

4.2. Compression and Encryption Overhead is Acceptable. One of the main features of SCBI_MapReduce is the capability of data compression and encryption on-the-fly. Since these capabilities rely on the manager, data security and privacy are obtained at the cost of an increase in execution time. This overhead was tested using the real-world jobs of Table 1 launched on the "x86 upgraded" cluster with the encryption and compression capabilities enabled (Table 3).


Table 3: Analysis of compression and encryption of the same real-world sequence jobs as in Table 1. Both the execution times (in seconds) and the percent of this time used by the manager are provided.

Cores   Execution time (a) (s)              Manager time (%)
         1X      100X    250X    2000X       1X      100X    250X    2000X
2 (b)   9482    8178    7270    10279       4.73    0.56    0.58    0.44
4       4619    4307    3814     5234       2.76    0.47    0.49    0.39
8       2359    2156    2165     3145       1.76    0.43    0.43    0.35
16      1274    1085    1142     1692       1.54    0.42    0.40    0.33
32       913     553     571      905       2.40    0.51    0.52    0.37
64       821     282     294      540       3.54    0.57    0.55    0.43
128      709     163     173      346       3.87    0.62    0.61    0.39

(a) Compression was performed with ZLib and encryption with AES-256; any other method installed on the computers can be used.
(b) There is no need of compression or encryption when using one single core.


Figure 2: Speed-up achieved by SCBI_MapReduce implementations. (a) Speed-up based on Table 1 data, calculated by dividing the time taken with 1 core by the time taken by each number of cores. (b) Performance using compressed and encrypted real-world sequences, based on the execution times in Table 3; the speed-up was calculated by dividing the time that real-world sequences took with 1 core in Table 1 by their corresponding times in Table 3. In both plots, the theoretical values correspond to a speed-up that equals the number of cores used.

Distribution of 1X chunks was dramatically affected (Figure 2(b), open triangles), and this may be due to the important increase of the time spent by the manager on every job (from 0.91–0.47% in Table 2 to 4.75–1.54% in Table 3). But the overhead became bearable when using any other chunk size, since (i) the execution times in Table 3 for 100X–2000X chunks are close to those presented in Table 1, (ii) the speed-up recovers the previously observed values (Figure 2(b)), and (iii) the manager spends nearly the same percent of time. These results suggest that the overhead introduced by encryption and compression can be alleviated using chunks, still providing a significant speed-up, and that distribution sequence by sequence (1X chunks) was the worst available approach. In conclusion, the compression and encryption capabilities can be regularly used for distributed calculations when encrypted connections are not available but desirable, without a dramatic increase of execution time.

4.3. Fault-Tolerance Testing. The fault-tolerance of SCBI_MapReduce was tested using the complete dataset of real-world sequences as input, performing the same analysis as in Table 1 on the "x86 upgraded" cluster with 128 cores. Sequences were used in 100X chunks, and the job was executed three times. The first execution occurred without errors and took 184 s to finish the analysis of the 261,304 reads (objects). Figure 3 shows that this "Normal" execution presents the expected constant slope. The second execution tested an unexpected "Job shutdown", simulated by a manual interruption of the whole job, the job being then manually relaunched. It can be seen in Figure 3 that the interruption adds a lag time to the job, increasing to 239 s the time required to finish the analysis of all sequences. The sequence counts were the same as in "Normal", indicating that no sequence and no chunk were reanalysed twice. Finally, the test of recovery after a "Worker failure" was performed by stopping 16 workers at three different time points of the job execution (a total of 48 different workers were affected). In this case, the manager automatically handles the reanalysis of the unfinished chunks, and the job took 300 s. Again, no sequence was saved twice in the output file. As expected, the output of the "Normal" execution and of the interrupted executions was exactly the same (results not shown). In conclusion, the fault-tolerance implementation of SCBI_MapReduce is able to handle execution exceptions and broken workers, and stopped jobs can be restarted without the reanalysis of finished tasks.



Figure 3: Fault-tolerance testing using the real-world sequences in 100X chunks on the "x86 upgraded" cluster using 128 cores; object counts (×10^4) are plotted against time (s). The "Normal" execution occurred without errors. The "Job shutdown" run included a complete job shutdown (indicated by a solid arrow) and then a manual restart. The "Worker failure" execution included three shutdowns (indicated by dashed arrows) of 16 workers each during the job.


4.4. Distributed Blast+ Using Chunks of Sequences. The generic blastn command blastn -task blastn-short -db myDB.fna -query input_file.fna -out output_file.fna was launched with the real-world sequences as input and a customised database of 240 MB containing complete bacterial genomes. The speed-up achieved by SCBI_Distributed_Blast compared to the nondistributed execution (using 1 single core) is presented in Figure 4(a). The outputs of all executions were exactly the same in all cases, and they were also identical to the output of the original binary Blast+ (results not shown). Moreover, SCBI_Distributed_Blast was able to cope, without modification, with Blast+ versions 2.2.23 to 2.2.27.

Blast+ is described as having threading capabilities [4]. Therefore, 10,000 reads of AC SRR069473 were launched with native blastn and with SCBI_Distributed_Blast, both configured to use the 8 cores of a single blade of the "x86" cluster. Figure 4(b) shows that Blast+ did not appear to parallelise efficiently, since it started using only 5 cores and rapidly decreased to one single core, taking 224 min to finish the task. Similar behaviour was confirmed on other computers, indicating that this seems to be an inherent feature of Blast+ releases. In contrast, SCBI_Distributed_Blast used 8 cores all the time and finished in 24 min (which is why 0 CPUs appear "used" from that point on in the plot). Therefore, the speed-up introduced by SCBI_Distributed_Blast is 9.4, demonstrating that it is much better at exploiting multicore capabilities than the threaded implementation included in Blast+.

5. Discussion

5.1. SCBI_MapReduce is an Efficient Task-Farm Skeleton. SCBI_MapReduce customisation is simpler than the customisation of other frameworks such as FastFlow [27], and it does not need compilation of the final code, making it portable to most computers "as is". Its flexibility allows SCBI_MapReduce to be included as a coding part of new algorithms as well as a wrapper for already existing functions, scripts, or compiled software. Promising and efficient results were obtained when it was used within several of our algorithms (e.g., SeqTrimNext [http://www.scbi.uma.es/seqtrimnext] and Full-LengtherNext [http://www.scbi.uma.es/fulllengthernext], both specifically designed to manage sequences obtained from next-generation sequencing) as well as a wrapper for Blast+ in the SCBI_Distributed_Blast gem (Figure 4). Therefore, SCBI_MapReduce seems to be sufficiently powerful for most parallelisation and distribution needs concerning new algorithms, legacy software, and existing scripts in most bioinformatics contexts.

It has been described that GPU-based and some MPI-based parallelisations lack good scalability when dealing with rapidly growing sequence data, while MapReduce seems to perform better in those settings [9]. That could explain why the SCBI_MapReduce skeleton shows a speed-up of 31-fold for 32 cores and 59-fold for 64 cores even with sequence data (Figure 2(a)). This performance is better than the one displayed by the R package pR, where 32 cores provide speed-ups of 20–27-fold depending on the process [25]. Several design reasons can also be invoked to explain such efficiency [34]: (i) disk I/O operations are reduced to a minimum (data are read only at the beginning and results are saved only at the end); (ii) there is no asymmetry impact (Figure 1(b)); (iii) the manager overhead is limited when using more than 2 cores and chunks of sequences (Tables 2 and 3); and (iv) longer tasks increase the efficiency, because the manager is on standby most of the time while waiting for the workers to finish, which also avoids relaunching internal or external programs for brief executions.



Figure 4: Behaviour of SCBI_Distributed_Blast. (a) Blast+ speed-up in chunks of 100X on two different clusters, using different network protocols and queue systems. The theoretical speed-up corresponds to the one that equals the number of cores used. Speed-up was calculated by dividing the time spent using 1 core by the time of the corresponding number of cores. The following execution times were used: for 50,000 reads from AC SRR069473 on the "x86" cluster, 25.8 h (92,880 s, 1 core), 27,600 s (2 cores), 13,980 s (4 cores), 6960 s (8 cores), 3540 s (16 cores), and 1740 s (32 cores); for the 261,304 reads of AC SRR069473 on the "x86 upgraded" cluster, 88.6 h (318,960 s, 1 core), 115,161 s (2 cores), 56,385 s (4 cores), 28,180 s (8 cores), 14,123 s (16 cores), and 7068 s (32 cores). (b) Threaded Blast+ and SCBI_Distributed_Blast use the 8 cores available in the same computer differently. Blast+ was executed with the -num_threads 8 option, and SCBI_Distributed_Blast was executed with the -w 8 option, using chunks of 100X by default, on the "x86" cluster.


SCBI_MapReduce includes implementations of error handling and job checkpointing methods. It has been demonstrated (Figure 3) that data chunks from a broken worker, or even a job shutdown, can be relaunched in a running worker. This provides robustness and fault-tolerance, guarantees safe long-lasting executions, and preserves computational resources, since it avoids processing objects that have already been processed. Such properties serve to save time and make the job execution traceable at any time. Therefore, SCBI_MapReduce represents another step in the direction of programming environments built on the task-farm skeleton concept.

5.2. Distribution in Chunks is More Efficient. SCBI_MapReduce was intended to deal with problems that involve processing a huge number of small sequences (as in high-throughput sequencing or RNA-Seq experiments). Results showed that splitting datasets into small chunks yields a better speed-up than sending sequences one by one or in big chunks (Figure 2(a)). An analogous idea has already been reported for mpiBLAST [35], but for database segmentation instead of sequence grouping. Therefore, grouping reads in chunks appears to be another way to provide speed-up, always taking into account that big chunks could be detrimental when the number of chunks produced is not divisible by the number of workers used (see 2000X in Figure 2).

Since chunk sizes of 100X and 250X perform similarly (Figure 2), a chunk size of 100X can suit well as a default value, even if the optimal chunk size has not been assessed taking into account the number of cores and the number of objects to split in chunks. It can then be hypothesised that the use of chunks reduces the manager overload (Tables 2 and 3). Figure 4(a) shows that the speed-up can achieve superscalar behaviour using chunks combined with distribution, although this is dependent on the task performed by the worker (Blast+ in this instance) and not on the capabilities of SCBI_MapReduce. In conclusion, the use of chunks provides an improved overall performance.

5.3. The Added Value of Compression and Encryption Capability. In distributed grids or the cloud, encrypted connections cannot always be established for data privacy, and data compression can accelerate any transfer, particularly over low-bandwidth connections. The overhead introduced by encryption and compression is particularly evident when data are processed one by one (Figure 2(b), open triangles), since then the use of more and more cores did not significantly speed up the process. But the compression and encryption overhead becomes acceptable when the dataset is split into chunks (compare slopes in Figure 2 and execution times in Tables 1 and 3). Encryption per chunk should be enabled only when untrusted networks are involved in distributed jobs. Compression per chunk could be envisaged when using low-bandwidth networks (e.g., in some grids [2]), provided that transferring the compressed data is faster than the time spent compressing them. As a result, SCBI_MapReduce can be used on grids with confidential data even when encrypted connections cannot be established.


class MyWorkerManager < WorkManager

  def self.init_work_manager
    # open input fastq file and results as output
    @@fastq_file = FastqFile.new(fastq_file_path)
    @@results = FastqFile.new('results.fastq', 'w+')
  end

  def self.end_work_manager
    # close files on finish
    @@fastq_file.close
    @@results.close
  end

  # this method is called every time a worker
  # needs a new work
  def next_work
    # get next sequence or nil from file
    name, fasta, qual, comments = @@fastq_file.next_seq
    if !name.nil?
      return name, fasta, qual, comments
    else
      return nil
    end
  end

  def work_received(results)
    # write results to disk
    results.each do |name, fasta, qual, comments|
      @@results.write_seq(name, fasta, qual, comments)
    end
  end

end

Algorithm 1


5.4. SCBI_MapReduce is Ready for Grid Computing. It has been shown (Figure 4(a)) that SCBI_MapReduce, and therefore SCBI_Distributed_Blast, can work with homogeneous clusters (the "x86" and "x86 upgraded" clusters) consisting of different types of CPUs. It has also been tested that SCBI_MapReduce is able to deal with a heterogeneous grid consisting of one x86 computer using OSX, one x86 computer using Linux, 24 cores of the "x86" cluster, and 32 cores of the "SuperDome" (results not shown). Hence, SCBI_MapReduce can cope with different queue systems (PBS, Slurm) and networks, and it can distribute over symmetric multiprocessing machines ("SuperDome"), clusters (Figure 4(a)), and heterogeneous Unix-based grids (above).

Other features that enable SCBI_MapReduce, at least theoretically [2, 36, 37], to be used in nearly any type of computer grid are (i) the above-described encryption and compression capabilities; (ii) no need for administrator privileges; (iii) on-demand-only execution; and (iv) minimal hard disk requirements, since it takes advantage of shared storage only when necessary, making it highly portable to other computer systems. Testing SCBI_MapReduce on "cloud computing" services remains a pending task; however, it is expected to work and to provide benefits related to cost-effectiveness.

5.5. SCBI_Distributed_Blast is a Boosted Version of Blast+. Previous improvements of Blast were performed by skilled programmers and provide parallelised versions tightly bound to one released version. The development of SCBI_Distributed_Blast, based on the SCBI_MapReduce task-farm skeleton, removes version bonding and coding challenges, since it can boost in a core-dependent way (Figure 4(a)) any Blast+ release installed on the scientist's computer, not only the version tested in this study, enabling updates of the Blast+ release while maintaining the same SCBI_Distributed_Blast code.

In contrast to other Blast parallelisations, including mpiBLAST and MapReduce Blast [9], SCBI_Distributed_Blast tasks are seeded with sequence chunks while the database is maintained intact. This is because it does not need to copy the sequence database on each worker, since it takes advantage of shared storage. This is also the reason why it provides exactly the same results as the original Blast+ in less time (Figure 4). Another MapReduce approach for Blast, CloudBlast [9], has very poor scalability, since it is optimised for short-read mapping and needs to copy the database on each node, while the speed-up observed with SCBI_Distributed_Blast was linear and appeared to be superscalar in the tested clusters (Figure 4(a)). However, superscalar behaviour was exclusively observed for 2 cores (speed-ups of 3.3 in the "x86" cluster and 2.8 in the "x86 upgraded" cluster), since, taking two cores as reference, the speed-up slope was close to the theoretical one (4.0, 8.1, 16.3, 32.6, 65.0, and 127.6).

Comparing the speed-up of our "boosted" Blast+ with a threaded execution of Blast+ (Figure 4(b)), it can be seen that SCBI_Distributed_Blast can take advantage of all the computing capabilities and scales linearly, in contrast to native Blast+ (Figure 4(b)) and the older NCBI-Blast [30]. In conclusion, SCBI_Distributed_Blast illustrates the ease of use and performance of SCBI_MapReduce, opening the way for code modifications that can easily produce scalable, balanced, fault-tolerant, and distributed versions of other Blast-related programs such as PSI-Blast, WU-Blast/AB-Blast, NCBI-Blast, and the like. Furthermore, Blast-based genome annotation processes can take advantage of SCBI_Distributed_Blast with minor changes in the code.

6. Conclusions

This work does not aim at advancing parallelisation technology; rather, it aims to bring the advantages of distribution to bioinformatic tools that are useful, for example, in genomics, giving attractive speed-ups. In a context of continuous development of parallel software, SCBI_MapReduce provides a task-farm skeleton for parallelisation/distribution with features such as fault-tolerance, encryption and compression on-the-fly, data distribution in chunks, grid-readiness, and flexibility for the integration of new and existing code without being a skilled programmer. In fact, SCBI_MapReduce was designed for researchers with a biological background who find MPI, Hadoop, or Erlang solutions for parallelisation/distribution too complicated. That is why Ruby was selected, since it has a shallow learning curve, even for biologists, and easily manages the programming necessities. In the context of genomic studies, one significant advantage is that SCBI_MapReduce enables the reuse of existing sequential code, with little or no code changes, in a commodity parallel/distributed computing environment; SCBI_Distributed_Blast illustrates this.

Results indicate that SCBI_MapReduce scales well, is fault-tolerant, can be used on multicore workstations, clusters, and heterogeneous grids, even where secured connections cannot be established, can use several interconnection networks, and does not need special hardware or virtual machine support. It is also highly portable and should diminish disk space costs in "cloud computing". In conclusion, SCBI_MapReduce, and hence SCBI_Distributed_Blast, are ready, among other uses, for intensive genome analyses and annotations.

class MyWorker < Worker

  # process each obj in received objs
  def process_object(objs)
    # find barcodes
    find_mids(objs)
    return objs
  end

end

Algorithm 2

# get custom worker file path
custom_worker_file = 'my_worker.rb'

# init worker manager
MyWorkerManager.init_work_manager

# use any available ip and first empty port
ip = '0.0.0.0'
port = 0
workers = 4

# launch Manager and start it
manager = Manager.new(ip, port, workers,
                      MyWorkerManager, custom_worker_file)
manager.start_server

Algorithm 3


Appendix

Customisation of the Three Files That Govern SCBI_MapReduce

SCBI_MapReduce consists of a number of files but, in order to customise it for particular needs, users only need to modify the I/O data management methods in the manager file (my_worker_manager.rb), the computation to be distributed in the worker file (my_worker.rb), and the main file (main.rb).

The methods to redefine in my_worker_manager.rb are (i) next_work, which provides new data for workers, or nil if there is no more data available (in the following code, it simply reads one sequence at a time from a fastq file on disk); (ii) self.init_work_manager, which opens the I/O data files; (iii) self.end_work_manager, which closes the files when finished; and (iv) work_received, which writes results to disk as they are generated. The relevant code of my_worker_manager.rb is (see Algorithm 1):

Customisation of the worker file (my_worker.rb) includes the redefinition of the process_object method, which contains the function call to find_mids. The function find_mids can be defined by the user in his/her own source code, a compiled algorithm, or existing code. The relevant code of my_worker.rb is (see Algorithm 2):

The main program file (main.rb) has to be invoked to launch the distributed job. It can be used as is from the command line as a common Ruby script (ruby main.rb) or as part of more complex code. Skilled users can also modify its code and/or name to enable special features or even receive user parameters, which is the case when using SCBI_MapReduce for the distribution of an internal part of an algorithm. The number of workers is defined here: at least one for the manager and one for one worker. The relevant code of main.rb is (see Algorithm 3):

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa and Rocío Bautista for the helpful discussions and the computer resources of the Plataforma Andaluza de Bioinformática of the University of Málaga, Spain. This study was supported by grants from the Spanish MICINN (BIO2009-07490) and Junta de Andalucía (P10-CVI-6075), as well as institutional funding to the research group BIO-114.

References

[1] C. Huttenhower and O. Hofmann, "A quick guide to large-scale genomic data mining," PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.
[2] M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.
[3] D. Patterson, "The trouble with multi-core," IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.
[4] C. Camacho, G. Coulouris, V. Avagyan et al., "BLAST+: architecture and applications," BMC Bioinformatics, vol. 10, article 421, 2009.
[5] S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, "Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment," Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.
[6] H. Lin, X. Ma, W. Feng, and N. F. Samatova, "Coordinating computation and I/O in massively parallel sequence search," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.
[7] T. Nguyen, W. Shi, and D. Ruden, "CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping," BMC Research Notes, vol. 4, article 171, 2011.
[8] T. Rognes, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, article 221, 2011.
[9] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for long sequence alignment," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.
[10] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, vol. 10, no. 11, article R134, 2009.
[11] M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, "Hierarchical parallelization of gene differential association analysis," BMC Bioinformatics, vol. 12, article 374, 2011.
[12] M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, "Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications," in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.
[13] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.
[14] M. K. Chen and K. Olukotun, "The Jrpm system for dynamically parallelizing Java programs," in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.
[15] P. Haller and M. Odersky, "Scala Actors: unifying thread-based and event-based programming," Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.
[16] J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.
[17] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.
[18] L. Dagum and R. Menon, "OpenMP: an industry-standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.
[19] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, "Survey of MapReduce frame operation in bioinformatics," Briefings in Bioinformatics, In press.
[20] R. C. Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics," BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.
[21] J. Lin, "MapReduce is good enough?" Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency Computation: Practice and Experience, vol. 17, no. 2–4, pp. 323–356, 2005.
[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.
[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.
[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.
[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.
[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics, In press.
[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.
[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.
[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.
[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.
[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.
[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.
[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.
[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.
[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.


Page 4: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

4 Computational Biology Journal

middot middot middot

Data

Data Data Data Data Data

MAP

Calc Calc Calc Calc Calc

Reduce

Results

Res Res Res Res Res

(a)

DataData

Data

Res Res

ManagerResults

Worker

Worker Worker Worker

Worker

middot middot middot

(b)

Figure 1 Comparison of parallelisation flowgrams for a single ldquojobrdquo (a) The classic MapReduce view in which the input data of a job aresplit into smaller data chunks by a ldquomapperrdquo process and executed by separated parallel ldquotasksrdquo once all tasks have been finished the resultsare combined by a ldquoreducerdquo process (b) SCBI MapReduce task-farm flowgram in which a job is handled by a single ldquomanagerrdquo which iscontrolling the start and end of each distributed ldquotaskrdquo executed by every ldquoworkerrdquo

appendix) This requires from the user some knowledge forcalling external code and IO operations

Three different templates (one for string capitalisation asecond for the simulated calculations on integers as shown inTable 1 and a third for calculations as shown in Table 1 usingthe datasets of artificial and real-world sequences whichfinds and removes barcodes from sequences) are providedas a customisable startup point for any project using thecommand scbi mapreduce my project template name

35 SCBI Distributed Blast Gem Basic Local AlignmentSearch Tool (Blast) [4] is the tool most frequently usedfor calculating sequence similarity usually being the com-putationally intensive part of most genomic analyses Itspopularity is based on its heuristics that enable it to performthe search faster This drives us to choose it as a proof-of-concept for distribution of legacy software The binaryrelease v 2224 of Blast+ [4] was elected to demonstratethat SCBI MapReduce can be used as a distributionwrapperfor legacy algorithms Since a main drawback of Blastparallelisations is that the entire sequence database mustbe copied on each node SCBI Distributed Blast takesadvantage of shared storage to avoid the waste of disk spaceand the potential scalability impairment for large databases

To use SCBI Distributed Blast it is only needed to wrapany generic Blast or Blast+ command as follows

scbi distributed blast -w 81015840 any blast command 1015840

which distributes Blast between 8 coresworkers (minus119908option) with chunks of 100 sequences (default value of the minus119892option) An example can be as follows

scbi distributed blast -w 81015840 blastn -task blastn-short -db

myDBfna -query inputfilefna

-out outputfilefna 1015840where blastn is executed using 8 cores withinputfilefna as a Fasta query file and myDBfnaas a customised database outputfilefna being the nameof the output file

4 Results

41 Scalability Studies SCBI MapReduce performance wastested in first instance using a dataset of 1000 objects (inte-gers) as shown in Table 1 column ldquoInteger datasetrdquo Jobs were

Computational Biology Journal 5

Table 1 SCBI MapReduce performance tests using three different datasets on the ldquox86 upgradedrdquo cluster Execution times are expressed inseconds The number immediately before X indicates the number of reads grouped in a chunk for every parallel task

Cores Integer dataseta Real-world sequencesb Artificial sequencesc

1X 100X 250X 2000X 1X 100X 250X 2000X1d 1264 13608 13849 13424 19328 23264 22124 24185 333932 635 8824 7903 7584 10251 11462 11554 11776 153024 322 4363 4507 4167 5890 6776 6507 5881 75038 164 2182 2194 2231 3132 3403 3337 3371 487416 81 1097 1098 1121 1633 1901 1797 1817 260232 41 568 549 569 899 921 888 915 133964 21 293 282 295 532 506 449 466 755128 12 173 153 179 352 268 233 245 464aIntegers were subjected to futile intensive calculations that took at least 1 s on every objectbThe dataset of real-world sequences consisted of 261 304 sequence reads (mean 276 nt mode 263 nt coefficient of variation 11) obtained from a 454FLXsequencer downloaded from the SRA database (AC SRR069473)cThe dataset of artificial sequences consisted of 425 438 sequences obtained using the software ART with a 2X coverage simulating a 454FLX sequencing fromthe Danio rerio chromosome 1 (AC NC 0071125)dUsing one core is equivalent to a linear job without any parallelisation or distribution it acts as control reference

Table 2 Percent of time spent by the manager on every sequence-based job similary as detailed in Table 1

Cores Real-world sequences Artificial sequences1X 100X 250X 2000X 1X 100X 250X 2000X

2 092 048 049 041 084 044 043 0364 071 041 045 035 068 037 039 0328 058 040 041 034 055 035 034 02916 052 039 039 034 058 041 037 02832 052 050 051 037 047 045 044 03364 047 056 055 040 037 045 048 034128 061 057 062 037 054 048 049 040

launched using 1 (as control reference) to 128 cores on theldquox86 upgradedrdquo cluster Since the speed-up achieved is closethemaximal theoretical one (Figure 2(a) compare dotted lineand solid lines) it can be suggested that SCBI MapReducescales well with simple objects such as integers In fact it isable to manage up to 18000 tasks of 1 kB each per second witha single core manager on the ldquox86rdquo cluster (results are notshown)

SCBI MapReduce can find a use beyond integers asis demonstrated by the testing of two sequence datasetsin which each object is a sequence in Fasta format Real-world sequences correspond to a true 454FLX sequencingand the artificial sequences correspond to a simulation of454FLX sequencing using ART (httpwwwniehsnihgovresearchresourcessoftwarebiostatisticsart) Barcodeswerelocalised on both sequence datasets varying sequence chunksizes Table 1 shows that the a priorimost affordable paralleli-sation of sequence-by-sequence (1X chunk) did not providethe best speed-up (Figure 2(a) dark and open triangles)This could be explained in part by the fact that the manageris spending a little more time building a lot of small datachunks (Table 2) Higher chunks (100X and 250X) providedshorter similar execution times (Figure 2(a) dark and opensquares and circles) Regarding execution time (Table 1) andspeed-up (Figure 2(a) dark and open diamonds) the hugest

chunk (2000X) is not a good election Since the manager isnot taking more time during this job (Table 2) the reasonfor this speed-up impairment is that using higher chunksthe computational resources are not optimally used duringthe last calculation cycle where most workers are idle andthe manager is waiting for a few workers to finish a long taskThis issue is also observed with other chunks but it becomesapparent only with chunks of 2000X sequences since workersspend more time on every data chunk In conclusionSCBI MapReduce was scaling almost linearly and the opti-mal number of reads in a chunk is dependent on the numberof workers and chunks used for the parallelisation When thenumber of sequences (objects) to process is unknown chunksof 2000X and 1X provided the lowest speed-up while smallchunks ranging from 100X to 250X sequences are preferable

42 Compression and Encryption Overhead is AcceptableOne of themain features of SCBI MapReduce is the capabil-ity of data compression and encryption on-the-fly Since thesecapabilities rely on the manager data security and privacywill be maintained by an increase in execution time Thisoverhead was tested using real-world jobs of Table 1 launchedon the ldquox86 upgradedrdquo cluster with the encryption and com-pression capabilities being enabled (Table 3) Distributionof 1X chunks was dramatically affected (Figure 2(b) open

6 Computational Biology Journal

Table 3 Analysis of compression and encryption of the same real-world sequence jobs in Table 1 Both the execution times (in seconds) andthe percent of this time used by the manager are provided

Cores Execution timea (s) Manager time ()1X 100X 250X 2000X 1X 100X 250X 2000X

2b 9482 8178 7270 10279 473 056 058 0444 4619 4307 3814 5234 276 047 049 0398 2359 2156 2165 3145 176 043 043 03516 1274 1085 1142 1692 154 042 040 03332 913 553 571 905 240 051 052 03764 821 282 294 540 354 057 055 043128 709 163 173 346 387 062 061 039aCompression was performed with ZLib and encrypted with AES-256 any other method installed on computers can be usedbThere is no need of compression or encryption using one single core

0

38

75

113

150

0 38 75 113 150

Spee

dup

Number of cores

Theoretical Integer dataset1X100X250X2000X

1X100X250X2000X

Real-world seqs

Artificial seqs

(a)

0

38

75

113

150

0 38 75 113 150

Spee

dup

Number of cores

Real-world seqs+ compression+ encryption

Theoretical1X100X250X2000X

(b)

Figure 2 Speed-up achieved by SCBI MapReduce implementations (a) Speed-up based on Table 1 data was calculated dividing the timetaken with 1 core by the time taken by each number of cores (b) Performance using compressed and encrypted real-world sequences basedon execution times in Table 3 The speed-up was calculated dividing the time that real-world sequences took with 1 core in Table 1 by theircorresponding times in Table 3 In both plots theoretical values correspond to a speed-up that equals the number of cores used

triangles) and this may be due to the important increase ofthe time spent by the manager on every job (from 091ndash047 in Table 2 to 475ndash154 in Table 3) But overheadbecame bearable when using any other chunk size since(i) the execution time in Table 3 for 100Xndash2000X chunksis close to the presented in Table 1 (ii) the speed-up canrecover the previously observed values (Figure 2(b)) and(iii) the manager spends nearly the same percent of timeThese results suggest that overhead introduced by encryptionand compression can be alleviated using chunks providing

a significant speed-up and that distribution of sequence-by-sequence (1X chunks) was the worst available approach Inconclusion compression and encryption capabilities can beregularly used for distributed calculations when encryptedconnections are not available but desirable without thedramatic increase of execution time

43 Fault-Tolerance Testing The fault-tolerance of SCBIMapReduce was tested using the complete dataset ofreal-world sequences as input and the same analysis was

Computational Biology Journal 7

0

5

10

15

20

25

30

0 65 130 195 260 325

Obj

ect c

ount

s

Time (s)

NormalJob shutdownWorker failure

16 workersfailing

16 workers failing

16 workers failing

Whole job stopped

times104

Figure 3 Fault tolerance testing using the real-world sequences in100X chunks on the ldquox86 upgradedrdquo cluster using 128 cores TheldquoNormalrdquo execution occurred without errors The ldquoJob shutdownrdquoincluded a complete job shutdown (indicated by a solid arrow) andthen amanual restartThe ldquoWorker failurerdquo execution included threeshutdowns (indicated by dashed arrows) of 16 workers each duringthe job

performed on Table 1 on the ldquox86 upgradedrdquo cluster using 128cores Sequences were used in 100X chunks and the job wasexecuted three times The first execution occurred withouterrors and took 184 s in finishing the analysis of 261 304reads (objects) Figure 3 shows that this ldquoNormalrdquo executionpresents the expected constant slope The second executionwas to test the unexpected ldquoJob shutdownrdquo that was simulatedwith a manual interruption of the whole job with the jobbeing then manually relaunched It can be seen in Figure 3that the interruption adds a lag time to the job increasingto 239 s which is the time required to finish the analysis ofall sequences The sequence counts were the same than inldquoNormalrdquo indicating that no sequence and no chunk werereanalysed twice Finally the test of recovery after a ldquoWorkerfailurerdquo was performed stopping 16 workers at three differenttime points of the job execution (a total of 48 differentworkers were affected) In this case the manager handlesautomatically the reanalysis of the unfinished chunks and thejob took 300 s Again no sequence was saved twice in theoutput file As expected the output result of the ldquoNormalrdquoexecution and the interrupted executions was exactly thesame (results not shown) In conclusion the fault-toleranceimplementation of SCBI MapReduce is able to handle exe-cution exceptions and broken workers and stopped jobs canbe restarted without the reanalysis of finished tasks

44 Distributed Blast+ Using Chunks of Sequences Thegeneric blastn command blastn -task blastn-short -db myDBfna -query inputfilefna -out outputfilefnawas launched withreal-world sequences as input and a customised database of240MB containing complete bacterial genomes The speed-up achieved by SCBI Distributed Blast compared to thenondistributed execution (using 1 single core) is presented inFigure 4(a) Outputs of all executions were exactly the same

in all cases and also they were identical to the output ofthe original binary Blast+ (results not shown) MoreoverSCBI Distributed Blast was able to cope without modi-fication with Blast+ versions 2223 to 2227

Blast+ is described to have threading capabilities [4]Therefore 10 000 reads of AC SRR069473 were launchedwith native blastn and with SCBI Distributed Blastboth configured to use the 8 cores of a single blade at theldquox86rdquo cluster Figure 4(b) shows that Blast+ did not appearto efficiently parallelise since it started using only 5 coresand rapidly decreased to only one single core taking 224minto finish the task Similar behaviour was confirmed in othercomputers indicating that it seems to be an inherent featureof Blast+ releases In contrast SCBI Distributed Blastused 8 cores all the time and finished in 24min (that isthe reason why 0 CPU are ldquousedrdquo since then) Thereforethe speed-up introduced by SCBI Distributed Blast is 94demonstrating that it performs much better in exploitingmulticore capabilities than the threaded implementationincluded in Blast+

5 Discussion

51 SCBI MapReduce is an Efficient Task-Farm SkeletonSCBI MapReduce customisation is simpler than thatcustomisation of other frameworks such as FastFlow [27]and it does not need compilation of the final code makingit portable to most computers ldquoas isrdquo Its flexibility allowsto include SCBI MapReduce as a coding part of any newalgorithms as well as a wrapper for already existing functionsscripts or compiled software Promising and efficient resultswere provided when used within several of our algorithms(eg SeqTrimNext [httpwwwscbiumaesseqtrimnext]and Full-LengtherNext [httpwwwscbiumaesfulllength-ernext] both are specifically designed to manage sequencesobtained from next-generation sequencing) as well asa wrapper for Blast+ in the SCBI Distributed Blastgem (Figure 4) Therefore SCBI MapReduce seems to besufficiently powerful for most of parallelisation and distri-bution needs concerning new algorithms legacy softwareand existing scripts in most bioinformatics contexts

It has been described that GPU-based and some MPI-based parallelisations lack of good scalability when dealingwith rapidly growing sequence data whileMapReduce seemsto perform better in those settings [9] That could explainwhy SCBI MapReduce skeleton shows a speed-up of 31-foldfor 32 cores and 59-fold for 64 cores even with sequencedata (Figure 2(a)) This performance is better than the onedisplayed by the R package pR where 32 cores providespeedups of 20ndash27-fold depending on the process [25]Several design reasons can also be invoked to explain suchan efficiency [34] (i) disk IO operations are reduced tominimum (data are read only at the beginning and resultsare saved only at the end) (ii) absence of asymmetry impact(Figure 1(b)) (iii) the manager overhead is limited whenusing more than 2 cores and chunks of sequences (Tables 2and 3) and (iv) longer tasks increased the efficiency becausethe manager is on standby most of the time while waiting

8 Computational Biology Journal

0

20

40

60

0 10 20 30 40

Spee

d-up

Number of cores

Theoreticalx86x86 upgraded

(a)

0

2

4

6

8

10

0 40 80 120 160 200

Num

ber o

f cor

es

Time (min)

Blast+SCBI distributed blast

(b)

Figure 4 Behaviour of SCBI Distributed Blast (a) Blast+ speed-up in chunks of 100X in two different clusters using both differentnetwork protocols and queue systems Theoretical speed-up corresponds to the one that equals the number of cores used Speed-up wascalculated dividing the time spent using 1 core by the time of the corresponding number of cores The following execution times were usedfor 50 000 reads fromAC SRR069473 in the ldquox86rdquo cluster 258 h (92 880 s 1 core) 27 600 s (2 cores) 13 980 s (4 cores) 6960 s (8 cores) 3540 s(16 cores) and 1740 s (32 cores) for the 261 304 reads of AC SRR069473 in the ldquox86 upgradedrdquo cluster 886 h (318 960 s 1 core) 115 161 s (2cores) 56 385 s (4 cores) 28 180 s (8 cores) 14 123 s (16 cores) and 7068 s (32 cores) (b)Threaded Blast+ and SCBI Distributed Blast usedifferently the 8 cores available in the same computer Blast+ was executed with the -num threads 8 option and SCBI Distributed Blastwas executed with the minus119908 8 option using chunks of 100X by default in the ldquox86rdquo cluster

for the workers to finish avoiding relaunching of internal orexternal programs for brief executions

SCBI MapReduce includes implementation of errorhandling and job checkpointingmethods It has been demon-strated (Figure 3) that data chunks from a broken worker oreven a job shutdown can be relaunched in a running workerThis provides robustness and fault-tolerance guarantees safelong-lasting executions and provides for preserving com-putational resources since it avoids processing objects thathave already been processed Such properties will serve tosave time and make the job execution traceable at any timeTherefore SCBI MapReduce represents another step in thedirection of programming environments with the task-farmskeleton concept

5.2. Distribution in Chunks is More Efficient. SCBI_MapReduce was intended to deal with problems that involve processing a huge number of small sequences (as in high-throughput sequencing or RNA-Seq experiments). Results showed that splitting datasets into small chunks yields a better speed-up than sending sequences one by one or in big chunks (Figure 2(a)). An analogous idea has already been reported in mpiBLAST [35], but for database segmentation instead of sequence grouping. Therefore, grouping reads in chunks appears to be another way to provide speed-up, always taking into account that big chunks could be detrimental when the number of chunks produced is not divisible by the number of workers used (see 2000X in Figure 2).

Since chunk sizes of 100X and 250X perform similarly (Figure 2), a chunk size of 100X can suit well as the default value, even if the optimal chunk size has not been assessed taking into account the number of cores and the number of objects to split in chunks. It can then be hypothesised that the use of chunks may reduce the manager surcharge (Tables 2 and 3). Figure 4(a) shows that speed-up can achieve superscalar behaviour when chunks are combined with distribution, although this is dependent on the task performed by the worker (Blast+ in this instance) and not on the capabilities of SCBI_MapReduce. In conclusion, the use of chunks provides an improved overall performance.
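As a rough illustration of this trade-off, the following hypothetical helper (not part of SCBI_MapReduce) keeps the 100X default when there are enough sequences for several cycles per worker, and otherwise shrinks the chunk so that no worker idles during the last cycle:

def suggest_chunk_size(n_objects, n_workers, default = 100)
  # enough sequences for at least two default-sized cycles per worker
  return default if n_objects >= default * n_workers * 2
  # otherwise spread the objects evenly, about one chunk per worker
  [n_objects / n_workers, 1].max
end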

5.3. The Added Value of Compression and Encryption Capability. In distributed grids or the cloud, encrypted connections cannot always be established for data privacy, and data compression can accelerate any transfer, particularly over low-bandwidth connections. The overhead introduced by encryption and compression is particularly evident when data are processed one-by-one (Figure 2(b), open triangles), since the use of more and more cores did not significantly speed up the process. But the compression and encryption overhead becomes acceptable when the dataset is split into chunks (compare slopes in Figure 2 and execution times in Tables 1 and 3). Encryption per chunk should be enabled only when untrusted networks are involved in distributed jobs. Compression per chunk could be envisaged when using low-bandwidth networks (e.g., in some grids [2]), provided that transferring compressed data is faster than the time spent compressing them. As a result, SCBI_MapReduce can be used on grids with confidential data when encrypted connections cannot be established.
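For illustration, per-chunk packing along these lines can be sketched with Ruby's standard Zlib and OpenSSL libraries. The text states only that ZLib compression and AES-256 encryption were used (Table 3), so the layout below is an assumption, not the gem's actual code:

require 'zlib'
require 'openssl'

# Pack one chunk for transfer: serialise, compress with Zlib, then
# encrypt with AES-256-CBC (key and iv are agreed upon elsewhere).
def pack_chunk(chunk, key, iv)
  compressed = Zlib::Deflate.deflate(Marshal.dump(chunk))
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  cipher.iv = iv
  cipher.update(compressed) + cipher.final
end

# Reverse the packing on the receiving side.
def unpack_chunk(data, key, iv)
  decipher = OpenSSL::Cipher.new('aes-256-cbc')
  decipher.decrypt
  decipher.key = key
  decipher.iv = iv
  Marshal.load(Zlib::Inflate.inflate(decipher.update(data) + decipher.final))
end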

class MyWorkerManager < WorkManager
  def self.init_work_manager
    # open input fastq file and results as output
    # (fastq_file_path is defined elsewhere, e.g., in main.rb)
    @@fastq_file = FastqFile.new(fastq_file_path)
    @@results = FastqFile.new('results.fastq', 'w+')
  end

  def self.end_work_manager
    # close files on finish
    @@fastq_file.close
    @@results.close
  end

  # this method is called every time a worker
  # needs a new work
  def next_work
    # get next sequence or nil from file
    name, fasta, qual, comments = @@fastq_file.next_seq
    if !name.nil?
      return name, fasta, qual, comments
    else
      return nil
    end
  end

  def work_received(results)
    # write results to disk
    results.each do |name, fasta, qual, comments|
      @@results.write_seq(name, fasta, qual, comments)
    end
  end
end

Algorithm 1


5.4. SCBI_MapReduce is Ready for Grid Computing. It has been shown (Figure 4(a)) that SCBI_MapReduce, and therefore SCBI_Distributed_Blast, can work with homogeneous clusters (the "x86" and "x86 upgraded" clusters) consisting of different types of CPUs. It has also been tested that SCBI_MapReduce is able to deal with a heterogeneous grid consisting of one x86 computer running OSX, one x86 computer running Linux, 24 cores of the "x86" cluster, and 32 cores of the "Superdome" (results not shown). Hence, SCBI_MapReduce can cope with different queue systems (PBS, Slurm) and networks, and can distribute in symmetric multiprocessing machines ("Superdome"), clusters (Figure 4(a)), and heterogeneous Unix-based grids (above).

Other features that enable SCBI_MapReduce, at least theoretically [2, 36, 37], to be used in nearly any type of computer grid are (i) the above-described encryption and compression capabilities, (ii) no need for administrator privileges, (iii) the fact that it runs on-demand only, and (iv) minimal hard disk requirements, since it takes advantage of shared storage only when necessary, making it highly portable to other computer systems. Testing SCBI_MapReduce in "cloud computing" services remains a pending task; however, it is expected to work and to provide benefits related to cost-effectiveness.

5.5. SCBI_Distributed_Blast is a Boosted Version of Blast+. Previous improvements of Blast were performed by skilled programmers and provided parallelised versions tightly bonded to one released version. The development of SCBI_Distributed_Blast, based on the SCBI_MapReduce task-farm skeleton, removes version bonding and coding challenges, since it can boost in a core-dependent way (Figure 4(a)) any Blast+ release installed on the scientist's computer, not only the version tested in this study, enabling updates of the Blast+ release while maintaining the same SCBI_Distributed_Blast code.

In contrast to other Blast parallelisations, including mpiBLAST and MapReduce Blast [9], SCBI_Distributed_Blast tasks are seeded with sequence chunks while the database is kept intact: there is no need to copy the sequence database on each worker, since it takes advantage of shared storage. This is also the reason why it provides exactly the same results as the original Blast+ in less time (Figure 4). Another MapReduce approach for Blast, CloudBlast [9], has very poor scalability, since it is optimised for short-read mapping and needs to copy the database on each node, while the speed-up observed with SCBI_Distributed_Blast was linear and appeared to be superscalar in the tested clusters (Figure 4(a)). However, superscalar behaviour was exclusively observed for 2 cores (speed-ups of 3.3 in the "x86" cluster and 2.8 in the "x86 upgraded" cluster), since, taking two cores as reference, the speed-up slope was close to the theoretical one (4.0, 8.1, 16.3, 32.6, 65.0, and 127.6).
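A worker along these lines can be pictured as follows: the chunk is written to a temporary FASTA file and blastn is run against the shared database. The database name and output format here (myDB.fna, -outfmt 6) are illustrative assumptions, not the actual SCBI_Distributed_Blast internals:

require 'tempfile'

# Run one chunk of sequences through blastn against the shared database;
# seqs is assumed to be an array of [name, fasta, ...] tuples.
def process_object(seqs)
  query = Tempfile.new(['chunk', '.fna'])
  seqs.each { |name, fasta| query.puts(">#{name}"); query.puts(fasta) }
  query.flush
  output = `blastn -task blastn-short -db myDB.fna -query #{query.path} -outfmt 6`
  query.close!
  output
end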

Comparing the speed-up of our "boosted" Blast+ with a threaded execution of Blast+ (Figure 4(b)), it can be seen that SCBI_Distributed_Blast takes advantage of all computing capabilities and scales linearly, in contrast to native Blast+ (Figure 4(b)) and the older NCBI-Blast [30]. In conclusion, SCBI_Distributed_Blast illustrates the ease of use and performance of SCBI_MapReduce, opening the way for code modifications that can easily produce scalable, balanced, fault-tolerant, and distributed versions of other Blast-related programs such as PSI-Blast, WU-Blast/AB-Blast, NCBI-Blast, and the like. Furthermore, Blast-based genome annotation processes can take advantage of SCBI_Distributed_Blast with minor changes in the code.

6. Conclusions

This work does not aim at advancing parallelisation technology; it aims to bring the advantages of distribution to bioinformatic tools that are useful, for example, for genomics, giving attractive speed-ups. In a context of continuous development of parallel software, SCBI_MapReduce provides a task-farm skeleton for parallelisation/distribution with features such as fault-tolerance, on-the-fly encryption and compression, data distribution in chunks, grid-readiness, and flexibility for the integration of new and existing code without requiring a skilled programmer. In fact, SCBI_MapReduce was designed for researchers with a biological background who find MPI, Hadoop, or Erlang solutions for parallelisation/distribution too complicated. That is why Ruby was selected: it has a shallow learning curve, even for biologists, and easily manages the programming necessities. In the context of genomic studies, one significant advantage is that SCBI_MapReduce enables the reuse of existing sequential code in a commodity parallel/distributed computing environment with little or no code changes. SCBI_Distributed_Blast illustrates this.

Results indicate that SCBI_MapReduce scales well, is fault-tolerant, can be used on multicore workstations, clusters, and heterogeneous grids (even where secured connections cannot be established), can use several interconnection networks, and does not need special hardware or virtual machine support. It is also highly portable and should diminish disk space costs in "cloud computing". In conclusion, SCBI_MapReduce, and hence SCBI_Distributed_Blast, are ready, among other uses, for intensive genome analyses and annotations.

class MyWorker < Worker
  # process each obj in received objs
  def process_object(objs)
    # find barcodes
    find_mids(objs)
    return objs
  end
end

Algorithm 2

# get custom worker file path
custom_worker_file = 'my_worker.rb'
# init worker manager
MyWorkerManager.init_work_manager
# use any available ip and first empty port
ip = '0.0.0.0'; port = 0; workers = 4
# launch Manager and start it
manager = Manager.new(ip, port, workers, MyWorkerManager, custom_worker_file)
manager.start_server

Algorithm 3


Appendix

Customisation of the Three Files That Govern SCBI_MapReduce

SCBI_MapReduce consists of a number of files, but in order to customise it for particular needs, users only need to modify the I/O data management methods in the manager file (my_worker_manager.rb), the computation to be distributed in the worker file (my_worker.rb), and the main file (main.rb).

The methods to redefine in my_worker_manager.rb are (i) next_work, which provides new data for workers, or nil if there is no more data available (in the code shown, it simply reads one sequence at a time from a fastq file on disk); (ii) self.init_work_manager, which opens the I/O data files; (iii) self.end_work_manager, which closes the files when finished; and (iv) work_received, which writes results to disk as they are generated. The relevant code for my_worker_manager.rb is shown in Algorithm 1.

Customisation of the worker file (my_worker.rb) includes the redefinition of the process_object method, which contains the function call to find_mids. The function find_mids can be defined by the user in his/her own source code, in a compiled algorithm, or in existing code. The relevant code for my_worker.rb is shown in Algorithm 2.
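For instance, a toy stand-in for a user-supplied find_mids could tag each read whose 5' end matches a known barcode; the barcode list and the tagging logic below are hypothetical:

BARCODES = %w[ACGAGTGCGT ACGCTCGACA]  # hypothetical MID barcodes

# Tag each read whose 5' end matches a known barcode by renaming it;
# objs is assumed to be an array of [name, fasta, qual, comments] tuples.
def find_mids(objs)
  objs.each do |obj|
    name, fasta = obj
    mid = BARCODES.find { |b| fasta.start_with?(b) }
    obj[0] = "#{name}|mid=#{mid}" if mid
  end
end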

The main program file (main.rb) has to be invoked to launch the distributed job. It can be used as it is from the command line, as a common Ruby script (ruby main.rb), or as a part of a more complex code. Skilled users can also modify its code and/or name to enable special features, or even to receive user parameters, which is the case when using SCBI_MapReduce to distribute an internal part of an algorithm. The number of workers is defined here: at least one for the manager and one for a single worker. The relevant code for main.rb is shown in Algorithm 3.
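As an example of receiving user parameters, a variant of main.rb might take the worker count from the command line; this is a sketch built on Algorithm 3, not shipped code:

# my_worker.rb path and manager setup as in Algorithm 3
custom_worker_file = 'my_worker.rb'
MyWorkerManager.init_work_manager
# take the worker count from the command line (ruby main.rb 16), default 4
workers = (ARGV[0] || 4).to_i
manager = Manager.new('0.0.0.0', 0, workers, MyWorkerManager, custom_worker_file)
manager.start_server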

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa and Rocío Bautista for the helpful discussions, and the computer resources of the Plataforma Andaluza de Bioinformática of the University of Málaga, Spain. This study was supported by grants from the Spanish MICINN (BIO2009-07490) and Junta de Andalucía (P10-CVI-6075), as well as by institutional funding to the research group BIO-114.

References

[1] C. Huttenhower and O. Hofmann, "A quick guide to large-scale genomic data mining," PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.

[2] M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.

[3] D. Patterson, "The trouble with multi-core," IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.

[4] C. Camacho, G. Coulouris, V. Avagyan et al., "BLAST+: architecture and applications," BMC Bioinformatics, vol. 10, article 421, 2009.

[5] S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, "Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment," Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.

[6] H. Lin, X. Ma, W. Feng, and N. F. Samatova, "Coordinating computation and I/O in massively parallel sequence search," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.

[7] T. Nguyen, W. Shi, and D. Ruden, "CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping," BMC Research Notes, vol. 4, article 171, 2011.

[8] T. Rognes, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, article 221, 2011.

[9] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for long sequence alignment," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.

[10] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, vol. 10, no. 11, article R134, 2009.

[11] M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, "Hierarchical parallelization of gene differential association analysis," BMC Bioinformatics, vol. 12, article 374, 2011.

[12] M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, "Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications," in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.

[13] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.

[14] M. K. Chen and K. Olukotun, "The Jrpm system for dynamically parallelizing Java programs," in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.

[15] P. Haller and M. Odersky, "Scala Actors: unifying thread-based and event-based programming," Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.

[16] J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.

[17] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.

[18] L. Dagum and R. Menon, "OpenMP: an industry-standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.

[19] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, "Survey of MapReduce frame operation in bioinformatics," Briefings in Bioinformatics, in press.

[20] R. C. Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics," BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.

[21] J. Lin, "MapReduce is good enough?" Big Data, vol. 1, no. 1, pp. 28–37, 2013.

[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency Computation Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.

[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.

[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.

[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.

[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.

[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics, in press.

[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.

[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.

[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.

[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.

[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.

[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.

[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.

[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.

[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.

[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.



51 SCBI MapReduce is an Efficient Task-Farm SkeletonSCBI MapReduce customisation is simpler than thatcustomisation of other frameworks such as FastFlow [27]and it does not need compilation of the final code makingit portable to most computers ldquoas isrdquo Its flexibility allowsto include SCBI MapReduce as a coding part of any newalgorithms as well as a wrapper for already existing functionsscripts or compiled software Promising and efficient resultswere provided when used within several of our algorithms(eg SeqTrimNext [httpwwwscbiumaesseqtrimnext]and Full-LengtherNext [httpwwwscbiumaesfulllength-ernext] both are specifically designed to manage sequencesobtained from next-generation sequencing) as well asa wrapper for Blast+ in the SCBI Distributed Blastgem (Figure 4) Therefore SCBI MapReduce seems to besufficiently powerful for most of parallelisation and distri-bution needs concerning new algorithms legacy softwareand existing scripts in most bioinformatics contexts

It has been described that GPU-based and some MPI-based parallelisations lack of good scalability when dealingwith rapidly growing sequence data whileMapReduce seemsto perform better in those settings [9] That could explainwhy SCBI MapReduce skeleton shows a speed-up of 31-foldfor 32 cores and 59-fold for 64 cores even with sequencedata (Figure 2(a)) This performance is better than the onedisplayed by the R package pR where 32 cores providespeedups of 20ndash27-fold depending on the process [25]Several design reasons can also be invoked to explain suchan efficiency [34] (i) disk IO operations are reduced tominimum (data are read only at the beginning and resultsare saved only at the end) (ii) absence of asymmetry impact(Figure 1(b)) (iii) the manager overhead is limited whenusing more than 2 cores and chunks of sequences (Tables 2and 3) and (iv) longer tasks increased the efficiency becausethe manager is on standby most of the time while waiting

8 Computational Biology Journal

0

20

40

60

0 10 20 30 40

Spee

d-up

Number of cores

Theoreticalx86x86 upgraded

(a)

0

2

4

6

8

10

0 40 80 120 160 200

Num

ber o

f cor

es

Time (min)

Blast+SCBI distributed blast

(b)

Figure 4 Behaviour of SCBI Distributed Blast (a) Blast+ speed-up in chunks of 100X in two different clusters using both differentnetwork protocols and queue systems Theoretical speed-up corresponds to the one that equals the number of cores used Speed-up wascalculated dividing the time spent using 1 core by the time of the corresponding number of cores The following execution times were usedfor 50 000 reads fromAC SRR069473 in the ldquox86rdquo cluster 258 h (92 880 s 1 core) 27 600 s (2 cores) 13 980 s (4 cores) 6960 s (8 cores) 3540 s(16 cores) and 1740 s (32 cores) for the 261 304 reads of AC SRR069473 in the ldquox86 upgradedrdquo cluster 886 h (318 960 s 1 core) 115 161 s (2cores) 56 385 s (4 cores) 28 180 s (8 cores) 14 123 s (16 cores) and 7068 s (32 cores) (b)Threaded Blast+ and SCBI Distributed Blast usedifferently the 8 cores available in the same computer Blast+ was executed with the -num threads 8 option and SCBI Distributed Blastwas executed with the minus119908 8 option using chunks of 100X by default in the ldquox86rdquo cluster

for the workers to finish avoiding relaunching of internal orexternal programs for brief executions

SCBI MapReduce includes implementation of errorhandling and job checkpointingmethods It has been demon-strated (Figure 3) that data chunks from a broken worker oreven a job shutdown can be relaunched in a running workerThis provides robustness and fault-tolerance guarantees safelong-lasting executions and provides for preserving com-putational resources since it avoids processing objects thathave already been processed Such properties will serve tosave time and make the job execution traceable at any timeTherefore SCBI MapReduce represents another step in thedirection of programming environments with the task-farmskeleton concept

52 Distribution in Chunks is More Efficient SCBIMapReduce was intended to deal with problems thatinvolve processing a huge number of small sequences (asin high-throughput sequencing or RNA-Seq experiments)Results showed that splitting datasets into small chunksyields a better speed-up than sending sequences one by oneor in big chunks (Figure 2(a)) An analogous idea has alreadybeen reported in mpiBLAST [35] but for database segmenta-tion instead of sequence groupingTherefore grouping reads

in chunks appear to be another way to provide speed-upalways taking into account that big chunks could be detri-mental when the number of chunks produced is not divisibleby the number of workers used (see 2000X in Figure 2)

Since chunk sizes of 100X and 250X perform similarly(Figure 2) a chunk size of 100X can suit well as default valueeven if the optimal chunk size has not been assessed takinginto account the number of cores and the number of objectsto split in chunks It could be then hypothesised that theuse of chunks may reduce the manager surcharge (Tables2 and 3) Figure 4(a) shows that speed-up could achievesuperscalar behaviour using chunks combined with distri-bution although this is dependent on the task performedby the worker (Blast+ in this instance) and not on thecapabilities of SCBI MapReduce In conclusion the use ofchunks provides an improved overall performance

53The Added Value of Compression and Encryption Capabil-ity In distributed grids or the cloud encrypted connectionscannot be always established for data privacy and datacompression can accelerate any transfer particularly in lowbandwidth connectionsThe overhead introduced by encryp-tion and compression is particularly evident when data areprocessed one-by-one (Figure 2(b) open triangles) since the

Computational Biology Journal 9

class MyWorkerManager lt WorkManager

def selfinit work manager

open input fastq file and results as output

fastq file=FastqFilenew(fastq file path)

results=FastqFilenew( lsquo resultsfastq rsquo lsquo w+ rsquo )end

def selfend work manager

close files on finish

fastq fileclose

resultsclose

end

this method is called every time a worker

needs a new work

def next work

get next sequence or nil from file

namefastaqualcomments=fastq filenext seq

if namenil

return namefastaqualcomments

else

return nil

end

end

def work received(results)

write results to disk

resultseach do |namefastaqualcomments|

resultswrite seq(namefastaqualcomments)

end

end

end

Algorithm 1

use of more andmore cores did not significantly speed up theprocess But compression and encryption overhead becomeacceptable when the dataset is split into chunks (compareslopes in Figure 2 and execution times in Tables 1 and 3)Encryption capability per chunks should be enabled onlywhen untrusted networks are involved in distributed jobsCompression per chunks could be envisaged when using lowbandwidth networks (eg in some grids [2]) provided thatcompressed data transfer is faster than the time spent incompressing data As a result SCBI MapReduce can be usedon grids with confidential data when encrypted connectionscannot be established

54 SCBI MapReduce is Ready for Grid Computing It hasbeen shown (Figure 4(a)) that SCBI MapReduce and there-fore SCBI Distributed Blast could work with homo-geneous clusters (the ldquox86rdquo and ldquox86 upgradedrdquo clusters)consisting of different types of CPUs It has been tested thatSCBI MapReduce was also able to deal with one heteroge-neous grid consisting of one x86 computer using OSX onex86 computer Linux 24 cores of the ldquox86rdquo cluster and 32cores of the ldquoSuperdomerdquo (results are not shown) HenceSCBI MapReduce can cope with different queue systems(PBS Slurm) and networks and can distribute in sym-metric multiprocessing machines (ldquoSuperdomerdquo) clusters(Figure 4(a)) and heterogeneous Unix-based grids (above)

Other features that enable SCBI MapReduce at least the-oretically [2 36 37] to be used in nearly any type ofcomputer grid are (i) the above described encryption andcompression capabilities (ii) lack of administrator privileges(iii) the case that running is on-demand only and (iv)minimal requirement of hard disk since it takes advantage ofshared storage onlywhennecessarymaking it highly portableto other computer systems Testing SCBI MapReduce inldquocloud computingrdquo services remains a pending task howeverit is expected that it should work and provide benefits relatedto cost-effectiveness

55 SCBI Distributed Blast is a Boosted Version of Blast+Previous improvements of Blast were performed byskilled programmers and provide parallelised versionstightly bonded to one released version The development ofSCBI Distributed Blast based on the SCBI MapReducetask-farm skeleton comes to remove version bonding andcoding challenges since it can boost in a core-dependent way(Figure 4(a)) any Blast+ release installed on the scientistcomputer not only the version tested in this study enablingthe update of the Blast+ release while maintaining the sameSCBI Distributed Blast code

In contrast to other Blast parallelisations includ-ing mpiBLAST and MapReduce Blast [9] SCBIDistributed Blast distributed tasks are seeded with

10 Computational Biology Journal

sequence chunks while maintaining intact the database Thisis because it does not need to copy the sequence database oneach worker since it takes advantage of shared storageThis isalso the reason why it provides exactly the same results as theoriginal Blast+ in less time (Figure 4) Another MapReduceapproach for Blast CloudBlast [9] has very poorscalability since it is optimised for short reads mapping andneeds to copy the database on each node while the speed-upobserved with SCBI Distributed Blast was linear andappeared to be superscalar in tested clusters (Figure 4(a))However superscalar behaviour was exclusively observed for2 cores (speed-ups of 33 in the ldquox86rdquo cluster and 28 in theldquox86 upgradedrdquo cluster) since taking two cores as referencethe speed-up slope was close to the theoretical speed-up (4081 163 326 650 and 1276)

Comparing the speed-up of our ldquoboostedrdquo Blast+ anda threaded execution of Blast+ (Figure 4(b)) it can beseen that SCBI Distributed Blast can take advantage ofall computing capabilities and scales linearly in contrast tonative Blast+ (Figure 4(b)) and the older NCBI-Blast [30]In conclusion SCBI Distributed Blast illustrates the ease-of-use and performance of SCBI MapReduce opening theway for code modifications that can easily produce scalablebalanced fault-tolerant and distributed versions of otherBlast-related programs like PSI-Blast WU-BlastAB-Blast NCBI-Blast and the like Furthermore Blast-based genome annotation processes can take advantage ofSCBI Distributed Blast with minor changes in the code

6 Conclusions

This work does not aim at advancing parallelisation technol-ogy but it aims to apply distribution advantages in the use ofbioinformatic tools that are useful for example for genomicsgiving attractive speedups In a context of continuous devel-opment of parallel software SCBI MapReduce provides atask-farm skeleton for parallelisationdistribution with fea-tures such as fault-tolerance encryption and compression on-the-fly data distribution in chunks grid-readiness and flexi-bility for integration of new and existing code without being askilled programmer In fact SCBI MapReducewas designedfor researchers with a biological background that considercomplicated MPI Hadoop or Erlang solutions for paralleli-sationdistributionThat is why Ruby was selected since it hasa shallow learning curve even for biologists and easily man-ages the programming necessities In the context of genomicstudies one significant advantage is that SCBI MapReduceenables to reuse in a commodity paralleldistributed com-puting environment existing sequential code with little or nocode changes SCBI Distributed Blast can illustrate this

Results indicate that SCBI MapReduce scales well isfault-tolerant can be used on multicore workstations clus-ters and heterogeneous grids even where secured connec-tions cannot be established can use several interconnectionnetworks and does not need special hardware or virtualmachine support It is also highly portable and shall diminishthe disk space costs in ldquocloud computingrdquo In conclusionSCBI MapReduce andhence SCBI Distributed Blast are

class MyWorker lt Worker

process each obj in received objs

def process object(objs)

find barcodes

find mids(objs)

return objs

end

end

Algorithm 2

get custom worker file path

custom worker file = lsquo my workerrb rsquo init worker manager

MyWorkerManagerinit work manager

use any available ip and first empty port

ip= lsquo 0000 rsquo port=0 workers = 4

launch Manager and start it

manager = Managernew(ipport workers

MyWorkerManagercustom worker file)

managerstart server

Algorithm 3

ready among other uses for intensive genome analyses andannotations

Appendix

Customisation of the Three Files That GovernSCBI_MapReduce

SCBI MapReduce consists of a number of files but inorder to be customised for particular needs users onlyneed to modify the IO data management methods at themanager file (my worker managerrb) the computation tobe distributed at the worker file (my workerrb) and themain file (mainrb)

Themethods to redefine in my worker managerrb are(i) next work that provides new data for workers or nil ifthere is no more data available (in the following code itsimply reads one sequence at a time from a fastq file ondisk) (ii) selfinit work manager that opens IO data files(iii) selfend work manager that closes files when finishedand (iv) work received that writes results on disk as they aregenerated The relevant code for my worker managerrb is(see Algorithm 1)

Customisation of the worker file (my workerrb)includes redefinition of the process object method thatcontains the function call to find mids The functionfind mids can be defined by the user in hisher own sourcecode or a compiled algorithm or an existing code Therelevant code for my workerrb is (see Algorithm 2)

The main program file (mainrb) has to be invoked tolaunch the distributed job It can be used as it is from the

Computational Biology Journal 11

command line as a common Ruby script (ruby mainrb)or as a part of a more complex code Skilled users can alsomodify its code andor name to enable special features oreven receive user parameters which is the case when usingSCBI MapReduce for distribution of an internal part of analgorithmThenumber of workers is defined here at least onefor the manager and one for one worker The relevant codefor mainrb is (see Algorithm 3)

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa andRocıo Bautista for the helpful discussions and the computerresources of the Plataforma Andaluza de Bioinformatica ofthe University of Malaga Spain This study was supportedby Grants from the Spanish MICINN (BIO2009-07490) andJunta de Andalucıa (P10-CVI-6075) as well as institutionalfunding to the research group BIO-114

References

[1] C Huttenhower and O Hofmann ldquoA quick guide to large-scalegenomic data miningrdquo PLoS Computational Biology vol 6 no5 Article ID e1000779 2010

[2] M C Schatz B Langmead and S L Salzberg ldquoCloud comput-ing and the DNA data racerdquoNature Biotechnology vol 28 no 7pp 691ndash693 2010

[3] D Patterson ldquoThe trouble withmulti-corerdquo IEEE Spectrum vol47 no 7 pp 28ndash53 2010

[4] C Camacho G Coulouris V Avagyan et al ldquoBLAST+ archi-tecture and applicationsrdquo BMC Bioinformatics vol 10 article421 2009

[5] S Galvez D Dıaz P Hernandez F J Esteban J A Caballeroand G Dorado ldquoNext-generation bioinformatics using many-core processor architecture to develop a web service forsequence alignmentrdquo Bioinformatics vol 26 no 5 pp 683ndash6862010

[6] H Lin XMaW Feng andN F Samatova ldquoCoordinating com-putation and IO in massively parallel sequence searchrdquo IEEETransactions on Parallel and Distributed Systems vol 22 no 4pp 529ndash543 2011

[7] T NguyenW Shi andD Ruden ldquoCloudAligner a fast and full-featured MapReduce based tool for sequence mappingrdquo BMCResearch Notes vol 4 article 171 2011

[8] T Rognes ldquoFaster Smith-Waterman database searches withinter-sequence SIMD parallelisationrdquo BMC Bioinformatics vol12 article 221 2011

[9] X-L Yang Y-L Liu C-F Yuan and Y-H Huang ldquoParalleliza-tion of BLAST with MapReduce for long sequence alignmentrdquoin Proceedings of the 4th International Symposium on ParallelArchitectures Algorithms and Programming (PAAP rsquo11) pp 241ndash246 IEEE Computer Society December 2011

[10] B Langmead M C Schatz J Lin M Pop and S L SalzbergldquoSearching for SNPs with cloud computingrdquo Genome Biologyvol 10 no 11 article R134 2009

[11] M Needham R Hu S Dwarkadas and X Qiu ldquoHierarchicalparallelization of gene differential association analysisrdquo BMCBioinformatics vol 12 article 374 2011

[12] M K Gardner W-C Feng J Archuleta H Lin and XMal ldquoParallel genomic sequence-searching on an ad-hoc gridexperiences lessons learned and implicationsrdquo in Proceedingsof the ACMIEEE Conference on High Performance Networkingand Computing vol 1 pp 1ndash14 2006

[13] L Yu CMoretti AThrasher S Emrich K Judd and DThainldquoHarnessing parallelism inmulticore clusters with the All-PairsWavefront andMakeflow abstractionsrdquoCluster Computing vol13 no 3 pp 243ndash256 2010

[14] MK Chen andKOlukotun ldquoThe Jrpm system for dynamicallyparallelizing Java programsrdquo in Proceedings of the 30th AnnualInternational Symposium on Computer Architecture (ISCA rsquo03)pp 434ndash445 San Diego Calif USA June 2003

[15] P Haller and M Odersky ldquoScala Actors unifying thread-basedand event-based programmingrdquo Theoretical Computer Sciencevol 410 no 2-3 pp 202ndash220 2009

[16] J Armstrong R Virding C Wikstrom and M Williams Con-current Programming in ERLANG Prentice Hall 2nd edition1996

[17] WGropp E Lusk andA SkjellumUsingMPI Portable ParallelProgramming with the Message-Passing Interface MIT PressCambridge Mass USA 2nd edition 1999

[18] L Dagum and R Menon ldquoOpenmp an industry-standardapi for shared-memory programmingrdquo IEEEComputationalScience amp Engineering vol 5 no 1 pp 46ndash55 1998

[19] Q Zou X-B Li W-R Jiang Z-Y Lin G-L Li and KChen ldquoSurvey ofmapreduce frameoperation inbioinformaticsrdquoBriefings in Bioinformatics In press

[20] R C Taylor ldquoAn overview of the hadoopmapreducehbaseframework and its current applications in bioinformaticsrdquo BMCBioinformatics vol 11 supplement 12 p S1 2010

[21] J Lin ldquoMapreduce is good enoughrdquo Big Data vol 1 no 1 pp28ndash37 2013

[22] D Thain T Tannenbaum and M Livny ldquoDistributed comput-ing in practice the Condor experiencerdquoConcurrency Computa-tion Practice and Experience vol 17 no 2-4 pp 323ndash356 2005

[23] S Pellicer G Chen K C C Chan and Y Pan ldquoDistributedsequence alignment applications for the public computingarchitecturerdquo IEEE Transactions on Nanobioscience vol 7 no1 pp 35ndash43 2008

[24] J Hill M Hambley T Forster et al ldquoSPRINT a new parallelframework for Rrdquo BMC Bioinformatics vol 9 article 558 2008

[25] J Li XMa S YoginathG Kora andN F Samatova ldquoTranspar-ent runtime parallelization of the R scripting languagerdquo Journalof Parallel and Distributed Computing vol 71 no 2 pp 157ndash1682011

[26] F Berenger C Coti and K Y J Zhang ldquoPAR a PARallel anddistributed job crusherrdquoBioinformatics vol 26 no 22 pp 2918ndash2919 2010

[27] M Aldinucci M Torquati C Spampinato et al ldquoParallelstochastic systems biology in the cloudrdquo Briefings in Bioinfor-matics In press

[28] A Matsunaga M Tsugawa and J Fortes ldquoCloudBLAST com-bining MapReduce and virtualization on distributed resourcesfor bioinformatics applicationsrdquo in Proceedings of the 4th IEEEInternational Conference on eScience (eScience rsquo08) pp 222ndash229 IEEEComputer SocietyWashington DC USA December2008

12 Computational Biology Journal

[29] W Lu J Jackson and R Barga ldquoAzureBlast a case study ofdeveloping science applications on the cloudrdquo in Proceedingsof the 19th ACM International Symposium on High Perfor-mance Distributed Computing (HPDC rsquo10) pp 413ndash420 ACMChicago Ill USA June 2010

[30] P D Vouzis and N V Sahinidis ldquoGPU-BLAST using graphicsprocessors to accelerate protein sequence alignmentrdquo Bioinfor-matics vol 27 no 2 pp 182ndash188 2011

[31] C S Oehmen and D J Baxter ldquoScalablast 20 rapid and robustblast calculations on multiprocessor systemsrdquo Bioinformaticsvol 29 no 6 pp 797ndash798 2013

[32] J Aerts and A Law ldquoAn introduction to scripting in Ruby forbiologistsrdquo BMC Bioinformatics vol 10 article 221 2009

[33] S Balakrishnan R RajwarMUpton andK Lai ldquoThe impact ofperformance asymmetry in emerging multicore architecturesrdquoSIGARCH Computer Architecture News vol 33 no 2 pp 506ndash517 2005

[34] L Jostins and J Jaeger ldquoReverse engineering a gene networkusing an asynchronous parallel evolution strategyrdquo BMC Sys-tems Biology vol 4 article 17 2010

[35] O Thorsen B Smith C P Sosa et al ldquoParallel genomicsequence-search on a massively parallel systemrdquo in Proceedingsof the 4th Conference on Computing Frontiers (CF rsquo07) pp 59ndash68 Ischia Italy May 2007

[36] M Armbrust A Fox R Griffith et al ldquoA view of cloudcomputingrdquo Communications of the ACM vol 53 no 4 pp 50ndash58 2010

[37] C-L Hung and Y-L Lin ldquoImplementation of a parallel proteinstructure alignment service on cloudrdquo International Journal ofGenomics vol 2013 Article ID 439681 8 pages 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

6 Computational Biology Journal

Table 3 Analysis of compression and encryption of the same real-world sequence jobs in Table 1 Both the execution times (in seconds) andthe percent of this time used by the manager are provided

Cores Execution timea (s) Manager time ()1X 100X 250X 2000X 1X 100X 250X 2000X

2b 9482 8178 7270 10279 473 056 058 0444 4619 4307 3814 5234 276 047 049 0398 2359 2156 2165 3145 176 043 043 03516 1274 1085 1142 1692 154 042 040 03332 913 553 571 905 240 051 052 03764 821 282 294 540 354 057 055 043128 709 163 173 346 387 062 061 039aCompression was performed with ZLib and encrypted with AES-256 any other method installed on computers can be usedbThere is no need of compression or encryption using one single core

0

38

75

113

150

0 38 75 113 150

Spee

dup

Number of cores

Theoretical Integer dataset1X100X250X2000X

1X100X250X2000X

Real-world seqs

Artificial seqs

(a)

0

38

75

113

150

0 38 75 113 150

Spee

dup

Number of cores

Real-world seqs+ compression+ encryption

Theoretical1X100X250X2000X

(b)

Figure 2 Speed-up achieved by SCBI MapReduce implementations (a) Speed-up based on Table 1 data was calculated dividing the timetaken with 1 core by the time taken by each number of cores (b) Performance using compressed and encrypted real-world sequences basedon execution times in Table 3 The speed-up was calculated dividing the time that real-world sequences took with 1 core in Table 1 by theircorresponding times in Table 3 In both plots theoretical values correspond to a speed-up that equals the number of cores used

triangles) and this may be due to the important increase ofthe time spent by the manager on every job (from 091ndash047 in Table 2 to 475ndash154 in Table 3) But overheadbecame bearable when using any other chunk size since(i) the execution time in Table 3 for 100Xndash2000X chunksis close to the presented in Table 1 (ii) the speed-up canrecover the previously observed values (Figure 2(b)) and(iii) the manager spends nearly the same percent of timeThese results suggest that overhead introduced by encryptionand compression can be alleviated using chunks providing

a significant speed-up and that distribution of sequence-by-sequence (1X chunks) was the worst available approach Inconclusion compression and encryption capabilities can beregularly used for distributed calculations when encryptedconnections are not available but desirable without thedramatic increase of execution time

4.3. Fault-Tolerance Testing. The fault-tolerance of SCBI_MapReduce was tested using the complete dataset of real-world sequences as input; the same analysis as in Table 1 was performed on the "x86 upgraded" cluster using 128 cores. Sequences were used in 100X chunks and the job was executed three times. The first execution occurred without errors and took 184 s to finish the analysis of 261,304 reads (objects). Figure 3 shows that this "Normal" execution presents the expected constant slope. The second execution tested an unexpected "Job shutdown", simulated by a manual interruption of the whole job, with the job then being manually relaunched. It can be seen in Figure 3 that the interruption adds a lag time to the job, increasing to 239 s the time required to finish the analysis of all sequences. The sequence counts were the same as in "Normal", indicating that no sequence and no chunk were reanalysed twice. Finally, recovery after a "Worker failure" was tested by stopping 16 workers at three different time points of the job execution (a total of 48 different workers were affected). In this case the manager automatically handles the reanalysis of the unfinished chunks, and the job took 300 s. Again, no sequence was saved twice in the output file. As expected, the output of the "Normal" execution and of the interrupted executions was exactly the same (results not shown). In conclusion, the fault-tolerance implementation of SCBI_MapReduce is able to handle execution exceptions and broken workers, and stopped jobs can be restarted without reanalysing finished tasks.

[Figure 3: object counts (×10^4, 0–30) versus time (0–325 s) for the "Normal", "Job shutdown", and "Worker failure" executions; arrows mark the whole-job stop and the three events of 16 workers failing.]

Figure 3: Fault tolerance testing using the real-world sequences in 100X chunks on the "x86 upgraded" cluster using 128 cores. The "Normal" execution occurred without errors. The "Job shutdown" included a complete job shutdown (indicated by a solid arrow) and then a manual restart. The "Worker failure" execution included three shutdowns (indicated by dashed arrows) of 16 workers each during the job.

4.4. Distributed Blast+ Using Chunks of Sequences. The generic blastn command blastn -task blastn-short -db myDB.fna -query input_file.fna -out output_file.fna was launched with real-world sequences as input and a customised database of 240 MB containing complete bacterial genomes. The speed-up achieved by SCBI_Distributed_Blast compared to the nondistributed execution (using 1 single core) is presented in Figure 4(a). Outputs were exactly the same in all cases, and they were also identical to the output of the original Blast+ binary (results not shown). Moreover, SCBI_Distributed_Blast was able to cope, without modification, with Blast+ versions 2.2.23 to 2.2.27.

Blast+ is described to have threading capabilities [4]. Therefore, 10,000 reads of AC SRR069473 were launched with native blastn and with SCBI_Distributed_Blast, both configured to use the 8 cores of a single blade of the "x86" cluster. Figure 4(b) shows that Blast+ did not appear to parallelise efficiently: it started using only 5 cores and rapidly decreased to one single core, taking 224 min to finish the task. Similar behaviour was confirmed on other computers, indicating that it seems to be an inherent feature of Blast+ releases. In contrast, SCBI_Distributed_Blast used 8 cores all the time and finished in 24 min (that is why 0 CPUs are "used" from then on). Therefore, the speed-up introduced by SCBI_Distributed_Blast is 9.4, demonstrating that it performs much better in exploiting multicore capabilities than the threaded implementation included in Blast+.
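For reference, a distributed run of the command above might be launched as follows; the executable name and quoting are assumptions based on the -w (workers) option mentioned in Figure 4, not a verified command line:

scbi_distributed_blast -w 8 'blastn -task blastn-short -db myDB.fna -query input_file.fna -out output_file.fna'

SCBI_Distributed_Blast would then split the query file into 100X chunks by default and merge the partial outputs, which is consistent with the final output matching that of the original binary.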

5. Discussion

5.1. SCBI_MapReduce is an Efficient Task-Farm Skeleton. SCBI_MapReduce customisation is simpler than the customisation of other frameworks such as FastFlow [27], and it does not need compilation of the final code, making it portable to most computers "as is". Its flexibility allows including SCBI_MapReduce as part of the code of new algorithms, as well as using it as a wrapper for already existing functions, scripts, or compiled software. Promising and efficient results were obtained when it was used within several of our algorithms (e.g., SeqTrimNext [http://www.scbi.uma.es/seqtrimnext] and Full-LengtherNext [http://www.scbi.uma.es/fulllengthernext], both specifically designed to manage sequences obtained from next-generation sequencing), as well as a wrapper for Blast+ in the SCBI_Distributed_Blast gem (Figure 4). Therefore, SCBI_MapReduce seems to be sufficiently powerful for most parallelisation and distribution needs concerning new algorithms, legacy software, and existing scripts in most bioinformatics contexts.

It has been described that GPU-based and some MPI-based parallelisations lack good scalability when dealing with rapidly growing sequence data, while MapReduce seems to perform better in those settings [9]. That could explain why the SCBI_MapReduce skeleton shows a speed-up of 31-fold for 32 cores and 59-fold for 64 cores even with sequence data (Figure 2(a)). This performance is better than the one displayed by the R package pR, where 32 cores provide speed-ups of 20–27-fold depending on the process [25]. Several design reasons can also be invoked to explain such efficiency [34]: (i) disk I/O operations are reduced to a minimum (data are read only at the beginning and results are saved only at the end); (ii) absence of asymmetry impact (Figure 1(b)); (iii) the manager overhead is limited when using more than 2 cores and chunks of sequences (Tables 2 and 3); and (iv) longer tasks increase the efficiency, because the manager is on standby most of the time while waiting for the workers to finish, avoiding the relaunching of internal or external programs for brief executions.

[Figure 4: panel (a) plots speed-up (0–60) versus number of cores (0–40) for the theoretical, "x86", and "x86 upgraded" series; panel (b) plots number of cores in use (0–10) versus time (0–200 min) for Blast+ and SCBI_Distributed_Blast.]

Figure 4: Behaviour of SCBI_Distributed_Blast. (a) Blast+ speed-up in chunks of 100X in two different clusters using both different network protocols and queue systems. Theoretical speed-up corresponds to the one that equals the number of cores used. Speed-up was calculated dividing the time spent using 1 core by the time of the corresponding number of cores. The following execution times were used: for 50,000 reads from AC SRR069473 in the "x86" cluster, 25.8 h (92,880 s, 1 core), 27,600 s (2 cores), 13,980 s (4 cores), 6,960 s (8 cores), 3,540 s (16 cores), and 1,740 s (32 cores); for the 261,304 reads of AC SRR069473 in the "x86 upgraded" cluster, 88.6 h (318,960 s, 1 core), 115,161 s (2 cores), 56,385 s (4 cores), 28,180 s (8 cores), 14,123 s (16 cores), and 7,068 s (32 cores). (b) Threaded Blast+ and SCBI_Distributed_Blast use the 8 cores available in the same computer differently. Blast+ was executed with the -num_threads 8 option, and SCBI_Distributed_Blast was executed with the -w 8 option using chunks of 100X by default, in the "x86" cluster.


SCBI_MapReduce includes error handling and job checkpointing methods. It has been demonstrated (Figure 3) that data chunks from a broken worker, or even from a job shutdown, can be relaunched in a running worker. This provides robustness and fault-tolerance, guarantees safe long-lasting executions, and preserves computational resources, since it avoids processing objects that have already been processed. Such properties save time and make the job execution traceable at any moment. Therefore, SCBI_MapReduce represents another step in the direction of programming environments based on the task-farm skeleton concept.
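As an illustration of this bookkeeping (invented names; a minimal Ruby sketch, not the gem's internals), a manager can keep every dispatched chunk in a pending table and re-queue it only if no result ever arrives:

# Hypothetical sketch of checkpoint bookkeeping: a chunk stays in
# 'pending' until its result arrives, so a chunk held by a crashed
# worker can be handed to another worker without reanalysing the rest.
queue   = Queue.new
pending = {}

chunks = Array.new(6) { |i| "chunk-#{i}" }
chunks.each_with_index { |c, id| pending[id] = c; queue << id }

# result received: the chunk is done for good (never reanalysed)
def complete(id, pending)
  pending.delete(id)
end

# worker crash: re-queue only what that worker had not finished
def requeue(ids, pending, queue)
  ids.each { |id| queue << id if pending.key?(id) }
end

complete(0, pending)
requeue([0, 1], pending, queue)   # only chunk 1 goes back to the queue
puts pending.size                 # => 5 chunks still outstanding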

5.2. Distribution in Chunks is More Efficient. SCBI_MapReduce was intended to deal with problems that involve processing a huge number of small sequences (as in high-throughput sequencing or RNA-Seq experiments). Results showed that splitting datasets into small chunks yields a better speed-up than sending sequences one by one or in big chunks (Figure 2(a)). An analogous idea has already been reported for mpiBLAST [35], but for database segmentation instead of sequence grouping. Therefore, grouping reads in chunks appears to be another way to provide speed-up, always taking into account that big chunks can be detrimental when the number of chunks produced is not divisible by the number of workers used (see 2000X in Figure 2).

Since chunk sizes of 100X and 250X perform similarly (Figure 2), a chunk size of 100X can suit well as the default value, even though the optimal chunk size has not been assessed taking into account the number of cores and the number of objects to split in chunks. It can then be hypothesised that the use of chunks reduces the manager surcharge (Tables 2 and 3). Figure 4(a) shows that the speed-up can achieve superscalar behaviour using chunks combined with distribution, although this is dependent on the task performed by the worker (Blast+ in this instance) and not on the capabilities of SCBI_MapReduce. In conclusion, the use of chunks provides an improved overall performance.
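To make the grouping step concrete, here is a minimal Ruby sketch (illustrative only; the sequence list and chunk size are stand-ins, not SCBI_MapReduce code):

CHUNK_SIZE = 100  # the 100X default discussed above

# stand-in for a parsed read set; in practice these would be
# sequence records streamed from a fasta/fastq file
sequences = (1..1050).map { |i| "seq#{i}" }

# each_slice turns many tiny per-sequence tasks into fewer,
# larger per-chunk tasks; the last chunk may be smaller
sequences.each_slice(CHUNK_SIZE).with_index do |chunk, i|
  puts "task #{i}: #{chunk.size} sequences"
end

With 1,050 sequences this yields eleven tasks (ten of 100 sequences and one of 50), which also illustrates why chunk sizes that do not divide the workload evenly can leave some workers idle at the end of a job.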

5.3. The Added Value of Compression and Encryption Capability. In distributed grids or the cloud, encrypted connections cannot always be established for data privacy, and data compression can accelerate any transfer, particularly over low-bandwidth connections. The overhead introduced by encryption and compression is particularly evident when data are processed one by one (Figure 2(b), open triangles), since the use of more and more cores did not significantly speed up the process. But compression and encryption overhead becomes acceptable when the dataset is split into chunks (compare slopes in Figure 2 and execution times in Tables 1 and 3). Encryption per chunk should be enabled only when untrusted networks are involved in distributed jobs. Compression per chunk can be envisaged when using low-bandwidth networks (e.g., in some grids [2]), provided that compressed data transfer is faster than the time spent in compressing data. As a result, SCBI_MapReduce can be used on grids with confidential data when encrypted connections cannot be established.
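The per-chunk compression and encryption described above can be pictured with Ruby's standard Zlib and OpenSSL libraries; this is a sketch of the idea under the AES-256 setting of Table 3, not the gem's actual wire format (function names are invented):

require 'zlib'
require 'openssl'

# compress a chunk with Zlib, then encrypt it with AES-256-CBC
def pack_chunk(data, key)
  compressed = Zlib::Deflate.deflate(data)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv                     # fresh IV per chunk
  [iv, cipher.update(compressed) + cipher.final]
end

# reverse the two steps on the receiving side
def unpack_chunk(iv, payload, key)
  decipher = OpenSSL::Cipher.new('aes-256-cbc')
  decipher.decrypt
  decipher.key = key
  decipher.iv = iv
  Zlib::Inflate.inflate(decipher.update(payload) + decipher.final)
end

key = OpenSSL::Random.random_bytes(32)      # 256-bit session key
iv, payload = pack_chunk('ACGT' * 1000, key)
puts unpack_chunk(iv, payload, key).length  # => 4000

Doing this once per chunk, rather than once per sequence, is what keeps the overhead proportionally small in Table 3.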


class MyWorkerManager < WorkManager

  def self.init_work_manager
    # open input fastq file and results as output
    @@fastq_file = FastqFile.new(fastq_file_path)
    @@results = FastqFile.new('results.fastq', 'w+')
  end

  def self.end_work_manager
    # close files on finish
    @@fastq_file.close
    @@results.close
  end

  # this method is called every time a worker
  # needs a new work
  def next_work
    # get next sequence or nil from file
    name, fasta, qual, comments = @@fastq_file.next_seq
    if !name.nil?
      return name, fasta, qual, comments
    else
      return nil
    end
  end

  def work_received(results)
    # write results to disk
    results.each do |name, fasta, qual, comments|
      @@results.write_seq(name, fasta, qual, comments)
    end
  end
end

Algorithm 1


5.4. SCBI_MapReduce is Ready for Grid Computing. It has been shown (Figure 4(a)) that SCBI_MapReduce, and therefore SCBI_Distributed_Blast, can work with homogeneous clusters (the "x86" and "x86 upgraded" clusters) consisting of different types of CPUs. SCBI_MapReduce was also able to deal with a heterogeneous grid consisting of one x86 computer running OSX, one x86 computer running Linux, 24 cores of the "x86" cluster, and 32 cores of the "Superdome" (results not shown). Hence, SCBI_MapReduce can cope with different queue systems (PBS, Slurm) and networks, and can distribute in symmetric multiprocessing machines ("Superdome"), clusters (Figure 4(a)), and heterogeneous Unix-based grids (above).

Other features that enable SCBI_MapReduce, at least theoretically [2, 36, 37], to be used in nearly any type of computer grid are (i) the above-described encryption and compression capabilities; (ii) no need for administrator privileges; (iii) on-demand-only execution; and (iv) minimal hard-disk requirements, since it takes advantage of shared storage only when necessary, making it highly portable to other computer systems. Testing SCBI_MapReduce on "cloud computing" services remains a pending task; however, it is expected to work and to provide benefits related to cost-effectiveness.

5.5. SCBI_Distributed_Blast is a Boosted Version of Blast+. Previous improvements of Blast were performed by skilled programmers and provide parallelised versions tightly bound to one released version. The development of SCBI_Distributed_Blast on top of the SCBI_MapReduce task-farm skeleton removes version bonding and coding challenges, since it can boost, in a core-dependent way (Figure 4(a)), any Blast+ release installed on the scientist's computer, not only the version tested in this study, enabling updates of the Blast+ release while maintaining the same SCBI_Distributed_Blast code.

In contrast to other Blast parallelisations, including mpiBLAST and MapReduce Blast [9], SCBI_Distributed_Blast distributed tasks are seeded with sequence chunks while the database is kept intact. This is because it does not need to copy the sequence database on each worker, since it takes advantage of shared storage. This is also the reason why it provides exactly the same results as the original Blast+ in less time (Figure 4). Another MapReduce approach for Blast, CloudBlast [28], has very poor scalability, since it is optimised for short-read mapping and needs to copy the database on each node, while the speed-up observed with SCBI_Distributed_Blast was linear and appeared to be superscalar in the tested clusters (Figure 4(a)). However, superscalar behaviour was exclusively observed for 2 cores (speed-ups of 3.3 in the "x86" cluster and 2.8 in the "x86 upgraded" cluster), since, taking two cores as reference, the speed-up slope was close to the theoretical speed-up (4.0, 8.1, 16.3, 32.6, 65.0, and 127.6).

Comparing the speed-up of our "boosted" Blast+ with a threaded execution of Blast+ (Figure 4(b)), it can be seen that SCBI_Distributed_Blast can take advantage of all computing capabilities and scales linearly, in contrast to native Blast+ (Figure 4(b)) and the older NCBI-Blast [30]. In conclusion, SCBI_Distributed_Blast illustrates the ease of use and performance of SCBI_MapReduce, opening the way for code modifications that can easily produce scalable, balanced, fault-tolerant, and distributed versions of other Blast-related programs such as PSI-Blast, WU-Blast/AB-Blast, NCBI-Blast, and the like. Furthermore, Blast-based genome annotation processes can take advantage of SCBI_Distributed_Blast with minor changes in the code.

6. Conclusions

This work does not aim at advancing parallelisation technology; it aims to apply the advantages of distribution to bioinformatic tools that are useful, for example, for genomics, giving attractive speed-ups. In a context of continuous development of parallel software, SCBI_MapReduce provides a task-farm skeleton for parallelisation/distribution with features such as fault-tolerance, encryption and compression on the fly, data distribution in chunks, grid-readiness, and flexibility for the integration of new and existing code without being a skilled programmer. In fact, SCBI_MapReduce was designed for researchers with a biological background who find MPI, Hadoop, or Erlang solutions for parallelisation/distribution too complicated. That is why Ruby was selected: it has a shallow learning curve, even for biologists, and easily handles the programming necessities. In the context of genomic studies, one significant advantage is that SCBI_MapReduce enables the reuse of existing sequential code in a commodity parallel/distributed computing environment with little or no code change; SCBI_Distributed_Blast illustrates this.

Results indicate that SCBI_MapReduce scales well; is fault-tolerant; can be used on multicore workstations, clusters, and heterogeneous grids, even where secured connections cannot be established; can use several interconnection networks; and does not need special hardware or virtual machine support. It is also highly portable and should diminish disk space costs in "cloud computing". In conclusion, SCBI_MapReduce, and hence SCBI_Distributed_Blast, are ready, among other uses, for intensive genome analyses and annotations.

class MyWorker < Worker

  # process each obj in received objs
  def process_object(objs)
    # find barcodes
    find_mids(objs)
    return objs
  end
end

Algorithm 2

# get custom worker file path
custom_worker_file = 'my_worker.rb'

# init worker manager
MyWorkerManager.init_work_manager

# use any available ip and first empty port
ip = '0.0.0.0'; port = 0; workers = 4

# launch Manager and start it
manager = Manager.new(ip, port, workers,
                      MyWorkerManager, custom_worker_file)
manager.start_server

Algorithm 3


Appendix

Customisation of the Three Files That Govern SCBI_MapReduce

SCBI_MapReduce consists of a number of files, but in order to customise it for particular needs, users only need to modify the I/O data management methods in the manager file (my_worker_manager.rb), the computation to be distributed in the worker file (my_worker.rb), and the main file (main.rb).

The methods to redefine in my_worker_manager.rb are (i) next_work, which provides new data for workers, or nil if there is no more data available (in the following code, it simply reads one sequence at a time from a fastq file on disk); (ii) self.init_work_manager, which opens I/O data files; (iii) self.end_work_manager, which closes files when finished; and (iv) work_received, which writes results to disk as they are generated. The relevant code for my_worker_manager.rb is shown in Algorithm 1.

Customisation of the worker file (my_worker.rb) includes the redefinition of the process_object method, which contains the function call to find_mids. The function find_mids can be defined by the user in his/her own source code, in a compiled algorithm, or in existing code. The relevant code for my_worker.rb is shown in Algorithm 2.
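As an illustration of how such a function can wrap legacy compiled software, a hypothetical find_mids could simply shell out to an existing binary; the binary name and option below are invented for the example:

# Hypothetical wrapper: find_mids delegates each sequence to an
# existing command-line tool ('mid_scanner' is an invented name),
# so legacy software is distributed without rewriting it.
def find_mids(objs)
  objs.each do |name, fasta, qual, comments|
    system('mid_scanner', '--seq', fasta)
  end
end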

The main program file (main.rb) has to be invoked to launch the distributed job. It can be used as is from the command line as a common Ruby script (ruby main.rb) or as part of more complex code. Skilled users can also modify its code and/or name to enable special features or even to receive user parameters, which is the case when using SCBI_MapReduce for the distribution of an internal part of an algorithm. The number of workers is defined here: at least one for the manager and one for one worker. The relevant code for main.rb is shown in Algorithm 3.

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa and Rocío Bautista for the helpful discussions and the computer resources of the Plataforma Andaluza de Bioinformática of the University of Málaga, Spain. This study was supported by grants from the Spanish MICINN (BIO2009-07490) and Junta de Andalucía (P10-CVI-6075), as well as institutional funding to the research group BIO-114.

References

[1] C. Huttenhower and O. Hofmann, "A quick guide to large-scale genomic data mining," PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.

[2] M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.

[3] D. Patterson, "The trouble with multi-core," IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.

[4] C. Camacho, G. Coulouris, V. Avagyan et al., "BLAST+: architecture and applications," BMC Bioinformatics, vol. 10, article 421, 2009.

[5] S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, "Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment," Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.

[6] H. Lin, X. Ma, W. Feng, and N. F. Samatova, "Coordinating computation and I/O in massively parallel sequence search," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.

[7] T. Nguyen, W. Shi, and D. Ruden, "CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping," BMC Research Notes, vol. 4, article 171, 2011.

[8] T. Rognes, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, article 221, 2011.

[9] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for long sequence alignment," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.

[10] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, vol. 10, no. 11, article R134, 2009.

[11] M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, "Hierarchical parallelization of gene differential association analysis," BMC Bioinformatics, vol. 12, article 374, 2011.

[12] M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, "Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications," in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.

[13] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.

[14] M. K. Chen and K. Olukotun, "The Jrpm system for dynamically parallelizing Java programs," in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.

[15] P. Haller and M. Odersky, "Scala Actors: unifying thread-based and event-based programming," Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.

[16] J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.

[17] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.

[18] L. Dagum and R. Menon, "OpenMP: an industry-standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.

[19] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, "Survey of MapReduce frame operation in bioinformatics," Briefings in Bioinformatics, in press.

[20] R. C. Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics," BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.

[21] J. Lin, "MapReduce is good enough?" Big Data, vol. 1, no. 1, pp. 28–37, 2013.

[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency Computation: Practice and Experience, vol. 17, no. 2–4, pp. 323–356, 2005.

[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.

[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.

[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.

[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.

[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics, in press.

[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.

[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.

[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.

[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.

[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.

[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.

[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.

[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.

[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.

[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

Computational Biology Journal 7

0

5

10

15

20

25

30

0 65 130 195 260 325

Obj

ect c

ount

s

Time (s)

NormalJob shutdownWorker failure

16 workersfailing

16 workers failing

16 workers failing

Whole job stopped

times104

Figure 3 Fault tolerance testing using the real-world sequences in100X chunks on the ldquox86 upgradedrdquo cluster using 128 cores TheldquoNormalrdquo execution occurred without errors The ldquoJob shutdownrdquoincluded a complete job shutdown (indicated by a solid arrow) andthen amanual restartThe ldquoWorker failurerdquo execution included threeshutdowns (indicated by dashed arrows) of 16 workers each duringthe job

performed on Table 1 on the ldquox86 upgradedrdquo cluster using 128cores Sequences were used in 100X chunks and the job wasexecuted three times The first execution occurred withouterrors and took 184 s in finishing the analysis of 261 304reads (objects) Figure 3 shows that this ldquoNormalrdquo executionpresents the expected constant slope The second executionwas to test the unexpected ldquoJob shutdownrdquo that was simulatedwith a manual interruption of the whole job with the jobbeing then manually relaunched It can be seen in Figure 3that the interruption adds a lag time to the job increasingto 239 s which is the time required to finish the analysis ofall sequences The sequence counts were the same than inldquoNormalrdquo indicating that no sequence and no chunk werereanalysed twice Finally the test of recovery after a ldquoWorkerfailurerdquo was performed stopping 16 workers at three differenttime points of the job execution (a total of 48 differentworkers were affected) In this case the manager handlesautomatically the reanalysis of the unfinished chunks and thejob took 300 s Again no sequence was saved twice in theoutput file As expected the output result of the ldquoNormalrdquoexecution and the interrupted executions was exactly thesame (results not shown) In conclusion the fault-toleranceimplementation of SCBI MapReduce is able to handle exe-cution exceptions and broken workers and stopped jobs canbe restarted without the reanalysis of finished tasks

44 Distributed Blast+ Using Chunks of Sequences Thegeneric blastn command blastn -task blastn-short -db myDBfna -query inputfilefna -out outputfilefnawas launched withreal-world sequences as input and a customised database of240MB containing complete bacterial genomes The speed-up achieved by SCBI Distributed Blast compared to thenondistributed execution (using 1 single core) is presented inFigure 4(a) Outputs of all executions were exactly the same

in all cases and also they were identical to the output ofthe original binary Blast+ (results not shown) MoreoverSCBI Distributed Blast was able to cope without modi-fication with Blast+ versions 2223 to 2227

Blast+ is described to have threading capabilities [4]Therefore 10 000 reads of AC SRR069473 were launchedwith native blastn and with SCBI Distributed Blastboth configured to use the 8 cores of a single blade at theldquox86rdquo cluster Figure 4(b) shows that Blast+ did not appearto efficiently parallelise since it started using only 5 coresand rapidly decreased to only one single core taking 224minto finish the task Similar behaviour was confirmed in othercomputers indicating that it seems to be an inherent featureof Blast+ releases In contrast SCBI Distributed Blastused 8 cores all the time and finished in 24min (that isthe reason why 0 CPU are ldquousedrdquo since then) Thereforethe speed-up introduced by SCBI Distributed Blast is 94demonstrating that it performs much better in exploitingmulticore capabilities than the threaded implementationincluded in Blast+

5 Discussion

51 SCBI MapReduce is an Efficient Task-Farm SkeletonSCBI MapReduce customisation is simpler than thatcustomisation of other frameworks such as FastFlow [27]and it does not need compilation of the final code makingit portable to most computers ldquoas isrdquo Its flexibility allowsto include SCBI MapReduce as a coding part of any newalgorithms as well as a wrapper for already existing functionsscripts or compiled software Promising and efficient resultswere provided when used within several of our algorithms(eg SeqTrimNext [httpwwwscbiumaesseqtrimnext]and Full-LengtherNext [httpwwwscbiumaesfulllength-ernext] both are specifically designed to manage sequencesobtained from next-generation sequencing) as well asa wrapper for Blast+ in the SCBI Distributed Blastgem (Figure 4) Therefore SCBI MapReduce seems to besufficiently powerful for most of parallelisation and distri-bution needs concerning new algorithms legacy softwareand existing scripts in most bioinformatics contexts

It has been described that GPU-based and some MPI-based parallelisations lack of good scalability when dealingwith rapidly growing sequence data whileMapReduce seemsto perform better in those settings [9] That could explainwhy SCBI MapReduce skeleton shows a speed-up of 31-foldfor 32 cores and 59-fold for 64 cores even with sequencedata (Figure 2(a)) This performance is better than the onedisplayed by the R package pR where 32 cores providespeedups of 20ndash27-fold depending on the process [25]Several design reasons can also be invoked to explain suchan efficiency [34] (i) disk IO operations are reduced tominimum (data are read only at the beginning and resultsare saved only at the end) (ii) absence of asymmetry impact(Figure 1(b)) (iii) the manager overhead is limited whenusing more than 2 cores and chunks of sequences (Tables 2and 3) and (iv) longer tasks increased the efficiency becausethe manager is on standby most of the time while waiting

8 Computational Biology Journal

0

20

40

60

0 10 20 30 40

Spee

d-up

Number of cores

Theoreticalx86x86 upgraded

(a)

0

2

4

6

8

10

0 40 80 120 160 200

Num

ber o

f cor

es

Time (min)

Blast+SCBI distributed blast

(b)

Figure 4 Behaviour of SCBI Distributed Blast (a) Blast+ speed-up in chunks of 100X in two different clusters using both differentnetwork protocols and queue systems Theoretical speed-up corresponds to the one that equals the number of cores used Speed-up wascalculated dividing the time spent using 1 core by the time of the corresponding number of cores The following execution times were usedfor 50 000 reads fromAC SRR069473 in the ldquox86rdquo cluster 258 h (92 880 s 1 core) 27 600 s (2 cores) 13 980 s (4 cores) 6960 s (8 cores) 3540 s(16 cores) and 1740 s (32 cores) for the 261 304 reads of AC SRR069473 in the ldquox86 upgradedrdquo cluster 886 h (318 960 s 1 core) 115 161 s (2cores) 56 385 s (4 cores) 28 180 s (8 cores) 14 123 s (16 cores) and 7068 s (32 cores) (b)Threaded Blast+ and SCBI Distributed Blast usedifferently the 8 cores available in the same computer Blast+ was executed with the -num threads 8 option and SCBI Distributed Blastwas executed with the minus119908 8 option using chunks of 100X by default in the ldquox86rdquo cluster

for the workers to finish avoiding relaunching of internal orexternal programs for brief executions

SCBI MapReduce includes implementation of errorhandling and job checkpointingmethods It has been demon-strated (Figure 3) that data chunks from a broken worker oreven a job shutdown can be relaunched in a running workerThis provides robustness and fault-tolerance guarantees safelong-lasting executions and provides for preserving com-putational resources since it avoids processing objects thathave already been processed Such properties will serve tosave time and make the job execution traceable at any timeTherefore SCBI MapReduce represents another step in thedirection of programming environments with the task-farmskeleton concept

52 Distribution in Chunks is More Efficient SCBIMapReduce was intended to deal with problems thatinvolve processing a huge number of small sequences (asin high-throughput sequencing or RNA-Seq experiments)Results showed that splitting datasets into small chunksyields a better speed-up than sending sequences one by oneor in big chunks (Figure 2(a)) An analogous idea has alreadybeen reported in mpiBLAST [35] but for database segmenta-tion instead of sequence groupingTherefore grouping reads

in chunks appear to be another way to provide speed-upalways taking into account that big chunks could be detri-mental when the number of chunks produced is not divisibleby the number of workers used (see 2000X in Figure 2)

Since chunk sizes of 100X and 250X perform similarly(Figure 2) a chunk size of 100X can suit well as default valueeven if the optimal chunk size has not been assessed takinginto account the number of cores and the number of objectsto split in chunks It could be then hypothesised that theuse of chunks may reduce the manager surcharge (Tables2 and 3) Figure 4(a) shows that speed-up could achievesuperscalar behaviour using chunks combined with distri-bution although this is dependent on the task performedby the worker (Blast+ in this instance) and not on thecapabilities of SCBI MapReduce In conclusion the use ofchunks provides an improved overall performance

53The Added Value of Compression and Encryption Capabil-ity In distributed grids or the cloud encrypted connectionscannot be always established for data privacy and datacompression can accelerate any transfer particularly in lowbandwidth connectionsThe overhead introduced by encryp-tion and compression is particularly evident when data areprocessed one-by-one (Figure 2(b) open triangles) since the

Computational Biology Journal 9

class MyWorkerManager lt WorkManager

def selfinit work manager

open input fastq file and results as output

fastq file=FastqFilenew(fastq file path)

results=FastqFilenew( lsquo resultsfastq rsquo lsquo w+ rsquo )end

def selfend work manager

close files on finish

fastq fileclose

resultsclose

end

this method is called every time a worker

needs a new work

def next work

get next sequence or nil from file

namefastaqualcomments=fastq filenext seq

if namenil

return namefastaqualcomments

else

return nil

end

end

def work received(results)

write results to disk

resultseach do |namefastaqualcomments|

resultswrite seq(namefastaqualcomments)

end

end

end

Algorithm 1

use of more andmore cores did not significantly speed up theprocess But compression and encryption overhead becomeacceptable when the dataset is split into chunks (compareslopes in Figure 2 and execution times in Tables 1 and 3)Encryption capability per chunks should be enabled onlywhen untrusted networks are involved in distributed jobsCompression per chunks could be envisaged when using lowbandwidth networks (eg in some grids [2]) provided thatcompressed data transfer is faster than the time spent incompressing data As a result SCBI MapReduce can be usedon grids with confidential data when encrypted connectionscannot be established

54 SCBI MapReduce is Ready for Grid Computing It hasbeen shown (Figure 4(a)) that SCBI MapReduce and there-fore SCBI Distributed Blast could work with homo-geneous clusters (the ldquox86rdquo and ldquox86 upgradedrdquo clusters)consisting of different types of CPUs It has been tested thatSCBI MapReduce was also able to deal with one heteroge-neous grid consisting of one x86 computer using OSX onex86 computer Linux 24 cores of the ldquox86rdquo cluster and 32cores of the ldquoSuperdomerdquo (results are not shown) HenceSCBI MapReduce can cope with different queue systems(PBS Slurm) and networks and can distribute in sym-metric multiprocessing machines (ldquoSuperdomerdquo) clusters(Figure 4(a)) and heterogeneous Unix-based grids (above)

Other features that enable SCBI MapReduce at least the-oretically [2 36 37] to be used in nearly any type ofcomputer grid are (i) the above described encryption andcompression capabilities (ii) lack of administrator privileges(iii) the case that running is on-demand only and (iv)minimal requirement of hard disk since it takes advantage ofshared storage onlywhennecessarymaking it highly portableto other computer systems Testing SCBI MapReduce inldquocloud computingrdquo services remains a pending task howeverit is expected that it should work and provide benefits relatedto cost-effectiveness

55 SCBI Distributed Blast is a Boosted Version of Blast+Previous improvements of Blast were performed byskilled programmers and provide parallelised versionstightly bonded to one released version The development ofSCBI Distributed Blast based on the SCBI MapReducetask-farm skeleton comes to remove version bonding andcoding challenges since it can boost in a core-dependent way(Figure 4(a)) any Blast+ release installed on the scientistcomputer not only the version tested in this study enablingthe update of the Blast+ release while maintaining the sameSCBI Distributed Blast code

In contrast to other Blast parallelisations includ-ing mpiBLAST and MapReduce Blast [9] SCBIDistributed Blast distributed tasks are seeded with

10 Computational Biology Journal

sequence chunks while maintaining intact the database Thisis because it does not need to copy the sequence database oneach worker since it takes advantage of shared storageThis isalso the reason why it provides exactly the same results as theoriginal Blast+ in less time (Figure 4) Another MapReduceapproach for Blast CloudBlast [9] has very poorscalability since it is optimised for short reads mapping andneeds to copy the database on each node while the speed-upobserved with SCBI Distributed Blast was linear andappeared to be superscalar in tested clusters (Figure 4(a))However superscalar behaviour was exclusively observed for2 cores (speed-ups of 33 in the ldquox86rdquo cluster and 28 in theldquox86 upgradedrdquo cluster) since taking two cores as referencethe speed-up slope was close to the theoretical speed-up (4081 163 326 650 and 1276)

Comparing the speed-up of our ldquoboostedrdquo Blast+ anda threaded execution of Blast+ (Figure 4(b)) it can beseen that SCBI Distributed Blast can take advantage ofall computing capabilities and scales linearly in contrast tonative Blast+ (Figure 4(b)) and the older NCBI-Blast [30]In conclusion SCBI Distributed Blast illustrates the ease-of-use and performance of SCBI MapReduce opening theway for code modifications that can easily produce scalablebalanced fault-tolerant and distributed versions of otherBlast-related programs like PSI-Blast WU-BlastAB-Blast NCBI-Blast and the like Furthermore Blast-based genome annotation processes can take advantage ofSCBI Distributed Blast with minor changes in the code

6 Conclusions

This work does not aim at advancing parallelisation technol-ogy but it aims to apply distribution advantages in the use ofbioinformatic tools that are useful for example for genomicsgiving attractive speedups In a context of continuous devel-opment of parallel software SCBI MapReduce provides atask-farm skeleton for parallelisationdistribution with fea-tures such as fault-tolerance encryption and compression on-the-fly data distribution in chunks grid-readiness and flexi-bility for integration of new and existing code without being askilled programmer In fact SCBI MapReducewas designedfor researchers with a biological background that considercomplicated MPI Hadoop or Erlang solutions for paralleli-sationdistributionThat is why Ruby was selected since it hasa shallow learning curve even for biologists and easily man-ages the programming necessities In the context of genomicstudies one significant advantage is that SCBI MapReduceenables to reuse in a commodity paralleldistributed com-puting environment existing sequential code with little or nocode changes SCBI Distributed Blast can illustrate this

Results indicate that SCBI MapReduce scales well isfault-tolerant can be used on multicore workstations clus-ters and heterogeneous grids even where secured connec-tions cannot be established can use several interconnectionnetworks and does not need special hardware or virtualmachine support It is also highly portable and shall diminishthe disk space costs in ldquocloud computingrdquo In conclusionSCBI MapReduce andhence SCBI Distributed Blast are

class MyWorker lt Worker

process each obj in received objs

def process object(objs)

find barcodes

find mids(objs)

return objs

end

end

Algorithm 2

get custom worker file path

custom worker file = lsquo my workerrb rsquo init worker manager

MyWorkerManagerinit work manager

use any available ip and first empty port

ip= lsquo 0000 rsquo port=0 workers = 4

launch Manager and start it

manager = Managernew(ipport workers

MyWorkerManagercustom worker file)

managerstart server

Algorithm 3

ready among other uses for intensive genome analyses andannotations

Appendix

Customisation of the Three Files That GovernSCBI_MapReduce

SCBI MapReduce consists of a number of files but inorder to be customised for particular needs users onlyneed to modify the IO data management methods at themanager file (my worker managerrb) the computation tobe distributed at the worker file (my workerrb) and themain file (mainrb)

Themethods to redefine in my worker managerrb are(i) next work that provides new data for workers or nil ifthere is no more data available (in the following code itsimply reads one sequence at a time from a fastq file ondisk) (ii) selfinit work manager that opens IO data files(iii) selfend work manager that closes files when finishedand (iv) work received that writes results on disk as they aregenerated The relevant code for my worker managerrb is(see Algorithm 1)

Customisation of the worker file (my workerrb)includes redefinition of the process object method thatcontains the function call to find mids The functionfind mids can be defined by the user in hisher own sourcecode or a compiled algorithm or an existing code Therelevant code for my workerrb is (see Algorithm 2)

The main program file (mainrb) has to be invoked tolaunch the distributed job It can be used as it is from the

Computational Biology Journal 11

command line as a common Ruby script (ruby mainrb)or as a part of a more complex code Skilled users can alsomodify its code andor name to enable special features oreven receive user parameters which is the case when usingSCBI MapReduce for distribution of an internal part of analgorithmThenumber of workers is defined here at least onefor the manager and one for one worker The relevant codefor mainrb is (see Algorithm 3)

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa andRocıo Bautista for the helpful discussions and the computerresources of the Plataforma Andaluza de Bioinformatica ofthe University of Malaga Spain This study was supportedby Grants from the Spanish MICINN (BIO2009-07490) andJunta de Andalucıa (P10-CVI-6075) as well as institutionalfunding to the research group BIO-114

References

[1] C Huttenhower and O Hofmann ldquoA quick guide to large-scalegenomic data miningrdquo PLoS Computational Biology vol 6 no5 Article ID e1000779 2010

[2] M C Schatz B Langmead and S L Salzberg ldquoCloud comput-ing and the DNA data racerdquoNature Biotechnology vol 28 no 7pp 691ndash693 2010

[3] D Patterson ldquoThe trouble withmulti-corerdquo IEEE Spectrum vol47 no 7 pp 28ndash53 2010

[4] C Camacho G Coulouris V Avagyan et al ldquoBLAST+ archi-tecture and applicationsrdquo BMC Bioinformatics vol 10 article421 2009

[5] S Galvez D Dıaz P Hernandez F J Esteban J A Caballeroand G Dorado ldquoNext-generation bioinformatics using many-core processor architecture to develop a web service forsequence alignmentrdquo Bioinformatics vol 26 no 5 pp 683ndash6862010

[6] H Lin XMaW Feng andN F Samatova ldquoCoordinating com-putation and IO in massively parallel sequence searchrdquo IEEETransactions on Parallel and Distributed Systems vol 22 no 4pp 529ndash543 2011

[7] T NguyenW Shi andD Ruden ldquoCloudAligner a fast and full-featured MapReduce based tool for sequence mappingrdquo BMCResearch Notes vol 4 article 171 2011

[8] T Rognes ldquoFaster Smith-Waterman database searches withinter-sequence SIMD parallelisationrdquo BMC Bioinformatics vol12 article 221 2011

[9] X-L Yang Y-L Liu C-F Yuan and Y-H Huang ldquoParalleliza-tion of BLAST with MapReduce for long sequence alignmentrdquoin Proceedings of the 4th International Symposium on ParallelArchitectures Algorithms and Programming (PAAP rsquo11) pp 241ndash246 IEEE Computer Society December 2011

[10] B Langmead M C Schatz J Lin M Pop and S L SalzbergldquoSearching for SNPs with cloud computingrdquo Genome Biologyvol 10 no 11 article R134 2009

[11] M Needham R Hu S Dwarkadas and X Qiu ldquoHierarchicalparallelization of gene differential association analysisrdquo BMCBioinformatics vol 12 article 374 2011

[12] M K Gardner W-C Feng J Archuleta H Lin and XMal ldquoParallel genomic sequence-searching on an ad-hoc gridexperiences lessons learned and implicationsrdquo in Proceedingsof the ACMIEEE Conference on High Performance Networkingand Computing vol 1 pp 1ndash14 2006

[13] L Yu CMoretti AThrasher S Emrich K Judd and DThainldquoHarnessing parallelism inmulticore clusters with the All-PairsWavefront andMakeflow abstractionsrdquoCluster Computing vol13 no 3 pp 243ndash256 2010

[14] MK Chen andKOlukotun ldquoThe Jrpm system for dynamicallyparallelizing Java programsrdquo in Proceedings of the 30th AnnualInternational Symposium on Computer Architecture (ISCA rsquo03)pp 434ndash445 San Diego Calif USA June 2003

[15] P Haller and M Odersky ldquoScala Actors unifying thread-basedand event-based programmingrdquo Theoretical Computer Sciencevol 410 no 2-3 pp 202ndash220 2009

[16] J Armstrong R Virding C Wikstrom and M Williams Con-current Programming in ERLANG Prentice Hall 2nd edition1996

[17] WGropp E Lusk andA SkjellumUsingMPI Portable ParallelProgramming with the Message-Passing Interface MIT PressCambridge Mass USA 2nd edition 1999

[18] L Dagum and R Menon ldquoOpenmp an industry-standardapi for shared-memory programmingrdquo IEEEComputationalScience amp Engineering vol 5 no 1 pp 46ndash55 1998

[19] Q Zou X-B Li W-R Jiang Z-Y Lin G-L Li and KChen ldquoSurvey ofmapreduce frameoperation inbioinformaticsrdquoBriefings in Bioinformatics In press

[20] R C Taylor ldquoAn overview of the hadoopmapreducehbaseframework and its current applications in bioinformaticsrdquo BMCBioinformatics vol 11 supplement 12 p S1 2010

[21] J Lin ldquoMapreduce is good enoughrdquo Big Data vol 1 no 1 pp28ndash37 2013

[22] D Thain T Tannenbaum and M Livny ldquoDistributed comput-ing in practice the Condor experiencerdquoConcurrency Computa-tion Practice and Experience vol 17 no 2-4 pp 323ndash356 2005

[23] S Pellicer G Chen K C C Chan and Y Pan ldquoDistributedsequence alignment applications for the public computingarchitecturerdquo IEEE Transactions on Nanobioscience vol 7 no1 pp 35ndash43 2008

[24] J Hill M Hambley T Forster et al ldquoSPRINT a new parallelframework for Rrdquo BMC Bioinformatics vol 9 article 558 2008

[25] J Li XMa S YoginathG Kora andN F Samatova ldquoTranspar-ent runtime parallelization of the R scripting languagerdquo Journalof Parallel and Distributed Computing vol 71 no 2 pp 157ndash1682011

[26] F Berenger C Coti and K Y J Zhang ldquoPAR a PARallel anddistributed job crusherrdquoBioinformatics vol 26 no 22 pp 2918ndash2919 2010

[27] M Aldinucci M Torquati C Spampinato et al ldquoParallelstochastic systems biology in the cloudrdquo Briefings in Bioinfor-matics In press

[28] A Matsunaga M Tsugawa and J Fortes ldquoCloudBLAST com-bining MapReduce and virtualization on distributed resourcesfor bioinformatics applicationsrdquo in Proceedings of the 4th IEEEInternational Conference on eScience (eScience rsquo08) pp 222ndash229 IEEEComputer SocietyWashington DC USA December2008

12 Computational Biology Journal

[29] W Lu J Jackson and R Barga ldquoAzureBlast a case study ofdeveloping science applications on the cloudrdquo in Proceedingsof the 19th ACM International Symposium on High Perfor-mance Distributed Computing (HPDC rsquo10) pp 413ndash420 ACMChicago Ill USA June 2010

[30] P D Vouzis and N V Sahinidis ldquoGPU-BLAST using graphicsprocessors to accelerate protein sequence alignmentrdquo Bioinfor-matics vol 27 no 2 pp 182ndash188 2011

[31] C S Oehmen and D J Baxter ldquoScalablast 20 rapid and robustblast calculations on multiprocessor systemsrdquo Bioinformaticsvol 29 no 6 pp 797ndash798 2013

[32] J Aerts and A Law ldquoAn introduction to scripting in Ruby forbiologistsrdquo BMC Bioinformatics vol 10 article 221 2009

[33] S Balakrishnan R RajwarMUpton andK Lai ldquoThe impact ofperformance asymmetry in emerging multicore architecturesrdquoSIGARCH Computer Architecture News vol 33 no 2 pp 506ndash517 2005

[34] L Jostins and J Jaeger ldquoReverse engineering a gene networkusing an asynchronous parallel evolution strategyrdquo BMC Sys-tems Biology vol 4 article 17 2010

[35] O Thorsen B Smith C P Sosa et al ldquoParallel genomicsequence-search on a massively parallel systemrdquo in Proceedingsof the 4th Conference on Computing Frontiers (CF rsquo07) pp 59ndash68 Ischia Italy May 2007

[36] M Armbrust A Fox R Griffith et al ldquoA view of cloudcomputingrdquo Communications of the ACM vol 53 no 4 pp 50ndash58 2010

[37] C-L Hung and Y-L Lin ldquoImplementation of a parallel proteinstructure alignment service on cloudrdquo International Journal ofGenomics vol 2013 Article ID 439681 8 pages 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

8 Computational Biology Journal

0

20

40

60

0 10 20 30 40

Spee

d-up

Number of cores

Theoreticalx86x86 upgraded

(a)

0

2

4

6

8

10

0 40 80 120 160 200

Num

ber o

f cor

es

Time (min)

Blast+SCBI distributed blast

(b)

Figure 4 Behaviour of SCBI Distributed Blast (a) Blast+ speed-up in chunks of 100X in two different clusters using both differentnetwork protocols and queue systems Theoretical speed-up corresponds to the one that equals the number of cores used Speed-up wascalculated dividing the time spent using 1 core by the time of the corresponding number of cores The following execution times were usedfor 50 000 reads fromAC SRR069473 in the ldquox86rdquo cluster 258 h (92 880 s 1 core) 27 600 s (2 cores) 13 980 s (4 cores) 6960 s (8 cores) 3540 s(16 cores) and 1740 s (32 cores) for the 261 304 reads of AC SRR069473 in the ldquox86 upgradedrdquo cluster 886 h (318 960 s 1 core) 115 161 s (2cores) 56 385 s (4 cores) 28 180 s (8 cores) 14 123 s (16 cores) and 7068 s (32 cores) (b)Threaded Blast+ and SCBI Distributed Blast usedifferently the 8 cores available in the same computer Blast+ was executed with the -num threads 8 option and SCBI Distributed Blastwas executed with the minus119908 8 option using chunks of 100X by default in the ldquox86rdquo cluster

for the workers to finish avoiding relaunching of internal orexternal programs for brief executions

SCBI MapReduce includes implementation of errorhandling and job checkpointingmethods It has been demon-strated (Figure 3) that data chunks from a broken worker oreven a job shutdown can be relaunched in a running workerThis provides robustness and fault-tolerance guarantees safelong-lasting executions and provides for preserving com-putational resources since it avoids processing objects thathave already been processed Such properties will serve tosave time and make the job execution traceable at any timeTherefore SCBI MapReduce represents another step in thedirection of programming environments with the task-farmskeleton concept

52 Distribution in Chunks is More Efficient SCBIMapReduce was intended to deal with problems thatinvolve processing a huge number of small sequences (asin high-throughput sequencing or RNA-Seq experiments)Results showed that splitting datasets into small chunksyields a better speed-up than sending sequences one by oneor in big chunks (Figure 2(a)) An analogous idea has alreadybeen reported in mpiBLAST [35] but for database segmenta-tion instead of sequence groupingTherefore grouping reads

in chunks appear to be another way to provide speed-upalways taking into account that big chunks could be detri-mental when the number of chunks produced is not divisibleby the number of workers used (see 2000X in Figure 2)

Since chunk sizes of 100X and 250X perform similarly (Figure 2), a chunk size of 100X can suit well as a default value, even if the optimal chunk size has not been assessed taking into account the number of cores and the number of objects to split in chunks. It can then be hypothesised that the use of chunks reduces the manager overhead (Tables 2 and 3). Figure 4(a) shows that speed-up can achieve superscalar behaviour when chunks are combined with distribution, although this depends on the task performed by the worker (Blast+ in this instance) and not on the capabilities of SCBI_MapReduce. In conclusion, the use of chunks provides an improved overall performance, as the sketch below illustrates.
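The grouping itself is straightforward; the following sketch (illustrative only, using Ruby's standard each_slice) shows the default 100-sequence chunking and why the chunk count should divide evenly among the workers:

CHUNK_SIZE = 100  # the 100X default discussed above

# group a list of sequences into chunks of 100
def in_chunks(sequences, size = CHUNK_SIZE)
  sequences.each_slice(size).to_a
end

chunks = in_chunks((1..4000).to_a)
puts chunks.size  # => 40; on 32 workers the final round occupies only 8 of
                  # them, which is why overly big chunks can waste cores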

5.3. The Added Value of Compression and Encryption Capability. In distributed grids or the cloud, encrypted connections cannot always be established for data privacy, and data compression can accelerate any transfer, particularly over low-bandwidth connections. The overhead introduced by encryption and compression is particularly evident when data are processed one-by-one (Figure 2(b), open triangles), since the use of more and more cores did not significantly speed up the process. But the compression and encryption overhead becomes acceptable when the dataset is split into chunks (compare slopes in Figure 2 and execution times in Tables 1 and 3). Encryption per chunks should be enabled only when untrusted networks are involved in distributed jobs. Compression per chunks could be envisaged when using low-bandwidth networks (e.g., in some grids [2]), provided that compressed data transfer is faster than the time spent in compressing the data. As a result, SCBI_MapReduce can be used on grids with confidential data when encrypted connections cannot be established.

class MyWorkerManager < WorkManager
  def self.init_work_manager
    # open input fastq file and results as output
    @@fastq_file = FastqFile.new(fastq_file_path)
    @@results = FastqFile.new('./results.fastq', 'w+')
  end

  def self.end_work_manager
    # close files on finish
    @@fastq_file.close
    @@results.close
  end

  # this method is called every time a worker needs a new work
  def next_work
    # get next sequence or nil from file
    name, fasta, qual, comments = @@fastq_file.next_seq
    if !name.nil?
      return name, fasta, qual, comments
    else
      return nil
    end
  end

  def work_received(results)
    # write results to disk
    results.each do |name, fasta, qual, comments|
      @@results.write_seq(name, fasta, qual, comments)
    end
  end
end

Algorithm 1: The relevant code of my_worker_manager.rb.
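The interplay of the methods in Algorithm 1 can be summarised schematically (this is only an illustration of the task-farm data flow, not the gem's real dispatch code, which sends chunks to remote workers over sockets): the manager hands out work until next_work returns nil, and every batch of results coming back is passed to work_received.

# schematic data flow, assuming the classes of Algorithms 1 and 2 are loaded
# and that find_mids is defined in the user's code
MyWorkerManager.init_work_manager
manager = MyWorkerManager.new
worker = MyWorker.new

loop do
  work = manager.next_work             # nil signals that input is exhausted
  break if work.nil?
  results = worker.process_object([work])
  manager.work_received(results)       # results are written as they arrive
end

MyWorkerManager.end_work_manager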

5.4. SCBI_MapReduce Is Ready for Grid Computing. It has been shown (Figure 4(a)) that SCBI_MapReduce, and therefore SCBI_Distributed_Blast, could work with homogeneous clusters (the “x86” and “x86 upgraded” clusters) consisting of different types of CPUs. It has been tested that SCBI_MapReduce was also able to deal with one heterogeneous grid consisting of one x86 computer running OSX, one x86 computer running Linux, 24 cores of the “x86” cluster, and 32 cores of the “Superdome” (results not shown). Hence, SCBI_MapReduce can cope with different queue systems (PBS, Slurm) and networks, and can distribute in symmetric multiprocessing machines (“Superdome”), clusters (Figure 4(a)), and heterogeneous Unix-based grids (above).

Other features that enable SCBI_MapReduce, at least theoretically [2, 36, 37], to be used in nearly any type of computer grid are (i) the above-described encryption and compression capabilities; (ii) no need for administrator privileges; (iii) the fact that it runs on-demand only; and (iv) a minimal hard-disk requirement, since it takes advantage of shared storage only when necessary, making it highly portable to other computer systems. Testing SCBI_MapReduce in “cloud computing” services remains a pending task; however, it is expected that it should work and provide benefits related to cost-effectiveness.

5.5. SCBI_Distributed_Blast Is a Boosted Version of Blast+. Previous improvements of Blast were performed by skilled programmers and provide parallelised versions tightly bound to one released version. The development of SCBI_Distributed_Blast, based on the SCBI_MapReduce task-farm skeleton, removes that version bonding and the coding challenges, since it can boost in a core-dependent way (Figure 4(a)) any Blast+ release installed on the scientist's computer, not only the version tested in this study, enabling updates of the Blast+ release while maintaining the same SCBI_Distributed_Blast code.

In contrast to other Blast parallelisations, including mpiBLAST and MapReduce Blast [9], SCBI_Distributed_Blast distributed tasks are seeded with sequence chunks while maintaining the database intact. This is because it does not need to copy the sequence database on each worker, since it takes advantage of shared storage. This is also the reason why it provides exactly the same results as the original Blast+ in less time (Figure 4). Another MapReduce approach for Blast, CloudBlast [9], has very poor scalability, since it is optimised for short-read mapping and needs to copy the database on each node, while the speed-up observed with SCBI_Distributed_Blast was linear and appeared to be superscalar in the tested clusters (Figure 4(a)). However, superscalar behaviour was exclusively observed for 2 cores (speed-ups of 3.3 in the “x86” cluster and 2.8 in the “x86 upgraded” cluster), since, taking two cores as reference, the speed-up slope was close to the theoretical speed-up (0.40, 0.81, 1.63, 3.26, 6.50, and 12.76).

Comparing the speed-up of our “boosted” Blast+ and a threaded execution of Blast+ (Figure 4(b)), it can be seen that SCBI_Distributed_Blast can take advantage of all computing capabilities and scales linearly, in contrast to native Blast+ (Figure 4(b)) and the older NCBI-Blast [30]. In conclusion, SCBI_Distributed_Blast illustrates the ease-of-use and performance of SCBI_MapReduce, opening the way for code modifications that can easily produce scalable, balanced, fault-tolerant, and distributed versions of other Blast-related programs like PSI-Blast, WU-Blast/AB-Blast, NCBI-Blast, and the like. Furthermore, Blast-based genome annotation processes can take advantage of SCBI_Distributed_Blast with minor changes in the code.

6. Conclusions

This work does not aim at advancing parallelisation technology; rather, it aims to bring the advantages of distribution to bioinformatic tools that are useful, for example, for genomics, giving attractive speed-ups. In a context of continuous development of parallel software, SCBI_MapReduce provides a task-farm skeleton for parallelisation/distribution with features such as fault-tolerance, encryption and compression, on-the-fly data distribution in chunks, grid-readiness, and flexibility for the integration of new and existing code without being a skilled programmer. In fact, SCBI_MapReduce was designed for researchers with a biological background who find MPI, Hadoop, or Erlang solutions for parallelisation/distribution too complicated. That is why Ruby was selected, since it has a shallow learning curve, even for biologists, and easily manages the programming necessities. In the context of genomic studies, one significant advantage is that SCBI_MapReduce enables the reuse of existing sequential code in a commodity parallel/distributed computing environment with little or no code changes. SCBI_Distributed_Blast illustrates this.

Results indicate that SCBI_MapReduce scales well, is fault-tolerant, can be used on multicore workstations, clusters, and heterogeneous grids, even where secured connections cannot be established, can use several interconnection networks, and does not need special hardware or virtual machine support. It is also highly portable and shall diminish the disk space costs in “cloud computing”. In conclusion, SCBI_MapReduce, and hence SCBI_Distributed_Blast, are ready, among other uses, for intensive genome analyses and annotations.

class MyWorker < Worker
  # process each obj in received objs
  def process_object(objs)
    # find barcodes
    find_mids(objs)
    return objs
  end
end

Algorithm 2: The relevant code of my_worker.rb.

# get custom worker file path
custom_worker_file = 'my_worker.rb'

# init worker manager
MyWorkerManager.init_work_manager

# use any available ip and first empty port
ip = '0.0.0.0'; port = 0; workers = 4

# launch Manager and start it
manager = Manager.new(ip, port, workers, MyWorkerManager, custom_worker_file)
manager.start_server

Algorithm 3: The relevant code of main.rb.

Appendix

Customisation of the Three Files That Govern SCBI_MapReduce

SCBI_MapReduce consists of a number of files, but in order to customise it for particular needs, users only need to modify the I/O data management methods in the manager file (my_worker_manager.rb), the computation to be distributed in the worker file (my_worker.rb), and the main file (main.rb).

The methods to redefine in my_worker_manager.rb are (i) next_work, which provides new data for workers, or nil if there is no more data available (in the code shown, it simply reads one sequence at a time from a fastq file on disk); (ii) self.init_work_manager, which opens the I/O data files; (iii) self.end_work_manager, which closes the files when finished; and (iv) work_received, which writes results to disk as they are generated. The relevant code for my_worker_manager.rb is shown in Algorithm 1.

Customisation of the worker file (my_worker.rb) includes the redefinition of the process_object method, which contains the function call to find_mids. The function find_mids can be defined by users in their own source code, a compiled algorithm, or an existing code. The relevant code for my_worker.rb is shown in Algorithm 2.
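Because process_object can call anything, a legacy command-line binary can also be wrapped without modifying its source, which is how unmodified programs get distributed. A minimal sketch follows (my_legacy_tool and the chunk format are hypothetical placeholders):

require 'tempfile'

class MyWorker < Worker
  # write each received chunk to a temporary FASTA file, run an unmodified
  # external binary on it, and return the tool's output as the result
  def process_object(seqs)
    input = Tempfile.new(['chunk', '.fasta'])
    seqs.each { |name, fasta| input.puts(">#{name}\n#{fasta}") }
    input.flush
    output = `my_legacy_tool #{input.path}`  # placeholder legacy binary
    input.close!
    output
  end
end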

The main program file (main.rb) has to be invoked to launch the distributed job. It can be used as it is from the command line, as a common Ruby script (ruby main.rb), or as a part of a more complex code. Skilled users can also modify its code and/or name to enable special features or even to receive user parameters, which is the case when using SCBI_MapReduce for distribution of an internal part of an algorithm. The number of workers is defined here: at least one for the manager and one for one worker. The relevant code for main.rb is shown in Algorithm 3.
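For instance, a user-parameterised variant of main.rb (an illustrative modification, not part of the published code) could read the number of workers from the command line:

# e.g., ruby main.rb 16 launches sixteen workers; the default is 4
MyWorkerManager.init_work_manager
workers = (ARGV[0] || 4).to_i
manager = Manager.new('0.0.0.0', 0, workers, MyWorkerManager, 'my_worker.rb')
manager.start_server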

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa and Rocío Bautista for the helpful discussions and the computer resources of the Plataforma Andaluza de Bioinformática of the University of Málaga, Spain. This study was supported by grants from the Spanish MICINN (BIO2009-07490) and Junta de Andalucía (P10-CVI-6075), as well as institutional funding to the research group BIO-114.

References

[1] C. Huttenhower and O. Hofmann, "A quick guide to large-scale genomic data mining," PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.
[2] M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.
[3] D. Patterson, "The trouble with multi-core," IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.
[4] C. Camacho, G. Coulouris, V. Avagyan et al., "BLAST+: architecture and applications," BMC Bioinformatics, vol. 10, article 421, 2009.
[5] S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, "Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment," Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.
[6] H. Lin, X. Ma, W. Feng, and N. F. Samatova, "Coordinating computation and I/O in massively parallel sequence search," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.
[7] T. Nguyen, W. Shi, and D. Ruden, "CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping," BMC Research Notes, vol. 4, article 171, 2011.
[8] T. Rognes, "Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation," BMC Bioinformatics, vol. 12, article 221, 2011.
[9] X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, "Parallelization of BLAST with MapReduce for long sequence alignment," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.
[10] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, vol. 10, no. 11, article R134, 2009.
[11] M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, "Hierarchical parallelization of gene differential association analysis," BMC Bioinformatics, vol. 12, article 374, 2011.
[12] M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, "Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications," in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.
[13] L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.
[14] M. K. Chen and K. Olukotun, "The Jrpm system for dynamically parallelizing Java programs," in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.
[15] P. Haller and M. Odersky, "Scala Actors: unifying thread-based and event-based programming," Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.
[16] J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.
[17] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.
[18] L. Dagum and R. Menon, "OpenMP: an industry-standard API for shared-memory programming," IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.
[19] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, "Survey of MapReduce frame operation in bioinformatics," Briefings in Bioinformatics, in press.
[20] R. C. Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics," BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.
[21] J. Lin, "MapReduce is good enough," Big Data, vol. 1, no. 1, pp. 28–37, 2013.
[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency Computation Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.
[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.
[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.
[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.
[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.
[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics, in press.
[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.
[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.
[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.
[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.
[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.
[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.
[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.
[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.
[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

Computational Biology Journal 9

class MyWorkerManager lt WorkManager

def selfinit work manager

open input fastq file and results as output

fastq file=FastqFilenew(fastq file path)

results=FastqFilenew( lsquo resultsfastq rsquo lsquo w+ rsquo )end

def selfend work manager

close files on finish

fastq fileclose

resultsclose

end

this method is called every time a worker

needs a new work

def next work

get next sequence or nil from file

namefastaqualcomments=fastq filenext seq

if namenil

return namefastaqualcomments

else

return nil

end

end

def work received(results)

write results to disk

resultseach do |namefastaqualcomments|

resultswrite seq(namefastaqualcomments)

end

end

end

Algorithm 1

use of more andmore cores did not significantly speed up theprocess But compression and encryption overhead becomeacceptable when the dataset is split into chunks (compareslopes in Figure 2 and execution times in Tables 1 and 3)Encryption capability per chunks should be enabled onlywhen untrusted networks are involved in distributed jobsCompression per chunks could be envisaged when using lowbandwidth networks (eg in some grids [2]) provided thatcompressed data transfer is faster than the time spent incompressing data As a result SCBI MapReduce can be usedon grids with confidential data when encrypted connectionscannot be established

54 SCBI MapReduce is Ready for Grid Computing It hasbeen shown (Figure 4(a)) that SCBI MapReduce and there-fore SCBI Distributed Blast could work with homo-geneous clusters (the ldquox86rdquo and ldquox86 upgradedrdquo clusters)consisting of different types of CPUs It has been tested thatSCBI MapReduce was also able to deal with one heteroge-neous grid consisting of one x86 computer using OSX onex86 computer Linux 24 cores of the ldquox86rdquo cluster and 32cores of the ldquoSuperdomerdquo (results are not shown) HenceSCBI MapReduce can cope with different queue systems(PBS Slurm) and networks and can distribute in sym-metric multiprocessing machines (ldquoSuperdomerdquo) clusters(Figure 4(a)) and heterogeneous Unix-based grids (above)

Other features that enable SCBI MapReduce at least the-oretically [2 36 37] to be used in nearly any type ofcomputer grid are (i) the above described encryption andcompression capabilities (ii) lack of administrator privileges(iii) the case that running is on-demand only and (iv)minimal requirement of hard disk since it takes advantage ofshared storage onlywhennecessarymaking it highly portableto other computer systems Testing SCBI MapReduce inldquocloud computingrdquo services remains a pending task howeverit is expected that it should work and provide benefits relatedto cost-effectiveness

55 SCBI Distributed Blast is a Boosted Version of Blast+Previous improvements of Blast were performed byskilled programmers and provide parallelised versionstightly bonded to one released version The development ofSCBI Distributed Blast based on the SCBI MapReducetask-farm skeleton comes to remove version bonding andcoding challenges since it can boost in a core-dependent way(Figure 4(a)) any Blast+ release installed on the scientistcomputer not only the version tested in this study enablingthe update of the Blast+ release while maintaining the sameSCBI Distributed Blast code

In contrast to other Blast parallelisations includ-ing mpiBLAST and MapReduce Blast [9] SCBIDistributed Blast distributed tasks are seeded with

10 Computational Biology Journal

sequence chunks while maintaining intact the database Thisis because it does not need to copy the sequence database oneach worker since it takes advantage of shared storageThis isalso the reason why it provides exactly the same results as theoriginal Blast+ in less time (Figure 4) Another MapReduceapproach for Blast CloudBlast [9] has very poorscalability since it is optimised for short reads mapping andneeds to copy the database on each node while the speed-upobserved with SCBI Distributed Blast was linear andappeared to be superscalar in tested clusters (Figure 4(a))However superscalar behaviour was exclusively observed for2 cores (speed-ups of 33 in the ldquox86rdquo cluster and 28 in theldquox86 upgradedrdquo cluster) since taking two cores as referencethe speed-up slope was close to the theoretical speed-up (4081 163 326 650 and 1276)

Comparing the speed-up of our ldquoboostedrdquo Blast+ anda threaded execution of Blast+ (Figure 4(b)) it can beseen that SCBI Distributed Blast can take advantage ofall computing capabilities and scales linearly in contrast tonative Blast+ (Figure 4(b)) and the older NCBI-Blast [30]In conclusion SCBI Distributed Blast illustrates the ease-of-use and performance of SCBI MapReduce opening theway for code modifications that can easily produce scalablebalanced fault-tolerant and distributed versions of otherBlast-related programs like PSI-Blast WU-BlastAB-Blast NCBI-Blast and the like Furthermore Blast-based genome annotation processes can take advantage ofSCBI Distributed Blast with minor changes in the code

6 Conclusions

This work does not aim at advancing parallelisation technol-ogy but it aims to apply distribution advantages in the use ofbioinformatic tools that are useful for example for genomicsgiving attractive speedups In a context of continuous devel-opment of parallel software SCBI MapReduce provides atask-farm skeleton for parallelisationdistribution with fea-tures such as fault-tolerance encryption and compression on-the-fly data distribution in chunks grid-readiness and flexi-bility for integration of new and existing code without being askilled programmer In fact SCBI MapReducewas designedfor researchers with a biological background that considercomplicated MPI Hadoop or Erlang solutions for paralleli-sationdistributionThat is why Ruby was selected since it hasa shallow learning curve even for biologists and easily man-ages the programming necessities In the context of genomicstudies one significant advantage is that SCBI MapReduceenables to reuse in a commodity paralleldistributed com-puting environment existing sequential code with little or nocode changes SCBI Distributed Blast can illustrate this

Results indicate that SCBI MapReduce scales well isfault-tolerant can be used on multicore workstations clus-ters and heterogeneous grids even where secured connec-tions cannot be established can use several interconnectionnetworks and does not need special hardware or virtualmachine support It is also highly portable and shall diminishthe disk space costs in ldquocloud computingrdquo In conclusionSCBI MapReduce andhence SCBI Distributed Blast are

class MyWorker lt Worker

process each obj in received objs

def process object(objs)

find barcodes

find mids(objs)

return objs

end

end

Algorithm 2

get custom worker file path

custom worker file = lsquo my workerrb rsquo init worker manager

MyWorkerManagerinit work manager

use any available ip and first empty port

ip= lsquo 0000 rsquo port=0 workers = 4

launch Manager and start it

manager = Managernew(ipport workers

MyWorkerManagercustom worker file)

managerstart server

Algorithm 3

ready among other uses for intensive genome analyses andannotations

Appendix

Customisation of the Three Files That GovernSCBI_MapReduce

SCBI MapReduce consists of a number of files but inorder to be customised for particular needs users onlyneed to modify the IO data management methods at themanager file (my worker managerrb) the computation tobe distributed at the worker file (my workerrb) and themain file (mainrb)

Themethods to redefine in my worker managerrb are(i) next work that provides new data for workers or nil ifthere is no more data available (in the following code itsimply reads one sequence at a time from a fastq file ondisk) (ii) selfinit work manager that opens IO data files(iii) selfend work manager that closes files when finishedand (iv) work received that writes results on disk as they aregenerated The relevant code for my worker managerrb is(see Algorithm 1)

Customisation of the worker file (my workerrb)includes redefinition of the process object method thatcontains the function call to find mids The functionfind mids can be defined by the user in hisher own sourcecode or a compiled algorithm or an existing code Therelevant code for my workerrb is (see Algorithm 2)

The main program file (mainrb) has to be invoked tolaunch the distributed job It can be used as it is from the

Computational Biology Journal 11

command line as a common Ruby script (ruby mainrb)or as a part of a more complex code Skilled users can alsomodify its code andor name to enable special features oreven receive user parameters which is the case when usingSCBI MapReduce for distribution of an internal part of analgorithmThenumber of workers is defined here at least onefor the manager and one for one worker The relevant codefor mainrb is (see Algorithm 3)

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa andRocıo Bautista for the helpful discussions and the computerresources of the Plataforma Andaluza de Bioinformatica ofthe University of Malaga Spain This study was supportedby Grants from the Spanish MICINN (BIO2009-07490) andJunta de Andalucıa (P10-CVI-6075) as well as institutionalfunding to the research group BIO-114

References

[1] C Huttenhower and O Hofmann ldquoA quick guide to large-scalegenomic data miningrdquo PLoS Computational Biology vol 6 no5 Article ID e1000779 2010

[2] M C Schatz B Langmead and S L Salzberg ldquoCloud comput-ing and the DNA data racerdquoNature Biotechnology vol 28 no 7pp 691ndash693 2010

[3] D Patterson ldquoThe trouble withmulti-corerdquo IEEE Spectrum vol47 no 7 pp 28ndash53 2010

[4] C Camacho G Coulouris V Avagyan et al ldquoBLAST+ archi-tecture and applicationsrdquo BMC Bioinformatics vol 10 article421 2009

[5] S Galvez D Dıaz P Hernandez F J Esteban J A Caballeroand G Dorado ldquoNext-generation bioinformatics using many-core processor architecture to develop a web service forsequence alignmentrdquo Bioinformatics vol 26 no 5 pp 683ndash6862010

[6] H Lin XMaW Feng andN F Samatova ldquoCoordinating com-putation and IO in massively parallel sequence searchrdquo IEEETransactions on Parallel and Distributed Systems vol 22 no 4pp 529ndash543 2011

[7] T NguyenW Shi andD Ruden ldquoCloudAligner a fast and full-featured MapReduce based tool for sequence mappingrdquo BMCResearch Notes vol 4 article 171 2011

[8] T Rognes ldquoFaster Smith-Waterman database searches withinter-sequence SIMD parallelisationrdquo BMC Bioinformatics vol12 article 221 2011

[9] X-L Yang Y-L Liu C-F Yuan and Y-H Huang ldquoParalleliza-tion of BLAST with MapReduce for long sequence alignmentrdquoin Proceedings of the 4th International Symposium on ParallelArchitectures Algorithms and Programming (PAAP rsquo11) pp 241ndash246 IEEE Computer Society December 2011

[10] B Langmead M C Schatz J Lin M Pop and S L SalzbergldquoSearching for SNPs with cloud computingrdquo Genome Biologyvol 10 no 11 article R134 2009

[11] M Needham R Hu S Dwarkadas and X Qiu ldquoHierarchicalparallelization of gene differential association analysisrdquo BMCBioinformatics vol 12 article 374 2011

[12] M K Gardner W-C Feng J Archuleta H Lin and XMal ldquoParallel genomic sequence-searching on an ad-hoc gridexperiences lessons learned and implicationsrdquo in Proceedingsof the ACMIEEE Conference on High Performance Networkingand Computing vol 1 pp 1ndash14 2006

[13] L Yu CMoretti AThrasher S Emrich K Judd and DThainldquoHarnessing parallelism inmulticore clusters with the All-PairsWavefront andMakeflow abstractionsrdquoCluster Computing vol13 no 3 pp 243ndash256 2010

[14] MK Chen andKOlukotun ldquoThe Jrpm system for dynamicallyparallelizing Java programsrdquo in Proceedings of the 30th AnnualInternational Symposium on Computer Architecture (ISCA rsquo03)pp 434ndash445 San Diego Calif USA June 2003

[15] P Haller and M Odersky ldquoScala Actors unifying thread-basedand event-based programmingrdquo Theoretical Computer Sciencevol 410 no 2-3 pp 202ndash220 2009

[16] J Armstrong R Virding C Wikstrom and M Williams Con-current Programming in ERLANG Prentice Hall 2nd edition1996

[17] WGropp E Lusk andA SkjellumUsingMPI Portable ParallelProgramming with the Message-Passing Interface MIT PressCambridge Mass USA 2nd edition 1999

[18] L Dagum and R Menon ldquoOpenmp an industry-standardapi for shared-memory programmingrdquo IEEEComputationalScience amp Engineering vol 5 no 1 pp 46ndash55 1998

[19] Q Zou X-B Li W-R Jiang Z-Y Lin G-L Li and KChen ldquoSurvey ofmapreduce frameoperation inbioinformaticsrdquoBriefings in Bioinformatics In press

[20] R C Taylor ldquoAn overview of the hadoopmapreducehbaseframework and its current applications in bioinformaticsrdquo BMCBioinformatics vol 11 supplement 12 p S1 2010

[21] J Lin ldquoMapreduce is good enoughrdquo Big Data vol 1 no 1 pp28ndash37 2013

[22] D Thain T Tannenbaum and M Livny ldquoDistributed comput-ing in practice the Condor experiencerdquoConcurrency Computa-tion Practice and Experience vol 17 no 2-4 pp 323ndash356 2005

[23] S Pellicer G Chen K C C Chan and Y Pan ldquoDistributedsequence alignment applications for the public computingarchitecturerdquo IEEE Transactions on Nanobioscience vol 7 no1 pp 35ndash43 2008

[24] J Hill M Hambley T Forster et al ldquoSPRINT a new parallelframework for Rrdquo BMC Bioinformatics vol 9 article 558 2008

[25] J Li XMa S YoginathG Kora andN F Samatova ldquoTranspar-ent runtime parallelization of the R scripting languagerdquo Journalof Parallel and Distributed Computing vol 71 no 2 pp 157ndash1682011

[26] F Berenger C Coti and K Y J Zhang ldquoPAR a PARallel anddistributed job crusherrdquoBioinformatics vol 26 no 22 pp 2918ndash2919 2010

[27] M Aldinucci M Torquati C Spampinato et al ldquoParallelstochastic systems biology in the cloudrdquo Briefings in Bioinfor-matics In press

[28] A Matsunaga M Tsugawa and J Fortes ldquoCloudBLAST com-bining MapReduce and virtualization on distributed resourcesfor bioinformatics applicationsrdquo in Proceedings of the 4th IEEEInternational Conference on eScience (eScience rsquo08) pp 222ndash229 IEEEComputer SocietyWashington DC USA December2008

12 Computational Biology Journal

[29] W Lu J Jackson and R Barga ldquoAzureBlast a case study ofdeveloping science applications on the cloudrdquo in Proceedingsof the 19th ACM International Symposium on High Perfor-mance Distributed Computing (HPDC rsquo10) pp 413ndash420 ACMChicago Ill USA June 2010

[30] P D Vouzis and N V Sahinidis ldquoGPU-BLAST using graphicsprocessors to accelerate protein sequence alignmentrdquo Bioinfor-matics vol 27 no 2 pp 182ndash188 2011

[31] C S Oehmen and D J Baxter ldquoScalablast 20 rapid and robustblast calculations on multiprocessor systemsrdquo Bioinformaticsvol 29 no 6 pp 797ndash798 2013

[32] J Aerts and A Law ldquoAn introduction to scripting in Ruby forbiologistsrdquo BMC Bioinformatics vol 10 article 221 2009

[33] S Balakrishnan R RajwarMUpton andK Lai ldquoThe impact ofperformance asymmetry in emerging multicore architecturesrdquoSIGARCH Computer Architecture News vol 33 no 2 pp 506ndash517 2005

[34] L Jostins and J Jaeger ldquoReverse engineering a gene networkusing an asynchronous parallel evolution strategyrdquo BMC Sys-tems Biology vol 4 article 17 2010

[35] O Thorsen B Smith C P Sosa et al ldquoParallel genomicsequence-search on a massively parallel systemrdquo in Proceedingsof the 4th Conference on Computing Frontiers (CF rsquo07) pp 59ndash68 Ischia Italy May 2007

[36] M Armbrust A Fox R Griffith et al ldquoA view of cloudcomputingrdquo Communications of the ACM vol 53 no 4 pp 50ndash58 2010

[37] C-L Hung and Y-L Lin ldquoImplementation of a parallel proteinstructure alignment service on cloudrdquo International Journal ofGenomics vol 2013 Article ID 439681 8 pages 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

10 Computational Biology Journal

sequence chunks while maintaining intact the database Thisis because it does not need to copy the sequence database oneach worker since it takes advantage of shared storageThis isalso the reason why it provides exactly the same results as theoriginal Blast+ in less time (Figure 4) Another MapReduceapproach for Blast CloudBlast [9] has very poorscalability since it is optimised for short reads mapping andneeds to copy the database on each node while the speed-upobserved with SCBI Distributed Blast was linear andappeared to be superscalar in tested clusters (Figure 4(a))However superscalar behaviour was exclusively observed for2 cores (speed-ups of 33 in the ldquox86rdquo cluster and 28 in theldquox86 upgradedrdquo cluster) since taking two cores as referencethe speed-up slope was close to the theoretical speed-up (4081 163 326 650 and 1276)

Comparing the speed-up of our ldquoboostedrdquo Blast+ anda threaded execution of Blast+ (Figure 4(b)) it can beseen that SCBI Distributed Blast can take advantage ofall computing capabilities and scales linearly in contrast tonative Blast+ (Figure 4(b)) and the older NCBI-Blast [30]In conclusion SCBI Distributed Blast illustrates the ease-of-use and performance of SCBI MapReduce opening theway for code modifications that can easily produce scalablebalanced fault-tolerant and distributed versions of otherBlast-related programs like PSI-Blast WU-BlastAB-Blast NCBI-Blast and the like Furthermore Blast-based genome annotation processes can take advantage ofSCBI Distributed Blast with minor changes in the code

6 Conclusions

This work does not aim at advancing parallelisation technol-ogy but it aims to apply distribution advantages in the use ofbioinformatic tools that are useful for example for genomicsgiving attractive speedups In a context of continuous devel-opment of parallel software SCBI MapReduce provides atask-farm skeleton for parallelisationdistribution with fea-tures such as fault-tolerance encryption and compression on-the-fly data distribution in chunks grid-readiness and flexi-bility for integration of new and existing code without being askilled programmer In fact SCBI MapReducewas designedfor researchers with a biological background that considercomplicated MPI Hadoop or Erlang solutions for paralleli-sationdistributionThat is why Ruby was selected since it hasa shallow learning curve even for biologists and easily man-ages the programming necessities In the context of genomicstudies one significant advantage is that SCBI MapReduceenables to reuse in a commodity paralleldistributed com-puting environment existing sequential code with little or nocode changes SCBI Distributed Blast can illustrate this

Results indicate that SCBI MapReduce scales well isfault-tolerant can be used on multicore workstations clus-ters and heterogeneous grids even where secured connec-tions cannot be established can use several interconnectionnetworks and does not need special hardware or virtualmachine support It is also highly portable and shall diminishthe disk space costs in ldquocloud computingrdquo In conclusionSCBI MapReduce andhence SCBI Distributed Blast are

class MyWorker lt Worker

process each obj in received objs

def process object(objs)

find barcodes

find mids(objs)

return objs

end

end

Algorithm 2

get custom worker file path

custom worker file = lsquo my workerrb rsquo init worker manager

MyWorkerManagerinit work manager

use any available ip and first empty port

ip= lsquo 0000 rsquo port=0 workers = 4

launch Manager and start it

manager = Managernew(ipport workers

MyWorkerManagercustom worker file)

managerstart server

Algorithm 3

ready among other uses for intensive genome analyses andannotations

Appendix

Customisation of the Three Files That GovernSCBI_MapReduce

SCBI MapReduce consists of a number of files but inorder to be customised for particular needs users onlyneed to modify the IO data management methods at themanager file (my worker managerrb) the computation tobe distributed at the worker file (my workerrb) and themain file (mainrb)

Themethods to redefine in my worker managerrb are(i) next work that provides new data for workers or nil ifthere is no more data available (in the following code itsimply reads one sequence at a time from a fastq file ondisk) (ii) selfinit work manager that opens IO data files(iii) selfend work manager that closes files when finishedand (iv) work received that writes results on disk as they aregenerated The relevant code for my worker managerrb is(see Algorithm 1)

Customisation of the worker file (my workerrb)includes redefinition of the process object method thatcontains the function call to find mids The functionfind mids can be defined by the user in hisher own sourcecode or a compiled algorithm or an existing code Therelevant code for my workerrb is (see Algorithm 2)

The main program file (mainrb) has to be invoked tolaunch the distributed job It can be used as it is from the

Computational Biology Journal 11

command line as a common Ruby script (ruby mainrb)or as a part of a more complex code Skilled users can alsomodify its code andor name to enable special features oreven receive user parameters which is the case when usingSCBI MapReduce for distribution of an internal part of analgorithmThenumber of workers is defined here at least onefor the manager and one for one worker The relevant codefor mainrb is (see Algorithm 3)

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa andRocıo Bautista for the helpful discussions and the computerresources of the Plataforma Andaluza de Bioinformatica ofthe University of Malaga Spain This study was supportedby Grants from the Spanish MICINN (BIO2009-07490) andJunta de Andalucıa (P10-CVI-6075) as well as institutionalfunding to the research group BIO-114

References

[1] C Huttenhower and O Hofmann ldquoA quick guide to large-scalegenomic data miningrdquo PLoS Computational Biology vol 6 no5 Article ID e1000779 2010

[2] M C Schatz B Langmead and S L Salzberg ldquoCloud comput-ing and the DNA data racerdquoNature Biotechnology vol 28 no 7pp 691ndash693 2010

[3] D Patterson ldquoThe trouble withmulti-corerdquo IEEE Spectrum vol47 no 7 pp 28ndash53 2010

[4] C Camacho G Coulouris V Avagyan et al ldquoBLAST+ archi-tecture and applicationsrdquo BMC Bioinformatics vol 10 article421 2009

[5] S Galvez D Dıaz P Hernandez F J Esteban J A Caballeroand G Dorado ldquoNext-generation bioinformatics using many-core processor architecture to develop a web service forsequence alignmentrdquo Bioinformatics vol 26 no 5 pp 683ndash6862010

[6] H Lin XMaW Feng andN F Samatova ldquoCoordinating com-putation and IO in massively parallel sequence searchrdquo IEEETransactions on Parallel and Distributed Systems vol 22 no 4pp 529ndash543 2011

[7] T NguyenW Shi andD Ruden ldquoCloudAligner a fast and full-featured MapReduce based tool for sequence mappingrdquo BMCResearch Notes vol 4 article 171 2011

[8] T Rognes ldquoFaster Smith-Waterman database searches withinter-sequence SIMD parallelisationrdquo BMC Bioinformatics vol12 article 221 2011

[9] X-L Yang Y-L Liu C-F Yuan and Y-H Huang ldquoParalleliza-tion of BLAST with MapReduce for long sequence alignmentrdquoin Proceedings of the 4th International Symposium on ParallelArchitectures Algorithms and Programming (PAAP rsquo11) pp 241ndash246 IEEE Computer Society December 2011

[10] B Langmead M C Schatz J Lin M Pop and S L SalzbergldquoSearching for SNPs with cloud computingrdquo Genome Biologyvol 10 no 11 article R134 2009

[11] M Needham R Hu S Dwarkadas and X Qiu ldquoHierarchicalparallelization of gene differential association analysisrdquo BMCBioinformatics vol 12 article 374 2011

[12] M K Gardner W-C Feng J Archuleta H Lin and XMal ldquoParallel genomic sequence-searching on an ad-hoc gridexperiences lessons learned and implicationsrdquo in Proceedingsof the ACMIEEE Conference on High Performance Networkingand Computing vol 1 pp 1ndash14 2006

[13] L Yu CMoretti AThrasher S Emrich K Judd and DThainldquoHarnessing parallelism inmulticore clusters with the All-PairsWavefront andMakeflow abstractionsrdquoCluster Computing vol13 no 3 pp 243ndash256 2010

[14] MK Chen andKOlukotun ldquoThe Jrpm system for dynamicallyparallelizing Java programsrdquo in Proceedings of the 30th AnnualInternational Symposium on Computer Architecture (ISCA rsquo03)pp 434ndash445 San Diego Calif USA June 2003

[15] P Haller and M Odersky ldquoScala Actors unifying thread-basedand event-based programmingrdquo Theoretical Computer Sciencevol 410 no 2-3 pp 202ndash220 2009

[16] J Armstrong R Virding C Wikstrom and M Williams Con-current Programming in ERLANG Prentice Hall 2nd edition1996

[17] WGropp E Lusk andA SkjellumUsingMPI Portable ParallelProgramming with the Message-Passing Interface MIT PressCambridge Mass USA 2nd edition 1999

[18] L Dagum and R Menon ldquoOpenmp an industry-standardapi for shared-memory programmingrdquo IEEEComputationalScience amp Engineering vol 5 no 1 pp 46ndash55 1998

[19] Q Zou X-B Li W-R Jiang Z-Y Lin G-L Li and KChen ldquoSurvey ofmapreduce frameoperation inbioinformaticsrdquoBriefings in Bioinformatics In press

[20] R C Taylor ldquoAn overview of the hadoopmapreducehbaseframework and its current applications in bioinformaticsrdquo BMCBioinformatics vol 11 supplement 12 p S1 2010

[21] J Lin ldquoMapreduce is good enoughrdquo Big Data vol 1 no 1 pp28ndash37 2013

[22] D Thain T Tannenbaum and M Livny ldquoDistributed comput-ing in practice the Condor experiencerdquoConcurrency Computa-tion Practice and Experience vol 17 no 2-4 pp 323ndash356 2005

[23] S Pellicer G Chen K C C Chan and Y Pan ldquoDistributedsequence alignment applications for the public computingarchitecturerdquo IEEE Transactions on Nanobioscience vol 7 no1 pp 35ndash43 2008

[24] J Hill M Hambley T Forster et al ldquoSPRINT a new parallelframework for Rrdquo BMC Bioinformatics vol 9 article 558 2008

[25] J Li XMa S YoginathG Kora andN F Samatova ldquoTranspar-ent runtime parallelization of the R scripting languagerdquo Journalof Parallel and Distributed Computing vol 71 no 2 pp 157ndash1682011

[26] F Berenger C Coti and K Y J Zhang ldquoPAR a PARallel anddistributed job crusherrdquoBioinformatics vol 26 no 22 pp 2918ndash2919 2010

[27] M Aldinucci M Torquati C Spampinato et al ldquoParallelstochastic systems biology in the cloudrdquo Briefings in Bioinfor-matics In press

[28] A Matsunaga M Tsugawa and J Fortes ldquoCloudBLAST com-bining MapReduce and virtualization on distributed resourcesfor bioinformatics applicationsrdquo in Proceedings of the 4th IEEEInternational Conference on eScience (eScience rsquo08) pp 222ndash229 IEEEComputer SocietyWashington DC USA December2008

12 Computational Biology Journal

[29] W Lu J Jackson and R Barga ldquoAzureBlast a case study ofdeveloping science applications on the cloudrdquo in Proceedingsof the 19th ACM International Symposium on High Perfor-mance Distributed Computing (HPDC rsquo10) pp 413ndash420 ACMChicago Ill USA June 2010

[30] P D Vouzis and N V Sahinidis ldquoGPU-BLAST using graphicsprocessors to accelerate protein sequence alignmentrdquo Bioinfor-matics vol 27 no 2 pp 182ndash188 2011

[31] C S Oehmen and D J Baxter ldquoScalablast 20 rapid and robustblast calculations on multiprocessor systemsrdquo Bioinformaticsvol 29 no 6 pp 797ndash798 2013

[32] J Aerts and A Law ldquoAn introduction to scripting in Ruby forbiologistsrdquo BMC Bioinformatics vol 10 article 221 2009

[33] S Balakrishnan R RajwarMUpton andK Lai ldquoThe impact ofperformance asymmetry in emerging multicore architecturesrdquoSIGARCH Computer Architecture News vol 33 no 2 pp 506ndash517 2005

[34] L Jostins and J Jaeger ldquoReverse engineering a gene networkusing an asynchronous parallel evolution strategyrdquo BMC Sys-tems Biology vol 4 article 17 2010

[35] O Thorsen B Smith C P Sosa et al ldquoParallel genomicsequence-search on a massively parallel systemrdquo in Proceedingsof the 4th Conference on Computing Frontiers (CF rsquo07) pp 59ndash68 Ischia Italy May 2007

[36] M Armbrust A Fox R Griffith et al ldquoA view of cloudcomputingrdquo Communications of the ACM vol 53 no 4 pp 50ndash58 2010

[37] C-L Hung and Y-L Lin ldquoImplementation of a parallel proteinstructure alignment service on cloudrdquo International Journal ofGenomics vol 2013 Article ID 439681 8 pages 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article SCBI MapReduce, a New Ruby Task-Farm ...downloads.hindawi.com/archive/2013/707540.pdf · task-farm skeleton for the Ruby scripting language [ ]that gathers the requirements

Computational Biology Journal 11

command line as a common Ruby script (ruby mainrb)or as a part of a more complex code Skilled users can alsomodify its code andor name to enable special features oreven receive user parameters which is the case when usingSCBI MapReduce for distribution of an internal part of analgorithmThenumber of workers is defined here at least onefor the manager and one for one worker The relevant codefor mainrb is (see Algorithm 3)

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors gratefully acknowledge Rafael Larrosa andRocıo Bautista for the helpful discussions and the computerresources of the Plataforma Andaluza de Bioinformatica ofthe University of Malaga Spain This study was supportedby Grants from the Spanish MICINN (BIO2009-07490) andJunta de Andalucıa (P10-CVI-6075) as well as institutionalfunding to the research group BIO-114

References

[1] C Huttenhower and O Hofmann ldquoA quick guide to large-scalegenomic data miningrdquo PLoS Computational Biology vol 6 no5 Article ID e1000779 2010

[2] M C Schatz B Langmead and S L Salzberg ldquoCloud comput-ing and the DNA data racerdquoNature Biotechnology vol 28 no 7pp 691ndash693 2010

[3] D Patterson ldquoThe trouble withmulti-corerdquo IEEE Spectrum vol47 no 7 pp 28ndash53 2010

[4] C Camacho G Coulouris V Avagyan et al ldquoBLAST+ archi-tecture and applicationsrdquo BMC Bioinformatics vol 10 article421 2009

[5] S Galvez D Dıaz P Hernandez F J Esteban J A Caballeroand G Dorado ldquoNext-generation bioinformatics using many-core processor architecture to develop a web service forsequence alignmentrdquo Bioinformatics vol 26 no 5 pp 683ndash6862010

[6] H Lin XMaW Feng andN F Samatova ldquoCoordinating com-putation and IO in massively parallel sequence searchrdquo IEEETransactions on Parallel and Distributed Systems vol 22 no 4pp 529ndash543 2011

[7] T NguyenW Shi andD Ruden ldquoCloudAligner a fast and full-featured MapReduce based tool for sequence mappingrdquo BMCResearch Notes vol 4 article 171 2011

[8] T Rognes ldquoFaster Smith-Waterman database searches withinter-sequence SIMD parallelisationrdquo BMC Bioinformatics vol12 article 221 2011

[9] X-L Yang Y-L Liu C-F Yuan and Y-H Huang ldquoParalleliza-tion of BLAST with MapReduce for long sequence alignmentrdquoin Proceedings of the 4th International Symposium on ParallelArchitectures Algorithms and Programming (PAAP rsquo11) pp 241ndash246 IEEE Computer Society December 2011

[10] B Langmead M C Schatz J Lin M Pop and S L SalzbergldquoSearching for SNPs with cloud computingrdquo Genome Biologyvol 10 no 11 article R134 2009

[11] M Needham R Hu S Dwarkadas and X Qiu ldquoHierarchicalparallelization of gene differential association analysisrdquo BMCBioinformatics vol 12 article 374 2011

[12] M K Gardner W-C Feng J Archuleta H Lin and XMal ldquoParallel genomic sequence-searching on an ad-hoc gridexperiences lessons learned and implicationsrdquo in Proceedingsof the ACMIEEE Conference on High Performance Networkingand Computing vol 1 pp 1ndash14 2006

[13] L Yu CMoretti AThrasher S Emrich K Judd and DThainldquoHarnessing parallelism inmulticore clusters with the All-PairsWavefront andMakeflow abstractionsrdquoCluster Computing vol13 no 3 pp 243ndash256 2010

[14] MK Chen andKOlukotun ldquoThe Jrpm system for dynamicallyparallelizing Java programsrdquo in Proceedings of the 30th AnnualInternational Symposium on Computer Architecture (ISCA rsquo03)pp 434ndash445 San Diego Calif USA June 2003

[15] P Haller and M Odersky ldquoScala Actors unifying thread-basedand event-based programmingrdquo Theoretical Computer Sciencevol 410 no 2-3 pp 202ndash220 2009

[16] J Armstrong R Virding C Wikstrom and M Williams Con-current Programming in ERLANG Prentice Hall 2nd edition1996

[17] WGropp E Lusk andA SkjellumUsingMPI Portable ParallelProgramming with the Message-Passing Interface MIT PressCambridge Mass USA 2nd edition 1999

[18] L Dagum and R Menon ldquoOpenmp an industry-standardapi for shared-memory programmingrdquo IEEEComputationalScience amp Engineering vol 5 no 1 pp 46ndash55 1998

[19] Q Zou X-B Li W-R Jiang Z-Y Lin G-L Li and KChen ldquoSurvey ofmapreduce frameoperation inbioinformaticsrdquoBriefings in Bioinformatics In press

[20] R C Taylor ldquoAn overview of the hadoopmapreducehbaseframework and its current applications in bioinformaticsrdquo BMCBioinformatics vol 11 supplement 12 p S1 2010

[21] J Lin ldquoMapreduce is good enoughrdquo Big Data vol 1 no 1 pp28ndash37 2013

[22] D. Thain, T. Tannenbaum, and M. Livny, "Distributed computing in practice: the Condor experience," Concurrency and Computation: Practice and Experience, vol. 17, no. 2–4, pp. 323–356, 2005.

[23] S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, "Distributed sequence alignment applications for the public computing architecture," IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.

[24] J. Hill, M. Hambley, T. Forster et al., "SPRINT: a new parallel framework for R," BMC Bioinformatics, vol. 9, article 558, 2008.

[25] J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, "Transparent runtime parallelization of the R scripting language," Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.

[26] F. Berenger, C. Coti, and K. Y. J. Zhang, "PAR: a PARallel and distributed job crusher," Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.

[27] M. Aldinucci, M. Torquati, C. Spampinato et al., "Parallel stochastic systems biology in the cloud," Briefings in Bioinformatics, in press.

[28] A. Matsunaga, M. Tsugawa, and J. Fortes, "CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications," in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.


[29] W. Lu, J. Jackson, and R. Barga, "AzureBlast: a case study of developing science applications on the cloud," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.

[30] P. D. Vouzis and N. V. Sahinidis, "GPU-BLAST: using graphics processors to accelerate protein sequence alignment," Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.

[31] C. S. Oehmen and D. J. Baxter, "ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems," Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.

[32] J. Aerts and A. Law, "An introduction to scripting in Ruby for biologists," BMC Bioinformatics, vol. 10, article 221, 2009.

[33] S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, "The impact of performance asymmetry in emerging multicore architectures," SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.

[34] L. Jostins and J. Jaeger, "Reverse engineering a gene network using an asynchronous parallel evolution strategy," BMC Systems Biology, vol. 4, article 17, 2010.

[35] O. Thorsen, B. Smith, C. P. Sosa et al., "Parallel genomic sequence-search on a massively parallel system," in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.

[36] M. Armbrust, A. Fox, R. Griffith et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.

[37] C.-L. Hung and Y.-L. Lin, "Implementation of a parallel protein structure alignment service on cloud," International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.
