Special issue on Science-Driven Cloud Computingdownloads.hindawi.com/journals/sp/2011/362493.pdf · Special issue on Science-Driven Cloud Computing ... periods of time, to data-intensive

Scientific Programming 19 (2011) 71–73 71DOI 10.3233/SPR-2011-0326IOS Press

Guest-editorial

Special issue on Science-Driven CloudComputing

Ivona Brandic a,∗ and Ioan Raicu b,c

a Information Systems Institute, Vienna University of Technology, Vienna, Austriab Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USAc Mathematics and Computer Science Div., Argonne National Laboratory, Argonne, IL, USA

1. Introduction

It is our honor to serve as guest editors of thisspecial issue of the Scientific Programming journalon Science-Driven Cloud Computing. We are pleasedto present 7 high-quality contributions that focus onscientific cloud computing. Scientific computing in-volves a broad range of technologies, from high-performance computing (HPC) which is heavily focus-ed on compute-intensive applications, high-throughputcomputing (HTC) which focuses on using many com-puting resources over long periods of time to accom-plish its computational tasks, many-task computing(MTC) which aims to bridge the gap between HPC andHTC by focusing on using many resources over shortperiods of time, to data-intensive computing which isheavily focused on data distribution and harnessingdata locality by scheduling of computations close tothe data.

This special issue on Science-Driven Cloud Com-puting provides the scientific community a dedicatedforum for discussing new research, development, anddeployment efforts in running these kinds of scien-tific computing workloads on Cloud Computing in-frastructures. This special issue will focus on theuse of cloud-based technologies to meet new com-pute intensive and data intensive scientific challengesthat are not well served by the current supercomput-ers, grids or commercial clouds. What architecturalchanges to the current cloud frameworks (hardware,operating systems, networking and/or programming

*Corresponding author: E-mail: [email protected].

models) are needed to support science? Dynamic in-formation derived from remote instruments and cou-pled simulation and sensor ensembles are both im-portant new science pathways and tremendous chal-lenges for current HPC/HTC/MTC technologies. Howcan cloud technologies enable these new scientific ap-proaches? How are scientists using clouds? Are therescientific HPC/HTC/MTC workloads that are suitablecandidates to take advantage of emerging cloud com-puting resources with high efficiency? What benefitsexist by adopting the cloud model, over clusters, grids,or supercomputers? What factors are limiting cloudsuse or would make them more usable/efficient?

Cloud computing first established in the businesscomputing domain is now a topic of research in com-puter science and an interesting execution platformfor science applications. Today there are a number ofcommercial and science cloud deployments, includingthose provided by Amazon, Google, IBM, Microsoftand others. Campus and national labs are also deploy-ing their own cloud solutions. The ability to control theresources and the pay-as-you go usage model enablesnew approaches to application development and re-source provisioning. Science applications are lookingtowards the cloud to provide a stable and customizableexecution environment. This special issue of the Sci-entific Programming journal is dedicated to the com-putational challenges and opportunities of cloud com-puting. It is also an extension to several workshops theguest editors have been involved in, including the Sci-entific Cloud Computing (ScienceCloud) workshop se-ries (see www.cs.iit.edu/~iraicu/ScienceCloud2011/).

1058-9244/11/$27.50 © 2011 – IOS Press and the authors. All rights reserved

72 I. Brandic and I. Raicu / Special issue on Science-Driven Cloud Computing

This special issue reflects current research trends indevelopments of Cloud computing infrastructures forHPC and parallel computing area. The first trend isoptimization of Cloud resources based on the avail-able programming models for HPC (like workflows)where the control mechanism for the application ex-ecution are external to the Cloud. The second trendare research issues in application of currently availablepublic Clouds for HPC resulting in adaptation of datastructures and programming models, e.g., data formatadaptation, performance and cost analysis for compu-tations in public Clouds and evaluation of available in-frastructure for storage of large data sets.

2. In this issue

It was a challenge to select just 7 of the 18 high-quality submissions for inclusion here. We thank the56 reviewers for their thoughtful work. We briefly in-troduce the accepted articles in the following severalparagraphs.

Manish Parashar et al.’s “Autonomic managementof applications workflows on hybrid computing in-frastructure” [3] proposes a framework to manageautonomics, including scheduling the mix of hybridresources, application management, monitoring re-sources and adaptation of resource provisioning basedon objectives and metrics. The authors explored theautonomics using a real world scientific workflow, theDefiant reservoir simulator using Ensemble KalmanFilter with several different objectives (e.g., acceler-ation, conservation and resilience) and metrics (e.g.,deadline and budget). They formed a hybrid infrastruc-ture with TeraGrid and several instance types of Ama-zon EC2 and the results show that the proposed frame-work for autonomics works well efficiently for the ap-plication workflow achieving the objectives using themetrics.

Satish Srirama et al.’s “Scalability of parallel sci-entific applications on the cloud” [5] studies the ef-fects of moving parallel scientific applications ontothe cloud through deploying several benchmark appli-cations (e.g., matrix–vector operations, NAS parallelbenchmarks and DOUG – Domain decomposition OnUnstructured Grids) on the cloud. The authors also ob-served the limitations of cloud and its comparison withcluster in terms of performance, and raises importantissues around the necessity for better frameworks oroptimizations for MapReduce style applications.

Keith Jackson et al.’s “Performance and cost anal-ysis of the supernova factory on the Amazon AWScloud” [2] studies the feasibility of porting the NearbySupernova Factory pipeline (a complex pipeline of se-rial processes that execute various image processing al-gorithms in parallel on ∼10 TBs of data) to the Ama-zon Web Services cloud environment. Specifically theauthors describe the tool set they developed to man-age a virtual cluster on Amazon EC2, explore thevarious design options available for application dataplacement, and offer detailed performance results andlessons learned from each of the design options.

Zach Hill et al.’s “Early observations on the per-formance of Windows Azure” [1] presents the resultsof comprehensive performance experiments conductedon Windows Azure from October 2009 to February2010. The authors present performance and reliabil-ity observations and analysis from their deploymentof a large-scale scientific application hosted on Azure,called ModisAzure, that show unusual and sporadicVM execution slowdown of over 4× in some casesand affected up to 16% of task executions at times.In addition to a detailed performance evaluation ofWindows Azure, the authors provide recommendationsfor potential users of Windows Azure based on theseearly observations. Although the discussion and anal-ysis is tailored to scientific applications, the results arebroadly applicable to the range of existing and futureapplications running in Windows Azure.

Gabriela Turcu et al.’s “Reshaping text data for effi-cient processing on Amazon EC2” [7] investigates pro-visioning on the Amazon EC2 cloud from the user per-spective, attempting to provide a scheduling strategythat is both timely and cost effective. The authors relyon the empirical performance of the application of in-terest on smaller subsets of data, to construct an execu-tion plan. Using predictions of the performance of theapplication based on measurements on small data sets,the authors devise an execution plan that meets a userspecified deadline while minimizing cost.

Ani Thakar et al.’s “Large science databases – arecloud services ready for them?” [6] reports on attemptsto put an astronomical database – the Sloan Digital SkySurvey science archive – in the cloud. The authors findthat it is very frustrating to impossible at this time tomigrate a complex SQL Server database into currentcloud service offerings such as Amazon (EC2) and Mi-crosoft (SQL Azure). Certainly it is impossible to mi-grate a large database in excess of a TB, but even with(much) smaller databases, the limitations of cloud ser-vices make it very difficult to migrate the data to the

I. Brandic and I. Raicu / Special issue on Science-Driven Cloud Computing 73

cloud without making changes to the schema and set-tings that would degrade performance and/or make thedata unusable. Preliminary performance comparisonsshow a large performance discrepancy with the Ama-zon cloud version of the SDSS database. The authorsdescribe a powerful new computational instrument thatthey are developing in the interim – the Data-Scope –that will enable fast and efficient analysis of the largest(petabyte scale) scientific datasets. Data-Scope uses atwo-tier architecture and commodity components toprovide a data-intensive analysis platform that max-imises sequential disk throughput while minimizingpower consumption and overall cost.

Ostermann et al.’s [4] “Using a new event-basedsimulation framework for investigating resource pro-visioning in Clouds” presents GroudSim, a novel Gridand Cloud toolkit for scientific computing. GroudSimfacilitates execution of complex simulation scenariosin computational science considering scalable simula-tion independent discrete-event engine supporting var-ious scenarios, as for example simple job execution,file transfers, cost calculation and similar. The authorsstudy and evaluate four strategies for provisioning andleasing Cloud resources based on resource bulks, fuzzydescriptions and hourly payment intervals.

3. Guest editors

Ivona Brandic is Assistant Professorat the Distributed Systems Group, In-formation Systems Institute, ViennaUniversity of Technology (TU Wien).Prior to that, she was Assistant Pro-fessor at the Department of Scien-tific Computing, Vienna University.She received her PhD degree fromVienna University of Technology in2007. From 2003 to 2007 she partic-ipated in the special research projectAURORA (Advanced Models, Ap-plications and Software Systems forHigh Performance Computing) andthe European Union’s GEMSS (Grid-

Enabled Medical Simulation Services) project. She is involved inthe European Union’s SCube project and she is leading the Aus-trian national FoSII (Foundations of Self-governing ICT Infrastruc-tures) project funded by the Vienna Science and Technology Fund(WWTF). She is Management Committee member of the Euro-pean Commission’s COST Action on Energy Efficient Large ScaleDistributed Systems. From June–August 2008 she was visiting re-searcher at the University of Melbourne. Her interests comprise SLAand QoS management, Service-oriented architectures, autonomiccomputing, workflow management, and large scale distributed sys-tems (Cloud, Grid, Cluster, etc.).

Ioan Raicu is an assistant profes-sor in the Department of ComputerScience at Illinois Institute of Tech-nology (IIT), as well as a guest re-search faculty in the Math and Com-puter Science Division at ArgonneNational Laboratory. He is also thefounder and director of the Data-Intensive Distributed Systems Lab-oratory at IIT. He has received theprestigious NSF CAREER award in2011 for his innovative work on dis-tributed file systems for exascale

computing. He was a NSF/CRA Computation Innovation Fellowat Northwestern University in 2009–2010, and obtained his PhDin Computer Science from University of Chicago under the guid-ance of Dr. Ian Foster in 2009. He is a 3-year award winner of theGSRP Fellowship from NASA Ames Research Center. His researchwork and interests are in the general area of distributed systems.His work focuses on a relatively new paradigm of Many-Task Com-puting (MTC), which aims to bridge the gap between two predom-inant paradigms from distributed systems, High-Throughput Com-puting and High-Performance Computing. His work has focused ondefining and exploring both the theory and practical aspects of re-alizing MTC across a wide range of large-scale distributed systems.He is particularly interested in resource management in large scaledistributed systems with a focus on many-task computing, data in-tensive computing, cloud computing, grid computing and many-corecomputing.

References

[1] Z. Hill, J. Li, M. Mao, A. Ruiz-Alvarez and M. Humphrey,Early observations on the performance of Windows Azure, Sci-entific Programming 19(2,3) (2011), 121–132 (this issue).

[2] K.R. Jackson, K. Muriki, L. Ramakrishnan, K.J. Runge andR.C. Thomas, Performance and cost analysis of the Supernovafactory on the Amazon AWS cloud, Scientific Programming19(2,3) (2011), 107–119 (this issue).

[3] H. Kim, Y. el-Khamra, I. Rodero, S. Jha and M. Parashar, Au-tonomic management of application workflows on hybrid com-puting infrastructure, Scientific Programming 19(2,3) (2011),75–89 (this issue).

[4] S. Ostermann, K. Plankensteiner and R. Prodan, Using a newevent-based simulation framework for investigating resourceprovisioning in clouds, Scientific Programming 19(2,3) (2011),161–178 (this issue).

[5] S.N. Srirama, O. Batrashev, P. Jakovits and E. Vainikko, Scala-bility of parallel scientific applications on the cloud, ScientificProgramming 19(2,3) (2011), 91–105 (this issue).

[6] A. Thakar, A. Szalay, K. Church and A. Terzis, Large sciencedatabases – are cloud services ready for them?, Scientific Pro-gramming 19(2,3) (2011), 147–159 (this issue).

[7] G. Turcu, I. Foster and S. Nestorov, Reshaping text data forefficient processing on Amazon EC2, Scientific Programming19(2,3) (2011), 133–145 (this issue).

Submit your manuscripts athttp://www.hindawi.com

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Distributed Sensor Networks


Advances in

FuzzySystems

Hindawi Publishing Corporationhttp://www.hindawi.com

Volume 2014


ReconfigurableComputing

Hindawi Publishing Corporation http://www.hindawi.com Volume 2014


Applied Computational Intelligence and Soft Computing

Advances in

Artificial Intelligence


Advances inSoftware EngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications


Hindawi Publishing Corporation

http://www.hindawi.com Volume 2014

Advances in

Multimedia


Biomedical Imaging


ArtificialNeural Systems

Advances in


RoboticsJournal of



Computational Intelligence and Neuroscience

Industrial EngineeringJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in


Documents

Special issue on Science-Driven Cloud Computingdownloads.hindawi.com/journals/sp/2011/362493.pdf · Special issue on Science-Driven Cloud Computing ... periods of time, to data-intensive