
Dedicated to my Family


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

Project - 1: In papers I and II, work on application execution environments is described. In paper I, we present tools for general purpose solutions using portal technology, while paper II addresses access to grid resources within an application specific problem solving environment.

I Erik Elmroth, Sverker Holmgren, Jonas Lindemann, Salman Toor, and Per-Olov Östberg. Empowering a Flexible Application Portal with a SOA-based Grid Job Management Framework. In Proc. 9th Workshop on State-of-the-art in Scientific and Parallel Computing (PARA 2008), Springer Lecture Notes in Computer Science (LNCS) 6126–6127.

II Mahen Jayawardena, Carl Nettelblad, Salman Toor, Per-Olov Östberg, Erik Elmroth, and Sverker Holmgren. A Grid-Enabled Problem Solving Environment for QTL Analysis in R. In Proc. 2nd International Conference on Bioinformatics and Computational Biology (BiCoB 2010), 2010. ISBN 978-1-880843-76-5.

Contributions: In this project I participated in the architecture design, the implementation of the integration components and the design of the QTL specific interface in LAP. I also participated in system deployment, running experiments and writing the article.

Project - 2: Papers III, IV and V describe file-oriented distributed storage solutions. Paper III focuses on the architectural design of the Chelonia system, whereas papers IV and V address stability, performance and identified issues.

III Jon Kerr Nilsen, Salman Toor, Zsombor Nagy, and Bjarte Mohn. Chelonia – A Self-healing Storage Cloud. In M. Bubak, M. Turala, and K. Wiatr, editors, CGW'09 Proceedings, Krakow, February 2010. ACC CYFRONET AGH. ISBN 978-83-61433-01-9.

IV Jon Kerr Nilsen, Salman Toor, Zsombor Nagy, and Alex Read. Chelonia: A self-healing, replicated storage system. Journal of Physics: Conference Series, 331(6):062019, 2011.


V Jon Kerr Nilsen, Salman Toor, Zsombor Nagy, Bjarte Mohn, and Alex Read. Performance and Stability of the Chelonia Storage System. Accepted at the International Symposium on Grids and Clouds (ISGC) 2012.

Contributions: I did part of the system design and implementation. I also designed, implemented and executed the test scenarios presented in all the articles, and I was heavily involved in technical discussions and paper writing.

Project - 3: In papers VI and VII, a database driven approach for managing data and the analysis requirements of scientific applications is discussed. Paper VI focuses on the data management, whereas paper VII presents a solution for data analysis.

VI Salman Toor, Manivasakan Sabesan, Sverker Holmgren, and Tore Risch. A Scalable Architecture for e-Science Data Management. In Proc. 7th IEEE International Conference on e-Science, ISBN 978-1-4577-2163-2.

VII Salman Toor, Andrej Andrejev, Andreas Hellander, Sverker Holmgren, and Tore Risch. Scientific Analysis by Queries in Extended SPARQL Over a Distributed e-Science Data Store. Submitted to The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2012).

Contributions: I did the architecture design, the interface implementation and the static partitioning for complex datatypes in Chelonia. I also participated in designing use-cases to demonstrate the system and in writing the article.

Project - 4: Paper VIII also addresses a distributed storage solution. In this paper we explore a cloud based storage solution for scientific applications.

VIII Salman Toor, Rainer Töebbicke, Maitane Zotes Resines, and Sverker Holmgren. Investigating an Open Source Cloud Infrastructure for CERN-Specific Data Analysis. Accepted at the 7th IEEE International Conference on Networking, Architecture, and Storage (NAS 2012).

Contributions: I participated in enabling access from the ROOT framework to SWIFT and in the prototype system deployment. I worked on the design, implementation and execution of the test-cases presented, contributed to the technical discussion, and participated in paper writing.

Reproduced with the permission of the publishers; presented here in another format than in the original publication.


Contents

Part I: Introduction

1 Introduction
  1.1 Overview of Distributed Computing
    1.1.1 Communication Protocols
    1.1.2 Architectural Designs
    1.1.3 Frameworks for Distributed Computing
  1.2 Models for Scalable Distributed Computing Infrastructures
    1.2.1 Grid Computing
    1.2.2 Cloud Computing
    1.2.3 Grids vs Clouds
    1.2.4 Other Relevant Models
  1.3 Technologies for Large Scale Distributed Computing Infrastructures

Part II: Application Execution Environments

2 Application Environments for Grids
  2.1 Grid Portals
  2.2 Application Workflows
  2.3 The Job Management Component
  2.4 Thesis Contribution
    2.4.1 System Architecture

Part III: Distributed Storage Solution

3 Distributed Storage Systems
  3.1 Characteristics of Distributed Storage
  3.2 Challenges of Distributed Storage
  3.3 Thesis Contribution
    3.3.1 Chelonia Storage System
    3.3.2 Database Enabled Chelonia
    3.3.3 Cloud based Storage Solution

Part IV: Resource Allocation in Distributed Computing Infrastructures

4 Resource Allocation in Distributed Computing Infrastructures
  4.1 Models for Resource Allocation
  4.2 Thesis Contribution

Part V: Article Summary

5 Summary of Papers in the Thesis
  5.1 Paper-I
  5.2 Paper-II
  5.3 Paper-III
  5.4 Paper-IV
  5.5 Paper-V
  5.6 Paper-VI
  5.7 Paper-VII
  5.8 Paper-VIII

6 Svensk sammanfattning

7 Acknowledgments

References


List of Other Publications

These publications have been written during my PhD studies but are not part of the thesis. However, some of the material in publications I and II below is included in other papers in the thesis. Also, some of the conclusions in publication III are presented in Section 4.2 of the thesis summary.

I. Mahen Jayawardena, Salman Toor, and Sverker Holmgren. A grid portal for genetic analysis of complex traits. In Proc. 32nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Rijeka, Croatia, 2009, pp. 281–284.

II. Mahen Jayawardena, Salman Toor, and Sverker Holmgren. Computational and visualization tools for genetic analysis of complex traits. Technical Report no. 2010-001, Department of Information Technology, Uppsala University.

III. Salman Toor, Bjarte Mohn, David Cameron, and Sverker Holmgren. Case-Study for Different Models of Resource Brokering in Grid Systems. Technical Report no. 2010-009, Department of Information Technology, Uppsala University.


List of Presentations

The material presented in this thesis has been presented at the following conferences/workshops:

• Usage of LUNARC Portal in Bioinformatics. Presented at The NorduGrid Conference in Copenhagen, 2007.

• Empowering a Flexible Application Portal with a SOA-based Grid Job Management Framework. Presentation at Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA 2008) in Trondheim, 2008.

• Two presentations at NorduGrid Workshop in Bern, 2008: Introduction to LUNARC Application Portal and Empowering LUNARC portal using GJMF.

• Architecture of the Gateway Component of Chelonia Storage. Presented at NorduGrid Conference in Budapest, 2008.

• Efficient and Reliable Brokering System for ARC middleware. Poster presentation at The International Summer School on Grid Computing in Sophia Antipolis, Nice, 2009.

• Joint demo presentation together with Jon Kerr Nilsen and Zsombor Nagy on Chelonia Storage System at EGEE User Forum in Barcelona, 2009. Video link: http://www.youtube.com/watch?v=NEUWzGHHGhc

• Architecture of Chelonia Storage. Presented at Cracow Grid Workshop, 2009.

• A Grid-Enabled Problem Solving Environment for QTL Analysis in R. Poster presentation at The EGEE User Forum in Uppsala, 2010.

• ARC User Interfaces using LUNARC Portal. Presentation at NorduGrid Conference, Sundvolden, 2011.

• Extension of Chelonia Storage System to handle databases. Presentation at The Summer School at the International Center for Theoretical Physics (ICTP) in Trieste, 2011.

• A Scalable Architecture for e-Science Data Management. Presentation at the 7th IEEE e-Science Conference in Stockholm, 2011.

• Performance and Stability evaluation of Chelonia Storage. Presented at The International Symposium on Grids and Clouds (ISGC), Taipei, March 2012.

• Status report of Open Source Cloud Storage Infrastructure for CERN-Specific Data Analysis. Presented (via Skype) at the 3rd Workshop of COST – Open Network for High-Performance Computing on Complex Environments, Genova, Italy, April 2012.

• Investigating an Open Source Cloud Storage Infrastructure for CERN-Specific Data Analysis. To be presented at the 7th IEEE International Conference on Networking, Architecture, and Storage (NAS 2012), Xiamen, China, June 2012.


Part I: Introduction


1. Introduction

Computational science plays a vital role in the rapid progress of both commercial and scientific environments. Together with the tools, methods and techniques used in computational science, advancements in computational models have the potential to provide fundamental breakthroughs in this progress. Depending on the needs of the applications, different parallel and distributed computing models have been developed over time. To fulfill the ever-growing computational and storage needs of the applications, even more efficient, reliable and secure computing environments will be needed in the future.

Applications from disciplines like engineering, astronomy, medicine, and biology require sustainable computational models which can fulfill their requirements for long periods of time. For example, applications using stochastic models [87] require several thousands of independent executions for a single experiment, implying that significant computational power is needed. Other examples are found in the field of bioinformatics, where multidimensional optimization problems must be solved to determine e.g. interacting genes [77]. In terms of data intensive applications, the LHC experiments [22] running at CERN [12] require storage solutions managing petabytes of data. Similarly, the storage requirements for genome sequencing [45] are even beyond the petascale. In [74], a number of different data-intensive applications are presented which require unconventional solutions to meet their demands.

Distributed Computing Infrastructures (DCI) enable geographically distributed resources under autonomous administrative domains to be seamlessly, securely and reliably utilized by applications from various disciplines. During the last decades, a number of different projects have aimed at designing systems which enable efficient usage of geographically distributed resources to fulfill computational and storage requirements. Several models have been used to describe different distributed computing infrastructures, e.g. Utility Computing, Meta Computing, Scalable Computing, Internet Computing, Peer-to-Peer Computing, and Grid Computing. Service oriented architecture enables Cloud Computing to focus on providing non-trivial quality of service for both computational and storage requirements.

In principle, grid computing was the first concept that enabled use of large scale distributed computing infrastructures. The idea of building a computational grid evolved from the concept of electric grids [65]. Under the headline of grid computing, issues of efficient, reliable and seamless access to geographically distributed resources have been extensively studied, and a number of production level grids are today essential tools in different scientific disciplines.


After grid computing, computational and storage clouds have emerged to provide alternative options for flexible access to computing infrastructures. Cloud computing can be considered a successor of grid computing, adding some more advanced concepts essential to address a wider span of user communities.

The work presented in this thesis is based on the grid and cloud computing paradigms. Three areas are studied: application environments, development and evaluation of storage solutions for grids and clouds, and efficient resource allocation in grids. Below, a brief introduction to the challenges studied in each field is given:

Application environments: For enabling distributed computing infrastructures, it has been realized that two major issues should be addressed. First, the monolithic design of applications needs to be modified. Second, more user friendly and flexible application environments are required to execute and manage complex applications. A number of solutions have been proposed based on high level client API(s), web application portals and workflow management systems. We have developed general purpose and application specific problem-solving environments based on the R framework [34], GJMF [58] and the LUNARC portal [86] for managing applications in DCI.

Storage solutions: Storage systems are an indispensable part of distributed computing. The task of building a large-scale storage system using geographically distributed storage resources is non-trivial; achieving production level quality requires functionality such as security, scalability, a transparent view over the geographically distributed resources, simple data access, and a certain level of self-healing capability where components can join and leave the system without affecting the system's availability. Designing solutions that address all these features is still a challenge. We have developed and analyzed the Chelonia storage system [8]. Chelonia provides reliable, secure, efficient and self-healing file storage over geographically distributed storage nodes. Recently we have extended the capabilities of Chelonia by enabling databases at the storage nodes. The databases are specialized for scientific applications: by using a generalized database schema, Chelonia can handle simple (integer, real and string) and complex (arrays, matrices and tensors) datatypes. We have also investigated the performance and scalability of an Openstack storage [30] solution for CERN-specific data analysis.

Resource allocation: For grid systems, efficient selection of the execution or storage target within the set of available resources is one of the key challenges. The heterogeneous nature of the grid environment makes the task of resource discovery and selection cumbersome. A comprehensive view of the available resources requires up-to-date information, but the task of collecting this information is expensive and consumes network bandwidth. We have proposed a strategy of classifying resource attributes which helps in efficient resource discovery.


1.1 Overview of Distributed Computing

Network-enabled computational nodes are the fundamental building blocks of distributed computing. This concept allows researchers to use computational power far beyond what is available at a centralized facility. The goal of distributed computing is to build powerful and scalable solutions that enhance computational and storage capabilities.

The two most commonly used network models for enabling distributed computing are the request/response model and the message queue approach. Message queues provide an asynchronous communication mode in which messages can be sent at any time, whereas a request/response system can be either synchronous or asynchronous. Client/server and peer-to-peer communication are both examples of request/response models. These models themselves introduce native ways of utilizing remote resources, and on top of these basic models, other solutions have been developed. A minimal sketch contrasting the two models is given below.
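To make the distinction concrete, the following single-process Python sketch contrasts a blocking request/response call with an asynchronous message queue; it is purely illustrative and stands in for the corresponding network mechanisms:

```python
import queue
import threading

# Request/response (synchronous): the caller blocks until the callee returns.
def service(request: str) -> str:
    return request.upper()

response = service("ping")        # the caller waits for the response

# Message queue (asynchronous): a producer enqueues messages at any time;
# a consumer processes them independently of the producer.
messages: queue.Queue = queue.Queue()

def consumer() -> None:
    while True:
        msg = messages.get()
        if msg is None:           # sentinel value: stop consuming
            break
        print("processed:", msg.upper())

worker = threading.Thread(target=consumer)
worker.start()
messages.put("ping")              # the producer does not wait
messages.put("pong")
messages.put(None)
worker.join()
```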

1.1.1 Communication Protocols

In the beginning, Remote Procedure Calls (RPC) [100] played a vital role in enabling distributed computing. RPC works at the transport and application layers of the Open Systems Interconnection (OSI) model of network communication. RPC provides interprocess communication where the processes can reside on the local or on a remote host, and the communication is point to point. It hides the underlying communication details and provides high level interfaces to access remote resources: a normal procedure/function call is executed in another process on the remote host. RPC works in client-server mode and requires synchronous communication. Other variants of interprocess communication include message queuing and IBM's Advanced Program-to-Program Communication (APPC).
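The flavor of RPC programming can be illustrated with Python's built-in XML-RPC modules, a simple descendant of the RPC idea; the host, port and function below are arbitrary examples, not part of the systems discussed in this thesis:

```python
# Server side: expose an ordinary function for remote invocation.
from xmlrpc.server import SimpleXMLRPCServer

def add(a: int, b: int) -> int:
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add, "add")
# server.serve_forever()  # blocks; run the server in a separate process

# Client side: the proxy object makes the remote call look like a local one.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
# result = proxy.add(2, 3)  # synchronous request/response; returns 5
```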

Simple Object Access Protocol (SOAP) [36] is an eXtensible Markup Language (XML) [14] based protocol for applications to share structured information on the network. It provides an envelope format for exchanging information using a communication protocol like the Hypertext Transfer Protocol (HTTP) or the Simple Mail Transfer Protocol (SMTP). Since HTTP is the standard communication protocol of the internet, SOAP and HTTP together provide a standardized and much used solution for communicating over wide area networks. One of the gains of using SOAP over HTTP is that it easily passes network security barriers: HTTP traffic is generally allowed, and communicating over a specific port makes it possible to identify the incoming requests. Since SOAP messages are XML based and XML is accepted on almost all platforms, this approach allows communication in heterogeneous environments. On the other hand, because the rich XML format must be generated and parsed, SOAP based communication is slow.
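As a sketch of what travels over the wire, the following Python fragment posts a minimal SOAP 1.1 envelope over HTTP; the endpoint, operation name and namespace are hypothetical placeholders:

```python
import http.client

# A minimal SOAP 1.1 envelope; the Body element carries the payload.
envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStatus xmlns="http://example.org/demo"/>
  </soap:Body>
</soap:Envelope>"""

conn = http.client.HTTPConnection("service.example.org")
conn.request("POST", "/endpoint", body=envelope, headers={
    "Content-Type": "text/xml; charset=utf-8",
    "SOAPAction": "http://example.org/demo/GetStatus",
})
# response = conn.getresponse()  # the reply is again an XML envelope
```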


REST stands for Representational State Transfer [60]. The REST approach aims at avoiding SOAP and RPC and simply relies on HTTP requests. REST-based communication is stateless, i.e. each request is self contained and equipped with all the information required by the server to fulfill the request. It provides simple yet fundamental functionality over HTTP: using REST, one can send requests like Create, Read, Update and Delete to the remote application. REST is thus an alternative that avoids the limitations of SOAP and RPC. It is lightweight and communicates over standard HTTP, which makes it a convenient and platform independent communication option.
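The four basic operations can be sketched as plain HTTP requests; the host and resource paths below are invented for illustration:

```python
import http.client
import json

# The URL identifies the resource; the HTTP method expresses the operation.
# Each request is self-contained (stateless), so no session state is kept.
conn = http.client.HTTPConnection("api.example.org")
item = json.dumps({"name": "dataset-42"})
headers = {"Content-Type": "application/json"}

for method, path, body in [
    ("POST", "/items", item),        # create
    ("GET", "/items/42", None),      # read
    ("PUT", "/items/42", item),      # update
    ("DELETE", "/items/42", None),   # delete
]:
    conn.request(method, path, body=body, headers=headers)
    response = conn.getresponse()
    print(method, response.status, response.read()[:80])
```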

1.1.2 Architectural Designs

Based on the communication protocols presented in 1.1.1, two basic architectures for distributed computing can be identified: component-based architectures and service-oriented architectures. A number of different variants and hybrid architectures are also available.

Component-based architectures rely on point-to-point communication between the nodes. A component is a software object interacting with another software object located on a remote host. In the simplest case, each component exposes certain interfaces which are used to interact with and access the functionality provided by the component. Since the communication is point-to-point, systems based on this architecture often use RPCs as the communication medium.

Figure 1.1. Point-to-point communication in a component-based architecture using remote procedure calls (RPC).

The solution then inherits the advantages and disadvantages of the underlying interprocess communication method: for example, the communication will be fast, but it will be difficult to operate in a heterogeneous network environment.

The design of the Service Oriented Architecture (SOA) is natural for loosely coupled distributed applications in a heterogeneous environment. During recent years, SOA has been a common choice for designing solutions for distributed computing. A definition given in [29] describes SOA as a "paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains". In general, SOA allows for a relationship between needs and capabilities. This relation can be one-to-one, where one need is fulfilled by one capability, or it can be many-to-many. The visibility of the capabilities offered by entities is described in the service description, which also contains the information necessary for the interaction. The service description also states what result will be delivered and under what conditions the service can be invoked.

Figure 1.2. Service Oriented Architecture (SOA).

Similar to the component-based architecture, a SOA solution inherits the advantages and disadvantages of the underlying communication mechanism. SOAP over HTTP(S) has become the default choice for solutions based on SOA. This allows requests and responses to be comprehensive because of the extensibility of XML. Communication over HTTP(S) allows the services to be deployed over the internet, which increases their visibility and allows them to be reused by different applications. On the downside, due to the extensibility of XML, the message parsing mechanism is time-consuming.

1.1.3 Frameworks for Distributed Computing

The Distributed Computing Environment (DCE) [9] is based on a component-based architecture. It provides consistent communication across remote execution environments. The framework is used to build client-server applications and provides features like DCE-specific remote procedure calls (DCE/RPC), authentication and security, a naming service and access to a distributed file system.

Java Remote Method Invocation (RMI) [56] allows Java objects to be shared between Java Virtual Machines (JVM) running on multiple nodes. Since the JVM provides a platform-independent environment for Java applications, this allows Java objects to be shared across different platforms. The Java RMI framework is based on the component-based architecture. Using Java RMI, the communication is restricted to a pure Java environment, so it does not provide support for cross language interoperability. One of the gains of using Java RMI is its object-oriented approach, which facilitates building applications.

Microsoft's Distributed Component Object Model (DCOM) [96] is another framework based on the component-based object architecture. DCOM is an extension of the Component Object Model (COM): COM allows building a client-server communication model on the same host, and DCOM extends it to multiple hosts within the same network. DCOM uses the standard RPCs developed for the distributed computing environment. It provides security features and also introduces platform independence for DCOM-aware applications.

The Common Object Request Broker Architecture (CORBA) [105] provides a framework for platform-independent distribution of objects on a heterogeneous network. This is another effort based on a component-based object sharing architecture, and CORBA is one of the most successful frameworks for building distributed solutions. The Object Request Broker (ORB) is the core of CORBA: this component allows the connected nodes to initiate a request without knowing the actual location of, or the programming interface at, the node which can fulfill that request. The framework also provides runtime interface identification and invocation using the Interface Repository (IR) and the Dynamic Invocation Interface (DII).

Web services [29] form one implementation of a service-oriented architecture, and are today the most widely used technology in distributed computing solutions. A web service framework enables a distributed application to offer functionality by publishing its functions through interfaces while hiding the implementation details. Clients communicate with standard protocols without actually knowing the platform or the implementation details. The success of web service technology is due to the acceptance of standards. Usually the communication process is based on three components: XML for data exchange between client application and service, SOAP, and HTTP(S). Also, WSDL (Web Service Description Language) [41], an XML based language to describe the attributes, interfaces and other properties of a web service, is sometimes used.

1.2 Models for Scalable Distributed Computing Infrastructures

Distributed Computing Infrastructures can broadly be categorized into either small to medium scale environments, formed by closely interconnected computing resources in a single organization, or large scale computational environments, based on distributed resources shared between organizations. Managing geographically distributed, heterogeneous infrastructures requires more advanced solutions to provide reliable systems. Some of the key issues that should be considered are:


• The proposed model should be scalable and adaptable to new requirements.

• Due to the heterogeneous nature of the environment, the availability of the resources is not guaranteed. The solutions should be flexible enough to accommodate changes.

• The resources are managed by different administrative domains. When designing a federated infrastructure, the domain autonomy should be kept intact.

• It is important to ensure the correct use of the resources in the system, and mechanisms for authorization and authentication are needed.

• An efficient resource discovery mechanism is required to keep the information updated.

• Maximum usage of the system requires an efficient and reliable resource allocation mechanism.

• An abstraction layer should hide the underlying complexity from the users.

Grid Computing and Cloud Computing are the two most successful models that implement a distributed computing infrastructure.

1.2.1 Grid Computing

Grid technology provides means to facilitate work in collaborative environments formed across the boundaries of institutions and research organizations. In [62], grid technology is stated to "promise to transform the practice of science and engineering, by enabling large-scale resource sharing and coordinated problem solving within farflung communities". Over the last decade, a number of research and development projects have put a lot of effort into making grid technology stable enough to provide a production infrastructure for both computation and data.

Grid technology allows different kinds of resources to be seamlessly available over geographical and technological boundaries. The resource can be anything from a single workstation, a rack mounted cluster, a supercomputer, or a complex RAID storage, to e.g. a scientific instrument that produces data. These resources are normally independent and managed by different administrative domains. This brings many challenges in how to enable different virtual organizations [64] to access resources in different domains. A basic question is how to select which resource to use to run the application or store the data. Since each set of resources is subject to different access policies, how can one enable a standard access mechanism? How can the environment be made secure enough to maintain the integrity of the system? How can one build a reliable monitoring and accounting system with low overhead? What protocols should be used to communicate with users, between computing resources and between storage centers? Each of these questions has emerged as a sub-field of grid computing research in which different research groups have come up with various types of solutions.

The uptake of grid technology within the scientific community can be measured by the number of middleware initiatives and the number of projects utilizing grid resources through these middlewares. For example, by the end of the EGEE project, the gLite middleware [10] had more than 260 sites all over the world, with 150,000 processing cores, 28 petabytes of disk space and 41 petabytes of long-term tape storage. More than 15 different scientific domains benefited from this infrastructure. The Advanced Resource Connector (ARC) middleware [57] by NorduGrid [27] has 66 sites with more than 54,000 CPUs in use [7]. Many other middlewares, such as Condor-G, Globus [17] and Unicore [40] for computing grids, and dCache, CASTOR, DPM and SRB for storage grids, are also heavily used in different scientific experiments. Apart from these production middlewares for computational and storage grids, a number of research projects have developed different application specific and general purpose environments based on these middlewares.

1.2.2 Cloud Computing

Clouds address large-scale storage and computing needs by providing a certain level of abstraction. This technology has gained much attention over the last few years, and companies like Amazon, Yahoo and Google have presented commercial solutions. There are a number of definitions [39, 103] explaining the concept of a cloud; one example is found in [106], stating that "A Computing Cloud is a set of network enabled services, providing scalable, QoS guaranteed, normally personalized, inexpensive computing platform on demand, which could be accessed in a simple and pervasive way".

The basic idea of cloud technology is to provide a given level of quality of service while keeping the infrastructural details hidden from the end users. The customer pays for and gets the services on demand. In [103], the set-up of a cloud service is based on two actors: Service Providers (SPs), which provide a set of different services (e.g. Platform as a Service (PaaS) or Software as a Service (SaaS)) and ensure that the customers can access these, and Infrastructure Providers (IPs), which are responsible for the hardware infrastructure. Actors with specialized roles introduce flexibility in the system; for example, one SP can utilize the infrastructure of multiple IPs, and a single IP can provide infrastructure for one or several SPs.

Having actors responsible for providing services that fulfill a certain Service Level Agreement (SLA), together with an economic model, encourages companies to adopt cloud technology and sell computing and storage services like other utilities such as electricity or gas.


1.2.3 Grids vs Clouds

Currently, a discussion aimed at pinpointing the differences between clouds and grids is ongoing. In [66], a detailed comparison of these technologies is presented, and it is clarified that there are differences in the security, computing and programming models. Another key difference is the elasticity provided by cloud solutions. The concept of elasticity allows applications to grow and shrink according to their requirements. This is very important for clouds, as the idea of pay-as-you-go cannot work if clouds do not provide dynamic management of resource utilization. Also, the grid concept focuses on loosely coupled federated infrastructures in which there is no guarantee that resources are available at all times. In contrast, the current solutions for clouds are based on closely connected dedicated resources where the infrastructure providers guarantee the availability. However, there are also similarities in vision, sometimes in the architecture, and also in the tools that are used to build the systems.

1.2.4 Other Relevant Models

Apart from grids and clouds, there are further models for managing distributed infrastructures. Utility Computing is one of them. This concept is somewhat similar to the cloud in that an economic model is attached to the computing model and the cost depends on the usage of the resource. Another effort in this direction is the Desktop Grid. The Desktop Grid model inherits the features of grid computing but focuses on low cost, reliable and maintainable solutions. Autonomic Computing focuses on self-management in distributed environments; the idea is to build self-sufficient components which can manage themselves under unpredictable conditions. Another model that has gained significant attention is Pervasive Computing, based on the idea that devices should be completely connected and fully available.

1.3 Technologies for Large Scale Distributed Computing Infrastructures

There are a number of reliable solutions available which are based on the concepts of grid and cloud computing. In grids, the term grid middleware is used to describe a software stack designed to enable seamless, reliable, efficient and secure access to geographically distributed resources, whereas in clouds everything is known as a service.

A number of different middleware initiatives have been started over the years, and the following description only gives a brief overview of a few production level middlewares for computational and storage grids.

21

Page 22: Dedicated to my Family - DiVA portal523474/... · 2012-05-21 · Presentation at NorduGrid Conference, Sundvolden, 2011. Extension of Chelonia Storage System to handle databases

• Globus Toolkit: Globus is a pioneering project that provides tools to build grid middlewares. The toolkit [63] provided by Globus contains several components which can broadly be categorized into five classes: Execution Management [13], which executes, monitors, and schedules grid jobs; Information Service [23], which discovers and monitors resources in the grid; Security [35], which provides the Grid Security Infrastructure (GSI); Data Management [18], which allows for handling of large data sets; and finally Common Runtime, a set of tools and libraries used to build the services.

Other middleware initiatives provide a more full-blown solution for distributed computational and storage resources and are directly used in different application areas:

• Advanced Resource Connector (ARC): The Advanced Resource Connector (ARC) grid middleware is developed by the NorduGrid consortium [26] and the EU KnowARC project [21]. The ARC middleware is SOA-based, where services run in a customized service container called the Hosting Environment Daemon (HED) [49]. HED comprises pluggable components which provide different functionalities. For example, Data Management Components are used to transfer data using various protocols, Message Chain Components are responsible for the communication between clients and services, ARC Client Components are plug-ins used by the clients to connect to different grid flavors, and Policy Decision Components are responsible for the security model within the system. There are a number of services available for fulfilling the fundamental requirements of a grid system. For example, grid job execution and management is handled by the A-REX service [80], policy decisions are taken by the Charon service, the ISIS service [98] is responsible for information indexing, and batch job submission is handled by the Sched service. The work presented in this thesis is based on the ARC middleware. In [28], further details on each of the components and services in ARC are presented.

• gLite: The gLite middleware [85] was the interface to the resources in the EGEE [70] infrastructure. gLite is also SOA-based. Two core components of the gLite middleware stack are gLiteUI, a specialized user interface to access available resources, and the Virtual Organization Management Service (VOMS), which manages information and access rights of the users within a VO. Resource level security is managed by the Local Centre Authorization Service (LCAS) and the Local Credential Mapping Service (LCMAPS). The Berkeley Database Information Index (BDII) is used for publishing the information. The Workload Management System (WMS) [90] is a key component of the system and distributes and manages user tasks across the available resources. The lcgCE and CREAM-CE (Computing Resource Execution And Management Computing Element) are services providing the computing element, and lcgWN is the service for a worker node. For Data Management [102], the LFC (LCG File Catalog) and the FTS (File Transfer Service) are used. R-GMA [52] and the File Transfer Monitor (FTM) [15] are used for monitoring and accounting.

• UNICORE: UNICORE [99] is a middleware based on a three-layered architecture. The top layer deals with the client tools. The second, service layer consists of core middleware services such as authentication, job management and execution; application workflows are managed by the Workflow Engine and the Service Orchestrator. The bottom layer is the systems layer, which contains the connection between UNICORE and the autonomous resource management systems. External storage is managed via the GridFTP protocol.

• dCache: dCache [97] is a distributed storage solution which combines geographically distributed storage nodes. It also provides access to tertiary storage systems. The major features of dCache include hot-spot detection, data flow control and the support of different data access protocols. dCache is based on a service-oriented architecture which combines heterogeneous storage elements to collect several hundreds of terabytes in a single namespace. The Nordic Data Grid Facility (NDGF) [25] is the largest example of a dCache deployment. There, the core components, such as the metadata catalogue, the indexing service and the protocol doors, are run in a centralized manner, while the storage pools are distributed.

• OGSA-DAI: The Open Grid Services Architecture – Data Access and Integration (OGSA-DAI) [78] is a storage middleware solution that allows uniform access to data resources using a SOA approach. OGSA-DAI consists of three core services: the Data Access and Integration Service Group Registry (DAISGR) allows other services in the system to publish metadata and capabilities; the Grid Data Service Factory (GDSF) has a direct connection to the data resource and contains additional metadata about the resource; and the Grid Data Service (GDS), created by the GDSF, is used by the clients to access the data. A set of Java-based APIs allows clients to communicate with the system.

• European Middleware Initiative (EMI): The EMI [11] is a complete software stack based on four major European middlewares: ARC, UNICORE, gLite, and dCache. The aim is to provide a coherent middleware by adhering to software standards for interoperability between the core services of the partner middlewares. Recently, EMI-1 (codename Kebnekaise) has been released. It consists of a comprehensive set of tools and services for distributed computing infrastructures, which includes EMI-Compute for enabling computational resources, EMI-Data for distributed data management, EMI-Infrastructure for the set of services required for information and management of DCIs, and EMI-Security for secure communication.


• Meta-middlewares: The problem of having to learn and use multiple middlewares has been addressed by adding another layer on top of the existing middlewares. This meta-layer interacts with the underlying middlewares and can also add new functionality. The Grid Job Management Framework (GJMF) [58] used in this thesis is an example of a middleware independent resource allocation framework.

In contrast to the grid middleware initiatives described above, some well known cloud based solutions are described below.

• Amazon Cloud Services: Amazon [71, 6] provides commercial solutions for computing and storage capabilities through the Elastic Compute Cloud (EC2) [1] and Simple Storage Service (S3) [4] web services. The Amazon cloud provides a seamless view of the computing and storage services with a pay-as-you-go model. The S3 service is based on the concept of Buckets: containers that store objects and can be configured to reside in a specific region. S3 provides APIs using REST [79] and SOAP for the most common operations like Create Bucket, Delete, Write and Read Objects, and Listing Keys (a sketch of these operations follows after this list). EC2 allows access to the computational resources using web service interfaces. Apart from these two services, Amazon also provides SimpleDB [5] for core database functions like indexing and querying in the cloud, while RDS [3] addresses users that need a relational database system and the Elastic MapReduce [2] service allows users to process massive amounts of data.

• Azure Cloud Services: Azure is a commercial cloud solution developed by Microsoft. Using Azure Compute, one can build applications using any language, tool or framework. Azure Storage is similar to other storage cloud solutions in the sense that users create containers, where each container stores items of different types. Azure also provides features like data access using RESTful interfaces, automatic content caching near the users and a secure access mechanism for data in the cloud [7]. Other services provided by Azure include SQL Azure for databases; applications can enable reporting by using Azure Business Analytics, and the Service Bus allows applications to create a reliable messaging mechanism for loosely coupled applications.

• Openstack Cloud: The Openstack effort is a global, collaborative enterprise for specifying interfaces and building open source components for cloud technology. The effort spans a wide field, covering computing, storage and image services. Openstack Compute is an open source solution based on a large network of virtual machines to provide a scalable computing platform. SWIFT is the Openstack storage solution: a BLOB-based solution for managing petabytes of data. The Openstack Image Service provides discovery, registration and delivery services for virtual machine images. A sketch of basic SWIFT operations is also given below.
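To make the storage interfaces concrete, the following Python sketches show basic object operations against S3 and SWIFT. Both are illustrative only: the bucket, container, key and endpoint names are invented, credentials are assumed to come from the environment or the deployment, and the client libraries used (boto3 for S3, python-swiftclient for SWIFT) are modern SDKs rather than part of the systems described above.

```python
import boto3  # AWS SDK for Python; credentials are read from the environment

s3 = boto3.client("s3")

# A bucket is a container for objects, tied to a region at creation time.
s3.create_bucket(Bucket="my-experiment-data",
                 CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})

# Write, read, list and delete objects; keys address objects in the bucket.
s3.put_object(Bucket="my-experiment-data", Key="run-001/result.dat",
              Body=b"measurement payload")
data = s3.get_object(Bucket="my-experiment-data",
                     Key="run-001/result.dat")["Body"].read()
for entry in s3.list_objects_v2(Bucket="my-experiment-data",
                                Prefix="run-001/").get("Contents", []):
    print(entry["Key"], entry["Size"])
s3.delete_object(Bucket="my-experiment-data", Key="run-001/result.dat")
```

The SWIFT equivalent, assuming a v1 authentication endpoint (the URL, account and key are placeholders):

```python
from swiftclient.client import Connection  # python-swiftclient

conn = Connection(authurl="http://swift.example.org/auth/v1.0",
                  user="account:user", key="secret", auth_version="1")

conn.put_container("analysis")                     # create a container
conn.put_object("analysis", "run-001/histo.root",  # upload a BLOB
                contents=b"ROOT file payload")
headers, body = conn.get_object("analysis", "run-001/histo.root")
conn.delete_object("analysis", "run-001/histo.root")
```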


Part II: Application Execution Environments


2. Application Environments for Grids

Grid systems provide a means for building large-scale computational and storage environments meeting the growing needs of scientific communities. There are challenges in building and managing efficient and reliable grid software components, but another area that also requires serious attention is how to enable applications to use the grid environment. Often, scientific applications are built using a monolithic approach, which makes it difficult to exploit a distributed computing framework. Even for a very simple application, the user needs certain expertise to run the job on a grid system: the client tool has to be installed and configured, a job description file has to be prepared, credentials have to be handled, commands to submit and monitor the job have to be issued, and finally the output files might have to be downloaded. Complex scientific applications use external libraries, input data sets, external storage space and certain toolkits, which adds complexity when running the application in a grid environment. Large efforts are needed to handle all these issues, and this greatly affects the overall progress of the real scientific activity.

To get maximum benefit from a grid computing infrastructure, the user community needs to be provided with flexible, transparent and user friendly general purpose and application specific environments. Such environments can also e.g. handle several different middlewares in a transparent way.

2.1 Grid Portals

Grid application portals represent one way to address the requirements mentioned above. The goal is to access the distributed computational power through a web interface and make application management as simple as using the web for sharing information. A number of different projects have developed production level application portals. For example, GridSphere [19], the LUNARC portal [86], GENIUS [44], and P-Grid [94] together with GEMLCA [55] provide middleware independent grid portals.

2.2 Application Workflows

Scientific applications are often quite complex, and a computerized experiment is built up from the execution of multiple dependent or independent components. Single or bulk job submission and management systems cannot handle such applications. Enabling complex applications to utilize grid resources requires a comprehensive execution model. In a grid environment such models are known as application workflows [108]. In [67], a formal definition of a grid workflow is given as "The automation of the processes, which involves the orchestration of a set of grid services, agents and actors that must be combined together to solve a problem or to define a new service". Apart from different independent web-based or desktop applications for handling workflows, different middlewares provide separate components for managing workflows. These components allow a workflow to be submitted as one single, complete task; a minimal sketch of such dependency-driven execution is given below. Condor's DAGMan (Directed Acyclic Graph Manager) [53] and Unicore's Workflow Engine [46] are examples of such components. Other extensive efforts include Triana [38], an open source problem solving environment, Pegasus [33], and Taverna [95] for bioinformatics applications.
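Workflows are commonly modeled as directed acyclic graphs (DAGs) of tasks. The following Python sketch, with invented task names and a print statement standing in for actual job submission, illustrates the core idea of executing a task only once all its dependencies have completed:

```python
# A workflow as a directed acyclic graph: each task lists the tasks it
# depends on, and a task may run only after all its dependencies are done.
workflow = {
    "fetch_input": [],
    "preprocess":  ["fetch_input"],
    "simulate":    ["preprocess"],
    "visualize":   ["simulate", "preprocess"],
}

def run(task: str) -> None:
    print("running", task)        # stand-in for an actual job submission

done = set()
while len(done) < len(workflow):
    ready = [task for task, deps in workflow.items()
             if task not in done and all(dep in done for dep in deps)]
    if not ready:
        raise RuntimeError("cycle detected in workflow")
    for task in ready:            # tasks in `ready` are mutually independent
        run(task)                 # and could be dispatched in parallel
        done.add(task)
```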

2.3 The Job Management Component

The job management component is an important basic building block of an application environment. The task of this component is to handle job submission, management, resubmission of failed jobs and possibly also migration of jobs from one resource to another. Often the job management component is designed as a set of services having well-defined tasks, and the functionality is exposed through client tools or a set of APIs. This component works together with the client-side interface to provide flexible, robust and reliable management, and it is also responsible for providing seamless access to multiple middlewares. One example is the GEMLCA integration with the P-Grid portal, in which the layered architecture of GEMLCA provides a grid-middleware independent way to execute legacy applications. In other examples, the GridWay [75] metascheduler provides reliable and autonomous execution of grid jobs, and GridLab [101] produces a set of application-oriented grid services which are accessed using the Grid Application Toolkit (GAT). Using these tools, application developers can build and run applications on the grid without knowing too many details.
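The essence of such a component, stripped of any real middleware API, can be sketched as a submission loop with bounded resubmission of failed jobs; submit_job and poll_status below are hypothetical stand-ins for middleware-specific calls:

```python
import random

def submit_job(description: dict) -> str:
    """Hypothetical stand-in for a middleware submission call."""
    return f"job-{random.randrange(10**6)}"

def poll_status(job_id: str) -> str:
    """Hypothetical stand-in for a middleware status query."""
    return random.choice(["FINISHED", "FAILED"])

def run_reliably(description: dict, max_retries: int = 3) -> str:
    """Submit a job and resubmit it on failure, up to max_retries times."""
    for attempt in range(1, max_retries + 1):
        job_id = submit_job(description)
        status = poll_status(job_id)      # in reality: poll until terminal
        if status == "FINISHED":
            return job_id
        print(f"attempt {attempt}: {job_id} {status}, resubmitting")
    raise RuntimeError("job failed after all retries")

run_reliably({"executable": "simulate.sh", "cpu_time": "60m"})
```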

2.4 Thesis Contribution

In articles I and II we have developed frameworks for managing applications in grid systems. The solutions address both general purpose and application specific problem-solving environments. Our solution is based on the Lunarc Application Portal (LAP) [86], the R framework [34] and the Grid Job Management Framework (GJMF) [58]. LAP provides a user friendly web interface for executing the applications. The default version of LAP relies on the Advanced Resource Connector (ARC) middleware for job management. GJMF was designed to provide a middleware independent job management framework. Using a multi-layered architecture, GJMF subdivides the tasks and provides reliable and fault-tolerant submission and management of grid jobs. The R framework is heavily used by biologists; it provides a wide variety of statistical and graphical techniques, and is highly extensible.

2.4.1 System Architecture

The architecture developed in this thesis enables use of distributed computing infrastructures by introducing an abstraction layer that hides the underlying details and provides a simple and easy to use interface. In a distributed environment, computational and storage resources are exposed following various standards. Due to the lack of interoperability, application users are bound to a limited set of resources. The proposed solution also addresses this issue through GJMF's transparent access to resources running under different middlewares.

Figure 2.1. System architecture for enabling a flexible execution environment.

Based on the features and functionalities provided by LAP and GJMF, we have developed an architecture which combines the best of the two systems. The architecture is based on three layers, and the components in the layers have well-defined tasks. LAP works as the Presentation Layer and provides the application management, whereas GJMF works at the Logic Layer and ensures reliable and middleware independent job submission and management functionalities. Figure 2.1 illustrates the flexibility of the architecture. The architecture is highly modular and provides component level fault tolerance, i.e. single or multiple LAP(s) can use single or multiple GJMF deployments. Article I describes the work in detail.

Based on these principles, we have enabled the R software framework at the presentation layer. This approach enables scientists to utilize their local resources for simple tasks expressed in R and to submit computationally expensive tasks to grids, all while working in a familiar environment. Article II presents an application specific problem-solving environment based on R and GJMF.


Part III: Distributed Storage Solution


3. Distributed Storage Systems

Large-scale storage systems have become an essential computing infrastructure component for both research and commercial environments. Distributed storage systems already hold petabytes of data, and the size is constantly increasing. The challenges of handling huge data volumes include requirements of consistency, reliability, long term archiving and high availability. In distributed collaborative environments, such as particle physics [32], earth sciences [61] and biomedicine [73], the requirements on a distributed storage system are even more pronounced. In order to efficiently utilize the computational power, high availability of the required data is essential. In commercial environments, companies like Amazon, Yahoo and Google are working on solutions to provide "unlimited storage anytime, anywhere".

Centralized storage solutions cannot handle the upcoming data challenges in a scalable manner; instead, distributed storage systems (DSS) are needed to address these challenges. Network Attached Storage (NAS) and Storage Area Networks (SAN) provide limited solutions, but for large scale storage requirements the concept of geographically distributed resources in Data Grids [51] is a viable solution. The concept of data grids is to create large, virtual storage pools by connecting a set of geographically distributed storage resources.

During the last two decades, the challenge of designing DSS for huge data sets has been addressed in a number of projects. Solutions such as Google BigTable [50] have been developed, where a distributed storage system is used for managing petabytes of data over thousands of machines. BigTable is based on the Google File System [72] and is in use with some highly data intensive applications like Google Earth, Google Analytics and Google personal Search Engine. Amazon Dynamo [54] is a storage system used by the world's biggest web-store, Amazon.com. Hadoop [20] is another effort aimed at designing a reliable and scalable distributed storage system.

In the research community there are several projects where different solutions have been developed. For example, CASTOR and DPM [102] from CERN and dCache [69] from FermiLab and the DESY laboratory are in use to handle petabytes of data generated by the Large Hadron Collider (LHC) experiments. Here, the data centres are located all over the world and the DSS is used to store the data on geographically distributed storage nodes. dCache is also capable of handling tertiary storage for long term data archiving. Tahoe [37] is an open source filesystem which utilizes several nodes in a resilient architecture. XTreemFS [76] addresses the same problem of distributed storage over a heterogeneous environment using an object-based filesystem. iRODS [107] presents a layer on top of third party storage solutions and gives high-level seamless access to different storage systems.

The projects listed above show the variety of large scale distributed storage systems available for both commercial and research communities. Despite all these big projects, new efforts are needed to assess the limitations of current DSS.

3.1 Characteristics of Distributed Storage

Different studies have been conducted to identify the key features and characteristics of large-scale storage systems. In [104], a comprehensive summary of the requirements and key characteristics of such systems is given:

• Reliability: The system should be capable of reliably storing and sharing the data generated from various applications.

• Scalability: The system should have a scalable architecture in which thousands of geographically distributed storage pools can dynamically join and leave the system.

• Security: The security model is an essential part of a DSS. It is important that users can share data in an easy-to-use but secure environment. Security is required at different levels in the system, e.g. between different components of the system, while transferring data, when accessing meta-data, and to determine ownership of files and collections.

• Fault Tolerance: While handling large amounts of data in a geographically distributed environment, it is expected that the system experiences hardware or component failures. The system should have the capability to recover transparently from a certain level of failures.

• High Availability: To run the system in a production environment it is important that the system is highly available.

• Accessibility: To make the system practically usable it is very important that the interfaces are simple enough to hide the overall complexity from the end user.

• Interoperability: Applications have diverse, emerging requirements which lead to a variety of scalable solutions. It is important to follow standards that allow interoperability between such solutions.

3.2 Challenges of Distributed Storage

Designing large-scale distributed storage systems is a non-trivial task. All the characteristics of DSS listed above have been extensively studied in the past years. In [51], core components have been identified for distributed data management, and several projects have been initiated that help to increase the overall progress. Below, the most commonly identified technical challenges in building a reliable, efficient, scalable, highly available and self-healing distributed storage system are listed:

Data Abstraction or Virtualization: The system should provide a high-level abstraction when utilizing storage resources over independent administrative domains.

Data Transfer: Data intensive applications and replication mechanisms require protocols for efficient and reliable data transfer.

Metadata Management: Decoupling and management of information about the data available in the system is a serious challenge in the design of a DSS. For large scale systems, the meta-data store is often the scalability bottleneck and a single point of failure in the system.

Authentication and Authorization: Resources running in independent administrative domains must have a security layer which allows single sign-on access to the resources. In grid systems, security is often handled by X.509 certificates signed by a certificate authority. Also, the concept of a virtual organization has evolved to make it possible to apply policies or rules by defining a group of individuals or projects in the same field.

Replica Management: High availability and reliability of the data is often ensured by creating multiple copies of the data. A number of strategies have been proposed and studied for offering efficient and reliable replica management in DSS; a minimal sketch of the basic repair step is given after this list.

Resource Discovery and Selection: The heterogeneous nature of most DSS results in a need for a mechanism that gives information about the availability of data and its replicas in the system. The information about data availability helps to select the source which can deliver the data to the destination most efficiently.
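As announced above, the basic repair step behind replica management can be written down in a few lines. The following Python sketch uses a hypothetical data model: a catalog mapping logical file names to the nodes holding a replica, and a set of currently alive nodes.

```python
def repair_replicas(catalog, alive_nodes, replication_factor=2):
    """Sketch of one replica-repair pass (hypothetical data model).

    catalog maps a logical file name to the set of nodes holding a
    valid replica; alive_nodes is the set of reachable storage nodes.
    Returns the (file, destination) transfers needed to restore the
    requested replication factor.
    """
    transfers = []
    for logical_name, holders in catalog.items():
        valid = holders & alive_nodes             # discard offline replicas
        missing = replication_factor - len(valid)
        candidates = sorted(alive_nodes - valid)  # nodes without a replica
        for target in candidates[:missing]:
            transfers.append((logical_name, target))
            valid.add(target)
        catalog[logical_name] = valid
    return transfers

catalog = {"/vo/data/run42.dat": {"node1", "node3"},
           "/vo/data/run43.dat": {"node3"}}
alive = {"node1", "node2", "node4"}               # node3 is offline
print(repair_replicas(catalog, alive))
```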

3.3 Thesis Contribution

In this thesis the development of the Chelonia storage system is presented, together with an investigation of the open source cloud solution Openstack – SWIFT for CERN-specific data analysis. The following sections give an overview of these projects.

3.3.1 Chelonia Storage System

The Chelonia storage system was developed using the next generation components of the ARC middleware. Chelonia is a file-oriented distributed storage system based on geographically distributed storage nodes. The system is designed to fulfill requirements ranging from creating a store for, e.g., managing holiday pictures to facilitating scientific communities requiring a grid-aware distributed storage system that can be used by grid jobs. The Chelonia system addresses many of the challenges mentioned in section 3.2. Below, a brief overview of the system is given, whereas papers III, IV and V and [92, 91, 93] provide complete details about Chelonia's architecture, performance and stability evaluation as well as experiences from deploying Chelonia in real environments.

Architecture and System Components

Chelonia follows a service-oriented architecture. It is based on four core services, each with a well-defined role. Figure 3.1 shows an overview of the Chelonia architecture. The communication in the system uses SOAP over HTTP(S).

Figure 3.1. Architecture of Chelonia Storage System

The Chelonia services are described in the following; a minimal sketch of how the Librarian fails over between A-Hash replicas is given after the service descriptions.

• A-Hash (A-H): The A-Hash is a metadata store for consistently storing information in property-value pairs. Chelonia supports two types of A-Hash, centralized and replicated. Being such a central part of the storage system, the A-Hash needs to be consistent and fault-tolerant. The replicated A-Hash is based on the Oracle Berkeley DB [31] (BDB), an open source database library with a replication API. The replication is based on a single-master, multiple-clients framework where all clients can read from the database and only the master can write to the database. In the event of the master going offline, the clients send a request for election, and a new master is elected amongst the clients.

• Librarian (L): The Librarian works as a metadata catalog while remaining a stateless service in the system; it stores all persistent information in the A-Hash. This makes it possible to deploy any number of independent Librarian services to provide high availability and load balancing. The Librarian only needs to know about one of the A-Hashes at start-up to be able to get the list of all available A-Hashes. During run-time the Librarian holds a local copy of the A-Hash list and refreshes it both regularly and in the case of a failing connection.


• Shepherd (S): Each instance of the Shepherd service manages a particular storage node and provides a uniform interface for storing and accessing file replicas. In addition to storing files and providing access to them, the Shepherd is responsible for checking if a file replica is valid and, if necessary, initiating replication of the file to other Shepherds.

• Bartender (B): The Bartender service provides a high-level interface of the storage system for the clients (other services or users). Access policies associated with files and collections are evaluated by the Bartender. The Bartender communicates with the Librarian and Shepherd services to execute the client's requests. The Bartender also supports so-called gateway modules which make it possible to communicate with third-party storage solutions, thus enabling the user to access multiple storage systems through a single Bartender client.
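The retry-and-refresh pattern used between the Librarian and the replicated A-Hash, announced above, can be illustrated as follows. The Python sketch below uses stand-in classes and hypothetical method names, and omits the SOAP communication entirely; it only shows the failover logic.

```python
class AHashStub:
    """Stand-in for an A-Hash replica (illustration only)."""

    def __init__(self, name, store):
        self.name = name
        self.store = store          # shared property-value pairs
        self.replicas = []          # all replicas, filled in after creation
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise ConnectionError(self.name + " is offline")
        return self.store[key]

    def list_replicas(self):
        if not self.alive:
            raise ConnectionError(self.name + " is offline")
        return self.replicas

class Librarian:
    """Sketch of the Librarian's A-Hash handling (hypothetical API)."""

    def __init__(self, bootstrap):
        self.ahashes = [bootstrap]  # knowing one A-Hash suffices at start-up

    def get(self, key):
        for ahash in list(self.ahashes):
            try:
                value = ahash.get(key)
                self.ahashes = ahash.list_replicas()  # regular refresh
                return value
            except ConnectionError:
                continue            # failing connection: try the next replica
        raise RuntimeError("no A-Hash replica reachable")

store = {"/grid/file1": {"size": 42}}
a1, a2 = AHashStub("a-hash-1", store), AHashStub("a-hash-2", store)
a1.replicas = a2.replicas = [a1, a2]

librarian = Librarian(a1)
librarian.get("/grid/file1")           # served by a-hash-1, list refreshed
a1.alive = False                       # a-hash-1 goes offline
print(librarian.get("/grid/file1"))    # transparently served by a-hash-2
```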

Features of Chelonia

The Chelonia storage system offers the following set of features:

• A global hierarchical namespace that allows all users to see exactly the same tree of files and collections.

• Files in Chelonia are replicated, and broken replicas are repaired automatically by the system. This makes Chelonia self-healing.

• Chelonia ensures secure file transfer through the HTTPS protocol, and support for additional protocols can easily be added.

• Files in Chelonia can be referred to by Logical Names (LN), which are paths in the Chelonia global namespace.

• Chelonia is flexible in its setup, and it is possible to add or remove any service in the system without downtime or complicated reconfiguration.

• Third-party storage systems can be integrated into the Chelonia global namespace in a way similar to mounting remote file systems into a local file system.

• Chelonia provides an easy way to turn any directory on any computer into a storage element in Chelonia via Hopi, the native lightweight HTTPS server.

• Chelonia comes with a FUSE module making it possible to handle the entire storage as a local directory.

• Chelonia users can assign access policies to files and collections in the system, granting access to individual users or entire virtual organizations in a grid environment.


3.3.2 Database Enabled Chelonia

Recently we have enabled relational databases at the Shepherd nodes (storage nodes) running in the Chelonia domain. The extension with databases is specialized for scientific applications, and we have used MySQL [24] as the RDBMS. Based on the proposed extended architecture, we have also simplified the use of databases by providing a generalized database schema. The Chelonia-Schema can be viewed as a variable catalog which stores variables of both simple (integer, real and string) and complex (arrays, matrices and tensors) datatypes in the underlying geographically distributed databases running at the Shepherd nodes. Since the data resides in an RDBMS, users access the required data by formulating queries in the Structured Query Language (SQL), a well-known query language for relational databases. As shown in figure 3.2, users can send SQL queries to the system using the Chelonia command-line tool or WSMED [89] (Web Service MEDiator, a system that provides relational views of any data providing web service operations by reading their WSDL documents); a minimal sketch of the variable-catalog idea is given below the figure.

Figure 3.2. The database enabled Chelonia storage can handle queries coming from WSMED and SSDM.
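To make the variable-catalog idea concrete, the Python snippet below sketches a schema of this kind. The table and column names are hypothetical (the actual Chelonia-Schema is described in article VI), SQLite stands in for MySQL only to keep the example self-contained, and complex types are serialized as BLOBs.

```python
import pickle
import sqlite3   # stands in for MySQL so that the sketch is runnable as-is

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE variable (
                  name   TEXT PRIMARY KEY,
                  vtype  TEXT,   -- 'integer', 'real', 'string',
                                 -- 'array', 'matrix' or 'tensor'
                  scalar REAL,   -- used for the simple numeric types
                  blob   BLOB    -- used for the complex types
              )""")

# Simple values are stored inline, complex values as serialized BLOBs.
db.execute("INSERT INTO variable VALUES (?, ?, ?, ?)",
           ("temperature", "real", 21.5, None))
matrix = [[1.0, 2.0], [3.0, 4.0]]
db.execute("INSERT INTO variable VALUES (?, ?, ?, ?)",
           ("covariance", "matrix", None, pickle.dumps(matrix)))

# Users retrieve the data by formulating ordinary SQL queries.
row = db.execute("SELECT blob FROM variable"
                 " WHERE name = 'covariance'").fetchone()
print(pickle.loads(row[0]))   # [[1.0, 2.0], [3.0, 4.0]]
```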

We have also used the database enabled Chelonia as a backend for the SciSPARQL Database Manager (SSDM) [43]. SciSPARQL is an extension of SPARQL, a query language for the Semantic Web, and provides additional syntax and semantics for accessing numeric arrays of arbitrary dimensionality, including array slicing, projection and transposition. The aim of this project is to enhance the data analysis capabilities for data-intensive applications. Apart from providing access to sections of multidimensional arrays, SciSPARQL also supports functions for computing the mean, covariance and aggregative mean of different samples of the available data. For advanced data analysis, application-specific programs written in Python, Java and C can also be executed to extract more precise information from the underlying data. Articles VI and VII explain the details of this project.
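To give a flavour of the analysis, the plain-Python sketch below mimics what a SciSPARQL query combining array projection with mean aggregates computes. The data and function names are purely illustrative and this is not SciSPARQL syntax; the actual query language is defined in [43] and articles VI and VII.

```python
# Stored matrices from two simulation runs (illustrative values only).
results = {
    "run-1": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "run-2": [[0.2, 0.1, 0.0], [0.3, 0.2, 0.1]],
}

def column(matrix, j):
    """Array projection: select column j of a stored matrix."""
    return [row[j] for row in matrix]

def mean(values):
    return sum(values) / len(values)

# Mean of column 1 within each run, then the aggregate mean over runs;
# in SciSPARQL this is expressed directly in the query instead of code.
per_run = {name: mean(column(m, 1)) for name, m in results.items()}
print(per_run)
print("aggregate mean:", mean(list(per_run.values())))
```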

3.3.3 Cloud based Storage Solution

Clouds are emerging as a solution to address the computational and storage requirements of different applications. A number of studies have been conducted to identify the strengths and weaknesses of available cloud solutions. In this thesis we have investigated an open source storage cloud, Openstack – SWIFT, for CERN-specific data analysis.

CERN and its collaborative partners are using a number of different storage solutions for managing data coming from the experiments. In large collaborations, various research organizations and institutions with different capabilities and different amounts of resources are involved. It is therefore required to have a range of solutions that fit different requirements. The ROOT [47] software is a data analysis framework for the experiments running at CERN, and ROOT already has the capability to interact with cloud solutions. The work presented in article VIII designed a solution which utilizes already available building blocks and minimizes the need to develop new components.

The Openstack effort is a global, collaborative enterprise for specifying interfaces and building open source components for cloud technology. The effort spans a wide field, covering computing, storage and image services. For this project, we focus on Openstack SWIFT, which can be used to set up public or private cloud-based object storage solutions similar to Amazon S3. Since SWIFT is designed to run on commodity hardware, sites can deploy cloud storage solutions in different ways, depending on the available expertise and resources for deployment. Also, by exposing RESTful interfaces for system access, SWIFT becomes a potential candidate for further investigation. Article VIII discusses the requirements for the data analysis and the functional and performance evaluation of a SWIFT storage solution.
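Because the SWIFT3 middleware speaks the S3 protocol, any S3 client can in principle be pointed at a SWIFT deployment. The following minimal sketch uses the Python boto library; the endpoint, port, credentials, bucket and object names are placeholders, and all error handling is omitted.

```python
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# Connect to a SWIFT proxy with the SWIFT3 (S3 compatibility) middleware
# enabled; host, port and credentials below are placeholders.
conn = S3Connection(aws_access_key_id="ACCESS_KEY",
                    aws_secret_access_key="SECRET_KEY",
                    host="swift.example.org", port=8080, is_secure=False,
                    calling_format=OrdinaryCallingFormat())

bucket = conn.get_bucket("analysis-data")        # S3 bucket = SWIFT container
key = bucket.get_key("run42.root")               # a stored analysis file
key.get_contents_to_filename("/tmp/run42.root")  # fetch it for local analysis
```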


Part IV: Resource Allocation in Distributed Computing Infrastructures


4. Resource Allocation in Distributed Computing Infrastructures

Resource allocation is one of the most important areas in the design of distributed computing infrastructures. The task is to select the best possible resource from the available pool of resources. This requires information from, e.g., the information and cataloging components in the system, and sometimes also information directly from the resources (depending on the architecture of the system).

In grid systems, the process of resource allocation and the actual task submission to the selected resource are normally two separate processes. The grid resource broker, also known as the high-level or meta-scheduler, selects a resource on the basis of the available information. The local resource management system is then responsible for submitting jobs to the underlying cluster. Different strategies for resource brokering in the meta-scheduler have been chosen in different middlewares like gLite [16], Condor [88], ARC [57] and Nimrod-G [42].

In many cases it has been observed that the brokering component is a scalability bottleneck and a single point of failure within the whole grid system. Here, a tight connection between different components in the system affects the overall performance, while a too loosely coupled approach affects the resource selection criteria. A lack of well-defined responsibilities of the components can increase the communication overhead.

4.1 Models for Resource Allocation

The non-trivial issue of selecting the best resources for a given set of tasks has been addressed with many different approaches. Realizing the complexity of the task, an abstract approach based on defining taxonomies has been adopted. In [81], this approach has been studied in detail in the context of computational grids.

Grid middlewares use different models for resource allocation [48]. For the meta-level scheduler, a centralized or a distributed brokering model can be used. The centralized model can provide a complete view of the overall load on the system, hence a more effective distribution of the load on the available resources can be achieved. gLite and Condor are examples of middlewares using the centralized resource allocation model. In the distributed model, each user has a separate broker (a user-centric brokering model); the ARC middleware uses an implementation of the distributed model. Agent based approaches are also employed for efficient and reliable resource allocation. Here, agents are software components considered to be intelligent, autonomous in nature, capable of self-healing and able to take decisions. [68, 83, 82] are examples of systems using agents for resource allocation.
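A user-centric broker of the kind described above essentially performs matchmaking followed by ranking. The Python sketch below illustrates this with hypothetical attribute names; a real broker obtains this information from the grid information system rather than from static dictionaries.

```python
def broker(job, resources):
    """Sketch of a client-side (user-centric) brokering step."""
    # Matchmaking: keep only resources satisfying the job requirements.
    matches = [r for r in resources
               if r["free_cpus"] >= job["cpus"]
               and r["runtime_envs"] >= set(job["runtime_envs"])]
    # Ranking: prefer lightly loaded resources (one possible policy).
    matches.sort(key=lambda r: r["queued_jobs"] / max(r["total_cpus"], 1))
    return matches

resources = [
    {"name": "siteA", "free_cpus": 64, "total_cpus": 512,
     "queued_jobs": 300, "runtime_envs": {"APPS/STATISTICS/R-2.13"}},
    {"name": "siteB", "free_cpus": 16, "total_cpus": 128,
     "queued_jobs": 10, "runtime_envs": {"APPS/STATISTICS/R-2.13"}},
]
job = {"cpus": 8, "runtime_envs": ["APPS/STATISTICS/R-2.13"]}
print([r["name"] for r in broker(job, resources)])   # ['siteB', 'siteA']
```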

These basic models have been further developed into models using market oriented resource allocation [84, 109]. Here, the concept is to create a virtual market in which the resources (computational or storage) are considered as commodities. Resources can be purchased from the resource providers, and the prices vary according to resource demand, as in a real market. Nimrod-G and Tycoon use market based strategies for resource allocation.

For mission critical applications, the result is needed within a certain time frame. Finding a resource which can fulfill the job requirements and also provide the result within a given time adds another level of complexity to the allocation model. To address such requirements, the concept of advanced reservations [59] has emerged. An advanced reservation allows for determining the job's starting time in advance.

4.2 Thesis Contribution

During the PhD project, we conducted a case study to analyze the strengths and weaknesses of the different brokering models used in grid middlewares. Based on this study, we propose some key modifications to the brokering component of the Advanced Resource Connector (ARC) middleware. Figure 4.1 shows the proposed modifications to the ARC components. Our results show that these modifications improve the efficiency of the brokering component, which in turn has an impact on the overall user response time.

Figure 4.1. Modified ARC resource allocation mechanism

In the existing model used in the ARC middleware, a broker at the client side is used for selecting the candidate resource from the available resource pool. In the proposed model we have adopted a three layer brokering model, as shown in figure 4.1. Our initial tests using the ARC client show that as much as 90% of the job submission time was spent on resource discovery and only 10% was used for the matchmaking and actual submission. The goal of creating a hierarchical model is to subdivide the responsibilities and minimize the time spent on resource discovery, which in turn enhances the efficacy of the resource allocation. In article III in the list of other publications (not included in the thesis), we explain the modifications and the performance improvements in detail.
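Since the measurements show that resource discovery dominates the submission time, the essence of the modification is to decouple discovery from submission. The Python sketch below illustrates only the underlying amortization idea, serving matchmaking from a periodically refreshed cache instead of querying the information system at every submission; the names and numbers are illustrative and this is not the actual ARC implementation.

```python
import time

class DiscoveryCache:
    """Sketch: amortize expensive resource discovery over many submissions."""

    def __init__(self, discover, ttl=60.0):
        self.discover = discover     # expensive discovery callable
        self.ttl = ttl               # refresh interval in seconds
        self.resources, self.stamp = [], 0.0

    def get(self):
        if time.time() - self.stamp > self.ttl:
            self.resources = self.discover()   # the slow step, now rare
            self.stamp = time.time()
        return self.resources                  # fast path for submissions

def slow_discovery():
    time.sleep(0.1)                  # stands in for querying every resource
    return ["siteA", "siteB"]

cache = DiscoveryCache(slow_discovery)
start = time.time()
for _ in range(100):                 # 100 submissions, a single discovery
    cache.get()
print("elapsed: %.2f s" % (time.time() - start))
```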


Part V: Article Summary


5. Summary of Papers in the Thesis

5.1 Paper-I

This paper presents a reliable, robust and user-friendly environment for managing jobs on grids. The presented architecture is based on the integration of the LUNARC Application Portal (LAP) and the Grid Job Management Framework (GJMF). LAP provides a user-friendly environment for handling applications whereas GJMF contributes reliable, robust and middleware independent job management. A Java based component, the Portal Integration Extensions (PIE), is developed and used as an integration bridge between LAP and GJMF. The scalability and flexibility of the integration architecture mean that a single LAP can make use of multiple GJMFs, while multiple LAPs can make use of the same GJMF. Similarly, a single GJMF can make use of multiple middleware installations concurrently, just as multiple GJMFs can utilize the same middleware installation. The components of the architecture are designed to function non-intrusively for seamless integration in production grid environments, and the architecture also allows for backward compatibility. Using the proposed model, and with the help of applications from different research fields, the presented results show that such application environments can enhance the progress of research in the application fields.

5.2 Paper-II

This paper describes a grid-enabled problem solving environment (PSE) for Quantitative Trait Loci (QTL) analysis, which allows end-users to work within a familiar setting and provides transparent access to computational grid resources. The computational environment is targeted towards end-users with limited experience of grid computing, and supports workflows expressed in the R language where small tasks are performed locally on PSE hosts, while larger, more computationally intensive tasks are allocated to grid resources. In this model, the grid computations are scheduled asynchronously. The architecture integrates the R statistical environment with the computational power of grid environments. By exploiting GJMF within this architecture, the PSE is decoupled from any specific grid middleware, and reliable access to the grid resources through concurrent use of multiple grid middlewares is provided.


5.3 Paper-III

In this paper we present the architecture of a self-healing, grid-aware and resilient storage cloud called Chelonia, and this new system is compared to other existing solutions. The storage system is based on a Service Oriented Architecture (SOA) in which each service is responsible for a well defined task. Chelonia consists of five core services. The Bartender, which is a stateless service, provides a high level interface for user interaction. The Librarian is a stateless service which works as a catalog service. The metadata store, the A-Hash, follows a master-client model and provides metadata replication amongst the available A-Hashes. The Shepherd runs on the storage node and is responsible for checking all the available files and sending reports to the Librarian. The Hopi service provides the actual transfer service. The paper also describes how security in Chelonia is divided into three levels. By using a gateway module, Chelonia also provides access to third party storage systems. The first proof-of-concept test setup presented in this paper shows the self-healing and resilient capabilities of the Chelonia cloud.

5.4 Paper-IV

Paper IV highlights the benefits and the usability of the file-oriented Chelonia storage solution. It is important to evaluate the strengths and weaknesses of solutions consisting of different modules that can run independently over geographically distributed resources. In such solutions, it has been observed that either the deployment of the system is too complicated, or adding new components to the system or stopping certain services affects the overall system response. The paper presents our experiences gained from deploying Chelonia in a real environment. The benefits of Chelonia include an easy mechanism for accessibility and system deployment. Another major advantage is the system expandability: with Chelonia, new system and storage nodes can be easily included in the running setup while keeping the system response intact. Chelonia provides a three level self-healing mechanism: (1) for stored data; (2) for meta-data; and (3) for system components. The paper also presents the issues identified while running a distributed storage solution based on Chelonia at the Particle Physics department at Oslo University. These include high communication load between Librarian and Shepherd nodes and also between replicated A-Hashes. Finally, the paper also discusses future directions for the Chelonia system.

5.5 Paper-V

This paper provides results from performance and stability tests using different deployments of the Chelonia storage system. The paper presents a number of test-cases. For example, the depth test shows the average amount of time taken by the system to create and list collections, and the width test illustrates the average response time when a collection contains 1000 entries. The performance of the system while multiple clients are interacting simultaneously is also examined, and the difference in performance between a centralized and a distributed A-Hash is studied. It is expected that some of the storage nodes will go offline and later rejoin the system. A file replication test describes how the system identifies that a Shepherd is offline and then responds by replicating files to the other available Shepherds to achieve high availability of the data. Finally, a stability test is performed where Chelonia is run for a full week while clients regularly interact with the system.

5.6 Paper-VI

The paper presents a scalable architecture for managing data-intensive scientific applications. The extended architecture is based on a loosely coupled approach between the Chelonia storage system and the Web-Service MEDiator (WSMED), a system that provides a web service to query data without any further programming. Chelonia provides storage capabilities by enabling databases at the geographically distributed storage nodes, and WSMED provides a query interface to view the data by invoking web service calls. The extension of Chelonia is developed specifically with requirements from analysis of scientific data in mind. It allows simple (integer, real and string) and complex (arrays, matrices and tensors) datatypes to be stored and then queried using SQL. For complex datatypes we have used BLOBs to serialize data as binary objects. We also address the issue of database schema design by introducing a generic database schema, the Chelonia-Schema, for scientific applications. Furthermore, the system provides a mechanism to extract data inside the BLOBs using User Defined Functions (UDFs) for MySQL. A main benefit of the developed solution is that only the data required by the SQL queries is extracted; extensive preprocessing, which in many cases can result in poor performance, is avoided. Finally, a proof-of-concept application from bioinformatics is discussed and a performance evaluation of data population and query execution is performed.

5.7 Paper-VII

The work presented in paper VII illustrates how a database enabled Chelonia storage system can be used to accelerate the analysis workflow of scientific applications. A limiting factor in most current solutions is that they only support batch processing, an approach that is inadequate to meet the requirements of upcoming data-intensive applications. We present a solution based on Chelonia and SciSPARQL, a language that extends SPARQL to queries over numeric scientific data. The system is capable of interactive online data analysis. The persistence of the complex (arrays, matrices or tensors) datatypes is enabled by the Chelonia storage system. The presented solution allows access to subsections of complex types and also enables a mechanism for online processing, with support for basic functions like sum, mean or aggregative mean over complex datatypes. Advanced application-specific analysis requirements can be fulfilled by writing customized modules in Java, Python or C. By using the system for a highly data-intensive application from systems biology, URDME, we demonstrate the strength of the developed solution. The workflow of frameworks like URDME, based on stochastic simulations, requires post-processing of generated data, which determines the next possible direction for the computerized experiment performed by the application scientist. In the paper we present results for queries which are important for such applications, and we show that the results can be extracted directly from the system by formulating queries in SciSPARQL. The paper also presents the integration architecture for Chelonia and the SciSPARQL Database Manager (SSDM) and improvements in the storage and query mechanisms for the complex datatypes.

5.8 Paper-VIII

In this paper we present a case study where an open source storage cloud, Openstack – SWIFT, is investigated for CERN-specific data analysis requirements. In a large collaborative environment like CERN, different institutes and research centers join the collaboration with different capacities, providing different e-infrastructures. Thus it is important to have a set of tools and frameworks which can be used in a general setting. In contrast to the commercial solutions, SWIFT can be used to build public and private storage clouds based on commodity hardware. The aim of the study presented in this paper is to explore Openstack – SWIFT and identify its strengths and weaknesses. Another important task is to propose a cloud solution based on already available building blocks. ROOT, a software framework used for CERN-specific data analysis, already has an interface to connect to the Amazon S3 store. The S3 protocol is a de facto standard, and Openstack – SWIFT provides the SWIFT3 middleware which allows S3 clients to access data stored in SWIFT. Thus the natural extension is to use the S3 protocol to access the SWIFT store. This enables the ROOT framework to access both the commercial S3 and the open source SWIFT storage clouds through the same interface. The requirements for CERN-specific data analysis and the functional and performance test-cases are discussed in detail in the paper.


6. Summary in Swedish

Over the past decades, the demand for large-scale computational and storage resources in science has increased dramatically. New computational methods and the development of computing and storage capacity enable researchers to use e-Science methods that complement traditional theory and experiments. E-Science is by nature interdisciplinary, with participating researchers from several fields, and it also opens up for large collaborations in which dispersed groups of researchers make scientific progress by sharing software and data. In the field of e-Science, new questions constantly arise concerning how large-scale distributed computer systems and distributed datasets can be used in different research projects. Different models, e.g. grids and clouds, have been developed and used over the years, but new solutions building on these structures are needed to enable simple and flexible use of distributed computing infrastructures in a wide range of research areas.

This thesis presents solutions for the use of computational and data storage resources for simulations and data analysis in scientific applications. The presented systems are primarily intended to be used in distributed or "remote" computing infrastructures such as grids and clouds. To exploit such computing resources, several major problems need to be solved, and this thesis contributes to the solution of two of them. First, the systems need to make it possible for researchers to utilize distributed computing resources with minimal effort and little knowledge of how the underlying environments work internally. This requires interfaces that hide technical details and expose secure, uncomplicated and user-friendly environments. Second, the systems need to enable efficient, reliable and secure access to distributed datasets, and efficient and easy-to-use analysis tools for such data need to be developed.

In the first part of the thesis, application environments are studied. The goal is to hide the technical details of the underlying distributed computing infrastructure and to give end users secure and user-friendly research tools. First, a general solution built on portal technology is described, which enables transparent and simple use of different grid middlewares. The solution is based on the integration of the Lunarc Application Portal (LAP) and the Grid Job Management Framework (GJMF). LAP provides a user-friendly environment for managing applications, while GJMF contributes reliable, robust and middleware independent management of the grid jobs. Then, a problem-solving environment for genetic analysis is presented, which allows end users to work in a familiar environment while providing access to extensive distributed computing resources. In this problem-solving environment, the statistical software R is used to describe the analysis workflow, and the programming environment is extended with grid-enabled routines for performing the computationally demanding parts. The distributed computations are performed asynchronously, and by exploiting GJMF this solution also becomes middleware independent. Finally, the question of resource allocation in grids is briefly studied, and some changes to the distributed resource allocation model in the ARC middleware are proposed.

In the second part of the thesis, solutions for management and analysis of scientific data using distributed storage resources are presented. First, a new reliable and secure file-oriented distributed storage system, Chelonia, is presented. The design of the system is described and implementation aspects are discussed. The stability and scalable performance of Chelonia are then verified by defining several test scenarios and performing the corresponding experiments. Thereafter, tools developed to provide an efficient and user-friendly data analysis platform on top of Chelonia are presented. Here, a database-oriented methodology is used. Chelonia is combined with a web service tool which makes it possible to query datasets stored in Chelonia without further programming. This approach is then developed further, and Chelonia is combined with SciSPARQL, a query language that extends SPARQL with support for database queries over numeric scientific data. This results in a system capable of analyzing distributed datasets interactively. Advanced application-specific analysis requirements can be fulfilled by writing custom modules in Java, Python or C and including them in the system. The usefulness of the system is demonstrated by using it to analyze data produced by URDME, a simulation environment for systems biology, and results for a set of queries expressed in SciSPARQL over such data are presented.

Finally, a pilot implementation is presented in which an open source cloud storage environment (Openstack – SWIFT) is used for the analysis of data from experiments performed at CERN. This is a pilot project where data analysis within the ROOT framework is performed using the developed solution, and a performance evaluation is presented.


7. Acknowledgments

During my PhD studies, I have had the opportunity to work with many researchers at several academic institutions and to be part of interesting international collaborative projects. Now I think it is time to acknowledge their support.

First I would like to thank God. I think the existence of anything in this universe is not bound to be sensed or judged by humans and human knowledge. Still, the ability of critical thinking is another powerful source for exploring our surroundings. This is what I have learnt from my education. So thanks to ALLAH almighty who gave me the ability to think and unfold the curiosities of my life.

I would like to express my deepest gratitude to my supervisor Prof. Sverker Holmgren for introducing me to the field of large-scale distributed computing infrastructures. Our regular discussions, creative ideas and good company kept me motivated and focused throughout the PhD period. Thanks for believing in me and giving me the confidence to contribute my part to scientific research. Apart from research, I have also experienced my first alpine rock-climbing adventure, hanging in a climber's rope in the Sylarna mountains in the north of Sweden.

I would also like to thank my co-supervisor Dr. Mattias Ellert for his excellent technical support and fruitful discussions. The technical feedback that I got during our regular Uppsala-Grid-Meetings was quite valuable. Thank you for proofreading and all the language corrections.

I acknowledge the ARC team for their thought-provoking discussions. A very special thanks to the Chelonia team: Bjarte Mohn, Zsombor Nagy, Jon K. Nilsen and Alex Read for the great teamwork. I would also like to thank my other co-authors Per-Olov Östberg, Mahen Jayawardena, Carl Nettelblad, Jonas Lindemann, David Cameron, Andreas Hellander, Andrej Andrejev, Rainer Többicke, Maitane Zotes Resines, Manivasakan Sabesan and Erik Elmroth for their valuable ideas and discussions. A special thanks to Prof. Tore Risch for giving me new ideas and teaching me to think critically about our solutions. It was great working with all of you and I will be happy to collaborate in the future.

I am thankful to Tore Sundqvist and Jukka Komminaho for their support in managing computing resources, and also to Carina Lindgren and Tom Smedsaas for always being very helpful. Many thanks to the whole TDB (Teknisk DataBehandling) division for providing a very friendly, helpful and extremely cooperative working environment.

This section cannot be completed without acknowledging a number of other friends and colleagues, especially my Uppsala friends Boris Granovskiy, Eddie Wadbro, Katharina Kormann, Martin Kronbichler and also Jawad Nisar, Tanveer Hussain, Muhammad Ahsan, Shahid Manzoor, Hassan Jafri, Sultan Akhtar and many more. Special thanks to one of my closest friends, Usman Ahmad Malik, and my colleague Sven-Erik Ekström for our interesting discussions and proofreading. Thanks to Prof. Hafeez Hoorani for his guidance.

I would like to thank my family, especially my parents, for their love and support.

Thanks to my brothers Imran Toor, Najib Toor and Bilal Toor for always being very supportive.

I thank my wife Sana for being with me through thick and thin all those years. Through our son Faris, thanks for enlightening me and widening the scope of my world by showing me how to change his diapers and prepare his feeds :-).

Thanks to my son Faris for being the best thing in my life. You are a source of continuous enjoyment and happiness.

The work presented in this thesis is funded by the Innovative Tools and Services for NorduGrid (NGIn) project within the Nordunet3 program and by the eSSENCE strategic research area effort at Uppsala University, Sweden.


References

[1] Amazon EC2. http://aws.amazon.com/ec2/. [15th Apr 2012].[2] Amazon MapReduce. http://aws.amazon.com/elasticmapreduce/.

[15th Apr 2012].[3] Amazon RDS. http://aws.amazon.com/rds/. [15th Apr 2012].[4] Amazon S3. http://aws.amazon.com/s3/. [15th Apr 2012].[5] Amazon SimpleDB. http://aws.amazon.com/simpledb/. [15th Apr

2012].[6] Amazon Web Services. http://aws.amazon.com/what-is-aws/. [15th

Apr 2012].[7] ARCMonitor. http://www.nordugrid.org/monitor. [15th Apr 2012].[8] Chelonia Web Link. http://www.nordugrid.org/chelonia/. [15th Apr

2012].[9] Distributed Computing Environment (DCE).

http://www.opengroup.org/dce/. [15th Apr 2012].[10] Enabling Grid for E-sciencE. http://www.eu-egee.org/. [15th Apr 2012].[11] European Middleware Initiative. http://www.eu-emi.eu/. [15th Apr 2012].[12] European Organization for Nuclear Research (CERN).

http://public.web.cern.ch/public/. [15th Apr 2012].[13] Execution Mangement.

http://www.globus.org/toolkit/docs/4.0/execution/. [15th Apr2012].

[14] eXtensible Markup Language (XML). http://www.w3.org/XML/. [15thApr 2012].

[15] File Transfer Monitor (FTM). http://glite.cern.ch/glite-FTM/.[16] gLite Middleware. http://glite.cern.ch/. [15th Apr 2012].[17] Globus Alliance. http://www.globus.org/alliance/. [15th Apr 2012].[18] Globus Data Management.

http://www.globus.org/toolkit/docs/4.0/data/key/. [15th Apr2012].

[19] Gridsphere Portal Framework.http://www.gridsphere.org/gridsphere/gridsphere.

[20] Hadoop. http://hadoop.apache.org/index.html. [15th Apr 2012].[21] KnowARC Project. http://www.knowarc.eu. [15th Apr 2012].[22] Large Hadron Collider (LHC). http://public.web.cern.ch/public/.

[15th Apr 2012].[23] Monitoring and Discovery Service.

http://www.globus.org/toolkit/mds/. [15th Apr 2012].[24] My SQL 5.5 Reference Manual.

http://dev.mysql.com/doc/refman/5.5/en/. [15th Apr 2012].[25] Nordic DataGrid Facility. http://www.ndgf.org/. [15th Apr 2012].

57

Page 58: Dedicated to my Family - DiVA portal523474/... · 2012-05-21 · Presentation at NorduGrid Conference, Sundvolden, 2011. Extension of Chelonia Storage System to handle databases

[26] NorduGrid Collaboration. http://www.nordugrid.org/. [15th Apr 2012].[27] NorduGrid Middleware. http://www.nordugrid.org. [15th Apr 2012].[28] Nordugrid Papers. http://www.nordugrid.org/papers.html. [15th Apr

2012].[29] OASIS Reference Model for SOA. http://www.oasis-open.org/

committees/download.php/16587/wd-soa-rm-cd1ED.pdf. [15th Apr2012].

[30] Openstack SWIFT. http://openstack.org/projects/storage. [15thApr 2012].

[31] Oracle Berkeley DB. http://www.oracle.com/technology/products/berkeley-db/index.html.[15th Apr 2012].

[32] Particle Physics Data Grid. http://www.ppdg.net/. [15th Apr 2012].[33] Pegasus. http://pegasus.isi.edu/index.php. [15th Apr 2012].[34] R Framework. http://www.r-project.org/. [15th Apr 2012].[35] Security Component. http:

//www.globus.org/toolkit/docs/4.0/security/key-index.html.[15th Apr 2012].

[36] Simple Object Access Protocol (SOAP).http://www.w3.org/TR/2000/NOTE-SOAP-20000508/. [15th Apr 2012].

[37] Tahoe. http://allmydata.org/~warner/pycon-tahoe.html. [15th Apr2012].

[38] Triana. http://www.trianacode.org/index.html. [15th Apr 2012].[39] Twenty Experts Define Cloud Computing.

http://cloudcomputing.sys-con.com/node/612375. [15th Apr 2012].[40] Unicore Middleware. http://www.unicore.eu/. [15th Apr 2012].[41] Web Service Description Language (WSDL).

http://www.w3.org/TR/wsdl. [15th Apr 2012].[42] D. Abramson, R. Buyya, and J. Giddy. A computational economy for grid

computing and its implementation in the nimrod-g resource broker. FutureGener. Comput. Syst., 18(8):1061–1074, 2002.

[43] A. Andrejev and T. Risch. Scientific sparql: Semantic web queries overscientific data. presented at The 3rd International Workshop on DataEngineering Meets the Semantic Web (DESWEB 2012), 2012.

[44] A. Andronico, R. Barbera, A. Falzone, P. Kunszt, G. Lo Re, A. Pulvirenti, andA. Rodolico. Genius: a simple and easy way to access computational and datagrids. Future Generation Computer Systems, 19:805–813.

[45] M. Baker. Next-generation sequencing: adjusting to data overload. NatureMethods, 7(7):495–499, July 2010.

[46] D. Breuer, D. Erwin, D. Mallmann, R. Menday, M. Romberg, V. Sander,B. Schuller, and P. Wieder. Scientific computing with unicore. InNICSymposium2004, Proceedings, pages 429–440, 2003.

[47] R. Brun and F. Rademakers. ROOT – An object oriented data analysisframework. Nuclear Instruments and Methods in Physics Research Section A:Accelerators, Spectrometers, Detectors and Associated Equipment,389(1-2):81–86, April 1997.

[48] R. Buyya, S. J. Chapin, and D. C. DiNucci. Architectural models for resource

58

Page 59: Dedicated to my Family - DiVA portal523474/... · 2012-05-21 · Presentation at NorduGrid Conference, Sundvolden, 2011. Extension of Chelonia Storage System to handle databases

management in the grid. Lecture Notes in Computer Science, Volume1971/2000:18–35, 1999.

[49] D. Cameron, M. Ellert, J. Jönemo, A. Konstantinov, I. Marton, J. K. Mohn,B. Nilsen, M. Nordén, W. Qiang, G. Roczei, F. Szalai, and A. Wäänänen. TheHosting Environment of the Advanced Resource Connector middleware.NorduGrid. NORDUGRID-TECH-19.

[50] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows,T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage systemfor structured data. ACM Trans. Comput. Syst., 26(2):1–26, 2008.

[51] A. Chervenak, I Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The datagrid: Towards an architecture for the distributed management and analysis oflarge scientific datasets. Journal of Network and Computer Applications,Volume 23, Issue 3:187–200, 2000.

[52] A. Cooke, A. J. G. Gray, L. Ma, W. Nutt, J. Magowan, M. Oevers, P. Taylor,R. Byrom, L. Field, S. Hicks, J. Leake, M. Soni, A. Wilson, R. Cordenonsi,L. Cornwall, A. Djaoui, N. Fisher, S. Podhorszki, B. Coghlan, S. Kenny, andD. OrsquoCallaghan. R-gma: An information integration system for gridmonitoring. volume 2888/2003 of Lecture Notes in Computer Science, pages462–481, Berlin / Heidelberg, 2003. Springer.

[53] P. Couvares, T. Kosar, A. Roy, J. Weber, and K. Wenger. Workflowmanagement in condor. pages 357–375.

[54] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman,A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo:amazon’s highly available key-value store. In SOSP ’07: Proceedings oftwenty-first ACM SIGOPS symposium on Operating systems principles, pages205–220, New York, NY, USA, 2007. ACM.

[55] T. Delaitre, T. Kiss, A. Goyeneche, G. Terstyanszky, S. Winter, and P. Kacsuk.Gemlca: Running legacy code applications as grid services. Journal of GridComputing, pages 75–90.

[56] T. B. Downing. Java RMI: Remote Method Invocation. IDG BooksWorldwide, Inc., Foster City, CA, USA, 1st edition, 1998.

[57] M. Ellert, M. Grønager, A. Konstantinov, B. Kónya, J. Lindemann, I Livenson,J. Nielsen, M. Niinimäki, O. Smirnova, and A. Wäänänen. Advanced resourceconnector middleware for lightweight computational grids. Future Gener.Comput. Syst., 23(2):219–240, 2007.

[58] E. Elmroth, P. Gardfjäll, A. Norberg, J. Tordsson, and P-O Östberg. Designinggeneral, composable, and middleware-independent grid infrastructure tools formulti-tiered job management. In T. Priol and M. Vaneschi, editors, TowardsNext Generation Grids, pages 175–184. Springer-Verlag, 2007.

[59] E. Elmroth and J. Tordsson. Grid resource brokering algorithms enablingadvance reservations and resource selection based on performance predictions.Future Gener. Comput. Syst., 24(6):585–593, 2008.

[60] R. T. Fielding. Architectural styles and the design of network-based softwarearchitectures. PhD thesis, 2000. AAI9980887.

[61] S. Fiore, S. Vadacca, A. Negro, and G. Aloisio. Data issues at theeuro-mediterranean centre for climate change. pages 23–35.

[62] I. Foster. The grid: A new infrastructure for 21st century science. Physics

59

Page 60: Dedicated to my Family - DiVA portal523474/... · 2012-05-21 · Presentation at NorduGrid Conference, Sundvolden, 2011. Extension of Chelonia Storage System to handle databases

Today, pages 42–47, 2002.[63] I. Foster and C. Kesselman. Globus: a metacomputing infrastructure toolkit.

International Journal of High Performance Computing Applications, 11, No.2:115–128.

[64] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: Enablingscalable virtual organizations. Int. J. High Perform. Comput. Appl.,15(3):200–222, 2001.

[65] I. Foster and C. Kesselmanl. The globus toolkit. pages 259–278, 1999.[66] I. Foster, Z. Yong, I. Raicu, and S. Lu. Cloud computing and grid computing

360-degree compared. In Grid Computing Environments Workshop, 2008.GCE ’08, pages 1–10, Nov. 2008.

[67] G. Fox and D. Gannon. Workflow in grid systems. Concurrency andComputation: Practice and Experience, 2006:1009–1019.

[68] D. Frederic, J. Clement, D. Pascal, and A. C. Stefano. Agent-grid integrationontology. Volume 4277/2006(1):136–146, 2005.

[69] P. Fuhrmann and V. Gulzow. dcache, storage system for the future. InEuro-Par 2006 Parallel Processing, volume Volume 4128/2006 of LectureNotes in Computer Science, pages 1106–1113. Springer Berlin / Heidelberg,2006.

[70] F. Gagliardi. The EGEE European grid infrastructure project. Volume 3402/2005 of Lecture Notes in Computer Science, pages 194–203, Berlin / Heidelberg, 2005. Springer.

[71] S. Garfinkel. Commodity grid computing with Amazon's S3 and EC2.

[72] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. SIGOPS Oper. Syst. Rev., 37(5):29–43, 2003.

[73] J. S. Grethe, C. Baru, A. Gupta, M. James, B. Ludaescher, M. E. Martone, P. M. Papadopoulos, S. T. Peltier, A. Rajasekar, S. Santini, I. N. Zaslavsky, and M. H. Ellisman. Biomedical informatics research network: Building a national collaboratory to hasten the derivation of new understanding and treatment of disease. In From Grid to Healthgrid: Proceedings of Healthgrid 2005, volume 112/2005, pages 100–109. IOS Press, 2005.

[74] T. Hey and A. Trefethen. The Data Deluge: An e-Science Perspective, pages 809–824. John Wiley and Sons, Ltd, 2003.

[75] E. Huedo, R. S. Montero, and I. M. Llorente. The GridWay framework for adaptive scheduling and execution on grids. Scalable Computing: Practice and Experience, 6(3):1–8, 2005.

[76] F. Hupfeld, T. Cortes, B. Kolbeck, J. Stender, E. Focht, M. Hess, J. Malo, J. Marti, and E. Cesario. The XtreemFS architecture – a case for object-based file systems in grids. Concurr. Comput.: Pract. Exper., 20(17):2049–2060, 2008.

[77] M. Jayawardena and S. Holmgren. Grid-enabling an efficient algorithm for demanding global optimization problems in genetic analysis. In Proceedings of the Third IEEE International Conference on e-Science and Grid Computing, E-SCIENCE '07, pages 205–212, Washington, DC, USA, 2007. IEEE Computer Society.

[78] K. Karasavvas, M. Antonioletti, M. Atkinson, N. C. Hong, T. Sugden, A. Hume, M. Jackson, A. Krause, and C. Palansuriya. Introduction to OGSA-DAI services. Volume 3458/2005:1–12.

[79] R. Khare and R. N. Taylor. Extending the representational state transfer (REST) architectural style for decentralized systems. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 428–437, Washington, DC, USA, 2004. IEEE Computer Society.

[80] A. Konstantinov. The ARC Computational Job Management Module - A-REX. NorduGrid. NORDUGRID-TECH-14.

[81] K. Krauter, R. Buyya, and M. Maheswaran. A taxonomy and survey of grid resource management systems for distributed computing. Software: Practice and Experience, 32:135–164, 2002.

[82] W. Kuranowski, M. Paprzycki, M. Ganzha, M. Gawinecki, I. Lirkov, and S. Margenov. Efficient matchmaking in an agent-based grid resource brokering system. pages 327–335, 2006.

[83] W. Kuranowski, M. Paprzycki, M. Ganzha, M. Gawinecki, I. Lirkov, and S. Margenov. Agents as resource brokers in grids – forming agent teams. Volume 4818/2008(1):489–491, 2005.

[84] K. Lai, B. A. Huberman, and L. Fine. Tycoon: A Distributed Market-based Resource Allocation System. Technical Report arXiv:cs.DC/0404013, HP Labs, Palo Alto, CA, USA, April 2004.

[85] E. Laure, F. Hemmer, F. Prelz, S. Beco, F. Fisher, M. Livny, L. Guy, M. Barroso, P. Buncic, P. Z. Kunszt, A. Di Meglio, A. Aimar, A. Edlund, D. Groep, F. Pacini, M. Sgaravatto, and O. Mulmo. Middleware for the next generation grid infrastructure. EGEE-PUB-2004-002, 2004.

[86] J. Lindemann and G. Sandberg. An extendable grid application portal. Volume 3470/2005:1012–1021.

[87] J. Linderoth and S. Wright. Decomposition algorithms for stochastic programming on a computational grid. Computational Optimization and Applications, 24:207–250, 2003. doi:10.1023/A:1021858008222.

[88] M. Litzkow, M. Livny, and M. Mutka. Condor - a hunter of idle workstations. In Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988.

[89] M. Manivasakan. Querying Data Providing Web Services. PhD thesis, Uppsala University, Division of Computing Science, 2010.

[90] C. Marco, C. Fabio, D. Alvise, G. Antonia, G. Ghiselli, G. Francesco, M. Alessandro, M. Moreno, M. Salvatore, P. Fabrizio, P. Luca, and P. Francesco. The gLite workload management system. In GPC '09: Proceedings of the 4th International Conference on Advances in Grid and Pervasive Computing, pages 256–268, Berlin, Heidelberg, 2009. Springer-Verlag.

[91] Zs. Nagy, J. K. Nilsen, and S. Toor. Chelonia Administrator's Manual. NorduGrid. NORDUGRID-MANUAL-10.

[92] Zs. Nagy, J. K. Nilsen, and S. Toor. Chelonia - Self-healing distributed storage system. NorduGrid. NORDUGRID-TECH-17.

[93] Zs. Nagy, J. K. Nilsen, and S. Toor. Chelonia User's Manual. NorduGrid. NORDUGRID-MANUAL-14.

[94] C. Németh, G. Dózsa, R. Lovas, and P. Kacsuk. The P-GRADE grid portal. Volume 3044/2004:10–19.

[95] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li. Taverna: a tool for the composition and enactment of bioinformatics workflows. 2004.

[96] F. E. Redmond. DCOM: Microsoft Distributed Component Object Model with CDROM. IDG Books Worldwide, Inc., Foster City, CA, USA, 1st edition, 1997.

[97] D. M. Riese, P. Fuhrmann, T. Mkrtchyan, M. Ernst, A. Kulyavtsev, V. Podstavkov, M. Radicke, N. Sharma, D. Litvintsev, T. Perelmutov, and T. Hesselroth. dCache Book.

[98] G. Roczei, G. Szigeti, and I. Marton. ARC peer-to-peer information system. NorduGrid. NORDUGRID-TECH-21.

[99] M. Romberg. The UNICORE architecture: Seamless access to distributed resources. In Proceedings of the International Symposium on High-Performance Distributed Computing, page 44, 1999.

[100] D. E. Ruddock, R. Wikoff, and R. Salz. Remote Procedure Calls. John Wiley and Sons, Inc., 2001.

[101] E. Seidel, G. Allen, A. Merzky, and J. Nabrzyski. GridLab: a grid application toolkit and testbed. Future Generation Computer Systems, 18(8):1143–1153.

[102] G. A. Stewart, D. Cameron, G. A. Cowan, and G. McCance. Storage and data management in EGEE. In ACSW '07: Proceedings of the Fifth Australasian Symposium on ACSW Frontiers, pages 69–77, Darlinghurst, Australia, 2007. Australian Computer Society, Inc.

[103] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner. A break in the clouds: towards a cloud definition. SIGCOMM Comput. Commun. Rev., 39(1):50–55, 2009.

[104] S. Venugopal, R. Buyya, and K. Ramamohanarao. A taxonomy of data grids for distributed data sharing, management, and processing. ACM Comput. Surv., 38(1):3, 2006.

[105] S. Vinoski. CORBA: integrating diverse applications within distributed heterogeneous environments. IEEE Communications Magazine, 35(2):46–55, February 1997.

[106] L. Wang, J. Tao, M. Kunze, A. C. Castellanos, D. Kramer, and W. Karl. Scientific cloud computing: Early definition and experience. In High Performance Computing and Communications, 2008. HPCC '08. 10th IEEE International Conference on, pages 825–830, Sept. 2008.

[107] A. Weise, M. Wan, W. Schroeder, and A. Hasan. Managing groups of files in a rule oriented data management system (iRODS). In ICCS '08: Proceedings of the 8th International Conference on Computational Science, Part III, pages 321–330, Berlin, Heidelberg, 2008. Springer-Verlag.

[108] J. Yu and R. Buyya. A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec., 34(3):44–49, 2005.

[109] J. Yu, M. Li, Y. Li, F. Hong, and M. Gao. A framework for price-based resource allocation on the grid. Volume 3320/2005(1):341–344, 2005.
