

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2011; 23:223–234
Published online 28 December 2010 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.1687

Special Issue: Portals for life sciences—Providing intuitive access to bioinformatic tools

Sandra Gesing1,∗,†, Jano van Hemert2, Peter Kacsuk3 and Oliver Kohlbacher1

1Zentrum für Bioinformatik, Eberhard-Karls-Universität Tübingen, Germany

2National e-Science Centre, School of Informatics, University of Edinburgh, U.K.

3MTA SZTAKI, Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary

SUMMARY

The topic ‘Portals for life sciences’ spans various research fields: on the one hand, many different topics from the life sciences, e.g. mass spectrometry; on the other hand, portal technologies and different aspects of computer science, such as usability of user interfaces and security of systems. The main purpose of portals is to simplify the user’s interaction with computational resources that are tailored to a supported application domain. Copyright © 2010 John Wiley & Sons, Ltd.

Received 29 August 2010; Revised 26 September 2010; Accepted 16 October 2010

KEY WORDS: portals; life sciences; Grid and Cloud computing; workflows; semantics

1. IWPLS’09

It is our great pleasure to introduce this special issue on the First International Workshop on Portals for Life Sciences held at the e-Science Institute in Edinburgh, U.K., on 14–15 September 2009. The workshop was focused on research contributions for portals and tools in the field of life sciences and brought together scientists from the fields of life science, bioinformatics, and computer science. It formed an international platform to exchange experience, formulate ideas, and catch up on technological advances in molecular and systems biology in the context of portals. The variety of nationalities covered by the delegates reflects the international interest in a platform on scientific portals. Delegates from Austria, Brazil, Germany, Hungary, Italy, The Netherlands, Norway, Spain, The United Kingdom and The United States of America contributed to and attended the workshop. We especially highlight the mix of technology-related and domain-related talks, which mirrors the diverse scientific background of the delegates.

Papers and abstracts for the workshop were accepted through a blind peer-reviewing process. Accepted papers resulted in 30-min presentations and were published in open-access workshop proceedings. Accepted abstracts resulted in a 10-min ‘lightning talk’. The speakers gave excellent and thought-provoking presentations. The highlights were presentations from two invited keynote speakers. In addition to the presentations, lively and promising discussions took place after talks, in coffee breaks and even during the social event. The workshop was highly successful and turned out to be the starting point of an international workshop series with the subject extended to science gateways for e-Science: The International Workshop on Science Gateways.

We faced the challenge of inviting several authors to submit extended versions of their papers and abstracts for this special issue. Some of the papers in this special edition are combinations of papers and abstracts where original authors collaborated. The papers cover the following topics: workflow-enabled Grid portals, security in Grid portals, cost-effective and time-efficient development of scientific portals, and specific applications in the field of life sciences. The latter consists of papers that take a more detailed look at, for instance, virtual user communities and data management.

∗Correspondence to: Sandra Gesing, Zentrum für Bioinformatik, Eberhard-Karls-Universität Tübingen, Germany.
†E-mail: [email protected]

We appreciate that we cannot cover all aspects of portals for life sciences. Hence, we first give a broad overview of portals in High-Performance Computing (HPC) facilities, and Grid and Cloud infrastructures, with subsections dedicated to Grid portals for life sciences, before introducing the topics covered by the papers in this special issue of Concurrency and Computation: Practice and Experience.

2. INTRODUCTION TO PORTALS ON LIFE SCIENCES

Life sciences cover a broad range of disciplines including biology and medicine. In all these fields, computational tools have become indispensable in research and development. Computational methods often require specific computational resources and highly advanced computing skills for installation, administration, and daily use. Scientists want to focus on their specific research combining all kinds of approaches, but they do not want to deal with the details of software installation, usability, and hardware configuration. Hence, there is a need for self-explanatory and intuitive user interfaces for computational tools in the life sciences.

The installation of scientific software on the user side is often awkward and difficult. Furthermore, it requires users to take the responsibility of keeping their software up-to-date. Portals offer an alternative interface that avoids most of these drawbacks. In general, a portal can be defined as a framework for integrating information and applications. It operates across organizational boundaries and as a single entry point for a community. Users are in the position to customize their tools and views and are provided with a repository of personal information. Most users are familiar with commercial portals such as Amazon or Google.

There are various aspects to consider in the context of scientific portals. The main aspect is the user in the supported domain and their role as end-user, developer, or administrator. Irrespective of the underlying infrastructure and whether the integrated tools rely on internet technologies, users should be empowered with intuitive tools that are relevant to their specific scientific domain and role in that domain.

Besides usability, several functional aspects are important for scientific computing portals. The first is job monitoring and control, which enables users to check and change the status of their compute jobs. Access to tools and data should be granted on the basis of an authentication and authorization strategy. Restricting access to sensitive data and dealing with large data sets are recurring topics in life sciences, which highlights the important role of security in portals and the need for sophisticated distributed data management.

In the context of scientific portals, there is still much work needed to improve human–computer interaction. The kind of interaction that is chosen for a specific portal depends on the domain the portal is developed for and the requirements posed by its community.

3. PORTALS IN HPC FACILITIES, AND GRID AND CLOUD INFRASTRUCTURES

Available computational resources exist in the form of HPC facilities, and more recently, Grid and Cloud computing infrastructures. All these enable scientists to access a large set of resources in order to accelerate the execution of their sophisticated scientific applications. However, scientists find that interacting with these facilities and infrastructures requires much detailed technical knowledge, which hampers scientific progress where large-scale analysis and simulations are needed. Most HPC, Grid, and Cloud technology manifests itself in the form of sophisticated command-line languages that require a significant effort to learn and use correctly. The unreliable nature of the Grid is a particularly large barrier to adoption for scientists.


Portals offer a viable alternative approach. They usually provide an intuitive user interface that requires much less effort to learn and use in a correct way. A typical portal consists of a front-end and a back-end layer. The front-end layer provides the necessary high-level interface and tools for user interaction, whereas the back-end layer deals with the underlying computational framework. The latter calls the necessary scripts or services that are specific to the computing resources used. In the case of HPC and Grid, these scripts typically use a command-line interface. A well-designed portal should hide the technical details of these scripts.
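The split between front-end and back-end can be illustrated with a minimal Python sketch. All names in it (`JobRequest`, `build_submit_command`, `submit_from_portal`, the `submit-job` command and its flags, the resource name `cluster-a`) are hypothetical and do not correspond to any particular portal or middleware; the sketch only shows that the user supplies a few high-level fields while the back-end assembles the command line. The command is built but not executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRequest:
    """What the front-end collects from the user (hypothetical fields)."""
    tool: str
    input_file: str
    cpus: int = 1


def build_submit_command(request: JobRequest, resource: str) -> List[str]:
    """Back-end layer: translate a high-level request into a command line.

    The command name `submit-job` and its flags are invented for illustration.
    """
    return [
        "submit-job",
        "--resource", resource,
        "--cpus", str(request.cpus),
        "--executable", request.tool,
        "--stage-in", request.input_file,
    ]


def submit_from_portal(tool: str, input_file: str) -> List[str]:
    """Front-end layer: the user only names a tool and an input file."""
    request = JobRequest(tool=tool, input_file=input_file)
    # The resource choice is hidden from the user; it is hard-coded here.
    return build_submit_command(request, resource="cluster-a")
```

In a real portal the returned argument list would be handed to the middleware-specific submission mechanism; keeping that step behind one function is what lets the front-end stay unchanged when the computing resource changes.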

Portals for scientific computing are a particularly popular concept in Grid computing. This is most likely because the Grid aims to serve a broad range of scientific disciplines, which include many scientists that have no experience with large computing systems. We will discuss several different approaches to scientific computing portals next.

Grid portals can be classified as generic-purpose portals enabling the development of various Grid applications or application-specific portals created for the use of one particular user community. Usually, application-specific portals can be built on top of generic-purpose portals by adding application-specific portlets to the basic set of portlets and omitting some of the generic-purpose portlets in order to simplify the use of the portal for the end-user community.

3.1. Generic-purpose Grid portals

Although there were many attempts to create a generic-purpose Grid portal, not many of them were able to survive and reach a maturity level where they can be used reliably by various user communities. In this paper, we selected five generic-purpose Grid portals that have been used by several communities in order to illustrate the typical solutions applied in such portals. These portals are OGCE [1], GridPort [2], Vine Toolkit [3], P-GRADE [4, 5], and Genius [6].

In most cases, to build a Grid portal successfully it is better to build on an existing Grid portal framework than to write every portal function from scratch. The most commonly used portal framework is GridSphere [7], which provides the basic functionalities of a JSR168-compliant portal framework. It provides user management and a portlet framework whereby new functionalities can be added to the portal in a modular and standard way. As such, any portal written for GridSphere should be compatible with other JSR168-compliant frameworks such as Liferay, IBM’s Websphere, and Apache’s Jetspeed. Owing to this advantage, all the portals except Genius are written on top of GridSphere. Recently Liferay [8] has gained popularity and hence Vine Toolkit and P-GRADE have been ported to Liferay, too. A further benefit of the portlet concept is that it enables the easy customization of these portals to user needs by adding application-specific portlets to the basic set of generic-purpose portlets.

Grid portals typically realize a layered concept in their architecture implementing the following layers:

1. User interface layer;
2. Applications, portlets layer;
3. High-level (aggregation) services layer.

These layers are implemented on top of the low-level Grid services layer (Grid middleware). A Grid portal’s versatility depends on how many Grid middlewares it can manage through the high-level services layer. Most portals (except Genius) support Globus as the most widely used Grid middleware. The second most supported Grid middleware is gLite, which is used in the EGEE Grid infrastructure. All three Grid portals developed in Europe (Vine Toolkit, P-GRADE, Genius) support gLite since EGEE is the largest Grid infrastructure in Europe. The U.S. portals support solely Globus since both OSG and TeraGrid are based on Globus. Since in Europe there are other popular Grid middlewares beyond gLite, Vine Toolkit and P-GRADE support further middlewares besides Globus and gLite. A unique feature of P-GRADE is that it supports even BOINC; hence, with the help of the P-GRADE portal, application execution on BOINC became manageable for end-users. Previously, it was the privilege of BOINC project administrators to submit jobs.

Once a Grid portal is able to support various Grid types, another important feature is whether the portal can be connected to several Grids and can run jobs simultaneously on several Grids. This feature is particularly interesting when large parameter sweep applications are to be executed and hence the parallel exploitation of several Grids is an advantage. This feature is especially important in heterogeneous Grids like the D-Grid in Germany where Globus, gLite, and UNICORE sites are available. Unfortunately, the current Grid portals are not prepared for this flexible usage of the Grids. The only exception is P-GRADE, which can simultaneously run several jobs of a workflow in any of the Grids that are supported by P-GRADE.

If a portal supports several Grids or VOs, then it is important to provide some form of Grid settings facility. Unfortunately, this is an area overlooked by most Grid portals. Only Genius and P-GRADE provide some possibilities to control the usage of Grid resources and services. Genius enables the users to select resource broker, replica location server, and MyProxy server. This can be a very useful function in an unreliable Grid infrastructure where any of these services can be down at any time. Genius also enables the connection of several VOs to the same portal and users can select the VO they would like to use. This is very similar to P-GRADE where not only several VOs but also several types of Grids can be connected to the same portal. It is the task of the portal administrator to configure the portal for various Grids and VOs, but then users can add or remove resources from any of those Grids and VOs as they like. In this way, users can customize any VO or Grid according to their preferences. For example, if a site proves to be very unreliable and a user is fed up with the large number of failed jobs on this resource, then the user can remove this resource from the list of resources their applications can use. Of course, this has no impact on the resource lists available to other users.
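The per-user customization described above can be sketched as a small data structure. This is an illustrative assumption about how such settings might be stored, not the way Genius or P-GRADE implement them; the class `ResourceSettings` and the Grid, site, and user names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class ResourceSettings:
    """Per-user view on the resources configured by the portal administrator."""

    def __init__(self, configured: Dict[str, List[str]]) -> None:
        # Grid or VO name -> resources set up by the administrator (shared).
        self._configured = {grid: list(sites) for grid, sites in configured.items()}
        # User name -> (grid, resource) pairs that this user has removed.
        self._removed: Dict[str, Set[Tuple[str, str]]] = {}

    def remove(self, user: str, grid: str, resource: str) -> None:
        """Hide a resource for one user only."""
        self._removed.setdefault(user, set()).add((grid, resource))

    def restore(self, user: str, grid: str, resource: str) -> None:
        """Make a previously removed resource available to the user again."""
        self._removed.get(user, set()).discard((grid, resource))

    def available(self, user: str, grid: str) -> List[str]:
        """Resources of a Grid or VO that the given user's jobs may use."""
        removed = self._removed.get(user, set())
        return [r for r in self._configured.get(grid, []) if (grid, r) not in removed]
```

Because removals are recorded per user and the administrator's configuration is never modified, one user dropping an unreliable site leaves every other user's resource list unchanged.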

In order to submit jobs to any Grid (except for BOINC), the user needs a user certificate. Globus-based Grids work with temporary proxies generated from the certificate and UNICORE supports trust delegation, e.g. with SAML. The management of certificates and proxies is also an important and typical task of Grid portals. All the examined Grid portals provide some functionalities to handle certificates and proxies. Most of them support the most popular MyProxy server concept, but GridPort uses Kerberos instead of MyProxy server.

The next issue is the support of data management in the Grid. The most popular solutions are GridFTP, SRB (Storage Resource Broker), SRM (Storage Resource Manager), and OGSA-DAI. GridFTP was introduced by Globus and hence every portal supporting Globus typically supports GridFTP, too. SRM is used in gLite and hence every portal supporting gLite handles SRM. SRB is also popular in Globus-based Grids; hence, related portals provide SRB management, too. Finally, OGSA-DAI is a high-level database access Grid service that is currently supported by Vine Toolkit and P-GRADE.

As mentioned before, Grids are not perfectly reliable and hence monitoring the Grid resources is an important feature for the users to check the status of the various Grid resources. The monitoring information usually describes other important features of resources such as capacity, load, configuration, etc. Every Grid portal should be able to show this information to the users and indeed, all the investigated portals have this functionality. Furthermore, it is not only the resources the user would like to observe but also the jobs as they are executed in the Grid; hence, application monitoring is another must in Grid portals.

Since the architecture of Grid portals is layered, it is not too difficult to modify the portal layers to support Cloud middlewares, too. OGCE and P-GRADE have already started to support various Clouds and it is expected that other portals will follow this practice. Probably the most difficult part of Cloud support is the creation of the user interface needed to provide an accounting and payment interface for commercial Clouds.

3.2. Application-specific Grid portals

As mentioned before, generic-purpose portals can be the basis to be customized toward an application-specific portal. The advantage of this concept is that the Grid-related high-level (aggregation) services layer of the portal can be used without any modification. Only the top two layers, the User interface layer and the Applications, portlets layer, need to be modified. This significantly reduces the development time of the application-specific portal since usually the most difficult part in creating a Grid portal is the development of the high-level (aggregation) services layer. Not surprisingly, this concept is quite popular and was followed in the case of many application-specific portals. Here we list some examples that were derived from the five example generic-purpose portals (see Table I).

Table I. Comparison of generic-purpose Grid portals.

| | OGCE | GridPort | Vine Toolkit | P-GRADE | Genius |
|---|---|---|---|---|---|
| Grid portal framework | GridSphere 2.1.5 | GridSphere 2.0.2 | GridSphere 3.1 and Liferay | GridSphere 3.1 and Liferay | No |
| Grid support | Globus | Globus | Globus, gLite, UNICORE, GRIA | Globus, gLite, ARC, BOINC | gLite |
| Simultaneous multi-Grid support | No | No | No | Yes | No |
| Grid settings | No | No | No | Yes | Yes |
| VO certificate management | MyProxy based | Kerberos based | Credential/certificate manager | MyProxy based | MyProxy based |
| Grid data management | GridFTP, SRB | GridFTP, SRB | GridFTP, SRM, SRB, OGSA-DAI | GridFTP, SRM, SRB*, OGSA-DAI* | SRM |
| Grid resource monitoring, information service | Yes | Yes | Yes | Yes | Yes |
| Job execution monitoring | Yes | Yes | Yes | Yes | Yes |
| Cloud support | Amazon S3, EC2 | No | No | Eucalyptus, OpenNebula | No |
| Used for | LEAD | BIRN | QosCosGrid testbed portal | G-FLUXO, ProSim, CancerGrid | ALICE portal |

The LEAD portal [9], a weather forecast portal, was developed on top of OGCE. The BIRN portal [10], a biology portal, was created based on GridPort. The Vine Toolkit was applied to create the QosCosGrid [11] testbed portal. The G-FLUXO portal [12], specialized in Computational Biochemistry, was developed on top of P-GRADE. The ProSim [13] and CancerGrid [14] life sciences portals were customized on top of the WS-PGRADE portal (the second-generation P-GRADE portal).

Another option is to write the application-specific portal directly on top of a low-level portal framework such as GridSphere or Liferay. For example, Pandora [15], the portal of the neuGRID project [16], was developed on top of Liferay. The problem with this concept is that all three layers of the Grid portal have to be developed from scratch. This not only makes the development time longer but also reduces the reliability of the portal as one can expect many bugs and pitfalls to occur. In contrast, these have been dealt with during the development of generic portals and dedicated scientific portal development toolkits as introduced in Section 5.

3.3. Workflow-enabled Grid portals

Applications frequently require the execution of several jobs in a certain order. Without workflows, the user has to perform all the necessary file transfers among the jobs of the application by hand. This is an error-prone, tedious task, especially in an unreliable Grid infrastructure. Workflows make it possible to manage these long and tedious application execution processes automatically. Therefore, it is not surprising that workflow applications are getting more and more popular in the scientific community.
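The idea of running jobs in dependency order and passing results between them can be sketched in a few lines of Python. This is a simplified illustration, not the execution model of any of the workflow systems discussed here: the function `run_workflow` and the job names are hypothetical, jobs run locally and sequentially, and "file transfer" is reduced to passing return values. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_workflow(
    jobs: Dict[str, Callable[[Dict[str, str]], str]],
    depends_on: Dict[str, List[str]],
) -> Dict[str, str]:
    """Run each job after its predecessors and hand their outputs on.

    `jobs` maps a job name to a function that receives the outputs of its
    predecessors (job name -> output) and returns its own output.
    `depends_on` maps a job name to the names of the jobs it needs.
    """
    outputs: Dict[str, str] = {}
    graph = {name: depends_on.get(name, []) for name in jobs}
    # static_order() yields every job only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: outputs[dep] for dep in depends_on.get(name, [])}
        outputs[name] = jobs[name](inputs)
    return outputs
```

A three-step chain, for instance, could be declared as `{"simulate": ["prepare"], "analyse": ["simulate"]}`; the user then states only the dependencies, and the ordering and hand-over of intermediate results no longer has to be done by hand.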

A distinguishing feature of Grid portals is whether or not they are able to support workflow development. Taking again the same five Grid portals as examples, Table II shows their workflow support characteristics. As can be seen, only three of them provide workflow editors to develop workflow applications to be executed in the underlying Grid infrastructures.

Table II. Comparison of generic-purpose Grid portals.

| | OGCE | GridPort | Vine Toolkit | P-GRADE | Genius |
|---|---|---|---|---|---|
| Workflow editor | XBaya (graphical) | No | No | P-GRADE workflow (graphical) | Triana workflow (graphical) |
| Parameter sweep workflow execution | No | No | No | Yes | No |
| Workflow repository | XRegistry | No | No | DSpace | No |

Many Grid applications come from the area of scientific simulation where parameter sweep support is very important. Therefore, it is also important to investigate whether a portal framework is able to support parameter sweep applications both at the workflow language level and within the high-level services layer. Among the selected five Grid portals, only P-GRADE has built-in support for parameter sweep applications.
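A parameter sweep runs the same application once for every combination of a set of parameter values. The following Python sketch shows only the expansion step; the function `expand_sweep` and the parameter names used with it are hypothetical, and it does not reflect how P-GRADE represents sweeps internally.

```python
from itertools import product
from typing import Dict, Iterator, Sequence


def expand_sweep(parameters: Dict[str, Sequence]) -> Iterator[Dict[str, object]]:
    """Yield one parameter assignment per combination of the given values."""
    names = list(parameters)
    # product() enumerates the Cartesian product of all value lists.
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))
```

Two temperatures and three ligands, for example, expand into six independent job descriptions. Since these jobs do not depend on each other, they can be distributed across the resources of one or several Grids.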

Another important aspect of generic workflow-enabled portals is whether they support the re-use of existing workflows by different members of the portal user community. This is also an often overlooked feature of portals. Usually, they are designed to support only individual users and not user communities. However, state-of-the-art portals should put emphasis on supporting user communities by enabling application developers to publish complete applications or workflow templates in a workflow repository. End-users can then download the published applications from the workflow repository and execute them in the connected Grids controlled by the portal. This concept is quite new and supported only by two of the example portals. OGCE provides XRegistry to store workflow applications while P-GRADE is integrated with the open-source DSpace repository [17] developed by MIT.

3.3.1. Workflow-enabled grid portals for life sciences. Although Grid portals are used in many areas of science, one important area where the usage of portals is actively investigated is life science. This community recognized early the importance of providing an easily usable, intuitive Grid interface for scientists, particularly in the case of complex applications where the most convenient and adequate way of describing and controlling the application execution is the workflow concept. There are more workflow-oriented experimental and production systems than we can cover here; hence, we have selected as illustrating examples only three projects that created advanced workflow-enabled Grid portals for scientists working in the field of life sciences.

The BIRN (Biomedical Informatics Research Network) portal has been developed in the U.S. by a consortium including more than 20 universities and 30 research groups dealing with brain imaging of human neurological disease. The BIRN portal is a workflow and application integration environment enabling researchers to visualize and perform analysis on data stored within the BIRN data Grid. The portal environment allows for the management and execution of workflows developed in the LONI pipeline or the Kepler workflow system. The supported use case scenario enables collaborative experiments where the user can follow the data flow all the way from data collection to the BIRN data Grid through interactive processing stages that can be executed on Grid resources.

The G-Fluxo project is devoted to the development of a Grid Portal Workflow specialized in Computational Biochemistry where very different computational platforms (Grids, clusters, etc.) can be used without the need of very specific computer skills. After a careful investigation of existing Grid portal frameworks, P-GRADE Portal has been chosen as the starting point of this project since the most widely used Grid middleware systems are supported by this portal. Within the project, P-GRADE was extended with the OGF standard DRMAA protocol in order to provide access to local clusters, too. As a result, the G-Fluxo project developed the COMPCHEM P-GRADE portal that enables the mixed use of Grid (COMPCHEM VO of EGEE Grid) and local clusters for the biochemistry workflows developed in the project. The project has also implemented a GROMACS package and a JMOL portlet for the portal.

The aim of the U.K. ProSim project was to design and create a science gateway that could be utilized by bio-scientists to run massively parallel simulation workflows in order to discover the mechanisms that lead to specific and selective recognition of carbohydrates by proteins. The ProSim project selected WS-PGRADE portal as the basis of the ProSim Science Gateway [3] due to its support for parameter sweep workflow execution and its service status on the U.K. National Grid Service. The user scenario for modeling receptor–ligand interaction includes well-known and widely used bioscience application packages such as Gromacs and AutoDock as well as some open-source molecule visualization tools. All these are integrated together within a WS-PGRADE workflow application that can be executed both on the U.K. NGS and in the EGEE Grid.

3.4. Semantic Grid portals for life sciences

Semantic portals in general integrate knowledge based on ontologies and support the users with personalized views on information processed automatically with semantic descriptions. In the field of life sciences, various ontologies have been developed to meet the users’ needs in their specific research area, e.g. the NCBO’s BioPortal (National Center for Biomedical Ontologies) [18] lists numerous ontologies in the biomedical field. Most of them are based on OWL (Web Ontology Language) [19] or on OBO (Open Biomedical Ontologies) [20]. OWL is a W3C standard for representing ontologies in a web-compatible language (RDF/XML) and has become the standard for web ontologies. It models the methods to allow machines to understand the meaning of information on the World Wide Web as proposed by Berners-Lee [21]. OBO is often used in life sciences projects and it is quite informal. Christine Golbreich et al. specified the OBO syntax and semantics to map it to OWL [22]. This allows the existing semantic web tools and techniques to be used for OBO.

There are two different definitions of semantic Grid portals for life sciences. The first one solely utilizes Grid infrastructures for supporting processes and maintaining data. The second one utilizes Grid infrastructures and delivers additional knowledge about Grid infrastructures or Grid services.

The BioSphere portal [23] belongs to the group of Grid portals fitting the first definition. It supports the collaborative creation of biological ontologies by scientists. Users can edit and access ontology versions and different ontologies via an ontology editor and browser. These tools are embedded in the portal that is based on Grid technology. BioSphere is developed under the GridSphere portal framework and uses OGSA-DAI for data management. Furthermore, the portal benefits from the existing security mechanisms of the Grid and supports many data and system integration scenarios, e.g. to allow direct access to the BioSphere database from portlets deployed by others via secure web-service calls.

As far as we know, there are at present no portals available or published that fit the second definition and additionally fit our definition of portals. Available tools for Grid service discovery are tools that need to be installed by the users to benefit from the full functionality, e.g. Taverna [24]. The myGrid ontology [25] for service discovery of web and Grid services can be handled by Taverna, a workflow workbench based on the workflow language SCUFL. Users can register Grid and web services via Taverna, which is required to make them visible to a service discovery engine. The system ServiceSemanticWeb [26] allows the registration of bioinformatic resources via a web interface. However, the user needs the graph-based interface, implemented as a rich client, to discover a resource.

4. SECURITY IN GRID PORTALS

The role of a Grid portal as a single entry point to HPC facilities and to Grid and Cloud infrastructures demands a sophisticated security infrastructure. The aim is to provide single sign-on for the users regardless of which infrastructure they are authorized for.

Portal frameworks are not stand-alone applications but consist of a container with default applications and a portal interface that is deployed inside an application server, e.g. Apache Tomcat. An application server handles the HTTP/HTTPS traffic between web applications and the clients, and provides a default implementation of a server interface and a servlet container. Hence, the basic security concept in a portal relies on the security concept of the underlying application server. The standard authentication offers a login via username/password with the possibility to connect to an LDAP-based directory server. The authorization is role-based and each user can have a number of roles. Portal frameworks such as GridSphere, Liferay, and Jetspeed 2 have been extended to support additional authentication concepts. The bundle GridSphere 3.x/Apache Tomcat, for example, offers a certificate-based login. The portal of the project bwGRiD [27] utilizes this feature and supports the creation of a MyProxy certificate via a Firefox add-on.
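The role-based authorization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mechanism of any particular portal framework: the role names, the permission names, and the `PortalUser` and `is_authorized` identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical role-to-permission table; real portals define their own roles.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "guest": {"view_portal"},
    "user": {"view_portal", "submit_job"},
    "admin": {"view_portal", "submit_job", "manage_users"},
}


@dataclass
class PortalUser:
    """A portal account; each user can hold any number of roles."""
    username: str
    roles: Set[str] = field(default_factory=set)


def is_authorized(user: PortalUser, permission: str) -> bool:
    """Grant access if at least one of the user's roles carries the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles
    )


if __name__ == "__main__":
    alice = PortalUser("alice", {"user"})
    print(is_authorized(alice, "submit_job"))    # granted through the "user" role
    print(is_authorized(alice, "manage_users"))  # no role of alice grants this
```

Because permissions are attached to roles rather than to individual accounts, adding a user to a community only requires assigning the appropriate roles.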

As presented in Section 3.1, most of the established Grid middlewares, with the exception of BOINC, use X.509-based certificates for authentication. The process of obtaining a certificate for a VO is an initial barrier for non-expert users and hampers the use of high-performance computing facilities. Furthermore, users have to create proxy certificates for most of the Grid middleware. Hence, several solutions have been proposed to lower this barrier.

Barbera et al. [28] suggest using robot certificates for user communities. These certificates authorize users for specific tools and resources. The GENIUS Grid portal offers a mechanism, invisible to a logged-in user, that automatically generates a proxy certificate, which can be used for chosen functions and resources in the EGEE Grid. The Grid middleware UNICORE also utilizes a public key infrastructure (PKI) based on certificates, but users do not have to create proxy certificates. The software supports trust delegation with digitally signed Security Assertion Markup Language (SAML) authentication. The MoSGrid portal [29], developed on top of WS-PGRADE, will offer access to molecular simulation tools in the D-Grid infrastructure via UNICORE and SAML authentication.

Several projects use federated access control based on the Internet2 Shibboleth technologies, which work with SAML. Users are authenticated by the Identity Provider (IdP) server of their home organization, and the Service Provider (SP), e.g. Shibboleth, only needs to trust a limited number of IdPs for authentication purposes. Once the user has been authenticated by the home authentication server, a SAML authentication assertion message is sent to the SP. Sinnott et al. [30] introduced two case studies, the DAMES portal [31] and the EuroDSD portal [32]. The DAMES project developed secure portals for occupational data management, educational data management, and e-Health data management. The EuroDSD portal supports secure upload and searching of clinical case information for disorders of sex development (DSD).
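The trust relationship between an SP and a limited set of IdPs can be illustrated with a deliberately simplified sketch. It is not the SAML or Shibboleth protocol: real assertions are XML documents signed with the IdP's certificate, whereas this example assumes a shared-secret HMAC as a stand-in for that signature. The IdP URLs, keys, and all function names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Assertion:
    """Simplified stand-in for an authentication assertion sent to the SP."""
    issuer: str     # identifier of the home organization's IdP
    subject: str    # the authenticated user
    signature: str  # HMAC here; real SAML uses XML signatures with certificates


# Hypothetical list of IdPs that the SP trusts, with stand-in keys.
TRUSTED_IDPS: Dict[str, bytes] = {
    "https://idp.university-a.example": b"secret-a",
    "https://idp.university-b.example": b"secret-b",
}


def sign(issuer: str, subject: str, key: bytes) -> str:
    """Compute the stand-in signature over issuer and subject."""
    message = f"{issuer}|{subject}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_assertion(issuer: str, subject: str, key: bytes) -> Assertion:
    """What an IdP would do after authenticating the user locally."""
    return Assertion(issuer, subject, sign(issuer, subject, key))


def accept_assertion(assertion: Assertion) -> Optional[str]:
    """What the SP does: accept only assertions from IdPs it trusts."""
    key = TRUSTED_IDPS.get(assertion.issuer)
    if key is None:
        return None  # issuer is not among the trusted IdPs
    expected = sign(assertion.issuer, assertion.subject, key)
    if not hmac.compare_digest(expected, assertion.signature):
        return None  # signature does not match, so the assertion is rejected
    return assertion.subject
```

The point of the design is that the SP never sees the user's password; it only needs to know which IdPs it trusts and how to verify what they assert.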

5. COST-EFFECTIVE AND TIME-EFFICIENT PORTAL DEVELOPMENT TOOLKITS

Portal development is often considered expensive: many existing scientific gateways for running computing processes remotely have required on the order of 24 person months of development time. There are several reasons for this. First, specifications are often vague, which means that the development process depends on iterating through many versions to get the final version right. Second, this type of portal requires developers to know about the domain they are developing for and the general area of portal and web development, as well as to be able to work with HPC, Grid, and/or Cloud computing infrastructures. Currently, this skill set is often not found in one person. Third, the current state of portal development tools is immature; the frameworks are often stable, but the underlying development processes are not as developed as in other areas, making debugging and integrating components time consuming.

All this makes scientific portal development expensive relative to the user base a scientific portal often supports. The specific goal of a scientific portal is often only of interest to hundreds of researchers, which is significantly fewer than would be the case for commercial portals such as Amazon, Google, LinkedIn, and Facebook. Therefore, the design and implementation plan of a scientific portal must at every stage take into account the fact that resources for development and maintenance will be scarce and often have an end date that coincides with an externally funded project. We provide a brief comparison of three toolkits that aim to provide a cost-effective and time-efficient solution to the development and maintenance of scientific computing portals. A more detailed overview of two of the solutions is provided in the paper entitled ‘Generating web-based user interfaces for computational science’ [33] in this special issue.

Table III. Three mature and live portlet generators that require little to no development in conventional programming languages.

                        Rapid           Rappture          EnginFrame
Stable version          1.0             1.0               5.0
Development version     1.0             1.1               n/a
License                 Open Source     Open Source       Commercial
Language used           XML             XML (1)           XML
Remote connection       Ssh and JSDL    VNC               VNC
Portal dependencies     None            NanoHUB           EnginFrame Server
Resource description    In portlet      Not appropriate   In portal

(1) The original application must be adapted as well.

In Section 3.1, a comparison is provided of several generic solutions. Generic here means that these solutions can be used by researchers from any discipline to execute whatever application they depend on. Of course, this does mean that the researcher must know how to use the application and must invest significant time to learn how to go from a generic framework for job submissions to their specific task.

A distinction can be made between portals that provide a generic user interface for submitting compute jobs and frameworks that can be used to develop a web portal for submitting jobs, as well as other tasks, where the jobs can be generic or more tailored to a set of tasks from a specific discipline. Basically, the latter provide a container to host individual applications; these applications must then be developed manually. Issues around these different solutions are discussed in Section 3.2.

Here we focus on frameworks that aim to provide a specific solution for enabling existing applications on compute and data resources with a minimal amount of development. Three such frameworks currently exist, all of which are actively maintained. These frameworks differ from the previously described solutions in that they can be used to provide a specific user interface, as opposed to a generic job submission interface, without the need to develop a web portal from scratch on top of the existing portal containers. All three take a slightly different approach to achieving the goal of providing web-based access to applications running elsewhere. We discuss below the latest stable versions as of this date and compare several of their discriminative features. Table III shows basic information about the versions we have used.

Rappture is a toolkit developed at Purdue University [34]. Its aim is to web-enable scientific applications by providing an experiment environment specific to each application. It combines numerical building blocks, such as Poisson equation solvers and iterative matrix solvers, with an infrastructure for handling user interfaces. Once you describe the input/output for your simulator, Rappture handles the rest, generating a graphical interface automatically based on your description. The resulting application must then be deployed on the nanoHUB portal framework.

Development of a portlet involves two stages. In stage one, the interface is defined in terms of inputs and outputs in an XML file. In stage two, the original application is modified to include statements that allow the user interface to control the application and to show relevant output. The latter development depends on the language the original application was developed in. Rappture currently supports C/C++, TCL, Fortran, Perl, Ruby, Octave, and Matlab. The connection between the remote application and the web interface is via the Virtual Network Computing (VNC) protocol. A significant difference between Rappture and the next two solutions is that the current version is not specifically designed to enable the applications to run on large-scale compute resources.

EnginFrame is a commercial product developed by NICE s.r.l. [35], which includes an XML language to define service descriptions of processes that are to be made available in a graphical user interface as well as the actual portal framework that provides this user interface. The framework supports many different job schedulers, which include both cluster and Grid computing.

Development of an interface is through a description of its inputs, outputs, and execution specifics. Compute resources are maintained by the overall portal framework. This is different from the next solution, where each portlet is self-contained and therefore can live in any container as it knows everything it needs to successfully handle compute jobs.

Rapid is developed by the U.K. National e-Science Centre at the University of Edinburgh [36]. Its philosophy is to make submitting compute jobs as easy as booking a flight or purchasing a book online. The main idea is that the whole task, the resources, and the interface are described in one XML file. This file is then translated directly into a portlet that can be used directly in a portal container. An important aspect is that Rapid allows the construction of a task that guides its users through several steps; it does not try to be a generic interface for any compute job. This way each individual portlet can be tailored to the requirements of its user base.

Portlets generated through Rapid do not depend on any specific portal platform, which is in contrast to the other two solutions and to most so-called generic job submission portals. Instead, Rapid relies on the JSR-168 industry standard for portal frameworks, which allows portlets created with Rapid to work in any portal framework compliant with this standard. Most vendor and open-source portal frameworks adhere to this standard; we recommend using Rapid with the open-source version of Liferay as it is stable, mature, and well maintained. Moreover, compute jobs are internally translated into the Job Submission Description Language (JSDL), another standard that adheres to the Basic Execution Engine standard. This allows Rapid to connect to any system that is able to consume JSDL, such as GridSAM.
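To give an impression of what a JSDL job description looks like, the following sketch builds a minimal JSDL 1.0 document for a POSIX application. It is an illustration only and does not reproduce how Rapid performs its internal translation; the function name `build_jsdl`, the job name, and the executable are hypothetical, while the element names and namespaces are those of the JSDL 1.0 specification.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespaces defined by the JSDL 1.0 specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(job_name: str, executable: str, arguments: List[str]) -> str:
    """Return a minimal JSDL document describing one POSIX job."""
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    job_def = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    job_desc = ET.SubElement(job_def, f"{{{JSDL_NS}}}JobDescription")

    # Human-readable identification of the job.
    ident = ET.SubElement(job_desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name

    # The application to run, expressed with the POSIX extension.
    app = ET.SubElement(job_desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg

    return ET.tostring(job_def, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical job: echo a greeting on the target resource.
    print(build_jsdl("hello-job", "/bin/echo", ["hello", "world"]))
```

Because the description is plain XML in a standardized vocabulary, any execution service that consumes JSDL can accept it, independent of the portal that produced it.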

6. SELECTED APPLICATIONS

In the field of life sciences, various complex and sophisticated tools and algorithms are already available. One challenge is to attract user communities to use these tools. Nowadays, social networking tools like Facebook are popular and widely used. Users are well versed in handling such portals and form virtual groups in portals. This popularity has led to scientific activities that offer portals for virtual scientific communities. The users expect collaborative tools for data management, search tools, and analysis tools for their specific area, and would also benefit from chatrooms and wikis. In this special issue, Elsayed et al. introduce two portals for collaborative research communities [37], VectorBase and BGA-Space. VectorBase attracts scientists in the field of invertebrate vectors that transmit human diseases with data mining tools, annotation pipelines, and analysis. BGA-Space offers advanced data management features for the full life cycle of data in breath gas analysis experiments. These complementary case studies show different approaches to meeting the researchers’ needs in application-specific portals.

Long-running tasks like molecular dynamics simulations demand the use of Cloud and Grid infrastructures to achieve acceptable performance. A detailed profile of the interface to those tools and of the required cores and CPU hours is necessary for portal providers to help users access suitable infrastructures. Furthermore, a rudimentary understanding of the scientific background, i.e. which tools are used and which analysis steps have to be performed, is essential to provide the researchers with an intuitive user interface. In this special issue, Jens Krüger and Gregor Fels present an example of high-performance molecular dynamics in the case of ion permeation simulations by Gromacs [38]. They compared two methods for the calculation of energy profiles and measured the performance on different HPC clusters and the influence of required CPU hours and real simulation time. This case study shows a typical user requirement in the context of molecular dynamics simulations under consideration of a scientific question.

7. SUMMARY

We have set the scene for this special issue on scientific portals in the context of life sciences. This paper introduces most of the key aspects that were raised during the 2-day workshop held at the Edinburgh e-Science Institute in September 2009. It also provides a balanced overview of available tools and technologies that are mature and ready to use. The further papers in this special issue explore several of the concepts in more detail and provide compelling use cases in which portals are used to full advantage.

ACKNOWLEDGEMENTS

We acknowledge the sponsorship of the 2009 International Workshop on Portals for Life Sciences by the Edinburgh e-Science Institute and the Scottish Bioinformatics Forum. We also thank the authors for their contributions and the reviewers for their effort. We owe much gratitude to the local organizers, for without their hard work the workshop would not have been such a success. To conclude, we thank the editors of this journal for providing us with the opportunity for the special issue on portals for life sciences.

REFERENCES

1. Amin K, Hategan M, von Laszewski G, Zaluzec NJ. Abstracting the Grid. Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2004, Coruna, Spain, 2004; 250–257.

2. Dahan M, Boisseau JR. The GridPort Toolkit: A system for building grid portals. Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC ’01). IEEE Computer Society, Washington, DC, U.S.A., 2001; 216.

3. Vine Toolkit. Available at: http://www.ogf.org/OGF28/materials/1953/OGF28_PSNC_Vine_Toolkit.pdf [25 August 2010].

4. Kacsuk P, Sipos G. Multi-grid, multi-user workflows in the P-GRADE portal. Journal of Grid Computing 2005; 3(3–4):221–238.

5. Kacsuk P. P-GRADE portal family for Grid infrastructures. Concurrency and Computation: Practice and Experience, Special Issue on IWPLS 2009 2009.

6. Barbera R, Falzone A, Ardizzone V, Scardaci D. The GENIUS Grid Portal: Its architecture, improvements of features, and new implementations about authentication and authorization. Sixteenth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, WETICE 2007, Paris, 2007; 279–283.

7. Novotny J, Russell M, Wehrens O. GridSphere: An advanced portal framework. Proceedings of the 30th EUROMICRO Conference, EUROMICRO, Rennes, France, 2004; 412–419.

8. Liferay. Available at: http://www.liferay.com/products/liferay-portal;jsessionid=FFBC3407A12B3EE550EDAFDE728936B1.node-1 [25 August 2010].

9. Gannon D, Plale B, Marru S, Kandaswamy G, Simhann Y, Shirasuna S. Dynamic, adaptive workflows for MesoScale meteorology, in workflows for e-science. Workflows for eScience: Scientific Workflows for Grids. Springer: Berlin, 2007; 126–142.

10. Lin AW, Peltier SW, Grethe JS, Ellisman SH. Case studies on the use of workflow technologies for scientific analysis: The biomedical informatics research network and the telescience project. Workflows for e-science. Springer: Berlin, 2007; 109–125.

11. Kurowski K, de Back W, Dubitzky W, Gulyás L, Kampis G, Mamonski M, Szemes G, Swain M. Complex System Simulations with QosCosGrid (Lecture Notes in Computer Science, vol. 5544). Springer: Berlin, 2009; 387–396.

12. Gutiérrez E, Costantini A, Cacheiro JL, Rodríguez A. G-FLUXO: A workflow portal specialized in Computational BioChemistry. Proceedings of First Workshop IWPLS’09, CEUR Workshop Proceedings, Edinburgh, U.K., 2009; ISSN 1613-0073. Available at: CEUR-WS.org/Vol-513/paper04.pdf [30 November 2010].

13. Kiss T, Greenwell P, Heindl H, Terstyanszky G, Weingarten N. Parameter sweep workflows for modelling carbohydrate recognition. Journal of Grid Computing 2010; 8(4):587–601.

14. Kovács J, Kacsuk P, Lomaka A. Dedicated desktop grid system for drug discovery. Future Generation Computing Systems, The International Journal of Grid Computing and eScience, submitted.

15. Pandora. Available at: http://ng-maat-devel1.maatg.eu/pandora/ [25 August 2010].

16. neuGRID. Available at: http://www.neugrid.eu/pagine/home.php [25 August 2010].

17. DSpace. Available at: http://www.dspace.org/ [25 August 2010].

18. NCBO’s BioPortal. Available at: http://bioportal.bioontology.org/ [25 August 2010].

19. OWL. Available at: http://www.w3.org/TR/owl2-overview/ [25 August 2010].

20. OBO. Available at: http://www.geneontology.org/GO.format.obo-1_0.shtml [25 August 2010].

21. Berners-Lee T, Hendler J, Lassila O. The Semantic Web. Scientific American, May 2001; 34–43.

22. Golbreich C, Horridge M, Horrocks I, Motik B, Shearer R. OBO and OWL: Leveraging semantic web technologies for the Life Sciences. Proceedings of the Sixth International Semantic Web Conference (Lecture Notes in Computer Science, vol. 4825). Springer: Berlin, 2007; 169–182.

23. Aitken S, Bard J. Ontology views for collaborative ontology creation: The BioSphere Portal. Proceedings of U.K. e-Science All-Hands Meeting, Edinburgh, U.K., 2008.

24. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock M, Li P, Oinn T. Taverna: A tool for building and running workflows of services. Nucleic Acids Research 2006; 34:729–732.

25. Wolstencroft K, Alper P, Hull D, Wroe C, Lord P, Stevens R, Goble C. The myGrid ontology: Bioinformatics service discovery. International Journal of Bioinformatics Research and Applications 2007; 3(3):303–325.

26. Menager H, Lacroix Z, Tuffery P. Bioinformatics services discovery using ontology classification. IEEE Congress on Services, IEEE, Salt Lake City, UT, U.S.A., 2007; 106–113.

27. bwGRiD Portal. Available at: http://www.bw-grid.de/portal/ [25 August 2010].

28. Barbera R, Andronico G, Donvito G, Falzone A, Keijser JJ, La Rocca G, Milanesi L, Maggi GP, Vicario S. A Grid portal with robot certificates for bioinformatics phylogenetic analyses. Concurrency and Computation: Practice and Experience, Special Issue on IWPLS 2009.

29. MoSGrid. Available at: http://www.d-grid-ggmbh.de/index.php?id=96&L=1 [25 August 2010].

30. Sinnott R, Doherty T, Jiang J, McCafferty S, Stell A, Watt J. Security-oriented portals for the Life Sciences. Proceedings of First Workshop IWPLS’09, CEUR Workshop Proceedings, Edinburgh, U.K., 2009; ISSN 1613-0073. Available at: CEUR-WS.org/Vol-513/paper09.pdf [30 November 2010].

31. Dames. Available at: http://dames.nesc.gla.ac.uk/web/guest;jsessionid=E6CFCE7377C53967CD679D65-FC1DF2FF [25 August 2010].

32. EuroDSD. Available at: http://www.eurodsd.eu/ [25 August 2010].

33. van Hemert J, Koetsier J, Torterolo L, Porro I, Melato M, Barbera R. Generating web-based user interfaces for computational science. Concurrency and Computation: Practice and Experience, Special Issue on IWPLS 2009.

34. Rappture. Available at: http://nanohub.org/infrastructure/rappture/ [25 August 2010].

35. Torterolo L, Porro I, Fato M, Melato M, Calanducci A, Barbera R. Building science gateways with EnginFrame: A Life Science example. Proceedings of First Workshop IWPLS’09, CEUR Workshop Proceedings, Edinburgh, U.K., 2009; ISSN 1613-0073. Available at: CEUR-WS.org/Vol-513/paper10.pdf [30 November 2010].

36. Koetsier J, van Hemert J. Rapid development of computational science portals. Proceedings of First Workshop IWPLS’09, CEUR Workshop Proceedings, Edinburgh, U.K., 2009; ISSN 1613-0073. Available at: CEUR-WS.org/Vol-513/paper05.pdf.

37. Elsayed I, Madey G, Brezany P. Portals for collaborative research communities: Two distinguished case studies. Concurrency and Computation: Practice and Experience, Special Issue on IWPLS 2009.

38. Krüger J, Fels G. Ion permeation simulations by Gromacs: An example of high performance molecular dynamics. Concurrency and Computation: Practice and Experience, Special Issue on IWPLS 2009.
