54
The Ibis e-Science Software Framework Henri Bal Frank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands

The Ibis e-Science Software Framework

  • Upload
    mohawk

  • View
    46

  • Download
    7

Embed Size (px)

DESCRIPTION

The Ibis e-Science Software Framework. Henri Bal Frank J. Seinstra, Jason Maassen, Niels Drost High Performance Distributed Computing Group Department of Computer Science VU University, Amsterdam, The Netherlands. Introduction. Distributed systems continue to change - PowerPoint PPT Presentation

Citation preview

Page 1: The Ibis e-Science Software Framework

The Ibis e-Science Software Framework

Henri BalFrank J. Seinstra, Jason Maassen, Niels Drost

High Performance Distributed Computing GroupDepartment of Computer Science

VU University, Amsterdam, The Netherlands

Page 2: The Ibis e-Science Software Framework

Introduction

● Distributed systems continue to change● Clusters, grids, clouds, mobile devices

● Distributed applications continue to change● e-Science, web, pervasive applications

● Distributed programming continues to be notoriously difficult

Page 3: The Ibis e-Science Software Framework

Distributed Systems: 1980sMultiple PCs on a (local) network

● Networks of Workstations (NOWs)● Collections of Workstations (COWs)● Processor pools● Condor pools● Clusters

Page 4: The Ibis e-Science Software Framework

Distributed Systems: 1990sSharing wide-area resources

● Metacomputing (Smarr & Catlett, CACM)● Flocking Condor (Epema)● DAS (Distributed ASCI Supercomputer)● Grid Blueprint (Foster & Kesselman)● Desktop grids, SETI@home

Page 5: The Ibis e-Science Software Framework

Distributed Systems: 2000s

● Cloud computing● Pay-on-demand● Virtualization

● Hardware diversity /heterogeneous computing● Green IT● The Networked World

● Sensor networks● Smart phones

Page 6: The Ibis e-Science Software Framework

Our approach● Study fundamental underlying problems● … hand-in-hand with realistic applications● … integrate solutions in one system: Ibis

Distributed SystemsUser

!

● Funding from NWO (2002), VL-e (2003-2009), EU (JavaGAT, XtreemOS, Contrail), VU, COMMIT

Page 7: The Ibis e-Science Software Framework

● ‘Problem Solving’ vs. ‘System Fighting’● Jungle Computing● Example applications:

● Computational Astrophysics● Multimedia Content Analysis

● The Ibis Software Framework● The 3 Common Uses of Ibis

● ‘Master Key’ + ‘Glue’ + ‘HPC’● Some current work: Green Clouds

Outline

Page 8: The Ibis e-Science Software Framework

Ibis: ‘Problem Solving’ vs. ‘System Fighting’

Page 9: The Ibis e-Science Software Framework

● DACH 2008, Japan● Distributed multi-cluster system

● Heterogeneous● Distributed database (image pairs)

● Large vs small databases/images● Partial replication

● Image-pair comparison given (in C)

● Find all supernova candidates● Task 1: As fast as possible● Task 2: Idem, under system crashes

A Random Example: Supernova Detection

Page 10: The Ibis e-Science Software Framework

‘Problem Solving’ vs. ‘System Fighting’

● All participating teams struggled (1 month)● Middleware instabilities…● Connectivity problems…● Load balancing…

● But not the Ibis team● Winner (by far) in both categories● Note: many Japanese teams with years of experience

● Hardware, middleware, network, C-code, image data…● Focus on ‘problem solving’, not ‘system fighting’

● incl. ‘opening’ of black-box C-code

Page 11: The Ibis e-Science Software Framework

Ibis Results: Awards & Prizes

11stst Prize: SCALE 2008 Prize: SCALE 2008

AAAI-VC 2007AAAI-VC 2007Most Visionary Research AwardMost Visionary Research Award

11stst Prize: DACH 2008 - BS Prize: DACH 2008 - BS 11stst Prize: DACH 2008 - FT Prize: DACH 2008 - FT

WebPie: A Web-Scale Parallel Inference Engine

J. Urbani, S. Kotoulas,J. Maassen, N. Drost, F.J. Seinstra,

F. van Harmelen, and H.E. Bal

33rdrd Prize: ISWC 2008 Prize: ISWC 2008 1st Prize: SCALE 20101st Prize: SCALE 2010

● Many domains; data/compute intensive, real-time...● Winner Sustainability Award in the Enlighten Your Research (EYR) competition, 7 Dec. 2011 (Frank Seinstra)

Page 12: The Ibis e-Science Software Framework

Ibis Users…

……and many and many moremore

Page 13: The Ibis e-Science Software Framework

Jungle Computing

Page 14: The Ibis e-Science Software Framework

Jungle Computing (Frank Seinstra)

● ‘Worst case’ computing as required by end-users● Distributed● Heterogeneous● Hierarchical (incl. multi-/many-cores)

Page 15: The Ibis e-Science Software Framework

Why Jungle Computing?

● Scientists often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to:

● Desire for scalability● Distributed nature of (input) data● Software heterogeneity (e.g.: mix of C/MPI and CUDA)● Ad hoc hardware availability● Energy consumption (use most energy-efficient resource)● …

● Note: most users do not need ‘worst case’ jungle● Ibis aims to apply to any subset

Page 16: The Ibis e-Science Software Framework

Example Application Domains

● Computational Astrophysics (Leiden)● AMUSE: multi-model / multi-kernel simulations● “Simulating the Universe on an Intercontinental

Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010)

● Climate Modeling (Utrecht)● CPL: multi-model / multi-kernel simulations

● Atmosphere, ocean, source rock formation, …- hardware:

(potentially) very diverse - high resolution => speed & scalability - …

Page 17: The Ibis e-Science Software Framework

Domain Example #1:Computational Astrophysics

Page 18: The Ibis e-Science Software Framework

Domain Example #1: Computational Astrophysics

Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA

Page 19: The Ibis e-Science Software Framework

Domain Example #1: Computational Astrophysics

AMUSE

radiative transport

gravitational dynamics

hydro-dynamics

stellar evolution

● The AMUSE system (Leiden University)● Early Star Cluster Evolution, including gas

● Gravitational dynamics (N-body): GPU / GPU-cluster● Stellar evolution: Beowulf cluster / Cloud● Hydro-dynamics, Radiative transport: Supercomputer

Page 20: The Ibis e-Science Software Framework

Domain Example #1: Computational Astrophysics

Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA

Page 21: The Ibis e-Science Software Framework

Domain Example #2:Multimedia Content Analysis

Page 22: The Ibis e-Science Software Framework

Multimedia Content Analysis (MMCA)

● Aim:● Automatic extraction of ‘semantic concepts’ from image sets and

video streams

● Depending on specific problem & size of data set:● May take hours, days, weeks, months, years…

Page 23: The Ibis e-Science Software Framework

● Applications in (a.o):● Remote Sensing● Security / Surveillance● Medical Imaging● Document Analysis● Multimedia Systems● Astronomy

● Application types:● Real-time vs. off-line● Fine-grained vs. coarse-grained● Data-intensive / compute-intensive / information-intensive

Multimedia Content Analysis (MMCA)

Page 24: The Ibis e-Science Software Framework

Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog

Seinstra et al (IEEE Multimedia, Oct-Dec 2007)Seinstra et al (AAAI’07: Most Visionary Research Award)

Page 25: The Ibis e-Science Software Framework
Page 26: The Ibis e-Science Software Framework

Successful…

● …but many fundamental problems unsolved!● Scaling up to very large systems● Platform independence● Middleware independence● Connectivity (a.o. firewalls, …)● Fault-tolerance● …

● Software support tool(s) urgently needed!● Jungle-aware + transparent + efficient● No progress until ‘discovery’ of Ibis

Page 27: The Ibis e-Science Software Framework

The Ibis Software Framework

Page 28: The Ibis e-Science Software Framework

The Ibis Software Framework

● Offers all functionality to efficiently & transparently implement & run Jungle Computing applications

● Designed for dynamic / hostile environments

● Modular and flexible● Allow replacement of Ibis components by external ones, including

native code

● Open source● Download: http://www.cs.vu.nl/ibis/

Page 29: The Ibis e-Science Software Framework

Ibis Design

● Applications need functionality for● Programming (as in programming languages)● Deployment (as in operating systems)

Programming

Logical

Likes math

Deployment

Practical

Visual (GUI)

Page 30: The Ibis e-Science Software Framework

Ibis Software Stack

Page 31: The Ibis e-Science Software Framework

JavaGAT

● Java Grid Application Toolkit● High-level API for developing (Grid) applications independently of

the underlying (Grid) middleware● Use (Grid) services; file cp, resource discovery, job submission, …

● Developed in EU GridLab project● Thilo Kielmann, Rob van Nieuwpoort

● SAGA API standardized by OGF● Simple API for Grid Applications (a.o. with LSU)● SAGA on top of JavaGAT (and v.v.)

Page 32: The Ibis e-Science Software Framework

Zorilla

● A prototype P2P middleware● A Zorilla system consists of a collection of nodes, connected by a

P2P network● Each node independent & implements all middleware functionality● No central components● Supports fault-tolerance and malleability● Easily combines resources in multiple administrative domains

Page 33: The Ibis e-Science Software Framework

IbisDeploy

Page 34: The Ibis e-Science Software Framework

Ibis Portability Layer (IPL)

● Java-centric ‘run-anywhere’ communication library● Sent along with your application● “MPI for the Grid”

● Supports fault-tolerance and malleability● Resource tracking (Join-Elect-Leave model)● Open-world / Closed world

● Efficient● Highly optimized object serialization● Can use optimized native libraries (e.g. MPI, Infiniband)

Page 35: The Ibis e-Science Software Framework

SmartSockets

● Robust connection setup

● Always connection in 30 different scenarios

Problems:Firewalls

Network Address Translation (NAT)

Non-routed networksMulti-homing

Page 36: The Ibis e-Science Software Framework

Ibis Programming Models

● IPL-based programming models, a.o.:● Satin:

● A divide-and-conquer model● MPJ:

● The MPI binding for Java● RMI:

● Object-Oriented remote Procedure Call● Jorus:

● A ‘user transparent’ parallel model for multimedia applications

Page 37: The Ibis e-Science Software Framework

The 3 Common Uses of Ibis

Page 38: The Ibis e-Science Software Framework

Ibis as ‘Master Key’ (or ‘Passepartout’)

● Use JavaGAT to access ‘any’ system● Develop/run applications independently of available middlewares● JavaGAT ‘adaptors’ required for each middleware● ‘Intelligent dispatching’ even allows for transparent use of multiple middlewares

● Example: file copy● JavaGAT vs. Globus

● Simple, portable, …● SAGA API standardized

package org.gridlab.gat.io.cpi.rftgt4; import java.net.MalformedURLException;import java.net.URL;import java.rmi.RemoteException;import java.security.cert.X509Certificate;import java.util.Calendar;import java.util.HashMap;import java.util.LinkedList;import java.util.List;import java.util.Map;import java.util.Vector; import javax.xml.namespace.QName;import javax.xml.rpc.ServiceException;import javax.xml.rpc.Stub;import javax.xml.soap.SOAPElement; import org.apache.axis.message.addressing.EndpointReferenceType;import org.apache.axis.types.URI.MalformedURIException;import org.globus.axis.util.Util;import org.globus.delegation.DelegationConstants;import org.globus.delegation.DelegationException;import org.globus.delegation.DelegationUtil;import org.globus.gsi.GlobusCredential;import org.globus.gsi.GlobusCredentialException;import org.globus.gsi.gssapi.GlobusGSSCredentialImpl;import org.globus.gsi.jaas.JaasGssUtil;import org.globus.rft.generated.BaseRequestType;import org.globus.rft.generated.CreateReliableFileTransferInputType;import org.globus.rft.generated.CreateReliableFileTransferOutputType;import org.globus.rft.generated.DeleteRequestType;import org.globus.rft.generated.DeleteType;import org.globus.rft.generated.OverallStatus;import org.globus.rft.generated.RFTFaultResourcePropertyType;import org.globus.rft.generated.RFTOptionsType;import org.globus.rft.generated.ReliableFileTransferFactoryPortType;import org.globus.rft.generated.ReliableFileTransferPortType;import org.globus.rft.generated.Start;import org.globus.rft.generated.TransferRequestType;import org.globus.rft.generated.TransferType;import org.globus.transfer.reliable.client.BaseRFTClient;import org.globus.transfer.reliable.service.RFTConstants;import org.globus.wsrf.NotificationConsumerManager;import org.globus.wsrf.NotifyCallback;import org.globus.wsrf.ResourceException;import org.globus.wsrf.WSNConstants;import org.globus.wsrf.container.ContainerException;import org.globus.wsrf.container.ServiceContainer;import org.globus.wsrf.core.notification.ResourcePropertyValueChangeNotificationElementType;import org.globus.wsrf.encoding.DeserializationException;import org.globus.wsrf.encoding.ObjectDeserializer;import org.globus.wsrf.impl.security.authentication.Constants;import org.globus.wsrf.impl.security.authorization.Authorization;import org.globus.wsrf.impl.security.authorization.HostAuthorization;import org.globus.wsrf.impl.security.authorization.IdentityAuthorization;import org.globus.wsrf.impl.security.authorization.SelfAuthorization;import org.globus.wsrf.impl.security.descriptor.ClientSecurityDescriptor;import org.globus.wsrf.impl.security.descriptor.ContainerSecurityDescriptor;import org.globus.wsrf.impl.security.descriptor.GSISecureMsgAuthMethod;import org.globus.wsrf.impl.security.descriptor.GSITransportAuthMethod;import org.globus.wsrf.impl.security.descriptor.ResourceSecurityDescriptor;import org.globus.wsrf.impl.security.descriptor.SecurityDescriptorException;import org.globus.wsrf.security.SecurityManager;import org.gridlab.gat.CouldNotInitializeCredentialException;import org.gridlab.gat.CredentialExpiredException;import org.gridlab.gat.GATContext;import org.gridlab.gat.GATInvocationException;import org.gridlab.gat.GATObjectCreationException;import org.gridlab.gat.Preferences;import org.gridlab.gat.URI;import org.gridlab.gat.io.cpi.FileCpi;import org.gridlab.gat.security.globus.GlobusSecurityUtils;import org.ietf.jgss.GSSCredential;import org.ietf.jgss.GSSException;import org.oasis.wsn.Subscribe;import org.oasis.wsn.TopicExpressionType;import org.oasis.wsrf.faults.BaseFaultType;import org.oasis.wsrf.lifetime.SetTerminationTime;import org.oasis.wsrf.properties.GetMultipleResourcePropertiesResponse;import org.oasis.wsrf.properties.GetMultipleResourceProperties_Element;import org.oasis.wsrf.properties.ResourcePropertyValueChangeNotificationType; class RFTGT4NotifyCallback implements NotifyCallback { RFTGT4FileAdaptor transfer; OverallStatus status;  public RFTGT4NotifyCallback(RFTGT4FileAdaptor transfer) { super(); this.transfer = transfer; this.status = null; }  @SuppressWarnings("unchecked") public void deliver(List topicPath, EndpointReferenceType producer, Object messageWrapper) { try { ResourcePropertyValueChangeNotificationType message = ((ResourcePropertyValueChangeNotificationElementType) messageWrapper) .getResourcePropertyValueChangeNotification(); this.status = (OverallStatus) message.getNewValue().get_any()[0] .getValueAsType(RFTConstants.OVERALL_STATUS_RESOURCE, OverallStatus.class); if (status.getFault() != null) { transfer.setFault(getFaultFromRP(status.getFault())); } // RunQueue.getInstance().add(this.resourceKey); } catch (Exception e) { } transfer.setStatus(status); }  private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) { if (fault == null) { return null; }  if (fault.getDelegationEPRMissingFaultType() != null) { return fault.getDelegationEPRMissingFaultType(); } else if (fault.getRftAuthenticationFaultType() != null) { return fault.getRftAuthenticationFaultType(); } else if (fault.getRftAuthorizationFaultType() != null) { return fault.getRftAuthorizationFaultType(); } else if (fault.getRftDatabaseFaultType() != null) { return fault.getRftDatabaseFaultType(); } else if (fault.getRftRepeatedlyStartedFaultType() != null) { return fault.getRftRepeatedlyStartedFaultType(); } else if (fault.getTransferTransientFaultType() != null) { return fault.getTransferTransientFaultType(); } else if (fault.getRftTransferFaultType() != null) { return fault.getRftTransferFaultType(); } else { return null; } }} @SuppressWarnings("serial")public class RFTGT4FileAdaptor extends FileCpi { public static final Authorization DEFAULT_AUTHZ = HostAuthorization .getInstance(); Integer msgProtectionType = Constants.SIGNATURE; static final int TERM_TIME = 20; static final String PROTOCOL = "https"; private static final String BASE_SERVICE_PATH = "/wsrf/services/"; public static final int DEFAULT_DURATION_HOURS = 24; public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE; public static final String DEFAULT_FACTORY_PORT = "8443"; private static final int DEFAULT_GRIDFTP_PORT = 2811; NotificationConsumerManager notificationConsumerManager; EndpointReferenceType notificationConsumerEPR; EndpointReferenceType notificationProducerEPR; String securityType;

String factoryUrl; GSSCredential proxy; Authorization authorization; String host; OverallStatus status; BaseFaultType fault; String locationStr; ReliableFileTransferFactoryPortType factoryPort;  public RFTGT4FileAdaptor(GATContext gatContext, Preferences preferences, URI location) throws GATObjectCreationException { super(gatContext, preferences, location); if (!location.isCompatible("gsiftp") && !location.isCompatible("gridftp")) { throw new GATObjectCreationException("cannot handle this URI"); }  String globusLocation = System.getenv("GLOBUS_LOCATION"); if (globusLocation == null) { throw new GATObjectCreationException("$GLOBUS_LOCATION is not set"); } System.setProperty("GLOBUS_LOCATION", globusLocation); System.setProperty("axis.ClientConfigFile", globusLocation + "/client-config.wsdd"); this.host = location.getHost(); this.securityType = Constants.GSI_SEC_MSG; this.authorization = null; this.proxy = null;  try { proxy = GlobusSecurityUtils.getGlobusCredential(gatContext, preferences, "globus", location, DEFAULT_GRIDFTP_PORT); } catch (CouldNotInitializeCredentialException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (CredentialExpiredException e) { // TODO Auto-generated catch block e.printStackTrace(); }  this.notificationConsumerManager = null; this.notificationConsumerEPR = null; this.notificationProducerEPR = null; this.status = null; this.fault = null; factoryPort = null; this.factoryUrl = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME; locationStr = setLocationStr(location); }  String setLocationStr(URI location) { if (location.getScheme().equals("any")) { return "gsiftp://" + location.getHost() + ":" + location.getPort() + "/" + location.getPath(); } else { return location.toString(); } }  protected boolean copy2(String destStr) throws GATInvocationException { EndpointReferenceType credentialEndpoint = getCredentialEPR();  TransferType[] transferArray = new TransferType[1]; transferArray[0] = new TransferType(); transferArray[0].setSourceUrl(locationStr); transferArray[0].setDestinationUrl(destStr);  RFTOptionsType rftOptions = new RFTOptionsType(); rftOptions.setBinary(Boolean.TRUE); // rftOptions.setIgnoreFilePermErr(false); TransferRequestType request = new TransferRequestType(); request.setRftOptions(rftOptions); request.setTransfer(transferArray); request.setTransferCredentialEndpoint(credentialEndpoint); setRequest(request);  while (!transfersDone()) { try { Thread.sleep(1000); } catch (InterruptedException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } } return transfersSucc(); }  public void copy(URI dest) throws GATInvocationException { String destUrl = setLocationStr(dest); if (!copy2(destUrl)) { throw new GATInvocationException( "RFTGT4FileAdaptor: file copy failed"); } }  public void subscribe(ReliableFileTransferPortType rft) throws GATInvocationException { Map<Object, Object> properties = new HashMap<Object, Object>(); properties.put(ServiceContainer.CLASS, "org.globus.wsrf.container.GSIServiceContainer"); if (this.proxy != null) { ContainerSecurityDescriptor containerSecDesc = new ContainerSecurityDescriptor(); SecurityManager.getManager(); try { containerSecDesc.setSubject(JaasGssUtil .createSubject(this.proxy)); } catch (GSSException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ContainerSecurityDescriptor failed, " + e); } properties.put(ServiceContainer.CONTAINER_DESCRIPTOR, containerSecDesc); } this.notificationConsumerManager = NotificationConsumerManager .getInstance(properties); try { this.notificationConsumerManager.startListening(); } catch (ContainerException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: NotificationConsumerManager failed, " + e); } List<Object> topicPath = new LinkedList<Object>(); topicPath.add(RFTConstants.OVERALL_STATUS_RESOURCE); ResourceSecurityDescriptor securityDescriptor = new ResourceSecurityDescriptor(); String authz = null; if (authorization == null) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceof HostAuthorization) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceof SelfAuthorization) { authz = Authorization.AUTHZ_SELF; } else if (authorization instanceof IdentityAuthorization) { // not supported throw new GATInvocationException( "RFTGT4FileAdaptor: identity authorization not supported"); } else { // throw an sg throw new GATInvocationException( "RFTGT4FileAdaptor: set authorization failed"); } securityDescriptor.setAuthz(authz); Vector<Object> authMethod = new Vector<Object>(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { authMethod.add(GSISecureMsgAuthMethod.BOTH); } else { authMethod.add(GSITransportAuthMethod.BOTH); } try { securityDescriptor.setAuthMethods(authMethod); } catch (SecurityDescriptorException e) {

throw new GATInvocationException( "RFTGT4FileAdaptor: setAuthMethods failed, " + e); }  RFTGT4NotifyCallback notifyCallback = new RFTGT4NotifyCallback(this); try { notificationConsumerEPR = notificationConsumerManager .createNotificationConsumer(topicPath, notifyCallback, securityDescriptor); } catch (ResourceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: createNotificationConsumer failed, " + e); } Subscribe subscriptionRequest = new Subscribe(); subscriptionRequest.setConsumerReference(notificationConsumerEPR); TopicExpressionType topicExpression = null; try { topicExpression = new TopicExpressionType( WSNConstants.SIMPLE_TOPIC_DIALECT, RFTConstants.OVERALL_STATUS_RESOURCE); } catch (MalformedURIException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: create TopicExpressionType failed, " + e); } subscriptionRequest.setTopicExpression(topicExpression); try { rft.subscribe(subscriptionRequest); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: subscription failed, " + e); } }  protected EndpointReferenceType getCredentialEPR() throws GATInvocationException { this.status = null; URL factoryURL = null; try { factoryURL = new URL(factoryUrl); } catch (MalformedURLException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factoryURL failed, " + e); } try { factoryPort = BaseRFTClient.rftFactoryLocator .getReliableFileTransferFactoryPortTypePort(factoryURL); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set factoryPort failed, " + e); } setSecurityTypeFromURL(factoryURL); return populateRFTEndpoints(factoryPort); }  protected void setRequest(BaseRequestType request) throws GATInvocationException { CreateReliableFileTransferInputType input = new CreateReliableFileTransferInputType(); if (request instanceof TransferRequestType) { input.setTransferRequest((TransferRequestType) request); } else { input.setDeleteRequest((DeleteRequestType) request); }  Calendar termTimeDel = Calendar.getInstance(); termTimeDel.add(Calendar.MINUTE, TERM_TIME); input.setInitialTerminationTime(termTimeDel); CreateReliableFileTransferOutputType response = null; try { response = factoryPort.createReliableFileTransfer(input); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: set createReliableFileTransfer failed, " + e); } EndpointReferenceType reliableRFTEndpoint = response .getReliableTransferEPR(); ReliableFileTransferPortType rft = null; try { rft = BaseRFTClient.rftLocator .getReliableFileTransferPortTypePort(reliableRFTEndpoint); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: getReliableFileTransferPortTypePort failed, " + e); } setStubSecurityProperties((Stub) rft); subscribe(rft); Calendar termTime = Calendar.getInstance(); termTime.add(Calendar.MINUTE, TERM_TIME); SetTerminationTime reqTermTime = new SetTerminationTime(); reqTermTime.setRequestedTerminationTime(termTime); try { rft.setTerminationTime(reqTermTime); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setTerminationTime failed, " + e); } try { rft.start(new Start()); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: start failed, " + e); } }  private void setSecurityTypeFromURL(URL url) { if (url.getProtocol().equals("http")) { securityType = Constants.GSI_SEC_MSG; } else { Util.registerTransport(); securityType = Constants.GSI_TRANSPORT; } }  private void setStubSecurityProperties(Stub stub) { ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor();  if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType()); }  secDesc.setAuthz(getAuthorization());  if (this.proxy != null) { // set proxy credential secDesc.setGSSCredential(this.proxy); }  stub._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc); }  public Integer getMessageProtectionType() { return (this.msgProtectionType == null) ? RFTGT4FileAdaptor.DEFAULT_MSG_PROTECTION : this.msgProtectionType; }  public Authorization getAuthorization() { return (authorization == null) ? DEFAULT_AUTHZ : this.authorization; }  private EndpointReferenceType populateRFTEndpoints( ReliableFileTransferFactoryPortType factoryPort) throws GATInvocationException { EndpointReferenceType[] delegationFactoryEndpoints = fetchDelegationFactoryEndpoints(factoryPort); EndpointReferenceType delegationEndpoint = delegate(delegationFactoryEndpoints[0]); return delegationEndpoint; }

private EndpointReferenceType delegate( EndpointReferenceType delegationFactoryEndpoint) throws GATInvocationException { GlobusCredential credential = null; if (this.proxy != null) { credential = ((GlobusGSSCredentialImpl) this.proxy) .getGlobusCredential(); } else { try { credential = GlobusCredential.getDefaultCredential(); } catch (GlobusCredentialException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } }  int lifetime = DEFAULT_DURATION_HOURS * 60 * 60;  ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType()); } secDesc.setAuthz(getAuthorization());  if (this.proxy != null) { secDesc.setGSSCredential(this.proxy); }  // Get the public key to delegate on. X509Certificate[] certsToDelegateOn = null; try { certsToDelegateOn = DelegationUtil.getCertificateChainRP( delegationFactoryEndpoint, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } X509Certificate certToSign = certsToDelegateOn[0];  // FIXME remove when there is a DelegationUtil.delegate(EPR, ...) String protocol = delegationFactoryEndpoint.getAddress().getScheme(); String host = delegationFactoryEndpoint.getAddress().getHost(); int port = delegationFactoryEndpoint.getAddress().getPort(); String factoryUrl = protocol + "://" + host + ":" + port + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH;  // send to delegation service and get epr. EndpointReferenceType credentialEndpoint = null; try { credentialEndpoint = DelegationUtil.delegate(factoryUrl, credential, certToSign, lifetime, false, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } return credentialEndpoint; }  public EndpointReferenceType[] fetchDelegationFactoryEndpoints( ReliableFileTransferFactoryPortType factoryPort) throws GATInvocationException {  GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element(); request .setResourceProperty(new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTORY }); GetMultipleResourcePropertiesResponse response; try { response = factoryPort.getMultipleResourceProperties(request); } catch (RemoteException e) { e.printStackTrace(); throw new GATInvocationException( "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e); } SOAPElement[] any = response.get_any();  EndpointReferenceType epr1 = null; try { epr1 = (EndpointReferenceType) ObjectDeserializer.toObject(any[0], EndpointReferenceType.class); } catch (DeserializationException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: ObjectDeserializer, " + e); } EndpointReferenceType[] endpoints = new EndpointReferenceType[] { epr1 }; return endpoints; }  synchronized void setStatus(OverallStatus status) { this.status = status; }  public int transfersActive() { if (status == null) { return 1; } return status.getTransfersActive(); }  public int transfersFinished() { if (status == null) { return 0; } return status.getTransfersFinished(); }  public int transfersCancelled() { if (status == null) { return 0; } return status.getTransfersCancelled(); }  public int transfersFailed() { if (status == null) { return 0; } return status.getTransfersFailed(); }  public int transfersPending() { if (status == null) { return 1; } return status.getTransfersPending(); }  public int transfersRestarted() { if (status == null) { return 0; } return status.getTransfersRestarted(); }  public boolean transfersDone() { return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0); }  public boolean transfersSucc() { return (transfersDone() && transfersFailed() == 0 && transfersCancelled() == 0); }  /* * private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) { * if (fault == null) { return null; } * * if (fault.getRftTransferFaultType() != null) { return * fault.getRftTransferFaultType(); } else if * (fault.getDelegationEPRMissingFaultType() != null) { return * fault.getDelegationEPRMissingFaultType(); } else if * (fault.getRftAuthenticationFaultType() != null) { return * fault.getRftAuthenticationFaultType(); } else if * (fault.getRftAuthorizationFaultType() != null) { return * fault.getRftAuthorizationFaultType(); } else if * (fault.getRftDatabaseFaultType() != null) { return * fault.getRftDatabaseFaultType(); } else if * (fault.getRftRepeatedlyStartedFaultType() != null) { return * fault.getRftRepeatedlyStartedFaultType(); } else if * (fault.getTransferTransientFaultType() != null) { return * fault.getTransferTransientFaultType(); } else { return null; } } */  /* * private BaseFaultType deserializeFaultRP(SOAPElement any) throws * Exception { return getFaultFromRP((RFTFaultResourcePropertyType) * ObjectDeserializer .toObject(any, RFTFaultResourcePropertyType.class)); } */  void setFault(BaseFaultType fault) { this.fault = fault; } }

package tutorial;

import org.gridlab.gat.GAT;import org.gridlab.gat.GATContext;import org.gridlab.gat.URI;import org.gridlab.gat.io.File;

public class RemoteCopy { public static void main(String[] args) throws Exception { GATContext context = new GATContext();

URI src = new URI(args[0]); URI dest = new URI(args[1]); File file = GAT.createFile(context, src);

file.copy(dest); GAT.end(); }}

Page 39: The Ibis e-Science Software Framework

Ibis as ‘Glue’

● Use IPL + SmartSockets, generally for wide-area communication

● Linking up separate ‘activities’ of an application● Activities: often largely ‘independent’ tasks implemented in any

popular language or model (e.g. C/MPI, CUDA, Fortran, Java…)● Each typically running on a single GPU/node/Cluster/Cloud/…

● Automatically circumvent connectivity problems● Example:

With SmartSockets: No SmartSockets:

Page 40: The Ibis e-Science Software Framework

Ibis as ‘HPC Solution’

● Use Ibis as replacement for e.g. C++/MPI code● Benefits:

● (better) portability● malleability (open world)● fault-tolerance● (run-time) task migration

● Downside:● requires recoding

● Comparable speedups:

Page 41: The Ibis e-Science Software Framework

C++/MPIC++/MPI

Sockets + SSH TunnelingSockets + SSH Tunneling

SSHSSH

● Code pre-installed at each cluster site● Instable / faulty communication● Connectivity problems ● Execution on each cluster ‘by hand’

MMCA: Situation in 2004/2005

Parallel

Horus

Client

Parallel

Horus

Server

Parallel

Horus

Client

Page 42: The Ibis e-Science Software Framework

C++/MPIC++/MPI

Sockets + SSH TunnelingSockets + SSH Tunneling

JavaGAT + JavaGAT + IbisDeployIbisDeploy

Phase 1: Ibis as ‘Master Key’ (2006)

Parallel

Horus

Client

Parallel

Horus

Server

Parallel

Horus

Client

Page 43: The Ibis e-Science Software Framework

C++/MPIC++/MPI

IPL + SmartSocketsIPL + SmartSockets

JavaGAT + JavaGAT + IbisDeployIbisDeploy

Phase 2: Ibis as ‘Glue’ (2006/2007)

Parallel

Horus

Client

Parallel

Horus

Server

Parallel

Horus

Client

Page 44: The Ibis e-Science Software Framework

Ibis/JavaIbis/Java

IPL + SmartSocketsIPL + SmartSockets

JavaGAT + JavaGAT + IbisDeployIbisDeploy

Phase 3: Ibis as ‘HPC Solution’ (2008)

Parallel

Jorus

Client

Parallel

Jorus

Server

Parallel

Jorus

Client

Page 45: The Ibis e-Science Software Framework

‘Master Key’ + ‘Glue’ + ‘HPC’

● Step-wise conversion to 100% Ibis / Java● Phase 1: JavaGAT as ‘Master Key’● Phase 2: IPL + SmartSockets as ‘Glue’● Phase 3: Ibis as ‘HPC Solution’● After each phase a fully functional, working solution was available!

● Eventual result:● ‘wall-socket computing from a memory stick’● Remember: the ‘Promise of the Grid’?● Awards at AAAI 2007 and CCGrid 2008

Page 46: The Ibis e-Science Software Framework

100% Ibis Implementation (2008++)

Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner)Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)

Page 47: The Ibis e-Science Software Framework

Some current work

Page 48: The Ibis e-Science Software Framework

● NWO Smart Energy Systems project with Univ. of Amsterdam (Cees de Laat) & SARA

● How to map high-performance applications onto hybrid distributed computing system, taking both performance & energy consumption into account

● System-level approach to reduce HPC energy consumption

Green Clouds

Page 49: The Ibis e-Science Software Framework

DAS-4: infrastructure for Green IT

Dual quad-core Xeon E5620 Various accelerators (GPUs, multicores, ….)Scientific LinuxBuilt by ClusterVision

VU (74)

TU Delft (32) Leiden (16)

UvA/MultimediaN (16/36)

SURFnet6

10 Gb/s lambdasASTRON (23)

Page 50: The Ibis e-Science Software Framework

● Adapt resources to application needs dynamically, accounting for computational & energy efficiency

● Using Ibis malleability support

● Exploit hardware diversity● Graphics Processing Units (GPUs) have much higher FLOPS/Watt

for many applications

● Use optical and photonic networks● Build a knowledge base & semantic infrastructure

description

Main ideas

Page 51: The Ibis e-Science Software Framework

Other current PhD projects using Ibis

● Distributed reasoning over semantic web data● WebPIE: Parallel reasoner on Web scale● Written in Java, uses Hadoop (MapReduce)

● Graph applications (HiPG)● E.g. for bioinformatics applications● http://www.graph500.org/

● Games & distributed model checking● Deal with large state space

● Distributed smart phone applications● Computation & communication offloading to a cloud

Page 52: The Ibis e-Science Software Framework

Conclusions

● Ibis enables problem solving (avoids system fighting)

● Successfully applied in many domains● Astronomy, multimedia analysis, climate modeling,

remote sensing, semantic web, medical imaging, …● Data intensive, compute intensive, real-time…

● Open source, download:● www.cs.vu.nl/ibis/

Page 53: The Ibis e-Science Software Framework

Conclusions (2)

● Jungle Computing is hard

● High-Performance Jungle Computing even harder

● While research into efficient & green Jungle-aware programming models has only just begun…

● …Ibis provides the basic functionality to efficiently & transparently overcome most Jungle Computing complexities

Page 54: The Ibis e-Science Software Framework

AcknowledgementsNiels Drost

Ceriel JacobsRoelof Kemp

Timo van KesselThilo KielmannJason Maassen

Maarten van MeersbergenRob van Nieuwpoort

Nick PalmerKees van ReeuwijkFrank J. SeinstraKees Verstoep

Ben van WerkhovenGosia Wrzesinska