Designing a Java-based Grid Scheduler using Commodity Services
Patrick Wendel, Arnold Fung, Moustafa Ghanem, Yike Guo
Discovery Net, Department of Computing, Imperial College, London
InforSense, London
19/09/2006, All Hands Meeting, Nottingham
Outline
• Discovery Net Project
• Platform
• Workflow
• Execution
• Design
• Deployment
• Conclusions and Future Works
Discovery Net
Multidisciplinary project funded by the EPSRC under the UK e-Science programme (started October 2002, ended March 2005).
Aim: develop an infrastructure for integrating various types of data sources, software and hardware resources, targeted at e-Scientists.
Applications to:
• Life Sciences: high-throughput genomics and proteomics
• Real-time Environmental Monitoring: high-throughput dispersed air sensing technology
• Geo-Hazard modelling: earthquake modelling through satellite imagery
The project covered many areas including infrastructure, applications and algorithms (e.g. text mining).
It produced the Discovery Net platform, which aims to integrate, compose, coordinate and deploy knowledge discovery services using a workflow technology.
Discovery Net Overview
[Figure: overview diagram. Recoverable labels: Multiple data sources (Excel, Files, SQL Databases, Online Sources); Data, Applications, Components; Computations, Services (Data Processing Tools, Analysis Services, Third Party Tools, Web Services, Grid Services); Integrative Analytics Workflow Environment; Dynamic Data & App Integration; Interactive Knowledge Discovery; Interactive Solution Building; Automation & Scheduling; Rapid Application Deployment; Distribution to Scientists (Portal / Dashboard Application, Web/Grid Service, Business Process).]
Latest Applications
Rail Network Data Analysis
• Collaboration between the London e-Science Centre and AEA Technology Rail, funded by DfT
• Project showed how it is possible to analyse the large amounts of data available within the rail industry using e-Science methods and Grid computing
Imaging Applications
• Project using imageodesy algorithms
• Medical imaging
Combinatorial Chemistry
• TOPCOMBI (EU Project), 22 partners
SIMDAT
• EU-funded project, 4 years
• Start date: September 1st, 2004
• 26 partners
• www.simdat.org
InforSense is technology champion for workflow systems:
• Pharma applications
• Automotive applications
• Knowledge services application
[Figure labels: Capability Providers, Grid Technologists, End Users]
SIMDAT
Work conducted within SIMDAT (EU-funded project):
• Extended workflow engine to support B2B use case scenario in the automotive, pharmaceutical
• Integration with GRIA
• Prototypes for coupled workflow engines
• Prototypes for workflow engine interoperability
Modules
[Figure: platform modules diagram. Recoverable labels: Interface (Workflow Client Tool, Web Portal, Web Service); Submission; Execution/Optimisation; Monitoring; Interaction; Verification; Enactment/Execution; Data Management (Data Access, Table Management, Intermediate Results, Persistent Results); Activities (Activity Definition, Activity Authorisation); Workflows (Workflow Storage, Workflow Execution History); Authorisation (Workflow Authorisation, Data Authorisation).]
Workflow?
• Data-flow
• Dependency graph
Workflow construction paradigm:
• Visual graph construction (layout, annotation)
• Aided construction through application-specific wizards
Using workflows provides:
• A simple rapid application development environment
• Visual representation of the process
• Re-usable, maintainable and shared processes
• Workflow-based knowledge management (provenance, audit, policies, warehousing)
• Handling of basic parallel programming constructs (concurrent executions of branches, pipelining of executions for certain types of data and certain activities, interface for data-parallel activity implementations)
• Coupling with data sets management
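The "dependency graph" and "concurrent executions of branches" points can be illustrated with a small sketch. This is not the Discovery Net engine: the class `DataflowWorkflow`, its `add` and `execute` methods and the use of integer values are all hypothetical, chosen only to show how independent branches of a data-flow graph can run concurrently once their inputs are ready.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical data-flow workflow: each activity consumes the outputs of its dependencies. */
class DataflowWorkflow {
    private static final class Activity {
        final List<String> inputs;
        final Function<List<Integer>, Integer> body;

        Activity(List<String> inputs, Function<List<Integer>, Integer> body) {
            this.inputs = inputs;
            this.body = body;
        }
    }

    // Insertion order is kept; an activity must be added after the ones it depends on.
    private final Map<String, Activity> activities = new LinkedHashMap<>();

    DataflowWorkflow add(String name, List<String> inputs, Function<List<Integer>, Integer> body) {
        for (String in : inputs) {
            if (!activities.containsKey(in)) {
                throw new IllegalArgumentException("Unknown dependency: " + in);
            }
        }
        activities.put(name, new Activity(inputs, body));
        return this;
    }

    /** Starts each activity once its inputs are ready; independent branches may run concurrently. */
    Map<String, Integer> execute() {
        Map<String, CompletableFuture<Integer>> futures = new LinkedHashMap<>();
        for (Map.Entry<String, Activity> e : activities.entrySet()) {
            final Activity activity = e.getValue();
            final List<CompletableFuture<Integer>> deps = new ArrayList<>();
            for (String in : activity.inputs) {
                deps.add(futures.get(in));
            }
            CompletableFuture<Integer> f = CompletableFuture
                .allOf(deps.toArray(new CompletableFuture[0]))
                .thenApplyAsync(ignored -> {
                    List<Integer> values = new ArrayList<>();
                    for (CompletableFuture<Integer> d : deps) {
                        values.add(d.join()); // dependencies are already complete here
                    }
                    return activity.body.apply(values);
                });
            futures.put(e.getKey(), f);
        }
        // Wait for every activity and collect the results by name.
        Map<String, Integer> results = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<Integer>> e : futures.entrySet()) {
            results.put(e.getKey(), e.getValue().join());
        }
        return results;
    }
}
```

In a diamond-shaped workflow (one source, two branches, one merge), the two branches have no dependency on each other, so the sketch is free to run them in parallel before the merge starts.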
Interface
Client interface:
• Workflow construction, verification, execution, monitoring
• Supports visualisation and interactive activities (activities executed in the client)
• Synchronised with activity repository (using JWS)
Web Portal and Web Services endpoint, for accessing workflows as services
Server-side Architecture
[Figure: layered server-side architecture. Recoverable labels:
• Repositories: User/Group, Workflows, Results, Intermediate Results, Jobs, Jobs History, Activities
• Generic services: Logging (Log4J), Container-Managed Persistence for EJB, Messaging (JMS; P2P, Publish/Subscribe), Security (JAAS), Database Connectivity (JDBC), Naming Service (JNDI), Management Service (JMX), Plugin Framework (JPF), Software Delivery (JWS)
• Services (Stateless, Stateful, CMP, Message-driven): Task Management, Job Execution Handler, Data Management, Component Management
• Queues and topics: Jobs queue, Jobs topic, Status topic
• Presentation: HTTP Servlets, JSP, Struts, Portlets (JSR 168), Cache/Results Access, Code Download, Data Transfer Web Service]
Distributed Execution
Activity-level distributed computing:
• SSH (data streaming), SGE, LSF…
• Web Services, GRIA, HTTPClient (Groovy)
Workflow-level (scheduling of overall execution). Depends on usage and type of workflow:
• Developing prototype workflows:
  • Iterative refinement
  • Caching and reuse of intermediate results within a user session
• Stateless production workflow:
  • Entire workflow executed for different input/parameters
  • Scheduling
• Stateful production workflow as services:
  • Workflow executed following a process/guide
  • Execution engine must be able to reuse cached results
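The "caching and reuse of intermediate results" case can be sketched as follows. This is an assumption-laden illustration, not the platform's cache: the class `IntermediateResultCache`, the key format and the execution counter are hypothetical. The idea shown is that a result is keyed by the activity, its parameters and the key of its upstream input, so changing one activity's parameters during iterative refinement only re-executes that activity and whatever depends on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-session cache of intermediate workflow results. */
class IntermediateResultCache {
    private final Map<String, Object> results = new HashMap<>();
    private int executions = 0;

    /** Builds a key from the activity, its parameters and the key of its upstream input. */
    static String key(String activity, String parameters, String upstreamKey) {
        return upstreamKey + "/" + activity + "(" + parameters + ")";
    }

    /** Returns the cached result for the key, executing the activity only on a cache miss. */
    synchronized Object getOrExecute(String key, Supplier<Object> activity) {
        if (!results.containsKey(key)) {
            executions++;
            results.put(key, activity.get());
        }
        return results.get(key);
    }

    /** Number of activities actually executed (cache misses) so far. */
    synchronized int executions() {
        return executions;
    }
}
```

For example, running "load" then "filter(threshold=5)", and afterwards "load" then "filter(threshold=9)", executes three activities rather than four, because the second run reuses the cached "load" result.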
Granularity of an execution
Architecture based on the Java EE stack, which provides a hosting environment for the activities (context, security, logging, access to resources and application-level environment information)
Each workflow execution is handled by one or more threads running in a Java server, while usually tasks submitted to grid schedulers are OS processes.
Periodic monitoring information generated by each activity (not only by the workflow engine) sent back to the client tool or portal.
What’s the best way to handle task scheduling in that context?
Requirements Summary
From the Discovery Net architecture:
• Workflow execution and activities are reliant on JEE services
• Scheduling should depend on the need to reuse and the availability of intermediate results for the workflow
Additional constraints:
• Execution servers can be distributed over WAN
• Based only on standard Grid infrastructure or JEE services
• No direct communication between execution servers and client tool
First attempt: Grid Scheduler
Submit execution to SGE
Issues:
• Cannot start an instance of the server for each execution (only one instance of JBoss at a time, unless new configurations are added for each execution).
• Start-up cost of the server is not negligible for some workflows.
• The execution server needs to connect back to the submission server and set up a two-way communication channel.
• How is the client notified of new status?
Second attempt: AS Clustering
• Application server level clustering
• CMP Entity bean clustering
• Experiment with JBoss Clustering (based on JGroups)
Issues:
• Application Server Clustering not fully standardised. Different issues on different application servers
• Cluster configuration based on JGroups only supported static clusters (set of IPs) or a join protocol based on broadcast (may be better now?)
• Modifications of the clustering code required to ensure that a unique instance of the Entity bean representing the task is created and used throughout the execution
• Not designed for long-running tasks
Third Attempt: Using Java Services
• Stateless Session Bean as entry point (Task Management Service): mapping to IIOP/RMI or SOAP/HTTP
• Stateful CMP Entity Bean to represent the state of the task (workflow, cached results, monitoring information)
• Job JMS Queue to submit requests
• ExecutionHandler Message-Driven Bean to handle the requests
• Job Topic to send control commands to the execution
• Status Topic to send back information from the execution
• Scheduling policy implemented by the JMS Queue service provider:
  • Default using round-robin
  • Integrated with SGE using simple scripts to find a potential execution server (extended to check whether the execution server is started or should be started)
  • Customised implementation to check for workflow cached intermediate results
• Number of concurrent executions on each execution server defined by the size of the pool for the ExecutionHandler MDB
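The scheduling policies listed above can be sketched in plain Java. In the design these live inside the JMS Queue service provider, whose extension points are provider-specific and not shown in the slides, so everything here is a hypothetical stand-in: the `SchedulingPolicy` interface, the two implementations, and the assumption that cached intermediate results stay on the server that last ran the workflow.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical policy interface: picks the execution server for a workflow execution request. */
interface SchedulingPolicy {
    String select(String workflowId, List<String> servers);
}

/** Default policy: hand requests to the servers in turn. */
class RoundRobinPolicy implements SchedulingPolicy {
    private int next = 0;

    public synchronized String select(String workflowId, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("No execution server available");
        }
        String chosen = servers.get(next % servers.size());
        next = (next + 1) % servers.size();
        return chosen;
    }
}

/** Prefers the server assumed to hold cached intermediate results for the workflow. */
class CacheAwarePolicy implements SchedulingPolicy {
    private final SchedulingPolicy fallback;
    // Assumption: results are cached on the server that last executed the workflow.
    private final Map<String, String> cacheLocation = new HashMap<>();

    CacheAwarePolicy(SchedulingPolicy fallback) {
        this.fallback = fallback;
    }

    public synchronized String select(String workflowId, List<String> servers) {
        String cached = cacheLocation.get(workflowId);
        if (cached != null && servers.contains(cached)) {
            return cached; // reuse the server that already has the intermediate results
        }
        String chosen = fallback.select(workflowId, servers);
        cacheLocation.put(workflowId, chosen);
        return chosen;
    }
}
```

Wrapping the round-robin policy as a fallback keeps the default behaviour for workflows that have no cached results, or whose caching server is no longer available.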
Design
[Figure: design overview. Components: Client Tool, Web Portal, Task Management Service (Stateless), Messaging Service Provider, Persistence Service, Execution Server 1 … Execution Server N. Arrow labels: submit, publish, subscribe, load/save. Legend: Services Provided.]
Submission
[Figure: submission flow. Components: Client Tool, Task Management Service, Job queue, Execution Server, Persistence Provider for JobEntity. Arrow labels: submit, publish, subscribe, load/save. Legend: Services Provided.]
- Create JobEntity
- Publish to Job queue
- Message-triggered ExecutionHandler receives notification
- JobEntity activated
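The submission flow, together with the point that the handler pool size bounds the number of concurrent executions on a server, can be mimicked with standard Java concurrency utilities. This is only an in-memory stand-in for the Job queue and the ExecutionHandler pool, not JMS or EJB code; the class `ExecutionServerSketch` and its methods are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a job queue consumed by a fixed pool of execution handlers. */
class ExecutionServerSketch {
    private final ExecutorService handlers; // plays the role of the handler pool
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    ExecutionServerSketch(int poolSize) {
        this.handlers = Executors.newFixedThreadPool(poolSize);
    }

    /** Queues a job; it starts when one of the handlers becomes free. */
    void submit(Runnable job) {
        handlers.execute(() -> {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max); // remember the peak concurrency
            try {
                job.run();
            } finally {
                active.decrementAndGet();
                completed.incrementAndGet();
            }
        });
    }

    /** Stops accepting jobs and waits for the queued ones; returns false on timeout or interrupt. */
    boolean drain(long timeoutSeconds) {
        handlers.shutdown();
        try {
            return handlers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int maxConcurrent() {
        return maxActive.get();
    }

    int completed() {
        return completed.get();
    }
}
```

With a pool of two handlers and six submitted jobs, all six eventually complete but no more than two ever run at the same time; the remaining jobs wait in the queue.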
Control
[Figure: control flow. Components: Client Tool, Task Management Service, Job topic, Execution Server, Persistence Provider for JobEntity. Arrow labels: Control (pause/resume/kill), publish, Subscribe to messages for Job ID, load/save. Legend: Services Provided.]
- Publish control request on Job topic
- JobEntity receives notification
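The control path (publish a pause/resume/kill request on the Job topic, delivered only to the subscriber for that Job ID) can be sketched without a messaging provider. The classes below are hypothetical stand-ins: `JobTopicSketch` replaces the JMS topic and its per-job subscription, and the state handling in `JobEntitySketch`, including ignoring commands after a kill, is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

enum ControlCommand { PAUSE, RESUME, KILL }

/** In-memory stand-in for the Job topic: subscribers only see messages for their job ID. */
class JobTopicSketch {
    private final Map<String, List<Consumer<ControlCommand>>> subscribers = new HashMap<>();

    synchronized void subscribe(String jobId, Consumer<ControlCommand> listener) {
        subscribers.computeIfAbsent(jobId, k -> new ArrayList<>()).add(listener);
    }

    /** Delivers the command to the subscribers of that job and returns how many received it. */
    synchronized int publish(String jobId, ControlCommand command) {
        List<Consumer<ControlCommand>> targets = subscribers.get(jobId);
        if (targets == null) {
            return 0;
        }
        for (Consumer<ControlCommand> target : targets) {
            target.accept(command);
        }
        return targets.size();
    }
}

/** Hypothetical job state reacting to control notifications. */
class JobEntitySketch {
    enum State { RUNNING, PAUSED, KILLED }

    private State state = State.RUNNING;

    synchronized void onControl(ControlCommand command) {
        if (state == State.KILLED) {
            return; // assumption: a killed job ignores further commands
        }
        switch (command) {
            case PAUSE:
                state = State.PAUSED;
                break;
            case RESUME:
                state = State.RUNNING;
                break;
            case KILL:
                state = State.KILLED;
                break;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

With a real JMS provider, the per-job filtering would more likely be expressed as a message selector on a job ID property than as a map of listeners.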
Monitoring
[Figure: monitoring flow. Components: Client Tool, Task Management Service, Status Topic, Execution Server, Persistence Provider for JobEntity. Arrow labels: update, publish, Subscribe to messages for Job ID. Legend: Services Provided.]
- Update Job entity state
- Publish status update
Management
Status update period: The ExecutionHandler regularly checks (starting from a base period) whether the monitoring information of the workflow has changed. If it has not, the period is increased (up to a maximum update period); if it has, the Status Topic is notified.
Failure detection: The server hosting the Task Management service also checks for tasks for which the time since the last update is significantly higher than the maximum update period.
Security Context: All the execution servers can have dedicated JAAS configuration. To avoid the issue of having to re-authenticate the user who submitted the workflow, execution servers use a customised login module to handle the delegation.
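The status update period and the failure detection rule can be sketched as below. The slides do not give the growth rule or the threshold, so several details are assumptions: doubling the period when nothing changed, resetting to the base period after a change, and treating "significantly higher" as a configurable multiple of the maximum period. The class and method names are hypothetical.

```java
/** Hypothetical timer for status checks: backs off while nothing changes. */
class StatusUpdateTimer {
    private final long basePeriodMs;
    private final long maxPeriodMs;
    private long currentPeriodMs;

    StatusUpdateTimer(long basePeriodMs, long maxPeriodMs) {
        if (basePeriodMs <= 0 || maxPeriodMs < basePeriodMs) {
            throw new IllegalArgumentException("Need 0 < base <= max");
        }
        this.basePeriodMs = basePeriodMs;
        this.maxPeriodMs = maxPeriodMs;
        this.currentPeriodMs = basePeriodMs;
    }

    /**
     * Returns the delay before the next check.
     * Assumption: the period doubles when nothing changed and resets after a change.
     */
    long nextPeriod(boolean statusChanged) {
        if (statusChanged) {
            currentPeriodMs = basePeriodMs;
        } else {
            currentPeriodMs = Math.min(currentPeriodMs * 2, maxPeriodMs);
        }
        return currentPeriodMs;
    }

    /**
     * Failure heuristic: no update for longer than tolerance times the maximum period.
     * The tolerance factor is an assumption; the slides only say "significantly higher".
     */
    static boolean looksFailed(long lastUpdateMs, long nowMs, long maxPeriodMs, double tolerance) {
        return (nowMs - lastUpdateMs) > (long) (tolerance * maxPeriodMs);
    }
}
```

Because an idle execution still reports at least once per maximum period, the Task Management server can treat a much longer silence as a sign that the execution server has failed.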
Deployment
• JBoss 3.2, JBossMQ, Hibernate
• LAN: using faster native Java protocols (RMI/JRMP) and callbacks
• WAN: using HTTP-based and polling-based protocols
LAN Deployment
[Figure: LAN deployment. Components: Client Tool, Task Management Service, Messaging Service Provider, Persistence Service, Execution Server 1 … Execution Server N. Connection labels: RMI/JRMP (four links), TCP (two links), IIOP, Web Service/HTTP. Legend: Services Provided.]
WAN Deployment
[Figure: WAN deployment. Components: Client Tool, Task Management Service, Messaging Service Provider, Persistence Service, Execution Server 1 … Execution Server N, separated by two Firewall/NAT boundaries. Connection labels: HTTP (four links), TCP (two links), IIOP, Web Service/HTTP. Legend: Services Provided.]
Evaluation
Reliability:
• Dependent on reliability of CMP provider, JMS provider
• Task Management service is stateless
• Execution Servers do not hold the state of the task (only intermediate results)
LAN configuration, used for running nightly regression test workflows, over a heterogeneous cluster (Linux servers + desktop PCs)
Deployed on production clusters (with limited connectivity from the slaves to the outside network)
WAN configuration adding several seconds of delay depending on the workflow:
• Workflow submission is still synchronous RPC using tunnelled JRMP
• Monitoring information using Java serialisation as well
Conclusion
• Simple, scalable solution based on Java EE commodity services, instead of working around Grid submission APIs, yet customisable to use any command-line based scheduler, resource monitor or workflow-specific policies.
• The implementation is not bound to any network protocol.
Issues:
• Custom policies rely on the flexibility of the JMS provider
• No software delivery mechanism for execution servers (unlike the client): the software has to be installed manually.
• Reliance on the performance of JEE services?
• Why bother about having a hosting environment for workflow execution?
Future Works
Use the workflow structure to refine the scheduling algorithm, taking into account information about the workflow (such as the number of branches and pipelined activities)
User-defined rules/scripts to define workflow-level or activity-level scheduling policy/rules.