  • Slide 1
  • SaaS Clouds Supporting HPC Biology Sciences (CMU CV, July 2013). Research Area Background. Area: systems, i.e. applied computer science. Question: what to do? Dr. Dan Reed, Vice President at Microsoft, stated in his keynote talk “Clouds: From Both Sides Now” (Washington, 2011) that, in my interpretation, university researchers should find a research niche, because they do not have enough resources (human and financial) to compete against the mainstream research carried out by big companies.
  • Slide 2
  • Andrzej Goscinski, Service and Cloud Computing Lab. Senior members: A. Wong, P. Church, M. Brock.
  • Slide 3
  • Biology and Medicine Needs. Biology and medicine specialists collect a lot of data. Many of them use only their workstations, desktops or even laptops to carry out data analysis. Many of them are not familiar with HPC. Many do not program well and do not have system administration skills (and, I guess, they should not need them). Biology and medicine specialists would like to use computers to get analysis results quickly, without the burden of computing jargon.
  • Slide 4
  • Lab (Current) Research Aim. To study the development of a technology that simplifies the deployment, exposure, access and customization of HPC science applications in SaaS clouds. This technology forms the basis of research environments that enable science specialists to use HPC resources in clouds to run their computationally demanding software easily, on demand and at reasonable cost, in order to discover new and significant discipline knowledge.
  • Slide 5
  • The NIST Definition of Cloud Computing (NIST Special Publication 800-145, P. Mell and T. Grance, Sept. 2011). “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.”
  • Slide 6
  • NIST: Service Models. Infrastructure as a Service (IaaS): the delivery of hardware resources as a service; users are granted access to cloud infrastructure through virtual machines. Platform as a Service (PaaS): builds services on IaaS clouds to support cloud application deployment; most cloud platforms consist of a high-level language and a well-defined application programming interface (API). Software as a Service (SaaS): exposes applications designed to run on a cloud as services; it eliminates the need to install or run applications on the customer's computer and is often cheaper than buying a full software licence.
  • Slide 7
  • NIST: Deployment Models. Public clouds: accessed by the general public; they allow users to rent resources such as computational time or storage as needed. Private clouds: used exclusively by one organisation; they allow a specific service level agreement (SLA) to be made to ensure availability and security. Community clouds: used by a group of users with shared concerns; they allow a shared mission statement with specific security and policy requirements. Hybrid clouds: combine cloud resources from two or more deployment models to accomplish a user's goal.
  • Slide 8
  • NIST: Essential Characteristics. On-demand self-service; broad network access; resource pooling; rapid elasticity; measured service.
  • Slide 9
  • Characteristics of Clouds that Attract Business. Clients pay only for what they consume. Rather than spending money on buying, managing and upgrading servers, business administrators concentrate on managing their applications. The required service is always there: availability is very high, which leads to short times from submission to completion of execution. Cloud computing gives small businesses access to world-class systems that would otherwise be unaffordable. On the other hand, even small companies can export their specialized services to clients.
  • Slide 10
  • When Using Clouds, Additional Steps Must Be Carried Out Depending on the Service Model. IaaS: involves constructing a virtual cluster and compiling and deploying distributed software, which are system administrators' jobs. PaaS: aimed at developers; it provides users with a development environment and automates the deployment of resources, but access to development tools and languages is limited. SaaS: users can access HPC applications through graphical interfaces; however, they rely on what cloud service providers have made available, and the software they need may carry expensive licences or may not be readily available.
  • Slide 11
  • Cloud Trends (ChangeWave Investing Weekly Update, 5/21/2013). Over the past 2.5 years, the percentage of companies that say they currently use public cloud computing services has climbed from 14% to 40%.
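The size of that change can be made concrete with a little arithmetic on the two survey figures; averaging over the 2.5 years assumes, for illustration only, steady growth, which the survey does not claim.

```latex
\Delta = 40\% - 14\% = 26 \text{ percentage points}, \qquad
\frac{26 \text{ points}}{2.5 \text{ years}} = 10.4 \text{ points per year}, \qquad
\frac{40\%}{14\%} \approx 2.9
```

In other words, the share of companies using public cloud services roughly tripled over the period.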
  • Slide 12
  • Cloud Trends (ChangeWave Investing Weekly Update, 5/21/2013). The results of the latest ChangeWave cloud survey point to continued growth for public, private and hybrid cloud computing. Within public cloud computing, software as a service (SaaS) remains the area with the fastest growth rate. When asked why their companies do not use cloud computing, respondents most often cite security concerns (41%), while 15% cite the complexity of integrating with existing IT infrastructure.
  • Slide 13
  • Cloud Trends (ChangeWave Investing Weekly Update, 5/21/2013).
  • Slide 14
  • HPC vs. HPC Clouds vs. Discipline Specialists. Problem 1: HPC requires powerful and expensive computational and data storage hardware, advanced middleware, sophisticated discipline-oriented applications, and knowledgeable programmers and system managers. Clouds have been created for business ($$$), not to earn money from HPC ($). Most HPC clouds are IaaS clouds enhanced with additional hardware and middleware to support HPC. Problem 2: the cost and time overheads of learning how to prepare an HPC cloud and how to properly install and configure applications on the underlying HPC facilities. Conclusion: if discipline specialists want to use HPC clouds for scientific discovery, they must also become system administrators and good programmers.
  • Slide 15
  • Clouds and HPC. A response to Problems 1 and 2 faced by discipline specialists lies in cloud computing. These days clouds can support some HPC workloads, although they are oriented towards High Scalability Computing (HSC) rather than HPC. Note: as communication performance improves, clouds are becoming a major tool for HPC. Question: what kinds of HPC applications could be executed on a cloud?
  • Slide 16
  • HPC Clouds vs. Applications
  • Slide 17
  • HPC Clouds vs. Discipline Specialists. Most HPC clouds are IaaS clouds enhanced with additional hardware and middleware to support HPC. Problem 3 again: the cost and time overheads of learning how to prepare an HPC cloud and its applications remain a problem. HPC cloud users are presented with a set of virtual and physical servers, which they must put together to form the HPC facilities on which to run their software applications. The software applications must then be properly installed and configured on the underlying HPC facilities. Conclusion: if discipline specialists want to use HPC clouds for scientific discovery, they must also become system administrators and good programmers.
  • Slide 18
  • Web-based Software Tools/Packages. In many areas of science, discipline specialists benefit from Web-based software tools, which are easy to use and attractive to specialists because of their discipline-oriented interfaces: scientific workflow systems (Galaxy), Web portals for accessing grid resources (P-GRADE), and science gateway portals such as HubZero. Observation: specialists appreciate easy-to-use, Web-based, discipline-oriented interfaces! (Plenary "Cloud in Action" panel, CLOUD 2013)
  • Slide 19
  • HPC Applications Exposed as Services in SaaS Clouds. Conclusion: discipline specialists could benefit most from executing their HPC applications if those applications are exposed as services in SaaS clouds and accessed through discipline-oriented (tool-based) interfaces. Figure: use of clouds (ChangeWave Research).
  • Slide 20
  • Merging SaaS Cloud Services and Web Tools. Question: are we on the right track? Yes, we are! Providing users with faster turnaround times for their experiments by using clouds is one of the major issues promised to be addressed in a new version of AGAVE, one of the well-known and widely used Web-based software tools. AGAVE delivers science-as-a-service: data are processed using analytics provided as SaaS services. (Plenary "Cloud in Action" panel, CLOUD 2013)
  • Slide 21
  • Direct Research Questions. How can scientists be enabled to deploy software applications in clouds? How can clouds be made easy for discipline researchers to use for running HPC applications? How can the customization and reuse of HPC applications in clouds be supported? These three questions form the current research scope of our Lab. Our research aim, again: develop a technology that automatically creates a virtual machine (VM), exposes an application as a service, deploys it on the VM, and generates an easy-to-use interface (a Web form).
  • Slide 22
  • Initial Lab Research. Web services, which are used to develop services, are stateless. Our response: stateful Web services. Service discovery and selection is a major barrier to the application of cloud computing (only simple catalogues are in use). Our response: a dynamic broker based on attributed names. HPC is unaffordable for small and medium research groups and institutions. Our response: the CaaS framework, which exposes a cluster as a service and makes it available within private and public clouds.
  • Slide 23
  • From IaaS/PaaS to SaaS with a Broker (M. Brock)
  • Slide 24
  • From IaaS/PaaS to SaaS with a Broker (M. Brock). The RVWS framework: allows the current activity and characteristics of resources to be exposed as services via WSDL documents; it is a compatible extension to existing Web standards. The Dynamic Broker: a discovery service that uses stateful WSDL documents. CaaS Infrastructure: Web service-based middleware for easy publishing, discovery and use of clusters. HPCynergy: a prototype private cloud built using CaaS for easy access to HPC resources and applications. HPC Hybrid Deakin (H2D) Cloud: able to discover suitable resources in both public and private clouds to execute single applications that are too large for individual clusters; all tasks, such as parameter modification, breaking up data files and monitoring multiple applications, are handled on behalf of the user.
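The idea of a broker that selects resources by matching requested attributes against their published characteristics and current activity can be illustrated with a minimal Python sketch. It is not the RVWS or Dynamic Broker implementation: the `ResourceState` record, the attribute names, the 0.8 load threshold and the rule that numeric attributes are minimums are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical snapshot of what a stateful WSDL document might expose."""
    name: str
    attributes: dict            # static characteristics, e.g. {"cores": 16, "mpi": "openmpi"}
    current_load: float = 0.0   # current activity: 0.0 (idle) to 1.0 (saturated)


def matches(resource: ResourceState, required: dict) -> bool:
    """Numeric requirements are treated as minimums; all others must match exactly."""
    for key, wanted in required.items():
        have = resource.attributes.get(key)
        if have is None:
            return False
        numeric = isinstance(wanted, (int, float)) and not isinstance(wanted, bool)
        if numeric:
            if not isinstance(have, (int, float)) or have < wanted:
                return False
        elif have != wanted:
            return False
    return True


def select(resources: list, required: dict, max_load: float = 0.8) -> list:
    """Return matching, non-saturated resources, least loaded first."""
    candidates = [r for r in resources
                  if matches(r, required) and r.current_load <= max_load]
    return sorted(candidates, key=lambda r: r.current_load)


# Hypothetical resource pool.
pool = [
    ResourceState("cluster-a", {"cores": 32, "mpi": "openmpi"}, current_load=0.6),
    ResourceState("ec2-cluster", {"cores": 16, "mpi": "openmpi"}, current_load=0.1),
    ResourceState("lab-server", {"cores": 8, "mpi": "mpich"}, current_load=0.0),
]
print([r.name for r in select(pool, {"cores": 16, "mpi": "openmpi"})])
# ['ec2-cluster', 'cluster-a']
```

Ranking by current load is what a stateful description makes possible: a plain catalogue could only filter on static characteristics.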
  • Slide 25
  • SaaS Cloud Supporting HPC Science Applications. Transforming complicated HPC applications into easy-to-use SaaS cloud services takes three steps: deployment of HPC applications on IaaS clouds; exposure of HPC application services; and access to HPC application services. Figure: framework diagram with the components User, Web Form, SaaS Cloud, HPC Application Service, HPC Application Service Registry, HPC Application Deployment, HPC Application Service and Web Form Generation, Virtual Machine Image, IaaS Cloud and HPC Resources, linked by Publishing, Deploying, Accessing and Service Discovery (Yes/No) flows.
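The Service Discovery decision in this framework (reuse a registered service if one exists, otherwise deploy and publish a new one) can be sketched in Python as follows. This is an illustration only: `ServiceEntry`, `ServiceRegistry`, `obtain_service` and `fake_deploy` are hypothetical names, and the in-memory dictionary stands in for the real HPC Application Service Registry.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceEntry:
    """Hypothetical registry record for a published HPC application service."""
    application: str
    vm_image: str    # identifier of a VM image with the application installed
    form_url: str    # where the generated Web form is served


class ServiceRegistry:
    """In-memory stand-in for the HPC Application Service Registry."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def publish(self, entry: ServiceEntry) -> None:
        self._entries[entry.application.lower()] = entry

    def discover(self, application: str) -> Optional[ServiceEntry]:
        return self._entries.get(application.lower())


def obtain_service(registry: ServiceRegistry, application: str,
                   deploy: Callable[[str], ServiceEntry]) -> tuple:
    """Return (entry, newly_deployed).

    Discovery succeeds: the registered service is reused.
    Discovery fails: `deploy` (a caller-supplied stand-in for the deployment
    step) builds a service, which is published for future users.
    """
    entry = registry.discover(application)
    if entry is not None:
        return entry, False
    entry = deploy(application)
    registry.publish(entry)
    return entry, True


def fake_deploy(application: str) -> ServiceEntry:
    """Hypothetical deployment step; a real one would build and configure a VM."""
    return ServiceEntry(application, f"img-{application}", f"/forms/{application}")


registry = ServiceRegistry()
print(obtain_service(registry, "mpiBLAST", fake_deploy)[1])  # True: deployed
print(obtain_service(registry, "mpiblast", fake_deploy)[1])  # False: reused
```

Publishing every newly deployed service is what lets the next user take the cheaper discovery path.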
  • Slide 26
  • Using the Framework. To conduct scientific discovery by executing HPC applications on clouds, a discipline researcher contacts the HPC Application Service Registry. Scenario 1: the HPC application service of interest to the researcher is found. The researcher selects the cloud service; resources are selected automatically, and the application deployment service sets up and configures the cloud. The automated interface generation service constructs a user-friendly, discipline-specific interface for the requested HPC application service, and the researcher accesses the cloud service through this interface. Scenario 2: the HPC application service of interest is not found, but the discipline researcher has programming and system administration skills and decides to deploy a new HPC application in an IaaS cloud. The Automatic HPC Application Deployment System can automate parts of this process. The outcome is either a virtual machine image containing a properly installed and configured copy of the HPC application, or a software service (consisting of input/output, invocation information and hardware requirements) that can be deployed on a virtual machine. Stage 1: the cloud service published in the HPC application service registry is readily accessible in the IaaS cloud; the new cloud service generated by the Automatic HPC Application Deployment System is stored in the HPC Application Service Registry for future use. Stage 2: the user can employ the Automatic HPC Application Service and Web Form Generation System to automate the creation of an HPC Application Service that exposes the HPC application. The HPC Application Service is abstracted by a user-friendly, discipline-specific interface that is published in the HPC application service registry (see Scenario 1).
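Generating a Web form from a software service description (its inputs, invocation information and hardware requirements) can be sketched in Python as below. This is an illustration, not the Web Form Generation System itself: `Parameter`, `ServiceDescription`, `generate_form`, the endpoint path and the mpiBLAST parameters shown are hypothetical, and the output is plain HTML built with the standard library's `html.escape`.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Parameter:
    name: str
    label: str
    kind: str = "text"      # HTML input type: "text", "number" or "file"
    default: str = ""


@dataclass
class ServiceDescription:
    """Hypothetical descriptor: inputs, invocation endpoint, hardware needs."""
    name: str
    endpoint: str
    inputs: list = field(default_factory=list)
    min_cores: int = 1


def generate_form(desc: ServiceDescription) -> str:
    """Render a minimal HTML form that collects the service's input parameters."""
    has_file = any(p.kind == "file" for p in desc.inputs)
    enctype = ' enctype="multipart/form-data"' if has_file else ""  # needed for uploads
    lines = [f'<form method="post" action="{escape(desc.endpoint)}"{enctype}>',
             f"  <h2>{escape(desc.name)}</h2>",
             f"  <p>Requires at least {desc.min_cores} cores</p>"]
    for p in desc.inputs:
        value = f' value="{escape(p.default)}"' if p.default and p.kind != "file" else ""
        lines.append(f'  <label>{escape(p.label)} <input type="{escape(p.kind)}" '
                     f'name="{escape(p.name)}"{value}></label>')
    lines += ['  <button type="submit">Run</button>', "</form>"]
    return "\n".join(lines)


blast = ServiceDescription(
    name="mpiBLAST search",
    endpoint="/services/mpiblast/run",          # hypothetical endpoint
    inputs=[Parameter("query", "Query sequences (FASTA)", "file"),
            Parameter("database", "Database", default="nt"),
            Parameter("processes", "Number of processes", "number", "8")],
    min_cores=8,
)
print(generate_form(blast))
```

Keeping the description separate from the rendering means the same record could also drive a different front end, such as a Galaxy tool definition.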
  • Slide 27
  • Implementation of the HPC Cloud Framework (A. Wong). Services provided at each layer of the cloud service stack. Bottom (IaaS layer): Amazon EC2 was used to provide cloud infrastructure services. Middle (HPCaaS layer): an HPC software library was used to expose and access Amazon EC2 services. Top (SaaS layer): an HPC application service was developed and exposed as a tool in the Galaxy server.
  • Slide 28
  • The Galaxy Web-based Platform (A. Wong). Galaxy provides a powerful feature for tool integration: each tool (application) is presented to users as a Web form.
  • Slide 29
  • An Interface to Access the HPC Cloud (A. Wong). An HPC cluster being constructed; the compute instances of the cluster will support mpiBLAST execution.
  • Slide 30
  • An Interface to Access the HPC Cloud (A. Wong). A cluster of 8 nodes was constructed on Amazon EC2.
  • Slide 31
  • An Interface to Access mpiBLAST (A. Wong). mpiBLAST was accessed by supplying parameters: cluster name, number of processes and other typical parameters.
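Turning the form parameters (cluster name, number of processes and the usual BLAST options) into a command line could look like this Python sketch. It rests on stated assumptions: mpiBLAST's blastall-style options (`-p`, `-d`, `-i`, `-o`), an MPI launcher that accepts `-np` and `-machinefile`, and a hypothetical per-cluster machine-file path. The function only builds the argument list and does not run it.

```python
import shlex


def mpiblast_command(cluster: str, processes: int, program: str,
                     database: str, query: str, output: str) -> list:
    """Build an argv list for a hypothetical mpiBLAST run on `cluster`.

    Assumes blastall-style mpiBLAST options and an MPI launcher that accepts
    -np and -machinefile; the machine-file location is a hypothetical convention.
    """
    if processes < 1:
        raise ValueError("number of processes must be positive")
    return [
        "mpirun", "-np", str(processes),
        "-machinefile", f"/etc/hpc-clusters/{cluster}.hosts",   # hypothetical path
        "mpiblast", "-p", program, "-d", database, "-i", query, "-o", output,
    ]


cmd = mpiblast_command("galaxy-ec2", 8, "blastn", "nt", "query.fa", "results.txt")
print(shlex.join(cmd))
```

Returning a list rather than a single string lets the caller pass it straight to `subprocess.run` without shell quoting problems.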
  • Slide 32
  • An Interface to mpiBLAST (A. Wong). mpiBLAST execution finished on Amazon EC2; its result file was transferred automatically to the Galaxy server for post-processing.
  • Slide 33
  • Uncinus: Cloud Deployment (P. Church). Supports resource allocation, workflow orchestration and cloud bursting. Genomics in the clouds: gene discovery and personalized genomics. Leverages EC2 to improve the speed and accuracy of analysis.
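Cloud bursting, one of the features listed above, can be illustrated with a simple placement rule in Python: fill local slots first and rent cloud nodes only for the overflow, up to a cap. This is a generic sketch, not Uncinus's scheduler; `burst_schedule` and its parameters are hypothetical.

```python
def burst_schedule(tasks: int, local_slots: int, slots_per_cloud_node: int,
                   max_cloud_nodes: int) -> dict:
    """Fill local slots first; rent cloud nodes only for the overflow (cloud bursting)."""
    if slots_per_cloud_node < 1:
        raise ValueError("a cloud node must provide at least one slot")
    local = min(tasks, local_slots)
    overflow = tasks - local
    nodes = min(-(-overflow // slots_per_cloud_node), max_cloud_nodes)  # ceil, capped
    cloud = min(overflow, nodes * slots_per_cloud_node)
    return {"local": local, "cloud": cloud, "cloud_nodes": nodes,
            "queued": tasks - local - cloud}


print(burst_schedule(tasks=20, local_slots=16, slots_per_cloud_node=8, max_cloud_nodes=2))
# {'local': 16, 'cloud': 4, 'cloud_nodes': 1, 'queued': 0}
```

The cap on cloud nodes stands in for a budget limit; anything beyond it waits in the local queue.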
  • Slide 34
  • Uncinus: Case Study (P. Church). Aim: to identify genes transferred from mother to child upon digestion of dairy products. An 8-step workflow was developed and run on Uncinus using the following resources:
      Resource                #Nodes
      Amazon (cc1.4xlarge)    2
      Amazon (m1.large)       2
      West-Lin Cluster        2
      Mamsap Server           1
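Orchestrating a multi-step workflow amounts to running steps in dependency order, and running independent steps concurrently. The Python sketch below groups steps into waves using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The eight step names and their dependencies are hypothetical, since the case study does not list them.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict) -> list:
    """Group workflow steps into waves; steps in one wave have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # also raises CycleError on circular workflows
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)               # mark the whole wave as finished
    return waves


# Hypothetical 8-step workflow: each step maps to the steps it depends on.
workflow = {
    "s2": {"s1"}, "s3": {"s1"}, "s4": {"s2"}, "s5": {"s3"},
    "s6": {"s4", "s5"}, "s7": {"s6"}, "s8": {"s6"},
}
print(execution_waves(workflow))
# [['s1'], ['s2', 's3'], ['s4', 's5'], ['s6'], ['s7', 's8']]
```

Each wave could be dispatched to whichever resources are free, which is where running in workflow mode can save time over executing steps one after another.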
  • Slide 35
  • Uncinus: Case Study (P. Church). Cloud bursting improved performance; workflow mode reduced the run time by 8 hours.
  • Slide 36
  • Uncinus: Case Study (P. Church). Results from the workflow identified genes active during lactation and during the digestion of dairy. Is this gene transfer or a reaction? Further work is needed.
  • Slide 37
  • Increasing Scalability: Hybrid Clouds. Figure: a private compute cloud and public clouds (each with storage and compute clouds) connected through a distributed service broker (Broker 1 to Broker N) that handles publishing and service requests.
  • Slide 38
  • Solutions from Hybrid/Federated Clouds. Hybrid/Federated Cloud Management (FCM) architecture: a recent work that provides a reference architecture consisting of brokering services. User requests are serviced by creating virtual appliances based on user request parameters and running them inside virtual machines. Appliances are stored in repositories and decomposed over time to support the creation of future appliances. Because virtual appliances contain the whole software stack from the operating system upwards, data transfer costs are high.
  • Slide 39
  • Solutions from Hybrid/Federated Clouds. There is also an (unnamed) toolkit for VM migration between clouds: users can transfer VMs between public and private clouds to control load, either manually or automatically. However, its interface is primitive at best.
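Automatic load control of this kind can be illustrated with a threshold policy in Python: when the private cloud's total load exceeds a fraction of its capacity, choose VMs to move to a public cloud. This is a hypothetical policy, not the toolkit's. `plan_migrations`, the VM names and the 0.8 threshold are assumptions. Moving the largest VMs first minimises the number of migrations, although a real policy might prefer smaller VMs to reduce data transfer.

```python
def plan_migrations(private_vms: dict, capacity: float, threshold: float = 0.8) -> list:
    """Pick VMs to move from a private to a public cloud until load <= threshold * capacity.

    `private_vms` maps VM names to their load, in the same units as `capacity`.
    """
    load = sum(private_vms.values())
    limit = threshold * capacity
    moved = []
    for name, vm_load in sorted(private_vms.items(), key=lambda kv: kv[1], reverse=True):
        if load <= limit:
            break
        moved.append(name)
        load -= vm_load
    return moved


vms = {"vm-a": 4.0, "vm-b": 2.0, "vm-c": 3.0, "vm-d": 1.0}   # hypothetical VMs
print(plan_migrations(vms, capacity=10.0))   # ['vm-a']
```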
  • Slide 40
  • Conclusions. Clouds are moving from business into specialized research. HPC on clouds promises scalability, faster turnaround times, lower costs and services on demand. Discipline specialists should not be forced to become (good) programmers and system administrators. Easy, discipline-oriented interfaces are very important. Web tools offer discipline-oriented interfaces but are inflexible and do not support HPC widely. Combining HPC clouds and Web tools is the way forward: HPC applications exposed as services of a SaaS cloud and accessed using Web forms are the solution! Hybrid clouds will grab the HPC market.