© 2014 ADAPTIVE COMPUTING, INC.
HPC, Cloud & Big Workflow: What’s New in Moab 8.0
Trev Harmon, Adaptive Computing
ISC'14
Adaptive Computing Highlights
▪ Innovating world-class HPC solutions for over 12 years
▪ Pioneers of HPC schedulers, grid, power management, HPC-Cloud, optimization, scale, dynamic provisioning, Big Workflow and more
▪ 50+ patents issued or pending
▪ Backed by top-tier investors
▪ Many customers in the Top 100 and Fortune 500
▪ Top systems including: #2 Titan, Cascade, Cielo, Hopper, & Blue Waters
▪ Major multi-nationals including: DOW, Exxon, & Boeing
▪ Largest provider of HPC workload management software to HPC sites*
▪ Global partnerships include Intel, HP, IBM, Cray, SGI, & Microsoft
Cloud System Management Innovator
Broad HPC Customer Base
Oil and Gas, Financial, Manufacturing, Research and Government
Expanding HPC Value & Use
Greater access to technical computing resources
Expanding HPC applicability across industries
Demand for more simplified access & management (e.g. SLAs)
Growth in System Size & Complexity
100 cores --> 1 Million+ cores
Diversity of environments and processing needs
Growing organizational complexity
Greater Need for Alignment with Organization / Business Directives
Increased tracking & accountability
Increasing global competition
Ability to quickly adapt is vital
Growing Collaboration
Themes for Accelerating Insights
▪ Unify data center resources
  ▪ As a single, adaptive ecosystem
  ▪ Technical computing (HPC & Big Data)
  ▪ Public and private cloud
  ▪ Bare metal & virtual machines
▪ Optimize the analysis process
  ▪ Increase throughput and productivity
  ▪ Ensure SLAs, maximize uptime
  ▪ Reduce cost, complexity and errors
▪ Guarantee service to the business
  ▪ Policies that model your organization
  ▪ Prove services were delivered
  ▪ Job completion in spite of failures
  ▪ Verify resources were allocated fairly
What’s New in 8.0: Enhancing Big Workflow
What’s New in 8.0 – Unify
▪ OpenStack
  ▪ Breaks down siloed environments
  ▪ Offers virtual and physical resource provisioning for IaaS and PaaS
  ▪ Select Beta Customers
▪ Intersect360
  ▪ Moab and TORQUE: top two job management packages
  ▪ Received 40% of the mentions
“Adaptive’s Big Workflow…is to provide a way for big data, HPC, and cloud environments to interoperate, and do so dynamically based on what applications are running. With the added benefits of a unified platform, OpenStack is a promising platform to interoperate multiple environments.”
-Addison Snell, CEO Intersect360
What’s New in 8.0 – Optimize
▪ Moab Performance Boost
  ▪ 2-3x overall performance improvements
  ▪ 100K Job Submission
  ▪ High Throughput Computing with Nitro
▪ Advanced Data Staging
  ▪ Multi-job workflow
  ▪ Staging job runtime prediction
  ▪ Improved cluster utilization
  ▪ Multiple transfer methods
▪ Advanced Power Management
  ▪ New power state options
    ▪ Suspend
    ▪ Hibernate
    ▪ Shutdown
  ▪ Clock Frequency Control
[Figure: data staging across nodes: Input, Compute, Output]
What’s New in 8.0 – Guarantee
▪ Next Generation Viewpoint
  ▪ Enhanced Web-based UI
  ▪ Next Generation dashboard
  ▪ Today monitors and reports workload and resource utilization
▪ Cray 3D Torus topology awareness
Ascent Project
Performance Boost with Operation Ascent
▪ 3x the Performance Boost
  ▪ Reduce Command Latency
  ▪ Decrease Scheduling Cycle Time
  ▪ Improve Multi-Threading
  ▪ Faster Moab/TORQUE Communication
▪ Advanced High-throughput Computing with Nitro
Nitro – High Throughput Computing
▪ Removes launch speed bottlenecks
▪ Achieves exascale computing
▪ Localizes decision making
▪ Up to 100x faster throughput on short jobs
▪ Launches 10 jobs per node per second
▪ Reduces latency
▪ Runs on Moab/TORQUE environments
How Does Nitro Work?
▪ Ultra high-speed message queue
▪ Different approach to scheduling
  ▪ Combines small, similar jobs
  ▪ Creates policies for the entire batch job
  ▪ Schedules the batch as one job
    ▪ Incurs scheduling overhead only once
    ▪ Not once per individual small job
▪ Limitations
  ▪ Speed of your processor & job size
  ▪ Nitro sacrifices some granularity in management
    ▪ i.e. individual tasks in a large batch cannot be cancelled or pre-empted in isolation
    ▪ The batch is the unit of management and reporting
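The batching idea above can be illustrated with a small sketch. This is not Nitro's implementation or API. The `Task` type, the `kind` grouping key and the 0.5-second per-unit overhead are all hypothetical, chosen only to show why paying the scheduling cost once per batch rather than once per task matters for short jobs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One small unit of work; 'kind' groups tasks with similar requirements."""
    name: str
    kind: str
    runtime_s: float


def group_into_batches(tasks):
    """Combine similar tasks so each group can be submitted as a single job."""
    batches = defaultdict(list)
    for task in tasks:
        batches[task.kind].append(task)
    return dict(batches)


def total_overhead_s(units_scheduled, overhead_per_unit_s):
    """Scheduling overhead is paid once per scheduled unit (task or batch)."""
    return units_scheduled * overhead_per_unit_s


# 1000 short tasks of two kinds (hypothetical workload).
tasks = [Task(f"t{i}", "render" if i % 2 else "align", 0.1) for i in range(1000)]
batches = group_into_batches(tasks)

OVERHEAD_S = 0.5  # hypothetical scheduling cost per scheduled unit
per_task = total_overhead_s(len(tasks), OVERHEAD_S)     # one decision per task
per_batch = total_overhead_s(len(batches), OVERHEAD_S)  # one decision per batch
print(f"batches={len(batches)} per_task={per_task}s per_batch={per_batch}s")
```

The trade-off listed under Limitations shows up here as well: once tasks are folded into a batch, the scheduler sees only the batch, so it can no longer act on an individual task.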
Topology-aware Node Allocation
Topology-aware Node Allocation
▪ Cray Gemini 3D Torus
▪ Network characteristics-aware
  ▪ 3D torus
  ▪ Y-dimension bandwidth
  ▪ Dateline zones
▪ Shape-fitting
  ▪ Six shapes
▪ Built-in Moab node allocation policy
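To see why the shape of an allocation matters on a torus, the sketch below compares the worst-case hop count of a compact block with that of a long line of nodes. It is a generic illustration of torus distance, not Moab's allocation policy. The torus dimensions and both allocations are hypothetical, and the per-dimension bandwidth differences and dateline zones mentioned above are not modeled.

```python
from itertools import combinations


def torus_hops(a, b, dims):
    """Minimum hop count between two nodes on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def diameter(nodes, dims):
    """Largest hop count between any two nodes in an allocation."""
    return max(torus_hops(a, b, dims) for a, b in combinations(nodes, 2))


DIMS = (16, 8, 8)  # hypothetical torus size (X, Y, Z)

# Eight nodes as a compact 2x2x2 block versus eight nodes along the X axis.
compact = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
line = [(x, 0, 0) for x in range(8)]

print("compact diameter:", diameter(compact, DIMS))
print("line diameter:", diameter(line, DIMS))
```

Both allocations use eight nodes, but the compact block keeps every pair of nodes within a few hops, which is the intuition behind fitting jobs to shapes.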
Data-staging
Data-staging Refactor
▪ Data-staging using Moab “system” jobs
  ▪ Input and output data-staging system jobs
  ▪ System jobs separately scheduled by Moab
  ▪ Dependencies between system jobs and user job
  ▪ Calculate system jobs’ data-staging wall time estimates
▪ Support additional file transfer utilities
  ▪ Linux rsync in addition to scp utility
  ▪ Commercial data-transfer products (e.g. Aspera)
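The dependency chain described above (stage-in, then the user job, then stage-out) can be sketched as follows. This is an illustration only, not Moab's data model. The `Job` class, the size-over-bandwidth estimate, the 1.2 safety margin and the 100 MiB/s link speed are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A scheduled unit with a wall time estimate and its prerequisites."""
    name: str
    walltime_s: float
    depends_on: list = field(default_factory=list)


def staging_walltime_s(size_bytes, bandwidth_bytes_per_s, margin=1.2):
    """Estimate transfer time as size / bandwidth, padded by a safety margin."""
    return size_bytes / bandwidth_bytes_per_s * margin


def earliest_finish_s(job):
    """Finish time if every job starts as soon as its dependencies complete."""
    start = max((earliest_finish_s(dep) for dep in job.depends_on), default=0.0)
    return start + job.walltime_s


GIB = 1024 ** 3
BANDWIDTH = 100 * 1024 ** 2  # hypothetical 100 MiB/s link

# Input staging -> user job -> output staging, each scheduled separately.
stage_in = Job("stage-in", staging_walltime_s(50 * GIB, BANDWIDTH))
compute = Job("compute", 3600.0, depends_on=[stage_in])
stage_out = Job("stage-out", staging_walltime_s(5 * GIB, BANDWIDTH), depends_on=[compute])

for job in (stage_in, compute, stage_out):
    print(f"{job.name}: walltime={job.walltime_s:.1f}s finish={earliest_finish_s(job):.1f}s")
```

Because the staging jobs carry their own wall time estimates, a scheduler can plan when the user job becomes eligible to start instead of holding compute nodes idle during the transfer.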
Data-staging Refactor (continued)
▪ Node Allocation Timing Exception
  ▪ If data staged to local file system, compute nodes allocated during data-staging system jobs
  ▪ Why? Preserve job execution time consistency!
▪ Grid data-staging
  ▪ Part of data-staging initiative
  ▪ Grid Moab chooses cluster
  ▪ Grid Moab stages data
▪ Can run data-staging system jobs on dedicated data transfer servers
Power Management / Green Computing
Power/Performance Profiles
▪ Minimizing energy consumption requires application-specific optimal clock frequency
CPU Clock Frequency Control
▪ New cpuclock= job submission option
  ▪ Absolute Clock Frequency Number
    ▪ Example: cpuclock=2200 or cpuclock=1800mhz
  ▪ Linux Power Governor Policy
    ▪ Example: cpuclock=conservative
  ▪ Relative P-state Number
    ▪ Values 0-15
      ▪ 0=“turbo” frequency
      ▪ 15=slowest frequency
    ▪ Example: cpuclock=0 or cpuclock=P2
▪ Can set in job templates
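The three forms of the cpuclock= value can be told apart with a few rules. The sketch below is one possible reading of the examples above, not the parser used by Moab or TORQUE. The function name, the governor list and the rule that a bare number of 15 or less is treated as a P-state are assumptions.

```python
import re

# Standard Linux cpufreq governor names; which ones a site accepts is an assumption.
GOVERNORS = {"performance", "powersave", "ondemand", "conservative", "userspace", "schedutil"}
MAX_PSTATE = 15  # the slide gives P-state values 0-15


def classify_cpuclock(value):
    """Return (kind, setting) for a cpuclock= value, or raise ValueError."""
    text = value.strip().lower()
    if text in GOVERNORS:
        return ("governor", text)
    match = re.fullmatch(r"p(\d+)", text)
    if match:
        pstate = int(match.group(1))
        if pstate > MAX_PSTATE:
            raise ValueError(f"P-state out of range: {value!r}")
        return ("pstate", pstate)
    match = re.fullmatch(r"(\d+)(mhz)?", text)
    if match:
        number = int(match.group(1))
        # Assumption: a bare number in 0-15 is a P-state, anything else is MHz.
        if match.group(2) is None and number <= MAX_PSTATE:
            return ("pstate", number)
        return ("mhz", number)
    raise ValueError(f"unrecognized cpuclock value: {value!r}")


for example in ["2200", "1800mhz", "conservative", "0", "P2"]:
    print(example, "->", classify_cpuclock(example))
```

The five inputs in the loop are the examples from the slide, one for each accepted form.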
CPU Clock Frequency Control (continued)
▪ TORQUE pbs_mom sets clock frequency
  ▪ Logs clock frequency changes in pbs_mom log
▪ Moab records
  ▪ Job’s requested clock frequency in job record
  ▪ Nodes’ clock frequency in node statistics
▪ Uses
  ▪ Energy conservation for lower operational costs
  ▪ Power/performance profile generation
  ▪ Diagnostics
Green Policy Configuration
▪ New Moab Web Services RM Plug-in
  ▪ Contains power management logic
  ▪ Specifies power state Moab should place a compute node in when applying “green” policy
    ▪ Standby
    ▪ Suspend
    ▪ Hibernate
    ▪ Shutdown
    ▪ Off
  ▪ Multi-threaded
  ▪ New power management “reference” scripts
Administrator Portal
Admin Portal (8.0 - 2014)
▪ Exciting Features
  ▪ Dashboards: Workload and Resource Views
  ▪ Simplified management of credentials
  ▪ Easy to use Policies
    ▪ Priority
    ▪ Fairshare
    ▪ Backfill
    ▪ Node Policies
  ▪ Cluster Management
  ▪ Historical Database
  ▪ HPC Web Services
Persistent Database
▪ Relational database for historical data!
  ▪ Published View Schema
  ▪ Easily extract reports using standard reporting tools / frameworks
  ▪ Prepopulated Views
[Figure: Web Services API]
Dashboard
Credential Management - Details
Graphical Policy Management
Resource Management – Zoom levels
Resource Management – Historic Utilization
Submitting a job…
Questions?