Grid Computing in Distributed High-End Computing Applications: Coupled Climate Models and Geodynamics Ensemble Simulations
Shujia Zhou1, W. Kuang2, W. Jiang3, P. Gary2, J. Palencia4, G. Gardner5
1Northrop Grumman IT/TASC, 2NASA Goddard Space Flight Center, 3JCET, UMBC, 4Raytheon ITSS, 5INDUSCORP
Outline
• Background
– One potential killer application (coupling distributed climate models)
– One near-reality application (managing distributed ensemble simulation)
– One framework supporting Grid computing applications: Common Component Architecture (CCA/XCAT3, CCA/XCAT-C++)
– High-speed network at NASA GSFC
• An ensemble-dispatch prototype based on XCAT3
• ESMF vs. Grid
• ESMF-CCA Prototype 2.0: Grid computing
• Summary
Earth-Sun System Models
• Typical Earth-Sun system models (e.g., climate, weather, data assimilation) consist of several complex components coupled together through the exchange of sizable amounts of data.
• There is a growing need to couple model components from different institutions
– Discover new science
– Validate predictions
An M-PEs-to-N-PEs data-transfer problem!
Coupled Atmosphere-Ocean Models
[Diagram: an atmosphere model coupled to an ocean model, each with a different grid type and resolution]
Flow Diagram of Coupling Atmosphere and Ocean (a typical ESMF application)

[Diagram: at t = t0, the Atm, Ocn, CplAtmXOcn, and CplOcnXAtm components are created and their registration data exchanged. Each coupling cycle of length tglobal then repeats until t = t0 + ncycle * tglobal: the Atmosphere runs m tAtm time steps and fills ESMF_State::exportAtm; the coupler CplAtmXOcn regrids it (interpolate(...)) into ESMF_State::importOcn; the Ocean runs n tOcn time steps and fills ESMF_State::exportOcn; the coupler CplOcnXAtm regrids it (extract(...)) into ESMF_State::importAtm. Finalize at the end.]
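The ESMF driver code itself is Fortran 90; the following minimal Python sketch only illustrates the control flow of the coupling cycle shown above. The class and state names are invented for illustration and are not the ESMF API.

```python
# Minimal sketch of the coupling control flow above (invented names; not the
# ESMF Fortran API). Shows how the driver cycles the components and couplers.

class Component:
    """Stand-in for an ESMF gridded or coupler component."""
    def __init__(self, name):
        self.name = name
    def initialize(self):
        print(f"init {self.name}")
    def run(self, *states, steps=1):
        print(f"run {self.name} for {steps} step(s)")
    def finalize(self):
        print(f"finalize {self.name}")

# Import/export states; in ESMF these would be ESMF_State objects.
export_atm, import_ocn, export_ocn, import_atm = {}, {}, {}, {}

atm = Component("Atmosphere")
ocn = Component("Ocean")
cpl_a2o = Component("CplAtmXOcn")   # regrids exportAtm -> importOcn
cpl_o2a = Component("CplOcnXAtm")   # regrids exportOcn -> importAtm

for comp in (atm, ocn, cpl_a2o, cpl_o2a):
    comp.initialize()

n_cycle, m, n = 4, 6, 12            # illustrative cycle and step counts
for cycle in range(n_cycle):        # each pass advances the clock by tglobal
    atm.run(import_atm, export_atm, steps=m)   # m tAtm time steps
    cpl_a2o.run(export_atm, import_ocn)        # regrid: interpolate(...)
    ocn.run(import_ocn, export_ocn, steps=n)   # n tOcn time steps
    cpl_o2a.run(export_ocn, import_atm)        # regrid: extract(...)

for comp in (atm, ocn, cpl_a2o, cpl_o2a):
    comp.finalize()
```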
Coupling Earth System Model Components from Different Institutions
• Current approach: physically port the source codes and their supporting environment (libraries, data files, etc.) to one computer
• Problems:
– Considerable effort and time are needed to port, validate, and optimize the codes
– Some code owners may not want to release their source codes
– Owners continue to update the model codes
A killer application: couple models at their home institutions via Grid computing!
How Much Data Exchange in a Climate Model? (e.g., NOAA/GFDL MOM4 Version Beta2)
• Import: 12 2D arrays
– u_flux, v_flux, q_flux, salt_flux, sw_flux, fprec, runoff, calving, p, t_flux, lprec, lw_flux
– For 0.25-degree resolution without mask, ~99 MB of data
• Export: 6 2D arrays
– t_surf, s_surf, u_surf, v_surf, sea_level, frazil
– For 0.25-degree resolution without mask, ~49 MB of data
For a 6-hour coupling interval between atmosphere and ocean models at 0.25-degree resolution, a data exchange typically occurs no more often than once per minute of wall-clock time, i.e., an average rate of less than about 1 MB per second.
A Gbps network is more than sufficient for this kind of data exchange!
Observation: only ~100 KB/s when using "scp" to move data from NCCS to UCLA!
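A rough back-of-the-envelope check of the claim above, as a sketch using the ~99 MB import and ~49 MB export figures; 1 Gbps is taken here as the nominal link rate, not a measured throughput:

```latex
% Data volume per coupling exchange and transfer time on a nominal 1 Gbps link
\begin{align*}
\text{Data per exchange} &\approx 99\ \text{MB (import)} + 49\ \text{MB (export)} \approx 148\ \text{MB} \\
\text{Transfer time at } 1\ \text{Gbps} &\approx \frac{148 \times 8\ \text{Mb}}{1000\ \text{Mb/s}} \approx 1.2\ \text{s}
\end{align*}
```

A single exchange therefore occupies a 1 Gbps link for only about a second, a small fraction of the wall-clock time between exchanges, whereas at the observed ~100 KB/s scp rate the same 148 MB would take roughly 25 minutes.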
Distributed Ensemble Simulations
• Typical Earth-Sun system models (e.g., climate, weather, data assimilation, solid Earth) are also highly computationally demanding
– One geodynamo model, MoSST, requires 700 GB of RAM and ~10^16 flops at the (200, 200, 200) truncation level
• Ensemble simulation is needed to obtain the best estimate for optimal forecasting
– For a successful assimilation with MoSST, a minimum of 30 ensemble runs and ~50 PB of storage are expected
Using a single supercomputer is not practical!
Characteristics of Ensemble Simulation
• Little or no interaction among ensemble members
– The initial state for the next ensemble run may depend on the previous ensemble run (loosely coupled)
• High failure tolerance
– Small network usage reduces the possibility of failure
– The forecast depends on the collection of all ensemble members, not on any particular member
Technology: Grid Computing Middleware (CCA/XCAT)
• Merges OGSI and DOE's high-performance component framework, the Common Component Architecture (CCA)
– Component model: compliant with the CCA specification
– Grid services: each XCAT component is also a collection of Grid services; XCAT Provides Ports are implemented as OGSI web services, and XCAT Uses Ports can also accept any OGSI-compliant web service
• XCAT provides a component-based Grid services model for Grid computing
• Component assembly: composition in space
– The "Provide-Use" pattern facilitates composition (see the sketch below)
– Standard ports are defined to streamline the connection process
– More applicable to cases where users and providers know each other
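XCAT3 itself is Java-based with Jython scripting; the following Python sketch only illustrates the Provides/Uses port pattern in the abstract. The class and function names are invented for illustration and are not the XCAT API.

```python
# Illustrative sketch of the CCA "Provides/Uses" port pattern (invented names,
# not the XCAT3 API): a provider component exposes a port implementation, and
# a user component is wired to it by the framework at composition time.

class GeoProvidePort:
    """Port implementation exposed by an application (geo) component."""
    def run_ensemble_member(self, member_id):
        return f"member {member_id} started"

class DispatchComponent:
    """Component that *uses* the port; it holds only a reference set at connect time."""
    def __init__(self):
        self.geo_use_port = None          # filled in by connect()

    def dispatch(self, member_id):
        # Call through the uses port; the provider may be local or remote.
        return self.geo_use_port.run_ensemble_member(member_id)

def connect(user, uses_attr, provider_port):
    """Minimal stand-in for the framework's port-connection step."""
    setattr(user, uses_attr, provider_port)

dispatcher = DispatchComponent()
connect(dispatcher, "geo_use_port", GeoProvidePort())   # composition in space
print(dispatcher.dispatch(1))
```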
Technology: Grid Computing Middleware (Proteus: Multi-Protocol Library)
[Diagram: a CCA framework component calls the Proteus API, which selects among multiple protocol modules (e.g., Protocol 1 over TCP, Protocol 2 over UDT)]
• Proteus provides a single-protocol abstraction to components
– Allows users to dynamically switch between traditional and high-speed protocols
– Facilitates the use of specialized implementations of serialization and deserialization
Proteus allows a user to have a choice of networks
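A toy sketch of the idea of a single send abstraction over switchable transports follows; the names are hypothetical and are not the actual Proteus API, and the "UDT" transport here simply falls back to TCP to keep the example self-contained.

```python
# Toy sketch of a single-protocol abstraction over switchable transports
# (hypothetical names; not the actual Proteus API).

import json, socket

class TcpTransport:
    """Traditional transport: a plain TCP socket."""
    def send(self, host, port, payload: bytes):
        with socket.create_connection((host, port)) as s:
            s.sendall(payload)

class UdtTransport:
    """Placeholder for a high-speed protocol such as UDT; falls back to TCP here."""
    def send(self, host, port, payload: bytes):
        TcpTransport().send(host, port, payload)   # a real UDT binding would go here

class Messenger:
    """Single-protocol abstraction: components call send_obj(); the transport
    can be switched at run time without changing component code."""
    def __init__(self, transport):
        self.transport = transport
    def switch(self, transport):
        self.transport = transport
    def send_obj(self, host, port, obj):
        payload = json.dumps(obj).encode()          # pluggable serialization
        self.transport.send(host, port, payload)

m = Messenger(TcpTransport())
m.switch(UdtTransport())    # dynamic protocol switch; component code unchanged
```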
Technology: Dedicated High-Speed Network (Lambda Networks)
• Dedicated high speed links (1Gbps, 10 Gbps, etc)
• Being demonstrated in large-scale distributed visualization and data mining
• National LambdaRail is currently under development
• NASA GSFC is prototyping it and is in the process of connecting to it.
High Performance Networking and Remote Data Access: GSFC L-Net for NCCS and Science Buildings

[Diagram (JPG, 8/05/04): GSFC at Greenbelt connected via the NLR to sites including ISI-E, CMU, JPL, SIO, UCSD, UIC, GWU, UMCP, BosSNet, ATDnet, MIT/Haystack, UMBC (DRAGON), and ORNL]
High Performance Remote Data Cache Facility (creating inter-facility virtual SANs using SAN-over-IP technologies)

[Diagram: the ARC/Project Columbia compute and storage complex (compute nodes, SATA and Fibre Channel storage arrays behind FC switches, InfiniBand, 10/1 GigE) linked through iFCP, FCIP, and iSCSI gateways and a Level3 POP at McLean to the NCCS "Classic" facility, GISS, and other GSFC science data facilities via 10-GE switch/routers (BGP, OSPF). Legend: 10 Gbps GE, 1 Gbps GE, 2 Gbps FC, dark fiber]
Distributed Ensemble Simulation via Grid Computing (System Architecture)

[Diagram: a driver component and a dispatch component run on the host; the dispatch component connects to geo1, geo2, and geo3 components on remote1, remote2, and remote3, each wrapping a MoSST application running on that node's PEs (PE0, PE1, ...)]
Note: the geo components are the "grid computing code" and MoSST is the "application code"; the two are kept separate for flexibility!
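A minimal sketch of this separation follows, assuming invented names rather than the actual XCAT3 prototype code: the grid-computing layer only launches and tracks ensemble members, while the application code is started as an opaque command on each node.

```python
# Minimal sketch of the driver/dispatch/geo separation (invented names; not
# the actual XCAT3 prototype). The grid-computing layer (Dispatcher/GeoNode)
# knows nothing about MoSST internals; it only launches and tracks members.

import subprocess

class GeoNode:
    """Grid-computing wrapper for one node; it runs a command locally here so
    the sketch stays self-contained (the prototype would invoke it remotely)."""
    def __init__(self, name, app_command):
        self.name = name
        self.app_command = app_command     # the application code, e.g. MoSST

    def start(self, member_id):
        return subprocess.Popen(self.app_command + [str(member_id)])

class Dispatcher:
    """Dispatches ensemble members to the available geo nodes, round-robin."""
    def __init__(self, nodes):
        self.nodes = nodes

    def run_ensemble(self, n_members):
        jobs = []
        for member in range(n_members):
            node = self.nodes[member % len(self.nodes)]
            jobs.append((member, node.name, node.start(member)))
        return jobs

nodes = [GeoNode(f"remote{i}", ["echo", "MoSST member"]) for i in (1, 2, 3)]
for member, node, proc in Dispatcher(nodes).run_ensemble(6):
    proc.wait()
```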
Prototype: Ensemble Simulation via Grid Computing (Components and Ports)

[Diagram: the driver holds dispatchUse and go ports; the dispatch component exposes a dispatchProvide port and holds geo1Use and geo2Use ports; the geo1 and geo2 components expose geo1Provide and geo2Provide ports]
Simpler than a "workflow"
Prototype: Ensemble Simulation via Grid Computing (Flow Chart of Invoking a Simulation)

[Diagram: the driver invokes the dispatch component through its useCMD/provideCMD port pair, and the dispatch component in turn invokes the geo1 and geo2 components through their useCMD/provideCMD ports (numbered steps 1-4)]
The dispatcher invokes the remote applications
Run on three computer nodes connected by 10 Gigabit Ethernet
Prototype: Ensemble Simulation via Grid Computing (Flow Chart of Feedback During a Simulation)

[Diagram: the geo1 and geo2 components report back through their provideCMD/useCMD ports to the dispatch component, which in turn notifies the driver (numbered steps 1-4)]
A monitoring capability has been developed for the geo components
Simulations report failure or completion
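A sketch of this feedback path follows, with invented names rather than the prototype's code: each member reports completion or failure to a monitor supplied by the dispatcher, and the forecast only needs the members that completed.

```python
# Sketch of the feedback/monitoring path (invented names; not the prototype's
# code). Members run concurrently and report completion or failure.

from concurrent.futures import ThreadPoolExecutor
import random

def run_member(member_id):
    """Stand-in for a remote MoSST run; returns a (member_id, status) report."""
    ok = random.random() > 0.1            # simulate occasional failures
    return member_id, ("completed" if ok else "failed")

class Monitor:
    """Collects reports; downstream assimilation uses only completed members."""
    def __init__(self):
        self.reports = {}
    def report(self, member_id, status):
        self.reports[member_id] = status

monitor = Monitor()
with ThreadPoolExecutor(max_workers=3) as pool:       # three "remote" nodes
    for member_id, status in pool.map(run_member, range(6)):
        monitor.report(member_id, status)

completed = [m for m, s in monitor.reports.items() if s == "completed"]
print("usable ensemble members:", completed)
```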
Adaptive User Interface
• Network programming is complex, and its concepts are unfamiliar to scientists
• A user-friendly interface is even more important when applying Grid computing to scientific applications
– A Model Running Environment (MRE) tool has been developed to reduce the complexity of run scripts by adaptively hiding details (see the sketch below)
Original script → Marked script → Filled script
MRE 2.0 is used in GMI Production!
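A toy sketch of the "original → marked → filled" idea follows, assuming a simple {{marker}} syntax; this is not the actual MRE 2.0 implementation, and the script contents are invented for illustration.

```python
# Toy sketch of the original -> marked -> filled script idea (invented marker
# syntax and script; not the actual MRE 2.0 implementation). Details a
# scientist should not need to edit are replaced by markers, then filled in.

import re

marked_script = """#!/bin/sh
#PBS -l nodes={{nodes}}
mpirun -np {{nprocs}} ./model --truncation {{truncation}}
"""

def fill(marked: str, values: dict) -> str:
    """Replace each {{marker}} with its value, leaving unknown markers visible."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  marked)

filled_script = fill(marked_script, {"nodes": 4, "nprocs": 8, "truncation": 200})
print(filled_script)
```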
Where is ESMF in Grid Computing?
• Make components known to the Grid
– Need a global component ID
• Make component services available to the Grid
– ESMF_Component (F90 user type + C function pointer)
– C++ interfaces for the three fundamental data types (ESMF_Field, ESMF_Bundle, ESMF_State) are not complete
– The function pointers need to be replaced with remote ones
• Make the data-exchange type transferable via the Grid
– ESMF_State (F90 data pointer + C array)
– Serialization/deserialization is available
– The data represented by a pointer needs to be replaced with a data copy
Grid-Enabled ESMF: Link Functions in Remote Model Components

[Diagram: the Atmosphere, Coupler, and Ocean components each register their init/run/final functions via setEntryPoint/setService; a Driver assembles them, together with their grid layouts, into an assembled component, with the remote component's functions linked across the Network]
Grid-Enabled ESMF: Transfer Data Across Network

[Diagram: an Ocean proxy on the local side holds ESMF_State::importOcn and ESMF_State::exportOcn and forwards calls to the remote Ocean via RMI over the Network; the component, its import/export states, and the clock are transferred]
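A minimal sketch of the proxy idea follows, assuming invented names and Python's standard xmlrpc module standing in for RMI/XSOAP; it is not the actual prototype code.

```python
# Minimal sketch of the "ocean proxy" idea (invented names; xmlrpc stands in
# for RMI/XSOAP). The local coupler talks to OceanProxy exactly as it would
# to a local component; the proxy serializes the import state, calls the
# remote ocean, and returns a copy of the export state.

import xmlrpc.client

class OceanProxy:
    """Local stand-in for a remote Ocean component."""
    def __init__(self, url):
        self.remote = xmlrpc.client.ServerProxy(url)   # e.g. "http://remote1:8000"

    def run(self, import_ocn: dict, n_steps: int) -> dict:
        # F90 data pointers cannot cross the network, so the state is sent as a
        # plain data copy and the export state is copied back the same way.
        return self.remote.run_ocean(import_ocn, n_steps)

# Usage (assumes a hypothetical server exposing run_ocean() on remote1):
# ocean = OceanProxy("http://remote1:8000")
# export_ocn = ocean.run({"t_flux": [...], "lw_flux": [...]}, n_steps=12)
```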
ESMF-CCA Prototype 2.0

[Diagram: the ESMF concept (Init()/Run()/Final(), ESMF_State) is mapped onto CCA tools. The Atmosphere and Ocean components each expose Init()/Run()/Final() through Provide and Use Ports, go through CCA component registration, and receive a global component ID. For Grid computing, a Proxy reaches the remote component across the Network, using RMI for the remote pointer and XSOAP for data transfer]
[Diagram: at t = t0, the Atm, Ocn, CplAtmXOcn, and CplOcnXAtm components are created and registered. In each cycle of length tglobal, repeated until t = t0 + ncycle * tglobal: the Atmosphere evolves and fills ESMF_State::exportAtm; CplAtmXOcn regrids it into ESMF_State::importOcn; the OceanProxy forwards the evolution to the remote Ocean via RMI and returns ESMF_State::exportOcn; CplOcnXAtm regrids it into ESMF_State::importAtm. Finalize at the end.]

A sequential coupling between an atmosphere and a remote ocean model component implemented in the ESMF-CCA Prototype 2.0
Composing Components with XCAT3

[Diagram: a Jython script (1) launches the components (ClimateComponent, AtmosphereComponent with an Atm port, CplAtmXOcnComponent with an A2O port, Ocean1Component with an Ocn port, CplOcnXAtmComponent with an O2A port, and GoComponent with a Go port) and (2) connects their Uses and Provides Ports]
Run on two remote computer nodes connected by 10 Gigabit Ethernet
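A hypothetical Jython-style composition script illustrating the two steps above (launch, then connect) follows. The "Framework" object and its methods are invented for illustration; they are not the actual XCAT3 Jython API.

```python
# Hypothetical Jython-style composition script (invented Framework API; not
# the actual XCAT3 calls): step 1 launches components, step 2 wires ports.

class Framework:
    """Toy component framework: tracks component instances and port connections."""
    def __init__(self):
        self.components, self.wires = {}, []
    def create(self, name, host="localhost"):
        self.components[name] = host
        return name
    def connect(self, user, uses_port, provider, provides_port):
        self.wires.append((user, uses_port, provider, provides_port))

fw = Framework()

# 1. Launch components (the ocean is placed on a remote node)
climate = fw.create("ClimateComponent")
atm     = fw.create("AtmosphereComponent")
a2o     = fw.create("CplAtmXOcnComponent")
ocn     = fw.create("Ocean1Component", host="remote1")
o2a     = fw.create("CplOcnXAtmComponent")
go      = fw.create("GoComponent")

# 2. Connect Uses and Provides Ports
fw.connect(climate, "AtmUse", atm, "AtmProvide")
fw.connect(climate, "A2OUse", a2o, "A2OProvide")
fw.connect(climate, "OcnUse", ocn, "OcnProvide")
fw.connect(climate, "O2AUse", o2a, "O2AProvide")
fw.connect(go, "GoUse", climate, "GoProvide")
print(fw.wires)
```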
Summary
• Grid computing technology and high-speed networks such as Lambda networks make distributed high-end computing applications promising.
• Our prototype based on the XCAT3 framework shows that distributed ensemble simulation can be performed over networks of up to 10 Gbps in a user-friendly way.
• ESMF components could be Grid-enabled with the help of CCA/XCAT.
Backup slides
Prototype: Ensemble Simulation via Grid Computing (Flow Chart of Intelligently Dispatching Ensemble Members)

[Diagram: the driver hands work to the dispatch component, which assigns ensemble members to geo1, geo2, and geo3 in turn (numbered steps 1-6)]
The type "geoCMD" is used to exchange data among components
Scientific Objective:
Develop a geomagnetic data assimilation framework with the MoSST core dynamics model and surface geomagnetic observations to predict changes in Earth's magnetic environment.
Algorithm: $x^a = x^f + K\left(z - Hx^f\right)$

x^a: assimilation (analysis) solution
x^f: forecast solution
z: observation data
K: Kalman gain matrix
H: observation operator
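A tiny numerical illustration of this update (arbitrary toy numbers, not MoSST data or its actual Kalman gain):

```python
# Tiny numerical illustration of the analysis update x_a = x_f + K (z - H x_f)
# (arbitrary toy numbers; not MoSST data or its actual Kalman gain).

import numpy as np

x_f = np.array([1.0, 2.0])          # forecast solution
H   = np.array([[1.0, 0.0]])        # observation operator: observe the first state only
z   = np.array([1.5])               # observation data
K   = np.array([[0.6], [0.1]])      # Kalman gain matrix (prescribed here)

x_a = x_f + (K @ (z - H @ x_f))     # assimilation (analysis) solution
print(x_a)                          # -> [1.3, 2.05]
```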
New Transport Layer Protocols
Why needed
• TCP was originally designed for slow backbone networks
• Standard "out-of-the-box" kernel TCP tunings are inadequate for large-bandwidth, long-delay (high bandwidth-delay product) application performance
• TCP requires a knowledgeable “wizard” to optimize the host for high performance networks
Current throughput findings from GSFC’s 10-Gbps networking efforts
From UDP-based tests between GSFC hosts with 10-GE NICs, enabled by: nuttcp -u -w1m

From        To          Throughput    TmCPU%  RcCPU%  %packet-loss
San Diego   Chicago     5.213+ Gbps   99      63      0
Chicago     San Diego   5.174+ Gbps   99      65      0.0005
Chicago     McLean      5.187+ Gbps   100     58      0
McLean      Chicago     5.557+ Gbps   98      71      0
San Diego   McLean      5.128+ Gbps   99      57      0
McLean      San Diego   5.544+ Gbps   100     64      0.0006
Current throughput findings from GSFC’s 10-Gbps networking efforts
From TCP-based tests between GSFC hosts with 10-GE NICs, enabled by: nuttcp -w10m

From        To          Throughput    TmCPU%  RcCPU%
San Diego   Chicago     0.006+ Gbps   0       0
Chicago     San Diego   0.006+ Gbps   0       0
Chicago     McLean      0.030+ Gbps   0       0
McLean      Chicago     4.794+ Gbps   95      44
San Diego   McLean      0.005+ Gbps   0       0
McLean      San Diego   0.445+ Gbps   8       3
Current throughput findings from GSFC’s 10-Gbps networking efforts
From UDT*-based tests between GSFC hosts with 10-GE NICs, enabled by: iperf

From        To          Throughput
San Diego   Chicago     2.789+ Gbps
Chicago     San Diego   3.284+ Gbps
Chicago     McLean      3.435+ Gbps
McLean      Chicago     2.895+ Gbps
San Diego   McLean      3.832+ Gbps
McLean      San Diego   1.352+ Gbps
• *Developed by Robert Grossman (UIC): http://udt.sourceforge.net/
Year    Experts     Non-experts   Ratio
1988    1 Mb/s      300 kb/s      3:1
1991    10 Mb/s
1995    100 Mb/s
1999    1 Gb/s
2003    10 Gb/s     3 Mb/s        3000:1
The non-experts are falling behind
New Transport Layer Protocols
Major Types
• UDP and TCP Reno standard (“default w/OS”)
• Other versions of TCP (Vegas, BIC) are included in the Linux 2.6 kernel series
– Other OSes may not include the stack code
• Alternative transport protocols are non-standard and require kernels to be patched or operate in user space
Next Step: Transform a Model into a Set of Grid Services

[Diagram: an ESMF component (import/export state; Init(), Run(), Final()) wrapping a model is itself wrapped as an XCAT component exposing usePort/providePort (Grid service) interfaces; the model can then run standalone (locally) or as part of coupled systems (distributed) across supercomputer and data storage resources]