GG in Data-Intensive Discovery
Challenges in Analyzing a High-Resolution Climate Simulation
John M. Dennis, Matthew [email protected], [email protected]
October 26-28, 2010
Climate Modeling?

Coupled models: Atmosphere - Ocean - Sea Ice - Land

Challenges of climate: need many years --> limits resolution --> limits parallelism

Community Earth System Model (CESM); formerly the Community Climate System Model (CCSM)
IPCC & AR5

Intergovernmental Panel on Climate Change (IPCC):
"The IPCC assesses the scientific, technical and socio-economic information relevant for the understanding of the risk of human-induced climate change"

The Fifth Assessment Report (AR5)

CESM plans:
Control: 1 run x 1300 years
Historical: 40 runs x 55 years
Future: 180 runs x 95 years
= 20,600 years of simulation
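The 20,600-year total is simple arithmetic over the three planned experiment sets; a quick sketch (run counts and lengths taken from the slide):

```python
# AR5 CESM simulation plans, as listed on the slide: (runs, years per run)
experiments = {
    "control":    (1, 1300),
    "historical": (40, 55),
    "future":     (180, 95),
}

# Total simulated years across the whole campaign.
total_years = sum(runs * years for runs, years in experiments.values())
print(total_years)  # 20600
```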
Would you like to Supersize that?
Absolutely! But how will it clog up my networks and fill up my archive?
PetaApps project: Interactive Ensemble

Kinter, Stan (COLA); Kirtman (U of Miami); Collins, Yelick (Berkeley); Bryan, Dennis, Loft, Vertenstein (NCAR); Bitz (U of Washington)

Ultra-high-resolution climate:
Explore impact of weather noise on climate
Explore technical/computer-science issues

~99,000-core Cray XT5 system at NICS [Kraken]
Large TeraGrid allocation: 35M CPU hours
Weddell Sea: ice thickness (1°)
Impact of Supersizing

Per-component history-file sizes and campaign totals:

          Regular    Supersized #1 (0.5x0.1)   Supersized #2 (0.25x0.1)
ATM       0.2 GB     0.9 GB                    3.6 GB
LND       0.1 GB     0.2 GB                    0.9 GB
ICE       0.7 GB     72 GB                     72 GB
OCN       1.2 GB     122.5 GB                  122.5 GB
Total     564 TB     48,357 TB                 49,187 TB

Growth factors: 16x, 100x
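One way to read the totals: supersizing inflates the whole archive by roughly 86x. A back-of-the-envelope check on the table's campaign totals:

```python
# Campaign-total archive volumes from the table, in TB.
regular = 564
supersized_1 = 48_357   # 0.5x0.1 configuration
supersized_2 = 49_187   # 0.25x0.1 configuration

# Overall archive growth factor when "supersizing":
print(round(supersized_1 / regular, 1))  # 85.7
print(round(supersized_2 / regular, 1))  # 87.2
```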
Outline
Climate Modeling
Production Data challenges: AR5 & IPCC
PetaApps simulation
Generating the Data
Analyzing the Data
Conclusions
Large scale PetaApps run
155-year control run:
0.1° Ocean model [3600 x 2400 x 42]
0.1° Sea-ice model [3600 x 2400 x 20]
0.5° Atmosphere [576 x 384 x 26]
0.5° Land [576 x 384]
Statistics:
~18M CPU hours
5844 cores for 4-5 months
~100 TB of data generated
0.5 to 1 TB of data generated per wall-clock day
4x current production
100x current production
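The quoted run statistics are self-consistent; a quick check using only the figures from the slide:

```python
# Reported run statistics (from the slide).
cpu_hours = 18e6   # ~18M CPU hours
cores = 5844
data_tb = 100      # ~100 TB generated

# Implied wall-clock time: matches the quoted 4-5 months.
wall_days = cpu_hours / cores / 24
print(round(wall_days))  # 128

# Implied average output rate: within the quoted 0.5-1 TB/day.
print(round(data_tb / wall_days, 2))  # 0.78
```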
Large scale PetaApps run
(cont'd)

Workflow:
Run on Kraken (NICS)
Transfer output from NICS to NCAR (100-180 MB/sec sustained)
Caused noticeable spikes in TeraGrid network traffic
Archive on HPSS
Data analysis using 55 TB of project space at NCAR
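For scale, moving a day's output at the quoted sustained transfer rates works out as follows (a sketch, using decimal units):

```python
# Time to move a given volume at the sustained NICS -> NCAR rates quoted
# on the slide (100-180 MB/sec), for the ~0.5-1 TB/day the run produced.
def transfer_hours(tb, mb_per_sec):
    seconds = tb * 1e6 / mb_per_sec  # 1 TB = 1e6 MB (decimal units)
    return seconds / 3600.0

print(round(transfer_hours(1.0, 100), 1))  # 2.8 hours per TB at 100 MB/s
print(round(transfer_hours(1.0, 180), 1))  # 1.5 hours per TB at 180 MB/s
```

So even at the best sustained rate, each day's output keeps the wide-area link busy for over an hour.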
Issues/challenges with runs

Very large variability in I/O performance:
2-10x slowdown common
300x slowdown was observed
Interference from other jobs?
Outline
Climate Modeling
Production Data challenges: AR5 & IPCC
PetaApps simulation
Generating the Data
Analyzing the Data
Conclusions
Issues with data analysis

Mismatch between the structure of model output and the form of data needed for analysis

Model-generated files (all variables for one timestep per file):
Timestep = n:            U, V, T, PS, CLDLOW -> file.n.nc
Timestep = m (= n+step): U, V, T, PS, CLDLOW -> file.m.nc

Analysis: average T (temperature) over the last 40 years
Sub-setting and averaging operations are necessary
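The mismatch means an analysis as simple as "average T" must open every per-timestep file, extract one variable, and accumulate. A minimal sketch of that access pattern (pure Python; `read_variable` is a stand-in for a real NetCDF reader such as netCDF4 or the NCO tools, and the file list and values are fabricated for illustration):

```python
# Sketch of the subset-then-average pattern forced by per-timestep files:
# each history file holds ALL variables for ONE timestep, while the analysis
# wants ONE variable across MANY timesteps.

def read_variable(filename, varname):
    """Stand-in for a NetCDF read (e.g. netCDF4.Dataset(filename)[varname]).
    Fakes a per-file global-mean temperature, in K, for illustration."""
    step = int(filename.split(".")[1])  # files named like "file.<n>.nc"
    return 287.0 + 0.001 * step         # fabricated values

files = [f"file.{n}.nc" for n in range(480)]  # e.g. 40 years of monthly output

# Subset: pull only T from each file, then average over all timesteps.
mean_T = sum(read_variable(f, "T") for f in files) / len(files)
print(mean_T)
```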
Issues with data analysis (cont'd)

Slightly different form of input for each analysis/tool

Diversity of analysis tools:
NCL (NCAR Command Language)
NCO (NetCDF Operators)
Ferret
IDL
Matlab

Cascade of intermediate products

"Beware the data cascade, forever your disk will it fill." - Yoda
Impact of self-describing format

All variables [~300 variables] for a single timestep
+ Matches raw output from model
- Does not match needs of analysis tools

One variable for a single timestep
+ Matches analysis tool needs
+ Conceptually simple
- Replication of self-describing metadata
- Larger number of files [~75M files]

One variable, multiple timesteps
+ Matches analysis tool needs
- Loss of generality
Parallelizing Analysis

Parallelizing analysis of atmospheric data

Goal: speed up analysis of high-resolution data
Analysis no slower than at low resolution

AMWG diagnostic package:
NCO operators [calculating statistics]
NCL plotting

Parallelization using Swift [T. Sines]
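The shape Swift extracts from the diagnostic scripts is a fork/join: per-variable statistics are independent tasks, and plotting waits on all of them. A plain-Python analogy of that dependency structure (this is a sketch, not the AMWG package; the variable list and `compute_stat` are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-variable statistic, standing in for the independent
# NCO steps the AMWG package runs per field.
def compute_stat(varname):
    return varname, len(varname)  # placeholder "statistic"

variables = ["T", "U", "V", "PS", "CLDLOW"]

# Fan out the independent per-variable work; the join happens when the
# pool drains, so the "plotting" step sees every statistic -- the same
# dependency-driven shape Swift derives from the workflow script.
with ThreadPoolExecutor(max_workers=4) as pool:
    stats = dict(pool.map(compute_stat, variables))

print(stats)  # all statistics available before the plotting step runs
```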
AMWG Diagnostics

Improved computing = increasing amounts of data:
Data generation: multi-threaded
Data analysis: single-threaded

Climate Model (Community Earth System Model)
Swift

Workflow system from U of Chicago (http://www.ci.uchicago.edu/swift/index.php)
Allows parallel execution of scripts
Relatively easy to use
Dependency driven
Variety of job-submission options
Issues with Swift

Full generality generates "excessive" copying of data

Climate workflow has a small number of flops relative to input/output data size:
Copy-in data: 15 sec
Calculate: 60 sec
Copy-out data: 15 sec

Used "un-published" option to perform direct I/O [zero-copy]
Unresolved issue with deleting out-of-scope 'files'
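The copy overhead the slide quotes is large relative to the compute, which is why the zero-copy option matters:

```python
# Per-task timings from the slide, in seconds.
copy_in, compute, copy_out = 15, 60, 15

total = copy_in + compute + copy_out
print(total)  # 90 seconds per task with copying

# A third of every task is spent copying data:
print(round((copy_in + copy_out) / total, 2))  # 0.33

# With the direct-I/O (zero-copy) option the copies disappear:
print(round(total / compute, 2))  # 1.5, i.e. a 1.5x speedup per task
```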
Swift workflow on Dash (8 cores)
Data analysis Platforms

Twister@NCAR: 8-core Intel box, GPFS filesystem
Dash@SDSC: compute nodes with SSD disk; vSMP node with SSD + ScaleMP; GPFS-WAN
Nautilus@NICS: 1024-core SGI Ultra Violet, GPFS filesystem
Execution time for AMWG diagnostics

               Twister (8 cores)   Dash (8 cores)   Nautilus (8 cores)
2 deg/10 yrs   243 sec             78 sec           277 sec
1 deg/10 yrs   663 sec             621 sec          -
0.5 deg/2 yrs  1085 sec            -                -
Nice/interesting result
Not so interesting result
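The 2 deg/10 yrs row, where all three platforms have numbers, makes the contrast concrete:

```python
# Timings from the table (2 deg / 10 yrs case), in seconds.
twister, dash, nautilus = 243, 78, 277

# SSD-equipped Dash is about 3x faster than the plain 8-core box:
print(round(twister / dash, 1))  # 3.1

# The large SGI system is actually slightly slower than the plain box:
print(round(twister / nautilus, 1))  # 0.9
```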
Performance Issues with AMWG diagnostic

Twister@NCAR: no specialized hardware; tiny system
Dash@SDSC: vSMP system has a network driver problem; 7.7x performance difference, compute node vs vSMP
Nautilus@NICS: apparently excessive Java exec overhead
Conclusions

PetaApps project [Interactive Ensembles]:
155-year CCSM control run, 0.5° ATM & LND + 0.1° OCN & ICE
0.5 to 1 TB/day of data to analyze

Increased ability to generate data creates an analysis bottleneck

Encouraging results for parallel workflow with Swift

No transformative benefit yet from expensive hardware:
Overnight
After coffee break
Nearly instant

No solution yet to data-volume growth
Acknowledgements

NCAR: D. Bailey, F. Bryan, T. Craig, B. Eaton, J. Edwards [IBM], N. Hearn, K. Lindsay, N. Norton, M. Vertenstein
COLA: J. Kinter, C. Stan
U. Miami: B. Kirtman
U.C. Berkeley: W. Collins, K. Yelick (NERSC)
U. Washington: C. Bitz

Grant Support:
DOE: DE-FC03-97ER62402 [SciDAC], DE-PS02-07ER07-06 [SciDAC]
NSF: Cooperative Grant NSF01, OCI-0749206 [PetaApps], CNS-0421498, CNS-0420873, CNS-0420985

Computer Allocations: TeraGrid TRAC @ NICS, DOE INCITE @ NERSC, LLNL Grand Challenge

Thanks for Assistance: Cray, NICS, and NERSC
NICS: M. Fahey, P. Kovatch
ANL: R. Jacob, R. Loy
LANL: E. Hunke, P. Jones, M. Maltrud
LLNL: D. Bader, D. Ivanova, J. McClean (Scripps), A. Mirin
ORNL: P. Worley

and many more…