A Framework for Visualizing Science at the Petascale and Beyond
Kelly Gaither
Research Scientist
Associate Director, Data and Information Analysis
Texas Advanced Computing Center
Outline of Presentation
• Science at the Petascale
  – Scaling Resources
  – Scaling Applications
  – Access Mechanisms
• Issues and Impediments
• Framework
Science at the Petascale
• Global Weather Prediction
• Understanding Chains of Reactions within Living Cells
• Formation and Evolution of Stars and Galaxies in the Early Universe
Scaling HPC Resources
• Mission
  – Provide greater computational capacity/capability to the science community to compute ever larger simulations
• Enablers:
  – Commodity multi-core chip sets with low power/cooling requirements
  – Efficient packaging for a compact footprint
  – High-speed commodity interconnects for fast communications
  – Affordable! (Nodes with 8 cores and 2 GB/core of memory in the $2K/node price range)
TeraGrid Network Map: Before Track2 (BT2)
• Largest single machine: 96 TF
TeraGrid Network Map: After Track2 and Track1 (AT2/AT1)
• Ranger: Feb 2008 (0.5 PF)
• Kraken: June 2008 (~1 PF)
• Track2C: 2010 (>1 PF)
• Track1: 2010 (10 PF)
Scaling Analysis Resources
• Mission
  – Provide an interactive interface allowing users to manipulate/view the results of their science
• Enablers:
  – Commodity chips with low power/cooling requirements? Commodity graphics chips yes; low power/cooling, no!
  – Efficient packaging for a compact footprint? Until recently, desktop box packaging; now available in rack-mounted 2U boxes
  – High-speed commodity interconnects for fast communications? Yes!
  – Affordable? (Nodes with 8 cores and 6 GB/core of memory in the $10K/node price range) No!
TeraGrid Network Map: Before Track2 (BT2)
• Largest single machine: 96 TF
• Maverick: 0.5 TB shared memory, 16 GPUs, 128 cores
• UIC/ANL Cluster: 96 nodes, 4 GB/node, 96 GPUs
TeraGrid Network Map: After Track2 (AT2)
• Spur: 1 TB aggregate memory, 32 GPUs, 128 cores
• UIC/ANL Cluster: 96 nodes, 4 GB/node, 96 GPUs
• Compute systems as above: Ranger (0.5 PF), Kraken (~1 PF), Track2C (>1 PF), Track1 (10 PF)
Impediments to Scaling Analysis Resources
• Power and cooling requirements: 10x more power needed for the analysis resource!
• Footprint: 2x more space needed for the analysis resource!
• Cost: 5x more money needed for a comparable analysis resource!
Scaling HPC Applications
• Mission
  – As the number of processing cores increases, scale as close to linearly as possible
• Enablers:
  – Science-driven need to solve larger and larger problems, so a significant intellectual body of work has been applied to scaling applications
  – There is basic information that you know ahead of time (see the sketch below):
    • Size of the problem you want to solve
    • Number of unknowns that you are trying to solve for
    • Decomposition strategy
    • Communication patterns between nodes
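A minimal sketch (mine, not from the talk) of the static plan this foreknowledge permits: a 1-D block decomposition with ghost-cell exchange, where the global size, each rank's block, and the neighbor communication pattern are all fixed before the first timestep. The array size and the even-divisibility assumption are illustrative.

```c
/* Hypothetical sketch: 1-D block decomposition with halo exchange.
 * Everything is known ahead of time: global size N, each rank's
 * block, and the fixed left/right communication pattern.           */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int N = 1 << 20;              /* global problem size (illustrative) */
    int local_n = N / nprocs;           /* assume N divides evenly            */
    double *u = calloc(local_n + 2, sizeof *u);  /* +2 ghost cells            */

    int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Exchange boundary cells with fixed neighbors; this pattern
     * never changes over the life of the run.                      */
    MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                 &u[local_n + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[local_n],     1, MPI_DOUBLE, right, 1,
                 &u[0],           1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(u);
    MPI_Finalize();
    return 0;
}
```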
Application Examples: DNS/Turbulence
Courtesy: P.K. Yeung, Diego Donzis, TG 2008
Application Example: Earth Sciences, Mantle Convection (AMR Method)
Courtesy: Omar Ghattas et al.
Scaling Analysis Applications
• Mission
  – As the number of processing cores increases, scale as close to linearly as possible
• Enablers:
  – Science-driven need to solve larger and larger problems? Yes, but it's more complicated than that
  – Is there basic information that you know ahead of time?
    • Size of the problem you want to analyze? Yes
    • Decomposition strategy? Tricky!
    • Communication patterns between nodes? Dependent on your decomposition strategy!
Impediments to Scaling Analysis Applications
• The decomposition strategy is a moving target: it is tied to the viewpoint (see the sketch below)
• Analysis carries an additional requirement for interactive frame-rate performance!
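A hedged illustration (the names and the crude visibility test are mine, not SCOREVIS code) of why the decomposition moves: the set of data bricks a rank must process is recomputed from the camera, so every rotation or zoom can reshuffle the work assignment.

```c
/* Hypothetical sketch: view-dependent work assignment. Unlike the
 * fixed HPC halo pattern above, this partition must be rebuilt
 * every time the viewpoint changes.                                */
typedef struct { float cx, cy, cz, r; } Brick;   /* bounding sphere */
typedef struct { float px, py, pz;               /* eye position    */
                 float dx, dy, dz; } View;       /* view direction  */

/* Crude visibility test: keep bricks not entirely behind the eye
 * plane (a real system would test the full view frustum).          */
static int brick_visible(const Brick *b, const View *v)
{
    float ox = b->cx - v->px, oy = b->cy - v->py, oz = b->cz - v->pz;
    return ox * v->dx + oy * v->dy + oz * v->dz > -b->r;
}

/* Recompute this rank's work list for the current frame.           */
int assign_visible(const Brick *bricks, int n, const View *v,
                   int *work_list)
{
    int m = 0;
    for (int i = 0; i < n; i++)
        if (brick_visible(&bricks[i], v))
            work_list[m++] = i;
    return m;
}
```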
Accessing HPC Applications
• Mission:
  – Provide mechanisms for submitting jobs and perhaps monitoring job performance
• Enablers:
  – Schedulers for submitting jobs (they come with a price!)
• Impediments:
  – Weak support for interactive applications
  – Still in the mode of hypothesize, run, check…
Accessing Analysis Applications
• Mission:
  – Provide mechanisms for interactively running applications to analyze data
• Enablers:
  – Lots of intellectual capital in remote and collaborative access mechanisms; this is where we are ahead of the HPC community
    • Remote desktop
    • VNC
    • AccessGrid
Impediments to Reaching the Petascale
• 10x power requirement
• 2x space requirement
• 5x more expensive
• Tenuous balance between the requirement for interactive performance and the need to scale to more processing cores
• Retrofitting our access mechanisms to work with batch schedulers
Requirements for Designing a Framework for Visualizing at the Petascale
• 10x power requirement
• 2x space requirement
• 5x more expensive
• Address the balance between the requirement for interactive performance and the need to scale to more processing cores
• Retrofit our access mechanisms to work with batch schedulers
(The first three items are not something I can address in the short term.)
Requirements for Designing a Framework for Visualizing at the Petascale
• Minimize data movement: users can generate 100s of TB of data but can't move it off the storage local to the machine it was generated on (see the worked figure below)
• Optimize for the platforms that we can run on; data-starved cores become much more apparent
• Reduce the barriers to entry
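A hedged back-of-the-envelope figure for the first requirement (the link speed is an assumption, not a number from the talk): moving 100 TB over a sustained 10 Gb/s wide-area link takes 8 × 10^14 bits ÷ 10^10 bits/s = 80,000 s, roughly 22 hours, plus a second 100 TB of storage at the destination. Analyzing the data in place on the file system it was written to avoids both costs.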
SCOREVIS Software Stack
• Scalable, Collaborative and Remote Visualization
• NSF STCI-funded project that began March 1
• Balance Goals:
  – Accessibility: provide remote and collaborative access to visualization applications over common networks with standard communications protocols.
  – Rendering: include data decomposition, the transformation from data primitives to geometric primitives, and the transformation from geometric primitives to pixels.
  – Scalability: choose between image decomposition and data decomposition depending on the underlying size of the data and the number of processors available (a sketch of such a policy follows this list).
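A minimal sketch of the scalability goal as a policy function; the function name, the memory threshold, and the decision rule are illustrative assumptions, not measured SCOREVIS behavior.

```c
/* Hypothetical policy: pick image-space or data-space decomposition
 * from the data size and processor count, per the scalability goal. */
#include <stddef.h>

typedef enum { IMAGE_DECOMP, DATA_DECOMP } DecompMode;

DecompMode choose_decomposition(size_t data_bytes, int nprocs,
                                size_t mem_per_proc)
{
    /* If one processor's share of the data no longer fits in its
     * memory, the data itself must be split across nodes; otherwise
     * splitting the image keeps every processor busy with less
     * inter-node traffic. The threshold is an assumption.           */
    if (data_bytes / (size_t)nprocs > mem_per_proc)
        return DATA_DECOMP;
    return IMAGE_DECOMP;
}
```

The trade-off behind the rule: image (sort-first) decomposition moves little data but load-balances poorly when geometry clusters on screen, while data (sort-last) decomposition scales with data size at the cost of a final compositing step.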
SCOREVIS Requirements
• Minimize data movement
• Address the balance between the requirement for interactive performance and the need to scale to more processing cores
• Retrofit our access mechanisms to work with batch schedulers
• Optimize for the platforms that we can run on; data-starved cores become much more apparent
• Reduce the barriers to entry
SCOREVIS Approach
• Minimize data movement: move the analysis to where the data is generated
• Address the balance between interactive performance and scaling to more processing cores: address data decomposition and the scaling of applications and core algorithms
• Retrofit our access mechanisms to work with batch schedulers: allow remote and collaborative access
• Reduce the barriers to entry: a phased approach providing access to familiar, OpenGL-based applications
Traditional OpenGL Architecture
[Diagram] Two paths through the traditional OpenGL stack:
• Local rendering, all on one application node: Application → OpenGL → Hardware → Screen
• Indirect rendering: the application on the client host calls libGL/Xlib, which speaks GLX and X protocol across the network to the X server driving the OpenGL hardware and screen on the user's local display (a minimal code sketch of this indirect path follows)
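A minimal GLX sketch of the indirect path described above; error handling and proper visual/colormap matching for the window are omitted, and the snippet is mine rather than from the talk.

```c
/* Hypothetical sketch of the traditional indirect path: the app calls
 * libGL/Xlib, and rendering commands travel as GLX/X protocol to the
 * X server that owns the OpenGL hardware and the screen.             */
#include <X11/Xlib.h>
#include <GL/glx.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);  /* may point at a remote X server */
    if (!dpy) return 1;

    int attrs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
    XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attrs);
    if (!vi) return 1;

    /* direct = False requests an indirect context, so GL commands are
     * serialized over the GLX wire protocol.                          */
    GLXContext ctx = glXCreateContext(dpy, vi, NULL, False);

    /* A real program must create the window with vi->visual and a
     * matching colormap; the default visual is used here for brevity. */
    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 640, 480, 0, 0, 0);
    XMapWindow(dpy, win);
    glXMakeCurrent(dpy, win, ctx);

    /* ... issue OpenGL calls here; they execute on the server side ... */

    glXMakeCurrent(dpy, None, NULL);
    glXDestroyContext(dpy, ctx);
    XCloseDisplay(dpy);
    return 0;
}
```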
SCoReViS Architecture
[Diagram] SCoReViS stack: application processing (in some cases) and application/UI rendering run under Chromium, with a Chromium server performing hardware-accelerated OpenGL or Mesa software rendering and Chromium handling compositing; a VNC server on the login node (or a compute node) serves the resulting display to one or more VNC clients.
SCOREVIS To Date
• We have been successful in providing remote and collaborative access to visualization applications based on OpenGL, with a caveat (ParaView, VisIt, and home-grown codes)
  – We did get "interactive" frame rates of 6-10 fps
• We have been successful in profiling to better understand where the bottlenecks exist in the analysis pipeline (see the worked figure below):
  – I/O (Lustre parallel file system, ~32 GB/sec)
  – Core visualization algorithms (current apps do not do a good job of load balancing)
  – Rendering in Mesa (we quickly found that native Mesa does not handle multiple cores)
• We are also developing quick-and-dirty ways to handle in-situ analysis
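A hedged sanity check on that I/O rate (the 100 TB dataset size is illustrative, not a measured case): one pass over 100 TB at ~32 GB/sec is 10^14 B ÷ 3.2 × 10^10 B/s ≈ 3,100 s, nearly an hour before any visualization work starts, which is why data-starved cores show up so quickly in the profiles.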
Acknowledgements:
• Thanks to the National Science Foundation Office of Cyberinfrastructure for supporting this work through STCI Grant #0751397.