Slide-1
University of Hawaii
Hackystat and the DARPA High Productivity Computing
Systems Program
Philip Johnson, University of Hawaii
Slide-2
University of Hawaii
Overview of HPCS
Slide-3
University of Hawaii
High Productivity Computing Systems

Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).

Impact:
• Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
• Programmability (time-for-idea-to-first-solution): reduce cost and time of developing application solutions
• Portability (transparency): insulate research and operational application software from the system
• Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors

Fill the critical technology and capability gap: from today (late 80's HPC technology) to the future (quantum/bio computing).

Applications: intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling, and biotechnology.

HPCS Program Focus Areas
Slide-4
University of Hawaii
Fill the high-end computing technology and capability gap for critical national security missions.

Vision: Focus on the lost dimension of HPC: "User & System Efficiency and Productivity".

[Chart: HPC capability over time, from 1980's-technology vector and parallel vector systems through commodity HPCs to tightly coupled parallel systems and 2010 high-end computing solutions. Moore's Law doubled raw performance every 18 months; the new HPCS goal is to double value every 18 months.]
Slide-5
University of Hawaii
HPCS Technical Considerations

[Chart: architecture types vs. communication programming models, spanning microprocessors, shared-memory multi-processing, distributed-memory multi-computing ("MPI"), and custom vector: symmetric multiprocessors, distributed shared memory, parallel vector systems, commodity HPC, vector supercomputers, scalable vector, massively parallel processors, and commodity clusters and grids.]

Single point design solutions are no longer acceptable.

HPCS focus: tailorable, balanced solutions across performance characterization & precision, programming models, hardware technology, software technology, and system architecture.
Slide-6
University of Hawaii
HPCS Program Phases I - III

[Timeline, fiscal years 02-10: Phase I industry concept study, with concept study reviews; Phase II R&D, with a system design review and Phase II readiness reviews; Phase III full scale development, with PDR, DDR, and a Phase III readiness review leading to an HPCS capability or products. Parallel tracks cover industry procurements, academia, application analysis and performance assessment, requirements and metrics, metrics and benchmarks, technology assessments, early software tools, early pilot platforms, research platforms, and research prototypes & pilot systems, following the industry evolutionary development cycle; critical program milestones are marked throughout.]
Slide-7
University of Hawaii
Application Analysis/Performance Assessment Activity Flow

Productivity = ratio of utility to cost. Metrics:
• Development time (cost)
• Execution time (cost)
• Implicit factors

Inputs and motivation: DDR&E & IHEC mission analysis, plus mission partners' mission-specific roadmaps and mission work flows, motivate the HPCS program.

HPCS applications:
1. Cryptanalysis
2. Signal and image processing
3. Operational weather
4. Nuclear stockpile stewardship
5. Etc.

From these applications, common critical kernels and compact applications are distilled to define system requirements and characteristics, drive HPCS technology, and shape benchmarks & metrics; the result is improved mission capability.

Mission partners: DOD, DOE, NNSA, NSA, NRO
Participants: Cray, IBM, Sun, DARPA
Slide-8
University of Hawaii
Workflow Priorities & Goals

Implicit productivity factors by workflow:

Workflow     Perf.   Prog.   Port.   Robust.
Researcher           High
Enterprise   High    High    High    High
Production   High                    High

[Chart: productivity vs. problem size for workstation, cluster, and HPCS platforms, with researcher, enterprise, and production workflow curves; the HPCS goal arrow points toward higher productivity at larger problem sizes.]

• Workflows define scope of customer priorities
• Activity and Purpose benchmarks will be used to measure productivity
• HPCS goal is to add value to each workflow
– Increase productivity while increasing problem size

Mission needs → system requirements.
Slide-9
University of Hawaii
Productivity Framework Overview

Phase I: Define framework & scope petascale requirements
Phase II: Implement framework & perform design assessments
Phase III: Transition to HPC procurement quality framework

Framework elements:
• Value metrics: execution, development
• Benchmarks: activity, purpose
• Workflows: production, enterprise, researcher

[Diagram: preliminary multilevel system models & prototypes evolve into final multilevel system models & SN001. HPCS vendors, HPCS FFRDC & government R&D partners, and mission agencies run evaluation experiments and acceptance-level tests, under a commercial or nonprofit productivity sponsor.]

HPCS needs to develop a procurement quality assessment methodology that will be the basis of 2010+ HPC procurements.
Slide-10
University of Hawaii
HPCS Phase II Teams

Industry: PI: Smith; PI: Elnozahy; PI: Rulifson
Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).

Productivity Team (MIT Lincoln Laboratory lead): PI: Kepner; PI: Lucas; PI: Koester; PI: Basili; PI: Benson & Snavely; PIs: Vetter, Lusk, Post, Bailey; PIs: Gilbert, Edelman, Ahalt, Mitchell (LCS, Ohio State)
Goal: Develop a procurement quality assessment methodology that will be the basis of 2010+ HPC procurements.
Slide-11
University of Hawaii
Motivation: Metrics Drive Designs

"You get what you measure"

Execution time (example): current metrics favor caches and pipelines, leaving systems ill-suited to applications with low spatial locality and low temporal locality.

[Chart: spatial vs. temporal locality, mapping Top500 Linpack Rmax (high on both), Streams Add, large FFTs (reconnaissance), Table Toy / GUPS (intelligence), and adaptive multi-physics applications (weapons design, vehicle design, weather); HPCS targets the full space.]

Development time (example): no metrics are widely used, and least-common-denominator standards are difficult to use and difficult to optimize.

[Chart: language expressiveness vs. language performance tradeoffs, mapping Matlab/Python (expressive, slower), C/Fortran, MPI/OpenMP, UPC/CAF, and Assembly/VHDL and SIMD/DMA (fast, low-level); the HPCS target is high performance, high level languages.]
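To make "low spatial and temporal locality" concrete, here is a toy Python sketch in the spirit of the Table Toy (GUPS) benchmark named above. It is purely illustrative, not the HPC Challenge RandomAccess code: random indices defeat spatial locality, and a given slot is rarely revisited soon, defeating temporal locality, so cache- and pipeline-friendly designs gain little.

```python
import random

def gups_kernel(table_size, n_updates):
    """Toy GUPS-style kernel: random read-modify-write updates to a big table."""
    table = [0] * table_size
    for _ in range(n_updates):
        i = random.randrange(table_size)  # random index: no spatial locality
        table[i] ^= i                     # each slot rarely touched twice in a row
    return table

gups_kernel(table_size=1 << 20, n_updates=100_000)
```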
Slide-12
University of Hawaii
Phase 1: Productivity Framework

[Diagram: work flows and Activity & Purpose benchmarks are run against an actual system or model, through a common modeling interface, to produce the productivity metrics development time (cost) and execution time (cost), which combine into productivity (ratio of utility/cost).]

System parameters (examples): BW bytes/flop (balance), memory latency, memory size, ...; processor flop/cycle, processor integer op/cycle, bisection BW, ...; size (ft³), power/rack, facility operation, ...; code size, restart time (reliability), code optimization time, ...
Slide-13
University of Hawaii
Phase 2: Implementation

[Diagram: the Phase 1 framework, now with explicit development and execution interfaces feeding the common modeling interface, and with the same productivity metrics and example system parameters.]

Implementation assignments:
• Metrics analysis of current and new codes (Lincoln, UMD & mission partners)
• University experiments (MIT, UCSB, UCSD, UMD, USC)
• Performance analysis (ISI, LLNL & UCSD)
• Other participating groups: ANL & Pmodels group; Mitre, ISI, LBL, Lincoln, HPCMO, LANL & mission partners; Lincoln, OSU, CodeSourcery
Slide-14
University of Hawaii
HPCS Mission Work Flows

[Diagram: overall cycle and development cycle for each workflow.
• Production (response time: hours to minutes): Observe → Orient → Decide → Act.
• Enterprise (overall cycle: months to days): Design → Simulation → Visualize; development spans design, code, test, port/scale/optimize, evaluation, operation, and maintenance, on cycles from years to months down to months to days, covering initial product development and porting legacy software.
• Researcher (hours to minutes; initial development days to hours): Experiment → Theory → Code → Test → Design → Prototyping.
Execution and development activities appear in every workflow.]

HPCS productivity factors (performance, programmability, portability, and robustness) are very closely coupled with each work flow.
Slide-15
University of Hawaii
HPC Workflow SW Technologies

• Many technologies targeting specific pieces of workflow
• Need to quantify workflows (stages and % time spent)
• Need to measure technology impact on stages

Production workflow stages: Spec → Algorithm Development → Design, Code, Test → Port, Scale, Optimize → Run, moving from workstation to supercomputer.

Technology layers: operating systems, compilers, libraries, tools, and problem solving environments. Example technologies (mainstream and HPC software): Linux, RT Linux; C++, F90; ATLAS, BLAS, FFTW, PETE, PAPI, MPI; Globus, UML; POOMA, CORBA, CCA, PVL, VSIPL, VSIPL++, ESMF, DRI; Matlab, UPC, Co-array, Java; TotalView; OpenMP.
Slide-16
University of Hawaii
Prototype Productivity Models

• Special model with work estimator (Sterling)
• Least action (Numrich)
• Efficiency and power (Kennedy, Koelbel, Schreiber)
• Time-to-solution (Kogge), with CoCoMo II-style effort multipliers and scale factors (software engineering community)
• Productivity factor based (Kepner): useful ops per second relative to hardware cost, scaled by mission and productivity factors built from language level, parallel model, portability, availability, and maintenance
• Utility (Snir): productivity P(S, A, U(·)) as the utility U(T(S, A, Cost)) obtained per unit cost

[Chart: programming time vs. execution time, hours to years on each axis. Missions split into execution-bounded (e.g., surveillance, cryptanalysis, intelligence, operational weather) and programming-bounded (e.g., weapons design, research weather); the HPCS goal pushes both times down.]

HPCS has triggered ground-breaking activity in understanding HPC productivity: the community is focused on quantifiable productivity (potential for broad impact).
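Two of these models, plus the CoCoMo II form, can be sketched compactly in LaTeX. This is a reconstruction from the attributions above; the exact notation of the original formulations, and the direction of the cost optimization in the utility model, are assumptions:

```latex
% Least action (Numrich): development and computation work rates integrate
% into an action S; the most productive workflow makes the action stationary.
S = \int \left( w_{\mathrm{dev}} + w_{\mathrm{comp}} \right) dt, \qquad \delta S = 0

% CoCoMo II (software engineering community): effort grows with size,
% shaped by scale factors (the exponent E) and effort multipliers EM_i.
\mathrm{Effort} = A \times \mathrm{Size}^{E} \times \prod_i \mathrm{EM}_i

% Utility (Snir): productivity of system S on application A under utility
% function U(.) is utility per unit cost, where T(S, A, Cost) is the
% time-to-solution achievable at a given cost. (Optimization direction assumed.)
P(S, A, U(\cdot)) = \max_{\mathrm{Cost}} \frac{U\left(T(S, A, \mathrm{Cost})\right)}{\mathrm{Cost}}
```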
Slide-17
University of Hawaii
Example Existing Code Analysis

[Charts: MG performance, and NAS MG line counts (roughly 0-1200 lines) for MPI, Java, HPF, OpenMP, serial, and A-ZPL implementations, broken down into computation, declarations, and comm/sync/directives.]

Analysis of existing codes is used to test metrics and identify important trends in productivity and performance.
Slide-18
University of Hawaii
Example Experiment Results (N=1)

• Same application (image filtering)
• Same programmer
• Different langs/libs: Matlab, BLAS, BLAS/OpenMP, BLAS/MPI*, PVL/BLAS/MPI*, MatlabMPI, pMatlab* (*estimate)

[Chart: performance (speedup × efficiency, roughly 0.1 to 1000) vs. development time (lines of code, 0 to 1000). Implementations range from single-processor Matlab and C through shared-memory C++, BLAS, and BLAS/OpenMP to distributed-memory BLAS/MPI, PVL/BLAS/MPI, MatlabMPI, and pMatlab, contrasting research approaches with current practice.]

Controlled experiments can potentially measure the impact of different technologies and quantify development time and execution time tradeoffs.
Slide-19
University of Hawaii
Summary
• Goal is to develop an acquisition quality framework for HPC systems that includes
– Development time
– Execution time
• Have assembled a team that will develop models, analyze existing HPC codes, develop tools, and conduct HPC development time and execution time experiments
• Measures of success
– Acceptance by users, vendors, and the acquisition community
– Quantitatively explain HPC rules of thumb:
• "OpenMP is easier than MPI, but doesn't scale as high"
• "UPC/CAF is easier than OpenMP"
• "Matlab is easier than Fortran, but isn't as fast"
– Predict impact of new technologies
Slide-20
University of Hawaii
Example Development Time Experiment
• Goal: Quantify development time vs. execution time tradeoffs of different parallel programming models
– Message passing (MPI)
– Threaded (OpenMP)
– Array (UPC, Co-Array Fortran)
• Setting: Senior/1st year grad class in parallel computing (MIT/BU, Berkeley/NERSC, CMU/PSC, UMD/?, …)
• Timeline:
– Month 1: Intro to parallel programming
– Month 2: Implement serial version of compact app
– Month 3: Implement parallel version
• Metrics:
– Development time (from logs; see the sketch after this list), SLOCs, function points, …
– Execution time, scalability, comp/comm, speedup, …
• Analysis:
– Development time vs. execution time of different models
– Performance relative to expert implementation
– Size relative to expert implementation
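Measuring "development time (from logs)" needs a rule for idle gaps. A minimal Python sketch of one plausible approach, summing inter-event gaps below an assumed five-minute idle cutoff (the cutoff is illustrative, not an HPCS-prescribed parameter):

```python
from datetime import datetime, timedelta

def active_time(timestamps, idle_cutoff=timedelta(minutes=5)):
    """Estimate development time from sorted event timestamps: sum the gaps
    between consecutive events, discarding gaps longer than the idle cutoff
    (the developer presumably stepped away)."""
    total = timedelta()
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap <= idle_cutoff:
            total += gap
    return total

events = [datetime(2006, 3, 1, 10, 0), datetime(2006, 3, 1, 10, 2),
          datetime(2006, 3, 1, 10, 3), datetime(2006, 3, 1, 11, 30)]
print(active_time(events))  # 0:03:00 -- the 87-minute gap is discarded
```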
Slide-21
University of Hawaii
Hackystat in HPCS
Slide-22
University of Hawaii
About Hackystat
• Five years old:
– I wrote the first LOC during the first week of May, 2001.
– Current size: 320,562 LOC (not all mine)
– ~5 active developers
– Open source, GPL
• General application areas:
– Education: teaching measurement in SE
– Research: Test Driven Design, Software Project Telemetry, HPCS
– Industry: project management
• Has inspired a startup: 6th Sense Analytics
Slide-23
University of Hawaii
Goals for Hackystat-HPCS
• Support automated collection of useful low-level data for a wide variety of platforms, organizations, and application areas.
• Make Hackystat low-level data accessible in a standard XML format for analysis by other tools.
• Provide workflow and other analyses over low-level data collected by Hackystat and other tools to support:
– discovery of developmental bottlenecks
– insight into impact of tool/language/library choice for specific applications/organizations.
Slide-24
University of Hawaii
Pilot Study, Spring 2006
• Goal: Explore issues involved in workflow analysis using Hackystat and students.
• Experimental conditions (were challenging):
– Undergraduate HPC seminar
– 6 students total, 3 did the assignment, 1 collected data
– 1 week duration
– Gauss-Seidel iteration problem, written in C, using the PThreads library, on a cluster
• As a pilot study, it was successful.
Slide-25
University of Hawaii
Data Collection: Sensors
• Sensors for Emacs and Vim captured editing activities.
• Sensor for CUTest captured testing activities.
• Sensor for Shell captured command line activities.
• Custom makefile with compilation, testing, and execution targets, each instrumented with sensors.
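As an illustration of how an instrumented makefile target can report events, here is a hedged Python sketch of a command-line sensor that appends a timestamped record to an XML log. The element and attribute names are hypothetical, not Hackystat's actual sensor data schema:

```python
import sys
import time
import xml.etree.ElementTree as ET

def record_event(logfile, activity, tool):
    """Append one timestamped sensor event to an XML log file."""
    try:
        tree = ET.parse(logfile)
        root = tree.getroot()
    except (OSError, ET.ParseError):   # first run: start a fresh log
        root = ET.Element("SensorData")
        tree = ET.ElementTree(root)
    event = ET.SubElement(root, "Event")
    event.set("timestamp", time.strftime("%Y-%m-%dT%H:%M:%S"))
    event.set("activity", activity)    # e.g. "compile", "test", "execute"
    event.set("tool", tool)            # e.g. "gcc", "CUTest", "a.out"
    tree.write(logfile)

if __name__ == "__main__":
    # Called from an instrumented makefile target, e.g.:
    #   compile: ; gcc -pthread gauss.c && python sensor.py compile gcc
    record_event("sensorlog.xml", sys.argv[1], sys.argv[2])
```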
Slide-26
University of Hawaii
Example data: Editor activities
Slide-27
University of Hawaii
Example data: Testing
Slide-28
University of Hawaii
Example data: File Metrics
Slide-29
University of Hawaii
Example data: Shell Logger
Slide-30
University of Hawaii
Data Analysis: Workflow States
• Our goal was to see if we could automatically infer the following developer workflow states:
– Serial coding
– Parallel coding
– Validation/Verification
– Debugging
– Optimization
Slide-31
University of Hawaii
Workflow State Detection: Serial coding
• We defined the "serial coding" state as the editing of a file not containing any parallel constructs, such as MPI, OpenMP, or PThread calls.
• We determined this through the MakeFile, which runs SCLC over the program at compile time and collects Hackystat FileMetric data that provides counts of parallel constructs (see the sketch below).
• We were able to identify the Serial Coding state if the MakeFile was used consistently.
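A minimal Python sketch of this kind of classification: count parallel constructs in a C source file and label the edit accordingly. The regular expressions are illustrative assumptions; SCLC's actual counting rules may differ:

```python
import re

# Illustrative patterns for parallel constructs in C source.
PARALLEL_PATTERNS = [
    r"\bMPI_\w+",         # MPI calls, e.g. MPI_Send
    r"#\s*pragma\s+omp",  # OpenMP directives
    r"\bpthread_\w+",     # PThreads calls, e.g. pthread_create
]

def count_parallel_constructs(path):
    with open(path) as f:
        source = f.read()
    return sum(len(re.findall(p, source)) for p in PARALLEL_PATTERNS)

def coding_state(path):
    """Label a file edit as parallel or serial coding."""
    return "parallel coding" if count_parallel_constructs(path) else "serial coding"
```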
Slide-32
University of Hawaii
Workflow State Detection: Parallel Coding
• We defined the "parallel coding" state as the editing of a file containing a parallel construct (MPI, OpenMP, PThread call).
• Similarly to serial coding, we get the data required to infer this phase using a MakeFile that runs SCLC and collects FileMetric data.
• We were able to identify the parallel coding state if the MakeFile was used consistently.
Slide-33
University of Hawaii
Workflow State Detection: Testing
• We defined the "testing" state as the invocation of unit tests to determine the functional correctness of the program.
• Students were provided with test cases and the CUTest framework to test their programs.
• We were able to infer the Testing state if CUTest was used consistently.
Slide-34
University of Hawaii
Workflow State Detection: Debugging
• We have not yet been able to generate satisfactory heuristics to infer the "debugging" state from our data.
– Students did not use a debugging tool that would have allowed instrumentation with a sensor.
– UMD heuristics, such as the presence of "printf" statements, were not collected by SCLC.
– Debugging is entwined with Testing.
Slide-35
University of Hawaii
Workflow State Detection:Optimization
• We have not yet been able to generate satisfactory heuristics to infer the "optimization" state from our data.
– Students did not use a performance analysis tool that would have allowed instrumentation with a sensor.
– Repeated command line invocation of the program could potentially identify the activity as "optimization" (see the sketch below).
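A Python sketch of that heuristic: scan the shell log for consecutive invocations of the same program, suggesting a tweak-run-measure loop. The three-repeat threshold is an illustrative assumption:

```python
def looks_like_optimization(commands, min_repeats=3):
    """True if some program is invoked min_repeats or more times in a row."""
    run = 1
    for prev, cur in zip(commands, commands[1:]):
        # Compare the program name (first token) of consecutive commands.
        run = run + 1 if cur.split()[0] == prev.split()[0] else 1
        if run >= min_repeats:
            return True
    return False

print(looks_like_optimization(["make", "./gauss 4", "./gauss 8", "./gauss 16"]))  # True
```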
Slide-36
University of Hawaii
Insights from the pilot study, 1
• Automatic inference of these workflow states in a student setting requires:
– Consistent use of MakeFile (or some other mechanism to invoke SCLC consistently) to infer serial coding and parallel coding workflow states.
– Consistent use of an instrumented debugging tool to infer the debugging workflow state.
– Consistent use of an "execute" MakeFile target (and/or an instrumented performance analysis tool) to infer the optimization workflow state.
Slide-37
University of Hawaii
Insights from the pilot study, 2
• Ironically, it may be easier to infer workflow states from industrial settings than from classroom settings!
– Industrial settings are more likely to use a wider variety of tools which could be instrumented and provide better insight into development activities.
– Large scale programming leads inexorably to consistent use of MakeFiles (or similar scripts) that should simplify state inference.
Slide-38
University of Hawaii
Insights from the pilot study, 3
• Are we defining the right set of workflow states?
• For example, the "debugging" phase seems difficult to distinguish as a distinct state.
• Do we really need to infer "debugging" as a distinct activity?
• Workflow inference heuristics appear to be highly contextual, depending upon the language, toolset, organization, and application. (This is not a bug, this is just reality. We will probably need to enable each MP to develop heuristics that work for them.)
Slide-39
University of Hawaii
Next steps
• Graduate HPC classes at UH
– The instructor (Henri Casanova) has agreed to participate with UMD and UH/Hackystat in data collection and analysis.
– Bigger assignments, more sophisticated students, hopefully a larger class!
• Workflow Inference System for Hackystat (WISH)
– Support export of raw data to other tools.
– Support import of raw data from other tools.
– Provide a high-level rule-based inference mechanism to support organization-specific heuristics for workflow state identification (see the sketch below).
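A minimal Python sketch of the kind of rule-based inference WISH could provide. Event fields and rules here are hypothetical; the point is that each mission partner supplies its own ordered rule set:

```python
from typing import Callable, Optional

Event = dict  # e.g. {"tool": "emacs", "file": "solver.c", "parallel_constructs": 2}
Rule = Callable[[Event], Optional[str]]

# Hypothetical organization-specific rules; list order encodes priority.
RULES = [
    lambda e: "testing" if e.get("tool") == "CUTest" else None,
    lambda e: "parallel coding" if e.get("parallel_constructs", 0) > 0 else None,
    lambda e: "serial coding" if e.get("tool") in ("emacs", "vim") else None,
]

def infer_state(event: Event) -> str:
    """Return the workflow state asserted by the first matching rule."""
    for rule in RULES:
        state = rule(event)
        if state:
            return state
    return "unknown"

print(infer_state({"tool": "emacs", "parallel_constructs": 2}))  # parallel coding
```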