SOS7, Durango CO, 4-Mar-2003

Scaling to New Heights Retrospective
IEEE/ACM SC2002 Conference, Baltimore, MD
[Trimmed & Distilled for SOS7 by M. Levine, 4-March-2003, Durango]

Contacts and References

• David O’Neal [email protected]

• John Urbanic [email protected]

• Sergiu Sanielevici [email protected]

Workshop materials: www.psc.edu/training/scaling/workshop.html

Introduction

• More than 80 researchers from universities, research centers, and corporations around the country attended the first "Scaling to New Heights" workshop, May 20 and 21, 2002, at the PSC, Pittsburgh.

• Sponsored by the NSF leading-edge centers (NCSA, PSC, SDSC) together with the Center for Computational Sciences (ORNL) and NERSC, the workshop included a poster session, invited and contributed talks, and a panel.

• Participants examined issues involved in adapting and developing research software to effectively exploit systems composed of thousands of processors. [Fred/Neil’s Q1.]

The following slides represent a collection of ideas from the workshop.

Basic Concepts

• All application components must scale
• Control granularity; Virtualize
• Incorporate latency tolerance
• Reduce dependency on synchronization
• Maintain per-process load; Facilitate balance

The only new aspect, at larger scale, is the degree to which these things matter.

Poor Scalability? (Keep your eye on the ball)

[Figure: Speedup plotted against Processors]

Good Scalability? (Keep your eye on the ball)

[Figure: Speedup plotted against Processors]

Performance is the Goal! (Keep your eye on the ball)

[Figure: Speedup plotted against Processors]
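One way to read the three "keep your eye on the ball" slides is that a steep speedup curve does not by itself mean a fast code, because speedup is measured against that code's own baseline. The sketch below illustrates this with two sets of timings. The timings and the names `code_a` and `code_b` are hypothetical, invented for illustration, and do not come from the workshop.

```python
def speedup(times):
    """Speedup at each processor count, relative to the smallest count measured."""
    base = times[min(times)]
    return {p: base / t for p, t in times.items()}

# Hypothetical wall-clock times in seconds, keyed by processor count.
code_a = {1: 1000.0, 64: 20.0, 512: 4.0}  # slow baseline, steep speedup curve
code_b = {1: 200.0, 64: 6.0, 512: 2.0}    # tuned baseline, flatter speedup curve

speedup_a = speedup(code_a)
speedup_b = speedup(code_b)

for p in sorted(code_a):
    print(f"P={p:4d}  A: {code_a[p]:7.1f}s (x{speedup_a[p]:.0f})  "
          f"B: {code_b[p]:7.1f}s (x{speedup_b[p]:.0f})")
```

With these made-up numbers, code A shows the better speedup curve, yet code B finishes sooner at every processor count, which is the quantity a user actually cares about.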

Issues and Remedies

• Granularity [Q2a]

• Latencies [Q2b]

• Synchronization

• Load Balancing [Q2c]

• Heterogeneous Considerations

Granularity

Define problem in terms of a large number of small objects independent of the process count [Q2a]

• Object design considerations
– Caching and other local effects
– Communication-to-computation ratio

• Control granularity through virtualization
– Maintain per-process load level
– Manage comms within virtual blocks, e.g. Converse
– Facilitate dynamic load balancing
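The idea of defining the problem as many small objects, independent of the process count, can be sketched in a few lines. This is a minimal illustration only: the function names `decompose` and `map_objects`, the 1-D domain, and the round-robin mapping are assumptions made here, and the code does not use or imitate the Converse API.

```python
def decompose(domain_size, num_objects):
    """Split a 1-D domain into num_objects nearly equal (start, end) blocks."""
    base, extra = divmod(domain_size, num_objects)
    blocks, start = [], 0
    for i in range(num_objects):
        end = start + base + (1 if i < extra else 0)
        blocks.append((start, end))
        start = end
    return blocks

def map_objects(num_objects, num_processes):
    """Round-robin map of object ids onto processes (hypothetical policy)."""
    mapping = {p: [] for p in range(num_processes)}
    for obj in range(num_objects):
        mapping[obj % num_processes].append(obj)
    return mapping

# The object count is fixed by the problem, not by the machine.
blocks = decompose(domain_size=10_000, num_objects=256)

# The same 256 objects can be placed on 8 or 12 processes without
# changing the decomposition itself.
mapping_8 = map_objects(len(blocks), 8)
mapping_12 = map_objects(len(blocks), 12)
```

Because the objects are much more numerous than the processes, the per-process object counts differ by at most one here, and a runtime is free to move individual objects later.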

Latencies

• Network
– Latency reduction lags improvement in flop rates; Much easier to grow bandwidth
– Overlap communications and computations; Pipeline larger messages
– Don’t wait – Speculate! [Q2b]

• Software Overheads
– Can be more significant than network delays
– NUMA architectures

Scalable designs must accommodate latencies
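A rough cost model can make the "overlap" and "pipeline" advice concrete. The sketch below assumes the common latency-plus-bandwidth model for message time and a simple two-stage pipeline in which the work on one chunk overlaps the transfer of the next. The model, the function names, and every parameter value are assumptions chosen for illustration, not measurements from the workshop.

```python
def transfer_time(nbytes, latency, bandwidth):
    """Latency-plus-bandwidth estimate of the time to send one message."""
    return latency + nbytes / bandwidth

def serial_time(nbytes, compute, latency, bandwidth):
    """Receive the whole message, then compute on it (no overlap)."""
    return transfer_time(nbytes, latency, bandwidth) + compute

def pipelined_time(nbytes, compute, latency, bandwidth, chunks):
    """Split the message into chunks; compute on chunk k while chunk k+1 arrives."""
    t_chunk = transfer_time(nbytes / chunks, latency, bandwidth)
    c_chunk = compute / chunks
    # First chunk must arrive, then the slower stage paces the pipeline,
    # and the last chunk still has to be computed.
    return t_chunk + (chunks - 1) * max(t_chunk, c_chunk) + c_chunk

# Hypothetical parameters: 5 microsecond latency, 1 GB/s, 8 MB message, 8 ms of work.
params = dict(nbytes=8e6, compute=8e-3, latency=5e-6, bandwidth=1e9)

t_serial = serial_time(**params)
t_pipe_8 = pipelined_time(chunks=8, **params)
t_pipe_many = pipelined_time(chunks=100_000, **params)
print(t_serial, t_pipe_8, t_pipe_many)
```

Under these assumptions a handful of chunks hides most of the transfer behind computation, while a very large number of tiny chunks pays the per-message latency so often that it ends up slower than not pipelining at all.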

Synchronization

• Cost increases with the process count
– Synchronization doesn’t scale well
– Latencies come into play here too

• Distributed resource exacerbates problems
– Heterogeneity another significant obstacle

• Regular communication patterns are often characterized by many synchronizations
– Best suited to homogeneous co-located clusters

Transition to asynchronous models?
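One reason synchronization cost grows with the process count is that a barrier makes every process wait for the slowest one, and the slowest of many is typically slower than the slowest of a few. The toy simulation below illustrates this effect. The uniform jitter of plus or minus 10 percent, the seed, and the function names are assumptions made for the sketch.

```python
import random

def barrier_step_time(step_times, num_processes):
    """With a barrier, a step lasts as long as the slowest participating process."""
    return max(step_times[:num_processes])

def idle_fraction(step_times, num_processes):
    """Average share of the step that processes spend waiting at the barrier."""
    used = step_times[:num_processes]
    return 1.0 - (sum(used) / len(used)) / max(used)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical per-process compute times for one step: 1.0 s with +/-10% jitter.
step_times = [rng.uniform(0.9, 1.1) for _ in range(4096)]

for p in (16, 256, 4096):
    print(p, barrier_step_time(step_times, p), idle_fraction(step_times, p))
```

The step time can only stay the same or grow as more processes join the barrier, even though the average work per process is unchanged. An asynchronous design avoids paying for the maximum at every step.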

Load Balancing

• Static load balancing
– Reduces to granularity problem
– Differences between processors and network segments are determined a priori

• Dynamic process management requiring distributed monitoring capabilities [Q2c]
– Must be scalable
– System maps objects to processes
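The step "system maps objects to processes" can be sketched as a remapping pass driven by measured object loads. The example below uses a simple greedy heuristic (heaviest object first, onto the currently least-loaded process). This heuristic is swapped in for illustration and is not claimed to be what any workshop participant used. It is also centralized, whereas the slide calls for distributed, scalable monitoring. The loads and names are hypothetical.

```python
import heapq

def rebalance(object_loads, num_processes):
    """Greedy remap: place the heaviest objects first on the least-loaded process."""
    heap = [(0.0, p) for p in range(num_processes)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processes)}
    by_weight = sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True)
    for obj, load in by_weight:
        total, p = heapq.heappop(heap)
        assignment[p].append(obj)
        heapq.heappush(heap, (total + load, p))
    return assignment

def process_loads(assignment, object_loads):
    """Total measured load per process for a given object-to-process mapping."""
    return {p: sum(object_loads[o] for o in objs) for p, objs in assignment.items()}

# Hypothetical measured loads for 64 objects (arbitrary units).
object_loads = {i: 1.0 + (i % 7) for i in range(64)}

# Initial static mapping: contiguous blocks of 16 objects on each of 4 processes.
initial = {p: list(range(p * 16, (p + 1) * 16)) for p in range(4)}
balanced = rebalance(object_loads, 4)

print(process_loads(initial, object_loads))
print(process_loads(balanced, object_loads))
```

The remapping only works well because there are many more objects than processes, which ties this slide back to the granularity advice.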

Heterogeneous Considerations

• Similar but different processors or network components configured within a single cluster
– Different clock rates, NICs, etc.

• Distinct processors, networking segments, and operating systems operating at a distance
– Grid resources

Elevates significance of dynamic load balancing; Data-driven objects immediately adaptable

Tools [Q2d?]

• Automated algorithm selection and performance tuning by empirical means, e.g. ATLAS
– Generate space of algorithms and search for fastest implementations by running them

• Scalability prediction, e.g. PMaC Lab
– Develop performance models (machine profiles; application signatures) and trending patterns

Identify/fix bottlenecks; choose new methods?
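Empirical tuning of the kind the ATLAS bullet describes amounts to generating several equivalent implementations, timing each on the target machine, and keeping the fastest. The sketch below shows that loop at toy scale using Python's standard `timeit` module. The three dot-product variants and the helper `select_fastest` are hypothetical stand-ins, not ATLAS code.

```python
import operator
import timeit

def dot_loop(x, y):
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

def dot_generator(x, y):
    return sum(a * b for a, b in zip(x, y))

def dot_map(x, y):
    return sum(map(operator.mul, x, y))

def select_fastest(candidates, args, repeats=3, number=20):
    """Time every candidate on the given inputs and return the fastest name."""
    timings = {}
    for name, fn in candidates.items():
        runs = timeit.repeat(lambda fn=fn: fn(*args), repeat=repeats, number=number)
        timings[name] = min(runs)  # best-of-N reduces noise from other activity
    return min(timings, key=timings.get), timings

candidates = {"loop": dot_loop, "generator": dot_generator, "map": dot_map}
# Integer-valued floats so all variants agree exactly.
x = [float(i % 10) for i in range(1000)]
y = [float(i % 7) for i in range(1000)]

best, timings = select_fastest(candidates, (x, y))
print("fastest on this machine:", best)
```

Which variant wins depends on the machine and interpreter, which is the point of choosing by measurement rather than by reasoning alone.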

Topics for Discussion

• How should large, scalable computational science problems be posed?

• Should existing algorithms and codes be modified or should new ones be developed?

• Should agencies explicitly fund collaborations to develop industrial-strength, efficient, scalable codes?

• What should cyber-infrastructure builders and operators do to help scientists develop and run good applications?

Summary Comments (MJL)

• Substantial progress, with scientific payoff, is being made.

• It is hard work without magic bullets.

• >>> Dynamic load balancing <<<
– Big payoff, homogeneous and heterogeneous
– Requires considerable people work to implement
– Runtime overhead very small.