Show don’t tell: improving vectorization awareness in HPC

Mark O’Connor, VP Product Management




An Uncomfortable Truth About HPC Codes


Hey, at least it compiles

Optimized for modern CPUs


This is a problem when CPU architectures change…

Image © Guido da Rozze CC-BY



The Dragon of Hell © Baltasar Vischi CC-BY-ND


How do we help scientific users get the most out of our machines?

Brave Hero: 1 code 10x faster

Wise Ruler: 100 codes 10% faster
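The two strategies can be compared with a little arithmetic. The sketch below is only an illustration under assumptions that are not in the slides: a hypothetical machine running 100 codes that each consume an equal share of machine time, with "10% faster" read as a 1.1x speedup.

```python
def machine_time_after(shares, speedups):
    """Total machine time once each code runs `speedup` times faster."""
    return sum(share / speedup for share, speedup in zip(shares, speedups))


# Assumption (not from the slides): 100 codes, each using 1% of machine time.
n_codes = 100
shares = [1.0 / n_codes] * n_codes

# Brave Hero: one code becomes 10x faster, the other 99 are untouched.
hero_time = machine_time_after(shares, [10.0] + [1.0] * (n_codes - 1))

# Wise Ruler: all 100 codes become 10% faster (assumed to mean 1.1x).
ruler_time = machine_time_after(shares, [1.1] * n_codes)

hero_saving = 1.0 - hero_time
ruler_saving = 1.0 - ruler_time

print(f"Brave Hero saves {hero_saving:.1%} of machine time")
print(f"Wise Ruler saves {ruler_saving:.1%} of machine time")
```

Under these assumptions the hero frees roughly 0.9% of the machine and the ruler roughly 9.1%. The balance shifts if one code dominates the workload, so the equal-share assumption matters.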

Images © Frank Kovalchek CC-BY


Two very different ways to improve the situation


Can we help scientific developers at scale?

You can teach a man to fish

But first he must realize he is hungry

Image © Kanani CC-BY


Communicating the benefits of optimization


This is your brain on drugs… this is your code on -O0


Show the user with a performance model they understand

“Vectorization, how does it work?”


Communicating at the user’s abstraction level


Out-of-order

Pipelined

Time per retired instruction


Explaining performance in terms of the program counter


Statistical wallclock time estimate of:
• Scalar numeric operations
• AVX/AVX2 operations
• Memory accesses (what about indirect accesses?)
• Other (branch, logic, …)

+ simple, actionable advice
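One way such a statistical estimate could be built is sketched below. It assumes the program counter is sampled at regular wallclock intervals and that each sample has already been resolved to an instruction mnemonic. The mnemonic sets, the `classify` rules and the `breakdown` helper are hypothetical and deliberately crude; they are not taken from any Allinea tool.

```python
from collections import Counter

# Hypothetical, incomplete x86 mnemonic sets, for illustration only.
MEMORY = {"mov", "movsd", "movss", "vmovupd", "vmovapd", "push", "pop"}
SCALAR_NUMERIC = {"addsd", "subsd", "mulsd", "divsd", "sqrtsd", "addss", "mulss"}


def classify(mnemonic):
    """Map one sampled instruction mnemonic to a coarse category."""
    m = mnemonic.lower()
    if m in MEMORY:
        return "memory"
    # Crude assumption: packed AVX/AVX2 arithmetic looks like v...pd / v...ps.
    if m.startswith("v") and m.endswith(("pd", "ps")):
        return "vector"
    if m in SCALAR_NUMERIC:
        return "scalar"
    return "other"  # branches, logic, everything else


def breakdown(samples, wallclock_seconds):
    """Estimate seconds per category from evenly spaced PC samples."""
    counts = Counter(classify(m) for m in samples)
    total = sum(counts.values())
    return {cat: wallclock_seconds * n / total for cat, n in counts.items()}


# Made-up sample data: 100 samples taken over a 10 second run.
samples = ["vaddpd"] * 50 + ["mulsd"] * 20 + ["mov"] * 25 + ["jne"] * 5
estimate = breakdown(samples, wallclock_seconds=10.0)
for category, seconds in sorted(estimate.items()):
    print(f"{category:>7}: {seconds:.1f} s")
```

Because the samples are evenly spaced in time, the share of samples landing in a category approximates the share of wallclock time spent there. The estimate gets noisier as the sample count drops. Indirect memory accesses, which the slide raises as an open question, are not handled by this sketch.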


Not just for vectorization, but for MPI, I/O, memory and energy usage too



Diving deeper with Allinea Forge – where can I improve?


Performance over time

Slow lines of code

Unvectorized loops

MPI bottlenecks

OpenMP bottlenecks

…and much more!


Intel® Xeon Phi™ Knights Landing Support

Debug
• First-class Intel® Xeon Phi™ support
• Memory debugging enhancements for HBM

Tune and Analyze
• First-class Intel® Xeon Phi™ support
• Investigating additional Intel® Xeon Phi™ metrics – watch this space!

Profile
• First-class Intel® Xeon Phi™ support
• Investigating additional Intel® Xeon Phi™ metrics – watch this space!


Thank you! Any questions?

Mark O’Connor, VP Product Management