Performance Visualizations using XML Representations

Presented by Kristof Beyls

Yijun Yu

Erik H. D’Hollander

Overview

1. Background: program optimization research

2. XML representations

3. Visualizations

4. Conclusion

Program optimization research

What slows down a program execution?
Need to pinpoint the performance bottlenecks (by analyzing the program).

How to improve the performance?
By program transformations, based on the pinpointed bottlenecks.

How to transform the program?

1. Compiler
advantage: automatic optimization
disadvantage: sometimes hard to understand what the program does

2. Programmer
advantage: has a good understanding of program functionality
disadvantage: requires human effort. How best to present performance bottlenecks?

How to construct a research infrastructure that supports all the above in a common framework? (XML)

Two main performance factors

Parallelism: performing computation in parallel reduces execution time

Data locality: fetching data from fast CPU caches reduces execution time

Why XML representations?

Extensible and versatile

Standard and interoperable

Language independent

XML namespace (tool): what it represents

1. ast (yaxx): abstract syntax tree

2. par (oc): identified parallel or sequential loops

3. trace (isv, cv): execution trace of memory instructions

4. hotspot (isv, cv): performance bottleneck locations

5. isdg (isv): iteration space dependence graph

6. rdv (distv): a reuse distance vector

Tools:
yaxx: YACC extension to XML
oc: Omega calculator
isv: iteration space visualizer
cv: cache (trace) visualizer
distv: (cache reuse) distance visualizer

1. AST (Abstract Syntax Tree) (ast)

XML is a good representation for an AST because of its hierarchical nature.

The ast namespace captures the syntactical information of a program.

We can construct the AST from source code through YAXX and regenerate source code through XSLT.

<ast:DO_Loop>
  <var name="I"/>
  <lb><const value="1"/></lb>
  <ub><const value="10"/></ub>
  <st><const value="1"/></st>
  <body>…</body>
</ast:DO_Loop>

DO I=1,10,1
  ……
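The mapping from the ast:DO_Loop element back to the Fortran DO header can be sketched in a few lines. The slides regenerate source through XSLT; this sketch swaps in Python's standard xml.etree.ElementTree purely for illustration. The namespace URI and the helper names (const_of, unparse_do_header) are assumptions, since the slides show only the "ast" prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the slides use the "ast" prefix but never show its URI.
AST_NS = "http://example.org/ast"

DOC = f"""
<ast:DO_Loop xmlns:ast="{AST_NS}">
  <var name="I"/>
  <lb><const value="1"/></lb>
  <ub><const value="10"/></ub>
  <st><const value="1"/></st>
  <body/>
</ast:DO_Loop>
"""

def const_of(loop, tag):
    """Return the constant stored under <lb>, <ub> or <st>."""
    return loop.find(f"{tag}/const").get("value")

def unparse_do_header(loop):
    """Rebuild the Fortran DO header from an ast:DO_Loop element."""
    if loop.tag != f"{{{AST_NS}}}DO_Loop":
        raise ValueError("expected an ast:DO_Loop element")
    var = loop.find("var").get("name")
    lb, ub, st = (const_of(loop, t) for t in ("lb", "ub", "st"))
    return f"DO {var}={lb},{ub},{st}"

header = unparse_do_header(ET.fromstring(DOC))
print(header)  # DO I=1,10,1
```

Only constant bounds are handled here; a real unparser would recurse into expression subtrees and the loop body.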

2. Parallel loops (par)

Identified parallel loops are annotated with a <par:true/> element in the "par" namespace.

<ast:DO_Loop>
  <par:true/>
  …
</ast:DO_Loop>

In this way, semantic and syntactic information live in orthogonal namespaces. Syntax-based tools (e.g. the unparser) can still ignore the annotation, or translate it into directive comments, e.g. Fortran C$DOALL.

XFPT: an extended optimizing compiler

3. Traces (trace)

A trace records a sequence of memory address accesses:

<trace:seq>
  <access addr="0x00ffe8" bytes="8" />
  <access addr="0x00fff0" bytes="16" />
  ……
</trace:seq>

A trace alone can be used to identify run-time data dependences and, through a cache simulator, cache misses.

When each address is associated with the array reference number or loop iteration index on the program's AST, the trace can be used for advanced loop dependence analysis and cache reuse distance analysis.

<trace:seq>
  <access addr="0x00ffe8" bytes="8" hotspot:id="1">   <!-- The 1st reference -->
    <do_loop hotspot:id="1" vector="1 2"/>   <!-- The 1st DO loop: (I,J)=(1,2) -->
    <array hotspot:id="1" vector="1"/>   <!-- Reference to array element X(1) -->
  </access>
  ……
</trace:seq>
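A consumer of such a trace might load it as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree; the namespace URIs and the read_trace helper are assumptions (the slides show only the prefixes), and the sample document simply combines the two access elements from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the slides only show the "hotspot" prefix.
HOTSPOT_NS = "http://example.org/hotspot"

DOC = """
<trace:seq xmlns:trace="http://example.org/trace"
           xmlns:hotspot="http://example.org/hotspot">
  <access addr="0x00ffe8" bytes="8" hotspot:id="1">
    <do_loop hotspot:id="1" vector="1 2"/>
    <array hotspot:id="1" vector="1"/>
  </access>
  <access addr="0x00fff0" bytes="16"/>
</trace:seq>
"""

def read_trace(xml_text):
    """Return one record per <access>, in trace order."""
    id_attr = f"{{{HOTSPOT_NS}}}id"  # namespaced attribute name in ElementTree form
    records = []
    for access in ET.fromstring(xml_text).findall("access"):
        loop = access.find("do_loop")
        iteration = None
        if loop is not None:
            iteration = tuple(int(v) for v in loop.get("vector").split())
        records.append({
            "addr": int(access.get("addr"), 16),   # hexadecimal address
            "bytes": int(access.get("bytes")),
            "reference": access.get(id_attr),      # None when not annotated
            "iteration": iteration,                # loop index vector, e.g. (I, J)
        })
    return records

records = read_trace(DOC)
```

Each record keeps the raw address together with the optional hotspot annotation, so the same list can feed either a cache simulator (addresses only) or a dependence analysis (addresses plus iteration vectors).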

4. Hotspots (hotspot)

Hot spots are the identified bottlenecks of the program. Two types are used:

Bottleneck loops: tell which loops are performance bottlenecks

Bottleneck references: tell which references are performance bottlenecks

<hotspot:list>
  <do_loop id="1">
    <index vector="I J"/>
    <start lineno="3" colno="1"/>
    <end lineno="7" colno="12"/>
  </do_loop>
  ……
  <array id="2" name="X">
  </array>
  ……
</hotspot:list>

1 DIM T(3), X(10)
2 REAL S, X
3 DO I = 1, 10
4   DO J = 1, 10
5     S = S + X(I)*J
6   ENDDO
7 ENDDO
8 …

Performance Visualizations

XML plays an important role in gluing the visualizers to an optimizing compiler:

1. Loop dependence visualization

2. Reuse distance visualization

3. Cache behavior visualization

Visualization 1: ISDG (iteration space dependence graph)

An iteration is an instance of the loop body statements. An iteration space is the set of integer vector values of the DO loop index variables for the traversed iterations.

A loop-carried dependence is a dependence caused by two references R1 and R2 that access the same memory address, where:

1. One of R1, R2 is a write

2. R1 belongs to loop iteration (i1, j1) and R2 belongs to loop iteration (i2, j2), with (i2, j2) ≠ (i1, j1)

An ISDG is a graph with nodes representing the iteration space and edges representing loop-carried dependences.

DO i=1,5
  DO j=1,5
    A(i,j) = A(i,j+1)
  ENDDO
ENDDO
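The ISDG of this loop nest can be derived directly from the definition. The sketch below is an illustration, not the ISV implementation: it replays the accesses of the example in execution order and links any two different iterations that touch the same array element when at least one access is a write. The function names trace_example and build_isdg are hypothetical.

```python
from collections import defaultdict

def trace_example(n=5):
    """Yield (iteration, address, is_write) for A(i,j) = A(i,j+1), i, j = 1..n."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            yield (i, j), ("A", i, j + 1), False  # right-hand side is read first
            yield (i, j), ("A", i, j), True       # then the left-hand side is written

def build_isdg(accesses):
    """Return the set of edges (earlier iteration, later iteration)."""
    by_address = defaultdict(list)
    for iteration, address, is_write in accesses:
        by_address[address].append((iteration, is_write))
    edges = set()
    for history in by_address.values():
        for k, (later_it, later_w) in enumerate(history):
            for earlier_it, earlier_w in history[:k]:
                # Dependence: same address, at least one write, different iterations.
                if (earlier_w or later_w) and earlier_it != later_it:
                    edges.add((earlier_it, later_it))
    return edges

edges = build_isdg(trace_example())
distances = {(t[0] - s[0], t[1] - s[1]) for s, t in edges}
print(len(edges), distances)  # 20 {(0, 1)}
```

Every edge has distance vector (0, 1): iteration (i, j) reads A(i, j+1) before iteration (i, j+1) overwrites it. No edge crosses different values of i, so in this example the outer loop carries no dependence.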

The WTCM CFD application

WTCM has a Computational Fluid Dynamics simulator that solves partial differential equations (PDEs) with a Gauss-Seidel solver.

Temperature: 3D geometry + 1D time

The visualized dependences

The loop transformation

A 3-D unimodular transformation is found after visualizing the 4-D loop nest, which has 177 array references at run time in each iteration. Here we use a regular shape. The transformation makes it possible to speed up the program by around N²/6 times, where N is the diameter of the geometry.

Visualization 2: Reuse distances

Reuse distance is the amount of data accessed before a memory address is reused.

reuse distance > cache size ⇒ cache miss
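A reuse distance can be computed from an address trace with an LRU stack. The sketch below is a simple quadratic-time illustration, not the ReuseVis algorithm. It assumes the distance counts the distinct other addresses touched since the previous use, so a fully associative LRU cache holding C addresses misses when the distance is at least C. If the reused address itself were counted, the test would read "distance > C", as on the slide. The names reuse_distances and count_misses are hypothetical.

```python
def reuse_distances(trace):
    """Return one distance per access; None marks a first-time access."""
    stack = []       # most recently used address first
    distances = []
    for address in trace:
        if address in stack:
            depth = stack.index(address)  # distinct addresses touched since last use
            distances.append(depth)
            stack.pop(depth)
        else:
            distances.append(None)
        stack.insert(0, address)          # the address becomes most recently used
    return distances

def count_misses(distances, capacity):
    """Misses of a fully associative LRU cache holding `capacity` addresses."""
    return sum(1 for d in distances if d is None or d >= capacity)

example = reuse_distances(list("abcabb"))
print(example)  # [None, None, None, 2, 2, 0]
```

Because the distances are independent of the cache size, one pass over the trace is enough to predict misses for any capacity: count_misses(example, 2) gives 5 and count_misses(example, 3) gives 3.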

Execution time reduction on an Itanium processor (SPEC2000 programs).

Chart legend, per program: calculation, data cache misses, other bottlenecks.

Visualization 3: Cache miss traces (Tomcatv/SPEC95)

White: hit

Blue: compulsory

Green: capacity

Red: conflict
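The four colors correspond to a per-access classification that a small cache simulator can produce. The sketch below uses one common convention, which is an assumption here because the slides do not say how CacheVis classifies misses: a miss is compulsory on the first touch of a cache line, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The function name classify and all parameter values are hypothetical.

```python
from collections import OrderedDict

def lru_access(cache, line, ways):
    """Touch `line` in an LRU-ordered dict of at most `ways` entries; return True on hit."""
    if line in cache:
        cache.move_to_end(line)
        return True
    if len(cache) >= ways:
        cache.popitem(last=False)  # evict the least recently used line
    cache[line] = True
    return False

def classify(addresses, cache_size, line_size, associativity):
    """Label each access as 'hit', 'compulsory', 'capacity' or 'conflict'."""
    num_lines = cache_size // line_size
    num_sets = num_lines // associativity
    sets = [OrderedDict() for _ in range(num_sets)]  # the simulated cache
    full = OrderedDict()                             # fully associative reference cache
    seen = set()
    labels = []
    for addr in addresses:
        line = addr // line_size
        hit = lru_access(sets[line % num_sets], line, associativity)
        full_hit = lru_access(full, line, num_lines)
        if hit:
            labels.append("hit")
        elif line not in seen:
            labels.append("compulsory")
        elif not full_hit:
            labels.append("capacity")
        else:
            labels.append("conflict")
        seen.add(line)
    return labels

# Two lines that map to the same set of a tiny direct-mapped cache (64 B, 32 B lines).
print(classify([0, 64, 0, 64], cache_size=64, line_size=32, associativity=1))
```

In the usage line, addresses 0 and 64 evict each other although the cache has room for both lines, so the last two accesses are labeled as conflict misses.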

4.2 Visualizing hotspots of conflict cache misses

X(I,J+1) and X(I,J) conflict if X has dimensions (512,512). The conflict is resolved by changing the dimensions to (524,524).

This is also known as array padding.
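The effect of padding can be checked with a little address arithmetic. The sketch below assumes Fortran column-major layout and a hypothetical cache (4 KB direct-mapped, 32-byte lines, hence 128 sets) with 8-byte array elements. None of these parameters come from the slides; they are chosen only to make the conflict visible.

```python
def set_index(i, j, leading_dim, elem_bytes=8, line_bytes=32, num_sets=128):
    """Cache set of X(i,j) for a column-major array with the given leading dimension."""
    byte_offset = ((j - 1) * leading_dim + (i - 1)) * elem_bytes
    return (byte_offset // line_bytes) % num_sets

def same_set(leading_dim, i=1, j=1):
    """True when X(i,j) and X(i,j+1) compete for the same cache set."""
    return set_index(i, j, leading_dim) == set_index(i, j + 1, leading_dim)

print(same_set(512))  # True:  column stride 4096 B = 128 lines, a multiple of the 128 sets
print(same_set(524))  # False: column stride 4192 B = 131 lines, shifted by 3 sets
```

With a leading dimension of 512 the two references are exactly one cache size apart and evict each other on every iteration. Padding to 524 shifts neighboring columns into different sets.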

4.2 Cache miss trace after array padding: most spatial locality is exploited and conflict misses are resolved

On a 550 MHz Intel Pentium III (single CPU), the speedup measured with VTune is over 50%.

Conclusion

An existing optimizing compiler FPT was extended with an extensible XML interface.

The performance factors, in particular loop parallelism and data locality, were exported from FPT.

These factors were visualized through:

Loop dependence visualizer ISV

Execution trace visualizer CacheVis

Reuse distance visualizer ReuseVis

The programmer can use the visualized feedback to improve the performance.

The End.

Any questions?

Program semantics (Software) vs. Architecture capabilities (Hardware)

Research area: Parallel computing
Program: Parallelism at task, loop and instruction levels through data dependence analysis
Architecture: Multi-processors (MIMD), pipeline (SIMD), multi-threads, network of workstations (NOW, Grid computing)

Research area: Memory hierarchy
Program: Temporal and spatial data locality, data layout, stack reuse distances
Architecture: Cache at level 1, 2, 3, TLB, set associativity, data replacement policy

2. Major performance factors

Parallelism:
Loop dependences
Loop-level parallelism
Instruction-level parallelism
Partition load balance

Data locality:
Temporal locality
Spatial locality
CCC (Compulsory, Capacity, Conflict) cache misses
Reuse distances

3.6 Cache parameters

To tune for different architectural cache configurations, we represent the cache parameters (cache size, cache line size and set associativity) in an XML configuration file. For example, a 2-level cache is specified as follows:

<cache:hierarchy>
  <parameters level="1">
    <size>1024</size>
    <line>32</line>
    <associativity>32</associativity>
  </parameters>
  <parameters level="2">
    <size>65536</size>
    <line>32</line>
    <associativity>1</associativity>
  </parameters>
</cache:hierarchy>
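A cache simulator could read this configuration as sketched below, using Python's standard xml.etree.ElementTree. The namespace URI and the read_hierarchy helper are assumptions, since the slides show only the "cache" prefix. The derived number of sets, size / (line × associativity), is standard cache arithmetic rather than something stated on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the slides only show the "cache" prefix.
DOC = """
<cache:hierarchy xmlns:cache="http://example.org/cache">
  <parameters level="1">
    <size>1024</size><line>32</line><associativity>32</associativity>
  </parameters>
  <parameters level="2">
    <size>65536</size><line>32</line><associativity>1</associativity>
  </parameters>
</cache:hierarchy>
"""

def read_hierarchy(xml_text):
    """Map each cache level to its parameters plus the derived number of sets."""
    levels = {}
    for params in ET.fromstring(xml_text).findall("parameters"):
        size = int(params.findtext("size"))
        line = int(params.findtext("line"))
        assoc = int(params.findtext("associativity"))
        levels[int(params.get("level"))] = {
            "size": size,
            "line": line,
            "associativity": assoc,
            "sets": size // (line * assoc),
        }
    return levels

levels = read_hierarchy(DOC)
print(levels[1]["sets"], levels[2]["sets"])  # 1 2048
```

With these values, level 1 has a single set (32 lines, 32 ways, so it is fully associative) and level 2 is direct-mapped with 2048 sets.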

4.2 Visualizing data locality histogram distributed over reuse distances
