Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases 20 years supporting research, development and innovation in Galicia


DESCRIPTION

Intel Xeon Phi is a new x86-compatible coprocessor architecture that permits the execution of legacy applications with minimal code changes. Using two real applications as examples, we evaluated the effort needed to run them on the Xeon Phi with minimal code changes, and compared the results against the performance of the host.


Page 1: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases

20 years supporting research, development and innovation in Galicia

Page 2

Objective

• Assess the amount of work needed to use Intel Xeon Phi.
• Minimal modifications, using only pragmas.
• Two applications:
  – CalcuNetW: tests the MKL libraries.
  – GammaMaps: tests pragmas.
• Two modes:
  – Native: compiled to execute only on the Xeon Phi.
  – Offload: uses host + Xeon Phi.

Page 3

CalcuNetw: Calculate Measurements in Complex Networks

• Complex networks consist of sets of nodes (vertices) joined together in pairs by links (edges).
• For each network, the application calculates:
  – Subgraph Centrality (SC): characterizes the participation of each node in all subgraphs in the network.
  – SC odd: accounts only for closed walks of odd length.
  – SC even: accounts only for closed walks of even length.
  – Bipartivity: the proportion of even closed walks to the total number of closed walks in the network.
  – Network Communicability for Connected Nodes, C(p,q): measures how well two nodes in the network communicate.
  – Network Communicability, C(G): the mean of all the C(p,q).

Mouriño J.C., Estrada E., Gomez A. “CalcuNetw: Calculate Measurements in Complex Networks”, Informe Técnico CESGA-2005-003.
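The measures listed above can be illustrated with a small sketch. It is not the CalcuNetw code: it assumes the usual definitions in terms of the matrix exponential of the adjacency matrix A (SC from the diagonal of exp(A), the even and odd parts from cosh(A) and sinh(A), communicability from the off-diagonal entries), evaluates them with a truncated Taylor series in plain Python, and takes the mean communicability over all distinct node pairs. The function names and the number of series terms are hypothetical choices for the example.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_even_odd(adj, terms=40):
    """Split the Taylor series of exp(adj) into even and odd powers.

    Returns approximations of (cosh(adj), sinh(adj)) using `terms` terms.
    """
    n = len(adj)
    # Current series term, starting with adj^0 / 0! (the identity matrix).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        target = even if k % 2 == 0 else odd
        for i in range(n):
            for j in range(n):
                target[i][j] += term[i][j]
        # Next term: adj^(k+1) / (k+1)!
        term = [[v / (k + 1) for v in row] for row in matmul(term, adj)]
    return even, odd


def network_measures(adj):
    """Compute the slide's measures for a graph with at least two nodes."""
    n = len(adj)
    even, odd = exp_even_odd(adj)
    sc_even = [even[i][i] for i in range(n)]
    sc_odd = [odd[i][i] for i in range(n)]
    sc = [e + o for e, o in zip(sc_even, sc_odd)]
    comm = [[even[p][q] + odd[p][q] for q in range(n)] for p in range(n)]
    # Assumption: C(G) is averaged over all distinct pairs p < q.
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    return {
        "sc": sc,
        "sc_even": sc_even,
        "sc_odd": sc_odd,
        "bipartivity": sum(sc_even) / sum(sc),
        "communicability": comm,
        "mean_communicability": sum(comm[p][q] for p, q in pairs) / len(pairs),
    }


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
result = network_measures(triangle)
print(result["bipartivity"])  # below 1, because a triangle contains odd cycles
```

Each series term is a matrix product, which is consistent with a production code spending most of its time in matrix multiplication.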

Page 4

CalcuNetW

• Makes intensive use of DGEMM from BLAS
• Calculates the parameters for the input
• Plus n random matrices
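For readers unfamiliar with BLAS, DGEMM is the double-precision general matrix multiply, which updates C with alpha*A*B + beta*C. The following reference sketch shows only those semantics in plain Python, leaving out the transposition options of the real routine; the function name and the nested-list layout are choices made for this example, and an optimized library such as MKL would be used in practice.

```python
def dgemm(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for matrices stored as nested lists.

    a is m-by-k, b is k-by-n and c is m-by-n. This mirrors the arithmetic
    of BLAS DGEMM without transposition, not its calling convention.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("a must have as many columns as b has rows")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m-by-n")
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(n)]
            for i in range(m)]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(dgemm(2.0, a, b, 1.0, c))  # [[39.0, 45.0], [87.0, 101.0]]
```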

Page 5

GammaMaps: A figure-of-merit in Radiation Therapy

[Figure: two 3D dose grids on X, Y, Z axes, labelled Dose Reference and Dose Test; d(r) is the dose in voxel i,j,k at position r.]

Page 6

GammaMaps: A figure-of-merit in Radiation Therapy

Read Doses

Initialise and normalise

Compute Gamma

Store Gamma

• Application in FORTRAN 90
• Parallelised using OpenMP
• Geometric algorithm*
• 512 x 512 x 128 = 33,554,432 voxels
• Auto-vectorization
• Pragmas for offload

* T. Ju, T. Simpson, J. O. Deasy, and D. A. Low, “Geometric interpretation of the γ dose distribution comparison technique: Interpolation-free calculation,” Medical Physics, vol. 35, no. 3, p. 879, 2008.
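To make the "Compute Gamma" step concrete, here is a minimal sketch of the gamma index on a 1D dose profile. It deliberately uses a brute-force search over the discrete test points rather than the interpolation-free geometric algorithm of Ju et al. cited above, and it is written in Python rather than Fortran. The function name, the parameter names (`spacing`, `dta` for distance-to-agreement, `dose_tol` for the dose tolerance) and the sample values are all assumptions made for illustration.

```python
import math


def gamma_1d(ref, test, spacing, dta, dose_tol):
    """Brute-force gamma index for every point of a 1D reference profile.

    For each reference point, search all test points for the minimum of
    sqrt((distance / dta)^2 + (dose difference / dose_tol)^2).
    Distances use the same unit as dta; doses the same unit as dose_tol.
    """
    gammas = []
    for i, d_ref in enumerate(ref):
        best = math.inf
        for j, d_test in enumerate(test):
            dist = (j - i) * spacing
            diff = d_test - d_ref
            best = min(best, (dist / dta) ** 2 + (diff / dose_tol) ** 2)
        gammas.append(math.sqrt(best))
    return gammas


reference = [0.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0]  # same peak, displaced by one grid point
print(gamma_1d(reference, shifted, spacing=1.0, dta=1.0, dose_tol=1.0))
# [0.0, 1.0, 1.0]
```

A point is usually considered to pass when its gamma value is at most 1. The search is independent for each reference point, which is what makes the loop over voxels a natural target for OpenMP threads.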

Page 7

Results of Experiments

Page 8

Platform

Host
  CPU Model          Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
  Nr. of cores       16
  Memory             32788 MB
  Operating System   Linux 2.6.32-279.el6.x86_64
  Compiler Version   2013U2

Intel Xeon Phi
  Model              Beta0 Engineering Sample
  Nr. of cores       61 at 1.09GHz
  Memory             7936 MB
  Operating System   MPSS Gold U1
  Compiler Version   2013U2
  GDDR Technology    GDDR5
  GDDR Frequency     2750000 kHz

• Remote access to Intel systems
• Feb. 2013

Page 9

Intel Xeon Phi Affinity Policies

[Figure: four panels showing the placement of 8 threads on 4 cores (C1 to C4), each with 4 hardware threads (HT1 to HT4). Thread numbers as they appear in each panel, read from C1 to C4:]

COMPACT - FINE     0 1 2 3 4 5 6 7
SCATTER - FINE     0 4 1 5 2 6 3 7
BALANCED - FINE    0 1 2 3 4 5 6 7
BALANCED - CORE    {0,1} {2,3} {4,5} {6,7}

• TYPE
  – Compact
  – Scatter
  – Balanced
• Granularity
  – Fine or Thread
  – Core
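The three placement types can be sketched as a simplified model of how thread numbers are distributed over cores. This is only an illustration of the idea behind the policies (which the Intel OpenMP runtime exposes through the KMP_AFFINITY environment variable), not a reproduction of the runtime's actual behaviour. The function name is hypothetical, and the handling of thread counts that do not divide evenly under "balanced" is an assumption.

```python
def place_threads(policy, n_threads, n_cores=4, ht_per_core=4):
    """Return, for each core, the list of thread numbers placed on it.

    compact:  fill all hardware threads of a core before using the next core.
    scatter:  deal threads round-robin across the cores.
    balanced: spread threads evenly, keeping consecutive numbers together.
    """
    if n_threads > n_cores * ht_per_core:
        raise ValueError("more threads than hardware threads")
    cores = [[] for _ in range(n_cores)]
    if policy == "compact":
        for t in range(n_threads):
            cores[t // ht_per_core].append(t)
    elif policy == "scatter":
        for t in range(n_threads):
            cores[t % n_cores].append(t)
    elif policy == "balanced":
        # Assumption: when the split is uneven, the first cores get one extra.
        base, extra = divmod(n_threads, n_cores)
        first = 0
        for c in range(n_cores):
            count = base + (1 if c < extra else 0)
            cores[c].extend(range(first, first + count))
            first += count
    else:
        raise ValueError("unknown policy: " + policy)
    return cores


for policy in ("compact", "scatter", "balanced"):
    print(policy, place_threads(policy, 8))
```

With 8 threads on 4 cores this gives two full cores for compact, the pairs (0,4), (1,5), (2,6), (3,7) for scatter, and the pairs (0,1), (2,3), (4,5), (6,7) for balanced, which matches the thread numbers shown in the panels. Granularity is not modelled here: with "fine" each thread is pinned to one hardware thread, while with "core" it may move among the hardware threads of its core.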

Page 10

Results for CalcuNetW

Page 11

CalcuNetW

Page 12

CalcuNetW

Page 13

CalcuNetW

Page 14

Results for GammaMaps

Page 15

GammaMaps

Page 16

Host

[Figure: elapsed time (s), from 0 to 1400, versus number of threads, from 0 to 18, on the host. Series: Host, local-compact-core, local-compact-fine, local-scatter-fine, local-scatter-core.]

Page 17

GammaMaps

Page 18

Xeon Phi poor I/O

Page 19

Conclusions

• Using the MKL library is easy and does not require changes in the code.
• Simple pragmas in the code permit fast usage.
• I/O performance issues on Xeon Phi.
• 1 Xeon Phi ~ 1 Xeon E5-2680.
• Improving performance requires additional work.

Page 20

Acknowledgements

The authors would like to thank Intel for providing access to the Intel Xeon Phi coprocessor.

Page 21

Questions

Andrés Gómez

José Carlos Mouriño

Carmen Cotelo

Aurelio Rodríguez

The TEAM