Intel® Cluster Checker 2.1
Contents
1 clock
  1.1 DESCRIPTION
  1.2 METHOD
  1.3 CONFIGURATION
    1.3.1 deviation
  1.4 EXAMPLE
  1.5 MODULE INFORMATION
  1.6 DEPENDENCIES
2 dgemm
  2.1 DESCRIPTION
  2.2 METHOD
  2.3 CONFIGURATION
    2.3.1 mflops
    2.3.2 deviation
    2.3.3 m, n, k
    2.3.4 memory
  2.4 EXAMPLE
  2.5 MODULE INFORMATION
  2.6 DEPENDENCIES
3 disk bandwidth
  3.1 DESCRIPTION
  3.2 METHOD
  3.3 CONFIGURATION
    3.3.1 bandwidth
    3.3.2 deviation
    3.3.3 options
    3.3.4 workdir
  3.4 EXAMPLE
  3.5 MODULE INFORMATION
  3.6 DEPENDENCIES
4 ethernet
  4.1 DESCRIPTION
  4.2 METHOD
  4.3 CONFIGURATION
    4.3.1 driver
    4.3.2 name
    4.3.3 options
    4.3.4 version
  4.4 EXAMPLE
  4.5 MODULE INFORMATION
  4.6 DEPENDENCIES
  4.7 NOTES
5 generic
  5.1 DESCRIPTION
  5.2 METHOD
  5.3 CONFIGURATION
    5.3.1 test
  5.4 EXAMPLE
  5.5 MODULE INFORMATION
  5.6 DEPENDENCIES
6 hardware
  6.1 DESCRIPTION
  6.2 METHOD
  6.3 CONFIGURATION
    6.3.1 exclude
    6.3.2 options
  6.4 EXAMPLE
  6.5 MODULE INFORMATION
  6.6 DEPENDENCIES
7 hpl
  7.1 DESCRIPTION
  7.2 METHOD
  7.3 CONFIGURATION
    7.3.1 fabric
    7.3.2 mpi-path
    7.3.3 process-number
    7.3.4 Ns
    7.3.5 Ps and Qs
    7.3.6 NBs
  7.4 EXAMPLE
  7.5 MODULE INFORMATION
  7.6 DEPENDENCIES
8 libraries
  8.1 DESCRIPTION
  8.2 METHOD
  8.3 CONFIGURATION
    8.3.1 exclude
  8.4 EXAMPLE
  8.5 MODULE INFORMATION
  8.6 DEPENDENCIES
9 micconf
  9.1 DESCRIPTION
  9.2 METHOD
  9.3 CONFIGURATION
    9.3.1 options
    9.3.2 mpss-path
    9.3.3 include
    9.3.4 exclude
    9.3.5 diff-c
    9.3.6 diff-mb
    9.3.7 diff-uv
  9.4 EXAMPLE
  9.5 MODULE INFORMATION
  9.6 DEPENDENCIES
10 micmpi
  10.1 DESCRIPTION
  10.2 METHOD
  10.3 CONFIGURATION
    10.3.1 device
    10.3.2 mpi-path
    10.3.3 process-number
  10.4 NODEFILE CONFIGURATION
  10.5 NODEFILE EXAMPLE
  10.6 CONFIG EXAMPLE
  10.7 MODULE INFORMATION
  10.8 DEPENDENCIES
  10.9 NOTES
11 micperf
  11.1 DESCRIPTION
  11.2 METHOD
  11.3 CONFIGURATION
    11.3.1 stream-deviation
    11.3.2 stream-bandwidth
  11.4 XML CONFIG EXAMPLE
  11.5 NODEFILE CONFIGURATION
  11.6 NODELIST EXAMPLE
  11.7 MODULE INFORMATION
  11.8 DEPENDENCIES
12 mount
  12.1 DESCRIPTION
  12.2 METHOD
    12.2.1 Compliance mode:
    12.2.2 Execution modes other than compliance:
  12.3 CONFIGURATION
    12.3.1 user
    12.3.2 sticky
  12.4 EXAMPLE
  12.5 MODULE INFORMATION
  12.6 DEPENDENCIES
  12.7 NOTES
13 mpi internode
  13.1 DESCRIPTION
  13.2 METHOD
  13.3 CONFIGURATION
    13.3.1 device
    13.3.2 mpi-path
    13.3.3 process-number
  13.4 EXAMPLE
  13.5 MODULE INFORMATION
  13.6 DEPENDENCIES
  13.7 NOTES
14 mpi local
  14.1 DESCRIPTION
  14.2 METHOD
    14.2.1 Compliance mode:
    14.2.2 Execution modes other than compliance:
  14.3 CONFIGURATION
    14.3.1 device
    14.3.2 mpi-path
    14.3.3 process-number
  14.4 EXAMPLE
  14.5 INFORMATION
  14.6 DEPENDENCIES
    14.6.1 Compliance mode:
    14.6.2 Other modes:
  14.7 NOTES
15 packages
  15.1 DESCRIPTION
  15.2 METHOD
  15.3 CONFIGURATION
    15.3.1 node
    15.3.2 head
    15.3.3 exclude
    15.3.4 include
    15.3.5 command
  15.4 EXAMPLE
  15.5 MODULE INFORMATION
  15.6 DEPENDENCIES
16 ping
  16.1 DESCRIPTION
  16.2 METHOD
  16.3 CONFIGURATION
    16.3.1 time
  16.4 EXAMPLE
  16.5 MODULE INFORMATION
  16.6 DEPENDENCIES
17 process
  17.1 DESCRIPTION
  17.2 METHOD
  17.3 CONFIGURATION
    17.3.1 elapsed time
    17.3.2 exclude
    17.3.3 exempt uids
    17.3.4 percent cpu
    17.3.5 percent memory
    17.3.6 zombie allowed elapsed time
  17.4 EXAMPLE
  17.5 MODULE INFORMATION
  17.6 DEPENDENCIES
18 remote login
  18.1 DESCRIPTION
  18.2 METHOD
  18.3 CONFIGURATION
    18.3.1 cmd
    18.3.2 time
    18.3.3 version
  18.4 EXAMPLE
  18.5 MODULE INFORMATION
  18.6 DEPENDENCIES
19 shells
  19.1 DESCRIPTION
  19.2 METHOD
  19.3 CONFIGURATION
    19.3.1 none
  19.4 MODULE INFORMATION
  19.5 DEPENDENCIES
20 storage
  20.1 DESCRIPTION
  20.2 METHOD
  20.3 CONFIGURATION
  20.4 MODULE INFORMATION
  20.5 DEPENDENCIES
21 stream
  21.1 DESCRIPTION
  21.2 METHOD
  21.3 CONFIGURATION
    21.3.1 bandwidth
    21.3.2 deviation
  21.4 EXAMPLE
  21.5 MODULE INFORMATION
  21.6 DEPENDENCIES
22 tools
  22.1 DESCRIPTION
  22.2 METHOD
  22.3 CONFIGURATION
    22.3.1 python-path
    22.3.2 python-version
    22.3.3 perl-path
    22.3.4 perl-version
    22.3.5 tclsh-path
    22.3.6 tclsh-version
  22.4 EXAMPLE
  22.5 MODULE INFORMATION
  22.6 DEPENDENCIES
Disclaimer and Legal Information
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Intel, the Intel logo, the Intel Inside logo, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
© 2013 Intel Corporation. All rights reserved.
1 clock
Check the cluster clock synchronization
1.1 DESCRIPTION
The clock module checks that system clocks on each compute node are reasonably synchronized.
Having nodes with synchronized clocks ensures proper functionality. Component developers usually assume this synchronization. When logging distributed events, synchronized timekeeping helps to order the events.
1.2 METHOD
The clock module calculates the difference between the node time and the cluster median time and compares it to a threshold value. The date command is used to gather timing information.
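The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the module's own code: the function name find_clock_outliers, the node names, and the use of epoch-second timestamps (such as those printed by `date +%s`) are assumptions made for the example.

```python
from statistics import median


def find_clock_outliers(node_times, max_deviation=300.0):
    """Return {node: offset_in_seconds} for every node whose clock differs
    from the cluster median by more than max_deviation seconds."""
    cluster_median = median(node_times.values())
    return {
        node: t - cluster_median
        for node, t in node_times.items()
        if abs(t - cluster_median) > max_deviation
    }


# Hypothetical timestamps, in seconds since the epoch, gathered from each node.
times = {"node01": 1000.0, "node02": 1002.0, "node03": 1400.0}
outliers = find_clock_outliers(times)
print(outliers)
```

Using the median rather than the mean as the reference keeps a single badly skewed node from shifting the value that every other node is compared against.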
1.3 CONFIGURATION
1.3.1 deviation
The deviation tag specifies the maximum deviation (in seconds) of the clock on any node from the cluster median.
Allowed values: Decimal values greater than zero.
Default value: 300 seconds
1.4 EXAMPLE
<clock>
  <deviation>300</deviation>
</clock>
1.5 MODULE INFORMATION
• Level: 2
1.6 DEPENDENCIES
• Commands: date
• Test Modules: remote login
2 dgemm
Check the floating point performance of a node
2.1 DESCRIPTION
The dgemm module checks the floating point performance of each cluster compute node. The test module executes the DGEMM library routine from the Intel(R) Math Kernel Library to measure the floating point performance and deviation over the cluster nodes.
2.2 METHOD
By default, a pre-built binary is used to calculate performance. If no thresholds are configured, the results are considered indeterminate. The deviation in reported performance is checked when there are three or more valid results from the compute nodes.
2.3 CONFIGURATION
2.3.1 mflops
The mflops tag specifies the minimum acceptable floating point performance, in MFLOPS.
NOTE: If not configured, no comparison is made and the measured MFLOPS value is displayed.
Default value: none
2.3.2 deviation
The deviation tag specifies the number of allowed standard deviations from the median, used to search for outlier values. The allowed range is (median +/- deviation x stddev).
Default value: 3
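The allowed range can be illustrated with a short Python sketch. It is not the module's implementation: the function name outlier_nodes and the sample figures are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import median, stdev


def outlier_nodes(results, deviation=3.0):
    """Return the nodes whose result lies outside
    median +/- deviation x stddev. With fewer than three results
    the deviation check is skipped and no node is reported."""
    if len(results) < 3:
        return []
    values = list(results.values())
    mid = median(values)
    spread = stdev(values)  # sample standard deviation (assumed)
    low, high = mid - deviation * spread, mid + deviation * spread
    return sorted(node for node, v in results.items() if not low <= v <= high)


# Hypothetical MFLOPS figures reported by five compute nodes.
mflops = {
    "node01": 17000.0,
    "node02": 17100.0,
    "node03": 16900.0,
    "node04": 17050.0,
    "node05": 12000.0,
}
print(outlier_nodes(mflops, deviation=1.0))
print(outlier_nodes(mflops, deviation=3.0))
```

On a small cluster a single slow node inflates the standard deviation itself, so a wide setting such as 3 may not flag it; a tighter value is more sensitive.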
2.3.3 m, n, k
These tags specify the matrix dimensions used in DGEMM. The total memory required is (mn + mk + nk) x sizeof(double) bytes.
Default values: m = 6000, n = 6000, k = 120
NOTE: With the default values, the memory requirement is approximately 300 MB if sizeof(double) is 8 bytes.
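The memory formula above can be evaluated directly. The following sketch is illustrative only; the helper name dgemm_bytes is hypothetical and an 8-byte double is assumed.

```python
SIZEOF_DOUBLE = 8  # bytes; assumed size of a C double


def dgemm_bytes(m, n, k, sizeof_double=SIZEOF_DOUBLE):
    """Memory needed for the three DGEMM matrices:
    (m*n + m*k + n*k) * sizeof(double) bytes."""
    return (m * n + m * k + n * k) * sizeof_double


# Evaluate the formula for the documented default dimensions.
required = dgemm_bytes(6000, 6000, 120)
print(f"{required} bytes = {required / 1e6:.2f} MB")
```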
2.3.4 memory
The memory tag specifies the percentage of memory to use. This tag automatically calculates m and n (by setting m = n) for a fixed value of k (k = 120), to best match the configured percentage of the total memory of the node. This tag takes precedence over m, n and k: if memory is set, m, n and k are not used.
Default value: none
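One way such a calculation could work is to solve the memory formula for m with m = n and k fixed. The sketch below is an assumption about the approach, not the module's documented algorithm: the function name square_dims_for_memory is hypothetical, and rounding down to the largest size that fits is a choice made for the example.

```python
import math


def square_dims_for_memory(total_bytes, percent, k=120, sizeof_double=8):
    """Pick the largest m = n such that
    (m*n + m*k + n*k) * sizeof_double <= percent% of total_bytes.
    With m = n this is m^2 + 2*k*m <= budget, whose positive root is
    m = -k + sqrt(k^2 + budget)."""
    budget = total_bytes * percent / 100.0 / sizeof_double  # capacity in doubles
    m = int(-k + math.sqrt(k * k + budget))
    return m, m, k


# Hypothetical node with 16 GiB of memory, using 50 percent of it.
total = 16 * 1024 ** 3
m, n, k = square_dims_for_memory(total, 50)
print(m, n, k)
```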
2.4 EXAMPLE
<dgemm>
  <deviation>3</deviation>
  <k>120</k>
  <m>6000</m>
  <mflops>17000</mflops>
  <n>6000</n>
</dgemm>
2.5 MODULE INFORMATION
• Level: 3
2.6 DEPENDENCIES
• Commands: grep, awk
• Test Modules: remote login
3 disk bandwidth
Single-node Disk Bandwidth
3.1 DESCRIPTION
The disk bandwidth module checks the disk I/O bandwidth of each compute node and its deviation among cluster compute nodes. Deviation is checked only if there are three or more valid results from the compute nodes.
Having a uniform disk bandwidth allows distributed computations to run without the need for complex balancing schemes.
3.2 METHOD
The IOzone* filesystem benchmark is used to exercise I/O. The test module executes the benchmark in auto mode with 64 MB files using direct access. Only the read values are checked.
3.3 CONFIGURATION
3.3.1 bandwidth
The bandwidth tag specifies the minimum acceptable disk bandwidth, in MB/s.
NOTE: If not configured, no comparison is made and the measured bandwidth is displayed.
Default value: none
3.3.2 deviation
The deviation tag specifies the number of allowed standard deviations from the median, used to search for outlier values. The allowed range is (median +/- deviation x stddev).
Default value: 3
3.3.3 options
The options tag specifies a string with the options used to execute the benchmark. The options are not validated by the module and are expected to be valid.
Default value: -az -i0 -i1 -y 512 -s 65536 -+n -+r -I
3.3.4 workdir
The workdir tag specifies the base path to use as the working directory instead of /tmp. The directory should exist and have proper permissions. If the directory is not local, the test will fail.
Default value: /tmp
3.4 EXAMPLE
<disk_bandwidth>
  <bandwidth>40</bandwidth>
  <deviation>3</deviation>
  <options>-az -i0 -i1 -y 512 -s 65536 -+n -+r -I</options>
  <workdir>/tmp</workdir>
</disk_bandwidth>
3.5 MODULE INFORMATION
• Level: 3
3.6 DEPENDENCIES
• Test Modules: remote login
4 ethernet
Ethernet driver uniformity and wellness
4.1 DESCRIPTION
The ethernet module verifies Intel(R) Ethernet network drivers, including their uniformity and wellness.
4.2 METHOD
The load status and the uniformity of the required drivers are checked across compute nodes. If version or options are explicitly defined, they are verified to match the requested values.
For more details, see http://www.intel.com/design/network/drivers/.
4.3 CONFIGURATION
4.3.1 driver
The driver tag is a container that can be used to override detection or to check unsupported drivers.
4.3.2 name
The name tag specifies the name of the driver to be checked.
4.3.3 options
The options tag specifies the options that are required in the driver configuration.
4.3.4 version
The version tag specifies the string to be used as required driver version.
4.4 EXAMPLE
<ethernet>
  <driver>
    <name>ixgb</name>
    <options>TxDescriptors=256</options>
    <version>1.0.135-k2-NAPI</version>
  </driver>
</ethernet>
4.5 MODULE INFORMATION
• Level: 1
4.6 DEPENDENCIES
• Commands: cut, grep, lsmod, lspci, md5sum, modinfo, modprobe
• Test Modules: remote login
4.7 NOTES
The modprobe configuration files (/etc/modprobe.d or /etc/modprobe.conf) must be readable.
5 generic
Perform user specified check(s)
5.1 DESCRIPTION
The generic module runs one or more user specified commands and either verifies that the output is uniform for all nodes or verifies that the output matches the user specified reference output. Only "compute" type nodes are used, i.e., the "head" and "knc-compute" type nodes do not execute the generic module. The effective user ID should be non-root.
This module is an easy and convenient way to extend Intel(R) Cluster Checker.
5.2 METHOD
See Description.
5.3 CONFIGURATION
5.3.1 test
The test tag provides a container for the user-specified command.
Tags which may be used inside test: command, result
command
  The command tag specifies the command to execute. This tag is mandatory.
result
  The result tag specifies the "correct" output of the corresponding command. This tag is optional. If result is not specified, the output is verified to be uniform for all nodes.
  For both the command output and the result tag value, the module trims the leading and trailing whitespace. Otherwise, the comparison is whitespace sensitive.
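The comparison rules for command and result can be sketched as follows. This is a minimal illustration under stated assumptions rather than the module's code: the function name generic_check_passes and the sample outputs are hypothetical.

```python
def generic_check_passes(outputs, result=None):
    """outputs maps node name -> raw command output.
    Leading and trailing whitespace is trimmed before comparing; interior
    whitespace is significant. With a reference result, every node must
    match it; without one, all nodes must agree with each other."""
    trimmed = [out.strip() for out in outputs.values()]
    if result is not None:
        expected = result.strip()
        return all(out == expected for out in trimmed)
    return len(set(trimmed)) <= 1


# Hypothetical outputs of `stat -c %a /tmp` on two compute nodes.
print(generic_check_passes({"node01": "1777\n", "node02": "  1777"}, result="1777"))
# No reference result: only uniformity across nodes is verified.
print(generic_check_passes({"node01": "x y", "node02": "x  y"}))
```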
5.4 EXAMPLE
<generic>
  <test>
    <command>stat -c %a /tmp</command>
    <result>1777</result>
  </test>
  <test>
    <command>uname -r</command>
  </test>
</generic>
5.5 MODULE INFORMATION
• Level: 2
5.6 DEPENDENCIES
• Test Modules: remote login
6 hardware
Check hardware uniformity among cluster nodes
6.1 DESCRIPTION
The hardware module checks the uniformity of the hardware among the cluster compute nodes. The test module checks that specific attributes of some hardware devices have uniform values among compute nodes.
6.2 METHOD
The lshw utility is used to list the hardware device attributes on each node (see http://ezix.org/project/wiki/HardwareLiSter#Usage).
The items compared by default can be modified using the include and exclude configuration tags. When executing the test module as a privileged user, some extra items are shown, including the base board model and BIOS version. The module runs lshw and captures the XML output, then it creates a 'key -> value' list. Each element key is created by concatenating the lshw XML output attributes called "id", "type", "value" and the name of the final XML tag. There are exceptions for XML tags called "setting", "capability" and "resource", in which only the above mentioned attributes are used and the tag name is not appended.
Example 1: Given the following XML snippet from the output of lshw
<list>
<node id="computer" claimed="true" class="system" handle="">
<node id="core" claimed="true" class="bus" handle=""></node>
<node id="pci" claimed="true" class="bridge" handle="PCIBUS:0000:00">
<node id="pci:0" claimed="true" class="bridge" handle="PCIBUS:0000:01">
<description>PCI bridge</description>
The pci:0 description will be saved as:
# key # # value #
'core-pci-pci:0-description' -> 'PCI bridge'
The 'id' computer is removed because it is common to the entire output.
Example 2:
<node id="core" claimed="true" class="bus" handle="">
<node id="pci" claimed="true" class="bridge" handle="PCIBUS:0000:00">
<node id="pci:0" claimed="true" class="bridge" handle="PCIBUS:0000:01">
<resource type="irq" value="48"/>
The pci:0 resource and type irq will be saved as:
# key # # value #
'core-pci-pci:0-irq-48' -> '48'
Example 3:
<node id="core" claimed="true" class="bus" handle="">
<node id="pci" claimed="true" class="bridge" handle="PCIBUS:0000:00">
<node id="pci:0" claimed="true" class="bridge" handle="PCIBUS:0000:01">
<node id="network:0" disabled="true" ...>
<setting id="driverversion" value="3.0.6-k"/>
The pci:0 and network:0 driverversion will be saved as:
# key # # value #
'core-pci-pci:0-network:0-driverversion-3.0.6-k' -> '3.0.6-k'
There are cases where the key may not be unique; for those, a number will be appended, e.g.:
# key # # value #
'pci-storage-ioport' -> '3108(size=8)'
'pci-storage-ioport-0' -> '3114(size=4)'
'pci-storage-ioport-1' -> '3100(size=8)'
NOTE: The "value" attribute is also used in the value field, just for convenience.
NOTE: Memory size comparison is relaxed; values are allowed to differ by up to 5%.
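The key construction described in this section can be approximated with the following Python sketch. It is a simplified illustration, not the module's code: the function name build_keys, the SAMPLE document, and the handling of container elements (descending without adding to the key) are assumptions.

```python
import xml.etree.ElementTree as ET

# Tags for which only the id/type/value attributes form the key suffix.
ATTRIBUTE_ONLY_TAGS = {"setting", "capability", "resource"}

# Hypothetical, shortened lshw-style document used for illustration only.
SAMPLE = """
<list>
  <node id="computer" class="system">
    <node id="core" class="bus">
      <node id="pci" class="bridge">
        <node id="pci:0" class="bridge">
          <description>PCI bridge</description>
          <resource type="irq" value="48"/>
          <node id="network:0" disabled="true">
            <setting id="driverversion" value="3.0.6-k"/>
          </node>
        </node>
      </node>
    </node>
  </node>
</list>
"""

def build_keys(xml_text):
    """Build a 'key -> value' dictionary from lshw-style XML."""
    items = {}

    def add(key, value):
        # Non-unique keys get a numeric suffix: key, key-0, key-1, ...
        if key in items:
            n = 0
            while f"{key}-{n}" in items:
                n += 1
            key = f"{key}-{n}"
        items[key] = value

    def walk(elem, path):
        for child in elem:
            if child.tag == "node":
                walk(child, path + [child.get("id", "")])
            elif child.tag in ATTRIBUTE_ONLY_TAGS:
                parts = [child.get(a) for a in ("id", "type", "value") if child.get(a)]
                value = child.get("value") or (child.text or "").strip()
                add("-".join(path + parts), value)
            elif len(child) == 0:
                # Leaf tag: the tag name closes the key, the text is the value.
                add("-".join(path + [child.tag]), (child.text or "").strip())
            else:
                # Container element: descend without extending the key (assumption).
                walk(child, path)

    root = ET.fromstring(xml_text)
    top = root if root.tag == "node" else root.find("node")
    if top is not None:
        walk(top, [])  # the top-level id ("computer") is not part of the key
    return items
```

Calling build_keys(SAMPLE) yields the three keys shown in Examples 1 to 3.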
6.3 CONFIGURATION
6.3.1 exclude
The exclude tag excludes items from the comparison. The string is interpreted as a POSIX matching regular expression. This configuration tag can be repeated multiple times. Note that to exactly match meta characters (^[.*(${\()+|?<>), they should be escaped.
NOTE: To exclude an item, the regular expression should match the key, e.g.:
# key # # value #
'pci-storage-ioport' -> '3108(size=8)'
'pci-storage-ioport-0' -> '3114(size=4)'
'pci-storage-ioport-1' -> '3100(size=8)'
To exclude all storage items, the tag should be:
<exclude>.*storage.*</exclude>
Default values: none
NOTE: These tags need to be reviewed if mixed with the options tag.
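As a rough illustration of how exclusion patterns act on keys, the sketch below filters a key list with Python's re module. Python regular expressions are not POSIX regular expressions, although simple patterns such as the one above behave the same. The function name apply_excludes and the use of an unanchored search are assumptions, not the tool's documented behavior.

```python
import re

def apply_excludes(items, patterns):
    """Drop every key that matches at least one exclusion pattern."""
    compiled = [re.compile(p) for p in patterns]
    return {key: value for key, value in items.items()
            if not any(rx.search(key) for rx in compiled)}

# Hypothetical keys, taken from the examples in this section.
ITEMS = {
    "pci-storage-ioport": "3108(size=8)",
    "pci-storage-ioport-0": "3114(size=4)",
    "core-pci-pci:0-irq-48": "48",
}
```

apply_excludes(ITEMS, [".*storage.*"]) keeps only the irq key. To exclude one exact key, re.escape(key) produces a pattern with every meta character escaped.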
6.3.2 options
The options tag appends extra options when running the hardware listing utility. It can be used to enable or disable tests, if required, during troubleshooting.
Allowed values:
• -class CLASS only show a certain class of hardware
• -C CLASS same as '-class CLASS'
• -c CLASS same as '-class CLASS'
• -disable TEST disable a test (like pci, isapnp, cpuid, etc. )
• -enable TEST enable a test (like pci, isapnp, cpuid, etc. )
• -sanitize sanitize output (remove sensitive information like serial numbers, etc.)
• -numeric output numeric IDs (for PCI, USB, etc.)
Default value: none ('-quiet -sanitize -xml' will always be appended)
NOTE: If the option -class or -C is used, it is possible that previously configured exclusions fail because the key has changed.
6.4 EXAMPLE
<hardware>
  <exclude>core-pci-pci:4-description</exclude>
  <exclude>core-pci-pci:0-irq-48</exclude>
  <exclude>core-pci-pci:0-network:0-driverversion-3.0.6-k</exclude>
  <options>-disable cpuinfo</options>
</hardware>
6.5 MODULE INFORMATION
• Level: 3
6.6 DEPENDENCIES
• Test Modules: remote_login
7 hpl
Run the Optimized HPL Benchmark
7.1 DESCRIPTION
The hpl module runs an optimized version of the High Performance Linpack* (HPL) benchmark. This binary is part of the Intel(R) Math Kernel Library package.
7.2 METHOD
The hpl module uses a pre-built binary to exercise all cluster compute nodes at once. It uses the total number of nodes in the cluster as an input parameter, and the problem is scaled down so that it executes quickly while still yielding representative benchmark performance.
7.3 CONFIGURATION
7.3.1 fabric
The fabric tag provides a container for specifying the network interconnect fabric to use for the benchmark. It must contain one tag specifying the device type to be used (device) and, optionally, a performance threshold against which a comparison is performed (tflops). The fabric block may be repeated to test multiple interconnects. If no fabric container is configured, the test module runs mpirun with no device and no performance threshold.
Default value: none
Tags which may be used inside fabric: device, tflops
device The device tag specifies which Intel(R) MPI Library device to use. It may be specified only once per fabric. Any extra options can be provided by using an "options" XML attribute. The options should be placed in order, with global modifiers first, as required by the library. This tag can only be used inside a fabric container. If no device is configured, then no device is passed to MPI and it gets reported as "default".
Allowed values: fabric | intra-node fabric : inter-node fabric
where
• fabric is one of {shm, dapl, tcp, tmi, ofa}
• intra-node fabric is one of {shm, dapl, tcp, tmi, ofa}
• inter-node fabric is one of {dapl, tcp, tmi, ofa}
Default value: none
NOTE: The test module uses the I_MPI_FABRICS style introduced in version 4.0.
tflops The tflops tag specifies the minimum acceptable floating point performance, in Teraflops. This tag should be used inside a fabric container for each device. If used outside a fabric container, it will be used to compare the "default" device.
NOTE: If not configured, no comparison will be made and the obtained TFLOPS will be displayed.
7.3.2 mpi-path
The mpi-path tag specifies the base path to the Intel(R) MPI Library installation directory. Setting this parameter automatically sets up the environment.
Allowed values: Any path.
7.3.3 process-number
The process-number tag specifies the number of MPI processes to start on each node.
Allowed values: Integer value.
Default value: Number of physical cores in the head node
7.3.4 Ns
The Ns tag specifies the size of the problem to be used in the calculation. It applies only to the fabric on which it was defined.
Default value: 8000 x sqrt(number of nodes)
7.3.5 Ps and Qs
These tags are factors that define the division of the matrix, one for each dimension. They can be set by the user with the Ps and Qs tags. Both tags should be provided; otherwise, they will be calculated automatically. Note that the product Ps x Qs must be equal to the total number of MPI processes (sum over all nodes). If no values are configured, the test module automatically calculates them according to the following rules.
Default value: Ps x Qs = Total # of MPI processes (# all nodes x # physical cores). Ps <= Qs. Ps as big as possible, complying with the former rules.
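The default rules for Ns, Ps and Qs can be worked through with a small Python sketch. The function names are hypothetical, and truncating Ns to an integer is an assumption, since the rounding applied by the test module is not stated.

```python
import math

def default_ns(nodes):
    """Default problem size: 8000 x sqrt(number of nodes), truncated (assumption)."""
    return int(8000 * math.sqrt(nodes))

def default_grid(total_processes):
    """Return (Ps, Qs) with Ps x Qs == total_processes, Ps <= Qs, Ps as big as possible."""
    # Search downwards from floor(sqrt(total)); the first divisor found is the largest valid Ps.
    for ps in range(math.isqrt(total_processes), 0, -1):
        if total_processes % ps == 0:
            return ps, total_processes // ps
    raise ValueError("total_processes must be a positive integer")
```

For instance, 12 MPI processes give a 3 x 4 grid, while a prime count such as 7 can only be arranged as 1 x 7.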
7.3.6 NBs
The NBs tag specifies the size of the atomic blocks used in the DGEMM operation.
Default value: 168
7.4 EXAMPLE
<hpl>
  <NBs>148</NBs>
  <Ns>8000</Ns>
  <Ps>1</Ps>
  <Qs>4</Qs>
  <fabric>
    <device>shm:tcp</device>
    <tflops>2</tflops>
  </fabric>
  <fabric>
    <device options="-genv I_MPI_DEBUG 5">shm:dapl</device>
    <tflops>2.5</tflops>
  </fabric>
  <mpi-path>/opt/intel/mpi-rt/4.0</mpi-path>
  <process-number>2</process-number>
</hpl>
7.5 MODULE INFORMATION
• Level: 4
7.6 DEPENDENCIES
• Commands: bc, mktemp, mpirun, sed
• Test Modules: mount, remote_login, tools, mpi_internode
8 libraries
Check libraries wellness and compliance
8.1 DESCRIPTION
The libraries module checks that runtime libraries meet requirements both for wellness and compliance. Besides checking for a specific set of base, x11 and runtime libraries, which need to be provided on all nodes, it also checks that all 32-bit libraries have 64-bit counterparts. This module can be used to check compliance with Intel(R) Cluster Ready Specification versions 1.2 and 1.3.
Having a functional runtime ensures that binaries run without extra configuration steps. Minimum software runtime requirements ensure that functional clusters are built when following the specification. This set of libraries can also be used by component developers as a baseline for interoperability, support and validation.
8.2 METHOD
The libraries module retrieves a full list of available libraries and their versions from the dynamic linker cache, using the ldconfig command. It also looks for required runtime library files under /opt/intel. For the counterpart check, it confirms that all 32-bit libraries in the dynamic linker cache have a 64-bit counterpart. If the path to the runtime libraries is located in a shared filesystem, then the search is optimized and only a reference node is checked for compliance.
The required libraries and their versions are listed in the libs.csv file in a directory in the installation path of Intel(R) Cluster Checker.
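The counterpart check can be pictured with the following Python sketch, which parses text in the format printed by "ldconfig -p" on an x86-64 Linux system. How the module itself invokes ldconfig and classifies entries is not documented here, so the function name, the regular expression, and the rule that any entry not flagged x86-64 is 32-bit are assumptions.

```python
import re

# Matches cache lines such as: "  libfoo.so.1 (libc6,x86-64) => /usr/lib64/libfoo.so.1"
CACHE_LINE = re.compile(r"^\s*(\S+) \(([^)]*)\) => (\S+)$")

def missing_64bit_counterparts(ldconfig_output):
    """Return the names of libraries present only as non-x86-64 entries."""
    libs32, libs64 = set(), set()
    for line in ldconfig_output.splitlines():
        match = CACHE_LINE.match(line)
        if not match:
            continue  # header line or unrecognized format
        name, flags, _path = match.groups()
        if "x86-64" in flags.split(","):
            libs64.add(name)
        else:
            libs32.add(name)  # treated as 32-bit (assumption)
    return sorted(libs32 - libs64)
```

A library is reported only when no entry with the same name carries the x86-64 flag.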
8.3 CONFIGURATION
The version of the Intel(R) Cluster Ready Specification to be checked can be set using the compliance and certification options. See the User's Guide for more information about specifying the Intel(R) Cluster Ready Specification version.
8.3.1 exclude
The exclude tag specifies the full name of a library to exclude from comparisons.
Default value: none
8.4 EXAMPLE
<libraries>
  <exclude>libI810XvMC.so.1</exclude>
</libraries>
8.5 MODULE INFORMATION
• Level: 2
8.6 DEPENDENCIES
• Commands: find, ldconfig
• Test Modules: remote_login
9 micconf
Checks Intel(R) Xeon Phi(TM) configuration health and uniformity
9.1 DESCRIPTION
The micconf module checks that the Intel(R) Xeon Phi(TM) configuration is healthy and uniform.
It first checks that the micinfo information is correct and uniform across compute nodes. Any error, undefined value or difference is reported. Then it checks the sanity of the coprocessor cards by running the miccheck diagnostic tool on every compute node.
The micinfo field names are normalized to make their handling easier. The micinfo health check makes sure no undefined values are shown. The micinfo uniformity check makes sure relevant fields are uniform. Frequency, voltage and speed should be reported as non-zero. Temperature should be a value between 40 and 100. Only differences less than 128 MB, 100000 uV or 20 C are allowed.
The default behavior can be altered by custom configuration.
9.2 METHOD
The outputs of micinfo and miccheck are parsed and compared on compute nodes. By default, miccheck is run with its default configuration.
9.3 CONFIGURATION
9.3.1 options
The options tag can be used to include or exclude any test executed by miccheck. If no options are provided, miccheck will be run with its defaults.
Default value: none
Allowed values: miccheck command line options
9.3.2 mpss-path
The mpss-path tag specifies the location of the micinfo and miccheck commands.
Default value: /opt/intel/mic/bin
9.3.3 include
The include tag includes one or more micinfo fields for uniformity comparison.
Field names are simplified by using lowercase and removing spaces. For instance, the following field is shortened as device0-board-vendor-id.
Device No: 0, Device Name: L1OM Board Vendor ID : 8086
Default value:
On the host system:
host-system-info-driver-version host-system-info-mpss-version host-system-info-os-version
On each attached device:
board-device-id board-ecc-mode board-mic-board-stepping board-mic-processor-family board-mic-processor-family-ext board-mic-processor-model board-mic-processor-model-ext board-mic-processor-stepping-id board-mic-processor-type board-sku board-subsystem-id board-vendor-id core-frequency core-total-no-of-active-cores core-voltage gddr-gddr-density gddr-gddr-frequency gddr-gddr-size gddr-gddr-technology gddr-gddr-vendor gddr-gddr-voltage thermal-die-temp thermal-fsc-strap thermal-smc-firmware-version version-flash-version version-uos-version
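The field name simplification can be sketched in Python as follows. This only reproduces the naming pattern visible in the default list above (lowercase, words joined with hyphens, optional device prefix); the function name and its parameters are hypothetical, and the module's actual parsing of micinfo output is not shown.

```python
def normalize_field(section, field, device=None):
    """Build a simplified field name such as 'device0-board-vendor-id'."""
    # Lowercase each part and join its words with hyphens.
    parts = ["-".join(text.lower().split()) for text in (section, field)]
    name = "-".join(parts)
    if device is not None:
        name = f"device{device}-{name}"
    return name
```

With this sketch, normalize_field("Board", "Vendor ID", device=0) gives device0-board-vendor-id.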
9.3.4 exclude
The exclude tag excludes one or more micinfo fields from checks.
Default value: none
9.3.5 diff-c
The diff-c tag sets the allowed deviation of temperature, in degrees Celsius.
Default value: 20
9.3.6 diff-mb
The diff-mb tag sets the allowed deviation of memory size, in MB.
Default value: 128
9.3.7 diff-uv
The diff-uv tag sets the allowed deviation of voltage, in uV.
Default value: 100000
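A tolerance-based uniformity check of this kind can be sketched as below. The description states that only differences less than the configured limit are allowed; comparing the spread between the largest and smallest reported value is an assumption of this sketch, as is the function name.

```python
def within_tolerance(values, limit):
    """True when the spread of the reported values is strictly below the limit."""
    return (max(values) - min(values)) < limit

# Defaults taken from the diff-c, diff-mb and diff-uv tags.
DEFAULT_LIMITS = {"diff-c": 20, "diff-mb": 128, "diff-uv": 100000}
```

For example, die temperatures of 55, 60 and 70 C pass with the default diff-c of 20, while 50 and 70 C do not.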
9.4 EXAMPLE
<micconf>
  <diff-c>20</diff-c>
  <diff-mb>128</diff-mb>
  <diff-uv>100000</diff-uv>
  <exclude>device0-board-pcie-speed</exclude>
  <include>device0-thermal-fan-rpm</include>
  <options>-t install,load,driver,detect 0</options>
  <mpss-path>/opt/intel/mic/bin</mpss-path>
</micconf>
9.5 MODULE INFORMATION
• Level: 2
9.6 DEPENDENCIES
• Commands: miccheck, micinfo
• Test Modules: remote_login
10 micmpi
Intel(R) MPI Library internode check for Intel(R) Xeon Phi(TM) coprocessors
10.1 DESCRIPTION
The micmpi test module checks the basic functionality of the Intel(R) MPI Library Runtime Environment over the cluster list of Intel(R) Xeon Phi(TM) coprocessors by running an MPI Hello World program using one or more Intel(R) MPI Library devices.
To run MPI jobs on Intel(R) Xeon Phi(TM), version 4.1 or higher of the Intel(R) MPI Library is needed.
10.2 METHOD
The micmpi module runs an MPI Hello World program on one or more Intel(R) MPI Library devices. The test module copies the MPI Hello World program to the home folder of the user running the tool, assuming this folder is shared among all the nodes. By default, it exercises 4 MPI processes on each Intel(R) Xeon Phi(TM), leaving the Intel(R) MPI Library to select the device to be used.
The test module uses the I_MPI_FABRICS style introduced in Intel(R) MPI Library version 4.0. It also assumes that the $HOME and MPI paths are shared between the hosts and the Intel(R) Xeon Phi(TM).
10.3 CONFIGURATION
10.3.1 device
The device tag specifies which Intel(R) MPI Library device to use. It may be defined more than once. Any extra options can be provided by using the "options" XML attribute. The options should be provided in order, as required by the library. If no device is specified, the Intel(R) MPI Library will use the most appropriate fabric for communication between processes.
Allowed values: fabric | intra-node fabric : inter-node fabric
where
• fabric is one of {shm, dapl, tcp, tmi, ofa}
• intra-node fabric is one of {shm, dapl, tcp, tmi, ofa}
• inter-node fabric is one of {dapl, tcp, tmi, ofa}
Default value: none
10.3.2 mpi-path
The mpi-path tag specifies the base path to the Intel(R) MPI Library installation directory. Setting this parameter automatically sets up the environment.
10.3.3 process-number
The process-number tag specifies the number of MPI processes to start on each node.
Allowed values: Integer value.
Default value: 4
10.4 NODEFILE CONFIGURATION
When creating a nodefile for your cluster, to define an Intel(R) Xeon Phi(TM) node, you must assign the node type as follows: #type: knc-compute.
10.5 NODEFILE EXAMPLE
headnode #type: head
compute00
compute00-mic0 #type: knc-compute
10.6 CONFIG EXAMPLE
<cluster>
  <test>
    <micmpi>
      <device>shm:tcp</device>
      <device options="-genv I_MPI_DEBUG 5">shm:dapl</device>
      <mpi-path>/opt/intel/impi/4.1</mpi-path>
      <process-number>2</process-number>
    </micmpi>
  </test>
</cluster>
10.7 MODULE INFORMATION
• Level: 2
10.8 DEPENDENCIES
• Commands: mktemp, mpirun
• Test Modules: micperf
10.9 NOTES
This test module does not build the MPI Hello World binary using the MPI Library compiler wrappers (i.e., mpicc). It uses a precompiled binary.
11 micperf
Checks Intel(R) Xeon Phi(TM) native performance
11.1 DESCRIPTION
The micperf module runs the STREAM benchmark natively on all available coprocessors and compares the deviation between them.
11.2 METHOD
The micperf module runs the STREAM benchmark natively on all available coprocessors.
11.3 CONFIGURATION
11.3.1 stream-deviation
The stream-deviation tag specifies the number of allowed standard deviations from the median, used to search for outlier values. The allowed range is (median +/- deviation x stddev).
Default value: 3
11.3.2 stream-bandwidth
The stream-bandwidth tag specifies the minimum acceptable Triad memory bandwidth, in MB/s.
Default value: none
11.4 XML CONFIG EXAMPLE
<cluster>
  <test>
    <micperf>
      <stream-bandwidth>51067</stream-bandwidth>
      <stream-deviation>3</stream-deviation>
    </micperf>
  </test>
</cluster>
11.5 NODEFILE CONFIGURATION
When creating a nodefile for your cluster, to define an Intel(R) Xeon Phi(TM) node, you must assign the node type as follows: #type: knc-compute.
11.6 NODELIST EXAMPLE
headnode #type: head
compute00
compute00-mic0 #type: knc-compute
11.7 MODULE INFORMATION
• Level: 3
11.8 DEPENDENCIES
• Test Modules: micconf
12 mount
Check that known directories are correctly mounted
12.1 DESCRIPTION
The mount module checks that well known directories are correctly mounted or meet certain requirements. The /proc filesystem and shared memory device (/dev/shm) should be mounted. The /home directory should be a shared, common directory accessible from any cluster compute node. The permissions on the temporary directory should be correct.
12.2 METHOD
12.2.1 Compliance mode:
Running in compliance mode ensures that the home directory of the user running the tool is under /home and its inode number is the same on all nodes. The stat command is used to gather the information. When the tool is executed by a privileged user, the home directory is tested for the first user in the /etc/passwd file with a home directory under /home.
12.2.2 Execution modes other than compliance:
Other execution modes also check the status of the /proc filesystem and /dev/shm using the /proc/mounts file information.
The mount module also checks that the permissions on the temporary directory are 1777 as reported by the stat command. If $TMPDIR is set, it will be used as the temporary directory; otherwise, /tmp will be used.
12.3 CONFIGURATION
12.3.1 user
If the user tag is provided, the test module will use the home directory of this user for the comparison across all the nodes.
12.3.2 sticky
If the sticky tag is used, the mount module will also consider permissions 0777 on the /tmp directory to be correct.
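The temporary directory check, including the effect of the sticky tag, can be sketched in Python. The module itself relies on the stat command; this illustration uses os.stat instead, and the function name and the allow_0777 parameter (standing in for the sticky tag) are hypothetical.

```python
import os
import stat

def tmp_permissions_ok(path=None, allow_0777=False):
    """Check that the temporary directory has mode 1777 (or 0777 if allowed)."""
    # $TMPDIR takes precedence over /tmp when no explicit path is given.
    path = path or os.environ.get("TMPDIR") or "/tmp"
    mode = stat.S_IMODE(os.stat(path).st_mode)
    allowed = {0o1777}
    if allow_0777:
        allowed.add(0o0777)  # corresponds to configuring the sticky tag
    return mode in allowed
```

Calling tmp_permissions_ok() with no arguments checks $TMPDIR if it is set and /tmp otherwise.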
12.4 EXAMPLE
<mount>
  <sticky/>
  <user> icr </user>
</mount>
12.5 MODULE INFORMATION
• Level: 1
12.6 DEPENDENCIES
• Commands: awk, cat, test
• Test Modules: remote_login
12.7 NOTES
This module assumes that /proc is the mount point for procfs.
This module assumes that the temporary directory is named /tmp.
If this module is being run by a privileged user and /home is managed by automount, at least one user account should be created prior to running this test module.
13 mpi_internode
Check the functionality of the Intel(R) MPI Library Runtime Environment
13.1 DESCRIPTION
The mpi_internode module checks the basic functionality of the Intel(R) MPI Library Runtime Environment over the cluster compute nodes by running an MPI Hello World program using one or more Intel(R) MPI Library devices.
13.2 METHOD
The mpi_internode module runs an MPI Hello World program on one or more Intel(R) MPI Library devices. The test module copies the MPI Hello World program to the home folder of the user running the tool (assuming this folder is shared among all the nodes). By default, it exercises 4 MPI processes on each compute node, leaving the Intel(R) MPI Library to select the device to be used.
The test module uses the I_MPI_FABRICS style introduced in version 4.0.
13.3 CONFIGURATION
13.3.1 device
The device tag specifies which Intel(R) MPI Library device to use. The tag may be defined more than once. Any extra options can be provided by using an "options" XML attribute. The options should be provided in order, as required by the library. If no device is specified, the Intel(R) MPI Library will use the most appropriate fabric for communication between processes.
Allowed values: fabric | intra-node fabric : inter-node fabric
where
• fabric is one of {shm, dapl, tcp, tmi, ofa}
• intra-node fabric is one of {shm, dapl, tcp, tmi, ofa}
• inter-node fabric is one of {dapl, tcp, tmi, ofa}
Default value: none
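The allowed device values listed above can be expressed as a small validation routine. This Python sketch only encodes the fabric names given in this section; the function name is hypothetical and it does not reflect any validation performed by the tool or by the Intel(R) MPI Library.

```python
FABRICS = {"shm", "dapl", "tcp", "tmi", "ofa"}
INTER_NODE_FABRICS = FABRICS - {"shm"}  # shm is not listed as an inter-node fabric

def is_valid_device(value):
    """Accept 'fabric' or 'intra-node fabric:inter-node fabric'."""
    parts = value.split(":")
    if len(parts) == 1:
        return parts[0] in FABRICS
    if len(parts) == 2:
        return parts[0] in FABRICS and parts[1] in INTER_NODE_FABRICS
    return False
```

Values such as shm:tcp and dapl are accepted, while shm:shm is rejected because shm cannot serve as the inter-node fabric.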
13.3.2 mpi-path
The mpi-path tag specifies the base path to the Intel(R) MPI Library installation directory. Setting this tag automatically sets up the environment.
13.3.3 process-number
The process-number tag specifies the number of MPI processes to start on each node.
Allowed values: Integer value.
Default value: 4
13.4 EXAMPLE
<mpi_internode>
  <device>shm:tcp</device>
  <device options="-genv I_MPI_DEBUG 5">shm:dapl</device>
  <mpi-path>/opt/intel/impi/4.0.3</mpi-path>
  <process-number>2</process-number>
</mpi_internode>
13.5 MODULE INFORMATION
• Level: 2
13.6 DEPENDENCIES
• Commands: mktemp, mpirun
• Test Modules: mount, remote_login, tools, mpi_local
13.7 NOTES
This test module does not build the MPI Hello World binary using the MPI Library compiler wrappers (i.e., mpicc).
14 mpi_local
Check the functionality of the Intel(R) MPI Library Runtime Environment
14.1 DESCRIPTION
The mpi_local module checks the consistency of the MPI job startup commands among all the nodes and the basic functionality of the Intel(R) MPI Library Runtime Environment on each node by running a single-node MPI Hello World program on one or more Intel(R) MPI Library devices.
14.2 METHOD
14.2.1 Compliance mode:
If running in compliance mode, the test module checks that the paths to mpirun and mpiexec are consistent on all nodes by using the which command. If no paths are found, the test fails.
NOTE: It tries to resolve the path to mpirun and mpiexec by using the which command. It does not use <mpi-path>.
14.2.2 Execution modes other than compliance:
If running in a mode other than compliance, the module additionally checks the functionality of the MPI runtime environment, exercising (by default) 4 MPI processes over different network devices, using the shm and tcp I_MPI_FABRICS. Furthermore, if the /etc/dat.conf file or the DAT_OVERRIDE variable is present, it also locally exercises the DAPL fabric device.
The test module uses the I_MPI_FABRICS style introduced in version 4.0.
14.3 CONFIGURATION
14.3.1 device
The device tag specifies which Intel(R) MPI Library device to use. It may be defined more than once. Any extra options can be provided by using an "options" XML attribute. The options should be provided in order, as required by the library.
Allowed values: fabric | intra-node fabric : inter-node fabric
where
• fabric is one of {shm, dapl, tcp, tmi, ofa}
• intra-node fabric is one of {shm, dapl, tcp, tmi, ofa}
• inter-node fabric is one of {dapl, tcp, tmi, ofa}
Default value: none
14.3.2 mpi-path
The mpi-path tag specifies the base path to the Intel(R) MPI Library installation directory. Setting this tag automatically sets up the environment.
NOTE: This is not used to perform the uniformity checks.
14.3.3 process-number
The process-number tag specifies the number of MPI processes to start on each node.
Allowed values: Number greater than zero.
Default value: 4
14.4 EXAMPLE
<mpi_local>
  <device>shm:tcp</device>
  <device options="-genv I_MPI_DEBUG 5">shm:dapl</device>
  <mpi-path>/opt/intel/impi/4.0.3</mpi-path>
  <process-number>2</process-number>
</mpi_local>
14.5 MODULE INFORMATION
• Level: 2
14.6 DEPENDENCIES
14.6.1 Compliance mode:
• Commands: mpirun, mpiexec
• Test Modules: remote_login
14.6.2 Other modes:
• Commands: mktemp, stat, rm, mpirun, mpiexec
• Test Modules: mount, remote_login
14.7 NOTES
This test module does not build the MPI Hello World binary using the MPI Library compiler wrappers (i.e., mpicc).
15 packages
Check software packages presence and uniformity among nodes
15.1 DESCRIPTION
The packages module checks the uniformity and correctness of the installed packages. It can use a reference list of packages to check for correctness. If no reference list file is configured, the test module checks package uniformity among nodes.
15.2 METHOD
If a reference list of packages is configured, the test module will check that these packages are installed on the nodes; however, if no reference list is configured, it checks the uniformity of installed packages among the nodes.
The reference list should contain one package per line, and lines starting with '#' will be interpreted as comments.
The test module obtains the list of installed packages on each node using the following commands:
• rpm --query --all # for Red Hat based systems
• dpkg -l | awk '{ print $2,$3 }' # for Debian based systems
• pacman -Q # for Arch based systems
• ls /var/log/packages/ # for Slackware based systems
• ls -d /var/db/pkg/*//* # for Gentoo based systems
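The reference list handling described above can be sketched in Python. The function names are hypothetical, skipping blank lines is an assumption, and package names are compared as exact strings, which may differ from how the test module matches names and versions.

```python
def parse_reference_list(text):
    """Return the package names in a reference list, one per line."""
    packages = []
    for line in text.splitlines():
        line = line.strip()
        # Lines starting with '#' are comments; blank lines are skipped (assumption).
        if not line or line.startswith("#"):
            continue
        packages.append(line)
    return packages

def missing_packages(reference, installed):
    """Return the reference packages that are not in the installed list."""
    installed = set(installed)
    return [pkg for pkg in reference if pkg not in installed]
```

The installed list would come from one of the commands above, one package per line.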
15.3 CONFIGURATION
15.3.1 node
The node tag specifies the path to the file with the list of installed packages to be checked on compute nodes, including head nodes acting as compute nodes.
Default value: none
15.3.2 head
The head tag specifies the path to the file containing the list of installed packages to be checked on the head node. This is not intended to be used for head nodes acting as compute nodes.
Default value: none
15.3.3 exclude
The exclude tag specifies the name of a package to be excluded from the comparison. The string is interpreted as a POSIX matching regular expression. It may be specified multiple times to exclude more than one package.
Note that to exactly match meta characters
^[.*(${\()+|?<>
they should be escaped.
Example: to exclude libstdc++-4.4.6-3.el6.i686:
<exclude>libstdc\+\+-4\.4\.6-3\.el6\.i686</exclude>
Default value: none
15.3.4 include
The include tag explicitly verifies that the specified package is installed. It may be specified multiple times to include more than one package.
Default value: none
15.3.5 command
The command tag specifies a custom command that lists the packages installed in the system. The output of the command has to list one package per line.
If none of the default commands can list the installed packages, a custom command should be used.
Default value: none
15.4 EXAMPLE
<packages>
  <command>pkg_info</command>
  <exclude>rpm-4.3.3-9nonptl</exclude>
  <exclude>xterm-*</exclude>
  <head>head.list</head>
  <include>mpich-ch_p4-gcc-oscar-module-1.2.7-4</include>
  <node>node.list</node>
</packages>
15.5 MODULE INFORMATION
• Level: 3
• Scope: head, compute
15.6 DEPENDENCIES
• Commands: sed, rpm (Red Hat-based systems), dpkg (Debian-based systems), pacman (Arch-based systems), ls (for Gentoo and Slackware based systems).
• Test modules: remote_login
16 ping
Check that all nodes respond to ping from the head node.
16.1 DESCRIPTION
The ping module checks that all nodes respond to ping from the head node.
If running in compliance or certification mode, it checks that the number of nodes complies with the specification.
A cluster is defined as containing one or more head/login nodes, multiple compute nodes, potentially grouped logically into sub-clusters. Each node may provide multiple capabilities. A certified cluster shall contain at least four compute nodes per sub-cluster.
Minimum hardware requirements ensure functional clusters are built when following the specification.
16.2 METHOD
The ping command is used to gather status about all other nodes from the head node.
16.3 CONFIGURATION
16.3.1 time
The time tag specifies the maximum time allowed for a ping response, in milliseconds.
Default value: 100
16.4 EXAMPLE
<ping>
  <time>50</time>
</ping>
16.5 MODULE INFORMATION
• Level: 1
16.6 DEPENDENCIES
• Commands: ping
17 process
Check for stale processes
17.1 DESCRIPTION
The process module checks that the process list does not contain runaway processes (in terms of CPU or memory usage), zombies, or other stale processes.
17.2 METHOD
On each compute node, the test module uses information about the currently running processes as returned by the ps command.
17.3 CONFIGURATION
17.3.1 elapsed_time
The elapsed_time tag specifies the time (in seconds) that is used to define a stale process. See also exempt_uids.
Allowed values: Integers greater than zero.
Default value: 3600
17.3.2 exclude
The exclude tag specifies process names that are excluded from the check. The string is interpreted as a POSIX regular expression. This option may be repeated to exclude more than one process name. Note that to exactly match meta characters
^[.*(${\()+|?<>
they should be escaped.
Default value: monitoring processes and filesystem daemons
17.3.3 exempt_uids
The exempt_uids tag specifies a threshold: uids lower than this value are exempt from the elapsed time check. Daemons, etc., started from system accounts should not be flagged as stale regardless of how long they have been running. See also elapsed_time.
Allowed values: Integers greater than or equal to zero.
Default value: 500
17.3.4 percent_cpu
The percent_cpu tag specifies the percentage of CPU that is used to define a runaway process. Note: on some systems, the CPU percentage is defined relative to a single core; on others, it is relative to all cores.
Allowed values: [1-100]
Default value: 5
17.3.5 percent_memory
The percent_memory tag specifies the percentage of memory that is used to define a runaway process.
Allowed values: [1-100]
Default value: 1
17.3.6 zombie_allowed_elapsed_time
The zombie_allowed_elapsed_time tag specifies the time (in seconds) that is used to allow transient zombies. Intel(R) Cluster Checker and other applications may create transient zombies that are quickly, but not instantly, reaped. These transient zombies are not flagged as "true" zombie processes unless their elapsed time is greater than this value.
Allowed values: Greater than zero.
Default value: 1
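One plausible reading of how these thresholds combine is sketched below in Python. The Proc record, the classify function, the strict "greater than" comparisons, and the way the flags are combined are all assumptions made for illustration; the test module's actual logic may differ.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    """Hypothetical per-process record, as could be parsed from ps output."""
    name: str
    uid: int
    elapsed: int   # seconds since the process started
    pcpu: float    # percentage of CPU
    pmem: float    # percentage of memory
    state: str     # ps state code, e.g. "S", "R", "Z"

def classify(proc, elapsed_time=3600, exempt_uids=500, percent_cpu=5,
             percent_memory=1, zombie_allowed_elapsed_time=1):
    """Return the list of problems detected for one process."""
    flags = []
    if proc.state.startswith("Z"):
        # Transient zombies are tolerated up to the allowed elapsed time.
        if proc.elapsed > zombie_allowed_elapsed_time:
            flags.append("zombie")
        return flags
    # uids below exempt_uids are exempt from the elapsed time check.
    if proc.uid >= exempt_uids and proc.elapsed > elapsed_time:
        flags.append("stale")
    if proc.pcpu > percent_cpu or proc.pmem > percent_memory:
        flags.append("runaway")
    return flags
```

With the default values, a system daemon with uid 38 is never reported as stale, however long it has been running.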
17.4 EXAMPLE
<process>
  <elapsed_time>3600</elapsed_time>
  <exclude>ntpd</exclude>
  <exclude>portmap</exclude>
  <exempt_uids>500</exempt_uids>
  <percent_cpu>5</percent_cpu>
  <percent_memory>1</percent_memory>
  <zombie_allowed_elapsed_time>1</zombie_allowed_elapsed_time>
</process>
17.5 MODULE INFORMATION
• Level: 2
17.6 DEPENDENCIES
• Commands: ps
• Test modules: remote_login
18 remote_login
Check remote connectivity
18.1 DESCRIPTION
The remote_login module checks remote connectivity to all nodes, including remote command version uniformity and proper execution time.
18.2 METHOD
The command specified by the cmd tag is used to run simple commands on all nodes.
18.3 CONFIGURATION
18.3.1 cmd
The cmd tag specifies the remote execution command to be used.
Default value: ssh
18.3.2 time
The time tag specifies the maximum time allowed for <cmd> to respond, in milliseconds.
Default value: 100
18.3.3 version
The version tag specifies the output of '<cmd> -V' that should be received. (Note that if a custom <cmd> is configured, it should support the '-V' switch and print its version.)
NOTE: The version is ignored on knc-compute nodes.
Default value: The output of a random node will be used as reference
18.4 EXAMPLE
<remote_login>
  <cmd>ssh</cmd>
  <time>1</time>
  <version>OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010</version>
</remote_login>
18.5 MODULE INFORMATION
• Level: 1
18.6 DEPENDENCIES
• Commands: ping
19 shells
Check that all nodes have the required shells
19.1 DESCRIPTION
The shells module checks that all nodes have the required shells.
19.2 METHOD
For each shell, the module checks that the interpreter is present in /bin and is able to run a "Hello World" script. The shells tested are: sh, bash, csh, ksh and tcsh.
19.3 CONFIGURATION
19.3.1 none
19.4 MODULE INFORMATION
• Level: 1
19.5 DEPENDENCIES
• Commands: chmod, echo, mktemp, test
• Test Modules: remote login
20 storage
Check Intel(R) Cluster Ready specification compliance
20.1 DESCRIPTION
The storage module checks that the storage capacity available to the head node meets the requirements in the Intel(R) Cluster Ready Specification. This module can check against Intel Cluster Ready Specification versions 1.2 or 1.3.
Compliance with minimum hardware requirements ensures that functional clusters are built when following the specification.
20.2 METHOD
The disk space information is retrieved by using the df command. Shared memory partitions are not considered. If no head node is detected, then the first compute node is used.
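The following Python sketch shows one way df output could be reduced to a single capacity figure while skipping shared memory partitions. Several details are assumptions made for illustration: the `available_kib` helper name, the use of `df -P -T -k` (the `-T` type column is a GNU extension), the choice of the Available column, and the set of filesystem types treated as shared memory. The sample output is invented.

```python
# Filesystem types treated as shared memory in this sketch (an assumption).
SHARED_MEMORY_TYPES = {"tmpfs", "devtmpfs", "ramfs"}

def available_kib(df_output):
    """Sum the Available column of `df -P -T -k` output, skipping shared memory."""
    total = 0
    for line in df_output.strip().splitlines()[1:]:   # skip the header line
        fields = line.split()
        fstype, available = fields[1], int(fields[4])
        if fstype not in SHARED_MEMORY_TYPES:
            total += available
    return total

SAMPLE = """\
Filesystem     Type  1024-blocks     Used Available Capacity Mounted on
/dev/sda1      ext4     51475068 10485760  38351484      22% /
tmpfs          tmpfs     8135420        0   8135420       0% /dev/shm
/dev/sdb1      xfs     209612800 52428800 157184000      26% /home
"""

print(available_kib(SAMPLE), "KiB available outside shared memory")
```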
20.3 CONFIGURATION
The version of the Intel(R) Cluster Ready Specification to be checked can be set using the compliance and certification options. See the User's Guide for more information about specifying the Intel(R) Cluster Ready Specification version.
20.4 MODULE INFORMATION
• Level: 1
20.5 DEPENDENCIES
• Commands: df
• Test Modules: remote login
21 stream
Check the memory bandwidth of a node using the STREAM benchmark
21.1 DESCRIPTION
The stream module checks the memory bandwidth of each compute node using the Triad STREAM benchmark and its deviation among the cluster nodes. Deviation is checked only if there are three or more valid results from the compute nodes.
21.2 METHOD
By default, the module runs a pre-compiled STREAM binary configured to use a 30 million element array, which requires nearly 687 MB of memory.
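The memory figure is consistent with the usual STREAM layout of three double-precision arrays. The short Python calculation below works through that arithmetic; the array count and element size are assumptions based on the standard STREAM benchmark rather than on anything stated in this document.

```python
ELEMENTS = 30_000_000    # array length stated above
ARRAYS = 3               # standard STREAM uses three arrays (assumption)
BYTES_PER_ELEMENT = 8    # double precision (assumption)

total_bytes = ELEMENTS * ARRAYS * BYTES_PER_ELEMENT
total_mib = total_bytes / 2**20   # megabytes counted as 2**20 bytes

print(f"{total_bytes} bytes = {total_mib:.1f} MiB")
```

Under these assumptions the total is 720,000,000 bytes, or about 686.6 when expressed in units of 2^20 bytes.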
21.3 CONFIGURATION
21.3.1 bandwidth
The bandwidth tag specifies the minimum acceptable Triad memory bandwidth, in MB/s.
NOTE: If not configured, no comparison is made and the obtained bandwidth is simply displayed.
Default value: none
21.3.2 deviation
The deviation tag specifies the number of allowed standard deviations from the median, used to search for outlier values. The allowed range is (median +/- deviation x stddev).
Default value: 3
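A minimal Python sketch of this outlier test follows. The function name `find_outliers` and the sample bandwidth figures are hypothetical, and the use of the sample standard deviation is an assumption, since the document does not say which estimator the module uses.

```python
import statistics

def find_outliers(results, deviation=3.0):
    """results maps node name -> Triad bandwidth in MB/s; returns the outliers."""
    if len(results) < 3:
        return {}   # deviation is checked only with three or more valid results
    values = list(results.values())
    median = statistics.median(values)
    stddev = statistics.stdev(values)   # sample standard deviation (assumption)
    low = median - deviation * stddev
    high = median + deviation * stddev
    return {node: bw for node, bw in results.items() if not low <= bw <= high}

results = {"n1": 9800, "n2": 9900, "n3": 10000, "n4": 10100, "n5": 4000}
print(find_outliers(results, deviation=1))
print(find_outliers(results, deviation=3))
```

With these invented numbers, node n5 falls outside the range when deviation is 1 but not when it is 3, because a single slow node in a small cluster inflates the standard deviation considerably. A tighter deviation value may therefore be worth considering on small clusters.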
21.4 EXAMPLE
<stream>
  <bandwidth>1000</bandwidth>
  <deviation>3</deviation>
</stream>
21.5 MODULE INFORMATION
• Level: 3
21.6 DEPENDENCIES
• Test Modules: remote login
22 tools
Check that all nodes have the required tools
22.1 DESCRIPTION
The tools module checks that all nodes have the required tools and functionality.
22.2 METHOD
For each tool, the module checks that the interpreter is present in /usr/bin, that its version is uniform across nodes, and that it runs a "Hello World" one-liner. The tools tested are: perl, python, tclsh.
NOTE: If the versions are not explicitly configured in the configuration file and the execution was configured to check compliance, then versions are compared against the Intel(R) Cluster Ready specification.
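The version part of this check can be sketched in Python as follows. The helper name `check_versions` and the node names are hypothetical, and falling back to the most common version when none is configured is an assumption made for the example, not documented behavior.

```python
from collections import Counter

def check_versions(reported, expected=None):
    """reported maps node -> version string for one tool.

    Returns the nodes whose version differs from the reference version.
    """
    if not reported:
        return {}
    # With no explicit version, use the most common one as reference (assumption).
    reference = expected or Counter(reported.values()).most_common(1)[0][0]
    return {node: v for node, v in reported.items() if v != reference}

perl = {"node01": "5.8.8", "node02": "5.8.8", "node03": "5.6.1"}
print(check_versions(perl))                     # uniformity only
print(check_versions(perl, expected="5.6.1"))   # against a required version
```

The first call reports node03 as the odd one out; the second reports node01 and node02, since neither matches the required 5.6.1.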
22.3 CONFIGURATION
The version of the Intel(R) Cluster Ready Specification to be checked can be set using the compliance and certification options. See the User's Guide for more information about specifying the Intel(R) Cluster Ready Specification version.
22.3.1 python-path
The python-path tag specifies the path where the python interpreter is located.
Default value: /usr/bin
22.3.2 python-version
The python-version tag specifies the version of python expected.
Default values (in compliance mode):
• (Intel(R) Cluster Ready 1.2) 2.3.4
• (Intel(R) Cluster Ready 1.3) 2.4.3
22.3.3 perl-path
The perl-path tag specifies the path where the perl interpreter is located.
Default value: /usr/bin
22.3.4 perl-version
The perl-version tag specifies the version of perl expected.
Default values (in compliance mode):
• (Intel(R) Cluster Ready 1.2) 5.6.1
• (Intel(R) Cluster Ready 1.3) 5.8.8
22.3.5 tclsh-path
The tclsh-path tag specifies the path where the tclsh interpreter is located.
Default value: /usr/bin
22.3.6 tclsh-version
The tclsh-version tag specifies the version of tclsh expected.
Default values (in compliance mode):
• (Intel(R) Cluster Ready 1.2) 8.4.7
• (Intel(R) Cluster Ready 1.3) 8.4.13
22.4 EXAMPLE
<tools>
  <perl-path>/usr/bin</perl-path>
  <perl-version>5.8.8</perl-version>
  <python-path>/usr/bin</python-path>
  <python-version>2.4.3</python-version>
  <tclsh-path>/usr/bin</tclsh-path>
  <tclsh-version>8.4.13</tclsh-version>
</tools>
22.5 MODULE INFORMATION
• Level: 2
Optimization Notice
Intel compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804