Decision Trees and MPI Collective Algorithm Selection Problem

Jelena Pješivac-Grbović, Graham E. Fagg, Thara Angskun, George Bosilca, and Jack J. Dongarra, IPDPS (IEEE International Parallel & Distributed Processing Symposium) 2007.

Reporter: Yu Tang Liu


Outline

◦ Abstract
◦ Introduction
◦ C4.5 Decision Tree algorithm
◦ Experimental Results and Analysis
◦ Conclusion

Abstract

Selecting a close-to-optimal collective algorithm at run time, based on the parameters of the collective call, is an important step in achieving good performance of MPI applications.

The paper explores the applicability of C4.5 decision trees to the MPI collective algorithm selection problem.

Introduction

The performance of MPI collective operations depends on:

◦ Total number of nodes involved in communication
◦ System and network characteristics
◦ Size of data being transferred
◦ Current load
◦ The operation that is being performed
◦ The segment size used for operation pipelining

The goal is to select the best possible algorithm and segment size combination for every instance of a collective operation.

Introduction

Process of tuning a system:

1. Detailed profiling of the system, possibly combined with communication modeling.

2. Analyzing the collected data and generating a decision function.

3. During run time, the decision function selects the close-to-optimal method (combination of algorithm and segment size) for a particular collective instance.
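The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the measurement records, the algorithm names, and the nearest-profiled-point lookup are all hypothetical stand-ins, and the lookup is a placeholder for the decision function, not the decision-tree approach the paper studies.

```python
# Step 1 (assumed output of profiling): hypothetical records of
# (communicator size, message size in bytes, method, time in microseconds).
# A "method" is an (algorithm, segment size) pair; all numbers are made up.
MEASUREMENTS = [
    (8, 1024, ("binomial", 0), 40.0),
    (8, 1024, ("pipeline", 1024), 55.0),
    (8, 1048576, ("binomial", 0), 9000.0),
    (8, 1048576, ("pipeline", 8192), 6100.0),
]


def build_decision_table(measurements):
    """Step 2: keep the fastest measured method for every profiled point."""
    best = {}
    for comm_size, msg_size, method, duration in measurements:
        key = (comm_size, msg_size)
        if key not in best or duration < best[key][1]:
            best[key] = (method, duration)
    return {key: method for key, (method, _) in best.items()}


def decide(table, comm_size, msg_size):
    """Step 3: at run time, return the method of the nearest profiled point."""
    nearest = min(
        table,
        key=lambda k: (abs(k[0] - comm_size), abs(k[1] - msg_size)),
    )
    return table[nearest]


table = build_decision_table(MEASUREMENTS)
print(decide(table, 8, 2048))    # ('binomial', 0)
print(decide(table, 8, 900000))  # ('pipeline', 8192)
```

A table lookup like this grows with the number of profiled points, which is one motivation for compressing the same information into a compact decision function.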

C4.5 Decision Tree Algorithm

Decision Tree Example

C4.5 Decision Tree Algorithm

In the decision tree, each node corresponds to a non-categorical attribute and each arc to a possible value of that attribute. A leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf.

Each node of the decision tree should be associated with the non-categorical attribute that is most informative among the attributes not yet considered on the path from the root.
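One common way to make "most informative" concrete is information gain, the criterion used by ID3 (C4.5 normally refines it with a gain ratio). The sketch below computes it for a toy data set; the records, attribute names, and values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy of the target minus its expected entropy after splitting."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def most_informative(records, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical, already discretized records: the class is the best algorithm.
RECORDS = [
    {"msg": "small", "procs": "few", "algorithm": "binomial"},
    {"msg": "small", "procs": "many", "algorithm": "binomial"},
    {"msg": "large", "procs": "few", "algorithm": "pipeline"},
    {"msg": "large", "procs": "many", "algorithm": "pipeline"},
]

print(most_informative(RECORDS, ["msg", "procs"], "algorithm"))  # msg
```

In this toy set the message size alone determines the class, so splitting on it removes all uncertainty (gain of 1 bit), while splitting on the process count removes none.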

C4.5 Decision Tree Algorithm

Requirements for applying the C4.5 algorithm:

◦ Attribute-value description
◦ Predefined classes
◦ Discrete classes
◦ Sufficient data
◦ “Logical” classification models

C4.5 Decision Tree Algorithm

Additional parameters that affect the resulting decision tree:

◦ Weight
◦ Confidence level
◦ Attribute grouping
◦ Windowing

C4.5 Decision Tree Algorithm

◦ ID3 algorithm

◦ C4.5 algorithm = ID3 algorithm +

Experimental Results and Analysis

C4.5 decision tree for Alltoall on the Nano cluster

Experimental Results and Analysis

Barrier is a collective operation used to synchronize a group of nodes. It guarantees that by the end of the operation, all processes involved in the barrier have at least entered the barrier.

◦ In the flat-tree/linear algorithm, all nodes report to a preselected root; once every node has reported to the root, the root sends a releasing message to all participants.

◦ In the double ring algorithm, a zero-byte message is sent from a preselected root circularly to the right. A node can leave the barrier only after it receives the message for the second time.

◦ The Bruck algorithm requires $\lceil \log_2 P \rceil$ communication steps. At step k, node r receives a zero-byte message from node $(r - 2^k)$ and sends a message to node $(r + 2^k)$ (with wrap around).
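The Bruck barrier's communication pattern can be checked with a small simulation. The sketch below is not MPI code: it only tracks, for each rank, the set of ranks it has heard from directly or transitively, assuming steps are numbered k = 0, 1, ... and that step k uses a distance of 2^k with wrap around. The function names are illustrative.

```python
def bruck_barrier_steps(nprocs):
    """Number of steps, ceil(log2(P)), computed with integer arithmetic."""
    return (nprocs - 1).bit_length()


def bruck_partners(rank, step, nprocs):
    """Return (receive_from, send_to) for `rank` at step k = `step`."""
    distance = 2 ** step
    return (rank - distance) % nprocs, (rank + distance) % nprocs


def simulate_bruck_barrier(nprocs):
    """Return, per rank, the set of ranks known to have entered the barrier."""
    known = [{rank} for rank in range(nprocs)]
    for step in range(bruck_barrier_steps(nprocs)):
        # All messages of one step are exchanged simultaneously, so each
        # receiver merges the sender's knowledge from before the step.
        snapshot = [set(entry) for entry in known]
        for rank in range(nprocs):
            source, _ = bruck_partners(rank, step, nprocs)
            known[rank] |= snapshot[source]
    return known


for nprocs in (5, 8):
    everyone = set(range(nprocs))
    done = all(entry == everyone for entry in simulate_bruck_barrier(nprocs))
    print(nprocs, bruck_barrier_steps(nprocs), done)
# 5 3 True
# 8 3 True
```

After the last step every rank has heard from every other rank, which is exactly the guarantee a barrier needs, including for process counts that are not powers of two.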

Experimental Results and Analysis

Alltoall is used to exchange data among all processes in a group. The operation is equivalent to all processes executing the scatter operation on their local buffer.

◦ In the linear algorithm, at step i, the ith node sends a message to all other nodes. The (i+1)th node is able to proceed and start sending as soon as it receives the complete message from the ith node. We allow for segmentation of messages being sent.

◦ In the pairwise exchange algorithm, at step i, the node with rank r sends a message to node (r+i) and receives a message from node (r-i), with wrap around. We do not segment messages in this algorithm.
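The pairwise exchange pattern can be simulated without MPI to confirm that every block reaches its destination. The sketch assumes steps i = 1, ..., P-1 and that each rank copies its own block locally; the slide does not state the step range, so treat both as assumptions. Function names are illustrative.

```python
def pairwise_exchange_schedule(nprocs):
    """For each step i = 1..P-1, list (rank, send_to, receive_from)."""
    return [
        [(r, (r + step) % nprocs, (r - step) % nprocs) for r in range(nprocs)]
        for step in range(1, nprocs)
    ]


def simulate_alltoall(send_buffers):
    """send_buffers[s][d] is the block rank s holds for rank d.

    Returns recv, where recv[d][s] is the block rank d received from rank s.
    """
    nprocs = len(send_buffers)
    recv = [[None] * nprocs for _ in range(nprocs)]
    for rank in range(nprocs):
        recv[rank][rank] = send_buffers[rank][rank]  # local copy, no message
    for step in pairwise_exchange_schedule(nprocs):
        for rank, dest, _source in step:
            recv[dest][rank] = send_buffers[rank][dest]
    return recv


send = [[f"{s}->{d}" for d in range(4)] for s in range(4)]
recv = simulate_alltoall(send)
print(recv[2])  # ['0->2', '1->2', '2->2', '3->2']
```

Each rank sends exactly one message and receives exactly one message per step, so the exchange completes in P-1 steps.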

Experimental Results and Analysis

The Broadcast operation transmits an identical message from the root process to all processes of the group. At the end of the call, the contents of the root’s communication buffer have been copied to all other processes.

◦ In the flat-tree/linear algorithm, the root node sends an individual message to all participating nodes.

◦ In the pipeline algorithm, messages are propagated from the root left to right in a linear fashion.

◦ In the binomial and binary tree algorithms, messages traverse the tree starting at the root and going towards the leaf nodes through intermediate nodes.

◦ In the splitted-binary tree algorithm, the original message is split into two parts: the “left” half of the message is sent down the left half of the binary tree, and the “right” half is sent down the right half of the tree. In the final phase of the algorithm, every node exchanges messages with its “pair” from the opposite side of the binary tree.

◦ binary tree algorithm
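As an illustration of a tree-based broadcast, the sketch below builds the round-by-round schedule of one common binomial-tree formulation, in which the set of ranks holding the message doubles every round. The slides do not specify how the trees are constructed, so this particular numbering (virtual ranks relative to the root) is an assumption, not necessarily the construction used in the paper.

```python
def binomial_broadcast_rounds(nprocs, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver)."""
    rounds = []
    have = 1  # number of virtual ranks that already hold the message
    while have < nprocs:
        this_round = []
        for vrank in range(have):
            peer = vrank + have
            if peer < nprocs:
                # Map virtual ranks (root = 0) back to real ranks.
                this_round.append(
                    ((vrank + root) % nprocs, (peer + root) % nprocs)
                )
        rounds.append(this_round)
        have *= 2
    return rounds


for number, pairs in enumerate(binomial_broadcast_rounds(6, root=2)):
    print(number, pairs)
# 0 [(2, 3)]
# 1 [(2, 4), (3, 5)]
# 2 [(2, 0), (3, 1)]
```

With 6 processes the message reaches everyone in 3 rounds, whereas the flat-tree/linear algorithm would need the root to send 5 individual messages.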

Experimental Results and Analysis

The Reduce operation combines elements provided in the input buffer of each process within a group using the specified operation, and returns the combined value in the output buffer of the root process.

◦ Flat-tree/linear
◦ Pipeline
◦ Binomial tree
◦ Binary tree
◦ k-chain tree
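To make the Reduce semantics and the role of the segment size concrete, here is a minimal simulation of a pipeline-style reduce along a single chain of processes ending at root 0. It is a sketch under assumptions: the chain order, the default segment size, and the function name are illustrative, and the loop runs sequentially, whereas a real pipeline overlaps the segments.

```python
from operator import add


def pipeline_reduce(buffers, op=add, segment_size=2):
    """Combine per-process buffers element-wise along the chain P-1 -> ... -> 0.

    buffers[rank] is the input buffer of `rank`; the result is what root 0
    would hold. Operands are combined in rank order, so the operation only
    needs to be associative, not commutative.
    """
    nprocs = len(buffers)
    length = len(buffers[0])
    result = list(buffers[nprocs - 1])  # partial result starts at the chain end
    for start in range(0, length, segment_size):
        stop = min(start + segment_size, length)
        # Each segment travels down the chain, picking up one contribution
        # per process on its way to the root.
        for rank in range(nprocs - 2, -1, -1):
            for j in range(start, stop):
                result[j] = op(buffers[rank][j], result[j])
    return result


print(pipeline_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
# [111, 222, 333]
```

The segment size does not change the result, only how the work is chopped up; in a real implementation it controls how much communication and computation can overlap.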

Experimental Results and Analysis

Experimental Results and Analysis

Broadcast decision tree statistics corresponding to the data presented in the last figure.

Experimental Results and Analysis

Performance penalty of Broadcast decision trees corresponding to the data presented in the last figure and table.
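A performance penalty of this kind can be computed as sketched below, assuming it is defined as the relative slowdown of the method chosen by the decision function compared with the fastest measured method at the same data point. That definition, the timings, and the function names are assumptions made for illustration.

```python
def relative_penalty(times_by_method, chosen_method):
    """Percent slowdown of the chosen method relative to the best one."""
    best = min(times_by_method.values())
    return (times_by_method[chosen_method] - best) / best * 100.0


def mean_penalty(data_points, decision_function):
    """Average the relative penalty over all profiled data points."""
    penalties = [
        relative_penalty(times, decision_function(params))
        for params, times in data_points
    ]
    return sum(penalties) / len(penalties)


# Hypothetical data: ((communicator size, message size), {method: time}).
DATA_POINTS = [
    ((8, 1024), {"binomial": 40.0, "pipeline": 50.0}),
    ((8, 1048576), {"binomial": 9000.0, "pipeline": 6000.0}),
]


def always_binomial(params):
    """A deliberately naive decision function used as a baseline."""
    return "binomial"


print(mean_penalty(DATA_POINTS, always_binomial))  # 25.0
```

A decision function that always picks the measured optimum scores 0%, so the metric directly reports how much performance is lost by using the decision function instead of an exhaustive lookup.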

Experimental Results and Analysis

Experimental Results and Analysis

Statistics for combined Broadcast and Reduce decision trees corresponding to the data presented in the last figure.

Experimental Results and Analysis

Mean performance penalty of the combined decision tree for each of the collectives.

Experimental Results and Analysis

Segment of the combined Broadcast and Reduce decision tree ‘-m 40 -c 25’

Conclusion

The C4.5 decision tree can be used to generate a reasonably small and very accurate decision function: the mean performance penalty on existing performance data was within the measurement error for all trees we considered.

The combined Broadcast and Reduce decision trees were also able to produce decision functions with less than 2.5% relative performance penalty for both collectives. This indicates that it is possible to use information about one MPI collective operation to generate a reasonably good decision function for another collective.