© SERG Reverse Engineering (Bunch) Software Clustering Using Bunch

  • Slide 1
  • SERG Reverse Engineering (Bunch) Software Clustering Using Bunch
  • Slide 2
  • Software Clustering Environment. [Figure: components of the environment: Source Code; Source Code Analyzer (e.g., Chava); Source Code Database; Query Script (e.g., AWK); Module Dependency Graph; Bunch Clustering Tool; Output File; Graph Layout & Visualization Utility (e.g., dotty); Clustered Graph.]
  • Slide 3
  • The Structure of a Graphical Editor. [Figure: dependency graph with the nodes boxParserClass, error, stackOfAnyWithTop, boxScannerClass, Fonts, Globals, Mathlib, EdgeClass, ColorTable, stackOfAny, Lexer, hashedBoxes, Event, GenerateProlog, NodeOfAny, hashGlobals, main, boxClass.]
  • Slide 4
  • The Software Clustering Problem: find a good partition of the software structure. A partition is the decomposition of a set of elements (i.e., all the nodes of the graph) into mutually disjoint clusters. A good partition is one where highly interdependent nodes are grouped in the same cluster and independent nodes are assigned to separate clusters.
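The definition of a partition above can be checked mechanically. The following is a minimal sketch, assuming clusters are represented as Python sets of node names; the function name `is_partition` and the sample clusterings are hypothetical and not part of Bunch.

```python
from typing import Hashable, Iterable, List, Set


def is_partition(nodes: Iterable[Hashable], clusters: List[Set[Hashable]]) -> bool:
    """Return True if `clusters` is a partition of `nodes`."""
    node_set = set(nodes)
    seen = set()
    for cluster in clusters:
        if not cluster:          # empty clusters are not allowed
            return False
        if seen & cluster:       # clusters must be mutually disjoint
            return False
        seen |= cluster
    return seen == node_set      # together the clusters must cover every node


nodes = ["N1", "N2", "N3", "N4", "N5", "N6"]
valid = [{"N1", "N2", "N3"}, {"N4", "N5", "N6"}]
overlapping = [{"N1", "N2", "N3"}, {"N3", "N4", "N5", "N6"}]  # N3 appears twice
incomplete = [{"N1", "N2"}, {"N4", "N5", "N6"}]               # N3 is missing
```

Only `valid` satisfies the definition. Whether a valid partition is also a good one is a separate question that depends on the edges of the graph.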
  • Slide 5
  • Not All Partitions Are Created Equal... [Figure: three drawings of the same six-node graph (N1 to N6), including one labeled "Good Partition!" and one labeled "Bad Partition!".]
  • Slide 6
  • Our Assumption: well-designed software systems are organized into cohesive clusters that are loosely interconnected.
  • Slide 7
  • Problem: There are too many partitions of the MDG. The number of MDG partitions grows very quickly as the number of modules in the system increases. A 15-module system is about the limit for performing exhaustive analysis. The recurrence is $S_{n,k} = \begin{cases} 1 & \text{if } k = 1 \lor k = n \\ S_{n-1,k-1} + k\,S_{n-1,k} & \text{otherwise} \end{cases}$, where $S_{n,k}$ counts the partitions of $n$ modules into $k$ clusters. Number of modules = number of partitions:
    1 = 1; 2 = 2; 3 = 5; 4 = 15; 5 = 52;
    6 = 203; 7 = 877; 8 = 4140; 9 = 21147; 10 = 115975;
    11 = 678570; 12 = 4213597; 13 = 27644437; 14 = 190899322; 15 = 1382958545;
    16 = 10480142147; 17 = 82864869804; 18 = 682076806159; 19 = 5832742205057; 20 = 51724158235372
  • Slide 8
  • Edge Types. With respect to each cluster, there are two different kinds of edges: intra-edges, which start and end within the same cluster, and inter-edges, which start and end in different clusters. [Figure: a cluster containing nodes a, b, and c, with edges to other clusters.]
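The two edge kinds can be separated with a single pass over the edge list. This is a small sketch, assuming edges are `(source, target)` pairs and that a dictionary maps each node to a cluster identifier; the names `classify_edges` and `cluster_of`, and the nodes `x` and `y`, are illustrative only.

```python
def classify_edges(edges, cluster_of):
    """Split edges into intra-edges and inter-edges for a given clustering."""
    intra, inter = [], []
    for src, dst in edges:
        if cluster_of[src] == cluster_of[dst]:
            intra.append((src, dst))   # starts and ends in the same cluster
        else:
            inter.append((src, dst))   # crosses a cluster boundary
    return intra, inter


# Nodes a, b, c share one cluster; x and y stand for nodes in other clusters.
edges = [("a", "b"), ("b", "c"), ("c", "x"), ("y", "a")]
cluster_of = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 2}
intra, inter = classify_edges(edges, cluster_of)
```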
  • Slide 9
  • A Partition of the Structure of the Graphical Editor. [Figure: the graphical editor's dependency graph grouped into clusters, with the nodes Main, Event, error, boxParserClass, Globals, GenerateProlog, stackOfAnyWithTop, boxClass, colorTable, Fonts, EdgeClass, MathLib, stackOfAny, NodeOfAny, hashedBoxes, hashGlobals, boxScannerClass, Lexer.]
  • Slide 10
  • Example: The Structure of the dot System
  • Slide 11
  • Example: The Structure of the dot System. The Module Dependency Graph (MDG): we represent the structure of the system as a graph, called the MDG. The nodes are the modules/classes. The edges are the relations between the modules/classes.
  • Slide 12
  • Example: The Structure of the dot System. Dot is a reasonably small system, but its structure is complex and hard to understand.
  • Slide 13
  • The dot System after Clustering and Filtering. The clustered view highlights: modules from the source code; relations between the modules; subsystems that group common features; identification of special modules (relations are hidden to improve clarity).
  • Slide 14
  • The dot System after Clustering and Filtering. The Partitioned Module Dependency Graph (MDG): the partitioned MDG contains clusters that group related nodes from the MDG. The clusters represent the subsystems. The subsystems highlight high-level features or services provided by the modules in the cluster.
  • Slide 15
  • The dot System after Clustering and Filtering. The clustered view of dot (its partitioned MDG) is easier to understand than the unclustered view.
  • Slide 16
  • Step 1: Creating the MDG. Example: the MDG for Apache's Regular Expression class library. [Figure: source code, e.g. void main() { printf("hello"); }, is processed by source code analysis tools (Acacia, Chava).] 1. The MDG can be generated automatically using source code analysis tools. 2. Nodes are the modules/classes; edges represent source-code relations. 3. Edge weights can be established in many ways, and different MDGs can be created depending on the types of relations considered.
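Point 3 can be made concrete with a short sketch. It assumes the analysis tool has already produced `(source, relation kind, target)` triples; the weight table, the function `build_mdg`, and the module names are hypothetical, since the slides do not prescribe any particular weighting scheme.

```python
from collections import defaultdict

# Hypothetical weights per relation kind; the slides give no specific values.
RELATION_WEIGHTS = {"inherit": 3, "call": 1, "instantiate": 2}


def build_mdg(relations, kinds=None):
    """Collapse (source, kind, target) relations into weighted MDG edges.

    If `kinds` is given, only those relation kinds are considered, so
    different MDGs can be derived from the same source code.
    """
    mdg = defaultdict(int)
    for src, kind, dst in relations:
        if kinds is not None and kind not in kinds:
            continue
        if src == dst:
            continue  # ignore a module's dependencies on itself
        mdg[(src, dst)] += RELATION_WEIGHTS.get(kind, 1)
    return dict(mdg)


relations = [
    ("Parser", "call", "Lexer"),
    ("Parser", "call", "Lexer"),
    ("Parser", "instantiate", "Node"),
    ("Node", "inherit", "Base"),
]
full_mdg = build_mdg(relations)
calls_only = build_mdg(relations, kinds={"call"})
```

Here `full_mdg` has three weighted edges, while `calls_only` keeps only the Parser to Lexer edge, which illustrates how the choice of relations changes the graph that is later clustered.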
  • Slide 17
  • Step 1: Creating the MDG. Automatic MDG Generation: we provide scripts on the SERG web page to create MDGs automatically for systems developed in C, C++, and Java. The MDG eliminates details associated with particular programming languages.
  • Slide 18
  • Why do we want to cluster? To group components of a system. Grouping gives some semantic information: we find out which components are strongly related to each other, and which components are not so strongly connected to each other. This gives us an idea of the logical subsystem structure of the entire system.
  • Slide 19
  • Is clustering useful in practice? Documentation may not exist. As software grows, the documentation doesn't always stay up to date. There are constant changes in who uses and develops a system.
  • Slide 20
  • Our Approach to Automatic Clustering: treat automatic clustering as a search problem. Maximize an objective function that formally quantifies the quality of an MDG partition. We refer to the value of the objective function as the modularization quality (MQ). MQ is a measurement and not a metric.
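The slides do not give the formula for MQ, so the following is only an illustrative stand-in for such an objective function, not Bunch's definition. It assumes an unweighted edge list and rewards each cluster for its intra-edges while penalizing it for half of the inter-edges that touch it; the function name and the six-node example graph are hypothetical.

```python
def cluster_factor_quality(edges, cluster_of):
    """Illustrative objective: sum over clusters of intra / (intra + inter / 2)."""
    intra, inter = {}, {}
    for src, dst in edges:
        cs, cd = cluster_of[src], cluster_of[dst]
        if cs == cd:
            intra[cs] = intra.get(cs, 0) + 1
        else:
            # an inter-edge counts against both clusters it connects
            inter[cs] = inter.get(cs, 0) + 1
            inter[cd] = inter.get(cd, 0) + 1
    score = 0.0
    for cluster in set(cluster_of.values()):
        mu = intra.get(cluster, 0)
        eps = inter.get(cluster, 0)
        if mu > 0:  # clusters without intra-edges contribute nothing
            score += mu / (mu + eps / 2)
    return score


# Two triangles joined by a single edge.
edges = [("N1", "N2"), ("N2", "N3"), ("N3", "N1"),
         ("N4", "N5"), ("N5", "N6"), ("N6", "N4"),
         ("N3", "N4")]
good_partition = {"N1": 0, "N2": 0, "N3": 0, "N4": 1, "N5": 1, "N6": 1}
bad_partition = {"N1": 0, "N4": 0, "N2": 1, "N5": 1, "N3": 2, "N6": 2}
good = cluster_factor_quality(edges, good_partition)
bad = cluster_factor_quality(edges, bad_partition)
```

Under this stand-in, the partition that keeps each triangle together scores higher than the one that splits every triangle, which is the behaviour an objective function of this kind is meant to capture.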
  • Slide 21
  • Why automatic clustering? Experts are not always available. Automatic clustering gives the previously mentioned benefits at low cost.
  • Slide 22
  • How does clustering help? Helps create models of software in the absence of experts. Gives new developers a road map of the software's structure. Helps developers understand legacy systems. Lets developers compare the documented structure with some inherent structure of the system.
  • Slide 23
  • Module Dependency Graph: represents the dependencies between the components of a system. Nodes are components: classes, modules, files, packages, functions. Edges are relationships between components: inherit, call, instantiate, etc.
  • Slide 24
  • Software Clustering Problem: for a given MDG, and a function that determines the quality of a partitioning of that MDG, find the best partition.
  • Slide 25
  • Possible Answer? We could look at every possible partition. Exhaustive search always gives the best answer. The number of partitions of a set of n elements into k clusters is given by the Stirling recurrence: $S(n, k) = 1$ if $k = 1$ or $k = n$; otherwise $S(n, k) = S(n-1, k-1) + k\,S(n-1, k)$.
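Looking at every possible partition can be sketched as a recursive generator plus a `max` over its output. This is an assumption-laden illustration, not Bunch's implementation: the names `all_partitions`, `exhaustive_search`, and `toy_quality` are hypothetical, and the toy objective (plus one for each connected pair in the same cluster, minus one for each unconnected pair) merely stands in for a real quality function.

```python
from itertools import combinations


def all_partitions(items):
    """Yield every partition of the list `items` as a list of clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in all_partitions(rest):
        # place `first` into each existing cluster in turn
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # or give `first` a cluster of its own
        yield [[first]] + partial


def exhaustive_search(items, quality):
    """Return a partition with the highest quality by trying all of them."""
    return max(all_partitions(items), key=quality)


EDGES = {frozenset(("a", "b")), frozenset(("c", "d"))}


def toy_quality(partition):
    score = 0
    for cluster in partition:
        for u, v in combinations(cluster, 2):
            score += 1 if frozenset((u, v)) in EDGES else -1
    return score


best = exhaustive_search(["a", "b", "c", "d"], toy_quality)
```

For four nodes this inspects only 15 partitions, so the approach is practical here; the cost is that the number of candidates grows very quickly with the number of nodes.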
  • Slide 26
  • Exhaustive Search. If exhaustive search is optimal, then why not use it? The solution to the Stirling recurrence grows very rapidly. The search for a partitioning of n nodes must go through the partitions for every k in S(n, k). Thus the size of the search space is $\sum_{k=1}^{n} S(n, k)$, which is enormous.
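The size of the search space can be computed directly from the recurrence. The sketch below memoizes S(n, k) and sums it over k; the function names `stirling2` and `partition_count` are chosen here for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n elements into exactly k clusters."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def partition_count(n):
    """Total number of partitions of n elements: the sum of S(n, k) for k = 1..n."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


counts = {n: partition_count(n) for n in range(1, 21)}
```

The resulting values agree with the table on Slide 7, for example 4140 partitions for 8 modules and 1382958545 for 15 modules.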
  • Slide 27
  • Other Answers. If we don't use exhaustive search, then we have to use a sub-optimal strategy. Just because a strategy is sub-optimal, though, doesn't mean it won't give good results.
  • Slide 28
  • Software Clustering with Search Algorithms. [Figure: source code, e.g. void main() { printf("hello"); }, is processed by source code analysis tools (Acacia, Chava) to produce an MDG with modules M1 to M8. Software clustering search algorithms explore the search space, the set of all MDG partitions (total = 4140 partitions), and return a good MDG partition.] Search loop shown on the slide: bP = null; while (searching()) { p = selectNext(); if (p.isBetter(bP)) bP = p; } return bP;
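The search loop on this slide can be rendered as runnable code under a few assumptions. In this sketch, "select next" is taken to mean moving one randomly chosen module to a randomly chosen cluster, and "is better" compares a toy objective. Both choices, the six-module graph, and all names are hypothetical and may differ from the search algorithms Bunch actually uses.

```python
import random
from itertools import combinations

MODULES = ["M1", "M2", "M3", "M4", "M5", "M6"]
EDGES = {frozenset(e) for e in [
    ("M1", "M2"), ("M2", "M3"), ("M1", "M3"),
    ("M4", "M5"), ("M5", "M6"), ("M4", "M6"),
    ("M3", "M4"),
]}


def quality(assignment):
    """Toy objective: +1 per connected pair sharing a cluster, -1 per unconnected pair."""
    score = 0
    for u, v in combinations(MODULES, 2):
        if assignment[u] == assignment[v]:
            score += 1 if frozenset((u, v)) in EDGES else -1
    return score


def search(steps=2000, seed=0):
    rng = random.Random(seed)
    best = {m: i for i, m in enumerate(MODULES)}  # start with every module alone
    best_score = quality(best)
    for _ in range(steps):                        # while (searching())
        candidate = dict(best)                    # p = selectNext()
        candidate[rng.choice(MODULES)] = rng.randrange(len(MODULES))
        score = quality(candidate)
        if score > best_score:                    # if (p.isBetter(bP)) bP = p
            best, best_score = candidate, score
    return best, best_score                       # return bP


best, best_score = search()
```

Because the loop only ever keeps improvements, the returned partition is never worse than the starting point, but nothing guarantees that it is the best partition in the search space.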
  • Slide 29
  • Software Clustering with Search Algorithms Source C