Improved Software Fault Detection with Graph Mining

Frank Eichinger, Bohm, Huber
Institute for Program Structures and Data Organisation (IPD), Universität Karlsruhe (TH), Germany

Keywords: graph mining, software fault detection, weighted graph mining, call graphs

Abstract

This work addresses the problem of discovering bugs in software development. We investigate the utilisation of call graphs of program executions and graph mining algorithms to approach this problem. We propose a novel reduction technique for call graphs which introduces edge weights. Then, we present an analysis technique for such weighted call graphs based on graph mining and on traditional feature selection. Our new approach finds bugs which could not be detected so far. With regard to bugs which can already be localised, our technique also doubles the precision of finding them.

1. Motivation

Software quality is a big concern in industry. Almost any software displays at least some minor bugs after being released, incurring significant costs. A group of bugs which is particularly hard to handle is noncrashing bugs, e.g., logical failures which do not lead to a crash but to faulty results. Such bugs are in general hard to find, as no stack trace of the failure is available.

2. Related Work

Research in the field of software reliability has been extensive, and various techniques have been developed for locating bugs. One direction of research is static analysis, where properties of the source code or the version history are analysed. Another direction is dynamic analysis, which requires the execution of the program. These techniques can be further distinguished into data centric (analysing values of variables) and control flow oriented (analysing branches taken).

Appearing in the 6th International Workshop on Mining and Learning with Graphs, Helsinki, Finland, 2008.

Many static techniques require a large bug and version history database, which is not always available. Nagappan et al. (2006), e.g., use standard metrics from software engineering and mine the source code with regression models.

Dynamic techniques usually work with just the current version of the software and require test oracles, which decide if an execution is correct or failing. An example of a dynamic data flow focused approach is the work of Liblit et al. (2003). It instruments the source code to gain program invariants. These are used as features which are analysed with regression techniques. This allows potentially buggy pieces of code to be discovered. Such instrumentation based approaches usually suffer from poor runtime behaviour or, if the software is not fully instrumented, from missed bugs.

Dynamic control flow based techniques often rely on call graphs of program executions. In such graphs, a node represents a method and an edge a method call. Figure 1(a) gives an abstract example. Recent work operating on call graphs by Liu et al. (2005) and di Fatta et al. (2006) employs graph mining techniques for bug localisation. These studies discover structural patterns in call graphs which are characteristic of failing executions. A major challenge of graph mining techniques is that such algorithms do not scale to large graphs. Call graphs become relatively huge, as substructures typically are repeated many times. Therefore, reduction techniques need to be applied first. In the mentioned literature, this leads to a loss of information. Namely, call frequencies are lost, which are important for detecting certain groups of bugs.

[Figure 1: four call graph panels over the methods a, b and c. (a) Example call graph in which b is repeated three times in a row. (b) Reduced graph with the edge weights 1, 1 and 3. (c) Graph in which a, b and c occur once each. (d) Graph in which two of the repeated b nodes remain.]

Figure 1. Variants of reduced call graphs.

3. Call Graph Reduction

The last section has mentioned two approaches to reducing software call graphs. The first one, from Liu et al. (2005), does total reduction: every method occurs just once within the graph. See Figure 1(c) for a totally reduced version of Figure 1(a). This leads to small graphs, which allows for graph-mining-based bug localisation even with larger software projects. On the other hand, much information about the program execution is lost, e.g., frequencies of the execution of methods and information on different structural patterns within the graphs. The approach from di Fatta et al. (2006) omits substructures which are called more than twice in a row (see Figure 1(d)). Thus, it keeps more information than the other one, with the risk of generating very large graphs. In consequence, graph mining algorithms might not work on these graphs.
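The two existing reductions can be illustrated with a minimal sketch. It assumes a call tree stored as nested `(method, children)` tuples; this representation and the function names `total_reduction` and `cap_repetitions` are hypothetical and are not taken from Liu et al. (2005) or di Fatta et al. (2006). The second function reads "called more than twice in a row" as keeping at most two consecutive identical subtrees, which is one interpretation of Figure 1(d).

```python
from typing import List, Set, Tuple

# Hypothetical representation: a node is (method_name, list_of_child_nodes).
Node = Tuple[str, list]


def total_reduction(root: Node) -> Set[Tuple[str, str]]:
    """Collapse the tree to a set of caller->callee edges, so every method occurs once."""
    edges: Set[Tuple[str, str]] = set()
    stack: List[Node] = [root]
    while stack:
        name, children = stack.pop()
        for child in children:
            edges.add((name, child[0]))
            stack.append(child)
    return edges


def cap_repetitions(root: Node, limit: int = 2) -> Node:
    """Keep at most `limit` consecutive identical subtrees below each node."""
    name, children = root
    kept: list = []
    run = 0
    for child in (cap_repetitions(c, limit) for c in children):
        # Count how long the current run of identical sibling subtrees is.
        run = run + 1 if kept and kept[-1] == child else 1
        if run <= limit:
            kept.append(child)
    return (name, kept)


# Call tree in the spirit of Figure 1(a): a calls b and c, c calls b three times.
example = ("a", [("b", []), ("c", [("b", []), ("b", []), ("b", [])])])
print(sorted(total_reduction(example)))
print(cap_repetitions(example))
```

In this sketch the totally reduced form no longer records that b was called three times from c, which is the information loss discussed above.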

In our approach, we try to overcome the shortcomings of both approaches and keep most of the information available. We reduce substructures executed several times in a row by deleting all but the first one and inserting the call frequencies as edge weights (see Figure 1(b)). This allows for a detailed analysis of the call frequencies. If, for example, a bug is hidden in a loop condition, this might lead to hundreds of iterations of the loop, compared to just a few in correct program executions. Note that, with both previous graph reduction techniques, the graph of the correct and failing execution is reduced to exactly the same structure in this case. In our approach, the edge weights would be significantly different. This raises the need for weighted graph mining algorithms. In the following section, we present a technique for analysing differences in edge weights subsequent to traditional graph mining.
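The weighted reduction can be sketched as follows. This is an illustration under assumptions rather than the implementation used in the paper: call trees are nested `(method, children)` tuples, the function name `reduce_weighted` is hypothetical, and two neighbouring subtrees are merged only when their reduced forms (including inner weights) are exactly equal, a detail the text does not specify.

```python
def reduce_weighted(node):
    """Merge consecutive identical subtrees and record the repetition count.

    Input:  (method, [child, ...])
    Output: (method, [(weight, reduced_child), ...])
    """
    name, children = node
    merged = []  # entries are [weight, reduced_subtree]
    for child in children:
        sub = reduce_weighted(child)
        if merged and merged[-1][1] == sub:
            merged[-1][0] += 1  # same substructure again: increase the edge weight
        else:
            merged.append([1, sub])
    return (name, [(weight, sub) for weight, sub in merged])


# Call tree in the spirit of Figure 1(a): a calls b and c, c calls b three times.
fig_1a = ("a", [("b", []), ("c", [("b", []), ("b", []), ("b", [])])])
print(reduce_weighted(fig_1a))

# Loop example: the same structure, but very different call frequencies.
correct_run = ("main", [("step", [])] * 3)
failing_run = ("main", [("step", [])] * 300)
print(reduce_weighted(correct_run))
print(reduce_weighted(failing_run))
```

For the first tree the sketch yields the edge weights 1, 1 and 3, as in Figure 1(b). For the loop example both runs reduce to the same structure, and only the edge weight (3 versus 300) tells them apart.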

4. The Mining Framework

We will now describe our framework to derive a ranking of potentially buggy methods from call graphs. This ranking can be given to a software developer who can do a code review of the suspected methods. At first, frequent subgraph mining is applied to the reduced call graphs. The resulting frequent subgraphs are then processed with two different approaches: the conventional scoring approach and our entropy based approach. As some bugs result in different call frequencies while others result in different substructures, we combine both scores, which leads to a final ranking. Figure 2 is an overview of this framework; the following paragraphs describe the individual steps.

[Figure 2: flow diagram. Reduced call graphs are passed to frequent subgraph mining. Conventional approach: subgraphs frequent in failing but not in correct executions go to conventional scoring (based on support measures), which yields the conventional method-ranking. Entropy based approach: subgraphs occurring in failing and correct executions go to entropy scoring (based on edge weights), which yields the intermediate method-ranking. A combination of the scores yields the combined method-ranking.]

Figure 2. The ranking framework.

After having reduced the call graphs gained from correct and failing program executions, we do frequent subgraph mining with the CloseGraph algorithm (Yan & Han, 2003), ignoring the edge weights for now.

In the conventional approach (di Fatta et al., 2006), we just consider the discovered subgraphs which are frequent within the set of failing executions, but not frequent in the set of correct ones. In order to gain a scoring of the methods, we calculate for every method the probability Pc for containing a bug. Please see the literature for further details.
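The selection step of the conventional approach can be sketched as a support filter. The sketch covers only the choice of subgraphs, not the computation of Pc, which is defined in the cited literature. The function name `discriminative_subgraphs`, the input layout and the threshold `min_support=0.5` are assumptions made for illustration.

```python
def discriminative_subgraphs(occurrences, labels, min_support=0.5):
    """Return subgraphs frequent in failing executions but not in correct ones.

    occurrences: subgraph id -> set of execution ids whose call graph contains it
    labels:      execution id -> "failing" or "correct" (both classes must be present)
    min_support: assumed relative support threshold for "frequent"
    """
    failing = {g for g, label in labels.items() if label == "failing"}
    correct = {g for g, label in labels.items() if label == "correct"}
    selected = []
    for subgraph, graphs in occurrences.items():
        support_failing = len(graphs & failing) / len(failing)
        support_correct = len(graphs & correct) / len(correct)
        if support_failing >= min_support and support_correct < min_support:
            selected.append(subgraph)
    return sorted(selected)


# Hypothetical toy data: four executions and three mined subgraphs.
labels = {"g1": "failing", "g2": "failing", "g3": "correct", "g4": "correct"}
occurrences = {
    "SG1": {"g1", "g2", "g3", "g4"},  # frequent in both classes
    "SG2": {"g1", "g2"},              # frequent in failing executions only
    "SG3": {"g3"},                    # occurs in correct executions only
}
print(discriminative_subgraphs(occurrences, labels))
```

With the toy data only SG2 is selected. SG1, which occurs in both classes, is the kind of subgraph the entropy based approach works with instead.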

In our entropy based approach, we focus on frequent subgraphs occurring in both classes of program executions, the class of correct and the class of failing ones. Our goal is to find out which edge weights are most significant to discriminate the two classes of executions. To this end, we first assemble a feature table. This table contains all edge weights in all subgraphs discovered by CloseGraph in the columns¹ and all program executions (represented by the call graphs) in the rows. The following table serves as an example:

          SG1    SG1    SG2
          a→b    a→c    a→b    Class
Graph1    2      1      6      failing
Graph2    0      0      4      correct

¹ More precisely, we also differentiate between edges occurring at different positions within one subgraph.


The first column corresponds to the first subgraph (SG1) and the edge from a to b, the second column to the same subgraph but the edge from a to c, the third column to the second subgraph (SG2), etc. In the last column, the class correct or failing is displayed.

Once the table is filled, we employ a standard feature selection algorithm to score the columns of the table and thus the effect of the different edges. We use an entropy based algorithm which calculates the information gain for every column. At this point, the ranking not only contains suspected methods, but suspected method calls (edges). Thus, not just the information in which method a bug is hidden is included, but also where exactly within this method. In order to compare and to combine our results with those of di Fatta et al. (2006), we just keep the starting methods from our edge list and omit the methods called. As our list might contain duplicates, we eliminate them and keep the maximum score for every method. This leads to our score Pe for methods and an intermediate ranking ordered by the likelihood of containing a bug.
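The scoring step described above might look like the following sketch. The text does not say how numeric edge weights are discretised, so the sketch assumes the best binary split `weight <= t`, as is common in decision tree learners; this choice and the function names are assumptions. The first three columns come from the example table, while the constant column for an edge from c to b is hypothetical and only added to show a gain of zero.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Best information gain over binary splits `value <= t` (assumed discretisation)."""
    base = entropy(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # skip the maximum so both sides are non-empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = max(best, base - remainder)
    return best


def method_scores(columns, labels):
    """Keep the calling method of every edge and its maximum gain (the score Pe)."""
    scores = {}
    for (_subgraph, caller, _callee), values in columns.items():
        gain = information_gain(values, labels)
        scores[caller] = max(scores.get(caller, 0.0), gain)
    return scores


labels = ["failing", "correct"]      # Graph1, Graph2
columns = {
    ("SG1", "a", "b"): [2, 0],
    ("SG1", "a", "c"): [1, 0],
    ("SG2", "a", "b"): [6, 4],
    ("SG2", "c", "b"): [3, 3],       # hypothetical column with identical weights
}
print(method_scores(columns, labels))
```

With only two executions every column from the example table separates the classes perfectly, so method a receives the maximum gain of one bit; a realistic table has many rows and yields graded scores.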

Finally, we combine our edge weight based results gained so far with the ones from di Fatta et al. (2006). This results in an overall likelihood P of containing a bug for every method, based on the average of the normalised values for Pc and Pe.
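A minimal sketch of this combination follows. The text does not state the normalisation, so the sketch assumes division by the maximum score of each list, and it assumes that a method missing from one list scores zero there. The function names and the score values are hypothetical.

```python
def normalise(scores):
    """Scale scores so that the largest becomes 1 (assumed normalisation)."""
    top = max(scores.values(), default=0.0)
    return {m: (s / top if top > 0 else 0.0) for m, s in scores.items()}


def combine(pc, pe):
    """Average the normalised Pc and Pe; a missing score counts as 0 (assumption)."""
    pc_n, pe_n = normalise(pc), normalise(pe)
    return {m: (pc_n.get(m, 0.0) + pe_n.get(m, 0.0)) / 2 for m in set(pc_n) | set(pe_n)}


def ranking(scores):
    """Methods ordered by descending score, ties broken by name."""
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical scores for three methods.
pc = {"a": 0.2, "b": 0.4}
pe = {"a": 0.9, "c": 0.3}
p = combine(pc, pe)
print(ranking(p))
```

In the toy data method a is ranked first although it leads only one of the two individual lists, which is the intended effect of averaging the structural and the frequency based evidence.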

5. Experimental Results

For the evaluation of our technique, we instrumented a Java diff tool (Darwin, 2004) with nine different bugs. The bugs we are using represent the same types of bugs which are used in the evaluation of Liu et al. (2005) and di Fatta et al. (2006). We executed every version corresponding to an instrumented bug exactly 100 times with different input data. We then classified the results as correct or failing executions with a test oracle based on a bug free reference version. Based on this data, we carried out three experiments (see Figure 2): (1) the conventional method-ranking following the approach from di Fatta et al. (2006), including its graph reduction technique, (2) the intermediate method-ranking, using our reduction technique together with the entropy based scoring, and (3) the combined method-ranking, which combines (1) and (2).

In order to evaluate the precision of the results, we check in which line of the ranking (out of 25 methods in our program) the first instrumented bug is found. A software developer would have to pay attention to all lines until the bug is found. The following table displays the results of the three experiments for all nine versions (25 refers to bugs not discovered):

Exp. \ Bug        1   2   3   4   5   6   7   8   9
1. Conventional   3  25   1   4   6   4   3   3   1
2. Intermediate   3   3   1   1   1   3   3   1  25
3. Combined       1   3   1   2   2   1   2   1   3

Bug 2 is not found by the structural technique (1) as it occurs within a loop and affects call frequencies only: it is found with edge weight analysis (2). Bug 9 in turn does not affect any call frequencies and can only be found by structural analysis (1). The combined approach (3) proves to be key to locating both bugs. Looking at the other bugs, our intermediate approach (2) already performs better than the conventional one (1). The combination (3) can slightly improve this. Leaving aside Bugs 2 and 9, our combined approach more than doubles the precision of di Fatta et al. (2006).
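The comparison can be checked with a little arithmetic on the results table. The sketch below assumes that precision is compared through the total number of ranking lines a developer has to inspect; the text does not define the measure formally, so this is one plausible reading. The rank values are copied from the table, and the function name is hypothetical.

```python
# Ranking position of the first instrumented bug, for Bugs 1 to 9 (from the table).
positions = {
    "conventional": [3, 25, 1, 4, 6, 4, 3, 3, 1],
    "intermediate": [3, 3, 1, 1, 1, 3, 3, 1, 25],
    "combined":     [1, 3, 1, 2, 2, 1, 2, 1, 3],
}
excluded_bugs = {2, 9}  # each is missed by one of the two individual techniques


def total_positions(ranks, excluded):
    """Sum of the ranking lines to inspect over all bugs that are not excluded."""
    return sum(r for bug, r in enumerate(ranks, start=1) if bug not in excluded)


totals = {name: total_positions(r, excluded_bugs) for name, r in positions.items()}
ratio = totals["conventional"] / totals["combined"]
print(totals, ratio)
```

Under this reading the conventional ranking requires 24 lines in total for the seven remaining bugs, the intermediate one 13 and the combined one 10, a factor of 2.4 between (1) and (3).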

6. Conclusion

In this work we have presented a dynamic control flow centred approach for the localisation of noncrashing bugs. It uses call graphs which are reduced by a novel technique. This is done by introducing edge weights representing call frequencies. As none of the recently developed graph mining algorithms allows for the analysis of weighted graphs, we have developed a respective technique. This involves frequent subgraph mining and scoring of numerical edge weights using an entropy based algorithm. Our experiments show that a combination of structural and numerical mining techniques leads to significantly improved bug localisations.

References

Darwin, I. F. (2004). Java Cookbook. O'Reilly.

di Fatta, G., Leue, S., & Stegantova, E. (2006). Discriminative Pattern Mining in Software Fault Detection. Proc. of the Int. Workshop on Software Quality Assurance (SOQUA).

Liblit, B., Aiken, A., Zheng, A. X., & Jordan, M. I. (2003). Bug Isolation via Remote Program Sampling. ACM SIGPLAN Notices, 38, 141–154.

Liu, C., Yan, X., Yu, H., Han, J., & Yu, P. S. (2005). Mining Behavior Graphs for Backtrace of Noncrashing Bugs. Proc. of the Int. Conf. on Data Mining (SDM).

Nagappan, N., Ball, T., & Zeller, A. (2006). Mining Metrics to Predict Component Failures. Proc. of the Int. Conf. on Software Engineering (ICSE).

Yan, X., & Han, J. (2003). CloseGraph: Mining Closed Frequent Graph Patterns. Proc. of the Int. Conf. on Knowledge Discovery and Data Mining (KDD).
