
[IEEE 2011 European Modelling Symposium (EMS) - Madrid, Spain (2011.11.16-2011.11.18)] 2011 UKSim 5th European Symposium on Computer Modeling and Simulation - A memetic algorithm for


A Memetic Algorithm for Program Verification

Nassima Aleb Computer Science Department

USTHB Algiers, Algeria

[email protected]

Zahia Tamen Computer Science Department

USTHB Algiers, Algeria

[email protected]

Abstract: We present a memetic algorithm for verifying program safety properties. The problem is expressed as the reachability of an erroneous location L in the program. We use a new program modeling method, A Separation Modeling Approach (ASMA), in which a program is represented by two components: a Data Model (DM) and a Control Model (CM). The erroneous location is represented by its "Location Access Chain" (LAC): a string in which each position gives the value that the guard of the corresponding CM element must take to reach L. At each generation, the memetic algorithm produces a new population, attempting to provide an execution that conforms to the location access chain. An individual of the population is a set of intervals, each representing an input variable. At each generation, two local search operators are used to improve selected solutions.

Keywords: program modeling; program verification; memetic algorithms; reachability analysis

I. INTRODUCTION

Software verification has been an active area of recent research [1,2,5,10,12,17]. Given a program source and a temporal safety property, the aim is either a proof of program correctness [15] or a counterexample in the form of an execution path of the program. Recently, two abstraction techniques have provided good results: predicate abstraction and abstract interpretation. Predicate abstraction [1,3,4,9,16,18] starts by verifying the property on a coarse abstraction of the program; if the abstraction is too imprecise, it is refined. It is used by the Slam project [1] and the Blast model checker [8]. Abstract interpretation [6] uses abstract domains to capture the semantics of programs. This paper presents another approach to the verification of C-like programs. We model programs using the ASMA modeling approach [14]. Each function of the program is represented by two models: a Data Model (DM) representing operations on variables and a Control Model (CM) expressing the program control structure. The erroneous location L is characterized by an access chain, "LAC". A memetic algorithm generates and improves individuals representing values of the input variables so that their execution paths become closer to LAC. The objective is achieved if some individual has an access chain that matches LAC. Symbolic executions are performed on the pair (DM,CM) by exploiting and extending the concept of weakest precondition [7,11].

The rest of the paper is organized as follows. Section 2 describes the modeling approach: it details program modeling, symbolic executions, weakest preconditions and execution paths. Section 3 presents the memetic solution to program verification. Section 4 is devoted to evaluation and reports some experimental results. The last section concludes by highlighting the contributions of our work and outlining some future directions.

II. PROGRAM MODELING

A. Program Modeling

As usual in program analysis, the program must first be preprocessed. for and do-while instructions are replaced by equivalent while instructions. The switch instruction is expressed by if-then-else statements. Post- and pre-increment (and decrement) are transformed into standard forms. Output statements are eliminated. A program is represented by two components: the Data Model and the Control Model.

1) Data Model: It models variable declarations, assignments, inputs and function calls. These instructions are numbered with integers representing locations.

• Variable declarations: A declaration of the form type idf is modeled by idf=type_idf0, meaning that idf has the type type and does not yet have a known value. Global declarations are all assigned location 0. Local declarations are numbered by their locations.

• Assignments: They are represented in the same way as in the source program.

• Inputs: The input of a variable v is modeled by v=$v, where $v is interpreted as an unknown constant.

• Predefined functions: Function calls of the form v=rand() or v=malloc() are denoted by v=£v, where £v is an unknown constant.

2) Control Model: It models conditionals and loops.

• Conditional statements: The if-then-else statement is modeled by I=(Cd,Si,Se,Sf), where Cd is the statement condition; Si is the location of the first instruction to perform if Cd is True; Se is the location of the first instruction if Cd is False; and Sf is the location of the first instruction after the conditional. Then(I)=[Si,Se[ and Else(I)=[Se,Sf[, where [Sk,Sl[ is the set of locations from Sk (included) to Sl (excluded).

2011 UKSim 5th European Symposium on Computer Modeling and Simulation

978-0-7695-4619-3/11 $26.00 © 2011 IEEE

DOI 10.1109/EMS.2011.92


The if-then statement is represented by (Cd,Si,Sf), where Cd, Si and Sf have the same meaning as above.

• Loops: The while statement is modeled by I=(Cd,Si,Sf)*, where Cd, Si and Sf have the same meaning as in conditional statements. We define Body(I)=[Si,Sf[.

Element number i of CM is denoted CM[i]; Cd[i] is its condition and Begin(i) is its first location. ITE is the subset of CM elements of the form (Cd,Si,Se,Sf); IT is the subset of CM elements of the form (Cd,Si,Sf); and LOOP is the subset of CM elements of the form (Cd,Si,Sf)*.
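The Control Model notation above can be sketched as a small data structure. This is only an illustration: the class name CMElement, its field names and the choice Then(I)=[Si,Sf[ for if-then elements are assumptions made here, not definitions taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CMElement:
    """One Control Model element (hypothetical representation)."""
    kind: str                  # "ITE", "IT" or "LOOP"
    cond: str                  # guard Cd, kept as text
    s_i: int                   # first location executed when Cd is True
    s_f: int                   # first location after the construct
    s_e: Optional[int] = None  # first location of the else branch (ITE only)

    def then_locs(self) -> range:
        # Then(I) = [Si, Se[ for ITE; for IT we assume [Si, Sf[
        if self.kind == "ITE":
            return range(self.s_i, self.s_e)
        if self.kind == "IT":
            return range(self.s_i, self.s_f)
        return range(0)

    def else_locs(self) -> range:
        # Else(I) = [Se, Sf[, defined for ITE elements only
        return range(self.s_e, self.s_f) if self.kind == "ITE" else range(0)

    def body_locs(self) -> range:
        # Body(I) = [Si, Sf[, defined for LOOP elements only
        return range(self.s_i, self.s_f) if self.kind == "LOOP" else range(0)


# Two illustrative elements: (a>b+c,4,8,12) and (a<c,9,10)*
ite = CMElement("ITE", "a>b+c", s_i=4, s_f=12, s_e=8)
loop = CMElement("LOOP", "a<c", s_i=9, s_f=10)
print(list(ite.then_locs()), list(ite.else_locs()), list(loop.body_locs()))
```

Half-open Python ranges map directly onto the [Sk,Sl[ intervals, which is why they are used here.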

3) Functions Modeling: Each function f is modeled independently of the caller program by its two models, DMf and CMf. Without loss of generality, we make the two following assumptions:

• Each function has exactly one return statement, whose location we call the return location.

• Every call of a function f is of the form Sk: v=f(…). Sk is called the call location.

Let P be the caller program represented by DM and CM. Each statement of the form Sk: v=f() in P is modeled as follows:

• In DM, it is represented as an ordinary assignment, so it is modeled by Sk: v=f().

• In CM, it is modeled by (Reff,Sk), where Reff is a reference to the function f. A simple and efficient way is to reference each function by an integer.

4) Pointers and Aliasing Modeling: Variable references are grouped in sets representing equivalence classes. The following actions are performed:

• For each declaration of the form type *v, the class Cv={(v,0)} is created; v is its representative element.

• For each variable x, the first assignment of the form Si: v=&x creates the class C&x={(&x,0),(v,Si)}.

• For every assignment Si: a=b such that b is an element of a given class C, (a,Si) is added to the class C.

• Each assignment *c=d such that c is an element of a class having e as representative element assigns the value d to *e, so it is modeled in DM by *e=d. This ensures that all operations on variables referenced by different names are expressed on the unique representative element of the class.

Example: Let's consider the following portion of a program and generate its Data Model:

Program      Data Model    Equivalence classes
1: x=&i;     1: x=&i       C&i={(&i,0),(x,1)}
2: y=&i;     2: y=&i       C&i={(&i,0),(x,1),(y,2)}
3: *y=1;     3: i=1        (y is a reference to i)
4: x=&j;     4: x=&j       C&j={(&j,0),(x,4)}
5: z=x;      5: z=x        C&j={(&j,0),(x,4),(z,5)}

The equivalence classes are C&i and C&j. These classes contain all aliasing information. For example, from location 1 to location 3, x is a reference to i, while from location 4 to the end it is a reference to j.
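The aliasing rules can be tried out on this example with a short sketch. The function build_data_model and its statement encoding (location, left-hand side, right-hand side) are hypothetical; the sketch only covers the three rules used by the example and skips the class created by a pointer declaration.

```python
def build_data_model(stmts):
    """Build the Data Model lines and alias classes for simple pointer code.

    stmts: list of (location, lhs, rhs) strings, e.g. (3, "*y", "1").
    """
    classes = {}   # class name such as "&i" -> list of (name, location)
    current = {}   # pointer variable -> class it currently belongs to
    dm = []
    for loc, lhs, rhs in stmts:
        if rhs.startswith("&"):
            # v=&x: create C&x on first use, then record (v, location)
            classes.setdefault(rhs, [(rhs, 0)]).append((lhs, loc))
            current[lhs] = rhs
            dm.append(f"{loc}: {lhs}={rhs}")
        elif lhs.startswith("*"):
            # *c=d: rewrite on the representative element of c's class
            rep = current[lhs[1:]]          # e.g. "&i"
            dm.append(f"{loc}: {rep[1:]}={rhs}")
        else:
            # a=b: a joins the class of b when b is a known reference
            if rhs in current:
                classes[current[rhs]].append((lhs, loc))
                current[lhs] = current[rhs]
            dm.append(f"{loc}: {lhs}={rhs}")
    return dm, classes


program = [(1, "x", "&i"), (2, "y", "&i"), (3, "*y", "1"),
           (4, "x", "&j"), (5, "z", "x")]
dm, classes = build_data_model(program)
print(dm)
print(classes)
```

The dictionary current records the most recent class of each pointer, which is how the sketch captures that x refers to i up to location 3 and to j afterwards.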

5) Modeling Example

Source code:

00: int a,b,c;
01: a=15;
02: scanf("%d",&b);
03: scanf("%d",&c);
    if (a>b+c)
04:   { a=c;
        if (b>0)
05:       { a=b;
06:         c=0; }
        else
07:       b=b+a; }
    else
08:   { a=b;
        while (a<c)
09:       a=a+2;
        if (a==c)
10:       b=b+a;
        else
11:       c=a; }

Figure 1. Program example Prog with its modeling

Data Model      Control Model
00: a=int_a0    1: (a>b+c,4,8,12)
00: b=int_b0    2: (b>0,5,7,8)
00: c=int_c0    3: (a<c,9,10)*
01: a=15        4: (a==c,10,11,12)
02: b=$b
03: c=$c
04: a=c
05: a=b
06: c=0
07: b=b+a
08: a=b
09: a=a+2
10: b=b+a
11: c=a

B. Executions

We use the concept of weakest precondition [7,11] to perform symbolic executions. We compute weakest preconditions in a way that yields the value of a guard directly at the desired point of the program, instead of running all the statements of the program from the beginning up to the considered location.

1) Weakest Precondition: The weakest precondition of the predicate C with respect to (w.r.t.) the statement S, WP(S,C), is defined as the weakest predicate whose truth before S entails the truth of C after S terminates. By definition, WP(v=exp,C) is C with all occurrences of v replaced with exp, denoted C[exp/v]. In the following, each statement is designated by its location. We define the weakest precondition of a predicate C w.r.t. an interval [Si,Sj[, denoted by WPI([Si,Sj[,C), as the weakest predicate whose truth before Si entails the truth of C after Sj-1 terminates. The idea is to compute successively the weakest preconditions of C with respect to each location within [Si,Sj[, starting from the end, until we reach Si or we obtain a constant, meaning that no variables occur in C. For each location Sk ∈ [Si,Sj[, the result of computing WP(Sk,Ck) is given as the predicate whose weakest precondition is computed w.r.t. Sk-1, and so on. So:

\[
WPI([S_i,S_j[,\,C) =
\begin{cases}
C & \text{if no variable occurs in } C\\
WP(S_i,C) & \text{if } S_j - 1 = S_i\\
WPI([S_i,S_j-1[,\,WP(S_j-1,C)) & \text{otherwise}
\end{cases}
\]

Example: WPI([0,4[,a>b+c) = WPI([0,3[,WP(3,a>b+c)) = WPI([0,2[,WP(2,a>b+$c)) = WPI([0,1[,WP(1,a>$b+$c)) = 15>$b+$c. This means that the condition a>b+c is satisfied at location 4 if and only if the input values of b and c satisfy the constraint 15>$b+$c.

We define the weakest precondition on a union of intervals by: WPI([Si,Sj[ ∪ [Sk,Sl[,C) = WPI([Si,Sj[,WPI([Sk,Sl[,C)). The weakest precondition of a predicate containing a dereference *a is computed using the representative element instead of the variable a, in the same manner as in the previous cases. For example, let's compute WPI([1,4[,*x=1) for the pointer example above: WP(3,*&i=1) = WP(3,i=1) = (1=1) = True. So, WPI([1,4[,*x=1)=True.
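The backward substitution behind WPI can be sketched in a few lines. This is a minimal illustration that works on predicates kept as text, assuming the Data Model of Prog with the declarations at location 0 left out; the names DM, wp, wpi and has_variable are hypothetical and not part of the paper's tool.

```python
import re

# Data Model of Prog (declarations omitted): location -> (variable, expression)
DM = {1: ("a", "15"), 2: ("b", "$b"), 3: ("c", "$c"), 4: ("a", "c"),
      5: ("a", "b"), 6: ("c", "0"), 7: ("b", "b+a"), 8: ("a", "b"),
      9: ("a", "a+2"), 10: ("b", "b+a"), 11: ("c", "a")}
VARIABLES = {"a", "b", "c"}


def _occurrence(var):
    # Match var as a whole identifier, but not inside $var or £var
    return re.compile(r"(?<![\w$£])" + re.escape(var) + r"(?!\w)")


def wp(loc, pred):
    """WP(v=exp, C) = C[exp/v] for the assignment stored at location loc."""
    if loc not in DM:
        return pred
    var, exp = DM[loc]
    if not re.fullmatch(r"[\w$£]+", exp):
        exp = f"({exp})"  # keep operator precedence for compound expressions
    return _occurrence(var).sub(lambda m: exp, pred)


def has_variable(pred):
    return any(_occurrence(v).search(pred) for v in VARIABLES)


def wpi(start, end, pred):
    """WPI([start, end[, pred), computed from location end-1 down to start."""
    for loc in range(end - 1, start - 1, -1):
        if not has_variable(pred):
            break  # only constants remain: nothing left to substitute
        pred = wp(loc, pred)
    return pred


print(wpi(0, 4, "a>b+c"))             # 15>$b+$c
print(wpi(0, 4, wpi(4, 6, "b>0")))    # $b>0
```

The second call applies the rule for a union of intervals, [0,4[ ∪ [4,6[, by nesting the two computations.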

2) CM Guards Value Computing: The guard value of CM[k] depends on the path followed from the beginning of the program to this element. The execution path P is the succession of CM intervals traversed for some input values. Let ν(Cd[k]) be the truth value of Cd[k], and P[k] the portion of the path executed in CM[k]. The path P is defined as the union P = P[1] ∪ P[2] ∪ … ∪ P[m], where:

\[
P[k] =
\begin{cases}
Then(k) & \text{if } CM[k] \in (IT \cup ITE) \text{ and } \nu(Cd[k]) = \text{'1'}\\
Else(k) & \text{if } CM[k] \in ITE \text{ and } \nu(Cd[k]) = \text{'0'}\\
(Body(k))^n & \text{if } CM[k] \in LOOP \text{ and } \nu(Cd[k]) = \text{'1'}
\end{cases}
\]

Here n is the number of iterations, and (Body(k))^n represents the union of the interval Body(k) n times. Let P_k denote the prefix of length k of the path P, and let's compute ν(Cd[k]). Two cases are possible:

• CM[k] ∉ LOOP:

\[
\nu(Cd[k]) =
\begin{cases}
WPI([0,Begin(k)[,\,Cd[k]) & \text{if } k=1\\
WPI(P_{k-1},\,Cd[k]) & \text{otherwise}
\end{cases}
\]

• CM[k] ∈ LOOP: Let CM[k]=(C,Sb,Se)*. We denote by ν(Cd[k])_j the value of Cd[k] in the path P at iteration j:

\[
\nu(Cd[k])_j =
\begin{cases}
\nu(Cd[k]) & \text{if } j=1\\
WPI(P_{k-1} \cup [S_b,S_e[^{\,j-1},\,Cd[k]) & \text{if } j>1 \text{ and } \nu(Cd[k])_{j-1} = True
\end{cases}
\]

The number of loop iterations is the least integer n such that ν(Cd[k])_{n+1}=False.

III. A MEMETIC ALGORITHM FOR PROGRAM VERIFICATION

A memetic algorithm is an evolutionary algorithm that includes one or more local search phases within its evolutionary cycle. As in [14], given a program represented by its models DM and CM, and a safety property expressed as an erroneous location L, the goal is to find a set of input values yielding an access chain that matches LAC. Our algorithm is presented in Figure 2. LAC is a string made of the characters '1', '0' or 'x':

\[
LAC[i] =
\begin{cases}
\text{'1'} & \text{if } (CM[i] \in (ITE \cup IT) \wedge L \in Then(i)) \vee (CM[i] \in LOOP \wedge L \in Body(i))\\
\text{'0'} & \text{if } CM[i] \in ITE \wedge L \in Else(i)\\
\text{'x'} & \text{otherwise}
\end{cases}
\]

Example: In the program Prog, for L=13, LAC=0x1x1.

Each input value is represented by an interval. Using intervals instead of single values has at least two advantages. First, each individual represents a set of possible executions instead of one execution, which allows the search space to be explored more efficiently. Second, we can gradually "correct" undesirable behavior instead of systematically rejecting every unwanted result. Indeed, the interval operations we define allow us to decide whether, inside some interval, the considered property is always True, always False, or True for some values and False for others, in which case we can explore inside the interval. For each individual i, we compute its Individual Access Chain IACi, recording the sequence of CM elements executed by the individual i. The fitness function computes the distance between LAC and IACi. The objective is reached if we find an individual i* such that the distance between IACi* and LAC is zero. Otherwise, the population must be improved. To ensure both intensification and diversification, we define one recombination operator, one mutation operator and two local search operators.
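The definition of LAC translates directly into a membership test on the Then, Else and Body intervals. The sketch below applies it to the Control Model printed in Figure 1; the tuple encoding (kind, Si, Se, Sf) and the function name lac are assumptions made for illustration, and the two chains it prints are computed here from the definition rather than quoted from the paper.

```python
def lac(cm, loc):
    """Location Access Chain of location loc for a control model cm.

    cm: list of (kind, s_i, s_e, s_f) with kind in {"ITE", "IT", "LOOP"};
    s_e is None for IT and LOOP elements.
    """
    chain = []
    for kind, s_i, s_e, s_f in cm:
        if kind == "ITE":
            if s_i <= loc < s_e:        # L in Then(I) = [Si, Se[
                chain.append("1")
            elif s_e <= loc < s_f:      # L in Else(I) = [Se, Sf[
                chain.append("0")
            else:
                chain.append("x")
        else:
            # IT: Then(I) assumed to be [Si, Sf[; LOOP: Body(I) = [Si, Sf[
            chain.append("1" if s_i <= loc < s_f else "x")
    return "".join(chain)


# Control Model of Prog as printed in Figure 1
PROG_CM = [("ITE", 4, 8, 12), ("ITE", 5, 7, 8),
           ("LOOP", 9, None, 10), ("ITE", 10, 11, 12)]

print(lac(PROG_CM, 6))    # 11xx
print(lac(PROG_CM, 11))   # 0xx0
```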

A. Individuals Access Chain Computing

First, let's note the two following points:

• The value of a CM guard is not always known. Example: if a=[0,20], b=[-10,50] and c=[-20,10], the value of a>b+c is not known.

• The truth values of all CM guards are not always required, since CM elements are often mutually exclusive. In the program Prog, if an individual is such that Cd[1] is False, Cd[2] is not required.

Consequently, the access chain of an individual contains the characters '1', '0', 'u' or 'x', meaning respectively True, False, Unknown or not required. When we find the first 'u', we stop the computation and fill all the remaining positions with 'u'. Let i be an individual, and let ν_i(Cd[k]) be the valuation of Cd[k] for the individual i. ν_i(Cd[k]) is computed in the same manner as ν(Cd[k]), using the data of the individual i. The access chain of i, denoted IACi, is the string a1a2a3…an such that:

\[
a_k =
\begin{cases}
\text{'1'} & \text{if } \nu_i(Cd[k]) = True\\
\text{'0'} & \text{if } \nu_i(Cd[k]) = False\\
\text{'u'} & \text{if } \nu_i(Cd[k]) \text{ is unknown or } \nu_i(Cd[k-1]) = \text{'u'}\\
\text{'x'} & \text{if } Cd[k] \text{ is not required}
\end{cases}
\]

In the execution path of an individual i, the case where ak is not required is handled by: if ak='x' then P[k]=∅.

1) Interval Operations: We adopt the same definitions for arithmetic operations as in interval abstract interpretation [6]. Logical operations are our own definitions, since we use a three-valued logic.

Arithmetic operations:
n = [n,n]
[a,b] + [c,d] = [a+c, b+d]
[a,b] - [c,d] = [a-d, b-c]
-[a,b] = [-b, -a]
[a,b] * [c,d] = [Min, Max], with Min=min(ac,ad,bc,bd) and Max=max(ac,ad,bc,bd)

Logical operations:
([a,b] = [c,d]) = T if a=b=c=d; F if [a,b] ∩ [c,d] = ∅; U otherwise
([a,b] < [c,d]) = T if b<c; F if d<a; U otherwise
([a,b] > [c,d]) = T if a>d; F if b<c; U otherwise

Truth table:
F   G   F∨G   F∧G   ¬G
1   U   1     U     U
0   U   U     0     U
U   U   U     U     U
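As an illustration, these operations fit in a small class. The sketch below follows the definitions listed above; the class name Interval, the function names and the encoding of truth values as the strings "T", "F" and "U" are choices made here, and the connectives are written as standard three-valued (Kleene) operators, which agree with the rows of the truth table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __mul__(self, o):
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))


def eq(x, y):
    if x.lo == x.hi == y.lo == y.hi:
        return "T"
    if x.hi < y.lo or y.hi < x.lo:   # empty intersection
        return "F"
    return "U"


def lt(x, y):
    if x.hi < y.lo:
        return "T"
    if y.hi < x.lo:
        return "F"
    return "U"


def gt(x, y):
    if x.lo > y.hi:
        return "T"
    if x.hi < y.lo:
        return "F"
    return "U"


def not3(g):
    return {"T": "F", "F": "T", "U": "U"}[g]


def and3(f, g):
    if "F" in (f, g):
        return "F"
    return "T" if f == g == "T" else "U"


def or3(f, g):
    if "T" in (f, g):
        return "T"
    return "F" if f == g == "F" else "U"


a, b, c = Interval(0, 20), Interval(-10, 50), Interval(-20, 10)
print(gt(a, b + c))   # U: the guard a>b+c is unknown for these intervals
```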

Interval union, intersection, inclusion and membership are defined as the ordinary interval operations.

2) Example of Individual Access Chain Computing: In the program Prog, each individual is composed of two integer intervals representing the variables y and z. A possible individual i is ([-10,-5]; [-200,0]). Let's compute its access chain IACi to location 6, denoted IACi=a1a2. So, let's compute ν_i(Cd[1]) and ν_i(Cd[2]) using the formulas defined previously:

ν_i(Cd[1]) = WPI([0,4[,x>y+z) = 1>$y+$z = True, so a1=1 (since $y+$z=[-210,-5]).
ν_i(Cd[2]) = WPI([0,4[ ∪ [4,6[,y>0) = $y>0 = False, so a2=0.

Consequently, IACi=10.
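The computation in this example can be reproduced with a short sketch. The two guards are entered directly in their weakest-precondition form (1>$y+$z and $y>0); intervals are plain (lo, hi) tuples, the helper names are hypothetical, and positions marked 'x' (guards that are not required) are deliberately left out to keep the sketch short.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def gt(x, y):
    """Three-valued x > y on intervals: '1' True, '0' False, 'u' unknown."""
    if x[0] > y[1]:
        return "1"
    if x[1] < y[0]:
        return "0"
    return "u"


def access_chain(guards, individual):
    """Evaluate the guards in order; after the first 'u', fill with 'u'."""
    chain = []
    for guard in guards:
        if chain and chain[-1] == "u":
            chain.append("u")
        else:
            chain.append(guard(individual))
    return "".join(chain)


# Guards of the example, already expressed on the inputs $y and $z
guards = [
    lambda ind: gt((1, 1), add(ind["y"], ind["z"])),   # 1 > $y + $z
    lambda ind: gt(ind["y"], (0, 0)),                  # $y > 0
]

print(access_chain(guards, {"y": (-10, -5), "z": (-200, 0)}))   # 10
print(access_chain(guards, {"y": (-10, 50), "z": (-20, 10)}))   # uu
```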

B. Population Initialization

The initial population is generated so as to ensure both diversity and acceptable quality. Each individual is evaluated to verify these two properties. To ensure diversity, we select individuals having different values in the LAC positions marked 'x', since those are the positions that can provide different access paths to the same location. An individual has an acceptable quality if its access chain has at least one correct position. It is also advantageous to use large intervals in the initialization stage.

C. Fitness Function

The fitness function Fitness(i) measures the distance between LAC and IACi. It is computed as follows. Let Fit be a string such that:

\[
Fit[k] =
\begin{cases}
0 & \text{if } LAC[k] = \text{'x'} \text{ or } LAC[k] = IAC_i[k]\\
1 & \text{otherwise}
\end{cases}
\]

Fitness(i) is the decimal value of Fit read as a binary number. The fitness function faithfully represents the distance between the desired behavior and the behavior of the considered individual. For example, consider two individuals i1 and i2 such that Fit1=1000 and Fit2=0001. Although both have exactly one faulty position that does not match LAC, their fitness must differ: i1 failed at the first guard, so it took a path completely different from LAC and represents an execution that deviates entirely, whereas i2 matched LAC up to the last position and is therefore closer to LAC. So i2 is better than i1, which is indeed expressed by our fitness function, since Fitness(i1)=8 and Fitness(i2)=1. The goal is to find an individual i* such that Fitness(i*)=0.
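A minimal sketch of this computation, assuming that both chains are strings of equal length (the function names are hypothetical):

```python
def fit_string(lac, iac):
    """Fit[k] is '0' where LAC[k] is 'x' or matches IAC[k], '1' elsewhere."""
    return "".join("0" if l == "x" or l == c else "1"
                   for l, c in zip(lac, iac))


def fitness(lac, iac):
    """Decimal value of Fit read as a binary number; 0 means LAC is matched."""
    return int(fit_string(lac, iac), 2)


print(fitness("1111", "0111"))   # 8: mismatch at the first guard
print(fitness("1111", "1110"))   # 1: mismatch at the last guard
```

Because the leftmost position carries the highest weight, an early deviation from LAC is penalized more than a late one, which is the ordering argued for above.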

D. Population Improvement

To improve the population, we adopt a guided approach that increases the probability that the individuals obtained are actually better than their parents. We thus perform a gradual improvement, which consists in "correcting" the first faulty position of each individual. We call faulty position a position whose values in the access chain and in LAC differ, and whose value in LAC is either '0' or '1'. A faulty position can be an unknown position or an erroneous one. We categorize individuals according to their fitness (fit or unfit) and the values of their faulty positions (erroneous or unknown). We call FU, FE, UU and UE the categories of individuals: Fit with Unknown positions, Fit with Erroneous positions, Unfit with Unknown positions and Unfit with Erroneous positions. In the following, we denote by C(P) the category C of the population P; e.g., FU(P) is the category Fit with Unknown positions of the population P.

1) Recombination: It is applied to the category FU to progressively correct the faulty positions of individuals. Let i1 and i2 be two individuals such that pos is the first unknown position of i1 and pos is not a faulty position for i2. The idea is to use the data of the individual i2 to correct the unknown position pos of i1. However, since the variables of a program are strongly correlated, modifying some data of an individual to correct one guard could at the same time adversely affect other guards. To avoid this situation, we modify the data corresponding to a faulty position in a "conservative" way: we intersect the data of i2 occurring in the position pos with those of i1. Consequently, all the guards that had a known value keep it, and those that had unknown values may acquire known ones.

Hence, the recombination operator is defined as follows. Let i1, i2 and i3 be individuals such that pos is a faulty position of i1, and let x1i, x2i and x3i be the value intervals of the input variable xi for i1, i2 and i3 respectively. Recombination(i1,i2,pos)=(i3,i2) such that, for every input variable xi:

\[
x^3_i =
\begin{cases}
x^1_i \cap x^2_i & \text{if } x_i \text{ occurs in } Cd[pos]\\
x^1_i & \text{otherwise}
\end{cases}
\]

Inputs: a program Pg; a location L
Outputs: an execution leading to L
Compute the access chain of L: LAC
InitializePopulation(P0); i=0
Repeat
    ComputeFitness(Pi)
    Pw = LSW(FE(Pi))
    Pn = LSN(UU(Pi))
    Pr = Recombine(FU(Pi) ∪ FU(Pw) ∪ FU(Pn))
    Pm = WM(FE(Pw) ∪ FE(Pn))
    i := i+1
    Pi = Pr ∪ Pm ∪ P'
Until a stop condition is met

Figure 2. Memetic algorithm for program verification


Example: Let LAC=1x1101 and let i1 be an individual such that IACi1=011001, so the faulty positions of i1 are 1 and 4. The recombination point is position 1. Let i2 be an individual such that IACi2=101000, so 1 is not a faulty position of i2 and we use i2 to correct i1. The input variables occurring in Cd[1] are y and z, so Recombination(i1,i2,1)=(i3,i2) such that y3=y1 ∩ y2 and z3=z1 ∩ z2, where yi and zi are the intervals of the variables y and z of the individual i.

The recombination operator is applied to the FU individuals of the entire population as well as to those obtained by applying the two local search operators.
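The recombination operator amounts to an interval intersection restricted to the variables of the guard being corrected. The sketch below assumes individuals stored as dictionaries from variable names to (lo, hi) tuples; the function names, the sample intervals and the decision to return no child when an intersection is empty are all assumptions, since the paper does not say how that case is handled.

```python
def intersect(x, y):
    """Intersection of two closed intervals, or None when it is empty."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None


def recombine(i1, i2, guard_vars):
    """Recombination(i1, i2, pos) -> (i3, i2).

    guard_vars lists the input variables occurring in Cd[pos]. The child i3
    is None when an intersection is empty (assumption, not from the paper).
    """
    child = dict(i1)                      # variables outside the guard: x1i
    for var in guard_vars:
        common = intersect(i1[var], i2[var])
        if common is None:
            return None, i2
        child[var] = common               # variables of the guard: x1i ∩ x2i
    return child, i2


# Hypothetical individuals; only y and z occur in the guard to correct
i1 = {"y": (0, 200), "z": (-10, 25), "w": (0, 5)}
i2 = {"y": (-50, 100), "z": (20, 50), "w": (7, 9)}
i3, _ = recombine(i1, i2, ["y", "z"])
print(i3)   # {'y': (0, 100), 'z': (20, 25), 'w': (0, 5)}
```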

2) Local Search: In memetic algorithms, local search is usually performed on good solutions to improve their quality. In our work, we define two local search operators:

• LSW is applied to the FE category. As for the FU category, we progressively correct the faulty positions of individuals. Ideally, we would like to correct the first erroneous position of each individual. Nevertheless, this goal is not directly achievable, so we use the following observation: let i1 and i2 be two individuals having the same first faulty position pos and the same fitness value; if IACi1[pos]='u' and IACi2[pos]≠'u', then i1 is better than i2. Consequently, the idea is to first transform the erroneous position into an unknown one. This transformation is achieved by using interval widening as the neighborhood relationship: we enlarge the intervals of all variables appearing in the erroneous position until we obtain a solution better than the considered one. The population obtained is denoted Pw; its individuals are of two kinds: FE and FU.

• LSN is applied to the UU category. This category should not be neglected, since its unfit valuation is due to unknown positions, which in turn are due to large intervals. Let i be an individual of UU, and let pos be the first unknown position in IACi. A neighbor solution of i is obtained by considering the input variables appearing in Cd[pos] and progressively reducing their intervals until we obtain an individual i' such that Fiti'[pos]=0 and Fitness(i')<Fitness(i). The population obtained is denoted Pn; it is made of individuals of the categories FU, FE and UU.

The duality (widening, narrowing) makes it possible to first cover an area of the search space in which a solution exists, and then to refine this space to target the exact area of the solution. The same reasoning is used in abstract interpretation [6] to compute fixed points.

3) Mutation Operator (WM): This operator is applied to the sets FE(Pw) and FE(Pn), which consist of fit individuals improved neither by LSW nor by LSN. A weak mutation is performed on the values of the variables occurring in their first erroneous position. In Figure 2, P' is a set of randomly generated individuals.
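The widening and narrowing moves can be illustrated on a single variable and a single guard. The paper does not specify how much an interval is enlarged or how it is reduced, so the fixed widening step and the halving strategy below are assumptions, as are all function names; the guard v>0 is only an example.

```python
def gt_zero(iv):
    """Three-valued value of the guard v > 0 on the interval iv = (lo, hi)."""
    lo, hi = iv
    if lo > 0:
        return "1"
    if hi < 0:
        return "0"
    return "u"


def widen(iv, guard, step=5, max_rounds=100):
    """LSW-like move: enlarge iv until the guard value becomes unknown."""
    lo, hi = iv
    for _ in range(max_rounds):
        if guard((lo, hi)) == "u":
            return (lo, hi)
        lo, hi = lo - step, hi + step
    return None


def narrow(iv, guard, wanted):
    """LSN-like move: halve iv until the guard takes the wanted value."""
    lo, hi = iv
    while guard((lo, hi)) != wanted:
        if lo == hi:
            return None                   # single value left, still not wanted
        mid = (lo + hi) // 2
        halves = [(lo, mid), (mid + 1, hi)]
        values = [guard(h) for h in halves]
        if wanted in values:
            lo, hi = halves[values.index(wanted)]
        elif "u" in values:
            lo, hi = halves[values.index("u")]
        else:
            return None                   # both halves give the wrong value
    return (lo, hi)


erroneous = (-10, -5)                     # v > 0 is False here, True is wanted
unknown = widen(erroneous, gt_zero, step=10)
print(unknown)                            # (-20, 5)
print(narrow(unknown, gt_zero, "1"))      # (3, 5)
```

The erroneous interval first becomes an unknown one by widening, and narrowing then isolates a sub-interval on which the guard has the required value, which mirrors the two-phase reasoning described above.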

E. Functions Handling

Our method handles programs with functions. Let ri be the return location of fi and L0 the location of the call to fi in the caller program, and let LAC(ri) denote the access chain of ri in fi. Let's check whether a given location L is reachable. We denote by LACl the access chain of L in the program. There are three cases regarding the position of L w.r.t. L0:

• L<L0: fi has no effect on the reachability of L.

• L is in fi: To reach L, we must first reach L0. So the problem is transformed into two reachability analysis problems: first the reachability of L0 in the caller program, then the reachability of L in the function. Let LACl0 be the access chain of L0 in the program and LAClf the access chain of the location L in fi. Then LACl=LACl0^LAClf, where the symbol '^' denotes the string concatenation operator.

• L>L0 and L is not in fi: LACl=LACl0^LAC(ri)^aj+1…an, where aj+1…an records the required values for all CMp elements situated between the function call and L.

The individual access chain is computed as previously, replacing formal parameters with actual ones.

IV. RESULTS AND DISCUSSIONS

We carried out several experiments to test our approach. The parameters of the memetic algorithm were adjusted during experimentation. Table 1 reports some results. Each program Pn consists of n lines of code. We report in the table the number of variables, predicates, iterations, individuals, LSW calls (widening), LSN calls (narrowing) and recombinations, and finally the execution time.

Program  Variables  Predicates  Iterations  Individuals  Widening  Narrowing  Recombination  Time (sec)
P100     10         8           10          50           17        13         247            5
P200     18         28          654         60           23        20673      56             80
P400     26         56          457         70           50        21354      15             70
P600     34         70          1030        60           103       40000      75             126
P1000    36         74          500         150          48        20206      32             20

Table 1. Experimental results.

In our current tool, we did not focus on optimization issues, so it can be improved in various ways to deal with a larger number of variables and predicates in LAC. Yet the results obtained are already encouraging. We observe that although the iteration counts seem large, the execution times remain small. This is because all memetic operators are based on simple operations on intervals. We have also noticed that the quality of the initial population is decisive for the efficiency of the tool. Initial intervals should be rather large to avoid early bad solutions. It is also desirable that, for each element of CM, there is at least one individual satisfying the guard and one individual satisfying the opposite guard. This requirement is essential for the recombination operator. Finally, the number of iterations must be rather large, since we correct errors gradually.

V. CONCLUSION AND FUTURE WORK

We have presented an original approach to the program safety verification problem. The results obtained are encouraging. Our work has several characteristics:

• The ASMA modeling technique is both simple and powerful: it makes programs easy to manipulate. The symbolic execution process computes weakest preconditions only on the considered path, at the needed location, and not over the entire program.

• Using intervals instead of single values is efficient; it allows a larger search space to be explored.

• Our approach is compositional: functions can be analyzed by our technique straightforwardly.

• The widening/narrowing duality ensures both diversification and intensification.

• Even though the pointer analysis problem was not our primary objective, pointers are represented and manipulated in a simple and natural way. A lot of aliasing and points-to information can be deduced from the data model without significant effort.

There are several future directions for our work. The first is to investigate other optimization techniques with the same modeling. The second is automatic test data generation.

REFERENCES

[1] T. Ball, S.K. Rajamani: The Slam project: Debugging system software via static analysis. In: Proc. POPL, pp. 1–3. ACM, New York (2002)

[2] S. Chaki, E.M. Clarke, A. Groce, S. Jha, H. Veith: Modular verification of software components in C. IEEE Trans. Softw. Eng. 30(6), 388–402. Wiley, New York (2004)

[3] E.M. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith: Counterexample-guided abstraction refinement. In: Proc. CAV, LNCS, vol. 1855, pp. 154–169. Springer, Berlin (2000)

[4] E.M. Clarke, D. Kroening, N. Sharygina, K. Yorav: SatAbs: SAT-based predicate abstraction for ANSI-C. In: Proc. TACAS, LNCS, vol. 3440, pp. 570–574. Springer, Berlin (2005)

[5] J.C. Corbett, M.B. Dwyer, J. Hatcliff, C. Pasareanu, Robby, S. Laubach, H. Zheng: Bandera: Extracting finite-state models from Java source code. In: Proc. ICSE, pp. 439–448. ACM, New York (2000)

[6] P. Cousot, R. Cousot: Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Principles of Programming Languages, POPL'77, pp. 238–252 (1977)

[7] E. Dijkstra: A discipline of programming. Prentice Hall (1976)

[8] D. Beyer, T.A. Henzinger, R. Jhala, R. Majumdar: The software model checker Blast. Int J Softw Tools Technol Transfer. Springer Verlag, Berlin (2007)

[9] J. Esparza, S. Kiefer, S. Schwoon: Abstraction refinement with Craig interpolation and symbolic pushdown systems. In: Proc. TACAS, LNCS, vol. 3920, pp. 489–503. Springer, Berlin (2006)

[10] P. Godefroid: Model checking for programming languages using VeriSoft. In: Proc. POPL, pp. 174–186. ACM, New York (1997)

[11] S. Graf, H. Saidi: Construction of abstract state graphs with PVS. In: CAV 97: Computer Aided Verification, LNCS 1254, pp. 72–83. Springer-Verlag (1997)

[12] K. Havelund, T. Pressburger: Model checking Java programs using Java PathFinder. STTT 2(4), 366–381 (2000)

[13] J.K. Hao: Memetic algorithms. A book chapter

[14] N. Aleb, Z. Tamen, N. Kamel: An evolutionary approach for program model checking. In: Proc. International Conference on Model & Data Engineering (Medi'2011), in press

[15] T.A. Henzinger, R. Jhala, R. Majumdar, G.C. Necula, G. Sutre, W. Weimer: Temporal-safety proofs for systems code. In: Proc. CAV, LNCS, vol. 2404, pp. 526–538. Springer, Berlin (2002)

[16] T.A. Henzinger, R. Jhala, R. Majumdar, M.A.A. Sanvido: Extreme model checking. In: International Symposium on Verification: Theory and Practice, LNCS, vol. 2772, pp. 332–358. Springer, Berlin (2003)

[17] T.A. Henzinger, R. Jhala, R. Majumdar, G. Sutre: Lazy abstraction. In: Proc. POPL, pp. 58–70. ACM, New York (2002)

[18] G.J. Holzmann: The Spin model checker. IEEE Trans. Softw. Eng. 23(5), 279–295 (1997)

[19] D. Kroening, A. Groce, E.M. Clarke: Counterexample guided abstraction refinement via program execution. In: Proc. ICFEM, LNCS, vol. 3308, pp. 224–238. Springer, Berlin (2004)