Compiler Ggcc


DESCRIPTION

The presentation starts by summarizing some results of the Eureka/ITEA project GGCC (Global GNU Compiler Collection), in which Julio collaborated on the design of an open platform for coding rule validation. It then elaborates on the different connections between formal techniques, in a broad sense, and open source software development. Finally, it discusses how these examples lead naturally to the emergent concept of a semantic forge.


The Eureka/ITEA Global GCC Project

Julio Marino (joint work with Guillem Marpons and others)

Babel Research Group — Universidad Politecnica de Madrid

FOSSA09, Grenoble

Marino et al. (UPM) Global GCC FOSSA, November 2009 1 / 30

Overview

1 Project Overview

2 Coding Rule Validation
  - Structural Rule Validation
  - Domain-specific language: CRISP

3 The need for static analysis

4 Lessons learned

5 The way ahead


Context: The Global GCC Project (2006–2008)

ITEA-labeled consortium of industrial / research partners
  - Industrial: Mandriva, Bertin, Telefonica I+D, small/medium-sized companies
  - Research labs: INRIA, CEA-LIST, UPM

Goal: make the GNU Compiler Collection (GCC) more attractive to the (European) software industry by transferring academic results in three areas:
  - Project-wide static analysis
  - Global optimization
  - Minimise programming hazards by means of coding rules

Global GCC knowledge base: integrates heterogeneous information provided by the different components of GGCC

http://www.ggcc.info


Coding Rules

Definition

Coding Rules constrain admissible constructs of a language to help produce more reliable and maintainable code.

Standard coding rule sets do exist, e.g.:

High-Integrity C++ (HICPP): general C++ applications

MISRA-C (C language): automotive industry / embedded systems

Many organisations need to write their own rule sets or adapt existing ones.


Coding Rules: Some Actual Examples

“Do not call the malloc() function” (MISRA-C 20.4)

“Do not use the ‘inline’ keyword for member functions” (HICPP 3.1.7)

“Expressions that are effectively Boolean should not be used as operands to operators other than (&&, || and !)” (MISRA-C 12.6)

“If a virtual function in a base class is not overridden in any derived class, then make it non virtual” (HICPP 3.3.6)

“All automatic variables shall have been assigned a value before being used” (MISRA-C 9.1)

“Behaviour should be implemented by only one member function in a class” (HICPP 3.1.9)


Rule Conformance Checking

Problems with Current Approaches

Rules are specified in natural language:
  - Ambiguity
  - Automatic checking hindered

Closed tools

Lack of extensibility

Proposed Solution

Define a logic-based language that allows for precisely specifying rule sets such as MISRA-C or HICPP

Use logic programming to get an automatic rule conformance checking procedure

Integrate information provided by different program analyses


Other Tools

Proprietary tools:

Compilers: IAR Systems (C)

QA: Parasoft, Klocwork, Coverity, Semmle Code (Java)

Free software:

Checkstyle (Java)

Gendarme (ECMA CIL, Mono and .Net)

Drawbacks:

Lack of appropriate extensibility mechanisms

Ambiguity in natural language

Interoperability is difficult


Motivation: C++ “Strange” Behavior

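    class A {
    public:
        A();
        virtual void func();
    };

    class B : public A {
    public:                  // needed so that `new B()` below can reach the constructor
        B() : A() {}
        virtual void func();
    };

    A::A() {
        func();              // virtual call made while the A subobject is being built
    }

    B *d = new B();

    // A::func or B::func?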


Coding Rule:

“Do not invoke virtual methods of the declared class in a constructor or destructor.”
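The question posed on the slide can be checked with a small experiment. The sketch below is not from the talk: it adds a `trace` member and a helper function (both hypothetical names) so that the function actually selected during construction can be observed. In standard C++, while the constructor of `A` runs the object's dynamic type is still `A`, so the call resolves to `A::func`.

```cpp
#include <string>

// Records which implementation of func() ran during construction.
// `trace` and `called_during_construction` are illustrative names only.
struct A {
    std::string trace;
    A() { func(); }                  // virtual call inside the constructor
    virtual ~A() = default;
    virtual void func() { trace += "A::func"; }
};

struct B : A {
    B() : A() {}
    void func() override { trace += "B::func"; }
};

// Builds a B and reports which func() the constructor of A dispatched to.
std::string called_during_construction() {
    B b;
    return b.trace;
}
```

Calling `called_during_construction()` yields `"A::func"`, which is the surprise the coding rule is meant to prevent: the override in `B` is silently ignored while the base part is under construction.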


C++ “strange” behavior (2)

    class Base {};

    class Derived : public Base {
    public:
        ~Derived() {}
    };

    void foo() {
        Derived* d = new Derived;
        delete d;  // correctly calls derived destructor
    }

    void boo() {
        Derived* d = new Derived;
        Base* b = d;
        delete b;  // problem! does not call derived destructor
                   // (formally undefined behavior: ~Base is not virtual)
    }


Rule HICPP 3.3.2

“Write a ‘virtual’ destructor for base classes.”
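As a minimal sketch of what following this rule looks like, the variant below gives `Base` a virtual destructor and counts how often `~Derived` runs. The counter `derived_destroyed` is a hypothetical addition used only to make the effect observable; the rest mirrors `boo()` from the slide.

```cpp
// Counts how many times ~Derived runs (illustrative global, not from the slides).
int derived_destroyed = 0;

class Base {
public:
    virtual ~Base() = default;   // the fix asked for by HICPP 3.3.2
};

class Derived : public Base {
public:
    ~Derived() override { ++derived_destroyed; }
};

// Same shape as boo() on the slide: delete through a pointer to Base.
void boo() {
    Derived* d = new Derived;
    Base* b = d;
    delete b;                    // well-defined now: ~Derived runs, then ~Base
}
```

After one call to `boo()`, `derived_destroyed` is 1. The original version cannot be tested the same way, because deleting through `Base*` without a virtual destructor is undefined behavior.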


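Example: Rule Formalisation

Rule HICPP 3.3.15

“Ensure base classes common to more than one derived class are virtual”

\[
\begin{aligned}
\mathit{violate\_hicpp\_3\_3\_15}(a, b, c, d) \leftarrow{} & b \neq c \;\land \\
& \mathit{direct\_base\_of}(a, b) \land \mathit{direct\_base\_of}(a, c) \;\land \\
& \mathit{base\_of}(b, d) \land \mathit{base\_of}(c, d) \;\land \\
& \lnot\, \mathit{virtual\_base\_of}(a, c)
\end{aligned}
\]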

Rules are specified in an enriched LP-language with: disequality,quantifiers, constructive negation and sorts.


Example: Extraction of Program Information and Search of Violations

Rule HICPP 3.3.15 in Prolog

    violate_hicpp_3_3_15(A, B, C, D) :-
        class(B), class(C),
        B \= C,
        class(D), class(A),
        direct_base_of(A, B),
        direct_base_of(A, C),
        base_of(B, D),
        base_of(C, D),
        \+ virtual_base_of(A, C).

    class('::Animal').  class('::WingedAnimal').
    class('::Mammal').  class('::Bat').

    direct_base_of('::Animal', '::Mammal').
    direct_base_of('::Animal', '::WingedAnimal').
    direct_base_of('::Mammal', '::Bat').
    direct_base_of('::WingedAnimal', '::Bat').

    virtual_base_of('::Animal', '::Mammal').
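To make the search for violations concrete, here is a rough C++ rendering of the same query over the same facts. It is only a sketch: the slide does not list any `base_of` facts, so the code assumes `base_of` is the transitive closure of `direct_base_of`, and all type and function names (`Fact`, `Facts`, `violations_3_3_15`, and so on) are hypothetical.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using Fact = std::pair<std::string, std::string>;   // (base, derived)
using Facts = std::set<Fact>;
using Violation = std::tuple<std::string, std::string, std::string, std::string>;

// Assumption: base_of is the transitive closure of direct_base_of.
Facts transitive_closure(Facts rel) {
    for (;;) {
        Facts added;
        for (const Fact& x : rel)
            for (const Fact& y : rel)
                if (x.second == y.first && !rel.count(Fact(x.first, y.second)))
                    added.emplace(x.first, y.second);
        if (added.empty()) return rel;
        rel.insert(added.begin(), added.end());
    }
}

// Enumerates every (A, B, C, D) that satisfies the body of the rule.
std::vector<Violation> violations_3_3_15(const Facts& direct_base_of,
                                         const Facts& virtual_base_of) {
    const Facts base_of = transitive_closure(direct_base_of);
    std::vector<Violation> found;
    for (const Fact& ab : direct_base_of)            // direct_base_of(A, B)
        for (const Fact& ac : direct_base_of) {      // direct_base_of(A, C)
            if (ab.first != ac.first || ab.second == ac.second) continue;  // same A, B \= C
            if (virtual_base_of.count(ac)) continue; // \+ virtual_base_of(A, C)
            for (const Fact& bd : base_of)           // base_of(B, D), base_of(C, D)
                if (bd.first == ab.second && base_of.count(Fact(ac.second, bd.second)))
                    found.emplace_back(ab.first, ab.second, ac.second, bd.second);
        }
    return found;
}

// The facts listed on the slide.
Facts slide_direct_base_of() {
    Facts f;
    f.emplace("::Animal", "::Mammal");
    f.emplace("::Animal", "::WingedAnimal");
    f.emplace("::Mammal", "::Bat");
    f.emplace("::WingedAnimal", "::Bat");
    return f;
}

Facts slide_virtual_base_of() {
    Facts f;
    f.emplace("::Animal", "::Mammal");
    return f;
}
```

Under that assumption, `violations_3_3_15(slide_direct_base_of(), slide_virtual_base_of())` reports a single tuple, (`::Animal`, `::Mammal`, `::WingedAnimal`, `::Bat`): `::Animal` is a virtual base of `::Mammal` but not of `::WingedAnimal`. A Prolog engine finds the same answer by backtracking rather than by explicit nested loops.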


Proposed Approach

1 Formalize rules in a logic-based specification language that is executable: CRISP

2 Use GCC for gathering the necessary program information


Our Rule Checking Procedure

[Figure: rule checking pipeline. Boxes: Coding rules (in English); C++ project source files; Coding rules formalized in CRISP (C++); Coding rule compiler; g++’ (project build); Coding rules compiled into Prolog; Project facts in Prolog; Ciao Prolog engine; Rule violations report.]

1 Coding rule(s) written once in the logic-based formalism

2 Extract program information (+ analysis information if available) using GCC, and store it

3 Search (using a Prolog engine) for a counterexample


CRISP Building Blocks 1: Sorts

Variable, DataMember, LocalVariable

Function, MemberFunction, Constructor

Type, PointerType, Record

Scope, Namespace, Record, CompoundStatement

Operator

ArgumentTypeInFunctionType

ClassMember

Thing


CRISP Building Blocks 2: (Binary) Relations

Function calls Function
Record hasImmediateBase Record
Variable hasType NonFunctionType
Function hasType FunctionType
Thing isDefinedIn Scope
Scope isNestedIn Scope
Record hasMember MemberFunction
Record hasMember DataMember
Record hasBase Record
Record isPrivateBaseOf Record
Record isVirtualBaseOf Record
PointerType hasPointedType Type
FunctionType hasReturnType Type
Record hasFriend Record
Record hasFriend MemberFunction
ClassMember hasVisibility Visibility


Example of Rule Formalization

Rule HICPP 3.3.13:

“Do not invoke virtual methods of the declared class in a constructor or destructor.”


    rule HICPP 3.3.13
    violated by Caller : MemberFunction; Callee : VirtualFunction
    when exists R : Record such that
    (
        R hasMember Caller
        and R hasMember Callee
        and
        (
            Caller is Constructor
            or Caller is Destructor
        )
        and Caller calls+ Callee
    )
    .
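The `calls+` in this rule can be read as the transitive closure of `calls`, so indirect calls are caught too. The following C++ sketch evaluates the rule over a tiny hand-written fact base under that reading. The `ProgramFacts` structure, the helper names, and the `A::init` function in the sample facts are all hypothetical; the actual tool compiles CRISP into Prolog rather than into code like this.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;
using Relation = std::set<Pair>;
using Names = std::set<std::string>;

// Hypothetical fact base; field names mirror the CRISP relations and sorts.
struct ProgramFacts {
    Relation hasMember;       // (record, member function)
    Relation calls;           // (caller, callee), direct calls only
    Names constructors, destructors, virtualFunctions;
};

// calls+ : one or more steps along `calls`.
Relation transitive_closure(Relation rel) {
    for (;;) {
        Relation added;
        for (const Pair& x : rel)
            for (const Pair& y : rel)
                if (x.second == y.first && !rel.count(Pair(x.first, y.second)))
                    added.emplace(x.first, y.second);
        if (added.empty()) return rel;
        rel.insert(added.begin(), added.end());
    }
}

// Returns the (Caller, Callee) pairs that satisfy the body of rule 3.3.13.
std::vector<Pair> violations_3_3_13(const ProgramFacts& f) {
    const Relation callsPlus = transitive_closure(f.calls);
    std::vector<Pair> found;
    for (const Pair& c : callsPlus) {
        const std::string& caller = c.first;
        const std::string& callee = c.second;
        if (!f.constructors.count(caller) && !f.destructors.count(caller)) continue;
        if (!f.virtualFunctions.count(callee)) continue;
        for (const Pair& m : f.hasMember)             // exists R with both members
            if (m.second == caller && f.hasMember.count(Pair(m.first, callee))) {
                found.push_back(c);
                break;
            }
    }
    return found;
}

// Sample facts: the constructor A::A calls A::init, which calls virtual A::func.
ProgramFacts example_facts() {
    ProgramFacts f;
    f.hasMember.emplace("A", "A::A");
    f.hasMember.emplace("A", "A::init");
    f.hasMember.emplace("A", "A::func");
    f.calls.emplace("A::A", "A::init");
    f.calls.emplace("A::init", "A::func");
    f.constructors.insert("A::A");
    f.virtualFunctions.insert("A::func");
    return f;
}
```

With these facts, `violations_3_3_13(example_facts())` returns the single pair (`A::A`, `A::func`), even though the constructor never calls `func` directly. A purely syntactic check of the constructor body would miss that case.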


Formalization of Rule HICPP 3.3.2

Rule HICPP 3.3.2:

“Write a ‘virtual’ destructor for base classes.”

    rule HICPP 3.3.2
    violated by C : Record
    when exists C’ such that C’ hasBase C
    and not exists VD : Destructor such that
    (
        VD isDefinedIn C
        and VD is VirtualFunction
    ).
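The negated existential in this rule amounts to checking that no virtual destructor is recorded for a class that has at least one derived class. A small C++ sketch of that check is given below, applied to facts describing the `Base` / `Derived` example from earlier in the talk. The `ProgramFacts` layout and the fact names are assumptions made for illustration only.

```cpp
#include <set>
#include <string>
#include <utility>

using Pair = std::pair<std::string, std::string>;
using Relation = std::set<Pair>;
using Names = std::set<std::string>;

// Hypothetical fact base; field names mirror the CRISP relations and sorts.
struct ProgramFacts {
    Relation hasBase;         // (derived record, base record)
    Relation isDefinedIn;     // (function, record it is defined in)
    Names destructors;
    Names virtualFunctions;
};

// Returns every record C that some C' has as a base and that
// defines no destructor which is also a virtual function.
Names violations_3_3_2(const ProgramFacts& f) {
    Names found;
    for (const Pair& hb : f.hasBase) {
        const std::string& c = hb.second;
        bool hasVirtualDestructor = false;
        for (const Pair& d : f.isDefinedIn)
            if (d.second == c && f.destructors.count(d.first)
                && f.virtualFunctions.count(d.first))
                hasVirtualDestructor = true;
        if (!hasVirtualDestructor) found.insert(c);
    }
    return found;
}

// Facts for the earlier example: Derived has a non-virtual destructor,
// Base declares no destructor at all.
ProgramFacts example_facts() {
    ProgramFacts f;
    f.hasBase.emplace("Derived", "Base");
    f.isDefinedIn.emplace("Derived::~Derived", "Derived");
    f.destructors.insert("Derived::~Derived");
    return f;
}
```

For these facts `violations_3_3_2(example_facts())` contains exactly `Base`. `Derived` is not reported because nothing derives from it.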


Auxiliary Sorts and Relations

    relation F : Function overloads F’ : Function
    when exists S : Scope ; N : String such that
    (
        F isDefinedIn S
        and F’ isDefinedIn S
        and F hasUnqualifiedName N
        and F’ hasUnqualifiedName N
        and F \= F’
    )
    .

    sort M : ClassMember is PrivateClassMember
    when exists V : Visibility such that
    (
        M hasVisibility V and V is ‘private’
    )
    .


Experimental Results

                                # VIOLATIONS (CHECKING TIME) PER HICPP RULE
PROJECT    KLOC   LOAD TIME   3.3.1      3.3.2       3.3.11       3.3.15
Bacula       20        0.24   0 (0.0)    3 (0.0)     0 (0.0)      0 (0.0)
CLAM         46        1.62   1 (0.0)    15 (0.5)    115 (0.1)    0 (0.2)
Firebird    439        2.61   16 (0.0)   60 (1.0)    115 (0.2)    0 (0.3)
IT++         39        0.42   0 (0.0)    6 (0.0)     12 (0.0)     0 (0.0)
OGRE        209        3.05   0 (0.0)    15 (0.9)    79 (0.2)     0 (0.3)
Orca         89        1.17   1 (0.0)    12 (0.4)    0 (0.1)      0 (0.2)
Qt          595       10.42   15 (0.0)   75 (10.5)   1155 (1.3)   4 (1.2)

All times expressed in seconds.


Work in Progress

1 Implement / Enrich the CRISP Language

2 Implement more rules with information given by other tools

3 Open our abstract representation of programs to external tools


Implement / enrich the CRISP language

Quantification and true negation needed
  - Both performed over certain domains (sorts)
  - Infinite domains may appear with templates / generics
  - We have an implementation of constructive intensional negation

Goals automatically reordered

Extend CRISP to other languages: Java, Ada, C, Fortran, . . .


Integration of Information from External Analyzers


[Figure: rule checking pipeline extended with external analyzers. Boxes: Coding rules (in English); C++ project source files; Coding rules formalized in CRISP (C++); Coding rule compiler; g++’ (project build); External Analyzer; Translation; Knowledge Base about the compiled program; Ciao Prolog engine; Rule violations report.]


Example of New Relation that Needs Specific Analysis

    relation F : MemberFunction maySelfCall G : MemberFunction
    when (
        exists C : Record ; L : ProgramLocation such that
        (
            C hasMember F
            and C hasMember G
            and F \= G
            and F hasProgramLocation L
            and G calledOn L
            and L mayAlias ’this’
        )
    )
    or F mustSelfCall G
    .


Example of Rule that Needs Specific Analysis (1)

Rule HICPP 3.4.2:

“Do not return non-const handles to class data from const member functions”

    rule HICPP 3.4.2
    violated by F : ConstMemberFunction
    when exists C : Record;
                L : ProgramLocation;
                A : PrivateDataMember;
                P : PointerType
    such that
    (
        A hasType P
        and not P is ConstType
        and C hasMember A
        and C hasMember F
        and F returns L
        and L mayAlias A
    ).


Example of Rule that Needs Specific Analyses (2)

Rule HICPP 3.2.5:

“Ensure destructors release all objects owned by the object”

    rule HICPP 3.2.5
    violated by D : Destructor
    when exists C : Record; A : DataMember; F : MemberFunction;
                L : ProgramLocation such that
    (
        C hasMember D
        and C hasMember A
        and not D releases A
        and L isFreshLocationIn F
        and A mayPointTo L
        and not exists G : MemberFunction such that
        (
            C hasMember G
            and not A mustBeLinkedFromHeapIn G
        )
    )
    .


New Relations

ProgramLocation mayPointTo AbstractMemoryLocation
ProgramLocation mustPointTo AbstractMemoryLocation
ProgramLocation mayAlias ProgramLocation
ProgramLocation mustAlias ProgramLocation


Lessons learned: go out & meet people

Industrial projects are different, but there is a whole world of problems to solve out there.

Take advantage of European instruments to get in contact with industry; our overall impression of ITEA was quite positive.

Do not try to include your own research agenda in the proposal; that will not work! ... but it can work in the opposite direction:
  - DESAF10S (2010–2012), Spanish Ministry of Science and Innovation
  - PROMETIDOS (2010–2013), Madrid Regional Government / European Social Fund
  - A PhD on its way!


Lessons learned: be open, in several ways if possible

Adding the open source label to your project proposal may be beneficial, but try to avoid the obvious, naive arguments.

Global GCC exemplified the benefits of openness in several aspects:

The GCC suite itself, as a vehicle for efficient transfer of advanced compilation techniques to European industry, alleviating its dependency on external proprietary solutions.

Our proposal for an extensible platform for coding rule specification and validation is itself open source in the sense that specs are code that can be shared and enhanced by a new market of potential users.

This is only possible thanks to a variety of existing static analysers and tools (e.g. CIAO) from academia already distributed under open source licenses.


Lessons learned: keep your ears open for unexpected applications

Coding rules for COBOL and beyond. . .

Tools for semi-automatic refactoring

Better source code searches at Google

SAFE-GCC: NXP, Trimedia. . .


Lessons learned: some negative bits...

The GNU Compiler Collection itself may sometimes be a problem, due to an obsolete architecture

Issues with copyright transfer to the FSF

Multiplicity of languages has been a problem as well (i.e. multiple front-ends)

Do not try to solve all the problems of our planet... Get focused!

Read the small print: national issues concerning European projects, etc.


The way ahead: current state of affairs

Preliminary conclusions:

Clean (declarative) semantics given to potentially ambiguous coding rules by means of (extended) logic programming

A number of rules implemented using plain Prolog

Rule violations found in highly regarded C++ projects!

Checker: little resource (memory and time) consumption

Future work:

Complete definition of a highly expressive language aimed at specifying rules and translation scheme into efficient Prolog

Connect the framework with other parts of the GGCC project

Improve performance of overall checking procedure

http://www.ggcc.info


The way ahead: a research agenda

Focus on tools

Do not miss reliability of open software as a real issue!

Bring semantics to open source software development
  - type systems
  - description logics (ontologies, etc.)
  - static program analysis (abstract interpretation, model checking, etc.)
  - programming language design (DSLs, concurrency...)

The future is... SF
  - searching sources based on types (Foogle)
  - ontology powered semantic desktops (Nepomuk)
  - coherent management of packages (Mancoosi)
  - automatic discovery and composition of sw (AMOS, EZweb)
  - safe composition of components
  - etc.
