Lifting variability from C to mbeddr-C


DESCRIPTION

Information about variability is expressed in C through preprocessor directives, which interact in multiple ways with proper C code, leading to systems that are difficult to understand and analyze. Lifting the variability information into a DSL that explicitly captures the features, the relations among them, and their relations to the code would substantially improve today’s state of practice. In this paper we present a study on extracting variability information from C files, which we performed on 5 large projects (including the Linux kernel) totaling almost 30M lines of code. Our main result is that, by using simple heuristics, it is possible to interpret a large portion of the variability information present in large systems. Furthermore, we show how we extracted variability information from ChibiOS, a real-time OS available on 14 different core architectures, and how we lifted that information into mbeddr, a DSL-based technology stack for embedded programming with explicit support for variability.


Contribution to Mbeddr


Extracting variability from C and lifting it to mbeddr

Federico Tomassetti, Daniel Ratiu

1. Variability in C

2. Variability in mbeddr

3. Analysis

4. Results

5. Case study


The C preprocessor is evil

• It lets you obfuscate everything, even keywords
• Everything is in global scope
• What a module does depends on the context where it is included
• It operates at token level
• It makes the code very difficult to analyze


The C preprocessor is evil

What a module does depends on the context where it is included

// one includer
#define A
#include "foo.h"

// another includer
#define B 50
#include "foo.h"

// foo.h
#ifdef A
struct SomeStruct { … };
#else
int b = B;
void foo();
#endif

What foo.h declares depends on where it is included.


mbeddr is an extensible variant of C built on top of a projectional editor.

Existing extensions include:
• interfaces with pre- and postconditions
• components
• state machines
• physical units
• requirements tracing
• product line variability

mbeddr introduces higher-level abstractions

• Constants with scope
• Feature models
• Configuration models
• Isolated modules

Analysis

We analyzed:
• Linux
• Apache OpenOffice
• Quake
• VLC
• Mozilla

In total, about 73K files and 30M lines of code.

We analyzed these projects to understand how variability is used in C and what we can do for lifting it to mbeddr.

Identify relevant statements

#define, #undef

#ifdef, #ifndef, #if, #elif, #else, #endif

Configuration processing

Presence conditions

Parsing PC expressions

#if A>B && !(C||D)
#elif D!=10
#ifndef C

185K expressions parsed, 3 errors
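The slides report the parsing outcome but not the parser itself. As a rough illustration only, here is a hand-written recursive-descent recognizer that accepts an assumed subset of #if syntax: identifiers, integer literals, defined, !, &&, ||, comparisons and parentheses. The names is_valid_pc, parse_or and the other helpers are hypothetical, and arithmetic and bitwise operators are deliberately left out to keep the sketch short.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical recognizer for an assumed subset of #if expressions:
 *   or      := and ("||" and)*
 *   and     := cmp ("&&" cmp)*
 *   cmp     := unary (("=="|"!="|"<="|">="|"<"|">") unary)*
 *   unary   := "!" unary | primary
 *   primary := IDENT | NUMBER | "defined" ["("] IDENT [")"] | "(" or ")"
 */
typedef struct { const char *p; } Parser;

static int parse_or(Parser *ps);

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* Consumes tok if the input (after whitespace) starts with it. */
static int accept(Parser *ps, const char *tok) {
    size_t n = strlen(tok);
    skip_ws(ps);
    if (strncmp(ps->p, tok, n) != 0) return 0;
    ps->p += n;
    return 1;
}

static int is_ident_char(char c) { return isalnum((unsigned char)c) || c == '_'; }

static int parse_primary(Parser *ps) {
    skip_ws(ps);
    const char *start = ps->p;
    if (*start == '(') {
        ps->p++;
        return parse_or(ps) && accept(ps, ")");
    }
    if (isdigit((unsigned char)*start)) {       /* 10, 0x1F, 20UL, ... */
        while (isalnum((unsigned char)*ps->p)) ps->p++;
        return 1;
    }
    if (!is_ident_char(*start)) return 0;
    while (is_ident_char(*ps->p)) ps->p++;
    if (ps->p - start == 7 && strncmp(start, "defined", 7) == 0) {
        int paren = accept(ps, "(");            /* defined X or defined(X) */
        skip_ws(ps);
        if (!is_ident_char(*ps->p)) return 0;
        while (is_ident_char(*ps->p)) ps->p++;
        return paren ? accept(ps, ")") : 1;
    }
    return 1;
}

static int parse_unary(Parser *ps) {
    skip_ws(ps);
    if (ps->p[0] == '!' && ps->p[1] != '=') {
        ps->p++;
        return parse_unary(ps);
    }
    return parse_primary(ps);
}

static int parse_cmp(Parser *ps) {
    static const char *const ops[] = {"==", "!=", "<=", ">=", "<", ">"};
    const size_t n_ops = sizeof ops / sizeof ops[0];
    if (!parse_unary(ps)) return 0;
    for (;;) {
        size_t i = 0;
        while (i < n_ops && !accept(ps, ops[i])) i++;
        if (i == n_ops) return 1;               /* no comparison operator follows */
        if (!parse_unary(ps)) return 0;
    }
}

static int parse_and(Parser *ps) {
    if (!parse_cmp(ps)) return 0;
    while (accept(ps, "&&"))
        if (!parse_cmp(ps)) return 0;
    return 1;
}

static int parse_or(Parser *ps) {
    if (!parse_and(ps)) return 0;
    while (accept(ps, "||"))
        if (!parse_and(ps)) return 0;
    return 1;
}

/* Returns 1 if expr is a well-formed presence condition, 0 otherwise. */
int is_valid_pc(const char *expr) {
    Parser ps;
    assert(expr != NULL);
    ps.p = expr;
    if (!parse_or(&ps)) return 0;
    skip_ws(&ps);
    return *ps.p == '\0';
}
```

With this sketch, is_valid_pc("A>B && !(C||D)") and is_valid_pc("D!=10") succeed, while a fragment such as "3 +" is rejected because input remains after the expression.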

Extra: parsing define expressions

#define A 5
#define B do {} while(1);
#define C 3 +

82-95% of define values are valid expressions

Exclude non-variability usages

Double inclusion guard:

#ifndef FOO_H
#define FOO_H
…
#endif

Override guard:

#ifndef A
#define A 5
#endif
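One plausible way to apply this exclusion is sketched below, assuming each file is available as an array of lines with comments already stripped. The names classify_ifndef and GuardKind are hypothetical, and the criteria are simplifying assumptions rather than the study's actual heuristics: a double inclusion guard must span the whole file and define its symbol without a value, while an override guard contains nothing but a #define that gives the symbol a value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { GUARD_NONE, GUARD_DOUBLE_INCLUSION, GUARD_OVERRIDE } GuardKind;

/* If line is "#<kw> NAME...", copies NAME into name and returns a pointer to
 * whatever follows NAME (the macro value for #define); otherwise NULL. */
static const char *match_directive(const char *line, const char *kw,
                                   char *name, size_t cap) {
    size_t n = strlen(kw), len = 0;
    while (isspace((unsigned char)*line)) line++;
    if (*line != '#') return NULL;
    line++;
    while (isspace((unsigned char)*line)) line++;
    if (strncmp(line, kw, n) != 0 || !isspace((unsigned char)line[n])) return NULL;
    line += n;
    while (isspace((unsigned char)*line)) line++;
    while ((isalnum((unsigned char)line[len]) || line[len] == '_') && len + 1 < cap) {
        name[len] = line[len];
        len++;
    }
    if (len == 0) return NULL;
    name[len] = '\0';
    return line + len;
}

/* +1 for #if/#ifdef/#ifndef, -1 for #endif, 0 for anything else. */
static int nesting_delta(const char *line) {
    while (isspace((unsigned char)*line)) line++;
    if (*line != '#') return 0;
    line++;
    while (isspace((unsigned char)*line)) line++;
    if (strncmp(line, "if", 2) == 0) return 1;
    if (strncmp(line, "endif", 5) == 0) return -1;
    return 0;
}

static int is_blank(const char *s) {
    while (*s)
        if (!isspace((unsigned char)*s++)) return 0;
    return 1;
}

/* Classifies the #ifndef at lines[i], where lines holds every line of one file. */
GuardKind classify_ifndef(const char **lines, size_t n, size_t i) {
    char guarded[64], defined_name[64];
    const char *value;
    size_t end = n;
    int depth = 0;
    assert(lines != NULL && i < n);
    if (!match_directive(lines[i], "ifndef", guarded, sizeof guarded) || i + 1 >= n)
        return GUARD_NONE;
    value = match_directive(lines[i + 1], "define", defined_name, sizeof defined_name);
    if (value == NULL || strcmp(guarded, defined_name) != 0) return GUARD_NONE;
    for (size_t k = i; k < n && end == n; k++) {   /* find the matching #endif */
        depth += nesting_delta(lines[k]);
        if (depth == 0) end = k;
    }
    if (end == n) return GUARD_NONE;
    /* #ifndef A / #define A 5 / #endif: only supplies a default value */
    if (end == i + 2 && !is_blank(value)) return GUARD_OVERRIDE;
    /* #ifndef FOO_H / #define FOO_H / ... / #endif spanning the whole file */
    if (i == 0 && end == n - 1 && is_blank(value)) return GUARD_DOUBLE_INCLUSION;
    return GUARD_NONE;
}
```

Any #ifndef classified as a guard by this sketch would be skipped when collecting presence conditions; everything else stays a candidate variation point.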

Combining VPs (variation points)

#ifdef A
foo1();    // A
#elif B
foo2();    // !A && B
#if B>A
foo3();    // !A && B && B>A
#else
foo4();    // !A && B && !(B>A)
#endif
#endif

VP1 {
  then_block: { foo1(); }
  elif_block: {
    foo2();
    VP2 {
      then_block: { foo3(); }
      else_block: { foo4(); }
    }
  }
}
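Combined presence conditions like the ones above can be computed mechanically by keeping a stack of open VPs. The sketch below is an illustrative reconstruction, not the authors' tool. For each branch, it conjoins three parts: the enclosing condition, the negations of the earlier branches of the same VP, and the branch's own condition. The names presence_conditions, Frame, conj and negate are hypothetical. As on the slide, #ifdef X is written simply as X.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define PC_LEN 256
#define MAX_DEPTH 32

typedef struct {
    char outer[PC_LEN];    /* PC of the code enclosing this #if chain      */
    char excluded[PC_LEN]; /* conjunction of negations of earlier branches */
    char current[PC_LEN];  /* PC of the branch currently being read        */
} Frame;

/* dst = a && b (empty operands are dropped; b is parenthesised if it has ||). */
static void conj(char *dst, const char *a, const char *b) {
    char tmp[2 * PC_LEN + 8];
    if (*b == '\0') snprintf(tmp, sizeof tmp, "%s", a);
    else if (strstr(b, "||")) snprintf(tmp, sizeof tmp, "%s%s(%s)", a, *a ? " && " : "", b);
    else snprintf(tmp, sizeof tmp, "%s%s%s", a, *a ? " && " : "", b);
    snprintf(dst, PC_LEN, "%s", tmp);
}

/* dst = !cond, with parentheses unless cond is a plain identifier. */
static void negate(char *dst, const char *cond) {
    int plain = *cond != '\0';
    for (const char *c = cond; *c; c++)
        if (!isalnum((unsigned char)*c) && *c != '_') plain = 0;
    if (plain) snprintf(dst, PC_LEN, "!%s", cond);
    else snprintf(dst, PC_LEN, "!(%s)", cond);
}

/* Enters a new branch of f with condition cond ("" for #else). */
static void open_branch(Frame *f, const char *cond) {
    char neg[PC_LEN];
    conj(f->current, f->outer, f->excluded);
    conj(f->current, f->current, cond);
    if (*cond) {
        negate(neg, cond);
        conj(f->excluded, f->excluded, neg);
    }
}

/* Writes the PC of every non-directive line into pcs[k] ("" = unconditional);
 * directive lines get "". Returns 0 on unbalanced nesting. */
int presence_conditions(const char **lines, size_t n, char pcs[][PC_LEN]) {
    Frame stack[MAX_DEPTH];
    size_t depth = 0;
    assert(lines != NULL && pcs != NULL);
    for (size_t k = 0; k < n; k++) {
        const char *p = lines[k], *rest;
        char kw[16], cond[PC_LEN];
        size_t len = 0;
        pcs[k][0] = '\0';
        while (isspace((unsigned char)*p)) p++;
        if (*p != '#') {                               /* ordinary code line */
            if (depth > 0) snprintf(pcs[k], PC_LEN, "%s", stack[depth - 1].current);
            continue;
        }
        for (p++; isspace((unsigned char)*p); p++) {}
        while (isalpha((unsigned char)p[len]) && len + 1 < sizeof kw) { kw[len] = p[len]; len++; }
        kw[len] = '\0';
        for (rest = p + len; isspace((unsigned char)*rest); rest++) {}
        if (!strcmp(kw, "if") || !strcmp(kw, "ifdef") || !strcmp(kw, "ifndef")) {
            if (depth == MAX_DEPTH) return 0;
            const char *parent = depth > 0 ? stack[depth - 1].current : "";
            Frame *f = &stack[depth++];
            snprintf(f->outer, PC_LEN, "%s", parent);
            f->excluded[0] = '\0';
            if (!strcmp(kw, "ifndef")) negate(cond, rest);
            else snprintf(cond, PC_LEN, "%s", rest);
            open_branch(f, cond);
        } else if (!strcmp(kw, "elif") || !strcmp(kw, "else")) {
            if (depth == 0) return 0;
            open_branch(&stack[depth - 1], !strcmp(kw, "elif") ? rest : "");
        } else if (!strcmp(kw, "endif")) {
            if (depth == 0) return 0;
            depth--;
        }                                              /* #define etc.: no PC change */
    }
    return depth == 0;
}
```

Run on the ten lines of the example, this sketch yields A, !A && B, !A && B && B>A and !A && B && !(B>A) for the four calls, matching the comments on the slide.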

RQ1 What are the typical building blocks of presence conditions?

This is important in order to understand which kinds of expressions we need to support in the higher-level configuration language.

Kind of expression       Presence conditions containing it
Identifier references    85-98 %
Logical operators        21-66 %
Number literals          6-16 %
Comparison operators     0-6 %
Others                   < 2 %
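To illustrate how percentages like these could be obtained, the sketch below does two things. First, it records which kinds of building block occur in each presence condition. Second, it computes the share of conditions that contain each kind. The token classification is an assumption: shifts and arithmetic count as "others", and the defined operator is not counted as an identifier reference. The names building_blocks, BuildingBlocks and tally are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Which kinds of building block occur in one presence condition. */
typedef struct { int identifier, logical, number, comparison, other; } BuildingBlocks;

BuildingBlocks building_blocks(const char *e) {
    BuildingBlocks b = {0, 0, 0, 0, 0};
    const char *p = e;
    assert(e != NULL);
    while (*p) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c) || c == '(' || c == ')') {
            p++;
        } else if (isalpha(c) || c == '_') {
            const char *s = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            /* the defined operator is not itself an identifier reference */
            if (!(p - s == 7 && strncmp(s, "defined", 7) == 0)) b.identifier = 1;
        } else if (isdigit(c)) {
            while (isalnum((unsigned char)*p)) p++;   /* 10, 0x1F, 20UL, ... */
            b.number = 1;
        } else if (!strncmp(p, "&&", 2) || !strncmp(p, "||", 2)) {
            b.logical = 1; p += 2;
        } else if (!strncmp(p, "<<", 2) || !strncmp(p, ">>", 2)) {
            b.other = 1; p += 2;                     /* shifts */
        } else if (!strncmp(p, "==", 2) || !strncmp(p, "!=", 2) ||
                   !strncmp(p, "<=", 2) || !strncmp(p, ">=", 2)) {
            b.comparison = 1; p += 2;
        } else if (c == '!') {
            b.logical = 1; p++;
        } else if (c == '<' || c == '>') {
            b.comparison = 1; p++;
        } else {
            b.other = 1; p++;                        /* + - * / % & | ^ ~ ? : ... */
        }
    }
    return b;
}

/* Percentage of presence conditions containing each kind, in the order
 * identifier, logical, number, comparison, other. */
void tally(const char **pcs, size_t n, double pct[5]) {
    for (int k = 0; k < 5; k++) pct[k] = 0.0;
    for (size_t i = 0; i < n; i++) {
        BuildingBlocks b = building_blocks(pcs[i]);
        pct[0] += b.identifier; pct[1] += b.logical; pct[2] += b.number;
        pct[3] += b.comparison; pct[4] += b.other;
    }
    for (int k = 0; k < 5; k++) pct[k] = n ? 100.0 * pct[k] / (double)n : 0.0;
}
```

Because a condition counts once per kind regardless of how many tokens of that kind it has, the columns need not sum to 100 %, which matches the shape of the table.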

RQ2 Which changes (re-#defines and #undefs) are applied to a defined symbol?

Depending on how a defined symbol changes, its definitions can (or cannot) be lifted as constant configuration values.

We want constants, to avoid situations like this one:

#define A 1
#if A>1
foo1();
#endif
#undef A
#define A 2
#if A>1
foo2();
#endif

Same condition: one call is included, the other is not.


Definitions under different conditions:

#if VERS <= 2
#define A 1
#elif VERS == 3
#define A 2
#else
#define A 3
#endif

Case                                       Range
Single definition                          69-90 %
Multiple definitions to the same value     2-24 %
Definitions under different conditions     2-9 %
Total                                      95-99 %
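A minimal sketch of the classification in the table follows. It assumes that every #define of a symbol has already been located and tagged with the VP and branch it belongs to. Several parts are hypothetical simplifications: the Definition layout, the name classify_symbol, and the rule that any #undef makes a symbol non-constant. Deciding in general whether two conditions are mutually exclusive would require a satisfiability check. Here, only distinct branches of one #if/#elif/#else chain are accepted as exclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One #define of a symbol. vp identifies the enclosing #if/#elif/#else chain
 * (-1 if unconditional) and branch is the index of the branch in that chain.
 * This representation is an assumption made for the sketch. */
typedef struct {
    const char *value;
    int vp;
    int branch;
} Definition;

typedef enum {
    SINGLE_DEFINITION,
    SAME_VALUE,            /* multiple definitions, all with the same value */
    DIFFERENT_CONDITIONS,  /* different values in mutually exclusive branches */
    NOT_CONSTANT
} SymbolKind;

SymbolKind classify_symbol(const Definition *defs, size_t n, int has_undef) {
    assert(defs != NULL && n > 0);
    if (has_undef) return NOT_CONSTANT;   /* simplification: any #undef disqualifies */
    if (n == 1) return SINGLE_DEFINITION;
    size_t same = 1;
    while (same < n && strcmp(defs[same].value, defs[0].value) == 0) same++;
    if (same == n) return SAME_VALUE;
    /* Different values: accept only if each definition sits in its own branch of one VP. */
    for (size_t i = 0; i < n; i++) {
        if (defs[i].vp < 0 || defs[i].vp != defs[0].vp) return NOT_CONSTANT;
        for (size_t j = i + 1; j < n; j++)
            if (defs[j].branch == defs[i].branch) return NOT_CONSTANT;
    }
    return DIFFERENT_CONDITIONS;
}
```

Under these assumptions, the VERS example (three values in three branches of one chain) is classified as DIFFERENT_CONDITIONS, while the A=1 / A=2 redefinition is NOT_CONSTANT.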


RQ3 Are #error and #warning used in practice?

If they are, it could be possible to extract feature model constraints from them.

They are present in 4 out of 5 projects, but they represent between 0 and 0.26% of the preprocessor statements.

Linux contains more than 800 #error/#warning directives, Mozilla more than 700.
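To illustrate how such a constraint could be extracted, the sketch below turns every guarded #error into the negation of the conditions guarding it: a configuration that reaches the #error is invalid. The helper error_constraints is hypothetical. It handles #if, #ifdef, #ifndef, #else and #endif but not #elif. It also ignores #warning, which does not stop compilation, and #error outside any conditional.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define PC_LEN 256
#define MAX_DEPTH 32

/* Hypothetical sketch: for each guarded #error, emit "!(guard)", where guard is
 * the conjunction of the enclosing #if/#ifdef/#ifndef conditions (negated inside
 * #else). #elif chains are not handled. Returns the number of constraints. */
size_t error_constraints(const char **lines, size_t n, char out[][PC_LEN], size_t max) {
    char cond[MAX_DEPTH][PC_LEN];  /* condition of each open #if, as written */
    int in_else[MAX_DEPTH];
    size_t depth = 0, found = 0;
    assert(lines != NULL && out != NULL);
    for (size_t k = 0; k < n; k++) {
        const char *p = lines[k], *rest;
        char kw[16];
        size_t len = 0;
        while (isspace((unsigned char)*p)) p++;
        if (*p != '#') continue;
        for (p++; isspace((unsigned char)*p); p++) {}
        while (isalpha((unsigned char)p[len]) && len + 1 < sizeof kw) { kw[len] = p[len]; len++; }
        kw[len] = '\0';
        for (rest = p + len; isspace((unsigned char)*rest); rest++) {}
        if (!strcmp(kw, "if") || !strcmp(kw, "ifdef") || !strcmp(kw, "ifndef")) {
            if (depth == MAX_DEPTH) return found;   /* too deeply nested for this sketch */
            if (!strcmp(kw, "ifndef")) snprintf(cond[depth], PC_LEN, "!%s", rest);
            else snprintf(cond[depth], PC_LEN, "%s", rest);
            in_else[depth++] = 0;
        } else if (!strcmp(kw, "else") && depth > 0) {
            in_else[depth - 1] = 1;
        } else if (!strcmp(kw, "endif") && depth > 0) {
            depth--;
        } else if (!strcmp(kw, "error") && depth > 0 && found < max) {
            char guard[PC_LEN] = "", term[PC_LEN], tmp[2 * PC_LEN + 8];
            for (size_t d = 0; d < depth; d++) {
                if (in_else[d]) snprintf(term, sizeof term, "!(%s)", cond[d]);
                else if (depth > 1) snprintf(term, sizeof term, "(%s)", cond[d]);
                else snprintf(term, sizeof term, "%s", cond[d]);
                snprintf(tmp, sizeof tmp, "%s%s%s", guard, d > 0 ? " && " : "", term);
                snprintf(guard, sizeof guard, "%s", tmp);
            }
            snprintf(out[found++], PC_LEN, "!(%s)", guard);
        }
    }
    return found;
}
```

For instance, with invented feature macros USE_DMA and USE_POLLING, an #error inside #if USE_DMA && USE_POLLING would yield the constraint !(USE_DMA && USE_POLLING), which a feature model could express as mutual exclusion.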

Results

RQ1) What are the typical building blocks of presence conditions?
Identifiers, integers, logical and comparison operations

RQ2) Which changes (re-#defines and #undefs) are applied to a defined symbol?
More than 90% of symbols behave like constants

RQ3) Are #error and #warning used in practice?
Depends on the project

ChibiOS

ChibiOS is a real-time OS supporting 14 core architectures, different compilers and platforms.

OS Kernel module

• 41 files
• 246 presence conditions
• 233 definitions
• 54 symbols in presence conditions
• 2 symbols used in definitions of PC symbols
• 53 symbols not defined in the module (features)
• 3 symbols defined in the module (derived features)

Demos/ARMCM3-STM32F103ZG-FATFS module

Definitions for 31 of the 53 features:
• 28 defined to TRUE/FALSE
• 1 has no value
• 1 has value 0
• 1 has value 20
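When lifting such definitions into a configuration model, the value of each #define hints at what kind of configuration value it could become. The sketch below is a hypothetical classifier: feature_type and FeatureType are invented names. It maps TRUE/FALSE to boolean features, an empty value to a presence-only flag, and an integer literal to a numeric value. Treating 0 as a number rather than as FALSE is a design choice of the sketch, not something stated in the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical target kinds for a lifted configuration value. */
typedef enum { FEATURE_BOOLEAN, FEATURE_FLAG, FEATURE_NUMERIC, FEATURE_OTHER } FeatureType;

FeatureType feature_type(const char *value) {
    size_t len, i = 0;
    assert(value != NULL);
    while (isspace((unsigned char)*value)) value++;          /* trim leading blanks  */
    len = strlen(value);
    while (len > 0 && isspace((unsigned char)value[len - 1])) len--;  /* and trailing */
    if (len == 0) return FEATURE_FLAG;                       /* #define X           */
    if ((len == 4 && strncmp(value, "TRUE", 4) == 0) ||
        (len == 5 && strncmp(value, "FALSE", 5) == 0))
        return FEATURE_BOOLEAN;                              /* #define X TRUE      */
    while (i < len && isdigit((unsigned char)value[i])) i++;
    return i == len ? FEATURE_NUMERIC : FEATURE_OTHER;       /* #define X 20        */
}
```

Applied to the 31 definitions above, this mapping would give 28 boolean features, 1 flag and 2 numeric values.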


Questions?
