28
Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and René Rydhof Hansen (DIKU) Gilles Muller (Ecole des Mines de Nantes) the Coccinelle project

Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

Embed Size (px)

Citation preview

Page 1: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

Documenting and Automating Collateral Evolutions in Linux Device Drivers

Yoann PadioleauEcole des Mines de Nantes (now at UIUC)

withJulia Lawall and René Rydhof Hansen (DIKU)

Gilles Muller (Ecole des Mines de Nantes)

the Coccinelle project

Page 2: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

2

The problem: Collateral Evolutions

int bar(int x){

Evolution

in a library becomes

Can entail lots of

Collateral Evolutions (CE) in clients

foo(1);

bar(1);

foo(foo(2));bar(bar(2));

int foo(int x){

lib.c

client1.c client2.c

clientn.c

foo(2);

bar(2);

if(foo(3)) {

if(bar(3)) {

Legend:before

after

Page 3: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

3

The problem: Collateral Evolutions

int bar(int x, int y){

Evolution

in a library becomes

Can entail lots of

Collateral Evolutions (CE) in clients

foo(1);

bar(1,?);

foo(foo(2));bar(bar(2,?),?);

int foo(int x){

lib.c

client1.c client2.c

clientn.c

foo(2);

bar(2,?);

if(foo(3)) {

if(bar(3,?)) {

Legend:before

after

Page 4: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

4

Our target: Linux device drivers Many libraries and many clients:

Lots of driver support libraries: one per device type, one per bus (pci library, sound library, …)

Lots of device specific code: Drivers make up more than 50% of Linux

Many evolutions and collateral evolutions [Eurosys’06] 1200 evolutions in Linux 2.6 For each evolution, lots of collateral evolutions Some collateral evolutions affect over 400 files

at over 1000 code sites

Page 5: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

5

Our goal

Currently, Collateral Evolutions in Linux are done nearly manually:

Difficult Time consuming Error prone

The highly concurrent and distributed nature of the Linux development process makes it even worse:

Patches that miss code sites (because of newly introduced sites and newly introduced drivers)

Out of date patches, conflicting patches Drivers outside the Linux source tree are not updated Misunderstandings

Need a tool to document and automate Collateral Evolutions

Page 6: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

6

Taxonomy of transformations

Taxonomy of evolutions (library code): add parameter, split data structure, change protocol

sequencing, change return type, add error code, etc

Taxonomy of collateral evolutions (client code)?

Very wide variety of program transformations, affecting wide variety of C and CPP constructs

Often depends on context, e.g. for add argument the new value must be constructed from enclosing code

Note that not necesseraly semantic preservingCan not be done by current refactoring tools (more than just renaming entities). Need a flexible tool.

Page 7: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

7

Complex Collateral Evolutions (2.5.71)

int a_proc_info(int x ) { scsi *y; ... y = scsi_get(); if(!y) { ... return -1; } ... scsi_put(y);

... }

Delete calls

to library

Delete error checking code

From local var to

parameter

Legend:before

after

,scsi *y;

Evolution: scsi_get()/scsi_put() dropped from SCSI library Collateral evolutions: SCSI resource now passed directly to

proc_info callback functions via a new parameter

Page 8: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

8

Our idea

int a_proc_info(int x ,scsi *y ) { scsi *y; ... y = scsi_get(); if(!y) { ... return -1; } ... scsi_put(y);

... }

The example How to specify

the required program transformation ?

In what programming language ?

Page 9: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

9

Our idea: Semantic Patches

int a_proc_info(int x+ ,scsi *y ) {- scsi *y; ...- y = scsi_get();- if(!y) { ... return -1; } ...- scsi_put(y);

... }

function a_proc_info;

identifier x,y;

@@

@@

the ‘...’ operator

modifiers

Declarative language

Patch-like syntax

metavariable

references

Transform if everythin

g “matches

metavariable declarations

Page 10: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

10

Affected Linux driver code

int s53c700_info(int limit) { char *buf; scsi *sc; sc = scsi_get(); if(!sc) { printk(“error”); return -1; } wd7000_setup(sc); PRINTP(“val=%d”, sc->field+limit); scsi_put(sc); return 0;}

int nsp_proc_info(int lim) { scsi *host; host = scsi_get(); if(!host) { printk(“nsp_error”); return -1; } SPRINTF(“NINJASCSI=%d”, host->base); scsi_put(host); return 0;}

drivers/scsi/53c700.c

Similar, but not identical

drivers/scsi/pcmcia/nsp_cs.c

Page 11: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

11

Applying the semantic patch

@@function a_proc_info; identifier x,y;@@ int a_proc_info(int x+ ,scsi *y ) {- scsi *y; ...- y = scsi_get();- if(!y) { ... return -1; } ...- scsi_put(y);

... }

proc_info.sp

int nsp_proc_info(int lim) { scsi *host; host = scsi_get(); if(!host) { printk(“nsp_error”); return -1; } SPRINTF(“NINJASCSI=%d”, host->base); scsi_put(host); return 0;}

int s53c700_info(int limit) { char *buf; scsi *sc; sc = scsi_get(); if(!sc) { printk(“error”); return -1; } wd7000_setup(sc); PRINTP(“val=%d”, sc->field+limit); scsi_put(sc); return 0;}

$ spatch *.c < proc_info.sp

Page 12: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

12

Applying the semantic patch

@@function a_proc_info; identifier x,y;@@ int a_proc_info(int x+ ,scsi *y ) {- scsi *y; ...- y = scsi_get();- if(!y) { ... return -1; } ...- scsi_put(y);

... }

int s53c700_info(int limit, scsi *sc) { char *buf; wd7000_setup(sc); PRINTP(“val=%d”, sc->field+limit); return 0;}

proc_info.sp

int nsp_proc_info(int lim, scsi *host) { SPRINTF(“NINJASCSI=%d”, host->base); return 0;}

$ spatch *.c < proc_info.sp

Page 13: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

13

SmPL: Semantic Patch Language A single small semantic patch can modify

hundreds of files, at thousands of code sites

The features of SmPL make a semantic patch generic. Abstract away irrelevant details: Differences in spacing, indentation, and comments Choice of the names given to variables

(metavariables) Irrelevant code (‘...’, control flow oriented) Other variations in coding style (isomorphisms)

e.g. if(!y) ≡ if(y==NULL) ≡ if(NULL==y)

Page 14: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

14

Sequences and the ‘…’ operator

One ‘-’ line can erase multiple lines

1 y = scsi_get();

2 if(exp) {

3 scsi_put(y);

4 return -1;

5 }

6 printf(“%d”,y->f);

7 scsi_put(y);

8 return 0;

- y = scsi_get();

...

- scsi_put(y);

C file Semantic patch

Control-flow graph(CFG) of C file

1

2

3

8

6

4

7

exit

path 1:

path 2:

“. . .” means for all subsequent paths

Page 15: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

15

Isomorphisms, C equivalences

Examples: Boolean : X == NULL !X NULL == X

Control : if(E)S1 else S2 if(!E) S2 else S1

Pointer : E->field *E.field etc.

How to specify isomorphisms ?

Reuses SmPL syntax

@@ expression *X; @@

X == NULL <=> !X <=> NULL == X

Page 16: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

How does it work ?

Page 17: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

17

The transformation engine architecture

Parse C file Parse Semantic Patch

Translate to CFG

Translate to extended CTL

Expand isomorphisms

Match CTL against CFG using

a model checking algorithm

Modify matched code

Unparse

Computational Tree Logic

[Clark86] with extra features

Page 18: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

18

CTL and Model checking Model checking a CTL formula against a

model answers just yes/no (with counter example).

We do program transformations, not just “pattern checking”. Need: Bind metavariables and remember their value Remember where we have matched sub-

formulas We have extended CTL : existential

variables and program transformation annotations

@@ exp X,Y;@@ f(X); ...- g(Y);+ g(X,Y);

9X.f(X);Æ AX A[true U

9v.9Y.g-(-Y-)-;-+g(X,Y);v

Page 19: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

19

Other issues

Need to produce readable code Keep space, indentation, comments Keep CPP instructions as-is. Also programmer

may want to transform some #define,iterator macros (e.g. list_for_each)

Interactive engine, partial match Isomorphisms

Rewriting the Semantic patch (not the C code), Generate disjunctions

Very different from most other C tools

60 000 lines of OCaml code

Page 20: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

Evaluation

Page 21: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

21

Experiments

Methodology Detect past collateral evolutions in Linux 2.5

and 2.6 using patchparse tool [Eurosys’06] Select representative ones

Test suite of over 60 CEs Study them and write corresponding semantic

patches Note: we are not kernel developers

Going "back to the future". Compare: what Linux programers did manually what spatch, given our SPs, does automatically

Page 22: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

22

Test suite

20 Complex CEs : mistakes done by the programmers In each case 1-16 errors or misses

23 Mega CEs : affect over 100 sites Up to 40 people for up to two years

26 “typical” CEs: The whole set of CEs affecting a

“typical” (median) directory from 2.6.12 to 2.6.20More than 5800 driver files

Page 23: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

23

Results Our SPs are on average 106 lines long SPs often 100 times smaller than “human-made”

patches. A measure of time saved: Not doing manually the CE on all the drivers Not reading and reviewing big patches, for people with

drivers outside source tree Overall: correct and complete automated

transformation for 93% of files Problems on the remaining 7%: We miss code sites

CPP issues, lack of isomorphisms (data-flow and inter-procedural)

We are not kernel developers … don’t know how to specify No false positives, just false negatives

Average processing time of 0.7s per relevant fileSometimes the tool was right and human

wrong

Page 24: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

24

Impact on the Linux kernel

We also wrote some SPs for current collateral evolutions (looking at linux kernel mailing lists)

use DIV_ROUND_UP, BUG_ON, FIELD_SIZE convert kmalloc-memset to kzalloc Total diffstat: 154 files changed, 203 insertions(+), 375

deletions(-) We wrote other SPs, for “bug-fixing” (good side

effects of our tool) Add missing put functions (reference counting) Drop unnecessary put functions (reference counting) Remove unused variables Total diffstat: 111 files changed, 340 insertions(+), 355

deletions(-)

Accepted in latest Linux kernel

Page 25: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

25

Future work

Are semantic patches and spatch useful Only for Linux device drivers? Only for Linux programmers? Only for collateral evolutions program

transformations? Only for program transformations?

Our thesis: We don’t think so. But first device driver CEs are an important

problem! All software evolves. Software libraries are

more and more important, and have more and more clients

We may also help that software, those libraries

Page 26: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

26

Related work

Refactoring CatchUp[ICSE’05], tool-based, replay

refactorings from Eclipse JunGL[ICSE’06], language-based, but

based on ML, less Linux-programmer friendly

Program transformation engines Stratego[04]

C front-ends CIL[CC’02]

Page 27: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

27

Conclusion

Collateral Evolution is an important problem, especially in Linux device drivers

SmPL: a declarative language to specify collateral evolutions

Looks like a patch; fits with Linux programmers’ habits

But takes into account the semantics of C hence the name Semantic Patches

A transformation engine to automate collateral evolutions based on model checking technology.

Page 28: Documenting and Automating Collateral Evolutions in Linux Device Drivers Yoann Padioleau Ecole des Mines de Nantes (now at UIUC) with Julia Lawall and

28

Thank you

You can download our tool, spatch, at http://www.emn.fr/x-info/coccinelle

Questions ?