Dr Jekyll and Mr C Rob Ennals Intel Research Cambridge




13/3/06 Dr Jekyll and Mr C (SRG Talk)

C is holding us back

Much important software is currently written in C
– Even if not the most lines of code, probably most of the cycles

All connected: security problems; unreliable; unsafe; unexpressive; hard to analyse, debug, understand, write, and parallelise


Functional Languages are Great!

Features: Safety, Generic Types, Lambda Expressions, Controlled Effects, Type Classes

Benefits: Easier to write, Easier to understand, More reliable, More secure, Easier to parallelise

So why does nobody use them?


A Problem: Language Switching Costs

Much important software is currently written in C

Moving to a new language incurs high switching costs
– Programmers, tools, libraries, and existing code are all tied to C

Diagram: C, tied to Programmers, Trust, Libraries, Existing Code, and Tools


A Solution: Lossless Round Tripping

Jekyll is a high-level functional programming language
– Featuring most of the features of Haskell, and more

Jekyll can be translated losslessly to and from C
– Preserving layout, formatting, comments, everything
– C code is readable and editable

Diagram: C File <-> Jekyll File, translated in both directions, with a C Programmer working on the C side and a Jekyll Programmer on the Jekyll side


Switching Costs are Reduced

Programmers and Tools can still use the C version.

Existing C code can stay in C
– Although there may be benefit to be had from modifying it

If Jekyll ceases to be maintained, just use the C

Diagram: Jekyll layered on top of C Programmers, C Trust, C Libraries, Existing C Code, and C Tools


Jekyll is Transparent

C Programmers can edit programs without knowing about Jekyll.

This requires that:
– C programmers can understand C produced by the Jekyll translator
– The Jekyll translator can understand edits made by C programmers

Diagram: the C Programmer and the Jekyll Translator both working on the same C File

Jekyll is very tolerant of edits to C code. This is essential.


We assume that C programmers do NOT KNOW ANYTHING about Jekyll

But they still need to be able to edit Jekyll-encoded C files


Jekyll-Encoded C files are Unannotated

No funny macros, No weird comments, No restrictive naming rules

Just good, readable, editable C

All extra info is simply thrown away
– Retrieved from the previous Jekyll version when converting back

Jekyll:
struct<%a> Node{ %a *element; List<%a> *tail; };

Jekyll-encoded C:
struct Node{ void *element; List *tail; };


Reconstruction based on previous version

There are many ways to decode a C file as Jekyll
– Extra type info, different features being encoded, etc.

We choose the decoding that best matches the previous version
– Aiming to minimise the textual difference from the previous Jekyll file

This allows Jekyll to correctly decode unannotated C

Diagram: New C File + Old Jekyll File -> New Jekyll File


Encoding based on the previous version

There are many ways to encode a Jekyll feature as C
– Temporary names, whitespace, different encodings, etc.

We choose the encoding that best matches the previous C version
– Aiming to minimise the textual difference from the previous file

This allows Jekyll to avoid modifying hand-edited C

Diagram: New Jekyll File + Old C File -> New C File


Jekyll is another view of C

Authoritative source code can stay as C

But programmers and tools can also view it as Jekyll

C Programmers need not know Jekyll is even being used.

Diagram: a C Repository of C Files, edited by a C Programmer, alongside a Jekyll Repository of Jekyll Files, edited by a Jekyll Programmer


Jekyll Features

Use of unsafe features causes a warning unless marked as “unsafe”

All of C: Unsafe Features, Imperative Features, Low-Level Features, C Types, C Expressions, Pre-processor

Most of O'Caml + Haskell: Algebraic Types, Type Classes, Lambda Expressions, Pattern Matching, Generic Types, Type Safety, Optional GC, NOT LAZY!


What is Jekyll

Jekyll & its C Encoding

Lossless Translation

Demo


Superset of C

All C programs are valid Jekyll programs, unless:
– They use extensions that Jekyll does not understand
– They use the pre-processor in a way that Jekyll does not understand

In future: Support everything GCC can compile

Diagram: C as a subset of Jekyll


A mix of Haskell, O'Caml, and Cyclone

Jekyll contains no original language features
– All features are present in either Haskell, O'Caml or Cyclone
– Features are usually implemented in the same way too
– Although the combination can be interesting…

We will focus on the encoding, rather than the language itself



Generic Types

All extra type info is thrown away
– Type parameters
– Type variables
– Type constraints

The Jekyll translator restores them from the previous Jekyll file

Jekyll:
struct<%a> Node{ %a *element; List<%a> *tail; };

C:
struct Node{ void *element; List *tail; };
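To make the erasure concrete, here is a small sketch in plain C of code written against the erased `struct Node`. It assumes the tail points to another `struct Node` (standing in for the slide's `List`), and the helpers `cons` and `sum_ints` are hypothetical; it only illustrates that the element type survives as casts at use sites, not what the Jekyll translator actually emits.

```c
#include <stddef.h>

/* Erased form of struct<%a> Node: the type parameter %a becomes void *.
   (Assumption: the tail is another struct Node rather than a List.) */
struct Node {
    void *element;
    struct Node *tail;
};

/* Hypothetical constructor: links a caller-allocated cell onto a list. */
static struct Node *cons(struct Node *cell, void *element, struct Node *tail) {
    cell->element = element;
    cell->tail = tail;
    return cell;
}

/* A use site that "knows" %a = int: the type information the C lost is
   reintroduced as a cast, which the C compiler cannot check. */
static int sum_ints(const struct Node *l) {
    int total = 0;
    for (; l != NULL; l = l->tail)
        total += *(const int *)l->element;
    return total;
}
```

The cast in `sum_ints` is exactly what a C compiler cannot verify; on the Jekyll side it corresponds to the `%a` that the translator restores from the previous Jekyll file.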


Tagged Unions

No annotations here either
– Jekyll will attempt to decode any struct that has _tag and _body fields

Jekyll:
tagged<%a> List{ Node<%a> NODE; void EMPTY; };
switch(*l){ case EMPTY: return 0; case NODE n: return len(n);};

C:
struct List{ enum {NODE,EMPTY} _tag; union { Node NODE; void EMPTY; } _body; };
switch(l->_tag){ case EMPTY: return 0; case NODE: return len(l->_body.NODE);};
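A self-contained version of this encoding can be compiled as ordinary C. The sketch below assumes a `Node` payload with `head` and `tail` fields (names borrowed from the initialiser slide, otherwise hypothetical) and drops `void EMPTY` from the union, since ISO C does not allow `void` members. Its `len` counts list cells by recursing on the tail, which is an assumption about what the slide's `len` computes.

```c
#include <stddef.h>

struct List;

/* Hypothetical payload for the NODE variant: a cons cell. */
typedef struct Node {
    void *head;
    struct List *tail;
} Node;

/* Tagged union in the shape Jekyll looks for: a _tag field naming the
   active variant and a _body union holding its payload. ISO C has no
   void union members, so EMPTY simply carries no payload here. */
typedef struct List {
    enum { NODE, EMPTY } _tag;
    union {
        Node NODE;
    } _body;
} List;

/* A pattern match encoded as a switch on the tag; _body.NODE is only
   read when _tag says NODE is the active member. */
static int len(const List *l) {
    switch (l->_tag) {
    case EMPTY:
        return 0;
    case NODE:
        return 1 + len(l->_body.NODE.tail);
    }
    return -1; /* not reached for a well-formed tag */
}
```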


Unsafe Operations

All unsafe C operations are allowed
– Pointer arithmetic
– Unchecked array bounds
– Unsafe casts, etc.

Must be marked with the "unsafe" keyword to avoid a warning

Jekyll:
unsafe *p++ = *q++;

C:
*p++ = *q++;


Lambda Expressions

Programmers are free to change all generated names
– The fe, ff and ft prefixes are the defaults, but are not required
– They are just used to reduce the incidence of name clashes

Jekyll:
int plusthree(int z){return foo(3, x : x + z;);}

C:
struct fe_env{ int *z;};
int ff_lam(struct fe_env *_env, int x){return x+*(_env->z);}
int plusthree(int z){ struct fe_env ft0 = {&z}; return foo(3,(void*)&ff_lam,&ft0);}
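This encoding is closure conversion: free variables move into an environment struct, and the lambda body becomes a top-level function that receives that struct. A compilable sketch follows. `foo` is a hypothetical higher-order function (the slides do not define it) that simply applies the closure to its first argument, and here `ff_lam` takes `void *` rather than `struct fe_env *` so that no function pointer has to be cast to `void *`, a conversion ISO C does not define.

```c
/* Environment record for the lambda x : x + z. As in the slide's
   encoding, the free variable z is captured by reference. */
struct fe_env {
    int *z;
};

/* The lambda body lifted to a top-level function; it reads z through the
   environment pointer. Taking void * lets it match a generic closure
   signature without a function-pointer cast. */
static int ff_lam(void *env, int x) {
    struct fe_env *e = env;
    return x + *(e->z);
}

/* Hypothetical higher-order function: calls the closure (fn, env) on n. */
static int foo(int n, int (*fn)(void *, int), void *env) {
    return fn(env, n);
}

static int plusthree(int z) {
    struct fe_env ft0 = { &z };  /* environment lives in plusthree's frame */
    return foo(3, ff_lam, &ft0);
}
```

Because `ft0` points into `plusthree`'s stack frame, this sketch is only valid while the closure does not outlive the call; an escaping closure would need its environment allocated elsewhere.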


Type Classes (Haskell-Style) (1/2)

Jekyll implements the full Haskell98 type class system

Any struct that contains only functions can be decoded as a type class

Type classes are a good match for C code
– They don't change the in-memory representation (unlike vtables)
– One can add methods to existing types (unlike vtables)

Jekyll:
interface Print %a{ void print(%a *x);};

C:
struct Print { void (*print)(void* _env, _va *x);};


Type Classes (Haskell-Style) (2/2)

Defining a new type class instance creates a new dictionary struct.

implement Print int { void print(int *x){print_int(*x);};};

implement(Print int);

void int_print(void* _env,int *x){print_int(*x);};

struct Print Print_int = {(void*)&int_print};
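The dictionary-passing scheme can be sketched in compilable C. Here `print` formats into a caller-supplied buffer instead of calling `print_int` (which the slides do not define), so the results can be checked; that signature, `Print_double`, and `print_pair` are assumptions for illustration. `Print_double` shows the point made earlier: a new instance is added for an existing type without touching that type's in-memory representation.

```c
#include <stdio.h>
#include <string.h>

/* Dictionary for a Print-style class: a struct containing only functions.
   Assumption: print formats into a buffer rather than writing to stdout. */
struct Print {
    void (*print)(void *env, const void *x, char *out, size_t n);
};

/* Instance for int: the int itself is untouched; no vtable is stored in it. */
static void int_print(void *env, const void *x, char *out, size_t n) {
    (void)env;
    snprintf(out, n, "%d", *(const int *)x);
}
static const struct Print Print_int = { int_print };

/* A new instance for an existing type, added without changing that type. */
static void double_print(void *env, const void *x, char *out, size_t n) {
    (void)env;
    snprintf(out, n, "%.2f", *(const double *)x);
}
static const struct Print Print_double = { double_print };

/* Overloaded code receives the dictionary as an extra argument. */
static void print_pair(const struct Print *d, const void *a, const void *b,
                       char *out, size_t n) {
    char sa[32], sb[32];
    d->print(NULL, a, sa, sizeof sa);
    d->print(NULL, b, sb, sizeof sb);
    snprintf(out, n, "(%s, %s)", sa, sb);
}
```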


Initialiser Expressions

Safe, easy creation of values.

One can of course rename all temporaries.

Jekyll:
return new Node{h,t}

C:
List *tmp;
tmp = (List*) jkl_GC_malloc(sizeof(List));
tmp->_tag = Node;
tmp->_body.Node.head = h;
tmp->_body.Node.tail = t;
return tmp;
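A compilable version of this expansion follows, assuming `head` is an `int` and using `malloc` in place of `jkl_GC_malloc` (so the caller must `free` the result). The `List` layout and the `new_Node` name are hypothetical, modelled on the tagged-union slide, and the sketch adds a NULL check that the expansion above omits.

```c
#include <stdlib.h>

/* Assumed shape of the List type: a tagged union whose Node variant
   has head and tail fields. */
typedef struct List {
    enum { Node, Empty } _tag;
    union {
        struct { int head; struct List *tail; } Node;
    } _body;
} List;

/* What "new Node{h,t}" expands to, with malloc standing in for Jekyll's
   jkl_GC_malloc; returns NULL if allocation fails. */
static List *new_Node(int h, List *t) {
    List *tmp = malloc(sizeof *tmp);
    if (tmp == NULL)
        return NULL;
    tmp->_tag = Node;
    tmp->_body.Node.head = h;
    tmp->_body.Node.tail = t;
    return tmp;
}
```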


Other Features

Fat pointers – allow safe pointer arithmetic (like Cyclone)

Macrotype – tell Jekyll how to interpret foreign macros (like Astec)
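Fat pointers are only named on this slide, so the following is a sketch in the spirit of Cyclone's fat pointers rather than Jekyll's actual representation; the struct layout and the `fat_*` names are assumptions. The pointer carries its base and length, arithmetic just moves an offset, and every dereference is bounds-checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* A fat pointer for int arrays: base address and length travel with the
   pointer. Keeping an offset (not a raw cursor) means that stepping out
   of range never forms an invalid C pointer. */
typedef struct {
    int *base;
    size_t len;
    ptrdiff_t off;
} fat_int_ptr;

static fat_int_ptr fat_from_array(int *arr, size_t len) {
    fat_int_ptr p = { arr, len, 0 };
    return p;
}

/* Pointer arithmetic: unchecked, like C, because it only moves the offset. */
static fat_int_ptr fat_add(fat_int_ptr p, ptrdiff_t k) {
    p.off += k;
    return p;
}

/* Dereference: checked against the bounds; returns false instead of
   reading outside the array. */
static bool fat_read(fat_int_ptr p, int *out) {
    if (p.off < 0 || (size_t)p.off >= p.len)
        return false;
    *out = p.base[p.off];
    return true;
}
```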


What is Jekyll

Jekyll & its C Encoding

Lossless Translation

Demo


Simplified C->Jekyll Translation

Ignoring parsing, transforms, analysis, typechecking, etc.

Diagram: C File -> Decode -> Non-det Jekyll File -> Select Closest (against the Previous Jekyll File) -> Output Jekyll File


Simplified Jekyll->C Translation

Diagram: Jekyll File -> Encode -> Non-det C File -> Select Closest (against the Previous C File) -> Output C File

Ignoring parsing, transforms, analysis, typechecking, etc.


Expanded Jekyll->C Translation

Diagram: Jekyll Tokens pass through Parse, Analysis, Encode, Pretty Print, Select Closest, and Whiteflow/Check to produce the output C Tokens; intermediate forms include the Jekyll AST, a Non-det Combined AST, Possible C Tokens, Possible Jkl Tokens, Previous C Tokens, and Guesses


Encode/Decode: Non-deterministic

Produce a non-deterministic AST describing all possibilities
– Encode: produce C that could implement a Jekyll feature
– Decode: look for C code that might implement a Jekyll feature

Decode is very aggressive: it will even accept invalid encodings
– If it seems that this might have been what was intended
– The user can be warned about these at check time

Diagram: Encode turns a Jekyll AST into a Non-det C AST; Decode turns a C AST into a Non-det Jekyll AST


Check: Ensure input was well formed

The decode stage will accept illegal encodings
– By design: this makes converting mangled C easier

Check: can our output be translated back to our input?
– If not, warn the user to look at the diffs

Diagram: Check compares the C Tokens against the Possible Tokens
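The check amounts to a round-trip test. In this sketch `to_decimal` is a hypothetical toy translator standing in for both directions: like Jekyll's decoder it accepts more than one spelling of its input (any C integer base), and like the encoder it always emits one canonical form. Input such as `0x10` therefore translates but fails the check, which is the case where the user is warned to look at the diffs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A translator maps source text to newly allocated target text,
   or returns NULL if it cannot translate the input. */
typedef char *(*translate_fn)(const char *src);

/* Round-trip check: translate the input, translate the result back, and
   compare with the original. Input that a lenient decoder accepted but
   the encoder would never produce is caught here. */
static bool round_trip_ok(const char *input, translate_fn forward,
                          translate_fn backward) {
    char *mid = forward(input);
    if (mid == NULL)
        return false;
    char *back = backward(mid);
    bool ok = back != NULL && strcmp(back, input) == 0;
    free(mid);
    free(back);
    return ok;
}

/* Hypothetical toy translator: reads an integer literal in any C base
   (decimal, 0x hex, leading-0 octal) but always writes it in decimal,
   so it is lenient on input and canonical on output. */
static char *to_decimal(const char *src) {
    char *end;
    long v = strtol(src, &end, 0);
    if (end == src || *end != '\0')
        return NULL;
    char *out = malloc(32);
    if (out != NULL)
        snprintf(out, 32, "%ld", v);
    return out;
}
```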


Degrees of Conformity

From least to most conforming:
– Cannot translate: the encoding stays as C
– Translates, but check fails: the best match is a decoded feature, but the encoding was invalid; generate a file, but warn
– Translates, and check passes: all is good
– Translates, and is canonical


Select Closest: Resolve Non-Determinism

Choose the encoding that minimises the textual difference from the previous file

If the AST did not change, the new file will be bit-for-bit identical to the old file

Now: line-by-line comparison
– Minimises differences as seen by "diff"

Future: Burrows-Wheeler longest common substring

Diagram: Select Closest takes the Non-det File and the Previous File
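The current line-by-line strategy can be sketched as follows: score each candidate by the number of lines a line-based diff would add or delete relative to the previous file (computed from the longest common subsequence of lines), and keep the cheapest. Representing the non-deterministic output as an explicit list of candidate files is a simplifying assumption, as are the `line_file`, `lcs_lines`, `line_diff`, and `select_closest` names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A candidate file, held as an array of lines. */
typedef struct {
    const char *const *lines;
    size_t n;
} line_file;

/* Length of the longest common subsequence of lines, using two rows of
   the standard dynamic-programming table (returns 0 if allocation fails). */
static size_t lcs_lines(line_file a, line_file b) {
    size_t *prev = calloc(b.n + 1, sizeof *prev);
    size_t *cur = calloc(b.n + 1, sizeof *cur);
    size_t result = 0;
    if (prev != NULL && cur != NULL) {
        for (size_t i = 1; i <= a.n; i++) {
            for (size_t j = 1; j <= b.n; j++) {
                if (strcmp(a.lines[i - 1], b.lines[j - 1]) == 0)
                    cur[j] = prev[j - 1] + 1;
                else
                    cur[j] = prev[j] > cur[j - 1] ? prev[j] : cur[j - 1];
            }
            size_t *tmp = prev;  /* the row just computed becomes prev */
            prev = cur;
            cur = tmp;
        }
        result = prev[b.n];
    }
    free(prev);
    free(cur);
    return result;
}

/* Number of lines a line-based diff would add or delete. */
static size_t line_diff(line_file a, line_file b) {
    return a.n + b.n - 2 * lcs_lines(a, b);
}

/* Select Closest: index of the candidate (k > 0) with the smallest line
   diff from the previous file; the first one wins on ties. */
static size_t select_closest(const line_file *cands, size_t k, line_file previous) {
    size_t best = 0, best_cost = (size_t)-1;
    for (size_t c = 0; c < k; c++) {
        size_t cost = line_diff(cands[c], previous);
        if (cost < best_cost) {
            best = c;
            best_cost = cost;
        }
    }
    return best;
}
```

In this sketch, a candidate that reproduces the previous file exactly has cost 0 and is selected, which matches the bit-for-bit behaviour described above when the AST has not changed.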


Twinned Token Printing

Carry whitespace and comments between Jekyll and C
– Otherwise comments in the two languages would be entirely disconnected

Whitespace can come from the input file or the previous file
– Twinned token: whitespace from the input token that matches the twin
– Untwinned token: whitespace from the previous file version

Diagram: the Input Jekyll and the Previous C feed, via the Jekyll AST and its Twins, into the Printed Jekyll and the Printed C
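The whitespace rule can be sketched once twins have been matched (the matching itself is not shown). Each output token records the whitespace and comments that preceded its twin in the input, or NULL if it has no twin, plus the whitespace it had in the previous output file; the `out_token` struct and `print_tokens` function are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One token to be printed. For a twinned token, ws_twin is the whitespace
   and comments that preceded its twin in the input file; untwinned tokens
   have ws_twin == NULL and fall back to ws_previous, taken from the
   previous version of the output file. */
typedef struct {
    const char *text;
    const char *ws_twin;
    const char *ws_previous;
} out_token;

/* Print tokens into out (cap > 0), preferring twin whitespace, then the
   previous file's whitespace, then a single space. If the buffer fills,
   the output is truncated but stays NUL-terminated. */
static void print_tokens(const out_token *toks, size_t n, char *out, size_t cap) {
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *ws = toks[i].ws_twin != NULL ? toks[i].ws_twin
                       : toks[i].ws_previous != NULL ? toks[i].ws_previous
                       : " ";
        int w = snprintf(out + used, cap - used, "%s%s", ws, toks[i].text);
        if (w < 0 || (size_t)w >= cap - used)
            break;
        used += (size_t)w;
    }
}
```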


What is Jekyll

Jekyll & its C Encoding

Lossless Translation

Demo


Demo


Conclusions

• Jekyll is a powerful functional programming language

• Lossless translation makes it practical to migrate C code

• Non-Deterministic encoding makes it tolerant of C edits

Download Jekyll now:

http://jekyllc.sf.net
