Just Enough C For Open Source Projects Andy Lester OSCON 2008

  • Just Enough C For Open Source Projects

    Andy Lester, OSCON 2008

  • Why this session?

  • It's all for Michael Schwern

    Schwern is a brilliant programmer.

    An invaluable asset to the Perl community.

    He still doesn't know C, even though Perl is written in C.

  • My assumptions

    You're like Schwern. Already a programmer, but raised on Ruby,

    Python, Java or PHP.

    You don't know what it was like in the bad old days.

    You want to work on open source projects.

  • Goals

    As much territory as possible

    As fast as possible

    Danger zones

    Impedance mismatches

  • Big differences

    Nothing is DWIM.

    All memory must be explicitly handled.

    Nothing automatically does anything.

    No strings extending.

    No magic type conversions.

  • Jumping in

  • Hello, world!

    #include <stdio.h>

    int main( int argc, char *argv[] ) {
        puts( "Hello, World!" );

        return 0;
    }

  • Build & run

    uniqua:~/ctut/c : gcc -Wall -o hello hello.c

    uniqua:~/ctut/c : ls -l
    total 20
    -rwxr-xr-x 1 andy andy 6592 Jul 6 01:02 hello*
    -rw-r--r-- 1 andy andy 103 Jul 6 01:00 hello.c

    uniqua:~/ctut/c : ./hello
    Hello, World!

  • Build & run

    gcc compiles hello.c into an object file (hello.o).

    gcc then calls the linker to make an executable program out of the object file: a.out by default, or hello when you pass -o hello.

    main() has to be called main(). Otherwise linking won't work.

  • Literals

    Strings are double-quoted. Characters are single-quoted.

  • Variables

    Variables have no sigils.

    Integers are int, and can be unsigned.

    Long integers are long, can be unsigned.

    Floating point numbers are float.

    Characters are char.

    There is no string type.

  • Variables

    Local variables are never initialized for you. You create a variable, it's going to contain

    whatever happens to be in that spot in memory. It is almost never what you want.

  • Casting variables

    To change an int to a long, say:

    int n;
    long l;
    l = (long)n;

    Upcasting to a bigger size is implicit.

    l = n;

    Downcasting is dangerous

    n = (int)l; /* Could lose significant bits */

  • Converting values

    Convert strings to numbers with atoi and atof

    int i = atoi( "1234" );
    float f = atof( "1234.5678" );

    Convert numbers to strings with sprintf

    sprintf( str, "%d", 1234 );
    sprintf( str, "%8.4f", 1234.5678 );

  • Numeric max/min

    ints can only be so big, then wrap

    #include <limits.h>

    int n = INT_MAX;
    printf( "n = %d\n", n );
    n++; /* Signed overflow is technically undefined; it wraps on typical hardware */
    printf( "n = %d\n", n );

    unsigned int u = UINT_MAX;
    printf( "u = %u\n", u );
    u++;
    printf( "u = %u\n", u );

    n = 2147483647
    n = -2147483648
    u = 4294967295
    u = 0

  • Integers

    All integer size maxima and minima are platform-dependent.

    Those 32-bit ints on the previous slide? Luxury!

    In my day, we had 2-bit ints, and we were glad to get those!

    When in doubt, use <limits.h>.

  • Arrays

    Arrays are pre-defined.

    Arrays cannot change in size.

    int scores[10]; /* Numbered 0-9 */

  • Functions

    Take 0 or more arguments.

    Return 0 or 1 values

    /* Declaration */
    int square( int n );

    /* Definition */
    int square( int n ) {
        return n*n;
    }

  • Functions

    Functions can return void, meaning nothing.

    void greet_person( const char * name ) {
        printf( "Hello, %s\n", name );
    }

    Functions can take no arguments with void

    void launch_nukes( void ) {
        /* Implementation details elided. */
        /* Return value not necessary. */
    }

  • Questions?

  • Pointers

  • Pointers

    Address in memory, associated with a type

    Dangerous

    But you can't live without 'em

  • Pointers

    Take address of something with &

    Dereference the pointer with *

    char answer;
    char * cptr = &answer;
    *cptr = 'x';

  • Pointers

    Pass-by-reference with pointers

    void set_defaults( int *x, int *y, char *c ) {
        *x = 42;
        *y = 0;
        *c = ' ';
    }

    int this;
    int that;
    char other;

    set_defaults( &this, &that, &other );

  • Pointer math

    You can move pointers forward & back:

    int totals[3] = { 2112, 5150, 90125 };
    int *i = totals; /* or = &totals[0] */

    i = i + 2; /* now points at totals[2] */
    *i = 14; /* Sets totals[2] to 14 */

  • Strings & structs

  • Strings

    Stream of chars, ended with a nul ('\0').

    char buffer[100];
    char *p = buffer;
    p[0] = 'H';
    p[1] = 'e';
    p[2] = 'l';
    ...
    p[11] = 'd';
    p[12] = '!';
    p[13] = '\0';
    puts( p );

    /* Prints "Hello, world!" */

  • Strings

    Or you can use standard functions.

    #include <string.h>

    char buffer[100];
    char *p = buffer;

    strcpy( p, "Hello, world!" );

    buffer[] contains this:

    |H|e|l|l|o|,| |w|o|r|l|d|!|\0| ... +86 bytes trash

  • Strings

    There is no bounds checking.

    char buffer[10];

    strcpy( buffer, "Hello, world!" );
    /* Writes 14 chars into a 10-char buffer */

    This is where buffer overflow security advisories come from.

  • Strings

    Declaring a string at compile-time will automagically give you the buffer you need.

    char greeting[] = "Hello, world!";

    printf( "greeting \"%s\", size %d\n", greeting, (int)sizeof(greeting));

    >> greeting "Hello, world!", size 14

  • Strings

    strcat() tacks strings on each other

    char greeting[100];

    strcpy( greeting, "Hello, " );
    strcat( greeting, "world!" );

  • Strings

    strcmp() compares strings and returns a negative value, 0, or a positive value if the first string is less than, equal to, or greater than the second.

    if ( strcmp( str, "monkey" ) == 0 ) {
        handle_monkey();
    }

    It is NOT a boolean, so don't pretend it is.

    #define STREQ(x,y) (strcmp((x),(y))==0)
    /* Wrapper macro that is a boolean */

  • strlen()

    Gives the length of a string

    const char name[] = "Bob";
    int len = strlen( name );

    /* len is now 3, although sizeof(name) == 4 */

  • Structs

    Structs aggregate vars together

    #define LEN_USERNAME 8

    struct User {
        char username[ LEN_USERNAME + 1 ];
        unsigned int age;
        unsigned int salary;
        int example_originality_rating;
    };

    struct User staff[100];

  • Unions

    Unions let storage be of one type or another.

    struct User {
        char type;
        union {
            float hourly_rate;
            long yearly_salary;
        } pay;
    };

    struct User Bob;
    Bob.type = 'H';
    Bob.pay.hourly_rate = 9.50;

    struct User Ted;
    Ted.type = 'S';
    Ted.pay.yearly_salary = 70000L;

  • Questions?

  • File I/O

  • File I/O

    In Perl, it's easy.

    open( my $fh, '
  • In C? No such luck.

  • File I/O

    FILE *fp;
    fp = fopen( "/path/to/file", "r" );
    if ( fp == NULL ) {
        puts( "Unable to open file" );
        /* Print an error based on errno */
        exit( 1 );
    }

    char buffer[100];
    char *p;
    p = fgets( buffer, sizeof(buffer), fp );
    if ( errno ) {
        /* put out an error based on errno */
    }
    if ( p == NULL ) {
        puts( "I'm at EOF" );
    }
    fclose( fp );

  • This is why I fell in love with Perl*

  • Questions?

    Praise for modern languages?

  • Macros and the preprocessor

  • Macros

    Macros get handled by the preprocessor.

    #define MAX_USERS 100
    int scores[MAX_USERS];
    for ( int i = 0; i < MAX_USERS; i++ ) {
        ...
    }

    This expands before compilation.

    int scores[100];
    for ( int i = 0; i < 100; i++ ) {
        ...
    }

  • Macros

    Macros can take arguments that are replaced by the preprocessor.

    #define MAX_USERS 100
    #define BYTES_NEEDED(n,type) n * sizeof(type)
    int *scores = malloc( BYTES_NEEDED( MAX_USERS, int ) );

    becomes

    int *scores = malloc( 100 * sizeof(int) );

  • Macro safety

    Always wrap your macro arguments in parens, because of order of operations.

    #define BYTES_NEEDED(n,type) n * sizeof(type)
    const int bytes = BYTES_NEEDED(nusers+1,int);

    becomes

    const int bytes = nusers+1 * sizeof(int);
    /* Evals as nusers + (1*sizeof(int)) */

  • Macro safety

    Instead, define it as:

    #define BYTES_NEEDED(n,type) ((n)*sizeof(type))

    so it expands as

    const int bytes = ((nusers+1) * sizeof(int));

  • Macros

    gcc -E will preprocess to stdout, so you can see exactly what the effects of your macros are.

    Most compilers these days will inline simple functions, so don't use macros instead of functions in the name of efficiency.

  • Conditional compilation

    Macros allow you to have multiple platforms in the same block of code.

    #ifdef WIN32
    /* Compile some Win32-specific code. */
    #endif

  • Conditional compilation

    Macros let you compile in debug code or leave it out.

    int launch_missiles( void ) {
    #ifdef DEBUG
        log( "Entering launch_missiles" );
    #endif

        /* Implementation details elided */

    #ifdef DEBUG
        log( "Leaving launch_missiles" );
    #endif
    }

  • Conditional compilation

    You can also use the value of those macros.

    int launch_missiles( void ) {
    #if DEBUG_LEVEL > 2
        log( "Entering launch_missiles" );
    #endif

        /* Implementation details elided */

    #if DEBUG_LEVEL > 2
        log( "Leaving launch_missiles" );
    #endif
    }

  • Open source projects use conditional

    compilation a LOT.

  • Questions?

  • Memory allocation

  • Memory allocation

    Perl, Ruby, PHP, Python, most any dynamic language hides all this from you.

    Soon you'll see why.

    It's a pain to deal with.

    It's dangerous.

    It's necessary.

  • malloc() & free()

    const char name[] = "Bob in Marketing";

    int main( int argc, char *argv[] ) {
        char *message = malloc( 100 );

        if ( message == NULL ) {
            puts( "Failed to allocate memory" );
            exit(1);
        }
        strcpy( message, "Hello, " );
        strcat( message, name );
        strcat( message, "!" );
        puts( message );
        free( message );

        return 0;
    }

  • sizeof()

    Get the size of a type with sizeof

    int *scores = malloc( 100 * sizeof( int ) );

    sizeof is an operator.

    sizeof happens at compile time. Sorry, no run-time dynamic type sizing.

  • memset()

    Sets a range of memory to a given value.

    Definition:

    void *memset( void *p, int ch, size_t nbytes );

    Use:

    memset( scores, 0, 100 * sizeof( int ) );

  • memcpy()

    Copies range of memory to another place.

    Definition:

    void *memcpy( void *targ, const void *source, size_t nbytes );

    Use:

    memcpy( scores, original_scores, 100 * sizeof(int) );

    If the ranges of memory overlap, you have to use memmove.

  • realloc()

    realloc resizes the buffer you previously malloced, or gives you a new one of the new size.

    int bufsize = users_allocated * sizeof(int);
    int *scores = malloc( bufsize );
    ...
    nusers++;
    if ( nusers > users_allocated ) {
        users_allocated += (users_allocated/2);
        bufsize = users_allocated * sizeof(int);
        scores = realloc( scores, bufsize );
    }

    You may not get the same block of memory back, so other pointers that pointed into the buffer are now invalidated.

  • Memory catastrophes

  • These are why we have program crashes and

    security advisories.

  • Memory catastrophes

    Returning a pointer to a local variable.

    We do this all the time in Perl, for example.

    sub name_ref {
        my $name = 'Bob';
        return \$name;
    }

    my $ref = name_ref();
    print ${$ref};

    Perl has reference counting to keep track of when areas of memory are no longer used and can be returned to memory.

  • Memory catastrophes

    When you exit a function in C, you lose the rights to what's on the stack.

    char *name( void ) {
        char temp[4];
        int n;
        strcpy( temp, "Bob" );
        return temp;
    }

    char *who = name();
    do_something();
    puts( who );

    Let's see how this works.

  • Memory catastrophes

    Before calling name()

    Stack: Top of stack | n (4 bytes) | temp[0] | temp[1] | temp[2] | temp[3] | Return address

  • Memory catastrophes

    Just called name()

    Stack: Top of stack | n (4 bytes) | temp[0] | temp[1] | temp[2] | temp[3] | Return address

  • Memory catastrophes

    Returning from name()

    Stack: Top of stack | n (4 bytes) | temp[0]

  • Memory catastrophes

    Returned from name()

    Stack: who | Top of stack

  • Memory catastrophes

    Called do_something()

    Stack: ??? | ??? | ??? <- who | ??? | ??? | ??? | Return address

  • Memory catastrophes

    Printing who

    Stack: ??? | ??? | ??? <- who | ??? | ??? | ??? | Top of stack

  • Memory catastrophes

    If you dereference NULL, you crash.

    char *p = NULL;
    *p = 'x';

  • Memory catastrophes

    If you dereference a random value:

    char *p;
    *p = 'x';

  • Memory catastrophes

    If you free something you didn't malloc:

    char name[] = "Bob";
    char *p = name;
    free(p);

  • Memory catastrophes

    If you free memory, and use it again, you crash, or corrupt memory, or open yourself to a security hole.

    char *p = malloc( 100 );
    ...
    free(p);
    *p = 'x';

  • Memory catastrophes

    If you use more memory than you allocated, you crash, or corrupt memory, or open yourself to a security hole.

    char *p = malloc( 10 );
    strcpy( p, "Hello, world!" );

    or

    char *p = malloc(10);
    p[10] = 'x'; /* Off-by-one, a fencepost error */

  • Memory catastrophes

    If you allocate memory and don't free it, you have a memory leak.

    void do_something( void ) {
        char *p = malloc(100);
        /* do some stuff */
        return;
    }

    p is never freed, and we'll never have that pointer to that buffer again to free it in the future.

  • Questions?

    Groans of fear?

  • Advanced pointers

  • Advanced pointers

    Install cdecl. It is invaluable. It also is rarely packaged, so you'll have to build from source.

    uniqua:~ : cdecl
    Type `help' or `?' for help
    cdecl> explain char *p
    declare p as pointer to char
    cdecl> explain int x[34]
    declare x as array 34 of int

  • Advanced pointers

    void * throws away your type checking.

    char name[100];
    char *p = name;
    int *n = p;

    void.c:5: warning: initialization from incompatible pointer type

    But this causes no warnings:

    char name[100];
    char *p = name;
    void *v = p;
    int *n = v;

  • Double pointers

    You can have a pointer to a pointer, so you can modify your pointers.

    char *p = "Hello, world.";
    repoint_pointer( &p );

    void repoint_pointer( char **handle ) {
        /* Repoint if the current string is "Hello, world." */
        if ( strcmp( *handle, "Hello, world." ) == 0 ) {
            *handle = "Goodbye, world.";
        }
    }

  • Function pointers

    You can point to functions.

    cdecl> explain void (*fn_ptr)(void)
    declare fn_ptr as pointer to function (void) returning void

    Here's how it's used

    void (*fn_ptr)(void) = &some_action;
    if ( x == 1 ) {
        fn_ptr = &some_action;
    }
    fn_ptr();

  • Function pointers

    You can point to any kind of function.

    cdecl> explain char * (*fn_ptr)(int, char **)
    declare fn_ptr as pointer to function (int, pointer to pointer to char) returning pointer to char

    This is how you can do dispatch tables.

    Use typedef to create names for these types.

    There is no shame in using cdecl.

  • Questions?

  • const

  • const

    The const qualifier lets the compiler know you don't want to modify a value.

    const int bufsize = NUSERS * sizeof(int);

    Trying to modify bufsize is an error.

    bufsize++; /* error */

  • const

    Literal strings should be thought of as const.

    const char username[] = "Bingo";
    username[0] = 'R'; /* Error */

    If your compiler has a switch to make this automatic, use it.

  • const

    const your pointers in function arguments to tell the compiler that you promise not to touch the contents.

    int strlen( const char *str );
    int strcmp( const char *a, const char *b );

    It would be tragic if strlen() modified what was passed into it, no?

  • const

    const also lets the compiler or other tools know that your function does not initialize the contents of what's passed in.

    int strlen( const char *str );
    int mystery_func( char *str );

    char *p;
    n = strlen(p); /* "uninitialized value" error */
    n = mystery_func(p); /* not an error */

    The compiler has to assume that mystery_func is going to fill str.

  • Questions?

  • Multiple source files

  • Header files

    /* In handy.h */
    int square( int n );

    /* In handy.c */
    int square( int n ) {
        return n*n;
    }

    /* In hello.c */
    #include "handy.h" /* for square() */
    #include <stdio.h> /* for printf() */

    int main( int argc, char *argv[] ) {
        printf( "Hello, world, 12^2 = %d\n", square( 12 ) );

        return 0;
    }

  • Standard header files

    #include <stdio.h> /* Standard I/O functions */

    #include <stdlib.h> /* Catch-all useful: malloc(), rand(), etc */

    #include <time.h> /* Time-handling structs & functions */

    #include <string.h> /* String handling */

    #include <math.h> /* Math functions of all kinds */

  • Package header files

    Look in /usr/include or /usr/local/include

    #include <db.h> /* Berkeley DB */

    #include <sqlite3.h> /* SQLite3 */

    #include <ldap.h> /* LDAP */

    #include <httpd.h> /* Apache 2's main header file */

  • ack

    ack is grep for big codebases.

    It searches recursively from current directory by default.

    It lets you specify filetypes.

    http://petdance.com/ack/

  • Tags

    Tags files are prebuilt indexes of the symbols in your project.

    ctags \
        --links=no --totals \
        -R --exclude=blib --exclude=.svn \
        --exclude=res_lea.c \
        --languages=c,perl --langmap=c:+.h,c:+.pmc,c:+.ops \
        .
    1371 files, 649213 lines (19150 kB) scanned in 2.2 seconds (8665 kB/s)
    33566 tags added to tag file
    33566 tags sorted in 0.00 seconds

  • Tags

    Each line of the tags file looks like this:

    find_builtin
    src/builtin.c
    /^find_builtin(ARGIN(const char *func))$/;"
    f
    file:

    But run together with tabs.

    Tells the editor how to find the symbol.

  • Tags

    Jump to a tag from the command-line:

    $ vim -t find_builtin

    Jump to a tag from inside vim:

    :tag find_builtin

    Jump to the symbol under the cursor, or jump back to previous position

    Ctrl-], Ctrl-t

  • Tags

    These have been vim tag files.

    Emacs supports tags as well.

    Exuberant ctags generates tags in both forms.

    http://ctags.sf.net/ (I think)

  • Questions?

  • Slides will be at http://petdance.com

  • Topics omitted

    File I/O

    Macro side effects

    warnings vs. errors

    debugging

    profiling

    Working on large projects

    lint/splint

    valgrind