Upload
walter
View
41
Download
0
Embed Size (px)
DESCRIPTION
CS 5600 Computer Systems. Lecture 2 : Crash Course in C “CS 5600? That’s the class where you write 1 million lines of C code.”. Overview. C: A language written by Brian Kernighan and Dennis Ritchie UNIX was written in C Became the first "portable " language - PowerPoint PPT Presentation
Citation preview
CS 5600Computer Systems
Lecture 2: Crash Course in C“CS 5600? That’s the class where you
write 1 million lines of C code.”
2
Overview
• C: A language written by Brian Kernighan and Dennis Ritchie– UNIX was written in C– Became the first "portable" language
• C is a typed, imperative, call-by-value language• “The C programming language -- a language
which combines the flexibility of assembly language with the power of assembly language.”
3
Programming in C
• Four stages– Editing: Writing the source code by using some IDE or
editor– Preprocessing: Already available routines – Compiling: Translates source to object code for a specific
platform – Linking: Resolves external references and produces the
executable module • For the purposes of this class:– Editing (emacs or vim)– Compiling (gcc and make)
4
Example: Hello World in C
1: #include <stdio.h>2:3: int main(int argc, char *argv[]) {4: printf("Hello, world!\n");5: return 0;6: }
5
Line 1
1: #include <stdio.h>
• During compilation, the C compiler runs a program called the preprocessor– The preprocessor is able to add and remove code
from your source file• In this case, the directive #include tells the
preprocessor to include code from the file stdio.h• Equivalent to import in Python or Java
6
Line 3
3: int main(int argc, char *argv[]) {
• This statement declares the main function– A C program can contain many functions but must
have one main.• A function is a self-contained module of code that
can accomplish some task• The int specifies the return type of main– By convention, a status code is returned– 0 = success, any other value indicates an error
7
Line 4
4: printf("Hello, world!\n");
• printf is a function from a standard C library– Prints strings to standard output (usually the
terminal)– Included by #include <stdio.h>
• The compiler links code from libraries to the code you have written to produce the final executable
8
Line 5
5: return 0;
• Returns 0 from the current function– For main(), this is the return status of the program– Return value type must match the function
definition• No statements after the return will execute
9
Data Types, Constants, Variables
10
Data Types
• A variables data type determines– How it is represented internally by the hardware– How it may be legally manipulated (operations allowed on that data
type)– What values it may take on
• Constants and variables are classified into 4 basic types– Character: char (1 byte)– Short: short (2 bytes)– Integer: int (usually 4 bytes), long int (usually 8 bytes)– Floating Point: float (usually 4 bytes), double (8 bytes)
• All other data objects are built up from these fundamental types
11
char
• What is a character?– A member of the character set– ASCII: American Standard Code for Information
Interchange (128 characters)– Each character is internally represented using a
number (e.g: A is 65)• Recognized types, sizes and ranges:– char: 1 byte (-128 ≤ x ≤ 127)– unsigned char: 1 byte (0 ≤ x ≤ 255)
12
More about char
• ASCII character ‘2’ and the number 2 are not the same (2 != '2')– '2' is the character 2 (50 in ASCII)– 2 is the number 2– As a result, '2' = 50
• Escape Sequence– Special characters denoted by ‘\’ followed by characters or
hexadecimal code– \n new line– \t tab– \r carriage return
– \a alert– \\ backslash– \” double quote
13
short, int, long
• A whole or integral number, not containing a fractional part• Recognized types, size and ranges:
– short int: 2 bytes (-215 ≤ x ≤ 215 –1)– int: 4 bytes (-231 ≤ x ≤ 231 – 1, signed by default)– unsigned int: 4 bytes (0 ≤ x ≤ 232-1)– long long int: 8 bytes (-263 ≤ x ≤ 263 – 1, signed by default)
• Expressible as octal, decimal or hexadecimal– Integer starting with 0 will be interpreted in octal (e.g: 010 is 8)– Integer starting with 0x will be interpreted in hexadecimal (0x10 is 16)
• Negative numbers typically represented using two’s complement– What does short a = -7 look like in machine representation?
14
float, double
• A number which may include a fractional part– Representation is an estimate (although at times, it may be exact)– 2.13 can be represented exactly – 1.23456789 x 1028 does not fit into a float
• Hence number is truncated
• Recognized types, sizes and ranges– float: 4 bytes (1 sign bit, 8 exponent bits, 24 fraction bits)
• Range is -1e37 ≤ x ≤ 1e38
– double: 8 bytes (1 sign bit, 11 exponent bits, 53 fraction bits)• Range is -1e307 ≤ x ≤ 1e308
– long double: usually the same as double, sometimes 80 bits• You won’t be using floats or doubles in this class
15
Arithmetic Operators
• Order of precedence (highest to lowest)– Parenthesis– Multiplication, division, modulus– Addition, subtraction– Comparisons– Assignment
Assignment (=) a = bAddition (+) a + bSubtraction (-) a – bMultiplication (*) a * bDivision (/) a / bModulus (%) a % b
Equal to a == bNot equal to a != bLess than a < b< or = to a <= bIncrement a++Decrement a--
16
Bitwise Operators• ints can be viewed as collections of bits; C provides bitwise
operations– Bitwise AND (&) a & b – Bitwise OR (|) a | b – Bitwise NOT (~) ~a – Bitwise XOR (^) a ^ b – Bitwise left shift (<<) a << b – Bitwise right shift (>>) a >> b
• Examples:unsigned int a = 9; // 00001001unsigned int b = 3; // 00000011printf("9 & 3 is: %d\n", (a & b)); // 00000001printf("9 << 3 is: %d\n", (a << b)); // 01001000
17
Constants
• Constants are values written directly into program statements– Not stored in variables– Can assign constant values to variables
unsigned int x = 122;char x = 'c';
• Variables can also be set as constantsconst int x = 1234;x = 5; // illegal, won’t compile
18
Numerical Constants
• Integer constants– Default data type is int– If value exceeds int range, it then becomes long– Can force compiler to store value as long/unsigned using l/u
suffixes
long c = 0x0304lu;• Floating point constants
– Default type of floating point is double– Can force type float during compilation by using decimal (e.g., 1.0)– Can also use scientific notation
float x = 1.3e-20;
19
Variables in C• Variable declaration (e.g., int x = 7;)
– Associates data type with the variable name– Allocates memory of proper size– Cannot be re-declared within the same scope– Variable memory space initially contains junk, unless initialized
int x;printf(“%i\n”, x); // who knows what will print
• Variable names– Set of characters starting with [A-Z] or [a-z] or underscore – Period (.) is not allowed; no leading numbers; no reserved
keywords – Case sensitive
• Convention is lowercase used for variables and UPPERCASE for constants
20
Variable Types
• Character variables:char first = 'c';char second = 100;
• Integer variablesint a = 8;short unsigned int c = 2;
• Floating point variablesfloat g = 3.4;double e = -2.773;
21
Arrays• A data structure storing a collection of identical objects
– Allocated memory space is contiguous– Identified using a single variable– The size is equal to the number of elements and is constant
• Create an array called test_array which has 10 integer elementsint test_array[10];int my_array[2] = { 30, 10 };
• Array Elements– Referenced by name[index] (0-indexed; 2nd element is test_array[1] )– Array bounds not checked
int my_array[10];printf(“%i\n”, my_array[37]); // will compile and run, but may crash
my_array[123] = 7; // will compile and run, but is wrong
22
Type Casting
• What happens if you need to assign a variable of one type to a variable of a different type?
• By default, operands converted to the longest type– char converts to int, int converts to float, float converts
to double• Manual casting
int y = (int) 3.14 * x; // legaldouble z = (double) y; // legalchar * a = (char *) (3.14 * x); // won’t compile
23
Division and Modulus
• Dividing ints always results in a new intprintf(“%i\n”, 30 / 2); // 15printf(“%i\n”, 31 / 2); // 15printf(“%i\n”, 30 / 29); // 0
• Can force float division by castingprintf(“%f\n”, 31 / (float) 2); // 15.5
• Modulus operator yields remainder of divisionprintf(“%i\n”, 30 % 2); // 0printf(“%i\n”, 31 % 2); // 1
24
sizeof()
• Returns number of bytes required to store a specific data type.– Usage: int x = sizeof(<argument>);
• Works with types, variables, arrays, structures:– Size of an int sizeof(int); // 4– Size of a structure sizeof(struct foo);– Size of an array element: sizeof(test_array[0]);– Size of the entire array: sizeof(test_array);– Size of variable x: sizeof(x);
25
Pointers
26
Pointers
• Pointers are an address in memory– Includes variable addresses, constant addresses,
function addresses...• Pointer is a data type just like any other (int, float,
double, char)– On 32-bit machines, pointers are 4 bytes– On 64-bit machines, pointers are 8 bytes
• Pointers point to a particular data type– The compiler checks pointers for correct use just as it
checks int, float, etc.
27
Declaring Pointers
• Use * to denote a pointerint *ptrx; // pointer to data type intfloat *ft; // pointer to data type floatshort *st; // pointer to data type short
• Compiler associates pointers with corresponding data types– int * must point to an int, etc
28
Referencing/Dereferencing Pointers• How can you create a pointer to a variable?
– Use &, which returns the address of the argument
int y = 7;int *x = &y; // assigns the address of y to x
• How can you get the value pointed to?– Dereference the pointer using *– Go to the address which is stored in x, and return the value at that address
int y = 7; // y holds the value 7int *x = &y; // x is the memory address of yint z = *x; // z now holds the value 7(*a)++; // increments the value pointed to a *(a + 1); // accesses the value pointed to by the address (a + 1)
29
Pointer Quiz
int y = 10;int x = y;y++;x++;
• What is the value of y? 11int y = 10;int *x = &y;y++;(*x)++;
• What is the value of y? 12
30
Arrays and Pointers• Compiler associates the address of the array to/with the
nameint temp[34];
– Array name (temp) is the pointer to the first element of array• To access the nth element of the array:
– Address = starting address + n * size of element– Starting address = name of the array – Size of element = size of data type of array– array[n] de-references the value at the nth location in the array
int temp[10]; // Assume temp = 100 (memory address)temp[5] = *(100 + sizeof(int) x 5) = *(120)
31
Passing Arrays• Passing an array to a function passes a pointer
– Hence arrays are always passed by reference
int general (int size, int name []);//Expects a pointer to an int array
int general (int size, int *name);//Expects a pointer to an int
void foo(int a[]) {a[0] = 17;
}
int b[1] = { 5 };foo(b);
• What is the value of b[0]? 17
32
Functions and Pointers
• Functions must return a value of the declared type• Just like variables, functions can return a pointer
float *calc_area (float radius);• Function formal arguments may be of type
pointer:double calc_size (int *stars);
• For example, scanf takes in parameters as pointers:int scanf(const char *format, ...); // int*, int*scanf("%d%f", &x, &f);
33
Passing in Pointers
• Why pass a variable address at all and complicate functions?– By design we can return only one value– Sometimes we need to return back more than one
value• For example, consider scanf("%d %f", &x, &f); – Three values are returned (in x, f, and the return
value)• Pointers allows us to return more than one value
34
Pointer Arithmetic
• Pointers can be added to and also subtracted from– Pointers contain addresses
• Adding to a pointer goes to next specified location (dep. on data type)– <data type> *ptr;– ptr + d means ptr + d * sizeof (<data type>);
• For exampleint *ptr;
– ptr + 2 means ptr + 2*sizeof(int) which is ptr + 8char *ptr;
– ptr + d means ptr + 2*sizeof(char) which is ptr + 2
35
Example#include <stdio.h>void main () {
int *i;int j = 10;i = &j;printf ("address of j is : %p\n", i);printf ("address of j + 1 is : %p\n", i + 1);
}
• What is the output?$ ./a.outaddress of j is : 0xbffffa60address of j + 1 is : 0xbffffa64
• Note that j + 1 is actually 4 more than j
36
Strings
37
Character Strings
• A sequence of character constants such as “This is a string”– Each character is a character constant in a consecutive
memory block
• Each character is stored in ASCII, in turn is stored in binary – Character strings are actually character arrays
• A string constant is a character array whose elements cannot change
char *msg = "This is a string";
T H I S I S A S T R I N G \0
38
Strings as Arrays
char *msg = "This is a string !";• The variable msg is associated with a pointer to the first element
– msg is an array of 19 characters– \0 is also considered a character
• Appended to each string by the compiler• Used to distinguish strings in memory, acts as the end of the string• Also called the NULL character
• Character pointerschar *ptr;ptr = "This is a string";
– ptr is a char pointer containing the address of the first character (T)
39
String Functions
#include <string.h>int strcmp(char * a, char * b);
// return 0 if strings are identical// otherwise, return is non-zero
int strlen(char * a);// returns count of characters, not including null
char * strcpy(char * dst, char * src);// copies characters from b into a// returns the address of a// WARNING: a must have enough space to hold b!
40
Getting Numbers from Strings
#include <stdlib.h>int atoi(const char *a);
// converts an alphanumeric string to an // integer if possible (returns 0 on failure)
double atof(const char * a);// converts string to double if possible
int a = atoi("17"); //a is now 17double b = atof("89.29393"); //b is now 89.29393
41
Structures
42
Structures• An aggregate data type which contains a fixed number of components
– Declaration:
struct name {// a field// another field
};• For example
struct dob {int month;int day;int year;
};• Each dob has a month, day, and year (ints) inside it
43
Using Structures• Declare variables using struct keyword
– All internal variables are allocated space
struct dob d1, d2;• Access member values using ‘.’ notation
d1.day = 10; d2.year = 1976;printf("%d\n", d2.year);
• A structure can be assigned directly to another structurestruct dob d1, d2;d1.day = 10; d1.month = 12; d1.year = 1976;d2 = d1; // now d2 has the same values as d1 for its fields.
44
Operations on Structures
• Cannot check to see if two structures are alike directlystruct dob d1, d2;if (d1 == d2) // WRONG !!!
– To compare, we need to check every internal value• Cannot print structures directly
– Must print one field at a time• Pointers to structures use the ‘->’ notation
struct dob *d1 = (struct dob *) malloc(sizeof(struct dob));d1->year = 1976; d1->day = 26; (*d1).month = 6;
45
A Little More on Structures
• Can be initialized field by field or during declaration
struct dob d1 = {26, 6, 1976};• Can create arrays of structures
struct dob d1[10]; // array of 10 structs dob• … and access them in the usual manner
d1[1].day = 26; d1[1].month = 6;d1[1].year = 1976;
46
Making a struct Into a Type
• Type definition allows an alias for an existing type identifiertypedef <type> <name>;
• For examplestruct dob_s {
int day;int month;int year;
};typedef struct dob_s dob;
• Now you can simplydob my_dob;my_dob.year = 1984;
Alternate form:typedef struct dob_s {
int day;int month;int year;
} dob;
47
Memory Management
48
Memory Management
• Two different ways to look at memory allocation– Transparent to user
• Done by the compiler, OS
– User-defined• Memory allocated by the user
• Why allocate memory?– Required by program for permanent variables– Reserve working space for program
• Normally maintained using a stack and a heap
49
Stack
• Programs are typically given an area of memory called the stack– Stores local variables– Stores function parameters– Stores return addresses
• On x86, the stack grows downwards– Functions are responsible for allocating and
deallocating storage on the stack
50
Stack Frame Example
void foo(char * packet) { int offset = 0; char username[64]; int name_len;
name_len = packet[offset]; memcpy(&username, packet + offset + 1,name_len); …}
Callers return address
char username[]
int offset
int name_len
Stack
X
X-4
X-8
X-72
X-76
name_len name0 43
Packet
Chris
to
Wils
on
15
Bytes n
51
Local Variable Scoping
• Variables on the stack only exist when they are in scope– i.e. the function that created the variables is executing
struct mystruct *foo(int i) {struct mystruct;mystruct.i = 7;return &mystruct;
}• What happens if you try the program above?
– Can the caller access mystruct?
52
Heap
• A heap is a collection of memory manipulated but the user/system– No order specified for addition/removal
• Typically, the OS manages a pool of free heap memory– Applications may request allocations of memory
from this pool at runtime
53
Dynamic Memory Allocation• There are times when we don’t know how much memory is
required– Employee data base doesn’t know how many employees will exist– Program accepting command line input of different sizes– User requests more records to be created
printf("Add a new record y/n ?");scanf ("%c", &response);if (response == 'y') {
// we now have to allocate space}
• We use the heap to allocate memory dynamically
54
malloc()
#include <stdlib.h>void *malloc (size_t size); // size_t = unsigned int
• Returns a pointer to a contiguous block in memory of size bytes– For example
int *x = malloc(sizeof(int));• Returns a valid address, or NULL if an error occurs
– ALWAYS check the return value of malloc• Memory is allocated (automatically) on the heap
– Allocated space is not initialized; contains garbage
55
A void *?
• Why is void * returned?– malloc just allocates memory which consists of some number of
bytes– Memory as such has no data type– Programmer must associate a data type with the block of memory
• How to assign a data type to the memory just created?– Use type casting
double *x; // x is a pointer to doublex = (double *) malloc(100);
• How many doubles can we store? 100/8 = 12– We can clearly see a pit falls of malloc here
x = (double *) malloc(100 * sizeof(double)); // better
56
Freeing Memory
• You, the programmer, are in charge of freeing memory– Only dynamically allocated memory can be freed
• When finished using dynamically allocated memory, free it
#include <stdlib.h>void free (void *blk);
• Good practice to avoid memory leaks– Growing loss of memory due to unreleased locations– This will be crucial when developing your OS
57
Using free()char *x;x = (char *) malloc (10 * sizeof (char));free (x); // frees the memory just assigned
• Never free a memory block that is not dynamically allocated.int *x = 0;free (x); // error!!!
• Never double-freechar *x = (char *) malloc (10 * sizeof (char));free (x); free (x); // error!!!
• Never access freed memorychar *x = (char *) malloc (10 * sizeof (char));free (x); strcpy(x, "welcome"); // program may or may not crash
58
Keep Track of Your Pointers
• Dangling pointerschar *x, *y;x = (char *) malloc (10 * sizeof (int));y = (char *) malloc (10 * sizeof (int));x = y; // cannot access what was in x anymore
• Can easily lead to memory leaks
59
Writing Safe Code in an Unsafe Language
60
A Few Tips
• C is powerful because it gives you raw access to the system and to memory
• But, C provides much less help to the programmer than others– You can get yourself into trouble easily– C is very liberal with types (i.e. because of casting)
• Also, much harder to debug– Memory is just an array of bytes
• Programmer has to assign meaning
– You can overwrite memory, making the source of problems• A few tips will make your experience in this class easier
61
Buffer Overflows
char b[10]; b[10] = x;
• Array starts at index zero– So [10] is 11th element– One byte outside buffer was referenced– Program may or may not crash
• Off-by-one errors are common and can be exploitable!
62
Buffer Overflow Example
char username[]
int offset
int name_len
Stack
X
X-4
X-8
X-72
X-76
Chris
to
Wils
on
1572 (64 + 8)
void foo(char * packet) { int offset = 0; char username[64]; int name_len;
name_len = packet[offset]; memcpy(&username, packet + offset + 1,name_len); …}
Callers return address
name_len name0 43
PacketBytes n
Name data…
63
What Happens With an Overflow?
• If memory doesn't exist:– Bus error, crash
• If memory protection denies access:– Segmentation fault, crash– General protection fault, crash
• If access is allowed, memory next to the buffer can be accessed– Can overwrite the heap, stack– Worst of all options; won’t detect immediately
• Leads to weird, non-deterministic outcomes
– Can compromise your program
64
Preventing Buffer Overflows
void foo(int *array, int len);• Always pass array/buffer length with array– Don’t assume you know the length – Many library functions require you to do this already
int foo[3];for (int i=0; i<=3; i++) foo[i] = 0;
• Remember that last element of n-length array is n-1• Can happen to the best programmers
65
Problems With String Functions
char * strncpy(char * dst, const char * src, size_t len);– len is the maximum number of characters to copy – dst is NULL-terminated only if < len characters were copied!– All calls to strncpy must be followed by a NULL-termination
operation
int strlen(const char * str);– What happens when you call strlen on an improperly terminated
string?• strlen scans through memory until a null character is found• Can scan outside buffer if string is not null-terminated
– strlen is not safe to call unless you know string is NULL-terminated
66
The Worst String Operation
char * strcpy(char * dst, const char * src);• How can you use strcpy safely?– Set the last character of src to NULL• Even if string is shorter than the entire buffer
– Do not check according to strlen(src)!• Check that the src is smaller than or equal to dst– Or allocate dst to be at least equal to the size of src
67
Debugging
68
Debugging With gdb
• GDB is a debugger that helps you debug your program– Time you spend learning gdb will save you days of
debugging time• You need to compile with the -g option to use gdb– The -g option adds debugging information to your
program
$ gcc -g -o hello hello.c• Should be done automatically in all Makefiles we
give you
69
Running gdb• To run a program with gdb type
$ gdb name_of_program....(gdb)
• Then set a breakpoint in the main function– Marker in your program that will make the program stop – Return control back to gdb
(gdb) break main
• Now run your program– If your program has arguments, you can pass them after run
(gdb) run arg1 arg2 ... argN
70
Stepping Through Your Program
• Your program will start and then pause at maingdb>
• You have the following commands to run your program step by step
(gdb) step– It will run the next line of code and stop– If it is a function call, it will enter into it
(gdb) next– It will run the next line of code and stop– If it is a function call, it will go through it
• If the program is running without stopping, regain control CTRL-C
71
Setting Breakpoints
• You can set breakpoints in a program in multiple ways:
(gdb) break function_name– Set a breakpoint in a function
(gdb) break line_# – Set a break point at a line number in the current
file
(gdb) break file_name:line_#– Set a break point at a line number in a specific file
72
Inspecting the Stack
• The command(gdb) where
– Prints the current function being executed – And the chain of functions that are calling that function– This is also called the backtrace
• Example:(gdb) where#0 main () at test_mystring.c:22#1 test () at test_mystring.c:38(gdb)
73
Inspecting Variables
• To print the value of a variable(gdb) print variable_name
– Will automatically print char*s and arrays
(gdb) print i$1 = 5(gdb) print s1$1 = 0x10740 "Hello"(gdb) print stack[2]$1 = 56(gdb) print stack$2 = {0, 0, 56, 0, 0, 0, 0, 0, 0, 0}(gdb)
74
Catching Seg Faults• If your program seg faults, gdb will catch it
(gdb) runStarting program: /home/cbw/a.out test string
Program received signal SIGSEGV, Segmentation fault.0x4007fc13 in _IO_getline_info () from /lib/libc.so.6
(gdb) backtrace#0 0x4007fc13 in _IO_getline_info () from /lib/libc.so.6#1 0x4007fb6c in _IO_getline () from /lib/libc.so.6#2 0x4007ef51 in fgets () from /lib/libc.so.6#3 0x80484b2 in main (argc=1, argv=0xbffffaf4) at segfault.c:10#4 0x40037f5c in __libc_start_main () from /lib/libc.so.6
75
Valgrind
• Valgrind is a memory mismanagement detector– Use of uninitialized memory– Reading/writing memory after free()– Reading/writing outside bounds of malloc()ed
memory– Reading/writing inappropriate areas of the stack– Memory leaks
76
Finding Memory Leaks
#include <stdlib.h>int main() {
char *x = malloc(100);return 0;
}
$ valgrind --tool=memcheck --leak-check=yes example1==2330== 100 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==2330== at 0x1B900DD0: malloc (vg_replace_malloc.c:131)==2330== by 0x804840F: main (example1.c:5)
77
Finding Invalid Pointer Use
#include <stdlib.h>int main() {
char *x = malloc(10);x[10] = 'a';return 0;
}
$ valgrind --tool=memcheck --leak-check=yes example2==9814== Invalid write of size 1 ==9814== at 0x804841E: main (example2.c:6)==9814== Address 0x1BA3607A is 0 bytes after a block of size 10 alloc'd==9814== at 0x1B900DD0: malloc (vg_replace_malloc.c:131) ==9814== by 0x804840F: main (example2.c:5)
78
Detecting Use of Uninitialized Variables
#include <stdio.h>int foo(int x) {
if(x < 10) printf("x is less than 10\n");}int main() {
int y;foo(y);return 0;
}==4827== Conditional jump or move depends on uninitialized value(s)==4827== at 0x8048366: foo (example4.c:5)==4827== by 0x8048394: main (example4.c:14)