Static Analysis In Software Security
Project Report for Summer Project at Institute for Development and Research in Banking Technology
May 1 - June 30, 2013
Guide: Dr. V. Radha, Institute for Development and Research in Banking Technology, Hyderabad
By: Krishnendu Saha, Indian Institute of Technology, Kharagpur


CERTIFICATE OF COMPLETION

This is to certify that Mr. Krishnendu Saha has successfully completed the Summer Project on "Static Analysis in Software Security" under the guidance of Dr. V. Radha, IDRBT. The duration of this project was from May 1, 2013 to June 30, 2013.

Dr. V. Radha (Guide)
Institute for Development and Research in Banking Technology

Abstract

Security is often equated with perimeter security, i.e. restricting attackers from reaching deep inside the enterprise. But to be truly secure, software must be free of weaknesses that can fail even from internal causes. Security should therefore be a concern throughout the whole software development process. This is where static analysis tools are useful: they find vulnerabilities just by examining the source code, at the time of coding itself, so less vulnerable code reaches testing and testing time is saved. In this project I have built some code checkers that work as plugins of the Eclipse IDE for the C/C++ languages. Although C is a widely used language, many of its library functions are vulnerable.

CONTENTS

1. Introduction
   1.1 Static Analysis in Context of Software Security
2. Static Analysis
   2.1 Definition
   2.2 Working Procedure
       2.2.1 Build Model
       2.2.2 Perform Analysis
       2.2.3 Present Results
3. Ways of Implementing Static Analysis Tools
   3.1 My Static Analysis Tool Implementation
4. Implementation Tools
   4.1 Hardware Details
   4.2 Software Details
       4.2.1 Eclipse IDE
       4.2.2 Codan
       4.2.3 PDE
5. Implementation and Results
6. Future Work Scope
7. Limitations
8. Conclusions
9. References

1. Introduction

In this digital age we use software in every phase of life, from day-to-day articles to satellites in outer space. Software automates, totally or partially, the things we use, so a small mistake may lead to a catastrophe. Software should therefore be reliable for our own safety. There are also people (hackers) who try to jeopardize systems, and cyber threats are a matter of huge concern these days. Software security is the practice of building software to be secure and to function properly under malicious attack. The traditional way of making software less vulnerable is to test it with different sets of inputs and so find its areas of weakness. But if we apply our knowledge of common vulnerabilities while building the software, much of that cost, effort and time can be saved.

This is where static analysis becomes important. After all, an attacker succeeds only if there is a weakness in the code. If the vulnerable points are reduced, we can expect our software to be much more fail-proof.

1.1 Static Analysis in Context of Software Security

Software security means that software works correctly (gives correct outputs) in all possible situations, even under malicious attack (i.e. someone intentionally trying to find software weaknesses and exploit them). Software security is sometimes thought of as security features: cryptographic ciphers, passwords and access control mechanisms. But for a program to be secure, all portions of the program must be secure, not just the bits that explicitly address security. In many cases security failings are not related to security features at all. Conventionally, software security is considered mostly in the test and field phases of software building, but those efforts only make up for coding malpractices.

The root of security issues lies in coding malpractices and in the use of vulnerable library functions and APIs. Security must therefore be considered during coding, wherever faulty library functions are used, and in the early stages of software development.

[Figure labels: Dynamic Analysis, Firewall, Virus Scanner, Penetration Detection, Intrusion Detection]


Problems are easier to fix in the development stage, while they are still simple. If bugs appear in the testing phase, the whole program may have to be rechecked.

[Figure labels: Static Analysis, Architectural Risk Analysis, Security Requirements]

2. Static Analysis

2.1 Definition

Static analysis is the analysis of the source code of software without executing it.

2.2 Working Procedure

The procedure is divided into four steps:

1. Build Model
2. Perform Analysis (this needs another basic step, gathering security knowledge)
   2.1 Security Knowledge
3. Present Results

[Figure: working procedure of static analysis]

Build Model: 1) Lexed Tokens 2) Parse Tree 3) Abstract Syntax Tree 4) Control Flow Graph 5) Data Flow Diagram

Perform Analysis: 1) Lexical Analysis 2) Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4) Data Flow Diagram Analysis 5) Taint Analysis 6) Value Range Propagation

Security Knowledge: 1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe) 2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project) 3) SAMATE group at NIST (http://samate.nist.gov)

Present Results: 1) error (severe threat) 2) warning (may or may not be a security bug, but obeying it is good practice) 3) info (good coding practice, but no threat)

2.2.1 Build Model

For an analysis tool to understand code, the code must be represented by data structures that most closely capture the property to be analysed. These basic data structures are the ones built by compilers, and static analysis tools borrow them: lexer tokens, parse tree, abstract syntax tree (AST), control flow graph (CFG) and data flow diagram (DFD). The models are built by the compiler, by the static analysis tool, or by both.

Models Used in Analysis

Lexed Tokens: The source code is converted into a token stream, discarding unimportant whitespace and comments. E.g.

Source Code:
    if (ret) mat[x][y] = END_VAL;

This code produces the following sequence of tokens.

Lexer Output:
    IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI

Some tokens need one extra property, such as the name for an identifier (ID). The token stream is subsequently used to build the parse tree.
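The tokenizing step can be sketched in a few lines of Python. This is only an illustration, not the lexer of any real compiler or of CDT: the token table and the function name lex are assumptions, and the table covers only the token kinds that appear in the example.

```python
import re

# Hypothetical token table covering only the example statement.
# Order matters: the keyword IF must be tried before the generic ID.
TOKEN_SPEC = [
    ("IF", r"\bif\b"),
    ("ID", r"[A-Za-z_]\w*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("EQUAL", r"="),
    ("SEMI", r";"),
    ("SKIP", r"\s+|/\*.*?\*/"),  # whitespace and C block comments
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.S)


def lex(source):
    """Turn source text into a list of token strings, dropping SKIP matches."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = match.lastgroup
        if kind == "ID":
            tokens.append(f"ID({match.group()})")  # identifiers carry their name
        elif kind != "SKIP":
            tokens.append(kind)
        pos = match.end()
    return tokens


print(" ".join(lex("if (ret) /* comment */ mat[x][y] = END_VAL;")))
```

The comment in the input is discarded, and each identifier keeps its name as the extra property.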

Parse Tree: A language parser uses a context-free grammar to match the token stream. The grammar consists of a set of productions that describe the symbols (elements) in the language. The parser performs a derivation by matching the token stream against the production rules. If each symbol is connected to the symbol from which it was derived, a parse tree is formed.
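As a rough illustration of a derivation, the sketch below parses the token stream of the earlier if statement with a tiny recursive-descent parser. The three productions (if_stmt, assign_stmt, lval) are a made-up grammar for this one statement, not the C grammar, and tokens are assumed to be (kind, text) pairs.

```python
# Hypothetical grammar, for illustration only:
#   if_stmt     -> IF LPAREN ID RPAREN assign_stmt
#   assign_stmt -> lval EQUAL ID SEMI
#   lval        -> ID (LBRACKET ID RBRACKET)*
def parse_if_stmt(tokens):
    """Return a parse tree as nested (symbol, children) tuples."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def lval():
        children = [expect("ID")]
        while pos < len(tokens) and tokens[pos][0] == "LBRACKET":
            children += [expect("LBRACKET"), expect("ID"), expect("RBRACKET")]
        return ("lval", children)

    def assign_stmt():
        return ("assign_stmt", [lval(), expect("EQUAL"), expect("ID"), expect("SEMI")])

    tree = ("if_stmt", [expect("IF"), expect("LPAREN"), expect("ID"),
                        expect("RPAREN"), assign_stmt()])
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after statement")
    return tree


tokens = [("IF", "if"), ("LPAREN", "("), ("ID", "ret"), ("RPAREN", ")"),
          ("ID", "mat"), ("LBRACKET", "["), ("ID", "x"), ("RBRACKET", "]"),
          ("LBRACKET", "["), ("ID", "y"), ("RBRACKET", "]"),
          ("EQUAL", "="), ("ID", "END_VAL"), ("SEMI", ";")]
tree = parse_if_stmt(tokens)
```

Each tuple links a symbol to the symbols derived from it, which is exactly the connection that forms the parse tree.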


Control Flow Graph: A control flow graph (CFG) is a graphical representation of all the ways in which the flow of a program may proceed. Each node in the CFG represents a basic block, a sequence of statements with no branching or looping inside it. The CFG yields the cyclomatic complexity, the number of linearly independent paths through the code, which indicates how many places errors can hide. During dynamic analysis it also helps in deriving test cases that cover every path.

Source Code:
    if (a > b) {
        nConsec = 0;
    } else {
        s1 = getHexChar(1);
        s2 = getHexChar(2);
    }
    return nConsec;
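For the if/else fragment above, the CFG and its cyclomatic complexity can be worked out by hand or with a small sketch like the following. The adjacency-list encoding and the function names are assumptions made for illustration; the formula used is the standard M = E - N + 2P (edges, nodes, connected components).

```python
# Hypothetical adjacency-list CFG: one entry per basic block.
cfg = {
    "if (a > b)": ["nConsec = 0", "s1 = getHexChar(1); s2 = getHexChar(2)"],
    "nConsec = 0": ["return nConsec"],
    "s1 = getHexChar(1); s2 = getHexChar(2)": ["return nConsec"],
    "return nConsec": [],
}


def cyclomatic_complexity(graph, components=1):
    """M = E - N + 2P for a CFG given as {node: [successors]}."""
    nodes = len(graph)
    edges = sum(len(successors) for successors in graph.values())
    return edges - nodes + 2 * components


def all_paths(graph, node, path=()):
    """Enumerate entry-to-exit paths (only valid for graphs without loops)."""
    path = path + (node,)
    if not graph[node]:
        return [path]
    paths = []
    for successor in graph[node]:
        paths.extend(all_paths(graph, successor, path))
    return paths


print(cyclomatic_complexity(cfg))
for path in all_paths(cfg, "if (a > b)"):
    print(" -> ".join(path))
```

With four nodes and four edges the complexity is 2, matching the two paths (then-branch and else-branch) that a test suite would need to cover.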

[Figure: parser output (parse tree)]

[Figure: CFG builder output]

Data Flow Diagram: A data flow diagram shows all the possible paths of data input and output, and of data transfer between different entities within the software. It thus shows the points where data should be validated and where trust boundaries should be set up.

[Figure: CFG nodes: if (a > b); nConsec = 0; s1 = getHexChar(1), s2 = getHexChar(2); return nConsec]

2.2.2 Perform Analysis

Analysis is performed on the tokens, or on the nodes of the trees and graphs.

Lexical Analysis: The simplest of all analysis techniques. It helps in checking syntactic errors and in most cases uses regular pattern matching. It is of little use beyond detecting wrong identifier names or function names. Tools using lexical analysis techniques include ITS4, RATS and Flawfinder.
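A pattern-matching scanner of this kind can be sketched as below. The rule table RISKY_CALLS is a hypothetical three-entry example and is not taken from ITS4, RATS or Flawfinder; the severities reuse the error and warning categories of this report.

```python
import re

# Hypothetical rule database: function name -> (severity, reason).
RISKY_CALLS = {
    "gets": ("error", "no bounds check on the destination buffer"),
    "strcpy": ("warning", "destination size is not checked"),
    "sprintf": ("warning", "output length is not limited"),
}
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def scan(source):
    """Report (line, function, severity, reason) for every risky-looking call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            name = match.group(1)
            if name in RISKY_CALLS:
                severity, reason = RISKY_CALLS[name]
                findings.append((lineno, name, severity, reason))
    return findings


sample = "char buf[8];\ngets(buf);\n/* strcpy(a, b) is only mentioned in a comment */\n"
for finding in scan(sample):
    print(finding)
```

The scanner flags the strcpy inside the comment as well, which shows why purely lexical matching is shallow: it has no idea of the program structure around a name.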

Parse Tree and AST Analysis: These representations help in understanding the semantics of the program, and so in finding deviations from the rules of the grammar and from security rules, such as the rule that an if block should start and end with curly brackets. Most modern compilers do this kind of checking and report grammar violations as parse errors. Codan, the code analysis platform in CDT (C/C++ Development Tools, an Eclipse plugin), uses the AST to build checkers. Similar platforms such as PMD and Crystal (Eclipse plugins) use the AST to detect errors in Java.

Control Flow Graph Analysis: Although ASTs and parse trees are useful enough for detecting most rule violations, they fail for rules that depend on branching in the code. For example, an opened file or database should be closed once and only once along each control flow path of a program. The CFG is analysed in a number of stages, starting from a basic block, then a procedure (method or function), and then a bigger module such as a class. Tools that use CFG analysis include Fortify Source Code Analyzer and Klocwork.
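The open/close rule can be checked by walking every path of a CFG, as in the minimal sketch below. The CFG encoding (each node holds a list of "open"/"close" events plus its successors) and the function name are assumptions for illustration, and the walk only handles graphs without loops; a real tool would need a fixed-point iteration.

```python
def check_open_close(cfg, entry):
    """Walk every path of a loop-free CFG and report open/close mismatches.

    cfg maps node -> (events, successors); events are "open" or "close".
    """
    problems = []

    def walk(node, opened, trail):
        events, successors = cfg[node]
        trail = trail + [node]
        for event in events:
            if event == "open":
                opened += 1
            elif event == "close":
                if opened == 0:
                    problems.append(("close without open", trail))
                else:
                    opened -= 1
        if not successors:
            if opened > 0:
                problems.append(("missing close", trail))
            return
        for successor in successors:
            walk(successor, opened, trail)

    walk(entry, 0, [])
    return problems


# Hypothetical CFG: the file is closed on the 'then' path but not on 'else'.
leaky = {
    "entry": (["open"], ["then", "else"]),
    "then": (["close"], ["exit"]),
    "else": ([], ["exit"]),
    "exit": ([], []),
}
print(check_open_close(leaky, "entry"))
```

An AST-only checker would see one open and one close in this function and accept it; only the path-by-path view reveals the leak on the else branch.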

Data Flow Diagram Analysis: A data flow diagram (DFD) with security-specific annotations describes how data enters, leaves and traverses the system. It shows data sources and destinations, the relevant processes that data goes through, and the trust boundaries in the system. A DFD has a fixed set of component types: Process, HighLevelProcess, DataStore and ExternalInteractor. A Process is the central concern of a DFD. A high-level program is represented by hierarchical, multistage DFDs. A DataStore may be a database, a file or the Registry. An ExternalInteractor represents an entity that exists outside the system being modelled and interacts with the system at an entry point, typically a human. Data flows are represented by arrows. A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary, to describe locations where an adversary could impersonate a privilege, where a machine or process boundary may be crossed, and so on.

Taint Analysis: Tainting refers to marking data that comes from an untrusted source as "tainted" and propagating that status to all locations where the data is used. A security policy specifies which uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy indicates a vulnerability. Tainted data should not be used in any function that modifies files, directories or processes, or that executes external programs. If the rule is violated, the program should be aborted.

The analysis proceeds as follows:

1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.

Example:

1 unsigned int n;
2 char src[10], dst[10];
3 n = read_int();
4 if (n <= sizeof(dst))
5     memcpy(src, dst, n); /* n is <= sizeof(dst) */
6 else
7     memcpy(src, dst, n); /* n is > sizeof(dst) */

Using taint analysis, the memcpy() calls on both line 5 and line 7 are marked as vulnerabilities, whereas line 5 is actually a false positive because n is already bounded there. Taint analysis therefore has a high chance of producing false positives.
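The five steps can be sketched on a hand-encoded version of the example. The statement encoding (line, target, called function, operands) and the two sets of function names are assumptions for this sketch; a real tool would derive them from the AST and from its security knowledge base.

```python
UNTRUSTED_SOURCES = {"read_int"}  # assumption: functions returning attacker data
SENSITIVE_SINKS = {"memcpy"}      # assumption: functions that must not see it

# Hand-encoded statements of the example: (line, target, function, operands).
program = [
    (3, "n", "read_int", []),
    (5, None, "memcpy", ["src", "dst", "n"]),
    (7, None, "memcpy", ["src", "dst", "n"]),
]


def taint_analysis(statements):
    tainted = set()                      # step 1: nothing is tainted yet
    changed = True
    while changed:                       # step 4: repeat until a fixed point
        changed = False
        for _line, target, func, operands in statements:
            if target is None or target in tainted:
                continue
            from_source = func in UNTRUSTED_SOURCES               # step 2
            from_operand = any(op in tainted for op in operands)  # step 3
            if from_source or from_operand:
                tainted.add(target)
                changed = True
    reports = [line for line, _target, func, operands in statements  # step 5
               if func in SENSITIVE_SINKS and any(op in tainted for op in operands)]
    return tainted, reports


tainted, reports = taint_analysis(program)
print(sorted(tainted), reports)
```

Because the analysis ignores the condition on line 4, both memcpy() calls are reported, which reproduces the false positive on line 5 described above.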

Value Range Propagation: Here each tainted variable also carries the range of its possible values. When a vulnerable function uses the variable, the tool can check whether the variable's value range still makes the function vulnerable. In this way some false positives can be avoided.
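A minimal sketch of the idea on the same example: the range of n is split at the branch condition n <= sizeof(dst), and each memcpy() call is judged against the range that holds on its branch. The helper names and the assumption of a 32-bit unsigned int are illustrative only.

```python
UINT_MAX = 2**32 - 1  # assumption: unsigned int is 32 bits wide


def split_on_le(value_range, bound):
    """Split (lo, hi) on the test 'n <= bound': (range if true, range if false)."""
    lo, hi = value_range
    when_true = (lo, min(hi, bound)) if lo <= bound else None
    when_false = (max(lo, bound + 1), hi) if hi > bound else None
    return when_true, when_false


def memcpy_length_ok(value_range, buffer_size):
    """A copy length is safe if its largest possible value fits the buffer."""
    if value_range is None:  # the branch can never execute
        return True
    return value_range[1] <= buffer_size


n_range = (0, UINT_MAX)                   # n = read_int(): tainted, any value
line5, line7 = split_on_le(n_range, 10)   # if (n <= sizeof(dst)), sizeof(dst) == 10
print(line5, memcpy_length_ok(line5, 10))
print(line7, memcpy_length_ok(line7, 10))
```

On line 5 the range is (0, 10), so the call is accepted and the false positive disappears; on line 7 the range starts at 11 and the call is still reported.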


2.2.2.1 Security Knowledge

The main logic behind these tools is to learn from past mistakes and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)
2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)
3) SAMATE group at NIST (http://samate.nist.gov)

These collections of numerous errors contain patterns and recurring errors, so the most generic problems can be categorized.

2.2.3 Presenting and Processing Results

The reported security vulnerabilities are reviewed manually and then fixed. In some cases the analyser itself suggests a solution for the problem. Problems are reported in different categories:

1) error (severe threat)
2) warning (may or may not be a security bug, but obeying it is good practice)
3) info (good coding practice, but no threat)

3. Ways of Implementing Static Analysis Tools

Application of static analysis tools in the practical world:

I. Integration with compilers: Static analysis is part and parcel of modern compilers. They do all the basic checking, such as type checking, style checking, parse errors, unused identifiers and many others. But this needs compilation of the whole program. E.g. the gcc compiler (C language).

II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins that show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes. E.g. Eclipse, NetBeans.

III. Stand-alone platforms: These tools are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses. E.g. Fortify Source Code Analyzer, Klocwork, Ounce.

3.1 My Static Analysis Tool Implementation

I have built static analysis tools for an IDE (Integrated Development Environment). The IDE chosen is Eclipse, one of the platforms most used by software developers. The tools are integrated as plugins into Eclipse and run in the background to find errors in code.

4. Implementation Tools

4.1 Hardware Details

Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS

4.2 Software Details

Operating System: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE

Eclipse is one of the most used IDEs for Java, but it also provides tools to build software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)

Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis of the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as:

Profile Editor (Problem Preferences):
• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.

• How to build an AST of a C/C++ source: Windows > Show view > Others > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)

To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen in reference 4. The basic steps of plugin development are:

1. First go to File > New Project > Plugin Project.
2. Then edit MANIFEST.MF in META-INF:
   • Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
   • Add Runtime.
   • Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
   • Add a checker by right click > New > checker. Give the class name as the name of its source code.
   • Under the checker, add a problem by right click > New > problem. Set there the message that should be shown when the error occurs, the default enablement, etc.
   • On the Overview page, in the Exporting part, click Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
   • Finally, in the Export Wizard portion, under Archive file, give the name of your plugin.
   • This zip folder may now be included in the Eclipse plugins folder to make it permanent in Codan.
3. To test the plugin, run it; another Eclipse window then opens. Write faulty code that your checker is supposed to catch. You can see the error and messages in the editor.

5. Implementation and Results

The C language (and so C++ too) has many vulnerable library functions that can be used to crash a program if their arguments are unchecked and unsanitized. We may therefore use checkers to make the programmer aware of such vulnerable functions, or to verify that the argument values pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to give n a value greater than or equal to the size allocated for the destination (dst): a larger n overflows the buffer, and an equal n can leave the result without a terminating null character. So this must be checked whenever this vulnerable function is used. It is an example of buffer overflow.

Algorithm used:
1) First find a call to strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the string value of the next three nodes. The first string is the name of the destination and the third is the string form of the value of n.
2) Next, get the space allocated for the destination character pointer. For that we visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated.
3) The last step is to compare the allocated space of the destination character pointer with the value of n.
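The three steps can be imitated on plain source text, as in the sketch below. This is a simplified stand-in and not the Codan checker itself: it uses regular expressions instead of the CDT AST, and it only understands fixed-size char arrays and literal values of n. All names in it are hypothetical.

```python
import re

# Step 2 stand-in: declarations of the form  char name[SIZE]
DECL = re.compile(r"\bchar\s+(\w+)\s*\[\s*(\d+)\s*\]")
# Step 1 stand-in: calls of the form  strncpy(dst, src, N)  with a literal N
CALL = re.compile(r"\bstrncpy\s*\(\s*(\w+)\s*,\s*[^,]+,\s*(\d+)\s*\)")


def check_strncpy(source):
    """Report (line, dst, n, size) for every strncpy call with n >= size of dst."""
    sizes = {name: int(size) for name, size in DECL.findall(source)}
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for dst, n in CALL.findall(line):
            # Step 3: compare the allocated space with the value of n.
            if dst in sizes and int(n) >= sizes[dst]:
                problems.append((lineno, dst, int(n), sizes[dst]))
    return problems


sample = "char name[8];\nstrncpy(name, input, 8);\nstrncpy(name, input, 7);\n"
print(check_strncpy(sample))
```

Only the call on line 2 is reported, since 8 is not smaller than the 8 bytes allocated for name. Destinations allocated in any other way are silently skipped by this sketch.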

[Figure: source code and its AST]

Limitations and Inefficiency of the Checker

1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by walking the nodes to the proper position. This method is inadequate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C function calls fopen(stream, "r") and fopen(stream, "w")

When a file is opened in "r" mode, the file named by stream should already exist, and when the file is opened in write mode, we should warn the programmer that the file may be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks these conditions.

Algorithm used:
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all the nodes of each call, get the name of the stream and the mode of opening.
2) Then search all ICPPASTIfStatements to see whether there is a suitable function call checking the conditions above. In the read case this is an access() call, which should be inside an IASTUnaryExpression (negation), with a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
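A much simplified, text-based version of this check is sketched below. It is an assumption-laden stand-in for the AST-based checker: it only looks for an earlier access() call on the same variable, does not verify the surrounding if statement or its return, and requires the file name to be a plain identifier.

```python
import re

FOPEN = re.compile(r'\bfopen\s*\(\s*(\w+)\s*,\s*"([rw])[^"]*"\s*\)')
ACCESS = re.compile(r"\baccess\s*\(\s*(\w+)\s*,")


def check_fopen(source):
    """Warn about fopen() calls not preceded by an access() check on the same name."""
    warnings = []
    guarded = set()
    for lineno, line in enumerate(source.splitlines(), start=1):
        guarded.update(ACCESS.findall(line))
        for name, mode in FOPEN.findall(line):
            if name not in guarded:
                reason = "file may not exist" if mode == "r" else "file may be overwritten"
                warnings.append((lineno, name, reason))
    return warnings


sample = (
    "if (access(in_path, F_OK) != 0) return 1;\n"
    'FILE *in = fopen(in_path, "r");\n'
    'FILE *out = fopen(out_path, "w");\n'
)
print(check_fopen(sample))
```

The read of in_path is guarded by the access() test and passes; the unguarded write to out_path produces an overwrite warning.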


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string followed by the arguments that the format string mentions. Errors may occur in three cases:
1) The number of format specifiers in the format string is not equal to the number of arguments present.
2) There is no format string.
3) A format specifier and the corresponding argument indicate two different types.

Algorithm:
1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time look at each IASTFunctionCallExpression. If the called function is printf or scanf, which can be known from its first node, note its total number of nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on the string form of this node, excluding its first and last characters (the quotes). The number of substrings matching the java.util.regex.Pattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted. If it is equal to (x-2), the call is okay; otherwise it is an error.
3) To check the third case, every format specifier must be compared with the type of the corresponding variable.
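The counting step (case 1) can be sketched as follows. The regular expression here is an assumption written for this illustration, covering optional flags, width, precision and length modifier followed by a conversion character; it is not the exact pattern used in the checker, and the function names are hypothetical.

```python
import re

# Assumed pattern: % flags width .precision length conversion
SPEC = re.compile(r"%[-+ 0#]*\d*(?:\.\d+)?[hlL]?[cdieEfgGosuxXp]")


def count_specifiers(fmt):
    """Count conversion specifiers, ignoring the literal '%%' escape."""
    return len(SPEC.findall(fmt.replace("%%", "")))


def check_call(fmt, args):
    """Compare the number of specifiers with the number of arguments supplied."""
    expected = count_specifiers(fmt)
    if expected != len(args):
        return f"error: {expected} format specifier(s) but {len(args)} argument(s)"
    return "ok"


print(check_call("%d items cost %5.2f (%s) 100%%", ["count", "price", "unit"]))
print(check_call("%d %s", ["count"]))
```

In terms of the algorithm above, len(args) plays the role of x-2: the total node count minus the function name and the format string.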


6. Future Work Scope

1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access when needed. This may help in many problems, e.g. checking that all the possible values of a switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
   i) using a variable without initialization
   ii) using a variable after freeing its space
3) Building static analysis checkers for other languages.

7. Limitations of the Project

1) To build static analysers there needs to be a platform; otherwise static analysis tool builders have to build their own suitable compiler.
2) A checker platform is needed through which we can visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding the CFG, e.g. that a file or database that is opened should be closed once and only once along each flow path of the CFG from start to end.

8. Conclusion

Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool is not enough to find out whether an algorithm can successfully handle a problem. The only way to do this is dynamic analysis.

9. References

1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J. T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT

27

CERTIFICATE OF COMPLETION This is to certify that Mr Krishnendu Saha has successfully completed Summer Project on ldquoStatic Analysis in Software Security under the guidance of Dr V Radha IDRBT The Duration of this Project was from May 1 2013 to June 30 2013 Dr V Radha Institute for Development and Research in Banking Technology (Guide)

2

Abstract

Security is sometimes considered as perimeter security ie restricting attackers from reaching deep inside our enterprise But to be totally secure software must be without any weakness that may go wrong even under some internal causes So security should be concerned through out al the process of software development That is where the utility of static analysis tools come They can find out vulnerability just by looking at the source code at the time of coding itself thus saving software testing time as much less vulnerable code In this project I have built some code checker that works as plugins of Eclipse IDE for CC++ language Though C is a highly used language many of its library functions are vulnerable

3

CONTENTS Topics Page 1 Introduction 4 11 Static Analysis In context of Software Security 5

2 Static Analysis8

21 Definition 8 22 Working Procedure 8 221 Build Model10 222 Perform Analysis helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip13 223 Present Results helliphelliphelliphelliphelliphelliphelliphelliphelliphellip16

3 Ways of Implementing Static Analysis Tools 16

31 My Static Analysis Tool Implementationhelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip16

4 Implementation Tools17

41 Hardware Details17 42 Software Details 17 421Eclipse IDEhelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip18 422Codan 18 423 PDEhelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip18

5Implemention And Resultshelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip20

6Future Work Scope helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip27

7Limitations helliphellip helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip27

8Conclusions helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip28 8Referenceshelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip28

4

1Introduction In this digital age we use software to in every phase of our life whether it be our day to day articles or satellites in outer space Softwares automate totally or partially the things we use So a small mistake may lead to a huge apocalypse Hence softwares should be reliable for our own safety Also there are bunch of people ( hackers) who tries to jeopardize the system Cyber threat is a matter of huge concern these days Software security is the practice of building software to be secure and function properly under malicious attack The traditional way of making software less vulnerable is to test it with different sets of inputs thus finding out areas of weaknesses But if we can apply our knowledge about common vulnerabilities at the time of building it then the huge cost effort and time can be saved

And here comes the importance the importance of static analysis After all an attacker becomes successful if there is weakness in code If the vulnerable points are reduced then we may demand our software to be much more fail proof

5

11 Static Analysis In context of

Software Security Software security means

working of software correctly (giving correct outputs) under all possible situations even under malicious attacks (ie intentionally trying to find software weaknesses and exploit them ) Software Security is sometimes thought as security features cryptographic ciphers passwords and access control mecha- nisms But For a program to be secure all portions of the program must be secure not just the bits that explicitly address security In many cases security failings are not related to security features at all In conventional and mostly used way software security is considered in test and field phases of software building But those are actually effort to make up coding malpractices

The root to security issues lies in coding malpractices and using vulnerable Library functions and API lsquos So security issues must be considered during coding with faulty library functions and in the early stages of software development

Dynamic Analysis Firewall

Virus Scanner Penetration Detection Intrusion Detection

6

The root to security issues lies in coding malpractices and using vulnerable Library functions and API lsquos So security issues must be considered during coding with faulty library functions and in the early stages of software development

It is easier to fix the problems in the development stages as they are simple But in testing phase if some bugs appears then it may require to recheck the whole programme again

Static Analysis

Architectural risk Analysis

Security requirements

7

2Static Analysis 21 Definition Static analysis is analysing the source

code of software without executing it

22 Working Procedure It is divided in four steps

1Build Model

2 Perform Analysis Performing analysis needs another basic step of gathering security knowledge 21Security Knowledge 3Present Results

8

1)Lexed Tokens 2)Parse Tree 3)Abstract Syntax Tree 4)Control Flow Graph 5)Data Flow Diagram

1)Lexical analysis 2)Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4)Data Flow Diagram Analysis 5)Taint Analysis 6)Value Range Propagation

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb Project (httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project) 3) SAMATE group at NIST ( httpsamatenistgov )

1)error (severe threat) 2)warning (may or may not be a security bug but obeying it is good practice) 3)info(good coding practice but no threat

9

2.2.1 Build Model

For an analysis tool to understand the code, the code must be represented by data structures that most closely capture the property to be analysed. These basic data structures are lexer tokens, the parse tree, the abstract syntax tree (AST), the control flow graph (CFG) and the data flow diagram (DFD). These models are built by compilers, by static analysis tools, or by both; static analysis tools often borrow them from compilers.

Models Used in Analysis

Lexed Tokens

The source code is converted into a token stream, discarding unimportant whitespace and comments. E.g.

Source Code:

    if (ret) mat[x][y] = END_VAL;

This code produces the following sequence of tokens.

Lexer Output:

    IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI

Some tokens need one extra property, such as the name for an identifier (ID). This token stream is subsequently used to build the parse tree.
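The lexing step above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name `lex_tokens` and its output format are hypothetical, and it recognises just the constructs that occur in the example (`if`, identifiers and the punctuation `( ) [ ] = ;`). A real compiler lexer handles far more.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a real compiler lexer: writes the token
 * stream for `src` into `out` (capacity `cap`, must be > 0) and returns
 * the number of tokens. Only `if`, identifiers and ( ) [ ] = ; are
 * recognised; whitespace is discarded. */
int lex_tokens(const char *src, char *out, size_t cap)
{
    int count = 0;
    size_t used = 0;

    out[0] = '\0';
    while (*src != '\0') {
        char tok[64];

        if (isspace((unsigned char)*src)) {
            src++;
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            char name[48];
            size_t n = 0;

            while ((isalnum((unsigned char)*src) || *src == '_') &&
                   n < sizeof name - 1)
                name[n++] = *src++;
            name[n] = '\0';
            if (strcmp(name, "if") == 0)
                snprintf(tok, sizeof tok, "IF");
            else
                snprintf(tok, sizeof tok, "ID(%s)", name);
        } else {
            const char *kind;

            switch (*src++) {
            case '(': kind = "LPAREN";   break;
            case ')': kind = "RPAREN";   break;
            case '[': kind = "LBRACKET"; break;
            case ']': kind = "RBRACKET"; break;
            case '=': kind = "EQUAL";    break;
            case ';': kind = "SEMI";     break;
            default:  kind = "UNKNOWN";  break;
            }
            snprintf(tok, sizeof tok, "%s", kind);
        }
        /* Append the token, separated from the previous one by a space. */
        int written = snprintf(out + used, cap - used, "%s%s",
                               count > 0 ? " " : "", tok);
        if (written < 0 || (size_t)written >= cap - used) {
            out[used] = '\0';   /* drop a token that did not fit */
            break;
        }
        used += (size_t)written;
        count++;
    }
    return count;
}
```

Calling `lex_tokens("if (ret) mat[x][y] = END_VAL;", buf, sizeof buf)` with a sufficiently large `buf` yields the 14 tokens listed in the lexer output above.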

Parse Tree

A language parser uses a context-free grammar to match the token stream. The grammar consists of a set of productions that describe the symbols (elements) in the language. The parser performs a derivation by matching the token stream against the production rules. If each symbol is connected to the symbol from which it was derived, a parse tree is formed.

Control Flow Graph

It is a graphical way of representing all the possible ways in which the flow of the programme may proceed. Each node in the CFG represents a basic block, which has no branching or looping inside it. The CFG gives the cyclomatic complexity, which counts the independent paths through the code and so gives a direct indication of how error-prone the code is. During dynamic analysis it also helps us derive exhaustive sets of test cases.

Source Code:

    if (a > b) {
        nConsec = 0;
    } else {
        s1 = getHexChar(1);
        s2 = getHexChar(2);
    }
    return nConsec;

[Figure: parser output (parse tree)]

[Figure: CFG builder output]
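To make the cyclomatic complexity mentioned above concrete, the following sketch applies the standard formula M = E - N + 2 (edges, nodes, one connected component) to the CFG of this example. The `cfg_edge` type, the numbering of the basic blocks and the `EXAMPLE_CFG` table are assumptions made for the illustration; they are not output of any real CFG builder.

```c
#include <stddef.h>

/* Hypothetical edge type: nodes are basic blocks numbered from 0. */
struct cfg_edge { int from, to; };

/* Counts the nodes, assuming node ids run from 0 to the largest id
 * that appears in any edge. */
static size_t count_nodes(const struct cfg_edge *edges, size_t num_edges)
{
    int max_id = -1;

    for (size_t i = 0; i < num_edges; i++) {
        if (edges[i].from > max_id) max_id = edges[i].from;
        if (edges[i].to > max_id)   max_id = edges[i].to;
    }
    return (size_t)(max_id + 1);
}

/* M = E - N + 2 for a CFG that forms a single connected component. */
int cyclomatic_complexity(const struct cfg_edge *edges, size_t num_edges)
{
    return (int)num_edges - (int)count_nodes(edges, num_edges) + 2;
}

/* CFG of the example: 0 = if (a > b), 1 = nConsec = 0,
 * 2 = the two getHexChar() assignments, 3 = return nConsec. */
const struct cfg_edge EXAMPLE_CFG[4] = {
    {0, 1}, {0, 2}, {1, 3}, {2, 3}
};
```

For `EXAMPLE_CFG` the function returns 4 - 4 + 2 = 2, matching the two paths through the if/else.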

Data Flow Diagram

A Data Flow Diagram shows all the possible paths of data input and output, and of data transfer between different entities within the software. It thus gives us the points where data should be validated and where trust boundaries should be set up.

[Figure: CFG with the nodes if (a > b); nConsec = 0; s1 = getHexChar(), s2 = getHexChar(); return nConsec]

2.2.2 Perform Analysis

Analysis is performed on the tokens or on the nodes of the trees or graphs.

Lexical Analysis

The simplest of all analysis techniques. It helps in checking syntactical errors and in most cases uses regular pattern matching. It is not useful for much more than detecting wrong identifier names or function names. Tools using lexical analysis techniques are ITS4, RATS and Flawfinder.
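A minimal sketch of this kind of name matching is given below. It is not how ITS4, RATS or Flawfinder are implemented; the rule list `RISKY` and the function `count_risky_calls` are hypothetical, and the matching is purely textual, which is exactly why such tools cannot tell whether a flagged call is actually dangerous.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule list: names a lexical scanner might flag. */
static const char *const RISKY[] = { "gets", "strcpy", "sprintf" };

/* Counts textual occurrences of a risky name used as a call, i.e. the
 * name starts at a word boundary and is followed (after optional
 * blanks) by an opening parenthesis. */
int count_risky_calls(const char *src)
{
    int hits = 0;

    for (size_t r = 0; r < sizeof RISKY / sizeof RISKY[0]; r++) {
        size_t len = strlen(RISKY[r]);

        for (const char *p = strstr(src, RISKY[r]); p != NULL;
             p = strstr(p + 1, RISKY[r])) {
            int starts_word = (p == src) ||
                !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
            const char *q = p + len;

            while (*q == ' ' || *q == '\t')
                q++;
            if (starts_word && *q == '(')
                hits++;
        }
    }
    return hits;
}
```

With this sketch, `strcpy(a, b)` and `gets (s)` are counted, while `my_strcpy(a, b)`, `fgets(...)` and a variable named `strcpy_count` are not.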

Parse Tree and AST Analysis

These representations help us understand the semantics of the program. So they help in finding deviations from the rules of grammar and from security rules, such as that an if block should start and end with curly brackets. Most modern compilers do this kind of checking, and a violation is reported as a parse error. Codan, the code analysis platform in CDT (the C/C++ Development Tools Eclipse plugin), uses the AST to build checkers. Similar platforms, PMD and Crystal (Eclipse plugins), use the AST for detecting errors in Java.

Control Flow Graph Analysis

Although the AST and parse tree appear useful enough for detecting most rule violations, they fail for rules that concern branching in code. As an example, an opened file or database should be closed once and only once along a control flow of the programme. The CFG is analysed in a number of stages, starting from a basic block, then a procedure (method or function), and then a bigger module such as a class. Tools: Fortify Source Code Analyser, Klocwork.

Data Flow Diagram Analysis

A Data Flow Diagram (DFD) with security-specific annotations is used to describe how data enters, leaves and traverses the system. It shows data sources and destinations, the relevant processes that data goes through, and the trust boundaries in the system. A DFD has a fixed set of component types: Process, HighLevelProcess, DataStore and ExternalInteractor. A Process is the concern of the DFD. A high-level program is represented by hierarchical, multi-stage DFDs. A DataStore may be a database, a file or the Registry. An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point, typically a human. The data flows are represented by arrows. A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary, to describe locations where a privilege impersonation on the part of an adversary could occur, where a machine or process boundary may be crossed, etc.

Taint Analysis

The concept of tainting refers to marking data coming from an untrusted source as "tainted" and propagating its status to all locations where the data is used. A security policy specifies what uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy is an indication of a vulnerability. Tainted data should not be used in any function which modifies files, directories and processes, or executes external programs. If the rule is violated then the program should be aborted.

1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.

    1 unsigned int n;
    2 char src[10], dst[10];
    3 n = read_int();
    4 if (n <= sizeof(dst))
    5     memcpy(src, dst, n);  /* n is <= sizeof(dst) */
    6 else
    7     memcpy(src, dst, n);  /* n is > sizeof(dst) */

Using taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas line 5 is actually a false positive: on that path n cannot exceed sizeof(dst). So taint analysis has a high chance of producing false positives.
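The five steps can be sketched on a toy program representation. Everything in the block below is an assumption made for illustration: variables are small integer ids, a statement is reduced to "dst is computed from a and b", and the names `assign`, `propagate_taint` and `count_tainted_sinks` are hypothetical. It is not the data structure of any particular tool.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16   /* variable ids must be in 0 .. MAX_VARS-1 */

/* One statement of a toy program: variable `dst` is computed from
 * variables `a` and `b` (use -1 for "no operand"). */
struct assign { int dst, a, b; };

/* Steps 1-4: mark only the source variables as tainted, then propagate
 * through the assignments until nothing changes (fixed point). */
void propagate_taint(const struct assign *prog, size_t nprog,
                     const int *sources, size_t nsources,
                     bool tainted[MAX_VARS])
{
    bool changed = true;

    for (int v = 0; v < MAX_VARS; v++)          /* step 1 */
        tainted[v] = false;
    for (size_t i = 0; i < nsources; i++)       /* step 2 */
        tainted[sources[i]] = true;

    while (changed) {                           /* step 4 */
        changed = false;
        for (size_t i = 0; i < nprog; i++) {    /* step 3 */
            const struct assign *s = &prog[i];
            bool uses_taint = (s->a >= 0 && tainted[s->a]) ||
                              (s->b >= 0 && tainted[s->b]);

            if (uses_taint && !tainted[s->dst]) {
                tainted[s->dst] = true;
                changed = true;
            }
        }
    }
}

/* Step 5: counts calls to vulnerable functions whose (single) checked
 * argument is tainted. `sink_args[i]` is the variable id passed to the
 * i-th such call. */
size_t count_tainted_sinks(const int *sink_args, size_t nsinks,
                           const bool tainted[MAX_VARS])
{
    size_t reports = 0;

    for (size_t i = 0; i < nsinks; i++)
        if (tainted[sink_args[i]])
            reports++;
    return reports;
}
```

In the listing above, n is the only source and both memcpy() calls take n, so a run with `sources = {n}` and `sink_args = {n, n}` reports two vulnerabilities. The branch condition on line 4 plays no part in the result, which is how the false positive on line 5 arises.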

Value Range Propagation

In this case each tainted variable also carries the range of its possible values. If a vulnerable function uses that variable, the tool can check whether the value range of the variable still makes the call vulnerable. Thus we may avoid some false positives.

    1 unsigned int n;
    2 char src[10], dst[10];
    3 n = read_int();
    4 if (n <= sizeof(dst))
    5     memcpy(src, dst, n);  /* n is <= sizeof(dst) */
    6 else
    7     memcpy(src, dst, n);  /* n is > sizeof(dst) */
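A minimal sketch of the idea for this listing follows. The `range` type and the helper names are hypothetical; the sketch only narrows a range on the two sides of an `n <= bound` test and ignores everything else a real value range analysis must handle (arithmetic, loops, empty ranges for unreachable branches).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical closed interval [lo, hi] of values a variable may hold. */
struct range { unsigned int lo, hi; };

/* Range of n on the branch where (n <= bound) is true. */
struct range narrow_le(struct range r, unsigned int bound)
{
    if (r.hi > bound)
        r.hi = bound;
    return r;
}

/* Range of n on the branch where (n <= bound) is false, i.e. n > bound.
 * Assumes bound < UINT_MAX so that bound + 1 does not wrap. */
struct range narrow_gt(struct range r, unsigned int bound)
{
    if (r.lo <= bound)
        r.lo = bound + 1;
    return r;
}

/* A copy of n bytes into a buffer of buf_size bytes may overflow only
 * if some value in the range exceeds the buffer size. */
bool copy_may_overflow(struct range n, unsigned int buf_size)
{
    return n.hi > buf_size;
}
```

Starting from the full range [0, UINT_MAX] for the tainted n, the range on line 5 becomes [0, 10] and the copy is not reported, while on line 7 it becomes [11, UINT_MAX] and the copy is reported. This removes the false positive that plain taint analysis produced.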

2.2.2.1 Security Knowledge

The main logic behind these tools is to learn from past mistakes and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)

2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)

3) SAMATE group at NIST (http://samate.nist.gov)

In these collections of numerous errors there are patterns and recurring errors, so the most generic problems can be categorized.

2.2.3 Presenting and Processing Results

The security vulnerabilities are reviewed manually and then fixed. In some cases the analyser itself suggests a solution for the problem. The problems are reported in different categories: error (severe threat), warning (may or may not be a security bug, but obeying it is good practice) and info (good coding practice, but no threat).

3. Ways of Implementing Static Analysis Tools

Application of Static Analysis Tools in the Practical World

I. Integration with compilers: Static analysis tools are part and parcel of modern compilers. They do all the basic checking, such as type checking, style checking, parse errors, identifiers never used and many others. But this needs compilation of the whole programme.

E.g. the gcc compiler (C language)

II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins to show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes.

E.g. Eclipse, NetBeans

III. Stand-alone platforms: These tools are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses.

E.g. Fortify Source Code Analyser, Klocwork, Ounce

3.1 My Static Analysis Tool Implementation

I have built static analysis tools for an IDE (Integrated Development Environment). The IDE chosen is Eclipse, one of the platforms most used by software developers. These tools are integrated into Eclipse as plugins and run in the background to find errors in code.

4. Implementation Tools

4.1 Hardware Details

Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS

4.2 Software Details

Operating System: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE

Eclipse is one of the most used IDEs for Java. But it also provides tools to build software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)

Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis of the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences), where:

• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.

• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)

To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen from reference 4. The basic steps of plugin development are:

1. First go to File > New Project > Plugin Project.

2. Then edit MANIFEST.MF in META-INF:

• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right-click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right-click > New > problem, and give there the message to be shown when the error occurs, whether it is enabled by default, etc.
• On the Overview page, in the Exporting part, click Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• Finally, in the Export Wizard portion, under Archive file, give the name of your plugin.
• This zip file may now be placed in the Eclipse plugins folder to make it permanent in Codan.

3. To test the plugin, run it; another Eclipse window opens. Write faulty code that your checker is supposed to catch. You can see the error and its messages in the editor.

5. Implementation and Results

The C language (and so also C++) has a lot of vulnerable library functions which may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the values of the arguments passed pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to give a value of n greater than or equal to the size allocated for the destination (dst), so this must be checked whenever this vulnerable function is used. It is an example of buffer overflow.

Algorithm Used

1) First find a call to strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If so, get the string value of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.

2) Now we need the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated.

3) The last step is to compare the allocated space of the destination character pointer with the value of n.
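The comparison in step 3 can be sketched in C, working on the strings that steps 1 and 2 extract from the AST. This is not the plugin's code (the plugin is an Eclipse/Codan checker); the helper names `declared_size` and `strncpy_flagged` are hypothetical, and the sketch assumes the destination is declared as an array with a literal length and that n is a decimal literal. In every other case it gives no verdict.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the array length from declarator text such as "dst[10]".
 * Returns false when there is no literal length, e.g. for "*dst". */
bool declared_size(const char *declarator, unsigned long *size)
{
    const char *open = strchr(declarator, '[');
    char *end;

    if (open == NULL)
        return false;
    *size = strtoul(open + 1, &end, 10);
    return end != open + 1 && *end == ']';
}

/* Step 3: a call strncpy(dst, src, n) is flagged when n is greater
 * than or equal to the size allocated for dst. Returns false when the
 * size or n cannot be determined from the text alone. */
bool strncpy_flagged(const char *dst_declarator, const char *n_text)
{
    unsigned long size, n;
    char *end;

    if (!declared_size(dst_declarator, &size))
        return false;               /* unknown size: cannot decide */
    n = strtoul(n_text, &end, 10);
    if (end == n_text || *end != '\0')
        return false;               /* n is not a plain literal */
    return n >= size;
}
```

For a destination declared as `dst[10]`, the call is flagged for n = 10 and n = 20 but not for n = 9.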

[Figure: source code and its AST]

Limitations and Inefficiency of the Checker

1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by accessing the nodes at the proper position. But this method is inappropriate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(stream, "r") and fopen(stream, "w")

When a file is opened in "r" mode, the file should already be present at the given stream, and if the file is opened in write mode then we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode to check the above conditions.

Algorithm Used

1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all their nodes, get the name of the stream and the mode of opening.

2) Then search all ICPPASTIfStatements to see whether there is a suitable function call checking the conditions given above. In the read case this is an access() call, which should be in an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
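The guarded pattern that this checker looks for might look like the following on the C side. This is a sketch under assumptions: it uses the POSIX access() function from <unistd.h> with the F_OK flag, the wrapper names `open_for_read` and `open_for_write_no_clobber` are hypothetical, and it writes the test as an explicit comparison rather than the negated form described in the algorithm.

```c
#include <stdio.h>
#include <unistd.h>   /* access(), F_OK (POSIX) */

/* Read case: return early when the file does not exist, so fopen()
 * in "r" mode is only reached for an existing file. */
FILE *open_for_read(const char *path)
{
    if (access(path, F_OK) != 0)
        return NULL;
    return fopen(path, "r");
}

/* Write case: return early when the file already exists, so an
 * existing file is never silently overwritten by "w" mode. */
FILE *open_for_write_no_clobber(const char *path)
{
    if (access(path, F_OK) == 0)
        return NULL;
    return fopen(path, "w");
}
```

Note that a check followed by a separate open is not atomic: the file can appear or disappear between the access() call and the fopen() call, so the guard documents intent and catches the common case rather than giving a hard guarantee.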


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string and the arguments named in the format string. An error may occur here in three cases:

1) If the number of format specifiers in the format string is not equal to the number of arguments present
2) If there is no format string
3) If a format specifier and the corresponding argument indicate two different types

Algorithm

1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time look at each IASTFunctionCallExpression. If the called function is printf or scanf, which is known from its first node, then note its total number of nodes (let it be x).

2) The 2nd node is the format string. A regular expression analysis is done on this node (the string form of this node excluding the first and last character): the number of substrings matching the java.util.regex.Pattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted, and if it is equal to (x-2) then it is okay, otherwise it is an error.

3) To check the third case we have to compare every format specifier with the type of the corresponding variable.
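The counting in step 2 can also be sketched without a regular expression. The block below is a hand-written scanner in C, not the plugin's Java pattern; the function names are hypothetical. It accepts optional flags, width, precision and a length modifier (h, l, L) followed by one of the conversion characters listed above, treats "%%" as a literal percent sign, and does not handle a `*` width or precision, which would consume an extra argument.

```c
#include <stdbool.h>
#include <string.h>

/* Counts the conversion specifications in `fmt` that consume an
 * argument. Simplified: flags, width, .precision, h/l/L, conversion. */
int count_format_args(const char *fmt)
{
    int count = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                              /* literal "%%" */
        while (*p != '\0' && strchr("-+ 0#", *p))  /* flags */
            p++;
        while (*p >= '0' && *p <= '9')             /* width */
            p++;
        if (*p == '.') {                           /* precision */
            p++;
            while (*p >= '0' && *p <= '9')
                p++;
        }
        while (*p != '\0' && strchr("hlL", *p))    /* length modifier */
            p++;
        if (*p != '\0' && strchr("cdieEfgGosuxXp", *p))
            count++;
        if (*p == '\0')
            break;                                 /* dangling '%' */
    }
    return count;
}

/* The rule from step 2: with x nodes in the call (function name,
 * format string, then the arguments), the count must equal x - 2. */
bool format_call_ok(const char *fmt, int total_nodes)
{
    return count_format_args(fmt) == total_nodes - 2;
}
```

For example, the format string `"%d %s\n"` needs two arguments, so a printf call with four nodes passes the check and one with three nodes is reported.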


6. Future Work Scope

1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help with many problems, e.g. checking that all the possible values of the switch argument variable are covered by the cases.

2) Building checkers for other problems, e.g.

i) using a variable without initialization
ii) using a variable after freeing its space

3) Building static analysis checkers for other languages

7. Limitations of the Project

1) To build static analysers there needs to be a platform; otherwise static analysis tool builders have to build their own suitable compiler.

2) A checker platform is needed by which we can visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding the CFG, e.g. a file or database that is opened should be closed once and only once on every flow path of the CFG from start to end.

8. Conclusion

Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool is not enough to find out whether an algorithm can successfully handle a problem. The only way to do this is dynamic analysis.

9. References

1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan, a C/C++ Static Analysis Framework for CDT

27

Abstract

Security is sometimes considered as perimeter security ie restricting attackers from reaching deep inside our enterprise But to be totally secure software must be without any weakness that may go wrong even under some internal causes So security should be concerned through out al the process of software development That is where the utility of static analysis tools come They can find out vulnerability just by looking at the source code at the time of coding itself thus saving software testing time as much less vulnerable code In this project I have built some code checker that works as plugins of Eclipse IDE for CC++ language Though C is a highly used language many of its library functions are vulnerable

3

CONTENTS Topics Page 1 Introduction 4 11 Static Analysis In context of Software Security 5

2 Static Analysis8

21 Definition 8 22 Working Procedure 8 221 Build Model10 222 Perform Analysis helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip13 223 Present Results helliphelliphelliphelliphelliphelliphelliphelliphelliphellip16

3 Ways of Implementing Static Analysis Tools 16

31 My Static Analysis Tool Implementationhelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip16

4 Implementation Tools17

41 Hardware Details17 42 Software Details 17 421Eclipse IDEhelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip18 422Codan 18 423 PDEhelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip18

5Implemention And Resultshelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip20

6Future Work Scope helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip27

7Limitations helliphellip helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip27

8Conclusions helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip28 8Referenceshelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip28

4

1Introduction In this digital age we use software to in every phase of our life whether it be our day to day articles or satellites in outer space Softwares automate totally or partially the things we use So a small mistake may lead to a huge apocalypse Hence softwares should be reliable for our own safety Also there are bunch of people ( hackers) who tries to jeopardize the system Cyber threat is a matter of huge concern these days Software security is the practice of building software to be secure and function properly under malicious attack The traditional way of making software less vulnerable is to test it with different sets of inputs thus finding out areas of weaknesses But if we can apply our knowledge about common vulnerabilities at the time of building it then the huge cost effort and time can be saved

And here comes the importance the importance of static analysis After all an attacker becomes successful if there is weakness in code If the vulnerable points are reduced then we may demand our software to be much more fail proof

5

11 Static Analysis In context of

Software Security Software security means

working of software correctly (giving correct outputs) under all possible situations even under malicious attacks (ie intentionally trying to find software weaknesses and exploit them ) Software Security is sometimes thought as security features cryptographic ciphers passwords and access control mecha- nisms But For a program to be secure all portions of the program must be secure not just the bits that explicitly address security In many cases security failings are not related to security features at all In conventional and mostly used way software security is considered in test and field phases of software building But those are actually effort to make up coding malpractices

The root to security issues lies in coding malpractices and using vulnerable Library functions and API lsquos So security issues must be considered during coding with faulty library functions and in the early stages of software development

Dynamic Analysis Firewall

Virus Scanner Penetration Detection Intrusion Detection

6

The root to security issues lies in coding malpractices and using vulnerable Library functions and API lsquos So security issues must be considered during coding with faulty library functions and in the early stages of software development

It is easier to fix the problems in the development stages as they are simple But in testing phase if some bugs appears then it may require to recheck the whole programme again

Static Analysis

Architectural risk Analysis

Security requirements

7

2Static Analysis 21 Definition Static analysis is analysing the source

code of software without executing it

22 Working Procedure It is divided in four steps

1Build Model

2 Perform Analysis Performing analysis needs another basic step of gathering security knowledge 21Security Knowledge 3Present Results

8

1)Lexed Tokens 2)Parse Tree 3)Abstract Syntax Tree 4)Control Flow Graph 5)Data Flow Diagram

1)Lexical analysis 2)Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4)Data Flow Diagram Analysis 5)Taint Analysis 6)Value Range Propagation

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb Project (httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project) 3) SAMATE group at NIST ( httpsamatenistgov )

1)error (severe threat) 2)warning (may or may not be a security bug but obeying it is good practice) 3)info(good coding practice but no threat

9

221 Build model In analysis to understand the code by

analysis tools it needs to be represented by data structures that most nearly represents the property to be analysed Those basic data structures are actually build by compilers and static analysis tools borrow them and Those data-structures are lexer tokens parse tree abstract syntax tree (AST) control flow graph(CFG) dataflow diagram(DFD) This models are build by compilers or static analysis tools or by both

Models Used in Analysis

Lexed Tokens The source code converted into a token stream

discarding unimportant whitespaces and comments Eg Source Code if (ret) mat[x][y] = END_VAL This code produces the following sequence of tokens Lexer Output IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI Some of the token needs extra one property like name for identifier(ID) These token stream is subsequently used in making parse tree

Parse Tree A language parser uses a context-free grammar (CFG) to match

the token stream The grammar consists of a set of productions that describe the symbols (elements) in the language The parser performs a derivation by matching the token stream against the production rules If each symbol is connected to the symbol from which it was derived a parse tree is formed

10

Control Flow Graph It is the graphical way of representing all

possible way the flow of programme may occur Each node in CFG represents a basic block that has no branching or looping CFG gives the idea of Cyclomatic complexity that directly shows the no of possibility of errors During dynamic analysis it also helps us to get exhaustive sets of test cases

Source Code if (a gt b) nConsec = 0 else s1 = getHexChar(1) s2 = getHexChar(2) return nConsec

11

Parser Output Parse Tree

CFG Builder output

Data Flow Diagram Data Flow Diagram shows all the possible path of

data in put and output and data transfer between different entities within the software thus giving us the points where data should be validated and setting up trust boundaries

12

If(agtb)

nConsec=0 s1 = getHexChar() s2= getHexChar()

return nConsec

222 Perform Analysis Analysis is performed on the tokens or nodes of tree

or graphs Lexical analysis Simplest of all analysis techniques helps in

checking syntactical errors and it uses in most cases regular pattern matching Not much useful than detecting wrong identifier names or function names Tools using lexical analysis techniques are ITS4 RATS and Flawfinder

Parse Tree and AST Analysis These representations helps us

understanding of semantics of the program So helps us in finding deviation of rules of grammar and security rules like a if block should start and end with curly bracket Most modern compilers does these kind of checking and violation comes as a parse error Codan the code analysis platform in CDT(CC+ Development Tools eclipse plugin ) uses AST to built checkers Similar Platform PMD Crystal (eclipse plugins) uses AST for detecting errors in Java

Control Flow Graph Analysis AST and parse trees though

appear useful enough in detecting most of rule violations they fail in case of rules that apply for branching in code As example we may take that opened file or database should be closed once and only once in a flow control of a programme CFG is analysed in a number of stages starting from basic block and then a procedure( method or function) and then to a bigger module like class Fortify Source Code Analyser Klockwork

Data Flow Diagram Analysis A Data Flow Diagram (DFD) with

security-specific annotations is used to describe how data enters leaves and traverses the system it shows data sources and destinations relevant processes that data goes through and trust boundaries in the system A DFD has a fixed set of component types Process HighLevelProcess Data Store and External Interactor A process is concern of DFD diagram A High Level program is represented by hierarchical multistage DFDs A datastore may be a database a file or the Registry An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point typically a human The data flows are represented by arrows A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary to describe locations where a privilege impersonation on the part of an adversary could occur a machine or process boundary may be crossed etc

13

Taint AnalysisThe concept of tainting refers to marking data coming

from an untrusted source as ldquotaintedrdquo and propagating its status to all locations where the data is used A security policy specifies what uses of untrusted data are allowed or restricted An attempt to use tainted data is a violation of this policy is an indication of a vulnerability Tainted data should not be used in any function which modifies files directories and processes or executes external programs If the rule is violated then the program should be aborted

1 Initialize all variables as NOT TAINTED 2 Find all calls to functions that read data from an untrusted source Mark the values returned by these functions as TAINTED 3 Propagate the tainted values through the program If a tainted value is used in an expression mark the result of the expression as TAINTED 4 Repeat step 3 until a fixed point is reached 5 Find all calls to potentially vulnerable functions If one of their arguments is tainted report this as a vulnerability 1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

Using Taint analysis memcpy() of both line no 5 and 7 will be marked as

vulnerability where as in actual case 5 is a false positive So taint analysis has a high possibility of giving false positive

Value Range Propagation In this case the tainted variables should

also carry a range of its possible values If some vulnerable function uses that variable then it may be checked that the value range of that variable still makes the function vulnerable or not Thus we may avoid some false positives

14

1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

2221Security Knowledge The main logic behind these tools

is to learn from our past mistakes and attack due to weaknesses in code and prevent them from happening again There are many such collections of common mistakes done by programmers

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb project(httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project)

3) SAMATE group at NIST (httpsamatenistgov)

In these collection of numerous errors there are patterns and repetitive errors and so the most generic problems may be categorized

15

223Presenting And Processing Results The security vulnerabilities are reviewed manually and those are fixed In some cases the analysers itself give some solution for the problem The problems are given in different categories like error (severe threat) warning (may or may not be a security bug but obeying it is good practice) info(good coding practice but no threat )

3Ways of Implementing Static Analysis Tools Application of Static Analysis Tools in Practical World

I Integration with compilers Static analysis tools are part and parcel of modern compilers They does all the basic checking like type checking style checking parse errors identifiers never used many others But it needs compilation of whole programme

Eg- gcc compiler (c language) II Integration with IDEs IDEs (Integrated Development Environment )

use vulnerability checkers as add-ons or plugins to show the errors in coding in the editors while writing This is the most useful form of static analysis tools as it not only shows error but also gives quick fixes Eg- Eclipse Netbeans

III Stand Alone Platforms This kind of tools are generally the most sophisticated ones and detects most complicated problems They are exclusively built for detecting software weaknesses

Eg- Fortify Source code Analyser Klockwork Ounce

31 My Static analysis Tools implementation I have built static analysis tools for an IDE (Integrated Development Environment) The IDE chosen is Eclipse one of the most used platform by software developers This tools are integrated as plugins to eclipse and runs in the back end to find error in code


4. Implementation Tools

4.1 Hardware Details
Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS

4.2 Software Details
Operating system: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE
Eclipse is one of the most widely used IDEs for Java, but it also provides tools for building software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)
Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis on the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences), in which:

• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.


• How to view the AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)
To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen from reference 4. The basic steps of plugin development are:

1. Go to File > New > Project > Plug-in Project.

2. Edit the MANIFEST.MF file in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (eg. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right click > New > problem. Set there the message that should be shown when the error occurs, whether it is enabled by default, etc.
• On the Overview page, in the Exporting part, click on Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• At last, in the Export Wizard portion, under Archive file, give the name of your plugin.
• The resulting zip archive may now be placed in the Eclipse plugins folder to make the checker permanent in Codan.

3. To test the plugin, run it; another Eclipse window opens. Write faulty code that your checker is supposed to catch. You can see the errors and messages in the editor.


5. Implementation and Results
The C language (and so also C++) has many vulnerable library functions which may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the argument values passed to them pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)
A value of n greater than the size allocated for the destination (dst) overflows the buffer, and a value equal to it can leave dst without a terminating null character. So the checker treats a value of n greater than or equal to the allocated size of dst as an error, and this must be checked whenever this vulnerable function is used. It is an example of buffer overflow.

Algorithm Used
1) First find a call to the function strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the String value of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.
2) Now we need the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated for it.
3) The last step is to compare the allocated space of the destination character pointer with the value of n.
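The three steps above can be sketched outside Eclipse in plain Java. This is only a simplified illustration under stated assumptions, not the code of the plugin: it replaces the CDT AST traversal with regular expressions over the source text, it only understands declarations of the form char name[size] and calls whose third argument is an integer literal, and the names StrncpySketch and findOverflows are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based stand-in for the AST-based strncpy checker.
public class StrncpySketch {
    // Step 2: declarations such as "char dst[10]" (group 1 = name, group 2 = size)
    private static final Pattern DECL =
            Pattern.compile("char\\s+(\\w+)\\s*\\[\\s*(\\d+)\\s*\\]");
    // Step 1: calls such as "strncpy(dst, src, 12)" (group 1 = destination, group 2 = n)
    private static final Pattern CALL =
            Pattern.compile("strncpy\\s*\\(\\s*(\\w+)\\s*,[^,]+,\\s*(\\d+)\\s*\\)");

    // Returns one message per call whose n is >= the size declared for its destination.
    public static List<String> findOverflows(String source) {
        Map<String, Integer> sizes = new HashMap<String, Integer>();
        Matcher d = DECL.matcher(source);
        while (d.find()) {
            sizes.put(d.group(1), Integer.parseInt(d.group(2)));
        }
        List<String> problems = new ArrayList<String>();
        Matcher c = CALL.matcher(source);
        while (c.find()) {
            Integer size = sizes.get(c.group(1));
            int n = Integer.parseInt(c.group(2));
            // Step 3: compare the allocated space with n
            if (size != null && n >= size) {
                problems.add("strncpy: n=" + n + " >= size of "
                        + c.group(1) + " (" + size + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String code = "char dst[10]; char src[20]; "
                + "strncpy(dst, src, 12); strncpy(src, dst, 5);";
        System.out.println(findOverflows(code));
    }
}
```

In this example only the first call is reported, because 12 is not smaller than the 10 bytes declared for dst. Destinations whose declaration is not found are silently skipped, which is exactly the weakness that a symbol table would address.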

[Figure: source code and its AST]

Limitations and Inefficiency of the Checker
1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by walking the nodes to the proper position. This method is inappropriate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(stream, "r") and fopen(stream, "w")
When a file is opened in "r" mode, the file named by stream should already be present, and when a file is opened in write mode, the programmer should be warned that an existing file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks these conditions.

Algorithm Used
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all their nodes, get the name of the stream and the mode of opening.
2) Then search all ICPPASTIfStatements to see whether there is a suitable function call that checks the conditions given above. In the read case this is an access() call, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
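The idea of this check can be sketched in plain Java as follows. It is a rough illustration under stated assumptions, not the plugin itself: the AST visit is replaced by regular expressions over the source text, a guard is accepted as soon as some earlier if statement calls access() on the same name and contains a return (the negation is not inspected), and the names FopenGuardSketch and findUnguarded are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based stand-in for the AST-based fopen checker.
public class FopenGuardSketch {
    // Step 1: fopen(name, "r") or fopen(name, "w"); group 1 = name, group 2 = mode
    private static final Pattern FOPEN =
            Pattern.compile("fopen\\s*\\(\\s*(\\w+)\\s*,\\s*\"([rw])\"\\s*\\)");

    // Returns "name:mode" for every fopen() call that has no access() guard before it.
    public static List<String> findUnguarded(String source) {
        List<String> unguarded = new ArrayList<String>();
        Matcher m = FOPEN.matcher(source);
        while (m.find()) {
            String name = m.group(1);
            // Step 2: an if statement that calls access(name, ...) and contains a return
            Pattern guard = Pattern.compile(
                    "if\\s*\\([^{;]*access\\s*\\(\\s*" + Pattern.quote(name)
                    + "\\b[^{;]*\\)\\s*\\{?[^}]*return");
            String before = source.substring(0, m.start());
            if (!guard.matcher(before).find()) {
                unguarded.add(name + ":" + m.group(2));
            }
        }
        return unguarded;
    }

    public static void main(String[] args) {
        String code = "if (access(path, F_OK) != 0) { return 1; } "
                + "FILE *f = fopen(path, \"r\"); "
                + "FILE *g = fopen(other, \"w\");";
        System.out.println(findUnguarded(code));
    }
}
```

Here the call on path is accepted because an access() check with a return precedes it, while the call on other is reported. A text search like this cannot tell whether the guard really dominates the call, which is why the checker works on the AST instead.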


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]
All these functions take a format string followed by the arguments mentioned in the format string. An error may occur here in three cases:
1) If the number of format specifiers in the format string is not equal to the number of arguments present.
2) If there is no format string.
3) If a format specifier and the corresponding argument indicate two different types.

Algorithm
1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time look at each IASTFunctionCallExpression. If the called function is printf or scanf, which may be known from its first node, note the total number of its nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on the string form of this node, excluding its first and last characters. The number of substrings matching a java.util.regex.Pattern for a format specifier (a % sign followed by optional flags [-+0], width and precision digits [0-9], an optional length modifier [hlL] and a conversion character [cdieEfgGosuxXp]) is counted. If it is equal to (x-2) the call is okay, otherwise it is an error.
3) To check the third case, every format specifier has to be compared with the type of the corresponding variable.
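Step 2 of this algorithm can be tried out with a small stand-alone Java class. The pattern below is an assumed variant written along the lines described above, not necessarily the exact expression used in the checker: it additionally skips "%%" (a literal percent sign) and does not handle a "*" width. The names FormatSpecSketch, countSpecifiers and argumentCountMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the format-specifier count used by the printf/scanf checker.
public class FormatSpecSketch {
    // "%%" is matched first so that it is consumed and not counted;
    // otherwise: % [flags] [width] [.precision] [length modifier] conversion
    private static final Pattern SPEC = Pattern.compile(
            "%%|%[-+ 0#]*[0-9]*(?:\\.[0-9]+)?[hlL]?[cdieEfgGosuxXp]");

    // Counts the format specifiers that consume an argument.
    public static int countSpecifiers(String format) {
        int count = 0;
        Matcher m = SPEC.matcher(format);
        while (m.find()) {
            if (!m.group().equals("%%")) {
                count++;
            }
        }
        return count;
    }

    // nodeCount is x in the text: function name + format string + arguments,
    // so the number of arguments is x - 2.
    public static boolean argumentCountMatches(String format, int nodeCount) {
        return countSpecifiers(format) == nodeCount - 2;
    }

    public static void main(String[] args) {
        // printf("%s is %d years old", name, age) has x = 4 nodes
        System.out.println(argumentCountMatches("%s is %d years old", 4));
        // printf("%s is %d years old", name) has x = 3 nodes
        System.out.println(argumentCountMatches("%s is %d years old", 3));
    }
}
```

The first call in main is consistent (two specifiers, two arguments), the second is not. The type comparison of step 3 would need the declared types of the arguments, which this sketch does not model.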


6. Future Work Scope
1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help in many problems, eg. checking that all possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems, eg.
i) using a variable without initialization
ii) using a variable after freeing its space
3) Building static analysis checkers for other languages
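A symbol table of the kind proposed in item 1 could look roughly like the following Java sketch. It is only an assumption about one possible design and is not part of the implemented plugins: the class and field names are hypothetical, and scopes are kept as a stack of maps so that an inner declaration shadows an outer one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table recording name, type and allocated space of variables.
public class SymbolTableSketch {
    public static final class Symbol {
        public final String name;
        public final String type;
        public final int allocatedBytes; // -1 when the size is not known statically

        public Symbol(String name, String type, int allocatedBytes) {
            this.name = name;
            this.type = type;
            this.allocatedBytes = allocatedBytes;
        }
    }

    // The head of the deque is the innermost scope.
    private final Deque<Map<String, Symbol>> scopes =
            new ArrayDeque<Map<String, Symbol>>();

    public SymbolTableSketch() {
        enterScope(); // global scope
    }

    public void enterScope() {
        scopes.push(new HashMap<String, Symbol>());
    }

    public void exitScope() {
        scopes.pop();
    }

    public void declare(String name, String type, int allocatedBytes) {
        scopes.peek().put(name, new Symbol(name, type, allocatedBytes));
    }

    // Searches from the innermost scope outwards; returns null if the name is unknown.
    public Symbol lookup(String name) {
        for (Map<String, Symbol> scope : scopes) {
            Symbol s = scope.get(name);
            if (s != null) {
                return s;
            }
        }
        return null;
    }
}
```

With such a table, a checker like the one for strncpy() could call lookup("dst") instead of searching the declaration statements again for every call.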

7. Limitations of the Project
1) Building static analysers requires a platform; otherwise static analysis tool builders have to build their own suitable compiler.
2) A checker platform by which we may visit all the nodes of the CFG and analyse them individually is needed, so that we can solve problems that involve understanding of the CFG, eg. a file or database that is opened should be closed once and only once on every flow path of the CFG from start to end.
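To make the example in item 2 concrete, the following Java sketch shows what such a CFG-based check could look like if a platform exposed the graph. Everything in it is hypothetical: the graph is built by hand, each node carries at most one event ("open", "close" or none), and the graph is assumed to be acyclic so that a simple depth-first walk over all paths terminates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: checks "opened resources are closed once and only once"
// on every path of a small, acyclic control flow graph.
public class CfgPathSketch {
    private final Map<Integer, String> event = new HashMap<Integer, String>();
    private final Map<Integer, List<Integer>> successors =
            new HashMap<Integer, List<Integer>>();

    // ev is "open", "close" or "" for a block that does neither.
    public void addNode(int id, String ev) {
        event.put(id, ev);
        successors.put(id, new ArrayList<Integer>());
    }

    public void addEdge(int from, int to) {
        successors.get(from).add(to);
    }

    // True if every path starting at 'start' closes what it opens exactly once.
    public boolean closedExactlyOnce(int start) {
        return check(start, false);
    }

    private boolean check(int node, boolean isOpen) {
        String ev = event.get(node);
        if (ev.equals("open")) {
            if (isOpen) {
                return false; // opened again without closing
            }
            isOpen = true;
        } else if (ev.equals("close")) {
            if (!isOpen) {
                return false; // closed twice, or closed without opening
            }
            isOpen = false;
        }
        List<Integer> next = successors.get(node);
        if (next.isEmpty()) {
            return !isOpen; // exit node: nothing may still be open
        }
        for (int n : next) {
            if (!check(n, isOpen)) {
                return false;
            }
        }
        return true;
    }
}
```

For a graph in which one branch of an if statement closes the file and the other does not, closedExactlyOnce() returns false, which is the kind of error that cannot be found by looking at the AST alone.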


8. Conclusion
Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem. The only way to do this is to run the program, that is, dynamic analysis.

9. References
1. Secure Programming with Static Analysis (by Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (by Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (by Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (by Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (by Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (by John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT



2221Security Knowledge The main logic behind these tools

is to learn from our past mistakes and attack due to weaknesses in code and prevent them from happening again There are many such collections of common mistakes done by programmers

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb project(httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project)

3) SAMATE group at NIST (httpsamatenistgov)

In these collection of numerous errors there are patterns and repetitive errors and so the most generic problems may be categorized

15

223Presenting And Processing Results The security vulnerabilities are reviewed manually and those are fixed In some cases the analysers itself give some solution for the problem The problems are given in different categories like error (severe threat) warning (may or may not be a security bug but obeying it is good practice) info(good coding practice but no threat )

3Ways of Implementing Static Analysis Tools Application of Static Analysis Tools in Practical World

I Integration with compilers Static analysis tools are part and parcel of modern compilers They does all the basic checking like type checking style checking parse errors identifiers never used many others But it needs compilation of whole programme

Eg- gcc compiler (c language) II Integration with IDEs IDEs (Integrated Development Environment )

use vulnerability checkers as add-ons or plugins to show the errors in coding in the editors while writing This is the most useful form of static analysis tools as it not only shows error but also gives quick fixes Eg- Eclipse Netbeans

III Stand Alone Platforms This kind of tools are generally the most sophisticated ones and detects most complicated problems They are exclusively built for detecting software weaknesses

Eg- Fortify Source code Analyser Klockwork Ounce

31 My Static analysis Tools implementation I have built static analysis tools for an IDE (Integrated Development Environment) The IDE chosen is Eclipse one of the most used platform by software developers This tools are integrated as plugins to eclipse and runs in the back end to find error in code

16

4Implementation Tools 41 Hardware Details Model Dell PC Processor Intel(R) Coretrade2 Duo CPU Installed memory (RAM) 4GB System type 64-bit OS 42 Software Details Operating System Windows 7 Basic Softwares Used Jdk 16 Mingw compiler Eclipse IDE

421 Eclipse IDE Eclipse is one of the most used IDE

for java But it also gives tools to build software in other languages As here I have used CDT (CC++ Development Tools ) which comes as plugin to the eclipse

422 Codan(Code Analysis) Codan which is a light-

weight static analysis framework in CDT ( CDT is Eclipses CC++ Development Tools) which would perform real time analysis on the code to find common defects violation of policies etc Framework contains common components and APIs that is shared between static analysis tools for CC++ such as

Profile Editor (Problem Preferences) We can enable or disable our checker Severity of the Problem is specified We can change the

severity of the problem When we keep cursor on the checker the description about

the checker is displayed

17

bull How to build an AST of a CC++ source Windows gtShow view

gt Others gtCC++gtDOM AST bull How to get CDT Helpgt Install New SoftwaregtAdd the Url httpdownloadeclipseorgreleasesindigo And then select CDT

423 PDE (Plugin Development Environment) To develop eclipse plugins there is a plugin development

platform The ways of building plugins may be seen from reference 4 The Basic steps of Plugin Development are 1First Go to Filegt New project gt Plugin project 2 Then the MANIFESTMF in META-INF is edited

bull Add Dependencies (ie The plugins that are needed to run this checker plugin)

bull ADD Runtime bull ADD Extensions (eg These checkers need a point of extension orgeclipse

cdtcodancorecheckers) bull Add checker by right clickgtNewgtchecker Give class name as name of its

source code bull Under checker Add problem by right clickgtNewgtproblem And there

message that should be shown when error occurs and default enable etc bull On the Overview page in the Exporting part click on Organize Manifests

Wizard gtfinish Externalize Strings Wizard gtfinish bull At last in the Export Wizard portion Archive file give the name of your

plugin bull Now this zip folder may be included in eclipse plugins folder to make it

permanent in codan

3To test the plugin run it and then another eclipse window opens Right faulty code that your checker is supposed to catch You can see error and messages in the editor

18

5Implementation and Results C language(So also in C++) has a lots of vulnerable library function which may be used to crash a program if the arguments are unchecked and unsanitized So we may use some checker to make the programmer aware about such vulnerable functions or whether the values of arguments taken have no potential threat

Codan Plugins developed I For C function int strncpy(char dst const char src

size_t n) It is erroneous to give value of n greater than or equal to size of destination (dst) allocated So it must be checked when this vulnerable function is used It is an example of buffer overflow Algorithm Used 1) First find a call to function strncpy() by viewing all call to ICPPASTFunctionCallExpression and then checking whether first IASTNode is string ldquostrncpyrdquo If it is true then get the String value of the next three nodes The first string is the name of destination and third String is string form of value of n 2) Now we need to get the space allocated for destination

character pointer For that we may visit all the IASTDeclarationStatements made and find out the declaration of the destination character pointer and the space allocated

3) Last step is to compare the allocated space of destination character pointer and the value of n

19

Source Code AST

20

Limitations And Inefficiency of the checker 1)In the 2nd step of the algorithm the size of the allocated space of destination is determined by accessing the nodes to proper position But this method is inappropriate as space allocation may be done in two different ways so the solution to these may be maintaining a symbol table during static analysis as done during compilation II For C function fopen(stream rsquorrsquo) and fopen( stream rsquowrsquo) When a file is opened in lsquorrsquo mode then the file should already be present in the given stream and if the file is opened in write mode then we should warn the programmer about being the file overwritten There should be a block before every fopen() function with lsquorrsquo or lsquowrsquo mode to check those above conditions

Algorithm Used 1)First check all the ICPPASTFunctionDeclarations for fopen( ) function calls and accessing all its nodes get the name of stream and mode of opening 2)Then search all ICPPASTIfStatements to see if there is a desired function to check the given conditions above ie in this case(read ) access() function which should be in a IASTUnaryExpression (negation ) and there should be a return statement inside the block But in case of write case there should be access() function with appropriate return statement

21

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

1. Introduction

In this digital age we use software in every phase of our lives, from everyday articles to satellites in outer space. Software automates, totally or partially, the things we use, so a small mistake may lead to a catastrophe. Software therefore has to be reliable for our own safety. There are also people (hackers) who try to jeopardize systems, and cyber threats are a matter of great concern these days. Software security is the practice of building software to be secure and to function properly under malicious attack. The traditional way of making software less vulnerable is to test it with different sets of inputs and so find its areas of weakness. But if we apply our knowledge of common vulnerabilities at the time of building the software, a great deal of cost, effort and time can be saved.

This is where static analysis becomes important. After all, an attacker succeeds only if there is a weakness in the code. If the vulnerable points are reduced, we can expect our software to be much more fail-proof.

1.1 Static Analysis in the Context of Software Security

Software security means that software works correctly (gives correct outputs) under all possible situations, even under malicious attacks (i.e. intentional attempts to find software weaknesses and exploit them). Software security is sometimes thought of as security features: cryptographic ciphers, passwords and access control mechanisms. But for a program to be secure, all portions of the program must be secure, not just the bits that explicitly address security. In many cases security failings are not related to security features at all. Conventionally, software security is considered in the test and field phases of software building. But those measures are actually efforts to make up for coding malpractices.

The root of security issues lies in coding malpractices and in the use of vulnerable library functions and APIs. So security issues must be considered during coding, when faulty library functions are used, and in the early stages of software development.

[Figure: Dynamic Analysis, Firewall, Virus Scanner, Penetration Detection, Intrusion Detection]


It is easier to fix problems in the development stages, when they are still simple. If bugs appear in the testing phase, the whole program may have to be rechecked.

[Figure: Static Analysis, Architectural Risk Analysis, Security Requirements]

2. Static Analysis

2.1 Definition

Static analysis is the analysis of the source code of software without executing it.

2.2 Working Procedure

The procedure is divided into four steps: building a model, performing analysis, gathering the security knowledge that the analysis relies on, and presenting the results.

1. Build Model
   1) Lexed tokens
   2) Parse tree
   3) Abstract syntax tree
   4) Control flow graph
   5) Data flow diagram

2. Perform Analysis
   1) Lexical analysis
   2) Parse tree and AST analysis
   3) Control flow graph analysis
   4) Data flow diagram analysis
   5) Taint analysis
   6) Value range propagation

2.1 Security Knowledge
   1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)
   2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)
   3) SAMATE group at NIST (http://samate.nist.gov)

3. Present Results
   1) error (severe threat)
   2) warning (may or may not be a security bug, but obeying it is good practice)
   3) info (good coding practice, but no threat)

2.2.1 Build Model

For an analysis tool to understand the code, the code has to be represented by data structures that most closely capture the property to be analysed. These basic data structures are the ones built by compilers, and static analysis tools borrow them: lexer tokens, parse tree, abstract syntax tree (AST), control flow graph (CFG) and data flow diagram (DFD). The models may be built by the compiler, by the static analysis tool, or by both.

Models Used in Analysis

Lexed Tokens: The source code is converted into a token stream, discarding unimportant whitespace and comments. E.g.

Source code:
    if (ret) mat[x][y] = END_VAL;

This code produces the following sequence of tokens.

Lexer output:
    IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI

Some tokens need one extra property, such as the name for an identifier (ID). This token stream is subsequently used in building the parse tree.
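The lexing step can be sketched in a few lines of C. This is only a minimal illustration, not the lexer of any compiler or of Codan: the function names lex and punct_name are hypothetical, only the token kinds of the example above are recognised, and comment stripping is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Name of a single-character token, or NULL if the character is unknown. */
static const char *punct_name(char c)
{
    switch (c) {
    case '(': return "LPAREN";
    case ')': return "RPAREN";
    case '[': return "LBRACKET";
    case ']': return "RBRACKET";
    case '=': return "EQUAL";
    case ';': return "SEMI";
    default:  return NULL;
    }
}

/* Tokenize src and write the token names, separated by single spaces,
 * into out (capacity cap). Returns 0 on success, -1 on an unknown
 * character, an over-long identifier, or a full output buffer. */
int lex(const char *src, char *out, size_t cap)
{
    size_t used = 0;

    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*src != '\0') {
        char tok[64];
        int n;

        if (isspace((unsigned char)*src)) {   /* whitespace is discarded */
            src++;
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            size_t len = 1;
            while (isalnum((unsigned char)src[len]) || src[len] == '_')
                len++;
            if (len > 48)                     /* keep tok[] from truncating */
                return -1;
            if (len == 2 && strncmp(src, "if", 2) == 0)
                snprintf(tok, sizeof tok, "IF");              /* keyword */
            else
                snprintf(tok, sizeof tok, "ID(%.*s)", (int)len, src);
            src += len;
        } else {
            const char *name = punct_name(*src);
            if (name == NULL)
                return -1;
            snprintf(tok, sizeof tok, "%s", name);
            src++;
        }
        /* append the token, preceded by a space unless it is the first */
        n = snprintf(out + used, cap - used, "%s%s", used ? " " : "", tok);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Calling lex() on the source line of the example fills the output buffer with the token sequence listed above. The identifier name is carried inside ID(...), which is the "extra property" the text refers to.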

Parse Tree: A language parser uses a context-free grammar to match the token stream. The grammar consists of a set of productions that describe the symbols (elements) in the language. The parser performs a derivation by matching the token stream against the production rules. If each symbol is connected to the symbol from which it was derived, a parse tree is formed.

Control Flow Graph: It is a graphical way of representing all the possible ways in which the flow of a program may proceed. Each node in the CFG represents a basic block, a sequence of statements with no branching or looping inside it. The CFG gives the cyclomatic complexity, the number of linearly independent paths through the code, which indicates how many opportunities for error the code has. During dynamic analysis it also helps us derive exhaustive sets of test cases.

Source code:
    if (a > b) {
        nConsec = 0;
    } else {
        s1 = getHexChar(1);
        s2 = getHexChar(2);
    }
    return nConsec;
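As a rough illustration of how cyclomatic complexity follows from a CFG, the sketch below encodes the four basic blocks of the source code above and applies the standard formula M = E - N + 2 for a single connected graph. The node names, the example_cfg table and both function names are made up for this example, and the path counter assumes a CFG without loops.

```c
#include <assert.h>
#include <stddef.h>

struct edge { int from, to; };

/* One node per basic block of the example. */
enum { N_IF, N_THEN, N_ELSE, N_RETURN, NODE_COUNT };

static const struct edge example_cfg[] = {
    { N_IF, N_THEN },     { N_IF, N_ELSE },
    { N_THEN, N_RETURN }, { N_ELSE, N_RETURN },
};

/* Cyclomatic complexity M = E - N + 2 for a single connected CFG. */
int cyclomatic_complexity(size_t edge_count, size_t node_count)
{
    assert(node_count > 0);
    return (int)edge_count - (int)node_count + 2;
}

/* Number of distinct paths from node `from` to node `exit_node`.
 * Assumes the graph has no cycles, i.e. the code has no loops. */
int count_paths(const struct edge *edges, size_t edge_count,
                int from, int exit_node)
{
    int total = 0;
    size_t i;

    assert(edges != NULL);
    if (from == exit_node)
        return 1;
    for (i = 0; i < edge_count; i++)
        if (edges[i].from == from)
            total += count_paths(edges, edge_count, edges[i].to, exit_node);
    return total;
}
```

For the example, cyclomatic_complexity(4, NODE_COUNT) is 2, which matches the two paths (then-branch and else-branch) that count_paths finds from N_IF to N_RETURN, so two test cases are enough to cover every path.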

[Figure: parser output (parse tree)]

[Figure: CFG builder output, with the nodes "if (a > b)", "nConsec = 0", "s1 = getHexChar(1); s2 = getHexChar(2)" and "return nConsec"]

Data Flow Diagram: A data flow diagram shows all the possible paths of data input and output, and of data transfer between the different entities within the software. It thus shows the points where data should be validated and where trust boundaries should be set up.

2.2.2 Perform Analysis

Analysis is performed on the tokens or on the nodes of the trees or graphs.

Lexical Analysis: The simplest of all the analysis techniques. It helps in checking syntactical errors and in most cases uses regular pattern matching. It is of little use beyond detecting suspicious identifier or function names. Tools using lexical analysis techniques are ITS4, RATS and Flawfinder.
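A purely lexical check can be sketched as a scan for identifiers that name a risky function and are followed by an opening parenthesis. The sketch below is not how ITS4, RATS or Flawfinder are implemented; the function name count_risky_calls and the short list of risky functions are assumptions chosen for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical rule set: library functions commonly treated as risky. */
static const char *const risky[] = { "gets", "strcpy", "strcat", "sprintf" };

/* Count identifiers in src that exactly match a risky name and are
 * followed (after optional whitespace) by '('. */
int count_risky_calls(const char *src)
{
    int hits = 0;

    assert(src != NULL);
    while (*src != '\0') {
        if (isalpha((unsigned char)*src) || *src == '_') {
            size_t len = 1, i;
            const char *after;

            while (isalnum((unsigned char)src[len]) || src[len] == '_')
                len++;
            after = src + len;
            while (isspace((unsigned char)*after))
                after++;
            if (*after == '(')
                for (i = 0; i < sizeof risky / sizeof risky[0]; i++)
                    if (strlen(risky[i]) == len &&
                        strncmp(src, risky[i], len) == 0)
                        hits++;
            src += len;               /* skip the whole identifier */
        } else {
            src++;
        }
    }
    return hits;
}
```

Because whole identifiers are compared, my_strcpy( and strncpy( are not reported. The scan has no notion of context, however: a call that only appears inside a comment is reported as well, which shows why lexical analysis is of limited use.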

Parse Tree and AST Analysis: These representations help us understand the semantics of the program, and so help in finding deviations from the rules of the grammar and from security rules, such as the rule that an if block should start and end with a curly bracket. Most modern compilers do this kind of checking, and a violation is reported as a parse error. Codan, the code analysis platform in CDT (C/C++ Development Tools, an Eclipse plugin), uses the AST to build checkers. Similar platforms such as PMD and Crystal (Eclipse plugins) use the AST for detecting errors in Java.

Control Flow Graph Analysis: Although ASTs and parse trees appear useful enough for detecting most rule violations, they fail for rules that depend on branching in the code. As an example, an opened file or database should be closed once and only once along any control flow of a program. The CFG is analysed in a number of stages, starting from the basic block, then a procedure (method or function), and then a bigger module such as a class. Tools of this kind include Fortify Source Code Analyser and Klocwork.
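The "closed once and only once" rule can be checked by walking every path of a CFG, as in the following sketch. It is a simplified model, not the algorithm of any of the tools named above: each basic block is reduced to a single hypothetical action (open, close or none), the CFG is assumed to be free of loops, and the names cfg and count_violations are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum action { ACT_NONE, ACT_OPEN, ACT_CLOSE };
struct edge { int from, to; };

struct cfg {
    const enum action *actions;   /* actions[i]: what basic block i does */
    const struct edge *edges;
    size_t edge_count;
};

/* Follow every path from `node` to `exit_node` and count violations:
 * a close without a matching open, or a resource still open at exit.
 * open_count is the number of resources open on entry to `node`. */
int count_violations(const struct cfg *g, int node, int exit_node,
                     int open_count)
{
    int bad = 0;
    size_t i;

    assert(g != NULL && node >= 0);
    if (g->actions[node] == ACT_OPEN) {
        open_count++;
    } else if (g->actions[node] == ACT_CLOSE) {
        if (open_count == 0)
            return 1;             /* closed twice, or never opened */
        open_count--;
    }
    if (node == exit_node)
        return open_count != 0;   /* still open at the end: a leak */
    for (i = 0; i < g->edge_count; i++)
        if (g->edges[i].from == node)
            bad += count_violations(g, g->edges[i].to, exit_node,
                                    open_count);
    return bad;
}
```

With a CFG in which block 0 opens a file and only one of two branches closes it, the walk reports one violating path. An AST-based checker that merely looks for the presence of a close call would accept the same code.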

Data Flow Diagram Analysis: A Data Flow Diagram (DFD) with security-specific annotations is used to describe how data enters, leaves and traverses the system. It shows data sources and destinations, the relevant processes that the data goes through, and the trust boundaries in the system. A DFD has a fixed set of component types: Process, HighLevelProcess, DataStore and ExternalInteractor. A Process is the central concern of a DFD. A HighLevelProcess is represented by hierarchical, multi-stage DFDs. A DataStore may be a database, a file or the Registry. An ExternalInteractor represents an entity that exists outside the system being modelled and that interacts with the system at an entry point, typically a human. The data flows are represented by arrows. A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary, which describes locations where a privilege impersonation on the part of an adversary could occur, a machine or process boundary may be crossed, etc.

Taint Analysis: The concept of tainting refers to marking data coming from an untrusted source as "tainted" and propagating its status to all locations where the data is used. A security policy specifies which uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy is an indication of a vulnerability. Tainted data should not be used in any function that modifies files, directories and processes, or that executes external programs. If the rule is violated, the program should be aborted.

1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.

    1  unsigned int n;
    2  char src[10], dst[10];
    3  n = read_int ();
    4  if (n <= sizeof (dst))
    5      memcpy (dst, src, n);  /* n is <= sizeof (dst) */
    6  else
    7      memcpy (dst, src, n);  /* n is > sizeof (dst) */

Using taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas line 5 is in fact a false positive. So taint analysis has a high chance of giving false positives.
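The five steps can be sketched over a toy program representation. Everything in the sketch is an assumption made for illustration: the stmt structure, the three statement kinds (SOURCE for a read from an untrusted source, ASSIGN for an expression, SINK for a call to a vulnerable function) and the function name find_tainted_sinks. Variables are plain integer indices and branch conditions are ignored, as in the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16

enum kind { SOURCE, ASSIGN, SINK };

struct stmt {
    enum kind kind;
    int line;     /* source line, used for reporting */
    int dst;      /* variable written by SOURCE/ASSIGN, -1 for SINK */
    int src[2];   /* variables read (operands or sink argument), -1 if unused */
};

/* Returns the number of sinks that receive tainted data and stores up to
 * cap of their line numbers in report. */
size_t find_tainted_sinks(const struct stmt *prog, size_t n,
                          int *report, size_t cap)
{
    bool tainted[MAX_VARS] = { false };       /* step 1: nothing tainted */
    bool changed;
    size_t i, found = 0;
    int k;

    assert(prog != NULL);
    for (i = 0; i < n; i++) {                 /* validate variable indices */
        assert(prog[i].dst < MAX_VARS);
        assert(prog[i].src[0] < MAX_VARS && prog[i].src[1] < MAX_VARS);
        assert(prog[i].kind == SINK || prog[i].dst >= 0);
    }
    for (i = 0; i < n; i++)                   /* step 2: mark the sources */
        if (prog[i].kind == SOURCE)
            tainted[prog[i].dst] = true;
    do {                                      /* steps 3 and 4: fixed point */
        changed = false;
        for (i = 0; i < n; i++) {
            if (prog[i].kind != ASSIGN || tainted[prog[i].dst])
                continue;
            for (k = 0; k < 2; k++)
                if (prog[i].src[k] >= 0 && tainted[prog[i].src[k]]) {
                    tainted[prog[i].dst] = true;
                    changed = true;
                }
        }
    } while (changed);
    for (i = 0; i < n; i++) {                 /* step 5: check the sinks */
        if (prog[i].kind != SINK)
            continue;
        for (k = 0; k < 2; k++)
            if (prog[i].src[k] >= 0 && tainted[prog[i].src[k]]) {
                if (found < cap)
                    report[found] = prog[i].line;
                found++;
                break;
            }
    }
    return found;
}
```

Modelling the listing above as one SOURCE on line 3 (n = read_int()) and two SINKs on lines 5 and 7 (memcpy with argument n) makes the function report both lines. Since the condition on line 4 plays no part in the propagation, line 5 is reported although it is safe.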

Value Range Propagation: In this case each tainted variable also carries the range of its possible values. If some vulnerable function uses that variable, it can be checked whether the value range of the variable still makes the call vulnerable. Thus we may avoid some false positives.
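A minimal sketch of the idea, assuming an inclusive interval per variable and a single comparison of the form n <= bound, is given below. The type range and the functions refine_le, refine_gt and copy_may_overflow are hypothetical names introduced for this example, not part of any tool described here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct range { unsigned lo, hi; };   /* inclusive bounds */

/* Range of the variable on the true branch of "n <= bound". */
struct range refine_le(struct range r, unsigned bound)
{
    assert(r.lo <= bound);           /* otherwise the branch is unreachable */
    if (r.hi > bound)
        r.hi = bound;
    return r;
}

/* Range of the variable on the false branch, i.e. where n > bound. */
struct range refine_gt(struct range r, unsigned bound)
{
    assert(r.hi > bound);            /* otherwise the branch is unreachable */
    if (r.lo <= bound)
        r.lo = bound + 1;
    return r;
}

/* A copy of len bytes into a buffer of the given capacity may overflow
 * if the largest possible length exceeds the capacity. */
bool copy_may_overflow(struct range len, unsigned capacity)
{
    return len.hi > capacity;
}
```

For a tainted n with the initial range [0, UINT_MAX] and a 10-byte destination, the true branch of n <= 10 narrows the range to [0, 10], so the copy in that branch is no longer reported. The else branch yields [11, UINT_MAX] and is still flagged.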


2.2.2.1 Security Knowledge

The main logic behind these tools is to learn from our past mistakes and from attacks made possible by weaknesses in code, and to prevent them from happening again. There are many collections of the common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)
2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)
3) SAMATE group at NIST (http://samate.nist.gov)

In these collections of numerous errors there are patterns and recurring errors, so the most generic problems can be categorized.

2.2.3 Presenting and Processing Results

The security vulnerabilities are reviewed manually and then fixed. In some cases the analysers themselves suggest a solution to the problem. The problems are reported in different categories:
1) error (severe threat)
2) warning (may or may not be a security bug, but obeying it is good practice)
3) info (good coding practice, but no threat)

3. Ways of Implementing Static Analysis Tools

Application of Static Analysis Tools in the Practical World

I. Integration with compilers: Static analysis tools are part and parcel of modern compilers. They do all the basic checking, such as type checking, style checking, parse errors, identifiers never used and many others. But this needs compilation of the whole program.
E.g. the gcc compiler (C language)

II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins to show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes.
E.g. Eclipse, NetBeans

III. Stand-alone platforms: These tools are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses.
E.g. Fortify Source Code Analyser, Klocwork, Ounce

3.1 My Static Analysis Tools Implementation

I have built static analysis tools for an IDE (Integrated Development Environment). The IDE chosen is Eclipse, one of the platforms most used by software developers. These tools are integrated into Eclipse as plugins and run in the background to find errors in code.

4. Implementation Tools

4.1 Hardware Details

Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS

4.2 Software Details

Operating system: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE

Eclipse is one of the most used IDEs for Java, but it also provides tools to build software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)

Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis on the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences):

• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.

• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)

To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen from reference 4. The basic steps of plugin development are:

1. First go to File > New Project > Plugin Project.
2. Then edit the MANIFEST.MF in META-INF:
   • Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
   • Add Runtime.
   • Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
   • Add a checker by right click > New > checker. Give as class name the name of its source code.
   • Under the checker, add a problem by right click > New > problem, and there give the message that should be shown when the error occurs, the default enablement, etc.
   • On the Overview page, in the Exporting part, click on Organize Manifests Wizard > Finish and Externalize Strings Wizard > Finish.
   • At last, in the Export Wizard portion, under Archive file, give the name of your plugin.
   • Now this zip folder may be included in the Eclipse plugins folder to make it permanent in Codan.
3. To test the plugin, run it; another Eclipse window then opens. Write faulty code that your checker is supposed to catch. You can see the errors and messages in the editor.

5. Implementation and Results

The C language (and so also C++) has a lot of vulnerable library functions, which may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the values of the arguments passed pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to give a value of n greater than or equal to the size allocated for the destination (dst): a larger n overflows the buffer, and an equal n may leave it without a terminating null character. So this must be checked whenever this vulnerable function is used. It is an example of buffer overflow.

Algorithm used:
1) First find a call to the function strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the string values of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.
2) Now we need the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated.
3) The last step is to compare the allocated space of the destination character pointer with the value of n.
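The comparison made in step 3, and a usage of strncpy() that would pass it, can be illustrated in C as follows. This is a sketch of the rule only, not the Java code of the checker; the names strncpy_call_is_flagged and copy_string are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The rule of the checker: a call is flagged when n is greater than or
 * equal to the size allocated for the destination. */
bool strncpy_call_is_flagged(size_t dst_size, size_t n)
{
    return n >= dst_size;
}

/* A usage that satisfies the rule: copy at most dst_size - 1 characters
 * and terminate the string explicitly, because strncpy() does not write
 * a null character when src has n or more characters. */
void copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With an 8-byte destination, strncpy(dst, src, 8) is flagged while strncpy(dst, src, 7) is not, and copy_string() truncates a longer source to 7 characters plus the terminator.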

[Figure: source code and its AST]

Limitations and Inefficiency of the Checker

1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by accessing the nodes at the proper positions. But this method is inadequate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(filename, "r") and fopen(filename, "w")

When a file is opened in "r" mode, the file should already be present at the given path, and if the file is opened in write mode, we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks the above conditions.

Algorithm used:
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all the nodes of each call, get the name of the file and the mode of opening.
2) Then search all the ICPPASTIfStatements to see whether there is a suitable function call checking the conditions given above. In the read case this is the access() function, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should be an access() function call with an appropriate return statement.
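Code of the shape this checker looks for might resemble the sketch below, which guards each fopen() with an access() test and an early return. It assumes a POSIX environment for access() and F_OK; the two wrapper function names are made up for the example, and the exact form of the guard the checker accepts may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* access(), F_OK (POSIX) */

/* Read case: return early if the file is not present. */
FILE *open_existing_for_read(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) != 0)
        return NULL;              /* file is missing */
    return fopen(path, "r");
}

/* Write case: return early instead of overwriting an existing file. */
FILE *open_new_for_write(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0)
        return NULL;              /* file exists and would be overwritten */
    return fopen(path, "w");
}
```

A guard of this kind documents the programmer's intent, but the file can still appear or disappear between the access() call and the fopen() call, so the return value of fopen() has to be checked as well.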


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()], as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string and all the arguments mentioned in the format string. An error may occur here in three cases:
1) If the number of format specifiers in the format string is not equal to the number of arguments present.
2) If there is no format string.
3) If a format specifier and the corresponding argument indicate two different types.

Algorithm:
1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time look at each IASTFunctionCallExpression. If the function called is printf or scanf, which may be known from its first node, then note its total number of nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on this node (the string form of this node, excluding the first and last characters), and the number of substrings matching the java.util.regex.Pattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted. If it is equal to (x-2), it is okay; otherwise it is an error.
3) To check the third case, we have to check every format specifier against the type of the corresponding variable.
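The counting in step 2 can also be done without a regular expression. The C sketch below is not the checker's Java code: it is a hand-written scanner, under the assumption that a specifier consists of optional flags, width, precision and length modifier followed by one of the conversion characters listed above. The function name count_format_args is hypothetical, "%%" is treated as a literal percent sign, and a "*" width (which consumes an extra argument) is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the conversion specifications in fmt that consume an argument. */
int count_format_args(const char *fmt)
{
    int count = 0;

    assert(fmt != NULL);
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {                 /* "%%" prints a percent sign */
            fmt++;
            continue;
        }
        while (*fmt != '\0' && strchr("-+0 #", *fmt))
            fmt++;                         /* flags */
        while (*fmt >= '0' && *fmt <= '9')
            fmt++;                         /* width */
        if (*fmt == '.') {                 /* precision */
            fmt++;
            while (*fmt >= '0' && *fmt <= '9')
                fmt++;
        }
        while (*fmt != '\0' && strchr("hlL", *fmt))
            fmt++;                         /* length modifier */
        if (*fmt != '\0' && strchr("cdieEfgGosuxXp", *fmt)) {
            count++;                       /* conversion character */
            fmt++;
        }
    }
    return count;
}
```

A call such as printf("%d items cost %5.2f %s", n, price, unit) has x = 5 nodes (function name, format string and three arguments), and the count of 3 equals x - 2, so the call passes the first test. The same format string with only two arguments would be reported.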


6. Future Work Scope

1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help with many problems, e.g. checking that all the possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
   i) using a variable without initialization
   ii) using a variable after freeing its space
3) Building static analysis checkers for other languages.

7. Limitations of the Project

1) To build static analysers there needs to be a platform; otherwise static analysis tool builders have to build their own suitable compiler.
2) A checker platform is needed through which we may visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding the CFG, e.g. a file or database that is opened should be closed once and only once in every flow path of the CFG from start to end.


11 Static Analysis In context of

Software Security Software security means

working of software correctly (giving correct outputs) under all possible situations even under malicious attacks (ie intentionally trying to find software weaknesses and exploit them ) Software Security is sometimes thought as security features cryptographic ciphers passwords and access control mecha- nisms But For a program to be secure all portions of the program must be secure not just the bits that explicitly address security In many cases security failings are not related to security features at all In conventional and mostly used way software security is considered in test and field phases of software building But those are actually effort to make up coding malpractices

The root to security issues lies in coding malpractices and using vulnerable Library functions and API lsquos So security issues must be considered during coding with faulty library functions and in the early stages of software development

Dynamic Analysis Firewall

Virus Scanner Penetration Detection Intrusion Detection

6

The root to security issues lies in coding malpractices and using vulnerable Library functions and API lsquos So security issues must be considered during coding with faulty library functions and in the early stages of software development

It is easier to fix the problems in the development stages as they are simple But in testing phase if some bugs appears then it may require to recheck the whole programme again

Static Analysis

Architectural risk Analysis

Security requirements

7

2Static Analysis 21 Definition Static analysis is analysing the source

code of software without executing it

22 Working Procedure It is divided in four steps

1Build Model

2 Perform Analysis Performing analysis needs another basic step of gathering security knowledge 21Security Knowledge 3Present Results

8

1)Lexed Tokens 2)Parse Tree 3)Abstract Syntax Tree 4)Control Flow Graph 5)Data Flow Diagram

1)Lexical analysis 2)Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4)Data Flow Diagram Analysis 5)Taint Analysis 6)Value Range Propagation

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb Project (httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project) 3) SAMATE group at NIST ( httpsamatenistgov )

1)error (severe threat) 2)warning (may or may not be a security bug but obeying it is good practice) 3)info(good coding practice but no threat

9

221 Build model In analysis to understand the code by

analysis tools it needs to be represented by data structures that most nearly represents the property to be analysed Those basic data structures are actually build by compilers and static analysis tools borrow them and Those data-structures are lexer tokens parse tree abstract syntax tree (AST) control flow graph(CFG) dataflow diagram(DFD) This models are build by compilers or static analysis tools or by both

Models Used in Analysis

Lexed Tokens The source code converted into a token stream

discarding unimportant whitespaces and comments Eg Source Code if (ret) mat[x][y] = END_VAL This code produces the following sequence of tokens Lexer Output IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI Some of the token needs extra one property like name for identifier(ID) These token stream is subsequently used in making parse tree

Parse Tree A language parser uses a context-free grammar (CFG) to match

the token stream The grammar consists of a set of productions that describe the symbols (elements) in the language The parser performs a derivation by matching the token stream against the production rules If each symbol is connected to the symbol from which it was derived a parse tree is formed

10

Control Flow Graph It is the graphical way of representing all

possible way the flow of programme may occur Each node in CFG represents a basic block that has no branching or looping CFG gives the idea of Cyclomatic complexity that directly shows the no of possibility of errors During dynamic analysis it also helps us to get exhaustive sets of test cases

Source Code if (a gt b) nConsec = 0 else s1 = getHexChar(1) s2 = getHexChar(2) return nConsec

11

Parser Output Parse Tree

CFG Builder output

Data Flow Diagram Data Flow Diagram shows all the possible path of

data in put and output and data transfer between different entities within the software thus giving us the points where data should be validated and setting up trust boundaries

12

If(agtb)

nConsec=0 s1 = getHexChar() s2= getHexChar()

return nConsec

222 Perform Analysis Analysis is performed on the tokens or nodes of tree

or graphs Lexical analysis Simplest of all analysis techniques helps in

checking syntactical errors and it uses in most cases regular pattern matching Not much useful than detecting wrong identifier names or function names Tools using lexical analysis techniques are ITS4 RATS and Flawfinder

Parse Tree and AST Analysis These representations helps us

understanding of semantics of the program So helps us in finding deviation of rules of grammar and security rules like a if block should start and end with curly bracket Most modern compilers does these kind of checking and violation comes as a parse error Codan the code analysis platform in CDT(CC+ Development Tools eclipse plugin ) uses AST to built checkers Similar Platform PMD Crystal (eclipse plugins) uses AST for detecting errors in Java

Control Flow Graph Analysis AST and parse trees though

appear useful enough in detecting most of rule violations they fail in case of rules that apply for branching in code As example we may take that opened file or database should be closed once and only once in a flow control of a programme CFG is analysed in a number of stages starting from basic block and then a procedure( method or function) and then to a bigger module like class Fortify Source Code Analyser Klockwork

Data Flow Diagram Analysis A Data Flow Diagram (DFD) with

security-specific annotations is used to describe how data enters leaves and traverses the system it shows data sources and destinations relevant processes that data goes through and trust boundaries in the system A DFD has a fixed set of component types Process HighLevelProcess Data Store and External Interactor A process is concern of DFD diagram A High Level program is represented by hierarchical multistage DFDs A datastore may be a database a file or the Registry An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point typically a human The data flows are represented by arrows A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary to describe locations where a privilege impersonation on the part of an adversary could occur a machine or process boundary may be crossed etc

13

Taint AnalysisThe concept of tainting refers to marking data coming

from an untrusted source as ldquotaintedrdquo and propagating its status to all locations where the data is used A security policy specifies what uses of untrusted data are allowed or restricted An attempt to use tainted data is a violation of this policy is an indication of a vulnerability Tainted data should not be used in any function which modifies files directories and processes or executes external programs If the rule is violated then the program should be aborted

1 Initialize all variables as NOT TAINTED 2 Find all calls to functions that read data from an untrusted source Mark the values returned by these functions as TAINTED 3 Propagate the tainted values through the program If a tainted value is used in an expression mark the result of the expression as TAINTED 4 Repeat step 3 until a fixed point is reached 5 Find all calls to potentially vulnerable functions If one of their arguments is tainted report this as a vulnerability 1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

Using Taint analysis memcpy() of both line no 5 and 7 will be marked as

vulnerability where as in actual case 5 is a false positive So taint analysis has a high possibility of giving false positive

Value Range Propagation In this case the tainted variables should

also carry a range of its possible values If some vulnerable function uses that variable then it may be checked that the value range of that variable still makes the function vulnerable or not Thus we may avoid some false positives

14

1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

2221Security Knowledge The main logic behind these tools

is to learn from our past mistakes and attack due to weaknesses in code and prevent them from happening again There are many such collections of common mistakes done by programmers

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb project(httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project)

3) SAMATE group at NIST (httpsamatenistgov)

In these collection of numerous errors there are patterns and repetitive errors and so the most generic problems may be categorized

15

223Presenting And Processing Results The security vulnerabilities are reviewed manually and those are fixed In some cases the analysers itself give some solution for the problem The problems are given in different categories like error (severe threat) warning (may or may not be a security bug but obeying it is good practice) info(good coding practice but no threat )

3Ways of Implementing Static Analysis Tools Application of Static Analysis Tools in Practical World

I Integration with compilers Static analysis tools are part and parcel of modern compilers They does all the basic checking like type checking style checking parse errors identifiers never used many others But it needs compilation of whole programme

Eg- gcc compiler (c language) II Integration with IDEs IDEs (Integrated Development Environment )

use vulnerability checkers as add-ons or plugins to show the errors in coding in the editors while writing This is the most useful form of static analysis tools as it not only shows error but also gives quick fixes Eg- Eclipse Netbeans

III Stand Alone Platforms This kind of tools are generally the most sophisticated ones and detects most complicated problems They are exclusively built for detecting software weaknesses

Eg- Fortify Source code Analyser Klockwork Ounce

31 My Static analysis Tools implementation I have built static analysis tools for an IDE (Integrated Development Environment) The IDE chosen is Eclipse one of the most used platform by software developers This tools are integrated as plugins to eclipse and runs in the back end to find error in code

16

4Implementation Tools 41 Hardware Details Model Dell PC Processor Intel(R) Coretrade2 Duo CPU Installed memory (RAM) 4GB System type 64-bit OS 42 Software Details Operating System Windows 7 Basic Softwares Used Jdk 16 Mingw compiler Eclipse IDE

421 Eclipse IDE Eclipse is one of the most used IDE

for java But it also gives tools to build software in other languages As here I have used CDT (CC++ Development Tools ) which comes as plugin to the eclipse

422 Codan(Code Analysis) Codan which is a light-

weight static analysis framework in CDT ( CDT is Eclipses CC++ Development Tools) which would perform real time analysis on the code to find common defects violation of policies etc Framework contains common components and APIs that is shared between static analysis tools for CC++ such as

Profile Editor (Problem Preferences) We can enable or disable our checker Severity of the Problem is specified We can change the

severity of the problem When we keep cursor on the checker the description about

the checker is displayed

17

bull How to build an AST of a CC++ source Windows gtShow view

gt Others gtCC++gtDOM AST bull How to get CDT Helpgt Install New SoftwaregtAdd the Url httpdownloadeclipseorgreleasesindigo And then select CDT

4.2.3 PDE (Plugin Development Environment)

To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen in reference 4. The basic steps of plugin development are:

1. First go to File > New Project > Plugin Project.
2. Then edit MANIFEST.MF in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right click > New > problem. There, set the message that should be shown when the error occurs, the default enablement, etc.
• On the Overview page, in the Exporting part, click Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• Finally, in the Export Wizard portion, under Archive file, give the name of your plugin.
• This zip archive may now be placed in the Eclipse plugins folder to make the checker permanent in Codan.
3. To test the plugin, run it; another Eclipse window opens. Write faulty code that your checker is supposed to catch. You can see the errors and messages in the editor.

5. Implementation and Results

The C language (and so also C++) has many vulnerable library functions that may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the values of the arguments pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to give n a value greater than or equal to the size allocated for the destination (dst): a larger value overflows the buffer, and an equal value can leave dst without a terminating null character. So it must be checked wherever this vulnerable function is used. It is an example of buffer overflow.

Algorithm Used

1) First find a call to strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If so, get the string value of the next three nodes. The first string is the name of the destination and the third is the string form of the value of n.

2) Now we need the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated to it.

3) The last step is to compare the space allocated for the destination character pointer with the value of n.
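The comparison in step 3 can be illustrated in plain C, independent of the Codan API. This is only a sketch, not the plugin's source: the helper name strncpy_call_is_unsafe, the function copy_name and the constant NAME_LEN are hypothetical, and it assumes that the destination size and n have already been extracted from the AST as numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 16   /* illustrative buffer size, not from the report */

/* Hypothetical helper mirroring step 3 of the algorithm: returns 1 when
 * the call should be flagged, i.e. when n is greater than or equal to
 * the space allocated for the destination. */
int strncpy_call_is_unsafe(size_t dst_size, size_t n)
{
    return n >= dst_size;
}

/* A call such a checker would accept: dst must point to NAME_LEN bytes,
 * n is NAME_LEN - 1, and the buffer is terminated explicitly. */
void copy_name(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';
}
```

With a 10-byte destination, strncpy_call_is_unsafe(10, 10) and strncpy_call_is_unsafe(10, 12) are both flagged, while strncpy_call_is_unsafe(10, 9) is not.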

[Figure: source code and the corresponding AST]

Limitations and Inefficiency of the Checker

1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by walking the nodes to the expected position. This method is inadequate because space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C function calls fopen(stream, "r") and fopen(stream, "w")

When a file is opened in "r" mode, the file named by stream should already be present, and when the file is opened in write mode we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks these conditions.

Algorithm Used

1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing their nodes, get the name of the stream and the mode of opening.

2) Then search all ICPPASTIfStatements to see whether there is a suitable function call that checks the conditions given above. In the read case this is an access() call, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
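The guard blocks that step 2 searches for might look like the following C code. This is one possible reading of the pattern, not code taken from the report: the function names open_for_read and open_for_write are hypothetical, and the sketch assumes a POSIX-style access() from unistd.h (on other toolchains the header may differ).

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* access(), F_OK: assumes a POSIX-style platform */

/* Read case: a negated access() test with a return inside the block,
 * placed before the fopen() call in "r" mode. */
FILE *open_for_read(const char *path)
{
    assert(path != NULL);
    if (!(access(path, F_OK) == 0)) {
        return NULL;            /* the file is not present */
    }
    return fopen(path, "r");
}

/* Write case: access() succeeds when the file already exists, so the
 * block returns instead of silently overwriting it. */
FILE *open_for_write(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0) {
        return NULL;            /* the file exists and would be overwritten */
    }
    return fopen(path, "w");
}
```

A checker following the algorithm above would accept both functions, because each fopen() call is preceded by an if statement that contains an access() call and a return.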


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string followed by the arguments named in the format string. Errors may occur in three cases:

1) If the number of format specifiers in the format string is not equal to the number of arguments present.
2) If there is no format string.
3) If a format specifier and the corresponding argument indicate two different types.

Algorithm

1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time, look at each IASTFunctionCallExpression. If the function called is printf or scanf (which may be known from its first node), note its total number of nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on this node (the string form of the node excluding its first and last characters, which are the quotes), and the number of substrings matching the java.util.regex.Pattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted. If it is equal to (x-2), the call is okay; otherwise it is an error.
3) To check the third case, we have to compare every format specifier with the type of the corresponding variable.
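The counting in step 2 can be sketched in C with a small hand-written scanner in place of the regular expression. The scanner is a substitute chosen for this illustration, not the plugin's code: the function names are hypothetical, it accepts the flag, width, precision, length and conversion characters listed in the pattern above, and it does not handle a '*' width or precision.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts conversion specifications such as %d, %-5.2f or %ld.
 * "%%" is a literal percent sign and is not counted. */
size_t count_format_specifiers(const char *fmt)
{
    size_t count = 0;
    assert(fmt != NULL);

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                           /* literal "%%" */
        while (*p != '\0' && strchr("-+0 #", *p))
            p++;                                /* flags */
        while (isdigit((unsigned char)*p))
            p++;                                /* width */
        if (*p == '.') {
            p++;
            while (isdigit((unsigned char)*p))
                p++;                            /* precision */
        }
        while (*p != '\0' && strchr("hlL", *p))
            p++;                                /* length modifier */
        if (*p != '\0' && strchr("cdieEfgGosuxXp", *p))
            count++;                            /* conversion character */
        if (*p == '\0')
            break;                              /* dangling '%' at the end */
    }
    return count;
}

/* x is the total number of nodes of the call: the function name, the
 * format string and the arguments, so x - 2 arguments are expected. */
int format_call_is_consistent(const char *fmt, size_t x)
{
    return x >= 2 && count_format_specifiers(fmt) == x - 2;
}
```

For a call such as printf("%d %s", a, b) the node count x is 4, so format_call_is_consistent("%d %s", 4) holds, while the same format string with only one argument (x = 3) is reported.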


6. Future Work Scope

1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access when needed. This may help in many problems, e.g. checking that all possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
i) using a variable without initialization
ii) using a variable after freeing its space
3) Building static analysis checkers for other languages.

7. Limitations of the Project

1) To build static analysers there needs to be a platform; otherwise static analysis tool builders have to build their own suitable compiler.
2) A checker platform is needed by which we may visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding the CFG, e.g. a file or database that is opened should be closed once and only once on every flow path of the CFG from start to end.

8. Conclusion

Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that, in general, the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool is not enough to find out whether an algorithm can successfully handle a problem. The only way to do this is dynamic analysis.

9. References

1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT

The root of security issues lies in coding malpractices and in the use of vulnerable library functions and APIs. So security issues must be considered during coding, wherever faulty library functions are used, and in the early stages of software development.

It is easier to fix problems in the development stages, as they are still simple. If bugs appear in the testing phase, it may be necessary to recheck the whole programme.

[Figure: Static Analysis, Architectural Risk Analysis, Security Requirements]

2. Static Analysis

2.1 Definition

Static analysis is analysing the source code of software without executing it.

2.2 Working Procedure

It is divided into four steps:

1. Build Model
2. Perform Analysis. Performing analysis needs another basic step, that of gathering security knowledge:
2.1 Security Knowledge
3. Present Results

[Figure: working procedure of static analysis]
Build Model: 1) Lexed Tokens 2) Parse Tree 3) Abstract Syntax Tree 4) Control Flow Graph 5) Data Flow Diagram
Perform Analysis: 1) Lexical Analysis 2) Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4) Data Flow Diagram Analysis 5) Taint Analysis 6) Value Range Propagation
Security Knowledge: 1) Common Weakness Enumeration (CWE) 2) OWASP Honeycomb Project 3) SAMATE group at NIST
Present Results: 1) error (severe threat) 2) warning (may or may not be a security bug, but obeying it is good practice) 3) info (good coding practice, but no threat)

2.2.1 Build Model

For analysis tools to understand the code, it needs to be represented by data structures that most closely represent the property to be analysed. These basic data structures are built by compilers, and static analysis tools borrow them. They are lexer tokens, the parse tree, the abstract syntax tree (AST), the control flow graph (CFG), and the data flow diagram (DFD). These models are built by compilers, by static analysis tools, or by both.

Models Used in Analysis

Lexed Tokens: The source code is converted into a token stream, discarding unimportant whitespace and comments. E.g.

Source code:
    if (ret) mat[x][y] = END_VAL;

This code produces the following sequence of tokens.

Lexer output:
    IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI

Some tokens need an extra property, such as the name for an identifier (ID). This token stream is subsequently used to build the parse tree.
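A minimal lexer that reproduces the token sequence of this example might look as follows in C. It is a sketch under simplifying assumptions, not the lexer of any real compiler: the function names lex and emit are hypothetical, only the keyword "if" and the punctuation used in the example are recognised, comments are not handled, and identifiers longer than 63 characters are split.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Appends one token to out, separated from the previous one by a space. */
static void emit(char *out, size_t out_size, const char *kind, const char *text)
{
    size_t used = strlen(out);
    const char *sep = used > 0 ? " " : "";
    if (text != NULL)
        snprintf(out + used, out_size - used, "%s%s(%s)", sep, kind, text);
    else
        snprintf(out + used, out_size - used, "%s%s", sep, kind);
}

/* Converts src into a space-separated token stream written to out. */
void lex(const char *src, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (const char *p = src; *p != '\0'; ) {
        if (isspace((unsigned char)*p)) {       /* whitespace is discarded */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            char word[64];
            size_t n = 0;
            while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof word - 1)
                word[n++] = *p++;
            word[n] = '\0';
            if (strcmp(word, "if") == 0)
                emit(out, out_size, "IF", NULL);
            else
                emit(out, out_size, "ID", word);  /* identifier carries its name */
            continue;
        }
        switch (*p) {
        case '(': emit(out, out_size, "LPAREN", NULL);   break;
        case ')': emit(out, out_size, "RPAREN", NULL);   break;
        case '[': emit(out, out_size, "LBRACKET", NULL); break;
        case ']': emit(out, out_size, "RBRACKET", NULL); break;
        case '=': emit(out, out_size, "EQUAL", NULL);    break;
        case ';': emit(out, out_size, "SEMI", NULL);     break;
        default:  emit(out, out_size, "UNKNOWN", NULL);  break;
        }
        p++;
    }
}
```

Calling lex on the source line of the example, with a sufficiently large output buffer, yields the lexer output shown above.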

Parse Tree: A language parser uses a context-free grammar (CFG) to match the token stream. The grammar consists of a set of productions that describe the symbols (elements) of the language. The parser performs a derivation by matching the token stream against the production rules. If each symbol is connected to the symbol from which it was derived, a parse tree is formed.

Control Flow Graph: This is the graphical way of representing all the possible ways in which the flow of a programme may proceed. Each node in the CFG represents a basic block, which has no branching or looping. The CFG gives the cyclomatic complexity, a count of the independent paths through the code that indicates how many opportunities for error there are. During dynamic analysis it also helps us to obtain exhaustive sets of test cases.

Source code:
    if (a > b) {
        nConsec = 0;
    } else {
        s1 = getHexChar(1);
        s2 = getHexChar(2);
    }
    return nConsec;
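As a small worked illustration of cyclomatic complexity, the CFG of the if/else example can be written down as an edge list and the usual formula M = E - N + 2P applied (E edges, N nodes, P connected components). This is only a sketch: the node numbering and the names example_cfg, cyclomatic_complexity and example_complexity are chosen here for illustration and do not come from any tool.

```c
#include <assert.h>

/* Cyclomatic complexity of a control flow graph: M = E - N + 2P. */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    assert(edges >= 0 && nodes > 0 && components > 0);
    return edges - nodes + 2 * components;
}

struct edge { int from, to; };

/* Basic blocks of the example:
 *   0: if (a > b)
 *   1: nConsec = 0
 *   2: s1 = getHexChar(1); s2 = getHexChar(2)
 *   3: return nConsec                                  */
static const struct edge example_cfg[] = {
    { 0, 1 }, { 0, 2 }, { 1, 3 }, { 2, 3 }
};
enum { EXAMPLE_NODES = 4 };

/* Four edges, four nodes and one connected component. */
int example_complexity(void)
{
    int edges = (int)(sizeof example_cfg / sizeof example_cfg[0]);
    return cyclomatic_complexity(edges, EXAMPLE_NODES, 1);
}
```

For this graph the result is 4 - 4 + 2 = 2, matching the two paths through the if/else.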

[Figures: parser output (parse tree); CFG builder output]

Data Flow Diagram: A data flow diagram shows all the possible paths of data input and output, and of data transfer between different entities within the software. It thus gives us the points where data should be validated and helps in setting up trust boundaries.

[Figure: CFG of the example, with nodes "if (a > b)", "nConsec = 0", "s1 = getHexChar(); s2 = getHexChar()" and "return nConsec"]

2.2.2 Perform Analysis

Analysis is performed on the tokens or on the nodes of the trees or graphs.

Lexical Analysis: This is the simplest of all the analysis techniques. It helps in checking syntactical errors and in most cases uses regular pattern matching. It is not useful for much more than detecting wrong identifier names or function names. Tools using lexical analysis techniques are ITS4, RATS, and Flawfinder.

Parse Tree and AST Analysis: These representations help us understand the semantics of the program. They therefore help in finding deviations from the rules of the grammar and from security rules, such as the rule that an if block should start and end with a curly bracket. Most modern compilers do this kind of checking, and a violation is reported as a parse error. Codan, the code analysis platform in CDT (the C/C++ Development Tools Eclipse plugin), uses the AST to build checkers. Similar platforms, PMD and Crystal (Eclipse plugins), use the AST for detecting errors in Java.

Control Flow Graph Analysis: Although the AST and parse trees appear useful enough for detecting most rule violations, they fail for rules that depend on branching in the code. As an example, an opened file or database should be closed once and only once along a control flow of a programme. The CFG is analysed in a number of stages, starting from the basic block, then a procedure (method or function), and then a bigger module such as a class. E.g. Fortify Source Code Analyzer, Klocwork.

Data Flow Diagram Analysis: A Data Flow Diagram (DFD) with security-specific annotations is used to describe how data enters, leaves, and traverses the system. It shows data sources and destinations, the relevant processes that data goes through, and the trust boundaries in the system. A DFD has a fixed set of component types: Process, HighLevelProcess, DataStore, and ExternalInteractor. A Process is the main concern of a DFD. A high-level program is represented by hierarchical, multi-stage DFDs. A DataStore may be a database, a file, or the Registry. An ExternalInteractor represents an entity that exists outside the system being modelled and that interacts with the system at an entry point, typically a human. The data flows are represented by arrows. A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary, to mark locations where privilege impersonation by an adversary could occur, a machine or process boundary may be crossed, etc.

Taint Analysis: The concept of tainting refers to marking data coming from an untrusted source as "tainted" and propagating its status to all locations where the data is used. A security policy specifies which uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy is an indication of a vulnerability. Tainted data should not be used in any function that modifies files, directories, or processes, or that executes external programs. If the rule is violated, the program should be aborted.

1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.

    1 unsigned int n;
    2 char src[10], dst[10];
    3 n = read_int();
    4 if (n <= sizeof(dst))
    5     memcpy(dst, src, n); /* n is <= sizeof(dst) */
    6 else
    7     memcpy(dst, src, n); /* n is > sizeof(dst) */

Using taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas in reality line 5 is a false positive. So taint analysis has a high probability of producing false positives.
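The five steps can be sketched in C over a deliberately tiny program model. Everything in this model is an assumption made for the illustration: the statement kinds, the struct stmt layout, the limit MAX_VARS and the function name taint_analyse are hypothetical, and the analysis ignores control flow, which is exactly why it reports both memcpy() calls of the example.

```c
#include <assert.h>
#include <stddef.h>

enum stmt_kind { STMT_SOURCE, STMT_ASSIGN, STMT_SINK };

/* Variables are numbered from 0 to MAX_VARS - 1. Each statement is one of
 *   SOURCE: dst = value read from an untrusted source
 *   ASSIGN: dst = expression using variables a and b
 *   SINK:   call of a potentially vulnerable function with argument a
 * Fields that a statement does not use are set to -1. */
struct stmt {
    enum stmt_kind kind;
    int dst, a, b;
    int line;
};

enum { MAX_VARS = 16 };

/* Stores the line numbers of reported sinks in reports and returns
 * how many were reported. */
size_t taint_analyse(const struct stmt *prog, size_t n,
                     int *reports, size_t max_reports)
{
    int tainted[MAX_VARS] = { 0 };              /* step 1: nothing is tainted */
    int changed = 1;
    size_t count = 0;

    assert(prog != NULL && reports != NULL);

    for (size_t i = 0; i < n; i++)              /* step 2: mark the sources */
        if (prog[i].kind == STMT_SOURCE)
            tainted[prog[i].dst] = 1;

    while (changed) {                           /* steps 3 and 4: fixed point */
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            const struct stmt *s = &prog[i];
            if (s->kind != STMT_ASSIGN || tainted[s->dst])
                continue;
            if ((s->a >= 0 && tainted[s->a]) || (s->b >= 0 && tainted[s->b])) {
                tainted[s->dst] = 1;
                changed = 1;
            }
        }
    }

    for (size_t i = 0; i < n; i++)              /* step 5: report tainted sinks */
        if (prog[i].kind == STMT_SINK && tainted[prog[i].a] && count < max_reports)
            reports[count++] = prog[i].line;
    return count;
}
```

Modelling the example as a SOURCE for n on line 3 and two SINK statements using n on lines 5 and 7 makes the function report both lines, including the false positive on line 5.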

Value Range Propagation: In this case the tainted variables also carry the range of their possible values. If a vulnerable function uses such a variable, it can be checked whether the value range of the variable still makes the call vulnerable. Thus we may avoid some false positives.

    1 unsigned int n;
    2 char src[10], dst[10];
    3 n = read_int();
    4 if (n <= sizeof(dst))
    5     memcpy(dst, src, n); /* n is <= sizeof(dst) */
    6 else
    7     memcpy(dst, src, n); /* n is > sizeof(dst) */
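How a value range removes the false positive on line 5 can be sketched in C with a closed interval that is narrowed on each branch of the if statement. The struct range type and the functions refine_le, refine_gt and copy_may_overflow are hypothetical names for this illustration; a real analyser would also have to handle empty ranges and many more operators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Closed interval [lo, hi] of the values a variable may hold. */
struct range { size_t lo, hi; };

/* Range of n on the branch where (n <= bound) is true. */
struct range refine_le(struct range r, size_t bound)
{
    if (r.hi > bound)
        r.hi = bound;
    return r;
}

/* Range of n on the branch where (n <= bound) is false, i.e. n > bound. */
struct range refine_gt(struct range r, size_t bound)
{
    assert(bound < SIZE_MAX);
    if (r.lo <= bound)
        r.lo = bound + 1;
    return r;
}

/* A copy of n bytes into a buffer of buf_size bytes is reported only
 * if some value in the range exceeds the buffer size. */
int copy_may_overflow(struct range n, size_t buf_size)
{
    return n.hi > buf_size;
}
```

Starting from the full range of an unsigned int, the branch of line 5 narrows n to at most 10, so copy_may_overflow is false there, while on the branch of line 7 the range starts at 11 and the call is still reported.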

2.2.2.1 Security Knowledge

The main logic behind these tools is to learn from past mistakes and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)

2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)

3) SAMATE group at NIST (http://samate.nist.gov)

In these collections of numerous errors there are patterns and recurring errors, so the most generic problems can be categorized.

2.2.3 Presenting and Processing Results

The security vulnerabilities are reviewed manually and then fixed. In some cases the analysers themselves suggest a solution to the problem. The problems are reported in different categories: error (severe threat), warning (may or may not be a security bug, but obeying it is good practice), and info (good coding practice, but no threat).

3. Ways of Implementing Static Analysis Tools

Application of Static Analysis Tools in the Practical World

I. Integration with compilers: Static analysis tools are part and parcel of modern compilers. They do all the basic checks such as type checking, style checking, parse errors, identifiers never used, and many others. But this needs compilation of the whole programme.

E.g. gcc compiler (C language).

2Static Analysis 21 Definition Static analysis is analysing the source

code of software without executing it

22 Working Procedure It is divided in four steps

1Build Model

2 Perform Analysis Performing analysis needs another basic step of gathering security knowledge 21Security Knowledge 3Present Results

8

1)Lexed Tokens 2)Parse Tree 3)Abstract Syntax Tree 4)Control Flow Graph 5)Data Flow Diagram

1)Lexical analysis 2)Parse Tree and AST Analysis 3) Control Flow Graph Analysis 4)Data Flow Diagram Analysis 5)Taint Analysis 6)Value Range Propagation

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb Project (httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project) 3) SAMATE group at NIST ( httpsamatenistgov )

1)error (severe threat) 2)warning (may or may not be a security bug but obeying it is good practice) 3)info(good coding practice but no threat

9

221 Build model In analysis to understand the code by

analysis tools it needs to be represented by data structures that most nearly represents the property to be analysed Those basic data structures are actually build by compilers and static analysis tools borrow them and Those data-structures are lexer tokens parse tree abstract syntax tree (AST) control flow graph(CFG) dataflow diagram(DFD) This models are build by compilers or static analysis tools or by both

Models Used in Analysis

Lexed Tokens The source code converted into a token stream

discarding unimportant whitespaces and comments Eg Source Code if (ret) mat[x][y] = END_VAL This code produces the following sequence of tokens Lexer Output IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI Some of the token needs extra one property like name for identifier(ID) These token stream is subsequently used in making parse tree

Parse Tree A language parser uses a context-free grammar (CFG) to match

the token stream The grammar consists of a set of productions that describe the symbols (elements) in the language The parser performs a derivation by matching the token stream against the production rules If each symbol is connected to the symbol from which it was derived a parse tree is formed

10

Control Flow Graph It is the graphical way of representing all

possible way the flow of programme may occur Each node in CFG represents a basic block that has no branching or looping CFG gives the idea of Cyclomatic complexity that directly shows the no of possibility of errors During dynamic analysis it also helps us to get exhaustive sets of test cases

Source Code if (a gt b) nConsec = 0 else s1 = getHexChar(1) s2 = getHexChar(2) return nConsec

11

Parser Output Parse Tree

CFG Builder output

Data Flow Diagram Data Flow Diagram shows all the possible path of

data in put and output and data transfer between different entities within the software thus giving us the points where data should be validated and setting up trust boundaries

12

If(agtb)

nConsec=0 s1 = getHexChar() s2= getHexChar()

return nConsec

222 Perform Analysis Analysis is performed on the tokens or nodes of tree

or graphs Lexical analysis Simplest of all analysis techniques helps in

checking syntactical errors and it uses in most cases regular pattern matching Not much useful than detecting wrong identifier names or function names Tools using lexical analysis techniques are ITS4 RATS and Flawfinder

Parse Tree and AST Analysis These representations helps us

understanding of semantics of the program So helps us in finding deviation of rules of grammar and security rules like a if block should start and end with curly bracket Most modern compilers does these kind of checking and violation comes as a parse error Codan the code analysis platform in CDT(CC+ Development Tools eclipse plugin ) uses AST to built checkers Similar Platform PMD Crystal (eclipse plugins) uses AST for detecting errors in Java

Control Flow Graph Analysis AST and parse trees though

appear useful enough in detecting most of rule violations they fail in case of rules that apply for branching in code As example we may take that opened file or database should be closed once and only once in a flow control of a programme CFG is analysed in a number of stages starting from basic block and then a procedure( method or function) and then to a bigger module like class Fortify Source Code Analyser Klockwork

Data Flow Diagram Analysis A Data Flow Diagram (DFD) with

security-specific annotations is used to describe how data enters leaves and traverses the system it shows data sources and destinations relevant processes that data goes through and trust boundaries in the system A DFD has a fixed set of component types Process HighLevelProcess Data Store and External Interactor A process is concern of DFD diagram A High Level program is represented by hierarchical multistage DFDs A datastore may be a database a file or the Registry An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point typically a human The data flows are represented by arrows A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary to describe locations where a privilege impersonation on the part of an adversary could occur a machine or process boundary may be crossed etc

13

Taint AnalysisThe concept of tainting refers to marking data coming

from an untrusted source as ldquotaintedrdquo and propagating its status to all locations where the data is used A security policy specifies what uses of untrusted data are allowed or restricted An attempt to use tainted data is a violation of this policy is an indication of a vulnerability Tainted data should not be used in any function which modifies files directories and processes or executes external programs If the rule is violated then the program should be aborted

1 Initialize all variables as NOT TAINTED 2 Find all calls to functions that read data from an untrusted source Mark the values returned by these functions as TAINTED 3 Propagate the tainted values through the program If a tainted value is used in an expression mark the result of the expression as TAINTED 4 Repeat step 3 until a fixed point is reached 5 Find all calls to potentially vulnerable functions If one of their arguments is tainted report this as a vulnerability 1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

Using taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas in fact line 5 is a false positive. Taint analysis therefore has a high likelihood of producing false positives.
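The five steps above can be sketched in C over a toy program model. This is a minimal, flow-insensitive sketch under stated assumptions, not the implementation of any particular tool: the statement kinds, `struct stmt`, and `taint_analyse` are hypothetical names introduced only for illustration, and variables are identified by small integer indices.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds of a toy program model (assumption). */
enum stmt_kind { STMT_SOURCE, STMT_ASSIGN, STMT_SINK };

struct stmt {
    enum stmt_kind kind;
    int dst;   /* variable written (SOURCE, ASSIGN); -1 for SINK */
    int src1;  /* first variable read, or -1 if none */
    int src2;  /* second variable read, or -1 if none */
};

#define MAX_VARS 16

/* Runs steps 1-5 and returns the number of sinks that receive tainted data.
 * flagged[i] is set to true when statement i is a reported sink. */
int taint_analyse(const struct stmt *prog, size_t n, bool *flagged)
{
    bool tainted[MAX_VARS] = { false };   /* step 1: nothing is tainted */
    bool changed = true;
    int reports = 0;

    /* step 2: values returned by untrusted sources are tainted */
    for (size_t i = 0; i < n; i++)
        if (prog[i].kind == STMT_SOURCE)
            tainted[prog[i].dst] = true;

    /* steps 3 and 4: propagate until nothing changes (fixed point) */
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            const struct stmt *s = &prog[i];
            if (s->kind != STMT_ASSIGN || tainted[s->dst])
                continue;
            if ((s->src1 >= 0 && tainted[s->src1]) ||
                (s->src2 >= 0 && tainted[s->src2])) {
                tainted[s->dst] = true;
                changed = true;
            }
        }
    }

    /* step 5: report every sink that has a tainted argument */
    for (size_t i = 0; i < n; i++) {
        flagged[i] = false;
        if (prog[i].kind == STMT_SINK &&
            ((prog[i].src1 >= 0 && tainted[prog[i].src1]) ||
             (prog[i].src2 >= 0 && tainted[prog[i].src2]))) {
            flagged[i] = true;
            reports++;
        }
    }
    return reports;
}
```

Modelling the example as one source (n = read_int()) followed by two sinks that use n reports both memcpy() calls, because the sketch ignores the condition on line 4. That is exactly the false positive described above.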

Value Range Propagation: Here each tainted variable also carries the range of its possible values. When a vulnerable function uses that variable, the analysis can check whether the value range still makes the call vulnerable. In the example above, n lies in [0, 10] on line 5, so that call is safe, while on line 7 n is greater than sizeof(dst) and the call is reported. Thus we may avoid some false positives.
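A minimal sketch of this idea in C follows, assuming closed intervals over unsigned values. The names `struct range`, `range_unknown`, `narrow_le`, `narrow_gt`, and `copy_may_overflow` are hypothetical and exist only for this illustration; a real analyser would also handle empty ranges, signed values, and arithmetic on ranges.

```c
#include <stdbool.h>
#include <limits.h>

/* Closed interval [lo, hi] of values an unsigned variable may hold. */
struct range { unsigned int lo, hi; };

/* Range of an untrusted unsigned int: any value is possible. */
struct range range_unknown(void)
{
    struct range r = { 0u, UINT_MAX };
    return r;
}

/* Range on the branch where "v <= bound" holds. */
struct range narrow_le(struct range r, unsigned int bound)
{
    if (r.hi > bound)
        r.hi = bound;
    return r;
}

/* Range on the branch where "v <= bound" is false, i.e. v > bound.
 * Assumes bound < UINT_MAX so that bound + 1 does not wrap around. */
struct range narrow_gt(struct range r, unsigned int bound)
{
    if (r.lo < bound + 1u)
        r.lo = bound + 1u;
    return r;
}

/* A copy of n bytes into a buffer of buf_size bytes can overflow
 * only if some value in the range of n exceeds buf_size. */
bool copy_may_overflow(struct range n, unsigned int buf_size)
{
    return n.hi > buf_size;
}
```

With a 10-byte destination, the range narrowed by the true branch of n <= 10 is not reported, while the range on the else branch is, so only line 7 of the example remains flagged.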


2.2.2.1 Security Knowledge
The main logic behind these tools is to learn from past mistakes, and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are several collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)

2) OWASP Honeycomb Project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)

3) SAMATE group at NIST (http://samate.nist.gov)

These collections of numerous errors contain recurring patterns, so the most generic problems can be categorized.

2.2.3 Presenting and Processing Results
The reported security vulnerabilities are reviewed manually and then fixed. In some cases the analyser itself suggests a solution for the problem. Problems are reported in different categories:
1) error (severe threat)
2) warning (may or may not be a security bug, but obeying it is good practice)
3) info (good coding practice, but no threat)

3 Ways of Implementing Static Analysis Tools
Application of Static Analysis Tools in the Practical World

I. Integration with compilers: Static analysis is part and parcel of modern compilers. They perform all the basic checks, such as type checking, style checking, parse errors, identifiers that are never used, and many others. However, this requires compiling the whole program.
E.g. the gcc compiler (C language)

II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins that show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes.
E.g. Eclipse, NetBeans

III. Stand-alone platforms: Tools of this kind are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses.
E.g. Fortify Source Code Analyzer, Klocwork, Ounce

3.1 My Static Analysis Tool Implementation
I have built static analysis tools for an IDE (Integrated Development Environment). The IDE chosen is Eclipse, one of the platforms most used by software developers. These tools are integrated into Eclipse as plugins and run in the background to find errors in code.

4 Implementation Tools
4.1 Hardware Details
Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS
4.2 Software Details
Operating system: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE
Eclipse is one of the most widely used IDEs for Java, but it also provides tools for building software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)
Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis of the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences), in which:
• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.

• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)
To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen in reference 4. The basic steps of plugin development are:

1. First go to File > New > Project > Plug-in Project.
2. Then edit MANIFEST.MF in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right click > New > problem. Set there the message that should be shown when the error occurs, whether it is enabled by default, etc.
• On the Overview page, in the Exporting part, click Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• Finally, in the Export Wizard portion, under Archive file, give the name of your plugin.
• This zip archive may now be placed in the Eclipse plugins folder to make the checker permanent in Codan.
3. To test the plugin, run it; another Eclipse window opens. Write faulty code that your checker is supposed to catch. You can see the error and its messages in the editor.

5 Implementation and Results
The C language (and therefore also C++) has many vulnerable library functions which may be used to crash a program if their arguments are unchecked and unsanitized. We may therefore use checkers that make the programmer aware of such vulnerable functions, or that verify that the argument values pose no potential threat.

Codan Plugins Developed
I. For the C function char *strncpy(char *dst, const char *src, size_t n)
It is erroneous to pass a value of n greater than or equal to the allocated size of the destination (dst): a larger n overflows the buffer, and an equal n may leave dst without a terminating null character. This must be checked wherever this vulnerable function is used. It is an example of buffer overflow.
Algorithm Used
1) First find a call to strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the string values of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.

2) Next, get the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements, find the declaration of the destination character pointer, and read the space allocated.

3) The last step is to compare the allocated space of the destination character pointer with the value of n.
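The rule that this checker enforces can be illustrated from the C side. The following is a minimal sketch of a call pattern that satisfies it; `bounded_copy` is a hypothetical helper written for this illustration and is not part of the plugin.

```c
#include <string.h>

/* Copies src into dst (capacity dst_size) so that n is strictly less than
 * the destination size and the result is always null-terminated.
 * Returns 0 on success, -1 if src was truncated or dst_size is 0. */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    strncpy(dst, src, dst_size - 1);   /* n < size of dst */
    dst[dst_size - 1] = '\0';          /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size ? 0 : -1;
}
```

A call such as strncpy(dst, src, sizeof(dst)) would be flagged by the rule, because n equals the size of the destination and no room is left for the terminating null character.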

Figure: source code and its AST

Limitations and Inefficiency of the Checker
1) In the 2nd step of the algorithm, the size of the space allocated to the destination is determined by navigating to a fixed node position. This method is fragile, because space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(stream, "r") and fopen(stream, "w")
When a file is opened in "r" mode, the file should already be present at the location named by stream, and when a file is opened in write mode we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks these conditions.

Algorithm Used
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing their nodes, get the name of the stream and the mode of opening.
2) Then search all the ICPPASTIfStatements for the function that checks the conditions given above. In the read case this is an access() call, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
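The kind of guard this checker looks for can be sketched as follows. This is only an illustration under assumptions: `open_existing_for_read` and `open_new_for_write` are hypothetical wrapper names, and access() with F_OK is taken from POSIX <unistd.h> (MinGW also declares access() in <io.h>).

```c
#include <stdio.h>
#include <unistd.h>   /* access(), F_OK (POSIX) */

/* Read mode: refuse to open when the file does not exist. */
FILE *open_existing_for_read(const char *path)
{
    if (access(path, F_OK) != 0)   /* file is missing */
        return NULL;
    return fopen(path, "r");
}

/* Write mode: refuse to open when an existing file would be overwritten. */
FILE *open_new_for_write(const char *path)
{
    if (access(path, F_OK) == 0)   /* file already exists */
        return NULL;
    return fopen(path, "w");
}
```

Note that checking with access() and then calling fopen() leaves a window in which the file can change between the two calls, so this guard serves as a programmer warning rather than a complete protection.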


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]
All these functions take a format string followed by the arguments named in the format string. An error may occur in three cases:
1) The number of format specifiers in the format string is not equal to the number of arguments present.
2) There is no format string.
3) A format specifier and the corresponding argument indicate two different types.
Algorithm
1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time, look at each IASTFunctionCallExpression. If the called function is printf or scanf, which can be known from its first node, note its total number of nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on the string form of this node, excluding the first and last characters (the quotes). The number of substrings matching a java.util.regex.Pattern for a format specifier (flags from [-+0], width and precision digits [0-9], a length modifier from [hlL], and a conversion character from [cdieEfgGosuxXp]) is counted. If it is equal to (x-2) the call is okay; otherwise it is an error.
3) To check the third case, every format specifier has to be compared with the type of the corresponding variable.
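Step 2 can be sketched in C with a hand-written scanner in place of the Java regular expression used by the plugin. This is an assumption-laden illustration: `count_specifiers` and `call_is_consistent` are hypothetical names, and the scanner covers only the flags, width, precision, length modifiers, and conversion characters listed above (no `*` width and no `ll`).

```c
#include <string.h>

/* Counts conversion specifiers in a printf-style format string.
 * "%%" is a literal percent sign and is not counted. Recognised shape:
 * % [flags -+0 #] [width] [.precision] [h|l|L] conversion. */
int count_specifiers(const char *fmt)
{
    int count = 0;
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {                               /* literal "%%" */
            fmt++;
            continue;
        }
        while (*fmt != '\0' && strchr("-+0 #", *fmt))    /* flags */
            fmt++;
        while (*fmt >= '0' && *fmt <= '9')               /* width */
            fmt++;
        if (*fmt == '.') {                               /* precision */
            fmt++;
            while (*fmt >= '0' && *fmt <= '9')
                fmt++;
        }
        if (*fmt != '\0' && strchr("hlL", *fmt))         /* length modifier */
            fmt++;
        if (*fmt != '\0' && strchr("cdieEfgGosuxXp", *fmt)) {
            count++;                                     /* valid conversion */
            fmt++;
        }
    }
    return count;
}

/* A call with x nodes (function name, format string, arguments) is
 * consistent when the number of specifiers equals x - 2. */
int call_is_consistent(const char *fmt, int x)
{
    return count_specifiers(fmt) == x - 2;
}
```

For example, a call printf("%d %d", a) has x = 3 nodes but two specifiers, so it is reported, while printf("%s=%ld", name, value) with x = 4 passes the check.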


6 Future Work Scope
1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access when needed. This may help in many problems, e.g. checking that all the possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
i) using a variable without initialization
ii) using a variable after freeing its space
3) Building static analysis checkers for other languages.

7 Limitation of the Project
1) To build static analysers there needs to be a platform; otherwise static analysis tool builders have to build their own suitable compiler.
2) A checker platform is needed through which we can visit all the nodes of the CFG and analyse them individually, so that we can solve problems that require an understanding of the CFG, e.g. a file or database that is opened should be closed once and only once on every flow path of the CFG from start to end.

8 Conclusion
Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem. The only way to do this is dynamic analysis.

9 References
1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT


2.2.1 Build Model
For analysis tools to understand the code, it needs to be represented by data structures that most closely capture the property to be analysed. These basic data structures are actually built by compilers, and static analysis tools borrow them. The data structures are lexer tokens, the parse tree, the abstract syntax tree (AST), the control flow graph (CFG), and the data flow diagram (DFD). These models are built by compilers, by static analysis tools, or by both.

Models Used in Analysis

Lexed Tokens: The source code is converted into a token stream, discarding unimportant whitespace and comments. E.g.
Source code: if (ret) mat[x][y] = END_VAL;
This code produces the following sequence of tokens.
Lexer output: IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI
Some tokens need one extra property, such as the name for an identifier (ID). This token stream is subsequently used to build the parse tree.
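A minimal sketch of such a lexer in C is given below. It handles only the token kinds that appear in the example, and `lex` is a hypothetical function written for this illustration; a real lexer would be generated from the full lexical grammar of the language.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes the token stream of src into out (capacity cap), with tokens
 * separated by single spaces. Returns 0 on success, -1 on an unknown
 * character, an over-long identifier, or when out is too small. */
int lex(const char *src, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    while (*src != '\0') {
        char tok[64];
        size_t tok_len;

        if (isspace((unsigned char)*src)) {        /* whitespace is discarded */
            src++;
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            size_t len = 0;
            while (isalnum((unsigned char)src[len]) || src[len] == '_')
                len++;
            if (len > 50)
                return -1;
            if (len == 2 && strncmp(src, "if", 2) == 0)
                snprintf(tok, sizeof tok, "IF");   /* keyword */
            else
                snprintf(tok, sizeof tok, "ID(%.*s)", (int)len, src);
            src += len;
        } else {
            const char *name;
            switch (*src) {
            case '(': name = "LPAREN";   break;
            case ')': name = "RPAREN";   break;
            case '[': name = "LBRACKET"; break;
            case ']': name = "RBRACKET"; break;
            case '=': name = "EQUAL";    break;
            case ';': name = "SEMI";     break;
            default:  return -1;
            }
            snprintf(tok, sizeof tok, "%s", name);
            src++;
        }
        /* append the token, preceded by a space after the first one */
        tok_len = strlen(tok);
        if (used + tok_len + 2 > cap)
            return -1;
        if (used > 0)
            out[used++] = ' ';
        strcpy(out + used, tok);
        used += tok_len;
    }
    return 0;
}
```

Calling lex() on the example statement fills the output buffer with the token sequence shown above.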

Parse Tree: A language parser uses a context-free grammar to match the token stream. The grammar consists of a set of productions that describe the symbols (elements) in the language. The parser performs a derivation by matching the token stream against the production rules. If each symbol is connected to the symbol from which it was derived, a parse tree is formed.

Control Flow Graph: This is a graphical representation of all the paths that the flow of a program may take. Each node in a CFG represents a basic block, which contains no branching or looping. The CFG also gives the cyclomatic complexity, which counts the independent paths through the code and so indicates how many places errors may occur in. During dynamic analysis it also helps us to derive exhaustive sets of test cases.

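Source code:
if (a > b) {
    nConsec = 0;
} else {
    s1 = getHexChar(1);
    s2 = getHexChar(2);
}
return nConsec;

Parser output (figure): parse tree

CFG builder output (figure): the block if (a > b) branches to the block nConsec = 0 and to the block s1 = getHexChar(1); s2 = getHexChar(2); both blocks lead to the block return nConsec.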
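As a worked illustration, cyclomatic complexity is commonly computed as M = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components of the CFG. The small C helper below is a hypothetical sketch of that formula, not part of the project.

```c
/* Cyclomatic complexity M = E - N + 2P for a control flow graph with
 * E edges, N nodes and P connected components (P = 1 for one function). */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}
```

The CFG of the source code above has four basic blocks (the condition, the two branches, and the return) and four edges, so M = 4 - 4 + 2 = 2, which matches the two paths through the if/else.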

Data Flow Diagram: A data flow diagram shows all the possible paths of data input and output, and of data transfer between different entities within the software. It thus gives us the points where data should be validated and where trust boundaries should be set up.


2.2.2 Perform Analysis
Analysis is performed on the tokens or on the nodes of the tree or graphs.

Lexical Analysis: The simplest of all analysis techniques. It helps in checking syntactical errors and in most cases uses regular pattern matching. It is of little use beyond detecting wrong identifier names or function names. Tools using lexical analysis techniques include ITS4, RATS, and Flawfinder.

Parse Tree and AST Analysis: These representations help us understand the semantics of the program. They therefore help in finding deviations from the rules of the grammar and from security rules, such as "an if block should start and end with a curly bracket". Most modern compilers perform this kind of checking, and a violation is reported as a parse error. Codan, the code analysis platform in CDT (C/C++ Development Tools, an Eclipse plugin), uses the AST to build checkers. Similar platforms, PMD and Crystal (Eclipse plugins), use the AST for detecting errors in Java.

Control Flow Graph Analysis: Although ASTs and parse trees appear useful enough for detecting most rule violations, they fail for rules that depend on branching in the code. As an example, an opened file or database should be closed once and only once along a control flow of a program. A CFG is analysed in a number of stages, starting from the basic block, then a procedure (method or function), and then a bigger module such as a class. E.g. Fortify Source Code Analyzer, Klocwork.
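The "closed once and only once" rule can be sketched as a walk over every path of a small CFG. The code below is a minimal illustration that assumes an acyclic graph with at most two successors per block; `struct block`, `enum action`, and `paths_ok` are hypothetical names, and production tools use data-flow analysis rather than explicit path enumeration.

```c
#include <stdbool.h>

/* What a basic block does to the file being tracked (assumption). */
enum action { ACT_NONE, ACT_OPEN, ACT_CLOSE };

#define MAX_SUCC 2

struct block {
    enum action act;       /* effect of this basic block on the file */
    int succ[MAX_SUCC];    /* successor block indices, -1 if unused */
};

/* Depth-first walk over an acyclic CFG, starting at block "node".
 * is_open tracks whether the file is open on the current path.
 * Returns false as soon as some path opens the file twice, closes it
 * while it is not open, or reaches an exit block with the file open. */
bool paths_ok(const struct block *cfg, int node, bool is_open)
{
    const struct block *b = &cfg[node];
    bool has_succ = false;

    if (b->act == ACT_OPEN) {
        if (is_open)
            return false;              /* opened twice */
        is_open = true;
    } else if (b->act == ACT_CLOSE) {
        if (!is_open)
            return false;              /* closed twice or never opened */
        is_open = false;
    }

    for (int i = 0; i < MAX_SUCC; i++) {
        if (b->succ[i] < 0)
            continue;
        has_succ = true;
        if (!paths_ok(cfg, b->succ[i], is_open))
            return false;
    }
    /* an exit block (no successors) must not leave the file open */
    return has_succ || !is_open;
}
```

A graph in which only one branch of an if/else closes the file fails the check, which is the kind of violation an AST-only checker cannot see.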


221 Build model In analysis to understand the code by

analysis tools it needs to be represented by data structures that most nearly represents the property to be analysed Those basic data structures are actually build by compilers and static analysis tools borrow them and Those data-structures are lexer tokens parse tree abstract syntax tree (AST) control flow graph(CFG) dataflow diagram(DFD) This models are build by compilers or static analysis tools or by both

Models Used in Analysis

Lexed Tokens The source code converted into a token stream

discarding unimportant whitespaces and comments Eg Source Code if (ret) mat[x][y] = END_VAL This code produces the following sequence of tokens Lexer Output IF LPAREN ID(ret) RPAREN ID(mat) LBRACKET ID(x) RBRACKET LBRACKET ID(y) RBRACKET EQUAL ID(END_VAL) SEMI Some of the token needs extra one property like name for identifier(ID) These token stream is subsequently used in making parse tree

Parse Tree A language parser uses a context-free grammar (CFG) to match

the token stream The grammar consists of a set of productions that describe the symbols (elements) in the language The parser performs a derivation by matching the token stream against the production rules If each symbol is connected to the symbol from which it was derived a parse tree is formed

10

Control Flow Graph It is the graphical way of representing all

possible way the flow of programme may occur Each node in CFG represents a basic block that has no branching or looping CFG gives the idea of Cyclomatic complexity that directly shows the no of possibility of errors During dynamic analysis it also helps us to get exhaustive sets of test cases

Source Code if (a gt b) nConsec = 0 else s1 = getHexChar(1) s2 = getHexChar(2) return nConsec

11

Parser Output Parse Tree

CFG Builder output

Data Flow Diagram Data Flow Diagram shows all the possible path of

data in put and output and data transfer between different entities within the software thus giving us the points where data should be validated and setting up trust boundaries

12

If(agtb)

nConsec=0 s1 = getHexChar() s2= getHexChar()

return nConsec

222 Perform Analysis Analysis is performed on the tokens or nodes of tree

or graphs Lexical analysis Simplest of all analysis techniques helps in

checking syntactical errors and it uses in most cases regular pattern matching Not much useful than detecting wrong identifier names or function names Tools using lexical analysis techniques are ITS4 RATS and Flawfinder

Parse Tree and AST Analysis These representations helps us

understanding of semantics of the program So helps us in finding deviation of rules of grammar and security rules like a if block should start and end with curly bracket Most modern compilers does these kind of checking and violation comes as a parse error Codan the code analysis platform in CDT(CC+ Development Tools eclipse plugin ) uses AST to built checkers Similar Platform PMD Crystal (eclipse plugins) uses AST for detecting errors in Java

Control Flow Graph Analysis: Although ASTs and parse trees are useful enough for detecting most rule violations, they fail for rules that depend on branching in the code. As an example, an opened file or database should be closed once and only once along any control flow path of a program. A CFG is analysed in a number of stages, starting from the basic block, then a procedure (method or function), and then a bigger module such as a class. Tools that use this technique include Fortify Source Code Analyzer and Klocwork.
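The "closed once and only once" rule can be checked by walking every path of the CFG and tracking whether the resource is open. The sketch below assumes a small acyclic CFG in which each basic block either opens the resource, closes it, or leaves it alone. The data structure and names are hypothetical and are not taken from Fortify, Klocwork or Codan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUCC 2

enum block_action { ACT_NONE, ACT_OPEN, ACT_CLOSE };

/* A basic block: what it does to the resource and where control goes next. */
struct basic_block {
    enum block_action action;
    int succ[MAX_SUCC];   /* indices of successor blocks, -1 if unused */
};

/*
 * Depth-first walk over every path from block `node` to an exit block
 * (a block without successors). Returns true only if, on every path, the
 * resource is never opened while already open, never closed while not
 * open, and is closed when the exit is reached. Assumes an acyclic CFG.
 */
static bool paths_ok(const struct basic_block *cfg, int node, bool is_open)
{
    const struct basic_block *b = &cfg[node];

    if (b->action == ACT_OPEN) {
        if (is_open)
            return false;               /* opened twice */
        is_open = true;
    } else if (b->action == ACT_CLOSE) {
        if (!is_open)
            return false;               /* closed twice or never opened */
        is_open = false;
    }

    bool has_succ = false;
    for (int i = 0; i < MAX_SUCC; i++) {
        if (b->succ[i] < 0)
            continue;
        has_succ = true;
        if (!paths_ok(cfg, b->succ[i], is_open))
            return false;
    }
    return has_succ ? true : !is_open;  /* at an exit it must be closed */
}

bool closed_exactly_once(const struct basic_block *cfg, int entry)
{
    assert(cfg != NULL && entry >= 0);
    return paths_ok(cfg, entry, false);
}
```

A CFG in which one branch closes the file and the other does not is rejected, even though an AST check would see a matching pair of open and close calls somewhere in the function. Loops would need a fixed-point formulation instead of this simple path enumeration.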

Data Flow Diagram Analysis: A Data Flow Diagram (DFD) with security-specific annotations is used to describe how data enters, leaves and traverses the system. It shows data sources and destinations, the relevant processes that data goes through, and the trust boundaries in the system. A DFD has a fixed set of component types: Process, HighLevelProcess, DataStore and ExternalInteractor. A Process is the central concern of a DFD. A HighLevelProcess is represented by hierarchical, multi-stage DFDs. A DataStore may be a database, a file or the Registry. An ExternalInteractor represents an entity that exists outside the system being modelled and that interacts with the system at an entry point, typically a human. The data flows are represented by arrows. A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary, to describe locations where a privilege impersonation on the part of an adversary could occur, where a machine or process boundary may be crossed, etc.


Taint Analysis: The concept of tainting refers to marking data that comes from an untrusted source as "tainted" and propagating this status to all locations where the data is used. A security policy specifies which uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy is an indication of a vulnerability. Tainted data should not be used in any function that modifies files, directories or processes, or that executes external programs. If the rule is violated, the program should be aborted.

The analysis proceeds as follows:

1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.

Example:

    1  unsigned int n;
    2  char src[10], dst[10];
    3  n = read_int();
    4  if (n <= sizeof(dst))
    5      memcpy(dst, src, n);   /* n is <= sizeof(dst) */
    6  else
    7      memcpy(dst, src, n);   /* n is > sizeof(dst) */

Using taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas in fact line 5 is a false positive. Taint analysis therefore has a high likelihood of producing false positives.
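The five steps above can be sketched in C for a deliberately simplified program model in which every statement is an assignment of the form dst = f(src1, src2). The model, the fixed variable limit and all names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16

/* A simplified statement `dst = f(src1, src2)`; use -1 for an unused source. */
struct assignment {
    int dst;
    int src1;
    int src2;
};

/*
 * Steps 1 to 4: `tainted` must start as all false (step 1) except for the
 * variables marked in step 2 (values read from untrusted sources). Taint is
 * then propagated through the assignments until nothing changes.
 */
void propagate_taint(const struct assignment *stmts, size_t count, bool tainted[MAX_VARS])
{
    assert(stmts != NULL || count == 0);
    bool changed = true;

    while (changed) {                       /* step 4: repeat to a fixed point */
        changed = false;
        for (size_t i = 0; i < count; i++) {
            const struct assignment *s = &stmts[i];
            bool from_tainted = (s->src1 >= 0 && tainted[s->src1]) ||
                                (s->src2 >= 0 && tainted[s->src2]);
            if (from_tainted && !tainted[s->dst]) {
                tainted[s->dst] = true;     /* step 3: result becomes tainted */
                changed = true;
            }
        }
    }
}

/* Step 5: a call to a vulnerable function is reported if its argument is tainted. */
bool is_reported(const bool tainted[MAX_VARS], int argument_var)
{
    assert(argument_var >= 0 && argument_var < MAX_VARS);
    return tainted[argument_var];
}
```

The loop makes the result independent of statement order: a variable that depends on a value tainted later in the list is picked up on the next pass. Branch conditions are ignored entirely, which is why both memcpy() calls in the example are reported.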

Value Range Propagation: In this case each tainted variable also carries the range of its possible values. If a vulnerable function uses that variable, we can check whether the value range of the variable still makes the call vulnerable. In this way we may avoid some false positives.


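Applied to the same example:

    1  unsigned int n;
    2  char src[10], dst[10];
    3  n = read_int();
    4  if (n <= sizeof(dst))
    5      memcpy(dst, src, n);   /* n is <= sizeof(dst) */
    6  else
    7      memcpy(dst, src, n);   /* n is > sizeof(dst) */

On line 5 the range of n is bounded by sizeof(dst), so the call is safe and only line 7 is reported.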
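A minimal sketch of the idea in C follows: the range of n is narrowed on each branch of the comparison and the copy is flagged only if the narrowed range can still exceed the buffer size. The interval type and function names are hypothetical, and empty ranges are not handled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Closed interval [lo, hi] of the values an unsigned variable may hold. */
struct range {
    unsigned int lo;
    unsigned int hi;
};

/* Range of the variable on the branch where `variable <= bound` holds. */
struct range refine_le(struct range r, unsigned int bound)
{
    if (r.hi > bound)
        r.hi = bound;
    return r;
}

/* Range of the variable on the branch where `variable > bound` holds. */
struct range refine_gt(struct range r, unsigned int bound)
{
    assert(bound < UINT_MAX);   /* otherwise the branch is unreachable */
    if (r.lo <= bound)
        r.lo = bound + 1;
    return r;
}

/* A copy of n bytes can overflow the buffer if some possible n exceeds its size. */
bool may_overflow(struct range n, size_t buffer_size)
{
    return n.hi > buffer_size;
}
```

Starting from the full range [0, UINT_MAX] for the value returned by read_int(), the "then" branch of the example narrows n to [0, 10] and the "else" branch to [11, UINT_MAX], so only the second memcpy() is flagged.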

2.2.2.1 Security Knowledge

The main logic behind these tools is to learn from our past mistakes and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)

2) OWASP Honeycomb project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)

3) SAMATE group at NIST (http://samate.nist.gov)

These collections of numerous errors contain patterns and recurring errors, so the most generic problems can be categorized.


2.2.3 Presenting and Processing Results

The reported security vulnerabilities are reviewed manually and then fixed. In some cases the analyser itself suggests a solution to the problem. The problems are reported in different categories such as error (severe threat), warning (may or may not be a security bug, but obeying it is good practice) and info (good coding practice, but no threat).

3 Ways of Implementing Static Analysis Tools

Application of Static Analysis Tools in the Practical World

I. Integration with compilers: Static analysis tools are part and parcel of modern compilers. They perform all the basic checks such as type checking, style checking, parse errors, identifiers never used and many others. However, this needs compilation of the whole program.

E.g. the gcc compiler (C language)

II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins to show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes.

E.g. Eclipse, NetBeans

III. Stand-alone platforms: Tools of this kind are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses.

E.g. Fortify Source Code Analyzer, Klocwork, Ounce

3.1 My Static Analysis Tools Implementation

I have built static analysis tools for an IDE (Integrated Development Environment). The IDE chosen is Eclipse, one of the platforms most used by software developers. These tools are integrated into Eclipse as plugins and run in the background to find errors in code.


4 Implementation Tools

4.1 Hardware Details

Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS

4.2 Software Details

Operating system: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE

Eclipse is one of the most used IDEs for Java, but it also provides tools for building software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)

Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis of the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences). In the Profile Editor:

• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.


• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)

To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen from reference 4. The basic steps of plugin development are:

1. Go to File > New Project > Plugin Project.
2. Edit the MANIFEST.MF file in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right click > New > problem. Set there the message that should be shown when the error occurs, the default enablement, etc.
• On the Overview page, in the Exporting part, click on Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• Finally, in the Export Wizard portion, under Archive file, give the name of your plugin.
• This zip file may now be placed in the Eclipse plugins folder to make the checker permanent in Codan.
3. To test the plugin, run it; another Eclipse window opens. Write faulty code that your checker is supposed to catch. You can see the errors and messages in the editor.


5 Implementation and Results

The C language (and so also C++) has many vulnerable library functions that may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the values of the arguments passed pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to pass a value of n greater than or equal to the size allocated for the destination (dst): a larger n can overflow the buffer, and an equal n can leave dst without a terminating null character. So n must be checked wherever this vulnerable function is used. It is an example of buffer overflow.

Algorithm Used

1) First find a call to the function strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the string value of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.

2) Next we need the space allocated for the destination character pointer. For that we visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated to it.

3) The last step is to compare the allocated space of the destination character pointer with the value of n.
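The comparison in step 3, and the kind of call that passes it, can be illustrated in C. The checker itself is a Java plugin, so the code below is only a sketch of the rule it enforces; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The rule from step 3: a call is flagged when n is not smaller than the
 * space allocated for the destination. */
bool strncpy_call_is_flagged(size_t dst_size, size_t n)
{
    return n >= dst_size;
}

/*
 * A bounded copy that always satisfies the rule: it copies at most
 * dst_size - 1 characters and always null-terminates dst.
 */
void copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a destination declared as char dst[8], a call such as strncpy(dst, src, 8) is flagged because strncpy() does not add a terminator when the source is 8 characters or longer, while strncpy(dst, src, 7) followed by an explicit terminator is accepted.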

Figure: Source code and its AST

Limitations and Inefficiency of the Checker

1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by accessing the nodes at the proper position. This method is unreliable, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(stream, "r") and fopen(stream, "w")

When a file is opened in "r" mode, the file should already be present at the given location, and when a file is opened in write mode, we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks these conditions.

Algorithm Used

1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all their nodes, get the name of the stream and the mode of opening.

2) Then search all ICPPASTIfStatements to see whether there is a suitable function call that checks the conditions given above. In the read case, this is an access() call, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case, there should likewise be an access() call with an appropriate return statement.
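The guard blocks that this checker looks for might be written as follows. This is one possible reading of the rule, using the POSIX access() function with F_OK; the wrapper function names are hypothetical, and the exact shape of the condition the checker accepts is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* access() and F_OK (POSIX) */

/* Read case: refuse to open a file that does not exist. */
FILE *open_for_read(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) != 0)
        return NULL;             /* guard block with a return statement */
    return fopen(path, "r");
}

/* Write case: refuse to overwrite a file that already exists. */
FILE *open_for_write_new(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0)
        return NULL;             /* the file exists and would be overwritten */
    return fopen(path, "w");
}
```

Note that the check and the open are two separate steps, so the state of the file can change between them. The guard is a warning aid for the programmer rather than a guarantee.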


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string followed by the arguments named in the format string. Errors may occur in three cases:

1) The number of format specifiers in the format string is not equal to the number of arguments present.
2) There is no format string.
3) A format specifier and the corresponding argument indicate two different types.

Algorithm

1) Inside a function, keep account of all the IASTDeclarations and the types of the variables. At the same time, look at each IASTFunctionCallExpression. If the called function is printf or scanf, which can be known from its first node, note its total number of nodes (let it be x).

2) The 2nd node is the format string. A regular expression analysis is done on the string form of this node, excluding its first and last characters. The number of substrings matching the Java regex pattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is counted; if it is equal to (x-2) the call is fine, otherwise it is an error.

3) To check the third case, we have to compare every format specifier with the type of the corresponding variable.
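The counting in step 2 can also be done without a regular expression. The following C sketch walks the format string and counts conversion specifications made of optional flags, width, precision, length modifier and one of the conversion characters listed in the pattern above. It is an illustration of the first rule only, not the plugin's Java code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Counts the conversion specifications in a printf-style format string.
 * "%%" is a literal percent sign and consumes no argument. A '*' width or
 * precision (which consumes an extra argument) is not handled here.
 */
int count_format_specifiers(const char *format)
{
    assert(format != NULL);
    int count = 0;

    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                                /* literal "%%" */
        while (*p != '\0' && strchr("-+0 #", *p))    /* flags */
            p++;
        while (*p >= '0' && *p <= '9')               /* field width */
            p++;
        if (*p == '.') {                             /* precision */
            p++;
            while (*p >= '0' && *p <= '9')
                p++;
        }
        while (*p != '\0' && strchr("hlL", *p))      /* length modifier */
            p++;
        if (*p == '\0')
            break;                                   /* incomplete specifier */
        if (strchr("cdieEfgGosuxXp", *p))            /* conversion character */
            count++;
    }
    return count;
}

/* First rule of the checker: as many arguments as format specifiers. */
int argument_count_matches(const char *format, int num_arguments)
{
    return count_format_specifiers(format) == num_arguments;
}
```

In terms of the algorithm above, num_arguments corresponds to x-2: the total number of nodes of the call minus the function name and the format string.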


6 Future Work Scope

1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access when needed. This may help with many problems, e.g. checking that all the possible values of the switch argument variable are covered by the cases.

2) Building checkers for other problems, e.g.

i) using a variable without initialization
ii) using a variable after freeing its space

3) Building static analysis checkers for other languages

7 Limitations of the Project

1) Building static analysers requires a platform; otherwise static analysis tool builders have to build their own suitable compiler.

2) A checker platform is needed through which we can visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding the CFG, e.g. a file or database that is opened should be closed once and only once on every flow path of the CFG from start to end.


8 Conclusion

Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem; it has to be complemented by dynamic analysis.

9 References

1. Secure Programming with Static Analysis (by Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (by Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (by Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (by Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT

27

Control Flow Graph It is the graphical way of representing all

possible way the flow of programme may occur Each node in CFG represents a basic block that has no branching or looping CFG gives the idea of Cyclomatic complexity that directly shows the no of possibility of errors During dynamic analysis it also helps us to get exhaustive sets of test cases

Source Code if (a gt b) nConsec = 0 else s1 = getHexChar(1) s2 = getHexChar(2) return nConsec

11

Parser Output Parse Tree

CFG Builder output

Data Flow Diagram Data Flow Diagram shows all the possible path of

data in put and output and data transfer between different entities within the software thus giving us the points where data should be validated and setting up trust boundaries

12

If(agtb)

nConsec=0 s1 = getHexChar() s2= getHexChar()

return nConsec

222 Perform Analysis Analysis is performed on the tokens or nodes of tree

or graphs Lexical analysis Simplest of all analysis techniques helps in

checking syntactical errors and it uses in most cases regular pattern matching Not much useful than detecting wrong identifier names or function names Tools using lexical analysis techniques are ITS4 RATS and Flawfinder

Parse Tree and AST Analysis These representations helps us

understanding of semantics of the program So helps us in finding deviation of rules of grammar and security rules like a if block should start and end with curly bracket Most modern compilers does these kind of checking and violation comes as a parse error Codan the code analysis platform in CDT(CC+ Development Tools eclipse plugin ) uses AST to built checkers Similar Platform PMD Crystal (eclipse plugins) uses AST for detecting errors in Java

Control Flow Graph Analysis AST and parse trees though

appear useful enough in detecting most of rule violations they fail in case of rules that apply for branching in code As example we may take that opened file or database should be closed once and only once in a flow control of a programme CFG is analysed in a number of stages starting from basic block and then a procedure( method or function) and then to a bigger module like class Fortify Source Code Analyser Klockwork

Data Flow Diagram Analysis A Data Flow Diagram (DFD) with

security-specific annotations is used to describe how data enters leaves and traverses the system it shows data sources and destinations relevant processes that data goes through and trust boundaries in the system A DFD has a fixed set of component types Process HighLevelProcess Data Store and External Interactor A process is concern of DFD diagram A High Level program is represented by hierarchical multistage DFDs A datastore may be a database a file or the Registry An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point typically a human The data flows are represented by arrows A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary to describe locations where a privilege impersonation on the part of an adversary could occur a machine or process boundary may be crossed etc

13

Taint AnalysisThe concept of tainting refers to marking data coming

from an untrusted source as ldquotaintedrdquo and propagating its status to all locations where the data is used A security policy specifies what uses of untrusted data are allowed or restricted An attempt to use tainted data is a violation of this policy is an indication of a vulnerability Tainted data should not be used in any function which modifies files directories and processes or executes external programs If the rule is violated then the program should be aborted

1 Initialize all variables as NOT TAINTED 2 Find all calls to functions that read data from an untrusted source Mark the values returned by these functions as TAINTED 3 Propagate the tainted values through the program If a tainted value is used in an expression mark the result of the expression as TAINTED 4 Repeat step 3 until a fixed point is reached 5 Find all calls to potentially vulnerable functions If one of their arguments is tainted report this as a vulnerability 1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

Using Taint analysis memcpy() of both line no 5 and 7 will be marked as

vulnerability where as in actual case 5 is a false positive So taint analysis has a high possibility of giving false positive

Value Range Propagation In this case the tainted variables should

also carry a range of its possible values If some vulnerable function uses that variable then it may be checked that the value range of that variable still makes the function vulnerable or not Thus we may avoid some false positives

14

1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

2221Security Knowledge The main logic behind these tools

is to learn from our past mistakes and attack due to weaknesses in code and prevent them from happening again There are many such collections of common mistakes done by programmers

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb project(httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project)

3) SAMATE group at NIST (httpsamatenistgov)

In these collection of numerous errors there are patterns and repetitive errors and so the most generic problems may be categorized

15

223Presenting And Processing Results The security vulnerabilities are reviewed manually and those are fixed In some cases the analysers itself give some solution for the problem The problems are given in different categories like error (severe threat) warning (may or may not be a security bug but obeying it is good practice) info(good coding practice but no threat )

3Ways of Implementing Static Analysis Tools Application of Static Analysis Tools in Practical World

I Integration with compilers Static analysis tools are part and parcel of modern compilers They does all the basic checking like type checking style checking parse errors identifiers never used many others But it needs compilation of whole programme

Eg- gcc compiler (c language) II Integration with IDEs IDEs (Integrated Development Environment )

use vulnerability checkers as add-ons or plugins to show the errors in coding in the editors while writing This is the most useful form of static analysis tools as it not only shows error but also gives quick fixes Eg- Eclipse Netbeans

III Stand Alone Platforms This kind of tools are generally the most sophisticated ones and detects most complicated problems They are exclusively built for detecting software weaknesses

Eg- Fortify Source code Analyser Klockwork Ounce

31 My Static analysis Tools implementation I have built static analysis tools for an IDE (Integrated Development Environment) The IDE chosen is Eclipse one of the most used platform by software developers This tools are integrated as plugins to eclipse and runs in the back end to find error in code

16

4Implementation Tools 41 Hardware Details Model Dell PC Processor Intel(R) Coretrade2 Duo CPU Installed memory (RAM) 4GB System type 64-bit OS 42 Software Details Operating System Windows 7 Basic Softwares Used Jdk 16 Mingw compiler Eclipse IDE

421 Eclipse IDE Eclipse is one of the most used IDE

for java But it also gives tools to build software in other languages As here I have used CDT (CC++ Development Tools ) which comes as plugin to the eclipse

422 Codan(Code Analysis) Codan which is a light-

weight static analysis framework in CDT ( CDT is Eclipses CC++ Development Tools) which would perform real time analysis on the code to find common defects violation of policies etc Framework contains common components and APIs that is shared between static analysis tools for CC++ such as

Profile Editor (Problem Preferences) We can enable or disable our checker Severity of the Problem is specified We can change the

severity of the problem When we keep cursor on the checker the description about

the checker is displayed

17

bull How to build an AST of a CC++ source Windows gtShow view

gt Others gtCC++gtDOM AST bull How to get CDT Helpgt Install New SoftwaregtAdd the Url httpdownloadeclipseorgreleasesindigo And then select CDT

423 PDE (Plugin Development Environment) To develop eclipse plugins there is a plugin development

platform The ways of building plugins may be seen from reference 4 The Basic steps of Plugin Development are 1First Go to Filegt New project gt Plugin project 2 Then the MANIFESTMF in META-INF is edited

bull Add Dependencies (ie The plugins that are needed to run this checker plugin)

bull ADD Runtime bull ADD Extensions (eg These checkers need a point of extension orgeclipse

cdtcodancorecheckers) bull Add checker by right clickgtNewgtchecker Give class name as name of its

source code bull Under checker Add problem by right clickgtNewgtproblem And there

message that should be shown when error occurs and default enable etc bull On the Overview page in the Exporting part click on Organize Manifests

Wizard gtfinish Externalize Strings Wizard gtfinish bull At last in the Export Wizard portion Archive file give the name of your

plugin bull Now this zip folder may be included in eclipse plugins folder to make it

permanent in codan

3To test the plugin run it and then another eclipse window opens Right faulty code that your checker is supposed to catch You can see error and messages in the editor

18

5Implementation and Results C language(So also in C++) has a lots of vulnerable library function which may be used to crash a program if the arguments are unchecked and unsanitized So we may use some checker to make the programmer aware about such vulnerable functions or whether the values of arguments taken have no potential threat

Codan Plugins developed I For C function int strncpy(char dst const char src

size_t n) It is erroneous to give value of n greater than or equal to size of destination (dst) allocated So it must be checked when this vulnerable function is used It is an example of buffer overflow Algorithm Used 1) First find a call to function strncpy() by viewing all call to ICPPASTFunctionCallExpression and then checking whether first IASTNode is string ldquostrncpyrdquo If it is true then get the String value of the next three nodes The first string is the name of destination and third String is string form of value of n 2) Now we need to get the space allocated for destination

character pointer For that we may visit all the IASTDeclarationStatements made and find out the declaration of the destination character pointer and the space allocated

3) Last step is to compare the allocated space of destination character pointer and the value of n

19

Source Code AST

20

Limitations And Inefficiency of the checker 1)In the 2nd step of the algorithm the size of the allocated space of destination is determined by accessing the nodes to proper position But this method is inappropriate as space allocation may be done in two different ways so the solution to these may be maintaining a symbol table during static analysis as done during compilation II For C function fopen(stream rsquorrsquo) and fopen( stream rsquowrsquo) When a file is opened in lsquorrsquo mode then the file should already be present in the given stream and if the file is opened in write mode then we should warn the programmer about being the file overwritten There should be a block before every fopen() function with lsquorrsquo or lsquowrsquo mode to check those above conditions

Algorithm Used 1)First check all the ICPPASTFunctionDeclarations for fopen( ) function calls and accessing all its nodes get the name of stream and mode of opening 2)Then search all ICPPASTIfStatements to see if there is a desired function to check the given conditions above ie in this case(read ) access() function which should be in a IASTUnaryExpression (negation ) and there should be a return statement inside the block But in case of write case there should be access() function with appropriate return statement

21

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

CFG Builder output

Data Flow Diagram Data Flow Diagram shows all the possible path of

data in put and output and data transfer between different entities within the software thus giving us the points where data should be validated and setting up trust boundaries

12

If(agtb)

nConsec=0 s1 = getHexChar() s2= getHexChar()

return nConsec

222 Perform Analysis Analysis is performed on the tokens or nodes of tree

or graphs Lexical analysis Simplest of all analysis techniques helps in

checking syntactical errors and it uses in most cases regular pattern matching Not much useful than detecting wrong identifier names or function names Tools using lexical analysis techniques are ITS4 RATS and Flawfinder

Parse Tree and AST Analysis These representations helps us

understanding of semantics of the program So helps us in finding deviation of rules of grammar and security rules like a if block should start and end with curly bracket Most modern compilers does these kind of checking and violation comes as a parse error Codan the code analysis platform in CDT(CC+ Development Tools eclipse plugin ) uses AST to built checkers Similar Platform PMD Crystal (eclipse plugins) uses AST for detecting errors in Java

Control Flow Graph Analysis AST and parse trees though

appear useful enough in detecting most of rule violations they fail in case of rules that apply for branching in code As example we may take that opened file or database should be closed once and only once in a flow control of a programme CFG is analysed in a number of stages starting from basic block and then a procedure( method or function) and then to a bigger module like class Fortify Source Code Analyser Klockwork

Data Flow Diagram Analysis A Data Flow Diagram (DFD) with

security-specific annotations is used to describe how data enters leaves and traverses the system it shows data sources and destinations relevant processes that data goes through and trust boundaries in the system A DFD has a fixed set of component types Process HighLevelProcess Data Store and External Interactor A process is concern of DFD diagram A High Level program is represented by hierarchical multistage DFDs A datastore may be a database a file or the Registry An ExternalInteractor represents an entity that exists outside the system being modelled and which interacts with the system at an entry point typically a human The data flows are represented by arrows A DFD used in threat modelling often separates elements that have different privilege levels using a Boundary to describe locations where a privilege impersonation on the part of an adversary could occur a machine or process boundary may be crossed etc


Taint Analysis: The concept of tainting refers to marking data coming from an untrusted source as "tainted" and propagating that status to all locations where the data is used. A security policy specifies which uses of untrusted data are allowed or restricted. An attempt to use tainted data in violation of this policy is an indication of a vulnerability. Tainted data should not be used in any function that modifies files, directories or processes, or that executes external programs. If the rule is violated, the program should be aborted.

1. Initialize all variables as NOT TAINTED.
2. Find all calls to functions that read data from an untrusted source. Mark the values returned by these functions as TAINTED.
3. Propagate the tainted values through the program. If a tainted value is used in an expression, mark the result of the expression as TAINTED.
4. Repeat step 3 until a fixed point is reached.
5. Find all calls to potentially vulnerable functions. If one of their arguments is tainted, report this as a vulnerability.

Example:

    1  unsigned int n;
    2  char src[10], dst[10];
    3  n = read_int();
    4  if (n <= sizeof(dst))
    5      memcpy(dst, src, n);   /* n <= sizeof(dst) */
    6  else
    7      memcpy(dst, src, n);   /* n > sizeof(dst) */

Using taint analysis, the memcpy() calls on both line 5 and line 7 will be marked as vulnerabilities, whereas in reality the report for line 5 is a false positive. Taint analysis therefore has a high likelihood of producing false positives.
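The five steps can be sketched over a toy statement list as follows. The representation (struct stmt, the kind values, count_tainted_sinks, MAX_VARS) is hypothetical and chosen only to keep the example short; variables are small integer indices, and the analysis is deliberately flow-insensitive, so it reproduces the false positive discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 16

enum kind { ASSIGN_SOURCE, ASSIGN_EXPR, CALL_SINK };

struct stmt {
    enum kind kind;
    int dst;   /* variable written by ASSIGN_*; -1 for CALL_SINK */
    int a, b;  /* variables read; -1 when unused */
};

static bool is_tainted(const bool *tainted, int var)
{
    assert(var < MAX_VARS);
    return var >= 0 && tainted[var];
}

/* Returns the number of sink calls that receive a tainted argument. */
int count_tainted_sinks(const struct stmt *prog, size_t n)
{
    bool tainted[MAX_VARS] = { false };          /* step 1 */
    bool changed = true;
    while (changed) {                            /* step 4: fixed point */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            const struct stmt *s = &prog[i];
            if (s->kind == CALL_SINK)
                continue;
            assert(s->dst >= 0 && s->dst < MAX_VARS);
            bool t = s->kind == ASSIGN_SOURCE    /* step 2 */
                  || is_tainted(tainted, s->a)   /* step 3 */
                  || is_tainted(tainted, s->b);
            if (t && !tainted[s->dst]) {
                tainted[s->dst] = true;
                changed = true;
            }
        }
    }
    int reports = 0;                             /* step 5 */
    for (size_t i = 0; i < n; i++)
        if (prog[i].kind == CALL_SINK &&
            (is_tainted(tainted, prog[i].a) || is_tainted(tainted, prog[i].b)))
            reports++;
    return reports;
}
```

Modelling the example as one ASSIGN_SOURCE for n followed by two CALL_SINK statements that read n yields two reports, one for each memcpy() call, because the sketch ignores the condition on line 4.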

Value Range Propagation: In this technique each tainted variable also carries the range of its possible values. When a vulnerable function uses that variable, it can be checked whether the value range of the variable still makes the call vulnerable or not. In this way some false positives may be avoided.


    1  unsigned int n;
    2  char src[10], dst[10];
    3  n = read_int();
    4  if (n <= sizeof(dst))
    5      memcpy(dst, src, n);   /* n <= sizeof(dst) */
    6  else
    7      memcpy(dst, src, n);   /* n > sizeof(dst) */
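A rough sketch of how a range attached to n could separate the two memcpy() calls is shown here. The struct range type and the helper names are assumptions for illustration only; a real analyser would derive the ranges from the branch conditions in the CFG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range { size_t lo, hi; };   /* inclusive bounds, lo <= hi */

/* Narrow *r for the true branch of `if (n <= limit)`.
   Returns false when that branch cannot be taken. */
bool refine_le(struct range *r, size_t limit)
{
    assert(r->lo <= r->hi);
    if (r->lo > limit)
        return false;
    if (r->hi > limit)
        r->hi = limit;
    return true;
}

/* Narrow *r for the false branch, i.e. n > limit. */
bool refine_gt(struct range *r, size_t limit)
{
    assert(r->lo <= r->hi);
    if (r->hi <= limit)
        return false;
    if (r->lo <= limit)
        r->lo = limit + 1;         /* cannot overflow: limit < r->hi */
    return true;
}

/* Copying n bytes into a buffer is safe when the largest n still fits. */
bool copy_is_safe(struct range n, size_t buf_size)
{
    assert(n.lo <= n.hi);
    return n.hi <= buf_size;
}
```

Starting from the full range of an unsigned value, refine_le with limit 10 gives [0, 10], so the call on line 5 is accepted, while refine_gt gives a range starting at 11, so only the call on line 7 is reported.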

2.2.2.1 Security Knowledge
The main logic behind these tools is to learn from past mistakes and from attacks that exploited weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)
2) OWASP Honeycomb project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)
3) SAMATE group at NIST (http://samate.nist.gov)

In these collections of numerous errors there are patterns and recurring errors, so the most generic problems can be categorized.


2.2.3 Presenting And Processing Results
The security vulnerabilities are reviewed manually and then fixed. In some cases the analysers themselves suggest a solution for the problem. The problems are reported in different categories: error (severe threat), warning (may or may not be a security bug, but obeying it is good practice) and info (good coding practice, but no threat).

3. Ways of Implementing Static Analysis Tools (Application of Static Analysis Tools in the Practical World)

I. Integration with compilers: Static analysis tools are part and parcel of modern compilers. They do all the basic checking, such as type checking, style checking, parse errors, identifiers never used and many others. But this needs compilation of the whole program.
E.g. the gcc compiler (C language)

II. Integration with IDEs: IDEs (Integrated Development Environments) use vulnerability checkers as add-ons or plugins to show coding errors in the editor while the code is being written. This is the most useful form of static analysis tool, as it not only shows errors but also offers quick fixes.
E.g. Eclipse, NetBeans

III. Stand-alone platforms: Tools of this kind are generally the most sophisticated ones and detect the most complicated problems. They are built exclusively for detecting software weaknesses.
E.g. Fortify Source Code Analyser, Klocwork, Ounce

3.1 My Static Analysis Tool Implementation
I have built static analysis tools for an IDE (Integrated Development Environment). The IDE chosen is Eclipse, one of the platforms most used by software developers. These tools are integrated into Eclipse as plugins and run in the background to find errors in code.


4. Implementation Tools

4.1 Hardware Details
Model: Dell PC
Processor: Intel(R) Core(TM)2 Duo CPU
Installed memory (RAM): 4 GB
System type: 64-bit OS

4.2 Software Details
Operating System: Windows 7
Basic software used: JDK 1.6, MinGW compiler, Eclipse IDE

4.2.1 Eclipse IDE
Eclipse is one of the most used IDEs for Java, but it also provides tools to build software in other languages. Here I have used CDT (C/C++ Development Tools), which comes as a plugin to Eclipse.

4.2.2 Codan (Code Analysis)
Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis on the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences), in which:
• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.


• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT.

4.2.3 PDE (Plugin Development Environment)
To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen from reference 4. The basic steps of plugin development are:

1. Go to File > New Project > Plug-in Project.
2. Edit the MANIFEST.MF file in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right click > New > problem, and set there the message that should be shown when the error occurs, the default enablement, etc.
• On the Overview page, in the Exporting part, click Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• Finally, in the Export Wizard portion, under Archive file, give the name of your plugin.
• The resulting zip archive may now be placed in the Eclipse plugins folder to make the checker permanent in Codan.
3. To test the plugin, run it; another Eclipse window then opens. Write faulty code that your checker is supposed to catch. You can see the errors and messages in the editor.


5. Implementation and Results
The C language (and so also C++) has a lot of vulnerable library functions that may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the values of the arguments passed pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)
It is erroneous to give a value of n greater than or equal to the size allocated for the destination (dst): a larger n overflows the buffer, and an equal n can leave dst without a terminating null character. So this must be checked wherever this vulnerable function is used. It is an example of buffer overflow.

Algorithm Used
1) First find a call to the function strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the string value of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.
2) Next, get the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatement nodes and find the declaration of the destination character pointer and the space allocated.
3) The last step is to compare the allocated space of the destination character pointer with the value of n.
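To make the rule concrete, here is a small C sketch of a call the checker is meant to flag and of a bounded alternative. The helper copy_bounded is a hypothetical name introduced for this illustration and is not part of the checker itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A call the checker should report, because n >= sizeof dst:
 *
 *     char dst[8];
 *     strncpy(dst, src, 16);
 */

/* Copy src into dst (capacity dst_size) and always terminate the result.
   Returns 0 when src fitted completely, -1 when it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);   /* n is strictly below the allocated size */
    dst[dst_size - 1] = '\0';          /* strncpy does not terminate on truncation */
    return strlen(src) < dst_size ? 0 : -1;
}
```

With an 8-byte buffer, copying "hello" succeeds unchanged, while a longer source is cut to 7 characters plus the terminator and reported through the return value.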

[Figure: Source code and the corresponding AST]

Limitations And Inefficiency of the Checker
1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by navigating to the node at the expected position. This method is fragile, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C function calls fopen(stream, "r") and fopen(stream, "w")
When a file is opened in "r" mode, the file named by stream should already be present, and if the file is opened in write mode, we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode that checks the above conditions.

Algorithm Used
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all of their nodes, get the name of the stream and the mode of opening.
2) Then search all ICPPASTIfStatement nodes to see whether the desired function is present to check the conditions given above. In the read case this is the access() function, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
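The guard the checker looks for might have roughly the following shape in C. This is only a sketch under assumptions: the wrapper names open_for_read and open_for_write_new are hypothetical, access() and F_OK come from the POSIX header unistd.h, and the exact form of the condition that the checker accepts may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* access(), F_OK (POSIX) */

/* Read case: return early when the file is not present. */
FILE *open_for_read(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) != 0)
        return NULL;               /* file missing: do not call fopen() */
    return fopen(path, "r");
}

/* Write case: refuse to overwrite a file that already exists. */
FILE *open_for_write_new(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0)
        return NULL;               /* file exists: "w" would truncate it */
    return fopen(path, "w");
}
```

A check followed by a separate open is not atomic, so the file can still appear or disappear between the two calls; the guard documents the programmer's intent rather than giving a hard guarantee.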


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]
All these functions take a format string followed by the arguments named in the format string. An error may occur in three cases:
1) If the number of format specifiers in the format string is not equal to the number of arguments present.
2) If there is no format string.
3) If a format specifier and the corresponding argument indicate two different types.

Algorithm
1) Inside a function, keep account of all the IASTDeclarations and the types of the variables. At the same time, look at each IASTFunctionCallExpression. If the called function is printf or scanf, which may be known from its first node, note its total number of nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on the string form of this node, excluding the first and last characters (the quotation marks). The number of substrings matching a java.util.regex.Pattern built from the components flags [-+0], width [0-9], precision [0-9], length modifier [hlL] and conversion character [cdieEfgGosuxXp] is counted. If it equals (x-2), the call is accepted; otherwise it is an error.
3) To check the third case, we have to compare every format specifier with the type of the corresponding variable.
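The counting in step 2 can also be sketched without a regular expression. The following C function is an illustration under simplifying assumptions, not the checker's Java code: the name count_conversions is hypothetical, "%%" is treated as a literal percent sign, and "*" widths are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the conversions in a printf-style format that consume an argument.
   Simplified grammar: %[flags][width][.precision][length]conversion.
   Returns -1 when a '%' is not followed by a recognised conversion. */
int count_conversions(const char *fmt)
{
    assert(fmt != NULL);
    int count = 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                                  /* literal percent sign */
        while (*p && strchr("-+ #0", *p))              /* flags */
            p++;
        while (*p >= '0' && *p <= '9')                 /* width */
            p++;
        if (*p == '.') {                               /* precision */
            p++;
            while (*p >= '0' && *p <= '9')
                p++;
        }
        while (*p && strchr("hlL", *p))                /* length modifier */
            p++;
        if (*p == '\0' || !strchr("cdieEfgGosuxXp", *p))
            return -1;                                 /* malformed specifier */
        count++;
    }
    return count;
}
```

A call is then consistent when the returned count equals the number of arguments after the format string, which corresponds to the comparison with (x-2) in step 2.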


6. Future Work Scope
1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help in many problems, e.g. checking that all the possible values of a switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
i) using a variable without initialization
ii) using a variable after freeing its space
3) Building static analysis checkers for other languages.

7. Limitations of the Project
1) To build static analysers there needs to be a platform; otherwise static analysis tool builders have to build their own suitable compiler.
2) A checker platform is needed by which we may visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding the CFG, e.g. that a file or database that is opened should be closed once and only once along each flow path of the CFG from start to end.


8. Conclusion
Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that, in general, the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem; for that, it has to be complemented by dynamic analysis.

9. References
1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J. T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT


Taint AnalysisThe concept of tainting refers to marking data coming

from an untrusted source as ldquotaintedrdquo and propagating its status to all locations where the data is used A security policy specifies what uses of untrusted data are allowed or restricted An attempt to use tainted data is a violation of this policy is an indication of a vulnerability Tainted data should not be used in any function which modifies files directories and processes or executes external programs If the rule is violated then the program should be aborted

1 Initialize all variables as NOT TAINTED 2 Find all calls to functions that read data from an untrusted source Mark the values returned by these functions as TAINTED 3 Propagate the tainted values through the program If a tainted value is used in an expression mark the result of the expression as TAINTED 4 Repeat step 3 until a fixed point is reached 5 Find all calls to potentially vulnerable functions If one of their arguments is tainted report this as a vulnerability 1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

Using Taint analysis memcpy() of both line no 5 and 7 will be marked as

vulnerability where as in actual case 5 is a false positive So taint analysis has a high possibility of giving false positive

Value Range Propagation In this case the tainted variables should

also carry a range of its possible values If some vulnerable function uses that variable then it may be checked that the value range of that variable still makes the function vulnerable or not Thus we may avoid some false positives

14

1 unsigned int n 2 char src[10] dst[10] 3 n = read_int () 4 if (n lt= sizeof (dst)) 5 memcpy (src dst n) n is lt sizeof (dst) 6 else 7 memcpy (src dst n) n is gt sizeof (dst)

2221Security Knowledge The main logic behind these tools

is to learn from our past mistakes and attack due to weaknesses in code and prevent them from happening again There are many such collections of common mistakes done by programmers

1) Common Weakness Enumeration (CWE) (httpcvemitreorgcwe)

2) OWASP Honeycomb project(httpwwwowasporgindexphpCategoryOWASP_Honeycomb_Project)

3) SAMATE group at NIST (httpsamatenistgov)

In these collection of numerous errors there are patterns and repetitive errors and so the most generic problems may be categorized

15

223Presenting And Processing Results The security vulnerabilities are reviewed manually and those are fixed In some cases the analysers itself give some solution for the problem The problems are given in different categories like error (severe threat) warning (may or may not be a security bug but obeying it is good practice) info(good coding practice but no threat )

3Ways of Implementing Static Analysis Tools Application of Static Analysis Tools in Practical World

I Integration with compilers Static analysis tools are part and parcel of modern compilers They does all the basic checking like type checking style checking parse errors identifiers never used many others But it needs compilation of whole programme

Eg- gcc compiler (c language) II Integration with IDEs IDEs (Integrated Development Environment )

use vulnerability checkers as add-ons or plugins to show the errors in coding in the editors while writing This is the most useful form of static analysis tools as it not only shows error but also gives quick fixes Eg- Eclipse Netbeans

III Stand Alone Platforms This kind of tools are generally the most sophisticated ones and detects most complicated problems They are exclusively built for detecting software weaknesses

Eg- Fortify Source code Analyser Klockwork Ounce

31 My Static analysis Tools implementation I have built static analysis tools for an IDE (Integrated Development Environment) The IDE chosen is Eclipse one of the most used platform by software developers This tools are integrated as plugins to eclipse and runs in the back end to find error in code

16

4Implementation Tools 41 Hardware Details Model Dell PC Processor Intel(R) Coretrade2 Duo CPU Installed memory (RAM) 4GB System type 64-bit OS 42 Software Details Operating System Windows 7 Basic Softwares Used Jdk 16 Mingw compiler Eclipse IDE

421 Eclipse IDE Eclipse is one of the most used IDE

for java But it also gives tools to build software in other languages As here I have used CDT (CC++ Development Tools ) which comes as plugin to the eclipse

4.2.2 Codan (Code Analysis)

Codan is a lightweight static analysis framework in CDT (Eclipse's C/C++ Development Tools) that performs real-time analysis on the code to find common defects, violations of policies, etc. The framework contains common components and APIs that are shared between static analysis tools for C/C++, such as the Profile Editor (Problem Preferences), in which:
• We can enable or disable our checker.
• The severity of the problem is specified, and we can change it.
• When we keep the cursor on the checker, the description of the checker is displayed.


• How to build an AST of a C/C++ source: Window > Show View > Other > C/C++ > DOM AST
• How to get CDT: Help > Install New Software > Add the URL http://download.eclipse.org/releases/indigo and then select CDT

4.2.3 PDE (Plugin Development Environment)

To develop Eclipse plugins there is a plugin development platform. The ways of building plugins may be seen from reference 4. The basic steps of plugin development are:

1. First go to File > New Project > Plugin Project.
2. Then edit the MANIFEST.MF in META-INF:
• Add Dependencies (i.e. the plugins that are needed to run this checker plugin).
• Add Runtime.
• Add Extensions (e.g. these checkers need the extension point org.eclipse.cdt.codan.core.checkers).
• Add a checker by right click > New > checker. Give the class name as the name of its source code.
• Under the checker, add a problem by right click > New > problem, and there give the message that should be shown when the error occurs, whether it is enabled by default, etc.
• On the Overview page, in the Exporting part, click on Organize Manifests Wizard > Finish, then Externalize Strings Wizard > Finish.
• At last, in the Export Wizard portion, under Archive file, give the name of your plugin.
• Now this zip folder may be included in the Eclipse plugins folder to make it permanent in Codan.
3. To test the plugin, run it; another Eclipse window then opens. Write faulty code that your checker is supposed to catch. You can see the error and messages in the editor.


5. Implementation and Results

The C language (and so also C++) has a lot of vulnerable library functions, which may be used to crash a program if their arguments are unchecked and unsanitized. So we may use checkers to make the programmer aware of such vulnerable functions, or to verify that the argument values pose no potential threat.

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to give a value of n greater than or equal to the allocated size of the destination (dst): a larger n can write past the end of dst, and an equal n can leave dst without a terminating null character. So n must be checked whenever this vulnerable function is used. It is an example of a buffer overflow.

Algorithm Used
1) First find a call to strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the String value of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.
2) Now we need to get the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated.
3) The last step is to compare the allocated space of the destination character pointer with the value of n.
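The condition this checker compares can be illustrated with a minimal C sketch. The helper name copy_bounded is hypothetical and not part of the report's plugin; it only shows a call in which n stays strictly below the allocated size of dst, next to a call the checker would be expected to flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A call the checker is meant to flag (shown as a comment only):
 *     char dst[8];
 *     strncpy(dst, src, 16);      n (16) >= sizeof dst (8)
 */

/* Hypothetical helper (an assumption, not from the report): copies src into
 * dst without writing more than dst_size bytes and always null-terminates.
 * Returns 1 if src fit completely, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* n is dst_size - 1, strictly less than the allocated size of dst. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy does not terminate on truncation */

    return strlen(src) < dst_size;
}
```

A caller would pass sizeof on the destination array, for example copy_bounded(buf, sizeof buf, input), so that the size and the buffer cannot drift apart.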

[Figure: source code and its AST]

Limitations and Inefficiency of the Checker
1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by accessing the nodes at the proper position. But this method is inappropriate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(stream, "r") and fopen(stream, "w")

When a file is opened in "r" mode, the file should already be present at the given stream, and if the file is opened in write mode, we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode to check the above conditions.

Algorithm Used
1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all their nodes, get the name of the stream and the mode of opening.
2) Then search all ICPPASTIfStatements to see whether there is a suitable function that checks the conditions given above. In the read case this is an access() call, which should be in an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.
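The guarded fopen() pattern that this checker looks for might resemble the following C sketch. The wrapper names open_existing_for_read and open_new_for_write are hypothetical, and the exact shape of the guard (a comparison against 0 rather than a unary negation) is an assumption; access() and F_OK are the POSIX calls from <unistd.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* access(), F_OK (POSIX) */

/* Hypothetical wrapper: open for reading only if the file already exists. */
FILE *open_existing_for_read(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) != 0) {   /* file is missing: return early */
        return NULL;
    }
    return fopen(path, "r");
}

/* Hypothetical wrapper: refuse to open for writing if the file already
 * exists, so that existing contents are not silently overwritten. */
FILE *open_new_for_write(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0) {   /* file exists: return early */
        return NULL;
    }
    return fopen(path, "w");
}
```

Checking and then opening are two separate steps, so the file can change in between (a time-of-check/time-of-use race). The guard is therefore an aid to the programmer rather than a security guarantee.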


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string and the arguments named in that format string. An error may occur here in three cases:
1) If the number of format specifiers in the format string is not equal to the number of arguments present.
2) If there is no format string.
3) If a format specifier and the corresponding argument indicate two different types.

Algorithm
1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time, look at each IASTFunctionCallExpression. If the called function is printf or scanf, which may be known from its first node, note the total number of its nodes (let it be x).
2) The 2nd node is the format string. A regular expression analysis is done on this node (the string form of the node, excluding the first and last characters). The number of substrings matching a java.util.regex.Pattern built from flags [-+0], digits [0-9] for width and precision, a length modifier [hlL] and a conversion character [cdieEfgGosuxXp] is noted. If it is equal to (x-2) it is okay; otherwise it is an error.
3) To check the third case, we have to compare each format specifier with the type of the corresponding variable.
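The counting in step 2 can be sketched in C as a small hand-written scanner. This is not the report's regular expression: the function names are hypothetical, the flag set also accepts space and '#' as an assumption, and '*' widths and %n are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts conversion specifications in a printf-style format string.
 * "%%" is a literal percent sign and consumes no argument. */
int count_format_specifiers(const char *fmt)
{
    assert(fmt != NULL);
    int count = 0;
    const char *p = fmt;

    while (*p != '\0') {
        if (*p++ != '%') continue;                 /* ordinary character */
        if (*p == '%') { ++p; continue; }          /* literal "%%" */

        while (*p != '\0' && strchr("-+0 #", *p) != NULL) ++p;   /* flags */
        while (isdigit((unsigned char)*p)) ++p;                  /* width */
        if (*p == '.') {                                         /* precision */
            ++p;
            while (isdigit((unsigned char)*p)) ++p;
        }
        while (*p != '\0' && strchr("hlL", *p) != NULL) ++p;     /* length */

        if (*p != '\0' && strchr("cdieEfgGosuxXp", *p) != NULL) {
            ++count;                               /* recognised conversion */
            ++p;
        }
    }
    return count;
}

/* The comparison the checker makes: the number of specifiers must equal the
 * number of arguments that follow the format string (x - 2 in the text). */
int format_matches_args(const char *fmt, int arg_count)
{
    return count_format_specifiers(fmt) == arg_count;
}
```

For a call such as printf("%s = %d\n", name, value) the node count x is 4, so the expected number of specifiers is x - 2 = 2, which is what the scanner returns for that format string.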


6. Future Work Scope
1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help in many problems, e.g. to check that all the possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
i) using a variable without initialization
ii) using a variable after freeing its space
3) Building static analysis checkers for other languages.
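The symbol table proposed in item 1 could look roughly like the following C sketch. All names here (Symbol, SymbolTable, symtab_add, symtab_lookup, strncpy_size_ok) and the fixed capacity are hypothetical; the sketch only records name, type and allocated space, and shows how the strncpy() size comparison from section 5 might be answered by a lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64   /* arbitrary capacity chosen for the sketch */

typedef struct {
    const char *name;      /* variable name */
    const char *type;      /* declared type, e.g. "char[]" */
    size_t allocated;      /* bytes allocated for the variable */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    size_t count;
} SymbolTable;

/* Adds a symbol; returns 0 on success, -1 if the table is full. */
int symtab_add(SymbolTable *t, const char *name, const char *type, size_t allocated)
{
    assert(t != NULL && name != NULL && type != NULL);
    if (t->count >= MAX_SYMBOLS) return -1;
    t->entries[t->count].name = name;
    t->entries[t->count].type = type;
    t->entries[t->count].allocated = allocated;
    t->count++;
    return 0;
}

/* Returns the symbol with the given name, or NULL if it is not recorded. */
const Symbol *symtab_lookup(const SymbolTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->entries[i].name, name) == 0) return &t->entries[i];
    }
    return NULL;
}

/* Returns 1 if n is smaller than the space allocated for dst_name,
 * 0 if it is not, and -1 if dst_name is unknown to the table. */
int strncpy_size_ok(const SymbolTable *t, const char *dst_name, size_t n)
{
    const Symbol *s = symtab_lookup(t, dst_name);
    if (s == NULL) return -1;
    return n < s->allocated;
}
```

A linear array keeps the sketch short; a real analyser would more likely use a hash table and track scopes as well.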

7. Limitations of the Project
1) To build static analysers there needs to be a platform; otherwise static analysis tool builders will have to build their own suitable compiler.
2) A checker platform is needed by which we may visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding of the CFG, e.g. a file or database that is opened should be closed once and only once in every flow path of the CFG from start to end.
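The "closed once and only once" property from item 2 can be sketched for a single flow path in C. The Event type and the function path_closes_once are hypothetical, and the sketch assumes the path has already been extracted from the CFG as a flat sequence of events on one resource; enumerating the paths is the part that needs the checker platform.

```c
#include <assert.h>
#include <stddef.h>

/* Events on one resource along one flow path (hypothetical model). */
typedef enum { EV_OPEN, EV_CLOSE, EV_OTHER } Event;

/* Returns 1 if every open on this path is followed by exactly one close and
 * no close happens while the resource is not open; returns 0 otherwise. */
int path_closes_once(const Event *path, size_t len)
{
    assert(path != NULL || len == 0);
    int is_open = 0;

    for (size_t i = 0; i < len; ++i) {
        switch (path[i]) {
        case EV_OPEN:
            if (is_open) return 0;    /* reopened without closing */
            is_open = 1;
            break;
        case EV_CLOSE:
            if (!is_open) return 0;   /* double close, or close before open */
            is_open = 0;
            break;
        case EV_OTHER:
            break;
        }
    }
    return !is_open;                  /* still open at the end: a leak */
}
```

A checker built on this idea would run the function over each start-to-end path and report the first path for which it returns 0.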


8. Conclusion

Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that, in general, the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to determine whether an algorithm handles a problem successfully. Its results are necessarily approximate, and they need to be complemented by dynamic analysis.

9. References
1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT


    unsigned int n;
    char src[10], dst[10];
    n = read_int();
    if (n <= sizeof(dst))
        memcpy(src, dst, n);   /* n is <= sizeof(dst) */
    else
        memcpy(src, dst, n);   /* n is > sizeof(dst) */

2.2.2.1 Security Knowledge

The main logic behind these tools is to learn from our past mistakes, and from attacks due to weaknesses in code, and to prevent them from happening again. There are many collections of common mistakes made by programmers:

1) Common Weakness Enumeration (CWE) (http://cve.mitre.org/cwe)
2) OWASP Honeycomb project (http://www.owasp.org/index.php/Category:OWASP_Honeycomb_Project)
3) SAMATE group at NIST (http://samate.nist.gov)

In these collections of numerous errors there are patterns and repeated errors, so the most generic problems can be categorized.


4Implementation Tools 41 Hardware Details Model Dell PC Processor Intel(R) Coretrade2 Duo CPU Installed memory (RAM) 4GB System type 64-bit OS 42 Software Details Operating System Windows 7 Basic Softwares Used Jdk 16 Mingw compiler Eclipse IDE

421 Eclipse IDE Eclipse is one of the most used IDE

for java But it also gives tools to build software in other languages As here I have used CDT (CC++ Development Tools ) which comes as plugin to the eclipse

422 Codan(Code Analysis) Codan which is a light-

weight static analysis framework in CDT ( CDT is Eclipses CC++ Development Tools) which would perform real time analysis on the code to find common defects violation of policies etc Framework contains common components and APIs that is shared between static analysis tools for CC++ such as

Profile Editor (Problem Preferences) We can enable or disable our checker Severity of the Problem is specified We can change the

severity of the problem When we keep cursor on the checker the description about

the checker is displayed

17

bull How to build an AST of a CC++ source Windows gtShow view

gt Others gtCC++gtDOM AST bull How to get CDT Helpgt Install New SoftwaregtAdd the Url httpdownloadeclipseorgreleasesindigo And then select CDT

423 PDE (Plugin Development Environment) To develop eclipse plugins there is a plugin development

platform The ways of building plugins may be seen from reference 4 The Basic steps of Plugin Development are 1First Go to Filegt New project gt Plugin project 2 Then the MANIFESTMF in META-INF is edited

bull Add Dependencies (ie The plugins that are needed to run this checker plugin)

bull ADD Runtime bull ADD Extensions (eg These checkers need a point of extension orgeclipse

cdtcodancorecheckers) bull Add checker by right clickgtNewgtchecker Give class name as name of its

source code bull Under checker Add problem by right clickgtNewgtproblem And there

message that should be shown when error occurs and default enable etc bull On the Overview page in the Exporting part click on Organize Manifests

Wizard gtfinish Externalize Strings Wizard gtfinish bull At last in the Export Wizard portion Archive file give the name of your

plugin bull Now this zip folder may be included in eclipse plugins folder to make it

permanent in codan

3To test the plugin run it and then another eclipse window opens Right faulty code that your checker is supposed to catch You can see error and messages in the editor

18

5Implementation and Results C language(So also in C++) has a lots of vulnerable library function which may be used to crash a program if the arguments are unchecked and unsanitized So we may use some checker to make the programmer aware about such vulnerable functions or whether the values of arguments taken have no potential threat

Codan Plugins Developed

I. For the C function char *strncpy(char *dst, const char *src, size_t n)

It is erroneous to give a value of n greater than or equal to the size allocated for the destination (dst), so this must be checked whenever this vulnerable function is used. When n exceeds the size the copy can write past the end of the buffer, and when n equals the size the copy may fill the buffer without a terminating null character. It is an example of buffer overflow.

Algorithm Used

1) First find a call to the function strncpy() by visiting every ICPPASTFunctionCallExpression and checking whether its first IASTNode is the string "strncpy". If it is, get the String value of the next three nodes. The first string is the name of the destination and the third string is the string form of the value of n.

2) Now we need to get the space allocated for the destination character pointer. For that we may visit all the IASTDeclarationStatements and find the declaration of the destination character pointer and the space allocated.

3) The last step is to compare the allocated space of the destination character pointer with the value of n.
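The three steps above can be illustrated with a minimal, text-based sketch. It does not use the CDT AST interfaces; instead it assumes, purely for illustration, that the destination is declared as a fixed-size char array with a literal size and that n is an integer literal. The class name StrncpySketch and both regular expressions are hypothetical and are not taken from the plugin itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Text-based approximation of the three-step strncpy check (hypothetical names). */
final class StrncpySketch {
    // Step 2 (assumption): declarations of the form "char name[SIZE]" with a literal size.
    private static final Pattern DECL =
            Pattern.compile("\\bchar\\s+(\\w+)\\s*\\[\\s*(\\d+)\\s*\\]");
    // Step 1 (assumption): calls of the form "strncpy(dst, src, N)" with a literal N
    // and a source argument that contains no comma.
    private static final Pattern CALL =
            Pattern.compile("\\bstrncpy\\s*\\(\\s*(\\w+)\\s*,[^,]*,\\s*(\\d+)\\s*\\)");

    /** Returns one message per call whose n is greater than or equal to the declared size of dst. */
    static List<String> check(String cSource) {
        // Record the declared size of every char array found in the source text.
        Map<String, Integer> sizes = new HashMap<>();
        Matcher decl = DECL.matcher(cSource);
        while (decl.find()) {
            sizes.put(decl.group(1), Integer.parseInt(decl.group(2)));
        }
        List<String> problems = new ArrayList<>();
        Matcher call = CALL.matcher(cSource);
        while (call.find()) {
            String dst = call.group(1);
            int n = Integer.parseInt(call.group(2));
            Integer size = sizes.get(dst);
            // Step 3: compare n with the allocated space; unknown destinations are skipped.
            if (size != null && n >= size) {
                problems.add("strncpy: n=" + n + " >= size of " + dst + " (" + size + ")");
            }
        }
        return problems;
    }
}
```

Calling StrncpySketch.check on a source string that declares char buf[16] and calls strncpy(buf, s, 16) reports that call, while a call with n = 8 on the same buffer is accepted. Destinations allocated in any other way are silently skipped, which is the same weakness the checker itself has.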

[Figure: Source Code and AST]

Limitations and Inefficiency of the Checker

1) In the 2nd step of the algorithm, the size of the space allocated for the destination is determined by walking the AST nodes to the expected position. But this method is inappropriate, as space allocation may be done in two different ways. A solution may be to maintain a symbol table during static analysis, as is done during compilation.

II. For the C functions fopen(stream, "r") and fopen(stream, "w")

When a file is opened in "r" mode, the file should already be present at the given stream, and if the file is opened in write mode, we should warn the programmer that the file will be overwritten. There should be a block before every fopen() call with "r" or "w" mode to check the above conditions.

Algorithm Used

1) First check all the ICPPASTFunctionDeclarations for fopen() function calls and, by accessing all their nodes, get the name of the stream and the mode of opening.

2) Then search all ICPPASTIfStatements to see whether there is a call that checks the conditions given above. In the read case this is an access() call, which should be inside an IASTUnaryExpression (negation), and there should be a return statement inside the block. In the write case there should likewise be an access() call with an appropriate return statement.


III. For the C function printf() and its friends [fprintf(), sprintf(), vprintf()] as well as scanf() and its friends [sscanf(), fscanf(), vscanf()]

All these functions take a format string and the arguments named in the format string. An error may occur here in three cases:

1) If the number of format specifiers in the format string is not equal to the number of arguments present.

2) If there is no format string.

3) If a format specifier and the corresponding argument indicate two different types.

Algorithm

1) Inside a function, take account of all the IASTDeclarations and the types of the variables. At the same time, look at each IASTFunctionCallExpression. If the called function is printf or scanf, which may be known from its first node, then note its total number of nodes (let it be x).

2) The 2nd node is the format string. A regular expression analysis is done on this node (the string form of this node, excluding the first and last characters, i.e. the quotation marks), and the number of substrings matching the java.util.regex.Pattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp] ) is noted. If it is equal to (x-2), then it is okay; otherwise it is an error.

3) To check the third case, we will have to check every format specifier against the type of the corresponding variable.
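Step 2 of this algorithm can be sketched in plain Java as follows. The regular expression below is an assumed reconstruction in the spirit of the pattern quoted above, not the exact pattern used in the plugin, and the class and method names are hypothetical. The sketch treats "%%" as a literal percent sign and does not handle a "*" width or precision.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts printf-style format specifiers and compares them with the argument count. */
final class FormatCountSketch {
    // Assumed pattern: '%', then either a second '%' (a literal percent sign), or
    // optional flags, width, precision, length modifier and a conversion character.
    private static final Pattern SPEC = Pattern.compile(
            "%(?:%|[-+ 0#]*\\d*(?:\\.\\d+)?[hlL]?[cdieEfgGosuxXp])");

    /** Number of specifiers in the format string that consume an argument. */
    static int countSpecifiers(String format) {
        int count = 0;
        Matcher m = SPEC.matcher(format);
        while (m.find()) {
            // "%%" prints a percent sign and takes no argument, so it is not counted.
            if (!m.group().equals("%%")) {
                count++;
            }
        }
        return count;
    }

    /**
     * totalNodes is x in the text: the function name, the format string and the
     * arguments, so the call is consistent when the specifier count equals x - 2.
     */
    static boolean argumentCountMatches(String format, int totalNodes) {
        return countSpecifiers(format) == totalNodes - 2;
    }
}
```

For a call such as printf("%d %s", a, b), x is 4 and the format string holds two specifiers, so argumentCountMatches("%d %s", 4) holds, while the same format string with only one argument (x = 3) is reported as an error.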


6. Future Work Scope

1) Maintaining a symbol table for all the variables (i.e. type, allocated space, name) for easy access to them when needed. This may help in many problems, e.g. to check that all the possible values of the switch argument variable are covered by the cases.

2) Building checkers for other problems, e.g.:

i) using a variable without initialization
ii) using a variable after freeing its space

3) Building static analysis checkers for other languages.

7. Limitations of the Project

1) To build static analysers there needs to be a platform; otherwise static analysis tool builders will have to build their own suitable compiler.

2) A checker platform is needed by which we may visit all the nodes of the CFG and analyse them individually, so that we may solve problems that involve understanding of the CFG, e.g. a file or database that is opened should be closed once and only once on every flow path of the CFG from start to end.
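The kind of CFG-based check described in point 2 might look like the following minimal sketch. It is not part of the project: the graph representation, the Action labels and the class name are all hypothetical, and the sketch assumes an acyclic CFG with a single tracked resource (loops would need a fixed-point analysis instead of plain path enumeration).

```java
import java.util.List;
import java.util.Map;

/** Checks that a resource is opened and closed exactly once on every CFG path. */
final class CfgCloseSketch {
    enum Action { NONE, OPEN, CLOSE }

    /** Hypothetical CFG node: what the node does, plus the ids of its successors. */
    static final class Node {
        final Action action;
        final List<Integer> successors;

        Node(Action action, List<Integer> successors) {
            this.action = action;
            this.successors = successors;
        }
    }

    /**
     * Walks every path from node 'id' to an exit node (a node without successors).
     * 'open' says whether the resource is currently open when the node is entered.
     * Assumes the graph is acyclic and that every id is present in the map.
     */
    static boolean closedExactlyOnce(Map<Integer, Node> cfg, int id, boolean open) {
        Node node = cfg.get(id);
        switch (node.action) {
            case OPEN:
                if (open) {
                    return false; // opened again while still open
                }
                open = true;
                break;
            case CLOSE:
                if (!open) {
                    return false; // closed twice, or closed without being opened
                }
                open = false;
                break;
            default:
                break;
        }
        if (node.successors.isEmpty()) {
            return !open; // a path must not end with the resource still open
        }
        for (int next : node.successors) {
            if (!closedExactlyOnce(cfg, next, open)) {
                return false;
            }
        }
        return true;
    }
}
```

Starting the walk with closedExactlyOnce(cfg, entryId, false), a graph in which both branches after the open rejoin at a single close is accepted, while a graph in which one branch reaches the exit without a close is rejected.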


8. Conclusion

Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem; the only way to do this is dynamic analysis.

9. References

1. Secure Programming with Static Analysis (Brian Chess, Jacob West; Addison-Wesley)
2. Compilers (Aho, Sethi, Ullman)
3. Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security (Daniel Wang, Peter Torr)
4. Control Flow Graph Generator (Aldi Alimucaj)
5. How to Write Your Own Eclipse Plug-ins, presentation (Beth Tibbits, IBM)
6. ITS4: A Static Vulnerability Scanner for C and C++ Code (John Viega, J.T. Bloch, Tadayoshi Kohno, Gary McGraw)
7. Static Analysis Tools (University of Toronto)
8. http://wiki.eclipse.org/CDT/designs/StaticAnalysis
9. http://www.eclipse.org/articles/Article-PDE-does-plugins/PDE-intro.html
10. Codan: a C/C++ Static Analysis Framework for CDT

27

bull How to build an AST of a CC++ source Windows gtShow view

gt Others gtCC++gtDOM AST bull How to get CDT Helpgt Install New SoftwaregtAdd the Url httpdownloadeclipseorgreleasesindigo And then select CDT

423 PDE (Plugin Development Environment) To develop eclipse plugins there is a plugin development

platform The ways of building plugins may be seen from reference 4 The Basic steps of Plugin Development are 1First Go to Filegt New project gt Plugin project 2 Then the MANIFESTMF in META-INF is edited

bull Add Dependencies (ie The plugins that are needed to run this checker plugin)

bull ADD Runtime bull ADD Extensions (eg These checkers need a point of extension orgeclipse

cdtcodancorecheckers) bull Add checker by right clickgtNewgtchecker Give class name as name of its

source code bull Under checker Add problem by right clickgtNewgtproblem And there

message that should be shown when error occurs and default enable etc bull On the Overview page in the Exporting part click on Organize Manifests

Wizard gtfinish Externalize Strings Wizard gtfinish bull At last in the Export Wizard portion Archive file give the name of your

plugin bull Now this zip folder may be included in eclipse plugins folder to make it

permanent in codan

3To test the plugin run it and then another eclipse window opens Right faulty code that your checker is supposed to catch You can see error and messages in the editor

18

5Implementation and Results C language(So also in C++) has a lots of vulnerable library function which may be used to crash a program if the arguments are unchecked and unsanitized So we may use some checker to make the programmer aware about such vulnerable functions or whether the values of arguments taken have no potential threat

Codan Plugins developed I For C function int strncpy(char dst const char src

size_t n) It is erroneous to give value of n greater than or equal to size of destination (dst) allocated So it must be checked when this vulnerable function is used It is an example of buffer overflow Algorithm Used 1) First find a call to function strncpy() by viewing all call to ICPPASTFunctionCallExpression and then checking whether first IASTNode is string ldquostrncpyrdquo If it is true then get the String value of the next three nodes The first string is the name of destination and third String is string form of value of n 2) Now we need to get the space allocated for destination

character pointer For that we may visit all the IASTDeclarationStatements made and find out the declaration of the destination character pointer and the space allocated

3) Last step is to compare the allocated space of destination character pointer and the value of n

19

Source Code AST

20

Limitations And Inefficiency of the checker 1)In the 2nd step of the algorithm the size of the allocated space of destination is determined by accessing the nodes to proper position But this method is inappropriate as space allocation may be done in two different ways so the solution to these may be maintaining a symbol table during static analysis as done during compilation II For C function fopen(stream rsquorrsquo) and fopen( stream rsquowrsquo) When a file is opened in lsquorrsquo mode then the file should already be present in the given stream and if the file is opened in write mode then we should warn the programmer about being the file overwritten There should be a block before every fopen() function with lsquorrsquo or lsquowrsquo mode to check those above conditions

Algorithm Used 1)First check all the ICPPASTFunctionDeclarations for fopen( ) function calls and accessing all its nodes get the name of stream and mode of opening 2)Then search all ICPPASTIfStatements to see if there is a desired function to check the given conditions above ie in this case(read ) access() function which should be in a IASTUnaryExpression (negation ) and there should be a return statement inside the block But in case of write case there should be access() function with appropriate return statement

21

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

5Implementation and Results C language(So also in C++) has a lots of vulnerable library function which may be used to crash a program if the arguments are unchecked and unsanitized So we may use some checker to make the programmer aware about such vulnerable functions or whether the values of arguments taken have no potential threat

Codan Plugins developed I For C function int strncpy(char dst const char src

size_t n) It is erroneous to give value of n greater than or equal to size of destination (dst) allocated So it must be checked when this vulnerable function is used It is an example of buffer overflow Algorithm Used 1) First find a call to function strncpy() by viewing all call to ICPPASTFunctionCallExpression and then checking whether first IASTNode is string ldquostrncpyrdquo If it is true then get the String value of the next three nodes The first string is the name of destination and third String is string form of value of n 2) Now we need to get the space allocated for destination

character pointer For that we may visit all the IASTDeclarationStatements made and find out the declaration of the destination character pointer and the space allocated

3) Last step is to compare the allocated space of destination character pointer and the value of n

19

Source Code AST

20

Limitations And Inefficiency of the checker 1)In the 2nd step of the algorithm the size of the allocated space of destination is determined by accessing the nodes to proper position But this method is inappropriate as space allocation may be done in two different ways so the solution to these may be maintaining a symbol table during static analysis as done during compilation II For C function fopen(stream rsquorrsquo) and fopen( stream rsquowrsquo) When a file is opened in lsquorrsquo mode then the file should already be present in the given stream and if the file is opened in write mode then we should warn the programmer about being the file overwritten There should be a block before every fopen() function with lsquorrsquo or lsquowrsquo mode to check those above conditions

Algorithm Used 1)First check all the ICPPASTFunctionDeclarations for fopen( ) function calls and accessing all its nodes get the name of stream and mode of opening 2)Then search all ICPPASTIfStatements to see if there is a desired function to check the given conditions above ie in this case(read ) access() function which should be in a IASTUnaryExpression (negation ) and there should be a return statement inside the block But in case of write case there should be access() function with appropriate return statement

21

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

Source Code AST

20

Limitations And Inefficiency of the checker 1)In the 2nd step of the algorithm the size of the allocated space of destination is determined by accessing the nodes to proper position But this method is inappropriate as space allocation may be done in two different ways so the solution to these may be maintaining a symbol table during static analysis as done during compilation II For C function fopen(stream rsquorrsquo) and fopen( stream rsquowrsquo) When a file is opened in lsquorrsquo mode then the file should already be present in the given stream and if the file is opened in write mode then we should warn the programmer about being the file overwritten There should be a block before every fopen() function with lsquorrsquo or lsquowrsquo mode to check those above conditions

Algorithm Used 1)First check all the ICPPASTFunctionDeclarations for fopen( ) function calls and accessing all its nodes get the name of stream and mode of opening 2)Then search all ICPPASTIfStatements to see if there is a desired function to check the given conditions above ie in this case(read ) access() function which should be in a IASTUnaryExpression (negation ) and there should be a return statement inside the block But in case of write case there should be access() function with appropriate return statement

21

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

Limitations And Inefficiency of the checker 1)In the 2nd step of the algorithm the size of the allocated space of destination is determined by accessing the nodes to proper position But this method is inappropriate as space allocation may be done in two different ways so the solution to these may be maintaining a symbol table during static analysis as done during compilation II For C function fopen(stream rsquorrsquo) and fopen( stream rsquowrsquo) When a file is opened in lsquorrsquo mode then the file should already be present in the given stream and if the file is opened in write mode then we should warn the programmer about being the file overwritten There should be a block before every fopen() function with lsquorrsquo or lsquowrsquo mode to check those above conditions

Algorithm Used 1)First check all the ICPPASTFunctionDeclarations for fopen( ) function calls and accessing all its nodes get the name of stream and mode of opening 2)Then search all ICPPASTIfStatements to see if there is a desired function to check the given conditions above ie in this case(read ) access() function which should be in a IASTUnaryExpression (negation ) and there should be a return statement inside the block But in case of write case there should be access() function with appropriate return statement

21

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

22

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type

23

24

25

6Future Work Scope 1)Maintaining a symbol table for all the variables (ie type allocated space name for easy access of them while needed This may help in many problems eg- To check all the values possible in the switch argument variable are covered by the cases 2) Building checkers for other problems Eg-

i) using some variable without initialization ii)using some variable after freeing the space

3) Building static analysis checkers for other languages

7 Limitation of the Project 1) To build static analysers there needs to be a platform or otherwise static analysis tool builders will have to build their own suitable compiler 2) A checker platform by which we may visit all the nodes of CFG and analyse them individually so that we may be able to solve problems that involves understanding of CFG eg- A file or database that is opened should be closed once and only once in a flow path of CFG from start to end

26

8Conclusion Alan Turing as part of his conception of a general purpose computing machine showed that algorithms cannot be used to solve all problems In particular Turing posed the halting problem the problem of determining whether a given algorithm terminates (reaches a final state) The proof that the halting problem is undecidable boils down to the fact that the only way to know for sure what an algorithm will do is to carry it out So that means that a static analysis tool is not enough to find out if an algorithm can successfully handle a problem The only way to do this is dynamic analysis

9References 1 Secure programming with Static Analysis(By Brian Chess Jacob West Addison Wesley) 2 Compilers (By Aho Sethi Ullman ) 3 Checking Threat Modelling Data Flow Diagrams for Implementation Conformance and Security( Daniel Wang-Peter Torr) 4Control flow graph Generator (By Aldi Alimucaj) 5 How to Write Your Own Eclipse Plug-ins Presentation (by Beth Tibbits IBM) 6 ITS4 A Static Vulnerability Scanner for C and C Code John Viega JT Bloch1048576 Tadayoshi Kohno Gary McGraw 7Static Analysis tools (University of Toronto) 8 httpwikieclipseorgCDTdesignsStaticAnalysis 9 httpwwweclipseorgarticlesArticle-PDE-does-pluginsPDE-introhtml 10 Codan a CC++ Static Analysis Framework for CDT

27

III For C function printf() and its friends [fprintf() sprintf()sprintf() vprintf()] as well as scanf() and its friends sscanf() fscanf() vscanf() All these function takes a format string and all the arguments needed mentioned in format string Error here may occur in two cases 1) If number of format specifiers in the format string is not equal to arguments present 2) If there is no format string 3) If format specifier and the corresponding argument indicates two different type Algorithm 1) Inside a function get account of all the IASTDeclarations and the type of variables And at the same time see the IASTFunctionCallExpression If a function declaration is printf or scanf that may be known from its first node then see the total number of nodes of it ( let be x) 2)The 2nd node is the format string A regular expression analysis is done on this node (the string form of this node excluding the first and last character ) and number of substring with javaregexPattern ( [-+0][(0-9)][(0-9)] [hlL] [cdie EfgGosuxXp]) is noted and if it is equal to (x-2) then it is okay otherwise it is an error 3) To check the third one we will have to check all the format specifier and the corresponding variable type


6. Future Work Scope
1) Maintaining a symbol table for all the variables (i.e. type, allocated space and name) for easy access to them when needed. This may help with many problems, e.g. checking that all the possible values of the switch argument variable are covered by the cases.
2) Building checkers for other problems, e.g.
i) using a variable without initialization
ii) using a variable after its space has been freed
3) Building static analysis checkers for other languages.

7. Limitations of the Project
1) To build static analysers there needs to be a platform; otherwise, static analysis tool builders have to build their own suitable compiler.
2) A checker platform is needed through which we can visit all the nodes of the CFG and analyse them individually, so that we can solve problems that involve understanding of the CFG, e.g. a file or database that is opened should be closed once and only once on each flow path of the CFG from start to end.
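The kind of CFG-based check mentioned in the second point could look roughly like the sketch below. It is a minimal illustration, not part of the project: the class OpenCloseCheck, the representation of the CFG as maps from node numbers to successors and to "open"/"close" actions, and the restriction to an acyclic graph are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;

public class OpenCloseCheck {
    // Returns true if, on every path from entry to an exit node, the
    // resource is never opened twice, never closed while not open,
    // and is not left open at the exit. Assumes the CFG has no loops.
    public static boolean closedExactlyOnce(Map<Integer, List<Integer>> successors,
                                            Map<Integer, String> actions,
                                            int entry) {
        return walk(successors, actions, entry, false);
    }

    private static boolean walk(Map<Integer, List<Integer>> successors,
                                Map<Integer, String> actions,
                                int node, boolean isOpen) {
        String action = actions.getOrDefault(node, "");
        if (action.equals("open")) {
            if (isOpen) return false;      // opened twice on this path
            isOpen = true;
        } else if (action.equals("close")) {
            if (!isOpen) return false;     // closed without being open
            isOpen = false;
        }
        List<Integer> next = successors.getOrDefault(node, List.of());
        if (next.isEmpty()) {
            return !isOpen;                // exit node: must be closed by now
        }
        for (int successor : next) {
            if (!walk(successors, actions, successor, isOpen)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical CFG: 0 -> 1, 1 -> 2 or 3, 2 -> 4, 3 -> 4 (4 is the exit).
        Map<Integer, List<Integer>> successors = Map.of(
            0, List.of(1), 1, List.of(2, 3), 2, List.of(4), 3, List.of(4));
        // Closing only in branch 2 leaves the path through node 3 open.
        Map<Integer, String> leaky = Map.of(0, "open", 2, "close");
        // Closing in the common exit node covers both paths.
        Map<Integer, String> correct = Map.of(0, "open", 4, "close");
        System.out.println(closedExactlyOnce(successors, leaky, 0));   // false
        System.out.println(closedExactlyOnce(successors, correct, 0)); // true
    }
}
```

A real CFG contains loops, so an actual checker would need a worklist or visited-state bookkeeping instead of this plain recursion.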


8. Conclusion
Alan Turing, as part of his conception of a general-purpose computing machine, showed that algorithms cannot be used to solve all problems. In particular, Turing posed the halting problem: the problem of determining whether a given algorithm terminates (reaches a final state). The proof that the halting problem is undecidable boils down to the fact that, in general, the only way to know for sure what an algorithm will do is to carry it out. This means that a static analysis tool alone is not enough to find out whether an algorithm can successfully handle a problem; in general, this can only be established by running the program, that is, by dynamic analysis.

