The operation principles of PVS-Studio static code analyzer

Authors:
Candidate of Engineering Sciences Evgeniy Ryzhkov, [email protected]
Candidate of Physico-Mathematical Sciences Andrey Karpov, [email protected]


OOO "Program Verification Systems" (www.viva64.com)
• Development, marketing and sales of a software product.
• Office: Tula, 200 km away from Moscow.
• Staff: 24 people.

PVS-Studio
• More than 320 diagnostics for C, C++
• More than 120 diagnostics for C#
• Windows
• Linux
• Plugin for Visual Studio
• Quick Start (compilation monitoring)
• SonarQube

Our achievements
• To let the world know about our product, we check open-source projects. So far we have checked about 270 projects.
• A "side" effect: we found more than 10 000 bugs in open-source projects, without setting it as a goal.
• On average that is about 40 errors per project, which is not that many.
• It is important to emphasize once more that this was a "side" effect. We don't have a goal to find as many errors as possible. Quite often, we stop when we have found enough errors for an article.
• Conclusion: it's rather easy to check even unfamiliar projects and find errors in them.

In the beginning: what we DO NOT USE

We do not use formal grammar for analysis
• The analyzer works on a higher level
• We analyze the derivation tree
• To build the tree we rely on existing components:
  • External preprocessor
  • OpenC++ library, which we improved as C++ developed (actually there is almost nothing left from OpenC++)
  • When working with C# code we take Roslyn as the basis

We do not use program proof methods
• PVS-Studio has nothing to do with the Prototype Verification System (PVS) http://pvs.csl.sri.com/
• PVS-Studio is a contraction of "Program Verification Systems" (OOO "Program Verification Systems")

We do not use substring search (string matching) and regular expressions
• A dead-end way
• It is of no use even in the simplest situations
• Example: if (A+B == A+B)
  • A+B == B+A
  • A+(B) == (A)+B
  • ((A+B)) == A+B
• Even worse: text search knows nothing about types, object sizes, inheritance, variable values and so on.

What we USE

The details of C++ and C# analysis differ; we are not going to cover them here.

Pattern-based analysis
• Pattern matching based on the derivation tree
• It is used to search for fragments in the source code that are similar to known code patterns that contain an error
• The complexity of the diagnostics varies greatly
• In some cases these are empirical algorithms

A simple case: copy-paste

The GCC Project

    if ((*path)[0]->e->dest->loop_father != path->last()->e->....)
    {
      delete_jump_thread_path (path);
      e->aux = NULL;
      ei_next (&ei);
    }
    else
    {
      delete_jump_thread_path (path);
      e->aux = NULL;
      ei_next (&ei);
    }

V523 The 'then' statement is equivalent to the 'else' statement. tree-ssa-threadupdate.c 2596

A more complicated case: check of a wrong variable

The CodeContracts Project

    public override Predicate JoinWith(Predicate other)
    {
      var right = other as PredicateNullness;
      if (other != null)
      {
        if (this.value == right.value)
        {

V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'other', 'right'. CallerInvariant.cs 189

Quite a complicated case: a badly written macro

The FreeBSD Project

    #define ICB2400_VPINFO_PORT_OFF(chan) \
      (ICB2400_VPINFO_OFF + \
       sizeof (isp_icb_2400_vpinfo_t) + \
       (chan * ICB2400_VPOPT_WRITE_SIZE))

    off += ICB2400_VPINFO_PORT_OFF(chan - 1);

V733 It is possible that macro expansion resulted in incorrect evaluation order. Check expression: chan - 1 * 20. isp.c 2301

Type inference
• Type inference based on the semantic model of the program gives the analyzer full information about all variables and statements in the code.
• It is important for detecting errors
• It is important for exceptions (cases when a warning must not be issued)
• The information about classes is especially important

Types are also important for bug detection

The Cocos2d-x project

    WCHAR *gai_strerrorW(int ecode);

    #define gai_strerror gai_strerrorW

    fprintf(stderr, "net_listen error for %s: %s",
            serv, gai_strerror(n));

V576 Incorrect format. Consider checking the fourth actual argument of the 'fprintf' function. The pointer to string of char type symbols is expected. ccconsole.cpp 341

Types are important for exceptions

    // A volatile variable is assigned to itself
    volatile int *ptr;
    ....
    *ptr = *ptr; // No V570 warning here

The information about classes is especially important: inheritance hierarchy, for instance

The FlightGear project

    class sg_throwable : public std::exception { .... };
    class sg_exception : public sg_throwable { .... };

    if (!aInstall) {
      sg_exception("missing argument to scheduleToUpdate");
    }

V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw sg_exception(FOO); root.cxx 239

Symbolic execution
• Symbolic execution allows evaluating variable values that can lead to errors and performing range checking of values.
• One of the most important mechanisms:
  • Overflows
  • Memory leaks
  • Array index out of bounds
  • Null pointers/references
  • Meaningless conditions
  • Division by zero
  • and so on…

The values of variables: the size of the array, indices

The QuantLib project

    Handle<YieldTermStructure> md0Yts() {
      double q6mh[] = {
        0.0001,0.0001,0.0001,0.0003,0.00055,0.0009,0.0014,0.0019,
        0.0025,0.0031,0.00325,0.00313,0.0031,0.00307,0.00309,
        ........................................................
        0.02336,0.02407,0.0245 };          // 60 elements
      ....
      for(int i=0;i<10+18+37;i++) {        // i < 65
        q6m.push_back(
          boost::shared_ptr<Quote>(new SimpleQuote(q6mh[i])));

V557 Array overrun is possible. The value of 'i' index could reach 64. markovfunctional.cpp 176

The values of variables: using conditions to determine the range

The OpenMW project

    std::string rangeTypeLabel(int idx)
    {
      const char* rangeTypeLabels[] = {"Self", "Touch", "Target"};
      if (idx >= 0 && idx <= 3)
        return rangeTypeLabels[idx];
      else
        return "Invalid";
    }

V557 Array overrun is possible. The value of 'idx' index could reach 3. esmtool labels.cpp 502

The values of functions

The Thunderbird project

    static inline size_t UnboxedTypeSize(JSValueType type)
    {
      switch (type) {
        .......
        default: return 0;
      }
    }

    MInstruction *loadUnboxedProperty(size_t offset, ....)
    {
      size_t index = offset / UnboxedTypeSize(unboxedType);

V609 Divide by zero. Denominator range [0..8]. ionbuilder.cpp 10922

The values of variables: pointers/references

The PowerShell Project

    if (providerName == null)
    {
      ProviderNotFoundException e =
        new ProviderNotFoundException(
          providerName.ToString(),
          SessionStateCategory.CmdletProvider,
          "ProviderNotFound",
          SessionStateStrings.ProviderNotFound);
      throw e;

V3080 Possible null dereference. Consider inspecting 'providerName'. System.Management.Automation SessionStateProviderAPIs.cs 1004

Method annotations
• Method annotations provide more information about the methods used than can be obtained by analyzing only their signatures.
• C/C++. So far we have annotated 6570 functions (standard C and C++ libraries, POSIX, MFC, Qt, ZLib and so on).
• C#. So far we have annotated 920 functions.

An example of annotating the memcmp function

    C_"int memcmp(const void *buf1, const void *buf2, size_t count);"
    ADD(REENTERABLE | RET_USE | F_MEMCMP | STRCMP | HARD_TEST | INT_STATUS,
        nullptr, nullptr, "memcmp", POINTER_1, POINTER_2, BYTE_COUNT);

• C_ - an auxiliary control mechanism of annotations (unit tests)
• REENTERABLE - repeated calls with the same arguments give the same result
• RET_USE - the result should be used
• F_MEMCMP - launch of certain checks for buffer out of bounds
• STR_CMP - the function returns 0 in case of equality
• HARD_TEST - a special function. Some programmers define their own functions in their own namespace. Ignore namespace.
• INT_STATUS - the result must not be explicitly compared with 1 or -1.
• POINTER_1, POINTER_2 - the pointers must be non-null and different.
• BYTE_COUNT - this parameter specifies the number of bytes and must be > 0.

Annotation of memcmp: checking the result

The CoreCLR project

    bool operator()(const GUID& _Key1, const GUID& _Key2) const
    {
      return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1;
    }

V698 Expression 'memcmp(....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp(....) < 0' instead. sos util.cpp 142

Annotation of memcmp: storing the result

The Firebird project

    SSHORT TextType::compare(ULONG len1, const UCHAR* str1,
                             ULONG len2, const UCHAR* str2)
    {
      ....
      SSHORT cmp = memcmp(str1, str2, MIN(len1, len2));
      if (cmp == 0)
        cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
      return cmp;
    }

V642 Saving the 'memcmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. texttype.cpp 3

Annotation of memcmp: wrong argument

The GLG3D project

    bool Matrix4::operator==(const Matrix4& other) const {
      if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
        return true;
      }
      ...
    }

V575 The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269

Annotation of memcmp: different arguments

The GDB Project

    static int
    psymbol_compare (const void *addr1, const void *addr2, int length)
    {
      struct partial_symbol *sym1 = (struct partial_symbol *) addr1;
      struct partial_symbol *sym2 = (struct partial_symbol *) addr2;

      return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value,
                      sizeof (sym1->ginfo.value)) == 0
              && .......

V549 The first argument of 'memcmp' function is equal to the second argument. psymtab.c 1580

Annotation of memcmp: buffer underrun

The Haiku project

    dst_s_read_private_key_file(....)
    {
      ....
      if (memcmp(in_buff, "Private-key-format: v", 20) != 0)   // the literal has 21 characters
        goto fail;
      ....
    }

V512 A call of the 'memcmp' function will lead to underflow of the buffer '"Private-key-format: v"'. dst_api.c 858

Annotation of memcmp: no status

The PHP project

    if ((len == 4) /* sizeof (none|auto|pass) */ &&
        (!memcmp("pass", charset_hint, 4) ||
         !memcmp("auto", charset_hint, 4) ||
         !memcmp("auto", charset_hint, 4)))

V501 There are identical sub-expressions '!memcmp("auto", charset_hint, 4)' to the left and to the right of the '||' operator. html.c 396

Annotation of custom functions
• Almost no support (except certain elements, for example, user-defined printf-like functions)
• There is no point in developing this mechanism
• No one will spend months doing the markup of large projects
• The analyzer must work immediately

Testing the analyzer
• Testing the analyzer is the most important part of the development process
• The hardest part of static analysis: knowing when not to issue a warning
• A large test base:
  • C++ Windows (Visual C++): 120 projects
  • C++ Linux (GCC): 34 more projects
  • C# Windows: 54 projects

We can send a more detailed version of the presentation
• Write to us: [email protected]
• Follow on Twitter: @Code_Analysis
• Download PVS-Studio for Windows: http://www.viva64.com/en/pvs-studio/
• Download PVS-Studio for Linux: http://www.viva64.com/en/pvs-studio-download-linux/