Nikolaj Bjørner Senior Researcher Microsoft Research Redmond

Nikolaj BjørnerSenior Researcher

Microsoft Research Redmond

Modern Satisfiability Modulo Theories Solvers in Program Analysis

Lectures

Wednesday 10:45–12:15An Introduction to Z3 with Applications

Thursday August 30th 15:45–17:15Introduction to SAT and SMT

Friday 10:30–10:45 Theories and Solving Algorithms

Friday 15:45–17:15 Advanced & Summary: Quantifiers, Arrays, Fixed-points

Plan

I. Introduction to Z3 – An Efficient SMT SolverI. ArchitectureII. Interaction – Input formatsIII. Theories

II. Introduction to ApplicationsI. Basic examples of SMT solvingII. A selection of applications using Z3/SMT.

Takeaways:

• Many program analysis, verification and test tools solve problems that can be reduced to logical formulas and transformations between logical formulas at their core.

• Modern SMT solvers are a often good fit.– Handle domains found in programs directly.

• The selected examples are intended to show instances where sub-tasks are reduced to SMT/Z3.

Z3 – AN EFFICIENT SMT SOLVER

http://research.microsoft.com/projects/z3


What is Z3?

TheoriesBit-Vectors

Lin-arithmetic Nonlin-arithmetic

Free (uninterpreted) functions

Arrays

Quantifiers:E-matching

OCaml

.NET

CNative

SMT-LIB

Model Generation:Finite/parametric

Simplify

Floating pointsRecursive Datatypes

Quantifiers:Super-position

Proof objects

TacticsAssumption

tracking

By Leonardo de Moura, Nikolaj Bjørner, Christoph Wintersteiger

F# quote

Python

Z3 is Cutting Edge

http://smtcomp.org

Z3 is Free Software

for research

Z3: Little Engines of Proof

Congruence

Closure

SAT Solver

Simplifier

Quant.Instance

s

Simplex

Bit-Arith

Super-position

Arrays

Datatypes

User-Theorie

s

MBQI

Q- Elim

Freely available from http://research.microsoft.com/projects/z3


INPUT FORMATS

Input Formats

• Text:– SMT-LIB2 - main exchange format for SMT solvers– Log - low-level log for replay– Datalog - a Datalog format for the fixed-point engine– DIMACS - a format for propositional SAT

• Programmatic:– C - API functions exposed for C– Ocaml - Ocaml wrapper around C API– .NET - .NET wrapper around C API– Python - Try it on http://rise4fun.com/z3py – Scala - by Phillipe Suter

http://rise4fun.com/z3py

A Primer on SMT-LIB2

Online interactive tutorial

http://rise4fun.com/z3/tutorial

http://rise4fun.com/z3/tutorial

THEORIES

Theories

Uninterpreted functions

http://rise4fun.com/Z3/poAs



Uninterpreted functionsArithmetic (linear)

Theories

http://rise4fun.com/Z3/qLLZ

Uninterpreted functionsArithmetic (linear)Bit-vectors

Theories

http://rise4fun.com/Z3/smtc_bv

http://rise4fun.com/Z3/smtc_bv

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-types

Theories

http://rise4fun.com/Z3/smtc_datatypes

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-typesArrays

Theories

http://rise4fun.com/Z3/smtc_arrays

http://rise4fun.com/Z3/smtc_arrays

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-typesArraysPolynomial Arithmetic

Theories

http://rise4fun.com/Z3/heX

http://rise4fun.com/Z3/heX

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-typesArraysPolynomial Arithmetic

Theories

http://rise4fun.com/Z3/VB4

http://rise4fun.com/Z3/VB4

USER-INTERACTION AND GUIDANCE

Logical Formula

Sat/Model

Interaction

Interaction

Logical Formula

Unsat/Proof

Interaction

Simplify

Logical Formula

Interaction

ImpliedEqualities

- x and y are equal- z + y and x + z are equal

Logical Formula

Interaction

QuantifierEliminatio

n

Logical Formula

Interaction

Logical Formula

Unsat. Core

APPLICATIONS OF Z3

A Decision Engine for SoftwareSome Microsoft engines:- SDV: The Static Driver Verifier- PREfix: The Static Analysis Engine for C/C++.- Pex: Program EXploration for .NET.- SAGE: Scalable Automated Guided Execution - Spec#: C# + contracts- VCC: Verifying C Compiler for the Viridian Hyper-Visor- HAVOC: Heap-Aware Verification of C-code.- SpecExplorer: Model-based testing of protocol specs.- Yogi: Dynamic symbolic execution + abstraction.- FORMULA: Model-based Design- F7: Refinement types for security protocols- M3: Model Program Modeling- VS3: Abstract interpretation and Synthesis

They all use the SMT solver Z3.

Hyper-V

Applications

Test case generation

Verifying Compilers

Predicate Abstraction

Invariant Generation

Type Checking

Model Based Testing

Theorem Provers/Satisfiability Checkers

A formula F is validIff

F is unsatisfiable

Theorem Prover/Satisfiability Checker

SatisfiableModel

FUnsatisfiableProof

Theorem Provers/Satisfiability Checkers

A formula F is validIff

F is unsatisfiable


SatisfiableModel

FUnsatisfiableProof

Timeout Memout

Verification/Analysis Tool: “Template”

Verification/AnalysisTool


Problem

Logical Formula

UnsatisfiableSatisfiable(Counter-example)

SMT EXAMPLEBasic

Is formula satisfiable modulo theory T ?

SMT solvers have specialized algorithms for

T

Satisfiability Modulo Theories (SMT)

ArithmeticArray TheoryUninterpreted Functions

𝑥+2=𝑦⇒ 𝑓 (𝑠𝑒𝑙𝑒𝑐𝑡 (𝑠𝑡𝑜𝑟𝑒 (𝑎 ,𝑥 ,3 ) , 𝑦−2 ) )= 𝑓 (𝑦−𝑥+1)

Satisfiability Modulo Theories (SMT)

SMT EXAMPLEJob Shop Scheduling

Job Shop Scheduling

Machines

Jobs

P = NP? Laundry 𝜁 (𝑠 )=0⇒ 𝑠=12+𝑖𝑟

Tasks

Constraints:Precedence: between two tasks of the same job

Resource: Machines execute at most one job at a time

4

132

[ 𝑠𝑡𝑎𝑟 𝑡2,2 ..𝑒𝑛𝑑2,2 ]∩ [ 𝑠𝑡𝑎𝑟 𝑡4,2 ..𝑒𝑛𝑑4,2 ]=∅

Job Shop Scheduling

Constraints:Encoding:

Precedence: - start time of job 2 on mach 3 - duration of job 2 on mach 3Resource:

413

2

[ 𝑠𝑡𝑎𝑟 𝑡2,2 ..𝑒𝑛𝑑2,2 ]∩ [ 𝑠𝑡𝑎𝑟 𝑡4,2 ..𝑒𝑛𝑑4,2 ]=∅

Not convex

Job Shop Scheduling

Job Shop Scheduling

Job Shop Scheduling

case split

case split

Efficient solvers:- Floyd-Warshal algorithm- Ford-Fulkerson algorithm

𝑧−𝑧=5 – 2 –3 – 2=−2<0

EXAMPLE EXERCISESFollow-on to lectures by Hoare and Petersen

True or false?

SMT-LIB2 formulation:(define-fun Max ((x Int) (y Int)) Int (if (> x y) x y))(define-fun ominus ((x Int) (y Int)) Int (Max 0 (- x y)))(declare-const x Int)(declare-const a Int)(assert (not (<= (+ (ominus x a) a) x)))(check-sat)(get-model)

Basic Arithmetic

http://rise4fun.com/Z3/NvqQ

http://rise4fun.com/Z3/ID3

http://rise4fun.com/Z3/Pplg

Follow on Exercises

Prove the other properties for WP

Prove Axioms for Hoare and Milner calculi

SMT APPLICATIONTest case generation

Test generation tools

SAGE

Internal. For Security Fuzzing

Runs on x86 instructions

External. For Developers

Runs on .NET code

Try it on: http://pex4fun.com

Finding security bugs before the hackers

black hat

http://pex4fun.com/

SMT APPLICATIONStatic Analysis

Integrating Z3 with PREfix static analyzer. Yannick Moy, David Selaff, B

int binary_search(int[] arr, int low, int high, int key)

while (low <= high) { // Find middle value int mid = (low + high) / 2; int val = arr[mid]; if (val == key) return mid; if (val < key) low = mid+1; else high = mid-1; } return -1;}

void itoa(int n, char* s) {

if (n < 0) { *s++ = ‘-’; n = -n; } // Add digits to s ….

-INT_MIN= INT_MIN

(INT_MAX+1)/2 +(INT_MAX+1)/2

= INT_MIN

Package: java.util.ArraysFunction: binary_search

Book: Kernighan and RitchieFunction: itoa (integer to ascii)

What is wrong here?

int init_name(char **outname, uint n){ if (n == 0) return 0; else if (n > UINT16_MAX) exit(1); else if ((*outname = malloc(n)) == NULL) { return 0xC0000095; // NT_STATUS_NO_MEM; } return 0;}

int get_name(char* dst, uint size) { char* name; int status = 0; status = init_name(&name, size); if (status != 0) { goto error; } strcpy(dst, name);error: return status;}

The PREfix Static Analysis Engine

C/C++ functions




C/C++ functions

model for function init_nameoutcome init_name_0: guards: n == 0 results: result == 0outcome init_name_1: guards: n > 0; n <= 65535 results: result == 0xC0000095outcome init_name_2: guards: n > 0|; n <= 65535 constraints: valid(outname) results: result == 0; init(*outname)

models




C/C++ functions

model for function init_nameoutcome init_name_0: guards: n == 0 results: result == 0outcome init_name_1: guards: n > 0; n <= 65535 results: result == 0xC0000095outcome init_name_2: guards: n > 0|; n <= 65535 constraints: valid(outname) results: result == 0; init(*outname)

path for function get_name guards: size == 0 constraints: facts: init(dst); init(size); status == 0

models

paths

warnings

pre-condition for function strcpy init(dst) and valid(name)

6/26/2009 54

iElement = m_nSize;if( iElement >= m_nMaxSize ){

bool bSuccess = GrowBuffer( iElement+1 );…

}::new( m_pData+iElement ) E( element );m_nSize++;

Overflow on unsigned addition

m_nSize == m_nMaxSize == UINT_MAX

Write in unallocated

memory

iElement + 1 == 0

Code was written for

address space < 4GB

Using an overflown value as allocation size

ULONG AllocationSize;while (CurrentBuffer != NULL) { if (NumberOfBuffers > MAX_ULONG / sizeof(MYBUFFER)) { return NULL;

} NumberOfBuffers++; CurrentBuffer = CurrentBuffer->NextBuffer;

}AllocationSize = sizeof(MYBUFFER)*NumberOfBuffers;UserBuffersHead = malloc(AllocationSize);

6/26/2009 55

Overflow check

Possible overflow

Increment and exit from loop

LONG l_sub(LONG l_var1, LONG l_var2){ LONG l_diff = l_var1 - l_var2; // perform subtraction // check for overflow if ( (l_var1>0) && (l_var2<0) && (l_diff<0) ) l_diff=0x7FFFFFFF …

6/26/2009 56

Possible overflow

Forget corner case INT_MIN

Overflow on unsigned subtraction

for (uint16 uID = 0; uID < uDevCount && SUCCEEDED(hr); uID++) {…

if (SUCCEEDED(hr)) { uID = uDevCount; // Terminates the loop

6/26/2009 57

Possible overflow

Loop does not terminate

uID == UINT_MAX

Overflow on unsigned addition

Using an overflown value as allocation size

DWORD dwAlloc;dwAlloc = MyList->nElements * sizeof(MY_INFO);if(dwAlloc < MyList->nElements) … // returnMyList->pInfo = malloc(dwAlloc);

6/26/2009 58

Can overflow

Allocate less than needed

Not a proper test

Modular arithmetic

Bit-wise operations

1 0 1 0 1 1 0 1 1 0 0 1

1 0 1 0 1 1 0 1 1 0 0 1

=

Concatenation

1 0 1 0 1 1 [4:2] = 0 1 0

1 0 1 0 1 1

0 1 1 0 0 1

0 0 1 0 0 1

=

1 0 1 0 1 1

0 1 1 0 0 1+

0 0 0 1 0 0

=

Extraction

Bit-wise and

AdditionVector

Segments

Vector Segments

Bit-precise analysis

Bit-precise analysis

FA

a0b0a0b1a0b2a0b3

a1b0a1b1a1b2

a2b0

HAHAHA

FA

FA

a2b1

a3b0

out0 out1 out2 out3

O(n2) clauses

SAT solving time increases exponentially. Similar for BDDs.

Brute-force enumeration + evaluation faster for 20 bits.

Method: Encode bit-vectors bit-by-bit into Boolean Satisfiability

Multiplication [out3, out2, out1, out0] = [a3, a2, a1, a0] * [b3, b2, b1, b0]

SMT APPLICATIONVerified Software

Microsoft Verifying C Compiler

Hypervisor Verification (2007 – 2010)

Partners:• European Microsoft Innovation Center• Microsoft Research• Microsoft’s Windows Division• Universität des Saarlandes

co-funded by the German Ministry of Education and Research

http://www.verisoftxt.de

Hypervisor: Scale & Speed

• VCs have several Mega-bytes• Thousands of non ground formulas• Developers are willing to wait at most 5 min

per VC

Are you willing to wait more than 5 min for your compiler?

Verification Attempt Time vs.Satisfaction and Productivity

By Michal Moskal (VCC Designer and Software Verification Expert), Language quiz: “loose” or “lose” ?

Why did my proof attempt fail?

1. My annotations are not strong enough!weak loop invariants and/or contracts

2. My theorem prover is not strong (or fast) enough.Send “angry” email to Nikolaj and Leo.

VCC Performance Trends Nov 08 – Mar 09

1

10

100

1000

Attempt to improve Boogie/Z3 interaction

Modification in invariant checking

Switch to Boogie2

Switch to Z3 v2

Z3 v2 update

The Importance of Speed

Building VerveVerifie

d

Safe to the Last Instruction / Jean Yang & Chris Hawbliztl PLDI 2010

C# compiler

Kernel.cs

Boogie/Z3

Translator/Assembler

TAL checker

Linker/ISO generator

Verve.iso

Source file

Compilation toolVerification tool

Nucleus.bpl (x86) Kernel.obj (x86)

9 person-months

SMT APPLICATIONFormal Language Theory and Security

Why string analysis?(motivating scenario)

Tomcat v. < 6.0.18

req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/

Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php

Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

1) security check: req must not contain "../"

2) dir = utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/"

access granted to "../../private/"

Analysis question:Does utf8decode reject overlong

utf8-encodings such as "%C0%AE" for '.'?

http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php



http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938



Symbolic Word Transducers

Relativized Formal Language Theory

Classical Word Transducers(e.g. decoding automata,

rational transductions)

Classical I/O Automata(e.g. Mealy machine)

ClassicalWord Acceptors

(NFA, DFA)

Symbolic Word Acceptors

regex matching

string transformation

Classical Word Acceptors modulo Th()

Classical Word Transducers modulo Th()

Rex & Bek – Symbolic RegEx & Transducers

Margus Veanes

http://rise4fun.com/Rex/J3

Symbolic Finite Transducer (SFT)

• Classical transducer modulo a rich label theory• Core Idea: represent labels with guarded

transformation functions– Separation of concerns: finite graph / theory of labels

Concrete transitions:

p

q

Symbolic transition:

‘\x80’/“\xC2\x80”

… ‘\x7FF’/“\xDF\xBF”

q

p

x. 8016 ≤ x ≤ 7FF16/[C016|x10,6, 8016|x5,0]

guard

bitvector operations

1920transitions

SMT APPLICATIONSoftware Model Checking

Static Driver Verifier

SLAM – part of SDV (Windows 7, 8)Z3 is used for:

Predicate abstraction Counter-example refinement

Yogi – next generation SDVZ3 is used for: Refining transition abstractions

Corral – uses reachability modulo theoriesSLAyer – checks memory safety using separation logic

Z3 & Static Driver Verifier: SLAM

• All-SAT– Better (more precise) Predicate Abstraction

• Unsatisfiable cores– Why the abstract path is not feasible?– Fast Predicate Abstraction

Summary

Introduction to Z3: An Efficient SMT Solver

Introduction to Applications:

– Exemplary applications

– Tools built around Z3

Tomorrow. More technical:– Introduction to SAT solving.– Introduction to SMT solving.

Documents

Nikolaj Bjørner Senior Researcher Microsoft Research Redmond