Property-based testing an open source compiler, pflua
A fast and easy way to find bugs
[email protected] (luatime.org)
www.igalia.com
Katerina Barone-Adesi
Summary
● What is property-based testing?
● Why is it worth using?
● Property-based testing case study with pflua, an open source compiler
● How do you implement it in an afternoon?
● What tools already exist?
Why test?
● Reliability
● Interoperability
● Avoiding regressions
● … but this is the test room, so hopefully people already think testing is useful and necessary
Why property-based testing?
● Writing tests by hand is slow, boring, expensive, and usually doesn't lead to many tests being written
● Generating tests is cheaper, faster, more flexible, and more fun
● Covers cases humans might not
Why is it more flexible?
● Have you ever written a set of unit tests, then had to change them all by hand as the code changes?
● It's a lot easier and faster to change one part of test generation instead!
What is property-based testing?
● Choose a property (a statement that should always be true), such as:
● somefunc(x, y) < 100
● sort(sort(x)) == sort(x) (for stable sorts)
● run(expr) == run(optimize(expr))
● our_app(input) == other_app(input)
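As a minimal sketch of the idea, the sort property above can be checked against random inputs like this (the `random_list` and `equal` helpers are illustrative, not pflua code):

```lua
-- Minimal property-based check of sort(sort(x)) == sort(x).
-- random_list and equal are hypothetical helpers, for illustration only.
local function random_list()
   local t = {}
   for i = 1, math.random(0, 20) do
      t[i] = math.random(0, 100)
   end
   return t
end

local function equal(a, b)
   if #a ~= #b then return false end
   for i = 1, #a do
      if a[i] ~= b[i] then return false end
   end
   return true
end

for i = 1, 100 do
   local xs = random_list()
   table.sort(xs)
   local once = {}
   for j = 1, #xs do once[j] = xs[j] end  -- snapshot after one sort
   table.sort(xs)                          -- sort a second time
   assert(equal(once, xs), "counter-example: sort is not idempotent")
end
```

If the assertion ever fires, the random list that triggered it is a counter-example; if it never fires, that is evidence (not proof) that the property holds.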
What is property-based testing not?
● A formal proof
● Exhaustive (except for very small types)
● What that means: property-based testing tries to find counter-examples. If you find a counter-example, something is wrong and must be changed. If you don't, it's evidence (NOT proof) towards that part of your program being correct.
Why not exhaustively test?
● Too difficult
● Too expensive
● Too resource-consuming (human and computer time)
● Formal methods and state space reduction have limitations
What is pflua?
● Pflua is a source-to-source compiler
● It takes libpcap's filter language (which we call pflang) and emits Lua code
● Why? This lets us run the Lua code with LuaJIT
● Performance: better than libpcap, often by a factor of two or more
● https://github.com/Igalia/pflua/
● Apache License, Version 2.0
What is pflang?
● The input for pflua, libpcap, and other tools
● Igalia's name for it, not an official name
● A language for defining packet filters
● Examples: “ip”, “tcp”, “tcp port 80”, …
● tcp port 80 and not host 192.168.0.1
● If you've used Wireshark or tcpdump, you've used pflang
Case study: testing pflua
● Pflua already had two forms of testing, and works in practice
● Andy Wingo and I implemented a property-based checker in an afternoon, with one property...
What was the test property?
● Lua code generated from optimized and unoptimized IR has the same result on the same random packet
● It compared two paths:
● Input → IR → optimize(IR) → compile → run()
● Input → IR → (no change) → compile → run()
What happened?
● We found 6/7 bugs
● Some are ones we were unlikely to find with testing by hand
● Remember: pflua is an already-tested, working project
What were the bugs?
● Accidental comments: 8--2 is 8, not 10! (Lua)
● Invalid optimization: ntohs/ntohl
● Generating invalid Lua (return must end a block)
● Range analysis: range folding bug (→ inf)
● Range analysis: not setting the range of len
● Range analysis: NaN (inf – inf is not your friend)
● + a LuaJIT bug, found later by the same test
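The first bug is easy to reproduce: in Lua, `--` starts a comment, so a code generator that emits a subtraction next to a negative number without spaces silently truncates the rest of the line. A small demonstration (assuming Lua 5.2+ or LuaJIT, where `load` accepts a string):

```lua
-- "8--2" parses as the number 8 followed by the comment "--2",
-- not as 8 - (-2).
local chunk = assert(load("return 8--2"))
print(chunk())        -- prints 8, not 10

-- Emitting spaces around operators avoids the accidental comment:
local fixed = assert(load("return 8 - -2"))
print(fixed())        -- prints 10
```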
Case study recap
● Property-based testing is useful even for seemingly-working, seemingly-mature code
● We found 3 bugs in range analysis
● We were unlikely to have found all 3 bugs with unit testing by hand
● This was code that appeared to work
● Typical use didn't cause any visible problem
● 4 of the 6 bugs were fixed that afternoon
Property-based testing: how?
● Conceptually, it's that simple: generate and run tests (handling exceptions)

for i = 1, 100 do
   local g = generate_test_case()
   run_test_case(property, g)
end

● With premade tools, you need a property and (sometimes) a random test generator
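Filled out with exception handling, the loop might look like this; `generate_test_case` and the trivial `property` below are stand-ins for illustration, not pflua's real functions:

```lua
-- Hypothetical generator and property, for illustration only.
local function generate_test_case()
   return math.random(0, 2^32 - 1)
end

local function property(x)
   return x + 0 == x        -- trivially true; a real property goes here
end

local failures = 0
for i = 1, 100 do
   local case = generate_test_case()
   -- pcall keeps one crashing test case from aborting the whole run
   local ok, holds = pcall(property, case)
   if not ok then
      print("error on input", case, holds)   -- holds is the error message
      failures = failures + 1
   elseif not holds then
      print("counter-example:", case)
      failures = failures + 1
   end
end
print(failures, "failures in 100 runs")
```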
How to generate test cases
● The simplest version is unweighted choices:

function True() return { 'true' } end

function Comparison()
   return { ComparisonOp(), Arithmetic(), Arithmetic() }
end

…

function Logical()
   return choose({ Conditional, Comparison, True, False, Fail })()
end
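The `choose` helper isn't shown on the slide; a minimal version (our assumption, not necessarily pflua's actual implementation) just picks a uniformly random element of a table:

```lua
-- Pick one element of a non-empty array-style table at random.
local function choose(choices)
   return choices[math.random(#choices)]
end

-- Usage, mirroring the slide: pick a constructor, then call it.
local function True()  return { 'true' }  end
local function False() return { 'false' } end
local node = choose({ True, False })()
print(node[1])   -- "true" or "false"
```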
Are unweighted choices enough?
● math.random(0, 2^32-1)
● Property: 1/y <= y
● False iff y = 0
● 4 billion test cases doesn't guarantee this will be found...
● What are other common edge-case numbers?
Weighted choices
function Number()
   if math.random() < 0.2 then
      return math.random(0, 2^32 - 1)
   else
      return choose({ 0, 1, 2^32 - 1, 2^31 - 1 })
   end
end
Write your own checker!
for i = 1, iterations do
   local packet, packet_idx = choose(packets)
   local P, len = packet.packet, packet.len
   local random_ir = Logical()
   local unopt_lua = codegen.compile(random_ir)
   local optimized = optimize.optimize(random_ir)
   local opt_lua = codegen.compile(optimized)
   if unopt_lua(P, len) ~= opt_lua(P, len) then
      print_details_and_exit()
   end
end
Test generation problems
● Large, hard-to-analyze test cases
● Defaults to randomly searching the solution space; randomly testing that plain 'false' is still 'false' after optimization as 20% of your 1000 tests is a bit daft
What level to test?
● For a compiler: the front-end language? Various levels of IR? Other?
● In general: input? Internal objects?
● Tradeoffs: whitebox testing with internals can be useful, but can break a system by feeding it internal objects that the system itself could never create.
● Testing multiple levels is possible
● Testing higher levels tends to exercise the edge cases of lower levels
Interaction with interface stability
● At any level, more flexible than hand unit testing
● Interfaces change. Inputs hopefully change rarely; internals may change often
● Property-based testing makes refactoring cheaper and easier: less code to change when internals change, more test coverage
It's still worth unit testing
● Use property-based testing to find bugs (and classes of bugs)
● Use unit tests for avoiding regressions; continue to routinely test code that has already caused problems, to reduce the chances that known bugs will be re-introduced
● Use unit testing if test generation is infeasible, or for extremely rare paths
Reproducible tests
● There are some pitfalls to outputting a random seed to re-run tests
● The RNG may not produce consistent results across platforms or be stable across upgrades
● (Rare) Bugs in your compiler / interpreter / libraries can hinder reproducibility
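With those caveats in mind, the basic pattern is to print the seed on every run and accept one on the command line; this is a sketch, and the script name is hypothetical:

```lua
-- Accept a seed as the first command-line argument, else pick one.
local seed = tonumber(arg and arg[1]) or os.time()
math.randomseed(seed)
print("seed:", seed)   -- log it so a failing run can be replayed

-- ... generate and run tests here ...

-- To reproduce a failure (same platform, same Lua/LuaJIT version!):
--   lua check.lua <seed>
```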
Existing tools: QuickCheck
● Originally in Haskell; has been widely ported to other languages
● Better tools for test case generation
● Allows filtering test cases
● Starts with small test cases
● QuickCheck2: test case minimization
The future of test generation
● Hypothesis, by David R. MacIver (Python)
● https://github.com/DRMacIver/hypothesis
● Example database is better than saving seeds: it propagates interesting examples between tests
● Much smarter data generation
● Adapts to conditional tests better
● Blurs the lines between fuzz testing, conventional unit testing, and property-based testing
Forward-looking Hypothesis
● The following are planned, but not implemented:
● Using coverage information to drive example generation
● Adding "combining rules" which allow you to also express things like "set | set -> set", and then it can test properties on those too
● Better workflows around integrating into CI
● End-of-February 1.0 release predicted
Other stable tools
● ScalaCheck
● Quviq's QuickCheck for Erlang
● Have/inspired some of the benefits of Hypothesis, but are already mature and widely used