Property-based testing an open source compiler, pflua
A fast and easy way to find bugs
[email protected] (luatime.org)
www.igalia.com
Katerina Barone-Adesi
Summary
● What is property-based testing?
● Why is it worth using?
● Property-based testing case study with pflua, an open source compiler
● How do you implement it in an afternoon?
● What tools already exist?
Why test?
● Reliability
● Interoperability
● Avoiding regressions
● … but this is the test room, so hopefully people already think testing is useful and necessary
Why property-based testing?
● Writing tests by hand is slow, boring, expensive, and usually doesn't lead to many tests being written
● Generating tests is cheaper, faster, more flexible, and more fun
● Covers cases humans might not
Why is it more flexible?
● Have you ever written a set of unit tests, then had to change them all by hand as the code changes?
● It's a lot easier and faster to change one part of test generation instead!
What is property-based testing?
● Choose a property (a statement that should always be true), such as:
● somefunc(x, y) < 100
● sort(sort(x)) == sort(x) (for stable sorts)
● run(expr) == run(optimize(expr))
● our_app(input) == other_app(input)
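As a minimal sketch of the idea, the sort property above can be checked against random inputs like this (the `random_list` and `equal` helpers are illustrative, not pflua code):

```lua
-- Minimal property-based check of sort(sort(x)) == sort(x).
-- random_list and equal are hypothetical helpers, for illustration only.
local function random_list()
   local t = {}
   for i = 1, math.random(0, 20) do
      t[i] = math.random(0, 100)
   end
   return t
end

local function equal(a, b)
   if #a ~= #b then return false end
   for i = 1, #a do
      if a[i] ~= b[i] then return false end
   end
   return true
end

for i = 1, 100 do
   local xs = random_list()
   table.sort(xs)
   local once = {}
   for j = 1, #xs do once[j] = xs[j] end  -- snapshot after one sort
   table.sort(xs)                          -- sort a second time
   assert(equal(once, xs), "counter-example: sort is not idempotent")
end
```

If the assertion ever fires, the random list that triggered it is a counter-example; if it never fires, that is evidence (not proof) that the property holds.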
What is property-based testing not?
● A formal proof
● Exhaustive (except for very small types)
● What that means: property-based testing tries to find counter-examples. If you find a counter-example, something is wrong and must be changed. If you don't, it's evidence (NOT proof) towards that part of your program being correct.
Why not exhaustively test?
● Too difficult
● Too expensive
● Too resource-consuming (human and computer time)
● Formal methods and state space reduction have limitations
What is pflua?
● Pflua is a source-to-source compiler
● It takes libpcap's filter language (which we call pflang) and emits Lua code
● Why? This lets us run the Lua code with LuaJIT
● Performance: better than libpcap, often by a factor of two or more
● https://github.com/Igalia/pflua/
● Apache License, Version 2.0
What is pflang?
● The input for pflua, libpcap, and other tools
● Igalia's name for it, not an official name
● A language for defining packet filters
● Examples: “ip”, “tcp”, “tcp port 80”, …
● tcp port 80 and not host 192.168.0.1
● If you've used Wireshark or tcpdump, you've used pflang
Case study: testing pflua
● Pflua already had two forms of testing, and works in practice
● Andy Wingo and I implemented a property-based checker in an afternoon, with one property...
What was the test property?
● Lua code generated from optimized and unoptimized IR has the same result on the same random packet
● It compared two paths:
● Input → IR → optimize(IR) → compile → run()
● Input → IR → (no change) → compile → run()
What happened?
● We found 6/7 bugs
● Some are ones we were unlikely to find with testing by hand
● Remember: pflua is an already-tested, working project
What were the bugs?
● Accidental comments: 8--2 is 8, not 10! (Lua)
● Invalid optimization: ntohs/ntohl
● Generating invalid Lua (return must end a block)
● Range analysis: range folding bug (→ inf)
● Range analysis: not setting the range of len
● Range analysis: NaN (inf – inf is not your friend)
● + a LuaJIT bug, found later by the same test
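The first bug is easy to reproduce: in Lua, `--` starts a comment, so a code generator that emits a subtraction next to a negative number without spaces silently truncates the rest of the line. A small demonstration (assuming Lua 5.2+ or LuaJIT, where `load` accepts a string):

```lua
-- "8--2" parses as the number 8 followed by the comment "--2",
-- not as 8 - (-2).
local chunk = assert(load("return 8--2"))
print(chunk())        -- prints 8, not 10

-- Emitting spaces around operators avoids the accidental comment:
local fixed = assert(load("return 8 - -2"))
print(fixed())        -- prints 10
```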
Case study recap
● Property-based testing is useful even for seemingly-working, seemingly-mature code
● We found 3 bugs in range analysis
● We were unlikely to have found all 3 bugs with unit testing by hand
● This was code that appeared to work
● Typical use didn't cause any visible problem
● 4 of the 6 bugs were fixed that afternoon
Property-based testing: how?
● Conceptually, it's that simple: generate and run tests (handling exceptions)

for i = 1, 100 do
   local g = generate_test_case()
   run_test_case(property, g)
end

● With premade tools, you need a property and (sometimes) a random test generator
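Filled out with exception handling, the loop might look like this; `generate_test_case` and the trivial `property` below are stand-ins for illustration, not pflua's real functions:

```lua
-- Hypothetical generator and property, for illustration only.
local function generate_test_case()
   return math.random(0, 2^32 - 1)
end

local function property(x)
   return x + 0 == x        -- trivially true; a real property goes here
end

local failures = 0
for i = 1, 100 do
   local case = generate_test_case()
   -- pcall keeps one crashing test case from aborting the whole run
   local ok, holds = pcall(property, case)
   if not ok then
      print("error on input", case, holds)   -- holds is the error message
      failures = failures + 1
   elseif not holds then
      print("counter-example:", case)
      failures = failures + 1
   end
end
print(failures, "failures in 100 runs")
```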
How to generate test cases
● The simplest version is unweighted choices:

function True() return { 'true' } end

function Comparison()
   return { ComparisonOp(), Arithmetic(), Arithmetic() }
end

…

function Logical()
   return choose({ Conditional, Comparison, True, False, Fail })()
end
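The `choose` helper isn't shown on the slide; a minimal version (our assumption, not necessarily pflua's actual implementation) just picks a uniformly random element of a table:

```lua
-- Pick one element of a non-empty array-style table at random.
local function choose(choices)
   return choices[math.random(#choices)]
end

-- Usage, mirroring the slide: pick a constructor, then call it.
local function True()  return { 'true' }  end
local function False() return { 'false' } end
local node = choose({ True, False })()
print(node[1])   -- "true" or "false"
```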
Are unweighted choices enough?
● math.random(0, 2^32-1)
● Property: 1/y <= y
● False iff y = 0
● 4 billion test cases doesn't guarantee this will be found...
● What are other common edge-case numbers?
Weighted choices
function Number()
   if math.random() < 0.2 then
      return math.random(0, 2^32 - 1)
   else
      return choose({ 0, 1, 2^32 - 1, 2^31 - 1 })
   end
end
Write your own checker!
for i = 1, iterations do
   local packet, packet_idx = choose(packets)
   local P, len = packet.packet, packet.len
   local random_ir = Logical()
   local unopt_lua = codegen.compile(random_ir)
   local optimized = optimize.optimize(random_ir)
   local opt_lua = codegen.compile(optimized)
   if unopt_lua(P, len) ~= opt_lua(P, len) then
      print_details_and_exit()
   end
end
Test generation problems
● Large, hard-to-analyze test cases
● Defaults to randomly searching the solution space; randomly testing that plain 'false' is still 'false' after optimization as 20% of your 1000 tests is a bit daft
What level to test?
● For a compiler: the front-end language? Various levels of IR? Other?
● In general: input? Internal objects?
● Tradeoffs: whitebox testing with internals can be useful, but can break a system by feeding it internal objects that the system itself could never create.
● Testing multiple levels is possible
● Testing higher levels tends to exercise the edge cases of lower levels
Interaction with interface stability
● At any level, more flexible than hand unit testing
● Interfaces change. Inputs hopefully change rarely; internals may change often
● Property-based testing makes refactoring cheaper and easier: less code to change when internals change, more test coverage
It's still worth unit testing
● Use property-based testing to find bugs (and classes of bugs)
● Use unit tests for avoiding regressions; continue to routinely test code that has already caused problems, to reduce the chances that known bugs will be re-introduced
● Use unit testing if test generation is infeasible, or for extremely rare paths
Reproducible tests
● There are some pitfalls to outputting a random seed to re-run tests
● The RNG may not produce consistent results across platforms or be stable across upgrades
● (Rare) Bugs in your compiler / interpreter / libraries can hinder reproducibility
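With those caveats in mind, the basic pattern is to print the seed on every run and accept one on the command line; this is a sketch, and the script name is hypothetical:

```lua
-- Accept a seed as the first command-line argument, else pick one.
local seed = tonumber(arg and arg[1]) or os.time()
math.randomseed(seed)
print("seed:", seed)   -- log it so a failing run can be replayed

-- ... generate and run tests here ...

-- To reproduce a failure (same platform, same Lua/LuaJIT version!):
--   lua check.lua <seed>
```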
Existing tools: QuickCheck
● Originally in Haskell; has been widely ported to other languages
● Better tools for test case generation
● Allows filtering test cases
● Starts with small test cases
● QuickCheck2: test case minimization
The future of test generation
● Hypothesis, by David R. MacIver (Python)
● https://github.com/DRMacIver/hypothesis
● Example database is better than saving seeds: it propagates interesting examples between tests
● Much smarter data generation
● Adapts to conditional tests better
● Blurs the lines between fuzz testing, conventional unit testing, and property-based testing
Forward-looking Hypothesis
● The following are planned, but not implemented:
● Using coverage information to drive example generation
● Adding "combining rules" which allow you to also express things like "set | set -> set", and then it can test properties on those too
● Better workflows around integrating into CI
● End-of-February 1.0 release predicted
Other stable tools
● ScalaCheck
● Quviq's QuickCheck for Erlang
● Have/inspired some of the benefits of Hypothesis, but are already mature and widely used