Copyright© 2016 NTT Corp. All Rights Reserved.

Flaky Tests and Bugs in Apache Software (e.g. Hadoop)

Akihiro Suda <suda.akihiro@lab.ntt.co.jp>

NTT Software Innovation Center

ApacheCon Core North America (May 12, 2016, at Vancouver)

Who am I

• Software Engineer at NTT Corporation
  • NTT: the largest telecom in Japan
• Engaged in improving the reliability of distributed systems
• Some contributions to ZooKeeper / Hadoop, including critical bug fixes (non-committer)
• GitHub: https://github.com/AkihiroSuda

Agenda

• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
• Existing work in Apache communities
• Our work: Namazu (鯰, catfish)
  https://github.com/osrg/namazu

Good news: Apache software is well tested!

Software     Production code (LOC)   Test code (LOC)
MapReduce    95K                     87K
YARN         178K                    121K
HDFS         152K                    150K
ZooKeeper    33K                     27K
HBase        571K                    222K
Spark        167K                    128K
Flume        46K                     34K
Cassandra    168K                    78K

Data measured on 14/01/2016, using CLOC

Bad news: https://builds.apache.org/job/%s-trunk/

[Jenkins build-status charts for the MapReduce, YARN, HDFS, ZooKeeper, and HBase trunk builds; blue = success, red = failure. Data captured on 14/01/2016.]

I've never seen a fully successful Hadoop build, even on my local machine...

Bad news: JIRA QL: project = ? AND text ~ "test fail*"

Software     #Matched      #All Issues
MapReduce    2,441 (38%)   6,373
YARN         2,290 (63%)   4,756
HDFS         5,141 (53%)   9,672
ZooKeeper    828 (35%)     2,384
HBase        6,595 (42%)   15,542
Spark        794 (6%)      14,047
Flume        342 (12%)     2,882
Cassandra    1,656 (15%)   11,430

Data captured on 4/4/2016 (just for approximation)

Roughly speaking, half of Hadoop development is dedicated to debugging test failures. Interestingly, the flakiness does not seem uniform across projects (discussed later).

Not all test failures are critical for production...

• 97% of unit test failures in Apache software are said to be harmless for production ("false alarms")
• Source: "An Empirical Study of Bugs in Test Code" (A. Vahabzadeh et al., ICSME'15)

So flaky tests don't matter, as they don't affect production?

It still matters!

For developers:
• It's a barrier to the promotion of CI: if many tests are flaky, developers tend to ignore CI failures and overlook real bugs
• It's also a psychological barrier to contribution: a developer may be blamed for a test failure

For users:
• It's a barrier to risk assessment for production: no one can tell flaky tests from real bugs

SemaphoreCI suggests a "no broken windows" strategy for flaky tests:
https://semaphoreci.com/community/tutorials/how-to-deal-with-and-eliminate-flaky-tests

image: http://guides.lib.jjay.cuny.edu/nypd/brokenwindows

Basic cause: async operations

• A typical flaky test is caused by a malformed async operation like this
  (A. Vahabzadeh et al., ICSME'15 / Q. Luo et al., ACM FSE'14 / YARN-4478):

    invokeAsyncOperation();
    // some tests lack even this sleep
    sleep(certainHardcodedTimeout);
    assertTrue(checkSomethingGoodHasHappened());

• Basically it can be fixed by increasing timeouts & retries
• But it's not easy to find a reasonable timeout value (e.g. YARN-{4804, 4807, 4929...})
• A long timeout is expensive
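For contrast, one common mitigation is to poll the condition with a bounded deadline instead of a single fixed sleep, so the test succeeds as soon as the async operation completes and only pays the full timeout on failure. A minimal sketch (the helper name and the timeout values are illustrative, not taken from any Hadoop test):

```java
import java.util.function.BooleanSupplier;

public class AwaitExample {
    // Poll `condition` every pollMs until it holds or deadlineMs elapses.
    static boolean await(BooleanSupplier condition, long deadlineMs, long pollMs)
            throws InterruptedException {
        long start = System.currentTimeMillis();
        while (System.currentTimeMillis() - start < deadlineMs) {
            if (condition.getAsBoolean()) {
                return true; // success as soon as the async operation completes
            }
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean(); // last chance at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        final long t0 = System.currentTimeMillis();
        // Simulated async operation that "completes" after 200 ms.
        boolean ok = await(() -> System.currentTimeMillis() - t0 >= 200, 5000, 50);
        System.out.println(ok);
    }
}
```

The deadline stays large for slow CI machines, while a fast local run finishes in a fraction of it.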

Testbeds (e.g. CI) can cause test failures as well

• Host configuration
• Host performance
• Docker is great! But it still has some issues

CI host configuration can cause test failures

• HADOOP-12687
  • Many YARN tests fail when /etc/hosts has multiple loopback entries
• ZOOKEEPER-2252
  • Test: nslookup("a") should fail
  • It does not fail when there actually is a host named "a"
• INFRA-11811
  • The JDK was not set up properly on a Jenkins slave
• Such a test can fail when the job is assigned to a specific buildbot, and it looks like a flaky test

CI host performance: they're not made equal

• Hadoop's buildbot: https://builds.apache.org/computer/
  (data captured on 25/04/2016)

CI host performance: they're not made equal

• Spark's buildbot: https://amplab.cs.berkeley.edu/jenkins/computer/

CI host performance: they're not made equal

• Significant difference in response time!

Target    Average   Max      Min
Hadoop    1163ms    1482ms   30ms
Spark     3ms       6ms      0ms

• Maybe related to the fact that Spark has only a small number of test-related issues (e.g. YARN 63% vs Spark 6% in the JIRA table above)

Docker issues

Docker is great for testing!
• Some Apache software uses Docker on its CI (via Apache Yetus)
• Apache BigTop also utilizes Docker for provisioning Hadoop
• People also love Docker for setting up testbeds on their workstations and laptops
• Of course, me too

Docker #18180: Java VM becomes an unkillable zombie

• Mentioned in several Apache-related issue tickets:
  • jupyter/docker-stacks#75: Spark hanging
  • docker-library/cassandra#43, #46
  • docker-solr/docker-solr#4
  • ALLURA-8039
  • AMBARI-14706
  • IGNITE-2377
  • YETUS-229 ...
• Fortunately the Apache buildbot (Yetus) didn't hit the bug, but it made people's local testbeds flaky in a weird way
• Fixed in recent kernels (so, strictly speaking, it's not a Docker issue)

Other potential Docker-related issues

• AUFS: fcntl(F_SETFL, O_APPEND) was not supported (#20199)
  • Can cause data corruption (Dovecot is known to be affected)
  • Fixed in recent AUFS
• Overlay: you should not open a file O_RDWR and O_RDONLY simultaneously (#10180)
  • Can cause data corruption (RPM is known to be affected)
  • Expected behavior; won't be fixed
• More information: https://github.com/AkihiroSuda/docker-issues

Flaky tests are not limited to xUnit in CI...

• Some issues can occur only in a deployed environment rather than in CI
  • e.g. TCP packet corruption
• Very flaky and critical

TCP packet corruption

• https://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
  • The TCP checksum was ignored in some IPsec configurations
  • ZooKeeper became weird intermittently due to corrupted TCP packets
• https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19#.gq8chzply
  • The TCP checksum was ignored in some veth configurations
  • Mesos and Kubernetes are affected

TCP packet corruption

• It's very hard to notice (and reproduce) flaky TCP packet corruption...
• Should distributed systems be tolerant of TCP corruption?
  • The probability is very low in regular environments (32-bit Ethernet CRC + 16-bit TCP checksum), but it is not zero
• JIRA issues: ZOOKEEPER-2175, HDFS-8161...

Efforts to find/reproduce flaky tests

• determine-flaky-tests-hadoop.py
• Apache Kudu's CI (dist_test)
• Google's TAP
• Our work: Namazu
  https://github.com/osrg/namazu
• ...and similar great tools

determine-flaky-tests-hadoop.py

• Picks up failed tests using the Jenkins API
• Included in hadoop.git/dev-support (HADOOP-11045)

    $ determine-flaky-tests-hadoop.py --job Hadoop-YARN-trunk
    ****Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-YARN-trunk
    ...
    Among 15 runs examined, all failed tests <#failedRuns: testName>:
        7: TestContainerManagerRecovery.testApplicationRecovery
    ...

determine-flaky-tests-hadoop.py

• A great tool, but it doesn't support running a specific test repeatedly
• There is also a Maven dependency issue (YARN-4478):
  • B depends on A
  • TestB is never executed if TestA fails
  • So if TestA is flaky, we can't evaluate the flakiness of TestB!

Kudu's CI: flaky test dashboard

• http://dist-test.cloudera.org:8080/ (Apr 25)
• Recently open-sourced and introduced at Apache: Big Data (Monday)
  https://github.com/cloudera/dist_test

Kudu's CI: flaky test dashboard

• Tests are run repeatedly on CI to find flaky tests
  • KUDU_FLAKY_TEST_ATTEMPTS
  • KUDU_FLAKY_TEST_LIST

From https://github.com/apache/incubator-kudu/commit/1a24338a:

    Fix flakiness of client_failover-itest
    The reason this test was flaky is that there is a race between...
    Looped 100x and they all passed:
    http://dist-test.cloudera.org/job?job_id=mpercy.1454486819.10566

    Author: Mike Percy, Jan 29, 2016 8:01 AM
    Committer: Todd Lipcon, Feb 4, 2016 2:14 PM
    Commit: 1a24338ad60a8842d1ae5e227f8f03e58faea8c0

Google's TAP

• Google's internal CI
• 1.6M test failures per day
  • 73K (4.5%) are flaky
• Repeats a failing test 10 times to label flaky tests
• Source: "An Empirical Analysis of Flaky Tests" (Q. Luo et al., ACM FSE'14)

Challenge: poor non-determinism

• Modern CIs run jobs repeatedly to find / reproduce flaky tests
• But they don't control non-determinism, so they can:
  • Overlook a flaky test
  • Fail to reproduce a failure, and thus be unable to analyze it
• Our suggestion: increase non-determinism for finding and reproducing flaky tests

NAMAZU: PROGRAMMABLE FUZZY SCHEDULER

https://github.com/osrg/namazu

NOTE: Namazu was formerly named "Earthquake"

Namazu: programmable fuzzy scheduler
https://github.com/osrg/namazu

• Increases non-determinism for finding and reproducing flaky tests
• [Figure: events from the filesystem, packets, Java threads, Linux threads, and Go (planned) are fed into a fuzzed (randomized) schedule]
• 鯰 (namazu) means "catfish" in Japanese

Namazu: programmable fuzzy scheduler
https://github.com/osrg/namazu

Namazu uses non-invasive techniques:
• Filesystem: FUSE
• Packet: Netfilter, OpenFlow
• Java: Byteman, AspectJ
• Go [planned]: AspectGo [wip]
  https://github.com/AkihiroSuda/golang-exp-aspectgo
• Linux threads: sched_setattr(2)

• It can be easily applied to any environment
• It can avoid false positives

Namazu targets

• xUnit tests
  • 😃 Easy to get started; just run `mvn`
  • 😃 Can reproduce test failures observed in CI
  • 😞 Limited testable scope
• Integration tests on a distributed cluster
  • 😃 Can test everything
  • 😞 Need to write a script to set up the cluster
    • But Docker helps us a lot!

Namazu targets

We support both scenarios:
• Single-node mode (for xUnit tests): run `mvn test` directly
• Distributed mode (for integration tests): nodes connect to an orchestrator via RPC

NAMAZU + XUNIT TESTS

$ mvn test

Namazu + xUnit tests

• Namazu is a comprehensive framework...
• Quick start: "renice" threads for xUnit tests
  • POSIX.1 requires that threads share a single nice (priority) value, but the actual Linux implementation (NPTL) does not
  • Not always effective, but it's generic and easy to get started

Namazu + xUnit tests

    $ cd hadoop; ./start-build-env.sh
    [container]$ mvn test -Dtest=TestFoo#testBar

    $ PID=$(docker inspect $(docker ps -q -f ancestor=hadoop-build-ubuntu) | jq .[0].State.Pid)
    $ sudo nmz inspectors proc -pid $PID

• Namazu periodically sets random nice values for all the child processes and threads under $PID
• It also utilizes non-default kernel schedulers (e.g. SCHED_BATCH)

Namazu + xUnit tests: reproducibility

Testcase                                         Traditional   Namazu
YARN-4548        RM/TestCapacityScheduler        11%           82%
YARN-4556        RM/TestFifoScheduler            2%            44%
ZOOKEEPER-2137   ReconfigTest                    2%            16%
YARN-4168        NM/TestLogAggregationService    1%            8%
YARN-1978        NM/TestLogAggregationService    0%            4%
YARN-4543        NM/TestNodeStatusUpdater        0%            1%

• More information: osrg/namazu#125

Namazu + xUnit tests: reproducibility

Testcase                                         Traditional   Namazu
ZOOKEEPER-2080   ReconfigRecoveryTest            14.0%         61.9%

• "Renicing" is not always effective...
• But even when renicing is ineffective, you can sometimes reproduce the flaky test by injecting delays or reordering packets:

    $ sudo iptables ... -j NFQUEUE --queue-num 42
    $ sudo nmz inspectors ethernet -nfq-number 42

NAMAZU + INTEGRATION TESTS

Namazu + integration tests

• ZooKeeper: distributed coordination service
  • Used in Hadoop, Spark, Mesos, Kafka...
• ZooKeeper 3.5 (alpha) introduced dynamic reconfiguration
• We performed an integration test to evaluate the reliability of the reconfiguration
• We found a flaky bug!

Namazu + integration tests

• We permuted certain Ethernet packets in random order using Namazu
  • TCP retransmissions are eliminated to reduce the possible state space
• Setup: ZooKeeper cluster on Open vSwitch + Ryu SDN Framework + Namazu

Found ZOOKEEPER-2212

• Bug: a new node cannot participate in the ZK cluster properly, and cannot become a leader of the ZK cluster itself (more technically, it keeps being an "observer")
• Cause: a distributed race (ZAB packet vs FLE packet)
  • ZAB: atomic broadcast protocol for data [2888/tcp]
  • FLE: leader election protocol for the ZK cluster itself [3888/tcp]
  • They use different TCP connections between the leader and the new node, so the packet order is non-deterministic

Found ZOOKEEPER-2212

[Screenshot of the JIRA ticket; data captured on 22/01/2016]

Found ZOOKEEPER-2212

• Expected: the ZK cluster works even when ⌊N/2⌋ nodes crash
• Reality: a single node failure can terminate the 3-node ensemble
  • The new node is not participating properly (it keeps being an "observer")

How hard is it to reproduce?

• Reproducibility: 0.0% → 21.8% (tested 1,000 times)
• We could not reproduce the bug even after 5,000 runs of traditional testing (60 hours!)
• It is even reproducible by "renicing" threads, but the reproducibility is just 0.7%

Why can we hit the bug?

We define the distributed execution pattern based on code coverage:

    P = ( p_{1,1} ... p_{1,N} )
        (  ...    ...   ...   )
        ( p_{L,1} ... p_{L,N} )

• L: LOC
• N: number of nodes (== 3 in this case)
• p_{i,j}: 1 if node j covers the branch in line i, otherwise 0
• We used JaCoCo, a Java code coverage library (patch: ZOOKEEPER-2266)

Namazu achieves faster pattern growth. That's why we can hit the bug.
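The pattern-growth metric can be sketched as follows (an illustrative sketch only; the class and method names are assumptions, not Namazu's actual bookkeeping): each run is reduced to an L×N boolean coverage matrix, and "pattern growth" is the count of distinct matrices observed so far.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CoveragePatterns {
    // Tracks distinct execution patterns, where each pattern is an L x N
    // boolean matrix: pattern[i][j] == true iff node j covered line i.
    private final Set<String> seen = new HashSet<>();

    // Returns true if this run exhibited a previously unseen pattern.
    boolean record(boolean[][] pattern) {
        return seen.add(Arrays.deepToString(pattern));
    }

    int uniquePatterns() {
        return seen.size();
    }

    public static void main(String[] args) {
        CoveragePatterns cp = new CoveragePatterns();
        boolean[][] runA = {{true, false, true}, {false, false, true}};
        boolean[][] runB = {{true, true, true}, {false, false, true}};
        System.out.println(cp.record(runA)); // new pattern
        System.out.println(cp.record(runB)); // new pattern
        System.out.println(cp.record(runA)); // already seen
        System.out.println(cp.uniquePatterns());
    }
}
```

A fuzzed scheduler drives `uniquePatterns()` up faster per run than repeated deterministic runs, which is the sense in which Namazu "grows" patterns faster.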

HOW TO USE NAMAZU?

How to use Namazu?

Easy to install:

    $ sudo apt-get install lib{netfilter-queue,zmq3}-dev
    $ go get github.com/osrg/namazu/nmz

Easy to get started:
• Provides a Docker-like CLI
• No code instrumentation needed
• No configuration needed (default: just renice threads)

    $ sudo nmz container run -it -v /foo:/foo ubuntu
    [container]$ cd /foo && mvn test

How to use Namazu?

For threads ("renicing"):

    $ sudo nmz inspectors proc -pid $TARGET_PID

For the filesystem:

    $ sudo nmz inspectors fs -mount-point /nmzfs

For network packets:

    $ sudo iptables ... -j NFQUEUE --queue-num 42
    $ sudo nmz inspectors ethernet -nfq-number 42

Need distributed mode (for integration testing)?
Just add `--orchestrator-url http://foobar:10080/api/v3` to the CLI.

Namazu API (Go)

    type ExplorePolicy interface {
        QueueEvent(Event)
        ActionChan() chan Action
    }

    func (p *MyPolicy) QueueEvent(event Event) {
        action := event.DefaultAction()
        p.timeBoundedQ.Enqueue(action,
            10*Millisecond, 30*Millisecond)
    }

    func (p *MyPolicy) ActionChan() chan Action {
        return p.timeBoundedQ.DequeueChan
    }

• An event can contain Ethernet packet bytes
• The action is randomly fired in [10 ms, 30 ms]
• You can also inject fault actions here
• Namazu defines a REST API, so you can also use other languages

API use case: found YARN-4301

• We found a bug: YARN cannot detect disk failures in cases where mkdir()/rmdir() blocks
  • YARN handles the case where mkdir() returns EIO explicitly, but not the case where mkdir() blocks
• We noticed that the bug could occur theoretically while reading the code, and actually produced the bug using Namazu
• When the fault should be injected is known in advance, so we manually wrote a concrete scenario using the Namazu API
• Much more realistic than JUnit + mocking

API use case: found YARN-4301

    func (p *MyPolicy) signalHandler() {
        signal.Notify(sigChan, syscall.SIGUSR1)
        for {
            <-sigChan
            p.sleep = 10 * time.Minute // fault: blocks for 10 minutes
        }
    }
    go p.signalHandler()
    func (p *MyPolicy) QueueEvent(event Event) { .. }
    func (p *MyPolicy) ActionChan() chan Action { .. }

    $ go run mypolicy.go inspectors fs -mount-point /nmzfs

• Set "yarn.nodemanager.local-dirs" to "/nmzfs/nm-local-dir", and send SIGUSR1 to Namazu when you (and YARN) are ready
• Interactive testing is often easier than writing a JUnit testcase
• We use SIGUSR1 here, but it would also be interesting to implement a human-friendly CLI or GUI for interactive testing

Another API use case: "semi"-deterministic replay

• If you have knowledge of the protocol, you can compute a hash for each packet
  • Note that you have to exclude time-dependent and random bytes when you hash the packet
• Using the hash and the Namazu API, you can "semi"-deterministically replay the scenario
  • Not fully deterministic; it just makes a best effort
• Record-less! You only need to remember the "seed" for replaying
• PoC: ZOOKEEPER-2212: up to 65% reproducibility
• More information: osrg/namazu#137
• See also (for Go): https://github.com/AkihiroSuda/go-replay
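The seed-based idea can be sketched like this (a hypothetical illustration, not Namazu's actual code): derive each scheduling decision deterministically from a global seed combined with a content hash of the packet's stable bytes, so rerunning with the same seed reproduces the same per-packet delays without recording anything.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class SemiDeterministicReplay {
    private final long seed;

    SemiDeterministicReplay(long seed) {
        this.seed = seed;
    }

    // Hash only the stable part of the packet (time-dependent and random
    // bytes must already be masked out by the caller).
    long delayMillisFor(byte[] stablePacketBytes) {
        long h = 1125899906842597L; // simple polynomial hash
        for (byte b : stablePacketBytes) {
            h = 31 * h + b;
        }
        // The decision depends only on (seed, packet hash), so replaying
        // with the same seed yields the same delay for the same packet.
        Random r = new Random(seed ^ h);
        return 10 + r.nextInt(21); // delay in [10, 30] ms
    }

    public static void main(String[] args) {
        byte[] pkt = "FLE:proposal:node2".getBytes(StandardCharsets.UTF_8);
        SemiDeterministicReplay run1 = new SemiDeterministicReplay(42L);
        SemiDeterministicReplay run2 = new SemiDeterministicReplay(42L);
        // Same seed + same packet => same decision (replay).
        System.out.println(run1.delayMillisFor(pkt) == run2.delayMillisFor(pkt));
    }
}
```

The replay is only "semi"-deterministic because anything not captured by the hash (kernel scheduling, retransmissions) can still diverge.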

SIMILAR GREAT TOOLS

Similar great tool: Jepsen

• Network partitioner + linearizability tester
• Famous for the "Call Me Maybe" blog: http://jepsen.io/
  • "Call Me Maybe" by Carly Rae Jepsen (Vevo): https://www.youtube.com/watch?v=fWNaR-rxAic
• Randomly injects network partitions using iptables
• "Linearizability" ∈ "strong consistency"
• Integration tests on a flaky network, rather than flaky xUnit tests

Similar great tool: Jepsen

• Has been used to test several Apache projects:
  • Cassandra: 9851, 10001, 10068, 10231, 10413, 10674
    • http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen
  • HBase
  • Kafka
  • Solr: 6530, 6583, 6610
    • http://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks
  • ZooKeeper

Namazu + Jepsen?

• Jepsen: causes network partitions, tests linearizability
• Namazu: increases non-determinism, injects filesystem faults
• Namazu is much more generalized
  • The bugs we found/reproduced are basically beyond the scope of Jepsen (threads, disks...)
• Namazu can also be combined with Jepsen! That will be our next work...

Similar great tool: CharybdeFS

• Makes the filesystem flaky using FUSE
• Used in testing ScyllaDB (an Apache Cassandra clone)
  • https://github.com/scylladb/charybdefs
• Similar to Namazu FS
  • Both support an API
• Also similar to PetardFS (not active since 2007)
• CharybdeFS can also be combined with Namazu
  • CharybdeFS is specialized in FS; Namazu is much more comprehensive

Similar great tool: DEMi (appeared in NSDI'16)

https://github.com/NetSys/demi

• Found some akka-raft bugs and reproduced a few Spark bugs
• A challenge is reducing false positives related to instrumentation
• DEMi and Namazu are complementary to each other
  • DEMi is powerful, but has some limitations
  • Namazu is comprehensive and easy to get started with

                           Namazu                           DEMi
Target                     Generic (network, filesystem,    Akka
                           threads...)
Getting started            Easy                             Need to write AspectJ code
Deterministic replay?      No                               Yes
Bug cause minimization?    No                               Yes

SO... HOW CAN WE FIX FLAKY TESTS?

How can we fix flaky tests?

• Namazu finds/reproduces flaky tests, but it doesn't automatically fix them 😞
• Basic approach for async-related flakiness: adjust the values for sleep() and retries in the test code

    invokeAsyncOperation();
    // some tests lack even this sleep
    sleep(certainHardcodedTimeout);
    assertTrue(checkSomethingGoodHasHappened());

How can we fix flaky tests?

• Suggestion: the timeout (& retries) should be a configurable parameter rather than a hard-coded value

Timeout value   Cost (time)   Risk (timeout)   Appropriate for
Long            High          Low              Slow machine (e.g. CI); conservative person
Short           Low           High             Fast machine; risk-appetite person
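A minimal sketch of this suggestion (the property name `test.async.timeout.ms` and the default are illustrative, not from any Hadoop test): read the timeout from a system property with a short default, so slow CI machines can pass a longer value via a `-D` flag while fast local runs stay cheap.

```java
public class ConfigurableTimeout {
    // Read the timeout from a system property so CI can run with a longer
    // value (e.g. -Dtest.async.timeout.ms=30000) while fast local machines
    // keep a short default.
    static long timeoutMillis() {
        return Long.getLong("test.async.timeout.ms", 5_000L);
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis());            // default
        System.setProperty("test.async.timeout.ms", "30000");
        System.out.println(timeoutMillis());            // overridden, as CI would
    }
}
```

The test body then calls `sleep(timeoutMillis())` (or, better, polls up to that deadline) instead of a hard-coded constant.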

CONCLUSION

Conclusion

• Apache software is well tested
  • But it is flaky
• Let's improve it
  • Improve asynchronous code
  • Repeat tests
• Our tool can control non-determinism so as to reproduce flaky tests
  https://github.com/osrg/namazu