Copyright© 2016 NTT Corp. All Rights Reserved.
Flaky Tests and Bugs in
Apache Software (e.g. Hadoop)
Akihiro Suda <[email protected]>
NTT Software Innovation Center
ApacheCon Core North America (May 12, 2016, at Vancouver)
• Software Engineer at NTT Corporation
• NTT: the largest telecom in Japan
• Engaged in improving the reliability of
distributed systems
• Some contributions to ZooKeeper / Hadoop
including critical bug fixes (non-committer)
• github: https://github.com/AkihiroSuda
Who am I
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
• Existing work at Apache communities
• Our work: Namazu(鯰, catfish)
https://github.com/osrg/namazu
Agenda
Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
• Existing work at Apache communities
• Our work: Namazu(鯰, catfish)
https://github.com/osrg/namazu
Good News: Apache software is well tested!
Software Production code (LOC) Test code (LOC)
MapReduce 95K 87K
YARN 178K 121K
HDFS 152K 150K
ZooKeeper 33K 27K
HBase 571K 222K
Spark 167K 128K
Flume 46K 34K
Cassandra 168K 78K
Data measured on 14 Jan 2016, using CLOC
Bad News: https://builds.apache.org/job/%s-trunk/
[Charts: build history (build vs. build time) for MapReduce, YARN, HDFS, ZooKeeper, and HBase; blue = success, red = failure. Data captured on 14 Jan 2016]
I've never seen a fully successful Hadoop build,
even on my local machine...
Bad News: JIRA QL: project = ? AND text ~ "test fail*"
Software #MatchedIssues #AllIssues
MapReduce 2,441 (38%) 6,373
YARN 2,290 (63%) 4,756
HDFS 5,141 (53%) 9,672
ZooKeeper 828 (35%) 2,384
HBase 6,595 (42%) 15,542
Spark 794 ( 6%) 14,047
Flume 342 (12%) 2,882
Cassandra 1,656 (15%) 11,430
Data captured on 4 Apr 2016
Roughly speaking, half of Hadoop development is dedicated to debugging test failures.
Interestingly, flakiness does not seem uniform across projects (discussed later).
The query is just a rough approximation.
Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
• Existing work at Apache communities
• Our work: Namazu(鯰, catfish)
https://github.com/osrg/namazu
97% of unit test failures in Apache software are said to be
harmless to production ("false alarms")
• Information source:
"An Empirical Study of Bugs in Test Code" (A.Vahabzadeh et al., ICSME'15)
Not all test failures are critical for production..
It still matters!
For developers..
It's a barrier to promotion of CI
• If many tests are flaky, developers tend to ignore CI
failures and overlook real bugs
It's also a psychological barrier to contribution
• A developer may be blamed due to a test failure
For users..
It's a barrier to risk assessment for production
• No one can tell flaky tests from real bugs
So flaky tests don't matter, as they don't affect production?
SemaphoreCI suggests a "No broken windows" strategy
for flaky tests
https://semaphoreci.com/community/tutorials/how-to-deal-with-and-eliminate-flaky-tests
So flaky tests don't matter, as they don't affect production?
image: http://guides.lib.jjay.cuny.edu/nypd/brokenwindows
Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
• Existing work at Apache communities
• Our work: Namazu(鯰, catfish)
https://github.com/osrg/namazu
• A typical flaky test is caused by a malformed async
operation like this
(A.Vahabzadeh et al., ICSME'15 / Q.Luo et al., ACM FSE'14 / YARN-4478)
• Basically it can be fixed by increasing timeouts & retries
• But it's not easy to find a reasonable timeout value
(e.g. YARN-{4804, 4807, 4929...})
• Long timeout is expensive
Basic cause: async operation
invokeAsyncOperation();
// some tests lack even this sleep
sleep(certainHardcodedTimeout);
assertTrue(checkSomethingGoodHasHappened());
• Host configuration
• Host performance
• Docker is great! But it still has some
issues
Testbed (e.g. CI) can cause test failures as well
• HADOOP-12687
• Many YARN tests fail when /etc/hosts has multiple loopback
entries
• ZOOKEEPER-2252
• Test: nslookup("a") should fail
• It does not fail when a host named "a" actually exists
• INFRA-11811
• JDK was not set up properly in a Jenkins slave
• Such a test can fail when the job is assigned to a
specific buildbot, and then it looks like a flaky test
CI host configuration can cause test failures
CI host performance: they're not made equal
• Hadoop's buildbot https://builds.apache.org/computer/
Data captured on 25 Apr 2016
CI host performance: they're not made equal
• Spark's buildbot https://amplab.cs.berkeley.edu/jenkins/computer/
CI host performance: they're not made equal
• Significant difference in the response time!
• Maybe related to the fact that Spark has only a
small number of test-related issues
(e.g. YARN 63% vs Spark 6% (slide 7))
Target Average Max Min
Hadoop 1163ms 1482ms 30ms
Spark 3ms 6ms 0ms
Docker is great for testing!
• Some Apache software are using Docker on their
CI (via Apache Yetus)
• Apache BigTop also utilizes Docker for
provisioning Hadoop
• People also love Docker for setting up test beds
on their workstations and laptops
• Me too, of course
Docker issues
• Mentioned in several Apache-related issue tickets:
• jupyter/docker-stacks#75: Spark hanging
• docker-library/cassandra#43, #46
• docker-solr/docker-solr#4
• ALLURA-8039
• AMBARI-14706
• IGNITE-2377
• YETUS-229 …
• Fortunately the Apache buildbots (Yetus) didn't hit the bug,
but it made people's local testbeds flaky in a weird way.
• Fixed in recent kernels (so, strictly speaking, it's not a Docker issue)
Docker #18180: Java VM unkillable zombie
AUFS: fcntl(F_SETFL, O_APPEND) was not supported
(#20199)
• Can cause data corruption (Dovecot is known to be affected)
• Fixed in recent AUFS
Overlay: You should not open O_RDWR and
O_RDONLY simultaneously (#10180)
• Can cause data corruption (RPM is known to be affected)
• Expected behavior, won't get fixed
More information: https://github.com/AkihiroSuda/docker-issues
Other potential Docker-related issues
• Some issues can occur only in a
deployed environment rather than in a
CI
• e.g. TCP packet corruption
• Very flaky and critical
Flaky tests are not limited to xUnit in CI..
https://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
• TCP checksum was ignored in some IPsec
configuration
• ZooKeeper became weird intermittently due to corrupted TCP
packets
https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19#.gq8chzply
• TCP checksum was ignored in some veth
configuration
• Mesos and Kubernetes are affected
TCP packet corruption
• It's very hard to notice (and reproduce) flaky TCP
packet corruption...
• Should distributed systems be TCP-corruption
tolerant...?
• the probability is very low in regular environments,
but it is not zero
(32-bit Ethernet CRC + 16-bit TCP checksum)
• JIRA issues: ZOOKEEPER-2175, HDFS-8161…
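To see why the 16-bit TCP checksum is so weak, note that it is a ones'-complement sum of 16-bit words (RFC 1071), so any reordering of words yields the same checksum. A minimal sketch (the payload bytes are illustrative):

```go
package main

import "fmt"

// internetChecksum computes the 16-bit ones'-complement checksum
// used by TCP/IP (RFC 1071): sum 16-bit big-endian words, fold the
// carries back in, then complement.
func internetChecksum(data []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(data); i += 2 {
		sum += uint32(data[i])<<8 | uint32(data[i+1])
	}
	if len(data)%2 == 1 { // odd trailing byte is padded with zero
		sum += uint32(data[len(data)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

func main() {
	a := []byte{0xde, 0xad, 0xbe, 0xef}
	b := []byte{0xbe, 0xef, 0xde, 0xad} // 16-bit words swapped: corrupted, same checksum
	fmt.Printf("%04x %04x\n", internetChecksum(a), internetChecksum(b)) // 6262 6262
}
```

Because addition is commutative, a corruption that swaps or offsets whole words slips straight through; only the stronger Ethernet CRC (when it is actually checked) would catch it.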
TCP packet corruption
Agenda
• Current "flakiness" in Apache software
• Why do flaky tests matter?
• What causes a flaky test?
• How can we find, reproduce, and fix a flaky test?
• Existing work at Apache communities
• Our work: Namazu(鯰, catfish)
https://github.com/osrg/namazu
• determine-flaky-tests-hadoop.py
• Apache Kudu's CI (dist_test)
• Google's TAP
• Our work: Namazu
https://github.com/osrg/Namazu
• and similar great tools
Efforts to find/reproduce a flaky test
• Picks up failed tests using Jenkins API
• Included in hadoop.git/dev-support (HADOOP-11045)
determine-flaky-tests-hadoop.py
$ determine-flaky-tests-hadoop.py --job Hadoop-YARN-trunk
****Recently FAILED builds in url: https://builds.apache.org/job/Hadoop-YARN-trunk
...
Among 15 runs examined, all failed tests <#failedRuns: testName>:
    7: TestContainerManagerRecovery.testApplicationRecovery
...
• Great tool, but it doesn't support running a
specific test repeatedly
• Also there is a Maven dependency issue (YARN-4478)
• B depends on A
• TestB is never executed if TestA fails
→ if TestA is flaky, we can't evaluate the flakiness of
TestB!
determine-flaky-tests-hadoop.py
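What you often want instead is to run one specific test N times and measure its failure rate. A generic sketch of that loop (not part of the script; the runner closure is an assumption):

```go
package main

import "fmt"

// failureRate runs one specific test n times and returns the fraction
// of runs that failed. In practice run would shell out to something
// like `mvn test -Dtest=TestFoo#testBar` and report whether it passed.
func failureRate(run func() bool, n int) float64 {
	failed := 0
	for i := 0; i < n; i++ {
		if !run() {
			failed++
		}
	}
	return float64(failed) / float64(n)
}

func main() {
	i := 0
	flakyTest := func() bool { i++; return i%4 != 0 } // fails every 4th run
	fmt.Println(failureRate(flakyTest, 100)) // 0.25
}
```

Running the test in isolation this way also sidesteps the Maven dependency problem above: TestB's flakiness is measured regardless of whether TestA fails.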
Kudu's CI: flaky test dashboard
http://dist-test.cloudera.org:8080/ (Apr 25)
Recently open-sourced and introduced at Apache: Big Data (Monday)
https://github.com/cloudera/dist_test
Kudu's CI: flaky test dashboard
• Tests are run repeatedly on CI to find flaky tests
• KUDU_FLAKY_TEST_ATTEMPTS
• KUDU_FLAKY_TEST_LIST
(From https://github.com/apache/incubator-kudu/commit/1a24338a)
Fix flakiness of client_failover-itest
The reason this test was flaky is that there is a race between....
Looped 100x and they all passed:
http://dist-test.cloudera.org/job?job_id=mpercy.1454486819.10566
Author: Mike Percy, Jan 29, 2016 8:01 AM
Committer: Todd Lipcon, Feb 4, 2016 2:14 PM
Commit: 1a24338ad60a8842d1ae5e227f8f03e58faea8c0
• Google's internal CI
• 1.6M test failures per day
• 73K (4.5%) are flaky
• Repeats a failing test 10 times to label
flaky tests
• Information source: An Empirical Analysis
of Flaky Tests (Q.Luo et al. ACM FSE'14)
Google's TAP
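The relabeling logic can be sketched as: rerun a failing test a few times, and if any rerun passes, mark it flaky rather than broken (a sketch of the idea only, not Google's actual implementation):

```go
package main

import "fmt"

// classify reruns a failing test up to retries times: a test that
// fails once but then passes on a rerun is labeled "flaky"; one that
// fails every time is labeled "fail" (a likely real regression).
func classify(run func() bool, retries int) string {
	if run() {
		return "pass"
	}
	for i := 0; i < retries; i++ {
		if run() {
			return "flaky"
		}
	}
	return "fail"
}

func main() {
	n := 0
	flaky := func() bool { n++; return n > 1 } // fails first, passes on rerun
	fmt.Println(classify(flaky, 10)) // flaky
}
```

The trade-off is the same one discussed later for timeouts: more retries give a more confident label but cost CI time.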
• Modern CIs run jobs repeatedly to find /
reproduce flaky tests
• But they don't control non-determinism
• → They may overlook a flaky test
• → They may not reproduce a failure, and hence cannot analyze it
• Our suggestion: increase non-determinism
for finding and reproducing flaky tests
Challenge: poor non-determinism
NAMAZU: PROGRAMMABLE FUZZY SCHEDULER
https://github.com/osrg/namazu
NOTE: Namazu was formerly named "Earthquake"
Namazu: programmable fuzzy scheduler
https://github.com/osrg/namazu
Event → Fuzzed (randomized) schedule
Increases non-determinism for finding and
reproducing flaky tests
Targets: Filesystem / Packet / Java / Go [planned] / Linux threads
鯰 (namazu) means
catfish in Japanese
Filesystem: FUSE
Packet: Netfilter, Openflow
Java: Byteman, AspectJ
Go [planned]: AspectGo [wip]
Linux threads: sched_setattr(2)
Namazu uses non-invasive techniques
• can be easily applied to any environment
• can avoid false-positives
Namazu: programmable fuzzy scheduler
https://github.com/osrg/namazu
https://github.com/AkihiroSuda/golang-exp-aspectgo
• xUnit tests
• 😃 Easy to get started; just run `mvn`
• 😃 Can reproduce test failures observed in CI
• 😞 Limited testable scope
• Integration tests on a distributed cluster
• 😃 Can test everything
• 😞 Need to write a script to set up the cluster
• But Docker helps us a lot!
Namazu targets
We support both scenarios
Namazu targets
Single-node mode
(for xUnit tests)
Distributed mode
(for integration tests)
$ mvn test
Orchestrator
RPC
NAMAZU + XUNIT TESTS
$ mvn test
• Namazu is a comprehensive framework...
• Quick start: “renice” threads for xUnit tests
• POSIX.1 requires that threads share a single nice (priority)
value, but the actual Linux implementation (NPTL) does not.
• Not always effective, but it’s generic and easy to get started
Namazu + xUnit tests
Namazu + xUnit tests
$ cd hadoop; ./start-build-env.sh
[container]$ mvn test -Dtest=TestFoo#testBar

$ PID=$(docker inspect $(docker ps -q -f ancestor=hadoop-build-ubuntu) | jq .[0].State.Pid)
$ sudo nmz inspectors proc -pid $PID
Namazu periodically sets random nice values for all the child
processes and the threads under $PID
It also utilizes non-default kernel schedulers (e.g. SCHED_BATCH)
Namazu + xUnit tests: Reproducibility
Testcase: Traditional → Namazu
YARN-4548 (RM/TestCapacityScheduler): 11% → 82%
YARN-4556 (RM/TestFifoScheduler): 2% → 44%
ZOOKEEPER-2137 (ReconfigTest): 2% → 16%
YARN-4168 (NM/TestLogAggregationService): 1% → 8%
YARN-1978 (NM/TestLogAggregationService): 0% → 4%
YARN-4543 (NM/TestNodeStatusUpdater): 0% → 1%
• More information: osrg/namazu#125
Namazu + xUnit tests: Reproducibility
Testcase: Traditional → Namazu
ZOOKEEPER-2080 (ReconfigRecoveryTest): 14.0% → 61.9%
• "Renicing" is not always effective...
• But even when renicing is ineffective,
sometimes you can also reproduce the flaky test
by injecting delays or reordering packets
$ sudo iptables ... -j NFQUEUE --queue-num 42
$ sudo nmz inspectors ethernet -nfq-number 42
NAMAZU + INTEGRATION TESTS
• ZooKeeper: distributed coordination service
• used in Hadoop, Spark, Mesos, Kafka..
• ZooKeeper 3.5 (alpha) introduced dynamic
reconfiguration
• We performed an integration test to evaluate
the reliability of reconfiguration
• We found a flaky bug!
Namazu + Integration tests
• We permuted some specific Ethernet packets in random
order using Namazu
• TCP retransmissions were eliminated to reduce the possible state
space
Namazu + Integration tests
ZooKeeper cluster
Open vSwitch + Ryu SDN Framework
+ Namazu
• Bug: a new node cannot participate in the ZK cluster properly
→ the new node cannot become a leader of the ZK cluster itself
(more technically, it keeps being an "observer")
• Cause: distributed race (ZAB packet vs FLE packet)
• ZAB.. atomic broadcast protocol for data
• FLE.. leader election protocol for ZK cluster itself
Found ZOOKEEPER-2212
Leader of ZK cluster New ZK node
ZAB [2888/tcp]
FLE [3888/tcp]
They use different TCP connections
→ non-deterministic packet order
Data captured on 22 Jan 2016
Found ZOOKEEPER-2212
• Expected: the ZK cluster works even when ⌊N/2⌋ nodes
crash
• Real: a single node failure can terminate the 3-node
ensemble
Found ZOOKEEPER-2212
Not participating properly
(keeps being an "observer")
• Reproducibility: 0.0% → 21.8%
(tested 1,000 times)
• We could not reproduce the bug even after
5,000 runs of traditional testing (60 hours!)
• It is even reproducible by "renicing" threads, but the
reproducibility is just 0.7%
How hard is it to reproduce?
We define the distributed execution pattern based on code coverage:

P = (p_{i,j}), an L × N binary matrix

• L: LOC
• N: number of nodes (== 3 in this case)
• p_{i,j}: 1 if node j covers the branch in line i, otherwise 0
• We used JaCoCo: Java Code Coverage Library (patch: ZOOKEEPER-2266)
Why can we hit the bug?
Namazu achieves faster growth of distinct patterns.
That's why we can hit the bug.
HOW TO USE NAMAZU?
Easy to install
Easy to get started
• Provides Docker-like CLI
• No code instrumentation needed
• No configuration needed (default: just renice threads)
How to use Namazu?
$ sudo apt-get install lib{netfilter-queue,zmq3}-dev
$ go get github.com/osrg/namazu/nmz

$ sudo nmz container run -it -v /foo:/foo ubuntu
[container]$ cd /foo && mvn test
For threads ("renicing")
$ sudo nmz inspectors proc -pid $TARGET_PID

For filesystem
$ sudo nmz inspectors fs -mount-point /nmzfs

For network packets
$ sudo iptables ... -j NFQUEUE --queue-num 42
$ sudo nmz inspectors ethernet -nfq-number 42

Need distributed mode? (for integration testing)
Just add `--orchestrator-url http://foobar:10080/api/v3` to the CLI.
How to use Namazu?
Namazu API (Go)
type ExplorePolicy interface {
    QueueEvent(Event)
    ActionChan() chan Action
}

func (p *MyPolicy) QueueEvent(event Event) {
    action := event.DefaultAction()
    p.timeBoundedQ.Enqueue(action, 10*time.Millisecond, 30*time.Millisecond)
}

func (p *MyPolicy) ActionChan() chan Action {
    return p.timeBoundedQ.DequeueChan
}
Action is randomly fired in [10ms, 30ms]
You can also inject fault actions here
Namazu defines REST API,
so you can also use other languages
An event can contain
Ethernet packet bytes
• We found a bug: YARN cannot detect disk failure cases
where mkdir()/rmdir() blocks
• We noticed while reading the code that the bug could occur
theoretically, and actually reproduced it using Namazu
• When to inject the fault is known in advance,
so we manually wrote a concrete scenario using the Namazu API
• Much more realistic than JUnit + mocking
API use case: found YARN-4301
[Diagram: a case where mkdir() returns EIO explicitly vs. a case where mkdir() blocks]
func (p *MyPolicy) signalHandler() {
    signal.Notify(sigChan, syscall.SIGUSR1)
    for {
        <-sigChan
        p.sleep = 10 * time.Minute // fault: blocks for 10 minutes
    }
}
go p.signalHandler()

func (p *MyPolicy) QueueEvent(event Event) { .. }
func (p *MyPolicy) ActionChan() chan Action { .. }

$ go run mypolicy.go inspectors fs -mount-point /nmzfs

Set "yarn.nodemanager.local-dirs" to "/nmzfs/nm-local-dir",
and send SIGUSR1 to Namazu when you (and YARN) are ready

Interactive testing is often easier than writing a JUnit testcase
We use SIGUSR1 here, but it would also be interesting to
implement a human-friendly CLI or GUI for interactive testing
API use case: found YARN-4301
API use case: found YARN-4301
• If you have knowledge of the protocol, you can compute
a hash for a packet
• Note that you have to eliminate time-dependent and random
bytes when you hash the packet
• Using the hash and Namazu API, you can "semi"-
deterministically replay the scenario
• Not fully deterministic; it just does its best effort
• Record-less! You just need to remember the "seed" for
replaying
• PoC: ZOOKEEPER-2212: up to 65% reproducibility
• More information: osrg/namazu#137
• See also (for Go): https://github.com/AkihiroSuda/go-replay
Another API use case: "semi"-deterministic replay
SIMILAR GREAT TOOLS
• Network partitioner + Linearizability tester
• Famous for "Call Me Maybe" blog: http://jepsen.io/
• “Call Me Maybe” by Carly Rae Jepsen (vevo):
https://www.youtube.com/watch?v=fWNaR-rxAic
• Randomly injects network partition using iptables
• "Linearizability" ∈ "Strong consistency"
• Integration test on a flaky network rather than a
flaky xUnit test
Similar great tool: Jepsen
• Has been used to test several Apache software
• Cassandra: 9851,10001,10068,10231,10413,10674
• http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen
• HBase
• Kafka
• Solr: 6530, 6583, 6610
• http://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks
• ZooKeeper
Similar great tool: Jepsen
• Namazu is much more generalized
• The bugs we found/reproduced are basically beyond the
scope of Jepsen (Threads, Disks..)
• Namazu can also be combined with Jepsen! That will be
our next work..
Namazu + Jepsen?
Jepsen: • causes network partitions • tests linearizability
Namazu: • increases non-determinism • injects filesystem faults
...
• Make the filesystem flaky using FUSE
• Used in testing ScyllaDB (Apache Cassandra's clone)
• https://github.com/scylladb/charybdefs
• Similar to Namazu FS
• Both support an API
• Also similar to PetardFS (not active since 2007)
• CharybdeFS can be combined with Namazu as well
• CharybdeFS is specialized in FS; Namazu is much more
comprehensive.
Similar great tool: CharybdeFS
https://github.com/NetSys/demi
• Found some akka-raft bugs and reproduced a few Spark bugs
• A challenge is reducing false positives related to instrumentation
• DEMi and Namazu are complementary to each other
• DEMi is powerful, but has some limitations
• Namazu is comprehensive and made easy to get started
Similar great tool: DEMi (appeared in NSDI'16)
Target: Namazu = Generic (Network, Filesystem, Thread..); DEMi = Akka
Getting started: Namazu = Easy; DEMi = Need to write AspectJ code
Deterministic replay? Namazu = No; DEMi = Yes
Bug cause minimization? Namazu = No; DEMi = Yes
SO... HOW CAN WE FIX FLAKY TESTS?
• Namazu finds/reproduces flaky tests, but it
doesn't automatically fix them😞
• Basic approach for async-related flakiness:
Adjust the values for sleep() and retries in the
test code
How can we fix flaky tests?
invokeAsyncOperation();
// some tests lack even this sleep
sleep(certainHardcodedTimeout);
assertTrue(checkSomethingGoodHasHappened());
How can we fix flaky tests?
invokeAsyncOperation();
// some tests lack even this sleep
sleep(certainHardcodedTimeout);
assertTrue(checkSomethingGoodHasHappened());
• Suggestion: the timeout(&retries) should be a configurable
parameter rather than a hard-coded value
Timeout value | Cost (time) | Risk (timeout) | Appropriate for
Long          | High        | Low            | Slow machine (e.g. CI) / conservative person
Short         | Low         | High           | Fast machine / risk-appetite person
CONCLUSION
• Apache software is well tested
• But the tests are flaky
• Let’s improve them
• Improve asynchronous code
• Repeat tests
• Our tool can control non-determinism
so as to reproduce flaky tests
https://github.com/osrg/namazu
Conclusion