

How to Address Flaky Tests

Posted on March 29th, 2016

A comprehensive automated test suite can be of tremendous value in speeding up your development and test process, enabling you to ship features to production sooner. However, one common obstacle that can hinder automation efforts is ‘test flakiness’. This refers to tests that pass on some runs but fail sporadically on others, even though nothing in the app or the test has changed. Such tests are particularly frustrating because, without consistent results, it becomes much harder to identify the root cause. Furthermore, rather than being taken as indicators of a problem within the app, flaky tests can cause developers and testers to lose confidence in the tests and stop using them.

In this article I hope to not only share some of the causes of test flakiness, but also show some of the strategies employed by companies such as Box, Facebook and Spotify in addressing test flakiness.

What is test flakiness?

Test flakiness, by our definition, describes tests that both succeed and fail at seemingly random intervals, providing little to no clue as to the root cause. Flaky tests, in my opinion, are nearly as bad as having no tests at all, simply because you can’t rely on them.

The next section will highlight some potential causes for test flakiness.

Top 7 causes for test flakiness in mobile testing:

  • Differences in your environments: For example between your local setup and your CI setup.
  • Concurrent runs: Tests running concurrently, with changing data during the test execution.
  • Unstable test code: If test code is poorly written, it can result in inconsistencies.
  • Using fixed sleeps: With fixed sleep periods, quick operations still wait out the full interval, which makes tests run longer, while slower operations may not get enough time to complete, resulting in timeouts and failures depending on device or network.
  • Infrastructure/Issues with your device lab: Device issues or network problems can result in inconsistent results.
  • Long tests: Tests longer than 30 test steps are more likely to produce inconsistent results.
  • Test dependency: Tests which are dependent on each other can fail because dependent actions engage too quickly or not quickly enough to meet success criteria.
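
As an illustration of the fixed-sleep problem from the list above, here is a minimal Python sketch of a polling wait that returns as soon as a condition holds and gives up only after a timeout. The helper name `wait_until`, its parameters and the simulated `screen_loaded` check are assumptions made for this example and are not taken from any particular test framework.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Usage: a simulated screen that finishes loading after about 0.3 seconds.
start = time.monotonic()


def screen_loaded():
    return time.monotonic() - start > 0.3


# Returns shortly after the screen is ready instead of always sleeping 5 seconds.
wait_until(screen_loaded, timeout=5.0)
```

Compared with a fixed sleep, a fast device is not slowed down, and a slow device or network gets the full timeout before the test is marked as failed.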

The causes above also hint at some of the remedies for flaky tests. Creating simpler, shorter test cases can eliminate the complexity that leads to inconsistent results. Checking that dependent actions are given adequate time to complete can improve results as well. Regardless of which of the causes above is the culprit, flaky tests should be investigated thoroughly before taking the drastic step of eliminating the test. Here at Testmunk we follow the rule of thumb of checking for flakiness right when we script the test case: at test case creation, the test has to pass 5 times.
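
The "pass 5 times" rule of thumb can be sketched in a few lines of Python. This is only an illustration and not Testmunk's actual tooling: the helper `passes_consistently` and the simulated test are hypothetical names introduced for this example.

```python
def passes_consistently(test_func, runs=5):
    """Run `test_func` `runs` times and return (all_passed, failures).

    `failures` lists (attempt_number, error_description) for every failed run.
    """
    failures = []
    for attempt in range(1, runs + 1):
        try:
            test_func()
        except Exception as exc:  # assertion failures and other errors both count
            failures.append((attempt, repr(exc)))
    return not failures, failures


# Usage: a simulated test that fails only on its third run.
calls = {"count": 0}


def sometimes_failing_test():
    calls["count"] += 1
    assert calls["count"] != 3, "simulated sporadic failure"


ok, failures = passes_consistently(sometimes_failing_test)
print("stable" if ok else f"flaky, failed on attempts {[n for n, _ in failures]}")
```

A single run of this simulated test would have passed, which is exactly why repeating the run at creation time is useful.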

Organizational approaches to test flakiness

Several companies have developed tools and processes in order to deal with test flakiness. One of the most common approaches is to isolate and identify the flaky test to start with.

Spotify: Ensuring Transparency

Kristian Lindwall from Spotify talked about some of the techniques his company was using to address flaky tests during his presentation at Mobile Delivery Days. The common theme throughout his presentation was transparency about test results, and accountability for teams and individuals. Spotify also developed tools in support of these objectives.

Figure: The process used at Spotify to report flakiness and promote transparency and accountability. Spotify developed several tools and processes to cope with test flakiness.
Figure: Rerun dashboard to check for flakiness at Spotify.
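
To give a rough idea of what rerun data makes visible, the following Python sketch summarises repeated runs of the same build and flags tests with mixed outcomes. It is not Spotify's tool; the function `flakiness_report`, the input format and the test names are assumptions made for this example.

```python
from collections import defaultdict


def flakiness_report(results):
    """Summarise repeated runs of the same build.

    `results` is an iterable of (test_name, passed) pairs.
    Returns {test_name: (passes, runs, is_flaky)}, where a test counts as
    flaky if it both passed and failed across the runs.
    """
    counts = defaultdict(lambda: [0, 0])  # test_name -> [passes, runs]
    for name, passed in results:
        counts[name][0] += 1 if passed else 0
        counts[name][1] += 1
    return {
        name: (passes, runs, 0 < passes < runs)
        for name, (passes, runs) in counts.items()
    }


# Usage: three reruns of two hypothetical tests on the same build.
history = [
    ("test_login", True), ("test_login", False), ("test_login", True),
    ("test_search", True), ("test_search", True), ("test_search", True),
]
report = flakiness_report(history)
for name, (passes, runs, is_flaky) in report.items():
    print(f"{name}: {passes}/{runs} passed, flaky={is_flaky}")
```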

Box: Rerunning Flaky Tests

Spotify is not the only company that has focused efforts on dealing with flaky tests. Box has developed a library called “flaky” that reruns tests marked as flaky. According to their blog, a key motivation for its development was to cope with hard-to-fix flaky tests resulting primarily from dependencies on external components. By default, this library will automatically rerun any test marked as flaky one additional time if (and only if) it fails. The maximum number of runs, as well as the minimum number of times the test must pass, is configurable. A failure is not reported to the test runner if the test passes often enough on rerun to meet the configured minimum.


Facebook: Using Bots

Facebook, too, has developed strategies for dealing with flaky tests. These efforts include several bots that manage tests and classify them into categories such as “Good Test”, “Failing Test” or “Disabled Test”, based on failure rates and the number of times a test has passed. For example, once a test has been designated a “Failing Test”, the developer has 5 days to fix it so that it can be reactivated. Roy Williams from Facebook presented this concept at the GTAC conference in 2014.

Figure: Test lifecycle at Facebook.
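
A classification rule of this kind could look roughly like the Python sketch below. Only the category names and the 5-day window come from the description above; the failure-rate threshold, the function `classify` and the rule that an unfixed test becomes a “Disabled Test” after the window are assumptions made for illustration, not Facebook's actual logic.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=5)   # from the article: 5 days to fix a failing test
FAILURE_RATE_THRESHOLD = 0.05      # hypothetical cut-off, chosen for this example


def classify(passes, failures, failing_since=None, today=None):
    """Return "Good Test", "Failing Test" or "Disabled Test" for one test.

    `failing_since` is the date the test was first marked as failing, if any.
    """
    today = today or date.today()
    runs = passes + failures
    failure_rate = failures / runs if runs else 0.0
    if failure_rate <= FAILURE_RATE_THRESHOLD:
        return "Good Test"
    if failing_since is not None and today - failing_since > GRACE_PERIOD:
        return "Disabled Test"  # assumed outcome when the fix window has expired
    return "Failing Test"


# Usage with hypothetical numbers: 20% failure rate, marked as failing on March 1st.
print(classify(8, 2, failing_since=date(2016, 3, 1), today=date(2016, 3, 3)))
print(classify(8, 2, failing_since=date(2016, 3, 1), today=date(2016, 3, 10)))
```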

As you can see, flaky tests are a problem that any organization using automated tests can face, and companies deal with them in a number of interesting ways. How do you approach flaky tests? Tweet @testmunk and let us know.

About the author:
Martin Poschenrieder has been working in the mobile industry for most of the past decade. He began his career as an intern for one of the few German handset manufacturers, years before Android and iPhone were launched. After involvement with several app projects, he soon realized that one of the biggest pain-points in development was mobile app testing. In order to ease this pain, he started Testmunk. Testmunk is based in Silicon Valley, and provides automated app testing over the cloud.