Tests are an invaluable tool for me and my team; I consider them a key piece of the engineering toolbelt I’ve built up over the years. At Wayfair, teams use tests to maintain code quality, as a test-driven development tool, and as a way to self-document new features. All of this utility is only possible when you have a set of tests, a “test suite”, which can be trusted; a test suite that works as advertised by producing consistent results.

Somewhere around the 10,000-test mark, we found ourselves in a nightmare scenario. We had some rogue tests, ones which were giving us false positive or false negative results without being obviously wrong. We would disable one test and another would break – there were red herrings everywhere. In worst-case scenarios, our engineers would even be occasionally blocked from deploying code due to phantom test failures. Our test suite could no longer be trusted.

This article is about how we identified these rogue tests, how we fixed them, and how we went about preventing them from happening again. We ensured our test suite could be trusted and relied upon, and perhaps you’ll discover how to do the same for your own tests.

A Bit of History

Wayfair made a big transition to React in early 2017. Because of React, our engineers started to spend significantly more of their time writing JavaScript when developing our applications. This brought with it an emphasis on JavaScript testing, as we wanted to make sure our JavaScript apps functioned as expected, without defects. New framework features, shared component libraries, feature code – all of it needed to be tested. We decided that we needed to update our aging Jasmine test runner to keep up with the testing demands of our teams. We turned to Jest as it was a natural successor to Jasmine. Thanks to their almost identical APIs, we were able to make the switch rather quickly.

It’s one thing to introduce new technologies; it’s a whole other set of problems to ensure that your teams are using these technologies correctly! It turns out that ramping up over a thousand engineers on new technology and testing methods is not at all a simple task. We needed the right testing patterns and tooling in place, and we’ve invested a lot over the years to make sure of that. We ran (and still run) training programs and talks about testing and JavaScript in general. We also provide bite-sized, self-service training called “Awesome Learning” which engineers can tackle in a group setting at their own pace.

Alongside education, we’ve also incorporated improved tooling into our processes – tooling that can provide real-time feedback to engineers. For example, the eslint-plugin-jest plugin for ESLint helps us avoid common issues in tests before they ever make it to code review. Our static analysis is not limited to ESLint – we also have quite a bit of Flow-typed JavaScript and daily static analysis jobs powered by SonarQube. All of this is to make sure that we have a healthy codebase covered by tests.

At the time of writing, our general test suite has grown to over 14,000 tests, plus an additional thousand or so tests scattered across our different internal frameworks, services, and libraries. As our number of tests grew, so did the problems with the tests themselves. Despite all of our efforts, we had not done enough to ensure that the tests themselves were defect-free!

Hidden Challenges with Tests

This might sound obvious in hindsight, but tests are just code. Code is written by engineers, and engineers are people who make mistakes – no one is exempt from this rule. Mistakes lead to bugs, and bugs in the test code itself lead to (if you’re lucky) flaky tests. That’s right, you are lucky if you have a flaky test, because a flaky test is a data point, a signal that tips you off about something not being right.

Our process for triaging a flaky test is straightforward (though the specific solution might not be). First, isolate the test which is producing inconsistent results by skipping it in the test suite, then open a ticket asking the authoring team to patch the test. Finally, the team produces a fixed, working alternative and it’s merged back into the test suite.

The much trickier test to fix is one that provides false positive results, one which gives you no data about it being broken. A test which quietly hums away, always passing. This test might even go as far as breaking other tests completely unrelated to it – what fun. This type of test is the most insidious, but is also the most interesting to fix! Let’s look at some examples that are rather generic – while tests in real life may be much more complex, these examples will be enough to highlight the problems.

That Shouldn’t Be Passing

A big roadblock in identifying test anti-patterns is, unfortunately, Jest itself. The logic for whether or not a test fails in Jest is simple: if it throws an error, it fails; if it does not, it passes.

Based on this logic, here is a good rule of thumb: don’t ever make your test assertions conditional. If there is even a 0.01% chance that the condition may be false in a test, assume that it will be. Always make your assertions unconditional. This isn’t limited to if-statements, by the way – there are a ton of implied conditionals which might not seem obvious when writing tests. Take this test snippet as an example:

test('function adds 4', () => {
  const original = [1, 2, 3, 4];
  const results = fn(original);
  results.forEach((result, i) => expect(result).toBe(original[i] + 4));
});

Can you spot the problem here? What if the fn(original) call returns an empty array? You guessed it, this test will pass. The forEach callback never runs, so the test doesn’t assert anything, but it also doesn’t throw any errors, so it passes just fine. Linters don’t catch this either (someone should definitely write a lint rule for it). This is a common issue in React UI tests: an engineer tries to assert something on a collection of rendered elements, the elements don’t render at all, and no assertions actually happen. However, the test won’t fail.

Async Soup

This is a variation of the conditional problem, but now the conditional expectations are a result of async logic, which is perhaps not running as expected. Check this out:

test('promises are tough', () => {
  let promise = Promise.resolve();
  const wrapper = <Component fetch={fetchMock} />;
  wrapper.find(...).simulate('click');
  promise.then(() => expect(...).toBe(...)); // assert some stuff
  return promise;
});

If you’ve written enough Jest tests and used enough promises, the problem is obvious: the .then() with the expect() call should be part of the returned promise chain. But does this test pass? Yes, yes it does. Things get really interesting here, as static linters don’t catch this case and the test passes when it runs. You might also be thinking that it’s easy enough to fix by adding an expect.hasAssertions() to the top of the test, ensuring that assertions are accounted for. That won’t fail this test either, at least in Jest at the time of writing. Assertion counts are incremented by calls to expect() and these numbers are collected asynchronously, so a non-async test with an async assertion like this still passes in every way. It’s interesting to note that even if the assertion does fail, the test will not. More on this below.

Jest Tests in Strict Mode

It’s rather difficult to catch the issues above statically (with linters, for example). In a sufficiently complex test suite, there are many permutations of these anti-patterns which are not so obviously wrong when examined, and this makes them very difficult to track down. What you need instead are automated runtime checks and safeguards to prevent this sort of logic from creeping into your tests. That’s why at Wayfair we’ve developed something we call “strict mode” for our Jest tests. It guides our engineers by acting as a runtime safeguard, preventing poorly written tests from making it into our test suite.

Our “strict mode” patches part of the Jest API (more on this later) and makes sure that two essential runtime checks are performed on every test execution:

  • Fail any test without an assertion
  • Prevent task scheduling in a test after the test is already complete

The former is mostly covered by linting, but there are hard-to-catch cases where a runtime check is necessary. The latter is a very important check that must be done at runtime to ensure that no single test is “leaking” tasks like promises or timeouts across test boundaries. You also need the latter to make sure the former works as expected in Jest. You can read more about this in the linked GitHub issue.

How Does it Work?

The engine that powers our Jest “strict mode” is a small Jest plugin called “jest-plugin-must-assert”, which is available here. All you need to do is install the package and add the plugin to the list of “setupFilesAfterEnv” (for Jest 24+) in the Jest configuration, like so:

{
  setupFilesAfterEnv: [
    "jest-plugin-must-assert"
  ]
}

Once you do this, two things will begin to happen. First, any tests without assertions will begin to fail. Second, any test attempting to schedule a task after the test itself is complete will be blocked and a warning will be printed to the console:

console.warn src/index.js:72
  Test "unreturned promise assertions" is attempting to invoke a microTask(Promise.then) after test completion. See stack-trace for details.

  onInvokeTaskDefault (src/index.js:14:11)
  Object.onInvokeTask (src/index.js:81:22)
  ____________________Elapsed_1_ms__At__Wed_Jul_17_2019_12_57_10_GMT_0400__Eastern_Daylight_Time_ (http://localhost)
  then (e2e/failing/__tests__/index.js:23:21)
  callback (src/index.js:112:59)

We’ve found these stack traces to be essential in tracking down exactly which piece of code is trying to trigger an errant task callback. These traces are invaluable in helping us track down React warnings about setState calls on unmounted components, as well as poorly timed network requests.

The plugin itself does some complex maneuvering to “patch” the Jest API so that every test is wrapped in its own Zone. This is enabled by the excellent Zone.js library, brought to you by the Angular team. What is a zone?

“A Zone is an execution context that persists across async tasks. You can think of it as thread-local storage for JavaScript VMs.”

The Zone library gives the strict mode plugin the ability to track asynchronous events, like timeouts and promises, which would not be possible otherwise. I could write an entire blog post about Zones alone and why they are so essential to making this work, but that’ll have to be for another day. I encourage you to read more about it in the GitHub link above.
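To give a flavor of the concept – this is not Zone.js itself, just a drastically simplified, hand-rolled sketch – a zone can intercept task scheduling so that callbacks belonging to a finished test are blocked instead of invoked:

```javascript
// Simplified illustration of the zone idea: wrap timer scheduling so
// callbacks created inside a zone are blocked once the zone is closed.
// (Zone.js does this for timers, promises, events, and much more.)
function createZone(name) {
  let closed = false;
  return {
    setTimeout(callback, ms) {
      return setTimeout(() => {
        if (closed) {
          console.warn(`Zone "${name}" blocked a task invoked after completion`);
          return;
        }
        callback();
      }, ms);
    },
    close() {
      closed = true;
    },
  };
}

const zone = createZone('my test');
zone.setTimeout(() => console.log('this never runs'), 0);
zone.close(); // the "test" completed before the timer fired
```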

Your Mileage May Vary

When we first tried the “strict mode” Jest configuration in our test suite, the results were not pretty. I believe we found around 3% of our (then roughly 10,000) tests to be either flaky, unintentionally missing assertions, or giving false positive results. While the percentage might be low, the havoc such tests wreak on the rest of the suite can be disproportionately large. This might not be the case for everyone, though. One of the things that worked in our favor was that we could roll out this strict mode configuration in phases and allow teams to gradually submit patches for their tests. A separate configuration also allowed engineers to run a strict version of tests locally, which we made a requirement shortly after. By now we’ve rolled out strict mode to all of our CI jobs, as it’s baked into all of our test configurations by default.

As you’ve read, strict mode has been pretty helpful to us, but your mileage may vary. You might want to consider some additional strategies for avoiding these problems.

Avoid Human Error

While this is pretty generic advice, it might be all you need on some teams: a smaller team where all developers are on the same page, agree on a testing philosophy, and review each other’s code. In my experience, though, any team of more than two engineers will benefit from as many automated safeguards for tests as possible. Engineers with the best of intentions make mistakes, and even in the most heavily tested area of our codebase, our framework, we’ve found a non-zero number of false positive tests.

Make Sure Your Tests Fail First

If you’re a practitioner of Test-Driven Development, you’ll recognize this tenet. I’ve found that engineers who are uninterested in the TDD process often throw away this nugget of wisdom along with it. Even if you write tests for code after the fact and don’t follow strict TDD, you should still make sure the tests can fail. Without verifying the failure conditions, you won’t really know for sure that your tests are working as intended.

Avoid Indirection in Tests

Indirection is antithetical to a good test case. Avoid helpers and layers of indirection in tests and you’ll be less likely to end up with a nondeterministic test. A non-trivial number of broken tests arise because engineer “B” misread how engineer “A” intended some test helper to be used – for example, by missing a crucial callback for a promise or omitting an assertion. A ton of indirection also makes it challenging to decipher the intention of a test after the fact, which makes it even harder to fix if there is a problem.

Write Smaller Tests

This is a general, tried-and-true method of making sure your tests are not flaky and are, ultimately, useful. A sure sign of a convoluted test is one with “and”s in the name: “should render widget AND fetch data … AND launch missiles”. The larger the test, the more variables it depends on, which means more things that can and will go wrong. Write more focused tests which are responsible for testing a specific feature of the system, not all of the features at the same time!

Consider Alternative Test Runners

One of the main reasons for writing an entire plugin for Jest was the lack of a configuration option for failing tests without assertions. As of the time of writing this post, it seems unlikely to be added to Jest in the future. If you are starting out and are evaluating available test runners, you may want to consider the excellent AVA. AVA is much stricter about how you write asynchronous tests (which are a major source of bugs) and has a built-in ability to fail tests without assertions, thanks to its lack of any globals in the core API.

Conclusion

Today our test suite is humming away, powered by Jest. These issues were not easy to identify or fix. Thankfully, the flexible Jest API bailed us out, allowing us to patch additional safeguards into our test suite via a plugin.

If you are interested in jest-plugin-must-assert and would like to make contributions, or have better alternatives, open up an issue on the GitHub repository. If you have any other comments and would like to reach out to me directly, you can contact me on Twitter @ballercat. Send me your grand testing wins! You can also see my talk about our testing tribulations at this year’s React Boston 2019, below:

We are always looking for more engineers excited about frontend development and testing at Wayfair – if that describes you, don’t hesitate to apply for one of our open Frontend positions.