In software development, quality is paramount, and testing is essential to ensuring it. However, developers frequently encounter a persistent challenge: flaky tests. These are tests that produce inconsistent results, passing or failing without any changes in configuration. They pose a significant problem in the software testing process because they are unpredictable and difficult to debug and resolve.
Playwright, a popular end-to-end testing framework, excels in automating tests for web applications. Despite its benefits, Playwright is not impervious to flaky tests. Thus, it becomes crucial for developers to understand how to detect and resolve these issues.
In this article, we explore the concept of flaky tests in Playwright. We will begin with an understanding of flaky tests, their symptoms, and impacts. We'll investigate their common causes and learn to identify them. Next, we present strategies to manage flaky tests and practices to prevent their occurrence. We will also learn from real-world case studies, illustrating how teams have tackled flaky tests in Playwright.
So, let's equip ourselves to handle flaky tests in Playwright effectively. This guide provides practical knowledge and techniques to counter one of the major challenges in software testing.
Understanding Flaky Tests
Flaky tests are unpredictable tests that exhibit inconsistency in their results. They are infamous within the Quality Assurance community for their elusiveness and the challenges they pose in debugging. As experienced QA engineers using Playwright, you may have encountered these frustrating tests in your projects.
What Are Flaky Tests?
In simple terms, a flaky test is one that could pass or fail for the same configuration. You run the test without any changes to your code, the testing environment, or test input, yet the result is not always the same. It's like flipping a coin; you can't predict if it will pass or fail.
Consider this example:
import { test, expect } from '@playwright/test';

test('flaky test example', async ({ page }) => {
  await page.goto('https://ray.run/');
  await expect(page).toHaveTitle('Expected Title');
});
The test navigates to https://ray.run/ and asserts that the page's title is 'Expected Title'. This test might pass at times and fail at others if the title of the page is dynamically changing.
It is worth noting that there is nothing intrinsically wrong with the test itself. The test is valid and should pass if the title of the page is 'Expected Title'. We will cover the causes of flaky tests in more detail later in this article. For now, it's important to understand that flaky tests are not deterministic. They are unpredictable and can pass or fail without any changes in the configuration.
The Impact of Flaky Tests
Flaky tests can severely undermine the value of your testing suite.
- False alarms: They introduce uncertainty into the test suite, causing false alarms and leading you to waste time debugging non-existent defects.
- Loss of confidence: As the unpredictability increases, your trust in the test suite decreases, impacting the overall QA process.
- Slower development: The unpredictability slows down development and deployment, as you have to spend additional time resolving these issues.
Eventually, flaky tests can lead to a loss of confidence in the testing process and the product itself. This can have a significant impact on the overall quality of the product and the team's morale (see The Domino Effect of Flaky Tests).
Symptoms of Flaky Tests
Typically, flaky tests show the following symptoms:
Sporadic Test Failures
The most common symptom of flaky tests is random, sporadic test failures. These are situations where a test passes at times and fails at other times without any changes in the codebase.
Here's a simple example of such a situation:
import { test, expect } from '@playwright/test';

test('sporadic failure', async ({ page }) => {
  await page.goto('https://ray.run/');
  await expect(page).toHaveTitle('Expected Title');
});
In this example, the test might fail sporadically if the title of the page 'https://ray.run/' changes intermittently.
Unreliable Tests
Unreliable tests are those that produce inconsistent results. These inconsistencies might not always manifest as direct test failures, making them harder to detect. They are often the result of tests depending on each other or relying on a specific state of the system that may change.
Consider the following example:
import { test, expect } from '@playwright/test';

test('unreliable test', async ({ page }) => {
  await page.goto('https://ray.run/feature');
  await expect(page.getByTestId('status')).toHaveText('Submitted');
});
In the above test, if the status element's text is subject to change based on different conditions, the test becomes unreliable and flaky.
Slow Tests and Timeout Failures
Slow tests can often be a symptom of flaky tests, especially if their execution time varies greatly. A common scenario for this is when a test is waiting for an event or response which takes inconsistent time to occur or doesn't occur at all, leading to timeouts.
Here is an example of such a test:
import { test, expect } from '@playwright/test';

test('slow test', async ({ page }) => {
  await page.goto('https://ray.run/');
  const response = await page.waitForResponse('https://ray.run/api/slow-response');
  expect(response.ok()).toBeTruthy();
});
This test waits for a response from an API endpoint which is slow or inconsistent, making the test susceptible to timeouts and thereby introducing flakiness.
Non-Deterministic Tests
Non-deterministic tests are another symptom of flaky tests. These tests exhibit different behavior each time they are run, even under the same conditions. This is usually caused by elements of randomness or time dependencies within the test.
Here's an example:
import { test, expect } from '@playwright/test';

test('non-deterministic test', async ({ page }) => {
  await page.goto('https://ray.run/random');
  const randomNumber = await page.$eval('.random-number', el => Number(el.textContent));
  expect(randomNumber).toBeLessThan(10);
});
In this test, the assertion is based on a random number displayed on the page, making the test outcome non-deterministic and hence flaky.
By recognizing these symptoms in your test suite, you can target and rectify flaky tests, ensuring the stability and reliability of your testing environment. Keep an eye out, and happy debugging!
Understanding flaky tests is the first step in taming them. With this knowledge, you can better navigate the challenges they present in your Playwright testing environment.
The Domino Effect of Flaky Tests
One crucial aspect that often goes overlooked is the cumulative impact of flaky tests in a large test suite. On the surface, a test with a flaky failure rate of 0.05% (0.0005) seems negligible. However, the real challenge appears when you have a suite of many such tests.
This article does a great job putting this into perspective. If you have a test suite of 100 tests, each having a 0.05% flaky failure rate, the success rate of the entire suite comes down to 95.12% (0.9995^100). This figure might still appear acceptable, but the true scale of the problem emerges when you are dealing with large-scale applications where you have thousands of tests.
Consider the situation where you have 1,000 of these flaky tests. The success rate of your test suite now plunges to a concerning 60.64% (0.9995^1,000). As the size of your test suite increases, the odds of having a completely successful run diminish rapidly due to the presence of these flaky tests, even if each one of them has a low individual failure rate.
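You can verify this compounding arithmetic with a couple of lines of code:

// Pass rate of a suite of n independent tests,
// each passing with probability p.
const suitePassRate = (p: number, n: number): number => Math.pow(p, n);

console.log(suitePassRate(0.9995, 100));  // ≈ 0.9512
console.log(suitePassRate(0.9995, 1000)); // ≈ 0.6064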
It's crucial to grasp that even a minimal flaky test failure rate can have a significant cumulative impact, particularly in large-scale applications. This understanding underscores the importance of addressing flaky tests in your Playwright suite promptly and efficiently.
Causes of Flaky Tests in Playwright
Asynchronous Operations
Async operations are one of the primary culprits for flaky tests. With the asynchronous nature of JavaScript and TypeScript, there are times when certain operations do not complete before the next line of code executes, leading to unpredictable results.
Consider the following TypeScript example:
import { test, expect } from '@playwright/test';

test('async operations', async ({ page }) => {
  await page.goto('https://ray.run/');
  // Async operation
  await page.click('#start-button');
  // Immediately checking the result might lead to a flaky test
  const result = await page.$eval('#result', el => el.textContent);
  expect(result).toBe('Running...');
});
Note that the use of $eval is explicitly discouraged in Playwright; it's used here for demonstration purposes only. Use locator.evaluate(), other Locator helper methods, or web-first assertions instead.
In the example above, if the server takes too long to respond to the click event, the result may not be 'Running...' when checked immediately after, leading to a flaky test.
Test Isolation
Ensuring tests do not depend on each other is crucial for avoiding flaky tests. Each test should be isolated, meaning the result of one test should not influence another.
Here's an example of tests that are not properly isolated:
import { test, expect } from '@playwright/test';

test('test A', async ({ page }) => {
  await page.goto('https://ray.run/');
  // Performs an operation that affects the state of the page
  await page.click('#start-button');
});

test('test B', async ({ page }) => {
  // This test might fail if test A did not complete its operation
  const result = await page.$eval('#result', el => el.textContent);
  expect(result).toBe('Ready...');
});
In the example above, if test A doesn't complete its operation (like fully loading the page) before test B starts, test B may fail, making it flaky.
This can become a particularly challenging issue when third-party APIs or services are rate limited. In such cases, tests might pass when run individually but fail when run in parallel.
Note that, by default, Playwright isolates tests by running each one in a separate browser context. However, if you're reusing the same browser context across multiple tests, you need to ensure proper isolation yourself.
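If you do share a context, here's a minimal sketch of what that can look like, with shared state reset between tests (the cookie cleanup is just one example of state worth resetting):

import { test, type BrowserContext, type Page } from '@playwright/test';

let context: BrowserContext;
let page: Page;

test.beforeAll(async ({ browser }) => {
  // One context (and page) shared by all tests in this file.
  context = await browser.newContext();
  page = await context.newPage();
});

test.afterEach(async () => {
  // Reset shared state so one test cannot affect the next.
  await context.clearCookies();
});

test.afterAll(async () => {
  await context.close();
});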
External Dependencies
External dependencies like APIs, databases, or other services can introduce flakiness into your tests. If the dependency is unreliable or slow, it can lead to test failures.
Consider this test which relies on an external API to complete an operation:
import { test, expect } from '@playwright/test';

test('external dependency', async ({ page }) => {
  await page.goto('https://ray.run/api');
  // If the API is slow or unreliable, this test might fail
  const result = await page.$eval('#result', el => el.textContent);
  expect(result).toBe('Successful');
});
In the example above, if the API at https://ray.run/api is slow or unreliable, the test may fail unpredictably.
Understanding and mitigating these causes can go a long way in making your Playwright test suite more robust and reliable. Keep in mind that a good test is deterministic - it produces the same result given the same input, every time.
Identifying Flaky Tests in Playwright
Once you're aware of the causes of flaky tests, the next step is to identify these tests in your suite. Identifying flaky tests in Playwright involves vigilant observation and strategic use of tools and techniques.
Identifying Inconsistent Test Results
One of the most straightforward ways to identify flaky tests is through their inconsistent results. As we've discussed, a flaky test can pass or fail without any changes in configuration, code, or input. By closely observing the results of your test suite, you can identify which tests show inconsistent results over time.
import { test, expect } from '@playwright/test';

test('identifying inconsistent result', async ({ page }) => {
  await page.goto('https://ray.run/');
  const element = await page.$('#dynamic-element');
  expect(await element.textContent()).toBe('Expected Text');
});
In the above code, if #dynamic-element is altering its text content dynamically, the test can sometimes pass and sometimes fail, making it flaky.
Using Flaky Test Detection Tools
There are also numerous tools available that can assist you in detecting flaky tests. These tools work by running your tests multiple times and identifying any tests that show different results between runs. Integrating these tools into your testing process can help automate the process of identifying flaky tests.
Manual Techniques for Identifying Flaky Tests
There are some manual techniques as well that can aid in identifying flaky tests:
- Running Tests in Parallel: If tests that usually pass start failing when run in parallel, they could be flaky.
- Running Tests on Different Machines or Environments: If tests behave differently across environments, they may be flaky.
- Running Tests Multiple Times: Flaky tests may pass now and fail in a subsequent run.
Using GitHub Actions
You may also use a manual GitHub Action to audit whether your tests are flaky.
I wrote about this in this blog post.
Using Havoc
You may also use a tool that I developed to detect flaky tests by throttling network requests.
Read the Havoc announcement blog post for more details, or head straight to https://github.com/lucgagan/playwright-havoc
Strategies to Handle Flaky Tests in Playwright
You can tackle flaky tests in Playwright using various strategies such as applying timeouts and retries, ensuring proper test isolation, mocking external dependencies, and utilizing Playwright's built-in waiting mechanisms.
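For example, test and assertion timeouts can be tuned globally in the configuration file (the values below are illustrative):

import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Maximum time a single test may run.
  timeout: 30_000,
  expect: {
    // Maximum time expect() waits for a condition to become true.
    timeout: 5_000,
  },
});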
Retrying Flaky Tests
Retrying flaky tests is a common strategy to mitigate intermittent failures and increase the overall stability of your test suite. By automatically rerunning failed tests, you provide an opportunity for transient issues to resolve themselves and ensure that the test outcome is reliable. Playwright Test provides a built-in mechanism for retrying failed tests:
# Give failing tests 3 retry attempts
npx playwright test --retries=3
Retries can also be configured using the retries option in the Playwright Test configuration file:
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Give failing tests 3 retry attempts
  retries: 3,
});
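A common refinement is to retry only on CI, where transient infrastructure issues are more likely, and fail fast locally:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry on CI only; no retries during local development.
  retries: process.env.CI ? 2 : 0,
});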
While retrying flaky tests can be beneficial, it's important to consider the risks and potential drawbacks associated with this strategy. Here are a few key considerations:
- Increased Test Run Time: Retrying failed tests adds extra time to the overall test execution. If your test suite is large or the retries are frequent, it can significantly increase the test run time, affecting the feedback loop and slowing down the development process.
- Compounding Problems: Retries may mask underlying issues or bugs in your application. If a test fails consistently due to a legitimate bug, retrying the test may not address the root cause and can lead to false positives. It's essential to investigate and fix the root cause rather than relying solely on retries.
- Intermittent Failures: Not all test failures are transient. Some failures may indicate genuine issues that require immediate attention. By blindly retrying all failed tests, you might miss critical failures that require investigation and resolution.
To mitigate these risks, it's important to use retries judiciously and strike a balance between test stability and timely feedback. Here are some best practices to follow:
- Use retries selectively for tests that exhibit intermittent failures and have a high likelihood of transient issues.
- Monitor the test results and analyze the failure patterns. If a test consistently fails despite retries, investigate and fix the underlying issue instead of relying solely on retries.
- Set an upper limit on the number of retries and ensure that the delay between retries is reasonable. Adjust these parameters based on your specific needs and the nature of the flakiness.
In general, my advice is to either not use retries at all or to set the retry count to 1. If you have a flaky test, you should fix it instead of retrying it. If a test fails intermittently due to external factors, consider the other strategies discussed in this article, such as timeouts and waiting mechanisms.
If you have to, consider setting the number of retries at the test suite or test level instead of the global level. This allows you to retry only the tests that need it. You can do this by using test.describe.configure():
import { test, expect } from '@playwright/test';

test.describe.configure({
  retries: 3,
});

test('flaky test', async ({ page }) => {
  await page.goto('https://ray.run/');
  await expect(page).toHaveTitle('Expected Title');
});
Quarantine Flaky Tests
Quarantining flaky tests is a common strategy to handle them. This involves removing the tests from the test suite temporarily and running them separately. This allows you to focus on fixing the flakiness without impacting the rest of the test suite.
The following example shows how to quarantine a test in Playwright:
import { test } from '@playwright/test';

test('@quarantined flaky test', async ({ page }) => {
  await page.goto('https://ray.run/');
});
Now that the test has the @quarantined tag, you can skip it when running the test suite by using the --grep-invert flag with the test command:
npx playwright test --grep-invert @quarantined
You can then run the quarantined tests separately and focus on fixing the flakiness.
npx playwright test --grep @quarantined
The article about Organizing Playwright Tests using Tags goes into more detail about using tags in Playwright.
Something to keep in mind is that quarantining tests is not a long-term solution. It's a temporary measure to help you focus on fixing the flakiness. Once you've fixed the flakiness, you should remove the @quarantined tag and add the test back to the test suite. You must discuss this with your team and ensure that you have processes in place to re-add the tests once they're fixed.
You could also use the test.skip (or test.fixme) function to skip the test instead of using the @quarantined tag. However, I recommend using the @quarantined tag because it allows you to run the quarantined tests separately.
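For reference, the skip-based approach looks like this (the test names are illustrative):

import { test } from '@playwright/test';

// Skipped entirely; never runs.
test.skip('flaky checkout test', async ({ page }) => {
  await page.goto('https://ray.run/');
});

// Marked as "needs fixing"; Playwright skips it and reports it as fixme.
test.fixme('flaky search test', async ({ page }) => {
  await page.goto('https://ray.run/');
});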
Isolate and Reproduce the Flaky Behavior
The first step in handling flaky tests is to isolate them from your suite and try to reproduce the flaky behavior. Running the test independently multiple times, under different conditions, or on different machines can help expose the flakiness.
In Playwright, you can use the --repeat-each flag to run a test multiple times:
npx playwright test --repeat-each 100
Combine this with the --workers flag to run the tests in parallel:
npx playwright test --repeat-each 100 --workers 4
The combination of these flags can help identify flaky tests that are not isolated or are affected by parallel testing.
Throttling the Network Speed
Another variable you should consider is the network latency. At the time of writing this article, Playwright does not have a built-in way to simulate network latency. However, you can simulate network latency using page.route
method:
import { test } from '@playwright/test';

test('flaky test', async ({ page }) => {
  // Add ~100ms of latency to every network request
  await page.route('**/*', async route => {
    await new Promise(resolve => setTimeout(resolve, 100));
    await route.continue();
  });
  // ...
});
This workaround was suggested in this GitHub issue by Max Schmitt.
Throttling the network speed can help identify flaky tests that are affected by network latency.
Throttling the CPU Speed
You can also consider throttling the CPU speed to identify flaky tests that are affected by CPU speed.
This technique is described in this article by Yarden Porat.
import type { ChromiumBrowserContext } from 'playwright-core';

// ...
const client = await (page.context() as ChromiumBrowserContext).newCDPSession(page);
// rate is the slowdown factor (1 is no throttle, 2 is a 2x slowdown)
await client.send('Emulation.setCPUThrottlingRate', { rate: 2 });
Addressing Timing and Synchronization Issues
As we discussed earlier, timing and synchronization issues are common culprits for flaky tests in Playwright. These can be mitigated by using Playwright's built-in waiting mechanisms. These ensure that the web page is in the right state before the test executes.
import { test, expect } from '@playwright/test';

test('fixing timing issue', async ({ page }) => {
  await page.goto('https://ray.run/');
  const button = await page.$('#button');
  await button.click();
  // Wait for the result element to appear before asserting on it
  await page.waitForSelector('#result');
  const result = await page.$('#result');
  expect(await result.textContent()).toBe('Expected Result');
});
In the updated code, we've added page.waitForSelector('#result') before capturing the #result element. This makes sure the element is loaded before we perform the assertion.
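A more idiomatic alternative is a locator with a web-first assertion, which waits and retries automatically (reusing the same illustrative selectors):

import { test, expect } from '@playwright/test';

test('fixing timing issue with a web-first assertion', async ({ page }) => {
  await page.goto('https://ray.run/');
  await page.locator('#button').click();
  // toHaveText auto-waits for #result to appear and retries the assertion.
  await expect(page.locator('#result')).toHaveText('Expected Result');
});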
Fix the Root Cause
Sometimes the flakiness is caused by a legitimate bug in your application. In such cases, you should fix the bug instead of relying on retries or other workarounds. A good example is when the application allows the user to perform an action before the page is fully loaded. This article provides a great example of such an instance. In cases like this, you should fix the application to prevent the user from performing the action before the page is fully loaded.
Making Your Test Environments Consistent
To prevent flakiness caused by differences in test environments, ensure that your test environments are consistent. The environments where the tests are run should be as close as possible to the production environment in terms of configuration and settings.
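One concrete way to reduce environment variance in Playwright is to pin environment-sensitive settings in the configuration file so every machine sees the same conditions (the values below are illustrative):

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Pin settings that commonly differ between machines.
    locale: 'en-US',
    timezoneId: 'UTC',
    viewport: { width: 1280, height: 720 },
  },
});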
Testing the Range of Possible Outputs
Instead of asserting the exact output value, you can define a range of acceptable values that the algorithm should produce. This allows the test to pass as long as the actual output falls within the expected range.
Here's an example of how you can test a non-deterministic algorithm using a range of acceptable values:
import { test, expect } from '@playwright/test';

test('non-deterministic algorithm', async () => {
  // Placeholder for the algorithm's non-deterministic output
  // (here: a random value in the range [1, 10)).
  const output = 1 + Math.random() * 9;
  expect(output).toBeGreaterThan(0);
  expect(output).toBeLessThanOrEqual(10);
});
In this example, instead of expecting a specific output value, we assert that the output should be greater than 0 and less than or equal to 10. This allows the test to pass regardless of the specific value produced by the non-deterministic algorithm.
Handling Multithreading Issues
One effective approach to handling multithreading issues is to execute the problematic tests sequentially rather than in parallel. By running the tests one after the other, you eliminate the potential for concurrent interactions that could introduce flakiness.
The relevant Playwright CLI flag for this is --workers 1:
npx playwright test --workers 1
However, this is a temporary solution. You should aim to fix the flakiness and run the tests in parallel again.
To minimize overall test execution time, and to surface isolation problems early, you should also evaluate running tests in fully parallel mode using Playwright's fullyParallel option. Playwright Test runs tests in parallel by running several worker processes at the same time. By default, test files run in parallel, while tests within a single file run in order, in the same worker process.
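Enabling it is a one-line change in the configuration file:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Also run tests within each file in parallel, not just across files.
  fullyParallel: true,
});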
Preventing Flaky Tests in Playwright
While it's crucial to handle flaky tests effectively, prevention is even more important. By following best practices and designing your tests carefully, you can prevent many flaky tests from appearing in the first place.
Using Resilient Locators
Locators can be used to prevent flaky tests by providing a reliable and resilient way to identify elements on a web page. Playwright offers several best practices for using locators effectively.
One approach is to use chaining and filtering with locators. This allows you to narrow down the search to a specific part of the page. For example, you can chain locators together to find an element with specific text and then perform an action on it. By using this technique, you can ensure that your tests are targeting the correct elements consistently.
const product = page.getByRole('listitem').filter({ hasText: 'Product 2' });
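You can then act on the narrowed-down locator; for example (the button name is illustrative):

await product.getByRole('button', { name: 'Add to cart' }).click();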
Another best practice is to prefer user-facing attributes over XPath or CSS selectors when selecting elements. The DOM structure of a web page can easily change, which can lead to failing tests if your locators depend on specific CSS classes or XPath expressions. Instead, use locators that are resilient to changes in the DOM, such as those based on role or text.
page.getByRole('button', { name: 'submit' })
Playwright also provides a test generator that can automatically generate tests and pick appropriate locators for you. The generator analyzes your page and determines the best locator based on factors like role, text, and test id. It even improves the locator if there are multiple matching elements, ensuring that it uniquely identifies the target element (you can also generate locators using Rayrun Browser Extension).
To generate locators using Playwright's codegen command, simply run it followed by the URL of the page you want to pick a locator from:
npx playwright codegen ray.run
During debugging, Playwright offers features like live editing of locators and picking more resilient locators interactively. While running in debug mode, you can edit locators directly in the Pick Locator field and see matching elements highlighted in the browser window. You can also hover over any element in the browser window while debugging to see code snippets for locating that element.
To further aid in troubleshooting flaky tests related to locators, Playwright provides actionability logs that show detailed information about each action performed during a test run. These logs indicate whether an element was visible, enabled, stable, scrolled into view, etc., helping you understand what happened during your test execution.
Additionally, Playwright offers a Trace Viewer tool which allows you to explore recorded Playwright traces of your tests. This GUI tool provides a visual representation of each action, along with a DOM snapshot and action details. You can also examine console messages, network requests, and source code within the Trace Viewer.
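For example, you can record traces for a run and then open a recorded trace (written under your test-results directory) in the Trace Viewer:

# Record a trace for every test run
npx playwright test --trace on

# Open a recorded trace in the Trace Viewer
npx playwright show-trace trace.zip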
By following these best practices and utilizing the features provided by Playwright, you can effectively use locators to prevent flaky tests and ensure the reliability of your automated QA testing.
Using Promise.all with Playwright to Prevent Race Conditions
When dealing with async functions in Playwright, you might run into race conditions if you're not careful with the control flow. Let's see what this looks like in code and how we can avoid such issues using Promise.all().
Consider this TypeScript test case. Can you spot the potential problem?
import { test } from '@playwright/test';

test('this might cause a race condition', async ({ page }) => {
  const next = page.locator('button', { hasText: 'submit' });
  await next.click();
  await page.waitForResponse('https://ray.run/url');
});
Did you see it? The click triggers the request, but page.waitForResponse() only starts listening after the click has resolved. If the response arrives in that gap (and remember, Playwright is fast), await page.waitForResponse('https://ray.run/url') will stall and eventually time out.
This is where Promise.all() comes into play. This method allows us to wait for multiple Promises to settle. Check this out:
import { test } from '@playwright/test';

test('avoid race condition with Promise.all()', async ({ page }) => {
  const submit = page.locator('button', { hasText: 'submit' });
  await Promise.all([
    // Start waiting for the response
    page.waitForResponse('https://ray.run/url'),
    // Then click the button that triggers it
    submit.click(),
  ]);
});
By doing this, we can ensure both the button click and the response are finished. Nobody gets left behind!
However, be cautious about putting multiple actions in one Promise.all(). It's like having several people controlling the keyboard at once: results can become unpredictable. Look at this example:
import { test } from '@playwright/test';

test('this might be troublesome', async ({ page }) => {
  const emailField = page.locator('input#email');
  const nameField = page.locator('input#name');
  await Promise.all([
    page.waitForResponse('https://ray.run/url'),
    emailField.fill('[email protected]'),
    nameField.fill('Hi'),
  ]);
});
In this case, you're attempting to fill in the emailField and the nameField at the same time while waiting for a response, which might lead to inconsistent results.
So remember, Promise.all() is a powerful tool when used appropriately.
This technique is described in this article by Po-Chun Chiu.
Run New Tests Multiple Times
When writing new tests, it's a good practice to run them multiple times to ensure that they are stable and reliable. This allows you to identify and fix any flakiness before the tests are added to the test suite.
We've already discussed how you can run tests multiple times using the --repeat-each flag.
Ensure Proper Setup and Teardown
To ensure the reliability of your tests and prevent flakiness, it's crucial to establish proper setup and teardown procedures. Setting up the test environment correctly and cleaning up after each test run can greatly contribute to the stability of your automated QA testing using Playwright.
When setting up your test environment, make sure to initialize the necessary dependencies, configure the browser context, and set up any required test data. This ensures that your tests start from a known state, reducing the chances of flaky behavior due to inconsistent initial conditions.
Here's an example of how you can perform the setup using the @playwright/test library in TypeScript:
import { test, expect } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Perform setup steps here
  await page.goto('https://ray.run/');
  // Other setup actions...
});

test('Your test name', async ({ page }) => {
  // Your test code here
  await page.click('text="Sign In"');
  // Assertion and further test steps...
});

// More tests...

test.afterEach(async ({ page }) => {
  // Perform teardown steps here
  // Clean up test data, reset the environment, etc.
});
By using the beforeEach and afterEach hooks provided by @playwright/test, you can ensure that the necessary setup and teardown actions are performed before and after each test. This way, you start each test with a clean slate and leave the environment in a consistent state for subsequent tests.
Anti-patterns
Using page.waitForTimeout
Don't use page.waitForTimeout. Tests that wait for a fixed amount of time are inherently flaky. Use Locator actions and web-first assertions that wait automatically.
In thousands of tests I've written, I've never had to use page.waitForTimeout. If you find yourself using it, you're probably doing something wrong. Every time I see page.waitForTimeout in a test, I immediately know that the test is flaky and/or that it is masking a bug in the application.
Besides being flaky, page.waitForTimeout also makes your tests slower. It's a bad practice to wait for arbitrary timeouts in tests. Instead, you should wait for the application to be in the right state. For example, instead of waiting for 1 second, wait for an element to appear on the page.
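As a sketch of the difference (the selector and expected text are illustrative):

import { test, expect } from '@playwright/test';

test('wait for state, not for time', async ({ page }) => {
  await page.goto('https://ray.run/');

  // Flaky: hopes one second is always enough.
  // await page.waitForTimeout(1000);

  // Reliable: the web-first assertion retries until the element shows the text.
  await expect(page.getByTestId('status')).toHaveText('Ready');
});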
Conclusion: Embrace the Challenge of Flaky Tests
Flaky tests in Playwright, like in any other testing framework, are an inevitable part of the testing process. However, understanding their nature, causes, and impact can guide you in devising effective strategies to identify, handle, and prevent them.
Dealing with flaky tests is not just about fixing tests that fail inconsistently. It's about improving the reliability and stability of your test suite. It's about maintaining the trust in your testing process and ultimately, delivering a robust, high-quality product.
Remember, while flaky tests might be frustrating, they're not invincible. By being proactive, you can turn these challenges into opportunities for improvement.
In your journey of mastering Playwright, treating flaky tests not as nuisances, but as integral parts of the process will bring you one step closer to achieving a more efficient and reliable testing system. Keep testing, keep improving, and let's continue to build better software, one test at a time.