I'm currently facing an issue with our extensive suite of e2e tests. As our test times are increasing, running all tests during PR validation is becoming inefficient.
Current Setup: Whenever a PR is made, all tests run in the PR validation pipeline.
What I'm Trying: I've been working on a script that:
- Retrieves the list of file and function changes in the PR.
- Takes the names of all tests (npx playwright test --list --reporter=json) and asks OpenAI's GPT-4 to determine which tests should be run as a smoke test during PR validation.
Has anyone here experimented with OpenAI or a similar approach to optimize which tests to run? I'd love to hear about any experiences or insights on this.
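A rough sketch of the two script steps described above, in case it helps make the idea concrete. It assumes the JSON from the --list run has nested suites that each carry a specs array with title and file fields (only those fields are modeled here), and buildPrompt is a hypothetical helper with made-up prompt wording. The actual call to the model is deliberately left out.

```typescript
// Minimal, assumed shape of the JSON list output; only the fields used here.
interface Spec { title: string; file: string }
interface Suite { title: string; specs?: Spec[]; suites?: Suite[] }
interface ListReport { suites: Suite[] }

// Recursively walk nested suites and collect "file > title" for each spec.
function collectTests(suites: Suite[]): string[] {
  const out: string[] = [];
  for (const suite of suites) {
    for (const spec of suite.specs ?? []) {
      out.push(`${spec.file} > ${spec.title}`);
    }
    out.push(...collectTests(suite.suites ?? []));
  }
  return out;
}

// Hypothetical helper: assemble the text that would be sent to the model.
// The wording of the prompt is an assumption, not something from the thread.
function buildPrompt(changedFiles: string[], tests: string[]): string {
  return [
    "Changed files in this PR:",
    ...changedFiles.map((f) => `- ${f}`),
    "Available tests:",
    ...tests.map((t) => `- ${t}`),
    "Reply with the tests to run as a smoke test, one per line.",
  ].join("\n");
}
```

In a real script the report would come from parsing the stdout of the list command with JSON.parse, and the changed files from something like git diff --name-only against the target branch.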
I have only experimented with finding newly added tests using git diff and grep, something like:
git diff -U0 HEAD~1..HEAD '*.test.ts' | ggrep -P '^\+\s*test\(.*'
(this might not work in your case; ggrep is GNU grep as installed by Homebrew on macOS). Finding modified tests is too complex, especially if you're using fixtures and you modified some fixture. Even an AI would basically need access to the whole codebase to deduce which tests were modified.
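The same idea as the grep above can be sketched in TypeScript if you would rather keep it inside a Node script. This is a minimal sketch: addedTestTitles is a hypothetical name, it takes the diff text as a string, and like the grep it only sees lines that start a plain test( call, so test.only, test.describe, and titles containing an escaped quote are not handled.

```typescript
// Return the titles of tests whose `test(` line was added in a unified diff.
function addedTestTitles(diff: string): string[] {
  const titles: string[] = [];
  // "+" marks an added line; the "+++ b/file" header cannot match because
  // only whitespace is allowed between the first "+" and "test(".
  const pattern = /^\+\s*test\(\s*(['"`])(.*?)\1/;
  for (const line of diff.split("\n")) {
    const match = pattern.exec(line);
    if (match) {
      titles.push(match[2]);
    }
  }
  return titles;
}
```

The returned titles could then be passed to the test runner's title filter so that only the new tests run.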
Thanks for the response. My idea was to make a qualified guess based on the names of the files and functions that were changed. This would help in determining, based on the names of our tests, which tests to run on the PR.
When I talk about changes, I'm not referring to modifications in the test code, but rather changes in the source code of our apps, which the tests are associated with.
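To make the "guess from names" idea concrete, here is a deterministic stand-in for the AI step: it picks tests whose titles share a word with the changed source file paths. Everything in it (the function names, the ignored-word list, the minimum word length) is an assumption for illustration, and it will miss any test whose title does not mention the changed area.

```typescript
// Words too generic to be useful as a signal (assumed list, tune for your repo).
const IGNORED = new Set(["src", "test", "tests", "spec", "index", "the", "and", "with"]);

// Split a path or title into lowercase word tokens,
// e.g. "src/checkout/cartService.ts" -> ["checkout", "cart", "service"].
function tokens(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split camelCase boundaries
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 2 && !IGNORED.has(t));
}

// Keep every test title that shares at least one token with a changed file path.
function selectTests(changedFiles: string[], testTitles: string[]): string[] {
  const changed = new Set<string>();
  for (const file of changedFiles) {
    for (const t of tokens(file)) {
      changed.add(t);
    }
  }
  return testTitles.filter((title) => tokens(title).some((t) => changed.has(t)));
}
```

A baseline like this could also serve as a comparison point when judging whether the model's picks are any better than plain keyword overlap.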
Usually when creating a test run for people to run, we have the same problem. It is always a cost/benefit (risk/reward) calculation: if you decide not to run a test case and it turns out to have been important, you guessed wrong and you learn from it (it becomes part of the required test cases).
Asking AI to make the guesses for you is interesting, but it would be difficult to evaluate the efficacy of its guesses, and you would need a feedback mechanism to help it learn when it got one wrong. That requires humans to help train the model. And the model you are creating is highly influenced by the person creating it (what they think is important can differ wildly between people).
Just my 2 cents.
We've experimented with the Playwright testing service, but unfortunately it's proving too costly for us at the moment. It's evident that we need to optimize our test speed. Currently, we're using 5 agents in Azure DevOps pipelines (with workers set to 1) and running a total of 150 tests. Although all the tests can run in parallel, doing so extends the duration significantly because of the time taken for data setup and other preparatory tasks. Perhaps, to speed things up, we should consider reducing some of the isolation between tests.