Rayrun

Visual comparison with fixed file paths

I'm using Playwright to validate some custom controls that draw on a canvas. I have expected results prepared, and I want to validate the actual drawing against a reference file residing at some fixed path.

This is the code for my test:

test('[redacted] contains actual MR render', async ({page}) => {
    // given
    await page.goto('http://127.0.0.1:8080');

    // when
    const mrCanvas = page.locator('#mr > canvas');

    // then
    await expect(await mrCanvas.screenshot()).toMatchSnapshot('pacs-mr.png');
});

The thing is that snapshot matching follows a fixed naming pattern (tests/test_name.spec.js-snapshots/pacs-mr-win32.png). This is problematic: it causes flaky tests (since the snapshots need to be stored in the source tree) and makes them dependent on the platform where the tests are run.

Is there some other way to perform visual comparison against fixed file paths?
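One escape hatch for exactly this (an assumption to verify against your Playwright version; the option landed around v1.28) is the `snapshotPathTemplate` config option, which lets you pin snapshots to a fixed, platform-independent path simply by leaving the `{platform}` token out of the template:

```javascript
// playwright.config.js — a sketch, not the only possible layout.
// Omitting {platform} from the template means 'pacs-mr.png' resolves to
// the same fixed path on every OS (cross-platform rendering differences
// will then show up as diffs, so the test environment should be pinned too).
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  snapshotPathTemplate: '{testDir}/expected/{arg}{ext}',
});
```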


21 replies
refactoreric

Hi, since the same browser can have tiny differences in rendering between platforms, and even between headed and headless mode, Playwright's path/file name patterns for the snapshot are practical.

There is a guide here. It also explains how to update the snapshots for CI using Docker: https://playwright.dev/docs/test-snapshots
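Concretely, the Docker route from that guide can be a single command that regenerates the Linux baselines inside the official Playwright image (the image tag below is an assumption; it should match your installed Playwright version):

```shell
# Regenerate snapshots inside the official Playwright image, so baselines
# always come from one stable Linux environment regardless of the host OS.
docker run --rm -v "$PWD":/work -w /work \
  mcr.microsoft.com/playwright:v1.40.0-jammy \
  npx playwright test --update-snapshots
```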

Personally I don't like this approach, since I don't have a single source of truth to validate my code against. I also think updating snapshots is a bad idea (because it overwrites my expected results). But if there is no other way, then I'll have to use it. Thanks for the tip anyway πŸ™‚

I'm not much of a fan of screenshots either. In general I don't like baseline files: change something and you have to update all of them, and for me screenshots are just another kind of baseline file. You'd think that when nothing changes they would pass, but I have screenshots that look exactly the same to a human yet get flagged as different in automation, and that's with a 10% difference allowed in the compare... I'd agree it's fine to have a couple of generic screenshots of the base application to help verify things, but much beyond that, once the app starts running, data and values on the screen change and you end up masking out most if not all of the fields. So where is the value of screenshots then...?

but get flagged as different in automation and this is with 10% difference allowed for the compare...

Either something is wrong with how the app renders content, or you're comparing screenshots from different OSes/browsers. Mine has zero pixel and pixel-ratio tolerance and never fails if the content hasn't changed.
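For reference, that kind of zero-tolerance setup can be spelled out in the config; a sketch using Playwright's documented `expect` options:

```javascript
// playwright.config.js — sketch: make toMatchSnapshot comparisons strict.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  expect: {
    toMatchSnapshot: {
      maxDiffPixels: 0,      // no differing pixels allowed at all
      maxDiffPixelRatio: 0,  // redundant given the line above, but explicit
      threshold: 0,          // no per-pixel color tolerance either
    },
  },
});
```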

Nope, we only ever capture the screenshots in the Docker image, and we only ever run Chrome... When viewing the report, if you look at the "Actual vs Expected" and swipe the bar: absolutely no difference, but reported as different...

Docker ARM or AMD always? Do you see pixel diffs in the report?

Yes, but the thing is that the output is drawn on a canvas, and due to the nature of the drawn imagery it always needs to be validatable against a reference file, so differences between browsers are not acceptable in my use case.

Tests are not dockerized yet, but the target is amd64

But maybe it's a good idea to make sure they run on Docker + e.g. Chromium (the images would have Playwright + the different browsers), so only one environment (the Linux one) would require maintenance.

You really need a stable OS image for visual regression tests.

Even then, Chromium is not fully deterministic.

Does Playwright visual report have a zoom? Because if it reports a visual diff, there is definitely a visual diff, and the default visual diff threshold in Playwright is quite high.

I work on a large monorepo; we built a custom solution to handle our scale, and use a threshold 4x smaller than Playwright's. In other words we are much more strict than Playwright, and have a rather smooth experience because we do what's necessary to stabilize the OS and most moving parts (e.g. timestamps, network requests, ...).

Seems like the article that I wrote this weekend could fit into your discussion, guys πŸ˜„

https://lost-pixel.com/blog/post/playwright-visual-regression-testing

disclaimer: I am the co-founder of Lost Pixel, and we wrote it to eliminate a lot of the problems of running custom visual tests at large scale in the company (millions of screenshots per month)

In lost-pixel you have lots of utilities to define thresholds, explore the differences, and finally to help you with maintenance. We support automatic baseline updates on a branch regex, which makes approvals seamless.

That looks real sleek, and similar to what we built, but our running costs are way smaller 🀷🏻

Wanted to follow up on what is reported as a difference; not sure it gets any simpler than this?

visual_diff_err.png

Are the masks in exactly the same position? Same for the letters of the days, or the angle brackets?

What does the diff look like?

Could you post the actual and expected separately.

That said, if that is all there is in this screenshot, it doesn't seem worth a screenshot, but an assertion that the calendar view is visible.

Agreed. If I look at the "diff", it appears exactly the same for the days "M T W T F S S" along with the "< >" for selecting the month. Looking at the "diff" in the report, I see a very light grey where the masked-out areas are, so I've always assumed those are more of a visual placeholder outlining where the masked-out areas are. One thing I did try with the images before pasting here: in Paint you can make a transparent background and then select one of them to overlay the other, and from what I can see they look exactly the same. So the only way I could see them appearing different might be that the color shading is off: one RGB value may visually look the same as another, so two values, while different, may look the same? But in the report itself I'm seeing 1 pixel off out of 106880.

The image is 320x334, which gives a total of 106880 pixels, but here is what the report says. The reported call stack is:

  • 1 pixels (ratio 0.01 of all image pixels) are different.
  • waiting 100ms before taking screenshot
  • waiting for locator('xpath=//div[@role='dialog']')
  • locator resolved to <div role="dialog" aria-labelledby=":r2f:-label" data…>…</div>
  • 1 pixels (ratio 0.01 of all image pixels) are different.

Okay, I would agree 0.01 is uber strict, but wouldn't 0.01 of 106880 pixels allow a tolerance of 1068 pixels? Given what I can see... honestly it seems like some weird floating-point bug where the math in the comparer isn't correct? Could be a bad assumption on my part, but it's the only way I can explain the false failure. A 1 pixel difference out of 106880 should be well under a ceiling of 1068... Sure looks like something in the PW calculations is off?

But in the end this highlights why screen compares are not very good...

So, that's not a bug. There really is a visual diff in your screenshot.

What you are describing as a 1px diff and so on is governed by maxDiffPixels and maxDiffPixelRatio, which are unset by default. What is set by default (not to the best value, IMO) is the threshold at 0.2, i.e. a 20% allowed difference per individual pixel. In other words, by default, Playwright considers the colors #000000 and #333333 to be the same.

I advised the Playwright team to make the mask visibility: hidden rather than background: #f0f from day one, because I knew it was going to yield false positives when the bounding box of the mask changes ever so slightly, and that seems to be the case here.

https://playwright.dev/docs/api/class-testconfig#test-config-expect

the "ratio 0.01" for a 1px diff out of 320*334 must be a Math.ceil(...) + toPrecision(2), to make sure that even a single differing pixel is reported as a difference, rather than as a 0% difference if Math.round(...) were used.

If there is even a single pixel difference, then it's not 0% different, and saying so would be far more misleading than rounding up to the second digit.
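The rounding being described here can be reproduced in a few lines; this is a sketch of the plausible arithmetic, not Playwright's actual source:

```javascript
// Sketch: round a diff ratio *up* to two decimal places, so that even a
// single differing pixel never prints as ratio 0.
function reportedDiffRatio(diffPixels, totalPixels) {
  const ratio = diffPixels / totalPixels;
  return Math.ceil(ratio * 100) / 100;
}

// 1 differing pixel in a 320x334 (106880 px) image is reported as 0.01,
// even though the exact ratio is roughly 0.0000094.
console.log(reportedDiffRatio(1, 320 * 334)); // 0.01
console.log(reportedDiffRatio(0, 320 * 334)); // 0
```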

Thanks for the insight, I will check, but I believe we have set the pixel difference to 5%. What is really confusing then is that even the default of 1% should allow for up to 1068? It would be nice if there were a monochrome compare, an on/off pixel difference... I'd think for vector/SVG-type image compares that would be perfect; raster images may not be so nice for this...

A maxDiffPixelRatio of 0.01 would indeed allow up to 1068 px to differ by more than the threshold value.

You might rather use maxDiffPixels.

And maybe hide the calendar days yourself rather than using Playwright masks πŸ˜•
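Hiding the volatile parts yourself can look like this; a sketch with hypothetical selectors and URL:

```javascript
// Sketch (hypothetical '.calendar-day' / '.month-nav' selectors and URL):
// blank volatile content with CSS before screenshotting, instead of relying
// on Playwright masks. visibility: hidden keeps the layout stable while
// removing the pixels that change between runs.
const { test, expect } = require('@playwright/test');

test('calendar renders', async ({ page }) => {
  await page.goto('http://127.0.0.1:8080');
  await page.addStyleTag({
    content: '.calendar-day, .month-nav { visibility: hidden; }',
  });
  await expect(page).toHaveScreenshot('calendar.png');
});
```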
