What is the method to locate and extract text from a specific HTML element using Playwright?


Extracting Text from HTML Elements with Playwright

Playwright's page.getByText() method locates elements such as <div> or <span> by their text content. It returns a Locator rather than the text itself; to read the text, call textContent() on that locator. You can match by substring, exact string, or regular expression. By default, a string argument is matched as a case-insensitive substring.

Here's an example:

// getByText() returns a Locator synchronously, so only textContent() needs await.
const divText = page.getByText('This is some text inside a div.');
console.log(await divText.textContent());
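
The locate-then-read pattern can be wrapped in a small helper. This is a minimal sketch, not part of Playwright: the name extractText is hypothetical, and it assumes `page` is a Playwright Page. It relies only on page.getByText() and locator.textContent(). Because locators are strict, textContent() throws if the text matches more than one element, so the text you pass should identify a single element.

```javascript
// Hypothetical helper (not part of Playwright): returns the trimmed text of the
// element located by `text`, or null if the element has no text content.
async function extractText(page, text, options = {}) {
  // getByText() returns a Locator synchronously; no await is needed here.
  const locator = page.getByText(text, options);
  // textContent() resolves to a string or null.
  const content = await locator.textContent();
  return content === null ? null : content.trim();
}

// Usage inside a Playwright script or test, where `page` already exists:
//   const text = await extractText(page, 'exact match', { exact: true });
//   console.log(text);
```

Trimming the result is a design choice for this sketch; drop the trim() call if the surrounding whitespace of the element matters to you.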

For an exact match, pass { exact: true } as an option to page.getByText(). An exact match is case-sensitive and must cover the whole string.

const exactMatch = page.getByText('exact match', { exact: true });
console.log(await exactMatch.textContent());

You can also pass a regular expression to locate an element whose text matches a pattern. Note that textContent() returns the element's full text, not just the portion that matched.

const regexMatch = page.getByText(/some [A-Za-z]+/i);
console.log(await regexMatch.textContent());

Remember, page.getByText() always normalizes whitespace, even with an exact match. It collapses multiple spaces into one, turns line breaks into spaces, and ignores leading and trailing whitespace.
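
To make the whitespace rule concrete, here is a simplified model of how string matching behaves. It is an illustration only, not Playwright's actual implementation: normalizeWhitespace and matchesText are hypothetical names, the sketch assumes the query is normalized the same way as the element text, and it does not cover regular expressions.

```javascript
// Simplified model of whitespace normalization (illustration only).
function normalizeWhitespace(text) {
  // Collapse every run of whitespace (spaces, tabs, line breaks) into a single
  // space, then drop leading and trailing whitespace.
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper modelling string matching with and without { exact: true }.
function matchesText(elementText, query, { exact = false } = {}) {
  const haystack = normalizeWhitespace(elementText);
  const needle = normalizeWhitespace(query);
  if (exact) {
    // Exact mode: case-sensitive comparison of the whole string.
    return haystack === needle;
  }
  // Default mode: case-insensitive substring comparison.
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

const raw = '  Hello \n   world  ';
console.log(normalizeWhitespace(raw));                         // Hello world
console.log(matchesText(raw, 'hello world'));                  // true
console.log(matchesText(raw, 'hello world', { exact: true })); // false (case differs)
console.log(matchesText(raw, 'Hello world', { exact: true })); // true
```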

In short, use page.getByText() to locate an element by substring, exact string, or regular expression, then call textContent() on the returned locator to read its text.


