Hi, I know this is a really basic question, but I'm in a bit of a hurry.
I'm trying to scrape some data from an API that otherwise would block me and forward me to a robots page. What I'm aiming to do is open a landing page in the browser, perform some basic actions to soak up the required cookies etc., and then perform the scraping programmatically.
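For reference, the flow described above (open a landing page, let the site set its cookies, then reuse them for API calls) could be sketched with Playwright's Python API roughly as follows. The URLs, the token, and the `api_headers` helper are hypothetical placeholders, not anything confirmed in this thread:

```python
def api_headers(token, user_agent):
    """Headers the API expects, as copied from the browser's network panel (hypothetical)."""
    return {"Authorization": f"Bearer {token}", "User-Agent": user_agent}

def scrape(landing_url, api_url, token):
    # Import here so the helper above stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()

        # 1. Open the landing page so the site sets its cookies
        #    (perform any extra clicks/interactions here as needed).
        page.goto(landing_url)
        page.wait_for_load_state("networkidle")

        # 2. context.request shares the browser context's cookie jar,
        #    so this call goes out with the cookies picked up in step 1.
        response = context.request.get(
            api_url,
            headers=api_headers(token, page.evaluate("navigator.userAgent")),
        )
        browser.close()
        return response.json()

if __name__ == "__main__":
    print(scrape("https://example.com/landing",       # hypothetical URLs
                 "https://example.com/api/items",
                 "<bearer-token-from-network-panel>"))
```

The key design point is that `context.request` reuses the cookies of the browser context it belongs to, so the API call does not need the cookies copied over by hand.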
It seems there are a few different ways to do this, and as I'm not much of a web developer and don't normally use JS, it's taking a bit of time to navigate all the documentation and find the simplest way to achieve this.
Could someone point me towards a minimal example of the best way to approach what I'm trying to do please?
From what I can see in the network inspector, it seems I need to be sending cookies as well as a bearer token and user-agent headers to the APIs I need to call. If any approach also makes it easy for me to see the request that went out on the wire, that would be a big plus too - it seems that e.g. context.request.get() doesn't have a straightforward way for me to inspect the full request before it is sent(?), and only lets me interact with the response. If I can compare the request to what I see from the website itself in the network panel of my browser, then that will be really helpful.
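On the inspection point: for requests that the page itself makes, Playwright can report the method, URL, and full headers (including ones the browser adds, like Cookie) via a `request` event listener, which gives something concrete to diff against the browser's network panel. A rough sketch using the sync Python API - `format_request` is a made-up helper and the URL is a placeholder:

```python
def format_request(method, url, headers):
    """Render a request roughly the way it would appear on the wire."""
    lines = [f"{method} {url}"]
    lines += [f"{name}: {value}" for name, value in sorted(headers.items())]
    return "\n".join(lines)

def watch(landing_url):
    # Import here so format_request stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # all_headers() includes headers the browser adds (Cookie, User-Agent, ...)
        page.on(
            "request",
            lambda req: print(format_request(req.method, req.url, req.all_headers())),
        )
        page.goto(landing_url)
        browser.close()

if __name__ == "__main__":
    watch("https://example.com/landing")  # hypothetical URL
```

Note this only covers browser-initiated traffic; calls made through `context.request` bypass the page and don't fire these events, so for those the surest way to see the wire-level request is an external proxy (e.g. mitmproxy).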
Thanks!