scrapy-playwright help

albionatopqueue posted in #help-playwright

Hello,

I have an issue getting the div from a certain site and parsing it into a dict. The code for the spider is as follows:

import scrapy
from scrapy_playwright.page import PageMethod


class sitespider(scrapy.Spider):
    name = "sitespider"
    allowed_domains = ["url"]
    start_urls = ["url/results?search_query=scrapy"]

    def start_requests(self):
        # Render the page with Playwright and wait for the result container.
        yield scrapy.Request(
            url=self.start_urls[0],
            meta={
                "playwright": True,
                "playwright_include_page": True,
                "playwright_page_methods": [
                    PageMethod("wait_for_selector", "div#dismissible.style-scope.ytd-video-renderer", timeout=6000),
                ],
            },
            errback=self.errback,  # errback method not shown in the post
        )

    async def parse(self, response):
        # Yield the raw HTML of every matching div.
        yield {
            "url": response.css("div#dismissible.style-scope.ytd-video-renderer").getall(),
        }

powershell:
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-12-25 12:44:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
[scrapy.core.engine] DEBUG: Crawled (200) <GET
[scrapy.core.scraper] DEBUG: Scraped from <200 https://url/results?search_query=scrapy>
{'url': []}

The spider runs fine, but my output is empty. I have tested the selector, and it works correctly. If a full solution is hard to give with this much information, some general advice on the structure of the spider would be greatly appreciated. I believe using PageMethod in this way should not be causing issues? The only other thing I can think of is that something went wrong with my Docker usage, but then I'd expect the file not to run at all.

As you can probably tell, I don't have a great amount of experience. Thank you.

This thread is trying to answer the question: "Why is the output of the scrapy-playwright spider empty despite the spider running fine and the selector being correct?"
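One thing that may be worth checking, since the post does not show any project settings: scrapy-playwright's README lists a download handler and an asyncio reactor as required settings. If they are missing, the "playwright" meta key is likely ignored, the page is fetched as plain HTTP, and a selector that depends on JavaScript-rendered markup would match nothing even though the request returns 200. The following is a minimal sketch of those settings, assuming they were not configured; the name PLAYWRIGHT_SETTINGS is hypothetical and only used for illustration.

```python
# Hypothetical container name; the keys and values are the settings that
# scrapy-playwright's README describes as required.
PLAYWRIGHT_SETTINGS = {
    # Route http/https downloads through the Playwright download handler.
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright needs the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
}
```

These could go in the project's settings.py as top-level names, or be assigned to the spider with custom_settings = PLAYWRIGHT_SETTINGS. This is only one possible cause; if the settings are already in place, the empty result has to come from somewhere else.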
