Rayrun

How to iterate through items of same parent

ilovelamp69 posted in #help-playwright

I am looking to iterate through each SEC filing on Yahoo Finance for a stock and download the PDF for each one. Sometimes the site does not offer a download option for a particular filing, so I also need some logic to handle that case.

I'm thinking the algorithm would be something like:

- open site
- go to stock's page
- go to sec filings page
- click on first sec filing, if download option is there, download, then go back to the sec filing page. If download option isn't there, just go back to sec filing page.
- click on second sec filing, if download option is there, download, then go back to the sec filing page. If download option isn't there, just go back to sec filing page.
- click on third sec filing etc...

When it has gone through all SEC filings, I will save the temporary downloads to persistent storage on my PC.
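The steps above could be sketched roughly as follows. This is a minimal sketch, not a tested Yahoo Finance scraper: `download_filings`, `filing_urls`, and `out_dir` are hypothetical names, and it assumes the download control is a link named "Download", as in the single-filing script posted in this thread. It also swaps "click, then go back" for visiting each filing URL directly, and saves each file as soon as it is downloaded rather than at the end.

```python
from pathlib import Path
from typing import TYPE_CHECKING, Iterable, List

if TYPE_CHECKING:  # only needed for the type hint below
    from playwright.sync_api import Page


def download_filings(page: "Page", filing_urls: Iterable[str], out_dir: Path) -> List[Path]:
    """Visit each filing URL and save its PDF when a Download link exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for url in filing_urls:
        page.goto(url)
        # Assumption: the download control is a link named "Download".
        download_link = page.get_by_role("link", name="Download")
        # count() does not wait, so a link that renders late could be missed.
        if download_link.count() == 0:
            continue  # no download option for this filing; move on
        with page.expect_download() as download_info:
            download_link.first.click()
        download = download_info.value
        target = out_dir / download.suggested_filename
        download.save_as(target)  # copy the temporary file to a permanent path
        saved.append(target)
    return saved
```

Going straight to each URL avoids having to re-locate the list after every "back" navigation, at the cost of needing the filing URLs up front.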

Even though I have an idea of how I think it will go, I'm having a difficult time putting it into code as I'm brand new to Playwright.

You don't have to write the code for me. I'd just appreciate pointers to reference pages on the Playwright site that will guide me in the right direction, and maybe some best practices you think I should know about while building this.

Thank you very much.

This thread is trying to answer the question "How can I iterate through multiple items with the same parent (each SEC filing) and download the PDF for each one using Playwright?"

4 replies
ilovelamp69
from playwright.sync_api import Playwright, sync_playwright, expect


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://finance.yahoo.com/")
    page.get_by_label("Close modal").click()
    page.get_by_placeholder("Search for news, symbols or companies").click()
    page.get_by_placeholder("Search for news, symbols or companies").fill("bngo")
    page.get_by_title("Bionano Genomics, Inc.", exact=True).get_by_text("BNGO").click()
    page.get_by_role("tab", name="SEC Filings").click()
    page.locator("li").filter(has_text="8-K : Corporate Changes & Voting MattersJune 16, 2023•Corporate Changes & Voting").get_by_role("link").nth(1).click()
    with page.expect_download() as download_info:
        page.get_by_role("link", name="Download").click()
    download = download_info.value
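    print("download", download)
    # download.path() blocks until the download has finished
    print("download path", download.path())
    # Save the downloaded file to a permanent location (here: the working directory)
    download.save_as(download.suggested_filename)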

    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
ilovelamp69

This is what I have so far, just to download a single SEC filing.

ilovelamp69

I actually figured out almost everything from my initial question. What's left: how would I iterate through a bunch of items with the same parent? That is, how would I grab each child's href and iterate through them?

ilovelamp69

Each child being an SEC filing.
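One possible way to grab each child's href is to build a locator that matches all of the filing links, expand it with `Locator.all()`, and read the `href` attribute of each match. A minimal sketch under stated assumptions: `collect_hrefs` is a hypothetical helper, and the locator that actually matches the filing links on the page is something you would need to work out yourself (for example with the inspector or codegen).

```python
from typing import List
from urllib.parse import urljoin


def collect_hrefs(links, base_url: str) -> List[str]:
    """Return absolute, de-duplicated hrefs for every element a locator matches.

    `links` is expected to be a Playwright Locator matching the <a> elements.
    """
    hrefs: List[str] = []
    for link in links.all():  # one Locator per matched element
        href = link.get_attribute("href")
        if href:  # skip anchors without an href
            hrefs.append(urljoin(base_url, href))  # resolve relative links
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(hrefs))
```

A call might look like `collect_hrefs(page.get_by_role("listitem").get_by_role("link"), page.url)`, where the locator chain is only a guess at the page structure. Collecting the URLs first means the loop does not depend on element handles that go stale after navigating away from the list.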
