I want to iterate through each SEC filing listed on Yahoo Finance for a stock and download the PDF for each one. Some filings do not offer a download option, so I also need logic to handle the case where no download link is presented.
I'm thinking the algorithm would be something like:
- Open the site.
- Go to the stock's page.
- Go to the SEC filings page.
- Click on the first SEC filing. If the download option is there, download the file, then go back to the SEC filings page. If the download option isn't there, just go back to the SEC filings page.
- Click on the second SEC filing and handle it the same way.
- Click on the third SEC filing, and so on.
When it has gone through all the SEC filings, I will move the temporary downloads to permanent storage on my PC.
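The steps above could be sketched roughly as follows. This is only a starting point, not a tested scraper: `download_all_filings` is a hypothetical helper, it assumes `page` is a Playwright sync-API `Page` that is already showing the SEC filings list, and the `"li"`, `"link"`, and `"Download"` selectors are guesses that will almost certainly need narrowing to the real markup.

```python
from pathlib import Path


def download_all_filings(page, out_dir="filings"):
    """Open each filing in turn and save its PDF when a Download link exists.

    `page` is assumed to be a Playwright sync-API Page already on the
    SEC filings list. All selectors here are assumptions.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []

    # Assumption: every filing is one <li>; narrow this to the real list container.
    total = page.locator("li").count()

    for index in range(total):
        # Address each filing by position so the lookup still works
        # after the list page has been revisited.
        page.locator("li").nth(index).get_by_role("link").first.click()
        page.wait_for_load_state()

        download_link = page.get_by_role("link", name="Download")
        # count() does not wait for elements to appear; a slow page may
        # need an explicit wait before this check.
        if download_link.count() > 0:
            with page.expect_download() as download_info:
                download_link.first.click()
            download = download_info.value
            destination = target / download.suggested_filename
            download.save_as(destination)
            saved.append(destination)

        # With or without a download, return to the filings list.
        page.go_back()

    return saved
```

Checking `count()` before clicking is one way to skip filings that have no download option without waiting for a click to time out.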
Even though I have an idea of how I think it will go, I'm having a difficult time putting it into code, as I'm brand new to Playwright.
You don't have to write the code for me. I'd just appreciate pointers to reference pages on the Playwright site that will guide me in the right direction, and any best practices you think I should know about while building this.
Thank you very much.
This thread is trying to answer the question: "How can I iterate through multiple items with the same parent (each SEC filing) and download the PDF for each one using Playwright?"
from playwright.sync_api import Playwright, sync_playwright, expect


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://finance.yahoo.com/")
    page.get_by_label("Close modal").click()
    page.get_by_placeholder("Search for news, symbols or companies").click()
    page.get_by_placeholder("Search for news, symbols or companies").fill("bngo")
    page.get_by_title("Bionano Genomics, Inc.", exact=True).get_by_text("BNGO").click()
    page.get_by_role("tab", name="SEC Filings").click()
    page.locator("li").filter(has_text="8-K : Corporate Changes & Voting MattersJune 16, 2023•Corporate Changes & Voting").get_by_role("link").nth(1).click()
    with page.expect_download() as download_info:
        page.get_by_role("link", name="Download").click()
    download = download_info.value
    # Wait for the download process to complete
    print("download", download)
    print("download path", download.path())
    # Save downloaded file somewhere

    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
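For the "Save downloaded file somewhere" step in the script above, one possible approach is to copy each download out of Playwright's temporary location before the context is closed, since downloaded files are deleted when the browser context that produced them closes. The sketch below uses the `Download` object's `suggested_filename` and `save_as()`. The helpers `unique_destination` and `save_download` are hypothetical names, and the idea that two filings might share a filename is an assumption rather than something observed on the site.

```python
from pathlib import Path


def unique_destination(out_dir: Path, suggested_name: str) -> Path:
    """Return a path inside out_dir that does not clash with an existing file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    candidate = out_dir / suggested_name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    # Append _1, _2, ... until the name is free (assumes names may repeat).
    while candidate.exists():
        candidate = out_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def save_download(download, out_dir: Path) -> Path:
    """Copy a Playwright download from its temporary location into out_dir."""
    destination = unique_destination(out_dir, download.suggested_filename)
    download.save_as(destination)
    return destination
```

Inside `run`, this would be called as something like `save_download(download, Path("filings"))` right after `download = download_info.value` and before `context.close()`.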