rare-sapphire•13mo ago
Scraping Government Websites
I'm trying to scrape a business entity search like this (https://ccfs.sos.wa.gov/#/BusinessSearch) but I cannot get any results. To be fair - I'm definitely more of a novice. But from some online forum reading, I'm not sure if this is even possible.
Specifically - I want to get the PDFs from the filing history on each individual business entity page.
Can I do this with the Website Content Crawler or Web Scraper or am I in over my head?
2 Replies
Hello! We offer some free scraping courses on the Apify Academy if you are interested in learning the basics: https://docs.apify.com/academy.
It seems to me that the website you are trying to scrape requires you to log in, and the scraper has to be told how to do that; there is a page on the Academy about it: https://docs.apify.com/academy/puppeteer-playwright/common-use-cases/logging-into-a-website.
Unfortunately, the actors you mentioned cannot perform a login. Web Scraper can perform some JavaScript operations on the page, but I think the best solution would be to write an actor from scratch for your website.
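To give a rough idea of what one step of a custom scraper could look like, here is a minimal sketch in Python that pulls PDF links out of a page's HTML once the page has already been loaded and rendered (for example by a headless browser after logging in). It uses only the standard library. The names `PdfLinkParser`, `extract_pdf_links`, and `SAMPLE_HTML` are made up for illustration, and the sample markup is an assumption, not the real structure of the filing history page, so the matching rule would likely need adjusting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects the targets of <a> tags whose path ends in .pdf (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore the query string when checking the extension, and match case-insensitively.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in the given HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Assumed sample markup; the real filing history page may look quite different.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/docs/annual-report-2023.pdf">Annual Report</a></td></tr>
  <tr><td><a href="/docs/view?id=7">Details</a></td></tr>
  <tr><td><a href="filings/amendment.PDF?download=1">Amendment</a></td></tr>
</table>
"""

if __name__ == "__main__":
    for link in extract_pdf_links(SAMPLE_HTML, "https://example.com/entity/"):
        print(link)
```

If the documents are served through a download button or an API call instead of plain links, this approach would not find them, and the scraper would have to click the button or watch the network requests instead.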
rare-sapphireOP•13mo ago
thanks Marco this is super helpful! ive always been scared of writing my first actor...frankly bc Apify is so cool that once I get good at it I'll start writing enough actors to quit my day job 😅