rare-sapphire•13mo ago

Scraping Government Websites

I'm trying to scrape a business entity search like this one (https://ccfs.sos.wa.gov/#/BusinessSearch), but I cannot get any results. To be fair, I'm definitely more of a novice, and from some online forum reading I'm not sure this is even possible. Specifically, I want to get the PDFs from the filing history on each individual business entity's page. Can I do this with Website Content Crawler or Web Scraper, or am I in over my head?
2 Replies
Marco•13mo ago
Hello! We offer some free scraping courses on the Apify Academy if you are interested in learning the basics: https://docs.apify.com/academy. It seems to me that the website you are trying to scrape requires a login, and the scraper has to be told how to log in; there is a page on the Academy about that: https://docs.apify.com/academy/puppeteer-playwright/common-use-cases/logging-into-a-website. Unfortunately, the Actors you mentioned cannot perform a login. Web Scraper can run some JavaScript on the page, but I think the best solution would be to write an Actor from scratch for your website.
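
To give a feel for the PDF part of a custom Actor, here is a minimal sketch in Python using only the standard library. It assumes you already have the rendered HTML of an entity's filing-history page (for example from Playwright's `page.content()` after logging in) and that the filings show up as ordinary `<a href="...pdf">` links. Both are assumptions, since the real site may serve documents through buttons or API calls instead. The names `PdfLinkCollector` and `extract_pdf_links` and the `example.com` URLs are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs
        self.pdf_urls = []        # unique PDF URLs, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchor without a usable href
        absolute = urljoin(self.base_url, href)
        # Check only the path, so query strings like ?dl=1 do not hide the extension.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def extract_pdf_links(html: str, base_url: str) -> List[str]:
    """Return the unique PDF links found in already-rendered HTML."""
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_urls


if __name__ == "__main__":
    # Hypothetical markup standing in for a filing-history table.
    sample = """
    <table>
      <tr><td><a href="/filings/annual-report.pdf">Annual report</a></td></tr>
      <tr><td><a href="details.html">Details</a></td></tr>
      <tr><td><a href="https://example.com/docs/AMENDMENT.PDF?dl=1">Amendment</a></td></tr>
    </table>
    """
    for url in extract_pdf_links(sample, "https://example.com/entity/123"):
        print(url)
```

Keeping the link extraction separate from the browser automation means you can test it on saved HTML first, then plug in the login and navigation steps once those work.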
rare-sapphireOP•13mo ago
Thanks Marco, this is super helpful! I've always been scared of writing my first Actor... frankly because Apify is so cool that once I get good at it, I'll start writing enough Actors to quit my day job 😅
