correct-apricot 17mo ago

How to scrape different things per page

I'm trying to wrap my head around how to architect my use case. Essentially, I have an array of different domains:
[ 'acme.com', 'foo.com', 'bar.com', 'helloworld.org' ]
I want to look for different things as I guide my crawler through each domain. For example:
1. On the home page/root, I want to find any links that look similar to /pricing, /security, /careers, and /blog.
2. I then want to perform a different task on each of these potential pages. For example:
a. On the pricing page, pass the innerHTML to ChatGPT to classify their pricing model
b. On the security page, search for the word "SOC2"
c. On the careers page, queue up to 100 links and further process the individual job postings
d. On the blog page, count the number of articles
I'm not looking for help with a - d specifically. I'd rather understand the best practices for structuring "context aware" tasks that run on different pages.
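Step 1 above could be sketched roughly as follows: take the links found on a home page and tag the ones that look like pricing, security, careers, or blog pages with a label. This is only an illustration. The names `PageLabel`, `LABEL_PATTERNS`, and `labelHomePageLinks` are hypothetical, and the path patterns are assumptions that would need tuning for real sites.

```typescript
// Hypothetical labels for the four page types mentioned in the post.
type PageLabel = "PRICING" | "SECURITY" | "CAREERS" | "BLOG";

// Assumed path patterns; real sites may use other paths (e.g. /plans, /jobs).
const LABEL_PATTERNS: Array<[PageLabel, RegExp]> = [
  ["PRICING", /^\/pricing(\/|$)/i],
  ["SECURITY", /^\/security(\/|$)/i],
  ["CAREERS", /^\/careers(\/|$)/i],
  ["BLOG", /^\/blog(\/|$)/i],
];

interface LabeledRequest {
  url: string;
  label: PageLabel;
}

// Resolve an href against the page URL; return null if it cannot be parsed.
function tryParseUrl(href: string, base: URL): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Keep same-host links whose path matches a pattern, without duplicates.
function labelHomePageLinks(baseUrl: string, hrefs: string[]): LabeledRequest[] {
  const base = new URL(baseUrl);
  const seen = new Set<string>();
  const labeled: LabeledRequest[] = [];
  for (const href of hrefs) {
    const url = tryParseUrl(href, base);
    if (url === null || url.hostname !== base.hostname) continue;
    const match = LABEL_PATTERNS.find(([, pattern]) => pattern.test(url.pathname));
    if (match === undefined || seen.has(url.href)) continue;
    seen.add(url.href);
    labeled.push({ url: url.href, label: match[0] });
  }
  return labeled;
}
```

Each labeled request can then be queued, and the label decides which task runs when that page is fetched.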
1 Reply
correct-apricot (OP) 17mo ago
As is typical, soon after posting I stumbled across the more advanced approach of routes/labels. I'll leave this post here in case someone finds it helpful. (And perhaps so someone can confirm that's the right path.)
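For anyone landing here later, the routes/labels idea can be sketched without any library: every queued request carries a label, and a router picks the handler registered for that label. `LabelRouter`, `PageContext`, and `findings` below are hypothetical names used only for illustration, not a real framework's API. Handlers are kept synchronous to keep the sketch short, whereas in a real crawler they would usually be async.

```typescript
// What a handler receives for a fetched page (hypothetical shape).
interface PageContext {
  url: string;
  label?: string;
  html: string;
}

type Handler = (ctx: PageContext) => void;

// Minimal label-based router: one handler per label, plus a default.
class LabelRouter {
  private handlers = new Map<string, Handler>();
  private fallback: Handler = () => {};

  addHandler(label: string, handler: Handler): void {
    this.handlers.set(label, handler);
  }

  addDefaultHandler(handler: Handler): void {
    this.fallback = handler;
  }

  // Pages without a known label (e.g. the home page) go to the default handler.
  route(ctx: PageContext): void {
    const handler =
      ctx.label !== undefined
        ? this.handlers.get(ctx.label) ?? this.fallback
        : this.fallback;
    handler(ctx);
  }
}

// Collected results; a real crawler would write to a dataset instead.
const findings: Record<string, unknown> = {};

const router = new LabelRouter();

// Home page: this is where links would be discovered and queued with labels.
router.addDefaultHandler((ctx) => {
  findings.home = ctx.url;
});

// Security page: look for a SOC2 / SOC 2 mention.
router.addHandler("SECURITY", (ctx) => {
  findings.mentionsSoc2 = /SOC\s?2/i.test(ctx.html);
});

// Blog page: count <article> tags as a rough proxy for the number of posts.
router.addHandler("BLOG", (ctx) => {
  findings.articleCount = (ctx.html.match(/<article\b/gi) ?? []).length;
});
```

The pricing and careers tasks would be two more `addHandler` calls, so each page type keeps its own logic and the home-page handler only has to decide which label a link gets.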
