How to scrape different things per page

I'm trying to wrap my head around how to architect my use case. Essentially, I have an array of different domains:

[ 'acme.com', 'foo.com', 'bar.com', 'helloworld.org' ]

I want to look for different things as I guide my crawler through the domain. For example:

1. On the home page/root, I want to find any links that look similar to /pricing, /security, /careers, and /blog.
2. I then want to perform a different task on each of these potential pages. For example:
a. On the pricing page, pass the innerHTML to ChatGPT to classify their pricing model
b. On the security page, search for the word "SOC2"
c. On the careers page, queue up to 100 links, and further process the individual job postings
d. On the blog page, count the number of articles
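
To make step 1 concrete, here is a minimal sketch of how the home page links might be sorted into labels. It uses only the Python standard library; the names LABELS, LinkCollector, and label_links are hypothetical, and the matching rule (the keyword appears anywhere in the path) is an assumption about what "look similar to" means.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical label names; the path keywords come from the list above.
LABELS = ("pricing", "security", "careers", "blog")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def label_links(base_url, html):
    """Map each label to the first same-site link whose path mentions it."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    found = {}
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        if parsed.netloc != base_host:
            continue  # skip off-site links
        path = parsed.path.lower()
        for label in LABELS:
            # Assumption: a loose substring match is "similar enough".
            if label in path and label not in found:
                found[label] = url
    return found


home = (
    '<a href="/pricing">Pricing</a> '
    '<a href="/about">About</a> '
    '<a href="/company/careers/">Jobs</a>'
)
links = label_links("https://acme.com/", home)
```

Here links would hold one URL for pricing and one for careers, each tagged with the label that decides what happens on that page later.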

I'm not looking for help with a - d specifically, but rather for help understanding best practices for structuring "context aware" tasks on different pages.
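
For illustration, the kind of structure I have in mind is something like the sketch below: every queued URL carries a label, and a small registry maps each label to its own handler. This is only a sketch under assumptions, not tied to any particular crawling library. The names handles, HANDLERS, process, and PAGES are hypothetical, the fetch function is faked with a dictionary, and link extraction is a naive regex.

```python
import re
from collections import deque

MAX_JOB_LINKS = 100  # cap from item c above
HANDLERS = {}        # label -> handler function


def handles(label):
    """Register the decorated function as the handler for one page label."""
    def register(fn):
        HANDLERS[label] = fn
        return fn
    return register


@handles("security")
def security_page(url, html, queue):
    # Item b: search the page for the word "SOC2".
    return {"mentions_soc2": "soc2" in html.lower()}


@handles("careers")
def careers_page(url, html, queue):
    # Item c: queue up to 100 links for further processing as job postings.
    hrefs = re.findall(r'href="([^"]+)"', html)[:MAX_JOB_LINKS]
    for href in hrefs:
        queue.append((href, "job"))
    return {"jobs_queued": len(hrefs)}


@handles("job")
def job_page(url, html, queue):
    # Placeholder for per-posting processing.
    return {"job_url": url}


def process(queue, fetch):
    """Pop (url, label) pairs and dispatch each to the handler for its label."""
    results = []
    while queue:
        url, label = queue.popleft()
        handler = HANDLERS.get(label)
        if handler is None:
            continue  # no task defined for this kind of page
        results.append((label, handler(url, fetch(url), queue)))
    return results


# Fake pages standing in for real HTTP responses (assumption for the demo).
PAGES = {
    "https://acme.com/security": "<p>We are SOC2 Type II certified.</p>",
    "https://acme.com/careers": '<a href="https://acme.com/jobs/1">Engineer</a>',
    "https://acme.com/jobs/1": "<h1>Engineer</h1>",
}
queue = deque([
    ("https://acme.com/security", "security"),
    ("https://acme.com/careers", "careers"),
])
results = process(queue, PAGES.get)
```

The point of the design is that the label travels with the URL, so each handler only needs to know about its own kind of page, and a handler can enqueue new URLs under a different label (careers to job) without the main loop changing.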
