exotic-emerald
exotic-emerald•2y ago

long running scraper, 500+ pages for each crawl

Hello, I have a playwright crawler that is listening to db changes, whenever a page gets added I want it to scrape and enqueue 500 links for the whole scraping process, but, there can be multiple things added to the DB at the same time. I've tried keepAlive and the maxRequests thing is hard to manage if we just keep adding urls to the same crawler. My question is: what's the best way to create a playwright crawler that will automatically handle the processing of 500 pages for each start?
14 Replies
Alexey Udovydchenko
Alexey Udovydchenko•2y ago
The Apify approach is to do multiple runs, so running Crawlee in the Apify cloud is considered the best way so far by many people 😉
exotic-emerald
exotic-emeraldOP•2y ago
@Alexey Udovydchenko hahha, good upsell, but I don't want to use that 😛 Any other suggestions? cc @Pepa J I was thinking of overriding the isFinished function
Alexey Udovydchenko
Alexey Udovydchenko•2y ago
You can use it partially, e.g. by running Node.js processes on your server with a named request queue and dataset in the cloud, or by dockerizing and running instances entirely on your own host. I think the approach will be the same in any case: you need an environment for multiple runs.
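Roughly something like this, as a sketch of one isolated run per DB row with its own named queue and dataset (the names and handler body here are made up, not your code):

import { Dataset, PlaywrightCrawler, RequestQueue } from 'crawlee'

// One isolated run per DB row: a named queue and dataset keep parallel runs apart.
export const runForPlace = async (placeId: string, startUrls: string[]) => {
  const requestQueue = await RequestQueue.open(`queue-${placeId}`)
  const dataset = await Dataset.open(`results-${placeId}`)

  const crawler = new PlaywrightCrawler({
    requestQueue,
    maxRequestsPerCrawl: 500, // the limit now applies to this run only
    requestHandler: async ({ page, enqueueLinks }) => {
      await dataset.pushData({ url: page.url(), title: await page.title() })
      await enqueueLinks()
    },
  })

  await crawler.run(startUrls)
}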
exotic-emerald
exotic-emeraldOP•2y ago
@Alexey Udovydchenko is there no way to have 1 crawler handle multiple request queues?
Lukas Krivka
Lukas Krivka•2y ago
But what is the problem you are solving? If it is processing speed, then you need a bigger server or more servers (Apify, AWS, etc.). Otherwise, I don't see a problem with keepAlive: true; you can also use the forefront option of the request queue to prioritize some requests over others.
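A minimal sketch of the keepAlive approach, assuming the DB listener keeps feeding one long-lived crawler (onNewDbRow is a hypothetical hook, not an existing API):

import { PlaywrightCrawler } from 'crawlee'

const crawler = new PlaywrightCrawler({
  keepAlive: true, // don't shut down when the queue is momentarily empty
  requestHandler: async ({ page, enqueueLinks }) => {
    // ... scrape the page and enqueue further links ...
  },
})

const crawlerPromise = crawler.run() // resolves only after crawler.teardown() is called

// Hypothetical hook called by the DB-change listener:
export const onNewDbRow = async (urls: string[]) => {
  await crawler.addRequests(
    urls.map((url) => ({ url })),
    { forefront: true }, // prioritize these over already-queued requests
  )
}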
exotic-emerald
exotic-emeraldOP•2y ago
@Lukas Krivka thanks for commenting. I basically have addRequests hooked up to a DB trigger, so whenever new rows come in, we add them to the request queue, but each database row will need to do 500 crawls.
exotic-emerald
exotic-emeraldOP•2y ago
So I need to control maxRequestsPerCrawl per database row, i.e. either have a crawler per row, or somehow have a requestQueue per DB id, but that means the crawler would need to manage multiple queues. Hope that makes sense, thanks for the help.
Lukas Krivka
Lukas Krivka•2y ago
You can just track that in an arbitrary state object using useState, e.g.:
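Something like this sketch (the 500 limit and the userData shape are taken from what you described; the rest is made up):

import { PlaywrightCrawler } from 'crawlee'

const crawler = new PlaywrightCrawler({
  keepAlive: true,
  requestHandler: async ({ request, enqueueLinks, crawler }) => {
    // Shared, persisted state object: one page counter per DB row / place.
    const state = await crawler.useState<{ pagesPerPlace: Record<string, number> }>({
      pagesPerPlace: {},
    })
    const placeId = request.userData.place.id as string

    state.pagesPerPlace[placeId] = (state.pagesPerPlace[placeId] ?? 0) + 1

    // Stop expanding this place's crawl once its own 500-page budget is used up.
    if (state.pagesPerPlace[placeId] < 500) {
      await enqueueLinks({
        userData: request.userData, // carry the place id over to child requests
      })
    }
  },
})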
exotic-emerald
exotic-emeraldOP•2y ago
@Lukas Krivka I'm getting so many of these: ERROR PlaywrightCrawler: Request failed and reached maximum retries. elementHandle.textContent: Target page, context or browser has been closed. Any way to debug this? Or can I maybe ask one of you to look at my code and give me some advice? Willing to pay $.
Lukas Krivka
Lukas Krivka•2y ago
Either you are running out of memory and the page crashed, or you don't await some code and the page was already closed when you tried to get the text.
exotic-emerald
exotic-emeraldOP•2y ago
I imagine it has to do with the hackiness of how I'm starting the Playwright crawler:
export const createCrawler = async (place_id: string, pdf_crawler: BasicCrawler) => {
  const request_queue = await RequestQueue.open(place_id)
  return new PlaywrightCrawler({
Then, to start it, I do:
const pdf_crawler = await createPDFCrawler(place.id)
const web_crawler = await createWebCrawler(place.id, pdf_crawler)

console.log('Starting crawlers', place.name, place.id)
const requests = queuedRunDocument['urls'].map((url) => {
  return new CrawleeRequest({
    url: url,
    userData: {
      place: { name: place.name, id: place.id, url },
    },
  })
})

await web_crawler.addRequests(requests)
const web_promise = new Promise((resolve, reject) => {
  web_crawler
    .run()
    .then(() => {
      console.log('web crawler finished', place.id)
      resolve(true)
    })
    .catch((e) => {
      console.log('web crawler error', e)
      reject(e)
    })
})
I get these errors: Error: Object with guid handle@dc8fe92256cc3997e03d3b2bf1e26da6 was not bound in the connection elementHandle.evaluate: Target page, context or browser has been closed. Probably using Apify will solve my problems, but I'm scared it will get expensive. Do you have an example of this?
exotic-emerald
exotic-emeraldOP•2y ago
GitHub: [BUG] Playwright-Java: Getting 'browser.newContext: Target page, context or browser has been closed'
Lukas Krivka
Lukas Krivka•2y ago
Yeah, your code feels like it has some unhandled promises; you're probably missing an await somewhere.
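For illustration, this is the kind of bug that produces that error (the selector and handler body are just examples, not your code):

import { Dataset, PlaywrightCrawler } from 'crawlee'

const crawler = new PlaywrightCrawler({
  requestHandler: async ({ page }) => {
    // Buggy: the promise isn't awaited, so the handler returns and Crawlee may
    // close the page before textContent() resolves -> "Target page, context or
    // browser has been closed".
    // page.textContent('h1').then((title) => Dataset.pushData({ title }))

    // Correct: await every Playwright call before the handler returns.
    const title = await page.textContent('h1')
    await Dataset.pushData({ title, url: page.url() })
  },
})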
