4 replies

Caching requests for development and testing

Hi,

I'm wondering what people are doing (if anything) to record and replay requests while building scrapers. A lot of building scrapers is trial and error, making sure you have the right selectors, json paths, etc, so I end up running my code a fair few times. I'd ideally cache the initial request to each endpoint and replay it when it's requested again, just for development, so I'm not continually hitting the website (both for politeness, and also to reduce the chances of triggering any antibot provisions).

Thinking back to my ruby days there was a package called VCR which would do this if you instantiated it before HTTP requests, with ways to invalidate the cache. In JS there's netflix's polly which I'm going to try out shortly, but I'm interested to hear what other people are doing/using, if anything.

I'm using a mix of crawlers (Basic, Cheerio, Playwright), so looking for something flexible.

Cheers!

Caching requests for development and testing

Similar Threads