awake-maroon · Apify & Crawlee · 4y ago

Share cache between multiple Crawlee instances

I am using Crawlee with Playwright (Chromium) to scrape product information from various retailers. For some of the information I need to extract, I have to run a headless browser so that I can interact with the page.

I noticed that for one of my targets there is a lot of network traffic for static resources (JS, JSON, CSS) that are identical on every product page. When I scrape a long list of products in one session, these resources are cached after the first load, so their share of the total transferred data is small. If, on the other hand, each session scrapes only a few pages on the target, all of these resources have to be downloaded again, because the cache starts out empty for every Playwright session/context.
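To make this trade-off concrete, here is a small TypeScript model of the bytes transferred. It is only a sketch under simplifying assumptions: every session starts with an empty cache, the shared resources stay cached for the rest of a session after its first page, and the names (`TransferModel`, `bytesWithColdCache`, `bytesWithSharedCache`) and sizes are hypothetical illustration values, not measurements from any retailer.

```typescript
// Hypothetical model of bytes transferred while scraping product pages.
// All sizes below are made-up illustration values, not measurements.
interface TransferModel {
  sharedBytes: number;  // JS/JSON/CSS identical on every product page
  perPageBytes: number; // HTML/data unique to each product page
}

// Every session starts with an empty cache, so the shared resources are
// downloaded once per session (assumes they stay cached for the rest of it).
function bytesWithColdCache(
  m: TransferModel,
  totalPages: number,
  pagesPerSession: number,
): number {
  if (totalPages === 0) return 0;
  const sessions = Math.ceil(totalPages / pagesPerSession);
  return sessions * m.sharedBytes + totalPages * m.perPageBytes;
}

// If one cache were shared by all sessions/runs, the shared resources
// would be downloaded only once in total.
function bytesWithSharedCache(m: TransferModel, totalPages: number): number {
  if (totalPages === 0) return 0;
  return m.sharedBytes + totalPages * m.perPageBytes;
}

const model: TransferModel = { sharedBytes: 2_000_000, perPageBytes: 200_000 };

console.log(bytesWithColdCache(model, 100, 100)); // one long session: 22000000
console.log(bytesWithColdCache(model, 100, 2));   // 50 short sessions: 120000000
console.log(bytesWithSharedCache(model, 100));    // shared cache: 22000000
```

With these made-up numbers, the shared resources account for about 9% of the traffic when 100 pages are scraped in one session, but about 83% when the same 100 pages are spread over 50 two-page sessions. A cache shared between sessions would bring the short-session case back down to the single-session total.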

Does anyone have an idea how I could reuse the same cache in Playwright/Crawlee across two or more runs of my script?