Apify & Crawlee, 3y ago
1 reply
faint-white

Running multiple PlaywrightCrawlers at once makes them use each other's context/data, causing data leaks.

Hi folks.

I have a `PlaywrightCrawler` that takes a base URL and then uses a `requestQueue` to discover URLs and scan the entire website. It lives in a function called `parseSite`, which I call from a Redis queue managed by bullmq. The Redis job carries data about which project the page details should be saved under (such as `projectId`), which I pass as arguments to `parseSite`.

This works fine when concurrency is set to 1, but when I allow multiple jobs to be picked up at the same time, `PlaywrightCrawler` starts to use the wrong `projectId` for some of the pages. Code-wise that shouldn't be possible, since `projectId` is an argument to `parseSite` and there is no way to access another job's `projectId` from within that function, so it sounds like `PlaywrightCrawler` is mixing things up there. Is that a known issue, and what can I do to prevent it? (It's now leaking data to other teams.)
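
One possible explanation for the symptom described above, offered as a guess rather than a diagnosis: if every crawler is created without an explicit `requestQueue`, all of them may end up reading from the same default queue, so a job can dequeue URLs that another job enqueued and save them under its own `projectId`. The toy model below is not Crawlee code. `ToyStorage`, `runBoth`, `countLeaks`, and this simplified `parseSite` are hypothetical stand-ins, written only to show how a shared default queue could produce this kind of cross-project mix-up and how a per-job queue name avoids it.

```typescript
// Toy model only: this is NOT Crawlee's implementation. It mimics a storage
// layer where opening a queue without a name always returns one shared default.
type Job = { projectId: string; baseUrl: string };
type QueuedRequest = { url: string };
type SavedPage = { url: string; savedUnder: string };

class ToyStorage {
  private queues = new Map<string, QueuedRequest[]>();

  // The same name (or no name at all) always yields the same underlying queue.
  open(name = "default"): QueuedRequest[] {
    let queue = this.queues.get(name);
    if (!queue) {
      queue = [];
      this.queues.set(name, queue);
    }
    return queue;
  }
}

// Hypothetical stand-in for parseSite: seed the queue, then drain it,
// saving every dequeued URL under this job's projectId.
async function parseSite(
  storage: ToyStorage,
  job: Job,
  queueName?: string,
): Promise<SavedPage[]> {
  const queue = storage.open(queueName);
  queue.push({ url: `${job.baseUrl}/` }, { url: `${job.baseUrl}/about` });

  const saved: SavedPage[] = [];
  let next: QueuedRequest | undefined;
  while ((next = queue.shift()) !== undefined) {
    await Promise.resolve(); // yield so that concurrent jobs interleave
    saved.push({ url: next.url, savedUnder: job.projectId });
  }
  return saved;
}

// Count pages whose URL does not belong to the project they were saved under.
function countLeaks(saved: SavedPage[], jobs: Job[]): number {
  return saved.filter((page) => {
    const owner = jobs.find((job) => page.url.startsWith(job.baseUrl));
    return owner?.projectId !== page.savedUnder;
  }).length;
}

// Run two jobs concurrently, either on the shared default queue or on
// one named queue per job, and report how many pages were misattributed.
async function runBoth(isolated: boolean): Promise<number> {
  const storage = new ToyStorage();
  const jobs: Job[] = [
    { projectId: "team-a", baseUrl: "https://a.example" },
    { projectId: "team-b", baseUrl: "https://b.example" },
  ];
  const results = await Promise.all(
    jobs.map((job) =>
      parseSite(storage, job, isolated ? `job-${job.projectId}` : undefined),
    ),
  );
  return countLeaks(([] as SavedPage[]).concat(...results), jobs);
}
```

In this model, `runBoth(false)` misattributes pages while `runBoth(true)` does not. If the same cause applies here, the analogous change in Crawlee would be to open a uniquely named queue per job with `RequestQueue.open(name)` and pass it to the crawler through the `requestQueue` option, but that is an assumption about this setup, not something confirmed from the code.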