vicious-gold•2y ago
Unexpected results when currying a value into the request handler
I need to curry a value into my request handler. When I do this, my first invocation of the crawler runs as expected, but the second invocation reaches maxRequestsPerCrawl immediately, even when I pass in a new URL.
I see this on the first run:
Then this on the second:
I have a function that takes a single string value (step_id) and passes that value into a function that returns the request handler.
This function is imported and called as such:
If I don't curry the step_id into the request handler, the second run works just fine. Maybe there is another way to get my step_id value into the scope of the request handler?
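For reference, the currying pattern described above might look roughly like the sketch below. The original snippets are not shown, so every name here (`createRequestHandler`, `HandlerContext`, `RequestHandler`) is hypothetical, and the context type is a minimal stand-in rather than Crawlee's real handler context.

```typescript
// Hypothetical, minimal stand-in for the context object a crawler passes
// to its request handler; the real Crawlee context has many more fields.
interface HandlerContext {
  request: { url: string };
  log: { info: (message: string) => void };
}

type RequestHandler = (context: HandlerContext) => Promise<void>;

// Takes step_id once and returns a handler that closes over it.
function createRequestHandler(stepId: string): RequestHandler {
  return async ({ request, log }) => {
    log.info(`[${stepId}] handling ${request.url}`);
  };
}

// Each call yields an independent handler with its own captured step_id.
const handlerA = createRequestHandler("step-a");
const handlerB = createRequestHandler("step-b");
```

In this sketch the closure itself holds nothing except the string, so the currying on its own would not be expected to carry state from one run to the next.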
Thanks in advance for any advice.
3 Replies
vicious-goldOP•2y ago
Update. Knowing that the step_id is globally unique, I added configuration options to override the default requestQueueId and keyValueStoreId. I also opted not to have these written to the storage directory.
Seems to be working now, but IDK if this is the appropriate approach...
Hi, our team will reply soon. It's the weekend, so it might take a day or two. 🙂
Hi,
It seems the issue you encountered was related to how the PlaywrightCrawler's state is managed between invocations. When you curry step_id into your request handler without explicitly isolating each run's state, the crawler's internal tracking of requests and their statuses can be influenced by residual state from previous runs. That would explain why the second run immediately hit the maxRequestsPerCrawl limit: from the crawler's perspective, it was continuing from the previous state rather than starting fresh.
Another way to separate the runs would be to create a named request queue with the step_id as its name:
https://crawlee.dev/api/core/class/RequestQueue
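To illustrate the reasoning, here is a toy model of why a shared queue can make the second run stop immediately, while a queue named after step_id does not. It is not Crawlee's implementation: `ToyQueue`, `openQueue`, and `runCrawl` are hypothetical, and the sketch simply assumes the crawl limit is compared against a handled-request count that lives with the queue.

```typescript
// Toy model only: NOT Crawlee's implementation. It assumes the crawl limit
// is compared against the number of requests already handled in the queue.
class ToyQueue {
  handled: string[] = [];
}

// Hypothetical registry: the same name always returns the same queue,
// including whatever state earlier runs left in it.
const queues = new Map<string, ToyQueue>();

function openQueue(name: string = "default"): ToyQueue {
  let queue = queues.get(name);
  if (!queue) {
    queue = new ToyQueue();
    queues.set(name, queue);
  }
  return queue;
}

// Handles URLs until the queue's handled count reaches the limit.
// Returns how many URLs this particular run handled.
function runCrawl(urls: string[], maxRequestsPerCrawl: number, queueName?: string): number {
  const queue = openQueue(queueName);
  let handledThisRun = 0;
  for (const url of urls) {
    if (queue.handled.length >= maxRequestsPerCrawl) break; // limit already reached
    queue.handled.push(url);
    handledThisRun += 1;
  }
  return handledThisRun;
}

// Shared default queue: the second run inherits the first run's count.
const sharedFirst = runCrawl(["https://example.com/a"], 1);
const sharedSecond = runCrawl(["https://example.com/b"], 1);

// One queue per step_id: each run starts from zero.
const isolatedFirst = runCrawl(["https://example.com/a"], 1, "step-a");
const isolatedSecond = runCrawl(["https://example.com/b"], 1, "step-b");

console.log({ sharedFirst, sharedSecond, isolatedFirst, isolatedSecond });
```

In the toy model the shared runs handle 1 and then 0 URLs, while the isolated runs handle 1 each. In Crawlee itself, the corresponding step would presumably be to open the queue with `RequestQueue.open(step_id)` and pass it to the crawler through its `requestQueue` option.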
Also, if you need to pass data between different instances, you could do that through the key-value store:
https://docs.apify.com/platform/storage/key-value-store