probable-pink•2y ago
Apify RequestQueueClientAsync.update_request - 413 - Payload too large
Hi guys, I'm using the Python SDK of Apify.
I have a simple scraper that iterates through a pagination to collect some URLs.
At the moment my scraper has executed 55 requests and produced 540 results.
But on the 55th page I get an error from the SDK.
I set LOG_LEVEL to "debug" and got the following output:
```
2024-02-05T08:37:42.905Z [apify_client] DEBUG Request unsuccessful ({"status_code": 413, "client_method": "RequestQueueClientAsync.update_request", "resource_id": "fs", "method": "PUT", "url": "http://10.0.91.16:8010/v2/request-queues/fs/requests/fs**", "attempt": 1})
2024-02-05T08:37:42.907Z [apify_client] DEBUG Status code is not retryable ({"status_code": 413, "client_method": "RequestQueueClientAsync.update_request", "resource_id": "fs", "method": "PUT", "url": "http://10.0.91.16:8010/v2/request-queues/fsrequests/fs*", "attempt": 1})
```
(URLs masked so I don't share user data.)
I can't imagine what the problem is: my script simply extracts 10 URLs from the page and then pushes a new request to the queue, that's it.
Does anybody have an idea? Since I'm not building the request myself, I can't change the payload.
Does anybody know what's happening on Apify's side? The only big payload I can see in the logs is the 'userData' attribute added by Apify, which shows up in the "ApifyHttpProxyMiddleware.process_request: updated request.meta" calls.
Btw, the depth is also 55. Maybe that's already too big?
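For context, my spider logic is roughly this (a simplified sketch; the selectors and names are placeholders, not my real code):
```
import scrapy


class PaginationSpider(scrapy.Spider):
    name = "pagination"
    start_urls = ["https://example.com/list?page=1"]

    def parse(self, response):
        # Extract roughly 10 detail URLs from each listing page.
        for href in response.css("a.detail::attr(href)").getall():
            yield {"url": response.urljoin(href)}

        # Push the next pagination page as a new request.
        next_page = response.css("a.next::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```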
Looking forward to your answers!
Thank you very much in advance 🙂
3 Replies
probable-pinkOP•2y ago
Hi guys, I checked the code of your SDK again: when you use Python and Scrapy, you store the whole base64-encoded Scrapy request in the userData field of the Apify request and then push it to the queue, where the Apify API rejects it because the payload is too big.
Quote from the SDK:
```
# Serialize the Scrapy Request and store it in the apify_request.
# - This process involves converting the Scrapy Request object into a dictionary, encoding it to base64,
#   and storing it as 'scrapy_request' within the 'userData' dictionary of the apify_request.
```
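As far as I understand it, that boils down to something like this (my own simplified reconstruction of the idea, not the SDK's actual code):
```
import base64
import pickle

import scrapy


def serialize_scrapy_request(request: scrapy.Request) -> dict:
    # Simplified: turn the Scrapy Request into a plain dict (the real SDK
    # also handles callbacks, headers, etc.).
    request_dict = {
        "url": request.url,
        "method": request.method,
        "meta": dict(request.meta),  # this is where the payload can balloon
    }
    # Pickle and base64-encode the dict, as the SDK comment describes...
    encoded = base64.b64encode(pickle.dumps(request_dict)).decode()
    # ...and stash it in userData, so a big request.meta means a big
    # request-queue record, which eventually trips the 413 limit.
    return {"url": request.url, "userData": {"scrapy_request": encoded}}
```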
What's your recommendation to work around that? Implementing the API myself instead of using the SDK is the only solution I can see right now...
Or: don't use the remote request queue?
Hi @joern , thank you for your input and investigation! Yeah... the translation between Apify and Scrapy requests and back is quite tricky.
I'm gonna think it through, mainly whether there would be an option to use another storage for the serialized request rather than the RQ, and I'll let you know later.
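Roughly, the idea would be something like this (just a sketch of the concept, not an implemented feature; the helper and the key naming are made up):
```
from apify import Actor


async def enqueue_with_kv_offload(request_queue, apify_request: dict) -> None:
    # Hypothetical helper: move the big base64 blob out of userData into a
    # key-value store record and keep only the record key in the queue, so
    # the request-queue payload stays small.
    kv_store = await Actor.open_key_value_store()
    blob = apify_request["userData"].pop("scrapy_request")
    record_key = f"scrapy-request-{apify_request['uniqueKey']}"
    await kv_store.set_value(record_key, blob)
    apify_request["userData"]["scrapy_request_key"] = record_key
    await request_queue.add_request(apify_request)
```
On the consumer side you'd then fetch the blob back by its key before deserializing the Scrapy request.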
@joern The API limit should be 9 MB, which should be absolutely enough. Do you store any additional data on the request? Could you please provide us with some code so that we can reproduce it? Thank you.
Since no further responses were provided, marking it as resolved. If this is still a problem, please let us know.