Apify Discord Mirror

Updated 5 months ago

Scrapy integration silently throws away redirects

At a glance

A community member noticed that their custom Scrapy spider produces different numbers of items when run locally (720 items) versus through Apify (370 items), despite the code, URLs, and parameters being the same. They are unsure of the root cause and are looking for ways to debug the issue. The community members have discussed that running with or without a proxy makes no difference, and one community member has filed an issue on the Apify SDK for Python repository. Another community member has mentioned that the Scrapy integration silently throws away redirects, which may be related to the issue. The Apify team has acknowledged the report and will look into it.

Cheers, I just noticed that my custom scraper makes a quite different number of requests locally and through Apify, while the code, URLs, parameters, everything is the same. The same Scrapy spider produces 720 items locally, but 370 through Apify.

Does anyone have a clue what the root cause could be, or where to look? Just from the logs I can't see anything. The only clue I noticed is that on Apify the scraper makes no POST requests, but that probably isn't enough to debug the root cause 🤔

Is there a way I can raise logging or something on Apify? How can I best approach this? How to debug this?
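One possible way to approach the comparison, as a rough sketch: capture the log of both runs at DEBUG level (Scrapy's `LOG_LEVEL` setting controls this) and tally requests by HTTP method and status, plus the number of redirects. The script below uses only the standard library and assumes Scrapy's default log line format ("Crawled (200) <GET ...>" and "Redirecting (301) to <GET ...> from <GET ...>"). The function names `summarize` and `diff_counts` are made up for illustration, and whether the Apify run emits the same DEBUG lines is an assumption.

```python
import re
from collections import Counter

# Assumed default Scrapy DEBUG lines, for example:
# "... [scrapy.core.engine] DEBUG: Crawled (200) <POST https://example.com/api> (referer: None)"
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>")
# "... [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://b> from <GET https://a>"
REDIRECT = re.compile(r"Redirecting \(([^)]+)\) to <(\w+) (\S+)> from <(\w+) (\S+)>")


def summarize(log_text):
    """Count crawled requests by method and status, and count redirect lines."""
    methods = Counter()
    statuses = Counter()
    redirects = 0
    for line in log_text.splitlines():
        crawled = CRAWLED.search(line)
        if crawled:
            statuses[crawled.group(1)] += 1
            methods[crawled.group(2)] += 1
        elif REDIRECT.search(line):
            redirects += 1
    return {"methods": methods, "statuses": statuses, "redirects": redirects}


def diff_counts(local, remote):
    """Return {key: (local_count, remote_count)} for keys whose counts differ."""
    keys = set(local) | set(remote)
    return {k: (local[k], remote[k]) for k in sorted(keys) if local[k] != remote[k]}
```

To use it, save each run's log to a file (the file names are up to you), call `summarize` on the contents of each, and pass the two `methods` or `statuses` counters to `diff_counts`. A method that appears only on the local side, such as POST, or a redirect count that drops to zero on one side, narrows down where the two runs diverge.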
6 comments
The same Scrapy spider produces 720 items locally, but 370 through Apify
Running with or without proxy makes no difference
Scrapy integration silently throws away redirects
Hi Honza, thanks for the report, I'll try to look into it soon
Thanks! 🙇‍♂️