Apify Discord Mirror

Updated 3 weeks ago

SAME_HOSTNAME not working on non www URLs

At a glance

A community member reports that EnqueueStrategy.SAME_HOSTNAME in Crawlee for Python does not work properly on non-www URLs. In the debugger, the origin is passed to _check_enqueue_strategy, but the check uses context.request.loaded_url when it is available, so the hostnames being compared do not match. The behaviour was the same across multiple URLs with and without the www prefix.

Another community member suggested that the original poster should create an issue about this bug on the library's GitHub repository.

When using EnqueueStrategy.SAME_HOSTNAME, I noticed it does not work properly on non-www URLs.

In the debugger I noticed that origin is passed to _check_enqueue_strategy, but the check uses context.request.loaded_url if it is available.
So every URL that is checked mismatches because of the difference in hostname.

I tested this with multiple URLs, with and without the www prefix, and got the same behaviour.
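
To illustrate the kind of mismatch described above, here is a minimal sketch using only Python's standard library. It is not Crawlee's actual implementation: the helper same_hostname and the example.com URLs are made up for illustration, and it assumes the site redirects from the non-www host to the www host, so that the loaded URL differs from the origin.

```python
from urllib.parse import urlparse


def same_hostname(target_url: str, reference_url: str) -> bool:
    """Return True when both URLs have exactly the same hostname."""
    return urlparse(target_url).hostname == urlparse(reference_url).hostname


# Hypothetical values: the URL the request started from, and the URL it
# ended up on after an assumed redirect to the www host.
origin_url = "https://example.com/"
loaded_url = "https://www.example.com/"

# A link found on the page that points at the non-www host.
link = "https://example.com/about"

# Compared against the origin, the link is on the same hostname.
print(same_hostname(link, origin_url))  # True

# Compared against the loaded URL, the strict hostname check fails.
print(same_hostname(link, loaded_url))  # False
```

A strict hostname comparison treats example.com and www.example.com as different hosts, so which reference URL the check uses decides whether the link gets enqueued.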
Marked as solution
Hi @ROYOSTI

Feel free to create an issue about this bug :)

https://github.com/apify/crawlee-python/issues