flat-fuchsia•10mo ago
SAME_HOSTNAME not working on non-www URLs
When using EnqueueStrategy.SAME_HOSTNAME, I noticed it does not work properly on non-www URLs.
In the debugger I saw that it passes the origin to _check_enqueue_strategy, but it uses context.request.loaded_url if available.
So every URL that is checked will mismatch because of the difference in hostname.
I tested this with multiple URLs, with and without the www prefix, and got the same behaviour.
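To illustrate the kind of mismatch I mean, here is a minimal sketch using only the standard library. It is not Crawlee's actual code: same_hostname is a hypothetical helper, the example.com URLs are made up, and the redirect from the non-www host to the www host is an assumed scenario.

```python
from urllib.parse import urlparse


def same_hostname(origin_url: str, target_url: str) -> bool:
    """Hypothetical helper: strict hostname equality (not Crawlee's implementation)."""
    return urlparse(origin_url).hostname == urlparse(target_url).hostname


# Assumed scenario: the crawl starts on the non-www host, but the server
# redirects to the www host, so the loaded URL has a different hostname.
requested_url = "https://example.com/"
loaded_url = "https://www.example.com/"
link_on_page = "https://example.com/about"

# Comparing against the URL that was requested: hostnames match.
print(same_hostname(requested_url, link_on_page))  # True

# Comparing against the loaded URL: "www.example.com" != "example.com".
print(same_hostname(loaded_url, link_on_page))  # False
```

With a strict comparison like this, which URL is taken as the origin decides whether the link gets enqueued or filtered out.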

2 Replies
This post was marked as solved by ROYOSTI.
Hi @ROYOSTI
Feel free to create an issue about this bug:
https://github.com/apify/crawlee-python/issues