Apify Discord Mirror

Updated 11 months ago

Difference between enqueueLinks and crawler.addRequests

At a glance

The community member is having an issue with the enqueueLinks function in the Crawlee library (version 3.7.0). When they call enqueueLinks({ urls, label: 'DETAIL' }), none of the links are enqueued and the crawler stops. However, when they use crawler.addRequests(filteredLinks.map((link) => ({ url: link, label: 'DETAIL' }))), the links are added as expected and the crawler works fine.
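For context, the two approaches described above look roughly like this. This is a minimal sketch, not the original code: the CheerioCrawler class, the start URL, and the placeholder filteredLinks array are assumptions.

JavaScript
import { CheerioCrawler } from 'crawlee';

const filteredLinks = ['https://google.com']; // placeholder data for the sketch

const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url} (label: ${request.label})`);

        // Approach 1: pass an explicit list of URLs to enqueueLinks.
        // With the default strategy ('same-hostname'), URLs pointing at other
        // hostnames are filtered out before they ever reach the queue.
        await enqueueLinks({ urls: filteredLinks, label: 'DETAIL' });
    },
});

// Approach 2: add fully formed requests directly to the queue.
// No strategy-based filtering is applied here, so the crawler keeps going.
await crawler.addRequests(filteredLinks.map((link) => ({ url: link, label: 'DETAIL' })));

await crawler.run(['https://example.com']);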

The community members discuss possible reasons. One notes that the URLs being enqueued are absolute URLs (e.g. 'https://google.com') rather than relative ones, but doesn't think that should be the reason. Another community member mentions that there is a configuration option controlling whether the crawler enqueues links from the same domain only or from all domains, and that this could be the problem.

The answer comes from a community member who explains that the issue lies in the enqueueing strategy: the "all" strategy is needed so that both relative and absolute links are enqueued.
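Assuming the handler from the sketch above, the fix would look something like this; strategy is a documented enqueueLinks option, the rest is unchanged:

JavaScript
// strategy: 'all' disables the default same-hostname filtering, so absolute
// URLs pointing at other domains are enqueued as well.
await enqueueLinks({
    urls: filteredLinks,
    label: 'DETAIL',
    strategy: 'all',
});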

Additionally, another community member suggests using the context-aware enqueueLinks function provided on the crawler contexts, which finds and enqueues links automatically; otherwise the provided URLs array is filtered.
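A sketch of that suggestion, again inside a requestHandler: when no urls array is passed, the context-aware enqueueLinks extracts links from the page currently being processed (the 'a' selector shown here is its default and only included for illustration):

JavaScript
// No urls array: the context-aware helper collects links from the loaded page
// itself and enqueues them, subject to the chosen strategy.
await enqueueLinks({
    selector: 'a',
    label: 'DETAIL',
    strategy: 'all', // keep links to other domains as well
});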

Useful resources
https://crawlee.dev/api/core/function/enqueueLinks
Hey folks, I have a list of urls like ["https://google.com"] etc, and when I call enqueueLinks({ urls, label: 'DETAIL' }), none of the links are enqueued and the crawler stops right there, but if I do
JavaScript
crawler.addRequests(filteredLinks.map((link) => ({ url: link, label: 'DETAIL' })))

the links are added as expected and the crawler works fine. I just wanted to know what's the difference between the two and why enqueueLinks was not working here? Crawlee version is 3.7.0
6 comments
the only thing I can figure is that the URLs I am enqueuing are absolute URLs e.g. 'https://google.com' instead of relative ones, but this shouldn't be the reason, right?
because I ran another scraper of mine and that one worked fine
Ok, it really is relative links only but why?
there is, I believe, a config option for whether it enqueues links from the same domain or from all domains. Could that be your problem?
yeah that was the issue, there's a discussion on GitHub about it. You need to use the enqueueing strategy "all"
If you want to automatically find and enqueue links, you should use the context-aware enqueueLinks function provided on the crawler contexts. Otherwise it will filter the provided URLs array, and in your example it's already a single URL; see https://crawlee.dev/api/core/function/enqueueLinks