Apify Discord Mirror

Updated 11 months ago

Difference between enqueueLinks and crawler.addRequests

At a glance

The community member is having an issue with the enqueueLinks function in the Crawlee library (version 3.7.0). When they call enqueueLinks({ urls, label: 'DETAIL' }), none of the links are enqueued and the crawler stops. However, when they use crawler.addRequests(filteredLinks.map((link) => ({ url: link, label: 'DETAIL' }))), the links are added as expected and the crawler works fine.
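For context, the two approaches described above look roughly like this. This is a minimal sketch, not the original code: the CheerioCrawler class, the start URL, and the placeholder filteredLinks array are assumptions.

JavaScript
import { CheerioCrawler } from 'crawlee';

const filteredLinks = ['https://google.com']; // placeholder data for the sketch

const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url} (label: ${request.label})`);

        // Approach 1: pass an explicit list of URLs to enqueueLinks.
        // With the default strategy ('same-hostname'), URLs pointing at other
        // hostnames are filtered out before they ever reach the queue.
        await enqueueLinks({ urls: filteredLinks, label: 'DETAIL' });
    },
});

// Approach 2: add fully formed requests directly to the queue.
// No strategy-based filtering is applied here, so the crawler keeps going.
await crawler.addRequests(filteredLinks.map((link) => ({ url: link, label: 'DETAIL' })));

await crawler.run(['https://example.com']);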

The community members discuss possible reasons. One notes that the URLs being enqueued are absolute URLs (e.g. 'https://google.com') rather than relative ones, but doesn't think that should be the reason. Another community member mentions that there is a configuration option controlling whether the crawler enqueues links from the same domain only or from all domains, and that this could be the problem.

The answer comes from a community member who explains that the issue lies in the enqueueing strategy: the "all" strategy is needed so that both relative and absolute links are enqueued.
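Assuming the handler from the sketch above, the fix would look something like this; strategy is a documented enqueueLinks option, the rest is unchanged:

JavaScript
// strategy: 'all' disables the default same-hostname filtering, so absolute
// URLs pointing at other domains are enqueued as well.
await enqueueLinks({
    urls: filteredLinks,
    label: 'DETAIL',
    strategy: 'all',
});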

Additionally, another community member suggests using the context-aware enqueueLinks function provided on the crawler contexts, which finds and enqueues links automatically; otherwise the provided URLs array is filtered.
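A sketch of that suggestion, again inside a requestHandler: when no urls array is passed, the context-aware enqueueLinks extracts links from the page currently being processed (the 'a' selector shown here is its default and only included for illustration):

JavaScript
// No urls array: the context-aware helper collects links from the loaded page
// itself and enqueues them, subject to the chosen strategy.
await enqueueLinks({
    selector: 'a',
    label: 'DETAIL',
    strategy: 'all', // keep links to other domains as well
});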

Useful resources
https://crawlee.dev/api/core/function/enqueueLinks
Hey folks, I have a list of urls like ["https://google.com"] etc, and when I call enqueueLinks({ urls, label: 'DETAIL' }), none of the links are enqueued and the crawler stops right there, but if I do
JavaScript
crawler.addRequests(filteredLinks.map((link) => ({ url: link, label: 'DETAIL' })))

the links are added as expected and the crawler works fine. I just wanted to know what's the difference between the two and why enqueueLinks was not working here? Crawlee version is 3.7.0
6 comments
the only thing I can figure is that the URLs I am enqueuing are absolute URLs e.g. 'https://google.com' instead of relative ones, but this shouldn't be the reason, right?
because I ran another scraper of mine and that one worked fine
Ok, it really is relative links only but why?
there is, I believe, a config option for whether it enqueues links from the same domain or from all domains. Could that be your problem?
yeah that was the issue, there's a discussion on GitHub about it. You need to use the enqueueing strategy "all"
If you want to automatically find and enqueue links, you should use the context-aware enqueueLinks function provided on the crawler contexts. Otherwise it will filter the provided URLs array, and in your example it's already a single URL; see https://crawlee.dev/api/core/function/enqueueLinks