correct-apricot•3y ago
Apify not enqueueing my links
Why does this not enqueue my links?
3 Replies
graceful-blue•3y ago
It's hard to say without seeing the actual URLs, but looking at this snippet, https://www.something/produto is not a valid URL. The same goes for https://www.something/sitemap_index.xml. Also keep in mind that by default enqueueLinks only enqueues links with the same hostname as the current page/request. You could try changing it to strategy: 'all' - see here: https://crawlee.dev/api/core/interface/EnqueueLinksOptions#strategy
correct-apricotOP•3y ago
The strategy option appears to work fine, how can I apply it to my URL?
same-domain appears to crawl more than my domain?
@Andrey Bykov Does enqueueLinks only query the a selector by default?
What I want to do is this: the main sitemap_index.html points to other sitemaps. I basically want a recursive crawler that follows them automatically.
graceful-blue•3y ago
By default it's using the a selector, yes. You would not be able to use enqueueLinks with a sitemap and cheerio because, well, there are no links - it's only text in the loc elements. If you use a browser though, it should be rendered into a elements with proper hrefs, and thus enqueueLinks will work. If you still want to use cheerio, grab the URLs from the loc elements manually and then use crawler.addRequests([<your_urls_here>]).
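To make the "grab the URLs manually" step concrete, here is a minimal sketch in plain TypeScript, with no Crawlee or cheerio dependency. It pulls the text out of every loc element and decides whether the document is a sitemap index (which points to more sitemaps) or a regular sitemap (which lists pages). The helper names `extractLocUrls` and `splitSitemapEntries`, the `SitemapEntries` type, and the example.com URLs are all made up for illustration. It also assumes plain `<loc>` tags without CDATA wrappers.

```typescript
// Hypothetical helpers for illustration; none of these names come from Crawlee.

interface SitemapEntries {
  sitemaps: string[]; // nested sitemap files that still need to be fetched
  pages: string[];    // ordinary page URLs to crawl
}

// Collect the text content of every <loc> element in a sitemap document.
// Assumption: locations are plain text (no CDATA), as in typical sitemaps.
function extractLocUrls(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    // Sitemap URLs are XML-escaped, so turn &amp; back into &.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}

// A <sitemapindex> root means every <loc> points to another sitemap;
// otherwise (a <urlset> root) every <loc> is treated as a page URL.
function splitSitemapEntries(xml: string): SitemapEntries {
  const urls = extractLocUrls(xml);
  const isIndex = /<sitemapindex[\s>]/i.test(xml);
  return isIndex
    ? { sitemaps: urls, pages: [] }
    : { sitemaps: [], pages: urls };
}

// Example input: a sitemap index pointing to two nested sitemaps.
const sampleIndex = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>`;

const sampleResult = splitSitemapEntries(sampleIndex);
console.log(sampleResult.sitemaps); // the two nested sitemap URLs
console.log(sampleResult.pages);    // empty, since this was an index
```

Inside a cheerio-based request handler, one way to get the recursive behaviour would be to run the response body through a helper like this and pass both `sitemaps` and `pages` to crawler.addRequests(...), so that nested sitemaps go through the same handler again. That wiring is left out here because it depends on how the crawler is set up.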