inland-turquoise•3y ago
Ignore URLs that match an already-crawled URL but have different query params
I don't want to crawl a URL that has already been crawled but with different query params. How can I do this?
1 Reply
inland-turquoiseOP•3y ago
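import { EnqueueStrategy } from 'crawlee';

// Declare once, outside the request handler, so the set persists across pages.
const visitedUrls = new Set();

// Inside the request handler:
await enqueueLinks({
    selector: 'a[href]',
    strategy: EnqueueStrategy.SameHostname,
    transformRequestFunction: (request) => {
        // Drop the query string so URLs that differ only in params count as one page.
        const urlWithoutQuery = request.url.split('?')[0];
        // Returning a falsy value skips the link.
        if (visitedUrls.has(urlWithoutQuery)) return false;
        visitedUrls.add(urlWithoutQuery);
        request.url = urlWithoutQuery;
        return request;
    },
});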
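
One caveat with `split('?')[0]`: when a link has no query string but does have a `#fragment`, the fragment stays in the string, so the in-memory set treats `/a` and `/a#top` as different pages. A minimal sketch of a stricter normalization using the standard `URL` class follows. The names `stripQueryAndHash`, `shouldEnqueue`, and `seen` are made up for illustration and are not part of Crawlee.

```javascript
// Hypothetical helper: returns the URL with its query string and fragment removed.
function stripQueryAndHash(rawUrl) {
    const parsed = new URL(rawUrl); // throws on relative or malformed URLs
    parsed.search = '';
    parsed.hash = '';
    return parsed.href;
}

// Hypothetical dedup check built on the helper above.
const seen = new Set();
function shouldEnqueue(rawUrl) {
    const key = stripQueryAndHash(rawUrl);
    if (seen.has(key)) return false; // same page, different params or fragment
    seen.add(key);
    return true;
}
```

Inside `transformRequestFunction` this could replace the `split` call, for example by returning `false` when `shouldEnqueue(request.url)` is false and otherwise setting `request.url = stripQueryAndHash(request.url)`.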