Hello,
I'm currently researching ways to exclude URLs that contain a query string, for example:
https://domain[.]com/path?query1=test&query2=test2
I've tried hooking into the enqueueLinks options like this:
await enqueueLinks({ regexps: [ new RegExp('^'+[websiteURL]+'[^?]+') ]});
However, URLs with a query string still seem to get matched, because this option doesn't really exclude anything; it only allows URLs that match the RegEx.
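To narrow it down, here is a minimal sketch of the pattern tested outside the crawler. The host `https://domain.com` is a hypothetical stand-in for my websiteURL variable, and escapeRegExp is a helper I added only for this check. My guess is that the missing `$` lets the pattern match just the part of the URL before the `?`, but I haven't confirmed that this is what happens inside enqueueLinks.

```typescript
// Hypothetical stand-in for the websiteURL variable in my snippet.
const websiteURL: string = 'https://domain.com';

// Helper added for this check: escape regex metacharacters so the "."
// in the host is matched literally.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same shape as the pattern I passed to enqueueLinks: no "$" at the end,
// so matching only the prefix before "?" is enough.
const unanchored = new RegExp('^' + escapeRegExp(websiteURL) + '[^?]+');

// With "$" the whole URL must be free of "?".
const anchored = new RegExp('^' + escapeRegExp(websiteURL) + '[^?]+$');

const withQuery = 'https://domain.com/path?query1=test&query2=test2';
const withoutQuery = 'https://domain.com/path';

console.log(unanchored.test(withQuery));   // true
console.log(anchored.test(withQuery));     // false
console.log(anchored.test(withoutQuery));  // true
```

If I'm reading the docs correctly, enqueueLinks also accepts an `exclude` option, which might be the more direct route, but I haven't tried it yet.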
I'm using PlaywrightCrawler via Crawlee, but I think this is something that should work the same across all crawler engines. Please let me know how I might achieve this, or point me to further reading. Thanks, Team!