Exclude query parameter URLs from crawl jobs
I'm currently researching ways to exclude URLs that contain query parameters, for example: https://domain[.]com/path?query1=test&query2=test2
I've tried hooking into the enqueueLinks options like:
However, these URLs still seem to get matched, because this doesn't really exclude anything; rather, it matches allowed URLs based on a regex.
I'm using PlaywrightCrawler via Crawlee, but I think this is something that should work across all crawler engines. Please let me know how I might achieve this, or point me to further reading. Thanks, team!
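
To make the goal concrete, here is a minimal sketch of the filtering I have in mind, not a confirmed solution. The helper names hasQueryString and skipQueryUrls are hypothetical (they are not part of Crawlee). The assumption is that enqueueLinks accepts a transformRequestFunction option and skips a request when that function returns a falsy value; please check that against the docs for your Crawlee version.

```typescript
// Hypothetical helper (not part of Crawlee): true when the URL carries a query string.
function hasQueryString(rawUrl: string): boolean {
    try {
        // URL.search is "" when there is no query (or only a bare trailing "?").
        return new URL(rawUrl).search.length > 0;
    } catch {
        // Not a parseable absolute URL: look for "?" followed by at least
        // one character, ignoring anything after a "#" fragment.
        return /\?./.test(rawUrl.split('#')[0]);
    }
}

// Hypothetical filter: returns false for URLs with a query string,
// otherwise returns the request object unchanged.
function skipQueryUrls<T extends { url: string }>(request: T): T | false {
    return hasQueryString(request.url) ? false : request;
}

// Assumed usage inside a request handler (unverified against a specific version):
//   await enqueueLinks({ transformRequestFunction: skipQueryUrls });
```

Since the check only looks at the URL string, it should not depend on which crawler engine is doing the enqueueing.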
