colossal-harlequinC

Website language filtering (/en)

Hi, I am new to Crawlee. I was wondering if there is a way to crawl only the English version of a website when one exists and, when it doesn't, to just scrape the regular version in its home language. The issue with only setting URLs like https://example/en/.... is that some websites don't have such paths, so those requests return an error. In those cases I'd still want to scrape the site even if it is in another language; it's just that wherever possible I'd prefer the English version to be scraped, and nothing else. Ideally I don't want to post-process the results, because by then I would already have paid for a lot of unnecessary crawling.
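To make it concrete, here is a rough sketch of the fallback I have in mind, in plain TypeScript with no Crawlee APIs involved. `candidateUrls` and `pickUrl` are hypothetical helper names I made up, and `exists` is a placeholder for whatever cheap check (for example a HEAD request) decides whether the `/en` page is really there. It also assumes the English version lives under a `/en` path prefix, which won't be true for every site.

```typescript
// Hypothetical helper: list the URLs to try, most preferred first.
// Assumes the English version, if any, lives under a "/en" path prefix.
function candidateUrls(startUrl: string, lang: string = "en"): string[] {
  const original = new URL(startUrl);
  const segments = original.pathname.split("/").filter(Boolean);

  // Already pointing at the language-specific path: nothing else to try.
  if (segments[0] === lang) {
    return [original.href];
  }

  // Build the same URL with the language prefix in front of the path.
  const localized = new URL(original.href);
  localized.pathname = "/" + [lang, ...segments].join("/");

  // Preferred (English) URL first, original URL as the fallback.
  return [localized.href, original.href];
}

// Hypothetical helper: return the first preferred candidate that passes the
// `exists` check, otherwise fall back to the last candidate (the original URL).
// `exists` is supplied by the caller, so no network code is assumed here.
async function pickUrl(
  startUrl: string,
  exists: (url: string) => Promise<boolean>,
): Promise<string> {
  const candidates = candidateUrls(startUrl);
  for (const url of candidates.slice(0, -1)) {
    if (await exists(url)) {
      return url;
    }
  }
  return candidates[candidates.length - 1];
}
```

The idea would be to run something like `pickUrl` once per site before adding anything to the crawl, so only one version of each site gets queued and paid for.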