military-pink
Apify & Crawlee · 3y ago
9 replies

CheerioCrawler works for amazon.de but gets detected as a bot on amazon.com

Dear all, I am experimenting with CheerioCrawler to scrape Amazon. I followed the online tutorial and it works for Germany, but the same crawler gets detected as a bot for the US. For Germany I use a German datacenter proxy and it works, but for the US a US datacenter proxy does not. My configuration is below. I am building an Amazon scraper for multiple marketplaces, and this inconsistency makes that challenging.


const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestQueue: queue,
    useSessionPool: true,
    persistCookiesPerSession: true,
    maxRequestRetries: 20,
    maxRequestsPerMinute: 250,
    autoscaledPoolOptions: {
        maxConcurrency: 100,
        minConcurrency: 5,
        isFinishedFunction: async () => {
            // Tell the pool whether it should finish
            // or wait for more tasks to become available.
            // Return true or false
            return false;
        },
    },
    failedRequestHandler: async (context) => rebirth_requests({ ...context }),
    requestHandler: async (context) => router({ ...context, dbPool }),
    // sessionPoolOptions: { blockedStatusCodes: [] },
});
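Since the same settings behave differently per marketplace, one option is to keep the per-marketplace values in a single table so the US settings can be tuned without touching the German ones. The following is only a rough sketch: `MARKETPLACES`, `crawlerOptionsFor`, `proxyCountry`, and the lower US rate and concurrency numbers are hypothetical placeholders, not values from the post or anything Crawlee defines. Only the option names inside `crawlerOptions` are taken from the configuration above.

```javascript
// Hypothetical per-marketplace table. The US numbers are placeholders chosen
// for illustration only; they are not known-good values for amazon.com.
const MARKETPLACES = {
    de: { baseUrl: 'https://www.amazon.de', proxyCountry: 'DE', maxRequestsPerMinute: 250, maxConcurrency: 100 },
    us: { baseUrl: 'https://www.amazon.com', proxyCountry: 'US', maxRequestsPerMinute: 60, maxConcurrency: 20 },
};

// Settings shared by every marketplace, copied from the configuration above.
const SHARED_OPTIONS = {
    useSessionPool: true,
    persistCookiesPerSession: true,
    maxRequestRetries: 20,
};

// Hypothetical helper: builds a plain options object for one marketplace.
// `overrides` lets a caller adjust a single top-level option for one run.
function crawlerOptionsFor(marketplace, overrides = {}) {
    const settings = MARKETPLACES[marketplace];
    if (!settings) {
        throw new Error(`Unknown marketplace: ${marketplace}`);
    }
    return {
        baseUrl: settings.baseUrl,
        // Country to request from whichever proxy provider is in use.
        proxyCountry: settings.proxyCountry,
        crawlerOptions: {
            ...SHARED_OPTIONS,
            maxRequestsPerMinute: settings.maxRequestsPerMinute,
            autoscaledPoolOptions: {
                minConcurrency: 5,
                maxConcurrency: settings.maxConcurrency,
            },
            ...overrides,
        },
    };
}
```

The returned `crawlerOptions` object could then be spread into the constructor, for example `new CheerioCrawler({ ...crawlerOptions, proxyConfiguration, requestHandler })`, with `proxyCountry` used when building the proxy configuration for that marketplace.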