military-pink
Apify & Crawlee · 3y ago
9 replies

CheerioCrawler works for amazon.de but gets detected as a bot on amazon.com

Dear all, I am experimenting with CheerioCrawler to scrape Amazon. I followed the online tutorial and it works for Germany, but the same crawler gets detected as a bot for the US. For Germany I use a German datacenter proxy and it works; for the US, a US datacenter proxy does not. My configuration is below. I am building an Amazon scraper for multiple marketplaces, and this inconsistency makes that challenging.


import { CheerioCrawler } from 'crawlee';

// proxyConfiguration, queue, router, rebirth_requests and dbPool are defined elsewhere.
const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestQueue: queue,
    useSessionPool: true,
    persistCookiesPerSession: true,
    maxRequestRetries: 20,
    maxRequestsPerMinute: 250,
    autoscaledPoolOptions: {
        maxConcurrency: 100,
        minConcurrency: 5,
        // Tell the pool whether it should finish or wait for more tasks
        // to become available. Returning false keeps the crawler alive
        // even when the queue is empty.
        isFinishedFunction: async () => false,
    },
    failedRequestHandler: async (context) => rebirth_requests({ ...context }),
    requestHandler: async (context) => router({ ...context, dbPool }),
    // sessionPoolOptions: { blockedStatusCodes: [] },
});
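
To tell when a response is actually a bot check rather than a product page, a small helper like the one below might be useful. This is only a sketch: `looksBlocked` and `BLOCK_MARKERS` are hypothetical names, and the marker strings and status codes are assumptions about what Amazon's robot-check page typically contains, not something confirmed in this thread.

```javascript
// Assumed phrases from Amazon's robot-check page; adjust to what you actually receive.
const BLOCK_MARKERS = [
    'Enter the characters you see below',
    'api-services-support@amazon.com',
];

// Returns true when the status code or the HTML suggests a bot check.
function looksBlocked(statusCode, html) {
    // Assumption: these status codes indicate blocking or throttling.
    if (statusCode === 403 || statusCode === 429 || statusCode === 503) return true;
    return BLOCK_MARKERS.some((marker) => html.includes(marker));
}
```

Inside the request handler this could be called with `response.statusCode` and `$.html()`; when it returns true, calling `session.retire()` and throwing an error should make the crawler retry the request with a fresh session.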