Apify & Crawlee

This is the official developer community of Apify and Crawlee.

crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻creators-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

Crawlee PuppeteerCrawler not starting with Chrome Profile

I need a Chrome profile to run the scraper, since I need my session cookies to access particular pages. This is my code ```js...

enqueueLinks with urls don't trigger router handler

Hello, my "search" handler enqueues a URL (I have verified that the URL exists and is valid) to my "subprocessors" handler, but for some reason it's not being triggered ```js router.addHandler( "search",...
Solution:
Different domains, maybe? Have you tried a different strategy? Try All; more information here: https://crawlee.dev/js/api/core/enum/EnqueueStrategy
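
A quick way to test the answer's hypothesis: assuming Crawlee's default enqueue strategy is same-hostname (the EnqueueStrategy page linked above lists the options), a URL on a different host than the page being handled would be dropped without an error. The helper below is a hypothetical diagnostic, not part of Crawlee; it only approximates that check by comparing hostnames. The commented call shows the suggested switch to EnqueueStrategy.All, where subprocessorsUrl is a placeholder name.

```typescript
// Hypothetical diagnostic: a rough approximation of Crawlee's default
// same-hostname filter. It compares hostnames only and is NOT Crawlee's code.
export function sharesHostname(pageUrl: string, targetUrl: string): boolean {
  return new URL(pageUrl).hostname === new URL(targetUrl).hostname;
}

// If this returns false for the URL you enqueue, the default strategy may be
// filtering it out. Inside the "search" handler, the fix suggested above would
// look something like this (EnqueueStrategy is exported by the `crawlee` package):
//
//   await enqueueLinks({
//     urls: [subprocessorsUrl],   // placeholder for your verified URL
//     label: "subprocessors",     // routes the request to that handler
//     strategy: EnqueueStrategy.All,
//   });
```

Note that www.example.com and example.com count as different hostnames here, which is a common source of silently skipped links.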

Is the default request queue the same for different crawler instances?

Hello everyone, I would like to know if the default request queue (if not specified in the Crawler options) is the same for all instances? I tried to run an HttpCrawler next to a PlaywrightCrawler and for some unknown reason the HttpCrawler picked a request which was for the PlaywrightCrawler...
Solution:
Yes, I believe so. If you don't specify a queue name, both crawlers use the same default queue. The solution is to use named queues that you drop at the end of the Actor run. See: https://github.com/apify/crawlee/discussions/2026?utm_source=chatgpt.com#discussioncomment-6656135...

What proxy providers work best with Crawlee?

We are trying to benchmark different proxies - which ones are the best?

Max requests per second

Hello! Is there an option like maxRequestsPerMinute / maxTasksPerMinute, but per second? If not, what would be the easiest way to implement this? Always waiting 1 s in the request handler and relying on maxConcurrency? ...
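
One way to get a per-second cap, sketched under the assumption that you are willing to throttle request starts yourself rather than rely on a built-in option: a small sliding-window limiter that every request awaits before it begins. RateLimiter is a hypothetical helper, not a Crawlee API.

```typescript
// Hypothetical sliding-window limiter: at most `limit` request starts in any
// `windowMs` period. Not part of Crawlee.
export class RateLimiter {
  // Scheduled start times (ms) of the most recent `limit` reservations.
  private readonly starts: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs = 1000, now: () => number = () => Date.now()) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Reserves the next free slot and returns how many ms the caller must wait.
  // It is synchronous, so concurrent callers in one process never share a slot.
  reserve(): number {
    const t = this.now();
    let start = t;
    if (this.starts.length >= this.limit) {
      // The start `limit` reservations back must be a full window behind us.
      start = Math.max(t, this.starts[this.starts.length - this.limit] + this.windowMs);
    }
    this.starts.push(start);
    if (this.starts.length > this.limit) this.starts.shift();
    return start - t;
  }

  // Waits until the reserved slot is reached.
  async acquire(): Promise<void> {
    const waitMs = this.reserve();
    if (waitMs > 0) await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

With one shared instance, for example new RateLimiter(5) for at most five starts per second, each request could call await limiter.acquire() from a preNavigationHooks entry (available on the HTTP and browser crawlers). Unlike a fixed 1 s sleep in the handler, the wait only happens when the window is actually full; maxConcurrency still needs to be high enough that the limiter, not concurrency, is the bottleneck.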

How to run Crawlee on Easypanel

Guys, how do I run Crawlee on Easypanel? Please help, I can't get it to run with Playwright.

We are looking for a scraping expert

We are seeking a skilled web scraper developer to create an efficient and reliable web scraping tool. The ideal candidate will have experience extracting data from various websites and handling different data formats. You will be responsible for building a scraper that can navigate through sites, collect data, and store it in a structured format. If you are proficient in web scraping techniques and have a strong understanding of HTML and JavaScript, please DM me. Please provide examples of pr...

How to improve the reCAPTCHA v3 score?

Hi there, has anyone had success with reCAPTCHA v3? The target is https://completedns.com/dns-history/ and I want to get data from it using a Python script. I have tested many reCAPTCHA solvers but still failed. Could anyone help? Thanks in advance.

Throttle on 429 responses

Hi, I'm using a CheerioCrawler and things are generally working well. Occasionally, though, I get a Cloudflare 429 page, and it manifests as an error on waitForSelector because I'm getting the Cloudflare response. Should Crawlee be catching these responses and waiting/slowing down without intervention? I've had to catch this issue and then pause the autoscaled pool (for 10 s) manually. Should I be tuning other knobs too, or instead? I don't have maxRequestsPerMinute configured yet because I'm no...
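
If you keep handling 429s manually as described above, the fixed 10 s pause could be replaced by a delay derived from the response. The sketch below assumes you can read the Retry-After header (when one is sent) and keep a per-request attempt counter; backoffDelayMs is a hypothetical helper, not a Crawlee API. Retry-After may be either a number of seconds or an HTTP-date (RFC 9110); without a usable header the sketch falls back to capped exponential backoff.

```typescript
// Hypothetical helper: how long to wait (ms) after a 429 before retrying.
export function backoffDelayMs(
  attempt: number,                 // 0 for the first retry, 1 for the second, ...
  retryAfter: string | undefined,  // raw Retry-After header value, if any
  nowMs: number = Date.now(),
  baseMs = 1000,
  maxMs = 60_000,
): number {
  if (retryAfter !== undefined) {
    const value = retryAfter.trim();
    // Form 1: delay in whole seconds, e.g. "120".
    if (/^\d+$/.test(value)) return Math.min(Number(value) * 1000, maxMs);
    // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    const dateMs = Date.parse(value);
    if (!Number.isNaN(dateMs)) return Math.min(Math.max(dateMs - nowMs, 0), maxMs);
  }
  // No usable header: 1 s, 2 s, 4 s, ... capped at maxMs.
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

The result could be passed to the manual pause you already perform, so the crawler waits as long as the server asks instead of a fixed interval.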

LinkedIn Session Timeout

I was trying to automate some of my regular LinkedIn activity using Stagehand & Browserbase. I have enabled the proxy and mapped it to a nearby location as well. The account gets logged out when I perform one or two actions. The workflow: grab the cookies, user agent, and location from an already logged-in browser...

Timeout in Docker (with Camoufox image)

Hello everyone, I'm trying to create a scraper with Crawlee + Camoufox that I'll run in a Docker container. To do this, I used the Apify image for Camoufox (https://github.com/apify/apify-actor-docker/tree/master/node-playwright-camoufox) and followed the same tutorial as this one: https://docs.apify.com/sdk/js/docs/guides/docker-images...
correct-apricot, 5/4/2025

Wiping session between inputs

Hello! I'm crawling/scraping a site that involves the following steps for each input: 1. entering some data, 2. clicking "Load more" a bunch of times, 3. collecting the output...

preNavigationHooks not followed

I'm using the Camoufox JS integration. If I log something before the await page.route, it works; inside page.route, it doesn't. ```typescript preNavigationHooks: [...
fascinating-indigo, 4/19/2025

Proxy settings appear to be cached

Hi, I'm trying to use residential proxies on a PlaywrightCrawler, but it appears that even when I comment out the proxyConfiguration, there is still an attempt to use a proxy. I created a fresh project with a minimal test to debug it, and it worked fine until I had a proxy failure; then it happened again. The error is: WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session... ...
fascinating-indigo, 4/18/2025

Caching requests for development and testing

Hi, I'm wondering what people are doing (if anything) to record and replay requests while building scrapers. A lot of scraper building is trial and error, making sure you have the right selectors, JSON paths, etc., so I end up running my code quite a few times. Ideally I'd cache the initial request to each endpoint and replay it when it's requested again, just for development, so I'm not continually hitting the website (both for politeness and to reduce the chance of triggering any anti-bot measures). Thinking back to my Ruby days, there was a package called VCR that would do this if you instantiated it before making HTTP requests, with ways to invalidate the cache. In JS there's Netflix's Polly, which I'm going to try out shortly, but I'm interested to hear what other people are doing/using, if anything....
fascinating-indigo, 4/16/2025
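
A minimal do-it-yourself take on the record/replay idea above, assuming the requests you are iterating on can go through a helper of your own (it does not hook into Crawlee's internal HTTP client): the first request for a URL is fetched and saved to disk, and later runs read it back. cachedFetch and the .dev-cache directory are hypothetical names; the default fetcher assumes Node 18+ for the global fetch, and deleting the directory invalidates the cache.

```typescript
import { createHash } from "node:crypto";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Anything that turns a URL into a response body (injectable for tests).
export type Fetcher = (url: string) => Promise<string>;

// Default fetcher: the global fetch available in Node 18+.
const networkFetch: Fetcher = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
};

// Hypothetical record/replay helper for development only.
export async function cachedFetch(
  url: string,
  cacheDir = ".dev-cache",
  fetcher: Fetcher = networkFetch,
): Promise<string> {
  // One file per URL, named by its SHA-256 hash.
  const key = createHash("sha256").update(url).digest("hex");
  const file = join(cacheDir, `${key}.html`);
  try {
    return await readFile(file, "utf8"); // replay a recorded response
  } catch (err) {
    // A missing file just means "not recorded yet"; rethrow anything else.
    if ((err as { code?: string }).code !== "ENOENT") throw err;
  }
  const body = await fetcher(url); // record a fresh response
  await mkdir(cacheDir, { recursive: true });
  await writeFile(file, body, "utf8");
  return body;
}
```

Because entries are keyed only by URL, requests that differ in method, headers, or body would share one recording; VCR-style tools such as Polly also match on those.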

Customising logging

Is there a recommended way to customise logging? I want to be able to log which specific crawler and which handler a log is coming from. I have tried to override the logger in the crawler using ```import defaultLog, { Log } from '@apify/log'; ... const crawler = new BasicCrawler({ requestHandler: router,...
generous-apricot, 4/13/2025

How to clear cookies?

I need to clear the cookies for a website before requesting it using the CheerioCrawler, how do I do it? TIA
foreign-sapphire, 4/11/2025

Browserless + Crawlee

Hello, is there any way to run Crawlee on Browserless?...
exotic-emerald, 4/9/2025

How to handle a 403 error response with Puppeteer and JS when clicking a button that calls an API

We are building a scraper for a site that uses client-side pagination. When we click Next page, it calls the API, but the API returns 403 because it detects that the request is coming from a bot. How can we bypass that, either when opening the browser or while scraping? Any suggestion will be helpful....
absent-sapphire, 4/6/2025

Request works in Postman but doesn't work in the crawler, even with a full browser

Hello, I'm trying to handle an AJAX call via got-scraping. I prepared the call in Postman, where it works fine, but when I try it in an Actor I get a 403 every time. Even if I try it via Puppeteer or Playwright and click the button that triggers the request, I get a response with a geo.captcha-delivery.com/captcha URL to solve. Can anybody give me any advice on how to handle this issue?...