Apify & Crawlee


This is the official developer community of Apify and Crawlee.


fascinating-indigo · 4/13/2025

How to clear cookies?

I need to clear the cookies for a website before requesting it with CheerioCrawler. How do I do it? TIA
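The thread has no confirmed answer; one possible approach is sketched below, assuming Crawlee's `preNavigationHooks` and `persistCookiesPerSession` options behave as documented. The `stripCookieHeader` helper is invented for illustration and is not part of Crawlee.

```javascript
// Hypothetical helper: remove cookie headers (case-insensitively)
// from a plain headers object, returning a cleaned copy.
function stripCookieHeader(headers = {}) {
  const clean = {};
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() !== 'cookie') clean[name] = value;
  }
  return clean;
}

// Sketch of the crawler wiring (assumes Crawlee's CheerioCrawler API):
// const crawler = new CheerioCrawler({
//   persistCookiesPerSession: false, // don't carry cookies between requests
//   preNavigationHooks: [
//     async ({ request }) => {
//       request.headers = stripCookieHeader(request.headers);
//     },
//   ],
//   requestHandler: async ({ request, $ }) => { /* ... */ },
// });
```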
fascinating-indigo · 4/11/2025

Browserless + Crawlee

Hello, is there any way to run Crawlee on Browserless?...
conscious-sapphire · 4/9/2025

How to handle a 403 response in Puppeteer and JS when clicking a button that hits an API

We are building a scraper for a site that uses client-side pagination. When we click "Next page" it calls an API, but the API returns 403 because they detect the request is coming from a bot. How can we bypass that when opening the browser or while scraping? Any suggestion will be helpful....
like-gold · 4/6/2025

Request works in Postman but doesn't work in the crawler, even with a full browser

Hello, I'm trying to make an AJAX call via got-scraping. I prepared the call in Postman, where it works fine, but when I try it in an Actor I get a 403 every time. Even if I try it via Puppeteer or Playwright and click the button that fires the request, I get a response with a geo.captcha-delivery.com/captcha URL to solve. Can anybody give me advice on how to handle this issue?...
extended-salmon · 4/1/2025

About RESIDENTIAL proxies

Hi all, what is your experience with RESIDENTIAL proxies? Let's share:
- provider URL
- price per GB of residential traffic...
fascinating-indigo · 3/28/2025

served with unsupported charset/encoding: ISO-88509-1

Reclaiming failed request back to the list or queue. Resource http://www.etmoc.com/look/Looklist?Id=47463 served with unsupported charset/encoding: ISO-88509-1
fascinating-indigo · 3/28/2025

Cannot detect CDP client for Puppeteer

Hi, how do I fix this? `Failed to compile...
fair-rose · 3/20/2025

error in loader module

Hi! I'm getting an error with Lodash in Crawlee, please help. I ran the Actor and got this error. I tried switching to different versions of Crawlee, but the error still persists. node:internal/modules/cjs/loader:1140...
sensitive-blue · 3/20/2025

Saving working configurations & sessions for each site

Hi! I'm new to Crawlee. I'm super excited to migrate my scraping architecture to Crawlee, but I can't find how to achieve this. My use case: ...
other-emerald · 3/15/2025

Request queue with id: [id] does not exist

I created an API with Express that runs Crawlee when an endpoint is called. Weirdly, it works completely fine on the first request I make to the API but fails on the next ones with the error: Request queue with id: [id] does not exist....
robust-apricot · 3/14/2025

Only-once storage

Hello all, I'm trying to understand how Crawlee uses storage a little better and have a question: Crawlee truncates the storage of all indexed pages every time I run. Is there a way to stop it doing that? Almost like using it as an append-only log for new items found....
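For reference, Crawlee purges its default storages at startup by default, and disabling that purge is one way to get append-only behaviour across runs. A sketch, assuming the documented `purgeOnStart` setting and its environment variable:

```javascript
// Option 1: environment variable, set before the run:
//   CRAWLEE_PURGE_ON_START=0 node main.js
//
// Option 2: in code, via the global configuration (assumes Crawlee's
// Configuration API):
//   import { Configuration } from 'crawlee';
//   Configuration.getGlobalConfig().set('purgeOnStart', false);
```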

Camoufox failing

I have a project that uses Crawlee's PlaywrightCrawler. If I create the Camoufox template, it runs perfectly, but when I take the same commands from the template's package.json and basically follow the same example in my own project, I get the following error: ``` 2025-03-13T11:58:38.513Z [Crawler] [INFO ℹ️] Finished! Total 0 requests: 0 succeeded, 0 failed. {"terminal":true}...
probable-pink · 3/12/2025

Redirect Control

I'm trying to make a simple crawler; how do I properly control redirects? Some bad proxies sometimes redirect to an auth page, and in that case I want to mark the request as failed if the redirect target URL contains something like /auth/login. What's the best way to handle this scenario and abort the request early?
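No replies yet; one way to express the check is a small predicate over the redirect target, then throwing from the handler so Crawlee marks the request failed and retries or abandons it. The path list and the wiring comment are assumptions, not confirmed Crawlee idiom.

```javascript
// Example block-list of redirect targets that should fail the request.
const BLOCKED_PATHS = ['/auth/login', '/login'];

// Hypothetical helper: true when a redirect target lands on a blocked path.
function isBlockedRedirect(targetUrl, blocked = BLOCKED_PATHS) {
  try {
    const { pathname } = new URL(targetUrl);
    return blocked.some((p) => pathname.startsWith(p));
  } catch {
    return true; // unparsable target: treat as bad
  }
}

// Sketch of the wiring: in the requestHandler (or a post-navigation hook),
// compare the final URL with the requested one and bail out early:
// if (isBlockedRedirect(response.url)) {
//   throw new Error(`Redirected to blocked page: ${response.url}`);
// }
```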
foreign-sapphire · 3/11/2025

TypeError: Invalid URL

Adding requests with crawler.run(["https://website.com/1234"]); works locally, while on the Apify cloud it breaks with the following error: Reclaiming failed request back to the list or queue. TypeError: Invalid URL. It appears that while running in the cloud, the URL is split by character and each character creates a request in the queue, as can be seen in the screenshot. The bug happens whether the URL is hardcoded in the code or added dynamically via input....
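The per-character queue is the classic symptom of a bare string reaching code that expects an array: strings are iterable, so each character becomes one "request". A defensive sketch; `toRequestList` is a hypothetical helper, and the `crawler.run()` shape comes from the question.

```javascript
// Hypothetical guard: always hand crawler.run() an array of URLs,
// whether the input arrived as a single string or as a list.
function toRequestList(input) {
  if (typeof input === 'string') return [input];
  if (Array.isArray(input)) return input;
  throw new TypeError(`Expected a URL string or an array, got ${typeof input}`);
}

// await crawler.run(toRequestList(startUrls)); // safe for both shapes
```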

How to ensure dataset is created before pushing data to it?

I have a public Actor, and some of my users find that the default and/or named datasets don't seem to exist and somehow won't be created when pushing data to them. This is the error message, affecting only a handful of user runs: ```bash ...
stormy-gold · 3/5/2025

Routing issue

I have a listing website as INPUT and enqueueLinks for it. These links (case studies) also have multiple pages each. When the crawler adds the links with the new label attached, nothing happens. When using only the case-study page, it scrapes the data and works. I'm not sure what to do next or how to test it further. Does the queue system wait until all links are added before it starts scraping?
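One thing worth checking, sketched below under the assumption that the project uses Crawlee's router: a label only does something if a handler is registered for it, otherwise the labelled requests are never handled and it can look like "nothing happens". The selector and label names are invented for illustration.

```javascript
// import { createCheerioRouter } from 'crawlee';
// const router = createCheerioRouter();
//
// // Default handler: runs for the listing INPUT page, enqueues case studies.
// router.addDefaultHandler(async ({ enqueueLinks }) => {
//   await enqueueLinks({ selector: 'a.case-study', label: 'CASE_STUDY' });
// });
//
// // Without this registration, 'CASE_STUDY' requests go unprocessed.
// router.addHandler('CASE_STUDY', async ({ request, $ }) => {
//   // scrape the case-study page, enqueue its pagination, etc.
// });
```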
rare-sapphire · 3/5/2025

Using BrightData's socks5h proxies

BrightData's datacenter proxies can be used with socks5, but only with remote DNS resolution, so the protocol should be given as socks5h://... Testing it with curl works, but in Crawlee it doesn't work; it just keeps hanging. ```...
stormy-gold · 3/1/2025

Load time

Hello, is there a way to get the load time of a site from Crawlee in headless mode? I'm using PlaywrightCrawler. Thanks!...
deep-jade · 3/1/2025

How to stop following delayed javascript redirects?

I'm using the AdaptivePlaywrightCrawler with the same-domain strategy in enqueueLinks. The page I'm trying to crawl has delayed JavaScript redirects to other pages, such as Instagram. Sometimes the crawler mistakenly thinks it's still on the same domain after a redirect and starts adding Instagram URLs to the main domain, like example.com/account/... and example.com/member/..., which don't actually exist. So how can I stop it following these delayed JavaScript redirects?
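One possible mitigation, sketched below: filter enqueued URLs against the hostname the crawl is supposed to stay on, instead of trusting the post-redirect "current domain". `ALLOWED_HOST`, the `sameHost` helper, and the `enqueueLinks` wiring are assumptions based on Crawlee's documented `transformRequestFunction` (returning false skips a request).

```javascript
// Example: the domain the crawl is supposed to stay on (assumed).
const ALLOWED_HOST = 'example.com';

// Hypothetical helper: true only when a URL's hostname is the allowed
// domain or one of its subdomains, regardless of what the page's
// current location claims after a delayed redirect.
function sameHost(url, allowed = ALLOWED_HOST) {
  try {
    const { hostname } = new URL(url);
    return hostname === allowed || hostname.endsWith(`.${allowed}`);
  } catch {
    return false;
  }
}

// Sketch of the enqueueLinks wiring:
// await enqueueLinks({
//   strategy: 'same-domain',
//   transformRequestFunction: (req) => (sameHost(req.url) ? req : false),
// });
```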

Replace default logger

Hello, did anybody manage to completely replace Crawlee's logs with console logs? If yes, can you please share your implementation?...