Apify & Crawlee

This is the official developer community of Apify and Crawlee.

Channels: crawlee-js · apify-platform · crawlee-python · 💻hire-freelancers · 🚀actor-promotion · 💫feature-request · 💻creators-and-apify · 🗣general-chat · 🎁giveaways · programming-memes · 🌐apify-announcements · 🕷crawlee-announcements · 👥community

inland-turquoise · 4/17/2025

Routers not working as expected

Hello everyone! First of all, thanks for this project; it looks really good and promising. I'm considering using Crawlee as an alternative to Scrapy....
other-emerald · 4/15/2025

Dynamically change dataset id based on root_domain

Hey folks. I've attached an example of my code as a snippet. Is it possible to dynamically change the dataset id so that each link has its own dataset?...
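A minimal stdlib-only sketch of one way to approach this: derive a per-domain dataset name from each URL and pass it when storing results. The helper below is hypothetical (it is not part of Crawlee's API); recent versions of Crawlee for Python let `context.push_data` target a named dataset, which could consume a name like this, but treat that as an assumption to verify against your installed version.

```python
from urllib.parse import urlparse

def dataset_name_for(url: str) -> str:
    """Derive a storage-safe dataset name from a URL's hostname (hypothetical helper)."""
    host = urlparse(url).hostname or "default"
    # Strip a leading "www." and replace dots, since storage names are usually restricted
    if host.startswith("www."):
        host = host[len("www."):]
    return host.replace(".", "-")

print(dataset_name_for("https://www.example.com/page"))  # example-com
print(dataset_name_for("http://sub.site.org/x"))         # sub-site-org
```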
foreign-sapphire · 4/9/2025

Handling of 4xx and 5xx in default handler (Python)

I built a crawler for crawling websites and am now trying to add functionality to also handle error pages/links (4xx and 5xx). I was not able to find any documentation about that, so my question is: is this supported, and if yes, where should I look?...
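Crawlee's crawlers expose hooks for failed requests (Crawlee for Python lets you register a failed-request handler on the crawler), but whether a 4xx response even reaches the default handler depends on version and settings, so check the docs for your release. The status-code classification itself is plain Python; a stdlib-only sketch:

```python
def classify_status(status_code: int) -> str:
    """Bucket an HTTP status code (plain helper, independent of any crawler)."""
    if 400 <= status_code < 500:
        return "client_error"   # e.g. 404 Not Found
    if 500 <= status_code < 600:
        return "server_error"   # e.g. 503 Service Unavailable
    return "ok"

print(classify_status(404))  # client_error
print(classify_status(503))  # server_error
```

A handler could use this to route error pages into a separate dataset instead of treating them as successful scrapes.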
conscious-sapphire · 4/6/2025

Camoufox and adaptive Playwright

Hello great friends of Crawlee, I was wondering if there is any way to use Camoufox with the adaptive Playwright browser? It seems to throw an error when I try to add the browser pool....
ratty-blush · 4/4/2025

Why do I get scraped content from the first URL when I pass a second URL?

I implemented a Playwright crawler to parse a URL. I made a single request to the crawler with the first URL; while that request was still processing, I passed another URL to the crawler and sent the request. The crawler returned content from the first URL instead of the second URL both times. Can you please help? async def run_crawler(url, domain_name, save_path=None): print("doc url inside crawler file====================================>", url)...
metropolitan-bronze · 4/1/2025

proxy_config.new_url() does not return a new proxy

Here is my Selenium Python script, where I try to rotate proxies using proxy_config.new_url(): ```python # Standard libraries import asyncio import logging...
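Whether `new_url()` rotates depends on how the ProxyConfiguration was built: if it was created with a single proxy URL, it will naturally keep returning that one. A stdlib round-robin rotator shows the behavior the poster expects; the class below is an illustrative stand-in, not Crawlee's or Selenium's API:

```python
from itertools import cycle

class ProxyRotator:
    """Yield proxies in round-robin order (illustrative stand-in, not a library class)."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._pool = cycle(proxy_urls)

    def new_url(self) -> str:
        return next(self._pool)

rotator = ProxyRotator(["http://p1:8000", "http://p2:8000"])
print(rotator.new_url())  # http://p1:8000
print(rotator.new_url())  # http://p2:8000
print(rotator.new_url())  # http://p1:8000
```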
other-emerald · 3/22/2025

Proxy example with PlaywrightCrawler

This is probably a simple fix, but I cannot find an example of Crawlee using a simple proxy link with Playwright. If anyone has a working example or knows what is wrong in the code, I would really appreciate your help. Here is the code I have been working with (I wish I could paste all of the code here, but the post goes over the character limit). I get the following error from the code: ...
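One common pitfall with proxies is the URL format itself, especially credentials containing special characters. A stdlib sketch of building and sanity-checking a `scheme://user:pass@host:port` proxy URL (host, port, and credentials below are placeholders):

```python
from urllib.parse import urlparse, quote

def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Assemble a proxy URL, percent-encoding credentials (placeholder values)."""
    auth = ""
    if user and password:
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"{scheme}://{auth}{host}:{port}"

url = build_proxy_url("proxy.example.com", 8000, "user", "p@ss")
parsed = urlparse(url)
print(parsed.hostname, parsed.port)  # proxy.example.com 8000
```

Note the `@` in the password becomes `%40`; an unencoded `@` would make the URL parser split the host in the wrong place.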
No description
fair-rose · 3/16/2025

Input schema is not valid (Field schema.properties.files.enum is required)

input_schema.json ``` { "title": "Base64 Image Processor", "type": "object",...
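That validation error usually means a property uses a select-style editor without an `enum`: the Apify input schema requires `enum` for string properties rendered as a select. A hedged sketch of what the `files` property might need; the field names and enum values below are guesses based on the error message, not the poster's actual schema:

```json
{
    "files": {
        "title": "Files",
        "type": "string",
        "editor": "select",
        "enum": ["base64", "url"],
        "enumTitles": ["Base64 input", "URL input"]
    }
}
```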
stormy-gold · 3/13/2025

Issues Creating an Intelligent Crawler & Constant Memory Overload

Hey there! I am creating an intelligent crawler using Crawlee. I was previously using crawl4ai but switched, since Crawlee seems much better at anti-blocking. The main issue I am facing: I want to filter the URLs to crawl for a given page using LLMs. Is there a clean way to do this? So far I implemented a transformer for enqueue_links which saves the links to a dict, and then I process those dicts at a later point in time using another crawler object. Any other suggestions to solve this problem? I don't want to make the LLM call in the transform function, since that would be one LLM call per URL found, which is quite expensive. Also, when I run this on my EC2 instance with 8 GB of RAM, it constantly runs into memory overload and just gets stuck, i.e. doesn't even continue scraping pages. Any idea how I can resolve this? This is my code currently...
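On the cost point, a stdlib-only sketch of batching discovered links so that one LLM call scores a whole batch of URLs instead of one call per URL. `score_batch` is a placeholder for the actual model call; its name, signature, and the 0-to-1 score convention are assumptions for illustration:

```python
def batched(items, batch_size):
    """Split a list into consecutive chunks so each LLM call can score a whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def filter_urls(urls, score_batch, keep_threshold=0.5, batch_size=50):
    """Keep URLs whose score meets the threshold; score_batch stands in for one LLM call."""
    kept = []
    for batch in batched(urls, batch_size):
        scores = score_batch(batch)  # one call per batch, not per URL
        kept.extend(u for u, s in zip(batch, scores) if s >= keep_threshold)
    return kept
```

Collecting links in a transformer (as the poster already does) and running them through a batched filter like this keeps the number of model calls proportional to pages, not links.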
conscious-sapphire · 3/7/2025

Selenium + Chrome Instagram Scraper cannot find the Search button when I run it in Apify

Hey everyone, I have built an Instagram scraper using Selenium and Chrome that works perfectly until I deploy it as an actor here on Apify. It signs in fine, but fails every time, no matter what I do or try, when it gets to the Search button....
conscious-sapphire · 3/5/2025

Error on cleanup PlaywrightCrawler

I use PlaywrightCrawler with headless=True. The package I use is crawlee[playwright]==0.6.1. When running the crawler, I noticed that while waiting for remaining tasks to finish it sometimes receives an error, as you can see in the screenshot. Is this something that can be resolved easily? ...
conscious-sapphire · 3/3/2025

Google Gemini Applet - Google Module Not Found (even though it is there)

Hey all, I have a question about whether I can actually use Apify to access Google Gemini for video analysis: I've built my own Python version of the Gemini Video Analyzer applet that analyzes social media videos for content style, structure, and aesthetic qualities, and it works. I have installed all the Google dependencies required, but when I try to run it as an actor using "apify run --purge", no matter what I do it says no module named google found. Is this a bug with Apify? ...
conscious-sapphire · 3/2/2025

"apify run" no longer able to detect Python

Hey all, I successfully deployed one actor yesterday and followed all the same steps to deploy my next actor, but now the Apify CLI can no longer detect Python when I run "apify run", which is strange because it had to detect it in order to build the first actor. This output from my terminal shows that the CLI can't detect Python, even though I can check the version without a problem: PS C:\Users\Ken\New PATH py\testing-it> apify run --purge...
flat-fuchsia · 2/26/2025

Django Google Maps Reviews: Pulling Data into a local Django app

Hi hi, I'm looking for guidance on how I could interact with the Google Maps Scraper in my Django application. I already have a model and a view to which I would like to add the individual reviews from a particular listing. NB: I have numerous listings that I will also need to get the reviews from and present based on their own url/details...
extended-salmon · 2/18/2025

Is it recommended to use Crawlee without the Apify CLI?

Is it recommended to use Crawlee without the Apify CLI? I am using the library because of how practical it is for creating crawlers, and I want to hear about the experience of other devs using it the same way I am.
extended-salmon · 2/18/2025

How can I change/save the logger that the context provides?

The handler context provides a context.log, but I want to change/save the logger it uses, because I want to persist its output. I am using Crawlee without the Apify CLI.
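Crawlee for Python builds on the standard logging module, so as a general Python technique (not Crawlee-specific guidance), attaching a FileHandler to the relevant named logger persists its output. The logger name "crawlee" and the log path below are assumptions; adjust them to whatever logger the handler context actually uses:

```python
import logging
import os
import tempfile

def attach_file_handler(logger_name, path):
    """Attach a FileHandler so a named logger's records are also written to disk."""
    logger = logging.getLogger(logger_name)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return handler

# Assumed logger name and path, for illustration only.
log_path = os.path.join(tempfile.gettempdir(), "crawlee-example.log")
handler = attach_file_handler("crawlee", log_path)
logging.getLogger("crawlee").warning("saved to %s", log_path)
handler.flush()
```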
optimistic-gold · 2/3/2025

How can I add my own cookies to the crawler?

Hi, I'm using Crawlee to fetch some data, but I don't know how to add my own cookies to my crawler. I'm using Playwright to fetch cookies, and after that I want to pass them (in a session, if possible) to my BeautifulSoupCrawler.
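Playwright's `context.cookies()` returns a list of dicts with `name`/`value` keys (plus domain, path, etc.). A stdlib sketch of converting that list into a simple mapping and a `Cookie` request-header string; how to feed either into a BeautifulSoupCrawler session varies by Crawlee version, so that step is deliberately left out:

```python
def cookies_to_mapping(playwright_cookies):
    """Collapse Playwright-style cookie dicts into a simple name -> value mapping."""
    return {c["name"]: c["value"] for c in playwright_cookies}

def cookies_to_header(playwright_cookies):
    """Render cookies as a single Cookie request-header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in playwright_cookies)

# Sample data in the shape Playwright returns (values are placeholders).
sample = [
    {"name": "sessionid", "value": "abc123", "domain": ".example.com"},
    {"name": "csrftoken", "value": "xyz", "domain": ".example.com"},
]
print(cookies_to_header(sample))  # sessionid=abc123; csrftoken=xyz
```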
conscious-sapphire · 2/3/2025

SAME_HOSTNAME not working on non-www URLs

When using EnqueueStrategy.SAME_HOSTNAME, I noticed it does not work properly on non-www URLs. In the debugger I noticed that the origin is passed to _check_enqueue_strategy, but it uses context.request.loaded_url if available. So every URL that is checked will mismatch because of the difference in hostname. ...
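The mismatch described here can be reproduced with a plain hostname comparison: "example.com" and "www.example.com" are different strings even though they usually serve the same site. A stdlib sketch of a www-insensitive check; this is the comparison one might expect, not Crawlee's actual _check_enqueue_strategy implementation:

```python
from urllib.parse import urlparse

def same_hostname_ignoring_www(url_a, url_b):
    """Compare hostnames, treating 'www.example.com' and 'example.com' as equal."""
    def norm(url):
        host = (urlparse(url).hostname or "").lower()
        return host[len("www."):] if host.startswith("www.") else host
    return norm(url_a) == norm(url_b)

print(same_hostname_ignoring_www("https://example.com/a", "https://www.example.com/b"))  # True
print(same_hostname_ignoring_www("https://example.com/a", "https://example.org/b"))      # False
```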
fascinating-indigo · 1/27/2025

Testing my first actor

Hi there. I'm coming from ScraperAPI-style solutions and I'm having issues with them, so I just want to try Apify. I'm trying to build my first actor, without any success so far. The test actor sample offers a full example, which sounds great, but I get an error whenever I use a URL other than the default one (https://www.apify.com). For example, when I try https://fr.indeed.com I get an error. Any idea?...
conscious-sapphire · 1/27/2025

Chromium sandboxing failed

I run Crawlee in a Docker container. That Docker container is used in a Jenkins task. When starting the crawler I receive the following error: ``` Browser logs: Chromium sandboxing failed!...