Routers not working as expected
Dynamically change dataset id based on root_domain
Handling of 4xx and 5xx in default handler (Python)
Camoufox and adaptive playwright
Hey, why do I only get scraping results for the first URL when I have another URL as well?
proxy_config.new_url() does not return new proxy
Proxy example with PlaywrightCrawler

Input schema is not valid (Field schema.properties.files.enum is required)
Issues Creating an Intelligent Crawler & Constant Memory Overload
I was previously using crawl4ai but switched, since Crawlee seems much better at anti-blocking.
The main issue I am facing is that I want to filter the URLs to crawl for a given page using LLMs. Is there a clean way to do this? So far I have implemented a transformer for enqueue_links that saves the links to a dict, and then I process those dicts at a later point in time using another crawler object. Any other suggestions for solving this? I don't want to make the LLM call inside the transform function, since that would mean one LLM call per URL found, which is quite expensive.
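A minimal sketch of that batching pattern in plain Python (the Crawlee wiring is omitted, and `llm_filter_batch` is a hypothetical stand-in for a single LLM call over a whole batch of links, faked here with a simple heuristic):

```python
from urllib.parse import urlparse

# Collected during the crawl, e.g. from an enqueue_links transform:
# page URL -> candidate links found on that page.
pending_links: dict[str, list[str]] = {}


def collect_link(page_url: str, link: str) -> None:
    """Called once per discovered link; cheap, no LLM involved."""
    pending_links.setdefault(page_url, []).append(link)


def llm_filter_batch(links: list[str]) -> list[str]:
    """Hypothetical stand-in for ONE LLM call over a whole batch.
    Here faked with a heuristic: keep only links whose path mentions 'blog'."""
    return [u for u in links if "blog" in urlparse(u).path]


def drain_batches(batch_size: int = 50) -> list[str]:
    """Flush the collected links through the LLM in batches, not per URL."""
    all_links = [u for links in pending_links.values() for u in links]
    kept: list[str] = []
    for i in range(0, len(all_links), batch_size):
        kept.extend(llm_filter_batch(all_links[i : i + batch_size]))
    pending_links.clear()
    return kept


collect_link("https://example.com", "https://example.com/blog/post-1")
collect_link("https://example.com", "https://example.com/about")
print(drain_batches())  # only the /blog/... link survives the filter
```

The filtered list can then be fed to the second crawler as its start requests, so the number of LLM calls scales with the batch count rather than the link count.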
Also, when I run this on my EC2 instance with 8 GB of RAM, it constantly runs into memory overload and just gets stuck, i.e. it doesn't even continue scraping pages. Any idea how I can resolve this? This is my code currently...
Selenium + Chrome Instagram Scraper cannot find the Search button when I run it in Apify.
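For the memory-overload question above: Crawlee's autoscaled pool sizes itself from the memory it believes it may use, and that limit can be capped with an environment variable. The variable name below is the one Crawlee documents; verify that your crawlee-python version honors it, and treat the value as a placeholder leaving headroom under the instance's 8 GB:

```shell
# Cap how much memory Crawlee assumes it can use (placeholder value;
# leave headroom below the instance's 8 GB).
export CRAWLEE_MEMORY_MBYTES=4096
```

Lowering the crawler's maximum concurrency (via Crawlee's concurrency settings, if your version exposes them) also reduces peak memory, since fewer browser pages are open at once.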
Error on cleanup PlaywrightCrawler
headless=True
The package that I use is: crawlee[playwright]==0.6.1
When running the crawler, I noticed that while it is waiting for the remaining tasks to finish, it sometimes receives an error like the one in the screenshot. Is this something that can be resolved easily?
...Google Gemini Applet - Google Module Not Found (even though it is there)
"apify run" no longer able to detect python
Django Google Maps Reviews: Pulling Data into local Django app
Is it recommended to use Crawlee without the Apify CLI?
How can I change/save the logger that the context provides
I use context.log, but I want to change the logger that is used so that I can save its output. I am using Crawlee without the Apify CLI.
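One way to persist those logs, assuming the crawler uses standard Python logging under the hood (as Crawlee for Python does): attach your own handler to the relevant logger. The logger name "my_crawler" below is an assumption; substitute whichever name your context's log actually uses, or attach the handler to the root logger so propagated records are captured too.

```python
import logging

# Build a logger that mirrors everything to a file as well as the console.
logger = logging.getLogger("my_crawler")  # assumed name; use your own
logger.setLevel(logging.INFO)

file_handler = logging.FileHandler("crawler.log")
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(file_handler)
logger.addHandler(logging.StreamHandler())  # keep console output too

logger.info("request handled")  # lands in crawler.log and on the console
```

If your crawler accepts a logger instance at construction time, pass this one in; otherwise, configuring the root logger with the same handlers achieves a similar effect for any library that uses standard logging.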
How can I add my own cookies to the crawler
SAME_HOSTNAME not working on non-www URLs
I noticed that EnqueueStrategy.SAME_HOSTNAME does not work properly on non-www URLs.
In the debugger I saw that the origin is passed to _check_enqueue_strategy, but it uses context.request.loaded_url when that is available.
So every URL that is checked will mismatch because of the difference in hostname.
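The mismatch is easy to reproduce with plain urlparse. A tolerant comparison that treats a leading "www." as equivalent to its absence (a workaround sketch for user code, not Crawlee's actual implementation) could look like:

```python
from urllib.parse import urlparse


def same_hostname(url_a: str, url_b: str) -> bool:
    """Compare hostnames while treating 'www.' as equivalent to its absence."""

    def norm(url: str) -> str:
        host = urlparse(url).hostname or ""
        return host[4:] if host.startswith("www.") else host

    return norm(url_a) == norm(url_b)


# The request was enqueued as non-www, but loaded_url redirected to www:
origin = "https://example.com/start"
loaded = "https://www.example.com/start"

print(origin == loaded)               # strict comparison: False -> links dropped
print(same_hostname(origin, loaded))  # tolerant comparison: True
```

Until the strategy check compares normalized hostnames, an explicit transform or filter like this on enqueued links can serve as a workaround.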
...
Testing my first actor
Chromium sandboxing failed