Apify & Crawlee

This is the official developer community of Apify and Crawlee.

Fingerprints for sessions

I found this in the JS docs. Is this also implemented for Python? How can I generate fingerprints based on session_id? https://crawlee.dev/js/api/browser-pool/interface/FingerprintOptions#useFingerprintCache...
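No answer is recorded in this thread. As a hedged sketch of one workaround, assuming crawlee-python draws its fingerprints from the browserforge package, you could cache one generated fingerprint per session_id yourself (the dict and helper below are illustrative, not a Crawlee API):

```python
from browserforge.fingerprints import FingerprintGenerator

# Sketch under assumptions: a per-session fingerprint cache like the JS
# useFingerprintCache option is not documented for Python, so this keeps one
# fingerprint per session_id in a plain dict.
generator = FingerprintGenerator()
session_fingerprints: dict[str, object] = {}

def fingerprint_for(session_id: str):
    """Return a stable fingerprint for a session, generating it only once."""
    if session_id not in session_fingerprints:
        session_fingerprints[session_id] = generator.generate()
    return session_fingerprints[session_id]
```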

Packaging crawlee with pyinstaller

More of an open question, but has anyone had luck using PyInstaller to package Crawlee into an executable? I'm currently getting a FileNotFoundError after packaging the script, where Crawlee can't find a file in Local\Temp\ [some random list of letters] \apify_fingerprint_datapoints\data\input-network-definition; PyInstaller has some notes in its FAQ about import errors, but this seems very specific?...
Solution:
Yep, it ended up being a mix of the two acting weird. Running something like:
```powershell
$env:PLAYWRIGHT_BROWSERS_PATH='0'  # Playwright config command to force it to acknowledge the browser you want to use, because it breaks if you run PyInstaller by itself
playwright install firefox  # use whatever browser you're using Crawlee with
```
...
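For the missing data file itself, a hedged complement (assuming the package name apify_fingerprint_datapoints, taken from the error path above) is to bundle its data files in the PyInstaller .spec:

```python
# In the .spec file: collect the package data the frozen app could not find.
# collect_data_files is a standard PyInstaller hook utility; the package name
# is inferred from the error path and is otherwise an assumption.
from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files('apify_fingerprint_datapoints')
# Then pass datas=datas to the Analysis(...) call in the same .spec file.
```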

Set user-agents for BeautifulSoupCrawler

Is it possible to set user-agents for bs4 crawling when there is no need to use a browser crawler? And what about for specific sessions?
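A minimal sketch of one way to do both, assuming HttpxHttpClient forwards extra keyword arguments (such as headers) to the underlying httpx.AsyncClient and that Request.from_url accepts a headers mapping:

```python
import asyncio

from crawlee import Request
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from crawlee.http_clients import HttpxHttpClient

async def main() -> None:
    # Crawler-wide default User-Agent (assumed kwarg passthrough to httpx).
    crawler = BeautifulSoupCrawler(
        http_client=HttpxHttpClient(headers={'User-Agent': 'my-crawler/1.0'}),
    )

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'{context.request.url}: {context.soup.title}')

    # Per-request override via headers on the Request itself.
    await crawler.run([
        Request.from_url(
            'https://example.com',
            headers={'User-Agent': 'per-request-agent/1.0'},
        ),
    ])

asyncio.run(main())
```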

Crawlee's Compatibility with Kivy

Hey all, I was wondering if there are any specific compatibility issues with using Crawlee within a Kivy app, because whenever I run the attached script to grab the name of a product on Amazon, it gives a recursion error that stems from the default_handler, specifically when the URL is pulled from the context. FWIW, the UI pipeline is App -> ScreenManager -> Main Screen & Second Screen, wherein the main screen transitions to the second screen via switch_to()...
Solution:
Hey @Glitchy_mess, try adding configure_logging=False when initializing the crawler. ```python...
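A minimal sketch of that suggestion (the crawler class is a placeholder; configure_logging=False keeps Crawlee from reinstalling its log handlers, which is the usual source of clashes with frameworks like Kivy that also take over the root logger):

```python
from crawlee.crawlers import PlaywrightCrawler

# configure_logging=False stops Crawlee from reconfiguring logging at startup,
# leaving Kivy's logger setup untouched.
crawler = PlaywrightCrawler(configure_logging=False)
```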

Selenium + residential proxies

Hi all, I'm a bit lost and need some help setting chrome_options in the Selenium init to use residential proxies. I've tested a few ways and Chrome seems to ignore the proxy and use the host server IP....
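There is no recorded solution in this thread; as a hedged starting point, --proxy-server is the standard Chrome flag for routing traffic through a proxy (the gateway address below is a placeholder):

```python
from selenium import webdriver

# Note: --proxy-server carries no credentials, so authenticated residential
# proxies typically need selenium-wire or a small proxy-auth extension instead.
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=http://proxy.example.com:8000')

driver = webdriver.Chrome(options=options)
driver.get('https://httpbin.org/ip')  # check which exit IP the page reports
print(driver.page_source)
driver.quit()
```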

Memory access problem on macOS

I'm deploying on Linux, but my development system is macOS. I discovered an issue that appears with concurrency > 2. I have plenty of memory, so that's not actually the problem, but in any case it should probably fail gracefully. The Snapshotter tries to access child-process memory via psutil.Process.memory_full_info(),...
Solution:
I think it is a bug introduced when improving memory usage estimation on Linux. Unfortunately, we have tests only for Windows and Linux in CI, so macOS support is somewhat fragile. Thanks for reporting it; I will take a look. https://github.com/apify/crawlee-python/issues/1329...

Unexpected behavior with Statistics logging

I wanted to turn off the periodic Statistics logging; my crawls are relatively short, and I'm only interested in the final statistics. I could set the log_interval to something really long. I thought that setting periodic_message_logger to None would prevent logging, but that doesn't work: the code tests for it being None and falls back to the default logger in Crawlee....
Solution:
This is expected behavior: periodic_message_logger expects an external logger, and if it is None, the default logger is used. You can achieve your goal by doing the following: ```python...
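A hedged sketch of what that might look like (Statistics.with_default_state and the parameter names follow recent crawlee-python versions; the quiet-logger trick is an assumption, not the verbatim answer from the thread):

```python
import logging
from datetime import timedelta

from crawlee.crawlers import ParselCrawler
from crawlee.statistics import Statistics

# Instead of None (which falls back to the default logger), pass an explicit
# logger whose level filters out the periodic INFO messages.
quiet_logger = logging.getLogger('quiet_statistics')
quiet_logger.setLevel(logging.ERROR)

crawler = ParselCrawler(
    statistics=Statistics.with_default_state(
        periodic_message_logger=quiet_logger,
        log_interval=timedelta(minutes=10),  # belt and suspenders
    ),
)
```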

StorageClients w/ Multiple Crawlers

Hi! This is my first time using Crawlee, and ... so far, so good. It's working. However, I noticed it was using the default FileSystemStorage and creating files locally on my development machine. That's less than ideal in production. Changing to MemoryStorageClient revealed some other problems....
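A hedged sketch of switching the default storage globally (service_locator and MemoryStorageClient per recent crawlee-python versions; the call has to happen before any crawler or storage is first created):

```python
from crawlee import service_locator
from crawlee.storage_clients import MemoryStorageClient

# Swap the default file-system storage for the in-memory client so nothing
# is written to ./storage on disk; all crawlers then share this client.
service_locator.set_storage_client(MemoryStorageClient())
```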

Issue with Instagram Reels Playcount Scraper – Restricted Page Errors

Hi team, I’m using the Instagram Reels Playcount Scraper actor to extract play counts from a list of public Instagram reel URLs. Many of the reels are returning an error: "restricted_page", even though they are accessible in a regular browser without login. Examples of such URLs:...

Get metadata in response of /run-sync via API

Hi, I built an actor that runs smoothly, but now I am having trouble accessing all the relevant data via API calls. The goal is to start the run via /run-sync and later access the files that the run stored in the keyValueStore via the API. My problem is that when I start the run via /run-sync, the run starts and returns its result, but there is no ID or other information that would let me know which keyValueStore the files were stored in, so I can't access the files. Ideally the file URLs would already be included in the response, but it would also be okay to just get an ID of the store and then make another request to fetch the files....
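A hedged sketch using the official Python client instead of the raw endpoint: .call() waits for the run to finish and returns the run object, which carries defaultKeyValueStoreId (the token and actor ID below are placeholders):

```python
from apify_client import ApifyClient

client = ApifyClient('MY-APIFY-TOKEN')  # placeholder token
run = client.actor('username/my-actor').call(run_input={'startUrls': []})

# The run object includes the IDs of its default storages.
store = client.key_value_store(run['defaultKeyValueStoreId'])
for item in store.list_keys()['items']:
    record = store.get_record(item['key'])
    print(item['key'], type(record['value']))
```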

Using browser_new_context_options with PlaywrightBrowserPlugin

Hello, I'm very confused about how to use browser_new_context_options, because the error I get implies that storage_state is not a parameter that Playwright's new_context call supports. However, according to the Playwright docs, storage_state supports uploading cookies as a dictionary, and when I run Playwright by itself and upload cookies through this method, everything works fine. A lot of the cookie-related questions in this forum seem to predate the most recent build, so I was wondering what the syntax should be to properly load cookies through the PlaywrightBrowserPlugin class, as everything else seems to work just fine once that's sorted....
Solution:
Figured out the issue in DMs. For anyone looking up this sort of issue in the future, you'd want something akin to this script; some lines of interest are:
```
"expires": int(parts[4]) if parts[4] else None,  # type: ignore
"http_only": parts[3].lower() == 'true',
```
...
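For the broader question in the thread title, a hedged sketch of passing storage_state through the plugin (class and parameter names per recent crawlee-python versions; 'auth_state.json' is a placeholder for a file produced by Playwright's context.storage_state(path=...)):

```python
from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin
from crawlee.crawlers import PlaywrightCrawler

# browser_new_context_options is forwarded to Playwright's new_context call,
# so saved cookies load into every new browser context.
plugin = PlaywrightBrowserPlugin(
    browser_new_context_options={'storage_state': 'auth_state.json'},
)
crawler = PlaywrightCrawler(browser_pool=BrowserPool(plugins=[plugin]))
```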

enqueue_links does not find any links

Hello, I encountered a weird issue where enqueue_links does not find any links on a webpage, specifically https://nanlab.tech. It does not find any links no matter what strategy I choose. I also tried extract_links, which managed to find all links with the strategy all, but with the strategies same-origin and same-hostname no links are extracted, and with the strategy same-domain there is an error. I am using the latest version of Crawlee for Python, 0.6.10, and for scraping I am using Playwright. Any idea what might be the issue? Here is the handler:
```python
@self.crawler.router.default_handler
async def request_handler(context: PlaywrightCrawlingContext) -> None:  # type: ignore
```
...

Crawler always stops after exactly 300 seconds

I use Crawlee for Python in Docker, and it always stops after exactly 300 seconds. I checked that it gets an asyncio.CancelledError in the AutoscaledPool.run() method, but I don't know what sends it. If I try a simple Python example, keep_alive works, but in my dockerized system it always emits the final statistics after 300 seconds and stops. I checked that it happens with multiple different crawler types.
Solution:
Never mind, it was arq's job_timeout.
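For context, arq's per-job timeout defaults to 300 seconds, which matches the symptom exactly; a sketch of raising it in the worker settings (the job function name is illustrative):

```python
from arq.connections import RedisSettings

async def run_crawl(ctx: dict) -> None:
    ...  # the crawl entrypoint enqueued as an arq job

class WorkerSettings:
    functions = [run_crawl]
    redis_settings = RedisSettings()
    job_timeout = 3600  # arq defaults to 300 s, which cancels longer crawls
```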

Searching for the developer of Apollo scraper 50k leads (code_crafter)

Hi! I’m using an actor developed by Code Pioneer (code_crafter) and have a quick question. Does anyone know if they’re on here, or how best to reach them?

Scraped tweets are all mock tweets

logger.info(f"Starting Twitter Scraper actor for users: {info.x_user}") run = client.actor("kaitoeasyapi/twitter-x-data-tweet-scraper-pay-per-result-cheapest").call( run_input=run_input) annos = client.dataset(run["defaultDatasetId"]).iterate_items() for anno in reversed(list(annos)):...
foreign-sapphire · 5/2/2025

How to send a URL with a label to the main file?

I am trying to send a URL with a label and user data to the main file, in order to run this URL directly from a specific handler within the routes file. Is that possible? I am using Playwright.
rival-black · 4/29/2025
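No solution is recorded; a hedged sketch of the usual pattern (label and user_data on Request.from_url, with a labeled handler on a Router, per the crawlee-python docs; the URL and payload are placeholders):

```python
from crawlee import Request
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.router import Router

router = Router[PlaywrightCrawlingContext]()

@router.handler('PRODUCT')  # the label routes matching requests here
async def product_handler(context: PlaywrightCrawlingContext) -> None:
    context.log.info(f"category: {context.request.user_data.get('category')}")

crawler = PlaywrightCrawler(request_handler=router)

# In the main file: enqueue the URL with its label and user data attached.
request = Request.from_url(
    'https://example.com/product/123',
    label='PRODUCT',
    user_data={'category': 'books'},
)
# await crawler.run([request])
```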

structlog support?

Could I see an example of how structlog would be implemented officially?
other-emerald · 4/23/2025
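No official answer is recorded; since Crawlee logs through the stdlib logging module, one hedged approach is to render its records with structlog's ProcessorFormatter:

```python
import logging

import structlog

# Attach a structlog JSON renderer to Crawlee's stdlib logger; this is an
# integration sketch, not a documented Crawlee feature.
handler = logging.StreamHandler()
handler.setFormatter(
    structlog.stdlib.ProcessorFormatter(
        processor=structlog.processors.JSONRenderer(),
    )
)

crawlee_logger = logging.getLogger('crawlee')
crawlee_logger.addHandler(handler)
crawlee_logger.setLevel(logging.INFO)
```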

Memory is critically overloaded

I have an AWS EC2 instance with 64 GB of memory. My crawler is running in a Docker container, and CRAWLEE_MEMORY_MBYTES is set to 61440. My Docker config: ```...
ratty-blush · 4/17/2025

Routers not working as expected

Hello everyone! First of all, thanks for this project; it looks really good and promising. I'm considering using Crawlee as an alternative to Scrapy....