Apify & Crawlee


This is the official developer community of Apify and Crawlee.


crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻creators-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

xenial-black · 9/25/2024

limit extraction for free plan users

I have built an Instagram profile scraper in Python, but I want to limit the scraping results to 25 for free-plan users, not paid-plan users. Can anybody help me out?...
afraid-scarlet · 9/19/2024

infinite_scroll | how to get the updated page

Hey, I created this simple script: `import asyncio  # Instead of BeautifulSoupCrawler let's use Playwright to be able to render JavaScript`....
automatic-azure · 9/19/2024

Disable persistant storage

Hi guys, I plan to deploy my crawler (a ParselCrawler) to AWS Lambda. I'm loosely following this guide, which is for JavaScript though. I'd like to disable persisting the storage. I changed the configuration like this:
config = Configuration.get_global_configuration()
config.persist_storage = False
...
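A hedged alternative sketch: the same switch can be flipped through an environment variable before any configuration object is created. The variable name `CRAWLEE_PERSIST_STORAGE` is an assumption to verify against the Crawlee for Python docs for your installed version.

```python
import os

# Assumption: CRAWLEE_PERSIST_STORAGE maps to Configuration.persist_storage.
# Set it before Crawlee reads its configuration, i.e. before the crawler
# (or any storage client) is created in the Lambda handler.
os.environ['CRAWLEE_PERSIST_STORAGE'] = 'false'
```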
eastern-cyan · 9/17/2024

How can I pass data extracted in the first part of the scraper to items that will be extracted later

Hi. I'm extracting prices of products. In the process, I have the main page where I can extract all the information I need except for the fees. If I go through every product individually, I can get the price and fees, but sometimes I lose the fee information because I get blocked on some products. I want to handle this situation: if I extract the fees, I want to add them to my product_item, but if I get blocked, I want to pass this data as empty. I'm using the "Router" class as the Crawlee team explains here: https://crawlee.dev/python/docs/introduction/refactoring. When I enqueue the URL extracted from the first page like this, I cannot pass along the data extracted before it: await context.enqueue_links(url='product_url', label='PRODUCT_WITH_FEES'). I want something like this:...

Regarding "tweets mentioning me", I only want to retrieve tweets where users directly mention me

Hi. I only want to retrieve tweets where users directly mention me in the tweet, not tweets that mention me only because they are replies to tweets that mentioned me. Can you do this? For more details on this issue, please refer to: https://devcommunity.x.com/t/how-to-differentiate-direct-reply-and-mentions/149262. Thanks....
eager-peach · 9/16/2024

How can I use a proxy with Playwright on Apify

Hi, I'm trying to make a scraper and I don't know how to implement a proxy hosted by Apify in my script. I'm sharing my code so you can see what I'm trying to do.
unwilling-turquoise · 9/12/2024

Error adding request to the queue: Request ID does not match its unique_key.

Hi and good day. I'm creating a POST API that accepts the following JSON body: { "url": "https://crawlee.dev/python/", "targets": ["html", "pdf"] }...
unwilling-turquoise · 9/6/2024

Encountering NotImplementedError when integrating Crawlee code with REST

The Crawlee code works when I put the URL directly inside crawlee.run([url]), but when I put the code inside an endpoint and call the URL from Postman, a NotImplementedError is raised. How do I use Crawlee with FastAPI?...
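On Windows, this `NotImplementedError` is often raised by the selector event loop, which cannot spawn the subprocesses a browser crawler needs. A hedged sketch of the usual workaround, to be applied before the app starts (verify against the asyncio docs for your Python version):

```python
import asyncio
import sys

def use_proactor_loop_on_windows() -> bool:
    """Switch to the proactor event-loop policy on Windows (it supports
    subprocesses); return True if the policy was changed."""
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
        return True
    return False

# Call this at module import time, before uvicorn/FastAPI create the loop.
# Inside the endpoint, create a fresh crawler per request and
# `await crawler.run([url])` -- do not call asyncio.run() again inside an
# already-running loop.
```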
extended-salmon · 9/6/2024

Blog Apify + Qdrant + AWS.

I am interested in writing a blog post about using Apify and Qdrant. I would like to know if this would create any costs, as I am new to Apify. The idea is to combine Apify with Qdrant and AWS services, create an app, and showcase it on Medium and any blog where Apify is interested. I would love to get in contact with anyone from Apify to discuss this opportunity. Mainly I am interested in scraping FAQ websites from AWS: https://aws.amazon.com/sagemaker/faqs/...
like-gold · 9/3/2024

Python scraping

Can someone tell me how to send mouse events to an inactive window on Windows?
conscious-sapphire · 9/2/2024

Will crawlee scrape data loaded from JS that is triggered by scroll?

Looking for a solution for scraping a website that fills in product details via JS on scroll. Will Crawlee "scroll" and scrape after these load, or will this require other means? Thank you!...
fair-rose · 8/31/2024

Memory is critically overloaded

[crawlee._autoscaling.snapshotter] WARN Memory is critically overloaded. Using 2.54 GB of 1.94 GB (131%). Consider increasing available memory. How do you increase available memory? I have 8 GB, but it's only using about 2...
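By default Crawlee reserves only a fraction of system memory (roughly a quarter, which matches ~2 GB on an 8 GB machine). A hedged sketch of raising the ceiling via environment variable; `CRAWLEE_MEMORY_MBYTES` is the name documented for Crawlee, but verify it for the Python version you use:

```python
import os

def set_crawlee_memory_limit(mbytes: int) -> None:
    """Raise Crawlee's memory ceiling. Must run before the crawler's
    configuration is created, or the default (a fraction of system RAM)
    will already have been picked up."""
    os.environ['CRAWLEE_MEMORY_MBYTES'] = str(mbytes)

set_crawlee_memory_limit(6144)  # e.g. allow 6 GB on an 8 GB machine
```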
flat-fuchsia · 8/28/2024

Proxy not working for Chrome browser

When I add my proxy configuration to the Playwright crawler that uses the Chromium browser type, it throws an error, but it doesn't throw an error when I specify the Firefox browser.
flat-fuchsia · 8/28/2024

Proxy authentication

How do I set the username and password of my proxy when using Crawlee for Python?
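Credentials usually go straight into the proxy URL. A hedged sketch; the `ProxyConfiguration(proxy_urls=...)` usage should be verified against the Crawlee for Python docs, and the host/port values below are placeholders:

```python
def authenticated_proxy_url(host: str, port: int, username: str,
                            password: str, scheme: str = 'http') -> str:
    """Embed username/password credentials in a proxy URL."""
    return f'{scheme}://{username}:{password}@{host}:{port}'

# Hypothetical usage:
#   from crawlee.proxy_configuration import ProxyConfiguration
#   proxy_configuration = ProxyConfiguration(
#       proxy_urls=[authenticated_proxy_url('1.2.3.4', 8000, 'user', 'pass')],
#   )
#   crawler = ParselCrawler(proxy_configuration=proxy_configuration)
```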
flat-fuchsia · 8/28/2024

Crawlee Proxy

How do I use proxy servers with Crawlee if I don't have access to third-party proxies?
xenial-black · 8/24/2024

How to set the delay between requests?

crawler = PlaywrightCrawler(
    max_requests_per_crawl=10,
    max_request_retries=0,
    headless=True,
    browser_type='chromium',
...
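Crawlee does not take a per-request delay directly; the request rate is usually shaped with concurrency settings instead. A hedged sketch converting a desired delay into a per-minute cap; `ConcurrencySettings` and its `max_tasks_per_minute` field should be verified against the Crawlee for Python docs:

```python
def max_tasks_per_minute_for_delay(delay_seconds: float) -> int:
    """Translate a desired delay between requests into a per-minute cap."""
    return max(1, int(60 / delay_seconds))

# Hypothetical usage:
#   from crawlee import ConcurrencySettings
#   crawler = PlaywrightCrawler(
#       concurrency_settings=ConcurrencySettings(
#           max_concurrency=1,  # one request at a time
#           max_tasks_per_minute=max_tasks_per_minute_for_delay(2.0),  # ~2 s apart
#       ),
#   )
```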
deep-jade · 8/20/2024

Idealista

I am currently working on a task that involves scraping property listings from Idealista.es, specifically for properties in Barcelona. The main objective is to ensure that all property data is collected, including photos, which the existing Apify actor seems to miss due to a bug. There are two potential approaches to solving this issue: Debug and fix the existing Apify actor: This would involve identifying and resolving the bug in the current actor to ensure it captures all photos....
ambitious-aqua · 8/20/2024

crunchbase

hello, I just wanted to ask a question regarding https://apify.com/curious_coder/crunchbase-scraper. Do I have to have a Crunchbase account to get, say, funding results? If I have an account with you, will I be able to scrape all search results for, say, https://www.crunchbase.com/discover/funding_rounds/911b14126f22caf2fb5adaf7f66bee67? Or will I only get the top xx visible results? TY
optimistic-gold · 8/15/2024

How to send post requests

Hello. How can I use normal request params such as headers, cookies, and JSON in the enqueue_links method?
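`enqueue_links` discovers links on the current page and does not carry these options; arbitrary requests are typically built as `Request` objects and fed via `add_requests` or `crawler.run`. A hedged sketch; verify the `Request.from_url` parameters against the Crawlee for Python docs, and note the endpoint URL is hypothetical:

```python
import json

def json_payload(data: dict) -> bytes:
    """Serialize a dict into a JSON request body."""
    return json.dumps(data).encode('utf-8')

# Hypothetical usage:
#   from crawlee import Request
#   request = Request.from_url(
#       'https://example.com/api',          # hypothetical endpoint
#       method='POST',
#       headers={'Content-Type': 'application/json'},
#       payload=json_payload({'query': 'shoes'}),
#   )
#   await crawler.run([request])
#
# Cookies are typically handled via sessions or pre-navigation hooks rather
# than per-request kwargs -- check the docs for your crawler type.
```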