Apify & Crawlee

This is the official developer community of Apify and Crawlee.

wise-white 4/1/2024

add_request with same url but different payload

Do you have an example of adding a request to the request queue using use_extended_unique_key? I am using the same URL for every request; the only thing that changes is the payload. I get the error TypeError: Unicode-objects must be encoded before hashing with this code: for i in range(1, max_index + 1): # Update the page number in params_query #json_data['indices']['products']['paging']['index'] = int(next_index) json_data = json.dumps(json_data)...
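
The TypeError quoted in this post is the message Python's hashlib gives when it receives a str instead of bytes. Below is a minimal sketch of deriving a distinct key for each payload, assuming the goal is one key per combination of URL and payload. The helper `make_unique_key` and the example URL are hypothetical and are not part of the Apify SDK.

```python
import hashlib
import json


def make_unique_key(url: str, payload: dict, method: str = "POST") -> str:
    """Build a key that differs whenever the method, URL, or payload differs.

    Hypothetical helper for illustration; not an Apify SDK function.
    """
    # sort_keys keeps the serialization stable regardless of dict ordering.
    payload_text = json.dumps(payload, sort_keys=True)
    # hashlib only accepts bytes, so the str has to be encoded first.
    payload_hash = hashlib.sha256(payload_text.encode("utf-8")).hexdigest()
    return f"{method}:{url}:{payload_hash}"


# Same URL, different page index in the payload: each key is different.
keys = [
    make_unique_key("https://example.com/api/search", {"paging": {"index": i}})
    for i in range(1, 4)
]
```

A key built this way could be supplied as the request's unique key, so that requests sharing a URL are not treated as duplicates.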
probable-pink 3/28/2024

Fb accounts

Is there any way to scrape FB accounts?
stuck-chocolate 3/20/2024

Scrape Instagram reels (Python)

I need an Instagram API that takes a link to a reel and returns its statistics (likes, number of comments, views). I looked through all the APIs on the page but couldn't find one that meets my needs.
extended-salmon 3/18/2024

Python (Selenium) + Rotating Proxies

Hi! I've been trying to use residential proxies from Apify with Python + Selenium, but when I implemented it on Apify and tried to open a website, it always returns an error. Does anyone have a starter template for using Selenium with rotating proxies? I've already read the documentation and everything, but it still didn't work :/...
like-gold 3/18/2024

Instagram hashtag scraping based on timestamp

Has anyone been successful in running the Instagram Hashtag Scraper[1], or something similar, based on timestamp instead of volume/number of posts? I am trying to simply identify all of the posts made in the past 24 hours that include a single hashtag. This is the first step in a larger process, but the first step is rather important to optimize for cost. All I need is to feed it a hashtag, and identify all the users that posted (Reels or Images) in the past 24 hours using the hashtag. [1] apify/instagram-hashtag-scraper...
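
The "past 24 hours" step described in this post can be sketched as a filter over posts that have already been scraped. This is a minimal sketch that assumes each post is a dict with an ISO 8601 `timestamp` and an `ownerUsername`. Both field names and the sample data are hypothetical, not confirmed output of any particular scraper.

```python
from datetime import datetime, timedelta, timezone


def posted_within(posts, hours=24, now=None):
    """Return the sorted usernames of posts made inside the time window.

    `posts` is assumed to be a list of dicts with hypothetical
    `ownerUsername` and ISO 8601 `timestamp` fields.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    recent_users = set()
    for post in posts:
        # fromisoformat rejects a trailing "Z" before Python 3.11,
        # so it is replaced with an explicit UTC offset.
        posted_at = datetime.fromisoformat(post["timestamp"].replace("Z", "+00:00"))
        if cutoff <= posted_at <= now:
            recent_users.add(post["ownerUsername"])
    return sorted(recent_users)


# Made-up sample data, evaluated against a fixed "now" for repeatability.
sample_posts = [
    {"ownerUsername": "alice", "timestamp": "2024-03-18T09:00:00Z"},
    {"ownerUsername": "bob", "timestamp": "2024-03-16T09:00:00Z"},
    {"ownerUsername": "carol", "timestamp": "2024-03-17T13:00:00Z"},
]
fixed_now = datetime(2024, 3, 18, 12, 0, tzinfo=timezone.utc)
recent = posted_within(sample_posts, hours=24, now=fixed_now)
```

Note that this filters after the fact. On its own it does not reduce how many posts the scraper fetches, so it does not address the cost concern directly.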
ambitious-aqua 3/6/2024

websocket error during Apify run that is not from our code

In the middle of a Python Playwright run, we are getting this error: ``` ERROR Error in websocket connection Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/websockets/legacy/protocol.py", line 1301, in close_connection...
like-gold 3/5/2024

Passing memory allocation per run in Apify reel scraper

How do you change the memory limit for an Actor via the API? I am using the Python client. I had read some documentation that implied the memory limit is passed in the URL of the API call, using the option &memory=32 to set the memory limit to 32MB. However, that is not working, as my Actor is still defaulting to 4MB. I have also tried a few different guesses at putting the memory limit in the run_input, but I have been unsuccessful. Does anyone have documentation on h...
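
A minimal sketch of passing the memory limit as a call argument rather than inside run_input, assuming a recent apify-client for Python in which `ActorClient.call()` accepts a `memory_mbytes` keyword. The helper name `run_with_memory`, the token placeholder, and the value 4096 are illustrative assumptions and are not taken from the post.

```python
def run_with_memory(client, actor_id, run_input, memory_mbytes):
    """Start an Actor run with an explicit memory limit and wait for it.

    `client` is expected to be an `apify_client.ApifyClient` instance.
    Hypothetical helper for illustration only.
    """
    # memory_mbytes is a run option, so it is passed next to run_input
    # rather than inside it.
    return client.actor(actor_id).call(
        run_input=run_input,
        memory_mbytes=memory_mbytes,
    )


# Usage, assuming apify-client is installed and a valid token is available:
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   run = run_with_memory(client, "apify/instagram-reel-scraper", run_input, 4096)
```

Keeping the client as a parameter means the helper can be exercised with a stand-in object before spending platform credits on a real run.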
like-gold 3/4/2024

Multiple usernames for single actor run not working

I am trying to leverage the apify-client for Python to kick off an execution of the Instagram Reel Scraper (apify/instagram-reel-scraper) via API. I have been successful in running this in a "one-to-one" scenario where I run it for a single Instagram handle and it returns ten results as expected. However, I want to be able to pass ~1,600 Instagram IDs with a single API call. You can do this through the web console quite easily and it runs without a problem. I can even edit the JSON to execu...
probable-pink 2/28/2024

A question regarding the usage of the Python SDK

Hey all, I am fresh off the "I want to create an Actor" bus and I am confused about how the Apify SDK for Python should be used. Is it meant to act as a resource for code samples, tests, and whatnot? Or should I fork the repo and use the existing project files as a starting point for developing my first Actor? I would prefer to develop and test locally and then push to Apify. I was about to create a Python project in VSCode and realized I need to find out what I needed in the project! I am curre...
frail-apricot 2/24/2024

Integration of Twitter Actor in Flask-based web app

I am having trouble integrating the Twitter Actor into my Flask-based web app. Can anyone assist me with this?
gradual-turquoise 2/21/2024

Concept of a 'queue'?

I think I am confused about the request queue. Here's my scenario: I will have many requests for the same Actor, each passing a URL. I don't know the frequency of these requests. I may have several in a few minutes, or several in an hour, and they may take 2-10 minutes to run. Should I create a new run each time I have a request? I know I can START a run with many URLs, but I may only have one to add at a time....
genetic-orange 2/20/2024

Run instance of Apify client not returning, can't access data scraped from my Apify agent

Hey guys, I'm having an issue where the run instance from the client.actor call in the Python SDK is not returning. It just says the call is 'running', so I can't access the items for the agent:
```python
def apify_reddit_agent(json_input):
    info_array = []  # Changed the API key here to the samuel account instead....
```
quickest-silver 2/15/2024

Scrapy integration silently throws away redirects

Cheers, I just noticed that my custom scraper makes a quite different number of requests locally and through Apify, while the code, URLs, parameters, everything is the same. The same Scrapy spider produces 720 items locally, but 370 through Apify. Does anyone have a clue what the root cause could be, or where to look? I can't see anything just from the logs. The only clue I noticed is that on Apify the scraper makes no POST requests, but that probably isn't enough to debug the root cause 🤔 Is there a way I can raise the logging level or something on Apify? How can I best approach debugging this?...
fascinating-indigo 2/14/2024

Long waiting time with proxy

Hello, I'm using Selenium + Apify proxies. General code idea: ```py...
quickest-silver 2/7/2024

Facing issue in Selenium on AWS Batch process

from tab crashed (Session info: chrome-headless-shell=121.0.6167.139) 2024-02-07T12:31:03.700+05:30 Stacktrace:...
magic-beige 2/6/2024

RunClient.resurrect() works first time, but not second time

I have written a Python client that iterates through a batch of input files and calls an actor to process each file, using the ActorClient.call() method. If one of the files fails, I get the RunClient object from the ActorClient object and call resurrect(). I would like to resurrect twice before throwing an exception. I can get the job to resurrect the first time (because its status is FAILED), but if I try again, I get this error message: ```Cannot resurrect an Actor ru...
deep-jade 2/5/2024

Apify RequestQueueClientAsync.update_request - 413 - Payload too large

Hi guys, I'm using the Apify Python SDK. I have a simple scraper that iterates through pagination to collect some URLs. Currently my scraper executes 55 requests and has 540 results. But on the 55th page, I get an error from the SDK....
correct-apricot 2/2/2024

mitmproxy python

Can anyone help me with running mitmproxy from inside my code? I wrote the addon I need, but I can't figure out what I need to write to start mitmproxy with this addon.
rising-crimson 2/2/2024

scraping help

I am trying to scrape this website: https://digital.fidelity.com/prgw/digital/research/quote/dashboard/summary?symbol=AAPL Sector Information Technology...