Apify & Crawlee

This is the official developer community of Apify and Crawlee.

crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻creators-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

vicious-gold 2/2/2024

Selenium and Apify Proxy

Hello, how can I use Apify Proxy with Selenium effectively? I know I can pass it in the Selenium config, but how do I rotate it later?...
automatic-azure 2/1/2024

Taking a screenshot with Selenium/Python

Hello team! I'm just starting out and would appreciate some help. I'm hitting an error that I can't debug without seeing the page being scraped, so I wanted to use Selenium's save_screenshot feature. The code does seem to take the screenshot, and from the guides I read I understood it should be saved to the key-value store, but after the run I can't find it in the storage. What am I missing? This is the piece of code I'm running in an Actor on the Apify platform: `def screenshot_error(driver, store_id):`...
flat-fuchsia 1/30/2024

Python rotating proxy with apify request

Every time I attempt web scraping, the website blocks my efforts. Despite utilizing Apify's rotating proxy, I'm unable to access the data. I'm currently using the requests library. Would anyone be able to share a Python code example for web scraping with a rotating proxy using requests?
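
One way to approach this, as a minimal sketch rather than a definitive answer: Apify Proxy reads its options from the proxy username, and each distinct `session` value is generally pinned to its own IP, so rotating means generating a fresh session ID per request. The `proxy.apify.com:8000` endpoint and the `groups-...,session-...` username format reflect Apify's proxy documentation as I understand it; the helper names, the `RESIDENTIAL` group and the password placeholder are assumptions for illustration.

```python
import uuid
from urllib.parse import quote


def build_proxy_url(password, session_id=None, group=None,
                    host="proxy.apify.com", port=8000):
    """Build an Apify Proxy URL; proxy options live in the username part."""
    parts = []
    if group:
        parts.append("groups-" + group)
    if session_id:
        parts.append("session-" + session_id)
    # With no options set, "auto" lets the proxy pick for each request.
    username = ",".join(parts) or "auto"
    return f"http://{username}:{quote(password, safe='')}@{host}:{port}"


def rotating_proxies(password, group=None):
    """Return a requests-style proxies mapping bound to a fresh session ID."""
    url = build_proxy_url(password, session_id=uuid.uuid4().hex, group=group)
    return {"http": url, "https": url}
```

The mapping can be passed straight to requests, for example `requests.get(url, proxies=rotating_proxies("<YOUR_PROXY_PASSWORD>", group="RESIDENTIAL"), timeout=30)`, calling `rotating_proxies` again for every request that should use a new session. If the site still blocks plain HTTP requests, the cause may be browser fingerprinting rather than the IP address.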
stormy-gold 1/28/2024

Looking for an assistant based in the USA, Canada, or Australia

Hello. We are a dev team based in the UAE and we are looking for a person who can assist us. The work is creating new accounts for each dev on our team, so the role is essentially that of an account renter. We can discuss the budget together....
instant-orange 1/26/2024

Need help filling out submission forms on a website automatically

Hey, I'm trying to fill out forms automatically, but it doesn't seem to work. I used GPT Scraper; what else should I use?

How can I deploy Playwright Python script on Heroku?

I am trying to deploy my code on Heroku using python3, and I'm getting this error in the Heroku console: ╔══════════════════════════════════════════════════════╗...
wee-brown 1/10/2024

What’s the fastest?

Hey all! I’m building a web scraper for leads, and I need it to click and unclick a button as fast as possible while also doing some basic HTML scraping. My question is: what is the fastest web scraping library? I'm currently using Selenium and Beautiful Soup. Should I switch to JS instead? Any help would be great, thanks!...
conscious-sapphire 1/9/2024

Saving cookies

Hi, I am building a private Actor for my use case, which involves logins. I need to save the cookies. What is the best way to save the cookies and use them later within the Actor?
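
A minimal sketch of one option, assuming a Selenium-style driver (the post does not say which browser library is used): `get_cookies()` returns a list of plain dicts that can be serialized to JSON and later replayed with `add_cookie()`. The helper names and the file-based storage are assumptions for illustration.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the browser's current cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path):
    """Restore cookies from a JSON file and return how many were added.

    The driver should already be on a page of the cookies' domain,
    because Selenium rejects cookies set for a different domain.
    """
    file = Path(path)
    if not file.exists():
        return 0
    cookies = json.loads(file.read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Local files usually do not survive between runs on the Apify platform, so inside an Actor the same JSON would more likely go into a key-value store (for example via the Python SDK's `Actor.set_value` and `Actor.get_value`) than onto disk.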
absent-sapphire 1/4/2024

HTML from Chrome DevTools and from Beautiful Soup is different

Hello, I'm currently working on a small project that aims to scrape articles about crypto (as an example; it could be another subject). I've already done some scraping in the past, but only on simple sites....
optimistic-gold 1/3/2024

Integration with Apify API

I'm trying to integrate an Actor into my web scraping script; however, the API reports a permission error. How can I fix it?
quickest-silver 12/30/2023

Scraping public records

I have a script that scrapes public records and it works fine. However, I've been trying to make it also download the PDF file that is attached to each listing, and I've run into some issues there. Clicking a row should open a pop-up containing the PDF, but when I open it with Selenium the pop-up doesn't load and I'm unable to download the file, whereas when I open it manually it comes up just fine. The website: https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeDocType My code (it's a bit ugly, but I'm only in the process of writing it) is attached. I've also attached a picture of the pop-up when I click it and when Selenium clicks it...
correct-apricot 12/21/2023

Need help with scraping websites.

I have been trying to start a project in which I scrape data mainly from football websites (https://www.fotmob.com/ & https://fbref.com/en/) and perform a few data analysis operations to make predictions from the resulting dataset. Mainly, I'll be scraping the data and converting it into a CSV file. I've been facing a lot of issues building this dataset and need some help with this part. The attached file may help you see what I'm trying to do....
automatic-azure 12/15/2023

Everything logged twice?

Any idea why everything is logged twice? Is this a known issue of the Scrapy template, or is it a desired behavior? ``` ... [scrapy.core.engine] INFO Spider closed (finished) ({"spider": "<Spider 'startupjobs' at 0x1072ab890>"}) [scrapy.core.engine] INFO Spider closed (finished) ({"spider": "<Spider 'startupjobs' at 0x1072ab890>", "message": "Spider closed (finished)"})...
metropolitan-bronze 12/15/2023

Maximum retries instance variable doesn't affect the crawler behavior

I am talking about the apify/website-content-crawler Actor, and here is how I initialize the client instance: `client = ApifyClient(token="{}",max_retries=1,timeout_secs=50)` `def crawler(url): # Prepare the Actor input`...
foreign-sapphire 12/8/2023

How to scrape Job listings and keep them updated (ideally daily) without rescraping the same jobs

Scraping from Indeed, LinkedIn job search, ZipRecruiter, etc.
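
A common pattern for this kind of daily refresh, sketched under assumptions: keep a persistent set of identifiers already seen and let only unseen listings through on each run. The function name, the JSON file used as storage and the choice of the listing URL as the identifier are all hypothetical; a database or a key-value store would serve equally well.

```python
import json
from pathlib import Path


def filter_new_jobs(jobs, seen_path, key="url"):
    """Return only jobs whose key was not seen in earlier runs, and record them."""
    file = Path(seen_path)
    seen = set(json.loads(file.read_text())) if file.exists() else set()
    fresh = []
    for job in jobs:
        job_id = job[key]
        if job_id not in seen:
            seen.add(job_id)
            fresh.append(job)
    # Persist the updated set so the next run skips everything seen so far.
    file.write_text(json.dumps(sorted(seen)))
    return fresh
```

Each scheduled run would scrape only the listing pages, pass the results through `filter_new_jobs`, and fetch detail pages for the returned items alone. A stable job ID from the site, where one exists, is likely a more reliable key than the URL.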
eastern-cyan 12/7/2023

Cloudflare vs Selenium

I cannot bypass Cloudflare using Selenium: https://www.usvisascheduling.com/ofc-schedule Who can help me? Please DM me....
robust-apricot 12/6/2023

My Code Scratch

2023-12-02T21:40:35.188Z ERROR Cannot extract data from 2023-12-02T21:40:35.190Z Traceback (most recent call last): 2023-12-02T21:40:35.191Z File "/usr/src/app/src/main.py", line 50, in main 2023-12-02T21:40:35.192Z button = WebDriverWait(driver, 20).until( 2023-12-02T21:40:35.193Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
automatic-azure 12/4/2023

More Actors in one repository

I'm not that far into my proof of concept, and perhaps I'm asking about something that will become clear later, but one question keeps coming up as I try to architect my future solution. I think I'd like to have many Actors in one repository so that it's easy to manage and contribute to them. But is it possible to connect an Apify Actor to such a (monorepo?) architecture? E.g. I could have a Python package with a few scrapers to scrape jobs, then another package with a few scrapers to scrape meetup.com events, etc., separated by topic and by setup (schedulers, proxies, etc.). But I'd like to have all scrapers in one repo....
plain-purple 12/4/2023

Web Site Content Monitor Tool to scrape new pages

Hello everyone, I mostly use Website Content Crawler from Apify Store to add content to Pinecone, my vector database, and pass those vectors to a LangChain LLM to create a chatbot in Python. Is there any tool that monitors or checks a website on a schedule and scrapes new content into my database, so that I can convert it to embeddings and add it to my vector database?...
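
Whatever tool does the crawling, the "only new content" part can be sketched as a content-hash comparison: store a digest per URL and re-embed a page only when its digest is missing or different. This is an illustrative sketch, not a feature of any particular Actor; `detect_changes` and the in-memory dict are hypothetical, and the hashes would need to be persisted between scheduled runs.

```python
import hashlib


def detect_changes(pages, known_hashes):
    """Return URLs whose text is new or changed; update known_hashes in place.

    pages maps URL -> extracted page text; known_hashes maps URL -> SHA-256 hex.
    """
    changed = []
    for url, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if known_hashes.get(url) != digest:
            known_hashes[url] = digest
            changed.append(url)
    return changed
```

Only the URLs returned by `detect_changes` would then be converted to embeddings and upserted into the vector database, which keeps a scheduled crawl from re-indexing unchanged pages.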