Apify & Crawlee • 2y ago • 8 replies
dead-brown

How to architect my actor and scraper

Hi friends!

I've been hacking around with Apify and Crawlee for a few days now and it's a lot of fun.

I'm getting stuck on how to architect my crawler for my use-case and could really use some input:

1. I'm planning to collect inputs on my internal webpage (a list of company name + city).
2. I would then submit this array of objects to my Actor using the Apify API. (First problem: my understanding is that I would have to JSON-stringify the input, as Apify doesn't support arrays of objects as input?)
3. Then, for each entry, I would open the landing page of the website I'm scraping, type the entry into the search field, and check whether I get a match. If so, I extract the data and save it using pushData({id, name, location, employees, ...}).
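
To make step 2 concrete, here is a minimal sketch of how the Actor side could accept the list whether it arrives as a real array or as a JSON-stringified array. The `CompanyQuery` shape and the `parseCompanies` helper are hypothetical names made up for illustration, not part of Apify or Crawlee, and whether stringifying is actually required is exactly the open question above.

```typescript
// Hypothetical shape of one search entry; the field names are assumptions.
interface CompanyQuery {
  name: string;
  city: string;
}

// Accepts either an array or a JSON-stringified array and returns validated,
// whitespace-trimmed entries. Throws a TypeError on anything else.
function parseCompanies(raw: unknown): CompanyQuery[] {
  const value: unknown = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (!Array.isArray(value)) {
    throw new TypeError("Expected an array of { name, city } objects");
  }
  return value.map((item, index) => {
    const entry = item as Partial<CompanyQuery> | null;
    if (!entry || typeof entry.name !== "string" || typeof entry.city !== "string") {
      throw new TypeError(`Entry ${index} must have string "name" and "city" fields`);
    }
    return { name: entry.name.trim(), city: entry.city.trim() };
  });
}
```

Validating up front like this means a malformed entry fails the run immediately instead of surfacing later as a confusing empty search.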

Here comes what I can't wrap my head around:
1. Should I be invoking an Actor once per item, or can I batch everything into one Actor run as I was thinking? My thinking was to avoid extra overhead, but I also can't quite wrap my head around how proxies, multiple sessions, UA fingerprinting, etc. work (seems auto-magic?). It would probably be smart to rotate the fingerprint for each "new" search so it's not obvious I'm hitting the site 30 times as the "same" browser/user.
2. How can I queue URLs + userData? It seems like enqueueLinks and crawler.addRequests only support passing URLs?
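
For question 2, a rough sketch of what "URL + userData" could look like as plain request objects. Crawlee requests can carry a `userData` object and a `uniqueKey` alongside the `url`; the `buildSearchRequests` helper, the `CompanyQuery` shape, and the example URL are assumptions for illustration only. Because every search starts from the same landing page, each request gets its own `uniqueKey` here, otherwise the queue would treat them as duplicates of one URL.

```typescript
// Hypothetical shape of one search entry; the field names are assumptions.
interface CompanyQuery {
  name: string;
  city: string;
}

// Minimal subset of request fields used in this sketch.
interface SearchRequest {
  url: string;
  uniqueKey: string;
  userData: { company: CompanyQuery };
}

// Builds one request per company. All requests share the landing page URL,
// so the uniqueKey is derived from the company to keep them distinct.
function buildSearchRequests(landingUrl: string, companies: CompanyQuery[]): SearchRequest[] {
  return companies.map((company) => ({
    url: landingUrl,
    uniqueKey: `${landingUrl}#${company.name.toLowerCase()}|${company.city.toLowerCase()}`,
    userData: { company },
  }));
}

// Example with a placeholder URL (an assumption, not the real target site).
const exampleRequests = buildSearchRequests("https://example.com/", [
  { name: "Acme", city: "Oslo" },
  { name: "Globex", city: "Berlin" },
]);
```

An array like `exampleRequests` could then be handed to `crawler.addRequests(...)` or `crawler.run(...)`, with the request handler reading `request.userData.company` to know what to type into the search field. That keeps everything in a single Actor run, one request per search.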

Maybe I'm just thinking all wrong about how I should be using the framework. If you have a better approach or can help me with the above questions, I would be very grateful!