Strategy Discussion: Scalable Solution for Scraping 250+ Event Websites
At a high level, we see two possible directions:
Separate Python-based scraping system
1) Individual or grouped scripts per website
2) Managed infrastructure, proxies, scheduling, retries, and monitoring handled separately
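For context on the first option: with separate scripts, each site (or group of sites) has to carry its own retry and backoff plumbing, since nothing is centralized. A minimal sketch of that plumbing, assuming a generic zero-argument fetch callable (the function names and parameters here are illustrative, not from any existing codebase):

```python
import random
import time

def fetch_with_retries(fetch, attempts: int = 3, base_delay: float = 1.0):
    """Retry a fetch callable with exponential backoff plus jitter.

    `fetch` is any zero-argument callable (e.g. a wrapped urllib or
    requests call). This is a sketch of the plumbing each standalone
    script would need, not a production implementation.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface to monitoring
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Demo with a flaky fake fetcher that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(fetch_with_retries(flaky, attempts=5, base_delay=0.0))
```

Multiplied across 250+ scripts, duplicating this kind of logic is where the maintenance overhead of the first approach tends to show up.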
A common Apify Actor approach
1) One configurable “universal” Actor supporting multiple event websites
2) Website-specific logic handled via configurations (selectors, pagination, etc.)
3) Centralized scheduling, proxy management, retries, and failure monitoring
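To make the "universal" Actor option concrete, here is a minimal sketch of configuration-driven extraction, where per-site logic is data rather than code. The `SiteConfig` fields are hypothetical (not an Apify API), and the toy regex parsing is for illustration only; a real Actor would use a proper HTML parser with CSS selectors:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SiteConfig:
    # Hypothetical per-site settings; in the Actor these would come
    # from the input JSON, not be hard-coded.
    name: str
    event_class: str   # CSS class of each event container
    title_class: str   # CSS class of the title element

def extract_events(html: str, cfg: SiteConfig) -> list[dict]:
    """Shared extraction logic driven entirely by the config.

    Toy regex-based parsing for illustration; production code would
    use an HTML parser and cfg-supplied CSS selectors.
    """
    block_re = re.compile(
        rf'<div class="{cfg.event_class}">(.*?)</div>', re.S)
    title_re = re.compile(
        rf'<span class="{cfg.title_class}">(.*?)</span>', re.S)
    events = []
    for block in block_re.findall(html):
        m = title_re.search(block)
        if m:
            events.append({"site": cfg.name, "title": m.group(1).strip()})
    return events

# Two sites differ only by configuration, not by code.
site_a = SiteConfig("site-a", event_class="event", title_class="name")
html_a = '<div class="event"><span class="name">PyCon</span></div>'
print(extract_events(html_a, site_a))
```

The point of the pattern is that onboarding site number 251 means adding one config entry, while scheduling, proxies, retries, and monitoring stay in the single shared Actor.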
Given the scale (275+ websites), long-term maintenance, and reliability requirements, we want to ensure we choose the most cost-effective, scalable, and future-proof solution.
I’d appreciate your guidance on which approach you recommend and why, especially considering:
1) Maintenance overhead
2) Scalability
3) Anti-bot handling
4) Cost optimization
5) Long-term extensibility
Looking forward to your input so we can finalize the architecture before execution.
Bhavin
bhavin.shah@techforceglobal.com
