Strategy Discussion: Scalable Solution for Scraping 250+ Event Websites
At a high level, we see two possible directions:
Separate Python-based scraping system
1) Individual or grouped scripts per website
2) Managed infrastructure, proxies, scheduling, retries, and monitoring handled separately
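For context on the first option: with separate scripts, each site (or group of sites) has to carry its own retry and backoff plumbing, since nothing is centralized. A minimal sketch of that plumbing, assuming a generic zero-argument fetch callable (the function names and parameters here are illustrative, not from any existing codebase):

```python
import random
import time

def fetch_with_retries(fetch, attempts: int = 3, base_delay: float = 1.0):
    """Retry a fetch callable with exponential backoff plus jitter.

    `fetch` is any zero-argument callable (e.g. a wrapped urllib or
    requests call). This is a sketch of the plumbing each standalone
    script would need, not a production implementation.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface to monitoring
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)

# Demo with a flaky fake fetcher that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(fetch_with_retries(flaky, attempts=5, base_delay=0.0))
```

Multiplied across 250+ scripts, duplicating this kind of logic is where the maintenance overhead of the first approach tends to show up.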
A common Apify Actor approach
1) One configurable “universal” Actor supporting multiple event websites
2) Website-specific logic handled via configurations (selectors, pagination, etc.)
3) Centralized scheduling, proxy management, retries, and failure monitoring
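To make the "universal" Actor option concrete, here is a minimal sketch of configuration-driven extraction, where per-site logic is data rather than code. The `SiteConfig` fields are hypothetical (not an Apify API), and the toy regex parsing is for illustration only; a real Actor would use a proper HTML parser with CSS selectors:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SiteConfig:
    # Hypothetical per-site settings; in the Actor these would come
    # from the input JSON, not be hard-coded.
    name: str
    event_class: str   # CSS class of each event container
    title_class: str   # CSS class of the title element

def extract_events(html: str, cfg: SiteConfig) -> list[dict]:
    """Shared extraction logic driven entirely by the config.

    Toy regex-based parsing for illustration; production code would
    use an HTML parser and cfg-supplied CSS selectors.
    """
    block_re = re.compile(
        rf'<div class="{cfg.event_class}">(.*?)</div>', re.S)
    title_re = re.compile(
        rf'<span class="{cfg.title_class}">(.*?)</span>', re.S)
    events = []
    for block in block_re.findall(html):
        m = title_re.search(block)
        if m:
            events.append({"site": cfg.name, "title": m.group(1).strip()})
    return events

# Two sites differ only by configuration, not by code.
site_a = SiteConfig("site-a", event_class="event", title_class="name")
html_a = '<div class="event"><span class="name">PyCon</span></div>'
print(extract_events(html_a, site_a))
```

The point of the pattern is that onboarding site number 251 means adding one config entry, while scheduling, proxies, retries, and monitoring stay in the single shared Actor.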
Given the scale (275+ websites), long-term maintenance, and reliability requirements, we want to ensure we choose the most cost-effective, scalable, and future-proof solution.
I’d appreciate your guidance on which approach you recommend and why, especially considering:
1) Maintenance overhead
2) Scalability
3) Anti-bot handling
4) Cost optimization
5) Long-term extensibility
Looking forward to your input so we can finalize the architecture before execution.
Bhavin
bhavin.shah@techforceglobal.com
