Suggestions for integrating Crawlee into a new cloud platform
-----
Hello! I'm a developer working on estela (https://estela.bitmaker.la/docs/), a platform for web scraping in the cloud. We currently support Scrapy and Requests, and we are now focused on adding support for Crawlee.
Our system relies on Kafka for queueing requests, stats, logs, and items, and we update job statuses (WAITING, RUNNING, COMPLETED, etc.) through an API endpoint. We are now facing some challenges in implementing a wrapper to run Crawlee within Estela.
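To make the integration points concrete, here is a rough sketch of the envelopes the wrapper would produce to Kafka and the status payload it would send to the API. All names here (topic names, field names, the endpoint path, and the PATCH shape) are illustrative placeholders, not estela's actual schema:

```typescript
// Illustrative message shapes only; topics, fields, and the endpoint path
// below are placeholders, not estela's real schema.
type KafkaEnvelope = {
  jid: string;                              // job id the wrapper runs under
  type: 'request' | 'item' | 'log' | 'stats';
  payload: unknown;
  ts: string;                               // ISO timestamp
};

function buildEnvelope(
  jid: string,
  type: KafkaEnvelope['type'],
  payload: unknown,
): KafkaEnvelope {
  return { jid, type, payload, ts: new Date().toISOString() };
}

type JobStatus = 'WAITING' | 'RUNNING' | 'COMPLETED' | 'ERROR';

// Hypothetical status update; in estela this would be an authenticated call
// and the exact route/verb would follow the platform's API.
async function updateJobStatus(
  apiBase: string,
  jid: string,
  status: JobStatus,
): Promise<void> {
  await fetch(`${apiBase}/jobs/${jid}`, {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ status }),
  });
}
```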
To store the relevant information in Kafka and make the API calls, we have considered a few approaches:
Middlewares: While it is possible to run middlewares in Crawlee, they are not equivalent to Scrapy's middlewares, which suit our needs in Estela perfectly. Crawlee's middlewares appear to run only before the request is sent.
Hooks: This seems like an ideal solution, but there is limited documentation on using them with Crawlee crawlers. We found some information in the docs and in migrations.md, but it is unclear whether it applies to Crawlee.
Custom Crawler: Developing a custom crawler would be an extensive maintenance burden, and our team would prefer to avoid it.
Another important consideration is how much code a user would need to modify to adapt an existing Crawlee crawler for use with Crawlee + Estela. Ideally, the migration would be seamless, requiring no additional code.
Any technical advice or insights on these matters would be greatly appreciated. Thank you for your time!
