Cypher · 2mo ago

Best practices for a long-living crawler & RabbitMQ

Hi guys, I’m here to ask about best practices. I have coded a simple manager that:
- starts a Crawlee instance and initializes a queue object
- receives RabbitMQ messages and pushes them to the queue

This is set up as a keep-alive crawler (it never quits, it just awaits new messages). The thing is, it needs some caching and some workarounds around the queue and the crawler to keep it from closing. For some reason it was just stopping: new messages were pushed to the queue, but the crawler never read them.

This made me wonder: maybe it should be built differently? Is there any resource that would help me learn the best practices for building something like this on Crawlee? The docs lack examples of long-living crawlers.

I’ll add that my setup uses many different handlers for different sites; I don’t know if that matters for this question.
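The manager described above can be modeled in plain Python asyncio. This is a minimal sketch under stated assumptions, not Crawlee's or RabbitMQ's API: the for loop inside manager is a hypothetical stand-in for the RabbitMQ consumer callback, and handle_site_a, handle_site_b and HANDLERS are hypothetical per-site handlers. What it illustrates is the keep-alive shape: the worker is started once, treats an empty queue as "wait for the next message" rather than "done", and exits only when it is told to stop explicitly.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


# Hypothetical per-site handlers (stand-ins for per-site crawler request handlers).
async def handle_site_a(url: str, results: List[str]) -> None:
    results.append(f"site-a:{url}")


async def handle_site_b(url: str, results: List[str]) -> None:
    results.append(f"site-b:{url}")


Handler = Callable[[str, List[str]], Awaitable[None]]
HANDLERS: Dict[str, Handler] = {"site-a": handle_site_a, "site-b": handle_site_b}

STOP = object()  # sentinel: the only way the worker exits


async def keep_alive_worker(queue: asyncio.Queue, results: List[str]) -> None:
    """Long-lived worker: an empty queue means 'wait', never 'exit'."""
    while True:
        item = await queue.get()  # suspends until the manager pushes something
        try:
            if item is STOP:
                return
            label, url = item
            await HANDLERS[label](url, results)  # dispatch to the per-site handler
        finally:
            queue.task_done()


async def manager(messages: List[Tuple[str, str]]) -> List[str]:
    """Start the worker once, feed it messages, then stop it explicitly."""
    queue: asyncio.Queue = asyncio.Queue()
    results: List[str] = []
    worker = asyncio.create_task(keep_alive_worker(queue, results))
    for msg in messages:  # hypothetical stand-in for the RabbitMQ consumer callback
        await queue.put(msg)
        await asyncio.sleep(0.01)  # queue drains between messages; the worker keeps waiting
    await queue.join()  # every pushed message has been handled
    await queue.put(STOP)
    await worker
    return results


print(asyncio.run(manager([("site-a", "https://a.example/1"), ("site-b", "https://b.example/2")])))
# ['site-a:https://a.example/1', 'site-b:https://b.example/2']
```

Because the sentinel is the only exit path, a message that arrives after a quiet period is still picked up, which is the behavior the keep-alive setup is meant to guarantee without extra workarounds.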
Cypher (OP) · 2mo ago
Bump
Nazar Hrozia · 5w ago
Hello! Thanks for your question. Unfortunately, without the logs and the code, I won’t be able to help you properly. As for the keepAlive: true setting, yes, it is a way to prevent the crawler from stopping.
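One likely explanation for the symptom in the question: in Crawlee for JavaScript, crawler.run() by default resolves once the request queue is empty, and keepAlive: true makes the crawler keep waiting for new requests instead. The sketch below models only that default behavior in plain asyncio; it is not Crawlee's internals, and run_until_empty and demo are hypothetical names. A consumer that stops as soon as the queue drains leaves a message that arrives later sitting unread in the queue.

```python
import asyncio
from typing import List, Tuple


async def run_until_empty(queue: asyncio.Queue, processed: List[str]) -> None:
    """Default-style consumer: returns as soon as the queue is empty."""
    while not queue.empty():
        processed.append(queue.get_nowait())
        await asyncio.sleep(0)  # yield to the event loop between items


async def demo() -> Tuple[List[str], int]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: List[str] = []
    await queue.put("msg-1")
    await run_until_empty(queue, processed)  # finishes right after msg-1
    await queue.put("msg-2")  # arrives after the consumer has already exited
    return processed, queue.qsize()  # msg-2 is still in the queue, unread


print(asyncio.run(demo()))  # (['msg-1'], 1)
```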
Cypher (OP) · 5w ago
Hi @Nazar Hrozia! I can prepare and share a sample repository, but only in 2 days; I’m away from a computer right now. I was hoping there was some place where I could see how it ‘should’ be built.
Nazar Hrozia · 5w ago
BasicCrawlerOptions | API | Crawlee for Python
BasicCrawlerOptions | API | Crawlee for JavaScript
