stormy-gold
stormy-gold•2y ago

how to handle different workflows with one or multiple crawlers?

Hello, I have an Express server standing in front of the crawler, accepting different types of mutually exclusive workflow requests, like a strategy pattern. Each workflow works fine if only one strategy runs at a time, but none of them can run concurrently. I tried two separate approaches: 1. Separate crawler instances — but then I get an invalid queue ID error. 2. The same crawler with control flow — this results in the requestHandler logic being skipped over and data not being found in the KeyValueStore. Any suggestions for how to handle different workflows concurrently?
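The setup described above can be sketched roughly like this — a dispatch map from workflow type to its run function, which an Express handler would pick from. The strategy names and run functions here are illustrative assumptions, not the poster's actual code.

```javascript
// Hypothetical sketch of the strategy-pattern dispatch described above:
// each mutually exclusive workflow maps to its own async run function,
// so the HTTP handler only selects a strategy and awaits it.
const strategies = new Map([
  ['listings', async (input) => `ran listings for ${input}`],
  ['details', async (input) => `ran details for ${input}`],
]);

async function dispatch(workflow, input) {
  const run = strategies.get(workflow);
  if (!run) throw new Error(`unknown workflow: ${workflow}`);
  return run(input);
}
```

The concurrency problem starts when two such dispatches run at once and both touch shared crawler state (the request queue and key-value store), which is what the replies below address.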
5 Replies
cloudy-cyan
cloudy-cyan•2y ago
@simponacci did you ever get an answer to this?
Alexey Udovydchenko
Alexey Udovydchenko•2y ago
Are you adding requests with your own https://crawlee.dev/api/types/interface/RequestSchema#uniqueKey ? It sounds like you are not, because once a request has been processed, adding the same URL again is considered already resolved and no further action is taken. The logic behind this is to avoid scraping the same URL multiple times unless it is enforced by code logic.
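A minimal sketch of what Alexey suggests: namespace the `uniqueKey` by strategy, so the same URL enqueued by two different workflows is deduplicated per workflow instead of globally. The helper name is an assumption; the Crawlee wiring is shown in a comment because it needs a live crawler instance.

```javascript
// Hypothetical helper: prefix the deduplication key with the strategy name,
// so identical URLs from different workflows count as distinct requests.
function buildUniqueKey(strategy, url) {
  return `${strategy}:${url}`;
}

// With Crawlee this would be used roughly like (assuming `crawler` exists):
//
//   await crawler.addRequests([{
//     url: 'https://example.com/page',
//     uniqueKey: buildUniqueKey('listings', 'https://example.com/page'),
//   }]);
```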
cloudy-cyan
cloudy-cyan•2y ago
yeah, I am adding a uniqueKey
Alexey Udovydchenko
Alexey Udovydchenko•2y ago
For a quick check you can try a named request queue per strategy; if that works, it means keys are being duplicated in the unified RQ.
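That quick check could look roughly like the sketch below: each strategy opens its own named `RequestQueue` (a real Crawlee API) so concurrent workflows never share deduplication state. The naming helper and handler are assumptions for illustration; the Crawlee part is commented because it needs the library installed.

```javascript
// Hypothetical naming helper: a stable queue name per workflow type.
// Named queues persist under that name across runs.
function queueNameFor(strategy) {
  return `rq-${strategy}`;
}

// With Crawlee, one crawler per strategy might look roughly like:
//
//   import { CheerioCrawler, RequestQueue } from 'crawlee';
//
//   async function runStrategy(strategy, urls, requestHandler) {
//     const requestQueue = await RequestQueue.open(queueNameFor(strategy));
//     const crawler = new CheerioCrawler({ requestQueue, requestHandler });
//     await crawler.run(urls);
//   }
```

If the separate-queue version works but the shared-queue version skips requests, that confirms the duplicated-key diagnosis above.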