stormy-gold•2y ago
how to handle different workflows with one or multiple crawlers?
Hello,
I have an Express server standing in front of the crawler, accepting different types of mutually exclusive workflow requests, like a strategy pattern. Each workflow works fine if only one strategy runs at a time, but none of them can run concurrently.
I tried two separate approaches:
1. Separate crawler instances, but then I get an invalid queue ID error.
2. The same crawler with control flow, but this results in the requestHandler logic being skipped and data not being found in the KeyValueStore.
Any suggestions for how to handle different workflows concurrently?
5 Replies
cloudy-cyan•2y ago
@simponacci did you ever get an answer to this?
Are you adding requests with your own https://crawlee.dev/api/types/interface/RequestSchema#uniqueKey ? It sounds like not, because once a request has been processed, adding the same URL again is considered already resolved and no further action is taken. The logic behind this is to avoid scraping the same URL multiple times unless your code explicitly forces it.
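For context: in Crawlee, a request's `uniqueKey` defaults to its (normalized) URL, and the queue silently drops any request whose key it has already seen. Here is a minimal stdlib-only sketch of that deduplication behavior (a toy model, not Crawlee's actual implementation; the class and field names are invented for illustration):

```javascript
// Toy model of request-queue deduplication: uniqueKey defaults to the URL,
// and a request whose key was already seen is silently dropped.
class ToyRequestQueue {
  constructor() {
    this.seen = new Set();
    this.pending = [];
  }
  addRequest({ url, uniqueKey }) {
    const key = uniqueKey ?? url; // Crawlee also normalizes the URL; omitted here
    if (this.seen.has(key)) return { wasAlreadyPresent: true };
    this.seen.add(key);
    this.pending.push({ url, uniqueKey: key });
    return { wasAlreadyPresent: false };
  }
}

const q = new ToyRequestQueue();
q.addRequest({ url: 'https://example.com/item' });
// Same URL again: dropped, so a second workflow's handler never runs for it.
const dup = q.addRequest({ url: 'https://example.com/item' });
// A distinct uniqueKey forces the same URL to be enqueued again.
const forced = q.addRequest({ url: 'https://example.com/item', uniqueKey: 'workflow-b:item' });
console.log(dup.wasAlreadyPresent, forced.wasAlreadyPresent, q.pending.length);
// → true false 2
```

This is why two workflows sharing one queue can starve each other: the second workflow's requests for already-seen URLs are treated as duplicates unless each workflow namespaces its `uniqueKey`s.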
cloudy-cyan•2y ago
yeah, I am using a uniqueKey
As a quick check, you can try a named request queue per strategy; if that works, it means the keys in the unified RQ are being duplicated.
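In Crawlee, a named queue would be opened with something like `const queue = await RequestQueue.open('strategy-a')` and passed to each crawler's `requestQueue` option. A stdlib-only sketch of why named queues avoid the collision (each queue keeps its own dedup set, so two strategies can enqueue the same URL independently; the function names here are invented for illustration):

```javascript
// Toy model: one dedup set per named queue, so the same URL can be
// enqueued independently by two different strategies.
const queues = new Map();

function openQueue(name) {
  if (!queues.has(name)) queues.set(name, { name, seen: new Set(), pending: [] });
  return queues.get(name);
}

function addRequest(queue, url) {
  if (queue.seen.has(url)) return false; // duplicate within this queue only
  queue.seen.add(url);
  queue.pending.push(url);
  return true;
}

const qa = openQueue('strategy-a');
const qb = openQueue('strategy-b');

addRequest(qa, 'https://example.com/page');
// In a single shared queue this second add would be dropped as a duplicate;
// with a separate named queue, strategy B still gets its own copy.
const accepted = addRequest(qb, 'https://example.com/page');
console.log(accepted, qa.pending.length, qb.pending.length);
// → true 1 1
```

If the named-queue test succeeds, the fix for the shared-queue setup is either to keep one queue per strategy or to prefix each strategy's `uniqueKey`s so they can't collide.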