Big_Smoke_420
Apify & Crawlee, 2y ago
3 replies

Detect when a specific request finishes for an Express-served crawler

I'm developing a long-lived crawler that's being served behind Express. A user sends a request to "localhost:8347/search?q={query}", and the crawler searches Google to find sites to scrape. Currently, it only retrieves the page titles from each site.

Problem:
I need to determine when a specific user's request has finished processing, and I need to tell it apart from other users' requests. The (naive) solution is to check whether the RequestQueue is empty, but that doesn't work when several users' requests are filling the same RequestQueue. The only solution I can think of right now is to find every request with a specific datasetIndex in its request.userData property and check whether all of those requests are marked as "handled", but I don't know how to implement this yet. Are there any built-in methods in Crawlee that would solve this better?

Example scenario:
1. User 1 makes a request: localhost:8347/search?q="silmarillion"%20"1999"%20site:osta.ee
2. User 2 makes a request: localhost:8347/search?q="tasuja"%20"2017"%20site:osta.ee
3. User 1's request finishes (no more Google results to scrape).
4. searchGoogle needs to detect when User 1's request is complete and return the results to the Express route while differentiating it from User 2's request.
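
For reference, here is a rough sketch of the bookkeeping I have in mind, written without any Crawlee calls so it can be read on its own. SearchTracker and its methods (add, done, remaining, whenFinished) are hypothetical names I made up, not a Crawlee API. It assumes every request carries the datasetIndex of the search it belongs to in request.userData, and that each one gets counted when it is enqueued and again when it is handled or fails for good.

```typescript
// Hypothetical helper, NOT part of Crawlee: counts in-flight requests per search.
type SearchKey = string | number;

class SearchTracker {
  // Number of requests still outstanding for each datasetIndex.
  private pending = new Map<SearchKey, number>();
  // Callback that resolves the promise handed out by whenFinished().
  private waiters = new Map<SearchKey, () => void>();

  // Call whenever requests tagged with this datasetIndex are enqueued.
  add(datasetIndex: SearchKey, count = 1): void {
    this.pending.set(datasetIndex, this.remaining(datasetIndex) + count);
  }

  // Call once per tagged request when it is handled or has permanently failed.
  // Any follow-up requests must be add()-ed BEFORE this call, otherwise the
  // counter can reach zero while the search is still producing work.
  done(datasetIndex: SearchKey): void {
    const left = this.remaining(datasetIndex) - 1;
    if (left > 0) {
      this.pending.set(datasetIndex, left);
      return;
    }
    this.pending.delete(datasetIndex);
    this.waiters.get(datasetIndex)?.();
    this.waiters.delete(datasetIndex);
  }

  // How many requests are still outstanding for this search.
  remaining(datasetIndex: SearchKey): number {
    return this.pending.get(datasetIndex) ?? 0;
  }

  // Resolves once nothing is outstanding for this search.
  // Sketch limitation: only one waiter per datasetIndex is kept.
  whenFinished(datasetIndex: SearchKey): Promise<void> {
    if (this.remaining(datasetIndex) === 0) return Promise.resolve();
    return new Promise((resolve) => this.waiters.set(datasetIndex, resolve));
  }
}
```

The idea would be for searchGoogle to call add() when it enqueues the Google results for a user, for the crawler's requestHandler and failedRequestHandler to call done() with request.userData.datasetIndex, and for the Express route to await whenFinished() before sending the response. User 1's promise then resolves on its own even while User 2's requests are still sitting in the same RequestQueue. I haven't wired this into a real crawler yet, so treat it as a starting point.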