Workflow for manually reprocessing requests when using @apify/storage-local for SQLite Request Queue
Use case: I'm debugging a crawler. The majority of request handlers succeed; only a few fail. I want to fix or adjust the request handler logic, get the failed URLs from the logs, open an SQLite editor, find those requests in the request queue table, and somehow mark them as unprocessed. Then I rerun the crawler with CRAWLEE_PURGE_ON_START=false so it only runs the previously problematic URLs. I iterate a few times to catch all the bugs, and then run the whole crawler with purged storage.
https://github.com/apify/crawlee/discussions/1232#discussioncomment-1625019
After a lot of debugging and investigating Crawlee & @apify/storage-local, I've managed to figure out a working workflow, but it's kinda laborious:
- set the row's orderNo to some future date, in milliseconds since the epoch [1]
- edit the row's json and remove the handledAt property [2]
- run the crawler, which will re-add the handledAt property
- delete the row's orderNo (not sure why that isn't done automatically)
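
In case it saves someone the manual editing, here is a rough Python sketch of the steps above using only the standard library. It is not an official tool. The table name RequestQueueRequests and the url column are assumptions on my part, so check them against your own database file first. orderNo, json and handledAt are the names from the steps above. The helper names mark_unhandled and clear_order_no are made up for this example.

```python
import json
import sqlite3
import time

# Assumed table name; verify it in your own SQLite file before running.
TABLE = "RequestQueueRequests"
DAY_MS = 24 * 60 * 60 * 1000


def mark_unhandled(conn, urls, table=TABLE):
    """Steps 1 and 2: set a future orderNo and drop handledAt from the json."""
    # "Some future date in ms from epoch": here, one day from now.
    future_order_no = int(time.time() * 1000) + DAY_MS
    changed = 0
    for url in urls:
        # Assumes the table has a `url` column next to `orderNo` and `json`.
        rows = conn.execute(
            f'SELECT DISTINCT json FROM "{table}" WHERE url = ?', (url,)
        ).fetchall()
        for (raw,) in rows:
            request = json.loads(raw)
            request.pop("handledAt", None)
            cur = conn.execute(
                f'UPDATE "{table}" SET orderNo = ?, json = ? '
                "WHERE url = ? AND json = ?",
                (future_order_no,
                 json.dumps(request, separators=(",", ":")), url, raw),
            )
            changed += cur.rowcount
    conn.commit()
    return changed


def clear_order_no(conn, urls, table=TABLE):
    """Step 4: after the rerun, clear orderNo on rows that were handled again."""
    cleared = 0
    for url in urls:
        rows = conn.execute(
            f'SELECT DISTINCT json FROM "{table}" WHERE url = ?', (url,)
        ).fetchall()
        for (raw,) in rows:
            # Only touch rows where the crawler re-added handledAt.
            if "handledAt" in json.loads(raw):
                cur = conn.execute(
                    f'UPDATE "{table}" SET orderNo = NULL '
                    "WHERE url = ? AND json = ?",
                    (url, raw),
                )
                cleared += cur.rowcount
    conn.commit()
    return cleared
```

Usage would be: open the queue's SQLite file with sqlite3.connect, call mark_unhandled with the failed URLs from the logs, rerun the crawler with CRAWLEE_PURGE_ON_START=false, then call clear_order_no with the same URLs. Back up the database file first, since this edits rows in place.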
[1] https://github.com/apify/apify-storage-local-js/blob/8dd40e88932097d2260f68f28412cc29ff894e0f/src/emulators/request_queue_emulator.ts#L341
[2] https://github.com/apify/crawlee/blob/52b98e3e997680e352da5763b394750b19110953/packages/core/src/storages/request_queue.ts#L164

