Crawler becomes idle after some time (queue not empty)
I'm struggling to understand why my crawler goes idle after some time. I have tried multiple proxy providers (separately and mixed), so it doesn't look like proxy throttling to me. The crawler becomes idle (i.e. neither crawling nor scraping), yet the request queue (v1) isn't empty and the CPU is quite busy. When it starts running again, the CPU usage seems to drop. The crawler can stay idle for 5 minutes or so before it resumes, which is quite a long time. Statistics and the AutoscaledPool report this:
Maybe worth mentioning: my request queue (a single one) is now 2.7 GB in size.
Any suggestions as to what might be happening? Is it some sort of "clean up" on the queue?
EDIT:
This really seems to be RequestQueue related. I reached a point where the crawler won't crawl anymore, and I get this:
Any suggestions? I guess removing already crawled URLs from the queue would be a solution, though I'm not sure it's a good one.
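To make that idea concrete, here is a minimal sketch of what I have in mind. It is plain Python and not tied to any crawler library's API: the class `CompactQueue` and its methods are hypothetical names I made up for illustration. The assumption is that deduplication only needs a small fingerprint per URL, so the full record of a URL can be dropped as soon as it has been fetched.

```python
import hashlib
from collections import deque


class CompactQueue:
    """Hypothetical queue that keeps full URLs only while they are pending."""

    def __init__(self):
        self._pending = deque()  # full URLs still waiting to be crawled
        self._seen = set()       # 16-byte fingerprints of every URL ever added

    @staticmethod
    def _fingerprint(url):
        # Fixed-size digest, so a handled URL costs 16 bytes instead of a full record.
        return hashlib.blake2b(url.encode("utf-8"), digest_size=16).digest()

    def add(self, url):
        """Enqueue url unless it was added before; return True if it was enqueued."""
        fp = self._fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        self._pending.append(url)
        return True

    def fetch_next(self):
        """Remove and return the next pending URL, or None if nothing is pending."""
        return self._pending.popleft() if self._pending else None

    def pending_count(self):
        """Number of URLs that still have to be crawled."""
        return len(self._pending)
```

With something like this, the stored data would grow with the number of pending URLs plus a small fixed cost per seen URL, rather than with a full record for everything ever enqueued. The sketch ignores persistence and retries: a URL that fails after `fetch_next()` would have to be put back into the pending list by other means, since `add()` would reject it as already seen.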
Thank you!
