Apify Discord Mirror

CupOfGeo
Joined December 23, 2024
Getting this "system is overloaded" message while just trying to scrape two URLs. This check has been looping for almost 10 minutes now. I set the CPU to 4 and memory to 4 GB but I'm still getting the message. I know Cloud Run doesn't like threads and background tasks; is that the real issue? Not sure. Wondering if anyone has run crawlers on Cloud Run.
Plain Text
[crawlee.events._event_manager] DEBUG LocalEventManager.on.listener_wrapper(): Awaiting listener task...
[crawlee.events._event_manager] DEBUG LocalEventManager.on.listener_wrapper(): Awaiting listener task...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
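The autoscaled pool pauses scheduling when its memory/CPU snapshots look maxed out, which can happen inside a constrained Cloud Run container even for two URLs. Below is a minimal sketch of one way to loosen those limits, assuming crawlee's Configuration exposes memory_mbytes and max_used_cpu_ratio (field names and import paths may differ across crawlee versions; the same values can also be set via the CRAWLEE_MEMORY_MBYTES / CRAWLEE_MAX_USED_CPU_RATIO environment variables):
Python
import asyncio

from crawlee.configuration import Configuration
from crawlee.crawlers import BeautifulSoupCrawler

# Assumption: telling crawlee how much memory it may use and how much CPU load
# is acceptable stops the snapshotter from flagging the system as overloaded.
config = Configuration(
    memory_mbytes=4096,       # match the Cloud Run memory limit
    max_used_cpu_ratio=0.95,  # tolerate higher CPU before pausing scheduling
)

async def main() -> None:
    crawler = BeautifulSoupCrawler(configuration=config)
    await crawler.run(['https://example.com', 'https://example.org'])

if __name__ == '__main__':
    asyncio.run(main())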
1 comment
Hello, so ideally I would like to have one file per website I'm scraping (so some files will contain more than one handler), and I'm wondering what the best pattern for that is. Following the docs, I have router = Router[BeautifulSoupCrawlingContext]() as a global in my routes.py, but then I'd either need to pass that router around as a singleton into the different handler files, or import the handler files into the one routes.py and register the handlers there. The second option sounds better, but then I have something like webpage_handler.py containing my handler_one(context) and handler_two(context), and I register them in routes.py with the code below. Which is fine, but it doesn't look too pretty.
Python
from webpage_handler import handler_one, handler_two  # wrapped handlers live in their own module

@router.handler("my_label")
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    await handler_one(context)  # await, assuming handler_one is a coroutine

@router.handler("another_label")
async def handler_another_name(context: BeautifulSoupCrawlingContext) -> None:
    await handler_two(context)



To be honest I'm not super sure; wondering if someone already has a nice pattern that works.
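One pattern that avoids the wrapper functions (a sketch, assuming a hypothetical per-site module website_a.py and current crawlee import paths): keep the shared router in routes.py, decorate the handlers directly in each site module by importing that router, and import the site modules once so the decorators run and register everything.
Python
# routes.py - the single shared router instance
from crawlee.crawlers import BeautifulSoupCrawlingContext
from crawlee.router import Router

router = Router[BeautifulSoupCrawlingContext]()


# website_a.py - hypothetical per-site module; decorating registers the handler directly
from crawlee.crawlers import BeautifulSoupCrawlingContext

from routes import router

@router.handler("my_label")
async def handler_one(context: BeautifulSoupCrawlingContext) -> None:
    context.log.info(f'Handling {context.request.url}')

@router.handler("another_label")
async def handler_two(context: BeautifulSoupCrawlingContext) -> None:
    await context.enqueue_links()


# main.py - importing the site modules triggers registration
import website_a  # noqa: F401  (imported for its side effect: handler registration)
from crawlee.crawlers import BeautifulSoupCrawler

from routes import router

crawler = BeautifulSoupCrawler(request_handler=router)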
1 comment