Apify Discord Mirror

CupOfGeo
Joined December 23, 2024
Getting this "system is overloaded" message while just trying to scrape two URLs. This check has been looping for almost 10 minutes now. I set the CPU to 4 and memory to 4 GB but I'm still getting the message. I know Cloud Run doesn't like threads and background tasks; is that the real issue? Not sure. Wondering if anyone has run crawlers on Cloud Run.
Plain Text
[crawlee.events._event_manager] DEBUG LocalEventManager.on.listener_wrapper(): Awaiting listener task...
[crawlee.events._event_manager] DEBUG LocalEventManager.on.listener_wrapper(): Awaiting listener task...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._utils.system] DEBUG Calling get_memory_info()...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._utils.system] DEBUG Calling get_cpu_info()...
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
[crawlee.storages._request_queue] DEBUG There are still ids in the queue head that are pending processing ({"queue_head_ids_pending": 1})
[crawlee._autoscaling.autoscaled_pool] DEBUG Not scheduling new tasks - system is overloaded
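The autoscaled pool pauses scheduling when its memory/CPU snapshots look maxed out, which can happen inside a constrained Cloud Run container even for two URLs. Below is a minimal sketch of one way to loosen those limits, assuming crawlee's Configuration exposes memory_mbytes and max_used_cpu_ratio (field names and import paths may differ across crawlee versions; the same values can also be set via the CRAWLEE_MEMORY_MBYTES / CRAWLEE_MAX_USED_CPU_RATIO environment variables):
Python
import asyncio

from crawlee.configuration import Configuration
from crawlee.crawlers import BeautifulSoupCrawler

# Assumption: telling crawlee how much memory it may use and how much CPU load
# is acceptable stops the snapshotter from flagging the system as overloaded.
config = Configuration(
    memory_mbytes=4096,       # match the Cloud Run memory limit
    max_used_cpu_ratio=0.95,  # tolerate higher CPU before pausing scheduling
)

async def main() -> None:
    crawler = BeautifulSoupCrawler(configuration=config)
    await crawler.run(['https://example.com', 'https://example.org'])

if __name__ == '__main__':
    asyncio.run(main())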
1 comment
Hello, so ideally I would like to have one file per website I'm scraping (so some files will contain more than one handler), and I'm wondering what the best pattern for that is. Following the docs, I have router = Router[BeautifulSoupCrawlingContext]() as a global in my routes.py, but then I'd either need to pass that router around as a singleton into the different handler files, or import the handler files into the one routes.py and register the handlers there. The second option sounds better, but then I have something like webpage_handler.py containing my handler_one(context) and handler_two(context), and I register them in routes.py with the code below. Which is fine, but it doesn't look too pretty.
Python
from webpage_handler import handler_one, handler_two  # wrapped handlers live in their own module

@router.handler("my_label")
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    await handler_one(context)  # await, assuming handler_one is a coroutine

@router.handler("another_label")
async def handler_another_name(context: BeautifulSoupCrawlingContext) -> None:
    await handler_two(context)



To be honest I'm not super sure; wondering if someone already has a nice pattern that works.
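One pattern that avoids the wrapper functions (a sketch, assuming a hypothetical per-site module website_a.py and current crawlee import paths): keep the shared router in routes.py, decorate the handlers directly in each site module by importing that router, and import the site modules once so the decorators run and register everything.
Python
# routes.py - the single shared router instance
from crawlee.crawlers import BeautifulSoupCrawlingContext
from crawlee.router import Router

router = Router[BeautifulSoupCrawlingContext]()


# website_a.py - hypothetical per-site module; decorating registers the handler directly
from crawlee.crawlers import BeautifulSoupCrawlingContext

from routes import router

@router.handler("my_label")
async def handler_one(context: BeautifulSoupCrawlingContext) -> None:
    context.log.info(f'Handling {context.request.url}')

@router.handler("another_label")
async def handler_two(context: BeautifulSoupCrawlingContext) -> None:
    await context.enqueue_links()


# main.py - importing the site modules triggers registration
import website_a  # noqa: F401  (imported for its side effect: handler registration)
from crawlee.crawlers import BeautifulSoupCrawler

from routes import router

crawler = BeautifulSoupCrawler(request_handler=router)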
1 comment