Apify Discord Mirror

DuxSec
Solved

Double log output

In main.py logging works as expected; in routes.py, however, every message is printed twice for some reason.
I did not set up any custom logging, I just use
Actor.log.info("STARTING A NEW CRAWL JOB")

example:
Plain Text
[apify] INFO  Checking item 17
[apify] INFO  Checking item 17 ({"message": "Checking item 17"})
[apify] INFO  Processing new item with index: 17
[apify] INFO  Processing new item with index: 17 ({"message": "Processing new item with index: 17"})


If I add the logging setup from the SDK docs (https://docs.apify.com/sdk/python/docs/concepts/logging) to my main.py:
Plain Text
import logging

from apify import Actor
from apify.log import ActorLogFormatter

async def main() -> None:
    async with Actor:
        ##### SETUP LOGGING #####
        handler = logging.StreamHandler()
        handler.setFormatter(ActorLogFormatter())

        apify_logger = logging.getLogger('apify')
        apify_logger.setLevel(logging.DEBUG)
        apify_logger.addHandler(handler)

then everything from main.py is printed twice, and everything from routes.py three times.

Plain Text
[apify] INFO  STARTING A NEW CRAWL JOB
[apify] INFO  STARTING A NEW CRAWL JOB ({"message": "STARTING A NEW CRAWL JOB"})
[apify] INFO  STARTING A NEW CRAWL JOB ({"message": "STARTING A NEW CRAWL JOB"})
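A likely cause (an assumption, not confirmed in the thread) is handler duplication: the Actor already attaches a handler to the 'apify' logger, so the extra StreamHandler above emits a second copy of each record, and records that also propagate to a configured root logger add a third. A minimal sketch of one way to avoid this, assuming that is what happens:

Plain Text
import logging

from apify.log import ActorLogFormatter

handler = logging.StreamHandler()
handler.setFormatter(ActorLogFormatter())

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.handlers.clear()   # drop any handler attached earlier, e.g. by the SDK
apify_logger.propagate = False  # keep records from also reaching the root logger
apify_logger.addHandler(handler)

With exactly one handler on the logger and propagation switched off, each record can only be emitted once.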
11 comments
A scraper that I am developing scrapes a SPA with infinite scrolling. This works fine, but after 300 seconds I get a WARN, which spawns another Playwright instance.
This probably happens because I only handle one request (I do not add anything to the RequestQueue), inside which I just loop until a finished condition is met.

Plain Text
[crawlee.storages._request_queue] WARN  The request queue seems to be stuck for 300.0s, resetting internal state. ({"queue_head_ids_pending": 0, "in_progress": ["tEyKIytjmqjtRvA"]})


What is a clean way to stop this from happening?
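One way to avoid the warning (a sketch under assumptions, not an answer from the thread) is to stop running a single handler for minutes: scroll and scrape a bounded chunk per handler call, keep progress in user_data, and re-enqueue the same URL under a fresh unique_key, so no single request stays in progress for 300 s. The names page_no and finished are illustrative, and the imports assume a recent crawlee release (older ones exposed these under crawlee.playwright_crawler):

Plain Text
from crawlee import Request
from crawlee.crawlers import PlaywrightCrawlingContext
from crawlee.router import Router

router = Router[PlaywrightCrawlingContext]()

@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    page_no = context.request.user_data.get('page_no', 0)

    # ... scroll once and scrape the newly loaded items here ...
    finished = False  # illustrative stop condition

    if not finished:
        await context.add_requests([
            Request.from_url(
                context.request.url,
                # A fresh unique_key keeps the queue from deduplicating the URL.
                unique_key=f'scroll-{page_no + 1}',
                user_data={'page_no': page_no + 1},
            )
        ])

Each handler call then finishes quickly, so the queue keeps seeing progress instead of one request that appears stuck.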
3 comments
If I use multiple files, what is the best way to pass data (user input, which contains 'max_results' or similar) to my routes.py?

Example snippet from main.py:
Plain Text
        max_results = 5 # example

        crawler = PlaywrightCrawler(
            headless=False, 
            request_handler=router,
        )
        await crawler.run([start_url])


Snippet from routes.py:
Plain Text
@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    max_results = ???
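One common pattern (a sketch, not the only option) is to attach the input to the start request as user_data in main.py and read it back from context.request.user_data in the handler:

Plain Text
# main.py
from crawlee import Request

max_results = 5  # example, e.g. taken from the Actor input

await crawler.run([
    Request.from_url(start_url, user_data={'max_results': max_results})
])

# routes.py
@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    max_results = context.request.user_data.get('max_results', 10)

This keeps routes.py free of globals, since every handler receives the value through the request it is processing.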
3 comments