liamk-ultra
liamk-ultra•2mo ago

Unexpected behavior with Statistics logging

I wanted to turn off the periodic Statistics logging; my crawls are relatively short, and I'm only interested in the final statistics. I could set log_interval to something very long, but I thought that setting periodic_message_logger to None would prevent logging. It doesn't: the code tests for it being None and falls back to Crawlee's default logger.
statistics = Statistics.with_default_state(
    log_message=f"{target.venue} Web Scraper Stats",
    log_interval=timedelta(minutes=30),
    periodic_message_logger=None,
    statistics_log_format="table",
)
Solution
Mantisus
Mantisus•2mo ago
This is expected behavior: periodic_message_logger expects either an external logger, or, if None, the default logger is used. You can achieve your goal by doing the following:
import asyncio
from logging import getLogger

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
from crawlee.statistics import Statistics

# A logger that exists but is disabled, so the periodic stats messages are dropped.
silent_logger = getLogger('silent')
silent_logger.disabled = True


async def main():
    crawler = HttpCrawler(
        statistics=Statistics.with_default_state(periodic_message_logger=silent_logger),
    )

    @crawler.router.default_handler
    async def request_handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
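If you'd rather not flip the disabled flag, a similar effect can be had with standard logging levels (my own suggestion, not from the thread): raise the logger's level past CRITICAL so the INFO-level periodic stats messages are filtered out while the logger itself stays enabled.

```python
import logging

# Alternative sketch: filter out everything up to and including CRITICAL.
# The periodic stats messages are emitted at INFO, so they never pass.
quiet_logger = logging.getLogger('quiet-stats')
quiet_logger.setLevel(logging.CRITICAL + 1)
```

This logger could then be passed as periodic_message_logger in the same way as silent_logger above.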
liamk-ultra
liamk-ultraOP•2mo ago
Okay, I'll do that. Thanks.
