Eric
Eric•2w ago

Strange behaviour when using rq_client()

I have been struggling with tests for a while, and finally reduced it to a simple test whose behaviour I don't understand. Is this expected (and I am missing something), or is this a bug? This test fails:

async def test_failing():
    storage_client = MemoryStorageClient()
    request_queue_client = await storage_client.create_rq_client()

    req = Request.from_url("https://crawlee.dev")
    await request_queue_client.add_batch_of_requests([req])

    crawler = BasicCrawler(
        concurrency_settings=ConcurrencySettings(desired_concurrency=1, max_concurrency=1),
        max_crawl_depth=2,
        storage_client=storage_client,
    )

    @crawler.router.default_handler
    async def handler(context: BasicCrawlingContext) -> None:
        pass

    stats = await crawler.run()
    assert stats.requests_finished > 0

but this one passes:

async def test_success():
    storage_client = MemoryStorageClient()

    req = Request.from_url("https://crawlee.dev")

    crawler = BasicCrawler(
        concurrency_settings=ConcurrencySettings(desired_concurrency=1, max_concurrency=1),
        max_crawl_depth=2,
        storage_client=storage_client,
    )

    @crawler.router.default_handler
    async def handler(context: BasicCrawlingContext) -> None:
        pass

    await crawler.add_requests([req])

    stats = await crawler.run()
    assert stats.requests_finished > 0

The only difference is whether I add the request through a request queue client or through the crawler. If I add it through this:

rq = await RequestQueue.open()
await rq.add_request(req)

it also fails. Thanks in advance
5 Replies
Mantisus
Mantisus•2w ago
This usage is not recommended:
storage_client = MemoryStorageClient()
request_queue_client = await storage_client.create_rq_client()

req = Request.from_url("https://crawlee.dev")
await request_queue_client.add_batch_of_requests([req])
To create a queue and interact with it, use RequestQueue. The following code should work without any errors:
async def test():
    storage_client = MemoryStorageClient()
    request_queue = await RequestQueue.open(storage_client=storage_client)

    req = Request.from_url("https://crawlee.dev")

    crawler = BasicCrawler(
        concurrency_settings=ConcurrencySettings(desired_concurrency=1, max_concurrency=1),
        max_crawl_depth=2,
        request_manager=request_queue,
    )

    @crawler.router.default_handler
    async def handler(context: BasicCrawlingContext) -> None:
        pass

    await request_queue.add_request(req)

    stats = await crawler.run()
    assert stats.requests_finished > 0
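The likely intuition behind the fix above: the crawler only consumes the queue instance it was actually given (via request_manager) or the default queue it opens itself; requests added through a separately created low-level queue client land in a different queue that the crawler never reads. Here is a conceptual sketch in plain Python (FakeQueue and FakeStorage are made-up names, not the real crawlee API) of that behaviour:

```python
# Conceptual model only -- not crawlee's actual classes or method names.

class FakeQueue:
    """A minimal stand-in for a request queue."""
    def __init__(self):
        self.requests = []

    def add(self, req):
        self.requests.append(req)


class FakeStorage:
    """A minimal stand-in for a storage client."""
    def __init__(self):
        self._queues = {}

    def create_rq_client(self):
        # Models the low-level client: each call yields a fresh,
        # anonymous queue unrelated to the crawler's default one.
        return FakeQueue()

    def open_default_queue(self):
        # Models what the crawler does: open (or lazily create)
        # the single shared default queue.
        return self._queues.setdefault("default", FakeQueue())


storage = FakeStorage()

# Failing pattern: the request goes into an anonymous side queue...
side_queue = storage.create_rq_client()
side_queue.add("https://crawlee.dev")

# ...but the crawler reads the default queue, which is still empty.
crawler_queue = storage.open_default_queue()
print(len(crawler_queue.requests))  # 0 -- the crawler sees nothing

# Working pattern: share the very same queue object with the crawler.
shared = storage.open_default_queue()
shared.add("https://crawlee.dev")
print(len(crawler_queue.requests))  # 1 -- same underlying queue
```

This is why passing the opened RequestQueue to the crawler as request_manager (as in the test above) makes the request visible: producer and consumer now hold the same queue.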
Eric
EricOP•2w ago
oh okay, thank you very much! The difference is not intuitive to me, and I don't see why the first would not work, but I changed to what you suggest and it works 🙂 @Mantisus is it possible that this deletes the request queue?

request_queue = await RequestQueue.open(storage_client=sql_client)
dataset = await Dataset.open(storage_client=sql_client)

My db keeps getting purged since I changed to this.
Mantisus
Mantisus•2w ago
@Eric You should use a named queue so that it is not purged:
request_queue = await RequestQueue.open(storage_client=sql_client, name='my-best-queue')
Default storages and storages created with an alias are purged during startup if Configuration.purge_on_start=True (the default behavior). https://crawlee.dev/python/docs/guides/storages#named-and-unnamed-storages
Eric
EricOP•2w ago
oh okay! thanks! I had not read this page sorry