cirez_d

Error handling/Best Practices Python SDK

Hello,
I am using pre-built actors in my application. I use them like this to create the dataset:

    import asyncio
    from apify_client import ApifyClientAsync

    client = ApifyClientAsync(token=settings.APIFY_API_TOKEN)
    run = await client.actor(actor.value).start(run_input=run_input)
    processed = 0
    while True:
        await asyncio.sleep(2)
        # Read the status before the items, so the last pass still collects
        # anything written just before the run ended.
        run_status = await client.run(run["id"]).get()
        data = client.dataset(run["defaultDatasetId"]).iterate_items(offset=processed)
        async for item in data:
            dataset.append(item)
            processed += 1
            logger.info(f"processing item: {item.get('url')}")
        # READY means the run was created but has not started yet.
        if run_status.get("status") in ("READY", "RUNNING"):
            logger.info("Run is still running")
            continue
        logger.info("Run is finished.")
        break

I want to improve the error handling of this approach. Which types of errors or issues could I encounter, and what are the best practices? For example: what happens if the actor breaks (memory, CPU, or another issue), or if I get an exception (and of which types)? What if there are errors in the dataset (400 status code, crawler blocked, etc.)? Does anyone have recommendations here? Thank you!
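One way to frame the question is to separate "the run ended badly" from "the run is still going". The sketch below is only an illustration under stated assumptions: it uses the run statuses documented by Apify (READY, RUNNING, SUCCEEDED, FAILED, TIMING-OUT, TIMED-OUT, ABORTING, ABORTED), while `classify_status`, `collect_items`, and `RunFailedError` are hypothetical names invented here, not part of the Apify client. The two callables stand in for the client calls from the snippet above, so the logic can be tried without network access.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

# Status names come from the Apify docs; grouping them is this sketch's choice.
IN_PROGRESS = {"READY", "RUNNING", "TIMING-OUT", "ABORTING"}
SUCCEEDED = {"SUCCEEDED"}


class RunFailedError(RuntimeError):
    """Hypothetical error: the run ended in a non-success state."""

    def __init__(self, status: Optional[str], items: list):
        super().__init__(f"Run ended with status {status!r}")
        self.status = status
        self.items = items  # items collected before the failure


def classify_status(run: Optional[dict]) -> str:
    """Map a run record to 'in_progress', 'succeeded' or 'failed'."""
    status = run.get("status") if run else None
    if status in IN_PROGRESS:
        return "in_progress"
    if status in SUCCEEDED:
        return "succeeded"
    return "failed"  # FAILED, TIMED-OUT, ABORTED, missing run, unknown status


async def collect_items(
    get_run: Callable[[], Awaitable[Optional[dict]]],
    iter_items: Callable[[int], AsyncIterator[dict]],
    poll_secs: float = 2.0,
    max_polls: int = 1000,
) -> list:
    """Poll the run and drain new dataset items until it ends."""
    items: list = []
    for _ in range(max_polls):
        # Status first, items second: the final pass cannot miss late items.
        run = await get_run()
        state = classify_status(run)
        async for item in iter_items(len(items)):
            items.append(item)
        if state == "succeeded":
            return items
        if state == "failed":
            raise RunFailedError(run.get("status") if run else None, items)
        await asyncio.sleep(poll_secs)
    raise TimeoutError(f"Run still in progress after {max_polls} polls")
```

With the client from the post, `get_run` could be `lambda: client.run(run["id"]).get()` and `iter_items` could be `lambda offset: client.dataset(run["defaultDatasetId"]).iterate_items(offset=offset)`. Errors raised by the client itself (network or HTTP failures, reported as `ApifyApiError` in the client versions this sketch assumes) are not handled here and would need their own `try`/`except` around the call. Per-item problems such as blocked pages depend on what the specific actor writes to its dataset, so they have to be checked against that actor's output fields.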

5 comments

cirez_d

SharePoint pages crawlable?

Hello, is it possible to crawl SharePoint pages that sit behind an authentication layer? Is this generally possible, and does anyone have experience with it? Thank you.

2 comments
