Apify Discord Mirror

cirez_d
Joined August 30, 2024
Hello,
I am using pre-built actors in my application. I use them like this to create the dataset:
    client = ApifyClientAsync(token=settings.APIFY_API_TOKEN)
    run = await client.actor(actor.value).start(run_input=run_input)
    processed = 0
    while True:
        await asyncio.sleep(2)
        data = client.dataset(run["defaultDatasetId"]).iterate_items(offset=processed)
        async for item in data:
            dataset.append(item)
            processed += 1
            logger.info(f"processing item: {item.get('url')}")
        run_status = await client.run(run["id"]).get()
        if run_status.get("status", None) == "RUNNING":
            logger.info("Run is still running")
            continue
        else:
            logger.info("Run is finished.")
            break

I want to improve the error handling of this approach. Which types of errors or issues could I encounter, and what are the best practices? For example: what happens if the actor breaks (memory, CPU, or another issue), or if I get an exception (and of which types)? What if there are errors in the dataset (400 status code, crawler blocked, etc.)? Does anyone have recommendations here? Thank you!!
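One part of this can be made concrete: the loop above treats every status other than "RUNNING" as finished, so a run that is still "READY" ends the loop early, and a failed run looks the same as a successful one. Below is a minimal sketch, assuming the run statuses listed in the Apify docs (READY, RUNNING, SUCCEEDED, FAILED, TIMING-OUT, TIMED-OUT, ABORTING, ABORTED), that `client.run(...).get()` can return `None` for a missing run, and that the run object carries a `statusMessage` field. The names `TERMINAL_STATUSES`, `ActorRunError`, `is_finished` and `raise_if_unsuccessful` are hypothetical helpers, not part of apify-client.

```python
from typing import Optional

# Run statuses after which the run no longer changes (assumed from the Apify docs).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


class ActorRunError(Exception):
    """Hypothetical exception for a run that ended without succeeding."""


def is_finished(run_info: Optional[dict]) -> bool:
    """Return True once the run is in a terminal state or cannot be found."""
    return run_info is None or run_info.get("status") in TERMINAL_STATUSES


def raise_if_unsuccessful(run_info: Optional[dict]) -> None:
    """Raise ActorRunError unless the run is still in progress or has succeeded."""
    if run_info is None:
        raise ActorRunError("Run not found")
    status = run_info.get("status")
    if status in TERMINAL_STATUSES and status != "SUCCEEDED":
        # statusMessage is assumed to hold the platform's short failure reason.
        message = run_info.get("statusMessage")
        raise ActorRunError(f"Run ended with status {status}: {message}")
```

In the loop, this would replace the `== "RUNNING"` check: fetch `run_status`, call `raise_if_unsuccessful(run_status)`, and `break` only when `is_finished(run_status)` is true. Fetching the status before reading the dataset on each pass, and reading the dataset one last time after a terminal status, avoids missing items written between the last read and the end of the run.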
5 comments
Hello, is it possible to crawl SharePoint pages that sit behind an auth layer? Is this generally possible, and does anyone have experience with it? Thank you.
2 comments