Apify Discord Mirror

Hey guys, not the most advanced Apify user, so I need help with scraping leads. The issue is: I scrape the max 5k leads, then try to restart the scraper and it rescrapes the same 5k leads. How can I get it to scrape the next 5k leads?
We're using the proxy feature and our usage is somewhat difficult to predict. We'd like to either be notified via Slack when our account balance goes below a given threshold, OR we'd like to set up automatic account balance top-ups.

Are either of these possible?
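A rough sketch of the Slack half of this is below; the balance lookup itself is left as a placeholder, since the exact Apify API endpoint and response field for remaining balance aren't confirmed here. Slack incoming webhooks, for their part, accept a simple JSON POST:
Plain Text
import os

import httpx

SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']  # from Slack's "Incoming Webhooks" app
THRESHOLD_USD = 10.0  # alert when balance drops below this

def get_account_balance() -> float:
    # Placeholder: fetch your remaining balance via the Apify API here.
    # The exact endpoint and field are assumptions, so this is left
    # unimplemented on purpose.
    raise NotImplementedError

def check_and_alert() -> None:
    balance = get_account_balance()
    if balance < THRESHOLD_USD:
        # Slack incoming webhooks accept a plain JSON payload with a 'text' key.
        httpx.post(SLACK_WEBHOOK_URL, json={
            'text': f'Apify balance is ${balance:.2f} (threshold ${THRESHOLD_USD:.2f})',
        })

if __name__ == '__main__':
    check_and_alert()

Running a script like this on a schedule (cron or an Apify Schedule) would approximate the threshold alerts.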
Can I just use the free plan and refill some credits, pay as you go? Because my usage is a one-time thing.
2 comments
When using a transparent icon for the Actor (WebP or PNG images), an unexpected black border appears (in Google Chrome at 80% zoom).
1 comment
Hi there. I am coming from ScraperAPI solutions and I am having issues with them. I just want to try Apify.
I am trying to build my first Actor, without any success so far.
The test Actor sample offers a full example. Sounds great, but when I try to use a URL other than the default one (https://www.apify.com), I get an error. For example, I try https://fr.indeed.com and I get an error. Any idea?
1 comment
When I use apify run, it says Python can't be detected. It's installed, it's in the PATH variable, and it works from cmd and PowerShell like a charm. I also updated Node and npm to the latest versions and reinstalled apify-cli.
billsauce

error

Hi, why do I always get this error? I have Apify Pro.
Plain Text
raise ApifyApiError(response, attempt)
apify_client._errors.ApifyApiError: You must rent a paid Actor in order to run it.
I want to test the Apify proxy and how it works, so I can integrate it with my Python code.
Running a very simple check, I found it's not working with HTTPS URLs. Here's a snippet:
Plain Text
import asyncio

import dotenv
import httpx
from apify import Actor

async def main():
    async with Actor:
        # Build an Apify proxy configuration with the password from .env.
        proxy_configuration = await Actor.create_proxy_configuration(
            password=dotenv.get_key('.env', 'APIFY_PROXY_PASSWORD'),
        )
        proxy_url = await proxy_configuration.new_url()
        # Route all requests through the proxy.
        async with httpx.AsyncClient(proxy=proxy_url) as client:
            for _ in range(3):
                response = await client.get('https://httpbin.org/ip')
                if response.status_code == 200:
                    print(response.json())
                else:
                    print(response.text)

if __name__ == '__main__':
    asyncio.run(main())

giving me a proxy error:
Plain Text
          raise mapped_exc(message) from exc
      httpx.ReadTimeout
[apify] INFO  Exiting Actor ({"exit_code": 91})

If I just change the protocol to http://httpbin.org/ip, it works.
Apify proxy should support HTTPS, as stated on the site. Thanks in advance.
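To rule the SDK in or out, the same request can be made with the proxy URL built by hand. A minimal sketch, assuming the standard proxy.apify.com:8000 endpoint (it also appears in a curl example further down this page) and the default 'auto' proxy group username; the password placeholder must be replaced with a real one:
Plain Text
import httpx

# Apify proxy URL format: http://<username>:<password>@proxy.apify.com:8000
# 'auto' is the default proxy group username; replace the password placeholder.
proxy_url = 'http://auto:YOUR_APIFY_PROXY_PASSWORD@proxy.apify.com:8000'

with httpx.Client(proxy=proxy_url) as client:
    response = client.get('https://httpbin.org/ip')
    print(response.status_code, response.text)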
3 comments
What is wrong with my transformation?
Everything under physicianInfo is not being displayed in joboverview.
DuxSec
Solved

Double log output

In main.py logging works as expected; in routes.py, however, logging is printed twice for some reason.
I did not set up any custom logging, I just use
Actor.log.info("STARTING A NEW CRAWL JOB")

example:
Plain Text
[apify] INFO  Checking item 17
[apify] INFO  Checking item 17 ({"message": "Checking item 17"})
[apify] INFO  Processing new item with index: 17
[apify] INFO  Processing new item with index: 17 ({"message": "Processing new item with index: 17"})


If I add this to my main.py (https://docs.apify.com/sdk/python/docs/concepts/logging):
Plain Text
import logging

from apify import Actor
from apify.log import ActorLogFormatter

async def main() -> None:
    async with Actor:
        ##### SETUP LOGGING #####
        handler = logging.StreamHandler()
        handler.setFormatter(ActorLogFormatter())

        apify_logger = logging.getLogger('apify')
        apify_logger.setLevel(logging.DEBUG)
        apify_logger.addHandler(handler)

it prints everything from main.py 2x, and everything from routes.py 3x.

Plain Text
[apify] INFO  STARTING A NEW CRAWL JOB
[apify] INFO  STARTING A NEW CRAWL JOB ({"message": "STARTING A NEW CRAWL JOB"})
[apify] INFO  STARTING A NEW CRAWL JOB ({"message": "STARTING A NEW CRAWL JOB"})
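A doubled line per record usually means two handlers are attached to the same logger (the SDK already configures one when running on the platform). Not a confirmed fix for this thread, but a minimal guard sketch using the same setup as the snippet above:
Plain Text
import logging

from apify.log import ActorLogFormatter

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)

# Attach our handler only if the logger has none yet,
# so each record is emitted exactly once.
if not apify_logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(ActorLogFormatter())
    apify_logger.addHandler(handler)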
11 comments
Hi, I've seen mentions of a "pay per event" pricing model (https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event and https://apify.com/mhamas/pay-per-event-example), but I can't find how to enable it for one of my Actors; I only see the rental and pay-per-result options.
How can we use this pay-per-event pricing model?
8 comments
Hello, I would like to ask if any Apify tool can, for example, find a similar image (https://i.postimg.cc/KzRHFKQc/55.jpg) and extract the product name from the links to CSV. Can we use Google Lens? I want to use this to automatically name antique products.

Thanks for all the information and help! 👋
1 comment

I have created a scraper but am having issues publishing it to the Store. I opened my account 2 days ago and would like to start earning money on my scraper.

I'm attempting to validate that the proxy works and am not having any luck. Should I expect the following to work?

Plain Text
~ λ curl --proxy http://proxy.apify.com:8000  -U 'groups-RESIDENTIAL,country-US:apify_proxy_redacted' -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"  https://httpbin.org/ip
curl: (56) CONNECT tunnel failed, response 403
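For comparison, the same check from Python with requests, reusing the proxy details from the curl command (password redacted as above):
Plain Text
import requests

# Same proxy and credentials as the curl command above (password redacted).
proxy_url = 'http://groups-RESIDENTIAL,country-US:apify_proxy_redacted@proxy.apify.com:8000'

response = requests.get(
    'https://httpbin.org/ip',
    proxies={'http': proxy_url, 'https': proxy_url},
)
print(response.status_code, response.text)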
3 comments
my account: https://apify.com/wudizhangzhi
actors: https://console.apify.com/actors/KAkfFaz8JVdvOQQ5F/source

Error: Operation failed! (You currently don’t have the necessary permissions to publish an Actor. This is expected behavior. Please contact support for assistance in resolving the issue.)

@Saurav Jain
2 comments
Guys, I'm new to Apify and I want to publish my newly built job scraper, but when I set up monetization there are two options, business ID and personal ID. Where can I get these?
1 comment
Hi everyone,
I recently ran a Google Maps scraper (https://apify.com/compass/crawler-google-places) to collect place data, and I've discovered that there are many more places available than what was initially collected in my first run.
Current Situation:
  • Successfully completed an initial scrape
  • Have collected data for X places
  • Discovered there are significantly more places available
  • Already have a dataset from the first run
Questions:
Is it possible to increase the place limit on my existing run configuration?
If I need to create a new run, what's the best way to:
  • Import/merge my existing scraped data
  • Avoid duplicating places already collected
  • Continue from where the previous run stopped
Any guidance on the most efficient approach would be greatly appreciated.
Thanks in advance!
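Not an official recipe, but one way to approach the merge/dedup part with apify-client: pull both runs' datasets and key the items on placeId (assumed here to be the unique identifier field in the scraper's output):
Plain Text
from apify_client import ApifyClient

client = ApifyClient('MY-APIFY-TOKEN')  # placeholder token

# Merge the old and new runs, keeping the first occurrence of each place.
merged: dict[str, dict] = {}
for dataset_id in ('OLD_RUN_DATASET_ID', 'NEW_RUN_DATASET_ID'):
    # iterate_items() pages through the entire dataset.
    for item in client.dataset(dataset_id).iterate_items():
        # 'placeId' is assumed to be the unique key in the output items.
        merged.setdefault(item['placeId'], item)

print(f'{len(merged)} unique places after merging')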
4 comments