Apify Discord Mirror

Updated 5 months ago

Getting no proxies on input - expected or a bug?

At a glance

The community member is trying to integrate their Scrapy actor with Playwright and is confused about the format of the proxy input from Apify. They printed out what their spider gets, which shows an empty list for APIFY_PROXY_SETTINGS, despite having the "Datacenter" option turned on. They are concerned that this means their scraper is running without proxies, which could be the reason they are getting blocked.

Another community member responds that the empty list is expected and that Datacenter proxies are used as the default. They suggest that the community member should be more interested in the actual proxyUrl, which contains information about the login and proxy password. They provide a link to the Apify documentation for more information on configuring proxies based on actor input.

The original community member thanks the other member and confirms that they now understand that the URL does not come from the input: the input is only configuration, which lets the SDK retrieve the URL from Apify's API.

I'm trying to integrate my Scrapy actor with Playwright, so I tried to figure out the actual format of the proxy input from Apify, so that I could somehow pass it over to Playwright.

I printed out what my spider gets and this is what it prints:
APIFY_PROXY_SETTINGS: {'apifyProxyGroups': [], 'useApifyProxy': True}

Empty list. Is that expected or a bug? Does it mean that my scraper runs without proxies even though I have the "Datacenter" option turned on? I'm really confused now.

The spider has some problems with getting blocked. If I thought I was using proxies but in fact there were none, then it's no surprise I'm getting blocked.
Attachment
Screenshot_2024-04-15_at_12.11.48.png
Hi
Yes, this is what comes from the UI component for selecting proxy.

If the ProxyConfiguration is set, Datacenter proxies are used by default (I think you could also specify the "DATACENTER" proxy group explicitly, but the result would be the same). When the "No proxy" tab is selected, the UI component would provide a None value instead of this object structure. That is my assumption, as I am not that familiar with the Python SDK, so you might want to ask elsewhere to get a better answer.
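A minimal sketch of how a spider could read the proxy input object printed in the question, following the reply's interpretation: an empty `apifyProxyGroups` list only means no group was picked explicitly (so the default applies), and `None` is assumed to mean "No proxy". The helper `describe_proxy_input` is hypothetical and not part of the Apify SDK; in a real Actor you would hand this object to the SDK rather than interpret it yourself.

```python
from typing import Any, Optional


def describe_proxy_input(proxy_input: Optional[dict[str, Any]]) -> str:
    """Summarize what a proxy input object asks for.

    Hypothetical helper for illustration only; not an Apify SDK API.
    """
    # Assumption (from the reply): selecting "No proxy" yields None.
    if not proxy_input or not proxy_input.get("useApifyProxy", False):
        return "no Apify Proxy requested"

    groups = proxy_input.get("apifyProxyGroups") or []
    if not groups:
        # An empty list is not an error: no group was chosen explicitly,
        # so the platform default (Datacenter, per the reply) applies.
        return "Apify Proxy with default group selection"
    return "Apify Proxy with groups: " + ", ".join(groups)


print(describe_proxy_input({"apifyProxyGroups": [], "useApifyProxy": True}))
# Apify Proxy with default group selection
```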

What you are probably more interested in is not what comes from the UI component but the actual proxyUrl, which contains the login, the proxy password, etc. You can find more information about it at https://docs.apify.com/sdk/python/docs/concepts/proxy-management#configuring-proxy-based-on-actor-input.
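Once you have the actual proxy URL (per the linked docs, the Python SDK builds a proxy configuration from this input via `Actor.create_proxy_configuration(actor_proxy_input=...)` and hands out URLs through its `new_url()` method), passing it to Playwright mostly means splitting it up, because Playwright's `proxy` option takes `server`, `username` and `password` as separate fields. The sketch below uses only the standard library. `playwright_proxy_from_url` is a hypothetical helper, and the example URL (its `auto` username and placeholder password) is illustrative only, so compare it with what `new_url()` actually returns.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy_from_url(proxy_url: str) -> dict[str, str]:
    """Split a proxy URL into the server/username/password fields that
    Playwright's `proxy` option accepts (hypothetical helper, not an SDK API)."""
    parts = urlsplit(proxy_url)
    host = parts.hostname
    if not parts.scheme or not host:
        raise ValueError(f"proxy URL needs a scheme and a host: {proxy_url!r}")

    # Playwright expects credentials as separate fields, not inside `server`.
    server = f"{parts.scheme}://{host}"
    if parts.port is not None:
        server += f":{parts.port}"

    proxy = {"server": server}
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


# Illustrative URL with a placeholder password, not a real credential.
print(playwright_proxy_from_url("http://auto:MY_PASSWORD@proxy.apify.com:8000"))
# {'server': 'http://proxy.apify.com:8000', 'username': 'auto', 'password': 'MY_PASSWORD'}
```

If you use the scrapy-playwright plugin, one place such a dict could go is the `proxy` key of its `PLAYWRIGHT_LAUNCH_OPTIONS` setting; treat that as a pointer to verify against the plugin's documentation, not as a tested configuration.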
Aaah, thanks! That makes sense 🙂 So the URL doesn't come from the input; it's only config, which allows the SDK to decide how to retrieve the URL from Apify's API. Cool!