gtry (2y ago)


Hi @vladdy @Nardack @petrpatek. If I need to scrape an API with POST requests using Crawlee, how do I do it? Thanks
8 Replies
Alexey Udovydchenko
In the actor console, e.g. https://console.apify.com/actors/4Hv5RhChiaDk6iwad/console, click the "API" button in the top right area, then choose "API Endpoints" and proceed either with the calls or with the links to the reference manual.
gtry (OP, 2y ago)
@vladdy @petrpatek @Alexey Udovydchenko Hi Alexey, thank you for your response. Actually, by API POST request I meant that instead of the regular GET request we make with the crawler.run function, I would like to make a POST request:
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'FR',
});
const crawler = new HttpCrawler({
    proxyConfiguration,
    requestHandler: async ({ request, sendRequest, log, pushData }) => {
        // Based on the response of the POST request, I would like to make other
        // POST requests here to the same URL but with different headers and body.
        log.debug(`Enqueueing pagination: ${request.url}`);
    },
});

await crawler.run([
    {
        url: <URL>,
        method: 'POST',
        headers: headers,
        payload: body,
    },
]);
I have the following questions:
1. How do I achieve that in Crawlee? Please see the comment inside the requestHandler function.
2. Since I am using Actor.createProxyConfiguration, will it automatically pick Apify Proxy without me having to do anything else?
3. If I want to provide a custom array of proxy URLs, how do I do that with Actor.createProxyConfiguration?
4. Can I use Actor.createProxyConfiguration with the BasicCrawler as in the code above?
5. What is the difference between
// no opts
const proxyConfiguration = await Actor.createProxyConfiguration();

// opts
const proxyConfiguration = await Actor.createProxyConfiguration({
groups: ['RESIDENTIAL'],
countryCode: 'FR',
});

// and using
APIFY_PROXY_PASSWORD
I am developing an actor for apify. Thanks
Alexey Udovydchenko
Oh, clear now! You are already doing everything correctly: add method: "POST" and payload: [STRING] to the request, and the crawler will POST it through the proxy from proxyConfiguration. If you want custom proxies, you need the proxy value { proxyConfiguration: { useApifyProxy: false, proxyUrls: [...] }}. The basic crawler will not make the actual HTTP calls; it only forwards the request to the handler function.
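To make the "other POST requests to the same URL" part concrete, here is a minimal sketch rather than a definitive implementation. It assumes the endpoint returns JSON with a hypothetical nextPage field; TARGET_URL, the body shape, the x-page header and the buildPostRequest helper are all made up for illustration. The one detail worth noting is that Crawlee deduplicates requests by uniqueKey, which is derived from the URL by default, so repeated POSTs to one URL need distinct keys.

```javascript
// Hypothetical endpoint; replace with the real API URL.
const TARGET_URL = 'https://example.com/api/search';

// Builds a Crawlee request object for a JSON POST (illustrative helper).
function buildPostRequest(url, body, extraHeaders = {}) {
    const payload = JSON.stringify(body); // `payload` must be a string
    return {
        url,
        method: 'POST',
        payload,
        headers: { 'content-type': 'application/json', ...extraHeaders },
        // Distinct key per body, so the same URL is not dropped as a duplicate.
        uniqueKey: `POST:${url}:${payload}`,
    };
}

async function main() {
    // Loaded lazily so the helper above works without the packages installed.
    const { Actor } = await import('apify');
    const { HttpCrawler } = await import('crawlee');

    await Actor.init();
    const proxyConfiguration = await Actor.createProxyConfiguration({
        groups: ['RESIDENTIAL'],
        countryCode: 'FR',
    });

    const crawler = new HttpCrawler({
        proxyConfiguration,
        // Added defensively in case JSON responses are not accepted by default.
        additionalMimeTypes: ['application/json'],
        requestHandler: async ({ request, body, log, pushData }) => {
            const data = JSON.parse(body.toString());
            await pushData({ url: request.url, data });
            // `nextPage` is an assumed field in the response.
            if (data.nextPage) {
                log.debug(`Enqueueing page ${data.nextPage}`);
                await crawler.addRequests([
                    buildPostRequest(
                        request.url,
                        { page: data.nextPage },
                        { 'x-page': String(data.nextPage) },
                    ),
                ]);
            }
        },
    });

    await crawler.run([buildPostRequest(TARGET_URL, { page: 1 })]);
    await Actor.exit();
}
```

Calling main() from the actor's entry point starts the crawl. Enqueueing through crawler.addRequests keeps retries and proxy rotation in the crawler's hands; if the follow-up call has to happen inline within the same handler, sendRequest with overridden method, body and headers is probably the alternative to look at.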
gtry (OP, 2y ago)
Thank you for your response. Sorry, but I couldn't find details regarding the below code in the docs. So what is the difference between each one?
// With no options, does it automatically pick a proxy? If yes, which group?
const proxyConfiguration = await Actor.createProxyConfiguration();

// With options, I understand that we are selecting a specific proxy.
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'FR',
});

// or using
APIFY_PROXY_PASSWORD
Alexey Udovydchenko
Please check https://crawlee.dev/docs/guides/session-management, which has samples for each crawler type.
gtry (OP, 2y ago)
Hi, sorry, I'm not sure if I am missing something here, but I couldn't find anything about the difference between the proxyConfiguration options I listed above.
Alexey Udovydchenko
Hi! No config means the default config, which logically means using Apify Proxy but not the residential group. The password is not accepted in the config. More details: https://crawlee.dev/docs/guides/proxy-management#proxy-configuration
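One way to picture the difference between the variants is to look at the proxy URL each one roughly corresponds to. The sketch below is an illustration based on the documented Apify Proxy connection format, not the SDK's actual code; buildApifyProxyUrl is a hypothetical helper, and the assumption is that APIFY_PROXY_PASSWORD is the environment variable the SDK reads the password from by default.

```javascript
// Illustrative only: composes an Apify Proxy URL from the same options
// that Actor.createProxyConfiguration() takes.
function buildApifyProxyUrl({ groups = [], countryCode, password }) {
    const parts = [];
    if (groups.length) parts.push(`groups-${groups.join('+')}`);
    if (countryCode) parts.push(`country-${countryCode}`);
    // No parameters: the username is "auto" and the proxy picks servers itself.
    const username = parts.length ? parts.join(',') : 'auto';
    return `http://${username}:${password}@proxy.apify.com:8000`;
}

// The password is supplied separately from groups and country.
const password = process.env.APIFY_PROXY_PASSWORD ?? '<password>';

// No options: automatic selection.
console.log(buildApifyProxyUrl({ password }));

// Options: a specific group and country.
console.log(buildApifyProxyUrl({ groups: ['RESIDENTIAL'], countryCode: 'FR', password }));
```

So the options decide which proxies are used, while APIFY_PROXY_PASSWORD only authenticates the connection. For your own proxies, passing proxyUrls: [...] to Actor.createProxyConfiguration should bypass Apify Proxy entirely.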
gtry (OP, 2y ago)
Thanks
