**Best practice example on how to implement PPE pricing**

There are quite a few questions on how to correctly implement PPE (pay-per-event) charging, so this is how I implement it. It would be nice if someone at Apify or other community developers could verify the approach I'm using here or suggest improvements, so we can all learn from it.

The example fetches paginated search results and then scrapes the detailed listings. Some limitations and criteria:

- We only use the synthetic PPE events: `apify-actor-start` and `apify-default-dataset-item`.
- I want to detect free users and limit their functionality.
- We use datacenter proxies.
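With only these two synthetic events, the budget arithmetic is simple: whatever is left of the user's maximum cost per run after the start charge, divided by the price of one dataset item. The sketch below is a hypothetical helper, not part of the Apify SDK. It assumes the start event's total cost is known up front and every dataset item has a fixed price; all prices are made up. It works in integer micro-USD so the division is not affected by floating-point rounding. The SDK's `calculateMaxEventChargeCountWithinLimit()` used in the code is built on the same idea, but its exact implementation may differ.

```javascript
// Hypothetical helper (not an Apify API): how many dataset items fit into one run
// when only apify-actor-start and apify-default-dataset-item are charged.

// Convert USD to integer micro-USD so the division below is exact.
function toMicroUsd(usd) {
    return Math.round(usd * 1_000_000);
}

function maxItemsWithinBudget({ maxTotalChargeUsd, startCostUsd, itemPriceUsd }) {
    const itemPrice = toMicroUsd(itemPriceUsd);
    if (itemPrice <= 0) return Infinity; // free items are never limited by the budget
    const remaining = toMicroUsd(maxTotalChargeUsd) - toMicroUsd(startCostUsd);
    if (remaining <= 0) return 0; // the start charge already used up the budget
    return Math.floor(remaining / itemPrice);
}

// Illustrative prices: $5 max cost per run, $0.01 start charge, $0.005 per item.
console.log(maxItemsWithinBudget({ maxTotalChargeUsd: 5, startCostUsd: 0.01, itemPriceUsd: 0.005 })); // 998
```

Once this number reaches 0, every further item would exceed the user's limit, which is why the request handler aborts the crawl at that point.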
Louis Deconinck (OP), 6d ago
```javascript
import { Actor, log } from 'apify';
import { HttpCrawler } from 'crawlee';

await Actor.init();

// True when the user running the Actor is on a paid Apify plan
const { userIsPaying } = Actor.getEnv();
if (!userIsPaying) {
    log.info('You need a paid Apify plan to scrape multiple pages. Only the first search page will be scraped.');
}

const { keyword } = await Actor.getInput() ?? {};
if (!keyword) {
    await Actor.fail('Input field "keyword" is required.');
}

// Without explicit proxy groups, Apify Proxy uses datacenter proxies
const proxyConfiguration = await Actor.createProxyConfiguration();

const crawler = new HttpCrawler({
    proxyConfiguration,
    requestHandler: async ({ json, request, pushData, addRequests }) => {
        // Number of apify-default-dataset-item events that can still be charged
        // without exceeding the user's maximum cost per run
        const chargeLimit = Actor.getChargingManager().calculateMaxEventChargeCountWithinLimit('apify-default-dataset-item');
        if (chargeLimit <= 0) {
            log.warning('Reached the maximum allowed cost for this run. Increase the maximum cost per run to scrape more.');
            await crawler.autoscaledPool?.abort();
            return;
        }

        if (request.label === 'SEARCH') {
            const { listings = [], page = 1, totalPages = 1 } = json;

            // Enqueue all listings (addRequests returns a promise, so await it)
            for (const listing of listings) {
                await addRequests([{
                    url: listing.url,
                    label: 'LISTING',
                }]);
            }

            // On page 1, enqueue all other search pages, but only for paying users
            if (page === 1 && totalPages > 1 && userIsPaying) {
                for (let nextPage = 2; nextPage <= totalPages; nextPage++) {
                    const nextUrl = `https://example.com/search?keyword=${encodeURIComponent(request.userData.keyword)}&page=${nextPage}`;
                    await addRequests([{
                        url: nextUrl,
                        label: 'SEARCH',
                    }]);
                }
            }
        } else {
            // Process individual listing
            await pushData(json);
        }
    },
});

await crawler.run([{
    url: `https://example.com/search?keyword=${encodeURIComponent(keyword)}&page=1`,
    label: 'SEARCH',
    userData: { keyword },
}]);

await Actor.exit();
```
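The free-user restriction in the handler boils down to "only enqueue pages 2..N when `userIsPaying` is true". A minimal sketch of pulling that rule out into a pure helper, so it can be unit tested without running the crawler; `buildNextSearchRequests` is a hypothetical name, not an Apify or Crawlee API, and the URL pattern is the same placeholder `example.com` search used above.

```javascript
// Hypothetical helper (not an Apify/Crawlee API): builds the follow-up SEARCH
// requests that page 1 enqueues. Free users get none, so their run stops
// after the first search page.
function buildNextSearchRequests(keyword, totalPages, userIsPaying) {
    if (!userIsPaying || totalPages <= 1) return [];
    const requests = [];
    for (let page = 2; page <= totalPages; page++) {
        requests.push({
            url: `https://example.com/search?keyword=${encodeURIComponent(keyword)}&page=${page}`,
            label: 'SEARCH',
        });
    }
    return requests;
}

console.log(buildNextSearchRequests('red shoes', 3, true).map((r) => r.url));
console.log(buildNextSearchRequests('red shoes', 3, false).length); // 0
```

Because the helper returns the whole array, the handler can also pass it to `addRequests` in a single call.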
Strijdhagen, 5d ago
Super helpful, thanks for sharing, Louis!
Louis Deconinck (OP), 5d ago
An improvement to my code: it's better to first collect the new requests in an array and add them in one go, instead of adding one request at a time, which might overload the Apify API.
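A minimal sketch of that change. To keep it self-contained, `addRequests` here is a hypothetical stub standing in for the function from Crawlee's crawling context; it only records how many requests each call received, so you can see that three listings now cost one call instead of three.

```javascript
// Hypothetical stand-in for Crawlee's context.addRequests(): records the size
// of each call instead of talking to a request queue.
const calls = [];
async function addRequests(requests) {
    calls.push(requests.length);
}

// Build every LISTING request first, then enqueue them with a single call
// instead of calling addRequests() once per listing.
async function enqueueListings(listings) {
    const requests = listings.map((listing) => ({ url: listing.url, label: 'LISTING' }));
    if (requests.length > 0) {
        await addRequests(requests);
    }
    return requests.length;
}

const listings = [1, 2, 3].map((id) => ({ url: `https://example.com/listing/${id}` }));
enqueueListings(listings).then((count) => {
    console.log(count, calls); // 3 [ 3 ]
});
```

In the real handler you would pass the same array to the `addRequests` from the crawling context, and the search pages 2..N can be collected and added the same way.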
gtry, 5d ago
Previously, I used `charging-manager.ts`. Has the API changed?
Lukas Krivka, 5d ago
> Previously, I used `charging-manager.ts`. Has the API changed?

There is now an option to let the Apify platform charge `apify-actor-start` and `apify-default-dataset-item` on its own, so you don't need to charge them in code. But you lose some flexibility with that.
