multiple-amethyst
multiple-amethyst•4y ago

how to disable duplicates check

import { Dataset, HttpCrawler, log, LogLevel } from 'crawlee';

log.setLevel(LogLevel.DEBUG);

const crawler = new HttpCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    minConcurrency: 1,
    maxConcurrency: 5,
    maxRequestRetries: 1,
    requestHandlerTimeoutSecs: 30,
    maxRequestsPerCrawl: 10,
    async requestHandler({ request, body }) {
        log.debug(`Processing ${request.url}...`);
        log.debug(`${body}`);
    },
    failedRequestHandler({ request }) {
        log.debug(`Request ${request.url} failed twice.`);
    },
});

await crawler.run([
    'https://httpbin.org/ip',
    'https://httpbin.org/ip',
]);

log.debug('Crawler finished.');
This is my current code
5 Replies
ambitious-aqua
ambitious-aqua•4y ago
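Give each request a uniqueKey that is actually unique. Crawlee deduplicates requests by uniqueKey, which is derived from the URL by default, so two requests to the same URL count as duplicates. Or, if the URL is the same but the method or payload differs between requests, you can set the useExtendedUniqueKey option to true so those are included in the key. Both options go into RequestOptions, where you configure the url, label, headers, etc.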
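A rough sketch of the uniqueKey approach, in case it helps. The helper name withUniqueKeys and the `#index` suffix are made up for illustration and are not part of Crawlee; any string that differs per request should work as a key. The crawler call is left as a comment and assumes the crawler from the question.

```javascript
// Hypothetical helper (not part of Crawlee): turns a list of URLs into
// request objects whose uniqueKey differs even when the URL repeats.
function withUniqueKeys(urls) {
    return urls.map((url, index) => ({
        url,
        // Appending the list position keeps the key distinct for repeated URLs.
        uniqueKey: `${url}#${index}`,
    }));
}

const requests = withUniqueKeys([
    'https://httpbin.org/ip',
    'https://httpbin.org/ip',
]);

// Show the generated keys so the difference is visible.
console.log(requests.map((r) => r.uniqueKey));

// With the crawler from the question, pass these objects instead of plain strings:
// await crawler.run(requests);
```

Since the keys no longer collide, both requests should be queued and handled, while the URL that is actually fetched stays the same.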
multiple-amethyst
multiple-amethystOP•4y ago
thank you
