unwilling-turquoise
unwilling-turquoise•3y ago

How do I disable the duplicate requests check?

import { Dataset, HttpCrawler, log, LogLevel } from 'crawlee';

log.setLevel(LogLevel.DEBUG);

const crawler = new HttpCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    minConcurrency: 1,
    maxConcurrency: 5,
    maxRequestRetries: 1,
    requestHandlerTimeoutSecs: 30,
    maxRequestsPerCrawl: 10,
    async requestHandler({ request, body }) {
        log.debug(`Processing ${request.url}...`);
        log.debug(`${body}`);
    },
    failedRequestHandler({ request }) {
        log.debug(`Request ${request.url} failed twice.`);
    },
});

await crawler.run([
    'https://httpbin.org/ip',
    'https://httpbin.org/ip',
]);

log.debug('Crawler finished.');
This is my current code.
metropolitan-bronze
metropolitan-bronze•3y ago
Give each request a uniqueKey that is unique. Or, if the payloads/headers differ between requests but the URL is the same, you can set the useExtendedUniqueKey option to true. Both of these go into RequestOptions, where you also configure the url, label, headers, etc.
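For example, a minimal sketch building on the crawler above (the uniqueKey values and the POST payloads are just made-up illustrations):

// Same URL twice, but each request carries its own uniqueKey,
// so the request queue no longer deduplicates them.
await crawler.run([
    { url: 'https://httpbin.org/ip', uniqueKey: 'ip-1' },
    { url: 'https://httpbin.org/ip', uniqueKey: 'ip-2' },
]);

// Alternatively, when the URL is the same but the payloads differ,
// useExtendedUniqueKey derives the uniqueKey from the url, method, and payload.
await crawler.run([
    { url: 'https://httpbin.org/post', method: 'POST', payload: '{"n":1}', useExtendedUniqueKey: true },
    { url: 'https://httpbin.org/post', method: 'POST', payload: '{"n":2}', useExtendedUniqueKey: true },
]);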
unwilling-turquoise
unwilling-turquoiseOP•3y ago
Thank you!