ambitious-aqua · 3y ago

Failed request expected type of list but received object

Hello, I am adding a list to crawler.addRequests(), but I still get the error you can see in the title.
if (directRequests.length > 0) {
    await crawler.addRequests(directRequests);
} else {
    // start crawling from the site origin
    await crawler.addRequests([{
        url: SITE_ORIGIN,
        label: LABEL_CATEGORY,
    }]);
}
Here is the full ERROR message:
INFO Page opened. {"label":"category","url":"https://www.hanwha-security.eu"}
WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Expected `requests` to be of type `array` but received type `Object`
INFO Page opened. {"label":"category","url":"https://www.hanwha-security.eu"}
WARN CheerioCrawler: Reclaiming failed request back to the list or queue. Expected `requests` to be of type `array` but received type `Object`
How can I pass a list and not an object? Any suggestions? Many thanks in advance!
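For reference, crawler.addRequests() accepts an array of request objects, so the actor INPUT would need to look roughly like this (the URLs and the product label value are made-up placeholders; the exact label strings come from constants.js):
{
    "directRequests": [
        { "url": "https://www.hanwha-security.eu/example-category", "label": "category" },
        { "url": "https://www.hanwha-security.eu/example-product", "label": "product" }
    ]
}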
3 Replies
Oleg V. · 3y ago
Did you check the directRequests value? Are you sure you have an array of requests there? Maybe try to console.log it or check its type.
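For example, a quick way to see exactly what comes in (a sketch, placed right after the existing Actor.getInput() call):
const input = await Actor.getInput();
console.log(JSON.stringify(input, null, 2)); // shows the exact shape of directRequests in the input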
ambitious-aqua (OP) · 3y ago
The type is object. But in my Actor.main I read it with an array as the default, like this:
const { directRequests = [] } = await Actor.getInput();
log.info(typeof(directRequests));
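Note that typeof cannot distinguish the two cases: in JavaScript, typeof [] is also "object", so this log line prints object even when directRequests is a valid array. Array.isArray() is the check that actually tells them apart (small sketch):
console.log(typeof []);                      // "object"
console.log(typeof { url: '' });             // "object"
console.log(Array.isArray(directRequests));  // true only for a real array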
Here is the full code of main.js. My other scrapers work with this structure, but somehow this one seems to have problems while processing:
import { Actor } from 'apify';
import { log, CheerioCrawler } from 'crawlee';
import { CONSTANTS } from './constants.js';
import { handleCategories, handleProduct, handleProductList } from './routes/index.js';

const { LABEL_CATEGORY, LABEL_PRODUCT, LABEL_PRODUCT_LIST, DEBUG_DATASET_NAME, DATASET_NAME, SITE_ORIGIN } = CONSTANTS;

Actor.main(async () => {
    const { directRequests = [] } = await Actor.getInput();

    // The crawler creates its own request queue for the requests that will come up.
    const proxyConfiguration = await Actor.createProxyConfiguration();
    const dataset = await Actor.openDataset(DATASET_NAME);

    const crawler = new CheerioCrawler({
        proxyConfiguration,
        maxRequestRetries: 10,
        maxConcurrency: 10,
        requestHandler: async (context) => {
            const { url, label } = context.request;
            log.info('Page opened.', { label, url });

            switch (label) {
                case LABEL_CATEGORY:
                    return handleCategories(context);
                case LABEL_PRODUCT_LIST:
                    return handleProductList(context);
                case LABEL_PRODUCT:
                    return handleProduct(context, dataset);
                default:
                    throw new Error(`Unknown label: ${label}`);
            }
        },
    });

    if (directRequests.length > 0) {
        await crawler.addRequests(directRequests);
    } else {
        // Start crawling from the site origin.
        await crawler.addRequests([{
            url: SITE_ORIGIN,
            label: LABEL_CATEGORY,
        }]);
    }

    console.log('Starting the crawl.');
    await crawler.run();
});
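If the input can arrive in the wrong shape, a more defensive version of the enqueueing block at the bottom (just a sketch, reusing the same constants) would check the shape explicitly:
// Sketch: only forward directRequests when it really is a non-empty array.
if (Array.isArray(directRequests) && directRequests.length > 0) {
    await crawler.addRequests(directRequests);
} else {
    await crawler.addRequests([{
        url: SITE_ORIGIN,
        label: LABEL_CATEGORY,
    }]);
}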
Alexey Udovydchenko
You need to check the actual value of directRequests. From your description it sounds like you are passing the wrong value in the input.
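If the input cannot be guaranteed to be an array, one option (a sketch only, reusing the directRequests field name from the code above) is to normalize it right after reading the input:
// Sketch: accept either an array or a single request object in the INPUT
// and always hand an array to crawler.addRequests().
const input = (await Actor.getInput()) ?? {};
const directRequests = Array.isArray(input.directRequests)
    ? input.directRequests
    : input.directRequests
        ? [input.directRequests] // wrap a single { url, label } object
        : [];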
