sensitive-blue•3y ago
double import problem
I have my crawler set up to crawl a couple of sites and scrape them, but I get this import problem when importing the router from both of the site files (the router is the same for both sites, but each site uses a different route).
If I only import it from one site, it only runs that one site. How do I import it so it runs multiple sites, and so it can scale up to more sites in the near future?
It can successfully scrape Amazon and eBay (the eBay tags are kinda inaccurate), but only if I use the router from either eBay or Amazon and remove the other URL from startUrls - otherwise it gives an error for not having the AMAZON or EBAY label anywhere.
4 Replies
sensitive-blueOP•3y ago
main.js:
error message when run:
amazon.js:
```
import { createCheerioRouter } from 'crawlee';
import fs from 'fs';
export const router = createCheerioRouter();
router.addHandler('EBAY', async ({ $, crawler }) => {
console.log('starting link scrape');
// Scrape product links from the search results page
const productLinks = $('a.iteminfo-link').map((_, el) => $(el).attr('href')).get();
console.log(`Found ${productLinks.length} product links for eBay`);
// Add each product link to request queue
for (const link of productLinks) {
const result = await crawler.addRequests([{ url: link, label: 'EBAY_PRODUCT' }]);
// waitForAllRequestsToBeAdded is a promise; await it before adding the next link
await result.waitForAllRequestsToBeAdded;
}
});
// Label is case-sensitive and must match the one passed to addRequests above
router.addHandler('EBAY_PRODUCT', async ({ $, request }) => {
const productInfo = {};
productInfo.link = request.url;
productInfo.storeName = 'eBay';
productInfo.productTitle = $('h3.s-itemtitle').text().trim();
productInfo.productDescription = $('div.a-section.a-spacing-small.span.a-size-base-plus').text().trim();
productInfo.salePrice = $('span.s-itemprice').text().trim();
productInfo.originalPrice = $('span.s-itemprice--original').text().trim();
productInfo.reviewScore = $('div.s-itemreviews').text().trim();
productInfo.shippingInfo = $('span.s-itemshipping').text().trim();
// Write product info to JSON file
if (Object.keys(productInfo).length > 0) {
const rawData = JSON.stringify(productInfo, null, 2);
fs.appendFile('rawData.json', rawData, (err) => {
if (err) throw err;
});
}
});
```
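One side note on the storage: appending `JSON.stringify(productInfo)` to rawData.json on every request produces a series of concatenated objects rather than one valid JSON document. A minimal sketch of the more idiomatic alternative, Crawlee's built-in Dataset (the fields shown are trimmed down from the handler above):
```
import { createCheerioRouter, Dataset } from 'crawlee';

const router = createCheerioRouter();

router.addHandler('EBAY_PRODUCT', async ({ $, request }) => {
    const productInfo = {
        link: request.url,
        storeName: 'eBay',
        productTitle: $('h3.s-itemtitle').text().trim(),
        salePrice: $('span.s-itemprice').text().trim(),
    };
    // Each call appends one record to the default dataset
    // (written under storage/datasets/default, one JSON file per item)
    await Dataset.pushData(productInfo);
});
```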
(haven't added pagination to this scraper yet)
sensitive-blue•3y ago
I already replied in another thread. You should use one router instance.
sensitive-blueOP•3y ago
I am using the same router instance for all of the sites, just with different route handlers.
The problem comes when I import that same router instance from different files,
since I have each site's scrape in a different file.
So I need to import the same router instance into main.js, but that is where I have the problem.
sensitive-blue•3y ago
you're creating two instances - one in amazon.js:
export const router = createCheerioRouter();
and another one in ebay.js:
export const router = createCheerioRouter();
the fastest fix that comes to mind is to have this line:
export const router = createCheerioRouter();
in main.js,
and in both the ebay and amazon files do import { router } from './main.js'
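To make that concrete, here is a minimal sketch of the single-router layout. One caveat: if main.js both exports the router and imports the site files (which it must, so their handlers get registered), ES modules evaluate the site files first and the import cycle fails with a "cannot access 'router' before initialization" error - so this sketch keeps the shared router in its own module instead. File names, selectors, and start URLs are illustrative:
```
// routes.js - the one shared router instance for every site
import { createCheerioRouter } from 'crawlee';
export const router = createCheerioRouter();

// ebay.js - registers its handlers on the shared router
// (amazon.js looks the same, with its own AMAZON labels and selectors)
import { router } from './routes.js';

router.addHandler('EBAY', async ({ $, crawler }) => {
    // ... enqueue EBAY_PRODUCT requests as in the handler above ...
});

// main.js - importing the site files for their side effects
// is what actually registers their handlers on the router
import { CheerioCrawler } from 'crawlee';
import { router } from './routes.js';
import './amazon.js';
import './ebay.js';

const crawler = new CheerioCrawler({ requestHandler: router });

// each start URL carries the label of the handler that should process it
await crawler.run([
    { url: 'https://www.amazon.com/s?k=example', label: 'AMAZON' },
    { url: 'https://www.ebay.com/sch/i.html?_nkw=example', label: 'EBAY' },
]);
```
Adding a third site is then just a new file that imports the router and registers its handlers, plus one more side-effect import and labeled start URL in main.js.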