vicious-gold · 2y ago

Is it possible to run a crawl within a crawl?

I have set up a crawler (x) that collects links from a page. I want to run another crawler (y) within x's crawl to collect more data from those links and return it together with the rest of x's data. Is this possible? (I know you can add requests to the queue from within a crawl, but I want to keep the data together.) Example data structure:
```js
const data = [{
  // from "x" crawl
  title: 'a page',
  links: [
    {
      // from "y" crawl
      title: 'a page',
      link: 'link'
    }
  ]
}]
```
Pepa J · 2y ago
Hi @𝕁𝕚𝕞𝕡𝕖𝕩, if you use two crawlers, I advise you to give each one its own RequestQueue: assign RequestQueue1 to Crawler1 and RequestQueue2 to Crawler2. Then, in Crawler1, you can add new requests to RequestQueue2 and pass partial information from the request being processed in Crawler1 via Request.userData. Crawler2 will then have access to that userData, so you can extend it with the new data and save the result to the Dataset.

I just noticed you want to run a crawler inside another crawler, and I don't see any point in that. You should be able to enqueue a new Request, pass the current value as Request.userData, process it afterwards, and save it to the Dataset.
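A minimal, library-free sketch of that pattern, assuming Crawlee-style semantics (which the RequestQueue, userData and Dataset names suggest). Plain arrays stand in for the two RequestQueues and the Dataset, and `PAGES`, `handleX`, `handleY`, `nest` and the example.com URLs are hypothetical names used only for illustration. Crawler x saves its own record and hands each link to crawler y with the parent URL in `userData`. Crawler y saves a flat record that remembers its parent. A final grouping step rebuilds the nested shape from the question.

```javascript
// Hypothetical in-memory "site" standing in for real HTTP responses.
const PAGES = {
  'https://example.com/': { title: 'a page', links: ['https://example.com/a', 'https://example.com/b'] },
  'https://example.com/a': { title: 'page A', links: [] },
  'https://example.com/b': { title: 'page B', links: [] },
};

// Two separate queues (RequestQueue1 for x, RequestQueue2 for y) and a
// dataset array standing in for Dataset.pushData(). All are assumptions.
const queueX = [{ url: 'https://example.com/', userData: {} }];
const queueY = [];
const dataset = [];

// Crawler x: save the page's own data and enqueue each link into y's queue,
// passing the parent URL along in userData.
function handleX(request) {
  const page = PAGES[request.url];
  dataset.push({ type: 'x', url: request.url, title: page.title });
  for (const link of page.links) {
    queueY.push({ url: link, userData: { parentUrl: request.url } });
  }
}

// Crawler y: scrape the child page and save a flat record tagged with its parent.
function handleY(request) {
  const page = PAGES[request.url];
  dataset.push({ type: 'y', parentUrl: request.userData.parentUrl, title: page.title, link: request.url });
}

// Drain x's queue first, then y's (analogous to running crawler x, then crawler y).
while (queueX.length) handleX(queueX.shift());
while (queueY.length) handleY(queueY.shift());

// Group the flat records back into the nested structure from the question.
function nest(records) {
  const byUrl = new Map();
  for (const r of records) {
    if (r.type === 'x') byUrl.set(r.url, { title: r.title, links: [] });
  }
  for (const r of records) {
    if (r.type === 'y') byUrl.get(r.parentUrl)?.links.push({ title: r.title, link: r.link });
  }
  return [...byUrl.values()];
}

const data = nest(dataset);
console.log(JSON.stringify(data, null, 2));
```

Because each y record carries its parent key, neither crawler has to wait for the other or hold results in memory; in a real Crawlee setup the same grouping could run once both crawlers finish, over the records read back from the dataset.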
vicious-gold (OP) · 2y ago
Thanks! Will do that
