New to Crawlee and after reading the docs, I'm not sure how to use it to crawl links in a website

I'm quite new to Crawlee and I'm not sure how it really works.

I've read the docs and checked some examples but couldn't find anything that fits my case. I need to log in to a website and then go to a page with a list of links that I'd like to crawl. Each of those pages has more links to crawl, and finally, on each of those pages I'd like to perform some actions. One of them is to get the URL of a video and download the video to Google Drive.

I've read about enqueueLinks and RequestQueue, but I really don't know how they work. I've checked the example on the home page, but that's not really what I want. I'd like to log in, then go to the page https://www.my-site.com/categories and from there grab all links that match the glob. I have this:

await enqueueLinks({
  globs: ['https://www.my-site.com/categories']
})

So it would get the links https://www.my-site.com/categories/1, https://www.my-site.com/categories/2, etc. Then for each category it would get all the links in the page: https://www.my-site.com/categories/1/posts/1, https://www.my-site.com/categories/1/posts/2, etc. And then in each of these post pages I'd like to perform my actions.
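
To make the three levels concrete, here is a minimal sketch in plain TypeScript that sorts a URL into "category list", "category", or "post". It does not use Crawlee at all; the `PageKind` names and the `classifyUrl` helper are hypothetical and only illustrate the URL shapes listed above.

```typescript
// Hypothetical page kinds for the three levels described in the post.
type PageKind = 'CATEGORY_LIST' | 'CATEGORY' | 'POST' | 'OTHER';

// Patterns derived from the example URLs; the ID segments are assumed
// to contain no slashes, query strings, or fragments.
const CATEGORY_LIST = /^https:\/\/www\.my-site\.com\/categories\/?$/;
const CATEGORY = /^https:\/\/www\.my-site\.com\/categories\/[^\/?#]+\/?$/;
const POST = /^https:\/\/www\.my-site\.com\/categories\/[^\/?#]+\/posts\/[^\/?#]+\/?$/;

// Decide which level of the site a URL belongs to.
function classifyUrl(url: string): PageKind {
  if (CATEGORY_LIST.test(url)) return 'CATEGORY_LIST';
  if (CATEGORY.test(url)) return 'CATEGORY';
  if (POST.test(url)) return 'POST';
  return 'OTHER';
}

console.log(classifyUrl('https://www.my-site.com/categories/1'));
console.log(classifyUrl('https://www.my-site.com/categories/1/posts/2'));
```

In Crawlee terms, each kind could plausibly map to a `label` passed to `enqueueLinks`, with one handler per label, so the "do something" step runs only on post pages. That mapping is a suggestion, not something confirmed in this thread.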

I've tried to add links to the queue with the glob above, but it only gets the root URL https://www.my-site.com.
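
One possible explanation, offered as an assumption rather than a diagnosis: the glob above has no wildcard, so it can only match that one exact URL, and none of the category links qualify. The sketch below uses a simplified, hypothetical `globToRegExp` helper (not Crawlee's actual matcher) in which `*` matches within one path segment and `**` matches across segments.

```typescript
// Simplified glob matcher for illustration only; Crawlee's real matching
// may differ in details. '*' stays within one path segment, '**' does not.
function globToRegExp(glob: string): RegExp {
  let pattern = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        pattern += '.*'; // '**' crosses slashes
        i++;
      } else {
        pattern += '[^/]*'; // '*' stops at the next slash
      }
    } else {
      // Escape characters that are special in regular expressions.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp('^' + pattern + '$');
}

function matchesGlob(glob: string, url: string): boolean {
  return globToRegExp(glob).test(url);
}

const category = 'https://www.my-site.com/categories/1';
const post = 'https://www.my-site.com/categories/1/posts/2';

// The glob from the post has no wildcard, so a category URL does not match.
console.log(matchesGlob('https://www.my-site.com/categories', category));
// A wildcard per level matches categories and posts separately.
console.log(matchesGlob('https://www.my-site.com/categories/*', category));
console.log(matchesGlob('https://www.my-site.com/categories/*/posts/*', post));
```

Under this reading, a glob such as `https://www.my-site.com/categories/*` on the category list page, and `https://www.my-site.com/categories/*/posts/*` on each category page, would select the intended links at each level.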

Any help would be greatly appreciated.

Thank you
