broad-emerald
Apify & Crawlee • 3y ago

New to Crawlee and after reading the docs, I'm not sure how to use it to crawl links in a website

So I'm quite new to Crawlee and I'm not sure how it really works 😦

I've read the docs and checked some examples but couldn't find anything really useful. I have a case where I need to log in to a website and then go to a page with a list of links that I'd like to crawl. Each of those pages has more links to crawl, and finally, on each of those pages, I'd like to perform some actions. One of them is to get the URL of a video and download the video to Google Drive.

I've read about enqueueLinks and RequestQueue but I really don't know how they work. I've checked the example on the home page but that's not really what I want. I'd like to log in, then go to the page https://www.my-site.com/categories and from there grab all links that match the glob. I have this:
await enqueueLinks({
  globs: ['https://www.my-site.com/categories']
})

So it would get the links https://www.my-site.com/categories/1, https://www.my-site.com/categories/2, etc. Then for each category it would get all links in the page: https://www.my-site.com/categories/1/posts/1, https://www.my-site.com/categories/1/posts/2, etc. And then in each of these pages I'd like to do something.

I've tried to add links to the queue with the glob above but it only gets the root URL https://www.my-site.com.
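One possible reason, sketched under assumptions: the glob above contains no wildcard, so a glob matcher would accept only that one exact URL and none of the category or post pages. The snippet below illustrates the idea with a simplified, hypothetical matcher (`globToRegExp` and `matchesGlob` are made-up helper names, not Crawlee functions, and Crawlee's real matching may differ in details). Here `*` stays within one path segment and `**` spans segments.

```typescript
// Simplified glob matcher for illustration only. This is NOT Crawlee's
// implementation; the helper names are hypothetical.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters (all except *)
    .replace(/\*\*/g, '\u0000')            // temporarily protect ** so the next step skips it
    .replace(/\*/g, '[^/]*')               // * matches within a single path segment
    .replace(/\u0000/g, '.*');             // ** matches across path segments
  return new RegExp(`^${pattern}$`);
}

function matchesGlob(url: string, glob: string): boolean {
  return globToRegExp(glob).test(url);
}

const categoryUrl = 'https://www.my-site.com/categories/1';
const postUrl = 'https://www.my-site.com/categories/1/posts/2';

// The glob from the question has no wildcard, so it rejects category pages.
const plainGlob = 'https://www.my-site.com/categories';
// Assumed patterns for the two levels described in the question.
const categoryGlob = 'https://www.my-site.com/categories/*';
const postGlob = 'https://www.my-site.com/categories/*/posts/*';

console.log(matchesGlob(categoryUrl, plainGlob));
console.log(matchesGlob(categoryUrl, categoryGlob));
console.log(matchesGlob(postUrl, categoryGlob));
console.log(matchesGlob(postUrl, postGlob));
```

If this is indeed the cause, passing wildcard patterns like the two assumed ones above in the globs option of enqueueLinks, one for category pages and one for post pages, may be closer to what is needed.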

Any help would be greatly appreciated.

Thank you