correct-apricot · 2y ago

How to make a request within a handler

I'm using Crawlee with PlaywrightCrawler and PlaywrightRouter. In one of my link handlers I gather data from the page, and I need that data to make multiple API calls inside the same handler. I then use the API responses to addRequests for additional pages. Obviously I could reach for a non-Crawlee HTTP client like Axios, but I was wondering if there is a suggested way to make inline API requests. Thanks in advance for any help.
10 Replies
correct-apricot (OP) · 2y ago
I found that the handler provides a sendRequest parameter in its callback. It seems like this is what I was looking for, but I'm open to other thoughts.
ondro_k · 2y ago
Hi, sendRequest is the way to go. It also uses your proxy settings out of the box, which you would have to set manually with axios or fetch.
unwilling-turquoise · 2y ago
Perfect, thanks!
deep-jade · 2y ago
@lafffey you want to gather data from some other page while inside PlaywrightCrawler's requestHandler, right? I am experimenting with page.goto for this. It works, but I can't say much about side effects or disadvantages. Could you show some example code with sendRequest, please?
Lukas Krivka · 2y ago
sendRequest uses the got HTTP library, so it is much faster than page.goto, which has to load the page in the browser.
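For the example code requested above, here is a minimal sketch of the pattern, not an official Crawlee example. The helper `collectDetailUrls`, the `https://example.com/api/items/...` endpoint, the `detailUrl` field, and the `LIST`/`DETAIL` labels are all hypothetical. `SendRequestLike` is a local stand-in for the handler's `sendRequest`, narrowed to what the sketch uses, and it assumes the got option `responseType: 'json'` yields a parsed `body`.

```typescript
// Local stand-in for the handler's `sendRequest`, narrowed to what this sketch needs.
// Assumption: in a real project you pass in the function from the handler context.
type SendRequestLike = (overrides: {
    url: string;
    responseType?: 'json';
}) => Promise<{ body: unknown }>;

// Hypothetical helper: call an API once per id and collect the URLs to enqueue.
async function collectDetailUrls(
    sendRequest: SendRequestLike,
    ids: string[],
): Promise<string[]> {
    const urls: string[] = [];
    for (const id of ids) {
        const { body } = await sendRequest({
            url: `https://example.com/api/items/${encodeURIComponent(id)}`,
            responseType: 'json',
        });
        // The response shape is an assumption; skip items without a usable URL.
        const detailUrl = (body as { detailUrl?: unknown }).detailUrl;
        if (typeof detailUrl === 'string') urls.push(detailUrl);
    }
    return urls;
}

// Wiring inside a PlaywrightRouter handler (kept as a comment so the block
// runs without a browser; the selector is hypothetical):
//
// router.addHandler('LIST', async ({ page, sendRequest, crawler }) => {
//     const ids = await page.locator('[data-id]').evaluateAll(
//         (els) => els.map((el) => el.getAttribute('data-id') ?? ''),
//     );
//     const urls = await collectDetailUrls(sendRequest, ids.filter(Boolean));
//     await crawler.addRequests(urls.map((url) => ({ url, label: 'DETAIL' })));
// });
```

Keeping the API calls in a helper that only receives `sendRequest` means the helper can be exercised with a fake function, without starting a crawler. Depending on the Crawlee version, passing the real `sendRequest` to the narrowed type may need a cast.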
genetic-orange · 2y ago
Hey, is there a way to import sendRequest's type from crawlee? My use case: I want to break my code out into functions for better modularity, and one of them needs sendRequest, which is only available on the CrawlingContext. I ended up passing the entire context into these functions just to get access to sendRequest's type. Is this a good way of doing it?
Lukas Krivka · 2y ago
Hmm, I'm not sure where the type is to be honest
genetic-orange · 2y ago
Yeah, it was actually under PlaywrightCrawlingContext. I just used sendRequest: PlaywrightCrawlingContext['sendRequest'] and TS was happy with it. Just a suggestion: it might be good to add a note in the TypeScript section of the docs that the types for Locator and Page can be imported from playwright itself, since crawlee is agnostic of the playwright version (I only saw that in the package.json today). I think it's probably the same for Puppeteer too. One last question: will sendRequest automatically pick up the browser context, session key, etc. if I call it under a Playwright crawler? I ask because sendRequest is documented for CheerioCrawler. I need it to get the link after a redirect, and I'm currently using page.goto for that.
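The typing trick above can be sketched as follows. To keep the block self-contained, `ContextLike` is a hypothetical stand-in for Crawlee's `PlaywrightCrawlingContext`, and the helper names are made up for illustration. With Crawlee installed, the same indexed access and `Pick` patterns should apply to the real context type.

```typescript
// Stand-in for Crawlee's PlaywrightCrawlingContext so the sketch type-checks on its own.
// With Crawlee installed you would instead write:
//   import type { PlaywrightCrawlingContext } from 'crawlee';
interface ContextLike {
    request: { url: string };
    sendRequest: (overrides?: { url?: string }) => Promise<{ body: string }>;
}

// Indexed access type: the type of just the `sendRequest` property.
type SendRequest = ContextLike['sendRequest'];

// Option 1: the helper takes only the function it needs.
async function fetchBody(sendRequest: SendRequest, url: string): Promise<string> {
    const { body } = await sendRequest({ url });
    return body;
}

// Option 2: the helper takes a slice of the context, so callers can still
// pass the whole context object without the helper depending on all of it.
async function fetchCurrentBody(
    ctx: Pick<ContextLike, 'request' | 'sendRequest'>,
): Promise<string> {
    return fetchBody(ctx.sendRequest, ctx.request.url);
}
```

Either option avoids typing the parameter as the full context, which keeps the helpers easy to call from tests with a small fake object.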
Lukas Krivka · 2y ago
It should use the current session with cookies etc., so yeah.
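For the redirect use case mentioned above, a minimal sketch, assuming `sendRequest` resolves to a got response, whose `url` property holds the final URL after redirects. `resolveFinalUrl` and `RedirectSendRequest` are hypothetical names, and the type is a local stand-in narrowed to what the sketch uses.

```typescript
// Stand-in for the handler's `sendRequest`, narrowed to the one field this sketch reads.
type RedirectSendRequest = (overrides: { url: string }) => Promise<{ url: string }>;

// Hypothetical helper: follow redirects over plain HTTP and return where the request ended up.
// Assumption: the response is a got response, whose `url` is the final URL after redirects.
async function resolveFinalUrl(
    sendRequest: RedirectSendRequest,
    url: string,
): Promise<string> {
    const response = await sendRequest({ url });
    return response.url;
}
```

This only follows HTTP redirects. A redirect performed by JavaScript in the page would still need the browser, for example via page.goto.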
