deep-jade, 3y ago

How to make a request within a handler

I'm using Crawlee with PlaywrightCrawler and PlaywrightRouter. In some scenarios, inside one of my link handlers, I gather data from the page and then need to make multiple API calls from within that same handler. I use the data from those API calls to addRequests for additional pages. Obviously I could reach for a non-Crawlee HTTP client like Axios, but I was wondering if there is a suggested way to make inline API requests. Thanks in advance for any help.
10 Replies
deep-jade (OP), 3y ago
I found that the handler provides a sendRequest parameter in its callback. It seems like this is what I was looking for, but I'm open to other thoughts.
ondro_k, 3y ago
Hi, sendRequest is the way to go. It also uses your proxy settings out of the box, which you would have to set manually with axios or fetch.
flat-fuchsia, 3y ago
Perfect, thanks!
like-gold, 2y ago
@lafffey you want to gather data from some other page while inside PlaywrightCrawler.requestHandler, right? I am experimenting with page.goto for this. It works, but I can't say much about side effects or disadvantages. Could you show some example code with sendRequest, please?
Lukas Krivka, 2y ago
sendRequest uses the got HTTP library, so it makes a plain HTTP request instead of loading the page in the browser, which makes it much faster than page.goto.
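Since example code was requested, here is a minimal sketch of the pattern from the original question: call an API with sendRequest inside a handler and turn the results into URLs for addRequests. The helper name `collectDetailUrls`, the `SendRequestLike` stand-in type, the `ListingResponse` payload shape, and all URLs are assumptions made for illustration. The stand-in type models only the fields used here, so the real sendRequest from the handler context may need a cast to fit it.

```typescript
// Stand-in for the `sendRequest` function on the handler context.
// Assumption: only the options and response fields used below are modelled.
type SendRequestLike = (overrides: {
    url: string;
    responseType?: 'json' | 'text';
}) => Promise<{ body: unknown }>;

// Hypothetical shape of the API payload; adjust to the real API.
interface ListingResponse {
    items?: { slug: string }[];
}

// Calls each API URL and builds detail-page URLs from the returned items.
async function collectDetailUrls(
    sendRequest: SendRequestLike,
    apiUrls: string[],
    baseUrl: string,
): Promise<string[]> {
    const detailUrls: string[] = [];
    for (const url of apiUrls) {
        // `responseType: 'json'` asks for the body to be parsed as JSON.
        const { body } = await sendRequest({ url, responseType: 'json' });
        const listing = body as ListingResponse;
        for (const item of listing.items ?? []) {
            detailUrls.push(`${baseUrl}/items/${encodeURIComponent(item.slug)}`);
        }
    }
    return detailUrls;
}

// Possible use inside a handler (hypothetical label, selector and URLs):
// router.addHandler('LISTING', async ({ page, sendRequest, addRequests }) => {
//     const apiUrls = await page.$$eval('a[data-api]', (els) =>
//         els.map((el) => (el as HTMLAnchorElement).href));
//     await addRequests(await collectDetailUrls(sendRequest, apiUrls, 'https://example.com'));
// });
```

Keeping the API logic in a plain function that only receives sendRequest makes it easy to exercise with a fake, without starting a browser.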
extended-salmon, 2y ago
Hey, is there a way I can import sendRequest's type from crawlee? My use case is that I want to break my code out into functions for better modularity, and one of them needs sendRequest, which is only available on the crawling context. I just ended up passing the entire context into these functions to get access to sendRequest's type. Is this a good way of doing it?
Lukas Krivka, 2y ago
Hmm, I'm not sure where the type is to be honest
extended-salmon, 2y ago
Yeah, it was actually under PlaywrightCrawlingContext. I just used sendRequest: PlaywrightCrawlingContext['sendRequest'] and TS was happy with it. Just a suggestion, but it might be good to add a note to the TypeScript section of the docs that the types for Locator and Page can be imported from playwright itself, since crawlee is agnostic of the playwright version (I just saw the package.json today). I think it's probably the same for Puppeteer too. One last question: will sendRequest automatically pick up the browser context, session key, etc. if I call it under a Playwright crawler? I ask because the docs only describe sendRequest for CheerioCrawler. I need it to get the link after a redirect and am currently using page.goto for that.
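A rough sketch of that typing approach, for anyone splitting handlers into helper functions. To stay self-contained it uses a local stand-in interface, `ContextLike`, which is an assumption and not the library's type. With crawlee installed, the aliases would be derived from PlaywrightCrawlingContext as shown in the comments. The helper names `fetchBody` and `fetchBodyLogged` are hypothetical.

```typescript
// With crawlee installed, the aliases would come from the library instead:
//   import type { PlaywrightCrawlingContext } from 'crawlee';
//   type SendRequest = PlaywrightCrawlingContext['sendRequest'];
//   type ApiDeps = Pick<PlaywrightCrawlingContext, 'sendRequest' | 'log'>;
// Page and Locator types would be imported from playwright itself:
//   import type { Page, Locator } from 'playwright';

// Local stand-in context (assumption: only the members used here are modelled).
interface ContextLike {
    sendRequest: (overrides: { url: string }) => Promise<{ body: string; statusCode: number }>;
    log: { info: (message: string) => void };
}

// Indexed access type: the type of a single member, no full context needed.
type SendRequest = ContextLike['sendRequest'];

// Pick: a narrow bundle of just the members a helper depends on.
type ApiDeps = Pick<ContextLike, 'sendRequest' | 'log'>;

// Helper that only needs sendRequest.
async function fetchBody(sendRequest: SendRequest, url: string): Promise<string> {
    const response = await sendRequest({ url });
    if (response.statusCode >= 400) {
        throw new Error(`Request to ${url} failed with status ${response.statusCode}`);
    }
    return response.body;
}

// Helper that takes a narrowed slice of the context instead of all of it.
async function fetchBodyLogged({ sendRequest, log }: ApiDeps, url: string): Promise<string> {
    log.info(`Fetching ${url}`);
    return fetchBody(sendRequest, url);
}
```

Passing the whole context also works, but narrowing the parameter to the members a function actually uses documents its dependencies and keeps it easy to test with small fakes.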
Lukas Krivka, 2y ago
Yes, it should use the current session, including its cookies.
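For the redirect use case, one possible sketch is to read the final URL from the response instead of navigating with page.goto. This assumes the got response exposes the post-redirect address in its `url` field, which is how got documents it. The stand-in type `RedirectAwareSendRequest` and the helper `resolveFinalUrl` are hypothetical names. Only HTTP-level redirects are followed this way; redirects triggered by JavaScript on the page would still need the browser.

```typescript
// Stand-in for `sendRequest` (assumption: only `url` on the response is modelled).
type RedirectAwareSendRequest = (overrides: { url: string }) => Promise<{ url: string }>;

interface ResolvedLink {
    requestedUrl: string;
    finalUrl: string;
    redirected: boolean;
}

// Requests `startUrl` and reports where the HTTP client ended up.
async function resolveFinalUrl(
    sendRequest: RedirectAwareSendRequest,
    startUrl: string,
): Promise<ResolvedLink> {
    const response = await sendRequest({ url: startUrl });
    return {
        requestedUrl: startUrl,
        finalUrl: response.url,
        redirected: response.url !== startUrl,
    };
}

// Possible use inside a handler (hypothetical URL):
// const { finalUrl } = await resolveFinalUrl(sendRequest, 'https://example.com/short-link');
// await addRequests([finalUrl]);
```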
