correct-apricot•2y ago
How to make a request within a handler
I'm using Crawlee with PlaywrightCrawler and PlaywrightRouter. In one of my link handlers, I want to gather data from the page and use it to make multiple API calls inside that same page handler. I then use the data from those API calls to addRequests for additional pages.
Obviously I could reach for a non-Crawlee HTTP client like Axios, but I was wondering if there is a suggested way to make inline API requests.
Thanks in advance for any help.
10 Replies
correct-apricotOP•2y ago
I found that the handler provides a sendRequest parameter in its callback. It seems like this is what I was looking for, but I'm open to other thoughts.
Hi, sendRequest is the way to go. It also uses your proxy settings out of the box, which you would have to set manually with axios or fetch.
unwilling-turquoise•2y ago
Perfect, thanks!
deep-jade•2y ago
@lafffey you want to gather data from some other page while in PlaywrightCrawler.requestHandler, right?
I am experimenting with page.goto for this. Well, it works... I cannot say much about side effects, disadvantages, etc.
Would you show some example code with sendRequest, please?
sendRequest uses the got HTTP library, so it is much faster than page.goto: it sends a plain HTTP request instead of loading the page in the browser.
Got Scraping | Crawlee
Blazing fast cURL alternative for modern web scraping
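For anyone looking for the sendRequest example requested above, here is a minimal sketch of the flow from the original question: call an API with sendRequest for each value gathered from the page, then pass the results to addRequests. It is not taken from the Crawlee docs. The ContextLike interface, the example.com endpoint, the productUrls response shape, and the CATEGORY / PRODUCT labels are all made up for illustration; ContextLike only stands in for the handler context so the snippet runs without dependencies.

```typescript
// Structural stand-in for the two context members used below (an assumption).
// In a real project these come from Crawlee's PlaywrightCrawlingContext.
interface ContextLike {
    sendRequest: (options: { url: string; responseType: 'json' }) => Promise<{ body: unknown }>;
    addRequests: (requests: { url: string; label: string }[]) => Promise<unknown>;
}

// Hypothetical API response shape: { "productUrls": ["https://..."] }
interface ProductListResponse {
    productUrls: string[];
}

// Runtime check, since the API body arrives as unknown JSON.
function isProductList(body: unknown): body is ProductListResponse {
    if (typeof body !== 'object' || body === null) return false;
    const urls = (body as { productUrls?: unknown }).productUrls;
    return Array.isArray(urls) && urls.every((u) => typeof u === 'string');
}

// Calls one (placeholder) API endpoint per category ID that was gathered from
// the page, then enqueues every product URL returned. Returns the enqueued URLs.
async function enqueueProducts(ctx: ContextLike, categoryIds: string[]): Promise<string[]> {
    const found: string[] = [];
    for (const id of categoryIds) {
        const { body } = await ctx.sendRequest({
            url: `https://example.com/api/categories/${encodeURIComponent(id)}`,
            responseType: 'json', // ask for a parsed JSON body instead of text
        });
        if (isProductList(body)) found.push(...body.productUrls);
    }
    if (found.length > 0) {
        await ctx.addRequests(found.map((url) => ({ url, label: 'PRODUCT' })));
    }
    return found;
}

// Possible wiring in a PlaywrightRouter handler (not executed here):
// router.addHandler('CATEGORY', async ({ page, sendRequest, addRequests }) => {
//     const ids = await page.locator('[data-category-id]').evaluateAll(
//         (els) => els.map((el) => el.getAttribute('data-category-id') ?? ''),
//     );
//     await enqueueProducts({ sendRequest, addRequests }, ids);
// });
```

The API calls run one after another here to keep the sketch simple; whether to parallelize them depends on how much load the target API tolerates.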
genetic-orange•2y ago
Hey, is there a way I can import sendRequest's type from crawlee? My use case is that I want to break my code out into functions for better modularity, and one of them needs sendRequest, which is only available under CrawlingContext.
I just ended up passing the entire context into these functions to get access to sendRequest's type.
Is this a good way of doing it?
Hmm, I'm not sure where the type is, to be honest.
genetic-orange•2y ago
Yeah, it was actually under PlaywrightCrawlingContext. I just used sendRequest: PlaywrightCrawlingContext['sendRequest'] and TS was happy with it. Just a suggestion, but it might be good to add a note to the TypeScript section that we can import the types for Locator and Page from playwright itself, since crawlee is agnostic of the playwright version; I just saw the package.json today. I think it's probably the same for Puppeteer too.
And just one last question: will sendRequest automatically pick up the browser context, session key, etc. if I call it under a Playwright crawler? I ask because sendRequest is documented for CheerioCrawler in the docs. I need it to get the link after a redirect, and I am currently using page.goto for that.
It should use the current session with cookies etc., so yeah.
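For the redirect use case mentioned above, a rough sketch could look like the following. It relies on the response object from got exposing the final URL as url and the followed hops as redirectUrls. SendRequestLike and resolveRedirect are hypothetical names, and the stand-in type exists only so the snippet runs without dependencies. This only covers HTTP-level redirects; a redirect performed by JavaScript in the page would still need the browser.

```typescript
// Stand-in for the sendRequest function a Crawlee handler receives (an assumption).
type SendRequestLike = (options: { url: string }) => Promise<{
    url: string; // final URL after any redirects were followed
    redirectUrls: readonly unknown[]; // one entry per redirect hop
}>;

interface RedirectResult {
    finalUrl: string;
    redirected: boolean;
}

// Hypothetical helper: requests the URL and reports where it ended up.
async function resolveRedirect(sendRequest: SendRequestLike, url: string): Promise<RedirectResult> {
    const response = await sendRequest({ url });
    return {
        finalUrl: response.url,
        redirected: response.redirectUrls.length > 0,
    };
}

// Possible call site inside a Playwright handler (not executed here):
// router.addHandler('LINK', async ({ request, sendRequest, log }) => {
//     const { finalUrl } = await resolveRedirect(sendRequest, request.url);
//     log.info(`Resolved to ${finalUrl}`);
// });
```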