conscious-sapphire 13mo ago

How to debug a crawled response that seemingly contains no HTML (CheerioCrawler)

I duplicated a custom Apify actor that was working great, changed nothing except a few selectors, and pointed it at a new site. Unfortunately, the actor now exits "successfully" after handling only the first URL (the start URL). None of my logging shows any content in the returned HTML, and enqueueLinks of course finds nothing to enqueue, yet CheerioCrawler treats the request as successful. How would I approach debugging this? So far I've confirmed that $('body').html() returns an empty string, and I've tried a RESIDENTIAL proxy geolocated to the website's country in case it was clever blocking, with no success. The URL being scraped is https://www.tesco.com/groceries/en-GB/shop/health-and-beauty/shampoo/all?page=1&count=48
5 Replies
HonzaS 13mo ago
Did you try $.html()?
correct-apricot 13mo ago
Hello, for debugging I would advise using a client that displays the raw responses; I personally use Insomnia. When I request the URL you provided, the body doesn't actually contain any HTML elements, just some meta properties and a script.
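A rough way to check the same thing from inside the crawler is to measure how much visible text the raw response contains once the head, scripts, and styles are stripped. The sketch below is a hypothetical TypeScript helper (visibleTextLength and looksLikeJsShell are not part of Crawlee or Cheerio); it uses regular expressions as a heuristic rather than a real HTML parser, and the 50-character threshold is an arbitrary assumption.

```typescript
// Hypothetical helper: estimate how much human-visible text an HTML string
// contains once the <head>, scripts, styles, and comments are removed.
// Regex-based stripping is a rough heuristic, not a real HTML parser.
function visibleTextLength(html: string): number {
  const stripped = html
    // Drop the <head> section (meta tags, title, preload links).
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, "")
    // Drop script, style, and noscript blocks together with their contents.
    .replace(/<(script|style|noscript)\b[\s\S]*?<\/\1>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Replace any remaining tags with spaces, keeping only text content.
    .replace(/<[^>]+>/g, " ");
  // Collapse whitespace so indentation and newlines don't count as content.
  return stripped.replace(/\s+/g, " ").trim().length;
}

// Treat a page as a likely client-side-rendered shell when almost no visible
// text survives. The default threshold is an assumption; tune it per site.
function looksLikeJsShell(html: string, minChars = 50): boolean {
  return visibleTextLength(html) < minChars;
}
```

A response like the one described above, with only meta tags and a script, scores close to zero, which points toward content rendered client-side (or withheld by anti-bot logic) rather than a selector bug.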
conscious-sapphire (OP) 13mo ago
Thanks for the pointers. Seems to be some "advanced" anti-crawling going on.
HonzaS 13mo ago
Try PlaywrightCrawler; it should work.
Oleg V. 12mo ago
You can also try $("*").html() (there might be no body tag in the response), or just check response.body (in case it contains a JSON object).
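Following that suggestion, here is a minimal sketch that classifies the raw response text before Cheerio parses it, so a JSON payload or a missing body tag is visible directly. classifyBody and its category names are hypothetical; it only checks for valid JSON and for a literal body tag, so treat the result as a hint rather than a diagnosis.

```typescript
// Hypothetical categories for a raw HTTP response body.
type BodyKind = "empty" | "json" | "no-body-tag" | "has-body-tag";

// Inspect the raw response text (not the Cheerio-parsed document) so the
// parser cannot hide a missing <body> tag or a JSON payload.
function classifyBody(raw: string): BodyKind {
  const text = raw.trim();
  if (text.length === 0) return "empty";
  // Only attempt JSON parsing when the text starts like an object or array.
  if (text.startsWith("{") || text.startsWith("[")) {
    try {
      JSON.parse(text);
      return "json";
    } catch {
      // Not valid JSON; fall through and treat it as markup or plain text.
    }
  }
  // Case-insensitive check for an opening <body> tag anywhere in the text.
  return /<body\b/i.test(text) ? "has-body-tag" : "no-body-tag";
}
```

Inside a CheerioCrawler request handler you could, for example, log classifyBody(body.toString()), assuming your Crawlee version exposes the raw body on the handler context.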
