wise-white (16mo ago)

Cheerio extract function

Cheerio mentions an extract function in its documentation. However, this function is not included in any release. Does CheerioCrawler implement this functionality, or is it exactly the same as the original cheerio library?
5 Replies
ondro_k (16mo ago)
Hi, yes, it's the same object: $ from crawlingContext is created with cheerio.load(). Crawlee doesn't implement extract on its own.
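
Since extract isn't available in a released version, one way to get a similar result is a small helper built only on calls the regular $ already has (first(), text(), attr()). This is a rough sketch, not Cheerio's extract and not part of Crawlee: extractFields, FieldSpec, QueryLike and SelectionLike are hypothetical names made up for the example, and the structural types are there only so the snippet compiles without cheerio installed. The assumption is that a real $ is structurally compatible with QueryLike.

```typescript
// Minimal structural types (hypothetical) covering the few methods the helper uses.
// Assumption: the $ returned by cheerio.load() is compatible with QueryLike.
interface SelectionLike {
  length: number;
  first(): SelectionLike;
  text(): string;
  attr(name: string): string | undefined;
}
type QueryLike = (selector: string) => SelectionLike;

// Hypothetical field spec: a bare selector reads the text of the first match,
// an object reads one attribute of the first match.
type FieldSpec = string | { selector: string; attr: string };

// Builds a plain object from a map of field names to selectors.
// Fields whose selector matches nothing come back as undefined.
function extractFields<K extends string>(
  $: QueryLike,
  spec: Record<K, FieldSpec>,
): Record<K, string | undefined> {
  const out = {} as Record<K, string | undefined>;
  for (const key of Object.keys(spec) as K[]) {
    const field = spec[key];
    const selector = typeof field === 'string' ? field : field.selector;
    const el = $(selector).first();
    if (el.length === 0) {
      out[key] = undefined;
      continue;
    }
    out[key] = typeof field === 'string' ? el.text().trim() : el.attr(field.attr);
  }
  return out;
}
```

In a crawler you would pass the $ from the crawling context (or from cheerio.load(html)) as the first argument, for example extractFields($, { title: 'h1', link: { selector: 'a.more', attr: 'href' } }). The selectors here are placeholders.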
HonzaS (16mo ago)
I think the parser is a little different from the original cheerio, look here: https://discord.com/channels/801163717915574323/801163719198638092/875286999408443392
flat-fuchsia (15mo ago)
Hey, I ran into this issue myself. The problem is that Cheerio has added the extract function to the NEXT (unreleased) version but not the current one.
flat-fuchsia (15mo ago)
The documentation doesn't note the difference, so it's pretty damn confusing. The good news is that there's a workable interim project out there that gives some similar functionality: https://github.com/denkan/cheerio-json-mapper
flat-fuchsia (15mo ago)
If you have a cheerio instance or an HTML string, you can pass it in along with the 'extraction template' and it does its magic. I've moved over to using it for almost all of my large-scale page parsing because of the convenience.
