foreign-sapphire (2y ago)

Cheerio extract function

Cheerio's documentation mentions an extract function. However, this function is not included in any release. Does CheerioCrawler implement this functionality, or is it exactly the same as the original Cheerio library?
5 Replies
ondro_k (2y ago)
Hi, yes, it's the same object: the $ in the crawling context is created with cheerio.load(). Crawlee doesn't implement extract on its own.
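Since $ is just whatever cheerio.load() returns, a quick way to see what you actually have is to load some HTML yourself and feature-detect extract. This is a minimal sketch, assuming the cheerio package is installed; the variable names are illustrative, and the check deliberately does not assume which Cheerio version is present:

```typescript
import * as cheerio from 'cheerio';

// Same kind of object Crawlee puts in the crawling context: a plain CheerioAPI
// produced by cheerio.load(). Crawlee adds no extract() of its own.
const $ = cheerio.load('<h1>Hello</h1><a href="/next">Next</a>');

// Feature-detect instead of assuming: releases without extract() simply lack the method.
const hasExtract =
  typeof ($ as unknown as { extract?: unknown }).extract === 'function';

console.log(hasExtract ? 'extract() is available' : 'extract() is missing, use selectors');

// The regular selector API works in every release.
const title = $('h1').text();
const next = $('a').attr('href');
console.log(title, next);
```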
HonzaS (2y ago)
I think the parser is a little different from the original Cheerio; see here: https://discord.com/channels/801163717915574323/801163719198638092/875286999408443392
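One way the two statements above can both be true: the $ object is Cheerio's, but the DOM underneath may come from a different parser. The sketch below compares cheerio.load() on a string (which uses parse5 for HTML) with cheerio.load() on a DOM built by htmlparser2. It assumes, without confirming it from the linked message, that CheerioCrawler builds its DOM with htmlparser2; it also assumes htmlparser2 is importable (it ships as a Cheerio dependency):

```typescript
import * as cheerio from 'cheerio';
import { parseDocument } from 'htmlparser2';

const fragment = '<p>hi</p>';

// Loading a string: Cheerio parses HTML with parse5, which builds a full
// document and adds the missing <html>, <head> and <body> elements.
const viaParse5 = cheerio.load(fragment).html();

// Loading a pre-parsed DOM: htmlparser2 keeps the fragment as-is.
// Assumption: this mirrors how CheerioCrawler feeds its DOM into cheerio.load().
const viaHtmlparser2 = cheerio.load(parseDocument(fragment)).html();

console.log('parse5:     ', viaParse5);
console.log('htmlparser2:', viaHtmlparser2);
```

Selectors like body > p can therefore match in one setup and not in the other, even though the $ API is identical.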
environmental-rose (2y ago)
Hey, I ran into this issue myself. The problem is that Cheerio added the extract function to the NEXT (unreleased) version, not the current release.
environmental-rose (2y ago)
The documentation doesn't note the difference, so it's pretty damn confusing. The good news is that there's a workable interim project that provides similar functionality: https://github.com/denkan/cheerio-json-mapper
environmental-rose (2y ago)
If you have a Cheerio instance or an HTML string, you can pass it in along with an "extraction template" and it does its magic. I've moved over to using it for almost all of my large-scale page parsing because it's so convenient.
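To show the general idea of template-driven extraction (HTML string or Cheerio instance in, plain object out), here is a small hand-rolled sketch on top of the standard Cheerio selector API. It is NOT cheerio-json-mapper's API or template syntax; mapWithTemplate, Field and Template are hypothetical names used only for illustration:

```typescript
import * as cheerio from 'cheerio';
import type { CheerioAPI } from 'cheerio';

// Hypothetical template format: each key names a CSS selector and,
// optionally, an attribute to read instead of the element's text.
type Field = { selector: string; attr?: string };
type Template = Record<string, Field>;

// Hypothetical helper, not part of Cheerio or cheerio-json-mapper.
function mapWithTemplate(
  input: string | CheerioAPI,
  template: Template,
): Record<string, string | undefined> {
  // Accept either raw HTML or an already-loaded $ (e.g. the one from Crawlee).
  const $ = typeof input === 'string' ? cheerio.load(input) : input;
  const result: Record<string, string | undefined> = {};
  for (const [key, field] of Object.entries(template)) {
    // Only the first match is used; a missing element yields '' for text
    // and undefined for an attribute.
    const el = $(field.selector).first();
    result[key] = field.attr ? el.attr(field.attr) : el.text().trim();
  }
  return result;
}

const html =
  '<h1> Widget </h1><span class="price">9.99</span><a class="more" href="/w/1">More</a>';

const data = mapWithTemplate(html, {
  name: { selector: 'h1' },
  price: { selector: '.price' },
  link: { selector: 'a.more', attr: 'href' },
});

console.log(data);
```

Keeping the selector and attribute as separate fields avoids having to parse a mini-language out of the template strings; a real library will have its own, richer syntax.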
