foreign-sapphire•2y ago
Cheerio extract function
Cheerio mentions an extract function in its documentation. However, this function is not included in any release. Does CheerioCrawler implement this functionality, or is it exactly the same as the original cheerio library?
5 Replies
Hi, yes, it's the same object: $ from crawlingContext is created with cheerio.load(). crawlee doesn't implement extract on its own. I think the parser is a little different from the original cheerio, though; look here: https://discord.com/channels/801163717915574323/801163719198638092/875286999408443392
environmental-rose•2y ago
Hey — I ran into this issue myself, and the problem is that Cheerio has added the extract function to the NEXT (unreleased) version but not the current one.
environmental-rose•2y ago
The documentation doesn't note the difference, so it's pretty damn confusing. The good news is that there's a workable interim project out there that gives some similar functionality: https://github.com/denkan/cheerio-json-mapper
environmental-rose•2y ago
If you have a cheerio instance or an HTML string, you can pass it in along with the 'extraction template' and it does its magic. I've moved over to using it for almost all of my large-scale page parsing because of the convenience.
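For anyone who'd rather not add a dependency in the meantime, here's a rough sketch of the same "template in, object out" idea. It uses only calls that already exist in released cheerio (`$(selector)`, `.first()`, `.text()`, `.attr()`). To be clear, `extractFields`, `FieldSpec`, and the template shape are made up for illustration. They are not the API of cheerio's unreleased extract or of cheerio-json-mapper. The small `Matches`/`QueryFn` types are also an assumption: they describe just the part of `$` the helper touches, so the snippet doesn't import cheerio itself.

```typescript
// Structural stand-ins for the subset of the cheerio API used below.
// Assumption: a real cheerio `$` is compatible with these shapes.
interface Matches {
  first(): Matches;
  text(): string;
  attr(name: string): string | undefined;
}
type QueryFn = (selector: string) => Matches;

// Hypothetical template format: a bare selector string reads the text of the
// first match; an object reads the named attribute of the first match.
type FieldSpec = string | { selector: string; attr: string };

function extractFields<K extends string>(
  $: QueryFn,
  template: Record<K, FieldSpec>,
): Record<K, string | undefined> {
  const out = {} as Record<K, string | undefined>;
  for (const key of Object.keys(template) as K[]) {
    const field: FieldSpec = template[key];
    if (typeof field === "string") {
      // Text field: trimmed text of the first matching element.
      out[key] = $(field).first().text().trim();
    } else {
      // Attribute field: undefined when the attribute is absent.
      out[key] = $(field.selector).first().attr(field.attr);
    }
  }
  return out;
}
```

Inside a CheerioCrawler request handler, the `$` from the crawling context should be usable as the first argument, e.g. `extractFields($, { title: "h1", next: { selector: "a.next", attr: "href" } })`, where the selectors are placeholders for whatever the page actually uses.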