wise-white•16mo ago
Cheerio extract function
Cheerio mentions an `extract` function in its documentation. However, this function is not included in any release. Does CheerioCrawler implement this functionality, or is it exactly the same as the original cheerio library?
Hi, yes, it's the same object: `$` from crawlingContext is created with `cheerio.load()`. crawlee doesn't implement `extract` on its own.
I think the parser is a little different from the original cheerio, look here: https://discord.com/channels/801163717915574323/801163719198638092/875286999408443392
flat-fuchsia•15mo ago
Hey, I ran into this issue myself. The problem is that Cheerio has added the `extract` function to the NEXT (unreleased) version but not the current one.
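Since whether `extract` exists depends on which cheerio build is installed, one way to cope is to feature-detect it and fall back to plain selector calls. The following is only a rough sketch: `extractFields` is a hypothetical helper name (not part of cheerio or crawlee), and the fallback assumes a flat template of `key: 'css selector'` pairs and returns the text of the first match, which only approximates what the documented `extract` does.

```javascript
// Hypothetical helper, not part of cheerio or crawlee.
// `$` is a cheerio-style function, e.g. the `$` from the crawling context.
// `template` is assumed to be a flat object: { fieldName: 'css selector' }.
function extractFields($, template) {
  // Use the native method when the installed cheerio build provides it.
  if (typeof $.extract === 'function') {
    return $.extract(template);
  }

  // Fallback: text of the first match per selector, undefined if no match.
  const result = {};
  for (const [key, selector] of Object.entries(template)) {
    const match = $(selector).first();
    result[key] = match.length > 0 ? match.text() : undefined;
  }
  return result;
}
```

Inside a request handler this could be called as `extractFields($, { title: 'h1' })`. Nested templates and array selectors are deliberately left out of the fallback to keep the sketch small.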
flat-fuchsia•15mo ago
The documentation doesn't note the difference, so it's pretty damn confusing. The good news is that there's a workable interim project out there that gives some similar functionality: https://github.com/denkan/cheerio-json-mapper
flat-fuchsia•15mo ago
If you have a cheerio instance or an HTML string, you can pass it in along with the 'extraction template' and it does its magic. I've moved over to using it for almost all of my large-scale page parsing because of the convenience.