ambitious-aqua, 3y ago

Extracting text from list elements

I want to extract the text from all <li> elements inside an unordered list (<ul>). Calling await page.locator("div.my_class > ul > li").textContent(); fails with: strict mode violation: locator('div.my_class > ul > li') resolved to x elements. Multiple matches are expected here, since this is a list. Playwright itself doesn't seem to have a problem with selectors that match multiple elements. I found the strictSelectors parameter in the Crawlee docs, but I couldn't work out how to set it to false (if that is even the solution). In Scrapy, item.add_css("list", "div.my_class > ul > li::text") returns a list with the text of each list item, which is what I'm looking for. Does anyone know how to solve this?
2 Replies
HonzaS, 3y ago
You can try Crawlee's parseWithCheerio function (https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlingContext#parseWithCheerio) and then extract the text with Cheerio functions. Or you can use page.$$eval (https://playwright.dev/docs/api/class-page#page-eval-on-selector-all):
await page.$$eval('div.my_class > ul > li', (els) => els.map((x) => x.textContent))
I'm writing this from memory, so I'm not sure it is exactly right, but something like this should work. Alternatively, as the docs suggest, you can do the same with locator.evaluateAll (https://playwright.dev/docs/api/class-locator#locator-evaluate-all).
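For reference, the locator-based route mentioned above might look roughly like the sketch below. The helper names getListTexts and getListTextsViaEvaluateAll are made up for illustration and are not part of Playwright or Crawlee; page is assumed to be a Playwright Page, such as the one Crawlee's PlaywrightCrawler passes to the request handler. The strict mode error comes from single-element actions like textContent(), so the sketch relies on locator.allTextContents() and locator.evaluateAll(), which are meant for locators that match several elements.

```javascript
// Hypothetical helper (not a Playwright or Crawlee API).
// `page` is assumed to be a Playwright Page object.
async function getListTexts(page, selector) {
    // Creating a locator that matches several elements is fine; strictness
    // only applies to actions that need exactly one target element.
    const items = page.locator(selector);

    // allTextContents() resolves to an array with one string per matched node.
    return items.allTextContents();
}

// Hypothetical alternative using evaluateAll(), which runs the callback
// inside the page and passes it all matched elements as an array.
async function getListTextsViaEvaluateAll(page, selector) {
    return page
        .locator(selector)
        .evaluateAll((els) => els.map((el) => el.textContent));
}
```

Inside a request handler this would be called as something like const texts = await getListTexts(page, "div.my_class > ul > li");.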
ambitious-aqua (OP), 3y ago
Thanks @HonzaS, using $$eval works:
const list_text = await page.$$eval("div.my_class > ul > li", (els) => {
    return els.map((el) => el.textContent);
});
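One possible follow-up, offered as a sketch rather than as part of the accepted answer: textContent returns the raw text of each node, which can include surrounding whitespace and line breaks from the markup. The cleanTexts helper below is a hypothetical name that collapses whitespace and drops empty entries; it assumes its input is an array like list_text above.

```javascript
// Hypothetical helper: tidy up an array of raw textContent values.
function cleanTexts(texts) {
    return texts
        // Treat null/undefined as empty, collapse whitespace runs, trim the ends.
        .map((t) => (t ?? "").replace(/\s+/g, " ").trim())
        // Drop entries that are empty after trimming.
        .filter((t) => t.length > 0);
}
```

It could be applied directly to the result, for example const items = cleanTexts(list_text);.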
