robust-apricot•3y ago
Need help with shadow-root
I was making an actor for my business that extracts reviews from different platforms, but I'm having issues with one website: all of its data sits inside shadow roots, so no results are coming back. I couldn't find a solution on the internet, so I came here.
any help would be appreciated!
8 Replies
robust-apricotOP•3y ago
Salesforce AppExchange | Leading Enterprise Cloud Marketplace
Medallia Sales & Service Experience for Salesforce
Medallia helps the world’s best-loved brands deliver great customer experiences. Connect Salesforce to the Medallia Experience Cloud and deliver insights to your Sales and Service teams that show them where to focus for improving account relationships.
robust-apricotOP•3y ago
this is the link of the page i need to scrape reviews from.
fair-rose•3y ago
Hey @thenameispriyam
Here is how:
https://stackoverflow.com/a/66399497/9734216
Stack Overflow
Select element within shadow root
I want to change a property in element hidden within shadow root. Due to the nature of a project I can't refer to document in JS directly, I can only use custom class (which doesn't work with shado...
robust-apricotOP•3y ago
That didn't help. If you inspect the reviews on the link I provided above, you'll find multiple <ace-review> tags, each with its own shadow root.
I can scrape one at a time, but I want to scrape all of them at once.
fair-rose•3y ago
Here is how you can get all of them:
robust-apricotOP•3y ago
async requestHandler({ request, page, enqueueLinks }) {
    console.log(`Scraping ${request.url}...`);
    console.log('New page created')
    // let pageData = await page.evaluate(
    //     () => document.querySelector("*").outerHTML
    // )
    let pageData = await page.evaluate(
        () => document.querySelector("page-container").shadowRoot.querySelector("analytics-handler>div>x-lazy").shadowRoot.querySelector("#reviews-panel").shadowRoot.querySelectorAll("ace-review").shadowRoot.innerHTML
    );
    const $ = cheerio.load(pageData);
    const data = [];
    $('.review-item').each((i, el) => {
        let reviewDate = $(el).find("ace-link[data-testid='review-date-link']").text();
        let reviewAuthor = $(el).find("a>span.bolded").text();
        let reviewTitle = $(el).find("p.review-content.bolded").text();
        let reviewDesc = $(el).find("p.review-content").text();
        let overAllRatings = $(el).find("div[slot='label']").text().split(".")[0];
        data.push({
            author: reviewAuthor,
            date: reviewDate,
            sourceCollector: 'appexchange.salesforce.com',
            sourceURL: request.url,
            title: reviewTitle,
            description: reviewDesc,
            ratings: overAllRatings
        });
    });
    await Actor.pushData(data);
},
how do i include that code into this request handler
i tried it, was getting $ not defined
fair-rose•3y ago
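querySelectorAll("ace-review") returns a NodeList (an array-like collection of elements), not a single element, so it has no shadowRoot property of its own. You need to loop over the nodes and read shadowRoot.innerHTML from each one.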
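A minimal sketch of that loop, assuming the nested selectors from the handler above are still correct for the page and that every shadow root involved is open. The function name collectReviewHtml is made up for illustration. It is meant to be passed to page.evaluate, so it only touches browser globals and returns plain strings.

```javascript
// Hypothetical helper: runs in the browser via page.evaluate.
function collectReviewHtml() {
    // Walk the nested shadow roots down to the reviews panel
    // (selectors copied from the request handler in this thread).
    const panel = document
        .querySelector('page-container').shadowRoot
        .querySelector('analytics-handler>div>x-lazy').shadowRoot
        .querySelector('#reviews-panel').shadowRoot;

    // querySelectorAll gives a NodeList, so convert it to an array
    // and read each element's own shadow root.
    return Array.from(panel.querySelectorAll('ace-review'))
        .filter((el) => el.shadowRoot) // skip elements with no open shadow root
        .map((el) => el.shadowRoot.innerHTML);
}

// Inside requestHandler (not executed here):
// const reviewsHtml = await page.evaluate(collectReviewHtml);
```

The result is an array with one HTML string per review, which is serializable and so can cross the page.evaluate boundary.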

fair-rose•3y ago
Also, cheerio.load doesn't work on arrays; it expects a single HTML string. It is synchronous, so no await is needed:
=> const $ = cheerio.load(html)
What you can do is loop over the ace-review elements, then parse each one individually with cheerio.
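One way the per-review parsing could look, as a sketch rather than a tested solution for this site. It assumes you already have an array of HTML strings, one per ace-review shadow root, and it reuses the selectors from the handler above without verifying them against the live page. parseReviews is a hypothetical helper; cheerio.load is passed in as the load argument so the helper has no import of its own. As far as I know cheerio.load returns the parsed document directly, so it is called without await here (adding await would not change the result).

```javascript
// Hypothetical helper: `load` is expected to be cheerio.load.
function parseReviews(load, reviewsHtml, sourceURL) {
    return reviewsHtml.map((html) => {
        // One cheerio document per review fragment.
        const $ = load(html);
        return {
            author: $('a>span.bolded').text(),
            date: $("ace-link[data-testid='review-date-link']").text(),
            sourceCollector: 'appexchange.salesforce.com',
            sourceURL,
            title: $('p.review-content.bolded').text(),
            description: $('p.review-content').text(),
            // Keep only the part of the rating label before the first dot.
            ratings: $("div[slot='label']").text().split('.')[0],
        };
    });
}

// Inside requestHandler (not executed here), with reviewsHtml
// being the array returned from page.evaluate:
// const data = parseReviews(cheerio.load, reviewsHtml, request.url);
// await Actor.pushData(data);
```

Because each fragment holds a single review, the selectors run against the whole fragment instead of going through $('.review-item').each.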