robust-apricot, 3y ago

Need help with shadow-root

I was making an actor for my business that extracts reviews from different platforms, but I'm having issues with one website: all of its data sits inside shadow roots, so no results come back. I couldn't find a solution on the internet, so I came here. Any help would be appreciated!
8 Replies
robust-apricot (OP), 3y ago
Salesforce AppExchange | Leading Enterprise Cloud Marketplace
Medallia Sales & Service Experience for Salesforce
Medallia helps the world’s best-loved brands deliver great customer experiences. Connect Salesforce to the Medallia Experience Cloud and deliver insights to your Sales and Service teams that show them where to focus for improving account relationships.
robust-apricot (OP), 3y ago
This is the link to the page I need to scrape reviews from.
fair-rose, 3y ago
Hey @thenameispriyam Here is how: https://stackoverflow.com/a/66399497/9734216
Stack Overflow
Select element within shadow root
I want to change a property in element hidden within shadow root. Due to the nature of a project I can't refer to document in JS directly, I can only use custom class (which doesn't work with shado...
robust-apricot (OP), 3y ago
That didn't help. If you inspect the reviews on the link I provided above, you'll find multiple <ace-review> tags, each with its own shadow root. I can scrape one at a time, but I want to scrape all of them at once.
fair-rose, 3y ago
Here is how you can get all of them:
let reviews = $('#main > page-container').shadowRoot.querySelector('x-lazy').shadowRoot.querySelector('listing-reviews').shadowRoot.querySelectorAll('ace-review')

reviews.forEach((item => console.log(item.shadowRoot.querySelector('.review-item'))));
robust-apricot (OP), 3y ago
async requestHandler({ request, page, enqueueLinks }) {
    console.log(`Scraping ${request.url}...`);
    console.log('New page created')
    // let pageData = await page.evaluate(
    //     () => document.querySelector("*").outerHTML
    // )
    let pageData = await page.evaluate(
        () => document.querySelector("page-container").shadowRoot.querySelector("analytics-handler>div>x-lazy").shadowRoot.querySelector("#reviews-panel").shadowRoot.querySelectorAll("ace-review").shadowRoot.innerHTML
    );
    const $ = cheerio.load(pageData);
    const data = [];
    $('.review-item').each((i, el) => {
        let reviewDate = $(el).find("ace-link[data-testid='review-date-link']").text();
        let reviewAuthor = $(el).find("a>span.bolded").text();
        let reviewTitle = $(el).find("p.review-content.bolded").text();
        let reviewDesc = $(el).find("p.review-content").text();
        let overAllRatings = $(el).find("div[slot='label']").text().split(".")[0];
        data.push({
            author: reviewAuthor,
            date: reviewDate,
            sourceCollector: 'appexchange.salesforce.com',
            sourceURL: request.url,
            title: reviewTitle,
            description: reviewDesc,
            ratings: overAllRatings
        });
    });
    await Actor.pushData(data);
},

How do I include that code in this request handler? I tried it, but was getting "$ not defined".
fair-rose, 3y ago
querySelectorAll("ace-review") returns a NodeList (a collection of nodes), which has no shadowRoot of its own. You need to loop over each node to access its shadowRoot.innerHTML.
fair-rose, 3y ago
Also, cheerio.load expects a single HTML string, so it won't work on an array. Note that cheerio.load is synchronous, so no await is needed: const $ = cheerio.load(html). What you can do is loop over the ace-review elements, then parse each one individually with cheerio.
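To make the "loop over each one" advice concrete, here is a minimal sketch of a helper that walks a chain of shadow hosts and returns the shadow-root HTML of every matching element. The name collectShadowHtml is hypothetical (it is not part of Crawlee, Puppeteer, or cheerio), and the selectors in the usage comment are simply the ones from the posts above, so they may need adjusting if the page has changed. It uses document.querySelector rather than $, since the $ in the console snippet is presumably the DevTools shortcut, which would explain the "$ not defined" error inside page.evaluate.

```javascript
// Hypothetical helper: follow a chain of shadow hosts starting at `root`,
// then return the shadow-root HTML of every element matching `itemSelector`.
function collectShadowHtml(root, hostSelectors, itemSelector) {
  let scope = root;
  for (const selector of hostSelectors) {
    const host = scope.querySelector(selector);
    // Stop early if a host is missing or its shadow root is not accessible.
    if (!host || !host.shadowRoot) return [];
    scope = host.shadowRoot;
  }
  // querySelectorAll returns a NodeList, so convert it before filtering/mapping.
  return Array.from(scope.querySelectorAll(itemSelector))
    .filter((item) => item.shadowRoot)
    .map((item) => item.shadowRoot.innerHTML);
}

// Untested usage sketch inside requestHandler, assuming `page` is a Puppeteer
// page and `cheerio` is already imported. The helper's source is sent to the
// browser as a string expression:
//
// const hosts = ['page-container', 'analytics-handler>div>x-lazy', '#reviews-panel'];
// const htmlList = await page.evaluate(
//   `(${collectShadowHtml})(document, ${JSON.stringify(hosts)}, 'ace-review')`
// );
// for (const html of htmlList) {
//   const $ = cheerio.load(html); // one cheerio document per review
//   // ...run the existing $('.review-item') extraction here...
// }
```

Returning plain strings from the browser keeps the result serializable, and each string can then be parsed on its own with cheerio, which matches the suggestion to handle the ace-review elements one at a time.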
