wise-white · 3y ago

Seeking Help to Access Safari Reader View Mode HTML Code

I'm working on a project that requires the HTML generated by Safari's Reader View. Reader View simplifies a webpage's content, making it cleaner and easier to parse. As I understand it, that content only appears after clicking the Reader button. Are there any tools or methods within the Apify ecosystem that could help me obtain the HTML from Safari's Reader View? Any insights or suggestions on how to accomplish this would be greatly appreciated!
11 Replies
sensitive-blue · 3y ago
I'd be surprised if Safari offered an API or any sort of programmatic access to that, but I'll let the wizards chime in on that because I could be wrong.
sensitive-blue · 3y ago
As a potential alternative, Mozilla does offer Readability: https://github.com/mozilla/readability
sensitive-blue · 3y ago
You may be able to feed your target document into it and parse from there?
sensitive-blue · 3y ago
Since you're asking about Safari's Reader mode specifically, I assume you're scraping in a real, non-headless browser, but the Readability README does describe how to run the parser in Node.js, if that's of any use: https://github.com/mozilla/readability#nodejs-usage
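For experimenting without any browser at all, here is a rough, standard-library-only Python sketch of the kind of heuristic that readability-style extractors build on: skip obvious boilerplate containers, then keep paragraphs that are long and not dominated by link text. It is not Mozilla's Readability algorithm and not Safari's Reader. The names `ParagraphCollector` and `extract_main_text`, the thresholds, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser

# Containers treated as boilerplate in this sketch (an assumption, not a standard).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ParagraphCollector(HTMLParser):
    """Collects <p> text outside boilerplate containers, tracking link text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0    # > 0 while inside a SKIP_TAGS element
        self.in_p = False
        self.in_a = False
        self.text = []         # text chunks of the current paragraph
        self.link_chars = 0    # characters of the current paragraph inside <a>
        self.paragraphs = []   # (text, link_chars) tuples

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self._flush()      # an unclosed <p> ends when the next one starts
            self.in_p = True
        elif tag == "a":
            self.in_a = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p":
            self._flush()
        elif tag == "a":
            self.in_a = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.text.append(data)
            if self.in_a:
                self.link_chars += len(data)

    def _flush(self):
        """Store the current paragraph (if any) and reset per-paragraph state."""
        if self.in_p:
            text = " ".join("".join(self.text).split())
            if text:
                self.paragraphs.append((text, self.link_chars))
        self.in_p = False
        self.text = []
        self.link_chars = 0


def extract_main_text(html, min_chars=80, max_link_density=0.3):
    """Return paragraphs that look like article prose (hypothetical thresholds)."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()
    kept = []
    for text, link_chars in parser.paragraphs:
        link_density = link_chars / len(text)
        if len(text) >= min_chars and link_density <= max_link_density:
            kept.append(text)
    return kept


# Made-up sample page: one real paragraph surrounded by typical page clutter.
SAMPLE_HTML = """
<html><body>
<nav><p>Home | World | Politics | Business | Technology | Science | Sports | Opinion</p></nav>
<article>
<h1>Example headline</h1>
<p>This is the first paragraph of a made-up news article. It is long enough, and
contains enough plain prose, to pass the length threshold used in this sketch.</p>
<p><a href="/a">Related story one</a> <a href="/b">Related story two</a></p>
<p>Short caption.</p>
</article>
<footer><p>Copyright notice and other footer text that should not be treated as article content at all.</p></footer>
</body></html>
"""

if __name__ == "__main__":
    for paragraph in extract_main_text(SAMPLE_HTML):
        print(paragraph)
```

A filter this simple will miss plenty of real-world layouts, so treat it as a way to see why the dedicated libraries score and prune the way they do, not as a replacement for them.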
wise-white (OP) · 3y ago
Hello @shovelandsandbox, Thank you for your quick response and suggestion about Mozilla's readability library. I appreciate your input. Actually, I have already used the readability and newspaper3k libraries for similar tasks. However, I've found that neither is as reliable as Safari's Reader Mode in terms of consistently producing clean, simplified HTML. That's why I am particularly interested in tapping into Safari's Reader View mode's functionality. Unfortunately, I couldn't find any existing Python solution that replicates Safari's Reader View mode. I've even tried using Selenium to activate the Reader Mode in Safari, but to no avail. Do you know of any Apify actors that might be able to accomplish this? Any advice or direction in this matter would be greatly appreciated!
sensitive-blue · 3y ago
Hmm, I see, and it doesn't surprise me that Safari's Reader produces cleaner results more reliably. @logical mirror what exactly are you using for scraping in your actor?
wise-white (OP) · 3y ago
@shovelandsandbox News articles
sensitive-blue · 3y ago
@logical mirror I mean which tool: Playwright, Puppeteer, etc.
wise-white (OP) · 3y ago
@shovelandsandbox So far, I haven't used any actors, as the manual/logical scraping isn't the main challenge. The key issue is finding a generic way to extract information that's applicable to all articles. The question stands: does Apify have a solution for this?
sensitive-blue · 3y ago
@logical mirror there may be something relevant in the marketplace, but I'm assuming you've already checked through everything there.
