Apify Discord Mirror

Updated 5 months ago

parsing input urls from google sheet

At a glance

The community member has encountered an issue with the Apify feature to crawl URLs from a Google Sheet, where the feature does not parse the entire URL when there is a comma in the URL. The community member tried to work around this by enclosing the URLs in quotes, but that did not help. The community member suggests using the Google Sheets actor as a workaround, but wants to avoid the hassle of setting up Google OAuth credentials. Other community members suggest encoding the special characters in the URLs or using the public Google Sheets actor, which does not require OAuth credentials. The community members discuss various approaches to accessing the public Google Sheet without the Apify platform, including using the Google Sheets API or parsing the sheet manually.

Hi, I have tried this feature https://docs.apify.com/platform/tutorials/crawl-urls-from-a-google-sheet
It looks like there is a bug: it does not parse the whole URL when there is a comma inside it.
I have tried it on this sheet https://docs.google.com/spreadsheets/d/14eS_kezUiZ13U1zEaDrb4s7xnmerJuHwG7wiRIPwBIM/edit#gid=0 I even tried putting the URL inside double quotes (") but it did not help.
Here is the result. You can see that the URLs requested are not the same as in the sheet.
https://api.apify.com/v2/datasets/vlTmoYRiFWawRdJsZ/items?clean=true&format=json
14 comments
Hello HonzaS,
I cannot really tell if this is a bug or a feature, but you may be able to encode these special characters in the URL ( https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent ).

For example, a comma (,) is encoded as %2C, so by simply replacing this character you should be able to achieve your goal.
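A minimal sketch of that replacement, written in Python rather than JavaScript purely for illustration. The helper name encode_commas and the example URL are hypothetical; urllib.parse.quote is used only to confirm that a comma percent-encodes to %2C.

```python
from urllib.parse import quote


def encode_commas(url: str) -> str:
    """Replace every literal comma in the URL with its percent-encoded form."""
    # Only the comma is touched, so the scheme, slashes and query
    # separators of the URL stay exactly as they were.
    return url.replace(",", "%2C")


# Hypothetical URL containing a comma in the query string.
example = "https://example.com/search?q=a,b"
print(encode_commas(example))

# quote() with no safe characters shows the same encoding for a lone comma.
print(quote(",", safe=""))
```

Only the comma is replaced here because running a full component encoder over the whole URL would also escape characters such as ":" and "/".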
Hi Pepa, thanks for the suggestion.
This means that the URLs would need to be already encoded in the spreadsheet, right? That does not help me much, as the customer just wants to fill in the spreadsheet and I do not think he can encode the URLs. I guess the only possibility is to use the Google Sheets actor to load the URLs.
He is populating the spreadsheet from Airtable.
This really sounds like something that could be improved. I will definitely raise an issue with the platform team about this.

As for a current workaround, you may use the Google Sheets actor, or pass the Google Sheet URL to your own implementation and use the Google Sheets API ( https://developers.google.com/sheets/api/quickstart/nodejs#set_up_the_sample ), but that is a little bit complicated, since it requires creating OAuth2 credentials in the Google API Console.
yes that is what I was hoping to avoid - hassle with google OAuth 😄
You don't need OAuth if your sheet is read-only and fully public
But generally if that is your own actor, you can just parse the sheet (converted to CSV) manually
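A rough sketch of the manual CSV approach in Python, assuming the sheet is fully public and the URLs sit in the first column. The export URL pattern (/export?format=csv&gid=...) and the helper names are assumptions for illustration, not something confirmed in this thread. A standard CSV parser keeps a quoted field such as "https://example.com/a,b" in one piece, which is the point of the exercise.

```python
import csv
import io
from typing import List
from urllib.request import urlopen


def build_csv_export_url(sheet_id: str, gid: str = "0") -> str:
    # Assumed export URL pattern for a public Google Sheet.
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"


def fetch_csv(sheet_id: str, gid: str = "0") -> str:
    # Download the sheet as CSV text; only works if the sheet is public.
    with urlopen(build_csv_export_url(sheet_id, gid)) as response:
        return response.read().decode("utf-8")


def parse_first_column(csv_text: str) -> List[str]:
    # csv.reader honours quoting, so commas inside a quoted cell
    # do not split the URL into several fields.
    reader = csv.reader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        if row and row[0].strip():
            urls.append(row[0].strip())
    return urls


# Offline example with one quoted URL that contains a comma.
sample = '"https://example.com/a,b"\r\nhttps://example.com/c\r\n'
print(parse_first_column(sample))
```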
You don't need to log in with a user, but you need to register an app in the Google Console and generate credentials to access data via the Google Sheets API.
You don't if you use the public actor
Yeah I know, I have already implemented that and it works perfectly. Your actor is a lifesaver. 😄
How does it work? Have you registered the app that the actor uses? Now I need to access a public spreadsheet without the Apify platform, and it looks like I need the API key.
Yes, you need the API key, or you could use Puppeteer and extract the data on your own. The actor uses it under the hood.
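For the API key route, a hedged sketch in Python of what a read-only request might look like. It assumes the Google Sheets API v4 values endpoint, a key passed as the key query parameter, and a JSON response with a "values" list of rows. The helper names, the range "Sheet1!A:A" and the sample response are illustrative only, and this is not how the actor itself is implemented.

```python
from typing import Any, Dict, List
from urllib.parse import quote, urlencode


def build_values_url(spreadsheet_id: str, cell_range: str, api_key: str) -> str:
    # Assumed endpoint: GET /v4/spreadsheets/{id}/values/{range}?key=...
    base = "https://sheets.googleapis.com/v4/spreadsheets"
    return (
        f"{base}/{quote(spreadsheet_id, safe='')}"
        f"/values/{quote(cell_range, safe='')}"
        f"?{urlencode({'key': api_key})}"
    )


def first_column(response: Dict[str, Any]) -> List[str]:
    # Each entry of "values" is one row; empty rows are skipped.
    return [row[0] for row in response.get("values", []) if row]


# Hypothetical response body, shaped like the assumed API output.
fake_response = {"values": [["https://example.com/a,b"], [], ["https://example.com/c"]]}
print(build_values_url("SHEET_ID", "Sheet1!A:A", "API_KEY"))
print(first_column(fake_response))
```

Because each cell arrives as a separate JSON string, commas inside a URL need no special handling on this path.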
thanks a lot for clearing that for me