Need an identifier to link scraped data back to the main database
At a glance
The community member is using Apify scrapers to scrape IG account information and FB page ad library information, with the goal of importing the data into their main leads database. However, depending on the scraper, they sometimes don't have an "identifier" to link the scraped data back to the correct lead in their main database. They currently use "lookup" to link the information, but this fails when the FB page URL in their main database is not formatted the same way as the URL Apify returns. The community member has two questions:
1. Is there a way to have an ID or the "input data" in the output from Apify so that they can link the scraped data back to the correct lead in the main database?
2. Is there a better way to link those datasets together?
In the comments, a community member states "nevermind, I just saw the input URL", which may indicate they have found a solution.
Hi, I am currently using Apify scrapers to scrape IG account information and FB page ad library information.
The goal of using the scrapers is to get information about my leads and import that information back into my main leads database. However, depending on the scraper, I sometimes don't have an "identifier" to link the data from the scraper back to the correct lead in my main database.
The way I link the scraped information back to the right lead is by using "lookup". For example, I look up the domain URL in both my main database and the scraped data to link the right information to the sheet.
However, to scrape the FB page ad library, I am using the URL of the FB page. Most of the time the URL is formatted correctly, but sometimes it is not (it may have "fr." or other /... ). When Apify returns the data, it always gives me back the correctly formatted FB page URL. Therefore, I cannot "lookup" any of the leads whose FB page URL wasn't formatted correctly in my main database.
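To illustrate the mismatch described above, one possible workaround is to normalize the FB page URL on both sides before doing the lookup, so that "fr." prefixes and trailing /... parts no longer matter. The sketch below is only an assumption about what "correctly formatted" means here: the function name `normalize_fb_page_url` is hypothetical, and it assumes the page name is the first path segment, which does not hold for URLs such as `/pages/...` or `profile.php?id=...`.

```python
from urllib.parse import urlsplit


def normalize_fb_page_url(url: str) -> str:
    """Reduce a Facebook page URL to a canonical lookup key.

    Hypothetical helper: assumes the page name is the first path segment.
    """
    url = url.strip()
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    # Drop empty segments produced by leading or trailing slashes.
    segments = [segment for segment in parts.path.split("/") if segment]
    slug = segments[0].lower() if segments else ""
    # The subdomain ("fr.", "www.", "m.") and the query string are discarded.
    return f"facebook.com/{slug}"
```

Applying the same function to the URL column of the main database and to the URL returned by Apify gives both sides a comparable key for the lookup.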
Question #1. Is there a way to have an ID or the "input data" in the output from Apify so that I can link the scraped data back to the correct lead in the main database?
Question #2. Is there a better way to link those datasets together?
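As a rough sketch of Question #1: if the scraper output echoes the URL that was submitted (as the "I just saw the input URL" comment suggests), the lead ID can be attached by matching on that echoed value instead of on the cleaned-up page URL. The field names `inputUrl`, `fb_url`, and `lead_id` below are assumptions for illustration; the actual output field depends on the scraper being used.

```python
def link_results_to_leads(leads, results, input_field="inputUrl"):
    """Attach each lead's ID to the scraped item produced from its URL.

    `leads` is a list of dicts holding "lead_id" and "fb_url" exactly as
    submitted to the scraper; `input_field` is a hypothetical name for the
    output field that echoes the submitted URL.
    """
    # Map the URL exactly as it was submitted to the lead it came from.
    id_by_input = {lead["fb_url"]: lead["lead_id"] for lead in leads}
    linked = []
    for item in results:
        # Items whose input URL is unknown get lead_id = None for manual review.
        lead_id = id_by_input.get(item.get(input_field))
        linked.append({**item, "lead_id": lead_id})
    return linked
```

Because the match uses the submitted URL rather than the URL the scraper returns, it works even when the URL in the main database is badly formatted.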