correct-apricot, 3y ago

Twitter scraping by both keyword and profile

Making the API call with one filter and then post-processing the results with the second filter is too slow and computationally expensive for me. Is it possible to make a single API call that filters by both keyword and profile, or can I only do one or the other? Thanks!
11 Replies
correct-apricot (OP), 3y ago
I see this question is similar to the Facebook scraper post. Is it the same case here, that you cannot filter by both simultaneously in one API call?
Pepa J, 3y ago
Hello @Deleted User, Twitter has advanced search built in. Could you fill in the advanced search form ( https://twitter.com/search-advanced?lang=en ) and then copy and paste the resulting query into the Actor's input? If that does not help, what combination of keywords and profiles are you trying to scrape?
correct-apricot (OP), 3y ago
For some reason, when I use advanced search with both a user and a keyword on Apify, it only searches by the keyword. Is that supposed to happen?
Pepa J, 3y ago
@Deleted User, which specific Actor are you using? I just tried Twitter Scraper, and 90% of the results are from the user I set in the input and contain the right keywords.
correct-apricot (OP), 3y ago
I use the same one. I'm asking whether it's possible to set both a keyword and a user and have the results include only tweets that match both.
Lukas Krivka, 3y ago
Can you give us more specific examples and a step-by-step description of what you are trying to achieve?
correct-apricot (OP), 3y ago
Sure. Say I want to scrape all tweets by https://twitter.com/JoeBiden containing the word "president". I am currently using this code:
import json
import requests

# api_token and api_endpoint are defined elsewhere.
# Python booleans (True/False) are serialized to JSON true/false by json.dumps.
actor_input = {
    "addTweetViewCount": True,
    "addUserInfo": False,
    "browserFallback": False,
    "debugLog": False,
    "extendOutputFunction": "async ({ data, item, page, request, customData, Apify }) => {\n return item;\n}",
    "extendScraperFunction": "async ({ page, request, addSearch, addProfile, addThread, addEvent, customData, Apify, signal, label }) => {\n \n}",
    "fromDate": "2021-11-02",
    "handle": ["https://twitter.com/JoeBiden"],
    "handlePageTimeoutSecs": 5000,
    "maxIdleTimeoutSecs": 60,
    "maxRequestRetries": 6,
    "mode": "own",
    "profilesDesired": 10,
    "proxyConfig": {"useApifyProxy": True},
    "searchTerms": ["president"],
    "tweetsDesired": 10000,
    "useAdvancedSearch": True,
    "useCheerio": True,
}
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "Authorization": f"Bearer {api_token}",
}
data = json.dumps(actor_input)
response = requests.post(api_endpoint, headers=headers, data=data)
correct-apricot (OP), 3y ago
However, it looks like the Actor is retrieving tweets from any user that contain the search term 'president'. I am only interested in tweets from "https://twitter.com/JoeBiden" containing the term 'president'. Thanks!
Pepa J, 3y ago
@Deleted User yes, with this general input I am also receiving a lot of irrelevant results. That's why I suggested generating the expression from the advanced search form (on the Twitter website) and using it for the searchTerms attribute. The input then looks like this:
{
...
"searchTerms": [
"\"president\" (from:JoeBiden) -filter:links -filter:replies"
],
...
}
Now all the results belong to the specified Twitter account.
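For anyone building this input from Python, the approach above could be sketched roughly as follows. The helper name build_search_term and its parameters are hypothetical, made up for illustration; only searchTerms, useAdvancedSearch, and tweetsDesired come from the input posted earlier in this thread, and whether the other fields need to stay is not covered here. The sketch only builds the payload and does not call the API.

```python
import json


def build_search_term(keyword, handle, exclude_links=False, exclude_replies=False):
    """Combine a keyword and a profile into one advanced-search expression.

    Hypothetical helper: accepts a profile URL, "@name", or plain "name".
    """
    # Reduce "https://twitter.com/JoeBiden" or "@JoeBiden" to "JoeBiden".
    username = handle.rstrip("/").rsplit("/", 1)[-1].lstrip("@")
    parts = [f'"{keyword}"', f"(from:{username})"]
    if exclude_links:
        parts.append("-filter:links")
    if exclude_replies:
        parts.append("-filter:replies")
    return " ".join(parts)


search_term = build_search_term(
    "president",
    "https://twitter.com/JoeBiden",
    exclude_links=True,
    exclude_replies=True,
)

# Minimal input fragment; merge it into the rest of your Actor input as needed.
actor_input = {
    "searchTerms": [search_term],
    "useAdvancedSearch": True,
    "tweetsDesired": 10000,
}
payload = json.dumps(actor_input)
```

The point of the design is that the profile filter lives inside the search expression itself (from:JoeBiden), so the keyword and the account are matched together in a single query instead of being filtered afterwards.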
correct-apricot (OP), 3y ago
Ahh okay, I was wrongly under the impression that the API would have done this for me. Thank you so much!
