Apify Discord Mirror

Updated 5 months ago

Twitter scraping by both keyword and profile

At a glance

A community member wants to scrape tweets from one specific Twitter user (JoeBiden) that contain a specific keyword ("president"), but cannot get the results filtered by user and keyword at the same time. Other community members suggest building the query with Twitter's advanced search form and passing it to the Actor. The answer is to put a search expression such as "president" (from:JoeBiden) -filter:links -filter:replies into the searchTerms field of the Actor input.

It is too computationally intense/slow for me to make the api call for one of the filters and do post processing with the second filter. I am wondering if you can make an api call to scrape filtering by both keyword and profile. Is this possible or can I only do one or the other? Thanks!
11 comments
I see this question is similar to the Facebook scraper post, is it the same case that you are unable to filter both simultaneously in one api call?
Hello, Twitter has its own advanced search. Could you fill in the advanced search form ( https://twitter.com/search-advanced?lang=en ) and then copy and paste the resulting query into the Actor's input? If that does not help, what combination of keywords and profiles are you trying to scrape?
For some reason when I advanced search by both user and keyword on apify, it only searches the keyword. Is that supposed to happen?
which specific actor do you use? I just tried Twitter Scraper and 90% of the results are from the user I set on Input with the right keywords.
I use the same one. I'm asking whether it's possible to set both a keyword and a user and get back only the tweets that match both.
Can you give us a more specific example and a step-by-step description of what you are trying to achieve?
Sure, so say I want to scrape all tweets by https://twitter.com/JoeBiden containing the word "president". I am currently using this body of code:

import json
import requests

# api_token and api_endpoint are assumed to be defined elsewhere;
# the original post does not show them.

actor_input = {
    "addTweetViewCount": True,
    "addUserInfo": False,
    "browserFallback": False,
    "debugLog": False,
    "extendOutputFunction": "async ({ data, item, page, request, customData, Apify }) => {\n return item;\n}",
    "extendScraperFunction": "async ({ page, request, addSearch, addProfile, _, addThread, addEvent, customData, Apify, signal, label }) => {\n \n}",
    "fromDate": "2021-11-02",
    "handle": [
        "https://twitter.com/JoeBiden"
    ],
    "handlePageTimeoutSecs": 5000,
    "maxIdleTimeoutSecs": 60,
    "maxRequestRetries": 6,
    "mode": "own",
    "profilesDesired": 10,
    "proxyConfig": {
        "useApifyProxy": True
    },
    "searchTerms": [
        "president"
    ],
    "tweetsDesired": 10000,
    "useAdvancedSearch": True,
    "useCheerio": True
}

headers = {
    "Content-Type": "application/json; charset=utf-8",
    "Authorization": f"Bearer {api_token}"
}
# Serialize the input to JSON (Python True/False become JSON true/false).
data = json.dumps(actor_input)

response = requests.post(api_endpoint, headers=headers, data=data)
however it looks like the actor is retrieving tweets from any user containing the search term 'president'. I am only interested in tweets from "https://twitter.com/JoeBiden" containing the term 'president'. Thanks!
Yes, with this general input I am also receiving a lot of irrelevant results.

That's why I suggested generating the expression with the advanced search form (on the Twitter website) and using it for the searchTerms attribute. The input then looks like this:
{
  ...
  "searchTerms": [
    "\"president\" (from:JoeBiden) -filter:links -filter:replies"
  ],
  ...
}

Now all the results belong to the specified Twitter account.
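
For anyone building this input from Python, as in the question, here is a minimal sketch of how the search expression could be composed. The helper name build_search_term is hypothetical and not part of the Actor or any Apify library. The sketch also assumes that the other input fields stay as in the original post, since the reply above elides them.

```python
import json


def build_search_term(keyword, handle, exclude_links=True, exclude_replies=True):
    """Compose a Twitter advanced-search expression for one keyword and one account.

    Hypothetical helper for illustration; it only builds a string.
    """
    # Accept a bare handle, an @handle, or a profile URL such as
    # https://twitter.com/JoeBiden and keep only the username.
    username = handle.rstrip("/").split("/")[-1].lstrip("@")
    parts = [f'"{keyword}"', f"(from:{username})"]
    if exclude_links:
        parts.append("-filter:links")
    if exclude_replies:
        parts.append("-filter:replies")
    return " ".join(parts)


search_term = build_search_term("president", "https://twitter.com/JoeBiden")

# Only the fields relevant to the fix are shown here (assumption: the
# remaining fields are unchanged from the input in the question).
actor_input = {
    "searchTerms": [search_term],
    "useAdvancedSearch": True,
}

# json.dumps escapes the inner quotes, giving \"president\" in the payload.
payload = json.dumps(actor_input)
print(search_term)
```

The printed expression is the same one used in the searchTerms example above, so the payload can be sent with the same requests.post call as in the question.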
ahh okay, i was wrongly under the impression that the api would have done this for me, thank you so much!