Introduction
Using Tweepy
Tweepy is a Python library that acts as a wrapper around the Twitter API. The easiest way to install it is to use something like Pip, i.e.:
pip install tweepy
More information about the installation of Tweepy can be found here.
Using the get_followers() method
We’re going to use the get_followers() method to learn about Tweepy. When learning how to use a new method, the first thing we want to do is check the documentation. The documentation for the get_followers() method is here.
Under Resource Information, we have a table of information:
Response formats | JSON |
Requires authentication? | Yes |
Rate limited? | Yes |
Requests / 15-min window (user auth) | 15 |
Requests / 15-min window (app auth) | 15 |
This tells us a number of things. One important detail is that we can make at most 15 requests in every 15-minute window. Once that threshold is reached, the API makes us wait before it will accept any more requests. It might not sound like it at first, but this is quite a severe limitation: not only is the number of requests per window capped, but each request returns only a limited number of results.
Under the Parameters section on that same page, you’ll find more information about this method. The parameter that we’re interested in is called Count:
“The number of users to return per page, up to a maximum of 200. Defaults to 20.”
So, if you simply request a list of followers, the API will by default only give you 20 results. If you specify a maximum count of 200 (we’ll explore this more in a bit), you still only get a list of 200 followers. That leaves us with two questions:
- How do you get multiple pages of followers?
- What happens if you have more than 3,000 followers (200 followers * 15 requests)?
We’ll explore the answer to both of these questions. First, let’s take a look at a very basic example of working with Python and the Twitter API.
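To make the second question concrete, here is a minimal sketch of the arithmetic, using only the two numbers quoted from the documentation above (15 requests per window, up to 200 users per page). The function names are made up for illustration and are not part of Tweepy.

```python
import math

REQUESTS_PER_WINDOW = 15   # from the Resource Information table
MAX_COUNT_PER_PAGE = 200   # from the description of the Count parameter

def followers_per_window(count=MAX_COUNT_PER_PAGE):
    """Most followers we can retrieve in one 15-minute window."""
    return REQUESTS_PER_WINDOW * count

def windows_needed(total_followers, count=MAX_COUNT_PER_PAGE):
    """Number of 15-minute windows needed to retrieve every follower."""
    pages = math.ceil(total_followers / count)
    return math.ceil(pages / REQUESTS_PER_WINDOW)

print(followers_per_window())    # 3000 with the maximum count of 200
print(followers_per_window(20))  # 300 with the default count of 20
print(windows_needed(10_000))    # 4 windows for an account with 10,000 followers
```

In other words, an account with 10,000 followers needs 50 pages at the maximum count, which does not fit into a single window.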
Connecting Python and Twitter
All of the code examples we use in this series of blog posts will make use of the same basic setup. We need to import Tweepy, we need to set up the OAuth connection, and we need to define the API as something we can work with. If you copy this code, ensure you replace the placeholders with your own OAuth credentials.
(You’re welcome to use whichever Python IDE works best for you. I use Spyder.)
import tweepy
# Set the access credentials
consumerKey = "<your API key goes here>"
consumerSecret = "<your API secret goes here>"
accessToken = "<your access token goes here>"
accessTokenSecret = "<your access token secret goes here>"
# Define the OAuth parameters
auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
# Define the Twitter API and tell it to use the OAuth settings
api = tweepy.API(auth)
#-----This is the end of the "set-up" portion of the script-----
# Request one page of followers of the authenticated user
followers = api.get_followers()
# Print the name of each follower
for user in followers:
    print(user.name)
The “followers” variable is where it gets interesting. We assign it the result of calling the get_followers() method on the already-defined “api” object. Because we don’t say whose followers we want, the API assumes we mean the authenticated user. To return a different user’s followers, we would add that user as a parameter:
followers = api.get_followers(screen_name="ken_mcclean")
This also works with other identifiers, such as the account ID (the user_id parameter).
Having done that, “followers” should now contain a list of the followers of the authenticated account. Remember that we haven’t addressed either of the two questions above yet. We’ll get to those shortly.
Now we have a list of followers. If we loop through the list and simply print each “user”, we are presented with a massive list of attributes for each user. Assuming we only want to know the name of the user, we can specify that information using dot notation. The example above uses dot notation to specify that we only want the “name” of each “user”.
If we run the script, we should be presented with a list of twenty users. Remember, that’s the default number of results per page, and we haven’t asked the API for more than one page.
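As a small illustration of dot notation, the sketch below reads a few attributes from each user. To keep it runnable without credentials, it uses hypothetical stand-in objects with made-up values instead of real API results; it assumes the real user objects expose name, screen_name and followers_count in the same way.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the user objects in "followers"; the values are made up.
followers = [
    SimpleNamespace(name="Ada Example", screen_name="ada_example", followers_count=120),
    SimpleNamespace(name="Bob Example", screen_name="bob_example", followers_count=45),
]

def summarize(user):
    """Pick out a few attributes of one user with dot notation."""
    return f"{user.name} (@{user.screen_name}), {user.followers_count} followers"

summaries = [summarize(user) for user in followers]
for line in summaries:
    print(line)
```

With real data, the same summarize() function could be applied to each item in the list returned by api.get_followers().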
Your First Tweepy Error – Positional Arguments
While working with the API, you may encounter an error that says something like this:
TypeError: get_followers() takes 1 positional argument but 2 were given
This generally means that you have passed a parameter to a method without specifying what that parameter means. In other words, you may have done something like this:
followers = api.get_followers("ken_mcclean")
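Notice that we haven’t told the method what sort of value “ken_mcclean” actually is. Tweepy used to accept positional arguments like this, but no longer does. This still leaves us with a question: if we only passed one argument, why does the error say that two were given? The “invisible” positional argument is the “api” object itself, which Python passes to the method automatically as self. That is the one positional argument the method accepts; “ken_mcclean” is the second, unexpected one.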
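The error can be reproduced without touching Twitter at all. The sketch below uses a hypothetical FakeAPI class whose get_followers() method, like the one described above, accepts keyword arguments only; it is a stand-in for illustration, not Tweepy’s own code.

```python
class FakeAPI:
    """Hypothetical stand-in for the api object; not part of Tweepy."""

    def get_followers(self, **kwargs):
        # "self" is the only positional argument this method accepts.
        return kwargs

api = FakeAPI()

try:
    # Two positional arguments arrive: api (as self) and "ken_mcclean".
    api.get_followers("ken_mcclean")
except TypeError as error:
    message = str(error)
    print(message)

# Naming the parameter removes the ambiguity, and the call succeeds.
result = api.get_followers(screen_name="ken_mcclean")
print(result)
```

The printed message ends with “takes 1 positional argument but 2 were given”, which matches the error shown above.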
Increasing the Number of Results With the Cursor
The Twitter API uses rate limiting. In other words, you can only make so many calls or requests to the service in a given period of time.
Let’s examine the rate limiting you’d encounter if you wanted to return a list of everyone who follows your account. Consider the following code:
import tweepy
# Set the access credentials
consumerKey = "<your API key goes here>"
consumerSecret = "<your API secret goes here>"
accessToken = "<your access token goes here>"
accessTokenSecret = "<your access token secret goes here>"
# Define the OAuth parameters
auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
# Define the Twitter API and tell it to use the OAuth settings
api = tweepy.API(auth)
#-----This is the end of the "set-up" portion of the script-----
# Use the Cursor to paginate through the followers, 200 per page
for user in tweepy.Cursor(api.get_followers, count=200).items():
    # Print the name
    print(user.name)
Things have gotten slightly more complicated, but let’s unpack it.
We’re no longer declaring “followers” as a list. Instead we’re using a for-loop. The loop examines each “user” in the items that the Cursor returns. The Cursor does all the work of figuring out how many pages of results exist, and calls each page of results for you. Notice that we’re now specifying the maximum number of results per page (200).
So each time the Cursor returns a page of up to 200 results, the loop hands us the items one at a time as “user”, and we use dot notation to print only that user’s name.
This basic setup is what we’ll use for the majority of the scripts that we explore in the series. Being able to use the Cursor, and pass methods to it, will allow you to get most of the information you want out of Twitter. If you were interested in the followers of a different user, you’d simply add that user as a parameter after passing the get_followers method to the Cursor:
for user in tweepy.Cursor(api.get_followers,screen_name="ken_mcclean",count=200).items():
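To see what the Cursor saves us from writing, here is a simplified sketch of cursor-based pagination. It is not Tweepy’s implementation: fetch_page() is a hypothetical stand-in for a single API request, the follower data is made up, and the sketch assumes the convention of starting with a cursor of -1 and stopping when the next cursor is 0.

```python
# Hypothetical data standing in for an account with 450 followers.
FAKE_FOLLOWERS = [f"user_{i}" for i in range(450)]
request_log = []  # records the cursor used for each simulated request

def fetch_page(cursor=-1, count=20):
    """Stand-in for one API request: returns (page, next_cursor)."""
    request_log.append(cursor)
    start = 0 if cursor == -1 else cursor
    page = FAKE_FOLLOWERS[start:start + count]
    end = start + len(page)
    # A next cursor of 0 signals that there are no more pages.
    next_cursor = end if end < len(FAKE_FOLLOWERS) else 0
    return page, next_cursor

def iterate_items(count=20):
    """Yield every follower, requesting a new page only when needed."""
    cursor = -1
    while cursor != 0:
        page, cursor = fetch_page(cursor=cursor, count=count)
        yield from page

names = list(iterate_items(count=200))
print(len(names), "followers retrieved in", len(request_log), "requests")
```

With a count of 200, the 450 made-up followers arrive in three requests (200, 200 and 50 results), each of which would count against the rate limit.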
Rate Limiting
We haven’t yet addressed the second question. What if you have more than 15 * 200 followers? The API limits you to that many results in a given time frame.
The answer is Tweepy’s wait_on_rate_limit option. We need only change one line of code:
api = tweepy.API(auth,wait_on_rate_limit=True)
If the API tells the script that the rate limit has been exceeded, the script will now wait until the API gives the all-clear, and then continue running. In effect, it pauses until the API says “hey, you’re good to make another 15 requests.” Without this option, the script stops with an error when rate limiting kicks in.
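Conceptually, waiting on the rate limit amounts to a retry loop. The sketch below illustrates that idea only and makes no claim about how Tweepy implements it: RateLimitReached, call_with_wait() and flaky_request() are hypothetical names, and the sleep function is swapped for a recorder so the example finishes instantly.

```python
import time

class RateLimitReached(Exception):
    """Hypothetical stand-in for the error raised when the limit is hit."""

    def __init__(self, seconds_until_reset):
        super().__init__("rate limit reached")
        self.seconds_until_reset = seconds_until_reset

def call_with_wait(request, sleep=time.sleep):
    """Retry `request` until it succeeds, sleeping whenever it is rate limited."""
    while True:
        try:
            return request()
        except RateLimitReached as error:
            sleep(error.seconds_until_reset)

attempts = []

def flaky_request():
    """Made-up request that is rate limited on the first call only."""
    attempts.append(1)
    if len(attempts) == 1:
        raise RateLimitReached(seconds_until_reset=900)
    return ["follower_a", "follower_b"]

slept = []  # records requested sleep durations instead of really sleeping
result = call_with_wait(flaky_request, sleep=slept.append)
print(result, "after waiting", slept, "seconds")
```

The first attempt is rejected, the loop “waits” 900 seconds (one 15-minute window), and the second attempt returns the results.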
Conclusion
Hopefully this post has helped you connect Python to your Twitter account. In the next post we’ll look at some more involved operations that can be carried out with Tweepy.