Jeremy Foote

I'm a PhD candidate, studying Media, Technology, and Society at Northwestern University


In this post, I’ll show how simple it can be to get social media data. I use the Twitter API to get some summary data about a list of users. A colleague is using this data for a project and asked for some help retrieving it, so I thought I would turn my code into a tutorial.

I’m going to show how this is done using a Jupyter Notebook. This is a super cool tool that lets you write text inline with code and results. Hopefully it shows up OK on the blog :)

We start off by importing the Python Twitter library. This is the most important part of the program: this library is a wrapper around the Twitter API (built on the requests library), so by importing it we get to reuse the work of others and save ourselves lots of headaches and extra work.

import twitter

Before this next step, you will need to create an app and then look up your access keys and tokens.

I saved mine in a config file named ‘twitter_config.py’. The file looks like this:

consumer_key = 'dadjklsadkja' # Note - these aren't real :)
consumer_secret = 'jdkslfjlkdsa'
access_token_key = 'lkdsjflkdsa'
access_token_secret = 'lkjdlajsdla'

Once you have that file saved, you can import those variables. Storing keys in a separate file is recommended. That way you can share your code with others, add it to github, etc., without ever sharing your keys.
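If you use git, one simple way to make sure the keys never end up in your repository is to ignore the config file. This is just a sketch of one way to do it, using the filename from above:

```shell
# Tell git never to track the credentials file
echo "twitter_config.py" >> .gitignore
```

After this, `git status` won’t list the file, so you can push the rest of the code without worrying about the keys.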

After that, we create an API connection object using the following code:

from twitter_config import (consumer_key, consumer_secret,
                            access_token_key, access_token_secret)
api = twitter.Api(consumer_key=consumer_key,
                  consumer_secret=consumer_secret,
                  access_token_key=access_token_key,
                  access_token_secret=access_token_secret)

Next, check to make sure that “logging in” worked correctly.

api.VerifyCredentials()
User(ID=16614440, ScreenName=jdfoote)
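If the keys are wrong, python-twitter raises an exception rather than returning a user, so you could wrap the check in a small helper. This is just a sketch — `check_login` is a name I’m introducing here, not part of the library:

```python
def check_login(api):
    """Return the authenticated User, or None if verification fails."""
    try:
        return api.VerifyCredentials()
    # python-twitter raises twitter.error.TwitterError on bad keys;
    # catching Exception keeps this sketch independent of the library
    except Exception:
        return None
```

Calling `check_login(api)` then gives you either a `User` object (like the output above) or `None`, which is easy to test for before running the rest of the script.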

Everything looks good, so we get our list of names.

This next bit of code opens the twitter_names.csv file, which is just a list of Twitter screen names, one name per line, and combines them all into one big Python list.

# Read in the screen names, one per line
with open('/home/jeremy/Desktop/DeleteMe/twitter_names.csv', 'r') as f:
    screen_names = [line.strip() for line in f]

Next, we get all of the data from Twitter. You can only look up data on 100 users per call, so I use a counter to keep track of how far through the list of names I am, and an empty list to store the user objects as each batch comes back.

I can’t figure out why, but some users weren’t found the first time through; they are found when I pass them through the lookup again.

counter = 0
user_objs = []
unfound_users = []
while counter < len(screen_names):
    curr_names = screen_names[counter:counter + 100]
    users = api.UsersLookup(screen_name=curr_names)
    unfound_users += [x for x in curr_names if x not in [y.screen_name for y in users]]
    user_objs += users
    counter += 100
# For some reason, some of these aren't found the first time;
# retry them in one more batch (guarding against an empty list)
if unfound_users:
    user_objs += api.UsersLookup(screen_name=unfound_users)
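The slicing-by-100 in the loop above is the part most worth getting right, since the API caps each lookup at 100 names. It can be factored out into a small helper — a sketch, and `chunked` is my name for it, not something from the library:

```python
def chunked(items, size=100):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# The lookup loop then becomes roughly:
# for batch in chunked(screen_names):
#     user_objs += api.UsersLookup(screen_name=batch)
```

This does the same thing as the counter-based loop, but the slicing logic can be tested on its own, away from the API calls.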

Finally, we save the output to a CSV file.

import csv

with open('./twitter_output.csv', 'w', newline='') as o:
    out = csv.writer(o)
    out.writerow(['username','description','join_date','tweets','following','followers','favorites'])
    for u in user_objs:
        out.writerow([
            u.screen_name,
            u.description,
            u.created_at,
            u.statuses_count,
            u.friends_count,
            u.followers_count,
            u.favourites_count
        ])
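To sanity-check the file, you could read the first few rows back. Another sketch — `preview` is a helper name I made up:

```python
import csv

def preview(path, n=3):
    """Return the header row plus the first n data rows of a CSV file."""
    with open(path, newline='') as f:
        rows = []
        for i, row in enumerate(csv.reader(f)):
            if i > n:
                break
            rows.append(row)
        return rows
```

Running `preview('./twitter_output.csv')` should show the header row from above followed by the first few users, which is usually enough to catch a mangled column or an empty file.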