A beginner's guide to collecting and mapping Twitter data using R

Learn to use R's twitteR and leaflet packages, which allow you to map the location of tweets on any topic.

When I started learning R, I also needed to learn how to collect Twitter data and map it for research purposes. Despite the wealth of information on the internet about this topic, I found it difficult to understand what was involved in collecting and mapping Twitter data. Not only was I a novice to R, but I was also unfamiliar with the technical terms in the various tutorials. Even so, I was successful! In this tutorial, I will break down how to collect Twitter data and display it on a map in a way that even novice coders can understand.

Create the app

If you don't have a Twitter account, the first thing you need to do is create one. After that, go to apps.twitter.com to create an app that allows you to collect Twitter data. Don't worry, creating the app is extremely easy. The app you create will connect to the Twitter application programming interface (API). Think of an API as an electronic personal assistant of sorts: you use it to ask another program to do something for you. In this case, you will be connecting to the Twitter API and asking it to collect data. Just make sure you don't ask too much, because Twitter limits how many requests you can make within a given time window.
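To get a feel for how quickly requests add up, here is a small sketch that estimates how many search requests a collection needs. It assumes the figures documented for Twitter's v1.1 search API (up to 100 tweets per search request and 180 search requests per 15-minute window for a user-authenticated app) and assumes twitteR pages through results in batches of that size; check Twitter's current documentation before relying on these numbers. requests_needed() and fits_in_window() are hypothetical helpers, not part of twitteR.

```r
# Hypothetical helpers for estimating search requests.
# Assumptions (check Twitter's current docs): at most 100 tweets per
# search request and 180 search requests per 15-minute window.
requests_needed <- function(n_tweets, per_request = 100) {
  # Each request returns at most `per_request` tweets, so round up
  ceiling(n_tweets / per_request)
}

fits_in_window <- function(n_tweets, per_request = 100, limit = 180) {
  # TRUE if the whole collection fits within one rate-limit window
  requests_needed(n_tweets, per_request) <= limit
}

requests_needed(200)   # 2
fits_in_window(20000)  # FALSE (200 requests is more than 180)
```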

There are two APIs that you can use to collect tweets. If you want a one-time collection of tweets, use the REST API. If you want to collect tweets continuously, as they are posted, over a specific time period, use the streaming API. In this tutorial, I'll focus on using the REST API.

After you create your app, go to the Keys and Access Tokens tab. You will need the Consumer Key (API key), Consumer Secret (API secret), Access Token, and Access Token Secret to access your app in R.

Collect the Twitter data

The next step is to open R and get ready to write code. For beginners, I recommend using RStudio, an integrated development environment (IDE) for R. I find RStudio helpful when I am troubleshooting or testing code. To access the REST API from R, you will use a package called twitteR.

Open RStudio and create a new R script. Once you have done this, you will need to install and load the twitteR package:

install.packages("twitteR")
 #installs twitteR
library(twitteR)
 #loads twitteR

Once you've installed and loaded the twitteR package, you will need to enter your app's API information from the section above:

api_key <- ""
 #in the quotes, put your Consumer Key (API key)
api_secret <- ""
 #in the quotes, put your Consumer Secret (API secret)
token <- ""
 #in the quotes, put your Access Token
token_secret <- ""
 #in the quotes, put your Access Token Secret
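Pasting your keys straight into a script makes it easy to share them by accident, for example when you send your code to a colleague. A minimal alternative, sketched below, reads them from environment variables instead. The variable names (TWITTER_API_KEY and so on) and the get_twitter_credentials() helper are hypothetical; you would set the variables yourself, for example in a .Renviron file in your home directory, which R reads at startup.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables. The variable names are assumptions; set them yourself.
get_twitter_credentials <- function() {
  vars <- c(api_key      = "TWITTER_API_KEY",
            api_secret   = "TWITTER_API_SECRET",
            token        = "TWITTER_TOKEN",
            token_secret = "TWITTER_TOKEN_SECRET")
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)

  # Stop early with a clear message if any variable is not set
  unset_vars <- vars[values == ""]
  if (length(unset_vars) > 0) {
    stop("Missing environment variables: ", paste(unset_vars, collapse = ", "))
  }
  as.list(values)
}
```

With the variables set, you could run creds <- get_twitter_credentials() and then pass creds$api_key, creds$api_secret, creds$token, and creds$token_secret to setup_twitter_oauth().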

Next, connect to Twitter to access the API:

setup_twitter_oauth(api_key, api_secret, token, token_secret)

Let's try doing a Twitter search about community gardens and farmers markets:

tweets <- searchTwitter("community garden OR #communitygarden OR farmers market OR #farmersmarket", n = 200, lang = "en")

This code searches for up to 200 tweets (n = 200) in English (lang = "en") that contain the terms community garden or farmers market, or the hashtags #communitygarden or #farmersmarket.
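If you want to experiment with different search terms, it can help to build the query string from a list of terms instead of typing it by hand. The sketch below joins terms with OR and, as an assumption about what you want, wraps multi-word terms in double quotes so Twitter matches them as exact phrases. The query above leaves them unquoted, so the results may differ. build_query() is a hypothetical helper, not part of twitteR.

```r
# Hypothetical helper: join search terms into one OR query.
# Multi-word terms are wrapped in double quotes for exact-phrase matching.
build_query <- function(terms) {
  quoted <- ifelse(grepl(" ", terms), paste0('"', terms, '"'), terms)
  paste(quoted, collapse = " OR ")
}

query <- build_query(c("community garden", "#communitygarden",
                       "farmers market", "#farmersmarket"))
writeLines(query)
# "community garden" OR #communitygarden OR "farmers market" OR #farmersmarket
```

You could then pass query to searchTwitter() in place of the typed string.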

After you have done your Twitter search, save your results in a data frame:

tweets.df <- twListToDF(tweets)
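Because only a small share of tweets carry coordinates, you may want to keep just those rows before exporting. The sketch below assumes the data frame has longitude and latitude columns, as twListToDF() output for tweets does, and that tweets without a location have NA there. It converts both columns to numbers first in case they are stored as text. keep_geocoded() is a hypothetical helper, and the example data are made up.

```r
# Hypothetical helper: keep only rows that have both coordinates.
# Assumes "longitude" and "latitude" columns that may be stored as text.
keep_geocoded <- function(df) {
  df$longitude <- suppressWarnings(as.numeric(df$longitude))
  df$latitude  <- suppressWarnings(as.numeric(df$latitude))
  df[!is.na(df$longitude) & !is.na(df$latitude), , drop = FALSE]
}

# Made-up data standing in for tweets.df
example <- data.frame(text      = c("tweet A", "tweet B", "tweet C"),
                      longitude = c("-84.39", NA, "-83.38"),
                      latitude  = c("33.75", NA, "33.95"),
                      stringsAsFactors = FALSE)
geo <- keep_geocoded(example)
nrow(geo)  # 2
```

In this tutorial's workflow, that would mean running tweets.df <- keep_geocoded(tweets.df) before exporting.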

To create a map with your tweets, you will need to export what you collected into a .csv file:

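write.csv(tweets.df, "C:/Users/YourName/Documents/ApptoMap/tweets.csv", row.names = FALSE)
 #an example path to the folder in which you want to save the .csv file; use forward slashes (or doubled backslashes), because R reads a single backslash as an escape character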
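If you'd rather not type slashes by hand, file.path() joins folder and file names with forward slashes, which R also accepts on Windows. The runnable sketch below writes to a folder under tempdir() so it works on any machine; in practice you would point it at your own folder, such as Documents/ApptoMap. It also sets row.names = FALSE so write.csv() doesn't add an extra unnamed column of row numbers. The stand-in data frame is made up.

```r
# Build the output path with file.path() (forward slashes, no escaping needed)
out_dir <- file.path(tempdir(), "ApptoMap")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "tweets.csv")

# Made-up stand-in for tweets.df
tweets_example <- data.frame(text      = c("tweet A", "tweet B"),
                             longitude = c(-84.39, NA),
                             latitude  = c(33.75, NA))

# row.names = FALSE stops write.csv() from adding a column of row numbers
write.csv(tweets_example, out_file, row.names = FALSE)
file.exists(out_file)  # TRUE
```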

Make sure you save your R code before running it and moving on to the next step.

Create the map

Now that you have data, you can display it on a map. For this tutorial, we will make a basic map using the R package leaflet, which is built on Leaflet, a popular JavaScript library for making interactive maps. The leaflet package uses the magrittr pipe operator (%>%), which passes the result of one step into the next so you can chain commands in the order they run instead of nesting them inside each other. It might seem strange at first, but it does cut down on the amount of work you have to do when writing code.
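To see what a pipe does, compare a nested call with a piped one. This sketch uses base R's native pipe |> (available from R 4.1.0) as a stand-in, since it needs no extra packages. For simple calls like these, magrittr's %>% behaves the same way: it passes the value on its left as the first argument of the function on its right.

```r
x <- c(1, 4, 9)

nested <- sum(sqrt(x))          # read from the inside out
piped  <- x |> sqrt() |> sum()  # read from left to right

identical(nested, piped)  # TRUE
```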

For the sake of clarity, open a new R script in RStudio, then install and load the leaflet package:

install.packages("leaflet")
 #installs leaflet
library(leaflet)
 #loads leaflet

Now you need a way for Leaflet to access your data. Read your .csv file back into R and store it as mymap:

mymap <- read.csv("C:/Users/YourName/Documents/ApptoMap/tweets.csv", stringsAsFactors = FALSE)

stringsAsFactors = FALSE tells R to keep text columns as character strings instead of converting them into factors. (For information about factors, read the article "stringsAsFactors: An unauthorized biography", by Roger Peng.)
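Here is a small sketch of the difference, using made-up CSV text passed to read.csv() through its text argument. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so setting it explicitly mainly matters on older versions of R.

```r
# Made-up CSV content with one text column and one numeric column
csv_text <- "text,latitude\ncommunity garden day,33.75\nfarmers market haul,33.95\n"

as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factors$text)      # "factor"
class(as_strings$text)      # "character"
class(as_strings$latitude)  # "numeric" (numbers are unaffected either way)
```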

It's time to make your Leaflet map. The addTiles() function adds the default OpenStreetMap base map:

m <- leaflet(mymap) %>% addTiles()
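Leaflet renders popup content as HTML, and the leaflet documentation recommends escaping text (for example with htmltools::htmlEscape()) before using it in popups. If you plan to show tweet text in your popups, here is a dependency-free sketch that shortens long tweets and escapes the HTML special characters &, <, and >. make_popup() and its 80-character limit are hypothetical choices, not part of leaflet.

```r
# Hypothetical helper: shorten tweet text and escape HTML special characters
make_popup <- function(text, max_chars = 80) {
  short <- ifelse(nchar(text) > max_chars,
                  paste0(substr(text, 1, max_chars - 3), "..."),
                  text)
  short <- gsub("&", "&amp;", short, fixed = TRUE)  # escape & first
  short <- gsub("<", "&lt;", short, fixed = TRUE)
  gsub(">", "&gt;", short, fixed = TRUE)
}

# The first element becomes "Fresh tomatoes &lt;3 at the farmers market";
# the second is cut to 77 characters plus "..."
make_popup(c("Fresh tomatoes <3 at the farmers market", strrep("a", 100)))
```

You could then pass popup = make_popup(mymap$text) to addCircles(), assuming your data has the text column that twListToDF() produces.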

Let's add circles to the base map. For lng and lat, enter the names of the columns that contain the longitude and latitude of your tweets, each preceded by ~. Here, ~longitude and ~latitude refer to the longitude and latitude columns in your .csv file:

m %>% addCircles(lng = ~longitude, lat = ~latitude, popup = ~text, weight = 8, radius = 40, color = "#fb3004", stroke = TRUE, fillOpacity = 0.8)
 #popup = ~text shows each tweet's text when you click its circle
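The ~ tells leaflet to treat longitude and latitude as column names in the data you passed to leaflet(), rather than as R variables. The simplified sketch below imitates that lookup in base R with a hypothetical resolve_column() helper; it illustrates the idea and is not leaflet's actual implementation. The coordinates are made up.

```r
# Simplified stand-in for how a one-sided formula is resolved against data:
# f[[2]] is the expression after ~, evaluated with the columns in scope
resolve_column <- function(f, data) {
  eval(f[[2]], envir = data, enclos = environment(f))
}

points <- data.frame(longitude = c(-84.39, -83.38),
                     latitude  = c(33.75, 33.95))
resolve_column(~longitude, points)  # -84.39 -83.38
```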

Run your code. Your map should appear in RStudio's Viewer pane, from which you can also open it in a web browser. Here is a map of the tweets that I collected in the previous section:


Map of tweets by location, Leaflet and OpenStreetMap, CC-BY-SA


Don't be surprised by the small number of tweets on the map: typically only about 1% of tweets are geocoded. I collected a total of 366 tweets, but only 10 (around 3% of the total) were geocoded. If you are having trouble getting geocoded tweets, change your search terms to see if you get a better result.
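To check the share of geocoded tweets in your own collection, you can count the rows that have a latitude value. The sketch assumes a latitude column that is NA for tweets without a location; the example data frame is made up to match the counts above (366 tweets, 10 geocoded). geocoded_share() is a hypothetical helper.

```r
# Hypothetical helper: percentage of rows with a non-missing latitude
geocoded_share <- function(df) {
  100 * mean(!is.na(df$latitude))
}

# Made-up data matching the counts above: 366 tweets, 10 geocoded
counts_example <- data.frame(latitude = c(rep(33.75, 10), rep(NA, 356)))
round(geocoded_share(counts_example), 1)  # 2.7
```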

Wrapping up

For beginners, putting all the pieces together to create a Leaflet map from Twitter data can be overwhelming. This tutorial is based on my experiences doing this task, and I hope it makes the learning process easier for you.

Dorris Scott will present this topic in a workshop, From App to Map: Collecting and Mapping Social Media Data using R, at the We Rise Women in Tech Conference (#WeRiseTech) June 23-24 in Atlanta.


About the author

Dorris Scott is a PhD student in geography at the University of Georgia. Her research emphases are Geographic Information Systems (GIS), geographic data science, visualization, and public health. Her dissertation is on combining traditional and non-traditional data about Veterans Affairs hospitals in a GIS interface to help patients make more informed decisions about their healthcare.