Sr. Data Analyst Andrew Greenberg contributed to this post.
Ah, Twitter – you may find yourself on the platform searching for the latest celebrity gossip, debating the president’s comments, or finding out if your favorite athlete’s ankle injury is career-ending. However, as marketers we can further leverage the user-generated content to build comprehensive maps of how user behaviors interact at a macro level. For example, a group of avid runners may follow @nycmarathon and @bostonmarathon, but perhaps they have a shared penchant for Snickers as well. A Twitter analysis can bring to light some of these less obvious relationships among the users one might target.
The ultimate goal of the force-directed graph analysis is to turn the interests shared by the followers of a targeted handle (or multiple handles, in the case of this analysis) into a map of the audience we’re attempting to target. There are several ways to frame the analysis:
- Using a client’s Twitter handle and their followers’ friends to deep dive interests closely related with the client.
- Using the shared followers of a client and a competitor or interest group to better understand how to expand into new markets or a more targeted subset of interests, or how to better capture a competitor’s audience.
- Using the client and multiple close competitors to better understand the full space of people who are interested in the industry.
Once these graphs are built, the interest clusters created from them can be used as audience seeds in an ad platform’s behavior-targeting interface, or as the starting point for a more robust deep dive through a data management platform (DMP). Some concrete examples of where these seeds can provide business value:
- DMP Deep Dives: These audience seeds can be considered the start of a user persona or an additional user behavior that may have otherwise been overlooked as being important.
- Facebook & Instagram: Teams may select behaviors and interests that closely relate to the clusters found to reach a more targeted or wider reaching audience.
- Twitter: Twitter Ads allows a team to feed their audience cluster followers directly back into the ad platform for precision targeting.
- Pinterest: Selecting interest categories and keywords that closely align with our findings will allow for the building of more targeted audiences or allow a team to expand its reach.
The Twitter analysis performed in this blog post uses the second framing listed above: it examines the relationship between a client and a competitor to better understand their followers’ shared interests. The analysis follows three main steps:
- Data Collection
- Data Cleaning & Construction
- Data Visualization
This blog post will dive into the finer details of these steps and, by the end, will allow a team to replicate a similar analysis with their own data.
Twitter has a fairly robust REST API, but it does have its quirks and limitations. Grabbing small portions of data is quick and easy, but the API becomes a bottleneck when you try to scale to larger quantities. It allows only a few requests every 15 minutes (although some of those requests may return a ton of data!), which severely inhibits nimble, ad hoc data collection at scale. The first task in this analysis is therefore to automate data collection so that it can run for days at a time. Although the language and method chosen to collect the data will vary by implementation, the technology stack used in this analysis (AWS, Python, and the Tweepy library) resulted in a straightforward and easy-to-replicate setup that we recommend standardizing on.
The data-gathering process behind the force-directed graph is as follows:
- Get all the followers of the client and its competitor. We also stipulated that each follower have no more than 10,000 followers of their own, which filters out celebrity and brand/influencer accounts.
- Find the shared followers between our client and competitor handles.
- For each shared follower listed above, create a new query to return the set of all of their friends (on Twitter, a user’s friends are the accounts that user follows, not the accounts that follow them).
- Export this data into a csv file with two columns: column A is the shared follower’s name, and column B is the name of one of that follower’s friends.
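The steps above can be sketched in a few lines of Python once the follower and friend lists are in memory. This is only an illustration under stated assumptions: the function names, the column headers, and all of the sample accounts and counts below are hypothetical, and the follower cap is applied after the intersection rather than before, which gives the same result.

```python
import csv
import io

MAX_FOLLOWERS = 10_000  # the cap described in the first step


def shared_followers(followers_a, followers_b, follower_counts):
    """Followers of both handles whose own follower count is within the cap."""
    shared = set(followers_a) & set(followers_b)
    return sorted(u for u in shared if follower_counts.get(u, 0) <= MAX_FOLLOWERS)


def write_edges(shared, friends_of, out):
    """Write one (shared follower, friend) row per friendship; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["follower", "friend"])
    rows = 0
    for user in shared:
        for friend in sorted(friends_of.get(user, [])):
            writer.writerow([user, friend])
            rows += 1
    return rows


# Hypothetical sample data standing in for the API results.
client_followers = ["ann", "bob", "cat", "megastar"]
competitor_followers = ["bob", "cat", "dan", "megastar"]
counts = {"ann": 120, "bob": 300, "cat": 45, "dan": 80, "megastar": 2_000_000}
friends_of = {
    "bob": ["SNICKERS", "nycmarathon"],
    "cat": ["nycmarathon", "bostonmarathon", "SNICKERS"],
}

shared = shared_followers(client_followers, competitor_followers, counts)
buffer = io.StringIO()  # a real run would open a csv file instead
n_rows = write_edges(shared, friends_of, buffer)
```

Here `megastar` follows both handles but is dropped by the cap, so only `bob` and `cat` contribute rows.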
To create a force-directed graph, we need three columns. Column A represents the starting node on the graph, column B represents the ending node, and column C represents the force pulling the two nodes together. To continue with our @nycmarathon, @bostonmarathon, and @SNICKERS example: if one row of our data has @nycmarathon in column A, @SNICKERS in column B, and 30 in column C, there must be a second row with @SNICKERS in column A, @nycmarathon in column B, and 30 in column C as well. The same symmetry applies to every pair of accounts in the data.
This gives an understanding of the final structure the data should follow before being placed into the force-directed graph algorithm.
To begin reformatting the data, load the csv file created during data collection into R.
Next, left join the data set to itself on column A (the shared follower). This adds a new column, referred to below as column C, that holds a second friend of the same shared follower.
Filter the rows in our newly created data set so column B is not equal to column C; this removes each friend’s pairing with itself.
Finally, aggregate the data across columns B and C to get the count of each unique pairing. Columns B and C are then the two nodes, and the count is the force between them.
Once these four steps have been accomplished, we will have a data set that resembles the one we discussed in our example and is ready to be loaded into our force-directed graph-building function.
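The analysis itself did this reshaping in R. Purely to illustrate the logic, here is an equivalent sketch in Python using only the standard library. The function name and sample rows are hypothetical, and the sketch assumes each (follower, friend) row appears only once in the csv file.

```python
from collections import Counter, defaultdict
from itertools import permutations


def pair_counts(edges):
    """Count, for every ordered pair of friends, how many shared followers follow both.

    edges is an iterable of (shared_follower, friend) rows, i.e. the two-column file.
    """
    friends_by_follower = defaultdict(set)
    for follower, friend in edges:
        friends_by_follower[follower].add(friend)

    counts = Counter()
    for friends in friends_by_follower.values():
        # Every ordered pair of distinct friends of one follower. This matches
        # the self-join on the follower column with rows where B equals C removed.
        counts.update(permutations(sorted(friends), 2))
    return counts


# Hypothetical rows: bob follows two accounts, cat follows three.
edges = [
    ("bob", "nycmarathon"), ("bob", "SNICKERS"),
    ("cat", "nycmarathon"), ("cat", "SNICKERS"), ("cat", "bostonmarathon"),
]
counts = pair_counts(edges)
# Three-column rows (node, node, force) ready for a graph-building function.
rows = sorted((a, b, n) for (a, b), n in counts.items())
```

Because ordered pairs are counted, each pairing shows up in both directions with the same count, which is the symmetric structure described earlier.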
For the final step of this process, the data constructed in the previous step is fed into a graph-building function that generates the finalized chart. The methodology described here, however, is less about creating the visualization than about understanding how the force-directed graph arrives at its visually appealing layout and why the underlying algorithm was selected for this analysis.
Force-directed graph algorithms aim to produce a layout that meets the following criteria:
- Distribute vertices evenly within the frame;
- Minimize edge crossings;
- Keep edge lengths even.
On this basis, a force must be applied to pull each point to its location. Referencing the data created in the previous step, the count of each pairing acts as the force between the pair’s nodes on the graph. Pairs with higher counts experience a stronger pull toward one another.
The layout algorithm used for the force-directed graph comes from the Fruchterman and Reingold paper Graph Drawing by Force-Directed Placement. It was selected over comparable algorithms because it performs well on medium-sized graphs (50 to 500 nodes) while still giving insightful results on graphs with thousands of nodes, and because it doesn’t require any optimization parameters.
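To make the idea concrete, here is a simplified Python sketch in the spirit of the Fruchterman and Reingold approach. It is not the implementation used in this analysis. Every pair of nodes repels with strength k²/d, connected nodes attract with strength d²/k, and a cooling "temperature" caps how far a node may move in each pass. Scaling the attraction by the pairing count is our assumption about how the weights enter, and the function name, frame size, cooling schedule, and sample weights are all hypothetical.

```python
import math
import random


def _offset(p, q):
    """Vector from q to p and its length (floored to avoid dividing by zero)."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx, dy, max(math.hypot(dx, dy), 1e-9)


def fr_layout(nodes, weighted_edges, size=1.0, iterations=50, seed=0):
    """Place nodes in a size x size frame; weighted_edges lists each pair once."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = math.sqrt(size * size / len(nodes))  # ideal spacing between nodes
    temperature = size / 10                  # maximum move per pass
    cooling = temperature / (iterations + 1)

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: strength k^2 / d.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy, d = _offset(pos[v], pos[u])
                push = k * k / d
                disp[v][0] += dx / d * push
                disp[v][1] += dy / d * push
                disp[u][0] -= dx / d * push
                disp[u][1] -= dy / d * push
        # Attraction along edges: strength d^2 / k, scaled by the weight (assumption).
        for v, u, weight in weighted_edges:
            dx, dy, d = _offset(pos[v], pos[u])
            pull = weight * d * d / k
            disp[v][0] -= dx / d * pull
            disp[v][1] -= dy / d * pull
            disp[u][0] += dx / d * pull
            disp[u][1] += dy / d * pull
        # Move each node, capped by the temperature and kept inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / length * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / length * step))
        temperature -= cooling
    return pos


# Hypothetical weights; "other" has no edges and is only pushed away.
nodes = ["nycmarathon", "bostonmarathon", "SNICKERS", "other"]
edges = [("nycmarathon", "bostonmarathon", 30), ("nycmarathon", "SNICKERS", 5)]
layout = fr_layout(nodes, edges)
```

In a layout like this, heavily weighted pairs should generally settle closer together than unconnected ones. In practice a library implementation is the better choice, since this sketch compares every pair of nodes on every pass and slows down quickly as the graph grows.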
To build the visualization, the programming language R was used along with the libraries igraph, ggplot2, and ggraph. Following our method listed above, the data was cleaned and fed into the ggraph function to produce the final output.
The above visual is a completed force-directed graph. The two center nodes are the client and its competitor, both of which have been anonymized. The closer a node sits to the edge of the graph, the less related it is to the client and competitor. Two clear clusters have formed from this analysis: on the left is one focused on running and shoe companies; on the right is one focused on food and entertainment. To show the business value of the analysis in practice, we’ll continue with the @nycmarathon example using the above graph (it’s not the client referenced, but let’s assume it is).
With this data, we’d recommend the following actions on the following platforms:
- DMP: Look at any brands in the two clusters, or dive into the more generalized behaviors the clusters represent, as a starting point for developing user personas for @nycmarathon’s ad campaigns.
- Facebook & Instagram advertising: Target users who follow shoe and running-related Instagram or Facebook groups; also target users who have related interests and behaviors.
- Twitter advertising: For all the handles in these two clusters, mine their followers and feed them into the Twitter Ads targeting platform to generate a robust audience of interested users.
- Pinterest advertising: Target running, shoe, or food-related keywords and interests to develop more targeted audiences or expand a campaign’s reach.
Conclusions & Next Steps
While this was a comprehensive analysis of the followers shared by a pair of Twitter handles, there are still plenty of potential next steps that can be taken with social media analysis to better understand the client and who is interested in them. With that said, the following are 3Q’s next steps for advancing this analysis:
- Compare the client against multiple competitors on Twitter;
- Analyze potential keywords found in the paired users’ tweets using sentiment analysis to create keyword clusters from tweets identified to have positive sentiments;
- Expand to other social media platforms — YouTube, Reddit, and Pinterest all have the potential to provide additional insights;
- Experiment with different force-directed graph algorithms and explore their merits.