Trump-Russia Network Construction: A Guide

This guide documents the data collection and code used to construct a network of relationships between Trump and Russians from 1973 to 2019. Please see the main page for more information and to access the replication data and code: https://devincr.github.io/trumpnetwork/.

If you find this guide useful or use the data or analysis in your work, please cite it as:

Case-Ruchala, D. (2020). Trump-Russia Network Data & Analysis. Retrieved from https://devincr.github.io/trumpnetwork.

Data collection & set-up

The data used to construct the network were hand-coded from Seth Abramson’s (2018) book Proof of Collusion. Our data collection method entailed coding every interaction (meeting, conversation, email, payment, etc.) between individuals or entities (political institutions, corporations, etc.) as presented in the book. These interactions were coded as an “edge list”, i.e. a table that includes columns for who initiates the connection (“from”), who receives the connection (“to”), and a brief description of that connection as described in the book. An “edge” refers to the “connection” or “link” between two nodes.
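As a minimal illustration of this structure, the sketch below builds a three-row edge list in R. The names and descriptions are placeholders invented for the example, not rows from the replication data.

```r
# Toy edge list with made-up placeholder rows (not taken from the replication data).
edges_toy <- data.frame(
  from = c("Person A", "Person A", "Company B"),
  to = c("Company B", "Person C", "Person C"),
  description = c("A founds B", "A emails C", "B pays C"),
  stringsAsFactors = FALSE
)

# Each row is one edge; the nodes are the unique names across both columns.
nodes_toy <- sort(unique(c(edges_toy$from, edges_toy$to)))

nrow(edges_toy)  # number of edges
nodes_toy        # names of the nodes
```

Here three edges connect three distinct nodes, which is the same information the real edge list and node list carry at a larger scale.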

To standardize the formatting of the descriptions, our original spreadsheet had a separate set of columns for each mention of a connection: the page number, the year associated with the interaction, and a description of the interaction (i.e. ‘date1-page’, ‘date1-year’, ‘date1-description’, ‘date2-page’,… and so on). These columns were then consolidated into the single “description” column (code for this not shown due to length).
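Because the consolidation code is not shown, the following is only a sketch of how such columns could be collapsed into one description string; it is not the code used for the replication data. The toy rows, the helper `format_mention`, and the "(year, p. page)" layout are assumptions made for illustration. The capitalized "FROM >> TO:" prefix imitates the descriptions visible in the edge list previews in this guide.

```r
# Hypothetical wide-format rows; the column names follow the pattern described
# above, but the values are invented for illustration.
wide <- data.frame(
  from = c("Person A", "Person B"),
  to = c("Person B", "Person C"),
  `date1-page` = c(12, 40),
  `date1-year` = c(1987, 2005),
  `date1-description` = c("A meets B", "B emails C"),
  `date2-page` = c(55, NA),
  `date2-year` = c(1996, NA),
  `date2-description` = c("A pays B", NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: format the k-th mention of every edge, NA where absent.
format_mention <- function(df, k) {
  yr <- df[[paste0("date", k, "-year")]]
  pg <- df[[paste0("date", k, "-page")]]
  tx <- df[[paste0("date", k, "-description")]]
  ifelse(is.na(tx), NA_character_, paste0("(", yr, ", p. ", pg, ") ", tx))
}

# Matrix with one row per edge and one column per mention.
mentions <- sapply(1:2, function(k) format_mention(wide, k))

# Join the non-missing mentions of each edge and add the "FROM >> TO:" prefix.
joined <- apply(mentions, 1, function(m) paste(m[!is.na(m)], collapse = "; "))
wide$description <- paste0(toupper(wide$from), " >> ", toupper(wide$to), ": ", joined)
wide$description
```

Keeping the mentions in separate columns until this step makes it easy to apply one consistent format to every description.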

The edge list used for the dynamic network is stored in the “links-dyn.csv” spreadsheet.

library(rio) # first install rio for the import function using install.packages("rio")
library(dplyr) # install dplyr for data sorting below
linksdyn <- import("links-dyn.csv")

We also constructed a node list: a list of the names of every person or entity involved in at least one connection in the edge list, along with additional information about each one, such as their country affiliation.

The node list used for the dynamic network is stored in the “nodes-dyn.csv” spreadsheet. It includes additional columns for plotting purposes.

nodesdyn <- import("nodes-dyn.csv")
head(as_tibble(nodesdyn)) # as_tibble() replaces the deprecated tbl_df()

## # A tibble: 6 x 6
##   id    name          country.type country.label country.name  labels      
##   <chr> <chr>                <int> <chr>         <chr>         <chr>       
## 1 s001  Yuri Durbinin            2 russia        russia        ""          
## 2 s002  Donald Trump             3 united states united states Donald Trump
## 3 s003  Vladimir Put…            2 russia        russia        Vladimir Pu…
## 4 s004  Oxana Fedoro…            2 russia        russia        ""          
## 5 s005  Felix Sater              2 russia        russia        Felix Sater 
## 6 s006  Ivanka Trump             3 united states united states ""

The edge list provided for replication identifies nodes only by their ‘id’. Matching each ‘id’ against the node list recovers the names and gives a preview of how the original edge list was constructed:

# match names to id using nodes spreadsheet
linkspreview <- linksdyn
linkspreview$from.name <- nodesdyn$name[match(linkspreview$from, nodesdyn$id)]
linkspreview$to.name <- nodesdyn$name[match(linkspreview$to, nodesdyn$id)]
linkspreview <- subset(linkspreview, select = c(from.name, to.name, description))

# preview original edge list
head(as_tibble(linkspreview))

## # A tibble: 6 x 3
##   from.name    to.name              description                            
##   <chr>        <chr>                <chr>                                  
## 1 Donald Trump Trump Organization   DONALD TRUMP >> TRUMP ORGANIZATION: Fr…
## 2 Vladimir Pu… Russian Intelligence VLADIMIR PUTIN >> RUSSIAN INTELLIGENCE…
## 3 Felix Sater  Michael Cohen        "FELIX SATER >> MICHAEL COHEN: Felix S…
## 4 Donald Trump Donald Trump Jr.     DONALD TRUMP >> DONALD TRUMP JR.: Dona…
## 5 Tevfik Arif  Soviet Ministry Of … TEVFIK ARIF >> SOVIET MINISTRY OF COMM…
## 6 Tamir Sapir  Alex Sapir           TAMIR SAPIR >> ALEX SAPIR: Alex Sapir …

Important: Before plotting, check the edge list and node list and fix any of the following issues:
* Duplicate edges
* Duplicate nodes
* Nodes in the node list that are missing from the edge list
* Edges with a node that is missing from the node list
* Misspellings that may have generated one of the above problems
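These checks can be sketched in base R as follows. The toy data and object names (`nodes_chk`, `links_chk`, and so on) are invented for the example and contain deliberate problems; the same expressions could be pointed at the real edge and node lists, assuming they have ‘from’, ‘to’ and ‘id’ columns as shown in the previews above.

```r
# Toy data with deliberate problems (invented for illustration).
nodes_chk <- data.frame(id = c("s001", "s002", "s003", "s003", "s004"),
                        stringsAsFactors = FALSE)
links_chk <- data.frame(from = c("s001", "s001", "s002"),
                        to = c("s002", "s002", "s009"),
                        stringsAsFactors = FALSE)

# Edges that repeat an earlier from/to pair.
dup_edges <- links_chk[duplicated(links_chk[, c("from", "to")]), ]

# Node ids that appear more than once in the node list.
dup_nodes <- nodes_chk$id[duplicated(nodes_chk$id)]

# Every id that takes part in at least one edge.
edge_ids <- unique(c(links_chk$from, links_chk$to))

# Nodes never used in an edge, and edge endpoints absent from the node list.
unused_nodes <- setdiff(nodes_chk$id, edge_ids)
unknown_nodes <- setdiff(edge_ids, nodes_chk$id)

list(dup_edges = dup_edges, dup_nodes = dup_nodes,
     unused_nodes = unused_nodes, unknown_nodes = unknown_nodes)
```

An id that shows up in `unknown_nodes` is often a misspelling of an id in `unused_nodes`, so comparing the two lists is a quick way to catch typos.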

Once the lists are cleaned, the number of rows in the node list and in the edge list gives the total number of nodes and connections, respectively:

length(nodesdyn$name) # number of nodes

## [1] 309

length(linksdyn$from) # number of edges/connections

## [1] 718

Dynamic network plot

The dynamic network plot was constructed using the ‘render.d3movie’ function in the ndtv package, as explained in this (very helpful!) tutorial on network visualization: https://kateto.net/network-visualization. The network package is also needed to convert the edge list into a network object before plotting.

library(ndtv)
library(network)

The plot uses the edge list and node list with the full set of columns stored in the “links-dyn.csv” and “nodes-dyn.csv” spreadsheets, already imported as ‘linksdyn’ and ‘nodesdyn’. Here is a preview again for reference:

head(as_tibble(nodesdyn))

## # A tibble: 6 x 6
##   id    name          country.type country.label country.name  labels      
##   <chr> <chr>                <int> <chr>         <chr>         <chr>       
## 1 s001  Yuri Durbinin            2 russia        russia        ""          
## 2 s002  Donald Trump             3 united states united states Donald Trump
## 3 s003  Vladimir Put…            2 russia        russia        Vladimir Pu…
## 4 s004  Oxana Fedoro…            2 russia        russia        ""          
## 5 s005  Felix Sater              2 russia        russia        Felix Sater 
## 6 s006  Ivanka Trump             3 united states united states ""

head(as_tibble(linksdyn))

## # A tibble: 6 x 6
##   from  to    description                                width  year period
##   <chr> <chr> <chr>                                      <int> <int>  <int>
## 1 s002  s017  DONALD TRUMP >> TRUMP ORGANIZATION: From …     6  1973      1
## 2 s003  s056  VLADIMIR PUTIN >> RUSSIAN INTELLIGENCE: V…     6  1975      2
## 3 s005  s029  "FELIX SATER >> MICHAEL COHEN: Felix Sate…     6  1975      2
## 4 s002  s009  DONALD TRUMP >> DONALD TRUMP JR.: Donald …     6  1977      3
## 5 s018  s057  TEVFIK ARIF >> SOVIET MINISTRY OF COMMERC…     6  1980      4
## 6 s019  s021  TAMIR SAPIR >> ALEX SAPIR: Alex Sapir is …     6  1980      4

Create network object:

Before plotting, the edge and node lists must be used to create a network object:

# create network object
net <- network(linksdyn, vertex.attr = nodesdyn, matrix.type = "edgelist",
               loops = FALSE, multiple = FALSE, ignore.eval = FALSE)

Add transparent colors

Creating transparent colors and adding them as vertex (node) attributes to the network object makes visualization easier, because node color transparency cannot be set directly in the plotting function.

# add (transparent) colors for countries; alpha = 175 out of 255
mygray <- rgb(190, 190, 190, alpha = 175, maxColorValue = 255)  # "#BEBEBEAF"
mycoral <- rgb(240, 128, 128, alpha = 175, maxColorValue = 255) # "#F08080AF"
myblue <- rgb(173, 216, 230, alpha = 175, maxColorValue = 255)  # "#ADD8E6AF"
# look up each node's color by its integer country.type code (1 = gray, 2 = coral, 3 = blue)
net %v% "col" <- c(mygray, mycoral, myblue)[net %v% "country.type"]
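The color lookup can be tried on its own, without a network object. In this small sketch the vector `type_codes` is an invented stand-in for the ‘country.type’ attribute, and `palette3` is a hypothetical name for the three-color vector.

```r
# Three semi-transparent colors: RGB values on a 0-255 scale, alpha = 175.
palette3 <- c(
  rgb(190, 190, 190, alpha = 175, maxColorValue = 255),  # gray
  rgb(240, 128, 128, alpha = 175, maxColorValue = 255),  # light coral
  rgb(173, 216, 230, alpha = 175, maxColorValue = 255)   # light blue
)

# Invented integer codes standing in for the 'country.type' attribute.
type_codes <- c(2L, 3L, 2L, 1L)

# Indexing the palette by the codes yields one color per node.
node_cols <- palette3[type_codes]
node_cols
# "#F08080AF" "#ADD8E6AF" "#F08080AF" "#BEBEBEAF"
```

The last two hex digits (AF, i.e. 175) encode the transparency, which is why the colors can be stored as plain strings in a vertex attribute.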

Create edge spells & dynamic network object

Now the network object needs to be converted into a dynamic network object by creating “edge spells” and “node spells”.

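An important point here: the edge spells created below take their years from ‘linksdyn’ and their node pairs from the network object, matched by row position. The ‘linksdyn’ edge list therefore needs to be ordered by period before the network object is created, so that the plot associates the nodes, edges, and edge descriptions correctly. If the sort below changes the row order, recreate the network object from the sorted edge list.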
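One way to test whether an edge list is already in the required order is sketched here. The helper `is_ordered` and the toy rows are hypothetical; the sort keys (period, from, to) are the ones used in this guide.

```r
# Toy edge list that is deliberately out of order within period 2.
links_toy <- data.frame(from = c("s002", "s005", "s003"),
                        to = c("s017", "s029", "s056"),
                        period = c(1L, 2L, 2L),
                        stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if sorting by (period, from, to) would leave the
# rows exactly where they are.
is_ordered <- function(d) {
  identical(order(d$period, d$from, d$to), seq_len(nrow(d)))
}

is_ordered(links_toy)     # FALSE: s005 comes before s003 within period 2

links_sorted <- links_toy[order(links_toy$period, links_toy$from, links_toy$to), ]
is_ordered(links_sorted)  # TRUE
```

If the check returns FALSE for the real edge list, sort it first and only then build the network object, so both share the same row order.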

# make sure edge list is properly ordered
linksdyn <- linksdyn[with(linksdyn, order(period, from, to)),]
head(as_tibble(linksdyn))

## # A tibble: 6 x 6
##   from  to    description                                width  year period
##   <chr> <chr> <chr>                                      <int> <int>  <int>
## 1 s002  s017  DONALD TRUMP >> TRUMP ORGANIZATION: From …     6  1973      1
## 2 s003  s056  VLADIMIR PUTIN >> RUSSIAN INTELLIGENCE: V…     6  1975      2
## 3 s005  s029  "FELIX SATER >> MICHAEL COHEN: Felix Sate…     6  1975      2
## 4 s002  s009  DONALD TRUMP >> DONALD TRUMP JR.: Donald …     6  1977      3
## 5 s018  s057  TEVFIK ARIF >> SOVIET MINISTRY OF COMMERC…     6  1980      4
## 6 s019  s021  TAMIR SAPIR >> ALEX SAPIR: Alex Sapir is …     6  1980      4

# set the time window in years: from the first observed year to one year past the last
minterm <- min(linksdyn$year)
maxterm <- max(linksdyn$year) + 1

# create node spells:
# sets every node in the network as active from minterm to maxterm
vs <- data.frame(onset = minterm, terminus = maxterm, vertex.id = seq_len(nrow(nodesdyn)))
head(vs)

##   onset terminus vertex.id
## 1  1973     2019         1
## 2  1973     2019         2
## 3  1973     2019         3
## 4  1973     2019         4
## 5  1973     2019         5
## 6  1973     2019         6

# create edge spells:
# edges appear one by one; each is active from its onset year until the end of the time window
# (the third and fourth columns hold the two endpoint vertex ids of each edge, in the row order of 'net')
es <- data.frame(onset = linksdyn$year, terminus = maxterm,
                 head = as.matrix(net, matrix.type = "edgelist")[,1],
                 tail = as.matrix(net, matrix.type = "edgelist")[,2])
head(es)

##   onset terminus head tail
## 1  1973     2019    2   17
## 2  1975     2019    3   56
## 3  1975     2019    5   29
## 4  1977     2019    2    9
## 5  1980     2019   18   57
## 6  1980     2019   19   21

# create dynamic network
net.dyn <- networkDynamic(base.net=net, edge.spells=es, vertex.spells=vs)
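To see what the edge spells imply for the animation, the number of edges active in a given year can be counted directly from a spell table. This is a sketch on invented toy spells, with a hypothetical helper `active_at`; it assumes each spell is a half-open interval [onset, terminus), which is how the networkDynamic documentation describes spells.

```r
# Toy edge spells (invented): four edges that switch on in different years
# and all stay active until the end of the window.
es_toy <- data.frame(onset = c(1973, 1975, 1975, 1980),
                     terminus = 1981)

# Hypothetical helper: count the spells that contain time t, treating each
# spell as the half-open interval [onset, terminus).
active_at <- function(spells, t) {
  sum(spells$onset <= t & t < spells$terminus)
}

active_counts <- sapply(c(1973, 1975, 1980, 1981),
                        function(t) active_at(es_toy, t))
active_counts
# 1 3 4 0
```

Because every edge shares the same terminus, the count only grows during the window, which matches a plot where connections accumulate over time.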

Plot dynamic network

Finally, the dynamic network can be used to create the plot over time.

# compute animation and plot
compute.animation(net.dyn, animation.mode = "kamadakawai",
                  slice.par=list(start= minterm, end= maxterm, interval=1,
                                 aggregate.dur=1, rule='all'))

# renders the animation as an htmlwidget; with output.mode = 'HTML' it is instead
# saved to the file given in 'filename' - open that file to view the plot
render.d3movie(net.dyn, usearrows = FALSE,
               displaylabels = TRUE, label=net %v% "labels", label.cex = .7,
               bg="#ffffff", vertex.border="#333333",
               vertex.cex = log(degree(net)+1)/2.5,
               vertex.col = net.dyn %v% "col",
               edge.lwd = (net.dyn %e% "width")/3,
               edge.col = '#55555599',
               vertex.tooltip = paste("<b>Name:</b>", (net.dyn %v% "name") , "<br>",
                                      "<b>National affiliation:</b>", (net.dyn %v% "country.name")),
               edge.tooltip = paste("<b>Connection:</b>", (net.dyn %e% "description")),
               launchBrowser=TRUE, filename="replication-dynamic.html",
               render.par=list(tween.frames = 30, show.time = FALSE),
               d3.options = list(animateOnLoad = FALSE),
               plot.par=list(mar=c(0,0,0,0)),
               output.mode = 'htmlWidget') # change to 'HTML' for .html file output

Static network plot

The static network plots were created using the visNetwork package. This package requires a slightly different data frame arrangement, so the nodes and edges for this plot are stored in files separate from those used for the dynamic network. The nodes, edges, and corresponding information are the same as in the dynamic network.

Full static plot

### install/load packages:
library(visNetwork)

### import data
nodesstat <- import("nodes-stat.csv")
linksstat <- import("links-stat.csv")

# sort names for drop down menu
nodesstat <- arrange(nodesstat, id)

### plot
set.seed(1)
plot <- visNetwork(nodesstat, linksstat, height = "800px", width = "800px") %>%
  visLayout(randomSeed = 12) %>%
  visNodes(size = 8, font = list(size = 30, align = "left"), color = list(hover = "yellow"),
           scaling = list(min=10, max=60)) %>%
  visEdges(arrows =list(to = list(enabled = TRUE, scaleFactor = .5))) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visIgraphLayout(layout = "layout_with_lgl") %>%
  visGroups(groupname = "russia", color = "lightcoral", shape = "dot") %>%
  visGroups(groupname = "united states", color = "lightblue", shape = "dot") %>%
  visGroups(groupname ="other countries", color = "grey", shape = "dot") %>%
  visLegend(addNodes = list(list(label = "person", shape = "dot", color = "lightgrey"), 
                            list(label = "corporation", shape = "square", color = "lightgrey"),
                            list(label = "gov institution", shape = "triangle", color = "lightgrey")),
            width = 0.1, position = "left")

### optional: save output as html
# visSave(plot, file = "replication-static.html")
plot

Individual year static plots

Static plots for individual years can be created by subsetting the dataset to only those edges and nodes that exist as of a particular time point (in this case, 2005). This requires a ‘year’ column, which is stored separately in the “links-stat+years.csv” spreadsheet.

edges.plot <- import("links-stat+years.csv")
nodes.plot <- nodesstat

## subset edge list and node list for nodes/connections existing up through 2005
edges.plot.2005 <- subset(edges.plot, year <= 2005, select = c(from,to,title,width,smooth,length,arrows))
edges.2005.names <- unique(c(edges.plot.2005$from, edges.plot.2005$to))
nodes.plot.2005 <- subset(nodes.plot, id %in% edges.2005.names)

## plot
set.seed(1)
plot.2005 <- visNetwork(nodes.plot.2005, edges.plot.2005, height = "800px", width = "800px") %>%
  visLayout(randomSeed = 12) %>%
  visNodes(size = 8, font = list(size = 30, align = "left"), color = list(hover = "yellow"),
           scaling = list(min=10, max=60)) %>%
  visEdges(arrows =list(to = list(enabled = TRUE, scaleFactor = .5))) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visGroups(groupname = "russia", color = "lightcoral", shape = "dot") %>%
  visGroups(groupname = "united states", color = "lightblue", shape = "dot") %>%
  visGroups(groupname ="other countries", color = "grey", shape = "dot") %>%
  visLegend(addNodes = list(list(label = "person", shape = "dot", color = "lightgrey"), 
                            list(label = "corporation", shape = "square", color = "lightgrey"),
                            list(label = "gov institution", shape = "triangle", color = "lightgrey")),
            width = 0.1, position = "left")

### optional: save output as html
#visSave(plot.2005, file = "stat-plot-2005.html")

plot.2005
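The 2005 subsetting step above can be generalized to any cutoff year. The helper below, `subset_to_year`, is a hypothetical sketch and not part of the replication code; it assumes an edge table with ‘from’, ‘to’ and ‘year’ columns and a node table with an ‘id’ column, and is shown on invented toy data.

```r
# Hypothetical helper: keep the edges up to and including a cutoff year,
# plus the nodes that take part in at least one of those edges.
subset_to_year <- function(edges, nodes, cutoff) {
  e <- edges[edges$year <= cutoff, , drop = FALSE]
  n <- nodes[nodes$id %in% unique(c(e$from, e$to)), , drop = FALSE]
  list(edges = e, nodes = n)
}

# Toy data (invented for illustration).
edges_demo <- data.frame(from = c("a", "b", "c"),
                         to = c("b", "c", "d"),
                         year = c(1990, 2005, 2016),
                         stringsAsFactors = FALSE)
nodes_demo <- data.frame(id = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

net_2005 <- subset_to_year(edges_demo, nodes_demo, 2005)
net_2005$edges  # the 1990 and 2005 edges
net_2005$nodes  # nodes a, b and c; d first appears in 2016
```

The two elements of the returned list could then be passed to visNetwork in place of the 2005 node and edge tables, after selecting whichever edge columns the plot needs.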