This guide documents the data collection and code used to construct a network of relationships between Trump and Russians from 1973 to 2019. Please see the main page for more information and to access the replication data and code: https://devincr.github.io/trumpnetwork/.
If you find this guide useful or use the data or analysis in your work, please cite it as:
The data used to construct the network were hand-coded using Seth Abramson’s (2018) book Proof of Collusion. Our data collection method entailed coding every interaction (meeting, conversation, email, payment, etc.) between individuals or entities (political institutions, corporations, etc.) as presented in the book. These interactions were coded as an “edge list”, i.e. a matrix that includes columns for who initiates the connection (“from”), who receives the connection (“to”), and a brief description of that connection as described in the book. An “edge” refers to the “connection” or “link” between nodes.
To make standardized formatting of the descriptions easier, in our original spreadsheet we made separate columns for each time the connection was mentioned that included the page number, year associated with the interaction, and the description of the interaction (i.e. ‘date1-page’, ‘date1-year’, ‘date1-description’, ‘date2-page’,… and so on). These columns were consolidated into the “description” column (code for this not shown due to length).
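The consolidation code is not reproduced in this guide. Purely as an illustration, the sketch below shows one way such per-mention columns could be collapsed into a single description string in base R. The toy data, the helper name `consolidate_mentions`, and the output format are all assumptions made for this example; they are not the authors' code or their exact description format.

```r
# Toy spreadsheet with two mention slots per edge (hypothetical values).
raw <- data.frame(
  from = c("A", "B"),
  to   = c("B", "C"),
  `date1-page` = c(10, 42),
  `date1-year` = c(1987, 2013),
  `date1-description` = c("A meets B.", "B emails C."),
  `date2-page` = c(55, NA),
  `date2-year` = c(1996, NA),
  `date2-description` = c("A pays B.", NA),
  check.names = FALSE,        # keep the hyphenated column names as written
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste every non-missing mention of one edge into one string.
consolidate_mentions <- function(row, n_slots = 2) {
  parts <- character(0)
  for (i in seq_len(n_slots)) {
    desc <- row[[paste0("date", i, "-description")]]
    if (!is.na(desc)) {
      parts <- c(parts, sprintf("%s (%s, p. %s)", desc,
                                row[[paste0("date", i, "-year")]],
                                row[[paste0("date", i, "-page")]]))
    }
  }
  paste0(toupper(row[["from"]]), " >> ", toupper(row[["to"]]), ": ",
         paste(parts, collapse = " "))
}

# Apply the helper row by row to build the consolidated column.
raw$description <- vapply(seq_len(nrow(raw)),
                          function(i) consolidate_mentions(raw[i, ]),
                          character(1))
```

Empty mention slots are skipped, so edges mentioned only once produce a shorter description than edges mentioned several times.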
The edge list used for the dynamic network is stored in the “links-dyn.csv” spreadsheet.
library(rio) # provides import(); install first with install.packages("rio")
library(dplyr) # used for the tibble previews below
linksdyn <- import("links-dyn.csv")
We also constructed a node list: the name of each person or entity involved in at least one connection in the edge list, along with additional information such as country affiliation.
The node list used for the dynamic network is stored in the “nodes-dyn.csv” spreadsheet. It includes additional columns for plotting purposes.
nodesdyn <- import("nodes-dyn.csv")
head(as_tibble(nodesdyn)) # as_tibble() replaces the deprecated tbl_df()
## # A tibble: 6 x 6
## id name country.type country.label country.name labels
## <chr> <chr> <int> <chr> <chr> <chr>
## 1 s001 Yuri Durbinin 2 russia russia ""
## 2 s002 Donald Trump 3 united states united states Donald Trump
## 3 s003 Vladimir Put… 2 russia russia Vladimir Pu…
## 4 s004 Oxana Fedoro… 2 russia russia ""
## 5 s005 Felix Sater 2 russia russia Felix Sater
## 6 s006 Ivanka Trump 3 united states united states ""
The edge list included for replication was stored using only the node ‘id’. Matching the node ‘id’ between the two spreadsheets shows a preview of how the original edgelist was constructed:
# match names to id using nodes spreadsheet
linkspreview <- linksdyn
linkspreview$from.name <- nodesdyn$name[match(linkspreview$from, nodesdyn$id)]
linkspreview$to.name <- nodesdyn$name[match(linkspreview$to, nodesdyn$id)]
linkspreview <- subset(linkspreview, select = c(from.name, to.name, description))
# preview original edge list
head(as_tibble(linkspreview)) # as_tibble() replaces the deprecated tbl_df()
## # A tibble: 6 x 3
## from.name to.name description
## <chr> <chr> <chr>
## 1 Donald Trump Trump Organization DONALD TRUMP >> TRUMP ORGANIZATION: Fr…
## 2 Vladimir Pu… Russian Intelligence VLADIMIR PUTIN >> RUSSIAN INTELLIGENCE…
## 3 Felix Sater Michael Cohen "FELIX SATER >> MICHAEL COHEN: Felix S…
## 4 Donald Trump Donald Trump Jr. DONALD TRUMP >> DONALD TRUMP JR.: Dona…
## 5 Tevfik Arif Soviet Ministry Of … TEVFIK ARIF >> SOVIET MINISTRY OF COMM…
## 6 Tamir Sapir Alex Sapir TAMIR SAPIR >> ALEX SAPIR: Alex Sapir …
Important: Before plotting, the edge list and node list must be cleaned so that they meet the following conditions:
* No duplicate edges
* No duplicate nodes
* No nodes in the node list that are missing from the edge list
* No edges with a node that is missing from the node list
* No misspellings that may have generated one of the above problems
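The checks in this list can be sketched in base R as follows. This is a minimal illustration on toy data, not the cleaning code used for the replication files. The data frames `nodes` and `links` are hypothetical, and treating a repeated from/to pair as a duplicate edge is an assumption.

```r
# Toy node and edge lists with one deliberate problem of each kind (hypothetical data).
nodes <- data.frame(id = c("s001", "s002", "s003", "s003", "s004"),
                    stringsAsFactors = FALSE)
links <- data.frame(from = c("s001", "s001", "s002"),
                    to   = c("s002", "s002", "s009"),
                    stringsAsFactors = FALSE)

# 1. duplicate edges (the same from/to pair appearing more than once)
dup_edges <- links[duplicated(links[, c("from", "to")]), ]

# 2. duplicate nodes
dup_nodes <- nodes$id[duplicated(nodes$id)]

# 3. nodes that never appear in the edge list
edge_ids <- unique(c(links$from, links$to))
unused_nodes <- setdiff(nodes$id, edge_ids)

# 4. edge endpoints that are missing from the node list
unknown_ids <- setdiff(edge_ids, nodes$id)
```

A misspelled name or id typically surfaces in checks 3 and 4 at the same time: the misspelled version shows up as an unknown endpoint, and the correctly spelled node appears unused.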
Once cleaned, the numbers of rows in the node list and edge list give the total numbers of nodes and connections, respectively:
length(nodesdyn$name) # number of nodes
## [1] 309
length(linksdyn$from) # number of edges/connections
## [1] 718
The dynamic network plot was constructed using the ‘render.d3movie’ function in the ndtv package, as explained in this (very helpful!) tutorial on network visualization: https://kateto.net/network-visualization. The network package is also needed to convert the edge list into a network object before plotting.
library(ndtv)
library(network)
The plot uses the edge list and node list with the full set of columns stored in the “links-dyn.csv” and “nodes-dyn.csv” spreadsheets, already imported as ‘linksdyn’ and ‘nodesdyn’. Here is a preview again for reference:
head(as_tibble(nodesdyn))
## # A tibble: 6 x 6
## id name country.type country.label country.name labels
## <chr> <chr> <int> <chr> <chr> <chr>
## 1 s001 Yuri Durbinin 2 russia russia ""
## 2 s002 Donald Trump 3 united states united states Donald Trump
## 3 s003 Vladimir Put… 2 russia russia Vladimir Pu…
## 4 s004 Oxana Fedoro… 2 russia russia ""
## 5 s005 Felix Sater 2 russia russia Felix Sater
## 6 s006 Ivanka Trump 3 united states united states ""
head(as_tibble(linksdyn))
## # A tibble: 6 x 6
## from to description width year period
## <chr> <chr> <chr> <int> <int> <int>
## 1 s002 s017 DONALD TRUMP >> TRUMP ORGANIZATION: From … 6 1973 1
## 2 s003 s056 VLADIMIR PUTIN >> RUSSIAN INTELLIGENCE: V… 6 1975 2
## 3 s005 s029 "FELIX SATER >> MICHAEL COHEN: Felix Sate… 6 1975 2
## 4 s002 s009 DONALD TRUMP >> DONALD TRUMP JR.: Donald … 6 1977 3
## 5 s018 s057 TEVFIK ARIF >> SOVIET MINISTRY OF COMMERC… 6 1980 4
## 6 s019 s021 TAMIR SAPIR >> ALEX SAPIR: Alex Sapir is … 6 1980 4
Before plotting, the edge and node lists must be used to create a network object:
# create network object
net <- network(linksdyn, vertex.attr = nodesdyn, matrix.type = "edgelist",
               loops = FALSE, multiple = FALSE, ignore.eval = FALSE)
Creating transparent colors and adding them as vertex (node) attributes to the network object makes visualization easier, because node color transparency cannot be set directly in the plotting function.
# add (transparent) colors for countries; alpha = 175 out of 255 is "AF" in hex
mygray <- rgb(190, 190, 190, maxColorValue = 255, alpha = 175, names = "mygray")
mycoral <- rgb(240, 128, 128, maxColorValue = 255, alpha = 175, names = "mycoral")
myblue <- rgb(173, 216, 230, maxColorValue = 255, alpha = 175, names = "myblue")
# give each node the color selected by its country.type code (1 = gray, 2 = coral, 3 = blue)
net %v% "col" <- unname(c(mygray, mycoral, myblue))[net %v% "country.type"]
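To make the color lookup concrete, here is a small base R sketch of how an integer code selects a semi-transparent color by position. The `country_type` vector is made-up example data, and the variable names are hypothetical.

```r
# Build the three semi-transparent colors (alpha 175 of 255 is hex "AF").
palette_alpha <- c(
  rgb(190, 190, 190, alpha = 175, maxColorValue = 255),  # gray
  rgb(240, 128, 128, alpha = 175, maxColorValue = 255),  # light coral
  rgb(173, 216, 230, alpha = 175, maxColorValue = 255)   # light blue
)

# Hypothetical country.type codes for five nodes; each code picks a color by position.
country_type <- c(2, 3, 2, 2, 3)
node_col <- palette_alpha[country_type]
```

Because the lookup is positional, the order of the colors in the vector must match the meaning of the integer codes in the node list.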
Now the network object needs to be converted into a dynamic network object by creating “edge spells” and “node spells”.
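An important point here: the ‘linksdyn’ edge list must already be ordered by period when the network object is created, so that the plot associates the nodes, edges, and edge descriptions correctly. The edge spells created below pair each row of ‘linksdyn’ with an edge in ‘net’ by position, so if the sort below changes the row order, ‘net’ should be re-created from the sorted list.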
# make sure edge list is properly ordered
linksdyn <- linksdyn[with(linksdyn, order(period, from, to)),]
head(as_tibble(linksdyn))
## # A tibble: 6 x 6
## from to description width year period
## <chr> <chr> <chr> <int> <int> <int>
## 1 s002 s017 DONALD TRUMP >> TRUMP ORGANIZATION: From … 6 1973 1
## 2 s003 s056 VLADIMIR PUTIN >> RUSSIAN INTELLIGENCE: V… 6 1975 2
## 3 s005 s029 "FELIX SATER >> MICHAEL COHEN: Felix Sate… 6 1975 2
## 4 s002 s009 DONALD TRUMP >> DONALD TRUMP JR.: Donald … 6 1977 3
## 5 s018 s057 TEVFIK ARIF >> SOVIET MINISTRY OF COMMERC… 6 1980 4
## 6 s019 s021 TAMIR SAPIR >> ALEX SAPIR: Alex Sapir is … 6 1980 4
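One way to confirm that an edge list is already in the required order, rather than relying on a visual check of the preview, is to compare the sort permutation with the existing row order. The sketch below uses a deliberately unsorted toy edge list; the data and variable names are hypothetical.

```r
# Toy edge list that is deliberately out of order (hypothetical data).
links <- data.frame(from   = c("s005", "s002", "s003"),
                    to     = c("s029", "s017", "s056"),
                    period = c(2, 1, 2),
                    stringsAsFactors = FALSE)

# The sort order used in this guide: period first, then from, then to.
ord <- order(links$period, links$from, links$to)

# TRUE only if the rows are already in that order.
already_sorted <- identical(ord, seq_len(nrow(links)))

# Reorder the rows when they are not.
links_sorted <- links[ord, ]
```

If `already_sorted` is FALSE for the real data, any object built from the unsorted edge list would need to be rebuilt after sorting.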
# set min/max time using the year column; the terminus is one year beyond the max year
# (the period column holds the factored years)
minterm <- min(linksdyn$year)
maxterm <- max(linksdyn$year) + 1
# create node spells:
# sets every node in the network as active from the first year to the terminus
vs <- data.frame(onset = minterm, terminus = maxterm, vertex.id = seq_len(nrow(nodesdyn)))
head(vs)
## onset terminus vertex.id
## 1 1973 2019 1
## 2 1973 2019 2
## 3 1973 2019 3
## 4 1973 2019 4
## 5 1973 2019 5
## 6 1973 2019 6
# create edge spells:
# edges appear one by one; each is active from its first activation until the terminus
# (the last two columns hold the endpoint vertex ids of each edge, taken from the network object)
es <- data.frame(onset = linksdyn$year, terminus = maxterm,
                 head = as.matrix(net, matrix.type = "edgelist")[, 1],
                 tail = as.matrix(net, matrix.type = "edgelist")[, 2])
head(es)
## onset terminus head tail
## 1 1973 2019 2 17
## 2 1975 2019 3 56
## 3 1975 2019 5 29
## 4 1977 2019 2 9
## 5 1980 2019 18 57
## 6 1980 2019 19 21
# create dynamic network
net.dyn <- networkDynamic(base.net=net, edge.spells=es, vertex.spells=vs)
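The effect of the spells can be illustrated without any network packages. The sketch below counts how many edges are active in a given year, assuming a spell covers the half-open interval from onset up to, but not including, terminus. The toy spells and the helper `active_in` are hypothetical.

```r
# Toy edge spells (hypothetical values in the same onset/terminus layout as `es`).
es_toy <- data.frame(onset    = c(1973, 1975, 1980),
                     terminus = 2019)

# An edge counts as active in year t when onset <= t < terminus.
active_in <- function(spells, t) spells$onset <= t & t < spells$terminus

n_1976 <- sum(active_in(es_toy, 1976))  # edges that started in 1973 and 1975
n_2018 <- sum(active_in(es_toy, 2018))  # all three edges
n_2019 <- sum(active_in(es_toy, 2019))  # none: 2019 is the terminus itself
```

Since every spell shares the same terminus, edges accumulate over time and never disappear, which is why the terminus is set one year past the last year in the data.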
Finally, the dynamic network can be used to create the plot over time.
# compute animation and plot
compute.animation(net.dyn, animation.mode = "kamadakawai",
                  slice.par = list(start = minterm, end = maxterm, interval = 1,
                                   aggregate.dur = 1, rule = 'all'))
# with output.mode = 'htmlWidget' the plot is returned as an interactive widget;
# with output.mode = 'HTML' it is saved as an HTML file instead (open the file to view the plot)
render.d3movie(net.dyn, usearrows = FALSE,
               displaylabels = TRUE, label = net %v% "labels", label.cex = .7,
               bg = "#ffffff", vertex.border = "#333333",
               vertex.cex = log(degree(net) + 1) / 2.5, # node size grows with log of degree
               vertex.col = net.dyn %v% "col",
               edge.lwd = (net.dyn %e% "width") / 3,
               edge.col = '#55555599',
               vertex.tooltip = paste("<b>Name:</b>", (net.dyn %v% "name"), "<br>",
                                      "<b>National affiliation:</b>", (net.dyn %v% "country.name")),
               edge.tooltip = paste("<b>Connection:</b>", (net.dyn %e% "description")),
               launchBrowser = TRUE, filename = "replication-dynamic.html",
               render.par = list(tween.frames = 30, show.time = FALSE),
               d3.options = list(animateOnLoad = FALSE),
               plot.par = list(mar = c(0, 0, 0, 0)),
               output.mode = 'htmlWidget') # change to 'HTML' for .html file output
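The node sizes in the plot come from a log transformation of each node's degree. As a quick illustration of how that scaling behaves, the sketch below applies the same formula to a few made-up degree values (the `deg` vector is hypothetical).

```r
# Hypothetical degree counts for four nodes.
deg <- c(0, 1, 10, 100)

# Same transformation as the vertex.cex argument in the plotting call.
cex <- log(deg + 1) / 2.5
round(cex, 2)
```

The log compresses large differences: a node with 100 connections is drawn less than twice as large as a node with 10, which keeps highly connected nodes from dominating the plot. A node with degree 0 would get size 0, but the cleaning step above removes nodes without edges.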