R has a lot of great tools for working with tables (also called dataframes or tibbles). One really powerful and relatively intuitive set of tools is called the “tidyverse”. Tools in the tidyverse make assumptions about what data will look like – rows represent observations and columns represent variables about that observation.
The tidygraph package extends that paradigm to networks, by representing networks as two tables – a table of nodes and node attributes and a table of edges and edge attributes.
This lets you work with networks using many of the same tools that have been developed for working on other types of data.
Because a network is really composed of two tables, we have to let R know which table we want to manipulate. This is done using activate().
For example, the code below activates the node table and then uses mutate (which we will talk more about further down) to create a variable called degree in the nodes table.
(Note that the code throughout this tutorial uses “pipes”. Pipes (%>%) let you express a sequence of operations by taking the output of the previous operation and using it as the input of the next operation.)
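To make the pipe concrete before it shows up in the network code, here is a minimal sketch on a plain numeric vector. The vector and the variable names are made up for illustration; the only thing it relies on is the %>% operator from the magrittr package (which the tidyverse packages re-export).

```r
library(magrittr)

# Illustrative data, not from the tutorial
x <- c(1, 4, 9)

# Nested calls read inside-out: take square roots, then sum them
nested <- sum(sqrt(x))

# The piped version reads left to right and does the same thing
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)
```

Both forms compute the same value; the pipe just puts the steps in the order you would describe them out loud.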
library(tidygraph)

create_notable('zachary') %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())
## # A tbl_graph: 34 nodes and 78 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 34 x 1 (active)
##   degree
##    <dbl>
## 1     16
## 2      9
## 3     10
## 4      6
## 5      3
## 6      4
## # … with 28 more rows
## #
## # Edge Data: 78 x 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1     3
## 3     1     4
## # … with 75 more rows
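If you want to look at the two tables on their own, one option is to pull each of them out as an ordinary tibble. This is a sketch rather than the only way to do it: it assumes as_tibble() returns whichever table is currently active, and the names g, node_table and edge_table are just for illustration.

```r
library(tidygraph)

# Build the karate club graph and add a degree column to the node table
g <- create_notable('zachary') %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())

# as_tibble() returns the active table, so activate the one you want first
node_table <- g %>% activate(nodes) %>% as_tibble()
edge_table <- g %>% activate(edges) %>% as_tibble()

# Rows and columns of each table
dim(node_table)
dim(edge_table)
```

Once a table is a plain tibble, any function that expects a data frame can work with it.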
Because the networks are just stored as data frames, we can also do things like use ggplot to graph attributes of a network. The code below creates an edge attribute called bw, which is a measure of edge betweenness, and then makes a histogram of the distribution of bw.
library(ggplot2)

create_notable('zachary') %>%
  activate(edges) %>%
  mutate(bw = centrality_edge_betweenness()) %>%
  ggplot() +
  geom_histogram(aes(x = bw)) +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
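The message above is ggplot2 asking for an explicit bin choice. A minimal sketch of one way to respond is below: it pulls the edge table out with as_tibble() first so that ggplot() receives a plain data frame, and sets bins = 20. The value 20 is an arbitrary assumption for illustration, not a recommendation from the original tutorial, and edge_bw and p are made-up names.

```r
library(tidygraph)
library(ggplot2)

# Compute edge betweenness and extract the edge table as a tibble
edge_bw <- create_notable('zachary') %>%
  activate(edges) %>%
  mutate(bw = centrality_edge_betweenness()) %>%
  as_tibble()

# Same histogram, with the number of bins chosen explicitly (20 is arbitrary)
p <- ggplot(edge_bw) +
  geom_histogram(aes(x = bw), bins = 20) +
  theme_minimal()

p
```

Setting binwidth instead of bins works the same way if you would rather think in units of betweenness.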
The companion library to tidygraph is ggraph. ggraph is a set of tools based on ggplot2. The key idea behind both ggraph and ggplot2 is that you can build a plot by adding layers according to a “grammar of graphics” that lets you add to and change things about the plot.
ggraph includes tons of really cool types of plots, but for this tutorial I am going to focus on standard plots that show nodes as circles and edges as lines. There are three key components that should be part of any of these plots: a layout, a layer for the nodes, and a layer for the edges.
Layouts are defined inside the ggraph() function, which has to be called before making any plot. The code below makes a tidy graph of the Zachary karate network using create_notable('zachary'). It then sets 'kk' as the layout, and adds a layer for the nodes and a layer for the edges.
library(ggraph)

create_notable('zachary') %>%
  ggraph(layout = 'kk') +
  geom_node_point() +
  geom_edge_fan()
The following few plots show how changing the layout can really change the look and the interpretation of the plot. These all show the same data.
create_notable('zachary') %>%
  ggraph(layout = 'circle') +
  geom_node_point() +
  geom_edge_fan()
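A layout is really just an assignment of x and y positions to the nodes, so one way to see what changes between layouts is to look at the coordinates directly. The sketch below assumes ggraph's create_layout() helper, which computes a layout without drawing anything; the variable names are illustrative.

```r
library(tidygraph)
library(ggraph)

g <- create_notable('zachary')

# Compute node positions for two layouts of the same graph, without plotting
circle_xy <- create_layout(g, layout = 'circle')
kk_xy <- create_layout(g, layout = 'kk')

# One row per node in both cases; only the x and y columns differ
head(circle_xy[, c('x', 'y')])
head(kk_xy[, c('x', 'y')])
```

The nodes and edges are identical in both objects, which is why the plots can look so different while showing the same data.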
In this one, we get very fancy. This shows an ego network for node 3. We will get to what some of the different commands in this code mean, but like a good programmer, what I really did was find an example that was sort of like what I wanted to do and play around with adding and changing things until it looked the way I wanted.
create_notable('zachary') %>%
  mutate(d = igraph::distances(.G(), to = 3)) %>%
  ggraph(layout = 'focus', focus = 3) +
  geom_edge_fan() +
  ggforce::geom_circle(aes(x0 = 0, y0 = 0, r = r), data.frame(r = 1:3), colour = 'grey') +
  geom_node_point(aes(color = as.factor(d)), size = 3) +
  coord_fixed() +
  scale_color_viridis_d() +
  labs(color = 'Distance from Node 3')
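The piece of that code that drives the coloring is the d column, so it can help to look at it separately from the plot. This is a small sketch under a couple of assumptions: it calls igraph::distances() explicitly (the plotting code relies on a distances() function being available), wraps the result in as.vector() to get a plain column, and uses the made-up names g and node_d.

```r
library(tidygraph)

# Shortest-path distance from every node to node 3, stored as a node attribute
g <- create_notable('zachary') %>%
  mutate(d = as.vector(igraph::distances(.G(), to = 3)))

# Nodes are the active table by default, so as_tibble() returns the node table
node_d <- as_tibble(g)$d

# How many nodes sit at each distance from node 3
table(node_d)
```

Node 3 itself is at distance 0, its direct neighbors (such as node 1) are at distance 1, and so on outward, which is what the colors in the plot encode.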