(Day 12 puzzle). This was my favorite day so far. I had never tackled a graph problem of my own, and this was a great excuse to try out the igraph package.
Big shout out to Gábor Csárdi and anyone else on the igraph team who wrote the docs. And I mean wrote the docs! When I google an R question, 99% of the time I land on StackOverflow. The searches I made for Day 12 all* took me to the igraph documentation website, which answered my questions. I don’t know of another R package or topic like that.
Their example of creating a graph was clear and easy to adapt to the toy example on Day 12. From there, some searching found the two functions I’d need for Day 12: `neighborhood()` and `clusters()`. Look how short my part 2 is!
Part 0: Playing with igraph
Here’s the documentation example for creating an igraph graph. I played with it to confirm it would work for my needs:

    library(pacman)
    p_load(igraph, tidyr, dplyr)

    # Toy example
    relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David", "David", "Esmeralda"),
                            to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                            same.dept=c(FALSE,FALSE,TRUE,FALSE,FALSE,TRUE),
                            friendship=c(4,5,5,2,1,1),
                            advice=c(4,5,5,4,2,3))
    g <- graph_from_data_frame(relations, directed=FALSE)

    neighborhood(g, 1, "Esmeralda") # 2
    neighborhood(g, 2, "Esmeralda") # 5

Part 1
This was mostly wrangling the data into an igraph graph. It didn’t seem to like integer names for vertices, so I prepended “a” to each one.

    # Parse lines like "2 <-> 0, 3, 4" into one edge per row and build the graph.
    # str_trim() comes from stringr, which is not loaded above, so it is namespaced here.
    create_graph_from_input <- function(filename){
      filename %>%
        read.delim(header = FALSE) %>%
        separate(V1, into = c("v1", "v2"), sep = "<->") %>%
        separate_rows(v2, sep = ",") %>%
        mutate(v1 = paste0("a", stringr::str_trim(v1)),
               v2 = paste0("a", stringr::str_trim(v2))) %>%
        graph_from_data_frame(directed = FALSE)
    }

    # Count the vertices within grp_size steps of the given node
    get_group_size <- function(filename, grp_size, node_name){
      create_graph_from_input(filename) %>%
        neighborhood(grp_size, paste0("a", node_name)) %>%
        unlist %>%
        length()
    }

    testthat::expect_equal(get_group_size("12_1_test_dat.txt", 30, "0"), 6)
    get_group_size("12_1_dat.txt", 30, "0") # 239

I increased the `grp_size` parameter until my result stopped increasing. That was at about 30 degrees of separation (it was still changing at 15). A more permanent solution might include a loop to do this.
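
Here is a minimal sketch of what that loop could look like, next to a shortcut that avoids guessing the order at all. It is not the code I ran for the puzzle. The edge list is the puzzle's small sample typed out by hand, `group_size_by_growth()` is a hypothetical helper name, and I'm using `ego_size()` (the counting sibling of `neighborhood()`) and `subcomponent()` from igraph.

```r
library(igraph)

# The puzzle's small sample, written out by hand with the "a" prefix (assumption:
# this mirrors what create_graph_from_input() would produce for the test file).
edges <- data.frame(
  from = c("a0", "a1", "a2", "a2", "a2", "a3", "a3", "a4", "a4", "a4", "a5", "a6", "a6"),
  to   = c("a2", "a1", "a0", "a3", "a4", "a2", "a4", "a2", "a3", "a6", "a6", "a4", "a5")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Hypothetical helper: widen the neighborhood one step at a time and stop
# as soon as an extra step adds no new vertices.
group_size_by_growth <- function(graph, node_name) {
  order <- 1
  previous <- 0
  repeat {
    current <- ego_size(graph, order = order, nodes = node_name)
    if (current == previous) return(current)
    previous <- current
    order <- order + 1
  }
}

# Shortcut: subcomponent() returns every vertex reachable from the start vertex.
group_size_reachable <- length(subcomponent(g, "a0", mode = "all"))

group_size_by_growth(g, "a0") # 6
group_size_reachable          # 6
```

Both approaches agree on the sample. The loop stays closest to what I did by hand, while `subcomponent()` skips the question of how many degrees of separation are enough.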
Part 2
All you need is `igraph::clusters()`:

    "12_1_dat.txt" %>%
      create_graph_from_input %>%
      clusters() %>%
      .$no # 215

One. Function.
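
To show what that one function hands back, here is a small sketch on the puzzle's sample graph (edges typed out by hand, so treat the data frame as an assumption). It calls `components()`, which as far as I can tell is the current name for `clusters()` in recent igraph releases. The result is a list with `no` (the number of groups), `csize` (the size of each group), and `membership` (the group id of each vertex).

```r
library(igraph)

# Sample graph written out by hand, one row per undirected connection
edges <- data.frame(
  from = c("a0", "a1", "a2", "a2", "a3", "a4", "a5"),
  to   = c("a2", "a1", "a3", "a4", "a4", "a6", "a6")
)
g <- graph_from_data_frame(edges, directed = FALSE)

comp <- components(g)

comp$no    # number of groups: 2
comp$csize # sizes of the groups: one of 6 and one of 1

# Part 1 falls out of the same result: look up the group that holds "a0"
group_id <- comp$membership[match("a0", V(g)$name)]
size_of_group_0 <- comp$csize[group_id]
size_of_group_0 # 6
```

So the component sizes answer Part 1 as well, with no need to pick a neighborhood order.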
Conclusion: graphs are neat, igraph is the way to analyze them.
* okay, one search took me to StackOverflow and gave me what I needed: the `clusters()` function. Everything else came from igraph.org.