(Day 12 puzzle). This was my favorite day so far. I’d never tackled a graph problem of my own, and this one was a great excuse to try out the igraph package.
Big shout out to Gábor Csárdi and anyone else on the igraph team who wrote the docs. And I mean wrote the docs! When I google an R question, 99% of the time I land on StackOverflow. The searches I made for Day 12 all* took me to the igraph documentation website, which answered my questions. I don’t know of another R package or topic like that.
Their example of creating a graph was clear and easy to adapt to the toy example on Day 12. From there, a bit of searching turned up the two functions I’d need: neighborhood() and clusters(). Look how short my part 2 is!
Part 0: Playing with igraph
Here’s the documentation example for creating an igraph. I played with it to confirm it would work for my needs:
library(pacman)
p_load(igraph, tidyr, dplyr)
# Toy example from the igraph docs
relations <- data.frame(from = c("Bob", "Cecil", "Cecil", "David",
                                 "David", "Esmeralda"),
                        to = c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        same.dept = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
                        friendship = c(4, 5, 5, 2, 1, 1),
                        advice = c(4, 5, 5, 4, 2, 3))
g <- graph_from_data_frame(relations, directed = FALSE)
neighborhood(g, 1, "Esmeralda")  # 2 vertices: Esmeralda and Alice
neighborhood(g, 2, "Esmeralda")  # 5 vertices: the whole graph
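To make the order argument concrete, here is a minimal base-R sketch of what neighborhood(g, order, v) returns for this undirected toy graph: every vertex within `order` steps of the start, including the start itself. `neighbors_within()` is a hypothetical helper written for illustration, not an igraph function, and it works on the raw from/to columns rather than on an igraph object.

```r
# Hypothetical helper (not part of igraph): vertices within `order` steps
# of `start` in an undirected edge list, including `start` itself.
neighbors_within <- function(from, to, start, order) {
  reached <- start
  frontier <- start
  for (step in seq_len(order)) {
    # Undirected graph: follow each edge in both directions
    nxt <- unique(c(to[from %in% frontier], from[to %in% frontier]))
    frontier <- setdiff(nxt, reached)
    if (length(frontier) == 0) break  # nothing new to explore
    reached <- c(reached, frontier)
  }
  reached
}

# Same edges as the `relations` toy data frame
from <- c("Bob", "Cecil", "Cecil", "David", "David", "Esmeralda")
to   <- c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice")

neighbors_within(from, to, "Esmeralda", 1)  # "Esmeralda" "Alice"
neighbors_within(from, to, "Esmeralda", 2)  # all 5 vertices
```

Orders 1 and 2 give the same 2 and 5 vertices as the neighborhood() calls above.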
Part 1
This was mostly wrangling the data into an igraph graph. igraph didn’t seem to like integer vertex names, so I prepended “a” to each one.
create_graph_from_input <- function(filename){
  filename %>%
    read.delim(header = FALSE) %>%                 # each input line lands in column V1
    separate(V1, into = c("v1", "v2"), sep = "<->") %>%
    separate_rows(v2, sep = ",") %>%               # one row per edge
    mutate(v1 = paste0("a", trimws(v1)),           # base R trimws(): stringr isn't loaded
           v2 = paste0("a", trimws(v2))) %>%
    graph_from_data_frame(directed = FALSE)
}
get_group_size <- function(filename, grp_size, node_name){
  create_graph_from_input(filename) %>%
    neighborhood(grp_size, paste0("a", node_name)) %>%  # vertices within grp_size steps
    unlist %>%
    length()
}
testthat::expect_equal(get_group_size("12_1_test_dat.txt", 30, "0"), 6)
get_group_size("12_1_dat.txt", 30, "0") # 239
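I increased the `grp_size` parameter (the neighborhood order, i.e. the maximum number of steps out from node 0) until my result stopped increasing. That happened at about 30 degrees of separation (it was still changing at 15). A more permanent solution would use a loop that keeps raising the order until the group size stops changing.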
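That loop can be sketched in base R on a plain edge list: widen the search by one degree of separation per pass and stop when a pass adds no new vertices, so there is no grp_size to tune. `group_members()` and the toy edge list are hypothetical illustrations in the puzzle’s `a <-> b, c` shape (with the “a” prefix), not my code or the real puzzle data. Within igraph itself, subcomponent() returns the vertices reachable from a given vertex, which should do the same job in one call.

```r
# Hypothetical helper: every vertex connected to `start`, found by widening
# the search one step at a time until it stops growing. Edges are treated
# as undirected, like graph_from_data_frame(directed = FALSE).
group_members <- function(from, to, start) {
  reached <- start
  repeat {
    # One more degree of separation: follow edges in both directions
    nxt <- unique(c(reached, to[from %in% reached], from[to %in% reached]))
    if (length(nxt) == length(reached)) break  # no new vertices: done
    reached <- nxt
  }
  reached
}

# Toy edge list (assumed input, already split into one row per edge)
from <- c("a0", "a1", "a2", "a2", "a2", "a3", "a3", "a4", "a4", "a4", "a5", "a6", "a6")
to   <- c("a2", "a1", "a0", "a3", "a4", "a2", "a4", "a2", "a3", "a6", "a6", "a4", "a5")

length(group_members(from, to, "a0"))  # 6
```

Each pass is the same as raising grp_size by one, so the loop ends exactly where the hand-tuned search would have stopped changing.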
Part 2
All you need is igraph::clusters():
"12_1_dat.txt" %>% create_graph_from_input %>% clusters() %>% .$no #215
One. Function.
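For comparison, clusters()$no is the number of connected components. This base-R sketch swaps in a different technique, union-find (a disjoint-set structure with path halving), purely to show what is being counted; `count_groups()` and the toy edge list are hypothetical and not part of igraph or the puzzle input.

```r
# Hypothetical helper: count connected components of an undirected
# edge list with union-find (disjoint sets), for illustration only.
count_groups <- function(from, to) {
  verts <- unique(c(from, to))
  parent <- setNames(verts, verts)  # each vertex starts as its own root

  find <- function(x) {
    while (parent[[x]] != x) {
      parent[[x]] <<- parent[[parent[[x]]]]  # path halving
      x <- parent[[x]]
    }
    x
  }

  for (i in seq_along(from)) {
    root_a <- find(from[i])
    root_b <- find(to[i])
    if (root_a != root_b) parent[[root_a]] <- root_b  # merge two groups
  }

  # One distinct root per connected component
  length(unique(vapply(verts, find, character(1))))
}

# Toy edge list (assumed input): a0, a2, a3, a4, a5, a6 are linked; a1 only links to itself
from <- c("a0", "a1", "a2", "a2", "a2", "a3", "a3", "a4", "a4", "a4", "a5", "a6", "a6")
to   <- c("a2", "a1", "a0", "a3", "a4", "a2", "a4", "a2", "a3", "a6", "a6", "a4", "a5")

count_groups(from, to)  # 2
```

igraph does this bookkeeping for you, which is why Part 2 fits in one line.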
Conclusion: graphs are neat, and igraph is the way to analyze them.
* okay, one search took me to StackOverflow and gave me what I needed: the `clusters()` function. Everything else came from igraph.org.
