Advent of Code 2017 in #rstats: Day 13

I liked the twist in the Day 13 puzzle.  Implementing a literal solution to part 1, where I had the scanners walk back and forth, was straightforward.  And Part 2 looked easy enough.  Then I peeked at how slowly my algorithm was proceeding and realized I would need to refactor.

Part 1

This function has been modified in two places for Part 2.  I added the first line in the function, using the modulo operator `%%` to trim the time down to a single back-and-forth pass.   This made the commented-out direction change line obsolete.

I worked out the `time %% ((size-1)*2)` bit by counting examples on my fingers 🙂

# Time corresponds to "depth", size = "range"

move <- function(size, time){
  time <- time %% ((size-1)*2) # added this line for part 2 to be O(n) not O(n^2)
  pos <- 1
  increment <- 1
  for(i in seq_len(time)){ # with the modulo line this could be done w/o loop
  #  if(pos == 1) { increment <- 1 } # Don't need this b/c of adding the modulo
    if(pos == size) { increment <- -1}
    pos <- pos + increment

expect_equal(move(3, 0), 1)
expect_equal(move(2, 1), 2)
expect_equal(move(4, 4), 3)
expect_equal(move(4, 6), 1)

dat <- read.csv("13_dat.txt", sep = ":", header = FALSE) %>%
  setNames(c("depth", "size"))

sum((dat$size * dat$depth)[with(dat, mapply(move, size, depth)) == 1]) # 1728

Part 2

Without the modulo line above, my loop will walk the scanner back and forth for time steps.  As the delay increases, so does the time.  The answer to this problem is a delay of approximately 4 million time units; at that point, each scanner is walked 4 million times… it’s been a long time since CS 201 but I think that makes the algorithm O(N2).  In practice, I realized I had a problem when progress slowed dramatically.

Realizing this, I eliminated the unnecessary walking.  Now, this could still be much more efficient.  Because I’m recycling code from part 1, I test all scanner depths for collisions at each iteration; a faster solution would move on to the next delay value after a single scanner collision is found.  But hey, that is a mere ~20x slowdown or so, and is still O(N).

Having started with R in 2014, I rarely feel like an old-timer.  I learned dplyr from the beginning, not tapply.  But I had that feeling here… being somewhat comfortable with `mapply` gets in the way of sitting down to learn the `map2` functions from purrr, which I suspect will be useful to know.

# Part 2

  delay <- 0
  hits <- sum(with(dat, mapply(move, size, depth)) == 1) 
    hits != 0
    dat$depth <- dat$depth + 1
    delay <- delay + 1
    hits <- sum(with(dat, mapply(move, size, depth)) == 1)

 user system elapsed 
906.512 2.804 909.319 
> delay
[1] 3946838

My runtime was about 15 minutes.


Advent of Code 2017 in #rstats: Day 12

(Day 12 puzzle). This was my favorite day so far.  I’ve never faced my own graph problem and this was a great example for trying out the igraph package.

Big shout out to Gábor Csárdi and anyone else on the igraph team who wrote the docs.  And I mean wrote the docs!  When I google an R question, 99% of the time I land on StackOverflow.  The searches I made for Day 12 all* took me to the igraph documentation website, which answered my questions.  I don’t know of another R package or topic like that.

Their example of creating a graph was clear and was easy to adapt to the toy example on Day 12.  From there, some searching found the two functions I’d need for Day 12: neighborhood() and clusters().  Look how short my part 2 is!

Part 0: Playing with igraph

Here’s the documentation example for creating an igraph.  I played with it to confirm it would work for my needs:

p_load(igraph, tidyr, dplyr)

# Toy example
relations <- data.frame(from=c("Bob", "Cecil", "Cecil", "David",
                               "David", "Esmeralda"),
                        to=c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
                        friendship=c(4,5,5,2,1,1), advice=c(4,5,5,4,2,3))
g <- graph_from_data_frame(relations, directed=FALSE)
neighborhood(g, 1, "Esmeralda") # 2
neighborhood(g, 2, "Esmeralda") # 5

Part 1

This was mostly wrangling the data into the igraph.  It didn’t seem to like integer names for vertices so I prepended “a”.

create_graph_from_input <- function(filename){
  filename %>%
    read.delim(header = FALSE) %>%
    separate(V1, into = c("v1", "v2"), sep = "<->") %>%
    separate_rows(v2, sep = ",") %>%
    mutate(v1 = paste0("a", str_trim(v1)),
           v2 = paste0("a", str_trim(v2))) %>%
    graph_from_data_frame(directed = FALSE)

get_group_size <- function(filename, grp_size, node_name){
   create_graph_from_input(filename) %>%
    neighborhood(grp_size, paste0("a", node_name)) %>%
    unlist %>%
testthat::expect_equal(get_group_size("12_1_test_dat.txt", 30, "0"), 6)
get_group_size("12_1_dat.txt", 30, "0") # 239

I increased the `grp_size` parameter until my result stopped increasing.  That was at about 30 degrees of separation (it was still changing at 15).  A more permanent solution might include a loop to do this.

Part 2

All you need is igraph::clusters():

"12_1_dat.txt" %>%
  create_graph_from_input %>%
  clusters() %>%
  .$no #215

One.  Function.

Conclusion: graphs are neat, igraph is the way to analyze them.

* okay, one search took me to StackOverflow and gave me what I needed: the `clusters()` function.  Everything else came from


Advent of Code 2017 in #rstats: Day 11

Once I realized that moves on a hex grid map nicely to a standard rectangular grid, this was easy.   Despite playing hours of Settlers of Catan, I’d never realized this relationship.  Maybe because nothing traverses that hex grid?

North and South move one step up or down.  The four diagonal directions move a half-step up or down and a full column laterally.  The shortest solution path will be diagonal moves to reach the desired column, then vertical moves to the right row.

It took only a minute or two to modify my part 1 function for part 2, so I present both together.

Parts 1 & 2

library(dplyr); library(testthat)

steps_needed <- function(dat){
  lat <- 0
  lon <- 0
  max_dist <- 0
  current_dist <- 0
  for(i in seq_along(dat)){
    if(dat[i] == "n"){lat <- lat + 1}
    if(dat[i] == "s"){lat <- lat - 1}
    if(dat[i] == "ne"){lat <- lat + 0.5; lon <- lon + 1}  
    if(dat[i] == "se"){lat <- lat - 0.5; lon <- lon + 1}
    if(dat[i] == "nw"){lat <- lat + 0.5; lon <- lon - 1}
    if(dat[i] == "sw"){lat <- lat - 0.5; lon <- lon - 1}
    current_distance <-   
      abs(lon) + # diagonal steps to move horizontally
      abs(lat) - 0.5 * abs(lon) # vertical steps, adjusted for prev diagonal moves
    max_dist <- max(c(max_dist), current_distance)
    current_dist <- current_distance
  structure(c(current_dist, max_dist),
            names = c("Current Distance", "Maximum Distance"))

# Tests
expect_equal(steps_needed(c("se","sw","se","sw","sw")), 3)
expect_equal(steps_needed(c("ne", "ne", "ne")), 3)
expect_equal(steps_needed(c("ne", "ne", "sw", "sw")), 0)
expect_equal(steps_needed(c("ne", "ne", "s", "s")), 2)

# Execute
dat <- unlist(str_split(scan("11_1_dat.txt", "character"), ","))

# Current Distance Maximum Distance 
# 722 1551 



Advent of Code 2017 in #rstats: Day 9

I write less deeply-nested code now that I program in R.  When writing code poorly  in other programs, I’d often use nested IF statements (hi, Excel!).  Debugging that often looked like counting my nesting depth out loud: add one for every (,  subtract one for every ).

That strategy formed the basis for my solution today.   And I was excited to do Part 2 with arithmetic, not writing new code.

Part 1

Strategy: maintain counter of open braces, decreasing for closed braces.

So `{{<a>},{<a>},{<a>},{<a>}}` = 1 2 2 2 2. Sum = 9.

p_load(stringr, testthat)

# cancel the character after !
cancel_chars <- function(x){ gsub("!.", "", x) }

group_score <- function(dat){
  clean <- gsub("<.*?>", "" , dat) # non-greedy regex to remove garbage
  lvl <- 1 # depth of braces nesting
  counts <- rep(0, nchar(clean)) # default value for non-{ characters
  for(i in{
    if(str_sub(clean, i, i) == "{"){ # log the nested depth and increment counter
      counts[i] <- counts[i] + lvl
      lvl <- lvl + 1
    if(str_sub(clean, i, i) == "}"){
      lvl <- lvl - 1

# Test
expect_equal(group_score("{{{}}}"), 6)
expect_equal(group_score("{{},{}}"), 5)
expect_equal(group_score("{{{},{},{{}}}}"), 16)

dat <- scan("09_1_dat.txt", "character")
group_score(cancel_chars(dat)) # 15922

Part 2

The answer is the total number of characters, minus:

  • The characters canceled by !
  • The bracketing `<>` for each garbage string
  • The valid characters remaining after removing the garbage in Part 1

To wit:

cleaned <- cancel_chars(dat)
nchar(cleaned) -
  2*str_count(string = cleaned, pattern = fixed(">")) -
  nchar(gsub("<.*?>", "" , cleaned))

# 7314



Advent of Code 2017 in #rstats: Day 7

Today was hard, but a good challenge.  I haven’t written a recursive function since college CS – is that typical for data science / analysis work?  I don’t see much about recursion on Twitter.  Recursion feels like a separate way of thinking, and I had forgotten how.

Part 1

The first part was satisfying.  Data cleaning was quick and I thought joining the data.frame to itself to populate parent and child for each entry was a nice touch.

test <- read.delim("07_1_test_dat.txt", header = FALSE, stringsAsFactors = FALSE)

p_load(tidyr, splitstackshape, readr, dplyr, stringr)

# Tidy the data
tidy_puzzle_input <- function(x){
  result <- x %>%
    separate(V1, into = c("name", "child"), sep = "->") %>%
    cSplit(., "child", ",", "long", type.convert = FALSE) %>%
    mutate(weight = parse_number(name),
           name = str_extract(name, "[A-z]+"))
  # Join to itself to attach parent name
    result %>%
      select(-weight) %>%
      rename(name = child, parent = name)

My tidy data looked like:

        name   child weight  parent
1    jlbcwrl    <NA>     93  tzrppo
2    fzqsahw  lybovx    256  peuppj
3    fzqsahw  pdmhva    256  peuppj
4     rxivjo   mewof    206  ikdsvc
5     rxivjo  hrncqs    206  ikdsvc

Then the solution was easy:

# the bottom node will be the only one that doesn't have a parent 

test %>%
  tidy_puzzle_input %>%

read.delim("07_1_dat.txt", header = FALSE, stringsAsFactors = FALSE) %>%
  tidy_puzzle_input %>%


Part 2

This was tough going.  I got a function working for the sample input, which worked by stepping backward from the most distant children and adding their weight to their parents, then removing them and calling itself again.

But while the sample input had symmetric towers, in the bigger test data a tower could be balanced if it had two children of `4` and `2 -> 1 + 1`.  In that scenario, you can’t peel off the 4 in the same step that you peel off the 1s.  (For my own satisfaction, that false solution appears far below).

I’m proud of my eventual solution.  And besides finally getting my brain to think recursively, I learned a few things: creating a named vector with `structure()` and using `purrr::map_dbl`.

get_faulty_set <- function(dat){
  get_node_children <- function(node_name){
    dat$child[dat$name == node_name]
  # recursively get the cumulative weight of a node
  get_node_weight <- function(node_name){
      dat$weight[dat$name == node_name][1]
    } else{
      dat$weight[dat$name == node_name][1] +
        sum(sapply(get_node_children(node_name), get_node_weight))
  # Grab parents whose children have multiple weights - the problem cascades
  faulty_parents <- dat %>%
    rowwise %>%
    mutate(total_weight = get_node_weight(name)) %>%
    group_by(parent) %>%
    summarise(num_wts = n_distinct(total_weight)) %>%
    filter(num_wts > 1) %>%
  # find the lowest parent where the problem occurs
  lowest_bad_parent <- dat %>%
    filter(name %in% faulty_parents) %>%
    filter(! name %in% .$parent) %>%
    slice(1) %>%
  # Get the weights & names of that parent's children
  bad_set <- structure(
    purrr::map_dbl(get_node_children(lowest_bad_parent), get_node_weight),
    names = get_node_children(lowest_bad_parent)
  # From here it's common sense to spot the odd node out, see how much it needs to change;
  # Then fetch its weight using get_node_weight and compute the sum

I finished with a cop-out: once my function returned the weights of the subtower nodes where the problem existed and their names, it wasn’t worth programming the last bit to do the weight subtraction and get my answer.  With this output:

   dkvzre awufxne   osbbt   ycbgx wdjzjlk 
   2255    2255    2255    2260    2255

I just calculated it myself using `get_node_weight(“ycbqx”)` and subtracting 5.

Appendix: Solution with Symmetric Towers

Because I feel salty that this didn’t work.

find_unbalanced_tower <- function(x){
  lowest_children <- x %>%
    filter( %>%
    distinct(.keep_all = TRUE)
  res <- lowest_children %>%
    group_by(parent) %>%
    summarise(uniques = n_distinct(weight),
              total_wt = sum(weight))
  # check for completion and return result if done
  if(!all(res$uniques == 1)){
    problem_parent <- res$parent[res$uniques == 2]
    print(paste("Problem parent is: ", problem_parent))
    lowest_children %>%
      filter(parent == problem_parent) %>%
      group_by(name) %>%
  } else {
    # Then add their weight to parent weight, remove them, update parents to have no children
    x <- x %>%
      mutate(child = ifelse(child %in% lowest_children$name, NA, child)) %>%
      filter(!name %in% lowest_children$name) %>%
      rowwise() %>%
      mutate(weight = weight + sum(res$total_wt[res$parent == name])) %>%
      ungroup() %>%
      distinct(.keep_all = TRUE)

my_input %>%
  tidy_puzzle_input %>%



Advent of Code 2017 in #rstats: Day 6

The solution was a simple concept but harder than I thought to implement.  I learned some lessons today:

  • Read the instructions carefully.  I built the balancing function balance_banks to return how many distributions it made of the maximum value, but removed this code when I realized that the full distribution of a bank of N values counted as one iteration, not N iterations.
  • Related: Up until now, Git has felt unnecessary for these challenges, and I’ve been too lazy to use it (though it would do wonders for my GitHub contribution graph).  Today I got nervous deleting and reworking code for Part 1 … would I need it for the not-yet-known Part 2?
  • I built my balance_banks function inside of my go_til_repeat function (because I thought it would help with passing counter information, which I did not need).  When editing and then testing, I kept failing to load both into memory; I’d update one, load the other into memory, and call the first one.  I don’t work much with nested functions of any complexity; TIL to avoid loading just one while programming and testing them.

Today was the first day that run-time performance became an issue.  My outputs for each part were correct on the first try, but each took a while (3 minutes?) to run.  Suspicious that my code was stuck in a loop, I never let it finish the first call I made: I broke the loop manually and added the counter to print every 1,000 rebalances.

But it was just slow.  To check completion, I switched to the use of double duplicated you see below instead of janitor::get_dupes(), which is easier on the brain but a relatively expensive operation.

Lastly, when I thought I had to return multiple outputs from balance_banks I had it return them in a list.  I removed the increment counter code and the function now returns just one result (the balanced vector), but I left the list syntax in place since it worked.  That’s why you see the $output in last_result <- balance_banks(last_result$output)for instance.

Adding Part 2 onto Part 1 was trivial.  I’d seen Jenny Bryan use diff(range()) in a solution to an earlier puzzle and was stoked to reuse it in Part 2.

I resorted to rbind instead of my beloved dplyr::bind_rows because my data.frames don’t have column names.  I remain tickled by situations where I work with data.frames in R while ignoring column names.

Parts 1 and 2


# Get the index of the largest bank
bank_to_split_loc <- function(x) {
  which(x == max(x))[1] # subset to break ties
expect_equal(bank_to_split_loc(dat), 3)

# Takes a vector and balances the banks by distributing the largest one over the rest
balance_banks <- function(dat){
  loop_amount <- dat[bank_to_split_loc(dat)]
  current_bank <- bank_to_split_loc(dat) + 1
  dat[bank_to_split_loc(dat)] <- 0
  for(i in seq_len(loop_amount)){
    if(current_bank > length(dat)) { current_bank <- 1 } # wrap around the boundary
    dat[current_bank] <- dat[current_bank] + 1
    current_bank <- current_bank + 1
  list(output = dat) # this is unnecessary, but when I misunderstood the problem I was also passing a counter out of this loop
expect_equal(balance_banks(c(0, 2, 7, 0))$output, c(2, 4, 1, 2))

# Takes vector of banks, keeps rebalancing them and checking the resulting vectors until a match is found
# I later added the part 1 or 2 argument to return 
go_til_repeat <- function(input, part = 1){
  # Initialize results data.frame
  results <- data.frame(matrix(ncol = length(input), nrow = 0), stringsAsFactors = FALSE)
  results <- rbind(results, input)
  last_result <- balance_banks(input)
  results <- rbind(results, last_result$output)
  # Print every 1000 rebalances, to confirm making progress
  # I thought the max bank value would decrease over time; it doesn't really, but still works as a progress indicator
  print_counter <- 0
  while(sum(duplicated(results)) == 0){
    if(print_counter %% 1000 == 0 & print_counter > 0){ print(max(last_result$output)) }
    print_counter <- print_counter + 1
    last_result <- balance_banks(last_result$output)
    results <- rbind(results, last_result$output)

  if(part == 1){ # how many total cycles
  } else if (part == 2){ # how many cycles between duplicated rows
    # courtesy of StackOverflow:
    diff(range(which(duplicated(results) | duplicated(results, fromLast = TRUE))))


# These commands will take several minutes to run
day_05_input <- c(5, 1, 10, 0, 1, 7, 13, 14, 3, 12, 8, 10, 7, 12, 0, 6)
go_til_repeat(day_05_input, 1)
go_til_repeat(day_05_input, 2)



Advent of Code 2017 in #rstats: Day 5

This went smoothly.  No tricky parts and Part 2 was a simple extension of Part 1.  No packages needed to solve (I used testthat for checking against the test input).

This was a textbook case for a while loop, which I rarely use in R.

steps_to_leave <- function(vec, allow_decreasing_offsets = FALSE){
  index <- 1
  steps <- 0
    steps <- steps + 1  
    starting_position <- index
    index <- starting_position + vec[index]

    # increase or decrease original offset, updated for part 2
    if(!allow_decreasing_offsets | vec[starting_position] < 3){
      vec[starting_position] <- vec[starting_position] + 1
    } else {
      vec[starting_position] <- vec[starting_position] - 1

# Part 1
test_vec <- c(0, 3, 0, 1, -3)
testthat::expect_equal(steps_to_leave(test_vec, FALSE), 5)
dat <- scan("05_1_dat.txt", numeric(), quote = "")
steps_to_leave(dat) # 364539

# Part 2
testthat::expect_equal(steps_to_leave(test_vec, TRUE), 10)
steps_to_leave(dat, TRUE) # 27477714



Advent of Code 2017 in #rstats: Day 4

Whew, much easier than day 3.  Probably would be a great day to learn the tidytext package but I’ll be faster just muddling through with my current tools.

I wrote a function that compares the length of the input tokens to length of unique() input tokens.  Then for part 2, I added an argument as to whether anagrams should be permitted; if not, the tokens are each sorted alphabetically first into alphagrams (a word borrowed from competitive Scrabble).

Parts 1 & 2 together

I borrowed the string sorting algorithm from StackOverflow – I cited it below and upvoted as well 🙂

p_load(dplyr, stringr, testthat)

check_valid <- function(string, anagrams_okay){
  subs <- unlist(str_split(string, " "))
    subs <- unlist(lapply(subs, make_alphagram)) # for part 2
  length(unique(subs)) == length(subs)

# thanks StackOverflow!

make_alphagram <- function(x){
  paste(sort(unlist(strsplit(x, ""))), collapse = "")

Now it’s just a matter of testing & running:

# Tests
expect_equal(check_valid("aa bb cc dd ee aa", anagrams_okay = TRUE), FALSE)
expect_equal(check_valid("ba bb cc dd ee ab", anagrams_okay = TRUE), TRUE)
expect_equal(check_valid("ba bb cc dd ee ab", anagrams_okay = FALSE), FALSE)

# Part 1
raw <- read.delim("04_1_dat.txt", header = FALSE, stringsAsFactors = FALSE)[[1]] # maybe an inelegant way to read in as a vector...
lapply(raw, check_valid) %>%
  unlist %>%

# Part 2
lapply(raw, check_valid, anagrams_okay = FALSE) %>% unlist %>% sum

Advent of Code 2017 in #rstats: Day 2

After reading this puzzle, I was excited and concerned to see “spreadsheet”:

  1. Excited: R is good at rectangular numeric data, and I work with a lot of spreadsheets
  2. Concerned: Will I be starting with a single long string?  How do I get that into rectangular form?

After considering trying to chop up one long string, I copy-pasted my input to a text file and read.delim() worked on the first try.

Part 1 was simple, Part 2 was tricky.  The idea of row-wise iteration got in my head and I spent a lot of time with apply and lapply… I still hold a grudge against them from battles as a new R user, but having (mostly) bent them to my will I went to them here.  Maybe I should spend more time with the purrr package.

Part 1



ex <- data.frame(
  a = c(5, 7, 2),
  b = c(1,5,4),
  c = c(9,3,6),
  d = c(5, NA, 8)

checksum <- function(x){
  row_max <- apply(x, 1, max, na.rm = TRUE)
  row_min <- apply(x, 1, min, na.rm = TRUE)
  sum(row_max - row_min)

expect_equal(checksum(ex), 18) # works!
my_input <- read.delim("02_1_dat.txt", header = FALSE)

Part 2

Got the idea quickly, but executing took a bit longer.  I decided right away that I would:

  1. Work on a function that tackled a row (as a vector) at a time
  2. The function would divide every value in the vector by every other value in the vector
  3. It would then pick out the integer that wasn’t 1, as well as its reciprocal, and divide them
  4. Then run this function row-wise on the input

Actually building up that function took ~20 minutes.  I felt confident the approach would work so didn’t stop to consider anything else… I’m excited to see other solutions after I publish this, as maybe there’s something elegant I missed.

Along the way I learned: is.integer doesn’t do what I thought it would.  Come on, base R.

I don’t work with matrices much (not a mathematician, don’t have big data) and was a somewhat vexed by the matrix -> data.frame conversion.

# Will apply something row-wise through the data.frame
# Need to get the dividend and divisor

# example data for part 2:
ex2 <- data.frame(
  a = c(5, 9, 3),
  b = c(9,4,8),
  c = c(2,7,6),
  d = c(8, 3, 5)

# start with the top row to play with
x <- ex2[1, ]

# I'll create an x by x matrix with every number in a vector divided by itself
# I want the numbers corresponding to matrix entries containing the sole non-1 integer and its reciprocal.
# Annoying, is.integer() doesn't do what I thought it would, so need to make my own function:
is_integer <- function(x) {
  (x %% 1) == 0

is_recip_integer <- function(x){
  ((1/x) %% 1) == 0

get_quotient <- function(x){
  # create x by x matrix dividing each entry by itself
  divided <- sapply(x, function(a) a/x) %>%
    unlist %>%
    matrix(nrow = sqrt(length(.))) %>% 
  divided[divided == 1] <- NA # don't want the diagonal values
  # Get the position where the vector contains an integer
  bigger_index <- which(
    lapply(divided, is_integer) %>%
      lapply(sum, na.rm = TRUE) %>%
      unlist(.) == 1) # I drafted this function as nested parentheses,
                      # converted to pipes for blogging... it looks weird.
  # Get the position where the vector contains the reciprocal of an integer
  smaller_index <- which(
    lapply(divided, is_recip_integer) %>%
      lapply(sum, na.rm = TRUE) %>%
      unlist(.) == 1)
  # Compute quotient of values at those indices
  x[bigger_index] / x[smaller_index]

# Test
  # apply the function rowwise and sum
  apply(ex2, 1, get_quotient) %>% sum, 

# Get part 2 answer:
apply(my_input, 1, get_quotient) %>% sum



Advent of Code 2017 in #rstats: Day 1

Today’s puzzle is a good example of thinking “R-ishly”.   In R, I find it easier to compare vectors than to compare strings, and easier to work with vectorized functions than to loop.  So my code starts by splitting the input into a vector.

Part 1:

Split the input into a vector, add the first value to the end, and use dplyr’s lead() function to compare each value to the following value:


target &lt;- "112345567891" # should yield 1 + 5 + 1 = 7
target &lt;- str_split(target, "") %&gt;% unlist %&gt;% as.numeric() # split input string into a vector:
target &lt;- c(target, target[1]) # add the first value to the end, to account for wrap-around
sum(target[target == lead(target)], na.rm = TRUE) # sum every number that matches the number following it

Part 2:

This challenge is simpler, in my opinion.  You can halve the vector, then compare the first half to the second half, summing the matches (don’t forget to double that result).

target &lt;- "123425" # should match both 2s for a total of 4
target &lt;- str_split(target, "") %&gt;% unlist %&gt;% as.numeric()
first_half &lt;- target[1:(length(target)/2)]
second_half &lt;- target[-c(1:(length(target)/2))]
sum(first_half[first_half == second_half], na.rm = TRUE) * 2