Intermediate programming with R
Debugging with recover
Learning Objectives
- Use
recover
to debug when code crashes
Recall the function we wrote earlier to calculate a summary statistic which was the average of multiple specified metric columns.
calc_sum_stat <- function(df, cols) {
df_sub <- df[, cols]
sum_stat <- apply(df_sub, 1, mean)
return(sum_stat)
}
Then we could create new metric columns, for example, the average of the multiple Facebook metrics or the average of the download metrics.
counts_raw$facebookAverageCount <- calc_sum_stat(counts_raw, grep("facebook",
colnames(counts_raw),
value = TRUE))
counts_raw$downloadsAverageCount <- calc_sum_stat(counts_raw, grep("Downloads",
colnames(counts_raw),
value = TRUE))
Now what do we expect would happen if we passed only one column name to the function? While this would not be very informative since taking the average of one metric would return the same metric. But there is no reason to think it should not work. Let’s try it.
calc_sum_stat(counts_raw, "mendeleyReadersCount")
Error in apply(df_sub, 1, mean): dim(X) must have a positive length
But in fact this does fail. The error message informs us that the error occurs during the call to apply
. While we could use debug
or browser
to investigate this error, a convenient way to interrogate an error is to use the function recover
. Specifically, we can set the option error
so that the function recover
is called any time there is an error.
options(error = recover)
Now let’s run the function again to produce the error:
calc_sum_stat(counts_raw, "mendeleyReadersCount")
Error in apply(df_sub, 1, mean): dim(X) must have a positive length
Enter a frame number, or 0 to exit
1: calc_sum_stat(counts_raw, "mendeleyReadersCount")
2: #3: apply(df_sub, 1, mean)
Selection:
Essentially, recover
allows us to explore the state of the code right before the error was thrown. It first asks us to select a frame number. The frames refer to the different environments that were created. First we called our function calc_sum_stat
which has its own environment. But within that function, apply
was called creating an additional environment within the environment of calc_sum_stat
. These frames make up the call stack, which we can visualize as the list of sub-functions that have been called.
Since we know the error occured in apply
, we’ll choose frame #2.
Selection: 2
Called from: calc_sum_stat(counts_raw, "mendeleyReadersCount")
Browse[2]>
This brings us to the familiar debugger environment. Since the state is frozen we can’t actually use these commands, but in fact typing one of these commands will bring us back to the frame menu (simply hitting the Enter
key works as well). But we can investigate the state of the variables.
Browse[1]> ls()
[1] "dl" "FUN" "MARGIN" "X"
To remind ourselves of the argument names of apply
, we can open the help page.
Browse[1]> ?apply
This opens the Help tab, where we can read that X
contains the data passed to apply
:
X an array, including a matrix.
Investigating what X is, we see that is a vector instead of a one-column data frame.
Browse[1]> str(X)
int [1:24331] 4 17 0 0 32 10 0 6 2 24 ...
This is strange. Let’s confirm this is what we passed to apply
. Type c
to exit the debugger and return to the frame menu.
Browse[1]> c
Enter a frame number, or 0 to exit
1: calc_sum_stat(counts_raw, "mendeleyReadersCount")
2: #3: apply(df_sub, 1, mean)
Selection:
Now we’ll enter frame #1 to investigate df_sub
.
Selection: 1
Browse[3]> str(df_sub)
int [1:24331] 4 17 0 0 32 10 0 6 2 24 ...
So it turns out that the first line that selects the one column returns a vector and not a data frame. This is actually the default behavior of R’s extract function (yes, the brackets are actually a function, try ?"["
to learn more).
Let’s exit recover
so that we can return to fix our function.
Browse[1]> c
Enter a frame number, or 0 to exit
1: calc_sum_stat(counts_raw, "mendeleyReadersCount")
2: #3: apply(df_sub, 1, mean)
Selection: 0
To fix this problem, we need to pass a third argmument when subsetting a data frame. If we set drop = FALSE
, then it will remain a one column data frame instead of being converted to a vector.
calc_sum_stat <- function(df, cols) {
df_sub <- df[, cols, drop = FALSE]
sum_stat <- apply(df_sub, 1, mean)
return(sum_stat)
}
And now it works.
head(calc_sum_stat(counts_raw, "mendeleyReadersCount"))
[1] 4 17 0 0 32 10
Unfortunately you will not always be able to easily solve the problem yourself. But since you know how to isolate a problem using the debugging tools, you are prepared to ask for help on online forums. For best results, follow advice for creating a minimal, reproducible example: