Intermediate programming with R
Debugging with browser
Learning Objectives
- Use
browser
to set breakpoints - Use
browser
to set conditional breakpoints
In the last lesson we used debug
to enter into a function’s environment for interactive debugging. However, if we have an idea where the bug is located, we can use the function browser
to set a “breakpoint” in that location. This prevents us from having to step through each line of a function to reach the point where the problem is located. Furthermore, we can use a conditional statement to only activate the debugger when a certain condition is true (especially useful for long for
loops).
We’ll start with the function we updated in the last lesson.
mean_metric_per_var <- function(metric, variable) {
if (!is.factor(variable)) {
variable <- as.factor(variable)
}
variable <- droplevels(variable)
result <- numeric(length = length(levels(variable)))
names(result) <- levels(variable)
for (v in levels(variable)) {
result[v] <- mean(metric[variable == v])
}
return(result)
}
And we’ll focus on fixing the following behavior:
mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
pbio pcbi pgen pmed pntd pone ppat
0.2041683 0.1906311 0.2006783 0.2293156 0.2297650 NA 0.1893234
Our function returns NA
for the total number of Facebook likes for PLOS One. Why is this happening? It is correctly identifying all the journals, so the first section of code appears to be working correctly. The totals per journal are computed within the for
loop, so that is likely where the problem is originating. Since we suspect the problem is occuring during the for
loop, we’ll set the breakpoint there with browser
instead of starting from the beginning of the function using debug
.
mean_metric_per_var <- function(metric, variable) {
if (!is.factor(variable)) {
variable <- as.factor(variable)
}
variable <- droplevels(variable)
result <- numeric(length = length(levels(variable)))
names(result) <- levels(variable)
for (v in levels(variable)) {
browser()
result[v] <- mean(metric[variable == v])
}
return(result)
}
Now the next time we call the function, we are dropped into the debugger at the breakpoint set by browser
.
mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
Called from: mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
Browse[1]>
Let’s confirm that the beginning of the function has already been run.
Browse[1]> ls()
[1] "metric" "result" "v" "variable"
Furthermore, we can check the current value of v
in the for
loop.
Browse[1]> v
[1] "pbio"
We want to see what is happening when v
is "pone"
. As we did before we could step through line by line using n
. But using this approach, each time through the loop we would have to type n
multiple times to run the lines of code in the loop. This would be even worse if there were many lines of code inside. Instead, we can use c
for “continue”, which continues running the code until the next time browser
is called.
Browse[1]> c
Called from: mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
Browse[1]> v
[1] "pcbi"
Browse[1]> c
Called from: mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
Browse[2]> v
[1] "pgen"
But this really isn’t much better, especially if we run through the for
loop multiple times as we attempt to debug the function. Let’s quit the debugger and try a new strategy.
Browse[2]> Q
We can set a conditional breakpoint using an if
statement. Then we will be dropped into the interactive debugger only when the condition is true. We want to enter the debugger when v == "pone"
.
mean_metric_per_var <- function(metric, variable) {
if (!is.factor(variable)) {
variable <- as.factor(variable)
}
variable <- droplevels(variable)
result <- numeric(length = length(levels(variable)))
names(result) <- levels(variable)
for (v in levels(variable)) {
if (v == "pone") {
browser()
}
result[v] <- mean(metric[variable == v])
}
return(result)
}
mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
Called from: mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
Browse[1]> v
[1] "pone"
Now we entered into the debugger after the loop has reached “pone”. Let’s inspect the variable the values being passed to mean
. Specifically, let’s see all the unique values.
Browse[1]> unique(metric[variable == v])
[1] 0 2 1 6 5 3 10 7 4 8 35 12 34 9 37 16 14 25 892 22
[21] 11 19 18 13 21 49 15 41 50 17 51 104 23 20 26 151 30 39 NA 24
[41] 95 44 109 66
Interestingly, at least one of the Facebook Like counts are NA. Is this different from the other journals? Let’s check “pbio”.
Browse[1]> anyNA(metric[variable == "pbio"])
[1] FALSE
It does not contain any NA
s, so this is likely the problem.
Let’s first exit the debugging environment.
Browse[1]> Q
And then check the help for mean
to see if we can figure out what is going on (remember you can also press the F1
key to see a function’s help page).
?mean
From the help page, we see that mean
has an argument na.rm
to remove NA
s.
na.rm a logical value indicating whether NA values should be stripped before the computation proceeds.
Otherwise by default, the function returns NA
if any of the values are NA
.
Let’s update the function so that mean
removes NA
values. At this point we can also remove the call to browser
.
mean_metric_per_var <- function(metric, variable) {
if (!is.factor(variable)) {
variable <- as.factor(variable)
}
variable <- droplevels(variable)
result <- numeric(length = length(levels(variable)))
names(result) <- levels(variable)
for (v in levels(variable)) {
result[v] <- mean(metric[variable == v], na.rm = TRUE)
}
return(result)
}
And now the function works properly when passed an NA
!
mean_metric_per_var(counts_raw$facebookLikeCount, counts_raw$journal)
pbio pcbi pgen pmed pntd pone ppat
0.2041683 0.1906311 0.2006783 0.2293156 0.2297650 0.3519648 0.1893234