It is usually said that for- and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed.
The first loop is perhaps the worst I can think of: the return vector is initialized without a type or length, so memory has to be reallocated on every iteration.
use_for_loop <- function(x){
  # Start from an empty vector and grow it on every iteration
  y <- c()
  for(i in seq_along(x)){
    y <- c(y, x[i] * 100)
  }
  return(y)
}
The second for-loop preallocates the return vector at its full length.
use_for_loop_vector <- function(x){
  # Allocate the full result once, then fill it in place
  y <- vector(mode = "double", length = length(x))
  for(i in seq_along(x)){
    y[i] <- x[i] * 100
  }
  return(y)
}
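Index-based loops like these are easy to get subtly wrong. A common slip is writing for(i in x), which binds i to the values of x rather than to their positions; R then truncates each non-integer value toward zero when it is used as an index. A small base-R sketch (the toy vector is made up for illustration) of what that does and how seq_along() avoids it:

```r
x <- c(2.5, -1.2, 0.3)

# seq_along(x) yields the positions 1, 2, 3 ...
positions <- seq_along(x)

# ... whereas `for (i in x)` would bind i to 2.5, -1.2 and 0.3.
# Numeric indices are truncated toward zero, so values used as indices
# silently select something unintended:
print(x[2.5])   # same as x[2]: a single element
print(x[0.3])   # same as x[0]: an empty vector
print(x[-1.2])  # same as x[-1]: everything except the first element

# Correct index-based loop over a preallocated result
y <- numeric(length(x))
for (i in seq_along(x)) {
  y[i] <- x[i] * 100
}
print(y)  # 250 -120 30
```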
I have noticed that I use sapply() quite a lot, but I don't think I have ever used vapply(). We will nonetheless look at both.
use_sapply <- function(x){
sapply(x, function(y){y * 100})
}
use_vapply <- function(x){
  # double(1L) declares that every call must return a single double
  vapply(x, function(y){y * 100}, double(1L))
}
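Speed aside, the third argument of vapply() (FUN.VALUE, here double(1L)) is a template every result must match, whereas sapply() infers the output shape from whatever the function happens to return. A small base-R sketch of two places where that difference shows up (times100() and as_text() are hypothetical helpers for illustration):

```r
times100 <- function(y) y * 100

# Empty input: sapply() has nothing to simplify and falls back to a list,
# while vapply() returns a zero-length double vector as declared
empty_sapply <- sapply(numeric(0), times100)
empty_vapply <- vapply(numeric(0), times100, double(1L))
str(empty_sapply)  # list()
str(empty_vapply)  # num(0)

# Wrong return type: sapply() silently returns character strings, while
# vapply() stops with an error because the results do not match double(1L)
as_text <- function(y) format(y)
res_sapply <- sapply(c(1, 2), as_text)
res_vapply <- tryCatch(
  vapply(c(1, 2), as_text, double(1L)),
  error = function(e) "vapply() raised an error"
)
print(res_sapply)
print(res_vapply)
```

This is why vapply() is often recommended inside functions: the output type is guaranteed rather than inferred.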
And because I am a tidyverse fanboy, we also look at map_dbl().
library(purrr)
use_map_dbl <- function(x){
map_dbl(x, function(y){y * 100})
}
We test the functions on a vector of random doubles and measure the run time with microbenchmark.
x <- rnorm(100)
mb_res <- microbenchmark::microbenchmark(
`for_loop()` = use_for_loop(x),
`for_loop_vector()` = use_for_loop_vector(x),
`purrr::map_dbl()` = use_map_dbl(x),
`sapply()` = use_sapply(x),
`vapply()` = use_vapply(x),
times = 500
)
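Before trusting the timings, it is worth checking that the implementations actually agree. A minimal base-R sketch, assuming the vectorised expression x * 100 as the reference (purrr is left out only to keep the snippet dependency-free, set.seed() is added for reproducibility, and loop_prealloc() is a hypothetical name):

```r
set.seed(42)
x <- rnorm(100)

# Index-based loop over a preallocated result (seq_along gives positions)
loop_prealloc <- function(x) {
  y <- numeric(length(x))
  for (i in seq_along(x)) {
    y[i] <- x[i] * 100
  }
  y
}

results <- list(
  for_loop_vector = loop_prealloc(x),
  sapply          = sapply(x, function(y) y * 100),
  vapply          = vapply(x, function(y) y * 100, double(1L))
)

# TRUE for every implementation whose output equals the vectorised reference
agrees <- vapply(results, function(r) isTRUE(all.equal(r, x * 100)), logical(1L))
print(agrees)
```

Recent versions of microbenchmark() also accept a check argument that performs a similar comparison as part of the benchmark.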
The results are listed in the table and figure below.
| expr | min (µs) | lq (µs) | mean (µs) | median (µs) | uq (µs) | max (µs) | neval |
|---|---|---|---|---|---|---|---|
| for_loop() | 8.440 | 9.7305 | 10.736446 | 10.2995 | 10.9840 | 26.976 | 500 |
| for_loop_vector() | 10.912 | 12.1355 | 13.468312 | 12.7620 | 13.8455 | 37.432 | 500 |
| purrr::map_dbl() | 22.558 | 24.3740 | 25.537080 | 25.0995 | 25.6850 | 71.550 | 500 |
| sapply() | 15.966 | 17.3490 | 18.483216 | 18.1820 | 18.8070 | 59.289 | 500 |
| vapply() | 6.793 | 8.1455 | 8.592576 | 8.5325 | 8.8300 | 26.653 | 500 |
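To put the medians in perspective, we can divide each one by the fastest. The numbers are copied from the table for 100 elements; this is plain arithmetic, not a new measurement:

```r
# Median run times for 100 elements, copied from the table
medians_100 <- c(
  "for_loop()"        = 10.2995,
  "for_loop_vector()" = 12.7620,
  "purrr::map_dbl()"  = 25.0995,
  "sapply()"          = 18.1820,
  "vapply()"          = 8.5325
)

# How many times slower than the fastest method each one is
relative <- round(medians_100 / min(medians_100), 2)
print(relative)
```

By median, purrr::map_dbl() takes roughly three times as long as vapply() here, and the preallocated for-loop about one and a half times as long.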
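The clear winner is vapply(). Contrary to their reputation, though, the for-loops are not the slowest here: by median, both beat sapply() and purrr::map_dbl() in this run. And with a very low number of iterations, even the worst for-loop holds up well: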
x <- rnorm(10)
mb_res <- microbenchmark::microbenchmark(
`for_loop()` = use_for_loop(x),
`for_loop_vector()` = use_for_loop_vector(x),
`purrr::map_dbl()` = use_map_dbl(x),
`sapply()` = use_sapply(x),
`vapply()` = use_vapply(x),
times = 500
)
| expr | min (µs) | lq (µs) | mean (µs) | median (µs) | uq (µs) | max (µs) | neval |
|---|---|---|---|---|---|---|---|
| for_loop() | 5.992 | 7.1185 | 9.670106 | 7.9015 | 9.3275 | 70.955 | 500 |
| for_loop_vector() | 5.743 | 7.0160 | 9.398098 | 7.9575 | 9.2470 | 40.899 | 500 |
| purrr::map_dbl() | 22.020 | 24.1540 | 30.565362 | 25.1865 | 27.5780 | 157.452 | 500 |
| sapply() | 15.456 | 17.4010 | 22.507534 | 18.3820 | 20.6400 | 203.635 | 500 |
| vapply() | 6.966 | 8.1610 | 10.127994 | 8.6125 | 9.7745 | 66.973 | 500 |
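Comparing the two runs method by method is also telling. Dividing the median for 100 elements by the median for 10 elements (values copied from the two result tables) gives a rough growth factor per method:

```r
# Median run times copied from the two tables (10 and 100 elements)
medians_10 <- c(
  "for_loop()"        = 7.9015,
  "for_loop_vector()" = 7.9575,
  "purrr::map_dbl()"  = 25.1865,
  "sapply()"          = 18.3820,
  "vapply()"          = 8.6125
)
medians_100 <- c(
  "for_loop()"        = 10.2995,
  "for_loop_vector()" = 12.7620,
  "purrr::map_dbl()"  = 25.0995,
  "sapply()"          = 18.1820,
  "vapply()"          = 8.5325
)

# Growth factor of the median when going from 10 to 100 elements
growth <- round(medians_100 / medians_10, 2)
print(growth)
```

The apply-style functions barely change between the two sizes, which suggests that at these input lengths their run time is dominated by a fixed per-call overhead rather than per-element work, while the for-loops grow with the number of iterations.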