List comprehension is an
alternative method of writing for
loops or
lapply
(and similar) functions that is readable,
quick, and easy to use. Many newcomers to R struggle with using the
lapply
family of functions and default to using
for
. However, for
loops are extremely slow in
R and should be avoided. Comprehensions in the eList
package bridge the gap by allowing users to write lapply
or
sapply
functions using for
loops.
This vignette describes how to use list and other comprehensions. It
also explores comprehensions with multiple variables,
if-else
conditions, nested comprehensions, and parallel
comprehensions.
Let us start with a simple example: we have a vector of fruit names
and want to create a sequence for each based on the number of
characters. One common method is to write a for
statement
where we loop across each fruit name, create a sequence from the
numbers, and place it in a list.
x <- c("apples", "bananas", "pears")
fruit_chars <- vector("list", length(x))
for (i in seq_along(x)){
fruit_chars[[i]] <- 1:nchar(x[i])
}
fruit_chars
#> [[1]]
#> [1] 1 2 3 4 5 6
#>
#> [[2]]
#> [1] 1 2 3 4 5 6 7
#>
#> [[3]]
#> [1] 1 2 3 4 5
Alternatively, we could use a faster, vectorized function such as
lapply
. This is similar to the for
loop,
except that a specified function is applied to each fruit name and the
results are wrapped in a list.
The eList
package allows for a hybrid variant of the two
using the List()
function. The for
loop
evaluates the expression and merges the results into a list. Beneath the
scenes, List
is parsing the statement into
lapply
and evaluating it. As we will see, though, there are
additional features in List
.
library(eList)
#>
#> Attaching package: 'eList'
#> The following object is masked from 'package:utils':
#>
#> zip
fruit_chars <- List(for (i in x) 1:nchar(i))
If you are coming from Python or another language that uses list
comprehension, you will notice that this syntax is a bit different; it
is written more similar to standard for
loops in R. The
for
statement comes first, then the variable name and
sequence are placed in parentheses. Finally the expression is placed
afterwards. The expression can be wrapped in braces {}
if
desired.
The lists of numbers in the previous examples may be confusing.
Luckily, List
and other comprehension functions allow us to
assign names to each result using the notation:
name = expr
.
List(for (i in x) i = 1:nchar(i))
#> $apples
#> [1] 1 2 3 4 5 6
#>
#> $bananas
#> [1] 1 2 3 4 5 6 7
#>
#> $pears
#> [1] 1 2 3 4 5
The name can be any type of expression. Though complex expressions
should be wrapped in parentheses so that the parser does not confuse who
has the =
sign.
The results of a comprehension can be filtered using standard
if
statements after naming the variables and sequence,
making sure that the condition is placed in parentheses. The statement
below only returns the sequence for "bananas"
.
Any statement that evaluates to NULL
is automatically
filtered from the results. else
statements can also be
included so that the results are not filtered out (unless the
else
statement evaluates to NULL
).
List(for (i in x) if (i == "bananas") "delicious" else "ewww")
#> [[1]]
#> [1] "ewww"
#>
#> [[2]]
#> [1] "delicious"
#>
#> [[3]]
#> [1] "ewww"
Each entry can still be assigned a name. Furthermore, the expression
can be as complex as necessary for the task with any number of
else if
checks.
Comprehensions can have multiple variables if the variables are
separated using "."
within a single name. To see how this
works, let us use the enum
function in the
eList
package. When enum
is used on a
variable, the first value becomes its index in the loop and the second
is the value of the vector at that index. Now, when we use
(i.j in enum(x))
, i =
the index number of each
item in x
, while j =
the value of the item in
x
(the name of the fruit).
Let us inspect enum(x)
to see what is going on beneath
the surface. enum
took the vector x
and
created a new list at each index. The first list contains two elements:
the first being 1
(the index number), the second
being "apples"
(the first value in
x
).
enum(x)
#> [[1]]
#> [[1]][[1]]
#> [1] 1
#>
#> [[1]][[2]]
#> [1] "apples"
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 2
#>
#> [[2]][[2]]
#> [1] "bananas"
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 3
#>
#> [[3]][[2]]
#> [1] "pears"
When multiple variables like i.j
are passed to the
for
loop, then i
is assigned to the first
element and j
to the second element for item in the
sequence. There can be any number of variables as long as they are
separated by "."
. There does not have to be a variable for
each element; but there does need to be an element for each item or else
an “out-of-bounds” error will be produced.
y <- list(a = 1:3, b = 4:6, c = 7:9)
List(for (i.j.k in y) (i+j)/k)
#> $a
#> [1] 1
#>
#> $b
#> [1] 1.5
#>
#> $c
#> [1] 1.666667
Similarly, variables can be skipped. The function below extracts the first and third elements of each item in the list.
If only the first or second item is needed, though, you should use
two dots: i..
for the first, ..i
for the
second.
Another convenient function that can be used on the sequence is
items()
. It is similar to enum
, except that
the name of each item is used instead of its index number. See the
documentation for eList
for other helper functions.
All of the examples so far have returned a list. However,
eList
supports a variety of different types of
comprehension. For example, Num
returns a numeric vector,
while Chr
returns a character vector. Env
can
be used to produce an environment, similar to dictionary comprehensions
in Python. Note that each entry in an environment must have a unique
name. These work by coercing the result into particular type, producing
an error if unable.
Num(for (i.j.k in y) (i+j)/k)
#> a b c
#> 1.000000 1.500000 1.666667
Chr(for (i.j.k in y) (i+j)/k)
#> a b c
#> "1" "1.5" "1.66666666666667"
One convenience function is ..
. It can be used as either
..[for ...]
or ..(for ...)
. By default, it
attempts to simplify the results in an array, but can mimic any other
type of comprehension.
Multiple for
loops can be used within a comprehension.
As long the subsequent for
statements immediately follow
the variables & sequence of the previous one, or immediately follow
a if-else
statement, then it will be parsed into a
vectorized lapply
style function. Nested loops should be
avoided unless necessary since they can be difficult to understand. The
following are a couple examples using matrix and numeric
comprehension.
Mat(for (i in 1:3) for (j in 1:6) i*j)
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 2 4 6
#> [3,] 3 6 9
#> [4,] 4 8 12
#> [5,] 5 10 15
#> [6,] 6 12 18
Vector Comprehensions can also be nested within each other. The code below nests a numeric comprehension within a character comprehension.
Vector comprehensions also for parallel computations using the
parallel
package. All comprehensions allow the user to
supply a cluster, which activates the parallel version of
sapply
or lapply
. eList
comes
with a function that allows for the quick creation of a cluster based on
the user’s operating system and by auto-detecting the number of cores
available. Users are recommended to explicitly create a cluster though
the functions in the parallel
package unless a quick
parallel calculation is needed.
cluster <- auto_cluster(2)
Num(for (i in 1:100) sample(1:100, 1), clust=cluster)
close_cluster(cluster)
Note that the additional overhead involved with parallelization means that the gain in performance will be relatively small (and often negative) unless the sequence is sufficiently large. Large comprehensions, though, can experience significant gains by specifying a cluster.
The eList
package contains a number of summary functions
that accept vector comprehensions and/or normal vectors. These functions
allow users to combine multiple comprehensions and other vectors in a
single function, then apply a summary function to all entries. Each
summary comprehension accepts a cluster for parallel computations and
the na.rm
argument. Some examples:
# Are no values TRUE? (combines comprehension with other values)
None(for (i in 1:10) i < 0, TRUE, FALSE)
#> [1] FALSE
# Summary statistics from a random draw of 1000 observations from normal distribution
Stats(for (i in rnorm(1000)) i)
#> $min
#> [1] -3.159345
#>
#> $q1
#> [1] -0.6334384
#>
#> $med
#> [1] 0.03214575
#>
#> $q3
#> [1] 0.7308685
#>
#> $max
#> [1] 3.592817
#>
#> $mean
#> [1] 0.04593045
#>
#> $sd
#> [1] 1.017995