The apply() Family of Functions in R


The apply() family of functions in R is a powerful tool for applying operations to data structures like matrices, data frames, and lists. These functions help you write concise and efficient code by avoiding explicit loops. Here’s what we’ll cover:

    1. Introduction: A brief overview of the apply() family and why it’s important in R programming.

    2. The Basic Syntax: A detailed explanation of the syntax and parameters for apply(), lapply(), and sapply().

    3. The Examples: Practical code examples to demonstrate how each function works.

    4. The Case of Using: Real-world scenarios where these functions can be applied effectively.

    5. Key Points: A summary of the main takeaways and best practices for using these functions.

    6. The Meaning: A reflection on the significance of the apply() family in R programming.

    7. Conclusion: A wrap-up encouraging readers to practice and explore these functions further.

1. Introduction

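The apply() family is a set of base R functions that call a function you supply on each row, column, or element of a data structure and collect the results. They matter because they replace the usual loop boilerplate (creating a result container, indexing, assigning) with a single call. This article focuses on apply(), lapply(), and sapply().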

2. The Basic Syntax

The apply() family includes functions like apply(), lapply(), and sapply().

The general purpose of these functions is to apply a function to data structures like matrices, data frames, or lists.

The basic syntax for each function:

apply(X, MARGIN, FUN, ...)
lapply(X, FUN, ...)
sapply(X, FUN, ...)

The parameters:

    • X: The input data. apply() expects a matrix or array (a data frame is converted to a matrix first); lapply() and sapply() expect a vector or list.
    • MARGIN: For apply() only. MARGIN = 1 applies the function to rows; MARGIN = 2 applies it to columns.
    • FUN: The function to apply.
    • ...: Additional arguments passed on to FUN.
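
To see how the extra arguments are forwarded to FUN, here is a minimal sketch. The matrix m and the choice of paste() are illustrative assumptions, not part of the examples that follow.

```r
# Illustrative data: a 2 x 3 matrix filled column by column
m <- matrix(1:6, nrow = 2)

# collapse = "-" is not an argument of apply(); it is forwarded to paste()
row_labels <- apply(m, 1, paste, collapse = "-")
print(row_labels)
# [1] "1-3-5" "2-4-6"

# The same forwarding works for lapply() and sapply()
list_labels <- sapply(list(1:3, 4:6), paste, collapse = "-")
print(list_labels)
# [1] "1-2-3" "4-5-6"
```

Anything you write after FUN is handed to FUN unchanged on every call, so you rarely need a wrapper function just to set one option.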

3. The Examples

Let’s dive into some practical examples to understand how these functions work.

Example for apply()

# Apply max function to columns of a matrix

matrix_data <- matrix(1:9, nrow = 3)
apply(matrix_data, 2, max)
    1. matrix_data: this creates a 3×3 matrix, filled column by column.

           [,1] [,2] [,3]
      [1,]    1    4    7
      [2,]    2    5    8
      [3,]    3    6    9

    2. apply(matrix_data, 2, max): the apply() function is used to apply the max function to each column of the matrix (because MARGIN = 2).

    3. It calculates the maximum value for each column:

      • Column 1: max(1, 2, 3) = 3

      • Column 2: max(4, 5, 6) = 6

      • Column 3: max(7, 8, 9) = 9

Result:

[1] 3 6 9

Explanation: The apply() function calculates the maximum value for each column of the matrix. 

    • MARGIN = 2: Apply the function column-wise (i.e., to each column of the matrix).

    • If MARGIN = 1, the function would be applied row-wise (i.e., to each row of the matrix).

    • If the input is a higher-dimensional array, you can use MARGIN = 3, MARGIN = 4, etc., to apply the function along other dimensions.
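
The other MARGIN values can be sketched the same way. The 3-D array arr below is an assumed example, chosen only to show MARGIN = 3.

```r
matrix_data <- matrix(1:9, nrow = 3)

# MARGIN = 1: one result per row
row_max <- apply(matrix_data, 1, max)
print(row_max)
# 7 8 9

# A 2 x 3 x 4 array: four "slices", each a 2 x 3 matrix
arr <- array(1:24, dim = c(2, 3, 4))

# MARGIN = 3: one result per slice along the third dimension
slice_sums <- apply(arr, 3, sum)
print(slice_sums)
# 21 57 93 129
```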

Example for lapply()

# Apply a function to each element of a list

numbers <- list(1, 2, 3, 4)
squares <- lapply(numbers, function(x) x^2)
print(squares)

Result:

[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

Explanation: The lapply() function applies the square function (x^2) to each element of the list numbers. The output is a list where each element is the square of the corresponding input.
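
Because a data frame is itself a list of columns, lapply() also works column by column on a data frame and keeps the names. A small sketch, with df as an assumed example:

```r
# Illustrative data frame; stringsAsFactors is set explicitly so the
# result is the same on older R versions
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# One call to class() per column; the result is a named list
col_classes <- lapply(df, class)
print(col_classes)
# $id
# [1] "integer"
#
# $name
# [1] "character"
```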

Example for sapply()

# Simplify the output of lapply() to a vector

squared_vector <- sapply(numbers, function(x) x^2)
print(squared_vector)

Result:

[1]  1  4  9 16

Explanation: The sapply() function simplifies the output of lapply() into a numeric vector. Each element of the vector is the square of the corresponding input.
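
The shape sapply() returns depends on what FUN returns. The sketch below reuses the numbers list; the names both and ragged are illustrative.

```r
numbers <- list(1, 2, 3, 4)

# FUN returns two values per element, so sapply() builds a matrix
# with one column per input element
both <- sapply(numbers, function(x) c(value = x, square = x^2))
print(dim(both))
# [1] 2 4
print(both["square", ])
# [1]  1  4  9 16

# FUN returns results of different lengths, so nothing can be simplified
# and sapply() falls back to a list, just like lapply()
ragged <- sapply(1:3, seq_len)
print(is.list(ragged))
# [1] TRUE
```

This fallback is worth remembering: code that expects a vector from sapply() can silently receive a list instead.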

4. The Case of Using

These functions are incredibly useful in real-world scenarios. Here are some examples:

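    • Summarizing Data: Use apply() to calculate row or column means, sums, or other statistics for a matrix or an all-numeric data frame. (apply() converts a data frame to a matrix first, so mixed column types are coerced to character.)
    • Iterating Over Lists: Use lapply() to clean or transform multiple datasets stored in a list.
    • Simplifying Repetitive Tasks: Use sapply() to run the same calculation on every element of a vector or list and get a vector back, without writing an explicit loop.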
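
Here is a minimal sketch of these scenarios. The data (scores and datasets) is made up for illustration.

```r
# Summarizing data: an all-numeric data frame
scores <- data.frame(math = c(80, 90, 70), art = c(60, 75, 90))
col_means  <- apply(scores, 2, mean)   # math = 80, art = 75
row_totals <- apply(scores, 1, sum)    # 140 165 160

# Iterating over lists: drop incomplete rows from every dataset
datasets <- list(
  first  = data.frame(x = c(1, NA, 3)),
  second = data.frame(x = c(NA, 5, 6))
)
cleaned <- lapply(datasets, na.omit)

# Simplifying repetitive tasks: count the rows left in each dataset
rows_left <- sapply(cleaned, nrow)     # first = 2, second = 2
print(rows_left)
```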

5. Key Points

Here are the key takeaways about the apply() family of functions:

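    • apply(): Works on matrices and arrays (data frames are converted to a matrix first); requires a MARGIN, such as rows (1) or columns (2).
    • lapply(): Works on lists and vectors, including data frames (which are lists of columns); always returns a list.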
    • sapply(): Simplifies the output of lapply() to a vector or matrix when possible.

Best Practices:

    • Pass na.rm = TRUE as an extra argument when FUN supports it (mean(), sum(), max(), and similar) to ignore missing values.
    • Prefer sapply() when you need simplified output.
    • Use lapply() when working with lists and preserving the list structure is important.
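
The effect of na.rm = TRUE is easiest to see side by side. The matrix m below is an assumed example with one missing value.

```r
# Row 1 is 1, 3, 5 and row 2 is NA, 4, 6
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)

# Without na.rm, any row containing NA gives NA
with_na <- apply(m, 1, mean)
print(with_na)
# [1]  3 NA

# na.rm = TRUE is forwarded to mean(), which then skips the NA
without_na <- apply(m, 1, mean, na.rm = TRUE)
print(without_na)
# [1] 3 5
```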

6. The Meaning

The apply() family of functions is foundational for functional programming in R. These functions:

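    • Promote concise code by replacing explicit loops with a single call.
    • Are not themselves vectorized: they still call FUN once per element, so they are usually about as fast as a well-written for loop. When a truly vectorized alternative exists (for example x^2, rowSums(), or colMeans()), it is generally faster.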
    • Make your code more readable and maintainable.

Mastering these functions can significantly improve your data analysis workflows.
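
To make the comparison with explicit loops concrete, here is the same calculation written three ways. This is only a sketch of the styles; it makes no timing claims, and the variable names are illustrative.

```r
x <- 1:5

# 1. Explicit loop with a preallocated result vector
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i]^2
}

# 2. sapply(): same work, no bookkeeping
sapply_result <- sapply(x, function(v) v^2)

# 3. Vectorized arithmetic: no per-element function call at all
vec_result <- x^2

print(vec_result)
# [1]  1  4  9 16 25
```

All three give the same numbers; the difference is how much code you write and how much work R does per element.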

7. Conclusion

The apply() family of functions is a must-know for anyone working with R. Whether you’re summarizing data, iterating over lists, or simplifying repetitive tasks, these functions can save you time and effort.

Next Steps:

    • Practice using apply(), lapply(), and sapply() in your own projects.
    • Explore related functions like tapply(), mapply(), and vapply().
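
As a starting point for exploring them, here is a short sketch of each related function. The inputs are made up for illustration.

```r
# vapply(): like sapply(), but you declare the expected result type,
# and it raises an error if FUN returns anything else
squares <- vapply(list(1, 2, 3, 4), function(x) x^2, numeric(1))
print(squares)
# [1]  1  4  9 16

# mapply(): walks over several inputs in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
print(sums)
# [1] 5 7 9

# tapply(): applies FUN to groups defined by a second vector
group_sums <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), sum)
print(group_sums)
#  a  b
# 40 60
```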

Happy coding!

The Fun in Functional Programming

After some years as a Stata user, I found myself in a new position where the tools available were SQL and SPSS. I was impressed by the power of SQL, but I was unhappy with going back to SPSS after five years with Stata.

Luckily, I got the go-ahead from my leaders at the department to start testing out R as a tool to supplement SQL in data handling. This was at the beginning of 2020, and by March we were having a social gathering at our workplace. A Bingo night! Which turned out to be the last social event before the pandemic lockdown.

What better opportunity to learn a new programming language than to program some bingo cards! I learnt a lot from this little project. It uses the packages grid and gridExtra to prepare and embellish the cards.

The function BingoCard draws the cards and is called from the function Bingo3. When Bingo3 is called, it runs BingoCard once for every card needed (three cards per sheet), arranges the cards on the requested number of sheets, and stores the result as a PDF inside a folder defined at the beginning of the script.

All steps could have been added together in a single function. For instance, a more complete function could have included input for the color scheme of the cards, the number of cards on each sheet and more advanced features for where to store the results. Still, this worked quite well, and was an excellent way of learning since it was both so much fun and gave me the opportunity to talk enthusiastically about R during Bingo Night.

library(gridExtra)
library(grid)
# ggplot2 must also be installed: ggplot2::ggsave() is called in Bingo3

##################################################################
# Be sure to have a folder where results are stored
##################################################################

CardFolder <- "BingoCards"

if (!dir.exists(CardFolder)) {dir.create(CardFolder)}


##################################################################
# Create a theme to use for the cards
##################################################################

thema <- ttheme_minimal(
  base_size = 24,
  padding = unit(c(6, 6), "mm"),
  core = list(
    bg_params = list(fill = rainbow(5), alpha = 0.5, col = "black"),
    fg_params = list(fontface = "plain", col = "darkblue")
  ),
  colhead = list(fg_params = list(col = "darkblue")),
  # Row names are drawn in white, presumably so they do not show on the page
  rowhead = list(fg_params = list(col = "white"))
)


##################################################################
##  Define the function BingoCard
##################################################################

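BingoCard <- function() {

  # Draw five distinct numbers for each column from its own range
  B <- sample(1:15,  5, replace = FALSE)
  I <- sample(16:30, 5, replace = FALSE)
  N <- sample(31:45, 5, replace = FALSE)
  G <- sample(46:60, 5, replace = FALSE)
  O <- sample(61:75, 5, replace = FALSE)

  # Named "card" so the data frame does not shadow the function name
  card <- data.frame(B, I, N, G, O)

  # Mark the free square in the middle (this turns column N into character)
  card[3, "N"] <- "X"

  # Return the card as a table grob styled with the theme defined above
  tableGrob(card, theme = thema)
}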

##################################################################
##  Define the function Bingo3
##  The function has two arguments  
##  By default, 1 sheet with 3 cards is stored in the CardFolder   
##  The default name is "bingocards.pdf"
##  This function calls the BingoCard function
##################################################################

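Bingo3 <- function(NumberOfSheets = 1, SaveFileName = "bingocards") {

  # Three cards per sheet; build them all with lapply() instead of
  # growing a list inside a for loop
  n_cards <- NumberOfSheets * 3
  myplots <- lapply(seq_len(n_cards), function(i) BingoCard())

  # Lay the cards out three to a page
  ml <- marrangeGrob(myplots, nrow = 3, ncol = 1, top = "")

  save_here <- file.path(CardFolder, paste0(SaveFileName, ".pdf"))

  # 210 mm x 297 mm is A4 portrait
  ggplot2::ggsave(save_here, ml, device = "pdf", width = 210,
                  height = 297, units = "mm")
}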

##################################################################
##  Run Bingo3 with default values
##################################################################

Bingo3()

##################################################################
##  Run Bingo3 with custom values
##################################################################

Bingo3(NumberOfSheets = 30, SaveFileName = "30_BingoCards")
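
The number-drawing step inside BingoCard can also be expressed with sapply() and lapply(). Here is a minimal base-R sketch (no grid or gridExtra needed). The helper draw_card_numbers, the ranges list, and the set.seed() call are illustrative assumptions, not part of the original script.

```r
# Hypothetical helper: draws the 5 x 5 grid of numbers for one card
draw_card_numbers <- function() {
  ranges <- list(B = 1:15, I = 16:30, N = 31:45, G = 46:60, O = 61:75)
  # sample() is called once per column range; size = 5 is forwarded to it,
  # and sampling is without replacement by default
  sapply(ranges, sample, size = 5)
}

set.seed(2020)  # assumption: fixed seed only so the sketch is reproducible
numbers <- draw_card_numbers()
print(dim(numbers))
# [1] 5 5
print(colnames(numbers))
# [1] "B" "I" "N" "G" "O"

# Numbers for the three cards of one sheet, kept as a list
sheet <- lapply(seq_len(3), function(i) draw_card_numbers())
print(length(sheet))
# [1] 3
```

Each matrix could then be converted with as.data.frame() and passed on to tableGrob() in the same way as in BingoCard.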