Effective Data Visualization in R in Scientific Contexts workshop

Join our workshop on Effective Data Visualization in R in Scientific Contexts, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title: Effective Data Visualization in R in Scientific Contexts

Date: Thursday, April 10th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Christian Gebhard is a specialist in medical genetics. His daily practice of communicating complex scientific facts to both laypersons and healthcare professionals has fostered a deep passion for clear information presentation and effective data visualization. Striving for both clarity and reproducibility, he primarily utilizes R and ggplot2 to create impactful and accessible visualization of scientific data.

Description: The workshop will start by establishing a structured approach to transforming complex data into clear, informative visual representations. We’ll address common challenges and visualization pitfalls in different presentation formats. This part is applicable across different scientific fields and independent of visualization tools. The second part applies those principles to real-world examples using R and ggplot2. Participants will gain hands-on experience applying the learned principles to improve data communication in various presentation settings.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



Please note that the registration confirmation is sent 1 day before the workshop to all registered participants rather than immediately after registration


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!










 



Devops for Data Scientists (R & Python) workshop

Join our workshop on Devops for Data Scientists (R & Python), which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Devops for Data Scientists (R & Python)

Date: Thursday, April 3rd, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Rika Gorn is a Senior Platform Engineer at Posit where she helps customers and organizations create infrastructure for data analytics and data science. Her background is in data science and data engineering. 

Description: In this workshop we will learn the key principles of DevOps and problems which it intends to solve for data scientists. We will discuss how DevOps practices such as CI/CD enhance collaboration, automation, and reproducibility. We will learn common workflows for environment management, package management, containerization, monitoring & logging, and version control. Participants will get hands-on experience with a variety of tools including Docker, Github Actions, and APIs.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



Please note that the registration confirmation is sent 1 day before the workshop to all registered participants rather than immediately after registration


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!










 



Frame-by-Frame Modeling and Validation of NFL geospatial data using gganimate in R workshop

Join our workshop on Frame-by-Frame Modeling and Validation of NFL geospatial data using gganimate in R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Frame-by-Frame Modeling and Validation of NFL geospatial data using gganimate in R

Date: Thursday, March 27th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Pablo L. Landeras has an Applied Mathematics BsC from ITAM (Mexico City). His background as both an athlete and an analyst has shaped his approach to sports research, blending firsthand experience with cutting-edge data science to drive innovation. Today, he is a Data Scientist at Zelus Analytics, where he specializes in R&D in both ice hockey and basketball. 

His career has spanned a variety of projects—from public health initiatives to data-driven scouting for soccer teams like FC Toluca. Before joining Zelus, he worked as a Data Scientist at Coca-Cola.

Description:  This talk will explore the validation and visualization of spatio-temporal data in sports, focusing on the NFL tracking dataset and the application of frame-by-frame modeling. After a brief introduction to spatio-temporal data and its significance, we’ll highlight common errors in tracking datasets, such as missing data and implausible trajectories, emphasizing the importance of validation. The session will delve into the capabilities of gganimate, showcasing how it transforms static plots into dynamic animations to validate data and enhance storytelling. We’ll provide an overview of the NFL tracking dataset, its structure, and key challenges like data noise and synchronization issues. Through step-by-step examples, participants will learn to build animations that visualize player movements, pass probabilities, and pass rush models, while using  techniques to identify anomalies and combine multiple data sources.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




Please note that the registration confirmation is sent 1 day before the workshop to all registered participants rather than immediately after registration


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!










 


Introduction to Empirical Macroeconomics with R workshop

Join our workshop on Introduction to Empirical Macroeconomics with R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Introduction to Empirical Macroeconomics with R

Date: Thursday, March 20th, 14:00 – 16:00 CET (Rome, Berlin, Paris timezone)

Speaker: Xiaolei (Adam) Wang is an Economics PhD student at the University of Melbourne, his research focuses on Bayesian econometrics. He is the author of the R package bsvarSIGNs, which implements algorithms for macroeconomic analyses with C++ code.

Description: Structural Vector Autoregressions (SVARs) are multivariate time series models commonly used in empirical macroeconomics. By imposing a minimal set of assumptions, such as sign restrictions, these models allow us to recover meaningful economic shocks and their dynamic causal effects from real data. This workshop will provide a gentle introduction to SVARs and their estimation techniques. Then, we will show how to apply these models with a simple workflow using the R package “bsvarSIGNs”. No prior knowledge of macroeconomics is required, any R user interested in analysing macro data can benefit from this workshop.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)





Please note that the registration confirmation is sent 1 day before the workshop to all registered participants rather than immediately after registration


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!










 

Hitting web APIs with {httr2} in R workshop

Join our workshop on Hitting web APIs with {httr2} in R , which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Hitting web APIs with {httr2} in R 

Date: Thursday, March 13th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone) 

Speaker: Ted Laderas is the Director of Training and Community at the Data Science Lab at Fred Hutch Cancer Center. He has taught R and Python for over 10 years.  He believes that research should not be lonely, and building communities of practice in science and research that are psychologically safe and inclusive are the key to doing better, more robust science.

Description: Do the words “Web API” sound intimidating to you? This talk is a gentle introduction to what Web APIs are and how to get data out of them using the {httr2}, {jsonlite}. and {tidyjson} packages. You’ll learn how to request data from an endpoint and get the data out. We’ll do this using an API that gives us facts about cats. By the end of this talk, web APIs will seem much less intimidating and you will be empowered to access data from them.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




Please note that the registration confirmation is sent 1 day before the workshop to all registered participants rather than immediately after registration


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!










 

Decomposing within and between person effects in longitudinal data with SEM in R workshop

Join our workshop on Decomposing within and between person effects in longitudinal data with SEM in R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 


Title: Decomposing within and between person effects in longitudinal data with SEM in R

Date: Thursday, February 27th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Dustin Haraden, PhD is a clinical psychologist interested in examining risk factors for depression in youth with an emphasis on sleep, circadian rhythms and pubertal development. He has a special interest in measurement, statistics and open science as it relates to research methods in psychology. Currently, Dustin is an assistant professor in psychology at the Rochester Institute of Technology.

Description: This workshop will introduce problems that arise when researchers fail to consider within vs. between sources of variance. We will explore the implementation of the Random Intercept Cross-Lagged Panel Model through model identification, comparing model fit, interpreting parameters and reporting results.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!


 

Tabular ML in R: an overview of tidymodels in R for tabularized data workshop

Join our workshop on Tabular ML in R: an overview of tidymodels in R for tabularized data, which is a part of our workshops for Ukraine series! 


Here’s some more info: 


Title: Tabular ML in R: an overview of tidymodels in R for tabularized data


Date: Thursday, February 20th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)


Speaker: Frank Hull is currently Director of Analytics at ACES. Frank oversees ACES’ Data Science department, which works directly with Portfolio Strategy, Portfolio Modeling, Transmission, Resource Planning, Fundamentals, and Trading & Operations. Frank leads & advises various initiatives such as weather-driven stochastics (WDS), long-term load forecasting (LTLF), peak prediction services (PPS), dark calm (DC) and extreme weather event (EWE) analyses. Frank also hosts internal R meetings for programmers at ACES. Prior to his current role, Frank held various roles related to data science, systems, modeling, and quantitative analysis at AES & ACES. Frank holds a degree in physics with a concentration in engineering physics.


Description: In this workshop, we will 1) discuss what we mean by tabular ml in R, 2) why it’s important, 3) when can it be applicable, and 4) how to setup a robust pipeline for iterative machine learning workflows. We will start off by defining and discussing the prevalence of tabular data across sectors. Followed by data exploration to understand and interpret any known relationships with our example dataset. Lastly, we will establish key practices within the

tidymodels ecosystem to create a predictive framework and benchmark various ML engines.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!










 

Flowcharts made easy with the package {flowchart}

In health research, a flowchart is the best way to show the flow of participants in a study when reporting results. But drawing flowcharts can be tedious to prepare and can get on your nerves.



Fortunately, there are several packages in R for drawing flowcharts using different approaches. The problem is that the programming is generally quite complex, and the numbers have to be entered manually or parameterized beforehand. These flowcharts can have reproducible problems because if data changes, we have to manually change the parameters again.

To make our lives easier, there’s a new {flowchart} package that uses the tidyverse workflow, which allows to create many different types of flowcharts in just a few steps.

https://bruigtp.github.io/flowchart


The package provides a set of functions that are thought to be combined with a tidyverse pipe operator (%>% or |>) to create different flowchart designs directly from the study database. These functions are highly customizable and allow the user to create reproducible flowcharts in an easier and tidier way. Now we don’t need to manually set the flowchart parameters such as the box coordinates or the numbers to display, because it automatically adapts to the data we have.

For example, we can create a flowchart of the entire participant study flow with this simple tidy workflow:

Here, we will describe these steps that are involved in creating a flowchart in this example. We will use the built-in safo dataset, that comes with the package, which is a randomly generated dataset from the SAFO clinical trial. For more information and other examples, you can visit the vignette of the package.

Installing and loading the package

As of March of 2024, the package is available on CRAN:

install.packages("flowchart")

You can always install the development version from Github:

remotes::install_github("bruigtp/flowchart")

Initialize the flowchart

The first step is the initialisation of the flowchart with the function as_fc():

library(flowchart) 

x <- safo |> 
  as_fc(label = "Patients assessed for eligibility")

This will create an object of class fc, the class created for this package. Objects of this class consist of a list containing the dataset together with the information related to the flowchart being generated. Let’s see it for our example:

str(x, max.level = 1)
List of 2
 $ data: tibble [925 × 21] (S3: tbl_df/tbl/data.frame)
 $ fc  : tibble [1 × 17] (S3: tbl_df/tbl/data.frame)
 - attr(*, "class")= chr "fc"

The data tibble belongs to the entire SAFO dataset as we haven’t done any further operations:

x$data
# A tibble: 925 × 21
      id inclusion_crit exclusion_crit chronic_heart_failure expected_death_24h
   <int> <fct>          <fct>          <fct>                 <fct>             
 1     1 Yes            No             No                    No                
 2     2 No             No             No                    No                
 3     3 No             No             No                    No                
 4     4 No             Yes            No                    No                
 5     5 No             No             No                    No                
 6     6 No             Yes            No                    No                
 7     7 No             No             No                    No                
 8     8 No             Yes            No                    Yes               
 9     9 No             No             No                    No                
10    10 No             No             No                    No                
# ℹ 915 more rows
# ℹ 16 more variables: polymicrobial_bacteremia <fct>,
#   conditions_affect_adhrence <fct>, susp_prosthetic_valve_endocard <fct>,
#   severe_liver_cirrhosis <fct>, acute_sars_cov2 <fct>,
#   blactam_fosfomycin_hypersens <fct>, other_clinical_trial <fct>,
#   pregnancy_or_breastfeeding <fct>, previous_participation <fct>,
#   myasthenia_gravis <fct>, decline_part <fct>, group <fct>, itt <fct>, …

The fc tibble represents the information on the generated flowchart, which only contains a first initial box indicating the total number of patients assessed for eligibility in the SAFO trial:

x$fc
# A tibble: 1 × 17
     id     x     y     n     N perc  text  type  group just  text_color text_fs
  <dbl> <dbl> <dbl> <int> <int> <chr> <chr> <chr> <lgl> <chr> <chr>        <dbl>
1     1   0.5   0.5   925   925 100   "Pat… init  NA    cent… black            8
# ℹ 5 more variables: text_fface <dbl>, text_ffamily <lgl>, text_padding <dbl>,
#   bg_fill <chr>, border_color <chr>

Drawing the flowchart

We can always use the fc_draw() function to draw the associated flowchart from a fc object:

x |> 
  fc_draw()

Building the flowchart

To build the entire flowchart, we would need to combine the initialized fc object with the desired functions until we obtain the final flowchart.

The second box showing the patients excluded from randomization can be obtained using the fc_filter() function:

safo |> 
  as_fc(label = "Patients assessed for eligibility") |> 
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |> 
  fc_draw()

with show_exc = TRUE to show the excluded subject box as well. Now $data contains the database filtered only for the randomized subjects while $fc contains the information for these new boxes.

Now, we can split the flowchart by the study group, using the fc_split() function:

safo |> 
  as_fc(label = "Patients assessed for eligibility") |> 
  fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |> 
  fc_split(group) |> 
  fc_draw()

Now, $data contains the previously filtered database that has been grouped by the group variable.

Finally, we can apply two more times the fc_filter() function to generate the complete flowchart we want:

safo |> 
    as_fc(label = "Patients assessed for eligibility") |> 
    fc_filter(!is.na(group), label = "Randomized", show_exc = TRUE) |> 
    fc_split(group) |> 
    fc_filter(itt == "Yes", label = "Included in intention-to-treat\n population") |> 
    fc_filter(pp == "Yes", label = "Included in per-protocol\n population") |> 
    fc_draw()

The idea is to combine these basic functions, fc_filter() and fc_split(), in any way we want to create the desired flowchart. The resulting flowchart can be further customized and enhanced using the fc_modify() function, or combined with other flowcharts either horizontally or vertically using the fc_merge() and fc_stack() functions, respectively. Finally, once the final flowchart is drawn, it can be exported to the desired image format using the  fc_export() function.

More information about these features and other examples can be found in the website of the package: https://bruigtp.github.io/flowchart/.

{SLmetrics}: scalable and memory efficient AI/ML performance evaluation in R

On December 3rd, 2024, a post about the release of {SLmetrics} was published. Today, January 11th, 2025, version 0.3-1 has been released and comes with many new features. Among these are weighted classification and regression metrics, OpenMP support and a wide array of new evaluation metrics.

In this blog post, I will benchmark {SLmetrics} and demostrate how it compares to the similar R packages {MLmetrics} and {yardstick} in terms execution time and memory efficiency – essential determinants for scalability and efficiency.

Benchmark Function

To run the benchmark of {SLmetrics}, {MLmetrics} and {yardstick}, I will use {bench} which measures the median execution time and memory efficiency. Below I have created a wrapper function:

## benchmark function
benchmark <- function(
  ..., 
  m = 10) {
  library(magrittr)
  # 1) create list
  # for storing values
  performance <- list()

  for (i in 1:m) {

     # 1) run the benchmarks
    results <- bench::mark(
      ...,
      iterations = 10,
      check = FALSE
    )

    # 2) extract values
    # and calculate medians
    performance$time[[i]]  <- setNames(
        lapply(results$time, mean), 
        results$expression
        )

    performance$memory[[i]] <- setNames(
        lapply(results$memory, function(x) {
             sum(x$bytes, na.rm = TRUE)}
             ), results$expression)

    performance$n_gc[[i]] <- setNames(
        lapply(results$n_gc, sum), results$expression
        )

  }

  purrr::pmap_dfr(
  list(performance$time, performance$memory, performance$n_gc), 
  ~{
    tibble::tibble(
      expression = names(..1),
      time = unlist(..1),
      memory = unlist(..2),
      n_gc = unlist(..3)
    )
  }
) %>%
  dplyr::mutate(expression = factor(expression, levels = unique(expression))) %>%
  dplyr::group_by(expression) %>%
  dplyr::filter(dplyr::row_number() > 1) %>%
  dplyr::summarize(
    execution_time = bench::as_bench_time(median(time)),
    memory_usage = bench::as_bench_bytes(median(memory)),
    gc_calls = median(n_gc),
    .groups = "drop"
  )

}

The wrapper function runs 10 x 10 benchmarks of each passed function – it discards the first run to allow the functions to warm up, before the benchmarks are recorded.

All values are averaged across runs and then presented as the median runtime, median memory usage and median number of gc()-calls during the benchmark.

Benchmarking {SLmetrics}

Bechmarking with and without OpenMP

In the first set of benchmarks, I will demonstrate the new OpenMP feature that has been shipped with version 0.3-1. For the benchmark, we will compare the execution time and memory efficiency of computing a 3×3 confusion matrix on two vectors of length 10,000,000 with and without OpenMP. The source code and results are shown below:

## 1) set seed
set.seed(1903)

## 2) define values
## for classes
actual <- factor(sample(letters[1:3], 1e7, TRUE))
predicted <- factor(sample(letters[1:3], 1e7, TRUE))

## 3) benchmark with OpenMP
SLmetrics::setUseOpenMP(TRUE)
#> OpenMP usage set to: enabled

benchmark(`{With OpenMP}` = SLmetrics::cmatrix(actual, predicted))
#> # A tibble: 1 × 4
#>   expression    execution_time memory_usage gc_calls
#>   <fct>               <bch:tm>    <bch:byt>    <dbl>
#> 1 {With OpenMP}            1ms           0B        0

## 4) benchmark without OpenMP
SLmetrics::setUseOpenMP(FALSE)
#> OpenMP usage set to: disabled

benchmark(`{Without OpenMP}`  = SLmetrics::cmatrix(actual, predicted))
#> # A tibble: 1 × 4
#>   expression       execution_time memory_usage gc_calls
#>   <fct>                  <bch:tm>    <bch:byt>    <dbl>
#> 1 {Without OpenMP}         6.27ms           0B        0

The confusion matrix is computed in less than a millisecond and around six milliseconds with and without OpenMP, respectively. In both cases, it uses zero or near-zero memory.

Benchmarking against {MLmetrics} and {yardstick}

In the second set of benchmarks, I will compare the execution time and memory efficiency of {SLmetrics} against {MLmetrics} and {yardstick}. The source code and results are shown below:

## 1) define classes
set.seed(1903)
fct_actual    <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))
fct_predicted <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))

## 2) perform benchmark
benchmark(
    `{SLmetrics}` = SLmetrics::cmatrix(fct_actual, fct_predicted),
    `{MLmetrics}` = MLmetrics::ConfusionMatrix(fct_predicted, fct_actual),
    `{yardstick}` = yardstick::conf_mat(table(fct_actual, fct_predicted))
)
#> # A tibble: 3 × 4
#>   expression  execution_time memory_usage gc_calls
#>   <fct>             <bch:tm>    <bch:byt>    <dbl>
#> 1 {SLmetrics}         6.34ms           0B        0
#> 2 {MLmetrics}       344.13ms        381MB       19
#> 3 {yardstick}       343.75ms        381MB       19

{SLmetrics} is roughly 60 times faster than both, and significantly more memory efficient as demonstrated by memory_usage and gc_calls. In this perspective, {SLmetrics} is more efficient and scalable than both packages as the memory usage is basically linear. See below:

## 1) define classes
set.seed(1903)
fct_actual    <- factor(sample(letters[1:3], size = 2e7, replace = TRUE))
fct_predicted <- factor(sample(letters[1:3], size = 2e7, replace = TRUE))

## 2) perform benchmark
benchmark(
    `{SLmetrics}` = SLmetrics::cmatrix(fct_actual, fct_predicted),
    `{MLmetrics}` = MLmetrics::ConfusionMatrix(fct_predicted, fct_actual),
    `{yardstick}` = yardstick::conf_mat(table(fct_actual, fct_predicted))
)
#> # A tibble: 3 × 4
#>   expression  execution_time memory_usage gc_calls
#>   <fct>             <bch:tm>    <bch:byt>    <dbl>
#> 1 {SLmetrics}         12.3ms           0B        0
#> 2 {MLmetrics}        648.5ms        763MB       19
#> 3 {yardstick}        654.7ms        763MB       19

{SLmetrics} can process 60x the data in the same time it takes {MLmetrics} and {yardstick} to process 40,000,000 data-points – without any additional memory cost.

Summary

The benchmarks suggests that {SLmetrics} is a strong contender to the more established packages {MLmetrics} and {yardstick} in terms of scalability, memory efficiency and speed.

Installing {SLmetrics}

{SLmetrics} is still under development and is therefore not on CRAN. But the latest release can be installed using {devtools}. A development version is also available for those living on the edge. See below:

Stable version

## install stable release
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics@*release',
  ref  = 'main'
)

Development version

## install development version
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics',
  ref  = 'development'
)

If you made it this far: Thank you for reading the blog post, and feel free to leave a comment here or in the repository.

Spatial modelling with GAMs in R workshop

Join our workshop on Spatial modelling with GAMs in R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 


Title: Spatial modelling with GAMs in R


Date: Thursday, January 30th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone) 


Speaker: Sophie Lee is a statistician and educator who teaches a range of statistics and R coding courses to non-statisticians. Her goal is to provide accessible, engaging training to prove that statistics does not need to be scary! She has a PhD in Spatio-temporal Epidemiology from LSHTM and is a Fellow of the Higher Education Academy. Her research interests lie in spatial data analysis, planetary health, and Bayesian modelling.


Description: When modelling spatial data we are generally unable to use traditional modelling approaches, such as generalised linear models (GLMs), as the assumption that observations are independent of one another may be invalid. This is due to underlying similarities, including unobservable behaviours, climate, and other characteristics, that are shared between observations close to one another. There are extensions of GLMs that can be used to overcome this lack of independence between observations, often with the inclusion of structured random effects, that try to take account of the underlying spatial relationships. The issue arises when deciding how to structure these spatial random effects: how close is close enough to consider observations no longer independent?

This workshop introduces generalised additive models (GAMs) as a method for generating the underlying spatial structure needed to define spatially structured random effects. We will see how penalised smoothing splines can be applied to coordinates to generate a spatial plane with minimal user assumptions. This ensures the spatial model is relevant and unique to the setting being studied. Using the mgcv package in R, we will apply this approach to real-world data, incorporating the flexible spatial structure into a random effects model, which then can be interpreted similarly to any other spatial model.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

Please note that the registration confirmation email will be sent 1 day before the workshop.

How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series,  a schedule of our future workshops as well as a list of our past workshops which you can get the recordings & materials here.


Looking forward to seeing you during the workshop!