{SLmetrics}: scalable and memory efficient AI/ML performance evaluation in R

On December 3rd, 2024, a post about the release of {SLmetrics} was published. Today, January 11th, 2025, version 0.3-1 has been released and comes with many new features. Among these are weighted classification and regression metrics, OpenMP support and a wide array of new evaluation metrics.

In this blog post, I will benchmark {SLmetrics} and demonstrate how it compares to the similar R packages {MLmetrics} and {yardstick} in terms of execution time and memory efficiency – essential determinants for scalability and efficiency.

Benchmark Function

To run the benchmark of {SLmetrics}, {MLmetrics} and {yardstick}, I will use {bench} which measures the median execution time and memory efficiency. Below I have created a wrapper function:

## benchmark function
benchmark <- function(
  ..., 
  m = 10) {
  library(magrittr)
  # 1) create list
  # for storing values
  performance <- list()

  for (i in 1:m) {

     # 1) run the benchmarks
    results <- bench::mark(
      ...,
      iterations = 10,
      check = FALSE
    )

    # 2) extract values
    # and calculate the mean time, total memory
    # and total gc()-calls per expression
    performance$time[[i]]  <- setNames(
        lapply(results$time, mean), 
        results$expression
        )

    performance$memory[[i]] <- setNames(
        lapply(results$memory, function(x) {
             sum(x$bytes, na.rm = TRUE)}
             ), results$expression)

    performance$n_gc[[i]] <- setNames(
        lapply(results$n_gc, sum), results$expression
        )

  }

  purrr::pmap_dfr(
  list(performance$time, performance$memory, performance$n_gc), 
  ~{
    tibble::tibble(
      expression = names(..1),
      time = unlist(..1),
      memory = unlist(..2),
      n_gc = unlist(..3)
    )
  }
) %>%
  dplyr::mutate(expression = factor(expression, levels = unique(expression))) %>%
  dplyr::group_by(expression) %>%
  dplyr::filter(dplyr::row_number() > 1) %>%
  dplyr::summarize(
    execution_time = bench::as_bench_time(median(time)),
    memory_usage = bench::as_bench_bytes(median(memory)),
    gc_calls = median(n_gc),
    .groups = "drop"
  )

}

The wrapper function runs 10 x 10 benchmarks of each passed function – it discards the first run to allow the functions to warm up, before the benchmarks are recorded.

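Within each run, the execution times are averaged across the 10 iterations, while memory allocations and gc()-calls are summed. The results are then presented as the median runtime, median memory usage and median number of gc()-calls across the remaining runs.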
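To illustrate the aggregation step in isolation, here is a minimal base R sketch. The timings are made-up values (not measurements), and the variable names are hypothetical; it only shows why dropping the warm-up run and taking the median gives a more stable summary than a plain mean.

```r
# hypothetical mean execution times (in seconds) from m = 5 outer runs;
# the first run is slow because nothing has warmed up yet
run_means <- c(0.250, 0.012, 0.011, 0.013, 0.012)

# discard the warm-up run, as the wrapper does with row_number() > 1
warmed_up <- run_means[-1]

# summarise the remaining runs by their median
median_time <- median(warmed_up)

# for comparison: the plain mean over all runs is dominated by the warm-up run
naive_mean <- mean(run_means)

print(c(median_time = median_time, naive_mean = naive_mean))
```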

Benchmarking {SLmetrics}

Benchmarking with and without OpenMP

In the first set of benchmarks, I will demonstrate the new OpenMP feature that has been shipped with version 0.3-1. For the benchmark, we will compare the execution time and memory efficiency of computing a 3×3 confusion matrix on two vectors of length 10,000,000 with and without OpenMP. The source code and results are shown below:

## 1) set seed
set.seed(1903)

## 2) define values
## for classes
actual <- factor(sample(letters[1:3], 1e7, TRUE))
predicted <- factor(sample(letters[1:3], 1e7, TRUE))

## 3) benchmark with OpenMP
SLmetrics::setUseOpenMP(TRUE)
#> OpenMP usage set to: enabled

benchmark(`{With OpenMP}` = SLmetrics::cmatrix(actual, predicted))
#> # A tibble: 1 × 4
#>   expression    execution_time memory_usage gc_calls
#>   <fct>               <bch:tm>    <bch:byt>    <dbl>
#> 1 {With OpenMP}            1ms           0B        0

## 4) benchmark without OpenMP
SLmetrics::setUseOpenMP(FALSE)
#> OpenMP usage set to: disabled

benchmark(`{Without OpenMP}`  = SLmetrics::cmatrix(actual, predicted))
#> # A tibble: 1 × 4
#>   expression       execution_time memory_usage gc_calls
#>   <fct>                  <bch:tm>    <bch:byt>    <dbl>
#> 1 {Without OpenMP}         6.27ms           0B        0

The confusion matrix is computed in around one millisecond with OpenMP and around six milliseconds without it. In both cases, it uses zero or near-zero memory.
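For readers unfamiliar with the object being benchmarked, the following is a small base R sketch of a 3×3 confusion matrix on toy data. It uses table() rather than SLmetrics::cmatrix(), and the orientation (rows as actual classes, columns as predicted classes) is a choice made for this sketch, not a statement about how any of the packages lay out their output.

```r
# toy labels, far smaller than the 1e7-element vectors used in the benchmark
set.seed(1903)
classes   <- letters[1:3]
actual    <- factor(sample(classes, 10, TRUE), levels = classes)
predicted <- factor(sample(classes, 10, TRUE), levels = classes)

# cross-tabulate: rows are actual classes, columns are predicted classes
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# correctly classified observations sit on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```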

Benchmarking against {MLmetrics} and {yardstick}

In the second set of benchmarks, I will compare the execution time and memory efficiency of {SLmetrics} against {MLmetrics} and {yardstick}. The source code and results are shown below:

## 1) define classes
set.seed(1903)
fct_actual    <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))
fct_predicted <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))

## 2) perform benchmark
benchmark(
    `{SLmetrics}` = SLmetrics::cmatrix(fct_actual, fct_predicted),
    `{MLmetrics}` = MLmetrics::ConfusionMatrix(fct_predicted, fct_actual),
    `{yardstick}` = yardstick::conf_mat(table(fct_actual, fct_predicted))
)
#> # A tibble: 3 × 4
#>   expression  execution_time memory_usage gc_calls
#>   <fct>             <bch:tm>    <bch:byt>    <dbl>
#> 1 {SLmetrics}         6.34ms           0B        0
#> 2 {MLmetrics}       344.13ms        381MB       19
#> 3 {yardstick}       343.75ms        381MB       19

{SLmetrics} is more than 50 times faster than both, and significantly more memory efficient, as demonstrated by memory_usage and gc_calls. From this perspective, {SLmetrics} is more efficient and scalable than both packages: when the input size is doubled, its execution time grows roughly linearly while its measured memory usage stays at zero. See below:

## 1) define classes
set.seed(1903)
fct_actual    <- factor(sample(letters[1:3], size = 2e7, replace = TRUE))
fct_predicted <- factor(sample(letters[1:3], size = 2e7, replace = TRUE))

## 2) perform benchmark
benchmark(
    `{SLmetrics}` = SLmetrics::cmatrix(fct_actual, fct_predicted),
    `{MLmetrics}` = MLmetrics::ConfusionMatrix(fct_predicted, fct_actual),
    `{yardstick}` = yardstick::conf_mat(table(fct_actual, fct_predicted))
)
#> # A tibble: 3 × 4
#>   expression  execution_time memory_usage gc_calls
#>   <fct>             <bch:tm>    <bch:byt>    <dbl>
#> 1 {SLmetrics}         12.3ms           0B        0
#> 2 {MLmetrics}        648.5ms        763MB       19
#> 3 {yardstick}        654.7ms        763MB       19

Assuming the execution time keeps scaling linearly, {SLmetrics} can process roughly 50 times as much data in the time it takes {MLmetrics} and {yardstick} to process 40,000,000 data-points – without any additional measured memory cost.
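The relative speed can be worked out directly from the two result tables above. The sketch below simply re-enters the reported medians by hand (the data frame and its column names are my own) and computes the ratios; it does not rerun any benchmark.

```r
# median execution times in milliseconds, copied from the tables above
timings_ms <- data.frame(
  n         = c(1e7, 2e7),
  SLmetrics = c(6.34, 12.3),
  MLmetrics = c(344.13, 648.5),
  yardstick = c(343.75, 654.7)
)

# how many times faster {SLmetrics} is at each input size
speedup_MLmetrics <- timings_ms$MLmetrics / timings_ms$SLmetrics
speedup_yardstick <- timings_ms$yardstick / timings_ms$SLmetrics

# how the {SLmetrics} runtime grows when the input is doubled
growth_SLmetrics <- timings_ms$SLmetrics[2] / timings_ms$SLmetrics[1]

print(round(speedup_MLmetrics, 1))
print(round(speedup_yardstick, 1))
print(round(growth_SLmetrics, 2))
```

A growth factor close to 2 for a doubled input is what linear scaling in execution time looks like.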

Summary

The benchmarks suggest that {SLmetrics} is a strong contender to the more established packages {MLmetrics} and {yardstick} in terms of scalability, memory efficiency and speed.

Installing {SLmetrics}

{SLmetrics} is still under development and is therefore not on CRAN. But the latest release can be installed using {devtools}. A development version is also available for those living on the edge. See below:

Stable version

## install stable release
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics@*release',
  ref  = 'main'
)

Development version

## install development version
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics',
  ref  = 'development'
)

If you made it this far: Thank you for reading the blog post, and feel free to leave a comment here or in the repository.

{SLmetrics}: Machine Learning performance evaluation on steroids

Introduction
{SLmetrics} is a low-level R package designed for efficient performance evaluation in supervised AI/ML tasks. By leveraging {Rcpp} and {RcppEigen}, it ensures fast execution and memory efficiency, making it ideal for handling large-scale datasets. Built on the robust S3 class system, {SLmetrics} integrates seamlessly with stable R packages, ensuring reliability and ease of use for developers and data scientists alike.

Why?
{SLmetrics} combines simplicity with exceptional performance, setting it apart from other packages. While it draws inspiration from {MLmetrics} in its intuitive design, it outpaces it in terms of speed, memory efficiency, and the variety of available performance measures. In terms of features, {SLmetrics} offers functionality comparable to {yardstick} and {scikit-learn}, while being significantly faster.

Current benchmarks show that {SLmetrics} is between 20-70 times faster than {yardstick}, {MLmetrics}, and {mlr3measures} (See Figure 1).

Figure 1. Median execution time of a 2×2 confusion matrix using {SLmetrics}, {MLmetrics}, {mlr3measures} and {yardstick}. The source code can be found in the {SLmetrics} repository on GitHub.

Whether you’re working with simple models or complex machine learning pipelines, {SLmetrics} provides a highly efficient, reliable solution for model evaluation.

Basic usage of {SLmetrics}
Load {SLmetrics},

library(SLmetrics)

We recode the Species variable and convert the problem to a binary classification problem,

# 1) recode Iris
# to binary classification
# problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

# 2) fit the logistic
# regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(
    link = "logit"
  )
)

# 3) generate predicted
# response probabilities
response <- predict(model, type = "response")

# 3.1) generate actual
# classes
actual <- factor(
  x = iris$species_num,
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

Construct the precision-recall curve,

# 4) generate precision-recall
# curve
roc <- prROC(
  actual   = actual,
  response = response
)

Visualize the precision-recall curve,

# 5) plot by species
plot(roc)

Summarise to get the area under the curve metric for each class,

# 5.1) summarise
summary(roc)
#> Reciever Operator Characteristics 
#> ================================================================================
#> AUC
#>  - Others: 0.473
#>  - Virginica: 0.764

The precision-recall function also supports custom thresholds,

# 6) provide custom
# thresholds
roc <- prROC(
  actual     = actual,
  response   = response,
  thresholds = seq(0, 1, length.out = 4)
)

Visualize the precision-recall curve with custom thresholds,

# 7) plot by species
plot(roc)
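To make the role of the thresholds concrete, here is a base R sketch that computes precision and recall for the positive class by hand at the same four thresholds. It refits the model so that it is self-contained, and it assumes an observation is predicted positive when its response is at or above the threshold; that convention and all variable names are mine, and this is not a description of how prROC() works internally.

```r
# binary target, as in the example above
is_virginica <- as.numeric(iris$Species == "virginica")

model <- glm(
  is_virginica ~ Sepal.Length + Sepal.Width,
  data   = iris,
  family = binomial(link = "logit")
)
response <- predict(model, type = "response")

thresholds <- seq(0, 1, length.out = 4)
pr <- data.frame(
  threshold = thresholds,
  precision = NA_real_,
  recall    = NA_real_
)

for (i in seq_along(thresholds)) {
  # assumption: predicted positive when the response is >= the threshold
  predicted_positive <- response >= thresholds[i]

  tp <- sum(predicted_positive & is_virginica == 1)
  fp <- sum(predicted_positive & is_virginica == 0)
  fn <- sum(!predicted_positive & is_virginica == 1)

  # precision is undefined when nothing is predicted positive; leave it NA
  if (tp + fp > 0) pr$precision[i] <- tp / (tp + fp)
  pr$recall[i] <- tp / (tp + fn)
}

print(pr)
```

At a threshold of 0 every observation is predicted positive, so recall is 1 and precision equals the share of positives in the data.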


Installing {SLmetrics}
The stable release of {SLmetrics} can be installed as follows,

devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics@*release',
  ref  = 'main'
)

The development version of {SLmetrics} can be installed as follows,

devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics',
  ref  = 'development'
)

Get involved with {SLmetrics}
We’re building something exciting with {SLmetrics}, and your contributions can make a real impact!

While {SLmetrics} isn’t on CRAN yet—it’s a work in progress striving for excellence—this is your chance to shape its future. We’re thrilled to offer co-authorship for substantial contributions, recognizing your expertise and effort.

Even smaller improvements will earn you a spot on our contributor list, showcasing your valuable role in enhancing {SLmetrics}. Join us in creating a high-quality tool that benefits the entire R community. Check out the repository and start contributing today!

{cryptoQuotes}: Open access to cryptocurrency market data in R (Update)

The {cryptoQuotes}-package has been updated to version 1.3.0. This update comes with many new features and breaking changes. Prior to version 1.3.0, the package used camelCase (see, for example, this post) with no particular style guide. The package now follows the tidyverse style guide, which in turn has deprecated a few core functions.

Note: Only the styling is affected; the returned market data are still xts/zoo-objects.

The many new features and enhancements include dark- and light-themed charting and a wide array of new sentiment indicators. The full documentation can be found on pkgdown.

In this blog post, the new charting features will be showcased using hourly Bitcoin OHLC-V and long-short ratios from the last two days (as of writing this draft).

Cryptocurrency market data in R

# 0) load library
library(cryptoQuotes)

To extract the Bitcoin OHLC-V, the get_quote()-function [previously getQuote()] is used as shown below,

# 1) extract last two
# days of Bitcoin on the
# hourly chart
tail(
  BTC <- get_quote(
    ticker   = "BTCUSDT",
    source   = "binance",
    interval = "1h",
    from     = Sys.Date() - 2
  )
)
#>                        open    high     low   close    volume
#> 2024-06-05 02:00:00 70580.0 70954.1 70462.8 70820.1  7593.081
#> 2024-06-05 03:00:00 70820.2 71389.8 70685.9 71020.7 11466.934
#> 2024-06-05 04:00:00 71020.7 71216.0 70700.0 70892.1  7824.993
#> 2024-06-05 05:00:00 70892.2 71057.0 70819.1 70994.0  5420.481
#> 2024-06-05 06:00:00 70994.0 71327.9 70875.9 71220.2  7955.595
#> 2024-06-05 07:00:00 71220.2 71245.0 70922.0 70988.8  3500.795

The long-short ratios on Bitcoin in the same hourly interval are retrieved using the get_lsratio()-function [previously getLSRatio()] as shown below,

# 2) extract last two days
# of long-short ratio on
# Bitcoin
tail(
  BTC_LS <- get_lsratio(
    ticker   = "BTCUSDT",
    source   = "binance",
    interval = "1h",
    from     = Sys.Date() - 2
  )
)
#>                       long  short  ls_ratio
#> 2024-06-05 02:00:00 0.4925 0.5075 0.9704433
#> 2024-06-05 03:00:00 0.4938 0.5062 0.9755038
#> 2024-06-05 04:00:00 0.4942 0.5058 0.9770660
#> 2024-06-05 05:00:00 0.4901 0.5099 0.9611689
#> 2024-06-05 06:00:00 0.4884 0.5116 0.9546521
#> 2024-06-05 07:00:00 0.4823 0.5177 0.9316206

Prior to version 1.3.0, all charting with indicators was done with the magrittr pipe operator, both internally and externally. This came with an overhead in both efficiency and readability (opinionated, I know). The charting has been reworked in terms of layout and syntax.

Below is an example of a dark-themed chart with the long-short ratio alongside simple moving averages, Bollinger bands and volume indicators,

# 3) dark-themed
# chart
chart(
  ticker = BTC,
  main   = kline(),
  indicator = list(
    bollinger_bands(),
    sma(n = 7),
    sma(n = 14)

  ),
  sub = list(
    volume(),
    lsr(ratio = BTC_LS)
  )
)

The light-themed chart has been reworked and has received some extra love, such that it is different from the default colors provided by the {plotly}-package,

# 4) light-themed
# chart
chart(
  ticker = BTC,
  main   = kline(),
  indicator = list(
    bollinger_bands(),
    sma(n = 7),
    sma(n = 14)
  ),
  sub = list(
    volume(),
    lsr(ratio = BTC_LS)
  ),
  options = list(
    dark = FALSE
  )
)

About the {cryptoQuotes}-package

The {cryptoQuotes}-package is a high-level API-client that interacts with public market data endpoints from major cryptocurrency exchanges using the {curl}-package.

The endpoints, which are publicly accessible and maintained by the exchanges themselves, ensure consistent and reliable access to high-quality cryptocurrency market data with R.

Installing {cryptoQuotes}
The {cryptoQuotes}-package can be installed via CRAN,

# installing {cryptoQuotes}
install.packages(
  pkgs ="cryptoQuotes",
  dependencies = TRUE
)

Created on 2024-06-05 with reprex v2.1.0

Gauging Cryptocurrency Market Sentiment in R

Navigating the volatile world of cryptocurrencies requires a keen understanding of market sentiment. This blog post explores some of the essential tools and techniques for analyzing the mood of the crypto market, using the cryptoQuotes-package.

The Cryptocurrency Fear and Greed Index in R

The Fear and Greed Index is a market sentiment tool that measures investor emotions, ranging from 0 (extreme fear) to 100 (extreme greed). It analyzes data like volatility, market momentum, and social media trends to indicate potential overvaluation or undervaluation of cryptocurrencies. This index helps investors identify potential buying or selling opportunities by gauging the market’s emotional extremes.

This index can be retrieved by using the cryptoQuotes::getFGIndex()-function, which returns the daily index within a specified time-frame,

## Fear and Greed Index
## from the last 14 days
tail(
  FGI <- cryptoQuotes::getFGIndex(
    from = Sys.Date() - 14
  )
)
#>            FGI
#> 2024-01-03  70
#> 2024-01-04  68
#> 2024-01-05  72
#> 2024-01-06  70
#> 2024-01-07  71
#> 2024-01-08  71

The Long-Short Ratio of a Cryptocurrency Pair in R

The Long-Short Ratio is a financial metric indicating market sentiment by comparing the number of long positions (bets on price increases) against short positions (bets on price decreases) for an asset. A higher ratio signals bullish sentiment, while a lower ratio suggests bearish sentiment, guiding traders in making informed decisions.

The Long-Short Ratio can be retrieved by using the cryptoQuotes::getLSRatio()-function, which returns the ratio within a specified time-frame and granularity. Below is an example using the Daily Long-Short Ratio on Bitcoin (BTC),

## Long-Short Ratio
## from the last 14 days
tail(
  LSR <- cryptoQuotes::getLSRatio(
    ticker = "BTCUSDT",
    interval = '1d',
    from = Sys.Date() - 14
  )
)
#>              Long  Short LSRatio
#> 2024-01-03 0.5069 0.4931  1.0280
#> 2024-01-04 0.6219 0.3781  1.6448
#> 2024-01-05 0.5401 0.4599  1.1744
#> 2024-01-06 0.5499 0.4501  1.2217
#> 2024-01-07 0.5533 0.4467  1.2386
#> 2024-01-08 0.5364 0.4636  1.1570
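The ratio column follows directly from the other two: it is the share of long positions divided by the share of short positions. The sketch below checks this in base R using the first three rows of the output above, typed in by hand (the object names are mine); it does not call the API.

```r
# long and short shares, copied from the first three rows of the output above
long  <- c(0.5069, 0.6219, 0.5401)
short <- c(0.4931, 0.3781, 0.4599)

# the long-short ratio is the long share divided by the short share
ls_ratio <- long / short
print(round(ls_ratio, 4))

# a ratio above 1 means more long than short positions (bullish sentiment)
is_bullish <- ls_ratio > 1
print(is_bullish)
```

Since the two shares sum to one, a ratio above 1 is equivalent to a long share above 0.5.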

Putting it all together

Even though cryptoQuotes::getLSRatio() is an asset-specific sentiment indicator, and cryptoQuotes::getFGIndex() is a general sentiment indicator, there is much to be learned by combining the two.

This information can be visualized by using the various charting-functions in the cryptoQuotes-package,

## get the BTCUSDT
## pair from the last 14 days
BTCUSDT <- cryptoQuotes::getQuote(
  ticker = "BTCUSDT",
  interval = "1d",
  from = Sys.Date() - 14
)
## chart the BTCUSDT
## pair with sentiment indicators
cryptoQuotes::chart(
  slider = FALSE,
  chart = cryptoQuotes::kline(BTCUSDT) %>%
    cryptoQuotes::addFGIndex(FGI = FGI) %>% 
    cryptoQuotes::addLSRatio(LSR = LSR)
)

Figure: Bitcoin (BTC) plotted with the Fear and Greed Index alongside the Long-Short Ratio

Installing cryptoQuotes

Installing via CRAN

# install from CRAN
install.packages(
  pkgs = 'cryptoQuotes',
  dependencies = TRUE
)

Installing via Github

# install from github
devtools::install_github(
  repo = 'https://github.com/serkor1/cryptoQuotes/',
  ref = 'main'
)

Note: The latest price may vary depending on the time of publication relative to the rendering time of the document. This document was rendered at 2024-01-08 23:30 CET

Cryptocurrency Market Data in R

Getting cryptocurrency OHLCV data in R without having to depend on low-level coding using, for example, curl or httr2, has not been easy for the R community.

There is now a high-level API Client available on CRAN which fetches all the market data without having to rely on web-scrapers, API keys or low-level coding.

Bitcoin Prices in R (Example)

This high-level API-client has one main function, getQuote(), which returns cryptocurrency market data as xts and zoo objects. The returned objects contain Open, High, Low, Close and Volume data at different granularities from the currently supported exchanges.

In this blog post, I will show how to get hourly Bitcoin (BTC) prices in R using the getQuote()-function. See the code below,

# 1) getting hourly BTC
# from the last 3 days

BTC <- cryptoQuotes::getQuote(
 ticker   = "BTCUSDT", 
 source   = "binance", 
 futures  = FALSE, 
 interval = "1h", 
 from     = as.character(Sys.Date() - 3)
)

Bitcoin (BTC) OHLC prices (output from the getQuote()-function)

Index                Open      High      Low       Close     Volume
2023-12-23 19:00:00  43787.69  43821.69  43695.03  43703.81  547.96785
2023-12-23 20:00:00  43703.82  43738.74  43632.77  43711.33  486.4342
2023-12-23 21:00:00  43711.33  43779.71  43661.81  43772.55  395.6197
2023-12-23 22:00:00  43772.55  43835.94  43737.85  43745.86  577.03505
2023-12-23 23:00:00  43745.86  43806.38  43701.1   43702.16  940.55167
2023-12-24           43702.15  43722.25  43606.18  43716.72  773.85301

The returned Bitcoin prices from getQuote() are compatible with quantmod and TTR, without further programming. Let me demonstrate this using chartSeries(), addBBands() and addMACD() from these powerful libraries,

# charting BTC
# using quantmod
quantmod::chartSeries(
 x = BTC,
 TA = c(
    # add bollinger bands
    # to the chart
    quantmod::addBBands(), 
    # add MACD indicator
    # to the chart
    quantmod::addMACD()
 ), 
 theme = quantmod::chartTheme("white")
)

Figure: Charting Bitcoin prices using quantmod and TTR

Installing cryptoQuotes

Stable version

# install from CRAN
install.packages(
  pkgs = 'cryptoQuotes',
  dependencies = TRUE
)

Development version

# install from github
devtools::install_github(
  repo = 'https://github.com/serkor1/cryptoQuotes/',
  ref = 'main'
)