Benchmarking API Performance: R-Native and Plumber in Data Extraction and Sending

R, a language best known for its prowess in statistical analysis and data science, might not be the first choice that comes to mind when thinking about building APIs. However, rapid prototyping, scalability, seamless integration with data analysis,  and ease of debugging are reasons for me to encapsulate API functionality within R packages. In doing so, I like to distinguish between two approaches:



The R-native approach (blue): API interaction is available by functions of relevant R-packages that are directly installed into the R-session.

The Plumber approach (green): The Plumber package allows R code to be exposed as a RESTful web service through special decorator comments.  As a results, API functionality is assesed by sending REST calls (GET, POST, …) rather than  calling functions of R-packages directly. 

#* This endpoint is a dummy for loading data
#* @get load_from_db
#* @serializer unboxedJSON
#* @param row_limit :int
#* @response 200 OK with result
#* @response 201 OK without result
#* @response 401 Client Error: Invalid Request - Class
#* @response 402 Client Error: Invalid Request - Type
#* @response 403 Client Error: Missing or invalid parameter
#* @response 500 Server Error: Unspecified Server error
#* @response 501 Server Error: Database writing failed
#* @response 502 Server Error: Database reading failed
#* @tag demo
function(res, row_limit = 10000) {
   # load data from db 
   data <- dplyr::tibble()
   res$status <- 200
   res$body <- as.list(data)
   return(res$body)
}

Which approach to use?

Certainly, both approaches have their strengths and limitations. It should be no surprise, that in terms of execution time, CPU utilization and format consistency, the R-native approach is likely to be the first choice, as code and data is processed within one context. Furthermore, the approach offers flexibility for complex data manipulations, but can be challenging when it comes to maintenance, especially propagating new releases of packages to all relevant processes and credential management. To the best of my knowledge, there is no automated way of re-installing new releases directly into all related R-packages, even with using the POSIT package manager – so this easily becomes tedious.

In contrast, the Plumber API encourages modular design that enhances code organization and facilitates integration with a wide array of platforms and systems.  It streamlines package updates while ensuring a consistent interface. This means that interacting with a Plumber API remains separate from the underlying code logic provided by the endpoint. This approach not only improves version management but also introduces a clear separation between the client and server. In general, decoupling functionality through a RESTful API offers the possibility of dividing tasks into separate development teams more easily and thus a higher degree of flexibility and external support. Additionally, I found distributing a Plumber API notably more straightforward than handing over a raw R package. 

The primary goal of this blog post is to quantify the performance difference between the two approaches when it comes to getting data in and out of a database. Such benchmark can be particularly valuable for ETL (Extract, Transform, Load) processes, thereby shedding light on the threshold at which the advantages of the Plumber approach cease to justify its constraints. In doing so, we hope to provide information to developers who are faced with the decision of whether it makes sense to provide or access R functionalities via Plumber APIs.

Experimental Setup
The experimental setup encompassed a virtual machine instance equipped with  64GB RAM and an Intel(R) Xeon(R) Gold 6152 CPU clocked at 2.1GHz, incorporating 8 kernels, running Ubuntu 22.04 LTS, hosting the POSIT Workbench and Connect server (for hosting the Plumber API) and employing R version v4.2.1. Both POSIT services were granted identical access permissions to the virtual machine’s computational resources.

Both approaches are evaluated in terms of execution times, simply measured with system.time(), and maximal observed CPU load, the latter being expecially an important indicator on how how much data can be extracted and send at once. For each fixed number of data row, ranging from 10^4 to 10^7, 10 trials are being conducted and results beeing plotted by using a jittered beeswarm plot. For assessing the cpu load during the benchmark, I build a separate function that returns a new session object, within which every 10 seconds the output of NCmisc::top(CPU = FALSE) is appended to a file.

get_cpu_load <- function(interval = 10, root, name, nrow) {
  rs_cpu <- callr::r_session$new()
  rs_cpu$call(function(nrow, root, name, interval) {
    files <- list.files(root)
    n_files <- sum(stringr::str_detect(files, sprintf("%s_%s_", name, format(nrow, scientific = FALSE))))
    l <- c()
    while (TRUE) {
      ret <- NCmisc::top(CPU = FALSE)
      l <- c(l, ret$RAM$used * 1000000)
      save(l, file = sprintf("%s/%s_%s_%s.rda", root, name, format(nrow, scientific = FALSE), n_files + 1))
      Sys.sleep(interval)
    }
  }, args = list(nrow, root, name, interval))
  return(rs_cpu)
}


Result
Execution time: in the following figure A), the data extraction process is observed to be approximately 10 times slower, when utilizing the plumber API as compared to the R-native approach across all dataset sizes.  


(y-axis in logarithmic scale)

Both approaches display a linear increase in execution time on a logarithmic time scale, indicating exponential growth in the original data domain. Specifically, the mean execution times for R-native and Plumber start at 0.00078 and 0.00456 minutes, respectively, and escalate to 0.286 and 2.61 minutes. It is reasonable to assume that this exponential trend persists for larger datasets, potentially resulting in execution times exceeding half an hour for very large tables (> 100 million rows) when using Plumber.

Conversely, subfigure B) shows the execution time for sending data and illustrates that both approaches provide rather comparable performance, particularly with larger numbers of rows. While for 10,000 rows, the R-native approach is still twice as fast (average of 0.0023 minutes) compared to Plumber (0.00418), the advantage of being in one context diminishes as the number of rows increases. At 10 million rows, the Plumber approach is even faster than the R-native approach (1.88 min), averaging 1.7 minutes. Once again, the execution time exhibits an exponential growth trend with an increasing number of rows.

CPU Load: In examining maximum observable CPU load during data receiving and sending, notable differences emerge between the Plumber API and the R-native approach. 

(y-axis in logarithmic scale)

A) For data extraction up to 1 million rows, CPU utilization remains below 10% for both approaches. However, the utilization patterns diverge as row counts increase. Notably, the R-native approach maintains relatively consistent CPU usage (averaging 5.53%, 5.48%, 5.47%) up to 1 million rows, whereas the Plumber approach already experiences a noticeable increase (5.97%, 6.05%, 8.6%). When extracting 10 million rows, CPU usage surpasses 30% for Plumber, while R-native extraction incurs approximately five times less computational overhead. B) In contrast to execution time, a clear difference in CPU utilization becomes evident also during sending data. The R-native approach consistently demonstrates at least half as less CPU demands compared to Plumber across all data row sizes. For 10,000,000 rows, the plumber approach even consumes over three times more CPU power13.1% vs. 43.2%.  This makes up to almost 30GB in absolute terms.

Conclusion

The Plumber approach, while offering several advantages, encounters clear limitations when dealing with large datasets, be it tables with a substantial number of rows or extensive columns. As a result,  data extraction becomes roughly ten times slower, with CPU utilization being up to five and three times higher during getting data out and in, respectively. Digging deeper into it reveals that this gap is likely to result from the necessity of converting data into JSON format when using a web-based architecture. Plumber can’t handle R dataframes directly, which is why serializer have to to be used before sending and retrieving data from an endpoint. Even with lots of RAM capacity, this conversion process can lead to execution errors in practice as JSON representations may surpass the allowed byte size for the R datatype character.

>jsonlite::toJSON(dataframe)
Error in collapse_object(objnames, tmp, indent):
R character strings are limited to 2^31-1 bytes

The only viable workaround in such scenarios involves breaking down tables into smaller chunks based on certain identifiers.

Providing a precise table size limitation where the Plumber approach remains suitable proves challenging, as it hinges on a multitude of factors, including the number of rows, columns, and cell content within the dataset. Personally, I will stick to using the Plumber API for scenarios with limited data traffic, such as querying terminology or a statistical summary, as I generally prioritize code encapsulation and ease of maintenance over maximizing performance.

Micha Christ
Bosch Health Campus Centrum für Medizinische Datenintegration

55,000 in Awards for Energy & Buildings Hackathon, Sponsored by NYSERDA

The New York State Energy Research & Development Agency (NYSERDA) is partnering with Onboard Data to host a $55,000 Global Energy & Buildings Hackathon. We’re inviting all engineers, data scientists and software developers whether they are professionals, professors, researchers or students to participate. More below…


Challenge participants will propose exciting, new ideas that can improve our world’s buildings. The hackathon will share data from 200+ buildings to participants. This data set is rich and one of a kind. The data set is normalized from equipment, systems and IoT devices found within buildings.
We seek submissions that positively impact or accelerate the decarbonization of New York State buildings. 

Total awards are $55,000. Sign-ups stay open until April 15th and the competition is open from April 22nd to May 30th. More can be found here: www.rtemhackathon.com.

Advance the next generation of building technology!

Climate Change & AI for GOOD | Online Open Forum Oct 15th

Join Data Natives for a discussion on how to curb Climate Change and better protect our environment for the next generation. Get inspired by innovative solutions which use data, machine learning and AI technologies for GOOD. Lubomila Jordanova, Founder of Plan A, and featured speaker, explains that “the IT sector will use up to 51% of the global energy output in 2030. Let’s adjust the digital industry and use Data for Climate Action, because carbon reduction is key to making companies future-proof.” When used carefully, AI can help us solve some of the most serious challenges. However, key to that success is measuring impact with the right methods, mindsets, and metrics.

The founders of startups that developed innovative solutions to combat humanity’s biggest challenge, will share their experiences and thoughts: Brittany Salas (Co-Founder at Active Giving) Peter Sänger (Co-Founder/Executive Managing Director at Green City Solutions GmbH) Shaheer Hussam (CEO & Co-Founder at Aetlan) | Lubomila Jordanova (Founder at Plan A)  Oliver Arafat (Alibaba Cloud’s Senior Solution Architect)

Details
What? Climate Change & AI for GOOD | DN Unlimited Open Forum powered by Alibaba Cloud
When? October 15th at 6 PM CET
Where? Online, worldwide
Register for FREE here: https://datanatives.io/climate-change-ai-for-good-open-forum/

Creating Interactive Charts with R, Shiny, MySQL and AnyChart JS via Template

How to creating interactive chart with R, Shiny, MySQL and AnyChart JS via template

Data visualization and charting are actively evolving as a more and more important field of web development. In fact, people perceive information much better when it is represented graphically rather than numerically as raw data. As a result, various business intelligence apps, reports, and so on widely implement graphs and charts to visualize and clarify data and, consequently, to speed up and facilitate its analysis for further decision making. While there are many ways you can follow to handle data visualization in R, today let’s see how to create interactive charts with the help of popular JavaScript (HTML5) charting library AnyChart. It has recently got an official R, Shiny and MySQL template that makes the whole process pretty straightforward and easy. (Disclaimer: I am the CTO at the AnyChart team. The template I am talking about here is released under the Apache 2.0 license; the library itself can be used for free in any personal, educational and other non-profit projects and is open on GitHub, but for commercial purposes it requires a commercial license though is fully functional when taken on a free trial.) In this step-by-step tutorial, we will take a closer look at the template and the basic pie chart example it includes, and then I will show you how to quickly modify it to get some different data visualization, e.g. a 3D column (vertical bar) chart.

Briefly about AnyChart

AnyChart is a flexible, cross-browser JS charting library for adding interactive charts to websites and web apps in quite a simple way. Basically, it does not require any installations and work with any platform and database. Some more of AnyChart’s features include (but are not limited to): Templates for popular technology stacks, like R, Shiny and MySQL in the present case, can further facilitate AnyChart’s integration.

Getting started

First of all, let’s make sure the R language is installed. If not, you can visit the official R website and follow the instructions. If you have worked with R before, most likely you already have RStudio. Then you are welcome to create a project in it now, because the part devoted to R can be done there. If currently you do not have RStudio, you are welcome to install it from the official RStudio website. But, actually, using RStudio is not mandatory, and the pad will be enough in our case. After that, we should check if MySQL is properly installed. To do that, you can open a terminal window and enter the next command: $ mysql –version mysql Ver 14.14 Distrib 5.7.16, for Linux (x86_64) using EditLine wrapper You should receive the above written response (or a similar one) to be sure all is well. Please follow these instructions to install MySQL, if you do not have it at the moment. Now that all the required components have been installed, we are ready to write some code for our example.

Basic template

First, to download the R, Shiny and MySQL template for AnyChart, type the next command in the terminal: $ git clone https://github.com/anychart-integrations/r-shiny-mysql-template.git The folder you are getting here features the following structure: r-shiny-mysql-template/ www/ css/ style.css # css style app.R # main application code database_backup.sql # MySQL database dump LICENSE README.md index.html # html template Let’s take a look at the project files and examine how this sample works. We’ll run the example first. Open the terminal and go to the repository folder: $ cd r-shiny-mysql-template Set up the MySQL database. To specify your username and password, make use of -u and -p flags: $ mysql < database_backup.sql Then run the R command line, using the command below: $ R And install the Shiny and RMySQL packages as well as initialize the Shiny library at the end: > install.packages("shiny") > install.packages("RMySQL") > library(shiny) If you face any problems during the installation of these dependencies, carefully read error messages, e.g. you might need sudo apt-get install libmysqlclient-dev for installing RMySQL. Finally, run the application: > runApp("{PATH_TO_TEMPLATE}") # e.g. runApp("/workspace/r-shiny-mysql-template") And the new tab that should have just opened in your browser shows you the example included in the template: Interactive Pie chart created with R and AnyChart JS charting library. Basic sample from R, Shiny and MySQL integration template

Basic template: code

Now, let’s go back to the folder with our template to see how it works. Files LICENSE and README.md contain information about the license and the template (how to run it, technologies, structure, etc.) respectively. They are not functionally important to our project, and therefore we will not explore them here. Please check these files by yourself for a general understanding. The style.css file is responsible for the styles of the page. The database_backup.sql file contains a code for the MySQL table and user creation and for writing data to the table. You can use your own table or change the data in this one. Let’s move on to the code. First, open the app.R file. This file ensures the connection to the MySQL database, reads data, and passes it to the index.html file, which contains the main code of using AnyChart. The following is a part of the app.R code, which contains the htmlTemplate function; here we specify the name of the file where the data will be transmitted to, the names of our page and chart, and the JSON encoded chart data from MySQL database. htmlTemplate("index.html", title = "Anychart R Shiny template", chartTitle = shQuote("Top 5 fruits"), chartData = toJSON(loadData()) The main thing here is the index.html file, which is actually where the template for creating charts is. As you see, the first part of this file simply connects all necessary files to the code, including the AnyChart library, the CSS file with styles, and so on. I’ll skip this for now and proceed directly to the script tag and the anychart.onDocumentReady (function () {...}) function. [code lang=”javascript”] anychart.onDocumentReady(function() { var chart = anychart.pie({{ chartData }}); chart.title({{ chartTitle }}); chart.container("container"); chart.draw(); }); This pattern works as follows. We create a pie chart by using the function pie() and get the data that have already been read and prepared using the R code. Please note that the names of the variables containing data are the same in the app.R and index.html files. Then we display the chart title via (chart.title({{ chartTitle }})) and specify the ID of the element that will contain a chart, which is a div with id = container in this case. To show all that was coded, we use сhart.draw().

Modifying the template to create a custom chart

Now that we’ve explored the basic example included in the template, we can move forward and create our own, custom interactive chart. To do that, we simply need to change the template a little bit and add some features if needed. Let’s see how it works. First, we create all the necessary files by ourselves or make a new project using RStudio. Second, we add a project folder named anychart. Its structure should look like illustrated below. Please note that some difference is possible (and acceptable) if you are using a new project in RStudio. anychart/ www/ css/ style.css # css style ui.R # main application code server.R # sub code database_backup.sql # data set index.html # html template Now you know what files you need. If you’ve made a project with the studio, the ui.R and server.R files are created automatically. If you’ve made a project by yourself, just create empty files with the same names and extensions as specified above. The main difference from the original example included in the template is that we should change the file index.html and divide app.R into parts. You can copy the rest of the files or create new ones for your own chart. Please take a look at the file server.R. If you’ve made a project using the studio, it was created automatically and you don’t need to change anything in it. However, if you’ve made it by yourself, open it in the Notepad and add the code below, which is standard for the Shiny framework. You can read more about that here. The file structure of ui.R is similar to the one of app.R, so you can copy app.R from the template and change/add the following lines: loadData = dbGetQuery(db, "SELECT name, value FROM fruits") data1 <- character() #data preparation for(var in 1:nrow(loadData)){ c = c(as.character(loadData[var, 1]), loadData[var, 2]) data1 <- c(data1, c) } data = matrix(data1, nrow=nrow(loadData), ncol=2, byrow=TRUE) ui = function(){ htmlTemplate("index.html", title = "Anychart R Shiny template", chartTitle = shQuote("Fruits"), chartData = toJSON(data) )} Since we are going to change the chart type, from pie to 3D vertical bar (column), the data needs some preparation before being passed to index.html. The main difference is that we will use the entire data from the database, not simply the top 5 positions. We will slightly modify and expand the basic template. Let’s see the resulting code of the index.html first (the script tag) and then explore it. [code lang=”javascript”] anychart.onDocumentReady(function() { var chart = anychart.column3d({{ chartData }}); chart.title({{ chartTitle }}); chart.animation(true); var xAxis = chart.xAxis(); xAxis.title("fruits"); var yAxis = chart.yAxis(); yAxis.title("pounds, t"); var yScale = chart.yScale(); yScale.minimum(0); yScale.maximum(120); chart.container("container"); chart.draw(); }); With the help of var chart = anychart.column3d({{chartData}}), we are creating a 3D column chart by using the function column3d(). Here you can choose any other chart type you need; consider getting help from Chartopedia if you are unsure which one works best in your situation. Next, we are adding animation to the column chart via chart.animation (true) to make it appear on page load gradually. In the following section, we are creating two variables, xAxis and yAxis. Including these is required if you want to provide the coordinate axes of the chart with captions. So, you should create variables that will match the captions for the X and Y axes, and then use the function, transmit the values that you want to see. The next unit is basically optional. We are explicitly specifying the maximum and minimum values for the Y axis, or else AnyChart will independently calculate these values. You can do that the same way for the X axis. And that’s it! Our 3D column chart is ready, and all seems to be fine for successfully running the code. The only thing left to do before that is to change the MySQL table to make it look as follows: (‘apple’,100), (‘orange’,58), (‘banana’,81), (‘lemon’,42), (‘melon’,21), (‘kiwi’,66), (‘mango’,22), (‘pear’,48), (‘coconut’,29), (‘cherries’,65), (‘grapes’,31), (‘strawberries’,76), To see what you’ve got, follow the same steps as for running the R, Shiny and MySQL template example, but do not forget to change the path to the folder and the folder name to anychart. So, let’s open the terminal and command the following: $ cd anychart $ mysql < database_backup.sql $ R > install.packages("shiny") > install.packages("RMySQL") > library(shiny) > runApp("{PATH_TO_TEMPLATE}") # e.g. runApp("/workspace/anychart") Interactive 3D Column chart made with R and AnyChart JS charting library, based on R, Shiny and MySQL integration template For consistency purposes, I am including the code of ui.R and server.R below. The full source code of this example can be found on GitHub.

ui.R:

library(shiny) library(RMySQL) library(jsonlite) data1 <- character() db = dbConnect(MySQL(), dbname = "anychart_db", host = "localhost", port = 3306, user = "anychart_user", password = "anychart_pass") loadData = dbGetQuery(db, "SELECT name, value FROM fruits") #data preparation for(var in 1:nrow(loadData)){ c = c(as.character(loadData[var, 1]), loadData[var, 2]) data1 <- c(data1, c) } data = matrix(data1, nrow=nrow(loadData), ncol=2, byrow=TRUE) server = function(input, output){} ui = function(){ htmlTemplate("index.html", title = "Anychart R Shiny template", chartTitle = shQuote("Fruits"), chartData = toJSON(data) )} shinyApp(ui = ui, server = server)

server.R:

library(shiny) shinyServer(function(input, output) { output$distPlot <- renderPlot({ # generate bins based on input$bins from ui.R x <- faithful[, 2] bins <- seq(min(x), max(x), length.out = input$bins + 1) # draw the chart with the specified number of bins hist(x, breaks = bins, col = ‘darkgray’, border = ‘white’) }) })

Conclusion

When your technology stack includes R, Shiny and MySQL, using AnyChart JS with the integration template we were talking about in this tutorial requires no big effort and allows you to add beautiful interactive JavaScript-based charts to your web apps quite quickly. It is also worth mentioning that you can customize the look and feel of charts created this way as deeply as needed by using some of the library’s numerous out-of-the-box features: add or remove axis labels, change the background color and how the axis is positioned, leverage interactivity, and so on. The scope of this tutorial is likely to be actually even broader, because the process I described here not only applies to the AnyChart JS charting library, but also is mostly the same for its sister libraries AnyMap (geovisualization in maps), AnyStock (date/time graphs), and AnyGantt (charts for project management). All of them are free for non-profit projects but – I must put it clearly here again just in case – require a special license for commercial use. I hope you find this article helpful in your activities when it comes to interactive data visualization in R. Now ask your questions, please, if any.