jhk0530 – R-posts.com

Is round(0.5) 0 or 1?

Actually, both are possible

This article was originally published in Korean on YOZM-IT.

The many ways of doing data science

There are many programming languages in the world, and many pieces of software built on them, and they play an important role in data science.

For example, if you’re using funnel analysis to improve your product, you might want to

Compare the bounce rates of funnel stages before and after an event,
And perform a proportion test to check whether the difference is statistically significant.

Meanwhile, data scientists come from varied career backgrounds and experiences, so they tend to use the tools they’re comfortable with, including Python, R, SAS and more.

We see this quite a bit because, in most cases, the software you use doesn’t make much of a difference at the business level.

But what happens if the results differ depending on the software used?

The following image shows the results of running a proportion test in R, Python, and STATA for the example mentioned above.

You can see that even though we used the same values of 1000 and 123, the p-value, which indicates the significance of the proportion test, is slightly different for each method.

There are many reasons why the computed value can differ depending on the method used, such as

Different algorithms in the core logic of the programming language
Different default values for the parameters of the function used

In the example above, if you set the parameter correct to “correct = F” in R, which turns off the “Continuity correction” that R applies by default, you can see that the result is the same as in STATA.
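A minimal sketch of this effect in R, using the 123 and 1000 from the example. The null proportion of 0.1 is an assumption made only for illustration, since the exact test setup behind the image is not given here.

```r
# One-sample proportion test: 123 successes out of 1000 trials.
# p = 0.1 is an assumed null proportion, chosen only for illustration.
with_correction    <- prop.test(x = 123, n = 1000, p = 0.1, correct = TRUE)  # R's default
without_correction <- prop.test(x = 123, n = 1000, p = 0.1, correct = FALSE)

# Same data, same test, different p-values because of one default argument
c(
  corrected   = with_correction$p.value,
  uncorrected = without_correction$p.value
)
```

The continuity correction shrinks the test statistic, so the corrected p-value is the larger of the two.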

Rounding

Next, let’s look at rounding, which comes up in far more general data analysis.

Similarly, you can see that round gives different values depending on the software.

If the fee is “0.5 billion” in some large financial transaction in business, the rounded cost could be zero or 1 billion, depending on how you calculate the rounding.
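As a small illustration of the fee example in R: base R's round() sends exact halves to the nearest even integer, while the "school" rule (written here as floor(x + 0.5), a hypothetical stand-in for software that rounds halves up) gives a different answer.

```r
fee <- 0.5               # the "0.5 billion" fee from the example

round(fee)               # base R: 0
floor(fee + 0.5)         # round-half-up rule: 1

# The same pattern for a few more exact halves
halves <- c(0.5, 1.5, 2.5, 3.5)
round(halves)
# [1] 0 2 2 4
```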

Another case is logistic regression: when a predicted probability of exactly 0.5 is rounded to a class label, a different rounding rule can reverse the prediction.

Image from Wikipedia, edited by the author

Why is round different?

Let’s talk a little more about why rounding differs.

Rounding as we usually think of it means changing 0 ~ 4 to 0, and 5 ~ 9 to 10, as shown in the image below.

With decimals, it means rounding to the nearest whole number by changing .0 ~ .4999.. to 0 and .5 ~ .9999.. to 1.

However, there are several mathematical interpretations of what to do when the fractional part is exactly 0.5, and when the number is negative.

For example, should round(-23.5) produce -23 or -24?

Both are possible, depending on the mathematical interpretation; they are called rounding half up and rounding half down, respectively. We can take this a step further and round both positive and negative numbers toward zero, or away from it.

This means that round(-23.5) gives -23 and round(23.5) gives 23, or they give -24 and 24, respectively. These are called Rounding half toward zero and Rounding half away from zero, respectively.

Finally, there are methods called Rounding half to even and Rounding half to odd, which break a tie by choosing the nearest even or odd integer, respectively.

In particular, the Rounding half to even method also goes by the names Convergent rounding, Statistician’s rounding, Dutch rounding, Gaussian rounding, and Bankers’ rounding, and is one of the official standard methods according to IEEE 754.
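The tie-breaking rules described above can be sketched as small R functions. All helper names are hypothetical and written only for this illustration; they assume inputs whose halves are exactly representable (such as 23.5), and only round_half_even relies on a built-in, namely base R's round().

```r
# Tie-breaking rules for values whose fractional part is exactly .5.
# Helper names are hypothetical, written for this illustration.
round_half_up        <- function(x) floor(x + 0.5)                   # ties toward +Inf
round_half_down      <- function(x) ceiling(x - 0.5)                 # ties toward -Inf
round_half_to_zero   <- function(x) sign(x) * ceiling(abs(x) - 0.5)  # ties toward 0
round_half_from_zero <- function(x) sign(x) * floor(abs(x) + 0.5)    # ties away from 0
round_half_even      <- function(x) round(x)                         # base R's behaviour
round_half_odd       <- function(x) {
  r <- round_half_even(x)
  is_tie <- abs(x - trunc(x)) == 0.5
  # On a tie, the neighbour that is not even is the odd one
  ifelse(is_tie, 2 * x - r, r)
}

x <- c(23.5, -23.5)
sapply(
  list(
    half_up = round_half_up, half_down = round_half_down,
    toward_zero = round_half_to_zero, away_from_zero = round_half_from_zero,
    half_even = round_half_even, half_odd = round_half_odd
  ),
  function(f) f(x)
)
```

Each column of the resulting matrix shows how one rule treats 23.5 and -23.5; for example, half up gives 24 and -23, while half away from zero gives 24 and -24.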

Bankers’ rounding

Bankers’ rounding is the default method in R, so let’s look at it in a little more detail.

The image below shows the result of rounding from 0.0 to 2.0.

While this may seem like a good idea, there is actually a problem. Because .5 is unconditionally rounded up to the next integer under this rule, there is a systematic bias toward larger (“+”) values.

I don’t know the exact reason for this, but one theory is that the US IRS used to round this way when collecting taxes and was sued for unfairly profiting from people whose amounts ended in .5. As the story goes, the IRS lost the case and switched to rounding .5 to the nearest even (or odd) number.

This means that by modifying the rounding as shown below, we can avoid the bias that was previously occurring.
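The bias can be checked numerically with a short sketch. The ten values below are chosen for illustration (exact halves from 0.5 to 9.5), and floor(x + 0.5) stands in for the conventional rule.

```r
x <- seq(0.5, 9.5, by = 1)     # ten exact halves: 0.5, 1.5, ..., 9.5

half_up   <- floor(x + 0.5)    # conventional rounding: ties always go up
half_even <- round(x)          # bankers' rounding (base R)

sum(half_up - x)               # 5: every tie adds +0.5
sum(half_even - x)             # 0: the ups and downs cancel out
```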

The problem with different results

In recent years, industries in various domains, including pharmaceuticals and finance, have been trying to switch from “commercial” software such as SPSS, SAS and STATA to “open source” software such as Python, R and Julia.

And, as with the rounding example above, the issue of different software producing different results has been raised, which can create problems in terms of reproducibility, uncertainty, accuracy, and traceability.

So if you’re using multiple software packages, you should be aware of why they produce different results, and how you can use them properly.

CAMIS project

CAMIS stands for Comparing Analysis Method Implementations in Software.

This project compares how different software (or programming languages) implement the same analyses and sets standards for producing the same results.

The core area of the project is “statistical computation”, so most contributions come from data science leaders who have a strong understanding of it.

But CAMIS is also an open source project: it is not restricted to them and is maintained by a variety of people through regular discussions, collaboration, and sharing of project progress.

Below is one of the comparisons published on the CAMIS project’s webpage, which reviews how a one sample t-test is run with each software, what the results are, and how the results are compatible with each other.
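To give a feel for what such a comparison checks, here is a sketch that runs a one sample t-test in R and recomputes the statistic by hand. It uses the built-in faithful dataset, and the hypothesised mean of 70 is an assumption for illustration; it is not taken from the CAMIS page, where the comparison is made against other software rather than a hand calculation.

```r
x  <- faithful$waiting   # built-in dataset shipped with R
mu <- 70                 # hypothesised mean, assumed for illustration

res <- t.test(x, mu = mu)

# Recompute the same quantities from the textbook formula
n      <- length(x)
t_stat <- (mean(x) - mu) / (sd(x) / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# Both comparisons should report TRUE
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```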

The CAMIS project was started by members interested in “SAS to R” in the medical and pharmaceutical industry. So it mainly focuses on R and SAS for major statistical analyses, but recently it has also been working on how to use Python for data science in a broader range of industries.

Not only classical methods such as hypothesis tests and regression analysis, but also modern data science methods such as Bayesian statistics, causal inference, and novel implementations of existing methods (e.g. MMRM) are topics of interest in the project.

Sessions on CAMIS are increasingly appearing at data science conferences, where researchers and contributors are encouraged to promote it, contribute to it, and use it as a reference.

Finally, the CAMIS project is also collaborating with academia beyond the data science industry, as similar topics have been published in The American Statistician and Drug Information Association, among others.

The project is also currently working with students on a thesis entitled “A comparison of MMRM methodology in SAS and R software” and is open to collaborations and suggestions on other topics.

Summary

Various software is used in data science. Depending on the domain, the libraries or software an organization uses may be tied to a particular language, and this is sometimes mixed with personally preferred methods. (In many cases, this doesn’t matter much at the business level.)

However, if you’re not careful, the methods you use can lead to different results.

In this article, I’ve given you some examples of and reasons for differences in the methods used by different software for calculations, and introduced the CAMIS project, a research project that aims to minimize them to ensure consistency in data analysis.

If you use different software in your data analytics work, it’s a good idea to take a look at these comparisons to understand the differences and find the optimal method for your purposes.

And if you work in data science in the field, I highly recommend that you take an interest in or contribute to the CAMIS project for a global collaborative experience.

Add shiny in quarto blog with shinylive

Shiny, without server

In a previous article, I introduced a method for sharing a shiny application on a static web page (a github page).

At the core of this method is a technology called WASM, which is a way to load and utilize R and Shiny-related libraries and files that have been converted for use in a web browser. The main problem with wasm is that it is difficult to configure, even for R developers.

Of course, there was already a tool called shinylive, but unfortunately it was only available for Python at the time.

Fortunately, a few months later, there is an R package that solves this configuration problem, and I will introduce how to use it to add a shiny application to a static page.

shinylive

shinylive is an R package that runs shiny on top of wasm. It now has both Python and R versions, and this article is based on the R version.

shinylive is responsible for generating HTML, Javascript, CSS, and other elements needed to create web pages, as well as wasm-related files for using shiny.

You can see examples created with shinylive at this link.

Install shinylive

While shinylive is available on CRAN, it is recommended to use the latest version from github, as it may be updated from time to time; the most recent release is 0.1.1. Additionally, pak is the R package that Posit currently recommends for installing R packages, and it can replace existing functions like install.packages() and remotes::install_github().

# install.packages("pak")

pak::pak("posit-dev/r-shinylive")

You can think of shinylive as adding a wasm to an existing shiny application, which means you need to create a shiny application first.

For the example, we’ll use the code provided by the shiny package (which you can also see by typing shiny::runExample("01_hello") in the RStudio console).

library(shiny)

ui <- fluidPage(
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    sidebarPanel(
      sliderInput(
        inputId = "bins",
        label = "Number of bins:",
        min = 1,
        max = 50,
        value = 30
      )
    ),
    mainPanel(
      plotOutput(outputId = "distPlot")
    )
  )
)

server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- faithful$waiting
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x,
      breaks = bins, col = "#75AADB", border = "white",
      xlab = "Waiting time to next eruption (in mins)",
      main = "Histogram of waiting times"
    )
  })
}

shinyApp(ui = ui, server = server)

This code creates a simple shiny application that draws a histogram whose number of bins changes in response to the user’s input, as shown below.

There are two ways to create a static page with this code using shinylive, one is to create it as a separate webpage (like previous article) and the other is to embed it as internal content on a quarto blog page .

First, here’s how to create a separate webpage.

shinylive via web page

To serve shiny on a separate static webpage, you’ll need to convert your app.R to a webpage using the shinylive package you installed earlier.

Assuming you created a folder named shinylive in Documents (~/Documents) and saved `app.R` inside it, here’s an example of what the export call looks like

shinylive::export('~/Documents/shinylive', '~/Documents/shinylive_out')

When you run this code, it will create a new folder called shinylive_out in the same location as shinylive, (i.e. in My Documents), and inside it, it will generate the converted wasm version of shiny code using the shinylive package.

If you check the contents of this shinylive_out folder, you can see that it contains the webr, service worker, etc. mentioned in the previous post.

More specifically, the export function copies the files from the local PC’s shinylive package assets, i.e. the library files related to shiny, into the out directory on the local PC currently running RStudio.

Now, if you create a github page or something based on the contents of this folder, you can serve a static webpage that provides shiny, and you can preview the result with the command below.

httpuv::runStaticServer("~/Documents/shinylive_out")
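The export and preview steps can be wrapped in a small helper, sketched below. The function name export_shinylive is hypothetical; it only checks that app.R exists before calling shinylive::export() and then lists what was written.

```r
# Hypothetical helper: export a shiny app folder with shinylive and
# return the paths of the generated files.
export_shinylive <- function(app_dir, out_dir) {
  app_file <- file.path(app_dir, "app.R")
  if (!file.exists(app_file)) {
    stop("No app.R found in ", app_dir)
  }
  shinylive::export(app_dir, out_dir)
  list.files(out_dir, recursive = TRUE)
}

# Usage (requires the shinylive and httpuv packages to be installed):
# files <- export_shinylive("~/Documents/shinylive", "~/Documents/shinylive_out")
# head(files)
# httpuv::runStaticServer("~/Documents/shinylive_out")
```

Failing early on a missing app.R gives a clearer message than letting the export itself fail on an empty folder.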

shinylive in quarto blog

To add a shiny application to a quarto blog, you need to use a separate extension. The quarto extension is a separate package that extends the functionality of quarto, similar to using R packages to add functionality to basic R.

First, we need to add the quarto extension by running the following code in the terminal (not the console) of RStudio.

quarto add quarto-ext/shinylive

You don’t need to create a separate file to embed shiny in your quarto blog; you can use a code block called {shinylive-r}. Additionally, you need to set the shinylive filter in the yaml of your index.qmd.

filters: 
- shinylive

Then, in the {shinylive-r} block, write the contents of the app.R we created earlier.

#| standalone: true
#| viewerHeight: 800
library(shiny)
ui <- fluidPage(
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    sidebarPanel(
      sliderInput(
        inputId = "bins",
        label = "Number of bins:",
        min = 1,
        max = 50,
        value = 30
      )
    ),
    mainPanel(
      plotOutput(outputId = "distPlot")
    )
  )
)
server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- faithful$waiting
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x,
      breaks = bins, col = "#75AADB", border = "white",
      xlab = "Waiting time to next eruption (in mins)",
      main = "Histogram of waiting times"
    )
  })
}
shinyApp(ui = ui, server = server)

After adding this to your quarto blog, you should see a working shiny application.

You can see a working example at this link.

Summary

shinylive is a feature that utilizes wasm to run shiny on static pages, such as GitHub pages or quarto blogs, and is available as an R package and quarto extension, respectively.

Of course, since it is less than a year old, not all features are available, and since it uses static pages, there are disadvantages compared to utilizing a separate shiny server.

However, it is very popular for introducing shiny usage and simple statistical analysis, and you can practice it right on the website without installing R, and more features are expected to be added in the future.

The code used in the blog (the previous example link) can be found at the link.

Author: jhk0530

Use google’s Gemini in R with R package “gemini.R”

Introduction

A few days ago, Google presented its own multimodal LLM named “Gemini”.

There was also an article named “How to Integrate google’s gemini AI model into R” that briefly explains how to use the Gemini API in R.

Thanks to Deepanshu Bhalla (the writer of the article above), I got a lot of inspiration and did some research on how to make more use of the Gemini API. I’m glad to share the results with you.

In this article, I want to highlight how to use Gemini with R and Shiny via an R package for the Gemini API.

(You can see the result and contribute in the github repository: gemini.r)

Gemini API

As of today (2023-12-26), the Gemini API mainly consists of 4 things. You can see more details in the official docs.

1. Gemini Pro: takes text and returns text
2. Gemini Pro Vision: takes text and an image and returns text
3. Gemini Pro Multi-turn: just chat
4. Embedding: for NLP

I’ll use 1 & 2.

You can get API keys in Google AI Studio.

However, the official docs don’t describe how to use the Gemini API in R. (How sad)
But we can handle it as a “REST API”. (I’ll explain it later)

Shiny application

I made a very brief concept of a Shiny application that uses the Gemini API: it takes an image and text (maybe “Explain this picture”) and returns the answer from Gemini.

(The numbers show the expected user flow)

This UI consists of 5 components.

1. fileInput for uploading an image
2. imageOutput for showing the uploaded image
3. textInput for the prompt
4. actionButton for sending the request to Gemini
5. textOutput for showing the return value from Gemini

And this is the resulting shiny R code (again, you can see it in the github repository)

library(shiny)
library(gemini.R)

ui <- fluidPage(
  sidebarLayout(
    NULL,
    mainPanel(
      # 1. upload an image
      fileInput(
        inputId = "file",
        label = "Choose file to upload"
      ),
      # 2. show the uploaded image
      div(
        style = "border: solid 1px blue;",
        imageOutput(outputId = "image1")
      ),
      # 3. prompt
      textInput(
        inputId = "prompt",
        label = "Prompt",
        placeholder = "Enter Prompts Here"
      ),
      # 4. send the request to Gemini
      actionButton("goButton", "Ask to gemini"),
      # 5. show the answer
      div(
        style = "border: solid 1px blue; min-height: 100px;",
        textOutput("text1")
      )
    )
  )
)

server <- function(input, output) {
  # Show the image as soon as a file is uploaded
  observeEvent(input$file, {
    path <- input$file$datapath
    output$image1 <- renderImage(
      {
        list(src = path)
      },
      deleteFile = FALSE
    )
  })

  # Ask Gemini when the button is clicked
  observeEvent(input$goButton, {
    output$text1 <- renderText({
      gemini_image(input$prompt, input$file$datapath)
    })
  })
}

shinyApp(ui = ui, server = server)

gemini.R package

You may be wondering, “What is the gemini_image function?”

It is a function that sends a request to the Gemini server and returns the result.

It consists of 3 main parts.

1. Model query
2. API key
3. Content

I used the gemini_image function in the example, but I’ll explain the gemini function first (the function that sends text and gets text back).

An example of Gemini API usage looks like below. (for the REST API)

This can be translated into R like below.

Also, the Gemini API key must be set beforehand with the “Sys.setenv” function.

Anyway, note that the body of the API request mainly consists of lists.
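As a rough sketch of what such a text request can look like in R (this is not the package's actual source): the helper names, the environment variable name GEMINI_API_KEY, and the endpoint URL are assumptions, with the URL following the public REST documentation at the time of writing. The sender uses httr, which serialises the nested list to JSON.

```r
# Build the request body as nested lists; unnamed lists become JSON arrays.
# Helper names are hypothetical, written for this illustration.
gemini_body <- function(prompt) {
  list(contents = list(
    list(parts = list(
      list(text = prompt)
    ))
  ))
}

# Hypothetical sender: model query (URL) + API key + content (body).
# The endpoint and the GEMINI_API_KEY variable name are assumptions.
ask_gemini <- function(prompt, api_key = Sys.getenv("GEMINI_API_KEY")) {
  url <- "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent"
  resp <- httr::POST(
    url,
    query = list(key = api_key),
    body = gemini_body(prompt),
    encode = "json"
  )
  httr::content(resp)$candidates[[1]]$content$parts[[1]]$text
}

# Usage (needs a real key and network access):
# Sys.setenv(GEMINI_API_KEY = "your-api-key")
# ask_gemini("Explain what R is in one sentence")
```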

Similarly, the gemini_image function for the Gemini Pro Vision API looks like below.

Note that the image must be encoded as base64 using the base64encode function and provided as a separate list.
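A sketch of how that separate list can be laid out, again with hypothetical helper names. The field names inline_data, mime_type and data are assumptions that follow the REST examples for Gemini Pro Vision, and the helper takes an already encoded string so it stays free of package dependencies.

```r
# Body for a text + image request: the prompt and the image are two
# separate entries in `parts`. Helper name and field names are assumptions.
gemini_image_body <- function(prompt, image_base64, mime_type = "image/png") {
  list(contents = list(
    list(parts = list(
      list(text = prompt),
      list(inline_data = list(mime_type = mime_type, data = image_base64))
    ))
  ))
}

# Usage (requires the base64enc package and an image file on disk):
# encoded <- base64enc::base64encode("picture.png")
# body <- gemini_image_body("Explain this picture", encoded)
```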

Example

So, with the Shiny application and the gemini.R package,

you can now run the example application and ask Gemini about an image.

Summary

I made a very basic R package, “gemini.R”, for using the Gemini API.

It provides 2 functions: gemini and gemini_image.

There are still many possibilities for developing this package,

like a feature to chat like Bard, or to provide NLP embeddings.

And finally, I want to hear feedback or contributions from you. (Really)

Thanks.

* P.S. I think just using Bard / ChatGPT / Copilot is much better for personal usage (unless you want to provide an AI service via R).

Build serverless shiny application via Github page

Simple guide for simple shiny application

TL;DR

I made a shiny application on a github page with quarto.

You can check the code in my github repository and the results: result, result2

How we use shiny

Shiny is an R package that lets users work with R through a web browser without installing it.

So my company utilizes shiny to provide statistical analysis for doctors (who don’t know R but need statistics).

Behind shiny

As you know, shiny consists of 2 parts: UI and server.

You can think of the UI as the channel that both gets input (data) from the user and returns the calculated output (result) to the user,
and the server as just a calculator.

This means the server performs dynamic calculations whose results may change, rather than serving fixed content (which is called a static web page).

To achieve dynamic calculation, there are several options.

We can use shinyapps.io, Posit Connect, or deploy our own shiny server on another cloud like AWS / Azure / GCP …

These options can be categorized into two main categories: free but with limited features, or feature-rich but paid.

There is no single right answer, but I use shinyapps.io for toy-level projects and deploy with shiny server on the company’s cloud server for projects beyond toy level.

The rise of webR

Recently, WebAssembly (wasm) has emerged. It lets you use a programming language in a web browser (like Chrome) without installing it (via javascript).

As far as I know, webR (the wasm version of R) has been under development since late 2022, and some examples are being shared to make R available on the web.

I understand the logic of webR as in the figure below (but understanding it is not necessary to run it).

Shiny with wasm

For shiny, there is already a wasm application called shinylive, but it uses shiny for Python.

Personally, I’m not familiar with Python, since I have used R for a long time,
so I wasn’t interested in it.

But very recently, an article was shared in Appsilon’s shiny weekly newsletter,

in which Leemput very kindly explains how to embed webR and a shiny application in WordPress.

Since WordPress here acts as a static page service, this means that shiny can also be embedded using the github page I’m familiar with. (There are examples with Netlify, so I think static page services like Vercel, Notion, Firebase or even Medium may be able to use webR.)

The logic is as below (as I understand it).

Note: the main difference between webR and shiny wasm is the service worker.

Let’s Build it

To build a serverless shiny application with a github page, we need 3 + 1 things.

1. HTML contents (a button to show the status of wasm and an iframe for the shiny application)

2. shiny code (app.R; we’ll use a pre-made and publicly available app)

3. javascript to run the service worker (web worker + service worker)

and the hard part:

4. configuration for the github page to use the service worker via proxy.

We can use the resources provided by Leemput (HTML contents and javascript code),

so let’s make index.qmd like below (you can check it in my repo too).

Important: Change html to “{=html}”. I changed it since it breaks the wordpress site.

---
title: "serverless shiny with github page"
include-in-header:
  text: |
    <script type='application/javascript' src='enable-threads.js'></script>
---

```html
<button class="btn btn-success btn-sm" type="button" style="background-color: dodgerblue" id="statusButton">
  <i class="fas fa-spinner fa-spin"></i>
  Loading webR...
</button>
<div id="iframeContainer"></div>
<script defer src="https://use.fontawesome.com/releases/v5.15.4/js/all.js" integrity="sha384-rOA1PnstxnOBLzCLMcre8ybwbTmemjzdNlILg8O7z1lUkLXozs4DHonlDtnE7fpc" crossorigin="anonymous"></script>
<script type="module">
  import { WebR } from 'https://webr.r-wasm.org/latest/webr.mjs';
  const webR = new WebR();
  // TODO
  const shinyScriptURL = 'https://raw.githubusercontent.com/rstudio/shiny/main/inst/examples/01_hello/app.R'
  const shinyScriptName = 'app.R'
  let webSocketHandleCounter = 0;
  let webSocketRefs = {};
  const loadShiny = async () => {
    try {
      document.getElementById('statusButton').innerHTML = `
        <i class="fas fa-spinner fa-spin"></i>
        Setting up websocket proxy and register service worker`;
      class WebSocketProxy {
        url;
        handle;
        bufferedAmount;
        readyState;
        constructor(_url) {
          this.url = _url
          this.handle = webSocketHandleCounter++;
          this.bufferedAmount = 0;
          this.shelter = null;
          webSocketRefs[this.handle] = this;
          webR.evalRVoid(`
                        onWSOpen <- options('webr_httpuv_onWSOpen')[[1]]
                        if (!is.null(onWSOpen)) {
                          onWSOpen(${this.handle},list(handle = ${this.handle}))
                        }`)
          setTimeout(() => {
            this.readyState = 1;
            this.onopen()},
            0);
        }
        async send(msg) {
          webR.evalRVoid(`
          onWSMessage <- options('webr_httpuv_onWSMessage')[[1]]
          if (!is.null(onWSMessage)) {onWSMessage(${this.handle}, FALSE, '${msg}')}
          `)
        }
      }
      await webR.init();
      console.log('webR ready');
      (async () => {
        for (; ;) {
          const output = await webR.read();
          switch (output.type) {
            case 'stdout':
              console.log(output.data)
              break;
            case 'stderr':
              console.log(output.data)
              break;
            case '_webR_httpuv_TcpResponse':
              const registration = await navigator.serviceWorker.getRegistration();
              registration.active.postMessage({
                type: "wasm-http-response",
                uuid: output.uuid,
                response: output.data,
              });
              break;
            case '_webR_httpuv_WSResponse':
              const event = { data: output.data.message };
              webSocketRefs[output.data.handle].onmessage(event);
              console.log(event)
              break;
          }
        }
      })();
      // TODO
      const registration = await navigator.serviceWorker.register('/wasmR/httpuv-serviceworker.js', { scope: '/wasmR/' }).catch((error) => {
      console.error('Service worker registration error:', error);
      });
      if ('serviceWorker' in navigator) {
        navigator.serviceWorker.getRegistration()
          .then((registration) => {
            if (registration) {
              const scope = registration.scope;
              console.log('Service worker scope:', scope);
            } else {
              console.log('No registered service worker found.');
            }
          })
          .catch((error) => {
            console.error('Error retrieving service worker registration:', error);
          });
      } else {
        console.log('Service workers not supported.');
      }
      await navigator.serviceWorker.ready;
      window.addEventListener('beforeunload', async () => {
        await registration.unregister();
      });
      console.log("service worker registered");
      document.getElementById('statusButton').innerHTML = `
        <i class="fas fa-spinner fa-spin"></i>
        Downloading R script...
      `;
      await webR.evalR("download.file('" + shinyScriptURL + "', '" + shinyScriptName + "')");
      console.log("file downloaded");
      document.getElementById('statusButton').innerHTML = `
        <i class="fas fa-spinner fa-spin"></i>
        Installing packages...
      `;
      await webR.installPackages(["shiny", "jsonlite"])
      document.getElementById('statusButton').innerHTML = `
        <i class="fas fa-spinner fa-spin"></i>
        Loading app...
      `;
      webR.writeConsole(`
          library(shiny)
          runApp('` + shinyScriptName + `')
      `);
      // Setup listener for service worker messages
      navigator.serviceWorker.addEventListener('message', async (event) => {
        if (event.data.type === 'wasm-http-fetch') {
          var url = new URL(event.data.url);
          var pathname = url.pathname.replace(/.*\/__wasm__\/([0-9a-fA-F-]{36})/, "");
          var query = url.search.replace(/^\?/, '');
          webR.evalRVoid(`
                     onRequest <- options("webr_httpuv_onRequest")[[1]]
                     if (!is.null(onRequest)) {
                       onRequest(
                         list(
                           PATH_INFO = "${pathname}",
                           REQUEST_METHOD = "${event.data.method}",
                           UUID = "${event.data.uuid}",
                           QUERY_STRING = "${query}"
                         )
                       )
                     }
                     `);
        }
      });
      // Register with service worker and get our client ID
      const clientId = await new Promise((resolve) => {
        navigator.serviceWorker.addEventListener('message', function listener(event) {
          if (event.data.type === 'registration-successful') {
            navigator.serviceWorker.removeEventListener('message', listener);
            resolve(event.data.clientId);
            console.log("event data:")
            console.log(event.data)
          }
        });
        registration.active.postMessage({ type: "register-client" });
      });
      console.log('I am client: ', clientId);
      console.log("serviceworker proxy is ready");
      // Load the WASM httpuv hosted page in an iframe
      const containerDiv = document.getElementById('iframeContainer');
      let iframe = document.createElement('iframe');
      iframe.id = 'app';
      iframe.src = `./__wasm__/${clientId}/`;
      iframe.frameBorder = '0';
      iframe.style.width = '100%';
      iframe.style.height = '600px'; // Adjust the height as needed
      iframe.style.overflow = 'auto';
      containerDiv.appendChild(iframe);
      // Install the websocket proxy for chatting to httpuv
      iframe.contentWindow.WebSocket = WebSocketProxy;
      document.getElementById('statusButton').innerHTML = `
          <i class="fas fa-check-circle"></i>
          App loaded!
      `;
      document.getElementById('statusButton').style.backgroundColor = 'green';
      console.log("App loaded!");
    } catch (error) {
      console.log("Error:", error);
      document.getElementById('statusButton').innerHTML = `
        <i class="fas fa-times-circle"></i>
        Something went wrong...
      `;
      document.getElementById('statusButton').style.backgroundColor = 'red';
    }
  };
  loadShiny();
</script>
```

Note that there are 4 places in the code that you must pay attention to.

Line 3–4 add enable-threads.js to the header: a github page does not let you set the cross-origin isolation headers (COOP / COEP) that webR’s threading relies on, so this script is added to work around that.
Line 7 add “=html” to the HTML code block in quarto so that it is rendered as HTML (not just shown as code)
First TODO set the app.R code via URL: I tried to include it in the repo and call it like repo/app.R, but it didn’t work. So upload app.R to your repo and call it with the raw file URL
Last TODO register the service worker for your github page:
in the registration, the code above uses /wasmR/httpuv-serviceworker.js and the scope /wasmR/, but you must change wasmR to your repository name.
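The two TODO values follow a fixed pattern, which can be sketched with a pair of R helpers. Both function names are hypothetical; the raw URL layout mirrors the shinyScriptURL in the code above, and the branch name "main" is an assumption.

```r
# Hypothetical helpers that derive the values for the two TODOs in index.qmd.

# Service worker script path and scope for a github page served at /<repo>/
service_worker_paths <- function(repo) {
  list(
    script = paste0("/", repo, "/httpuv-serviceworker.js"),
    scope  = paste0("/", repo, "/")
  )
}

# Raw file URL for an app.R stored in a github repository
raw_app_url <- function(user, repo, branch = "main", path = "app.R") {
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s", user, repo, branch, path)
}

service_worker_paths("wasmR")
raw_app_url("rstudio", "shiny", path = "inst/examples/01_hello/app.R")
```

The second call reproduces the URL of the public example app used in the code above; for your own app you would pass your user and repository names instead.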

After completing index.qmd, render it to index.html.

Note that the result will not show on localhost.

The remaining steps are easy:

just commit your work to the repository,
and build a github page from it.
In the page settings, do not use /docs; only / (root) worked for me, even when I set the quarto project to render its output to /docs.

Final repository structure

/repo
  - index.qmd
  - index.html
  - /index_files 
  - enable-threads.js
  - httpuv-serviceworker.js
  - …. (readme.md and so on)

Files in bold are essential, and the js files should be downloaded from links 1 and 2.
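Before committing, a quick check that the files are in place can save a failed deployment. The helper below is a hypothetical sketch; its default file list is an assumption based on the structure above, with the js file named as in the script tag of index.qmd.

```r
# Hypothetical check: which of the expected files are missing from the repo?
# The default file list is an assumption, not an official requirement.
missing_site_files <- function(repo_dir,
                               needed = c(
                                 "index.html",
                                 "enable-threads.js",
                                 "httpuv-serviceworker.js"
                               )) {
  needed[!file.exists(file.path(repo_dir, needed))]
}

# Usage: an empty character vector means everything is in place
# missing_site_files("path/to/repo")
```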

Summary

We’ve seen how to deploy shiny as a github page with a simple example.

Let’s summarize some of the pros and cons of this method.

Pros

You can deploy a simple shiny app as a static page (github page).
You don’t need to consider cost / scale / performance, since wasm uses the client’s (user’s) PC.
You can extend your shiny app with other frameworks like react, vue, tailwind…, since the shiny app only requires an iframe and javascript code.
You don’t need to think about deployment. (Github will do that)

Cons

webR is at a really, really, really early stage, so there are not many references or resources to refer to.
A wasm shiny application requires time to initialize webR and shiny in Chrome (this is critical; it sometimes randomly takes a very long time).
(as Leemput already mentioned) Why shiny? We could just use a common web framework for the UI and input, and then use only webR, not shiny.
Heavier work (like file I/O) isn’t supported yet in wasm shiny.

Future work?

I think some things can be improved.

use the javascript as a separate file (instead of the module script in the qmd), so that quarto only requires the iframe and button
use app.R via the repo, not a URL
render the quarto page to /docs, not root: with this, a quarto blog could use a wasm shiny application well.
research what can and can’t be done via a wasm shiny application (I checked that file upload / download can’t be done)

Other ways to use webR

Note that there are other options for building shiny with webR using the golem framework. (I won’t cover them here)

There is a quarto template that uses webR (it does not support shiny yet)

https://github.com/coatless/quarto-webr

Thanks to community

George Stagg for httpuv-serviceworker.js
Joseph Rocca for enable-threads.js
Veerle van Leemput for the kind introduction and help

If you have questions or ideas, let’s talk!

Creating Standalone Apps from Shiny with Electron [2023, macOS M1]

💡 I assume that…

- You’re familiar with R / shiny
- You know basic terminal commands.
- You’ve heard about node.js, npm, or something Javascript-related…

1. Why standalone shiny app?

First, let’s talk about the definition of a standalone app.

I’m going to define it this way

An app that can run independently without any external help.

“External” here mostly means a web browser, so a standalone app is software that can be installed and run without an internet connection.

RStudio can also be seen as a kind of standalone app: without a network connection you can’t install or update packages, but you can still use it.

Creating a standalone app with shiny is really unfamiliar and quite complicated. What are the benefits, and why should you develop it anyway?

I can think of at least two.

1. better user experience

Regardless of the deployment method, when using Shiny as a web app, you have to turn on your web browser, enter a URL, and run it.

The process of sending and receiving data over the network can affect the performance of your app.

However, a standalone app can run without a browser and use the OS’s resources efficiently, resulting in a slightly faster and more stable execution.

And of course, it has the advantage that it can be used without an internet connection.

2. Improved security

Shiny apps run through a web browser anyway, so if a “legendary” hacker had their way, they could pose a threat to the security of Shiny apps.

However, standalone is a bit immune to this problem, as long as they don’t physically break into your PC.

2. Very short introduction of electron

Electron (or more precisely, electron.js) is a technology that allows you to embed chromium and node.js in binary form to utilize the (shiny!) technologies used in web development: html, css, and javascript, to quote a bit from the official page.

It’s a story I still don’t fully understand, but fortunately, there have been numerous attempts by people to make shiny standalone with electron.js before, and their dedication has led to the sharing of templates that remove the “relatively complex” process.

The article I referenced was “turn a shiny application into a tablet or desktop app”, published on R-bloggers in 2020, but times have changed so quickly that the steps from then no longer work (at least not on my M1 Mac).

After a considerable amount of wandering, I found a github repository that I could at least understand. Unfortunately, the repository was archived in April 2022, and some things needed to be updated as of March 2023.

Eventually, I was able to make the shiny app work as a standalone app.

And I’m going to leave some footprints for anyone else who might wander in the future.

3. Packaging shiny app with Electron

It’s finally time to get serious and package shiny with electron.

I’ll describe the steps in a detailed, follow-along way where possible, but if you run into any problems, please let me know by raising an issue in the repository.

(I have confirmed that following the steps below with the template actually produces a standalone app.)

1. the first thing you need to do is install Node.js, npm, and Electron Forge using Rstudio’s terminal. (I’ll skip these)

2. fork/clone (maybe even star ⭐) the template below

https://github.com/zarathucorp/shiny-electron-template-m1-2023

3. open the cloned project in Rstudio (.Rproj)

4. if you get something like below, except for the version, you are good to go.

Now start from line 6 of the template’s README.

Let’s name the standalone app we want to create (obviously) “helloworld”

💡 I’ll format directory like /this

5. Run npx create-electron-app helloworld in the terminal to create the standalone app package. This creates a directory called /helloworld. Then delete /helloworld/src.

6. move the template’s files below to /helloworld and set the working directory to /helloworld.

- start_shiny.R
- add_cran-binary_pkgs.R
- get-r-mac.sh
- /shiny
- /src

7. in the console, use version to check the version of R installed on your PC. Then run the shell script sh ./get-r-mac.sh in the terminal to install R for electron. (The version on your PC and the version of R in sh should be the same)

8. Once you see that the /r-mac directory exists, install the automagic R package from the console

9. modify the package.json (change the author name, of course). The parts that should look like the image are dependencies, repository, and devDependencies.

10. develop a shiny app (assuming you’re familiar with shiny, I’ll skip this part)

11. install the R package for electron by running Rscript add-cran-binary-pkgs.R in the terminal.

12. in a terminal, update the package.json for electron with npm install (this is a continuation of 9)

13. in a terminal, verify that the standalone app is working by running electron-forge start

If, as happened to me, the electron app still won’t run, exit the app, restart your R session in Rstudio, and then run the standalone app again. (It seems to be an environment variable issue, such as R’s shiny port.)

14. once you’ve verified that start is running fine, create a working app with electron-forge make.

🥳 Voila, you have successfully made shiny a standalone app using electron.
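For the version check in step 7, a minimal sketch of how the running R version can be printed as a single string in the console. The variable name r_version is only illustrative, and how the version is written inside get-r-mac.sh is not shown here, so compare it by eye.

```r
# Build the "major.minor.patch" version string of the running R session,
# e.g. to compare against the R version referenced in get-r-mac.sh.
r_version <- paste(R.version$major, R.version$minor, sep = ".")
print(r_version)
```

as.character(getRversion()) should give the same string.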

4. Summary

If I’ve succeeded in my intentions, you should be able to use the template to make shiny a standalone app using electron in 2023 on an M1 Mac.

That app (delivered as a zip file) now makes

- the power of R / Shiny available to people with little experience
- without installing or using R.
- Or even in a “closed environment” with no network connection

Since electron is technically electron.js, my biggest challenge in creating a standalone app with electron was utilizing Javascript (which I have limited skills in compared to R).

Fortunately, I was able to do so by making some improvements to the templates that the pioneers had painstakingly created.

Thank you, L. Abigail Walter, Travis Hinkelman, and Dirk Schumacher.

I’ll end this post by sharing the template I updated, which I hope you’ll find useful.

Thank you.

(Translated with DeepL ❤️)

Introduction to data analysis with {Statgarten}.

Overview

Data analysis is a useful way to help solve problems in quite a few situations.

There are many things that go into effective data analysis, but three are commonly mentioned

1. defining the problem you want to solve through data analysis
2. collecting meaningful data
3. the skills (and expertise) to analyze the data

R is often mentioned as a way to effectively fill the third of these, but at the same time, it’s often seen as a big barrier for people who haven’t used R before (or have no programming experience).

In my previous work experience, there were many situations where I was able to turn experiences into insights and produce meaningful results with a little data analysis, even if I was “not a data person”.

For this purpose, we have developed an open source R package called “Statgarten” that allows you to utilize the features of R without having to use R directly, and I would like to introduce it.

Here’s the repo link (note that some of the descriptions are still written in Korean)

👣 Flow of data analysis

The order and components may vary depending on your situation, but I like to define it as five broad flows.

1. data preparation
2. EDA
3. data visualization
4. calculate statistics
5. share results

In this article, I’ll share a lightweight data analysis example that follows these steps (while utilizing R’s features and not typing R code whenever possible).

Note: since our work is still in progress, including deployment in the form of a web application, we will use the R packages here.

Install

With this code, you can install all components of the statgarten system.

remotes::install_github('statgarten/statgarten')
library(statgarten)

Run

The core of the statgarten ecosystem is door, which allows you to bundle other functional packages together. (Of course, you can also use each package as a separate shiny module)

Let’s load the door library, and run it via run_app.

library(door)

run_app() # OR door::run_app()

If you didn’t set anything, the shiny application will run in Rstudio’s viewer panel, but we recommend running it in a web browser like Chrome via the Show in new window icon (Icon to the left of the Stop button)

If you don’t have any problems running it (please raise an issue on DOOR to let us know if you do), you should see the screen below.

1. Data preparation

There are four ways to prepare data for Statgarten: 1) upload a file from your local PC, 2) enter the URL of a file, 3) enter the URL of a Google Sheet, or 4) use the public data included in statgarten. These can be found in the tabs File, URL, Google Sheet, and Datatoys, respectively.

In this example, we will utilize the public data named bloodTest.

bloodTest contains blood test data from 2014-15 provided by the National Health Insurance Service in South Korea.

1.5 Define the problem

Using the bloodTest data, we’ll look for clues to this question:

“Are people with high total cholesterol more likely to be diagnosed with anemia and cerebrovascular disease, and does the incidence vary by gender?”

With a few clicks, select the data as shown below. (after selection, click Import data button)

Before we start EDA, let’s process the data for analysis.

In keeping with the theme, we will “remove” data that is not needed and change some numeric values to the type of factor.

This can be done with the Update Data button, where data selection is done with the checkbox. The type can be changed in the New class.

2. EDA

You can see the organization of the data in the EDA pane below, where we see that the genders are 1 and 2, so we’ll use the Replace function on the Transform Data button to change them to M/F.
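For readers who want to see what this Replace step corresponds to in plain R, here is a minimal sketch on a toy vector. It does not use the bloodTest data, and the mapping 1 = M, 2 = F is an assumption for illustration, so check the data description before relying on it.

```r
# Toy stand-in for the SEX column (made-up values, not the bloodTest data)
sex_code <- c(1, 2, 2, 1, 1)

# Assumption: 1 = male, 2 = female. Adjust the labels if the coding differs.
sex <- factor(sex_code, levels = c(1, 2), labels = c("M", "F"))

print(table(sex))
```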

3. Data visualization

In the Vis Panel, you can also visualize anemia (ANE) and total cholesterol (TCHOL) by dragging, as well as total cholesterol by cerebrovascular disease (STK) status.

However, it’s hard to tell from the figures whether there is a significant difference in either case.

4. Statistics

You can view the distribution of values by data and key statistics via Distribution in the EDA panel.

For the anemia (ANE) and cerebrovascular disease variables (STK), we see that 0 (never diagnosed) is 92.2% and 93.7%, respectively, and 1 (diagnosed) is 7.8% and 6.3%, respectively.
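Percentages like these can be reproduced outside the app with table() and prop.table(). The sketch below uses a made-up 0/1 vector rather than the bloodTest data, so the numbers it prints are not the ones quoted above.

```r
# Toy 0/1 diagnosis flags (made-up values, not the bloodTest data)
ane <- c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0)

# Counts per value, converted to percentages rounded to one decimal place
pct <- round(100 * prop.table(table(ane)), 1)
print(pct)
```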

In the Stat Panel, let’s create a “Table 1” to represent the baseline characteristics of the data, based on anemia status (ANE).

Creating Table 1 again, this time based on cerebrovascular disease status (STK), we can see that the value of total cholesterol (TCHOL) by gender (SEX) is significant, with a p-value less than 0.05.

5. Share result

I think quarto (or Rmarkdown) is the most effective way to share data analysis results in R, but utilizing it in a shiny app is another matter.

As a result, statgarten’s results sharing is limited to exporting a data table or downloading an image.

⛳ Statgarten as Open source

The goal of the statgarten project is

To help process and utilize data in a rapidly growing data economy and foster data literacy for all.

The project is being developed with the support of the Ministry of Science and ICT of the Republic of Korea, and has been selected as a target for the 2022 Information and Communication Technology Development Project and the Standards Development Support Project.

But at the same time, it is an open source project that everyone can use and contribute to freely. (We’ve also used other open source projects in the development process)

It is being developed in various forms such as web app, docker, and R package, and is open to various forms of contributions such as development, case sharing, and suggestions.

Please try it out, raise an issue, fork or stargaze it, or suggest what you need, and we’ll do our best to incorporate it, so please support us 🙂

For more information, you can check out our github page or drop us an email.

Thanks.

(Translated with DeepL ❤️)

Basic data analysis with palmerpenguins

Introduction

On June 17, a nice article introducing a new trial dataset was uploaded to R-bloggers.

iris is one of the most commonly used datasets for simple data analysis, but there is a small issue with using it.

Too good.

The data is well-structured, and most analysis methods work very well with iris.

In reality, most datasets are not pretty and require a lot of pre-processing just to get started. Possible pre-processing work includes:

Remove NAs.
Select meaningful features.
Handle duplicated or inconsistent values.
Or even just loading the dataset, if it is not well-structured (like Flipkart-products).

With the penguins dataset, however, you can practice this kind of work. There is pre-processed data as well.

For more information, see the page of palmerpenguins.

I have a routine for brief data analysis, and today I want to share it using these lovely penguins.
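The first three items in this list can be sketched in base R on a toy data frame. Everything below (column names, values) is made up for illustration and is not part of the penguins data.

```r
# Toy data frame with the usual problems: an NA, a duplicated row,
# and a column that is not needed. All values are made up.
raw <- data.frame(
  id      = c(1, 2, 2, 3, 4),
  weight  = c(3.5, 4.1, 4.1, NA, 5.0),
  comment = c("a", "b", "b", "c", "d")
)

clean <- raw[, c("id", "weight")]     # select meaningful features
clean <- clean[!duplicated(clean), ]  # handle duplicated rows
clean <- na.omit(clean)               # remove rows containing NA

print(clean)
```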

Contents

0. Load dataset and library on workspace.

library(palmerpenguins) # for data
library(dplyr) # for data-handling
library(corrplot) # for correlation plot
library(ggplot2) # for ggplot()
library(GGally) # for parallel coordinate plot
library(e1071) # for svm

data(penguins) # load pre-processed penguins

palmerpenguins has 2 datasets, penguins and penguins_raw, and as you can see from their names, penguins is the pre-processed data.

1. See the summary and plot of Dataset

summary(penguins)
plot(penguins)

It seems species, island and sex are categorical features,
and the remaining ones are numerical features.

2. Set the format of feature

penguins$species <- as.factor(penguins$species)
penguins$island <- as.factor(penguins$island)
penguins$sex <- as.factor(penguins$sex)

summary(penguins)
plot(penguins)

Then look at the summary and plot again. Note that the result of plot is the same.

There are unwanted NA and . values in some features.

3. Remove unnecessary data (in this tutorial, NA)

penguins <- penguins %>% filter(sex == 'MALE' | sex == 'FEMALE')
summary(penguins)
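One side effect of filtering a factor column is worth knowing: the removed values stay on as empty levels and still appear in table(). The sketch below shows this on a toy factor (made-up values, not the penguins data) and uses droplevels() to remove them.

```r
# Toy factor with a placeholder level "." and an NA (made-up values)
sex <- factor(c("MALE", "FEMALE", ".", "MALE", NA, "FEMALE"))

# Keep only the two valid values; NA and "." are dropped
kept <- sex[!is.na(sex) & sex %in% c("MALE", "FEMALE")]

print(table(kept))              # "." is still listed, with a count of 0
print(table(droplevels(kept)))  # only FEMALE and MALE remain
```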

Here, I additionally defined a color for each penguin species to get a better plot.

# Green, Orange, Purple
pCol <- c('#057076', '#ff8301', '#bf5ccb')
names(pCol) <- c('Gentoo', 'Adelie', 'Chinstrap')
# index by name (as.character); a factor index would use the integer level codes instead
plot(penguins, col = pCol[as.character(penguins$species)], pch = 19)

Now the plots give much better insight.

Note that other pre-processing steps may be required for different datasets.
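A detail that matters when coloring by species: indexing a named vector with a factor uses the factor’s integer codes (positions), not its labels. The sketch below uses placeholder color names and a three-element factor to show the difference; pal, by_code and by_name are illustrative names only.

```r
# Named palette, deliberately not in alphabetical order
pal <- c(Gentoo = "green", Adelie = "orange", Chinstrap = "purple")
species <- factor(c("Adelie", "Gentoo", "Chinstrap"))

# Factor index: uses the integer codes (Adelie = 1, Chinstrap = 2, Gentoo = 3),
# i.e. positions in pal, not names
by_code <- unname(pal[species])

# Character index: looks up each element by name
by_name <- unname(pal[as.character(species)])

print(by_code)
print(by_name)
```

Converting with as.character() before indexing keeps each species tied to the color stored under its name.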

4. See relation of categorical features

My main interest in analyzing these penguins is species,
so I will look at the relation between species and the other categorical features.

4-1. species, island

table(penguins$species, penguins$island)
chisq.test(table(penguins$species, penguins$island)) # meaningful difference

ggplot(penguins, aes(x = island, y = species, color = species)) +
  geom_jitter(size = 3) + 
  scale_color_manual(values = pCol)

Wow, there’s a strong relationship between species and island:

– Adelie lives on every island
– Gentoo lives only on Biscoe
– Chinstrap lives only on Dream
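To see why a pattern like this gives a very small p-value, here is a sketch of chisq.test() on a hand-built contingency table. The counts and the labels A/B/C and X/Y/Z are illustrative only and are not taken from the penguins data.

```r
# Illustrative counts only (not taken from the penguins data):
# rows are species, columns are islands
counts <- matrix(
  c( 40, 50, 45,   # species A: found on all three islands
    110,  0,  0,   # species B: found on one island only
      0, 60,  0),  # species C: found on a different island only
  nrow = 3, byrow = TRUE,
  dimnames = list(species = c("A", "B", "C"),
                  island  = c("X", "Y", "Z"))
)

res <- chisq.test(counts)
print(res$p.value)  # a very small p-value: species and island are not independent
```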

4-2 & 4-3.
However, species and sex, and sex and island, did not show any meaningful relation.
You can try the following code.

# species vs sex
# [-1, ] drops the first row of the table before testing
table(penguins$sex, penguins$species)
chisq.test(table(penguins$sex, penguins$species)[-1,]) # not a meaningful difference (p = 0.916)

# sex vs island
table(penguins$sex, penguins$island)
chisq.test(table(penguins$sex, penguins$island)[-1,]) # not a meaningful difference (p = 0.9716)

5. See with numerical features

I will select the numerical features
and look at a correlation plot and parallel coordinate plots.

# Select numericals
penNumeric <- penguins %>% select(-species, -island, -sex)

# Correlation between numerics

corrplot(cor(penNumeric), type = 'lower', diag = FALSE)

# parallel coordinate plots

ggparcoord(penguins, columns = 3:6, groupColumn = 1, order = c(4,3,5,6)) + 
  scale_color_manual(values = pCol)

# index the palette by name rather than by factor level code
plot(penNumeric, col = pCol[as.character(penguins$species)], pch = 19)

Below are the results.

Luckily, all of the numeric features (even though there are only 4) have meaningful correlations, and their combination shows a trend by species (see the parallel coordinate plot).
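cor() only works cleanly here because rows with missing values were removed earlier. As a hedged sketch on synthetic data (not the penguins measurements), this is how cor() behaves when an NA is present and how the use argument handles it:

```r
set.seed(1)
# Synthetic measurements: y follows x, z is unrelated noise
x <- rnorm(100)
df <- data.frame(x = x, y = 2 * x + rnorm(100, sd = 0.5), z = rnorm(100))
df$y[5] <- NA  # introduce one missing value

# Without `use`, every correlation involving y becomes NA
print(cor(df))

# Use only the rows that have no missing values
cm <- cor(df, use = "complete.obs")
print(round(cm, 2))
```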

6. Apply statistical methods to the dataset.

In this step, I usually use linear modeling or svm for prediction.

6-1. linear modeling

species is a categorical value, so it needs to be changed to a numeric value.

set.seed(1234)
idx <- sample(1:nrow(penguins), size = nrow(penguins)/2)

# convert species to numeric codes (1, 2, 3 in factor level order)
speciesN <- as.numeric(penguins$species)
penguins$speciesN <- speciesN

train <- penguins[idx,]
test <- penguins[-idx,]


fm <- lm(speciesN ~ flipper_length_mm + culmen_length_mm + culmen_depth_mm + body_mass_g, train)

summary(fm)

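It shows that body_mass_g is not a meaningful feature, as seen in the plot above (it may explain Gentoo, but not the other penguins).

To predict, I used the code below. However, a numeric prediction is not a whole number (for example 2.123 instead of 2), so I added a rounding step.

# round to the nearest class code and clamp into the valid range 1..3
predRes <- round(predict(fm, test))
predRes[which(predRes<1)] <- 1
predRes[which(predRes>3)] <- 3
# map codes back to species names (alphabetical order matches the factor levels)
predRes <- sort(names(pCol))[predRes]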

test$predRes <- predRes
ggplot(test, aes(x = species, y = predRes, color = species))+ 
  geom_jitter(size = 3) +
  scale_color_manual(values = pCol)

table(test$predRes, test$species)

Accuracy of basic linear modeling is 94.6%
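An accuracy figure like this can be computed from the confusion table as the share of counts on the diagonal. The sketch below uses made-up labels, not the tutorial’s predictions, and assumes predicted and actual share the same set of levels so that the table is square.

```r
# Made-up labels for illustration (not the tutorial's results)
actual    <- factor(c("A", "A", "B", "B", "C", "C", "C", "A"))
predicted <- factor(c("A", "B", "B", "B", "C", "C", "A", "A"),
                    levels = levels(actual))

# Rows are predictions, columns are true labels
conf_mat <- table(predicted, actual)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```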

6-2. svm

Using svm is also an easy step.

m <- svm(species ~., train)

predRes2 <- predict(m, test)
test$predRes2 <- predRes2

ggplot(test, aes(x = species, y = predRes2, color = species)) +
  geom_jitter(size = 3) +
  scale_color_manual(values = pCol)

table(test$species, test$predRes2)

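Below is the result of this code.

Accuracy of svm is 100%. Wow. Note, however, that train still contains the speciesN column created in the linear modeling step, which is a numeric copy of species, and species ~ . uses it as a predictor, so this score is inflated.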
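When a formula uses `.` on the right-hand side, it expands to every other column in the data, including any helper column derived from the target. The base R sketch below shows that expansion on a toy data frame; label, label_code, x1 and x2 are hypothetical names, and no model is fitted.

```r
# Toy data: `label_code` is a numeric copy of the target `label`
df <- data.frame(
  label = factor(c("a", "b", "a", "b")),
  x1    = c(1.0, 2.0, 1.5, 2.5),
  x2    = c(0.3, 0.1, 0.4, 0.2)
)
df$label_code <- as.numeric(df$label)

# `.` expands to every column except the response
all_terms <- attr(terms(label ~ ., data = df), "term.labels")

# Subtracting a term removes it from the predictors
safe_terms <- attr(terms(label ~ . - label_code, data = df), "term.labels")

print(all_terms)
print(safe_terms)
```

A simple way to avoid this in practice is to drop such a column from both the training and test data before fitting.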

Conclusion

Today I introduced a simple routine for EDA and statistical analysis with penguins.
It is not that difficult, and it shows good performance.

Of course, I skipped a lot of things, like processing the raw dataset.
However, I hope this trial gives you inspiration for further data analysis.

Thanks.