Hands-on: How to build an interactive map in R-Shiny: An example for the COVID-19 Dashboard

It has been two years since I started to develop various interactive web applications by using R-Shiny packages. Due to Coronavirus disease (COVID-19) at this moment, I spent some time at home on preparing a simple COVID-19 related Dashboard. To share my experiences and consideration, especially those about how to create interactive maps, the following details has been discussed.

Data Source 

Dataset (data/key-countries-pivoted.csv) from Github (https://github.com/datasets/covid-19), which contains the data of daily confirmed cases of COVID-19 since 22th Jan 2020 from the eight most rapidly-spread countries (China, USA, United Kingdom, Italy, France, Germany, Spain and Iran), has been used. An example of overview of the dataset is given as follows:
In addition, the geographic coordinates (Latitude and Longitude) of the centroids of these countries have been collected for placing charts in the leaflet map.


R Shiny: Introduction 

Shiny application consists of two components, a user interface object and a server function. In user interface object, you are able to create dynamic dashboard interface, while the server function contains code provides an interactive connection between objects used for input and output. The UI object and server function will be further listed, respectively.

R library 
We use the R library leaflet, which is a Javascript Library and allows users to create and customize interactive maps. For more details, we refer users to the link: https://cran.r-project.org/web/packages/leaflet/index.html
 library(shiny)
 library(leaflet.minicharts)
 library(leaflet)
Shiny-UI
The user Interface is constructed as follows, where two blocks are included. The upper block shows the worldwide COVID-19 distribution and the total number of the confirmed cases of each country. The amount of cases is labelled separately and those countries with more than 500,000 cases are marked as red, otherwise black. Note that the function leafletOutput() is used for turning leaflet map object into an interface output. In the lower block, we are able to visualize the time dependent development of confirmed cases for a selected country and time history.  The classical layout sideBarLayout, which consists of a sidebar (sidebarPanel()) and the main area (mainPanel()), is used. Moreover, a checkbox is provided to select whether plotting the daily new confirmed cases is desired.


The R code for UI is given as follows:
 ui<- fluidPage(
   #Assign Dasbhoard title 
   titlePanel("COVID19 Analytics"),
  
  # Start:  the First Block
  # Sliderinput: select from the date between 01.20.2020 
  # and 01.04.2020
  sliderInput(inputId = "date", "Date:", min = 
  as.Date("2020-01-20"), max = as.Date("2020-04-01"), 
  value = as.Date("2020-03-01"), width = "600px"),
    
  # plot leaflet object (map) 
  leafletOutput(outputId = "distPlot", width = "700px", 
  height = "300px"),
  #End:  the First Block
  
  #Start: the second Block
  sidebarLayout(
    
    #Sidebar Panel: the selected country, history and 
    #whether to plot daily new confirmed cases.
    sidebarPanel(
      selectInput("selectedcountry", h4("Country"), choices 
      =list("China","US","United_Kingdom","Italy","France",
      "Germany", "Spain"), selected = "US"),
      selectInput("selectedhistoricwindow", h4("History"), 
      choices = list("the past 10 days", "the past 20 
      days"), selected = "the past 10 days"),
      checkboxInput("dailynew", "Daily new infected", 
      value = TRUE),
      width = 3  
    ),
    
    #Main Panel: plot the selected values
    mainPanel (
      plotOutput(outputId = "Plotcountry", width = "500px", 
      height = "300px")
    )
  ),
  #End: the second Block 
)
Shiny-Server
Input and output objects are connected in the Server function. Note that, the input arguments are stored in a list-like object and each input argument is identified under its unique name, for example the sliderInput is named after “date”.
server <- function(input, output){
  
  #Assign output$distPlot with renderLeaflet object
  output$distPlot <- renderLeaflet({
  
    # row index of the selected date (from input$date)
    rowindex = which(as.Date(as.character(daten$Date), 
    "%d.%m.%Y") ==input$date)
    
    # initialise the leaflet object
    basemap= leaflet()  %>%
      addProviderTiles(providers$Stamen.TonerLite,
      options = providerTileOptions(noWrap = TRUE)) 
    
    # assign the chart colors for each country, where those 
    # countries with more than 500,000 cases are marked 
    # as red, otherwise black
    chartcolors = rep("black",7)
    stresscountries 
    = which(as.numeric(daten[rowindex,c(2:8)])>50000)
    chartcolors[stresscountries] 
    = rep("red", length(stresscountries))
    
    # add chart for each country according to the number of 
    # confirmed cases to selected date 
    # and the above assigned colors
    basemap %>%
      addMinicharts(
        citydaten$long, citydaten$Lat,
        chartdata = as.numeric(daten[rowindex,c(2:8)]),
        showLabels = TRUE,
        fillColor = chartcolors,
        labelMinSize = 5,
        width = 45,
        transitionTime = 1
      ) 
  })
  
  #Assign output$Plotcountry with renderPlot object
  output$Plotcountry <- renderPlot({
    
    #the selected country 
    chosencountry = input$selectedcountry
    
    #assign actual date
    today = as.Date("2020/04/02")
    
    #size of the selected historic window
    chosenwindow = input$selectedhistoricwindow
    if (chosenwindow == "the past 10 days")
       {pastdays = 10}
    if (chosenwindow  == "the past 20 days")
       {pastdays = 20}
    
    #assign the dates of the selected historic window
    startday = today-pastdays-1
    daten$Date=as.Date(as.character(daten$Date),"%d.%m.%Y")
    selecteddata 
    = daten[(daten$Date>startday)&(daten$Date<(today+1)), 
    c("Date",chosencountry)]
    
    #assign the upperbound of the y-aches (maximum+100)
    upperboundylim = max(selecteddata[,2])+100
    
    #the case if the daily new confirmed cases are also
    #plotted
    if (input$dailynew == TRUE){
    
      plot(selecteddata$Date, selecteddata[,2], type = "b", 
      col = "blue", xlab = "Date", 
      ylab = "number of infected people", lwd = 3, 
      ylim = c(0, upperboundylim))
      par(new = TRUE)
      plot(selecteddata$Date, c(0, diff(selecteddata[,2])), 
      type = "b", col = "red", xlab = "", ylab = 
      "", lwd = 3,ylim = c(0,upperboundylim))
      
      #add legend
      legend(selecteddata$Date[1], upperboundylim*0.95, 
      legend=c("Daily new", "Total number"), 
      col=c("red", "blue"), lty = c(1,1), cex=1)
    }
    
    #the case if the daily new confirmed cases are 
    #not plotted
    if (input$dailynew == FALSE){
      
      plot(selecteddata$Date, selecteddata[,2], type = "b", 
      col = "blue", xlab = "Date", 
      ylab = "number of infected people", lwd = 3,
      ylim = c(0, upperboundylim))
      par(new = TRUE)
      
      #add legend
      legend(selecteddata$Date[1], upperboundylim*0.95, 
      legend=c("Total number"), col=c("blue"), 
      lty = c(1), cex=1)
    }
    
  })
    
} 
Shiny-UI
At the end, we create a complelte application by using the shinyApp function.
 shinyApp(ui = ui, server = server)
The Dashboard is deployed under the following URL: https://sangmeng.shinyapps.io/COVID19/

Easy text loading and cleaning using readtext and textclean packages. Movies text analysis. Part 1

My primary goal was to research popular movies vocabulary – words all those heroes actually saying to us and put well known movies onto the scale from the ones with the widest lexicon down to below average range despite possible crowning by IMDB rating, Box Office revenue or number / lack of Academy Awards.

This post is the 1st Part of the Movies text analysis and covers Text loading and cleaning using readtext and textclean packages. My first challenge was texts source – I saw text mining projects based on open data scripts but I wanted something new, more direct and, as close to what we actually hear AND read from the screen. I decided directly save subtitles while streaming using Chrome developer panel.

It requires further transformation through the chain .xml > .srt > .txt (not covered here).
Since we have .txt files ready, let`s start from downloading of texts stored in separate folder (1movie = 1 file).



library(readtext)
text_raw <-readtext(paste0(path="C:/Users/Andrew/Documents/R/movies/txt/*"))
text <- gsub("\r?\n|\r", " ", text_raw)
To go further it is important to understand what is text and what is noise here. Since subtitles are real text representation of entire movie, its text part contains:
  1. Speech (actual text)
  2. Any sound (made by heroes/characters or around)
  3. Speakers / characters marks (who is saying)
  4. Intro / end themes
  5. Other text
All but first are noise, at least for purpose of measuring and compering vocabulary / lexicon. Luckily, there are limited number of patterns to distinct Speech from other sound / noise so using string processing is possible to husk the speech leaving noise apart. Let`s start from the brackets and clean not only brackets but what is inside also using textclean package.  
library (textclean)

text1 <- str_replace_all (text, "\\[.*?\\]", "")
text1_nchar <- sum (nchar (text1))
It is important to check how hard your trim your "garden", you could do it element-wise if the summary shows any odd losses or, in opposite, very little decrease of characters #.

the same with () +()

text2 <- str_replace_all(text1, "\\(.*?\\)", "")
text2_nchar <- sum (nchar (text2))
Do not forget any odd characters detected, like, in my case, ‘File:’
text3 <- str_replace (text2,'File:', '')
text3_nchar <- sum (nchar (text3))
Now is the trickiest part, other brackets - ♪♪ Ideally, it requires you to know the text well to distinct theme songs and background music from the situation where singing is the way the heroes “talk” to us. For instance musicals or a lot of kids movies.
text4 <- str_replace_all(text3,"\\âTª.*?\\âTª", "")
text4_nchar <- sum (nchar (text4))
Speech markers e.g. JOHN:

text5 <- str_replace_all(text4,"[a-zA-Z]+:", "")
text5_nchar <- sum (nchar (text5))
Check % of total cut



Stop cut the weed, let`s replace the rest): @#$%&
text6 <- replace_symbol(text5, dollar = TRUE, percent = TRUE, pound = TRUE,
                        at = TRUE, and = TRUE, with = TRUE)
text6_nchar <- sum (nchar (text6))

#time stamps
text7 <- replace_time(text6, pattern = "(2[0-3]|[01]?[0-9]):([0-5][0-9])[.:]?([0-5]?[0-9])?")
text7_nchar <- sum (nchar (text7))

#date to text
replace_date_pattern <- paste0(
  '([01]?[0-9])[/-]([0-2]?[0-9]|3[01])[/-]\\d{4}|\\d{4}[/-]', 
  '([01]?[0-9])[/-]([0-2]?[0-9]|3[01])')
text8 <- replace_time(text7, pattern = replace_date_pattern)
text8_nchar <- sum (nchar (text8))

#numbers to text, e.g. 8 = eight
text9 <- replace_number(text8, num.paste = FALSE, remove = FALSE)
text9_nchar <- sum (nchar (text9))
Now texts are clean enough and ready for further analysis. That`s it for Movies text analysis. Part 1.

In Movies text analysis. Part 2. I plan to compare how wide, in general, vocabulary is within iconic movies, different genres, Movies vs TV Shows, IMDB TOP 100 vs Box Office All Time Best etc.

In addition, having text data and the movies text length I Planned to check #of words per minute parameter to defy or confirm hypothesis that movies speech is significantly different in range and slower overall than normal speech benchmark in 120-150 WPM (words per minute).

“Those who can, do;  those who can’t – use computer simulation.”

     “Those who can, do;  those who can’t – use computer simulation.”  This quote was inspired by playwright George Bernard Shaw.  Computer simulation is a powerful tool that attempts to reproduce the behavior of some real-world system by sampling from one or more probability distributions.  It can help explain and illustrate difficult concepts such as the Monty Hall game show problem.  It can also solve problems that are hard or impossible to solve directly. 

     One example of a problem of this latter type is the probability distribution of the sum of n independent random variables each having probability distribution g, where n is also a random variable having probability distribution f.  This is a special case of a statistical subject called convolutions, where one calculates the convolution of all possible values of the individual distributions.  For certain choices of f and g, such as Poisson for f and gamma for g, the resulting convolution is an integrateable function and there is no need for simulation.  But for many choices of f and g, simulation or some other numerical method is needed.
     A real-world use of this type of simulation is in modeling loss events of a business entity such as a bank or insurance company.  The entity will have some random number of events n per year of random sizes that is modeled by a frequency distribution f such as Poisson or negative binomial.  A number n is randomly selected from the frequency distribution.  Then n random numbers are randomly selected from a severity distribution g such as lognormal or gamma to simulate the sizes of the n loss events.  The n loss amounts are added to produce a total value of losses for one year.  The process is repeated some large number of times, say 10,000, and the 10,000 numbers are ranked.  If the entity is interested in the 99.9th percentile such loss, that value is the 9,990th largest value.
     Base R provides functions to simulate from many probability distributions.  For example, rpois(n, lamda) produces n Poisson distributed samples from a distribution having population mean lambda.  Poisson is a single parameter distribution with variance equal to the mean.  An alternative frequency distribution with a more flexible option for the variance is the negative binomial distribution.  Its R function is rnbinom(n, mu, size) which produces n negative binomial distributed samples from a distribution having population mean mu and dispersion parameter size, where size = mu^2 / (variance – mu).
     Most severity distributions are not symmetrical but rather are positively skewed with low mean and long positive tail.  A random variable is lognormally distributed if the logarithm of the random variable  is normally distributed.  Its R function is rlnorm(n, meanlog, sdlog) which produces n lognormally distributed samples from a distribution having population mean m and standard deviation s, and here meanlog = LN(m) – .5*LN((s/m)^2 + 1)) and sdlog = .5*LN((s/m)^2+1)).
     It is helpful to plot the resulting histogram of the 10,000 simulations.  The real-world purpose of the exercise may be to identify the 99.9th percentile value.    The R quantile function quantile(x, probs) returns the percentile value equal to probs.  To display the percentile value on a plot, use the text and arrow functions.  R has a text function that adds text a plot, text(x, y, label), which adds a label at coordinate (x,y).  Further, R has an arrows function, arrows(x0, y0, x1, y1, code=2), which draws an arrow from (x0, y0) to (x1, y1)  with code 2 drawing the arrowhead at (x1, y1).
     The following is the R code and resulting histogram plot of a negative binomial frequency and lognormal severity simulation.  The user input values are based on the mean and standard deviation values of a dataset.

# Monte Carlo simulation. Negative binomial frequency, lognormal severity. 

# negbinom
nb_m <- 50 # This is a user input.
nb_sd <- 10 # This is a user input.
nb_var <- nb_sd^2
nb_size <- nb_m^2/(nb_var – nb_m)

# lognorm
xbar <- 60 # This is a user input.
sd <- 40 # This is a user input.
l_mean <- log(xbar) – .5*log((sd/xbar)^2 + 1)
l_sd <- sqrt(log((sd/xbar)^2 + 1))

num_sims <- 10000 # This is a user input.
set.seed(1234) # This is a user input.
rtotal <- vector()

for (i in 1:num_sims)
{
nb_random <- rnbinom(n=1, mu=nb_m, size=nb_size)
l_random <- rlnorm(nb_random, meanlog=l_mean, sdlog=l_sd)
rtotal[i] <- sum(l_random)
}

rtotal <- sort(rtotal)
m <- round(mean(rtotal), digits=0)
percentile_999 <- round(quantile(rtotal, probs=.999), digits=0)

print(paste(“Mean = “, m, ” 99.9th percentile = “, percentile_999))

hist(rtotal, breaks=20, col=”red”, xlab=”Annual Loss”, ylab=”Frequency”,
main=”Monte Carlo Simulation”)
text(percentile_999, 100, “99.9th Percentile”)
arrows(percentile_999,75, percentile_999,0, code=2)

[contact-form][contact-field label=”Name” type=”name” required=”true” /][contact-field label=”Email” type=”email” required=”true” /][contact-field label=”Website” type=”url” /][contact-field label=”Message” type=”textarea” /][/contact-form]  

Azure Machine Learning For R Practitioners With The R SDK

[This article was first published on the Azure Medium channel, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)


As probably you already know, Microsoft provided its Azure Machine Learning SDK for Python to build and run machine learning workflows, helping organizations to use massive data sets and bring all the benefits of the Azure cloud to machine learning.

Although Microsoft initially invested in R as the Advanced Analytics preferred language introducing the SQL Server R server and R services in the 2016 version, they abruptly shifted their attention to Python, investing exclusively on it. This basically happened for the following reasons:

  • Python’s simple syntax and readability make the language accessible to non-programmers
  • The most popular machine learning and deep learning open source libraries (such as Pandas, scikit-learn, TensorFlow, PyTorch, etc.) are deeply used by the Python community
  • Python is a better choice for productionalization: it’s relatively very fast; it implements OOPs concepts in a better way; it is scalable (Hadoop/Spark); it has better functionality to interact with other systems; etc.

Azure ML Python SDK Main Key Points

One of the most valuable aspects of the Python SDK is its ease to use and flexibility. You can simply use just few classes, injecting them into your existing code or simply referring to your script files into method calls, in order to accomplish the following tasks:

  • Explore your datasets and manage their lifecycle
  • Keep track of what’s going on into your machine learning experiments using the Python SDK tracking and logging features
  • Transform your data or train your models locally or using the best cloud computation resources needed by your workloads
  • Register your trained models on the cloud, package them into container image and deploy them on web services hosted in Azure Container Instances or Azure Kubernetes Services
  • Use Pipelines to automate workflows of machine learning tasks (data transformation, training, batch scoring, etc.)
  • Use automated machine learning (AutoML) to iterate over many combinations of defined data transformation pipelines, machine learning algorithms and hyperparameter settings. It then finds the best-fit model based on your chosen performance metric.

In summary, the scenario is the following one:

AI/ML lifecycle using Azure ML
fig. 1 — AI/ML lifecycle using Azure ML

What About The R Community Engagement?

In the last 3 years Microsoft pushed a lot over the Azure ML Python SDK, making it a stable product and a first class citizen of the Azure cloud. But they seem to have forgotten all the R professionals who developed a huge amount of data science project all around the world.

We must not forget that in Analytics and Data Science the key of success of a project is to quickly try out a large number of analytical tools and find what’s the best one for the case in analysis. R was born for this reason. It has a lot of flexibility when you want to work with data and build some model, because it has tons of packages and easy of use visualization functionality. That’s why a lot of Analytics projects are developed using R by many statisticians and data scientists.

Fortunately in the last months Microsoft extended a hand to the R community, releasing a new project called Azure Machine Learning R SDK.

Can I Use R To Spin The Azure ML Wheels?

Starting from October 2019 Microsoft released a R interface for Azure Machine Learning SDK on GitHub. The idea behind this project is really straightforward. The Azure ML Python SDK is a way to simplify the access and the use of the Azure cloud storage and computation for machine learning purposes keeping the main code as the one a data scientist developed on its laptop.

Why not allow the Azure ML infrastructure to run also R code (using proper “cooked” Docker images) and let R data scientists call the Azure ML Python SDK methods using R functions?

The interoperability between Python and R is obtained thanks to reticulate. So, once the Python SDK module azureml is imported into any R environment using the import function, functions and other data within the azureml module can be accessed via the $ operator, like an R list.

Obviously, the machine hosting your R environment must have Python installed too in order to make the R SDK work properly.

Let’s start to configure your preferred environment.

Set Up A Development Environment For The R SDK

There are two option to start developing with the R SDK:

  1. Using an Azure ML Compute Instance (the fastest way, but not the cheaper one!)
  2. Using your machine (laptop, VM, etc.)

Set Up An Azure ML Compute Instance

Once you created an Azure Machine Learning Workspace through the Azure portal (a basic edition is enough), you can access to the brand new Azure Machine Learning Studio. Under the Compute section, you can create a new Compute Instance, choosing its name and its sizing:

Create an Azure ML Compute Instance
fig. 2 — Create an Azure ML Compute Instance

The advantage of using a Compute Instance is that the most used software and libraries by data scientists are already installed, including the Azure ML Python SDK and RStudio Server Open Source Edition. That said, once your Compute Instance is started, you can connect to RStudio using the proper link:

Launch RStudio from a started Compute Intstance
fig. 3 — Launch RStudio from a started Compute Intstance

At the end of your experimentation, remember to shut down your Compute Instance, otherwise you’ll be charged according to the chosen plan:

Remember to shut down your Compute Instance
fig. 4 — Remember to shut down your Compute Instance

Set Up Your Machine From Scratch

First of all you need to install the R engine from CRAN or MRAN. Then you could also install RStudio Desktop, the preferred IDE of R professionals.

The next step is to install Conda, because the R SDK needs to bind to the Python SDK through reticulate. If you really don’t need Anaconda for specific purposes, it’s recommended to install a lightweight version of it, Miniconda. During its installation, let the installer add the conda installation of Python to your PATH environment variable.

Install The R SDK

Open your RStudio, simply create a new R script (File → New File → R Script) and install the last stable version of Azure ML R SDK package (azuremlsdk) available on CRAN in the following way:

install.packages("remotes")
remotes::install_cran("azuremlsdk")


If you want to install the latest committed version of the package from GitHub (maybe because the product team has fixed an annoying bug), you can instead use the following function:
remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')

During the installation you could get this error:

Timezone error
fig. 5 — Timezone error

In this case, you just need to set the TZ environment variable with your preferred timezone:

Sys.setenv(TZ="GMT")

Then simply re-install the R SDK.

You may also be asked to update some dependent packages:

Dependent packages to be updated
fig. 6 — Dependent packages to be updated

If you don’t have any requirement about dependencies in your project, it’s always better to update them all (put focus on the prompt in the console; press 1; press enter).

If you are on your Compute Instance and you get a warning like the following one:

Warning about non-system installation of Python
fig. 7 — Warning about non-system installation of Python

just put the focus on the console and press “n”, since the Compute Instance environment already has a Conda installation. Microsoft engineers are already investigating on this issue.

You need then to install the Azure ML Python SDK, otherwise your azuremlsdk R package won’t work. You can do that directly from RStudio thanks to an azuremlsdk function:

azuremlsdk::install_azureml(remove_existing_env = TRUE)

The remove_existing_env parameter set to TRUE will remove the default Azure ML SDK environment r-reticulate if previously installed (it’s a way to clean up a Python SDK installation).

Just keep in mind that in this way you’ll install the version of the Azure ML Python SDK expected by your installed version of the azuremlsdk package. You can check what version you will install putting the cursor over the install_azureml function and visualizing the code definition clicking F2:

install_azureml code definition
fig. 8 — install_azureml code definition

Sometimes there are new feature and fixes on the latest version of the Python SDK. If you need to install it, first check what version is available on this link:

Azure ML Python SDK latest version
fig. 9 — Azure ML Python SDK latest version

Then use that version number in the following code:

azuremlsdk::install_azureml(version = "1.2.0", remove_existing_env = TRUE)

Sometimes you may need to install an updated version of a single component of the Azure ML Python SDK to test, for example new features. Supposing you want to update the Azure ML Data Prep SDK, here the code you could use:

reticulate::py_install("azureml-dataprep==1.4.2", envname = "r-reticulate", pip = TRUE)

In order to check if the installation is working correctly, try this:

library(azuremlsdk)
get_current_run()

It should return something like this:

Checking that azuremlsdk is correctly installed
fig. 10 — Checking that azuremlsdk is correctly installed

Great! You’re now ready to spin the Azure ML wheels using your preferred programming language: R!


Conclusions

After a long period during which Microsoft focused exclusively on Python SDK to enable data scientists to benefit from Azure computing and storage services, they recently released the R SDK too. This article focuses on the steps needed to install the Azure Machine Learning R SDK on your preferred environment.

Next articles will deal with the R SDK main capabilities.

Crowdfight Covid-19 – Call for R-community action

Dear R-community,

This is a rather unusual post for an R-blog, but unusual times may require that. I am not involved into organizing this effort except for volunteering in the same way I want you to do. So, if you aren’t aware of this initiative yet, please likewise
consider subscribing to Crowdfight Covid-19.

Motivation: Most current plans to fight COVID-19 rely on the assumption that treatments and/or vaccines will be available in a few months. Delays in these treatments will have enormous consequences, both in terms of economic impact and human lives.


Aim: To put the wider scientific community at the service of COVID-19 research. There is now a huge pool of highly skilled scientists, willing to volunteer their time and expertise for this cause. This goes from virologists who don’t have the resources to get directly involved with COVID-19 to researchers in other disciplines (bioinformatics, image analysis, AI…). This is a huge resource, the “bottleneck” being coordination. The team behind the platform tries to remove that bottleneck.


The distribution of tasks has been ongoing for three days now. Among many requests less related to coding, the daily task list also contains some data wrangling and modeling tasks every now and then, and many of us probably have additional spare time right now. Examples thus far have been calls for

    1. Building webscraping tool from scratch, which should help to consolodate government measures that have been put into place across various countries at different points in time,
    2. Support for unit testing and data fetching functionality wrapped around a more domain specific R functionality (biochemistry struff)
    3. Bayesian modeling and corresponding clinical experiment design
    4. Phylogenetic tree analyis of the virus RNA
I cannot share any concrete code snippets or implementation outlines, as the mentioned examples are either not existent yet or within private repositories. 
But it’s free to subscribe, and you can also put an upper bound on your weekly availability.


Predicting 1,000,000 of COVID-19 done with R before March 18 – call for help

Due to the unprecedented technological progress that has been made in years past, the current climate allows us to monitor this pandemic better than any other pandemic in the past. We will argue, however, that R was instrumental in predicting when the 1,000,000th case of COVID-19 will occur, as demonstrated here in our collaboration spread out on three continents:

https://www.sciencedirect.com/science/article/pii/S2590113320300079


Since India is currently in lockdown and the correction process is in India, it has not been concluded as of writing.

The first draft of our paper was prepared on March 18 and can be accessed here: http://www.cs.laurentian.ca/wkoczkodaj/p/?C=M;O=D
Open this link and click twice on “last modified” to see the data (the computing was done a few days earlier). 
Our heuristic developed for the prediction could not be implemented so quickly had it not been for our use of R. The function ‘nls‘ is crucial for modelling only the front incline part of the Gaussian function (also known as Gaussian). Should this pandemic not stop, or at the very least slow down, one billion cases could occur by the end of May 2020.

The entire world waits for the inflection point (https://en.wikipedia.org/wiki/Inflection_point) and if you help us, we may be able to reach this point sooner.

A few crucial R commands are: 
modE <- nls(dfcov$all ~ a * exp(bdfcov$report), data = dfcov,
start = list(a = 100, b = 0.1))
a <- summary(modE)$parameters[1]
b <- summary(modE)$parameters[2]
summary(modE)
x <- 1:m + dfcov$report[length(dfcov$report)]
modEPr <- a * exp(b
x)

Waldemar W. Koczkodaj (email: wkoczkodaj [AT] cs.laurentian.ca)
(for the entire Collaboration)

Analyzing Remote Sensing Data using Image Segmentation

This post summarizes material posted as an Additional Topic to accompany the book Spatial Data Analysis in Ecology and Agriculture using R, Second Edition. The full post, together with R code and data, can be found in the Additional Topics section of the book’s website, http://psfaculty.plantsciences.ucdavis.edu/plant/sda2.htm

1. Introduction
The idea is best described with images. Here is a display in mapview of an agricultural region in California’s Central Valley. The boundary of the region is a rectangle developed from UTM coordinates, and it is interesting that the boundary of the region is slightly tilted with respect to the network of roads and field boundaries. There are several possible explanations for this, but it does not affect our presentation, so we will ignore it.
There are clearly several different cover types in the region, including fallow fields, planted fields, natural regions, and open water. Our objective is to develop a land classification of the region based on Landsat data. A common method of doing this is based on the so-called false color image. In a false color image, radiation in the green wavelengths is displayed as blue, radiation in the red wavelengths is displayed as green, and radiation in the infrared wavelengths is displayed as red. Here is a false color image of the region shown in the black rectangle.  

Because living vegetation strongly reflects radiation in the infrared region and absorbs it in the red region, the amount of vegetation contained in a pixel can be estimated by using the normalized difference vegetation index, or NDVI, defined as
where IR is the pixel intensity of the infrared band and R is the pixel intensity of the red band. Here is the NDVI of the region in the false color image above.
If you compare the images you can see that the areas in  black rectangle above that appear green have a high NDVI, the areas that appear brown have a low NDVI, and the open water in the southeast corner of the image actually has a negative NDVI, because water strongly absorbs infrared radiation. Our objective is to generate a classification of the land surface into one of five classes: dense vegetation, sparse vegetation, scrub, open land, and open water. Here is the classification generated using the method I am going to describe.

There are a number of image analysis packages available in R. I elected to use the OpenImageR and SuperpixelImageSegmentation packages, both of which were developed by L. Mouselimis (2018, 2019a). One reason for this choice is that the objects created by these packages have a very simple and transparent data structure that makes them ideal for data analysis and, in particular, for integration with the raster package.  The packages are both based on the idea of image segmentation using superpixels . According to Stutz et al. (2018), superpixels were introduced by Ren and Malik (2003). Stutz et al. define a superpixel as “a group of pixels that are similar to each other in color and other low level properties.” Here is the false color image above divided into superpixels.
The superpixels of can then be represented by groups of pixels all having the same pixel value, as shown here. This is what we will call image segmentation.


In the next section we will introduce the concepts of superpixel image segmentation and the application of these packages. In Section 3 we will discuss the use of these concepts in data analysis. The intent of image segmentation as implemented in these and other software packages is to provide support for human visualization. This can cause problems because sometimes a feature that improves the process of visualization detracts from the purpose of data analysis. In Section 4 we will see how the simple structure of the objects created by Mouselimis’s two packages can be used to advantage to make data analysis more powerful and flexible. This post is a shortened version of the Additional Topic that I described at the start of the post.

2. Introducing superpixel image segmentation
The discussion in this section is adapted from the R vignette Image Segmentation Based on Superpixels and Clustering (Mouselimis, 2019b). This vignette describes the R packages OpenImageR and SuperpixelImageSegmentation. In this post I will only discuss the OpenImageR package. It uses simple linear iterative clustering (SLIC, Achanta et al., 2010), and a modified version of this algorithm called SLICO (Yassine et al., 2018), and again for brevity I will only discuss the former. To understand how the SLIC algorithm works, and how it relates to our data analysis objective, we need a bit of review about the concept of color space. Humans perceive color as a mixture of the primary colors red, green, and blue (RGB), and thus a “model” of any given color can in principle be described as a vector in a three-dimensional vector space. The simplest such color space is one in which each component of the vector represents the intensity of one of the three primary colors. Landsat data used in land classification conforms to this model. Each band intensity is represented as an integer value between 0 and 255. The reason is that this covers the complete range of values that can be represented by a sequence of eight binary (0,1) values because 28 – 1 = 255.  For this reason it is called the eight-bit RGB color model.
There are other color spaces beyond RGB, each representing a transformation for some particular purpose of the basic RGB color space. A common such color space is the CIELAB color space, defined by the International Commission of Illumination (CIE). You can read about this color space here. In brief, quoting from this Wikipedia article, the CIELAB color space “expresses color as three values: L* for the lightness from black (0) to white (100), a* from green (−) to red (+), and b* from blue (−) to yellow (+). CIELAB was designed so that the same amount of numerical change in these values corresponds to roughly the same amount of visually perceived change.”
Each pixel in an image contains two types of information: its color, represented by a vector in a three dimensional space such as the RGB or CIELAB color space, and its location, represented by a vector in two dimensional space (its x and y coordinates). Thus the totality of information in a pixel is a vector in a five dimensional space. The SLIC algorithm performs a clustering operation similar to K-means clustering on the collection of pixels as represented in this five-dimensional space.
The SLIC algorithm measures total distance between pixels as the sum of two components, dlab, the distance in the CIELAB color space, and dxy, the distance in pixel (x,y) coordinates. Specifically, the distance Ds is computed as
where S is the grid interval of the pixels and m is a compactness parameter that allows one to emphasize or de-emphasize the spatial compactness of the superpixels. The algorithm begins with a set of K cluster centers regularly arranged in the (x,y) grid and performs K-means clustering of these centers until a predefined convergence measure is attained.
The following discussion assumes some basic familiarity with the raster package. If you are not familiar with this package, an excellent introduction is given here. After downloading the data files, you can use them to first construct RasterLayer objects and then combine them into a RasterBrick.

library(raster)
> b2 <- raster(“region_blue.grd”)
> b3 <- raster(“region_green.grd”)
> b4 <- raster(“region_red.grd”)
> b5 <- raster(“region_IR.grd”)
> region.brick <- brick(b2, b3, b4, b5)

We will need to save the number of rows and columns for present and future use.
> print(nrows <- region.brick@nrows)
[1] 187
> print(ncols <- region.brick@ncols)

[1] 329
Next we plot the region.brick object (shown above) and write it to disk.
> # False color image
> plotRGB(region.brick, r = 4, g = 3, b = 2,
+   stretch = “lin”)

> # Write the image to disk
> jpeg(“FalseColor.jpg”, width = ncols,
+   height = nrows)

> plotRGB(region.brick, r = 4, g = 3, b = 2,
+   stretch = “lin”)

> dev.off()
We will now use SLIC for image segmentation. The code here is adapted directly from the superpixels vignette.

> library(OpenImageR)
> False.Color <- readImage(“FalseColor.jpg”)
> Region.slic = superpixels(input_image =
+   False.Color, method = “slic”, superpixel = 80,
+   compactness = 30, return_slic_data = TRUE,
+   return_labels = TRUE, write_slic = “”,
+   verbose = FALSE)
Warning message:
The input data has values between 0.000000 and 1.000000. The image-data will be multiplied by the value: 255!
> imageShow(Region.slic$slic_data)

 
We will discuss the warning message in Section 3. The function imageShow() is part of the OpenImageR package.
The OpenImageR function superpixels() creates superpixels, but it does not actually create an image segmentation in the sense that we are using the term is defined in Section 1. This is done by functions in the SuperPixelImageSegmentation package (Mouselimis, 2018), whose discussion is also contained in the superpixels vignette. This package is also described in the Additional Topic. As pointed out there, however, some aspects that make the function well suited for image visualization detract from its usefulness for data analysis, so we won’t discuss it further in this post. Instead, we will move directly to data analysis using the OpenImageR package.

3. Data analysis using superpixels
To use the functions in the OpenImageR package for data analysis, we must first understand the structure of objects created by these packages. The simple, open structure of the objects generated by the function superpixels() makes it an easy matter to use these objects for our purposes. Before we begin, however, we must discuss a preliminary matter: the difference between a Raster* object and a raster object. In Section 2 we used the function raster() to generate the blue, green, red and IR band objects b2, b3, b4, and b5, and then we used the function brick() to generate the object region.brick. Let’s look at the classes of these objects.

> class(b3)
[1] “RasterLayer”
attr(,”package”)
[1] “raster”
> class(full.brick)

[1] “RasterBrick”
attr(,”package”)
[1] “raster”

Objects created by the raster packages are called Raster* objects, with a capital “R”. This may seem like a trivial detail, but it is important to keep in mind because the objects generated by the functions in the OpenImageR and SuperpixelImageSegmentation packages are raster objects that are not Raster* objects (note the deliberate lack of a courier font for the word “raster”).
In Section 2 we used the OpenImageR function readImage() to read the image data into the object False.Color for analysis. Let’s take a look at the structure of this object.

> str(False.Color)
num [1:187, 1:329, 1:3] 0.373 0.416 0.341 0.204 0.165 …
It is a three-dimensional array, which can be thought of as a box, analogous to a rectangular Rubik’s cube. The “height” and “width” of the box are the number of rows and columns respectively in the image and at each value of the height and width the “depth” is a vector whose three components are the red, green, and blue color values of that cell in the image, scaled to [0,1] by dividing the 8-bit RGB values, which range from 0 to 255, by 255. This is the source of the warning message that accompanied the output of the function superpixels(), which said that the values will be multiplied by 255. For our purposes, however, it means that we can easily analyze images consisting of transformations of these values, such as the NDVI. Since the NDVI is a ratio, it doesn’t matter whether the RGB values are normalized or not. For our case the RasterLayer object b5 of the RasterBrick False.Color is the IR band and the object b4 is the R band. Therefore we can compute the NDVI as

> NDVI.region <- (b5 – b4) / (b5 + b4)

Since NDVI is a ratio scale quantity, the theoretically best practice is to plot it using a monochromatic representation in which the brightness of the color (i.e., the color value) represents the value (Tufte, 1983). We can accomplish this using the RColorBrewer function brewer.pal().

> library(RColorBrewer)
> plot(NDVI.region, col = brewer.pal(9, “Greens”),
+   axes = TRUE,
main = “Region NDVI”)
 
This generates the NDVI map shown above. The darkest areas have the highest NDVI. Let’s take a look at the structure of the RasterLayer object NDVI.region.
> str(NDVI.region)
Formal class ‘RasterLayer’ [package “raster”] with 12 slots
              *                 *                   *
..@ data    :Formal class ‘.SingleLayerData’ [package “raster”] with 13 slots
              *                 *                   *
.. .. ..@ values    : num [1:61523] 0.1214 0.1138 0.1043 0.0973 0.0883 …
              *                 *                   *
..@ ncols   : int 329
..@ nrows   : int 187
Only the relevant parts are shown, but we see that we can convert the raster data to a matrix that can be imported into our image segmentation machinery as follows. Remember that by default R constructs matrices by columns. The data in Raster* objects such as NDVI.region are stored by rows, so in converting these data to a matrix we must specify byrow = TRUE.

> NDVI.mat <- matrix(NDVI.region@data@values,
+   nrow = NDVI.region@nrows,
+  
ncol = NDVI.region@ncols, byrow = TRUE)

The function imageShow() works with data that are either in the eight bit 0 – 255 range or in the [0,1] range (i.e., the range of x between and including 0 and 1). It does not, however, work with NDVI values if these values are negative. Therefore, we will scale NDVI values to [0,1].

> m0 <- min(NDVI.mat)
> m1 <- max(NDVI.mat)
> NDVI.mat1 <- (NDVI.mat – m0) / (m1 – m0)
> imageShow(NDVI.mat)

 Here is the resulting image.

The function imageShow() deals with a single layer object by producing a grayscale image. Unlike the green NDVI plot above, in this plot the lower NDVI regions are darker.
We are now ready to carry out the segmentation of the NDVI data. Since the structure of the NDVI image to be analyzed is the same as that of the false color image, we can simply create a copy of this image, fill it with the NDVI data, and run it through the superpixel image segmentation function. If an image has not been created on disk, it is also possible (see the Additional Topic) to create the input object directly from the data.

> NDVI.data <- False.Color
> NDVI.data[,,1] <- NDVI.mat1
> NDVI.data[,,2] <- NDVI.mat1
> NDVI.data[,,3] <- NDVI.mat1 In the next section we will describe how to create an image segmentation of the NDVI data and how to use cluster analysis to create a land classification.

4. Expanding the data analysis capacity of superpixels
Here is an application of the function superpixels() to the NDVI data generated in Section 3.


> NDVI.80 = superpixels(input_image = NDVI.data,
+   method = “slic”, superpixel = 80,
+   compactness = 30, return_slic_data = TRUE,
+   return_labels = TRUE, write_slic = “”,
+   verbose = FALSE)
> imageShow(Region.slic$slic_data)

Here is the result.
Here the structure of the object NDVI.80 .

> str(NDVI.80)
List of 2
$ slic_data: num [1:187, 1:329, 1:3] 95 106 87 52 42 50 63 79 71 57 …
$ labels   : num [1:187, 1:329] 0 0 0 0 0 0 0 0 0 0 …

It is a list with two elements, NDVI.80$slic_data, a three dimensional array of pixel color data (not normalized), and NDVI.80$labels, a matrix whose elements correspond to the pixels of the image. The second element’s name hints that it may contain values identifying the superpixel to which each pixel belongs. Let’s see if this is true.

> sort(unique(as.vector(NDVI.80$labels)))
[1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[25] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
[49] 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71

There are 72 unique labels. Although the call to superpixels() specified 80 superpixels, the function generated 72. We can see which pixels have the label value 0 by setting the values of all of the other pixels to [255, 255, 255], which will plot as white.

> R0 <- NDVI.80
for (i in 1:nrow(R0$label))
+    for (j in 1:ncol(R0$label))
+       if (R0$label[i,j] != 0)
+          R0$slic_data[i,j,] <- c(255,255,255)
> imageShow(R0$slic_data)


This produces the plot shown here.
The operation isolates the superpixel in the upper right corner of the image, together with the corresponding portion of the boundary. We can easily use this approach to figure out which value of NDVI.80$label corresponds to which superpixel. Now let’s deal with the boundary. A little exploration of the NDVI.80 object suggests that pixels on the boundary have all three components equal to zero. Let’s isolate and plot all such pixels by coloring all other pixels white.

> Bdry <- NDVI.80
for (i in 1:nrow(Bdry$label))
+    for (j in 1:ncol(Bdry$label))
+     if (!(Bdry$slic_data[i,j,1] == 0 &
+     Bdry$slic_data[i,j,2] == 0 &
+       Bdry$slic_data[i,j,3] == 0))

+       Bdry$slic_data[i,j,] <- c(255,255,255)
> Bdry.norm <- NormalizeObject(Bdry$slic_data)
> imageShow(Bdry$slic_data)



This shows that we have indeed identified the boundary pixels. Note that the function imageShow() displays these pixels as white with a black edge, rather than pure black. Having done a preliminary analysis, we can organize our segmentation process into two steps. The first step will be to replace each of the superpixels generated by the OpenImageR function superpixels() with one in which each pixel has the same value, corresponding to a measure of central tendency (e.g, the mean, median , or mode) of the original superpixel. The second step will be to use the K-means unsupervised clustering procedure to organize the superpixels from step 1 into a set of clusters and give each cluster a value corresponding to a measure of central tendency of the cluster. I used the code developed to identify the labels and to generate the  boundary to construct a function make.segments() to carry out the segmentation. The first argument of make.segments() is the superpixels object, and the second is the functional measurement of central tendency. Although in this case each of the three colors of the object NDVI.80 have the same values, this may not be true for every application, so the function analyzes each color separately. Because the function is rather long, I have not included it in this post. If you want to see it, you can go to the Additional Topic linked to the book’s website.

Here is the application of the function to the object NDVI.80 with the second argument set to mean.

> NDVI.means <- make.segments(NDVI.80, mean)
> imageShow(NDVI.means$slic_data)

Here is the result.

The next step is to develop clusters that represent identifiable land cover types. In a real project the procedure would be to collect a set of ground truth data from the site, but that option is not available to us. Instead we will work with the true color rendition of the Landsat scene, shown here.
The land cover is subdivided using K-means into five types: dense crop, medium crop, scrub, open, and water.

> set.seed(123)
> NDVI.clus <-
+ kmeans(as.vector(NDVI.means$slic_data[,,1]), 5)

> vege.class <- matrix(NDVI.clus$cluster,
+    nrow = NDVI.region@nrows,

+    ncol = NDVI.region@ncols, byrow = FALSE)
> class.ras <- raster(vege.class, xmn = W,
+    xmx = E, ymn = S, ymx = N, crs =
+    CRS(“+proj=utm +zone=10 +ellps=WGS84”))

Next I used the raster function ratify() to assign descriptive factor levels to the clusters.


> class.ras <- ratify(class.ras)
> rat.class <- levels(class.ras)[[1]]
> rat.class$landcover <- c(“Water”, “Open”,
+  “Scrub”, “Med. Crop”, “Dense Crop”)

> levels(class.ras) <- rat.class
> levelplot(class.ras, margin=FALSE,
+   col.regions= c(“blue”, “tan”,

+   “lightgreen”, “green”, “darkgreen”),
+    main = “Land Cover Types”)


The result is shown in the figure at the start of the post. We can also overlay the original boundaries on top of the image. This is more easily done using plot() rather than levelplot(). The function plot() allows plots to be built up in a series of statements. The function levelplot() does not.

> NDVI.rasmns <- raster(NDVI.means$slic_data[,,1],
+   xmn = W,
xmx = E, ymn = S, ymx = N,
+   crs = CRS(“+proj=utm +zone=10 +ellps=WGS84”))

> NDVI.polymns <- rasterToPolygons(NDVI.rasmns,
+   dissolve = TRUE)

> plot(class.ras, col = c(“blue”, “tan”,
+   “lightgreen”,
“green”, “darkgreen”),
+   main = “Land Cover Types”,
legend = FALSE)
> legend(“bottom”, legend = c(“Water”, “Open”,
+   “Scrub”, “Med. Crop”,
“Dense Crop”),
+   fill = c(“blue”, “tan”, “lightgreen”, “green”,

+   “darkgreen”))
> plot(NDVI.polymns, add = TRUE)


The application of image segmentation algorithms to remotely sensed image classification is a rapidly growing field, with numerous studies appearing every year. At this point, however, there is little in the way of theory on which to base an organization of the topic. If you are interested in following up on the subject, I encourage you to explore it on the Internet.

References
Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk (2010). SLIC Superpixels. Ecole Polytechnique Fedrale de Lausanne Technical Report 149300.

Appelhans, T., F. Detsch, C. Reudenbach and S. Woellauer (2017).  mapview: Interactive Viewing of Spatial Data in R. R package version 2.2.0.  https://CRAN.R-project.org/package=mapview Bivand, R., E. Pebesma, and V. Gómez-Rubio. (2008). Applied Spatial Data Analysis with R. Springer, New York, NY.

Frey, B.J. and D. Dueck (2006). Mixture modeling by affinity propagation. Advances in Neural Information Processing Systems 18:379.

Hijmans, R. J. (2016). raster: Geographic Data Analysis and Modeling. R package version 2.5-8. https://CRAN.R-project.org/package=raster

Mouselimis, L. (2018). SuperpixelImageSegmentation: Superpixel Image Segmentation. R package version 1.0.0. https://CRAN.R-project.org/package=SuperpixelImageSegmentation

Mouselimis, L. (2019a). OpenImageR: An Image Processing Toolkit. R package version 1.1.5. https://CRAN.R-project.org/package=OpenImageR

Mouselimis, L. (2019b) Image Segmentation Based on Superpixel Images and Clustering. https://cran.r-project.org/web/packages/OpenImageR/vignettes/Image_segmentation_superpixels_clustering.html.

Ren, X., and J. Malik (2003) Learning a classification model for segmentation. International Conference on Computer Vision, 2003, 10-17.

Stutz, D., A. Hermans, and B. Leibe (2018). Superpixels: An evaluation of the state-of-the-art. Computer Vision and Image Understanding 166:1-27.

Tufte, E. R. (1983). The Visual Display of Quantitative Information. Graphics Press, Cheshire, Conn.

Yassine, B., P. Taylor, and A. Story (2018).  Fully automated lung segmentation from chest radiographs using SLICO superpixels. Analog Integrated Circuits and Signal Processing 95:423-428.

Zhou, B. (2013) Image segmentation using SLIC superpixels and affinity propagation clustering. International Journal of Science and Research. https://pdfs.semanticscholar.org/6533/654973054b742e725fd433265700c07b48a2.pdf  






< p style=”text-align: center”> 

nnlib2Rcpp: a(nother) R package for Neural Networks

For anyone interested, nnlib2Rcpp is an R package containing a number of Neural Network implementations and is available on GitHub. It can be installed as follows (the usual way for packages on GitHub):
library(devtools)
install_github("VNNikolaidis/nnlib2Rcpp")
The NNs are implemented in C++ (using  nnlib2 C++ class library) and are interfaced with R via Rcpp package (which is also required).

The package currently includes the following NN implementations:

  • A Back-Propagation (BP) multi-layer NN (supervised) for input-output mappings.
  • An Autoencoder NN (unsupervised) for dimensionality reduction (a bit like PCA) or dimensionality expansion.
  • A Learning Vector Quantization NN (LVQ, supervised) for classification.
  • A Self-Organizing Map NN (unsupervised, simplified 1-D variation of SOM) for clustering (a bit like k-means).
  • A simple Matrix-Associative-Memory NN (MAM, supervised) for storing input-output vector pairs.
For more information see the package documentation.

Generate names using posterior probabilities

If you are building synthetic data and need to generate people names, this article will be a helpful guide. This article is part of a series of articles regarding the R package conjurer. You can find the first part of this series here.

Steps to generate people names


1. Installation


Install conjurer package by using the following code. 
 install.packages("conjurer") 

2. Training data Vs default data


The package conjurer provides 2 two options to generate names.
    • The first option is to provide a custom training data. 
    • The second option is to use the default training data provided by the package.
If it is people names that you are interested in generating, you are better off using the default training data. However, if you would like to generate names of  items or products (example: pharmaceutical drug names), it is recommended that you build your own training data.
The function that helps in generating names is buildNames. Let us understand the inputs of the function. This function takes the form as given below.
buildNames(dframe, numOfNames, minLength, maxLength)
In this function,
dframe is a dataframe. This dataframe must be a single column dataframe where each row contains a name. These names must only contain english alphabets(upper or lower case) from A to Z but no special characters such as “;” or non ASCII characters. If you do not pass this argument to the function, the function uses the default prior probabilities to generate the names.

numOfNames is a numeric. This specifies the number of names to be generated. It should be a non-zero natural number. 

minLength is a numeric. This specifies the minimum number of alphabets in the name. It must be a non-zero natural number.

maxLength is a numeric. This specifies the maximum number of alphabets in the name. It must be a non-zero natural number.

3. Example


Let us run this function with an example to see how it works. Let us use the default matrix of prior probabilities for this example. The output would be a list of names as given below.
library(conjurer)
peopleNames <- buildNames(numOfNames = 3, minLength = 5, maxLength = 7)
print(peopleNames)
[1] "ellie"   "bellann" "netar" 
Please note that since this is a random generator, you may get other names than displayed in the above example. 

4. Consolidated code


Following is the consolidated code for your convenience.
#install latest version
install.packages("conjurer") 

#invoke library
library(conjurer)

#generate names
peopleNames <- buildNames(numOfNames = 3, minLength = 5, maxLength = 7) 

#inspect the names generated
print(peopleNames) 

5. Concluding remarks


In this article, we have learnt how to use the R package conjurer and generate names. Since the algorithm relies on prior probabilities, the names that are output may not look exactly like real human names but will phonetically sound like human names. So, go ahead and give it a try. If you like to understand the underlying code that generates these names, you can explore the GitHub repository here. If you are interested in what’s coming next in this package, you can find it in the issues section here

Covid-19 interactive map (using R with shiny, leaflet and dplyr)

The departement of Public Health of the Strasbourg University Hospital (GMRC, Prof. Meyer) and the Laboratory of Biostatistics and Medical Informatics of the Strasbourg  Medicine Faculty (Prof. Sauleau), to the extent of their means and expertise, are contributing to the fight against Covid-19 infection. Doctor Fabacher has produced an interactive map for global monitoring of the infection, accessible at :

https://thibautfabacher.shinyapps.io/covid-19/



This map, which complements the Johns Hopkins University tool (Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE), focuses on the evolution of the number of cases per country and for a given period, but in terms of incidence and prevalence. It is updated daily.
The period of interest can be defined by the user and it is possible to choose  :
  • The count of new cases over a period of time or the same count in relation to the population of the country (incidence).
  • The total case count over a period of time or the same count reported to the population (prevalence).

This map is made using R with shiny, leaflet and dplyr packages.

Code available here : https://github.com/DrFabach/Corona/blob/master/shiny.r

Reference
Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis; published online Feb 19. https://doi.org/10.1016/S1473-3099(20)30120-1.