Learning inferential statistics using R

Imagine you need to find the average height of 20-year-olds. One way is to go around and measure each person individually. But that seems like quite a bit of work, doesn’t it? Luckily, there’s a better way. Inferential statistics allows us to use samples to draw conclusions about the population. In other words, we can take a small group of people and use their characteristics to estimate the characteristics of the entire group.
To see how this works in practice, let’s take a look at a dataset from Kaggle. This platform provides a wealth of datasets from various fields, each offering unique challenges for R users. Here, we’ll be using a dataset on cardiovascular disease compiled by Jocelyn Dumlao.
This dataset originates from a renowned multispecialty hospital situated in India and encompasses a comprehensive array of health-related information. Comprising 1000 rows and 14 columns, this dataset plays a pivotal role in the early detection of disease.
Let us see how to import this into RStudio. If the dataset is in .csv format, it can be imported with the read.csv() function (the readr package, loaded below, also provides read_csv() as a faster alternative). Replace “File path” with the path of your downloaded dataset.
library(readr)
cardio <- read.csv("File path")
Just type in the name of the variable you used to import the dataset so that you can view the entire dataset in RStudio.
cardio


The first 6 rows of the dataset can be viewed using the ‘head’ function.
top_6=head(cardio)
top_6

Similarly, the last 6 rows of the dataset can be viewed using the ‘tail’ function.
bottom_6=tail(cardio)
bottom_6

The dimension of the dataset (number of rows and columns) can be found out using the ‘dim’ function.
dimension=dim(cardio)
dimension

The entire dataset can be treated as the population, and all the population parameters can be easily found. The mean of a target variable in the population is calculated with the ‘mean’ function. Below, we choose serumcholestrol as the target variable.
mean_chol=mean(cardio$serumcholestrol)
mean_chol

So, we can infer that the average serumcholestrol level in the patient population taken from the hospital is 311.447.
There also exists a function to calculate the standard deviation of a dataset.

std_chol=sd(cardio$serumcholestrol)
std_chol


From this value, we can understand that the serumcholestrol values typically lie about 132.4438 units above or below the mean.
We take a random sample of size 100, with serumcholestrol as our target variable. If you want to sample with replacement, set the third argument to TRUE. Here, we’re sampling without replacement.

sample_1=sample(cardio$serumcholestrol,100,FALSE)
sample_1
mean_sample_chol=mean(sample_1)
mean_sample_chol

The mean of the sample that we selected is 317.51. This mean can be used to calculate a test statistic, which in turn can be used to decide whether or not to reject the null hypothesis.
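As an illustration, a one-sample z statistic comparing the sample mean with the population mean (using the population standard deviation computed earlier) can be obtained like this:

z_stat=(mean_sample_chol-mean_chol)/(std_chol/sqrt(100))
z_stat

Its absolute value can then be compared with a critical value (for example 1.96 at the 5% significance level) to decide whether to reject the null hypothesis that the sample comes from a population with mean equal to mean_chol.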


Calculating the standard deviation and standard error of the sample


Computing the standard deviation of a sample gives us many insights: it measures the spread of the data around the mean. The standard error of the mean is the standard deviation of the sampling distribution of the mean; for a single sample it is estimated by dividing the sample standard deviation by the square root of the sample size.
std_sample_1=sd(sample_1)
std_sample_1
std_error=std_sample_1/sqrt(length(sample_1))
std_error
The sample mean and sample standard deviation are close to the population mean and population standard deviation, and the standard error tells us how much the sample mean is expected to vary from sample to sample.

Next, we plot the sampling distribution as a histogram, with cholesterol levels on the x-axis and frequency on the y-axis.

To get a sampling distribution, we repeatedly take samples 1000 times. This is done using the replicate function, which repeatedly evaluates an expression a given number of times.
samp_dist_1=replicate(1000,mean(sample(cardio$serumcholestrol,100,replace=TRUE)))
samp_dist_1
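We can plot this sampling distribution as a histogram (the same hist() call is reused for the second sample further below):

hist(samp_dist_1,main="Sampling distribution of serum cholesterol",xlab = "Cholesterol levels",ylab = "Frequency", col = "skyblue")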

The obtained graph is similar to a normal distribution: values near the mean occur more frequently than values far from the mean. Now let's calculate the variance of the sampling distribution using the var function.
variance_sample_1=var(samp_dist_1)
variance_sample_1

Now let us see how increasing the sample size affects the variance of the sampling distribution.
Increasing the sample size to 200:
sample_2=sample(cardio$serumcholestrol,200,FALSE)
sample_2
Calculating the mean of the sample 2
mean_sample_chol=mean(sample_2)
mean_sample_chol

The mean of sample 2, with sample size 200, is 308.875.

Calculating the standard deviation and standard error of sample 2
std_sample_2=sd(sample_2)
std_sample_2
std_error_2=std_sample_2/sqrt(length(sample_2))
std_error_2

The standard deviation of sample 2 is 135.9615, so its estimated standard error is 135.9615/sqrt(200), roughly 9.6.
We repeat the previous steps to obtain a sampling distribution.
samp_dist_2=replicate(1000,mean(sample(cardio$serumcholestrol,200,replace=TRUE)))
samp_dist_2
Now we plot it like before.
hist(samp_dist_2,main="Sampling distribution of serum cholesterol",xlab = "Cholesterol levels",ylab = "Frequency", col = "skyblue")
variance_sample_2=var(samp_dist_2)
variance_sample_2
The variance of the sampling distribution with sample size 200 is 84.513, which is smaller than the variance obtained with samples of size 100. Hence we can conclude that as the sample size increases, both the variance and the standard error of the sampling distribution decrease; in other words, precision increases with sample size.
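We can put the two variances side by side to see the reduction directly:

c(size_100=variance_sample_1,size_200=variance_sample_2)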

Authors: Aadith Joseph Mathew, Amrutha Paalathara, Devika S Vinod, Jyosna Philip

DICOM Parsing with R

Abstract

This blog post describes how to parse medically relevant non-image meta information from DICOM files using the programming language R. The result of the parsing process is an R data frame in the form of a name – value table that is both easy to handle and flexible.

We first describe the general structure of DICOM files and what kind of information they contain. Following this, our DicomParseR module in R is explained in detail. The package has been developed for practical DICOM parsing from our hospital’s cardiac magnetic resonance (CMR) modalities in order to populate a scientific database. The given examples hence refer to CMR information parsing; however, due to its generic nature, DicomParseR may be used to parse information from any type of DICOM file.

The following graph illustrates the use of DicomParseR in our use case as an example:

Structure of CMR DICOM files

On the top level, a DICOM file generated by a CMR modality consists of a header (hdr) section and the image (img) information. In between, an XML part can be found.

The hdr section mainly contains baseline information about the patient, including name, birth date and system ID. It also contains contextual information about the observation, such as date and time, ID of the modality and the observation protocol.

The XML section contains quantified information generated by the modality’s embedded AI, e. g. regarding myocardial blood flow (MBF). All information is stored between specifically named sub tags. These tags will serve to search for specific information. For further information on DICOM files, please refer to dicomstandard.org.

The heterogeneous structure of DICOM files as described above requires the use of distinct submodules to compose a technically harmonized name – value table. The information from the XML section will be extended by information from the hdr section. The key benefit of our DicomParseR module is to parse these syntactically distinct sections, which will be described in the following.

Technical Approach

To extract information from the sub tags in the XML section, together with any additional relevant meta information from the hdr section of the DICOM file, the following steps are performed:

    1. Check if a DICOM file contains desired XML tag
    2. If the desired tag is present, extract and transform baseline information from the hdr part
    3. If the tag was found in step 1, extract and transform the desired tag information from the XML part
    4. Combine the two sets of information into an integrated R data frame
    5. Write the data frame into a suitable database for future scientific analysis

The steps mentioned above will be explained in detail in the following.

Step 1: Check if a DICOM file contains the desired XML tag

At the beginning of processing, DicomParseR will check whether a certain tag is present in the DICOM file, in our case <ismrmrdMeta>. In case that tag exists, quantified medical information will be stored here. Please refer to ISMRMRD for further information about the ismrmrd data format.

For this purpose, DicomParseR offers the function file_has_content() that expects the file and a search tag as parameters. The function will use base::readLines() to read in the file and stringr::str_detect() to detect if the given tag is available in the file. Performance tests with help of the package microbenchmark have proven stringr’s outstanding processing speed in this context. If the given tag was found, TRUE is returned, otherwise FALSE.

Any surrounding application may hence call

if (DicomParseR::file_has_content(file, "ismrmrdMeta")) {…}

to only continue parsing DICOM files that contain the desired tag.
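
Based on this description, a minimal sketch of file_has_content() could look as follows (the real implementation in DicomParseR may handle encodings and edge cases differently):

file_has_content <- function(file, tag) {
  # Read the raw file content, skipping embedded NUL bytes
  file_content <- readLines(file, skipNul = TRUE)
  # TRUE if the search tag occurs anywhere in the file, FALSE otherwise
  any(stringr::str_detect(file_content, tag))
}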

It is important to note that the information generated by the CMR modality is actually not a single DICOM file but rather a composition of a multitude of files. These files may or may not contain the desired XML tag(s). If step 1 were omitted, our parsing module would import many more files than necessary.

Step 2: Extract and transform baseline hdr information

Step 2 will extract hdr information from the file. For this purpose, DicomParseR uses the function readDICOMFile() provided by package oro.dicom. By calling

oro.dicom::readDICOMFile(dicom_file)[["hdr"]]

the XML and image part are removed. The hdr section contains information such as patient’s name, sex and birthdate as well as meta information about the observation, such as date, time and contrast bolus. DicomParseR will save the hdr part as a data frame (in the following called df_hdr) in this step and later append it to the data frame that is generated in the next step.
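
As a rough sketch (variable names follow this post; the actual DicomParseR code may differ):

dicom_data <- oro.dicom::readDICOMFile(dicom_file)
df_hdr <- dicom_data[["hdr"]]   # keep only the hdr section, saved as the data frame df_hdr described above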

Note that the oro.dicom package provides functionality to extract header and image data from a DICOM file, as shown in the code snippet. However, it does not provide an out-of-the-box solution for extracting the XML section and returning it as an R data frame. For this purpose, DicomParseR wraps the required extra functionality around existing packages for DICOM processing.

Step 3: Extract and transform information from XML part

In this step, the data within the provided XML tag is extracted and transformed into a data frame.

The following snippet shows an example of how myocardial blood flow numbers are stored in the respective DICOM files (values modified for data privacy):

<ismrmrdMeta>
  …
  <meta>
    <name>GADGETRON_FLOW_ENDO_S_1</name>
    <value>1.95</value>
    <value>0.37</value>
    <value>1.29</value>
    <value>3.72</value>
    <value>1.89</value>
    <value>182</value>
  </meta>
  …
</ismrmrdMeta>

Within each meta tag, “name” specifies the context of the observation and “value” stores the myocardial blood flow data. The different data points between the value tags correspond to different descriptive metrics, such as mean, median, minimum and maximum values. Other meta tags may be structured differently. In order to stay flexible, the final extraction of a concrete value is done in the last step of data processing, see step 5.

Now, to extract and transform the desired information from the DICOM file, DicomParseR will first use its function extract_xml_from_file() for extraction and subsequently the function convert_xml2df() for transformation. With

extract_xml_from_file <- function(file, tag) {
  # Read the raw file content (UTF-16LE encoding, skipping embedded NULs)
  file_content <- readLines(file, encoding = "UTF-16LE", skipNul = TRUE)
  # The first and second occurrences of the tag mark the opening and closing lines
  indices <- grep(tag, file_content)
  xml_part <- paste(file_content[indices[[1]]:indices[[2]]], collapse = "\n")
  return(xml_part)
}

and “ismrmrdMeta” as tag, the function will return a string in XML structure. That string is then converted to an R data frame in the form of a name – value table by convert_xml2df(). Based on our example above, the resulting data frame will look like this:

name                       value  [index]
GADGETRON_FLOW_ENDO_S_1    1.95   [1]
GADGETRON_FLOW_ENDO_S_1    0.37   [2]

That data frame is called df_ismrmrdMeta in the following. A specific value can be accessed with the combination of name and index; see the example in step 5.
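
For illustration, a simplified sketch of what convert_xml2df() does, using the xml2 package (the actual implementation in DicomParseR may differ), could look like this:

convert_xml2df <- function(xml_string) {
  doc <- xml2::read_xml(xml_string)
  # One row per <value> element, repeating the <name> of the enclosing <meta> block
  meta_nodes <- xml2::xml_find_all(doc, ".//meta")
  do.call(rbind, lapply(meta_nodes, function(node) {
    data.frame(
      name  = xml2::xml_text(xml2::xml_find_first(node, "./name")),
      value = xml2::xml_text(xml2::xml_find_all(node, "./value")),
      stringsAsFactors = FALSE
    )
  }))
}

With this layout, the i-th value for a given name can be accessed as df$value[df$name == "GADGETRON_FLOW_ENDO_S_1"][i], which is exactly the access pattern used in step 5.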

Step 4: Integrate hdr and XML data frames

At this point in time, two data frames have resulted from processing the original DICOM file: df_hdr and df_ismrmrdMeta.

In this step, those two data frames are combined into one single data frame called df_filtered. This is done by using base::rbind().

For example, executing

df_filtered <- rbind(c("Pat_Weight", df_hdr$value[df_hdr$name=="PatientsWeight"][1]), df_ismrmrdMeta)

will extend the data frame df_ismrmrdMeta by the patient’s weight. The result is returned in form of the target data frame df_filtered. As with df_ismrmrdMeta, df_filtered will be a name – value table. This design has been chosen in order to stay as flexible as possible when it comes to subsequent data analysis.

Step 5: Populate scientific database

The data frame df_filtered contains all information from the DICOM file as a name – value table. In the final step 5, df_filtered may now be split again as required to match the use case specific schema of the scientific database.

For example, in our use case, the table “cmr_data” in the scientific database is dedicated to persist MBF values. An external program (in this case, an R Shiny application providing a GUI for end-user interaction) will call its function transform_input_to_cmr_data() to generate a data frame in format of the “cmr_data” table. By calling

transform_input_to_cmr_data <- function(df) {
  mbf_endo_s1 = as.double(df$value[df$name=="GADGETRON_FLOW_ENDO_S_1"][1])
  mbf_endo_s2 = ...
}

with df_filtered as parameter, the mean MBF values of the heart segments are extracted and can now be sent to the database. Another sub step would be to call transform_input_to_baseline_data() to persist baseline information in the database.

Summary and Outlook

This blog post has described how DICOM files from CMR observations can be processed with R in order to extract quantified myocardial blood flow values for scientific analysis. Apart from R, different approaches by other institutes have been discussed publicly as well, e.g. using MATLAB. Interested readers may refer to the NIH, among others, for further information.

The chosen approach tries to respect both the properties of DICOM files (their heterogeneous inner structure, the different types of information they contain, and their file size) and the specific data requirements of our institute’s use cases. With a single R data frame in the form of a name – value table, the result of the process is easy to handle for further data analysis. At the same time, due to its flexible setup, DicomParseR may serve as a module in any kind of DICOM-related use case.

Thomas Schröder
Centrum für medizinische Datenintegration BHC Stuttgart

Customizing slides and documents using Quarto extensions workshop

Join our workshop on Customizing slides and documents using Quarto extensions, which is a part of our workshops for Ukraine series!


Here’s some more info: 


Title: Customizing slides and documents using Quarto extensions

Date: Thursday, January 11th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker:Nicola Rennie is a Lecturer in Health Data Science based within the Centre for Health Informatics, Computing, and Statistics at Lancaster Medical School. Her research interests include applications of statistics and machine learning to healthcare and medicine, communicating data through visualisation, and understanding how we teach statistical concepts. Nicola also has experience in data science consultancy and collaborates closely with external research partners. She can often be found at data science meetups, presenting at conferences, and is the R-Ladies Lancaster chapter organiser.


Description: Quarto is an open-source scientific and technical publishing system that allows you to combine text with code to create fully reproducible documents in a variety of formats. The addition of custom styling to documents can make them look more professional and recognisable. In the first half of this workshop, we’ll look at ways to customise HTML outputs (including documents and revealjs slides) using CSS, and ways to customise PDF documents using LaTeX. In the second half, we’ll discuss the use of Quarto extensions as a way of sharing customised templates with others, demonstrate how to install and use extensions, and show the process of building your own custom style extension.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings & materials) here.


Looking forward to seeing you during the workshop!

Using ChatGPT for Exploratory Data Analysis with Python, R and prompting workshop

Join our workshop on Using ChatGPT for Exploratory Data Analysis with Python, R and prompting, which is a part of our workshops for Ukraine series! 


Here’s some more info: 


Title:Using ChatGPT for Exploratory Data Analysis with Python, R and prompting


Date: Thursday, November 30th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)


Speakers: Gábor Békés is an Associate Professor at the Department of Economics and Business of Central European University, a research fellow at KRTK in Hungary, and a research affiliate at CEPR. His research focuses on international economics, economic geography and applied IO, and has been published by the Global Strategy Journal, the Journal of International Economics, Regional Science and Urban Economics and Economic Policy, among others; he has also authored commentary on VOXEU.org. His comprehensive textbook, Data Analysis for Business, Economics, and Policy, co-authored with Gábor Kézdi, was published by Cambridge University Press in 2021.


Seth Stephens-Davidowitz is a data scientist and New York Times bestselling author. His 2017 book, Everybody Lies, on the secrets revealed in internet data, was a New York Times bestseller; a PBS NewsHour Book of the Year; and an Economist Book of the Year.  His 2022 book, Don’t Trust Your Gut, on how people can use data to best achieve their life goals, was excerpted in the New York Times, the Atlantic, and Wired.  Seth has worked as a data scientist at Google; a visiting lecturer at the Wharton School of the University of Pennsylvania; and a contributing op-ed writer for the New York Times.  Seth has consulted for top companies.  He received his BA in philosophy, Phi Beta Kappa, from Stanford, and his PhD in economics from Harvard.  


Description: How can GenAI, like ChatGPT augment and speed up data exploration? Is it true that we no longer need coding skills? Or, instead, does ChatGPT hallucinate too much to be taken seriously? I will do a workshop with live prompting and coding to investigate.  I will experiment with two datasets shared ahead of the workshop.  The first comes from my Data Analysis textbook and is about football managers. Here I’ll see how close working with AI will get to what we have in the textbook, and compare codes written by us vs the machine. Second, I’ll work with a dataset I have no/little experience with and see how far it takes me. In this case, we will look at descriptive statistics, make graphs and tables, and work to improve a textual variable.  It will generate code and reports, and I’ll then check them on my laptop to see if they work. The process starts with Python but then I’ll proceed with R.


Seth Stephens-Davidowitz is writing a book in 30 days using ChatGPT’s Data Analysis. The book is called Who Makes the NBA? and is a statistical analysis of what it takes to reach the top of basketball. Seth will illustrate his experience with one of the case studies he has worked on. The workshop will end with Seth and Gábor chatting about their experiences and what works well.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings & materials) here.


Looking forward to seeing you during the workshop!

Introduction to mixed frequency data models in R workshop

Join our workshop on  Introduction to mixed frequency data models in R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Introduction to mixed frequency data models in R

Date: Thursday, December 14th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Jonas Striaukas is an assistant professor of statistics and finance and a Marie Skłodowska-Curie Action fellow at the Copenhagen Business School, Department of Finance. His main research interests are econometrics/statistics and applications of machine learning methods to financial and macro econometrics. In particular, his research focuses on regularized regression models for mixed frequency data and factor-augmented sparse regression models. Before joining the Copenhagen Business School in 2022, he was a research fellow at the Fonds de la Recherche Scientifique—FNRS and Université Catholique de Louvain, where he carried out his PhD under the supervision of prof. Andrii Babii (UNC Chapel Hill) and prof. Eric Ghysels (UNC Chapel Hill).

Description: The course will cover statistical models for mixed frequency data analysis and their applications using the R statistical software. First, we will look into classical mixed frequency regression models, called MIDAS regressions, and their applications to nowcasting. We will then cover multivariate models such as vector autoregression (VAR) and their application in mixed frequency data settings. Lastly, we will cover regularized MIDAS regressions and their extension to the factor-augmented regression case.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings & materials) here.

Looking forward to seeing you during the workshop!

365 Data Science courses 100% free until November 20

From November 6 (07:00 PST) to November 20 (07:00 PST), 365 Data Science provides free unlimited access to their expansive curriculum—including engaging courses, hands-on data projects, and certificates of achievement. Boost your data science and AI expertise risk-free.

About the platform

With over 2 million global users, 365 Data Science empowers learners with essential skills in data science, analytics, programming, and machine and deep learning. The e-learning platform provides a theoretical foundation for all data-related disciplines – Probability, Statistics, and Mathematics. 365’s curriculum also offers a comprehensive introduction to R programming, statistics in R, and courses on data visualization in R.

Tradition & Mission: Enduring Endeavors

Now in its third season, the 365 Data Science annual free-access initiative, born during the initial COVID-19 lockdown, has relaunched. CEO Ned Krastev describes data science as “an ever-evolving field offering immense opportunities for career advancement,” underscoring the company’s dedication to nurturing a global network of maturing data talents and avid learners.

The initiative’s impact is impressive, with over 152,000 unique users from 200 countries in 2022 alone, accessing more than 9.2 million minutes of content and earning 38,761 certificates. Krastev marvels at the yearly growth in student engagement, attributing it to their zeal for learning and excelling in new roles. He reaffirms 365 Data Science’s unwavering commitment to supporting these learners on their path to success.

New Features & Learning Insights

365 has launched a collection of real-world, data-focused projects for various skill levels and technical requirements. Users can solve authentic business cases to acquire practical skills, boosting their employability and portfolio. Ned states, “Our focus on practice-based learning is crucial for skill mastery. We’re dedicated to equipping students with pertinent skills from the beginning, and we’re excited to see how these projects will advance their careers.”

Join the program and start for free at  365 Data Science  

 

Build Your First App With Shiny – R Shiny Tutorial For Beginners

So you want to learn Shiny? Congratulations, great decision!

Shiny is a wonderful tool for creating web applications using just R – without JavaScript, ASP.NET, Ruby on Rails or other web frameworks and languages. And the output is absolutely remarkable – beautiful charts, tables and text that present information in a highly attractive way.

Learning Shiny is not terribly hard, even if you are an absolute beginner. You just need some proper guidance. This is why I have created this tutorial that drives you through the process of creating a simple Shiny application, from A to Z.

It’s a long tutorial, so make sure you’re sitting in a comfortable chair and have pen and paper at hand, to take notes. If you want to follow along and code with me in RStudio, that’s even better.

Let’s dive straight in.

What Are We Going to Build?

Let’s start with the end in sight – let’s first see what our app is going to look like. Click the link below (it opens in a new window) and take a minute to play around with the app, then come back here. I’ll be waiting for you.

https://bestshiny.shinyapps.io/income/

Back already? OK, let’s go on…

The data set used to build this app is called demographics and contains information about 510 customers of a big company. The variables of interest for us are age, gender and income. You can see a fragment of the data set in the image below.


To download the entire data file, click here:

http://www.shiny-academy.com/downloads/demographics.csv

When the user selects a gender category and an age range, the app displays the following outputs:

  • the number of customers that meet the specified criteria
  • a histogram that presents the income distribution for those customers

As you could notice, this information updates instantly any time the user modifies their selection.

So let’s build this app from scratch. However, before even writing the first line of code we must become familiar with the basic components of any Shiny app.

Let’s Understand the Shiny App Structure

Every app is made up of two parts: user interface and server.

The user interface (or, briefly, UI) is actually a web page that the app user sees. As you know, the language of the web pages is HTML. Therefore, the user interface is HTML code that we write using different Shiny functions.

The Shiny app interface contains:

    • the inputs that allow the user to interact with the app
    • the outputs generated by the app in different formats (text, tables, charts, images etc.)

Let’s look at our app, for example.



Its interface has four elements:

    • two input objects: a dropdown menu and a slider
    • two output objects: a text block and a chart

So, the user interface creates the whole application layout – it indicates the exact place and configuration of each element in the page.

The second component of a Shiny app is the server.

The role of the server is to create the outputs (text, tables, charts etc.) using a set of instructions. These instructions are actually R functions and commands. So the Shiny server recognises any code that can be run by the R program.

Your Shiny server can operate:

    • on a local computer, i.e. your own computer (when you run the app in RStudio)
    • on a remote computer (a server located in another place)

As you remember, the outputs of our app update automatically when you change the input values. Why is that? What happens, actually?

Whenever an input is modified, the server re-creates the outputs using the instructions that we have written. This mechanism is called reactivity, and it’s the essential attribute of any Shiny application.

So far, so good. Now that we know the components of a Shiny application, we can write the skeleton of our own app.

The Essential App Template

All Shiny apps have the same basic template.
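
Here is that template in code form (the same skeleton opens every full example later in this tutorial):

library(shiny)

ui <- fluidPage()

server <- function(input, output) {}

shinyApp(ui = ui, server = server)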


As you notice, this basic template has four lines.

In the first line we load the Shiny package using the library function. That’s self explanatory.

In the second line we create the user interface object, called ui. To build the ui object we use the fluidPage function. As the name shows, this function produces a fluid web page, i.e. a web page with flexible layout. More precisely, the elements in the page are resized in real time so they fit in the browser window.

All the objects in the user interface (inputs and outputs) will be written inside this function.

The third line initialises the server object, the second component of the application. This object is called server.

The server object is created using an R function. This function has two arguments: input and output. So the server function takes the input values (specified by the user) and creates the output objects that will be displayed in the user interface.

The fourth line is very important, because here we call a function that assembles our app: shinyApp. When the program notices this function, it recognises our file as being a Shiny application.

The shinyApp function has two arguments that correspond to the core components of our app: ui and server. As you can see, the ui argument takes our ui object, because that is the name of the user interface object we created, and the server argument takes our server object. So this function tells the program which is which:

    • which is the user interface (in our case, ui) and
    • which is the server (in our case, server)

Very important: this line must be the last line of code in your app. So don’t write anything below it. Please keep this in mind.

Before running a Shiny app you must save it under a name (it is advisable to save it in a separate folder, where you don’t have any other apps or R scripts). Then you press the “Run App” button at the top of the editor.



Adding Some Formatting

OK, now we know the basic structure of a Shiny app. It’s time to build a neat interface layout.

If you look at our app interface, the first thing you probably notice is the big heading (“Income Distribution”). To create a heading we can use the titlePanel function. We are going to write this function inside the fluidPage function, because the heading is an element of the user interface. Our app code will look like this:

library(shiny)

ui <- fluidPage (

titlePanel("Income Distribution")

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)

Furthermore, you can see that the input objects are placed in a side area at left, while the output elements are displayed in the central area. This type of arrangement is called “sidebar layout”, and is created with a special function called, well… sidebarLayout. In the sidebar layout we define two areas:

    • the left panel, using the sidebarPanel function
    • the main area, using the mainPanel function
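
For reference, this is what the user interface skeleton looks like when the two panels are wrapped in sidebarLayout (the full examples below place sidebarPanel and mainPanel directly inside fluidPage, which Shiny also accepts and which produces the same left-sidebar arrangement):

ui <- fluidPage(
  titlePanel("Income Distribution"),
  sidebarLayout(
    sidebarPanel(
      # input controls go here
    ),
    mainPanel(
      # outputs go here
    )
  )
)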

Obviously, all these functions must be written inside the user interface object (i.e. inside the fluidPage function). Let’s add them to our code:

library(shiny)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

),

mainPanel(

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)

There is something very important in the code above that you must notice: I have put a comma after titlePanel and another comma after sidebarPanel. So in your code you must separate all the elements in the user interface by commas. It’s an essential rule that you should always remember. If you forget the commas, your app will crash.

When you run the code above, you see the following:


So all we have in our interface right now is the title. It’s time to create the other components. However, there are a couple of things to do before that.

First Things First

In the beginning, we have to load the needed packages. Our app uses two R packages: dplyr (for data manipulation) and ggplot2 (to draw the chart). So we must add two lines of code just before the ui object:

library(shiny)

library(dplyr)

library(ggplot2)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

),

mainPanel(

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)

Next, we have to load our working data set, using the read.csv command. I am going to call my data set object demo. Please make sure that the CSV file (your data source) is placed in the same folder with the app.

Let’s write a new line of code to load the data set:

library(shiny)

library(dplyr)

library(ggplot2)

demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

),

mainPanel(

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)

Great. Now that we got this out of the way, let’s take care of the user interface objects.

Building the Inputs

We can create many types of input controls in Shiny, using particular input functions. For this app we only need two inputs: a dropdown menu and a slider.

Any input control in Shiny has two main parameters:

    • a name or id. This id must be unique. Please make sure that you don’t have two input objects with the same id in your app.
    • a label or description. This label is optional, but useful. Your app users cannot see the input id, but they can see the label.

Besides id and label, each input control has its own specific parameters.

Now let’s write the functions that create our input controls.

To generate a dropdown menu we use the selectInput function. We’ll write this function inside the sidebarPanel function (because the input controls are located in the sidebar area). Please take a look at the code below to see the arguments of the selectInput function:

library(shiny)

library(dplyr)

library(ggplot2)

demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

selectInput("gender", "Choose a gender group", choices = c("Female", "Male"))

),

mainPanel(

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)

So our input is called “gender” and its label is “Choose a gender group”. The menu options (male and female) are introduced with the choices argument. The first option in the list is selected by default. Our dropdown menu looks like this:


Everything’s fine, so let’s get to the second input object.

To create a slider control we use the sliderInput function. Just like any other input, a slider has an id and a label. In addition, we must define the following parameters:

    • the slider limits (lower and upper)
    • the selected value(s). Our slider must allow us to define a range of values (for the age variable), so we must specify two default selected values (minimum and maximum).

Let’s write the sliderInput function now. First we have to put a comma after selectInput (as you remember, the objects in the user interface must be separated by commas). Then we write our function as you can see below:

library(shiny)

library(dplyr)

library(ggplot2)

demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),

sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))

),

mainPanel(

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)


Our slider id is “age”, and its label is “Age range”. The lower and upper limits are 18 and 73, respectively (these are the minimum and maximum age values in our data set). As for the default age range, it is defined as being 25-45, using the value argument. The slider object looks like this:




You notice that it has two handles that let the user specify an age interval between the minimum and the maximum age values.

OK, we are done with the inputs. Are you still with me? Excellent! Let’s move on to the output elements.

Creating the Output Placeholders

Now we must tell the program what type of output objects we need and where to put them in the user interface. For this purpose we use the output placeholders.

The Shiny package provides different output functions, corresponding to different output categories. The argument of each output function is the output name (or id). This name must be written between double quotes.

Please note that these functions do not build output objects. They only create placeholders that indicate the outputs type and place. To actually generate the outputs we have to use server functions, as we’ll see a bit later.

As you know, our app has two output objects: a text output and a chart output. Both are placed in the main area of the interface, so we write them inside the mainPanel function.

To create a text output placeholder we use the textOutput function (easy to remember, right?). Let’s call our output “count”, for example (because it prints the number of customers). Writing it is very simple:

library(shiny)

library(dplyr)

library(ggplot2)

demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),

sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))

),

mainPanel(

textOutput("count")

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)


To build a plot placeholder we have to use the plotOutput function. The id of this output will be “chart”, for instance. So I will add another line of code to my script:

library(shiny)

library(dplyr)

library(ggplot2)

demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)

ui <- fluidPage (

titlePanel("Income Distribution"),

sidebarPanel(

selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),

sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))

),

 
mainPanel(

textOutput("count"),

br(),

plotOutput("chart")

)

)

server <- function (input, output) {}

shinyApp(ui = ui, server = server)

Please notice one more thing: between the placeholder functions I have written another function, br. This function inserts a line break between the objects. I did that to separate the visual elements with white space and avoid the sensation of clutter in the user interface. (Of course, you can put line breaks between the input objects as well.)

That’s all for it. Now the program knows which types of output we need and where they will be located.

Our user interface is ready: we have created both input controls and output placeholders. In the following sections we’ll move our focus to the server function, because it’s the server side of our app that actually builds the output objects.

What Will the Server Do?

Basically, the server part of our app has to perform the following operations:

    1. filter the data set applying the user’s selections
    2. print the output text (“count”)
    3. build the plot (“chart”)

    These operations will be done with ordinary R code. However, I have a very important point to bring up here.

    Our code will operate with reactive objects: Shiny inputs and outputs. These objects can only be handled in a particular type of environment called reactive environment or reactive context. So we must create this reactive context using a special Shiny function.

    Now, before writing the code in the server part we have to understand how Shiny creates output objects.

    Creating Output Objects

    The Shiny program builds output objects using a three-step procedure.

    First, it takes the necessary input values from the inputs list. You remember that the server function has an input argument, don’t you? Well, this argument is nothing but a list that contains all the input values. These values are accessed using the $ sign, just as we do with any list in R.

    In our particular case we have three input values:

    1. the gender category (male or female). To get this category we simply write:

    input$gender
    


    2. the minimum age. To get this age we must write:

    input$age[1]
    

    So the minimum value in a slider input takes the index 1.

    3. the maximum age. To access this age we write:

    input$age[2]
    

    Yes, you guessed right: the maximum value in a slider input takes the index 2.

    In the second step, the outputs are generated using a special rendering function. We’ll talk about these functions in a few moments.

    Finally, in the third step, the outputs are saved in the outputs list. As you remember, the second argument of the server function is output. This argument is actually a list that contains all the output objects. To access any output we use the $ sign.

    Our app has two output placeholders: a text placeholder called “count” and a plot placeholder called “chart”. Correspondingly, we are going to build two output objects – a text and a plot – that we’ll save with the same names in the outputs list. To access these objects we will write:

    output$count

    and

    output$chart
    

    It’s important to remember: the outputs must be saved with the same name as the corresponding placeholders, so the program can match them up. That should go without saying.

    As soon as the outputs are saved, the placeholders are “filled” and the output objects are displayed in the user interface.

    Let’s see how to complete these steps in practice, using our app example.

    First Job: Filter the Data Set

    Before everything, we must filter our data set. We are going to use the filter command in dplyr. Let’s take a look at the code:

    library(shiny)
    
    library(dplyr)
    
    library(ggplot2)
    
    demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)
    
    ui <- fluidPage (
    
    titlePanel("Income Distribution"),
    
    sidebarPanel(
    
    selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),
    
    sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))
    
    ),
    
     
    mainPanel(
    
    textOutput("count"),
    
    br(),
    
    plotOutput("chart")
    
    )
    
    )
    
    server <- function (input, output) {
    
    demo_filtered <- reactive({
    
    demo %>% filter(gender == input$gender,
                    age >= input$age[1],
                    age <= input$age[2])
    
    })
    
    }
    
    shinyApp(ui = ui, server = server)
    
    

    The filtering conditions are straightforward. The gender variable must be equal to the selected gender, and the age must be within the selected age range. But there is a key detail that I want you to notice: the filter command is placed inside another function called reactive. The role of this function is to create reactive context.

    As I have explained above, we need this reactive context to handle the reactive variables used by the filter function (input$gender, input$age[1] and input$age[2]). If we try to work with reactive variables outside reactive context, our app crashes. This is an essential take away lesson.

    We must call the reactive function using parentheses and curly brackets, as follows:

    reactive({ })
    
    

    The filtering result is stored in a new data set called demo_filtered. This data set is also a reactive object, because it was created with the reactive function. This is another important fact to keep in mind: all the objects created inside reactive context are reactive objects.

    Furthermore, when we call a reactive object we must always put a pair of round brackets (parentheses) after it, just like this:


    demo_filtered()
    
    

    If you forget the parentheses, the app will not work. That’s because reactive objects behave like functions, and R functions are always called using round brackets, as you know.

    Fine. Now we have to count the number of entries in the filtered data set (because we must print this number in the user interface, right?). We are going to use another dplyr function, count. Let’s examine the code:

    library(shiny)
    
    library(dplyr)
    
    library(ggplot2)
    
    demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)
    
    ui <- fluidPage (
    
    titlePanel("Income Distribution"),
    
    sidebarPanel(
    
    selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),
    
    sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))
    
    ),
    
     
    mainPanel(
    
    textOutput("count"),
    
    br(),
    
    plotOutput("chart")
    
    )
    
    )
    
    server <- function (input, output) {
    
    demo_filtered <- reactive({
    
    demo %>% filter(gender == input$gender,
                    age >= input$age[1],
                    age <= input$age[2])
    
    })
    
    entries <- reactive({
    
    demo_filtered() %>% count()
    
    })
    
    }
    
    shinyApp(ui = ui, server = server)
    
    

    First, I followed the rule stated above and put parentheses after the data set name (demo_filtered). Then, I wrote the count command inside the reactive function. Why? Because demo_filtered is a reactive object, so it cannot be manipulated outside reactive context. So we absolutely need to create reactive context with this function.

    As a result, the variable entries (the number of entries) is a reactive object as well, so we must use round brackets any time we call it. I hope you’re beginning to get the hang of it.

    Let’s go on. Time to create the outputs now.

    Next Job: Print the Text

    As I said in a previous section, Shiny builds output objects using rendering functions. For the text outputs, this function is renderText. So any time we create a text placeholder with textOutput, we can “fill” that placeholder using renderText.

    Let’s see how the code works:

    library(shiny)
    
    library(dplyr)
    
    library(ggplot2)
    
    demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)
    
    ui <- fluidPage (
    
    titlePanel("Income Distribution"),
    
    sidebarPanel(
    
    selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),
    
    sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))
    
    ),
    
     
    mainPanel(
    
    textOutput("count"),
    
    br(),
    
    plotOutput("chart")
    
    )
    
    )
    
    server <- function (input, output) {
    
    demo_filtered <- reactive({
    
    demo %>% filter(gender == input$gender,
                    age >= input$age[1],
                    age <= input$age[2])
    
    })
    
    entries <- reactive({
    
    demo_filtered() %>% count()
    
    })
    
    output$count <- renderText({
    
    paste0("Number of customers: ", entries())
    
    })
    
    }
    
    shinyApp(ui = ui, server = server)
    
    

    First, please note the syntax of renderText: it is called using round and curly brackets. This function produces reactive context, just like the reactive function, so it can work with reactive objects.

    To create and print the text string we simply use the paste0 function from base R. Nothing special here. Please also note the parentheses after the variable entries (which is a reactive object).

    In the end, we save our text as an object in the outputs list, under the name “count” (the corresponding placeholder name).

    Now our text is shown in the main panel. Any time the user makes a new selection, the text changes accordingly (because the value of the variable entries changes).

    Final Job: Plot the Chart

    The function used to produce charts in Shiny is renderPlot. We need this function any time we have to “fill” a chart placeholder created with plotOutput.

    The renderPlot function generates reactive context, so it can handle reactive variables. Inside it we can use any R function that creates charts; in our app we are going to use ggplot.

    Let’s write the code now:

    library(shiny)
    
    library(dplyr)
    
    library(ggplot2)
    
    demo <- read.csv("demographics.csv", stringsAsFactors = FALSE)
    
    ui <- fluidPage (
    
    titlePanel("Income Distribution"),
    
    sidebarPanel(
    
    selectInput("gender", "Choose a gender group", choices = c("Female", "Male")),
    
    sliderInput("age", "Age range", min = 18, max = 73, value = c(25, 45))
    
    ),
    
     
    mainPanel(
    
    textOutput("count"),
    
    br(),
    
    plotOutput("chart")
    
    )
    
    )
    
    server <- function (input, output) {
    
    demo_filtered <- reactive({
    
    demo %>% filter(gender == input$gender,
                    age >= input$age[1],
                    age <= input$age[2])
    
    })
    
    entries <- reactive({
    
    demo_filtered() %>% count()
    
    })
    
    output$count <- renderText({
    
    paste0("Number of customers: ", entries())
    
    })
    
    output$chart <- renderPlot(
    
    width = 500,
    
    height = 400,
    
    {
    
    ggplot(demo_filtered(), aes(income))+
                             geom_histogram(fill = "lightblue",      color="white")+
                             xlab("Income")+
                             ylab("# of customers")+
                             theme(panel.background = element_rect(fill = "white",
                             colour = "black"))+
                             labs(title = "Income Distribution",
                             subtitle = paste0("Gender: ", input$gender))+
                             theme(plot.title = element_text(size = 17, hjust = 0.5, face = "bold"),
                             plot.subtitle = element_text(size = 14, hjust = 0.5))
    
    
    })
    
    }
    
    shinyApp(ui = ui, server = server)
    
    

    So, the chart data source is demo_filtered (called with round brackets, because it’s a reactive object) and the chart is drawn with geom_histogram. The subtitle is created dynamically: it changes when the gender group changes (to accomplish that, we inserted the input$gender variable in the subtitle argument, as you can see).

    Please notice one more thing: the width and height parameters (chart dimensions) are set in the beginning, before opening the curly brackets.

    The rest of the code is usual ggplot code, so I’m not going to comment it.

    Finally, we save our plot in the outputs list, under the name “chart” – the same name as the corresponding placeholder. At this moment, our chart is displayed in the user interface.

    Well, it’s over now. Our app is done and functional.

    Congratulations, great work! You have just created your first Shiny application starting from zero.

    Isn’t There More to Shiny Than This?

    Yes. A lot more.

    Shiny can build sophisticated applications that use advanced data analysis and machine learning models. It can create dynamic input controls and complex, good-looking user interfaces that use HTML and CSS. It can draw interactive charts, work with files (for example, upload a data set from disk and use it further) and much more.

    If you want to learn how to build similar apps (and many more), I highly recommend the free chapter of my new Shiny video course. Actually, it’s a two-hour mini-course that introduces the basics of Shiny. If you like what you see there, you can get the whole course.

    Click here to start learning Shiny for free


    Using ChatGPT for Exploratory Data Analysis with Python, R and prompting

    Join our workshop on Using ChatGPT for Exploratory Data Analysis with Python, R and prompting, which is a part of our workshops for Ukraine series! 


    Here’s some more info: 


Title: Using ChatGPT for Exploratory Data Analysis with Python, R and prompting


    Date: Thursday, November 30th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)


Speaker: Gábor Békés is an Associate Professor at the Department of Economics and Business of Central European University, a research fellow at KRTK in Hungary, and a research affiliate at CEPR. His research focuses on international economics, economic geography, and applied IO, and has been published in, among others, the Global Strategy Journal, the Journal of International Economics, Regional Science and Urban Economics, and Economic Policy; he has also authored commentary on VOXEU.org. His comprehensive textbook, Data Analysis for Business, Economics, and Policy, written with Gábor Kézdi, was published by Cambridge University Press in 2021.


Description: How can GenAI tools like ChatGPT augment and speed up data exploration? Is it true that we no longer need coding skills? Or does ChatGPT hallucinate too much to be taken seriously? I will run a workshop with live prompting and coding to investigate. I will experiment with two datasets shared ahead of the workshop. The first comes from my Data Analysis textbook and is about football managers; here I'll see how close working with AI gets to what we have in the textbook, and compare the code written by us with the code written by the machine. Second, I'll work with a dataset I have little or no experience with and see how far it takes me. In this case, we will look at descriptive statistics, make graphs and tables, and work to improve a textual variable. ChatGPT will generate code and reports, and I'll then check them on my laptop to see whether they work. The process starts with Python, but then I'll proceed with R.


    Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




    How can I register?



    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

    • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


    How can I sponsor a student?


    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

    • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


    If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.


    Looking forward to seeing you during the workshop!


    Using Spatial Data with R Shiny workshop

    Join our workshop on Using Spatial Data with R Shiny, which is a part of our workshops for Ukraine series! 

    Here’s some more info: 

    Title: Using Spatial Data with R Shiny

    Date: Thursday, November 23rd, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Michael C. Rubin is an Engineer, MIT Data Scientist and Co-Founder of Open Digital Agriculture (formerly ODAPES), a start-up with the mission of democratizing Digital Agriculture. Open Digital Agriculture leverages R-Shiny, along with GIS technology and Artificial Intelligence, to include the overlooked 540 million smallholder farmers in the digital transformation. Michael has spoken twice at the global R-Shiny conference.

Description: This workshop is about how to use R-Shiny in the context of geographic information systems (GIS). We will initially cover the R Leaflet package and learn how geographic information, from points to raster files, can be displayed in an R-Shiny app. During the workshop, we will develop a nice R-Shiny app, which allows us not only to display but also to manipulate GIS-related data. Along the way, we will touch on some interesting geostatistical concepts. Knowledge of R is required to follow the course; previous exposure to R-Shiny and some GIS techniques would be helpful, but you can follow the course even without it.

    Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


    How can I register?

    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
    • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.

    How can I sponsor a student?


    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
    • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


    If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.


    Looking forward to seeing you during the workshop!


    Visualising connection in R workshop

    Join our workshop on Visualising connection in R, which is a part of our workshops for Ukraine series! 

    Here’s some more info: 

    Title: Visualising connection in R
    Date: Thursday, November 9th, 19:00 – 21:00 CEST (Rome, Berlin, Paris timezone)
    Speaker: Rita Giordano is a freelance data visualisation consultant and scientific illustrator based in the UK. By training, she is a physicist who holds a PhD in statistics applied to structural biology. She has extensive experience in research and data science. Furthermore, she has over fourteen years of professional experience working with R. She is also a LinkedIn instructor. You can find her course “Build Advanced Charts with R” on LinkedIn Learning.
Description: How do we show connection? It depends on the kind of connection we want to visualise. We could use a network, chord, or Sankey diagram. The workshop will focus on how to visualise connections using chord diagrams. We will explore how to create a chord diagram with the {circlize} package. In the final part of the workshop, I will briefly mention how to create a Sankey diagram with networkD3. Attendees need to have the {circlize} and {networkD3} packages installed.
    Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


    How can I register?


    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
    • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


    How can I sponsor a student?

    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
    • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

    If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.


    Looking forward to seeing you during the workshop!