Network Analysis with R workshop

Join our workshop on Network Analysis with R, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title: Network Analysis with R

Date: Thursday, October 26th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: David Schoch is the team lead for “Transparent Social Analytics” in the Department of Computational Social Science at GESIS in Cologne. Before joining GESIS, David was a Presidential Fellow in the Department of Sociology at the University of Manchester, affiliated with the “Mitchell Centre for Social Network Analysis”. He has a PhD in Computer Science from the University of Konstanz and is the creator and maintainer of several network analysis R packages.

Description: Network analysis is a multidisciplinary field that delves into the intricate web of connections and interactions among entities, whether they are individuals, organizations, or nodes in a complex system. By employing graph theory, statistical methods, and computational tools, network analysis unveils the hidden patterns, structures, and dynamics that underlie these relationships.

In this workshop, I will introduce the package ecosystem for network analysis in R. I will provide an overview of the key packages for conducting network studies and discuss some practical aspects. No network theory will be introduced, but I will provide pointers for those interested in learning more beyond the practical skills.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!


Introduction to Topic Modelling in R and Python workshop

Join our workshop on Introduction to Topic Modelling in R and Python, which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Introduction to Topic Modelling in R and Python

Date: Thursday, October 19th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Christian Czymara is a postdoctoral fellow at Tel Aviv University. His research focuses on attitudes, immigration, and political communication, using quantitative and computational methods as well as natural language processing.

Description: This workshop offers an in-depth exploration of topic models, which allow extracting meaningful insights from extensive text corpora while minimizing the reliance on prior assumptions or annotated data. The workshop will start with the basics of text data preprocessing and progress to a general understanding of the underlying principles of topic modeling. It will cover a range of topic modeling techniques, such as Structural Topic Models, BiTerm, and Keyword Assisted Topic Models in R, and BERTopic in Python. We will explore the cases where each model is particularly promising. Participants will learn about the practical considerations when choosing a topic modeling algorithm, and how to apply these techniques to their own data. The lecture will be of interest to researchers and practitioners who are interested in extracting insights from large volumes of textual data, such as social media, news articles, or scientific publications.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!



Introduction to Qualitative Comparative Analysis (QCA)

Join our workshop on Introduction to Qualitative Comparative Analysis (QCA) using R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Introduction to Qualitative Comparative Analysis (QCA) using R

Date: Thursday, October 5th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Ingo Rohlfing is Professor of Methods of Empirical Social Research at the University of Passau. His research focuses on social science methods, in particular qualitative methods (case studies and process tracing), Qualitative Comparative Analysis, multimethod research, and research integrity.

Description: What are the conditions that produce stable coalition governments? What conditions are necessary for married couples not getting divorced? If you are interested in research questions like these or similar ones, QCA should be one of the first methods to consider for answering them. QCA is the go-to method for analyzing set relationships using any number of cases (small, medium, large) and with any kind of data (micro, meso, macro).

Participants are introduced to the fundamentals of set relations and QCA, and to the workflow of a QCA study using R. You will be introduced to the basic principles and requirements of coherent QCA designs and learn how to implement them using R. We cover all fundamental steps of a QCA study, including calibration, a necessity analysis (potentially), truth table formation, truth table minimization, and the interpretation of results.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!

Access DataCamp’s entire platform for free until September 3, 11:59 PM EST!


This one-time deal covers all 440+ data and AI courses. No catch, no credit card required—just unlimited learning for anyone with internet access. 

Start Now

What is Free Access Week?

Free Access Week is exactly that: a free week to explore all of DataCamp’s features at zero cost.

Until September 3, 11:59 PM EST, anyone can access DataCamp Premium features with a free DataCamp account. All courses, products, and features are available in the browser, so you don’t need any additional software to use DataCamp.

Don’t have an account? Sign up here and start using DataCamp immediately.

There has never been a better time to start learning data and AI

The proliferation of Generative AI has sharpened the need for data and AI skills in the modern world.

Whether you’re looking to learn or develop your R programming skills, take your first steps with ChatGPT, or you’re a business leader looking to upskill your team to improve data-driven decision-making—Free Access Week is available to everyone.

What can you access during Free Week?

Everything. Here’s an overview of what you get as part of Free Access Week:

  • 440+ interactive data and AI courses, including introductions to R, ChatGPT, Python, SQL, and more, alongside advanced courses in MLOps, deep learning, image processing, and much more.

  • Challenge yourself with skill and career tracks, including the highly popular, full-circle R Programming skills track: start from zero and work your way through tailored learning paths to master some of the world’s most in-demand technologies.

  • Access 100+ real-life projects, 20+ assessments to benchmark your skills, and more—all designed to help you learn new skills quickly.

  • Prove yourself with Forbes’ #1 ranked data certification program. Available at Associate and Professional levels, verify your skills as a Data Scientist or Data Analyst (Associate or Professional), or Data Engineer (Associate).

Unlimited world-class education

Gain unlimited access to DataCamp’s entire learning library.

Brand-new AI curriculum

Go from zero to leveraging AI in your workflows in just a few hours of learning.

At DataCamp, our mission is to democratize data and AI skills for everyone. In collaboration with experts from Google, Microsoft, and Duolingo, we’ve developed an AI curriculum to empower more learners to harness the power of AI in their daily lives.

Whether you invest in the 10-hour AI Fundamentals track or jump into individual courses such as Introduction to ChatGPT, Generative AI Concepts, or Understanding Machine Learning, DataCamp’s interactive learning and bite-sized exercises make mastering new skills a breeze.

Dive into data science learning for all levels

DataCamp is for everyone: whether you have no prior data skills or you’re proficient in multiple programming languages.

If you’re looking for a starting point, check out our interactive course list below to give you a glimpse into our comprehensive course library:

Beginner courses:

Intermediate courses:

Advanced courses:

Projects to test your skills in real-world scenarios:

Beginner:

Intermediate:

Advanced:

Take advantage of this one-time deal to test your skills on a world-class learning platform before the window shuts!

Build a serverless Shiny application via GitHub Pages

A simple guide to a simple Shiny application

TL;DR

I made a Shiny application on GitHub Pages with Quarto.

You can check the code in my GitHub repository and see the results here: result, result2


How we use shiny

Shiny is an R package that lets users work with R through a web browser without installing it.

That is why my company uses Shiny to provide statistical analysis to doctors (who don’t know R but need statistics).


Behind shiny

As you know, a Shiny app consists of two parts: the UI and the server.

You can think of the UI as the channel that both receives input (data) from the user and returns the calculated output (results) to the user, while the server is just the calculator.

This means the server performs dynamic calculations whose results may change, as opposed to serving fixed content (a so-called static web page).

To achieve dynamic calculation, there are several options.

We can use shinyapps.io or Posit Connect, or deploy our own Shiny Server in a cloud like AWS, Azure, or GCP…

These options fall into two main categories: free but with limited features, or feature-rich but paid.

There is no single right answer, but I use shinyapps.io for toy-level projects and deploy via Shiny Server on my company’s cloud server for anything beyond toy level.
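
For readers who want to reproduce this, the usual route to a serverless Shiny app on GitHub Pages (and, as far as I can tell, the one used here) is the shinylive Quarto extension, which compiles the app to WebAssembly so GitHub Pages can serve it as a static site. Below is a minimal sketch of a .qmd file, assuming the extension was installed with quarto add quarto-ext/shinylive; the app itself is a placeholder:

---
title: "Serverless shiny demo"
format: html
filters:
  - shinylive
---

```{shinylive-r}
#| standalone: true

library(shiny)

# minimal app: draw a histogram of n random values
ui <- fluidPage(
  numericInput("n", "Sample size", value = 100),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

shinyApp(ui, server)
```

After quarto render, publishing the output (e.g., with quarto publish gh-pages) yields a Shiny app that runs entirely in the visitor’s browser via webR, with no server-side R process.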

Deep Learning with torch in R workshop

Join our workshop on Deep Learning with torch in R, which is a part of our workshops for Ukraine series! 


Here’s some more info: 


Title: Deep Learning with torch in R

Date: Thursday, September 28th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Daniel Falbel, Daniel is a software engineer at Posit and maintains the ‘torch’ R package and its ecosystem. He previously maintained the TensorFlow and Keras packages.

Description: Deep Learning has grown exponentially in recent years and has powered breakthroughs in fields such as computer vision and natural language processing. In this workshop you will learn the basics of torch and its ecosystem, and build and train deep learning models with torch.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!




An Introduction to Bayesian A/B Testing in Stan, R, and Python workshop

Join our workshop on An Introduction to Bayesian A/B Testing in Stan, R, and Python, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title: An Introduction to Bayesian A/B Testing in Stan, R, and Python

Date: Thursday, September 14th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Jordan Nafa is a Data Scientist at Game Data Pros, Inc. where his work centers around Bayesian A/B Testing, stochastic optimization, and applied causal inference for revenue optimization, promotional pricing, and personalized targeting in video games. He is also a Ph.D. Candidate in Political Science at the University of North Texas where he previously taught undergraduate courses in causal inference, applied statistics, and American political behavior.

Description: This workshop will cover a basic introduction to Bayesian inference, A/B Testing, and decision theory for the analysis of large-scale field experiments in industry settings. After introducing the foundations of the Bayesian approach to A/B Testing, we will work through real-world examples using the probabilistic programming language Stan along with its R and Python interfaces.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!



Benchmarking API Performance: R-Native and Plumber in Data Extraction and Sending

R, a language best known for its prowess in statistical analysis and data science, might not be the first choice that comes to mind when thinking about building APIs. However, rapid prototyping, scalability, seamless integration with data analysis, and ease of debugging are reasons for me to encapsulate API functionality within R packages. In doing so, I like to distinguish between two approaches:



The R-native approach (blue): API interaction happens through functions of the relevant R packages, which are installed directly into the R session.

The Plumber approach (green): The Plumber package allows R code to be exposed as a RESTful web service through special decorator comments. As a result, API functionality is accessed by sending REST calls (GET, POST, …) rather than by calling functions of R packages directly.

#* This endpoint is a dummy for loading data
#* @get /load_from_db
#* @serializer unboxedJSON
#* @param row_limit:int
#* @response 200 OK with result
#* @response 201 OK without result
#* @response 401 Client Error: Invalid Request - Class
#* @response 402 Client Error: Invalid Request - Type
#* @response 403 Client Error: Missing or invalid parameter
#* @response 500 Server Error: Unspecified Server error
#* @response 501 Server Error: Database writing failed
#* @response 502 Server Error: Database reading failed
#* @tag demo
function(res, row_limit = 10000) {
  # load data from the database (dummy: an empty tibble)
  data <- dplyr::tibble()
  res$status <- 200
  res$body <- as.list(data)
  return(res$body)
}
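
To make the distinction concrete, here is a hedged sketch of what the caller’s side looks like under each approach; the package name mypackage, host, and port are hypothetical, and the endpoint matches the dummy above:

# R-native approach: call the package function directly in the session
data <- mypackage::load_from_db(row_limit = 10000)

# Plumber approach: reach the same functionality via a REST call
resp <- httr::GET("http://localhost:8000/load_from_db",
                  query = list(row_limit = 10000))
data <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))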

Which approach to use?

Certainly, both approaches have their strengths and limitations. It should come as no surprise that, in terms of execution time, CPU utilization, and format consistency, the R-native approach is likely to be the first choice, as code and data are processed within one context. Furthermore, the approach offers flexibility for complex data manipulations, but it can be challenging when it comes to maintenance, especially propagating new releases of packages to all relevant processes, and credential management. To the best of my knowledge, there is no automated way of re-installing new releases directly into all related R packages, even with the Posit Package Manager – so this easily becomes tedious.

In contrast, the Plumber API encourages a modular design that enhances code organization and facilitates integration with a wide array of platforms and systems. It streamlines package updates while ensuring a consistent interface: interacting with a Plumber API remains separate from the underlying code logic provided by the endpoint. This approach not only improves version management but also introduces a clear separation between client and server. In general, decoupling functionality through a RESTful API makes it easier to divide tasks among separate development teams, offering a higher degree of flexibility and external support. Additionally, I found distributing a Plumber API notably more straightforward than handing over a raw R package.

The primary goal of this blog post is to quantify the performance difference between the two approaches when it comes to getting data into and out of a database. Such a benchmark can be particularly valuable for ETL (Extract, Transform, Load) processes, shedding light on the threshold at which the advantages of the Plumber approach cease to justify its constraints. In doing so, we hope to provide guidance to developers who are faced with the decision of whether it makes sense to provide or access R functionality via Plumber APIs.

Experimental Setup
The experimental setup comprised a virtual machine equipped with 64GB RAM and an Intel(R) Xeon(R) Gold 6152 CPU clocked at 2.1GHz with 8 cores, running Ubuntu 22.04 LTS, hosting the Posit Workbench and Posit Connect servers (the latter hosting the Plumber API), and using R version 4.2.1. Both Posit services were granted identical access to the virtual machine’s computational resources.

Both approaches are evaluated in terms of execution time, simply measured with system.time(), and the maximum observed CPU load, the latter being an especially important indicator of how much data can be extracted and sent at once. For each fixed number of data rows, ranging from 10^4 to 10^7, 10 trials are conducted and the results plotted as a jittered beeswarm plot. For assessing the CPU load during the benchmark, I built a separate function that returns a new session object within which, every 10 seconds, the output of NCmisc::top(CPU = FALSE) is appended to a file.

get_cpu_load <- function(interval = 10, root, name, nrow) {
  # spawn a background R session so that sampling does not block the benchmark
  rs_cpu <- callr::r_session$new()
  rs_cpu$call(function(nrow, root, name, interval) {
    # count existing log files to derive a unique suffix for this run
    files <- list.files(root)
    n_files <- sum(stringr::str_detect(files, sprintf("%s_%s_", name, format(nrow, scientific = FALSE))))
    l <- c()
    # sample until the session is closed from the outside
    while (TRUE) {
      ret <- NCmisc::top(CPU = FALSE)
      l <- c(l, ret$RAM$used * 1000000)  # record used RAM, scaled
      save(l, file = sprintf("%s/%s_%s_%s.rda", root, name, format(nrow, scientific = FALSE), n_files + 1))
      Sys.sleep(interval)
    }
  }, args = list(nrow, root, name, interval))
  return(rs_cpu)
}
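
For completeness, the timing side of the benchmark can be sketched as follows; extract_data() is a hypothetical stand-in for the R-native function call or the corresponding REST request, and the sampler above runs alongside it:

for (nrow in 10^(4:7)) {
  for (trial in 1:10) {
    # start resource sampling in a background session
    rs_cpu <- get_cpu_load(interval = 10, root = "logs", name = "extract", nrow = nrow)
    # measure the elapsed time of one extraction, in minutes
    elapsed <- system.time(extract_data(row_limit = nrow))[["elapsed"]] / 60
    # stop the background sampling session
    rs_cpu$close()
  }
}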


Results
Execution time: In figure A) below, the data extraction process is observed to be approximately 10 times slower when using the Plumber API compared to the R-native approach, across all dataset sizes.


(y-axis in logarithmic scale)

Both approaches display a linear increase in execution time on a logarithmic time scale, indicating exponential growth in the original data domain. Specifically, the mean execution times for R-native and Plumber start at 0.00078 and 0.00456 minutes, respectively, and escalate to 0.286 and 2.61 minutes. It is reasonable to assume that this exponential trend persists for larger datasets, potentially resulting in execution times exceeding half an hour for very large tables (> 100 million rows) when using Plumber.

Conversely, subfigure B) shows the execution time for sending data and illustrates that both approaches deliver rather comparable performance, particularly for larger numbers of rows. While for 10,000 rows the R-native approach is still twice as fast (averaging 0.0023 minutes) as Plumber (0.00418 minutes), the advantage of staying in one context diminishes as the number of rows increases. At 10 million rows, the Plumber approach (averaging 1.7 minutes) is even faster than the R-native approach (1.88 minutes). Once again, the execution time exhibits an exponential growth trend with an increasing number of rows.

CPU load: Examining the maximum observable CPU load during data receiving and sending reveals notable differences between the Plumber API and the R-native approach.

(y-axis in logarithmic scale)

A) For data extraction up to 1 million rows, CPU utilization remains below 10% for both approaches. However, the utilization patterns diverge as row counts increase. Notably, the R-native approach maintains relatively consistent CPU usage (averaging 5.53%, 5.48%, 5.47%) up to 1 million rows, whereas the Plumber approach already shows a noticeable increase (5.97%, 6.05%, 8.6%). When extracting 10 million rows, CPU usage surpasses 30% for Plumber, while R-native extraction incurs approximately five times less computational overhead. B) In contrast to execution time, a clear difference in CPU utilization also becomes evident when sending data. The R-native approach consistently demands at most half the CPU of Plumber across all data sizes. For 10,000,000 rows, the Plumber approach even consumes over three times more CPU (13.1% vs. 43.2%), which in absolute terms amounts to almost 30GB of RAM.

Conclusion

The Plumber approach, while offering several advantages, encounters clear limitations when dealing with large datasets, be it tables with a substantial number of rows or extensive columns. Data extraction becomes roughly ten times slower, with CPU utilization up to five and three times higher when getting data out and in, respectively. Digging deeper reveals that this gap likely results from the necessity of converting data into JSON format in a web-based architecture. Plumber can’t handle R data frames directly, which is why serializers have to be used before sending data to and retrieving data from an endpoint. Even with plenty of RAM capacity, this conversion process can lead to execution errors in practice, as JSON representations may surpass the allowed byte size of the R character datatype.

> jsonlite::toJSON(dataframe)
Error in collapse_object(objnames, tmp, indent):
  R character strings are limited to 2^31-1 bytes

The only viable workaround in such scenarios involves breaking down tables into smaller chunks based on certain identifiers.
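
For illustration, here is a hedged sketch of such chunking; the chunk size and endpoint URL are assumptions:

# split the data frame into chunks of at most ~1 million rows each
chunks <- split(df, ceiling(seq_len(nrow(df)) / 1e6))
for (chunk in chunks) {
  # each chunk's JSON representation now stays below the 2^31-1 byte limit
  httr::POST("http://localhost:8000/save_to_db",
             body = jsonlite::toJSON(chunk, dataframe = "rows"),
             httr::content_type_json())
}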

Providing a precise table-size limit below which the Plumber approach remains suitable proves challenging, as it hinges on a multitude of factors, including the number of rows, columns, and the cell contents of the dataset. Personally, I will stick to using the Plumber API for scenarios with limited data traffic, such as querying a terminology or a statistical summary, as I generally prioritize code encapsulation and ease of maintenance over maximizing performance.

Micha Christ
Bosch Health Campus Centrum für Medizinische Datenintegration

Calculating the prediction interval coverage probability (PICP)

In digital soil mapping (DSM), we make predictions of the spatial distribution of a soil property, and these predictions come with uncertainties/errors. To quantify the accuracy, we split the data into a training and a test set, train a machine learning model (e.g., random forest, additive models, splines, etc.) on the training set, and evaluate the model on the test set. We then predict over covariates to make a continuous map, and we also predict the uncertainties (e.g., standard deviation, standard error, prediction limit range, etc.). But how do we evaluate the estimates of uncertainty we just predicted? One option is to calculate the prediction interval coverage probability (PICP), which measures how well the measured values fall within the formulated prediction intervals (PIs). PIs are similar to confidence intervals but use predicted data to determine the intervals.
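
Formally, for n test observations y_i with lower and upper prediction limits l_i and u_i at a given confidence level, the PICP is simply the percentage of observations covered by the interval (exactly what the function below computes):

PICP = \frac{100}{n} \sum_{i=1}^{n} \mathbf{1}\left[ l_i \le y_i \le u_i \right]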

This function will be included in the rafikisol R package; however, I have not added it yet, as there are some more functions I want to add first. Nevertheless, we will make it into a function called calcPICP(), which is basically taken from the book “Using R for Digital Soil Mapping” by Malone et al. (2017).

The function takes three parameters: data = a data frame of the data, response = the vector of the measured data (e.g., data$response), and pred = the vector of predicted values (e.g., data$predicted).

library(ggplot2)

calcPICP <- function(data, response, pred) {

  # We first get the residuals of the model
  res <- response - pred

  # Then we get the standard deviation of the residuals and combine it with the data
  data$stdev <- sd(res)

  # We then make a series of quantiles from a normal cumulative distribution
  qp <- qnorm(c(0.995, 0.9875, 0.975, 0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.525))

  # Then make a matrix with the row length of the data and columns of qp
  vMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))

  # Now we loop over the quantiles and multiply each by the standard deviation
  # to get a series of standard errors for the different prediction intervals
  for (i in 1:length(qp)) {
    vMat[, i] <- data$stdev * qp[i]
  }

  # Make another matrix of the same shape for the upper limits
  uMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))

  # We calculate the upper limits by adding the series of standard errors
  # to the predictions of the model
  for (i in 1:length(qp)) {
    uMat[, i] <- pred + vMat[, i]
  }

  # We make another matrix for the lower limits
  lMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))

  # We calculate the lower limits by subtracting the series from the predicted values
  for (i in 1:length(qp)) {
    lMat[, i] <- pred - vMat[, i]
  }

  # Now we check which prediction intervals cover the measured data,
  # creating a matrix of 1s and 0s
  bMat <- matrix(NA, nrow = nrow(data), ncol = length(qp))
  for (i in 1:ncol(bMat)) {
    bMat[, i] <- as.numeric(response <= uMat[, i] & response >= lMat[, i])
  }

  # To calculate the PICP we take colSums/nrow * 100 for the matrix of 0s and 1s
  picp <- colSums(bMat) / nrow(bMat) * 100

  # Make a vector of the corresponding confidence levels
  cl <- c(99, 97.5, 95, 90, 80, 60, 40, 20, 10, 5)

  # We put everything into a data frame for plotting
  results <- data.frame(picp = picp, cl = cl)

  # Since we want PICP to CL to be a 1:1 line, we also calculate Lin's concordance
  # correlation coefficient (CCC) with the yardstick R package
  ccc <- as.data.frame(yardstick::ccc_vec(results$picp, results$cl))
  names(ccc) <- "CCC"

  # Add axis positions for placing the CCC label in the plot
  ccc$x <- 10
  ccc$y <- 90

  # Now we can plot the PICP against the confidence level,
  # and add the 1:1 line and the CCC
  p <- ggplot(data = results, aes(x = cl, y = picp)) +
    geom_point() +
    geom_text(data = ccc, aes(x = x, y = y, label = paste("CCC =", round(CCC, 2)))) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
    labs(x = "Confidence level", y = "PICP", title = "PICP to confidence level") +
    theme_bw()

  # Return a list with the plot and a data frame of the full results
  return(setNames(list(p, results), c("Plot", "Results")))
}

Now we have a function that gives us a plot of PICP against confidence level as well as the results. This is useful when running many models, as we can now just plug in the data.

picp <- calcPICP(dat, dat$clay, dat$pred)

Now, plot the data:
picp[[1]]

Introduction to Qualitative Comparative Analysis (QCA) using R workshop

Join our workshop on Introduction to Qualitative Comparative Analysis (QCA) using R, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title: Introduction to Qualitative Comparative Analysis (QCA) using R

Date: Thursday, August 31st, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Ingo Rohlfing is Professor of Methods of Empirical Social Research at the University of Passau. His research focuses on social science methods, in particular qualitative methods (case studies and process tracing), Qualitative Comparative Analysis, multimethod research, and research integrity.

Description: What are the conditions that produce stable coalition governments? What conditions are necessary for married couples not getting divorced? If you are interested in research questions like these or similar ones, QCA should be one of the first methods to consider for answering them. QCA is the go-to method for analyzing set relationships using any number of cases (small, medium, large) and with any kind of data (micro, meso, macro).

Participants are introduced to the fundamentals of set relations and QCA, and to the workflow of a QCA study using R. You will be introduced to the basic principles and requirements of coherent QCA designs and learn how to implement them using R. We cover all fundamental steps of a QCA study, including calibration, a necessity analysis (potentially), truth table formation, truth table minimization, and the interpretation of results.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!