Structural Equation Modeling in R with the Lavaan package workshop

Learn how to use Structural Equation Modeling in R! Join our workshop on Structural Equation Modeling in R with the Lavaan package, which is part of our Workshops for Ukraine series. 


Here’s some more info: 


Title: Structural Equation Modeling in R with the Lavaan package


Date: Thursday, March 30th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone) 


Speaker: Nino Gugushvili is a postdoctoral researcher at the Department of Work and Social Psychology at Maastricht University.


Description: In this workshop, we will go over the basics of structural equation modelling (SEM). We will talk about what SEM is and cover its essential steps. Next, we will learn path analysis (SEM with observed variables), confirmatory factor analysis, and full SEM (SEM with both latent and observed variables). Along the way, we will also talk about revising our models and interpreting the results, and we’ll do all this in R, using the lavaan package.
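For readers new to lavaan, the confirmatory factor analysis step might look roughly like the minimal sketch below. It uses the `HolzingerSwineford1939` dataset and the three-factor model from lavaan's own tutorial rather than the workshop's materials, and it assumes lavaan is installed from CRAN.

```r
# Minimal CFA sketch with lavaan (assumes lavaan is installed from CRAN)
library(lavaan)

# Three latent factors, each measured by three observed test scores;
# "=~" reads as "is measured by"
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit <- cfa(hs_model, data = HolzingerSwineford1939)

# Fit indices such as CFI, TLI and RMSEA help judge whether the model needs revising
fit_stats <- fitMeasures(fit, c("cfi", "tli", "rmsea"))
print(fit_stats)

# Standardized factor loadings, useful when interpreting the results
std <- standardizedSolution(fit)
print(std[std$op == "=~", c("lhs", "rhs", "est.std")])
```

The model syntax is a plain string, so revising a model usually means editing that string and refitting; for path analysis, regressions are written with `~` and fitted with `sem()` instead of `cfa()`.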


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




How can I register?



  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the sponsorship form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating). You can indicate whether you want to sponsor a particular student or would like us to allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.


Looking forward to seeing you during the workshop!





Generalized Additive Models in R workshop

Learn how to fit Generalized Additive Models in R! Join our workshop on Generalized Additive Models in R, which is part of our Workshops for Ukraine series. 


Here’s some more info: 


Title: Generalized Additive Models in R


Date: Thursday, April 13th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)


Speaker: Gavin Simpson is a statistical ecologist and freshwater ecologist/palaeoecologist. He has a B.Sc. in Environmental Geography and a Ph.D. in Geography from University College London (UCL), UK. After submitting his Ph.D. thesis in 2001, Gavin worked as an environmental consultant and research scientist in the Department of Geography, UCL, before moving in 2013 to a research position at the Institute of Environmental Change and Society, University of Regina, Canada. Gavin moved back to Europe in 2021 and is now Assistant Professor of Applied Statistics in the Department of Animal and Veterinary Sciences at Aarhus University, Denmark. Gavin’s research broadly concerns how populations and ecosystems change over time and respond to disturbance, at time scales from minutes and hours to centuries and millennia. Gavin has developed several R packages, including gratia, analogue, and cocorresp; he also helps maintain the vegan package and can often be found answering R- and GAM-related questions on StackOverflow and CrossValidated.



Description: Generalized Additive Models (GAMs) were introduced as an extension of linear and generalized linear models in which the relationships between the response and the covariates are not specified up front by the analyst but are learned from the data themselves. This learning is achieved by representing the effect of a covariate on the response as a smooth function, rather than one of a fixed form (linear, quadratic, etc.). GAMs are a large class of models that are widely used in applied research because of their flexibility and interpretability.

The workshop will explain what a GAM is and how penalized splines and automatic smoothness selection methods work, before focusing on the practical aspects of fitting GAMs to data using the mgcv R package, and will be most useful to people who already have some familiarity with linear and generalized linear models.
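To give a flavour of the practical side, here is a minimal sketch of fitting a GAM with mgcv to data simulated by mgcv's own `gamSim()` helper. The formula and settings are illustrative choices, not taken from the workshop.

```r
# Minimal GAM sketch with mgcv (mgcv ships with standard R installations)
library(mgcv)

set.seed(1)
# gamSim(1) simulates data from Gu & Wahba's 4-term additive model
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# Each s() term is a penalized regression spline; method = "REML" selects
# the amount of smoothing (the penalty weight) for each term automatically
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

print(summary(fit))  # effective degrees of freedom (edf) for each smooth
print(fit$sp)        # estimated smoothing parameters, one per smooth here
```

A smooth with an edf close to 1 is essentially linear, so the fitted model itself tells you how much non-linearity the data support.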



Minimal registration fee: 20 euro (or 20 USD or 750 UAH)




How can I register?



  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the sponsorship form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating). You can indicate whether you want to sponsor a particular student or would like us to allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.


Looking forward to seeing you during the workshop!




Using R in a High Performance Computing environment

In a common R workflow, one works only on a desktop machine or a laptop. This PC environment is convenient because R users can focus mainly on coding. However, a program may take a long time to run (more than an hour, for instance) while many repetitions of the same simulation are needed, and in some cases it may use up all the available memory of the PC. In a PC environment, tools such as Task Manager (Windows), Activity Monitor (Mac), and top/htop (Linux) can help you monitor resource usage.

High Performance Computing (HPC) centers offer the possibility of increasing the resources (memory and CPU power) that your program can use. If you move your workflow to an HPC environment, you will need to learn how to work with it to take full advantage of the provided resources. In this post, I share some recommendations that we give to our users at the High Performance Computing Center North (HPC2N) and that could be applied at other centers as well.

One aspect that, in my experience, tends to create issues when moving to HPC is terminology. Common HPC terms such as cores, CPUs, nodes, shared memory, and distributed memory computing, among others, are covered in an R for HPC course that we previously delivered in collaboration with the Parallelldatorcentrum (PDC) in Stockholm.

In an HPC environment, one allocates resources (cores and memory) for running an R program. On a PC this step is usually hidden from the user: under the hood, the R program assumes that all resources of the machine are available and may try to use them. In HPC, this step must be done explicitly (through batch scripts or a web portal such as Open OnDemand), so you need to decide consciously how many cores and how much memory your R program can use efficiently. For instance, if you request 10 cores and 20 GB of RAM but your application is not parallelized (serial code) and uses less than 1 GB, 9 cores will sit idle during the simulation. Sometimes this type of setup is justified, though, if your application needs more memory than is provided per core. Also, keep in mind that most HPC centers work on a project basis, with some possible cost (monetary, or in terms of job priority, for instance).
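A related pitfall is that `parallel::detectCores()` reports all cores on the node, not the ones allocated to your job. The sketch below reads the allocation from the environment instead. It assumes a SLURM scheduler, which sets `SLURM_CPUS_PER_TASK` only when a job requests `--cpus-per-task`; other schedulers use different variables, and the helper name `allocated_cores()` is hypothetical.

```r
# Hypothetical helper: how many cores did the scheduler give this job?
# Assumes SLURM, which sets SLURM_CPUS_PER_TASK when --cpus-per-task is used.
allocated_cores <- function(default = 1L) {
  n <- suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "")))
  if (is.na(n) || n < 1L) default else n  # fall back when the variable is unset
}

n_cores <- allocated_cores()
cat("Cores detected on the node:", parallel::detectCores(), "\n")
cat("Cores allocated to this job:", n_cores, "\n")
```

The resulting `n_cores` can then be passed to `blas_set_num_threads()`, `makeCluster()`, or similar, so that the program uses what was requested and nothing more.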

Some R packages that rely on linear algebra libraries such as BLAS and LAPACK can automatically use several threads, depending on the BLAS implementation R is linked against (OpenBLAS or Intel MKL, for instance). One way to control the number of threads explicitly is with the RhpcBLASctl package, as follows:

library(RhpcBLASctl)
blas_set_num_threads(8) #set the number of threads to 8

In some packages, a parallelization layer has been introduced through a backend (such as the parallel package), for instance in heavy routines like bootstrapping (boot package). Other packages opted for a threaded mechanism; for clustering, for instance, there is the clusternor package. Examples of the usage of these packages can be found here.
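As an illustration of such a parallelization layer, the boot package's `boot()` function accepts `parallel` and `ncpus` arguments. A minimal sketch with a hypothetical statistic and simulated data; on a cluster, `ncpus` should match the cores you requested:

```r
# Minimal sketch: parallel bootstrap with the boot package (ships with R)
library(boot)

set.seed(42)
x <- rexp(200, rate = 1)  # hypothetical sample data

# The statistic must accept the data and a vector of resampled indices
mean_stat <- function(d, idx) mean(d[idx])

# parallel = "snow" starts a local socket cluster with ncpus workers
# (this also works on Windows); "multicore" uses forking instead
b <- boot(x, statistic = mean_stat, R = 2000, parallel = "snow", ncpus = 2)

print(boot.ci(b, type = "perc"))  # percentile confidence interval for the mean
```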

In the cases already mentioned, someone did the job of parallelizing the application for us, and we only need to set the number of threads or workers. If we are developers who want to port serial R code to a parallel program, we will most likely need to refactor the code and change our programming paradigm. Note that not all parts of a program are suitable for parallelization, and some parts, although parallelizable, may not show a significant speedup (the ratio of the run time with 1 core to the run time with N cores). Thus, an important first step in parallelizing code is profiling: timing parts of the code to locate the bottlenecks that are suitable for parallelization.
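A minimal way to start profiling in base R is to wrap each stage of a program in `system.time()` and compare the elapsed times. The two stages below are hypothetical stand-ins for the parts of a real program; for a function-level breakdown, `Rprof()` and `summaryRprof()` from the utils package serve the same purpose.

```r
# Minimal sketch: time two stages of a hypothetical workload to see
# which one dominates before deciding what to parallelize
stage_setup <- function(n) matrix(runif(n * n), nrow = n)       # build the input
stage_solve <- function(m) solve(crossprod(m) + diag(nrow(m)))  # heavy linear algebra

set.seed(1)
t_setup <- system.time(m <- stage_setup(400))[["elapsed"]]
t_solve <- system.time(s <- stage_solve(m))[["elapsed"]]

timings <- c(setup = t_setup, solve = t_solve)
print(timings)  # elapsed seconds per stage
```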

The following serial (unoptimized) code computes the 2D integral of the sine function sin(x+y), with both x and y ranging from 0 to π, using the midpoint rule on an N × N grid:

$$\int_0^{\pi}\int_0^{\pi} \sin(x+y)\,dx\,dy = 0$$

integral <- function(N){
  # Function for computing a 2D sine integral with the midpoint rule
  h <- pi/N # Size of grid
  mySum <- 0 # Camel convention for variables' names

  for (i in 1:N) { # Discretization in the x direction
    x <- h*(i-0.5) # x coordinate of the grid cell
    for (j in 1:N) { # Discretization in the y direction
      y <- h*(j-0.5) # y coordinate of the grid cell
      mySum <- mySum + sin(x+y) # Accumulate the integrand at the cell center
    }
  }

  return(mySum*h*h)
}
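Before parallelizing, it can be worth checking whether the loops can be vectorized. As a swapped-in alternative (not part of the original example), the sketch below computes the same midpoint-rule sum with `outer()`, trading the double loop for an N × N matrix held in memory, which grows quickly with N:

```r
# Loop-free midpoint rule for the same 2D integral: outer() builds the
# N x N matrix of x + y values at all cell centers in one call
integral_vectorized <- function(N) {
  h <- pi / N                        # Size of grid
  centers <- h * (seq_len(N) - 0.5)  # Cell-center coordinates (same for x and y)
  sum(sin(outer(centers, centers, "+"))) * h * h
}

integral_vectorized(1000)  # close to the exact value 0
```

With the same N, this should agree with the loop version up to floating-point rounding.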

One way to parallelize this code is to divide the workload (the for loop in the x direction) evenly among a number of workers. Here, I use the foreach function (from the foreach package, which is loaded together with doParallel), which allows running tasks in parallel. Once I have decided which part of the code to parallelize (the x integration) and which tool to use (foreach), I can refactor the original code. One possible parallel version is:

integral_parallel <- function(N, i){
  # Parallel function for computing a 2D sine integral:
  # returns the partial sum for the i-th cell in the x direction
  h <- pi/N # Size of grid (computed locally, so workers need no global h)
  myPartialSum <- 0.0
  x <- h*(i-0.5) # x coordinate of the grid cell
  for (j in 1:N) { # Discretization in the y direction
    y <- h*(j-0.5) # y coordinate of the grid cell
    myPartialSum <- myPartialSum + sin(x+y) # Accumulate the integrand
  }
  return(myPartialSum)
}
 
Notice that I changed the original programming paradigm: the function now computes only a partial sum, for a single cell in the x direction. The total value is known only after all the workers finish their tasks and the partial sums are combined at the end. The doParallel package requires the initialization of a cluster, and foreach requires the %dopar% operator to run tasks in parallel:

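library(doParallel) # Also attaches foreach, which provides foreach() and %dopar%

N <- 1000 # Number of grid cells per direction (example value)
M <- 4    # Number of workers (example value; match it to the cores you requested)
h <- pi/N # Size of grid, needed to scale the final sum

cl <- makeCluster(M) # Create the cluster with M workers
registerDoParallel(cl)
r <- foreach(i=1:N, .combine = 'c') %dopar% integral_parallel(N,i) # One partial sum per x cell
stopCluster(cl)
integralValue <- sum(r)*h*h # Combine the partial sums into the final result
integralValue # Print out the final result (close to 0)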

The complete example can be found here.
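The foreach version above creates one task per x cell, that is, N small tasks. A variant closer to dividing the workload evenly among the workers gives each worker one contiguous block of x indices. Below is a minimal sketch with base R's parallel package (`splitIndices()`, `parSapply()`); the function names are hypothetical, and `outer()` replaces the inner loop over y.

```r
# Minimal sketch: one task per worker instead of one task per x cell
library(parallel)

# Partial sum over a block of x indices; h is recomputed locally so the
# function is self-contained when shipped to the workers
block_sum <- function(idx, N) {
  h <- pi / N
  x <- h * (idx - 0.5)         # x coordinates of this block's cells
  y <- h * (seq_len(N) - 0.5)  # all y coordinates
  sum(sin(outer(x, y, "+")))   # inner loop replaced by outer()
}

integral_chunked <- function(N, M) {
  h <- pi / N
  chunks <- splitIndices(N, M)  # M contiguous, roughly equal blocks of 1:N
  cl <- makeCluster(M)
  on.exit(stopCluster(cl))      # release the workers even if an error occurs
  partial <- parSapply(cl, chunks, block_sum, N = N)
  sum(partial) * h * h
}

integral_chunked(N = 1000, M = 2)
```

Fewer, larger tasks reduce the communication between the master process and the workers, which may matter when each individual task is cheap.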

A common mistake among HPC users is to reuse batch scripts from other centers, assuming that SLURM or PBS job schedulers behave identically everywhere. Although that is true for standard features, system administrators at one center may enable options that are unavailable, or behave slightly differently, at other centers.

I also recommend using the HPC tools available at your center to monitor a simulation's resource usage. If you have access to the compute nodes, the most straightforward way to obtain this information is with the top/htop commands. Otherwise, tools such as Grafana or Ganglia are handy if your center provides them.
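From inside R, a rough complement to top/htop is the peak memory reported by `gc()`. The helper below (a hypothetical name) resets the counters, runs an expression, and sums the megabytes in the "max used" columns. Note that it only sees memory managed by R, not memory allocated directly by C or Fortran libraries such as BLAS.

```r
# Minimal sketch: peak memory used by R's own heap while evaluating an expression.
# gc(reset = TRUE) resets the "max used" counters; a later gc() reports the peak.
peak_r_memory_mb <- function(expr) {
  gc(reset = TRUE)
  force(expr)  # the promise is evaluated here, after the reset
  g <- gc()
  max_col <- which(colnames(g) == "max used") + 1L  # the "(Mb)" column after "max used"
  sum(g[, max_col])  # Ncells + Vcells, in Mb
}

peak_r_memory_mb(numeric(1e7))  # a vector of 1e7 doubles takes roughly 76 Mb
```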

Additional resources:
  • R in HPC course offered by HPC2N/PDC 

The State of Data Literacy 2023, by DataCamp


In 2023, 87% of leaders recognize data literacy as the most important skill after basic computer skills. However, only a third of organizations are offering data upskilling.

Bridging the data literacy skills gap is a challenge shared by most teams in modern businesses. Just as workforces adopted computers in the 1980s and the internet in the 2000s, organizations must now embrace data skills to stay competitive, drive innovation, and attract top talent.

To help close this gap, DataCamp invested months into compiling The State of Data Literacy 2023 report, an expert-led, free-to-download guide to navigating the current data skills revolution, including a foreword from CEO and co-founder Jonathan Cornelissen.

DataCamp independently surveyed over 550 business leaders across the UK and US to shed light on the most pressing data skills gaps facing modern organizations. In doing so, they uncovered key insights into the strategies data-first organizations are using to upskill their workforces. 

From companies taking their first steps into data literacy to data mature organizations, the report takes multiple leadership perspectives and dives into the business and individual benefits of data upskilling.

A key highlight revealed that leaders who engaged in data upskilling programs experienced more than 70% improvement in quality and speed of decision-making, innovation, customer experience, and employee retention across the board.

Three of the top five fastest-growing skills in the past five years were data skills: business intelligence (41%), data science (37%), and data literacy (30%). In addition, 77% of leaders agreed they would pay a salary premium to candidates with data literacy skills.

Download the report now to discover key insights that you can start applying in your organization today.


RADAR 2023 | Free Annual Summit of the World’s Data Leaders


Join a selection of the world’s data leaders at this two-day digital event, presented by DataCamp and designed to help data professionals build stronger careers in 2023.

From gaining a deeper understanding of which skills industry leaders are looking for to navigating the evolving data talent pool, uncover insights on data’s most pressing opportunities through a mix of keynotes, fireside chats, and panels.

Across these expert-led sessions, learn from the people at the forefront of data transformation with leaders from world-class organizations such as Tableau, Alteryx, Qlik, Salesforce, JetBrains, Google, CBRE, and more.

From R to Python, Jupyter, and beyond, this is an unmissable event for anyone looking to strengthen their wider data skillset and accelerate their career.

March 22-23 2023, 9 AM – 3 PM EST: Save your seat now.

Key sessions aimed at up-and-coming data scientists:

Breaking Into Data in 2023: How Building a Personal Brand Can Accelerate Data Careers
The secrets to a successful data career with the founder of DATAcated, Kate Strachnyi. Learn how to build a personal brand, create opportunities through networking, and build lasting connections within the data community.

How The Data Job Market Is Evolving in 2023
Stay informed on how the data job market is evolving in 2023. Join the CEO of Orbition Group to learn about breaking into a competitive market and the importance of soft skills and value creation in building a successful data career.

An In-depth Guide to the DataCamp Certifications
DataCamp’s VP of Certification, Vicky Kennedy, discusses how a DataCamp certification, ranked as the #1 data certification program by Forbes, can accelerate your data career. You’ll learn about the two levels of certification and how to prepare for the exams. You’ll also uncover insider secrets to acing the case study, a take-home exercise based on real-world data scenarios.

Tips For Building An Effective Data Science Portfolio
Portfolio projects are the silver bullet for a lack of work experience when it comes to finding data roles. Naledi Hollbruegge, Data Analytics Consultant, and James Le, Developer Advocate at Twelve Labs, outline how to present your portfolio projects effectively to highlight your technical and soft skills.

Ask a Hiring Manager: The Keys to Landing a Job in Data Science
Google’s Director of Ads Safety, Lukas Tencer, and DataCamp’s Director of Analytics, Jorge Vasquez, discuss what drives successful data applicants. Throughout, they’ll answer audience questions on the key characteristics of successful applicants, the questions hiring managers expect, and more.

View the full agenda and register here

Working with ChatGPT in R workshop

Learn how to use ChatGPT to improve your coding skills in R! Join our workshop on Working with ChatGPT in R, which is part of our Workshops for Ukraine series. 


Here’s some more info: 


Title: Working with ChatGPT in R


Date: Thursday, March 9th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone) 


Speaker: Dariia Mykhailyshyna is a PhD student in Economics at the University of Bologna, who previously worked at the Ukrainian think tank Centre of Economic Strategy.


Description: In this workshop, we will learn how you can fully harness the power of ChatGPT to improve your R coding. We will learn how to access ChatGPT directly from R and how to make it write R code (including fairly long and complicated commands), debug its (and your) code, translate code from one programming language to another, comment your code, make it more efficient, and more! We will also explore some of the drawbacks of ChatGPT and examine when and why you can’t always rely on it.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)




How can I register?



  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the sponsorship form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating). You can indicate whether you want to sponsor a particular student or would like us to allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.


Looking forward to seeing you during the workshop!






Survival Analysis with R and Python workshop

Learn more about Survival Analysis and how to apply it in both R and Python! Join our workshop on Survival Analysis with R and Python, which is part of our Workshops for Ukraine series. 
Here’s some more info: 
Title: Survival Analysis with R and Python
Date: Thursday, March 16th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone) 
Speaker: Christopher Peters is the Principal Data Scientist and ninth employee at Zapier, where the mission is to make automation work for everyone. For the last decade, he’s applied survival analysis in R and Python, along with statistics and econometrics, to effect positive change for people. He learned many of his skills through self-study with friends as well as during his education at Louisiana State University, where he completed his terminal degree, a Master of Applied Statistics. There he was privileged to be advised by reliability analysis giant Professor Luis A. Escobar. His committee also included Professor Brian Marx, co-developer of penalized B-splines (P-splines) and co-author of The Joys of P-Splines, as well as Emeritus Professor of Econometrics R. Carter Hill, co-author of Principles of Econometrics. Christopher was recently invited to review the book Statistical Methods for Reliability Data, 2nd Edition, co-authored by Distinguished Professor William Q. Meeker, Professor Luis A. Escobar, and Emeritus Associate Professor Francis G. Pascual. He also recently reviewed Telling Stories with Data by Assistant Professor Rohan Alexander. He loves being in nature; his interests lie in the interactions of technology and nature and span a wide variety of topics related to business, economics, and causal inference. You can find him on Twitter at @statwonk or at http://statwonk.com.
Description: How can we speed up growth? Bring about or prevent important events? Design technology and human processes for high reliability? Survival Analysis (time-to-event analysis) allows us to answer these questions wisely by allocating credibility accurately and precisely among their possible answers. Our interest in future events is insatiable for many serious reasons. Through the benefits of systemization, we can use time-to-event analysis to better understand the possibilities of future events and how they can be reconfigured for the benefit of people and ourselves. Whether it’s causing or preventing important events, or just better understanding them, time-to-event analysis (aka survival or reliability analysis) affords us these abilities. In this two-hour workshop, I’ll give a gentle introduction to industrial and commercial applications of time-to-event analysis in R and Python, side by side. The workshop will focus on how you can best get started with these technologies and begin to answer these questions yourself at a deeper level, for the purpose of innovation. As part of that, I’ll share what I’ve learned over a decade of applying this technology in the SaaS software industry.
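For a first taste of the R side, here is a minimal sketch using the survival package, which ships with R, and its bundled `lung` dataset. The Python side of the workshop is not shown, and the two models are illustrative choices rather than the workshop's material.

```r
# Minimal time-to-event sketch with the survival package (ships with R);
# in the bundled lung dataset, status is coded 1 = censored, 2 = dead
library(survival)

# Kaplan-Meier estimate of the survival curve, separately by sex
km <- survfit(Surv(time, status) ~ sex, data = lung)
print(km)  # number of events and median survival time per group

# Cox proportional hazards model: how age and sex relate to the hazard
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
print(summary(cox)$conf.int)  # hazard ratios with 95% confidence intervals
```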
Minimal registration fee: 20 euro (or 20 USD or 750 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.
How can I sponsor a student?
  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the sponsorship form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating). You can indicate whether you want to sponsor a particular student or would like us to allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.
Looking forward to seeing you during the workshop!



A Gentle and Applied Introduction to Rcpp workshop

Learn how to use the Rcpp package while contributing to charity! Improve your skills by joining our workshop A Gentle and Applied Introduction to Rcpp, which is part of our Workshops for Ukraine series. 
Here’s some more info: 
Title: A Gentle and Applied Introduction to Rcpp
Date: Thursday, February 9th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)  
Speaker: Dirk Eddelbuettel is involved with many R packages on CRAN; is a co-creator of the Rocker Project, which provides R Docker containers; is the Debian/Ubuntu maintainer for R, many CRAN packages, and some other quantitative software; is behind several initiatives to make binary packages more easily available, ranging from Quantian to the more recent r2u Project; and is an elected board member of the R Foundation, an adjunct Clinical Professor at the University of Illinois Urbana-Champaign, an editor at the Journal of Statistical Software, and a Principal Software Engineer at TileDB. He holds an MA and a PhD in Mathematical Economics from EHESS in France, and an MSc in Industrial Engineering from KIT in Germany.

Description: R has become the lingua franca of statistical research and applications. It provides an open and extensible system for which the Rcpp package has become the most widely used package for extending R via native code. This talk aims to gently introduce compiled code, which need not be feared thanks to the sophisticated tooling that R and Rcpp provide: the otherwise complicated and sometimes feared steps of compiling, linking, loading, and launching compiled code become a relative breeze, accessible directly from R, with built-in converters for exchanging all key data types to and from R. The talk will highlight key aspects of, and motivations for, using Rcpp, and will also warn of a few common pitfalls. The second half will be centered around a complete worked example of a package using RcppArmadillo that we will build from scratch. Pointers for further study as well as to additional examples will also be provided.
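As a taste of how little ceremony is involved, the minimal sketch below uses `Rcpp::cppFunction()` to compile, link, and load a small C++ function in one call. It assumes Rcpp and a working C++ toolchain are installed; `sumSquaresCpp` is a hypothetical example, not one from the talk (which builds a package with RcppArmadillo).

```r
# Minimal Rcpp sketch (assumes Rcpp and a working C++ compiler are installed)
library(Rcpp)

# cppFunction() compiles, links and loads the C++ code in one step;
# the R numeric vector is converted to NumericVector automatically
cppFunction('
double sumSquaresCpp(NumericVector x) {
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}')

v <- c(1, 2, 3)
sumSquaresCpp(v)  # same result as sum(v^2) in R
```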
Minimal registration fee: 20 euro (or 20 USD or 750 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.
How can I sponsor a student?
  • Save your donation receipt (after the donation is processed, the website gives you the option to enter the email address to which the receipt is sent)

  • Fill in the sponsorship form, attaching a screenshot of the donation receipt (please attach a screenshot of the receipt that was emailed to you rather than of the page you see after donating). You can indicate whether you want to sponsor a particular student or would like us to allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (for which you can get the recordings and materials) here.
Looking forward to seeing you during the workshop!



Working with image data in R workshop

Learn how to work with image data in R! Join our workshop on Working with image data in R, which is part of our Workshops for Ukraine series. 
Here’s some more info: 
Title: Working with image data in R
Date: Thursday, March 23rd, 15:00 – 17:00 CET (Rome, Berlin, Paris timezone)
Speaker: Wolfgang Huber is the author of several R packages for statistical analysis of “omics” data and a co-founder of the Bioconductor project. He co-authored the textbook Modern Statistics for Modern Biology with Susan Holmes. He has worked on cellular phenotyping from genetic and chemical screens and is a co-author of the EBImage package. He is a senior group leader at the European Molecular Biology Laboratory, where he co-directs the Molecular Medicine Partnership Unit and the Theory Transversal Theme. His scientific homepage is here.

Description: Images are a rich source of data. In this workshop, we will see how quantitative information can be extracted from images. We will use segmentation to identify objects, measure their properties (such as size, intensity distribution moments, and shape and morphology descriptors), and explore statistical models to describe the spatial relationships between them. The workshop includes a hands-on demonstration of the EBImage package for R, which provides many functions for feature extraction and visualization. Application examples will be taken from biological imaging of cells and tissues, but the methods should also be applicable to other types of data.
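A minimal sketch of the segment-then-measure workflow, assuming EBImage is installed from Bioconductor. To stay self-contained, it uses a hypothetical synthetic image with two bright squares instead of real microscopy data:

```r
# Minimal segmentation sketch with EBImage (a Bioconductor package; assumed installed)
library(EBImage)

# Hypothetical synthetic image: two bright square "cells" on a dark background
m <- matrix(0, nrow = 64, ncol = 64)
m[10:19, 10:19] <- 1  # 10 x 10 object
m[40:54, 30:44] <- 1  # 15 x 15 object
img <- Image(m)

mask   <- img > 0.5             # threshold to a binary mask
labels <- bwlabel(mask)         # label connected objects as 1, 2, ...
shape  <- computeFeatures.shape(labels)

max(labels)         # number of objects found
shape[, "s.area"]   # area (in pixels) of each object
```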

Minimal registration fee: 20 euro (or 20 USD or 750 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to students who have signed up for the waiting list.
How can I sponsor a student?
  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or whether we should allocate this spot to a student from the waiting list. You can also indicate whether you would like us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that signing up for the waiting list does not guarantee participation.)

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with their recordings and materials) here.
Looking forward to seeing you during the workshop!



How to generate data from a model – Part 2


Summary

Traditionally, data scientists have built models based on data. This article details how to do the exact opposite, i.e. generate data based on a model. It is the second in a series of articles on generating data from a model. 
You can find Part 1 here.


Broadly speaking, generating data from a model takes three steps, as described below.

Step 1: Register for the API

    • Head over to the API developer portal: https://foyi.developer.azure-api.net/
    • Click the Sign up button on the home page.
    • Enter your details, such as email address and password.
    • You will receive an email asking you to verify your email address.
    • Once you have verified your email address, head over to the Products section of the developer portal. You can find it in the menu at the top right-hand corner of the web page.
    • Select the Starter product by clicking on it. This takes you to the product page, where you will find the section Your subscriptions. Enter a name for your subscription and click Subscribe.
    • After subscribing, go to your profile page and, under the subscriptions section, click show next to the Primary key. That is the subscription key you will need to access the API. Congratulations, you can now access the API!
    • If you have any issues with the signup, please email [email protected]
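Rather than pasting the Primary key into every script, you may prefer to keep it in an environment variable and read it at run time. The sketch below assumes a hypothetical variable name, UNCOVR_KEY (which you could set, for example, in your ~/.Renviron file), and a hypothetical helper, getUncovrKey(); neither is part of conjurer. Only Sys.getenv() and nzchar() from base R are used.

```r
# Hypothetical helper (not part of conjurer): reads the subscription key
# from an environment variable so it never appears in the script itself.
# Assumed setup, e.g. a line in ~/.Renviron:  UNCOVR_KEY=your-primary-key
getUncovrKey <- function(var = "UNCOVR_KEY") {
  key <- Sys.getenv(var, unset = "")
  # Fail early with a clear message instead of sending an empty key
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable to your subscription key")
  }
  key
}
```

With this in place, the key argument in the buildModelData() calls of Step 3 could be written as key = getUncovrKey().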

Step 2: Install R Package Conjurer

Install the latest version of the package from CRAN as follows.
install.packages("conjurer")

Step 3: Generate data from model

The function used to generate data from a model is buildModelData(numOfObs, numOfVars, key, modelObj).
Its arguments are as follows.

    • numOfObs is the number of observations i.e. rows of data that you would like to generate. Please note that the current version allows you to generate from a minimum of 100 observations to a maximum of 10,000. 
    • numOfVars is the number of independent variables i.e. columns in the data. Please note that the current version allows you to generate from a minimum of 1 variable to a maximum of 100.
    • key is the Primary key that you obtained in Step 1.
    • modelObj is the model object. In the current version, 1.7.1, this accepts either an lm or a glm model object built using the stats package. This is an optional parameter, i.e. if it is not specified, the function generates the data randomly. However, if a model object is provided, the intercept, the coefficients and the range of the independent variables are taken from it.
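Since the API only accepts arguments within the limits listed above, it can help to check them locally before spending a request. This is a minimal base-R sketch assuming the limits stated for version 1.7.1 (100 to 10,000 observations, 1 to 100 variables, an optional lm or glm object); checkModelDataArgs() is a hypothetical helper and not part of conjurer.

```r
# Hypothetical helper (not part of conjurer): validates buildModelData()
# arguments against the limits documented for the current version.
checkModelDataArgs <- function(numOfObs, numOfVars, modelObj = NULL) {
  if (!is.numeric(numOfObs) || length(numOfObs) != 1 ||
      numOfObs < 100 || numOfObs > 10000) {
    stop("numOfObs must be a single number between 100 and 10,000")
  }
  if (!is.numeric(numOfVars) || length(numOfVars) != 1 ||
      numOfVars < 1 || numOfVars > 100) {
    stop("numOfVars must be a single number between 1 and 100")
  }
  # modelObj is optional; when supplied it must be an lm or glm object
  if (!is.null(modelObj) && !inherits(modelObj, c("lm", "glm"))) {
    stop("modelObj must be an lm or glm model object")
  }
  invisible(TRUE)
}

# Usage with the model from the example below
m <- lm(dist ~ speed, data = cars)
checkModelDataArgs(numOfObs = 100, numOfVars = 1, modelObj = m)
```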

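Generate completely random data using the code below.
library(conjurer)
# Replace the placeholder with the Primary key obtained in Step 1
uncovrJson <- buildModelData(numOfObs = 1000, numOfVars = 3, key = "input your subscription key here")
# Convert the JSON response from the API into a data frame
df <- extractDf(uncovrJson = uncovrJson)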


Generate data based on the model object provided. For this example, a simple linear regression model is used.
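library(conjurer)
library(datasets)

# Fit a simple linear regression of stopping distance on speed
data(cars)
m <- lm(formula = dist ~ speed, data = cars)
# Generate 100 rows with one independent variable based on the model m
uncovrJson <- buildModelData(numOfObs = 100, numOfVars = 1, key = "insert subscription key here", modelObj = m)
df <- extractDf(uncovrJson = uncovrJson)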

Interpretation of results

The data frame df (in the code above) will have two columns named iv1 and dv. Columns prefixed with iv are the independent variables, while dv is the dependent variable. You can rename them to suit your needs. In the example above, iv1 corresponds to speed and dv corresponds to dist. The details of the model formula and its estimated performance can be inspected as follows. 
    • To begin with, you can inspect the JSON data received from the API using str(uncovrJson). This displays all the components of the JSON response. The attributes prefixed with slope are the coefficients of the model formula, numbered to match the independent variables. For example, slope1 is the coefficient corresponding to iv1, i.e. independent variable 1. 
    • The regression formula used to construct the data for the example data frame is as follows.
      dv = intercept + (slope1*iv1) + error
      The formula takes the form Y = mX + C, plus an error term. If there are multiple variables, the component (slope1*iv1) is repeated for each independent variable.
    • While the slopes (i.e. the coefficients) are at the variable level, the error is at the observation level. These errors can be accessed as uncovrJson$error.
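The same decomposition can be seen on the original model without calling the API. This sketch uses only base R and the cars data: the intercept and slope come from coef(), one per model, while the error term comes from residuals(), one per observation. It is an analogy to the intercept, slope1 and error fields described above, not a reproduction of how the API builds its data.

```r
# Fit the same model as in the example (cars ships with base R)
m <- lm(dist ~ speed, data = cars)

intercept <- coef(m)[["(Intercept)"]]  # C in Y = mX + C
slope1    <- coef(m)[["speed"]]        # m in Y = mX + C, one per variable
error     <- residuals(m)              # one value per observation

# dv = intercept + (slope1 * iv1) + error, with iv1 = speed
dv <- intercept + slope1 * cars$speed + error

# Adding the per-observation errors back recovers the observed distances
all.equal(unname(dv), cars$dist)  # TRUE
```

Because the errors here are the model's own residuals, dv matches dist exactly; the synthetic data instead draws its errors independently, which is the source of the differences discussed under Limitations and Future Work.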
The synthetic data can be compared with the original data using the following code. 

summary(cars)
     speed           dist
 Min.   : 4.0   Min.   :  2.00
 1st Qu.:12.0   1st Qu.: 26.00
 Median :15.0   Median : 36.00
 Mean   :15.4   Mean   : 42.98
 3rd Qu.:19.0   3rd Qu.: 56.00
 Max.   :25.0   Max.   :120.00


summary(df)
      iv1               dv
 Min.   : 4.080   Min.   :-38.76
 1st Qu.: 8.915   1st Qu.: 29.35
 Median :16.844   Median : 47.03
 Mean   :15.405   Mean   : 46.13
 3rd Qu.:20.461   3rd Qu.: 75.83
 Max.   :24.958   Max.   :127.66


Limitations and Future Work

Some of the known limitations of this algorithm are as follows.
    • The comparison above shows that the range of the independent variable in the synthetic data (iv1) is close to the range of the original data (speed). However, the range of the dependent variable in the synthetic data (dv) is very different from that of the original data (dist): its minimum is -38.76 compared with 2.00. This is because the error terms of the synthetic data are completely random rather than sourced from the model object. 

    • While the range of the independent variable is similar across the original and synthetic datasets, a simple visual inspection of the distributions with histograms, hist(df$iv1) and hist(cars$speed), shows a drastic difference. This is because the distribution of the independent variable is random and not sourced from the model object.

    • Additionally, if the same lm model is fitted to the synthetic data, the estimated coefficients will be similar, but the p-values, R² and other fit statistics will differ substantially. This is because the error terms are random and not sourced from the model object.
These limitations will be addressed in future versions. Specifically, the distributions of the independent variables and error terms will be further engineered.
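To illustrate why sourcing the error terms and predictor distribution from the model matters, here is a minimal base-R sketch that does so locally, without the API. It is not how conjurer or uncovr works. Two choices are assumptions made for illustration: iv1 is drawn by resampling the observed speed values, and the errors are drawn from a normal distribution whose standard deviation is the model's residual standard error, sigma(m).

```r
set.seed(42)  # reproducible draws

# Original model, as in the example above
m <- lm(dist ~ speed, data = cars)
n <- 100

# Assumption: resample observed speeds so iv1 follows their empirical distribution
iv1 <- sample(cars$speed, size = n, replace = TRUE)

# Assumption: normal errors with the model's residual standard deviation
error <- rnorm(n, mean = 0, sd = sigma(m))

# dv = intercept + (slope1 * iv1) + error
dv  <- coef(m)[["(Intercept)"]] + coef(m)[["speed"]] * iv1 + error
sim <- data.frame(iv1 = iv1, dv = dv)

# Refit the same model on the synthetic data and compare with the original
m_sim <- lm(dv ~ iv1, data = sim)
rbind(original = coef(m), synthetic = coef(m_sim))
c(original = summary(m)$r.squared, synthetic = summary(m_sim)$r.squared)
```

Because the errors now share the model's residual spread, the refitted slope and R² should land near the original model's values (a slope of about 3.93 and an R² of about 0.65 for the cars model), in contrast to the fully random errors described above.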

Concluding Remarks

The underlying API, uncovr, is under development by FOYI. As new functionality is released, the R package conjurer will be updated to reflect those changes. Your feedback is valuable. For feature requests or bug reports, please follow the contribution guidelines in the GitHub repository. To keep up with future releases and news, please follow our LinkedIn page.