Introduction to Deep Learning with R workshop

Learn how to use deep learning in R! Join our workshop on Introduction to Deep Learning with R, which is part of our workshops for Ukraine series. 


Here’s some more info: 

Title: Introduction to Deep Learning with R

Date: Thursday, May 4th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Eran Raviv is an expert researcher at APG Asset Management, working for the Digitalization & Innovation department. His academic papers are published in top-tier journals. In his present role, Dr. Raviv helps the organization develop its Data Science capabilities and he is engaged in both strategic planning and leading bottom-up initiatives.

Description: The purpose of this workshop is to offer an introductory understanding of deep learning, regardless of your prior experience. It is important to note that this workshop is tailored to those who are absolute beginners in the field. We therefore begin with a few necessary fundamental concepts, after which we cover the basics of deep learning, including topics such as what is actually being learned in deep learning, what makes it “deep,” and why it is such a popular field. We will also cover how you can estimate deep learning models in R using the neuralnet package. You should attend this workshop if you have heard about deep learning and would like to know more about it.
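To give a taste of what this looks like in practice, here is a minimal illustrative sketch with the neuralnet package (our own example, not the workshop's materials; recent versions of neuralnet accept a factor outcome for classification):

library(neuralnet)

# classify iris species from two petal measurements,
# using one hidden layer with 3 units:
set.seed(1)
nn <- neuralnet(Species ~ Petal.Length + Petal.Width,
                data = iris, hidden = 3, linear.output = FALSE)

plot(nn)                  # visualise the fitted network
head(predict(nn, iris))   # class membership probabilities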

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.

Looking forward to seeing you during the workshop!









What is ChatGPT? Will it affect my job? Can it help me? How can I learn it? (sign up for a free course to find out)

The hottest topic in AI has sent shockwaves through mainstream media and multiple industries. In 2023, ChatGPT signals a massive breakthrough in accessible AI, where any schmuck can open it up and start interacting with a large language model instantly.

But, are you maximizing your time on the model?

Probably not. However, you can easily deepen your understanding with a bit of practice—especially through DataCamp’s Introduction to ChatGPT course where you can easily supercharge your AI abilities in just two hours with no prior experience required.

First, let’s take a closer look at OpenAI’s large language model.

What is ChatGPT?

ChatGPT is a large language model developed by OpenAI.

It has been trained on a large collection of written language and can generate an almost endless amount of human-like responses to various inputs (i.e. questions or commands), making it a powerful tool for natural language processing tasks.

ChatGPT has the potential to be used in a wide range of applications, from chatbots and customer service to content creation, language translation, and beyond.

Will it affect my job and can it help me?

The first part—if it hasn’t already, it will soon. Second part—absolutely.

ChatGPT can be useful in a variety of industries such as customer service, healthcare, education, e-commerce, and content creation.

For instance, ChatGPT can help customer service representatives handle a large volume of queries, provide immediate assistance, and improve customer satisfaction. In healthcare, ChatGPT can assist doctors in diagnosing and treating patients by analyzing medical records and recommending appropriate treatments.

In education, ChatGPT can act as a personal tutor, answer questions, and provide explanations. Specifically in content creation, ChatGPT can help generate original content, summarize articles, and simplify complex concepts for readers. And, the list goes on.

It’s groundbreaking in every sense of the word. And, we’re just scratching the surface.

How can I learn it?

If you want to stay relevant, elevate your skillset, and increase the quality of your work—you should learn how to proficiently use ChatGPT ASAP. Whilst most people can log on and start using it instantly, many uninitiated users do not maximize their time on the platform.

Fortunately, DataCamp—a world-leading ed-tech platform specializing in data science and AI—has just released a brand new course: Introduction to ChatGPT.

With no experience required, anyone can gain the fundamentals to instantly improve their ChatGPT skills and start applying them in the real world.

From text summarization and explaining complex concepts to drafting engaging marketing content and generating and explaining code, you’ll learn about the most common and useful applications of ChatGPT.

Plus, you’ll be equipped with a framework to evaluate new use cases and determine if ChatGPT is the right solution for your needs. Finally, you’ll explore the legal and ethical considerations that come with implementing ChatGPT in various situations.

Stay ahead of the AI curve. Start now for free.

 

REDCapDM: a package to access and manage REDCap data

Garcia-Lerma E, Carmezim J, Satorra P, Peñafiel J, Pallares N, Santos N, Tebé C.
Biostatistics Unit, Bellvitge Biomedical Research Institute (IDIBELL)

The REDCapDM package allows users to read data exported directly from REDCap or via an API connection. It also allows users to process the previously downloaded data, create reports of queries and track the identified issues.

The diagram below shows the data management cycle: from data entry in REDCap to obtaining data ready for analysis.



The package structure can be divided into three main components: reading raw data, processing data and identifying queries. Typically, after collecting data in REDCap, we will have to follow these three components in order to have a final validated dataset for analysis. We will provide a user guide on how to perform each one of these steps using the package’s functions. For data processing and query identification, we will use the COVICAN data as an example (see the package vignette for more information about this built-in dataset).

Read data: redcap_data

The redcap_data function allows users to easily import data from a REDCap project into R for analysis.

To read exported data from REDCap, use the arguments data_path and dic_path to specify, respectively, the path of the R file and the path of the REDCap project’s dictionary:

dataset <- redcap_data(data_path="C:/Users/username/example.r",
                       dic_path="C:/Users/username/example_dictionary.csv")

Note: The R and CSV files exported from REDCap must be located in the same directory.

If the REDCap project is longitudinal (contains more than one event), a third element should be specified with the correspondence between each event and each form of the project. This CSV file can be downloaded from the project’s REDCap by following these steps: Project Setup > Designate Instruments for My Events > Download instrument-event mappings (CSV).

dataset <- redcap_data(data_path="C:/Users/username/example.r",
                       dic_path="C:/Users/username/example_dictionary.csv",
                       event_path="C:/Users/username/events.csv")

Note: if the project is longitudinal and the event-form file is not provided using the event_path argument, some steps of the processing cannot be performed.

Another way to read data exported from a REDCap project is using an API connection. To do this, we can use the arguments uri and token which respectively refer to the uniform resource identifier of the REDCap project and the user-specific string that serves as the password:

dataset_api <- redcap_data(uri ="https://redcap.idibell.cat/api/",
                           token = "55E5C3D1E83213ADA2182A4BFDEA")

In this case, there is no need to specify the event-form file since the function will download it automatically using the API connection, if the project is longitudinal.

Remember that the token gives anyone access to all the project’s information, so be careful about who you share it with.

This function returns a list with 3 elements (imported data, dictionary and event-form mapping) which can then be used for further analysis or visualization.

Data process: rd_transform

The main function involved in the processing of the data is rd_transform. This function is used to process the REDCap data read into R using redcap_data, as described above. Using the function’s arguments, we can perform different types of transformations on our data.

As previously stated, we will use the built-in dataset covican as an example.

The only necessary elements that must be provided are the dataset to be transformed and the corresponding dictionary. If the project is longitudinal, as in the case of covican, the event-form dataset should also be specified. These elements can be specified directly using the output of the redcap_data function or separately in different arguments.

#Option A: list object 
covican_transformed <- rd_transform(covican)

#Option B: separately with different arguments
covican_transformed <- rd_transform(data = covican$data, 
                                    dic = covican$dictionary, 
                                    event_form = covican$event_form)

#Print the results of the transformation
covican_transformed$results
1. Recalculating calculated fields and saving them as '[field_name]_recalc'

| Total calculated fields | Non-transcribed fields | Recalculated different fields |
|:-----------------:|:----------------:|:-----------------------:|
|         2         |      0 (0%)      |         1 (50%)         |


|     field_name      | Transcribed? | Is equal? |
|:-------------------:|:------------:|:---------:|
|         age         |     Yes      |   FALSE   |
| screening_fail_crit |     Yes      |   TRUE    |

2. Transforming checkboxes: changing their values to No/Yes and changing their names to the names of its options. For checkboxes that have a branching logic, when the logic is missing their values will be set to missing

Table: Checkbox variables advisable to be reviewed

| Variables without any branching logic |
|:-------------------------------------:|
|        type_underlying_disease        |

3. Replacing original variables for their factor version

4. Deleting variables that contain some patterns

This function will return a list with the transformed dataset, dictionary and the output of the results of the transformation.

As we can see, there are four steps in the transformation, and they are briefly explained in the output of the function. These four steps are:

        1. Recalculation of REDCap calculated fields

        2. Checkbox transformation

        3. Replacement of the original variable by its factor version

        4. Elimination of variables containing some pattern

In addition, we can change the final structure of the transformed dataset by specifying in the final_format argument whether we want our data to be split by event or by form.
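For example, a quick sketch of splitting the data (here "by_event" is assumed to be one of the accepted values; see the vignette for the exact options):

covican_by_event <- rd_transform(data = covican$data,
                                 dic = covican$dictionary,
                                 event_form = covican$event_form,
                                 final_format = "by_event")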

For more examples and information on extra arguments, see the vignette.

Queries

Queries are very important to ensure the accuracy and reliability of a REDCap dataset. The collected data may contain missing values, inconsistencies, or other potential errors that need to be identified in order to correct them later.

For all the following examples we will use the raw transformed data: covican_transformed.

rd_query

The rd_query function allows users to generate queries by using a specific expression. It can be used to identify missing values, values that fall outside the lower and upper limit of a variable and other types of inconsistencies.

Missings

If we want to identify missing values in the variables copd and age in the raw transformed data, a list of required arguments needs to be supplied.

example <- rd_query(covican_transformed,
                    variables = c("copd", "age"),
                    expression = c("%in%NA", "%in%NA"),
                    event = "baseline_visit_arm_1")

# Printing results
example$results
Report of queries

| Variables |              Description              |     Event      |              Query              | Total |
|:---------:|:-------------------------------------:|:--------------:|:-------------------------------:|:-----:|
|   copd    | Chronic obstructive pulmonary disease | Baseline visit | The value should not be missing |   6   |
|    age    |                  Age                  | Baseline visit | The value should not be missing |   5   |

Expressions

The rd_query function is also able to identify outliers or observations that fulfill a specific condition.

example <- rd_query(variables="age",
                    expression=">70",
                    event="baseline_visit_arm_1",
                    dic=covican_transformed$dictionary,
                    data=covican_transformed$data)

# Printing results
example$results
Report of queries

| Variables | Description |     Event      |            Query            | Total |
|:---------:|:-----------:|:--------------:|:---------------------------:|:-----:|
|    age    |     Age     | Baseline visit | The value should not be >70 |  76   |

More examples of both functions can be found in the vignette.

Output

When the rd_query function is executed, it returns a list that includes a data frame with all the queries identified and a second element with a summary of the number of generated queries in each specified variable for each expression applied:

| Identifier |     DAG     |     Event      |  Instrument   | Field | Repetition |              Description              |                    Query                     |   Code   |
|:----------:|:-----------:|:--------------:|:-------------:|:-----:|:----------:|:-------------------------------------:|:--------------------------------------------:|:--------:|
|   100-58   | Hospital 11 | Baseline visit | Comorbidities | copd  |     ·      | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 100-58-1 |

Report of queries

| Variables |              Description              |     Event      |              Query              | Total |
|:---------:|:-------------------------------------:|:--------------:|:-------------------------------:|:-----:|
|   copd    | Chronic obstructive pulmonary disease | Baseline visit | The value should not be missing |   6   |

The data frame is designed to aid users in locating each query in their REDCap project. It includes information such as the record identifier, the Data Access Group (DAG), the event in which each query can be found, along with the name and the description of the analyzed variable and a brief description of the query.

check_queries

Once the process of identifying queries is complete, the typical approach would be to address them by modifying the original dataset in REDCap and then re-run the query identification process, generating a new query dataset.

The check_queries function compares the previous query dataset with the new one by using the arguments old and new, respectively. The output remains a list with 2 items, but the data frame containing the information for each query will now have an additional column (“Modification”) indicating which queries are new, which have been modified, which have been corrected, and which remain pending. In addition, the summary will show the number of queries in each of these categories:

check <- check_queries(old = example$queries, 
                       new = new_example$queries)

# Print results
check$results
Comparison report

|    State     | Total |
|:------------:|:-----:|
|   Pending    |   7   |
|    Solved    |   3   |
| Miscorrected |   1   |
|     New      |   1   |

There are 7 pending queries, 3 solved queries, 1 miscorrected query, and 1 new query between the previous and the new query dataset.

Note: The “Miscorrected” category includes queries that belong to the same combination of record identifier and variable in both the old and new reports, but with a different reason. For instance, if a variable had a missing value in the old report, but in the new report shows a value outside the established range, it would be classified as “Miscorrected”.

Query control output:

| Identifier |     DAG     |     Event      |  Instrument   | Field | Repetition |              Description              |                    Query                     |   Code    | Modification |
|:----------:|:-----------:|:--------------:|:-------------:|:-----:|:----------:|:-------------------------------------:|:--------------------------------------------:|:---------:|:------------:|
|   100-58   | Hospital 11 | Baseline visit | Comorbidities | copd  |     ·      | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 100-58-1  |   Pending    |
|   100-79   | Hospital 11 | Baseline visit | Comorbidities | copd  |     ·      | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 100-79-1  |     New      |
|  102-113   | Hospital 24 | Baseline visit | Demographics  |  age  |     ·      |                  Age                  | The value is NA and it should not be missing | 102-113-1 |   Pending    |
|   105-11   | Hospital 5  | Baseline visit | Comorbidities | copd  |     ·      | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 105-11-1  |   Pending    |

Future improvements

In the short term, we would like to make some improvements to the query identification and tracking process to minimise errors and cover a wider range of possible structures. We would also like to extend the scope of the data processing to cover specific transformations of the data that may be required in some particular scenarios. As a long-term plan, we would like to complement this package with a shiny application to facilitate the use of the package and make it as user-friendly as possible.

 

Designing Beautiful Tables in R

Learn how to design beautiful tables in R! Join our workshop on Designing Beautiful Tables in R, which is part of our workshops for Ukraine series. 

Here’s some more info: 

Title: Designing Beautiful Tables in R

Date: Thursday, April 27th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Tanya Shapiro is a freelance data consultant, helping businesses make better use of their data with bespoke analytical services. She is passionate about data visualization and design, and fell in love with the online R community via #TidyTuesday. When she’s not working on data projects, you can find her cycling or exploring downtown St. Petersburg, Florida.

Description: When we think about data visualization, bar charts and line charts are often top of mind – but what about tables? Tables are a great way to summarize and display different metrics across many records. In this workshop, we will learn how to design visually engaging tables in R and how to enhance them with HTML/CSS techniques. From sparklines, to heatmaps, to embedded images, we’ll cover a variety of tricks to help elevate your tables!
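As a taste of what this looks like in code, here is a small illustrative sketch with the gt package (our own example; the workshop may cover different tools):

library(gt)

mtcars[1:5, c("mpg", "hp", "wt")] |>
  gt(rownames_to_stub = TRUE) |>
  tab_header(title = "Motor Trend cars",
             subtitle = "First five rows of mtcars") |>
  fmt_number(columns = wt, decimals = 2) |>
  tab_style(style = cell_fill(color = "lightcyan"),   # highlight efficient cars
            locations = cells_body(columns = mpg, rows = mpg > 20))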

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!








Creating Standalone Apps from Shiny with Electron [2023, macOS M1]

💡 I assume that…

    • You’re familiar with R / shiny 
    • You know basic terminal commands.
    • You’ve heard about node.js, npm, or JavaScript something…

1. Why standalone shiny app?

First, let’s talk about the definition of a standalone app.

I’m going to define it this way:

An app that can run independently without any external help.

“External” here is probably a web browser, which means software that can be installed and run without an internet connection.


RStudio can also be seen as a kind of standalone app: you can’t install or update packages without a network connection, but you can still use it.

Creating a standalone app with shiny is really unfamiliar and quite complicated. What are the benefits, and why should you develop it anyway?

I can think of at least two.


1. Better user experience
Regardless of the deployment method, when using Shiny as a web app, you have to open your web browser, enter a URL, and run it.

The process of sending and receiving data over the network can affect the performance of your app.


However, a standalone app can run without a browser and use the OS’s resources efficiently, resulting in a slightly faster and more stable execution.

Of course, the advantage is that it can be used without an internet connection.

2. Improved security
Shiny apps run through a web browser anyway, so if a “legendary” hacker had their way, they could pose a threat to the security of Shiny apps.

However, a standalone app is largely immune to this problem, as long as no one physically breaks into your PC.

2. Very short introduction of electron

Electron (or more precisely, electron.js) is a technology that lets you embed Chromium and Node.js in binary form to utilize the (shiny!) technologies used in web development (HTML, CSS, and JavaScript), to quote a bit from the official page.

It’s a story I still don’t fully understand, but fortunately, there have been numerous attempts by people to make shiny standalone with electron.js before, and their dedication has led to the sharing of templates that remove the “relatively complex” process.

The article I referenced was “turn a shiny application into a tablet or desktop app” on R-bloggers, written in 2020, but times have changed so quickly that the stuff from then doesn’t work (at least not on my M1 Mac).

After a considerable amount of wandering, I found a GitHub repository that I could at least understand. Unfortunately, the repository was archived in April 2022, and some things needed to be updated for March 2023.

Eventually, I was able to make the shiny app work as a standalone app.

And I’m going to leave some footprints for anyone else who might wander in the future.

3. Packaging shiny app with Electron

It’s finally time to get serious and package shiny as an electron.

I’ll describe the steps in a detailed, follow-along way where possible, but if you run into any problems, please let me know by raising an issue in the repository.

(I’ve actually seen it packaged as a standalone app using this template by following the steps below.)


1. The first thing you need to do is install node.js, npm, and Electron Forge using RStudio’s terminal. (I’ll skip these.)

2. fork/clone (maybe even star ⭐) the template below

https://github.com/zarathucorp/shiny-electron-template-m1-2023


3. Open the cloned project in RStudio (.Rproj)

4. If you get something like the output below (except for the version), you are good to go.



Now start at line 6 of the template’s README.


Let’s name the standalone app we want to create (obviously) “helloworld”

💡 I’ll format directory like /this

5. Run npx create-electron-app helloworld in the terminal to create the standalone app package. This will create a directory called /helloworld; then delete /helloworld/src.

6. Move the template’s files below to /helloworld and set the working directory to /helloworld.

    • start_shiny.R
    • add_cran-binary_pkgs.R
    • get-r-mac.sh
    • /shiny
    • /src
7. In the R console, use version to check the version of R installed on your PC. Then run the shell script sh ./get-r-mac.sh in the terminal to install R for electron. (The version on your PC and the version of R in the shell script should be the same.)

8. Once you see that the /r-mac directory exists, install the automagic R package from the console.
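That is, in the R console (automagic is on CRAN):

install.packages("automagic")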

9. Modify the package.json (change the author name, of course). The parts that should look like the image are the dependencies, repository, and devDependencies sections.



10. Develop a shiny app. (Assuming you’re familiar with shiny, I’ll skip this part.)
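If you just need something to package, a minimal placeholder app will do; assuming the template keeps the app in /helloworld/shiny/app.R, something like:

library(shiny)

ui <- fluidPage(
  titlePanel("helloworld"),
  sliderInput("n", "Observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

shinyApp(ui, server)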

11. Install the R packages for electron by running Rscript add-cran-binary-pkgs.R in the terminal.

12. In a terminal, update the package.json for electron with npm install (this is a continuation of step 9).

13. In a terminal, verify that the standalone app is working by running electron-forge start.

If, like me in the past, the electron app still won’t run, exit the app, restart your R session in RStudio, and then run the standalone app again. (It seems to be an environment variable issue, such as R’s shiny port.)



14. Once you’ve verified that start runs fine, create a distributable app with electron-forge make.


🥳 Voila, you have successfully made shiny a standalone app using electron.

4. Summary

If I’ve succeeded in my intentions, you should be able to use the template to make shiny a standalone app using electron in 2023 on an M1 Mac.

That app (delivered as a zip file) now makes the power of R / Shiny available to people with little experience:
    • without installing or using R
    • even in a “closed environment” with no network connection
Since electron is technically electron.js, my biggest challenge in creating a standalone app with electron was utilizing JavaScript (which I have limited skills in compared to R).

Fortunately, I was able to do so by making some improvements to the templates that the pioneers had painstakingly created.

Thank you, L. Abigail Walter, Travis Hinkelman, and Dirk Shumacher.
I’ll end this post with the template I followed, which I hope you’ll find useful.

Thank you.

(Translated with DeepL ❤️)

Updated nnlib2Rcpp package

For anyone interested, the ‘nnlib2Rcpp’ package has been updated with several added features. Among other changes, the latest v.0.2.1 of the nnlib2Rcpp package allows users to define the behavior of custom Neural Network (NN) components they create using only R code. This includes custom layers and sets of connections.

Package nnlib2Rcpp is based on the nnlib2 C++ NN library. It interfaces compiled C++ NN components with R. New types of NN components can be created using the provided C++ classes and templates. However, as of v.0.2.1, user-defined NN components can also be created without any need for C++. Of course, NN components defined using nnlib2 C++ classes and templates (as described in an older post), or components already included in the package can still be used. All such components can be added to neural networks defined in R via the nnlib2Rcpp package’s “NN” module and cooperate with each other.

Defining custom NN component behavior in R does have a cost in terms of runtime performance and, to a certain degree, defies many of the reasons for using the provided C++ classes. However, it may be useful for some purposes.

The goal of the simple example listed below is to implement, using only R, a NN with functionality similar to that described in the aforementioned post, which required some steps to be done in C++. In the example, the component functions (for a connection set and an output layer) required for a simplified perceptron-like NN are defined and the NN is set up. This is essentially a single-layer perceptron, as the first (“generic”) layer just accepts the data and transfers it to the connections without performing any computations.
library(nnlib2Rcpp)

# Function for connections, when recalling/mapping:

CSmap <- function(WEIGHTS, SOURCE_OUTPUT,...)
	SOURCE_OUTPUT %*% WEIGHTS

# Function for connections, when encoding data:

learning_rate <- 0.3

CSenc <- function( WEIGHTS, SOURCE_OUTPUT,
				   DESTINATION_MISC, DESTINATION_OUTPUT, ...)
{
  # desired output should have been placed in misc registers:
  a <- learning_rate *
          (DESTINATION_MISC - DESTINATION_OUTPUT)
  # compute connection weight adjustments:
  a <- outer( SOURCE_OUTPUT, a , "*" )
  # compute adjusted weights:
  w <- WEIGHTS + a
  # return new (adjusted) weights:
  return(list(WEIGHTS=w))
}

# Function for layer, when recalling/mapping:
# (no encode function is needed for the layer in this example)

LAmap <- function(INPUT_Q,...)
{
	x <- colSums(INPUT_Q)		# input function is summation.
	x <- ifelse(x > 0, 1, 0)	# threshold function is step.
	return(x)
}

# prepare some data based on iris data set:

data_in <- as.matrix(iris[1:4])
iris_cases <- nrow((data_in))

# make a "one-hot" encoding matrix for iris species
desired_data_out <- matrix(data=0, nrow=iris_cases, ncol=3)
desired_data_out[cbind(1:iris_cases,unclass(iris[,5]))]=1

# create the NN and define its components:
# (first generic layer simply accepts input and transfers it to the connections)

p <- new("NN")
p$add_layer("generic",4)
p$add_connection_set(list(name="R-connections",
                          encode_FUN="CSenc",
                          recall_FUN="CSmap"))
p$add_layer(list(name="R-layer",
                 size=3,
                 encode_FUN="",
                 recall_FUN="LAmap"))
p$create_connections_in_sets(0,0)

# encode data and desired output (for 50 training epochs):

for(i in 1:50)
	for(c in 1:iris_cases)
	{
		p$input_at(1,data_in[c,])
		p$set_misc_values_at(3,desired_data_out[c,])  # put desired output in misc registers
		p$recall_all_fwd();
		p$encode_at(2)
	}

# Recall the data and show NN's output:

for(c in 1:iris_cases)
{
	p$input_at(1,data_in[c,])
	p$recall_all_fwd()
	cat("iris case ",c,", desired = ", desired_data_out[c,],
		" returned = ", p$get_output_from(3),"\n")
}
More information can be found in the package’s documentation by typing:
help(NN_R_components)
A complete list of other changes and features added to the package can be found here.

Unlock your next move with DataCamp: Save up to 67% on in-demand data upskilling

For a limited time, save up to 67% on a DataCamp Premium subscription and unlock 410+ interactive courses for all levels in R, Python, SQL, Power BI, and more. You will also get access to bespoke career and skill tracks, projects, challenges, and industry-leading certifications to stand out.

Simply follow the link here.

Upcoming free DataCamp content to enhance your data learning journey 

RADAR: Thrive in the era of data

 Presented by DataCamp, RADAR is a free data science summit of industry experts designed to help aspiring data professionals accelerate their data learning and build stronger careers in 2023.

Gain a deeper understanding of the skills industry leaders are looking for. Learn how to navigate the evolving data talent pool, and uncover insights on data’s most pressing opportunities through a mix of sessions from world-class organizations such as Tableau, Alteryx, Qlik, Salesforce, JetBrains, Google, CBRE, and more.

An unmissable event for anyone looking to strengthen their wider data skillset and accelerate their careers.

March 22-23 2023, 9 AM – 3 PM EST. Register now for free.

  The State of Data Literacy 2023

According to 87% of business leaders, data literacy ranks as the most important skill behind basic computer skills.

Commissioned by DataCamp, The State of Data Literacy is a free-to-download report that examines the current state of the global data skills revolution. For real-world accuracy, DataCamp surveyed over 550 business leaders to uncover how they are approaching the data skills revolution, including:

  • What data literacy means and what transformational benefits it brings
  • Crucial skills business leaders are looking for and their reasons why (great for discovering upskilling gaps to help frame your data learning)
  • What the future of data skills holds for individuals, organizations, and society at large

For the full report, including a deep dive into the key data skills employers are looking for, download your free copy today.

Download Now For Free

Introduction to data analysis with {Statgarten}.





Overview

Data analysis is a useful way to help solve problems in quite a few situations.

There are many things that go into effective data analysis, but three are commonly mentioned:

1. defining the problem you want to solve through data analysis
2. collecting meaningful data
3. the skills (and expertise) to analyze the data

R is often mentioned as a way to effectively fill the third of these, but at the same time, it’s often seen as a big barrier for people who haven’t used R before (or have no programming experience).

In my previous work experience, there were many situations where I was able to turn experiences into insights and produce meaningful results with a little data analysis, even if I was “not a data person”.

For this purpose, we have developed an open-source R package called “statgarten” that allows you to utilize the features of R without having to use R directly, and I would like to introduce it.

Here’s the repo link (note: some of the description is still written in Korean).


👣 Flow of data analysis

The order and components may vary depending on your situation, but I like to define it as five broad steps:

1. data preparation
2. EDA
3. data visualization
4. calculate statistics
5. share results

In this article, I’ll share a lightweight data analysis example that follows these steps (while utilizing R’s features and not typing R code whenever possible).

Note: since our work is still in progress (including deployment in the form of a web application), we will utilize the R packages here.
Install
With this code, you can install all components of the statgarten system. 
remotes::install_github('statgarten/statgarten')
library(statgarten)
Run
The core of the statgarten ecosystem is door, which allows you to bundle other functional packages together. (Of course, you can also use each package as a separate shiny module)

Let’s load the door library, and run it via run_app.
library(door)

run_app() # OR door::run_app()
If you didn’t set anything, the shiny application will run in RStudio’s viewer panel, but we recommend running it in a web browser like Chrome via the Show in new window icon (the icon to the left of the Stop button).

Statgarten app main page

If you don’t have any problems running it (please raise an issue on door to let us know if you do), you should see the screen below.
1. Data preparation
There are four ways to prepare data for statgarten: 1) upload a file from your local PC, 2) enter the URL of a file, 3) enter the URL of a Google Sheet, or 4) utilize the public data included in statgarten. These options can be found in the tabs File, URL, Google Sheet, and Datatoys, respectively.

In this example, we will utilize the public data named bloodTest.

bloodTest contains blood test data from 2014-15 provided by the National Health Insurance Service in South Korea.
1.5 Define the problem
Utilizing the bloodTest data, we’ll try to find clues for this question:

“Are people with high total cholesterol more likely to be diagnosed with anemia and cerebrovascular disease, and does the incidence vary by gender?” 
With a few clicks, select the data as shown below (after selection, click the Import data button).

statgarten data select


Before we start EDA, let’s process the data for analysis.

In keeping with the theme, we will remove data that is not needed and change some numeric variables to factors.

This can be done with the Update Data button, where data selection is done with the checkboxes. The type can be changed under New class.

2. EDA
You can see the organization of the data in the EDA pane below, where we see that the genders are coded 1 and 2, so we’ll use the Replace function under the Transform Data button to change them to M/F.


3. Data visualization
In the Vis Panel, you can also visualize anemia (ANE) against total cholesterol (TCHOL) by dragging, as well as total cholesterol by cerebrovascular disease (STK) status. 



However, it’s hard to tell from the figures whether there is a significant difference (in both cases).
4. Statistics
You can view the distribution of values and key statistics for each variable via Distribution in the EDA panel.


For the anemia (ANE) and cerebrovascular disease variables (STK), we see that 0 (never diagnosed) is 92.2% and 93.7%, respectively, and 1 (diagnosed) is 7.8% and 6.3%, respectively.


In the Stat Panel, let’s create a “Table 1” to represent the baseline characteristics of the data, based on anemia status (ANE).


For cerebrovascular disease status (STK), again from Table 1, we can see that total cholesterol (TCHOL) by gender (SEX) is significant, with a p-value less than 0.05.


5. Share result
I think quarto (or Rmarkdown) is the most effective way to share data analysis results in R, but utilizing it in a shiny app is another matter.

As a result, statgarten’s results sharing is limited to exporting a data table or downloading an image.



⛳ Statgarten as Open source

The statgarten project’s goal is to help people process and utilize data in a rapidly growing data economy and to foster data literacy for all.
The project is being developed with the support of the Ministry of Science and ICT of the Republic of Korea, and has been selected as a target for the 2022 Information and Communication Technology Development Project and the Standards Development Support Project.

But at the same time, it is an open source project that everyone can use and contribute to freely. (We’ve also used other open source projects in the development process)

It is being developed in various forms such as web app, docker, and R package, and is open to various forms of contributions such as development, case sharing, and suggestions.

Please try it out, raise an issue, fork or star it, or suggest what you need, and we’ll do our best to incorporate it, so please support us 🙂

For more information, you can check out our github page or drop us an email.

Thanks.

(Translated with DeepL ❤️)

Spatial Data Wrangling with R workshop

Learn how to wrangle spatial data in R! Join our workshop on Spatial Data Wrangling with R: A Comprehensive Guide, which is part of our workshops for Ukraine series. 


Here’s some more info: 

Title: Spatial Data Wrangling with R: A Comprehensive Guide

Date: Thursday, April 6th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone) 

Speaker: Long Nguyen is a PhD student at SOEP RegioHub at Bielefeld University. He likes to make pretty maps.

Description: This workshop is designed to provide a solid foundation for working with spatial data in R. Starting with fundamental concepts of spatial data types and structures, the workshop provides a systematic overview of techniques for manipulating spatial data, such as spatial aggregation, spatial joins, spatial geometry transformations, and distance calculations. With this focus, the workshop’s aim is to give participants a skill set that is easily extendable and transferable to new data and tools. The data wrangling techniques presented will be accompanied by instructions on creating maps – both static and interactive – to quickly explore and present the results of the operations performed.
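To give a flavour of these operations, here is a small illustrative sketch with the sf package (our own example, using the North Carolina dataset shipped with sf; not the workshop's materials):

library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # example polygons
nc_proj <- st_transform(nc, 32119)       # project to a planar CRS before measuring
centroids <- st_centroid(st_geometry(nc_proj))
st_distance(centroids[1:3])              # pairwise distances between counties
plot(st_geometry(nc_proj))               # quick static map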

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.

How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list as only those students who are sponsored can participate). Since the number of sponsored places is usually lower than the number of people signing up for the waitlist, we ask you to sign up via the regular registration process to ensure your participation if you can.

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.

Looking forward to seeing you during the workshop!


Dataviz with R and ggplot: Using colour and annotations for effective story telling workshop

Learn how to use annotations and colours in your ggplot plots! Join our workshop on Dataviz with R and ggplot: Using colour and annotations for effective story telling, which is part of our workshops for Ukraine series. 


Here’s some more info: 

Title: Dataviz with R and ggplot: Using colour and annotations for effective story telling

Date: Thursday, April 20th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Cara Thompson is a freelance data consultant with an academic background, specialising in dataviz and in “enhanced” reproducible outputs. She lives in Edinburgh, Scotland, and is passionate about maximising the impact of other people’s expertise.

Description: If we’re passionate about our data and the patterns we’ve found, a key part of our job is to find effective ways of communicating what we’ve discovered. Intuitive and compelling data visualisations are a great way to draw attention to our main story, and illustrate some of the details. 

In this workshop, we’ll talk about how we can make use of colour, fonts and a few other tricks to make it easier for readers to understand and remember our main story and make our plots publication-ready. We’ll be using R and ggplot to create, modify and annotate the plots we discuss, but the principles apply regardless of the tools you use to plot your data. 
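As a small illustrative sketch of the kind of thing we mean (our own example, not the workshop's materials), colour can de-emphasise context while an annotation carries the story:

library(ggplot2)

ggplot(mtcars, aes(wt, mpg, colour = factor(am))) +
  geom_point(size = 2) +
  scale_colour_manual(values = c("0" = "grey70", "1" = "#D55E00"),
                      labels = c("Automatic", "Manual"), name = NULL) +
  annotate("text", x = 5, y = 30, label = "Manual cars stand out in orange",
           hjust = 1, fontface = "italic") +
  labs(title = "Using colour to highlight one group") +
  theme_minimal()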

Attendees are encouraged to bring along a plot of their own (which doesn’t need to be made with ggplot!) so that they can think about how best to apply the principles to their own context – and for a chance to get some live feedback during our Q&A session.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation). You can also submit a plot made by you for a chance for getting feedback!

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us so that we can allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops (with recordings & materials) here.


Looking forward to seeing you during the workshop!