RMarkdown and Quarto – Mastering the Basics workshop

Learn how to use RMarkdown and Quarto! Join our workshop on RMarkdown and Quarto – Mastering the Basics, which is part of our workshops for Ukraine series.

Here’s some more info: 

Title: RMarkdown and Quarto – Mastering the Basics

Date: Thursday, May 25th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Indrek Seppo, a seasoned R programming expert, brings over 20 years of experience from the academic, private, and public sectors to the table. With more than a decade of teaching R under his belt, Indrek’s passionate teaching style has consistently led his courses to top the student feedback charts and has inspired hundreds of upcoming data analysts to embrace R (and Baby Shark).

Description: Discover the power of RMarkdown and its next-generation counterpart, Quarto, to create stunning reports, slides, dashboards, and even entire books—all within the RStudio environment. This session will cover the fundamentals of markdown, guiding you through the process of formatting documents and incorporating R code, tables, and graphs seamlessly. If you’ve never explored these tools before, prepare to be amazed by their capabilities. Learn how to generate reproducible reports and research with ease, enhancing your productivity and efficiency in the world of data analysis.
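
If you have never seen a Quarto file, here is a minimal sketch of what the workshop's subject matter looks like (our own illustration, not workshop material): a .qmd document mixing markdown prose with an executable R code chunk.

---
title: "My first Quarto report"
format: html
---

## A reproducible figure

Some markdown text, followed by an R chunk that runs when the document is rendered:

```{r}
plot(pressure)
```

Rendering (with quarto render, or the Render button in RStudio) produces an HTML report with the text, code, and figure together.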

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!










Working with Big Data with Hadoop and Spark workshop

Learn how to work with Big Data with Hadoop and Spark! Join our workshop on Working with Big Data with Hadoop and Spark, which is part of our workshops for Ukraine series.

Here’s some more info:

Title: Working with Big Data with Hadoop and Spark

Date: Thursday, May 18th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Jannic Cutura is an economist turned data engineer turned software engineer who works as a Python developer on the European Central Bank’s Stress Test team. Prior to his current position, he worked as a research analyst/data engineer in the financial stability and monetary policy divisions of the ECB. He holds a master’s degree and a Ph.D. in quantitative economics from Goethe University Frankfurt and has conducted research projects at the BIS, the IMF, and Columbia University.

Description: Big data — datasets that are difficult to handle on standalone retail-grade computers — are rapidly becoming the norm in social science research. This is true both in academia and for policy-oriented research in central banks and similar bodies (let alone industry applications). Yet traditional econometrics (and econometrics training) tells us little about how to work efficiently with large datasets. In practice, any dataset larger than the researcher’s computer memory (~20–30GB) is very challenging to handle: once that barrier is crossed, most data manipulation tasks become painfully slow and prone to failure. The goal of this presentation is to (i) explain what happens under the hood when your computer gets slow and (ii) show how distributed computing (in particular Hadoop/Spark) can help to mitigate those issues. By the end, participants will understand the power of distributed computing and how they can use it to tackle both existing data handling challenges and new ones that were previously prohibitively expensive to evaluate on retail-grade computers. The workshop will contain both a theory part and a lab session using Databricks. If you want to follow along during the live session, you can create your own free account at Databricks by signing up for the community edition (no credit card required).
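
For a flavor of what distributed computing feels like from R, here is a minimal sparklyr sketch (our own illustration; the workshop’s lab uses Databricks and may rely on different tools):

library(sparklyr)
library(dplyr)

# connect to a local Spark instance (a Databricks cluster works similarly)
sc <- spark_connect(master = "local")

# copy a data frame into Spark (requires the nycflights13 package);
# in practice, large data would be read directly from storage
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")

# dplyr verbs are translated to Spark SQL and executed on the cluster
flights_tbl %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()  # bring only the small aggregated result back into R

spark_disconnect(sc)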

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!









WooCommerce Administrator with R

When working with WooCommerce, you have access to a powerful and robust product that offers a variety of benefits. Not only is it free and supported by a large community, but it also has strong SEO capabilities and a vast selection of plugins to enhance functionality. Additionally, the WooCommerce admin tool is user-friendly and easy to navigate, requiring minimal time and effort to learn. In fact, most individuals can become proficient in its use in just one to two hours.

However, there is a drawback. The main concept of WooCommerce is that it’s like managing a storefront. Regardless of the size of your business, whether it’s small or large, it’s still just a storefront. This means that it lacks serious back-office capabilities, and the only way to manage your products is one-by-one, similar to how you would rearrange products in a storefront window.
If you’re managing an e-shop professionally, simply rearranging the products one by one as in a shop window won’t suffice. To stay ahead of the competition, you need to provide your customers (or boss) with more advanced capabilities and perform tasks quickly.
While it’s true that there are plugins available for almost anything you can think of, they often come at a cost. Moreover, they can negatively impact the speed of your store and lead to compatibility issues with each other. If you end up using more than 3-4 plugins, errors are bound to occur, making your workflow inefficient.

Over the years, I have faced several challenges in managing e-shops, and I finally decided to overcome them. After putting in a lot of effort, I have written over 90 functions in R (6,400+ lines of code) and utilized the WooCommerce API to develop a highly robust solution for these problems.
The central concept is to create a duplicate of the essential features of the e-shop, such as categories, tags, attributes, products, customers, and orders, inside RStudio, and to utilize custom functions to perform filtering and CRUD operations through the REST API.

The solution is seamlessly integrated with WooCommerce through the REST API, and the source code is explained in detail, making it easy for you to modify the functions to suit your needs or even create new ones. I have incorporated multiple ways to achieve the same result, including GUI interfaces with the Tcl/Tk package, allowing you to customize your working environment.

My book, “WooCommerce Administrator with R,” is available on Amazon in both Kindle and paperback formats. 

One use case included in the book demonstrates how easy it is to add new products, whether they are variable or simple. By creating an xlsx file with the necessary data (one line per product), along with variation attributes and category paths, you can use a single command to pass all the information to your e-shop, with variations created automatically.

Check the video in this link to see how it is done: Create new products with WooCommerce API in R.
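
Roughly, the workflow looks like this (a sketch only; read_xlsx is from the readxl package, and create_products_batch is a hypothetical stand-in for the book’s actual function):

library(readxl)

# one row per product, plus variation attributes and category paths
new_products <- read_xlsx("new_products.xlsx")

# hypothetical single command that pushes everything to the e-shop via the
# WooCommerce REST API, creating variations automatically:
create_products_batch(new_products)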

Let’s see another example. You need to have a large sale on white women’s shoes, sizes 40 and 41 EU, as they are no longer in fashion and you have a lot of stock. You expect that smaller sizes will sell eventually. Act fast, customers, as the grand sale for white women’s shoes in sizes 40 and 41 EU will only last for two weeks!

library(dplyr)  # for %>% and filter(); the book's custom functions are assumed loaded

filter <- list(categories = "Shoes",
               variations = c("Color : White", "Shoe size : 40|41"))
filtered <- filter_products(filter = filter, search.variations = TRUE)

pr_filtered  <- filtered[1] %>% as.data.frame()  # parent products
var_filtered <- filtered[2] %>% as.data.frame()  # filtered variations

schema_name <- create_schema("name, v_Color, v_Shoe size, regular_price, sale_price, date_on_sale_to_gmt",
                             template = FALSE, echo = TRUE)[[1]]

my_products <- populate_schema(schema_name, data = pr_filtered,
                               var_df = var_filtered, values.from.parent = FALSE)

# adjust prices and the offer end date
my_products$sale_price <- as.numeric(my_products$regular_price) * 0.5
my_products$date_on_sale_to_gmt <- paste0(Sys.Date() + 14, "T23:59:59")

my_products <- keep.columns(my_products, "sale_price, date_on_sale_to_gmt") %>%
  filter(parent > 0)  # we want to update only the variations

my_products <- modify_variations_batch(my_products, add.elements = FALSE)

These commands, which may seem complex now, become simple to use once you have the source code and analysis. With these commands, you can complete your work in a matter of minutes, depending on the number of products you have, without ever needing to access the WP-Admin interface.

In my book, I also address the challenge of managing metadata. The functions I provide enable you to add additional fields to your products, customers, and orders. For instance, you can add information such as barcodes, product costs, discount policies, sales representatives, and more. If you have brick-and-mortar stores, you can even create orders in batches and include metadata about your retail customers, such as age group, sex, new/old customer status, and so on. All of this data can be extracted in a single data frame for further analysis. It’s a powerful tool that you’ll surely find useful!

I am confident that by learning to use the functions and basic directions provided in the book, you will see a significant improvement in your e-shop management capabilities. As an e-shop manager, this will allow you to work more efficiently and productively.
If you are a business owner, you will gain a better understanding of the potential of your e-shop and be able to hire the appropriate personnel to manage it effectively.
Furthermore, if you are interested in learning R, this book provides a great opportunity to do so while tackling real-life problems.
Lastly, for college students and business executives, acquiring the skills and knowledge provided in this book can be valuable to potential employers.

I highly recommend checking out my book on Amazon, as it provides a comprehensive solution to common issues faced by e-shop managers and business owners.  Get started today and take your e-shop to the next level!



John Kamaras (www.jkamaras.com)







How much weight do you need to lose to drop an inch off the waist?

This is an analysis inspired by Dan’s 2014 article and it’s meant to be a continuation of his idea with a larger sample size and a slightly different approach. 

I’ve used the PRAW package in Python to scrape data from the progresspics subreddit, and I ran the data analysis in R. I was able to collect data from 77 individuals, and I chose to split the entire analysis by gender, due to the differences in abdominal fat distribution between men and women.
As can be observed from Table 1 and the plot, weight loss tends to be much higher for the males than for the females both in absolute units and percentage-wise. 

I chose to run four different regression models for each gender. While Dan’s article only considered weight change, I also included the weight percentage change, BMI change, and BMI percentage change.

# Male
lm(weightDiff ~ waistDiff, data, subset = gender == "Male")
lm(weightDiff_perc ~ waistDiff, data, subset = gender == "Male")
lm(bmiDiff ~ waistDiff, data, subset = gender == "Male")
lm(bmiDiff_perc ~ waistDiff, data, subset = gender == "Male")

# Female
lm(weightDiff ~ waistDiff, data, subset = gender == "Female")
lm(weightDiff_perc ~ waistDiff, data, subset = gender == "Female")
lm(bmiDiff ~ waistDiff, data, subset = gender == "Female")
lm(bmiDiff_perc ~ waistDiff, data, subset = gender == "Female")
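
The summary numbers below are essentially the slopes of these regressions. For example (a sketch assuming the column names above; signs depend on how the differences were coded):

m_male <- lm(weightDiff ~ waistDiff, data, subset = gender == "Male")
coef(m_male)["waistDiff"]  # expected weight change (lbs) per 1-inch change in waist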

Based on the results from Table 2, we could conclude the following:

Males
  1. On average, you need to lose 8.6 lbs (3.9 kg) to lose 1 inch off the waist.
  2. On average, you need to lose 1.4% of your weight to lose 1 inch off the waist.
  3. On average, you need to lose 1.1 BMI points to lose 1 inch off the waist.
  4. On average, you need to reduce your BMI by 1.4% to lose 1 inch off the waist.
Females
  1. On average, you need to lose 4.8 lbs (2.1 kg) to lose 1 inch off the waist.
  2. On average, you need to lose 2.3% of your weight to lose 1 inch off the waist.
  3. On average, you need to lose 0.82 BMI points to lose 1 inch off the waist.
  4. On average, you need to reduce your BMI by 2.4% to lose 1 inch off the waist.
Of course, the sample size is relatively small (especially for women) and the starting weights vary widely, so take these results with a grain of salt.
If you want to delve deeper, you can find the data and the code I used here.

Introduction to Deep Learning with R workshop

Learn how to use deep learning in R! Join our workshop on Introduction to Deep Learning with R, which is part of our workshops for Ukraine series.


Here’s some more info: 

Title: Introduction to Deep Learning with R

Date: Thursday, May 4th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Eran Raviv is an expert researcher at APG Asset Management, working for the Digitalization & Innovation department. His academic papers are published in top-tier journals. In his present role, Dr. Raviv helps the organization develop its Data Science capabilities and he is engaged in both strategic planning and leading bottom-up initiatives.

Description: The purpose of this workshop is to offer an introductory understanding of deep learning, regardless of your prior experience. It is important to note that this workshop is tailored to those who are absolute beginners in the field. We therefore begin with a few necessary fundamental concepts, after which we cover the basics of deep learning, including topics such as what is actually being learned in deep learning, what makes it “deep,” and why it is such a popular field. We will also cover how you can estimate deep learning models in R using the neuralnet package. You should attend this workshop if you have heard about deep learning and would like to know more about it.
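
To give a flavor of the package mentioned in the description, here is a minimal neuralnet sketch (our own example, not workshop material):

library(neuralnet)
set.seed(1)

# a small feed-forward network with two hidden layers (5 and 3 units)
nn <- neuralnet(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                data = iris, hidden = c(5, 3), linear.output = TRUE)

plot(nn)                 # draw the network with its estimated weights
head(predict(nn, iris))  # predictions on the training data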

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.

Looking forward to seeing you during the workshop!









What is ChatGPT? Will it affect my job? Can it help me? How can I learn it? (sign up for a free course to find out)

The hottest topic in AI has sent shockwaves through mainstream media and multiple industries. In 2023, ChatGPT signals a massive breakthrough in accessible AI, where any schmuck can open it up and start interacting with a large language model instantly.

But, are you maximizing your time on the model?

Probably not. However, you can easily deepen your understanding with a bit of practice—especially through DataCamp’s Introduction to ChatGPT course where you can easily supercharge your AI abilities in just two hours with no prior experience required.

First, let’s take a closer look at OpenAI’s large language model.

What is ChatGPT?

ChatGPT is a large language model developed by OpenAI.

It has been trained on a large collection of written language and can generate an almost endless amount of human-like responses to various inputs (i.e. questions or commands), making it a powerful tool for natural language processing tasks.

ChatGPT has the potential to be used in a wide range of applications, from chatbots and customer service to content creation, language translation, and beyond.

Will it affect my job and can it help me?

The first part—if it hasn’t already, it will soon. Second part—absolutely.

ChatGPT can be useful in a variety of industries such as customer service, healthcare, education, e-commerce, and content creation.

For instance, ChatGPT can help customer service representatives handle a large volume of queries, provide immediate assistance, and improve customer satisfaction. In healthcare, ChatGPT can assist doctors in diagnosing and treating patients by analyzing medical records and recommending appropriate treatments.

In education, ChatGPT can act as a personal tutor, answer questions, and provide explanations. Specifically in content creation, ChatGPT can help generate original content, summarize articles, and simplify complex concepts for readers. And, the list goes on.

It’s groundbreaking in every sense of the word. And we’re just scratching the surface.

How can I learn it?

If you want to stay relevant, elevate your skillset, and increase the quality of your work—you should learn how to proficiently use ChatGPT ASAP. Whilst most people can log on and start using it instantly, many uninitiated users do not maximize their time on the platform.

Fortunately, DataCamp—a world-leading ed-tech platform specializing in data science and AI—has just released a brand new course: Introduction to ChatGPT.

With no experience required, anyone can gain the fundamental skills to improve their ChatGPT proficiency instantly and start applying those skills in the real world.

From text summarization and explaining complex concepts to drafting engaging marketing content and generating and explaining code, you’ll learn about the most common and useful applications of ChatGPT.

Plus, you’ll be equipped with a framework to evaluate new use cases and determine if ChatGPT is the right solution for your needs. Finally, you’ll explore the legal and ethical considerations that come with implementing ChatGPT in various situations.

Stay ahead of the AI curve. Start now for free.

 

REDCapDM: a package to access and manage REDCap data

Garcia-Lerma E, Carmezim J, Satorra P, Peñafiel J, Pallares N, Santos N, Tebé C.
Biostatistics Unit, Bellvitge Biomedical Research Institute (IDIBELL)

The REDCapDM package allows users to read data exported directly from REDCap or via an API connection. It also allows users to process the previously downloaded data, create reports of queries and track the identified issues.

The diagram below shows the data management cycle: from data entry in REDCap to obtaining data ready for analysis.



The package structure can be divided into three main components: reading raw data, processing data, and identifying queries. Typically, after collecting data in REDCap, we will have to go through these three steps to obtain a final validated dataset for analysis. We will provide a user guide on how to perform each of these steps using the package’s functions. For data processing and query identification, we will use the COVICAN data as an example (see the package vignette for more information about this built-in dataset).

Read data: redcap_data

The redcap_data function allows users to easily import data from a REDCap project into R for analysis.

To read data exported from REDCap, use the arguments data_path and dic_path to specify, respectively, the path of the exported R file and of the REDCap project’s dictionary:

dataset <- redcap_data(data_path="C:/Users/username/example.r",
                       dic_path="C:/Users/username/example_dictionary.csv")

Note: The R and CSV files exported from REDCap must be located in the same directory.

If the REDCap project is longitudinal (contains more than one event), then a third element should be specified with the correspondence of each event to each form of the project. This CSV file can be downloaded from the project’s REDCap by following these steps: Project Setup > Designate Instruments for My Events > Download instrument-event mappings (CSV).

dataset <- redcap_data(data_path="C:/Users/username/example.r",
                       dic_path="C:/Users/username/example_dictionary.csv",
                       event_path="C:/Users/username/events.csv")

Note: if the project is longitudinal and the event-form file is not provided via the event_path argument, some steps of the processing cannot be performed.

Another way to read data exported from a REDCap project is to use an API connection. To do this, we can use the arguments uri and token, which respectively refer to the uniform resource identifier of the REDCap project and the user-specific string that serves as the password:

dataset_api <- redcap_data(uri ="https://redcap.idibell.cat/api/",
                           token = "55E5C3D1E83213ADA2182A4BFDEA")

In this case, there is no need to specify the event-form file, since the function will download it automatically through the API connection if the project is longitudinal.

Remember that the token gives anyone access to all of the project’s information, so be careful about who you share it with.

This function returns a list with 3 elements (imported data, dictionary and event-form mapping) which can then be used for further analysis or visualization.
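
The three elements can be accessed by name, as we do with the covican example below:

dataset$data        # imported records
dataset$dictionary  # project dictionary
dataset$event_form  # event-form mapping (for longitudinal projects)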

Data processing: rd_transform

The main function involved in processing the data is rd_transform. It is used to process the REDCap data read into R with redcap_data, as described above. Using the function’s arguments, we can perform different types of transformations on our data.

As previously stated, we will use the built-in dataset covican as an example.

The only elements that must be provided are the dataset to be transformed and the corresponding dictionary. If the project is longitudinal, as in the case of covican, the event-form dataset should also be specified. These elements can be supplied directly as the output of the redcap_data function or separately in different arguments.

# Option A: list object
covican_transformed <- rd_transform(covican)

# Option B: separately, with different arguments
covican_transformed <- rd_transform(data = covican$data, 
                                    dic = covican$dictionary, 
                                    event_form = covican$event_form)

# Print the results of the transformation
covican_transformed$results
1. Recalculating calculated fields and saving them as '[field_name]_recalc'

| Total calculated fields | Non-transcribed fields | Recalculated different fields |
|:-----------------:|:----------------:|:-----------------------:|
|         2         |      0 (0%)      |         1 (50%)         |


|     field_name      | Transcribed? | Is equal? |
|:-------------------:|:------------:|:---------:|
|         age         |     Yes      |   FALSE   |
| screening_fail_crit |     Yes      |   TRUE    |

2. Transforming checkboxes: changing their values to No/Yes and changing their names to the names of its options. For checkboxes that have a branching logic, when the logic is missing their values will be set to missing

Table: Checkbox variables advisable to be reviewed

| Variables without any branching logic |
|:-------------------------------------:|
|        type_underlying_disease        |

3. Replacing original variables for their factor version

4. Deleting variables that contain some patterns

This function returns a list with the transformed dataset, the transformed dictionary, and the output of the results of the transformation.

As we can see, there are four steps in the transformation, each briefly explained in the output of the function. These four steps are:

        1. Recalculation of REDCap calculated fields

        2. Checkbox transformation

        3. Replacement of the original variable by its factor version

        4. Elimination of variables containing some pattern

In addition, we can change the final structure of the transformed dataset by specifying in the final_format argument whether we want our data split by event or by form.
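
For instance, a call like the following (the exact accepted values are listed in the package documentation; "by_event" here is our assumption):

covican_by_event <- rd_transform(covican, final_format = "by_event")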

For more examples and information on extra arguments, see the vignette.

Queries

Queries are very important for ensuring the accuracy and reliability of a REDCap dataset. The collected data may contain missing values, inconsistencies, or other potential errors that need to be identified so they can be corrected later.

For all the following examples we will use the raw transformed data: covican_transformed.

rd_query

The rd_query function allows users to generate queries by using a specific expression. It can be used to identify missing values, values that fall outside the lower and upper limit of a variable and other types of inconsistencies.

Missings

If we want to identify missing values in the variables copd and age in the raw transformed data, a list of required arguments needs to be supplied.

example <- rd_query(covican_transformed,
                    variables = c("copd", "age"),
                    expression = c("%in%NA", "%in%NA"),
                    event = "baseline_visit_arm_1")

# Printing results
example$results
Report of queries

| Variables | Description | Event | Query | Total |
|:---------:|:-----------:|:-----:|:-----:|:-----:|
| copd | Chronic obstructive pulmonary disease | Baseline visit | The value should not be missing | 6 |
| age | Age | Baseline visit | The value should not be missing | 5 |

Expressions

The rd_query function is also able to identify outliers or observations that fulfill a specific condition.

example <- rd_query(variables="age",
                    expression=">70",
                    event="baseline_visit_arm_1",
                    dic=covican_transformed$dictionary,
                    data=covican_transformed$data)

# Printing results
example$results
Report of queries

| Variables | Description | Event | Query | Total |
|:---------:|:-----------:|:-----:|:-----:|:-----:|
| age | Age | Baseline visit | The value should not be >70 | 76 |

More examples of both functions can be seen at the vignette.

Output

When the rd_query function is executed, it returns a list that includes a data frame with all the identified queries and a second element summarizing the number of queries generated for each specified variable and each applied expression:

| Identifier | DAG | Event | Instrument | Field | Repetition | Description | Query | Code |
|:----------:|:---:|:-----:|:----------:|:-----:|:----------:|:-----------:|:-----:|:----:|
| 100-58 | Hospital 11 | Baseline visit | Comorbidities | copd | · | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 100-58-1 |

Report of queries

| Variables | Description | Event | Query | Total |
|:---------:|:-----------:|:-----:|:-----:|:-----:|
| copd | Chronic obstructive pulmonary disease | Baseline visit | The value should not be missing | 6 |

The data frame is designed to help users locate each query in their REDCap project. It includes information such as the record identifier, the Data Access Group (DAG), and the event in which each query can be found, along with the name and description of the analyzed variable and a brief description of the query.

check_queries

Once the process of identifying queries is complete, the typical approach is to address them by modifying the original dataset in REDCap and then re-run the query identification process, generating a new query dataset.

The check_queries function compares the previous query dataset with the new one via the arguments old and new, respectively. The output remains a list with 2 items, but the data frame containing the information for each query will now have an additional column (“Modification”) indicating which queries are new, which have been modified, which have been corrected, and which remain pending. In addition, the summary will show the number of queries in each of these categories:

check <- check_queries(old = example$queries, 
                       new = new_example$queries)

# Print results
check$results
Comparison report

| State | Total |
|:-----:|:-----:|
| Pending | 7 |
| Solved | 3 |
| Miscorrected | 1 |
| New | 1 |

There are 7 pending queries, 3 solved queries, 1 miscorrected query, and 1 new query between the previous and the new query dataset.

Note: The “Miscorrected” category includes queries that belong to the same combination of record identifier and variable in both the old and new reports, but with a different reason. For instance, if a variable had a missing value in the old report, but in the new report shows a value outside the established range, it would be classified as “Miscorrected”.

Query control output:

| Identifier | DAG | Event | Instrument | Field | Repetition | Description | Query | Code | Modification |
|:----------:|:---:|:-----:|:----------:|:-----:|:----------:|:-----------:|:-----:|:----:|:------------:|
| 100-58 | Hospital 11 | Baseline visit | Comorbidities | copd | · | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 100-58-1 | Pending |
| 100-79 | Hospital 11 | Baseline visit | Comorbidities | copd | · | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 100-79-1 | New |
| 102-113 | Hospital 24 | Baseline visit | Demographics | age | · | Age | The value is NA and it should not be missing | 102-113-1 | Pending |
| 105-11 | Hospital 5 | Baseline visit | Comorbidities | copd | · | Chronic obstructive pulmonary disease | The value is NA and it should not be missing | 105-11-1 | Pending |

Future improvements

In the short term, we would like to make some improvements to the query identification and tracking process to minimise errors and cover a wider range of possible structures. We would also like to extend the scope of the data processing to cover specific transformations of the data that may be required in some scenarios. As a long-term plan, we would like to complement this package with a Shiny application to facilitate the use of the package and make it as user-friendly as possible.

 

Designing Beautiful Tables in R

Learn how to design beautiful tables in R! Join our workshop on Designing Beautiful Tables in R, which is part of our workshops for Ukraine series.

Here’s some more info: 

Title: Designing Beautiful Tables in R

Date: Thursday, April 27th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Tanya Shapiro is a freelance data consultant, helping businesses make better use of their data with bespoke analytical services. She is passionate about data visualization and design, and fell in love with the online R community via #TidyTuesday. When she’s not working on data projects, you can find her cycling or exploring downtown St. Petersburg, Florida.

Description: When we think about data visualization, bar charts and line charts are often top of mind – but what about tables? Tables are a great way to summarize and display different metrics across many records. In this workshop, we will learn how to design visually engaging tables in R and how to enhance them with HTML/CSS techniques. From sparklines, to heatmaps, to embedded images, we’ll cover a variety of tricks to help elevate your tables!
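
As a small taste of the topic, here is a minimal example using the gt package, one popular option for styled tables in R (our own sketch; the workshop may use different packages):

library(gt)

mtcars[1:5, 1:4] |>
  gt(rownames_to_stub = TRUE) |>
  tab_header(title = "Motor Trend cars") |>
  fmt_number(columns = everything(), decimals = 1)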

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website, to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to a student from the waiting list. You can also indicate whether you would prefer us to prioritize students from developing countries when assigning the place(s) you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!








Creating Standalone Apps from Shiny with Electron [2023, macOS M1]

💡 I assume that…

    • You’re familiar with R / Shiny.
    • You know basic terminal commands.
    • You’ve heard about Node.js, npm, or JavaScript…

1. Why standalone shiny app?

First, let’s talk about the definition of a standalone app.

I’m going to define it this way:

An app that can run independently without any external help.

“External” here mostly means a web browser; in other words, a standalone app is software that can be installed and run without an internet connection.


RStudio can also be seen as a kind of standalone app: you can’t install or update packages without a network connection, but you can still use the software itself.

Creating a standalone app with Shiny is unfamiliar territory and quite complicated. What are the benefits, and why should you develop one anyway?

I can think of at least two.


1. Better user experience
Regardless of the deployment method, when using Shiny as a web app, you have to open your web browser, enter a URL, and run it.

The process of sending and receiving data over the network can affect the performance of your app.


However, a standalone app can run without a browser and use the OS’s resources efficiently, resulting in slightly faster and more stable execution.

Of course, the advantage is that it can be used without an internet connection.

2. Improved security
Shiny apps run through a web browser anyway, so if a “legendary” hacker had their way, they could pose a threat to the security of Shiny apps.

However, a standalone app is somewhat immune to this problem, as long as nobody physically breaks into your PC.

2. Very short introduction of electron

Electron (or more precisely, Electron.js) is a technology that embeds Chromium and Node.js in binary form so you can build desktop software with the (shiny!) technologies used in web development: HTML, CSS, and JavaScript, to quote a bit from the official page.

It’s a story I still don’t fully understand, but fortunately there have been numerous attempts to make Shiny standalone with Electron.js before, and that dedication has led to shared templates that remove the “relatively complex” parts of the process.

The article I referenced was “turn a shiny application into a tablet or desktop app”, posted on R-bloggers in 2020, but times have changed so quickly that the instructions from then no longer work (at least not on my M1 Mac).

After a considerable amount of wandering, I found a GitHub repository that I could at least understand. Unfortunately, the repository was archived in April 2022, and some things needed to be updated as of March 2023.

Eventually, I was able to make the shiny app work as a standalone app.

And I’m going to leave some footprints for anyone else who might wander in the future.

3. Packaging shiny app with Electron

It’s finally time to get serious and package the Shiny app with Electron.

I’ll describe the steps in a detailed, follow-along way where possible, but if you run into any problems, please let me know by raising an issue in the repository.

(I’ve actually confirmed that it packages as a standalone app using the template by following the steps below.)


1. The first thing you need to do is install Node.js, npm, and Electron Forge using RStudio’s terminal. (I’ll skip these.)

2. Fork/clone (maybe even star ⭐) the template below:

https://github.com/zarathucorp/shiny-electron-template-m1-2023


3. Open the cloned project in RStudio (.Rproj).

4. If you get something like the output below (except perhaps for the version), you are good to go.



Now we start at line 6 of the template’s README.


Let’s name the standalone app we want to create (obviously) “helloworld”

💡 I’ll format directory names like /this

5. Run npx create-electron-app helloworld in the terminal to create the standalone app package. This will create a directory called /helloworld; then delete /helloworld/src.

6. Move the template’s files below to /helloworld and set the working directory to /helloworld:

    • start_shiny.R
    • add-cran-binary-pkgs.R
    • get-r-mac.sh
    • /shiny
    • /src
7. In the console, use version to check the version of R installed on your PC. Then run the shell script sh ./get-r-mac.sh in the terminal to install R for Electron. (The version on your PC and the R version in the shell script should be the same.)

8. Once you see that the /r-mac directory exists, install the automagic R package from the console.

9. Modify the package.json (change the author name, of course). The parts that should look like the image are the dependencies, repository, and devDependencies sections.



10. Develop a Shiny app. (Assuming you’re familiar with Shiny, I’ll skip the details; a minimal example is sketched below.)
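
For completeness, a minimal app could look like this (a sketch only; it assumes the template expects the app inside /helloworld/shiny):

# /helloworld/shiny/app.R
library(shiny)

ui <- fluidPage(
  titlePanel("helloworld"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot(hist(rnorm(input$n), main = "Random draws"))
}

shinyApp(ui, server)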

11. Install the R packages for Electron by running Rscript add-cran-binary-pkgs.R in the terminal.

12. In a terminal, update the package.json for Electron with npm install (this is a continuation of step 9).

13. In a terminal, verify that the standalone app works by running electron-forge start.

If, as happened to me, the Electron app still won’t run, exit the app, restart your R session in RStudio, and then run the standalone app again. (It seems to be an environment variable issue, such as R’s Shiny port.)



14. Once you’ve verified that start runs fine, create a distributable app with electron-forge make.


🥳 Voila, you have successfully made Shiny a standalone app using Electron.

4. Summary

If I’ve succeeded in my intentions, you should be able to use the template to make Shiny a standalone app using Electron in 2023 on an M1 Mac.

That app (delivered as a zip file) now makes the power of R / Shiny available:
    • to people with little experience,
    • without installing or using R,
    • even in a “closed environment” with no network connection.
Since Electron is technically Electron.js, my biggest challenge in creating a standalone app with it was utilizing JavaScript (in which my skills are limited compared to R).

Fortunately, I was able to do so by making some improvements to the templates that the pioneers had painstakingly created.

Thank you L. Abigail Walter, Travis Hinkelman, and Dirk Schumacher.
I’ll end this post with the template I built on, which I hope you’ll find useful.

Thank you.

(Translated with DeepL ❤️)

Updated nnlib2Rcpp package

For anyone interested, the ‘nnlib2Rcpp’ package has been updated with several added features. Among other changes, the latest v.0.2.1 of the nnlib2Rcpp package allows users to define the behavior of custom Neural Network (NN) components they create using only R code. This includes custom layers and sets of connections.

Package nnlib2Rcpp is based on the nnlib2 C++ NN library. It interfaces compiled C++ NN components with R. New types of NN components can be created using the provided C++ classes and templates. However, as of v.0.2.1, user-defined NN components can also be created without any need for C++. Of course, NN components defined using nnlib2 C++ classes and templates (as described in an older post), or components already included in the package can still be used. All such components can be added to neural networks defined in R via the nnlib2Rcpp package’s “NN” module and cooperate with each other.

Defining custom NN component behavior in R does have a cost in terms of runtime performance and, to a certain degree, defies many of the reasons for using the provided C++ classes. However, it may be useful for some purposes.

The goal of the simple example listed below is to implement, using only R, a NN with functionality similar to that described in the aforementioned post, which required some steps to be done in C++. In the example, the component functions (for a connection set and an output layer) required for a -simplified- perceptron-like NN are defined and the NN is set up. This is essentially a single-layer perceptron, as the first (“generic”) layer just accepts the data and transfers it to the connections without performing any computations.
library(nnlib2Rcpp)

# Function for connections, when recalling/mapping:

CSmap <- function(WEIGHTS, SOURCE_OUTPUT,...)
	SOURCE_OUTPUT %*% WEIGHTS

# Function for connections, when encoding data:

learning_rate <- 0.3

CSenc <- function( WEIGHTS, SOURCE_OUTPUT,
				   DESTINATION_MISC, DESTINATION_OUTPUT, ...)
{
  # desired output should have been placed in misc registers:
  a <- learning_rate *
          (DESTINATION_MISC - DESTINATION_OUTPUT)
  # compute connection weight adjustments:
  a <- outer( SOURCE_OUTPUT, a , "*" )
  # compute adjusted weights:
  w <- WEIGHTS + a
  # return new (adjusted) weights:
  return(list(WEIGHTS=w))
}

# Function for layer, when recalling/mapping:
# (no encode function is needed for the layer in this example)

LAmap <- function(INPUT_Q,...)
{
	x <- colSums(INPUT_Q)		# input function is summation.
	x <- ifelse(x > 0, 1, 0)	# threshold function is step.
	return(x)
}

# prepare some data based on iris data set:

data_in <- as.matrix(iris[1:4])
iris_cases <- nrow(data_in)

# make a "one-hot" encoding matrix for iris species
desired_data_out <- matrix(data=0, nrow=iris_cases, ncol=3)
desired_data_out[cbind(1:iris_cases,unclass(iris[,5]))]=1

# create the NN and define its components:
# (first generic layer simply accepts input and transfers it to the connections)

p <- new("NN")
p$add_layer("generic",4)
p$add_connection_set(list(name="R-connections",
                          encode_FUN="CSenc",
                          recall_FUN="CSmap"))
p$add_layer(list(name="R-layer",
                 size=3,
                 encode_FUN="",
                 recall_FUN="LAmap"))
p$create_connections_in_sets(0,0)

# encode data and desired output (for 50 training epochs):

for(i in 1:50)
	for(c in 1:iris_cases)
	{
		p$input_at(1,data_in[c,])
		p$set_misc_values_at(3,desired_data_out[c,])  # put desired output in misc registers
		p$recall_all_fwd();
		p$encode_at(2)
	}

# Recall the data and show NN's output:

for(c in 1:iris_cases)
{
	p$input_at(1,data_in[c,])
	p$recall_all_fwd()
	cat("iris case ",c,", desired = ", desired_data_out[c,],
		" returned = ", p$get_output_from(3),"\n")
}
More information can be found in the package’s documentation by typing:
help(NN_R_components)
A complete list of other changes made and features added to the package can be found here.