RADAR AI Edition: DataCamp’s Free Summit on How Generative AI is Transforming Data Science

Date: June 22, 9 AM — 3 PM ET
Hosts: DataCamp
Location: Digital
Cost: Free
How: click here to register

Exemplified by the soaring popularity of tools like ChatGPT and Midjourney, the rapid adoption of generative AI is transforming every industry as we know it. 

As these tools evolve at breakneck speed, how businesses and individuals react today will shape their success in years to come. To demystify AI and showcase a blueprint for thriving with these new technologies, DataCamp is holding a day of expert-led sessions to uncover how tools like ChatGPT and Generative AI are reshaping data science and society as a whole.

Throughout, the focus will be on how individuals and organizations can succeed with data in the age of AI. Each session will be interactive and led by some of the brightest industry and academic minds—including leaders from Microsoft, Thoughtspot, AIMultiple, Lux Capital, Antler, Bitynamics, Two Sigma Ventures, Wittingly Ventures, and more.

To see the full agenda, including topics and speakers, hit the link below.

Register Now

Streamline Your Analytical Method Development: Convert Signal to Concentration with Back-PredicteR


Back-PredicteR — An Easy-to-Use Shiny Application for Back Prediction of Analytical Method Signals to Concentrations

By Thomas de Marchin (Associate Director Statistics and Data Sciences at Pharmalex).

Thomas is a Data Scientist at Pharmalex. He is passionate about the incredible possibilities that data analytics offers to make the world a better place. You can contact him on LinkedIn or Twitter.

Photo by Hal Gatewood on Unsplash

What is an analytical method: An analytical method is a technique used to quantitatively or qualitatively determine the concentration or properties of a molecule in a sample. It typically involves sample preparation by an analyst, followed by the analysis of the sample using specialized equipment or techniques.

What is Shiny: Shiny is an R package that allows users to build interactive web applications directly from R without the need for extensive web development skills. It provides an easy-to-use framework for creating interactive and responsive web apps for data visualization, exploration, and analysis, making it a powerful tool for data scientists and analysts.
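For readers who have never seen Shiny code, a complete app needs only a UI definition and a server function. A minimal sketch (unrelated to Back-PredicteR's own source):

```r
library(shiny)

# UI: one input (a slider) and one output (a plot)
ui <- fluidPage(
  sliderInput("n", "Number of points:", min = 10, max = 100, value = 50),
  plotOutput("scatter")
)

# Server: re-renders the plot whenever the slider value changes
server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # opens the app in a browser
```

Shiny's reactivity takes care of the update logic: any output that reads `input$n` is recomputed when the slider moves, with no explicit event handling.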

Analytical methods play a crucial role in the pharmaceutical industry as they are used to determine the composition of samples. These methods typically involve sample preparation, such as centrifugation or heating, carried out by analysts or robots, followed by injecting the prepared sample into analytical equipment. The output from the equipment is usually a signal, such as millivolts or absorbance, which needs to be converted into a concentration that can be interpreted by humans, such as millimolars or percentages. This conversion requires (1) calibration samples of known concentration, (2) fitting an appropriate statistical model to establish the relationship between the signal and the concentration and (3) using this relationship to convert, or back predict, the signal of unknown samples into a concentration (Figure 1).

Figure 1: A calibration is needed to convert a signal into a concentration. The HPLC drawing is from DataBase Center for Life Science (DBCLS), distributed under a CC BY 4.0 license.
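The three-step workflow above can be sketched in a few lines of base R. The calibration numbers below are invented for illustration, not taken from a real method:

```r
# (1) Calibration samples of known concentration (mM) and measured signal (mV);
#     the numbers are made up for this example
conc   <- c(1, 2, 5, 10, 20)
signal <- c(2.1, 4.0, 10.2, 19.8, 40.5)

# (2) Fit a statistical model linking signal to concentration
fit       <- lm(signal ~ conc)
slope     <- coef(fit)[["conc"]]
intercept <- coef(fit)[["(Intercept)"]]

# (3) Back predict: invert the fitted line to turn the signal of an
#     unknown sample into a concentration
back_predict <- function(new_signal) (new_signal - intercept) / slope

back_predict(15)  # ~7.44 mM for this toy calibration
```

For a straight-line model the inversion is a one-liner; for the quadratic and logistic models listed later in the article, the same back-prediction step requires solving the fitted equation numerically, which is precisely the kind of detail an app like Back-PredicteR hides from the user.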

While simple linear models can be fitted using Excel, more advanced models may require specialized statistical software, which can be challenging for less experienced users to access and utilize. At Pharmalex, we love R-Shiny and we have developed an R-Shiny application called Back-PredicteR to address this challenge.

Figure 2: Screenshot of Back-PredicteR

Back-PredicteR is a user-friendly application written in R-Shiny that allows users to quickly fit various models commonly used in the analytical world (see the list below) to calibration data and back predict the concentration of their samples of interest from the acquired signal. With its intuitive interface, Back-PredicteR makes it easy for analysts to focus on the important questions rather than getting bogged down by technical details (Figure 2).

If you’re interested in trying Back-PredicteR, you can visit our custom software page at https://www.pharmalex.com/pharmalex-services/custom-software-development/

Conclusion

Calibration and back prediction are routine tasks in laboratories, and Back-PredicteR offers a streamlined and user-friendly solution for fitting advanced models and predicting sample concentrations from signals. However, choosing the right calibration model is just the first step in analytical method development; other aspects, such as sample preparation optimization and qualification/validation of the analytical method, also need to be considered. If you have any questions or would like to discuss further, please don’t hesitate to contact us!

Appendix

List of models available in Back-PredicteR:

1. Linear regression

2. Weighted (1/X) linear regression

3. Weighted (1/X²) linear regression

4. Linear regression after (base 10) LOGARITHM transformation of both concentration and response

5. Linear regression after SQUARE ROOT transformation of both concentration and response

6. Quadratic regression

7. Weighted (1/X) Quadratic regression

8. Weighted (1/X²) Quadratic regression

9. Four-parameter logistic regression

10. Weighted (POM) four-parameter logistic regression

11. Five-parameter logistic regression

12. Weighted (POM) five-parameter logistic regression

13. Power regression

14. Weighted (POM) Power regression
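For illustration, the simple and weighted linear variants in this list map directly onto `lm()` and its `weights` argument; 1/X² weighting down-weights the high-concentration calibration points. A toy sketch with made-up data (not Back-PredicteR code):

```r
# Toy calibration data (invented for this example)
conc   <- c(1, 2, 5, 10, 20, 50)
signal <- c(1.9, 4.2, 10.1, 20.5, 39.0, 101.0)

fit_ols  <- lm(signal ~ conc)                      # model 1: linear regression
fit_w1x  <- lm(signal ~ conc, weights = 1/conc)    # model 2: weighted (1/X)
fit_w1x2 <- lm(signal ~ conc, weights = 1/conc^2)  # model 3: weighted (1/X²)

# The 1/X² fit trusts the low-concentration points more than the OLS fit does
coef(fit_w1x2)
```

Weighting matters when the measurement error grows with concentration (a common pattern in analytical methods), since ordinary least squares would otherwise let the largest signals dominate the fit.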




Intermediate Web Scraping and API Harvesting using R workshop

Learn how to do web scraping in R! Join our workshop on Intermediate Web Scraping and API Harvesting using R, which is part of our workshops for Ukraine series. 


Here’s some more info: 

Title: Intermediate Web Scraping and API Harvesting using R

Date: Thursday, June 15th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Felix Lennert is a second-year Ph.D. student in Sociology at the CREST, ENSAE, Institut Polytechnique de Paris. He is a co-organizer of the Summer Institute of Computational Social Science in Paris. His research interests lie in the formation and polarization of political opinions which he tackles using a toolbox consisting of “classic” quantitative as well as text-as-data methods.

Description: Digital trace data are an integral element of computational social science (CSS) research. This course approaches web data collection at an intermediate level: we will not cover the fundamentals of selecting and downloading elements from static web pages, but neither will we go as far as firing up RSelenium to scrape dynamic pages. We will start with a brief revision of CSS selectors, then move on to rvest to simulate a browser session, fill forms, and click buttons. The second half of the session covers APIs and how to make requests to them, with tangible examples of API queries. Finally, exemplary workflows will be introduced to provide a scaffolding for students’ future research projects.
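As a taste of the CSS-selector revision the session starts with (a toy sketch, not the workshop materials), rvest can pull elements out of HTML with selectors; the hypothetical inline page below stands in for a downloaded one:

```r
library(rvest)

# A small inline HTML document stands in for a downloaded page
page <- read_html('
  <div class="talk"><h2>Scraping 101</h2><span class="speaker">Ada</span></div>
  <div class="talk"><h2>APIs in R</h2><span class="speaker">Grace</span></div>
')

# CSS selectors pick out the nodes of interest
titles   <- page |> html_elements(".talk h2") |> html_text()
speakers <- page |> html_elements(".speaker") |> html_text()

data.frame(title = titles, speaker = speakers)
```

The session-simulation and API topics build on the same pattern: fetch a document (or JSON response), then select and tidy the pieces you need.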

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us to allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning the place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!


Automating RMarkdown/Quarto reports workshop

Learn how to automate RMarkdown and Quarto reports! Join our workshop on Automating RMarkdown/Quarto reports, which is part of our workshops for Ukraine series. 


Here’s some more info: 

Title: Automating RMarkdown/Quarto reports

Date: Thursday, June 1st, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Indrek Seppo, a seasoned R programming expert, brings over 20 years of experience from the academic, private, and public sectors to the table. With more than a decade of teaching R under his belt, Indrek’s passionate teaching style has consistently led his courses to top the student feedback charts and has inspired hundreds of upcoming data analysts to embrace R (and Baby Shark).

Description: For those who already know the basics of RMarkdown/Quarto, I invite you to delve into the world of report automation to streamline your workflow and enhance efficiency. This session will introduce the use of parameters, among other techniques, to create dynamic and customizable reports without repetitive manual work. Learn how to harness the power of R to generate tailored content for diverse audiences, effortlessly updating data, analyses, and visualizations with just a few clicks. 
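To give a flavour of the parameter technique (a hypothetical minimal report, not the workshop materials): declare `params` in the YAML header, then pass different values at render time:

```r
# A minimal parameterized report (hypothetical file, written to a temp dir)
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: Sales report",
  "params:",
  "  region: Europe",
  "  year: 2023",
  "---",
  "",
  "Figures for `r params$region`, `r params$year`."
), rmd_file)

# One render call per audience: same template, different content
# (needs the rmarkdown package and pandoc, so this line is not run here)
# rmarkdown::render(rmd_file, params = list(region = "Asia", year = 2022))
```

Looping such a render call over a list of regions is the basic recipe for producing dozens of tailored reports from a single template.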


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



How can I register?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us to allocate the sponsored place to students who have signed up for the waiting list.


How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning the place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!











WrappR for Rstudio: Use Keyboard Shortcuts to Wrap Highlighted Text With Custom Code

Introduction:
While coding in RStudio, I wanted to use keyboard shortcuts to wrap functions and custom lines of code around datasets, code, or objects in the editor pane. I could not find what I wanted after reviewing various packages and solutions, so I made my own by combining the RStudio API, the shrtcts package, and an HTA interface configuration tool.

Example:
Type mtcars$mpg in the editor pane, highlight that line, press Ctrl+T, and you will have table(mtcars$mpg) with less typing. WrappR wraps the “table(” and “)” around mtcars$mpg when you press the assigned keyboard shortcut.
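The wrapping step itself is simple string interposition; inside RStudio, a tool like this can apply it to the live selection through the RStudio API. A hedged sketch of the idea (not WrappR's actual source):

```r
# Pure helper: interpose the selected text between a left and a right part
wrap_text <- function(selection, left, right) paste0(left, selection, right)

wrap_text("mtcars$mpg", "table(", ")")
#> [1] "table(mtcars$mpg)"

# Inside an RStudio session, the same idea applied to the live selection
# (rstudioapi only works inside RStudio, so this part is commented out):
# ctx <- rstudioapi::getActiveDocumentContext()
# sel <- rstudioapi::primary_selection(ctx)
# rstudioapi::modifyRange(sel$range, wrap_text(sel$text, "table(", ")"), ctx$id)
```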

Getting Started:

1. Requirements: a Windows PC
(WrappR is not ready for Linux or Mac yet)
The following software needs to be installed on the PC:
– R
– RStudio
– the gadenbuie/shrtcts package* (installed in R)

install.packages("remotes")
remotes::install_github("gadenbuie/shrtcts")

2. WrappR setup and operation (assuming R, RStudio, and the gadenbuie/shrtcts package are already installed):
a. Unzip and place the files in the folder you want to launch the WrappR tool from.
b. Launch the WrappR.hta file; you should see one entry for wrappinc.
c. The zip file includes a “.shrtcts.yaml” file. If you want to use a pre-defined set of WrappR entries, place this file in the ‘C:\Users\%username%\.config’ folder. If that folder doesn’t exist, create .config in the root of your Windows profile folder.

3. Container Layout:



ROW ONE – Text name. This text is searchable from the Ctrl+F find function.

ROW TWO – Editable name field for the User-Defined-Code (UDC), along with management buttons. Use the Duplicate button to make a new UDC.
(Note: UDC names can’t contain whitespaces and conflicts occur when UDCs have identical names)

ROW THREE – User-Defined-Code (UDC). Your highlighted text in the Rstudio editor pane will be interposed between text from the left and right fields defined here.

ROW FOUR – Assigned keyboard shortcut. Ctrl, Alt, Shift, and an alphabetic character can be used in assignments. Use different combinations of these modifier keys with an alphabetic character to create more assignments. 

Examples:
Ctrl+D can be used to produce data.frame()
Ctrl+Shift+D can be used to produce as.data.frame()

Save & Search:



Pressing the Save button saves all your entries to the configuration file at C:\Users\%username%\.config\.shrtcts.yaml.

Type in any text to search among your entries. To revert, clear the search field; you may also need to press the Enter key to return all the entries.

Lastly, run the following code and then close and re-open RStudio to utilize your new shortcut.

if (interactive() && requireNamespace("shrtcts", quietly = TRUE)) {
  shrtcts::add_rstudio_shortcuts(set_keyboard_shortcuts = TRUE)
}


*Aden-Buie G (2022). shrtcts: Make Anything an RStudio Shortcut. https://pkg.garrickadenbuie.com/shrtcts, https://github.com/gadenbuie/shrtcts.

Constant Proportion Portfolio Insurance (CPPI) trading strategy with Bitcoin data

(This is NOT financial advice, and the content is only for educational purposes)

Introduction to CPPI strategy

The Constant Proportion Portfolio Insurance (CPPI) strategy is a risk management technique that dynamically adjusts the allocation of an investment portfolio between risky and safe assets. The goal of the strategy is to ensure that a minimum level of wealth is maintained, while also participating in potential gains from risky assets. It delivers convex, option-like payoffs, but without using options. 

In the simplest case there are two assets: a risky one and a safe one. The amount allocated to the risky asset is a constant multiple (the multiplier) of the cushion, i.e. the difference between the current portfolio value and the floor.

Getting Started

To implement the CPPI strategy, we will need the following R packages:

  • quantmod: to download the Bitcoin historical data.
  • ggplot2: for graphics.

To implement the CPPI strategy we will use Bitcoin as the risky asset. We will download the monthly historical data from Yahoo Finance using the getSymbols function from the quantmod package, and then calculate the monthly returns using the monthlyReturn function. We will assume a risk-free rate of 2% per annum.

CPPI Strategy

To implement the strategy we need the following inputs:

  • S: numeric: returns path of risky asset.
  • multiplier: numeric
  • floor: numeric: a percentage, should be smaller than 1.
  • r: numeric: interest rate (per time period tau).
  • tau: numeric: time periods.
We will assume that the investor sets the floor at 0.8, meaning they are willing to tolerate a maximum loss of 20% of the initial asset value before selling off the risky asset and moving entirely into the safe asset. The next step is to decide how much to allocate to the risky asset. If the maximum tolerated loss is 20% (value of assets minus floor) and the investor sets a multiplier of 3, the proportion of the initial assets invested in the risky security is 60% (3 × 20%). This starting point is equivalent to a 60/40 strategy, with 60% of the initial investment in the risky asset and 40% in the safe asset.

Implementing the CPPI strategy involves rebalancing frequently to guarantee that the investments never breach the floor. The absence of transaction costs in this Bitcoin example allows costless frequent rebalancing; when trading other assets, such as exchange-traded ETFs, transaction costs should be taken into account.

In the code below we first show an implementation of the CPPI strategy with a static floor. We then show an alternative that resets the floor in each time period. We backtest the CPPI strategy against a 100% Bitcoin investment as well as a buy-and-hold strategy with 60% in the risky asset (Bitcoin) and 40% in the safe asset.

    library(ggplot2)
    library(quantmod)

    getSymbols("BTC-USD", from = "2020-01-06", to = Sys.Date(), auto.assign = TRUE)
    bitcoin <- `BTC-USD`

    # Calculate monthly returns of the Bitcoin series (column 6 = adjusted close)
    btc_monthly <- to.monthly(bitcoin)[, 6]
    btc_monthly_returns <- monthlyReturn(btc_monthly)
    r <- 0.02
    risk_free_rate <- rep(r/12, length(btc_monthly_returns))

    floor <- 0.8
    multiplier <- 3 # the initial risky investment is (1-0.8)*3 of the initial value, equivalent to a 60/40 starting portfolio
    start_value <- 1000 # initial investment
    date <- index(btc_monthly_returns)
    n_steps <- length(date)
After setting the key inputs for the CPPI algorithm, we initialise the starting values: the initial account value (here assumed to be $1000), the weights for the risky and safe assets, and the wealth index of the risky asset. We then run a for loop over the time periods, constraining the risky weight to stay between 0 (no short selling) and 1 (no leverage).
    account_value <- cushion <- risky_weight <- safe_weight <- risky_allocation <- safe_allocation <- numeric(n_steps)
    account_value[1] <- start_value
    floor_value <- numeric(n_steps)
    floor_value[1] <- floor*account_value[1]
    cushion[1] <- (account_value[1]-floor_value[1])/account_value[1]
    risky_weight[1] <- multiplier * cushion[1]
    safe_weight[1] <- 1-risky_weight[1]

    risky_allocation[1] <- account_value[1] * risky_weight[1]
    safe_allocation[1] <- account_value[1] * safe_weight[1]

    risky_return <- cumprod(1+btc_monthly_returns)

    # Static floor: horizontal line
    for (s in 2:n_steps) {
      account_value[s] <- risky_allocation[s-1]*(1+btc_monthly_returns[s]) + safe_allocation[s-1]*(1+risk_free_rate[s-1])
      floor_value[s] <- account_value[1] * floor
      cushion[s] <- (account_value[s]-floor_value[s])/account_value[s]
      risky_weight[s] <- multiplier*cushion[s]
      risky_weight[s] <- min(risky_weight[s], 1)
      risky_weight[s] <- max(0, risky_weight[s])
      safe_weight[s] <- 1-risky_weight[s]
      risky_allocation[s] <- account_value[s] * risky_weight[s]
      safe_allocation[s] <- account_value[s] * safe_weight[s]
    }
    buy_and_hold <- cumprod(1+(0.6*btc_monthly_returns + 0.4*risk_free_rate))

    z_static <- cbind(account_value/start_value, risky_return, buy_and_hold, floor_value/start_value)

    # Rename the columns
    colnames(z_static) <- c("CPPI", "Bitcoin", "Mixed", "Floor_Value")

    ggplot(z_static, aes(x = index(z_static))) +
      geom_line(aes(y = CPPI, color = "CPPI")) +
      geom_line(aes(y = Bitcoin, color = "Bitcoin")) +
      geom_line(aes(y = Mixed, color = "60-40")) +
      geom_line(aes(y = Floor_Value, color = "Floor Value")) +
      labs(title = "CPPI Strategy with Static Floor",
           x = "Time",
           y = "Cumulated Return",
           color = "Strategy") +
      theme_bw()

The plot shows that the CPPI strategy does not breach the floor at the beginning of the series. Then, the more the account value deviates from the floor, the greater the allocation to the risky asset, up to 100%, exceeding the 60/40 strategy, only to end up below the other strategies at the end of the backtesting period. A floor that does not change over time is not very helpful, as we end up with a strategy similar to the 60/40 buy-and-hold, where the downside protection seems ineffective. The next algorithm is built to allow the floor to reset automatically with each new high of the portfolio.
    # Dynamic floor: step-wise update of the floor
    # Keep copies of the static-floor results before the vectors are overwritten
    account_value_static <- account_value
    risky_weight_static  <- risky_weight
    safe_weight_static   <- safe_weight

    peak <- dynamic_floor <- numeric(n_steps)

    peak[1] <- start_value
    dynamic_floor[1] <- floor*account_value[1]

    for (s in 2:n_steps) {
      account_value[s] <- risky_allocation[s-1]*(1+btc_monthly_returns[s]) + safe_allocation[s-1]*(1+risk_free_rate[s-1])

      peak[s] <- max(start_value, cummax(account_value[1:s]))
      dynamic_floor[s] <- floor*peak[s]

      cushion[s] <- (account_value[s]-dynamic_floor[s])/account_value[s]
      risky_weight[s] <- multiplier*cushion[s]
      risky_weight[s] <- min(risky_weight[s], 1)
      risky_weight[s] <- max(0, risky_weight[s])
      safe_weight[s] <- 1-risky_weight[s]
      risky_allocation[s] <- account_value[s] * risky_weight[s]
      safe_allocation[s] <- account_value[s] * safe_weight[s]
    }

    z_dynamic <- cbind(account_value/start_value, risky_return, buy_and_hold, dynamic_floor/start_value)

    # Rename the columns
    colnames(z_dynamic) <- c("CPPI", "Bitcoin", "Mixed", "Floor_Value")

    ggplot(z_dynamic, aes(x = index(z_dynamic))) +
      geom_line(aes(y = CPPI, color = "CPPI")) +
      geom_line(aes(y = Bitcoin, color = "Bitcoin")) +
      geom_line(aes(y = Mixed, color = "60-40")) +
      geom_line(aes(y = Floor_Value, color = "Dynamic Floor")) +
      labs(title = "CPPI Strategy with Dynamic Floor",
           x = "Time",
           y = "Cumulated Return",
           color = "Strategy") +
      theme_bw()

The new algorithm works like a dynamic call option that resets the floor at each new high. The reallocation into the safe asset completely offsets losses after January 2021, while fully capturing the upside up until then. How do our strategies compare with each other? To answer this question we compare annualised returns, volatilities and Sharpe ratios for both CPPI strategies, the 60/40 strategy and a portfolio 100% invested in Bitcoin.

    # Annualized returns
    bh_aret_st <- as.numeric((buy_and_hold[length(buy_and_hold)])^(12/n_steps)-1)      # buy-and-hold strategy
    bitcoin_aret_st <- as.numeric((risky_return[length(risky_return)])^(12/n_steps)-1) # 100% Bitcoin
    cppi_aret_st  <- (account_value_static[n_steps]/start_value)^(12/n_steps)-1        # static CPPI
    cppi_aret_dyn <- (account_value[n_steps]/start_value)^(12/n_steps)-1               # dynamic CPPI

    # Monthly and annualized volatility
    bh_vol_st <- sd(0.6*btc_monthly_returns + 0.4*risk_free_rate)
    bitcoin_vol_st <- sd(btc_monthly_returns)
    cppi_vol_st  <- sd(risky_weight_static*btc_monthly_returns + safe_weight_static*risk_free_rate)
    cppi_vol_dyn <- sd(risky_weight*btc_monthly_returns + safe_weight*risk_free_rate)

    bh_avol_st <- bh_vol_st*sqrt(12)
    bitcoin_avol_st <- bitcoin_vol_st*sqrt(12)
    cppi_avol_st <- cppi_vol_st*sqrt(12)
    cppi_avol_dyn <- cppi_vol_dyn*sqrt(12)

    # Sharpe ratios
    bh_sr_st <- (bh_aret_st-0.02)/bh_avol_st
    bitcoin_sr_st <- (bitcoin_aret_st-0.02)/bitcoin_avol_st
    cppi_sr_st <- (cppi_aret_st-0.02)/cppi_avol_st
    cppi_sr_dyn <- (cppi_aret_dyn-0.02)/cppi_avol_dyn

    summary <- c(bitcoin_aret_st, bitcoin_avol_st, bitcoin_sr_st, bh_aret_st, bh_avol_st, bh_sr_st, cppi_aret_st, cppi_avol_st, cppi_sr_st, cppi_aret_dyn, cppi_avol_dyn, cppi_sr_dyn)
    summary <- matrix(summary, nrow=3, ncol=4, byrow=FALSE, dimnames=list(c("Annualised return","Annualised Vol","Sharpe Ratio"), c("Bitcoin", "Buy-and-hold", "CPPI Static Floor", "CPPI Dynamic Floor")))
    round(summary, 2)

                      Bitcoin   Buy-and-hold CPPI Static  CPPI Dynamic
    Annualised return 0.34      0.27         0.18         0.25
    Annualised Vol    0.73      0.44         0.68         0.27
    Sharpe Ratio      0.43      0.58         0.23         0.86
    

The fully risk-on portfolio, 100% invested in Bitcoin, has the highest annualised return (34%) but also the highest volatility (73%). While there is little benefit to the static CPPI strategy (lowest Sharpe ratio, at 0.23) compared with, for example, a simple buy-and-hold, the dynamic CPPI performs much better. Its Sharpe ratio is markedly higher than that of all the other strategies, mainly thanks to lower volatility without sacrificing much return.

RMarkdown and Quarto – Mastering the Basics workshop

Learn how to use RMarkdown and Quarto! Join our workshop on RMarkdown and Quarto – Mastering the Basics, which is part of our workshops for Ukraine series. 

Here’s some more info: 

Title: RMarkdown and Quarto – Mastering the Basics

Date: Thursday, May 25th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Indrek Seppo, a seasoned R programming expert, brings over 20 years of experience from the academic, private, and public sectors to the table. With more than a decade of teaching R under his belt, Indrek’s passionate teaching style has consistently led his courses to top the student feedback charts and has inspired hundreds of upcoming data analysts to embrace R (and Baby Shark).

Description: Discover the power of RMarkdown and its next-generation counterpart, Quarto, to create stunning reports, slides, dashboards, and even entire books—all within the RStudio environment. This session will cover the fundamentals of markdown, guiding you through the process of formatting documents and incorporating R code, tables, and graphs seamlessly. If you’ve never explored these tools before, prepare to be amazed by their capabilities. Learn how to generate reproducible reports and research with ease, enhancing your productivity and efficiency in the world of data analysis.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

How can I register?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the registration form, attaching a screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation).

If you are not personally interested in attending, you can also contribute by sponsoring the participation of a student, who will then be able to participate for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or you can leave it up to us to allocate the sponsored place to students who have signed up for the waiting list.

How can I sponsor a student?

  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning the place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list.)

You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.

Looking forward to seeing you during the workshop!



Working with Big Data with Hadoop and Spark workshop

Learn how to work with Big Data with Hadoop and Spark! Join our workshop on Working with Big Data with Hadoop and Spark, which is part of our workshops for Ukraine series. 

Here’s some more info:

Title: Working with Big Data with Hadoop and Spark

Date: Thursday, May 18th, 18:00 – 20:00 CEST (Rome, Berlin, Paris timezone)

Speaker: Jannic Cutura is an economist turned data engineer turned software engineer who works as a Python developer in the European Central Bank’s stress test team. Prior to his current position, he worked as a research analyst/data engineer in the financial stability and monetary policy divisions of the ECB. He holds a master’s and a Ph.D. in quantitative economics from Goethe University Frankfurt and conducted research projects at the BIS, the IMF and Columbia University.

Description: Big data — datasets that are difficult to handle on standalone retail-grade computers — are rapidly becoming the norm in social science research. This is true both in academia and for policy-oriented research in central banks and similar bodies (let alone industry applications). Yet traditional econometrics (and econometrics training) tells us little about how to work efficiently with large datasets. In practice, any dataset larger than the researcher’s computer memory (~20–30 GB) is very challenging to handle: once that barrier is crossed, most data manipulation tasks become painfully slow and prone to failure. The goal of this presentation is to (i) explain what happens under the hood when your computer gets slow and (ii) show how distributed computing (in particular Hadoop/Spark) can help to mitigate those issues. By the end, participants will understand the power of distributed computing and how to use it to tackle existing data-handling challenges as well as new ones that were previously prohibitively expensive to evaluate on retail-grade computers. The workshop contains a theory part and a lab session using Databricks. If you want to follow along during the live session, you can create a free Databricks account by signing up for the community edition (no credit card required).
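To make the distributed-computing idea concrete in R (an illustrative sketch, not the workshop materials, which use Databricks): with the sparklyr package, the same dplyr pipeline can run on an in-memory data frame or, translated to Spark SQL, on a cluster:

```r
library(dplyr)

# The analysis, written once with dplyr verbs
summarise_cyl <- function(tbl) {
  tbl |>
    group_by(cyl) |>
    summarise(mean_mpg = mean(mpg), n = n()) |>
    arrange(cyl)
}

# Locally, on an in-memory data frame:
summarise_cyl(mtcars)

# On a Spark cluster, sparklyr translates the identical pipeline to Spark SQL
# (requires a Spark installation, so this part is commented out):
# library(sparklyr)
# sc <- spark_connect(master = "local")
# cars_tbl <- copy_to(sc, mtcars)
# summarise_cyl(cars_tbl) |> collect()
# spark_disconnect(sc)
```

Because only aggregated results are pulled back with collect(), the full dataset never has to fit into the analyst's memory, which is the central point of the workshop.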

    Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

    How can I register?

    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)
    • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

    If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will likewise go directly to organisations working in Ukraine. You can either sponsor a particular student or leave the choice to us, in which case we will allocate the sponsored place to students who have signed up for the waiting list.


    How can I sponsor a student?


    • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

    • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or let us allocate the spot to students from the waiting list. You can also indicate whether you would like us to prioritize students from developing countries when assigning the place(s) you sponsored.

    If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).


    You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials here.


    Looking forward to seeing you during the workshop!









    WooCommerce Administrator with R

    When working with WooCommerce, you have access to a powerful and robust product that offers a variety of benefits. Not only is it free and supported by a large community, but it also has strong SEO capabilities and a vast selection of plugins to enhance functionality. Additionally, the WooCommerce admin tool is user-friendly and easy to navigate, requiring minimal time and effort to learn. In fact, most individuals can become proficient in its use in just one to two hours.

    However, there is a drawback. The core concept of WooCommerce is that it’s like managing a storefront. Regardless of the size of your business, whether small or large, it’s still just a storefront. This means that it lacks serious back-office capabilities, and the only way to manage your products is one by one, similar to how you would rearrange products in a storefront window.
    If you’re managing an e-shop professionally, simply rearranging the products one by one as in a shop window won’t suffice. To stay ahead of the competition, you need to provide your customers (or boss) with more advanced capabilities and perform tasks quickly.
    While it’s true that there are plugins available for almost anything you can think of, they often come at a cost. Moreover, they can negatively impact the speed of your store and lead to compatibility issues with each other. If you end up using more than 3-4 plugins, errors are bound to occur, making your workflow inefficient.

    Over the years, I have faced several challenges in managing e-shops, and I finally decided to overcome them. After much effort, I have written over 90 functions in R (6,400+ lines of code) and used the WooCommerce API to develop a robust solution to these problems.
    The central concept is to create a duplicate of the essential features of the e-shop — categories, tags, attributes, products, customers, and orders — inside RStudio, and to utilize custom functions to perform filtering and CRUD operations through the REST API.
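    For a flavour of what such a REST round-trip looks like, here is a minimal sketch in base R; the `wc_url` helper, the store address, and the credentials are illustrative placeholders, not the book’s actual functions:

    ```r
    # Hypothetical helper: build a WooCommerce REST API request URL.
    # Real calls would then be issued with a package such as httr or curl;
    # the store URL and keys below are placeholders.
    wc_url <- function(store, resource, params = list()) {
      base <- paste0(store, "/wp-json/wc/v3/", resource)
      if (length(params) == 0) return(base)
      query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
      paste0(base, "?", query)
    }

    url <- wc_url("https://example-shop.com", "products",
                  list(per_page = 100, category = "15"))
    # e.g. httr::GET(url, httr::authenticate(consumer_key, consumer_secret))
    url
    ```

    Fetching entities this way, page by page, is what lets the whole catalogue be mirrored into local data frames for filtering and batch updates.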

    The solution is seamlessly integrated with WooCommerce through the REST API, and the source code is explained in detail, making it easy for you to modify the functions to suit your needs or even create new ones. I have incorporated multiple ways to achieve the same result, including GUI interfaces with the Tcl/Tk package, allowing you to customize your working environment.

    My book, “WooCommerce Administrator with R,” is available on Amazon in both Kindle and paperback formats. 

    One use case included in the book demonstrates how easy it is to add new products, whether they are variable or simple. By creating an xlsx file with the necessary data (one line per product), along with variation attributes and category paths, you can use a single command to pass all the information to your e-shop, with variations created automatically.
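    The spreadsheet is essentially one row per product, and the same shape can be mocked up directly as a data frame. The column names below are illustrative guesses; the book’s actual upload command and its exact schema are not reproduced here:

    ```r
    # Illustrative one-row-per-product table, mirroring the xlsx layout.
    # Variation attributes use "|" to separate options, as in the sale example.
    new_products <- data.frame(
      name          = c("Canvas Sneaker", "Leather Boot"),
      type          = c("variable", "simple"),
      regular_price = c(49.90, 89.00),
      category_path = c("Shoes/Women", "Shoes/Men"),
      v_Color       = c("White|Black", NA),
      `v_Shoe size` = c("40|41|42", NA),
      check.names   = FALSE
    )
    nrow(new_products)  # one line per product, ready for a single upload call
    ```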

    Check the video in this link to see how it is done: Create new products with WooCommerce API in R.

    Let’s see another example. You need to have a large sale on white women’s shoes, sizes 40 and 41 EU, as they are no longer in fashion and you have a lot of stock. You expect that smaller sizes will sell eventually. Act fast, customers, as the grand sale for white women’s shoes in sizes 40 and 41 EU will only last for two weeks!

    filter <- list(categories = "Shoes", variations = c("Color : White", "Shoe size : 40|41"))
    filtered <- filter_products(filter = filter, search.variations = TRUE)

    pr_filtered  <- filtered[1] %>% as.data.frame()  # parent products
    var_filtered <- filtered[2] %>% as.data.frame()  # filtered variations

    schema_name <- create_schema("name, v_Color, v_Shoe size, regular_price, sale_price, date_on_sale_to_gmt",
                                 template = FALSE, echo = TRUE)[[1]]

    my_products <- populate_schema(schema_name, data = pr_filtered, var_df = var_filtered,
                                   values.from.parent = FALSE)

    # adjust prices (50% off) and set the offer end date two weeks out
    my_products$sale_price <- as.numeric(my_products$regular_price) * 0.5
    my_products$date_on_sale_to_gmt <- paste0(Sys.Date() + 14, "T23:59:59")

    my_products <- keep.columns(my_products, "sale_price, date_on_sale_to_gmt") %>%
      filter(parent > 0)  # we want to update only the variations

    my_products <- modify_variations_batch(my_products, add.elements = FALSE)

    These commands, which may seem complex now, become simple to use once you have the source code and analysis. With these commands, you can complete your work in a matter of minutes, depending on the number of products you have, without ever needing to access the WP-Admin interface.

    In my book, I also address the challenge of managing metadata. The functions I provide enable you to add additional fields to your products, customers, and orders. For instance, you can add information such as barcodes, product costs, discount policies, sales representatives, and more. If you have brick-and-mortar stores, you can even create orders in batches and include metadata about your retail customers, such as age group, sex, new/old customer status, and so on. All of this data can be extracted in a single data frame for further analysis. It’s a powerful tool that you’ll surely find useful!

    I am confident that by learning to use the functions and basic directions provided in the book, you will see a significant improvement in your e-shop management capabilities. As an e-shop manager, this will allow you to work more efficiently and productively.
    If you are a business owner, you will gain a better understanding of the potential of your e-shop and be able to hire the appropriate personnel to manage it effectively.
    Furthermore, if you are interested in learning R, this book provides a great opportunity to do so while tackling real-life problems.
    Lastly, for college students and business executives, the skills and knowledge acquired from this book can be valuable to potential employers.

    I highly recommend checking out my book on Amazon, as it provides a comprehensive solution to common issues faced by e-shop managers and business owners.  Get started today and take your e-shop to the next level!



    John Kamaras (www.jkamaras.com)







    How much weight do you need to lose to drop an inch off the waist?

    This is an analysis inspired by Dan’s 2014 article and it’s meant to be a continuation of his idea with a larger sample size and a slightly different approach. 

    I used the PRAW package in Python to scrape data from the progresspics subreddit, and I ran the data analysis in R. I was able to collect data from 77 individuals, and I chose to split the entire analysis by gender, due to the differences in abdominal fat distribution between men and women.
    As can be observed from Table 1 and the plot, weight loss tends to be much higher for the males than for the females both in absolute units and percentage-wise. 

    I chose to run four different regression models for each gender. While Dan’s article only considered weight change, I also included weight percentage change, BMI change, and BMI percentage change.

    # Male
    lm(weightDiff ~ waistDiff, data, subset = gender == "Male")
    lm(weightDiff_perc ~ waistDiff, data, subset = gender == "Male")
    lm(bmiDiff ~ waistDiff, data, subset = gender == "Male")
    lm(bmiDiff_perc ~ waistDiff, data, subset = gender == "Male")
    
    # Female
    lm(weightDiff ~ waistDiff, data, subset = gender == "Female")
    lm(weightDiff_perc ~ waistDiff, data, subset = gender == "Female")
    lm(bmiDiff ~ waistDiff, data, subset = gender == "Female")
    lm(bmiDiff_perc ~ waistDiff, data, subset = gender == "Female")
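    Because each model regresses a weight (or BMI) change on the waist change, the fitted slope reads directly as “units lost per inch off the waist.” A synthetic check (not the scraped data) that the slope recovers a known rate:

    ```r
    # Synthetic illustration: if every inch off the waist corresponds to
    # exactly 8.6 lbs lost, the fitted slope recovers that rate.
    set.seed(1)
    waistDiff  <- runif(50, 1, 10)     # inches lost
    weightDiff <- 8.6 * waistDiff      # lbs lost, perfectly linear here
    fit <- lm(weightDiff ~ waistDiff)
    lbs_per_inch <- unname(coef(fit)["waistDiff"])
    lbs_per_inch  # 8.6
    ```

    On real data the points scatter around the line, so the slope is an average rate, which is exactly how the per-inch figures below should be read.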

    Based on the results from Table 2, we could conclude the following:

    Males
    1. On average, you need to lose 8.6 lbs (3.9 kg) to lose 1 inch off the waist.
    2. On average, you need to lose 1.4% of your weight to lose 1 inch off the waist.
    3. On average, you need to lose 1.1 BMI points to lose 1 inch off the waist.
    4. On average, you need to reduce your BMI by 1.4% to lose 1 inch off the waist.
    Females
    1. On average, you need to lose 4.8 lbs (2.1 kg) to lose 1 inch off the waist.
    2. On average, you need to lose 2.3% of your weight to lose 1 inch off the waist.
    3. On average, you need to lose 0.82 BMI points to lose 1 inch off the waist.
    4. On average, you need to reduce your BMI by 2.4% to lose 1 inch off the waist.
    Of course, the sample size is relatively small (especially for women) and the starting weights vary widely, so take these results with a grain of salt.
    If you want to delve deeper, you can find the data and the code I used here.