Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R workshop

Join our workshop on Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title: Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R

Date: Thursday, December 19th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Riva Quiroga is a linguist and educator based in Valparaíso, Chile. She is a Software Sustainability Institute Fellow, part of the R-Ladies Global Leadership Team, and a Women Techmakers Ambassador.

Description: In this workshop, we will cover the process of creating a fully customized and reproducible PDF report using Quarto and Typst, a modern typesetting and markup language designed for creating high-quality PDFs that offers a more user-friendly alternative to LaTeX. After walking participants through the building blocks of document layout, the workshop will focus on Quarto’s ability to translate CSS properties into Typst properties, a feature that expands the possibilities for customizing a document’s appearance.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)



Please note that the registration confirmation email will be sent 1 day before the workshop.


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!

DataCamp’s RADAR: Forward Edition

As 2024 draws to a close, which new trends should you be paying attention to?


Join DataCamp’s flagship conference RADAR: Forward Edition to explore developments in data and AI that will shape 2025 and beyond.

  • Date: November 13, 2024.
  • Location: Online
  • Cost: Free of charge

Experts from top tech organizations will cover topics such as generative AI, how leadership can leverage and transform their organizations with AI, career development in AI, and more.


Register now for free

Introduction to generalized linear models in R workshop

Join our workshop on Introduction to generalized linear models in R, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title:  Introduction to generalized linear models in R

Date: Thursday, December 5th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Bodo Winter is Professor of Linguistics at the Dept. of Linguistics and Communication, University of Birmingham, and a UKRI Future Leaders Fellow for the project “Making numbers meaningful”. He uses data science-driven methods to study gesture, iconicity, and numerical communication in language. Bodo has authored Statistics for Linguists: An Introduction Using R and co-founded the Birmingham Statistics for Linguists Summer School.

Description: In this talk, you’ll learn about the fundamentals of generalized linear models, a powerful extension of the general linear model/multiple regression. We will discuss different distributions that can be used to model a diverse range of data-generating processes and how to interpret models that use different link functions. In the hands-on part of the workshop, we’ll work through a dataset for which we are going to use a mixed Poisson regression model, implemented with the package brms. Materials for the hands-on session will be distributed a couple days prior to the workshop.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


Please note that the registration confirmation email will be sent 1 day before the workshop.


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!






DataCamp’s Free Access Week, Nov 4–10

Free Access Week lets you explore all of DataCamp’s features at zero cost for one week.


From November 4 (9 AM EST) to November 10 (11:59 PM EST), anyone can access DataCamp Premium features with a free DataCamp account.


Don’t have an account? Sign up here, and you’ll gain unlimited access until November 10.


What is DataCamp?


DataCamp is an online data and AI learning platform that helps over 14 million learners worldwide, from data newbies to advanced practitioners, upskill in new technologies.


All 500+ courses, career tracks, and products are available in-browser, so you don’t need any additional software to access DataCamp.


Whether you’re looking to develop your programming skills, enhance your CV, build AI skills, or upskill your team—Free Access Week gives you unlimited learning for all levels and career ambitions.


Sign Up Here

Free Data and AI Courses with 365 Data Science—Unlimited Access until Nov 21

365 Data Science opens its doors this November!

From November 1 to November 21, 2024 (8 a.m. UTC), enjoy unrestricted access to its entire platform, including a selection of R programming courses and projects.

This offer includes expert-led courses, hands-on projects, and interactive exercises on various data science and AI topics. With flexible, 24/7 access, learners can study at their own pace—making it an excellent opportunity to advance in these fields.
365 Data Science offers free data and AI courses until November 21st.

The Initiative’s Fourth and Biggest Year

As 2024 unfolds, 365 Data Science’s free access initiative enters its fourth consecutive year, reinforcing its dedication to democratizing data science education. This program—which emerged as a response to global lockdowns—has become a pivotal annual event in the data science community.

CEO Ned Krastev emphasizes the initiative’s alignment with industry trends: “As data science continues to shape our world, we’re committed to equipping learners with cutting-edge skills and knowledge.”

The 2023 edition saw remarkable engagement, with over 75,000 participants collectively spending over 3.6 million minutes on the platform and earning over 17,000 certificates. Krastev adds, “The consistent engagement each year reflects a global hunger for knowledge and skills in data science. We’re proud to support learners on their journey to excellence in this field.”

Boost Your Career with Professional Certifications

365 Data Science empowers learners to translate their newly acquired knowledge into tangible career advancements by offering free, authenticated proof of expertise—narrowing the divide between educational achievement and industry recognition.

The program offers diverse certifications encompassing broad career trajectories and niche specializations. These credentials can bolster participants’ profiles in the ever-evolving data science landscape.

Course collections to teach you the skills to become a data scientist, data analyst, or business analyst.

And these certifications are just the beginning of what 365 Data Science offers for free.

R Programming: A Core Focus

365 Data Science recognizes R’s pivotal role in data analysis and visualization, as evidenced by their tailored course offerings:

  • Introduction to R Programming: This foundational course covers syntax, data structures, and basic operations, setting the stage for your R journey.
  • The Complete Data Visualization Course with Python, R, Tableau, and Excel: While comprehensive in scope, this course significantly emphasizes R’s powerful visualization capabilities.

You can also find other vital courses to enhance your use of R and other programming languages, including Statistics, Probability, Statistical Tests in Sales and Marketing, Data Preprocessing, and more.
Interactive exercises to boost your practical data and AI skills

Comprehensive Learning Experience

While R programming is one of their focuses, 365 Data Science offers a wide array of courses covering various aspects of data science and AI:

  • Foundational Skills: Courses in data literacy, strategy, math, and more to build a solid analytical base
  • Programming Languages: Comprehensive courses in Python and SQL to complement your R skills
  • Machine Learning: From basic concepts to advanced algorithms and deep learning
  • Artificial Intelligence: Cutting-edge AI topics, including natural language processing and generative AI
  • Business Applications: Courses that bridge the gap between technical skills and real-world business scenarios

These diverse offerings ensure a well-rounded education in data science and AI.

Beyond Courses: Hands-On Projects & Career Prep

Theory alone isn’t enough in data science. 365 Data Science offers practical projects that let you apply your R skills to real-world scenarios. A standout example is the Housing Market Data Analysis in R Project, where you’ll use R to dissect and interpret housing market trends. But there is far more to explore on their website.

Mastering technical skills is just one part of pursuing a data science or AI career. You also need to articulate your knowledge and experience during interviews. 365 Data Science has just developed an innovative AI-powered interview preparation tool to help with just that. It offers real-time practice and customizability so you can walk into your next data science or AI interview well-prepared and self-assured.

The #365DataLearningChallenge

To add an element of excitement, 365 Data Science is running a learning challenge alongside the free access period. Participants can earn points by completing courses, projects, and exercises. The top performers can win lifetime platform access, career consultations, and portfolio feedback. So don’t miss this opportunity!

Take Advantage Before November 21

Mark your calendars! This free access window runs from November 1 to November 21, 2024.

Staying competitive is essential in today’s rapidly evolving data and artificial intelligence landscape. This three-week complimentary access to 365 Data Science presents a valuable investment opportunity for your professional growth.

Seize this moment to elevate your career and immerse yourself in data science and AI with 365 Data Science this November.

Begin your journey for free at 365 Data Science.

Free Interview Prep Tool for Data Professionals by 365 Data Science

In an exciting development for aspiring data scientists and analysts, 365 Data Science has launched InterviewAce, an AI-powered interview preparation tool designed to give candidates a competitive edge in the job market. This innovative platform offers a unique, personalized approach to interview prep, including the ability to practice technical questions such as R programming.

Explore InterviewAce for free here.

Tailored Preparation for Data Science Roles

InterviewAce stands out by offering customized interview simulations for a wide array of data-related positions, including:

  • Data Scientist
  • Data Analyst
  • Machine Learning Engineer
  • Business Analyst
  • BI Analyst
  • Data Architect
  • Risk Management Analyst
  • Tableau Developer
  • Database Administrator
  • Data Strategist

Unlike generic interview preparation tools, InterviewAce is specifically tailored for data professionals. It goes beyond general questions, offering a focused approach that addresses the unique challenges and requirements of data-related roles.

You can prepare for 10 roles, including data scientist, data analyst, Tableau developer, etc.

Realistic Interview Scenarios with Coding Capabilities

One of InterviewAce’s standout features is its ability to simulate real-world interview scenarios, including technical assessments. Users can demonstrate their coding skills directly within the tool, including R programming. This feature offers several benefits to candidates.

First, it allows users to familiarize themselves with common technical coding questions that are frequently asked in data science interviews. By practicing these questions, candidates can build confidence and improve their problem-solving skills.

Second, the platform provides immediate feedback on code quality and efficiency. This instant evaluation helps users identify areas for improvement and refine their coding practices to meet industry standards.

Last, InterviewAce enables candidates to enhance their ability to explain technical concepts clearly—demonstrating not only technical proficiency, but also the ability to communicate complex ideas effectively to both technical and non-technical audiences.

AI-Driven Personalization

InterviewAce leverages OpenAI’s GPT models to deliver a personalized preparation experience. The tool adapts to each user’s responses, offering customization for both HR and technical interview simulations. This tailored approach includes several key features.

The platform adjusts question difficulty based on the user’s proficiency, ensuring candidates are consistently challenged at an appropriate level.

The AI also analyzes and interprets the user’s previous experience, selecting targeted questions based on this knowledge.

Finally, InterviewAce simulates the interview style of specific target companies, helping candidates prepare for the unique approaches and expectations of their desired employers.

Comprehensive Feedback and Continuous Improvement

InterviewAce provides comprehensive feedback after each practice session, including overall scores with personalized comments. Users also receive individual critiques for each answer, providing detailed insights.

This feedback highlights areas for improvement, breaking down performance across various skill areas and offering tailored suggestions for specific questions or topics.

Furthermore, InterviewAce recommends additional study resources to help users expand their knowledge and address any gaps in understanding, ensuring continuous growth and preparation.
Feedback report for your interview answers

Key Features of InterviewAce

  • Daily practice with two interview sessions
  • Speech-to-text functionality for natural responses
  • Comprehensive reports and scoring to track progress over time
  • Company-specific interview customization to prepare for target employers
  • In-platform coding assessments, including R programming
  • Both HR and technical interview simulations to cover all aspects of the interview process

Free Access

InterviewAce is now available to all 365 Data Science users, requiring only a free account to access. This accessibility ensures that aspiring data professionals from all backgrounds can benefit from this powerful tool, regardless of their current career stage or financial situation. By democratizing access to high-quality interview preparation, 365 Data Science is helping to level the playing field in the competitive data science job market.

InterviewAce marks a significant advancement in interview preparation for data professionals. By combining AI technology with a focus on both technical and soft skills, 365 Data Science equips candidates with essential tools for success in the data science job market.

With its practical approach and personalized guidance, InterviewAce is a valuable resource for those looking to progress in this dynamic field.

Explore InterviewAce today to enhance your data science interview performance.

Visualizing Variance with Sankey diagrams/Riverplots using R: An Illustration with Longitudinal Multi-level Modeling workshop

Join our workshop on Visualizing Variance with Sankey diagrams/Riverplots using R: An Illustration with Longitudinal Multi-level Modeling, which is a part of our workshops for Ukraine series! 


Here’s some more info: 

Title: Visualizing Variance with Sankey diagrams/Riverplots using R: An Illustration with Longitudinal Multi-level Modeling

Date: Tuesday, November 26th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Daniel P. Moriarity, PhD is a clinical psychologist with a particular interest in immunopsychiatry, psychiatric phenotyping, and methods reform in biological psychiatry. He currently works as a Postdoctoral Fellow in the UCLA Laboratory for Stress Assessment and Research with Dr. George Slavich. Starting January 2025, he will join the University of Pennsylvania’s Psychology Department as an Assistant Professor of Clinical Psychology.

Description: This workshop will illustrate how to create Sankey diagrams/Riverplots with a focus on longitudinal multilevel modeling to separately visualize between-person and within-person variance. However, the technique can be applied to many other visualizations of different sources of variance (e.g., different variables, random vs. fixed effects). Data + code templates will be provided to follow along with.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

Please note that the registration confirmation email will be sent 1 day before the workshop.

How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.

If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!

Understanding Difference-in-Differences: Basics and Beyond with Applications in R workshop

Join our workshop on Understanding Difference-in-Differences: Basics and Beyond with Applications in R, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title:  Understanding Difference-in-Differences: Basics and Beyond with Applications in R

Date: Thursday, November 21st, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone)

Speaker: Tobias Eibinger is a final-year PhD candidate in Economics at the University of Graz, Austria. His research focuses on causal environmental policy evaluation, particularly focusing on transport policies and their impact on emission reductions. He specializes in advanced econometric techniques, including Difference-in-Differences, time-series analysis, and macropanels. He has spent time at the Central European University (CEU) in Vienna and the Vrije Universiteit (VU) Amsterdam to deepen and apply his knowledge in causal identification. His work emphasizes the practical application of these methods to analyze real-world policy effects and to inform policy recommendations.

Description: This workshop provides a solid introduction to Difference-in-Differences (DiD), covering both the foundational concepts and more advanced techniques needed to address common challenges in applied research. We begin by exploring canonical DiD and two-way fixed effects (TWFE) as a starting point. We then move on to more complex scenarios like staggered adoption and multiple treatments. We discuss the limitations of traditional DiD, particularly the issue of forbidden comparisons, and introduce the Goodman-Bacon (2021) decomposition to break down treatment effects. Dynamic settings are then covered through event studies, allowing us to examine how effects evolve over time. Finally, we discuss modern remedies such as the Callaway and Sant’Anna (2021) approach to better handle heterogeneous treatment timings. Throughout, participants will follow detailed R examples to apply these methods hands-on, gaining practical experience alongside the theoretical insights.


Minimal registration fee: 20 euro (or 20 USD or 800 UAH)


Please note that the registration confirmation email will be sent 1 day before the workshop.


How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!





Standardising R Projects with the ProjectTemplate package workshop

Join our workshop on Standardising R Projects with the ProjectTemplate package, which is a part of our workshops for Ukraine series! 

Here’s some more info: 

Title: Standardising R Projects with the ProjectTemplate package

Date: Thursday, November 7th, 14:00 – 16:00 CET (Rome, Berlin, Paris timezone)

Speaker: Michael Rasmussen is a data analyst based in Melbourne, Australia. Michael is passionate about using data visualization and machine learning to explore data, answer questions and provide insights for decision making. He has a rich work background, with strengths developed as both a psychologist and a data scientist, a strong theoretical background in statistics, and experience in machine learning.


Description: The aim of this workshop is to help attendees understand why standardised R projects are beneficial for the user, colleagues and the wider organisation. Attendees will then be introduced to ProjectTemplate, a package that supports R project workflows by batch processing data import, preparation and final analysis in a reproducible, effortless manner. Attendees will also be shown how the structure of the project files and workflows can be modified to suit their needs.

Minimal registration fee: 20 euro (or 20 USD or 800 UAH)

Please note that the registration confirmation email will be sent 1 day before the workshop.

How can I register?



  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the registration form, attaching a screenshot of a donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after donation).

If you are not personally interested in attending, you can also contribute by sponsoring a student’s participation; the student will then be able to attend for free. If you choose to sponsor a student, all proceeds will also go directly to organisations working in Ukraine. You can either sponsor a particular student or leave it up to us to allocate the sponsored place to a student who has signed up for the waiting list.


How can I sponsor a student?


  • Save your donation receipt (after the donation is processed, there is an option to enter your email address on the website to which the donation receipt is sent)

  • Fill in the sponsorship form, attaching the screenshot of the donation receipt (please attach the screenshot of the donation receipt that was emailed to you rather than the page you see after the donation). You can indicate whether you want to sponsor a particular student or we can allocate this spot ourselves to the students from the waiting list. You can also indicate whether you prefer us to prioritize students from developing countries when assigning place(s) that you sponsored.


If you are a university student and cannot afford the registration fee, you can also sign up for the waiting list here. (Note that you are not guaranteed to participate by signing up for the waiting list).



You can also find more information about this workshop series, a schedule of our future workshops, and a list of our past workshops, for which you can get the recordings & materials, here.


Looking forward to seeing you during the workshop!

Is round(0.5) 0 or 1?


Actually, both are possible

This article was originally published in Korean on YOZM-IT.

Various ways of doing data science

There are many programming languages in the world, and a great deal of software built on them. Many of these play an important role in data science.

For example, if you’re using funnel analysis to improve your product, you might want to

  • compare the bounce rates of funnel stages before and after an event, and
  • perform a proportion test to check whether the difference is statistically significant.
Image by the author

Meanwhile, data scientists come from varied career backgrounds and experiences, so they tend to use the tools they’re comfortable with, including Python, R, SAS, and more.

We see this quite a bit because, in most cases, the software you use doesn’t make much of a difference at the business level.

But what happens if the results differ depending on the software used?

The following image shows the results of running a proportion test in R, Python, and STATA for the example mentioned above.

Image from the author and CAMIS project

You can see that even though we used the same values, 1000 and 123, the p-value of the proportion test is slightly different for each tool.

There are many reasons why the computed value can differ depending on the tool used, such as

  • different algorithms in the core logic of the programming language, and
  • different default values for the parameters of the function.

In the example above, R applies a continuity correction by default. If you turn it off by setting the parameter correct to FALSE (“correct = F”), you can see that the result is the same as in STATA.

Image from CAMIS project
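As a rough illustration of what that parameter does, here is a minimal sketch in R. It assumes the example is a one-sample test of 123 successes out of 1000 trials against prop.test()'s default null proportion of 0.5; the figure above may have been set up differently, so this is not meant to reproduce its exact numbers.

```r
# Assumed setup (not taken from the figure): 123 successes out of 1000 trials,
# tested against prop.test()'s default null proportion of 0.5.
successes <- 123
trials <- 1000

# Default behaviour: the continuity correction is applied (correct = TRUE).
with_correction <- prop.test(x = successes, n = trials)

# Same test with the continuity correction turned off.
without_correction <- prop.test(x = successes, n = trials, correct = FALSE)

# The test statistics, and therefore the p-values, differ between the two calls.
print(unname(with_correction$statistic))
print(unname(without_correction$statistic))
print(with_correction$p.value)
print(without_correction$p.value)
```

The only thing that changes between the two calls is the default value of one argument, which is exactly the kind of difference that is easy to miss when comparing tools.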

Rounding

Next, let’s look at rounding, which comes up in far more everyday data analysis.

Image by the author

Similarly, you can see that the result of round differs depending on the software.

If the fee in some large financial transaction is “0.5 billion”, the rounded amount could be zero or 1 billion, depending on how the rounding is calculated.

Another case is logistic regression, where a different rounding rule can reverse the prediction: a predicted probability of exactly 0.5 rounds to class 0 under one rule and to class 1 under another.

Image from Wikipedia, edited by the author

Why is round different?

Let’s talk a little more about why rounding differs.

Rounding as we usually perceive it means changing 0 ~ 4 to 0, and 5 ~ 9 to 10, as shown in the image below.

For decimals, it means rounding to the nearest whole number by changing .0 ~ .4999.. to 0 and .5 ~ .9999.. to 1.

However, there are a number of mathematical interpretations of what to do when the value is exactly 0.5, and when it is a negative number.

Image from the Learning corner

For example, should round(-23.5) produce -23 or -24?

Both are possible, depending on the mathematical interpretation. They are called rounding half up and rounding half down, respectively. We can take this a step further and round both positive and negative numbers closer to zero, or the other way around.

This means that round(-23.5) will round to -23 and round(23.5) will round to 23, or they will round to -24 and 24, respectively. These are called Rounding half toward zero and Rounding half away from zero, respectively.

Finally, there are methods called Rounding half to even and Rounding half to odd, which break the tie by choosing the nearest even or odd integer, respectively.

In particular, the Rounding half to even method also goes by the names Convergent rounding, Statistician’s rounding, Dutch rounding, Gaussian rounding, and Bankers’ rounding, and is one of the official standard methods according to IEEE 754.
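The six tie rules named above can be written as one small Python function. This is an illustrative sketch, not the implementation used by any particular software; the function `round_half` and its rule names are hypothetical.

```python
import math

def round_half(x, rule):
    """Round x to the nearest integer; `rule` only decides exact .5 ties."""
    lower = math.floor(x)
    upper = lower + 1
    frac = x - lower
    if frac < 0.5:
        return lower
    if frac > 0.5:
        return upper
    # Exact tie: pick between the two neighbours according to the rule
    even, odd = (lower, upper) if lower % 2 == 0 else (upper, lower)
    toward_zero, away_from_zero = (lower, upper) if x > 0 else (upper, lower)
    choices = {
        "half_up": upper,
        "half_down": lower,
        "half_toward_zero": toward_zero,
        "half_away_from_zero": away_from_zero,
        "half_to_even": even,
        "half_to_odd": odd,
    }
    return choices[rule]

for rule in ("half_up", "half_down", "half_toward_zero",
             "half_away_from_zero", "half_to_even", "half_to_odd"):
    print(rule, round_half(-23.5, rule), round_half(23.5, rule))
```

For -23.5 and 23.5 the rules only disagree about which of the two neighbouring integers wins the tie; any value that is not exactly halfway rounds the same way under all six.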

Bankers’ rounding

Bankers’ rounding is the default method in R, so let’s look at it in a little more detail.

The image below shows the result of conventional rounding, where .5 always goes up, for values from 0.0 to 2.0.

Image from the author

While this may seem like a good idea, there is actually a problem. Because .5 is unconditionally rounded up to the next integer, the results are systematically biased toward larger values.

I don’t know the exact origin of the alternative, but one theory is that the US IRS used to collect taxes with this rounding, was sued for unfairly profiting from people whose amounts ended in .5, lost the case, and then switched to rounding .5 to the nearest even (or odd) number.

This means that by rounding .5 to the nearest even number instead, we can avoid the bias that was previously occurring.
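The bias over the 0.0 to 2.0 range discussed above can be checked with a short Python sketch. It uses the standard `decimal` module so that values such as 0.5 and 1.5 are exact; the helper name `rounded_total` is an assumption for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# 0.0, 0.1, ..., 2.0 as exact decimals
values = [Decimal(n) / Decimal(10) for n in range(21)]

def rounded_total(mode):
    """Sum of the values after rounding each one to a whole number."""
    return sum(int(v.quantize(Decimal("1"), rounding=mode)) for v in values)

true_total = sum(values)                           # exact sum of the inputs
total_half_up = rounded_total(ROUND_HALF_UP)       # .5 always goes up
total_half_even = rounded_total(ROUND_HALF_EVEN)   # .5 goes to the even neighbour

print(true_total, total_half_up, total_half_even)  # prints: 21.0 22 21
```

With .5 always rounded up, the rounded total overshoots the true total by 1, while rounding half to even sends 0.5 down and 1.5 up so the two ties cancel out.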


The problem with different results

In recent years, industries in various domains, including pharmaceuticals and finance, have been trying to switch from “commercial” software such as SPSS, SAS and STATA to “open source” software such as Python, R and Julia.

As with the rounding example above, this has raised the issue of different software producing different results, which can create problems in terms of reproducibility, uncertainty, accuracy, and traceability.

So if you’re using multiple software packages, you should be aware of why they produce different results and how to use them properly.

CAMIS project

Image from CAMIS project

CAMIS stands for Comparing Analysis Method Implementations in Software. 

This project compares the differences between software packages (or programming languages) and sets standards for producing the same results.

The core area of the project is “statistical computation”, so most contributions come from data science leaders who have a strong understanding of it.

But CAMIS is also an open source project that anyone can join, and it is maintained by a variety of people through regular discussions, collaboration, and sharing of project progress.

Below is one of the comparisons published on the CAMIS project’s webpage, which reviews how a one-sample t-test is run in each software, what the results are, and how well the results agree with each other.

Image from CAMIS project

The CAMIS project was started by members interested in moving from SAS to R in the medical and pharmaceutical industry. It therefore mainly focuses on R and SAS for major statistical analyses, but recently it has also been working on how to use Python for data science in broader industry domains.

Not only classical methods such as hypothesis tests and regression analysis, but also modern data science methods such as Bayesian statistics, causal inference, and novel implementations of existing methods (e.g. MMRM) are topics of interest in the project.

Sessions about the project are increasingly appearing at data science conferences, where researchers and contributors are encouraged to promote it, contribute to it, and use it as a reference.

Finally, the CAMIS project is also collaborating with academia beyond the data science industry, as similar topics have been published in The American Statistician and Drug Information Association, among others.

Image from The American Statistician
The project is also currently working with students on a thesis entitled “A comparison of MMRM methodology in SAS and R software” and is open to collaborations and suggestions on other topics.

Summary

Various software is used in data science. Depending on the domain, the libraries or software an organization uses may be tied to a particular language, and this is sometimes mixed with individuals’ preferred methods. (In many cases, this doesn’t matter much at the business level.)

However, if you’re not careful, the methods you use can lead to different results.

In this article, I’ve given you some examples of and reasons for differences in the methods used by different software for calculations, and introduced the CAMIS project, a research project that aims to minimize them to ensure consistency in data analysis.

If you use different software in your data analytics work, it’s a good idea to look at these differences, understand them, and try to find the optimal method for your purposes.

And if you work in data science in the field, I highly recommend that you take an interest in or contribute to the CAMIS project for a global collaborative experience.