Three-Way Analysis of Variance: Simple Second-Order Interaction Effects and Simple Main Effects

In this article we will show how to run a three-way analysis of variance when both the third-order interaction effect and the second-order interaction effects are statistically significant. This type of analysis can become pretty tedious, especially when our factors have many levels, so we will try to explain it here as clearly as possible. (If you want to watch me doing these analyses live, get my free course on statistical analysis with R here.)

First of all, let’s present the fictitious data we are going to work with. Suppose that a pharmaceutical company is planning to launch a new vitamin that allegedly improves employees’ resistance to effort. The vitamin is tested on a sample of 720 employees, divided into three groups: employees who take a placebo (the control group), employees who take the vitamin in a low dose and employees who take the vitamin in a high dose. Half of the employees are male, and half are female. Moreover, we have both blue collar and white collar employees in our sample. The resistance to effort is measured on an arbitrary scale from 1 to 30 (30 being the highest resistance).

Our goal is to determine whether the effort resistance is influenced by three factors: dose of vitamin (placebo, low dose, and high dose), gender (male, female) and type of employee (blue collar, white collar). You can find the experiment data in CSV format here.

Third-order interaction effect

First of all, let’s check whether the third-order interaction effect is significant. We are going to run the analysis using the aov function in the stats package (our data frame is called vitamin).

[code lang="r"]
aov1 <- aov(effort ~ dose * gender * type, data = vitamin)
summary(aov1)
[/code]

In the formula above, dose*gender*type expands to all the main effects and interactions; the third-order interaction effect is, of course, the term dose:gender:type. The ANOVA results can be seen below (we have only kept the line presenting the third-order interaction effect).

                  Df Sum Sq Mean Sq F value   Pr(>F)
dose:gender:type   2    187    93.4  22.367 3.81e-10

The interaction effect is statistically significant: F = 22.367 with 2 numerator degrees of freedom, p<0.01. In other words, we do have a third-order interaction effect. In this situation, it is not advisable to report and interpret the second-order interaction effects (they could be misleading). Therefore, we are going to compute the simple second-order interaction effects.

Simple second-order interaction effects

The simple second-order interaction effects are the effects of each pair of factors at each level of the third factor. Specifically, we have to compute the following effects:
  1. the interaction effect of dose and type of employee, for each gender category (male and female)
  2. the interaction effect of gender and type of employee, at each dose level (placebo, low and high)
  3. the interaction effect of dose and gender, for each type of employee (blue collar and white collar).
The total number of simple second-order interaction effects is given by the sum of the numbers of factor levels. In our case we have 7 effects (3+2+2). We will not analyze all of them, because this article would become too long. We will only focus on the first set of effects, leaving the others for you as an exercise.

So let’s investigate the interaction effect of dose and type of employee for each gender group. First we have to create two separate data frames, for male and female employees. We do that with the filter command in the dplyr package (though you can also use brackets or subset).

[code lang="r"]
library(dplyr)  # provides filter() for data frames

vitamin_male <- filter(vitamin, gender == "male")
vitamin_female <- filter(vitamin, gender == "female")
[/code]

Now we perform a two-way analysis of variance on each data frame (the factors being dose and type, of course).

[code lang="r"]
# two-way ANOVA for male employees
aov1 <- aov(effort ~ dose * type, data = vitamin_male)
summary(aov1)

# two-way ANOVA for female employees
aov2 <- aov(effort ~ dose * type, data = vitamin_female)
summary(aov2)
[/code]

The results of the analyses are shown below, in the order in which they were run: male employees first, then female employees (we have only retained the lines with the interaction effects).

            Df Sum Sq Mean Sq F value   Pr(>F)
dose:type    2    249   124.7   28.42 3.57e-12

            Df Sum Sq Mean Sq F value   Pr(>F)
dose:type    2  137.2    68.6   17.31 6.74e-08
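
The other two sets of simple second-order interaction effects follow the same pattern, so the splitting and fitting can be wrapped in a small helper. What follows is only a sketch: the function name simple_interactions is hypothetical, and the data are simulated placeholders with the same layout as the vitamin data frame (the real CSV is not reproduced here), so the printed numbers will not match the tables in this article.

```r
# Simulated stand-in for the article's data (hypothetical values, NOT the real CSV).
set.seed(1)
vitamin <- expand.grid(
  dose   = c("high dose", "low dose", "placebo"),
  gender = c("male", "female"),
  type   = c("blue collar", "white collar"),
  rep    = 1:60,                       # 60 employees per cell, 720 in total
  stringsAsFactors = TRUE
)
vitamin$effort <- round(runif(nrow(vitamin), min = 1, max = 30))

# Hypothetical helper: fit effort ~ a * b separately at each level of the
# conditioning factor `by` and return the ANOVA table of each fit.
simple_interactions <- function(data, a, b, by) {
  f <- reformulate(paste(a, b, sep = " * "), response = "effort")
  lapply(split(data, data[[by]]), function(d) summary(aov(f, data = d)))
}

by_gender <- simple_interactions(vitamin, "dose", "type", by = "gender")   # 2 tables
by_dose   <- simple_interactions(vitamin, "gender", "type", by = "dose")   # 3 tables
by_type   <- simple_interactions(vitamin, "dose", "gender", by = "type")   # 2 tables

# Same analysis as the two-way ANOVA on the male data frame
by_gender$male
```

Together the three calls return the 7 simple second-order interaction effects counted above (2+3+2).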

We can notice that both simple second-order interaction effects are significant (p<0.01). So we are dealing with a combined influence of the factors dose and type of employee in both male and female groups. In this situation, we have to examine the simple main effects for each factor. This is what we are going to do in the next section.

Simple main effects

Let’s compute the simple main effect for the factor dose of vitamin, which is the most important (after all, the company wants to demonstrate that the vitamin does affect the resistance to effort). You will be able to compute the other simple main effects yourself, using this as an example.

Now we must create four separate data frames, one for each combination of the factors gender and type of employee: male – blue collar, male – white collar, female – blue collar, female – white collar.

[code lang="r"]
vitamin_male_blue <- filter(vitamin, gender == "male", type == "blue collar")
vitamin_male_white <- filter(vitamin, gender == "male", type == "white collar")
vitamin_female_blue <- filter(vitamin, gender == "female", type == "blue collar")
vitamin_female_white <- filter(vitamin, gender == "female", type == "white collar")
[/code]

Next we perform a one-way ANOVA for each data frame. Let’s do it for the first group, male – blue collar.

[code lang="r"]
aov1 <- aov(effort ~ dose, data = vitamin_male_blue)
summary(aov1)
[/code]

            Df Sum Sq Mean Sq F value Pr(>F)
dose         2 2943.5  1471.8   349.9 <2e-16

The simple main effect for the factor dose on this group is statistically significant (p<0.01). In other words, there is a significant difference in resistance to effort between the placebo, low dose and high dose levels within the male – blue collar category. To find out how big the differences are, we use the TukeyHSD function to compute the test with the same name.

[code lang="r"]
TukeyHSD(aov1)
[/code]

                        diff        lwr       upr p adj
low dose-high dose -2.528333  -3.413363 -1.643303     0
placebo-high dose  -9.558333 -10.443363 -8.673303     0
placebo-low dose   -7.030000  -7.915030 -6.144970     0

By inspection of the table we conclude that the differences in effort resistance between the dose groups are significant (p<0.01). The largest difference, in absolute value, is that between the high dose and placebo levels: about 9.6 points. So the employees who took a high dose show a higher resistance to effort than those who just took a placebo.

One more example: the simple main effects of the variable dose of vitamin on the female – blue collar group.

[code lang="r"]
aov1 <- aov(effort ~ dose, data = vitamin_female_blue)
summary(aov1)
[/code]

            Df Sum Sq Mean Sq F value Pr(>F)
dose         2  399.6  199.81   45.57 <2e-16

[code lang="r"]
TukeyHSD(aov1)
[/code]

                        diff        lwr       upr     p adj
low dose-high dose  1.083333  0.1797508  1.986916 0.0141485
placebo-high dose  -2.476667 -3.3802492 -1.573084 0.0000000
placebo-low dose   -3.560000 -4.4635826 -2.656417 0.0000000

The simple main effect is statistically significant, as the first table shows. Furthermore, all the differences between dose levels are significant (p<0.05). The largest difference, in absolute value, is the one between low dose and placebo (about 3.6 points). To learn more on data analysis in R, check the free “Statistics with R” video course here.
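
The remaining gender and type combinations can be analyzed in the same way as the two examples in the simple main effects section. As a rough sketch, the one-way ANOVA and the Tukey test can be run for all four cells in one pass with split and lapply. The object names cells and results are hypothetical, and the data are simulated placeholders with the same layout as the vitamin data frame, so the output will not reproduce the tables in this article.

```r
# Simulated stand-in for the article's data (hypothetical values, NOT the real CSV).
set.seed(2)
vitamin <- expand.grid(
  dose   = c("high dose", "low dose", "placebo"),
  gender = c("male", "female"),
  type   = c("blue collar", "white collar"),
  rep    = 1:60,                       # 60 employees per cell, 720 in total
  stringsAsFactors = TRUE
)
vitamin$effort <- round(runif(nrow(vitamin), min = 1, max = 30))

# One data frame per gender x type combination (four cells)
cells <- split(vitamin, list(vitamin$gender, vitamin$type), drop = TRUE)

# One-way ANOVA of effort on dose, plus Tukey HSD comparisons, within each cell
results <- lapply(cells, function(d) {
  fit <- aov(effort ~ dose, data = d)
  list(anova = summary(fit), tukey = TukeyHSD(fit))
})

names(results)                           # one entry per cell
results[["male.blue collar"]]$anova      # simple main effect of dose
results[["male.blue collar"]]$tukey      # pairwise dose comparisons
```

Each element of results holds the same two tables shown for the male – blue collar and female – blue collar groups.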

2 thoughts on “Three-Way Analysis of Variance: Simple Second-Order Interaction Effects and Simple Main Effects”

  1. Thank you for the post.

    This is an approach, but the problem with sub-setting the data as you have done is that the wrong error term is being used in the F-tests. The correct/better error term, as far as I know, for these follow-up simple effects analyses is the error term from the total model (in this case, the 3-way model you started with). I find R to be particularly cumbersome for doing simple effects analyses in the presence of these higher-order interactions. The “phia” package certainly helps.

  2. Why not explore the interaction in a regression model, including an interaction effect plus main coefficients? You can then vary the different base levels in order to test (multiple) hypotheses. R also has some neat packages to visualize these effects (sjPlot::sjp.int, interplot).
    This is not to criticize your blog or work, I am just interested in why you chose to use ANOVA instead of regression. I know that there are some fields in science that prefer one over the other (e.g. economists prefer regressions, while psychologists often use ANOVA), but maybe you had some other reason?
    Apart from the fact that three-way interactions usually are a pain to analyze, it appears to me that ANOVA may be a bit more tedious than regression, but maybe I am wrong.
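
The point raised in the first comment, testing a simple effect against the error term of the full three-way model rather than the error term of the subset, can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated placeholders (not the article’s CSV), the object names are hypothetical, and the design is assumed to be balanced, so that the sum of squares for dose computed within one cell is a valid numerator for the pooled test.

```r
# Simulated stand-in for the article's data (hypothetical values, NOT the real CSV).
set.seed(3)
vitamin <- expand.grid(
  dose   = c("high dose", "low dose", "placebo"),
  gender = c("male", "female"),
  type   = c("blue collar", "white collar"),
  rep    = 1:60,                       # 60 employees per cell, 720 in total
  stringsAsFactors = TRUE
)
vitamin$effort <- round(runif(nrow(vitamin), min = 1, max = 30))

# Error term of the full three-way model
full     <- aov(effort ~ dose * gender * type, data = vitamin)
df_error <- df.residual(full)
ms_error <- deviance(full) / df_error  # residual mean square

# Simple main effect of dose within one cell (male, blue collar)
cell    <- subset(vitamin, gender == "male" & type == "blue collar")
tab     <- summary(aov(effort ~ dose, data = cell))[[1]]
df_dose <- tab[1, "Df"]
ms_dose <- tab[1, "Mean Sq"]

# F test that uses the pooled error term instead of the subset's own
f_pooled <- ms_dose / ms_error
p_pooled <- pf(f_pooled, df_dose, df_error, lower.tail = FALSE)
c(F = f_pooled, df1 = df_dose, df2 = df_error, p = p_pooled)
```

The only change from the subset analysis is the denominator: the mean square for dose still comes from the cell, while the residual mean square and its degrees of freedom come from the model fitted to all 720 observations.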
