One-Way Analysis of Variance (ANOVA)


When we are comparing means of more than two groups, we can no longer rely on a t-test to compare means. Instead, we use an extension of the t-test known as analysis of variance (ANOVA) to test the equality of k population means (μ1, μ2, μ3,...,μk). In a one-way ANOVA, there is one categorical independent variable; however, more independent variables can be added to the model. For example, in a two-way ANOVA, two independent categorical variables are included in the model, which allows us to examine interactions between them; that is, whether the effect of one independent variable differs depending on the level of another. In this example, we will be performing a one-way ANOVA, but later we will learn how to perform a two-way ANOVA with interaction effects as well as more advanced ANOVA models.

In this example, we will compare three groups of patients, recruited from three separate medical facilities, in terms of their pulmonary function. A measure of pulmonary function, forced expiratory volume (FEV), was obtained for each patient. We estimate the population means, μ1, μ2, and μ3, with the means from each group, x̅1, x̅2, and x̅3.

x̅1 = 2.63

x̅2 = 3.03

x̅3 = 2.88

Our null and alternative hypotheses in this case are: 

H0: All the groups come from the same underlying population (μ1 = μ2 = μ3).

Ha: Not all the groups come from the same underlying population; at least one of the population means differs from one of the others (μ1 ≠ μ2 OR μ1 ≠ μ3 OR μ2 ≠ μ3).

We choose to conduct our test at α = 0.10. Although the alternative hypothesis is non-directional, the ANOVA F test itself is right-tailed (see below): if the test results in a p-value less than 0.10, we can reject the null hypothesis and accept the alternative hypothesis.

Assumptions:

One-way ANOVA relies on three assumptions: the observations are independent, the data within each group are approximately normally distributed, and the groups have equal variances (homogeneity of variance).

There are several ways to assess the normality of the data distribution for each group. For this example, we will simply examine side-by-side boxplots (Figure 1) to look for extreme violations of normality. The boxplots do not indicate extreme violations of normality for any of the groups. In case of extreme violations of normality, we can try data transformations or non-parametric tests (e.g., the Kruskal-Wallis test).
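If you want to reproduce a plot like Figure 1 yourself, a minimal matplotlib sketch is below. The FEV values are hypothetical placeholders, since only the summary statistics from Table 1 appear in the text.

```python
import matplotlib.pyplot as plt

# Hypothetical FEV values for each facility -- placeholders only,
# since the raw data behind Table 1 are not reproduced here.
group1 = [2.79, 2.10, 3.05, 2.60, 2.61, 2.91, 2.25]
group2 = [3.22, 2.88, 3.50, 2.75, 2.80, 3.18, 2.95]
group3 = [2.74, 3.10, 2.91, 2.80, 2.86, 3.01, 2.66]

# Side-by-side boxplots, as in Figure 1.
plt.boxplot([group1, group2, group3], labels=["Group 1", "Group 2", "Group 3"])
plt.ylabel("FEV")
plt.title("FEV by medical facility")
plt.show()
```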

A violation of the assumption of equal variances is particularly troublesome when sample sizes are unequal. We can use statistical software to conduct a statistical test for homogeneity of variance (e.g., Levene's test), or we can compare our group standard deviations to determine whether the largest standard deviation is greater than twice the smallest, which would indicate a violation of this assumption. Adjusted ANOVA F statistics (e.g., Welch's or the Brown-Forsythe statistic) are available that are robust to violations of homogeneity of variance. Based on the descriptive statistics for our sample (Table 1), we see the largest standard deviation (0.523) is not greater than twice the smallest standard deviation (0.496).
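Levene's test is a one-liner in SciPy. A minimal sketch, again using hypothetical FEV arrays in place of the real data:

```python
import numpy as np
from scipy import stats

# Hypothetical FEV values per facility (placeholders for the real data).
group1 = np.array([2.79, 2.10, 3.05, 2.60, 2.61, 2.91, 2.25])
group2 = np.array([3.22, 2.88, 3.50, 2.75, 2.80, 3.18, 2.95])
group3 = np.array([2.74, 3.10, 2.91, 2.80, 2.86, 3.01, 2.66])

# Levene's test: a small p-value suggests unequal variances.
stat, p = stats.levene(group1, group2, group3)
print(f"Levene's W = {stat:.3f}, p = {p:.3f}")

# Rule of thumb from the text: is the largest SD more than twice the smallest?
sds = [g.std(ddof=1) for g in (group1, group2, group3)]
print("Rule of thumb violated:", max(sds) > 2 * min(sds))
```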

Figure 1: Side-by-Side Boxplots

Table 1: Descriptive Statistics

F Test

The F statistic is the ratio of the variance between groups to the variance within groups; in other words, it is the ratio of the explained variance to the unexplained variance (i.e., error/residual variance).

We obtain our p-value (the right-tailed probability) for our F statistic from an F distribution (Figure 2). Depending on our degrees of freedom and α, our F distribution will have a certain critical value.

The F-distribution has two values for degrees of freedom: 

df numerator = k - 1 (# of groups - 1)

df denominator = N - k (total sample size - # of groups)

For this example, 

df1 = 3 - 1 = 2 

and 

df2 = 60 - 3 = 57

If the F statistic we calculate exceeds the critical value, we can reject our null hypothesis. The area shaded in blue in Figure 2 is known as the rejection region and is equal to α. In our case, we set α at 0.10, so there is a 10% chance of finding an F value within the rejection region if the null hypothesis were true. 
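Rather than reading the critical value off a table, we can ask SciPy for it directly; a minimal sketch:

```python
from scipy.stats import f

alpha = 0.10
df1, df2 = 2, 57  # numerator and denominator degrees of freedom

# The critical value puts probability alpha in the right tail.
critical_value = f.ppf(1 - alpha, df1, df2)
print(f"Critical F({df1}, {df2}) at alpha = {alpha}: {critical_value:.2f}")
```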

Figure 2: F Distribution

Recall, we need to calculate two sources of variation: the variation within groups and the variation between groups.

To calculate within group variance, we use the following formula:

MS within = SS within / (N - k) = [(n1 - 1)s1² + (n2 - 1)s2² + (n3 - 1)s3²] / (N - k)

The within group variance could also be called Mean Square (MS) within, and is calculated by dividing the Sum of Squares (SS) within by degrees of freedom. We use the denominator degrees of freedom for within group variance. We need the sample sizes and variances for group 1, group 2, and group 3 in order to solve this equation. Retrieve the sample sizes and standard deviation values from Table 1 (recall, to calculate variance we must square the standard deviation values). After plugging those values into the equation, we end up with a within group variance of 0.254.
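Here is a short sketch of that calculation. Note the group sizes (21, 16, 23) and the middle standard deviation (0.498) are assumptions on my part: they are consistent with N = 60 and the standard deviations quoted above, and they reproduce the reported value of 0.254, but only the smallest and largest standard deviations appear in the text itself.

```python
import numpy as np

# Group sizes and standard deviations. The sizes and the middle SD are
# assumed (they reproduce the reported MS within of 0.254); 0.496 and
# 0.523 are the smallest and largest SDs quoted from Table 1.
n = np.array([21, 16, 23])
sd = np.array([0.496, 0.523, 0.498])

N, k = n.sum(), len(n)

# MS within = sum over groups of (n_i - 1) * s_i^2, divided by N - k.
ms_within = np.sum((n - 1) * sd**2) / (N - k)
print(f"MS within = {ms_within:.3f}")  # 0.254
```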

To calculate between group variance, we must first calculate a grand mean, and then we can use the following formulas:

x̅ = (n1x̅1 + n2x̅2 + n3x̅3) / N

MS between = SS between / (k - 1) = [n1(x̅1 - x̅)² + n2(x̅2 - x̅)² + n3(x̅3 - x̅)²] / (k - 1)

The between group variance could also be called Mean Square (MS) between, and is calculated by dividing the Sum of Squares (SS) between by degrees of freedom. We use the numerator degrees of freedom for between group variance. We need the sample sizes and means for group 1, group 2, and group 3 in order to solve this equation. Retrieve the sample sizes and means from Table 1. After plugging those values into the equation, we end up with a between group variance of 0.769.
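And a matching sketch for the between group variance, under the same assumed group sizes:

```python
import numpy as np

# Group sizes (assumed, as above) and the group means from the text.
n = np.array([21, 16, 23])
means = np.array([2.63, 3.03, 2.88])

N, k = n.sum(), len(n)

# Grand mean: the sample-size-weighted average of the group means.
grand_mean = np.sum(n * means) / N

# MS between = sum of n_i * (mean_i - grand mean)^2, divided by k - 1.
ms_between = np.sum(n * (means - grand_mean) ** 2) / (k - 1)
print(f"grand mean = {grand_mean:.2f}, MS between = {ms_between:.3f}")  # 2.83, 0.769
```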

We can now calculate our F statistic:

F = MS between / MS within = 0.769 / 0.254 ≈ 3.03

We now have all three pieces of information we need to determine statistical significance.

F statistic = 3.03

df1 = 2

df2 = 57

We could use an F table (Table 2) and compare our F statistic to the critical value; if the F statistic exceeds the critical value, we reject the null hypothesis and accept the alternative hypothesis. To do so, we look for the table column header containing our degrees of freedom numerator, 2, and the row label containing our degrees of freedom denominator, 57. We look for the critical value at the intersection of our column and row. Below, we see the critical value lies between 2.39 and 2.44. Our F statistic of 3.03 is greater than the critical value.

When performing ANOVA with statistical software, a p-value will normally be provided. Alternatively, we could use a built-in Excel function: F.DIST.RT(x, deg_freedom1, deg_freedom2) = F.DIST.RT(3.03, 2, 57) = 0.056. Our p-value is less than our predefined alpha of 0.10.
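The same right-tailed probability is a one-liner in Python; a sketch mirroring the Excel call:

```python
from scipy.stats import f

# Right-tailed p-value for F = 3.03 with df1 = 2, df2 = 57
# (the SciPy equivalent of Excel's F.DIST.RT).
p_value = f.sf(3.03, 2, 57)
print(f"p = {p_value:.3f}")  # ~0.056
```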

Conclusion: Reject the null hypothesis and accept our alternative hypothesis! Not all the groups come from the same underlying population; at least one of the population means differs from one of the others (μ1 ≠ μ2 OR μ1 ≠ μ3 OR μ2 ≠ μ3).

Table 2: F Distribution Table

Post-hoc Tests

ANOVA is an omnibus test; that is, it tells us that there is a difference between at least one pair of groups, but it does not tell us specifically which groups differ. In order to determine where the differences lie, we must perform pairwise comparisons. Following a statistically significant ANOVA finding, we can conduct multiple t-tests to compare group means, but by doing so we inflate our Type I error rate. To account for this, we must adjust our alpha based on the number of comparisons we plan to perform. For our example, there are three groups and three possible comparisons: group 1 vs. group 2, group 1 vs. group 3, and group 2 vs. group 3.

A common method to adjust for the inflated Type I error is the Bonferroni correction, which simply divides our original alpha level by the number of comparisons we plan to perform. For our three comparisons, the adjusted alpha is 0.10 / 3 ≈ 0.033.
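A sketch of Bonferroni-corrected pairwise t-tests from the summary statistics, using SciPy's ttest_ind_from_stats (the group sizes and the middle standard deviation are the same assumptions as in the earlier sketches):

```python
from itertools import combinations
from scipy.stats import ttest_ind_from_stats

alpha = 0.10
adjusted_alpha = alpha / 3  # Bonferroni: three planned comparisons

# (mean, SD, n) per group; the sizes and the middle SD are assumed.
groups = {
    "group 1": (2.63, 0.496, 21),
    "group 2": (3.03, 0.523, 16),
    "group 3": (2.88, 0.498, 23),
}

# Run all three pairwise comparisons and flag each against the
# Bonferroni-adjusted alpha.
for (name_a, (m1, s1, n1)), (name_b, (m2, s2, n2)) in combinations(groups.items(), 2):
    t, p = ttest_ind_from_stats(m1, s1, n1, m2, s2, n2)
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{name_a} vs {name_b}: t = {t:.2f}, p = {p:.3f} ({verdict})")
```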