Introduction
In the chapter on independent t tests, we learned how to perform a hypothesis test on two groups that are not related. But what if we need to compare measurements from more than 2 groups? One can imagine many research questions that involve more than 2 groups, such as...
A large portion of research projects involve comparing measurements of three or more groups against each other. Therefore, understanding how to analyze such hypothesis tests is crucial to a solid understanding of behavioral statistics. The method for analyzing data from three or more groups is the Analysis of Variance which is usually referred to as 'ANOVA'.
 
 
What Does ANOVA Test?
Consider the simplest ANOVA, which tests differences between 3 groups. As you might expect, the null hypothesis states μ1 = μ2 = μ3, which implies three separate null hypotheses (μ1 = μ2, μ1 = μ3, and μ2 = μ3).
These three hypotheses, which compare individual means to each other, are called simple hypotheses because each comparison is direct and straightforward. Each of these comparisons is also sometimes called a pairwise mean comparison.
 
It's easy to understand how the three means would be equal to each other if the null hypothesis is true, but there is a little more here than meets the eye. Namely, if the means of the groups are all the same, then averages of the group means are also the same. The comparisons involving averages of means are called complex null hypotheses. For an ANOVA with 3 groups, there are 3 complex null hypotheses...
Therefore, when we have 3 groups in an ANOVA, we have 3 simple null hypotheses and 3 complex null hypotheses, for a total of 6 null hypotheses. The statistical analysis most commonly used in this situation is the Omnibus test, which tests all 6 of these null hypotheses at the same time.
 
Of course, with larger numbers of groups, there are more and more total hypotheses because there are many more comparisons. For example, with 4 groups, there are a total of 24 null hypotheses (6 simple and 18 complex). With 4 groups, there are many more averages to compare (e.g., comparing the average of means 1 and 2 against the average of means 3 and 4). No matter how many groups exist, the Omnibus test makes all the comparisons at the same time, and its null hypothesis is rejected if ANY of the individual null hypotheses is rejected.
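Counting the simple hypotheses is just counting the ways to choose 2 means out of k. A quick sketch in plain Python (the helper name is ours, for illustration only — it is not a standard function):

```python
# Enumerate the simple (pairwise) null hypotheses for k group means.
from itertools import combinations

def simple_hypotheses(k):
    """List every pairwise mean comparison among k group means."""
    return [f"mu{a} = mu{b}" for a, b in combinations(range(1, k + 1), 2)]

print(simple_hypotheses(3))       # 3 pairwise comparisons for 3 groups
print(len(simple_hypotheses(4)))  # 6 pairwise comparisons for 4 groups
```

The complex hypotheses (comparisons involving averages of means) grow even faster, which is why the counts climb so quickly with more groups.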
 
Because the Omnibus test evaluates all the null hypotheses at once, there is still work left to be done if it produces a rejection. Namely, even if we reject the null hypothesis with the Omnibus test, we still need to determine WHICH of the individual null hypotheses are responsible for that rejection. Post-hoc tests are used to solve this problem: they tell us exactly which null hypotheses caused the rejection in the Omnibus test.
 
The most widely used post-hoc test is the Tukey test, which identifies which of the pairwise mean comparisons might be responsible for the rejection in the Omnibus test. For example, a Tukey test may indicate that μ1 is significantly different from μ2, but that there are no significant differences between μ1 and μ3 or between μ2 and μ3. This website will not compute Tukey test probabilities, but it is certainly important to be aware of what a Tukey test accomplishes.
 
Sometimes, the Omnibus test will yield a rejection of the null hypothesis, but the Tukey test will not reveal any significant pairwise differences between the means. To explain this, we need to remember that the Tukey test only tests the 3 pairwise mean comparisons, not the complex null hypotheses. Therefore, if no pair of means is significantly different, then at least one of the complex null hypotheses must be responsible for the rejection of the null in the Omnibus test.
 
Students sometimes get confused by the idea that the pairwise comparisons may not produce a null hypothesis rejection while a complex null hypothesis can still be rejected. For example, the results of an Omnibus test might show...
The confusion arises because the '=' notation in a null hypothesis is not the same as mathematical equivalence. When we say that μ1 = μ3, we don't really mean that the means of groups 1 and 3 are identical, but rather that the means are 'about the same', or that we do not have enough evidence to reject the claim that they are the same.
 
To see how this would work, consider the following possible situation.
The key to understanding how a complex null hypothesis can be rejected even when the simple null hypotheses are NOT rejected is to realize that rejecting a null hypothesis boils down to probability, which comes from distributions. When two groups are averaged together, the width of the distribution of the average goes down (much as it would for a sample twice as large), so it is easier to reject a value as coming from the average's distribution than from either individual distribution.
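A small simulation makes this concrete. The toy script below (our own setup, plain Python) draws many 9-score samples and shows that the average of two sample means has a noticeably narrower distribution than a single sample mean:

```python
# Averaging two independent sample means roughly halves the variance
# of the result, so its distribution is tighter.
import random
import statistics

random.seed(7)
trials, n = 4000, 9   # 9 scores per sample, an arbitrary choice

single_means, averaged_means = [], []
for _ in range(trials):
    m1 = statistics.fmean(random.gauss(0, 2) for _ in range(n))
    m2 = statistics.fmean(random.gauss(0, 2) for _ in range(n))
    single_means.append(m1)
    averaged_means.append((m1 + m2) / 2)

# The averaged distribution has about half the variance of a single mean.
print(statistics.variance(single_means))
print(statistics.variance(averaged_means))
```

Because the averaged distribution is narrower, a given value sits further out in its tails, which is why a complex comparison can reach significance when the pairwise ones do not.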
 
So far, we've discussed the Omnibus test, which evaluates all of the simple and complex null hypotheses at the same time. In some circumstances, we may want to evaluate a specific complex null hypothesis that is predicted by a particular theory or model. These tests of a specific complex null hypothesis are called a priori hypotheses (also known as planned contrasts or planned comparisons). With four groups, some examples might be...
These a priori hypotheses have an advantage over the Omnibus test in that they are easier to reject because less variance is involved. This should sound familiar -- the a priori hypotheses are similar to one-tailed t tests.
 
 
The ANOVA Method
Analysis of Variance (ANOVA) uses variance estimates to test the null hypotheses of the Omnibus test. The method computes two different estimates of the population variance and compares them: one estimate based on the scores within each group, and one based on the means of the groups.
These two variance estimates have special names in ANOVA...

MSWithin = The variance estimate coming from scores within the groups.
MSBetween = The variance estimate coming from means between the groups.

Ultimately, these two variance estimates are compared to each other as a ratio. Their ratio also has a special name, the F ratio, which is calculated by dividing MSBetween by MSWithin:

F = MSBetween / MSWithin

The reason for this division will be explained very soon.
 
Before we can determine how to calculate MSB and MSW, we need to mention that there are two different ways of solving a one-way between subjects ANOVA: the simple method and the structural method.
The simple ANOVA is valuable because it is both easier to understand and easier to calculate than the structural ANOVA. In fact, once we understand the simple ANOVA, the structural ANOVA will be much easier to comprehend. In addition, knowledge of the structural ANOVA serves as a foundation for the rest of the ANOVA analyses (within subjects and two-way between subjects). Both the simple and structural ANOVA use the same terms (MSB, MSW and the F ratio), but they calculate the terms in slightly different ways.
 
As is often the case, a visual display is the best way of understanding how the ANOVA uses the two variances to determine if the null hypothesis is rejected. There are two possibilities. Either the null hypothesis is true or it is false. The diagram below shows the score and mean distributions in an example where there are 9 scores in each sample.

ANOVA Display: Null Hypothesis is True

As the display above shows, the null hypothesis is true, so each of the 3 samples is taken from the same distribution. In this case, the variance of the means (calculated from the means in the bottom panel) should be relatively small. In fact, the variance of the means, multiplied by N, should be about the same as the variance of the scores (calculated by averaging the 3 variance estimates of the 3 samples) -- see the central limit theorem.
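This relationship is easy to check by simulation. The sketch below (plain Python, our own toy setup rather than anything from this website) draws many 9-score samples from a single population and compares the variance of the sample means, scaled by n, with the average sample variance:

```python
# When the null is true, n * variance(sample means) is about equal to
# the average within-sample variance (central limit theorem).
import random
import statistics

random.seed(1)
n, trials = 9, 4000   # 9 scores per sample, as in the display

means, variances = [], []
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]   # population variance = 4
    means.append(statistics.fmean(sample))
    variances.append(statistics.variance(sample))

# Both quantities estimate the same population variance (about 4 here).
print(n * statistics.variance(means))
print(statistics.fmean(variances))
```

This is exactly why the F ratio hovers around 1 when the null hypothesis is true: numerator and denominator estimate the same quantity.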
 
In the display below, the null hypothesis is false, which means that the three samples are coming from 3 different population distributions.

ANOVA Display: Null Hypothesis is False

In the display above, we can see that the variance of the scores will stay the same, because moving the distributions does NOT affect the sample variances of the 3 samples. On the other hand, the variance of the means will rise dramatically because the means are now much further from each other, since they come from different distributions. Therefore, the ratio MSB / MSW will also rise, because the numerator (MSB) increases while the denominator (MSW) stays the same.
 
So far, we've learned that the F ratio, which relates the two variance estimates (MSB and MSW), is the key to the ANOVA hypothesis test. Let's summarize what the F ratio tells us in the two cases: if the null hypothesis is true, MSB and MSW estimate the same population variance, so F should be about 1; if the null hypothesis is false, MSB grows larger than MSW, so F should be noticeably greater than 1.
Of course, in order to perform a hypothesis test, we will need to compare the F ratio to a critical value. With an ANOVA, the critical value is determined by two degrees of freedom -- dfB and dfW -- and there is a new table that indicates a critical value for each combination of dfB and dfW.
 
 
The Simple ANOVA
As usual, it's best to work through an example to see how we calculate all the necessary components of the hypothesis test. This is a simple ANOVA, and it works when the sample sizes of the groups are the same. In the next section, we will extend this method to the structural method, which works with any sized groups. Here's the problem. Notice that we assume that the scores are normally distributed -- this is always the case for ANOVA analyses performed on this website.

Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
Group | Mean | Scores
RH    | 4    | 6, 4, 3, 4, 3
LH    | 1    | 1, 0, 1, 1, 2
AMB   | 1    | 2, 0, 0, 2, 1
Grand mean XG = 2
 
Assume that the scores are normally distributed (α=.05)
What are the results of the hypothesis test?
Step: Choose statistical test.
This is an easy question given that we are in the section on one-way between subjects ANOVA, but we need to remember that this is the correct analysis because the dependent variable is ratio, the groups are not related, and there are 3 or more groups.

Step: Calculate variances.
We need to calculate the variance of each group's scores so that we can average them to arrive at the best estimate of the population variance. This is very similar to the pooled variance calculation we used for independent t tests, except that now we have 3 groups instead of 2.

For each group j, Sj² = Σ(Xi - Xj)² / (n-1)
S1² = ((6-4)² + (4-4)² + (3-4)² + (4-4)² + (3-4)²) / 4 = 1.5
S2² = ((1-1)² + (0-1)² + (1-1)² + (1-1)² + (2-1)²) / 4 = 0.5
S3² = ((2-1)² + (0-1)² + (0-1)² + (2-1)² + (1-1)²) / 4 = 1

Step: Calculate MSWithin.
Having calculated the variances of the 3 groups, we simply take their average to get our best estimate of the population variance. This is the definition of MSWithin.

MSWithin = ΣSj² / K = (1.5 + 0.5 + 1) / 3 = 1

Step: Calculate MSBetween.
This is where we calculate the variance of the means, so that it can be compared to the score variance from the previous step. But we cannot compare the mean variance to the score variance without an adjustment, because means vary a lot less than scores do. Specifically, the variance of means is n times smaller than the variance of scores (by the Central Limit Theorem), so we must multiply the mean variance by n to make a fair comparison. Here's the formula with the example...

MSBetween = n · Σ(Xj - XG)² / (k-1)
MSBetween = 5 × ((4-2)² + (1-2)² + (1-2)²) / 2 = 15

Step: Calculate observed F value.
The observed F value (written FObs) is the F ratio we've been discussing. If the null hypothesis is true, we expect FObs to be about 1. If the null hypothesis is false, we expect FObs to be greater than 1.

FObs = MSBetween / MSWithin = 15 / 1 = 15

Step: Calculate degrees of freedom.
We haven't discussed how to calculate the degrees of freedom, but it isn't hard to figure out.

For the k groups (3 in this case), there are always k-1 (here 2) degrees of freedom. So dfB = 2.

But what about dfW? Well, the total degrees of freedom for all scores and groups is the total number of scores in all groups minus 1.

dftotal = n (scores per group) × k (number of groups) - 1 = nk - 1

Since all degrees of freedom have to come from either between or within groups,

dftotal = dfB + dfW
nk - 1 = (k-1) + dfW
dfW = nk - 1 - (k-1) = (n-1)k

In this example,

dfBetween = 3 - 1 = 2, dfWithin = 15 - 3 = 12

Step: Calculate critical value.
Using the degrees of freedom from the previous step, we use the F critical value table to find the critical value. The F critical table has columns for each dfB and rows for each dfW.

Fcritical = 3.885

Step: Test hypothesis.
We test the hypothesis by comparing the observed F value (FObs) to the critical value (Fcritical). Is the observed F value above the critical value?
FObserved of 15 is above Fcritical of 3.885, so we reject the null hypothesis.
 
Steps to Calculate a Simple ANOVA

 
As mentioned above, the critical values for F ratios exist in a table with rows (representing dfW) and columns (representing dfB). Here is part of the critical value table for F ratios.

Critical Values For F ratios (α =.05)

In this table, if we had two degrees of freedom between groups (e.g. 3 groups) and 12 degrees of freedom within groups (e.g. 15 total scores), then we would look up the value in the column below 2 and the row next to 12. This value would be 3.885.

Another separate table exists for α =.01.
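The whole procedure above can be collected into a few lines of code. This is a sketch of the simple (equal-n) method only, written with Python's standard statistics module; the function name is our own. It reproduces the worked handedness example:

```python
# Simple one-way between-subjects ANOVA: valid only when every group
# has the same number of scores.
import statistics

def simple_anova(groups):
    k, n = len(groups), len(groups[0])
    assert all(len(g) == n for g in groups), "simple method requires equal n"

    means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(means)   # fine here because n is equal

    # MSWithin: average of the k sample variances
    ms_within = statistics.fmean(statistics.variance(g) for g in groups)
    # MSBetween: variance of the means, scaled up by n (CLT adjustment)
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    return ms_between, ms_within, ms_between / ms_within

# Handedness example from the table above:
msb, msw, f = simple_anova([[6, 4, 3, 4, 3], [1, 0, 1, 1, 2], [2, 0, 0, 2, 1]])
print(msb, msw, f)   # 15.0 1.0 15.0
```

Since 15 exceeds the critical value of 3.885 for dfB = 2 and dfW = 12, the null hypothesis is rejected, matching the table above.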
 
Let's look at another example. But this time, the variances of the samples will be given to simplify calculation.

Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
Group | Mean | Scores
RH    | 4    | 5, 4, 3, 4
LH    | 1    | 1, 1, 1, 1
AMB   | 1    | 2, 0, 0, 2
Grand mean XG = 2
 
S1² = 0.6667, S2² = 0, S3² = 1.3333
Assume that the scores are normally distributed (α=.05)
What are the results of the hypothesis test?
Answer: FObserved of 18 is above Fcritical of 4.257.
REJECT NULL HYPOTHESIS
Step 1: Choose statistical test. What type of variable is it? How many groups are there? Are the groups related?
Variable is type ratio. There are three or more groups, and they are independent. Use a between subjects ANOVA.
Step 2: Calculate MSWithin. MSWithin = ΣSj² / K
MSWithin = ΣSj² / K = (0.6667 + 0 + 1.3333) / 3 = 0.6667
Step 3: Calculate MSBetween. MSBetween = n · Σ(Xj - XG)² / (k-1)
MSBetween = 4 × ((4-2)² + (1-2)² + (1-2)²) / 2 = 12
Step 4: Calculate observed F value. FObserved = MSBetween / MSWithin
FObserved = MSBetween / MSWithin = 12 / 0.6667 = 18
Step 5: Calculate degrees of freedom. dfBetween = K (number of groups) - 1, dfWithin = NG - K (number of groups)
dfBetween = 3 - 1 = 2, dfWithin = 12 - 3 = 9
Step 6: Calculate critical value. Look up in F table.
Fcritical = 4.257
Step 7: Test hypothesis. Is the observed F value above the critical value?
FObserved of 18 is above Fcritical of 4.257.
REJECT NULL HYPOTHESIS
 
 
The Structural ANOVA
The structural ANOVA must be used when the sample sizes are unequal, but it can also be used when the sample sizes are equal. The structural method is also important to understand because its logic will be extended to the other two types of ANOVA presented on this website: the one-way within subjects ANOVA and the two-way between subjects ANOVA.
 
The simple method can be used when the sample sizes are equal because the sample variances of the groups can be pooled, and the means of the groups are equally accurate estimates of the population means since they come from groups of the same size. But if the sample sizes are not equal, then the simple method can't be used. Instead, we use the structural method, which accomplishes the same goal as the simple ANOVA -- comparing an estimate based on the variance of the means with an estimate based on the variance of the scores. The only difference between the simple and the structural ANOVA is how these variances are calculated.
 
The key to understanding the structural ANOVA is to realize that for EVERY score in all groups, there is a total amount of variability associated with that score. Most importantly, this total variability has two parts (or partitions).
In other words, we can think of each score's variability as having two parts. The first part is the difference between the score and its own group's mean; this is called the within-group variability. The second part is the difference between the score's group mean and the grand mean; this is called the between-group variability.
 
As we calculate the sums of squares for the between-group and within-group effects, we usually put them in a table to keep them straight. Let's look at a sample table of data and the resulting sum of squares calculations. Here is the data...

Group | Mean | Scores
1     | 5    | 6, 4
2     | 1    | 2, 0
3     | 1    | 2, 0
Grand mean XG = 2.3333

A table of these calculations would look like this...
 
X | Between: (Xj - XG)²     | Within: (X - Xj)² | Total: (X - XG)²
6 | (5 - 2.3333)² = 7.1113 | (6 - 5)² = 1      | (6 - 2.3333)² = 13.4447
4 | (5 - 2.3333)² = 7.1113 | (4 - 5)² = 1      | (4 - 2.3333)² = 2.7779
2 | (1 - 2.3333)² = 1.7777 | (2 - 1)² = 1      | (2 - 2.3333)² = 0.1111
0 | (1 - 2.3333)² = 1.7777 | (0 - 1)² = 1      | (0 - 2.3333)² = 5.4443
2 | (1 - 2.3333)² = 1.7777 | (2 - 1)² = 1      | (2 - 2.3333)² = 0.1111
0 | (1 - 2.3333)² = 1.7777 | (0 - 1)² = 1      | (0 - 2.3333)² = 5.4443
 
Sample Sum of Squares Calculations for all Scores in A STRUCTURAL ANOVA
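A useful property of this partition is that, for every score and therefore in total, the between and within parts add up to the total sum of squares. The throwaway script below (our own, plain Python) checks this for the six-score example above:

```python
# Verify SS_total = SS_between + SS_within for the six-score example.
import statistics

groups = [[6, 4], [2, 0], [2, 0]]       # group means 5, 1, 1
scores = [x for g in groups for x in g]
grand_mean = statistics.fmean(scores)   # 2.3333...

# Each score contributes (group mean - grand mean)^2 to SS_between,
# (score - group mean)^2 to SS_within, (score - grand mean)^2 to SS_total.
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in scores)

print(ss_between, ss_within, ss_total)   # SSB + SSW equals SST
```

Summing the table's columns gives the same result: 21.3333 + 6 = 27.3333.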

 
Ultimately, the structural method adds up all of the within-group variability parts of each score (measured as sums of squares) and compares this sum to the sum of the between-group variability parts of each score. The only extra step is to make the comparison fair, because means don't vary as much as scores do (from the central limit theorem). This is done by dividing each sum of squares by the degrees of freedom for its effect, so...

MSBetween = SSBetween / dfBetween and MSWithin = SSWithin / dfWithin
 
All of the relevant information for ANOVA is summarized in a "Summary Effects Table". An example of this appears below.
Effect  | SS           | df     | MS        | F-Ratio
Between | Σ(Xj - XG)² | K - 1  | SSB / dfB | MSB / MSW
Within  | Σ(X - Xj)²  | NG - K | SSW / dfW |
Total   | Σ(X - XG)²  | NG - 1 |           |
 
Summary of Effects Table: One-Way Between Subjects ANOVA

 
Let's look at a sample problem all the way through.
Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
Group | Mean | Scores
RH    | 4    | 6, 4, 3, 4, 3
LH    | 1    | 1, 0, 1, 1, 2
AMB   | 1    | 2, 0, 0, 2
Grand mean XG = 2.0714
 
Assume that the scores are normally distributed (α=.05)
What are the results of the hypothesis test?
Answer: FObserved of 13.2589 is above Fcritical of 3.982.
REJECT NULL HYPOTHESIS
Step 1:Choose statistical test. What type of variable is it? How many groups are there? Are the groups related?
Variable is type ratio. There are three or more groups, and they are independent. Use a between subjects ANOVA.
Step 2: Partition sum of squares. For each X, calculate SSBetween, SSWithin, and SSTotal.

X     | Between: (Xj - XG)²     | Within: (X - Xj)² | Total: (X - XG)²
6     | (4 - 2.0714)² = 3.7195 | (6 - 4)² = 4      | (6 - 2.0714)² = 15.4339
4     | (4 - 2.0714)² = 3.7195 | (4 - 4)² = 0      | (4 - 2.0714)² = 3.7195
3     | (4 - 2.0714)² = 3.7195 | (3 - 4)² = 1      | (3 - 2.0714)² = 0.8623
4     | (4 - 2.0714)² = 3.7195 | (4 - 4)² = 0      | (4 - 2.0714)² = 3.7195
3     | (4 - 2.0714)² = 3.7195 | (3 - 4)² = 1      | (3 - 2.0714)² = 0.8623
1     | (1 - 2.0714)² = 1.1479 | (1 - 1)² = 0      | (1 - 2.0714)² = 1.1479
0     | (1 - 2.0714)² = 1.1479 | (0 - 1)² = 1      | (0 - 2.0714)² = 4.2907
1     | (1 - 2.0714)² = 1.1479 | (1 - 1)² = 0      | (1 - 2.0714)² = 1.1479
1     | (1 - 2.0714)² = 1.1479 | (1 - 1)² = 0      | (1 - 2.0714)² = 1.1479
2     | (1 - 2.0714)² = 1.1479 | (2 - 1)² = 1      | (2 - 2.0714)² = 0.0051
2     | (1 - 2.0714)² = 1.1479 | (2 - 1)² = 1      | (2 - 2.0714)² = 0.0051
0     | (1 - 2.0714)² = 1.1479 | (0 - 1)² = 1      | (0 - 2.0714)² = 4.2907
0     | (1 - 2.0714)² = 1.1479 | (0 - 1)² = 1      | (0 - 2.0714)² = 4.2907
2     | (1 - 2.0714)² = 1.1479 | (2 - 1)² = 1      | (2 - 2.0714)² = 0.0051
Total | 28.9286                 | 12                | 40.9286
Step 3: Calculate degrees of freedom. dfBetween = K (number of groups) - 1, dfWithin = NG - K (number of groups)
dfBetween = 3 - 1 = 2, dfWithin = 14 - 3 = 11
Step 4: Calculate MSWithin. MSWithin = SSWithin / dfWithin
MSWithin = SSWithin / dfWithin = 12 / 11 = 1.0909
Step 5: Calculate MSBetween. MSBetween = SSBetween / dfBetween
MSBetween = SSBetween / dfBetween = 28.9286 / 2 = 14.4643
Step 6: Calculate observed F value. FObserved = MSBetween / MSWithin
FObserved = MSBetween / MSWithin = 14.4643 / 1.0909 = 13.2589
Step 7: Calculate critical value. Look up in F table.
Fcritical = 3.982
Step 8: Test hypothesis. Is the observed F value above the critical value?
FObserved of 13.2589 is above Fcritical of 3.982.
REJECT NULL HYPOTHESIS
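The structural steps generalize to any group sizes, so they too can be collected into a short function. This is our own sketch in plain Python, checked against the worked example above:

```python
# Structural one-way between-subjects ANOVA: works for unequal group sizes.
import statistics

def structural_anova(groups):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)

    # Partition: each score contributes a between part and a within part.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)       # SSB / dfB
    ms_within = ss_within / (n_total - k)   # SSW / dfW
    return ms_between / ms_within

# Unequal-n handedness example above (5, 5, and 4 scores):
f = structural_anova([[6, 4, 3, 4, 3], [1, 0, 1, 1, 2], [2, 0, 0, 2]])
print(round(f, 4))   # 13.2589
```

With equal group sizes, this returns the same F ratio as the simple method, which is why the simple ANOVA can be viewed as a special case of the structural ANOVA.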
 
Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
RHX=19.37561423343444
LHX=4.7143112121223
AMBX=4.55562120255555
 XG=9.5417
 
Assume that the scores are normally distributed (α=.05), SSBetween=1160.4278, SSWithin=3885.5255

What are the results of the hypothesis test?
Answer: FObserved of 3.1359 is below Fcritical of 3.467.
FAIL TO REJECT NULL HYPOTHESIS
Step 1: Choose statistical test. What type of variable is it? How many groups are there? Are the groups related?
Variable is type ratio. There are three or more groups, and they are independent. Use a between subjects ANOVA.
Step 2: Calculate degrees of freedom. dfBetween = K (number of groups) - 1, dfWithin = NG - K (number of groups)
dfBetween = 3 - 1 = 2, dfWithin = 24 - 3 = 21
Step 3: Calculate MSWithin. MSWithin = SSWithin / dfWithin
MSWithin = SSWithin / dfWithin = 3885.5255 / 21 = 185.025
Step 4: Calculate MSBetween. MSBetween = SSBetween / dfBetween
MSBetween = SSBetween / dfBetween = 1160.4278 / 2 = 580.2139
Step 5: Calculate observed F value. FObserved = MSBetween / MSWithin
FObserved = MSBetween / MSWithin = 580.2139 / 185.025 = 3.1359
Step 6: Calculate critical value. Look up in F table.
Fcritical = 3.467
Step 7: Test hypothesis. Is the observed F value above the critical value?
FObserved of 3.1359 is below Fcritical of 3.467.
FAIL TO REJECT NULL HYPOTHESIS
 
Definitions
Complex Null Hypotheses: ANOVA hypotheses that involve at least one average of means in the comparison.
 
F Ratio: A ratio between mean square terms that is used to evaluate a hypothesis test
 
F Ratio for One-way Between Subjects ANOVA:
F = MSBetween / MSWithin  = MSB / MSW
 
MSBetween or  MSB: Mean Square Between --  the variance estimate that comes from between group means in an ANOVA.
 
MSWithin or MSW: Mean Square Within -- the variance estimate that comes from scores within groups in an ANOVA.
 
Omnibus Test: The ANOVA test which evaluates all simple and complex null hypotheses at the same time.
 
Simple Null Hypotheses: ANOVA Hypotheses that compare one mean against another.
 
Easy Questions
1. A one-way ANOVA is used for experiments with how many groups?
2. If a one-way ANOVA has three groups (μ1, μ2, and μ3), what is the null hypothesis?
3. What are the pairwise (simple) null hypotheses in an ANOVA with 3 groups?
4. What are the complex null hypotheses in an ANOVA with 3 groups?
5. Null hypotheses that compare the average of more than one mean with another mean are called __________________ hypotheses
6. What is the name given to the ANOVA test that tests all of the possible relationships among means?
7. What is the name for tests such as TUKEY that identify sources of the rejection of the null hypothesis in an omnibus test?
8. If the null hypothesis is rejected in an ANOVA, then we would use a
_______________ test to compare the individual means against each other
9. If we are testing a specific complex hypothesis, then we would use a(n) ______________________ test
10. If the null hypothesis is true, then what should the F ratio be?
11. If the null hypothesis is false, then what should the F ratio be?
12. In an ANOVA with 3 groups and 7 subjects per group, what is the critical F value for α = .05?
Medium Questions
13. Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
Group | Mean | Scores
RH    | 4    | 5, 4, 3, 4, 4
LH    | 1    | 1, 1, 1, 1, 1
AMB   | 2    | 2, 0, 0, 2, 6
Grand mean XG = 2.3333

S1² = 0.5, S2² = 0, S3² = 6
Assume that the scores are normally distributed (α=.05)
What are the results of the hypothesis test?
14. Here's another problem.

Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
Group | Mean | Scores
RH    | 4.5  | 5, 4, 3, 5, 5, 5
LH    | 4    | 3, 3, 3, 5, 5, 5
AMB   | 3    | 2, 0, 1, 5, 5, 5
Grand mean XG = 3.8333

S1² = 0.7, S2² = 1.2, S3² = 5.2
Assume that the scores are normally distributed (α=.05)
What are the results of the hypothesis test?
15. The null hypothesis is rejected in an ANOVA. We then performed a post-hoc test comparing pairwise means, and we showed no significant differences between the pairs. This would mean that the null hypothesis was rejected because of a(n)
16. Consider the hypothesis test that examines how handedness (left, right, or ambidextrous) affects pitch discrimination ability. Here are the scores. The scores in the groups are not related.
 
RHX=19.37561423343444
LHX=4.7143112121223
AMBX=4.5212025555
 XG=9.7391
 
Assume that the scores are normally distributed (α=.05), SSBetween=1139.1306, SSWithin=3885.3032

What are the results of the hypothesis test?
17. Consider the hypothesis test that examines how phone design (flat, flip, fold, or telescope) affects phone usage. Here are the scores. The scores in the groups are not related.
 
FLATX=85.270818910779
FLIPX=94.895969310981
FOLDX=996510899124
TELX=10110111785
 XG=94.0588
 
Assume that the scores are normally distributed (α=.05), SSBetween=637.3414, SSWithin=3547.6

What are the results of the hypothesis test?