TheBestStatistics.Info

Introduction

One of the first things students of statistics need to understand is the concept of the distribution of a variable. A variable can take on many different values, but some values are more likely than others. The distribution describes how likely each value is to appear. For example, the heights of male students in a class can take on different values, but heights of 69 inches (i.e., 5 feet 9 inches) are much more common than heights of 77 inches (i.e., 6 feet 5 inches).

The best way to comprehend a distribution is to produce a visualization/picture of the distribution. In the plot below, each point on the line corresponds to a height (on the horizontal axis) and the probability of a student in the class having that height (on the vertical axis). The shape of this line describes which heights occur more often than others.

Distribution of heights for males

The y axis is not labeled, but for statistical distributions it indicates how likely each particular value of the variable (in this example, height) is. So the highest point on the distribution, 69 inches, is the most probable height.

It is important to note that the distributions you will see on this website (and many others) have smooth curves. This helps us see the shape of the distribution more clearly, but it also assumes something that might not be obvious: that the population is very, very large and that the graph was made in a particular way. Specifically, whenever we make a graph, we need to decide how we are going to group values to form the picture of the distribution. You might be wondering why we have to group values at all. If we didn't, and we measured height with perfect precision, the graph would be a set of spikes, one for each person in the population. Therefore, to get a distribution with a shape at all, we need to group values into ranges. The 'smoothness' of the graph depends on the sizes of the ranges we use when we form the picture.

If we use ranges of values that are too large, then the distribution starts to look more angular and less smooth. The distribution below is the same normal distribution shown above, but with larger ranges of values grouped together to form the picture.

Normal curve created with larger groups of values on the x axis
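To make the idea of grouping concrete, here is a small Python sketch that sorts simulated heights into ranges of two different widths. The heights are made-up data drawn from a normal curve with an assumed mean of 69 inches and an assumed standard deviation of 3 inches, and group_into_ranges is a hypothetical helper written for this illustration. Wider ranges produce fewer, chunkier bars, which is why the picture gets more angular.

```python
import random
from collections import Counter


def group_into_ranges(values, width):
    """Count how many values fall into each range [start, start + width)."""
    counts = Counter()
    for v in values:
        start = (v // width) * width  # lower edge of the range that contains v
        counts[start] += 1
    return dict(sorted(counts.items()))


# Hypothetical data: 1,000 simulated heights in inches.
# The mean of 69 and standard deviation of 3 are assumptions for illustration.
rng = random.Random(42)
heights = [rng.gauss(69, 3) for _ in range(1000)]

narrow_width, wide_width = 1, 4
narrow = group_into_ranges(heights, narrow_width)  # many small ranges
wide = group_into_ranges(heights, wide_width)      # a few large ranges: more angular

# Text "histogram" of the wide grouping: one '#' per 10 students.
for start, count in wide.items():
    print(f"{start:4.0f} to {start + wide_width:4.0f}  {'#' * (count // 10)}")
```

Try printing the narrow grouping the same way. With many more, thinner ranges, the outline of the hump becomes finer and less angular.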

Ultimately, smooth pictures are preferred because they are easier to understand (and to draw!), even though in practice such pictures would be very difficult to produce.

Some statistics textbooks spend considerable time on the problems of making distributions from small populations and samples. Students often find these concepts of limits (e.g., upper and lower real limits) and the various displays of data (histograms, stem and leaf diagrams) to be very annoying. These concepts are not covered on this website because the author believes they distract from the more important concepts of descriptive statistics needed to support inferential statistics.

Characteristics

Let's get back to the distribution of heights of males shown in the last section.

Distribution of heights for males

Distributions of variables have certain characteristics that describe the shape of the distribution. The three most important characteristics are...

Modality refers to how many HUMPS the distribution has.
Symmetry describes how the right side of a distribution compares to the left side.
Asymptoticness refers to how far the distribution extends out to the left and to the right.

Modality

Modality refers to how many HUMPS (or modes) the distribution has. The distribution below is called UNIMODAL because it has one high point (or mode).

Unimodal Distribution

Unimodal distributions are typical when the distribution describes a single population of individuals. For example, the distribution of heights of 25-year-old males is unimodal because there is one value (the average height) that is most common.

However, a distribution that combines two different populations would be BIMODAL because there would be a high point at the average value of each population. This might look something like the graph below, which describes the weightlifting capacity of a large group of male and female college students.

Bimodal distribution of two populations that overlap a very small amount

Notice that in this case, the two modes have a big valley between them because in general, there is not much overlap between the weightlifting capacities of the two genders. Most men can lift quite a bit more than the average woman, even though there may be a few women who can lift more than the average man and a few men who can lift less than the average woman.

Of course, some bimodal distributions have humps that are less distinct. For example, the graph below shows the heights of a large group of male and female students.

Bimodal Distribution with two populations that overlap a lot

In this case, the two modes have only a small dip between them because quite a few women are taller than the average man and quite a few men are shorter than the average woman. The depth of the valley between the two humps depends on how far apart the two population averages are compared with the width (measured by standard deviation) of the two population distributions that contribute to the bimodal distribution.

In the bimodal distributions shown so far, the humps have always been equally high, but this does not have to be the case. For example, the graph below shows the heights of a large group of students in which there are more women than men.

Bimodal Distribution with two populations that are different sizes

Notice that the hump on the right, which comes from the heights of the men, is lower than the hump on the left, which comes from the heights of the women.

At some point, if the number of men is very small compared to the number of women, the combined distribution will not even be bimodal, but rather just skewed like this one...

Distribution of two populations where one population is much larger than the other

Ultimately, it may be difficult to determine exactly what constitutes a bimodal distribution as opposed to a skewed distribution. Fortunately for students, most examples on exams are clearly bimodal or not, but the wise student remembers that in the real world things can get a little more complicated.

Of course, distributions may have more than 2 modes if they are a combination of three or more populations. The distribution below is called TRIMODAL because it has three high points (or modes).

Trimodal Distribution of three populations

Some distributions have no peaks at all, with each value appearing as often as every other value. This is called an AMODAL distribution.

In summary, modality describes how many humps show in the distribution, where each hump usually indicates a separate population.

Symmetry

Symmetry refers to how the left and right sides of a distribution compare to each other. If the left and right sides are mirror images of each other, the distribution is symmetric, which looks like this...

Symmetric Distribution

Non-symmetric distributions come in two types: positively skewed and negatively skewed. The direction of the skew refers to the direction in which the long tail points. So positively skewed distributions look like this...

Positively Skewed Distribution

Positively skewed distributions are associated with FLOOR EFFECTS because the values on the left side of the distributions are squished up against the floor. For example, the distribution of how many times students have taken the drug ecstasy is positively skewed because:

the majority of students have never taken the drug.
fewer students have taken the drug once.
some of those who have taken it may have taken it a number of times.
even fewer students (but some, unfortunately) have taken the drug quite a few times.

Negatively-skewed distributions have a long tail that points toward the low (left) end of the scale, like this...

Negatively Skewed Distribution

Negatively-skewed distributions are associated with CEILING EFFECTS because the values on the right side of the distributions are squished up against the ceiling. For example, the distribution of scores on a very easy exam is negatively skewed because:

the majority of students score near 100%.
fewer students score a bit lower, in the 90%-95% range.
the real slackers score below 90%, and some even score very low.

In summary, symmetry describes how the two sides of the distribution compare to each other.

Asymptoticness

Asymptoticness refers to how far the distribution extends out from the center of the distribution. For a distribution to be perfectly asymptotic, the distribution has to extend out to infinity on both sides. Of course, the probability of values that are far away from the mean is very low, but these values are still possible.

A strong violation of asymptoticness occurs when the maximum or minimum value is close to the mean. For example, if the minimum value is only 1 standard deviation away from the mean, then the distribution is said to have a strong violation of asymptoticness. There is no sharp line between weak and strong violations, but a limit only 1 standard deviation from the mean is certainly considered a strong violation. The chart below shows a strong violation of asymptoticness.

Distribution with strong violation of Asymptoticness

Almost all distributions of real variables have a weak violation of asymptoticness. For example, the distribution of heights of males has a maximum and a minimum value. If the average height of a male is 68 inches and the standard deviation is 4 inches, then the lowest possible height, 0 inches, is 17 standard deviations below the mean. This weak violation of asymptoticness is not important because values 17 standard deviations below the mean occur with such incredibly low probability that they have no real effect on the area under the curve (which corresponds to probability). The chart below shows a weak violation of asymptoticness where the lowest and highest values are 4 standard deviations away from the mean. As the graph shows, it is nearly impossible to tell this by looking at the graph because values 4 standard deviations away from the mean are so improbable.

Distribution with weak/no violation of Asymptoticness
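The arithmetic in the heights example can be checked in a few lines of Python. This sketch assumes, purely for illustration, that heights follow a normal curve with the example's mean of 68 inches and standard deviation of 4 inches; z_score and normal_tail_below are hypothetical helper names. It converts 0 inches into standard deviations from the mean and estimates how rarely values fall 4 or 17 standard deviations below it.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd


def normal_tail_below(z):
    """P(Z < z) for a standard normal curve, using the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Assumed example values from the text: mean 68 inches, SD 4 inches.
z_zero = z_score(0, 68, 4)
print(z_zero)                     # -17.0
print(normal_tail_below(-4))      # about 3.2e-05
print(normal_tail_below(z_zero))  # astronomically small (below 1e-60)
```

On a normal curve, only about 3 values in 100,000 fall more than 4 standard deviations below the mean, which is why the cut-off tails are invisible in a chart.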

In summary, asymptoticness describes how far the distribution extends out from the mean. Although almost all distributions have a weak violation of asymptoticness, only strong violations are considered important.

Central Tendencies


Central tendency measures try to describe the 'typical' score of a distribution. The most widely used measures are...

Mode
Median
Mean

Although the most important measure for statistics is the mean, the median and the mode can also be useful in some situations. With each central tendency measure, the strengths and weaknesses will be discussed.

Mode

The mode is the easiest central tendency measure to understand and calculate. The mode is simply the most frequently occurring score. As such, it is a crude measure of central tendency because most scores have little or no effect on the mode. In fact, the mode can be extremely misleading for some distributions.

The greatest strength of the mode is that it can be applied to all variable types. Imagine you have the following distribution of party affiliation.

Republicans - 54
Democrats - 34
Independents - 22

The only thing we can say about the 'central tendency' of the distribution is that the most common party affiliation is 'Republican'. This is the mode, and it is the only central tendency measure that makes sense for a nominal variable. Of course, we can also apply the mode to all the other types of variables as well.

The number of modes describes the distribution. Here are the terms for different numbers of modes...

Term	Description	Example
Unimodal	1 most common score	1,2,2,2,2,2,2,2,3,4 (Mode is 2)
Bimodal	2 most common scores	1,2,2,2,3,4,5,5,5,6 (Modes are 2 and 5)
Trimodal	3 most common scores	1,1,2,3,4,5,6,6,7,7 (Modes are 1,6, and 7)
Amodal	All scores equally common	1,2,3,4,5,6,7 (No modes)
Quasimodo	A fairy-tale character with an odd-shaped back (just checking to see if you are paying attention)
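Finding the mode is just counting. The sketch below (a hypothetical modes helper, not part of any library) uses Python's collections.Counter to find the most common score(s), and it follows the table's convention that an amodal set, where every score is equally common, has no mode. Because it only counts, it also works for nominal data such as the party affiliation example.

```python
from collections import Counter


def modes(scores):
    """Return the most common score(s) in sorted order; [] means amodal."""
    counts = Counter(scores)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every score is equally common, so there is no mode
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([1, 2, 2, 2, 2, 2, 2, 2, 3, 4]))  # [2]
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6]))  # [2, 5]
print(modes([1, 1, 2, 3, 4, 5, 6, 6, 7, 7]))  # [1, 6, 7]
print(modes([1, 2, 3, 4, 5, 6, 7]))           # []

party = ["Republican"] * 54 + ["Democrat"] * 34 + ["Independent"] * 22
print(modes(party))                           # ['Republican']
```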


Here's a table to summarize what we learned about the mode

How to find it	Look at most common score
Strengths	Can be applied to all variable types
Weaknesses	Ignores all scores that are not most common
Variables allowed	Nominal, Ordinal, Ratio and Interval

Summary of mode characteristics

Median

The median is used less often than the mean as a central tendency measure in statistics. It is calculated by ordering the scores and taking the score that is in the "middle" of the order. Although all scores affect the median, most of them matter only because they fall above or below the middle; the middle score(s) have a large impact.

The best way to understand medians is to look at an example. Here is a set of 5 scores...

1
3
4 = Median
10
11

Since there are 5 ordered scores, the 'middle score' or median will be the 3rd score, which is '4'. Notice that the '10' and '11' scores affect the median only by being higher than 4; it doesn't matter how much higher they are.

Here is another distribution of 5 scores that has the same median as the previous distribution, even though it is actually quite a different distribution...

1
3
4 = Median
1000
1001

Since there are 5 ordered scores, the 'middle score' or median will be the 3rd score, which is '4' in this case. Notice that this distribution has the same median as the previous distribution, even though its two highest scores are much higher than the previous two highest scores.

Here is another distribution of 5 scores that has a very different median from the previous distribution, even though four of the five scores are the same.

1
3
999 = Median
1000
1001

As you can see, this distribution has a median of 999, which is very different from the previous distribution's median of 4, even though 4 of the 5 scores are the same. This highlights the main problem of the median, which is that it is highly affected by the middle score(s).

It's easy to choose the middle score when there is an odd number of scores, but what happens if there is an even number of scores like this...

There is no middle score, because there are 6 scores. You can probably guess how we choose a new middle score: we average the 3rd and 4th scores. So the median here is (5+7)/2=6. This shows us that the median does not have to be one of the scores.
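Here is a minimal Python sketch of the median rule. The median function is written out by hand for teaching; Python's statistics.median does the same job. The first three calls use the 5-score examples above. The six scores for the even case are not listed in the text, so [1, 3, 5, 7, 9, 11] is a hypothetical set chosen so that the 3rd and 4th scores are 5 and 7.

```python
import statistics


def median(scores):
    """Middle score of the ordered list, or the average of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two middle scores


print(median([1, 3, 4, 10, 11]))        # 4
print(median([1, 3, 4, 1000, 1001]))    # 4
print(median([1, 3, 999, 1000, 1001]))  # 999
print(median([1, 3, 5, 7, 9, 11]))      # 6.0 (hypothetical six scores)
print(statistics.median([1, 3, 5, 7, 9, 11]))  # 6.0
```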

The median can be applied to ordinal, interval and ratio variables, but not to nominal variables. This makes sense because the values of nominal variables have no natural order, so we can't put them in an ordered list. All the other variable types do have an order, so we can always pick the middle one.

Here's a table to summarize what we learned about the median

How to find it	Pick the middle score. Pick the average of the two middle scores if there is an even number of scores in the distribution
Strengths	Can be used with ordinal variables
Weaknesses	Affected greatly by middle score(s)
Variables allowed	Ordinal, Interval and Ratio

Summary of median characteristics

Mean

The mean is the most widely used central tendency measure in statistics. It is calculated by summing all of the scores and dividing by the number of scores. The strength of this measure is related to its major weakness: every score in the distribution has an impact on the mean. The problem is that extreme scores can heavily distort the mean. For example, a sample of students might have a mean of 4 for the number of times they have appeared on TV. However, in a sample of 100 students, it may be that only one student appeared on TV, but that student appeared 400 times because they were on a regular TV show. Thus, 99% of the students have never been on TV, and yet the mean is still 4, which seems a bit misleading. Therefore, the mean is not a good way of describing the typical score if the data have one or more extreme values.
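The TV example is easy to reproduce. This sketch builds the hypothetical sample described above (99 students with 0 appearances and one student with 400) and compares the mean, computed directly as the sum divided by N, with the median from Python's statistics module.

```python
import statistics

# Hypothetical sample from the text: 99 students never on TV, 1 student on TV 400 times.
tv_appearances = [0] * 99 + [400]

mean_tv = sum(tv_appearances) / len(tv_appearances)  # sum of scores divided by N
median_tv = statistics.median(tv_appearances)

print(mean_tv)    # 4.0: pulled up by one extreme score
print(median_tv)  # 0.0: the typical student has never been on TV
```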

The mean is calculated by averaging the scores, which is easy to understand. However, we need to start being more precise about how we define 'average'. In order to do that, we need to start introducing some of the symbols used in statistics. At first, these symbols can seem a little annoying, but once you get used to them, you'll find them pretty easy to deal with.

In order to calculate the mean, we are going to have to add up all the scores. There's a special symbol that means 'the sum of all scores' and it looks like this when the scores are of the variable named 'X'...

ΣX = the sum of all scores of the X variable

The mean (μ) of all the scores is this sum divided by the number of scores, which is designated by 'N'. So we have our first formula: μ=(ΣX)/N

The formula for the mean is a perfect example of how formulas represent concepts. The above formula is basically shorthand for saying 'The average of the scores is equal to the sum of the scores divided by the number of scores'.

A mean only makes sense for interval and ratio variables, because only these variable types have equal differences between adjacent values. It's easy to see why a mean does not make sense for nominal values: you can't average a variable whose values are just category names. How do you average a man and a woman? A mean of ordinal variables does not make sense either. In the table below, Horse 1 finished 1st in Race 1 and 5th in Race 2, so its average place is 3rd, the same as Horse 3, which finished 3rd in both races. Yet Horse 1's average time (75 seconds) is much slower than Horse 3's (62 seconds), because in both races the first 4 horses were very close together while the 5th-place horse was far behind.

Horse	Race 1 time (seconds)	Race 2 time (seconds)	Average time for both races
1	60	90	75
2	61	61	61
3	62	62	62
4	63	63	63
5	90	60	75

Why the mean cannot be applied to ordinal variables
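The horse-race argument can be checked by turning the times in the table into places. In this Python sketch (the places helper is hypothetical), Horses 1 and 5 both end up with an average place of 3, the same as Horse 3, yet their average time is far slower. That gap is why averaging ranks is misleading.

```python
# Finishing times in seconds, copied from the table: horse -> (race 1, race 2).
times = {
    1: (60, 90),
    2: (61, 61),
    3: (62, 62),
    4: (63, 63),
    5: (90, 60),
}


def places(race):
    """Rank the horses in one race (0 = race 1, 1 = race 2); place 1 is the fastest."""
    order = sorted(times, key=lambda horse: times[horse][race])
    return {horse: place for place, horse in enumerate(order, start=1)}


race1, race2 = places(0), places(1)
for horse, (t1, t2) in times.items():
    avg_place = (race1[horse] + race2[horse]) / 2
    avg_time = (t1 + t2) / 2
    print(f"Horse {horse}: average place {avg_place}, average time {avg_time} s")
```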

Here's a table to summarize what we learned about the mean

How to find it	Sum the scores and divide the sum by the number of scores
Strengths	Most common measure of central tendency
Weaknesses	Can be distorted by one extreme score
Variables allowed	Interval and Ratio

Summary of mean characteristics


Variability

An important characteristic of a distribution is how much the scores spread out. This 'spread' can be measured in a number of ways. We could start by measuring the distance between the maximum and minimum scores (i.e., the 'range'), and this would give us a rough approximation of how spread out the scores are – but it would not give us a sense of where the majority of scores lie within that range.

The range is a crude measure because it is determined only by the maximum and minimum -- everything in between these endpoints is ignored in the calculation. To see why this is a problem, consider the distributions below...

Two very different distributions with the same range

These distributions have the same range, but are clearly quite different.
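To see the problem with the range numerically, here is a Python sketch with two hypothetical sets of 8 scores that share the same minimum and maximum. As a rough stand-in for 'how far the scores are from the typical score', it also computes the average absolute distance from the mean. That is an illustrative measure chosen for this sketch, not the standard deviation discussed next.

```python
def value_range(scores):
    """Range = maximum score minus minimum score."""
    return max(scores) - min(scores)


def average_distance_from_mean(scores):
    """Average absolute distance of each score from the mean (illustrative measure)."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


# Hypothetical data: same endpoints, very different spread in between.
clustered = [1, 5, 5, 5, 5, 5, 5, 9]   # most scores sit at the center
spread_out = [1, 1, 1, 1, 9, 9, 9, 9]  # scores piled at the extremes

print(value_range(clustered), value_range(spread_out))  # 8 8
print(average_distance_from_mean(clustered),
      average_distance_from_mean(spread_out))           # 1.0 4.0
```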

A better measure of the spread would take into account how far all of the scores are away from the typical score (i.e., mean). One such measure is the standard deviation which is denoted by σ. It is essential that you learn how to calculate this measure of variability because it will be used extensively in descriptive and inferential statistics.

Calculating σ is going to be our first real challenge. The concept is fairly straightforward, even though the formula will look a little ugly at first. Basically, the standard deviation is roughly "the average distance of each score away from the mean score". Calculating the mean is no big deal; we already learned that. The only problem is how we define "distance of each score away from the mean". We could just define distance as the absolute distance (e.g., if the mean is 3, then a score of 5 has a distance of 2 and a score of 1 has a distance of 2). However, it turns out that if we take the square of each distance, then the sum of these squares is much more useful. It would be nice if this could be explained easily, but it can't, and we don't want to get bogged down in long explanations.

So let's look at a small distribution and see how we would calculate the standard deviation. In this example, the mean of the scores is 4.

Score	Deviation (distance from mean)	(Distance from mean)²
1	(1-4)=-3	(-3)²=9
3	(3-4)=-1	(-1)²=1
5	(5-4)=1	(1)²=1
7	(7-4)=3	(3)²=9
Sum (Σ)		9+1+1+9=20

Sum of squared deviations for the scores 1, 3, 5, 7

We are not quite done yet. We've calculated the sum of the squares, but we have to find the average of these, so we will divide this sum by N (the number of scores). This standardized sum of squares is called the variance.

This standardized sum of squares (i.e., variance) is strongly related to our ultimate goal: the standard deviation. Notice that the variance measures how far away the scores are from the mean, but it is expressed in squared units. Squaring numbers has a big impact on their size, so we need to undo that squaring when we calculate the standard deviation. The easiest way to do that is to take the square root of the variance. So here's our final formula for standard deviation: σ=√(Σ(X-μ)²/N)

Wow, that was a long way to go. So let's go back to our original data and work it out.

Population scores are 1,3,5,7

What is the population standard deviation (σ)?

Answer: 2.2361

Step 1: Calculate population mean. μ=(ΣX_i)/N

μ=(ΣX_i)/N=(1+3+5+7)/4=16/4=4

Step 2: Calculate population Sum of Squares. SS=Σ(X_i-μ)²

SS=Σ(X_i-μ)²=(-3)²+(-1)²+1²+3²=9+1+1+9=20

Step 3: Calculate Population Variance. σ²=SS/N

σ²=SS/N=20/4=5

Step 4: Calculate Population Standard Deviation. σ=√σ²

σ=√σ²=√5=2.2361

Note that in the sum calculation, X has a subscript of i (i.e., X_i) because the 'i' subscript means that we are summing over each score in X. This subscript isn't really important at this point, but it will be more useful in later calculations. Also note that 'SS' is used as shorthand for Sum of Squares. SS will be used a lot in other sections, so you should know what it means.
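The four steps map directly onto code. This Python sketch (population_sd is a hypothetical name) returns each intermediate value so you can compare it with the worked answer above, and it checks the result against statistics.pstdev, the standard library's divide-by-N standard deviation.

```python
import math
import statistics


def population_sd(scores):
    """Follow the four steps: mean, sum of squares, variance (divide by N), square root."""
    n = len(scores)
    mu = sum(scores) / n                     # Step 1: population mean
    ss = sum((x - mu) ** 2 for x in scores)  # Step 2: sum of squares
    variance = ss / n                        # Step 3: population variance
    sigma = math.sqrt(variance)              # Step 4: population standard deviation
    return mu, ss, variance, sigma


mu, ss, variance, sigma = population_sd([1, 3, 5, 7])
print(mu, ss, variance, round(sigma, 4))          # 4.0 20.0 5.0 2.2361
print(round(statistics.pstdev([1, 3, 5, 7]), 4))  # 2.2361
```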

This calculation is a fantastic example of how formulas represent concepts, even if the formula is a little scary at first. This is typical of many calculations in statistics: each part is really pretty simple, but putting everything together in the right order can be a challenge, especially on an exam with time pressure. The only way to be sure you will do well on the exam is to make sure you've practiced these problems enough, which means doing more practice problems than you might want to. This kind of problem involves 4 steps, but some of the more complicated problems on this site may involve up to 10 steps. That may sound like madness, but many real-world problems are far more complicated than these challenges.

So far, we've learned how to calculate a standard deviation. However, there is another complication about standard deviations that you need to know about. As it turns out, there are two different kinds of standard deviations...

Standard deviations of populations calculated from the scores (which we just learned how to calculate)
Standard deviations of populations that have to be estimated from samples

This makes sense if you think about how often we have all the scores in the population -- which is almost never. Instead, we have to estimate the population standard deviation from the scores in the sample. This may sound like a nightmare, but actually the calculation is mostly the same with just a couple of small changes.

Standard deviation from population scores = √(Σ(X-μ)²/N)
Standard deviation estimated from sample scores = √(Σ(X-X̄)²/(N-1))

The first difference is that the symbol for the mean has changed from μ to X̄ (read 'X-bar'). This is because sample means are denoted by X̄ while population means are denoted by μ. The second difference is that for the estimated standard deviation, the sum of squares is divided by 'N-1' instead of N. Why? Because the scores in a sample are, on average, a little closer to the sample mean than they are to the population mean, so the sum of squares around the sample mean tends to be too small. We adjust for this by dividing by 'N-1'. With a very small N, this adjustment is quite large, because there is a very good chance of missing extreme scores.
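If the N-1 argument feels abstract, a quick simulation can make it concrete. This Python sketch assumes a normal population with variance 1 (an arbitrary choice for illustration), repeatedly draws samples of N = 3, and averages the sum of squares divided by N and by N-1. Dividing by N comes out too small on average (near 2/3), while dividing by N-1 lands near the true variance of 1. Note that it checks the variance, the quantity under the square root.

```python
import random


def sum_of_squares(scores):
    """Sum of squared distances from the SAMPLE mean (not the population mean)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


rng = random.Random(0)
n, trials = 3, 50_000
divide_by_n = divide_by_n_minus_1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(0, 1) for _ in range(n)]  # assumed population: mean 0, variance 1
    ss = sum_of_squares(sample)
    divide_by_n += ss / n
    divide_by_n_minus_1 += ss / (n - 1)

print(divide_by_n / trials)          # close to 2/3: systematically too small
print(divide_by_n_minus_1 / trials)  # close to 1, the true population variance
```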

One final thing we need to do is get our terminology straight. The 'standard deviation of the population that is estimated from the sample' is a lot to say, and since we use this term over and over again, it would be nice to have an easier way to say it. To make things easier, this website will use the term 'Sample Standard Deviation' in place of 'the population standard deviation that is estimated from the sample.' So when the website says 'sample standard deviation', you need to use the formula with 'N-1' in the denominator. Also, the sample standard deviation is indicated by 'S'.

To see how this works, let's solve a sample standard deviation problem.

Sample scores are 3,5,7

What is the sample standard deviation (S)?

Answer: 2

Step 1: Calculate sample mean. X̄=(ΣX_i)/N

X̄=(ΣX_i)/N=(3+5+7)/3=15/3=5

Step 2: Calculate sample Sum of Squares. SS=Σ(X_i-X̄)²

SS=Σ(X_i-X̄)²=(-2)²+0²+2²=4+0+4=8

Step 3: Calculate Sample Variance. S²=SS/(N-1)

S²=SS/(N-1)=8/2=4

Step 4: Calculate Sample Standard Deviation. S=√S²

S=√S²=√4=2
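The sample version differs from the population version only in the mean symbol and the N-1 divisor. Here is a hypothetical sample_sd function in Python that follows the four steps, checked against statistics.stdev, which Python documents as the sample standard deviation (it divides by N-1).

```python
import math
import statistics


def sample_sd(scores):
    """Sample standard deviation: sum of squares around the sample mean, divided by N-1."""
    n = len(scores)
    x_bar = sum(scores) / n                     # Step 1: sample mean
    ss = sum((x - x_bar) ** 2 for x in scores)  # Step 2: sum of squares
    s_squared = ss / (n - 1)                    # Step 3: sample variance
    return math.sqrt(s_squared)                 # Step 4: sample standard deviation


print(sample_sd([3, 5, 7]))          # 2.0
print(statistics.stdev([3, 5, 7]))   # 2.0
print(statistics.pstdev([3, 5, 7]))  # dividing by N instead gives a smaller value
```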

Let's summarize. The standard deviation is the most widely used measure of spread. There are two different standard deviations: the population standard deviation (σ) and the sample standard deviation (S). The 'sample standard deviation' is really the 'population standard deviation estimated from the sample', but 'sample standard deviation' is easier to say. The population standard deviation divides the sum of squares by N, while the sample standard deviation divides the sum of squares by 'N-1'.

Definitions

σ: The standard deviation of a population

ΣX: Sum of all scores of variable X

Amodal Distribution: A distribution with 0 peaks

Asymptotic distribution: A distribution which extends to infinity in both directions. In other words, a distribution that has very, very long tails.

Bimodal Distribution: A distribution with 2 peaks

Mean (μ): The average of all the scores. This is the sum of all the scores divided by the number of scores (N)
Formula: μ=(ΣX)/N

Median: The middle score in a set of ordered scores.

Mode: The most frequent score in a distribution

Negatively Skewed Distribution: A distribution that has a long left side tail and a short right side tail

Positively Skewed Distribution: A distribution that has a short left side tail and a long right side tail

Range of a distribution: The distance between the maximum and minimum scores; a crude measure of the 'spread' of the distribution.
Formula: (Maximum Score) - (Minimum Score)

Sample standard deviation (S): Standard deviation estimated from sample scores.

Formula: S=√(Σ(X-X̄)²/(N-1))

Standard deviation of population (σ):
Formula: σ=√(Σ(X-μ)²/N)

Sum of squares: The sum of the squared deviations.
Formula: Σ(X-μ)²

Symmetric Distribution: A distribution that is the same on the left side as the right side

Trimodal Distribution: A distribution with 3 peaks

Unimodal Distribution: A distribution with 1 peak

Variance of population scores (σ²): The sum of squares divided by N, i.e., the average squared deviation from the mean (in squared units)
Formula: Σ(X-μ)²/N

Easy Questions

1. Ceiling effects are associated with what kind of distribution?

2. Floor effects are associated with what kind of distribution?

3. A population of the weights of men and women would have how many modes?

4. A distribution which is asymptotic must have scores that are _____________.

5. A distribution that is composed of three separate populations would be called ____________

6. An amodal distribution that has...

7. The distribution of the number of times a group of students has been on TV would likely have what characteristic?

8. An exam for a class was very easy, but a small number of students did not study at all. What kind of a distribution would we expect?

9. The distribution of the number of times a group of students has taken Ecstasy would likely have what characteristic?

10. In reality, almost all distributions are not purely asymptotic because most of the variables we measure in psychology _____________.

11. What types of variables can a mode be applied to?

12. What types of variables can a median be applied to?

13. What types of variables can a mean be applied to?

14. What is the main weakness of the median as a central tendency measure?

15. What is the main weakness of the mean as a central tendency measure?

Medium Questions

16. A population of the IQs of men and women would have how many modes?

17. A set of scores is 1, 3, 5, 7, 9 and X.
If the mean, median and mode of this set are all the same value, then what is X?

18. A set of scores is 3, 6, 9, 12, 15 and X.
If the mean, median and mode of this set are all the same value, then what is X?