TheBestStatistics.Info

Introduction

Before we get to hypothesis testing, we need to discuss one function that can be useful -- estimating a population mean given a mean calculated from a sample. Of course, if we are estimating a population mean from a sample mean, then there is going to be some error involved in that estimation because samples are never perfect estimates of the population. The way in which we handle the error is to not specify a specific value for the population but rather a range of population values that are likely to be correct given our sample mean. An example will surely help...

Suppose we collect IQ data from a sample of 25 students at a large university and find a sample mean IQ of 115. What can we say about the population of all the students at the university? First, if the students were randomly selected from the university population, then there is no reason to expect any bias in that sample, so our best guess of the student population mean is 115 (the same as the sample mean). But it is extremely unlikely that the population mean is exactly 115. Instead, we can specify an interval of values that is likely to contain the population mean, and the width of that interval depends on how confident we want to be that it does. This interval of values is called the confidence interval. If we want to be 100% confident that our interval contains the population mean, we can just make the interval infinitely wide, which we would write like this:

-∞<μ<∞

Of course, this is a little silly and not very helpful, but it does make the point that 100% certainty is unattainable with an interval of any useful width. If we want to be 99% confident, then we are still going to need an interval that is very wide. To find it, we construct the distribution of sample means and find the two values that enclose the middle 99% of that distribution. This would look something like this:

Distribution with 1% in both tails (i.e. 99% in the middle of the distribution)

As you can see, this is a familiar picture -- the distribution that has 1% in both tails is the same as the distribution with 99% in the middle. This shows a strong link between confidence intervals and the t distributions you learned about at the end of the last chapter -- the confidence intervals are focused on the middle of the distributions and hypothesis testing focuses on the regions on the outside, even though both functions relate to this same display.

The difficulty in determining confidence intervals is calculating how wide the confidence interval is on each side of the mean. Once we have the width (W) from the mean to the edges of the confidence interval, we can state the confidence interval as

X̄ - W < μ < X̄ + W (where X̄ is the sample mean)

In the display above for a 99% confidence interval, W would be the distance from the center of the confidence interval to the beginning of the red areas on both sides. This makes sense because the sample mean is the best estimate of the population mean, so we expect the confidence interval to be the part of the distribution between a value below the mean (X̄ - W) and a value above the mean (X̄ + W). The key to solving confidence intervals is determining the width of the interval, but how do we do that? Unfortunately, there are two different types of solutions, depending on the answer to the question 'Where did the standard deviation come from?'

If σ is known from the population scores, then we will use the standard error from the population (σ_X̄ = σ_X/√N)
If σ has to be estimated from the sample, then we will use the sample standard deviation (S) as the estimate of σ, and the estimated standard error (S_X̄ = S_X/√N)

Once we have determined which equation is appropriate, then the actual calculations are not all that difficult.
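As a rough illustration of how little arithmetic is involved, here is a minimal Python sketch of the standard error calculation. The function name standard_error is made up for this example, and the numbers (a standard deviation of 12 with N = 16) are borrowed from the sample problems later in this chapter.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: the standard deviation divided by sqrt(N).

    The same arithmetic applies whether sd is the population sigma
    or the sample estimate S; only the name of the result changes.
    """
    return sd / math.sqrt(n)

# Standard deviation of 12 and a sample of 16 subjects
print(standard_error(12, 16))  # 3.0
```

Which distribution we pair with this number (z or t) is what depends on where the standard deviation came from.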

To solve any confidence interval problem, you need to be given three things.

The source of the standard deviation. If it is from the population scores (less common), we will use the z distribution. If it is from the sample scores (more common), we will use the t distributions.
The sample mean
The size of the confidence interval. The most common confidence interval problems are 95%, 99% and 99.9%.

In most cases, we won't have the population standard deviation, so we will have to use the sample standard deviation and therefore the t distribution. Why do we only work with 95%, 99% and 99.9% confidence intervals? Because the t critical table from the last section lists values for only six tail areas, and the three two-tailed columns are the ones most commonly used for confidence intervals:

.05(2) = 5% in the two tails combined, which corresponds to 95% in the middle of the distribution = 95% confidence interval
.01(2) = 1% in the two tails combined, which corresponds to 99% in the middle of the distribution = 99% confidence interval
.001(2) = .1% in the two tails combined, which corresponds to 99.9% in the middle of the distribution = 99.9% confidence interval

To summarize, confidence intervals specify a range of means that is likely to contain the population mean. The larger the range, the more confident you can be that the true population mean will be in the confidence interval.
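The link between confidence and width can be checked numerically. The sketch below is only an illustration: it uses the z distribution (through Python's standard-library statistics.NormalDist) because no table is needed, and the helper name z_multiplier is an assumption of this example. The t distribution gives larger multipliers, but they grow in the same order.

```python
from statistics import NormalDist

def z_multiplier(confidence_pct):
    """Number of standard errors needed on each side of the sample mean
    so that the middle confidence_pct percent of the z distribution is covered."""
    tail_each = (1 - confidence_pct / 100) / 2   # area left in each tail
    return NormalDist().inv_cdf(1 - tail_each)

# The multiplier (and so the interval width) grows with the confidence level
for level in (95, 99, 99.9):
    print(level, round(z_multiplier(level), 2))
```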

Known σ

In the introduction, we learned that a confidence interval is a range of population means that is likely to contain the true population mean. In order to calculate a confidence interval, we need

The source of the standard deviation.
The sample mean
The size of the confidence interval.

This section will deal with confidence intervals created when the population σ is known. As such, σ will come directly from the population scores. Although it is not very common to have this population standard deviation, the calculation of this confidence interval will lay a foundation for the other type of confidence interval.

The best way to understand how to solve a confidence interval is to jump right in with a sample problem.
A sample mean of 103 was collected from a population that has a known standard deviation of 12. There were 16 subjects in the sample that created this mean. What is the 95% confidence interval for the range of population means?

Step 1: Since we are given σ, we will be using the z distribution and z-table to find the z-scores that we will use to calculate the confidence interval
Step 2: We need to calculate the standard error σ_X̄.
The formula is σ_X̄ = σ_X/√N.
σ_X̄ = 12 / √16
σ_X̄ = 12 / 4
σ_X̄ = 3
Step 3: Given our standard error of 3, we need to know which z-score to use to have 95% of the mean distribution in the middle of the curve. If there is 95% in the middle part of the curve, then there is 47.5% on each side of the center because the z distribution is symmetric. But the table is displayed by proportion, so we need to convert 47.5% into .475. To make things easier, we can take the size of the confidence interval in % (e.g. 95) and just divide it by 200, which combines the two steps of halving the area and converting to a proportion.
Step 4: To find the z-score that corresponds to area .475, we use the reverse table lookup procedure we learned in the normal distribution chapter. As it turns out, a z-score of 1.96 happens to have an area of exactly .475, so we don't have to perform the normal process of seeing which z-score has the area closest to our lookup area.
Step 5: Given a z-score of 1.96, we can calculate the confidence interval. Recall that the confidence interval will take the following form:

X̄ - W < μ < X̄ + W (where X̄ is the sample mean and W is the width of the confidence interval from the center to the edge)

The problem is determining W. In this case it will be 1.96 standard errors, so all we need to do is multiply 1.96 by the standard error that we calculated to be 3.

X̄ - (1.96 * σ_X̄) < μ < X̄ + (1.96 * σ_X̄)
103 - (1.96 * 3) < μ < 103 + (1.96 * 3)
97.12 < μ < 108.88

A picture will help.

95% Confidence Interval for sample problem

As you can see, the middle 95% of this mean distribution is from -1.96 standard errors to +1.96 standard errors. Stated in raw scores, this range is from 97.12 to 108.88.
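The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for the table method: the function name z_confidence_interval is invented for this example, and the reverse table lookup of Step 4 is replaced by the inverse normal function from the standard library.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence_pct):
    """Confidence interval for the population mean when sigma is known."""
    se = sigma / math.sqrt(n)                 # Step 2: standard error
    area = confidence_pct / 200               # Step 3: area between the mean and z
    z = NormalDist().inv_cdf(0.5 + area)      # Step 4: z-score with that area (about 1.96 for 95%)
    w = z * se                                # Step 5: width on each side of the mean
    return sample_mean - w, sample_mean + w

low, high = z_confidence_interval(103, 12, 16, 95)
print(round(low, 2), round(high, 2))  # 97.12 108.88
```

The software value of z carries more decimal places than the table's 1.96, but the interval agrees with the hand calculation to two decimal places.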

Another problem will be helpful. For a normally distributed population of scores, what is the 99% confidence interval of the population mean if the sample mean is 85 (N=64) and the standard deviation (σ_X) of the population scores is 25?

Because the standard deviation from the population is given to us, this will be a z distribution problem. Here's the solution with a graph...

Answer: 76.96 < μ < 93.04

Step 1: Calculate the standard error. σ_X̄ = σ_X/√N

σ_X̄ = σ_X/√N = 25/√64 = 3.125 ≈ 3.13

Step 2: Find the area to look up. Area = (CI%)/200

Area = (CI%)/200 = 99/200 = 0.495

Step 3: Find the z-score (Z_closer) whose area is closest to the lookup area. Use the z table.

Z_lower = 2.57, area = 0.4949, distance = 0.495 - 0.4949 = 0.0001
Z_higher = 2.58, area = 0.4951, distance = 0.4951 - 0.495 = 0.0001
Z_closer = 2.57 (the two distances are tied, so the lower z-score is used here)

Step 4: Calculate the CI. CI = X̄ - (Z_closer * σ_X̄) < μ < X̄ + (Z_closer * σ_X̄)

CI = 85 - (2.57 * 3.13) < μ < 85 + (2.57 * 3.13)
= 76.96 < μ < 93.04

99% Confidence Interval for sample problem

Notice that this confidence interval extends from -2.57 standard errors to +2.57 standard errors, which corresponds to the means from 76.96 to 93.04, so the solution is 76.96 < μ < 93.04.
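The "closest area" search in Step 3 can be imitated in code. The sketch below is an assumption-laden illustration: it rebuilds a z table with z-scores to two decimals and areas to four decimals from Python's statistics.NormalDist, and the name table_lookup_z is made up for this example. Areas are compared as whole ten-thousandths so that ties are handled exactly.

```python
from statistics import NormalDist

def table_lookup_z(confidence_pct):
    """Mimic a reverse lookup in a z table (z to 2 decimals, area to 4 decimals)."""
    target = round(confidence_pct / 200 * 10000)    # lookup area in ten-thousandths
    std_normal = NormalDist()
    best_z, best_dist = None, None
    for i in range(0, 400):                         # z = 0.00, 0.01, ..., 3.99
        z = i / 100
        area = round((std_normal.cdf(z) - 0.5) * 10000)
        dist = abs(area - target)
        if best_dist is None or dist < best_dist:   # strict '<' keeps the lower z on a tie
            best_z, best_dist = z, dist
    return best_z

print(table_lookup_z(99))                         # 2.57
print(round(NormalDist().inv_cdf(0.995), 4))      # z without table rounding
```

The tie between 2.57 and 2.58 arises because the z-score without table rounding lies between the two table entries; either choice changes the interval only in the second decimal place.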

Unknown σ

In the introduction, we learned that a confidence interval is a range of population means that is likely to contain the true population mean. In order to calculate a confidence interval, we need

The source of the standard deviation.
The sample mean
The size of the confidence interval.

This section will deal with confidence intervals created when the population σ is unknown. As such, σ will have to be estimated from the sample standard deviation S. Using the sample standard deviation to estimate σ is very common.

In the last section we solved the following problem.

A sample mean of 103 was collected from a population that has a known standard deviation of 12. There were 16 subjects in the sample that created this mean. What is the 95% confidence interval for the range of population means?

What if we had the same information, but the standard deviation was from a sample instead of a population? The problem would now look like this...

A sample mean of 103 was collected and the sample standard deviation was calculated to be 12. There were 16 subjects in the sample that created this mean. What is the 95% confidence interval for the range of population means?

Think a bit about what differences there would be in the way we solve this problem...

The only important difference between solving confidence intervals for known and unknown σ is the shape of the distribution. When σ is known, we use the standardized z distribution, but when σ is unknown, we need to use the t distribution. The challenge with using t distributions is to remember that the shape of the t distribution depends on the degrees of freedom (i.e. sample size - 1). This affects the solution by changing how far the confidence interval needs to extend on each side in order to contain the appropriate middle portion of the distribution. More specifically, confidence intervals using the t distribution will always require us to go out a little further than with the z distribution to contain the same middle portion. Below, we can compare the z distribution with the t distribution where df=5 (sample size=6).

Comparison of z distribution with t distribution where df=5

As you can see the t distribution is wider and flatter so we have to go out further to have 95% of the curve in the middle for the 95% confidence interval. To determine exactly how far we need to go, we have to use the t critical table that we learned about in the sampling mean chapter.

Here are the first 5 rows of the t critical table.

df	.05(1)	.05(2)	.01(1)	.01(2)	.001(1)	.001(2)
1	6.314	12.706	31.821	63.657	318.313	636.62
2	2.92	4.303	6.965	9.925	22.327	31.598
3	2.353	3.182	4.541	5.841	10.215	12.924
4	2.132	2.776	3.747	4.604	7.173	8.61
5	2.015	2.571	3.365	4.032	5.893	6.869

Since we have df=5 and we want a 95% confidence interval, we use the .05(2) column because 95% in the middle corresponds to 5% (.05) divided between 2 tails. The value in the df=5 row of that column is 2.571.
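The table lookup can be sketched as a small dictionary lookup. This is only an illustration: the two-tailed values are copied from the five table rows above, the names T_TABLE, COLUMN_FOR_LEVEL and t_critical are invented for this example, and the z value for comparison comes from Python's standard-library statistics.NormalDist.

```python
from statistics import NormalDist

# Two-tailed critical values copied from the five table rows shown above
T_TABLE = {
    1: {".05(2)": 12.706, ".01(2)": 63.657, ".001(2)": 636.62},
    2: {".05(2)": 4.303, ".01(2)": 9.925, ".001(2)": 31.598},
    3: {".05(2)": 3.182, ".01(2)": 5.841, ".001(2)": 12.924},
    4: {".05(2)": 2.776, ".01(2)": 4.604, ".001(2)": 8.61},
    5: {".05(2)": 2.571, ".01(2)": 4.032, ".001(2)": 6.869},
}

# Confidence level in % -> table column, as listed in the introduction
COLUMN_FOR_LEVEL = {95: ".05(2)", 99: ".01(2)", 99.9: ".001(2)"}

def t_critical(df, confidence_pct):
    """Look up the two-tailed t critical value for the given df and confidence level."""
    return T_TABLE[df][COLUMN_FOR_LEVEL[confidence_pct]]

z_95 = NormalDist().inv_cdf(0.975)   # about 1.96
print(t_critical(5, 95))             # 2.571
for df in sorted(T_TABLE):
    # Every t critical value in these rows is larger than the z value
    print(df, t_critical(df, 95), t_critical(df, 95) > z_95)
```

In these rows the t critical value shrinks toward the z value as df increases, but never reaches it.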

So, the procedure for calculating confidence intervals with a sample standard deviation will be the same as with a population standard deviation, except that we will use a different critical value to account for the wider t distribution.

Here is the problem again with the complete solution. For a normally distributed population of scores with an UNKNOWN population standard deviation, what is the 95% confidence interval of the population mean if the sample mean is 103 (N=16) and the sample estimate of standard deviation (S_X) of the population scores is 12?

Answer: 96.61 < μ < 109.39

Step 1: Calculate the estimated standard error. S_X̄ = S_X/√N

S_X̄ = S_X/√N = 12/√16 = 3

Step 2: Find the t_Crit value. Look up t critical.

Since this is a 95% CI, use the .05, 2-tailed column.
Look up t_Crit from the table (df = N - 1 = 16 - 1 = 15). So t_Crit = 2.131

Step 3: Calculate the CI. CI = X̄ - (t_Crit * S_X̄) < μ < X̄ + (t_Crit * S_X̄)

CI = 103 - (2.131 * 3) < μ < 103 + (2.131 * 3)
= 96.61 < μ < 109.39

Complete solution of confidence interval with only sample information given
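To see how much the switch from z to t matters for this problem, the sketch below computes both intervals from the same sample mean and standard error. It is an illustration only: the helper name interval is made up, and the critical values 1.96 and 2.131 are the table values used in the worked solutions in this chapter.

```python
def interval(sample_mean, standard_error, critical_value):
    """Confidence interval as (lower, upper) = mean -/+ critical value * standard error."""
    w = critical_value * standard_error
    return sample_mean - w, sample_mean + w

se = 12 / 16 ** 0.5                      # standard error = 3.0 in both versions of the problem

z_low, z_high = interval(103, se, 1.96)  # sigma known: z critical value
t_low, t_high = interval(103, se, 2.131) # sigma estimated: t critical value, df = 15

print(round(z_low, 2), round(z_high, 2))  # 97.12 108.88
print(round(t_low, 2), round(t_high, 2))  # 96.61 109.39
```

The centre and the standard error are identical; only the critical value changes, and the t-based interval is the wider of the two.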

Let's do one more problem. For a normally distributed population of scores with an UNKNOWN population standard deviation, what is the 99% confidence interval of the population mean if the sample mean is 104 (N=9) and the sample estimate of standard deviation (S_X) of the population scores is 6?

Answer: 97.29 < μ < 110.71

Step 1: Calculate the estimated standard error. S_X̄ = S_X/√N

S_X̄ = S_X/√N = 6/√9 = 2

Step 2: Find the t_Crit value. Look up t critical.

Since this is a 99% CI, use the .01, 2-tailed column.
Look up t_Crit from the table (df = N - 1 = 9 - 1 = 8). So t_Crit = 3.355

Step 3: Calculate the CI. CI = X̄ - (t_Crit * S_X̄) < μ < X̄ + (t_Crit * S_X̄)

CI = 104 - (3.355 * 2) < μ < 104 + (3.355 * 2)
= 97.29 < μ < 110.71

Definitions

Confidence Interval: A range of values that is likely to contain the population mean. Confidence Intervals are estimated from a sample mean and a source of variability.

Easy Questions

1. What are the most common confidence intervals? Why?

2. Which of these confidence intervals will have the widest range: 95%, 99%, 99.9% or 99.99%?

3. When the population standard deviation is known, what kind of a distribution do we use to estimate a confidence interval?

4. When the population standard deviation is NOT known, then what kind of a distribution do we use to estimate a confidence interval?

5. A confidence interval has been determined to be 120 < μ < 126. What was the sample mean that was used to produce this confidence interval?

6. A sample mean of 132 is used to create a confidence interval of 120 < μ < X. What is X?

7. For a normally distributed population of scores with an UNKNOWN population standard deviation, what is the 95% confidence interval of the population mean if the sample mean is 100 (N=25) and the sample estimate of standard deviation (S_X) of the population scores is 25?

8. For a normally distributed population of scores with an UNKNOWN population standard deviation, what is the 99% confidence interval of the population mean if the sample mean is 25 (N=25) and the sample estimate of standard deviation (S_X) of the population scores is 25?

Medium Questions

9. You don't know the sample mean and you have no information about the standard deviation, but you want to create a 100% confidence interval. What will it be?

10. For a normally distributed population of scores, what is the 95% confidence interval of the population mean if the sample mean is 85 (N=64) and the standard deviation (σ_X) of the population scores is 12?

11. For a normally distributed population of scores, what is the 99% confidence interval of the population mean if the sample mean is 85 (N=100) and the standard deviation (σ_X) of the population scores is 10?

12. For a normally distributed population of scores with an UNKNOWN population standard deviation, what is the 80% confidence interval of the population mean if the sample mean is 100 (N=25) and the sample estimate of standard deviation (S_X) of the population scores is 25?