8.3: Hypothesis Tests for the Mean (sigma unknown)
Tests About μ When σ is Unknown—The t-test for the Population Mean
As we mentioned earlier, only in a few cases is it reasonable to assume that the population standard deviation, σ, is known. The case where σ is unknown is much more common in practice. What can we use to replace σ? If you don’t know the population standard deviation, the best you can do is find the sample standard deviation, s, and use it instead of σ. (Note that this is exactly what we did when we discussed confidence intervals.)
Is that it? Can we just use S instead of σ, and the rest is the same as the previous case? Unfortunately, it’s not that simple, but not very complicated either.
We will first go through the four steps of the t-test for the population mean and explain in what way this test is different from the z-test in the previous case. For comparison purposes, we will then apply the t-test to a variation of the two examples we used in the previous case, and end with an activity where you’ll get to carry out the t-test yourself.
Let’s start by describing the four steps for the t-test:
I. Stating the hypotheses.
In this step there are no changes:
* The null hypothesis has the form:
$H_0: \mu = \mu_0$
(where μ0 is the null value).
* The alternative hypothesis takes one of the following three forms (depending on the context):
$H_a: \mu < \mu_0$ (one-sided)
$H_a: \mu > \mu_0$ (one-sided)
$H_a: \mu \neq \mu_0$ (two-sided)
II. Checking the conditions under which the t-test can be safely used and summarizing the data.
Technically, this step only changes slightly compared to what we do in the z-test. However, as you’ll see, this small change has important implications. The conditions under which the t-test can be safely carried out are exactly the same as those for the z-test:
(i) The sample is random (or at least can be considered random in context).
(ii) We are in one of the three situations marked with a green check mark in the following table (which ensure that $\bar{X}$ is at least approximately normal):
Assuming that the conditions are met, we calculate the sample mean $\bar{x}$ and the sample standard deviation, s (which replaces σ), and summarize the data with a test statistic. As in the z-test, our test statistic will be the standardized score of $\bar{x}$, assuming that $\mu = \mu_0$ ($H_0$ is true). The difference here is that we don’t know σ, so we use s instead. The test statistic for the t-test for the population mean is therefore:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
The change is in the denominator: while in the z-test we divided by the standard deviation of $\bar{X}$, namely $\sigma/\sqrt{n}$, here we divide by the standard error of $\bar{X}$, namely $s/\sqrt{n}$. Does this have an effect on the rest of the test? Yes. The t-test statistic in the test for the mean does not follow a standard normal distribution. Rather, it follows another bell-shaped distribution called the t distribution. So we first need to introduce you to this new distribution as a general object. Then, we’ll come back to our discussion of the t-test for the mean and how the t distribution arises in that context.
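To make the calculation concrete, here is a minimal sketch in Python of how the test statistic is computed from summary statistics (the function name and the numbers passed to it are illustrative, not from any particular software package):

```python
import math

def t_statistic(xbar, mu0, s, n):
    """Standardized score of the sample mean, using s in place of sigma."""
    standard_error = s / math.sqrt(n)   # estimated standard error of the sample mean
    return (xbar - mu0) / standard_error

# Illustrative numbers: sample mean 550, null value 500, s = 100, n = 4
print(t_statistic(xbar=550, mu0=500, s=100, n=4))  # prints 1.0
```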
The t Distribution
We have seen that variables can be visually modeled by many different sorts of shapes, and we call these shapes distributions. Several distributions arise so frequently that they have been given special names, and they have been studied mathematically. So far in the course, the only one we’ve named is the normal distribution, but there are others. One of them is called the t distribution.
The t distribution is another bell-shaped (unimodal and symmetric) distribution, like the normal distribution; and the center of the t distribution is standardized at zero, like the center of the normal distribution.
Like all distributions that are used as probability models, the normal and the t distribution are both scaled, so the total area under each of them is 1.
So how is the t distribution fundamentally different from the normal distribution?
The spread.
The following picture illustrates the fundamental difference between the normal distribution and the t distribution:
You can see in the picture that the t distribution has slightly less area near the expected central value than the normal distribution does, and you can see that the t distribution has correspondingly more area in the “tails” than the normal distribution does. (It’s often said that the t distribution has “fatter tails” or “heavier tails” than the normal distribution.)
This reflects the fact that the t distribution has a larger spread than the normal distribution. The same total area of 1 is spread out over a slightly wider range on the t distribution, making it a bit lower near the center compared to the normal distribution, and giving the t distribution slightly more probability in the ‘tails’ compared to the normal distribution.
Therefore, the t distribution ends up being the appropriate model in certain cases where there is more variability than would be predicted by the normal distribution. One of these cases is stock values, which have more variability (or “volatility,” to use the economic term) than would be predicted by the normal distribution.
There’s actually an entire family of t distributions. They all have similar formulas (but the math is beyond the scope of this introductory course in statistics), and they all have slightly “fatter tails” than the normal distribution. But some are closer to normal than others. The t distributions that are closer to normal are said to have higher “degrees of freedom” (that’s a mathematical concept that we won’t use in this course, beyond merely mentioning it here). So, there’s a t distribution “with one degree of freedom,” another t distribution “with 2 degrees of freedom” which is slightly closer to normal, another t distribution “with 3 degrees of freedom,” which is a bit closer to normal than the previous ones, and so on.
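You can check this convergence numerically. Here is a minimal sketch (in Python, assuming scipy is available) that compares the right-tail area beyond 2 under several t distributions and under the standard normal; the tail area shrinks toward the normal value as the degrees of freedom grow:

```python
from scipy.stats import t, norm

# P(T > 2) under t distributions with increasing degrees of freedom
for df in [1, 2, 5, 10, 30, 100]:
    print(f"t with {df:>3} d.f.: P(T > 2) = {t.sf(2, df):.4f}")

# The normal tail area is the limiting value
print(f"standard normal:  P(Z > 2) = {norm.sf(2):.4f}")
```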
The following picture illustrates this idea with just a couple of t distributions (note that “degrees of freedom” is abbreviated “d.f.” on the picture):
Recall that we were discussing the situation of testing for a mean in the case when σ is unknown. We’ve seen previously that when σ is known, the test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (note the σ in the formula), which follows a standard normal distribution. But when σ is unknown, the test statistic becomes $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ (note the s in the formula, in place of the unknown σ). This is where the t distribution arises in the context of a test for a mean: with s in place of σ, the statistic follows a t distribution.
Notice the only difference between the formula for the Z statistic and the formula for the t statistic: In the formula for the Z statistic, sigma (the standard deviation of the population) must be known; whereas, when sigma isn’t known, then “s” (the standard deviation of the sample data) is used in place of the unknown sigma. That’s the change that causes the statistic to be a t statistic.
Why would this single change (using “s” in place of “sigma”) result in a sampling distribution that is the t distribution instead of the standard normal (Z) distribution? Remember that the t distribution is more appropriate in cases where there is more variability. So why is there more variability when s is used in place of the unknown sigma?
Well, remember that sigma (σ) is a parameter (it’s the standard deviation of the population), whose value therefore never changes. Whereas, s (the standard deviation of the sample data) varies from sample to sample, and therefore it’s another source of variation. So, using s in place of sigma causes the sampling distribution to be the t distribution because of that extra source of variation:
In the formula $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, the only source of variation is the sampling variability of the sample mean $\bar{X}$ (none of the other terms in that formula vary randomly in a given study);
Whereas in the formula $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, there are two sources of variation: one is the sampling variability of the sample mean $\bar{X}$; the other is the sampling variability of the sample standard deviation s.
So, in a test for a mean, if sigma isn’t known, then s is used in place of the unknown sigma and that results in the test statistic being a t score.
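A quick simulation can make this extra source of variation concrete. The sketch below (assuming numpy and scipy are available; the population values are hypothetical) draws many small samples from a normal population, computes the t statistic for each, and compares how often |t| exceeds 2 with what the standard normal and the t(n − 1) distributions predict:

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)
n, mu, sigma = 5, 250, 12            # hypothetical population; small n on purpose
samples = rng.normal(mu, sigma, size=(100_000, n))

xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)      # s varies from sample to sample
t_stats = (xbar - mu) / (s / np.sqrt(n))

print("observed  P(|T| > 2):", np.mean(np.abs(t_stats) > 2))
print("normal    prediction:", 2 * norm.sf(2))
print("t(n - 1)  prediction:", 2 * t.sf(2, n - 1))
```

The observed proportion lands near the t(n − 1) prediction, well above what the standard normal predicts, which is exactly the “fatter tails” effect described above.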
The t score, in the context of a test for a mean, is summarized by the following figure:
In fact, the t score that arises in the context of a test for a mean is a t score with (n − 1) degrees of freedom. Recall that each t distribution is indexed according to “degrees of freedom.” Notice that, in the context of a test for a mean, the degrees of freedom depend on the sample size in the study. Remember that we said that higher degrees of freedom indicate that the t distribution is closer to normal. So in the context of a test for the mean, the larger the sample size, the higher the degrees of freedom, and the closer the t distribution is to a normal z distribution. This is summarized with the notation near the bottom of the following image:
As a result, in the context of a test for a mean, the effect of the t distribution is most important for a study with a relatively small sample size.
We are now done introducing the t distribution. What are the implications of all of this?
1. The null distribution of our t-test statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ is the t distribution with (n − 1) d.f. In other words, when $H_0$ is true (i.e., when $\mu = \mu_0$), our test statistic has a t distribution with (n − 1) d.f., and this is the distribution under which we find p-values.
2. For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t(n − 1) or Z to calculate the p-values should not make a big difference. Here is another practical way to look at this point: if we have a large n, our sample has more information about the population. Therefore, we can expect the sample standard deviation s to be close enough to the population standard deviation, σ, that for practical purposes we can use s as the known σ, and we’re back to the z-test.
3. Finding the p-value
The p-value of the t-test is found exactly the same way as it is found for the z-test, except that the t distribution is used instead of the Z distribution, as the figures below illustrate.
Comment:
Even though tables exist for the different t distributions, we will only use software to do the calculation for us.
Comment:
Note that due to the symmetry of the t distribution, for a given value of the test statistic t, the p-value for the two-sided test is twice as large as the p-value of the corresponding one-sided test (the one whose tail the test statistic falls in). The same relationship holds when p-values are calculated under the Z distribution.
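In software, this symmetry is just a factor of two. Here is a minimal sketch (scipy assumed; the test statistic and degrees of freedom below are hypothetical):

```python
from scipy.stats import t

t_stat, df = 1.8, 24                 # hypothetical test statistic and degrees of freedom

p_right = t.sf(t_stat, df)           # Ha: mu > mu0  (right tail)
p_left = t.cdf(t_stat, df)           # Ha: mu < mu0  (left tail)
p_two = 2 * t.sf(abs(t_stat), df)    # Ha: mu != mu0 (both tails)

print(p_right, p_left, p_two)        # p_two is twice the smaller one-sided p-value
```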
4. Drawing Conclusions
As usual, based on the p-value (and some significance level of choice) we assess the significance of results, and draw our conclusions in context.
To summarize:
The main difference between the z-test and the t-test for the population mean is that we use the sample standard deviation s instead of the unknown population standard deviation σ. As a result, the p-values are calculated under the t distribution instead of under the Z distribution. Since we are using software, this doesn’t really impact us practically. However, it is important to understand what is going on behind the scenes, and not just use the software mechanically. This is why we went through the trouble of explaining the t distribution.
We are now ready to look at two examples.
For comparison purposes, we will use a modified version of the two problems we used in the previous case. We’ll first introduce the modified versions and explain the changes.
Example 1
The SAT is constructed so that scores have a national average of 500. The distribution is close to normal. The dean of students of Ross College suspects that in recent years the college attracts students who are more quantitatively inclined. A random sample of 4 students entering Ross college had an average math SAT (SAT-M) score of 550, and a sample standard deviation of 100. Does this provide enough evidence for the dean to conclude that the mean SAT-M of all Ross College students is higher than the national mean of 500?
Here is a figure that represents this example where the changes are marked in blue:
Note that the problem was changed so that the population standard deviation (which was assumed to be 100 before) is now unknown, and instead we assume that the sample of 4 students produced a sample mean of 550 (no change) and a sample standard deviation of s=100. (Sample standard deviations are never such nice rounded numbers, but for the sake of comparison we left it as 100.) Note that due to the changes, the z-test for the population mean is no longer appropriate, and we need to use the t-test.
Example 2
A certain prescription medicine is supposed to contain an average of 250 parts per million (ppm) of a certain chemical. If the concentration is higher than this, the drug may cause harmful side effects; if it is lower, the drug may be ineffective. The manufacturer runs a check to see if the mean concentration in a large shipment conforms to the target level of 250 ppm or not. A simple random sample of 100 portions is tested, and the sample mean concentration is found to be 247 ppm with a sample standard deviation of 12 ppm. Again, here is a figure that represents this example where the changes are marked in blue:
The changes are similar to example 1: we no longer assume that the population standard deviation is known, and instead use the sample standard deviation of 12. Again, the problem was thus changed from a z-test problem to a t-test problem.
However, as we mentioned earlier, due to the large sample size (n = 100) there should not be much difference whether we use the z-test or the t-test. The sample standard deviation, s, is expected to be close enough to the population standard deviation, σ. We’ll see this as we solve the problem.
Let’s carry out the t-test for both of these problems:
Example 1:
1. There are no changes in the hypotheses being tested:
2. The conditions that allow us to use the t-test are met since:
(i) The sample is random.
(ii) SAT-M is known to vary normally in the population (which is crucial here, since the sample size is only 4).
In other words, we are in the following situation:
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{550 - 500}{100/\sqrt{4}} = 1$
The data (represented by the sample mean) are 1 standard error above the null value.
3. Finding the p-value.
Recall that in general the p-value is calculated under the null distribution of the test statistic, which, in the t-test case, is t(n − 1). In our case, in which n = 4, the p-value is calculated under the t(3) distribution:
Using statistical software, we find that the p-value is 0.196. For comparison purposes, the p-value that we got when we carried out the z-test for this problem (when we assumed that 100 was the known σ rather than the calculated sample standard deviation, s) was 0.159.
It is not surprising that the p-value of the t-test is larger, since the t distribution has fatter tails. Even though in this particular case the difference between the two values does not have practical implications (since both are large and will lead to the same conclusion), the difference is not trivial.
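As an illustration of what the software is doing, here is a minimal sketch in Python (using scipy, one of many packages that could be used) that reproduces both p-values from the summary statistics:

```python
from scipy.stats import t, norm

xbar, mu0, s, n = 550, 500, 100, 4
t_stat = (xbar - mu0) / (s / n**0.5)   # = 1.0

p_t = t.sf(t_stat, df=n - 1)           # one-sided p-value under t(3): about 0.196
p_z = norm.sf(t_stat)                  # z-test comparison: about 0.159
print(t_stat, round(p_t, 3), round(p_z, 3))
```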
4. Making conclusions.
The p-value (0.196) is large, indicating that the results are not significant. The data do not provide enough evidence to conclude that the mean SAT-M among Ross College students is higher than the national mean (500).
Here is a summary:
Example 2:
1. There are no changes in the hypotheses being tested:
2. The conditions that allow us to use the t-test are met:
(i) The sample is random
(ii) The sample size is large enough for the Central Limit Theorem to apply and ensure the normality of $\bar{X}$. In other words, we are in the following situation:
The test statistic is: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{247 - 250}{12/\sqrt{100}} = -2.5$
The data (represented by the sample mean) are 2.5 standard errors below the null value.
3. Finding the p-value.
To find the p-value we use statistical software, and we calculate a p-value of 0.014 with a 95% confidence interval of (244.619, 249.381). For comparison purposes, the output we got when we carried out the z-test for the same problem was a p-value of 0.012 with a 95% confidence interval of (244.648, 249.352).
Note that here the difference between the p-values is quite negligible (0.002). This is not surprising, since the sample size is quite large (n = 100), in which case, as we mentioned, the z-test (in which we treat s as the known σ) is a very good approximation to the t-test. Note also how similar the two 95% confidence intervals are (for the same reason).
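Again as an illustration, here is a minimal scipy sketch that reproduces the t-test p-value and the 95% confidence interval from the summary statistics:

```python
from scipy.stats import t

xbar, mu0, s, n = 247, 250, 12, 100
se = s / n**0.5
t_stat = (xbar - mu0) / se                 # = -2.5

p_two = 2 * t.sf(abs(t_stat), df=n - 1)    # two-sided p-value: about 0.014

# 95% confidence interval for mu: xbar +/- t* times the standard error
t_crit = t.ppf(0.975, df=n - 1)
print(p_two, (xbar - t_crit * se, xbar + t_crit * se))  # about (244.619, 249.381)
```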
4. Conclusions:
The p-value is small (0.014), indicating that at the 5% significance level the results are significant. The data therefore provide evidence to conclude that the mean concentration in the entire shipment is not the required 250 ppm.
Here is a summary:
Comments
- The 95% confidence interval for μ can be used here in the same way it is used when σ is known: either as a way to conduct the two-sided test (checking whether the null value falls inside or outside the confidence interval), or following a t-test in which $H_0$ was rejected (in order to get insight into the value of μ).
- While it is true that when σ is unknown and for large sample sizes the z-test is a good approximation for the t-test, since we are using software to carry out the t-test anyway, there is not much gain in using the z-test as an approximation instead. We might as well use the more exact t-test regardless of the sample size.
However, it is always worthwhile knowing what happens behind the scenes.
To Summarize
1. In hypothesis testing for the population mean (μ), we distinguish between two cases:
I. The less common case when the population standard deviation (σ) is known.
II. The more practical case when the population standard deviation is unknown and the sample standard deviation (s) is used instead.
2. In the case when σ is known, the test for μ is called the z-test, and in case when σ is unknown and s is used instead, the test is called the t-test.
3. In both cases, the null hypothesis is: $H_0: \mu = \mu_0$
and the alternative, depending on the context, is one of the following:
$H_a: \mu < \mu_0$, or $H_a: \mu > \mu_0$, or $H_a: \mu \neq \mu_0$
4. Both tests can be safely used as long as the following two conditions are met:
(i) The sample is random (or can at least be considered random in context).
(ii) Either the sample size is large (n > 30) or, if not, the variable of interest can be assumed to vary normally in the population.
5. In the z-test, the test statistic is:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
whose null distribution is the standard normal distribution (under which the p-values are calculated).
6. In the t-test, the test statistic is:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
whose null distribution is t(n – 1) (under which the p-values are calculated).
7. For large sample sizes, the z-test is a good approximation for the t-test.
8. Confidence intervals can be used to carry out the two-sided test $H_a: \mu \neq \mu_0$, and in cases where $H_0$ is rejected, the confidence interval can give insight into the value of the population mean (μ).
9. Here is a summary of which test to use under which conditions:
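Finally, to tie the summary together, here is a hedged, self-contained sketch of the t-test for a population mean computed from summary statistics (the function name and interface are our own, for illustration only):

```python
from scipy.stats import t

def t_test_mean(xbar, mu0, s, n, alternative="two-sided"):
    """One-sample t-test for a population mean from summary statistics.

    alternative: "less", "greater", or "two-sided" (matching Ha).
    Returns the t statistic and the p-value under t(n - 1).
    """
    t_stat = (xbar - mu0) / (s / n**0.5)
    df = n - 1
    if alternative == "less":
        p = t.cdf(t_stat, df)       # left-tail p-value
    elif alternative == "greater":
        p = t.sf(t_stat, df)        # right-tail p-value
    else:
        p = 2 * t.sf(abs(t_stat), df)  # two-sided p-value
    return t_stat, p

print(t_test_mean(550, 500, 100, 4, alternative="greater"))  # Example 1: (1.0, ~0.196)
print(t_test_mean(247, 250, 12, 100))                        # Example 2: (-2.5, ~0.014)
```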