{"id":559,"date":"2024-10-18T02:41:41","date_gmt":"2024-10-18T02:41:41","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=559"},"modified":"2025-01-10T22:13:27","modified_gmt":"2025-01-10T22:13:27","slug":"8-3-hypothesis-tests-for-the-mean-sigma-unknown","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/8-3-hypothesis-tests-for-the-mean-sigma-unknown\/","title":{"raw":"8.3: Hypothesis Tests for the Mean (sigma unknown)","rendered":"8.3: Hypothesis Tests for the Mean (sigma unknown)"},"content":{"raw":"<div id=\"ad4cddb074274366ba21655d678e9bac\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<div id=\"d73d1fa10f64412e9cedd7cf886a30a2\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div id=\"d9934ab9907247088157b1a054896017\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Tests About \u03bc When \u03c3 is Unknown\u2014The t-test for the Population Mean<\/span><\/h2>\r\n<p id=\"a8cedc77d9d04167a08989e8034ef22a\">As we mentioned earlier, only in a few cases is it reasonable to assume that the population standard deviation, \u03c3, is known. The case where \u03c3 is unknown is much more common in practice. What can we use to replace \u03c3? If you don\u2019t know the population standard deviation, the best you can do is find the sample standard deviation, S, and use it instead of \u03c3. (Note that this is exactly what we did when we discussed confidence intervals).<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d68cb487ac91402db4a80f825256caf2\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain S. 
We use this instead of the unknown \u03c3.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image355.gif\" alt=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain S. We use this instead of the unknown \u03c3.\" \/><\/span><\/span>\r\n<p id=\"ba321a5b17b34234b4ed0152e9c07e45\">Is that it? Can we just use S instead of \u03c3, and the rest is the same as the previous case? Unfortunately, it\u2019s not that simple, but it\u2019s not very complicated either.<\/p>\r\n<p id=\"d2ac9feda22b4f76836428de971caac4\">We will first go through the four steps of the t-test for the population mean and explain in what way this test is different from the z-test in the previous case. For comparison purposes, we will then apply the t-test to a variation of the two examples we used in the previous case, and end with an activity where you\u2019ll get to carry out the t-test yourself.<\/p>\r\n<p id=\"e9741215ccb34de88e7a423586646d1d\">Let\u2019s start by describing the four steps for the t-test:<\/p>\r\n<p id=\"baa43dc8485143329b903a4520e073ac\"><em class=\"italic\">I.\u00a0<\/em>Stating the hypotheses.<\/p>\r\n<p id=\"aac312369d1f4f55b79201e1f584c461\">In this step there are no changes:<\/p>\r\n<p id=\"a9ba1a3aee6a4c6baa3049a8c89d2474\">* The null hypothesis has the form:<\/p>\r\n<p id=\"e78d5c21c05746eea36c36c5f639005c\"><strong><em>H<sub>0<\/sub> : \u03bc = \u03bc<sub>0<\/sub><\/em>\u00a0<\/strong><\/p>\r\n<p id=\"e8c3fa40eb464a158aea90fb0fb6f2b3\">(where <strong>\u03bc<sub>0 <\/sub><\/strong>is the null value).<\/p>\r\n<p id=\"fd9eb364b8624c09ab7aa14d979c53b2\">* The alternative hypothesis takes one of the following three forms (depending on the context):<\/p>\r\n<p 
id=\"ad6d8519c3a1482aaff6cb8dd897870d\"><em><strong>H<sub>a<\/sub> : \u03bc &lt; \u03bc<sub>0<\/sub><\/strong><\/em>\u00a0(one-sided)<\/p>\r\n<p id=\"fb4a7624d6db4c5cb9cd4f89743e204e\"><em><strong>H<sub>a<\/sub> : \u03bc &gt; \u03bc<sub>0<\/sub><\/strong><\/em>\u00a0\u00a0(one-sided)<\/p>\r\n<p id=\"deff4e181ec045d7bec352f549e65d5b\"><em><strong>H<sub>a<\/sub> : \u03bc \u2260 \u03bc<sub>0<\/sub>\u00a0<\/strong><\/em> (two-sided)<\/p>\r\n<p id=\"e27ad0b2327348d19b22add7e9f4bfdd\"><em class=\"italic\">II.<\/em>\u00a0Checking the conditions under which the t-test can be safely used and summarizing the data.<\/p>\r\n<p id=\"f805ffea90ba401b994f0ab2d48f2af2\">Technically, this step only changes slightly compared to what we do in the z-test. However, as you\u2019ll see, this small change has important implications. The conditions under which the t-test can be safely carried out are exactly the same as those for the z-test:<\/p>\r\n<p id=\"eeed6de7a8d447fba95df67bd0b50633\">(i) The sample is random (or at least can be considered random in context).<\/p>\r\n<p id=\"da47ec1725f6462e85afaa223d223cb1\">(ii) We are in one of the three situations marked with a green check mark in the following table (which ensure that\u00a0[latex]\\overline{X}[\/latex]\u00a0is at least approximately normal):<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"f3240078a0bf4872bb816384d05e9644\" class=\"img-responsive popimg aligncenter\" title=\"A table which has two columns and two rows, and is titled &amp;quot;Conditions: z-test for a population mean.&amp;quot; The column headings are: &amp;quot;Small Sample Size&amp;quot; and &amp;quot;Large Sample Size. 
&amp;quot; The row headings are &amp;quot;Variable varies normally in the population&amp;quot; and &amp;quot;Variable doesn&amp;apos;t vary normally in the population.&amp;quot; Here is the data in the table by cell in &amp;quot;Row, Column: Value&amp;quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn&amp;apos;t vary normally in the population, Small sample size: NOT OK; Variable doesn&amp;apos;t vary normally in the population, Large sample size: OK;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image325.gif\" alt=\"A table which has two columns and two rows, and is titled &amp;quot;Conditions: z-test for a population mean.&amp;quot; The column headings are: &amp;quot;Small Sample Size&amp;quot; and &amp;quot;Large Sample Size. &amp;quot; The row headings are &amp;quot;Variable varies normally in the population&amp;quot; and &amp;quot;Variable doesn&amp;apos;t vary normally in the population.&amp;quot; Here is the data in the table by cell in &amp;quot;Row, Column: Value&amp;quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn&amp;apos;t vary normally in the population, Small sample size: NOT OK; Variable doesn&amp;apos;t vary normally in the population, Large sample size: OK;\" \/><\/span><\/span>\r\n<p id=\"f1a513f6176f4604ab70970d16e2f785\">Assuming that the conditions are met, we calculate the sample mean [latex]\\overline{x}[\/latex] and the sample standard deviation, S (which replaces \u03c3), and summarize the data with a test statistic. As in the z-test, our test statistic will be the standardized score of [latex]\\overline{X}[\/latex] assuming that <strong><em> \u03bc = \u03bc<sub>0 <\/sub><\/em><\/strong>(H<sub>o<\/sub>\u00a0is true). 
The difference here is that we don\u2019t know \u03c3, so we use S instead. The test statistic for the t-test for the population mean is therefore:<\/p>\r\n[latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex]\r\n<p id=\"b3f637305a804ef381e0fb73ab274cc8\">The change is in the denominator: while in the z-test we divided by the standard\u00a0<em class=\"italic\">deviation<\/em>\u00a0of\u00a0[latex]\\overline{X}[\/latex], namely\u00a0[latex]\\frac{\\sigma}{\\sqrt{n}}[\/latex], here we divide by the standard\u00a0<em class=\"italic\">error<\/em> of\u00a0[latex]\\overline{X}[\/latex], namely\u00a0[latex]\\frac{s}{\\sqrt{n}}[\/latex]. Does this have an effect on the rest of the test? Yes. The t-test statistic in the test for the mean does not follow a standard normal distribution. Rather, it follows another bell-shaped distribution called the t distribution. So we first need to introduce you to this new distribution as a general object. Then, we\u2019ll come back to our discussion of the t-test for the mean and how the t distribution arises in that context.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"be78453423644952928e6f31050f077f\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The t Distribution<\/span><\/h2>\r\n<p id=\"feaaa0de39fc49b7b162a8cf139d6424\">We have seen that variables can be visually modeled by many different sorts of shapes, and we call these shapes distributions. Several distributions arise so frequently that they have been given special names, and they have been studied mathematically. So far in the course, the only one we\u2019ve named is the normal distribution, but there are others. 
One of them is called the t distribution.<\/p>\r\n<p id=\"f590be039b454c4f83072fc7fa5219d4\">The t distribution is another bell-shaped (unimodal and symmetric) distribution, like the normal distribution; and the center of the t distribution is standardized at zero, like the center of the normal distribution.<\/p>\r\n<p id=\"db8582517b6548a2a85ce751f331541b\">Like all distributions that are used as probability models, the normal and the t distribution are both scaled, so the total area under each of them is 1.<\/p>\r\n<p id=\"a7e2b264318a4cfda43efbf33c65827a\">So how is the t distribution fundamentally\u00a0<em class=\"italic\">different<\/em>\u00a0from the normal distribution?<\/p>\r\n<p id=\"ec77c27a2d1345e9911c449ea997ac1c\">The\u00a0<em class=\"italic\">spread<\/em>.<\/p>\r\n<p id=\"e154a5d29f7340d6b11b8c2288a0dbff\">The following picture illustrates the fundamental difference between the normal distribution and the t distribution:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e38cb0a802c54b0891cd2437b6b42b41\" class=\"img-responsive popimg aligncenter\" title=\"A standard normal curve modeling the Z-distribution and a curve modeling the t-distribution. Both have been scaled so that the area under the curve is 1. The standard normal curve has less spread than the t-distribution curve. This means that the left and right tails are closer to each other than in the t-distribution, and that it is taller than the t-distribution. The t-distribution is narrower than the standard normal distribution when close to the center. Because of this, the curves intersect once on each side of the center.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image363.gif\" alt=\"A standard normal curve modeling the Z-distribution and a curve modeling the t-distribution. Both have been scaled so that the area under the curve is 1. 
The standard normal curve has less spread than the t-distribution curve. This means that the left and right tails are closer to each other than in the t-distribution, and that it is taller than the t-distribution. The t-distribution is narrower than the standard normal distribution when close to the center. Because of this, the curves intersect once on each side of the center.\" \/><\/span><\/span>\r\n<p id=\"d52b36e261d346ab8ac5fa24ac76ad21\">You can see in the picture that the t distribution has\u00a0<em class=\"italic\">slightly less area near the expected central value<\/em>\u00a0than the normal distribution does, and you can see that the t distribution has correspondingly\u00a0<em class=\"italic\">more area in the \u201ctails\u201d<\/em>\u00a0than the normal distribution does. (It\u2019s often said that the t distribution has \u201cfatter tails\u201d or \u201cheavier tails\u201d than the normal distribution.)<\/p>\r\n<p id=\"fc254cc52d2544c798475f1535dc6c29\">This reflects the fact that the t distribution\u00a0<em class=\"italic\">has a larger spread<\/em>\u00a0than the normal distribution. The same total area of 1 is spread out over a slightly wider range on the t distribution, making it a bit lower near the center compared to the normal distribution, and giving the t distribution slightly more probability in the \u2018tails\u2019 compared to the normal distribution.<\/p>\r\n<p id=\"f866b4aad2f7492188e1a24be9bfad20\">Therefore, the t distribution ends up being the appropriate model in certain cases where there is\u00a0<em class=\"italic\">more variability<\/em>\u00a0than would be predicted by the normal distribution. One of these cases is stock values, which have more variability (or \u201cvolatility,\u201d to use the economic term) than would be predicted by the normal distribution.<\/p>\r\n<p id=\"d7c45dd022d74790af3db678696dd5be\">There\u2019s actually an entire family of t distributions. 
They all have similar formulas (but the math is beyond the scope of this introductory course in statistics), and they all have slightly \u201cfatter tails\u201d than the normal distribution. But some are closer to normal than others. The t distributions that are closer to normal are said to have higher \u201cdegrees of freedom\u201d (that\u2019s a mathematical concept that we won\u2019t study in depth in this course). So, there\u2019s a t distribution \u201cwith 1 degree of freedom,\u201d another t distribution \u201cwith 2 degrees of freedom\u201d which is slightly closer to normal, another t distribution \u201cwith 3 degrees of freedom\u201d which is a bit closer to normal than the previous ones, and so on.<\/p>\r\n<p id=\"dd09a482f6e049da85df0e358d274a23\">The following picture illustrates this idea with just a couple of t distributions (note that \u201cdegrees of freedom\u201d is abbreviated \u201cd.f.\u201d in the picture):<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a9f0e3f15c99425ab08407314876ecf4\" class=\"img-responsive popimg aligncenter\" title=\"The standard normal z-distribution curve overlaid with a t-distribution with 5 d.f., and a t-distribution with 2 d.f. The distribution with 2 d.f. is shorter and has more spread than the t-distribution with 5 d.f., which in turn is shorter and wider than the standard normal distribution.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image417.gif\" alt=\"The standard normal z-distribution curve overlaid with a t-distribution with 5 d.f., and a t-distribution with 2 d.f. The distribution with 2 d.f. 
is shorter and has more spread than the t-distribution with 5 d.f., which in turn is shorter and wider than the standard normal distribution.\" \/><\/span><\/span>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"d5624d2544454a93a4fdeb57ca4e3a9d\">The following figure of the standard normal distribution together with a t distribution will visually help you answer the following questions.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b0b21dab921c46f5bfeb8dd106c41125\" class=\"img-responsive popimg aligncenter\" title=\"The standard normal Z distribution curve and the t-distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = 3, a blue vertical line has been drawn. Here, the t distribution&amp;apos;s wider spread causes it to be higher than the standard normal curve. Going right, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image395.gif\" alt=\"The standard normal Z distribution curve and the t-distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = 3, a blue vertical line has been drawn. Here, the t distribution&amp;apos;s wider spread causes it to be higher than the standard normal curve. 
Going right, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve.\" \/><\/span><\/span>\r\n\r\n[h5p id=\"193\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"aaaa753820b74548801e052847100251\">The following figure of the standard normal distribution together with a t distribution will visually help you answer the following questions.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e45a9b1aa40046ac8afaf164f74ddc98\" class=\"img-responsive popimg aligncenter\" title=\"The standard normal Z distribution curve and the t distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = -2, a blue vertical line has been drawn. Here, the t distribution and standard normal curve intersect. Going left, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve, and that the t distribution is above the standard normal distribution. Going right from the vertical blue line, we see that the t distribution is under the standard normal distribution and ultimately will have a lower peak value compared to the standard normal distribution.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image418.gif\" alt=\"The standard normal Z distribution curve and the t distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = -2, a blue vertical line has been drawn. Here, the t distribution and standard normal curve intersect. Going left, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve, and that the t distribution is above the standard normal distribution. 
Going right from the vertical blue line, we see that the t distribution is under the standard normal distribution and ultimately will have a lower peak value compared to the standard normal distribution.\" \/><\/span><\/span>\r\n\r\n[h5p id=\"194\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"ab40301b1321460497bce4a86577ae28\" class=\"section section-didigetthis purposewrap\">Now let\u2019s return to our discussion of the test for the mean, and let\u2019s see how and why the t distribution arises in that context.<\/div>\r\n<div>\r\n<p id=\"b3ffbe763f994f5ebcbf8d7f85111243\">Recall that we were discussing the situation of testing for a mean, in the case when sigma is unknown. We\u2019ve seen previously that when sigma is known, the test statistic is [latex]z=\\frac{\\overline{x}-\\mu_0}{\\frac{\\sigma}{\\sqrt{n}}}[\/latex] (note the sigma (\u03c3) in the formula), which follows a normal distribution. But when sigma is\u00a0<em class=\"italic\">unknown<\/em>, the test statistic in the test for a mean becomes [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex] (note the use of \u201cs\u201d in the formula, in place of the unknown sigma).\u00a0<em class=\"italic\">Here<\/em> is where the t-distribution arises in the context of a test for a mean, because [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex] (with \u201cs\u201d in the formula in place of the unknown sigma) follows a t distribution.<\/p>\r\n<p id=\"a59e956ade884ed38c29452f3e4509ec\">Notice the only difference between the formula for the Z statistic and the formula for the t statistic: In the formula for the Z statistic, sigma (the standard deviation of the population) must be known; whereas, when sigma isn\u2019t known, then \u201cs\u201d (the standard deviation of the sample data) is used in place of the unknown sigma. 
That\u2019s the change that causes the statistic to be a t statistic.<\/p>\r\n<p id=\"e02c0bbdec3b420c9a4cfdd13003b2c8\">Why would this single change (using \u201cs\u201d in place of \u201csigma\u201d) result in a sampling distribution that is the t distribution instead of the standard normal (Z) distribution? Remember that the t distribution is more appropriate in cases where there is more variability. So why is there more variability when s is used in place of the unknown sigma?<\/p>\r\n<p id=\"d6918e6a4fbf476bb4e9db0ee3a2f5ff\">Well, remember that sigma (\u03c3) is a parameter (it\u2019s the standard deviation of the population), whose value therefore never changes. In contrast, s (the standard deviation of the sample data) varies from sample to sample, and therefore it\u2019s another source of variation. So, using s in place of sigma causes the sampling distribution to be the t distribution because of that extra source of variation:<\/p>\r\n<p id=\"dfdcfd253cb543869dfb159ae842eef9\">In the formula [latex]z=\\frac{\\overline{x}-\\mu_0}{\\frac{\\sigma}{\\sqrt{n}}}[\/latex], the only source of variation is the sampling variability of the sample mean [latex]\\overline{X}[\/latex] (none of the other terms in that formula vary randomly in a given study);<\/p>\r\n<p id=\"ab22dd1e90b04718a9651e72264bce09\">Whereas in the formula [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex], there are <em class=\"italic\">two<\/em> sources of variation: one source is the sampling variability of the sample mean [latex]\\overline{X}[\/latex]; the <em class=\"italic\">other<\/em>\u00a0source is the sampling variability of the sample standard deviation s.<\/p>\r\n<p id=\"ac4b74607ee34d86b39b01c59e2973ee\">So, in a test for a mean, if sigma isn\u2019t known, then s is used in place of the unknown sigma and that results in the test statistic being a t score.<\/p>\r\n<p id=\"ed3e8903cda1429c9232a27c3d24770b\">The t score, in the context of a test for a mean, is summarized by 
the following figure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e6112c1e652c463b84811fe137b19dcb\" class=\"img-responsive popimg aligncenter\" title=\"The z-score is calculated with z = ( x-bar - \u03bc ) \/ [ \u03c3\/\u221an ]. Note that there is only one source of variation, x-bar. The standard deviation of x-bar is the denominator, \u03c3\/\u221an. This Z (standard normal) distribution is centered at 0, bell shaped, and has a standard deviation of 1. The t-score is calculated with t = ( x-bar - \u03bc) \/ [ s\/\u221an ] . Note that the denominator, s\/\u221an, is the standard error of x-bar. Also notice that we now have two sources of variation, x-bar and s. The t-distribution (with n-1 d.f.) is centered at zero, bell shaped, and has a larger spread.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image365.gif\" alt=\"The z-score is calculated with z = ( x-bar - \u03bc ) \/ [ \u03c3\/\u221an ]. Note that there is only one source of variation, x-bar. The standard deviation of x-bar is the denominator, \u03c3\/\u221an. This Z (standard normal) distribution is centered at 0, bell shaped, and has a standard deviation of 1. The t-score is calculated with t = ( x-bar - \u03bc) \/ [ s\/\u221an ] . Note that the denominator, s\/\u221an, is the standard error of x-bar. Also notice that we now have two sources of variation, x-bar and s. The t-distribution (with n-1 d.f.) is centered at zero, bell shaped, and has a larger spread.\" \/><\/span><\/span>\r\n<p id=\"d46f1c4b8edc4ed48af57c7739e31531\">In fact, the t score that arises in the context of a test for a mean is a t score with (n \u2013 1) degrees of freedom. Recall that each t distribution is indexed according to \u201cdegrees of freedom.\u201d Notice that, in the context of a test for a mean, the degrees of freedom depend on the sample size in the study. 
Remember that we said that higher degrees of freedom indicate that the t distribution is closer to normal. So in the context of a test for the mean, the <em class=\"italic\">larger the sample size<\/em>, the higher the degrees of freedom, and\u00a0<em class=\"italic\">the closer the t distribution is to the standard normal (Z) distribution<\/em>. This is summarized with the notation near the bottom of the following image:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"fa0ca0018c0e46698bcdd3caec9dc9e4\" class=\"img-responsive popimg aligncenter\" title=\"The larger the sample size n, the closer the t-distribution gets to the standard normal.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image419.gif\" alt=\"The larger the sample size n, the closer the t-distribution gets to the standard normal.\" \/><\/span><\/span>\r\n<p id=\"ed56076db85b4811a67b15ea747ad429\">As a result, in the context of a test for a mean, the effect of the t distribution is\u00a0<em class=\"italic\">most important<\/em>\u00a0for a study with a\u00a0<em class=\"italic\">relatively small sample size<\/em>.<\/p>\r\n<p id=\"fb8c2ffc138f4a8986b645db4d115963\">We are now done introducing the t distribution. What are the implications of all of this?<\/p>\r\n<p id=\"cd28bc7a4d9843d483f572db7db8fb24\">1. The null distribution of our t-test statistic [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex] is the t distribution with (n-1) d.f. In other words, when H<sub>0<\/sub> is true (i.e., when<em><strong> \u03bc=\u03bc<sub>0<\/sub><\/strong><\/em>), our test statistic has a t distribution with (n-1) d.f., and this is the distribution under which we find p-values.<\/p>\r\n<p id=\"ae15717cac5346aab44f1abdb1d19625\">2. 
For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t(n-1) or Z to calculate the p-values should not make a big difference. Here is another practical way to look at this point. If we have a large n, our sample has more information about the population. Therefore, we can expect the sample standard deviation s to be close enough to the population standard deviation, \u03c3, so that for practical purposes we can use s as the known \u03c3, and we\u2019re back to the z-test.<\/p>\r\n\r\n<div id=\"e668030d101c440c9c9c06897ba72cd6\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">3. Finding the p-value<\/span><\/h2>\r\n<p id=\"e182340e5b9f474db8f1e44874442293\">The p-value of the t-test is found exactly the same way as it is found for the z-test, except that the t distribution is used instead of the Z distribution, as the figures below illustrate.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"b44fccc713324fa092b5448f4ed8d341\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment:<\/span><\/h2>\r\n<p id=\"c7c59b66101a4c44ac2047c7e67fc9d3\">Even though tables exist for the different t distributions, we will only use software to do the calculation for us.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"badc596888884b398d9dcd3dc721b586\" class=\"img-responsive popimg aligncenter\" title=\"H_a: \u03bc &amp;lt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 t)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image366.gif\" alt=\"H_a: \u03bc &amp;lt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 t)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"ade4d840a2424627b24ee925fe93f213\" class=\"img-responsive popimg aligncenter\" title=\"A t(n-1) distribution with t-scores on its horizontal axis. 
T-scores of 0 and t have been marked, with t to the left of 0. t has been generated from an observed test statistic. The area to the left of t under the curve is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image367.gif\" alt=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the left of 0. t has been generated from an observed test statistic. The area to the left of t under the curve is the p-value.\" \/><\/span><\/span>\r\n\r\n<hr \/>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"f3ccb982cff643919e369e86c825e374\" class=\"img-responsive popimg aligncenter\" title=\"H_a: \u03bc &amp;gt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2265 t)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image368.gif\" alt=\"H_a: \u03bc &amp;gt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2265 t)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"ebab7e49448048c69da61ca58fbcc6da\" class=\"img-responsive popimg aligncenter\" title=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the right of 0. t has been generated from an observed test statistic. The area to the right of t under the curve is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image369.gif\" alt=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the right of 0. t has been generated from an observed test statistic. 
The area to the right of t under the curve is the p-value.\" \/><\/span><\/span>\r\n\r\n<hr \/>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"f9644c193dae41b1b082d1d355704cd5\" class=\"img-responsive popimg aligncenter\" title=\"H_a: \u03bc \u2260 \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 -|t|) + P(t(n-1) \u2265 |t|) = 2P(t(n-1) \u2265 |t|)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image370.gif\" alt=\"H_a: \u03bc \u2260 \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 -|t|) + P(t(n-1) \u2265 |t|) = 2P(t(n-1) \u2265 |t|)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"befe58b463e54a33b9b1b2f288693b47\" class=\"img-responsive popimg aligncenter\" title=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of -|t|, 0, and |t| have been marked. -|t| is to the left of 0, and |t| is to the right. t has been generated from an observed test statistic. The sum of the area under the curve to the left of -|t| and to the right of |t| is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image371.gif\" alt=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of -|t|, 0, and |t| have been marked. -|t| is to the left of 0, and |t| is to the right. t has been generated from an observed test statistic. 
The sum of the area under the curve to the left of -|t| and to the right of |t| is the p-value.\" \/><\/span><\/span>\r\n\r\n<hr \/>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"fbd6f179178d428292fb6a7b51bfb311\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"f84e3909795e48c28d29e531a07a7530\">Note that due to the symmetry of the t distribution, for a given value of the test statistic t, the p-value for the two-sided test is twice as large as the smaller of the two one-sided p-values (the one whose alternative points in the direction of the observed t). The same relationship holds when p-values are calculated under the Z distribution.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c9a3fb37efb848fb83056bb887105cd7\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">4. Drawing Conclusions<\/span><\/h2>\r\n<p id=\"de4035d2e9b4486fbadcffcddc8bdfd7\">As usual, based on the p-value (and some significance level of choice) we assess the significance of results, and draw our conclusions in context.<\/p>\r\n<p id=\"c48a361b450b4eb3ac5bd919cb2a7ff2\">To summarize:<\/p>\r\n<p id=\"ec4ea27ead81488bb1fbbd352a7c6bdb\">The main difference between the z-test and the t-test for the population mean is that we use the sample standard deviation s instead of the unknown population standard deviation \u03c3. As a result, the p-values are calculated under the t distribution instead of under the Z distribution. Since we are using software, this makes little practical difference to how we carry out the test. However, it is important to understand what is going on behind the scenes, and not just use the software mechanically. 
This is why we went through the trouble of explaining the t distribution.<\/p>\r\n\r\n\r\n<hr \/>\r\n<p id=\"a39cb249c53a44f7a7f88d41080f29d8\">We are now ready to look at two examples.<\/p>\r\n<p id=\"N10B21\">For comparison purposes, we will use modified versions of the two problems we used in the previous case. We\u2019ll first introduce the modified versions and explain the changes.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>1<\/h4>\r\n<div>\r\n<p id=\"N10B29\">The SAT is constructed so that scores have a national average of 500. The distribution is close to normal. The dean of students of Ross College suspects that in recent years the college has been attracting students who are more quantitatively inclined. A random sample of 4 students entering Ross College had an average math SAT (SAT-M) score of 550, and a sample standard deviation of 100. Does this provide enough evidence for the dean to conclude that the mean SAT-M of all Ross College students is higher than the national mean of 500?<\/p>\r\n<p id=\"N10B2C\">Here is a figure that represents this example where the changes are marked in blue:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. The question we need to answer is &quot;is the mean SAT-M 500 (national mean) or is it higher?&quot; We take a sample from the population of size n = 4, represented by a smaller circle. 
For this sample, x-bar = 550, and S = 100.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image372.gif\" alt=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. The question we need to answer is &quot;is the mean SAT-M 500 (national mean) or is it higher?&quot; We take a sample from the population of size n = 4, represented by a smaller circle. For this sample, x-bar = 550, and S = 100.\" \/><\/span><\/span>\r\n<p id=\"N10B35\">Note that the problem was changed so that the population standard deviation (which was assumed to be 100 before) is now unknown, and instead we assume that the sample of 4 students produced a sample mean of 550 (no change) and a sample standard deviation of s=100. (Sample standard deviations are never such nice rounded numbers, but for the sake of comparison we left it as 100.) Note that due to the changes, the z-test for the population mean is no longer appropriate, and we need to use the t-test.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2<\/h4>\r\n<div>\r\n<p id=\"N10B3E\">A certain prescription medicine is supposed to contain an average of 250 parts per million (ppm) of a certain chemical. If the concentration is higher than this, the drug may cause harmful side effects; if it is lower, the drug may be ineffective. The manufacturer runs a check to see if the mean concentration in a large shipment conforms to the target level of 250 ppm or not. 
A simple random sample of 100 portions is tested, and the sample mean concentration is found to be 247 ppm with a sample standard deviation of 12 ppm. Again, here is a figure that represents this example where the changes are marked in blue:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. We need to answer &quot;is the mean concentration the required 250ppm or not?&quot; Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247 and S=12.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image373.gif\" alt=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. We need to answer &quot;is the mean concentration the required 250ppm or not?&quot; Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247 and S=12.\" \/><\/span><\/span>\r\n<p id=\"N10B47\">The changes are similar to example 1: we no longer assume that the population standard deviation is known, and instead use the sample standard deviation of 12. Again, the problem was thus changed from a z-test problem to a t-test problem.<\/p>\r\n<p id=\"N10B4A\">However, as we mentioned earlier, due to the large sample size (n = 100) there should not be much difference whether we use the z-test or the t-test. The sample standard deviation, s, is expected to be close enough to the population standard deviation\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0. 
We\u2019ll see this as we solve the problem.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10B55\">Let\u2019s carry out the t-test for both of these problems:<\/p>\r\n<p id=\"N10B58\"><em>Example 1:<\/em><\/p>\r\n<p id=\"N10B5E\">1. There are no changes in the hypotheses being tested:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_2\" class=\"img-responsive popimg aligncenter\" style=\"vertical-align: middle;border: none;max-width: 100%;height: auto;margin: auto;padding: 0px;cursor: pointer\" title=\"H_0: \u03bc = 500, H_a: \u03bc &gt; 500\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image374.gif\" alt=\"H_0: \u03bc = 500, H_a: \u03bc &gt; 500\" \/><\/span><\/span>\r\n<p id=\"N10B67\">2. The conditions that allow us to use the t-test are met since:<\/p>\r\n<p id=\"N10B6A\">(i) The sample is random.<\/p>\r\n<p id=\"N10B6D\">(ii) SAT-M is known to vary normally in the population (which is crucial here, since the sample size is only 4).<\/p>\r\n<p id=\"N10B70\">In other words, we are in the following situation:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"A table which has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. 
&quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK (this is the case this example falls in); Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image328.gif\" alt=\"A table which has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. &quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK (this is the case this example falls in); Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK;\" \/><\/span><\/span>\r\n\r\nThe test statistic is [latex]t=\\frac{\\bar{x}-\\mu_{0}}{\\frac{s}{\\sqrt{n}}}=\\frac{550-500}{\\frac{100}{\\sqrt{4}}}=1[\/latex]\r\n<p id=\"N10BD5\">The data (represented by the sample mean) are 1 standard error above the null value.<\/p>\r\n<p id=\"N10BD8\">3. Finding the p-value.<\/p>\r\n<p id=\"N10BDB\">Recall that in general the p-value is calculated under the null distribution of the test statistic, which,<\/p>\r\n<p id=\"N10BDE\">in the t-test case, is t(n-1). 
In our case, in which n = 4, the p-value is calculated under the t(3) distribution:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"A t(3) distribution with t-scores 0 and 1 marked. The p-value is the area under the curve to the right of t-score 1.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image376.gif\" alt=\"A t(3) distribution with t-scores 0 and 1 marked. The p-value is the area under the curve to the right of t-score 1.\" \/><\/span><\/span>\r\n<p id=\"N10BE7\">Using statistical software, we find that the p-value is 0.196. For comparison purposes, the p-value that we got when we carried out the z-test for this problem (when we assumed that 100 is the known\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0rather than the calculated sample standard deviation, s) was 0.159.<\/p>\r\n<p id=\"N10BF3\">It is not surprising that the p-value of the t-test is larger, since the t distribution has fatter tails than the standard normal distribution. Even though in this particular case the difference between the two values does not have practical implications (since both are large and will lead to the same conclusion), the difference is not trivial.<\/p>\r\n<p id=\"N10BF6\">4. Drawing conclusions.<\/p>\r\n<p id=\"N10BF9\">The p-value (0.196) is large, indicating that the results are not significant. 
The data do not provide enough evidence to conclude that the mean SAT-M among Ross College students is higher than the national mean (500).<\/p>\r\n<p id=\"N10BFC\">Here is a summary:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_5\" class=\"img-responsive popimg aligncenter\" style=\"vertical-align: middle;border: none;max-width: 100%;height: auto;margin: auto;padding: 0px;cursor: pointer\" title=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. Our hypotheses are H_0: \u03bc = 500 and H_a: \u03bc &gt; 500. We take a sample of size n = 4 from the population, represented by a smaller circle. For this sample, we calculate that x-bar = 550 and S = 100. Our conditions are met, so we can find t = 1, and p-value = .196 . This p-value is too high, so the conclusion is that H_0 cannot be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image377.gif\" alt=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. Our hypotheses are H_0: \u03bc = 500 and H_a: \u03bc &gt; 500. We take a sample of size n = 4 from the population, represented by a smaller circle. For this sample, we calculate that x-bar = 550 and S = 100. Our conditions are met, so we can find t = 1, and p-value = .196 . This p-value is too high, so the conclusion is that H_0 cannot be rejected.\" \/><\/span><\/span>\r\n<p id=\"N10C05\">Example 2:<\/p>\r\n<p id=\"N10C08\">1. 
There are no changes in the hypotheses being tested:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_6\" class=\"img-responsive popimg\" title=\"H_0: \u03bc = 250, H_a: \u03bc \u2260 250\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image320.gif\" alt=\"H_0: \u03bc = 250, H_a: \u03bc \u2260 250\" \/><\/span><\/span>\r\n<p id=\"N10C10\">2. The conditions that allow us to use the t-test are met:<\/p>\r\n<p id=\"N10C13\">(i) The sample is random.<\/p>\r\n<p id=\"N10C16\">(ii) The sample size is large enough for the Central Limit Theorem to apply and ensure the approximate normality of\u00a0[latex]\\overline{X}[\/latex]. In other words, we are in the following situation:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_7\" class=\"img-responsive popimg aligncenter\" title=\"The same table as before. The case this example is in is the &quot;Variable doesn't vary normally in the population, Large sample size&quot; case. The table has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. &quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK (this is the case this example falls in);\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image333.gif\" alt=\"The same table as before. 
The case this example is in is the &quot;Variable doesn't vary normally in the population, Large sample size&quot; case. The table has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. &quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK (this is the case this example falls in);\" \/><\/span><\/span>\r\n<p id=\"N10C2F\">The test statistic is: [latex]t=\\frac{\\bar{x}-\\mu_{0}}{\\frac{s}{\\sqrt{n}}}=\\frac{247-250}{\\frac{12}{\\sqrt{100}}}=-2.5[\/latex]<\/p>\r\n<p id=\"N10C97\">The data (represented by the sample mean) are 2.5 standard errors below the null value.<\/p>\r\n<p id=\"N10C9A\">3. Finding the p-value.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_8\" class=\"img-responsive popimg aligncenter\" title=\"A t(99) curve, for which the horizontal axis has been labeled with t-scores of -2.5 and 2.5 . The area under the curve and to the left of -2.5 and to the right of 2.5 is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image379.gif\" alt=\"A t(99) curve, for which the horizontal axis has been labeled with t-scores of -2.5 and 2.5 . 
The area under the curve and to the left of -2.5 and to the right of 2.5 is the p-value.\" \/><\/span><\/span>\r\n<p id=\"N10CA5\">To find the p-value we use statistical software, and we calculate a p-value of 0.014 with a 95% confidence interval of (244.619, 249.381). For comparison purposes, the output we got when we carried out the z-test for the same problem was a p-value of 0.012 with a 95% confidence interval of (244.648, 249.352).<\/p>\r\n<p id=\"N10CAA\">Note that here the difference between the p-values is negligible (0.002). This is not surprising, since the sample size is quite large (n = 100), in which case, as we mentioned, the z-test (in which we are treating s as the known\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0) is a very good approximation to the t-test. Note also how the two 95% confidence intervals are similar (for the same reason).<\/p>\r\n<p id=\"N10CB4\">4. Drawing conclusions.<\/p>\r\n<p id=\"N10CB7\">The p-value is small (0.014), indicating that at the 5% significance level, the results are significant. The data therefore provide evidence to conclude that the mean concentration in the entire shipment is not the required 250 ppm.<\/p>\r\n<p id=\"N10CBA\">Here is a summary:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_9\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. Our hypotheses are H_0:mean = 250, and H_a: mean is not 250. Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247, and because our conditions are met, we can calculate that t = -2.5, and that the p-value = .014. 
This p-value is low enough to let us conclude that we can reject H_0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image380.gif\" alt=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. Our hypotheses are H_0:mean = 250, and H_a: mean is not 250. Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247, and because our conditions are met, we can calculate that t = -2.5, and that the p-value = .014. This p-value is low enough to let us conclude that we can reject H_0.\" \/><\/span><\/span>\r\n<div id=\"N10CC3\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\r\n<ol>\r\n \t<li id=\"N10CCA\">The 95% confidence interval for\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0can be used here in the same way it is used when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is known: either as a way to conduct the two-sided test (checking whether the null value falls inside or outside the confidence interval) or following a t-test where H<sub>o<\/sub>\u00a0was rejected (in order to get insight into the value of\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0).<\/li>\r\n \t<li id=\"N10CE5\">While it is true that when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span 
class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is unknown and the sample size is large, the z-test is a good approximation for the t-test, since we are using software to carry out the t-test anyway, there is not much gain in using the z-test as an approximation instead. We might as well use the more exact t-test regardless of the sample size.<\/li>\r\n<\/ol>\r\n<p id=\"N10CEF\">However, it is always worthwhile knowing what happens behind the scenes.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10CFC\">A group of Internet users 50-65 years of age was randomly chosen and asked to report the weekly number of hours they spend online. The purpose of the study was to determine whether the mean weekly number of hours that Internet users in that age group spend online differs from the mean for Internet users in general, which is 12.5 (as reported by \"The Digital Future Report: Surveying the Digital Future, Year Four\"). The following information is available:<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"_i_10\" class=\"img-responsive popimg aligncenter\" title=\"One-Sample T: hr. online. Test of mu = 12.5 vs mu not = 12.5 Variable: hr. online N: 125 Mean: 12.008 StDev: 3.214 SE Mean: 0.287 95% CI: (11.439, ) T: -1.71 P: 0.090\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image396.gif\" alt=\"One-Sample T: hr. online. Test of mu = 12.5 vs mu not = 12.5 Variable: hr. online N: 125 Mean: 12.008 StDev: 3.214 SE Mean: 0.287 95% CI: (11.439, ) T: -1.71 P: 0.090\" \/><\/div>\r\n<div>[h5p id=\"195\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">To Summarize<\/span><\/h2>\r\n<p id=\"N10D35\">1. 
In hypothesis testing for the population mean (<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0), we distinguish between two cases:<\/p>\r\n<p id=\"N10D3F\">I. The less common case when the population standard deviation (<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>) is known.<\/p>\r\n<p id=\"N10D49\">II. The more practical case when the population standard deviation is unknown and the sample standard deviation (s) is used instead.<\/p>\r\n<p id=\"N10D4C\">2. In the case when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is known, the test for\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0is called the z-test, and in the case when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is unknown and s is used instead, the test is called the t-test.<\/p>\r\n<p id=\"N10D64\">3. 
In both cases, the null hypothesis is:\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">=<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span><\/p>\r\n<p id=\"N10D8B\">and the alternative, depending on the context, is one of the following:<\/p>\r\n<p id=\"N10D8E\"><span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">&lt;<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char 
MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, or\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">&gt;<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, or\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">\u2260<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span><\/p>\r\n<p 
id=\"N10DFD\">4. Both tests can be safely used as long as the following two conditions are met:<\/p>\r\n<p id=\"N10E00\">(i) The sample is random (or can at least be considered random in context).<\/p>\r\n<p id=\"N10E03\">(ii) Either the sample size is large (n &gt; 30) or, if not, the variable of interest can be assumed to vary normally in the population.<\/p>\r\n<p id=\"N10E06\">5. In the z-test, the test statistic is:<\/p>\r\n[latex]z=\\frac{\\overline{x}-\\mu_0}{\\frac{\\sigma}{\\sqrt{n}}}[\/latex]\r\n<p id=\"N10E42\">whose null distribution is the standard normal distribution (under which the p-values are calculated).<\/p>\r\n<p id=\"N10E45\">6. In the t-test, the test statistic is:<\/p>\r\n[latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex]\r\n<p id=\"N10E80\">whose null distribution is t(n \u2013 1) (under which the p-values are calculated).<\/p>\r\n<p id=\"N10E83\">7. For large sample sizes, the z-test is a good approximation for the t-test.<\/p>\r\n<p id=\"N10E86\">8. Confidence intervals can be used to carry out the two-sided test<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_11\" class=\"img-responsive popimg aligncenter\" title=\"H_0: \u03bc = \u03bc_0 vs.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image386.gif\" alt=\"H_0: \u03bc = \u03bc_0 vs.\" \/><\/span><\/span><span id=\"MathJax-Element-22-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span id=\"MJXc-Node-203\" class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span id=\"MJXc-Node-204\" class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char 
MJXc-TeX-math-I\">\u03bc<\/span><\/span><span id=\"MJXc-Node-205\" class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">\u2260<\/span><\/span><span id=\"MJXc-Node-206\" class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span id=\"MJXc-Node-207\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span id=\"MJXc-Node-208\" class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, and in cases where H<sub>o<\/sub>\u00a0is rejected, the confidence interval can give insight into the value of the population mean (<span id=\"MathJax-Element-23-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-209\" class=\"mjx-math\"><span id=\"MJXc-Node-210\" class=\"mjx-mrow\"><span id=\"MJXc-Node-211\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>).<\/p>\r\n<p id=\"N10EBD\">9. Here is a summary of which test to use under which conditions:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_12\" class=\"img-responsive popimg aligncenter\" title=\"A table. Here is the data in the table: With large sample size (regardless of whether the population is normal or not), we use the z-test if sigma is known, otherwise we use the t-test, keeping in mind that the z-test is a good approximation. With a small sample size, and normal population(* footnote), the z-test is used when we know sigma, and when we don't, we use the t-test. With a small sample size which has a population shape which is not normal or is unknown, we can't use the z-test or t-test. 
(*)Footnote: by &quot;Population normal&quot; we mean that either the population is known to be normal, or else that the population can be reasonably assumed to be normal as judged by the shape of the data histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image387.png\" alt=\"A table. Here is the data in the table: With large sample size (regardless of whether the population is normal or not), we use the z-test if sigma is known, otherwise we use the t-test, keeping in mind that the z-test is a good approximation. With a small sample size, and normal population(* footnote), the z-test is used when we know sigma, and when we don't, we use the t-test. With a small sample size which has a population shape which is not normal or is unknown, we can't use the z-test or t-test. (*)Footnote: by &quot;Population normal&quot; we mean that either the population is known to be normal, or else that the population can be reasonably assumed to be normal as judged by the shape of the data histogram.\" \/><\/span><\/span>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n<em>Scenario:\u00a0<\/em>The Intel Corporation is conducting quality control on its circuit boards. Thickness of the manufactured circuit boards varies unavoidably from board to board. Suppose the thickness of the boards produced by a certain factory process varies normally. The distribution of thickness of the circuit boards is supposed to have the mean \u03bc = 12 mm if the manufacturing process is working correctly. 
A random sample of five circuit boards is selected and measured, and the average thickness is found to be 9.13 mm, and the standard deviation for the sample is computed to be 1.11 mm.\r\n<div class=\"image shouldbeleft\"><img id=\"N10070\" class=\" img-responsive popimg aligncenter\" title=\"One-sample T A\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image500.png\" alt=\"One-sample T A\" \/><\/div>\r\n<div class=\"image shouldbeleft\"><img id=\"N10073\" class=\" img-responsive popimg aligncenter\" title=\"One-sample Z B\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image501.png\" alt=\"One-sample Z B\" \/><\/div>\r\n<div>[h5p id=\"196\"]<\/div>\r\n<div>\r\n<p id=\"N10F16\">Now, suppose that Intel is testing a brand new manufacturing process, for which prior information wasn\u2019t available. In particular, for this new process,\u00a0<em>the population distribution\u2019s shape isn\u2019t known<\/em>. Use the following histograms to help you answer the question below.<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"_i_13\" class=\"img-responsive popimg aligncenter\" title=\"4 Histograms, all titled &quot;Histogram of thickness (mm),&quot; with a vertical axis for frequency and horizontal axis for thickness (mm). 
Histogram A roughly follows a normal shape and has the following data, organized in &quot;thickness: frequency&quot; order: 8: 1.0 9: 2.0 10: 3.0 11: 2.0 12: 1.0 Histogram B is right-skewed: 8: 7 9: 8 10: 5 11: 3 12: 2 13: 2 14: 2 15: 1 16: 1 17: 1 18: 1 19: 1 20: 1 Histogram C is also right skewed: (same data as Histogram B) Histogram D is right skewed: 8: 3.0 9: 2.0 10: 1.0 11: 1.0 12: 1.0 13: 1.0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image502.gif\" alt=\"4 Histograms, all titled &quot;Histogram of thickness (mm),&quot; with a vertical axis for frequency and horizontal axis for thickness (mm). Histogram A roughly follows a normal shape and has the following data, organized in &quot;thickness: frequency&quot; order: 8: 1.0 9: 2.0 10: 3.0 11: 2.0 12: 1.0 Histogram B is right-skewed: 8: 7 9: 8 10: 5 11: 3 12: 2 13: 2 14: 2 15: 1 16: 1 17: 1 18: 1 19: 1 20: 1 Histogram C is also right skewed: (same data as Histogram B) Histogram D is right skewed: 8: 3.0 9: 2.0 10: 1.0 11: 1.0 12: 1.0 13: 1.0\" \/><\/div>\r\n<div><\/div>\r\n<\/div>\r\n<div>[h5p id=\"197\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div id=\"ad4cddb074274366ba21655d678e9bac\" class=\"section\">\n<div class=\"sectionContain\">\n<div id=\"d73d1fa10f64412e9cedd7cf886a30a2\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div id=\"d9934ab9907247088157b1a054896017\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Tests About \u03bc When \u03c3 is Unknown\u2014The t-test for the Population Mean<\/span><\/h2>\n<p id=\"a8cedc77d9d04167a08989e8034ef22a\">As we mentioned earlier, only in a few cases is it reasonable to assume that the population standard deviation, \u03c3, is known. The case where \u03c3 is unknown is much more common in practice. What can we use to replace \u03c3? 
If you don\u2019t know the population standard deviation, the best you can do is find the sample standard deviation, S, and use it instead of \u03c3. (Note that this is exactly what we did when we discussed confidence intervals).<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d68cb487ac91402db4a80f825256caf2\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain S. We use this instead of the unknown \u03c3.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image355.gif\" alt=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain S. We use this instead of the unknown \u03c3.\" \/><\/span><\/span><\/p>\n<p id=\"ba321a5b17b34234b4ed0152e9c07e45\">Is that it? Can we just use S instead of \u03c3, and the rest is the same as the previous case? Unfortunately, it\u2019s not that simple, but not very complicated either.<\/p>\n<p id=\"d2ac9feda22b4f76836428de971caac4\">We will first go through the four steps of the t-test for the population mean and explain in what way this test is different from the z-test in the previous case. 
For comparison purposes, we will then apply the t-test to a variation of the two examples we used in the previous case, and end with an activity where you\u2019ll get to carry out the t-test yourself.<\/p>\n<p id=\"e9741215ccb34de88e7a423586646d1d\">Let\u2019s start by describing the four steps for the t-test:<\/p>\n<p id=\"baa43dc8485143329b903a4520e073ac\"><em class=\"italic\">I.\u00a0<\/em>Stating the hypotheses.<\/p>\n<p id=\"aac312369d1f4f55b79201e1f584c461\">In this step there are no changes:<\/p>\n<p id=\"a9ba1a3aee6a4c6baa3049a8c89d2474\">* The null hypothesis has the form:<\/p>\n<p id=\"e78d5c21c05746eea36c36c5f639005c\"><strong><em>H<sub>0<\/sub> : \u03bc = \u03bc<sub>0<\/sub><\/em>\u00a0<\/strong><\/p>\n<p id=\"e8c3fa40eb464a158aea90fb0fb6f2b3\">(where <strong>\u03bc<sub>0<\/sub><\/strong> is the null value).<\/p>\n<p id=\"fd9eb364b8624c09ab7aa14d979c53b2\">* The alternative hypothesis takes one of the following three forms (depending on the context):<\/p>\n<p id=\"ad6d8519c3a1482aaff6cb8dd897870d\"><em><strong>H<sub>a<\/sub> : \u03bc &lt; \u03bc<sub>0<\/sub><\/strong><\/em>\u00a0(one-sided)<\/p>\n<p id=\"fb4a7624d6db4c5cb9cd4f89743e204e\"><em><strong>H<sub>a<\/sub> : \u03bc &gt; \u03bc<sub>0<\/sub><\/strong><\/em>\u00a0(one-sided)<\/p>\n<p id=\"deff4e181ec045d7bec352f549e65d5b\"><em><strong>H<sub>a<\/sub> : \u03bc \u2260 \u03bc<sub>0<\/sub>\u00a0<\/strong><\/em> (two-sided)<\/p>\n<p id=\"e27ad0b2327348d19b22add7e9f4bfdd\"><em class=\"italic\">II.<\/em>\u00a0Checking the conditions under which the t-test can be safely used and summarizing the data.<\/p>\n<p id=\"f805ffea90ba401b994f0ab2d48f2af2\">Technically, this step only changes slightly compared to what we do in the z-test. However, as you\u2019ll see, this small change has important implications. 
The conditions under which the t-test can be safely carried out are exactly the same as those for the z-test:<\/p>\n<p id=\"eeed6de7a8d447fba95df67bd0b50633\">(i) The sample is random (or at least can be considered random in context).<\/p>\n<p id=\"da47ec1725f6462e85afaa223d223cb1\">(ii) We are in one of the three situations marked with a green check mark in the following table (which ensure that\u00a0[latex]\\overline{X}[\/latex]\u00a0is at least approximately normal):<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f3240078a0bf4872bb816384d05e9644\" class=\"img-responsive popimg aligncenter\" title=\"A table which has two columns and two rows, and is titled &amp;quot;Conditions: z-test for a population mean.&amp;quot; The column headings are: &amp;quot;Small Sample Size&amp;quot; and &amp;quot;Large Sample Size. &amp;quot; The row headings are &amp;quot;Variable varies normally in the population&amp;quot; and &amp;quot;Variable doesn&amp;apos;t vary normally in the population.&amp;quot; Here is the data in the table by cell in &amp;quot;Row, Column: Value&amp;quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn&amp;apos;t vary normally in the population, Small sample size: NOT OK; Variable doesn&amp;apos;t vary normally in the population, Large sample size: OK;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image325.gif\" alt=\"A table which has two columns and two rows, and is titled &amp;quot;Conditions: z-test for a population mean.&amp;quot; The column headings are: &amp;quot;Small Sample Size&amp;quot; and &amp;quot;Large Sample Size. 
&amp;quot; The row headings are &amp;quot;Variable varies normally in the population&amp;quot; and &amp;quot;Variable doesn&amp;apos;t vary normally in the population.&amp;quot; Here is the data in the table by cell in &amp;quot;Row, Column: Value&amp;quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn&amp;apos;t vary normally in the population, Small sample size: NOT OK; Variable doesn&amp;apos;t vary normally in the population, Large sample size: OK;\" \/><\/span><\/span><\/p>\n<p id=\"f1a513f6176f4604ab70970d16e2f785\">Assuming that the conditions are met, we calculate the sample mean [latex]\\overline{x}[\/latex] and the sample standard deviation, S (which replaces \u03c3), and summarize the data with a test statistic. As in the z-test, our test statistic will be the standardized score of [latex]\\overline{X}[\/latex] assuming that <strong><em>\u03bc = \u03bc<sub>0<\/sub><\/em><\/strong> (H<sub>o<\/sub>\u00a0is true). The difference here is that we don\u2019t know \u03c3, so we use S instead. The test statistic for the t-test for the population mean is therefore:<\/p>\n<p>[latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex]<\/p>\n<p id=\"b3f637305a804ef381e0fb73ab274cc8\">The change is in the denominator: while in the z-test we divided by the standard\u00a0<em class=\"italic\">deviation<\/em>\u00a0of\u00a0[latex]\\overline{X}[\/latex], namely\u00a0[latex]\\frac{\\sigma}{\\sqrt{n}}[\/latex], here we divide by the standard\u00a0<em class=\"italic\">error<\/em> of\u00a0[latex]\\overline{X}[\/latex], namely\u00a0[latex]\\frac{s}{\\sqrt{n}}[\/latex]. Does this have an effect on the rest of the test? Yes. The t-test statistic in the test for the mean does not follow a standard normal distribution. Rather, it follows another bell-shaped distribution called the t distribution. 
So we first need to introduce you to this new distribution as a general object. Then, we\u2019ll come back to our discussion of the t-test for the mean and how the t-distribution arises in that context.<\/p>\n<\/div>\n<\/div>\n<div id=\"be78453423644952928e6f31050f077f\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The t Distribution<\/span><\/h2>\n<p id=\"feaaa0de39fc49b7b162a8cf139d6424\">We have seen that variables can be visually modeled by many different sorts of shapes, and we call these shapes distributions. Several distributions arise so frequently that they have been given special names, and they have been studied mathematically. So far in the course, the only one we\u2019ve named is the normal distribution, but there are others. One of them is called the t distribution.<\/p>\n<p id=\"f590be039b454c4f83072fc7fa5219d4\">The t distribution is another bell-shaped (unimodal and symmetric) distribution, like the normal distribution; and the center of the t distribution is standardized at zero, like the center of the normal distribution.<\/p>\n<p id=\"db8582517b6548a2a85ce751f331541b\">Like all distributions that are used as probability models, the normal and the t distribution are both scaled, so the total area under each of them is 1.<\/p>\n<p id=\"a7e2b264318a4cfda43efbf33c65827a\">So how is the t distribution fundamentally\u00a0<em class=\"italic\">different<\/em>\u00a0from the normal distribution?<\/p>\n<p id=\"ec77c27a2d1345e9911c449ea997ac1c\">The\u00a0<em class=\"italic\">spread<\/em>.<\/p>\n<p id=\"e154a5d29f7340d6b11b8c2288a0dbff\">The following picture illustrates the fundamental difference between the normal distribution and the t distribution:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e38cb0a802c54b0891cd2437b6b42b41\" class=\"img-responsive popimg aligncenter\" title=\"A standard normal curve modeling the Z-distribution and a curve modeling the t-distribution. 
Both have been scaled so that the area under the curve is 1. The standard normal curve has less spread than the t-distribution curve. This means that the left and right tails are closer to each other than in the t-distribution, and that it is taller than the t-distribution. The t-distribution is narrower than the standard normal distribution when close to the center. Because of this, the curves intersect once on each side of the center.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image363.gif\" alt=\"A standard normal curve modeling the Z-distribution and a curve modeling the t-distribution. Both have been scaled so that the area under the curve is 1. The standard normal curve has less spread than the t-distribution curve. This means that the left and right tails are closer to each other than in the t-distribution, and that it is taller than the t-distribution. The t-distribution is narrower than the standard normal distribution when close to the center. Because of this, the curves intersect once on each side of the center.\" \/><\/span><\/span><\/p>\n<p id=\"d52b36e261d346ab8ac5fa24ac76ad21\">You can see in the picture that the t distribution has\u00a0<em class=\"italic\">slightly less area near the expected central value<\/em>\u00a0than the normal distribution does, and you can see that the t distribution has correspondingly\u00a0<em class=\"italic\">more area in the \u201ctails\u201d<\/em>\u00a0than the normal distribution does. (It\u2019s often said that the t distribution has \u201cfatter tails\u201d or \u201cheavier tails\u201d than the normal distribution.)<\/p>\n<p id=\"fc254cc52d2544c798475f1535dc6c29\">This reflects the fact that the t distribution\u00a0<em class=\"italic\">has a larger spread<\/em>\u00a0than the normal distribution. 
The same total area of 1 is spread out over a slightly wider range on the t distribution, making it a bit lower near the center compared to the normal distribution, and giving the t distribution slightly more probability in the \u2018tails\u2019 compared to the normal distribution.<\/p>\n<p id=\"f866b4aad2f7492188e1a24be9bfad20\">Therefore, the t distribution ends up being the appropriate model in certain cases where there is\u00a0<em class=\"italic\">more variability<\/em>\u00a0than would be predicted by the normal distribution. One of these cases is stock values, which have more variability (or \u201cvolatility,\u201d to use the economic term) than would be predicted by the normal distribution.<\/p>\n<p id=\"d7c45dd022d74790af3db678696dd5be\">There\u2019s actually an entire family of t distributions. They all have similar formulas (but the math is beyond the scope of this introductory course in statistics), and they all have slightly \u201cfatter tails\u201d than the normal distribution. But some are closer to normal than others. The t distributions that are closer to normal are said to have higher \u201cdegrees of freedom\u201d (that\u2019s a mathematical concept that we won\u2019t use in this course, beyond merely mentioning it here). 
So, there\u2019s a t distribution \u201cwith one degree of freedom,\u201d another t distribution \u201cwith 2 degrees of freedom,\u201d which is slightly closer to normal, another t distribution \u201cwith 3 degrees of freedom,\u201d which is a bit closer to normal than the previous ones, and so on.<\/p>\n<p id=\"dd09a482f6e049da85df0e358d274a23\">The following picture illustrates this idea with just a couple of t distributions (note that \u201cdegrees of freedom\u201d is abbreviated \u201cd.f.\u201d in the picture):<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a9f0e3f15c99425ab08407314876ecf4\" class=\"img-responsive popimg aligncenter\" title=\"The standard normal z-distribution curve overlaid with a t-distribution with 5 d.f., and a t-distribution with 2 d.f. The distribution with 2 d.f. is shorter and has more spread than the t-distribution with 5 d.f., which in turn is shorter and wider than the standard normal distribution.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image417.gif\" alt=\"The standard normal z-distribution curve overlaid with a t-distribution with 5 d.f., and a t-distribution with 2 d.f. The distribution with 2 d.f. 
is shorter and has more spread than the t-distribution with 5 d.f., which in turn is shorter and wider than the standard normal distribution.\" \/><\/span><\/span><\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"d5624d2544454a93a4fdeb57ca4e3a9d\">The following figure of the standard normal distribution together with a t distribution will visually help you answer the following questions.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b0b21dab921c46f5bfeb8dd106c41125\" class=\"img-responsive popimg aligncenter\" title=\"The standard normal Z distribution curve and the t-distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = 3, a blue vertical line has been drawn. Here, the t distribution&amp;apos;s wider spread causes it to be higher than the standard normal curve. Going right, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image395.gif\" alt=\"The standard normal Z distribution curve and the t-distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = 3, a blue vertical line has been drawn. Here, the t distribution&amp;apos;s wider spread causes it to be higher than the standard normal curve. 
Going right, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve.\" \/><\/span><\/span><\/p>\n<div id=\"h5p-193\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-193\" class=\"h5p-iframe\" data-content-id=\"193\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.3 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"aaaa753820b74548801e052847100251\">The following figure of the standard normal distribution together with a t distribution will visually help you answer the following questions.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e45a9b1aa40046ac8afaf164f74ddc98\" class=\"img-responsive popimg aligncenter\" title=\"The standard normal Z distribution curve and the t distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = -2, a blue vertical line has been drawn. Here, the t distribution and standard normal curve intersect. Going left, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve, and that the t distribution is above the standard normal distribution. Going right from the vertical blue line, we see that the t distribution is under the standard normal distribution and ultimately will have a lower peak value compared to the standard normal distribution.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image418.gif\" alt=\"The standard normal Z distribution curve and the t distribution curve overlaid on top of each other, centered at a z-score of 0. At z-score = -2, a blue vertical line has been drawn. 
Here, the t distribution and standard normal curve intersect. Going left, we see that the standard normal curve reaches zero much sooner compared to the t distribution curve, and that the t distribution is above the standard normal distribution. Going right from the vertical blue line, we see that the t distribution is under the standard normal distribution and ultimately will have a lower peak value compared to the standard normal distribution.\" \/><\/span><\/span><\/p>\n<div id=\"h5p-194\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-194\" class=\"h5p-iframe\" data-content-id=\"194\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.3 Did I get this 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"ab40301b1321460497bce4a86577ae28\" class=\"section section-didigetthis purposewrap\">Now let\u2019s return to our discussion of the test for the mean, and let\u2019s see how and why the t distribution arises in that context.<\/div>\n<div>\n<p id=\"b3ffbe763f994f5ebcbf8d7f85111243\">Recall that we were discussing the situation of testing for a mean, in the case when sigma is unknown. We\u2019ve seen previously that when sigma is known, the test statistic is [latex]z=\\frac{\\overline{x}-\\mu_0}{\\frac{\\sigma}{\\sqrt{n}}}[\/latex] (note the sigma (\u03c3) in the formula), which follows a normal distribution. 
But when sigma is\u00a0<em class=\"italic\">unknown<\/em>, the test statistic in the test for a mean becomes [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex] (note the use of \u201cs\u201d in the formula, in place of the unknown sigma).\u00a0<em class=\"italic\">Here<\/em> is where the t-distribution arises in the context of a test for a mean, because [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex] (with \u201cs\u201d in the formula in place of the unknown sigma) follows a t distribution.<\/p>\n<p id=\"a59e956ade884ed38c29452f3e4509ec\">Notice the only difference between the formula for the Z statistic and the formula for the t statistic: In the formula for the Z statistic, sigma (the standard deviation of the population) must be known; whereas, when sigma isn\u2019t known, then \u201cs\u201d (the standard deviation of the sample data) is used in place of the unknown sigma. That\u2019s the change that causes the statistic to be a t statistic.<\/p>\n<p id=\"e02c0bbdec3b420c9a4cfdd13003b2c8\">Why would this single change (using \u201cs\u201d in place of \u201csigma\u201d) result in a sampling distribution that is the t distribution instead of the standard normal (Z) distribution? Remember that the t distribution is more appropriate in cases where there is more variability. So why is there more variability when s is used in place of the unknown sigma?<\/p>\n<p id=\"d6918e6a4fbf476bb4e9db0ee3a2f5ff\">Well, remember that sigma (\u03c3) is a parameter (it\u2019s the standard deviation of the population), whose value therefore never changes. Whereas, s (the standard deviation of the sample data) varies from sample to sample, and therefore it\u2019s another source of variation. 
So, using s in place of sigma causes the sampling distribution to be the t distribution because of that extra source of variation:<\/p>\n<p id=\"dfdcfd253cb543869dfb159ae842eef9\">In the formula [latex]z=\\frac{\\overline{x}-\\mu_0}{\\frac{\\sigma}{\\sqrt{n}}}[\/latex], the only source of variation is the sampling variability of the sample mean [latex]\\overline{X}[\/latex] (none of the other terms in that formula vary randomly in a given study).<\/p>\n<p id=\"ab22dd1e90b04718a9651e72264bce09\">By contrast, in the formula [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex], there are <em class=\"italic\">two<\/em> sources of variation: one source is the sampling variability of the sample mean [latex]\\overline{X}[\/latex]; the <em class=\"italic\">other<\/em>\u00a0source is the sampling variability of the sample standard deviation s.<\/p>\n<p id=\"ac4b74607ee34d86b39b01c59e2973ee\">So, in a test for a mean, if sigma isn\u2019t known, then s is used in place of the unknown sigma and that results in the test statistic being a t score.<\/p>\n<p id=\"ed3e8903cda1429c9232a27c3d24770b\">The t score, in the context of a test for a mean, is summarized by the following figure:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e6112c1e652c463b84811fe137b19dcb\" class=\"img-responsive popimg aligncenter\" title=\"The z-score is calculated with z = ( x-bar - \u03bc ) \/ [ \u03c3\/\u221an ]. Note that there is only one source of variation, x-bar. The standard deviation of x-bar is the denominator, \u03c3\/\u221an. This Z (standard normal) distribution is centered at 0, bell shaped, and has a standard deviation of 1. The t-score is calculated with t = ( x-bar - \u03bc) \/ [ s\/\u221an ] . Note that the denominator, s\/\u221an, is the standard error of x-bar. Also notice that we now have two sources of variation, x-bar and s. The t-distribution (with n-1 d.f.) 
is centered at zero, bell shaped, and has a larger spread.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image365.gif\" alt=\"The z-score is calculated with z = ( x-bar - \u03bc ) \/ [ \u03c3\/\u221an ]. Note that there is only one source of variation, x-bar. The standard deviation of x-bar is the denominator, \u03c3\/\u221an. This Z (standard normal) distribution is centered at 0, bell shaped, and has a standard deviation of 1. The t-score is calculated with t = ( x-bar - \u03bc) \/ [ s\/\u221an ] . Note that the denominator, s\/\u221an, is the standard error of x-bar. Also notice that we now have two sources of variation, x-bar and s. The t-distribution (with n-1 d.f.) is centered at zero, bell shaped, and has a larger spread.\" \/><\/span><\/span><\/p>\n<p id=\"d46f1c4b8edc4ed48af57c7739e31531\">In fact, the t score that arises in the context of a test for a mean is a t score with (n \u2013 1) degrees of freedom. Recall that each t distribution is indexed according to \u201cdegrees of freedom.\u201d Notice that, in the context of a test for a mean, the degrees of freedom depend on the sample size in the study. Remember that we said that higher degrees of freedom indicate that the t distribution is closer to normal. So in the context of a test for the mean, the <em class=\"italic\">larger the sample size<\/em>, the higher the degrees of freedom, and\u00a0<em class=\"italic\">the closer the t distribution is to a normal z distribution<\/em>. 
This is summarized with the notation near the bottom of the following image:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"fa0ca0018c0e46698bcdd3caec9dc9e4\" class=\"img-responsive popimg aligncenter\" title=\"The larger the sample size n, the closer the t-distribution gets to the standard normal.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image419.gif\" alt=\"The larger the sample size n, the closer the t-distribution gets to the standard normal.\" \/><\/span><\/span><\/p>\n<p id=\"ed56076db85b4811a67b15ea747ad429\">As a result, in the context of a test for a mean, the effect of the t distribution is\u00a0<em class=\"italic\">most important<\/em>\u00a0for a study with a\u00a0<em class=\"italic\">relatively small sample size<\/em>.<\/p>\n<p id=\"fb8c2ffc138f4a8986b645db4d115963\">We are now done introducing the t distribution. What are the implications of all of this?<\/p>\n<p id=\"cd28bc7a4d9843d483f572db7db8fb24\">1. The null distribution of our t-test statistic, [latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex], is the t distribution with (n-1) d.f. In other words, when H<sub>o<\/sub> is true (i.e., when <em><strong>\u03bc = \u03bc<sub>0<\/sub><\/strong><\/em>), our test statistic has a t distribution with (n-1) d.f., and this is the distribution under which we find p-values.<\/p>\n<p id=\"ae15717cac5346aab44f1abdb1d19625\">2. For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t(n-1) or Z to calculate the p-values should not make a big difference. Here is another practical way to look at this point. If we have a large n, our sample has more information about the population. 
Therefore, we can expect the sample standard deviation s to be close enough to the population standard deviation, \u03c3, so that for practical purposes we can use s as the known \u03c3, and we\u2019re back to the z-test.<\/p>\n<div id=\"e668030d101c440c9c9c06897ba72cd6\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">3. Finding the p-value<\/span><\/h2>\n<p id=\"e182340e5b9f474db8f1e44874442293\">The p-value of the t-test is found exactly the same way as it is found for the z-test, except that the t distribution is used instead of the Z distribution, as the figures below illustrate.<\/p>\n<\/div>\n<\/div>\n<div id=\"b44fccc713324fa092b5448f4ed8d341\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment:<\/span><\/h2>\n<p id=\"c7c59b66101a4c44ac2047c7e67fc9d3\">Even though tables exist for the different t distributions, we will only use software to do the calculation for us.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"badc596888884b398d9dcd3dc721b586\" class=\"img-responsive popimg aligncenter\" title=\"H_a: \u03bc &amp;lt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 t)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image366.gif\" alt=\"H_a: \u03bc &amp;lt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 t)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ade4d840a2424627b24ee925fe93f213\" class=\"img-responsive popimg aligncenter\" title=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the left of 0. t has been generated from an observed test statistic. 
The area to the left of t under the curve is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image367.gif\" alt=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the left of 0. t has been generated from an observed test statistic. The area to the left of t under the curve is the p-value.\" \/><\/span><\/span><\/p>\n<hr \/>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f3ccb982cff643919e369e86c825e374\" class=\"img-responsive popimg aligncenter\" title=\"H_a: \u03bc &amp;gt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2265 t)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image368.gif\" alt=\"H_a: \u03bc &amp;gt; \u03bc_0 \u21d2 p-value = P(t(n-1) \u2265 t)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ebab7e49448048c69da61ca58fbcc6da\" class=\"img-responsive popimg aligncenter\" title=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the right of 0. t has been generated from an observed test statistic. The area to the right of t under the curve is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image369.gif\" alt=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of 0 and t have been marked, with t to the right of 0. t has been generated from an observed test statistic. 
The area to the right of t under the curve is the p-value.\" \/><\/span><\/span><\/p>\n<hr \/>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f9644c193dae41b1b082d1d355704cd5\" class=\"img-responsive popimg aligncenter\" title=\"Ha: \u03bc \u2260 \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 -|t|) + P(t(n-1) \u2265 |t|) = 2P(t(n-1) \u2265 |t|)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image370.gif\" alt=\"Ha: \u03bc \u2260 \u03bc_0 \u21d2 p-value = P(t(n-1) \u2264 -|t|) + P(t(n-1) \u2265 |t|) = 2P(t(n-1) \u2265 |t|)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"befe58b463e54a33b9b1b2f288693b47\" class=\"img-responsive popimg aligncenter\" title=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of -|t|, 0, and |t| have been marked. -|t| is to the left of 0, and |t| is to the right. t has been generated from a observed test statistic. The sum of the area under the curve to the left of -|t| and to the right of |t| is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image371.gif\" alt=\"A t(n-1) distribution with t-scores on its horizontal axis. T-scores of -|t|, 0, and |t| have been marked. -|t| is to the left of 0, and |t| is to the right. t has been generated from a observed test statistic. 
The sum of the area under the curve to the left of -|t| and to the right of |t| is the p-value.\" \/><\/span><\/span><\/p>\n<hr \/>\n<\/div>\n<\/div>\n<div id=\"fbd6f179178d428292fb6a7b51bfb311\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"f84e3909795e48c28d29e531a07a7530\">Note that due to the symmetry of the t distribution, for a given value of the test statistic t, the p-value for the two-sided test is twice the p-value of the one-sided test whose alternative points in the direction of the observed t (that is, twice the smaller of the two one-sided p-values). The same relationship holds when p-values are calculated under the Z distribution.<\/p>\n<\/div>\n<\/div>\n<div id=\"c9a3fb37efb848fb83056bb887105cd7\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">4. Drawing Conclusions<\/span><\/h2>\n<p id=\"de4035d2e9b4486fbadcffcddc8bdfd7\">As usual, based on the p-value (and some significance level of choice) we assess the significance of results, and draw our conclusions in context.<\/p>\n<p id=\"c48a361b450b4eb3ac5bd919cb2a7ff2\">To summarize:<\/p>\n<p id=\"ec4ea27ead81488bb1fbbd352a7c6bdb\">The main difference between the z-test and the t-test for the population mean is that we use the sample standard deviation s instead of the unknown population standard deviation \u03c3. As a result, the p-values are calculated under the t distribution instead of under the Z distribution. Since we are using software, this has little practical impact on how we carry out the test. However, it is important to understand what is going on behind the scenes, and not just use the software mechanically. This is why we went through the trouble of explaining the t distribution.<\/p>\n<hr \/>\n<p id=\"a39cb249c53a44f7a7f88d41080f29d8\">We are now ready to look at two examples.<\/p>\n<p id=\"N10B21\">For comparison purposes, we will use a modified version of the two problems we used in the previous case. 
We\u2019ll first introduce the modified versions and explain the changes.<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>1<\/h4>\n<div>\n<p id=\"N10B29\">The SAT is constructed so that scores have a national average of 500. The distribution is close to normal. The dean of students of Ross College suspects that in recent years the college has attracted students who are more quantitatively inclined. A random sample of 4 students entering Ross College had an average math SAT (SAT-M) score of 550 and a sample standard deviation of 100. Does this provide enough evidence for the dean to conclude that the mean SAT-M of all Ross College students is higher than the national mean of 500?<\/p>\n<p id=\"N10B2C\">Here is a figure that represents this example where the changes are marked in blue:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. The question we need to answer is &quot;is the mean SAT-M 500 (national mean) or is it higher?&quot; We take a sample from the population of size n = 4, represented by a smaller circle. For this sample, x-bar = 550, and S = 100.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image372.gif\" alt=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. 
The question we need to answer is &quot;is the mean SAT-M 500 (national mean) or is it higher?&quot; We take a sample from the population of size n = 4, represented by a smaller circle. For this sample, x-bar = 550, and S = 100.\" \/><\/span><\/span><\/p>\n<p id=\"N10B35\">Note that the problem was changed so that the population standard deviation (which was assumed to be 100 before) is now unknown, and instead we assume that the sample of 4 students produced a sample mean of 550 (no change) and a sample standard deviation of s=100. (Sample standard deviations are never such nice rounded numbers, but for the sake of comparison we left it as 100.) Note that due to the changes, the z-test for the population mean is no longer appropriate, and we need to use the t-test.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2<\/h4>\n<div>\n<p id=\"N10B3E\">A certain prescription medicine is supposed to contain an average of 250 parts per million (ppm) of a certain chemical. If the concentration is higher than this, the drug may cause harmful side effects; if it is lower, the drug may be ineffective. The manufacturer runs a check to see if the mean concentration in a large shipment conforms to the target level of 250 ppm or not. A simple random sample of 100 portions is tested, and the sample mean concentration is found to be 247 ppm with a sample standard deviation of 12 ppm. Again, here is a figure that represents this example where the changes are marked in blue:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. 
We need to answer &quot;is the mean concentration the required 250ppm or not?&quot; Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247 and S=12.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image373.gif\" alt=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. We need to answer &quot;is the mean concentration the required 250ppm or not?&quot; Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247 and S=12.\" \/><\/span><\/span><\/p>\n<p id=\"N10B47\">The changes are similar to example 1: we no longer assume that the population standard deviation is known, and instead use the sample standard deviation of 12. Again, the problem was thus changed from a z-test problem to a t-test problem.<\/p>\n<p id=\"N10B4A\">However, as we mentioned earlier, due to the large sample size (n = 100) there should not be much difference whether we use the z-test or the t-test. The sample standard deviation, s, is expected to be close enough to the population standard deviation\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0. We\u2019ll see this as we solve the problem.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"N10B55\">Let\u2019s carry out the t-test for both of these problems:<\/p>\n<p id=\"N10B58\"><em>Example 1:<\/em><\/p>\n<p id=\"N10B5E\">1. 
There are no changes in the hypotheses being tested:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_2\" class=\"img-responsive popimg aligncenter\" style=\"vertical-align: middle;border: none;max-width: 100%;height: auto;margin: auto;padding: 0px;cursor: pointer\" title=\"H_0: \u03bc = 500, H_a: \u03bc &gt; 500\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image374.gif\" alt=\"H_0: \u03bc = 500, H_a: \u03bc &gt; 500\" \/><\/span><\/span><\/p>\n<p id=\"N10B67\">2. The conditions that allow us to use the t-test are met since:<\/p>\n<p id=\"N10B6A\">(i) The sample is random.<\/p>\n<p id=\"N10B6D\">(ii) SAT-M is known to vary normally in the population (which is crucial here, since the sample size is only 4).<\/p>\n<p id=\"N10B70\">In other words, we are in the following situation:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"A table which has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. 
&quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK (this is the case this example falls in); Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image328.gif\" alt=\"A table which has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. &quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK (this is the case this example falls in); Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK;\" \/><\/span><\/span><\/p>\n<p>The test statistic is [latex]t=\\frac{\\bar{x}-\\mu_{0}}{\\frac{s}{\\sqrt{n}}}=\\frac{550-500}{\\frac{100}{\\sqrt{4}}}=1[\/latex]<\/p>\n<p id=\"N10BD5\">The data (represented by the sample mean) are 1 standard error above the null value.<\/p>\n<p id=\"N10BD8\">3. Finding the p-value.<\/p>\n<p id=\"N10BDB\">Recall that in general the p-value is calculated under the null distribution of the test statistic, which,<\/p>\n<p id=\"N10BDE\">in the t-test case, is t(n-1). 
In our case, in which n = 4, the p-value is calculated under the t(3) distribution:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"A t(3) distribution with t-scores 0 and 1 marked. The p-value is the area under the curve to the right of t-score 1.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image376.gif\" alt=\"A t(3) distribution with t-scores 0 and 1 marked. The p-value is the area under the curve to the right of t-score 1.\" \/><\/span><\/span><\/p>\n<p id=\"N10BE7\">Using statistical software, we find that the p-value is 0.196. For comparison purposes, the p-value that we got when we carried out the z-test for this problem (when we assumed that 100 is the known\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0rather than the calculated sample standard deviation, s) was 0.159.<\/p>\n<p id=\"N10BF3\">It is not surprising that the p-value of the t-test is larger, since the t distribution has fatter tails than the standard normal distribution. Even though in this particular case the difference between the two values does not have practical implications (since both are large and will lead to the same conclusion), the difference is not trivial.<\/p>\n<p id=\"N10BF6\">4. Making conclusions.<\/p>\n<p id=\"N10BF9\">The p-value (0.196) is large, indicating that the results are not significant. 
The data do not provide enough evidence to conclude that the mean SAT-M among Ross College students is higher than the national mean (500).<\/p>\n<p id=\"N10BFC\">Here is a summary:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_5\" class=\"img-responsive popimg aligncenter\" style=\"vertical-align: middle;border: none;max-width: 100%;height: auto;margin: auto;padding: 0px;cursor: pointer\" title=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. Our hypotheses are H_0: \u03bc = 500 and H_a: \u03bc &gt; 500. We take a sample of size n = 4 from the population, represented by a smaller circle. For this sample, we calculate that x-bar = 550 and S = 100. Our conditions are met, so we can find t = 1, and p-value = .196 . This p-value is too high, so the conclusion is that H_0 cannot be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image377.gif\" alt=\"A large circle represents all of the Students at Ross College. We are interested in finding \u03bc, or the mean of the SAT-M scores, which has a normal distribution. Our hypotheses are H_0: \u03bc = 500 and H_a: \u03bc &gt; 500. We take a sample of size n = 4 from the population, represented by a smaller circle. For this sample, we calculate that x-bar = 550 and S = 100. Our conditions are met, so we can find t = 1, and p-value = .196 . This p-value is too high, so the conclusion is that H_0 cannot be rejected.\" \/><\/span><\/span><\/p>\n<p id=\"N10C05\">Example 2:<\/p>\n<p id=\"N10C08\">1. 
There are no changes in the hypotheses being tested:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_6\" class=\"img-responsive popimg\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image320.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<p id=\"N10C10\">2. The conditions that allow us to use the t-test are met:<\/p>\n<p id=\"N10C13\">(i) The sample is random<\/p>\n<p id=\"N10C16\">(ii) The sample size is large enough for the Central Limit Theorem to apply and ensure the normality of\u00a0[latex]\\overline{X}[\/latex]. In other words, we are in the following situation:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_7\" class=\"img-responsive popimg aligncenter\" title=\"The same table as before. The case this example is in is the &quot;Variable doesn't vary normally in the population, Large sample size&quot; case. The table has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. &quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK (this is the case this example falls in);\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image333.gif\" alt=\"The same table as before. 
The case this example is in is the &quot;Variable doesn't vary normally in the population, Large sample size&quot; case. The table has two columns and two rows, and is titled &quot;Conditions: z-test for a population mean.&quot; The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size. &quot; The row headings are &quot;Variable varies normally in the population&quot; and &quot;Variable doesn't vary normally in the population.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally in the population, Small sample size: OK; Variable varies normally in the population, Large sample size: OK; Variable doesn't vary normally in the population, Small sample size: NOT OK; Variable doesn't vary normally in the population, Large sample size: OK (this is the case this example falls in);\" \/><\/span><\/span><\/p>\n<p id=\"N10C2F\">The test statistic is: [latex]t=\\frac{\\bar{x}-\\mu_{0}}{\\frac{s}{\\sqrt{n}}}=\\frac{247-250}{\\frac{12}{\\sqrt{100}}}=-2.5[\/latex]<\/p>\n<p id=\"N10C97\">The data (represented by the sample mean) are 2.5 standard errors below the null value.<\/p>\n<p id=\"N10C9A\">3. Finding the p-value.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_8\" class=\"img-responsive popimg aligncenter\" title=\"A t(99) curve, for which the horizontal axis has been labeled with t-scores of -2.5 and 2.5 . The area under the curve and to the left of -2.5 and to the right of 2.5 is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image379.gif\" alt=\"A t(99) curve, for which the horizontal axis has been labeled with t-scores of -2.5 and 2.5 . 
The area under the curve and to the left of -2.5 and to the right of 2.5 is the p-value.\" \/><\/span><\/span><\/p>\n<p id=\"N10CA5\">To find the p-value we use statistical software, and we calculate a p-value of 0.014 with a 95% confidence interval of (244.619, 249.381). For comparison purposes, the output we got when we carried out the z-test for the same problem was a p-value of 0.012 with a 95% confidence interval of (244.648, 249.352).<\/p>\n<p id=\"N10CAA\">Note that here the difference between the p-values is quite negligible (0.002). This is not surprising, since the sample size is quite large (n = 100), in which case, as we mentioned, the z-test (in which we are treating s as the known\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0) is a very good approximation to the t-test. Note also how the two 95% confidence intervals are similar (for the same reason).<\/p>\n<p id=\"N10CB4\">4. Conclusions:<\/p>\n<p id=\"N10CB7\">The p-value is small (0.014), indicating that at the 5% significance level, the results are significant. The data therefore provide evidence to conclude that the mean concentration in the entire shipment is not the required 250 ppm.<\/p>\n<p id=\"N10CBA\">Here is a summary:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_9\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. Our hypotheses are H_0:mean = 250, and H_a: mean is not 250. Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247, and because our conditions are met, we can calculate that t = -2.5, and that the p-value = .014. 
This p-value is low enough to let us conclude that we can reject H_0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image380.gif\" alt=\"A large circle represents the population, which is the shipment. \u03bc represents the concentration of the chemical. Our hypotheses are H_0:mean = 250, and H_a: mean is not 250. Selected from the population is a sample of size n=100, represented by a smaller circle. x-bar for this sample is 247, and because our conditions are met, we can calculate that t = -2.5, and that the p-value = .014. This p-value is low enough to let us conclude that we can reject H_0.\" \/><\/span><\/span><\/p>\n<div id=\"N10CC3\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\n<ol>\n<li id=\"N10CCA\">The 95% confidence interval for\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0can be used here in the same way it is used when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is known: either as a way to conduct the two-sided test (checking whether the null value falls inside or outside the confidence interval) or following a t-test where H<sub>o<\/sub>\u00a0was rejected (in order to get insight into the value of\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0).<\/li>\n<li id=\"N10CE5\">While it is true that when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span 
class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is unknown and the sample size is large, the z-test is a good approximation for the t-test, there is not much to gain from using the z-test as an approximation, since we are using software to carry out the t-test anyway. We might as well use the more exact t-test regardless of the sample size.<\/li>\n<\/ol>\n<p id=\"N10CEF\">However, it is always worthwhile knowing what happens behind the scenes.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10CFC\">A group of Internet users 50-65 years of age were randomly chosen and asked to report the weekly number of hours they spend online. The purpose of the study was to determine whether the mean weekly number of hours that Internet users in that age group spend online differs from the mean for Internet users in general, which is 12.5 (as reported by &#8220;The Digital Future Report: Surveying the Digital Future, Year Four&#8221;). The following information is available:<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"_i_10\" class=\"img-responsive popimg aligncenter\" title=\"One-Sample T: hr. online. Test of mu = 12.5 vs mu not = 12.5 Variable: hr. online N: 125 Mean: 12.008 StDev: 3.214 SE Mean: 0.287 95% CI: (11,439, ) T: -1.71 P: 0.090\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image396.gif\" alt=\"One-Sample T: hr. online. Test of mu = 12.5 vs mu not = 12.5 Variable: hr. 
online N: 125 Mean: 12.008 StDev: 3.214 SE Mean: 0.287 95% CI: (11,439, ) T: -1.71 P: 0.090\" \/><\/div>\n<div>\n<div id=\"h5p-195\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-195\" class=\"h5p-iframe\" data-content-id=\"195\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.3 Did I get this 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">To Summarize<\/span><\/h2>\n<p id=\"N10D35\">1. In hypothesis testing for the population mean (<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0), we distinguish between two cases:<\/p>\n<p id=\"N10D3F\">I. The less common case when the population standard deviation (<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>) is known.<\/p>\n<p id=\"N10D49\">II. The more practical case when the population standard deviation is unknown and the sample standard deviation (s) is used instead.<\/p>\n<p id=\"N10D4C\">2. 
In the case when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is known, the test for\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>\u00a0is called the z-test, and in the case when\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c3<\/span><\/span><\/span><\/span><\/span>\u00a0is unknown and s is used instead, the test is called the t-test.<\/p>\n<p id=\"N10D64\">3. In both cases, the null hypothesis is:\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">=<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span><\/p>\n<p id=\"N10D8B\">and the alternative, depending on the context, is one of the following:<\/p>\n<p id=\"N10D8E\"><span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span 
class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">&lt;<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, or\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">&gt;<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, or\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char 
MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">\u2260<\/span><\/span><span class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span><\/p>\n<p id=\"N10DFD\">4. Both tests can be safely used as long as the following two conditions are met:<\/p>\n<p id=\"N10E00\">(i) The sample is random (or can at least be considered random in context).<\/p>\n<p id=\"N10E03\">(ii) Either the sample size is large (n &gt; 30) or, if not, the variable of interest can be assumed to vary normally in the population.<\/p>\n<p id=\"N10E06\">5. In the z-test, the test statistic is:<\/p>\n<p>[latex]z=\\frac{\\overline{x}-\\mu_0}{\\frac{\\sigma}{\\sqrt{n}}}[\/latex]<\/p>\n<p id=\"N10E42\">whose null distribution is the standard normal distribution (under which the p-values are calculated).<\/p>\n<p id=\"N10E45\">6. In the t-test, the test statistic is:<\/p>\n<p>[latex]t=\\frac{\\overline{x}-\\mu_0}{\\frac{s}{\\sqrt{n}}}[\/latex]<\/p>\n<p id=\"N10E80\">whose null distribution is t(n \u2013 1) (under which the p-values are calculated).<\/p>\n<p id=\"N10E83\">7. For large sample sizes, the z-test is a good approximation for the t-test.<\/p>\n<p id=\"N10E86\">8. 
Confidence intervals can be used to carry out the two-sided test <span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_11\" class=\"img-responsive popimg aligncenter\" title=\"H_0: \u03bc = \u03bc_0 vs.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image386.gif\" alt=\"H_0: \u03bc = \u03bc_0 vs.\" \/><\/span><\/span><span id=\"MathJax-Element-22-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msub\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">H<\/span><\/span><\/span><span class=\"mjx-sub\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">a<\/span><\/span><\/span><\/span><span id=\"MJXc-Node-203\" class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">:<\/span><\/span><span id=\"MJXc-Node-204\" class=\"mjx-mi MJXc-space3\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><span id=\"MJXc-Node-205\" class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">\u2260<\/span><\/span><span id=\"MJXc-Node-206\" class=\"mjx-msub MJXc-space3\"><span class=\"mjx-base\"><span id=\"MJXc-Node-207\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><span class=\"mjx-sub\"><span id=\"MJXc-Node-208\" class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">0<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, and in cases where H<sub>0<\/sub>\u00a0is rejected, the confidence interval can give insight into the value of the population mean (<span id=\"MathJax-Element-23-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-209\" class=\"mjx-math\"><span id=\"MJXc-Node-210\" class=\"mjx-mrow\"><span id=\"MJXc-Node-211\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03bc<\/span><\/span><\/span><\/span><\/span>).<\/p>\n<p id=\"N10EBD\">9. 
Here is a summary of which test to use under which conditions:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_12\" class=\"img-responsive popimg aligncenter\" title=\"A table. Here is the data in the table: With large sample size (regardless of whether the population is normal or not), we use the z-test if sigma is known, otherwise we use the t-test, keeping in mind that the z-test is a good approximation. With a small sample size, and normal population(* footnote), the z-test is used when we know sigma, and when we don't, we use the t-test. With a small sample size which has a population shape which is not normal or is unknown, we can't use the z-test or t-test. (*)Footnote: by &quot;Population normal&quot; we mean that either the population is known to be normal, or else that the population can be reasonably assumed to be normal as judged by the shape of the data histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image387.png\" alt=\"A table. Here is the data in the table: With large sample size (regardless of whether the population is normal or not), we use the z-test if sigma is known, otherwise we use the t-test, keeping in mind that the z-test is a good approximation. With a small sample size, and normal population(* footnote), the z-test is used when we know sigma, and when we don't, we use the t-test. With a small sample size which has a population shape which is not normal or is unknown, we can't use the z-test or t-test. 
(*)Footnote: by &quot;Population normal&quot; we mean that either the population is known to be normal, or else that the population can be reasonably assumed to be normal as judged by the shape of the data histogram.\" \/><\/span><\/span><\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p><em>Scenario:\u00a0<\/em>The Intel Corporation is conducting quality control on its circuit boards. Thickness of the manufactured circuit boards varies unavoidably from board to board. Suppose the thickness of the boards produced by a certain factory process varies normally. If the manufacturing process is working correctly, the mean thickness of the circuit boards should be \u03bc = 12 mm. A random sample of five circuit boards is selected and measured; the sample mean thickness is found to be 9.13 mm, and the sample standard deviation is computed to be 1.11 mm.<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"N10070\" class=\"img-responsive popimg aligncenter\" title=\"One-sample T A\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image500.png\" alt=\"One-sample T A\" \/><\/div>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"N10073\" class=\"img-responsive popimg aligncenter\" title=\"One-sample Z B\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image501.png\" alt=\"One-sample Z B\" \/><\/div>\n<div>\n<div id=\"h5p-196\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-196\" class=\"h5p-iframe\" data-content-id=\"196\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.3 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<div>\n<p 
id=\"N10F16\">Now, suppose that Intel is testing a brand new manufacturing process, for which prior information wasn\u2019t available. In particular, for this new process,\u00a0<em>the population distribution\u2019s shape isn\u2019t known<\/em>. Use the following histograms to help you answer the question below.<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"_i_13\" class=\"img-responsive popimg aligncenter\" title=\"4 Histograms, all titled &quot;Histogram of thickness (mm),&quot; with a vertical axis for frequency and horizontal axis for thickness (mm). Histogram A roughly follows a normal shape and has the following data, organized in &quot;thickness: frequency&quot; order: 8: 1.0 9: 2.0 10: 3.0 11: 2.0 12: 1.0 Histogram B is right-skewed: 8: 7 9: 8 10: 5 11: 3 12: 2 13: 2 14: 2 15: 1 16: 1 17: 1 18: 1 19: 1 20: 1 Histogram C is also right skewed: (same data as Histogram B) Histogram D is right skewed: 8: 3.0 9: 2.0 10: 1.0 11: 1.0 12: 1.0 13: 1.0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image502.gif\" alt=\"4 Histograms, all titled &quot;Histogram of thickness (mm),&quot; with a vertical axis for frequency and horizontal axis for thickness (mm). 
Histogram A roughly follows a normal shape and has the following data, organized in &quot;thickness: frequency&quot; order: 8: 1.0 9: 2.0 10: 3.0 11: 2.0 12: 1.0 Histogram B is right-skewed: 8: 7 9: 8 10: 5 11: 3 12: 2 13: 2 14: 2 15: 1 16: 1 17: 1 18: 1 19: 1 20: 1 Histogram C is also right skewed: (same data as Histogram B) Histogram D is right skewed: 8: 3.0 9: 2.0 10: 1.0 11: 1.0 12: 1.0 13: 1.0\" \/><\/div>\n<div><\/div>\n<\/div>\n<div>\n<div id=\"h5p-197\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-197\" class=\"h5p-iframe\" data-content-id=\"197\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.3 Learn by doing 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-559","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":421,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":24,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/559\/revisions"}],"predecessor-version":[{"id":1036,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/559\/revisions\/1036"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/421"}],"metadata":[{"href":"https:\/\/pres
sbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/559\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=559"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=559"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=559"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}